* [GIT PULL v2 0/5] cpu_relax: drop lowlatency, introduce yield
@ 2016-10-25  9:03 ` Christian Borntraeger
  0 siblings, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-10-25  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Nicholas Piggin, linux-kernel, linux-s390,
	linux-arch, linuxppc-dev, Heiko Carstens, Martin Schwidefsky,
	Noam Camus, sparclinux, x86, Will Deacon, Catalin Marinas,
	Russell King, virtualization, xen-devel, kvm,
	Christian Borntraeger

Peter,

here is v2 with improved patch descriptions and some fixes. The
previous version survived one day in linux-next, and I changed only
small parts.
So unless there is some other issue, feel free to pull (or to apply
the patches) to tip/locking.

The following changes since commit 07d9a380680d1c0eb51ef87ff2eab5c994949e69:

  Linux 4.9-rc2 (2016-10-23 17:10:14 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux.git  tags/cpurelax

for you to fetch changes up to dcc37f9044436438360402714b7544a8e8779b07:

  processor.h: remove cpu_relax_lowlatency (2016-10-25 09:49:57 +0200)

----------------------------------------------------------------
cpu_relax: drop lowlatency, introduce yield

For spinning loops, people often use barrier() or cpu_relax().
For most architectures cpu_relax() and barrier() are the same, but on
some architectures cpu_relax() can add latency.
For example, on power, sparc64 and arc, cpu_relax() can shift the CPU
towards other hardware threads in an SMT environment.
On s390, cpu_relax() does even more: it uses a hypercall to the
hypervisor to give up the time slice.
In contrast to the SMT yielding, this can result in larger latencies.
In some places this latency is unwanted, so another variant
"cpu_relax_lowlatency" was introduced. Before it spreads to more and
more places, let's reverse the logic and provide a cpu_relax_yield()
that can be called in places where yielding is more important than
latency. By default this is the same as cpu_relax() on all
architectures.

So my proposal boils down to:
- lowest latency: use barrier() or mb() if necessary
- low latency: use cpu_relax() (it might, for example, give up some CPU
  time to the other _hardware_ threads)
- really give up the CPU: use cpu_relax_yield() (see the sketch below)
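
To make the three tiers concrete, here is a minimal sketch of a wait
loop for each of them ("flag" is only a stand-in for whatever condition
the loop actually polls, not anything from this series):

	/* lowest latency: plain compiler barrier, never gives up the CPU */
	while (!READ_ONCE(flag))
		barrier();

	/* low latency: may hint the sibling hardware threads, still cheap */
	while (!READ_ONCE(flag))
		cpu_relax();

	/* latency does not matter: give up the time slice if possible */
	while (!READ_ONCE(flag))
		cpu_relax_yield();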

PS: In the long run I would also like to provide, for s390, something
like cpu_relax_yield_to() that takes a cpu number (or just add that
argument to cpu_relax_yield()), since a directed yield is always better
than an undirected yield as long as we know the waiter.
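
Purely to illustrate that idea (this is not part of this series; the
name and signature are made up), such a helper could default to the
undirected variant on architectures that cannot do better:

	/* hypothetical: directed yield to the CPU of the known waiter;
	 * the generic fallback just does an undirected yield */
	static inline void cpu_relax_yield_to(int cpu)
	{
		cpu_relax_yield();
	}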

----------------------------------------------------------------
Christian Borntraeger (5):
      processor.h: introduce cpu_relax_yield
      stop_machine: yield CPU during stop machine
      s390: make cpu_relax a barrier again
      processor.h: Remove cpu_relax_lowlatency users
      processor.h: remove cpu_relax_lowlatency

 arch/alpha/include/asm/processor.h      | 2 +-
 arch/arc/include/asm/processor.h        | 4 ++--
 arch/arm/include/asm/processor.h        | 2 +-
 arch/arm64/include/asm/processor.h      | 2 +-
 arch/avr32/include/asm/processor.h      | 2 +-
 arch/blackfin/include/asm/processor.h   | 2 +-
 arch/c6x/include/asm/processor.h        | 2 +-
 arch/cris/include/asm/processor.h       | 2 +-
 arch/frv/include/asm/processor.h        | 2 +-
 arch/h8300/include/asm/processor.h      | 2 +-
 arch/hexagon/include/asm/processor.h    | 2 +-
 arch/ia64/include/asm/processor.h       | 2 +-
 arch/m32r/include/asm/processor.h       | 2 +-
 arch/m68k/include/asm/processor.h       | 2 +-
 arch/metag/include/asm/processor.h      | 2 +-
 arch/microblaze/include/asm/processor.h | 2 +-
 arch/mips/include/asm/processor.h       | 2 +-
 arch/mn10300/include/asm/processor.h    | 2 +-
 arch/nios2/include/asm/processor.h      | 2 +-
 arch/openrisc/include/asm/processor.h   | 2 +-
 arch/parisc/include/asm/processor.h     | 2 +-
 arch/powerpc/include/asm/processor.h    | 2 +-
 arch/s390/include/asm/processor.h       | 4 ++--
 arch/s390/kernel/processor.c            | 4 ++--
 arch/score/include/asm/processor.h      | 2 +-
 arch/sh/include/asm/processor.h         | 2 +-
 arch/sparc/include/asm/processor_32.h   | 2 +-
 arch/sparc/include/asm/processor_64.h   | 2 +-
 arch/tile/include/asm/processor.h       | 2 +-
 arch/unicore32/include/asm/processor.h  | 2 +-
 arch/x86/include/asm/processor.h        | 2 +-
 arch/x86/um/asm/processor.h             | 2 +-
 arch/xtensa/include/asm/processor.h     | 2 +-
 drivers/gpu/drm/i915/i915_gem_request.c | 2 +-
 drivers/vhost/net.c                     | 4 ++--
 kernel/locking/mcs_spinlock.h           | 4 ++--
 kernel/locking/mutex.c                  | 4 ++--
 kernel/locking/osq_lock.c               | 6 +++---
 kernel/locking/qrwlock.c                | 6 +++---
 kernel/locking/rwsem-xadd.c             | 4 ++--
 kernel/stop_machine.c                   | 2 +-
 lib/lockref.c                           | 2 +-
 42 files changed, 53 insertions(+), 53 deletions(-)

^ permalink raw reply	[flat|nested] 45+ messages in thread


* [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
  2016-10-25  9:03 ` Christian Borntraeger
@ 2016-10-25  9:03   ` Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-10-25  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Nicholas Piggin, linux-kernel, linux-s390,
	linux-arch, linuxppc-dev, Heiko Carstens, Martin Schwidefsky,
	Noam Camus, sparclinux, x86, Will Deacon, Catalin Marinas,
	Russell King, virtualization, xen-devel, kvm,
	Christian Borntraeger

For spinning loops, people often use barrier() or cpu_relax().
For most architectures cpu_relax() and barrier() are the same, but on
some architectures cpu_relax() can add latency.
For example, on power, sparc64 and arc, cpu_relax() can shift the CPU
towards other hardware threads in an SMT environment.
On s390, cpu_relax() does even more: it uses a hypercall to the
hypervisor to give up the time slice.
In contrast to the SMT yielding, this can result in larger latencies.
In some places this latency is unwanted, so another variant
"cpu_relax_lowlatency" was introduced. Before it spreads to more and
more places, let's reverse the logic and provide a cpu_relax_yield()
that can be called in places where yielding is more important than
latency. By default this is the same as cpu_relax() on all
architectures.
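
In code terms the patch boils down to this pattern: on most
architectures the new helper is simply an alias for cpu_relax(), while
on s390 the existing diag 0x44 based yield moves over to
cpu_relax_yield() and cpu_relax() (for now) maps back to it:

	/* most architectures */
	#define cpu_relax_yield() cpu_relax()

	/* s390 */
	void cpu_relax_yield(void);
	#define cpu_relax() cpu_relax_yield()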

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/alpha/include/asm/processor.h      | 1 +
 arch/arc/include/asm/processor.h        | 2 ++
 arch/arm/include/asm/processor.h        | 1 +
 arch/arm64/include/asm/processor.h      | 1 +
 arch/avr32/include/asm/processor.h      | 1 +
 arch/blackfin/include/asm/processor.h   | 1 +
 arch/c6x/include/asm/processor.h        | 1 +
 arch/cris/include/asm/processor.h       | 1 +
 arch/frv/include/asm/processor.h        | 1 +
 arch/h8300/include/asm/processor.h      | 1 +
 arch/hexagon/include/asm/processor.h    | 1 +
 arch/ia64/include/asm/processor.h       | 1 +
 arch/m32r/include/asm/processor.h       | 1 +
 arch/m68k/include/asm/processor.h       | 1 +
 arch/metag/include/asm/processor.h      | 1 +
 arch/microblaze/include/asm/processor.h | 1 +
 arch/mips/include/asm/processor.h       | 1 +
 arch/mn10300/include/asm/processor.h    | 1 +
 arch/nios2/include/asm/processor.h      | 1 +
 arch/openrisc/include/asm/processor.h   | 1 +
 arch/parisc/include/asm/processor.h     | 1 +
 arch/powerpc/include/asm/processor.h    | 1 +
 arch/s390/include/asm/processor.h       | 3 ++-
 arch/s390/kernel/processor.c            | 4 ++--
 arch/score/include/asm/processor.h      | 1 +
 arch/sh/include/asm/processor.h         | 1 +
 arch/sparc/include/asm/processor_32.h   | 1 +
 arch/sparc/include/asm/processor_64.h   | 1 +
 arch/tile/include/asm/processor.h       | 1 +
 arch/unicore32/include/asm/processor.h  | 1 +
 arch/x86/include/asm/processor.h        | 1 +
 arch/x86/um/asm/processor.h             | 1 +
 arch/xtensa/include/asm/processor.h     | 1 +
 33 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 43a7559..0556fda 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -58,6 +58,7 @@ unsigned long get_wchan(struct task_struct *p);
   ((tsk) == current ? rdusp() : task_thread_info(tsk)->pcb.usp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 16b630f..6c158d5 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -60,6 +60,7 @@ struct task_struct;
 #ifndef CONFIG_EZNPS_MTM_EXT
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #else
@@ -67,6 +68,7 @@ struct task_struct;
 #define cpu_relax()     \
 	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	barrier()
 
 #endif
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 8a1e8e9..db660e0 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -82,6 +82,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define cpu_relax()			barrier()
 #endif
 
+#define cpu_relax_yield()  	              cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 60e3482..3f9b0e5 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -149,6 +149,7 @@ static inline void cpu_relax(void)
 	asm volatile("yield" ::: "memory");
 }
 
+#define cpu_relax_yield()                     cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 /* Thread switching */
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index 941593c..e412e8b 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -92,6 +92,7 @@ extern struct avr32_cpuinfo boot_cpu_data;
 #define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 3))
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 #define cpu_sync_pipeline()	asm volatile("sub pc, -2" : : : "memory")
 
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index 0c265ab..8b8704a 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -92,6 +92,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk) == current ? rdusp() : (tsk)->thread.usp)
 
 #define cpu_relax()    	smp_mb()
+#define cpu_relax_yield()      cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Get the Silicon Revision of the chip */
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index f2ef31b..914d730 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -121,6 +121,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(task)	(task_pt_regs(task)->sp)
 
 #define cpu_relax()		do { } while (0)
+#define cpu_relax_yield()             cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 
 extern const struct seq_operations cpuinfo_op;
diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h
index 862126b..01dd52e 100644
--- a/arch/cris/include/asm/processor.h
+++ b/arch/cris/include/asm/processor.h
@@ -63,6 +63,7 @@ static inline void release_thread(struct task_struct *dead_task)
 #define init_stack      (init_thread_union.stack)
 
 #define cpu_relax()     barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index 73f0a79..4d00d65 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -107,6 +107,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk)->thread.frame0->sp)
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* data cache prefetch */
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 111df73..683a061 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -127,6 +127,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk) == current ? rdusp() : (tsk)->thread.usp)
 
 #define cpu_relax()    barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #define HARD_RESET_NOW() ({		\
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index d850113..1558ddb 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -56,6 +56,7 @@ struct thread_struct {
 }
 
 #define cpu_relax() __vmyield()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index ce53c50..4654b71 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -547,6 +547,7 @@ ia64_eoi (void)
 }
 
 #define cpu_relax()	ia64_hint(ia64_hint_pause)
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 static inline int
diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index 9f8fd9b..b262037 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -133,6 +133,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)  ((tsk)->thread.sp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* _ASM_M32R_PROCESSOR_H */
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index c84a218..13e07ae 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -156,6 +156,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define task_pt_regs(tsk)	((struct pt_regs *) ((tsk)->thread.esp0))
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #endif
diff --git a/arch/metag/include/asm/processor.h b/arch/metag/include/asm/processor.h
index a0333eb..61d6e27 100644
--- a/arch/metag/include/asm/processor.h
+++ b/arch/metag/include/asm/processor.h
@@ -152,6 +152,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define user_stack_pointer(regs)        ((regs)->ctx.AX[0].U0)
 
 #define cpu_relax()     barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()  cpu_relax()
 
 extern void setup_priv(void);
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index c38d0dd..fd7dd11 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -22,6 +22,7 @@
 extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax()		barrier()
+# define cpu_relax_yield() cpu_relax()
 # define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(tsk) \
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 0d36c87..9a656f6 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -389,6 +389,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index b10ba12..89f63d1 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -69,6 +69,7 @@ extern void print_cpu_info(struct mn10300_cpuinfo *);
 extern void dodgy_tsc(void);
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h
index 1c953f0..303e593 100644
--- a/arch/nios2/include/asm/processor.h
+++ b/arch/nios2/include/asm/processor.h
@@ -88,6 +88,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)	((tsk)->thread.kregs->sp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()  cpu_relax()
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index 70334c9..6ecfc2a 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -92,6 +92,7 @@ extern unsigned long thread_saved_pc(struct task_struct *t);
 #define init_stack      (init_thread_union.stack)
 
 #define cpu_relax()     barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index 2e674e1..ea2ff9f 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -309,6 +309,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)	((tsk)->thread.regs.gr[30])
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index c07c31b..908fa7c 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -404,6 +404,7 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 #define cpu_relax()	barrier()
 #endif
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 0332317..d05965b 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -234,8 +234,9 @@ static inline unsigned short stap(void)
 /*
  * Give up the time slice of the virtual PU.
  */
-void cpu_relax(void);
+void cpu_relax_yield(void);
 
+#define cpu_relax() cpu_relax_yield()
 #define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index 81d0808..9e60ef1 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -53,7 +53,7 @@ void s390_update_cpu_mhz(void)
 		on_each_cpu(update_cpu_mhz, NULL, 0);
 }
 
-void notrace cpu_relax(void)
+void notrace cpu_relax_yield(void)
 {
 	if (!smp_cpu_mtid && MACHINE_HAS_DIAG44) {
 		diag_stat_inc(DIAG_STAT_X044);
@@ -61,7 +61,7 @@ void notrace cpu_relax(void)
 	}
 	barrier();
 }
-EXPORT_SYMBOL(cpu_relax);
+EXPORT_SYMBOL(cpu_relax_yield);
 
 /*
  * cpu_init - initializes state that is per-CPU.
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index 851f441..e8e87b4 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -24,6 +24,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define current_text_addr() ({ __label__ _l; _l: &&_l; })
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 #define release_thread(thread)	do {} while (0)
 
diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h
index f9a0994..099a991 100644
--- a/arch/sh/include/asm/processor.h
+++ b/arch/sh/include/asm/processor.h
@@ -97,6 +97,7 @@ extern struct sh_cpuinfo cpu_data[];
 
 #define cpu_sleep()	__asm__ __volatile__ ("sleep" : : : "memory")
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 812fd08..50e908a3c 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -119,6 +119,7 @@ extern struct task_struct *last_task_used_math;
 int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 extern void (*sparc_idle)(void);
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index ce2595c..3e8fac7 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -216,6 +216,7 @@ unsigned long get_wchan(struct task_struct *task);
 				     "nop\n\t"				\
 				     ".previous"			\
 				     ::: "memory")
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Prefetch support.  This is tuned for UltraSPARC-III and later.
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 0684e88..91a39a5 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -264,6 +264,7 @@ static inline void cpu_relax(void)
 	barrier();
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Info on this processor (see fs/proc/cpuinfo.c) */
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index 8d21b7a..fc54d5d 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -71,6 +71,7 @@ extern void release_thread(struct task_struct *);
 unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
+#define cpu_relax_yield()		cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..44adada 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -588,6 +588,7 @@ static __always_inline void cpu_relax(void)
 	rep_nop();
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Stop speculative execution and prefetching of modified code. */
diff --git a/arch/x86/um/asm/processor.h b/arch/x86/um/asm/processor.h
index 233ee09..01597af 100644
--- a/arch/x86/um/asm/processor.h
+++ b/arch/x86/um/asm/processor.h
@@ -26,6 +26,7 @@ static inline void rep_nop(void)
 }
 
 #define cpu_relax()		rep_nop()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(t) (&(t)->thread.regs)
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index b42d68b..fe14dc2 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -206,6 +206,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)		(task_pt_regs(tsk)->areg[1])
 
 #define cpu_relax()  barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Special register access. */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 45+ messages in thread


diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 0332317..d05965b 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -234,8 +234,9 @@ static inline unsigned short stap(void)
 /*
  * Give up the time slice of the virtual PU.
  */
-void cpu_relax(void);
+void cpu_relax_yield(void);
 
+#define cpu_relax() cpu_relax_yield()
 #define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index 81d0808..9e60ef1 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -53,7 +53,7 @@ void s390_update_cpu_mhz(void)
 		on_each_cpu(update_cpu_mhz, NULL, 0);
 }
 
-void notrace cpu_relax(void)
+void notrace cpu_relax_yield(void)
 {
 	if (!smp_cpu_mtid && MACHINE_HAS_DIAG44) {
 		diag_stat_inc(DIAG_STAT_X044);
@@ -61,7 +61,7 @@ void notrace cpu_relax(void)
 	}
 	barrier();
 }
-EXPORT_SYMBOL(cpu_relax);
+EXPORT_SYMBOL(cpu_relax_yield);
 
 /*
  * cpu_init - initializes state that is per-CPU.
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index 851f441..e8e87b4 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -24,6 +24,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define current_text_addr() ({ __label__ _l; _l: &&_l; })
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 #define release_thread(thread)	do {} while (0)
 
diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h
index f9a0994..099a991 100644
--- a/arch/sh/include/asm/processor.h
+++ b/arch/sh/include/asm/processor.h
@@ -97,6 +97,7 @@ extern struct sh_cpuinfo cpu_data[];
 
 #define cpu_sleep()	__asm__ __volatile__ ("sleep" : : : "memory")
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 812fd08..50e908a3c 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -119,6 +119,7 @@ extern struct task_struct *last_task_used_math;
 int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 extern void (*sparc_idle)(void);
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index ce2595c..3e8fac7 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -216,6 +216,7 @@ unsigned long get_wchan(struct task_struct *task);
 				     "nop\n\t"				\
 				     ".previous"			\
 				     ::: "memory")
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Prefetch support.  This is tuned for UltraSPARC-III and later.
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 0684e88..91a39a5 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -264,6 +264,7 @@ static inline void cpu_relax(void)
 	barrier();
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Info on this processor (see fs/proc/cpuinfo.c) */
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index 8d21b7a..fc54d5d 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -71,6 +71,7 @@ extern void release_thread(struct task_struct *);
 unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
+#define cpu_relax_yield()		cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..44adada 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -588,6 +588,7 @@ static __always_inline void cpu_relax(void)
 	rep_nop();
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Stop speculative execution and prefetching of modified code. */
diff --git a/arch/x86/um/asm/processor.h b/arch/x86/um/asm/processor.h
index 233ee09..01597af 100644
--- a/arch/x86/um/asm/processor.h
+++ b/arch/x86/um/asm/processor.h
@@ -26,6 +26,7 @@ static inline void rep_nop(void)
 }
 
 #define cpu_relax()		rep_nop()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(t) (&(t)->thread.regs)
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index b42d68b..fe14dc2 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -206,6 +206,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)		(task_pt_regs(tsk)->areg[1])
 
 #define cpu_relax()  barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Special register access. */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
  2016-10-25  9:03 ` Christian Borntraeger
                   ` (2 preceding siblings ...)
  (?)
@ 2016-10-25  9:03 ` Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-10-25  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-s390, kvm, Christian Borntraeger, Will Deacon,
	x86, Heiko Carstens, linux-kernel, Nicholas Piggin, Russell King,
	sparclinux, Noam Camus, Catalin Marinas, Martin Schwidefsky,
	xen-devel, virtualization, linuxppc-dev, Ingo Molnar

For spinning loops people often use barrier() or cpu_relax().
For most architectures cpu_relax and barrier are the same, but on
some architectures cpu_relax can add some latency.
For example on power, sparc64 and arc, cpu_relax can shift the CPU
towards other hardware threads in an SMT environment.
On s390 cpu_relax does even more: it uses a hypercall to the
hypervisor to give up the timeslice.
In contrast to the SMT yielding this can result in larger latencies.
In some places this latency is unwanted, so another variant
"cpu_relax_lowlatency" was introduced. Before this spreads to more
and more places, let's reverse the logic and provide a cpu_relax_yield
that can be called in places where yielding is more important than
latency. By default this is the same as cpu_relax on all architectures.
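
To make the intended split concrete, here is an illustrative sketch
(mine, not part of the patch; the flags and helper functions are made
up, and the usual kernel headers such as <linux/compiler.h> and
<asm/processor.h> are assumed) of how callers are expected to choose
between the two variants:

/*
 * A rendezvous wait such as stop_machine can stall every other
 * virtual CPU, so handing the time slice back to the hypervisor is
 * worth more than a few cycles of extra wakeup latency.
 */
static void wait_for_rendezvous(const bool *all_cpus_arrived)
{
	while (!READ_ONCE(*all_cpus_arrived))
		cpu_relax_yield();	/* yielding matters more than latency */
}

/*
 * An ordinary lock spin expects the owner to be running, so low
 * wakeup latency matters more than yielding.
 */
static void wait_for_lock_word(const atomic_t *lock_is_free)
{
	while (!atomic_read(lock_is_free))
		cpu_relax();		/* latency matters more than yielding */
}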

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/alpha/include/asm/processor.h      | 1 +
 arch/arc/include/asm/processor.h        | 2 ++
 arch/arm/include/asm/processor.h        | 1 +
 arch/arm64/include/asm/processor.h      | 1 +
 arch/avr32/include/asm/processor.h      | 1 +
 arch/blackfin/include/asm/processor.h   | 1 +
 arch/c6x/include/asm/processor.h        | 1 +
 arch/cris/include/asm/processor.h       | 1 +
 arch/frv/include/asm/processor.h        | 1 +
 arch/h8300/include/asm/processor.h      | 1 +
 arch/hexagon/include/asm/processor.h    | 1 +
 arch/ia64/include/asm/processor.h       | 1 +
 arch/m32r/include/asm/processor.h       | 1 +
 arch/m68k/include/asm/processor.h       | 1 +
 arch/metag/include/asm/processor.h      | 1 +
 arch/microblaze/include/asm/processor.h | 1 +
 arch/mips/include/asm/processor.h       | 1 +
 arch/mn10300/include/asm/processor.h    | 1 +
 arch/nios2/include/asm/processor.h      | 1 +
 arch/openrisc/include/asm/processor.h   | 1 +
 arch/parisc/include/asm/processor.h     | 1 +
 arch/powerpc/include/asm/processor.h    | 1 +
 arch/s390/include/asm/processor.h       | 3 ++-
 arch/s390/kernel/processor.c            | 4 ++--
 arch/score/include/asm/processor.h      | 1 +
 arch/sh/include/asm/processor.h         | 1 +
 arch/sparc/include/asm/processor_32.h   | 1 +
 arch/sparc/include/asm/processor_64.h   | 1 +
 arch/tile/include/asm/processor.h       | 1 +
 arch/unicore32/include/asm/processor.h  | 1 +
 arch/x86/include/asm/processor.h        | 1 +
 arch/x86/um/asm/processor.h             | 1 +
 arch/xtensa/include/asm/processor.h     | 1 +
 33 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 43a7559..0556fda 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -58,6 +58,7 @@ unsigned long get_wchan(struct task_struct *p);
   ((tsk) == current ? rdusp() : task_thread_info(tsk)->pcb.usp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 16b630f..6c158d5 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -60,6 +60,7 @@ struct task_struct;
 #ifndef CONFIG_EZNPS_MTM_EXT
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #else
@@ -67,6 +68,7 @@ struct task_struct;
 #define cpu_relax()     \
 	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	barrier()
 
 #endif
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 8a1e8e9..db660e0 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -82,6 +82,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define cpu_relax()			barrier()
 #endif
 
+#define cpu_relax_yield()  	              cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 60e3482..3f9b0e5 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -149,6 +149,7 @@ static inline void cpu_relax(void)
 	asm volatile("yield" ::: "memory");
 }
 
+#define cpu_relax_yield()                     cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 /* Thread switching */
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index 941593c..e412e8b 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -92,6 +92,7 @@ extern struct avr32_cpuinfo boot_cpu_data;
 #define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 3))
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 #define cpu_sync_pipeline()	asm volatile("sub pc, -2" : : : "memory")
 
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index 0c265ab..8b8704a 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -92,6 +92,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk) == current ? rdusp() : (tsk)->thread.usp)
 
 #define cpu_relax()    	smp_mb()
+#define cpu_relax_yield()      cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Get the Silicon Revision of the chip */
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index f2ef31b..914d730 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -121,6 +121,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(task)	(task_pt_regs(task)->sp)
 
 #define cpu_relax()		do { } while (0)
+#define cpu_relax_yield()             cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 
 extern const struct seq_operations cpuinfo_op;
diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h
index 862126b..01dd52e 100644
--- a/arch/cris/include/asm/processor.h
+++ b/arch/cris/include/asm/processor.h
@@ -63,6 +63,7 @@ static inline void release_thread(struct task_struct *dead_task)
 #define init_stack      (init_thread_union.stack)
 
 #define cpu_relax()     barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index 73f0a79..4d00d65 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -107,6 +107,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk)->thread.frame0->sp)
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* data cache prefetch */
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 111df73..683a061 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -127,6 +127,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk) == current ? rdusp() : (tsk)->thread.usp)
 
 #define cpu_relax()    barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #define HARD_RESET_NOW() ({		\
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index d850113..1558ddb 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -56,6 +56,7 @@ struct thread_struct {
 }
 
 #define cpu_relax() __vmyield()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index ce53c50..4654b71 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -547,6 +547,7 @@ ia64_eoi (void)
 }
 
 #define cpu_relax()	ia64_hint(ia64_hint_pause)
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 static inline int
diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index 9f8fd9b..b262037 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -133,6 +133,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)  ((tsk)->thread.sp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* _ASM_M32R_PROCESSOR_H */
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index c84a218..13e07ae 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -156,6 +156,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define task_pt_regs(tsk)	((struct pt_regs *) ((tsk)->thread.esp0))
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #endif
diff --git a/arch/metag/include/asm/processor.h b/arch/metag/include/asm/processor.h
index a0333eb..61d6e27 100644
--- a/arch/metag/include/asm/processor.h
+++ b/arch/metag/include/asm/processor.h
@@ -152,6 +152,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define user_stack_pointer(regs)        ((regs)->ctx.AX[0].U0)
 
 #define cpu_relax()     barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()  cpu_relax()
 
 extern void setup_priv(void);
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index c38d0dd..fd7dd11 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -22,6 +22,7 @@
 extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax()		barrier()
+# define cpu_relax_yield() cpu_relax()
 # define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(tsk) \
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 0d36c87..9a656f6 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -389,6 +389,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index b10ba12..89f63d1 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -69,6 +69,7 @@ extern void print_cpu_info(struct mn10300_cpuinfo *);
 extern void dodgy_tsc(void);
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h
index 1c953f0..303e593 100644
--- a/arch/nios2/include/asm/processor.h
+++ b/arch/nios2/include/asm/processor.h
@@ -88,6 +88,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)	((tsk)->thread.kregs->sp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()  cpu_relax()
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index 70334c9..6ecfc2a 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -92,6 +92,7 @@ extern unsigned long thread_saved_pc(struct task_struct *t);
 #define init_stack      (init_thread_union.stack)
 
 #define cpu_relax()     barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index 2e674e1..ea2ff9f 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -309,6 +309,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)	((tsk)->thread.regs.gr[30])
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index c07c31b..908fa7c 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -404,6 +404,7 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 #define cpu_relax()	barrier()
 #endif
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 0332317..d05965b 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -234,8 +234,9 @@ static inline unsigned short stap(void)
 /*
  * Give up the time slice of the virtual PU.
  */
-void cpu_relax(void);
+void cpu_relax_yield(void);
 
+#define cpu_relax() cpu_relax_yield()
 #define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index 81d0808..9e60ef1 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -53,7 +53,7 @@ void s390_update_cpu_mhz(void)
 		on_each_cpu(update_cpu_mhz, NULL, 0);
 }
 
-void notrace cpu_relax(void)
+void notrace cpu_relax_yield(void)
 {
 	if (!smp_cpu_mtid && MACHINE_HAS_DIAG44) {
 		diag_stat_inc(DIAG_STAT_X044);
@@ -61,7 +61,7 @@ void notrace cpu_relax(void)
 	}
 	barrier();
 }
-EXPORT_SYMBOL(cpu_relax);
+EXPORT_SYMBOL(cpu_relax_yield);
 
 /*
  * cpu_init - initializes state that is per-CPU.
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index 851f441..e8e87b4 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -24,6 +24,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define current_text_addr() ({ __label__ _l; _l: &&_l; })
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 #define release_thread(thread)	do {} while (0)
 
diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h
index f9a0994..099a991 100644
--- a/arch/sh/include/asm/processor.h
+++ b/arch/sh/include/asm/processor.h
@@ -97,6 +97,7 @@ extern struct sh_cpuinfo cpu_data[];
 
 #define cpu_sleep()	__asm__ __volatile__ ("sleep" : : : "memory")
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 812fd08..50e908a3c 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -119,6 +119,7 @@ extern struct task_struct *last_task_used_math;
 int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 extern void (*sparc_idle)(void);
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index ce2595c..3e8fac7 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -216,6 +216,7 @@ unsigned long get_wchan(struct task_struct *task);
 				     "nop\n\t"				\
 				     ".previous"			\
 				     ::: "memory")
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Prefetch support.  This is tuned for UltraSPARC-III and later.
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 0684e88..91a39a5 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -264,6 +264,7 @@ static inline void cpu_relax(void)
 	barrier();
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Info on this processor (see fs/proc/cpuinfo.c) */
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index 8d21b7a..fc54d5d 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -71,6 +71,7 @@ extern void release_thread(struct task_struct *);
 unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
+#define cpu_relax_yield()		cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..44adada 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -588,6 +588,7 @@ static __always_inline void cpu_relax(void)
 	rep_nop();
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Stop speculative execution and prefetching of modified code. */
diff --git a/arch/x86/um/asm/processor.h b/arch/x86/um/asm/processor.h
index 233ee09..01597af 100644
--- a/arch/x86/um/asm/processor.h
+++ b/arch/x86/um/asm/processor.h
@@ -26,6 +26,7 @@ static inline void rep_nop(void)
 }
 
 #define cpu_relax()		rep_nop()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(t) (&(t)->thread.regs)
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index b42d68b..fe14dc2 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -206,6 +206,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)		(task_pt_regs(tsk)->areg[1])
 
 #define cpu_relax()  barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Special register access. */
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [GIT PULL v2 2/5] stop_machine: yield CPU during stop machine
  2016-10-25  9:03 ` Christian Borntraeger
  (?)
@ 2016-10-25  9:03   ` Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-10-25  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Nicholas Piggin, linux-kernel, linux-s390,
	linux-arch, linuxppc-dev, Heiko Carstens, Martin Schwidefsky,
	Noam Camus, sparclinux, x86, Will Deacon, Catalin Marinas,
	Russell King, virtualization, xen-devel, kvm,
	Christian Borntraeger

Some time ago commit 57f2ffe14fd125c2 ("s390: remove diag 44 calls
from cpu_relax()") stopped cpu_relax on s390 from yielding to the
hypervisor.

As it turns out, this made stop_machine run really slowly on
virtualized, overcommitted systems. For example, with large guests the
kprobes test during bootup took several seconds instead of running
unnoticed.

Therefore, the yielding was reintroduced with commit 4d92f50249eb
("s390: reintroduce diag 44 calls for cpu_relax()"), but in fact the
stop_machine code seems to be the only place where this yielding
was really necessary. It is probably also the most important place,
as it makes all but one guest CPU wait for a single guest CPU.

As we now have cpu_relax_yield, we can use it in multi_cpu_stop.
For now let's only add it here; we can add it to other places later
when necessary.
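
The hunk below is the entire change; as a reminder of the context, the
following heavily simplified sketch of multi_cpu_stop (my paraphrase of
kernel/stop_machine.c, not the real code) shows why a single preempted
virtual CPU holds up all the others here:

/*
 * Simplified sketch, assuming the definitions in kernel/stop_machine.c:
 * every CPU spins until all CPUs have acknowledged the current state,
 * and only then does the state advance.  A guest CPU preempted by the
 * host therefore keeps everyone else spinning; yielding the time slice
 * lets the host schedule the laggard.
 */
static int multi_cpu_stop_sketch(struct multi_stop_data *msdata)
{
	enum multi_stop_state curstate = MULTI_STOP_NONE;

	do {
		cpu_relax_yield();	/* was cpu_relax() before this patch */
		if (msdata->state != curstate) {
			curstate = msdata->state;
			/* ... disable interrupts, run the stop function ... */
			ack_state(msdata);	/* last acker advances the state */
		}
	} while (curstate != MULTI_STOP_EXIT);

	return 0;
}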

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 kernel/stop_machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index ec9ab2f..1eb8266 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -194,7 +194,7 @@ static int multi_cpu_stop(void *data)
 	/* Simple state machine */
 	do {
 		/* Chill out and ensure we re-read multi_stop_state. */
-		cpu_relax();
+		cpu_relax_yield();
 		if (msdata->state != curstate) {
 			curstate = msdata->state;
 			switch (curstate) {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [GIT PULL v2 3/5] s390: make cpu_relax a barrier again
  2016-10-25  9:03 ` Christian Borntraeger
  (?)
@ 2016-10-25  9:03   ` Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-10-25  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Nicholas Piggin, linux-kernel, linux-s390,
	linux-arch, linuxppc-dev, Heiko Carstens, Martin Schwidefsky,
	Noam Camus, sparclinux, x86, Will Deacon, Catalin Marinas,
	Russell King, virtualization, xen-devel, kvm,
	Christian Borntraeger

stop_machine seemed to be the only important place that relied on
cpu_relax yielding. This was fixed in the previous patch by using
cpu_relax_yield. Therefore, we can now redefine cpu_relax on s390 to be
a simple barrier again, bringing s390 in line with all other
architectures.
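
For reference, barrier() is only a compiler barrier (quoted below from
the common gcc compiler header, not something added by this patch), so
after this change cpu_relax() on s390 no longer traps into the
hypervisor at all:

#define barrier() __asm__ __volatile__("": : :"memory")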

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index d05965b..5d262cf 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -236,7 +236,7 @@ static inline unsigned short stap(void)
  */
 void cpu_relax_yield(void);
 
-#define cpu_relax() cpu_relax_yield()
+#define cpu_relax() barrier()
 #define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [GIT PULL v2 4/5] processor.h: Remove cpu_relax_lowlatency users
  2016-10-25  9:03 ` Christian Borntraeger
  (?)
@ 2016-10-25  9:03   ` Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-10-25  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Nicholas Piggin, linux-kernel, linux-s390,
	linux-arch, linuxppc-dev, Heiko Carstens, Martin Schwidefsky,
	Noam Camus, sparclinux, x86, Will Deacon, Catalin Marinas,
	Russell King, virtualization, xen-devel, kvm,
	Christian Borntraeger

With the s390 special case of a yielding cpu_relax implementation gone,
we can now remove all users of cpu_relax_lowlatency and replace them
with cpu_relax.
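
The conversion is mechanical; as an illustration (mine, not a hunk from
this patch, with "l" standing in for the real lock word pointer), a
typical converted busy-wait now reads:

	while (!smp_load_acquire(l))
		cpu_relax();	/* cheap again everywhere, including s390 */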

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 2 +-
 drivers/vhost/net.c                     | 4 ++--
 kernel/locking/mcs_spinlock.h           | 4 ++--
 kernel/locking/mutex.c                  | 4 ++--
 kernel/locking/osq_lock.c               | 6 +++---
 kernel/locking/qrwlock.c                | 6 +++---
 kernel/locking/rwsem-xadd.c             | 4 ++--
 lib/lockref.c                           | 2 +-
 8 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 8832f8e..383d134 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -723,7 +723,7 @@ bool __i915_spin_request(const struct drm_i915_gem_request *req,
 		if (busywait_stop(timeout_us, cpu))
 			break;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	} while (!need_resched());
 
 	return false;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc128a..5dc3465 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -342,7 +342,7 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 		endtime = busy_clock() + vq->busyloop_timeout;
 		while (vhost_can_busy_poll(vq->dev, endtime) &&
 		       vhost_vq_avail_empty(vq->dev, vq))
-			cpu_relax_lowlatency();
+			cpu_relax();
 		preempt_enable();
 		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 				      out_num, in_num, NULL, NULL);
@@ -533,7 +533,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 		while (vhost_can_busy_poll(&net->dev, endtime) &&
 		       !sk_has_rx_data(sk) &&
 		       vhost_vq_avail_empty(&net->dev, vq))
-			cpu_relax_lowlatency();
+			cpu_relax();
 
 		preempt_enable();
 
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index c835270..6a385aa 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -28,7 +28,7 @@ struct mcs_spinlock {
 #define arch_mcs_spin_lock_contended(l)					\
 do {									\
 	while (!(smp_load_acquire(l)))					\
-		cpu_relax_lowlatency();					\
+		cpu_relax();						\
 } while (0)
 #endif
 
@@ -108,7 +108,7 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
 			return;
 		/* Wait until the next pointer is set */
 		while (!(next = READ_ONCE(node->next)))
-			cpu_relax_lowlatency();
+			cpu_relax();
 	}
 
 	/* Pass lock to next waiter. */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..4463405 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -241,7 +241,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 			break;
 		}
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	rcu_read_unlock();
 
@@ -377,7 +377,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 		 * memory barriers as we'll eventually observe the right
 		 * values at the cost of a few extra spins.
 		 */
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 
 	osq_unlock(&lock->osq);
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..4ea2710 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 				break;
 		}
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 
 	return next;
@@ -122,7 +122,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 		if (need_resched())
 			goto unqueue;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	return true;
 
@@ -148,7 +148,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 		if (smp_load_acquire(&node->locked))
 			return true;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 
 		/*
 		 * Or we race against a concurrent unqueue()'s step-B, in which
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 19248dd..cc3ed0c 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -54,7 +54,7 @@ static __always_inline void
 rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
 {
 	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
-		cpu_relax_lowlatency();
+		cpu_relax();
 		cnts = atomic_read_acquire(&lock->cnts);
 	}
 }
@@ -130,7 +130,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 		   (cmpxchg_relaxed(&l->wmode, 0, _QW_WAITING) == 0))
 			break;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 
 	/* When no more readers, set the locked flag */
@@ -141,7 +141,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 					    _QW_LOCKED) == _QW_WAITING))
 			break;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 unlock:
 	arch_spin_unlock(&lock->wait_lock);
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2337b4b..2fa2e2e6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -368,7 +368,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 			return false;
 		}
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	rcu_read_unlock();
 out:
@@ -423,7 +423,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 * memory barriers as we'll eventually observe the right
 		 * values at the cost of a few extra spins.
 		 */
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	osq_unlock(&sem->osq);
 done:
diff --git a/lib/lockref.c b/lib/lockref.c
index 5a92189..c4bfcb8 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -20,7 +20,7 @@
 		if (likely(old.lock_count == prev.lock_count)) {		\
 			SUCCESS;						\
 		}								\
-		cpu_relax_lowlatency();						\
+		cpu_relax();							\
 	}									\
 } while (0)
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [GIT PULL v2 4/5] processor.h: Remove cpu_relax_lowlatency users
@ 2016-10-25  9:03   ` Christian Borntraeger
  0 siblings, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-10-25  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-s390, kvm, Will Deacon, x86, Heiko Carstens,
	linux-kernel, Nicholas Piggin, Russell King, sparclinux,
	Noam Camus, Catalin Marinas, Martin Schwidefsky, xen-devel,
	virtualization, linuxppc-dev, Ingo Molnar

With the s390 special case of a yielding cpu_relax implementation gone,
we can now remove all users of cpu_relax_lowlatency and replace them
with cpu_relax.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 2 +-
 drivers/vhost/net.c                     | 4 ++--
 kernel/locking/mcs_spinlock.h           | 4 ++--
 kernel/locking/mutex.c                  | 4 ++--
 kernel/locking/osq_lock.c               | 6 +++---
 kernel/locking/qrwlock.c                | 6 +++---
 kernel/locking/rwsem-xadd.c             | 4 ++--
 lib/lockref.c                           | 2 +-
 8 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 8832f8e..383d134 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -723,7 +723,7 @@ bool __i915_spin_request(const struct drm_i915_gem_request *req,
 		if (busywait_stop(timeout_us, cpu))
 			break;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	} while (!need_resched());
 
 	return false;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc128a..5dc3465 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -342,7 +342,7 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 		endtime = busy_clock() + vq->busyloop_timeout;
 		while (vhost_can_busy_poll(vq->dev, endtime) &&
 		       vhost_vq_avail_empty(vq->dev, vq))
-			cpu_relax_lowlatency();
+			cpu_relax();
 		preempt_enable();
 		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 				      out_num, in_num, NULL, NULL);
@@ -533,7 +533,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 		while (vhost_can_busy_poll(&net->dev, endtime) &&
 		       !sk_has_rx_data(sk) &&
 		       vhost_vq_avail_empty(&net->dev, vq))
-			cpu_relax_lowlatency();
+			cpu_relax();
 
 		preempt_enable();
 
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index c835270..6a385aa 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -28,7 +28,7 @@ struct mcs_spinlock {
 #define arch_mcs_spin_lock_contended(l)					\
 do {									\
 	while (!(smp_load_acquire(l)))					\
-		cpu_relax_lowlatency();					\
+		cpu_relax();						\
 } while (0)
 #endif
 
@@ -108,7 +108,7 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
 			return;
 		/* Wait until the next pointer is set */
 		while (!(next = READ_ONCE(node->next)))
-			cpu_relax_lowlatency();
+			cpu_relax();
 	}
 
 	/* Pass lock to next waiter. */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..4463405 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -241,7 +241,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 			break;
 		}
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	rcu_read_unlock();
 
@@ -377,7 +377,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 		 * memory barriers as we'll eventually observe the right
 		 * values at the cost of a few extra spins.
 		 */
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 
 	osq_unlock(&lock->osq);
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..4ea2710 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 				break;
 		}
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 
 	return next;
@@ -122,7 +122,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 		if (need_resched())
 			goto unqueue;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	return true;
 
@@ -148,7 +148,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 		if (smp_load_acquire(&node->locked))
 			return true;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 
 		/*
 		 * Or we race against a concurrent unqueue()'s step-B, in which
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 19248dd..cc3ed0c 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -54,7 +54,7 @@ static __always_inline void
 rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
 {
 	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
-		cpu_relax_lowlatency();
+		cpu_relax();
 		cnts = atomic_read_acquire(&lock->cnts);
 	}
 }
@@ -130,7 +130,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
  		   (cmpxchg_relaxed(&l->wmode, 0, _QW_WAITING) == 0))
 			break;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 
 	/* When no more readers, set the locked flag */
@@ -141,7 +141,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 					    _QW_LOCKED) == _QW_WAITING))
 			break;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 unlock:
 	arch_spin_unlock(&lock->wait_lock);
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2337b4b..2fa2e2e6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -368,7 +368,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 			return false;
 		}
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	rcu_read_unlock();
 out:
@@ -423,7 +423,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 * memory barriers as we'll eventually observe the right
 		 * values at the cost of a few extra spins.
 		 */
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	osq_unlock(&sem->osq);
 done:
diff --git a/lib/lockref.c b/lib/lockref.c
index 5a92189..c4bfcb8 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -20,7 +20,7 @@
 		if (likely(old.lock_count == prev.lock_count)) {		\
 			SUCCESS;						\
 		}								\
-		cpu_relax_lowlatency();						\
+		cpu_relax();							\
 	}									\
 } while (0)
 
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [GIT PULL v2 5/5] processor.h: remove cpu_relax_lowlatency
  2016-10-25  9:03 ` Christian Borntraeger
@ 2016-10-25  9:03   ` Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-10-25  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Nicholas Piggin, linux-kernel, linux-s390,
	linux-arch, linuxppc-dev, Heiko Carstens, Martin Schwidefsky,
	Noam Camus, sparclinux, x86, Will Deacon, Catalin Marinas,
	Russell King, virtualization, xen-devel, kvm,
	Christian Borntraeger

As there are no users left, we can remove cpu_relax_lowlatency.
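
Purely as a sketch of the resulting contract (hypothetical architecture,
not taken from this patch), a processor.h is left with just the two
remaining hints:

	/* hedged sketch: minimal set of spin hints after this series */
	#define cpu_relax()		barrier()
	#define cpu_relax_yield()	cpu_relax()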

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/alpha/include/asm/processor.h      | 1 -
 arch/arc/include/asm/processor.h        | 2 --
 arch/arm/include/asm/processor.h        | 1 -
 arch/arm64/include/asm/processor.h      | 1 -
 arch/avr32/include/asm/processor.h      | 1 -
 arch/blackfin/include/asm/processor.h   | 1 -
 arch/c6x/include/asm/processor.h        | 1 -
 arch/cris/include/asm/processor.h       | 1 -
 arch/frv/include/asm/processor.h        | 1 -
 arch/h8300/include/asm/processor.h      | 1 -
 arch/hexagon/include/asm/processor.h    | 1 -
 arch/ia64/include/asm/processor.h       | 1 -
 arch/m32r/include/asm/processor.h       | 1 -
 arch/m68k/include/asm/processor.h       | 1 -
 arch/metag/include/asm/processor.h      | 1 -
 arch/microblaze/include/asm/processor.h | 1 -
 arch/mips/include/asm/processor.h       | 1 -
 arch/mn10300/include/asm/processor.h    | 1 -
 arch/nios2/include/asm/processor.h      | 1 -
 arch/openrisc/include/asm/processor.h   | 1 -
 arch/parisc/include/asm/processor.h     | 1 -
 arch/powerpc/include/asm/processor.h    | 1 -
 arch/s390/include/asm/processor.h       | 1 -
 arch/score/include/asm/processor.h      | 1 -
 arch/sh/include/asm/processor.h         | 1 -
 arch/sparc/include/asm/processor_32.h   | 1 -
 arch/sparc/include/asm/processor_64.h   | 1 -
 arch/tile/include/asm/processor.h       | 1 -
 arch/unicore32/include/asm/processor.h  | 1 -
 arch/x86/include/asm/processor.h        | 1 -
 arch/x86/um/asm/processor.h             | 1 -
 arch/xtensa/include/asm/processor.h     | 1 -
 32 files changed, 33 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 0556fda..31e8dbe 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -59,7 +59,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
 #define ARCH_HAS_PREFETCHW
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 6c158d5..d102a49 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -61,7 +61,6 @@ struct task_struct;
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #else
 
@@ -69,7 +68,6 @@ struct task_struct;
 	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()	barrier()
 
 #endif
 
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index db660e0..9e71c58b 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -83,7 +83,6 @@ unsigned long get_wchan(struct task_struct *p);
 #endif
 
 #define cpu_relax_yield()  	              cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 3f9b0e5..6132f64 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -150,7 +150,6 @@ static inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield()                     cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 /* Thread switching */
 extern struct task_struct *cpu_switch_to(struct task_struct *prev,
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index e412e8b..ee62365 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -93,7 +93,6 @@ extern struct avr32_cpuinfo boot_cpu_data;
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 #define cpu_sync_pipeline()	asm volatile("sub pc, -2" : : : "memory")
 
 struct cpu_context {
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index 8b8704a..57acfb1 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -93,7 +93,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()    	smp_mb()
 #define cpu_relax_yield()      cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Get the Silicon Revision of the chip */
 static inline uint32_t __pure bfin_revid(void)
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index 914d730..1fd22e7 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -122,7 +122,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()		do { } while (0)
 #define cpu_relax_yield()             cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 
 extern const struct seq_operations cpuinfo_op;
 
diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h
index 01dd52e..1a57841 100644
--- a/arch/cris/include/asm/processor.h
+++ b/arch/cris/include/asm/processor.h
@@ -64,7 +64,6 @@ static inline void release_thread(struct task_struct *dead_task)
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
 
diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index 4d00d65..c1e5f2a 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -108,7 +108,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* data cache prefetch */
 #define ARCH_HAS_PREFETCH
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 683a061..42d6053 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -128,7 +128,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()    barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #define HARD_RESET_NOW() ({		\
 	local_irq_disable();		\
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index 1558ddb..5d694cc 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -57,7 +57,6 @@ struct thread_struct {
 
 #define cpu_relax() __vmyield()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * Decides where the kernel will search for a free chunk of vm space during
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 4654b71..0c2c3b2 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -548,7 +548,6 @@ ia64_eoi (void)
 
 #define cpu_relax()	ia64_hint(ia64_hint_pause)
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 static inline int
 ia64_get_irr(unsigned int vector)
diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index b262037..9b83a13 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -134,6 +134,5 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* _ASM_M32R_PROCESSOR_H */
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index 13e07ae..b0d0442 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -157,6 +157,5 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif
diff --git a/arch/metag/include/asm/processor.h b/arch/metag/include/asm/processor.h
index 61d6e27..ee302a6 100644
--- a/arch/metag/include/asm/processor.h
+++ b/arch/metag/include/asm/processor.h
@@ -153,7 +153,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()  cpu_relax()
 
 extern void setup_priv(void);
 
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index fd7dd11..08ec1f7 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -23,7 +23,6 @@ extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax()		barrier()
 # define cpu_relax_yield() cpu_relax()
-# define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(tsk) \
 		(((struct pt_regs *)(THREAD_SIZE + task_stack_page(tsk))) - 1)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 9a656f6..8ea95e7 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -390,7 +390,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * Return_address is a replacement for __builtin_return_address(count)
diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index 89f63d1..d11397b 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -70,7 +70,6 @@ extern void dodgy_tsc(void);
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * User space process size: 1.75GB (default).
diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h
index 303e593..d32c176 100644
--- a/arch/nios2/include/asm/processor.h
+++ b/arch/nios2/include/asm/processor.h
@@ -89,7 +89,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()  cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index 6ecfc2a..7f47fc7 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -93,7 +93,6 @@ extern unsigned long thread_saved_pc(struct task_struct *t);
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 #endif /* __ASM_OPENRISC_PROCESSOR_H */
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index ea2ff9f..a4a07f4 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -310,7 +310,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * parisc_requires_coherency() is used to identify the combined VIPT/PIPT
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 908fa7c..5684e68 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -405,7 +405,6 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 #endif
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
 int validate_sp(unsigned long sp, struct task_struct *p,
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 5d262cf..17c001a 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -237,7 +237,6 @@ static inline unsigned short stap(void)
 void cpu_relax_yield(void);
 
 #define cpu_relax() barrier()
-#define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0
 #define ECAG_CPU_ATTRIBUTE	1
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index e8e87b4..a1e97c0 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -25,7 +25,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 #define release_thread(thread)	do {} while (0)
 
 /*
diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h
index 099a991..9454ff1 100644
--- a/arch/sh/include/asm/processor.h
+++ b/arch/sh/include/asm/processor.h
@@ -98,7 +98,6 @@ extern struct sh_cpuinfo cpu_data[];
 #define cpu_sleep()	__asm__ __volatile__ ("sleep" : : : "memory")
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
 void stop_this_cpu(void *);
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 50e908a3c..fc32b73 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -120,7 +120,6 @@ int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 extern void (*sparc_idle)(void);
 
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index 3e8fac7..12787df 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -217,7 +217,6 @@ unsigned long get_wchan(struct task_struct *task);
 				     ".previous"			\
 				     ::: "memory")
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Prefetch support.  This is tuned for UltraSPARC-III and later.
  * UltraSPARC-I will treat these as nops, and UltraSPARC-II has
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 91a39a5..c1c228b 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -265,7 +265,6 @@ static inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Info on this processor (see fs/proc/cpuinfo.c) */
 struct seq_operations;
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index fc54d5d..eeefe7c 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -72,7 +72,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
 #define cpu_relax_yield()		cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 44adada..7513c99 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -589,7 +589,6 @@ static __always_inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Stop speculative execution and prefetching of modified code. */
 static inline void sync_core(void)
diff --git a/arch/x86/um/asm/processor.h b/arch/x86/um/asm/processor.h
index 01597af..b4bd63b 100644
--- a/arch/x86/um/asm/processor.h
+++ b/arch/x86/um/asm/processor.h
@@ -27,7 +27,6 @@ static inline void rep_nop(void)
 
 #define cpu_relax()		rep_nop()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(t) (&(t)->thread.regs)
 
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index fe14dc2..7d8d6be 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -207,7 +207,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()  barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Special register access. */
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [GIT PULL v2 5/5] processor.h: remove cpu_relax_lowlatency
  2016-10-25  9:03 ` Christian Borntraeger
                   ` (10 preceding siblings ...)
  (?)
@ 2016-10-25  9:03 ` Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-10-25  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-s390, kvm, Will Deacon, x86, Heiko Carstens,
	linux-kernel, Nicholas Piggin, Russell King, sparclinux,
	Noam Camus, Catalin Marinas, Martin Schwidefsky, xen-devel,
	virtualization, linuxppc-dev, Ingo Molnar

As there are no users left, we can remove cpu_relax_lowlatency.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/alpha/include/asm/processor.h      | 1 -
 arch/arc/include/asm/processor.h        | 2 --
 arch/arm/include/asm/processor.h        | 1 -
 arch/arm64/include/asm/processor.h      | 1 -
 arch/avr32/include/asm/processor.h      | 1 -
 arch/blackfin/include/asm/processor.h   | 1 -
 arch/c6x/include/asm/processor.h        | 1 -
 arch/cris/include/asm/processor.h       | 1 -
 arch/frv/include/asm/processor.h        | 1 -
 arch/h8300/include/asm/processor.h      | 1 -
 arch/hexagon/include/asm/processor.h    | 1 -
 arch/ia64/include/asm/processor.h       | 1 -
 arch/m32r/include/asm/processor.h       | 1 -
 arch/m68k/include/asm/processor.h       | 1 -
 arch/metag/include/asm/processor.h      | 1 -
 arch/microblaze/include/asm/processor.h | 1 -
 arch/mips/include/asm/processor.h       | 1 -
 arch/mn10300/include/asm/processor.h    | 1 -
 arch/nios2/include/asm/processor.h      | 1 -
 arch/openrisc/include/asm/processor.h   | 1 -
 arch/parisc/include/asm/processor.h     | 1 -
 arch/powerpc/include/asm/processor.h    | 1 -
 arch/s390/include/asm/processor.h       | 1 -
 arch/score/include/asm/processor.h      | 1 -
 arch/sh/include/asm/processor.h         | 1 -
 arch/sparc/include/asm/processor_32.h   | 1 -
 arch/sparc/include/asm/processor_64.h   | 1 -
 arch/tile/include/asm/processor.h       | 1 -
 arch/unicore32/include/asm/processor.h  | 1 -
 arch/x86/include/asm/processor.h        | 1 -
 arch/x86/um/asm/processor.h             | 1 -
 arch/xtensa/include/asm/processor.h     | 1 -
 32 files changed, 33 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 0556fda..31e8dbe 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -59,7 +59,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
 #define ARCH_HAS_PREFETCHW
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 6c158d5..d102a49 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -61,7 +61,6 @@ struct task_struct;
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #else
 
@@ -69,7 +68,6 @@ struct task_struct;
 	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()	barrier()
 
 #endif
 
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index db660e0..9e71c58b 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -83,7 +83,6 @@ unsigned long get_wchan(struct task_struct *p);
 #endif
 
 #define cpu_relax_yield()  	              cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 3f9b0e5..6132f64 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -150,7 +150,6 @@ static inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield()                     cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 /* Thread switching */
 extern struct task_struct *cpu_switch_to(struct task_struct *prev,
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index e412e8b..ee62365 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -93,7 +93,6 @@ extern struct avr32_cpuinfo boot_cpu_data;
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 #define cpu_sync_pipeline()	asm volatile("sub pc, -2" : : : "memory")
 
 struct cpu_context {
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index 8b8704a..57acfb1 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -93,7 +93,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()    	smp_mb()
 #define cpu_relax_yield()      cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Get the Silicon Revision of the chip */
 static inline uint32_t __pure bfin_revid(void)
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index 914d730..1fd22e7 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -122,7 +122,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()		do { } while (0)
 #define cpu_relax_yield()             cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 
 extern const struct seq_operations cpuinfo_op;
 
diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h
index 01dd52e..1a57841 100644
--- a/arch/cris/include/asm/processor.h
+++ b/arch/cris/include/asm/processor.h
@@ -64,7 +64,6 @@ static inline void release_thread(struct task_struct *dead_task)
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
 
diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index 4d00d65..c1e5f2a 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -108,7 +108,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* data cache prefetch */
 #define ARCH_HAS_PREFETCH
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 683a061..42d6053 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -128,7 +128,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()    barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #define HARD_RESET_NOW() ({		\
 	local_irq_disable();		\
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index 1558ddb..5d694cc 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -57,7 +57,6 @@ struct thread_struct {
 
 #define cpu_relax() __vmyield()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * Decides where the kernel will search for a free chunk of vm space during
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 4654b71..0c2c3b2 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -548,7 +548,6 @@ ia64_eoi (void)
 
 #define cpu_relax()	ia64_hint(ia64_hint_pause)
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 static inline int
 ia64_get_irr(unsigned int vector)
diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index b262037..9b83a13 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -134,6 +134,5 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* _ASM_M32R_PROCESSOR_H */
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index 13e07ae..b0d0442 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -157,6 +157,5 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif
diff --git a/arch/metag/include/asm/processor.h b/arch/metag/include/asm/processor.h
index 61d6e27..ee302a6 100644
--- a/arch/metag/include/asm/processor.h
+++ b/arch/metag/include/asm/processor.h
@@ -153,7 +153,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()  cpu_relax()
 
 extern void setup_priv(void);
 
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index fd7dd11..08ec1f7 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -23,7 +23,6 @@ extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax()		barrier()
 # define cpu_relax_yield() cpu_relax()
-# define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(tsk) \
 		(((struct pt_regs *)(THREAD_SIZE + task_stack_page(tsk))) - 1)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 9a656f6..8ea95e7 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -390,7 +390,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * Return_address is a replacement for __builtin_return_address(count)
diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index 89f63d1..d11397b 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -70,7 +70,6 @@ extern void dodgy_tsc(void);
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * User space process size: 1.75GB (default).
diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h
index 303e593..d32c176 100644
--- a/arch/nios2/include/asm/processor.h
+++ b/arch/nios2/include/asm/processor.h
@@ -89,7 +89,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()  cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index 6ecfc2a..7f47fc7 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -93,7 +93,6 @@ extern unsigned long thread_saved_pc(struct task_struct *t);
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 #endif /* __ASM_OPENRISC_PROCESSOR_H */
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index ea2ff9f..a4a07f4 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -310,7 +310,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * parisc_requires_coherency() is used to identify the combined VIPT/PIPT
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 908fa7c..5684e68 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -405,7 +405,6 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 #endif
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
 int validate_sp(unsigned long sp, struct task_struct *p,
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 5d262cf..17c001a 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -237,7 +237,6 @@ static inline unsigned short stap(void)
 void cpu_relax_yield(void);
 
 #define cpu_relax() barrier()
-#define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0
 #define ECAG_CPU_ATTRIBUTE	1
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index e8e87b4..a1e97c0 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -25,7 +25,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 #define release_thread(thread)	do {} while (0)
 
 /*
diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h
index 099a991..9454ff1 100644
--- a/arch/sh/include/asm/processor.h
+++ b/arch/sh/include/asm/processor.h
@@ -98,7 +98,6 @@ extern struct sh_cpuinfo cpu_data[];
 #define cpu_sleep()	__asm__ __volatile__ ("sleep" : : : "memory")
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
 void stop_this_cpu(void *);
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 50e908a3c..fc32b73 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -120,7 +120,6 @@ int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 extern void (*sparc_idle)(void);
 
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index 3e8fac7..12787df 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -217,7 +217,6 @@ unsigned long get_wchan(struct task_struct *task);
 				     ".previous"			\
 				     ::: "memory")
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Prefetch support.  This is tuned for UltraSPARC-III and later.
  * UltraSPARC-I will treat these as nops, and UltraSPARC-II has
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 91a39a5..c1c228b 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -265,7 +265,6 @@ static inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Info on this processor (see fs/proc/cpuinfo.c) */
 struct seq_operations;
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index fc54d5d..eeefe7c 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -72,7 +72,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
 #define cpu_relax_yield()		cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 44adada..7513c99 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -589,7 +589,6 @@ static __always_inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Stop speculative execution and prefetching of modified code. */
 static inline void sync_core(void)
diff --git a/arch/x86/um/asm/processor.h b/arch/x86/um/asm/processor.h
index 01597af..b4bd63b 100644
--- a/arch/x86/um/asm/processor.h
+++ b/arch/x86/um/asm/processor.h
@@ -27,7 +27,6 @@ static inline void rep_nop(void)
 
 #define cpu_relax()		rep_nop()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(t) (&(t)->thread.regs)
 
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index fe14dc2..7d8d6be 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -207,7 +207,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()  barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Special register access. */
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [GIT PULL v2 5/5] processor.h: remove cpu_relax_lowlatency
  2016-10-25  9:03 ` Christian Borntraeger
                   ` (11 preceding siblings ...)
  (?)
@ 2016-10-25  9:03 ` Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-10-25  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-arch, linux-s390, kvm, Christian Borntraeger, Will Deacon,
	x86, Heiko Carstens, linux-kernel, Nicholas Piggin, Russell King,
	sparclinux, Noam Camus, Catalin Marinas, Martin Schwidefsky,
	xen-devel, virtualization, linuxppc-dev, Ingo Molnar

As there are no users left, we can remove cpu_relax_lowlatency.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/alpha/include/asm/processor.h      | 1 -
 arch/arc/include/asm/processor.h        | 2 --
 arch/arm/include/asm/processor.h        | 1 -
 arch/arm64/include/asm/processor.h      | 1 -
 arch/avr32/include/asm/processor.h      | 1 -
 arch/blackfin/include/asm/processor.h   | 1 -
 arch/c6x/include/asm/processor.h        | 1 -
 arch/cris/include/asm/processor.h       | 1 -
 arch/frv/include/asm/processor.h        | 1 -
 arch/h8300/include/asm/processor.h      | 1 -
 arch/hexagon/include/asm/processor.h    | 1 -
 arch/ia64/include/asm/processor.h       | 1 -
 arch/m32r/include/asm/processor.h       | 1 -
 arch/m68k/include/asm/processor.h       | 1 -
 arch/metag/include/asm/processor.h      | 1 -
 arch/microblaze/include/asm/processor.h | 1 -
 arch/mips/include/asm/processor.h       | 1 -
 arch/mn10300/include/asm/processor.h    | 1 -
 arch/nios2/include/asm/processor.h      | 1 -
 arch/openrisc/include/asm/processor.h   | 1 -
 arch/parisc/include/asm/processor.h     | 1 -
 arch/powerpc/include/asm/processor.h    | 1 -
 arch/s390/include/asm/processor.h       | 1 -
 arch/score/include/asm/processor.h      | 1 -
 arch/sh/include/asm/processor.h         | 1 -
 arch/sparc/include/asm/processor_32.h   | 1 -
 arch/sparc/include/asm/processor_64.h   | 1 -
 arch/tile/include/asm/processor.h       | 1 -
 arch/unicore32/include/asm/processor.h  | 1 -
 arch/x86/include/asm/processor.h        | 1 -
 arch/x86/um/asm/processor.h             | 1 -
 arch/xtensa/include/asm/processor.h     | 1 -
 32 files changed, 33 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 0556fda..31e8dbe 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -59,7 +59,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
 #define ARCH_HAS_PREFETCHW
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 6c158d5..d102a49 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -61,7 +61,6 @@ struct task_struct;
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #else
 
@@ -69,7 +68,6 @@ struct task_struct;
 	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()	barrier()
 
 #endif
 
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index db660e0..9e71c58b 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -83,7 +83,6 @@ unsigned long get_wchan(struct task_struct *p);
 #endif
 
 #define cpu_relax_yield()  	              cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 3f9b0e5..6132f64 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -150,7 +150,6 @@ static inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield()                     cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 /* Thread switching */
 extern struct task_struct *cpu_switch_to(struct task_struct *prev,
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index e412e8b..ee62365 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -93,7 +93,6 @@ extern struct avr32_cpuinfo boot_cpu_data;
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 #define cpu_sync_pipeline()	asm volatile("sub pc, -2" : : : "memory")
 
 struct cpu_context {
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index 8b8704a..57acfb1 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -93,7 +93,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()    	smp_mb()
 #define cpu_relax_yield()      cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Get the Silicon Revision of the chip */
 static inline uint32_t __pure bfin_revid(void)
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index 914d730..1fd22e7 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -122,7 +122,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()		do { } while (0)
 #define cpu_relax_yield()             cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 
 extern const struct seq_operations cpuinfo_op;
 
diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h
index 01dd52e..1a57841 100644
--- a/arch/cris/include/asm/processor.h
+++ b/arch/cris/include/asm/processor.h
@@ -64,7 +64,6 @@ static inline void release_thread(struct task_struct *dead_task)
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
 
diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index 4d00d65..c1e5f2a 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -108,7 +108,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* data cache prefetch */
 #define ARCH_HAS_PREFETCH
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 683a061..42d6053 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -128,7 +128,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()    barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #define HARD_RESET_NOW() ({		\
 	local_irq_disable();		\
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index 1558ddb..5d694cc 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -57,7 +57,6 @@ struct thread_struct {
 
 #define cpu_relax() __vmyield()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * Decides where the kernel will search for a free chunk of vm space during
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 4654b71..0c2c3b2 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -548,7 +548,6 @@ ia64_eoi (void)
 
 #define cpu_relax()	ia64_hint(ia64_hint_pause)
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 static inline int
 ia64_get_irr(unsigned int vector)
diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index b262037..9b83a13 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -134,6 +134,5 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* _ASM_M32R_PROCESSOR_H */
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index 13e07ae..b0d0442 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -157,6 +157,5 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif
diff --git a/arch/metag/include/asm/processor.h b/arch/metag/include/asm/processor.h
index 61d6e27..ee302a6 100644
--- a/arch/metag/include/asm/processor.h
+++ b/arch/metag/include/asm/processor.h
@@ -153,7 +153,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()  cpu_relax()
 
 extern void setup_priv(void);
 
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index fd7dd11..08ec1f7 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -23,7 +23,6 @@ extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax()		barrier()
 # define cpu_relax_yield() cpu_relax()
-# define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(tsk) \
 		(((struct pt_regs *)(THREAD_SIZE + task_stack_page(tsk))) - 1)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 9a656f6..8ea95e7 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -390,7 +390,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * Return_address is a replacement for __builtin_return_address(count)
diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index 89f63d1..d11397b 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -70,7 +70,6 @@ extern void dodgy_tsc(void);
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * User space process size: 1.75GB (default).
diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h
index 303e593..d32c176 100644
--- a/arch/nios2/include/asm/processor.h
+++ b/arch/nios2/include/asm/processor.h
@@ -89,7 +89,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()  cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index 6ecfc2a..7f47fc7 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -93,7 +93,6 @@ extern unsigned long thread_saved_pc(struct task_struct *t);
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 #endif /* __ASM_OPENRISC_PROCESSOR_H */
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index ea2ff9f..a4a07f4 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -310,7 +310,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * parisc_requires_coherency() is used to identify the combined VIPT/PIPT
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 908fa7c..5684e68 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -405,7 +405,6 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 #endif
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
 int validate_sp(unsigned long sp, struct task_struct *p,
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 5d262cf..17c001a 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -237,7 +237,6 @@ static inline unsigned short stap(void)
 void cpu_relax_yield(void);
 
 #define cpu_relax() barrier()
-#define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0
 #define ECAG_CPU_ATTRIBUTE	1
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index e8e87b4..a1e97c0 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -25,7 +25,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 #define release_thread(thread)	do {} while (0)
 
 /*
diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h
index 099a991..9454ff1 100644
--- a/arch/sh/include/asm/processor.h
+++ b/arch/sh/include/asm/processor.h
@@ -98,7 +98,6 @@ extern struct sh_cpuinfo cpu_data[];
 #define cpu_sleep()	__asm__ __volatile__ ("sleep" : : : "memory")
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
 void stop_this_cpu(void *);
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 50e908a3c..fc32b73 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -120,7 +120,6 @@ int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 extern void (*sparc_idle)(void);
 
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index 3e8fac7..12787df 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -217,7 +217,6 @@ unsigned long get_wchan(struct task_struct *task);
 				     ".previous"			\
 				     ::: "memory")
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Prefetch support.  This is tuned for UltraSPARC-III and later.
  * UltraSPARC-I will treat these as nops, and UltraSPARC-II has
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 91a39a5..c1c228b 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -265,7 +265,6 @@ static inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Info on this processor (see fs/proc/cpuinfo.c) */
 struct seq_operations;
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index fc54d5d..eeefe7c 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -72,7 +72,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
 #define cpu_relax_yield()		cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 44adada..7513c99 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -589,7 +589,6 @@ static __always_inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Stop speculative execution and prefetching of modified code. */
 static inline void sync_core(void)
diff --git a/arch/x86/um/asm/processor.h b/arch/x86/um/asm/processor.h
index 01597af..b4bd63b 100644
--- a/arch/x86/um/asm/processor.h
+++ b/arch/x86/um/asm/processor.h
@@ -27,7 +27,6 @@ static inline void rep_nop(void)
 
 #define cpu_relax()		rep_nop()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(t) (&(t)->thread.regs)
 
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index fe14dc2..7d8d6be 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -207,7 +207,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()  barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Special register access. */
 
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 45+ messages in thread
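To make the removal concrete, here is a minimal before/after sketch of the kind
of busy-wait that patch 4/5 converts so that this patch can drop the definition.
The wait_for_flag helpers and the flag argument are invented for illustration
and are not hunks from the series; in kernel context the sketch relies on
READ_ONCE() from <linux/compiler.h> and on the cpu_relax()/cpu_relax_yield()
definitions shown in the diffs above.

/* Before the series: callers spelled the low-latency spin like this. */
static void wait_for_flag_old(int *flag)
{
	while (!READ_ONCE(*flag))
		cpu_relax_lowlatency();
}

/* After patches 4/5 and 5/5: plain cpu_relax() is the low-latency spin
 * on every architecture, so the extra helper is no longer needed. */
static void wait_for_flag_new(int *flag)
{
	while (!READ_ONCE(*flag))
		cpu_relax();
}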

* Re: [GIT PULL v2 0/5] cpu_relax: drop lowlatency, introduce yield
  2016-10-25  9:03 ` Christian Borntraeger
@ 2016-11-15 10:15   ` Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-11-15 10:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Nicholas Piggin, linux-kernel, linux-s390,
	linux-arch, linuxppc-dev, Heiko Carstens, Martin Schwidefsky,
	Noam Camus, sparclinux, x86, Will Deacon, Catalin Marinas,
	Russell King, virtualization, xen-devel, kvm

On 10/25/2016 11:03 AM, Christian Borntraeger wrote:
> Peter,
> 
> here is v2 with some improved patch descriptions and some fixes. The
> previous version has survived one day of linux-next and I only changed
> small parts.
> So unless there is some other issue, feel free to pull (or to apply
> the patches) to tip/locking.
> 
> The following changes since commit 07d9a380680d1c0eb51ef87ff2eab5c994949e69:
> 
>   Linux 4.9-rc2 (2016-10-23 17:10:14 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux.git  tags/cpurelax
> 
> for you to fetch changes up to dcc37f9044436438360402714b7544a8e8779b07:
> 
>   processor.h: remove cpu_relax_lowlatency (2016-10-25 09:49:57 +0200)

Ping.

Peter, you had these patches in your
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/
repository, but now the patches are gone. 
Any feedback?



> 
> ----------------------------------------------------------------
> cpu_relax: drop lowlatency, introduce yield
> 
> For spinning loops people do often use barrier() or cpu_relax().
> For most architectures cpu_relax and barrier are the same, but on
> some architectures cpu_relax can add some latency.
> For example on power,sparc64 and arc, cpu_relax can shift the CPU
> towards other hardware threads in an SMT environment.
> On s390 cpu_relax does even more, it uses an hypercall to the
> hypervisor to give up the timeslice.
> In contrast to the SMT yielding this can result in larger latencies.
> In some places this latency is unwanted, so another variant
> "cpu_relax_lowlatency" was introduced. Before this is used in more
> and more places, lets revert the logic and provide a cpu_relax_yield
> that can be called in places where yielding is more important than
> latency. By default this is the same as cpu_relax on all architectures.
> 
> So my proposal boils down to:
> - lowest latency: use barrier() or mb() if necessary
> - low latency: use cpu_relax (e.g. might give up some cpu for the other
>   _hardware_ threads)
> - really give up CPU: use  cpu_relax_yield
> 
> PS: In the long run I would also try to provide for s390 something
> like cpu_relax_yield_to with a cpu number (or just add that to
> cpu_relax_yield), since a yield_to is always better than a yield as
> long as we know the waiter.
> 
> ----------------------------------------------------------------
> Christian Borntraeger (5):
>       processor.h: introduce cpu_relax_yield
>       stop_machine: yield CPU during stop machine
>       s390: make cpu_relax a barrier again
>       processor.h: Remove cpu_relax_lowlatency users
>       processor.h: remove cpu_relax_lowlatency
> 
>  arch/alpha/include/asm/processor.h      | 2 +-
>  arch/arc/include/asm/processor.h        | 4 ++--
>  arch/arm/include/asm/processor.h        | 2 +-
>  arch/arm64/include/asm/processor.h      | 2 +-
>  arch/avr32/include/asm/processor.h      | 2 +-
>  arch/blackfin/include/asm/processor.h   | 2 +-
>  arch/c6x/include/asm/processor.h        | 2 +-
>  arch/cris/include/asm/processor.h       | 2 +-
>  arch/frv/include/asm/processor.h        | 2 +-
>  arch/h8300/include/asm/processor.h      | 2 +-
>  arch/hexagon/include/asm/processor.h    | 2 +-
>  arch/ia64/include/asm/processor.h       | 2 +-
>  arch/m32r/include/asm/processor.h       | 2 +-
>  arch/m68k/include/asm/processor.h       | 2 +-
>  arch/metag/include/asm/processor.h      | 2 +-
>  arch/microblaze/include/asm/processor.h | 2 +-
>  arch/mips/include/asm/processor.h       | 2 +-
>  arch/mn10300/include/asm/processor.h    | 2 +-
>  arch/nios2/include/asm/processor.h      | 2 +-
>  arch/openrisc/include/asm/processor.h   | 2 +-
>  arch/parisc/include/asm/processor.h     | 2 +-
>  arch/powerpc/include/asm/processor.h    | 2 +-
>  arch/s390/include/asm/processor.h       | 4 ++--
>  arch/s390/kernel/processor.c            | 4 ++--
>  arch/score/include/asm/processor.h      | 2 +-
>  arch/sh/include/asm/processor.h         | 2 +-
>  arch/sparc/include/asm/processor_32.h   | 2 +-
>  arch/sparc/include/asm/processor_64.h   | 2 +-
>  arch/tile/include/asm/processor.h       | 2 +-
>  arch/unicore32/include/asm/processor.h  | 2 +-
>  arch/x86/include/asm/processor.h        | 2 +-
>  arch/x86/um/asm/processor.h             | 2 +-
>  arch/xtensa/include/asm/processor.h     | 2 +-
>  drivers/gpu/drm/i915/i915_gem_request.c | 2 +-
>  drivers/vhost/net.c                     | 4 ++--
>  kernel/locking/mcs_spinlock.h           | 4 ++--
>  kernel/locking/mutex.c                  | 4 ++--
>  kernel/locking/osq_lock.c               | 6 +++---
>  kernel/locking/qrwlock.c                | 6 +++---
>  kernel/locking/rwsem-xadd.c             | 4 ++--
>  kernel/stop_machine.c                   | 2 +-
>  lib/lockref.c                           | 2 +-
>  42 files changed, 53 insertions(+), 53 deletions(-)
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread
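To make the three levels of the quoted proposal concrete, here is a sketch of
the same wait loop written at each level. The functions and the done flag are
invented for illustration; only barrier(), cpu_relax() and cpu_relax_yield()
come from the series (in kernel context via <linux/compiler.h> and
<asm/processor.h>).

/* Lowest latency: only keep the compiler from caching the load. */
static void wait_lowest_latency(volatile int *done)
{
	while (!*done)
		barrier();
}

/* Low latency: additionally hint the CPU, which on SMT systems may give
 * up some cycles to the other hardware threads of the core. */
static void wait_low_latency(volatile int *done)
{
	while (!*done)
		cpu_relax();
}

/* Latency does not matter: really give up the CPU, as the reworked
 * stop_machine() loop does in patch 2/5. */
static void wait_give_up_cpu(volatile int *done)
{
	while (!*done)
		cpu_relax_yield();
}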

* Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
  2016-10-25  9:03   ` Christian Borntraeger
  (?)
  (?)
@ 2016-11-15 12:30   ` Russell King - ARM Linux
  2016-11-15 13:19       ` Christian Borntraeger
  2016-11-15 13:19     ` Christian Borntraeger
  -1 siblings, 2 replies; 45+ messages in thread
From: Russell King - ARM Linux @ 2016-11-15 12:30 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Peter Zijlstra, Ingo Molnar, Nicholas Piggin, linux-kernel,
	linux-s390, linux-arch, linuxppc-dev, Heiko Carstens,
	Martin Schwidefsky, Noam Camus, sparclinux, x86, Will Deacon,
	Catalin Marinas, virtualization, xen-devel, kvm

On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
> For spinning loops people do often use barrier() or cpu_relax().
> For most architectures cpu_relax and barrier are the same, but on
> some architectures cpu_relax can add some latency.
> For example on power,sparc64 and arc, cpu_relax can shift the CPU
> towards other hardware threads in an SMT environment.
> On s390 cpu_relax does even more, it uses an hypercall to the
> hypervisor to give up the timeslice.
> In contrast to the SMT yielding this can result in larger latencies.
> In some places this latency is unwanted, so another variant
> "cpu_relax_lowlatency" was introduced. Before this is used in more
> and more places, lets revert the logic and provide a cpu_relax_yield
> that can be called in places where yielding is more important than
> latency. By default this is the same as cpu_relax on all architectures.

Rather than having to update all these architectures in this way, can't
we put in some linux/*.h header something like:

#ifndef cpu_relax_yield
#define cpu_relax_yield() cpu_relax()
#endif

so only those architectures that need to do something need to be
modified?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 45+ messages in thread
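For reference, a sketch of how the fallback Russell suggests could be wired up.
The header name linux/processor_relax.h and its guard are invented for
illustration; the s390 lines mirror the cpu_relax_yield() declaration visible
in the diff earlier in the thread, and the self-referencing #define is the
usual kernel idiom for making an architecture override visible to the #ifndef
test.

/* include/linux/processor_relax.h (hypothetical), included after the
 * architecture's <asm/processor.h>: */
#ifndef _LINUX_PROCESSOR_RELAX_H
#define _LINUX_PROCESSOR_RELAX_H

#include <asm/processor.h>

/* Default: yielding degenerates to a plain cpu_relax(). */
#ifndef cpu_relax_yield
#define cpu_relax_yield() cpu_relax()
#endif

#endif /* _LINUX_PROCESSOR_RELAX_H */

/* An architecture that wants real yielding, e.g. s390, would then only
 * need something like this in its own <asm/processor.h>: */
void cpu_relax_yield(void);
#define cpu_relax_yield cpu_relax_yield

With that in place only the architectures that actually override the default
would need to define anything, which is the simplification being asked for
above.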

* Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
  2016-11-15 12:30   ` Russell King - ARM Linux
  2016-11-15 13:19       ` Christian Borntraeger
@ 2016-11-15 13:19       ` Christian Borntraeger
  1 sibling, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-11-15 13:19 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Peter Zijlstra, Ingo Molnar, Nicholas Piggin, linux-kernel,
	linux-s390, linux-arch, linuxppc-dev, Heiko Carstens,
	Martin Schwidefsky, Noam Camus, sparclinux, x86, Will Deacon,
	Catalin Marinas, virtualization, xen-devel, kvm

On 11/15/2016 01:30 PM, Russell King - ARM Linux wrote:
> On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
>> For spinning loops people do often use barrier() or cpu_relax().
>> For most architectures cpu_relax and barrier are the same, but on
>> some architectures cpu_relax can add some latency.
>> For example on power,sparc64 and arc, cpu_relax can shift the CPU
>> towards other hardware threads in an SMT environment.
>> On s390 cpu_relax does even more, it uses an hypercall to the
>> hypervisor to give up the timeslice.
>> In contrast to the SMT yielding this can result in larger latencies.
>> In some places this latency is unwanted, so another variant
>> "cpu_relax_lowlatency" was introduced. Before this is used in more
>> and more places, lets revert the logic and provide a cpu_relax_yield
>> that can be called in places where yielding is more important than
>> latency. By default this is the same as cpu_relax on all architectures.
> 
> Rather than having to update all these architectures in this way, can't
> we put in some linux/*.h header something like:
> 
> #ifndef cpu_relax_yield
> #define cpu_relax_yield() cpu_relax()
> #endif
> 
> so only those architectures that need to do something need to be
> modified?

These patches have been part of linux-next for a month or so, so changing that
now would invalidate all the linux-next testing. If people want that, I can
certainly do it, though.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
  2016-11-15 13:19       ` Christian Borntraeger
  (?)
  (?)
@ 2016-11-15 13:37       ` Russell King - ARM Linux
  2016-11-15 13:52         ` Christian Borntraeger
  2016-11-15 13:52           ` Christian Borntraeger
  -1 siblings, 2 replies; 45+ messages in thread
From: Russell King - ARM Linux @ 2016-11-15 13:37 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Peter Zijlstra, Ingo Molnar, Nicholas Piggin, linux-kernel,
	linux-s390, linux-arch, linuxppc-dev, Heiko Carstens,
	Martin Schwidefsky, Noam Camus, sparclinux, x86, Will Deacon,
	Catalin Marinas, virtualization, xen-devel, kvm

On Tue, Nov 15, 2016 at 02:19:53PM +0100, Christian Borntraeger wrote:
> On 11/15/2016 01:30 PM, Russell King - ARM Linux wrote:
> > On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
> >> For spinning loops people often use barrier() or cpu_relax().
> >> For most architectures cpu_relax and barrier are the same, but on
> >> some architectures cpu_relax can add some latency.
> >> For example on power, sparc64 and arc, cpu_relax can shift the CPU
> >> towards other hardware threads in an SMT environment.
> >> On s390 cpu_relax does even more: it uses a hypercall to the
> >> hypervisor to give up the timeslice.
> >> In contrast to the SMT yielding this can result in larger latencies.
> >> In some places this latency is unwanted, so another variant
> >> "cpu_relax_lowlatency" was introduced. Before this spreads to more
> >> and more places, let's reverse the logic and provide a cpu_relax_yield
> >> that can be called in places where yielding is more important than
> >> latency. By default this is the same as cpu_relax on all architectures.
> > 
> > Rather than having to update all these architectures in this way, can't
> > we put in some linux/*.h header something like:
> > 
> > #ifndef cpu_relax_yield
> > #define cpu_relax_yield() cpu_relax()
> > #endif
> > 
> > so only those architectures that need to do something need to be
> > modified?
> 
> These patches have been part of linux-next for a month or so; changing that
> now would invalidate all the -next testing. If people want that, I can
> certainly do it, though.

It's three weeks since you posted them.  For one of those weeks (the
week you posted them) I was away, and missed them while catching up.
Sorry, but it sometimes takes a while to spot things amongst the
backlog, and normally takes some subsequent activity on the thread to
bring it back into view.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
  2016-11-15 13:37       ` Russell King - ARM Linux
  2016-11-15 13:52         ` Christian Borntraeger
@ 2016-11-15 13:52           ` Christian Borntraeger
  1 sibling, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2016-11-15 13:52 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Peter Zijlstra, Ingo Molnar, Nicholas Piggin, linux-kernel,
	linux-s390, linux-arch, linuxppc-dev, Heiko Carstens,
	Martin Schwidefsky, Noam Camus, sparclinux, x86, Will Deacon,
	Catalin Marinas, virtualization, xen-devel, kvm

On 11/15/2016 02:37 PM, Russell King - ARM Linux wrote:
> On Tue, Nov 15, 2016 at 02:19:53PM +0100, Christian Borntraeger wrote:
>> On 11/15/2016 01:30 PM, Russell King - ARM Linux wrote:
>>> On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:
>>>> For spinning loops people often use barrier() or cpu_relax().
>>>> For most architectures cpu_relax and barrier are the same, but on
>>>> some architectures cpu_relax can add some latency.
>>>> For example on power, sparc64 and arc, cpu_relax can shift the CPU
>>>> towards other hardware threads in an SMT environment.
>>>> On s390 cpu_relax does even more: it uses a hypercall to the
>>>> hypervisor to give up the timeslice.
>>>> In contrast to the SMT yielding this can result in larger latencies.
>>>> In some places this latency is unwanted, so another variant
>>>> "cpu_relax_lowlatency" was introduced. Before this spreads to more
>>>> and more places, let's reverse the logic and provide a cpu_relax_yield
>>>> that can be called in places where yielding is more important than
>>>> latency. By default this is the same as cpu_relax on all architectures.
>>>
>>> Rather than having to update all these architectures in this way, can't
>>> we put in some linux/*.h header something like:
>>>
>>> #ifndef cpu_relax_yield
>>> #define cpu_relax_yield() cpu_relax()
>>> #endif
>>>
>>> so only those architectures that need to do something need to be
>>> modified?
>>
>> These patches have been part of linux-next for a month or so; changing that
>> now would invalidate all the -next testing. If people want that, I can
>> certainly do it, though.
> 
> It's three weeks since you posted them.  For one of those weeks (the
> week you posted them) I was away, and missed them while catching up.
> Sorry, but it sometimes takes a while to spot things amongst the
> backlog, and normally takes some subsequent activity on the thread to
> bring it back into view.

Absolutely no need to apologize. Thank you for the review and the proposal.
I will do whatever the consensus is, but since this looks like tip/locking
material I will wait for Peter or Ingo to decide whether and how to proceed.

Christian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [tip:locking/core] locking/core: Introduce cpu_relax_yield()
  2016-10-25  9:03   ` Christian Borntraeger
                     ` (3 preceding siblings ...)
  (?)
@ 2016-11-16 12:08   ` tip-bot for Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: tip-bot for Christian Borntraeger @ 2016-11-16 12:08 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: borntraeger, schwidefsky, linux-kernel, will.deacon, npiggin,
	tglx, heiko.carstens, hpa, linux, mingo, noamc, peterz,
	catalin.marinas, torvalds

Commit-ID:  79ab11cdb90d8536817ab7357ecb6b1ff76be26c
Gitweb:     http://git.kernel.org/tip/79ab11cdb90d8536817ab7357ecb6b1ff76be26c
Author:     Christian Borntraeger <borntraeger@de.ibm.com>
AuthorDate: Tue, 25 Oct 2016 11:03:11 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 16 Nov 2016 10:15:09 +0100

locking/core: Introduce cpu_relax_yield()

For spinning loops people often use barrier() or cpu_relax().
For most architectures cpu_relax and barrier are the same, but on
some architectures cpu_relax can add some latency.
For example on power, sparc64 and arc, cpu_relax can shift the CPU
towards other hardware threads in an SMT environment.
On s390 cpu_relax does even more: it uses a hypercall to the
hypervisor to give up the timeslice.
In contrast to the SMT yielding this can result in larger latencies.
In some places this latency is unwanted, so another variant
"cpu_relax_lowlatency" was introduced. Before this spreads to more
and more places, let's reverse the logic and provide a cpu_relax_yield
that can be called in places where yielding is more important than
latency. By default this is the same as cpu_relax on all architectures.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-2-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/alpha/include/asm/processor.h      | 1 +
 arch/arc/include/asm/processor.h        | 2 ++
 arch/arm/include/asm/processor.h        | 1 +
 arch/arm64/include/asm/processor.h      | 1 +
 arch/avr32/include/asm/processor.h      | 1 +
 arch/blackfin/include/asm/processor.h   | 1 +
 arch/c6x/include/asm/processor.h        | 1 +
 arch/cris/include/asm/processor.h       | 1 +
 arch/frv/include/asm/processor.h        | 1 +
 arch/h8300/include/asm/processor.h      | 1 +
 arch/hexagon/include/asm/processor.h    | 1 +
 arch/ia64/include/asm/processor.h       | 1 +
 arch/m32r/include/asm/processor.h       | 1 +
 arch/m68k/include/asm/processor.h       | 1 +
 arch/metag/include/asm/processor.h      | 1 +
 arch/microblaze/include/asm/processor.h | 1 +
 arch/mips/include/asm/processor.h       | 1 +
 arch/mn10300/include/asm/processor.h    | 1 +
 arch/nios2/include/asm/processor.h      | 1 +
 arch/openrisc/include/asm/processor.h   | 1 +
 arch/parisc/include/asm/processor.h     | 1 +
 arch/powerpc/include/asm/processor.h    | 1 +
 arch/s390/include/asm/processor.h       | 3 ++-
 arch/s390/kernel/processor.c            | 4 ++--
 arch/score/include/asm/processor.h      | 1 +
 arch/sh/include/asm/processor.h         | 1 +
 arch/sparc/include/asm/processor_32.h   | 1 +
 arch/sparc/include/asm/processor_64.h   | 1 +
 arch/tile/include/asm/processor.h       | 1 +
 arch/unicore32/include/asm/processor.h  | 1 +
 arch/x86/include/asm/processor.h        | 1 +
 arch/x86/um/asm/processor.h             | 1 +
 arch/xtensa/include/asm/processor.h     | 1 +
 33 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 43a7559..0556fda 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -58,6 +58,7 @@ unsigned long get_wchan(struct task_struct *p);
   ((tsk) == current ? rdusp() : task_thread_info(tsk)->pcb.usp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 16b630f..6c158d5 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -60,6 +60,7 @@ struct task_struct;
 #ifndef CONFIG_EZNPS_MTM_EXT
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #else
@@ -67,6 +68,7 @@ struct task_struct;
 #define cpu_relax()     \
 	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	barrier()
 
 #endif
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 8a1e8e9..db660e0 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -82,6 +82,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define cpu_relax()			barrier()
 #endif
 
+#define cpu_relax_yield()  	              cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 60e3482..3f9b0e5 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -149,6 +149,7 @@ static inline void cpu_relax(void)
 	asm volatile("yield" ::: "memory");
 }
 
+#define cpu_relax_yield()                     cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 /* Thread switching */
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index 941593c..e412e8b 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -92,6 +92,7 @@ extern struct avr32_cpuinfo boot_cpu_data;
 #define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 3))
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 #define cpu_sync_pipeline()	asm volatile("sub pc, -2" : : : "memory")
 
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index 0c265ab..8b8704a 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -92,6 +92,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk) == current ? rdusp() : (tsk)->thread.usp)
 
 #define cpu_relax()    	smp_mb()
+#define cpu_relax_yield()      cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Get the Silicon Revision of the chip */
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index f2ef31b..914d730 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -121,6 +121,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(task)	(task_pt_regs(task)->sp)
 
 #define cpu_relax()		do { } while (0)
+#define cpu_relax_yield()             cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 
 extern const struct seq_operations cpuinfo_op;
diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h
index 862126b..01dd52e 100644
--- a/arch/cris/include/asm/processor.h
+++ b/arch/cris/include/asm/processor.h
@@ -63,6 +63,7 @@ static inline void release_thread(struct task_struct *dead_task)
 #define init_stack      (init_thread_union.stack)
 
 #define cpu_relax()     barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index 73f0a79..4d00d65 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -107,6 +107,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk)->thread.frame0->sp)
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* data cache prefetch */
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 111df73..683a061 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -127,6 +127,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk) == current ? rdusp() : (tsk)->thread.usp)
 
 #define cpu_relax()    barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #define HARD_RESET_NOW() ({		\
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index d850113..1558ddb 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -56,6 +56,7 @@ struct thread_struct {
 }
 
 #define cpu_relax() __vmyield()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index ce53c50..4654b71 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -547,6 +547,7 @@ ia64_eoi (void)
 }
 
 #define cpu_relax()	ia64_hint(ia64_hint_pause)
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 static inline int
diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index 9f8fd9b..b262037 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -133,6 +133,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)  ((tsk)->thread.sp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* _ASM_M32R_PROCESSOR_H */
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index c84a218..13e07ae 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -156,6 +156,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define task_pt_regs(tsk)	((struct pt_regs *) ((tsk)->thread.esp0))
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #endif
diff --git a/arch/metag/include/asm/processor.h b/arch/metag/include/asm/processor.h
index a0333eb..61d6e27 100644
--- a/arch/metag/include/asm/processor.h
+++ b/arch/metag/include/asm/processor.h
@@ -152,6 +152,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define user_stack_pointer(regs)        ((regs)->ctx.AX[0].U0)
 
 #define cpu_relax()     barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()  cpu_relax()
 
 extern void setup_priv(void);
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index c38d0dd..fd7dd11 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -22,6 +22,7 @@
 extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax()		barrier()
+# define cpu_relax_yield() cpu_relax()
 # define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(tsk) \
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 0d36c87..9a656f6 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -389,6 +389,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index b10ba12..89f63d1 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -69,6 +69,7 @@ extern void print_cpu_info(struct mn10300_cpuinfo *);
 extern void dodgy_tsc(void);
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h
index 1c953f0..303e593 100644
--- a/arch/nios2/include/asm/processor.h
+++ b/arch/nios2/include/asm/processor.h
@@ -88,6 +88,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)	((tsk)->thread.kregs->sp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()  cpu_relax()
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index 70334c9..6ecfc2a 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -92,6 +92,7 @@ extern unsigned long thread_saved_pc(struct task_struct *t);
 #define init_stack      (init_thread_union.stack)
 
 #define cpu_relax()     barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index 2e674e1..ea2ff9f 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -309,6 +309,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)	((tsk)->thread.regs.gr[30])
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index c07c31b..908fa7c 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -404,6 +404,7 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 #define cpu_relax()	barrier()
 #endif
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 602af69..5bb4433 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -234,8 +234,9 @@ static inline unsigned short stap(void)
 /*
  * Give up the time slice of the virtual PU.
  */
-void cpu_relax(void);
+void cpu_relax_yield(void);
 
+#define cpu_relax() cpu_relax_yield()
 #define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index 81d0808..9e60ef1 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -53,7 +53,7 @@ void s390_update_cpu_mhz(void)
 		on_each_cpu(update_cpu_mhz, NULL, 0);
 }
 
-void notrace cpu_relax(void)
+void notrace cpu_relax_yield(void)
 {
 	if (!smp_cpu_mtid && MACHINE_HAS_DIAG44) {
 		diag_stat_inc(DIAG_STAT_X044);
@@ -61,7 +61,7 @@ void notrace cpu_relax(void)
 	}
 	barrier();
 }
-EXPORT_SYMBOL(cpu_relax);
+EXPORT_SYMBOL(cpu_relax_yield);
 
 /*
  * cpu_init - initializes state that is per-CPU.
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index 851f441..e8e87b4 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -24,6 +24,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define current_text_addr() ({ __label__ _l; _l: &&_l; })
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 #define release_thread(thread)	do {} while (0)
 
diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h
index f9a0994..099a991 100644
--- a/arch/sh/include/asm/processor.h
+++ b/arch/sh/include/asm/processor.h
@@ -97,6 +97,7 @@ extern struct sh_cpuinfo cpu_data[];
 
 #define cpu_sleep()	__asm__ __volatile__ ("sleep" : : : "memory")
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 812fd08..50e908a3c 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -119,6 +119,7 @@ extern struct task_struct *last_task_used_math;
 int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 extern void (*sparc_idle)(void);
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index ce2595c..3e8fac7 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -216,6 +216,7 @@ unsigned long get_wchan(struct task_struct *task);
 				     "nop\n\t"				\
 				     ".previous"			\
 				     ::: "memory")
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Prefetch support.  This is tuned for UltraSPARC-III and later.
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 0684e88..91a39a5 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -264,6 +264,7 @@ static inline void cpu_relax(void)
 	barrier();
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Info on this processor (see fs/proc/cpuinfo.c) */
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index 8d21b7a..fc54d5d 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -71,6 +71,7 @@ extern void release_thread(struct task_struct *);
 unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
+#define cpu_relax_yield()		cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..44adada 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -588,6 +588,7 @@ static __always_inline void cpu_relax(void)
 	rep_nop();
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Stop speculative execution and prefetching of modified code. */
diff --git a/arch/x86/um/asm/processor.h b/arch/x86/um/asm/processor.h
index 233ee09..01597af 100644
--- a/arch/x86/um/asm/processor.h
+++ b/arch/x86/um/asm/processor.h
@@ -26,6 +26,7 @@ static inline void rep_nop(void)
 }
 
 #define cpu_relax()		rep_nop()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(t) (&(t)->thread.regs)
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index b42d68b..fe14dc2 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -206,6 +206,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)		(task_pt_regs(tsk)->areg[1])
 
 #define cpu_relax()  barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Special register access. */

^ permalink raw reply related	[flat|nested] 45+ messages in thread
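
A short usage sketch of the resulting API. This is not taken from the
series; the two helpers below are illustrative only and assume
<linux/compiler.h> for READ_ONCE() and <asm/processor.h> for the
cpu_relax*() definitions:

#include <linux/compiler.h>	/* READ_ONCE() */
#include <asm/processor.h>	/* cpu_relax(), cpu_relax_yield() */

/* Latency-critical spin: stay on the CPU, just relax the pipeline. */
static void wait_for_flag(int *flag)
{
	while (!READ_ONCE(*flag))
		cpu_relax();
}

/*
 * Potentially long wait for work done by another (possibly preempted)
 * CPU: give up the timeslice where the architecture supports it.
 */
static void wait_for_other_cpu(int *done)
{
	while (!READ_ONCE(*done))
		cpu_relax_yield();
}

After this patch the two loops are identical on every architecture except
s390, where cpu_relax_yield() issues the diag 0x44 yield shown in
arch/s390/kernel/processor.c above.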

* [tip:locking/core] locking/core, stop_machine: Yield the CPU during stop machine()
  2016-10-25  9:03   ` Christian Borntraeger
  (?)
  (?)
@ 2016-11-16 12:09   ` tip-bot for Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: tip-bot for Christian Borntraeger @ 2016-11-16 12:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux, npiggin, mingo, linux-kernel, schwidefsky, torvalds,
	heiko.carstens, tglx, borntraeger, will.deacon, catalin.marinas,
	noamc, hpa, peterz

Commit-ID:  bf0d31c05411f030e7df9f98b5be4d0c6fd5cd39
Gitweb:     http://git.kernel.org/tip/bf0d31c05411f030e7df9f98b5be4d0c6fd5cd39
Author:     Christian Borntraeger <borntraeger@de.ibm.com>
AuthorDate: Tue, 25 Oct 2016 11:03:12 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 16 Nov 2016 10:15:09 +0100

locking/core, stop_machine: Yield the CPU during stop machine()

Some time ago the following commit:

  57f2ffe14fd125c2 ("s390: remove diag 44 calls from cpu_relax()")

... stopped cpu_relax() on s390 yielding to the hypervisor.

As it turns out, this made stop_machine() run really slowly on virtualized,
overcommitted systems. For example, with large guests the kprobes test during
bootup took several seconds instead of just running unnoticed.

Therefore, yielding was reintroduced with commit:

  4d92f50249eb ("s390: reintroduce diag 44 calls for cpu_relax()")

... but in fact the stop machine code seems to be the only place where
this yielding was really necessary. This place is probably the most
important one as it makes all but one guest CPUs wait for one guest CPU.

As we now have cpu_relax_yield(), we can use this in multi_cpu_stop().
For now let's only add it here. We can add it later in other places
when necessary.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-3-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/stop_machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index ec9ab2f..1eb8266 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -194,7 +194,7 @@ static int multi_cpu_stop(void *data)
 	/* Simple state machine */
 	do {
 		/* Chill out and ensure we re-read multi_stop_state. */
-		cpu_relax();
+		cpu_relax_yield();
 		if (msdata->state != curstate) {
 			curstate = msdata->state;
 			switch (curstate) {

^ permalink raw reply related	[flat|nested] 45+ messages in thread
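
For context, a rough sketch of the scenario the commit message describes.
This is not the actual multi_cpu_stop() loop; it only illustrates why this
single call site matters: on an overcommitted hypervisor every spinning
vCPU would otherwise burn its whole timeslice while waiting for the CPU
that drives the state machine.

/*
 * Illustrative only -- the names loosely mirror kernel/stop_machine.c.
 * All CPUs taking part in stop_machine() spin here until the controlling
 * CPU advances msdata->state.
 */
static void wait_for_next_state(struct multi_stop_data *msdata,
				enum multi_stop_state curstate)
{
	while (READ_ONCE(msdata->state) == curstate)
		cpu_relax_yield();	/* on s390: diag 0x44 to the hypervisor */
}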

* [tip:locking/core] locking/core, s390: Make cpu_relax() a barrier again
  2016-10-25  9:03   ` Christian Borntraeger
  (?)
  (?)
@ 2016-11-16 12:09   ` tip-bot for Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: tip-bot for Christian Borntraeger @ 2016-11-16 12:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, npiggin, will.deacon, torvalds, heiko.carstens,
	schwidefsky, peterz, borntraeger, catalin.marinas, mingo, linux,
	tglx, noamc, hpa

Commit-ID:  22b6430d36659b37ed139b7fd87fcc7237fb0cfd
Gitweb:     http://git.kernel.org/tip/22b6430d36659b37ed139b7fd87fcc7237fb0cfd
Author:     Christian Borntraeger <borntraeger@de.ibm.com>
AuthorDate: Tue, 25 Oct 2016 11:03:13 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 16 Nov 2016 10:15:10 +0100

locking/core, s390: Make cpu_relax() a barrier again

stop_machine() seemed to be the only important place that relied on
cpu_relax() yielding. That case is now handled by cpu_relax_yield().

Therefore, we can now redefine cpu_relax() on s390 to be a plain barrier,
making s390 identical to all other architectures.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-4-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/s390/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 5bb4433..79343e3 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -236,7 +236,7 @@ static inline unsigned short stap(void)
  */
 void cpu_relax_yield(void);
 
-#define cpu_relax() cpu_relax_yield()
+#define cpu_relax() barrier()
 #define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [tip:locking/core] locking/core: Remove cpu_relax_lowlatency() users
  2016-10-25  9:03   ` Christian Borntraeger
  (?)
  (?)
@ 2016-11-16 12:10   ` tip-bot for Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: tip-bot for Christian Borntraeger @ 2016-11-16 12:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, catalin.marinas, linux-kernel, noamc, tglx, borntraeger,
	heiko.carstens, schwidefsky, linux, peterz, torvalds, npiggin,
	hpa, will.deacon

Commit-ID:  f2f09a4cee3507dba0e24b87ba2961a5c377d3a7
Gitweb:     http://git.kernel.org/tip/f2f09a4cee3507dba0e24b87ba2961a5c377d3a7
Author:     Christian Borntraeger <borntraeger@de.ibm.com>
AuthorDate: Tue, 25 Oct 2016 11:03:14 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 16 Nov 2016 10:15:10 +0100

locking/core: Remove cpu_relax_lowlatency() users

With the s390 special case of a yielding cpu_relax() implementation gone,
we can now remove all users of cpu_relax_lowlatency() and replace them
with cpu_relax().

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-5-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 2 +-
 drivers/vhost/net.c                     | 4 ++--
 kernel/locking/mcs_spinlock.h           | 4 ++--
 kernel/locking/mutex.c                  | 4 ++--
 kernel/locking/osq_lock.c               | 6 +++---
 kernel/locking/qrwlock.c                | 6 +++---
 kernel/locking/rwsem-xadd.c             | 4 ++--
 lib/lockref.c                           | 2 +-
 8 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 8832f8e..383d134 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -723,7 +723,7 @@ bool __i915_spin_request(const struct drm_i915_gem_request *req,
 		if (busywait_stop(timeout_us, cpu))
 			break;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	} while (!need_resched());
 
 	return false;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc128a..5dc3465 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -342,7 +342,7 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 		endtime = busy_clock() + vq->busyloop_timeout;
 		while (vhost_can_busy_poll(vq->dev, endtime) &&
 		       vhost_vq_avail_empty(vq->dev, vq))
-			cpu_relax_lowlatency();
+			cpu_relax();
 		preempt_enable();
 		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 				      out_num, in_num, NULL, NULL);
@@ -533,7 +533,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 		while (vhost_can_busy_poll(&net->dev, endtime) &&
 		       !sk_has_rx_data(sk) &&
 		       vhost_vq_avail_empty(&net->dev, vq))
-			cpu_relax_lowlatency();
+			cpu_relax();
 
 		preempt_enable();
 
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index c835270..6a385aa 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -28,7 +28,7 @@ struct mcs_spinlock {
 #define arch_mcs_spin_lock_contended(l)					\
 do {									\
 	while (!(smp_load_acquire(l)))					\
-		cpu_relax_lowlatency();					\
+		cpu_relax();						\
 } while (0)
 #endif
 
@@ -108,7 +108,7 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
 			return;
 		/* Wait until the next pointer is set */
 		while (!(next = READ_ONCE(node->next)))
-			cpu_relax_lowlatency();
+			cpu_relax();
 	}
 
 	/* Pass lock to next waiter. */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 17a88e9..a65e09a 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -369,7 +369,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 			break;
 		}
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	rcu_read_unlock();
 
@@ -492,7 +492,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 		 * memory barriers as we'll eventually observe the right
 		 * values at the cost of a few extra spins.
 		 */
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 
 	if (!waiter)
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..4ea2710 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 				break;
 		}
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 
 	return next;
@@ -122,7 +122,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 		if (need_resched())
 			goto unqueue;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	return true;
 
@@ -148,7 +148,7 @@ unqueue:
 		if (smp_load_acquire(&node->locked))
 			return true;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 
 		/*
 		 * Or we race against a concurrent unqueue()'s step-B, in which
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 19248dd..cc3ed0c 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -54,7 +54,7 @@ static __always_inline void
 rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
 {
 	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
-		cpu_relax_lowlatency();
+		cpu_relax();
 		cnts = atomic_read_acquire(&lock->cnts);
 	}
 }
@@ -130,7 +130,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 		   (cmpxchg_relaxed(&l->wmode, 0, _QW_WAITING) == 0))
 			break;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 
 	/* When no more readers, set the locked flag */
@@ -141,7 +141,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 					    _QW_LOCKED) == _QW_WAITING))
 			break;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 unlock:
 	arch_spin_unlock(&lock->wait_lock);
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2337b4b..2fa2e2e6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -368,7 +368,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 			return false;
 		}
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	rcu_read_unlock();
 out:
@@ -423,7 +423,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 * memory barriers as we'll eventually observe the right
 		 * values at the cost of a few extra spins.
 		 */
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	osq_unlock(&sem->osq);
 done:
diff --git a/lib/lockref.c b/lib/lockref.c
index 5a92189..c4bfcb8 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -20,7 +20,7 @@
 		if (likely(old.lock_count == prev.lock_count)) {		\
 			SUCCESS;						\
 		}								\
-		cpu_relax_lowlatency();						\
+		cpu_relax();							\
 	}									\
 } while (0)
 

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [tip:locking/core] locking/core, arch: Remove cpu_relax_lowlatency()
  2016-10-25  9:03   ` Christian Borntraeger
  (?)
@ 2016-11-16 12:11   ` tip-bot for Christian Borntraeger
  -1 siblings, 0 replies; 45+ messages in thread
From: tip-bot for Christian Borntraeger @ 2016-11-16 12:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, torvalds, schwidefsky, mingo, linux, hpa,
	will.deacon, heiko.carstens, catalin.marinas, linux-arch, tglx,
	borntraeger, npiggin, noamc, peterz

Commit-ID:  5bd0b85ba8bb9de6f61f33f3752fc85f4c87fc22
Gitweb:     http://git.kernel.org/tip/5bd0b85ba8bb9de6f61f33f3752fc85f4c87fc22
Author:     Christian Borntraeger <borntraeger@de.ibm.com>
AuthorDate: Tue, 25 Oct 2016 11:03:15 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 16 Nov 2016 10:15:11 +0100

locking/core, arch: Remove cpu_relax_lowlatency()

As there are no users left, we can remove cpu_relax_lowlatency()
implementations from every architecture.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Cc: <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1477386195-32736-6-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/alpha/include/asm/processor.h      | 1 -
 arch/arc/include/asm/processor.h        | 2 --
 arch/arm/include/asm/processor.h        | 1 -
 arch/arm64/include/asm/processor.h      | 1 -
 arch/avr32/include/asm/processor.h      | 1 -
 arch/blackfin/include/asm/processor.h   | 1 -
 arch/c6x/include/asm/processor.h        | 1 -
 arch/cris/include/asm/processor.h       | 1 -
 arch/frv/include/asm/processor.h        | 1 -
 arch/h8300/include/asm/processor.h      | 1 -
 arch/hexagon/include/asm/processor.h    | 1 -
 arch/ia64/include/asm/processor.h       | 1 -
 arch/m32r/include/asm/processor.h       | 1 -
 arch/m68k/include/asm/processor.h       | 1 -
 arch/metag/include/asm/processor.h      | 1 -
 arch/microblaze/include/asm/processor.h | 1 -
 arch/mips/include/asm/processor.h       | 1 -
 arch/mn10300/include/asm/processor.h    | 1 -
 arch/nios2/include/asm/processor.h      | 1 -
 arch/openrisc/include/asm/processor.h   | 1 -
 arch/parisc/include/asm/processor.h     | 1 -
 arch/powerpc/include/asm/processor.h    | 1 -
 arch/s390/include/asm/processor.h       | 1 -
 arch/score/include/asm/processor.h      | 1 -
 arch/sh/include/asm/processor.h         | 1 -
 arch/sparc/include/asm/processor_32.h   | 1 -
 arch/sparc/include/asm/processor_64.h   | 1 -
 arch/tile/include/asm/processor.h       | 1 -
 arch/unicore32/include/asm/processor.h  | 1 -
 arch/x86/include/asm/processor.h        | 1 -
 arch/x86/um/asm/processor.h             | 1 -
 arch/xtensa/include/asm/processor.h     | 1 -
 32 files changed, 33 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 0556fda..31e8dbe 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -59,7 +59,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
 #define ARCH_HAS_PREFETCHW
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 6c158d5..d102a49 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -61,7 +61,6 @@ struct task_struct;
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #else
 
@@ -69,7 +68,6 @@ struct task_struct;
 	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()	barrier()
 
 #endif
 
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index db660e0..9e71c58b 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -83,7 +83,6 @@ unsigned long get_wchan(struct task_struct *p);
 #endif
 
 #define cpu_relax_yield()  	              cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 3f9b0e5..6132f64 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -150,7 +150,6 @@ static inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield()                     cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 /* Thread switching */
 extern struct task_struct *cpu_switch_to(struct task_struct *prev,
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index e412e8b..ee62365 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -93,7 +93,6 @@ extern struct avr32_cpuinfo boot_cpu_data;
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 #define cpu_sync_pipeline()	asm volatile("sub pc, -2" : : : "memory")
 
 struct cpu_context {
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index 8b8704a..57acfb1 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -93,7 +93,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()    	smp_mb()
 #define cpu_relax_yield()      cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Get the Silicon Revision of the chip */
 static inline uint32_t __pure bfin_revid(void)
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index 914d730..1fd22e7 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -122,7 +122,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()		do { } while (0)
 #define cpu_relax_yield()             cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 
 extern const struct seq_operations cpuinfo_op;
 
diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h
index 01dd52e..1a57841 100644
--- a/arch/cris/include/asm/processor.h
+++ b/arch/cris/include/asm/processor.h
@@ -64,7 +64,6 @@ static inline void release_thread(struct task_struct *dead_task)
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
 
diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index 4d00d65..c1e5f2a 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -108,7 +108,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* data cache prefetch */
 #define ARCH_HAS_PREFETCH
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 683a061..42d6053 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -128,7 +128,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()    barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #define HARD_RESET_NOW() ({		\
 	local_irq_disable();		\
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index 1558ddb..5d694cc 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -57,7 +57,6 @@ struct thread_struct {
 
 #define cpu_relax() __vmyield()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * Decides where the kernel will search for a free chunk of vm space during
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 4654b71..0c2c3b2 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -548,7 +548,6 @@ ia64_eoi (void)
 
 #define cpu_relax()	ia64_hint(ia64_hint_pause)
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 static inline int
 ia64_get_irr(unsigned int vector)
diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index b262037..9b83a13 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -134,6 +134,5 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* _ASM_M32R_PROCESSOR_H */
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index 13e07ae..b0d0442 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -157,6 +157,5 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif
diff --git a/arch/metag/include/asm/processor.h b/arch/metag/include/asm/processor.h
index 61d6e27..ee302a6 100644
--- a/arch/metag/include/asm/processor.h
+++ b/arch/metag/include/asm/processor.h
@@ -153,7 +153,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()  cpu_relax()
 
 extern void setup_priv(void);
 
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index fd7dd11..08ec1f7 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -23,7 +23,6 @@ extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax()		barrier()
 # define cpu_relax_yield() cpu_relax()
-# define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(tsk) \
 		(((struct pt_regs *)(THREAD_SIZE + task_stack_page(tsk))) - 1)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 9a656f6..8ea95e7 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -390,7 +390,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * Return_address is a replacement for __builtin_return_address(count)
diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index 89f63d1..d11397b 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -70,7 +70,6 @@ extern void dodgy_tsc(void);
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * User space process size: 1.75GB (default).
diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h
index 303e593..d32c176 100644
--- a/arch/nios2/include/asm/processor.h
+++ b/arch/nios2/include/asm/processor.h
@@ -89,7 +89,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()  cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index 6ecfc2a..7f47fc7 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -93,7 +93,6 @@ extern unsigned long thread_saved_pc(struct task_struct *t);
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 #endif /* __ASM_OPENRISC_PROCESSOR_H */
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index ea2ff9f..a4a07f4 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -310,7 +310,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * parisc_requires_coherency() is used to identify the combined VIPT/PIPT
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 908fa7c..5684e68 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -405,7 +405,6 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 #endif
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
 int validate_sp(unsigned long sp, struct task_struct *p,
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 79343e3..9e32f25 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -237,7 +237,6 @@ static inline unsigned short stap(void)
 void cpu_relax_yield(void);
 
 #define cpu_relax() barrier()
-#define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0
 #define ECAG_CPU_ATTRIBUTE	1
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index e8e87b4..a1e97c0 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -25,7 +25,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 #define release_thread(thread)	do {} while (0)
 
 /*
diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h
index 099a991..9454ff1 100644
--- a/arch/sh/include/asm/processor.h
+++ b/arch/sh/include/asm/processor.h
@@ -98,7 +98,6 @@ extern struct sh_cpuinfo cpu_data[];
 #define cpu_sleep()	__asm__ __volatile__ ("sleep" : : : "memory")
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
 void stop_this_cpu(void *);
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 50e908a3c..fc32b73 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -120,7 +120,6 @@ int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 extern void (*sparc_idle)(void);
 
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index 3e8fac7..12787df 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -217,7 +217,6 @@ unsigned long get_wchan(struct task_struct *task);
 				     ".previous"			\
 				     ::: "memory")
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Prefetch support.  This is tuned for UltraSPARC-III and later.
  * UltraSPARC-I will treat these as nops, and UltraSPARC-II has
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 91a39a5..c1c228b 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -265,7 +265,6 @@ static inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Info on this processor (see fs/proc/cpuinfo.c) */
 struct seq_operations;
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index fc54d5d..eeefe7c 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -72,7 +72,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
 #define cpu_relax_yield()		cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 44adada..7513c99 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -589,7 +589,6 @@ static __always_inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Stop speculative execution and prefetching of modified code. */
 static inline void sync_core(void)
diff --git a/arch/x86/um/asm/processor.h b/arch/x86/um/asm/processor.h
index 01597af..b4bd63b 100644
--- a/arch/x86/um/asm/processor.h
+++ b/arch/x86/um/asm/processor.h
@@ -27,7 +27,6 @@ static inline void rep_nop(void)
 
 #define cpu_relax()		rep_nop()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(t) (&(t)->thread.regs)
 
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index fe14dc2..7d8d6be 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -207,7 +207,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()  barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Special register access. */
 

Thread overview: 45+ messages
2016-10-25  9:03 [GIT PULL v2 0/5] cpu_relax: drop lowlatency, introduce yield Christian Borntraeger
2016-10-25  9:03 ` [GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield Christian Borntraeger
2016-11-15 12:30   ` Russell King - ARM Linux
2016-11-15 13:19     ` Christian Borntraeger
2016-11-15 13:37       ` Russell King - ARM Linux
2016-11-15 13:52         ` Christian Borntraeger
2016-11-16 12:08   ` [tip:locking/core] locking/core: Introduce cpu_relax_yield() tip-bot for Christian Borntraeger
2016-10-25  9:03 ` [GIT PULL v2 2/5] stop_machine: yield CPU during stop machine Christian Borntraeger
2016-11-16 12:09   ` [tip:locking/core] locking/core, stop_machine: Yield the CPU during stop machine() tip-bot for Christian Borntraeger
2016-10-25  9:03 ` [GIT PULL v2 3/5] s390: make cpu_relax a barrier again Christian Borntraeger
2016-11-16 12:09   ` [tip:locking/core] locking/core, s390: Make cpu_relax() " tip-bot for Christian Borntraeger
2016-10-25  9:03 ` [GIT PULL v2 4/5] processor.h: Remove cpu_relax_lowlatency users Christian Borntraeger
2016-11-16 12:10   ` [tip:locking/core] locking/core: Remove cpu_relax_lowlatency() users tip-bot for Christian Borntraeger
2016-10-25  9:03 ` [GIT PULL v2 5/5] processor.h: remove cpu_relax_lowlatency Christian Borntraeger
2016-11-16 12:11   ` [tip:locking/core] locking/core, arch: Remove cpu_relax_lowlatency() tip-bot for Christian Borntraeger
2016-11-15 10:15 ` [GIT PULL v2 0/5] cpu_relax: drop lowlatency, introduce yield Christian Borntraeger
