* [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency
@ 2016-10-21 11:58 Christian Borntraeger
  2016-10-21 11:58 ` [PATCH 1/5] processor.h: introduce cpu_relax_yield Christian Borntraeger
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Christian Borntraeger @ 2016-10-21 11:58 UTC
  To: Peter Zijlstra
  Cc: Nicholas Piggin, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, Heiko Carstens, Martin Schwidefsky, Noam Camus,
	Christian Borntraeger

For spinning loops, people often used barrier() or cpu_relax().
For most architectures cpu_relax and barrier are the same, but on
some architectures cpu_relax can add some latency. For example, on s390
cpu_relax gives up the time slice to the hypervisor. On power, cpu_relax
tries to give some of the CPU to the neighbor threads. To reduce the
latency, another variant, cpu_relax_lowlatency, was introduced. Before this
is used in more and more places, let's reverse the logic and provide a new
function, cpu_relax_yield, that can spend some time and, on s390, yields
the guest CPU.

So my proposal boils down to:
- lowest latency: use barrier() or mb() if necessary
- low latency: use cpu_relax (e.g. might give up some CPU to the other
  threads)
- really give up the CPU: use cpu_relax_yield
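The three tiers above can be sketched in userspace C. This is only an illustration under stated assumptions, not kernel code: `sched_yield()` stands in for really giving up the CPU (the s390 kernel asks the hypervisor instead), the PAUSE and `yield` hints mirror the x86 `rep_nop()` and arm64 `cpu_relax()` seen in patch 1, and `spin_wait()` and `enum relax_kind` are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <sched.h>

/* Lowest latency: compiler-only barrier. Emits no instruction, but
 * forces memory to be re-read on the next loop iteration. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Low latency: a CPU hint that may hand resources to SMT siblings. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("rep; nop" ::: "memory");	/* PAUSE */
#elif defined(__aarch64__)
	__asm__ __volatile__("yield" ::: "memory");
#else
	barrier();
#endif
}

/* Really give up the CPU: userspace stand-in for a hypervisor yield. */
static inline void cpu_relax_yield(void)
{
	sched_yield();
}

enum relax_kind { RELAX_BARRIER, RELAX_CPU, RELAX_YIELD };

/* Hypothetical helper: spin until *flag != 0 or max_spins polls fail.
 * Returns the number of spins taken, or -1 on timeout. */
static long spin_wait(const int *flag, enum relax_kind kind, long max_spins)
{
	long spins;

	assert(max_spins >= 0);
	for (spins = 0; spins < max_spins; spins++) {
		if (*flag)
			return spins;
		switch (kind) {
		case RELAX_BARRIER:
			barrier();
			break;
		case RELAX_CPU:
			cpu_relax();
			break;
		case RELAX_YIELD:
			cpu_relax_yield();
			break;
		}
	}
	return -1;
}
```

For example, `spin_wait(&flag, RELAX_CPU, 1000)` returns 0 immediately when `flag` is already set and -1 if it never becomes set within 1000 spins. Only the relax step differs between the tiers; the polling logic stays the same, which is what makes swapping cpu_relax_lowlatency for cpu_relax (or cpu_relax for cpu_relax_yield) a one-line change at each call site.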

The alternative is to keep cpu_relax_lowlatency if there is some need.

Not fully sure about arc/eznps and power, but let's first hear whether the
approach is OK.

PS: In the long run I would also try to provide something like
cpu_relax_yield_to for s390, taking a CPU number (or just add that to
cpu_relax_yield), since a yield_to is always better than a yield as long as
we know the waiter.
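To illustrate why a directed yield can beat a plain one, here is a toy model, assuming a hypervisor that runs vCPUs strictly round-robin on one physical CPU; real hypervisors schedule very differently, and `slices_until_holder_runs()` is a hypothetical helper. A plain yield lets whichever vCPU is next in line run, while a yield_to with a known target hands the slice straight to the vCPU being waited for.

```c
#include <assert.h>

/*
 * Toy model (hypothetical, not any real hypervisor scheduler):
 * nr_vcpus vCPUs share one physical CPU in round-robin order. The
 * vCPU `waiter` needs vCPU `holder` to run before it can make progress.
 * Returns how many time slices pass before the holder gets to run.
 */
static int slices_until_holder_runs(int nr_vcpus, int waiter, int holder,
				    int directed)
{
	int slices = 0;
	int cur = waiter;

	assert(nr_vcpus > 1 && waiter != holder);
	assert(waiter >= 0 && waiter < nr_vcpus);
	assert(holder >= 0 && holder < nr_vcpus);

	while (cur != holder) {
		if (directed)
			cur = holder;			/* yield_to: hand the slice over */
		else
			cur = (cur + 1) % nr_vcpus;	/* yield: next in line runs */
		slices++;
	}
	return slices;
}
```

With 8 vCPUs, a waiter on vCPU 3 and the holder on vCPU 2, a plain yield needs 7 slices before the holder runs, while the directed variant needs 1.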


Christian Borntraeger (5):
  processor.h: introduce cpu_relax_yield
  stop_machine: yield CPU during stop machine
  s390: make cpu_relax a barrier again
  Remove cpu_relax_lowlatency users
  remove cpu_relax_lowlatency

 arch/alpha/include/asm/processor.h      | 2 +-
 arch/arc/include/asm/processor.h        | 2 ++
 arch/arm/include/asm/processor.h        | 2 +-
 arch/arm64/include/asm/processor.h      | 2 +-
 arch/avr32/include/asm/processor.h      | 2 +-
 arch/blackfin/include/asm/processor.h   | 2 +-
 arch/c6x/include/asm/processor.h        | 2 +-
 arch/cris/include/asm/processor.h       | 2 +-
 arch/frv/include/asm/processor.h        | 2 +-
 arch/h8300/include/asm/processor.h      | 2 +-
 arch/hexagon/include/asm/processor.h    | 2 +-
 arch/ia64/include/asm/processor.h       | 2 +-
 arch/m32r/include/asm/processor.h       | 2 +-
 arch/m68k/include/asm/processor.h       | 2 +-
 arch/metag/include/asm/processor.h      | 2 +-
 arch/microblaze/include/asm/processor.h | 2 +-
 arch/mips/include/asm/processor.h       | 2 +-
 arch/mn10300/include/asm/processor.h    | 2 +-
 arch/nios2/include/asm/processor.h      | 2 +-
 arch/openrisc/include/asm/processor.h   | 2 +-
 arch/parisc/include/asm/processor.h     | 2 +-
 arch/powerpc/include/asm/processor.h    | 2 +-
 arch/s390/include/asm/processor.h       | 4 ++--
 arch/s390/kernel/processor.c            | 4 ++--
 arch/score/include/asm/processor.h      | 2 +-
 arch/sh/include/asm/processor.h         | 2 +-
 arch/sparc/include/asm/processor_32.h   | 2 +-
 arch/sparc/include/asm/processor_64.h   | 2 +-
 arch/tile/include/asm/processor.h       | 2 +-
 arch/unicore32/include/asm/processor.h  | 2 +-
 arch/x86/include/asm/processor.h        | 2 +-
 arch/xtensa/include/asm/processor.h     | 2 +-
 drivers/gpu/drm/i915/i915_gem_request.c | 2 +-
 drivers/vhost/net.c                     | 4 ++--
 kernel/locking/mcs_spinlock.h           | 4 ++--
 kernel/locking/mutex.c                  | 4 ++--
 kernel/locking/osq_lock.c               | 6 +++---
 kernel/locking/qrwlock.c                | 6 +++---
 kernel/locking/rwsem-xadd.c             | 4 ++--
 kernel/stop_machine.c                   | 2 +-
 lib/lockref.c                           | 2 +-
 41 files changed, 52 insertions(+), 50 deletions(-)

-- 
2.5.5


* [PATCH 1/5] processor.h: introduce cpu_relax_yield
  2016-10-21 11:58 [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency Christian Borntraeger
@ 2016-10-21 11:58 ` Christian Borntraeger
  2016-10-21 11:58 ` [PATCH 2/5] stop_machine: yield CPU during stop machine Christian Borntraeger
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2016-10-21 11:58 UTC
  To: Peter Zijlstra
  Cc: Nicholas Piggin, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, Heiko Carstens, Martin Schwidefsky, Noam Camus,
	Christian Borntraeger

For spinning loops, people often used barrier() or cpu_relax().
For most architectures cpu_relax and barrier are the same, but on
some architectures cpu_relax can add some latency. For example, on s390
cpu_relax gives up the time slice to the hypervisor. On power, cpu_relax
tries to give some of the CPU to the neighbor threads. To reduce the
latency, another variant, cpu_relax_lowlatency, was introduced. Before this
is used in more and more places, let's reverse the logic and provide a new
function, cpu_relax_yield, that can spend some time and, on s390, yields
the guest CPU.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/alpha/include/asm/processor.h      | 1 +
 arch/arc/include/asm/processor.h        | 2 ++
 arch/arm/include/asm/processor.h        | 1 +
 arch/arm64/include/asm/processor.h      | 1 +
 arch/avr32/include/asm/processor.h      | 1 +
 arch/blackfin/include/asm/processor.h   | 1 +
 arch/c6x/include/asm/processor.h        | 1 +
 arch/cris/include/asm/processor.h       | 1 +
 arch/frv/include/asm/processor.h        | 1 +
 arch/h8300/include/asm/processor.h      | 1 +
 arch/hexagon/include/asm/processor.h    | 1 +
 arch/ia64/include/asm/processor.h       | 1 +
 arch/m32r/include/asm/processor.h       | 1 +
 arch/m68k/include/asm/processor.h       | 1 +
 arch/metag/include/asm/processor.h      | 1 +
 arch/microblaze/include/asm/processor.h | 1 +
 arch/mips/include/asm/processor.h       | 1 +
 arch/mn10300/include/asm/processor.h    | 1 +
 arch/nios2/include/asm/processor.h      | 1 +
 arch/openrisc/include/asm/processor.h   | 1 +
 arch/parisc/include/asm/processor.h     | 1 +
 arch/powerpc/include/asm/processor.h    | 1 +
 arch/s390/include/asm/processor.h       | 3 ++-
 arch/s390/kernel/processor.c            | 4 ++--
 arch/score/include/asm/processor.h      | 1 +
 arch/sh/include/asm/processor.h         | 1 +
 arch/sparc/include/asm/processor_32.h   | 1 +
 arch/sparc/include/asm/processor_64.h   | 1 +
 arch/tile/include/asm/processor.h       | 1 +
 arch/unicore32/include/asm/processor.h  | 1 +
 arch/x86/include/asm/processor.h        | 1 +
 arch/xtensa/include/asm/processor.h     | 1 +
 32 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 43a7559..0556fda 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -58,6 +58,7 @@ unsigned long get_wchan(struct task_struct *p);
   ((tsk) == current ? rdusp() : task_thread_info(tsk)->pcb.usp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 16b630f..6c158d5 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -60,6 +60,7 @@ struct task_struct;
 #ifndef CONFIG_EZNPS_MTM_EXT
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #else
@@ -67,6 +68,7 @@ struct task_struct;
 #define cpu_relax()     \
 	__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()	barrier()
 
 #endif
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 8a1e8e9..db660e0 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -82,6 +82,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define cpu_relax()			barrier()
 #endif
 
+#define cpu_relax_yield()  	              cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index df2e53d..797ee20 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -149,6 +149,7 @@ static inline void cpu_relax(void)
 	asm volatile("yield" ::: "memory");
 }
 
+#define cpu_relax_yield()                     cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 /* Thread switching */
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index 941593c..e412e8b 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -92,6 +92,7 @@ extern struct avr32_cpuinfo boot_cpu_data;
 #define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 3))
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 #define cpu_sync_pipeline()	asm volatile("sub pc, -2" : : : "memory")
 
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index 0c265ab..8b8704a 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -92,6 +92,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk) == current ? rdusp() : (tsk)->thread.usp)
 
 #define cpu_relax()    	smp_mb()
+#define cpu_relax_yield()      cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Get the Silicon Revision of the chip */
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index f2ef31b..914d730 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -121,6 +121,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(task)	(task_pt_regs(task)->sp)
 
 #define cpu_relax()		do { } while (0)
+#define cpu_relax_yield()             cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 
 extern const struct seq_operations cpuinfo_op;
diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h
index 862126b..01dd52e 100644
--- a/arch/cris/include/asm/processor.h
+++ b/arch/cris/include/asm/processor.h
@@ -63,6 +63,7 @@ static inline void release_thread(struct task_struct *dead_task)
 #define init_stack      (init_thread_union.stack)
 
 #define cpu_relax()     barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index 73f0a79..4d00d65 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -107,6 +107,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk)->thread.frame0->sp)
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* data cache prefetch */
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 111df73..683a061 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -127,6 +127,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define	KSTK_ESP(tsk)	((tsk) == current ? rdusp() : (tsk)->thread.usp)
 
 #define cpu_relax()    barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()	cpu_relax()
 
 #define HARD_RESET_NOW() ({		\
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index d850113..1558ddb 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -56,6 +56,7 @@ struct thread_struct {
 }
 
 #define cpu_relax() __vmyield()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index ce53c50..4654b71 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -547,6 +547,7 @@ ia64_eoi (void)
 }
 
 #define cpu_relax()	ia64_hint(ia64_hint_pause)
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 static inline int
diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index 9f8fd9b..b262037 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -133,6 +133,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)  ((tsk)->thread.sp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* _ASM_M32R_PROCESSOR_H */
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index c84a218..13e07ae 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -156,6 +156,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define task_pt_regs(tsk)	((struct pt_regs *) ((tsk)->thread.esp0))
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #endif
diff --git a/arch/metag/include/asm/processor.h b/arch/metag/include/asm/processor.h
index a0333eb..61d6e27 100644
--- a/arch/metag/include/asm/processor.h
+++ b/arch/metag/include/asm/processor.h
@@ -152,6 +152,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define user_stack_pointer(regs)        ((regs)->ctx.AX[0].U0)
 
 #define cpu_relax()     barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()  cpu_relax()
 
 extern void setup_priv(void);
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index c38d0dd..fd7dd11 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -22,6 +22,7 @@
 extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax()		barrier()
+# define cpu_relax_yield() cpu_relax()
 # define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(tsk) \
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 0d36c87..9a656f6 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -389,6 +389,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index b10ba12..89f63d1 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -69,6 +69,7 @@ extern void print_cpu_info(struct mn10300_cpuinfo *);
 extern void dodgy_tsc(void);
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h
index 1c953f0..303e593 100644
--- a/arch/nios2/include/asm/processor.h
+++ b/arch/nios2/include/asm/processor.h
@@ -88,6 +88,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)	((tsk)->thread.kregs->sp)
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency()  cpu_relax()
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index 70334c9..6ecfc2a 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -92,6 +92,7 @@ extern unsigned long thread_saved_pc(struct task_struct *t);
 #define init_stack      (init_thread_union.stack)
 
 #define cpu_relax()     barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index 2e674e1..ea2ff9f 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -309,6 +309,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)	((tsk)->thread.regs.gr[30])
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /*
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index c07c31b..908fa7c 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -404,6 +404,7 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 #define cpu_relax()	barrier()
 #endif
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 0332317..d05965b 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -234,8 +234,9 @@ static inline unsigned short stap(void)
 /*
  * Give up the time slice of the virtual PU.
  */
-void cpu_relax(void);
+void cpu_relax_yield(void);
 
+#define cpu_relax() cpu_relax_yield()
 #define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index 81d0808..9e60ef1 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -53,7 +53,7 @@ void s390_update_cpu_mhz(void)
 		on_each_cpu(update_cpu_mhz, NULL, 0);
 }
 
-void notrace cpu_relax(void)
+void notrace cpu_relax_yield(void)
 {
 	if (!smp_cpu_mtid && MACHINE_HAS_DIAG44) {
 		diag_stat_inc(DIAG_STAT_X044);
@@ -61,7 +61,7 @@ void notrace cpu_relax(void)
 	}
 	barrier();
 }
-EXPORT_SYMBOL(cpu_relax);
+EXPORT_SYMBOL(cpu_relax_yield);
 
 /*
  * cpu_init - initializes state that is per-CPU.
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index 851f441..e8e87b4 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -24,6 +24,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define current_text_addr() ({ __label__ _l; _l: &&_l; })
 
 #define cpu_relax()		barrier()
+#define cpu_relax_yield()	cpu_relax()
 #define cpu_relax_lowlatency()        cpu_relax()
 #define release_thread(thread)	do {} while (0)
 
diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h
index f9a0994..099a991 100644
--- a/arch/sh/include/asm/processor.h
+++ b/arch/sh/include/asm/processor.h
@@ -97,6 +97,7 @@ extern struct sh_cpuinfo cpu_data[];
 
 #define cpu_sleep()	__asm__ __volatile__ ("sleep" : : : "memory")
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 812fd08..50e908a3c 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -119,6 +119,7 @@ extern struct task_struct *last_task_used_math;
 int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
 
 #define cpu_relax()	barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 extern void (*sparc_idle)(void);
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index ce2595c..3e8fac7 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -216,6 +216,7 @@ unsigned long get_wchan(struct task_struct *task);
 				     "nop\n\t"				\
 				     ".previous"			\
 				     ::: "memory")
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Prefetch support.  This is tuned for UltraSPARC-III and later.
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 0684e88..91a39a5 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -264,6 +264,7 @@ static inline void cpu_relax(void)
 	barrier();
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Info on this processor (see fs/proc/cpuinfo.c) */
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index 8d21b7a..fc54d5d 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -71,6 +71,7 @@ extern void release_thread(struct task_struct *);
 unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
+#define cpu_relax_yield()		cpu_relax()
 #define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..44adada 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -588,6 +588,7 @@ static __always_inline void cpu_relax(void)
 	rep_nop();
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Stop speculative execution and prefetching of modified code. */
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index b42d68b..fe14dc2 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -206,6 +206,7 @@ extern unsigned long get_wchan(struct task_struct *p);
 #define KSTK_ESP(tsk)		(task_pt_regs(tsk)->areg[1])
 
 #define cpu_relax()  barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Special register access. */
-- 
2.5.5
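The s390 `cpu_relax_yield()` in this patch only yields to the hypervisor when `smp_cpu_mtid` is zero (SMT not in use) and the machine offers DIAG 0x44; otherwise it is just a barrier. A minimal userspace model of that decision, assuming plain global variables in place of the real s390 globals and a counter in place of the actual DIAG 0x44 instruction (which the hunk context elides):

```c
#include <assert.h>
#include <stdbool.h>

#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-ins for the s390 state the patch consults. */
static unsigned int smp_cpu_mtid;	/* nonzero: SMT (multithreading) on */
static bool machine_has_diag44;		/* models MACHINE_HAS_DIAG44 */
static unsigned long diag44_calls;	/* models diag_stat_inc(DIAG_STAT_X044) */

static void cpu_relax_yield(void)
{
	/* Yield the guest CPU only when SMT is off and DIAG 0x44 exists. */
	if (!smp_cpu_mtid && machine_has_diag44)
		diag44_calls++;	/* the real code issues DIAG 0x44 here */
	barrier();
}
```

Every other architecture in this patch simply maps `cpu_relax_yield()` to its existing `cpu_relax()`, so behavior only changes where callers are switched over, as patch 2 does for stop_machine.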


* [PATCH 2/5] stop_machine: yield CPU during stop machine
  2016-10-21 11:58 [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency Christian Borntraeger
  2016-10-21 11:58 ` [PATCH 1/5] processor.h: introduce cpu_relax_yield Christian Borntraeger
@ 2016-10-21 11:58 ` Christian Borntraeger
  2016-10-21 12:05     ` Peter Zijlstra
  2016-10-21 11:58 ` [PATCH 3/5] s390: make cpu_relax a barrier again Christian Borntraeger
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Christian Borntraeger @ 2016-10-21 11:58 UTC
  To: Peter Zijlstra
  Cc: Nicholas Piggin, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, Heiko Carstens, Martin Schwidefsky, Noam Camus,
	Christian Borntraeger

stop_machine can take a very long time if the hypervisor overcommits
guest CPUs. When waiting for "the one", let's give up our CPU by using
the new cpu_relax_yield.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 kernel/stop_machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index ec9ab2f..1eb8266 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -194,7 +194,7 @@ static int multi_cpu_stop(void *data)
 	/* Simple state machine */
 	do {
 		/* Chill out and ensure we re-read multi_stop_state. */
-		cpu_relax();
+		cpu_relax_yield();
 		if (msdata->state != curstate) {
 			curstate = msdata->state;
 			switch (curstate) {
-- 
2.5.5
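For context, the loop in `multi_cpu_stop()` walks every participating CPU through the same states in lockstep: each CPU polls `msdata->state`, acts on a change, and acks; the last CPU to ack advances the state. The single-threaded simulation below is a sketch of that protocol, not the kernel code: `struct msdata_sim`, `poll_once()` and `run_stop_machine()` are hypothetical, and real CPUs poll concurrently rather than round-robin. In the concurrent case, a CPU that polls before the last ack keeps spinning, and under overcommit the late CPU may not even be running, which is why that spin now uses `cpu_relax_yield()`.

```c
#include <assert.h>

enum multi_stop_state {
	MULTI_STOP_NONE,
	MULTI_STOP_PREPARE,
	MULTI_STOP_DISABLE_IRQ,
	MULTI_STOP_RUN,
	MULTI_STOP_EXIT,
};

#define NR_CPUS_SIM 4

struct msdata_sim {
	enum multi_stop_state state;
	int thread_ack;		/* CPUs that still have to ack this state */
	int fn_calls;		/* how often the stop function ran */
};

static void set_state(struct msdata_sim *m, enum multi_stop_state s)
{
	m->thread_ack = NR_CPUS_SIM;	/* reset acks, then publish */
	m->state = s;
}

static void ack_state(struct msdata_sim *m)
{
	/* The last CPU to ack moves everyone to the next state. */
	if (--m->thread_ack == 0)
		set_state(m, m->state + 1);
}

/* One pass of the wait-loop body for one simulated CPU. In the kernel
 * each pass starts with cpu_relax(), or cpu_relax_yield() after this
 * patch, before re-reading the state. */
static void poll_once(struct msdata_sim *m, enum multi_stop_state *cur,
		      int is_active)
{
	if (m->state != *cur) {
		*cur = m->state;
		if (*cur == MULTI_STOP_RUN && is_active)
			m->fn_calls++;	/* msdata->fn() would run here */
		ack_state(m);
	}
}

/* Drive all CPUs round-robin until each has seen MULTI_STOP_EXIT.
 * Returns the number of polling passes needed. */
static int run_stop_machine(struct msdata_sim *m, int active_cpu)
{
	enum multi_stop_state cur[NR_CPUS_SIM];
	int cpu, passes = 0, done;

	assert(active_cpu >= 0 && active_cpu < NR_CPUS_SIM);
	for (cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		cur[cpu] = MULTI_STOP_NONE;
	m->fn_calls = 0;
	set_state(m, MULTI_STOP_PREPARE);

	do {
		done = 1;
		for (cpu = 0; cpu < NR_CPUS_SIM; cpu++) {
			if (cur[cpu] != MULTI_STOP_EXIT)
				poll_once(m, &cur[cpu], cpu == active_cpu);
			if (cur[cpu] != MULTI_STOP_EXIT)
				done = 0;
		}
		passes++;
	} while (!done);
	return passes;
}
```

With four simulated CPUs, `run_stop_machine()` takes four passes (PREPARE, DISABLE_IRQ, RUN, EXIT) and the stop function runs exactly once, on the active CPU.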


* [PATCH 3/5] s390: make cpu_relax a barrier again
  2016-10-21 11:58 [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency Christian Borntraeger
  2016-10-21 11:58 ` [PATCH 1/5] processor.h: introduce cpu_relax_yield Christian Borntraeger
  2016-10-21 11:58 ` [PATCH 2/5] stop_machine: yield CPU during stop machine Christian Borntraeger
@ 2016-10-21 11:58 ` Christian Borntraeger
  2016-10-21 11:58 ` [PATCH 4/5] Remove cpu_relax_lowlatency users Christian Borntraeger
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2016-10-21 11:58 UTC
  To: Peter Zijlstra
  Cc: Nicholas Piggin, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, Heiko Carstens, Martin Schwidefsky, Noam Camus,
	Christian Borntraeger

stop_machine seems to be the only important place for yielding during
cpu_relax. This was fixed by using cpu_relax_yield. Therefore, we can
now redefine cpu_relax to be a barrier instead (with the option to do
some SMT tuning later on).

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index d05965b..5d262cf 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -236,7 +236,7 @@ static inline unsigned short stap(void)
  */
 void cpu_relax_yield(void);
 
-#define cpu_relax() cpu_relax_yield()
+#define cpu_relax() barrier()
 #define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0
-- 
2.5.5


* [PATCH 4/5] Remove cpu_relax_lowlatency users
  2016-10-21 11:58 [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency Christian Borntraeger
                   ` (2 preceding siblings ...)
  2016-10-21 11:58 ` [PATCH 3/5] s390: make cpu_relax a barrier again Christian Borntraeger
@ 2016-10-21 11:58 ` Christian Borntraeger
  2016-10-21 11:58 ` [PATCH 5/5] remove cpu_relax_lowlatency Christian Borntraeger
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2016-10-21 11:58 UTC
  To: Peter Zijlstra
  Cc: Nicholas Piggin, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, Heiko Carstens, Martin Schwidefsky, Noam Camus,
	Christian Borntraeger

With the s390 special case of a yielding cpu_relax implementation gone,
we can now remove all users of cpu_relax_lowlatency.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 2 +-
 drivers/vhost/net.c                     | 4 ++--
 kernel/locking/mcs_spinlock.h           | 4 ++--
 kernel/locking/mutex.c                  | 4 ++--
 kernel/locking/osq_lock.c               | 6 +++---
 kernel/locking/qrwlock.c                | 6 +++---
 kernel/locking/rwsem-xadd.c             | 4 ++--
 lib/lockref.c                           | 2 +-
 8 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 8832f8e..383d134 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -723,7 +723,7 @@ bool __i915_spin_request(const struct drm_i915_gem_request *req,
 		if (busywait_stop(timeout_us, cpu))
 			break;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	} while (!need_resched());
 
 	return false;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc128a..5dc3465 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -342,7 +342,7 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 		endtime = busy_clock() + vq->busyloop_timeout;
 		while (vhost_can_busy_poll(vq->dev, endtime) &&
 		       vhost_vq_avail_empty(vq->dev, vq))
-			cpu_relax_lowlatency();
+			cpu_relax();
 		preempt_enable();
 		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 				      out_num, in_num, NULL, NULL);
@@ -533,7 +533,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 		while (vhost_can_busy_poll(&net->dev, endtime) &&
 		       !sk_has_rx_data(sk) &&
 		       vhost_vq_avail_empty(&net->dev, vq))
-			cpu_relax_lowlatency();
+			cpu_relax();
 
 		preempt_enable();
 
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index c835270..6a385aa 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -28,7 +28,7 @@ struct mcs_spinlock {
 #define arch_mcs_spin_lock_contended(l)					\
 do {									\
 	while (!(smp_load_acquire(l)))					\
-		cpu_relax_lowlatency();					\
+		cpu_relax();						\
 } while (0)
 #endif
 
@@ -108,7 +108,7 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
 			return;
 		/* Wait until the next pointer is set */
 		while (!(next = READ_ONCE(node->next)))
-			cpu_relax_lowlatency();
+			cpu_relax();
 	}
 
 	/* Pass lock to next waiter. */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..4463405 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -241,7 +241,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 			break;
 		}
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	rcu_read_unlock();
 
@@ -377,7 +377,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 		 * memory barriers as we'll eventually observe the right
 		 * values at the cost of a few extra spins.
 		 */
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 
 	osq_unlock(&lock->osq);
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..4ea2710 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 				break;
 		}
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 
 	return next;
@@ -122,7 +122,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 		if (need_resched())
 			goto unqueue;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	return true;
 
@@ -148,7 +148,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 		if (smp_load_acquire(&node->locked))
 			return true;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 
 		/*
 		 * Or we race against a concurrent unqueue()'s step-B, in which
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 19248dd..cc3ed0c 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -54,7 +54,7 @@ static __always_inline void
 rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
 {
 	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
-		cpu_relax_lowlatency();
+		cpu_relax();
 		cnts = atomic_read_acquire(&lock->cnts);
 	}
 }
@@ -130,7 +130,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 		   (cmpxchg_relaxed(&l->wmode, 0, _QW_WAITING) == 0))
 			break;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 
 	/* When no more readers, set the locked flag */
@@ -141,7 +141,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 					    _QW_LOCKED) == _QW_WAITING))
 			break;
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 unlock:
 	arch_spin_unlock(&lock->wait_lock);
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2337b4b..2fa2e2e6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -368,7 +368,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 			return false;
 		}
 
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	rcu_read_unlock();
 out:
@@ -423,7 +423,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 * memory barriers as we'll eventually observe the right
 		 * values at the cost of a few extra spins.
 		 */
-		cpu_relax_lowlatency();
+		cpu_relax();
 	}
 	osq_unlock(&sem->osq);
 done:
diff --git a/lib/lockref.c b/lib/lockref.c
index 5a92189..c4bfcb8 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -20,7 +20,7 @@
 		if (likely(old.lock_count == prev.lock_count)) {		\
 			SUCCESS;						\
 		}								\
-		cpu_relax_lowlatency();						\
+		cpu_relax();							\
 	}									\
 } while (0)
 
-- 
2.5.5
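The `lib/lockref.c` hunk is a typical consumer: an optimistic compare-and-swap loop that retries with `cpu_relax()` on contention. A minimal C11 sketch of the same idea, assuming a hypothetical 64-bit word with the lock in the low 32 bits and the count in the high 32 bits (the kernel instead overlays a `spinlock_t` and an `int` in `struct lockref` and checks `arch_spin_value_unlocked()`); `struct lockref_sim` and `lockref_get_fast()` are made-up names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define barrier() __asm__ __volatile__("" ::: "memory")
#define cpu_relax() barrier()	/* the generic definition most arches use */

/* Hypothetical layout: low 32 bits = lock word (0 = unlocked),
 * high 32 bits = reference count, updated together by one 64-bit CAS. */
#define LOCK_MASK UINT64_C(0xffffffff)
#define COUNT_ONE (UINT64_C(1) << 32)

struct lockref_sim {
	_Atomic uint64_t lock_count;
};

/* Lockless "get": bump the count only while the lock half is free.
 * Returns false if the caller must fall back to taking the lock. */
static bool lockref_get_fast(struct lockref_sim *ref)
{
	uint64_t old = atomic_load_explicit(&ref->lock_count,
					    memory_order_relaxed);

	while ((old & LOCK_MASK) == 0) {
		uint64_t new = old + COUNT_ONE;

		/* On failure, C11 reloads `old` with the current value,
		 * like `old.lock_count = cmpxchg64_relaxed(...)`. */
		if (atomic_compare_exchange_weak_explicit(&ref->lock_count,
							  &old, new,
							  memory_order_relaxed,
							  memory_order_relaxed))
			return true;
		cpu_relax();	/* was cpu_relax_lowlatency() before patch 4 */
	}
	return false;
}
```

If the lock half is nonzero, the fast path gives up at once and the caller would fall back to taking the spinlock, mirroring the `while` condition in `CMPXCHG_LOOP`.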


* [PATCH 5/5] remove cpu_relax_lowlatency
  2016-10-21 11:58 [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency Christian Borntraeger
                   ` (3 preceding siblings ...)
  2016-10-21 11:58 ` [PATCH 4/5] Remove cpu_relax_lowlatency users Christian Borntraeger
@ 2016-10-21 11:58 ` Christian Borntraeger
  2016-10-21 12:06 ` [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency Peter Zijlstra
  2016-10-21 14:57 ` David Miller
  6 siblings, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2016-10-21 11:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nicholas Piggin, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, Heiko Carstens, Martin Schwidefsky, Noam Camus,
	Christian Borntraeger

As there are no users left, we can remove cpu_relax_lowlatency.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/alpha/include/asm/processor.h      | 1 -
 arch/arm/include/asm/processor.h        | 1 -
 arch/arm64/include/asm/processor.h      | 1 -
 arch/avr32/include/asm/processor.h      | 1 -
 arch/blackfin/include/asm/processor.h   | 1 -
 arch/c6x/include/asm/processor.h        | 1 -
 arch/cris/include/asm/processor.h       | 1 -
 arch/frv/include/asm/processor.h        | 1 -
 arch/h8300/include/asm/processor.h      | 1 -
 arch/hexagon/include/asm/processor.h    | 1 -
 arch/ia64/include/asm/processor.h       | 1 -
 arch/m32r/include/asm/processor.h       | 1 -
 arch/m68k/include/asm/processor.h       | 1 -
 arch/metag/include/asm/processor.h      | 1 -
 arch/microblaze/include/asm/processor.h | 1 -
 arch/mips/include/asm/processor.h       | 1 -
 arch/mn10300/include/asm/processor.h    | 1 -
 arch/nios2/include/asm/processor.h      | 1 -
 arch/openrisc/include/asm/processor.h   | 1 -
 arch/parisc/include/asm/processor.h     | 1 -
 arch/powerpc/include/asm/processor.h    | 1 -
 arch/s390/include/asm/processor.h       | 1 -
 arch/score/include/asm/processor.h      | 1 -
 arch/sh/include/asm/processor.h         | 1 -
 arch/sparc/include/asm/processor_32.h   | 1 -
 arch/sparc/include/asm/processor_64.h   | 1 -
 arch/tile/include/asm/processor.h       | 1 -
 arch/unicore32/include/asm/processor.h  | 1 -
 arch/x86/include/asm/processor.h        | 1 -
 arch/xtensa/include/asm/processor.h     | 1 -
 30 files changed, 30 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 0556fda..31e8dbe 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -59,7 +59,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
 #define ARCH_HAS_PREFETCHW
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index db660e0..9e71c58b 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -83,7 +83,6 @@ unsigned long get_wchan(struct task_struct *p);
 #endif
 
 #define cpu_relax_yield()  	              cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 797ee20..9274bf4 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -150,7 +150,6 @@ static inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield()                     cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 /* Thread switching */
 extern struct task_struct *cpu_switch_to(struct task_struct *prev,
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index e412e8b..ee62365 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -93,7 +93,6 @@ extern struct avr32_cpuinfo boot_cpu_data;
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 #define cpu_sync_pipeline()	asm volatile("sub pc, -2" : : : "memory")
 
 struct cpu_context {
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index 8b8704a..57acfb1 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -93,7 +93,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()    	smp_mb()
 #define cpu_relax_yield()      cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Get the Silicon Revision of the chip */
 static inline uint32_t __pure bfin_revid(void)
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index 914d730..1fd22e7 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -122,7 +122,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()		do { } while (0)
 #define cpu_relax_yield()             cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 
 extern const struct seq_operations cpuinfo_op;
 
diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h
index 01dd52e..1a57841 100644
--- a/arch/cris/include/asm/processor.h
+++ b/arch/cris/include/asm/processor.h
@@ -64,7 +64,6 @@ static inline void release_thread(struct task_struct *dead_task)
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
 
diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index 4d00d65..c1e5f2a 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -108,7 +108,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* data cache prefetch */
 #define ARCH_HAS_PREFETCH
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 683a061..42d6053 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -128,7 +128,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()    barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()	cpu_relax()
 
 #define HARD_RESET_NOW() ({		\
 	local_irq_disable();		\
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index 1558ddb..5d694cc 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -57,7 +57,6 @@ struct thread_struct {
 
 #define cpu_relax() __vmyield()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * Decides where the kernel will search for a free chunk of vm space during
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 4654b71..0c2c3b2 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -548,7 +548,6 @@ ia64_eoi (void)
 
 #define cpu_relax()	ia64_hint(ia64_hint_pause)
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 static inline int
 ia64_get_irr(unsigned int vector)
diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index b262037..9b83a13 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -134,6 +134,5 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* _ASM_M32R_PROCESSOR_H */
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index 13e07ae..b0d0442 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -157,6 +157,5 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif
diff --git a/arch/metag/include/asm/processor.h b/arch/metag/include/asm/processor.h
index 61d6e27..ee302a6 100644
--- a/arch/metag/include/asm/processor.h
+++ b/arch/metag/include/asm/processor.h
@@ -153,7 +153,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()  cpu_relax()
 
 extern void setup_priv(void);
 
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index fd7dd11..08ec1f7 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -23,7 +23,6 @@ extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax()		barrier()
 # define cpu_relax_yield() cpu_relax()
-# define cpu_relax_lowlatency()	cpu_relax()
 
 #define task_pt_regs(tsk) \
 		(((struct pt_regs *)(THREAD_SIZE + task_stack_page(tsk))) - 1)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 9a656f6..8ea95e7 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -390,7 +390,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * Return_address is a replacement for __builtin_return_address(count)
diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index 89f63d1..d11397b 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -70,7 +70,6 @@ extern void dodgy_tsc(void);
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * User space process size: 1.75GB (default).
diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h
index 303e593..d32c176 100644
--- a/arch/nios2/include/asm/processor.h
+++ b/arch/nios2/include/asm/processor.h
@@ -89,7 +89,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency()  cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index 6ecfc2a..7f47fc7 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -93,7 +93,6 @@ extern unsigned long thread_saved_pc(struct task_struct *t);
 
 #define cpu_relax()     barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 #endif /* __ASM_OPENRISC_PROCESSOR_H */
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index ea2ff9f..a4a07f4 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -310,7 +310,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * parisc_requires_coherency() is used to identify the combined VIPT/PIPT
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 908fa7c..5684e68 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -405,7 +405,6 @@ static inline unsigned long __pack_fe01(unsigned int fpmode)
 #endif
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
 int validate_sp(unsigned long sp, struct task_struct *p,
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 5d262cf..17c001a 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -237,7 +237,6 @@ static inline unsigned short stap(void)
 void cpu_relax_yield(void);
 
 #define cpu_relax() barrier()
-#define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE	0
 #define ECAG_CPU_ATTRIBUTE	1
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index e8e87b4..a1e97c0 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -25,7 +25,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()		barrier()
 #define cpu_relax_yield()	cpu_relax()
-#define cpu_relax_lowlatency()        cpu_relax()
 #define release_thread(thread)	do {} while (0)
 
 /*
diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h
index 099a991..9454ff1 100644
--- a/arch/sh/include/asm/processor.h
+++ b/arch/sh/include/asm/processor.h
@@ -98,7 +98,6 @@ extern struct sh_cpuinfo cpu_data[];
 #define cpu_sleep()	__asm__ __volatile__ ("sleep" : : : "memory")
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
 void stop_this_cpu(void *);
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 50e908a3c..fc32b73 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -120,7 +120,6 @@ int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
 
 #define cpu_relax()	barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 extern void (*sparc_idle)(void);
 
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index 3e8fac7..12787df 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -217,7 +217,6 @@ unsigned long get_wchan(struct task_struct *task);
 				     ".previous"			\
 				     ::: "memory")
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Prefetch support.  This is tuned for UltraSPARC-III and later.
  * UltraSPARC-I will treat these as nops, and UltraSPARC-II has
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 91a39a5..c1c228b 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -265,7 +265,6 @@ static inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Info on this processor (see fs/proc/cpuinfo.c) */
 struct seq_operations;
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index fc54d5d..eeefe7c 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -72,7 +72,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
 #define cpu_relax_yield()		cpu_relax()
-#define cpu_relax_lowlatency()                cpu_relax()
 
 #define task_pt_regs(p) \
 	((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 44adada..7513c99 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -589,7 +589,6 @@ static __always_inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Stop speculative execution and prefetching of modified code. */
 static inline void sync_core(void)
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index fe14dc2..7d8d6be 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -207,7 +207,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()  barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Special register access. */
 
-- 
2.5.5


* Re: [PATCH 2/5] stop_machine: yield CPU during stop machine
  2016-10-21 11:58 ` [PATCH 2/5] stop_machine: yield CPU during stop machine Christian Borntraeger
@ 2016-10-21 12:05     ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-10-21 12:05 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Nicholas Piggin, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, Heiko Carstens, Martin Schwidefsky, Noam Camus,
	virtualization, xen-devel-request, kvm

On Fri, Oct 21, 2016 at 01:58:55PM +0200, Christian Borntraeger wrote:
> stop_machine can take a very long time if the hypervisor does
> overcommitment for guest CPUs. When waiting for "the one", lets
> give up our CPU by using the new cpu_relax_yield.

This seems like something that would apply to most other virt stuff. Let's Cc
a few more lists for that.

> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  kernel/stop_machine.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index ec9ab2f..1eb8266 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -194,7 +194,7 @@ static int multi_cpu_stop(void *data)
>  	/* Simple state machine */
>  	do {
>  		/* Chill out and ensure we re-read multi_stop_state. */
> -		cpu_relax();
> +		cpu_relax_yield();
>  		if (msdata->state != curstate) {
>  			curstate = msdata->state;
>  			switch (curstate) {
> -- 
> 2.5.5
> 


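The loop being changed is a rendezvous. Every CPU spins until the shared state advances, and the last CPU to acknowledge a state moves everyone to the next one. The userspace sketch below mimics that shape with POSIX threads and C11 atomics. All `demo_*` names are hypothetical, and the kernel's IRQ-disabling state is omitted. `sched_yield()` is only an analogy for `cpu_relax_yield()`: it hands the time slice back to the host scheduler, much as the s390 version gives the guest CPU back to the hypervisor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

enum { DEMO_NONE, DEMO_PREPARE, DEMO_RUN, DEMO_EXIT };

#define DEMO_MAX_THREADS 16

struct demo_stop_data {
	atomic_int state;	/* current state of the rendezvous */
	atomic_int thread_ack;	/* threads that still have to ack 'state' */
	int nthreads;
	atomic_int runs;	/* threads that executed the RUN step */
};

/* Userspace analogy for cpu_relax_yield(): give up the time slice. */
static void demo_relax_yield(void)
{
	sched_yield();
}

static void demo_set_state(struct demo_stop_data *d, int newstate)
{
	/* Reset the ack counter before publishing the new state. */
	atomic_store(&d->thread_ack, d->nthreads);
	atomic_store(&d->state, newstate);
}

/* The last thread to acknowledge a state advances to the next one. */
static void demo_ack_state(struct demo_stop_data *d)
{
	if (atomic_fetch_sub(&d->thread_ack, 1) == 1)
		demo_set_state(d, atomic_load(&d->state) + 1);
}

static void *demo_stop_thread(void *arg)
{
	struct demo_stop_data *d = arg;
	int curstate = DEMO_NONE;

	do {
		demo_relax_yield();	/* the call site patch 2/5 changes */
		int s = atomic_load(&d->state);

		if (s != curstate) {
			curstate = s;
			if (curstate == DEMO_RUN)
				atomic_fetch_add(&d->runs, 1);
			demo_ack_state(d);
		}
	} while (curstate != DEMO_EXIT);
	return NULL;
}

/* Drive nthreads through PREPARE -> RUN -> EXIT; returns the RUN count. */
static int demo_stop_machine(int nthreads)
{
	pthread_t tid[DEMO_MAX_THREADS];
	struct demo_stop_data d = { .nthreads = nthreads };

	if (nthreads < 1 || nthreads > DEMO_MAX_THREADS)
		return -1;
	demo_set_state(&d, DEMO_PREPARE);
	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tid[i], NULL, demo_stop_thread, &d) != 0)
			abort();	/* sketch: no partial-start recovery */
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	return atomic_load(&d.runs);
}
```

If `demo_relax_yield()` were a plain compiler barrier instead, the sketch would still terminate. The difference mainly matters when waiters outnumber the physical CPUs actually available. That is the overcommitted-hypervisor case the commit message describes.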
* Re: [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency
  2016-10-21 11:58 [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency Christian Borntraeger
                   ` (4 preceding siblings ...)
  2016-10-21 11:58 ` [PATCH 5/5] remove cpu_relax_lowlatency Christian Borntraeger
@ 2016-10-21 12:06 ` Peter Zijlstra
  2016-10-21 14:57 ` David Miller
  6 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-10-21 12:06 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Nicholas Piggin, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, Heiko Carstens, Martin Schwidefsky, Noam Camus

On Fri, Oct 21, 2016 at 01:58:53PM +0200, Christian Borntraeger wrote:
> Christian Borntraeger (5):
>   processor.h: introduce cpu_relax_yield
>   stop_machine: yield CPU during stop machine
>   s390: make cpu_relax a barrier again
>   Remove cpu_relax_lowlatency users
>   remove cpu_relax_lowlatency

Yay for killing cpu_relax_lowlatency()...


* Re: [PATCH 2/5] stop_machine: yield CPU during stop machine
  2016-10-21 12:05     ` Peter Zijlstra
  (?)
  (?)
@ 2016-10-21 12:41     ` Juergen Gross
  -1 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2016-10-21 12:41 UTC (permalink / raw)
  To: Peter Zijlstra, Christian Borntraeger
  Cc: linux-arch, linux-s390, kvm, xen-devel, Heiko Carstens,
	linux-kernel, Nicholas Piggin, virtualization, Noam Camus,
	Martin Schwidefsky, linuxppc-dev

On 21/10/16 14:05, Peter Zijlstra wrote:
> On Fri, Oct 21, 2016 at 01:58:55PM +0200, Christian Borntraeger wrote:
>> stop_machine can take a very long time if the hypervisor does
>> overcommitment for guest CPUs. When waiting for "the one", lets
>> give up our CPU by using the new cpu_relax_yield.
> 
> This seems something that would apply to most other virt stuff. Lets Cc
> a few more lists for that.

Corrected xen-devel mail address.


Juergen

> 
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  kernel/stop_machine.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
>> index ec9ab2f..1eb8266 100644
>> --- a/kernel/stop_machine.c
>> +++ b/kernel/stop_machine.c
>> @@ -194,7 +194,7 @@ static int multi_cpu_stop(void *data)
>>  	/* Simple state machine */
>>  	do {
>>  		/* Chill out and ensure we re-read multi_stop_state. */
>> -		cpu_relax();
>> +		cpu_relax_yield();
>>  		if (msdata->state != curstate) {
>>  			curstate = msdata->state;
>>  			switch (curstate) {
>> -- 
>> 2.5.5
>>


* Re: [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency
  2016-10-21 11:58 [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency Christian Borntraeger
                   ` (5 preceding siblings ...)
  2016-10-21 12:06 ` [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency Peter Zijlstra
@ 2016-10-21 14:57 ` David Miller
  2016-10-21 15:08     ` Christian Borntraeger
  2016-10-21 15:08   ` Christian Borntraeger
  6 siblings, 2 replies; 25+ messages in thread
From: David Miller @ 2016-10-21 14:57 UTC (permalink / raw)
  To: borntraeger
  Cc: peterz, npiggin, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, heiko.carstens, schwidefsky, noamc

From: Christian Borntraeger <borntraeger@de.ibm.com>
Date: Fri, 21 Oct 2016 13:58:53 +0200

> For spinning loops people did often use barrier() or cpu_relax().
> For most architectures cpu_relax and barrier are the same, but on
> some architectures cpu_relax can add some latency. For example on s390
> cpu_relax gives up the time slice to the hypervisor. On power cpu_relax
> tries to give some of the CPU to the neighbor threads. To reduce the
> latency another variant cpu_relax_lowlatency was introduced. Before this
> is used in more and more places, lets revert the logic of provide a new
> function cpu_relax_yield that can spend some time and for s390 yields
> the guest CPU.

Sparc64, fwiw, behaves similarly to powerpc.


* Re: [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency
  2016-10-21 14:57 ` David Miller
@ 2016-10-21 15:08     ` Christian Borntraeger
  2016-10-21 15:08   ` Christian Borntraeger
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Borntraeger @ 2016-10-21 15:08 UTC (permalink / raw)
  To: David Miller
  Cc: peterz, npiggin, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, heiko.carstens, schwidefsky, noamc, virtualization,
	xen-devel, KVM list

On 10/21/2016 04:57 PM, David Miller wrote:
> From: Christian Borntraeger <borntraeger@de.ibm.com>
> Date: Fri, 21 Oct 2016 13:58:53 +0200
> 
>> For spinning loops people did often use barrier() or cpu_relax().
>> For most architectures cpu_relax and barrier are the same, but on
>> some architectures cpu_relax can add some latency. For example on s390
>> cpu_relax gives up the time slice to the hypervisor. On power cpu_relax
>> tries to give some of the CPU to the neighbor threads. To reduce the
>> latency another variant cpu_relax_lowlatency was introduced. Before this
>> is used in more and more places, lets revert the logic of provide a new
>> function cpu_relax_yield that can spend some time and for s390 yields
>> the guest CPU.
> 
> Sparc64, fwiw, behaves similarly to powerpc.

As sparc currently defines cpu_relax_lowlatency as cpu_relax, this patch set
should then be a no-op for sparc, correct?

My intent was that cpu_relax should not add a huge latency but can certainly
push some CPU power to hardware threads of the same core. This seems to be
the case for sparc/power and some arc variants.

Christian


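On the architectures mentioned, the relax step is meant as a short hint to the core rather than a trip to the hypervisor. Below is a hedged userspace sketch of such a hint. It uses x86's PAUSE (encoded as `rep; nop`) and arm64's YIELD instruction, with a plain compiler barrier elsewhere. The `demo_*` names are hypothetical, and the sketch does not reproduce the powerpc or sparc64 implementations.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical low-latency relax hint; every branch is also a compiler
 * barrier because of the "memory" clobber. */
static inline void demo_cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("rep; nop" ::: "memory");	/* PAUSE */
#elif defined(__aarch64__)
	__asm__ __volatile__("yield" ::: "memory");
#else
	__asm__ __volatile__("" ::: "memory");
#endif
}

static atomic_int demo_ready;
static int demo_payload;

static void *demo_producer(void *arg)
{
	(void)arg;
	demo_payload = 42;
	/* Release: the payload store is visible before the flag flips. */
	atomic_store_explicit(&demo_ready, 1, memory_order_release);
	return NULL;
}

/* Spin until the producer publishes; the hint only paces the loop. */
static int demo_wait_for_payload(void)
{
	pthread_t tid;

	atomic_store(&demo_ready, 0);
	demo_payload = 0;
	if (pthread_create(&tid, NULL, demo_producer, NULL) != 0)
		return -1;
	while (!atomic_load_explicit(&demo_ready, memory_order_acquire))
		demo_cpu_relax();
	pthread_join(tid, NULL);
	return demo_payload;
}
```

Nothing here gives up the time slice, so the waiter can react as soon as the flag flips. On SMT cores, the hint mainly frees pipeline resources for a sibling hardware thread in the meantime.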
* Re: [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency
  2016-10-21 15:08     ` Christian Borntraeger
@ 2016-10-21 15:12     ` David Miller
From: David Miller @ 2016-10-21 15:12 UTC (permalink / raw)
  To: borntraeger
  Cc: peterz, npiggin, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, heiko.carstens, schwidefsky, noamc, virtualization,
	xen-devel, kvm

From: Christian Borntraeger <borntraeger@de.ibm.com>
Date: Fri, 21 Oct 2016 17:08:54 +0200

> On 10/21/2016 04:57 PM, David Miller wrote:
>> From: Christian Borntraeger <borntraeger@de.ibm.com>
>> Date: Fri, 21 Oct 2016 13:58:53 +0200
>> 
>>> For spinning loops people did often use barrier() or cpu_relax().
>>> For most architectures cpu_relax and barrier are the same, but on
>>> some architectures cpu_relax can add some latency. For example on s390
>>> cpu_relax gives up the time slice to the hypervisor. On power cpu_relax
>>> tries to give some of the CPU to the neighbor threads. To reduce the
>>> latency another variant cpu_relax_lowlatency was introduced. Before this
>>> is used in more and more places, lets revert the logic of provide a new
>>> function cpu_relax_yield that can spend some time and for s390 yields
>>> the guest CPU.
>> 
>> Sparc64, fwiw, behaves similarly to powerpc.
> 
> As sparc currently defines cpu_relax_lowlatency to cpu_relax, this patch set
> should be a no-op then for sparc, correct?
> 
> My intend was that cpu_relax should not add a huge latency but can certainly
> push some cpu power to hardware threads of the same core. This seems to be
> the case for sparc/power and some arc variants. 

Agreed.

* Re: [PATCH 2/5] stop_machine: yield CPU during stop machine
  2016-10-21 12:05     ` Peter Zijlstra
@ 2016-10-22  0:06     ` Nicholas Piggin
  2016-10-24  7:52         ` Christian Borntraeger
From: Nicholas Piggin @ 2016-10-22  0:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian Borntraeger, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, Heiko Carstens, Martin Schwidefsky, Noam Camus,
	virtualization, xen-devel-request, kvm

On Fri, 21 Oct 2016 14:05:36 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Oct 21, 2016 at 01:58:55PM +0200, Christian Borntraeger wrote:
> > stop_machine can take a very long time if the hypervisor does
> > overcommitment for guest CPUs. When waiting for "the one", lets
> > give up our CPU by using the new cpu_relax_yield.  
> 
> This seems something that would apply to most other virt stuff. Lets Cc
> a few more lists for that.
> 
> > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > ---
> >  kernel/stop_machine.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > index ec9ab2f..1eb8266 100644
> > --- a/kernel/stop_machine.c
> > +++ b/kernel/stop_machine.c
> > @@ -194,7 +194,7 @@ static int multi_cpu_stop(void *data)
> >  	/* Simple state machine */
> >  	do {
> >  		/* Chill out and ensure we re-read multi_stop_state. */
> > -		cpu_relax();
> > +		cpu_relax_yield();
> >  		if (msdata->state != curstate) {
> >  			curstate = msdata->state;
> >  			switch (curstate) {
> > -- 
> > 2.5.5
> >   

This is the only caller of cpu_relax_yield()?

As a step to removing cpu_relax_lowlatency this series is nice, so I
have no objection. But "general" kernel coders still have basically
no chance of using this properly.

I wonder what can be done about that. I've got that spin_do/while
series I'll rebase on top of this, but a spin_yield variant of them
is of no more help to the caller.

What makes this unique? Long latency and not performance critical?
Most places where we spin and maybe yield have been moved to arch
code, but I wonder whether we can make an easier-to-use,
architecture-independent API?
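One hypothetical shape for such an architecture-independent API, sketched in userspace C: spin with the cheap cpu_relax() for a bounded number of polls, then switch to yielding. This is not the spin_do/while series mentioned above; spin_wait_adaptive(), struct countdown and the spin_limit threshold are invented for illustration, and sched_yield() stands in for cpu_relax_yield().

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

#define barrier() __asm__ __volatile__("" ::: "memory")

/* Stand-ins for the primitives discussed in the thread. */
static inline void cpu_relax(void) { barrier(); }
static inline void cpu_relax_yield(void) { sched_yield(); barrier(); }

/*
 * Hypothetical helper: wait until done(arg) returns true. The first
 * spin_limit failed polls use the cheap cpu_relax(); every later poll
 * yields, on the theory that a long wait means whoever we wait for may
 * not be running. Returns how many times it yielded.
 */
static unsigned long spin_wait_adaptive(bool (*done)(void *), void *arg,
					unsigned long spin_limit)
{
	unsigned long polls = 0, yields = 0;

	while (!done(arg)) {
		if (polls < spin_limit) {
			polls++;
			cpu_relax();
		} else {
			yields++;
			cpu_relax_yield();
		}
	}
	return yields;
}

/* Example condition: becomes true after `remaining` failed checks. */
struct countdown {
	unsigned long remaining;
};

static bool countdown_done(void *p)
{
	struct countdown *c = p;

	if (c->remaining == 0)
		return true;
	c->remaining--;
	return false;
}
```

Choosing spin_limit is exactly the hard part raised here: too low and short waits pay for a yield, too high and an overcommitted guest burns its time slice.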

Thanks,
Nick

* Re: [PATCH 2/5] stop_machine: yield CPU during stop machine
  2016-10-22  0:06     ` Nicholas Piggin
@ 2016-10-24  7:52         ` Christian Borntraeger
From: Christian Borntraeger @ 2016-10-24  7:52 UTC (permalink / raw)
  To: Nicholas Piggin, Peter Zijlstra
  Cc: linux-kernel, linux-s390, linux-arch, linuxppc-dev,
	Heiko Carstens, Martin Schwidefsky, Noam Camus, virtualization,
	xen-devel-request, kvm

On 10/22/2016 02:06 AM, Nicholas Piggin wrote:
> On Fri, 21 Oct 2016 14:05:36 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Fri, Oct 21, 2016 at 01:58:55PM +0200, Christian Borntraeger wrote:
>>> stop_machine can take a very long time if the hypervisor does
>>> overcommitment for guest CPUs. When waiting for "the one", lets
>>> give up our CPU by using the new cpu_relax_yield.  
>>
>> This seems something that would apply to most other virt stuff. Lets Cc
>> a few more lists for that.
>>
>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>> ---
>>>  kernel/stop_machine.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
>>> index ec9ab2f..1eb8266 100644
>>> --- a/kernel/stop_machine.c
>>> +++ b/kernel/stop_machine.c
>>> @@ -194,7 +194,7 @@ static int multi_cpu_stop(void *data)
>>>  	/* Simple state machine */
>>>  	do {
>>>  		/* Chill out and ensure we re-read multi_stop_state. */
>>> -		cpu_relax();
>>> +		cpu_relax_yield();
>>>  		if (msdata->state != curstate) {
>>>  			curstate = msdata->state;
>>>  			switch (curstate) {
>>> -- 
>>> 2.5.5
>>>   
> 
> This is the only caller of cpu_relax_yield()?

As of today, yes. Right now the yielding (a call to the hypervisor) in
cpu_relax is done only on s390. Some time ago Heiko removed it from
s390 as well, with commit 57f2ffe14fd125c2 ("s390: remove diag 44 calls
from cpu_relax()").

As it turned out, this made stop_machine run really slowly on virtualized
systems. For example, with large guests the kprobes test during bootup
took several seconds instead of running unnoticed. Therefore, we
reintroduced the yield with commit 4d92f50249eb ("s390: reintroduce
diag 44 calls for cpu_relax()"), but the only place where we noticed
the missing yield was the stop_machine code.

I would assume that we might find some other places where this makes
sense in the future, but I expect far fewer places that need a yield
than places that need lowlatency.

PS: We do something similar in our arch implementation of spinlocks,
but there we use a directed yield, as we know which CPU holds the lock.
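A toy model of the directed-yield idea, assuming a lock word that records the owner CPU (stored as cpu + 1 so that 0 means free). The encoding, the function names and yield_to_cpu() are hypothetical and not the s390 layout; in this sketch yield_to_cpu() only records its target, where a real guest would ask the hypervisor to run that CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock whose lock word records the owning CPU. */
struct owner_lock {
	atomic_uint owner; /* 0 = unlocked, otherwise owning cpu + 1 */
};

/* Last CPU targeted by a directed yield (for the model only). */
static int last_yield_target = -1;

/* Hypothetical hook: a real guest would ask the hypervisor to run
 * this CPU's virtual CPU; here we only record the target. */
static void yield_to_cpu(int cpu)
{
	last_yield_target = cpu;
}

static bool owner_spin_trylock(struct owner_lock *l, int cpu)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong(&l->owner, &expected,
					      (unsigned int)cpu + 1);
}

/* One waiting step: try to take the lock; if someone holds it, hand our
 * time slice to exactly that CPU instead of doing a blind yield. */
static bool owner_lock_step(struct owner_lock *l, int cpu)
{
	if (owner_spin_trylock(l, cpu))
		return true;

	unsigned int cur = atomic_load(&l->owner);
	if (cur != 0)
		yield_to_cpu((int)cur - 1);
	return false;
}

static void owner_spin_lock(struct owner_lock *l, int cpu)
{
	while (!owner_lock_step(l, cpu))
		;
}

static void owner_spin_unlock(struct owner_lock *l)
{
	atomic_store(&l->owner, 0);
}
```

Compared with a blind cpu_relax_yield(), the waiter can donate its time slice to the one vCPU that can actually release the lock. In stop_machine there is no single known CPU to target, so an undirected yield is what is left.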


> 
> As a step to removing cpu_yield_lowlatency this series is nice so I
> have no objection. But "general" kernel coders still have basically
> no chance of using this properly.
> 
> I wonder what can be done about that. I've got that spin_do/while
> series I'll rebase on top of this, but a spin_yield variant of them
> is of no more help to the caller.
> 
> What makes this unique? Long latency and not performance critical?

I think what makes this unique is that ALL CPUs spin and wait for one.
It was really the only place where I noticed a regression with Heiko's
first patch.
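The "ALL CPUs wait for one" property can be seen in a simplified userspace model of the state machine from the quoted diff: every thread must acknowledge a state before anyone advances, so each step runs at the pace of the slowest participant, and on an overcommitted host that may be a vCPU that is not running at all. The state and helper names are modeled loosely on kernel/stop_machine.c (the real code has more states and disables interrupts); threads stand in for CPUs and sched_yield() for cpu_relax_yield(). Older glibc versions need -pthread to link.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

enum stop_state { STOP_NONE, STOP_PREPARE, STOP_RUN, STOP_EXIT };

#define NTHREADS 4

struct stop_data {
	atomic_int state;
	atomic_int thread_ack; /* threads that still have to ack */
	atomic_int work_done;  /* threads that ran the payload */
};

/* Stand-in for cpu_relax_yield(): let a preempted peer make progress. */
static inline void cpu_relax_yield(void) { sched_yield(); }

static void set_state(struct stop_data *d, int newstate)
{
	/* Reset the ack count before publishing the new state. */
	atomic_store(&d->thread_ack, NTHREADS);
	atomic_store(&d->state, newstate);
}

/* The last thread to ack advances everyone to the next state. */
static void ack_state(struct stop_data *d)
{
	if (atomic_fetch_sub(&d->thread_ack, 1) == 1)
		set_state(d, atomic_load(&d->state) + 1);
}

static void *multi_stop(void *arg)
{
	struct stop_data *d = arg;
	int curstate = STOP_NONE;

	do {
		cpu_relax_yield(); /* the call the patch switches to */
		int newstate = atomic_load(&d->state);
		if (newstate != curstate) {
			curstate = newstate;
			if (curstate == STOP_RUN)
				atomic_fetch_add(&d->work_done, 1);
			ack_state(d);
		}
	} while (curstate != STOP_EXIT);
	return NULL;
}

/* Run NTHREADS workers through all states; returns work_done. */
static int run_stop_machine(void)
{
	struct stop_data d;
	pthread_t tid[NTHREADS];

	atomic_init(&d.state, STOP_NONE);
	atomic_init(&d.thread_ack, 0);
	atomic_init(&d.work_done, 0);
	set_state(&d, STOP_PREPARE);

	for (int i = 0; i < NTHREADS; i++) {
		int rc = pthread_create(&tid[i], NULL, multi_stop, &d);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);

	/* The final ack of STOP_EXIT moves the state one past it. */
	assert(atomic_load(&d.state) == STOP_EXIT + 1);
	return atomic_load(&d.work_done);
}
```

Because no thread can leave a state until the last one has acked it, a single descheduled participant stalls all the others; yielding in the wait loop gives the host a chance to run it.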

> Most places where we spin and maybe yield have been moved to arch
> code, but I wonder whether we can make an easier to use architecture
> independent API?


Peter, I will fix up the patch set (I forgot to remove cpu_relax_lowlatency
in 2 places) and push it to my tree for linux-next. Let's see what happens.
Would the tip tree be the right place if things work out OK?

* Re: [PATCH 2/5] stop_machine: yield CPU during stop machine
  2016-10-24  7:52         ` Christian Borntraeger
@ 2016-10-24  8:47           ` Peter Zijlstra
From: Peter Zijlstra @ 2016-10-24  8:47 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Nicholas Piggin, linux-kernel, linux-s390, linux-arch,
	linuxppc-dev, Heiko Carstens, Martin Schwidefsky, Noam Camus,
	virtualization, xen-devel-request, kvm

On Mon, Oct 24, 2016 at 09:52:31AM +0200, Christian Borntraeger wrote:
> Peter, I will fixup the patch set (I forgot to remove the lowlatency
> in 2 places) and push it on my tree for linux-next. Lets see what happens.
> Would the tip tree be the right place if things work out ok?

I think so; you're touching a fair bit of kernel/locking/ and there are
bound to be some conflicts with work there. So carrying it in the
locking tree might be best.

Thread overview:
2016-10-21 11:58 [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency Christian Borntraeger
2016-10-21 11:58 ` [PATCH 1/5] processor.h: introduce cpu_relax_yield Christian Borntraeger
2016-10-21 11:58 ` [PATCH 2/5] stop_machine: yield CPU during stop machine Christian Borntraeger
2016-10-21 12:05   ` Peter Zijlstra
2016-10-21 12:41     ` Juergen Gross
2016-10-22  0:06     ` Nicholas Piggin
2016-10-24  7:52       ` Christian Borntraeger
2016-10-24  8:47         ` Peter Zijlstra
2016-10-21 11:58 ` [PATCH 3/5] s390: make cpu_relax a barrier again Christian Borntraeger
2016-10-21 11:58 ` [PATCH 4/5] Remove cpu_relax_lowlatency users Christian Borntraeger
2016-10-21 11:58 ` [PATCH 5/5] remove cpu_relax_lowlatency Christian Borntraeger
2016-10-21 12:06 ` [PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency Peter Zijlstra
2016-10-21 14:57 ` David Miller
2016-10-21 15:08   ` Christian Borntraeger
2016-10-21 15:12     ` David Miller
