* [PATCH 0/7] preempt_count rework -v2
@ 2013-09-10 13:08 Peter Zijlstra
  2013-09-10 13:08 ` [PATCH 1/7] sched: Introduce preempt_count accessor functions Peter Zijlstra
                   ` (7 more replies)
  0 siblings, 8 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
	Peter Zijlstra

These patches optimize preempt_enable() in two ways: first by folding the
preempt and need_resched tests into one -- this should work for all
architectures -- and second by providing per-arch preempt_count
implementations, with x86 using a per-cpu preempt_count for the fastest
access.


These patches have been boot tested on CONFIG_PREEMPT=y x86_64 and survive
building an x86_64-defconfig kernel.

kernel/sched/core.c:kick_process() now looks like:

  ffffffff8106f3f0 <kick_process>:
  ffffffff8106f3f0:       55                      push   %rbp
  ffffffff8106f3f1:       65 ff 04 25 e0 b7 00    incl   %gs:0xb7e0
  ffffffff8106f3f8:       00 
  ffffffff8106f3f9:       48 89 e5                mov    %rsp,%rbp
  ffffffff8106f3fc:       48 8b 47 08             mov    0x8(%rdi),%rax
  ffffffff8106f400:       8b 50 18                mov    0x18(%rax),%edx
  ffffffff8106f403:       65 8b 04 25 1c b0 00    mov    %gs:0xb01c,%eax
  ffffffff8106f40a:       00 
  ffffffff8106f40b:       39 c2                   cmp    %eax,%edx
  ffffffff8106f40d:       74 1b                   je     ffffffff8106f42a <kick_process+0x3a>
  ffffffff8106f40f:       89 d1                   mov    %edx,%ecx
  ffffffff8106f411:       48 c7 c0 00 2c 01 00    mov    $0x12c00,%rax
  ffffffff8106f418:       48 8b 0c cd a0 bc cb    mov    -0x7e344360(,%rcx,8),%rcx
  ffffffff8106f41f:       81 
  ffffffff8106f420:       48 3b bc 08 00 08 00    cmp    0x800(%rax,%rcx,1),%rdi
  ffffffff8106f427:       00 
  ffffffff8106f428:       74 1e                   je     ffffffff8106f448 <kick_process+0x58>
* ffffffff8106f42a:       65 ff 0c 25 e0 b7 00    decl   %gs:0xb7e0
  ffffffff8106f431:       00 
* ffffffff8106f432:       0f 94 c0                sete   %al
* ffffffff8106f435:       84 c0                   test   %al,%al
* ffffffff8106f437:       75 02                   jne    ffffffff8106f43b <kick_process+0x4b>
  ffffffff8106f439:       5d                      pop    %rbp
  ffffffff8106f43a:       c3                      retq   
* ffffffff8106f43b:       e8 b0 b6 f9 ff          callq  ffffffff8100aaf0 <___preempt_schedule>
  ffffffff8106f440:       5d                      pop    %rbp
  ffffffff8106f441:       c3                      retq   
  ffffffff8106f442:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  ffffffff8106f448:       89 d7                   mov    %edx,%edi
  ffffffff8106f44a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  ffffffff8106f450:       ff 15 ea e0 ba 00       callq  *0xbae0ea(%rip)        # ffffffff81c1d540 <smp_ops+0x20>
  ffffffff8106f456:       eb d2                   jmp    ffffffff8106f42a <kick_process+0x3a>
  ffffffff8106f458:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  ffffffff8106f45f:       00 

The '*' marked lines are preempt_enable(); sadly GCC isn't able to get
rid of the sete+test :/ It's a rather frequent pattern in the kernel, so
'fixing' the x86 GCC backend to recognise this might be useful.
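
For reference, the C that generates the marked instructions is roughly the
following (condensed from patches 5 and 6 below, !CONFIG_DEBUG_PREEMPT
mapping; after patch 7 the preempt_schedule() call becomes
asm ("call ___preempt_schedule")):

  static __always_inline bool __preempt_count_dec_and_test(void)
  {
          unsigned char c;

          /* decrement the per-cpu count; c becomes 1 iff the result is 0 */
          asm ("decl " __percpu_arg(0) "; sete %1"
                          : "+m" (__preempt_count), "=qm" (c));

          return c != 0;
  }

  #define preempt_enable() \
  do { \
          barrier(); \
          if (unlikely(__preempt_count_dec_and_test())) \
                  preempt_schedule(); \
  } while (0)

The asm can only hand the zero flag back to C as a byte (the sete), which the
compiler then has to test again before branching -- hence the sete+test pair
that doesn't go away.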


---
 arch/alpha/include/asm/Kbuild      |   1 +
 arch/arc/include/asm/Kbuild        |   1 +
 arch/arm/include/asm/Kbuild        |   1 +
 arch/arm64/include/asm/Kbuild      |   1 +
 arch/avr32/include/asm/Kbuild      |   1 +
 arch/blackfin/include/asm/Kbuild   |   1 +
 arch/c6x/include/asm/Kbuild        |   1 +
 arch/cris/include/asm/Kbuild       |   1 +
 arch/frv/include/asm/Kbuild        |   1 +
 arch/h8300/include/asm/Kbuild      |   1 +
 arch/hexagon/include/asm/Kbuild    |   1 +
 arch/ia64/include/asm/Kbuild       |   1 +
 arch/m32r/include/asm/Kbuild       |   1 +
 arch/m68k/include/asm/Kbuild       |   1 +
 arch/metag/include/asm/Kbuild      |   1 +
 arch/microblaze/include/asm/Kbuild |   1 +
 arch/mips/include/asm/Kbuild       |   1 +
 arch/mips/mm/init.c                |   5 +-
 arch/mn10300/include/asm/Kbuild    |   1 +
 arch/openrisc/include/asm/Kbuild   |   1 +
 arch/parisc/include/asm/Kbuild     |   1 +
 arch/powerpc/include/asm/Kbuild    |   1 +
 arch/s390/include/asm/Kbuild       |   1 +
 arch/score/include/asm/Kbuild      |   1 +
 arch/sh/include/asm/Kbuild         |   1 +
 arch/sparc/include/asm/Kbuild      |   1 +
 arch/tile/include/asm/Kbuild       |   1 +
 arch/um/include/asm/Kbuild         |   1 +
 arch/unicore32/include/asm/Kbuild  |   1 +
 arch/x86/include/asm/calling.h     |  50 +++++++++++++++++
 arch/x86/include/asm/thread_info.h |   5 +-
 arch/x86/kernel/Makefile           |   2 +
 arch/x86/kernel/asm-offsets.c      |   1 -
 arch/x86/kernel/cpu/common.c       |   5 ++
 arch/x86/kernel/entry_32.S         |   7 +--
 arch/x86/kernel/entry_64.S         |   4 +-
 arch/x86/kernel/process_32.c       |  10 ++++
 arch/x86/kernel/process_64.c       |  10 ++++
 arch/x86/kernel/traps.c            |   4 +-
 arch/xtensa/include/asm/Kbuild     |   1 +
 include/linux/hardirq.h            |   8 +--
 include/linux/preempt.h            | 112 +++++++++++++++++--------------------
 include/linux/sched.h              |   5 --
 include/linux/thread_info.h        |   1 +
 include/linux/uaccess.h            |   8 +--
 include/trace/events/sched.h       |   2 +-
 init/main.c                        |   2 +-
 kernel/context_tracking.c          |   4 +-
 kernel/cpu/idle.c                  |   7 +++
 kernel/sched/core.c                |  54 +++++++++---------
 kernel/softirq.c                   |  16 +++---
 kernel/timer.c                     |   8 +--
 lib/smp_processor_id.c             |   3 +-
 53 files changed, 226 insertions(+), 136 deletions(-)



* [PATCH 1/7] sched: Introduce preempt_count accessor functions
  2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
  2013-09-10 13:08 ` [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
	Peter Zijlstra

[-- Attachment #1: peterz-preempt_count-accessors.patch --]
[-- Type: text/plain, Size: 4950 bytes --]

Replace the single preempt_count() 'function' that's an lvalue with
two proper functions:

 preempt_count() - returns the preempt_count value as an rvalue
 preempt_count_ptr() - returns a pointer to the preempt_count

Then change all sites that modify the preempt count to use
preempt_count_ptr().
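
A minimal before/after of a modifying site (the real conversions are in the
hunks below):

  /* before: preempt_count() was an lvalue */
  preempt_count() += val;

  /* after: reads keep using preempt_count(), modifications go through
   * the new pointer accessor */
  *preempt_count_ptr() += val;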

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/preempt.h |   20 ++++++++++++++------
 init/main.c             |    2 +-
 kernel/sched/core.c     |    4 ++--
 kernel/softirq.c        |    4 ++--
 kernel/timer.c          |    8 ++++----
 lib/smp_processor_id.c  |    3 +--
 6 files changed, 24 insertions(+), 17 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -10,19 +10,27 @@
 #include <linux/linkage.h>
 #include <linux/list.h>
 
+static __always_inline int preempt_count(void)
+{
+	return current_thread_info()->preempt_count;
+}
+
+static __always_inline int *preempt_count_ptr(void)
+{
+	return &current_thread_info()->preempt_count;
+}
+
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
   extern void add_preempt_count(int val);
   extern void sub_preempt_count(int val);
 #else
-# define add_preempt_count(val)	do { preempt_count() += (val); } while (0)
-# define sub_preempt_count(val)	do { preempt_count() -= (val); } while (0)
+# define add_preempt_count(val)	do { *preempt_count_ptr() += (val); } while (0)
+# define sub_preempt_count(val)	do { *preempt_count_ptr() -= (val); } while (0)
 #endif
 
 #define inc_preempt_count() add_preempt_count(1)
 #define dec_preempt_count() sub_preempt_count(1)
 
-#define preempt_count()	(current_thread_info()->preempt_count)
-
 #ifdef CONFIG_PREEMPT
 
 asmlinkage void preempt_schedule(void);
@@ -81,9 +89,9 @@ do { \
 
 /* For debugging and tracer internals only! */
 #define add_preempt_count_notrace(val)			\
-	do { preempt_count() += (val); } while (0)
+	do { *preempt_count_ptr() += (val); } while (0)
 #define sub_preempt_count_notrace(val)			\
-	do { preempt_count() -= (val); } while (0)
+	do { *preempt_count_ptr() -= (val); } while (0)
 #define inc_preempt_count_notrace() add_preempt_count_notrace(1)
 #define dec_preempt_count_notrace() sub_preempt_count_notrace(1)
 
--- a/init/main.c
+++ b/init/main.c
@@ -690,7 +690,7 @@ int __init_or_module do_one_initcall(ini
 
 	if (preempt_count() != count) {
 		sprintf(msgbuf, "preemption imbalance ");
-		preempt_count() = count;
+		*preempt_count_ptr() = count;
 	}
 	if (irqs_disabled()) {
 		strlcat(msgbuf, "disabled interrupts ", sizeof(msgbuf));
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2233,7 +2233,7 @@ void __kprobes add_preempt_count(int val
 	if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
 		return;
 #endif
-	preempt_count() += val;
+	*preempt_count_ptr() += val;
 #ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Spinlock count overflowing soon?
@@ -2264,7 +2264,7 @@ void __kprobes sub_preempt_count(int val
 
 	if (preempt_count() == val)
 		trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
-	preempt_count() -= val;
+	*preempt_count_ptr() -= val;
 }
 EXPORT_SYMBOL(sub_preempt_count);
 
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -106,7 +106,7 @@ static void __local_bh_disable(unsigned
 	 * We must manually increment preempt_count here and manually
 	 * call the trace_preempt_off later.
 	 */
-	preempt_count() += cnt;
+	*preempt_count_ptr() += cnt;
 	/*
 	 * Were softirqs turned off above:
 	 */
@@ -256,7 +256,7 @@ asmlinkage void __do_softirq(void)
 				       " exited with %08x?\n", vec_nr,
 				       softirq_to_name[vec_nr], h->action,
 				       prev_count, preempt_count());
-				preempt_count() = prev_count;
+				*preempt_count_ptr() = prev_count;
 			}
 
 			rcu_bh_qs(cpu);
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1092,7 +1092,7 @@ static int cascade(struct tvec_base *bas
 static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
 			  unsigned long data)
 {
-	int preempt_count = preempt_count();
+	int count = preempt_count();
 
 #ifdef CONFIG_LOCKDEP
 	/*
@@ -1119,16 +1119,16 @@ static void call_timer_fn(struct timer_l
 
 	lock_map_release(&lockdep_map);
 
-	if (preempt_count != preempt_count()) {
+	if (count != preempt_count()) {
 		WARN_ONCE(1, "timer: %pF preempt leak: %08x -> %08x\n",
-			  fn, preempt_count, preempt_count());
+			  fn, count, preempt_count());
 		/*
 		 * Restore the preempt count. That gives us a decent
 		 * chance to survive and extract information. If the
 		 * callback kept a lock held, bad luck, but not worse
 		 * than the BUG() we had.
 		 */
-		preempt_count() = preempt_count;
+		*preempt_count_ptr() = count;
 	}
 }
 
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -9,10 +9,9 @@
 
 notrace unsigned int debug_smp_processor_id(void)
 {
-	unsigned long preempt_count = preempt_count();
 	int this_cpu = raw_smp_processor_id();
 
-	if (likely(preempt_count))
+	if (likely(preempt_count()))
 		goto out;
 
 	if (irqs_disabled())




* [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
  2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
  2013-09-10 13:08 ` [PATCH 1/7] sched: Introduce preempt_count accessor functions Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
  2013-09-11  1:59   ` Andy Lutomirski
  2013-09-11 11:14   ` Peter Zijlstra
  2013-09-10 13:08 ` [PATCH 3/7] sched, arch: Create asm/preempt.h Peter Zijlstra
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
	Peter Zijlstra

[-- Attachment #1: peterz-preempt_count-need_resched.patch --]
[-- Type: text/plain, Size: 7108 bytes --]

In order to combine the preemption and need_resched tests we need to
fold the need_resched information into the preempt_count value.

We keep the existing TIF_NEED_RESCHED infrastructure in place, but test
it and fold its value into preempt_count at three sites, namely:

 - resched_task() when setting TIF_NEED_RESCHED on the current task
 - scheduler_ipi() when resched_task() sets TIF_NEED_RESCHED on a
                   remote task it follows up with a reschedule IPI,
                   so we can fold the bit into the cpu-local
                   preempt_count from there.
 - cpu_idle_loop() for when resched_task() found tsk_is_polling().

We use an inverted bit to indicate need_resched, so that a preempt_count
of 0 means both need_resched and !atomic -- i.e. we both can and should
reschedule.
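
To make the inverted encoding concrete, a small sketch (using the 32-bit
layout defined below; illustration only, not kernel code):

  #define PREEMPT_NEED_RESCHED  0x80000000  /* set == NO resched pending */

  /* what preempt_enable() will effectively test (see the later patches) */
  static inline bool enable_hits_zero(unsigned int count)
  {
          return --count == 0;
  }

  /*
   * enable_hits_zero(PREEMPT_NEED_RESCHED | 1) == false  no resched pending
   * enable_hits_zero(2)                        == false  still non-preemptible
   * enable_hits_zero(1)                        == true   preemptible and
   *                                                      resched pending
   */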

Also remove the barrier() in preempt_enable() between
preempt_enable_no_resched() and preempt_check_resched() to avoid having
to reload the preemption value and to allow the compiler to use the
flags of the previous decrement. I couldn't come up with any sane
reason for this barrier() to be there, as preempt_enable_no_resched()
already has a barrier() before doing the decrement.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/preempt.h     |   42 +++++++++++++++++++++++++++++++++++++-----
 include/linux/sched.h       |    2 +-
 include/linux/thread_info.h |    1 +
 kernel/context_tracking.c   |    2 +-
 kernel/cpu/idle.c           |   10 ++++++++++
 kernel/sched/core.c         |   18 ++++++++++++++----
 6 files changed, 64 insertions(+), 11 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -10,9 +10,19 @@
 #include <linux/linkage.h>
 #include <linux/list.h>
 
+/*
+ * We use the MSB mostly because it's available; see <linux/hardirq.h> for
+ * the other bits.
+ */
+#define PREEMPT_NEED_RESCHED	0x80000000
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
 static __always_inline int preempt_count(void)
 {
-	return current_thread_info()->preempt_count;
+	return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED;
 }
 
 static __always_inline int *preempt_count_ptr(void)
@@ -20,6 +30,30 @@ static __always_inline int *preempt_coun
 	return &current_thread_info()->preempt_count;
 }
 
+/*
+ * We fold the NEED_RESCHED bit into the preempt count such that
+ * preempt_enable() can decrement and test for needing to reschedule with a
+ * single instruction.
+ *
+ * We invert the actual bit, so that when the decrement hits 0 we know we both
+ * need to resched (the bit is cleared) and can resched (no preempt count).
+ */
+
+static __always_inline void set_preempt_need_resched(void)
+{
+	*preempt_count_ptr() &= ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline void clear_preempt_need_resched(void)
+{
+	*preempt_count_ptr() |= PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline bool test_preempt_need_resched(void)
+{
+	return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED);
+}
+
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
   extern void add_preempt_count(int val);
   extern void sub_preempt_count(int val);
@@ -37,7 +71,7 @@ asmlinkage void preempt_schedule(void);
 
 #define preempt_check_resched() \
 do { \
-	if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+	if (unlikely(!*preempt_count_ptr())) \
 		preempt_schedule(); \
 } while (0)
 
@@ -47,7 +81,7 @@ void preempt_schedule_context(void);
 
 #define preempt_check_resched_context() \
 do { \
-	if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+	if (unlikely(!*preempt_count_ptr())) \
 		preempt_schedule_context(); \
 } while (0)
 #else
@@ -83,7 +117,6 @@ do { \
 #define preempt_enable() \
 do { \
 	preempt_enable_no_resched(); \
-	barrier(); \
 	preempt_check_resched(); \
 } while (0)
 
@@ -111,7 +144,6 @@ do { \
 #define preempt_enable_notrace() \
 do { \
 	preempt_enable_no_resched_notrace(); \
-	barrier(); \
 	preempt_check_resched_context(); \
 } while (0)
 
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2405,7 +2405,7 @@ static inline int signal_pending_state(l
 
 static inline int need_resched(void)
 {
-	return unlikely(test_thread_flag(TIF_NEED_RESCHED));
+	return unlikely(test_preempt_need_resched());
 }
 
 /*
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -104,6 +104,7 @@ static inline int test_ti_thread_flag(st
 #define test_thread_flag(flag) \
 	test_ti_thread_flag(current_thread_info(), flag)
 
+#define test_need_resched()	test_thread_flag(TIF_NEED_RESCHED)
 #define set_need_resched()	set_thread_flag(TIF_NEED_RESCHED)
 #define clear_need_resched()	clear_thread_flag(TIF_NEED_RESCHED)
 
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -115,7 +115,7 @@ void __sched notrace preempt_schedule_co
 {
 	enum ctx_state prev_ctx;
 
-	if (likely(!preemptible()))
+	if (likely(preempt_count() || irqs_disabled()))
 		return;
 
 	/*
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -106,6 +106,13 @@ static void cpu_idle_loop(void)
 				current_set_polling();
 			}
 			arch_cpu_idle_exit();
+			/*
+			 * We need to test and propagate the TIF_NEED_RESCHED
+			 * bit here because we might not have sent the
+			 * reschedule IPI to idle tasks.
+			 */
+			if (test_need_resched())
+				set_preempt_need_resched();
 		}
 		tick_nohz_idle_exit();
 		schedule_preempt_disabled();
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -526,8 +526,10 @@ void resched_task(struct task_struct *p)
 	set_tsk_need_resched(p);
 
 	cpu = task_cpu(p);
-	if (cpu == smp_processor_id())
+	if (cpu == smp_processor_id()) {
+		set_preempt_need_resched();
 		return;
+	}
 
 	/* NEED_RESCHED must be visible before we test polling */
 	smp_mb();
@@ -1411,6 +1413,14 @@ static void sched_ttwu_pending(void)
 
 void scheduler_ipi(void)
 {
+	/*
+	 * Fold TIF_NEED_RESCHED into the preempt_count; anybody setting
+	 * TIF_NEED_RESCHED remotely (for the first time) will also send
+	 * this IPI.
+	 */
+	if (test_need_resched())
+		set_preempt_need_resched();
+
 	if (llist_empty(&this_rq()->wake_list)
 			&& !tick_nohz_full_cpu(smp_processor_id())
 			&& !got_nohz_idle_kick())
@@ -2445,6 +2455,7 @@ static void __sched __schedule(void)
 	put_prev_task(rq, prev);
 	next = pick_next_task(rq);
 	clear_tsk_need_resched(prev);
+	clear_preempt_need_resched();
 	rq->skip_clock_update = 0;
 
 	if (likely(prev != next)) {
@@ -2531,7 +2542,7 @@ asmlinkage void __sched notrace preempt_
 	 * If there is a non-zero preempt_count or interrupts are disabled,
 	 * we do not want to preempt the current task. Just return..
 	 */
-	if (likely(!preemptible()))
+	if (likely(preempt_count() || irqs_disabled()))
 		return;
 
 	do {
@@ -2556,11 +2567,10 @@ EXPORT_SYMBOL(preempt_schedule);
  */
 asmlinkage void __sched preempt_schedule_irq(void)
 {
-	struct thread_info *ti = current_thread_info();
 	enum ctx_state prev_state;
 
 	/* Catch callers which need to be fixed */
-	BUG_ON(ti->preempt_count || !irqs_disabled());
+	BUG_ON(preempt_count() || !irqs_disabled());
 
 	prev_state = exception_enter();
 




* [PATCH 3/7] sched, arch: Create asm/preempt.h
  2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
  2013-09-10 13:08 ` [PATCH 1/7] sched: Introduce preempt_count accessor functions Peter Zijlstra
  2013-09-10 13:08 ` [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
  2013-09-10 13:08 ` [PATCH 4/7] sched: Create more preempt_count accessors Peter Zijlstra
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
	Peter Zijlstra

[-- Attachment #1: peterz-asm-preempt_count.patch --]
[-- Type: text/plain, Size: 10772 bytes --]

In order to prepare for per-arch implementations of preempt_count, move
the required bits into an asm-generic header and use this for all
archs.
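
For each architecture converted below, the "generic-y += preempt.h" Kbuild
line makes the build generate a trivial wrapper header; conceptually (sketch
of what Kbuild emits):

  /* arch/$(ARCH)/include/generated/asm/preempt.h */
  #include <asm-generic/preempt.h>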

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/alpha/include/asm/Kbuild      |    1 
 arch/arc/include/asm/Kbuild        |    1 
 arch/arm/include/asm/Kbuild        |    1 
 arch/arm64/include/asm/Kbuild      |    1 
 arch/avr32/include/asm/Kbuild      |    1 
 arch/blackfin/include/asm/Kbuild   |    1 
 arch/c6x/include/asm/Kbuild        |    1 
 arch/cris/include/asm/Kbuild       |    1 
 arch/frv/include/asm/Kbuild        |    1 
 arch/h8300/include/asm/Kbuild      |    1 
 arch/hexagon/include/asm/Kbuild    |    1 
 arch/ia64/include/asm/Kbuild       |    1 
 arch/m32r/include/asm/Kbuild       |    1 
 arch/m68k/include/asm/Kbuild       |    1 
 arch/metag/include/asm/Kbuild      |    1 
 arch/microblaze/include/asm/Kbuild |    1 
 arch/mips/include/asm/Kbuild       |    1 
 arch/mn10300/include/asm/Kbuild    |    1 
 arch/openrisc/include/asm/Kbuild   |    1 
 arch/parisc/include/asm/Kbuild     |    1 
 arch/powerpc/include/asm/Kbuild    |    1 
 arch/s390/include/asm/Kbuild       |    1 
 arch/score/include/asm/Kbuild      |    1 
 arch/sh/include/asm/Kbuild         |    1 
 arch/sparc/include/asm/Kbuild      |    1 
 arch/tile/include/asm/Kbuild       |    1 
 arch/um/include/asm/Kbuild         |    1 
 arch/unicore32/include/asm/Kbuild  |    1 
 arch/x86/include/asm/Kbuild        |    1 
 arch/xtensa/include/asm/Kbuild     |    1 
 include/asm-generic/preempt.h      |   44 +++++++++++++++++++++++++++++++++++++
 include/linux/preempt.h            |   39 --------------------------------
 32 files changed, 75 insertions(+), 38 deletions(-)

--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
 
 generic-y += exec.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -46,3 +46,4 @@ generic-y += ucontext.h
 generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -33,3 +33,4 @@ generic-y += timex.h
 generic-y += trace_clock.h
 generic-y += types.h
 generic-y += unaligned.h
+generic-y += preempt.h
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -50,3 +50,4 @@ generic-y += unaligned.h
 generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/avr32/include/asm/Kbuild
+++ b/arch/avr32/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y	+= clkdev.h
 generic-y	+= exec.h
 generic-y	+= trace_clock.h
 generic-y	+= param.h
+generic-y += preempt.h
--- a/arch/blackfin/include/asm/Kbuild
+++ b/arch/blackfin/include/asm/Kbuild
@@ -44,3 +44,4 @@ generic-y += ucontext.h
 generic-y += unaligned.h
 generic-y += user.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/c6x/include/asm/Kbuild
+++ b/arch/c6x/include/asm/Kbuild
@@ -56,3 +56,4 @@ generic-y += ucontext.h
 generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/cris/include/asm/Kbuild
+++ b/arch/cris/include/asm/Kbuild
@@ -11,3 +11,4 @@ generic-y += module.h
 generic-y += trace_clock.h
 generic-y += vga.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/frv/include/asm/Kbuild
+++ b/arch/frv/include/asm/Kbuild
@@ -2,3 +2,4 @@
 generic-y += clkdev.h
 generic-y += exec.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/h8300/include/asm/Kbuild
+++ b/arch/h8300/include/asm/Kbuild
@@ -6,3 +6,4 @@ generic-y += mmu.h
 generic-y += module.h
 generic-y += trace_clock.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -53,3 +53,4 @@ generic-y += types.h
 generic-y += ucontext.h
 generic-y += unaligned.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -3,4 +3,5 @@ generic-y += clkdev.h
 generic-y += exec.h
 generic-y += kvm_para.h
 generic-y += trace_clock.h
+generic-y += preempt.h
 generic-y += vtime.h
\ No newline at end of file
--- a/arch/m32r/include/asm/Kbuild
+++ b/arch/m32r/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
 generic-y += exec.h
 generic-y += module.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -31,3 +31,4 @@ generic-y += trace_clock.h
 generic-y += types.h
 generic-y += word-at-a-time.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/metag/include/asm/Kbuild
+++ b/arch/metag/include/asm/Kbuild
@@ -52,3 +52,4 @@ generic-y += unaligned.h
 generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
 generic-y += exec.h
 generic-y += trace_clock.h
 generic-y += syscalls.h
+generic-y += preempt.h
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -1,2 +1,3 @@
 # MIPS headers
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/mn10300/include/asm/Kbuild
+++ b/arch/mn10300/include/asm/Kbuild
@@ -2,3 +2,4 @@
 generic-y += clkdev.h
 generic-y += exec.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -67,3 +67,4 @@ generic-y += ucontext.h
 generic-y += user.h
 generic-y += word-at-a-time.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += word-at-a-time.h auxvec.h u
 	  div64.h irq_regs.h kdebug.h kvm_para.h local64.h local.h param.h \
 	  poll.h xor.h clkdev.h exec.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -2,4 +2,5 @@
 generic-y += clkdev.h
 generic-y += rwsem.h
 generic-y += trace_clock.h
+generic-y += preempt.h
 generic-y += vtime.h
\ No newline at end of file
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -2,3 +2,4 @@
 
 generic-y += clkdev.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/score/include/asm/Kbuild
+++ b/arch/score/include/asm/Kbuild
@@ -4,3 +4,4 @@ header-y +=
 generic-y += clkdev.h
 generic-y += trace_clock.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -34,3 +34,4 @@ generic-y += termios.h
 generic-y += trace_clock.h
 generic-y += ucontext.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -16,3 +16,4 @@ generic-y += serial.h
 generic-y += trace_clock.h
 generic-y += types.h
 generic-y += word-at-a-time.h
+generic-y += preempt.h
--- a/arch/tile/include/asm/Kbuild
+++ b/arch/tile/include/asm/Kbuild
@@ -37,3 +37,4 @@ generic-y += termios.h
 generic-y += trace_clock.h
 generic-y += types.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += hw_irq.h irq_regs.h kdebug.
 generic-y += ftrace.h pci.h io.h param.h delay.h mutex.h current.h exec.h
 generic-y += switch_to.h clkdev.h
 generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/unicore32/include/asm/Kbuild
+++ b/arch/unicore32/include/asm/Kbuild
@@ -60,3 +60,4 @@ generic-y += unaligned.h
 generic-y += user.h
 generic-y += vga.h
 generic-y += xor.h
+generic-y += preempt.h
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -5,3 +5,4 @@ genhdr-y += unistd_64.h
 genhdr-y += unistd_x32.h
 
 generic-y += clkdev.h
+generic-y += preempt.h
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -28,3 +28,4 @@ generic-y += termios.h
 generic-y += topology.h
 generic-y += trace_clock.h
 generic-y += xor.h
+generic-y += preempt.h
--- /dev/null
+++ b/include/asm-generic/preempt.h
@@ -0,0 +1,44 @@
+#ifndef __ASM_PREEMPT_H
+#define __ASM_PREEMPT_H
+
+#include <linux/thread_info.h>
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
+static __always_inline int preempt_count(void)
+{
+	return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline int *preempt_count_ptr(void)
+{
+	return &current_thread_info()->preempt_count;
+}
+
+/*
+ * We fold the NEED_RESCHED bit into the preempt count such that
+ * preempt_enable() can decrement and test for needing to reschedule with a
+ * single instruction.
+ *
+ * We invert the actual bit, so that when the decrement hits 0 we know we both
+ * need to resched (the bit is cleared) and can resched (no preempt count).
+ */
+
+static __always_inline void set_preempt_need_resched(void)
+{
+	*preempt_count_ptr() &= ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline void clear_preempt_need_resched(void)
+{
+	*preempt_count_ptr() |= PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline bool test_preempt_need_resched(void)
+{
+	return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED);
+}
+
+#endif /* __ASM_PREEMPT_H */
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -6,7 +6,6 @@
  * preempt_count (used for kernel preemption, interrupt count, etc.)
  */
 
-#include <linux/thread_info.h>
 #include <linux/linkage.h>
 #include <linux/list.h>
 
@@ -16,43 +15,7 @@
  */
 #define PREEMPT_NEED_RESCHED	0x80000000
 
-/*
- * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
- * that think a non-zero value indicates we cannot preempt.
- */
-static __always_inline int preempt_count(void)
-{
-	return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED;
-}
-
-static __always_inline int *preempt_count_ptr(void)
-{
-	return &current_thread_info()->preempt_count;
-}
-
-/*
- * We fold the NEED_RESCHED bit into the preempt count such that
- * preempt_enable() can decrement and test for needing to reschedule with a
- * single instruction.
- *
- * We invert the actual bit, so that when the decrement hits 0 we know we both
- * need to resched (the bit is cleared) and can resched (no preempt count).
- */
-
-static __always_inline void set_preempt_need_resched(void)
-{
-	*preempt_count_ptr() &= ~PREEMPT_NEED_RESCHED;
-}
-
-static __always_inline void clear_preempt_need_resched(void)
-{
-	*preempt_count_ptr() |= PREEMPT_NEED_RESCHED;
-}
-
-static __always_inline bool test_preempt_need_resched(void)
-{
-	return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED);
-}
+#include <asm/preempt.h>
 
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
   extern void add_preempt_count(int val);




* [PATCH 4/7] sched: Create more preempt_count accessors
  2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
                   ` (2 preceding siblings ...)
  2013-09-10 13:08 ` [PATCH 3/7] sched, arch: Create asm/preempt.h Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
  2013-09-10 13:08 ` [PATCH 5/7] sched: Extract the basic add/sub preempt_count modifiers Peter Zijlstra
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
	Peter Zijlstra

[-- Attachment #1: peterz-task_preempt_count.patch --]
[-- Type: text/plain, Size: 2762 bytes --]

We need a few special preempt_count accessors:
 - task_preempt_count() for when we're interested in the preemption
   count of another (non-running) task.
 - init_task_preempt_count() for properly initializing the preemption
   count.
 - init_idle_preempt_count() a special case of the above for the idle
   threads.

With these no generic code touches thread_info::preempt_count anymore,
and architectures can choose to remove it.
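
Usage sketch (the real conversions are in the hunks below):

  /* inspecting another (non-running) task, e.g. from the tracepoint code */
  if (task_preempt_count(p) & PREEMPT_ACTIVE)
          state = TASK_RUNNING | TASK_STATE_MAX;

  /* fork: the child starts with preemption disabled */
  init_task_preempt_count(p);

  /* cpu bringup: the idle task starts out preemptible */
  init_idle_preempt_count(idle, cpu);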

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 include/asm-generic/preempt.h |   14 ++++++++++++++
 include/trace/events/sched.h  |    2 +-
 kernel/sched/core.c           |    7 +++----
 3 files changed, 18 insertions(+), 5 deletions(-)

--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -18,6 +18,20 @@ static __always_inline int *preempt_coun
 }
 
 /*
+ * must be macros to avoid header recursion hell
+ */
+#define task_preempt_count(p) \
+	(task_thread_info(p)->preempt_count & ~PREEMPT_NEED_RESCHED)
+
+#define init_task_preempt_count(p) do { \
+	task_thread_info(p)->preempt_count = 1 | PREEMPT_NEED_RESCHED; \
+} while (0)
+
+#define init_idle_preempt_count(p, cpu) do { \
+	task_thread_info(p)->preempt_count = 0; \
+} while (0)
+
+/*
  * We fold the NEED_RESCHED bit into the preempt count such that
  * preempt_enable() can decrement and test for needing to reschedule with a
  * single instruction.
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -100,7 +100,7 @@ static inline long __trace_sched_switch_
 	/*
 	 * For all intents and purposes a preempted task is a running task.
 	 */
-	if (task_thread_info(p)->preempt_count & PREEMPT_ACTIVE)
+	if (task_preempt_count(p) & PREEMPT_ACTIVE)
 		state = TASK_RUNNING | TASK_STATE_MAX;
 #endif
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -996,7 +996,7 @@ void set_task_cpu(struct task_struct *p,
 	 * ttwu() will sort out the placement.
 	 */
 	WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
-			!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
+			!(task_preempt_count(p) & PREEMPT_ACTIVE));
 
 #ifdef CONFIG_LOCKDEP
 	/*
@@ -1743,8 +1743,7 @@ void sched_fork(struct task_struct *p)
 	p->on_cpu = 0;
 #endif
 #ifdef CONFIG_PREEMPT_COUNT
-	/* Want to start with kernel preemption disabled. */
-	task_thread_info(p)->preempt_count = 1;
+	init_task_preempt_count(p);
 #endif
 #ifdef CONFIG_SMP
 	plist_node_init(&p->pushable_tasks, MAX_PRIO);
@@ -4237,7 +4236,7 @@ void init_idle(struct task_struct *idle,
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 
 	/* Set the preempt count _outside_ the spinlocks! */
-	task_thread_info(idle)->preempt_count = 0;
+	init_idle_preempt_count(idle, cpu);
 
 	/*
 	 * The idle tasks have their own, simple scheduling class:




* [PATCH 5/7] sched: Extract the basic add/sub preempt_count modifiers
  2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
                   ` (3 preceding siblings ...)
  2013-09-10 13:08 ` [PATCH 4/7] sched: Create more preempt_count accessors Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
  2013-09-10 13:08 ` [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
	Peter Zijlstra

[-- Attachment #1: peterz-cleanup-preempt.patch --]
[-- Type: text/plain, Size: 14016 bytes --]

Rewrite the preempt_count macros in order to extract the 3 basic
preempt_count value modifiers:

  __preempt_count_add()
  __preempt_count_sub()

and the new:

  __preempt_count_dec_and_test()

And since we're at it anyway, replace the unconventional
$op_preempt_count names with the more conventional preempt_count_$op.

Since these basic operators are equivalent to the previous _notrace()
variants, do away with the _notrace() versions.
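
The resulting mapping of old to new names (as used throughout the hunks
below):

  add_preempt_count(val)          -> preempt_count_add(val)
  sub_preempt_count(val)          -> preempt_count_sub(val)
  inc_preempt_count()             -> preempt_count_inc()
  dec_preempt_count()             -> preempt_count_dec()
  add_preempt_count_notrace(val)  -> __preempt_count_add(val)
  sub_preempt_count_notrace(val)  -> __preempt_count_sub(val)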

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/mips/mm/init.c           |    5 -
 arch/x86/kernel/traps.c       |    4 -
 include/asm-generic/preempt.h |   35 +++++++++++++
 include/linux/hardirq.h       |    8 +--
 include/linux/preempt.h       |  106 +++++++++++++++++++-----------------------
 include/linux/sched.h         |    5 -
 include/linux/uaccess.h       |    8 ---
 kernel/context_tracking.c     |    2 
 kernel/sched/core.c           |   29 ++++-------
 kernel/softirq.c              |   12 ++--
 10 files changed, 112 insertions(+), 102 deletions(-)

--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -124,7 +124,7 @@ void *kmap_coherent(struct page *page, u
 
 	BUG_ON(Page_dcache_dirty(page));
 
-	inc_preempt_count();
+	pagefault_disable();
 	idx = (addr >> PAGE_SHIFT) & (FIX_N_COLOURS - 1);
 #ifdef CONFIG_MIPS_MT_SMTC
 	idx += FIX_N_COLOURS * smp_processor_id() +
@@ -193,8 +193,7 @@ void kunmap_coherent(void)
 	write_c0_entryhi(old_ctx);
 	EXIT_CRITICAL(flags);
 #endif
-	dec_preempt_count();
-	preempt_check_resched();
+	pagefault_enable();
 }
 
 void copy_user_highpage(struct page *to, struct page *from,
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -88,7 +88,7 @@ static inline void conditional_sti(struc
 
 static inline void preempt_conditional_sti(struct pt_regs *regs)
 {
-	inc_preempt_count();
+	preempt_count_inc();
 	if (regs->flags & X86_EFLAGS_IF)
 		local_irq_enable();
 }
@@ -103,7 +103,7 @@ static inline void preempt_conditional_c
 {
 	if (regs->flags & X86_EFLAGS_IF)
 		local_irq_disable();
-	dec_preempt_count();
+	preempt_count_dec();
 }
 
 static int __kprobes
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -55,4 +55,39 @@ static __always_inline bool test_preempt
 	return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED);
 }
 
+/*
+ * The various preempt_count add/sub methods
+ */
+
+static __always_inline void __preempt_count_add(int val)
+{
+	*preempt_count_ptr() += val;
+}
+
+static __always_inline void __preempt_count_sub(int val)
+{
+	*preempt_count_ptr() -= val;
+}
+
+static __always_inline bool __preempt_count_dec_and_test(void)
+{
+	return !--*preempt_count_ptr();
+}
+
+/*
+ * Returns true when we need to resched -- even if we can not.
+ */
+static __always_inline bool need_resched(void)
+{
+	return unlikely(test_preempt_need_resched());
+}
+
+/*
+ * Returns true when we need to resched and can (barring IRQ state).
+ */
+static __always_inline bool should_resched(void)
+{
+	return unlikely(!*preempt_count_ptr());
+}
+
 #endif /* __ASM_PREEMPT_H */
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -37,7 +37,7 @@ extern void rcu_nmi_exit(void);
 #define __irq_enter()					\
 	do {						\
 		account_irq_enter_time(current);	\
-		add_preempt_count(HARDIRQ_OFFSET);	\
+		preempt_count_add(HARDIRQ_OFFSET);	\
 		trace_hardirq_enter();			\
 	} while (0)
 
@@ -53,7 +53,7 @@ extern void irq_enter(void);
 	do {						\
 		trace_hardirq_exit();			\
 		account_irq_exit_time(current);		\
-		sub_preempt_count(HARDIRQ_OFFSET);	\
+		preempt_count_sub(HARDIRQ_OFFSET);	\
 	} while (0)
 
 /*
@@ -66,7 +66,7 @@ extern void irq_exit(void);
 		lockdep_off();					\
 		ftrace_nmi_enter();				\
 		BUG_ON(in_nmi());				\
-		add_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET);	\
+		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
 		rcu_nmi_enter();				\
 		trace_hardirq_enter();				\
 	} while (0)
@@ -76,7 +76,7 @@ extern void irq_exit(void);
 		trace_hardirq_exit();				\
 		rcu_nmi_exit();					\
 		BUG_ON(!in_nmi());				\
-		sub_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET);	\
+		preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
 		ftrace_nmi_exit();				\
 		lockdep_on();					\
 	} while (0)
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -18,97 +18,86 @@
 #include <asm/preempt.h>
 
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
-  extern void add_preempt_count(int val);
-  extern void sub_preempt_count(int val);
+extern void preempt_count_add(int val);
+extern void preempt_count_sub(int val);
+#define preempt_count_dec_and_test() ({ preempt_count_sub(1); should_resched(); })
 #else
-# define add_preempt_count(val)	do { *preempt_count_ptr() += (val); } while (0)
-# define sub_preempt_count(val)	do { *preempt_count_ptr() -= (val); } while (0)
+#define preempt_count_add(val)	__preempt_count_add(val)
+#define preempt_count_sub(val)	__preempt_count_sub(val)
+#define preempt_count_dec_and_test() __preempt_count_dec_and_test()
 #endif
 
-#define inc_preempt_count() add_preempt_count(1)
-#define dec_preempt_count() sub_preempt_count(1)
-
-#ifdef CONFIG_PREEMPT
-
-asmlinkage void preempt_schedule(void);
-
-#define preempt_check_resched() \
-do { \
-	if (unlikely(!*preempt_count_ptr())) \
-		preempt_schedule(); \
-} while (0)
-
-#ifdef CONFIG_CONTEXT_TRACKING
-
-void preempt_schedule_context(void);
-
-#define preempt_check_resched_context() \
-do { \
-	if (unlikely(!*preempt_count_ptr())) \
-		preempt_schedule_context(); \
-} while (0)
-#else
-
-#define preempt_check_resched_context() preempt_check_resched()
-
-#endif /* CONFIG_CONTEXT_TRACKING */
-
-#else /* !CONFIG_PREEMPT */
-
-#define preempt_check_resched()		do { } while (0)
-#define preempt_check_resched_context()	do { } while (0)
-
-#endif /* CONFIG_PREEMPT */
+#define __preempt_count_inc() __preempt_count_add(1)
+#define __preempt_count_dec() __preempt_count_sub(1)
 
+#define preempt_count_inc() preempt_count_add(1)
+#define preempt_count_dec() preempt_count_sub(1)
 
 #ifdef CONFIG_PREEMPT_COUNT
 
 #define preempt_disable() \
 do { \
-	inc_preempt_count(); \
+	preempt_count_inc(); \
 	barrier(); \
 } while (0)
 
 #define sched_preempt_enable_no_resched() \
 do { \
 	barrier(); \
-	dec_preempt_count(); \
+	preempt_count_dec(); \
 } while (0)
 
-#define preempt_enable_no_resched()	sched_preempt_enable_no_resched()
+#define preempt_enable_no_resched() sched_preempt_enable_no_resched()
 
+#ifdef CONFIG_PREEMPT
+asmlinkage void preempt_schedule(void);
 #define preempt_enable() \
 do { \
-	preempt_enable_no_resched(); \
-	preempt_check_resched(); \
+	barrier(); \
+	if (unlikely(preempt_count_dec_and_test())) \
+		preempt_schedule(); \
 } while (0)
 
-/* For debugging and tracer internals only! */
-#define add_preempt_count_notrace(val)			\
-	do { *preempt_count_ptr() += (val); } while (0)
-#define sub_preempt_count_notrace(val)			\
-	do { *preempt_count_ptr() -= (val); } while (0)
-#define inc_preempt_count_notrace() add_preempt_count_notrace(1)
-#define dec_preempt_count_notrace() sub_preempt_count_notrace(1)
+#define preempt_check_resched() \
+do { \
+	if (should_resched()) \
+		preempt_schedule(); \
+} while (0)
+
+#else
+#define preempt_enable() preempt_enable_no_resched()
+#define preempt_check_resched() do { } while (0)
+#endif
 
 #define preempt_disable_notrace() \
 do { \
-	inc_preempt_count_notrace(); \
+	__preempt_count_inc(); \
 	barrier(); \
 } while (0)
 
 #define preempt_enable_no_resched_notrace() \
 do { \
 	barrier(); \
-	dec_preempt_count_notrace(); \
+	__preempt_count_dec(); \
 } while (0)
 
-/* preempt_check_resched is OK to trace */
+#ifdef CONFIG_PREEMPT
+
+#ifdef CONFIG_CONTEXT_TRACKING
+asmlinkage void preempt_schedule_context(void);
+#else
+#define preempt_schedule_context() preempt_schedule()
+#endif
+
 #define preempt_enable_notrace() \
 do { \
-	preempt_enable_no_resched_notrace(); \
-	preempt_check_resched_context(); \
+	barrier(); \
+	if (unlikely(__preempt_count_dec_and_test())) \
+		preempt_schedule_context(); \
 } while (0)
+#else
+#define preempt_enable_notrace() preempt_enable_no_resched_notrace()
+#endif
 
 #else /* !CONFIG_PREEMPT_COUNT */
 
@@ -118,10 +107,11 @@ do { \
  * that can cause faults and scheduling migrate into our preempt-protected
  * region.
  */
-#define preempt_disable()		barrier()
+#define preempt_disable()			barrier()
 #define sched_preempt_enable_no_resched()	barrier()
-#define preempt_enable_no_resched()	barrier()
-#define preempt_enable()		barrier()
+#define preempt_enable_no_resched()		barrier()
+#define preempt_enable()			barrier()
+#define preempt_check_resched()			do { } while (0)
 
 #define preempt_disable_notrace()		barrier()
 #define preempt_enable_no_resched_notrace()	barrier()
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2403,11 +2403,6 @@ static inline int signal_pending_state(l
 	return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
 }
 
-static inline int need_resched(void)
-{
-	return unlikely(test_preempt_need_resched());
-}
-
 /*
  * cond_resched() and cond_resched_lock(): latency reduction via
  * explicit rescheduling in places that are safe. The return
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -15,7 +15,7 @@
  */
 static inline void pagefault_disable(void)
 {
-	inc_preempt_count();
+	preempt_count_inc();
 	/*
 	 * make sure to have issued the store before a pagefault
 	 * can hit.
@@ -30,11 +30,7 @@ static inline void pagefault_enable(void
 	 * the pagefault handler again.
 	 */
 	barrier();
-	dec_preempt_count();
-	/*
-	 * make sure we do..
-	 */
-	barrier();
+	preempt_count_dec();
 	preempt_check_resched();
 }
 
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -111,7 +111,7 @@ void context_tracking_user_enter(void)
  * instead of preempt_schedule() to exit user context if needed before
  * calling the scheduler.
  */
-void __sched notrace preempt_schedule_context(void)
+asmlinkage void __sched notrace preempt_schedule_context(void)
 {
 	enum ctx_state prev_ctx;
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2239,7 +2239,7 @@ notrace unsigned long get_parent_ip(unsi
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
 				defined(CONFIG_PREEMPT_TRACER))
 
-void __kprobes add_preempt_count(int val)
+void __kprobes preempt_count_add(int val)
 {
 #ifdef CONFIG_DEBUG_PREEMPT
 	/*
@@ -2248,7 +2248,7 @@ void __kprobes add_preempt_count(int val
 	if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
 		return;
 #endif
-	*preempt_count_ptr() += val;
+	__preempt_count_add(val);
 #ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Spinlock count overflowing soon?
@@ -2259,9 +2259,9 @@ void __kprobes add_preempt_count(int val
 	if (preempt_count() == val)
 		trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
 }
-EXPORT_SYMBOL(add_preempt_count);
+EXPORT_SYMBOL(preempt_count_add);
 
-void __kprobes sub_preempt_count(int val)
+void __kprobes preempt_count_sub(int val)
 {
 #ifdef CONFIG_DEBUG_PREEMPT
 	/*
@@ -2279,9 +2279,9 @@ void __kprobes sub_preempt_count(int val
 
 	if (preempt_count() == val)
 		trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
-	*preempt_count_ptr() -= val;
+	__preempt_count_sub(val);
 }
-EXPORT_SYMBOL(sub_preempt_count);
+EXPORT_SYMBOL(preempt_count_sub);
 
 #endif
 
@@ -2545,9 +2545,9 @@ asmlinkage void __sched notrace preempt_
 		return;
 
 	do {
-		add_preempt_count_notrace(PREEMPT_ACTIVE);
+		__preempt_count_add(PREEMPT_ACTIVE);
 		__schedule();
-		sub_preempt_count_notrace(PREEMPT_ACTIVE);
+		__preempt_count_sub(PREEMPT_ACTIVE);
 
 		/*
 		 * Check again in case we missed a preemption opportunity
@@ -2574,11 +2574,11 @@ asmlinkage void __sched preempt_schedule
 	prev_state = exception_enter();
 
 	do {
-		add_preempt_count(PREEMPT_ACTIVE);
+		__preempt_count_add(PREEMPT_ACTIVE);
 		local_irq_enable();
 		__schedule();
 		local_irq_disable();
-		sub_preempt_count(PREEMPT_ACTIVE);
+		__preempt_count_sub(PREEMPT_ACTIVE);
 
 		/*
 		 * Check again in case we missed a preemption opportunity
@@ -3818,16 +3818,11 @@ SYSCALL_DEFINE0(sched_yield)
 	return 0;
 }
 
-static inline int should_resched(void)
-{
-	return need_resched() && !(preempt_count() & PREEMPT_ACTIVE);
-}
-
 static void __cond_resched(void)
 {
-	add_preempt_count(PREEMPT_ACTIVE);
+	preempt_count_add(PREEMPT_ACTIVE);
 	__schedule();
-	sub_preempt_count(PREEMPT_ACTIVE);
+	preempt_count_sub(PREEMPT_ACTIVE);
 }
 
 int __sched _cond_resched(void)
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -100,7 +100,7 @@ static void __local_bh_disable(unsigned
 
 	raw_local_irq_save(flags);
 	/*
-	 * The preempt tracer hooks into add_preempt_count and will break
+	 * The preempt tracer hooks into preempt_count_add and will break
 	 * lockdep because it calls back into lockdep after SOFTIRQ_OFFSET
 	 * is set and before current->softirq_enabled is cleared.
 	 * We must manually increment preempt_count here and manually
@@ -120,7 +120,7 @@ static void __local_bh_disable(unsigned
 #else /* !CONFIG_TRACE_IRQFLAGS */
 static inline void __local_bh_disable(unsigned long ip, unsigned int cnt)
 {
-	add_preempt_count(cnt);
+	preempt_count_add(cnt);
 	barrier();
 }
 #endif /* CONFIG_TRACE_IRQFLAGS */
@@ -139,7 +139,7 @@ static void __local_bh_enable(unsigned i
 
 	if (softirq_count() == cnt)
 		trace_softirqs_on(_RET_IP_);
-	sub_preempt_count(cnt);
+	preempt_count_sub(cnt);
 }
 
 /*
@@ -169,12 +169,12 @@ static inline void _local_bh_enable_ip(u
 	 * Keep preemption disabled until we are done with
 	 * softirq processing:
  	 */
-	sub_preempt_count(SOFTIRQ_DISABLE_OFFSET - 1);
+	preempt_count_sub(SOFTIRQ_DISABLE_OFFSET - 1);
 
 	if (unlikely(!in_interrupt() && local_softirq_pending()))
 		do_softirq();
 
-	dec_preempt_count();
+	preempt_count_dec();
 #ifdef CONFIG_TRACE_IRQFLAGS
 	local_irq_enable();
 #endif
@@ -360,7 +360,7 @@ void irq_exit(void)
 
 	account_irq_exit_time(current);
 	trace_hardirq_exit();
-	sub_preempt_count(HARDIRQ_OFFSET);
+	preempt_count_sub(HARDIRQ_OFFSET);
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
 




* [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation
  2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
                   ` (4 preceding siblings ...)
  2013-09-10 13:08 ` [PATCH 5/7] sched: Extract the basic add/sub preempt_count modifiers Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
  2013-09-10 13:27   ` Peter Zijlstra
                     ` (2 more replies)
  2013-09-10 13:08 ` [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call Peter Zijlstra
  2013-09-10 13:51 ` [PATCH 0/7] preempt_count rework -v2 Ingo Molnar
  7 siblings, 3 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
	Peter Zijlstra

[-- Attachment #1: peterz-x86-per-cpu-preempt_count.patch --]
[-- Type: text/plain, Size: 8172 bytes --]

Convert x86 to use a per-cpu preemption count. The reason for doing so
is that accessing per-cpu variables is a lot cheaper than accessing
thread_info variables.

We still need to save/restore the actual preemption count across
context switches due to PREEMPT_ACTIVE, so we place the per-cpu
__preempt_count variable in the same cache line as the other hot
__switch_to() variables such as current_task.

Also rename thread_info::preempt_count (to saved_preempt_count) to
ensure nobody is 'accidentally' still poking at it.
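
The difference in access cost, as a sketch (the %gs-relative form is what
shows up as the incl/decl %gs:0xb7e0 instructions in the cover letter
disassembly):

  /* old: derive thread_info from the stack/task, then touch its field */
  current_thread_info()->preempt_count += val;

  /* new: a single %gs-relative instruction on the per-cpu variable */
  __this_cpu_add(__preempt_count, val);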

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/Kbuild        |    1 
 arch/x86/include/asm/preempt.h     |  102 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/thread_info.h |    5 -
 arch/x86/kernel/asm-offsets.c      |    1 
 arch/x86/kernel/cpu/common.c       |    5 +
 arch/x86/kernel/entry_32.S         |    7 --
 arch/x86/kernel/entry_64.S         |    4 -
 arch/x86/kernel/process_32.c       |   10 +++
 arch/x86/kernel/process_64.c       |   10 +++
 9 files changed, 132 insertions(+), 13 deletions(-)

--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -5,4 +5,3 @@ genhdr-y += unistd_64.h
 genhdr-y += unistd_x32.h
 
 generic-y += clkdev.h
-generic-y += preempt.h
--- /dev/null
+++ b/arch/x86/include/asm/preempt.h
@@ -0,0 +1,102 @@
+#ifndef __ASM_PREEMPT_H
+#define __ASM_PREEMPT_H
+
+#include <asm/percpu.h>
+#include <linux/thread_info.h>
+
+DECLARE_PER_CPU(int, __preempt_count);
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
+static __always_inline int preempt_count(void)
+{
+	return __this_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline int *preempt_count_ptr(void)
+{
+	return &__raw_get_cpu_var(__preempt_count);
+}
+
+/*
+ * must be macros to avoid header recursion hell
+ */
+#define task_preempt_count(p) \
+	(task_thread_info(p)->saved_preempt_count & ~PREEMPT_NEED_RESCHED)
+
+#define init_task_preempt_count(p) do { \
+	task_thread_info(p)->saved_preempt_count = 1 | PREEMPT_NEED_RESCHED; \
+} while (0)
+
+#define init_idle_preempt_count(p, cpu) do { \
+	task_thread_info(p)->saved_preempt_count = 0; \
+	per_cpu(__preempt_count, (cpu)) = 0; \
+} while (0)
+
+/*
+ * We fold the NEED_RESCHED bit into the preempt count such that
+ * preempt_enable() can decrement and test for needing to reschedule with a
+ * single instruction.
+ *
+ * We invert the actual bit, so that when the decrement hits 0 we know we both
+ * need to resched (the bit is cleared) and can resched (no preempt count).
+ */
+
+static __always_inline void set_preempt_need_resched(void)
+{
+	__this_cpu_and_4(__preempt_count, ~PREEMPT_NEED_RESCHED);
+}
+
+static __always_inline void clear_preempt_need_resched(void)
+{
+	__this_cpu_or_4(__preempt_count, PREEMPT_NEED_RESCHED);
+}
+
+static __always_inline bool test_preempt_need_resched(void)
+{
+	return !(__this_cpu_read_4(__preempt_count) & PREEMPT_NEED_RESCHED);
+}
+
+/*
+ * The various preempt_count add/sub methods
+ */
+
+static __always_inline void __preempt_count_add(int val)
+{
+	__this_cpu_add_4(__preempt_count, val);
+}
+
+static __always_inline void __preempt_count_sub(int val)
+{
+	__this_cpu_add_4(__preempt_count, -val);
+}
+
+static __always_inline bool __preempt_count_dec_and_test(void)
+{
+	unsigned char c;
+
+	asm ("decl " __percpu_arg(0) "; sete %1"
+			: "+m" (__preempt_count), "=qm" (c));
+
+	return c != 0;
+}
+
+/*
+ * Returns true when we need to resched -- even if we can not.
+ */
+static __always_inline bool need_resched(void)
+{
+	return unlikely(test_preempt_need_resched());
+}
+
+/*
+ * Returns true when we need to resched and can (barring IRQ state).
+ */
+static __always_inline bool should_resched(void)
+{
+	return unlikely(!__this_cpu_read_4(__preempt_count));
+}
+
+#endif /* __ASM_PREEMPT_H */
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -28,8 +28,7 @@ struct thread_info {
 	__u32			flags;		/* low level flags */
 	__u32			status;		/* thread synchronous flags */
 	__u32			cpu;		/* current CPU */
-	int			preempt_count;	/* 0 => preemptable,
-						   <0 => BUG */
+	int			saved_preempt_count;
 	mm_segment_t		addr_limit;
 	struct restart_block    restart_block;
 	void __user		*sysenter_return;
@@ -49,7 +48,7 @@ struct thread_info {
 	.exec_domain	= &default_exec_domain,	\
 	.flags		= 0,			\
 	.cpu		= 0,			\
-	.preempt_count	= INIT_PREEMPT_COUNT,	\
+	.saved_preempt_count = INIT_PREEMPT_COUNT,	\
 	.addr_limit	= KERNEL_DS,		\
 	.restart_block = {			\
 		.fn = do_no_restart_syscall,	\
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -32,7 +32,6 @@ void common(void) {
 	OFFSET(TI_flags, thread_info, flags);
 	OFFSET(TI_status, thread_info, status);
 	OFFSET(TI_addr_limit, thread_info, addr_limit);
-	OFFSET(TI_preempt_count, thread_info, preempt_count);
 
 	BLANK();
 	OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1095,6 +1095,9 @@ DEFINE_PER_CPU(char *, irq_stack_ptr) =
 
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
+DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
+EXPORT_PER_CPU_SYMBOL(__preempt_count);
+
 DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
 
 /*
@@ -1169,6 +1172,8 @@ void debug_stack_reset(void)
 
 DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
+DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
+EXPORT_PER_CPU_SYMBOL(__preempt_count);
 DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
 
 #ifdef CONFIG_CC_STACKPROTECTOR
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -362,12 +362,9 @@ END(ret_from_exception)
 #ifdef CONFIG_PREEMPT
 ENTRY(resume_kernel)
 	DISABLE_INTERRUPTS(CLBR_ANY)
-	cmpl $0,TI_preempt_count(%ebp)	# non-zero preempt_count ?
-	jnz restore_all
 need_resched:
-	movl TI_flags(%ebp), %ecx	# need_resched set ?
-	testb $_TIF_NEED_RESCHED, %cl
-	jz restore_all
+	cmpl $0,PER_CPU_VAR(__preempt_count)
+	jnz restore_all
 	testl $X86_EFLAGS_IF,PT_EFLAGS(%esp)	# interrupts off (exception path) ?
 	jz restore_all
 	call preempt_schedule_irq
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1103,10 +1103,8 @@ ENTRY(native_iret)
 	/* Returning to kernel space. Check if we need preemption */
 	/* rcx:	 threadinfo. interrupts off. */
 ENTRY(retint_kernel)
-	cmpl $0,TI_preempt_count(%rcx)
+	cmpl $0,PER_CPU_VAR(__preempt_count)
 	jnz  retint_restore_args
-	bt  $TIF_NEED_RESCHED,TI_flags(%rcx)
-	jnc  retint_restore_args
 	bt   $9,EFLAGS-ARGOFFSET(%rsp)	/* interrupts off? */
 	jnc  retint_restore_args
 	call preempt_schedule_irq
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -291,6 +291,16 @@ __switch_to(struct task_struct *prev_p,
 	if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
 		set_iopl_mask(next->iopl);
 
+#ifdef CONFIG_PREEMPT_COUNT
+	/*
+	 * If it were not for PREEMPT_ACTIVE we could guarantee that the
+	 * preempt_count of all tasks was equal here and this would not be
+	 * needed.
+	 */
+	task_thread_info(prev_p)->saved_preempt_count = __raw_get_cpu_var(__preempt_count);
+	__raw_get_cpu_var(__preempt_count) = task_thread_info(next_p)->saved_preempt_count;
+#endif
+
 	/*
 	 * Now maybe handle debug registers and/or IO bitmaps
 	 */
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -363,6 +363,16 @@ __switch_to(struct task_struct *prev_p,
 	this_cpu_write(old_rsp, next->usersp);
 	this_cpu_write(current_task, next_p);
 
+#ifdef CONFIG_PREEMPT_COUNT
+	/*
+	 * If it were not for PREEMPT_ACTIVE we could guarantee that the
+	 * preempt_count of all tasks was equal here and this would not be
+	 * needed.
+	 */
+	task_thread_info(prev_p)->saved_preempt_count = __raw_get_cpu_var(__preempt_count);
+	__raw_get_cpu_var(__preempt_count) = task_thread_info(next_p)->saved_preempt_count;
+#endif
+
 	this_cpu_write(kernel_stack,
 		  (unsigned long)task_stack_page(next_p) +
 		  THREAD_SIZE - KERNEL_STACK_OFFSET);
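
To make the inverted NEED_RESCHED fold described in the asm/preempt.h comment
above concrete, here is a small userspace model; the 0x80000000 bit position
and the helper names are assumptions of this sketch, not taken from the patch:

/* Userspace model of the folded count: NEED_RESCHED is stored inverted,
 * so a single decrement hitting zero means "preempt count dropped to
 * zero AND a reschedule is pending" -- one je after the decl.
 */
#include <assert.h>

#define PREEMPT_NEED_RESCHED	0x80000000u	/* assumed bit, for illustration */

static unsigned int folded = PREEMPT_NEED_RESCHED;	/* count 0, no resched pending */

static void set_need_resched(void)   { folded &= ~PREEMPT_NEED_RESCHED; }
static void clear_need_resched(void) { folded |=  PREEMPT_NEED_RESCHED; }

int main(void)
{
	folded += 1;			/* preempt_disable()              */
	set_need_resched();		/* remote CPU asks for a resched  */
	folded -= 1;			/* preempt_enable()'s decrement   */
	assert(folded == 0);		/* the single zero test fires     */

	clear_need_resched();		/* resched serviced               */
	folded += 1;
	folded -= 1;
	assert(folded != 0);		/* no resched pending, no call    */
	return 0;
}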




* [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call
  2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
                   ` (5 preceding siblings ...)
  2013-09-10 13:08 ` [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
  2013-09-10 13:42   ` Ingo Molnar
  2013-09-10 13:51 ` [PATCH 0/7] preempt_count rework -v2 Ingo Molnar
  7 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
	Peter Zijlstra

[-- Attachment #1: peterz-x86-opt-call.patch --]
[-- Type: text/plain, Size: 4962 bytes --]

Remove the bloat of the C calling convention out of the
preempt_enable() sites by creating an ASM wrapper which allows us to
do an asm("call ___preempt_schedule") instead.

calling.h bits by Andi Kleen

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/calling.h |   50 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/preempt.h |    8 ++++++
 arch/x86/kernel/Makefile       |    2 +
 arch/x86/kernel/preempt.S      |   25 ++++++++++++++++++++
 include/asm-generic/preempt.h  |   10 ++++++++
 include/linux/preempt.h        |   13 ++++------
 6 files changed, 100 insertions(+), 8 deletions(-)

--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -48,6 +48,8 @@ For 32-bit we have the following convent
 
 #include <asm/dwarf2.h>
 
+#ifdef CONFIG_X86_64
+
 /*
  * 64-bit system call stack frame layout defines and helpers,
  * for assembly code:
@@ -192,3 +194,51 @@ For 32-bit we have the following convent
 	.macro icebp
 	.byte 0xf1
 	.endm
+
+#else /* CONFIG_X86_64 */
+
+/*
+ * For 32bit only simplified versions of SAVE_ALL/RESTORE_ALL. These
+ * are different from the entry_32.S versions in not changing the segment
+ * registers. So only suitable for in kernel use, not when transitioning
+ * from or to user space. The resulting stack frame is not a standard
+ * pt_regs frame. The main use case is calling C code from assembler
+ * when all the registers need to be preserved.
+ */
+
+	.macro SAVE_ALL
+	pushl_cfi %eax
+	CFI_REL_OFFSET eax, 0
+	pushl_cfi %ebp
+	CFI_REL_OFFSET ebp, 0
+	pushl_cfi %edi
+	CFI_REL_OFFSET edi, 0
+	pushl_cfi %esi
+	CFI_REL_OFFSET esi, 0
+	pushl_cfi %edx
+	CFI_REL_OFFSET edx, 0
+	pushl_cfi %ecx
+	CFI_REL_OFFSET ecx, 0
+	pushl_cfi %ebx
+	CFI_REL_OFFSET ebx, 0
+	.endm
+
+	.macro RESTORE_ALL
+	popl_cfi %ebx
+	CFI_RESTORE ebx
+	popl_cfi %ecx
+	CFI_RESTORE ecx
+	popl_cfi %edx
+	CFI_RESTORE edx
+	popl_cfi %esi
+	CFI_RESTORE esi
+	popl_cfi %edi
+	CFI_RESTORE edi
+	popl_cfi %ebp
+	CFI_RESTORE ebp
+	popl_cfi %eax
+	CFI_RESTORE eax
+	.endm
+
+#endif /* CONFIG_X86_64 */
+
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -99,4 +99,12 @@ static __always_inline bool should_resch
 	return unlikely(!__this_cpu_read_4(__preempt_count));
 }
 
+#ifdef CONFIG_PREEMPT
+#define __preempt_schedule() asm ("call ___preempt_schedule")
+
+#ifdef CONFIG_CONTEXT_TRACKING
+#define __preempt_schedule_context() asm ("call ___preempt_schedule_context")
+#endif
+#endif /* CONFIG_PREEMPT */
+
 #endif /* __ASM_PREEMPT_H */
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -36,6 +36,8 @@ obj-y			+= tsc.o io_delay.o rtc.o
 obj-y			+= pci-iommu_table.o
 obj-y			+= resource.o
 
+obj-$(CONFIG_PREEMPT)	+= preempt.o
+
 obj-y				+= process.o
 obj-y				+= i387.o xsave.o
 obj-y				+= ptrace.o
--- /dev/null
+++ b/arch/x86/kernel/preempt.S
@@ -0,0 +1,25 @@
+
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+#include <asm/asm.h>
+#include <asm/calling.h>
+
+ENTRY(___preempt_schedule)
+	CFI_STARTPROC
+	SAVE_ALL
+	call preempt_schedule
+	RESTORE_ALL
+	ret
+	CFI_ENDPROC
+
+#ifdef CONFIG_CONTEXT_TRACKING
+
+ENTRY(___preempt_schedule_context)
+	CFI_STARTPROC
+	SAVE_ALL
+	call preempt_schedule_context
+	RESTORE_ALL
+	ret
+	CFI_ENDPROC
+
+#endif
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -90,4 +90,14 @@ static __always_inline bool should_resch
 	return unlikely(!*preempt_count_ptr());
 }
 
+#ifdef CONFIG_PREEMPT
+extern asmlinkage void preempt_schedule(void);
+#define __preempt_schedule() preempt_schedule()
+
+#ifdef CONFIG_CONTEXT_TRACKING
+extern asmlinkage void preempt_schedule_context(void);
+#define __preempt_schedule_context() preempt_schedule_context()
+#endif
+#endif /* CONFIG_PREEMPT */
+
 #endif /* __ASM_PREEMPT_H */
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -50,18 +50,17 @@ do { \
 #define preempt_enable_no_resched() sched_preempt_enable_no_resched()
 
 #ifdef CONFIG_PREEMPT
-asmlinkage void preempt_schedule(void);
 #define preempt_enable() \
 do { \
 	barrier(); \
 	if (unlikely(preempt_count_dec_and_test())) \
-		preempt_schedule(); \
+		__preempt_schedule(); \
 } while (0)
 
 #define preempt_check_resched() \
 do { \
 	if (should_resched()) \
-		preempt_schedule(); \
+		__preempt_schedule(); \
 } while (0)
 
 #else
@@ -83,17 +82,15 @@ do { \
 
 #ifdef CONFIG_PREEMPT
 
-#ifdef CONFIG_CONTEXT_TRACKING
-asmlinkage void preempt_schedule_context(void);
-#else
-#define preempt_schedule_context() preempt_schedule()
+#ifndef CONFIG_CONTEXT_TRACKING
+#define __preempt_schedule_context() __preempt_schedule()
 #endif
 
 #define preempt_enable_notrace() \
 do { \
 	barrier(); \
 	if (unlikely(__preempt_count_dec_and_test())) \
-		preempt_schedule_context(); \
+		__preempt_schedule_context(); \
 } while (0)
 #else
 #define preempt_enable_notrace() preempt_enable_no_resched_notrace()




* Re: [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation
  2013-09-10 13:08 ` [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
@ 2013-09-10 13:27   ` Peter Zijlstra
  2013-09-10 14:02   ` Eric Dumazet
  2013-09-10 16:48   ` Peter Zijlstra
  2 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:27 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch

On Tue, Sep 10, 2013 at 03:08:17PM +0200, Peter Zijlstra wrote:
> @@ -1095,6 +1095,9 @@ DEFINE_PER_CPU(char *, irq_stack_ptr) =
>  
>  DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
>  
> +DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
> +EXPORT_PER_CPU_SYMBOL(__preempt_count);
> +

Note that afaict this patch takes away the reason irq_count exists.


* Re: [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call
  2013-09-10 13:08 ` [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call Peter Zijlstra
@ 2013-09-10 13:42   ` Ingo Molnar
  2013-09-10 13:55       ` Jan Beulich
  0 siblings, 1 reply; 52+ messages in thread
From: Ingo Molnar @ 2013-09-10 13:42 UTC (permalink / raw)
  To: Peter Zijlstra, Jan Beulich
  Cc: Linus Torvalds, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	linux-kernel, linux-arch


* Peter Zijlstra <peterz@infradead.org> wrote:

> +	.macro SAVE_ALL
> +	pushl_cfi %eax
> +	CFI_REL_OFFSET eax, 0
> +	pushl_cfi %ebp
> +	CFI_REL_OFFSET ebp, 0
> +	pushl_cfi %edi
> +	CFI_REL_OFFSET edi, 0
> +	pushl_cfi %esi
> +	CFI_REL_OFFSET esi, 0
> +	pushl_cfi %edx
> +	CFI_REL_OFFSET edx, 0
> +	pushl_cfi %ecx
> +	CFI_REL_OFFSET ecx, 0
> +	pushl_cfi %ebx
> +	CFI_REL_OFFSET ebx, 0
> +	.endm
> +
> +	.macro RESTORE_ALL
> +	popl_cfi %ebx
> +	CFI_RESTORE ebx
> +	popl_cfi %ecx
> +	CFI_RESTORE ecx
> +	popl_cfi %edx
> +	CFI_RESTORE edx
> +	popl_cfi %esi
> +	CFI_RESTORE esi
> +	popl_cfi %edi
> +	CFI_RESTORE edi
> +	popl_cfi %ebp
> +	CFI_RESTORE ebp
> +	popl_cfi %eax
> +	CFI_RESTORE eax
> +	.endm

Side note: shouldn't the pushl_cfi and popl_cfi macros be adjusted, 
instead of open coding it?

Thanks,

	Ingo


* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
                   ` (6 preceding siblings ...)
  2013-09-10 13:08 ` [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call Peter Zijlstra
@ 2013-09-10 13:51 ` Ingo Molnar
  2013-09-10 13:56   ` Ingo Molnar
  7 siblings, 1 reply; 52+ messages in thread
From: Ingo Molnar @ 2013-09-10 13:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	linux-kernel, linux-arch


* Peter Zijlstra <peterz@infradead.org> wrote:

> These patches optimize preempt_enable by firstly folding the preempt and
> need_resched tests into one -- this should work for all architectures. And
> secondly by providing per-arch preempt_count implementations; with x86 using
> per-cpu preempt_count for fastest access.
> 
> 
> These patches have been boot tested on CONFIG_PREEMPT=y x86_64 and survive
> building a x86_64-defconfig kernel.
> 
> kernel/sched/core.c:kick_process() now looks like:
> 
>   ffffffff8106f3f0 <kick_process>:
>   ffffffff8106f3f0:       55                      push   %rbp
>   ffffffff8106f3f1:       65 ff 04 25 e0 b7 00    incl   %gs:0xb7e0
>   ffffffff8106f3f8:       00 
>   ffffffff8106f3f9:       48 89 e5                mov    %rsp,%rbp
>   ffffffff8106f3fc:       48 8b 47 08             mov    0x8(%rdi),%rax
>   ffffffff8106f400:       8b 50 18                mov    0x18(%rax),%edx
>   ffffffff8106f403:       65 8b 04 25 1c b0 00    mov    %gs:0xb01c,%eax
>   ffffffff8106f40a:       00 
>   ffffffff8106f40b:       39 c2                   cmp    %eax,%edx
>   ffffffff8106f40d:       74 1b                   je     ffffffff8106f42a <kick_process+0x3a>
>   ffffffff8106f40f:       89 d1                   mov    %edx,%ecx
>   ffffffff8106f411:       48 c7 c0 00 2c 01 00    mov    $0x12c00,%rax
>   ffffffff8106f418:       48 8b 0c cd a0 bc cb    mov    -0x7e344360(,%rcx,8),%rcx
>   ffffffff8106f41f:       81 
>   ffffffff8106f420:       48 3b bc 08 00 08 00    cmp    0x800(%rax,%rcx,1),%rdi
>   ffffffff8106f427:       00 
>   ffffffff8106f428:       74 1e                   je     ffffffff8106f448 <kick_process+0x58>
> * ffffffff8106f42a:       65 ff 0c 25 e0 b7 00    decl   %gs:0xb7e0
>   ffffffff8106f431:       00 
> * ffffffff8106f432:       0f 94 c0                sete   %al
> * ffffffff8106f435:       84 c0                   test   %al,%al
> * ffffffff8106f437:       75 02                   jne    ffffffff8106f43b <kick_process+0x4b>
>   ffffffff8106f439:       5d                      pop    %rbp
>   ffffffff8106f43a:       c3                      retq   
> * ffffffff8106f43b:       e8 b0 b6 f9 ff          callq  ffffffff8100aaf0 <___preempt_schedule>

Mind also posting the 'before' assembly, to make it clear how much we've 
improved things?

>   ffffffff8106f440:       5d                      pop    %rbp
>   ffffffff8106f441:       c3                      retq   
>   ffffffff8106f442:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>   ffffffff8106f448:       89 d7                   mov    %edx,%edi
>   ffffffff8106f44a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>   ffffffff8106f450:       ff 15 ea e0 ba 00       callq  *0xbae0ea(%rip)        # ffffffff81c1d540 <smp_ops+0x20>
>   ffffffff8106f456:       eb d2                   jmp    ffffffff8106f42a <kick_process+0x3a>
>   ffffffff8106f458:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
>   ffffffff8106f45f:       00 
> 
> Where the '*' marked lines are preempt_enable(), sadly GCC isn't able to 
> get rid of the sete+test :/ Its a rather frequent pattern in the kernel, 
> so 'fixing' the x86 GCC backend to recognise this might be useful.

So what we do in kick_process() is:

        preempt_disable();
        cpu = task_cpu(p);
        if ((cpu != smp_processor_id()) && task_curr(p))
                smp_send_reschedule(cpu);
        preempt_enable();

The preempt_disable() looks sweet:

>   ffffffff8106f3f1:       65 ff 04 25 e0 b7 00    incl   %gs:0xb7e0
>   ffffffff8106f3f8:       00 

and the '*' you marked is the preempt_enable() portion, which, with your 
new code, looks like this:

 #define preempt_check_resched() \
 do { \
        if (unlikely(!*preempt_count_ptr())) \
                preempt_schedule(); \
 } while (0)

Which GCC translates to:

> * ffffffff8106f42a:       65 ff 0c 25 e0 b7 00    decl   %gs:0xb7e0
>   ffffffff8106f431:       00 
> * ffffffff8106f432:       0f 94 c0                sete   %al
> * ffffffff8106f435:       84 c0                   test   %al,%al
> * ffffffff8106f437:       75 02                   jne    ffffffff8106f43b <kick_process+0x4b>

So, is the problem that GCC cannot pass a 'CPU flags' state out of asm(), 
only an explicit (pseudo-)value, right?

Ideally we'd like to have something like:

> * ffffffff8106f42a:       65 ff 0c 25 e0 b7 00    decl   %gs:0xb7e0
>   ffffffff8106f431:       00 
> * ffffffff8106f437:       75 02                   jne    ffffffff8106f43b <kick_process+0x4b>

right?
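
For reference, a minimal userspace sketch of the constraint being hit (plain
int instead of the %gs-based per-cpu variable; the function names are only
illustrative): a plain asm() can hand a value back to C, but not ZF itself,
so the flag is materialized with sete and then re-derived with test at the
use site.

static inline int dec_and_test(int *p)
{
	unsigned char c;

	/* ZF is turned into a byte here ...                          */
	asm ("decl %0; sete %1" : "+m" (*p), "=qm" (c));
	return c != 0;		/* ... and only the byte escapes the asm */
}

void example(int *p, void (*slowpath)(void))
{
	/* ... so the branch has to re-create ZF with test %al,%al */
	if (__builtin_expect(dec_and_test(p), 0))
		slowpath();
}

The two ways around the round trip are branching inside the asm itself (the
asm goto variant that comes up below) or compiler support for condition-code
output constraints, which GCC only grew much later.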

Thanks,

	Ingo


* Re: [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call
  2013-09-10 13:42   ` Ingo Molnar
@ 2013-09-10 13:55       ` Jan Beulich
  0 siblings, 0 replies; 52+ messages in thread
From: Jan Beulich @ 2013-09-10 13:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Linus Torvalds, Andi Kleen, Arjan van de Ven, Mike Galbraith,
	linux-arch, linux-kernel, Peter Anvin

>>> On 10.09.13 at 15:42, Ingo Molnar <mingo@kernel.org> wrote:

> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> +	.macro SAVE_ALL
>> +	pushl_cfi %eax
>> +	CFI_REL_OFFSET eax, 0
>> +	pushl_cfi %ebp
>> +	CFI_REL_OFFSET ebp, 0
>> +	pushl_cfi %edi
>> +	CFI_REL_OFFSET edi, 0
>> +	pushl_cfi %esi
>> +	CFI_REL_OFFSET esi, 0
>> +	pushl_cfi %edx
>> +	CFI_REL_OFFSET edx, 0
>> +	pushl_cfi %ecx
>> +	CFI_REL_OFFSET ecx, 0
>> +	pushl_cfi %ebx
>> +	CFI_REL_OFFSET ebx, 0
>> +	.endm
>> +
>> +	.macro RESTORE_ALL
>> +	popl_cfi %ebx
>> +	CFI_RESTORE ebx
>> +	popl_cfi %ecx
>> +	CFI_RESTORE ecx
>> +	popl_cfi %edx
>> +	CFI_RESTORE edx
>> +	popl_cfi %esi
>> +	CFI_RESTORE esi
>> +	popl_cfi %edi
>> +	CFI_RESTORE edi
>> +	popl_cfi %ebp
>> +	CFI_RESTORE ebp
>> +	popl_cfi %eax
>> +	CFI_RESTORE eax
>> +	.endm
> 
> Side note: shouldn't the pushl_cfi and popl_cfi macros be adjusted, 
> instead of open coding it?

If you mean the open coding of CFI_REL_OFFSET and CFI_RESTORE,
then no - there may be pushes/pops that don't save the caller's
register values (i.e. where solely the frame pointer adjustment
matters).

If you meant something else, please clarify what.

Jan




* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 13:51 ` [PATCH 0/7] preempt_count rework -v2 Ingo Molnar
@ 2013-09-10 13:56   ` Ingo Molnar
  2013-09-10 15:14     ` Peter Zijlstra
                       ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Ingo Molnar @ 2013-09-10 13:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	linux-kernel, linux-arch


* Ingo Molnar <mingo@kernel.org> wrote:

> So what we do in kick_process() is:
> 
>         preempt_disable();
>         cpu = task_cpu(p);
>         if ((cpu != smp_processor_id()) && task_curr(p))
>                 smp_send_reschedule(cpu);
>         preempt_enable();
> 
> The preempt_disable() looks sweet:
> 
> >   ffffffff8106f3f1:       65 ff 04 25 e0 b7 00    incl   %gs:0xb7e0
> >   ffffffff8106f3f8:       00 
> 
> and the '*' you marked is the preempt_enable() portion, which, with your 
> new code, looks like this:
> 
>  #define preempt_check_resched() \
>  do { \
>         if (unlikely(!*preempt_count_ptr())) \
>                 preempt_schedule(); \
>  } while (0)
> 
> Which GCC translates to:
> 
> > * ffffffff8106f42a:       65 ff 0c 25 e0 b7 00    decl   %gs:0xb7e0
> >   ffffffff8106f431:       00 
> > * ffffffff8106f432:       0f 94 c0                sete   %al
> > * ffffffff8106f435:       84 c0                   test   %al,%al
> > * ffffffff8106f437:       75 02                   jne    ffffffff8106f43b <kick_process+0x4b>

Correction, so this comes from the new x86-specific optimization:

+static __always_inline bool __preempt_count_dec_and_test(void)
+{
+       unsigned char c;
+
+       asm ("decl " __percpu_arg(0) "; sete %1"
+                       : "+m" (__preempt_count), "=qm" (c));
+
+       return c != 0;
+}

And that's where the sete and test originates from.

Couldn't it be improved by merging the preempt_schedule() call into a new 
primitive, keeping the call in the regular flow, or using section tricks 
to move it out of line? The scheduling case is a slowpath in most cases.
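
One possible reading of the 'section tricks' option, sketched with a
hypothetical preempt_enable_slowpath() helper and GCC's noinline/cold
attributes; none of this is part of the series, it only illustrates keeping
the call in the regular flow while the body moves out of line:

static __attribute__((noinline, cold)) void preempt_enable_slowpath(void)
{
	__preempt_schedule();	/* the asm-wrapped call from patch 7/7 */
}

#define preempt_enable_sketch() \
do { \
	barrier(); \
	if (unlikely(__preempt_count_dec_and_test())) \
		preempt_enable_slowpath(); \
} while (0)

GCC places cold functions in .text.unlikely, which gives roughly the
out-of-line placement being suggested.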

Thanks,

	Ingo


* Re: [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation
  2013-09-10 13:08 ` [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
  2013-09-10 13:27   ` Peter Zijlstra
@ 2013-09-10 14:02   ` Eric Dumazet
  2013-09-10 15:25     ` Peter Zijlstra
  2013-09-10 16:48   ` Peter Zijlstra
  2 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2013-09-10 14:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
	Frederic Weisbecker, linux-kernel, linux-arch

On Tue, 2013-09-10 at 15:08 +0200, Peter Zijlstra wrote:

> +static __always_inline int preempt_count(void)
> +{
> +	return __this_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
> +}

Not sure why you used the _4 prefix on all accessors ?

>  
> +#ifdef CONFIG_PREEMPT_COUNT
> +	/*
> +	 * If it were not for PREEMPT_ACTIVE we could guarantee that the
> +	 * preempt_count of all tasks was equal here and this would not be
> +	 * needed.
> +	 */
> +	task_thread_info(prev_p)->saved_preempt_count = __raw_get_cpu_var(__preempt_count);

	this_cpu_read(__preempt_count) ?

> +	__raw_get_cpu_var(__preempt_count) = task_thread_info(next_p)->saved_preempt_count;

	this_cpu_write(__preempt_count,
                       task_thread_info(next_p)->saved_preempt_count;

> +#endif
> +
>  	this_cpu_write(kernel_stack,
>  		  (unsigned long)task_stack_page(next_p) +
>  		  THREAD_SIZE - KERNEL_STACK_OFFSET);
> 
> 




* Re: [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call
  2013-09-10 13:55       ` Jan Beulich
  (?)
@ 2013-09-10 14:25       ` Ingo Molnar
  -1 siblings, 0 replies; 52+ messages in thread
From: Ingo Molnar @ 2013-09-10 14:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Linus Torvalds, Andi Kleen, Arjan van de Ven, Mike Galbraith,
	linux-arch, linux-kernel, Peter Anvin


* Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 10.09.13 at 15:42, Ingo Molnar <mingo@kernel.org> wrote:
> 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> >> +	.macro SAVE_ALL
> >> +	pushl_cfi %eax
> >> +	CFI_REL_OFFSET eax, 0
> >> +	pushl_cfi %ebp
> >> +	CFI_REL_OFFSET ebp, 0
> >> +	pushl_cfi %edi
> >> +	CFI_REL_OFFSET edi, 0
> >> +	pushl_cfi %esi
> >> +	CFI_REL_OFFSET esi, 0
> >> +	pushl_cfi %edx
> >> +	CFI_REL_OFFSET edx, 0
> >> +	pushl_cfi %ecx
> >> +	CFI_REL_OFFSET ecx, 0
> >> +	pushl_cfi %ebx
> >> +	CFI_REL_OFFSET ebx, 0
> >> +	.endm
> >> +
> >> +	.macro RESTORE_ALL
> >> +	popl_cfi %ebx
> >> +	CFI_RESTORE ebx
> >> +	popl_cfi %ecx
> >> +	CFI_RESTORE ecx
> >> +	popl_cfi %edx
> >> +	CFI_RESTORE edx
> >> +	popl_cfi %esi
> >> +	CFI_RESTORE esi
> >> +	popl_cfi %edi
> >> +	CFI_RESTORE edi
> >> +	popl_cfi %ebp
> >> +	CFI_RESTORE ebp
> >> +	popl_cfi %eax
> >> +	CFI_RESTORE eax
> >> +	.endm
> > 
> > Side note: shouldn't the pushl_cfi and popl_cfi macros be adjusted, 
> > instead of open coding it?
> 
> If you mean the open coding of CFI_REL_OFFSET and CFI_RESTORE, then no - 
> there may be pushes/pops that don't save the caller's register values 
> (i.e. where solely the frame pointer adjustment matters).

Ok.

> If you meant something else, please clarify what.

No, that's what I meant.

Thanks,

	Ingo


* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 13:56   ` Ingo Molnar
@ 2013-09-10 15:14     ` Peter Zijlstra
  2013-09-10 15:29     ` Arjan van de Ven
  2013-09-10 16:34     ` Linus Torvalds
  2 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 15:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	linux-kernel, linux-arch

On Tue, Sep 10, 2013 at 03:56:36PM +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@kernel.org> wrote:

> > > * ffffffff8106f42a:       65 ff 0c 25 e0 b7 00    decl   %gs:0xb7e0
> > >   ffffffff8106f431:       00 
> > > * ffffffff8106f432:       0f 94 c0                sete   %al
> > > * ffffffff8106f435:       84 c0                   test   %al,%al
> > > * ffffffff8106f437:       75 02                   jne    ffffffff8106f43b <kick_process+0x4b>
> 
> Correction, so this comes from the new x86-specific optimization:
> 
> +static __always_inline bool __preempt_count_dec_and_test(void)
> +{
> +       unsigned char c;
> +
> +       asm ("decl " __percpu_arg(0) "; sete %1"
> +                       : "+m" (__preempt_count), "=qm" (c));
> +
> +       return c != 0;
> +}
> 
> And that's where the sete and test originates from.

Correct, used in:

#define preempt_enable() \
do { \
        barrier(); \
        if (unlikely(preempt_count_dec_and_test())) \
                __preempt_schedule(); \
} while (0)

> Couldn't it be improved by merging the preempt_schedule() call into a new 
> primitive, keeping the call in the regular flow, or using section tricks 
> to move it out of line? The scheduling case is a slowpath in most cases.

Not if we want to keep using the GCC unlikely thing afaik. That said,
all this inline asm stuff isn't my strong point, so maybe someone
else has a good idea.

But I really think fixing GCC would be good, as we have the same pattern
with all *_and_test() functions.



* Re: [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation
  2013-09-10 14:02   ` Eric Dumazet
@ 2013-09-10 15:25     ` Peter Zijlstra
  0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 15:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
	Frederic Weisbecker, linux-kernel, linux-arch

On Tue, Sep 10, 2013 at 07:02:51AM -0700, Eric Dumazet wrote:
> On Tue, 2013-09-10 at 15:08 +0200, Peter Zijlstra wrote:
> 
> > +static __always_inline int preempt_count(void)
> > +{
> > +	return __this_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
> > +}
> 
> Not sure why you used the _4 prefix on all accessors ?

Last time I tried using the proper this_cpu* stuff, it all exploded in
my face due to header recursion hell, so I've limited myself to what's
available in arch/x86/include/asm/percpu.h.

It was a few weeks ago though and maybe I just didn't try hard enough.

> > +#ifdef CONFIG_PREEMPT_COUNT
> > +	/*
> > +	 * If it were not for PREEMPT_ACTIVE we could guarantee that the
> > +	 * preempt_count of all tasks was equal here and this would not be
> > +	 * needed.
> > +	 */
> > +	task_thread_info(prev_p)->saved_preempt_count = __raw_get_cpu_var(__preempt_count);
> 
> 	this_cpu_read(__preempt_count) ?
> 
> > +	__raw_get_cpu_var(__preempt_count) = task_thread_info(next_p)->saved_preempt_count;
> 
> 	this_cpu_write(__preempt_count,
>                        task_thread_info(next_p)->saved_preempt_count;

OK, that does indeed generate slightly better code.


* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 13:56   ` Ingo Molnar
  2013-09-10 15:14     ` Peter Zijlstra
@ 2013-09-10 15:29     ` Arjan van de Ven
  2013-09-10 15:35       ` Peter Zijlstra
  2013-09-10 16:24       ` Linus Torvalds
  2013-09-10 16:34     ` Linus Torvalds
  2 siblings, 2 replies; 52+ messages in thread
From: Arjan van de Ven @ 2013-09-10 15:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Frederic Weisbecker,
	linux-kernel, linux-arch

On 9/10/2013 6:56 AM, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>
>> So what we do in kick_process() is:
>>
>>          preempt_disable();
>>          cpu = task_cpu(p);
>>          if ((cpu != smp_processor_id()) && task_curr(p))
>>                  smp_send_reschedule(cpu);
>>          preempt_enable();
>>
>> The preempt_disable() looks sweet:
>>
>>>    ffffffff8106f3f1:       65 ff 04 25 e0 b7 00    incl   %gs:0xb7e0
>>>    ffffffff8106f3f8:       00
>>
>> and the '*' you marked is the preempt_enable() portion, which, with your
>> new code, looks like this:
>>
>>   #define preempt_check_resched() \
>>   do { \
>>          if (unlikely(!*preempt_count_ptr())) \
>>                  preempt_schedule(); \
>>   } while (0)
>>
>> Which GCC translates to:
>>
>>> * ffffffff8106f42a:       65 ff 0c 25 e0 b7 00    decl   %gs:0xb7e0
>>>    ffffffff8106f431:       00
>>> * ffffffff8106f432:       0f 94 c0                sete   %al
>>> * ffffffff8106f435:       84 c0                   test   %al,%al
>>> * ffffffff8106f437:       75 02                   jne    ffffffff8106f43b <kick_process+0x4b>
>
> Correction, so this comes from the new x86-specific optimization:
>
> +static __always_inline bool __preempt_count_dec_and_test(void)
> +{
> +       unsigned char c;
> +
> +       asm ("decl " __percpu_arg(0) "; sete %1"
> +                       : "+m" (__preempt_count), "=qm" (c));
> +
> +       return c != 0;
> +}
>
> And that's where the sete and test originates from.
>
> Couldn't it be improved by merging the preempt_schedule() call into a new
> primitive, keeping the call in the regular flow, or using section tricks
> to move it out of line? The scheduling case is a slowpath in most cases.
>
also.. yuck on using "dec"
"dec" sucks, please use "sub foo  ,1" instead
(dec sucks because of its broken flags behavior; it creates basically a bubble in the pipeline)




* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 15:29     ` Arjan van de Ven
@ 2013-09-10 15:35       ` Peter Zijlstra
  2013-09-10 16:24       ` Linus Torvalds
  1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 15:35 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Linus Torvalds, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Frederic Weisbecker,
	linux-kernel, linux-arch

On Tue, Sep 10, 2013 at 08:29:16AM -0700, Arjan van de Ven wrote:

> also.. yuck on using "dec" "dec" sucks, please use "sub foo  ,1"
> instead (dec sucks because of its broken flags behavior; it creates
> basically a bubble in the pipeline)

Then someone needs to go change:

  atomic{,64}_dec*()
  local{,64}_dec*()
  percpu_add_op()

All of which use decl/incl, percpu_add_op() actually goes out of its way
to generate them.

Also, subl ,1 is 4 bytes larger.


* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 15:29     ` Arjan van de Ven
  2013-09-10 15:35       ` Peter Zijlstra
@ 2013-09-10 16:24       ` Linus Torvalds
  2013-09-11 16:00         ` H. Peter Anvin
  1 sibling, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-10 16:24 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Peter Zijlstra, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Tue, Sep 10, 2013 at 8:29 AM, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>
> also.. yuck on using "dec"
> "dec" sucks, please use "sub foo  ,1" instead

That's a bigger instruction, largely due to the constant.

> (dec sucks because of its broken flags behavior; it creates basically a
> bubble in the pipeline)

Intel could (and should) just fix that. It's "easy" enough - you just
need to rename the carry flag as a separate register (and make sure
that the conditional instructions that don't use it don't think they
need it).

In fact I thought Intel already did that on their large cores. Are you
sure they still have trouble with inc/dec?

           Linus


* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 13:56   ` Ingo Molnar
  2013-09-10 15:14     ` Peter Zijlstra
  2013-09-10 15:29     ` Arjan van de Ven
@ 2013-09-10 16:34     ` Linus Torvalds
  2013-09-10 16:45       ` Peter Zijlstra
  2 siblings, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-10 16:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Tue, Sep 10, 2013 at 6:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> +static __always_inline bool __preempt_count_dec_and_test(void)
> +{
> +       unsigned char c;
> +
> +       asm ("decl " __percpu_arg(0) "; sete %1"
> +                       : "+m" (__preempt_count), "=qm" (c));
> +
> +       return c != 0;
> +}
>
> And that's where the sete and test originates from.

We could make this use "asm goto" instead.

An "asm goto" cannot have outputs, but this particular one doesn't
_need_ outputs. You could mark the preempt_count memory as an input,
and then have a memory clobber. I think you need the memory clobber
anyway for that preempt-count thing.

So I _think_ something like

static __always_inline bool __preempt_count_dec_and_test(void)
{
       asm goto("decl " __percpu_arg(0) "\n\t"
                "je %l[became_zero]"
                       : :"m" (__preempt_count):"memory":became_zero);
       return 0;
became_zero:
       return 1;
}

would work.

You need to wrap it in

  #ifdef CC_HAVE_ASM_GOTO

and then have the old "sete" version for older compilers, but for
newer ones you'd get pretty much perfect code. UNTESTED.
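
A quick userspace smoke test of that construct (plain int, no %gs segment and
no LOCK prefix, so it only exercises the control flow and that the compiler
accepts the asm goto form):

#include <assert.h>

static int count;

static inline int dec_and_test(void)
{
	asm goto ("decl %0; je %l[became_zero]"
		  : : "m" (count) : "memory" : became_zero);
	return 0;
became_zero:
	return 1;
}

int main(void)
{
	count = 2;
	assert(!dec_and_test());	/* 2 -> 1 */
	assert(dec_and_test());		/* 1 -> 0 */
	assert(!dec_and_test());	/* 0 -> -1 */
	return 0;
}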

             Linus


* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 16:34     ` Linus Torvalds
@ 2013-09-10 16:45       ` Peter Zijlstra
  2013-09-10 17:06         ` Linus Torvalds
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 16:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Tue, Sep 10, 2013 at 09:34:52AM -0700, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 6:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > +static __always_inline bool __preempt_count_dec_and_test(void)
> > +{
> > +       unsigned char c;
> > +
> > +       asm ("decl " __percpu_arg(0) "; sete %1"
> > +                       : "+m" (__preempt_count), "=qm" (c));
> > +
> > +       return c != 0;
> > +}
> >
> > And that's where the sete and test originates from.
> 
> We could make this use "asm goto" instead.
> 
> An "asm goto" cannot have outputs, but this particular one doesn't
> _need_ outputs. You could mark the preempt_count memory as an input,
> and then have a memory clobber. I think you need the memory clobber
> anyway for that preempt-count thing.
> 
> So I _think_ something like
> 
> static __always_inline bool __preempt_count_dec_and_test(void)
> {
>        asm goto("decl " __percpu_arg(0) "\n\t"
>                 "je %l[became_zero]"
>                        : :"m" (__preempt_count):"memory":became_zero);
>        return 0;
> became_zero:
>        return 1;
> }

The usage site:

#define preempt_enable() \
do { \
	barrier(); \
	if (unlikely(preempt_count_dec_and_test())) \
		__preempt_schedule(); \
} while (0)

Already includes the barrier explicitly, so do we still need the memory
clobber in that asm goto thing?

That said, your change results in:

  ffffffff8106f420 <kick_process>:
  ffffffff8106f420:       55                      push   %rbp
  ffffffff8106f421:       65 ff 04 25 e0 b7 00    incl   %gs:0xb7e0
  ffffffff8106f428:       00 
  ffffffff8106f429:       48 89 e5                mov    %rsp,%rbp
  ffffffff8106f42c:       48 8b 47 08             mov    0x8(%rdi),%rax
  ffffffff8106f430:       8b 50 18                mov    0x18(%rax),%edx
  ffffffff8106f433:       65 8b 04 25 1c b0 00    mov    %gs:0xb01c,%eax
  ffffffff8106f43a:       00 
  ffffffff8106f43b:       39 c2                   cmp    %eax,%edx
  ffffffff8106f43d:       74 1b                   je     ffffffff8106f45a <kick_process+0x3a>
  ffffffff8106f43f:       89 d1                   mov    %edx,%ecx
  ffffffff8106f441:       48 c7 c0 00 2c 01 00    mov    $0x12c00,%rax
  ffffffff8106f448:       48 8b 0c cd a0 bc cb    mov    -0x7e344360(,%rcx,8),%rcx
  ffffffff8106f44f:       81 
  ffffffff8106f450:       48 3b bc 08 00 08 00    cmp    0x800(%rax,%rcx,1),%rdi
  ffffffff8106f457:       00 
  ffffffff8106f458:       74 26                   je     ffffffff8106f480 <kick_process+0x60>
* ffffffff8106f45a:       65 ff 0c 25 e0 b7 00    decl   %gs:0xb7e0
  ffffffff8106f461:       00 
* ffffffff8106f462:       74 0c                   je     ffffffff8106f470 <kick_process+0x50>
  ffffffff8106f464:       5d                      pop    %rbp
  ffffffff8106f465:       c3                      retq   
  ffffffff8106f466:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  ffffffff8106f46d:       00 00 00 
* ffffffff8106f470:       e8 9b b6 f9 ff          callq  ffffffff8100ab10 <___preempt_schedule>
  ffffffff8106f475:       5d                      pop    %rbp
  ffffffff8106f476:       c3                      retq   
  ffffffff8106f477:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  ffffffff8106f47e:       00 00 
  ffffffff8106f480:       89 d7                   mov    %edx,%edi
  ffffffff8106f482:       ff 15 b8 e0 ba 00       callq  *0xbae0b8(%rip)        # ffffffff81c1d540 <smp_ops+0x20>
  ffffffff8106f488:       eb d0                   jmp    ffffffff8106f45a <kick_process+0x3a>
  ffffffff8106f48a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)


Which is indeed perfect. So should I go 'fix' the other _and_test()
functions we have to do this same thing?


* Re: [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation
  2013-09-10 13:08 ` [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
  2013-09-10 13:27   ` Peter Zijlstra
  2013-09-10 14:02   ` Eric Dumazet
@ 2013-09-10 16:48   ` Peter Zijlstra
  2 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 16:48 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch


Updated version to include the changes suggested by Eric and the asm
goto magic from Linus.

---
Subject: sched, x86: Provide a per-cpu preempt_count implementation
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Aug 14 14:51:00 CEST 2013

Convert x86 to use a per-cpu preemption count. The reason for doing so
is that accessing per-cpu variables is a lot cheaper than accessing
thread_info variables.

We still need to save/restore the actual preemption count due to
PREEMPT_ACTIVE so we place the per-cpu __preempt_count variable in the
same cache-line as the other hot __switch_to() variables such as
current_task.

Also rename thread_info::preempt_count to ensure nobody is
'accidentally' still poking at it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/Kbuild        |    1 
 arch/x86/include/asm/preempt.h     |  114 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/thread_info.h |    5 -
 arch/x86/kernel/asm-offsets.c      |    1 
 arch/x86/kernel/cpu/common.c       |    5 +
 arch/x86/kernel/entry_32.S         |    7 --
 arch/x86/kernel/entry_64.S         |    4 -
 arch/x86/kernel/process_32.c       |   10 +++
 arch/x86/kernel/process_64.c       |   10 +++
 9 files changed, 144 insertions(+), 13 deletions(-)

--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -5,4 +5,3 @@ genhdr-y += unistd_64.h
 genhdr-y += unistd_x32.h
 
 generic-y += clkdev.h
-generic-y += preempt.h
--- /dev/null
+++ b/arch/x86/include/asm/preempt.h
@@ -0,0 +1,114 @@
+#ifndef __ASM_PREEMPT_H
+#define __ASM_PREEMPT_H
+
+#include <asm/percpu.h>
+#include <linux/thread_info.h>
+
+DECLARE_PER_CPU(int, __preempt_count);
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
+static __always_inline int preempt_count(void)
+{
+	return __this_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline int *preempt_count_ptr(void)
+{
+	return &__raw_get_cpu_var(__preempt_count);
+}
+
+/*
+ * must be macros to avoid header recursion hell
+ */
+#define task_preempt_count(p) \
+	(task_thread_info(p)->saved_preempt_count & ~PREEMPT_NEED_RESCHED)
+
+#define init_task_preempt_count(p) do { \
+	task_thread_info(p)->saved_preempt_count = 1 | PREEMPT_NEED_RESCHED; \
+} while (0)
+
+#define init_idle_preempt_count(p, cpu) do { \
+	task_thread_info(p)->saved_preempt_count = 0; \
+	per_cpu(__preempt_count, (cpu)) = 0; \
+} while (0)
+
+/*
+ * We fold the NEED_RESCHED bit into the preempt count such that
+ * preempt_enable() can decrement and test for needing to reschedule with a
+ * single instruction.
+ *
+ * We invert the actual bit, so that when the decrement hits 0 we know we both
+ * need to resched (the bit is cleared) and can resched (no preempt count).
+ */
+
+static __always_inline void set_preempt_need_resched(void)
+{
+	__this_cpu_and_4(__preempt_count, ~PREEMPT_NEED_RESCHED);
+}
+
+static __always_inline void clear_preempt_need_resched(void)
+{
+	__this_cpu_or_4(__preempt_count, PREEMPT_NEED_RESCHED);
+}
+
+static __always_inline bool test_preempt_need_resched(void)
+{
+	return !(__this_cpu_read_4(__preempt_count) & PREEMPT_NEED_RESCHED);
+}
+
+/*
+ * The various preempt_count add/sub methods
+ */
+
+static __always_inline void __preempt_count_add(int val)
+{
+	__this_cpu_add_4(__preempt_count, val);
+}
+
+static __always_inline void __preempt_count_sub(int val)
+{
+	__this_cpu_add_4(__preempt_count, -val);
+}
+
+#ifdef CC_HAVE_ASM_GOTO
+static __always_inline bool __preempt_count_dec_and_test(void)
+{
+	asm goto("decl " __percpu_arg(0) "\n\t"
+		 "je %l[became_zero]"
+			: :"m" (__preempt_count):"memory":became_zero);
+	return 0;
+became_zero:
+	return 1;
+}
+#else
+static __always_inline bool __preempt_count_dec_and_test(void)
+{
+	unsigned char c;
+
+	asm ("decl " __percpu_arg(0) "; sete %1"
+			: "+m" (__preempt_count), "=qm" (c));
+
+	return c != 0;
+}
+#endif
+
+/*
+ * Returns true when we need to resched -- even if we can not.
+ */
+static __always_inline bool need_resched(void)
+{
+	return unlikely(test_preempt_need_resched());
+}
+
+/*
+ * Returns true when we need to resched and can (barring IRQ state).
+ */
+static __always_inline bool should_resched(void)
+{
+	return unlikely(!__this_cpu_read_4(__preempt_count));
+}
+
+#endif /* __ASM_PREEMPT_H */
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -28,8 +28,7 @@ struct thread_info {
 	__u32			flags;		/* low level flags */
 	__u32			status;		/* thread synchronous flags */
 	__u32			cpu;		/* current CPU */
-	int			preempt_count;	/* 0 => preemptable,
-						   <0 => BUG */
+	int			saved_preempt_count;
 	mm_segment_t		addr_limit;
 	struct restart_block    restart_block;
 	void __user		*sysenter_return;
@@ -49,7 +48,7 @@ struct thread_info {
 	.exec_domain	= &default_exec_domain,	\
 	.flags		= 0,			\
 	.cpu		= 0,			\
-	.preempt_count	= INIT_PREEMPT_COUNT,	\
+	.saved_preempt_count = INIT_PREEMPT_COUNT,	\
 	.addr_limit	= KERNEL_DS,		\
 	.restart_block = {			\
 		.fn = do_no_restart_syscall,	\
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -32,7 +32,6 @@ void common(void) {
 	OFFSET(TI_flags, thread_info, flags);
 	OFFSET(TI_status, thread_info, status);
 	OFFSET(TI_addr_limit, thread_info, addr_limit);
-	OFFSET(TI_preempt_count, thread_info, preempt_count);
 
 	BLANK();
 	OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1095,6 +1095,9 @@ DEFINE_PER_CPU(char *, irq_stack_ptr) =
 
 DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 
+DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
+EXPORT_PER_CPU_SYMBOL(__preempt_count);
+
 DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
 
 /*
@@ -1169,6 +1172,8 @@ void debug_stack_reset(void)
 
 DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
+DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
+EXPORT_PER_CPU_SYMBOL(__preempt_count);
 DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
 
 #ifdef CONFIG_CC_STACKPROTECTOR
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -362,12 +362,9 @@ END(ret_from_exception)
 #ifdef CONFIG_PREEMPT
 ENTRY(resume_kernel)
 	DISABLE_INTERRUPTS(CLBR_ANY)
-	cmpl $0,TI_preempt_count(%ebp)	# non-zero preempt_count ?
-	jnz restore_all
 need_resched:
-	movl TI_flags(%ebp), %ecx	# need_resched set ?
-	testb $_TIF_NEED_RESCHED, %cl
-	jz restore_all
+	cmpl $0,PER_CPU_VAR(__preempt_count)
+	jnz restore_all
 	testl $X86_EFLAGS_IF,PT_EFLAGS(%esp)	# interrupts off (exception path) ?
 	jz restore_all
 	call preempt_schedule_irq
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1103,10 +1103,8 @@ ENTRY(native_iret)
 	/* Returning to kernel space. Check if we need preemption */
 	/* rcx:	 threadinfo. interrupts off. */
 ENTRY(retint_kernel)
-	cmpl $0,TI_preempt_count(%rcx)
+	cmpl $0,PER_CPU_VAR(__preempt_count)
 	jnz  retint_restore_args
-	bt  $TIF_NEED_RESCHED,TI_flags(%rcx)
-	jnc  retint_restore_args
 	bt   $9,EFLAGS-ARGOFFSET(%rsp)	/* interrupts off? */
 	jnc  retint_restore_args
 	call preempt_schedule_irq
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -291,6 +291,16 @@ __switch_to(struct task_struct *prev_p,
 	if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
 		set_iopl_mask(next->iopl);
 
+#ifdef CONFIG_PREEMPT_COUNT
+	/*
+	 * If it were not for PREEMPT_ACTIVE we could guarantee that the
+	 * preempt_count of all tasks was equal here and this would not be
+	 * needed.
+	 */
+	task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count);
+	this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
+#endif
+
 	/*
 	 * Now maybe handle debug registers and/or IO bitmaps
 	 */
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -363,6 +363,16 @@ __switch_to(struct task_struct *prev_p,
 	this_cpu_write(old_rsp, next->usersp);
 	this_cpu_write(current_task, next_p);
 
+#ifdef CONFIG_PREEMPT_COUNT
+	/*
+	 * If it were not for PREEMPT_ACTIVE we could guarantee that the
+	 * preempt_count of all tasks was equal here and this would not be
+	 * needed.
+	 */
+	task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count);
+	this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
+#endif
+
 	this_cpu_write(kernel_stack,
 		  (unsigned long)task_stack_page(next_p) +
 		  THREAD_SIZE - KERNEL_STACK_OFFSET);



* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 16:45       ` Peter Zijlstra
@ 2013-09-10 17:06         ` Linus Torvalds
  2013-09-10 21:25           ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-10 17:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Tue, Sep 10, 2013 at 9:45 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> The usage site:
>
> #define preempt_enable() \
> do { \
>         barrier(); \
>         if (unlikely(preempt_count_dec_and_test())) \
>                 __preempt_schedule(); \
> } while (0)
>
> Already includes the barrier explicitly, so do we still need the memory
> clobber in that asm goto thing?

Yeah, you do to be safe, just to let gcc know that it may be changing
the memory location.

The issue is that because an "asm goto" cannot have outputs, I had to
change the (correct) "+m" input/output into just a "m" (input).

So without the memory clobber, gcc might decide that the thing doesn't
actually change the preempt count, and perhaps move a load of that
across the "asm goto".

Admittedly, that does sound almost impossibly unlikely, but I'd be
happier being careful.
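
A compact userspace sketch of the difference being described: with a plain
asm() the "+m" output is what tells GCC the location changed, while asm goto
cannot have outputs, so only the "memory" clobber can carry that information
(plain int here, names illustrative):

static int v;

static void dec_plain(void)
{
	/* "+m": GCC knows v is modified, no clobber needed for that */
	asm ("decl %0" : "+m" (v));
}

static int dec_goto(void)
{
	/* no outputs allowed: without the "memory" clobber GCC could
	 * assume v is untouched and reuse an earlier load of it      */
	asm goto ("decl %0; je %l[zero]"
		  : : "m" (v) : "memory" : zero);
	return 0;
zero:
	return 1;
}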

> That said, your change results in:
>
> * ffffffff8106f45a:       65 ff 0c 25 e0 b7 00    decl   %gs:0xb7e0
>   ffffffff8106f461:       00
> * ffffffff8106f462:       74 0c                   je     ffffffff8106f470 <kick_process+0x50>
...
> Which is indeed perfect. So should I go 'fix' the other _and_test()
> functions we have to do this same thing?

It would be a good thing to test. There might be downsides with "asm
goto" (maybe it limits gcc in some way), but it does in general save us
not only two instructions but also a register.

                 Linus


* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 17:06         ` Linus Torvalds
@ 2013-09-10 21:25           ` Peter Zijlstra
  2013-09-10 21:43             ` Linus Torvalds
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 21:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Tue, Sep 10, 2013 at 10:06:44AM -0700, Linus Torvalds wrote:
> > Which is indeed perfect. So should I go 'fix' the other _and_test()
> > functions we have to do this same thing?
> 
> It would be a good thing to test. There might be downsides with "asm
> goto" (maybe it limits gcc some way), but it does in general save us
> not only two instructions but also a register.

Here's one that builds and boots on kvm until wanting to mount root.

I'm not entirely sure on the "ir" vs "er" thing, and atomic64_t and
local_t are inconsistent wrt that, so I am too.
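
For reference, the distinction between the two (standard GCC x86 constraint
letters): "e" only accepts constants that fit a sign-extended 32-bit
immediate, which is all the 64-bit add/sub encodings can take, while "i"
accepts any immediate and is therefore only safe for the 32-bit ops. A
minimal sketch:

/* Why the long (64-bit) variants want "er" and the int ones can use
 * "ir": subq/addq have no 64-bit immediate form.
 */
static inline void sub32(int i, int *p)
{
	asm volatile ("subl %1,%0" : "+m" (*p) : "ir" (i));
}

static inline void sub64(long i, long *p)
{
	asm volatile ("subq %1,%0" : "+m" (*p) : "er" (i));
}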

I'll try running on actual hardware tomorrow sometime.

---
 arch/x86/include/asm/atomic.h      |   52 +++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/atomic64_64.h |   54 ++++++++++++++++++++++++++++++++++++-
 arch/x86/include/asm/local.h       |   52 +++++++++++++++++++++++++++++++++++
 3 files changed, 157 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -74,6 +74,18 @@ static inline void atomic_sub(int i, ato
  * true if the result is zero, or false for all
  * other cases.
  */
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic_sub_and_test(int i, atomic_t *v)
+{
+	asm volatile goto(LOCK_PREFIX "subl %1,%0;"
+			  "je %l[became_zero]"
+			  : : "m" (v->counter), "ir" (i)
+			  : "memory" : became_zero);
+	return 0;
+became_zero:
+	return 1;
+}
+#else
 static inline int atomic_sub_and_test(int i, atomic_t *v)
 {
 	unsigned char c;
@@ -83,6 +95,7 @@ static inline int atomic_sub_and_test(in
 		     : "ir" (i) : "memory");
 	return c;
 }
+#endif
 
 /**
  * atomic_inc - increment atomic variable
@@ -116,6 +129,18 @@ static inline void atomic_dec(atomic_t *
  * returns true if the result is 0, or false for all other
  * cases.
  */
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic_dec_and_test(atomic_t *v)
+{
+	asm volatile goto(LOCK_PREFIX "decl %0;"
+			  "je %l[became_zero]"
+			  : : "m" (v->counter)
+			  : "memory" : became_zero);
+	return 0;
+became_zero:
+	return 1;
+}
+#else
 static inline int atomic_dec_and_test(atomic_t *v)
 {
 	unsigned char c;
@@ -125,6 +150,7 @@ static inline int atomic_dec_and_test(at
 		     : : "memory");
 	return c != 0;
 }
+#endif
 
 /**
  * atomic_inc_and_test - increment and test
@@ -134,6 +160,18 @@ static inline int atomic_dec_and_test(at
  * and returns true if the result is zero, or false for all
  * other cases.
  */
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic_inc_and_test(atomic_t *v)
+{
+	asm volatile goto(LOCK_PREFIX "incl %0;"
+			  "je %l[became_zero]"
+			  : : "m" (v->counter)
+			  : "memory" : became_zero);
+	return 0;
+became_zero:
+	return 1;
+}
+#else
 static inline int atomic_inc_and_test(atomic_t *v)
 {
 	unsigned char c;
@@ -143,6 +181,7 @@ static inline int atomic_inc_and_test(at
 		     : : "memory");
 	return c != 0;
 }
+#endif
 
 /**
  * atomic_add_negative - add and test if negative
@@ -153,6 +192,18 @@ static inline int atomic_inc_and_test(at
  * if the result is negative, or false when
  * result is greater than or equal to zero.
  */
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic_add_negative(int i, atomic_t *v)
+{
+	asm volatile goto(LOCK_PREFIX "addl %1,%0;"
+			  "js %l[became_neg]"
+			  : : "m" (v->counter), "ir" (i)
+			  : "memory" : became_neg);
+	return 0;
+became_neg:
+	return 1;
+}
+#else
 static inline int atomic_add_negative(int i, atomic_t *v)
 {
 	unsigned char c;
@@ -162,6 +213,7 @@ static inline int atomic_add_negative(in
 		     : "ir" (i) : "memory");
 	return c;
 }
+#endif
 
 /**
  * atomic_add_return - add integer and return
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -70,6 +70,18 @@ static inline void atomic64_sub(long i,
  * true if the result is zero, or false for all
  * other cases.
  */
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic64_sub_and_test(long i, atomic64_t *v)
+{
+	asm volatile goto(LOCK_PREFIX "subq %1,%0;"
+			  "je %l[became_zero]"
+			  : : "m" (v->counter), "er" (i)
+			  : "memory" : became_zero);
+	return 0;
+became_zero:
+	return 1;
+}
+#else
 static inline int atomic64_sub_and_test(long i, atomic64_t *v)
 {
 	unsigned char c;
@@ -79,6 +91,7 @@ static inline int atomic64_sub_and_test(
 		     : "er" (i), "m" (v->counter) : "memory");
 	return c;
 }
+#endif
 
 /**
  * atomic64_inc - increment atomic64 variable
@@ -114,6 +127,18 @@ static inline void atomic64_dec(atomic64
  * returns true if the result is 0, or false for all other
  * cases.
  */
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic64_dec_and_test(atomic64_t *v)
+{
+	asm volatile goto(LOCK_PREFIX "decq %0;"
+			  "je %l[became_zero]"
+			  : : "m" (v->counter)
+			  : "memory" : became_zero);
+	return 0;
+became_zero:
+	return 1;
+}
+#else
 static inline int atomic64_dec_and_test(atomic64_t *v)
 {
 	unsigned char c;
@@ -123,6 +148,7 @@ static inline int atomic64_dec_and_test(
 		     : "m" (v->counter) : "memory");
 	return c != 0;
 }
+#endif
 
 /**
  * atomic64_inc_and_test - increment and test
@@ -132,6 +158,18 @@ static inline int atomic64_dec_and_test(
  * and returns true if the result is zero, or false for all
  * other cases.
  */
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic64_inc_and_test(atomic64_t *v)
+{
+	asm volatile goto(LOCK_PREFIX "incq %0;"
+			  "je %l[became_zero]"
+			  : : "m" (v->counter)
+			  : "memory" : became_zero);
+	return 0;
+became_zero:
+	return 1;
+}
+#else
 static inline int atomic64_inc_and_test(atomic64_t *v)
 {
 	unsigned char c;
@@ -141,6 +179,7 @@ static inline int atomic64_inc_and_test(
 		     : "m" (v->counter) : "memory");
 	return c != 0;
 }
+#endif
 
 /**
  * atomic64_add_negative - add and test if negative
@@ -151,6 +190,18 @@ static inline int atomic64_inc_and_test(
  * if the result is negative, or false when
  * result is greater than or equal to zero.
  */
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic64_add_negative(long i, atomic64_t *v)
+{
+	asm volatile goto(LOCK_PREFIX "addq %1,%0;"
+			  "js %l[became_neg]"
+			  : : "m" (v->counter), "er" (i)
+			  : "memory" : became_neg);
+	return 0;
+became_neg:
+	return 1;
+}
+#else
 static inline int atomic64_add_negative(long i, atomic64_t *v)
 {
 	unsigned char c;
@@ -160,6 +211,7 @@ static inline int atomic64_add_negative(
 		     : "er" (i), "m" (v->counter) : "memory");
 	return c;
 }
+#endif
 
 /**
  * atomic64_add_return - add and return
@@ -219,7 +271,7 @@ static inline int atomic64_add_unless(at
 
 /*
  * atomic64_dec_if_positive - decrement by 1 if old value positive
- * @v: pointer of type atomic_t
+ * @v: pointer of type atomic64_t
  *
  * The function returns the old value of *v minus 1, even if
  * the atomic variable, v, was not decremented.
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -50,6 +50,18 @@ static inline void local_sub(long i, loc
  * true if the result is zero, or false for all
  * other cases.
  */
+#ifdef CC_HAVE_ASM_GOTO
+static inline int local_sub_and_test(long i, local_t *l)
+{
+	asm volatile goto(_ASM_SUB " %1,%0;"
+			  "je %l[became_zero]"
+			  : : "m" (l->a.counter), "ir" (i)
+			  : "memory" : became_zero);
+	return 0;
+became_zero:
+	return 1;
+}
+#else
 static inline int local_sub_and_test(long i, local_t *l)
 {
 	unsigned char c;
@@ -59,6 +71,7 @@ static inline int local_sub_and_test(lon
 		     : "ir" (i) : "memory");
 	return c;
 }
+#endif
 
 /**
  * local_dec_and_test - decrement and test
@@ -68,6 +81,18 @@ static inline int local_sub_and_test(lon
  * returns true if the result is 0, or false for all other
  * cases.
  */
+#ifdef CC_HAVE_ASM_GOTO
+static inline int local_dec_and_test(local_t *l)
+{
+	asm volatile goto(_ASM_DEC " %0;"
+			  "je %l[became_zero]"
+			  : : "m" (l->a.counter)
+			  : "memory" : became_zero);
+	return 0;
+became_zero:
+	return 1;
+}
+#else
 static inline int local_dec_and_test(local_t *l)
 {
 	unsigned char c;
@@ -77,6 +102,7 @@ static inline int local_dec_and_test(loc
 		     : : "memory");
 	return c != 0;
 }
+#endif
 
 /**
  * local_inc_and_test - increment and test
@@ -86,6 +112,18 @@ static inline int local_dec_and_test(loc
  * and returns true if the result is zero, or false for all
  * other cases.
  */
+#ifdef CC_HAVE_ASM_GOTO
+static inline int local_inc_and_test(local_t *l)
+{
+	asm volatile goto(_ASM_INC " %0;"
+			  "je %l[became_zero]"
+			  : : "m" (l->a.counter)
+			  : "memory" : became_zero);
+	return 0;
+became_zero:
+	return 1;
+}
+#else
 static inline int local_inc_and_test(local_t *l)
 {
 	unsigned char c;
@@ -95,6 +133,7 @@ static inline int local_inc_and_test(loc
 		     : : "memory");
 	return c != 0;
 }
+#endif
 
 /**
  * local_add_negative - add and test if negative
@@ -105,6 +144,18 @@ static inline int local_inc_and_test(loc
  * if the result is negative, or false when
  * result is greater than or equal to zero.
  */
+#ifdef CC_HAVE_ASM_GOTO
+static inline int local_add_negative(long i, local_t *l)
+{
+	asm volatile goto(_ASM_ADD " %1,%0;"
+			  "js %l[became_neg]"
+			  : : "m" (l->a.counter), "ir" (i)
+			  : "memory" : became_neg);
+	return 0;
+became_neg:
+	return 1;
+}
+#else
 static inline int local_add_negative(long i, local_t *l)
 {
 	unsigned char c;
@@ -114,6 +165,7 @@ static inline int local_add_negative(lon
 		     : "ir" (i) : "memory");
 	return c;
 }
+#endif
 
 /**
  * local_add_return - add and return

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 21:25           ` Peter Zijlstra
@ 2013-09-10 21:43             ` Linus Torvalds
  2013-09-10 21:51               ` H. Peter Anvin
  2013-09-11 13:13               ` Peter Zijlstra
  0 siblings, 2 replies; 52+ messages in thread
From: Linus Torvalds @ 2013-09-10 21:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Tue, Sep 10, 2013 at 2:25 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Here's one that builds and boots on kvm until wanting to mount root.
>
> I'm not entirely sure on the "ir" vs "er" thing and atomic64_t and
> local_t are inconsistent wrt that, so I am too.

"i" is "any constant", while "e" is "32-bit signed constant".

And I think all of the 64-bit ones should probably be "e", because
afaik there is no way to add a 64-bit constant directly to memory (you
have to load it into a register first).

Of course, in reality, the constant is always just 1 or -1 or
something like that, so nobody will ever notice the incorrect case...

And it doesn't matter for the 32-bit cases, obviously, but we could
just make them all be "e" for simplicity.
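
As a minimal sketch of the difference (example_add64 is a made-up helper, not
from the patch): with "er" the constant is emitted as an immediate only when
it fits a 32-bit signed value, otherwise gcc falls back to a register, which
is what the 64-bit add needs:

static inline void example_add64(long i, long *p)
{
	/* "er": imm32 if the constant fits, else a register; a plain "ir"
	 * would let gcc hand addq an immediate it cannot encode. */
	asm volatile("addq %1,%0" : "+m" (*p) : "er" (i));
}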

That said, looking at your patch, I get the *very* strong feeling that
we could make a macro that does all the repetitions for us, and then
have a

  GENERATE_RMW(atomic_sub_and_test, LOCK_PREFIX "subl", "e", "")
  GENERATE_RMW(atomic_dec_and_test, LOCK_PREFIX "decl", "e", "")
  ..
  GENERATE_RMW(atomic_add_negative, LOCK_PREFIX "addl", "s", "")

  GENERATE_RMW(local_sub_and_test, "subl", "e", __percpu_prefix)
  ...

etc.

I'm sure the macro would be nasty as hell (and I bet it needs a few
more arguments), but then we'd avoid the repetition..

                Linus

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 21:43             ` Linus Torvalds
@ 2013-09-10 21:51               ` H. Peter Anvin
  2013-09-10 22:02                 ` Linus Torvalds
  2013-09-11 13:13               ` Peter Zijlstra
  1 sibling, 1 reply; 52+ messages in thread
From: H. Peter Anvin @ 2013-09-10 21:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Ingo Molnar, Andi Kleen, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On 09/10/2013 02:43 PM, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 2:25 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> Here's one that builds and boots on kvm until wanting to mount root.
>>
>> I'm not entirely sure on the "ir" vs "er" thing and atomic64_t and
>> local_t are inconsistent wrt that, so I am too.
> 
> "i" is "any constant", while "e" is "32-bit signed constant".
> 
> And I think all of the 64-bit ones should probably be "e", because
> afaik there is no way to add a 64-bit constant directly to memory (you
> have to load it into a register first).
> 
> Of course, in reality, the constant is always just 1 or -1 or
> something like that, so nobody will ever notice the incorrect case...
> 
> And it doesn't matter for the 32-bit cases, obviously, but we could
> just make them all be "e" for simplicity.
> 
> That said, looking at your patch, I get the *very* strong feeling that
> we could make a macro that does all the repetitions for us, and then
> have a
> 
>   GENERATE_RMW(atomic_sub_and_test, LOCK_PREFIX "subl", "e", "")
>   GENERATE_RMW(atomic_dec_and_test, LOCK_PREFIX "decl", "e", "")
>   ..
>   GENERATE_RMW(atomic_add_negative, LOCK_PREFIX "addl", "s", "")
> 
>   GENERATE_RMW(local_sub_and_test, "subl", "e", __percpu_prefix)
>   ...
> 
> etc.
> 
> I'm sure the macro would be nasty as hell (and I bet it needs a few
> more arguments), but then we'd avoid the repetition..
> 

Actually, the right thing here really is "er" (which I think you meant,
but just to make it clear.)  Why?  Even if the value is representable as
a signed immediate, if gcc already happens to have it in a register it
will be better to use the register.

"e" doesn't work on versions of gcc older than the first x86-64 release,
but we don't care about that anymore.

A final good question is if we should encapsulate the add/inc and
sub/dec into a single function; one could easily do something like:

static inline int atomic_sub_and_test(int i, atomic_t *v)
{
	if (__builtin_constant_p(i) && i == 1)
		/* Use incl */
	else if (__builtin_constant_p(i) && i == -1)
		/* Use decl */
	else
		/* Use addl */
}
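
Filled in for the add flavour as a sketch (example_atomic_add_and_test is
made up here; the GENERATE_ADDcc macro later in this thread generalizes it):

static inline int example_atomic_add_and_test(int i, atomic_t *v)
{
	unsigned char c;

	if (__builtin_constant_p(i) && i == 1)
		asm volatile(LOCK_PREFIX "incl %0; sete %1"
			     : "+m" (v->counter), "=qm" (c) : : "memory");
	else if (__builtin_constant_p(i) && i == -1)
		asm volatile(LOCK_PREFIX "decl %0; sete %1"
			     : "+m" (v->counter), "=qm" (c) : : "memory");
	else
		asm volatile(LOCK_PREFIX "addl %2,%0; sete %1"
			     : "+m" (v->counter), "=qm" (c)
			     : "er" (i) : "memory");
	return c;
}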

	-hpa


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 21:51               ` H. Peter Anvin
@ 2013-09-10 22:02                 ` Linus Torvalds
  2013-09-10 22:06                   ` H. Peter Anvin
  0 siblings, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-10 22:02 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Ingo Molnar, Andi Kleen, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Tue, Sep 10, 2013 at 2:51 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/10/2013 02:43 PM, Linus Torvalds wrote:
>
> Actually, the right thing here really is "er" (which I think you meant,
> but just to make it clear.)

Yes, I was just answering the i-vs-e confusion.

> "e" doesn't work on versions of gcc older than the first x86-64 release,
> but we don't care about that anymore.

Indeed.

> A final good question is if we should encapsulate the add/inc and
> sub/dec into a single function; one could easily do something like:

Yes. However, I would do that at a higher level than the one that
builds the actual functions.

That said, there's a few cases where you might want to specify
add-vs-sub explicitly, but they are rather odd, namely the fact that
"-128" fits in a byte, but "128" does not.

So it can be better to add 128 by doing a "subl $-128" than by doing
an "add $128".

But we probably don't have any situation where we care about that
special value of "128". I've seen the trick, though.

              Linus

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 22:02                 ` Linus Torvalds
@ 2013-09-10 22:06                   ` H. Peter Anvin
  0 siblings, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2013-09-10 22:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Ingo Molnar, Andi Kleen, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On 09/10/2013 03:02 PM, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 2:51 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 09/10/2013 02:43 PM, Linus Torvalds wrote:
>>
>> Actually, the right thing here really is "er" (which I think you meant,
>> but just to make it clear.)
> 
> Yes, I was just answering the i-vs-e confusion.
> 
>> "e" doesn't work on versions of gcc older than the first x86-64 release,
>> but we don't care about that anymore.
> 
> Indeed.
> 
>> A final good question is if we should encapsulate the add/inc and
>> sub/dec into a single function; one could easily do something like:
> 
> Yes. However, I would do that at a higher level than the one that
> builds the actual functions.
> 
> That said, there's a few cases where you might want to specify
> add-vs-sub explicitly, but they are rather odd, namely the fact that
> "-128" fits in a byte, but "128" does not.
> 
> So it can be better to add 128 by doing a "subl $-128" than by doing
> an "add $128".
> 
> But we probably don't have any situation where we care about that
> special value of "128". I've seen the trick, though.
> 

Yes, and if __builtin_constant_p() we could even do it explicitly.
Unfortunately I don't think gcc allows alternatives in asm() statements,
unlike in its own pattern tables.
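
A sketch of doing it explicitly in C (example_atomic_add is made up, and only
worth it if the $128 case ever mattered):

static inline void example_atomic_add(int i, atomic_t *v)
{
	/* pick the shorter encoding by hand, since asm() itself cannot
	 * express alternatives the way gcc's pattern tables can */
	if (__builtin_constant_p(i) && i == 128)
		asm volatile(LOCK_PREFIX "subl %1,%0"
			     : "+m" (v->counter) : "ir" (-128) : "memory");
	else
		asm volatile(LOCK_PREFIX "addl %1,%0"
			     : "+m" (v->counter) : "ir" (i) : "memory");
}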

	-hpa


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
  2013-09-10 13:08 ` [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
@ 2013-09-11  1:59   ` Andy Lutomirski
  2013-09-11  8:25     ` Peter Zijlstra
  2013-09-11 11:14   ` Peter Zijlstra
  1 sibling, 1 reply; 52+ messages in thread
From: Andy Lutomirski @ 2013-09-11  1:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
	Frederic Weisbecker, linux-kernel, linux-arch

On 09/10/2013 06:08 AM, Peter Zijlstra wrote:
> In order to combine the preemption and need_resched test we need to
> fold the need_resched information into the preempt_count value.
> 
> We keep the existing TIF_NEED_RESCHED infrastructure in place but at 3
> sites test it and fold its value into preempt_count; namely:
> 
>  - resched_task() when setting TIF_NEED_RESCHED on the current task
>  - scheduler_ipi() when resched_task() sets TIF_NEED_RESCHED on a
>                    remote task it follows it up with a reschedule IPI
>                    and we can modify the cpu local preempt_count from
>                    there.
>  - cpu_idle_loop() for when resched_task() found tsk_is_polling().


It looks like the intel_idle code can get confused if TIF_NEED_RESCHED
is set but the preempt resched bit is not -- the need_resched call
between monitor and mwait won't notice TIF_NEED_RESCHED.

Is this condition possible?
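
(For context, the fast path in question looks roughly like this in
intel_idle(); eax/ecx are the mwait hints set up earlier in that function,
and the same code is quoted in a hunk further down the thread:)

	if (!need_resched()) {
		__monitor((void *)&current_thread_info()->flags, 0, 0);
		smp_mb();
		if (!need_resched())	/* the check between monitor and mwait */
			__mwait(eax, ecx);
	}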

--Andy


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
  2013-09-11  1:59   ` Andy Lutomirski
@ 2013-09-11  8:25     ` Peter Zijlstra
  2013-09-11 11:06       ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11  8:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
	Frederic Weisbecker, linux-kernel, linux-arch

On Tue, Sep 10, 2013 at 06:59:57PM -0700, Andy Lutomirski wrote:
> On 09/10/2013 06:08 AM, Peter Zijlstra wrote:
> > In order to combine the preemption and need_resched test we need to
> > fold the need_resched information into the preempt_count value.
> > 
> > We keep the existing TIF_NEED_RESCHED infrastructure in place but at 3
> > sites test it and fold its value into preempt_count; namely:
> > 
> >  - resched_task() when setting TIF_NEED_RESCHED on the current task
> >  - scheduler_ipi() when resched_task() sets TIF_NEED_RESCHED on a
> >                    remote task it follows it up with a reschedule IPI
> >                    and we can modify the cpu local preempt_count from
> >                    there.
> >  - cpu_idle_loop() for when resched_task() found tsk_is_polling().
> 
> 
> It looks like the intel_idle code can get confused if TIF_NEED_RESCHED
> is set but the preempt resched bit is not -- the need_resched call
> between monitor and mwait won't notice TIF_NEED_RESCHED.
> 
> Is this condition possible?

Ah indeed, I'll have to go find all idle loops out there it seems. Now
if only they were easy to spot :/

I was hoping the generic idle thing was good enough, apparently not
quite. Thanks for spotting that.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
  2013-09-11  8:25     ` Peter Zijlstra
@ 2013-09-11 11:06       ` Peter Zijlstra
  2013-09-11 13:34         ` Mike Galbraith
  2013-09-11 16:35         ` Andy Lutomirski
  0 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 11:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
	Frederic Weisbecker, linux-kernel, linux-arch, Len Brown,
	Vaidyanathan Srinivasan

On Wed, Sep 11, 2013 at 10:25:30AM +0200, Peter Zijlstra wrote:
> On Tue, Sep 10, 2013 at 06:59:57PM -0700, Andy Lutomirski wrote:

> > It looks like the intel_idle code can get confused if TIF_NEED_RESCHED
> > is set but the preempt resched bit is not -- the need_resched call
> > between monitor and mwait won't notice TIF_NEED_RESCHED.
> > 
> > Is this condition possible?
> 
> Ah indeed, I'll have to go find all idle loops out there it seems. Now
> if only they were easy to spot :/
> 
> I was hoping the generic idle thing was good enough, apparently not
> quite. Thanks for spotting that.

OK, and the reason I didn't notice is that the entire TS_POLLING thing
is completely wrecked so we'll get the interrupt anyway.

The below might fix it.. it boots, but then it already did so that
doesn't say much.

Mike, does it fix the funny you saw?

---
Subject: sched, idle: Fix the idle polling state logic
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Sep 11 12:43:13 CEST 2013

Mike reported that commit 7d1a9417 ("x86: Use generic idle loop")
regressed several workloads and caused excessive reschedule
interrupts.

The patch in question failed to notice that the x86 code had an
inverted sense of the polling state versus the new generic code (x86:
default polling, generic: default !polling).

Fix the two prominent x86 mwait based idle drivers and introduce a few
new generic polling helpers (fixing the wrong smp_mb__after_clear_bit
usage).

Also switch the idle routines to using test_need_resched() which is an
immediate TIF_NEED_RESCHED test as opposed to need_resched which will
end up being slightly different.

Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/kernel/process.c     |    6 +--
 drivers/acpi/processor_idle.c |   46 +++++-------------------
 drivers/idle/intel_idle.c     |    2 -
 include/linux/sched.h         |   78 ++++++++++++++++++++++++++++++++++++++----
 kernel/cpu/idle.c             |    9 ++--
 5 files changed, 89 insertions(+), 52 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -391,9 +391,9 @@ static void amd_e400_idle(void)
 		 * The switch back from broadcast mode needs to be
 		 * called with interrupts disabled.
 		 */
-		 local_irq_disable();
-		 clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
-		 local_irq_enable();
+		local_irq_disable();
+		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
+		local_irq_enable();
 	} else
 		default_idle();
 }
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -119,17 +119,10 @@ static struct dmi_system_id processor_po
  */
 static void acpi_safe_halt(void)
 {
-	current_thread_info()->status &= ~TS_POLLING;
-	/*
-	 * TS_POLLING-cleared state must be visible before we
-	 * test NEED_RESCHED:
-	 */
-	smp_mb();
-	if (!need_resched()) {
+	if (!test_need_resched()) {
 		safe_halt();
 		local_irq_disable();
 	}
-	current_thread_info()->status |= TS_POLLING;
 }
 
 #ifdef ARCH_APICTIMER_STOPS_ON_C3
@@ -737,6 +730,11 @@ static int acpi_idle_enter_c1(struct cpu
 	if (unlikely(!pr))
 		return -EINVAL;
 
+	if (cx->entry_method == ACPI_CSTATE_FFH) {
+		if (current_set_polling_and_test())
+			return -EINVAL;
+	}
+
 	lapic_timer_state_broadcast(pr, cx, 1);
 	acpi_idle_do_entry(cx);
 
@@ -790,18 +788,9 @@ static int acpi_idle_enter_simple(struct
 	if (unlikely(!pr))
 		return -EINVAL;
 
-	if (cx->entry_method != ACPI_CSTATE_FFH) {
-		current_thread_info()->status &= ~TS_POLLING;
-		/*
-		 * TS_POLLING-cleared state must be visible before we test
-		 * NEED_RESCHED:
-		 */
-		smp_mb();
-
-		if (unlikely(need_resched())) {
-			current_thread_info()->status |= TS_POLLING;
+	if (cx->entry_method == ACPI_CSTATE_FFH) {
+		if (current_set_polling_and_test())
 			return -EINVAL;
-		}
 	}
 
 	/*
@@ -819,9 +808,6 @@ static int acpi_idle_enter_simple(struct
 
 	sched_clock_idle_wakeup_event(0);
 
-	if (cx->entry_method != ACPI_CSTATE_FFH)
-		current_thread_info()->status |= TS_POLLING;
-
 	lapic_timer_state_broadcast(pr, cx, 0);
 	return index;
 }
@@ -858,18 +844,9 @@ static int acpi_idle_enter_bm(struct cpu
 		}
 	}
 
-	if (cx->entry_method != ACPI_CSTATE_FFH) {
-		current_thread_info()->status &= ~TS_POLLING;
-		/*
-		 * TS_POLLING-cleared state must be visible before we test
-		 * NEED_RESCHED:
-		 */
-		smp_mb();
-
-		if (unlikely(need_resched())) {
-			current_thread_info()->status |= TS_POLLING;
+	if (cx->entry_method == ACPI_CSTATE_FFH) {
+		if (current_set_polling_and_test())
 			return -EINVAL;
-		}
 	}
 
 	acpi_unlazy_tlb(smp_processor_id());
@@ -915,9 +892,6 @@ static int acpi_idle_enter_bm(struct cpu
 
 	sched_clock_idle_wakeup_event(0);
 
-	if (cx->entry_method != ACPI_CSTATE_FFH)
-		current_thread_info()->status |= TS_POLLING;
-
 	lapic_timer_state_broadcast(pr, cx, 0);
 	return index;
 }
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -359,7 +359,7 @@ static int intel_idle(struct cpuidle_dev
 	if (!(lapic_timer_reliable_states & (1 << (cstate))))
 		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
 
-	if (!need_resched()) {
+	if (!current_set_polling_and_test()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
 		smp_mb();
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2471,34 +2471,98 @@ static inline int tsk_is_polling(struct
 {
 	return task_thread_info(p)->status & TS_POLLING;
 }
-static inline void current_set_polling(void)
+static inline void __current_set_polling(void)
 {
 	current_thread_info()->status |= TS_POLLING;
 }
 
-static inline void current_clr_polling(void)
+static inline bool __must_check current_set_polling_and_test(void)
+{
+	__current_set_polling();
+
+	/*
+	 * Polling state must be visible before we test NEED_RESCHED,
+	 * paired by resched_task()
+	 */
+	smp_mb();
+
+	return unlikely(test_need_resched());
+}
+
+static inline void __current_clr_polling(void)
 {
 	current_thread_info()->status &= ~TS_POLLING;
-	smp_mb__after_clear_bit();
+}
+
+static inline bool __must_check current_clr_polling_and_test(void)
+{
+	__current_clr_polling();
+
+	/*
+	 * Polling state must be visible before we test NEED_RESCHED,
+	 * paired by resched_task()
+	 */
+	smp_mb();
+
+	return unlikely(test_need_resched());
 }
 #elif defined(TIF_POLLING_NRFLAG)
 static inline int tsk_is_polling(struct task_struct *p)
 {
 	return test_tsk_thread_flag(p, TIF_POLLING_NRFLAG);
 }
-static inline void current_set_polling(void)
+
+static inline void __current_set_polling(void)
 {
 	set_thread_flag(TIF_POLLING_NRFLAG);
 }
 
-static inline void current_clr_polling(void)
+static inline bool __must_check current_set_polling_and_test(void)
+{
+	__current_set_polling();
+
+	/*
+	 * Polling state must be visible before we test NEED_RESCHED,
+	 * paired by resched_task()
+	 *
+	 * XXX: assumes set/clear bit are identical barrier wise.
+	 */
+	smp_mb__after_clear_bit();
+
+	return unlikely(test_need_resched());
+}
+
+static inline void __current_clr_polling(void)
 {
 	clear_thread_flag(TIF_POLLING_NRFLAG);
 }
+
+static inline bool __must_check current_clr_polling_and_test(void)
+{
+	__current_clr_polling();
+
+	/*
+	 * Polling state must be visible before we test NEED_RESCHED,
+	 * paired by resched_task()
+	 */
+	smp_mb__after_clear_bit();
+
+	return unlikely(test_need_resched());
+}
+
 #else
 static inline int tsk_is_polling(struct task_struct *p) { return 0; }
-static inline void current_set_polling(void) { }
-static inline void current_clr_polling(void) { }
+static inline void __current_set_polling(void) { }
+static inline void __current_clr_polling(void) { }
+
+static inline bool __must_check current_set_polling_and_test(void)
+{
+	return unlikely(test_need_resched());
+}
+static inline bool __must_check current_clr_polling_and_test(void)
+{
+	return unlikely(test_need_resched());
+}
 #endif
 
 /*
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -44,7 +44,7 @@ static inline int cpu_idle_poll(void)
 	rcu_idle_enter();
 	trace_cpu_idle_rcuidle(0, smp_processor_id());
 	local_irq_enable();
-	while (!need_resched())
+	while (!test_need_resched())
 		cpu_relax();
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
 	rcu_idle_exit();
@@ -92,8 +92,7 @@ static void cpu_idle_loop(void)
 			if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
 				cpu_idle_poll();
 			} else {
-				current_clr_polling();
-				if (!need_resched()) {
+				if (!current_clr_polling_and_test()) {
 					stop_critical_timings();
 					rcu_idle_enter();
 					arch_cpu_idle();
@@ -103,7 +102,7 @@ static void cpu_idle_loop(void)
 				} else {
 					local_irq_enable();
 				}
-				current_set_polling();
+				__current_set_polling();
 			}
 			arch_cpu_idle_exit();
 			/*
@@ -136,7 +135,7 @@ void cpu_startup_entry(enum cpuhp_state
 	 */
 	boot_init_stack_canary();
 #endif
-	current_set_polling();
+	__current_set_polling();
 	arch_cpu_idle_prepare();
 	cpu_idle_loop();
 }

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
  2013-09-10 13:08 ` [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
  2013-09-11  1:59   ` Andy Lutomirski
@ 2013-09-11 11:14   ` Peter Zijlstra
  1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 11:14 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
	Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch


Prompted by a question from Mike I updated the Changelog to explain why
we need to keep TIF_NEED_RESCHED.

---
Subject: sched: Add NEED_RESCHED to the preempt_count
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Aug 14 14:55:31 CEST 2013

In order to combine the preemption and need_resched test we need to
fold the need_resched information into the preempt_count value.

Since the NEED_RESCHED flag is set across CPUs this needs to be an
atomic operation, however we very much want to avoid making
preempt_count atomic, therefore we keep the existing TIF_NEED_RESCHED
infrastructure in place but at 3 sites test it and fold its value into
preempt_count; namely:

 - resched_task() when setting TIF_NEED_RESCHED on the current task
 - scheduler_ipi() when resched_task() sets TIF_NEED_RESCHED on a
                   remote task it follows it up with a reschedule IPI
                   and we can modify the cpu local preempt_count from
                   there.
 - cpu_idle_loop() for when resched_task() found tsk_is_polling().

We use an inverted bitmask to indicate need_resched so that a 0 means
both need_resched and !atomic.
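
(A tiny userspace illustration of that invariant -- not part of the patch;
the constant mirrors the PREEMPT_NEED_RESCHED define added below, everything
else is made up:)

	#include <stdio.h>

	#define PREEMPT_NEED_RESCHED	0x80000000U

	int main(void)
	{
		/* preempt_count of 1, no resched pending: inverted bit set */
		unsigned int count = 1 | PREEMPT_NEED_RESCHED;

		count &= ~PREEMPT_NEED_RESCHED;	/* set_preempt_need_resched() */
		count--;			/* the preempt_enable() decrement */
		if (!count)			/* one zero-test covers both */
			printf("need resched and preemption allowed\n");
		return 0;
	}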

Also remove the barrier() in preempt_enable() between
preempt_enable_no_resched() and preempt_check_resched() to avoid
having to reload the preemption value and allow the compiler to use
the flags of the previous decrement. I couldn't come up with any sane
reason for this barrier() to be there as preempt_enable_no_resched()
already has a barrier() before doing the decrement.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/preempt.h     |   42 +++++++++++++++++++++++++++++++++++++-----
 include/linux/sched.h       |    2 +-
 include/linux/thread_info.h |    1 +
 kernel/context_tracking.c   |    2 +-
 kernel/cpu/idle.c           |   10 ++++++++++
 kernel/sched/core.c         |   18 ++++++++++++++----
 6 files changed, 64 insertions(+), 11 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -10,9 +10,19 @@
 #include <linux/linkage.h>
 #include <linux/list.h>
 
+/*
+ * We use the MSB mostly because it's available; see <linux/hardirq.h> for
+ * the other bits.
+ */
+#define PREEMPT_NEED_RESCHED	0x80000000
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
 static __always_inline int preempt_count(void)
 {
-	return current_thread_info()->preempt_count;
+	return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED;
 }
 
 static __always_inline int *preempt_count_ptr(void)
@@ -20,6 +30,30 @@ static __always_inline int *preempt_coun
 	return &current_thread_info()->preempt_count;
 }
 
+/*
+ * We fold the NEED_RESCHED bit into the preempt count such that
+ * preempt_enable() can decrement and test for needing to reschedule with a
+ * single instruction.
+ *
+ * We invert the actual bit, so that when the decrement hits 0 we know we both
+ * need to resched (the bit is cleared) and can resched (no preempt count).
+ */
+
+static __always_inline void set_preempt_need_resched(void)
+{
+	*preempt_count_ptr() &= ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline void clear_preempt_need_resched(void)
+{
+	*preempt_count_ptr() |= PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline bool test_preempt_need_resched(void)
+{
+	return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED);
+}
+
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
   extern void add_preempt_count(int val);
   extern void sub_preempt_count(int val);
@@ -37,7 +71,7 @@ asmlinkage void preempt_schedule(void);
 
 #define preempt_check_resched() \
 do { \
-	if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+	if (unlikely(!*preempt_count_ptr())) \
 		preempt_schedule(); \
 } while (0)
 
@@ -47,7 +81,7 @@ void preempt_schedule_context(void);
 
 #define preempt_check_resched_context() \
 do { \
-	if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+	if (unlikely(!*preempt_count_ptr())) \
 		preempt_schedule_context(); \
 } while (0)
 #else
@@ -83,7 +117,6 @@ do { \
 #define preempt_enable() \
 do { \
 	preempt_enable_no_resched(); \
-	barrier(); \
 	preempt_check_resched(); \
 } while (0)
 
@@ -111,7 +144,6 @@ do { \
 #define preempt_enable_notrace() \
 do { \
 	preempt_enable_no_resched_notrace(); \
-	barrier(); \
 	preempt_check_resched_context(); \
 } while (0)
 
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2405,7 +2405,7 @@ static inline int signal_pending_state(l
 
 static inline int need_resched(void)
 {
-	return unlikely(test_thread_flag(TIF_NEED_RESCHED));
+	return unlikely(test_preempt_need_resched());
 }
 
 /*
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -104,6 +104,7 @@ static inline int test_ti_thread_flag(st
 #define test_thread_flag(flag) \
 	test_ti_thread_flag(current_thread_info(), flag)
 
+#define test_need_resched()	test_thread_flag(TIF_NEED_RESCHED)
 #define set_need_resched()	set_thread_flag(TIF_NEED_RESCHED)
 #define clear_need_resched()	clear_thread_flag(TIF_NEED_RESCHED)
 
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -115,7 +115,7 @@ void __sched notrace preempt_schedule_co
 {
 	enum ctx_state prev_ctx;
 
-	if (likely(!preemptible()))
+	if (likely(preempt_count() || irqs_disabled()))
 		return;
 
 	/*
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -106,6 +106,13 @@ static void cpu_idle_loop(void)
 				current_set_polling();
 			}
 			arch_cpu_idle_exit();
+			/*
+			 * We need to test and propagate the TIF_NEED_RESCHED
+			 * bit here because we might not have sent the
+			 * reschedule IPI to idle tasks.
+			 */
+			if (test_need_resched())
+				set_preempt_need_resched();
 		}
 		tick_nohz_idle_exit();
 		schedule_preempt_disabled();
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -526,8 +526,10 @@ void resched_task(struct task_struct *p)
 	set_tsk_need_resched(p);
 
 	cpu = task_cpu(p);
-	if (cpu == smp_processor_id())
+	if (cpu == smp_processor_id()) {
+		set_preempt_need_resched();
 		return;
+	}
 
 	/* NEED_RESCHED must be visible before we test polling */
 	smp_mb();
@@ -1411,6 +1413,14 @@ static void sched_ttwu_pending(void)
 
 void scheduler_ipi(void)
 {
+	/*
+	 * Fold TIF_NEED_RESCHED into the preempt_count; anybody setting
+	 * TIF_NEED_RESCHED remotely (for the first time) will also send
+	 * this IPI.
+	 */
+	if (test_need_resched())
+		set_preempt_need_resched();
+
 	if (llist_empty(&this_rq()->wake_list)
 			&& !tick_nohz_full_cpu(smp_processor_id())
 			&& !got_nohz_idle_kick())
@@ -2445,6 +2455,7 @@ static void __sched __schedule(void)
 	put_prev_task(rq, prev);
 	next = pick_next_task(rq);
 	clear_tsk_need_resched(prev);
+	clear_preempt_need_resched();
 	rq->skip_clock_update = 0;
 
 	if (likely(prev != next)) {
@@ -2531,7 +2542,7 @@ asmlinkage void __sched notrace preempt_
 	 * If there is a non-zero preempt_count or interrupts are disabled,
 	 * we do not want to preempt the current task. Just return..
 	 */
-	if (likely(!preemptible()))
+	if (likely(preempt_count() || irqs_disabled()))
 		return;
 
 	do {
@@ -2556,11 +2567,10 @@ EXPORT_SYMBOL(preempt_schedule);
  */
 asmlinkage void __sched preempt_schedule_irq(void)
 {
-	struct thread_info *ti = current_thread_info();
 	enum ctx_state prev_state;
 
 	/* Catch callers which need to be fixed */
-	BUG_ON(ti->preempt_count || !irqs_disabled());
+	BUG_ON(preempt_count() || !irqs_disabled());
 
 	prev_state = exception_enter();
 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 21:43             ` Linus Torvalds
  2013-09-10 21:51               ` H. Peter Anvin
@ 2013-09-11 13:13               ` Peter Zijlstra
  2013-09-11 13:26                 ` Peter Zijlstra
                                   ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 13:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Tue, Sep 10, 2013 at 02:43:06PM -0700, Linus Torvalds wrote:
> That said, looking at your patch, I get the *very* strong feeling that
> we could make a macro that does all the repetitions for us, and then
> have a
> 
>   GENERATE_RMW(atomic_sub_and_test, LOCK_PREFIX "subl", "e", "")

The below seems to compile..

---
 arch/x86/include/asm/atomic.h      |   29 +--------
 arch/x86/include/asm/atomic64_64.h |   28 +--------
 arch/x86/include/asm/local.h       |   28 +--------
 arch/x86/include/asm/addcc.h       |  115 +++++++++++++++++++++++++++++++++++++
 4 files changed, 128 insertions(+), 72 deletions(-)

Index: linux-2.6/arch/x86/include/asm/atomic.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/atomic.h
+++ linux-2.6/arch/x86/include/asm/atomic.h
@@ -6,6 +6,7 @@
 #include <asm/processor.h>
 #include <asm/alternative.h>
 #include <asm/cmpxchg.h>
+#include <asm/addcc.h>
 
 /*
  * Atomic operations that C can't guarantee us.  Useful for
@@ -76,12 +77,7 @@ static inline void atomic_sub(int i, ato
  */
 static inline int atomic_sub_and_test(int i, atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "subl %2,%0; sete %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GENERATE_ADDcc(v->counter, -i, LOCK_PREFIX, "e");
 }
 
 /**
@@ -118,12 +114,7 @@ static inline void atomic_dec(atomic_t *
  */
 static inline int atomic_dec_and_test(atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "decl %0; sete %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GENERATE_ADDcc(v->counter, -1, LOCK_PREFIX, "e");
 }
 
 /**
@@ -136,12 +127,7 @@ static inline int atomic_dec_and_test(at
  */
 static inline int atomic_inc_and_test(atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "incl %0; sete %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GENERATE_ADDcc(v->counter, 1, LOCK_PREFIX, "e");
 }
 
 /**
@@ -155,12 +141,7 @@ static inline int atomic_inc_and_test(at
  */
 static inline int atomic_add_negative(int i, atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "addl %2,%0; sets %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GENERATE_ADDcc(v->counter, i, LOCK_PREFIX, "s");
 }
 
 /**
Index: linux-2.6/arch/x86/include/asm/atomic64_64.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/atomic64_64.h
+++ linux-2.6/arch/x86/include/asm/atomic64_64.h
@@ -72,12 +72,7 @@ static inline void atomic64_sub(long i,
  */
 static inline int atomic64_sub_and_test(long i, atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "subq %2,%0; sete %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "er" (i), "m" (v->counter) : "memory");
-	return c;
+	GENERATE_ADDcc(v->counter, -i, LOCK_PREFIX, "e");
 }
 
 /**
@@ -116,12 +111,7 @@ static inline void atomic64_dec(atomic64
  */
 static inline int atomic64_dec_and_test(atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "decq %0; sete %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "m" (v->counter) : "memory");
-	return c != 0;
+	GENERATE_ADDcc(v->counter, -1, LOCK_PREFIX, "e");
 }
 
 /**
@@ -134,12 +124,7 @@ static inline int atomic64_dec_and_test(
  */
 static inline int atomic64_inc_and_test(atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "incq %0; sete %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "m" (v->counter) : "memory");
-	return c != 0;
+	GENERATE_ADDcc(v->counter, 1, LOCK_PREFIX, "e");
 }
 
 /**
@@ -153,12 +138,7 @@ static inline int atomic64_inc_and_test(
  */
 static inline int atomic64_add_negative(long i, atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "addq %2,%0; sets %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "er" (i), "m" (v->counter) : "memory");
-	return c;
+	GENERATE_ADDcc(v->counter, i, LOCK_PREFIX, "s");
 }
 
 /**
Index: linux-2.6/arch/x86/include/asm/local.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/local.h
+++ linux-2.6/arch/x86/include/asm/local.h
@@ -52,12 +52,7 @@ static inline void local_sub(long i, loc
  */
 static inline int local_sub_and_test(long i, local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_SUB "%2,%0; sete %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GENERATE_ADDcc(l->a.counter, -i, "", "e");
 }
 
 /**
@@ -70,12 +65,7 @@ static inline int local_sub_and_test(lon
  */
 static inline int local_dec_and_test(local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_DEC "%0; sete %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GENERATE_ADDcc(l->a.counter, -1, "", "e");
 }
 
 /**
@@ -88,12 +78,7 @@ static inline int local_dec_and_test(loc
  */
 static inline int local_inc_and_test(local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_INC "%0; sete %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GENERATE_ADDcc(l->a.counter, 1, "", "e");
 }
 
 /**
@@ -107,12 +92,7 @@ static inline int local_inc_and_test(loc
  */
 static inline int local_add_negative(long i, local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_ADD "%2,%0; sets %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GENERATE_ADDcc(l->a.counter, i, "", "s");
 }
 
 /**
Index: linux-2.6/arch/x86/include/asm/addcc.h
===================================================================
--- /dev/null
+++ linux-2.6/arch/x86/include/asm/addcc.h
@@ -0,0 +1,115 @@
+#ifndef _ASM_X86_ADDcc
+#define _ASM_X86_ADDcc
+
+extern void __bad_addcc_size(void);
+
+#ifdef CC_HAVE_ASM_GOTO
+
+#define GENERATE_ADDcc(var, val, lock, cc)				\
+do {									\
+	const int add_ID__ = (__builtin_constant_p(val) &&		\
+			((val) == 1 || (val) == -1)) ? (val) : 0;	\
+									\
+	switch (sizeof(var)) {						\
+	case 4:								\
+		if (add_ID__ == 1) {					\
+			asm volatile goto(lock "incl %0;"		\
+					  "j" cc " %l[cc_label]"	\
+					  : : "m" (var)			\
+					  : "memory" : cc_label);	\
+		} else if (add_ID__ == -1) {				\
+			asm volatile goto(lock "decl %0;"		\
+					  "j" cc " %l[cc_label]"	\
+					  : : "m" (var)			\
+					  : "memory" : cc_label);	\
+		} else {						\
+			asm volatile goto(lock "addl %1, %0;"		\
+					  "j" cc " %l[cc_label]"	\
+					  : : "m" (var), "er" (val)	\
+					  : "memory" : cc_label);	\
+		}							\
+		break;							\
+									\
+	case 8:								\
+		if (add_ID__ == 1) {					\
+			asm volatile goto(lock "incq %0;"		\
+					  "j" cc " %l[cc_label]"	\
+					  : : "m" (var)			\
+					  : "memory" : cc_label);	\
+		} else if (add_ID__ == -1) {				\
+			asm volatile goto(lock "decq %0;"		\
+					  "j" cc " %l[cc_label]"	\
+					  : : "m" (var)			\
+					  : "memory" : cc_label);	\
+		} else {						\
+			asm volatile goto(lock "addq %1, %0;"		\
+					  "j" cc " %l[cc_label]"	\
+					  : : "m" (var), "er" (val)	\
+					  : "memory" : cc_label);	\
+		}							\
+		break;							\
+									\
+	default: __bad_addcc_size();					\
+	}								\
+									\
+	return 0;							\
+cc_label:								\
+	return 1;							\
+} while (0)
+
+#else /* !CC_HAVE_ASM_GOTO */
+
+#define GENERATE_ADDcc(var, val, lock, cc)				\
+do {									\
+	const int add_ID__ = (__builtin_constant_p(val) &&		\
+			((val) == 1 || (val) == -1)) ? (val) : 0;	\
+	char c;								\
+									\
+	switch (sizeof(var)) {						\
+	case 4:								\
+		if (add_ID__ == 1) {					\
+			asm volatile (lock "incl %0;"			\
+					  "set" cc " %1"		\
+					  : "+m" (var), "=qm" (c)	\
+					  : : "memory");		\
+		} else if (add_ID__ == -1) {				\
+			asm volatile (lock "decl %0;"			\
+					  "set" cc " %1"		\
+					  : "+m" (var), "=qm" (c)	\
+					  : : "memory");		\
+		} else {						\
+			asm volatile (lock "addl %2, %0;"		\
+					  "set" cc " %1"		\
+					  : "+m" (var), "=qm" (c)	\
+					  : "er" (val) : "memory");	\
+		}							\
+		break;							\
+									\
+	case 8:								\
+		if (add_ID__ == 1) {					\
+			asm volatile (lock "incq %0;"			\
+					  "set" cc " %1"		\
+					  : "+m" (var), "=qm" (c)	\
+					  : : "memory");		\
+		} else if (add_ID__ == -1) {				\
+			asm volatile (lock "decq %0;"			\
+					  "set" cc " %1"		\
+					  : "+m" (var), "=qm" (c)	\
+					  : : "memory");		\
+		} else {						\
+			asm volatile (lock "addq %2, %0;"		\
+					  "set" cc " %1"		\
+					  : "+m" (var), "=qm" (c)	\
+					  : "er" (val) : "memory");	\
+		}							\
+		break;							\
+									\
+	default: __bad_addcc_size();					\
+	}								\
+									\
+	return c != 0;							\
+} while (0)
+
+#endif /* CC_HAVE_ASM_GOTO */
+
+#endif /* _ASM_X86_ADDcc */

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-11 13:13               ` Peter Zijlstra
@ 2013-09-11 13:26                 ` Peter Zijlstra
  2013-09-11 15:29                 ` H. Peter Anvin
  2013-09-11 15:33                 ` Linus Torvalds
  2 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 13:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Wed, Sep 11, 2013 at 03:13:23PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 10, 2013 at 02:43:06PM -0700, Linus Torvalds wrote:
> > That said, looking at your patch, I get the *very* strong feeling that
> > we could make a macro that does all the repetitions for us, and then
> > have a
> > 
> >   GENERATE_RMW(atomic_sub_and_test, LOCK_PREFIX "subl", "e", "")
> 
> The below seems to compile..
> 
> +#define GENERATE_ADDcc(var, val, lock, cc)				\

And here's one that builds and is usable for per-cpu things too:

static __always_inline bool __preempt_count_dec_and_test(void)
{
	GENERATE_ADDcc(__preempt_count, -1, "", __percpu_arg(0), "e");
}

---
 arch/x86/include/asm/addcc.h       |  115 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/atomic.h      |   29 +--------
 arch/x86/include/asm/atomic64_64.h |   28 +--------
 arch/x86/include/asm/local.h       |   28 +--------
 4 files changed, 128 insertions(+), 72 deletions(-)

--- /dev/null
+++ b/arch/x86/include/asm/addcc.h
@@ -0,0 +1,115 @@
+#ifndef _ASM_X86_ADDcc
+#define _ASM_X86_ADDcc
+
+extern void __bad_addcc_size(void);
+
+#ifdef CC_HAVE_ASM_GOTO
+
+#define GENERATE_ADDcc(var, val, lock, arg0, cc)			\
+do {									\
+	const int add_ID__ = (__builtin_constant_p(val) &&		\
+			((val) == 1 || (val) == -1)) ? (val) : 0;	\
+									\
+	switch (sizeof(var)) {						\
+	case 4:								\
+		if (add_ID__ == 1) {					\
+			asm volatile goto(lock "incl " arg0 ";"		\
+					  "j" cc " %l[cc_label]"	\
+					  : : "m" (var)			\
+					  : "memory" : cc_label);	\
+		} else if (add_ID__ == -1) {				\
+			asm volatile goto(lock "decl " arg0 ";"		\
+					  "j" cc " %l[cc_label]"	\
+					  : : "m" (var)			\
+					  : "memory" : cc_label);	\
+		} else {						\
+			asm volatile goto(lock "addl %1, " arg0 ";"	\
+					  "j" cc " %l[cc_label]"	\
+					  : : "m" (var), "er" (val)	\
+					  : "memory" : cc_label);	\
+		}							\
+		break;							\
+									\
+	case 8:								\
+		if (add_ID__ == 1) {					\
+			asm volatile goto(lock "incq " arg0 ";"		\
+					  "j" cc " %l[cc_label]"	\
+					  : : "m" (var)			\
+					  : "memory" : cc_label);	\
+		} else if (add_ID__ == -1) {				\
+			asm volatile goto(lock "decq " arg0 ";"		\
+					  "j" cc " %l[cc_label]"	\
+					  : : "m" (var)			\
+					  : "memory" : cc_label);	\
+		} else {						\
+			asm volatile goto(lock "addq %1, " arg0 ";"	\
+					  "j" cc " %l[cc_label]"	\
+					  : : "m" (var), "er" (val)	\
+					  : "memory" : cc_label);	\
+		}							\
+		break;							\
+									\
+	default: __bad_addcc_size();					\
+	}								\
+									\
+	return 0;							\
+cc_label:								\
+	return 1;							\
+} while (0)
+
+#else /* !CC_HAVE_ASM_GOTO */
+
+#define GENERATE_ADDcc(var, val, lock, arg0, cc)			\
+do {									\
+	const int add_ID__ = (__builtin_constant_p(val) &&		\
+			((val) == 1 || (val) == -1)) ? (val) : 0;	\
+	char c;								\
+									\
+	switch (sizeof(var)) {						\
+	case 4:								\
+		if (add_ID__ == 1) {					\
+			asm volatile (lock "incl " arg0 ";"		\
+					  "set" cc " %1"		\
+					  : "+m" (var), "=qm" (c)	\
+					  : : "memory");		\
+		} else if (add_ID__ == -1) {				\
+			asm volatile (lock "decl " arg0 ";"		\
+					  "set" cc " %1"		\
+					  : "+m" (var), "=qm" (c)	\
+					  : : "memory");		\
+		} else {						\
+			asm volatile (lock "addl %2, " arg0 ";"		\
+					  "set" cc " %1"		\
+					  : "+m" (var), "=qm" (c)	\
+					  : "er" (val) : "memory");	\
+		}							\
+		break;							\
+									\
+	case 8:								\
+		if (add_ID__ == 1) {					\
+			asm volatile (lock "incq " arg0 ";"		\
+					  "set" cc " %1"		\
+					  : "+m" (var), "=qm" (c)	\
+					  : : "memory");		\
+		} else if (add_ID__ == -1) {				\
+			asm volatile (lock "decq " arg0 ";"		\
+					  "set" cc " %1"		\
+					  : "+m" (var), "=qm" (c)	\
+					  : : "memory");		\
+		} else {						\
+			asm volatile (lock "addq %2, " arg0 ";"		\
+					  "set" cc " %1"		\
+					  : "+m" (var), "=qm" (c)	\
+					  : "er" (val) : "memory");	\
+		}							\
+		break;							\
+									\
+	default: __bad_addcc_size();					\
+	}								\
+									\
+	return c != 0;							\
+} while (0)
+
+#endif /* CC_HAVE_ASM_GOTO */
+
+#endif /* _ASM_X86_ADDcc */
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -6,6 +6,7 @@
 #include <asm/processor.h>
 #include <asm/alternative.h>
 #include <asm/cmpxchg.h>
+#include <asm/addcc.h>
 
 /*
  * Atomic operations that C can't guarantee us.  Useful for
@@ -76,12 +77,7 @@ static inline void atomic_sub(int i, ato
  */
 static inline int atomic_sub_and_test(int i, atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "subl %2,%0; sete %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GENERATE_ADDcc(v->counter, -i, LOCK_PREFIX, "%0", "e");
 }
 
 /**
@@ -118,12 +114,7 @@ static inline void atomic_dec(atomic_t *
  */
 static inline int atomic_dec_and_test(atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "decl %0; sete %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GENERATE_ADDcc(v->counter, -1, LOCK_PREFIX, "%0", "e");
 }
 
 /**
@@ -136,12 +127,7 @@ static inline int atomic_dec_and_test(at
  */
 static inline int atomic_inc_and_test(atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "incl %0; sete %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GENERATE_ADDcc(v->counter, 1, LOCK_PREFIX, "%0", "e");
 }
 
 /**
@@ -155,12 +141,7 @@ static inline int atomic_inc_and_test(at
  */
 static inline int atomic_add_negative(int i, atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "addl %2,%0; sets %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GENERATE_ADDcc(v->counter, i, LOCK_PREFIX, "%0", "s");
 }
 
 /**
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -72,12 +72,7 @@ static inline void atomic64_sub(long i,
  */
 static inline int atomic64_sub_and_test(long i, atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "subq %2,%0; sete %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "er" (i), "m" (v->counter) : "memory");
-	return c;
+	GENERATE_ADDcc(v->counter, -i, LOCK_PREFIX, "%0", "e");
 }
 
 /**
@@ -116,12 +111,7 @@ static inline void atomic64_dec(atomic64
  */
 static inline int atomic64_dec_and_test(atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "decq %0; sete %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "m" (v->counter) : "memory");
-	return c != 0;
+	GENERATE_ADDcc(v->counter, -1, LOCK_PREFIX, "%0", "e");
 }
 
 /**
@@ -134,12 +124,7 @@ static inline int atomic64_dec_and_test(
  */
 static inline int atomic64_inc_and_test(atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "incq %0; sete %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "m" (v->counter) : "memory");
-	return c != 0;
+	GENERATE_ADDcc(v->counter, 1, LOCK_PREFIX, "%0", "e");
 }
 
 /**
@@ -153,12 +138,7 @@ static inline int atomic64_inc_and_test(
  */
 static inline int atomic64_add_negative(long i, atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "addq %2,%0; sets %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "er" (i), "m" (v->counter) : "memory");
-	return c;
+	GENERATE_ADDcc(v->counter, i, LOCK_PREFIX, "%0", "s");
 }
 
 /**
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -52,12 +52,7 @@ static inline void local_sub(long i, loc
  */
 static inline int local_sub_and_test(long i, local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_SUB "%2,%0; sete %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GENERATE_ADDcc(l->a.counter, -i, "", "%0", "e");
 }
 
 /**
@@ -70,12 +65,7 @@ static inline int local_sub_and_test(lon
  */
 static inline int local_dec_and_test(local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_DEC "%0; sete %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GENERATE_ADDcc(l->a.counter, -1, "", "%0", "e");
 }
 
 /**
@@ -88,12 +78,7 @@ static inline int local_dec_and_test(loc
  */
 static inline int local_inc_and_test(local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_INC "%0; sete %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GENERATE_ADDcc(l->a.counter, 1, "", "%0", "e");
 }
 
 /**
@@ -107,12 +92,7 @@ static inline int local_inc_and_test(loc
  */
 static inline int local_add_negative(long i, local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_ADD "%2,%0; sets %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GENERATE_ADDcc(l->a.counter, i, "", "%0", "s");
 }
 
 /**

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
  2013-09-11 11:06       ` Peter Zijlstra
@ 2013-09-11 13:34         ` Mike Galbraith
  2013-09-12  6:01           ` Mike Galbraith
  2013-09-11 16:35         ` Andy Lutomirski
  1 sibling, 1 reply; 52+ messages in thread
From: Mike Galbraith @ 2013-09-11 13:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Peter Anvin, Thomas Gleixner, Arjan van de Ven,
	Frederic Weisbecker, linux-kernel, linux-arch, Len Brown,
	Vaidyanathan Srinivasan

On Wed, 2013-09-11 at 13:06 +0200, Peter Zijlstra wrote: 
> On Wed, Sep 11, 2013 at 10:25:30AM +0200, Peter Zijlstra wrote:
> > On Tue, Sep 10, 2013 at 06:59:57PM -0700, Andy Lutomirski wrote:
> 
> > > It looks like the intel_idle code can get confused if TIF_NEED_RESCHED
> > > is set but the preempt resched bit is not -- the need_resched call
> > > between monitor and mwait won't notice TIF_NEED_RESCHED.
> > > 
> > > Is this condition possible?
> > 
> > Ah indeed, I'll have to go find all idle loops out there it seems. Now
> > if only they were easy to spot :/
> > 
> > I was hoping the generic idle thing was good enough, apparently not
> > quite. Thanks for spotting that.
> 
> OK, and the reason I didn't notice is that the entire TS_POLLING thing
> is completely wrecked so we'll get the interrupt anyway.
> 
> The below might fix it.. it boots, but then it already did so that
> doesn't say much.
> 
> Mike, does it fix the funny you saw?

Yup, modulo test_need_resched() not existing in master.

E5620 pipe-test
v3.7.10                  578.5 KHz
v3.7.10-nothrottle       366.7 KHz
v3.8.13                  468.3 KHz
v3.9.11                  462.0 KHz
v3.10.4                  419.4 KHz
v3.11-rc3-4-g36f571e     400.1 KHz
master-9013-gbf83e61     553.5 KHz

I still have your not quite complete hrtimer patch in there, lest the menu
governor munch any improvement, as well as my throttle-nohz patch, so it
seems we may be a _bit_ down vs 3.7. But reschedule_interrupt did crawl
back under its rock, and while I haven't yet booted the Q6600 box, the core2
Toshiba Satellite lappy is using mwait_idle_with_hints() in master.

Thanks,

-Mike

> ---
> Subject: sched, idle: Fix the idle polling state logic
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Sep 11 12:43:13 CEST 2013
> 
> Mike reported that commit 7d1a9417 ("x86: Use generic idle loop")
> regressed several workloads and caused excessive reschedule
> interrupts.
> 
> The patch in question failed to notice that the x86 code had an
> inverted sense of the polling state versus the new generic code (x86:
> default polling, generic: default !polling).
> 
> Fix the two prominent x86 mwait based idle drivers and introduce a few
> new generic polling helpers (fixing the wrong smp_mb__after_clear_bit
> usage).
> 
> Also switch the idle routines to using test_need_resched() which is an
> immediate TIF_NEED_RESCHED test as opposed to need_resched which will
> end up being slightly different.
> 
> Reported-by: Mike Galbraith <bitbucket@online.de>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/x86/kernel/process.c     |    6 +--
>  drivers/acpi/processor_idle.c |   46 +++++-------------------
>  drivers/idle/intel_idle.c     |    2 -
>  include/linux/sched.h         |   78 ++++++++++++++++++++++++++++++++++++++----
>  kernel/cpu/idle.c             |    9 ++--
>  5 files changed, 89 insertions(+), 52 deletions(-)
> 
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -391,9 +391,9 @@ static void amd_e400_idle(void)
>  		 * The switch back from broadcast mode needs to be
>  		 * called with interrupts disabled.
>  		 */
> -		 local_irq_disable();
> -		 clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> -		 local_irq_enable();
> +		local_irq_disable();
> +		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> +		local_irq_enable();
>  	} else
>  		default_idle();
>  }
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -119,17 +119,10 @@ static struct dmi_system_id processor_po
>   */
>  static void acpi_safe_halt(void)
>  {
> -	current_thread_info()->status &= ~TS_POLLING;
> -	/*
> -	 * TS_POLLING-cleared state must be visible before we
> -	 * test NEED_RESCHED:
> -	 */
> -	smp_mb();
> -	if (!need_resched()) {
> +	if (!test_need_resched()) {
>  		safe_halt();
>  		local_irq_disable();
>  	}
> -	current_thread_info()->status |= TS_POLLING;
>  }
>  
>  #ifdef ARCH_APICTIMER_STOPS_ON_C3
> @@ -737,6 +730,11 @@ static int acpi_idle_enter_c1(struct cpu
>  	if (unlikely(!pr))
>  		return -EINVAL;
>  
> +	if (cx->entry_method == ACPI_CSTATE_FFH) {
> +		if (current_set_polling_and_test())
> +			return -EINVAL;
> +	}
> +
>  	lapic_timer_state_broadcast(pr, cx, 1);
>  	acpi_idle_do_entry(cx);
>  
> @@ -790,18 +788,9 @@ static int acpi_idle_enter_simple(struct
>  	if (unlikely(!pr))
>  		return -EINVAL;
>  
> -	if (cx->entry_method != ACPI_CSTATE_FFH) {
> -		current_thread_info()->status &= ~TS_POLLING;
> -		/*
> -		 * TS_POLLING-cleared state must be visible before we test
> -		 * NEED_RESCHED:
> -		 */
> -		smp_mb();
> -
> -		if (unlikely(need_resched())) {
> -			current_thread_info()->status |= TS_POLLING;
> +	if (cx->entry_method == ACPI_CSTATE_FFH) {
> +		if (current_set_polling_and_test())
>  			return -EINVAL;
> -		}
>  	}
>  
>  	/*
> @@ -819,9 +808,6 @@ static int acpi_idle_enter_simple(struct
>  
>  	sched_clock_idle_wakeup_event(0);
>  
> -	if (cx->entry_method != ACPI_CSTATE_FFH)
> -		current_thread_info()->status |= TS_POLLING;
> -
>  	lapic_timer_state_broadcast(pr, cx, 0);
>  	return index;
>  }
> @@ -858,18 +844,9 @@ static int acpi_idle_enter_bm(struct cpu
>  		}
>  	}
>  
> -	if (cx->entry_method != ACPI_CSTATE_FFH) {
> -		current_thread_info()->status &= ~TS_POLLING;
> -		/*
> -		 * TS_POLLING-cleared state must be visible before we test
> -		 * NEED_RESCHED:
> -		 */
> -		smp_mb();
> -
> -		if (unlikely(need_resched())) {
> -			current_thread_info()->status |= TS_POLLING;
> +	if (cx->entry_method == ACPI_CSTATE_FFH) {
> +		if (current_set_polling_and_test())
>  			return -EINVAL;
> -		}
>  	}
>  
>  	acpi_unlazy_tlb(smp_processor_id());
> @@ -915,9 +892,6 @@ static int acpi_idle_enter_bm(struct cpu
>  
>  	sched_clock_idle_wakeup_event(0);
>  
> -	if (cx->entry_method != ACPI_CSTATE_FFH)
> -		current_thread_info()->status |= TS_POLLING;
> -
>  	lapic_timer_state_broadcast(pr, cx, 0);
>  	return index;
>  }
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -359,7 +359,7 @@ static int intel_idle(struct cpuidle_dev
>  	if (!(lapic_timer_reliable_states & (1 << (cstate))))
>  		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
>  
> -	if (!need_resched()) {
> +	if (!current_set_polling_and_test()) {
>  
>  		__monitor((void *)&current_thread_info()->flags, 0, 0);
>  		smp_mb();
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2471,34 +2471,98 @@ static inline int tsk_is_polling(struct
>  {
>  	return task_thread_info(p)->status & TS_POLLING;
>  }
> -static inline void current_set_polling(void)
> +static inline void __current_set_polling(void)
>  {
>  	current_thread_info()->status |= TS_POLLING;
>  }
>  
> -static inline void current_clr_polling(void)
> +static inline bool __must_check current_set_polling_and_test(void)
> +{
> +	__current_set_polling();
> +
> +	/*
> +	 * Polling state must be visible before we test NEED_RESCHED,
> +	 * paired by resched_task()
> +	 */
> +	smp_mb();
> +
> +	return unlikely(test_need_resched());
> +}
> +
> +static inline void __current_clr_polling(void)
>  {
>  	current_thread_info()->status &= ~TS_POLLING;
> -	smp_mb__after_clear_bit();
> +}
> +
> +static inline bool __must_check current_clr_polling_and_test(void)
> +{
> +	__current_clr_polling();
> +
> +	/*
> +	 * Polling state must be visible before we test NEED_RESCHED,
> +	 * paired by resched_task()
> +	 */
> +	smp_mb();
> +
> +	return unlikely(test_need_resched());
>  }
>  #elif defined(TIF_POLLING_NRFLAG)
>  static inline int tsk_is_polling(struct task_struct *p)
>  {
>  	return test_tsk_thread_flag(p, TIF_POLLING_NRFLAG);
>  }
> -static inline void current_set_polling(void)
> +
> +static inline void __current_set_polling(void)
>  {
>  	set_thread_flag(TIF_POLLING_NRFLAG);
>  }
>  
> -static inline void current_clr_polling(void)
> +static inline bool __must_check current_set_polling_and_test(void)
> +{
> +	__current_set_polling();
> +
> +	/*
> +	 * Polling state must be visible before we test NEED_RESCHED,
> +	 * paired by resched_task()
> +	 *
> +	 * XXX: assumes set/clear bit are identical barrier wise.
> +	 */
> +	smp_mb__after_clear_bit();
> +
> +	return unlikely(test_need_resched());
> +}
> +
> +static inline void __current_clr_polling(void)
>  {
>  	clear_thread_flag(TIF_POLLING_NRFLAG);
>  }
> +
> +static inline bool __must_check current_clr_polling_and_test(void)
> +{
> +	__current_clr_polling();
> +
> +	/*
> +	 * Polling state must be visible before we test NEED_RESCHED,
> +	 * paired by resched_task()
> +	 */
> +	smp_mb__after_clear_bit();
> +
> +	return unlikely(test_need_resched());
> +}
> +
>  #else
>  static inline int tsk_is_polling(struct task_struct *p) { return 0; }
> -static inline void current_set_polling(void) { }
> -static inline void current_clr_polling(void) { }
> +static inline void __current_set_polling(void) { }
> +static inline void __current_clr_polling(void) { }
> +
> +static inline bool __must_check current_set_polling_and_test(void)
> +{
> +	return unlikely(test_need_resched());
> +}
> +static inline bool __must_check current_clr_polling_and_test(void)
> +{
> +	return unlikely(test_need_resched());
> +}
>  #endif
>  
>  /*
> --- a/kernel/cpu/idle.c
> +++ b/kernel/cpu/idle.c
> @@ -44,7 +44,7 @@ static inline int cpu_idle_poll(void)
>  	rcu_idle_enter();
>  	trace_cpu_idle_rcuidle(0, smp_processor_id());
>  	local_irq_enable();
> -	while (!need_resched())
> +	while (!test_need_resched())
>  		cpu_relax();
>  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>  	rcu_idle_exit();
> @@ -92,8 +92,7 @@ static void cpu_idle_loop(void)
>  			if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
>  				cpu_idle_poll();
>  			} else {
> -				current_clr_polling();
> -				if (!need_resched()) {
> +				if (!current_clr_polling_and_test()) {
>  					stop_critical_timings();
>  					rcu_idle_enter();
>  					arch_cpu_idle();
> @@ -103,7 +102,7 @@ static void cpu_idle_loop(void)
>  				} else {
>  					local_irq_enable();
>  				}
> -				current_set_polling();
> +				__current_set_polling();
>  			}
>  			arch_cpu_idle_exit();
>  			/*
> @@ -136,7 +135,7 @@ void cpu_startup_entry(enum cpuhp_state
>  	 */
>  	boot_init_stack_canary();
>  #endif
> -	current_set_polling();
> +	__current_set_polling();
>  	arch_cpu_idle_prepare();
>  	cpu_idle_loop();
>  }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-11 13:13               ` Peter Zijlstra
  2013-09-11 13:26                 ` Peter Zijlstra
@ 2013-09-11 15:29                 ` H. Peter Anvin
  2013-09-11 15:33                 ` Linus Torvalds
  2 siblings, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2013-09-11 15:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On 09/11/2013 06:13 AM, Peter Zijlstra wrote:
> On Tue, Sep 10, 2013 at 02:43:06PM -0700, Linus Torvalds wrote:
>> That said, looking at your patch, I get the *very* strong feeling that
>> we could make a macro that does all the repetitions for us, and then
>> have a
>>
>>   GENERATE_RMW(atomic_sub_and_test, LOCK_PREFIX "subl", "e", "")
> 
> The below seems to compile..
> 
> +
> +#define GENERATE_ADDcc(var, val, lock, cc)				\
> +do {									\
> +	const int add_ID__ = (__builtin_constant_p(val) &&		\
> +			((val) == 1 || (val) == -1)) ? (val) : 0;	\
> +									\
> +	switch (sizeof(var)) {						\
> +	case 4:								\
> +		if (add_ID__ == 1) {					\
> +			asm volatile goto(lock "incl %0;"		\
> +					  "j" cc " %l[cc_label]"	\
> +					  : : "m" (var)			\
> +					  : "memory" : cc_label);	\
> +		} else if (add_ID__ == -1) {				\
> +			asm volatile goto(lock "decl %0;"		\
> +					  "j" cc " %l[cc_label]"	\
> +					  : : "m" (var)			\
> +					  : "memory" : cc_label);	\
> +		} else {						\
> +			asm volatile goto(lock "addl %1, %0;"		\
> +					  "j" cc " %l[cc_label]"	\
> +					  : : "m" (var), "er" (val)	\
> +					  : "memory" : cc_label);	\
> +		}							\
> +		break;							\
> +									\
> +	case 8:								\
> +		if (add_ID__ == 1) {					\
> +			asm volatile goto(lock "incq %0;"		\
> +					  "j" cc " %l[cc_label]"	\
> +					  : : "m" (var)			\
> +					  : "memory" : cc_label);	\
> +		} else if (add_ID__ == -1) {				\
> +			asm volatile goto(lock "decq %0;"		\
> +					  "j" cc " %l[cc_label]"	\
> +					  : : "m" (var)			\
> +					  : "memory" : cc_label);	\
> +		} else {						\
> +			asm volatile goto(lock "addq %1, %0;"		\
> +					  "j" cc " %l[cc_label]"	\
> +					  : : "m" (var), "er" (val)	\
> +					  : "memory" : cc_label);	\
> +		}							\
> +		break;							\
> +									\

At least in the "asm goto" case you can use:

	lock "add%z0 %1,%0;"

... and skip the switch statement.

There was a bug in some old (gcc 3.x?) early x86-64 versions which would
treat %z0 as if it were %Z0, which means it would emit "ll" instead of "q",
but that doesn't apply to any gcc which has "asm goto"...
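
A minimal sketch of that %z form (illustrative only: the macro name is made
up, and it borrows the return-0/return-1 expansion style used elsewhere in
this thread, so it has to be expanded inside a function returning int):

#define GENERATE_ADDcc_Z(var, val, lock, cc)				\
do {									\
	/* gcc picks the "l"/"q" suffix from the size of (var) */	\
	asm volatile goto (lock "add%z0 %1, %0;"			\
			"j" cc " %l[cc_label]"				\
			: : "m" (var), "er" (val)			\
			: "memory" : cc_label);				\
	return 0;							\
cc_label:								\
	return 1;							\
} while (0)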

	-hpa



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-11 13:13               ` Peter Zijlstra
  2013-09-11 13:26                 ` Peter Zijlstra
  2013-09-11 15:29                 ` H. Peter Anvin
@ 2013-09-11 15:33                 ` Linus Torvalds
  2013-09-11 18:59                   ` Peter Zijlstra
  2 siblings, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-11 15:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Wed, Sep 11, 2013 at 6:13 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Sep 10, 2013 at 02:43:06PM -0700, Linus Torvalds wrote:
>> That said, looking at your patch, I get the *very* strong feeling that
>> we could make a macro that does all the repetitions for us, and then
>> have a
>>
>>   GENERATE_RMW(atomic_sub_and_test, LOCK_PREFIX "subl", "e", "")
>
> The below seems to compile..

You have that horrible duplication in the macro itself now, though.

The only difference between the 4-byte and the 8-byte one is the "l"
vs "q". And the counter increment/decrement argument (by the caller)
defines whether it's add/dec/inc. Why not just add those as parameters
to the macro, and suddenly the macro becomes 10x smaller.

An excessively complex macro that makes the arguments trivially
simpler is not worth it.

Especially since that excessive macro complexity now means that your
macro is useless for things that the *simpler* macro could have done,
like the bit-test-and-modify cases.

So your complexity actually makes things worse.

So just pass in the operation name and size. Then somebody will use it
for test_and_set_bit() and friends too.
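
Roughly the kind of shape that suggests, as a unary-only sketch (the name and
constraints are illustrative, not from any posted patch, and it assumes
expansion inside a function returning int):

#define GEN_RMWcc_SKETCH(fullop, var, cc)				\
do {									\
	/* caller supplies the full mnemonic, size suffix included */	\
	asm volatile goto (fullop " %0;"				\
			"j" cc " %l[cc_label]"				\
			: : "m" (var)					\
			: "memory" : cc_label);				\
	return 0;							\
cc_label:								\
	return 1;							\
} while (0)

/* e.g.  GEN_RMWcc_SKETCH(LOCK_PREFIX "decl", v->counter, "e"); */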

              Linus

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-10 16:24       ` Linus Torvalds
@ 2013-09-11 16:00         ` H. Peter Anvin
  0 siblings, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2013-09-11 16:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arjan van de Ven, Ingo Molnar, Peter Zijlstra, Andi Kleen,
	Mike Galbraith, Thomas Gleixner, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On 09/10/2013 09:24 AM, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 8:29 AM, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>>
>> also.. yuck on using "dec"
>> "dec" sucks, please use "sub foo  ,1" instead
> 
> That's a bigger instruction, largely due to the constant.
> 
>> (dec sucks because of its broken flags behavior; it creates basically a
>> bubble in the pipeline)
> 
> Intel could (and should) just fix that. It's "easy" enough - you just
> need to rename the carry flag as a separate register (and make sure
> that the conditional instructions that don't use it don't think they
> need it).
> 
> In fact I thought Intel already did that on their large cores. Are you
> sure they still have trouble with inc/dec?
> 

Big cores are fine, but Atom (and presumably Quark) might still suffer a
performance penalty.

	-hpa



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
  2013-09-11 11:06       ` Peter Zijlstra
  2013-09-11 13:34         ` Mike Galbraith
@ 2013-09-11 16:35         ` Andy Lutomirski
  2013-09-11 18:05           ` Peter Zijlstra
  1 sibling, 1 reply; 52+ messages in thread
From: Andy Lutomirski @ 2013-09-11 16:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
	Frederic Weisbecker, linux-kernel, linux-arch, Len Brown,
	Vaidyanathan Srinivasan

On Wed, Sep 11, 2013 at 4:06 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Sep 11, 2013 at 10:25:30AM +0200, Peter Zijlstra wrote:
>> On Tue, Sep 10, 2013 at 06:59:57PM -0700, Andy Lutomirski wrote:
>
>> > It looks like the intel_idle code can get confused if TIF_NEED_RESCHED
>> > is set but the preempt resched bit is not -- the need_resched call
>> > between monitor and mwait won't notice TIF_NEED_RESCHED.
>> >
>> > Is this condition possible?
>>
>> Ah indeed, I'll have to go find all idle loops out there it seems. Now
>> if only they were easy to spot :/
>>
>> I was hoping the generic idle thing was good enough, apparently not
>> quite. Thanks for spotting that.
>
> OK, and the reason I didn't notice is that the entire TS_POLLING thing
> is completely wrecked so we'll get the interrupt anyway.
>

I bet that this improves cross-cpu wakeup latency, too -- the old code
would presumably wake up the cpu and then immediately interrupt it.

It might be nice to rename one or both of need_resched and
test_need_resched, though -- the difference is somewhat inscrutable.

--Andy

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
  2013-09-11 16:35         ` Andy Lutomirski
@ 2013-09-11 18:05           ` Peter Zijlstra
  2013-09-11 18:07             ` Andy Lutomirski
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 18:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
	Frederic Weisbecker, linux-kernel, linux-arch, Len Brown,
	Vaidyanathan Srinivasan

On Wed, Sep 11, 2013 at 09:35:08AM -0700, Andy Lutomirski wrote:
> I bet that this improves cross-cpu wakeup latency, too -- the old code
> would presumably wake up the cpu and then immediately interrupt it.

Yeah, it's what clued Mike in to there being a problem.

> It might be nice to rename one or both of need_resched and
> test_need_resched, though -- the difference is somewhat inscrutable.

I agreed, I've just been unable to come up with anything sane.

The best I could come up with is renaming
{set,clear,test}_need_resched() to {}_tif_need_resched(), so we then end
up with test_tif_need_resched(), which is slightly more different from
need_resched().

Ideas?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
  2013-09-11 18:05           ` Peter Zijlstra
@ 2013-09-11 18:07             ` Andy Lutomirski
  0 siblings, 0 replies; 52+ messages in thread
From: Andy Lutomirski @ 2013-09-11 18:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
	Frederic Weisbecker, linux-kernel, linux-arch, Len Brown,
	Vaidyanathan Srinivasan

On Wed, Sep 11, 2013 at 11:05 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Sep 11, 2013 at 09:35:08AM -0700, Andy Lutomirski wrote:
>> I bet that this improves cross-cpu wakeup latency, too -- the old code
>> would presumably wake up the cpu and then immediately interrupt it.
>
> Yeah, it's what clued Mike in to there being a problem.
>
>> It might be nice to rename one or both of need_resched and
>> test_need_resched, though -- the difference is somewhat inscrutable.
>
> I agreed, I've just been unable to come up with anything sane.
>
> The best I could come up with is renaming
> {set,clear,test}_need_resched() to {}_tif_need_resched(), so we then end
> up with test_tif_need_resched(), which is slightly more different from
> need_resched().

Nothing great.  That's at least better than the current state, I think.

--Andy

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-11 15:33                 ` Linus Torvalds
@ 2013-09-11 18:59                   ` Peter Zijlstra
  2013-09-11 23:02                     ` Linus Torvalds
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 18:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Wed, Sep 11, 2013 at 08:33:21AM -0700, Linus Torvalds wrote:
> An excessively complex macro that makes the arguments trivially
> simpler is not worth it.

OK, stripped it down further, I couldn't quite see how to collapse the
unary and binary operator variants though :/

---
 arch/x86/include/asm/rmwcc.h       |   62 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/atomic.h      |   29 ++---------------
 arch/x86/include/asm/atomic64_64.h |   28 ++--------------
 arch/x86/include/asm/local.h       |   28 ++--------------
 4 files changed, 75 insertions(+), 72 deletions(-)

--- /dev/null
+++ b/arch/x86/include/asm/rmwcc.h
@@ -0,0 +1,62 @@
+#ifndef _ASM_X86_RMWcc
+#define _ASM_X86_RMWcc
+
+extern void __bad_rmwcc_nargs(void);
+
+#ifdef CC_HAVE_ASM_GOTO
+
+#define GENERATE_RMWcc(var, val, op, nargs, arg0, cc)			\
+do {									\
+	switch (nargs) {						\
+	case 0:								\
+		asm volatile goto (op " " arg0 ";"			\
+				"j" cc " %l[cc_label]"			\
+				: : "m" (var)				\
+				: "memory" : cc_label);			\
+		break;							\
+									\
+	case 1:								\
+		asm volatile goto (op " %1, " arg0 ";"			\
+				"j" cc " %l[cc_label]"			\
+				: : "m" (var), "er" (val)		\
+				: "memory" : cc_label);			\
+		break;							\
+									\
+	default: __bad_rmwcc_nargs();					\
+	}								\
+									\
+	return 0;							\
+cc_label:								\
+	return 1;							\
+} while (0)
+
+#else /* !CC_HAVE_ASM_GOTO */
+
+#define GENERATE_RMWcc(var, val, op, nargs, arg0, cc)			\
+do {									\
+	char c;								\
+									\
+	switch (nargs) {						\
+	case 0:								\
+		asm volatile (op " " arg0 ";"				\
+				"set" cc " %1"				\
+				: "+m" (var), "=qm" (c)			\
+				: : "memory");				\
+		break;							\
+									\
+	case 1:								\
+		asm volatile (op " %2, " arg0 ";"			\
+				"set" cc " %1"				\
+				: "+m" (var), "=qm" (c)			\
+				: "er" (val) : "memory");		\
+		break;							\
+									\
+	default: __bad_rmwcc_nargs();					\
+	}								\
+									\
+	return c != 0;							\
+} while (0)
+
+#endif /* CC_HAVE_ASM_GOTO */
+
+#endif /* _ASM_X86_RMWcc */
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -6,6 +6,7 @@
 #include <asm/processor.h>
 #include <asm/alternative.h>
 #include <asm/cmpxchg.h>
+#include <asm/rmwcc.h>
 
 /*
  * Atomic operations that C can't guarantee us.  Useful for
@@ -76,12 +77,7 @@ static inline void atomic_sub(int i, ato
  */
 static inline int atomic_sub_and_test(int i, atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "subl %2,%0; sete %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GENERATE_RMWcc(v->counter, i, LOCK_PREFIX "subl", 1, "%0", "e");
 }
 
 /**
@@ -118,12 +114,7 @@ static inline void atomic_dec(atomic_t *
  */
 static inline int atomic_dec_and_test(atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "decl %0; sete %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GENERATE_RMWcc(v->counter, 0, LOCK_PREFIX "decl", 0, "%0", "e");
 }
 
 /**
@@ -136,12 +127,7 @@ static inline int atomic_dec_and_test(at
  */
 static inline int atomic_inc_and_test(atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "incl %0; sete %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GENERATE_RMWcc(v->counter, 0, LOCK_PREFIX "incl", 0, "%0", "e");
 }
 
 /**
@@ -155,12 +141,7 @@ static inline int atomic_inc_and_test(at
  */
 static inline int atomic_add_negative(int i, atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "addl %2,%0; sets %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GENERATE_RMWcc(v->counter, i, LOCK_PREFIX "addl", 1, "%0", "s");
 }
 
 /**
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -72,12 +72,7 @@ static inline void atomic64_sub(long i,
  */
 static inline int atomic64_sub_and_test(long i, atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "subq %2,%0; sete %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "er" (i), "m" (v->counter) : "memory");
-	return c;
+	GENERATE_RMWcc(v->counter, i, LOCK_PREFIX "subq", 1, "%0", "e");
 }
 
 /**
@@ -116,12 +111,7 @@ static inline void atomic64_dec(atomic64
  */
 static inline int atomic64_dec_and_test(atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "decq %0; sete %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "m" (v->counter) : "memory");
-	return c != 0;
+	GENERATE_RMWcc(v->counter, 0, LOCK_PREFIX "decq", 0, "%0", "e");
 }
 
 /**
@@ -134,12 +124,7 @@ static inline int atomic64_dec_and_test(
  */
 static inline int atomic64_inc_and_test(atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "incq %0; sete %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "m" (v->counter) : "memory");
-	return c != 0;
+	GENERATE_RMWcc(v->counter, 0, LOCK_PREFIX "incq", 0, "%0", "e");
 }
 
 /**
@@ -153,12 +138,7 @@ static inline int atomic64_inc_and_test(
  */
 static inline int atomic64_add_negative(long i, atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "addq %2,%0; sets %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "er" (i), "m" (v->counter) : "memory");
-	return c;
+	GENERATE_RMWcc(v->counter, i, LOCK_PREFIX "addq", 1, "%0", "s");
 }
 
 /**
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -52,12 +52,7 @@ static inline void local_sub(long i, loc
  */
 static inline int local_sub_and_test(long i, local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_SUB "%2,%0; sete %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GENERATE_RMWcc(l->a.counter, i, _ASM_SUB, 1, "%0", "e");
 }
 
 /**
@@ -70,12 +65,7 @@ static inline int local_sub_and_test(lon
  */
 static inline int local_dec_and_test(local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_DEC "%0; sete %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GENERATE_RMWcc(l->a.counter, 0, _ASM_DEC, 0, "%0", "e");
 }
 
 /**
@@ -88,12 +78,7 @@ static inline int local_dec_and_test(loc
  */
 static inline int local_inc_and_test(local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_INC "%0; sete %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GENERATE_RMWcc(l->a.counter, 0, _ASM_INC, 0, "%0", "e");
 }
 
 /**
@@ -107,12 +92,7 @@ static inline int local_inc_and_test(loc
  */
 static inline int local_add_negative(long i, local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_ADD "%2,%0; sets %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GENERATE_RMWcc(l->a.counter, i, _ASM_ADD, 1, "%0", "s");
 }
 
 /**

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-11 18:59                   ` Peter Zijlstra
@ 2013-09-11 23:02                     ` Linus Torvalds
  2013-09-12  2:20                       ` Peter Zijlstra
  0 siblings, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-11 23:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Wed, Sep 11, 2013 at 11:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> OK, stripped it down further, I couldn't quite see how to collapse the
> unary and binary operator variants though :/

Ok, this looks pretty good. I assume it works too? ;)

                 Linus

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-11 23:02                     ` Linus Torvalds
@ 2013-09-12  2:20                       ` Peter Zijlstra
  2013-09-12  2:43                         ` Linus Torvalds
  2013-09-13  7:25                         ` Kevin Easton
  0 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-12  2:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Wed, Sep 11, 2013 at 04:02:14PM -0700, Linus Torvalds wrote:
> On Wed, Sep 11, 2013 at 11:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > OK, stripped it down further, I couldn't quite see how to collapse the
> > unary and binary operator variants though :/
> 
> Ok, this looks pretty good. I assume it works too? ;)

Only compile tested that one.. the below is kvm boot tested until root
mount -- I'll try on actual hardware when I've gotten some sleep.

I split the thing up into two macros GEN_UNARY_RMWcc and
GEN_BINARY_RMWcc which ends up being more readable as well as smaller
code overall.

I also attempted to convert asm/bitops.h, although I'm not sure it'll
compile right with older GCCs due to the comment near BITOP_ADDR()

It also changes the fallback from sbb %0,%0 to setc %0, which afaict
should be similar but I'm not too familiar with all that.
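
For reference, a minimal side-by-side of what the two fallbacks do with the
carry flag (illustrative helpers, not from the patch; both turn CF from a
bt-style instruction into a C truth value):

static inline int carry_via_sbb(volatile const unsigned long *addr, long nr)
{
	int oldbit;

	/* sbb %0,%0 leaves -1 in oldbit if CF was set, 0 otherwise */
	asm volatile("bt %2,%1\n\t"
		     "sbb %0,%0"
		     : "=r" (oldbit)
		     : "m" (*(unsigned long *)addr), "Ir" (nr));
	return oldbit;
}

static inline int carry_via_setc(volatile const unsigned long *addr, long nr)
{
	unsigned char c;

	/* setc writes 1 to a byte register (or memory) if CF was set */
	asm volatile("bt %2,%1\n\t"
		     "setc %0"
		     : "=qm" (c)
		     : "m" (*(unsigned long *)addr), "Ir" (nr));
	return c != 0;
}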

I might have to add the clobber to the macro arguments so we can do a
version without "memory" clobber, although bitops is inconsistent with
that as well, __test_and_clear_bit() doesn't have a memory clobber but
__test_and_change_bit() does.

---
 arch/x86/include/asm/atomic.h      |   29 +++---------------
 arch/x86/include/asm/atomic64_64.h |   28 ++---------------
 arch/x86/include/asm/bitops.h      |   58 +++++--------------------------------
 arch/x86/include/asm/local.h       |   28 ++---------------
 arch/x86/include/asm/rmwcc.h       |   52 +++++++++++++++++++++++++++++++++
 5 files changed, 73 insertions(+), 122 deletions(-)

--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -6,6 +6,7 @@
 #include <asm/processor.h>
 #include <asm/alternative.h>
 #include <asm/cmpxchg.h>
+#include <asm/rmwcc.h>
 
 /*
  * Atomic operations that C can't guarantee us.  Useful for
@@ -76,12 +77,7 @@ static inline void atomic_sub(int i, ato
  */
 static inline int atomic_sub_and_test(int i, atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "subl %2,%0; sete %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, i, "%0", "e");
 }
 
 /**
@@ -118,12 +114,7 @@ static inline void atomic_dec(atomic_t *
  */
 static inline int atomic_dec_and_test(atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "decl %0; sete %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", "e");
 }
 
 /**
@@ -136,12 +127,7 @@ static inline int atomic_dec_and_test(at
  */
 static inline int atomic_inc_and_test(atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "incl %0; sete %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", "e");
 }
 
 /**
@@ -155,12 +141,7 @@ static inline int atomic_inc_and_test(at
  */
 static inline int atomic_add_negative(int i, atomic_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "addl %2,%0; sets %1"
-		     : "+m" (v->counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, i, "%0", "s");
 }
 
 /**
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -72,12 +72,7 @@ static inline void atomic64_sub(long i,
  */
 static inline int atomic64_sub_and_test(long i, atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "subq %2,%0; sete %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "er" (i), "m" (v->counter) : "memory");
-	return c;
+	GEN_BINARY_RMWcc(LOCK_PREFIX "subq", v->counter, i, "%0", "e");
 }
 
 /**
@@ -116,12 +111,7 @@ static inline void atomic64_dec(atomic64
  */
 static inline int atomic64_dec_and_test(atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "decq %0; sete %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "m" (v->counter) : "memory");
-	return c != 0;
+	GEN_UNARY_RMWcc(LOCK_PREFIX "decq", v->counter, "%0", "e");
 }
 
 /**
@@ -134,12 +124,7 @@ static inline int atomic64_dec_and_test(
  */
 static inline int atomic64_inc_and_test(atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "incq %0; sete %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "m" (v->counter) : "memory");
-	return c != 0;
+	GEN_UNARY_RMWcc(LOCK_PREFIX "incq", v->counter, "%0", "e");
 }
 
 /**
@@ -153,12 +138,7 @@ static inline int atomic64_inc_and_test(
  */
 static inline int atomic64_add_negative(long i, atomic64_t *v)
 {
-	unsigned char c;
-
-	asm volatile(LOCK_PREFIX "addq %2,%0; sets %1"
-		     : "=m" (v->counter), "=qm" (c)
-		     : "er" (i), "m" (v->counter) : "memory");
-	return c;
+	GEN_BINARY_RMWcc(LOCK_PREFIX "addq", v->counter, i, "%0", "s");
 }
 
 /**
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -14,6 +14,7 @@
 
 #include <linux/compiler.h>
 #include <asm/alternative.h>
+#include <asm/rmwcc.h>
 
 #if BITS_PER_LONG == 32
 # define _BITOPS_LONG_SHIFT 5
@@ -204,12 +205,7 @@ static inline void change_bit(long nr, v
  */
 static inline int test_and_set_bit(long nr, volatile unsigned long *addr)
 {
-	int oldbit;
-
-	asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
-		     "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
-
-	return oldbit;
+	GEN_BINARY_RMWcc(LOCK_PREFIX "bts", *addr, nr, "%0", "c");
 }
 
 /**
@@ -236,13 +232,7 @@ test_and_set_bit_lock(long nr, volatile
  */
 static inline int __test_and_set_bit(long nr, volatile unsigned long *addr)
 {
-	int oldbit;
-
-	asm("bts %2,%1\n\t"
-	    "sbb %0,%0"
-	    : "=r" (oldbit), ADDR
-	    : "Ir" (nr));
-	return oldbit;
+	GEN_BINARY_RMWcc("bts", *addr, nr, "%0", "c");
 }
 
 /**
@@ -255,13 +245,7 @@ static inline int __test_and_set_bit(lon
  */
 static inline int test_and_clear_bit(long nr, volatile unsigned long *addr)
 {
-	int oldbit;
-
-	asm volatile(LOCK_PREFIX "btr %2,%1\n\t"
-		     "sbb %0,%0"
-		     : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
-
-	return oldbit;
+	GEN_BINARY_RMWcc(LOCK_PREFIX "btr", *addr, nr, "%0", "c");
 }
 
 /**
@@ -282,26 +266,13 @@ static inline int test_and_clear_bit(lon
  */
 static inline int __test_and_clear_bit(long nr, volatile unsigned long *addr)
 {
-	int oldbit;
-
-	asm volatile("btr %2,%1\n\t"
-		     "sbb %0,%0"
-		     : "=r" (oldbit), ADDR
-		     : "Ir" (nr));
-	return oldbit;
+	GEN_BINARY_RMWcc("btr", *addr, nr, "%0", "c");
 }
 
 /* WARNING: non atomic and it can be reordered! */
 static inline int __test_and_change_bit(long nr, volatile unsigned long *addr)
 {
-	int oldbit;
-
-	asm volatile("btc %2,%1\n\t"
-		     "sbb %0,%0"
-		     : "=r" (oldbit), ADDR
-		     : "Ir" (nr) : "memory");
-
-	return oldbit;
+	GEN_BINARY_RMWcc("btc", *addr, nr, "%0", "c");
 }
 
 /**
@@ -314,13 +285,7 @@ static inline int __test_and_change_bit(
  */
 static inline int test_and_change_bit(long nr, volatile unsigned long *addr)
 {
-	int oldbit;
-
-	asm volatile(LOCK_PREFIX "btc %2,%1\n\t"
-		     "sbb %0,%0"
-		     : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
-
-	return oldbit;
+	GEN_BINARY_RMWcc(LOCK_PREFIX "btc", *addr, nr, "%0", "c");
 }
 
 static __always_inline int constant_test_bit(long nr, const volatile unsigned long *addr)
@@ -331,14 +296,7 @@ static __always_inline int constant_test
 
 static inline int variable_test_bit(long nr, volatile const unsigned long *addr)
 {
-	int oldbit;
-
-	asm volatile("bt %2,%1\n\t"
-		     "sbb %0,%0"
-		     : "=r" (oldbit)
-		     : "m" (*(unsigned long *)addr), "Ir" (nr));
-
-	return oldbit;
+	GEN_BINARY_RMWcc("bt", *(volatile unsigned long *)addr, nr, "%0", "c");
 }
 
 #if 0 /* Fool kernel-doc since it doesn't do macros yet */
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -52,12 +52,7 @@ static inline void local_sub(long i, loc
  */
 static inline int local_sub_and_test(long i, local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_SUB "%2,%0; sete %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GEN_BINARY_RMWcc(_ASM_SUB, l->a.counter, i, "%0", "e");
 }
 
 /**
@@ -70,12 +65,7 @@ static inline int local_sub_and_test(lon
  */
 static inline int local_dec_and_test(local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_DEC "%0; sete %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GEN_UNARY_RMWcc(_ASM_DEC, l->a.counter, "%0", "e");
 }
 
 /**
@@ -88,12 +78,7 @@ static inline int local_dec_and_test(loc
  */
 static inline int local_inc_and_test(local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_INC "%0; sete %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : : "memory");
-	return c != 0;
+	GEN_UNARY_RMWcc(_ASM_INC, l->a.counter, "%0", "e");
 }
 
 /**
@@ -107,12 +92,7 @@ static inline int local_inc_and_test(loc
  */
 static inline int local_add_negative(long i, local_t *l)
 {
-	unsigned char c;
-
-	asm volatile(_ASM_ADD "%2,%0; sets %1"
-		     : "+m" (l->a.counter), "=qm" (c)
-		     : "ir" (i) : "memory");
-	return c;
+	GEN_BINARY_RMWcc(_ASM_ADD, l->a.counter, i, "%0", "s");
 }
 
 /**
--- /dev/null
+++ b/arch/x86/include/asm/rmwcc.h
@@ -0,0 +1,52 @@
+#ifndef _ASM_X86_RMWcc
+#define _ASM_X86_RMWcc
+
+#ifdef CC_HAVE_ASM_GOTO
+
+#define GEN_UNARY_RMWcc(op, var, arg0, cc)				\
+do {									\
+	asm volatile goto (op " " arg0 ";"				\
+			"j" cc " %l[cc_label]"				\
+			: : "m" (var)					\
+			: "memory" : cc_label);				\
+	return 0;							\
+cc_label:								\
+	return 1;							\
+} while (0)
+
+#define GEN_BINARY_RMWcc(op, var, val, arg0, cc)			\
+do {									\
+	asm volatile goto (op " %1, " arg0 ";"				\
+			"j" cc " %l[cc_label]"				\
+			: : "m" (var), "er" (val)			\
+			: "memory" : cc_label);				\
+	return 0;							\
+cc_label:								\
+	return 1;							\
+} while (0)
+
+#else /* !CC_HAVE_ASM_GOTO */
+
+#define GEN_UNARY_RMWcc(op, var, arg0, cc)				\
+do {									\
+	char c;								\
+	asm volatile (op " " arg0 ";"					\
+			"set" cc " %1"					\
+			: "+m" (var), "=qm" (c)				\
+			: : "memory");					\
+	return c != 0;							\
+} while (0)
+
+#define GEN_BINARY_RMWcc(op, var, val, arg0, cc)			\
+do {									\
+	char c;								\
+	asm volatile (op " %2, " arg0 ";"				\
+			"set" cc " %1"					\
+			: "+m" (var), "=qm" (c)				\
+			: "er" (val) : "memory");			\
+	return c != 0;							\
+} while (0)
+
+#endif /* CC_HAVE_ASM_GOTO */
+
+#endif /* _ASM_X86_RMWcc */

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-12  2:20                       ` Peter Zijlstra
@ 2013-09-12  2:43                         ` Linus Torvalds
  2013-09-12 11:51                           ` Peter Zijlstra
  2013-09-13  7:25                         ` Kevin Easton
  1 sibling, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-12  2:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Wed, Sep 11, 2013 at 7:20 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> I split the thing up into two macros GEN_UNARY_RMWcc and
> GEN_BINARY_RMWcc which ends up being more readable as well as smaller
> code overall.

Yes, that looks like the right abstraction level. Powerful without
being complicated.

> I also attempted to convert asm/bitops.h, although I'm not sure it'll
> compile right with older GCCs due to the comment near BITOP_ADDR()

Actually, since you now have that memory clobber, it no longer
matters, and "m" is the right thing.

That said, the very same memory clobber may be what makes this whole
approach a loss, if it causes gcc to do tons of reloads or other
random things.

That memory clobber wasn't an issue for the whole
__preempt_count_dec_and_test() thing, because there we needed to keep
anything before the decrement contained anyway.

For other cases? Who knows.. A lot of the "change and test atomically"
things have the same containment need, so it might not be a problem.

> I might have to add the clobber to the macro arguments so we can do a
> version without "memory" clobber, although bitops is inconsistent with
> that as well, __test_and_clear_bit() doesn't have a memory clobber but
> __test_and_change_bit() does.

You do need the memory clobber. The "asm goto" version can not have
outputs, so "=m" and "+m" are both no-go, and so the only thing left
is that memory clobber, as far as I can see..

The bitops didn't use to need it, because they had that "+/=m" that
already tells gcc that the target is changed.

           Linus

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
  2013-09-11 13:34         ` Mike Galbraith
@ 2013-09-12  6:01           ` Mike Galbraith
  0 siblings, 0 replies; 52+ messages in thread
From: Mike Galbraith @ 2013-09-12  6:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Peter Anvin, Thomas Gleixner, Arjan van de Ven,
	Frederic Weisbecker, linux-kernel, linux-arch, Len Brown,
	Vaidyanathan Srinivasan

On Wed, 2013-09-11 at 15:34 +0200, Mike Galbraith wrote: 
> On Wed, 2013-09-11 at 13:06 +0200, Peter Zijlstra wrote: 

> > Mike does it fix the funny you saw?
> 
> Yup, modulo test_need_resched() not existing in master.

...

> while I haven't yet booted Q6600 box, core2
> Toshiba Satellite lappy is using mwait_idle_with_hints() in master.

Heh.  The Core2 lappy uses mwait_idle_with_hints()... unless you boot with
processor.max_cstate=1 so it doesn't use the piece of shite hpet.  The Q6600
always has idle woes, seemingly because the box doesn't do a deep enough
cstate to make something (acpi?) happy, despite mwait_idle() having
worked fine for years.

Seems I need to find the cstate dependency spot, and rap it upside the
head if I want my core2 boxen to regain happy campers status.

-Mike


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-12  2:43                         ` Linus Torvalds
@ 2013-09-12 11:51                           ` Peter Zijlstra
  2013-09-12 12:25                             ` Ingo Molnar
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-12 11:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch

On Wed, Sep 11, 2013 at 07:43:42PM -0700, Linus Torvalds wrote:
> On Wed, Sep 11, 2013 at 7:20 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > I split the thing up into two macros GEN_UNARY_RMWcc and
> > GEN_BINARY_RMWcc which ends up being more readable as well as smaller
> > code overall.
> 
> Yes, that looks like the right abstraction level. Powerful without
> being complicated.

> That said, the very same memory clobber may be what makes this whole
> approach a loss, if it causes gcc to do tons of reloads or other
> random things.

Random things below, not sure why

> For other cases? Who knows.. A lot of the "change and test atomically"
> things have the same containment need, so it might not be a problem.

Right, all atomic ops already contain a memory clobber to go along with
their memory barrier semantics.

So I did a defconfig build without the patch and with the patch and got
a significant size increase:

      text    data     bss   filename
   11173443 1423736 1183744 defconfig-build/vmlinux.before
   11174516 1423736 1183744 defconfig-build/vmlinux.after

I then undid all the bitop conversions that added the extra memory
clobber and got:

   11174204 1423736 1183744 defconfig-build/vmlinux.after

Which is still a significant increase, so I had a look at what GCC
generates for mm/rmap.c, which uses quite a few atomic_*_and_test()
functions; I got:

   text    data     bss     dec     hex filename
   8675       1      16    8692    21f4 defconfig-build/mm/rmap.o
   8739       1      16    8756    2234 defconfig-build/mm/rmap.o

So the increase is there too; doing an objdump -D on them, the first
difference is:

0000000000000660 <do_page_add_anon_rmap>:
     660:	55                   	push   %rbp
     661:	48 89 e5             	mov    %rsp,%rbp
     664:	48 83 ec 20          	sub    $0x20,%rsp
     668:	48 89 5d f0          	mov    %rbx,-0x10(%rbp)
     66c:	4c 89 65 f8          	mov    %r12,-0x8(%rbp)
     670:	48 89 fb             	mov    %rdi,%rbx
     673:	f0 ff 47 18          	lock incl 0x18(%rdi)
     677:	0f 94 c0             	sete   %al
     67a:	84 c0                	test   %al,%al
     67c:	75 12                	jne    690 <do_page_add_anon_rmap+0x30>
     67e:	48 8b 5d f0          	mov    -0x10(%rbp),%rbx
     682:	4c 8b 65 f8          	mov    -0x8(%rbp),%r12
     686:	c9                   	leaveq 

vs.:

0000000000000660 <do_page_add_anon_rmap>:
     660:	55                   	push   %rbp
     661:	48 89 e5             	mov    %rsp,%rbp
     664:	48 83 ec 20          	sub    $0x20,%rsp
     668:	48 89 5d e0          	mov    %rbx,-0x20(%rbp)
     66c:	4c 89 65 e8          	mov    %r12,-0x18(%rbp)
     670:	48 89 fb             	mov    %rdi,%rbx
     673:	4c 89 6d f0          	mov    %r13,-0x10(%rbp)
     677:	4c 89 75 f8          	mov    %r14,-0x8(%rbp)
     67b:	f0 ff 47 18          	lock incl 0x18(%rdi)
     67f:	74 17                	je     698 <do_page_add_anon_rmap+0x38>
     681:	48 8b 5d e0          	mov    -0x20(%rbp),%rbx
     685:	4c 8b 65 e8          	mov    -0x18(%rbp),%r12
     689:	4c 8b 6d f0          	mov    -0x10(%rbp),%r13
     68d:	4c 8b 75 f8          	mov    -0x8(%rbp),%r14
     691:	c9                   	leaveq 

For some obscure (to me) reason, the newfangled asm goto construct
generates a bunch of extra MOVs.

OTOH the good news is that a kernel with that patch applied does indeed
boot properly on real hardware.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-12 11:51                           ` Peter Zijlstra
@ 2013-09-12 12:25                             ` Ingo Molnar
  0 siblings, 0 replies; 52+ messages in thread
From: Ingo Molnar @ 2013-09-12 12:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Andi Kleen, Peter Anvin, Mike Galbraith,
	Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
	Linux Kernel Mailing List, linux-arch


* Peter Zijlstra <peterz@infradead.org> wrote:

> So the increase is there too; doing an objdump -D on them, the first
> difference is:
> 
> 0000000000000660 <do_page_add_anon_rmap>:
>      660:	55                   	push   %rbp
>      661:	48 89 e5             	mov    %rsp,%rbp
>      664:	48 83 ec 20          	sub    $0x20,%rsp
>      668:	48 89 5d f0          	mov    %rbx,-0x10(%rbp)
>      66c:	4c 89 65 f8          	mov    %r12,-0x8(%rbp)
>      670:	48 89 fb             	mov    %rdi,%rbx
>      673:	f0 ff 47 18          	lock incl 0x18(%rdi)
>      677:	0f 94 c0             	sete   %al
>      67a:	84 c0                	test   %al,%al
>      67c:	75 12                	jne    690 <do_page_add_anon_rmap+0x30>
>      67e:	48 8b 5d f0          	mov    -0x10(%rbp),%rbx
>      682:	4c 8b 65 f8          	mov    -0x8(%rbp),%r12
>      686:	c9                   	leaveq 
> 
> vs.:
> 
> 0000000000000660 <do_page_add_anon_rmap>:
>      660:	55                   	push   %rbp
>      661:	48 89 e5             	mov    %rsp,%rbp
>      664:	48 83 ec 20          	sub    $0x20,%rsp
>      668:	48 89 5d e0          	mov    %rbx,-0x20(%rbp)
>      66c:	4c 89 65 e8          	mov    %r12,-0x18(%rbp)
>      670:	48 89 fb             	mov    %rdi,%rbx
>      673:	4c 89 6d f0          	mov    %r13,-0x10(%rbp)
>      677:	4c 89 75 f8          	mov    %r14,-0x8(%rbp)
>      67b:	f0 ff 47 18          	lock incl 0x18(%rdi)
>      67f:	74 17                	je     698 <do_page_add_anon_rmap+0x38>
>      681:	48 8b 5d e0          	mov    -0x20(%rbp),%rbx
>      685:	4c 8b 65 e8          	mov    -0x18(%rbp),%r12
>      689:	4c 8b 6d f0          	mov    -0x10(%rbp),%r13
>      68d:	4c 8b 75 f8          	mov    -0x8(%rbp),%r14
>      691:	c9                   	leaveq 
> 
> For some obscure (to me) reason, the newfangled asm goto construct
> generates a bunch of extra MOVs.

It adds two pairs of MOVs that show that R13 and R14 got clobbered, but
the change also got rid of a SETE and a TEST here:

>      673:	f0 ff 47 18          	lock incl 0x18(%rdi)
>      677:	0f 94 c0             	sete   %al
>      67a:	84 c0                	test   %al,%al
>      67c:	75 12                	jne    690 <do_page_add_anon_rmap+0x30>

so there's a slight increase in size, but the extra instructions look 
rather lightweight and it could all go away if asm goto is improved ...

It would all be very sweet if all those clobbers went away.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-12  2:20                       ` Peter Zijlstra
  2013-09-12  2:43                         ` Linus Torvalds
@ 2013-09-13  7:25                         ` Kevin Easton
  2013-09-13  8:06                           ` Peter Zijlstra
  1 sibling, 1 reply; 52+ messages in thread
From: Kevin Easton @ 2013-09-13  7:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
	Frederic Weisbecker, LKML

On Thu, Sep 12, 2013 at 04:20:40AM +0200, Peter Zijlstra wrote:
> On Wed, Sep 11, 2013 at 04:02:14PM -0700, Linus Torvalds wrote:
> > On Wed, Sep 11, 2013 at 11:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > OK, stripped it down further, I couldn't quite see how to collapse the
> > > unary and binary operator variants though :/
> > 
> > Ok, this looks pretty good. I assume it works too? ;)
> 
> Only compile tested that one.. the below is kvm boot tested until root
> mount -- I'll try on actual hardware when I've gotten some sleep.
> 
> I split the thing up into two macros GEN_UNARY_RMWcc and
> GEN_BINARY_RMWcc which ends up being more readable as well as smaller
> code overall.

If you wanted to collapse the unary and binary variants as you mentioned
upthread, you could do something like (for the CC_HAVE_ASM_GOTO case):

#define GEN_RMWcc(fullop, cc, ...)					\
do {									\
	asm volatile goto (fullop					\
			"j" cc " %l[cc_label]"				\
			: : __VA_ARGS__					\
			: "memory" : cc_label);				\
	return 0;							\
cc_label:								\
	return 1;							\
} while (0)

#define GEN_UNARY_RMWcc(op, var, arg0, cc) GEN_RMWcc(op " " arg0 ";", cc, "m" (var))
#define GEN_BINARY_RMWcc(op, var, val, arg0, cc) GEN_RMWcc(op " %1, " arg0 ";", cc, "m" (var), "er" (val))

    - Kevin

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 0/7] preempt_count rework -v2
  2013-09-13  7:25                         ` Kevin Easton
@ 2013-09-13  8:06                           ` Peter Zijlstra
  0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-13  8:06 UTC (permalink / raw)
  To: Kevin Easton
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
	Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
	Frederic Weisbecker, LKML

On Fri, Sep 13, 2013 at 05:25:00PM +1000, Kevin Easton wrote:
> If you wanted to collapse the unary and binary variants as you mentioned
> upthread, you could do something like (for the CC_HAVE_ASM_GOTO case):

Indeed, another 11 lines off, awesome!

--- /dev/null
+++ b/arch/x86/include/asm/rmwcc.h
@@ -0,0 +1,41 @@
+#ifndef _ASM_X86_RMWcc
+#define _ASM_X86_RMWcc
+
+#ifdef CC_HAVE_ASM_GOTO
+
+#define __GEN_RMWcc(fullop, var, cc, ...)				\
+do {									\
+	asm volatile goto (fullop "; j" cc " %l[cc_label]"		\
+			: : "m" (var), ## __VA_ARGS__ 			\
+			: "memory" : cc_label);				\
+	return 0;							\
+cc_label:								\
+	return 1;							\
+} while (0)
+
+#define GEN_UNARY_RMWcc(op, var, arg0, cc) 				\
+	__GEN_RMWcc(op " " arg0, var, cc)
+
+#define GEN_BINARY_RMWcc(op, var, val, arg0, cc)			\
+	__GEN_RMWcc(op " %1, " arg0, var, cc, "er" (val))
+
+#else /* !CC_HAVE_ASM_GOTO */
+
+#define __GEN_RMWcc(fullop, var, cc, ...)				\
+do {									\
+	char c;								\
+	asm volatile (fullop "; set" cc " %1"				\
+			: "+m" (var), "=qm" (c)				\
+			: __VA_ARGS__ : "memory");			\
+	return c != 0;							\
+} while (0)
+
+#define GEN_UNARY_RMWcc(op, var, arg0, cc)				\
+	__GEN_RMWcc(op " " arg0, var, cc)
+
+#define GEN_BINARY_RMWcc(op, var, val, arg0, cc)			\
+	__GEN_RMWcc(op " %2, " arg0, var, cc, "er" (val))
+
+#endif /* CC_HAVE_ASM_GOTO */
+
+#endif /* _ASM_X86_RMWcc */


^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2013-09-13  8:06 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
2013-09-10 13:08 ` [PATCH 1/7] sched: Introduce preempt_count accessor functions Peter Zijlstra
2013-09-10 13:08 ` [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
2013-09-11  1:59   ` Andy Lutomirski
2013-09-11  8:25     ` Peter Zijlstra
2013-09-11 11:06       ` Peter Zijlstra
2013-09-11 13:34         ` Mike Galbraith
2013-09-12  6:01           ` Mike Galbraith
2013-09-11 16:35         ` Andy Lutomirski
2013-09-11 18:05           ` Peter Zijlstra
2013-09-11 18:07             ` Andy Lutomirski
2013-09-11 11:14   ` Peter Zijlstra
2013-09-10 13:08 ` [PATCH 3/7] sched, arch: Create asm/preempt.h Peter Zijlstra
2013-09-10 13:08 ` [PATCH 4/7] sched: Create more preempt_count accessors Peter Zijlstra
2013-09-10 13:08 ` [PATCH 5/7] sched: Extract the basic add/sub preempt_count modifiers Peter Zijlstra
2013-09-10 13:08 ` [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
2013-09-10 13:27   ` Peter Zijlstra
2013-09-10 14:02   ` Eric Dumazet
2013-09-10 15:25     ` Peter Zijlstra
2013-09-10 16:48   ` Peter Zijlstra
2013-09-10 13:08 ` [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call Peter Zijlstra
2013-09-10 13:42   ` Ingo Molnar
2013-09-10 13:55     ` Jan Beulich
2013-09-10 13:55       ` Jan Beulich
2013-09-10 14:25       ` Ingo Molnar
2013-09-10 13:51 ` [PATCH 0/7] preempt_count rework -v2 Ingo Molnar
2013-09-10 13:56   ` Ingo Molnar
2013-09-10 15:14     ` Peter Zijlstra
2013-09-10 15:29     ` Arjan van de Ven
2013-09-10 15:35       ` Peter Zijlstra
2013-09-10 16:24       ` Linus Torvalds
2013-09-11 16:00         ` H. Peter Anvin
2013-09-10 16:34     ` Linus Torvalds
2013-09-10 16:45       ` Peter Zijlstra
2013-09-10 17:06         ` Linus Torvalds
2013-09-10 21:25           ` Peter Zijlstra
2013-09-10 21:43             ` Linus Torvalds
2013-09-10 21:51               ` H. Peter Anvin
2013-09-10 22:02                 ` Linus Torvalds
2013-09-10 22:06                   ` H. Peter Anvin
2013-09-11 13:13               ` Peter Zijlstra
2013-09-11 13:26                 ` Peter Zijlstra
2013-09-11 15:29                 ` H. Peter Anvin
2013-09-11 15:33                 ` Linus Torvalds
2013-09-11 18:59                   ` Peter Zijlstra
2013-09-11 23:02                     ` Linus Torvalds
2013-09-12  2:20                       ` Peter Zijlstra
2013-09-12  2:43                         ` Linus Torvalds
2013-09-12 11:51                           ` Peter Zijlstra
2013-09-12 12:25                             ` Ingo Molnar
2013-09-13  7:25                         ` Kevin Easton
2013-09-13  8:06                           ` Peter Zijlstra
