* [PATCH 0/7] preempt_count rework -v2
@ 2013-09-10 13:08 Peter Zijlstra
2013-09-10 13:08 ` [PATCH 1/7] sched: Introduce preempt_count accessor functions Peter Zijlstra
` (7 more replies)
0 siblings, 8 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
Peter Zijlstra
These patches optimize preempt_enable(), first by folding the preempt and
need_resched tests into one -- this should work for all architectures -- and
second by providing per-arch preempt_count implementations, with x86 using a
per-cpu preempt_count for fastest access.
These patches have been boot tested on CONFIG_PREEMPT=y x86_64 and survive
building an x86_64-defconfig kernel.
kernel/sched/core.c:kick_process() now looks like:
ffffffff8106f3f0 <kick_process>:
ffffffff8106f3f0: 55 push %rbp
ffffffff8106f3f1: 65 ff 04 25 e0 b7 00 incl %gs:0xb7e0
ffffffff8106f3f8: 00
ffffffff8106f3f9: 48 89 e5 mov %rsp,%rbp
ffffffff8106f3fc: 48 8b 47 08 mov 0x8(%rdi),%rax
ffffffff8106f400: 8b 50 18 mov 0x18(%rax),%edx
ffffffff8106f403: 65 8b 04 25 1c b0 00 mov %gs:0xb01c,%eax
ffffffff8106f40a: 00
ffffffff8106f40b: 39 c2 cmp %eax,%edx
ffffffff8106f40d: 74 1b je ffffffff8106f42a <kick_process+0x3a>
ffffffff8106f40f: 89 d1 mov %edx,%ecx
ffffffff8106f411: 48 c7 c0 00 2c 01 00 mov $0x12c00,%rax
ffffffff8106f418: 48 8b 0c cd a0 bc cb mov -0x7e344360(,%rcx,8),%rcx
ffffffff8106f41f: 81
ffffffff8106f420: 48 3b bc 08 00 08 00 cmp 0x800(%rax,%rcx,1),%rdi
ffffffff8106f427: 00
ffffffff8106f428: 74 1e je ffffffff8106f448 <kick_process+0x58>
* ffffffff8106f42a: 65 ff 0c 25 e0 b7 00 decl %gs:0xb7e0
ffffffff8106f431: 00
* ffffffff8106f432: 0f 94 c0 sete %al
* ffffffff8106f435: 84 c0 test %al,%al
* ffffffff8106f437: 75 02 jne ffffffff8106f43b <kick_process+0x4b>
ffffffff8106f439: 5d pop %rbp
ffffffff8106f43a: c3 retq
* ffffffff8106f43b: e8 b0 b6 f9 ff callq ffffffff8100aaf0 <___preempt_schedule>
ffffffff8106f440: 5d pop %rbp
ffffffff8106f441: c3 retq
ffffffff8106f442: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
ffffffff8106f448: 89 d7 mov %edx,%edi
ffffffff8106f44a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
ffffffff8106f450: ff 15 ea e0 ba 00 callq *0xbae0ea(%rip) # ffffffff81c1d540 <smp_ops+0x20>
ffffffff8106f456: eb d2 jmp ffffffff8106f42a <kick_process+0x3a>
ffffffff8106f458: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
ffffffff8106f45f: 00
Where the '*' marked lines are preempt_enable(); sadly GCC isn't able to get
rid of the sete+test :/ It's a rather frequent pattern in the kernel, so
'fixing' the x86 GCC backend to recognise this might be useful.
---
arch/alpha/include/asm/Kbuild | 1 +
arch/arc/include/asm/Kbuild | 1 +
arch/arm/include/asm/Kbuild | 1 +
arch/arm64/include/asm/Kbuild | 1 +
arch/avr32/include/asm/Kbuild | 1 +
arch/blackfin/include/asm/Kbuild | 1 +
arch/c6x/include/asm/Kbuild | 1 +
arch/cris/include/asm/Kbuild | 1 +
arch/frv/include/asm/Kbuild | 1 +
arch/h8300/include/asm/Kbuild | 1 +
arch/hexagon/include/asm/Kbuild | 1 +
arch/ia64/include/asm/Kbuild | 1 +
arch/m32r/include/asm/Kbuild | 1 +
arch/m68k/include/asm/Kbuild | 1 +
arch/metag/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/Kbuild | 1 +
arch/mips/include/asm/Kbuild | 1 +
arch/mips/mm/init.c | 5 +-
arch/mn10300/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/Kbuild | 1 +
arch/parisc/include/asm/Kbuild | 1 +
arch/powerpc/include/asm/Kbuild | 1 +
arch/s390/include/asm/Kbuild | 1 +
arch/score/include/asm/Kbuild | 1 +
arch/sh/include/asm/Kbuild | 1 +
arch/sparc/include/asm/Kbuild | 1 +
arch/tile/include/asm/Kbuild | 1 +
arch/um/include/asm/Kbuild | 1 +
arch/unicore32/include/asm/Kbuild | 1 +
arch/x86/include/asm/calling.h | 50 +++++++++++++++++
arch/x86/include/asm/thread_info.h | 5 +-
arch/x86/kernel/Makefile | 2 +
arch/x86/kernel/asm-offsets.c | 1 -
arch/x86/kernel/cpu/common.c | 5 ++
arch/x86/kernel/entry_32.S | 7 +--
arch/x86/kernel/entry_64.S | 4 +-
arch/x86/kernel/process_32.c | 10 ++++
arch/x86/kernel/process_64.c | 10 ++++
arch/x86/kernel/traps.c | 4 +-
arch/xtensa/include/asm/Kbuild | 1 +
include/linux/hardirq.h | 8 +--
include/linux/preempt.h | 112 +++++++++++++++++--------------------
include/linux/sched.h | 5 --
include/linux/thread_info.h | 1 +
include/linux/uaccess.h | 8 +--
include/trace/events/sched.h | 2 +-
init/main.c | 2 +-
kernel/context_tracking.c | 4 +-
kernel/cpu/idle.c | 7 +++
kernel/sched/core.c | 54 +++++++++---------
kernel/softirq.c | 16 +++---
kernel/timer.c | 8 +--
lib/smp_processor_id.c | 3 +-
53 files changed, 226 insertions(+), 136 deletions(-)
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH 1/7] sched: Introduce preempt_count accessor functions
2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
2013-09-10 13:08 ` [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
` (6 subsequent siblings)
7 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
Peter Zijlstra
[-- Attachment #1: peterz-preempt_count-accessors.patch --]
[-- Type: text/plain, Size: 4950 bytes --]
Replace the single preempt_count() 'function' that's an lvalue with
two proper functions:
preempt_count() - returns the preempt_count value as an rvalue
preempt_count_ptr() - returns a pointer to the preempt_count
Then change all sites that modify the preempt count to use
preempt_count_ptr().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
include/linux/preempt.h | 20 ++++++++++++++------
init/main.c | 2 +-
kernel/sched/core.c | 4 ++--
kernel/softirq.c | 4 ++--
kernel/timer.c | 8 ++++----
lib/smp_processor_id.c | 3 +--
6 files changed, 24 insertions(+), 17 deletions(-)
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -10,19 +10,27 @@
#include <linux/linkage.h>
#include <linux/list.h>
+static __always_inline int preempt_count(void)
+{
+ return current_thread_info()->preempt_count;
+}
+
+static __always_inline int *preempt_count_ptr(void)
+{
+ return &current_thread_info()->preempt_count;
+}
+
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
extern void add_preempt_count(int val);
extern void sub_preempt_count(int val);
#else
-# define add_preempt_count(val) do { preempt_count() += (val); } while (0)
-# define sub_preempt_count(val) do { preempt_count() -= (val); } while (0)
+# define add_preempt_count(val) do { *preempt_count_ptr() += (val); } while (0)
+# define sub_preempt_count(val) do { *preempt_count_ptr() -= (val); } while (0)
#endif
#define inc_preempt_count() add_preempt_count(1)
#define dec_preempt_count() sub_preempt_count(1)
-#define preempt_count() (current_thread_info()->preempt_count)
-
#ifdef CONFIG_PREEMPT
asmlinkage void preempt_schedule(void);
@@ -81,9 +89,9 @@ do { \
/* For debugging and tracer internals only! */
#define add_preempt_count_notrace(val) \
- do { preempt_count() += (val); } while (0)
+ do { *preempt_count_ptr() += (val); } while (0)
#define sub_preempt_count_notrace(val) \
- do { preempt_count() -= (val); } while (0)
+ do { *preempt_count_ptr() -= (val); } while (0)
#define inc_preempt_count_notrace() add_preempt_count_notrace(1)
#define dec_preempt_count_notrace() sub_preempt_count_notrace(1)
--- a/init/main.c
+++ b/init/main.c
@@ -690,7 +690,7 @@ int __init_or_module do_one_initcall(ini
if (preempt_count() != count) {
sprintf(msgbuf, "preemption imbalance ");
- preempt_count() = count;
+ *preempt_count_ptr() = count;
}
if (irqs_disabled()) {
strlcat(msgbuf, "disabled interrupts ", sizeof(msgbuf));
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2233,7 +2233,7 @@ void __kprobes add_preempt_count(int val
if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
return;
#endif
- preempt_count() += val;
+ *preempt_count_ptr() += val;
#ifdef CONFIG_DEBUG_PREEMPT
/*
* Spinlock count overflowing soon?
@@ -2264,7 +2264,7 @@ void __kprobes sub_preempt_count(int val
if (preempt_count() == val)
trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
- preempt_count() -= val;
+ *preempt_count_ptr() -= val;
}
EXPORT_SYMBOL(sub_preempt_count);
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -106,7 +106,7 @@ static void __local_bh_disable(unsigned
* We must manually increment preempt_count here and manually
* call the trace_preempt_off later.
*/
- preempt_count() += cnt;
+ *preempt_count_ptr() += cnt;
/*
* Were softirqs turned off above:
*/
@@ -256,7 +256,7 @@ asmlinkage void __do_softirq(void)
" exited with %08x?\n", vec_nr,
softirq_to_name[vec_nr], h->action,
prev_count, preempt_count());
- preempt_count() = prev_count;
+ *preempt_count_ptr() = prev_count;
}
rcu_bh_qs(cpu);
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1092,7 +1092,7 @@ static int cascade(struct tvec_base *bas
static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
unsigned long data)
{
- int preempt_count = preempt_count();
+ int count = preempt_count();
#ifdef CONFIG_LOCKDEP
/*
@@ -1119,16 +1119,16 @@ static void call_timer_fn(struct timer_l
lock_map_release(&lockdep_map);
- if (preempt_count != preempt_count()) {
+ if (count != preempt_count()) {
WARN_ONCE(1, "timer: %pF preempt leak: %08x -> %08x\n",
- fn, preempt_count, preempt_count());
+ fn, count, preempt_count());
/*
* Restore the preempt count. That gives us a decent
* chance to survive and extract information. If the
* callback kept a lock held, bad luck, but not worse
* than the BUG() we had.
*/
- preempt_count() = preempt_count;
+ *preempt_count_ptr() = count;
}
}
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -9,10 +9,9 @@
notrace unsigned int debug_smp_processor_id(void)
{
- unsigned long preempt_count = preempt_count();
int this_cpu = raw_smp_processor_id();
- if (likely(preempt_count))
+ if (likely(preempt_count()))
goto out;
if (irqs_disabled())
* [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
2013-09-10 13:08 ` [PATCH 1/7] sched: Introduce preempt_count accessor functions Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
2013-09-11 1:59 ` Andy Lutomirski
2013-09-11 11:14 ` Peter Zijlstra
2013-09-10 13:08 ` [PATCH 3/7] sched, arch: Create asm/preempt.h Peter Zijlstra
` (5 subsequent siblings)
7 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
Peter Zijlstra
[-- Attachment #1: peterz-preempt_count-need_resched.patch --]
[-- Type: text/plain, Size: 7108 bytes --]
In order to combine the preemption and need_resched test we need to
fold the need_resched information into the preempt_count value.
We keep the existing TIF_NEED_RESCHED infrastructure in place, but test it
and fold its value into preempt_count at three sites, namely:
- resched_task() when setting TIF_NEED_RESCHED on the current task;
- scheduler_ipi(), since when resched_task() sets TIF_NEED_RESCHED on a
remote task it follows up with a reschedule IPI, and we can modify the
cpu-local preempt_count from there;
- cpu_idle_loop() for when resched_task() found tsk_is_polling().
We use an inverted bitmask to indicate need_resched so that a 0 means
both need_resched and !atomic.
Also remove the barrier() in preempt_enable() between
preempt_enable_no_resched() and preempt_check_resched(), to avoid
having to reload the preemption value and to allow the compiler to use
the flags of the previous decrement. I couldn't come up with any sane
reason for this barrier() to be there, as preempt_enable_no_resched()
already has a barrier() before doing the decrement.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
include/linux/preempt.h | 42 +++++++++++++++++++++++++++++++++++++-----
include/linux/sched.h | 2 +-
include/linux/thread_info.h | 1 +
kernel/context_tracking.c | 2 +-
kernel/cpu/idle.c | 10 ++++++++++
kernel/sched/core.c | 18 ++++++++++++++----
6 files changed, 64 insertions(+), 11 deletions(-)
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -10,9 +10,19 @@
#include <linux/linkage.h>
#include <linux/list.h>
+/*
+ * We use the MSB mostly because it's available; see <linux/hardirq.h> for
+ * the other bits.
+ */
+#define PREEMPT_NEED_RESCHED 0x80000000
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
static __always_inline int preempt_count(void)
{
- return current_thread_info()->preempt_count;
+ return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED;
}
static __always_inline int *preempt_count_ptr(void)
@@ -20,6 +30,30 @@ static __always_inline int *preempt_coun
return &current_thread_info()->preempt_count;
}
+/*
+ * We fold the NEED_RESCHED bit into the preempt count such that
+ * preempt_enable() can decrement and test for needing to reschedule with a
+ * single instruction.
+ *
+ * We invert the actual bit, so that when the decrement hits 0 we know we both
+ * need to resched (the bit is cleared) and can resched (no preempt count).
+ */
+
+static __always_inline void set_preempt_need_resched(void)
+{
+ *preempt_count_ptr() &= ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline void clear_preempt_need_resched(void)
+{
+ *preempt_count_ptr() |= PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline bool test_preempt_need_resched(void)
+{
+ return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED);
+}
+
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
extern void add_preempt_count(int val);
extern void sub_preempt_count(int val);
@@ -37,7 +71,7 @@ asmlinkage void preempt_schedule(void);
#define preempt_check_resched() \
do { \
- if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+ if (unlikely(!*preempt_count_ptr())) \
preempt_schedule(); \
} while (0)
@@ -47,7 +81,7 @@ void preempt_schedule_context(void);
#define preempt_check_resched_context() \
do { \
- if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+ if (unlikely(!*preempt_count_ptr())) \
preempt_schedule_context(); \
} while (0)
#else
@@ -83,7 +117,6 @@ do { \
#define preempt_enable() \
do { \
preempt_enable_no_resched(); \
- barrier(); \
preempt_check_resched(); \
} while (0)
@@ -111,7 +144,6 @@ do { \
#define preempt_enable_notrace() \
do { \
preempt_enable_no_resched_notrace(); \
- barrier(); \
preempt_check_resched_context(); \
} while (0)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2405,7 +2405,7 @@ static inline int signal_pending_state(l
static inline int need_resched(void)
{
- return unlikely(test_thread_flag(TIF_NEED_RESCHED));
+ return unlikely(test_preempt_need_resched());
}
/*
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -104,6 +104,7 @@ static inline int test_ti_thread_flag(st
#define test_thread_flag(flag) \
test_ti_thread_flag(current_thread_info(), flag)
+#define test_need_resched() test_thread_flag(TIF_NEED_RESCHED)
#define set_need_resched() set_thread_flag(TIF_NEED_RESCHED)
#define clear_need_resched() clear_thread_flag(TIF_NEED_RESCHED)
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -115,7 +115,7 @@ void __sched notrace preempt_schedule_co
{
enum ctx_state prev_ctx;
- if (likely(!preemptible()))
+ if (likely(preempt_count() || irqs_disabled()))
return;
/*
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -106,6 +106,13 @@ static void cpu_idle_loop(void)
current_set_polling();
}
arch_cpu_idle_exit();
+ /*
+ * We need to test and propagate the TIF_NEED_RESCHED
+ * bit here because we might not have sent the
+ * reschedule IPI to idle tasks.
+ */
+ if (test_need_resched())
+ set_preempt_need_resched();
}
tick_nohz_idle_exit();
schedule_preempt_disabled();
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -526,8 +526,10 @@ void resched_task(struct task_struct *p)
set_tsk_need_resched(p);
cpu = task_cpu(p);
- if (cpu == smp_processor_id())
+ if (cpu == smp_processor_id()) {
+ set_preempt_need_resched();
return;
+ }
/* NEED_RESCHED must be visible before we test polling */
smp_mb();
@@ -1411,6 +1413,14 @@ static void sched_ttwu_pending(void)
void scheduler_ipi(void)
{
+ /*
+ * Fold TIF_NEED_RESCHED into the preempt_count; anybody setting
+ * TIF_NEED_RESCHED remotely (for the first time) will also send
+ * this IPI.
+ */
+ if (test_need_resched())
+ set_preempt_need_resched();
+
if (llist_empty(&this_rq()->wake_list)
&& !tick_nohz_full_cpu(smp_processor_id())
&& !got_nohz_idle_kick())
@@ -2445,6 +2455,7 @@ static void __sched __schedule(void)
put_prev_task(rq, prev);
next = pick_next_task(rq);
clear_tsk_need_resched(prev);
+ clear_preempt_need_resched();
rq->skip_clock_update = 0;
if (likely(prev != next)) {
@@ -2531,7 +2542,7 @@ asmlinkage void __sched notrace preempt_
* If there is a non-zero preempt_count or interrupts are disabled,
* we do not want to preempt the current task. Just return..
*/
- if (likely(!preemptible()))
+ if (likely(preempt_count() || irqs_disabled()))
return;
do {
@@ -2556,11 +2567,10 @@ EXPORT_SYMBOL(preempt_schedule);
*/
asmlinkage void __sched preempt_schedule_irq(void)
{
- struct thread_info *ti = current_thread_info();
enum ctx_state prev_state;
/* Catch callers which need to be fixed */
- BUG_ON(ti->preempt_count || !irqs_disabled());
+ BUG_ON(preempt_count() || !irqs_disabled());
prev_state = exception_enter();
* [PATCH 3/7] sched, arch: Create asm/preempt.h
2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
2013-09-10 13:08 ` [PATCH 1/7] sched: Introduce preempt_count accessor functions Peter Zijlstra
2013-09-10 13:08 ` [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
2013-09-10 13:08 ` [PATCH 4/7] sched: Create more preempt_count accessors Peter Zijlstra
` (4 subsequent siblings)
7 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
Peter Zijlstra
[-- Attachment #1: peterz-asm-preempt_count.patch --]
[-- Type: text/plain, Size: 10772 bytes --]
In order to prepare for per-arch implementations of preempt_count, move
the required bits into an asm-generic header and use it for all
archs.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
arch/alpha/include/asm/Kbuild | 1
arch/arc/include/asm/Kbuild | 1
arch/arm/include/asm/Kbuild | 1
arch/arm64/include/asm/Kbuild | 1
arch/avr32/include/asm/Kbuild | 1
arch/blackfin/include/asm/Kbuild | 1
arch/c6x/include/asm/Kbuild | 1
arch/cris/include/asm/Kbuild | 1
arch/frv/include/asm/Kbuild | 1
arch/h8300/include/asm/Kbuild | 1
arch/hexagon/include/asm/Kbuild | 1
arch/ia64/include/asm/Kbuild | 1
arch/m32r/include/asm/Kbuild | 1
arch/m68k/include/asm/Kbuild | 1
arch/metag/include/asm/Kbuild | 1
arch/microblaze/include/asm/Kbuild | 1
arch/mips/include/asm/Kbuild | 1
arch/mn10300/include/asm/Kbuild | 1
arch/openrisc/include/asm/Kbuild | 1
arch/parisc/include/asm/Kbuild | 1
arch/powerpc/include/asm/Kbuild | 1
arch/s390/include/asm/Kbuild | 1
arch/score/include/asm/Kbuild | 1
arch/sh/include/asm/Kbuild | 1
arch/sparc/include/asm/Kbuild | 1
arch/tile/include/asm/Kbuild | 1
arch/um/include/asm/Kbuild | 1
arch/unicore32/include/asm/Kbuild | 1
arch/x86/include/asm/Kbuild | 1
arch/xtensa/include/asm/Kbuild | 1
include/asm-generic/preempt.h | 44 +++++++++++++++++++++++++++++++++++++
include/linux/preempt.h | 39 --------------------------------
32 files changed, 75 insertions(+), 38 deletions(-)
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
generic-y += exec.h
generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -46,3 +46,4 @@ generic-y += ucontext.h
generic-y += user.h
generic-y += vga.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -33,3 +33,4 @@ generic-y += timex.h
generic-y += trace_clock.h
generic-y += types.h
generic-y += unaligned.h
+generic-y += preempt.h
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -50,3 +50,4 @@ generic-y += unaligned.h
generic-y += user.h
generic-y += vga.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/avr32/include/asm/Kbuild
+++ b/arch/avr32/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
generic-y += exec.h
generic-y += trace_clock.h
generic-y += param.h
+generic-y += preempt.h
--- a/arch/blackfin/include/asm/Kbuild
+++ b/arch/blackfin/include/asm/Kbuild
@@ -44,3 +44,4 @@ generic-y += ucontext.h
generic-y += unaligned.h
generic-y += user.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/c6x/include/asm/Kbuild
+++ b/arch/c6x/include/asm/Kbuild
@@ -56,3 +56,4 @@ generic-y += ucontext.h
generic-y += user.h
generic-y += vga.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/cris/include/asm/Kbuild
+++ b/arch/cris/include/asm/Kbuild
@@ -11,3 +11,4 @@ generic-y += module.h
generic-y += trace_clock.h
generic-y += vga.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/frv/include/asm/Kbuild
+++ b/arch/frv/include/asm/Kbuild
@@ -2,3 +2,4 @@
generic-y += clkdev.h
generic-y += exec.h
generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/h8300/include/asm/Kbuild
+++ b/arch/h8300/include/asm/Kbuild
@@ -6,3 +6,4 @@ generic-y += mmu.h
generic-y += module.h
generic-y += trace_clock.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -53,3 +53,4 @@ generic-y += types.h
generic-y += ucontext.h
generic-y += unaligned.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -3,4 +3,5 @@ generic-y += clkdev.h
generic-y += exec.h
generic-y += kvm_para.h
generic-y += trace_clock.h
+generic-y += preempt.h
generic-y += vtime.h
\ No newline at end of file
--- a/arch/m32r/include/asm/Kbuild
+++ b/arch/m32r/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
generic-y += exec.h
generic-y += module.h
generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -31,3 +31,4 @@ generic-y += trace_clock.h
generic-y += types.h
generic-y += word-at-a-time.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/metag/include/asm/Kbuild
+++ b/arch/metag/include/asm/Kbuild
@@ -52,3 +52,4 @@ generic-y += unaligned.h
generic-y += user.h
generic-y += vga.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
generic-y += exec.h
generic-y += trace_clock.h
generic-y += syscalls.h
+generic-y += preempt.h
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -1,2 +1,3 @@
# MIPS headers
generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/mn10300/include/asm/Kbuild
+++ b/arch/mn10300/include/asm/Kbuild
@@ -2,3 +2,4 @@
generic-y += clkdev.h
generic-y += exec.h
generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -67,3 +67,4 @@ generic-y += ucontext.h
generic-y += user.h
generic-y += word-at-a-time.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += word-at-a-time.h auxvec.h u
div64.h irq_regs.h kdebug.h kvm_para.h local64.h local.h param.h \
poll.h xor.h clkdev.h exec.h
generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -2,4 +2,5 @@
generic-y += clkdev.h
generic-y += rwsem.h
generic-y += trace_clock.h
+generic-y += preempt.h
generic-y += vtime.h
\ No newline at end of file
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -2,3 +2,4 @@
generic-y += clkdev.h
generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/score/include/asm/Kbuild
+++ b/arch/score/include/asm/Kbuild
@@ -4,3 +4,4 @@ header-y +=
generic-y += clkdev.h
generic-y += trace_clock.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -34,3 +34,4 @@ generic-y += termios.h
generic-y += trace_clock.h
generic-y += ucontext.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -16,3 +16,4 @@ generic-y += serial.h
generic-y += trace_clock.h
generic-y += types.h
generic-y += word-at-a-time.h
+generic-y += preempt.h
--- a/arch/tile/include/asm/Kbuild
+++ b/arch/tile/include/asm/Kbuild
@@ -37,3 +37,4 @@ generic-y += termios.h
generic-y += trace_clock.h
generic-y += types.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += hw_irq.h irq_regs.h kdebug.
generic-y += ftrace.h pci.h io.h param.h delay.h mutex.h current.h exec.h
generic-y += switch_to.h clkdev.h
generic-y += trace_clock.h
+generic-y += preempt.h
--- a/arch/unicore32/include/asm/Kbuild
+++ b/arch/unicore32/include/asm/Kbuild
@@ -60,3 +60,4 @@ generic-y += unaligned.h
generic-y += user.h
generic-y += vga.h
generic-y += xor.h
+generic-y += preempt.h
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -5,3 +5,4 @@ genhdr-y += unistd_64.h
genhdr-y += unistd_x32.h
generic-y += clkdev.h
+generic-y += preempt.h
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -28,3 +28,4 @@ generic-y += termios.h
generic-y += topology.h
generic-y += trace_clock.h
generic-y += xor.h
+generic-y += preempt.h
--- /dev/null
+++ b/include/asm-generic/preempt.h
@@ -0,0 +1,44 @@
+#ifndef __ASM_PREEMPT_H
+#define __ASM_PREEMPT_H
+
+#include <linux/thread_info.h>
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
+static __always_inline int preempt_count(void)
+{
+ return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline int *preempt_count_ptr(void)
+{
+ return &current_thread_info()->preempt_count;
+}
+
+/*
+ * We fold the NEED_RESCHED bit into the preempt count such that
+ * preempt_enable() can decrement and test for needing to reschedule with a
+ * single instruction.
+ *
+ * We invert the actual bit, so that when the decrement hits 0 we know we both
+ * need to resched (the bit is cleared) and can resched (no preempt count).
+ */
+
+static __always_inline void set_preempt_need_resched(void)
+{
+ *preempt_count_ptr() &= ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline void clear_preempt_need_resched(void)
+{
+ *preempt_count_ptr() |= PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline bool test_preempt_need_resched(void)
+{
+ return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED);
+}
+
+#endif /* __ASM_PREEMPT_H */
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -6,7 +6,6 @@
* preempt_count (used for kernel preemption, interrupt count, etc.)
*/
-#include <linux/thread_info.h>
#include <linux/linkage.h>
#include <linux/list.h>
@@ -16,43 +15,7 @@
*/
#define PREEMPT_NEED_RESCHED 0x80000000
-/*
- * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
- * that think a non-zero value indicates we cannot preempt.
- */
-static __always_inline int preempt_count(void)
-{
- return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED;
-}
-
-static __always_inline int *preempt_count_ptr(void)
-{
- return &current_thread_info()->preempt_count;
-}
-
-/*
- * We fold the NEED_RESCHED bit into the preempt count such that
- * preempt_enable() can decrement and test for needing to reschedule with a
- * single instruction.
- *
- * We invert the actual bit, so that when the decrement hits 0 we know we both
- * need to resched (the bit is cleared) and can resched (no preempt count).
- */
-
-static __always_inline void set_preempt_need_resched(void)
-{
- *preempt_count_ptr() &= ~PREEMPT_NEED_RESCHED;
-}
-
-static __always_inline void clear_preempt_need_resched(void)
-{
- *preempt_count_ptr() |= PREEMPT_NEED_RESCHED;
-}
-
-static __always_inline bool test_preempt_need_resched(void)
-{
- return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED);
-}
+#include <asm/preempt.h>
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
extern void add_preempt_count(int val);
* [PATCH 4/7] sched: Create more preempt_count accessors
2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
` (2 preceding siblings ...)
2013-09-10 13:08 ` [PATCH 3/7] sched, arch: Create asm/preempt.h Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
2013-09-10 13:08 ` [PATCH 5/7] sched: Extract the basic add/sub preempt_count modifiers Peter Zijlstra
` (3 subsequent siblings)
7 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
Peter Zijlstra
[-- Attachment #1: peterz-task_preempt_count.patch --]
[-- Type: text/plain, Size: 2762 bytes --]
We need a few special preempt_count accessors:
- task_preempt_count() for when we're interested in the preemption
count of another (non-running) task.
- init_task_preempt_count() for properly initializing the preemption
count.
- init_idle_preempt_count(), a special case of the above for the idle
threads.
With these, no generic code ever touches thread_info::preempt_count
anymore, and architectures can choose to remove it.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
include/asm-generic/preempt.h | 14 ++++++++++++++
include/trace/events/sched.h | 2 +-
kernel/sched/core.c | 7 +++----
3 files changed, 18 insertions(+), 5 deletions(-)
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -18,6 +18,20 @@ static __always_inline int *preempt_coun
}
/*
+ * must be macros to avoid header recursion hell
+ */
+#define task_preempt_count(p) \
+ (task_thread_info(p)->preempt_count & ~PREEMPT_NEED_RESCHED)
+
+#define init_task_preempt_count(p) do { \
+ task_thread_info(p)->preempt_count = 1 | PREEMPT_NEED_RESCHED; \
+} while (0)
+
+#define init_idle_preempt_count(p, cpu) do { \
+ task_thread_info(p)->preempt_count = 0; \
+} while (0)
+
+/*
* We fold the NEED_RESCHED bit into the preempt count such that
* preempt_enable() can decrement and test for needing to reschedule with a
* single instruction.
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -100,7 +100,7 @@ static inline long __trace_sched_switch_
/*
* For all intents and purposes a preempted task is a running task.
*/
- if (task_thread_info(p)->preempt_count & PREEMPT_ACTIVE)
+ if (task_preempt_count(p) & PREEMPT_ACTIVE)
state = TASK_RUNNING | TASK_STATE_MAX;
#endif
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -996,7 +996,7 @@ void set_task_cpu(struct task_struct *p,
* ttwu() will sort out the placement.
*/
WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
- !(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
+ !(task_preempt_count(p) & PREEMPT_ACTIVE));
#ifdef CONFIG_LOCKDEP
/*
@@ -1743,8 +1743,7 @@ void sched_fork(struct task_struct *p)
p->on_cpu = 0;
#endif
#ifdef CONFIG_PREEMPT_COUNT
- /* Want to start with kernel preemption disabled. */
- task_thread_info(p)->preempt_count = 1;
+ init_task_preempt_count(p);
#endif
#ifdef CONFIG_SMP
plist_node_init(&p->pushable_tasks, MAX_PRIO);
@@ -4237,7 +4236,7 @@ void init_idle(struct task_struct *idle,
raw_spin_unlock_irqrestore(&rq->lock, flags);
/* Set the preempt count _outside_ the spinlocks! */
- task_thread_info(idle)->preempt_count = 0;
+ init_idle_preempt_count(idle, cpu);
/*
* The idle tasks have their own, simple scheduling class:
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH 5/7] sched: Extract the basic add/sub preempt_count modifiers
2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
` (3 preceding siblings ...)
2013-09-10 13:08 ` [PATCH 4/7] sched: Create more preempt_count accessors Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
2013-09-10 13:08 ` [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
` (2 subsequent siblings)
7 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
Peter Zijlstra
[-- Attachment #1: peterz-cleanup-preempt.patch --]
[-- Type: text/plain, Size: 14016 bytes --]
Rewrite the preempt_count macros in order to extract the 3 basic
preempt_count value modifiers:
__preempt_count_add()
__preempt_count_sub()
and the new:
__preempt_count_dec_and_test()
And since we're at it anyway, replace the unconventional
$op_preempt_count names with the more conventional preempt_count_$op.
Since these basic operators are equivalent to the previous _notrace()
variants, do away with the _notrace() versions.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
arch/mips/mm/init.c | 5 -
arch/x86/kernel/traps.c | 4 -
include/asm-generic/preempt.h | 35 +++++++++++++
include/linux/hardirq.h | 8 +--
include/linux/preempt.h | 106 +++++++++++++++++++-----------------------
include/linux/sched.h | 5 -
include/linux/uaccess.h | 8 ---
kernel/context_tracking.c | 2
kernel/sched/core.c | 29 ++++-------
kernel/softirq.c | 12 ++--
10 files changed, 112 insertions(+), 102 deletions(-)
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -124,7 +124,7 @@ void *kmap_coherent(struct page *page, u
BUG_ON(Page_dcache_dirty(page));
- inc_preempt_count();
+ pagefault_disable();
idx = (addr >> PAGE_SHIFT) & (FIX_N_COLOURS - 1);
#ifdef CONFIG_MIPS_MT_SMTC
idx += FIX_N_COLOURS * smp_processor_id() +
@@ -193,8 +193,7 @@ void kunmap_coherent(void)
write_c0_entryhi(old_ctx);
EXIT_CRITICAL(flags);
#endif
- dec_preempt_count();
- preempt_check_resched();
+ pagefault_enable();
}
void copy_user_highpage(struct page *to, struct page *from,
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -88,7 +88,7 @@ static inline void conditional_sti(struc
static inline void preempt_conditional_sti(struct pt_regs *regs)
{
- inc_preempt_count();
+ preempt_count_inc();
if (regs->flags & X86_EFLAGS_IF)
local_irq_enable();
}
@@ -103,7 +103,7 @@ static inline void preempt_conditional_c
{
if (regs->flags & X86_EFLAGS_IF)
local_irq_disable();
- dec_preempt_count();
+ preempt_count_dec();
}
static int __kprobes
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -55,4 +55,39 @@ static __always_inline bool test_preempt
return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED);
}
+/*
+ * The various preempt_count add/sub methods
+ */
+
+static __always_inline void __preempt_count_add(int val)
+{
+ *preempt_count_ptr() += val;
+}
+
+static __always_inline void __preempt_count_sub(int val)
+{
+ *preempt_count_ptr() -= val;
+}
+
+static __always_inline bool __preempt_count_dec_and_test(void)
+{
+ return !--*preempt_count_ptr();
+}
+
+/*
+ * Returns true when we need to resched -- even if we can not.
+ */
+static __always_inline bool need_resched(void)
+{
+ return unlikely(test_preempt_need_resched());
+}
+
+/*
+ * Returns true when we need to resched and can (barring IRQ state).
+ */
+static __always_inline bool should_resched(void)
+{
+ return unlikely(!*preempt_count_ptr());
+}
+
#endif /* __ASM_PREEMPT_H */
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -37,7 +37,7 @@ extern void rcu_nmi_exit(void);
#define __irq_enter() \
do { \
account_irq_enter_time(current); \
- add_preempt_count(HARDIRQ_OFFSET); \
+ preempt_count_add(HARDIRQ_OFFSET); \
trace_hardirq_enter(); \
} while (0)
@@ -53,7 +53,7 @@ extern void irq_enter(void);
do { \
trace_hardirq_exit(); \
account_irq_exit_time(current); \
- sub_preempt_count(HARDIRQ_OFFSET); \
+ preempt_count_sub(HARDIRQ_OFFSET); \
} while (0)
/*
@@ -66,7 +66,7 @@ extern void irq_exit(void);
lockdep_off(); \
ftrace_nmi_enter(); \
BUG_ON(in_nmi()); \
- add_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET); \
+ preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
rcu_nmi_enter(); \
trace_hardirq_enter(); \
} while (0)
@@ -76,7 +76,7 @@ extern void irq_exit(void);
trace_hardirq_exit(); \
rcu_nmi_exit(); \
BUG_ON(!in_nmi()); \
- sub_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET); \
+ preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
ftrace_nmi_exit(); \
lockdep_on(); \
} while (0)
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -18,97 +18,86 @@
#include <asm/preempt.h>
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
- extern void add_preempt_count(int val);
- extern void sub_preempt_count(int val);
+extern void preempt_count_add(int val);
+extern void preempt_count_sub(int val);
+#define preempt_count_dec_and_test() ({ preempt_count_sub(1); should_resched(); })
#else
-# define add_preempt_count(val) do { *preempt_count_ptr() += (val); } while (0)
-# define sub_preempt_count(val) do { *preempt_count_ptr() -= (val); } while (0)
+#define preempt_count_add(val) __preempt_count_add(val)
+#define preempt_count_sub(val) __preempt_count_sub(val)
+#define preempt_count_dec_and_test() __preempt_count_dec_and_test()
#endif
-#define inc_preempt_count() add_preempt_count(1)
-#define dec_preempt_count() sub_preempt_count(1)
-
-#ifdef CONFIG_PREEMPT
-
-asmlinkage void preempt_schedule(void);
-
-#define preempt_check_resched() \
-do { \
- if (unlikely(!*preempt_count_ptr())) \
- preempt_schedule(); \
-} while (0)
-
-#ifdef CONFIG_CONTEXT_TRACKING
-
-void preempt_schedule_context(void);
-
-#define preempt_check_resched_context() \
-do { \
- if (unlikely(!*preempt_count_ptr())) \
- preempt_schedule_context(); \
-} while (0)
-#else
-
-#define preempt_check_resched_context() preempt_check_resched()
-
-#endif /* CONFIG_CONTEXT_TRACKING */
-
-#else /* !CONFIG_PREEMPT */
-
-#define preempt_check_resched() do { } while (0)
-#define preempt_check_resched_context() do { } while (0)
-
-#endif /* CONFIG_PREEMPT */
+#define __preempt_count_inc() __preempt_count_add(1)
+#define __preempt_count_dec() __preempt_count_sub(1)
+#define preempt_count_inc() preempt_count_add(1)
+#define preempt_count_dec() preempt_count_sub(1)
#ifdef CONFIG_PREEMPT_COUNT
#define preempt_disable() \
do { \
- inc_preempt_count(); \
+ preempt_count_inc(); \
barrier(); \
} while (0)
#define sched_preempt_enable_no_resched() \
do { \
barrier(); \
- dec_preempt_count(); \
+ preempt_count_dec(); \
} while (0)
-#define preempt_enable_no_resched() sched_preempt_enable_no_resched()
+#define preempt_enable_no_resched() sched_preempt_enable_no_resched()
+#ifdef CONFIG_PREEMPT
+asmlinkage void preempt_schedule(void);
#define preempt_enable() \
do { \
- preempt_enable_no_resched(); \
- preempt_check_resched(); \
+ barrier(); \
+ if (unlikely(preempt_count_dec_and_test())) \
+ preempt_schedule(); \
} while (0)
-/* For debugging and tracer internals only! */
-#define add_preempt_count_notrace(val) \
- do { *preempt_count_ptr() += (val); } while (0)
-#define sub_preempt_count_notrace(val) \
- do { *preempt_count_ptr() -= (val); } while (0)
-#define inc_preempt_count_notrace() add_preempt_count_notrace(1)
-#define dec_preempt_count_notrace() sub_preempt_count_notrace(1)
+#define preempt_check_resched() \
+do { \
+ if (should_resched()) \
+ preempt_schedule(); \
+} while (0)
+
+#else
+#define preempt_enable() preempt_enable_no_resched()
+#define preempt_check_resched() do { } while (0)
+#endif
#define preempt_disable_notrace() \
do { \
- inc_preempt_count_notrace(); \
+ __preempt_count_inc(); \
barrier(); \
} while (0)
#define preempt_enable_no_resched_notrace() \
do { \
barrier(); \
- dec_preempt_count_notrace(); \
+ __preempt_count_dec(); \
} while (0)
-/* preempt_check_resched is OK to trace */
+#ifdef CONFIG_PREEMPT
+
+#ifdef CONFIG_CONTEXT_TRACKING
+asmlinkage void preempt_schedule_context(void);
+#else
+#define preempt_schedule_context() preempt_schedule()
+#endif
+
#define preempt_enable_notrace() \
do { \
- preempt_enable_no_resched_notrace(); \
- preempt_check_resched_context(); \
+ barrier(); \
+ if (unlikely(__preempt_count_dec_and_test())) \
+ preempt_schedule_context(); \
} while (0)
+#else
+#define preempt_enable_notrace() preempt_enable_no_resched_notrace()
+#endif
#else /* !CONFIG_PREEMPT_COUNT */
@@ -118,10 +107,11 @@ do { \
* that can cause faults and scheduling migrate into our preempt-protected
* region.
*/
-#define preempt_disable() barrier()
+#define preempt_disable() barrier()
#define sched_preempt_enable_no_resched() barrier()
-#define preempt_enable_no_resched() barrier()
-#define preempt_enable() barrier()
+#define preempt_enable_no_resched() barrier()
+#define preempt_enable() barrier()
+#define preempt_check_resched() do { } while (0)
#define preempt_disable_notrace() barrier()
#define preempt_enable_no_resched_notrace() barrier()
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2403,11 +2403,6 @@ static inline int signal_pending_state(l
return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
}
-static inline int need_resched(void)
-{
- return unlikely(test_preempt_need_resched());
-}
-
/*
* cond_resched() and cond_resched_lock(): latency reduction via
* explicit rescheduling in places that are safe. The return
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -15,7 +15,7 @@
*/
static inline void pagefault_disable(void)
{
- inc_preempt_count();
+ preempt_count_inc();
/*
* make sure to have issued the store before a pagefault
* can hit.
@@ -30,11 +30,7 @@ static inline void pagefault_enable(void
* the pagefault handler again.
*/
barrier();
- dec_preempt_count();
- /*
- * make sure we do..
- */
- barrier();
+ preempt_count_dec();
preempt_check_resched();
}
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -111,7 +111,7 @@ void context_tracking_user_enter(void)
* instead of preempt_schedule() to exit user context if needed before
* calling the scheduler.
*/
-void __sched notrace preempt_schedule_context(void)
+asmlinkage void __sched notrace preempt_schedule_context(void)
{
enum ctx_state prev_ctx;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2239,7 +2239,7 @@ notrace unsigned long get_parent_ip(unsi
#if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
defined(CONFIG_PREEMPT_TRACER))
-void __kprobes add_preempt_count(int val)
+void __kprobes preempt_count_add(int val)
{
#ifdef CONFIG_DEBUG_PREEMPT
/*
@@ -2248,7 +2248,7 @@ void __kprobes add_preempt_count(int val
if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
return;
#endif
- *preempt_count_ptr() += val;
+ __preempt_count_add(val);
#ifdef CONFIG_DEBUG_PREEMPT
/*
* Spinlock count overflowing soon?
@@ -2259,9 +2259,9 @@ void __kprobes add_preempt_count(int val
if (preempt_count() == val)
trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
}
-EXPORT_SYMBOL(add_preempt_count);
+EXPORT_SYMBOL(preempt_count_add);
-void __kprobes sub_preempt_count(int val)
+void __kprobes preempt_count_sub(int val)
{
#ifdef CONFIG_DEBUG_PREEMPT
/*
@@ -2279,9 +2279,9 @@ void __kprobes sub_preempt_count(int val
if (preempt_count() == val)
trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
- *preempt_count_ptr() -= val;
+ __preempt_count_sub(val);
}
-EXPORT_SYMBOL(sub_preempt_count);
+EXPORT_SYMBOL(preempt_count_sub);
#endif
@@ -2545,9 +2545,9 @@ asmlinkage void __sched notrace preempt_
return;
do {
- add_preempt_count_notrace(PREEMPT_ACTIVE);
+ __preempt_count_add(PREEMPT_ACTIVE);
__schedule();
- sub_preempt_count_notrace(PREEMPT_ACTIVE);
+ __preempt_count_sub(PREEMPT_ACTIVE);
/*
* Check again in case we missed a preemption opportunity
@@ -2574,11 +2574,11 @@ asmlinkage void __sched preempt_schedule
prev_state = exception_enter();
do {
- add_preempt_count(PREEMPT_ACTIVE);
+ __preempt_count_add(PREEMPT_ACTIVE);
local_irq_enable();
__schedule();
local_irq_disable();
- sub_preempt_count(PREEMPT_ACTIVE);
+ __preempt_count_sub(PREEMPT_ACTIVE);
/*
* Check again in case we missed a preemption opportunity
@@ -3818,16 +3818,11 @@ SYSCALL_DEFINE0(sched_yield)
return 0;
}
-static inline int should_resched(void)
-{
- return need_resched() && !(preempt_count() & PREEMPT_ACTIVE);
-}
-
static void __cond_resched(void)
{
- add_preempt_count(PREEMPT_ACTIVE);
+ preempt_count_add(PREEMPT_ACTIVE);
__schedule();
- sub_preempt_count(PREEMPT_ACTIVE);
+ preempt_count_sub(PREEMPT_ACTIVE);
}
int __sched _cond_resched(void)
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -100,7 +100,7 @@ static void __local_bh_disable(unsigned
raw_local_irq_save(flags);
/*
- * The preempt tracer hooks into add_preempt_count and will break
+ * The preempt tracer hooks into preempt_count_add and will break
* lockdep because it calls back into lockdep after SOFTIRQ_OFFSET
* is set and before current->softirq_enabled is cleared.
* We must manually increment preempt_count here and manually
@@ -120,7 +120,7 @@ static void __local_bh_disable(unsigned
#else /* !CONFIG_TRACE_IRQFLAGS */
static inline void __local_bh_disable(unsigned long ip, unsigned int cnt)
{
- add_preempt_count(cnt);
+ preempt_count_add(cnt);
barrier();
}
#endif /* CONFIG_TRACE_IRQFLAGS */
@@ -139,7 +139,7 @@ static void __local_bh_enable(unsigned i
if (softirq_count() == cnt)
trace_softirqs_on(_RET_IP_);
- sub_preempt_count(cnt);
+ preempt_count_sub(cnt);
}
/*
@@ -169,12 +169,12 @@ static inline void _local_bh_enable_ip(u
* Keep preemption disabled until we are done with
* softirq processing:
*/
- sub_preempt_count(SOFTIRQ_DISABLE_OFFSET - 1);
+ preempt_count_sub(SOFTIRQ_DISABLE_OFFSET - 1);
if (unlikely(!in_interrupt() && local_softirq_pending()))
do_softirq();
- dec_preempt_count();
+ preempt_count_dec();
#ifdef CONFIG_TRACE_IRQFLAGS
local_irq_enable();
#endif
@@ -360,7 +360,7 @@ void irq_exit(void)
account_irq_exit_time(current);
trace_hardirq_exit();
- sub_preempt_count(HARDIRQ_OFFSET);
+ preempt_count_sub(HARDIRQ_OFFSET);
if (!in_interrupt() && local_softirq_pending())
invoke_softirq();
* [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation
2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
` (4 preceding siblings ...)
2013-09-10 13:08 ` [PATCH 5/7] sched: Extract the basic add/sub preempt_count modifiers Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
2013-09-10 13:27 ` Peter Zijlstra
` (2 more replies)
2013-09-10 13:08 ` [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call Peter Zijlstra
2013-09-10 13:51 ` [PATCH 0/7] preempt_count rework -v2 Ingo Molnar
7 siblings, 3 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
Peter Zijlstra
[-- Attachment #1: peterz-x86-per-cpu-preempt_count.patch --]
[-- Type: text/plain, Size: 8172 bytes --]
Convert x86 to use a per-cpu preemption count. The reason for doing so
is that accessing per-cpu variables is a lot cheaper than accessing
thread_info variables.
We still need to save/restore the actual preemption count due to
PREEMPT_ACTIVE, so we place the per-cpu __preempt_count variable in the
same cache line as the other hot __switch_to() variables such as
current_task.
Also rename thread_info::preempt_count to ensure nobody is
'accidentally' still poking at it.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/include/asm/Kbuild | 1
arch/x86/include/asm/preempt.h | 102 +++++++++++++++++++++++++++++++++++++
arch/x86/include/asm/thread_info.h | 5 -
arch/x86/kernel/asm-offsets.c | 1
arch/x86/kernel/cpu/common.c | 5 +
arch/x86/kernel/entry_32.S | 7 --
arch/x86/kernel/entry_64.S | 4 -
arch/x86/kernel/process_32.c | 10 +++
arch/x86/kernel/process_64.c | 10 +++
9 files changed, 132 insertions(+), 13 deletions(-)
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -5,4 +5,3 @@ genhdr-y += unistd_64.h
genhdr-y += unistd_x32.h
generic-y += clkdev.h
-generic-y += preempt.h
--- /dev/null
+++ b/arch/x86/include/asm/preempt.h
@@ -0,0 +1,102 @@
+#ifndef __ASM_PREEMPT_H
+#define __ASM_PREEMPT_H
+
+#include <asm/percpu.h>
+#include <linux/thread_info.h>
+
+DECLARE_PER_CPU(int, __preempt_count);
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
+static __always_inline int preempt_count(void)
+{
+ return __this_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline int *preempt_count_ptr(void)
+{
+ return &__raw_get_cpu_var(__preempt_count);
+}
+
+/*
+ * must be macros to avoid header recursion hell
+ */
+#define task_preempt_count(p) \
+ (task_thread_info(p)->saved_preempt_count & ~PREEMPT_NEED_RESCHED)
+
+#define init_task_preempt_count(p) do { \
+ task_thread_info(p)->saved_preempt_count = 1 | PREEMPT_NEED_RESCHED; \
+} while (0)
+
+#define init_idle_preempt_count(p, cpu) do { \
+ task_thread_info(p)->saved_preempt_count = 0; \
+ per_cpu(__preempt_count, (cpu)) = 0; \
+} while (0)
+
+/*
+ * We fold the NEED_RESCHED bit into the preempt count such that
+ * preempt_enable() can decrement and test for needing to reschedule with a
+ * single instruction.
+ *
+ * We invert the actual bit, so that when the decrement hits 0 we know we both
+ * need to resched (the bit is cleared) and can resched (no preempt count).
+ */
+
+static __always_inline void set_preempt_need_resched(void)
+{
+ __this_cpu_and_4(__preempt_count, ~PREEMPT_NEED_RESCHED);
+}
+
+static __always_inline void clear_preempt_need_resched(void)
+{
+ __this_cpu_or_4(__preempt_count, PREEMPT_NEED_RESCHED);
+}
+
+static __always_inline bool test_preempt_need_resched(void)
+{
+ return !(__this_cpu_read_4(__preempt_count) & PREEMPT_NEED_RESCHED);
+}
+
+/*
+ * The various preempt_count add/sub methods
+ */
+
+static __always_inline void __preempt_count_add(int val)
+{
+ __this_cpu_add_4(__preempt_count, val);
+}
+
+static __always_inline void __preempt_count_sub(int val)
+{
+ __this_cpu_add_4(__preempt_count, -val);
+}
+
+static __always_inline bool __preempt_count_dec_and_test(void)
+{
+ unsigned char c;
+
+ asm ("decl " __percpu_arg(0) "; sete %1"
+ : "+m" (__preempt_count), "=qm" (c));
+
+ return c != 0;
+}
+
+/*
+ * Returns true when we need to resched -- even if we can not.
+ */
+static __always_inline bool need_resched(void)
+{
+ return unlikely(test_preempt_need_resched());
+}
+
+/*
+ * Returns true when we need to resched and can (barring IRQ state).
+ */
+static __always_inline bool should_resched(void)
+{
+ return unlikely(!__this_cpu_read_4(__preempt_count));
+}
+
+#endif /* __ASM_PREEMPT_H */
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -28,8 +28,7 @@ struct thread_info {
__u32 flags; /* low level flags */
__u32 status; /* thread synchronous flags */
__u32 cpu; /* current CPU */
- int preempt_count; /* 0 => preemptable,
- <0 => BUG */
+ int saved_preempt_count;
mm_segment_t addr_limit;
struct restart_block restart_block;
void __user *sysenter_return;
@@ -49,7 +48,7 @@ struct thread_info {
.exec_domain = &default_exec_domain, \
.flags = 0, \
.cpu = 0, \
- .preempt_count = INIT_PREEMPT_COUNT, \
+ .saved_preempt_count = INIT_PREEMPT_COUNT, \
.addr_limit = KERNEL_DS, \
.restart_block = { \
.fn = do_no_restart_syscall, \
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -32,7 +32,6 @@ void common(void) {
OFFSET(TI_flags, thread_info, flags);
OFFSET(TI_status, thread_info, status);
OFFSET(TI_addr_limit, thread_info, addr_limit);
- OFFSET(TI_preempt_count, thread_info, preempt_count);
BLANK();
OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1095,6 +1095,9 @@ DEFINE_PER_CPU(char *, irq_stack_ptr) =
DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
+DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
+EXPORT_PER_CPU_SYMBOL(__preempt_count);
+
DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
/*
@@ -1169,6 +1172,8 @@ void debug_stack_reset(void)
DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
EXPORT_PER_CPU_SYMBOL(current_task);
+DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
+EXPORT_PER_CPU_SYMBOL(__preempt_count);
DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
#ifdef CONFIG_CC_STACKPROTECTOR
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -362,12 +362,9 @@ END(ret_from_exception)
#ifdef CONFIG_PREEMPT
ENTRY(resume_kernel)
DISABLE_INTERRUPTS(CLBR_ANY)
- cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ?
- jnz restore_all
need_resched:
- movl TI_flags(%ebp), %ecx # need_resched set ?
- testb $_TIF_NEED_RESCHED, %cl
- jz restore_all
+ cmpl $0,PER_CPU_VAR(__preempt_count)
+ jnz restore_all
testl $X86_EFLAGS_IF,PT_EFLAGS(%esp) # interrupts off (exception path) ?
jz restore_all
call preempt_schedule_irq
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1103,10 +1103,8 @@ ENTRY(native_iret)
/* Returning to kernel space. Check if we need preemption */
/* rcx: threadinfo. interrupts off. */
ENTRY(retint_kernel)
- cmpl $0,TI_preempt_count(%rcx)
+ cmpl $0,PER_CPU_VAR(__preempt_count)
jnz retint_restore_args
- bt $TIF_NEED_RESCHED,TI_flags(%rcx)
- jnc retint_restore_args
bt $9,EFLAGS-ARGOFFSET(%rsp) /* interrupts off? */
jnc retint_restore_args
call preempt_schedule_irq
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -291,6 +291,16 @@ __switch_to(struct task_struct *prev_p,
if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
set_iopl_mask(next->iopl);
+#ifdef CONFIG_PREEMPT_COUNT
+ /*
+ * If it were not for PREEMPT_ACTIVE we could guarantee that the
+ * preempt_count of all tasks was equal here and this would not be
+ * needed.
+ */
+ task_thread_info(prev_p)->saved_preempt_count = __raw_get_cpu_var(__preempt_count);
+ __raw_get_cpu_var(__preempt_count) = task_thread_info(next_p)->saved_preempt_count;
+#endif
+
/*
* Now maybe handle debug registers and/or IO bitmaps
*/
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -363,6 +363,16 @@ __switch_to(struct task_struct *prev_p,
this_cpu_write(old_rsp, next->usersp);
this_cpu_write(current_task, next_p);
+#ifdef CONFIG_PREEMPT_COUNT
+ /*
+ * If it were not for PREEMPT_ACTIVE we could guarantee that the
+ * preempt_count of all tasks was equal here and this would not be
+ * needed.
+ */
+ task_thread_info(prev_p)->saved_preempt_count = __raw_get_cpu_var(__preempt_count);
+ __raw_get_cpu_var(__preempt_count) = task_thread_info(next_p)->saved_preempt_count;
+#endif
+
this_cpu_write(kernel_stack,
(unsigned long)task_stack_page(next_p) +
THREAD_SIZE - KERNEL_STACK_OFFSET);
* [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call
2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
` (5 preceding siblings ...)
2013-09-10 13:08 ` [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
@ 2013-09-10 13:08 ` Peter Zijlstra
2013-09-10 13:42 ` Ingo Molnar
2013-09-10 13:51 ` [PATCH 0/7] preempt_count rework -v2 Ingo Molnar
7 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:08 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch,
Peter Zijlstra
[-- Attachment #1: peterz-x86-opt-call.patch --]
[-- Type: text/plain, Size: 4962 bytes --]
Remove the bloat of the C calling convention from the
preempt_enable() sites by creating an ASM wrapper that lets us
do an asm("call ___preempt_schedule") instead.
The calling.h bits are by Andi Kleen.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/include/asm/calling.h | 50 +++++++++++++++++++++++++++++++++++++++++
arch/x86/include/asm/preempt.h | 8 ++++++
arch/x86/kernel/Makefile | 2 +
arch/x86/kernel/preempt.S | 25 ++++++++++++++++++++
include/asm-generic/preempt.h | 10 ++++++++
include/linux/preempt.h | 13 ++++------
6 files changed, 100 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -48,6 +48,8 @@ For 32-bit we have the following convent
#include <asm/dwarf2.h>
+#ifdef CONFIG_X86_64
+
/*
* 64-bit system call stack frame layout defines and helpers,
* for assembly code:
@@ -192,3 +194,51 @@ For 32-bit we have the following convent
.macro icebp
.byte 0xf1
.endm
+
+#else /* CONFIG_X86_64 */
+
+/*
+ * For 32bit only simplified versions of SAVE_ALL/RESTORE_ALL. These
+ * are different from the entry_32.S versions in not changing the segment
+ * registers. So only suitable for in kernel use, not when transitioning
+ * from or to user space. The resulting stack frame is not a standard
+ * pt_regs frame. The main use case is calling C code from assembler
+ * when all the registers need to be preserved.
+ */
+
+ .macro SAVE_ALL
+ pushl_cfi %eax
+ CFI_REL_OFFSET eax, 0
+ pushl_cfi %ebp
+ CFI_REL_OFFSET ebp, 0
+ pushl_cfi %edi
+ CFI_REL_OFFSET edi, 0
+ pushl_cfi %esi
+ CFI_REL_OFFSET esi, 0
+ pushl_cfi %edx
+ CFI_REL_OFFSET edx, 0
+ pushl_cfi %ecx
+ CFI_REL_OFFSET ecx, 0
+ pushl_cfi %ebx
+ CFI_REL_OFFSET ebx, 0
+ .endm
+
+ .macro RESTORE_ALL
+ popl_cfi %ebx
+ CFI_RESTORE ebx
+ popl_cfi %ecx
+ CFI_RESTORE ecx
+ popl_cfi %edx
+ CFI_RESTORE edx
+ popl_cfi %esi
+ CFI_RESTORE esi
+ popl_cfi %edi
+ CFI_RESTORE edi
+ popl_cfi %ebp
+ CFI_RESTORE ebp
+ popl_cfi %eax
+ CFI_RESTORE eax
+ .endm
+
+#endif /* CONFIG_X86_64 */
+
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -99,4 +99,12 @@ static __always_inline bool should_resch
return unlikely(!__this_cpu_read_4(__preempt_count));
}
+#ifdef CONFIG_PREEMPT
+#define __preempt_schedule() asm ("call ___preempt_schedule")
+
+#ifdef CONFIG_CONTEXT_TRACKING
+#define __preempt_schedule_context() asm ("call ___preempt_schedule_context")
+#endif
+#endif /* CONFIG_PREEMPT */
+
#endif /* __ASM_PREEMPT_H */
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -36,6 +36,8 @@ obj-y += tsc.o io_delay.o rtc.o
obj-y += pci-iommu_table.o
obj-y += resource.o
+obj-$(CONFIG_PREEMPT) += preempt.o
+
obj-y += process.o
obj-y += i387.o xsave.o
obj-y += ptrace.o
--- /dev/null
+++ b/arch/x86/kernel/preempt.S
@@ -0,0 +1,25 @@
+
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+#include <asm/asm.h>
+#include <asm/calling.h>
+
+ENTRY(___preempt_schedule)
+ CFI_STARTPROC
+ SAVE_ALL
+ call preempt_schedule
+ RESTORE_ALL
+ ret
+ CFI_ENDPROC
+
+#ifdef CONFIG_CONTEXT_TRACKING
+
+ENTRY(___preempt_schedule_context)
+ CFI_STARTPROC
+ SAVE_ALL
+ call preempt_schedule_context
+ RESTORE_ALL
+ ret
+ CFI_ENDPROC
+
+#endif
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -90,4 +90,14 @@ static __always_inline bool should_resch
return unlikely(!*preempt_count_ptr());
}
+#ifdef CONFIG_PREEMPT
+extern asmlinkage void preempt_schedule(void);
+#define __preempt_schedule() preempt_schedule()
+
+#ifdef CONFIG_CONTEXT_TRACKING
+extern asmlinkage void preempt_schedule_context(void);
+#define __preempt_schedule_context() preempt_schedule_context()
+#endif
+#endif /* CONFIG_PREEMPT */
+
#endif /* __ASM_PREEMPT_H */
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -50,18 +50,17 @@ do { \
#define preempt_enable_no_resched() sched_preempt_enable_no_resched()
#ifdef CONFIG_PREEMPT
-asmlinkage void preempt_schedule(void);
#define preempt_enable() \
do { \
barrier(); \
if (unlikely(preempt_count_dec_and_test())) \
- preempt_schedule(); \
+ __preempt_schedule(); \
} while (0)
#define preempt_check_resched() \
do { \
if (should_resched()) \
- preempt_schedule(); \
+ __preempt_schedule(); \
} while (0)
#else
@@ -83,17 +82,15 @@ do { \
#ifdef CONFIG_PREEMPT
-#ifdef CONFIG_CONTEXT_TRACKING
-asmlinkage void preempt_schedule_context(void);
-#else
-#define preempt_schedule_context() preempt_schedule()
+#ifndef CONFIG_CONTEXT_TRACKING
+#define __preempt_schedule_context() __preempt_schedule()
#endif
#define preempt_enable_notrace() \
do { \
barrier(); \
if (unlikely(__preempt_count_dec_and_test())) \
- preempt_schedule_context(); \
+ __preempt_schedule_context(); \
} while (0)
#else
#define preempt_enable_notrace() preempt_enable_no_resched_notrace()
* Re: [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation
2013-09-10 13:08 ` [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
@ 2013-09-10 13:27 ` Peter Zijlstra
2013-09-10 14:02 ` Eric Dumazet
2013-09-10 16:48 ` Peter Zijlstra
2 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 13:27 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch
On Tue, Sep 10, 2013 at 03:08:17PM +0200, Peter Zijlstra wrote:
> @@ -1095,6 +1095,9 @@ DEFINE_PER_CPU(char *, irq_stack_ptr) =
>
> DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
>
> +DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
> +EXPORT_PER_CPU_SYMBOL(__preempt_count);
> +
Note that afaict this patch takes away the reason irq_count exists.
* Re: [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call
2013-09-10 13:08 ` [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call Peter Zijlstra
@ 2013-09-10 13:42 ` Ingo Molnar
2013-09-10 13:55 ` Jan Beulich
0 siblings, 1 reply; 52+ messages in thread
From: Ingo Molnar @ 2013-09-10 13:42 UTC (permalink / raw)
To: Peter Zijlstra, Jan Beulich
Cc: Linus Torvalds, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
linux-kernel, linux-arch
* Peter Zijlstra <peterz@infradead.org> wrote:
> + .macro SAVE_ALL
> + pushl_cfi %eax
> + CFI_REL_OFFSET eax, 0
> + pushl_cfi %ebp
> + CFI_REL_OFFSET ebp, 0
> + pushl_cfi %edi
> + CFI_REL_OFFSET edi, 0
> + pushl_cfi %esi
> + CFI_REL_OFFSET esi, 0
> + pushl_cfi %edx
> + CFI_REL_OFFSET edx, 0
> + pushl_cfi %ecx
> + CFI_REL_OFFSET ecx, 0
> + pushl_cfi %ebx
> + CFI_REL_OFFSET ebx, 0
> + .endm
> +
> + .macro RESTORE_ALL
> + popl_cfi %ebx
> + CFI_RESTORE ebx
> + popl_cfi %ecx
> + CFI_RESTORE ecx
> + popl_cfi %edx
> + CFI_RESTORE edx
> + popl_cfi %esi
> + CFI_RESTORE esi
> + popl_cfi %edi
> + CFI_RESTORE edi
> + popl_cfi %ebp
> + CFI_RESTORE ebp
> + popl_cfi %eax
> + CFI_RESTORE eax
> + .endm
Side note: shouldn't the pushl_cfi and popl_cfi macros be adjusted,
instead of open coding it?
Thanks,
Ingo
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
` (6 preceding siblings ...)
2013-09-10 13:08 ` [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call Peter Zijlstra
@ 2013-09-10 13:51 ` Ingo Molnar
2013-09-10 13:56 ` Ingo Molnar
7 siblings, 1 reply; 52+ messages in thread
From: Ingo Molnar @ 2013-09-10 13:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linus Torvalds, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
linux-kernel, linux-arch
* Peter Zijlstra <peterz@infradead.org> wrote:
> These patches optimize preempt_enable by firstly folding the preempt and
> need_resched tests into one -- this should work for all architectures. And
> secondly by providing per-arch preempt_count implementations; with x86 using
> per-cpu preempt_count for fastest access.
>
>
> These patches have been boot tested on CONFIG_PREEMPT=y x86_64 and survive
> building a x86_64-defconfig kernel.
>
> kernel/sched/core.c:kick_process() now looks like:
>
> ffffffff8106f3f0 <kick_process>:
> ffffffff8106f3f0: 55 push %rbp
> ffffffff8106f3f1: 65 ff 04 25 e0 b7 00 incl %gs:0xb7e0
> ffffffff8106f3f8: 00
> ffffffff8106f3f9: 48 89 e5 mov %rsp,%rbp
> ffffffff8106f3fc: 48 8b 47 08 mov 0x8(%rdi),%rax
> ffffffff8106f400: 8b 50 18 mov 0x18(%rax),%edx
> ffffffff8106f403: 65 8b 04 25 1c b0 00 mov %gs:0xb01c,%eax
> ffffffff8106f40a: 00
> ffffffff8106f40b: 39 c2 cmp %eax,%edx
> ffffffff8106f40d: 74 1b je ffffffff8106f42a <kick_process+0x3a>
> ffffffff8106f40f: 89 d1 mov %edx,%ecx
> ffffffff8106f411: 48 c7 c0 00 2c 01 00 mov $0x12c00,%rax
> ffffffff8106f418: 48 8b 0c cd a0 bc cb mov -0x7e344360(,%rcx,8),%rcx
> ffffffff8106f41f: 81
> ffffffff8106f420: 48 3b bc 08 00 08 00 cmp 0x800(%rax,%rcx,1),%rdi
> ffffffff8106f427: 00
> ffffffff8106f428: 74 1e je ffffffff8106f448 <kick_process+0x58>
> * ffffffff8106f42a: 65 ff 0c 25 e0 b7 00 decl %gs:0xb7e0
> ffffffff8106f431: 00
> * ffffffff8106f432: 0f 94 c0 sete %al
> * ffffffff8106f435: 84 c0 test %al,%al
> * ffffffff8106f437: 75 02 jne ffffffff8106f43b <kick_process+0x4b>
> ffffffff8106f439: 5d pop %rbp
> ffffffff8106f43a: c3 retq
> * ffffffff8106f43b: e8 b0 b6 f9 ff callq ffffffff8100aaf0 <___preempt_schedule>
Mind also posting the 'before' assembly, to make it clear how much we've
improved things?
> ffffffff8106f440: 5d pop %rbp
> ffffffff8106f441: c3 retq
> ffffffff8106f442: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
> ffffffff8106f448: 89 d7 mov %edx,%edi
> ffffffff8106f44a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
> ffffffff8106f450: ff 15 ea e0 ba 00 callq *0xbae0ea(%rip) # ffffffff81c1d540 <smp_ops+0x20>
> ffffffff8106f456: eb d2 jmp ffffffff8106f42a <kick_process+0x3a>
> ffffffff8106f458: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
> ffffffff8106f45f: 00
>
> Where the '*' marked lines are preempt_enable(), sadly GCC isn't able to
> get rid of the sete+test :/ It's a rather frequent pattern in the kernel,
> so 'fixing' the x86 GCC backend to recognise this might be useful.
So what we do in kick_process() is:
preempt_disable();
cpu = task_cpu(p);
if ((cpu != smp_processor_id()) && task_curr(p))
smp_send_reschedule(cpu);
preempt_enable();
The preempt_disable() looks sweet:
> ffffffff8106f3f1: 65 ff 04 25 e0 b7 00 incl %gs:0xb7e0
> ffffffff8106f3f8: 00
and the '*' you marked is the preempt_enable() portion, which, with your
new code, looks like this:
#define preempt_check_resched() \
do { \
if (unlikely(!*preempt_count_ptr())) \
preempt_schedule(); \
} while (0)
Which GCC translates to:
> * ffffffff8106f42a: 65 ff 0c 25 e0 b7 00 decl %gs:0xb7e0
> ffffffff8106f431: 00
> * ffffffff8106f432: 0f 94 c0 sete %al
> * ffffffff8106f435: 84 c0 test %al,%al
> * ffffffff8106f437: 75 02 jne ffffffff8106f43b <kick_process+0x4b>
So the problem is that GCC cannot pass a 'CPU flags' state out of asm(),
only an explicit (pseudo-)value, right?
Ideally we'd like to have something like:
> * ffffffff8106f42a: 65 ff 0c 25 e0 b7 00 decl %gs:0xb7e0
> ffffffff8106f431: 00
> * ffffffff8106f437: 75 02 jne ffffffff8106f43b <kick_process+0x4b>
right?
Thanks,
Ingo
* Re: [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call
2013-09-10 13:42 ` Ingo Molnar
@ 2013-09-10 13:55 ` Jan Beulich
0 siblings, 0 replies; 52+ messages in thread
From: Jan Beulich @ 2013-09-10 13:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
Linus Torvalds, Andi Kleen, Arjan van de Ven, Mike Galbraith,
linux-arch, linux-kernel, Peter Anvin
>>> On 10.09.13 at 15:42, Ingo Molnar <mingo@kernel.org> wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> + .macro SAVE_ALL
>> + pushl_cfi %eax
>> + CFI_REL_OFFSET eax, 0
>> + pushl_cfi %ebp
>> + CFI_REL_OFFSET ebp, 0
>> + pushl_cfi %edi
>> + CFI_REL_OFFSET edi, 0
>> + pushl_cfi %esi
>> + CFI_REL_OFFSET esi, 0
>> + pushl_cfi %edx
>> + CFI_REL_OFFSET edx, 0
>> + pushl_cfi %ecx
>> + CFI_REL_OFFSET ecx, 0
>> + pushl_cfi %ebx
>> + CFI_REL_OFFSET ebx, 0
>> + .endm
>> +
>> + .macro RESTORE_ALL
>> + popl_cfi %ebx
>> + CFI_RESTORE ebx
>> + popl_cfi %ecx
>> + CFI_RESTORE ecx
>> + popl_cfi %edx
>> + CFI_RESTORE edx
>> + popl_cfi %esi
>> + CFI_RESTORE esi
>> + popl_cfi %edi
>> + CFI_RESTORE edi
>> + popl_cfi %ebp
>> + CFI_RESTORE ebp
>> + popl_cfi %eax
>> + CFI_RESTORE eax
>> + .endm
>
> Side note: shouldn't the pushl_cfi and popl_cfi macros be adjusted,
> instead of open coding it?
If you mean the open coding of CFI_REL_OFFSET and CFI_RESTORE,
then no - there may be pushes/pops that don't save the caller's
register values (i.e. where solely the frame pointer adjustment
matters).
If you meant something else, please clarify what.
Jan
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 13:51 ` [PATCH 0/7] preempt_count rework -v2 Ingo Molnar
@ 2013-09-10 13:56 ` Ingo Molnar
2013-09-10 15:14 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 52+ messages in thread
From: Ingo Molnar @ 2013-09-10 13:56 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linus Torvalds, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
linux-kernel, linux-arch
* Ingo Molnar <mingo@kernel.org> wrote:
> So what we do in kick_process() is:
>
> preempt_disable();
> cpu = task_cpu(p);
> if ((cpu != smp_processor_id()) && task_curr(p))
> smp_send_reschedule(cpu);
> preempt_enable();
>
> The preempt_disable() looks sweet:
>
> > ffffffff8106f3f1: 65 ff 04 25 e0 b7 00 incl %gs:0xb7e0
> > ffffffff8106f3f8: 00
>
> and the '*' you marked is the preempt_enable() portion, which, with your
> new code, looks like this:
>
> #define preempt_check_resched() \
> do { \
> if (unlikely(!*preempt_count_ptr())) \
> preempt_schedule(); \
> } while (0)
>
> Which GCC translates to:
>
> > * ffffffff8106f42a: 65 ff 0c 25 e0 b7 00 decl %gs:0xb7e0
> > ffffffff8106f431: 00
> > * ffffffff8106f432: 0f 94 c0 sete %al
> > * ffffffff8106f435: 84 c0 test %al,%al
> > * ffffffff8106f437: 75 02 jne ffffffff8106f43b <kick_process+0x4b>
Correction, so this comes from the new x86-specific optimization:
+static __always_inline bool __preempt_count_dec_and_test(void)
+{
+ unsigned char c;
+
+ asm ("decl " __percpu_arg(0) "; sete %1"
+ : "+m" (__preempt_count), "=qm" (c));
+
+ return c != 0;
+}
And that's where the sete and test originates from.
Couldn't it be improved by merging the preempt_schedule() call into a new
primitive, keeping the call in the regular flow, or using section tricks
to move it out of line? The scheduling case is a slowpath in most cases.
Thanks,
Ingo
* Re: [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation
2013-09-10 13:08 ` [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
2013-09-10 13:27 ` Peter Zijlstra
@ 2013-09-10 14:02 ` Eric Dumazet
2013-09-10 15:25 ` Peter Zijlstra
2013-09-10 16:48 ` Peter Zijlstra
2 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2013-09-10 14:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
Frederic Weisbecker, linux-kernel, linux-arch
On Tue, 2013-09-10 at 15:08 +0200, Peter Zijlstra wrote:
> +static __always_inline int preempt_count(void)
> +{
> + return __this_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
> +}
Not sure why you used the _4 prefix on all accessors?
>
> +#ifdef CONFIG_PREEMPT_COUNT
> + /*
> + * If it were not for PREEMPT_ACTIVE we could guarantee that the
> + * preempt_count of all tasks was equal here and this would not be
> + * needed.
> + */
> + task_thread_info(prev_p)->saved_preempt_count = __raw_get_cpu_var(__preempt_count);
this_cpu_read(__preempt_count) ?
> + __raw_get_cpu_var(__preempt_count) = task_thread_info(next_p)->saved_preempt_count;
this_cpu_write(__preempt_count,
> task_thread_info(next_p)->saved_preempt_count);
> +#endif
> +
> this_cpu_write(kernel_stack,
> (unsigned long)task_stack_page(next_p) +
> THREAD_SIZE - KERNEL_STACK_OFFSET);
>
>
* Re: [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call
2013-09-10 13:55 ` Jan Beulich
@ 2013-09-10 14:25 ` Ingo Molnar
-1 siblings, 0 replies; 52+ messages in thread
From: Ingo Molnar @ 2013-09-10 14:25 UTC (permalink / raw)
To: Jan Beulich
Cc: Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
Linus Torvalds, Andi Kleen, Arjan van de Ven, Mike Galbraith,
linux-arch, linux-kernel, Peter Anvin
* Jan Beulich <JBeulich@suse.com> wrote:
> >>> On 10.09.13 at 15:42, Ingo Molnar <mingo@kernel.org> wrote:
>
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >> + .macro SAVE_ALL
> >> + pushl_cfi %eax
> >> + CFI_REL_OFFSET eax, 0
> >> + pushl_cfi %ebp
> >> + CFI_REL_OFFSET ebp, 0
> >> + pushl_cfi %edi
> >> + CFI_REL_OFFSET edi, 0
> >> + pushl_cfi %esi
> >> + CFI_REL_OFFSET esi, 0
> >> + pushl_cfi %edx
> >> + CFI_REL_OFFSET edx, 0
> >> + pushl_cfi %ecx
> >> + CFI_REL_OFFSET ecx, 0
> >> + pushl_cfi %ebx
> >> + CFI_REL_OFFSET ebx, 0
> >> + .endm
> >> +
> >> + .macro RESTORE_ALL
> >> + popl_cfi %ebx
> >> + CFI_RESTORE ebx
> >> + popl_cfi %ecx
> >> + CFI_RESTORE ecx
> >> + popl_cfi %edx
> >> + CFI_RESTORE edx
> >> + popl_cfi %esi
> >> + CFI_RESTORE esi
> >> + popl_cfi %edi
> >> + CFI_RESTORE edi
> >> + popl_cfi %ebp
> >> + CFI_RESTORE ebp
> >> + popl_cfi %eax
> >> + CFI_RESTORE eax
> >> + .endm
> >
> > Side note: shouldn't the pushl_cfi and popl_cfi macros be adjusted,
> > instead of open coding it?
>
> If you mean the open coding of CFI_REL_OFFSET and CFI_RESTORE, then no -
> there may be pushes/pops that don't save the caller's register values
> (i.e. where solely the frame pointer adjustment matters).
Ok.
> If you meant something else, please clarify what.
No, that's what I meant.
Thanks,
Ingo
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 13:56 ` Ingo Molnar
@ 2013-09-10 15:14 ` Peter Zijlstra
2013-09-10 15:29 ` Arjan van de Ven
2013-09-10 16:34 ` Linus Torvalds
2 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 15:14 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
linux-kernel, linux-arch
On Tue, Sep 10, 2013 at 03:56:36PM +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@kernel.org> wrote:
> > > * ffffffff8106f42a: 65 ff 0c 25 e0 b7 00 decl %gs:0xb7e0
> > > ffffffff8106f431: 00
> > > * ffffffff8106f432: 0f 94 c0 sete %al
> > > * ffffffff8106f435: 84 c0 test %al,%al
> > > * ffffffff8106f437: 75 02 jne ffffffff8106f43b <kick_process+0x4b>
>
> Correction, so this comes from the new x86-specific optimization:
>
> +static __always_inline bool __preempt_count_dec_and_test(void)
> +{
> + unsigned char c;
> +
> + asm ("decl " __percpu_arg(0) "; sete %1"
> + : "+m" (__preempt_count), "=qm" (c));
> +
> + return c != 0;
> +}
>
> And that's where the sete and test originates from.
Correct, used in:
#define preempt_enable() \
do { \
barrier(); \
if (unlikely(preempt_count_dec_and_test())) \
__preempt_schedule(); \
} while (0)
> Couldn't it be improved by merging the preempt_schedule() call into a new
> primitive, keeping the call in the regular flow, or using section tricks
> to move it out of line? The scheduling case is a slowpath in most cases.
Not if we want to keep using the GCC unlikely thing afaik. That said,
all this inline asm stuff isn't my strong point, so maybe someone
else has a good idea.
But I really think fixing GCC would be good, as we have the same pattern
with all *_and_test() functions.
* Re: [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation
2013-09-10 14:02 ` Eric Dumazet
@ 2013-09-10 15:25 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 15:25 UTC (permalink / raw)
To: Eric Dumazet
Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
Frederic Weisbecker, linux-kernel, linux-arch
On Tue, Sep 10, 2013 at 07:02:51AM -0700, Eric Dumazet wrote:
> On Tue, 2013-09-10 at 15:08 +0200, Peter Zijlstra wrote:
>
> > +static __always_inline int preempt_count(void)
> > +{
> > + return __this_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
> > +}
>
> Not sure why you used the _4 prefix on all accessors?
Last time I tried using the proper this_cpu* stuff that all exploded in
my face due to header recursion hell, so I've limited myself to what's
available in arch/x86/include/asm/percpu.h.
It was a few weeks ago though and maybe I just didn't try hard enough.
> > +#ifdef CONFIG_PREEMPT_COUNT
> > + /*
> > + * If it were not for PREEMPT_ACTIVE we could guarantee that the
> > + * preempt_count of all tasks was equal here and this would not be
> > + * needed.
> > + */
> > + task_thread_info(prev_p)->saved_preempt_count = __raw_get_cpu_var(__preempt_count);
>
> this_cpu_read(__preempt_count) ?
>
> > + __raw_get_cpu_var(__preempt_count) = task_thread_info(next_p)->saved_preempt_count;
>
> this_cpu_write(__preempt_count,
> task_thread_info(next_p)->saved_preempt_count);
OK, that does indeed generate slightly better code.
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 13:56 ` Ingo Molnar
2013-09-10 15:14 ` Peter Zijlstra
@ 2013-09-10 15:29 ` Arjan van de Ven
2013-09-10 15:35 ` Peter Zijlstra
2013-09-10 16:24 ` Linus Torvalds
2013-09-10 16:34 ` Linus Torvalds
2 siblings, 2 replies; 52+ messages in thread
From: Arjan van de Ven @ 2013-09-10 15:29 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Linus Torvalds, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Frederic Weisbecker,
linux-kernel, linux-arch
On 9/10/2013 6:56 AM, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>
>> So what we do in kick_process() is:
>>
>> preempt_disable();
>> cpu = task_cpu(p);
>> if ((cpu != smp_processor_id()) && task_curr(p))
>> smp_send_reschedule(cpu);
>> preempt_enable();
>>
>> The preempt_disable() looks sweet:
>>
>>> ffffffff8106f3f1: 65 ff 04 25 e0 b7 00 incl %gs:0xb7e0
>>> ffffffff8106f3f8: 00
>>
>> and the '*' you marked is the preempt_enable() portion, which, with your
>> new code, looks like this:
>>
>> #define preempt_check_resched() \
>> do { \
>> if (unlikely(!*preempt_count_ptr())) \
>> preempt_schedule(); \
>> } while (0)
>>
>> Which GCC translates to:
>>
>>> * ffffffff8106f42a: 65 ff 0c 25 e0 b7 00 decl %gs:0xb7e0
>>> ffffffff8106f431: 00
>>> * ffffffff8106f432: 0f 94 c0 sete %al
>>> * ffffffff8106f435: 84 c0 test %al,%al
>>> * ffffffff8106f437: 75 02 jne ffffffff8106f43b <kick_process+0x4b>
>
> Correction, so this comes from the new x86-specific optimization:
>
> +static __always_inline bool __preempt_count_dec_and_test(void)
> +{
> + unsigned char c;
> +
> + asm ("decl " __percpu_arg(0) "; sete %1"
> + : "+m" (__preempt_count), "=qm" (c));
> +
> + return c != 0;
> +}
>
> And that's where the sete and test originates from.
>
> Couldn't it be improved by merging the preempt_schedule() call into a new
> primitive, keeping the call in the regular flow, or using section tricks
> to move it out of line? The scheduling case is a slowpath in most cases.
>
also.. yuck on using "dec"
"dec" sucks, please use "sub foo ,1" instead
(dec sucks because of its broken flags behavior; it creates basically a bubble in the pipeline)
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 15:29 ` Arjan van de Ven
@ 2013-09-10 15:35 ` Peter Zijlstra
2013-09-10 16:24 ` Linus Torvalds
1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 15:35 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Ingo Molnar, Linus Torvalds, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Frederic Weisbecker,
linux-kernel, linux-arch
On Tue, Sep 10, 2013 at 08:29:16AM -0700, Arjan van de Ven wrote:
> also.. yuck on using "dec" "dec" sucks, please use "sub foo ,1"
> instead (dec sucks because of its broken flags behavior; it creates
> basically a bubble in the pipeline)
Then someone needs to go change:
atomic{,64}_dec*()
local{,64}_dec*()
percpu_add_op()
All of which use decl/incl, percpu_add_op() actually goes out of its way
to generate them.
Also, subl ,1 is 4 bytes larger.
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 15:29 ` Arjan van de Ven
2013-09-10 15:35 ` Peter Zijlstra
@ 2013-09-10 16:24 ` Linus Torvalds
2013-09-11 16:00 ` H. Peter Anvin
1 sibling, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-10 16:24 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Ingo Molnar, Peter Zijlstra, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Tue, Sep 10, 2013 at 8:29 AM, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>
> also.. yuck on using "dec"
> "dec" sucks, please use "sub foo ,1" instead
That's a bigger instruction, largely due to the constant.
> (dec sucks because of its broken flags behavior; it creates basically a
> bubble in the pipeline)
Intel could (and should) just fix that. It's "easy" enough - you just
need to rename the carry flag as a separate register (and make sure
that the conditional instructions that don't use it don't think they
need it).
In fact I thought Intel already did that on their large cores. Are you
sure they still have trouble with inc/dec?
Linus
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 13:56 ` Ingo Molnar
2013-09-10 15:14 ` Peter Zijlstra
2013-09-10 15:29 ` Arjan van de Ven
@ 2013-09-10 16:34 ` Linus Torvalds
2013-09-10 16:45 ` Peter Zijlstra
2 siblings, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-10 16:34 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Tue, Sep 10, 2013 at 6:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> +static __always_inline bool __preempt_count_dec_and_test(void)
> +{
> + unsigned char c;
> +
> + asm ("decl " __percpu_arg(0) "; sete %1"
> + : "+m" (__preempt_count), "=qm" (c));
> +
> + return c != 0;
> +}
>
> And that's where the sete and test originates from.
We could make this use "asm goto" instead.
An "asm goto" cannot have outputs, but this particular one doesn't
_need_ outputs. You could mark the preempt_count memory as an input,
and then have a memory clobber. I think you need the memory clobber
anyway for that preempt-count thing.
So I _think_ something like
static __always_inline bool __preempt_count_dec_and_test(void)
{
asm goto("decl " __percpu_arg(0) "\n\t"
"je %l[became_zero]"
: :"m" (__preempt_count):"memory":became_zero);
return 0;
became_zero:
return 1;
}
would work.
You need to wrap it in
#ifdef CC_HAVE_ASM_GOTO
and then have the old "sete" version for older compilers, but for
newer ones you'd get pretty much perfect code. UNTESTED.
Linus
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 16:34 ` Linus Torvalds
@ 2013-09-10 16:45 ` Peter Zijlstra
2013-09-10 17:06 ` Linus Torvalds
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 16:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Tue, Sep 10, 2013 at 09:34:52AM -0700, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 6:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > +static __always_inline bool __preempt_count_dec_and_test(void)
> > +{
> > + unsigned char c;
> > +
> > + asm ("decl " __percpu_arg(0) "; sete %1"
> > + : "+m" (__preempt_count), "=qm" (c));
> > +
> > + return c != 0;
> > +}
> >
> > And that's where the sete and test originates from.
>
> We could make this use "asm goto" instead.
>
> An "asm goto" cannot have outputs, but this particular one doesn't
> _need_ outputs. You could mark the preempt_count memory as an input,
> and then have a memory clobber. I think you need the memory clobber
> anyway for that preempt-count thing.
>
> So I _think_ something like
>
> static __always_inline bool __preempt_count_dec_and_test(void)
> {
> asm goto("decl " __percpu_arg(0) "\n\t"
> "je %l[became_zero]"
> : :"m" (__preempt_count):"memory":became_zero);
> return 0;
> became_zero:
> return 1;
> }
The usage site:
#define preempt_enable() \
do { \
barrier(); \
if (unlikely(preempt_count_dec_and_test())) \
__preempt_schedule(); \
} while (0)
Already includes the barrier explicitly, so do we still need the memory
clobber in that asm goto thing?
That said, your change results in:
ffffffff8106f420 <kick_process>:
ffffffff8106f420: 55 push %rbp
ffffffff8106f421: 65 ff 04 25 e0 b7 00 incl %gs:0xb7e0
ffffffff8106f428: 00
ffffffff8106f429: 48 89 e5 mov %rsp,%rbp
ffffffff8106f42c: 48 8b 47 08 mov 0x8(%rdi),%rax
ffffffff8106f430: 8b 50 18 mov 0x18(%rax),%edx
ffffffff8106f433: 65 8b 04 25 1c b0 00 mov %gs:0xb01c,%eax
ffffffff8106f43a: 00
ffffffff8106f43b: 39 c2 cmp %eax,%edx
ffffffff8106f43d: 74 1b je ffffffff8106f45a <kick_process+0x3a>
ffffffff8106f43f: 89 d1 mov %edx,%ecx
ffffffff8106f441: 48 c7 c0 00 2c 01 00 mov $0x12c00,%rax
ffffffff8106f448: 48 8b 0c cd a0 bc cb mov -0x7e344360(,%rcx,8),%rcx
ffffffff8106f44f: 81
ffffffff8106f450: 48 3b bc 08 00 08 00 cmp 0x800(%rax,%rcx,1),%rdi
ffffffff8106f457: 00
ffffffff8106f458: 74 26 je ffffffff8106f480 <kick_process+0x60>
* ffffffff8106f45a: 65 ff 0c 25 e0 b7 00 decl %gs:0xb7e0
ffffffff8106f461: 00
* ffffffff8106f462: 74 0c je ffffffff8106f470 <kick_process+0x50>
ffffffff8106f464: 5d pop %rbp
ffffffff8106f465: c3 retq
ffffffff8106f466: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
ffffffff8106f46d: 00 00 00
* ffffffff8106f470: e8 9b b6 f9 ff callq ffffffff8100ab10 <___preempt_schedule>
ffffffff8106f475: 5d pop %rbp
ffffffff8106f476: c3 retq
ffffffff8106f477: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
ffffffff8106f47e: 00 00
ffffffff8106f480: 89 d7 mov %edx,%edi
ffffffff8106f482: ff 15 b8 e0 ba 00 callq *0xbae0b8(%rip) # ffffffff81c1d540 <smp_ops+0x20>
ffffffff8106f488: eb d0 jmp ffffffff8106f45a <kick_process+0x3a>
ffffffff8106f48a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
Which is indeed perfect. So should I go 'fix' the other _and_test()
functions we have, to do this same thing?
* Re: [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation
2013-09-10 13:08 ` [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
2013-09-10 13:27 ` Peter Zijlstra
2013-09-10 14:02 ` Eric Dumazet
@ 2013-09-10 16:48 ` Peter Zijlstra
2 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 16:48 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch
Updated version to include the changes suggested by Eric and the asm
goto magic from Linus.
---
Subject: sched, x86: Provide a per-cpu preempt_count implementation
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Aug 14 14:51:00 CEST 2013
Convert x86 to use a per-cpu preemption count. The reason for doing so
is that accessing per-cpu variables is a lot cheaper than accessing
thread_info variables.
We still need to save/restore the actual preemption count due to
PREEMPT_ACTIVE so we place the per-cpu __preempt_count variable in the
same cache-line as the other hot __switch_to() variables such as
current_task.
Also rename thread_info::preempt_count to ensure nobody is
'accidentally' still poking at it.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/include/asm/Kbuild | 1
arch/x86/include/asm/preempt.h | 114 +++++++++++++++++++++++++++++++++++++
arch/x86/include/asm/thread_info.h | 5 -
arch/x86/kernel/asm-offsets.c | 1
arch/x86/kernel/cpu/common.c | 5 +
arch/x86/kernel/entry_32.S | 7 --
arch/x86/kernel/entry_64.S | 4 -
arch/x86/kernel/process_32.c | 10 +++
arch/x86/kernel/process_64.c | 10 +++
9 files changed, 144 insertions(+), 13 deletions(-)
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -5,4 +5,3 @@ genhdr-y += unistd_64.h
genhdr-y += unistd_x32.h
generic-y += clkdev.h
-generic-y += preempt.h
--- /dev/null
+++ b/arch/x86/include/asm/preempt.h
@@ -0,0 +1,114 @@
+#ifndef __ASM_PREEMPT_H
+#define __ASM_PREEMPT_H
+
+#include <asm/percpu.h>
+#include <linux/thread_info.h>
+
+DECLARE_PER_CPU(int, __preempt_count);
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
+static __always_inline int preempt_count(void)
+{
+ return __this_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline int *preempt_count_ptr(void)
+{
+ return &__raw_get_cpu_var(__preempt_count);
+}
+
+/*
+ * must be macros to avoid header recursion hell
+ */
+#define task_preempt_count(p) \
+ (task_thread_info(p)->saved_preempt_count & ~PREEMPT_NEED_RESCHED)
+
+#define init_task_preempt_count(p) do { \
+ task_thread_info(p)->saved_preempt_count = 1 | PREEMPT_NEED_RESCHED; \
+} while (0)
+
+#define init_idle_preempt_count(p, cpu) do { \
+ task_thread_info(p)->saved_preempt_count = 0; \
+ per_cpu(__preempt_count, (cpu)) = 0; \
+} while (0)
+
+/*
+ * We fold the NEED_RESCHED bit into the preempt count such that
+ * preempt_enable() can decrement and test for needing to reschedule with a
+ * single instruction.
+ *
+ * We invert the actual bit, so that when the decrement hits 0 we know we both
+ * need to resched (the bit is cleared) and can resched (no preempt count).
+ */
+
+static __always_inline void set_preempt_need_resched(void)
+{
+ __this_cpu_and_4(__preempt_count, ~PREEMPT_NEED_RESCHED);
+}
+
+static __always_inline void clear_preempt_need_resched(void)
+{
+ __this_cpu_or_4(__preempt_count, PREEMPT_NEED_RESCHED);
+}
+
+static __always_inline bool test_preempt_need_resched(void)
+{
+ return !(__this_cpu_read_4(__preempt_count) & PREEMPT_NEED_RESCHED);
+}
+
+/*
+ * The various preempt_count add/sub methods
+ */
+
+static __always_inline void __preempt_count_add(int val)
+{
+ __this_cpu_add_4(__preempt_count, val);
+}
+
+static __always_inline void __preempt_count_sub(int val)
+{
+ __this_cpu_add_4(__preempt_count, -val);
+}
+
+#ifdef CC_HAVE_ASM_GOTO
+static __always_inline bool __preempt_count_dec_and_test(void)
+{
+ asm goto("decl " __percpu_arg(0) "\n\t"
+ "je %l[became_zero]"
+ : :"m" (__preempt_count):"memory":became_zero);
+ return 0;
+became_zero:
+ return 1;
+}
+#else
+static __always_inline bool __preempt_count_dec_and_test(void)
+{
+ unsigned char c;
+
+ asm ("decl " __percpu_arg(0) "; sete %1"
+ : "+m" (__preempt_count), "=qm" (c));
+
+ return c != 0;
+}
+#endif
+
+/*
+ * Returns true when we need to resched -- even if we can not.
+ */
+static __always_inline bool need_resched(void)
+{
+ return unlikely(test_preempt_need_resched());
+}
+
+/*
+ * Returns true when we need to resched and can (barring IRQ state).
+ */
+static __always_inline bool should_resched(void)
+{
+ return unlikely(!__this_cpu_read_4(__preempt_count));
+}
+
+#endif /* __ASM_PREEMPT_H */
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -28,8 +28,7 @@ struct thread_info {
__u32 flags; /* low level flags */
__u32 status; /* thread synchronous flags */
__u32 cpu; /* current CPU */
- int preempt_count; /* 0 => preemptable,
- <0 => BUG */
+ int saved_preempt_count;
mm_segment_t addr_limit;
struct restart_block restart_block;
void __user *sysenter_return;
@@ -49,7 +48,7 @@ struct thread_info {
.exec_domain = &default_exec_domain, \
.flags = 0, \
.cpu = 0, \
- .preempt_count = INIT_PREEMPT_COUNT, \
+ .saved_preempt_count = INIT_PREEMPT_COUNT, \
.addr_limit = KERNEL_DS, \
.restart_block = { \
.fn = do_no_restart_syscall, \
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -32,7 +32,6 @@ void common(void) {
OFFSET(TI_flags, thread_info, flags);
OFFSET(TI_status, thread_info, status);
OFFSET(TI_addr_limit, thread_info, addr_limit);
- OFFSET(TI_preempt_count, thread_info, preempt_count);
BLANK();
OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1095,6 +1095,9 @@ DEFINE_PER_CPU(char *, irq_stack_ptr) =
DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
+DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
+EXPORT_PER_CPU_SYMBOL(__preempt_count);
+
DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
/*
@@ -1169,6 +1172,8 @@ void debug_stack_reset(void)
DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
EXPORT_PER_CPU_SYMBOL(current_task);
+DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
+EXPORT_PER_CPU_SYMBOL(__preempt_count);
DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
#ifdef CONFIG_CC_STACKPROTECTOR
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -362,12 +362,9 @@ END(ret_from_exception)
#ifdef CONFIG_PREEMPT
ENTRY(resume_kernel)
DISABLE_INTERRUPTS(CLBR_ANY)
- cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ?
- jnz restore_all
need_resched:
- movl TI_flags(%ebp), %ecx # need_resched set ?
- testb $_TIF_NEED_RESCHED, %cl
- jz restore_all
+ cmpl $0,PER_CPU_VAR(__preempt_count)
+ jnz restore_all
testl $X86_EFLAGS_IF,PT_EFLAGS(%esp) # interrupts off (exception path) ?
jz restore_all
call preempt_schedule_irq
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1103,10 +1103,8 @@ ENTRY(native_iret)
/* Returning to kernel space. Check if we need preemption */
/* rcx: threadinfo. interrupts off. */
ENTRY(retint_kernel)
- cmpl $0,TI_preempt_count(%rcx)
+ cmpl $0,PER_CPU_VAR(__preempt_count)
jnz retint_restore_args
- bt $TIF_NEED_RESCHED,TI_flags(%rcx)
- jnc retint_restore_args
bt $9,EFLAGS-ARGOFFSET(%rsp) /* interrupts off? */
jnc retint_restore_args
call preempt_schedule_irq
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -291,6 +291,16 @@ __switch_to(struct task_struct *prev_p,
if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
set_iopl_mask(next->iopl);
+#ifdef CONFIG_PREEMPT_COUNT
+ /*
+ * If it were not for PREEMPT_ACTIVE we could guarantee that the
+ * preempt_count of all tasks was equal here and this would not be
+ * needed.
+ */
+ task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count);
+ this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
+#endif
+
/*
* Now maybe handle debug registers and/or IO bitmaps
*/
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -363,6 +363,16 @@ __switch_to(struct task_struct *prev_p,
this_cpu_write(old_rsp, next->usersp);
this_cpu_write(current_task, next_p);
+#ifdef CONFIG_PREEMPT_COUNT
+ /*
+ * If it were not for PREEMPT_ACTIVE we could guarantee that the
+ * preempt_count of all tasks was equal here and this would not be
+ * needed.
+ */
+ task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count);
+ this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
+#endif
+
this_cpu_write(kernel_stack,
(unsigned long)task_stack_page(next_p) +
THREAD_SIZE - KERNEL_STACK_OFFSET);
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 16:45 ` Peter Zijlstra
@ 2013-09-10 17:06 ` Linus Torvalds
2013-09-10 21:25 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-10 17:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Tue, Sep 10, 2013 at 9:45 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> The usage site:
>
> #define preempt_enable() \
> do { \
> barrier(); \
> if (unlikely(preempt_count_dec_and_test())) \
> __preempt_schedule(); \
> } while (0)
>
> Already includes the barrier explicitly, so do we still need the memory
> clobber in that asm goto thing?
Yeah, you do to be safe, just to let gcc know that it may be changing
the memory location.
The issue is that because an "asm goto" cannot have outputs, I had to
change the (correct) "+m" input/output into just a "m" (input).
So without the memory clobber, gcc might decide that the thing doesn't
actually change the preempt count, and perhaps move a load of that
across the "asm goto".
Admittedly, that does sound almost impossibly unlikely, but I'd be
happier being careful.
> That said, your change results in:
>
> * ffffffff8106f45a: 65 ff 0c 25 e0 b7 00 decl %gs:0xb7e0
> ffffffff8106f461: 00
> * ffffffff8106f462: 74 0c je ffffffff8106f470 <kick_process+0x50>
...
> Which is indeed perfect. So should I go 'fix' the other _and_test()
> functions we have to do this same thing?
It would be a good thing to test. There might be downsides with "asm
goto" (maybe it limits gcc some way), but it does in general save us
not only two instructions but also a register.
Linus
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 17:06 ` Linus Torvalds
@ 2013-09-10 21:25 ` Peter Zijlstra
2013-09-10 21:43 ` Linus Torvalds
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-10 21:25 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Tue, Sep 10, 2013 at 10:06:44AM -0700, Linus Torvalds wrote:
> > Which is indeed perfect. So should I go 'fix' the other _and_test()
> > functions we have to do this same thing?
>
> It would be a good thing to test. There might be downsides with "asm
> goto" (maybe it limits gcc some way), but it does in general save us
> not only two instructions but also a register.
Here's one that builds and boots on kvm until wanting to mount root.
I'm not entirely sure on the "ir" vs "er" thing; atomic64_t and
local_t are inconsistent wrt that, so I'm inconsistent too.
I'll try running on actual hardware tomorrow sometime.
---
arch/x86/include/asm/atomic.h | 52 +++++++++++++++++++++++++++++++++++
arch/x86/include/asm/atomic64_64.h | 54 ++++++++++++++++++++++++++++++++++++-
arch/x86/include/asm/local.h | 52 +++++++++++++++++++++++++++++++++++
3 files changed, 157 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -74,6 +74,18 @@ static inline void atomic_sub(int i, ato
* true if the result is zero, or false for all
* other cases.
*/
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic_sub_and_test(int i, atomic_t *v)
+{
+ asm volatile goto(LOCK_PREFIX "subl %1,%0;"
+ "je %l[became_zero]"
+ : : "m" (v->counter), "ir" (i)
+ : "memory" : became_zero);
+ return 0;
+became_zero:
+ return 1;
+}
+#else
static inline int atomic_sub_and_test(int i, atomic_t *v)
{
unsigned char c;
@@ -83,6 +95,7 @@ static inline int atomic_sub_and_test(in
: "ir" (i) : "memory");
return c;
}
+#endif
/**
* atomic_inc - increment atomic variable
@@ -116,6 +129,18 @@ static inline void atomic_dec(atomic_t *
* returns true if the result is 0, or false for all other
* cases.
*/
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic_dec_and_test(atomic_t *v)
+{
+ asm volatile goto(LOCK_PREFIX "decl %0;"
+ "je %l[became_zero]"
+ : : "m" (v->counter)
+ : "memory" : became_zero);
+ return 0;
+became_zero:
+ return 1;
+}
+#else
static inline int atomic_dec_and_test(atomic_t *v)
{
unsigned char c;
@@ -125,6 +150,7 @@ static inline int atomic_dec_and_test(at
: : "memory");
return c != 0;
}
+#endif
/**
* atomic_inc_and_test - increment and test
@@ -134,6 +160,18 @@ static inline int atomic_dec_and_test(at
* and returns true if the result is zero, or false for all
* other cases.
*/
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic_inc_and_test(atomic_t *v)
+{
+ asm volatile goto(LOCK_PREFIX "incl %0;"
+ "je %l[became_zero]"
+ : : "m" (v->counter)
+ : "memory" : became_zero);
+ return 0;
+became_zero:
+ return 1;
+}
+#else
static inline int atomic_inc_and_test(atomic_t *v)
{
unsigned char c;
@@ -143,6 +181,7 @@ static inline int atomic_inc_and_test(at
: : "memory");
return c != 0;
}
+#endif
/**
* atomic_add_negative - add and test if negative
@@ -153,6 +192,18 @@ static inline int atomic_inc_and_test(at
* if the result is negative, or false when
* result is greater than or equal to zero.
*/
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic_add_negative(int i, atomic_t *v)
+{
+ asm volatile goto(LOCK_PREFIX "addl %1,%0;"
+ "js %l[became_neg]"
+ : : "m" (v->counter), "ir" (i)
+ : "memory" : became_neg);
+ return 0;
+became_neg:
+ return 1;
+}
+#else
static inline int atomic_add_negative(int i, atomic_t *v)
{
unsigned char c;
@@ -162,6 +213,7 @@ static inline int atomic_add_negative(in
: "ir" (i) : "memory");
return c;
}
+#endif
/**
* atomic_add_return - add integer and return
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -70,6 +70,18 @@ static inline void atomic64_sub(long i,
* true if the result is zero, or false for all
* other cases.
*/
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic64_sub_and_test(long i, atomic64_t *v)
+{
+ asm volatile goto(LOCK_PREFIX "subq %1,%0;"
+ "je %l[became_zero]"
+ : : "m" (v->counter), "er" (i)
+ : "memory" : became_zero);
+ return 0;
+became_zero:
+ return 1;
+}
+#else
static inline int atomic64_sub_and_test(long i, atomic64_t *v)
{
unsigned char c;
@@ -79,6 +91,7 @@ static inline int atomic64_sub_and_test(
: "er" (i), "m" (v->counter) : "memory");
return c;
}
+#endif
/**
* atomic64_inc - increment atomic64 variable
@@ -114,6 +127,18 @@ static inline void atomic64_dec(atomic64
* returns true if the result is 0, or false for all other
* cases.
*/
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic64_dec_and_test(atomic64_t *v)
+{
+ asm volatile goto(LOCK_PREFIX "decq %0;"
+ "je %l[became_zero]"
+ : : "m" (v->counter)
+ : "memory" : became_zero);
+ return 0;
+became_zero:
+ return 1;
+}
+#else
static inline int atomic64_dec_and_test(atomic64_t *v)
{
unsigned char c;
@@ -123,6 +148,7 @@ static inline int atomic64_dec_and_test(
: "m" (v->counter) : "memory");
return c != 0;
}
+#endif
/**
* atomic64_inc_and_test - increment and test
@@ -132,6 +158,18 @@ static inline int atomic64_dec_and_test(
* and returns true if the result is zero, or false for all
* other cases.
*/
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic64_inc_and_test(atomic64_t *v)
+{
+ asm volatile goto(LOCK_PREFIX "incq %0;"
+ "je %l[became_zero]"
+ : : "m" (v->counter)
+ : "memory" : became_zero);
+ return 0;
+became_zero:
+ return 1;
+}
+#else
static inline int atomic64_inc_and_test(atomic64_t *v)
{
unsigned char c;
@@ -141,6 +179,7 @@ static inline int atomic64_inc_and_test(
: "m" (v->counter) : "memory");
return c != 0;
}
+#endif
/**
* atomic64_add_negative - add and test if negative
@@ -151,6 +190,18 @@ static inline int atomic64_inc_and_test(
* if the result is negative, or false when
* result is greater than or equal to zero.
*/
+#ifdef CC_HAVE_ASM_GOTO
+static inline int atomic64_add_negative(long i, atomic64_t *v)
+{
+ asm volatile goto(LOCK_PREFIX "addq %1,%0;"
+ "js %l[became_neg]"
+ : : "m" (v->counter), "er" (i)
+ : "memory" : became_neg);
+ return 0;
+became_neg:
+ return 1;
+}
+#else
static inline int atomic64_add_negative(long i, atomic64_t *v)
{
unsigned char c;
@@ -160,6 +211,7 @@ static inline int atomic64_add_negative(
: "er" (i), "m" (v->counter) : "memory");
return c;
}
+#endif
/**
* atomic64_add_return - add and return
@@ -219,7 +271,7 @@ static inline int atomic64_add_unless(at
/*
* atomic64_dec_if_positive - decrement by 1 if old value positive
- * @v: pointer of type atomic_t
+ * @v: pointer of type atomic64_t
*
* The function returns the old value of *v minus 1, even if
* the atomic variable, v, was not decremented.
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -50,6 +50,18 @@ static inline void local_sub(long i, loc
* true if the result is zero, or false for all
* other cases.
*/
+#ifdef CC_HAVE_ASM_GOTO
+static inline int local_sub_and_test(long i, local_t *l)
+{
+ asm volatile goto(_ASM_SUB " %1,%0;"
+ "je %l[became_zero]"
+ : : "m" (l->a.counter), "ir" (i)
+ : "memory" : became_zero);
+ return 0;
+became_zero:
+ return 1;
+}
+#else
static inline int local_sub_and_test(long i, local_t *l)
{
unsigned char c;
@@ -59,6 +71,7 @@ static inline int local_sub_and_test(lon
: "ir" (i) : "memory");
return c;
}
+#endif
/**
* local_dec_and_test - decrement and test
@@ -68,6 +81,18 @@ static inline int local_sub_and_test(lon
* returns true if the result is 0, or false for all other
* cases.
*/
+#ifdef CC_HAVE_ASM_GOTO
+static inline int local_dec_and_test(local_t *l)
+{
+ asm volatile goto(_ASM_DEC " %0;"
+ "je %l[became_zero]"
+ : : "m" (l->a.counter)
+ : "memory" : became_zero);
+ return 0;
+became_zero:
+ return 1;
+}
+#else
static inline int local_dec_and_test(local_t *l)
{
unsigned char c;
@@ -77,6 +102,7 @@ static inline int local_dec_and_test(loc
: : "memory");
return c != 0;
}
+#endif
/**
* local_inc_and_test - increment and test
@@ -86,6 +112,18 @@ static inline int local_dec_and_test(loc
* and returns true if the result is zero, or false for all
* other cases.
*/
+#ifdef CC_HAVE_ASM_GOTO
+static inline int local_inc_and_test(local_t *l)
+{
+ asm volatile goto(_ASM_INC " %0;"
+ "je %l[became_zero]"
+ : : "m" (l->a.counter)
+ : "memory" : became_zero);
+ return 0;
+became_zero:
+ return 1;
+}
+#else
static inline int local_inc_and_test(local_t *l)
{
unsigned char c;
@@ -95,6 +133,7 @@ static inline int local_inc_and_test(loc
: : "memory");
return c != 0;
}
+#endif
/**
* local_add_negative - add and test if negative
@@ -105,6 +144,18 @@ static inline int local_inc_and_test(loc
* if the result is negative, or false when
* result is greater than or equal to zero.
*/
+#ifdef CC_HAVE_ASM_GOTO
+static inline int local_add_negative(long i, local_t *l)
+{
+ asm volatile goto(_ASM_ADD " %1,%0;"
+ "js %l[became_neg]"
+ : : "m" (l->a.counter), "ir" (i)
+ : "memory" : became_neg);
+ return 0;
+became_neg:
+ return 1;
+}
+#else
static inline int local_add_negative(long i, local_t *l)
{
unsigned char c;
@@ -114,6 +165,7 @@ static inline int local_add_negative(lon
: "ir" (i) : "memory");
return c;
}
+#endif
/**
* local_add_return - add and return
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 21:25 ` Peter Zijlstra
@ 2013-09-10 21:43 ` Linus Torvalds
2013-09-10 21:51 ` H. Peter Anvin
2013-09-11 13:13 ` Peter Zijlstra
0 siblings, 2 replies; 52+ messages in thread
From: Linus Torvalds @ 2013-09-10 21:43 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Tue, Sep 10, 2013 at 2:25 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Here's one that builds and boots on kvm until wanting to mount root.
>
> I'm not entirely sure on the "ir" vs "er" thing; atomic64_t and
> local_t are inconsistent wrt that, so I'm inconsistent too.
"i" is "any constant", while "e" is "32-bit signed constant".
And I think all of the 64-bit ones should probably be "e", because
afaik there is no way to add a 64-bit constant directly to memory (you
have to load it into a register first).
Of course, in reality, the constant is always just 1 or -1 or
something like that, so nobody will ever notice the incorrect case...
And it doesn't matter for the 32-bit cases, obviously, but we could
just make them all be "e" for simplicity.
That said, looking at your patch, I get the *very* strong feeling that
we could make a macro that does all the repetitions for us, and then
have a
GENERATE_RMW(atomic_sub_and_test, LOCK_PREFIX "subl", "e", "")
GENERATE_RMW(atomic_dec_and_test, LOCK_PREFIX "decl", "e", "")
..
GENERATE_RMW(atomic_add_negative, LOCK_PREFIX "addl", "s", "")
GENERATE_RMW(local_sub_and_test, "subl", "e", __percpu_prefix)
...
etc.
I'm sure the macro would be nasty as hell (and I bet it needs a few
more arguments), but then we'd avoid the repetition..
Linus
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 21:43 ` Linus Torvalds
@ 2013-09-10 21:51 ` H. Peter Anvin
2013-09-10 22:02 ` Linus Torvalds
2013-09-11 13:13 ` Peter Zijlstra
1 sibling, 1 reply; 52+ messages in thread
From: H. Peter Anvin @ 2013-09-10 21:51 UTC (permalink / raw)
To: Linus Torvalds
Cc: Peter Zijlstra, Ingo Molnar, Andi Kleen, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On 09/10/2013 02:43 PM, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 2:25 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> Here's one that builds and boots on kvm until wanting to mount root.
>>
>> I'm not entirely sure on the "ir" vs "er" thing; atomic64_t and
>> local_t are inconsistent wrt that, so I'm inconsistent too.
>
> "i" is "any constant", while "e" is "32-bit signed constant".
>
> And I think all of the 64-bit ones should probably be "e", because
> afaik there is no way to add a 64-bit constant directly to memory (you
> have to load it into a register first).
>
> Of course, in reality, the constant is always just 1 or -1 or
> something like that, so nobody will ever notice the incorrect case...
>
> And it doesn't matter for the 32-bit cases, obviously, but we could
> just make them all be "e" for simplicity.
>
> That said, looking at your patch, I get the *very* strong feeling that
> we could make a macro that does all the repetitions for us, and then
> have a
>
> GENERATE_RMW(atomic_sub_and_test, LOCK_PREFIX "subl", "e", "")
> GENERATE_RMW(atomic_dec_and_test, LOCK_PREFIX "decl", "e", "")
> ..
> GENERATE_RMW(atomic_add_negative, LOCK_PREFIX "addl", "s", "")
>
> GENERATE_RMW(local_sub_and_test, "subl", "e", __percpu_prefix)
> ...
>
> etc.
>
> I'm sure the macro would be nasty as hell (and I bet it needs a few
> more arguments), but then we'd avoid the repetition..
>
Actually, the right thing here really is "er" (which I think you meant,
but just to make it clear.) Why? Even if the value is representable as
a signed immediate, if gcc already happens to have it in a register it
will be better to use the register.
"e" doesn't work on versions of gcc older than the first x86-64 release,
but we don't care about that anymore.
A final good question is if we should encapsulate the add/inc and
sub/dec into a single function; one could easily do something like:
static inline int atomic_sub_and_test(int i, atomic_t *v)
{
	if (__builtin_constant_p(i) && i == 1)
		/* Use decl */
	else if (__builtin_constant_p(i) && i == -1)
		/* Use incl */
	else
		/* Use subl */
}
-hpa
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 21:51 ` H. Peter Anvin
@ 2013-09-10 22:02 ` Linus Torvalds
2013-09-10 22:06 ` H. Peter Anvin
0 siblings, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-10 22:02 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Peter Zijlstra, Ingo Molnar, Andi Kleen, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Tue, Sep 10, 2013 at 2:51 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/10/2013 02:43 PM, Linus Torvalds wrote:
>
> Actually, the right thing here really is "er" (which I think you meant,
> but just to make it clear.)
Yes, I was just answering the i-vs-e confusion.
> "e" doesn't work on versions of gcc older than the first x86-64 release,
> but we don't care about that anymore.
Indeed.
> A final good question is if we should encapsulate the add/inc and
> sub/dec into a single function; one could easily do something like:
Yes. However, I would do that at a higher level than the one that
builds the actual functions.
That said, there's a few cases where you might want to specify
add-vs-sub explicitly, but they are rather odd, namely the fact that
"-128" fits in a byte, but "128" does not.
So it can be better to add 128 by doing a "subl $-128" than by doing
an "add $128".
But we probably don't have any situation where we care about that
special value of "128". I've seen the trick, though.
Linus
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 22:02 ` Linus Torvalds
@ 2013-09-10 22:06 ` H. Peter Anvin
0 siblings, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2013-09-10 22:06 UTC (permalink / raw)
To: Linus Torvalds
Cc: Peter Zijlstra, Ingo Molnar, Andi Kleen, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On 09/10/2013 03:02 PM, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 2:51 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 09/10/2013 02:43 PM, Linus Torvalds wrote:
>>
>> Actually, the right thing here really is "er" (which I think you meant,
>> but just to make it clear.)
>
> Yes, I was just answering the i-vs-e confusion.
>
>> "e" doesn't work on versions of gcc older than the first x86-64 release,
>> but we don't care about that anymore.
>
> Indeed.
>
>> A final good question is if we should encapsulate the add/inc and
>> sub/dec into a single function; one could easily do something like:
>
> Yes. However, I would do that at a higher level than the one that
> builds the actual functions.
>
> That said, there's a few cases where you might want to specify
> add-vs-sub explicitly, but they are rather odd, namely the fact that
> "-128" fits in a byte, but "128" does not.
>
> So it can be better to add 128 by doing a "subl $-128" than by doing
> an "add $128".
>
> But we probably don't have any situation where we care about that
> special value of "128". I've seen the trick, though.
>
Yes, and if __builtin_constant_p() we could even do it explicitly.
Unfortunately I don't think gcc allows alternatives in asm() statements,
unlike in its own pattern tables.
-hpa
* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
2013-09-10 13:08 ` [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
@ 2013-09-11 1:59 ` Andy Lutomirski
2013-09-11 8:25 ` Peter Zijlstra
2013-09-11 11:14 ` Peter Zijlstra
1 sibling, 1 reply; 52+ messages in thread
From: Andy Lutomirski @ 2013-09-11 1:59 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
Frederic Weisbecker, linux-kernel, linux-arch
On 09/10/2013 06:08 AM, Peter Zijlstra wrote:
> In order to combine the preemption and need_resched test we need to
> fold the need_resched information into the preempt_count value.
>
> We keep the existing TIF_NEED_RESCHED infrastructure in place but at 3
> sites test it and fold its value into preempt_count; namely:
>
> - resched_task() when setting TIF_NEED_RESCHED on the current task
> - scheduler_ipi() when resched_task() sets TIF_NEED_RESCHED on a
> remote task it follows it up with a reschedule IPI
> and we can modify the cpu local preempt_count from
> there.
> - cpu_idle_loop() for when resched_task() found tsk_is_polling().
It looks like the intel_idle code can get confused if TIF_NEED_RESCHED
is set but the preempt resched bit is not -- the need_resched call
between monitor and mwait won't notice TIF_NEED_RESCHED.
Is this condition possible?
--Andy
* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
2013-09-11 1:59 ` Andy Lutomirski
@ 2013-09-11 8:25 ` Peter Zijlstra
2013-09-11 11:06 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 8:25 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
Frederic Weisbecker, linux-kernel, linux-arch
On Tue, Sep 10, 2013 at 06:59:57PM -0700, Andy Lutomirski wrote:
> On 09/10/2013 06:08 AM, Peter Zijlstra wrote:
> > In order to combine the preemption and need_resched test we need to
> > fold the need_resched information into the preempt_count value.
> >
> > We keep the existing TIF_NEED_RESCHED infrastructure in place but at 3
> > sites test it and fold its value into preempt_count; namely:
> >
> > - resched_task() when setting TIF_NEED_RESCHED on the current task
> > - scheduler_ipi() when resched_task() sets TIF_NEED_RESCHED on a
> > remote task it follows it up with a reschedule IPI
> > and we can modify the cpu local preempt_count from
> > there.
> > - cpu_idle_loop() for when resched_task() found tsk_is_polling().
>
>
> It looks like the intel_idle code can get confused if TIF_NEED_RESCHED
> is set but the preempt resched bit is not -- the need_resched call
> between monitor and mwait won't notice TIF_NEED_RESCHED.
>
> Is this condition possible?
Ah indeed, I'll have to go find all idle loops out there it seems. Now
if only they were easy to spot :/
I was hoping the generic idle thing was good enough, apparently not
quite. Thanks for spotting that.
* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
2013-09-11 8:25 ` Peter Zijlstra
@ 2013-09-11 11:06 ` Peter Zijlstra
2013-09-11 13:34 ` Mike Galbraith
2013-09-11 16:35 ` Andy Lutomirski
0 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 11:06 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
Frederic Weisbecker, linux-kernel, linux-arch, Len Brown,
Vaidyanathan Srinivasan
On Wed, Sep 11, 2013 at 10:25:30AM +0200, Peter Zijlstra wrote:
> On Tue, Sep 10, 2013 at 06:59:57PM -0700, Andy Lutomirski wrote:
> > It looks like the intel_idle code can get confused if TIF_NEED_RESCHED
> > is set but the preempt resched bit is not -- the need_resched call
> > between monitor and mwait won't notice TIF_NEED_RESCHED.
> >
> > Is this condition possible?
>
> Ah indeed, I'll have to go find all idle loops out there it seems. Now
> if only they were easy to spot :/
>
> I was hoping the generic idle thing was good enough, apparently not
> quite. Thanks for spotting that.
OK, and the reason I didn't notice is that the entire TS_POLLING thing
is completely wrecked so we'll get the interrupt anyway.
The below might fix it.. it boots, but then it already did so that
doesn't say much.
Mike, does it fix the funny you saw?
---
Subject: sched, idle: Fix the idle polling state logic
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Sep 11 12:43:13 CEST 2013
Mike reported that commit 7d1a9417 ("x86: Use generic idle loop")
regressed several workloads and caused excessive reschedule
interrupts.
The patch in question failed to notice that the x86 code had an
inverted sense of the polling state versus the new generic code (x86:
default polling, generic: default !polling).
Fix the two prominent x86 mwait based idle drivers and introduce a few
new generic polling helpers (fixing the wrong smp_mb__after_clear_bit
usage).
Also switch the idle routines to using test_need_resched() which is an
immediate TIF_NEED_RESCHED test as opposed to need_resched which will
end up being slightly different.
Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/kernel/process.c | 6 +--
drivers/acpi/processor_idle.c | 46 +++++-------------------
drivers/idle/intel_idle.c | 2 -
include/linux/sched.h | 78 ++++++++++++++++++++++++++++++++++++++----
kernel/cpu/idle.c | 9 ++--
5 files changed, 89 insertions(+), 52 deletions(-)
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -391,9 +391,9 @@ static void amd_e400_idle(void)
* The switch back from broadcast mode needs to be
* called with interrupts disabled.
*/
- local_irq_disable();
- clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
- local_irq_enable();
+ local_irq_disable();
+ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
+ local_irq_enable();
} else
default_idle();
}
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -119,17 +119,10 @@ static struct dmi_system_id processor_po
*/
static void acpi_safe_halt(void)
{
- current_thread_info()->status &= ~TS_POLLING;
- /*
- * TS_POLLING-cleared state must be visible before we
- * test NEED_RESCHED:
- */
- smp_mb();
- if (!need_resched()) {
+ if (!test_need_resched()) {
safe_halt();
local_irq_disable();
}
- current_thread_info()->status |= TS_POLLING;
}
#ifdef ARCH_APICTIMER_STOPS_ON_C3
@@ -737,6 +730,11 @@ static int acpi_idle_enter_c1(struct cpu
if (unlikely(!pr))
return -EINVAL;
+ if (cx->entry_method == ACPI_CSTATE_FFH) {
+ if (current_set_polling_and_test())
+ return -EINVAL;
+ }
+
lapic_timer_state_broadcast(pr, cx, 1);
acpi_idle_do_entry(cx);
@@ -790,18 +788,9 @@ static int acpi_idle_enter_simple(struct
if (unlikely(!pr))
return -EINVAL;
- if (cx->entry_method != ACPI_CSTATE_FFH) {
- current_thread_info()->status &= ~TS_POLLING;
- /*
- * TS_POLLING-cleared state must be visible before we test
- * NEED_RESCHED:
- */
- smp_mb();
-
- if (unlikely(need_resched())) {
- current_thread_info()->status |= TS_POLLING;
+ if (cx->entry_method == ACPI_CSTATE_FFH) {
+ if (current_set_polling_and_test())
return -EINVAL;
- }
}
/*
@@ -819,9 +808,6 @@ static int acpi_idle_enter_simple(struct
sched_clock_idle_wakeup_event(0);
- if (cx->entry_method != ACPI_CSTATE_FFH)
- current_thread_info()->status |= TS_POLLING;
-
lapic_timer_state_broadcast(pr, cx, 0);
return index;
}
@@ -858,18 +844,9 @@ static int acpi_idle_enter_bm(struct cpu
}
}
- if (cx->entry_method != ACPI_CSTATE_FFH) {
- current_thread_info()->status &= ~TS_POLLING;
- /*
- * TS_POLLING-cleared state must be visible before we test
- * NEED_RESCHED:
- */
- smp_mb();
-
- if (unlikely(need_resched())) {
- current_thread_info()->status |= TS_POLLING;
+ if (cx->entry_method == ACPI_CSTATE_FFH) {
+ if (current_set_polling_and_test())
return -EINVAL;
- }
}
acpi_unlazy_tlb(smp_processor_id());
@@ -915,9 +892,6 @@ static int acpi_idle_enter_bm(struct cpu
sched_clock_idle_wakeup_event(0);
- if (cx->entry_method != ACPI_CSTATE_FFH)
- current_thread_info()->status |= TS_POLLING;
-
lapic_timer_state_broadcast(pr, cx, 0);
return index;
}
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -359,7 +359,7 @@ static int intel_idle(struct cpuidle_dev
if (!(lapic_timer_reliable_states & (1 << (cstate))))
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
- if (!need_resched()) {
+ if (!current_set_polling_and_test()) {
__monitor((void *)&current_thread_info()->flags, 0, 0);
smp_mb();
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2471,34 +2471,98 @@ static inline int tsk_is_polling(struct
{
return task_thread_info(p)->status & TS_POLLING;
}
-static inline void current_set_polling(void)
+static inline void __current_set_polling(void)
{
current_thread_info()->status |= TS_POLLING;
}
-static inline void current_clr_polling(void)
+static inline bool __must_check current_set_polling_and_test(void)
+{
+ __current_set_polling();
+
+ /*
+ * Polling state must be visible before we test NEED_RESCHED,
+ * paired by resched_task()
+ */
+ smp_mb();
+
+ return unlikely(test_need_resched());
+}
+
+static inline void __current_clr_polling(void)
{
current_thread_info()->status &= ~TS_POLLING;
- smp_mb__after_clear_bit();
+}
+
+static inline bool __must_check current_clr_polling_and_test(void)
+{
+ __current_clr_polling();
+
+ /*
+ * Polling state must be visible before we test NEED_RESCHED,
+ * paired by resched_task()
+ */
+ smp_mb();
+
+ return unlikely(test_need_resched());
}
#elif defined(TIF_POLLING_NRFLAG)
static inline int tsk_is_polling(struct task_struct *p)
{
return test_tsk_thread_flag(p, TIF_POLLING_NRFLAG);
}
-static inline void current_set_polling(void)
+
+static inline void __current_set_polling(void)
{
set_thread_flag(TIF_POLLING_NRFLAG);
}
-static inline void current_clr_polling(void)
+static inline bool __must_check current_set_polling_and_test(void)
+{
+ __current_set_polling();
+
+ /*
+ * Polling state must be visible before we test NEED_RESCHED,
+ * paired by resched_task()
+ *
+ * XXX: assumes set/clear bit are identical barrier wise.
+ */
+ smp_mb__after_clear_bit();
+
+ return unlikely(test_need_resched());
+}
+
+static inline void __current_clr_polling(void)
{
clear_thread_flag(TIF_POLLING_NRFLAG);
}
+
+static inline bool __must_check current_clr_polling_and_test(void)
+{
+ __current_clr_polling();
+
+ /*
+ * Polling state must be visible before we test NEED_RESCHED,
+ * paired by resched_task()
+ */
+ smp_mb__after_clear_bit();
+
+ return unlikely(test_need_resched());
+}
+
#else
static inline int tsk_is_polling(struct task_struct *p) { return 0; }
-static inline void current_set_polling(void) { }
-static inline void current_clr_polling(void) { }
+static inline void __current_set_polling(void) { }
+static inline void __current_clr_polling(void) { }
+
+static inline bool __must_check current_set_polling_and_test(void)
+{
+ return unlikely(test_need_resched());
+}
+static inline bool __must_check current_clr_polling_and_test(void)
+{
+ return unlikely(test_need_resched());
+}
#endif
/*
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -44,7 +44,7 @@ static inline int cpu_idle_poll(void)
rcu_idle_enter();
trace_cpu_idle_rcuidle(0, smp_processor_id());
local_irq_enable();
- while (!need_resched())
+ while (!test_need_resched())
cpu_relax();
trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
rcu_idle_exit();
@@ -92,8 +92,7 @@ static void cpu_idle_loop(void)
if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
cpu_idle_poll();
} else {
- current_clr_polling();
- if (!need_resched()) {
+ if (!current_clr_polling_and_test()) {
stop_critical_timings();
rcu_idle_enter();
arch_cpu_idle();
@@ -103,7 +102,7 @@ static void cpu_idle_loop(void)
} else {
local_irq_enable();
}
- current_set_polling();
+ __current_set_polling();
}
arch_cpu_idle_exit();
/*
@@ -136,7 +135,7 @@ void cpu_startup_entry(enum cpuhp_state
*/
boot_init_stack_canary();
#endif
- current_set_polling();
+ __current_set_polling();
arch_cpu_idle_prepare();
cpu_idle_loop();
}
* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
2013-09-10 13:08 ` [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
2013-09-11 1:59 ` Andy Lutomirski
@ 2013-09-11 11:14 ` Peter Zijlstra
1 sibling, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 11:14 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Andi Kleen, Peter Anvin, Mike Galbraith, Thomas Gleixner,
Arjan van de Ven, Frederic Weisbecker, linux-kernel, linux-arch
Prompted by a question from Mike, I updated the Changelog to explain why
we need to keep TIF_NEED_RESCHED.
---
Subject: sched: Add NEED_RESCHED to the preempt_count
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Aug 14 14:55:31 CEST 2013
In order to combine the preemption and need_resched test we need to
fold the need_resched information into the preempt_count value.
Since the NEED_RESCHED flag can be set from another CPU, folding it in
would need to be an atomic operation; however, we very much want to
avoid making preempt_count atomic. Therefore we keep the existing
TIF_NEED_RESCHED infrastructure in place, but test it at three sites
and fold its value into preempt_count; namely:
- resched_task() when setting TIF_NEED_RESCHED on the current task
- scheduler_ipi(); when resched_task() sets TIF_NEED_RESCHED on a
remote task it follows up with a reschedule IPI,
and we can modify the cpu local preempt_count from
there.
- cpu_idle_loop() for when resched_task() found tsk_is_polling().
We use an inverted bitmask to indicate need_resched, so that a
preempt_count of 0 means both need_resched and !atomic.
Also remove the barrier() in preempt_enable() between
preempt_enable_no_resched() and preempt_check_resched() to avoid
having to reload the preemption value and allow the compiler to use
the flags of the previous decrement. I couldn't come up with any sane
reason for this barrier() to be there as preempt_enable_no_resched()
already has a barrier() before doing the decrement.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
include/linux/preempt.h | 42 +++++++++++++++++++++++++++++++++++++-----
include/linux/sched.h | 2 +-
include/linux/thread_info.h | 1 +
kernel/context_tracking.c | 2 +-
kernel/cpu/idle.c | 10 ++++++++++
kernel/sched/core.c | 18 ++++++++++++++----
6 files changed, 64 insertions(+), 11 deletions(-)
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -10,9 +10,19 @@
#include <linux/linkage.h>
#include <linux/list.h>
+/*
+ * We use the MSB mostly because it's available; see <linux/hardirq.h> for
+ * the other bits.
+ */
+#define PREEMPT_NEED_RESCHED 0x80000000
+
+/*
+ * We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
+ * that think a non-zero value indicates we cannot preempt.
+ */
static __always_inline int preempt_count(void)
{
- return current_thread_info()->preempt_count;
+ return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED;
}
static __always_inline int *preempt_count_ptr(void)
@@ -20,6 +30,30 @@ static __always_inline int *preempt_coun
return &current_thread_info()->preempt_count;
}
+/*
+ * We fold the NEED_RESCHED bit into the preempt count such that
+ * preempt_enable() can decrement and test for needing to reschedule with a
+ * single instruction.
+ *
+ * We invert the actual bit, so that when the decrement hits 0 we know we both
+ * need to resched (the bit is cleared) and can resched (no preempt count).
+ */
+
+static __always_inline void set_preempt_need_resched(void)
+{
+ *preempt_count_ptr() &= ~PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline void clear_preempt_need_resched(void)
+{
+ *preempt_count_ptr() |= PREEMPT_NEED_RESCHED;
+}
+
+static __always_inline bool test_preempt_need_resched(void)
+{
+ return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED);
+}
+
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
extern void add_preempt_count(int val);
extern void sub_preempt_count(int val);
@@ -37,7 +71,7 @@ asmlinkage void preempt_schedule(void);
#define preempt_check_resched() \
do { \
- if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+ if (unlikely(!*preempt_count_ptr())) \
preempt_schedule(); \
} while (0)
@@ -47,7 +81,7 @@ void preempt_schedule_context(void);
#define preempt_check_resched_context() \
do { \
- if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
+ if (unlikely(!*preempt_count_ptr())) \
preempt_schedule_context(); \
} while (0)
#else
@@ -83,7 +117,6 @@ do { \
#define preempt_enable() \
do { \
preempt_enable_no_resched(); \
- barrier(); \
preempt_check_resched(); \
} while (0)
@@ -111,7 +144,6 @@ do { \
#define preempt_enable_notrace() \
do { \
preempt_enable_no_resched_notrace(); \
- barrier(); \
preempt_check_resched_context(); \
} while (0)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2405,7 +2405,7 @@ static inline int signal_pending_state(l
static inline int need_resched(void)
{
- return unlikely(test_thread_flag(TIF_NEED_RESCHED));
+ return unlikely(test_preempt_need_resched());
}
/*
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -104,6 +104,7 @@ static inline int test_ti_thread_flag(st
#define test_thread_flag(flag) \
test_ti_thread_flag(current_thread_info(), flag)
+#define test_need_resched() test_thread_flag(TIF_NEED_RESCHED)
#define set_need_resched() set_thread_flag(TIF_NEED_RESCHED)
#define clear_need_resched() clear_thread_flag(TIF_NEED_RESCHED)
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -115,7 +115,7 @@ void __sched notrace preempt_schedule_co
{
enum ctx_state prev_ctx;
- if (likely(!preemptible()))
+ if (likely(preempt_count() || irqs_disabled()))
return;
/*
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -106,6 +106,13 @@ static void cpu_idle_loop(void)
current_set_polling();
}
arch_cpu_idle_exit();
+ /*
+ * We need to test and propagate the TIF_NEED_RESCHED
+ * bit here because we might not have sent the
+ * reschedule IPI to idle tasks.
+ */
+ if (test_need_resched())
+ set_preempt_need_resched();
}
tick_nohz_idle_exit();
schedule_preempt_disabled();
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -526,8 +526,10 @@ void resched_task(struct task_struct *p)
set_tsk_need_resched(p);
cpu = task_cpu(p);
- if (cpu == smp_processor_id())
+ if (cpu == smp_processor_id()) {
+ set_preempt_need_resched();
return;
+ }
/* NEED_RESCHED must be visible before we test polling */
smp_mb();
@@ -1411,6 +1413,14 @@ static void sched_ttwu_pending(void)
void scheduler_ipi(void)
{
+ /*
+ * Fold TIF_NEED_RESCHED into the preempt_count; anybody setting
+ * TIF_NEED_RESCHED remotely (for the first time) will also send
+ * this IPI.
+ */
+ if (test_need_resched())
+ set_preempt_need_resched();
+
if (llist_empty(&this_rq()->wake_list)
&& !tick_nohz_full_cpu(smp_processor_id())
&& !got_nohz_idle_kick())
@@ -2445,6 +2455,7 @@ static void __sched __schedule(void)
put_prev_task(rq, prev);
next = pick_next_task(rq);
clear_tsk_need_resched(prev);
+ clear_preempt_need_resched();
rq->skip_clock_update = 0;
if (likely(prev != next)) {
@@ -2531,7 +2542,7 @@ asmlinkage void __sched notrace preempt_
* If there is a non-zero preempt_count or interrupts are disabled,
* we do not want to preempt the current task. Just return..
*/
- if (likely(!preemptible()))
+ if (likely(preempt_count() || irqs_disabled()))
return;
do {
@@ -2556,11 +2567,10 @@ EXPORT_SYMBOL(preempt_schedule);
*/
asmlinkage void __sched preempt_schedule_irq(void)
{
- struct thread_info *ti = current_thread_info();
enum ctx_state prev_state;
/* Catch callers which need to be fixed */
- BUG_ON(ti->preempt_count || !irqs_disabled());
+ BUG_ON(preempt_count() || !irqs_disabled());
prev_state = exception_enter();
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 21:43 ` Linus Torvalds
2013-09-10 21:51 ` H. Peter Anvin
@ 2013-09-11 13:13 ` Peter Zijlstra
2013-09-11 13:26 ` Peter Zijlstra
` (2 more replies)
1 sibling, 3 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 13:13 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Tue, Sep 10, 2013 at 02:43:06PM -0700, Linus Torvalds wrote:
> That said, looking at your patch, I get the *very* strong feeling that
> we could make a macro that does all the repetitions for us, and then
> have a
>
> GENERATE_RMW(atomic_sub_and_test, LOCK_PREFIX "subl", "e", "")
The below seems to compile..
---
arch/x86/include/asm/atomic.h | 29 +--------
arch/x86/include/asm/atomic64_64.h | 28 +--------
arch/x86/include/asm/local.h | 28 +--------
arch/x86/include/asm/addcc.h | 115 +++++++++++++++++++++++++++++++++++++
4 files changed, 128 insertions(+), 72 deletions(-)
Index: linux-2.6/arch/x86/include/asm/atomic.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/atomic.h
+++ linux-2.6/arch/x86/include/asm/atomic.h
@@ -6,6 +6,7 @@
#include <asm/processor.h>
#include <asm/alternative.h>
#include <asm/cmpxchg.h>
+#include <asm/addcc.h>
/*
* Atomic operations that C can't guarantee us. Useful for
@@ -76,12 +77,7 @@ static inline void atomic_sub(int i, ato
*/
static inline int atomic_sub_and_test(int i, atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "subl %2,%0; sete %1"
- : "+m" (v->counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GENERATE_ADDcc(v->counter, -i, LOCK_PREFIX, "e");
}
/**
@@ -118,12 +114,7 @@ static inline void atomic_dec(atomic_t *
*/
static inline int atomic_dec_and_test(atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "decl %0; sete %1"
- : "+m" (v->counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GENERATE_ADDcc(v->counter, -1, LOCK_PREFIX, "e");
}
/**
@@ -136,12 +127,7 @@ static inline int atomic_dec_and_test(at
*/
static inline int atomic_inc_and_test(atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "incl %0; sete %1"
- : "+m" (v->counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GENERATE_ADDcc(v->counter, 1, LOCK_PREFIX, "e");
}
/**
@@ -155,12 +141,7 @@ static inline int atomic_inc_and_test(at
*/
static inline int atomic_add_negative(int i, atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "addl %2,%0; sets %1"
- : "+m" (v->counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GENERATE_ADDcc(v->counter, i, LOCK_PREFIX, "s");
}
/**
Index: linux-2.6/arch/x86/include/asm/atomic64_64.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/atomic64_64.h
+++ linux-2.6/arch/x86/include/asm/atomic64_64.h
@@ -72,12 +72,7 @@ static inline void atomic64_sub(long i,
*/
static inline int atomic64_sub_and_test(long i, atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "subq %2,%0; sete %1"
- : "=m" (v->counter), "=qm" (c)
- : "er" (i), "m" (v->counter) : "memory");
- return c;
+ GENERATE_ADDcc(v->counter, -i, LOCK_PREFIX, "e");
}
/**
@@ -116,12 +111,7 @@ static inline void atomic64_dec(atomic64
*/
static inline int atomic64_dec_and_test(atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "decq %0; sete %1"
- : "=m" (v->counter), "=qm" (c)
- : "m" (v->counter) : "memory");
- return c != 0;
+ GENERATE_ADDcc(v->counter, -1, LOCK_PREFIX, "e");
}
/**
@@ -134,12 +124,7 @@ static inline int atomic64_dec_and_test(
*/
static inline int atomic64_inc_and_test(atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "incq %0; sete %1"
- : "=m" (v->counter), "=qm" (c)
- : "m" (v->counter) : "memory");
- return c != 0;
+ GENERATE_ADDcc(v->counter, 1, LOCK_PREFIX, "e");
}
/**
@@ -153,12 +138,7 @@ static inline int atomic64_inc_and_test(
*/
static inline int atomic64_add_negative(long i, atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "addq %2,%0; sets %1"
- : "=m" (v->counter), "=qm" (c)
- : "er" (i), "m" (v->counter) : "memory");
- return c;
+ GENERATE_ADDcc(v->counter, i, LOCK_PREFIX, "s");
}
/**
Index: linux-2.6/arch/x86/include/asm/local.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/local.h
+++ linux-2.6/arch/x86/include/asm/local.h
@@ -52,12 +52,7 @@ static inline void local_sub(long i, loc
*/
static inline int local_sub_and_test(long i, local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_SUB "%2,%0; sete %1"
- : "+m" (l->a.counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GENERATE_ADDcc(l->a.counter, -i, "", "e");
}
/**
@@ -70,12 +65,7 @@ static inline int local_sub_and_test(lon
*/
static inline int local_dec_and_test(local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_DEC "%0; sete %1"
- : "+m" (l->a.counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GENERATE_ADDcc(l->a.counter, -1, "", "e");
}
/**
@@ -88,12 +78,7 @@ static inline int local_dec_and_test(loc
*/
static inline int local_inc_and_test(local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_INC "%0; sete %1"
- : "+m" (l->a.counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GENERATE_ADDcc(l->a.counter, 1, "", "e");
}
/**
@@ -107,12 +92,7 @@ static inline int local_inc_and_test(loc
*/
static inline int local_add_negative(long i, local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_ADD "%2,%0; sets %1"
- : "+m" (l->a.counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GENERATE_ADDcc(l->a.counter, i, "", "s");
}
/**
Index: linux-2.6/arch/x86/include/asm/addcc.h
===================================================================
--- /dev/null
+++ linux-2.6/arch/x86/include/asm/addcc.h
@@ -0,0 +1,115 @@
+#ifndef _ASM_X86_ADDcc
+#define _ASM_X86_ADDcc
+
+extern void __bad_addcc_size(void);
+
+#ifdef CC_HAVE_ASM_GOTO
+
+#define GENERATE_ADDcc(var, val, lock, cc) \
+do { \
+ const int add_ID__ = (__builtin_constant_p(val) && \
+ ((val) == 1 || (val) == -1)) ? (val) : 0; \
+ \
+ switch (sizeof(var)) { \
+ case 4: \
+ if (add_ID__ == 1) { \
+ asm volatile goto(lock "incl %0;" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var) \
+ : "memory" : cc_label); \
+ } else if (add_ID__ == -1) { \
+ asm volatile goto(lock "decl %0;" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var) \
+ : "memory" : cc_label); \
+ } else { \
+ asm volatile goto(lock "addl %1, %0;" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var), "er" (val) \
+ : "memory" : cc_label); \
+ } \
+ break; \
+ \
+ case 8: \
+ if (add_ID__ == 1) { \
+ asm volatile goto(lock "incq %0;" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var) \
+ : "memory" : cc_label); \
+ } else if (add_ID__ == -1) { \
+ asm volatile goto(lock "decq %0;" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var) \
+ : "memory" : cc_label); \
+ } else { \
+ asm volatile goto(lock "addq %1, %0;" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var), "er" (val) \
+ : "memory" : cc_label); \
+ } \
+ break; \
+ \
+ default: __bad_addcc_size(); \
+ } \
+ \
+ return 0; \
+cc_label: \
+ return 1; \
+} while (0)
+
+#else /* !CC_HAVE_ASM_GOTO */
+
+#define GENERATE_ADDcc(var, val, lock, cc) \
+do { \
+ const int add_ID__ = (__builtin_constant_p(val) && \
+ ((val) == 1 || (val) == -1)) ? (val) : 0; \
+ char c; \
+ \
+ switch (sizeof(var)) { \
+ case 4: \
+ if (add_ID__ == 1) { \
+ asm volatile (lock "incl %0;" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : : "memory"); \
+ } else if (add_ID__ == -1) { \
+ asm volatile (lock "decl %0;" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : : "memory"); \
+ } else { \
+ asm volatile (lock "addl %2, %0;" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : "er" (val) : "memory"); \
+ } \
+ break; \
+ \
+ case 8: \
+ if (add_ID__ == 1) { \
+ asm volatile (lock "incq %0;" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : : "memory"); \
+ } else if (add_ID__ == -1) { \
+ asm volatile (lock "decq %0;" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : : "memory"); \
+ } else { \
+ asm volatile (lock "addq %2, %0;" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : "er" (val) : "memory"); \
+ } \
+ break; \
+ \
+ default: __bad_addcc_size(); \
+ } \
+ \
+ return c != 0; \
+} while (0)
+
+#endif /* CC_HAVE_ASM_GOTO */
+
+#endif /* _ASM_X86_ADDcc */
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-11 13:13 ` Peter Zijlstra
@ 2013-09-11 13:26 ` Peter Zijlstra
2013-09-11 15:29 ` H. Peter Anvin
2013-09-11 15:33 ` Linus Torvalds
2 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 13:26 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Wed, Sep 11, 2013 at 03:13:23PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 10, 2013 at 02:43:06PM -0700, Linus Torvalds wrote:
> > That said, looking at your patch, I get the *very* strong feeling that
> > we could make a macro that does all the repetitions for us, and then
> > have a
> >
> > GENERATE_RMW(atomic_sub_and_test, LOCK_PREFIX "subl", "e", "")
>
> The below seems to compile..
>
> +#define GENERATE_ADDcc(var, val, lock, cc) \
And here's one that builds and is usable for per-cpu things too:
static __always_inline bool __preempt_count_dec_and_test(void)
{
GENERATE_ADDcc(__preempt_count, -1, "", __percpu_arg(0), "e");
}
---
arch/x86/include/asm/addcc.h | 115 +++++++++++++++++++++++++++++++++++++
arch/x86/include/asm/atomic.h | 29 +--------
arch/x86/include/asm/atomic64_64.h | 28 +--------
arch/x86/include/asm/local.h | 28 +--------
4 files changed, 128 insertions(+), 72 deletions(-)
--- /dev/null
+++ b/arch/x86/include/asm/addcc.h
@@ -0,0 +1,115 @@
+#ifndef _ASM_X86_ADDcc
+#define _ASM_X86_ADDcc
+
+extern void __bad_addcc_size(void);
+
+#ifdef CC_HAVE_ASM_GOTO
+
+#define GENERATE_ADDcc(var, val, lock, arg0, cc) \
+do { \
+ const int add_ID__ = (__builtin_constant_p(val) && \
+ ((val) == 1 || (val) == -1)) ? (val) : 0; \
+ \
+ switch (sizeof(var)) { \
+ case 4: \
+ if (add_ID__ == 1) { \
+ asm volatile goto(lock "incl " arg0 ";" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var) \
+ : "memory" : cc_label); \
+ } else if (add_ID__ == -1) { \
+ asm volatile goto(lock "decl " arg0 ";" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var) \
+ : "memory" : cc_label); \
+ } else { \
+ asm volatile goto(lock "addl %1, " arg0 ";" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var), "er" (val) \
+ : "memory" : cc_label); \
+ } \
+ break; \
+ \
+ case 8: \
+ if (add_ID__ == 1) { \
+ asm volatile goto(lock "incq " arg0 ";" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var) \
+ : "memory" : cc_label); \
+ } else if (add_ID__ == -1) { \
+ asm volatile goto(lock "decq " arg0 ";" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var) \
+ : "memory" : cc_label); \
+ } else { \
+ asm volatile goto(lock "addq %1, " arg0 ";" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var), "er" (val) \
+ : "memory" : cc_label); \
+ } \
+ break; \
+ \
+ default: __bad_addcc_size(); \
+ } \
+ \
+ return 0; \
+cc_label: \
+ return 1; \
+} while (0)
+
+#else /* !CC_HAVE_ASM_GOTO */
+
+#define GENERATE_ADDcc(var, val, lock, arg0, cc) \
+do { \
+ const int add_ID__ = (__builtin_constant_p(val) && \
+ ((val) == 1 || (val) == -1)) ? (val) : 0; \
+ char c; \
+ \
+ switch (sizeof(var)) { \
+ case 4: \
+ if (add_ID__ == 1) { \
+ asm volatile (lock "incl " arg0 ";" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : : "memory"); \
+ } else if (add_ID__ == -1) { \
+ asm volatile (lock "decl " arg0 ";" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : : "memory"); \
+ } else { \
+ asm volatile (lock "addl %2, " arg0 ";" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : "er" (val) : "memory"); \
+ } \
+ break; \
+ \
+ case 8: \
+ if (add_ID__ == 1) { \
+ asm volatile (lock "incq " arg0 ";" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : : "memory"); \
+ } else if (add_ID__ == -1) { \
+ asm volatile (lock "decq " arg0 ";" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : : "memory"); \
+ } else { \
+ asm volatile (lock "addq %2, " arg0 ";" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : "er" (val) : "memory"); \
+ } \
+ break; \
+ \
+ default: __bad_addcc_size(); \
+ } \
+ \
+ return c != 0; \
+} while (0)
+
+#endif /* CC_HAVE_ASM_GOTO */
+
+#endif /* _ASM_X86_ADDcc */
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -6,6 +6,7 @@
#include <asm/processor.h>
#include <asm/alternative.h>
#include <asm/cmpxchg.h>
+#include <asm/addcc.h>
/*
* Atomic operations that C can't guarantee us. Useful for
@@ -76,12 +77,7 @@ static inline void atomic_sub(int i, ato
*/
static inline int atomic_sub_and_test(int i, atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "subl %2,%0; sete %1"
- : "+m" (v->counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GENERATE_ADDcc(v->counter, -i, LOCK_PREFIX, "%0", "e");
}
/**
@@ -118,12 +114,7 @@ static inline void atomic_dec(atomic_t *
*/
static inline int atomic_dec_and_test(atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "decl %0; sete %1"
- : "+m" (v->counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GENERATE_ADDcc(v->counter, -1, LOCK_PREFIX, "%0", "e");
}
/**
@@ -136,12 +127,7 @@ static inline int atomic_dec_and_test(at
*/
static inline int atomic_inc_and_test(atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "incl %0; sete %1"
- : "+m" (v->counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GENERATE_ADDcc(v->counter, 1, LOCK_PREFIX, "%0", "e");
}
/**
@@ -155,12 +141,7 @@ static inline int atomic_inc_and_test(at
*/
static inline int atomic_add_negative(int i, atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "addl %2,%0; sets %1"
- : "+m" (v->counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GENERATE_ADDcc(v->counter, i, LOCK_PREFIX, "%0", "s");
}
/**
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -72,12 +72,7 @@ static inline void atomic64_sub(long i,
*/
static inline int atomic64_sub_and_test(long i, atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "subq %2,%0; sete %1"
- : "=m" (v->counter), "=qm" (c)
- : "er" (i), "m" (v->counter) : "memory");
- return c;
+ GENERATE_ADDcc(v->counter, -i, LOCK_PREFIX, "%0", "e");
}
/**
@@ -116,12 +111,7 @@ static inline void atomic64_dec(atomic64
*/
static inline int atomic64_dec_and_test(atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "decq %0; sete %1"
- : "=m" (v->counter), "=qm" (c)
- : "m" (v->counter) : "memory");
- return c != 0;
+ GENERATE_ADDcc(v->counter, -1, LOCK_PREFIX, "%0", "e");
}
/**
@@ -134,12 +124,7 @@ static inline int atomic64_dec_and_test(
*/
static inline int atomic64_inc_and_test(atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "incq %0; sete %1"
- : "=m" (v->counter), "=qm" (c)
- : "m" (v->counter) : "memory");
- return c != 0;
+ GENERATE_ADDcc(v->counter, 1, LOCK_PREFIX, "%0", "e");
}
/**
@@ -153,12 +138,7 @@ static inline int atomic64_inc_and_test(
*/
static inline int atomic64_add_negative(long i, atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "addq %2,%0; sets %1"
- : "=m" (v->counter), "=qm" (c)
- : "er" (i), "m" (v->counter) : "memory");
- return c;
+ GENERATE_ADDcc(v->counter, i, LOCK_PREFIX, "%0", "s");
}
/**
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -52,12 +52,7 @@ static inline void local_sub(long i, loc
*/
static inline int local_sub_and_test(long i, local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_SUB "%2,%0; sete %1"
- : "+m" (l->a.counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GENERATE_ADDcc(l->a.counter, -i, "", "%0", "e");
}
/**
@@ -70,12 +65,7 @@ static inline int local_sub_and_test(lon
*/
static inline int local_dec_and_test(local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_DEC "%0; sete %1"
- : "+m" (l->a.counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GENERATE_ADDcc(l->a.counter, -1, "", "%0", "e");
}
/**
@@ -88,12 +78,7 @@ static inline int local_dec_and_test(loc
*/
static inline int local_inc_and_test(local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_INC "%0; sete %1"
- : "+m" (l->a.counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GENERATE_ADDcc(l->a.counter, 1, "", "%0", "e");
}
/**
@@ -107,12 +92,7 @@ static inline int local_inc_and_test(loc
*/
static inline int local_add_negative(long i, local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_ADD "%2,%0; sets %1"
- : "+m" (l->a.counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GENERATE_ADDcc(l->a.counter, i, "", "%0", "s");
}
/**
* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
2013-09-11 11:06 ` Peter Zijlstra
@ 2013-09-11 13:34 ` Mike Galbraith
2013-09-12 6:01 ` Mike Galbraith
2013-09-11 16:35 ` Andy Lutomirski
1 sibling, 1 reply; 52+ messages in thread
From: Mike Galbraith @ 2013-09-11 13:34 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andy Lutomirski, Linus Torvalds, Ingo Molnar, Andi Kleen,
Peter Anvin, Thomas Gleixner, Arjan van de Ven,
Frederic Weisbecker, linux-kernel, linux-arch, Len Brown,
Vaidyanathan Srinivasan
On Wed, 2013-09-11 at 13:06 +0200, Peter Zijlstra wrote:
> On Wed, Sep 11, 2013 at 10:25:30AM +0200, Peter Zijlstra wrote:
> > On Tue, Sep 10, 2013 at 06:59:57PM -0700, Andy Lutomirski wrote:
>
> > > It looks like the intel_idle code can get confused if TIF_NEED_RESCHED
> > > is set but the preempt resched bit is not -- the need_resched call
> > > between monitor and mwait won't notice TIF_NEED_RESCHED.
> > >
> > > Is this condition possible?
> >
> > Ah indeed, I'll have to go find all idle loops out there it seems. Now
> > if only they were easy to spot :/
> >
> > I was hoping the generic idle thing was good enough, apparently not
> > quite. Thanks for spotting that.
>
> OK, and the reason I didn't notice is that the entire TS_POLLING thing
> is completely wrecked so we'll get the interrupt anyway.
>
> The below might fix it.. it boots, but then it already did so that
> doesn't say much.
>
> Mike does it fix the funny you saw?
Yup, modulo test_need_resched() not existing in master.
E5620 pipe-test
v3.7.10 578.5 KHz
v3.7.10-nothrottle 366.7 KHz
v3.8.13 468.3 KHz
v3.9.11 462.0 KHz
v3.10.4 419.4 KHz
v3.11-rc3-4-g36f571e 400.1 KHz
master-9013-gbf83e61 553.5 KHz
I still have your not quite complete hrtimer patch in there, lest menu
governor munch any improvement, as well as my throttle-nohz patch, so
seems we may be a _bit_ down vs 3.7, but reschedule_interrupt did crawl
back under its rock, and while I haven't yet booted Q6600 box, core2
Toshiba Satellite lappy is using mwait_idle_with_hints() in master.
Thanks,
-Mike
> ---
> Subject: sched, idle: Fix the idle polling state logic
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Sep 11 12:43:13 CEST 2013
>
> Mike reported that commit 7d1a9417 ("x86: Use generic idle loop")
> regressed several workloads and caused excessive reschedule
> interrupts.
>
> The patch in question failed to notice that the x86 code had an
> inverted sense of the polling state versus the new generic code (x86:
> default polling, generic: default !polling).
>
> Fix the two prominent x86 mwait based idle drivers and introduce a few
> new generic polling helpers (fixing the wrong smp_mb__after_clear_bit
> usage).
>
> Also switch the idle routines to using test_need_resched(), which is an
> immediate TIF_NEED_RESCHED test, as opposed to need_resched(), which
> will end up meaning something slightly different.
>
> Reported-by: Mike Galbraith <bitbucket@online.de>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
> arch/x86/kernel/process.c | 6 +--
> drivers/acpi/processor_idle.c | 46 +++++-------------------
> drivers/idle/intel_idle.c | 2 -
> include/linux/sched.h | 78 ++++++++++++++++++++++++++++++++++++++----
> kernel/cpu/idle.c | 9 ++--
> 5 files changed, 89 insertions(+), 52 deletions(-)
>
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -391,9 +391,9 @@ static void amd_e400_idle(void)
> * The switch back from broadcast mode needs to be
> * called with interrupts disabled.
> */
> - local_irq_disable();
> - clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> - local_irq_enable();
> + local_irq_disable();
> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> + local_irq_enable();
> } else
> default_idle();
> }
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -119,17 +119,10 @@ static struct dmi_system_id processor_po
> */
> static void acpi_safe_halt(void)
> {
> - current_thread_info()->status &= ~TS_POLLING;
> - /*
> - * TS_POLLING-cleared state must be visible before we
> - * test NEED_RESCHED:
> - */
> - smp_mb();
> - if (!need_resched()) {
> + if (!test_need_resched()) {
> safe_halt();
> local_irq_disable();
> }
> - current_thread_info()->status |= TS_POLLING;
> }
>
> #ifdef ARCH_APICTIMER_STOPS_ON_C3
> @@ -737,6 +730,11 @@ static int acpi_idle_enter_c1(struct cpu
> if (unlikely(!pr))
> return -EINVAL;
>
> + if (cx->entry_method == ACPI_CSTATE_FFH) {
> + if (current_set_polling_and_test())
> + return -EINVAL;
> + }
> +
> lapic_timer_state_broadcast(pr, cx, 1);
> acpi_idle_do_entry(cx);
>
> @@ -790,18 +788,9 @@ static int acpi_idle_enter_simple(struct
> if (unlikely(!pr))
> return -EINVAL;
>
> - if (cx->entry_method != ACPI_CSTATE_FFH) {
> - current_thread_info()->status &= ~TS_POLLING;
> - /*
> - * TS_POLLING-cleared state must be visible before we test
> - * NEED_RESCHED:
> - */
> - smp_mb();
> -
> - if (unlikely(need_resched())) {
> - current_thread_info()->status |= TS_POLLING;
> + if (cx->entry_method == ACPI_CSTATE_FFH) {
> + if (current_set_polling_and_test())
> return -EINVAL;
> - }
> }
>
> /*
> @@ -819,9 +808,6 @@ static int acpi_idle_enter_simple(struct
>
> sched_clock_idle_wakeup_event(0);
>
> - if (cx->entry_method != ACPI_CSTATE_FFH)
> - current_thread_info()->status |= TS_POLLING;
> -
> lapic_timer_state_broadcast(pr, cx, 0);
> return index;
> }
> @@ -858,18 +844,9 @@ static int acpi_idle_enter_bm(struct cpu
> }
> }
>
> - if (cx->entry_method != ACPI_CSTATE_FFH) {
> - current_thread_info()->status &= ~TS_POLLING;
> - /*
> - * TS_POLLING-cleared state must be visible before we test
> - * NEED_RESCHED:
> - */
> - smp_mb();
> -
> - if (unlikely(need_resched())) {
> - current_thread_info()->status |= TS_POLLING;
> + if (cx->entry_method == ACPI_CSTATE_FFH) {
> + if (current_set_polling_and_test())
> return -EINVAL;
> - }
> }
>
> acpi_unlazy_tlb(smp_processor_id());
> @@ -915,9 +892,6 @@ static int acpi_idle_enter_bm(struct cpu
>
> sched_clock_idle_wakeup_event(0);
>
> - if (cx->entry_method != ACPI_CSTATE_FFH)
> - current_thread_info()->status |= TS_POLLING;
> -
> lapic_timer_state_broadcast(pr, cx, 0);
> return index;
> }
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -359,7 +359,7 @@ static int intel_idle(struct cpuidle_dev
> if (!(lapic_timer_reliable_states & (1 << (cstate))))
> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
>
> - if (!need_resched()) {
> + if (!current_set_polling_and_test()) {
>
> __monitor((void *)&current_thread_info()->flags, 0, 0);
> smp_mb();
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2471,34 +2471,98 @@ static inline int tsk_is_polling(struct
> {
> return task_thread_info(p)->status & TS_POLLING;
> }
> -static inline void current_set_polling(void)
> +static inline void __current_set_polling(void)
> {
> current_thread_info()->status |= TS_POLLING;
> }
>
> -static inline void current_clr_polling(void)
> +static inline bool __must_check current_set_polling_and_test(void)
> +{
> + __current_set_polling();
> +
> + /*
> + * Polling state must be visible before we test NEED_RESCHED,
> + * paired by resched_task()
> + */
> + smp_mb();
> +
> + return unlikely(test_need_resched());
> +}
> +
> +static inline void __current_clr_polling(void)
> {
> current_thread_info()->status &= ~TS_POLLING;
> - smp_mb__after_clear_bit();
> +}
> +
> +static inline bool __must_check current_clr_polling_and_test(void)
> +{
> + __current_clr_polling();
> +
> + /*
> + * Polling state must be visible before we test NEED_RESCHED,
> + * paired by resched_task()
> + */
> + smp_mb();
> +
> + return unlikely(test_need_resched());
> }
> #elif defined(TIF_POLLING_NRFLAG)
> static inline int tsk_is_polling(struct task_struct *p)
> {
> return test_tsk_thread_flag(p, TIF_POLLING_NRFLAG);
> }
> -static inline void current_set_polling(void)
> +
> +static inline void __current_set_polling(void)
> {
> set_thread_flag(TIF_POLLING_NRFLAG);
> }
>
> -static inline void current_clr_polling(void)
> +static inline bool __must_check current_set_polling_and_test(void)
> +{
> + __current_set_polling();
> +
> + /*
> + * Polling state must be visible before we test NEED_RESCHED,
> + * paired by resched_task()
> + *
> + * XXX: assumes set/clear bit are identical barrier wise.
> + */
> + smp_mb__after_clear_bit();
> +
> + return unlikely(test_need_resched());
> +}
> +
> +static inline void __current_clr_polling(void)
> {
> clear_thread_flag(TIF_POLLING_NRFLAG);
> }
> +
> +static inline bool __must_check current_clr_polling_and_test(void)
> +{
> + __current_clr_polling();
> +
> + /*
> + * Polling state must be visible before we test NEED_RESCHED,
> + * paired by resched_task()
> + */
> + smp_mb__after_clear_bit();
> +
> + return unlikely(test_need_resched());
> +}
> +
> #else
> static inline int tsk_is_polling(struct task_struct *p) { return 0; }
> -static inline void current_set_polling(void) { }
> -static inline void current_clr_polling(void) { }
> +static inline void __current_set_polling(void) { }
> +static inline void __current_clr_polling(void) { }
> +
> +static inline bool __must_check current_set_polling_and_test(void)
> +{
> + return unlikely(test_need_resched());
> +}
> +static inline bool __must_check current_clr_polling_and_test(void)
> +{
> + return unlikely(test_need_resched());
> +}
> #endif
>
> /*
> --- a/kernel/cpu/idle.c
> +++ b/kernel/cpu/idle.c
> @@ -44,7 +44,7 @@ static inline int cpu_idle_poll(void)
> rcu_idle_enter();
> trace_cpu_idle_rcuidle(0, smp_processor_id());
> local_irq_enable();
> - while (!need_resched())
> + while (!test_need_resched())
> cpu_relax();
> trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
> rcu_idle_exit();
> @@ -92,8 +92,7 @@ static void cpu_idle_loop(void)
> if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
> cpu_idle_poll();
> } else {
> - current_clr_polling();
> - if (!need_resched()) {
> + if (!current_clr_polling_and_test()) {
> stop_critical_timings();
> rcu_idle_enter();
> arch_cpu_idle();
> @@ -103,7 +102,7 @@ static void cpu_idle_loop(void)
> } else {
> local_irq_enable();
> }
> - current_set_polling();
> + __current_set_polling();
> }
> arch_cpu_idle_exit();
> /*
> @@ -136,7 +135,7 @@ void cpu_startup_entry(enum cpuhp_state
> */
> boot_init_stack_canary();
> #endif
> - current_set_polling();
> + __current_set_polling();
> arch_cpu_idle_prepare();
> cpu_idle_loop();
> }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-11 13:13 ` Peter Zijlstra
2013-09-11 13:26 ` Peter Zijlstra
@ 2013-09-11 15:29 ` H. Peter Anvin
2013-09-11 15:33 ` Linus Torvalds
2 siblings, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2013-09-11 15:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On 09/11/2013 06:13 AM, Peter Zijlstra wrote:
> On Tue, Sep 10, 2013 at 02:43:06PM -0700, Linus Torvalds wrote:
>> That said, looking at your patch, I get the *very* strong feeling that
>> we could make a macro that does all the repetitions for us, and then
>> have a
>>
>> GENERATE_RMW(atomic_sub_and_test, LOCK_PREFIX "subl", "e", "")
>
> The below seems to compile..
>
> +
> +#define GENERATE_ADDcc(var, val, lock, cc) \
> +do { \
> + const int add_ID__ = (__builtin_constant_p(val) && \
> + ((val) == 1 || (val) == -1)) ? (val) : 0; \
> + \
> + switch (sizeof(var)) { \
> + case 4: \
> + if (add_ID__ == 1) { \
> + asm volatile goto(lock "incl %0;" \
> + "j" cc " %l[cc_label]" \
> + : : "m" (var) \
> + : "memory" : cc_label); \
> + } else if (add_ID__ == -1) { \
> + asm volatile goto(lock "decl %0;" \
> + "j" cc " %l[cc_label]" \
> + : : "m" (var) \
> + : "memory" : cc_label); \
> + } else { \
> + asm volatile goto(lock "addl %1, %0;" \
> + "j" cc " %l[cc_label]" \
> + : : "m" (var), "er" (val) \
> + : "memory" : cc_label); \
> + } \
> + break; \
> + \
> + case 8: \
> + if (add_ID__ == 1) { \
> + asm volatile goto(lock "incq %0;" \
> + "j" cc " %l[cc_label]" \
> + : : "m" (var) \
> + : "memory" : cc_label); \
> + } else if (add_ID__ == -1) { \
> + asm volatile goto(lock "decq %0;" \
> + "j" cc " %l[cc_label]" \
> + : : "m" (var) \
> + : "memory" : cc_label); \
> + } else { \
> + asm volatile goto(lock "addq %1, %0;" \
> + "j" cc " %l[cc_label]" \
> + : : "m" (var), "er" (val) \
> + : "memory" : cc_label); \
> + } \
> + break; \
> + \
At least in the "asm goto" case you can use:
lock "add%z0 %1,%0;"
... and skip the switch statement.
There was a bug in some old (gcc 3.x?) early x86-64 versions which would
treat %z0 as if it was %Z0 which means it would emit "ll" instead of "q"
but that doesn't apply to any gcc which has "asm goto"...
-hpa
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-11 13:13 ` Peter Zijlstra
2013-09-11 13:26 ` Peter Zijlstra
2013-09-11 15:29 ` H. Peter Anvin
@ 2013-09-11 15:33 ` Linus Torvalds
2013-09-11 18:59 ` Peter Zijlstra
2 siblings, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-11 15:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Wed, Sep 11, 2013 at 6:13 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Sep 10, 2013 at 02:43:06PM -0700, Linus Torvalds wrote:
>> That said, looking at your patch, I get the *very* strong feeling that
>> we could make a macro that does all the repetitions for us, and then
>> have a
>>
>> GENERATE_RMW(atomic_sub_and_test, LOCK_PREFIX "subl", "e", "")
>
> The below seems to compile..
You have that horrible duplication in the macro itself now, though.
The only difference between the 4-byte and the 8-byte one is the "l"
vs "q". And the counter increment/decrement argument (by the caller)
defines whether it's add/dec/inc. Why not just add those as parameters
to the macro, and suddenly the macro becomes 10x smaller.
An excessively complex macro that makes the arguments trivially
simpler is not worth it.
Especially since that excessive macro complexity now means that your
macro is useless for things that the *simpler* macro could have done,
like the bit-test-and-modify cases.
So your complexity actually makes things worse.
So just pass in the operation name and size. Then somebody will use it
for test_and_set_bit() and friends too.
Linus
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-10 16:24 ` Linus Torvalds
@ 2013-09-11 16:00 ` H. Peter Anvin
0 siblings, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2013-09-11 16:00 UTC (permalink / raw)
To: Linus Torvalds
Cc: Arjan van de Ven, Ingo Molnar, Peter Zijlstra, Andi Kleen,
Mike Galbraith, Thomas Gleixner, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On 09/10/2013 09:24 AM, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 8:29 AM, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>>
>> also.. yuck on using "dec"
>> "dec" sucks, please use "sub foo ,1" instead
>
> That's a bigger instruction, largely due to the constant.
>
>> (dec sucks because of its broken flags behavior; it creates basically a
>> bubble in the pipeline)
>
> Intel could (and should) just fix that. It's "easy" enough - you just
> need to rename the carry flag as a separate register (and make sure
> that the conditional instructions that don't use it don't think they
> need it).
>
> In fact I thought Intel already did that on their large cores. Are you
> sure they still have trouble with inc/dec?
>
Big cores are fine, but Atom (and presumably Quark) might still suffer a
performance penalty.
-hpa
* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
2013-09-11 11:06 ` Peter Zijlstra
2013-09-11 13:34 ` Mike Galbraith
@ 2013-09-11 16:35 ` Andy Lutomirski
2013-09-11 18:05 ` Peter Zijlstra
1 sibling, 1 reply; 52+ messages in thread
From: Andy Lutomirski @ 2013-09-11 16:35 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
Frederic Weisbecker, linux-kernel, linux-arch, Len Brown,
Vaidyanathan Srinivasan
On Wed, Sep 11, 2013 at 4:06 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Sep 11, 2013 at 10:25:30AM +0200, Peter Zijlstra wrote:
>> On Tue, Sep 10, 2013 at 06:59:57PM -0700, Andy Lutomirski wrote:
>
>> > It looks like the intel_idle code can get confused if TIF_NEED_RESCHED
>> > is set but the preempt resched bit is not -- the need_resched call
>> > between monitor and mwait won't notice TIF_NEED_RESCHED.
>> >
>> > Is this condition possible?
>>
>> Ah indeed, I'll have to go find all idle loops out there it seems. Now
>> if only they were easy to spot :/
>>
>> I was hoping the generic idle thing was good enough, apparently not
>> quite. Thanks for spotting that.
>
> OK, and the reason I didn't notice is that the entire TS_POLLING thing
> is completely wrecked so we'll get the interrupt anyway.
>
I bet that this improves cross-cpu wakeup latency, too -- the old code
would presumably wake up the cpu and then immediately interrupt it.
It might be nice to rename one or both of need_resched and
test_need_resched, though -- the difference is somewhat inscrutable.
--Andy
* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
2013-09-11 16:35 ` Andy Lutomirski
@ 2013-09-11 18:05 ` Peter Zijlstra
2013-09-11 18:07 ` Andy Lutomirski
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 18:05 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
Frederic Weisbecker, linux-kernel, linux-arch, Len Brown,
Vaidyanathan Srinivasan
On Wed, Sep 11, 2013 at 09:35:08AM -0700, Andy Lutomirski wrote:
> I bet that this improves cross-cpu wakeup latency, too -- the old code
> would presumably wake up the cpu and then immediately interrupt it.
Yeah, it's what clued Mike in to there being a problem.
> It might be nice to rename one or both of need_resched and
> test_need_resched, though -- the difference is somewhat inscrutable.
Agreed; I've just been unable to come up with anything sane.
The best I could come up with is renaming
{set,clear,test}_need_resched() to {}_tif_need_resched(), so we then end
up with test_tif_need_resched(), which is slightly more different from
need_resched().
Ideas?
* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
2013-09-11 18:05 ` Peter Zijlstra
@ 2013-09-11 18:07 ` Andy Lutomirski
0 siblings, 0 replies; 52+ messages in thread
From: Andy Lutomirski @ 2013-09-11 18:07 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
Frederic Weisbecker, linux-kernel, linux-arch, Len Brown,
Vaidyanathan Srinivasan
On Wed, Sep 11, 2013 at 11:05 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Sep 11, 2013 at 09:35:08AM -0700, Andy Lutomirski wrote:
>> I bet that this improves cross-cpu wakeup latency, too -- the old code
>> would presumably wake up the cpu and then immediately interrupt it.
>
> Yeah,. its what clued Mike in to there being a problem.
>
>> It might be nice to rename one or both of need_resched and
>> test_need_resched, though -- the difference is somewhat inscrutable.
>
> I agreed, I've just been unable to come up with anything sane.
>
> The best I could come up with is renaming
> {set,clear,test}_need_resched() to {}_tif_need_resched(), so we then end
> up with test_tif_need_resched(), which is slightly more different from
> need_resched().
Nothing great. That's at least better than the current state, I think.
--Andy
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-11 15:33 ` Linus Torvalds
@ 2013-09-11 18:59 ` Peter Zijlstra
2013-09-11 23:02 ` Linus Torvalds
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-11 18:59 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Wed, Sep 11, 2013 at 08:33:21AM -0700, Linus Torvalds wrote:
> An excessively complex macro that makes the arguments trivially
> simpler is not worth it.
OK, stripped it down further, I couldn't quite see how to collapse the
unary and binary operator variants though :/
---
arch/x86/include/asm/rmwcc.h | 62 +++++++++++++++++++++++++++++++++++++
arch/x86/include/asm/atomic.h | 29 ++---------------
arch/x86/include/asm/atomic64_64.h | 28 ++--------------
arch/x86/include/asm/local.h | 28 ++--------------
4 files changed, 75 insertions(+), 72 deletions(-)
--- /dev/null
+++ b/arch/x86/include/asm/rmwcc.h
@@ -0,0 +1,62 @@
+#ifndef _ASM_X86_RMWcc
+#define _ASM_X86_RMWcc
+
+extern void __bad_rmwcc_nargs(void);
+
+#ifdef CC_HAVE_ASM_GOTO
+
+#define GENERATE_RMWcc(var, val, op, nargs, arg0, cc) \
+do { \
+ switch (nargs) { \
+ case 0: \
+ asm volatile goto (op " " arg0 ";" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var) \
+ : "memory" : cc_label); \
+ break; \
+ \
+ case 1: \
+ asm volatile goto (op " %1, " arg0 ";" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var), "er" (val) \
+ : "memory" : cc_label); \
+ break; \
+ \
+ default: __bad_rmwcc_nargs(); \
+ } \
+ \
+ return 0; \
+cc_label: \
+ return 1; \
+} while (0)
+
+#else /* !CC_HAVE_ASM_GOTO */
+
+#define GENERATE_RMWcc(var, val, op, nargs, arg0, cc) \
+do { \
+ char c; \
+ \
+ switch (nargs) { \
+ case 0: \
+ asm volatile (op " " arg0 ";" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : : "memory"); \
+ break; \
+ \
+ case 1: \
+ asm volatile (op " %2, " arg0 ";" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : "er" (val) : "memory"); \
+ break; \
+ \
+ default: __bad_rmwcc_nargs(); \
+ } \
+ \
+ return c != 0; \
+} while (0)
+
+#endif /* CC_HAVE_ASM_GOTO */
+
+#endif /* _ASM_X86_RMWcc */
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -6,6 +6,7 @@
#include <asm/processor.h>
#include <asm/alternative.h>
#include <asm/cmpxchg.h>
+#include <asm/rmwcc.h>
/*
* Atomic operations that C can't guarantee us. Useful for
@@ -76,12 +77,7 @@ static inline void atomic_sub(int i, ato
*/
static inline int atomic_sub_and_test(int i, atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "subl %2,%0; sete %1"
- : "+m" (v->counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GENERATE_RMWcc(v->counter, i, LOCK_PREFIX "subl", 1, "%0", "e");
}
/**
@@ -118,12 +114,7 @@ static inline void atomic_dec(atomic_t *
*/
static inline int atomic_dec_and_test(atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "decl %0; sete %1"
- : "+m" (v->counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GENERATE_RMWcc(v->counter, 0, LOCK_PREFIX "decl", 0, "%0", "e");
}
/**
@@ -136,12 +127,7 @@ static inline int atomic_dec_and_test(at
*/
static inline int atomic_inc_and_test(atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "incl %0; sete %1"
- : "+m" (v->counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GENERATE_RMWcc(v->counter, 0, LOCK_PREFIX "incl", 0, "%0", "e");
}
/**
@@ -155,12 +141,7 @@ static inline int atomic_inc_and_test(at
*/
static inline int atomic_add_negative(int i, atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "addl %2,%0; sets %1"
- : "+m" (v->counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GENERATE_RMWcc(v->counter, i, LOCK_PREFIX "addl", 1, "%0", "s");
}
/**
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -72,12 +72,7 @@ static inline void atomic64_sub(long i,
*/
static inline int atomic64_sub_and_test(long i, atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "subq %2,%0; sete %1"
- : "=m" (v->counter), "=qm" (c)
- : "er" (i), "m" (v->counter) : "memory");
- return c;
+ GENERATE_RMWcc(v->counter, i, LOCK_PREFIX "subq", 1, "%0", "e");
}
/**
@@ -116,12 +111,7 @@ static inline void atomic64_dec(atomic64
*/
static inline int atomic64_dec_and_test(atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "decq %0; sete %1"
- : "=m" (v->counter), "=qm" (c)
- : "m" (v->counter) : "memory");
- return c != 0;
+ GENERATE_RMWcc(v->counter, 0, LOCK_PREFIX "decq", 0, "%0", "e");
}
/**
@@ -134,12 +124,7 @@ static inline int atomic64_dec_and_test(
*/
static inline int atomic64_inc_and_test(atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "incq %0; sete %1"
- : "=m" (v->counter), "=qm" (c)
- : "m" (v->counter) : "memory");
- return c != 0;
+ GENERATE_RMWcc(v->counter, 0, LOCK_PREFIX "incq", 0, "%0", "e");
}
/**
@@ -153,12 +138,7 @@ static inline int atomic64_inc_and_test(
*/
static inline int atomic64_add_negative(long i, atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "addq %2,%0; sets %1"
- : "=m" (v->counter), "=qm" (c)
- : "er" (i), "m" (v->counter) : "memory");
- return c;
+ GENERATE_RMWcc(v->counter, i, LOCK_PREFIX "addq", 1, "%0", "s");
}
/**
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -52,12 +52,7 @@ static inline void local_sub(long i, loc
*/
static inline int local_sub_and_test(long i, local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_SUB "%2,%0; sete %1"
- : "+m" (l->a.counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GENERATE_RMWcc(l->a.counter, i, _ASM_SUB, 1, "%0", "e");
}
/**
@@ -70,12 +65,7 @@ static inline int local_sub_and_test(lon
*/
static inline int local_dec_and_test(local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_DEC "%0; sete %1"
- : "+m" (l->a.counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GENERATE_RMWcc(l->a.counter, 0, _ASM_DEC, 0, "%0", "e");
}
/**
@@ -88,12 +78,7 @@ static inline int local_dec_and_test(loc
*/
static inline int local_inc_and_test(local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_INC "%0; sete %1"
- : "+m" (l->a.counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GENERATE_RMWcc(l->a.counter, 0, _ASM_INC, 0, "%0", "e");
}
/**
@@ -107,12 +92,7 @@ static inline int local_inc_and_test(loc
*/
static inline int local_add_negative(long i, local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_ADD "%2,%0; sets %1"
- : "+m" (l->a.counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GENERATE_RMWcc(l->a.counter, i, _ASM_ADD, 1, "%0", "s");
}
/**
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-11 18:59 ` Peter Zijlstra
@ 2013-09-11 23:02 ` Linus Torvalds
2013-09-12 2:20 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-11 23:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Wed, Sep 11, 2013 at 11:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> OK, stripped it down further, I couldn't quite see how to collapse the
> unary and binary operator variants though :/
Ok, this looks pretty good. I assume it works too? ;)
Linus
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-11 23:02 ` Linus Torvalds
@ 2013-09-12 2:20 ` Peter Zijlstra
2013-09-12 2:43 ` Linus Torvalds
2013-09-13 7:25 ` Kevin Easton
0 siblings, 2 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-12 2:20 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Wed, Sep 11, 2013 at 04:02:14PM -0700, Linus Torvalds wrote:
> On Wed, Sep 11, 2013 at 11:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > OK, stripped it down further, I couldn't quite see how to collapse the
> > unary and binary operator variants though :/
>
> Ok, this looks pretty good. I assume it works too? ;)
Only compile tested that one.. the below is kvm boot tested until root
mount -- I'll try on actual hardware when I've gotten some sleep.
I split the thing up into two macros GEN_UNARY_RMWcc and
GEN_BINARY_RMWcc which ends up being more readable as well as smaller
code overall.
I also attempted to convert asm/bitops.h, although I'm not sure it'll
compile right with older GCCs due to the comment near BITOP_ADDR()
It also changes the fallback from sbb %0,%0 to setc %0, which afaict
should be similar but I'm not too familiar with all that.
I might have to add the clobber to the macro arguments so we can do
version without "memory" clobber, although bitops is inconsistent with
that as well, __test_and_clear_bit() doesn't have a memory clobber but
__test_and_change_bit() does.
---
arch/x86/include/asm/atomic.h | 29 +++---------------
arch/x86/include/asm/atomic64_64.h | 28 ++---------------
arch/x86/include/asm/bitops.h | 58 +++++--------------------------------
arch/x86/include/asm/local.h | 28 ++---------------
arch/x86/include/asm/rmwcc.h | 52 +++++++++++++++++++++++++++++++++
5 files changed, 73 insertions(+), 122 deletions(-)
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -6,6 +6,7 @@
#include <asm/processor.h>
#include <asm/alternative.h>
#include <asm/cmpxchg.h>
+#include <asm/rmwcc.h>
/*
* Atomic operations that C can't guarantee us. Useful for
@@ -76,12 +77,7 @@ static inline void atomic_sub(int i, ato
*/
static inline int atomic_sub_and_test(int i, atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "subl %2,%0; sete %1"
- : "+m" (v->counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, i, "%0", "e");
}
/**
@@ -118,12 +114,7 @@ static inline void atomic_dec(atomic_t *
*/
static inline int atomic_dec_and_test(atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "decl %0; sete %1"
- : "+m" (v->counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", "e");
}
/**
@@ -136,12 +127,7 @@ static inline int atomic_dec_and_test(at
*/
static inline int atomic_inc_and_test(atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "incl %0; sete %1"
- : "+m" (v->counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", "e");
}
/**
@@ -155,12 +141,7 @@ static inline int atomic_inc_and_test(at
*/
static inline int atomic_add_negative(int i, atomic_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "addl %2,%0; sets %1"
- : "+m" (v->counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, i, "%0", "s");
}
/**
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -72,12 +72,7 @@ static inline void atomic64_sub(long i,
*/
static inline int atomic64_sub_and_test(long i, atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "subq %2,%0; sete %1"
- : "=m" (v->counter), "=qm" (c)
- : "er" (i), "m" (v->counter) : "memory");
- return c;
+ GEN_BINARY_RMWcc(LOCK_PREFIX "subq", v->counter, i, "%0", "e");
}
/**
@@ -116,12 +111,7 @@ static inline void atomic64_dec(atomic64
*/
static inline int atomic64_dec_and_test(atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "decq %0; sete %1"
- : "=m" (v->counter), "=qm" (c)
- : "m" (v->counter) : "memory");
- return c != 0;
+ GEN_UNARY_RMWcc(LOCK_PREFIX "decq", v->counter, "%0", "e");
}
/**
@@ -134,12 +124,7 @@ static inline int atomic64_dec_and_test(
*/
static inline int atomic64_inc_and_test(atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "incq %0; sete %1"
- : "=m" (v->counter), "=qm" (c)
- : "m" (v->counter) : "memory");
- return c != 0;
+ GEN_UNARY_RMWcc(LOCK_PREFIX "incq", v->counter, "%0", "e");
}
/**
@@ -153,12 +138,7 @@ static inline int atomic64_inc_and_test(
*/
static inline int atomic64_add_negative(long i, atomic64_t *v)
{
- unsigned char c;
-
- asm volatile(LOCK_PREFIX "addq %2,%0; sets %1"
- : "=m" (v->counter), "=qm" (c)
- : "er" (i), "m" (v->counter) : "memory");
- return c;
+ GEN_BINARY_RMWcc(LOCK_PREFIX "addq", v->counter, i, "%0", "s");
}
/**
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -14,6 +14,7 @@
#include <linux/compiler.h>
#include <asm/alternative.h>
+#include <asm/rmwcc.h>
#if BITS_PER_LONG == 32
# define _BITOPS_LONG_SHIFT 5
@@ -204,12 +205,7 @@ static inline void change_bit(long nr, v
*/
static inline int test_and_set_bit(long nr, volatile unsigned long *addr)
{
- int oldbit;
-
- asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
- "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
-
- return oldbit;
+ GEN_BINARY_RMWcc(LOCK_PREFIX "bts", *addr, nr, "%0", "c");
}
/**
@@ -236,13 +232,7 @@ test_and_set_bit_lock(long nr, volatile
*/
static inline int __test_and_set_bit(long nr, volatile unsigned long *addr)
{
- int oldbit;
-
- asm("bts %2,%1\n\t"
- "sbb %0,%0"
- : "=r" (oldbit), ADDR
- : "Ir" (nr));
- return oldbit;
+ GEN_BINARY_RMWcc("bts", *addr, nr, "%0", "c");
}
/**
@@ -255,13 +245,7 @@ static inline int __test_and_set_bit(lon
*/
static inline int test_and_clear_bit(long nr, volatile unsigned long *addr)
{
- int oldbit;
-
- asm volatile(LOCK_PREFIX "btr %2,%1\n\t"
- "sbb %0,%0"
- : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
-
- return oldbit;
+ GEN_BINARY_RMWcc(LOCK_PREFIX "btr", *addr, nr, "%0", "c");
}
/**
@@ -282,26 +266,13 @@ static inline int test_and_clear_bit(lon
*/
static inline int __test_and_clear_bit(long nr, volatile unsigned long *addr)
{
- int oldbit;
-
- asm volatile("btr %2,%1\n\t"
- "sbb %0,%0"
- : "=r" (oldbit), ADDR
- : "Ir" (nr));
- return oldbit;
+ GEN_BINARY_RMWcc("btr", *addr, nr, "%0", "c");
}
/* WARNING: non atomic and it can be reordered! */
static inline int __test_and_change_bit(long nr, volatile unsigned long *addr)
{
- int oldbit;
-
- asm volatile("btc %2,%1\n\t"
- "sbb %0,%0"
- : "=r" (oldbit), ADDR
- : "Ir" (nr) : "memory");
-
- return oldbit;
+ GEN_BINARY_RMWcc("btc", *addr, nr, "%0", "c");
}
/**
@@ -314,13 +285,7 @@ static inline int __test_and_change_bit(
*/
static inline int test_and_change_bit(long nr, volatile unsigned long *addr)
{
- int oldbit;
-
- asm volatile(LOCK_PREFIX "btc %2,%1\n\t"
- "sbb %0,%0"
- : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
-
- return oldbit;
+ GEN_BINARY_RMWcc(LOCK_PREFIX "btc", *addr, nr, "%0", "c");
}
static __always_inline int constant_test_bit(long nr, const volatile unsigned long *addr)
@@ -331,14 +296,7 @@ static __always_inline int constant_test
static inline int variable_test_bit(long nr, volatile const unsigned long *addr)
{
- int oldbit;
-
- asm volatile("bt %2,%1\n\t"
- "sbb %0,%0"
- : "=r" (oldbit)
- : "m" (*(unsigned long *)addr), "Ir" (nr));
-
- return oldbit;
+ GEN_BINARY_RMWcc("bt", *(volatile unsigned long *)addr, nr, "%0", "c");
}
#if 0 /* Fool kernel-doc since it doesn't do macros yet */
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -52,12 +52,7 @@ static inline void local_sub(long i, loc
*/
static inline int local_sub_and_test(long i, local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_SUB "%2,%0; sete %1"
- : "+m" (l->a.counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GEN_BINARY_RMWcc(_ASM_SUB, l->a.counter, i, "%0", "e");
}
/**
@@ -70,12 +65,7 @@ static inline int local_sub_and_test(lon
*/
static inline int local_dec_and_test(local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_DEC "%0; sete %1"
- : "+m" (l->a.counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GEN_UNARY_RMWcc(_ASM_DEC, l->a.counter, "%0", "e");
}
/**
@@ -88,12 +78,7 @@ static inline int local_dec_and_test(loc
*/
static inline int local_inc_and_test(local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_INC "%0; sete %1"
- : "+m" (l->a.counter), "=qm" (c)
- : : "memory");
- return c != 0;
+ GEN_UNARY_RMWcc(_ASM_INC, l->a.counter, "%0", "e");
}
/**
@@ -107,12 +92,7 @@ static inline int local_inc_and_test(loc
*/
static inline int local_add_negative(long i, local_t *l)
{
- unsigned char c;
-
- asm volatile(_ASM_ADD "%2,%0; sets %1"
- : "+m" (l->a.counter), "=qm" (c)
- : "ir" (i) : "memory");
- return c;
+ GEN_BINARY_RMWcc(_ASM_ADD, l->a.counter, i, "%0", "s");
}
/**
--- /dev/null
+++ b/arch/x86/include/asm/rmwcc.h
@@ -0,0 +1,52 @@
+#ifndef _ASM_X86_RMWcc
+#define _ASM_X86_RMWcc
+
+#ifdef CC_HAVE_ASM_GOTO
+
+#define GEN_UNARY_RMWcc(op, var, arg0, cc) \
+do { \
+ asm volatile goto (op " " arg0 ";" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var) \
+ : "memory" : cc_label); \
+ return 0; \
+cc_label: \
+ return 1; \
+} while (0)
+
+#define GEN_BINARY_RMWcc(op, var, val, arg0, cc) \
+do { \
+ asm volatile goto (op " %1, " arg0 ";" \
+ "j" cc " %l[cc_label]" \
+ : : "m" (var), "er" (val) \
+ : "memory" : cc_label); \
+ return 0; \
+cc_label: \
+ return 1; \
+} while (0)
+
+#else /* !CC_HAVE_ASM_GOTO */
+
+#define GEN_UNARY_RMWcc(op, var, arg0, cc) \
+do { \
+ char c; \
+ asm volatile (op " " arg0 ";" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : : "memory"); \
+ return c != 0; \
+} while (0)
+
+#define GEN_BINARY_RMWcc(op, var, val, arg0, cc) \
+do { \
+ char c; \
+ asm volatile (op " %2, " arg0 ";" \
+ "set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : "er" (val) : "memory"); \
+ return c != 0; \
+} while (0)
+
+#endif /* CC_HAVE_ASM_GOTO */
+
+#endif /* _ASM_X86_RMWcc */
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-12 2:20 ` Peter Zijlstra
@ 2013-09-12 2:43 ` Linus Torvalds
2013-09-12 11:51 ` Peter Zijlstra
2013-09-13 7:25 ` Kevin Easton
1 sibling, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2013-09-12 2:43 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Wed, Sep 11, 2013 at 7:20 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> I split the thing up into two macros GEN_UNARY_RMWcc and
> GEN_BINARY_RMWcc which ends up being more readable as well as smaller
> code overall.
Yes, that looks like the right abstraction level. Powerful without
being complicated.
> I also attempted to convert asm/bitops.h, although I'm not sure it'll
> compile right with older GCCs due to the comment near BITOP_ADDR()
Actually, since you now have that memory clobber, it no longer
matters, and "m" is the right thing.
That said, the very same memory clobber may be what makes this whole
approach a loss, if it causes gcc to do tons of reloads or other
random things.
That memory clobber wasn't an issue for the whole
__preempt_count_dec_and_test() thing, because there we needed to keep
anything before the decrement contained anyway.
For other cases? Who knows.. A lot of the "change and test atomically"
things have the same containment need, so it might not be a problem.
> I might have to add the clobber to the macro arguments so we can do
> version without "memory" clobber, although bitops is inconsistent with
> that as well, __test_and_clear_bit() doesn't have a memory clobber but
> __test_and_change_bit() does.
You do need the memory clobber. The "asm goto" version cannot have
outputs, so "=m" and "+m" are both no-go, and so the only thing left
is that memory clobber, as far as I can see.
The bitops didn't use to need it, because they had that "+/=m" that
already tells gcc that the target is changed.
Linus
* Re: [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count
2013-09-11 13:34 ` Mike Galbraith
@ 2013-09-12 6:01 ` Mike Galbraith
0 siblings, 0 replies; 52+ messages in thread
From: Mike Galbraith @ 2013-09-12 6:01 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andy Lutomirski, Linus Torvalds, Ingo Molnar, Andi Kleen,
Peter Anvin, Thomas Gleixner, Arjan van de Ven,
Frederic Weisbecker, linux-kernel, linux-arch, Len Brown,
Vaidyanathan Srinivasan
On Wed, 2013-09-11 at 15:34 +0200, Mike Galbraith wrote:
> On Wed, 2013-09-11 at 13:06 +0200, Peter Zijlstra wrote:
> > Mike does it fix the funny you saw?
>
> Yup, modulo test_need_resched() not existing in master.
...
> while I haven't yet booted Q6600 box, core2
> Toshiba Satellite lappy is using mwait_idle_with_hints() in master.
Heh. Core2 lappy uses mwait_idle_with_hints().. unless you boot with
processor.max_cstate=1 so it doesn't use the piece of shite hpet. Q6600
always has idle woes, seemingly because the box doesn't do a deep enough
cstate to make something (acpi?) happy, despite mwait_idle() having
worked fine for years.
Seems I need to find the cstate dependency spot, and rap it upside the
head if I want my core2 boxen to regain happy-camper status.
-Mike
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-12 2:43 ` Linus Torvalds
@ 2013-09-12 11:51 ` Peter Zijlstra
2013-09-12 12:25 ` Ingo Molnar
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-12 11:51 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
On Wed, Sep 11, 2013 at 07:43:42PM -0700, Linus Torvalds wrote:
> On Wed, Sep 11, 2013 at 7:20 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > I split the thing up into two macros GEN_UNARY_RMWcc and
> > GEN_BINARY_RMWcc which ends up being more readable as well as smaller
> > code overall.
>
> Yes, that looks like the right abstraction level. Powerful without
> being complicated.
> That said, the very same memory clobber may be what makes this whole
> approach a loss, if it causes gcc to do tons of reloads or other
> random things.
Random things below, not sure why
> For other cases? Who knows.. A lot of the "change and test atomically"
> things have the same containment need, so it might not be a problem.
Right, all atomic ops already contain a memory clobber to go along with
their memory barrier semantics.
So I did a defconfig build without the patch and with the patch and got
a significant size increase:
text data bss filename
11173443 1423736 1183744 defconfig-build/vmlinux.before
11174516 1423736 1183744 defconfig-build/vmlinux.after
I then undid all the bitop conversions that added the extra memory
clobber and got:
11174204 1423736 1183744 defconfig-build/vmlinux.after
Which is still a significant increase, so I had a look at what GCC
generates for mm/rmap.c, which uses quite a few atomic_*_and_test()
functions; I got:
text data bss dec hex filename
8675 1 16 8692 21f4 defconfig-build/mm/rmap.o
8739 1 16 8756 2234 defconfig-build/mm/rmap.o
So the increase is there too. Doing an objdump -D on them, the first
difference is:
0000000000000660 <do_page_add_anon_rmap>:
660: 55 push %rbp
661: 48 89 e5 mov %rsp,%rbp
664: 48 83 ec 20 sub $0x20,%rsp
668: 48 89 5d f0 mov %rbx,-0x10(%rbp)
66c: 4c 89 65 f8 mov %r12,-0x8(%rbp)
670: 48 89 fb mov %rdi,%rbx
673: f0 ff 47 18 lock incl 0x18(%rdi)
677: 0f 94 c0 sete %al
67a: 84 c0 test %al,%al
67c: 75 12 jne 690 <do_page_add_anon_rmap+0x30>
67e: 48 8b 5d f0 mov -0x10(%rbp),%rbx
682: 4c 8b 65 f8 mov -0x8(%rbp),%r12
686: c9 leaveq
vs.:
0000000000000660 <do_page_add_anon_rmap>:
660: 55 push %rbp
661: 48 89 e5 mov %rsp,%rbp
664: 48 83 ec 20 sub $0x20,%rsp
668: 48 89 5d e0 mov %rbx,-0x20(%rbp)
66c: 4c 89 65 e8 mov %r12,-0x18(%rbp)
670: 48 89 fb mov %rdi,%rbx
673: 4c 89 6d f0 mov %r13,-0x10(%rbp)
677: 4c 89 75 f8 mov %r14,-0x8(%rbp)
67b: f0 ff 47 18 lock incl 0x18(%rdi)
67f: 74 17 je 698 <do_page_add_anon_rmap+0x38>
681: 48 8b 5d e0 mov -0x20(%rbp),%rbx
685: 4c 8b 65 e8 mov -0x18(%rbp),%r12
689: 4c 8b 6d f0 mov -0x10(%rbp),%r13
68d: 4c 8b 75 f8 mov -0x8(%rbp),%r14
691: c9 leaveq
For some obscure (to me) reason the newfangled asm goto construct
generates a bunch of extra MOVs.
OTOH the good news is that a kernel with that patch applied does indeed
boot properly on real hardware.
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-12 11:51 ` Peter Zijlstra
@ 2013-09-12 12:25 ` Ingo Molnar
0 siblings, 0 replies; 52+ messages in thread
From: Ingo Molnar @ 2013-09-12 12:25 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linus Torvalds, Andi Kleen, Peter Anvin, Mike Galbraith,
Thomas Gleixner, Arjan van de Ven, Frederic Weisbecker,
Linux Kernel Mailing List, linux-arch
* Peter Zijlstra <peterz@infradead.org> wrote:
> So the increase is there too, doing a objdump -D on them the first
> difference is:
>
> 0000000000000660 <do_page_add_anon_rmap>:
> 660: 55 push %rbp
> 661: 48 89 e5 mov %rsp,%rbp
> 664: 48 83 ec 20 sub $0x20,%rsp
> 668: 48 89 5d f0 mov %rbx,-0x10(%rbp)
> 66c: 4c 89 65 f8 mov %r12,-0x8(%rbp)
> 670: 48 89 fb mov %rdi,%rbx
> 673: f0 ff 47 18 lock incl 0x18(%rdi)
> 677: 0f 94 c0 sete %al
> 67a: 84 c0 test %al,%al
> 67c: 75 12 jne 690 <do_page_add_anon_rmap+0x30>
> 67e: 48 8b 5d f0 mov -0x10(%rbp),%rbx
> 682: 4c 8b 65 f8 mov -0x8(%rbp),%r12
> 686: c9 leaveq
>
> vs.:
>
> 0000000000000660 <do_page_add_anon_rmap>:
> 660: 55 push %rbp
> 661: 48 89 e5 mov %rsp,%rbp
> 664: 48 83 ec 20 sub $0x20,%rsp
> 668: 48 89 5d e0 mov %rbx,-0x20(%rbp)
> 66c: 4c 89 65 e8 mov %r12,-0x18(%rbp)
> 670: 48 89 fb mov %rdi,%rbx
> 673: 4c 89 6d f0 mov %r13,-0x10(%rbp)
> 677: 4c 89 75 f8 mov %r14,-0x8(%rbp)
> 67b: f0 ff 47 18 lock incl 0x18(%rdi)
> 67f: 74 17 je 698 <do_page_add_anon_rmap+0x38>
> 681: 48 8b 5d e0 mov -0x20(%rbp),%rbx
> 685: 4c 8b 65 e8 mov -0x18(%rbp),%r12
> 689: 4c 8b 6d f0 mov -0x10(%rbp),%r13
> 68d: 4c 8b 75 f8 mov -0x8(%rbp),%r14
> 691: c9 leaveq
>
> For some obscure (to me) reason the new fangled asm goto construct
> generates a bunch of extra MOVs.
It adds two pairs of MOVs showing that R13 and R14 got clobbered, but
the change also got rid of a SETE and a TEST here:
> 673: f0 ff 47 18 lock incl 0x18(%rdi)
> 677: 0f 94 c0 sete %al
> 67a: 84 c0 test %al,%al
> 67c: 75 12 jne 690 <do_page_add_anon_rmap+0x30>
so there's a slight increase in size, but the extra instructions look
rather lightweight and it could all go away if asm goto is improved ...
It would all be very sweet if all those clobbers went away.
Thanks,
Ingo
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-12 2:20 ` Peter Zijlstra
2013-09-12 2:43 ` Linus Torvalds
@ 2013-09-13 7:25 ` Kevin Easton
2013-09-13 8:06 ` Peter Zijlstra
1 sibling, 1 reply; 52+ messages in thread
From: Kevin Easton @ 2013-09-13 7:25 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
Frederic Weisbecker, LKML
On Thu, Sep 12, 2013 at 04:20:40AM +0200, Peter Zijlstra wrote:
> On Wed, Sep 11, 2013 at 04:02:14PM -0700, Linus Torvalds wrote:
> > On Wed, Sep 11, 2013 at 11:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > OK, stripped it down further, I couldn't quite see how to collapse the
> > > unary and binary operator variants though :/
> >
> > Ok, this looks pretty good. I assume it works too? ;)
>
> Only compile tested that one.. the below is kvm boot tested until root
> mount -- I'll try on actual hardware when I've gotten some sleep.
>
> I split the thing up into two macros GEN_UNARY_RMWcc and
> GEN_BINARY_RMWcc which ends up being more readable as well as smaller
> code overall.
If you wanted to collapse the unary and binary variants as you mentioned
upthread, you could do something like (for the CC_HAVE_ASM_GOTO case):
#define GEN_RMWcc(fullop, cc, ...) \
do { \
asm volatile goto (fullop \
"j" cc " %l[cc_label]" \
: : __VA_ARGS__ \
: "memory" : cc_label); \
return 0; \
cc_label: \
return 1; \
} while (0)
#define GEN_UNARY_RMWcc(op, var, arg0, cc) GEN_RMWcc(op " " arg0 ";", cc, "m" (var))
#define GEN_BINARY_RMWcc(op, var, val, arg0, cc) GEN_RMWcc(op " %1, " arg0 ";", cc, "m" (var), "er" (val))
- Kevin
* Re: [PATCH 0/7] preempt_count rework -v2
2013-09-13 7:25 ` Kevin Easton
@ 2013-09-13 8:06 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2013-09-13 8:06 UTC (permalink / raw)
To: Kevin Easton
Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Peter Anvin,
Mike Galbraith, Thomas Gleixner, Arjan van de Ven,
Frederic Weisbecker, LKML
On Fri, Sep 13, 2013 at 05:25:00PM +1000, Kevin Easton wrote:
> If you wanted to collapse the unary and binary variants as you mentioned
> upthread, you could do something like (for the CC_HAVE_ASM_GOTO case):
Indeed, another 11 lines off, awesome!
--- /dev/null
+++ b/arch/x86/include/asm/rmwcc.h
@@ -0,0 +1,41 @@
+#ifndef _ASM_X86_RMWcc
+#define _ASM_X86_RMWcc
+
+#ifdef CC_HAVE_ASM_GOTO
+
+#define __GEN_RMWcc(fullop, var, cc, ...) \
+do { \
+ asm volatile goto (fullop "; j" cc " %l[cc_label]" \
+ : : "m" (var), ## __VA_ARGS__ \
+ : "memory" : cc_label); \
+ return 0; \
+cc_label: \
+ return 1; \
+} while (0)
+
+#define GEN_UNARY_RMWcc(op, var, arg0, cc) \
+ __GEN_RMWcc(op " " arg0, var, cc)
+
+#define GEN_BINARY_RMWcc(op, var, val, arg0, cc) \
+ __GEN_RMWcc(op " %1, " arg0, var, cc, "er" (val))
+
+#else /* !CC_HAVE_ASM_GOTO */
+
+#define __GEN_RMWcc(fullop, var, cc, ...) \
+do { \
+ char c; \
+ asm volatile (fullop "; set" cc " %1" \
+ : "+m" (var), "=qm" (c) \
+ : __VA_ARGS__ : "memory"); \
+ return c != 0; \
+} while (0)
+
+#define GEN_UNARY_RMWcc(op, var, arg0, cc) \
+ __GEN_RMWcc(op " " arg0, var, cc)
+
+#define GEN_BINARY_RMWcc(op, var, val, arg0, cc) \
+ __GEN_RMWcc(op " %2, " arg0, var, cc, "er" (val))
+
+#endif /* CC_HAVE_ASM_GOTO */
+
+#endif /* _ASM_X86_RMWcc */
Thread overview: 52+ messages
2013-09-10 13:08 [PATCH 0/7] preempt_count rework -v2 Peter Zijlstra
2013-09-10 13:08 ` [PATCH 1/7] sched: Introduce preempt_count accessor functions Peter Zijlstra
2013-09-10 13:08 ` [PATCH 2/7] sched: Add NEED_RESCHED to the preempt_count Peter Zijlstra
2013-09-11 1:59 ` Andy Lutomirski
2013-09-11 8:25 ` Peter Zijlstra
2013-09-11 11:06 ` Peter Zijlstra
2013-09-11 13:34 ` Mike Galbraith
2013-09-12 6:01 ` Mike Galbraith
2013-09-11 16:35 ` Andy Lutomirski
2013-09-11 18:05 ` Peter Zijlstra
2013-09-11 18:07 ` Andy Lutomirski
2013-09-11 11:14 ` Peter Zijlstra
2013-09-10 13:08 ` [PATCH 3/7] sched, arch: Create asm/preempt.h Peter Zijlstra
2013-09-10 13:08 ` [PATCH 4/7] sched: Create more preempt_count accessors Peter Zijlstra
2013-09-10 13:08 ` [PATCH 5/7] sched: Extract the basic add/sub preempt_count modifiers Peter Zijlstra
2013-09-10 13:08 ` [PATCH 6/7] sched, x86: Provide a per-cpu preempt_count implementation Peter Zijlstra
2013-09-10 13:27 ` Peter Zijlstra
2013-09-10 14:02 ` Eric Dumazet
2013-09-10 15:25 ` Peter Zijlstra
2013-09-10 16:48 ` Peter Zijlstra
2013-09-10 13:08 ` [PATCH 7/7] sched, x86: Optimize the preempt_schedule() call Peter Zijlstra
2013-09-10 13:42 ` Ingo Molnar
2013-09-10 13:55 ` Jan Beulich
2013-09-10 13:55 ` Jan Beulich
2013-09-10 14:25 ` Ingo Molnar
2013-09-10 13:51 ` [PATCH 0/7] preempt_count rework -v2 Ingo Molnar
2013-09-10 13:56 ` Ingo Molnar
2013-09-10 15:14 ` Peter Zijlstra
2013-09-10 15:29 ` Arjan van de Ven
2013-09-10 15:35 ` Peter Zijlstra
2013-09-10 16:24 ` Linus Torvalds
2013-09-11 16:00 ` H. Peter Anvin
2013-09-10 16:34 ` Linus Torvalds
2013-09-10 16:45 ` Peter Zijlstra
2013-09-10 17:06 ` Linus Torvalds
2013-09-10 21:25 ` Peter Zijlstra
2013-09-10 21:43 ` Linus Torvalds
2013-09-10 21:51 ` H. Peter Anvin
2013-09-10 22:02 ` Linus Torvalds
2013-09-10 22:06 ` H. Peter Anvin
2013-09-11 13:13 ` Peter Zijlstra
2013-09-11 13:26 ` Peter Zijlstra
2013-09-11 15:29 ` H. Peter Anvin
2013-09-11 15:33 ` Linus Torvalds
2013-09-11 18:59 ` Peter Zijlstra
2013-09-11 23:02 ` Linus Torvalds
2013-09-12 2:20 ` Peter Zijlstra
2013-09-12 2:43 ` Linus Torvalds
2013-09-12 11:51 ` Peter Zijlstra
2013-09-12 12:25 ` Ingo Molnar
2013-09-13 7:25 ` Kevin Easton
2013-09-13 8:06 ` Peter Zijlstra