* [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE
@ 2015-09-30 7:10 Peter Zijlstra
2015-09-30 7:10 ` [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
` (11 more replies)
0 siblings, 12 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
Second posting of the series that kills PREEMPT_ACTIVE dead. Thanks for all the
feedback.
I've re-ordered the patches slightly and reworked one or two; for those that
got significant changes I've dropped the Reviewed-by tags.
Please have another careful look.
---
arch/x86/include/asm/preempt.h | 5 +--
arch/x86/include/asm/thread_info.h | 2 -
arch/x86/kernel/process_32.c | 8 ----
arch/x86/kernel/process_64.c | 8 ----
include/asm-generic/preempt.h | 2 +-
include/linux/preempt.h | 20 +--------
include/linux/sched.h | 26 +++++++-----
include/trace/events/sched.h | 22 +++++-----
kernel/exit.c | 4 +-
kernel/sched/core.c | 83 +++++++++++++++++++++++++-------------
kernel/trace/ftrace.c | 2 +-
kernel/trace/trace_sched_switch.c | 3 +-
kernel/trace/trace_sched_wakeup.c | 2 +-
13 files changed, 89 insertions(+), 98 deletions(-)
* [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT
2015-09-30 7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
@ 2015-09-30 7:10 ` Peter Zijlstra
2015-09-30 9:04 ` Steven Rostedt
2015-10-01 13:25 ` Frederic Weisbecker
2015-09-30 7:10 ` [PATCH v2 02/12] sched: Rework TASK_DEAD preemption exception Peter Zijlstra
` (10 subsequent siblings)
11 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
As per commit d86ee4809d03 ("sched: optimize cond_resched()") we need
PREEMPT_ACTIVE to prevent cond_resched() from working before the
scheduler is set up.
However, keeping preemption disabled should do the same thing already,
making the PREEMPT_ACTIVE part entirely redundant.
The only complication is !PREEMPT_COUNT kernels, where
PREEMPT_DISABLED ends up being 0. Instead we use an unconditional
PREEMPT_OFFSET to set preempt_count() even on !PREEMPT_COUNT kernels.
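For illustration, a sketch (not part of the patch) of why a plain non-zero
count is enough: the should_resched()-style check treats any non-zero
preempt_count() as "no", so with INIT_PREEMPT_COUNT == PREEMPT_OFFSET in
place, cond_resched() stays a no-op until init_idle() resets the count. The
helper below is a simplified model, not the real implementation:

static inline bool should_resched_sketch(void)
{
	/* preempt_count() == PREEMPT_OFFSET until sched_init()->init_idle() */
	return !preempt_count() && tif_need_resched();
}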
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/sched.h | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -606,19 +606,18 @@ struct task_cputime_atomic {
#endif
/*
- * Disable preemption until the scheduler is running.
- * Reset by start_kernel()->sched_init()->init_idle().
+ * Disable preemption until the scheduler is running -- use an unconditional
+ * value so that it also works on !PREEMPT_COUNT kernels.
*
- * We include PREEMPT_ACTIVE to avoid cond_resched() from working
- * before the scheduler is active -- see should_resched().
+ * Reset by start_kernel()->sched_init()->init_idle().
*/
-#define INIT_PREEMPT_COUNT (PREEMPT_DISABLED + PREEMPT_ACTIVE)
+#define INIT_PREEMPT_COUNT PREEMPT_OFFSET
/**
* struct thread_group_cputimer - thread group interval timer counts
* @cputime_atomic: atomic thread group interval timers.
* @running: non-zero when there are timers running and
- * @cputime receives updates.
+ * @cputime receives updates.
*
* This structure contains the version of task_cputime, above, that is
* used for thread group CPU timer calculations.
* [PATCH v2 02/12] sched: Rework TASK_DEAD preemption exception
2015-09-30 7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
2015-09-30 7:10 ` [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
@ 2015-09-30 7:10 ` Peter Zijlstra
2015-09-30 7:10 ` [PATCH v2 03/12] sched: Create preempt_count invariant Peter Zijlstra
` (9 subsequent siblings)
11 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
TASK_DEAD is special in that the final schedule call from do_exit()
must be done with preemption disabled.
This means we end up scheduling with a preempt_count() higher than
usual (3 instead of the 'expected' 2). Since future patches will want
to rely on an invariant preempt_count() value during schedule, fix
this up.
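For reference, a sketch (annotation, not from the patch) of where the three
counts come from on a PREEMPT_COUNT kernel:

do_exit()
	preempt_disable();			/* 1 -- the TASK_DEAD exception */
	schedule()
		preempt_disable();		/* 2 -- the regular schedule count */
		__schedule()
			raw_spin_lock_irq(&rq->lock);	/* 3 */

The hunk below drops the extra count again before schedule_debug() runs,
restoring the expected value of 2.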
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2949,12 +2949,8 @@ static inline void schedule_debug(struct
#ifdef CONFIG_SCHED_STACK_END_CHECK
BUG_ON(unlikely(task_stack_end_corrupted(prev)));
#endif
- /*
- * Test if we are atomic. Since do_exit() needs to call into
- * schedule() atomically, we ignore that path. Otherwise whine
- * if we are scheduling when we should not.
- */
- if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD))
+
+ if (unlikely(in_atomic_preempt_off()))
__schedule_bug(prev);
rcu_sleep_check();
@@ -3053,6 +3049,17 @@ static void __sched __schedule(void)
rcu_note_context_switch();
prev = rq->curr;
+ /*
+ * do_exit() calls schedule() with preemption disabled as an exception;
+ * however we must fix that up, otherwise the next task will see an
+ * inconsistent (higher) preempt count.
+ *
+ * It also prevents the schedule_debug() test below from complaining
+ * about this.
+ */
+ if (unlikely(prev->state == TASK_DEAD))
+ preempt_enable_no_resched_notrace();
+
schedule_debug(prev);
if (sched_feat(HRTICK))
* [PATCH v2 03/12] sched: Create preempt_count invariant
2015-09-30 7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
2015-09-30 7:10 ` [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
2015-09-30 7:10 ` [PATCH v2 02/12] sched: Rework TASK_DEAD preemption exception Peter Zijlstra
@ 2015-09-30 7:10 ` Peter Zijlstra
2015-09-30 9:32 ` Steven Rostedt
2015-09-30 7:10 ` [PATCH v2 04/12] sched: Add preempt argument to __schedule() Peter Zijlstra
` (8 subsequent siblings)
11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
All preempt_count() numbers below are in units of PREEMPT_DISABLE_OFFSET.
Now that TASK_DEAD no longer results in preempt_count() == 3 during
scheduling, we will always call context_switch() with preempt_count()
== 2.
However, we don't always end up with preempt_count() == 2 in
finish_task_switch() because new tasks get created with
preempt_count() == 1.
Create FORK_PREEMPT_COUNT, set it to 2, and use it in the right
places. Note that we cannot use INIT_PREEMPT_COUNT as that serves
another purpose (boot).
After this, preempt_count() is invariant across the context switch,
with exception of PREEMPT_ACTIVE.
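A sketch (annotation, not part of the patch) of the child side; a fresh task
enters through schedule_tail() rather than schedule(), which is why it must
start at FORK_PREEMPT_COUNT:

ret_from_fork				/* preempt_count() == FORK_PREEMPT_COUNT (2) */
	schedule_tail(prev)
		finish_task_switch(prev);	/* drops rq->lock:     2 -> 1 */
		preempt_enable();		/* enables preemption: 1 -> 0 */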
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/preempt.h | 2 +-
include/asm-generic/preempt.h | 2 +-
include/linux/sched.h | 17 ++++++++++++-----
kernel/sched/core.c | 23 +++++++++++++++++++++--
4 files changed, 35 insertions(+), 9 deletions(-)
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -31,7 +31,7 @@ static __always_inline void preempt_coun
* must be macros to avoid header recursion hell
*/
#define init_task_preempt_count(p) do { \
- task_thread_info(p)->saved_preempt_count = PREEMPT_DISABLED; \
+ task_thread_info(p)->saved_preempt_count = FORK_PREEMPT_COUNT; \
} while (0)
#define init_idle_preempt_count(p, cpu) do { \
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -24,7 +24,7 @@ static __always_inline void preempt_coun
* must be macros to avoid header recursion hell
*/
#define init_task_preempt_count(p) do { \
- task_thread_info(p)->preempt_count = PREEMPT_DISABLED; \
+ task_thread_info(p)->preempt_count = FORK_PREEMPT_COUNT; \
} while (0)
#define init_idle_preempt_count(p, cpu) do { \
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -599,11 +599,7 @@ struct task_cputime_atomic {
.sum_exec_runtime = ATOMIC64_INIT(0), \
}
-#ifdef CONFIG_PREEMPT_COUNT
-#define PREEMPT_DISABLED (1 + PREEMPT_ENABLED)
-#else
-#define PREEMPT_DISABLED PREEMPT_ENABLED
-#endif
+#define PREEMPT_DISABLED (PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
/*
* Disable preemption until the scheduler is running -- use an unconditional
@@ -613,6 +609,17 @@ struct task_cputime_atomic {
*/
#define INIT_PREEMPT_COUNT PREEMPT_OFFSET
+/*
+ * Initial preempt_count value; reflects the preempt_count schedule invariant
+ * which states that during context switches:
+ *
+ * preempt_count() == 2*PREEMPT_DISABLE_OFFSET
+ *
+ * Note: PREEMPT_DISABLE_OFFSET is 0 for !PREEMPT_COUNT kernels.
+ * Note: See finish_task_switch().
+ */
+#define FORK_PREEMPT_COUNT (2*PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
+
/**
* struct thread_group_cputimer - thread group interval timer counts
* @cputime_atomic: atomic thread group interval timers.
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2504,6 +2504,18 @@ static struct rq *finish_task_switch(str
struct mm_struct *mm = rq->prev_mm;
long prev_state;
+ /*
+ * The previous task will have left us with a preempt_count of 2
+ * because it left us after:
+ *
+ * schedule()
+ * preempt_disable(); // 1
+ * __schedule()
+ * raw_spin_lock_irq(&rq->lock) // 2
+ *
+ * Also, see FORK_PREEMPT_COUNT.
+ */
+
rq->prev_mm = NULL;
/*
@@ -2588,8 +2600,15 @@ asmlinkage __visible void schedule_tail(
{
struct rq *rq;
- /* finish_task_switch() drops rq->lock and enables preemtion */
- preempt_disable();
+ /*
+ * New tasks start with FORK_PREEMPT_COUNT, see there and
+ * finish_task_switch() for details.
+ *
+ * finish_task_switch() will drop rq->lock and lower preempt_count
+ * and the preempt_enable() will end up enabling preemption (on
+ * PREEMPT_COUNT kernels).
+ */
+
rq = finish_task_switch(prev);
balance_callback(rq);
preempt_enable();
* [PATCH v2 04/12] sched: Add preempt argument to __schedule()
2015-09-30 7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
` (2 preceding siblings ...)
2015-09-30 7:10 ` [PATCH v2 03/12] sched: Create preempt_count invariant Peter Zijlstra
@ 2015-09-30 7:10 ` Peter Zijlstra
2015-09-30 7:10 ` [PATCH v2 05/12] sched: Fix trace_sched_switch() Peter Zijlstra
` (7 subsequent siblings)
11 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
There is only a single PREEMPT_ACTIVE use in the regular __schedule()
path and that is to circumvent the task->state check. Since the code
setting PREEMPT_ACTIVE is the immediate caller of __schedule() we can
replace this with a function argument.
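The preemption path must skip the ->state check because a task can be
preempted between setting its state and calling schedule(); dequeueing it
there would lose the wakeup. A sketch of the classic pattern being protected
(illustrative only, not from the patch):

set_current_state(TASK_UNINTERRUPTIBLE);
/*
 * A preemption here runs __schedule(true): ->state is ignored and the
 * task stays on the runqueue.
 */
if (!condition)
	schedule();	/* voluntary: __schedule(false) may dequeue */
__set_current_state(TASK_RUNNING);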
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3037,7 +3037,7 @@ pick_next_task(struct rq *rq, struct tas
*
* WARNING: must be called with preemption disabled!
*/
-static void __sched __schedule(void)
+static void __sched __schedule(bool preempt)
{
struct task_struct *prev, *next;
unsigned long *switch_count;
@@ -3077,7 +3077,7 @@ static void __sched __schedule(void)
rq->clock_skip_update <<= 1; /* promote REQ to ACT */
switch_count = &prev->nivcsw;
- if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
+ if (!preempt && prev->state) {
if (unlikely(signal_pending_state(prev->state, prev))) {
prev->state = TASK_RUNNING;
} else {
@@ -3142,7 +3142,7 @@ asmlinkage __visible void __sched schedu
sched_submit_work(tsk);
do {
preempt_disable();
- __schedule();
+ __schedule(false);
sched_preempt_enable_no_resched();
} while (need_resched());
}
@@ -3183,7 +3183,7 @@ static void __sched notrace preempt_sche
{
do {
preempt_active_enter();
- __schedule();
+ __schedule(true);
preempt_active_exit();
/*
@@ -3248,7 +3248,7 @@ asmlinkage __visible void __sched notrac
* an infinite recursion.
*/
prev_ctx = exception_enter();
- __schedule();
+ __schedule(true);
exception_exit(prev_ctx);
barrier();
@@ -3277,7 +3277,7 @@ asmlinkage __visible void __sched preemp
do {
preempt_active_enter();
local_irq_enable();
- __schedule();
+ __schedule(true);
local_irq_disable();
preempt_active_exit();
} while (need_resched());
* [PATCH v2 05/12] sched: Fix trace_sched_switch()
2015-09-30 7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
` (3 preceding siblings ...)
2015-09-30 7:10 ` [PATCH v2 04/12] sched: Add preempt argument to __schedule() Peter Zijlstra
@ 2015-09-30 7:10 ` Peter Zijlstra
2015-09-30 7:10 ` [PATCH v2 06/12] sched: Stop setting PREEMPT_ACTIVE Peter Zijlstra
` (6 subsequent siblings)
11 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
__trace_sched_switch_state() is the last remaining PREEMPT_ACTIVE
user; move trace_sched_switch() from prepare_task_switch() to
__schedule() and propagate the @preempt argument.
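Every probe attached to this tracepoint must grow the new argument; a
hypothetical out-of-tree probe (names are mine) would be converted just like
the in-kernel probes in the diff below:

static void my_sched_switch_probe(void *data, bool preempt,
				  struct task_struct *prev,
				  struct task_struct *next)
{
	/* preempt == true: prev was preempted and is still runnable */
}

	/* at init time: */
	register_trace_sched_switch(my_sched_switch_probe, NULL);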
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/trace/events/sched.h | 22 +++++++++-------------
kernel/sched/core.c | 2 +-
kernel/trace/ftrace.c | 2 +-
kernel/trace/trace_sched_switch.c | 3 ++-
kernel/trace/trace_sched_wakeup.c | 2 +-
5 files changed, 14 insertions(+), 17 deletions(-)
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -104,22 +104,17 @@ DEFINE_EVENT(sched_wakeup_template, sche
TP_ARGS(p));
#ifdef CREATE_TRACE_POINTS
-static inline long __trace_sched_switch_state(struct task_struct *p)
+static inline long __trace_sched_switch_state(bool preempt, struct task_struct *p)
{
- long state = p->state;
-
-#ifdef CONFIG_PREEMPT
#ifdef CONFIG_SCHED_DEBUG
BUG_ON(p != current);
#endif /* CONFIG_SCHED_DEBUG */
+
/*
- * For all intents and purposes a preempted task is a running task.
+ * Preemption ignores task state, therefore preempted tasks are always
+ * RUNNING (we will not have dequeued if state != RUNNING).
*/
- if (preempt_count() & PREEMPT_ACTIVE)
- state = TASK_RUNNING | TASK_STATE_MAX;
-#endif /* CONFIG_PREEMPT */
-
- return state;
+ return preempt ? TASK_RUNNING | TASK_STATE_MAX : p->state;
}
#endif /* CREATE_TRACE_POINTS */
@@ -128,10 +123,11 @@ static inline long __trace_sched_switch_
*/
TRACE_EVENT(sched_switch,
- TP_PROTO(struct task_struct *prev,
+ TP_PROTO(bool preempt,
+ struct task_struct *prev,
struct task_struct *next),
- TP_ARGS(prev, next),
+ TP_ARGS(preempt, prev, next),
TP_STRUCT__entry(
__array( char, prev_comm, TASK_COMM_LEN )
@@ -147,7 +143,7 @@ TRACE_EVENT(sched_switch,
memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
__entry->prev_pid = prev->pid;
__entry->prev_prio = prev->prio;
- __entry->prev_state = __trace_sched_switch_state(prev);
+ __entry->prev_state = __trace_sched_switch_state(preempt, prev);
memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
__entry->next_pid = next->pid;
__entry->next_prio = next->prio;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2470,7 +2470,6 @@ static inline void
prepare_task_switch(struct rq *rq, struct task_struct *prev,
struct task_struct *next)
{
- trace_sched_switch(prev, next);
sched_info_switch(rq, prev, next);
perf_event_task_sched_out(prev, next);
fire_sched_out_preempt_notifiers(prev, next);
@@ -3121,6 +3120,7 @@ static void __sched __schedule(bool pree
rq->curr = next;
++*switch_count;
+ trace_sched_switch(preempt, prev, next);
rq = context_switch(rq, prev, next); /* unlocks the rq */
cpu = cpu_of(rq);
} else {
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5697,7 +5697,7 @@ static int alloc_retstack_tasklist(struc
}
static void
-ftrace_graph_probe_sched_switch(void *ignore,
+ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
struct task_struct *prev, struct task_struct *next)
{
unsigned long long timestamp;
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -16,7 +16,8 @@ static int sched_ref;
static DEFINE_MUTEX(sched_register_mutex);
static void
-probe_sched_switch(void *ignore, struct task_struct *prev, struct task_struct *next)
+probe_sched_switch(void *ignore, bool preempt,
+ struct task_struct *prev, struct task_struct *next)
{
if (unlikely(!sched_ref))
return;
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -420,7 +420,7 @@ tracing_sched_wakeup_trace(struct trace_
}
static void notrace
-probe_wakeup_sched_switch(void *ignore,
+probe_wakeup_sched_switch(void *ignore, bool preempt,
struct task_struct *prev, struct task_struct *next)
{
struct trace_array_cpu *data;
* [PATCH v2 06/12] sched: Stop setting PREEMPT_ACTIVE
2015-09-30 7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
` (4 preceding siblings ...)
2015-09-30 7:10 ` [PATCH v2 05/12] sched: Fix trace_sched_switch() Peter Zijlstra
@ 2015-09-30 7:10 ` Peter Zijlstra
2015-10-01 15:27 ` Frederic Weisbecker
2015-09-30 7:10 ` [PATCH v2 07/12] sched: Robustify preemption leak checks Peter Zijlstra
` (5 subsequent siblings)
11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
Now that nothing tests for PREEMPT_ACTIVE anymore, stop setting it.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/preempt.h | 12 ------------
kernel/sched/core.c | 19 ++++++-------------
2 files changed, 6 insertions(+), 25 deletions(-)
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -146,18 +146,6 @@ extern void preempt_count_sub(int val);
#define preempt_count_inc() preempt_count_add(1)
#define preempt_count_dec() preempt_count_sub(1)
-#define preempt_active_enter() \
-do { \
- preempt_count_add(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET); \
- barrier(); \
-} while (0)
-
-#define preempt_active_exit() \
-do { \
- barrier(); \
- preempt_count_sub(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET); \
-} while (0)
-
#ifdef CONFIG_PREEMPT_COUNT
#define preempt_disable() \
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3190,9 +3190,9 @@ void __sched schedule_preempt_disabled(v
static void __sched notrace preempt_schedule_common(void)
{
do {
- preempt_active_enter();
+ preempt_disable();
__schedule(true);
- preempt_active_exit();
+ sched_preempt_enable_no_resched();
/*
* Check again in case we missed a preemption opportunity
@@ -3243,13 +3243,7 @@ asmlinkage __visible void __sched notrac
return;
do {
- /*
- * Use raw __prempt_count() ops that don't call function.
- * We can't call functions before disabling preemption which
- * disarm preemption tracing recursions.
- */
- __preempt_count_add(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET);
- barrier();
+ preempt_disable_notrace();
/*
* Needs preempt disabled in case user_exit() is traced
* and the tracer calls preempt_enable_notrace() causing
@@ -3259,8 +3253,7 @@ asmlinkage __visible void __sched notrac
__schedule(true);
exception_exit(prev_ctx);
- barrier();
- __preempt_count_sub(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET);
+ preempt_enable_no_resched_notrace();
} while (need_resched());
}
EXPORT_SYMBOL_GPL(preempt_schedule_notrace);
@@ -3283,11 +3276,11 @@ asmlinkage __visible void __sched preemp
prev_state = exception_enter();
do {
- preempt_active_enter();
+ preempt_disable();
local_irq_enable();
__schedule(true);
local_irq_disable();
- preempt_active_exit();
+ sched_preempt_enable_no_resched();
} while (need_resched());
exception_exit(prev_state);
* [PATCH v2 07/12] sched: Robustify preemption leak checks
2015-09-30 7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
` (5 preceding siblings ...)
2015-09-30 7:10 ` [PATCH v2 06/12] sched: Stop setting PREEMPT_ACTIVE Peter Zijlstra
@ 2015-09-30 7:10 ` Peter Zijlstra
2015-09-30 9:35 ` Steven Rostedt
2015-09-30 7:10 ` [PATCH v2 08/12] sched: Simplify preempt_count tests Peter Zijlstra
` (4 subsequent siblings)
11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
When we warn about a preempt_count leak, reset the preempt_count to
the known good value such that the problem does not ripple forward.
This is most important on x86, which has a per-cpu preempt_count that is
not saved/restored (after this series). So if you schedule with an
invalid (!2*PREEMPT_DISABLE_OFFSET) preempt_count, the next task is
messed up too.
Enforcing this invariant limits the borkage to just the one task.
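A sketch of the bug class this confines (hypothetical driver code, not from
the series):

static void buggy_path(bool error)
{
	preempt_disable();
	if (error)
		return;		/* leaks one preempt_count */
	preempt_enable();
}

Once the per-cpu count is no longer saved/restored across switches, such a
leak would otherwise bleed into whichever task runs next; resetting to the
known good value confines the damage to the offending task.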
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/exit.c | 4 +++-
kernel/sched/core.c | 4 +++-
2 files changed, 6 insertions(+), 2 deletions(-)
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -706,10 +706,12 @@ void do_exit(long code)
smp_mb();
raw_spin_unlock_wait(&tsk->pi_lock);
- if (unlikely(in_atomic()))
+ if (unlikely(in_atomic())) {
pr_info("note: %s[%d] exited with preempt_count %d\n",
current->comm, task_pid_nr(current),
preempt_count());
+ preempt_count_set(PREEMPT_ENABLED);
+ }
/* sync mm's RSS info before statistics gathering */
if (tsk->mm)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2968,8 +2968,10 @@ static inline void schedule_debug(struct
BUG_ON(unlikely(task_stack_end_corrupted(prev)));
#endif
- if (unlikely(in_atomic_preempt_off()))
+ if (unlikely(in_atomic_preempt_off())) {
__schedule_bug(prev);
+ preempt_count_set(PREEMPT_DISABLED);
+ }
rcu_sleep_check();
profile_hit(SCHED_PROFILING, __builtin_return_address(0));
* [PATCH v2 08/12] sched: Simplify preempt_count tests
2015-09-30 7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
` (6 preceding siblings ...)
2015-09-30 7:10 ` [PATCH v2 07/12] sched: Robustify preemption leak checks Peter Zijlstra
@ 2015-09-30 7:10 ` Peter Zijlstra
2015-10-01 15:31 ` Frederic Weisbecker
2015-09-30 7:10 ` [PATCH v2 09/12] sched, x86: Kill saved_preempt_count Peter Zijlstra
` (3 subsequent siblings)
11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
Since we stopped setting PREEMPT_ACTIVE, there is no need to mask it
out of preempt_count() tests.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/preempt.h | 3 +--
kernel/sched/core.c | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -126,8 +126,7 @@
* Check whether we were atomic before we did preempt_disable():
* (used by the scheduler)
*/
-#define in_atomic_preempt_off() \
- ((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_DISABLE_OFFSET)
+#define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
extern void preempt_count_add(int val);
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7472,7 +7472,7 @@ void __init sched_init(void)
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
static inline int preempt_count_equals(int preempt_offset)
{
- int nested = (preempt_count() & ~PREEMPT_ACTIVE) + rcu_preempt_depth();
+ int nested = preempt_count() + rcu_preempt_depth();
return (nested == preempt_offset);
}
* [PATCH v2 09/12] sched, x86: Kill saved_preempt_count
2015-09-30 7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
` (7 preceding siblings ...)
2015-09-30 7:10 ` [PATCH v2 08/12] sched: Simplify preempt_count tests Peter Zijlstra
@ 2015-09-30 7:10 ` Peter Zijlstra
2015-10-01 15:40 ` Frederic Weisbecker
2015-09-30 7:10 ` [PATCH v2 10/12] sched: Kill PREEMPT_ACTIVE Peter Zijlstra
` (2 subsequent siblings)
11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
With the introduction of the context switch preempt_count invariant
and the demise of PREEMPT_ACTIVE, it's pointless to save/restore the
per-cpu preemption count; it must always be the same.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/preempt.h | 5 +----
arch/x86/include/asm/thread_info.h | 2 --
arch/x86/kernel/process_32.c | 8 --------
arch/x86/kernel/process_64.c | 8 --------
4 files changed, 1 insertion(+), 22 deletions(-)
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -30,12 +30,9 @@ static __always_inline void preempt_coun
/*
* must be macros to avoid header recursion hell
*/
-#define init_task_preempt_count(p) do { \
- task_thread_info(p)->saved_preempt_count = FORK_PREEMPT_COUNT; \
-} while (0)
+#define init_task_preempt_count(p) do { } while (0)
#define init_idle_preempt_count(p, cpu) do { \
- task_thread_info(p)->saved_preempt_count = PREEMPT_ENABLED; \
per_cpu(__preempt_count, (cpu)) = PREEMPT_ENABLED; \
} while (0)
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -57,7 +57,6 @@ struct thread_info {
__u32 flags; /* low level flags */
__u32 status; /* thread synchronous flags */
__u32 cpu; /* current CPU */
- int saved_preempt_count;
mm_segment_t addr_limit;
void __user *sysenter_return;
unsigned int sig_on_uaccess_error:1;
@@ -69,7 +68,6 @@ struct thread_info {
.task = &tsk, \
.flags = 0, \
.cpu = 0, \
- .saved_preempt_count = INIT_PREEMPT_COUNT, \
.addr_limit = KERNEL_DS, \
}
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -280,14 +280,6 @@ __switch_to(struct task_struct *prev_p,
set_iopl_mask(next->iopl);
/*
- * If it were not for PREEMPT_ACTIVE we could guarantee that the
- * preempt_count of all tasks was equal here and this would not be
- * needed.
- */
- task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count);
- this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
-
- /*
* Now maybe handle debug registers and/or IO bitmaps
*/
if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV ||
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -401,14 +401,6 @@ __switch_to(struct task_struct *prev_p,
*/
this_cpu_write(current_task, next_p);
- /*
- * If it were not for PREEMPT_ACTIVE we could guarantee that the
- * preempt_count of all tasks was equal here and this would not be
- * needed.
- */
- task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count);
- this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
-
/* Reload esp0 and ss1. This changes current_thread_info(). */
load_sp0(tss, next);
* [PATCH v2 10/12] sched: Kill PREEMPT_ACTIVE
2015-09-30 7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
` (8 preceding siblings ...)
2015-09-30 7:10 ` [PATCH v2 09/12] sched, x86: Kill saved_preempt_count Peter Zijlstra
@ 2015-09-30 7:10 ` Peter Zijlstra
2015-10-01 15:41 ` Frederic Weisbecker
2015-09-30 7:10 ` [PATCH v2 11/12] sched: More notrace Peter Zijlstra
2015-09-30 7:10 ` [PATCH v2 12/12] sched: Add preempt_count invariant check Peter Zijlstra
11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
It's unused; kill the definition.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/preempt.h | 5 -----
1 file changed, 5 deletions(-)
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -26,7 +26,6 @@
* SOFTIRQ_MASK: 0x0000ff00
* HARDIRQ_MASK: 0x000f0000
* NMI_MASK: 0x00100000
- * PREEMPT_ACTIVE: 0x00200000
* PREEMPT_NEED_RESCHED: 0x80000000
*/
#define PREEMPT_BITS 8
@@ -53,10 +52,6 @@
#define SOFTIRQ_DISABLE_OFFSET (2 * SOFTIRQ_OFFSET)
-#define PREEMPT_ACTIVE_BITS 1
-#define PREEMPT_ACTIVE_SHIFT (NMI_SHIFT + NMI_BITS)
-#define PREEMPT_ACTIVE (__IRQ_MASK(PREEMPT_ACTIVE_BITS) << PREEMPT_ACTIVE_SHIFT)
-
/* We use the MSB mostly because its available */
#define PREEMPT_NEED_RESCHED 0x80000000
* [PATCH v2 11/12] sched: More notrace
2015-09-30 7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
` (9 preceding siblings ...)
2015-09-30 7:10 ` [PATCH v2 10/12] sched: Kill PREEMPT_ACTIVE Peter Zijlstra
@ 2015-09-30 7:10 ` Peter Zijlstra
2015-09-30 7:10 ` [PATCH v2 12/12] sched: Add preempt_count invariant check Peter Zijlstra
11 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
preempt_schedule_common() is marked notrace, but it does not use
_notrace() preempt_count functions and __schedule() is also not marked
notrace, which means that it's perfectly possible to end up in the
tracer from preempt_schedule_common().
Steve says:
| Yep, there's some history to this. This was originally the issue that
| caused function tracing to go into infinite recursion. But now we have
| preempt_schedule_notrace(), which is used by the function tracer, and
| that function must not be traced till preemption is disabled.
|
| Now if function tracing is running and we take an interrupt when
| NEED_RESCHED is set, it calls
|
| preempt_schedule_common() (not traced)
|
| But then that calls preempt_disable() (traced)
|
| function tracer calls preempt_disable_notrace() followed by
| preempt_enable_notrace() which will see NEED_RESCHED set, and it will
| call preempt_schedule_notrace(), which stops the recursion, but
| still calls __schedule() here, and that means when we return, we call
| the __schedule() from preempt_schedule_common().
|
| That said, I prefer this patch. Preemption is disabled before calling
| __schedule(), and we get rid of a one round recursion with the
| scheduler.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3057,7 +3057,7 @@ pick_next_task(struct rq *rq, struct tas
*
* WARNING: must be called with preemption disabled!
*/
-static void __sched __schedule(bool preempt)
+static void __sched notrace __schedule(bool preempt)
{
struct task_struct *prev, *next;
unsigned long *switch_count;
@@ -3203,9 +3203,9 @@ void __sched schedule_preempt_disabled(v
static void __sched notrace preempt_schedule_common(void)
{
do {
- preempt_disable();
+ preempt_disable_notrace();
__schedule(true);
- sched_preempt_enable_no_resched();
+ preempt_enable_no_resched_notrace();
/*
* Check again in case we missed a preemption opportunity
* [PATCH v2 12/12] sched: Add preempt_count invariant check
2015-09-30 7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
` (10 preceding siblings ...)
2015-09-30 7:10 ` [PATCH v2 11/12] sched: More notrace Peter Zijlstra
@ 2015-09-30 7:10 ` Peter Zijlstra
2015-09-30 9:38 ` Steven Rostedt
11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 7:10 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
rostedt, Peter Zijlstra
Ingo requested I keep my debug check for the preempt_count invariant.
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2514,6 +2514,10 @@ static struct rq *finish_task_switch(str
*
* Also, see FORK_PREEMPT_COUNT.
*/
+ if (unlikely(WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
+ "corrupted preempt_count: %s/%d/0x%x\n",
+ current->comm, current->pid, preempt_count())))
+ preempt_count_set(FORK_PREEMPT_COUNT);
rq->prev_mm = NULL;
* Re: [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT
2015-09-30 7:10 ` [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
@ 2015-09-30 9:04 ` Steven Rostedt
2015-09-30 20:39 ` Thomas Gleixner
2015-10-01 13:25 ` Frederic Weisbecker
1 sibling, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2015-09-30 9:04 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx
On Wed, 30 Sep 2015 09:10:36 +0200
Peter Zijlstra <peterz@infradead.org> wrote:
> As per commit d86ee4809d03 ("sched: optimize cond_resched()") we need
> PREEMPT_ACTIVE to prevent cond_resched() from working before the
> scheduler is set up.
>
> However, keeping preemption disabled should do the same thing already,
> making the PREEMPT_ACTIVE part entirely redundant.
>
> The only complication is !PREEMPT_COUNT kernels, where
> PREEMPT_DISABLED ends up being 0. Instead we use an unconditional
> PREEMPT_OFFSET to set preempt_count() even on !PREEMPT_COUNT kernels.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> include/linux/sched.h | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -606,19 +606,18 @@ struct task_cputime_atomic {
> #endif
>
> /*
> - * Disable preemption until the scheduler is running.
> - * Reset by start_kernel()->sched_init()->init_idle().
> + * Disable preemption until the scheduler is running -- use an unconditional
> + * value so that it also works on !PREEMPT_COUNT kernels.
> *
> - * We include PREEMPT_ACTIVE to avoid cond_resched() from working
> - * before the scheduler is active -- see should_resched().
> + * Reset by start_kernel()->sched_init()->init_idle().
Reset by start_kernel()->sched_init()->init_idle()->init_idle_preempt_count().
Other than that.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
-- Steve
> */
> -#define INIT_PREEMPT_COUNT (PREEMPT_DISABLED + PREEMPT_ACTIVE)
> +#define INIT_PREEMPT_COUNT PREEMPT_OFFSET
>
> /**
> * struct thread_group_cputimer - thread group interval timer counts
> * @cputime_atomic: atomic thread group interval timers.
> * @running: non-zero when there are timers running and
> - * @cputime receives updates.
> + * @cputime receives updates.
> *
> * This structure contains the version of task_cputime, above, that is
> * used for thread group CPU timer calculations.
>
* Re: [PATCH v2 03/12] sched: Create preempt_count invariant
2015-09-30 7:10 ` [PATCH v2 03/12] sched: Create preempt_count invariant Peter Zijlstra
@ 2015-09-30 9:32 ` Steven Rostedt
2015-09-30 11:13 ` Peter Zijlstra
0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2015-09-30 9:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx
On Wed, 30 Sep 2015 09:10:38 +0200
Peter Zijlstra <peterz@infradead.org> wrote:
> All preempt_count() numbers below are in units of PREEMPT_DISABLE_OFFSET.
>
> Now that TASK_DEAD no longer results in preempt_count() == 3 during
> scheduling, we will always call context_switch() with preempt_count()
> == 2.
>
> However, we don't always end up with preempt_count() == 2 in
> finish_task_switch() because new tasks get created with
> preempt_count() == 1.
>
> Create FORK_PREEMPT_COUNT, set it to 2, and use it in the right
> places. Note that we cannot use INIT_PREEMPT_COUNT as that serves
> another purpose (boot).
>
> After this, preempt_count() is invariant across the context switch,
> with exception of PREEMPT_ACTIVE.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> arch/x86/include/asm/preempt.h | 2 +-
> include/asm-generic/preempt.h | 2 +-
> include/linux/sched.h | 17 ++++++++++++-----
> kernel/sched/core.c | 23 +++++++++++++++++++++--
> 4 files changed, 35 insertions(+), 9 deletions(-)
>
> --- a/arch/x86/include/asm/preempt.h
> +++ b/arch/x86/include/asm/preempt.h
> @@ -31,7 +31,7 @@ static __always_inline void preempt_coun
> * must be macros to avoid header recursion hell
> */
> #define init_task_preempt_count(p) do { \
> - task_thread_info(p)->saved_preempt_count = PREEMPT_DISABLED; \
> + task_thread_info(p)->saved_preempt_count = FORK_PREEMPT_COUNT; \
> } while (0)
>
> #define init_idle_preempt_count(p, cpu) do { \
> --- a/include/asm-generic/preempt.h
> +++ b/include/asm-generic/preempt.h
> @@ -24,7 +24,7 @@ static __always_inline void preempt_coun
> * must be macros to avoid header recursion hell
> */
> #define init_task_preempt_count(p) do { \
> - task_thread_info(p)->preempt_count = PREEMPT_DISABLED; \
> + task_thread_info(p)->preempt_count = FORK_PREEMPT_COUNT; \
> } while (0)
>
> #define init_idle_preempt_count(p, cpu) do { \
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -599,11 +599,7 @@ struct task_cputime_atomic {
> .sum_exec_runtime = ATOMIC64_INIT(0), \
> }
>
> -#ifdef CONFIG_PREEMPT_COUNT
> -#define PREEMPT_DISABLED (1 + PREEMPT_ENABLED)
> -#else
> -#define PREEMPT_DISABLED PREEMPT_ENABLED
> -#endif
> +#define PREEMPT_DISABLED (PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
Hmm, it looks to me that you removed all users of PREEMPT_DISABLED.
Did you add another user of it?
-- Steve
>
> /*
> * Disable preemption until the scheduler is running -- use an unconditional
> @@ -613,6 +609,17 @@ struct task_cputime_atomic {
> */
> #define INIT_PREEMPT_COUNT PREEMPT_OFFSET
>
* Re: [PATCH v2 07/12] sched: Robustify preemption leak checks
2015-09-30 7:10 ` [PATCH v2 07/12] sched: Robustify preemption leak checks Peter Zijlstra
@ 2015-09-30 9:35 ` Steven Rostedt
0 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2015-09-30 9:35 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx
On Wed, 30 Sep 2015 09:10:42 +0200
Peter Zijlstra <peterz@infradead.org> wrote:
> When we warn about a preempt_count leak, reset the preempt_count to
> the known good value such that the problem does not ripple forward.
>
> This is most important on x86, which has a per-cpu preempt_count that is
> not saved/restored (after this series). So if you schedule with an
> invalid (!2*PREEMPT_DISABLE_OFFSET) preempt_count, the next task is
> messed up too.
>
> Enforcing this invariant limits the borkage to just the one task.
>
> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
-- Steve
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
* Re: [PATCH v2 12/12] sched: Add preempt_count invariant check
2015-09-30 7:10 ` [PATCH v2 12/12] sched: Add preempt_count invariant check Peter Zijlstra
@ 2015-09-30 9:38 ` Steven Rostedt
2015-09-30 11:15 ` Peter Zijlstra
0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2015-09-30 9:38 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx
On Wed, 30 Sep 2015 09:10:47 +0200
Peter Zijlstra <peterz@infradead.org> wrote:
> Ingo requested I keep my debug check for the preempt_count invariant.
>
> Requested-by: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/sched/core.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2514,6 +2514,10 @@ static struct rq *finish_task_switch(str
> *
> * Also, see FORK_PREEMPT_COUNT.
> */
> + if (unlikely(WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
Nuke the "unlikely"; it's redundant with the WARN_ONCE().
-- Steve
> + "corrupted preempt_count: %s/%d/0x%x\n",
> + current->comm, current->pid, preempt_count())))
> + preempt_count_set(FORK_PREEMPT_COUNT);
>
> rq->prev_mm = NULL;
>
>
* Re: [PATCH v2 03/12] sched: Create preempt_count invariant
2015-09-30 9:32 ` Steven Rostedt
@ 2015-09-30 11:13 ` Peter Zijlstra
2015-09-30 13:36 ` Steven Rostedt
0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 11:13 UTC (permalink / raw)
To: Steven Rostedt
Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx
On Wed, Sep 30, 2015 at 05:32:08AM -0400, Steven Rostedt wrote:
> On Wed, 30 Sep 2015 09:10:38 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> > -#ifdef CONFIG_PREEMPT_COUNT
> > -#define PREEMPT_DISABLED (1 + PREEMPT_ENABLED)
> > -#else
> > -#define PREEMPT_DISABLED PREEMPT_ENABLED
> > -#endif
> > +#define PREEMPT_DISABLED (PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
>
> Hmm, it looks to me that you removed all users of PREEMPT_DISABLED.
>
> Did you add another user of it?
Dunno, lemme go grep ;-)
include/linux/sched.h:#define PREEMPT_DISABLED (PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
kernel/sched/core.c: preempt_count_set(PREEMPT_DISABLED);
So there still is one user.
* Re: [PATCH v2 12/12] sched: Add preempt_count invariant check
2015-09-30 9:38 ` Steven Rostedt
@ 2015-09-30 11:15 ` Peter Zijlstra
0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 11:15 UTC (permalink / raw)
To: Steven Rostedt
Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx
On Wed, Sep 30, 2015 at 05:38:10AM -0400, Steven Rostedt wrote:
> On Wed, 30 Sep 2015 09:10:47 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
>
> > Ingo requested I keep my debug check for the preempt_count invariant.
> >
> > Requested-by: Ingo Molnar <mingo@kernel.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> > kernel/sched/core.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2514,6 +2514,10 @@ static struct rq *finish_task_switch(str
> > *
> > * Also, see FORK_PREEMPT_COUNT.
> > */
> > + if (unlikely(WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
>
> Nuke the "unlikely" it's redundant with the WARN_ONCE().
Ah, indeed. *poof*
* Re: [PATCH v2 03/12] sched: Create preempt_count invariant
2015-09-30 11:13 ` Peter Zijlstra
@ 2015-09-30 13:36 ` Steven Rostedt
0 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2015-09-30 13:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx
On Wed, 30 Sep 2015 13:13:23 +0200
Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Sep 30, 2015 at 05:32:08AM -0400, Steven Rostedt wrote:
> > On Wed, 30 Sep 2015 09:10:38 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > > -#ifdef CONFIG_PREEMPT_COUNT
> > > -#define PREEMPT_DISABLED (1 + PREEMPT_ENABLED)
> > > -#else
> > > -#define PREEMPT_DISABLED PREEMPT_ENABLED
> > > -#endif
> > > +#define PREEMPT_DISABLED (PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
> >
> > Hmm, it looks to me that you removed all users of PREEMPT_DISABLED.
> >
> > Did you add another user of it?
>
> Dunno, lemme go grep ;-)
>
> include/linux/sched.h:#define PREEMPT_DISABLED (PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
> kernel/sched/core.c: preempt_count_set(PREEMPT_DISABLED);
>
> So there still is one user.
Ah, that's the one you added in patch 7. I did a git grep from the
vanilla tree.
-- Steve
* Re: [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT
2015-09-30 9:04 ` Steven Rostedt
@ 2015-09-30 20:39 ` Thomas Gleixner
0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2015-09-30 20:39 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra, mingo, linux-kernel, torvalds, fweisbec, oleg,
umgwanakikbuti
On Wed, 30 Sep 2015, Steven Rostedt wrote:
> > /*
> > - * Disable preemption until the scheduler is running.
> > - * Reset by start_kernel()->sched_init()->init_idle().
> > + * Disable preemption until the scheduler is running -- use an unconditional
> > + * value so that it also works on !PREEMPT_COUNT kernels.
> > *
> > - * We include PREEMPT_ACTIVE to avoid cond_resched() from working
> > - * before the scheduler is active -- see should_resched().
> > + * Reset by start_kernel()->sched_init()->init_idle().
>
> Reset by start_kernel()->sched_init()->init_idle()->init_idle_preempt_count().
>
> Other than that.
>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* Re: [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT
2015-09-30 7:10 ` [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
2015-09-30 9:04 ` Steven Rostedt
@ 2015-10-01 13:25 ` Frederic Weisbecker
1 sibling, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2015-10-01 13:25 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt
On Wed, Sep 30, 2015 at 09:10:36AM +0200, Peter Zijlstra wrote:
> As per commit d86ee4809d03 ("sched: optimize cond_resched()") we need
> PREEMPT_ACTIVE to prevent cond_resched() from working before the
> scheduler is set up.
>
> However, keeping preemption disabled should do the same thing already,
> making the PREEMPT_ACTIVE part entirely redundant.
>
> The only complication is !PREEMPT_COUNT kernels, where
> PREEMPT_DISABLED ends up being 0. Instead we use an unconditional
> PREEMPT_OFFSET to set preempt_count() even on !PREEMPT_COUNT kernels.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
* Re: [PATCH v2 06/12] sched: Stop setting PREEMPT_ACTIVE
2015-09-30 7:10 ` [PATCH v2 06/12] sched: Stop setting PREEMPT_ACTIVE Peter Zijlstra
@ 2015-10-01 15:27 ` Frederic Weisbecker
0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2015-10-01 15:27 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt
On Wed, Sep 30, 2015 at 09:10:41AM +0200, Peter Zijlstra wrote:
> Now that nothing tests for PREEMPT_ACTIVE anymore, stop setting it.
>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Great news!
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
* Re: [PATCH v2 08/12] sched: Simplify preempt_count tests
2015-09-30 7:10 ` [PATCH v2 08/12] sched: Simplify preempt_count tests Peter Zijlstra
@ 2015-10-01 15:31 ` Frederic Weisbecker
0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2015-10-01 15:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt
On Wed, Sep 30, 2015 at 09:10:43AM +0200, Peter Zijlstra wrote:
> Since we stopped setting PREEMPT_ACTIVE, there is no need to mask it
> out of preempt_count() tests.
>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
* Re: [PATCH v2 09/12] sched, x86: Kill saved_preempt_count
2015-09-30 7:10 ` [PATCH v2 09/12] sched, x86: Kill saved_preempt_count Peter Zijlstra
@ 2015-10-01 15:40 ` Frederic Weisbecker
0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2015-10-01 15:40 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt
On Wed, Sep 30, 2015 at 09:10:44AM +0200, Peter Zijlstra wrote:
> With the introduction of the context switch preempt_count invariant
> and the demise of PREEMPT_ACTIVE, it's pointless to save/restore the
> per-cpu preemption count; it must always be the same.
>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Good news as well.
* Re: [PATCH v2 10/12] sched: Kill PREEMPT_ACTIVE
2015-09-30 7:10 ` [PATCH v2 10/12] sched: Kill PREEMPT_ACTIVE Peter Zijlstra
@ 2015-10-01 15:41 ` Frederic Weisbecker
0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2015-10-01 15:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt
On Wed, Sep 30, 2015 at 09:10:45AM +0200, Peter Zijlstra wrote:
> It's unused; kill the definition.
>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>