* [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE
@ 2015-09-30  7:10 Peter Zijlstra
  2015-09-30  7:10 ` [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
                   ` (11 more replies)
  0 siblings, 12 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

Second posting of the series that kills PREEMPT_ACTIVE dead. Thanks for all the
feedback.

I've re-ordered the patches slightly and reworked one or two; for those that
got significant changes I've dropped the Reviewed-by tags.

Please have another careful look.

---
 arch/x86/include/asm/preempt.h     |  5 +--
 arch/x86/include/asm/thread_info.h |  2 -
 arch/x86/kernel/process_32.c       |  8 ----
 arch/x86/kernel/process_64.c       |  8 ----
 include/asm-generic/preempt.h      |  2 +-
 include/linux/preempt.h            | 20 +--------
 include/linux/sched.h              | 26 +++++++-----
 include/trace/events/sched.h       | 22 +++++-----
 kernel/exit.c                      |  4 +-
 kernel/sched/core.c                | 83 +++++++++++++++++++++++++-------------
 kernel/trace/ftrace.c              |  2 +-
 kernel/trace/trace_sched_switch.c  |  3 +-
 kernel/trace/trace_sched_wakeup.c  |  2 +-
 13 files changed, 89 insertions(+), 98 deletions(-)



* [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT
  2015-09-30  7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
@ 2015-09-30  7:10 ` Peter Zijlstra
  2015-09-30  9:04   ` Steven Rostedt
  2015-10-01 13:25   ` Frederic Weisbecker
  2015-09-30  7:10 ` [PATCH v2 02/12] sched: Rework TASK_DEAD preemption exception Peter Zijlstra
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-1.patch --]
[-- Type: text/plain, Size: 1682 bytes --]

As per commit d86ee4809d03 ("sched: optimize cond_resched()") we need
PREEMPT_ACTIVE to keep cond_resched() from working before the
scheduler is set up.

However, keeping preemption disabled should do the same thing already,
making the PREEMPT_ACTIVE part entirely redundant.

The only complication is !PREEMPT_COUNT kernels, where
PREEMPT_DISABLED ends up being 0. Instead we use an unconditional
PREEMPT_OFFSET to set preempt_count() even on !PREEMPT_COUNT kernels.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched.h |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -606,19 +606,18 @@ struct task_cputime_atomic {
 #endif
 
 /*
- * Disable preemption until the scheduler is running.
- * Reset by start_kernel()->sched_init()->init_idle().
+ * Disable preemption until the scheduler is running -- use an unconditional
+ * value so that it also works on !PREEMPT_COUNT kernels.
  *
- * We include PREEMPT_ACTIVE to avoid cond_resched() from working
- * before the scheduler is active -- see should_resched().
+ * Reset by start_kernel()->sched_init()->init_idle().
  */
-#define INIT_PREEMPT_COUNT	(PREEMPT_DISABLED + PREEMPT_ACTIVE)
+#define INIT_PREEMPT_COUNT	PREEMPT_OFFSET
 
 /**
  * struct thread_group_cputimer - thread group interval timer counts
  * @cputime_atomic:	atomic thread group interval timers.
  * @running:		non-zero when there are timers running and
- * 			@cputime receives updates.
+ *			@cputime receives updates.
  *
  * This structure contains the version of task_cputime, above, that is
  * used for thread group CPU timer calculations.
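
For reference, a minimal standalone model of why a plain non-zero count
already does the job (illustrative only; model_cond_resched() is a made-up
stand-in for the real cond_resched()/should_resched() pair, and
PREEMPT_OFFSET is taken as 1):

#include <stdio.h>

#define PREEMPT_OFFSET		1
#define INIT_PREEMPT_COUNT	PREEMPT_OFFSET		/* what this patch sets */

static int preempt_count = INIT_PREEMPT_COUNT;		/* boot-time value */
static int need_resched = 1;				/* pretend an early wakeup set it */

/* cond_resched() only yields when nothing is holding preemption off */
static int model_cond_resched(void)
{
	if (preempt_count == 0 && need_resched)
		return 1;	/* would call schedule(); not safe before sched_init() */
	return 0;
}

int main(void)
{
	printf("before init_idle(): %d\n", model_cond_resched());	/* prints 0 */
	preempt_count = 0;	/* init_idle() resets the count once the scheduler is up */
	printf("after  init_idle(): %d\n", model_cond_resched());	/* prints 1 */
	return 0;
}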




* [PATCH v2 02/12] sched: Rework TASK_DEAD preemption exception
  2015-09-30  7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
  2015-09-30  7:10 ` [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
@ 2015-09-30  7:10 ` Peter Zijlstra
  2015-09-30  7:10 ` [PATCH v2 03/12] sched: Create preempt_count invariant Peter Zijlstra
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-4.patch --]
[-- Type: text/plain, Size: 1734 bytes --]

TASK_DEAD is special in that the final schedule call from do_exit()
must be done with preemption disabled.

This means we end up scheduling with a preempt_count() higher than
usual (3 instead of the 'expected' 2). Since future patches will want
to rely on an invariant preempt_count() value during schedule, fix
this up.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2949,12 +2949,8 @@ static inline void schedule_debug(struct
 #ifdef CONFIG_SCHED_STACK_END_CHECK
 	BUG_ON(unlikely(task_stack_end_corrupted(prev)));
 #endif
-	/*
-	 * Test if we are atomic. Since do_exit() needs to call into
-	 * schedule() atomically, we ignore that path. Otherwise whine
-	 * if we are scheduling when we should not.
-	 */
-	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD))
+
+	if (unlikely(in_atomic_preempt_off()))
 		__schedule_bug(prev);
 	rcu_sleep_check();
 
@@ -3053,6 +3049,17 @@ static void __sched __schedule(void)
 	rcu_note_context_switch();
 	prev = rq->curr;
 
+	/*
+	 * do_exit() calls schedule() with preemption disabled as an exception;
+	 * however we must fix that up, otherwise the next task will see an
+	 * inconsistent (higher) preempt count.
+	 *
+	 * It also avoids the below schedule_debug() test from complaining
+	 * about this.
+	 */
+	if (unlikely(prev->state == TASK_DEAD))
+		preempt_enable_no_resched_notrace();
+
 	schedule_debug(prev);
 
 	if (sched_feat(HRTICK))
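
The counting being fixed can be modelled outside the kernel like this
(illustrative sketch only; PREEMPT_DISABLE_OFFSET taken as 1, and the helper
functions below only mimic the real ones by name):

#include <assert.h>

static int preempt_count;

static void preempt_disable(void)			{ preempt_count++; }
static void preempt_enable_no_resched_notrace(void)	{ preempt_count--; }

int main(void)
{
	int task_dead = 1;		/* prev->state == TASK_DEAD */

	preempt_disable();		/* 1: do_exit() before its final schedule() */
	preempt_disable();		/* 2: schedule() itself */

	/* __schedule(), with this patch: drop do_exit()'s extra level early */
	if (task_dead)
		preempt_enable_no_resched_notrace();

	preempt_disable();		/* raw_spin_lock_irq(&rq->lock) */

	/* without the fixup this would be 3; now it is the 'expected' 2 */
	assert(preempt_count == 2);
	return 0;
}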




* [PATCH v2 03/12] sched: Create preempt_count invariant
  2015-09-30  7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
  2015-09-30  7:10 ` [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
  2015-09-30  7:10 ` [PATCH v2 02/12] sched: Rework TASK_DEAD preemption exception Peter Zijlstra
@ 2015-09-30  7:10 ` Peter Zijlstra
  2015-09-30  9:32   ` Steven Rostedt
  2015-09-30  7:10 ` [PATCH v2 04/12] sched: Add preempt argument to __schedule() Peter Zijlstra
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-2.patch --]
[-- Type: text/plain, Size: 3831 bytes --]

Assuming units of PREEMPT_DISABLE_OFFSET for preempt_count() numbers.

Now that TASK_DEAD no longer results in preempt_count() == 3 during
scheduling, we will always call context_switch() with preempt_count()
== 2.

However, we don't always end up with preempt_count() == 2 in
finish_task_switch() because new tasks get created with
preempt_count() == 1.

Create FORK_PREEMPT_COUNT and set it to 2 and use that in the right
places. Note that we cannot use INIT_PREEMPT_COUNT as that serves
another purpose (boot).

After this, preempt_count() is invariant across the context switch,
with the exception of PREEMPT_ACTIVE.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/preempt.h |    2 +-
 include/asm-generic/preempt.h  |    2 +-
 include/linux/sched.h          |   17 ++++++++++++-----
 kernel/sched/core.c            |   23 +++++++++++++++++++++--
 4 files changed, 35 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -31,7 +31,7 @@ static __always_inline void preempt_coun
  * must be macros to avoid header recursion hell
  */
 #define init_task_preempt_count(p) do { \
-	task_thread_info(p)->saved_preempt_count = PREEMPT_DISABLED; \
+	task_thread_info(p)->saved_preempt_count = FORK_PREEMPT_COUNT; \
 } while (0)
 
 #define init_idle_preempt_count(p, cpu) do { \
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -24,7 +24,7 @@ static __always_inline void preempt_coun
  * must be macros to avoid header recursion hell
  */
 #define init_task_preempt_count(p) do { \
-	task_thread_info(p)->preempt_count = PREEMPT_DISABLED; \
+	task_thread_info(p)->preempt_count = FORK_PREEMPT_COUNT; \
 } while (0)
 
 #define init_idle_preempt_count(p, cpu) do { \
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -599,11 +599,7 @@ struct task_cputime_atomic {
 		.sum_exec_runtime = ATOMIC64_INIT(0),		\
 	}
 
-#ifdef CONFIG_PREEMPT_COUNT
-#define PREEMPT_DISABLED	(1 + PREEMPT_ENABLED)
-#else
-#define PREEMPT_DISABLED	PREEMPT_ENABLED
-#endif
+#define PREEMPT_DISABLED	(PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
 
 /*
  * Disable preemption until the scheduler is running -- use an unconditional
@@ -613,6 +609,17 @@ struct task_cputime_atomic {
  */
 #define INIT_PREEMPT_COUNT	PREEMPT_OFFSET
 
+/*
+ * Initial preempt_count value; reflects the preempt_count schedule invariant
+ * which states that during context switches:
+ *
+ *    preempt_count() == 2*PREEMPT_DISABLE_OFFSET
+ *
+ * Note: PREEMPT_DISABLE_OFFSET is 0 for !PREEMPT_COUNT kernels.
+ * Note: See finish_task_switch().
+ */
+#define FORK_PREEMPT_COUNT	(2*PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
+
 /**
  * struct thread_group_cputimer - thread group interval timer counts
  * @cputime_atomic:	atomic thread group interval timers.
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2504,6 +2504,18 @@ static struct rq *finish_task_switch(str
 	struct mm_struct *mm = rq->prev_mm;
 	long prev_state;
 
+	/*
+	 * The previous task will have left us with a preempt_count of 2
+	 * because it left us after:
+	 *
+	 *	schedule()
+	 *	  preempt_disable();			// 1
+	 *	  __schedule()
+	 *	    raw_spin_lock_irq(&rq->lock)	// 2
+	 *
+	 * Also, see FORK_PREEMPT_COUNT.
+	 */
+
 	rq->prev_mm = NULL;
 
 	/*
@@ -2588,8 +2600,15 @@ asmlinkage __visible void schedule_tail(
 {
 	struct rq *rq;
 
-	/* finish_task_switch() drops rq->lock and enables preemtion */
-	preempt_disable();
+	/*
+	 * New tasks start with FORK_PREEMPT_COUNT, see there and
+	 * finish_task_switch() for details.
+	 *
+	 * finish_task_switch() will drop rq->lock() and lower preempt_count
+	 * and the preempt_enable() will end up enabling preemption (on
+	 * PREEMPT_COUNT kernels).
+	 */
+
 	rq = finish_task_switch(prev);
 	balance_callback(rq);
 	preempt_enable();
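
As a sanity check, the new constant and the invariant it encodes can be
modelled standalone (illustrative only; PREEMPT_DISABLE_OFFSET taken as 1,
as on CONFIG_PREEMPT_COUNT kernels -- it is 0 otherwise, which makes
everything below collapse to zero):

#include <assert.h>

#define PREEMPT_ENABLED		0
#define PREEMPT_DISABLE_OFFSET	1	/* 0 on !PREEMPT_COUNT kernels */
#define FORK_PREEMPT_COUNT	(2*PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)

int main(void)
{
	/*
	 * A new task first runs in finish_task_switch(), i.e. as if it had done:
	 *
	 *	schedule()
	 *	  preempt_disable();			// +1
	 *	  __schedule()
	 *	    raw_spin_lock_irq(&rq->lock);	// +1
	 */
	int count = FORK_PREEMPT_COUNT;

	assert(count == 2*PREEMPT_DISABLE_OFFSET);

	count -= PREEMPT_DISABLE_OFFSET;	/* finish_task_switch(): rq unlock */
	count -= PREEMPT_DISABLE_OFFSET;	/* schedule_tail(): preempt_enable() */

	assert(count == PREEMPT_ENABLED);	/* ready to run normally */
	return 0;
}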




* [PATCH v2 04/12] sched: Add preempt argument to __schedule()
  2015-09-30  7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (2 preceding siblings ...)
  2015-09-30  7:10 ` [PATCH v2 03/12] sched: Create preempt_count invariant Peter Zijlstra
@ 2015-09-30  7:10 ` Peter Zijlstra
  2015-09-30  7:10 ` [PATCH v2 05/12] sched: Fix trace_sched_switch() Peter Zijlstra
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-5.patch --]
[-- Type: text/plain, Size: 2087 bytes --]

There is only a single PREEMPT_ACTIVE use in the regular __schedule()
path and that is to circumvent the task->state check. Since the code
setting PREEMPT_ACTIVE is the immediate caller of __schedule() we can
replace this with a function argument.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3037,7 +3037,7 @@ pick_next_task(struct rq *rq, struct tas
  *
  * WARNING: must be called with preemption disabled!
  */
-static void __sched __schedule(void)
+static void __sched __schedule(bool preempt)
 {
 	struct task_struct *prev, *next;
 	unsigned long *switch_count;
@@ -3077,7 +3077,7 @@ static void __sched __schedule(void)
 	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
 
 	switch_count = &prev->nivcsw;
-	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
+	if (!preempt && prev->state) {
 		if (unlikely(signal_pending_state(prev->state, prev))) {
 			prev->state = TASK_RUNNING;
 		} else {
@@ -3142,7 +3142,7 @@ asmlinkage __visible void __sched schedu
 	sched_submit_work(tsk);
 	do {
 		preempt_disable();
-		__schedule();
+		__schedule(false);
 		sched_preempt_enable_no_resched();
 	} while (need_resched());
 }
@@ -3183,7 +3183,7 @@ static void __sched notrace preempt_sche
 {
 	do {
 		preempt_active_enter();
-		__schedule();
+		__schedule(true);
 		preempt_active_exit();
 
 		/*
@@ -3248,7 +3248,7 @@ asmlinkage __visible void __sched notrac
 		 * an infinite recursion.
 		 */
 		prev_ctx = exception_enter();
-		__schedule();
+		__schedule(true);
 		exception_exit(prev_ctx);
 
 		barrier();
@@ -3277,7 +3277,7 @@ asmlinkage __visible void __sched preemp
 	do {
 		preempt_active_enter();
 		local_irq_enable();
-		__schedule();
+		__schedule(true);
 		local_irq_disable();
 		preempt_active_exit();
 	} while (need_resched());




* [PATCH v2 05/12] sched: Fix trace_sched_switch()
  2015-09-30  7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (3 preceding siblings ...)
  2015-09-30  7:10 ` [PATCH v2 04/12] sched: Add preempt argument to __schedule() Peter Zijlstra
@ 2015-09-30  7:10 ` Peter Zijlstra
  2015-09-30  7:10 ` [PATCH v2 06/12] sched: Stop setting PREEMPT_ACTIVE Peter Zijlstra
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-6.patch --]
[-- Type: text/plain, Size: 4039 bytes --]

__trace_sched_switch_state() is the last remaining PREEMPT_ACTIVE
user; move trace_sched_switch() from prepare_task_switch() to
__schedule() and propagate the @preempt argument.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/trace/events/sched.h      |   22 +++++++++-------------
 kernel/sched/core.c               |    2 +-
 kernel/trace/ftrace.c             |    2 +-
 kernel/trace/trace_sched_switch.c |    3 ++-
 kernel/trace/trace_sched_wakeup.c |    2 +-
 5 files changed, 14 insertions(+), 17 deletions(-)

--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -104,22 +104,17 @@ DEFINE_EVENT(sched_wakeup_template, sche
 	     TP_ARGS(p));
 
 #ifdef CREATE_TRACE_POINTS
-static inline long __trace_sched_switch_state(struct task_struct *p)
+static inline long __trace_sched_switch_state(bool preempt, struct task_struct *p)
 {
-	long state = p->state;
-
-#ifdef CONFIG_PREEMPT
 #ifdef CONFIG_SCHED_DEBUG
 	BUG_ON(p != current);
 #endif /* CONFIG_SCHED_DEBUG */
+
 	/*
-	 * For all intents and purposes a preempted task is a running task.
+	 * Preemption ignores task state, therefore preempted tasks are always
+	 * RUNNING (we will not have dequeued if state != RUNNING).
 	 */
-	if (preempt_count() & PREEMPT_ACTIVE)
-		state = TASK_RUNNING | TASK_STATE_MAX;
-#endif /* CONFIG_PREEMPT */
-
-	return state;
+	return preempt ? TASK_RUNNING | TASK_STATE_MAX : p->state;
 }
 #endif /* CREATE_TRACE_POINTS */
 
@@ -128,10 +123,11 @@ static inline long __trace_sched_switch_
  */
 TRACE_EVENT(sched_switch,
 
-	TP_PROTO(struct task_struct *prev,
+	TP_PROTO(bool preempt,
+		 struct task_struct *prev,
 		 struct task_struct *next),
 
-	TP_ARGS(prev, next),
+	TP_ARGS(preempt, prev, next),
 
 	TP_STRUCT__entry(
 		__array(	char,	prev_comm,	TASK_COMM_LEN	)
@@ -147,7 +143,7 @@ TRACE_EVENT(sched_switch,
 		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
 		__entry->prev_pid	= prev->pid;
 		__entry->prev_prio	= prev->prio;
-		__entry->prev_state	= __trace_sched_switch_state(prev);
+		__entry->prev_state	= __trace_sched_switch_state(preempt, prev);
 		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
 		__entry->next_pid	= next->pid;
 		__entry->next_prio	= next->prio;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2470,7 +2470,6 @@ static inline void
 prepare_task_switch(struct rq *rq, struct task_struct *prev,
 		    struct task_struct *next)
 {
-	trace_sched_switch(prev, next);
 	sched_info_switch(rq, prev, next);
 	perf_event_task_sched_out(prev, next);
 	fire_sched_out_preempt_notifiers(prev, next);
@@ -3121,6 +3120,7 @@ static void __sched __schedule(bool pree
 		rq->curr = next;
 		++*switch_count;
 
+		trace_sched_switch(preempt, prev, next);
 		rq = context_switch(rq, prev, next); /* unlocks the rq */
 		cpu = cpu_of(rq);
 	} else {
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5697,7 +5697,7 @@ static int alloc_retstack_tasklist(struc
 }
 
 static void
-ftrace_graph_probe_sched_switch(void *ignore,
+ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 			struct task_struct *prev, struct task_struct *next)
 {
 	unsigned long long timestamp;
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -16,7 +16,8 @@ static int			sched_ref;
 static DEFINE_MUTEX(sched_register_mutex);
 
 static void
-probe_sched_switch(void *ignore, struct task_struct *prev, struct task_struct *next)
+probe_sched_switch(void *ignore, bool preempt,
+		   struct task_struct *prev, struct task_struct *next)
 {
 	if (unlikely(!sched_ref))
 		return;
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -420,7 +420,7 @@ tracing_sched_wakeup_trace(struct trace_
 }
 
 static void notrace
-probe_wakeup_sched_switch(void *ignore,
+probe_wakeup_sched_switch(void *ignore, bool preempt,
 			  struct task_struct *prev, struct task_struct *next)
 {
 	struct trace_array_cpu *data;




* [PATCH v2 06/12] sched: Stop setting PREEMPT_ACTIVE
  2015-09-30  7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (4 preceding siblings ...)
  2015-09-30  7:10 ` [PATCH v2 05/12] sched: Fix trace_sched_switch() Peter Zijlstra
@ 2015-09-30  7:10 ` Peter Zijlstra
  2015-10-01 15:27   ` Frederic Weisbecker
  2015-09-30  7:10 ` [PATCH v2 07/12] sched: Robustify preemption leak checks Peter Zijlstra
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-7.patch --]
[-- Type: text/plain, Size: 2486 bytes --]

Now that nothing tests for PREEMPT_ACTIVE anymore, stop setting it.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/preempt.h |   12 ------------
 kernel/sched/core.c     |   19 ++++++-------------
 2 files changed, 6 insertions(+), 25 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -146,18 +146,6 @@ extern void preempt_count_sub(int val);
 #define preempt_count_inc() preempt_count_add(1)
 #define preempt_count_dec() preempt_count_sub(1)
 
-#define preempt_active_enter() \
-do { \
-	preempt_count_add(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET); \
-	barrier(); \
-} while (0)
-
-#define preempt_active_exit() \
-do { \
-	barrier(); \
-	preempt_count_sub(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET); \
-} while (0)
-
 #ifdef CONFIG_PREEMPT_COUNT
 
 #define preempt_disable() \
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3190,9 +3190,9 @@ void __sched schedule_preempt_disabled(v
 static void __sched notrace preempt_schedule_common(void)
 {
 	do {
-		preempt_active_enter();
+		preempt_disable();
 		__schedule(true);
-		preempt_active_exit();
+		sched_preempt_enable_no_resched();
 
 		/*
 		 * Check again in case we missed a preemption opportunity
@@ -3243,13 +3243,7 @@ asmlinkage __visible void __sched notrac
 		return;
 
 	do {
-		/*
-		 * Use raw __prempt_count() ops that don't call function.
-		 * We can't call functions before disabling preemption which
-		 * disarm preemption tracing recursions.
-		 */
-		__preempt_count_add(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET);
-		barrier();
+		preempt_disable_notrace();
 		/*
 		 * Needs preempt disabled in case user_exit() is traced
 		 * and the tracer calls preempt_enable_notrace() causing
@@ -3259,8 +3253,7 @@ asmlinkage __visible void __sched notrac
 		__schedule(true);
 		exception_exit(prev_ctx);
 
-		barrier();
-		__preempt_count_sub(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET);
+		preempt_enable_no_resched_notrace();
 	} while (need_resched());
 }
 EXPORT_SYMBOL_GPL(preempt_schedule_notrace);
@@ -3283,11 +3276,11 @@ asmlinkage __visible void __sched preemp
 	prev_state = exception_enter();
 
 	do {
-		preempt_active_enter();
+		preempt_disable();
 		local_irq_enable();
 		__schedule(true);
 		local_irq_disable();
-		preempt_active_exit();
+		sched_preempt_enable_no_resched();
 	} while (need_resched());
 
 	exception_exit(prev_state);
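
The net effect is that the 'we are preempting' fact no longer lives in the
count but travels as the __schedule() argument added earlier in the series;
a standalone sketch of the before/after bookkeeping (illustrative; the
PREEMPT_ACTIVE value is the one from the old bit layout, and
PREEMPT_DISABLE_OFFSET is taken as 1):

#include <assert.h>

#define PREEMPT_ACTIVE		0x00200000	/* old encoding */
#define PREEMPT_DISABLE_OFFSET	1

static int preempt_count;

static void old_style_preempt(void)
{
	/* preempt_active_enter(): the state was encoded in the count */
	preempt_count += PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET;
	assert(preempt_count & PREEMPT_ACTIVE);		/* what __schedule() tested */
	preempt_count -= PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET;
}

static void new_style_preempt(void)
{
	int preempt = 1;				/* __schedule(true) */

	preempt_count += PREEMPT_DISABLE_OFFSET;	/* plain preempt_disable() */
	assert(preempt && !(preempt_count & PREEMPT_ACTIVE));
	preempt_count -= PREEMPT_DISABLE_OFFSET;
}

int main(void)
{
	old_style_preempt();
	new_style_preempt();
	return 0;
}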




* [PATCH v2 07/12] sched: Robustify preemption leak checks
  2015-09-30  7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (5 preceding siblings ...)
  2015-09-30  7:10 ` [PATCH v2 06/12] sched: Stop setting PREEMPT_ACTIVE Peter Zijlstra
@ 2015-09-30  7:10 ` Peter Zijlstra
  2015-09-30  9:35   ` Steven Rostedt
  2015-09-30  7:10 ` [PATCH v2 08/12] sched: Simplify preempt_count tests Peter Zijlstra
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-3.patch --]
[-- Type: text/plain, Size: 1564 bytes --]

When we warn about a preempt_count leak, reset the preempt_count to
the known-good value so that the problem does not ripple forward.

This is most important on x86, which has a per-cpu preempt_count that is
not saved/restored (after this series). So if you schedule with an
invalid (!= 2*PREEMPT_DISABLE_OFFSET) preempt_count, the next task is
messed up too.

Enforcing this invariant limits the borkage to just the one task.

Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/exit.c       |    4 +++-
 kernel/sched/core.c |    4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -706,10 +706,12 @@ void do_exit(long code)
 	smp_mb();
 	raw_spin_unlock_wait(&tsk->pi_lock);
 
-	if (unlikely(in_atomic()))
+	if (unlikely(in_atomic())) {
 		pr_info("note: %s[%d] exited with preempt_count %d\n",
 			current->comm, task_pid_nr(current),
 			preempt_count());
+		preempt_count_set(PREEMPT_ENABLED);
+	}
 
 	/* sync mm's RSS info before statistics gathering */
 	if (tsk->mm)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2968,8 +2968,10 @@ static inline void schedule_debug(struct
 	BUG_ON(unlikely(task_stack_end_corrupted(prev)));
 #endif
 
-	if (unlikely(in_atomic_preempt_off()))
+	if (unlikely(in_atomic_preempt_off())) {
 		__schedule_bug(prev);
+		preempt_count_set(PREEMPT_DISABLED);
+	}
 	rcu_sleep_check();
 
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
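
A standalone model of why the reset matters once the count lives in a
per-cpu variable that is no longer saved/restored across switches
(illustrative; PREEMPT_DISABLE_OFFSET taken as 1, schedule_debug() here is
a simplified stand-in for the function patched above):

#include <stdio.h>

#define PREEMPT_DISABLE_OFFSET	1
#define PREEMPT_DISABLED	(PREEMPT_DISABLE_OFFSET + 0 /* PREEMPT_ENABLED */)

static int cpu_preempt_count;	/* models the shared per-cpu __preempt_count */

static void schedule_debug(const char *comm)
{
	/* only schedule()'s own preempt_disable() should be held here */
	if (cpu_preempt_count != PREEMPT_DISABLED) {
		printf("BUG: scheduling while atomic: %s, count=%d\n",
		       comm, cpu_preempt_count);
		cpu_preempt_count = PREEMPT_DISABLED;	/* the new reset */
	}
}

int main(void)
{
	/* "task A" leaked one preempt_disable() and then called schedule() */
	cpu_preempt_count = 2 * PREEMPT_DISABLE_OFFSET;
	schedule_debug("task A");	/* warns, then repairs the count */

	/* whatever runs next on this CPU now inherits a sane count */
	printf("count seen by the next task: %d\n", cpu_preempt_count);
	return 0;
}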




* [PATCH v2 08/12] sched: Simplify preempt_count tests
  2015-09-30  7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (6 preceding siblings ...)
  2015-09-30  7:10 ` [PATCH v2 07/12] sched: Robustify preemption leak checks Peter Zijlstra
@ 2015-09-30  7:10 ` Peter Zijlstra
  2015-10-01 15:31   ` Frederic Weisbecker
  2015-09-30  7:10 ` [PATCH v2 09/12] sched, x86: Kill saved_preempt_count Peter Zijlstra
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-8.patch --]
[-- Type: text/plain, Size: 1229 bytes --]

Since we stopped setting PREEMPT_ACTIVE, there is no need to mask it
out of preempt_count() tests.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/preempt.h |    3 +--
 kernel/sched/core.c     |    2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -126,8 +126,7 @@
  * Check whether we were atomic before we did preempt_disable():
  * (used by the scheduler)
  */
-#define in_atomic_preempt_off() \
-		((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_DISABLE_OFFSET)
+#define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
 
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
 extern void preempt_count_add(int val);
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7472,7 +7472,7 @@ void __init sched_init(void)
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 static inline int preempt_count_equals(int preempt_offset)
 {
-	int nested = (preempt_count() & ~PREEMPT_ACTIVE) + rcu_preempt_depth();
+	int nested = preempt_count() + rcu_preempt_depth();
 
 	return (nested == preempt_offset);
 }




* [PATCH v2 09/12] sched, x86: Kill saved_preempt_count
  2015-09-30  7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (7 preceding siblings ...)
  2015-09-30  7:10 ` [PATCH v2 08/12] sched: Simplify preempt_count tests Peter Zijlstra
@ 2015-09-30  7:10 ` Peter Zijlstra
  2015-10-01 15:40   ` Frederic Weisbecker
  2015-09-30  7:10 ` [PATCH v2 10/12] sched: Kill PREEMPT_ACTIVE Peter Zijlstra
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-9.patch --]
[-- Type: text/plain, Size: 2890 bytes --]

With the introduction of the context switch preempt_count invariant
and the demise of PREEMPT_ACTIVE, it's pointless to save/restore the
per-cpu preemption count; it must always be the same.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/preempt.h     |    5 +----
 arch/x86/include/asm/thread_info.h |    2 --
 arch/x86/kernel/process_32.c       |    8 --------
 arch/x86/kernel/process_64.c       |    8 --------
 4 files changed, 1 insertion(+), 22 deletions(-)

--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -30,12 +30,9 @@ static __always_inline void preempt_coun
 /*
  * must be macros to avoid header recursion hell
  */
-#define init_task_preempt_count(p) do { \
-	task_thread_info(p)->saved_preempt_count = FORK_PREEMPT_COUNT; \
-} while (0)
+#define init_task_preempt_count(p) do { } while (0)
 
 #define init_idle_preempt_count(p, cpu) do { \
-	task_thread_info(p)->saved_preempt_count = PREEMPT_ENABLED; \
 	per_cpu(__preempt_count, (cpu)) = PREEMPT_ENABLED; \
 } while (0)
 
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -57,7 +57,6 @@ struct thread_info {
 	__u32			flags;		/* low level flags */
 	__u32			status;		/* thread synchronous flags */
 	__u32			cpu;		/* current CPU */
-	int			saved_preempt_count;
 	mm_segment_t		addr_limit;
 	void __user		*sysenter_return;
 	unsigned int		sig_on_uaccess_error:1;
@@ -69,7 +68,6 @@ struct thread_info {
 	.task		= &tsk,			\
 	.flags		= 0,			\
 	.cpu		= 0,			\
-	.saved_preempt_count = INIT_PREEMPT_COUNT,	\
 	.addr_limit	= KERNEL_DS,		\
 }
 
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -280,14 +280,6 @@ __switch_to(struct task_struct *prev_p,
 		set_iopl_mask(next->iopl);
 
 	/*
-	 * If it were not for PREEMPT_ACTIVE we could guarantee that the
-	 * preempt_count of all tasks was equal here and this would not be
-	 * needed.
-	 */
-	task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count);
-	this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
-
-	/*
 	 * Now maybe handle debug registers and/or IO bitmaps
 	 */
 	if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV ||
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -401,14 +401,6 @@ __switch_to(struct task_struct *prev_p,
 	 */
 	this_cpu_write(current_task, next_p);
 
-	/*
-	 * If it were not for PREEMPT_ACTIVE we could guarantee that the
-	 * preempt_count of all tasks was equal here and this would not be
-	 * needed.
-	 */
-	task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count);
-	this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
-
 	/* Reload esp0 and ss1.  This changes current_thread_info(). */
 	load_sp0(tss, next);
 




* [PATCH v2 10/12] sched: Kill PREEMPT_ACTIVE
  2015-09-30  7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (8 preceding siblings ...)
  2015-09-30  7:10 ` [PATCH v2 09/12] sched, x86: Kill saved_preempt_count Peter Zijlstra
@ 2015-09-30  7:10 ` Peter Zijlstra
  2015-10-01 15:41   ` Frederic Weisbecker
  2015-09-30  7:10 ` [PATCH v2 11/12] sched: More notrace Peter Zijlstra
  2015-09-30  7:10 ` [PATCH v2 12/12] sched: Add preempt_count invariant check Peter Zijlstra
  11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-10.patch --]
[-- Type: text/plain, Size: 902 bytes --]

It's unused; kill the definition.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/preempt.h |    5 -----
 1 file changed, 5 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -26,7 +26,6 @@
  *         SOFTIRQ_MASK:	0x0000ff00
  *         HARDIRQ_MASK:	0x000f0000
  *             NMI_MASK:	0x00100000
- *       PREEMPT_ACTIVE:	0x00200000
  * PREEMPT_NEED_RESCHED:	0x80000000
  */
 #define PREEMPT_BITS	8
@@ -53,10 +52,6 @@
 
 #define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)
 
-#define PREEMPT_ACTIVE_BITS	1
-#define PREEMPT_ACTIVE_SHIFT	(NMI_SHIFT + NMI_BITS)
-#define PREEMPT_ACTIVE	(__IRQ_MASK(PREEMPT_ACTIVE_BITS) << PREEMPT_ACTIVE_SHIFT)
-
 /* We use the MSB mostly because its available */
 #define PREEMPT_NEED_RESCHED	0x80000000
 




* [PATCH v2 11/12] sched: More notrace
  2015-09-30  7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (9 preceding siblings ...)
  2015-09-30  7:10 ` [PATCH v2 10/12] sched: Kill PREEMPT_ACTIVE Peter Zijlstra
@ 2015-09-30  7:10 ` Peter Zijlstra
  2015-09-30  7:10 ` [PATCH v2 12/12] sched: Add preempt_count invariant check Peter Zijlstra
  11 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-11.patch --]
[-- Type: text/plain, Size: 2159 bytes --]

preempt_schedule_common() is marked notrace, but it does not use the
_notrace() preempt_count functions, and __schedule() is also not marked
notrace, which means it's perfectly possible to end up in the
tracer from preempt_schedule_common().

Steve says:

| Yep, there's some history to this. This was originally the issue that
| caused function tracing to go into infinite recursion. But now we have
| preempt_schedule_notrace(), which is used by the function tracer, and
| that function must not be traced till preemption is disabled.
|
| Now if function tracing is running and we take an interrupt when
| NEED_RESCHED is set, it calls
|
|   preempt_schedule_common() (not traced)
|
| But then that calls preempt_disable() (traced)
|
| function tracer calls preempt_disable_notrace() followed by
| preempt_enable_notrace() which will see NEED_RESCHED set, and it will
| call preempt_schedule_notrace(), which stops the recursion, but
| still calls __schedule() here, and that means when we return, we call
| the __schedule() from preempt_schedule_common().
|
| That said, I prefer this patch. Preemption is disabled before calling
| __schedule(), and we get rid of a one round recursion with the
| scheduler.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3057,7 +3057,7 @@ pick_next_task(struct rq *rq, struct tas
  *
  * WARNING: must be called with preemption disabled!
  */
-static void __sched __schedule(bool preempt)
+static void __sched notrace __schedule(bool preempt)
 {
 	struct task_struct *prev, *next;
 	unsigned long *switch_count;
@@ -3203,9 +3203,9 @@ void __sched schedule_preempt_disabled(v
 static void __sched notrace preempt_schedule_common(void)
 {
 	do {
-		preempt_disable();
+		preempt_disable_notrace();
 		__schedule(true);
-		sched_preempt_enable_no_resched();
+		preempt_enable_no_resched_notrace();
 
 		/*
 		 * Check again in case we missed a preemption opportunity




* [PATCH v2 12/12] sched: Add preempt_count invariant check
  2015-09-30  7:10 [PATCH v2 00/12] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (10 preceding siblings ...)
  2015-09-30  7:10 ` [PATCH v2 11/12] sched: More notrace Peter Zijlstra
@ 2015-09-30  7:10 ` Peter Zijlstra
  2015-09-30  9:38   ` Steven Rostedt
  11 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30  7:10 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-12.patch --]
[-- Type: text/plain, Size: 661 bytes --]

Ingo requested I keep my debug check for the preempt_count invariant.

Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2514,6 +2514,10 @@ static struct rq *finish_task_switch(str
 	 *
 	 * Also, see FORK_PREEMPT_COUNT.
 	 */
+	if (unlikely(WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
+			       "corrupted preempt_count: %s/%d/0x%x\n",
+			       current->comm, current->pid, preempt_count())))
+		preempt_count_set(FORK_PREEMPT_COUNT);
 
 	rq->prev_mm = NULL;
 



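The warn-and-repair pattern above works because WARN_ONCE() evaluates to its
condition, so the fixup can hang off the warning; a small user-space analogue
(warn_once() below is a simplified stand-in, not the kernel macro):

#include <stdio.h>

#define warn_once(cond, fmt, ...)					\
({									\
	static int __warned;						\
	int __c = (cond);						\
	if (__c && !__warned) {						\
		__warned = 1;						\
		fprintf(stderr, fmt, ##__VA_ARGS__);			\
	}								\
	__c;								\
})

int main(void)
{
	int preempt_count = 3, expected = 2;	/* pretend the invariant broke */

	if (warn_once(preempt_count != expected,
		      "corrupted preempt_count: 0x%x\n", preempt_count))
		preempt_count = expected;	/* repair, as the hunk above does */

	printf("preempt_count = %d\n", preempt_count);
	return 0;
}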

* Re: [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT
  2015-09-30  7:10 ` [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
@ 2015-09-30  9:04   ` Steven Rostedt
  2015-09-30 20:39     ` Thomas Gleixner
  2015-10-01 13:25   ` Frederic Weisbecker
  1 sibling, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2015-09-30  9:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Wed, 30 Sep 2015 09:10:36 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> As per commit d86ee4809d03 ("sched: optimize cond_resched()") we need
> PREEMPT_ACTIVE to keep cond_resched() from working before the
> scheduler is set up.
> 
> However, keeping preemption disabled should do the same thing already,
> making the PREEMPT_ACTIVE part entirely redundant.
> 
> The only complication is !PREEMPT_COUNT kernels, where
> PREEMPT_DISABLED ends up being 0. Instead we use an unconditional
> PREEMPT_OFFSET to set preempt_count() even on !PREEMPT_COUNT kernels.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/sched.h |   11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -606,19 +606,18 @@ struct task_cputime_atomic {
>  #endif
>  
>  /*
> - * Disable preemption until the scheduler is running.
> - * Reset by start_kernel()->sched_init()->init_idle().
> + * Disable preemption until the scheduler is running -- use an unconditional
> + * value so that it also works on !PREEMPT_COUNT kernels.
>   *
> - * We include PREEMPT_ACTIVE to avoid cond_resched() from working
> - * before the scheduler is active -- see should_resched().
> + * Reset by start_kernel()->sched_init()->init_idle().

 Reset by start_kernel()->sched_init()->init_idle()->init_idle_preempt_count().

Other than that.

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

>   */
> -#define INIT_PREEMPT_COUNT	(PREEMPT_DISABLED + PREEMPT_ACTIVE)
> +#define INIT_PREEMPT_COUNT	PREEMPT_OFFSET
>  
>  /**
>   * struct thread_group_cputimer - thread group interval timer counts
>   * @cputime_atomic:	atomic thread group interval timers.
>   * @running:		non-zero when there are timers running and
> - * 			@cputime receives updates.
> + *			@cputime receives updates.
>   *
>   * This structure contains the version of task_cputime, above, that is
>   * used for thread group CPU timer calculations.
> 



* Re: [PATCH v2 03/12] sched: Create preempt_count invariant
  2015-09-30  7:10 ` [PATCH v2 03/12] sched: Create preempt_count invariant Peter Zijlstra
@ 2015-09-30  9:32   ` Steven Rostedt
  2015-09-30 11:13     ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2015-09-30  9:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Wed, 30 Sep 2015 09:10:38 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Assuming units of PREEMPT_DISABLE_OFFSET for preempt_count() numbers.
> 
> Now that TASK_DEAD no longer results in preempt_count() == 3 during
> scheduling, we will always call context_switch() with preempt_count()
> == 2.
> 
> However, we don't always end up with preempt_count() == 2 in
> finish_task_switch() because new tasks get created with
> preempt_count() == 1.
> 
> Create FORK_PREEMPT_COUNT and set it to 2 and use that in the right
> places. Note that we cannot use INIT_PREEMPT_COUNT as that serves
> another purpose (boot).
> 
> After this, preempt_count() is invariant across the context switch,
> with the exception of PREEMPT_ACTIVE.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/preempt.h |    2 +-
>  include/asm-generic/preempt.h  |    2 +-
>  include/linux/sched.h          |   17 ++++++++++++-----
>  kernel/sched/core.c            |   23 +++++++++++++++++++++--
>  4 files changed, 35 insertions(+), 9 deletions(-)
> 
> --- a/arch/x86/include/asm/preempt.h
> +++ b/arch/x86/include/asm/preempt.h
> @@ -31,7 +31,7 @@ static __always_inline void preempt_coun
>   * must be macros to avoid header recursion hell
>   */
>  #define init_task_preempt_count(p) do { \
> -	task_thread_info(p)->saved_preempt_count = PREEMPT_DISABLED; \
> +	task_thread_info(p)->saved_preempt_count = FORK_PREEMPT_COUNT; \
>  } while (0)
>  
>  #define init_idle_preempt_count(p, cpu) do { \
> --- a/include/asm-generic/preempt.h
> +++ b/include/asm-generic/preempt.h
> @@ -24,7 +24,7 @@ static __always_inline void preempt_coun
>   * must be macros to avoid header recursion hell
>   */
>  #define init_task_preempt_count(p) do { \
> -	task_thread_info(p)->preempt_count = PREEMPT_DISABLED; \
> +	task_thread_info(p)->preempt_count = FORK_PREEMPT_COUNT; \
>  } while (0)
>  
>  #define init_idle_preempt_count(p, cpu) do { \
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -599,11 +599,7 @@ struct task_cputime_atomic {
>  		.sum_exec_runtime = ATOMIC64_INIT(0),		\
>  	}
>  
> -#ifdef CONFIG_PREEMPT_COUNT
> -#define PREEMPT_DISABLED	(1 + PREEMPT_ENABLED)
> -#else
> -#define PREEMPT_DISABLED	PREEMPT_ENABLED
> -#endif
> +#define PREEMPT_DISABLED	(PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)

Hmm, it looks to me that you removed all users of PREEMPT_DISABLED.

Did you add another user of it?

-- Steve

>  
>  /*
>   * Disable preemption until the scheduler is running -- use an unconditional
> @@ -613,6 +609,17 @@ struct task_cputime_atomic {
>   */
>  #define INIT_PREEMPT_COUNT	PREEMPT_OFFSET
>  



* Re: [PATCH v2 07/12] sched: Robustify preemption leak checks
  2015-09-30  7:10 ` [PATCH v2 07/12] sched: Robustify preemption leak checks Peter Zijlstra
@ 2015-09-30  9:35   ` Steven Rostedt
  0 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2015-09-30  9:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Wed, 30 Sep 2015 09:10:42 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> When we warn about a preempt_count leak, reset the preempt_count to
> the known-good value so that the problem does not ripple forward.
> 
> This is most important on x86, which has a per-cpu preempt_count that is
> not saved/restored (after this series). So if you schedule with an
> invalid (!= 2*PREEMPT_DISABLE_OFFSET) preempt_count, the next task is
> messed up too.
> 
> Enforcing this invariant limits the borkage to just the one task.
> 
> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---



* Re: [PATCH v2 12/12] sched: Add preempt_count invariant check
  2015-09-30  7:10 ` [PATCH v2 12/12] sched: Add preempt_count invariant check Peter Zijlstra
@ 2015-09-30  9:38   ` Steven Rostedt
  2015-09-30 11:15     ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2015-09-30  9:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Wed, 30 Sep 2015 09:10:47 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Ingo requested I keep my debug check for the preempt_count invariant.
> 
> Requested-by: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/core.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2514,6 +2514,10 @@ static struct rq *finish_task_switch(str
>  	 *
>  	 * Also, see FORK_PREEMPT_COUNT.
>  	 */
> +	if (unlikely(WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,

Nuke the "unlikely" it's redundant with the WARN_ONCE().

-- Steve

> +			       "corrupted preempt_count: %s/%d/0x%x\n",
> +			       current->comm, current->pid, preempt_count())))
> +		preempt_count_set(FORK_PREEMPT_COUNT);
>  
>  	rq->prev_mm = NULL;
>  
> 



* Re: [PATCH v2 03/12] sched: Create preempt_count invariant
  2015-09-30  9:32   ` Steven Rostedt
@ 2015-09-30 11:13     ` Peter Zijlstra
  2015-09-30 13:36       ` Steven Rostedt
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 11:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Wed, Sep 30, 2015 at 05:32:08AM -0400, Steven Rostedt wrote:
> On Wed, 30 Sep 2015 09:10:38 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> > -#ifdef CONFIG_PREEMPT_COUNT
> > -#define PREEMPT_DISABLED	(1 + PREEMPT_ENABLED)
> > -#else
> > -#define PREEMPT_DISABLED	PREEMPT_ENABLED
> > -#endif
> > +#define PREEMPT_DISABLED	(PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
> 
> Hmm, it looks to me that you removed all users of PREEMPT_DISABLED.
> 
> Did you add another user of it?

Dunno, lemme go grep ;-)

include/linux/sched.h:#define PREEMPT_DISABLED  (PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
kernel/sched/core.c:            preempt_count_set(PREEMPT_DISABLED);

So there still is one user.


* Re: [PATCH v2 12/12] sched: Add preempt_count invariant check
  2015-09-30  9:38   ` Steven Rostedt
@ 2015-09-30 11:15     ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2015-09-30 11:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Wed, Sep 30, 2015 at 05:38:10AM -0400, Steven Rostedt wrote:
> On Wed, 30 Sep 2015 09:10:47 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Ingo requested I keep my debug check for the preempt_count invariant.
> > 
> > Requested-by: Ingo Molnar <mingo@kernel.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  kernel/sched/core.c |    4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2514,6 +2514,10 @@ static struct rq *finish_task_switch(str
> >  	 *
> >  	 * Also, see FORK_PREEMPT_COUNT.
> >  	 */
> > +	if (unlikely(WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
> 
> Nuke the "unlikely" it's redundant with the WARN_ONCE().

Ah, indeed. *poof*


* Re: [PATCH v2 03/12] sched: Create preempt_count invariant
  2015-09-30 11:13     ` Peter Zijlstra
@ 2015-09-30 13:36       ` Steven Rostedt
  0 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2015-09-30 13:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Wed, 30 Sep 2015 13:13:23 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Sep 30, 2015 at 05:32:08AM -0400, Steven Rostedt wrote:
> > On Wed, 30 Sep 2015 09:10:38 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > > -#ifdef CONFIG_PREEMPT_COUNT
> > > -#define PREEMPT_DISABLED	(1 + PREEMPT_ENABLED)
> > > -#else
> > > -#define PREEMPT_DISABLED	PREEMPT_ENABLED
> > > -#endif
> > > +#define PREEMPT_DISABLED	(PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
> > 
> > Hmm, it looks to me that you removed all users of PREEMPT_DISABLED.
> > 
> > Did you add another user of it?
> 
> Dunno, lemme go grep ;-)
> 
> include/linux/sched.h:#define PREEMPT_DISABLED  (PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
> kernel/sched/core.c:            preempt_count_set(PREEMPT_DISABLED);
> 
> So there still is one user.

Ah, that's the one you added in patch 7. I did a git grep from the
vanilla tree.

-- Steve


* Re: [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT
  2015-09-30  9:04   ` Steven Rostedt
@ 2015-09-30 20:39     ` Thomas Gleixner
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2015-09-30 20:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, mingo, linux-kernel, torvalds, fweisbec, oleg,
	umgwanakikbuti

On Wed, 30 Sep 2015, Steven Rostedt wrote:
> >  /*
> > - * Disable preemption until the scheduler is running.
> > - * Reset by start_kernel()->sched_init()->init_idle().
> > + * Disable preemption until the scheduler is running -- use an unconditional
> > + * value so that it also works on !PREEMPT_COUNT kernels.
> >   *
> > - * We include PREEMPT_ACTIVE to avoid cond_resched() from working
> > - * before the scheduler is active -- see should_resched().
> > + * Reset by start_kernel()->sched_init()->init_idle().
> 
>  Reset by start_kernel()->sched_init()->init_idle()->init_idle_preempt_count().
> 
> Other than that.
> 
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


* Re: [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT
  2015-09-30  7:10 ` [PATCH v2 01/12] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
  2015-09-30  9:04   ` Steven Rostedt
@ 2015-10-01 13:25   ` Frederic Weisbecker
  1 sibling, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2015-10-01 13:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Wed, Sep 30, 2015 at 09:10:36AM +0200, Peter Zijlstra wrote:
> As per commit d86ee4809d03 ("sched: optimize cond_resched()") we need
> PREEMPT_ACTIVE to keep cond_resched() from working before the
> scheduler is set up.
> 
> However, keeping preemption disabled should do the same thing already,
> making the PREEMPT_ACTIVE part entirely redundant.
> 
> The only complication is !PREEMPT_COUNT kernels, where
> PREEMPT_DISABLED ends up being 0. Instead we use an unconditional
> PREEMPT_OFFSET to set preempt_count() even on !PREEMPT_COUNT kernels.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>



* Re: [PATCH v2 06/12] sched: Stop setting PREEMPT_ACTIVE
  2015-09-30  7:10 ` [PATCH v2 06/12] sched: Stop setting PREEMPT_ACTIVE Peter Zijlstra
@ 2015-10-01 15:27   ` Frederic Weisbecker
  0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2015-10-01 15:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Wed, Sep 30, 2015 at 09:10:41AM +0200, Peter Zijlstra wrote:
> Now that nothing tests for PREEMPT_ACTIVE anymore, stop setting it.
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Great news!

Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>


* Re: [PATCH v2 08/12] sched: Simplify preempt_count tests
  2015-09-30  7:10 ` [PATCH v2 08/12] sched: Simplify preempt_count tests Peter Zijlstra
@ 2015-10-01 15:31   ` Frederic Weisbecker
  0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2015-10-01 15:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Wed, Sep 30, 2015 at 09:10:43AM +0200, Peter Zijlstra wrote:
> Since we stopped setting PREEMPT_ACTIVE, there is no need to mask it
> out of preempt_count() tests.
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>


* Re: [PATCH v2 09/12] sched, x86: Kill saved_preempt_count
  2015-09-30  7:10 ` [PATCH v2 09/12] sched, x86: Kill saved_preempt_count Peter Zijlstra
@ 2015-10-01 15:40   ` Frederic Weisbecker
  0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2015-10-01 15:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Wed, Sep 30, 2015 at 09:10:44AM +0200, Peter Zijlstra wrote:
> With the introduction of the context switch preempt_count invariant
> and the demise of PREEMPT_ACTIVE, it's pointless to save/restore the
> per-cpu preemption count; it must always be the same.
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>

Good news as well.


* Re: [PATCH v2 10/12] sched: Kill PREEMPT_ACTIVE
  2015-09-30  7:10 ` [PATCH v2 10/12] sched: Kill PREEMPT_ACTIVE Peter Zijlstra
@ 2015-10-01 15:41   ` Frederic Weisbecker
  0 siblings, 0 replies; 26+ messages in thread
From: Frederic Weisbecker @ 2015-10-01 15:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Wed, Sep 30, 2015 at 09:10:45AM +0200, Peter Zijlstra wrote:
> It's unused; kill the definition.
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>

