* [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE
@ 2015-09-29  9:28 Peter Zijlstra
  2015-09-29  9:28 ` [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
                   ` (12 more replies)
  0 siblings, 13 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29  9:28 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra


This series kills PREEMPT_ACTIVE dead; it's a tad risky, but it survives light
testing on x86_64 and seems to compile on everything else.

---
 arch/x86/include/asm/preempt.h     |  5 +--
 arch/x86/include/asm/thread_info.h |  2 --
 arch/x86/kernel/process_32.c       |  8 -----
 arch/x86/kernel/process_64.c       |  8 -----
 include/asm-generic/preempt.h      |  2 +-
 include/linux/preempt.h            | 20 +----------
 include/linux/sched.h              |  7 ++--
 include/trace/events/sched.h       | 22 +++++-------
 kernel/exit.c                      |  4 ++-
 kernel/sched/core.c                | 70 +++++++++++++++++++++-----------------
 kernel/trace/ftrace.c              |  2 +-
 kernel/trace/trace_sched_switch.c  |  3 +-
 kernel/trace/trace_sched_wakeup.c  |  2 +-
 13 files changed, 60 insertions(+), 95 deletions(-)



* [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
@ 2015-09-29  9:28 ` Peter Zijlstra
  2015-09-29 10:30   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
                     ` (3 more replies)
  2015-09-29  9:28 ` [RFC][PATCH 02/11] sched: Create preempt_count invariant Peter Zijlstra
                   ` (11 subsequent siblings)
  12 siblings, 4 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29  9:28 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-1.patch --]
[-- Type: text/plain, Size: 1285 bytes --]

As per commit d86ee4809d03 ("sched: optimize cond_resched()") we need
PREEMPT_ACTIVE to keep cond_resched() from working before the
scheduler is set up.

However, keeping preemption disabled should do the same thing already,
making the PREEMPT_ACTIVE part entirely redundant.
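
For context, cond_resched() only yields when the preempt count is zero, so
any non-zero initial count keeps it inert until init_idle() resets the
count. A rough sketch of the checks involved around this kernel version
(not part of this patch; details and signatures vary per architecture and
release):

	/* asm-generic flavour */
	static __always_inline bool should_resched(void)
	{
		/* only yield when no preempt-disable section is pending */
		return unlikely(!preempt_count() && tif_need_resched());
	}

	int __sched _cond_resched(void)
	{
		if (should_resched()) {
			preempt_schedule_common();
			return 1;
		}
		return 0;
	}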

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched.h |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -608,17 +608,14 @@ struct task_cputime_atomic {
 /*
  * Disable preemption until the scheduler is running.
  * Reset by start_kernel()->sched_init()->init_idle().
- *
- * We include PREEMPT_ACTIVE to avoid cond_resched() from working
- * before the scheduler is active -- see should_resched().
  */
-#define INIT_PREEMPT_COUNT	(PREEMPT_DISABLED + PREEMPT_ACTIVE)
+#define INIT_PREEMPT_COUNT	PREEMPT_DISABLED
 
 /**
  * struct thread_group_cputimer - thread group interval timer counts
  * @cputime_atomic:	atomic thread group interval timers.
  * @running:		non-zero when there are timers running and
- * 			@cputime receives updates.
+ *			@cputime receives updates.
  *
  * This structure contains the version of task_cputime, above, that is
  * used for thread group CPU timer calculations.




* [RFC][PATCH 02/11] sched: Create preempt_count invariant
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
  2015-09-29  9:28 ` [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
@ 2015-09-29  9:28 ` Peter Zijlstra
  2015-09-29 10:30   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
                     ` (2 more replies)
  2015-09-29  9:28 ` [RFC][PATCH 03/11] sched: Robustify preemption leak checks Peter Zijlstra
                   ` (10 subsequent siblings)
  12 siblings, 3 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29  9:28 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-2.patch --]
[-- Type: text/plain, Size: 1936 bytes --]

Ensure that upon scheduling preempt_count == 2; although currently an
additional PREEMPT_ACTIVE is still possible.
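
The non-obvious half of the invariant is the fork path: a freshly forked
task never went through schedule(), so its initial preempt count has to be
forged to look as if it had. A hedged sketch of how the two paths line up
(assuming CONFIG_PREEMPT_COUNT):

	/*
	 * switching-out task:                newly forked task:
	 *
	 * schedule()
	 *   preempt_disable();      // 1
	 *   __schedule()
	 *     raw_spin_lock_irq()   // 2
	 *     context_switch()  ----->  ret_from_fork
	 *                                 schedule_tail()        // expects 2
	 *                                   finish_task_switch() // drops rq->lock: 1
	 *                                   preempt_enable()     // 0
	 */

Hence init_task_preempt_count() below seeds the child with the equivalent
of two disables.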

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/preempt.h |    3 ++-
 include/asm-generic/preempt.h  |    2 +-
 kernel/sched/core.c            |   14 ++++++++++----
 3 files changed, 13 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -31,7 +31,8 @@ static __always_inline void preempt_coun
  * must be macros to avoid header recursion hell
  */
 #define init_task_preempt_count(p) do { \
-	task_thread_info(p)->saved_preempt_count = PREEMPT_DISABLED; \
+	task_thread_info(p)->saved_preempt_count = \
+		2*PREEMPT_DISABLE_OFFSET + PREEMPT_NEED_RESCHED; \
 } while (0)
 
 #define init_idle_preempt_count(p, cpu) do { \
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -24,7 +24,7 @@ static __always_inline void preempt_coun
  * must be macros to avoid header recursion hell
  */
 #define init_task_preempt_count(p) do { \
-	task_thread_info(p)->preempt_count = PREEMPT_DISABLED; \
+	task_thread_info(p)->preempt_count = 2*PREEMPT_DISABLED; \
 } while (0)
 
 #define init_idle_preempt_count(p, cpu) do { \
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2588,11 +2588,17 @@ asmlinkage __visible void schedule_tail(
 {
 	struct rq *rq;
 
-	/* finish_task_switch() drops rq->lock and enables preemtion */
-	preempt_disable();
-	rq = finish_task_switch(prev);
+	/*
+	 * Still have preempt_count() == 2, from:
+	 *
+	 *	schedule()
+	 *	  preempt_disable();			// 1
+	 *	  __schedule()
+	 *	    raw_spin_lock_irq(&rq->lock)	// 2
+	 */
+	rq = finish_task_switch(prev); /* drops rq->lock, preempt_count() == 1 */
 	balance_callback(rq);
-	preempt_enable();
+	preempt_enable(); /* preempt_count() == 0 */
 
 	if (current->set_child_tid)
 		put_user(task_pid_vnr(current), current->set_child_tid);




* [RFC][PATCH 03/11] sched: Robustify preemption leak checks
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
  2015-09-29  9:28 ` [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
  2015-09-29  9:28 ` [RFC][PATCH 02/11] sched: Create preempt_count invariant Peter Zijlstra
@ 2015-09-29  9:28 ` Peter Zijlstra
  2015-09-29 10:31   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
                     ` (3 more replies)
  2015-09-29  9:28 ` [RFC][PATCH 04/11] sched: Rework TASK_DEAD preemption exception Peter Zijlstra
                   ` (9 subsequent siblings)
  12 siblings, 4 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29  9:28 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-3.patch --]
[-- Type: text/plain, Size: 1282 bytes --]

When we warn about a preempt_count leak, reset the preempt_count to
the known-good value so that the problem does not ripple forward.
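
A preempt_count leak is typically an unbalanced disable, e.g. an error path
that returns with a lock still held. A purely hypothetical illustration
(the names are made up, not code from this series):

	struct thing {
		spinlock_t	lock;
		bool		ready;
	};

	static int frob_thing(struct thing *t)
	{
		spin_lock(&t->lock);	/* count += 1 (with CONFIG_PREEMPT_COUNT) */
		if (!t->ready)
			return -EAGAIN;	/* leak: unlock (and its decrement) skipped */
		/* ... do the actual work ... */
		spin_unlock(&t->lock);	/* count -= 1 */
		return 0;
	}

Without the reset, the stale count would keep tripping the same warning on
every later check that task hits.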

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/exit.c       |    4 +++-
 kernel/sched/core.c |    4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -706,10 +706,12 @@ void do_exit(long code)
 	smp_mb();
 	raw_spin_unlock_wait(&tsk->pi_lock);
 
-	if (unlikely(in_atomic()))
+	if (unlikely(in_atomic())) {
 		pr_info("note: %s[%d] exited with preempt_count %d\n",
 			current->comm, task_pid_nr(current),
 			preempt_count());
+		preempt_count_set(PREEMPT_ENABLED);
+	}
 
 	/* sync mm's RSS info before statistics gathering */
 	if (tsk->mm)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2960,8 +2960,10 @@ static inline void schedule_debug(struct
 	 * schedule() atomically, we ignore that path. Otherwise whine
 	 * if we are scheduling when we should not.
 	 */
-	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD))
+	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD)) {
 		__schedule_bug(prev);
+		preempt_count_set(PREEMPT_DISABLED);
+	}
 	rcu_sleep_check();
 
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));




* [RFC][PATCH 04/11] sched: Rework TASK_DEAD preemption exception
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (2 preceding siblings ...)
  2015-09-29  9:28 ` [RFC][PATCH 03/11] sched: Robustify preemption leak checks Peter Zijlstra
@ 2015-09-29  9:28 ` Peter Zijlstra
  2015-09-29 13:13   ` Thomas Gleixner
                     ` (2 more replies)
  2015-09-29  9:28 ` [RFC][PATCH 05/11] sched: Add preempt argument to __schedule() Peter Zijlstra
                   ` (8 subsequent siblings)
  12 siblings, 3 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29  9:28 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-4.patch --]
[-- Type: text/plain, Size: 1599 bytes --]

TASK_DEAD is special in that the final schedule call from do_exit()
must be done with preemption disabled.

This leads to a violation of our new scheduling invariant, which states
that the preempt count should be 2. Move the TASK_DEAD check out of the
debug check and use it to decrement the preempt count (from 2 to 1).
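
For the dying task the counts line up roughly as follows (a sketch assuming
CONFIG_PREEMPT_COUNT; do_exit() holds its own preempt_disable() across the
final schedule() and never pairs it with an enable):

	do_exit()
	  preempt_disable();			// 1, never released
	  tsk->state = TASK_DEAD;
	  schedule()
	    preempt_disable();			// 2, one more than any other caller
	    __schedule()
	      if (prev->state == TASK_DEAD)
	        preempt_enable_no_resched_notrace();	// back to 1
	      schedule_debug(prev);		// in_atomic_preempt_off() now passes
	      raw_spin_lock_irq(&rq->lock);	// 2, invariant holds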

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2955,12 +2955,8 @@ static inline void schedule_debug(struct
 #ifdef CONFIG_SCHED_STACK_END_CHECK
 	BUG_ON(unlikely(task_stack_end_corrupted(prev)));
 #endif
-	/*
-	 * Test if we are atomic. Since do_exit() needs to call into
-	 * schedule() atomically, we ignore that path. Otherwise whine
-	 * if we are scheduling when we should not.
-	 */
-	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD)) {
+
+	if (unlikely(in_atomic_preempt_off())) {
 		__schedule_bug(prev);
 		preempt_count_set(PREEMPT_DISABLED);
 	}
@@ -3061,6 +3057,17 @@ static void __sched __schedule(void)
 	rcu_note_context_switch();
 	prev = rq->curr;
 
+	/*
+	 * do_exit() calls schedule() with preemption disabled as an exception;
+	 * however we must fix that up, otherwise the next task will see an
+	 * inconsistent preempt count.
+	 *
+	 * It also avoids the below schedule_debug() test from complaining
+	 * about this.
+	 */
+	if (unlikely(prev->state == TASK_DEAD))
+		preempt_enable_no_resched_notrace();
+
 	schedule_debug(prev);
 
 	if (sched_feat(HRTICK))




* [RFC][PATCH 05/11] sched: Add preempt argument to __schedule()
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (3 preceding siblings ...)
  2015-09-29  9:28 ` [RFC][PATCH 04/11] sched: Rework TASK_DEAD preemption exception Peter Zijlstra
@ 2015-09-29  9:28 ` Peter Zijlstra
  2015-09-29 13:14   ` Thomas Gleixner
                     ` (2 more replies)
  2015-09-29  9:28 ` [RFC][PATCH 06/11] sched: Fix trace_sched_switch() Peter Zijlstra
                   ` (7 subsequent siblings)
  12 siblings, 3 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29  9:28 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-5.patch --]
[-- Type: text/plain, Size: 1933 bytes --]

There is only a single PREEMPT_ACTIVE use in the regular __schedule()
path and that is to circumvent the task->state check. Since the code
setting PREEMPT_ACTIVE is the immediate caller of __schedule() we can
replace this with a function argument.
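
The state check has to be skipped on preemption because a task can be
preempted halfway through the classic prepare-to-wait pattern, and
dequeueing it there would lose the wakeup. A generic illustration
("condition" is a placeholder, not code from this series):

	set_current_state(TASK_UNINTERRUPTIBLE);
	/*
	 * A preemption here ends up in __schedule(true): despite ->state the
	 * task must stay on the runqueue, i.e. be treated as still RUNNING.
	 */
	if (!condition)
		schedule();	/* voluntary: __schedule(false) may deactivate us */
	__set_current_state(TASK_RUNNING);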

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3045,7 +3045,7 @@ pick_next_task(struct rq *rq, struct tas
  *
  * WARNING: must be called with preemption disabled!
  */
-static void __sched __schedule(void)
+static void __sched __schedule(bool preempt)
 {
 	struct task_struct *prev, *next;
 	unsigned long *switch_count;
@@ -3085,7 +3085,7 @@ static void __sched __schedule(void)
 	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
 
 	switch_count = &prev->nivcsw;
-	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
+	if (!preempt && prev->state) {
 		if (unlikely(signal_pending_state(prev->state, prev))) {
 			prev->state = TASK_RUNNING;
 		} else {
@@ -3150,7 +3150,7 @@ asmlinkage __visible void __sched schedu
 	sched_submit_work(tsk);
 	do {
 		preempt_disable();
-		__schedule();
+		__schedule(false);
 		sched_preempt_enable_no_resched();
 	} while (need_resched());
 }
@@ -3191,7 +3191,7 @@ static void __sched notrace preempt_sche
 {
 	do {
 		preempt_active_enter();
-		__schedule();
+		__schedule(true);
 		preempt_active_exit();
 
 		/*
@@ -3256,7 +3256,7 @@ asmlinkage __visible void __sched notrac
 		 * an infinite recursion.
 		 */
 		prev_ctx = exception_enter();
-		__schedule();
+		__schedule(true);
 		exception_exit(prev_ctx);
 
 		barrier();
@@ -3285,7 +3285,7 @@ asmlinkage __visible void __sched preemp
 	do {
 		preempt_active_enter();
 		local_irq_enable();
-		__schedule();
+		__schedule(true);
 		local_irq_disable();
 		preempt_active_exit();
 	} while (need_resched());




* [RFC][PATCH 06/11] sched: Fix trace_sched_switch()
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (4 preceding siblings ...)
  2015-09-29  9:28 ` [RFC][PATCH 05/11] sched: Add preempt argument to __schedule() Peter Zijlstra
@ 2015-09-29  9:28 ` Peter Zijlstra
  2015-09-29 13:15   ` Thomas Gleixner
  2015-09-29 15:38   ` Steven Rostedt
  2015-09-29  9:28 ` [RFC][PATCH 07/11] sched: Stop setting PREEMPT_ACTIVE Peter Zijlstra
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29  9:28 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-6.patch --]
[-- Type: text/plain, Size: 3939 bytes --]

__trace_sched_switch_state() is the last remaining PREEMPT_ACTIVE
user. Move trace_sched_switch() from prepare_task_switch() to
__schedule() and propagate the @preempt argument.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/trace/events/sched.h      |   22 +++++++++-------------
 kernel/sched/core.c               |    2 +-
 kernel/trace/ftrace.c             |    2 +-
 kernel/trace/trace_sched_switch.c |    3 ++-
 kernel/trace/trace_sched_wakeup.c |    2 +-
 5 files changed, 14 insertions(+), 17 deletions(-)

--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -104,22 +104,17 @@ DEFINE_EVENT(sched_wakeup_template, sche
 	     TP_ARGS(p));
 
 #ifdef CREATE_TRACE_POINTS
-static inline long __trace_sched_switch_state(struct task_struct *p)
+static inline long __trace_sched_switch_state(bool preempt, struct task_struct *p)
 {
-	long state = p->state;
-
-#ifdef CONFIG_PREEMPT
 #ifdef CONFIG_SCHED_DEBUG
 	BUG_ON(p != current);
 #endif /* CONFIG_SCHED_DEBUG */
+
 	/*
-	 * For all intents and purposes a preempted task is a running task.
+	 * Preemption ignores task state, therefore preempted tasks are always
+	 * RUNNING (we will not have dequeued if state != RUNNING).
 	 */
-	if (preempt_count() & PREEMPT_ACTIVE)
-		state = TASK_RUNNING | TASK_STATE_MAX;
-#endif /* CONFIG_PREEMPT */
-
-	return state;
+	return preempt ? TASK_RUNNING | TASK_STATE_MAX : p->state;
 }
 #endif /* CREATE_TRACE_POINTS */
 
@@ -128,10 +123,11 @@ static inline long __trace_sched_switch_
  */
 TRACE_EVENT(sched_switch,
 
-	TP_PROTO(struct task_struct *prev,
+	TP_PROTO(bool preempt,
+		 struct task_struct *prev,
 		 struct task_struct *next),
 
-	TP_ARGS(prev, next),
+	TP_ARGS(preempt, prev, next),
 
 	TP_STRUCT__entry(
 		__array(	char,	prev_comm,	TASK_COMM_LEN	)
@@ -147,7 +143,7 @@ TRACE_EVENT(sched_switch,
 		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
 		__entry->prev_pid	= prev->pid;
 		__entry->prev_prio	= prev->prio;
-		__entry->prev_state	= __trace_sched_switch_state(prev);
+		__entry->prev_state	= __trace_sched_switch_state(preempt, prev);
 		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
 		__entry->next_pid	= next->pid;
 		__entry->next_prio	= next->prio;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2470,7 +2470,6 @@ static inline void
 prepare_task_switch(struct rq *rq, struct task_struct *prev,
 		    struct task_struct *next)
 {
-	trace_sched_switch(prev, next);
 	sched_info_switch(rq, prev, next);
 	perf_event_task_sched_out(prev, next);
 	fire_sched_out_preempt_notifiers(prev, next);
@@ -3121,6 +3120,7 @@ static void __sched __schedule(bool pree
 		rq->curr = next;
 		++*switch_count;
 
+		trace_sched_switch(preempt, prev, next);
 		rq = context_switch(rq, prev, next); /* unlocks the rq */
 		cpu = cpu_of(rq);
 	} else {
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5697,7 +5697,7 @@ static int alloc_retstack_tasklist(struc
 }
 
 static void
-ftrace_graph_probe_sched_switch(void *ignore,
+ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 			struct task_struct *prev, struct task_struct *next)
 {
 	unsigned long long timestamp;
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -16,7 +16,8 @@ static int			sched_ref;
 static DEFINE_MUTEX(sched_register_mutex);
 
 static void
-probe_sched_switch(void *ignore, struct task_struct *prev, struct task_struct *next)
+probe_sched_switch(void *ignore, bool preempt,
+		   struct task_struct *prev, struct task_struct *next)
 {
 	if (unlikely(!sched_ref))
 		return;
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -420,7 +420,7 @@ tracing_sched_wakeup_trace(struct trace_
 }
 
 static void notrace
-probe_wakeup_sched_switch(void *ignore,
+probe_wakeup_sched_switch(void *ignore, bool preempt,
 			  struct task_struct *prev, struct task_struct *next)
 {
 	struct trace_array_cpu *data;




* [RFC][PATCH 07/11] sched: Stop setting PREEMPT_ACTIVE
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (5 preceding siblings ...)
  2015-09-29  9:28 ` [RFC][PATCH 06/11] sched: Fix trace_sched_switch() Peter Zijlstra
@ 2015-09-29  9:28 ` Peter Zijlstra
  2015-09-29 13:16   ` Thomas Gleixner
  2015-09-29 15:41   ` Steven Rostedt
  2015-09-29  9:28 ` [RFC][PATCH 08/11] sched: Simplify preempt_count tests Peter Zijlstra
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29  9:28 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-7.patch --]
[-- Type: text/plain, Size: 2386 bytes --]

Now that nothing tests for PREEMPT_ACTIVE anymore, stop setting it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/preempt.h |   12 ------------
 kernel/sched/core.c     |   19 ++++++-------------
 2 files changed, 6 insertions(+), 25 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -146,18 +146,6 @@ extern void preempt_count_sub(int val);
 #define preempt_count_inc() preempt_count_add(1)
 #define preempt_count_dec() preempt_count_sub(1)
 
-#define preempt_active_enter() \
-do { \
-	preempt_count_add(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET); \
-	barrier(); \
-} while (0)
-
-#define preempt_active_exit() \
-do { \
-	barrier(); \
-	preempt_count_sub(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET); \
-} while (0)
-
 #ifdef CONFIG_PREEMPT_COUNT
 
 #define preempt_disable() \
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3190,9 +3190,9 @@ void __sched schedule_preempt_disabled(v
 static void __sched notrace preempt_schedule_common(void)
 {
 	do {
-		preempt_active_enter();
+		preempt_disable();
 		__schedule(true);
-		preempt_active_exit();
+		sched_preempt_enable_no_resched();
 
 		/*
 		 * Check again in case we missed a preemption opportunity
@@ -3243,13 +3243,7 @@ asmlinkage __visible void __sched notrac
 		return;
 
 	do {
-		/*
-		 * Use raw __prempt_count() ops that don't call function.
-		 * We can't call functions before disabling preemption which
-		 * disarm preemption tracing recursions.
-		 */
-		__preempt_count_add(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET);
-		barrier();
+		preempt_disable_notrace();
 		/*
 		 * Needs preempt disabled in case user_exit() is traced
 		 * and the tracer calls preempt_enable_notrace() causing
@@ -3259,8 +3253,7 @@ asmlinkage __visible void __sched notrac
 		__schedule(true);
 		exception_exit(prev_ctx);
 
-		barrier();
-		__preempt_count_sub(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET);
+		preempt_enable_no_resched_notrace();
 	} while (need_resched());
 }
 EXPORT_SYMBOL_GPL(preempt_schedule_notrace);
@@ -3283,11 +3276,11 @@ asmlinkage __visible void __sched preemp
 	prev_state = exception_enter();
 
 	do {
-		preempt_active_enter();
+		preempt_disable();
 		local_irq_enable();
 		__schedule(true);
 		local_irq_disable();
-		preempt_active_exit();
+		sched_preempt_enable_no_resched();
 	} while (need_resched());
 
 	exception_exit(prev_state);




* [RFC][PATCH 08/11] sched: Simplify preempt_count tests
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (6 preceding siblings ...)
  2015-09-29  9:28 ` [RFC][PATCH 07/11] sched: Stop setting PREEMPT_ACTIVE Peter Zijlstra
@ 2015-09-29  9:28 ` Peter Zijlstra
  2015-09-29 13:17   ` Thomas Gleixner
  2015-09-29 15:42   ` Steven Rostedt
  2015-09-29  9:28 ` [RFC][PATCH 09/11] sched, x86: Kill saved_preempt_count Peter Zijlstra
                   ` (4 subsequent siblings)
  12 siblings, 2 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29  9:28 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-8.patch --]
[-- Type: text/plain, Size: 1129 bytes --]

Since we stopped setting PREEMPT_ACTIVE, there is no need to mask it
out of preempt_count() tests.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/preempt.h |    3 +--
 kernel/sched/core.c     |    2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -126,8 +126,7 @@
  * Check whether we were atomic before we did preempt_disable():
  * (used by the scheduler)
  */
-#define in_atomic_preempt_off() \
-		((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_DISABLE_OFFSET)
+#define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
 
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
 extern void preempt_count_add(int val);
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7472,7 +7472,7 @@ void __init sched_init(void)
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 static inline int preempt_count_equals(int preempt_offset)
 {
-	int nested = (preempt_count() & ~PREEMPT_ACTIVE) + rcu_preempt_depth();
+	int nested = preempt_count() + rcu_preempt_depth();
 
 	return (nested == preempt_offset);
 }




* [RFC][PATCH 09/11] sched, x86: Kill saved_preempt_count
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (7 preceding siblings ...)
  2015-09-29  9:28 ` [RFC][PATCH 08/11] sched: Simplify preempt_count tests Peter Zijlstra
@ 2015-09-29  9:28 ` Peter Zijlstra
  2015-09-29 13:18   ` Thomas Gleixner
  2015-09-29 15:44   ` Steven Rostedt
  2015-09-29  9:28 ` [RFC][PATCH 10/11] sched: Kill PREEMPT_ACTIVE Peter Zijlstra
                   ` (3 subsequent siblings)
  12 siblings, 2 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29  9:28 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-9.patch --]
[-- Type: text/plain, Size: 2868 bytes --]

With the introduction of the context-switch preempt_count invariant
and the demise of PREEMPT_ACTIVE, it's pointless to save/restore the
per-CPU preemption count; it must always be 2.
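
For background, x86 keeps the preempt count in a per-CPU variable rather
than in thread_info, which is why a save/restore across __switch_to()
existed at all. Roughly, per arch/x86/include/asm/preempt.h (a hedged
sketch; details vary by version):

	DECLARE_PER_CPU(int, __preempt_count);

	static __always_inline int preempt_count(void)
	{
		return raw_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
	}

Once every task is guaranteed to be switched out at the same count, the
per-CPU value is already correct for whichever task comes in, and copying
it through saved_preempt_count is dead weight.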

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/preempt.h     |    6 +-----
 arch/x86/include/asm/thread_info.h |    2 --
 arch/x86/kernel/process_32.c       |    8 --------
 arch/x86/kernel/process_64.c       |    8 --------
 4 files changed, 1 insertion(+), 23 deletions(-)

--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -30,13 +30,9 @@ static __always_inline void preempt_coun
 /*
  * must be macros to avoid header recursion hell
  */
-#define init_task_preempt_count(p) do { \
-	task_thread_info(p)->saved_preempt_count = \
-		2*PREEMPT_DISABLE_OFFSET + PREEMPT_NEED_RESCHED; \
-} while (0)
+#define init_task_preempt_count(p) do { } while (0)
 
 #define init_idle_preempt_count(p, cpu) do { \
-	task_thread_info(p)->saved_preempt_count = PREEMPT_ENABLED; \
 	per_cpu(__preempt_count, (cpu)) = PREEMPT_ENABLED; \
 } while (0)
 
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -57,7 +57,6 @@ struct thread_info {
 	__u32			flags;		/* low level flags */
 	__u32			status;		/* thread synchronous flags */
 	__u32			cpu;		/* current CPU */
-	int			saved_preempt_count;
 	mm_segment_t		addr_limit;
 	void __user		*sysenter_return;
 	unsigned int		sig_on_uaccess_error:1;
@@ -69,7 +68,6 @@ struct thread_info {
 	.task		= &tsk,			\
 	.flags		= 0,			\
 	.cpu		= 0,			\
-	.saved_preempt_count = INIT_PREEMPT_COUNT,	\
 	.addr_limit	= KERNEL_DS,		\
 }
 
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -280,14 +280,6 @@ __switch_to(struct task_struct *prev_p,
 		set_iopl_mask(next->iopl);
 
 	/*
-	 * If it were not for PREEMPT_ACTIVE we could guarantee that the
-	 * preempt_count of all tasks was equal here and this would not be
-	 * needed.
-	 */
-	task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count);
-	this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
-
-	/*
 	 * Now maybe handle debug registers and/or IO bitmaps
 	 */
 	if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV ||
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -401,14 +401,6 @@ __switch_to(struct task_struct *prev_p,
 	 */
 	this_cpu_write(current_task, next_p);
 
-	/*
-	 * If it were not for PREEMPT_ACTIVE we could guarantee that the
-	 * preempt_count of all tasks was equal here and this would not be
-	 * needed.
-	 */
-	task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count);
-	this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
-
 	/* Reload esp0 and ss1.  This changes current_thread_info(). */
 	load_sp0(tss, next);
 




* [RFC][PATCH 10/11] sched: Kill PREEMPT_ACTIVE
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (8 preceding siblings ...)
  2015-09-29  9:28 ` [RFC][PATCH 09/11] sched, x86: Kill saved_preempt_count Peter Zijlstra
@ 2015-09-29  9:28 ` Peter Zijlstra
  2015-09-29 13:18   ` Thomas Gleixner
  2015-09-29 15:45   ` Steven Rostedt
  2015-09-29  9:28 ` [RFC][PATCH 11/11] sched: More notrace Peter Zijlstra
                   ` (2 subsequent siblings)
  12 siblings, 2 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29  9:28 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-10.patch --]
[-- Type: text/plain, Size: 802 bytes --]

It's unused; kill the definition.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/preempt.h |    5 -----
 1 file changed, 5 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -26,7 +26,6 @@
  *         SOFTIRQ_MASK:	0x0000ff00
  *         HARDIRQ_MASK:	0x000f0000
  *             NMI_MASK:	0x00100000
- *       PREEMPT_ACTIVE:	0x00200000
  * PREEMPT_NEED_RESCHED:	0x80000000
  */
 #define PREEMPT_BITS	8
@@ -53,10 +52,6 @@
 
 #define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)
 
-#define PREEMPT_ACTIVE_BITS	1
-#define PREEMPT_ACTIVE_SHIFT	(NMI_SHIFT + NMI_BITS)
-#define PREEMPT_ACTIVE	(__IRQ_MASK(PREEMPT_ACTIVE_BITS) << PREEMPT_ACTIVE_SHIFT)
-
 /* We use the MSB mostly because its available */
 #define PREEMPT_NEED_RESCHED	0x80000000
 




* [RFC][PATCH 11/11] sched: More notrace
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (9 preceding siblings ...)
  2015-09-29  9:28 ` [RFC][PATCH 10/11] sched: Kill PREEMPT_ACTIVE Peter Zijlstra
@ 2015-09-29  9:28 ` Peter Zijlstra
  2015-09-29 13:19   ` Thomas Gleixner
  2015-09-29 15:58   ` Steven Rostedt
  2015-09-29 10:27 ` [RFC][PATCH 12/11] sched: Add preempt_count invariant check Peter Zijlstra
  2015-09-29 10:34 ` [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Ingo Molnar
  12 siblings, 2 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29  9:28 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx,
	rostedt, Peter Zijlstra

[-- Attachment #1: peterz-kill-preempt_active-11.patch --]
[-- Type: text/plain, Size: 1187 bytes --]

preempt_schedule_common() is marked notrace, but it does not use the
_notrace() preempt_count functions, and __schedule() is not marked
notrace either, which means it's perfectly possible to end up in the
tracer from preempt_schedule_common().

So either we need more notrace, or we need to remove notrace from
preempt_schedule_common().
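
The difference is only in which preempt-count helpers get used: the
_notrace() variants go straight to the raw __preempt_count ops, so the
function tracer cannot recurse into itself via preempt_count_add()/_sub().
Roughly, per include/linux/preempt.h (a hedged sketch, not part of this
patch):

	#define preempt_disable_notrace() \
	do { \
		__preempt_count_inc(); \
		barrier(); \
	} while (0)

	#define preempt_enable_no_resched_notrace() \
	do { \
		barrier(); \
		__preempt_count_dec(); \
	} while (0)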

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3044,7 +3044,7 @@ pick_next_task(struct rq *rq, struct tas
  *
  * WARNING: must be called with preemption disabled!
  */
-static void __sched __schedule(bool preempt)
+static void __sched notrace __schedule(bool preempt)
 {
 	struct task_struct *prev, *next;
 	unsigned long *switch_count;
@@ -3190,9 +3190,9 @@ void __sched schedule_preempt_disabled(v
 static void __sched notrace preempt_schedule_common(void)
 {
 	do {
-		preempt_disable();
+		preempt_disable_notrace();
 		__schedule(true);
-		sched_preempt_enable_no_resched();
+		preempt_enable_no_resched_notrace();
 
 		/*
 		 * Check again in case we missed a preemption opportunity




* [RFC][PATCH 12/11] sched: Add preempt_count invariant check
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (10 preceding siblings ...)
  2015-09-29  9:28 ` [RFC][PATCH 11/11] sched: More notrace Peter Zijlstra
@ 2015-09-29 10:27 ` Peter Zijlstra
  2015-09-29 10:32   ` Peter Zijlstra
  2015-09-29 10:56   ` [RFC][PATCH v2 " Peter Zijlstra
  2015-09-29 10:34 ` [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Ingo Molnar
  12 siblings, 2 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 10:27 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx, rostedt


Ingo requested I keep my debug check for the preempt_count invariant.

Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2503,6 +2503,11 @@ static struct rq *finish_task_switch(str
 	struct mm_struct *mm = rq->prev_mm;
 	long prev_state;
 
+	if (unlikely(WARN_ONCE(preempt_count() != 2,
+			       "corrupted preempt_count: %s/%d/0x%x\n",
+			       current->comm, current->pid, preempt_count())))
+		preempt_count_set(INIT_PREEMPT_COUNT);
+
 	rq->prev_mm = NULL;
 
 	/*


* [tip:sched/core] sched/core: Simplify INIT_PREEMPT_COUNT
  2015-09-29  9:28 ` [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
@ 2015-09-29 10:30   ` tip-bot for Peter Zijlstra
  2015-09-29 13:19   ` [RFC][PATCH 01/11] sched: " Frederic Weisbecker
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 61+ messages in thread
From: tip-bot for Peter Zijlstra @ 2015-09-29 10:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, hpa, efault, mingo, peterz, linux-kernel, tglx

Commit-ID:  aed586c2f11e5cddd52c180ccde651d0d35b736b
Gitweb:     http://git.kernel.org/tip/aed586c2f11e5cddd52c180ccde651d0d35b736b
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 29 Sep 2015 11:28:26 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 29 Sep 2015 12:27:39 +0200

sched/core: Simplify INIT_PREEMPT_COUNT

As per commit d86ee4809d03 ("sched: optimize cond_resched()") we
need PREEMPT_ACTIVE to keep cond_resched() from working before
the scheduler is set up.

However, keeping preemption disabled should do the same thing
already, making the PREEMPT_ACTIVE part entirely redundant.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: fweisbec@gmail.com
Cc: linux-kernel@vger.kernel.org
Cc: oleg@redhat.com
Cc: rostedt@goodmis.org
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150929093519.706413197@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched.h | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 699228b..5ef9e2c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -608,17 +608,14 @@ struct task_cputime_atomic {
 /*
  * Disable preemption until the scheduler is running.
  * Reset by start_kernel()->sched_init()->init_idle().
- *
- * We include PREEMPT_ACTIVE to avoid cond_resched() from working
- * before the scheduler is active -- see should_resched().
  */
-#define INIT_PREEMPT_COUNT	(PREEMPT_DISABLED + PREEMPT_ACTIVE)
+#define INIT_PREEMPT_COUNT	PREEMPT_DISABLED
 
 /**
  * struct thread_group_cputimer - thread group interval timer counts
  * @cputime_atomic:	atomic thread group interval timers.
  * @running:		non-zero when there are timers running and
- * 			@cputime receives updates.
+ *			@cputime receives updates.
  *
  * This structure contains the version of task_cputime, above, that is
  * used for thread group CPU timer calculations.


* [tip:sched/core] sched/core: Create preempt_count invariant
  2015-09-29  9:28 ` [RFC][PATCH 02/11] sched: Create preempt_count invariant Peter Zijlstra
@ 2015-09-29 10:30   ` tip-bot for Peter Zijlstra
  2015-09-29 12:55   ` [RFC][PATCH 02/11] sched: " Frederic Weisbecker
  2015-09-29 13:11   ` Thomas Gleixner
  2 siblings, 0 replies; 61+ messages in thread
From: tip-bot for Peter Zijlstra @ 2015-09-29 10:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, torvalds, linux-kernel, peterz, mingo, efault, tglx

Commit-ID:  e26d555f695ac8a3aa38055dd04bd23c1334723b
Gitweb:     http://git.kernel.org/tip/e26d555f695ac8a3aa38055dd04bd23c1334723b
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 29 Sep 2015 11:28:27 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 29 Sep 2015 12:27:40 +0200

sched/core: Create preempt_count invariant

Ensure that upon scheduling preempt_count == 2; although
currently an additional PREEMPT_ACTIVE is still possible.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: fweisbec@gmail.com
Cc: linux-kernel@vger.kernel.org
Cc: oleg@redhat.com
Cc: rostedt@goodmis.org
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150929093519.817299442@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/preempt.h |  3 ++-
 include/asm-generic/preempt.h  |  2 +-
 kernel/sched/core.c            | 14 ++++++++++----
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index b12f810..183d95c6 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -31,7 +31,8 @@ static __always_inline void preempt_count_set(int pc)
  * must be macros to avoid header recursion hell
  */
 #define init_task_preempt_count(p) do { \
-	task_thread_info(p)->saved_preempt_count = PREEMPT_DISABLED; \
+	task_thread_info(p)->saved_preempt_count = \
+		2*PREEMPT_DISABLE_OFFSET + PREEMPT_NEED_RESCHED; \
 } while (0)
 
 #define init_idle_preempt_count(p, cpu) do { \
diff --git a/include/asm-generic/preempt.h b/include/asm-generic/preempt.h
index 0bec580..1d6f104 100644
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -24,7 +24,7 @@ static __always_inline void preempt_count_set(int pc)
  * must be macros to avoid header recursion hell
  */
 #define init_task_preempt_count(p) do { \
-	task_thread_info(p)->preempt_count = PREEMPT_DISABLED; \
+	task_thread_info(p)->preempt_count = 2*PREEMPT_DISABLED; \
 } while (0)
 
 #define init_idle_preempt_count(p, cpu) do { \
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a91df61..ecd585c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2588,11 +2588,17 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
 {
 	struct rq *rq;
 
-	/* finish_task_switch() drops rq->lock and enables preemtion */
-	preempt_disable();
-	rq = finish_task_switch(prev);
+	/*
+	 * Still have preempt_count() == 2, from:
+	 *
+	 *	schedule()
+	 *	  preempt_disable();			// 1
+	 *	  __schedule()
+	 *	    raw_spin_lock_irq(&rq->lock)	// 2
+	 */
+	rq = finish_task_switch(prev); /* drops rq->lock, preempt_count() == 1 */
 	balance_callback(rq);
-	preempt_enable();
+	preempt_enable(); /* preempt_count() == 0 */
 
 	if (current->set_child_tid)
 		put_user(task_pid_vnr(current), current->set_child_tid);


* [tip:sched/core] sched/core: Robustify preemption leak checks
  2015-09-29  9:28 ` [RFC][PATCH 03/11] sched: Robustify preemption leak checks Peter Zijlstra
@ 2015-09-29 10:31   ` tip-bot for Peter Zijlstra
  2015-09-29 13:13   ` [RFC][PATCH 03/11] sched: " Thomas Gleixner
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 61+ messages in thread
From: tip-bot for Peter Zijlstra @ 2015-09-29 10:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, mingo, efault, linux-kernel, peterz, hpa, tglx

Commit-ID:  185b42420b0948dfc00af6cc5d93aa00b7fef581
Gitweb:     http://git.kernel.org/tip/185b42420b0948dfc00af6cc5d93aa00b7fef581
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 29 Sep 2015 11:28:28 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 29 Sep 2015 12:27:40 +0200

sched/core: Robustify preemption leak checks

When we warn about a preempt_count leak, reset the preempt_count
to the known-good value so that the problem does not ripple
forward.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: fweisbec@gmail.com
Cc: linux-kernel@vger.kernel.org
Cc: oleg@redhat.com
Cc: rostedt@goodmis.org
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150929093519.930244771@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/exit.c       | 4 +++-
 kernel/sched/core.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index ea95ee1..443677c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -706,10 +706,12 @@ void do_exit(long code)
 	smp_mb();
 	raw_spin_unlock_wait(&tsk->pi_lock);
 
-	if (unlikely(in_atomic()))
+	if (unlikely(in_atomic())) {
 		pr_info("note: %s[%d] exited with preempt_count %d\n",
 			current->comm, task_pid_nr(current),
 			preempt_count());
+		preempt_count_set(PREEMPT_ENABLED);
+	}
 
 	/* sync mm's RSS info before statistics gathering */
 	if (tsk->mm)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ecd585c..d1dec5d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2953,8 +2953,10 @@ static inline void schedule_debug(struct task_struct *prev)
 	 * schedule() atomically, we ignore that path. Otherwise whine
 	 * if we are scheduling when we should not.
 	 */
-	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD))
+	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD)) {
 		__schedule_bug(prev);
+		preempt_count_set(PREEMPT_DISABLED);
+	}
 	rcu_sleep_check();
 
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));


* Re: [RFC][PATCH 12/11] sched: Add preempt_count invariant check
  2015-09-29 10:27 ` [RFC][PATCH 12/11] sched: Add preempt_count invariant check Peter Zijlstra
@ 2015-09-29 10:32   ` Peter Zijlstra
  2015-09-29 10:56   ` [RFC][PATCH v2 " Peter Zijlstra
  1 sibling, 0 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 10:32 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx, rostedt

On Tue, Sep 29, 2015 at 12:27:55PM +0200, Peter Zijlstra wrote:
> 
> Ingo requested I keep my debug check for the preempt_count invariant.
> 
> Requested-by: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/core.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2503,6 +2503,11 @@ static struct rq *finish_task_switch(str
>  	struct mm_struct *mm = rq->prev_mm;
>  	long prev_state;
>  
> +	if (unlikely(WARN_ONCE(preempt_count() != 2,

That should have: s/2/2*PREEMPT_DISABLE_OFFSET/ or something, otherwise
!PREEMPT_COUNT kernels will go funny.

> +			       "corrupted preempt_count: %s/%d/0x%x\n",
> +			       current->comm, current->pid, preempt_count())))
> +		preempt_count_set(INIT_PREEMPT_COUNT);
> +
>  	rq->prev_mm = NULL;
>  
>  	/*


* Re: [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE
  2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
                   ` (11 preceding siblings ...)
  2015-09-29 10:27 ` [RFC][PATCH 12/11] sched: Add preempt_count invariant check Peter Zijlstra
@ 2015-09-29 10:34 ` Ingo Molnar
  12 siblings, 0 replies; 61+ messages in thread
From: Ingo Molnar @ 2015-09-29 10:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx, rostedt


* Peter Zijlstra <peterz@infradead.org> wrote:

> This series kills PREEMPT_ACTIVE dead, its a tad risky, but survives light 
> testing on x86_64 and seems to compile on everything else.

I've applied the first three patches because they look low risk.

Will wait a bit with the rest.

Thanks,

	Ingo


* [RFC][PATCH v2 12/11] sched: Add preempt_count invariant check
  2015-09-29 10:27 ` [RFC][PATCH 12/11] sched: Add preempt_count invariant check Peter Zijlstra
  2015-09-29 10:32   ` Peter Zijlstra
@ 2015-09-29 10:56   ` Peter Zijlstra
  2015-09-29 11:55     ` Linus Torvalds
  1 sibling, 1 reply; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 10:56 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx, rostedt


Ingo requested I keep my debug check for the preempt_count invariant.

Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/asm-generic/preempt.h |    2 +-
 include/linux/sched.h         |   17 +++++++++--------
 kernel/sched/core.c           |    5 +++++
 3 files changed, 15 insertions(+), 9 deletions(-)

--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -24,7 +24,7 @@ static __always_inline void preempt_coun
  * must be macros to avoid header recursion hell
  */
 #define init_task_preempt_count(p) do { \
-	task_thread_info(p)->preempt_count = 2*PREEMPT_DISABLED; \
+	task_thread_info(p)->preempt_count = INIT_PREEMPT_COUNT; \
 } while (0)
 
 #define init_idle_preempt_count(p, cpu) do { \
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -599,17 +599,18 @@ struct task_cputime_atomic {
 		.sum_exec_runtime = ATOMIC64_INIT(0),		\
 	}
 
-#ifdef CONFIG_PREEMPT_COUNT
-#define PREEMPT_DISABLED	(1 + PREEMPT_ENABLED)
-#else
-#define PREEMPT_DISABLED	PREEMPT_ENABLED
-#endif
+#define PREEMPT_DISABLED	(PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
 
 /*
- * Disable preemption until the scheduler is running.
- * Reset by start_kernel()->sched_init()->init_idle().
+ * Initial preempt_count value; reflects the preempt_count schedule invariant
+ * which states that during schedule preempt_count() == 2.
+ *
+ * This also results in the kernel starting with preemption disabled until
+ * the scheduler is initialized, see:
+ *
+ *   start_kernel()->sched_init()->init_idle().
  */
-#define INIT_PREEMPT_COUNT	PREEMPT_DISABLED
+#define INIT_PREEMPT_COUNT	(2*PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
 
 /**
  * struct thread_group_cputimer - thread group interval timer counts
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2503,6 +2503,11 @@ static struct rq *finish_task_switch(str
 	struct mm_struct *mm = rq->prev_mm;
 	long prev_state;
 
+	if (unlikely(WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
+			       "corrupted preempt_count: %s/%d/0x%x\n",
+			       current->comm, current->pid, preempt_count())))
+		preempt_count_set(INIT_PREEMPT_COUNT);
+
 	rq->prev_mm = NULL;
 
 	/*


* Re: [RFC][PATCH v2 12/11] sched: Add preempt_count invariant check
  2015-09-29 11:55     ` Linus Torvalds
@ 2015-09-29 11:55       ` Peter Zijlstra
  0 siblings, 0 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 11:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List,
	Frédéric Weisbecker, Oleg Nesterov, Mike Galbraith,
	Thomas Gleixner, Steven Rostedt

On Tue, Sep 29, 2015 at 07:55:49AM -0400, Linus Torvalds wrote:
> On Tue, Sep 29, 2015 at 6:56 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > + * Initial preempt_count value; reflects the preempt_count schedule invariant
> > + * which states that during schedule preempt_count() == 2.
> 
> Is this actually *true*?
> 
> preempt_count() is two only if preemption is enabled. But spinlocks do
> not actually update the preemption count if there is no preemption, so
> these kinds of checks and comments that aren't even inside #ifdef
> CONFIG_PREEMPT seem to be actively misleading.
> 
> I do believe that the preempt count is stable - but the actual value
> would seem to depend on config options.

Right, 2*PREEMPT_DISABLE_OFFSET is the 'right' number. That ends up
being 0 for !PREEMPT_COUNT configs.

I'll update the Changelog to be more accurate on this.
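
For reference, a rough sketch of the definitions this relies on (per
include/linux/preempt.h of this era; hedged):

	#define PREEMPT_OFFSET	(1UL << PREEMPT_SHIFT)

	#ifdef CONFIG_PREEMPT_COUNT
	# define PREEMPT_DISABLE_OFFSET	PREEMPT_OFFSET
	#else
	# define PREEMPT_DISABLE_OFFSET	0
	#endif

So the expected count at finish_task_switch() is 2 with CONFIG_PREEMPT_COUNT
and 0 without, matching what preempt_disable() and raw_spin_lock_irq()
actually do to the count in each configuration.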


* Re: [RFC][PATCH v2 12/11] sched: Add preempt_count invariant check
  2015-09-29 10:56   ` [RFC][PATCH v2 " Peter Zijlstra
@ 2015-09-29 11:55     ` Linus Torvalds
  2015-09-29 11:55       ` Peter Zijlstra
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2015-09-29 11:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Linux Kernel Mailing List,
	Frédéric Weisbecker, Oleg Nesterov, Mike Galbraith,
	Thomas Gleixner, Steven Rostedt

On Tue, Sep 29, 2015 at 6:56 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> + * Initial preempt_count value; reflects the preempt_count schedule invariant
> + * which states that during schedule preempt_count() == 2.

Is this actually *true*?

preempt_count() is two only if preemption is enabled. But spinlocks do
not actually update the preemption count if there is no preemption, so
these kinds of checks and comments that aren't even inside #ifdef
CONFIG_PREEMPT seem to be actively misleading.

I do believe that the preempt count is stable - but the actual value
would seem to depend on config options.

            Linus


* Re: [RFC][PATCH 02/11] sched: Create preempt_count invariant
  2015-09-29  9:28 ` [RFC][PATCH 02/11] sched: Create preempt_count invariant Peter Zijlstra
  2015-09-29 10:30   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
@ 2015-09-29 12:55   ` Frederic Weisbecker
  2015-09-29 13:02     ` Peter Zijlstra
  2015-09-29 13:11   ` Thomas Gleixner
  2 siblings, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2015-09-29 12:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Tue, Sep 29, 2015 at 11:28:27AM +0200, Peter Zijlstra wrote:
> Ensure that upon scheduling preempt_count == 2; although currently an
> additional PREEMPT_ACTIVE is still possible.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/preempt.h |    3 ++-
>  include/asm-generic/preempt.h  |    2 +-
>  kernel/sched/core.c            |   14 ++++++++++----
>  3 files changed, 13 insertions(+), 6 deletions(-)
> 
> --- a/arch/x86/include/asm/preempt.h
> +++ b/arch/x86/include/asm/preempt.h
> @@ -31,7 +31,8 @@ static __always_inline void preempt_coun
>   * must be macros to avoid header recursion hell
>   */
>  #define init_task_preempt_count(p) do { \
> -	task_thread_info(p)->saved_preempt_count = PREEMPT_DISABLED; \
> +	task_thread_info(p)->saved_preempt_count = \
> +		2*PREEMPT_DISABLE_OFFSET + PREEMPT_NEED_RESCHED; \
>  } while (0)
>  
>  #define init_idle_preempt_count(p, cpu) do { \
> --- a/include/asm-generic/preempt.h
> +++ b/include/asm-generic/preempt.h
> @@ -24,7 +24,7 @@ static __always_inline void preempt_coun
>   * must be macros to avoid header recursion hell
>   */
>  #define init_task_preempt_count(p) do { \
> -	task_thread_info(p)->preempt_count = PREEMPT_DISABLED; \
> +	task_thread_info(p)->preempt_count = 2*PREEMPT_DISABLED; \

Since it's not quite obvious why we use this magic value without looking
at schedule_tail() details, maybe add a little comment? (Just "/* see schedule_tail() */").

>  } while (0)
>  
>  #define init_idle_preempt_count(p, cpu) do { \
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2588,11 +2588,17 @@ asmlinkage __visible void schedule_tail(
>  {
>  	struct rq *rq;
>  
> -	/* finish_task_switch() drops rq->lock and enables preemtion */
> -	preempt_disable();
> -	rq = finish_task_switch(prev);
> +	/*
> +	 * Still have preempt_count() == 2, from:
> +	 *
> +	 *	schedule()
> +	 *	  preempt_disable();			// 1
> +	 *	  __schedule()
> +	 *	    raw_spin_lock_irq(&rq->lock)	// 2
> +	 */

I found that a bit confusing at first, because that's a preempt_count()
we actually emulate for a new task. Maybe something like:

+	/*
+	 * New task is init with preempt_count() == 2 because prev task left
+        * us after:
+	 *
+	 *	schedule()
+	 *	  preempt_disable();			// 1
+	 *	  __schedule()
+	 *	    raw_spin_lock_irq(&rq->lock)	// 2
+	 */

> +	rq = finish_task_switch(prev); /* drops rq->lock, preempt_count() == 1 */
>  	balance_callback(rq);
> -	preempt_enable();
> +	preempt_enable(); /* preempt_count() == 0 */

Thanks!


* Re: [RFC][PATCH 02/11] sched: Create preempt_count invariant
  2015-09-29 12:55   ` [RFC][PATCH 02/11] sched: " Frederic Weisbecker
@ 2015-09-29 13:02     ` Peter Zijlstra
  2015-09-29 14:40       ` Steven Rostedt
  0 siblings, 1 reply; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 13:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Tue, Sep 29, 2015 at 02:55:13PM +0200, Frederic Weisbecker wrote:
> On Tue, Sep 29, 2015 at 11:28:27AM +0200, Peter Zijlstra wrote:
> >  #define init_task_preempt_count(p) do { \
> > -	task_thread_info(p)->preempt_count = PREEMPT_DISABLED; \
> > +	task_thread_info(p)->preempt_count = 2*PREEMPT_DISABLED; \
> 
> Since it's not quite obvious why we use this magic value without looking
> at schedule_tail() details, maybe add a little comment? (Just "/* see schedule_tail() */").

Right, I fixed that in 12/11 v2. I'll change that around a bit.

> > +	/*
> > +	 * Still have preempt_count() == 2, from:
> > +	 *
> > +	 *	schedule()
> > +	 *	  preempt_disable();			// 1
> > +	 *	  __schedule()
> > +	 *	    raw_spin_lock_irq(&rq->lock)	// 2
> > +	 */
> 
> I found that a bit confusing first, because that's a preempt_count()
> we actually emulate for a new task. Maybe something like:
> 
> +	/*
> +	 * New task is init with preempt_count() == 2 because prev task left
> +        * us after:
> +	 *
> +	 *	schedule()
> +	 *	  preempt_disable();			// 1
> +	 *	  __schedule()
> +	 *	    raw_spin_lock_irq(&rq->lock)	// 2
> +	 */

I think I'll move the comment to finish_task_switch(), but yes.


* Re: [RFC][PATCH 02/11] sched: Create preempt_count invariant
  2015-09-29  9:28 ` [RFC][PATCH 02/11] sched: Create preempt_count invariant Peter Zijlstra
  2015-09-29 10:30   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
  2015-09-29 12:55   ` [RFC][PATCH 02/11] sched: " Frederic Weisbecker
@ 2015-09-29 13:11   ` Thomas Gleixner
  2015-09-29 14:09     ` Peter Zijlstra
  2 siblings, 1 reply; 61+ messages in thread
From: Thomas Gleixner @ 2015-09-29 13:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, rostedt

On Tue, 29 Sep 2015, Peter Zijlstra wrote:
> +	/*
> +	 * Still have preempt_count() == 2, from:
> +	 *
> +	 *	schedule()
> +	 *	  preempt_disable();			// 1
> +	 *	  __schedule()
> +	 *	    raw_spin_lock_irq(&rq->lock)	// 2
> +	 */
> +	rq = finish_task_switch(prev); /* drops rq->lock, preempt_count() == 1 */
>  	balance_callback(rq);
> -	preempt_enable();
> +	preempt_enable(); /* preempt_count() == 0 */

Bah. I so hate tail comments. What's wrong with

+	 /* preempt_count() ==> 0 */
	preempt_enable();

Hmm?

	tglx


* Re: [RFC][PATCH 03/11] sched: Robustify preemption leak checks
  2015-09-29  9:28 ` [RFC][PATCH 03/11] sched: Robustify preemption leak checks Peter Zijlstra
  2015-09-29 10:31   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
@ 2015-09-29 13:13   ` Thomas Gleixner
  2015-09-29 13:25   ` Frederic Weisbecker
  2015-09-29 15:07   ` Steven Rostedt
  3 siblings, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2015-09-29 13:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, rostedt

On Tue, 29 Sep 2015, Peter Zijlstra wrote:

> When we warn about a preempt_count leak; reset the preempt_count to
> the known good value such that the problem does not ripple forward.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


* Re: [RFC][PATCH 04/11] sched: Rework TASK_DEAD preemption exception
  2015-09-29  9:28 ` [RFC][PATCH 04/11] sched: Rework TASK_DEAD preemption exception Peter Zijlstra
@ 2015-09-29 13:13   ` Thomas Gleixner
  2015-09-29 13:41   ` Frederic Weisbecker
  2015-09-29 15:10   ` Steven Rostedt
  2 siblings, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2015-09-29 13:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, rostedt

On Tue, 29 Sep 2015, Peter Zijlstra wrote:

> TASK_DEAD is special in that the final schedule call from do_exit()
> must be done with preemption disabled.
> 
> This leads to a violation of our new scheduling invariant which states
> that the preempt count should be 2. Move the check for TASK_DEAD out
> of the debug check and use it to decrement the preempt count (from 2
> to 1).

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 05/11] sched: Add preempt argument to __schedule()
  2015-09-29  9:28 ` [RFC][PATCH 05/11] sched: Add preempt argument to __schedule() Peter Zijlstra
@ 2015-09-29 13:14   ` Thomas Gleixner
  2015-09-29 13:18   ` Frederic Weisbecker
  2015-09-29 15:28   ` Steven Rostedt
  2 siblings, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2015-09-29 13:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, rostedt

On Tue, 29 Sep 2015, Peter Zijlstra wrote:

> There is only a single PREEMPT_ACTIVE use in the regular __schedule()
> path and that is to circumvent the task->state check. Since the code
> setting PREEMPT_ACTIVE is the immediate caller of __schedule() we can
> replace this with a function argument.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 06/11] sched: Fix trace_sched_switch()
  2015-09-29  9:28 ` [RFC][PATCH 06/11] sched: Fix trace_sched_switch() Peter Zijlstra
@ 2015-09-29 13:15   ` Thomas Gleixner
  2015-09-29 15:38   ` Steven Rostedt
  1 sibling, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2015-09-29 13:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, rostedt

On Tue, 29 Sep 2015, Peter Zijlstra wrote:

> __trace_sched_switch_state() is the last remaining PREEMPT_ACTIVE
> user, move trace_sched_switch() from prepare_task_switch() to
> __schedule() and propagate the @preempt argument.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 07/11] sched: Stop setting PREEMPT_ACTIVE
  2015-09-29  9:28 ` [RFC][PATCH 07/11] sched: Stop setting PREEMPT_ACTIVE Peter Zijlstra
@ 2015-09-29 13:16   ` Thomas Gleixner
  2015-09-29 15:41   ` Steven Rostedt
  1 sibling, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2015-09-29 13:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, rostedt

On Tue, 29 Sep 2015, Peter Zijlstra wrote:

> Now that nothing tests for PREEMPT_ACTIVE anymore, stop setting it.

Good riddance.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 08/11] sched: Simplify preempt_count tests
  2015-09-29  9:28 ` [RFC][PATCH 08/11] sched: Simplify preempt_count tests Peter Zijlstra
@ 2015-09-29 13:17   ` Thomas Gleixner
  2015-09-29 15:42   ` Steven Rostedt
  1 sibling, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2015-09-29 13:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, rostedt

On Tue, 29 Sep 2015, Peter Zijlstra wrote:

> Since we stopped setting PREEMPT_ACTIVE, there is no need to mask it
> out of preempt_count() tests.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 09/11] sched, x86: Kill saved_preempt_count
  2015-09-29  9:28 ` [RFC][PATCH 09/11] sched, x86: Kill saved_preempt_count Peter Zijlstra
@ 2015-09-29 13:18   ` Thomas Gleixner
  2015-09-29 15:44   ` Steven Rostedt
  1 sibling, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2015-09-29 13:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, rostedt

On Tue, 29 Sep 2015, Peter Zijlstra wrote:

> With the introduction of the context switch preempt_count invariant,
> and the demise of PREEMPT_ACTIVE, its pointless to save/restore the
> per-cpu preemption count, it must always be 2.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 10/11] sched: Kill PREEMPT_ACTIVE
  2015-09-29  9:28 ` [RFC][PATCH 10/11] sched: Kill PREEMPT_ACTIVE Peter Zijlstra
@ 2015-09-29 13:18   ` Thomas Gleixner
  2015-09-29 15:45   ` Steven Rostedt
  1 sibling, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2015-09-29 13:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, rostedt

On Tue, 29 Sep 2015, Peter Zijlstra wrote:

> Its unused, kill the definition.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 05/11] sched: Add preempt argument to __schedule()
  2015-09-29  9:28 ` [RFC][PATCH 05/11] sched: Add preempt argument to __schedule() Peter Zijlstra
  2015-09-29 13:14   ` Thomas Gleixner
@ 2015-09-29 13:18   ` Frederic Weisbecker
  2015-09-29 15:28   ` Steven Rostedt
  2 siblings, 0 replies; 61+ messages in thread
From: Frederic Weisbecker @ 2015-09-29 13:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Tue, Sep 29, 2015 at 11:28:30AM +0200, Peter Zijlstra wrote:
> There is only a single PREEMPT_ACTIVE use in the regular __schedule()
> path and that is to circumvent the task->state check. Since the code
> setting PREEMPT_ACTIVE is the immediate caller of __schedule() we can
> replace this with a function argument.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 11/11] sched: More notrace
  2015-09-29  9:28 ` [RFC][PATCH 11/11] sched: More notrace Peter Zijlstra
@ 2015-09-29 13:19   ` Thomas Gleixner
  2015-09-29 15:58   ` Steven Rostedt
  1 sibling, 0 replies; 61+ messages in thread
From: Thomas Gleixner @ 2015-09-29 13:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, rostedt

On Tue, 29 Sep 2015, Peter Zijlstra wrote:

> preempt_schedule_common() is marked notrace, but it does not use
> _notrace() preempt_count functions and __schedule() is also not marked
> notrace, which means that its perfectly possible to end up in the
> tracer from preempt_schedule_common().
> 
> So either we need more notrace, or we need to remove notrace from
> preempt_schedule_common().

More notrace is the right thing to do. We have enough pointless
information in tracing already.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT
  2015-09-29  9:28 ` [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
  2015-09-29 10:30   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
@ 2015-09-29 13:19   ` Frederic Weisbecker
  2015-09-29 14:43   ` Steven Rostedt
       [not found]   ` <20150929103729.00899e69@gandalf.local.home>
  3 siblings, 0 replies; 61+ messages in thread
From: Frederic Weisbecker @ 2015-09-29 13:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Tue, Sep 29, 2015 at 11:28:26AM +0200, Peter Zijlstra wrote:
> As per commit d86ee4809d03 ("sched: optimize cond_resched()") we need
> PREEMPT_ACTIVE to avoid cond_resched() from working before the
> scheduler is setup.
> 
> However, keeping preemption disabled should do the same thing already,
> making the PREEMPT_ACTIVE part entirely redundant.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 03/11] sched: Robustify preemption leak checks
  2015-09-29  9:28 ` [RFC][PATCH 03/11] sched: Robustify preemption leak checks Peter Zijlstra
  2015-09-29 10:31   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
  2015-09-29 13:13   ` [RFC][PATCH 03/11] sched: " Thomas Gleixner
@ 2015-09-29 13:25   ` Frederic Weisbecker
  2015-09-29 14:24     ` Peter Zijlstra
  2015-09-29 15:07   ` Steven Rostedt
  3 siblings, 1 reply; 61+ messages in thread
From: Frederic Weisbecker @ 2015-09-29 13:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Tue, Sep 29, 2015 at 11:28:28AM +0200, Peter Zijlstra wrote:
> When we warn about a preempt_count leak; reset the preempt_count to
> the known good value such that the problem does not ripple forward.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/exit.c       |    4 +++-
>  kernel/sched/core.c |    4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -706,10 +706,12 @@ void do_exit(long code)
>  	smp_mb();
>  	raw_spin_unlock_wait(&tsk->pi_lock);
>  
> -	if (unlikely(in_atomic()))
> +	if (unlikely(in_atomic())) {
>  		pr_info("note: %s[%d] exited with preempt_count %d\n",
>  			current->comm, task_pid_nr(current),
>  			preempt_count());
> +		preempt_count_set(PREEMPT_ENABLED);
> +	}
>  
>  	/* sync mm's RSS info before statistics gathering */
>  	if (tsk->mm)
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2960,8 +2960,10 @@ static inline void schedule_debug(struct
>  	 * schedule() atomically, we ignore that path. Otherwise whine
>  	 * if we are scheduling when we should not.
>  	 */
> -	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD))
> +	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD)) {
>  		__schedule_bug(prev);
> +		preempt_count_set(PREEMPT_DISABLED);

That one would be a bit fragile if we kept PREEMPT_ACTIVE, but since we are removing
it...

Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 04/11] sched: Rework TASK_DEAD preemption exception
  2015-09-29  9:28 ` [RFC][PATCH 04/11] sched: Rework TASK_DEAD preemption exception Peter Zijlstra
  2015-09-29 13:13   ` Thomas Gleixner
@ 2015-09-29 13:41   ` Frederic Weisbecker
  2015-09-29 15:10   ` Steven Rostedt
  2 siblings, 0 replies; 61+ messages in thread
From: Frederic Weisbecker @ 2015-09-29 13:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Tue, Sep 29, 2015 at 11:28:29AM +0200, Peter Zijlstra wrote:
> TASK_DEAD is special in that the final schedule call from do_exit()
> must be done with preemption disabled.
> 
> This leads to a violation of our new scheduling invariant which states
> that the preempt count should be 2. Move the check for TASK_DEAD out
> of the debug check and use it to decrement the preempt count (from 2
> to 1).
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 02/11] sched: Create preempt_count invariant
  2015-09-29 13:11   ` Thomas Gleixner
@ 2015-09-29 14:09     ` Peter Zijlstra
  0 siblings, 0 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 14:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, rostedt

On Tue, Sep 29, 2015 at 03:11:56PM +0200, Thomas Gleixner wrote:
> On Tue, 29 Sep 2015, Peter Zijlstra wrote:
> > +	/*
> > +	 * Still have preempt_count() == 2, from:
> > +	 *
> > +	 *	schedule()
> > +	 *	  preempt_disable();			// 1
> > +	 *	  __schedule()
> > +	 *	    raw_spin_lock_irq(&rq->lock)	// 2
> > +	 */
> > +	rq = finish_task_switch(prev); /* drops rq->lock, preempt_count() == 1 */
> >  	balance_callback(rq);
> > -	preempt_enable();
> > +	preempt_enable(); /* preempt_count() == 0 */
> 
> Bah. I so hate tail comments. What's wrong with
> 
> +	 /* preempt_count() ==> 0 */
> 	preempt_enable();
> 
> Hmm?

I find the tail comments more readable in this case; clearly I don't
share your hatred :-). But I can change them if you insist.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 03/11] sched: Robustify preemption leak checks
  2015-09-29 13:25   ` Frederic Weisbecker
@ 2015-09-29 14:24     ` Peter Zijlstra
  2015-09-29 14:32       ` Frederic Weisbecker
  0 siblings, 1 reply; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 14:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Tue, Sep 29, 2015 at 03:25:53PM +0200, Frederic Weisbecker wrote:
> On Tue, Sep 29, 2015 at 11:28:28AM +0200, Peter Zijlstra wrote:

> > +	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD)) {
> >  		__schedule_bug(prev);
> > +		preempt_count_set(PREEMPT_DISABLED);
> 
> That one would be a bit fragile if we kept PREEMPT_ACTIVE, but since we are removing
> it...

Right, let me move this patch after 7 (the one that stops setting
PREEMPT_ACTIVE).

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 03/11] sched: Robustify preemption leak checks
  2015-09-29 14:24     ` Peter Zijlstra
@ 2015-09-29 14:32       ` Frederic Weisbecker
  0 siblings, 0 replies; 61+ messages in thread
From: Frederic Weisbecker @ 2015-09-29 14:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, oleg, umgwanakikbuti, tglx, rostedt

On Tue, Sep 29, 2015 at 04:24:00PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 29, 2015 at 03:25:53PM +0200, Frederic Weisbecker wrote:
> > On Tue, Sep 29, 2015 at 11:28:28AM +0200, Peter Zijlstra wrote:
> 
> > > +	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD)) {
> > >  		__schedule_bug(prev);
> > > +		preempt_count_set(PREEMPT_DISABLED);
> > 
> > That one would be a bit fragile if we kept PREEMPT_ACTIVE, but since we are removing
> > it...
> 
> Right, let me move this patch after 7 (the one that stops setting
> PREEMPT_ACTIVE).

Perfect!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 02/11] sched: Create preempt_count invariant
  2015-09-29 13:02     ` Peter Zijlstra
@ 2015-09-29 14:40       ` Steven Rostedt
  0 siblings, 0 replies; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 14:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, mingo, linux-kernel, torvalds, oleg,
	umgwanakikbuti, tglx

On Tue, 29 Sep 2015 15:02:01 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Sep 29, 2015 at 02:55:13PM +0200, Frederic Weisbecker wrote:
> > On Tue, Sep 29, 2015 at 11:28:27AM +0200, Peter Zijlstra wrote:
> > >  #define init_task_preempt_count(p) do { \
> > > -	task_thread_info(p)->preempt_count = PREEMPT_DISABLED; \
> > > +	task_thread_info(p)->preempt_count = 2*PREEMPT_DISABLED; \
> > 
> > Since it's not quite obvious why we use this magic value without looking
> > at schedule_tail() details, maybe add a little comment? (Just "/* see schedule_tail() */").
> 
> Right, I fixed that in 12/11 v2. I'll change that around a bit.

I'll wait for v2 because this confused me too.

-- Steve

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT
  2015-09-29  9:28 ` [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
  2015-09-29 10:30   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
  2015-09-29 13:19   ` [RFC][PATCH 01/11] sched: " Frederic Weisbecker
@ 2015-09-29 14:43   ` Steven Rostedt
  2015-09-29 14:47     ` Peter Zijlstra
       [not found]   ` <20150929103729.00899e69@gandalf.local.home>
  3 siblings, 1 reply; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 14:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx


[ Resend with a "Reply-all" this time! ]

On Tue, 29 Sep 2015 11:28:26 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> As per commit d86ee4809d03 ("sched: optimize cond_resched()") we need
> PREEMPT_ACTIVE to avoid cond_resched() from working before the
> scheduler is setup.
> 
> However, keeping preemption disabled should do the same thing already,
> making the PREEMPT_ACTIVE part entirely redundant.

Thus PREEMPT_ACTIVE wasn't needed at that commit either, or was it?

> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/sched.h |    7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -608,17 +608,14 @@ struct task_cputime_atomic {
>  /*
>   * Disable preemption until the scheduler is running.
>   * Reset by start_kernel()->sched_init()->init_idle().
> - *
> - * We include PREEMPT_ACTIVE to avoid cond_resched() from working
> - * before the scheduler is active -- see should_resched().
>   */
> -#define INIT_PREEMPT_COUNT	(PREEMPT_DISABLED + PREEMPT_ACTIVE)
> +#define INIT_PREEMPT_COUNT	PREEMPT_DISABLED

I can't find anything wrong with this.

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

>  
>  /**
>   * struct thread_group_cputimer - thread group interval timer counts
>   * @cputime_atomic:	atomic thread group interval timers.
>   * @running:		non-zero when there are timers running and
> - * 			@cputime receives updates.
> + *			@cputime receives updates.
>   *
>   * This structure contains the version of task_cputime, above, that is
>   * used for thread group CPU timer calculations.
> 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT
  2015-09-29 14:43   ` Steven Rostedt
@ 2015-09-29 14:47     ` Peter Zijlstra
  0 siblings, 0 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 14:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, Sep 29, 2015 at 10:43:32AM -0400, Steven Rostedt wrote:
> 
> [ Resend with a "Reply-all" this time! ]
> 
> On Tue, 29 Sep 2015 11:28:26 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > As per commit d86ee4809d03 ("sched: optimize cond_resched()") we need
> > PREEMPT_ACTIVE to avoid cond_resched() from working before the
> > scheduler is setup.
> > 
> > However, keeping preemption disabled should do the same thing already,
> > making the PREEMPT_ACTIVE part entirely redundant.
> 
> Thus PREEMPT_ACTIVE wasn't needed at that commit either, or was it?

Correct. We might have actually discussed that at that time but decided
to be conservative -- but these be vague memories..


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT
       [not found]     ` <20150929143559.GL3816@twins.programming.kicks-ass.net>
@ 2015-09-29 15:00       ` Steven Rostedt
  2015-09-29 15:05         ` Peter Zijlstra
  0 siblings, 1 reply; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 15:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx


[ Resend, with the Cc list back ]

On Tue, 29 Sep 2015 16:35:59 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Sep 29, 2015 at 10:37:29AM -0400, Steven Rostedt wrote:
> > On Tue, 29 Sep 2015 11:28:26 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > As per commit d86ee4809d03 ("sched: optimize cond_resched()") we need
> > > PREEMPT_ACTIVE to avoid cond_resched() from working before the
> > > scheduler is setup.
> > > 
> > > However, keeping preemption disabled should do the same thing already,
> > > making the PREEMPT_ACTIVE part entirely redundant.
> > 
> > Thus PREEMPT_ACTIVE wasn't needed at that commit either, or was it?
> 
> Correct. We might have actually discussed that at that time but decided
> to be conservative -- but these be vague memories..

Hmm, I just looked closer and I'm wondering if this still works?

We have:

#ifdef CONFIG_PREEMPT_COUNT
#define PREEMPT_DISABLED	(1 + PREEMPT_ENABLED)
#else
#define PREEMPT_DISABLED	PREEMPT_ENABLED
#endif

Now if we just remove the PREEMPT_ACTIVE from the INIT_PREEMPT_COUNT,
when CONFIG_PREEMPT_COUNT is not set, wouldn't a _cond_resched() in
boot up schedule?

-- Steve

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT
  2015-09-29 15:00       ` Steven Rostedt
@ 2015-09-29 15:05         ` Peter Zijlstra
  0 siblings, 0 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 15:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, Sep 29, 2015 at 11:00:49AM -0400, Steven Rostedt wrote:
> 
> [ Resend, with the Cc list back ]

You seem to have a hard time with these buttons today ;-)

> Hmm, I just looked closer and I'm wondering if this still works?
> 
> We have:
> 
> #ifdef CONFIG_PREEMPT_COUNT
> #define PREEMPT_DISABLED	(1 + PREEMPT_ENABLED)
> #else
> #define PREEMPT_DISABLED	PREEMPT_ENABLED
> #endif
> 
> Now if we just remove the PREEMPT_ACTIVE from the INIT_PREEMPT_COUNT,
> when CONFIG_PREEMPT_COUNT is not set, wouldn't a _cond_resched() in
> boot up schedule?

Good point, PREEMPT_COUNT isn't required for voluntary preemption; when
I tested the voluntary build nothing bad happened, but that could have
been luck of course.

So what we can do is use an unconditional 1 for INIT_PREEMPT_COUNT, such
that the init task will have a raised preempt_count() during boot, even
on !PREEMPT_COUNT kernels.

But that means we cannot use INIT_PREEMPT_COUNT for
init_task_preempt_count() and such like, as I wanted to. I'll make
FORK_PREEMPT_COUNT for that instead.
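
Something like this, perhaps (a sketch only; exact values/names per the
eventual patch, with PREEMPT_OFFSET being the unconditional 1):

/* Unconditional: the init task keeps an elevated count during boot, even
 * on !PREEMPT_COUNT kernels, so cond_resched() stays disarmed. */
#define INIT_PREEMPT_COUNT	PREEMPT_OFFSET

/* What a freshly forked task sees across the context switch; matches the
 * invariant and collapses to PREEMPT_ENABLED when !PREEMPT_COUNT. */
#define FORK_PREEMPT_COUNT	(2*PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)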

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 03/11] sched: Robustify preemption leak checks
  2015-09-29  9:28 ` [RFC][PATCH 03/11] sched: Robustify preemption leak checks Peter Zijlstra
                     ` (2 preceding siblings ...)
  2015-09-29 13:25   ` Frederic Weisbecker
@ 2015-09-29 15:07   ` Steven Rostedt
  2015-09-29 15:17     ` Peter Zijlstra
  3 siblings, 1 reply; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 15:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, 29 Sep 2015 11:28:28 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> When we warn about a preempt_count leak; reset the preempt_count to
> the known good value such that the problem does not ripple forward.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/exit.c       |    4 +++-
>  kernel/sched/core.c |    4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -706,10 +706,12 @@ void do_exit(long code)
>  	smp_mb();
>  	raw_spin_unlock_wait(&tsk->pi_lock);
>  
> -	if (unlikely(in_atomic()))
> +	if (unlikely(in_atomic())) {
>  		pr_info("note: %s[%d] exited with preempt_count %d\n",
>  			current->comm, task_pid_nr(current),
>  			preempt_count());
> +		preempt_count_set(PREEMPT_ENABLED);
> +	}

Looks good.

>  
>  	/* sync mm's RSS info before statistics gathering */
>  	if (tsk->mm)
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2960,8 +2960,10 @@ static inline void schedule_debug(struct
>  	 * schedule() atomically, we ignore that path. Otherwise whine
>  	 * if we are scheduling when we should not.
>  	 */
> -	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD))
> +	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD)) {
>  		__schedule_bug(prev);
> +		preempt_count_set(PREEMPT_DISABLED);
> +	}

Of course, if this was not a preemption leak, but something that called
schedule within a preempt_disable()/preempt_enable() section, when it
returns, preemption will be enabled, right?

-- Steve


>  	rcu_sleep_check();
>  
>  	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
> 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 04/11] sched: Rework TASK_DEAD preemption exception
  2015-09-29  9:28 ` [RFC][PATCH 04/11] sched: Rework TASK_DEAD preemption exception Peter Zijlstra
  2015-09-29 13:13   ` Thomas Gleixner
  2015-09-29 13:41   ` Frederic Weisbecker
@ 2015-09-29 15:10   ` Steven Rostedt
  2 siblings, 0 replies; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 15:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, 29 Sep 2015 11:28:29 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> TASK_DEAD is special in that the final schedule call from do_exit()
> must be done with preemption disabled.
> 
> This leads to a violation of our new scheduling invariant which states
> that the preempt count should be 2. Move the check for TASK_DEAD out
> of the debug check and use it to decrement the preempt count (from 2
> to 1).
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 03/11] sched: Robustify preemption leak checks
  2015-09-29 15:07   ` Steven Rostedt
@ 2015-09-29 15:17     ` Peter Zijlstra
  0 siblings, 0 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 15:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, Sep 29, 2015 at 11:07:34AM -0400, Steven Rostedt wrote:
> On Tue, 29 Sep 2015 11:28:28 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:

> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2960,8 +2960,10 @@ static inline void schedule_debug(struct
> >  	 * schedule() atomically, we ignore that path. Otherwise whine
> >  	 * if we are scheduling when we should not.
> >  	 */
> > -	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD))
> > +	if (unlikely(in_atomic_preempt_off() && prev->state != TASK_DEAD)) {
> >  		__schedule_bug(prev);
> > +		preempt_count_set(PREEMPT_DISABLED);
> > +	}
> 
> Of course, if this was not a preemption leak, but something that called
> schedule within a preempt_disable()/preempt_enable() section, when it
> returns, preemption will be enabled, right?

Indeed. But it ensures that only the task which incorrectly called
schedule() gets screwed, and not everybody else.

This is most important on x86, which has a per-CPU preempt_count that is
not saved/restored (after this series). So if you schedule with an
invalid (!= 2*PREEMPT_DISABLE_OFFSET) preempt_count, the next task is
messed up too.

Enforcing this invariant limits the borkage to just the one task.
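
For reference, this is roughly what the x86 side looks like -- the count
lives in a per-CPU variable, not in thread_info, so whatever the outgoing
task leaves behind is what the next task starts from (abbreviated sketch
from arch/x86/include/asm/preempt.h):

DECLARE_PER_CPU(int, __preempt_count);

static __always_inline int preempt_count(void)
{
	return raw_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
}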

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 05/11] sched: Add preempt argument to __schedule()
  2015-09-29  9:28 ` [RFC][PATCH 05/11] sched: Add preempt argument to __schedule() Peter Zijlstra
  2015-09-29 13:14   ` Thomas Gleixner
  2015-09-29 13:18   ` Frederic Weisbecker
@ 2015-09-29 15:28   ` Steven Rostedt
  2015-09-29 15:30     ` Steven Rostedt
  2 siblings, 1 reply; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 15:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, 29 Sep 2015 11:28:30 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> There is only a single PREEMPT_ACTIVE use in the regular __schedule()
> path and that is to circumvent the task->state check. Since the code
> setting PREEMPT_ACTIVE is the immediate caller of __schedule() we can
> replace this with a function argument.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 05/11] sched: Add preempt argument to __schedule()
  2015-09-29 15:30     ` Steven Rostedt
@ 2015-09-29 15:29       ` Peter Zijlstra
  0 siblings, 0 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 15:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, Sep 29, 2015 at 11:30:26AM -0400, Steven Rostedt wrote:
> On Tue, 29 Sep 2015 11:28:39 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Tue, 29 Sep 2015 11:28:30 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > There is only a single PREEMPT_ACTIVE use in the regular __schedule()
> > > path and that is to circumvent the task->state check. Since the code
> > > setting PREEMPT_ACTIVE is the immediate caller of __schedule() we can
> > > replace this with a function argument.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > 
> > Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> 
> I just want to note that this scared me at first, because __schedule()
> can be traced by the function tracer that can also do a

See 11/11 :-)


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 05/11] sched: Add preempt argument to __schedule()
  2015-09-29 15:28   ` Steven Rostedt
@ 2015-09-29 15:30     ` Steven Rostedt
  2015-09-29 15:29       ` Peter Zijlstra
  0 siblings, 1 reply; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 15:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, 29 Sep 2015 11:28:39 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 29 Sep 2015 11:28:30 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > There is only a single PREEMPT_ACTIVE use in the regular __schedule()
> > path and that is to circumvent the task->state check. Since the code
> > setting PREEMPT_ACTIVE is the immediate caller of __schedule() we can
> > replace this with a function argument.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> 
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

I just want to note that this scared me at first, because __schedule()
can be traced by the function tracer that can also do a
preempt_enable() which would schedule if preemption was enabled. But it
looks like __schedule() is always called with preemption disabled, thus,
no worries.

-- Steve


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 06/11] sched: Fix trace_sched_switch()
  2015-09-29  9:28 ` [RFC][PATCH 06/11] sched: Fix trace_sched_switch() Peter Zijlstra
  2015-09-29 13:15   ` Thomas Gleixner
@ 2015-09-29 15:38   ` Steven Rostedt
  2015-09-29 15:48     ` Peter Zijlstra
  1 sibling, 1 reply; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 15:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, 29 Sep 2015 11:28:31 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> __trace_sched_switch_state() is the last remaining PREEMPT_ACTIVE
> user, move trace_sched_switch() from prepare_task_switch() to
> __schedule() and propagate the @preempt argument.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/trace/events/sched.h      |   22 +++++++++-------------
>  kernel/sched/core.c               |    2 +-
>  kernel/trace/ftrace.c             |    2 +-
>  kernel/trace/trace_sched_switch.c |    3 ++-
>  kernel/trace/trace_sched_wakeup.c |    2 +-
>  5 files changed, 14 insertions(+), 17 deletions(-)
> 
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -104,22 +104,17 @@ DEFINE_EVENT(sched_wakeup_template, sche
>  	     TP_ARGS(p));
>  
>  #ifdef CREATE_TRACE_POINTS
> -static inline long __trace_sched_switch_state(struct task_struct *p)
> +static inline long __trace_sched_switch_state(bool preempt, struct task_struct *p)
>  {
> -	long state = p->state;
> -
> -#ifdef CONFIG_PREEMPT
>  #ifdef CONFIG_SCHED_DEBUG
>  	BUG_ON(p != current);
>  #endif /* CONFIG_SCHED_DEBUG */
> +
>  	/*
> -	 * For all intents and purposes a preempted task is a running task.
> +	 * Preemption ignores task state, therefore preempted tasks are always
> +	 * RUNNING (we will not have dequeued if state != RUNNING).
>  	 */
> -	if (preempt_count() & PREEMPT_ACTIVE)
> -		state = TASK_RUNNING | TASK_STATE_MAX;
> -#endif /* CONFIG_PREEMPT */
> -
> -	return state;
> +	return preempt ? TASK_RUNNING | TASK_STATE_MAX : p->state;
>  }

Hmm, this original change screwed up kernelshark, as it used the
state to determine if something was preempted or not. Because now you
always show a task as running, it can't do that anymore. I think I
bitched about this before.

What about nuking the above and just exporting to the sched_switch
tracepoint the fact that it was preempted? We now have that information
passed to it.

As everything should be using the parsing files, it should not break
any tools to export it.

As for this patch and with the current state, I see nothing wrong with
it.

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 07/11] sched: Stop setting PREEMPT_ACTIVE
  2015-09-29  9:28 ` [RFC][PATCH 07/11] sched: Stop setting PREEMPT_ACTIVE Peter Zijlstra
  2015-09-29 13:16   ` Thomas Gleixner
@ 2015-09-29 15:41   ` Steven Rostedt
  2015-09-29 16:25     ` Peter Zijlstra
  1 sibling, 1 reply; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 15:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, 29 Sep 2015 11:28:32 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Now that nothing tests for PREEMPT_ACTIVE anymore, stop setting it.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>


> @@ -3243,13 +3243,7 @@ asmlinkage __visible void __sched notrac
>  		return;
>  
>  	do {
> -		/*
> -		 * Use raw __prempt_count() ops that don't call function.
> -		 * We can't call functions before disabling preemption which
> -		 * disarm preemption tracing recursions.
> -		 */
> -		__preempt_count_add(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET);
> -		barrier();
> +		preempt_disable_notrace();
>  		/*
>  		 * Needs preempt disabled in case user_exit() is traced
>  		 * and the tracer calls preempt_enable_notrace() causing
> @@ -3259,8 +3253,7 @@ asmlinkage __visible void __sched notrac
>  		__schedule(true);
>  		exception_exit(prev_ctx);
>  
> -		barrier();
> -		__preempt_count_sub(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET);
> +		preempt_enable_no_resched_notrace();

Nice, you used the notrace variants for this function.

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

>  	} while (need_resched());
>  }

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 08/11] sched: Simplify preempt_count tests
  2015-09-29  9:28 ` [RFC][PATCH 08/11] sched: Simplify preempt_count tests Peter Zijlstra
  2015-09-29 13:17   ` Thomas Gleixner
@ 2015-09-29 15:42   ` Steven Rostedt
  1 sibling, 0 replies; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 15:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, 29 Sep 2015 11:28:33 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Since we stopped setting PREEMPT_ACTIVE, there is no need to mask it
> out of preempt_count() tests.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

>  include/linux/preempt.h |    3 +--
>  kernel/sched/core.c     |    2 +-
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -126,8 +126,7 @@
>   * Check whether we were atomic before we did preempt_disable():
>   * (used by the scheduler)
>   */
> -#define in_atomic_preempt_off() \
> -		((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_DISABLE_OFFSET)
> +#define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
>  
>  #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
>  extern void preempt_count_add(int val);
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7472,7 +7472,7 @@ void __init sched_init(void)
>  #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
>  static inline int preempt_count_equals(int preempt_offset)
>  {
> -	int nested = (preempt_count() & ~PREEMPT_ACTIVE) + rcu_preempt_depth();
> +	int nested = preempt_count() + rcu_preempt_depth();
>  
>  	return (nested == preempt_offset);
>  }
> 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 09/11] sched, x86: Kill saved_preempt_count
  2015-09-29  9:28 ` [RFC][PATCH 09/11] sched, x86: Kill saved_preempt_count Peter Zijlstra
  2015-09-29 13:18   ` Thomas Gleixner
@ 2015-09-29 15:44   ` Steven Rostedt
  2015-09-29 15:46     ` Peter Zijlstra
  1 sibling, 1 reply; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 15:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, 29 Sep 2015 11:28:34 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> With the introduction of the context switch preempt_count invariant,
> and the demise of PREEMPT_ACTIVE, its pointless to save/restore the
> per-cpu preemption count, it must always be 2.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Should we add a warning someplace:

 WARN_ON_ONCE(preempt_count() != 2)

?

-- Steve

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 10/11] sched: Kill PREEMPT_ACTIVE
  2015-09-29  9:28 ` [RFC][PATCH 10/11] sched: Kill PREEMPT_ACTIVE Peter Zijlstra
  2015-09-29 13:18   ` Thomas Gleixner
@ 2015-09-29 15:45   ` Steven Rostedt
  1 sibling, 0 replies; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 15:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, 29 Sep 2015 11:28:35 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Its unused, kill the definition.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 09/11] sched, x86: Kill saved_preempt_count
  2015-09-29 15:44   ` Steven Rostedt
@ 2015-09-29 15:46     ` Peter Zijlstra
  0 siblings, 0 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 15:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, Sep 29, 2015 at 11:44:04AM -0400, Steven Rostedt wrote:
> On Tue, 29 Sep 2015 11:28:34 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > With the introduction of the context switch preempt_count invariant,
> > and the demise of PREEMPT_ACTIVE, its pointless to save/restore the
> > per-cpu preemption count, it must always be 2.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Should we add a warning someplace:
> 
>  WARN_ON_ONCE(preempt_count() != 2)

See v2 12/11 :-)
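
That is, roughly a check in finish_task_switch() along these lines (a
sketch only; the exact form is whatever 12/11 v2 carries):

	if (WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
		      "corrupted preempt_count: %s/%d/0x%x\n",
		      current->comm, current->pid, preempt_count()))
		preempt_count_set(FORK_PREEMPT_COUNT);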

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 06/11] sched: Fix trace_sched_switch()
  2015-09-29 15:38   ` Steven Rostedt
@ 2015-09-29 15:48     ` Peter Zijlstra
  2015-09-29 16:30       ` Steven Rostedt
  0 siblings, 1 reply; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 15:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, Sep 29, 2015 at 11:38:12AM -0400, Steven Rostedt wrote:
> On Tue, 29 Sep 2015 11:28:31 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > +static inline long __trace_sched_switch_state(bool preempt, struct task_struct *p)
> >  {
> > +	return preempt ? TASK_RUNNING | TASK_STATE_MAX : p->state;
> >  }
> 
> Hmm, this original change screwed up kernelshark, as it used the
> state to determine if something was preempted or not. Because now you
> always show a task as running, it can't do that anymore. I think I
> bitched about this before.
> 
> What about nuking the above and just exporting to the sched_switch
> tracepoint the fact that it was preempted? We now have that information
> passed to it.
> 
> As everything should be using the parsing files, it should not break
> any tools to export it.

/SHOULD/ being the operative word. Experience has taught me that
changing the sched tracepoint leads to borkage.

But we can sure try, see if someone notices etc.. Same with
trace_sched_wakeup(), that still prints a dummy value.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 11/11] sched: More notrace
  2015-09-29  9:28 ` [RFC][PATCH 11/11] sched: More notrace Peter Zijlstra
  2015-09-29 13:19   ` Thomas Gleixner
@ 2015-09-29 15:58   ` Steven Rostedt
  1 sibling, 0 replies; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 15:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, 29 Sep 2015 11:28:36 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> preempt_schedule_common() is marked notrace, but it does not use
> _notrace() preempt_count functions and __schedule() is also not marked
> notrace, which means that its perfectly possible to end up in the
> tracer from preempt_schedule_common().
> 
> So either we need more notrace, or we need to remove notrace from
> preempt_schedule_common().

Yep, there's some history to this. This was originally the issue that
caused function tracing to go into infinite recursion. But now we have
preempt_schedule_notrace(), which is used by the function tracer, and
that function must not be traced till preemption is disabled.

Now if function tracing is running and we take an interrupt when
NEED_RESCHED is set, it calls

  preempt_schedule_common() (not traced)

But then that calls preempt_disable() (traced)

function tracer calls preempt_disable_notrace() followed by
preempt_enable_notrace() which will see NEED_RESCHED set, and it will
call preempt_schedule_notrace(), which stops the recursion, but
still calls __schedule() here, and that means when we return, we call
the __schedule() from preempt_schedule_common().
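
Roughly, the chain is:

  preempt_schedule_common()		// notrace
    preempt_disable()			// traced
      function tracer
        preempt_disable_notrace()
        preempt_enable_notrace()	// sees NEED_RESCHED
          preempt_schedule_notrace()	// stops the recursion
            __schedule()
    __schedule()			// again, on return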

That said, I prefer this patch. Preemption is disabled before calling
__schedule(), and we get rid of one round of recursion with the
scheduler.

Feel free to add any of the above to your change log.

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/core.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3044,7 +3044,7 @@ pick_next_task(struct rq *rq, struct tas
>   *
>   * WARNING: must be called with preemption disabled!
>   */
> -static void __sched __schedule(bool preempt)
> +static void __sched notrace __schedule(bool preempt)
>  {
>  	struct task_struct *prev, *next;
>  	unsigned long *switch_count;
> @@ -3190,9 +3190,9 @@ void __sched schedule_preempt_disabled(v
>  static void __sched notrace preempt_schedule_common(void)
>  {
>  	do {
> -		preempt_disable();
> +		preempt_disable_notrace();
>  		__schedule(true);
> -		sched_preempt_enable_no_resched();
> +		preempt_enable_no_resched_notrace();
>  
>  		/*
>  		 * Check again in case we missed a preemption opportunity
> 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 07/11] sched: Stop setting PREEMPT_ACTIVE
  2015-09-29 15:41   ` Steven Rostedt
@ 2015-09-29 16:25     ` Peter Zijlstra
  0 siblings, 0 replies; 61+ messages in thread
From: Peter Zijlstra @ 2015-09-29 16:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, Sep 29, 2015 at 11:41:02AM -0400, Steven Rostedt wrote:
> On Tue, 29 Sep 2015 11:28:32 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Now that nothing tests for PREEMPT_ACTIVE anymore, stop setting it.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> 
> > @@ -3243,13 +3243,7 @@ asmlinkage __visible void __sched notrac
> >  		return;
> >  
> >  	do {
> > -		/*
> > -		 * Use raw __prempt_count() ops that don't call function.
> > -		 * We can't call functions before disabling preemption which
> > -		 * disarm preemption tracing recursions.
> > -		 */
> > -		__preempt_count_add(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET);
> > -		barrier();
> > +		preempt_disable_notrace();
> >  		/*
> >  		 * Needs preempt disabled in case user_exit() is traced
> >  		 * and the tracer calls preempt_enable_notrace() causing
> > @@ -3259,8 +3253,7 @@ asmlinkage __visible void __sched notrac
> >  		__schedule(true);
> >  		exception_exit(prev_ctx);
> >  
> > -		barrier();
> > -		__preempt_count_sub(PREEMPT_ACTIVE + PREEMPT_DISABLE_OFFSET);
> > +		preempt_enable_no_resched_notrace();
> 
> Nice, you used the notrace variants for this function.

There was a comment stating this was important, and it's a
notrace-annotated function :-)

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC][PATCH 06/11] sched: Fix trace_sched_switch()
  2015-09-29 15:48     ` Peter Zijlstra
@ 2015-09-29 16:30       ` Steven Rostedt
  0 siblings, 0 replies; 61+ messages in thread
From: Steven Rostedt @ 2015-09-29 16:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, torvalds, fweisbec, oleg, umgwanakikbuti, tglx

On Tue, 29 Sep 2015 17:48:28 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> /SHOULD/ being the operative word. Experience has taught me that
> changing the sched tracepoint leads to borkage.

Right. But experience also tells us that those relying on offsets will
get breakage if they run 32-bit userspace on a 64-bit kernel too.

> 
> But we can sure try, see if someone notices etc.. Same with
> trace_sched_wakeup(), that still prints a dummy value.

Removing a field can break even things that use the parsing library.
That's why we still have that.

-- Steve

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2015-09-29 16:31 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-29  9:28 [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Peter Zijlstra
2015-09-29  9:28 ` [RFC][PATCH 01/11] sched: Simplify INIT_PREEMPT_COUNT Peter Zijlstra
2015-09-29 10:30   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
2015-09-29 13:19   ` [RFC][PATCH 01/11] sched: " Frederic Weisbecker
2015-09-29 14:43   ` Steven Rostedt
2015-09-29 14:47     ` Peter Zijlstra
     [not found]   ` <20150929103729.00899e69@gandalf.local.home>
     [not found]     ` <20150929143559.GL3816@twins.programming.kicks-ass.net>
2015-09-29 15:00       ` Steven Rostedt
2015-09-29 15:05         ` Peter Zijlstra
2015-09-29  9:28 ` [RFC][PATCH 02/11] sched: Create preempt_count invariant Peter Zijlstra
2015-09-29 10:30   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
2015-09-29 12:55   ` [RFC][PATCH 02/11] sched: " Frederic Weisbecker
2015-09-29 13:02     ` Peter Zijlstra
2015-09-29 14:40       ` Steven Rostedt
2015-09-29 13:11   ` Thomas Gleixner
2015-09-29 14:09     ` Peter Zijlstra
2015-09-29  9:28 ` [RFC][PATCH 03/11] sched: Robustify preemption leak checks Peter Zijlstra
2015-09-29 10:31   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
2015-09-29 13:13   ` [RFC][PATCH 03/11] sched: " Thomas Gleixner
2015-09-29 13:25   ` Frederic Weisbecker
2015-09-29 14:24     ` Peter Zijlstra
2015-09-29 14:32       ` Frederic Weisbecker
2015-09-29 15:07   ` Steven Rostedt
2015-09-29 15:17     ` Peter Zijlstra
2015-09-29  9:28 ` [RFC][PATCH 04/11] sched: Rework TASK_DEAD preemption exception Peter Zijlstra
2015-09-29 13:13   ` Thomas Gleixner
2015-09-29 13:41   ` Frederic Weisbecker
2015-09-29 15:10   ` Steven Rostedt
2015-09-29  9:28 ` [RFC][PATCH 05/11] sched: Add preempt argument to __schedule() Peter Zijlstra
2015-09-29 13:14   ` Thomas Gleixner
2015-09-29 13:18   ` Frederic Weisbecker
2015-09-29 15:28   ` Steven Rostedt
2015-09-29 15:30     ` Steven Rostedt
2015-09-29 15:29       ` Peter Zijlstra
2015-09-29  9:28 ` [RFC][PATCH 06/11] sched: Fix trace_sched_switch() Peter Zijlstra
2015-09-29 13:15   ` Thomas Gleixner
2015-09-29 15:38   ` Steven Rostedt
2015-09-29 15:48     ` Peter Zijlstra
2015-09-29 16:30       ` Steven Rostedt
2015-09-29  9:28 ` [RFC][PATCH 07/11] sched: Stop setting PREEMPT_ACTIVE Peter Zijlstra
2015-09-29 13:16   ` Thomas Gleixner
2015-09-29 15:41   ` Steven Rostedt
2015-09-29 16:25     ` Peter Zijlstra
2015-09-29  9:28 ` [RFC][PATCH 08/11] sched: Simplify preempt_count tests Peter Zijlstra
2015-09-29 13:17   ` Thomas Gleixner
2015-09-29 15:42   ` Steven Rostedt
2015-09-29  9:28 ` [RFC][PATCH 09/11] sched, x86: Kill saved_preempt_count Peter Zijlstra
2015-09-29 13:18   ` Thomas Gleixner
2015-09-29 15:44   ` Steven Rostedt
2015-09-29 15:46     ` Peter Zijlstra
2015-09-29  9:28 ` [RFC][PATCH 10/11] sched: Kill PREEMPT_ACTIVE Peter Zijlstra
2015-09-29 13:18   ` Thomas Gleixner
2015-09-29 15:45   ` Steven Rostedt
2015-09-29  9:28 ` [RFC][PATCH 11/11] sched: More notrace Peter Zijlstra
2015-09-29 13:19   ` Thomas Gleixner
2015-09-29 15:58   ` Steven Rostedt
2015-09-29 10:27 ` [RFC][PATCH 12/11] sched: Add preempt_count invariant check Peter Zijlstra
2015-09-29 10:32   ` Peter Zijlstra
2015-09-29 10:56   ` [RFC][PATCH v2 " Peter Zijlstra
2015-09-29 11:55     ` Linus Torvalds
2015-09-29 11:55       ` Peter Zijlstra
2015-09-29 10:34 ` [RFC][PATCH 00/11] sched: Killing PREEMPT_ACTIVE Ingo Molnar
