linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 0/8] sched,idle: need resched polling rework
@ 2014-04-11 13:42 Peter Zijlstra
  2014-04-11 13:42 ` [RFC][PATCH 1/8] sched,idle,alpha: Switch from TS_POLLING to TIF_POLLING_NRFLAG Peter Zijlstra
                   ` (9 more replies)
  0 siblings, 10 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-11 13:42 UTC (permalink / raw)
  To: mingo, tglx, luto, nicolas.pitre, daniel.lezcano, peterz, umgwanakikbuti
  Cc: linux-kernel

A while ago both Mike and Andy complained that we still get pointless wakeup
IPIs; we had a few patches back and forth, but eventually more or less agreed,
and then nothing... :-)

So here's a number of patches that implement something near what we left off
with.

It's only been compile/boot tested on x86_64; I've not actually looked at the
IPI numbers yet.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC][PATCH 1/8] sched,idle,alpha: Switch from TS_POLLING to TIF_POLLING_NRFLAG
  2014-04-11 13:42 [RFC][PATCH 0/8] sched,idle: need resched polling rework Peter Zijlstra
@ 2014-04-11 13:42 ` Peter Zijlstra
  2014-04-11 14:38   ` Richard Henderson
  2014-04-11 13:42 ` [RFC][PATCH 2/8] sched,idle,tile: " Peter Zijlstra
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-11 13:42 UTC (permalink / raw)
  To: mingo, tglx, luto, nicolas.pitre, daniel.lezcano, peterz, umgwanakikbuti
  Cc: linux-kernel, Richard Henderson

[-- Attachment #1: peterz-ts_polling-alpha.patch --]
[-- Type: text/plain, Size: 2013 bytes --]

Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that
both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word.
This will allow us, using fetch_or(), to both set NEED_RESCHED and
check for POLLING_NRFLAG in a single operation and avoid pointless
wakeups.

Changing from the non-atomic thread_info::status flags to the atomic
thread_info::flags shouldn't be a big issue since most polling state
changes were followed/preceded by a full memory barrier anyway.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/alpha/include/asm/thread_info.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/alpha/include/asm/thread_info.h
+++ b/arch/alpha/include/asm/thread_info.h
@@ -73,12 +73,14 @@ register struct thread_info *__current_t
 #define TIF_SYSCALL_AUDIT	4	/* syscall audit active */
 #define TIF_DIE_IF_KERNEL	9	/* dik recursion lock */
 #define TIF_MEMDIE		13	/* is terminating due to OOM killer */
+#define TIF_POLLING_NRFLAG	14	/* idle is polling for TIF_NEED_RESCHED */
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1<<TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1<<TIF_NEED_RESCHED)
 #define _TIF_NOTIFY_RESUME	(1<<TIF_NOTIFY_RESUME)
 #define _TIF_SYSCALL_AUDIT	(1<<TIF_SYSCALL_AUDIT)
+#define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
 
 /* Work to do on interrupt/exception return.  */
 #define _TIF_WORK_MASK		(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
@@ -92,8 +94,6 @@ register struct thread_info *__current_t
 #define TS_UAC_NOFIX		0x0002	/* ! flags as they match          */
 #define TS_UAC_SIGBUS		0x0004	/* ! userspace part of 'osf_sysinfo' */
 #define TS_RESTORE_SIGMASK	0x0008	/* restore signal mask in do_signal() */
-#define TS_POLLING		0x0010	/* idle task polling need_resched,
-					   skip sending interrupt */
 
 #ifndef __ASSEMBLY__
 #define HAVE_SET_RESTORE_SIGMASK	1




* [RFC][PATCH 2/8] sched,idle,tile: Switch from TS_POLLING to TIF_POLLING_NRFLAG
  2014-04-11 13:42 [RFC][PATCH 0/8] sched,idle: need resched polling rework Peter Zijlstra
  2014-04-11 13:42 ` [RFC][PATCH 1/8] sched,idle,alpha: Switch from TS_POLLING to TIF_POLLING_NRFLAG Peter Zijlstra
@ 2014-04-11 13:42 ` Peter Zijlstra
  2014-04-11 15:15   ` Chris Metcalf
  2014-04-11 13:42 ` [RFC][PATCH 3/8] sched,idle,ia64: " Peter Zijlstra
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-11 13:42 UTC (permalink / raw)
  To: mingo, tglx, luto, nicolas.pitre, daniel.lezcano, peterz, umgwanakikbuti
  Cc: linux-kernel, Chris Metcalf

[-- Attachment #1: peterz-ts_polling-tile.patch --]
[-- Type: text/plain, Size: 1887 bytes --]

Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that
both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word.
This will allow us, using fetch_or(), to both set NEED_RESCHED and
check for POLLING_NRFLAG in a single operation and avoid pointless
wakeups.

Changing from the non-atomic thread_info::status flags to the atomic
thread_info::flags shouldn't be a big issue since most polling state
changes were followed/preceded by a full memory barrier anyway.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/tile/include/asm/thread_info.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/tile/include/asm/thread_info.h
+++ b/arch/tile/include/asm/thread_info.h
@@ -129,6 +129,7 @@ extern void _cpu_idle(void);
 #define TIF_MEMDIE		7	/* OOM killer at work */
 #define TIF_NOTIFY_RESUME	8	/* callback before returning to user */
 #define TIF_SYSCALL_TRACEPOINT	9	/* syscall tracepoint instrumentation */
+#define TIF_POLLING_NRFLAG	10	/* idle is polling for TIF_NEED_RESCHED */
 
 #define _TIF_SIGPENDING		(1<<TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1<<TIF_NEED_RESCHED)
@@ -140,6 +141,7 @@ extern void _cpu_idle(void);
 #define _TIF_MEMDIE		(1<<TIF_MEMDIE)
 #define _TIF_NOTIFY_RESUME	(1<<TIF_NOTIFY_RESUME)
 #define _TIF_SYSCALL_TRACEPOINT	(1<<TIF_SYSCALL_TRACEPOINT)
+#define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
 
 /* Work to do on any return to user space. */
 #define _TIF_ALLWORK_MASK \
@@ -162,7 +164,6 @@ extern void _cpu_idle(void);
 #ifdef __tilegx__
 #define TS_COMPAT		0x0001	/* 32-bit compatibility mode */
 #endif
-#define TS_POLLING		0x0004	/* in idle loop but not sleeping */
 #define TS_RESTORE_SIGMASK	0x0008	/* restore signal mask in do_signal */
 
 #ifndef __ASSEMBLY__




* [RFC][PATCH 3/8] sched,idle,ia64: Switch from TS_POLLING to TIF_POLLING_NRFLAG
  2014-04-11 13:42 [RFC][PATCH 0/8] sched,idle: need resched polling rework Peter Zijlstra
  2014-04-11 13:42 ` [RFC][PATCH 1/8] sched,idle,alpha: Switch from TS_POLLING to TIF_POLLING_NRFLAG Peter Zijlstra
  2014-04-11 13:42 ` [RFC][PATCH 2/8] sched,idle,tile: " Peter Zijlstra
@ 2014-04-11 13:42 ` Peter Zijlstra
  2014-04-11 13:42 ` [RFC][PATCH 4/8] sched,idle,x86: " Peter Zijlstra
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-11 13:42 UTC (permalink / raw)
  To: mingo, tglx, luto, nicolas.pitre, daniel.lezcano, peterz, umgwanakikbuti
  Cc: linux-kernel, Tony Luck

[-- Attachment #1: peterz-ts_polling-ia64.patch --]
[-- Type: text/plain, Size: 2002 bytes --]

Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that
both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word.
This will allow us, using fetch_or(), to both set NEED_RESCHED and
check for POLLING_NRFLAG in a single operation and avoid pointless
wakeups.

Changing from the non-atomic thread_info::status flags to the atomic
thread_info::flags shouldn't be a big issue since most polling state
changes were followed/preceded by a full memory barrier anyway.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/ia64/include/asm/thread_info.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/ia64/include/asm/thread_info.h
+++ b/arch/ia64/include/asm/thread_info.h
@@ -107,6 +107,7 @@ struct thread_info {
 #define TIF_MCA_INIT		18	/* this task is processing MCA or INIT */
 #define TIF_DB_DISABLED		19	/* debug trap disabled for fsyscall */
 #define TIF_RESTORE_RSE		21	/* user RBS is newer than kernel RBS */
+#define TIF_POLLING_NRFLAG	22	/* idle is polling for TIF_NEED_RESCHED */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
@@ -118,6 +119,7 @@ struct thread_info {
 #define _TIF_MCA_INIT		(1 << TIF_MCA_INIT)
 #define _TIF_DB_DISABLED	(1 << TIF_DB_DISABLED)
 #define _TIF_RESTORE_RSE	(1 << TIF_RESTORE_RSE)
+#define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 
 /* "work to do on user-return" bits */
 #define TIF_ALLWORK_MASK	(_TIF_SIGPENDING|_TIF_NOTIFY_RESUME|_TIF_SYSCALL_AUDIT|\
@@ -125,7 +127,6 @@ struct thread_info {
 /* like TIF_ALLWORK_BITS but sans TIF_SYSCALL_TRACE or TIF_SYSCALL_AUDIT */
 #define TIF_WORK_MASK		(TIF_ALLWORK_MASK&~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT))
 
-#define TS_POLLING		1 	/* true if in idle loop and not sleeping */
 #define TS_RESTORE_SIGMASK	2	/* restore signal mask in do_signal() */
 
 #ifndef __ASSEMBLY__




* [RFC][PATCH 4/8] sched,idle,x86: Switch from TS_POLLING to TIF_POLLING_NRFLAG
  2014-04-11 13:42 [RFC][PATCH 0/8] sched,idle: need resched polling rework Peter Zijlstra
                   ` (2 preceding siblings ...)
  2014-04-11 13:42 ` [RFC][PATCH 3/8] sched,idle,ia64: " Peter Zijlstra
@ 2014-04-11 13:42 ` Peter Zijlstra
  2014-04-11 13:42 ` [RFC][PATCH 5/8] sched,idle: Remove TS_POLLING support Peter Zijlstra
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-11 13:42 UTC (permalink / raw)
  To: mingo, tglx, luto, nicolas.pitre, daniel.lezcano, peterz, umgwanakikbuti
  Cc: linux-kernel

[-- Attachment #1: peterz-ts_polling-x86.patch --]
[-- Type: text/plain, Size: 2752 bytes --]

Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that
both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word.
This will allow us, using fetch_or(), to both set NEED_RESCHED and
check for POLLING_NRFLAG in a single operation and avoid pointless
wakeups.

Changing from the non-atomic thread_info::status flags to the atomic
thread_info::flags shouldn't be a big issue since most polling state
changes were followed/preceded by a full memory barrier anyway.

Also, fix up the apm_32 idle function; clearly that was forgotten in
the last conversion. The default idle state is !POLLING, so just kill
the lot.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/thread_info.h |    4 ++--
 arch/x86/kernel/apm_32.c           |   11 -----------
 2 files changed, 2 insertions(+), 13 deletions(-)

--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -83,6 +83,7 @@ struct thread_info {
 #define TIF_FORK		18	/* ret_from_fork */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
+#define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
 #define TIF_IO_BITMAP		22	/* uses I/O bitmap */
 #define TIF_FORCED_TF		24	/* true if TF in eflags artificially */
 #define TIF_BLOCKSTEP		25	/* set when we want DEBUGCTLMSR_BTF */
@@ -106,6 +107,7 @@ struct thread_info {
 #define _TIF_IA32		(1 << TIF_IA32)
 #define _TIF_FORK		(1 << TIF_FORK)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
+#define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
 #define _TIF_FORCED_TF		(1 << TIF_FORCED_TF)
 #define _TIF_BLOCKSTEP		(1 << TIF_BLOCKSTEP)
@@ -191,8 +193,6 @@ static inline struct thread_info *curren
  * have to worry about atomic accesses.
  */
 #define TS_COMPAT		0x0002	/* 32bit syscall active (64BIT)*/
-#define TS_POLLING		0x0004	/* idle task polling need_resched,
-					   skip sending interrupt */
 #define TS_RESTORE_SIGMASK	0x0008	/* restore signal mask in do_signal() */
 
 #ifndef __ASSEMBLY__
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -844,21 +844,10 @@ static int apm_do_idle(void)
 	int polling;
 	int err = 0;
 
-	polling = !!(current_thread_info()->status & TS_POLLING);
-	if (polling) {
-		current_thread_info()->status &= ~TS_POLLING;
-		/*
-		 * TS_POLLING-cleared state must be visible before we
-		 * test NEED_RESCHED:
-		 */
-		smp_mb();
-	}
 	if (!need_resched()) {
 		idled = 1;
 		ret = apm_bios_call_simple(APM_FUNC_IDLE, 0, 0, &eax, &err);
 	}
-	if (polling)
-		current_thread_info()->status |= TS_POLLING;
 
 	if (!idled)
 		return 0;




* [RFC][PATCH 5/8] sched,idle: Remove TS_POLLING support
  2014-04-11 13:42 [RFC][PATCH 0/8] sched,idle: need resched polling rework Peter Zijlstra
                   ` (3 preceding siblings ...)
  2014-04-11 13:42 ` [RFC][PATCH 4/8] sched,idle,x86: " Peter Zijlstra
@ 2014-04-11 13:42 ` Peter Zijlstra
  2014-04-11 13:42 ` [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs Peter Zijlstra
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-11 13:42 UTC (permalink / raw)
  To: mingo, tglx, luto, nicolas.pitre, daniel.lezcano, peterz, umgwanakikbuti
  Cc: linux-kernel

[-- Attachment #1: peterz-ts_polling-kill.patch --]
[-- Type: text/plain, Size: 1762 bytes --]

Now that there are no architectures left using it, kill the support
for TS_POLLING.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/sched.h |   46 ++--------------------------------------------
 1 file changed, 2 insertions(+), 44 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2695,51 +2695,9 @@ static inline int spin_needbreak(spinloc
 
 /*
  * Idle thread specific functions to determine the need_resched
- * polling state. We have two versions, one based on TS_POLLING in
- * thread_info.status and one based on TIF_POLLING_NRFLAG in
- * thread_info.flags
+ * polling state.
  */
-#ifdef TS_POLLING
-static inline int tsk_is_polling(struct task_struct *p)
-{
-	return task_thread_info(p)->status & TS_POLLING;
-}
-static inline void __current_set_polling(void)
-{
-	current_thread_info()->status |= TS_POLLING;
-}
-
-static inline bool __must_check current_set_polling_and_test(void)
-{
-	__current_set_polling();
-
-	/*
-	 * Polling state must be visible before we test NEED_RESCHED,
-	 * paired by resched_task()
-	 */
-	smp_mb();
-
-	return unlikely(tif_need_resched());
-}
-
-static inline void __current_clr_polling(void)
-{
-	current_thread_info()->status &= ~TS_POLLING;
-}
-
-static inline bool __must_check current_clr_polling_and_test(void)
-{
-	__current_clr_polling();
-
-	/*
-	 * Polling state must be visible before we test NEED_RESCHED,
-	 * paired by resched_task()
-	 */
-	smp_mb();
-
-	return unlikely(tif_need_resched());
-}
-#elif defined(TIF_POLLING_NRFLAG)
+#ifdef TIF_POLLING_NRFLAG
 static inline int tsk_is_polling(struct task_struct *p)
 {
 	return test_tsk_thread_flag(p, TIF_POLLING_NRFLAG);




* [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-04-11 13:42 [RFC][PATCH 0/8] sched,idle: need resched polling rework Peter Zijlstra
                   ` (4 preceding siblings ...)
  2014-04-11 13:42 ` [RFC][PATCH 5/8] sched,idle: Remove TS_POLLING support Peter Zijlstra
@ 2014-04-11 13:42 ` Peter Zijlstra
  2014-04-13 21:41   ` Nicolas Pitre
  2014-05-09 13:37   ` James Hogan
  2014-04-11 13:42 ` [RFC][PATCH 7/8] sched,idle: Delay clearing the polling bit Peter Zijlstra
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-11 13:42 UTC (permalink / raw)
  To: mingo, tglx, luto, nicolas.pitre, daniel.lezcano, peterz, umgwanakikbuti
  Cc: linux-kernel

[-- Attachment #1: peterz-sched-resched_cpu.patch --]
[-- Type: text/plain, Size: 2698 bytes --]

Because mwait_idle_with_hints() gets called from !idle context, it must
call current_clr_polling(). This however means that resched_task() is
very likely to send an IPI even when we were polling:

  CPU0					CPU1

  if (current_set_polling_and_test())
    goto out;

  __monitor(&ti->flags);
  if (!need_resched())
    __mwait(eax, ecx);
					set_tsk_need_resched(p);
					smp_mb();
out:
  current_clr_polling();
					if (!tsk_is_polling(p))
					  smp_send_reschedule(cpu);


So while it is correct (extra IPIs aren't a problem, whereas a missed
IPI would be), it is a performance problem (for some).

Avoid this issue by using fetch_or() to atomically set NEED_RESCHED
and test if POLLING_NRFLAG is set.

Since a CPU stuck in mwait is unlikely to modify the flags word,
contention on the cmpxchg is unlikely and thus we should mostly
succeed in a single go.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/core.c |   41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -505,6 +505,39 @@ static inline void init_hrtick(void)
 #endif	/* CONFIG_SCHED_HRTICK */
 
 /*
+ * cmpxchg based fetch_or, macro so it works for different integer types
+ */
+#define fetch_or(ptr, val)						\
+({	typeof(*(ptr)) __old, __val = *(ptr);				\
+ 	for (;;) {							\
+ 		__old = cmpxchg((ptr), __val, __val | (val));		\
+ 		if (__old == __val)					\
+ 			break;						\
+ 		__val = __old;						\
+ 	}								\
+ 	__old;								\
+})
+
+#ifdef TIF_POLLING_NRFLAG
+/*
+ * Atomically set TIF_NEED_RESCHED and test for TIF_POLLING_NRFLAG,
+ * this avoids any races wrt polling state changes and thereby avoids
+ * spurious IPIs.
+ */
+static bool set_nr_and_not_polling(struct task_struct *p)
+{
+	struct thread_info *ti = task_thread_info(p);
+	return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
+}
+#else
+static bool set_nr_and_not_polling(struct task_struct *p)
+{
+	set_tsk_need_resched(p);
+	return true;
+}
+#endif
+
+/*
  * resched_task - mark a task 'to be rescheduled now'.
  *
  * On UP this means the setting of the need_resched flag, on SMP it
@@ -520,17 +553,15 @@ void resched_task(struct task_struct *p)
 	if (test_tsk_need_resched(p))
 		return;
 
-	set_tsk_need_resched(p);
-
 	cpu = task_cpu(p);
+
 	if (cpu == smp_processor_id()) {
+		set_tsk_need_resched(p);
 		set_preempt_need_resched();
 		return;
 	}
 
-	/* NEED_RESCHED must be visible before we test polling */
-	smp_mb();
-	if (!tsk_is_polling(p))
+	if (set_nr_and_not_polling(p))
 		smp_send_reschedule(cpu);
 }
 




* [RFC][PATCH 7/8] sched,idle: Delay clearing the polling bit
  2014-04-11 13:42 [RFC][PATCH 0/8] sched,idle: need resched polling rework Peter Zijlstra
                   ` (5 preceding siblings ...)
  2014-04-11 13:42 ` [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs Peter Zijlstra
@ 2014-04-11 13:42 ` Peter Zijlstra
  2014-04-13 21:51   ` Nicolas Pitre
  2014-04-11 13:42 ` [RFC][PATCH 8/8] sched,idle: Reflow cpuidle_idle_call() Peter Zijlstra
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-11 13:42 UTC (permalink / raw)
  To: mingo, tglx, luto, nicolas.pitre, daniel.lezcano, peterz, umgwanakikbuti
  Cc: linux-kernel

[-- Attachment #1: sched-idle-polling-fixup.patch --]
[-- Type: text/plain, Size: 1817 bytes --]

With the generic idle functions assuming !polling, we should only clear
the polling bit at the very last opportunity in order to avoid
spurious IPIs.

Ideally we'd flip the default to polling, but that means auditing all
arch idle functions.

Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/idle.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -78,12 +78,10 @@ static int cpuidle_idle_call(void)
 
 	/*
 	 * Check if the idle task must be rescheduled. If it is the
-	 * case, exit the function after re-enabling the local irq and
-	 * set again the polling flag
+	 * case, exit the function after re-enabling the local irq.
 	 */
-	if (current_clr_polling_and_test()) {
+	if (need_resched()) {
 		local_irq_enable();
-		__current_set_polling();
 		return 0;
 	}
 
@@ -127,7 +125,7 @@ static int cpuidle_idle_call(void)
 			broadcast = !!(drv->states[next_state].flags &
 				       CPUIDLE_FLAG_TIMER_STOP);
 
-			if (broadcast)
+			if (broadcast) {
 				/*
 				 * Tell the time framework to switch
 				 * to a broadcast timer because our
@@ -139,6 +137,7 @@ static int cpuidle_idle_call(void)
 				ret = clockevents_notify(
 					CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
 					&dev->cpu);
+			}
 
 			if (!ret) {
 				trace_cpu_idle_rcuidle(next_state, dev->cpu);
@@ -175,8 +174,12 @@ static int cpuidle_idle_call(void)
 	 * We can't use the cpuidle framework, let's use the default
 	 * idle routine
 	 */
-	if (ret)
-		arch_cpu_idle();
+	if (ret) {
+		if (!current_clr_polling_and_test())
+			arch_cpu_idle();
+		else
+			local_irq_enable();
+	}
 
 	__current_set_polling();
 




* [RFC][PATCH 8/8] sched,idle: Reflow cpuidle_idle_call()
  2014-04-11 13:42 [RFC][PATCH 0/8] sched,idle: need resched polling rework Peter Zijlstra
                   ` (6 preceding siblings ...)
  2014-04-11 13:42 ` [RFC][PATCH 7/8] sched,idle: Delay clearing the polling bit Peter Zijlstra
@ 2014-04-11 13:42 ` Peter Zijlstra
  2014-04-13 21:36   ` Nicolas Pitre
  2014-04-11 15:00 ` [RFC][PATCH 0/8] sched,idle: need resched polling rework Andy Lutomirski
  2014-04-12  8:35 ` Mike Galbraith
  9 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-11 13:42 UTC (permalink / raw)
  To: mingo, tglx, luto, nicolas.pitre, daniel.lezcano, peterz, umgwanakikbuti
  Cc: linux-kernel

[-- Attachment #1: sched-idle-polling-reflow.patch --]
[-- Type: text/plain, Size: 4887 bytes --]

Apply goto to reduce lines and nesting levels.

Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/idle.c |  138 +++++++++++++++++++++++-----------------------------
 1 file changed, 62 insertions(+), 76 deletions(-)

--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -73,7 +73,7 @@ static int cpuidle_idle_call(void)
 {
 	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
 	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
-	int next_state, entered_state, ret;
+	int next_state, entered_state;
 	bool broadcast;
 
 	/*
@@ -102,90 +102,64 @@ static int cpuidle_idle_call(void)
 	 * Check if the cpuidle framework is ready, otherwise fallback
 	 * to the default arch specific idle method
 	 */
-	ret = cpuidle_enabled(drv, dev);
+	if (cpuidle_enabled(drv, dev))
+		goto use_default;
 
-	if (!ret) {
-		/*
-		 * Ask the governor to choose an idle state it thinks
-		 * it is convenient to go to. There is *always* a
-		 * convenient idle state
-		 */
-		next_state = cpuidle_select(drv, dev);
-
-		/*
-		 * The idle task must be scheduled, it is pointless to
-		 * go to idle, just update no idle residency and get
-		 * out of this function
-		 */
-		if (current_clr_polling_and_test()) {
-			dev->last_residency = 0;
-			entered_state = next_state;
-			local_irq_enable();
-		} else {
-			broadcast = !!(drv->states[next_state].flags &
-				       CPUIDLE_FLAG_TIMER_STOP);
-
-			if (broadcast) {
-				/*
-				 * Tell the time framework to switch
-				 * to a broadcast timer because our
-				 * local timer will be shutdown. If a
-				 * local timer is used from another
-				 * cpu as a broadcast timer, this call
-				 * may fail if it is not available
-				 */
-				ret = clockevents_notify(
-					CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
-					&dev->cpu);
-			}
-
-			if (!ret) {
-				trace_cpu_idle_rcuidle(next_state, dev->cpu);
-
-				/*
-				 * Enter the idle state previously
-				 * returned by the governor
-				 * decision. This function will block
-				 * until an interrupt occurs and will
-				 * take care of re-enabling the local
-				 * interrupts
-				 */
-				entered_state = cpuidle_enter(drv, dev,
-							      next_state);
-
-				trace_cpu_idle_rcuidle(PWR_EVENT_EXIT,
-						       dev->cpu);
-
-				if (broadcast)
-					clockevents_notify(
-						CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
-						&dev->cpu);
-
-				/*
-				 * Give the governor an opportunity to reflect on the
-				 * outcome
-				 */
-				cpuidle_reflect(dev, entered_state);
-			}
-		}
-	}
+	/*
+	 * Ask the governor to choose an idle state it thinks
+	 * it is convenient to go to. There is *always* a
+	 * convenient idle state
+	 */
+	next_state = cpuidle_select(drv, dev);
 
 	/*
-	 * We can't use the cpuidle framework, let's use the default
-	 * idle routine
+	 * The idle task must be scheduled, it is pointless to
+	 * go to idle, just update no idle residency and get
+	 * out of this function
 	 */
-	if (ret) {
-		if (!current_clr_polling_and_test())
-			arch_cpu_idle();
-		else
-			local_irq_enable();
+	if (current_clr_polling_and_test()) {
+		dev->last_residency = 0;
+		entered_state = next_state;
+		local_irq_enable();
+		goto exit_idle;
 	}
 
+	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
+
+	/*
+	 * Tell the time framework to switch to a broadcast timer
+	 * because our local timer will be shutdown. If a local timer
+	 * is used from another cpu as a broadcast timer, this call may
+	 * fail if it is not available
+	 */
+	if (broadcast &&
+	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
+		goto use_default;
+
+	trace_cpu_idle_rcuidle(next_state, dev->cpu);
+
+	/*
+	 * Enter the idle state previously returned by the governor decision.
+	 * This function will block until an interrupt occurs and will take
+	 * care of re-enabling the local interrupts
+	 */
+	entered_state = cpuidle_enter(drv, dev, next_state);
+
+	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
+
+	if (broadcast)
+		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
+
+	/*
+	 * Give the governor an opportunity to reflect on the outcome
+	 */
+	cpuidle_reflect(dev, entered_state);
+
+exit_idle:
 	__current_set_polling();
 
 	/*
-	 * It is up to the idle functions to enable back the local
-	 * interrupt
+	 * It is up to the idle functions to reenable local interrupts
 	 */
 	if (WARN_ON_ONCE(irqs_disabled()))
 		local_irq_enable();
@@ -194,6 +168,18 @@ static int cpuidle_idle_call(void)
 	start_critical_timings();
 
 	return 0;
+
+	/*
+	 * We can't use the cpuidle framework, let's use the default idle
+	 * routine
+	 */
+use_default:
+	if (current_clr_polling_and_test())
+		local_irq_enable();
+	else
+		arch_cpu_idle();
+
+	goto exit_idle;
 }
 
 /*




* Re: [RFC][PATCH 1/8] sched,idle,alpha: Switch from TS_POLLING to TIF_POLLING_NRFLAG
  2014-04-11 13:42 ` [RFC][PATCH 1/8] sched,idle,alpha: Switch from TS_POLLING to TIF_POLLING_NRFLAG Peter Zijlstra
@ 2014-04-11 14:38   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2014-04-11 14:38 UTC (permalink / raw)
  To: Peter Zijlstra, mingo, tglx, luto, nicolas.pitre, daniel.lezcano,
	umgwanakikbuti
  Cc: linux-kernel

On 04/11/2014 06:42 AM, Peter Zijlstra wrote:
> Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that
> both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word.
> This will allow us, using fetch_or(), to both set NEED_RESCHED and
> check for POLLING_NRFLAG in a single operation and avoid pointless
> wakeups.
> 
> Changing from the non-atomic thread_info::status flags to the atomic
> thread_info::flags shouldn't be a big issue since most polling state
> changes were followed/preceded by a full memory barrier anyway.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/alpha/include/asm/thread_info.h |    4 ++--
>  1 file changed, 2 inser

Acked-by: Richard Henderson <rth@twiddle.net>


r~


* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-04-11 13:42 [RFC][PATCH 0/8] sched,idle: need resched polling rework Peter Zijlstra
                   ` (7 preceding siblings ...)
  2014-04-11 13:42 ` [RFC][PATCH 8/8] sched,idle: Reflow cpuidle_idle_call() Peter Zijlstra
@ 2014-04-11 15:00 ` Andy Lutomirski
  2014-04-11 15:14   ` Peter Zijlstra
  2014-05-22 12:58   ` Peter Zijlstra
  2014-04-12  8:35 ` Mike Galbraith
  9 siblings, 2 replies; 51+ messages in thread
From: Andy Lutomirski @ 2014-04-11 15:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, daniel.lezcano,
	Mike Galbraith, linux-kernel

On Fri, Apr 11, 2014 at 6:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> A while ago both Mike and Andy complained that we still get pointless wakeup
> IPIs; we had a few patches back and forth, but eventually more or less agreed,
> and then nothing... :-)
>
> So here's a number of patches that implement something near what we left off
> with.
>
> It's only been compile/boot tested on x86_64; I've not actually looked at the
> IPI numbers yet.
>

Looks generally good and it should be a nice cleanup.

That being said, I think that this addresses only one of the two major
issues.  While the race you're fixing is more interesting, I think its
impact is dwarfed by the fact that ttwu_queue_remote completely
ignores polling.  (NB: I haven't actually tested this patch set, but I
did try to instrument this stuff a while ago.)

To fix this, presumably the wake-from-idle path needs a
sched_ttwu_pending call, and ttwu_queue_remote could use resched_task.
 sched_ttwu_pending could benefit from a straightforward optimization:
it doesn't need rq->lock if llist is empty.

If you're not planning on trying to fix that, I can try to write up a
patch in the next day or two.

Even with all of this fixed, what happens when ttwu_queue_remote is
called with a task that has lower priority than whatever is currently
running on the targeted cpu?  I think the result is an IPI that serves
very little purpose other than avoiding taking a spinlock in the
waking thread.  This may be a bad tradeoff.  I doubt that this matters
for my particular workload, though.

--Andy


* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-04-11 15:00 ` [RFC][PATCH 0/8] sched,idle: need resched polling rework Andy Lutomirski
@ 2014-04-11 15:14   ` Peter Zijlstra
  2014-05-22 12:58   ` Peter Zijlstra
  1 sibling, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-11 15:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, daniel.lezcano,
	Mike Galbraith, linux-kernel

On Fri, Apr 11, 2014 at 08:00:23AM -0700, Andy Lutomirski wrote:
> That being said, I think that this addresses only one of the two major
> issues.  While the race you're fixing is more interesting, I think its
> impact is dwarfed by the fact that ttwu_queue_remote completely
> ignores polling.  (NB: I haven't actually tested this patch set, but I
> did try to instrument this stuff a while ago.)
> 
> To fix this, presumably the wake-from-idle path needs a
> sched_ttwu_pending call, and ttwu_queue_remote could use resched_task.
>  sched_ttwu_pending could benefit from a straightforward optimization:
> it doesn't need rq->lock if llist is empty.
> 
> If you're not planning on trying to fix that, I can try to write up a
> patch in the next day or two.

Right; I forgot to write about that; I was going to look at both ttwu
and arch_send_call_function_single_ipi() after this got sorted.

While you didn't complain about the remote function call IPI, Venki
(while @google) did, and that was his reason for looking at this.

> Even with all of this fixed, what happens when ttwu_queue_remote is
> called with a task that has lower priority than whatever is currently
> running on the targeted cpu?  I think the result is an IPI that serves
> very little purpose other than avoiding taking a spinlock in the
> waking thread.  This may be a bad tradeoff.  I doubt that this matters
> for my particular workload, though.

Today Mike also noted that at very high frequencies the IPI is actually
a lot slower than doing the remote accesses, for some weird reason --
previously I've seen the remote wakeups queue a lot of wakeups and have
the IPI take too long.

So there's definitely something to prod at there.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 2/8] sched,idle,tile: Switch from TS_POLLING to TIF_POLLING_NRFLAG
  2014-04-11 13:42 ` [RFC][PATCH 2/8] sched,idle,tile: " Peter Zijlstra
@ 2014-04-11 15:15   ` Chris Metcalf
  2014-04-11 15:30     ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Chris Metcalf @ 2014-04-11 15:15 UTC (permalink / raw)
  To: Peter Zijlstra, mingo, tglx, luto, nicolas.pitre, daniel.lezcano,
	umgwanakikbuti
  Cc: linux-kernel

On 4/11/2014 9:42 AM, Peter Zijlstra wrote:
> Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that
> both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word.
> This will allow us, using fetch_or(), to both set NEED_RESCHED and
> check for POLLING_NRFLAG in a single operation and avoid pointless
> wakeups.
>
> Changing from the non-atomic thread_info::status flags to the atomic
> thread_info::flags shouldn't be a big issue since most polling state
> changes were followed/preceded by a full memory barrier anyway.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/tile/include/asm/thread_info.h |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Acked-by: Chris Metcalf <cmetcalf@tilera.com>

It's unfortunate that you're rolling the fetch_or() unconditionally
out of cmpxchg(), since some architectures (like tilegx) do have
support for "fetchor" as a primitive atomic operation.  Currently the
tilegx fetchor is only exposed to architecture-independent code via
set_bit() and test_and_set_bit(), but perhaps it's worth having an
atomic_or() in atomic.h so architectures can choose to optimize it?

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 2/8] sched,idle,tile: Switch from TS_POLLING to TIF_POLLING_NRFLAG
  2014-04-11 15:15   ` Chris Metcalf
@ 2014-04-11 15:30     ` Peter Zijlstra
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-11 15:30 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: mingo, tglx, luto, nicolas.pitre, daniel.lezcano, umgwanakikbuti,
	linux-kernel, Linus Torvalds

On Fri, Apr 11, 2014 at 11:15:14AM -0400, Chris Metcalf wrote:
> It's unfortunate that you're rolling the fetch_or() unconditionally
> out of cmpxchg(), since some architectures (like tilegx) do have
> support for "fetchor" as a primitive atomic operation.  Currently the
> tilegx fetchor is only exposed to architecture-independent code via
> set_bit() and test_and_set_bit(), but perhaps it's worth having an
> atomic_or() in atomic.h so architectures can choose to optimize it?

Yeah, I know.. Linus hated atomic_or_return() -- which admittedly is a
little worse than fetch_or, because the operation isn't reversible.

Note that atomic_or_return(), or rather atomic_return_or(), won't
actually work here because thread_info::flags is not a fixed type
across archs; some have int, some have long.

Then again, when I proposed that thing, I didn't have a usage for it.
Still, Linus' main argument still holds: it's a crappy primitive on x86
and one should not encourage its use by pretending it's 'easy'.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-04-11 13:42 [RFC][PATCH 0/8] sched,idle: need resched polling rework Peter Zijlstra
                   ` (8 preceding siblings ...)
  2014-04-11 15:00 ` [RFC][PATCH 0/8] sched,idle: need resched polling rework Andy Lutomirski
@ 2014-04-12  8:35 ` Mike Galbraith
  9 siblings, 0 replies; 51+ messages in thread
From: Mike Galbraith @ 2014-04-12  8:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, luto, nicolas.pitre, daniel.lezcano, linux-kernel

On Fri, 2014-04-11 at 15:42 +0200, Peter Zijlstra wrote: 
> A while ago both Mike and Andy complained that we still get pointless wakeup
> IPIs, we had a few patches back and forth but eventually more or less agreed
> and then nothing... :-)
> 
> So here's a number of patches that implement something near what we left off
> with.
> 
> It's only been compile/boot tested on x86_64; I've not actually looked at the IPI
> numbers yet.

'course this didn't do much for my Q6600 box, or core2 lappy when booted
max_cstate=1, but series didn't seem to break anything, both boxen still
work fine with the below on top.


Subject: [PATCH REGRESSION FIX] x86 idle: restore mwait_idle()
From: Len Brown <lenb@kernel.org>
Date: Wed, 15 Jan 2014 00:37:34 -0500

From: Len Brown <len.brown@intel.com>

In Linux-3.9 we removed the mwait_idle() loop:
'x86 idle: remove mwait_idle() and "idle=mwait" cmdline param'
(69fb3676df3329a7142803bb3502fa59dc0db2e3)

The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT loop,
until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.

But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
   MWAIT-C1 is preferred for optimal power and performance.
   But if they support just C1, cpuidle never loads and
   so they use the boot-time default idle loop forever.

2. Some laptops will boot-hang if HALT is used,
   but will boot successfully if MWAIT is used.
   This appears to be a hidden assumption in BIOS SMI,
   that is presumably valid on the proprietary OS
   where the BIOS was validated.

   https://bugzilla.kernel.org/show_bug.cgi?id=60770

So here we effectively revert the patch above, restoring
the mwait_idle() loop.  However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.

Maintainer notes:
For 3.9, simply revert 69fb3676df
for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
For 3.11, 3.12, 3.13, this patch applies cleanly

Mike: add clflush barriers and resched IPI avoidance.

Cc: Mike Galbraith <bitbucket@online.de>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: <stable@vger.kernel.org> # 3.9, 3.10, 3.11, 3.12, 3.13
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Mike Galbraith <bitbucket@online.de>
---
 arch/x86/include/asm/mwait.h |    8 ++++++
 arch/x86/kernel/process.c    |   50 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -30,6 +30,14 @@ static inline void __mwait(unsigned long
 		     :: "a" (eax), "c" (ecx));
 }
 
+static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
+{
+	trace_hardirqs_on();
+	/* "mwait %eax, %ecx;" */
+	asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
+		     :: "a" (eax), "c" (ecx));
+}
+
 /*
  * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
  * which can obviate IPI to trigger checking of need_resched.
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -28,6 +28,7 @@
 #include <asm/fpu-internal.h>
 #include <asm/debugreg.h>
 #include <asm/nmi.h>
+#include <asm/mwait.h>
 
 /*
  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
@@ -395,6 +396,52 @@ static void amd_e400_idle(void)
 		default_idle();
 }
 
+/*
+ * Intel Core2 and older machines prefer MWAIT over HALT for C1.
+ * We can't rely on cpuidle installing MWAIT, because it will not load
+ * on systems that support only C1 -- so the boot default must be MWAIT.
+ *
+ * Some AMD machines are the opposite, they depend on using HALT.
+ *
+ * So for default C1, which is used during boot until cpuidle loads,
+ * use MWAIT-C1 on Intel HW that has it, else use HALT.
+ */
+static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
+{
+	if (c->x86_vendor != X86_VENDOR_INTEL)
+		return 0;
+
+	if (!cpu_has(c, X86_FEATURE_MWAIT))
+		return 0;
+
+	return 1;
+}
+
+/*
+ * MONITOR/MWAIT with no hints, used for the default C1 state.
+ * This invokes MWAIT with interrupts enabled and no flags,
+ * which is backwards compatible with the original MWAIT implementation.
+ */
+
+static void mwait_idle(void)
+{
+	if (!current_set_polling_and_test()) {
+		if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) {
+			mb();
+			clflush((void *)&current_thread_info()->flags);
+			mb();
+		}
+
+		__monitor((void *)&current_thread_info()->flags, 0, 0);
+		if (!need_resched())
+			__sti_mwait(0, 0);
+		else
+			local_irq_enable();
+	} else
+		local_irq_enable();
+	current_clr_polling();
+}
+
 void select_idle_routine(const struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
@@ -408,6 +455,9 @@ void select_idle_routine(const struct cp
 		/* E400: APIC timer interrupt does not wake up CPU from C1e */
 		pr_info("using AMD E400 aware idle routine\n");
 		x86_idle = amd_e400_idle;
+	} else if (prefer_mwait_c1_over_halt(c)) {
+		pr_info("using mwait in idle threads\n");
+		x86_idle = mwait_idle;
 	} else
 		x86_idle = default_idle;
 }




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 8/8] sched,idle: Reflow cpuidle_idle_call()
  2014-04-11 13:42 ` [RFC][PATCH 8/8] sched,idle: Reflow cpuidle_idle_call() Peter Zijlstra
@ 2014-04-13 21:36   ` Nicolas Pitre
  2014-04-14  8:59     ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Nicolas Pitre @ 2014-04-13 21:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, luto, daniel.lezcano, umgwanakikbuti, linux-kernel

On Fri, 11 Apr 2014, Peter Zijlstra wrote:

> Apply goto to reduce lines and nesting levels.
> 
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/sched/idle.c |  138 +++++++++++++++++++++++-----------------------------
>  1 file changed, 62 insertions(+), 76 deletions(-)
> 
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -73,7 +73,7 @@ static int cpuidle_idle_call(void)
>  {
>  	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
>  	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
> -	int next_state, entered_state, ret;
> +	int next_state, entered_state;
>  	bool broadcast;
>  
>  	/*
> @@ -102,90 +102,64 @@ static int cpuidle_idle_call(void)
>  	 * Check if the cpuidle framework is ready, otherwise fallback
>  	 * to the default arch specific idle method
>  	 */
> -	ret = cpuidle_enabled(drv, dev);
> +	if (cpuidle_enabled(drv, dev))
> +		goto use_default;

Why not use braces here and move the use_default code block inside
it instead?


>  
> -	if (!ret) {
> -		/*
> -		 * Ask the governor to choose an idle state it thinks
> -		 * it is convenient to go to. There is *always* a
> -		 * convenient idle state
> -		 */
> -		next_state = cpuidle_select(drv, dev);
> -
> -		/*
> -		 * The idle task must be scheduled, it is pointless to
> -		 * go to idle, just update no idle residency and get
> -		 * out of this function
> -		 */
> -		if (current_clr_polling_and_test()) {
> -			dev->last_residency = 0;
> -			entered_state = next_state;
> -			local_irq_enable();
> -		} else {
> -			broadcast = !!(drv->states[next_state].flags &
> -				       CPUIDLE_FLAG_TIMER_STOP);
> -
> -			if (broadcast) {
> -				/*
> -				 * Tell the time framework to switch
> -				 * to a broadcast timer because our
> -				 * local timer will be shutdown. If a
> -				 * local timer is used from another
> -				 * cpu as a broadcast timer, this call
> -				 * may fail if it is not available
> -				 */
> -				ret = clockevents_notify(
> -					CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
> -					&dev->cpu);
> -			}
> -
> -			if (!ret) {
> -				trace_cpu_idle_rcuidle(next_state, dev->cpu);
> -
> -				/*
> -				 * Enter the idle state previously
> -				 * returned by the governor
> -				 * decision. This function will block
> -				 * until an interrupt occurs and will
> -				 * take care of re-enabling the local
> -				 * interrupts
> -				 */
> -				entered_state = cpuidle_enter(drv, dev,
> -							      next_state);
> -
> -				trace_cpu_idle_rcuidle(PWR_EVENT_EXIT,
> -						       dev->cpu);
> -
> -				if (broadcast)
> -					clockevents_notify(
> -						CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
> -						&dev->cpu);
> -
> -				/*
> -				 * Give the governor an opportunity to reflect on the
> -				 * outcome
> -				 */
> -				cpuidle_reflect(dev, entered_state);
> -			}
> -		}
> -	}
> +	/*
> +	 * Ask the governor to choose an idle state it thinks
> +	 * it is convenient to go to. There is *always* a
> +	 * convenient idle state
> +	 */
> +	next_state = cpuidle_select(drv, dev);
>  
>  	/*
> -	 * We can't use the cpuidle framework, let's use the default
> -	 * idle routine
> +	 * The idle task must be scheduled, it is pointless to
> +	 * go to idle, just update no idle residency and get
> +	 * out of this function
>  	 */
> -	if (ret) {
> -		if (!current_clr_polling_and_test())
> -			arch_cpu_idle();
> -		else
> -			local_irq_enable();
> +	if (current_clr_polling_and_test()) {
> +		dev->last_residency = 0;
> +		entered_state = next_state;
> +		local_irq_enable();
> +		goto exit_idle;
>  	}
>  
> +	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
> +
> +	/*
> +	 * Tell the time framework to switch to a broadcast timer
> +	 * because our local timer will be shutdown. If a local timer
> +	 * is used from another cpu as a broadcast timer, this call may
> +	 * fail if it is not available
> +	 */
> +	if (broadcast &&
> +	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
> +		goto use_default;
> +
> +	trace_cpu_idle_rcuidle(next_state, dev->cpu);
> +
> +	/*
> +	 * Enter the idle state previously returned by the governor decision.
> +	 * This function will block until an interrupt occurs and will take
> +	 * care of re-enabling the local interrupts
> +	 */
> +	entered_state = cpuidle_enter(drv, dev, next_state);
> +
> +	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
> +
> +	if (broadcast)
> +		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
> +
> +	/*
> +	 * Give the governor an opportunity to reflect on the outcome
> +	 */
> +	cpuidle_reflect(dev, entered_state);
> +
> +exit_idle:
>  	__current_set_polling();
>  
>  	/*
> -	 * It is up to the idle functions to enable back the local
> -	 * interrupt
> +	 * It is up to the idle functions to reenable local interrupts
>  	 */
>  	if (WARN_ON_ONCE(irqs_disabled()))
>  		local_irq_enable();
> @@ -194,6 +168,18 @@ static int cpuidle_idle_call(void)
>  	start_critical_timings();
>  
>  	return 0;
> +
> +	/*
> +	 * We can't use the cpuidle framework, let's use the default idle
> +	 * routine
> +	 */
> +use_default:
> +	if (current_clr_polling_and_test())
> +		local_irq_enable();
> +	else
> +		arch_cpu_idle();
> +
> +	goto exit_idle;
>  }
>  
>  /*
> 
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-04-11 13:42 ` [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs Peter Zijlstra
@ 2014-04-13 21:41   ` Nicolas Pitre
  2014-05-09 13:37   ` James Hogan
  1 sibling, 0 replies; 51+ messages in thread
From: Nicolas Pitre @ 2014-04-13 21:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, luto, daniel.lezcano, umgwanakikbuti, linux-kernel

On Fri, 11 Apr 2014, Peter Zijlstra wrote:

> Because mwait_idle_with_hints() gets called from !idle context it must
> call current_clr_polling(). This however means that resched_task() is
> very likely to send an IPI even when we were polling:
> 
>   CPU0					CPU1
> 
>   if (current_set_polling_and_test())
>     goto out;
> 
>   __monitor(&ti->flags);
>   if (!need_resched())
>     __mwait(eax, ecx);
> 					set_tsk_need_resched(p);
> 					smp_mb();
> out:
>   current_clr_polling();
> 					if (!tsk_is_polling(p))
> 					  smp_send_reschedule(cpu);
> 
> 
> So while it is correct (extra IPIs aren't a problem, whereas a missed
> IPI would be), it is a performance problem (for some).
> 
> Avoid this issue by using fetch_or() to atomically set NEED_RESCHED
> and test if POLLING_NRFLAG is set.
> 
> Since a CPU stuck in mwait is unlikely to modify the flags word,
> contention on the cmpxchg is unlikely and thus we should mostly
> succeed in a single go.

Looks like a nice cleanup.

Acked-by: Nicolas Pitre <nico@linaro.org>

> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/sched/core.c |   41 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 36 insertions(+), 5 deletions(-)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -505,6 +505,39 @@ static inline void init_hrtick(void)
>  #endif	/* CONFIG_SCHED_HRTICK */
>  
>  /*
> + * cmpxchg based fetch_or, macro so it works for different integer types
> + */
> +#define fetch_or(ptr, val)						\
> +({	typeof(*(ptr)) __old, __val = *(ptr);				\
> + 	for (;;) {							\
> + 		__old = cmpxchg((ptr), __val, __val | (val));		\
> + 		if (__old == __val)					\
> + 			break;						\
> + 		__val = __old;						\
> + 	}								\
> + 	__old;								\
> +})
> +
> +#ifdef TIF_POLLING_NRFLAG
> +/*
> + * Atomically set TIF_NEED_RESCHED and test for TIF_POLLING_NRFLAG,
> + * this avoids any races wrt polling state changes and thereby avoids
> + * spurious IPIs.
> + */
> +static bool set_nr_and_not_polling(struct task_struct *p)
> +{
> +	struct thread_info *ti = task_thread_info(p);
> +	return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
> +}
> +#else
> +static bool set_nr_and_not_polling(struct task_struct *p)
> +{
> +	set_tsk_need_resched(p);
> +	return true;
> +}
> +#endif
> +
> +/*
>   * resched_task - mark a task 'to be rescheduled now'.
>   *
>   * On UP this means the setting of the need_resched flag, on SMP it
> @@ -520,17 +553,15 @@ void resched_task(struct task_struct *p)
>  	if (test_tsk_need_resched(p))
>  		return;
>  
> -	set_tsk_need_resched(p);
> -
>  	cpu = task_cpu(p);
> +
>  	if (cpu == smp_processor_id()) {
> +		set_tsk_need_resched(p);
>  		set_preempt_need_resched();
>  		return;
>  	}
>  
> -	/* NEED_RESCHED must be visible before we test polling */
> -	smp_mb();
> -	if (!tsk_is_polling(p))
> +	if (set_nr_and_not_polling(p))
>  		smp_send_reschedule(cpu);
>  }
>  
> 
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 7/8] sched,idle: Delay clearing the polling bit
  2014-04-11 13:42 ` [RFC][PATCH 7/8] sched,idle: Delay clearing the polling bit Peter Zijlstra
@ 2014-04-13 21:51   ` Nicolas Pitre
  0 siblings, 0 replies; 51+ messages in thread
From: Nicolas Pitre @ 2014-04-13 21:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, luto, daniel.lezcano, umgwanakikbuti, linux-kernel

On Fri, 11 Apr 2014, Peter Zijlstra wrote:

> With the generic idle functions assuming !polling we should only clear
> the polling bit at the very last opportunity in order to avoid
> spurious IPIs.
> 
> Ideally we'd flip the default to polling, but that means auditing all
> arch idle functions.
> 
> Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>

Acked-by: Nicolas Pitre <nico@linaro.org>


> ---
>  kernel/sched/idle.c |   17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -78,12 +78,10 @@ static int cpuidle_idle_call(void)
>  
>  	/*
>  	 * Check if the idle task must be rescheduled. If it is the
> -	 * case, exit the function after re-enabling the local irq and
> -	 * set again the polling flag
> +	 * case, exit the function after re-enabling the local irq.
>  	 */
> -	if (current_clr_polling_and_test()) {
> +	if (need_resched()) {
>  		local_irq_enable();
> -		__current_set_polling();
>  		return 0;
>  	}
>  
> @@ -127,7 +125,7 @@ static int cpuidle_idle_call(void)
>  			broadcast = !!(drv->states[next_state].flags &
>  				       CPUIDLE_FLAG_TIMER_STOP);
>  
> -			if (broadcast)
> +			if (broadcast) {
>  				/*
>  				 * Tell the time framework to switch
>  				 * to a broadcast timer because our
> @@ -139,6 +137,7 @@ static int cpuidle_idle_call(void)
>  				ret = clockevents_notify(
>  					CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
>  					&dev->cpu);
> +			}
>  
>  			if (!ret) {
>  				trace_cpu_idle_rcuidle(next_state, dev->cpu);
> @@ -175,8 +174,12 @@ static int cpuidle_idle_call(void)
>  	 * We can't use the cpuidle framework, let's use the default
>  	 * idle routine
>  	 */
> -	if (ret)
> -		arch_cpu_idle();
> +	if (ret) {
> +		if (!current_clr_polling_and_test())
> +			arch_cpu_idle();
> +		else
> +			local_irq_enable();
> +	}
>  
>  	__current_set_polling();
>  
> 
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 8/8] sched,idle: Reflow cpuidle_idle_call()
  2014-04-13 21:36   ` Nicolas Pitre
@ 2014-04-14  8:59     ` Peter Zijlstra
  2014-04-14  9:25       ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-14  8:59 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: mingo, tglx, luto, daniel.lezcano, umgwanakikbuti, linux-kernel

On Sun, Apr 13, 2014 at 05:36:46PM -0400, Nicolas Pitre wrote:
> > @@ -102,90 +102,64 @@ static int cpuidle_idle_call(void)
> >  	 * Check if the cpuidle framework is ready, otherwise fallback
> >  	 * to the default arch specific idle method
> >  	 */
> > -	ret = cpuidle_enabled(drv, dev);
> > +	if (cpuidle_enabled(drv, dev))
> > +		goto use_default;
> 
> Why not use braces here and move the use_default code block inside
> it instead?

I didn't because there's two goto use_default sites, but sure, I can do
that.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 8/8] sched,idle: Reflow cpuidle_idle_call()
  2014-04-14  8:59     ` Peter Zijlstra
@ 2014-04-14  9:25       ` Peter Zijlstra
  2014-04-14 13:55         ` Nicolas Pitre
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-04-14  9:25 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: mingo, tglx, luto, daniel.lezcano, umgwanakikbuti, linux-kernel

On Mon, Apr 14, 2014 at 10:59:11AM +0200, Peter Zijlstra wrote:
> > Why not use braces here and move the use_default code block inside
> > it instead?
> 
> I didn't because there's two goto use_default sites, but sure, I can do
> that.

---
Subject: sched,idle: Reflow cpuidle_idle_call()
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Apr 11 13:55:48 CEST 2014

Apply goto to reduce lines and nesting levels.

Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-xt260n72av3ip7jsaef2ps3h@git.kernel.org
---
 kernel/sched/idle.c |  131 +++++++++++++++++++++++-----------------------------
 1 file changed, 58 insertions(+), 73 deletions(-)

--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -73,7 +73,7 @@ static int cpuidle_idle_call(void)
 {
 	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
 	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
-	int next_state, entered_state, ret;
+	int next_state, entered_state;
 	bool broadcast;
 
 	/*
@@ -102,90 +102,75 @@ static int cpuidle_idle_call(void)
 	 * Check if the cpuidle framework is ready, otherwise fallback
 	 * to the default arch specific idle method
 	 */
-	ret = cpuidle_enabled(drv, dev);
-
-	if (!ret) {
-		/*
-		 * Ask the governor to choose an idle state it thinks
-		 * it is convenient to go to. There is *always* a
-		 * convenient idle state
-		 */
-		next_state = cpuidle_select(drv, dev);
-
+	if (cpuidle_enabled(drv, dev)) {
+use_default:
 		/*
-		 * The idle task must be scheduled, it is pointless to
-		 * go to idle, just update no idle residency and get
-		 * out of this function
+		 * We can't use the cpuidle framework, let's use the default
+		 * idle routine.
 		 */
-		if (current_clr_polling_and_test()) {
-			dev->last_residency = 0;
-			entered_state = next_state;
+		if (current_clr_polling_and_test())
 			local_irq_enable();
-		} else {
-			broadcast = !!(drv->states[next_state].flags &
-				       CPUIDLE_FLAG_TIMER_STOP);
-
-			if (broadcast) {
-				/*
-				 * Tell the time framework to switch
-				 * to a broadcast timer because our
-				 * local timer will be shutdown. If a
-				 * local timer is used from another
-				 * cpu as a broadcast timer, this call
-				 * may fail if it is not available
-				 */
-				ret = clockevents_notify(
-					CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
-					&dev->cpu);
-			}
-
-			if (!ret) {
-				trace_cpu_idle_rcuidle(next_state, dev->cpu);
-
-				/*
-				 * Enter the idle state previously
-				 * returned by the governor
-				 * decision. This function will block
-				 * until an interrupt occurs and will
-				 * take care of re-enabling the local
-				 * interrupts
-				 */
-				entered_state = cpuidle_enter(drv, dev,
-							      next_state);
-
-				trace_cpu_idle_rcuidle(PWR_EVENT_EXIT,
-						       dev->cpu);
-
-				if (broadcast)
-					clockevents_notify(
-						CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
-						&dev->cpu);
-
-				/*
-				 * Give the governor an opportunity to reflect on the
-				 * outcome
-				 */
-				cpuidle_reflect(dev, entered_state);
-			}
-		}
+		else
+			arch_cpu_idle();
+
+		goto exit_idle;
 	}
 
 	/*
-	 * We can't use the cpuidle framework, let's use the default
-	 * idle routine
+	 * Ask the governor to choose an idle state it thinks
+	 * it is convenient to go to. There is *always* a
+	 * convenient idle state
 	 */
-	if (ret) {
-		if (!current_clr_polling_and_test())
-			arch_cpu_idle();
-		else
-			local_irq_enable();
+	next_state = cpuidle_select(drv, dev);
+
+	/*
+	 * The idle task must be scheduled, it is pointless to
+	 * go to idle, just update no idle residency and get
+	 * out of this function
+	 */
+	if (current_clr_polling_and_test()) {
+		dev->last_residency = 0;
+		entered_state = next_state;
+		local_irq_enable();
+		goto exit_idle;
 	}
 
+	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
+
+	/*
+	 * Tell the time framework to switch to a broadcast timer
+	 * because our local timer will be shutdown. If a local timer
+	 * is used from another cpu as a broadcast timer, this call may
+	 * fail if it is not available
+	 */
+	if (broadcast &&
+	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
+		goto use_default;
+
+	trace_cpu_idle_rcuidle(next_state, dev->cpu);
+
+	/*
+	 * Enter the idle state previously returned by the governor decision.
+	 * This function will block until an interrupt occurs and will take
+	 * care of re-enabling the local interrupts
+	 */
+	entered_state = cpuidle_enter(drv, dev, next_state);
+
+	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
+
+	if (broadcast)
+		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
+
+	/*
+	 * Give the governor an opportunity to reflect on the outcome
+	 */
+	cpuidle_reflect(dev, entered_state);
+
+exit_idle:
 	__current_set_polling();
 
 	/*
-	 * It is up to the idle functions to enable back the local
-	 * interrupt
+	 * It is up to the idle functions to reenable local interrupts
 	 */
 	if (WARN_ON_ONCE(irqs_disabled()))
 		local_irq_enable();

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 8/8] sched,idle: Reflow cpuidle_idle_call()
  2014-04-14  9:25       ` Peter Zijlstra
@ 2014-04-14 13:55         ` Nicolas Pitre
  0 siblings, 0 replies; 51+ messages in thread
From: Nicolas Pitre @ 2014-04-14 13:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, tglx, luto, daniel.lezcano, umgwanakikbuti, linux-kernel

On Mon, 14 Apr 2014, Peter Zijlstra wrote:

> On Mon, Apr 14, 2014 at 10:59:11AM +0200, Peter Zijlstra wrote:
> > > Why not use braces here and move the use_default code block inside
> > > it instead?
> > 
> > I didn't because there's two goto use_default sites, but sure, I can do
> > that.

Ah, I initially missed the second site, but still.

> ---
> Subject: sched,idle: Reflow cpuidle_idle_call()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri Apr 11 13:55:48 CEST 2014
> 
> Apply goto to reduce lines and nesting levels.
> 
> Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Link: http://lkml.kernel.org/n/tip-xt260n72av3ip7jsaef2ps3h@git.kernel.org

Acked-by: Nicolas Pitre <nico@linaro.org>

> ---
>  kernel/sched/idle.c |  131 +++++++++++++++++++++++-----------------------------
>  1 file changed, 58 insertions(+), 73 deletions(-)
> 
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -73,7 +73,7 @@ static int cpuidle_idle_call(void)
>  {
>  	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
>  	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
> -	int next_state, entered_state, ret;
> +	int next_state, entered_state;
>  	bool broadcast;
>  
>  	/*
> @@ -102,90 +102,75 @@ static int cpuidle_idle_call(void)
>  	 * Check if the cpuidle framework is ready, otherwise fallback
>  	 * to the default arch specific idle method
>  	 */
> -	ret = cpuidle_enabled(drv, dev);
> -
> -	if (!ret) {
> -		/*
> -		 * Ask the governor to choose an idle state it thinks
> -		 * it is convenient to go to. There is *always* a
> -		 * convenient idle state
> -		 */
> -		next_state = cpuidle_select(drv, dev);
> -
> +	if (cpuidle_enabled(drv, dev)) {
> +use_default:
>  		/*
> -		 * The idle task must be scheduled, it is pointless to
> -		 * go to idle, just update no idle residency and get
> -		 * out of this function
> +		 * We can't use the cpuidle framework, let's use the default
> +		 * idle routine.
>  		 */
> -		if (current_clr_polling_and_test()) {
> -			dev->last_residency = 0;
> -			entered_state = next_state;
> +		if (current_clr_polling_and_test())
>  			local_irq_enable();
> -		} else {
> -			broadcast = !!(drv->states[next_state].flags &
> -				       CPUIDLE_FLAG_TIMER_STOP);
> -
> -			if (broadcast) {
> -				/*
> -				 * Tell the time framework to switch
> -				 * to a broadcast timer because our
> -				 * local timer will be shutdown. If a
> -				 * local timer is used from another
> -				 * cpu as a broadcast timer, this call
> -				 * may fail if it is not available
> -				 */
> -				ret = clockevents_notify(
> -					CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
> -					&dev->cpu);
> -			}
> -
> -			if (!ret) {
> -				trace_cpu_idle_rcuidle(next_state, dev->cpu);
> -
> -				/*
> -				 * Enter the idle state previously
> -				 * returned by the governor
> -				 * decision. This function will block
> -				 * until an interrupt occurs and will
> -				 * take care of re-enabling the local
> -				 * interrupts
> -				 */
> -				entered_state = cpuidle_enter(drv, dev,
> -							      next_state);
> -
> -				trace_cpu_idle_rcuidle(PWR_EVENT_EXIT,
> -						       dev->cpu);
> -
> -				if (broadcast)
> -					clockevents_notify(
> -						CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
> -						&dev->cpu);
> -
> -				/*
> -				 * Give the governor an opportunity to reflect on the
> -				 * outcome
> -				 */
> -				cpuidle_reflect(dev, entered_state);
> -			}
> -		}
> +		else
> +			arch_cpu_idle();
> +
> +		goto exit_idle;
>  	}
>  
>  	/*
> -	 * We can't use the cpuidle framework, let's use the default
> -	 * idle routine
> +	 * Ask the governor to choose an idle state it thinks
> +	 * it is convenient to go to. There is *always* a
> +	 * convenient idle state
>  	 */
> -	if (ret) {
> -		if (!current_clr_polling_and_test())
> -			arch_cpu_idle();
> -		else
> -			local_irq_enable();
> +	next_state = cpuidle_select(drv, dev);
> +
> +	/*
> +	 * The idle task must be scheduled, it is pointless to
> +	 * go to idle, just update no idle residency and get
> +	 * out of this function
> +	 */
> +	if (current_clr_polling_and_test()) {
> +		dev->last_residency = 0;
> +		entered_state = next_state;
> +		local_irq_enable();
> +		goto exit_idle;
>  	}
>  
> +	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
> +
> +	/*
> +	 * Tell the time framework to switch to a broadcast timer
> +	 * because our local timer will be shutdown. If a local timer
> +	 * is used from another cpu as a broadcast timer, this call may
> +	 * fail if it is not available
> +	 */
> +	if (broadcast &&
> +	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
> +		goto use_default;
> +
> +	trace_cpu_idle_rcuidle(next_state, dev->cpu);
> +
> +	/*
> +	 * Enter the idle state previously returned by the governor decision.
> +	 * This function will block until an interrupt occurs and will take
> +	 * care of re-enabling the local interrupts
> +	 */
> +	entered_state = cpuidle_enter(drv, dev, next_state);
> +
> +	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
> +
> +	if (broadcast)
> +		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
> +
> +	/*
> +	 * Give the governor an opportunity to reflect on the outcome
> +	 */
> +	cpuidle_reflect(dev, entered_state);
> +
> +exit_idle:
>  	__current_set_polling();
>  
>  	/*
> -	 * It is up to the idle functions to enable back the local
> -	 * interrupt
> +	 * It is up to the idle functions to reenable local interrupts
>  	 */
>  	if (WARN_ON_ONCE(irqs_disabled()))
>  		local_irq_enable();
> 
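
The goto-based fallback structure of the rewritten cpuidle_idle_call() above can be sketched in miniature. Every name below (framework_ok, broadcast_fails, idle_call, the PATH_* values) is a hypothetical stand-in for the kernel functions in the hunk; only the control-flow shape, including the backward jump into the `use_default` branch, matches the patch.

```c
#include <stdbool.h>

enum path { PATH_DEFAULT, PATH_CPUIDLE };

static bool framework_ok;    /* stands in for cpuidle_enabled() succeeding */
static bool broadcast_fails; /* stands in for the BROADCAST_ENTER notify failing */

static enum path idle_call(void)
{
        enum path taken;

        if (!framework_ok) {
use_default:
                /* can't use the cpuidle framework: take the default idle routine */
                taken = PATH_DEFAULT;
                goto exit_idle;
        }

        /*
         * The governor has picked a state; if switching to the broadcast
         * timer fails, jump back into the default path rather than
         * nesting another error branch.
         */
        if (broadcast_fails)
                goto use_default;

        taken = PATH_CPUIDLE;

exit_idle:
        return taken;
}
```

Flattening the nested `if (!ret) { ... } if (ret) { ... }` chains into this single forward path with one shared fallback label is what buys the 58-insertions/73-deletions diffstat above.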

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-04-11 13:42 ` [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs Peter Zijlstra
  2014-04-13 21:41   ` Nicolas Pitre
@ 2014-05-09 13:37   ` James Hogan
  2014-05-09 14:15     ` Peter Zijlstra
  1 sibling, 1 reply; 51+ messages in thread
From: James Hogan @ 2014-05-09 13:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, Thomas Gleixner, luto, nicolas.pitre, daniel.lezcano,
	umgwanakikbuti, LKML, Catalin Marinas, Will Deacon,
	ARM Kernel List

Hi Peter,

On 11 April 2014 14:42, Peter Zijlstra <peterz@infradead.org> wrote:
> Because mwait_idle_with_hints() gets called from !idle context it must
> call current_clr_polling(). This however means that resched_task() is
> very likely to send an IPI even when we were polling:
>
>   CPU0                                  CPU1
>
>   if (current_set_polling_and_test())
>     goto out;
>
>   __monitor(&ti->flags);
>   if (!need_resched())
>     __mwait(eax, ecx);
>                                         set_tsk_need_resched(p);
>                                         smp_mb();
> out:
>   current_clr_polling();
>                                         if (!tsk_is_polling(p))
>                                           smp_send_reschedule(cpu);
>
>
> So while it is correct (extra IPIs aren't a problem, whereas a missed
> IPI would be) it is a performance problem (for some).
>
> Avoid this issue by using fetch_or() to atomically set NEED_RESCHED
> and test if POLLING_NRFLAG is set.
>
> Since a CPU stuck in mwait is unlikely to modify the flags word,
> contention on the cmpxchg is unlikely and thus we should mostly
> succeed in a single go.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/sched/core.c |   41 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 36 insertions(+), 5 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -505,6 +505,39 @@ static inline void init_hrtick(void)
>  #endif /* CONFIG_SCHED_HRTICK */
>
>  /*
> + * cmpxchg based fetch_or, macro so it works for different integer types
> + */
> +#define fetch_or(ptr, val)                                             \
> +({     typeof(*(ptr)) __old, __val = *(ptr);                           \
> +       for (;;) {                                                      \
> +               __old = cmpxchg((ptr), __val, __val | (val));           \
> +               if (__old == __val)                                     \
> +                       break;                                          \
> +               __val = __old;                                          \
> +       }                                                               \
> +       __old;                                                          \
> +})
> +
> +#ifdef TIF_POLLING_NRFLAG
> +/*
> + * Atomically set TIF_NEED_RESCHED and test for TIF_POLLING_NRFLAG,
> + * this avoids any races wrt polling state changes and thereby avoids
> + * spurious IPIs.
> + */
> +static bool set_nr_and_not_polling(struct task_struct *p)
> +{
> +       struct thread_info *ti = task_thread_info(p);
> +       return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);

This breaks the build on metag, and I suspect arm64 too:

kernel/sched/core.c In function ‘set_nr_and_not_polling’:
kernel/sched/core.c +531 : error: ‘_TIF_POLLING_NRFLAG’ undeclared

since metag/arm64 define TIF_POLLING_NRFLAG but not
_TIF_POLLING_NRFLAG. Could you please fix that prior to your patch to
avoid breaking bisection?

BTW what is it that determines whether an arch needs TIF_POLLING_NRFLAG?

Thanks
James


> +}
> +#else
> +static bool set_nr_and_not_polling(struct task_struct *p)
> +{
> +       set_tsk_need_resched(p);
> +       return true;
> +}
> +#endif
> +
> +/*
>   * resched_task - mark a task 'to be rescheduled now'.
>   *
>   * On UP this means the setting of the need_resched flag, on SMP it
> @@ -520,17 +553,15 @@ void resched_task(struct task_struct *p)
>         if (test_tsk_need_resched(p))
>                 return;
>
> -       set_tsk_need_resched(p);
> -
>         cpu = task_cpu(p);
> +
>         if (cpu == smp_processor_id()) {
> +               set_tsk_need_resched(p);
>                 set_preempt_need_resched();
>                 return;
>         }
>
> -       /* NEED_RESCHED must be visible before we test polling */
> -       smp_mb();
> -       if (!tsk_is_polling(p))
> +       if (set_nr_and_not_polling(p))
>                 smp_send_reschedule(cpu);
>  }
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
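
The fetch_or()/set_nr_and_not_polling() pair in the patch above can be sketched in userspace with C11 atomics standing in for the kernel's cmpxchg(). The flag values are illustrative, not any particular arch's layout; the point is that setting NEED_RESCHED and sampling POLLING happen in one atomic read-modify-write, closing the race in the commit message.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define _TIF_NEED_RESCHED       (1u << 1)
#define _TIF_POLLING_NRFLAG     (1u << 16)

static unsigned int fetch_or(_Atomic unsigned int *ptr, unsigned int val)
{
        unsigned int old = atomic_load(ptr);

        /* on failure, atomic_compare_exchange_weak reloads 'old' for us */
        while (!atomic_compare_exchange_weak(ptr, &old, old | val))
                ;
        return old;
}

/* true means the target was not polling and still needs the IPI */
static bool set_nr_and_not_polling(_Atomic unsigned int *flags)
{
        return !(fetch_or(flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
}
```

A CPU mwait-ing on its flags word has _TIF_POLLING_NRFLAG set, so the fetch_or() both wakes it (the flags write terminates the monitor) and returns a value telling the caller to skip smp_send_reschedule().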


* Re: [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-05-09 13:37   ` James Hogan
@ 2014-05-09 14:15     ` Peter Zijlstra
  2014-05-09 14:40       ` Catalin Marinas
  2014-05-09 14:51       ` [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs James Hogan
  0 siblings, 2 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-05-09 14:15 UTC (permalink / raw)
  To: James Hogan
  Cc: mingo, Thomas Gleixner, luto, nicolas.pitre, daniel.lezcano,
	umgwanakikbuti, LKML, Catalin Marinas, Will Deacon,
	ARM Kernel List

On Fri, May 09, 2014 at 02:37:27PM +0100, James Hogan wrote:
> Hi Peter,
> 
> On 11 April 2014 14:42, Peter Zijlstra <peterz@infradead.org> wrote:
> > +       return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
> 
> This breaks the build on metag, and I suspect arm64 too:

Yep, I just got a patch for arm64.

> kernel/sched/core.c In function ‘set_nr_and_not_polling’:
> kernel/sched/core.c +531 : error: ‘_TIF_POLLING_NRFLAG’ undeclared
> 
> since metag/arm64 define TIF_POLLING_NRFLAG but not
> _TIF_POLLING_NRFLAG. Could you please fix that prior to your patch to
> avoid breaking bisection?

Ingo, is there any git magic to make that happen? Can we have a tree
with 2 patches (one for ARM64 and one for metag) before the sched/core
tree?

> BTW what is it that determines whether an arch needs TIF_POLLING_NRFLAG?

Any SMP arch that has a polling idle function of any kind (including the
default cpu_idle_poll()).

That said, even if that's true, not having TIF_POLLING_NRFLAG isn't
fatal, just sub-optimal in that we'll send an unconditional IPI to wake
the CPU even though it's polling TIF_NEED_RESCHED and doesn't need
anything other than that write to wake up.

Most archs have (x86) hlt or (arm) wfi like idle instructions, and if
that is your only possible idle function, you'll require the interrupt
to wake up and there's really no point to having the POLLING bit.

Lastly, having the POLLING bit and not needing it is similarly non-fatal.


* Re: [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-05-09 14:15     ` Peter Zijlstra
@ 2014-05-09 14:40       ` Catalin Marinas
  2014-05-09 14:50         ` Peter Zijlstra
  2014-05-09 14:51       ` [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs James Hogan
  1 sibling, 1 reply; 51+ messages in thread
From: Catalin Marinas @ 2014-05-09 14:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: James Hogan, mingo, Thomas Gleixner, luto, nicolas.pitre,
	daniel.lezcano, umgwanakikbuti, LKML, Will Deacon,
	ARM Kernel List

Hi Peter,

On Fri, May 09, 2014 at 03:15:20PM +0100, Peter Zijlstra wrote:
> On Fri, May 09, 2014 at 02:37:27PM +0100, James Hogan wrote:
> > On 11 April 2014 14:42, Peter Zijlstra <peterz@infradead.org> wrote:
> > > +       return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
> > 
> > This breaks the build on metag, and I suspect arm64 too:
> 
> Yep, I just got a patch for arm64.

[...]

> Any SMP arch that has a polling idle function of any kind (including the
> default cpu_idle_poll()).
> 
> That said, even if that's true, not having TIF_POLLING_NRFLAG isn't
> fatal, just sub-optimal in that we'll send an unconditional IPI to wake
> the CPU even though it's polling TIF_NEED_RESCHED and doesn't need
> anything other than that write to wake up.
> 
> Most archs have (x86) hlt or (arm) wfi like idle instructions, and if
> that is your only possible idle function, you'll require the interrupt
> to wake up and there's really no point to having the POLLING bit.

I wonder why we still need TIF_POLLING_NRFLAG for arm64. It was on arm
until commit 16a8016372c42c7628eb (sanitize tsk_is_polling()). On arm64
we use wfi for idle or a firmware call but in both cases the assumption
is that we need an interrupt for waking up.

So I think we should remove this macro for arm64.

-- 
Catalin


* Re: [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-05-09 14:40       ` Catalin Marinas
@ 2014-05-09 14:50         ` Peter Zijlstra
  2014-05-09 14:57           ` Catalin Marinas
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-05-09 14:50 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: James Hogan, mingo, Thomas Gleixner, luto, nicolas.pitre,
	daniel.lezcano, umgwanakikbuti, LKML, Will Deacon,
	ARM Kernel List

On Fri, May 09, 2014 at 03:40:34PM +0100, Catalin Marinas wrote:

> I wonder why we still need TIF_POLLING_NRFLAG for arm64. It was on arm
> until commit 16a8016372c42c7628eb (sanitize tsk_is_polling()). On arm64
> we use wfi for idle or a firmware call but in both cases the assumption
> is that we need an interrupt for waking up.
> 
> So I think we should remove this macro for arm64.

Does ARM64 support idle=poll? If so, you could keep it for that,
otherwise it does indeed appear to be pointless.

As to 32bit ARM, are there SMP chips which do not have WFI?


* Re: [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-05-09 14:15     ` Peter Zijlstra
  2014-05-09 14:40       ` Catalin Marinas
@ 2014-05-09 14:51       ` James Hogan
  2014-05-15  9:17         ` James Hogan
                           ` (2 more replies)
  1 sibling, 3 replies; 51+ messages in thread
From: James Hogan @ 2014-05-09 14:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, Thomas Gleixner, luto, nicolas.pitre, daniel.lezcano,
	umgwanakikbuti, LKML, Catalin Marinas, Will Deacon,
	ARM Kernel List, linux-metag


On 09/05/14 15:15, Peter Zijlstra wrote:
> On Fri, May 09, 2014 at 02:37:27PM +0100, James Hogan wrote:
>> Hi Peter,
>>
>> On 11 April 2014 14:42, Peter Zijlstra <peterz@infradead.org> wrote:
>>> +       return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
>>
>> This breaks the build on metag, and I suspect arm64 too:
> 
> Yep, I just got a patch for arm64.
> 
>> kernel/sched/core.c In function ‘set_nr_and_not_polling’:
>> kernel/sched/core.c +531 : error: ‘_TIF_POLLING_NRFLAG’ undeclared
>>
>> since metag/arm64 define TIF_POLLING_NRFLAG but not
>> _TIF_POLLING_NRFLAG. Could you please fix that prior to your patch to
>> avoid breaking bisection?
> 
> Ingo, is there any git magic to make that happen? Can we have a tree
> with 2 patches (one for ARM64 and one for metag) before the sched/core
> tree?
> 
>> BTW what is it that determines whether an arch needs TIF_POLLING_NRFLAG?
> 
> Any SMP arch that has a polling idle function of any kind (including the
> default cpu_idle_poll()).
> 
> That said, even if that's true, not having TIF_POLLING_NRFLAG isn't
> fatal, just sub-optimal in that we'll send an unconditional IPI to wake
> the CPU even though it's polling TIF_NEED_RESCHED and doesn't need
> anything other than that write to wake up.
> 
> Most archs have (x86) hlt or (arm) wfi like idle instructions, and if
> that is your only possible idle function, you'll require the interrupt
> to wake up and there's really no point to having the POLLING bit.
> 
> Lastly, having the POLLING bit and not needing it is similarly non-fatal.
> 

Thanks. I think the flag can go for Metag then. I suggest the following
patch.

Cheers
James

From 15dd3f9cc18bdb1c8c4d2c395778421022250aa8 Mon Sep 17 00:00:00 2001
From: James Hogan <james.hogan@imgtec.com>
Date: Fri, 9 May 2014 15:36:21 +0100
Subject: [PATCH] metag: Remove TIF_POLLING_NRFLAG

The Meta idle function jumps into the interrupt handler which
efficiently blocks waiting for the next interrupt when it reads the
interrupt status register (TXSTATI). No other (polling) idle functions
can be used, therefore TIF_POLLING_NRFLAG is unnecessary, so let's remove
it.

Peter Zijlstra said:
> Most archs have (x86) hlt or (arm) wfi like idle instructions, and if
> that is your only possible idle function, you'll require the interrupt
> to wake up and there's really no point to having the POLLING bit.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/metag/include/asm/thread_info.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/metag/include/asm/thread_info.h b/arch/metag/include/asm/thread_info.h
index b19e9c588a16..47711336119e 100644
--- a/arch/metag/include/asm/thread_info.h
+++ b/arch/metag/include/asm/thread_info.h
@@ -117,10 +117,8 @@ static inline int kstack_end(void *addr)
 #define TIF_SECCOMP		5	/* secure computing */
 #define TIF_RESTORE_SIGMASK	6	/* restore signal mask in do_signal() */
 #define TIF_NOTIFY_RESUME	7	/* callback before returning to user */
-#define TIF_POLLING_NRFLAG      8	/* true if poll_idle() is polling
-					   TIF_NEED_RESCHED */
-#define TIF_MEMDIE		9	/* is terminating due to OOM killer */
-#define TIF_SYSCALL_TRACEPOINT  10	/* syscall tracepoint instrumentation */
+#define TIF_MEMDIE		8	/* is terminating due to OOM killer */
+#define TIF_SYSCALL_TRACEPOINT	9	/* syscall tracepoint instrumentation */
 
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
-- 
1.9.2




* Re: [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-05-09 14:50         ` Peter Zijlstra
@ 2014-05-09 14:57           ` Catalin Marinas
  2014-05-09 17:02             ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Catalin Marinas @ 2014-05-09 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: James Hogan, mingo, Thomas Gleixner, luto, nicolas.pitre,
	daniel.lezcano, umgwanakikbuti, LKML, Will Deacon,
	ARM Kernel List

On Fri, May 09, 2014 at 03:50:02PM +0100, Peter Zijlstra wrote:
> On Fri, May 09, 2014 at 03:40:34PM +0100, Catalin Marinas wrote:
> 
> > I wonder why we still need TIF_POLLING_NRFLAG for arm64. It was on arm
> > until commit 16a8016372c42c7628eb (sanitize tsk_is_polling()). On arm64
> > we use wfi for idle or a firmware call but in both cases the assumption
> > is that we need an interrupt for waking up.
> > 
> > So I think we should remove this macro for arm64.
> 
> Does ARM64 support idle=poll? If so, you could keep it for that,
> otherwise it does indeed appear to be pointless.

We don't support idle=poll either.

> As to 32bit ARM, are there SMP chips which do not have WFI?

No. WFI is even used for the secondary booting protocol (we need to send
an IPI to get them going).

-- 
Catalin


* Re: [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-05-09 14:57           ` Catalin Marinas
@ 2014-05-09 17:02             ` Peter Zijlstra
  2014-05-09 17:06               ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-05-09 17:02 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: James Hogan, mingo, Thomas Gleixner, luto, nicolas.pitre,
	daniel.lezcano, umgwanakikbuti, LKML, Will Deacon,
	ARM Kernel List


On Fri, May 09, 2014 at 03:57:45PM +0100, Catalin Marinas wrote:
> On Fri, May 09, 2014 at 03:50:02PM +0100, Peter Zijlstra wrote:
> > On Fri, May 09, 2014 at 03:40:34PM +0100, Catalin Marinas wrote:
> > 
> > > I wonder why we still need TIF_POLLING_NRFLAG for arm64. It was on arm
> > > until commit 16a8016372c42c7628eb (sanitize tsk_is_polling()). On arm64
> > > we use wfi for idle or a firmware call but in both cases the assumption
> > > is that we need an interrupt for waking up.
> > > 
> > > So I think we should remove this macro for arm64.
> > 
> > Does ARM64 support idle=poll? If so, you could keep it for that,
> > otherwise it does indeed appear to be pointless.
> 
> We don't support idle=poll either.
> 
> > As to 32bit ARM, are there SMP chips which do not have WFI?
> 
> No. WFI is even used for the secondary booting protocol (we need to send
> an IPI to get them going).

OK, so I'll queue a patch removing TIF_POLLING_NRFLAG for arm64.

Thanks!



* Re: [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-05-09 17:02             ` Peter Zijlstra
@ 2014-05-09 17:06               ` Peter Zijlstra
  2014-05-09 17:09                 ` Catalin Marinas
                                   ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-05-09 17:06 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: James Hogan, mingo, Thomas Gleixner, luto, nicolas.pitre,
	daniel.lezcano, umgwanakikbuti, LKML, Will Deacon,
	ARM Kernel List


On Fri, May 09, 2014 at 07:02:34PM +0200, Peter Zijlstra wrote:
> OK, so I'll queue a patch removing TIF_POLLING_NRFLAG for arm64.


---
Subject: arm64: Remove TIF_POLLING_NRFLAG
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri May  9 19:04:00 CEST 2014

The only idle method for arm64 is WFI and it therefore
unconditionally requires the reschedule interrupt when idle.

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/arm64/include/asm/thread_info.h |    2 --
 1 file changed, 2 deletions(-)

--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -95,13 +95,11 @@ static inline struct thread_info *curren
  *  TIF_NEED_RESCHED	- rescheduling necessary
  *  TIF_NOTIFY_RESUME	- callback before returning to user
  *  TIF_USEDFPU		- FPU was used by this task this quantum (SMP)
- *  TIF_POLLING_NRFLAG	- true if poll_idle() is polling TIF_NEED_RESCHED
  */
 #define TIF_SIGPENDING		0
 #define TIF_NEED_RESCHED	1
 #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
 #define TIF_SYSCALL_TRACE	8
-#define TIF_POLLING_NRFLAG	16
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
 #define TIF_FREEZE		19
 #define TIF_RESTORE_SIGMASK	20



* Re: [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-05-09 17:06               ` Peter Zijlstra
@ 2014-05-09 17:09                 ` Catalin Marinas
  2014-05-09 17:20                   ` Peter Zijlstra
  2014-05-19 12:54                 ` [tip:sched/arch] arm64: Remove TIF_POLLING_NRFLAG tip-bot for Peter Zijlstra
  2014-05-22 12:26                 ` [tip:sched/core] " tip-bot for Peter Zijlstra
  2 siblings, 1 reply; 51+ messages in thread
From: Catalin Marinas @ 2014-05-09 17:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: James Hogan, mingo, Thomas Gleixner, luto, nicolas.pitre,
	daniel.lezcano, umgwanakikbuti, LKML, Will Deacon,
	ARM Kernel List

On Fri, May 09, 2014 at 06:06:49PM +0100, Peter Zijlstra wrote:
> Subject: arm64: Remove TIF_POLLING_NRFLAG
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri May  9 19:04:00 CEST 2014
> 
> The only idle method for arm64 is WFI and it therefore
> unconditionally requires the reschedule interrupt when idle.
> 
> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>

There's a tag with my name already but just in case you need another:

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks.


* Re: [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-05-09 17:09                 ` Catalin Marinas
@ 2014-05-09 17:20                   ` Peter Zijlstra
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-05-09 17:20 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: James Hogan, mingo, Thomas Gleixner, luto, nicolas.pitre,
	daniel.lezcano, umgwanakikbuti, LKML, Will Deacon,
	ARM Kernel List


On Fri, May 09, 2014 at 06:09:46PM +0100, Catalin Marinas wrote:
> On Fri, May 09, 2014 at 06:06:49PM +0100, Peter Zijlstra wrote:
> > Subject: arm64: Remove TIF_POLLING_NRFLAG
> > From: Peter Zijlstra <peterz@infradead.org>
> > Date: Fri May  9 19:04:00 CEST 2014
> > 
> > The only idle method for arm64 is WFI and it therefore
> > unconditionally requires the reschedule interrupt when idle.
> > 
> > Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> > Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> 
> There's a tag with my name already but just in case you need another:
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

The more the merrier :-)



* Re: [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs
  2014-05-09 14:51       ` [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs James Hogan
@ 2014-05-15  9:17         ` James Hogan
  2014-05-19 12:54         ` [tip:sched/arch] metag: Remove TIF_POLLING_NRFLAG tip-bot for James Hogan
  2014-05-22 12:26         ` [tip:sched/core] " tip-bot for James Hogan
  2 siblings, 0 replies; 51+ messages in thread
From: James Hogan @ 2014-05-15  9:17 UTC (permalink / raw)
  To: Peter Zijlstra, mingo, Thomas Gleixner
  Cc: luto, nicolas.pitre, daniel.lezcano, umgwanakikbuti, LKML,
	Catalin Marinas, Will Deacon, ARM Kernel List, linux-metag

Hi Peter,

On 09/05/14 15:51, James Hogan wrote:
> On 09/05/14 15:15, Peter Zijlstra wrote:
>> Most archs have (x86) hlt or (arm) wfi like idle instructions, and if
>> that is your only possible idle function, you'll require the interrupt
>> to wake up and there's really no point to having the POLLING bit.
>>
>> Lastly, having the POLLING bit and not needing it is similarly non-fatal.
>>
> 
> Thanks. I think the flag can go for Metag then. I suggest the following
> patch.

This is still bust in linux-next. What's the status of this patch?

Thanks
James

> 
> From 15dd3f9cc18bdb1c8c4d2c395778421022250aa8 Mon Sep 17 00:00:00 2001
> From: James Hogan <james.hogan@imgtec.com>
> Date: Fri, 9 May 2014 15:36:21 +0100
> Subject: [PATCH] metag: Remove TIF_POLLING_NRFLAG
> 
> The Meta idle function jumps into the interrupt handler which
> efficiently blocks waiting for the next interrupt when it reads the
> interrupt status register (TXSTATI). No other (polling) idle functions
> can be used, therefore TIF_POLLING_NRFLAG is unnecessary, so let's remove
> it.
> 
> Peter Zijlstra said:
>> Most archs have (x86) hlt or (arm) wfi like idle instructions, and if
>> that is your only possible idle function, you'll require the interrupt
>> to wake up and there's really no point to having the POLLING bit.
> 
> Signed-off-by: James Hogan <james.hogan@imgtec.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/metag/include/asm/thread_info.h | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/metag/include/asm/thread_info.h b/arch/metag/include/asm/thread_info.h
> index b19e9c588a16..47711336119e 100644
> --- a/arch/metag/include/asm/thread_info.h
> +++ b/arch/metag/include/asm/thread_info.h
> @@ -117,10 +117,8 @@ static inline int kstack_end(void *addr)
>  #define TIF_SECCOMP		5	/* secure computing */
>  #define TIF_RESTORE_SIGMASK	6	/* restore signal mask in do_signal() */
>  #define TIF_NOTIFY_RESUME	7	/* callback before returning to user */
> -#define TIF_POLLING_NRFLAG      8	/* true if poll_idle() is polling
> -					   TIF_NEED_RESCHED */
> -#define TIF_MEMDIE		9	/* is terminating due to OOM killer */
> -#define TIF_SYSCALL_TRACEPOINT  10	/* syscall tracepoint instrumentation */
> +#define TIF_MEMDIE		8	/* is terminating due to OOM killer */
> +#define TIF_SYSCALL_TRACEPOINT	9	/* syscall tracepoint instrumentation */
>  
>  
>  #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
> 


* [tip:sched/arch] metag: Remove TIF_POLLING_NRFLAG
  2014-05-09 14:51       ` [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs James Hogan
  2014-05-15  9:17         ` James Hogan
@ 2014-05-19 12:54         ` tip-bot for James Hogan
  2014-05-22 12:26         ` [tip:sched/core] " tip-bot for James Hogan
  2 siblings, 0 replies; 51+ messages in thread
From: tip-bot for James Hogan @ 2014-05-19 12:54 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, james.hogan, peterz, tglx

Commit-ID:  a9d27e0c4b0fd7f43a28625f1a703f15596647ad
Gitweb:     http://git.kernel.org/tip/a9d27e0c4b0fd7f43a28625f1a703f15596647ad
Author:     James Hogan <james.hogan@imgtec.com>
AuthorDate: Fri, 9 May 2014 15:36:21 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 19 May 2014 21:50:30 +0900

metag: Remove TIF_POLLING_NRFLAG

The Meta idle function jumps into the interrupt handler which
efficiently blocks waiting for the next interrupt when it reads the
interrupt status register (TXSTATI). No other (polling) idle functions
can be used, therefore TIF_POLLING_NRFLAG is unnecessary, so let's remove
it.

Peter Zijlstra said:
> Most archs have (x86) hlt or (arm) wfi like idle instructions, and if
> that is your only possible idle function, you'll require the interrupt
> to wake up and there's really no point to having the POLLING bit.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/536CEB7E.9080007@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/metag/include/asm/thread_info.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/metag/include/asm/thread_info.h b/arch/metag/include/asm/thread_info.h
index b19e9c5..4771133 100644
--- a/arch/metag/include/asm/thread_info.h
+++ b/arch/metag/include/asm/thread_info.h
@@ -117,10 +117,8 @@ static inline int kstack_end(void *addr)
 #define TIF_SECCOMP		5	/* secure computing */
 #define TIF_RESTORE_SIGMASK	6	/* restore signal mask in do_signal() */
 #define TIF_NOTIFY_RESUME	7	/* callback before returning to user */
-#define TIF_POLLING_NRFLAG      8	/* true if poll_idle() is polling
-					   TIF_NEED_RESCHED */
-#define TIF_MEMDIE		9	/* is terminating due to OOM killer */
-#define TIF_SYSCALL_TRACEPOINT  10	/* syscall tracepoint instrumentation */
+#define TIF_MEMDIE		8	/* is terminating due to OOM killer */
+#define TIF_SYSCALL_TRACEPOINT	9	/* syscall tracepoint instrumentation */
 
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)


* [tip:sched/arch] arm64: Remove TIF_POLLING_NRFLAG
  2014-05-09 17:06               ` Peter Zijlstra
  2014-05-09 17:09                 ` Catalin Marinas
@ 2014-05-19 12:54                 ` tip-bot for Peter Zijlstra
  2014-05-22 12:26                 ` [tip:sched/core] " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 51+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-05-19 12:54 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, catalin.marinas, peterz, tglx

Commit-ID:  7da7c131fcbdc4b023048bd39fac019ae195ccc5
Gitweb:     http://git.kernel.org/tip/7da7c131fcbdc4b023048bd39fac019ae195ccc5
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 9 May 2014 19:04:00 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 19 May 2014 21:50:30 +0900

arm64: Remove TIF_POLLING_NRFLAG

The only idle method for arm64 is WFI and it therefore
unconditionally requires the reschedule interrupt when idle.

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140509170649.GG13658@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/arm64/include/asm/thread_info.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 720e70b..7b8e3a2 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -95,13 +95,11 @@ static inline struct thread_info *current_thread_info(void)
  *  TIF_NEED_RESCHED	- rescheduling necessary
  *  TIF_NOTIFY_RESUME	- callback before returning to user
  *  TIF_USEDFPU		- FPU was used by this task this quantum (SMP)
- *  TIF_POLLING_NRFLAG	- true if poll_idle() is polling TIF_NEED_RESCHED
  */
 #define TIF_SIGPENDING		0
 #define TIF_NEED_RESCHED	1
 #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
 #define TIF_SYSCALL_TRACE	8
-#define TIF_POLLING_NRFLAG	16
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
 #define TIF_FREEZE		19
 #define TIF_RESTORE_SIGMASK	20


* [tip:sched/core] metag: Remove TIF_POLLING_NRFLAG
  2014-05-09 14:51       ` [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs James Hogan
  2014-05-15  9:17         ` James Hogan
  2014-05-19 12:54         ` [tip:sched/arch] metag: Remove TIF_POLLING_NRFLAG tip-bot for James Hogan
@ 2014-05-22 12:26         ` tip-bot for James Hogan
  2 siblings, 0 replies; 51+ messages in thread
From: tip-bot for James Hogan @ 2014-05-22 12:26 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, james.hogan, peterz, tglx

Commit-ID:  31e6cdefd377ae90bea5c6a4e1dc519793577a24
Gitweb:     http://git.kernel.org/tip/31e6cdefd377ae90bea5c6a4e1dc519793577a24
Author:     James Hogan <james.hogan@imgtec.com>
AuthorDate: Fri, 9 May 2014 15:36:21 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 May 2014 10:22:10 +0200

metag: Remove TIF_POLLING_NRFLAG

The Meta idle function jumps into the interrupt handler which
efficiently blocks waiting for the next interrupt when it reads the
interrupt status register (TXSTATI). No other (polling) idle functions
can be used, therefore TIF_POLLING_NRFLAG is unnecessary, so let's remove
it.

Peter Zijlstra said:
> Most archs have (x86) hlt or (arm) wfi like idle instructions, and if
> that is your only possible idle function, you'll require the interrupt
> to wake up and there's really no point to having the POLLING bit.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/536CEB7E.9080007@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/metag/include/asm/thread_info.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/metag/include/asm/thread_info.h b/arch/metag/include/asm/thread_info.h
index b19e9c5..4771133 100644
--- a/arch/metag/include/asm/thread_info.h
+++ b/arch/metag/include/asm/thread_info.h
@@ -117,10 +117,8 @@ static inline int kstack_end(void *addr)
 #define TIF_SECCOMP		5	/* secure computing */
 #define TIF_RESTORE_SIGMASK	6	/* restore signal mask in do_signal() */
 #define TIF_NOTIFY_RESUME	7	/* callback before returning to user */
-#define TIF_POLLING_NRFLAG      8	/* true if poll_idle() is polling
-					   TIF_NEED_RESCHED */
-#define TIF_MEMDIE		9	/* is terminating due to OOM killer */
-#define TIF_SYSCALL_TRACEPOINT  10	/* syscall tracepoint instrumentation */
+#define TIF_MEMDIE		8	/* is terminating due to OOM killer */
+#define TIF_SYSCALL_TRACEPOINT	9	/* syscall tracepoint instrumentation */
 
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)


* [tip:sched/core] arm64: Remove TIF_POLLING_NRFLAG
  2014-05-09 17:06               ` Peter Zijlstra
  2014-05-09 17:09                 ` Catalin Marinas
  2014-05-19 12:54                 ` [tip:sched/arch] arm64: Remove TIF_POLLING_NRFLAG tip-bot for Peter Zijlstra
@ 2014-05-22 12:26                 ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 51+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-05-22 12:26 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, catalin.marinas, peterz, tglx

Commit-ID:  842514849a616e9b61acad65771c7afe01e651f9
Gitweb:     http://git.kernel.org/tip/842514849a616e9b61acad65771c7afe01e651f9
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 9 May 2014 19:04:00 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 May 2014 10:22:58 +0200

arm64: Remove TIF_POLLING_NRFLAG

The only idle method for arm64 is WFI and it therefore
unconditionally requires the reschedule interrupt when idle.

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: http://lkml.kernel.org/r/20140509170649.GG13658@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/arm64/include/asm/thread_info.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 720e70b..7b8e3a2 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -95,13 +95,11 @@ static inline struct thread_info *current_thread_info(void)
  *  TIF_NEED_RESCHED	- rescheduling necessary
  *  TIF_NOTIFY_RESUME	- callback before returning to user
  *  TIF_USEDFPU		- FPU was used by this task this quantum (SMP)
- *  TIF_POLLING_NRFLAG	- true if poll_idle() is polling TIF_NEED_RESCHED
  */
 #define TIF_SIGPENDING		0
 #define TIF_NEED_RESCHED	1
 #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
 #define TIF_SYSCALL_TRACE	8
-#define TIF_POLLING_NRFLAG	16
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
 #define TIF_FREEZE		19
 #define TIF_RESTORE_SIGMASK	20


* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-04-11 15:00 ` [RFC][PATCH 0/8] sched,idle: need resched polling rework Andy Lutomirski
  2014-04-11 15:14   ` Peter Zijlstra
@ 2014-05-22 12:58   ` Peter Zijlstra
  2014-05-22 13:09     ` Peter Zijlstra
  1 sibling, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-05-22 12:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, daniel.lezcano,
	Mike Galbraith, linux-kernel


On Fri, Apr 11, 2014 at 08:00:23AM -0700, Andy Lutomirski wrote:
> 
> That being said, I think that this addresses once one of the two major
> issues.  While the race you're fixing is more interesting, I think its
> impact is dwarfed by the fact that ttwu_queue_remote completely
> ignores polling.  (NB: I haven't actually tested this patch set, but I
> did try to instrument this stuff awhile ago.)
> 
> To fix this, presumably the wake-from-idle path needs a
> sched_ttwu_pending call, and ttwu_queue_remote could use resched_task.
>  sched_ttwu_pending could benefit from a straightforward optimization:
> it doesn't need rq->lock if llist is empty.
> 
> If you're not planning on trying to fix that, I can try to write up a
> patch in the next day or two.
> 
> Even with all of this fixed, what happens when ttwu_queue_remote is
> called with a task that has lower priority than whatever is currently
> running on the targeted cpu?  I think the result is an IPI that serves
> very little purpose other than avoiding taking a spinlock in the
> waking thread.  This may be a bad tradeoff.  I doubt that this matters
> for my particular workload, though.
> 

Ok, now that that all settled, how about something like this on top?

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4ea7b3f1a087..a5da85fb3570 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -546,12 +546,38 @@ static bool set_nr_and_not_polling(struct task_struct *p)
 	struct thread_info *ti = task_thread_info(p);
 	return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
 }
+
+/*
+ * Atomically set TIF_NEED_RESCHED if TIF_POLLING_NRFLAG is set.
+ */
+static bool set_nr_if_polling(struct task_struct *p)
+{
+	struct thread_info *ti = task_thread_info(p);
+	typeof(ti->flags) old, val = ti->flags;
+
+	for (;;) {
+		if (!(val & _TIF_POLLING_NRFLAG))
+			return false;
+		if (val & _TIF_NEED_RESCHED)
+			return true;
+		old = cmpxchg(&ti->flags, val, val | _TIF_NEED_RESCHED);
+		if (old == val)
+			return true;
+		val = old;
+	}
+}
+
 #else
 static bool set_nr_and_not_polling(struct task_struct *p)
 {
 	set_tsk_need_resched(p);
 	return true;
 }
+
+static bool set_nr_if_polling(struct task_struct *p)
+{
+	return false;
+}
 #endif
 
 /*
@@ -652,16 +678,7 @@ static void wake_up_idle_cpu(int cpu)
 	if (rq->curr != rq->idle)
 		return;
 
-	/*
-	 * We can set TIF_RESCHED on the idle task of the other CPU
-	 * lockless. The worst case is that the other CPU runs the
-	 * idle task through an additional NOOP schedule()
-	 */
-	set_tsk_need_resched(rq->idle);
-
-	/* NEED_RESCHED must be visible before we test polling */
-	smp_mb();
-	if (!tsk_is_polling(rq->idle))
+	if (set_nr_and_not_polling(rq->idle))
 		smp_send_reschedule(cpu);
 }
 
@@ -1521,12 +1538,15 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 }
 
 #ifdef CONFIG_SMP
-static void sched_ttwu_pending(void)
+void sched_ttwu_pending(void)
 {
 	struct rq *rq = this_rq();
 	struct llist_node *llist = llist_del_all(&rq->wake_list);
 	struct task_struct *p;
 
+	if (!llist)
+		return;
+
 	raw_spin_lock(&rq->lock);
 
 	while (llist) {
@@ -1581,8 +1601,10 @@ void scheduler_ipi(void)
 
 static void ttwu_queue_remote(struct task_struct *p, int cpu)
 {
-	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
-		smp_send_reschedule(cpu);
+	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) {
+		if (!set_nr_if_polling(p))
+			smp_send_reschedule(cpu);
+	}
 }
 
 bool cpus_share_cache(int this_cpu, int that_cpu)
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 34083c9ac976..4286f2e59136 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -12,6 +12,8 @@
 
 #include <trace/events/power.h>
 
+#include "sched.h"
+
 static int __read_mostly cpu_idle_force_poll;
 
 void cpu_idle_poll_ctrl(bool enable)
@@ -223,6 +225,7 @@ static void cpu_idle_loop(void)
 		 */
 		preempt_set_need_resched();
 		tick_nohz_idle_exit();
+		sched_ttwu_pending();
 		schedule_preempt_disabled();
 	}
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b2cbe81308af..2b35ec6df6b3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -670,6 +670,8 @@ extern int migrate_swap(struct task_struct *, struct task_struct *);
 
 #ifdef CONFIG_SMP
 
+extern void sched_ttwu_pending(void);
+
 #define rcu_dereference_check_sched_domain(p) \
 	rcu_dereference_check((p), \
 			      lockdep_is_held(&sched_domains_mutex))
@@ -787,6 +789,10 @@ static inline unsigned int group_first_cpu(struct sched_group *group)
 
 extern int group_balance_cpu(struct sched_group *sg);
 
+#else
+
+static inline void sched_ttwu_pending(void) { }
+
 #endif /* CONFIG_SMP */
 
 #include "stats.h"



* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-05-22 12:58   ` Peter Zijlstra
@ 2014-05-22 13:09     ` Peter Zijlstra
  2014-05-29  0:01       ` Andy Lutomirski
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-05-22 13:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, daniel.lezcano,
	Mike Galbraith, linux-kernel


On Thu, May 22, 2014 at 02:58:18PM +0200, Peter Zijlstra wrote:
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4ea7b3f1a087..a5da85fb3570 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -546,12 +546,38 @@ static bool set_nr_and_not_polling(struct task_struct *p)
>  	struct thread_info *ti = task_thread_info(p);
>  	return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
>  }
> +
> +/*
> + * Atomically set TIF_NEED_RESCHED if TIF_POLLING_NRFLAG is set.
> + */
> +static bool set_nr_if_polling(struct task_struct *p)
> +{
> +	struct thread_info *ti = task_thread_info(p);
> +	typeof(ti->flags) old, val = ti->flags;
> +
> +	for (;;) {
> +		if (!(val & _TIF_POLLING_NRFLAG))
> +			return false;
> +		if (val & _TIF_NEED_RESCHED)
> +			return true;

Hmm, I think this is racy, false would be safer. If it's already set we
might already be past the sched_ttwu_pending() invocation, while if it's
clear and we're the one to set it, we're guaranteed not.

> +		old = cmpxchg(&ti->flags, val, val | _TIF_NEED_RESCHED);
> +		if (old == val)
> +			return true;
> +		val = old;
> +	}
> +}



* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-05-22 13:09     ` Peter Zijlstra
@ 2014-05-29  0:01       ` Andy Lutomirski
  2014-05-29  6:48         ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Andy Lutomirski @ 2014-05-29  0:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel

On Thu, May 22, 2014 at 6:09 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, May 22, 2014 at 02:58:18PM +0200, Peter Zijlstra wrote:
>> ---
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 4ea7b3f1a087..a5da85fb3570 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -546,12 +546,38 @@ static bool set_nr_and_not_polling(struct task_struct *p)
>>       struct thread_info *ti = task_thread_info(p);
>>       return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
>>  }
>> +
>> +/*
>> + * Atomically set TIF_NEED_RESCHED if TIF_POLLING_NRFLAG is set.
>> + */
>> +static bool set_nr_if_polling(struct task_struct *p)
>> +{
>> +     struct thread_info *ti = task_thread_info(p);
>> +     typeof(ti->flags) old, val = ti->flags;
>> +
>> +     for (;;) {
>> +             if (!(val & _TIF_POLLING_NRFLAG))
>> +                     return false;
>> +             if (val & _TIF_NEED_RESCHED)
>> +                     return true;
>
> Hmm, I think this is racy, false would be safer. If its already set we
> might already be past the sched_ttwu_pending() invocation, while if its
> clear and we're the one to set it, we're guaranteed not.
>
>> +             old = cmpxchg(&ti->flags, val, val | _TIF_NEED_RESCHED);
>> +             if (old == val)
>> +                     return true;
>> +             val = old;
>> +     }
>> +}

Do you have an updated patch?  After fixing the MIME flow damage
(sigh), it doesn't apply to sched/core, which is my best guess for
what it's based on.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-05-29  0:01       ` Andy Lutomirski
@ 2014-05-29  6:48         ` Peter Zijlstra
  2014-06-03  6:40           ` Andy Lutomirski
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-05-29  6:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel

On Wed, May 28, 2014 at 05:01:41PM -0700, Andy Lutomirski wrote:
> On Thu, May 22, 2014 at 6:09 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, May 22, 2014 at 02:58:18PM +0200, Peter Zijlstra wrote:
> >> ---
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index 4ea7b3f1a087..a5da85fb3570 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -546,12 +546,38 @@ static bool set_nr_and_not_polling(struct task_struct *p)
> >>       struct thread_info *ti = task_thread_info(p);
> >>       return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
> >>  }
> >> +
> >> +/*
> >> + * Atomically set TIF_NEED_RESCHED if TIF_POLLING_NRFLAG is set.
> >> + */
> >> +static bool set_nr_if_polling(struct task_struct *p)
> >> +{
> >> +     struct thread_info *ti = task_thread_info(p);
> >> +     typeof(ti->flags) old, val = ti->flags;
> >> +
> >> +     for (;;) {
> >> +             if (!(val & _TIF_POLLING_NRFLAG))
> >> +                     return false;
> >> +             if (val & _TIF_NEED_RESCHED)
> >> +                     return true;
> >
> > Hmm, I think this is racy, false would be safer. If its already set we
> > might already be past the sched_ttwu_pending() invocation, while if its
> > clear and we're the one to set it, we're guaranteed not.
> >
> >> +             old = cmpxchg(&ti->flags, val, val | _TIF_NEED_RESCHED);
> >> +             if (old == val)
> >> +                     return true;
> >> +             val = old;
> >> +     }
> >> +}
> 
> Do you have an updated patch?  After fixing the MIME flow damage
> (sigh), it doesn't apply to sched/core, which is my best guess for
> what it's based on.

https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=sched/core&id=c224d4fee677ecc72209903d330b643bcf0793d7


* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-05-29  6:48         ` Peter Zijlstra
@ 2014-06-03  6:40           ` Andy Lutomirski
  2014-06-03  6:51             ` Peter Zijlstra
  2014-06-03 10:43             ` Peter Zijlstra
  0 siblings, 2 replies; 51+ messages in thread
From: Andy Lutomirski @ 2014-06-03  6:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel

On Wed, May 28, 2014 at 11:48 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, May 28, 2014 at 05:01:41PM -0700, Andy Lutomirski wrote:
>> On Thu, May 22, 2014 at 6:09 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, May 22, 2014 at 02:58:18PM +0200, Peter Zijlstra wrote:
>> >> ---
>> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> >> index 4ea7b3f1a087..a5da85fb3570 100644
>> >> --- a/kernel/sched/core.c
>> >> +++ b/kernel/sched/core.c
>> >> @@ -546,12 +546,38 @@ static bool set_nr_and_not_polling(struct task_struct *p)
>> >>       struct thread_info *ti = task_thread_info(p);
>> >>       return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
>> >>  }
>> >> +
>> >> +/*
>> >> + * Atomically set TIF_NEED_RESCHED if TIF_POLLING_NRFLAG is set.
>> >> + */
>> >> +static bool set_nr_if_polling(struct task_struct *p)
>> >> +{
>> >> +     struct thread_info *ti = task_thread_info(p);
>> >> +     typeof(ti->flags) old, val = ti->flags;
>> >> +
>> >> +     for (;;) {
>> >> +             if (!(val & _TIF_POLLING_NRFLAG))
>> >> +                     return false;
>> >> +             if (val & _TIF_NEED_RESCHED)
>> >> +                     return true;
>> >
>> > Hmm, I think this is racy, false would be safer. If its already set we
>> > might already be past the sched_ttwu_pending() invocation, while if its
>> > clear and we're the one to set it, we're guaranteed not.
>> >
>> >> +             old = cmpxchg(&ti->flags, val, val | _TIF_NEED_RESCHED);
>> >> +             if (old == val)
>> >> +                     return true;
>> >> +             val = old;
>> >> +     }
>> >> +}
>>
>> Do you have an updated patch?  After fixing the MIME flow damage
>> (sigh), it doesn't apply to sched/core, which is my best guess for
>> what it's based on.
>
> https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/commit/?h=sched/core&id=c224d4fee677ecc72209903d330b643bcf0793d7

Thanks!

Bugs found so far:

defined(SMP) should be defined(CONFIG_SMP)

You're testing polling on the task being woken, which cannot possibly
succeed: the only tasks that have any business polling are the idle
tasks.  Something like this seems to help:

 static void ttwu_queue_remote(struct task_struct *p, int cpu)
 {
+       struct rq *rq = cpu_rq(cpu);
+
        if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) {
-               if (!set_nr_if_polling(p))
+               if (!set_nr_if_polling(rq->idle))
                        smp_send_reschedule(cpu);
+               else
+                       trace_sched_wake_polling_cpu(cpu);
        }
 }

If you don't beat me to it, I'll send real patches in the morning.
I'll also send some followup patches to make it even better.  Fully
fixed up, this gets rid of almost all of my rescheduling interrupts
except for interrupts from the timer tick.

Also, grr, I still think this would be clearer if polling and
need_resched were per cpu instead of per task -- they only make sense
on a running task.  I guess that need_resched being in
thread_info->flags is helpful because it streamlines the interrupt
exit code.  Oh, well.

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-06-03  6:40           ` Andy Lutomirski
@ 2014-06-03  6:51             ` Peter Zijlstra
  2014-06-03 10:43             ` Peter Zijlstra
  1 sibling, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2014-06-03  6:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel


On Mon, Jun 02, 2014 at 11:40:26PM -0700, Andy Lutomirski wrote:
> Bugs found so far:
> 
> defined(SMP) should be defined(CONFIG_SMP)
> 
> You're testing polling on the task being woken, which cannot possibly
> succeed: the only tasks that have any business polling are the idle
> tasks.  Something like this seems to help:

Urgh, comes from doing too many things at once... Thanks, I'll go fix
up.



* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-06-03  6:40           ` Andy Lutomirski
  2014-06-03  6:51             ` Peter Zijlstra
@ 2014-06-03 10:43             ` Peter Zijlstra
  2014-06-03 14:02               ` Peter Zijlstra
  1 sibling, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-06-03 10:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel


On Mon, Jun 02, 2014 at 11:40:26PM -0700, Andy Lutomirski wrote:
> You're testing polling on the task being woken, which cannot possibly
> succeed: the only tasks that have any business polling are the idle
> tasks.  Something like this seems to help:
> 
>  static void ttwu_queue_remote(struct task_struct *p, int cpu)
>  {
> +       struct rq *rq = cpu_rq(cpu);
> +
>         if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) {
> -               if (!set_nr_if_polling(p))
> +               if (!set_nr_if_polling(rq->idle))
>                         smp_send_reschedule(cpu);
> +               else
> +                       trace_sched_wake_polling_cpu(cpu);
>         }
>  }
> 
> If you don't beat me to it, I'll send real patches in the morning.
> I'll also send some followup patches to make it even better.  Fully
> fixed up, this gets rid of almost all of my rescheduling interrupts
> except for interrupts from the timer tick.

We need rq->curr; rq->idle 'sleeps' with polling set and nr clear, but
setting that obviously has no effect if it's not actually the current
task.

Touching rq->curr needs holding rcu_read_lock() though, to make sure the
task stays around, still shouldn't be a problem.

> Also, grr, I still think this would be clearer if polling and
> need_resched were per cpu instead of per task -- they only make sense
> on a running task.  I guess that need_resched being in
> thread_info->flags is helpful because it streamlines the interrupt
> exit code.  Oh, well.

Yes, and the fact that traditionally we didn't have per-cpu storage. But
yes, the interrupt/system return path is a good reason to keep all this.

---
Subject: sched: Optimize ttwu IPI
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu May 22 17:19:22 CEST 2014

When enqueueing tasks on remote LLC domains, we send an IPI to do the
work 'locally' and avoid bouncing all the cachelines over.

However, when the remote CPU is idle (and polling, say x86 mwait), we
don't need to send an IPI, we can simply kick the TIF word to wake it
up and have the 'idle' loop do the work.

So when _TIF_POLLING_NRFLAG is set, but _TIF_NEED_RESCHED is not (yet)
set, set _TIF_NEED_RESCHED and avoid sending the IPI.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Much-Requested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/core.c  |   61 +++++++++++++++++++++++++++++++++++++--------------
 kernel/sched/idle.c  |    3 ++
 kernel/sched/sched.h |    6 +++++
 3 files changed, 54 insertions(+), 16 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -535,7 +535,7 @@ static inline void init_hrtick(void)
  	__old;								\
 })
 
-#ifdef TIF_POLLING_NRFLAG
+#if defined(CONFIG_SMP) && defined(TIF_POLLING_NRFLAG)
 /*
  * Atomically set TIF_NEED_RESCHED and test for TIF_POLLING_NRFLAG,
  * this avoids any races wrt polling state changes and thereby avoids
@@ -546,12 +546,40 @@ static bool set_nr_and_not_polling(struc
 	struct thread_info *ti = task_thread_info(p);
 	return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
 }
+
+/*
+ * Atomically set TIF_NEED_RESCHED if TIF_POLLING_NRFLAG is set.
+ */
+static bool set_nr_if_polling(struct task_struct *p)
+{
+	struct thread_info *ti = task_thread_info(p);
+	typeof(ti->flags) old, val = ACCESS_ONCE(ti->flags);
+
+	for (;;) {
+		if ((val & (_TIF_NEED_RESCHED | _TIF_POLLING_NRFLAG)) !=
+				_TIF_POLLING_NRFLAG)
+			return false;
+		old = cmpxchg(&ti->flags, val, val | _TIF_NEED_RESCHED);
+		if (old == val)
+			break;
+		val = old;
+	}
+	return true;
+}
+
 #else
 static bool set_nr_and_not_polling(struct task_struct *p)
 {
 	set_tsk_need_resched(p);
 	return true;
 }
+
+#ifdef CONFIG_SMP
+static bool set_nr_if_polling(struct task_struct *p)
+{
+	return false;
+}
+#endif
 #endif
 
 /*
@@ -652,16 +680,7 @@ static void wake_up_idle_cpu(int cpu)
 	if (rq->curr != rq->idle)
 		return;
 
-	/*
-	 * We can set TIF_RESCHED on the idle task of the other CPU
-	 * lockless. The worst case is that the other CPU runs the
-	 * idle task through an additional NOOP schedule()
-	 */
-	set_tsk_need_resched(rq->idle);
-
-	/* NEED_RESCHED must be visible before we test polling */
-	smp_mb();
-	if (!tsk_is_polling(rq->idle))
+	if (set_nr_and_not_polling(rq->idle))
 		smp_send_reschedule(cpu);
 }
 
@@ -1521,13 +1540,17 @@ static int ttwu_remote(struct task_struc
 }
 
 #ifdef CONFIG_SMP
-static void sched_ttwu_pending(void)
+void sched_ttwu_pending(void)
 {
 	struct rq *rq = this_rq();
 	struct llist_node *llist = llist_del_all(&rq->wake_list);
 	struct task_struct *p;
+	unsigned long flags;
+
+	if (!llist)
+		return;
 
-	raw_spin_lock(&rq->lock);
+	raw_spin_lock_irqsave(&rq->lock, flags);
 
 	while (llist) {
 		p = llist_entry(llist, struct task_struct, wake_entry);
@@ -1535,7 +1558,7 @@ static void sched_ttwu_pending(void)
 		ttwu_do_activate(rq, p, 0);
 	}
 
-	raw_spin_unlock(&rq->lock);
+	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
 
 void scheduler_ipi(void)
@@ -1581,8 +1604,14 @@ void scheduler_ipi(void)
 
 static void ttwu_queue_remote(struct task_struct *p, int cpu)
 {
-	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
-		smp_send_reschedule(cpu);
+	struct rq *rq = cpu_rq(cpu);
+
+	if (llist_add(&p->wake_entry, &rq->wake_list)) {
+		rcu_read_lock();
+		if (!set_nr_if_polling(rq->curr))
+			smp_send_reschedule(cpu);
+		rcu_read_unlock();
+	}
 }
 
 bool cpus_share_cache(int this_cpu, int that_cpu)
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -12,6 +12,8 @@
 
 #include <trace/events/power.h>
 
+#include "sched.h"
+
 static int __read_mostly cpu_idle_force_poll;
 
 void cpu_idle_poll_ctrl(bool enable)
@@ -218,6 +220,7 @@ static void cpu_idle_loop(void)
 		 */
 		preempt_set_need_resched();
 		tick_nohz_idle_exit();
+		sched_ttwu_pending();
 		schedule_preempt_disabled();
 	}
 }
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -670,6 +670,8 @@ extern int migrate_swap(struct task_stru
 
 #ifdef CONFIG_SMP
 
+extern void sched_ttwu_pending(void);
+
 #define rcu_dereference_check_sched_domain(p) \
 	rcu_dereference_check((p), \
 			      lockdep_is_held(&sched_domains_mutex))
@@ -787,6 +789,10 @@ static inline unsigned int group_first_c
 
 extern int group_balance_cpu(struct sched_group *sg);
 
+#else
+
+static inline void sched_ttwu_pending(void) { }
+
 #endif /* CONFIG_SMP */
 
 #include "stats.h"



* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-06-03 10:43             ` Peter Zijlstra
@ 2014-06-03 14:02               ` Peter Zijlstra
  2014-06-03 16:05                 ` Andy Lutomirski
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-06-03 14:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel


On Tue, Jun 03, 2014 at 12:43:47PM +0200, Peter Zijlstra wrote:
> We need rq->curr, rq->idle 'sleeps' with polling set and nr clear, but
> it obviously has no effect setting that if its not actually the current
> task.
> 
> Touching rq->curr needs holding rcu_read_lock() though, to make sure the
> task stays around, still shouldn't be a problem.

> @@ -1581,8 +1604,14 @@ void scheduler_ipi(void)
>  
>  static void ttwu_queue_remote(struct task_struct *p, int cpu)
>  {
> -	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
> -		smp_send_reschedule(cpu);
> +	struct rq *rq = cpu_rq(cpu);
> +
> +	if (llist_add(&p->wake_entry, &rq->wake_list)) {
> +		rcu_read_lock();
> +		if (!set_nr_if_polling(rq->curr))
> +			smp_send_reschedule(cpu);
> +		rcu_read_unlock();
> +	}
>  }

Hrmm, I think that is still broken; see how in schedule() we clear NR
before setting the new ->curr.

So I think I had a loop on rq->curr the last time we talked about this,
but alternatively we could look at clearing NR after setting a new curr.

I think I once looked at why it was done before, of course I can't
actually remember the details :/



* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-06-03 14:02               ` Peter Zijlstra
@ 2014-06-03 16:05                 ` Andy Lutomirski
  2014-06-03 16:19                   ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Andy Lutomirski @ 2014-06-03 16:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel

On Tue, Jun 3, 2014 at 7:02 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jun 03, 2014 at 12:43:47PM +0200, Peter Zijlstra wrote:
>> We need rq->curr, rq->idle 'sleeps' with polling set and nr clear, but
>> it obviously has no effect setting that if its not actually the current
>> task.
>>
>> Touching rq->curr needs holding rcu_read_lock() though, to make sure the
>> task stays around, still shouldn't be a problem.
>
>> @@ -1581,8 +1604,14 @@ void scheduler_ipi(void)
>>
>>  static void ttwu_queue_remote(struct task_struct *p, int cpu)
>>  {
>> -     if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
>> -             smp_send_reschedule(cpu);
>> +     struct rq *rq = cpu_rq(cpu);
>> +
>> +     if (llist_add(&p->wake_entry, &rq->wake_list)) {
>> +             rcu_read_lock();
>> +             if (!set_nr_if_polling(rq->curr))
>> +                     smp_send_reschedule(cpu);
>> +             rcu_read_unlock();
>> +     }
>>  }
>
> Hrmm, I think that is still broken, see how in schedule() we clear NR
> before setting the new ->curr.
>
> So I think I had a loop on rq->curr the last time we talked about this,
> but alternatively we could look at clearing NR after setting a new curr.
>
> I think I once looked at why it was done before, of course I can't
> actually remember the details :/

Wouldn't this be a little simpler and maybe even faster if we just
changed the idle loop to make TIF_POLLING_NRFLAG be a real indication
that the idle task is running and actively polling?  That is, change
the end of cpuidle_idle_loop to:

                preempt_set_need_resched();
                tick_nohz_idle_exit();
                clear_tsk_need_resched(current);
                __current_clr_polling();
                smp_mb__after_clear_bit();
                WARN_ON_ONCE(test_thread_flag(TIF_POLLING_NRFLAG));
                sched_ttwu_pending();
                schedule_preempt_disabled();
                __current_set_polling();

This has the added benefit that the optimistic version of the cmpxchg
loop would be safe again.  I'm about to test this with this variant.
I'll try and send a comprehensible set of patches in a few hours.

Can you remind me what the benefit was of letting polling be set when
the idle thread schedules?  It seems racy to me: it probably prevents
any safe use of the polling bit without holding the rq lock.  I guess
there's some benefit to having polling be set for as long as possible,
but it only helps if there are wakeups in very rapid succession, and
it costs a couple of extra bit ops per idle entry.

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-06-03 16:05                 ` Andy Lutomirski
@ 2014-06-03 16:19                   ` Peter Zijlstra
  2014-06-03 16:52                     ` Andy Lutomirski
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-06-03 16:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel


On Tue, Jun 03, 2014 at 09:05:03AM -0700, Andy Lutomirski wrote:
> On Tue, Jun 3, 2014 at 7:02 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, Jun 03, 2014 at 12:43:47PM +0200, Peter Zijlstra wrote:
> >> We need rq->curr, rq->idle 'sleeps' with polling set and nr clear, but
> >> it obviously has no effect setting that if its not actually the current
> >> task.
> >>
> >> Touching rq->curr needs holding rcu_read_lock() though, to make sure the
> >> task stays around, still shouldn't be a problem.
> >
> >> @@ -1581,8 +1604,14 @@ void scheduler_ipi(void)
> >>
> >>  static void ttwu_queue_remote(struct task_struct *p, int cpu)
> >>  {
> >> -     if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
> >> -             smp_send_reschedule(cpu);
> >> +     struct rq *rq = cpu_rq(cpu);
> >> +
> >> +     if (llist_add(&p->wake_entry, &rq->wake_list)) {
> >> +             rcu_read_lock();
> >> +             if (!set_nr_if_polling(rq->curr))
> >> +                     smp_send_reschedule(cpu);
> >> +             rcu_read_unlock();
> >> +     }
> >>  }
> >
> > Hrmm, I think that is still broken, see how in schedule() we clear NR
> > before setting the new ->curr.
> >
> > So I think I had a loop on rq->curr the last time we talked about this,
> > but alternatively we could look at clearing NR after setting a new curr.
> >
> > I think I once looked at why it was done before, of course I can't
> > actually remember the details :/
> 
> Wouldn't this be a little simpler and maybe even faster if we just
> changed the idle loop to make TIF_POLLING_NRFLAG be a real indication
> that the idle task is running and actively polling?  That is, change
> the end of cpuidle_idle_loop to:
> 
>                 preempt_set_need_resched();
>                 tick_nohz_idle_exit();
>                 clear_tsk_need_resched(current);
>                 __current_clr_polling();
>                 smp_mb__after_clear_bit();
>                 WARN_ON_ONCE(test_thread_flag(TIF_POLLING_NRFLAG));
>                 sched_ttwu_pending();
>                 schedule_preempt_disabled();
>                 __current_set_polling();
> 
> This has the added benefit that the optimistic version of the cmpxchg
> loop would be safe again.  I'm about to test this with this variant.
> I'll try and send a comprehensible set of patches in a few hours.
> 
> Can you remind me what the benefit was of letting polling be set when
> the idle thread schedules? 

Hysterical raisins, I don't think there's an actual reason, so yes, that
might be the best option indeed.

> It seems racy to me: it probably prevents
> any safe use of the polling bit without holding the rq lock.  I guess
> there's some benefit to having polling be set for as long as possible,
> but it only helps if there are wakeups in very rapid succession, and
> it costs a couple of extra bit ops per idle entry.

So you could cheat and set it in pick_next_task_idle() and clear it in
put_prev_task_idle(); that way the entire idle loop, when running, has it
set.

And then there was the idle injection loop crap trainwreck, which I
should send patches for.. 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-06-03 16:19                   ` Peter Zijlstra
@ 2014-06-03 16:52                     ` Andy Lutomirski
  2014-06-03 17:00                       ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Andy Lutomirski @ 2014-06-03 16:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel

On Tue, Jun 3, 2014 at 9:19 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jun 03, 2014 at 09:05:03AM -0700, Andy Lutomirski wrote:
>> On Tue, Jun 3, 2014 at 7:02 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Tue, Jun 03, 2014 at 12:43:47PM +0200, Peter Zijlstra wrote:
>> >> We need rq->curr, rq->idle 'sleeps' with polling set and nr clear, but
>> >> it obviously has no effect setting that if its not actually the current
>> >> task.
>> >>
>> >> Touching rq->curr needs holding rcu_read_lock() though, to make sure the
>> >> task stays around, still shouldn't be a problem.
>> >
>> >> @@ -1581,8 +1604,14 @@ void scheduler_ipi(void)
>> >>
>> >>  static void ttwu_queue_remote(struct task_struct *p, int cpu)
>> >>  {
>> >> -     if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
>> >> -             smp_send_reschedule(cpu);
>> >> +     struct rq *rq = cpu_rq(cpu);
>> >> +
>> >> +     if (llist_add(&p->wake_entry, &rq->wake_list)) {
>> >> +             rcu_read_lock();
>> >> +             if (!set_nr_if_polling(rq->curr))
>> >> +                     smp_send_reschedule(cpu);
>> >> +             rcu_read_unlock();
>> >> +     }
>> >>  }
>> >
>> > Hrmm, I think that is still broken, see how in schedule() we clear NR
>> > before setting the new ->curr.
>> >
>> > So I think I had a loop on rq->curr the last time we talked about this,
>> > but alternatively we could look at clearing NR after setting a new curr.
>> >
>> > I think I once looked at why it was done before, of course I can't
>> > actually remember the details :/
>>
>> Wouldn't this be a little simpler and maybe even faster if we just
>> changed the idle loop to make TIF_POLLING_NRFLAG be a real indication
>> that the idle task is running and actively polling?  That is, change
>> the end of cpuidle_idle_loop to:
>>
>>                 preempt_set_need_resched();
>>                 tick_nohz_idle_exit();
>>                 clear_tsk_need_resched(current);
>>                 __current_clr_polling();
>>                 smp_mb__after_clear_bit();
>>                 WARN_ON_ONCE(test_thread_flag(TIF_POLLING_NRFLAG));
>>                 sched_ttwu_pending();
>>                 schedule_preempt_disabled();
>>                 __current_set_polling();
>>
>> This has the added benefit that the optimistic version of the cmpxchg
>> loop would be safe again.  I'm about to test this with this variant.
>> I'll try and send a comprehensible set of patches in a few hours.
>>
>> Can you remind me what the benefit was of letting polling be set when
>> the idle thread schedules?
>
> Hysterical raisins, I don't think there's an actual reason, so yes, that
> might be the best option indeed.
>
>> It seems racy to me: it probably prevents
>> any safe use of the polling bit without holding the rq lock.  I guess
>> there's some benefit to having polling be set for as long as possible,
>> but it only helps if there are wakeups in very rapid succession, and
>> it costs a couple of extra bit ops per idle entry.
>
> So you could cheat and set it in pick_next_task_idle() and clear in
> put_prev_task_idle(), that way the entire idle loop, when running has it
> set.
>

Isn't that a little late for sched_ttwu_pending?  I guess it could be
okay, but I'm hesitant to muck around with the scheduler innards that
much.  I don't see anything that'll break, though.

I'm seriously confused by the polling situation, though.
TIF_POLLING_NRFLAG is defined for a whole bunch of architectures, but
I can't find any actual polling idle code outside cpuidle and x86.
For example, arm defines TIF_POLLING_NRFLAG, but I can't find anything
that clears the polling bit in the arm code.  Am I missing something
obvious here?  Is the whole point of this so that a wakeup that
happens right after the idle task is scheduled but before it goes idle
cancels idle and avoids an IPI?  This seems like a lot of complexity
to avoid IPIs in a hopefully extremely narrow window.

This is totally backwards for x86, but it seems to do the right thing
for other architectures.  I'm not convinced it does any good, though.

--Andy

> And then there was the idle injection loop crap trainwreck, which I
> should send patches for..



-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-06-03 16:52                     ` Andy Lutomirski
@ 2014-06-03 17:00                       ` Peter Zijlstra
  2014-06-03 18:28                         ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-06-03 17:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel


On Tue, Jun 03, 2014 at 09:52:22AM -0700, Andy Lutomirski wrote:
> > So you could cheat and set it in pick_next_task_idle() and clear in
> > put_prev_task_idle(), that way the entire idle loop, when running has it
> > set.
> >
> 
> Isn't that a little late for sched_ttwu_pending?  I guess it could be
> okay, but I'm hesitant to muck around with the scheduler innards that
> much.  I don't see anything that'll break, though.

Yeah, only later did I see you clear much earlier, which makes sense.

> I'm seriously confused by the polling situation, though.
> TIF_POLLING_NRFLAG is defined for a whole bunch of architectures, but
> I can't find any actual polling idle code outside cpuidle and x86.

I think there's some in ppc somewhere, they poll a bit before doing the
idle hypercall.

> For example, arm defines TIF_POLLING_NRFLAG, but I can't find anything
> that clears the polling bit in the arm code.  Am I missing something
> obvious here? 

It wasn't at all clear to a bunch of arch people wtf that flag was for
and so it got copy/pasted around.

When I recently broke the build with that fetch_or() thing some arch
folks piped up and we ended up removing the flag for arm64 and metag.

Which reminds me, I should probably write a comment somewhere explaining
this stuff.

> Is the whole point of this so that a wakeup that
> happens right after the idle task is scheduled but before it goes idle
> cancels idle and avoids an IPI?  This seems like a lot of complexity
> to avoid IPIs in a hopefully extremely narrow window.
> 
> This is totally backwards for x86, but it seems to do the right thing
> for other architectures.  I'm not convinced it does any good, though.

It's all a bit confused.. back when tglx merged the idle loops we also
flipped the default state around, which didn't improve things.

In any case, I'm only aware of x86 mwait and the ppc amortize poll loop
and the generic poll loop.

That of course doesn't mean there isn't something hidden somewhere in
the 2x archs, but...



* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-06-03 17:00                       ` Peter Zijlstra
@ 2014-06-03 18:28                         ` Peter Zijlstra
  2014-06-03 18:44                           ` Andy Lutomirski
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2014-06-03 18:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel

On Tue, Jun 03, 2014 at 07:00:18PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 03, 2014 at 09:52:22AM -0700, Andy Lutomirski wrote:
> > > So you could cheat and set it in pick_next_task_idle() and clear in
> > > put_prev_task_idle(), that way the entire idle loop, when running has it
> > > set.
> > >
> > 
> > Isn't that a little late for sched_ttwu_pending?  I guess it could be
> > okay, but I'm hesitant to muck around with the scheduler innards that
> > much.  I don't see anything that'll break, though.
> 
> Yeah, only later did I see you clear much earlier, which makes sense.

Could we clear it from set_nr_and_not_polling()/set_nr_if_polling()?
That's the only two functions that'll kick a cpu out of its polling
loop, and we're already writing to the word anyhow.
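Folding the clear into those helpers would mean a single atomic transition on the flags word: NEED_RESCHED goes up and POLLING_NRFLAG comes down together. A hypothetical sketch in C11 atomics, again with made-up bit values and a bare atomic_uint standing in for thread_info::flags — the actual kernel helper of this name only sets NEED_RESCHED and does not clear the polling bit:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit values; the real ones live in asm/thread_info.h. */
#define TIF_NEED_RESCHED   (1u << 1)
#define TIF_POLLING_NRFLAG (1u << 2)

/*
 * Set NEED_RESCHED and knock the target out of its poll loop by
 * clearing POLLING_NRFLAG in the same atomic update.  Returns true
 * if the target was *not* polling, i.e. the caller still has to
 * send a reschedule IPI.
 */
static bool set_nr_and_not_polling(atomic_uint *flags)
{
	unsigned int old = atomic_load(flags);

	/* retry until the whole transition lands in one cmpxchg */
	while (!atomic_compare_exchange_weak(flags, &old,
			(old | TIF_NEED_RESCHED) & ~TIF_POLLING_NRFLAG))
		;

	return !(old & TIF_POLLING_NRFLAG);
}
```

The appeal in Peter's suggestion is exactly what the comment shows: the waker already writes this word, so clearing POLLING there costs nothing extra; the worry raised in the reply below is what happens when the idle path bails out without rechecking the bit.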


* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-06-03 18:28                         ` Peter Zijlstra
@ 2014-06-03 18:44                           ` Andy Lutomirski
  2014-06-03 20:07                             ` Andy Lutomirski
  0 siblings, 1 reply; 51+ messages in thread
From: Andy Lutomirski @ 2014-06-03 18:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel

On Tue, Jun 3, 2014 at 11:28 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jun 03, 2014 at 07:00:18PM +0200, Peter Zijlstra wrote:
>> On Tue, Jun 03, 2014 at 09:52:22AM -0700, Andy Lutomirski wrote:
>> > > So you could cheat and set it in pick_next_task_idle() and clear in
>> > > put_prev_task_idle(), that way the entire idle loop, when running has it
>> > > set.
>> > >
>> >
>> > Isn't that a little late for sched_ttwu_pending?  I guess it could be
>> > okay, but I'm hesitant to muck around with the scheduler innards that
>> > much.  I don't see anything that'll break, though.
>>
>> Yeah, only later did I see you clear much earlier, which makes sense.
>
> Could we clear it from set_nr_and_not_polling()/set_nr_if_polling()?
> That's the only two functions that'll kick a cpu out of its polling
> loop, and we're already writing to the word anyhow.

I'd be nervous about this.  I think it could break if
cpuidle_idle_call decides not to idle for any reason, and there is
plenty of complicated code in there.

I'm currently working on some patches that might make this clearer.
Give me a bit.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [RFC][PATCH 0/8] sched,idle: need resched polling rework
  2014-06-03 18:44                           ` Andy Lutomirski
@ 2014-06-03 20:07                             ` Andy Lutomirski
  0 siblings, 0 replies; 51+ messages in thread
From: Andy Lutomirski @ 2014-06-03 20:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, nicolas.pitre, Daniel Lezcano,
	Mike Galbraith, linux-kernel

On Tue, Jun 3, 2014 at 11:44 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Tue, Jun 3, 2014 at 11:28 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Tue, Jun 03, 2014 at 07:00:18PM +0200, Peter Zijlstra wrote:
>>> On Tue, Jun 03, 2014 at 09:52:22AM -0700, Andy Lutomirski wrote:
>>> > > So you could cheat and set it in pick_next_task_idle() and clear in
>>> > > put_prev_task_idle(), that way the entire idle loop, when running has it
>>> > > set.
>>> > >
>>> >
>>> > Isn't that a little late for sched_ttwu_pending?  I guess it could be
>>> > okay, but I'm hesitant to muck around with the scheduler innards that
>>> > much.  I don't see anything that'll break, though.
>>>
>>> Yeah, only later did I see you clear much earlier, which makes sense.
>>
>> Could we clear it from set_nr_and_not_polling()/set_nr_if_polling()?
>> That's the only two functions that'll kick a cpu out of its polling
>> loop, and we're already writing to the word anyhow.
>
> I'd be nervous about this.  I think it could break if
> cpuidle_idle_call decides not to idle for any reason, and there is
> plenty of complicated code in there.
>
> I'm currently working on some patches that might make this clearer.
> Give me a bit.

The code's here:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=sched_ipi

In summary, here's what it does:

 - Fix poll_idle to set polling.

 - Add a new tracepoint for polling wakeups -- this makes it much
easier to verify that the code is doing something.

 - Make it so that polling is only set when the idle task is actually
paying attention.

 - Clean up wake_up_idle_cpu

 - More or less your earlier, more aggressive patch, with the
CONFIG_SMP thing fixed and waking rq->idle instead of the task to be
woken in ttwu.

This has survived my trading software for about twenty minutes, and
latency monitoring doesn't show any signs of missed wakeups.  If it
survives awhile longer and the kbuild robot doesn't complain, I'll
send out patches.

--Andy


end of thread, other threads:[~2014-06-03 20:07 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-11 13:42 [RFC][PATCH 0/8] sched,idle: need resched polling rework Peter Zijlstra
2014-04-11 13:42 ` [RFC][PATCH 1/8] sched,idle,alpha: Switch from TS_POLLING to TIF_POLLING_NRFLAG Peter Zijlstra
2014-04-11 14:38   ` Richard Henderson
2014-04-11 13:42 ` [RFC][PATCH 2/8] sched,idle,tile: " Peter Zijlstra
2014-04-11 15:15   ` Chris Metcalf
2014-04-11 15:30     ` Peter Zijlstra
2014-04-11 13:42 ` [RFC][PATCH 3/8] sched,idle,ia64: " Peter Zijlstra
2014-04-11 13:42 ` [RFC][PATCH 4/8] sched,idle,x86: " Peter Zijlstra
2014-04-11 13:42 ` [RFC][PATCH 5/8] sched,idle: Remove TS_POLLING support Peter Zijlstra
2014-04-11 13:42 ` [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs Peter Zijlstra
2014-04-13 21:41   ` Nicolas Pitre
2014-05-09 13:37   ` James Hogan
2014-05-09 14:15     ` Peter Zijlstra
2014-05-09 14:40       ` Catalin Marinas
2014-05-09 14:50         ` Peter Zijlstra
2014-05-09 14:57           ` Catalin Marinas
2014-05-09 17:02             ` Peter Zijlstra
2014-05-09 17:06               ` Peter Zijlstra
2014-05-09 17:09                 ` Catalin Marinas
2014-05-09 17:20                   ` Peter Zijlstra
2014-05-19 12:54                 ` [tip:sched/arch] arm64: Remove TIF_POLLING_NRFLAG tip-bot for Peter Zijlstra
2014-05-22 12:26                 ` [tip:sched/core] " tip-bot for Peter Zijlstra
2014-05-09 14:51       ` [RFC][PATCH 6/8] sched,idle: Avoid spurious wakeup IPIs James Hogan
2014-05-15  9:17         ` James Hogan
2014-05-19 12:54         ` [tip:sched/arch] metag: Remove TIF_POLLING_NRFLAG tip-bot for James Hogan
2014-05-22 12:26         ` [tip:sched/core] " tip-bot for James Hogan
2014-04-11 13:42 ` [RFC][PATCH 7/8] sched,idle: Delay clearing the polling bit Peter Zijlstra
2014-04-13 21:51   ` Nicolas Pitre
2014-04-11 13:42 ` [RFC][PATCH 8/8] sched,idle: Reflow cpuidle_idle_call() Peter Zijlstra
2014-04-13 21:36   ` Nicolas Pitre
2014-04-14  8:59     ` Peter Zijlstra
2014-04-14  9:25       ` Peter Zijlstra
2014-04-14 13:55         ` Nicolas Pitre
2014-04-11 15:00 ` [RFC][PATCH 0/8] sched,idle: need resched polling rework Andy Lutomirski
2014-04-11 15:14   ` Peter Zijlstra
2014-05-22 12:58   ` Peter Zijlstra
2014-05-22 13:09     ` Peter Zijlstra
2014-05-29  0:01       ` Andy Lutomirski
2014-05-29  6:48         ` Peter Zijlstra
2014-06-03  6:40           ` Andy Lutomirski
2014-06-03  6:51             ` Peter Zijlstra
2014-06-03 10:43             ` Peter Zijlstra
2014-06-03 14:02               ` Peter Zijlstra
2014-06-03 16:05                 ` Andy Lutomirski
2014-06-03 16:19                   ` Peter Zijlstra
2014-06-03 16:52                     ` Andy Lutomirski
2014-06-03 17:00                       ` Peter Zijlstra
2014-06-03 18:28                         ` Peter Zijlstra
2014-06-03 18:44                           ` Andy Lutomirski
2014-06-03 20:07                             ` Andy Lutomirski
2014-04-12  8:35 ` Mike Galbraith
