* [PATCH 0/9] nohz: Tick dependency mask v5
@ 2016-02-04 17:00 Frederic Weisbecker
  2016-02-04 17:00 ` [PATCH 1/9] atomic: Export fetch_or() Frederic Weisbecker
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-02-04 17:00 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter, Ingo Molnar,
	Viresh Kumar, Rik van Riel

A few changes in this version, mostly off-case optimizations using static keys:

* Use wrappers named tick_[set,clear]_dep on top of
  tick_nohz_[set,clear]_dep in order to optimize off-cases using static
  keys.

* Add Chris's reviewed-by

* Rebase against v4.5-rc1
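The off-case wrapper idea above can be sketched in plain userspace C. This is only an illustrative sketch: a boolean flag stands in for the kernel's static key behind tick_nohz_full_enabled(), and the counter is a hypothetical stand-in for the real slow path.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the static key; in the kernel this is a jump label that
 * patches the branch out entirely when nohz full is disabled. */
bool tick_nohz_full_enabled_flag;
int dep_set_calls;

bool tick_nohz_full_enabled(void)
{
	return tick_nohz_full_enabled_flag;
}

/* Slow path, only reached when nohz full is enabled. */
void tick_nohz_set_dep_sketch(int bit)
{
	(void)bit;
	dep_set_calls++;
}

/* Off-case optimized wrapper: the common (disabled) case costs only the
 * patched branch, never the function call. */
void tick_set_dep_sketch(int bit)
{
	if (tick_nohz_full_enabled())
		tick_nohz_set_dep_sketch(bit);
}
```

With the flag clear the wrapper is a no-op; once it is set, the slow path is taken.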

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/core-v7

HEAD: d4527cabc51d3ca3c47b5ce5685f5508918f771e

Thanks,
	Frederic
---

Frederic Weisbecker (9):
      atomic: Export fetch_or()
      nohz: Implement wide kick on top of irq work
      nohz: New tick dependency mask
      nohz: Use enum code for tick stop failure tracing message
      perf: Migrate perf to use new tick dependency mask model
      sched: Account rr tasks
      sched: Migrate sched to use new tick dependency mask model
      posix-cpu-timers: Migrate to use new tick dependency mask model
      sched-clock: Migrate to use new tick dependency mask model


 include/linux/atomic.h         |  21 +++++
 include/linux/perf_event.h     |   6 --
 include/linux/posix-timers.h   |   3 -
 include/linux/sched.h          |  11 ++-
 include/linux/tick.h           |  97 ++++++++++++++++++++++-
 include/trace/events/timer.h   |  36 +++++++--
 kernel/events/core.c           |  65 +++++++++++----
 kernel/sched/clock.c           |   5 ++
 kernel/sched/core.c            |  49 +++++-------
 kernel/sched/rt.c              |  16 ++++
 kernel/sched/sched.h           |  48 +++++++----
 kernel/time/posix-cpu-timers.c |  52 +++---------
 kernel/time/tick-sched.c       | 175 ++++++++++++++++++++++++++++++++---------
 kernel/time/tick-sched.h       |   1 +
 14 files changed, 424 insertions(+), 161 deletions(-)


* [PATCH 1/9] atomic: Export fetch_or()
  2016-02-04 17:00 [PATCH 0/9] nohz: Tick dependency mask v5 Frederic Weisbecker
@ 2016-02-04 17:00 ` Frederic Weisbecker
  2016-02-04 17:00 ` [PATCH 2/9] nohz: Implement wide kick on top of irq work Frederic Weisbecker
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-02-04 17:00 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter, Ingo Molnar,
	Viresh Kumar, Rik van Riel

Export fetch_or(), which is currently implemented and used internally by
the scheduler. We are going to use it for NO_HZ, so make it generally
available.
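The semantics of the exported macro can be mirrored in userspace with C11 atomics; this is a sketch of the same compare-exchange retry loop (fetch_or_demo is an illustrative name, not the kernel symbol):

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace sketch of the cmpxchg-based fetch_or(): atomically do
 * *ptr |= mask and return the value *ptr held before the OR. */
unsigned long fetch_or_demo(_Atomic unsigned long *ptr, unsigned long mask)
{
	unsigned long old = atomic_load(ptr);

	/* Retry until no other thread changed *ptr between the load and
	 * the compare-exchange; on failure, 'old' is reloaded for us. */
	while (!atomic_compare_exchange_weak(ptr, &old, old | mask))
		;
	return old;	/* old value, before the OR */
}
```

The return value lets a caller detect the first setter of a bit (old value zero), which is exactly how the later patches decide when to kick CPUs.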

Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/atomic.h | 21 +++++++++++++++++++++
 kernel/sched/core.c    | 14 --------------
 2 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 301de78..6c502cb 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -548,6 +548,27 @@ static inline int atomic_dec_if_positive(atomic_t *v)
 }
 #endif
 
+/**
+ * fetch_or - perform *ptr |= mask and return old value of *ptr
+ * @ptr: pointer to value
+ * @mask: mask to OR on the value
+ *
+ * cmpxchg based fetch_or, macro so it works for different integer types
+ */
+#ifndef fetch_or
+#define fetch_or(ptr, mask)						\
+({	typeof(*(ptr)) __old, __val = *(ptr);				\
+	for (;;) {							\
+		__old = cmpxchg((ptr), __val, __val | (mask));		\
+		if (__old == __val)					\
+			break;						\
+		__val = __old;						\
+	}								\
+	__old;								\
+})
+#endif
+
+
 #ifdef CONFIG_GENERIC_ATOMIC64
 #include <asm-generic/atomic64.h>
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 63d3a24..f1f399e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -453,20 +453,6 @@ static inline void init_hrtick(void)
 }
 #endif	/* CONFIG_SCHED_HRTICK */
 
-/*
- * cmpxchg based fetch_or, macro so it works for different integer types
- */
-#define fetch_or(ptr, val)						\
-({	typeof(*(ptr)) __old, __val = *(ptr);				\
- 	for (;;) {							\
- 		__old = cmpxchg((ptr), __val, __val | (val));		\
- 		if (__old == __val)					\
- 			break;						\
- 		__val = __old;						\
- 	}								\
- 	__old;								\
-})
-
 #if defined(CONFIG_SMP) && defined(TIF_POLLING_NRFLAG)
 /*
  * Atomically set TIF_NEED_RESCHED and test for TIF_POLLING_NRFLAG,
-- 
2.7.0


* [PATCH 2/9] nohz: Implement wide kick on top of irq work
  2016-02-04 17:00 [PATCH 0/9] nohz: Tick dependency mask v5 Frederic Weisbecker
  2016-02-04 17:00 ` [PATCH 1/9] atomic: Export fetch_or() Frederic Weisbecker
@ 2016-02-04 17:00 ` Frederic Weisbecker
  2016-02-04 17:00 ` [PATCH 3/9] nohz: New tick dependency mask Frederic Weisbecker
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-02-04 17:00 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter, Ingo Molnar,
	Viresh Kumar, Rik van Riel

Implement the wide kick on top of irq work instead of
smp_call_function_many(). This simplifies the code and allows the wide
kick to be performed even when IRQs are disabled, without an
asynchronous level in the middle.

This comes at the cost of some more overhead on the slow paths of
features like perf and posix cpu timers, which probably doesn't matter
much for nohz full users.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/time/tick-sched.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9d7a053..f6a980f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -232,24 +232,20 @@ void tick_nohz_full_kick_cpu(int cpu)
 	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
 }
 
-static void nohz_full_kick_ipi(void *info)
-{
-	/* Empty, the tick restart happens on tick_nohz_irq_exit() */
-}
-
 /*
  * Kick all full dynticks CPUs in order to force these to re-evaluate
  * their dependency on the tick and restart it if necessary.
  */
 void tick_nohz_full_kick_all(void)
 {
+	int cpu;
+
 	if (!tick_nohz_full_running)
 		return;
 
 	preempt_disable();
-	smp_call_function_many(tick_nohz_full_mask,
-			       nohz_full_kick_ipi, NULL, false);
-	tick_nohz_full_kick();
+	for_each_cpu_and(cpu, tick_nohz_full_mask, cpu_online_mask)
+		tick_nohz_full_kick_cpu(cpu);
 	preempt_enable();
 }
 
-- 
2.7.0


* [PATCH 3/9] nohz: New tick dependency mask
  2016-02-04 17:00 [PATCH 0/9] nohz: Tick dependency mask v5 Frederic Weisbecker
  2016-02-04 17:00 ` [PATCH 1/9] atomic: Export fetch_or() Frederic Weisbecker
  2016-02-04 17:00 ` [PATCH 2/9] nohz: Implement wide kick on top of irq work Frederic Weisbecker
@ 2016-02-04 17:00 ` Frederic Weisbecker
  2016-02-16  8:03   ` Ingo Molnar
  2016-02-04 17:00 ` [PATCH 4/9] nohz: Use enum code for tick stop failure tracing message Frederic Weisbecker
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2016-02-04 17:00 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter, Ingo Molnar,
	Viresh Kumar, Rik van Riel

The tick dependency is evaluated on every IRQ and context switch. This
consists of a batch of checks which determine whether it is safe to
stop the tick or not. These checks are often split into many details:
posix cpu timers, scheduler, sched clock, perf events... each of which
is made of smaller details: posix cpu timers involve checking process
wide timers then thread wide timers; perf involves checking freq events
then more per-CPU details.

Checking this information asynchronously every time we update the full
dynticks state brings avoidable overhead and a messy layout.

Let's instead introduce tick dependency masks: one for system wide
dependencies (unstable sched clock, freq based perf events), one for CPU
wide dependencies (sched, throttling perf events), and one for
task/signal level dependencies (posix cpu timers). The subsystems are
responsible for setting and clearing their dependency through a set of
APIs that take care of concurrent dependency mask modifications and kick
the targets to restart the relevant CPU tick whenever needed.

This new dependency engine stays beside the old one until all subsystems
having a tick dependency are converted to it.
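The layering described above can be sketched as follows. This is a simplified userspace model, not kernel code: the struct is an illustrative stand-in that flattens task_struct and signal_struct, and the per-CPU mask is passed in directly rather than read from tick_sched.

```c
#include <assert.h>
#include <stdbool.h>

/* Dependency bits, mirroring the enum added by this patch. */
enum tick_dependency_bit {
	TICK_POSIX_TIMER_BIT	= 0,
	TICK_PERF_EVENTS_BIT	= 1,
	TICK_SCHED_BIT		= 2,
	TICK_CLOCK_UNSTABLE_BIT	= 3
};

/* Flattened stand-in for task_struct + its signal_struct. */
struct task_demo {
	unsigned long tick_dependency;		/* per-task */
	unsigned long signal_tick_dependency;	/* per-process */
};

unsigned long global_tick_dependency;		/* system wide */

/* The tick may stop only if no level has a dependency set. */
bool can_stop_full_tick_demo(unsigned long cpu_dep, struct task_demo *t)
{
	if (global_tick_dependency)
		return false;
	if (cpu_dep)			/* per-CPU (tick_sched) level */
		return false;
	if (t->tick_dependency)
		return false;
	if (t->signal_tick_dependency)
		return false;
	return true;
}
```

A single bit set at any of the four levels keeps the tick alive, which is the check can_stop_full_tick() performs in the patch.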

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/sched.h    |   8 +++
 include/linux/tick.h     |  92 +++++++++++++++++++++++++++++
 kernel/time/tick-sched.c | 150 ++++++++++++++++++++++++++++++++++++++++++++---
 kernel/time/tick-sched.h |   1 +
 4 files changed, 244 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a10494a..d482cc8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -719,6 +719,10 @@ struct signal_struct {
 	/* Earliest-expiration cache. */
 	struct task_cputime cputime_expires;
 
+#ifdef CONFIG_NO_HZ_FULL
+	unsigned long tick_dependency;
+#endif
+
 	struct list_head cpu_timers[3];
 
 	struct pid *tty_old_pgrp;
@@ -1542,6 +1546,10 @@ struct task_struct {
 		VTIME_SYS,
 	} vtime_snap_whence;
 #endif
+
+#ifdef CONFIG_NO_HZ_FULL
+	unsigned long tick_dependency;
+#endif
 	unsigned long nvcsw, nivcsw; /* context switch counts */
 	u64 start_time;		/* monotonic time in nsec */
 	u64 real_start_time;	/* boot based time in nsec */
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 97fd4e5..a33adab 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -97,6 +97,18 @@ static inline void tick_broadcast_exit(void)
 	tick_broadcast_oneshot_control(TICK_BROADCAST_EXIT);
 }
 
+enum tick_dependency_bit {
+	TICK_POSIX_TIMER_BIT	= 0,
+	TICK_PERF_EVENTS_BIT	= 1,
+	TICK_SCHED_BIT		= 2,
+	TICK_CLOCK_UNSTABLE_BIT	= 3
+};
+
+#define TICK_POSIX_TIMER_MASK		(1 << TICK_POSIX_TIMER_BIT)
+#define TICK_PERF_EVENTS_MASK		(1 << TICK_PERF_EVENTS_BIT)
+#define TICK_SCHED_MASK			(1 << TICK_SCHED_BIT)
+#define TICK_CLOCK_UNSTABLE_MASK	(1 << TICK_CLOCK_UNSTABLE_BIT)
+
 #ifdef CONFIG_NO_HZ_COMMON
 extern int tick_nohz_enabled;
 extern int tick_nohz_tick_stopped(void);
@@ -154,6 +166,72 @@ static inline int housekeeping_any_cpu(void)
 	return cpumask_any_and(housekeeping_mask, cpu_online_mask);
 }
 
+extern void tick_nohz_set_dep(enum tick_dependency_bit bit);
+extern void tick_nohz_clear_dep(enum tick_dependency_bit bit);
+extern void tick_nohz_set_dep_cpu(int cpu, enum tick_dependency_bit bit);
+extern void tick_nohz_clear_dep_cpu(int cpu, enum tick_dependency_bit bit);
+extern void tick_nohz_set_dep_task(struct task_struct *tsk,
+				   enum tick_dependency_bit bit);
+extern void tick_nohz_clear_dep_task(struct task_struct *tsk,
+				     enum tick_dependency_bit bit);
+extern void tick_nohz_set_dep_signal(struct signal_struct *signal,
+				     enum tick_dependency_bit bit);
+extern void tick_nohz_clear_dep_signal(struct signal_struct *signal,
+				       enum tick_dependency_bit bit);
+
+/*
+ * The below are tick_nohz_[set,clear]_dep() wrappers that optimize off-cases
+ * on top of static keys.
+ */
+static inline void tick_set_dep(enum tick_dependency_bit bit)
+{
+	if (tick_nohz_full_enabled())
+		tick_nohz_set_dep(bit);
+}
+
+static inline void tick_clear_dep(enum tick_dependency_bit bit)
+{
+	if (tick_nohz_full_enabled())
+		tick_nohz_clear_dep(bit);
+}
+
+static inline void tick_set_dep_cpu(int cpu, enum tick_dependency_bit bit)
+{
+	if (tick_nohz_full_cpu(cpu))
+		tick_nohz_set_dep_cpu(cpu, bit);
+}
+
+static inline void tick_clear_dep_cpu(int cpu, enum tick_dependency_bit bit)
+{
+	if (tick_nohz_full_cpu(cpu))
+		tick_nohz_clear_dep_cpu(cpu, bit);
+}
+
+static inline void tick_set_dep_task(struct task_struct *tsk,
+				     enum tick_dependency_bit bit)
+{
+	if (tick_nohz_full_enabled())
+		tick_nohz_set_dep_task(tsk, bit);
+}
+static inline void tick_clear_dep_task(struct task_struct *tsk,
+				       enum tick_dependency_bit bit)
+{
+	if (tick_nohz_full_enabled())
+		tick_nohz_clear_dep_task(tsk, bit);
+}
+static inline void tick_set_dep_signal(struct signal_struct *signal,
+				       enum tick_dependency_bit bit)
+{
+	if (tick_nohz_full_enabled())
+		tick_nohz_set_dep_signal(signal, bit);
+}
+static inline void tick_clear_dep_signal(struct signal_struct *signal,
+					 enum tick_dependency_bit bit)
+{
+	if (tick_nohz_full_enabled())
+		tick_nohz_clear_dep_signal(signal, bit);
+}
+
 extern void tick_nohz_full_kick(void);
 extern void tick_nohz_full_kick_cpu(int cpu);
 extern void tick_nohz_full_kick_all(void);
@@ -166,6 +244,20 @@ static inline int housekeeping_any_cpu(void)
 static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
 static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
+
+static inline void tick_set_dep(enum tick_dependency_bit bit) { }
+static inline void tick_clear_dep(enum tick_dependency_bit bit) { }
+static inline void tick_set_dep_cpu(int cpu, enum tick_dependency_bit bit) { }
+static inline void tick_clear_dep_cpu(int cpu, enum tick_dependency_bit bit) { }
+static inline void tick_set_dep_task(struct task_struct *tsk,
+				     enum tick_dependency_bit bit) { }
+static inline void tick_clear_dep_task(struct task_struct *tsk,
+				       enum tick_dependency_bit bit) { }
+static inline void tick_set_dep_signal(struct signal_struct *signal,
+				       enum tick_dependency_bit bit) { }
+static inline void tick_clear_dep_signal(struct signal_struct *signal,
+					 enum tick_dependency_bit bit) { }
+
 static inline void tick_nohz_full_kick_cpu(int cpu) { }
 static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index f6a980f..8f0fc57 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -156,11 +156,53 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 cpumask_var_t tick_nohz_full_mask;
 cpumask_var_t housekeeping_mask;
 bool tick_nohz_full_running;
+static unsigned long tick_dependency;
 
-static bool can_stop_full_tick(void)
+static void trace_tick_dependency(unsigned long dep)
+{
+	if (dep & TICK_POSIX_TIMER_MASK) {
+		trace_tick_stop(0, "posix timers running\n");
+		return;
+	}
+
+	if (dep & TICK_PERF_EVENTS_MASK) {
+		trace_tick_stop(0, "perf events running\n");
+		return;
+	}
+
+	if (dep & TICK_SCHED_MASK) {
+		trace_tick_stop(0, "more than 1 task in runqueue\n");
+		return;
+	}
+
+	if (dep & TICK_CLOCK_UNSTABLE_MASK)
+		trace_tick_stop(0, "unstable sched clock\n");
+}
+
+static bool can_stop_full_tick(struct tick_sched *ts)
 {
 	WARN_ON_ONCE(!irqs_disabled());
 
+	if (tick_dependency) {
+		trace_tick_dependency(tick_dependency);
+		return false;
+	}
+
+	if (ts->tick_dependency) {
+		trace_tick_dependency(ts->tick_dependency);
+		return false;
+	}
+
+	if (current->tick_dependency) {
+		trace_tick_dependency(current->tick_dependency);
+		return false;
+	}
+
+	if (current->signal->tick_dependency) {
+		trace_tick_dependency(current->signal->tick_dependency);
+		return false;
+	}
+
 	if (!sched_can_stop_tick()) {
 		trace_tick_stop(0, "more than 1 task in runqueue\n");
 		return false;
@@ -176,9 +218,10 @@ static bool can_stop_full_tick(void)
 		return false;
 	}
 
-	/* sched_clock_tick() needs us? */
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
 	/*
+	 * sched_clock_tick() needs us?
+	 *
 	 * TODO: kick full dynticks CPUs when
 	 * sched_clock_stable is set.
 	 */
@@ -197,13 +240,13 @@ static bool can_stop_full_tick(void)
 	return true;
 }
 
-static void nohz_full_kick_work_func(struct irq_work *work)
+static void nohz_full_kick_func(struct irq_work *work)
 {
 	/* Empty, the tick restart happens on tick_nohz_irq_exit() */
 }
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
-	.func = nohz_full_kick_work_func,
+	.func = nohz_full_kick_func,
 };
 
 /*
@@ -249,6 +292,95 @@ void tick_nohz_full_kick_all(void)
 	preempt_enable();
 }
 
+static void tick_nohz_set_dep_all(unsigned long *dep,
+				  enum tick_dependency_bit bit)
+{
+	unsigned long prev;
+
+	prev = fetch_or(dep, BIT_MASK(bit));
+	if (!prev)
+		tick_nohz_full_kick_all();
+}
+
+/*
+ * Set a global tick dependency. Used by perf events that rely on freq and
+ * by unstable clock.
+ */
+void tick_nohz_set_dep(enum tick_dependency_bit bit)
+{
+	tick_nohz_set_dep_all(&tick_dependency, bit);
+}
+
+void tick_nohz_clear_dep(enum tick_dependency_bit bit)
+{
+	clear_bit(bit, &tick_dependency);
+}
+
+/*
+ * Set per-CPU tick dependency. Used by scheduler and perf events in order to
+ * manage events throttling.
+ */
+void tick_nohz_set_dep_cpu(int cpu, enum tick_dependency_bit bit)
+{
+	unsigned long prev;
+	struct tick_sched *ts;
+
+	ts = per_cpu_ptr(&tick_cpu_sched, cpu);
+
+	prev = fetch_or(&ts->tick_dependency, BIT_MASK(bit));
+	if (!prev) {
+		preempt_disable();
+		/* Perf needs local kick that is NMI safe */
+		if (cpu == smp_processor_id()) {
+			tick_nohz_full_kick();
+		} else {
+			/* Remote irq work not NMI-safe */
+			if (!WARN_ON_ONCE(in_nmi()))
+				tick_nohz_full_kick_cpu(cpu);
+		}
+		preempt_enable();
+	}
+}
+
+void tick_nohz_clear_dep_cpu(int cpu, enum tick_dependency_bit bit)
+{
+	struct tick_sched *ts = per_cpu_ptr(&tick_cpu_sched, cpu);
+
+	clear_bit(bit, &ts->tick_dependency);
+}
+
+/*
+ * Set a per-task tick dependency. Posix CPU timers need this in order to elapse
+ * per task timers.
+ */
+void tick_nohz_set_dep_task(struct task_struct *tsk, enum tick_dependency_bit bit)
+{
+	/*
+	 * We could optimize this with just kicking the target running the task
+	 * if that noise matters for nohz full users.
+	 */
+	tick_nohz_set_dep_all(&tsk->tick_dependency, bit);
+}
+
+void tick_nohz_clear_dep_task(struct task_struct *tsk, enum tick_dependency_bit bit)
+{
+	clear_bit(bit, &tsk->tick_dependency);
+}
+
+/*
+ * Set a per-taskgroup tick dependency. Posix CPU timers need this in order to elapse
+ * per process timers.
+ */
+void tick_nohz_set_dep_signal(struct signal_struct *sig, enum tick_dependency_bit bit)
+{
+	tick_nohz_set_dep_all(&sig->tick_dependency, bit);
+}
+
+void tick_nohz_clear_dep_signal(struct signal_struct *sig, enum tick_dependency_bit bit)
+{
+	clear_bit(bit, &sig->tick_dependency);
+}
+
 /*
  * Re-evaluate the need for the tick as we switch the current task.
  * It might need the tick due to per task/process properties:
@@ -257,15 +389,19 @@ void tick_nohz_full_kick_all(void)
 void __tick_nohz_task_switch(void)
 {
 	unsigned long flags;
+	struct tick_sched *ts;
 
 	local_irq_save(flags);
 
 	if (!tick_nohz_full_cpu(smp_processor_id()))
 		goto out;
 
-	if (tick_nohz_tick_stopped() && !can_stop_full_tick())
-		tick_nohz_full_kick();
+	ts = this_cpu_ptr(&tick_cpu_sched);
 
+	if (ts->tick_stopped) {
+		if (current->tick_dependency || current->signal->tick_dependency)
+			tick_nohz_full_kick();
+	}
 out:
 	local_irq_restore(flags);
 }
@@ -734,7 +870,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
 	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
 		return;
 
-	if (can_stop_full_tick())
+	if (can_stop_full_tick(ts))
 		tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
 	else if (ts->tick_stopped)
 		tick_nohz_restart_sched_tick(ts, ktime_get(), 1);
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index a4a8d4e..d327f70 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -60,6 +60,7 @@ struct tick_sched {
 	u64				next_timer;
 	ktime_t				idle_expires;
 	int				do_timer_last;
+	unsigned long			tick_dependency;
 };
 
 extern struct tick_sched *tick_get_tick_sched(int cpu);
-- 
2.7.0


* [PATCH 4/9] nohz: Use enum code for tick stop failure tracing message
  2016-02-04 17:00 [PATCH 0/9] nohz: Tick dependency mask v5 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2016-02-04 17:00 ` [PATCH 3/9] nohz: New tick dependency mask Frederic Weisbecker
@ 2016-02-04 17:00 ` Frederic Weisbecker
  2016-02-04 17:00 ` [PATCH 5/9] perf: Migrate perf to use new tick dependency mask model Frederic Weisbecker
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-02-04 17:00 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter, Ingo Molnar,
	Viresh Kumar, Rik van Riel

Using an enum code instead of a string makes nohz tracing more
lightweight, standard and easier to parse.

Examples:

       user_loop-2904  [007] d..1   517.701126: tick_stop: success=1 dependency=NONE
       user_loop-2904  [007] dn.1   518.021181: tick_stop: success=0 dependency=SCHED
    posix_timers-6142  [007] d..1  1739.027400: tick_stop: success=0 dependency=POSIX_TIMER
       user_loop-5463  [007] dN.1  1185.931939: tick_stop: success=0 dependency=PERF_EVENTS
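The symbolic names in the examples above are resolved at trace read time, in the spirit of __print_symbolic(). A minimal userspace sketch of that mask-to-name mapping (masks mirror the patch; the function is an illustrative analogue, not the trace-event macro itself):

```c
#include <assert.h>
#include <string.h>

#define TICK_NONE_MASK			0
#define TICK_POSIX_TIMER_MASK		(1 << 0)
#define TICK_PERF_EVENTS_MASK		(1 << 1)
#define TICK_SCHED_MASK			(1 << 2)
#define TICK_CLOCK_UNSTABLE_MASK	(1 << 3)

/* Map a recorded dependency value to the name shown in the trace. */
const char *show_tick_dep_name_demo(int dep)
{
	switch (dep) {
	case TICK_NONE_MASK:		return "NONE";
	case TICK_POSIX_TIMER_MASK:	return "POSIX_TIMER";
	case TICK_PERF_EVENTS_MASK:	return "PERF_EVENTS";
	case TICK_SCHED_MASK:		return "SCHED";
	case TICK_CLOCK_UNSTABLE_MASK:	return "CLOCK_UNSTABLE";
	default:			return "UNKNOWN";
	}
}
```

Recording a small integer and translating it on output is what makes the event cheaper than assigning a string per trace hit.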

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/tick.h         |  1 +
 include/trace/events/timer.h | 36 +++++++++++++++++++++++++++++++-----
 kernel/time/tick-sched.c     | 18 +++++++++---------
 3 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index a33adab..9ae7ebf 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -104,6 +104,7 @@ enum tick_dependency_bit {
 	TICK_CLOCK_UNSTABLE_BIT	= 3
 };
 
+#define TICK_NONE_MASK			0
 #define TICK_POSIX_TIMER_MASK		(1 << TICK_POSIX_TIMER_BIT)
 #define TICK_PERF_EVENTS_MASK		(1 << TICK_PERF_EVENTS_BIT)
 #define TICK_SCHED_MASK			(1 << TICK_SCHED_BIT)
diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 073b9ac..2868fa5 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -328,23 +328,49 @@ TRACE_EVENT(itimer_expire,
 );
 
 #ifdef CONFIG_NO_HZ_COMMON
+
+#define TICK_DEP_NAMES					\
+		tick_dep_name(NONE)			\
+		tick_dep_name(POSIX_TIMER)		\
+		tick_dep_name(PERF_EVENTS)		\
+		tick_dep_name(SCHED)			\
+		tick_dep_name_end(CLOCK_UNSTABLE)
+
+#undef tick_dep_name
+#undef tick_dep_name_end
+
+#define tick_dep_name(sdep) TRACE_DEFINE_ENUM(TICK_##sdep##_MASK);
+#define tick_dep_name_end(sdep)  TRACE_DEFINE_ENUM(TICK_##sdep##_MASK);
+
+TICK_DEP_NAMES
+
+#undef tick_dep_name
+#undef tick_dep_name_end
+
+#define tick_dep_name(sdep) { TICK_##sdep##_MASK, #sdep },
+#define tick_dep_name_end(sdep) { TICK_##sdep##_MASK, #sdep }
+
+#define show_tick_dep_name(val)				\
+	__print_symbolic(val, TICK_DEP_NAMES)
+
 TRACE_EVENT(tick_stop,
 
-	TP_PROTO(int success, char *error_msg),
+	TP_PROTO(int success, int dependency),
 
-	TP_ARGS(success, error_msg),
+	TP_ARGS(success, dependency),
 
 	TP_STRUCT__entry(
 		__field( int ,		success	)
-		__string( msg, 		error_msg )
+		__field( int ,		dependency )
 	),
 
 	TP_fast_assign(
 		__entry->success	= success;
-		__assign_str(msg, error_msg);
+		__entry->dependency	= dependency;
 	),
 
-	TP_printk("success=%s msg=%s",  __entry->success ? "yes" : "no", __get_str(msg))
+	TP_printk("success=%d dependency=%s",  __entry->success, \
+			show_tick_dep_name(__entry->dependency))
 );
 #endif
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8f0fc57..f258381 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -161,22 +161,22 @@ static unsigned long tick_dependency;
 static void trace_tick_dependency(unsigned long dep)
 {
 	if (dep & TICK_POSIX_TIMER_MASK) {
-		trace_tick_stop(0, "posix timers running\n");
+		trace_tick_stop(0, TICK_POSIX_TIMER_MASK);
 		return;
 	}
 
 	if (dep & TICK_PERF_EVENTS_MASK) {
-		trace_tick_stop(0, "perf events running\n");
+		trace_tick_stop(0, TICK_PERF_EVENTS_MASK);
 		return;
 	}
 
 	if (dep & TICK_SCHED_MASK) {
-		trace_tick_stop(0, "more than 1 task in runqueue\n");
+		trace_tick_stop(0, TICK_SCHED_MASK);
 		return;
 	}
 
 	if (dep & TICK_CLOCK_UNSTABLE_MASK)
-		trace_tick_stop(0, "unstable sched clock\n");
+		trace_tick_stop(0, TICK_CLOCK_UNSTABLE_MASK);
 }
 
 static bool can_stop_full_tick(struct tick_sched *ts)
@@ -204,17 +204,17 @@ static bool can_stop_full_tick(struct tick_sched *ts)
 	}
 
 	if (!sched_can_stop_tick()) {
-		trace_tick_stop(0, "more than 1 task in runqueue\n");
+		trace_tick_stop(0, TICK_SCHED_MASK);
 		return false;
 	}
 
 	if (!posix_cpu_timers_can_stop_tick(current)) {
-		trace_tick_stop(0, "posix timers running\n");
+		trace_tick_stop(0, TICK_POSIX_TIMER_MASK);
 		return false;
 	}
 
 	if (!perf_event_can_stop_tick()) {
-		trace_tick_stop(0, "perf events running\n");
+		trace_tick_stop(0, TICK_PERF_EVENTS_MASK);
 		return false;
 	}
 
@@ -226,7 +226,7 @@ static bool can_stop_full_tick(struct tick_sched *ts)
 	 * sched_clock_stable is set.
 	 */
 	if (!sched_clock_stable()) {
-		trace_tick_stop(0, "unstable sched clock\n");
+		trace_tick_stop(0, TICK_CLOCK_UNSTABLE_MASK);
 		/*
 		 * Don't allow the user to think they can get
 		 * full NO_HZ with this machine.
@@ -819,7 +819,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
-		trace_tick_stop(1, " ");
+		trace_tick_stop(1, TICK_NONE_MASK);
 	}
 
 	/*
-- 
2.7.0


* [PATCH 5/9] perf: Migrate perf to use new tick dependency mask model
  2016-02-04 17:00 [PATCH 0/9] nohz: Tick dependency mask v5 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2016-02-04 17:00 ` [PATCH 4/9] nohz: Use enum code for tick stop failure tracing message Frederic Weisbecker
@ 2016-02-04 17:00 ` Frederic Weisbecker
  2016-02-04 17:00 ` [PATCH 6/9] sched: Account rr tasks Frederic Weisbecker
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-02-04 17:00 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter, Ingo Molnar,
	Viresh Kumar, Rik van Riel

Instead of providing asynchronous checks for the nohz subsystem to verify
perf event tick dependency, migrate perf to the new mask.

Perf needs the tick for two situations:

1) Freq events. We could set the tick dependency when those are
installed on a CPU context. But setting a global dependency on top of
the global freq events accounting is much easier. If people want that
to be optimized, we can still refine it at the per-CPU tick dependency
level. This patch doesn't change the current behaviour anyway.

2) Throttled events: this is a per-cpu dependency.
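The first/last accounting behind 1) can be sketched like this. It is a single-threaded illustration only: a plain counter stands in for nr_freq_events, and the nr_freq_lock that the patch takes to avoid racing account against unaccount is elided.

```c
#include <assert.h>

#define TICK_PERF_EVENTS_BIT 1

int nr_freq_events;
unsigned long tick_dependency_mask;

/* First freq event installed: set the global tick dependency. */
void account_freq_event_demo(void)
{
	if (++nr_freq_events == 1)
		tick_dependency_mask |= 1UL << TICK_PERF_EVENTS_BIT;
}

/* Last freq event removed: clear the global tick dependency. */
void unaccount_freq_event_demo(void)
{
	if (--nr_freq_events == 0)
		tick_dependency_mask &= ~(1UL << TICK_PERF_EVENTS_BIT);
}
```

The dependency stays set while any freq event exists and drops only when the count returns to zero.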

Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/perf_event.h |  6 -----
 include/linux/tick.h       |  2 --
 kernel/events/core.c       | 65 ++++++++++++++++++++++++++++++++++------------
 kernel/time/tick-sched.c   |  8 +-----
 4 files changed, 49 insertions(+), 32 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f9828a4..15bc5a6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1111,12 +1111,6 @@ static inline void perf_event_task_tick(void)				{ }
 static inline int perf_event_release_kernel(struct perf_event *event)	{ return 0; }
 #endif
 
-#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_NO_HZ_FULL)
-extern bool perf_event_can_stop_tick(void);
-#else
-static inline bool perf_event_can_stop_tick(void)			{ return true; }
-#endif
-
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
 extern void perf_restore_debug_store(void);
 #else
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 9ae7ebf..994c5be 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -233,7 +233,6 @@ static inline void tick_clear_dep_signal(struct signal_struct *signal,
 		tick_nohz_clear_dep_signal(signal, bit);
 }
 
-extern void tick_nohz_full_kick(void);
 extern void tick_nohz_full_kick_cpu(int cpu);
 extern void tick_nohz_full_kick_all(void);
 extern void __tick_nohz_task_switch(void);
@@ -260,7 +259,6 @@ static inline void tick_clear_dep_signal(struct signal_struct *signal,
 					 enum tick_dependency_bit bit) { }
 
 static inline void tick_nohz_full_kick_cpu(int cpu) { }
-static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
 static inline void __tick_nohz_task_switch(void) { }
 #endif
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 06ae52e..cedfbfe 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3051,17 +3051,6 @@ done:
 	return rotate;
 }
 
-#ifdef CONFIG_NO_HZ_FULL
-bool perf_event_can_stop_tick(void)
-{
-	if (atomic_read(&nr_freq_events) ||
-	    __this_cpu_read(perf_throttled_count))
-		return false;
-	else
-		return true;
-}
-#endif
-
 void perf_event_task_tick(void)
 {
 	struct list_head *head = this_cpu_ptr(&active_ctx_list);
@@ -3072,6 +3061,7 @@ void perf_event_task_tick(void)
 
 	__this_cpu_inc(perf_throttled_seq);
 	throttled = __this_cpu_xchg(perf_throttled_count, 0);
+	tick_clear_dep_cpu(smp_processor_id(), TICK_PERF_EVENTS_BIT);
 
 	list_for_each_entry_safe(ctx, tmp, head, active_ctx_list)
 		perf_adjust_freq_unthr_context(ctx, throttled);
@@ -3519,6 +3509,28 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
 		atomic_dec(&per_cpu(perf_cgroup_events, cpu));
 }
 
+#ifdef CONFIG_NO_HZ_FULL
+static DEFINE_SPINLOCK(nr_freq_lock);
+#endif
+
+static void unaccount_freq_event_nohz(void)
+{
+#ifdef CONFIG_NO_HZ_FULL
+	spin_lock(&nr_freq_lock);
+	if (atomic_dec_and_test(&nr_freq_events))
+		tick_nohz_clear_dep(TICK_PERF_EVENTS_BIT);
+	spin_unlock(&nr_freq_lock);
+#endif
+}
+
+static void unaccount_freq_event(void)
+{
+	if (tick_nohz_full_enabled())
+		unaccount_freq_event_nohz();
+	else
+		atomic_dec(&nr_freq_events);
+}
+
 static void unaccount_event(struct perf_event *event)
 {
 	if (event->parent)
@@ -3533,7 +3545,7 @@ static void unaccount_event(struct perf_event *event)
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
-		atomic_dec(&nr_freq_events);
+		unaccount_freq_event();
 	if (event->attr.context_switch) {
 		static_key_slow_dec_deferred(&perf_sched_events);
 		atomic_dec(&nr_switch_events);
@@ -6359,9 +6371,9 @@ static int __perf_event_overflow(struct perf_event *event,
 		if (unlikely(throttle
 			     && hwc->interrupts >= max_samples_per_tick)) {
 			__this_cpu_inc(perf_throttled_count);
+			tick_set_dep_cpu(smp_processor_id(), TICK_PERF_EVENTS_BIT);
 			hwc->interrupts = MAX_INTERRUPTS;
 			perf_log_throttle(event, 0);
-			tick_nohz_full_kick();
 			ret = 1;
 		}
 	}
@@ -7751,6 +7763,27 @@ static void account_event_cpu(struct perf_event *event, int cpu)
 		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
 }
 
+/* Freq events need the tick to stay alive (see perf_event_task_tick). */
+static void account_freq_event_nohz(void)
+{
+#ifdef CONFIG_NO_HZ_FULL
+	/* Lock so we don't race with concurrent unaccount */
+	spin_lock(&nr_freq_lock);
+	if (atomic_inc_return(&nr_freq_events) == 1)
+		tick_nohz_set_dep(TICK_PERF_EVENTS_BIT);
+	spin_unlock(&nr_freq_lock);
+#endif
+}
+
+static void account_freq_event(void)
+{
+	if (tick_nohz_full_enabled())
+		account_freq_event_nohz();
+	else
+		atomic_inc(&nr_freq_events);
+}
+
+
 static void account_event(struct perf_event *event)
 {
 	if (event->parent)
@@ -7764,10 +7797,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
-	if (event->attr.freq) {
-		if (atomic_inc_return(&nr_freq_events) == 1)
-			tick_nohz_full_kick_all();
-	}
+	if (event->attr.freq)
+		account_freq_event();
 	if (event->attr.context_switch) {
 		atomic_inc(&nr_switch_events);
 		static_key_slow_inc(&perf_sched_events.key);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index f258381..6fdb55d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -22,7 +22,6 @@
 #include <linux/module.h>
 #include <linux/irq_work.h>
 #include <linux/posix-timers.h>
-#include <linux/perf_event.h>
 #include <linux/context_tracking.h>
 
 #include <asm/irq_regs.h>
@@ -213,11 +212,6 @@ static bool can_stop_full_tick(struct tick_sched *ts)
 		return false;
 	}
 
-	if (!perf_event_can_stop_tick()) {
-		trace_tick_stop(0, TICK_PERF_EVENTS_MASK);
-		return false;
-	}
-
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
 	/*
 	 * sched_clock_tick() needs us?
@@ -255,7 +249,7 @@ static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
  * This kick, unlike tick_nohz_full_kick_cpu() and tick_nohz_full_kick_all(),
  * is NMI safe.
  */
-void tick_nohz_full_kick(void)
+static void tick_nohz_full_kick(void)
 {
 	if (!tick_nohz_full_cpu(smp_processor_id()))
 		return;
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/9] sched: Account rr tasks
  2016-02-04 17:00 [PATCH 0/9] nohz: Tick dependency mask v5 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2016-02-04 17:00 ` [PATCH 5/9] perf: Migrate perf to use new tick dependency mask model Frederic Weisbecker
@ 2016-02-04 17:00 ` Frederic Weisbecker
  2016-02-04 17:00 ` [PATCH 7/9] sched: Migrate sched to use new tick dependency mask model Frederic Weisbecker
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-02-04 17:00 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter, Ingo Molnar,
	Viresh Kumar, Rik van Riel

In order to evaluate the scheduler tick dependency without probing
context switches, we need to know how many SCHED_RR and SCHED_FIFO tasks
are enqueued, as those policies don't have the same preemption
requirements.

To prepare for that, let's account SCHED_RR tasks; we'll then be able to
deduce the number of SCHED_FIFO tasks from it and the total number of RT
tasks in the runqueue.
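
The deduction above can be sketched outside the kernel; the struct below
is an illustrative stand-in for the rt_rq counters touched by this patch,
not the real kernel layout:

```c
#include <assert.h>

/*
 * Sketch only: an illustrative subset of the rt_rq counters maintained
 * by this patch; the struct name and layout are not the kernel's.
 */
struct rt_rq_counts {
	unsigned int rt_nr_running;	/* all queued RT tasks (FIFO + RR) */
	unsigned int rr_nr_running;	/* queued SCHED_RR tasks only */
};

/* SCHED_FIFO count deduced from the two maintained counters. */
static unsigned int fifo_nr_running(const struct rt_rq_counts *rt)
{
	return rt->rt_nr_running - rt->rr_nr_running;
}
```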

Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/rt.c    | 16 ++++++++++++++++
 kernel/sched/sched.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8ec86ab..3f1fcff 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1142,12 +1142,27 @@ unsigned int rt_se_nr_running(struct sched_rt_entity *rt_se)
 }
 
 static inline
+unsigned int rt_se_rr_nr_running(struct sched_rt_entity *rt_se)
+{
+	struct rt_rq *group_rq = group_rt_rq(rt_se);
+	struct task_struct *tsk;
+
+	if (group_rq)
+		return group_rq->rr_nr_running;
+
+	tsk = rt_task_of(rt_se);
+
+	return (tsk->policy == SCHED_RR) ? 1 : 0;
+}
+
+static inline
 void inc_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 {
 	int prio = rt_se_prio(rt_se);
 
 	WARN_ON(!rt_prio(prio));
 	rt_rq->rt_nr_running += rt_se_nr_running(rt_se);
+	rt_rq->rr_nr_running += rt_se_rr_nr_running(rt_se);
 
 	inc_rt_prio(rt_rq, prio);
 	inc_rt_migration(rt_se, rt_rq);
@@ -1160,6 +1175,7 @@ void dec_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 	WARN_ON(!rt_prio(rt_se_prio(rt_se)));
 	WARN_ON(!rt_rq->rt_nr_running);
 	rt_rq->rt_nr_running -= rt_se_nr_running(rt_se);
+	rt_rq->rr_nr_running -= rt_se_rr_nr_running(rt_se);
 
 	dec_rt_prio(rt_rq, rt_se_prio(rt_se));
 	dec_rt_migration(rt_se, rt_rq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 10f1637..f0abfce 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -450,6 +450,7 @@ static inline int rt_bandwidth_enabled(void)
 struct rt_rq {
 	struct rt_prio_array active;
 	unsigned int rt_nr_running;
+	unsigned int rr_nr_running;
 #if defined CONFIG_SMP || defined CONFIG_RT_GROUP_SCHED
 	struct {
 		int curr; /* highest queued rt task prio */
-- 
2.7.0

* [PATCH 7/9] sched: Migrate sched to use new tick dependency mask model
  2016-02-04 17:00 [PATCH 0/9] nohz: Tick dependency mask v5 Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2016-02-04 17:00 ` [PATCH 6/9] sched: Account rr tasks Frederic Weisbecker
@ 2016-02-04 17:00 ` Frederic Weisbecker
  2016-02-04 17:00 ` [PATCH 8/9] posix-cpu-timers: Migrate " Frederic Weisbecker
  2016-02-04 17:00 ` [PATCH 9/9] sched-clock: " Frederic Weisbecker
  8 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-02-04 17:00 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter, Ingo Molnar,
	Viresh Kumar, Rik van Riel

Instead of providing asynchronous checks for the nohz subsystem to verify
sched tick dependency, migrate sched to the new mask.

Every time a task is enqueued or dequeued, we evaluate the state of the
tick dependency based on the policies of the tasks in the runqueue, in
order of priority:

SCHED_DEADLINE: Need the tick in order to periodically check for runtime
SCHED_FIFO    : Don't need the tick (no round-robin)
SCHED_RR      : Need the tick if more than 1 task of the same priority
                for round robin (simplified by checking whether there is
                more than one SCHED_RR task, regardless of priority).
SCHED_NORMAL  : Need the tick if more than 1 task for round-robin.

We could optimize this further with one flag per sched policy in the tick
dependency mask, performing only the checks relevant to the policy
affected by an enqueue/dequeue operation.

Since the checks aren't based on the current task anymore, we could get
rid of the task switch hook, but it's still needed for posix cpu
timers.
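
Condensed to its decision logic, the runqueue-based check introduced
below reads roughly as follows. This is a user-space sketch over
illustrative counter fields, mirroring the priority order above; the
real code operates on struct rq:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch only: illustrative per-runqueue counters standing in for the
 * kernel's rq->dl, rq->rt and rq->cfs fields.
 */
struct rq_counts {
	unsigned int dl_nr_running;	/* SCHED_DEADLINE tasks */
	unsigned int rt_nr_running;	/* SCHED_FIFO + SCHED_RR tasks */
	unsigned int rr_nr_running;	/* SCHED_RR tasks */
	unsigned int cfs_nr_running;	/* SCHED_NORMAL tasks */
};

static bool can_stop_tick(const struct rq_counts *rq)
{
	/* Deadline tasks, even if single, need the tick. */
	if (rq->dl_nr_running)
		return false;

	/* A FIFO task is the highest priority runner: no tick needed. */
	if (rq->rt_nr_running - rq->rr_nr_running)
		return true;

	/* RR tasks time-slice among themselves: tick needed if > 1. */
	if (rq->rr_nr_running)
		return rq->rr_nr_running == 1;

	/* Normal tasks need periodic preemption checks if > 1. */
	return rq->cfs_nr_running <= 1;
}
```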

Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/sched.h    |  3 ---
 kernel/sched/core.c      | 35 ++++++++++++++++++++---------------
 kernel/sched/sched.h     | 47 +++++++++++++++++++++++++++++++++--------------
 kernel/time/tick-sched.c |  5 -----
 4 files changed, 53 insertions(+), 37 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d482cc8..34bc493 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2364,10 +2364,7 @@ static inline void wake_up_nohz_cpu(int cpu) { }
 #endif
 
 #ifdef CONFIG_NO_HZ_FULL
-extern bool sched_can_stop_tick(void);
 extern u64 scheduler_tick_max_deferment(void);
-#else
-static inline bool sched_can_stop_tick(void) { return false; }
 #endif
 
 #ifdef CONFIG_SCHED_AUTOGROUP
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f1f399e..1239c20 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -701,31 +701,36 @@ static inline bool got_nohz_idle_kick(void)
 #endif /* CONFIG_NO_HZ_COMMON */
 
 #ifdef CONFIG_NO_HZ_FULL
-bool sched_can_stop_tick(void)
+bool sched_can_stop_tick(struct rq *rq)
 {
+	int fifo_nr_running;
+
+	/* Deadline tasks, even if single, need the tick */
+	if (rq->dl.dl_nr_running)
+		return false;
+
 	/*
-	 * FIFO realtime policy runs the highest priority task. Other runnable
-	 * tasks are of a lower priority. The scheduler tick does nothing.
+	 * FIFO realtime policy runs the highest priority task (after DEADLINE).
+	 * Other runnable tasks are of a lower priority. The scheduler tick
+	 * isn't needed.
 	 */
-	if (current->policy == SCHED_FIFO)
+	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
+	if (fifo_nr_running)
 		return true;
 
 	/*
 	 * Round-robin realtime tasks time slice with other tasks at the same
-	 * realtime priority. Is this task the only one at this priority?
+	 * realtime priority.
 	 */
-	if (current->policy == SCHED_RR) {
-		struct sched_rt_entity *rt_se = &current->rt;
-
-		return list_is_singular(&rt_se->run_list);
+	if (rq->rt.rr_nr_running) {
+		if (rq->rt.rr_nr_running == 1)
+			return true;
+		else
+			return false;
 	}
 
-	/*
-	 * More than one running task need preemption.
-	 * nr_running update is assumed to be visible
-	 * after IPI is sent from wakers.
-	 */
-	if (this_rq()->nr_running > 1)
+	/* Normal multitasking need periodic preemption checks */
+	if (rq->cfs.nr_running > 1)
 		return false;
 
 	return true;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f0abfce..f9e1a94 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1279,6 +1279,35 @@ unsigned long to_ratio(u64 period, u64 runtime);
 
 extern void init_entity_runnable_average(struct sched_entity *se);
 
+#ifdef CONFIG_NO_HZ_FULL
+extern bool sched_can_stop_tick(struct rq *rq);
+
+/*
+ * Tick may be needed by tasks in the runqueue depending on their policy and
+ * requirements. If tick is needed, lets send the target an IPI to kick it out of
+ * nohz mode if necessary.
+ */
+static inline void sched_update_tick_dependency(struct rq *rq)
+{
+	int cpu;
+
+	if (!tick_nohz_full_enabled())
+		return;
+
+	cpu = cpu_of(rq);
+
+	if (!tick_nohz_full_cpu(cpu))
+		return;
+
+	if (sched_can_stop_tick(rq))
+		tick_nohz_clear_dep_cpu(cpu, TICK_SCHED_BIT);
+	else
+		tick_nohz_set_dep_cpu(cpu, TICK_SCHED_BIT);
+}
+#else
+static inline void sched_update_tick_dependency(struct rq *rq) { }
+#endif
+
 static inline void add_nr_running(struct rq *rq, unsigned count)
 {
 	unsigned prev_nr = rq->nr_running;
@@ -1290,26 +1319,16 @@ static inline void add_nr_running(struct rq *rq, unsigned count)
 		if (!rq->rd->overload)
 			rq->rd->overload = true;
 #endif
-
-#ifdef CONFIG_NO_HZ_FULL
-		if (tick_nohz_full_cpu(rq->cpu)) {
-			/*
-			 * Tick is needed if more than one task runs on a CPU.
-			 * Send the target an IPI to kick it out of nohz mode.
-			 *
-			 * We assume that IPI implies full memory barrier and the
-			 * new value of rq->nr_running is visible on reception
-			 * from the target.
-			 */
-			tick_nohz_full_kick_cpu(rq->cpu);
-		}
-#endif
 	}
+
+	sched_update_tick_dependency(rq);
 }
 
 static inline void sub_nr_running(struct rq *rq, unsigned count)
 {
 	rq->nr_running -= count;
+	/* Check if we still need preemption */
+	sched_update_tick_dependency(rq);
 }
 
 static inline void rq_last_tick_reset(struct rq *rq)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6fdb55d..64f0469 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -202,11 +202,6 @@ static bool can_stop_full_tick(struct tick_sched *ts)
 		return false;
 	}
 
-	if (!sched_can_stop_tick()) {
-		trace_tick_stop(0, TICK_SCHED_MASK);
-		return false;
-	}
-
 	if (!posix_cpu_timers_can_stop_tick(current)) {
 		trace_tick_stop(0, TICK_POSIX_TIMER_MASK);
 		return false;
-- 
2.7.0

* [PATCH 8/9] posix-cpu-timers: Migrate to use new tick dependency mask model
  2016-02-04 17:00 [PATCH 0/9] nohz: Tick dependency mask v5 Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2016-02-04 17:00 ` [PATCH 7/9] sched: Migrate sched to use new tick dependency mask model Frederic Weisbecker
@ 2016-02-04 17:00 ` Frederic Weisbecker
  2016-02-04 17:00 ` [PATCH 9/9] sched-clock: " Frederic Weisbecker
  8 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-02-04 17:00 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter, Ingo Molnar,
	Viresh Kumar, Rik van Riel

Instead of providing asynchronous checks for the nohz subsystem to verify
posix cpu timers tick dependency, migrate the latter to the new mask.

In order to keep track of the running timers and expose the tick
dependency accordingly, we must probe timer queueing and dequeueing on
the thread and process lists.

Unfortunately this implies both task and signal level dependencies. We
should be able to further optimize this and merge everything into the
task level dependency, at the cost of a bit of complexity and maybe some
overhead.
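
Schematically, the two dependency levels combine as below on the
tick-stop path. This is a sketch with stand-in types; the mask name
follows patch 3, and the helper name here is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define TICK_POSIX_TIMER_MASK	(1UL << 0)

/* Sketch only: minimal stand-ins for task_struct/signal_struct. */
struct sig_dep  { unsigned long tick_dependency; };
struct task_dep { unsigned long tick_dependency; struct sig_dep *signal; };

/*
 * The tick may only be stopped when neither the per-thread nor the
 * process-wide mask carries the posix timer bit.
 */
static bool posix_timers_allow_tick_stop(const struct task_dep *t)
{
	unsigned long mask = t->tick_dependency | t->signal->tick_dependency;

	return !(mask & TICK_POSIX_TIMER_MASK);
}
```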

Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/posix-timers.h   |  3 ---
 include/linux/tick.h           |  2 --
 kernel/time/posix-cpu-timers.c | 52 +++++++++---------------------------------
 kernel/time/tick-sched.c       |  7 +-----
 4 files changed, 12 insertions(+), 52 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 907f3fd..62d44c1 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -128,9 +128,6 @@ void posix_cpu_timer_schedule(struct k_itimer *timer);
 void run_posix_cpu_timers(struct task_struct *task);
 void posix_cpu_timers_exit(struct task_struct *task);
 void posix_cpu_timers_exit_group(struct task_struct *task);
-
-bool posix_cpu_timers_can_stop_tick(struct task_struct *tsk);
-
 void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
 			   cputime_t *newval, cputime_t *oldval);
 
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 994c5be..7e25952 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -234,7 +234,6 @@ static inline void tick_clear_dep_signal(struct signal_struct *signal,
 }
 
 extern void tick_nohz_full_kick_cpu(int cpu);
-extern void tick_nohz_full_kick_all(void);
 extern void __tick_nohz_task_switch(void);
 #else
 static inline int housekeeping_any_cpu(void)
@@ -259,7 +258,6 @@ static inline void tick_clear_dep_signal(struct task_struct *signal,
 					 enum tick_dependency_bit bit) { }
 
 static inline void tick_nohz_full_kick_cpu(int cpu) { }
-static inline void tick_nohz_full_kick_all(void) { }
 static inline void __tick_nohz_task_switch(void) { }
 #endif
 
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index f5e86d2..dd2b221 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -333,7 +333,6 @@ static int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *tp)
 	return err;
 }
 
-
 /*
  * Validate the clockid_t for a new CPU-clock timer, and initialize the timer.
  * This is called from sys_timer_create() and do_cpu_nanosleep() with the
@@ -517,6 +516,10 @@ static void arm_timer(struct k_itimer *timer)
 				cputime_expires->sched_exp = exp;
 			break;
 		}
+		if (CPUCLOCK_PERTHREAD(timer->it_clock))
+			tick_set_dep_task(p, TICK_POSIX_TIMER_BIT);
+		else
+			tick_set_dep_signal(p->signal, TICK_POSIX_TIMER_BIT);
 	}
 }
 
@@ -582,39 +585,6 @@ static int cpu_timer_sample_group(const clockid_t which_clock,
 	return 0;
 }
 
-#ifdef CONFIG_NO_HZ_FULL
-static void nohz_kick_work_fn(struct work_struct *work)
-{
-	tick_nohz_full_kick_all();
-}
-
-static DECLARE_WORK(nohz_kick_work, nohz_kick_work_fn);
-
-/*
- * We need the IPIs to be sent from sane process context.
- * The posix cpu timers are always set with irqs disabled.
- */
-static void posix_cpu_timer_kick_nohz(void)
-{
-	if (context_tracking_is_enabled())
-		schedule_work(&nohz_kick_work);
-}
-
-bool posix_cpu_timers_can_stop_tick(struct task_struct *tsk)
-{
-	if (!task_cputime_zero(&tsk->cputime_expires))
-		return false;
-
-	/* Check if cputimer is running. This is accessed without locking. */
-	if (READ_ONCE(tsk->signal->cputimer.running))
-		return false;
-
-	return true;
-}
-#else
-static inline void posix_cpu_timer_kick_nohz(void) { }
-#endif
-
 /*
  * Guts of sys_timer_settime for CPU timers.
  * This is called with the timer locked and interrupts disabled.
@@ -761,8 +731,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
 		sample_to_timespec(timer->it_clock,
 				   old_incr, &old->it_interval);
 	}
-	if (!ret)
-		posix_cpu_timer_kick_nohz();
+
 	return ret;
 }
 
@@ -911,6 +880,8 @@ static void check_thread_timers(struct task_struct *tsk,
 			__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
 		}
 	}
+	if (task_cputime_zero(tsk_expires))
+		tick_clear_dep_task(tsk, TICK_POSIX_TIMER_BIT);
 }
 
 static inline void stop_process_timers(struct signal_struct *sig)
@@ -919,6 +890,7 @@ static inline void stop_process_timers(struct signal_struct *sig)
 
 	/* Turn off cputimer->running. This is done without locking. */
 	WRITE_ONCE(cputimer->running, false);
+	tick_clear_dep_signal(sig, TICK_POSIX_TIMER_BIT);
 }
 
 static u32 onecputick;
@@ -1095,8 +1067,6 @@ void posix_cpu_timer_schedule(struct k_itimer *timer)
 	arm_timer(timer);
 	unlock_task_sighand(p, &flags);
 
-	/* Kick full dynticks CPUs in case they need to tick on the new timer */
-	posix_cpu_timer_kick_nohz();
 out:
 	timer->it_overrun_last = timer->it_overrun;
 	timer->it_overrun = -1;
@@ -1270,7 +1240,7 @@ void set_process_cpu_timer(struct task_struct *tsk, unsigned int clock_idx,
 		}
 
 		if (!*newval)
-			goto out;
+			return;
 		*newval += now;
 	}
 
@@ -1288,8 +1258,8 @@ void set_process_cpu_timer(struct task_struct *tsk, unsigned int clock_idx,
 			tsk->signal->cputime_expires.virt_exp = *newval;
 		break;
 	}
-out:
-	posix_cpu_timer_kick_nohz();
+
+	tick_set_dep_signal(tsk->signal, TICK_POSIX_TIMER_BIT);
 }
 
 static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 64f0469..1f5226b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -202,11 +202,6 @@ static bool can_stop_full_tick(struct tick_sched *ts)
 		return false;
 	}
 
-	if (!posix_cpu_timers_can_stop_tick(current)) {
-		trace_tick_stop(0, TICK_POSIX_TIMER_MASK);
-		return false;
-	}
-
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
 	/*
 	 * sched_clock_tick() needs us?
@@ -268,7 +263,7 @@ void tick_nohz_full_kick_cpu(int cpu)
  * Kick all full dynticks CPUs in order to force these to re-evaluate
  * their dependency on the tick and restart it if necessary.
  */
-void tick_nohz_full_kick_all(void)
+static void tick_nohz_full_kick_all(void)
 {
 	int cpu;
 
-- 
2.7.0

* [PATCH 9/9] sched-clock: Migrate to use new tick dependency mask model
  2016-02-04 17:00 [PATCH 0/9] nohz: Tick dependency mask v5 Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2016-02-04 17:00 ` [PATCH 8/9] posix-cpu-timers: Migrate " Frederic Weisbecker
@ 2016-02-04 17:00 ` Frederic Weisbecker
  8 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-02-04 17:00 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter, Ingo Molnar,
	Viresh Kumar, Rik van Riel

Instead of checking sched_clock_stable from the nohz subsystem to verify
its tick dependency, migrate it to the new mask in order to include it
in the all-in-one check.
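
System-wide dependencies like this one boil down to atomic bit
operations on a global mask. Here is a user-space sketch using C11
atomics; the kernel side uses fetch_or() (exported in patch 1 of this
series) and kicks full-dynticks CPUs on the relevant mask transitions,
which this sketch elides:

```c
#include <assert.h>
#include <stdatomic.h>

enum { TICK_CLOCK_UNSTABLE_BIT = 3 };

/* Sketch only: a system-wide dependency mask modelled in user space. */
static atomic_ulong tick_dependency;

static void tick_set_dep(int bit)
{
	/*
	 * fetch_or returns the old mask; the kernel kicks the
	 * full-dynticks CPUs when the bit wasn't already set.
	 */
	unsigned long prev = atomic_fetch_or(&tick_dependency, 1UL << bit);

	(void)prev;	/* CPU kick elided in this sketch */
}

static void tick_clear_dep(int bit)
{
	atomic_fetch_and(&tick_dependency, ~(1UL << bit));
}
```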

Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/clock.c     |  5 +++++
 kernel/time/tick-sched.c | 19 -------------------
 2 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index bc54e84..5b8d349 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -61,6 +61,7 @@
 #include <linux/static_key.h>
 #include <linux/workqueue.h>
 #include <linux/compiler.h>
+#include <linux/tick.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -89,6 +90,8 @@ static void __set_sched_clock_stable(void)
 {
 	if (!sched_clock_stable())
 		static_key_slow_inc(&__sched_clock_stable);
+
+	tick_clear_dep(TICK_CLOCK_UNSTABLE_BIT);
 }
 
 void set_sched_clock_stable(void)
@@ -108,6 +111,8 @@ static void __clear_sched_clock_stable(struct work_struct *work)
 	/* XXX worry about clock continuity */
 	if (sched_clock_stable())
 		static_key_slow_dec(&__sched_clock_stable);
+
+	tick_set_dep(TICK_CLOCK_UNSTABLE_BIT);
 }
 
 static DECLARE_WORK(sched_clock_work, __clear_sched_clock_stable);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 1f5226b..cd5b4cf 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -202,25 +202,6 @@ static bool can_stop_full_tick(struct tick_sched *ts)
 		return false;
 	}
 
-#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
-	/*
-	 * sched_clock_tick() needs us?
-	 *
-	 * TODO: kick full dynticks CPUs when
-	 * sched_clock_stable is set.
-	 */
-	if (!sched_clock_stable()) {
-		trace_tick_stop(0, TICK_CLOCK_UNSTABLE_MASK);
-		/*
-		 * Don't allow the user to think they can get
-		 * full NO_HZ with this machine.
-		 */
-		WARN_ONCE(tick_nohz_full_running,
-			  "NO_HZ FULL will not work with unstable sched clock");
-		return false;
-	}
-#endif
-
 	return true;
 }
 
-- 
2.7.0

* Re: [PATCH 3/9] nohz: New tick dependency mask
  2016-02-04 17:00 ` [PATCH 3/9] nohz: New tick dependency mask Frederic Weisbecker
@ 2016-02-16  8:03   ` Ingo Molnar
  2016-02-16 13:38     ` Frederic Weisbecker
  2016-03-03  0:47     ` [GIT PULL] nohz: Tick dependency mask v2 Frederic Weisbecker
  0 siblings, 2 replies; 15+ messages in thread
From: Ingo Molnar @ 2016-02-16  8:03 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Viresh Kumar, Rik van Riel


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> The tick dependency is evaluated on every IRQ and context switch. This
> consists of a batch of checks which determine whether it is safe to
> stop the tick or not. These checks are often split into many details:
> posix cpu timers, scheduler, sched clock, perf events... each of which
> is made of smaller details: posix cpu timer involves checking process
> wide timers then thread wide timers. Perf involves checking freq events
> then more per cpu details.
> 
> Checking all this information asynchronously every time we update the
> full dynticks state brings avoidable overhead and a messy layout.
> 
> Let's introduce instead tick dependency masks: one for system wide
> dependency (unstable sched clock, freq based perf events), one for CPU
> wide dependency (sched, throttling perf events), and task/signal level
> dependencies (posix cpu timers). The subsystems are responsible
> for setting and clearing their dependency through a set of APIs that will
> take care of concurrent dependency mask modifications and kick targets
> to restart the relevant CPU tick whenever needed.
> 
> This new dependency engine stays beside the old one until all subsystems
> having a tick dependency are converted to it.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Chris Metcalf <cmetcalf@ezchip.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  include/linux/sched.h    |   8 +++
>  include/linux/tick.h     |  92 +++++++++++++++++++++++++++++
>  kernel/time/tick-sched.c | 150 ++++++++++++++++++++++++++++++++++++++++++++---
>  kernel/time/tick-sched.h |   1 +
>  4 files changed, 244 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a10494a..d482cc8 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -719,6 +719,10 @@ struct signal_struct {
>  	/* Earliest-expiration cache. */
>  	struct task_cputime cputime_expires;
>  
> +#ifdef CONFIG_NO_HZ_FULL
> +	unsigned long tick_dependency;
> +#endif
> +
>  	struct list_head cpu_timers[3];
>  
>  	struct pid *tty_old_pgrp;
> @@ -1542,6 +1546,10 @@ struct task_struct {
>  		VTIME_SYS,
>  	} vtime_snap_whence;
>  #endif
> +
> +#ifdef CONFIG_NO_HZ_FULL
> +	unsigned long tick_dependency;


So I think it would be useful to name this in a way that expresses that this is a 
mask.

'tick_dep_mask' or so?

> +#endif
>  	unsigned long nvcsw, nivcsw; /* context switch counts */
>  	u64 start_time;		/* monotonic time in nsec */
>  	u64 real_start_time;	/* boot based time in nsec */
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 97fd4e5..a33adab 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -97,6 +97,18 @@ static inline void tick_broadcast_exit(void)
>  	tick_broadcast_oneshot_control(TICK_BROADCAST_EXIT);
>  }
>  
> +enum tick_dependency_bit {

s/tick_dep_bits

> +	TICK_POSIX_TIMER_BIT	= 0,
> +	TICK_PERF_EVENTS_BIT	= 1,
> +	TICK_SCHED_BIT		= 2,
> +	TICK_CLOCK_UNSTABLE_BIT	= 3

s/TICK_DEP_BIT_...

> +};
> +
> +#define TICK_POSIX_TIMER_MASK		(1 << TICK_POSIX_TIMER_BIT)
> +#define TICK_PERF_EVENTS_MASK		(1 << TICK_PERF_EVENTS_BIT)
> +#define TICK_SCHED_MASK			(1 << TICK_SCHED_BIT)
> +#define TICK_CLOCK_UNSTABLE_MASK	(1 << TICK_CLOCK_UNSTABLE_BIT)

So I'd rename this to:

#define TICK_DEP_MASK_POSIX_TIMER	(1 << TICK_POSIX_TIMER_BIT)
#define TICK_DEP_MASK_PERF_EVENTS	(1 << TICK_PERF_EVENTS_BIT)
#define TICK_DEP_MASK_SCHED		(1 << TICK_SCHED_BIT)
#define TICK_DEP_MASK_CLOCK_UNSTABLE	(1 << TICK_CLOCK_UNSTABLE_BIT)

i.e. the 'tick_dep' and 'TICK_DEP' nomenclature would be used throughout the code 
and the pattern would be easy to grep for.

> +extern void tick_nohz_set_dep(enum tick_dependency_bit bit);
> +extern void tick_nohz_clear_dep(enum tick_dependency_bit bit);
> +extern void tick_nohz_set_dep_cpu(int cpu, enum tick_dependency_bit bit);
> +extern void tick_nohz_clear_dep_cpu(int cpu, enum tick_dependency_bit bit);
> +extern void tick_nohz_set_dep_task(struct task_struct *tsk,
> +				   enum tick_dependency_bit bit);
> +extern void tick_nohz_clear_dep_task(struct task_struct *tsk,
> +				     enum tick_dependency_bit bit);
> +extern void tick_nohz_set_dep_signal(struct signal_struct *signal,
> +				     enum tick_dependency_bit bit);
> +extern void tick_nohz_clear_dep_signal(struct signal_struct *signal,
> +				       enum tick_dependency_bit bit);

Ditto, please rename it all to:

	tick_dep_set()
	tick_dep_clear()
	tick_dep_set_cpu()
	tick_dep_clear_cpu()
	tick_dep_set_task()
	...

also, please don't line-break function prototypes, it only makes the result harder 
to read.

Thanks,

	Ingo

* Re: [PATCH 3/9] nohz: New tick dependency mask
  2016-02-16  8:03   ` Ingo Molnar
@ 2016-02-16 13:38     ` Frederic Weisbecker
  2016-03-03  0:47     ` [GIT PULL] nohz: Tick dependency mask v2 Frederic Weisbecker
  1 sibling, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-02-16 13:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Viresh Kumar, Rik van Riel

On Tue, Feb 16, 2016 at 09:03:45AM +0100, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > The tick dependency is evaluated on every IRQ and context switch. This
> > consists of a batch of checks which determine whether it is safe to
> > stop the tick or not. These checks are often split into many details:
> > posix cpu timers, scheduler, sched clock, perf events... each of which
> > is made of smaller details: posix cpu timer involves checking process
> > wide timers then thread wide timers. Perf involves checking freq events
> > then more per cpu details.
> > 
> > Checking all this information asynchronously every time we update the
> > full dynticks state brings avoidable overhead and a messy layout.
> > 
> > Let's introduce instead tick dependency masks: one for system wide
> > dependency (unstable sched clock, freq based perf events), one for CPU
> > wide dependency (sched, throttling perf events), and task/signal level
> > dependencies (posix cpu timers). The subsystems are responsible
> > for setting and clearing their dependency through a set of APIs that will
> > take care of concurrent dependency mask modifications and kick targets
> > to restart the relevant CPU tick whenever needed.
> > 
> > This new dependency engine stays beside the old one until all subsystems
> > having a tick dependency are converted to it.
> > 
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > Suggested-by: Peter Zijlstra <peterz@infradead.org>
> > Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Chris Metcalf <cmetcalf@ezchip.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Luiz Capitulino <lcapitulino@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > ---
> >  include/linux/sched.h    |   8 +++
> >  include/linux/tick.h     |  92 +++++++++++++++++++++++++++++
> >  kernel/time/tick-sched.c | 150 ++++++++++++++++++++++++++++++++++++++++++++---
> >  kernel/time/tick-sched.h |   1 +
> >  4 files changed, 244 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index a10494a..d482cc8 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -719,6 +719,10 @@ struct signal_struct {
> >  	/* Earliest-expiration cache. */
> >  	struct task_cputime cputime_expires;
> >  
> > +#ifdef CONFIG_NO_HZ_FULL
> > +	unsigned long tick_dependency;
> > +#endif
> > +
> >  	struct list_head cpu_timers[3];
> >  
> >  	struct pid *tty_old_pgrp;
> > @@ -1542,6 +1546,10 @@ struct task_struct {
> >  		VTIME_SYS,
> >  	} vtime_snap_whence;
> >  #endif
> > +
> > +#ifdef CONFIG_NO_HZ_FULL
> > +	unsigned long tick_dependency;
> 
> 
> So I think it would be useful to name this in a way that expresses that this is a 
> mask.
> 
> 'tick_dep_mask' or so?
> 
> > +#endif
> >  	unsigned long nvcsw, nivcsw; /* context switch counts */
> >  	u64 start_time;		/* monotonic time in nsec */
> >  	u64 real_start_time;	/* boot based time in nsec */
> > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > index 97fd4e5..a33adab 100644
> > --- a/include/linux/tick.h
> > +++ b/include/linux/tick.h
> > @@ -97,6 +97,18 @@ static inline void tick_broadcast_exit(void)
> >  	tick_broadcast_oneshot_control(TICK_BROADCAST_EXIT);
> >  }
> >  
> > +enum tick_dependency_bit {
> 
> s/tick_dep_bits
> 
> > +	TICK_POSIX_TIMER_BIT	= 0,
> > +	TICK_PERF_EVENTS_BIT	= 1,
> > +	TICK_SCHED_BIT		= 2,
> > +	TICK_CLOCK_UNSTABLE_BIT	= 3
> 
> s/TICK_DEP_BIT_...
> 
> > +};
> > +
> > +#define TICK_POSIX_TIMER_MASK		(1 << TICK_POSIX_TIMER_BIT)
> > +#define TICK_PERF_EVENTS_MASK		(1 << TICK_PERF_EVENTS_BIT)
> > +#define TICK_SCHED_MASK			(1 << TICK_SCHED_BIT)
> > +#define TICK_CLOCK_UNSTABLE_MASK	(1 << TICK_CLOCK_UNSTABLE_BIT)
> 
> So I'd rename this to:
> 
> #define TICK_DEP_MASK_POSIX_TIMER	(1 << TICK_POSIX_TIMER_BIT)
> #define TICK_DEP_MASK_PERF_EVENTS	(1 << TICK_PERF_EVENTS_BIT)
> #define TICK_DEP_MASK_SCHED		(1 << TICK_SCHED_BIT)
> #define TICK_DEP_MASK_CLOCK_UNSTABLE	(1 << TICK_CLOCK_UNSTABLE_BIT)
> 
> i.e. the 'tick_dep' and 'TICK_DEP' nomenclature would be used throughout the code 
> and the pattern would be easy to grep for.

I agree with all the above, I'll change that.

> 
> > +extern void tick_nohz_set_dep(enum tick_dependency_bit bit);
> > +extern void tick_nohz_clear_dep(enum tick_dependency_bit bit);
> > +extern void tick_nohz_set_dep_cpu(int cpu, enum tick_dependency_bit bit);
> > +extern void tick_nohz_clear_dep_cpu(int cpu, enum tick_dependency_bit bit);
> > +extern void tick_nohz_set_dep_task(struct task_struct *tsk,
> > +				   enum tick_dependency_bit bit);
> > +extern void tick_nohz_clear_dep_task(struct task_struct *tsk,
> > +				     enum tick_dependency_bit bit);
> > +extern void tick_nohz_set_dep_signal(struct signal_struct *signal,
> > +				     enum tick_dependency_bit bit);
> > +extern void tick_nohz_clear_dep_signal(struct signal_struct *signal,
> > +				       enum tick_dependency_bit bit);
> 
> Ditto, please rename it all to:
> 
> 	tick_dep_set()
> 	tick_dep_clear()
> 	tick_dep_set_cpu()
> 	tick_dep_clear_cpu()
> 	tick_dep_set_task()
> 	...

tick_dep_* are already used in the same patch as static key wrappers for the
tick_nohz_dep functions.

> 
> also, please don't line-break function prototypes, it only makes the result harder 
> to read.

Even if it exceeds 80 columns?

Thanks.

> 
> Thanks,
> 
> 	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [GIT PULL] nohz: Tick dependency mask v2
  2016-02-16  8:03   ` Ingo Molnar
  2016-02-16 13:38     ` Frederic Weisbecker
@ 2016-03-03  0:47     ` Frederic Weisbecker
  2016-03-08 13:14       ` Ingo Molnar
  1 sibling, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2016-03-03  0:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Viresh Kumar, Rik van Riel

On Tue, Feb 16, 2016 at 09:03:45AM +0100, Ingo Molnar wrote:
> 
> 
> So I think it would be useful to name this in a way that expresses that this is a 
> mask.
> 
> 'tick_dep_mask' or so?

[...]

> >  
> > +enum tick_dependency_bit {
> 
> s/tick_dep_bits
> 
> > +	TICK_POSIX_TIMER_BIT	= 0,
> > +	TICK_PERF_EVENTS_BIT	= 1,
> > +	TICK_SCHED_BIT		= 2,
> > +	TICK_CLOCK_UNSTABLE_BIT	= 3
> 
> s/TICK_DEP_BIT_...
> 
> > +};
> > +
> > +#define TICK_POSIX_TIMER_MASK		(1 << TICK_POSIX_TIMER_BIT)
> > +#define TICK_PERF_EVENTS_MASK		(1 << TICK_PERF_EVENTS_BIT)
> > +#define TICK_SCHED_MASK			(1 << TICK_SCHED_BIT)
> > +#define TICK_CLOCK_UNSTABLE_MASK	(1 << TICK_CLOCK_UNSTABLE_BIT)
> 
> So I'd rename this to:
> 
> #define TICK_DEP_MASK_POSIX_TIMER	(1 << TICK_POSIX_TIMER_BIT)
> #define TICK_DEP_MASK_PERF_EVENTS	(1 << TICK_PERF_EVENTS_BIT)
> #define TICK_DEP_MASK_SCHED		(1 << TICK_SCHED_BIT)
> #define TICK_DEP_MASK_CLOCK_UNSTABLE	(1 << TICK_CLOCK_UNSTABLE_BIT)
> 
> i.e. the 'tick_dep' and 'TICK_DEP' nomenclature would be used throughout the code 
> and the pattern would be easy to grep for.
> 
> > +extern void tick_nohz_set_dep(enum tick_dependency_bit bit);
> > +extern void tick_nohz_clear_dep(enum tick_dependency_bit bit);
> > +extern void tick_nohz_set_dep_cpu(int cpu, enum tick_dependency_bit bit);
> > +extern void tick_nohz_clear_dep_cpu(int cpu, enum tick_dependency_bit bit);
> > +extern void tick_nohz_set_dep_task(struct task_struct *tsk,
> > +				   enum tick_dependency_bit bit);
> > +extern void tick_nohz_clear_dep_task(struct task_struct *tsk,
> > +				     enum tick_dependency_bit bit);
> > +extern void tick_nohz_set_dep_signal(struct signal_struct *signal,
> > +				     enum tick_dependency_bit bit);
> > +extern void tick_nohz_clear_dep_signal(struct signal_struct *signal,
> > +				       enum tick_dependency_bit bit);
> 
> Ditto, please rename it all to:
> 
> 	tick_dep_set()
> 	tick_dep_clear()
> 	tick_dep_set_cpu()
> 	tick_dep_clear_cpu()
> 	tick_dep_set_task()
> 	...

Ok, I fixed all the above.

> 
> also, please don't line-break function prototypes, it only makes the result harder 
> to read.

I couldn't fix that though; I'm limited by the 80-column rule.


If you're ok with it, please pull the branch:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/core-v9

HEAD: 4f49b90abb4aca6fe677c95fc352fd0674d489bd

--- Summary ---

Currently in nohz full configs, the tick dependency is checked
asynchronously by nohz code on each interrupt and context switch, for
each concerned subsystem, through a set of functions provided by those
subsystems. Such functions consist of many conditions and details that can
be heavyweight as they are called on the fast path: sched_can_stop_tick(),
posix_cpu_timer_can_stop_tick(), perf_event_can_stop_tick()...

Thomas suggested a few months ago to make that tick dependency check
synchronous. Instead of checking subsystem details from each interrupt
to guess if the tick can be stopped, every subsystem that may have a tick
dependency should itself set a flag specifying the state of that
dependency. This way we can verify whether we can stop the tick with a
single lightweight mask check on the fast path.

This conversion from a pull to a push model to implement the tick
dependency is the core feature of this patchset, which is split into:

* Nohz wide kick simplification
* Improve nohz tracing
* Introduce tick dependency mask
* Migrate scheduler, posix timers, perf events and sched clock tick
  dependencies to the tick dependency mask.

Thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] nohz: Tick dependency mask v2
  2016-03-03  0:47     ` [GIT PULL] nohz: Tick dependency mask v2 Frederic Weisbecker
@ 2016-03-08 13:14       ` Ingo Molnar
  0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2016-03-08 13:14 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Viresh Kumar, Rik van Riel


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Tue, Feb 16, 2016 at 09:03:45AM +0100, Ingo Molnar wrote:
> > 
> > 
> > So I think it would be useful to name this in a way that expresses that this is a 
> > mask.
> > 
> > 'tick_dep_mask' or so?
> 
> [...]
> 
> > >  
> > > +enum tick_dependency_bit {
> > 
> > s/tick_dep_bits
> > 
> > > +	TICK_POSIX_TIMER_BIT	= 0,
> > > +	TICK_PERF_EVENTS_BIT	= 1,
> > > +	TICK_SCHED_BIT		= 2,
> > > +	TICK_CLOCK_UNSTABLE_BIT	= 3
> > 
> > s/TICK_DEP_BIT_...
> > 
> > > +};
> > > +
> > > +#define TICK_POSIX_TIMER_MASK		(1 << TICK_POSIX_TIMER_BIT)
> > > +#define TICK_PERF_EVENTS_MASK		(1 << TICK_PERF_EVENTS_BIT)
> > > +#define TICK_SCHED_MASK			(1 << TICK_SCHED_BIT)
> > > +#define TICK_CLOCK_UNSTABLE_MASK	(1 << TICK_CLOCK_UNSTABLE_BIT)
> > 
> > So I'd rename this to:
> > 
> > #define TICK_DEP_MASK_POSIX_TIMER	(1 << TICK_POSIX_TIMER_BIT)
> > #define TICK_DEP_MASK_PERF_EVENTS	(1 << TICK_PERF_EVENTS_BIT)
> > #define TICK_DEP_MASK_SCHED		(1 << TICK_SCHED_BIT)
> > #define TICK_DEP_MASK_CLOCK_UNSTABLE	(1 << TICK_CLOCK_UNSTABLE_BIT)
> > 
> > i.e. the 'tick_dep' and 'TICK_DEP' nomenclature would be used throughout the code 
> > and the pattern would be easy to grep for.
> > 
> > > +extern void tick_nohz_set_dep(enum tick_dependency_bit bit);
> > > +extern void tick_nohz_clear_dep(enum tick_dependency_bit bit);
> > > +extern void tick_nohz_set_dep_cpu(int cpu, enum tick_dependency_bit bit);
> > > +extern void tick_nohz_clear_dep_cpu(int cpu, enum tick_dependency_bit bit);
> > > +extern void tick_nohz_set_dep_task(struct task_struct *tsk,
> > > +				   enum tick_dependency_bit bit);
> > > +extern void tick_nohz_clear_dep_task(struct task_struct *tsk,
> > > +				     enum tick_dependency_bit bit);
> > > +extern void tick_nohz_set_dep_signal(struct signal_struct *signal,
> > > +				     enum tick_dependency_bit bit);
> > > +extern void tick_nohz_clear_dep_signal(struct signal_struct *signal,
> > > +				       enum tick_dependency_bit bit);
> > 
> > Ditto, please rename it all to:
> > 
> > 	tick_dep_set()
> > 	tick_dep_clear()
> > 	tick_dep_set_cpu()
> > 	tick_dep_clear_cpu()
> > 	tick_dep_set_task()
> > 	...
> 
> Ok, I fixed all the above.
> 
> > 
> > also, please don't line-break function prototypes, it only makes the result harder 
> > to read.
> 
> I couldn't fix that though; I'm limited by the 80-column rule.
> 
> 
> If you're ok with it, please pull the branch:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	timers/core-v9
> 
> HEAD: 4f49b90abb4aca6fe677c95fc352fd0674d489bd
> 
> --- Summary ---
> 
> Currently in nohz full configs, the tick dependency is checked
> asynchronously by nohz code on each interrupt and context switch, for
> each concerned subsystem, through a set of functions provided by those
> subsystems. Such functions consist of many conditions and details that can
> be heavyweight as they are called on the fast path: sched_can_stop_tick(),
> posix_cpu_timer_can_stop_tick(), perf_event_can_stop_tick()...
> 
> Thomas suggested a few months ago to make that tick dependency check
> synchronous. Instead of checking subsystem details from each interrupt
> to guess if the tick can be stopped, every subsystem that may have a tick
> dependency should itself set a flag specifying the state of that
> dependency. This way we can verify whether we can stop the tick with a
> single lightweight mask check on the fast path.
> 
> This conversion from a pull to a push model to implement tick dependency
> is the core feature of this patchset that is split into:
> 
> * Nohz wide kick simplification
> * Improve nohz tracing
> * Introduce tick dependency mask
> * Migrate scheduler, posix timers, perf events and sched clock tick
>   dependencies to the tick dependency mask.

Pulled into tip:timers/nohz, thanks a lot Frederic!

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/9] nohz: New tick dependency mask
  2015-12-14 18:38 [PATCH 0/9] nohz: Tick dependency mask v4 Frederic Weisbecker
@ 2015-12-14 18:38 ` Frederic Weisbecker
  0 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2015-12-14 18:38 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter, Ingo Molnar,
	Viresh Kumar, Rik van Riel

The tick dependency is evaluated on every IRQ and context switch. This
consists of a batch of checks which determine whether it is safe to
stop the tick or not. These checks are often split into many details:
posix cpu timers, scheduler, sched clock, perf events... each of which
is made of smaller details: posix cpu timers involve checking process
wide timers then thread wide timers. Perf involves checking freq events
then more per-cpu details.

Checking this information asynchronously every time we update the full
dynticks state brings avoidable overhead and a messy layout.

Let's instead introduce tick dependency masks: one for system-wide
dependencies (unstable sched clock, freq based perf events), one for
CPU-wide dependencies (sched, throttling perf events), and task/signal
level dependencies (posix cpu timers). The subsystems are responsible
for setting and clearing their dependency through a set of APIs that will
take care of concurrent dependency mask modifications and kick targets
to restart the relevant CPU tick whenever needed.

This new dependency engine stays beside the old one until all subsystems
having a tick dependency are converted to it.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/sched.h    |   8 +++
 include/linux/tick.h     |  39 ++++++++++++
 kernel/time/tick-sched.c | 150 ++++++++++++++++++++++++++++++++++++++++++++---
 kernel/time/tick-sched.h |   1 +
 4 files changed, 191 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index edad7a4..d1de0db 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -715,6 +715,10 @@ struct signal_struct {
 	/* Earliest-expiration cache. */
 	struct task_cputime cputime_expires;
 
+#ifdef CONFIG_NO_HZ_FULL
+	unsigned long tick_dependency;
+#endif
+
 	struct list_head cpu_timers[3];
 
 	struct pid *tty_old_pgrp;
@@ -1527,6 +1531,10 @@ struct task_struct {
 		VTIME_SYS,
 	} vtime_snap_whence;
 #endif
+
+#ifdef CONFIG_NO_HZ_FULL
+	unsigned long tick_dependency;
+#endif
 	unsigned long nvcsw, nivcsw; /* context switch counts */
 	u64 start_time;		/* monotonic time in nsec */
 	u64 real_start_time;	/* boot based time in nsec */
diff --git a/include/linux/tick.h b/include/linux/tick.h
index e312219..56c660e 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -97,6 +97,18 @@ static inline void tick_broadcast_exit(void)
 	tick_broadcast_oneshot_control(TICK_BROADCAST_EXIT);
 }
 
+enum tick_dependency_bit {
+	TICK_POSIX_TIMER_BIT	= 0,
+	TICK_PERF_EVENTS_BIT	= 1,
+	TICK_SCHED_BIT		= 2,
+	TICK_CLOCK_UNSTABLE_BIT	= 3
+};
+
+#define TICK_POSIX_TIMER_MASK		(1 << TICK_POSIX_TIMER_BIT)
+#define TICK_PERF_EVENTS_MASK		(1 << TICK_PERF_EVENTS_BIT)
+#define TICK_SCHED_MASK			(1 << TICK_SCHED_BIT)
+#define TICK_CLOCK_UNSTABLE_MASK	(1 << TICK_CLOCK_UNSTABLE_BIT)
+
 #ifdef CONFIG_NO_HZ_COMMON
 extern int tick_nohz_tick_stopped(void);
 extern void tick_nohz_idle_enter(void);
@@ -152,6 +164,19 @@ static inline int housekeeping_any_cpu(void)
 	return cpumask_any_and(housekeeping_mask, cpu_online_mask);
 }
 
+extern void tick_nohz_set_dep(enum tick_dependency_bit bit);
+extern void tick_nohz_clear_dep(enum tick_dependency_bit bit);
+extern void tick_nohz_set_dep_cpu(enum tick_dependency_bit bit, int cpu);
+extern void tick_nohz_clear_dep_cpu(enum tick_dependency_bit bit, int cpu);
+extern void tick_nohz_set_dep_task(struct task_struct *tsk,
+				   enum tick_dependency_bit bit);
+extern void tick_nohz_clear_dep_task(struct task_struct *tsk,
+				     enum tick_dependency_bit bit);
+extern void tick_nohz_set_dep_signal(struct signal_struct *signal,
+				     enum tick_dependency_bit bit);
+extern void tick_nohz_clear_dep_signal(struct signal_struct *signal,
+				       enum tick_dependency_bit bit);
+
 extern void tick_nohz_full_kick(void);
 extern void tick_nohz_full_kick_cpu(int cpu);
 extern void tick_nohz_full_kick_all(void);
@@ -164,6 +189,20 @@ static inline int housekeeping_any_cpu(void)
 static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
 static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
+
+static inline void tick_nohz_set_dep(enum tick_dependency_bit bit) { }
+static inline void tick_nohz_clear_dep(enum tick_dependency_bit bit) { }
+static inline void tick_nohz_set_dep_cpu(enum tick_dependency_bit bit, int cpu) { }
+static inline void tick_nohz_clear_dep_cpu(enum tick_dependency_bit bit, int cpu) { }
+static inline void tick_nohz_set_dep_task(struct task_struct *tsk,
+					  enum tick_dependency_bit bit) { }
+static inline void tick_nohz_clear_dep_task(struct task_struct *tsk,
+					    enum tick_dependency_bit bit) { }
+static inline void tick_nohz_set_dep_signal(struct signal_struct *signal,
+					    enum tick_dependency_bit bit) { }
+static inline void tick_nohz_clear_dep_signal(struct signal_struct *signal,
+					      enum tick_dependency_bit bit) { }
+
 static inline void tick_nohz_full_kick_cpu(int cpu) { }
 static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 509019c..093b807 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -156,11 +156,53 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 cpumask_var_t tick_nohz_full_mask;
 cpumask_var_t housekeeping_mask;
 bool tick_nohz_full_running;
+static unsigned long tick_dependency;
 
-static bool can_stop_full_tick(void)
+static void trace_tick_dependency(unsigned long dep)
+{
+	if (dep & TICK_POSIX_TIMER_MASK) {
+		trace_tick_stop(0, "posix timers running\n");
+		return;
+	}
+
+	if (dep & TICK_PERF_EVENTS_MASK) {
+		trace_tick_stop(0, "perf events running\n");
+		return;
+	}
+
+	if (dep & TICK_SCHED_MASK) {
+		trace_tick_stop(0, "more than 1 task in runqueue\n");
+		return;
+	}
+
+	if (dep & TICK_CLOCK_UNSTABLE_MASK)
+		trace_tick_stop(0, "unstable sched clock\n");
+}
+
+static bool can_stop_full_tick(struct tick_sched *ts)
 {
 	WARN_ON_ONCE(!irqs_disabled());
 
+	if (tick_dependency) {
+		trace_tick_dependency(tick_dependency);
+		return false;
+	}
+
+	if (ts->tick_dependency) {
+		trace_tick_dependency(ts->tick_dependency);
+		return false;
+	}
+
+	if (current->tick_dependency) {
+		trace_tick_dependency(current->tick_dependency);
+		return false;
+	}
+
+	if (current->signal->tick_dependency) {
+		trace_tick_dependency(current->signal->tick_dependency);
+		return false;
+	}
+
 	if (!sched_can_stop_tick()) {
 		trace_tick_stop(0, "more than 1 task in runqueue\n");
 		return false;
@@ -176,9 +218,10 @@ static bool can_stop_full_tick(void)
 		return false;
 	}
 
-	/* sched_clock_tick() needs us? */
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
 	/*
+	 * sched_clock_tick() needs us?
+	 *
 	 * TODO: kick full dynticks CPUs when
 	 * sched_clock_stable is set.
 	 */
@@ -197,13 +240,13 @@ static bool can_stop_full_tick(void)
 	return true;
 }
 
-static void nohz_full_kick_work_func(struct irq_work *work)
+static void nohz_full_kick_func(struct irq_work *work)
 {
 	/* Empty, the tick restart happens on tick_nohz_irq_exit() */
 }
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
-	.func = nohz_full_kick_work_func,
+	.func = nohz_full_kick_func,
 };
 
 /*
@@ -249,6 +292,95 @@ void tick_nohz_full_kick_all(void)
 	preempt_enable();
 }
 
+static void tick_nohz_set_dep_all(unsigned long *dep,
+				  enum tick_dependency_bit bit)
+{
+	unsigned long prev;
+
+	prev = fetch_or(dep, BIT_MASK(bit));
+	if (!prev)
+		tick_nohz_full_kick_all();
+}
+
+/*
+ * Set a global tick dependency. Used by perf events that rely on freq and
+ * by unstable clock.
+ */
+void tick_nohz_set_dep(enum tick_dependency_bit bit)
+{
+	tick_nohz_set_dep_all(&tick_dependency, bit);
+}
+
+void tick_nohz_clear_dep(enum tick_dependency_bit bit)
+{
+	clear_bit(bit, &tick_dependency);
+}
+
+/*
+ * Set per-CPU tick dependency. Used by scheduler and perf events in order to
+ * manage events throttling.
+ */
+void tick_nohz_set_dep_cpu(enum tick_dependency_bit bit, int cpu)
+{
+	unsigned long prev;
+	struct tick_sched *ts;
+
+	ts = per_cpu_ptr(&tick_cpu_sched, cpu);
+
+	prev = fetch_or(&ts->tick_dependency, BIT_MASK(bit));
+	if (!prev) {
+		preempt_disable();
+		/* Perf needs local kick that is NMI safe */
+		if (cpu == smp_processor_id()) {
+			tick_nohz_full_kick();
+		} else {
+			/* Remote irq work not NMI-safe */
+			if (!WARN_ON_ONCE(in_nmi()))
+				tick_nohz_full_kick_cpu(cpu);
+		}
+		preempt_enable();
+	}
+}
+
+void tick_nohz_clear_dep_cpu(enum tick_dependency_bit bit, int cpu)
+{
+	struct tick_sched *ts = per_cpu_ptr(&tick_cpu_sched, cpu);
+
+	clear_bit(bit, &ts->tick_dependency);
+}
+
+/*
+ * Set a per-task tick dependency. Posix CPU timers need this in order to elapse
+ * per task timers.
+ */
+void tick_nohz_set_dep_task(struct task_struct *tsk, enum tick_dependency_bit bit)
+{
+	/*
+	 * We could optimize this with just kicking the target running the task
+	 * if that noise matters for nohz full users.
+	 */
+	tick_nohz_set_dep_all(&tsk->tick_dependency, bit);
+}
+
+void tick_nohz_clear_dep_task(struct task_struct *tsk, enum tick_dependency_bit bit)
+{
+	clear_bit(bit, &tsk->tick_dependency);
+}
+
+/*
+ * Set a per-taskgroup tick dependency. Posix CPU timers need this in order to elapse
+ * per process timers.
+ */
+void tick_nohz_set_dep_signal(struct signal_struct *sig, enum tick_dependency_bit bit)
+{
+	tick_nohz_set_dep_all(&sig->tick_dependency, bit);
+}
+
+void tick_nohz_clear_dep_signal(struct signal_struct *sig, enum tick_dependency_bit bit)
+{
+	clear_bit(bit, &sig->tick_dependency);
+}
+
 /*
  * Re-evaluate the need for the tick as we switch the current task.
  * It might need the tick due to per task/process properties:
@@ -257,15 +389,19 @@ void tick_nohz_full_kick_all(void)
 void __tick_nohz_task_switch(void)
 {
 	unsigned long flags;
+	struct tick_sched *ts;
 
 	local_irq_save(flags);
 
 	if (!tick_nohz_full_cpu(smp_processor_id()))
 		goto out;
 
-	if (tick_nohz_tick_stopped() && !can_stop_full_tick())
-		tick_nohz_full_kick();
+	ts = this_cpu_ptr(&tick_cpu_sched);
 
+	if (ts->tick_stopped) {
+		if (current->tick_dependency || current->signal->tick_dependency)
+			tick_nohz_full_kick();
+	}
 out:
 	local_irq_restore(flags);
 }
@@ -718,7 +854,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
 	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
 		return;
 
-	if (can_stop_full_tick())
+	if (can_stop_full_tick(ts))
 		tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
 	else if (ts->tick_stopped)
 		tick_nohz_restart_sched_tick(ts, ktime_get());
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index a4a8d4e..d327f70 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -60,6 +60,7 @@ struct tick_sched {
 	u64				next_timer;
 	ktime_t				idle_expires;
 	int				do_timer_last;
+	unsigned long			tick_dependency;
 };
 
 extern struct tick_sched *tick_get_tick_sched(int cpu);
-- 
2.6.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-03-08 13:14 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
2016-02-04 17:00 [PATCH 0/9] nohz: Tick dependency mask v5 Frederic Weisbecker
2016-02-04 17:00 ` [PATCH 1/9] atomic: Export fetch_or() Frederic Weisbecker
2016-02-04 17:00 ` [PATCH 2/9] nohz: Implement wide kick on top of irq work Frederic Weisbecker
2016-02-04 17:00 ` [PATCH 3/9] nohz: New tick dependency mask Frederic Weisbecker
2016-02-16  8:03   ` Ingo Molnar
2016-02-16 13:38     ` Frederic Weisbecker
2016-03-03  0:47     ` [GIT PULL] nohz: Tick dependency mask v2 Frederic Weisbecker
2016-03-08 13:14       ` Ingo Molnar
2016-02-04 17:00 ` [PATCH 4/9] nohz: Use enum code for tick stop failure tracing message Frederic Weisbecker
2016-02-04 17:00 ` [PATCH 5/9] perf: Migrate perf to use new tick dependency mask model Frederic Weisbecker
2016-02-04 17:00 ` [PATCH 6/9] sched: Account rr tasks Frederic Weisbecker
2016-02-04 17:00 ` [PATCH 7/9] sched: Migrate sched to use new tick dependency mask model Frederic Weisbecker
2016-02-04 17:00 ` [PATCH 8/9] posix-cpu-timers: Migrate " Frederic Weisbecker
2016-02-04 17:00 ` [PATCH 9/9] sched-clock: " Frederic Weisbecker
  -- strict thread matches above, loose matches on Subject: below --
2015-12-14 18:38 [PATCH 0/9] nohz: Tick dependency mask v4 Frederic Weisbecker
2015-12-14 18:38 ` [PATCH 3/9] nohz: New tick dependency mask Frederic Weisbecker
