* [PATCH 1/6] sched: Rename init_rq_hrtick to hrtick_rq_init
2018-02-08 17:59 [PATCH 0/6] isolation: 1Hz residual tick offloading v5 Frederic Weisbecker
@ 2018-02-08 17:59 ` Frederic Weisbecker
2018-02-09 6:53 ` Ingo Molnar
2018-02-08 17:59 ` [PATCH 2/6] nohz: Allow to check if remote CPU tick is stopped Frederic Weisbecker
` (5 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Frederic Weisbecker @ 2018-02-08 17:59 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
Paul E . McKenney, Ingo Molnar, Wanpeng Li, Mike Galbraith,
Rik van Riel
Do that rename in order to normalize the hrtick namespace.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 36f113a..fc9fa25 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -333,7 +333,7 @@ void hrtick_start(struct rq *rq, u64 delay)
}
#endif /* CONFIG_SMP */
-static void init_rq_hrtick(struct rq *rq)
+static void hrtick_rq_init(struct rq *rq)
{
#ifdef CONFIG_SMP
rq->hrtick_csd_pending = 0;
@@ -351,7 +351,7 @@ static inline void hrtick_clear(struct rq *rq)
{
}
-static inline void init_rq_hrtick(struct rq *rq)
+static inline void hrtick_rq_init(struct rq *rq)
{
}
#endif /* CONFIG_SCHED_HRTICK */
@@ -6023,7 +6023,7 @@ void __init sched_init(void)
rq->last_sched_tick = 0;
#endif
#endif /* CONFIG_SMP */
- init_rq_hrtick(rq);
+ hrtick_rq_init(rq);
atomic_set(&rq->nr_iowait, 0);
}
--
2.7.4
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/6] sched: Rename init_rq_hrtick to hrtick_rq_init
2018-02-08 17:59 ` [PATCH 1/6] sched: Rename init_rq_hrtick to hrtick_rq_init Frederic Weisbecker
@ 2018-02-09 6:53 ` Ingo Molnar
0 siblings, 0 replies; 19+ messages in thread
From: Ingo Molnar @ 2018-02-09 6:53 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
Wanpeng Li, Mike Galbraith, Rik van Riel
* Frederic Weisbecker <frederic@kernel.org> wrote:
> Do that rename in order to normalize the hrtick namespace.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Chris Metcalf <cmetcalf@mellanox.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Wanpeng Li <kernellwp@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
> kernel/sched/core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 36f113a..fc9fa25 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -333,7 +333,7 @@ void hrtick_start(struct rq *rq, u64 delay)
> }
> #endif /* CONFIG_SMP */
>
> -static void init_rq_hrtick(struct rq *rq)
> +static void hrtick_rq_init(struct rq *rq)
On a related note, I think we should also do:
s/start_hrtick_dl/sched_dl_hrtick_start/
or such. (In a separate patch)
Thanks,
Ingo
* [PATCH 2/6] nohz: Allow to check if remote CPU tick is stopped
2018-02-08 17:59 [PATCH 0/6] isolation: 1Hz residual tick offloading v5 Frederic Weisbecker
2018-02-08 17:59 ` [PATCH 1/6] sched: Rename init_rq_hrtick to hrtick_rq_init Frederic Weisbecker
@ 2018-02-08 17:59 ` Frederic Weisbecker
2018-02-08 17:59 ` [PATCH 3/6] sched/isolation: Isolate workqueues when "nohz_full=" is set Frederic Weisbecker
` (4 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Frederic Weisbecker @ 2018-02-08 17:59 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
Paul E . McKenney, Ingo Molnar, Wanpeng Li, Mike Galbraith,
Rik van Riel
This check is racy but provides a good heuristic to determine whether
a CPU may need a remote tick or not.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
include/linux/tick.h | 2 ++
kernel/time/tick-sched.c | 7 +++++++
2 files changed, 9 insertions(+)
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 7cc3592..944c829 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -114,6 +114,7 @@ enum tick_dep_bits {
#ifdef CONFIG_NO_HZ_COMMON
extern bool tick_nohz_enabled;
extern int tick_nohz_tick_stopped(void);
+extern int tick_nohz_tick_stopped_cpu(int cpu);
extern void tick_nohz_idle_enter(void);
extern void tick_nohz_idle_exit(void);
extern void tick_nohz_irq_exit(void);
@@ -125,6 +126,7 @@ extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
#else /* !CONFIG_NO_HZ_COMMON */
#define tick_nohz_enabled (0)
static inline int tick_nohz_tick_stopped(void) { return 0; }
+static inline int tick_nohz_tick_stopped_cpu(int cpu) { return 0; }
static inline void tick_nohz_idle_enter(void) { }
static inline void tick_nohz_idle_exit(void) { }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 29a5733..b517485 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -486,6 +486,13 @@ int tick_nohz_tick_stopped(void)
return __this_cpu_read(tick_cpu_sched.tick_stopped);
}
+int tick_nohz_tick_stopped_cpu(int cpu)
+{
+ struct tick_sched *ts = per_cpu_ptr(&tick_cpu_sched, cpu);
+
+ return ts->tick_stopped;
+}
+
/**
* tick_nohz_update_jiffies - update jiffies when idle was interrupted
*
--
2.7.4
* [PATCH 3/6] sched/isolation: Isolate workqueues when "nohz_full=" is set
2018-02-08 17:59 [PATCH 0/6] isolation: 1Hz residual tick offloading v5 Frederic Weisbecker
2018-02-08 17:59 ` [PATCH 1/6] sched: Rename init_rq_hrtick to hrtick_rq_init Frederic Weisbecker
2018-02-08 17:59 ` [PATCH 2/6] nohz: Allow to check if remote CPU tick is stopped Frederic Weisbecker
@ 2018-02-08 17:59 ` Frederic Weisbecker
2018-02-09 6:55 ` Ingo Molnar
2018-02-08 17:59 ` [PATCH 4/6] sched/isolation: Residual 1Hz scheduler tick offload Frederic Weisbecker
` (3 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Frederic Weisbecker @ 2018-02-08 17:59 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
Paul E . McKenney, Ingo Molnar, Wanpeng Li, Mike Galbraith,
Rik van Riel
As we prepare for offloading the residual 1Hz scheduler tick to
workqueues, let's affine those to the housekeeping CPUs so that they
don't interrupt the CPUs that don't want to be disturbed.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
include/linux/sched/isolation.h | 1 +
kernel/sched/isolation.c | 4 +++-
kernel/workqueue.c | 3 ++-
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index d849431..4a6582c 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -12,6 +12,7 @@ enum hk_flags {
HK_FLAG_SCHED = (1 << 3),
HK_FLAG_TICK = (1 << 4),
HK_FLAG_DOMAIN = (1 << 5),
+ HK_FLAG_WQ = (1 << 6),
};
#ifdef CONFIG_CPU_ISOLATION
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index b71b436..8f1c1de 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -3,6 +3,7 @@
* any CPU: unbound workqueues, timers, kthreads and any offloadable work.
*
* Copyright (C) 2017 Red Hat, Inc., Frederic Weisbecker
+ * Copyright (C) 2017-2018 SUSE, Frederic Weisbecker
*
*/
@@ -119,7 +120,8 @@ static int __init housekeeping_nohz_full_setup(char *str)
{
unsigned int flags;
- flags = HK_FLAG_TICK | HK_FLAG_TIMER | HK_FLAG_RCU | HK_FLAG_MISC;
+ flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER |
+ HK_FLAG_RCU | HK_FLAG_MISC;
return housekeeping_setup(str, flags);
}
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 017044c..0cee2d6 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5570,7 +5570,8 @@ int __init workqueue_init_early(void)
WARN_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));
BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
- cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_FLAG_DOMAIN));
+ cpumask_copy(wq_unbound_cpumask,
+ housekeeping_cpumask(HK_FLAG_DOMAIN | HK_FLAG_WQ));
pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
--
2.7.4
* Re: [PATCH 3/6] sched/isolation: Isolate workqueues when "nohz_full=" is set
2018-02-08 17:59 ` [PATCH 3/6] sched/isolation: Isolate workqueues when "nohz_full=" is set Frederic Weisbecker
@ 2018-02-09 6:55 ` Ingo Molnar
2018-02-10 10:22 ` Frederic Weisbecker
0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2018-02-09 6:55 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
Wanpeng Li, Mike Galbraith, Rik van Riel
* Frederic Weisbecker <frederic@kernel.org> wrote:
> - flags = HK_FLAG_TICK | HK_FLAG_TIMER | HK_FLAG_RCU | HK_FLAG_MISC;
> + flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER |
> + HK_FLAG_RCU | HK_FLAG_MISC;
> - cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_FLAG_DOMAIN));
> + cpumask_copy(wq_unbound_cpumask,
> + housekeeping_cpumask(HK_FLAG_DOMAIN | HK_FLAG_WQ));
LGTM, but _please_ don't do these ugly line-breaks, just keep it slightly over
col80.
Thanks,
Ingo
* Re: [PATCH 3/6] sched/isolation: Isolate workqueues when "nohz_full=" is set
2018-02-09 6:55 ` Ingo Molnar
@ 2018-02-10 10:22 ` Frederic Weisbecker
0 siblings, 0 replies; 19+ messages in thread
From: Frederic Weisbecker @ 2018-02-10 10:22 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
Wanpeng Li, Mike Galbraith, Rik van Riel
On Fri, Feb 09, 2018 at 07:55:44AM +0100, Ingo Molnar wrote:
>
> * Frederic Weisbecker <frederic@kernel.org> wrote:
>
> > - flags = HK_FLAG_TICK | HK_FLAG_TIMER | HK_FLAG_RCU | HK_FLAG_MISC;
> > + flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER |
> > + HK_FLAG_RCU | HK_FLAG_MISC;
>
> > - cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_FLAG_DOMAIN));
> > + cpumask_copy(wq_unbound_cpumask,
> > + housekeeping_cpumask(HK_FLAG_DOMAIN | HK_FLAG_WQ));
>
> LGTM, but _please_ don't do these ugly line-breaks, just keep it slightly over
> col80.
Works for me, I'll fix.
Thanks.
* [PATCH 4/6] sched/isolation: Residual 1Hz scheduler tick offload
2018-02-08 17:59 [PATCH 0/6] isolation: 1Hz residual tick offloading v5 Frederic Weisbecker
` (2 preceding siblings ...)
2018-02-08 17:59 ` [PATCH 3/6] sched/isolation: Isolate workqueues when "nohz_full=" is set Frederic Weisbecker
@ 2018-02-08 17:59 ` Frederic Weisbecker
2018-02-09 7:16 ` Ingo Molnar
2018-02-08 17:59 ` [PATCH 5/6] sched/nohz: Remove the 1 Hz tick code Frederic Weisbecker
` (2 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Frederic Weisbecker @ 2018-02-08 17:59 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
Paul E . McKenney, Ingo Molnar, Wanpeng Li, Mike Galbraith,
Rik van Riel
When a CPU runs in full dynticks mode, a 1Hz tick remains in order to
keep the scheduler stats alive. However this residual tick is a burden
for bare metal tasks that can't stand any interruption at all, or want
to minimize them.
The usual boot parameters "nohz_full=" or "isolcpus=nohz" will now
outsource these scheduler ticks to the global workqueue so that a
housekeeping CPU handles those remotely. The sched_class::task_tick()
implementations have been audited and look safe to be called remotely
as the target runqueue and its current task are passed as parameters
and don't seem to be accessed locally.
Note that in the case of using isolcpus, it's still up to the user to
affine the global workqueues to the housekeeping CPUs through
/sys/devices/virtual/workqueue/cpumask or domains isolation
"isolcpus=nohz,domain".
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/core.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++-
kernel/sched/isolation.c | 4 +++
kernel/sched/sched.h | 2 ++
3 files changed, 96 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fc9fa25..5c0e8b6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3120,7 +3120,94 @@ u64 scheduler_tick_max_deferment(void)
return jiffies_to_nsecs(next - now);
}
-#endif
+
+struct tick_work {
+ int cpu;
+ struct delayed_work work;
+};
+
+static struct tick_work __percpu *tick_work_cpu;
+
+static void sched_tick_remote(struct work_struct *work)
+{
+ struct delayed_work *dwork = to_delayed_work(work);
+ struct tick_work *twork = container_of(dwork, struct tick_work, work);
+ int cpu = twork->cpu;
+ struct rq *rq = cpu_rq(cpu);
+ struct rq_flags rf;
+
+ /*
+ * Handle the tick only if it appears the remote CPU is running
+ * in full dynticks mode. The check is racy by nature, but
+ * missing a tick or having one too much is no big deal.
+ */
+ if (!idle_cpu(cpu) && tick_nohz_tick_stopped_cpu(cpu)) {
+ struct task_struct *curr;
+ u64 delta;
+
+ rq_lock_irq(rq, &rf);
+ update_rq_clock(rq);
+ curr = rq->curr;
+ delta = rq_clock_task(rq) - curr->se.exec_start;
+ /* Make sure we tick in a reasonable amount of time */
+ WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
+ curr->sched_class->task_tick(rq, curr, 0);
+ rq_unlock_irq(rq, &rf);
+ }
+
+ /*
+ * Perform remote tick every second. The arbitrary frequence is
+ * large enough to avoid overload and short enough to keep sched
+ * internal stats alive.
+ */
+ queue_delayed_work(system_unbound_wq, dwork, HZ);
+}
+
+static void sched_tick_start(int cpu)
+{
+ struct tick_work *twork;
+
+ if (housekeeping_cpu(cpu, HK_FLAG_TICK))
+ return;
+
+ WARN_ON_ONCE(!tick_work_cpu);
+
+ twork = per_cpu_ptr(tick_work_cpu, cpu);
+ twork->cpu = cpu;
+ INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
+ queue_delayed_work(system_unbound_wq, &twork->work, HZ);
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static void sched_tick_stop(int cpu)
+{
+ struct tick_work *twork;
+
+ if (housekeeping_cpu(cpu, HK_FLAG_TICK))
+ return;
+
+ WARN_ON_ONCE(!tick_work_cpu);
+
+ twork = per_cpu_ptr(tick_work_cpu, cpu);
+ cancel_delayed_work_sync(&twork->work);
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
+int __init sched_tick_offload_init(void)
+{
+ tick_work_cpu = alloc_percpu(struct tick_work);
+ if (!tick_work_cpu) {
+ pr_err("Can't allocate remote tick struct\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+#else
+static void sched_tick_start(int cpu) { }
+static void sched_tick_stop(int cpu) { }
+#endif /* CONFIG_NO_HZ_FULL */
#if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
defined(CONFIG_PREEMPT_TRACER))
@@ -5781,6 +5868,7 @@ int sched_cpu_starting(unsigned int cpu)
{
set_cpu_rq_start_time(cpu);
sched_rq_cpu_starting(cpu);
+ sched_tick_start(cpu);
return 0;
}
@@ -5792,6 +5880,7 @@ int sched_cpu_dying(unsigned int cpu)
/* Handle pending wakeups and then migrate everything off */
sched_ttwu_pending();
+ sched_tick_stop(cpu);
rq_lock_irqsave(rq, &rf);
if (rq->rd) {
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 8f1c1de..d782302 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -13,6 +13,7 @@
#include <linux/kernel.h>
#include <linux/static_key.h>
#include <linux/ctype.h>
+#include "sched.h"
DEFINE_STATIC_KEY_FALSE(housekeeping_overriden);
EXPORT_SYMBOL_GPL(housekeeping_overriden);
@@ -61,6 +62,9 @@ void __init housekeeping_init(void)
static_branch_enable(&housekeeping_overriden);
+ if (housekeeping_flags & HK_FLAG_TICK)
+ sched_tick_offload_init();
+
/* We need at least one CPU to handle housekeeping work */
WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fb5fc458..c1c7c78 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1574,6 +1574,7 @@ extern void post_init_entity_util_avg(struct sched_entity *se);
#ifdef CONFIG_NO_HZ_FULL
extern bool sched_can_stop_tick(struct rq *rq);
+extern int __init sched_tick_offload_init(void);
/*
* Tick may be needed by tasks in the runqueue depending on their policy and
@@ -1598,6 +1599,7 @@ static inline void sched_update_tick_dependency(struct rq *rq)
tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
}
#else
+static inline int sched_tick_offload_init(void) { return 0; }
static inline void sched_update_tick_dependency(struct rq *rq) { }
#endif
--
2.7.4
* Re: [PATCH 4/6] sched/isolation: Residual 1Hz scheduler tick offload
2018-02-08 17:59 ` [PATCH 4/6] sched/isolation: Residual 1Hz scheduler tick offload Frederic Weisbecker
@ 2018-02-09 7:16 ` Ingo Molnar
2018-02-10 10:29 ` Frederic Weisbecker
0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2018-02-09 7:16 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
Wanpeng Li, Mike Galbraith, Rik van Riel
* Frederic Weisbecker <frederic@kernel.org> wrote:
> When a CPU runs in full dynticks mode, a 1Hz tick remains in order to
> keep the scheduler stats alive. However this residual tick is a burden
> for bare metal tasks that can't stand any interruption at all, or want
> to minimize them.
>
> The usual boot parameters "nohz_full=" or "isolcpus=nohz" will now
> outsource these scheduler ticks to the global workqueue so that a
> housekeeping CPU handles those remotely. The sched_class::task_tick()
> implementations have been audited and look safe to be called remotely
> as the target runqueue and its current task are passed as parameters
> and don't seem to be accessed locally.
>
> Note that in the case of using isolcpus, it's still up to the user to
> affine the global workqueues to the housekeeping CPUs through
> /sys/devices/virtual/workqueue/cpumask or domains isolation
> "isolcpus=nohz,domain".
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Chris Metcalf <cmetcalf@mellanox.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Wanpeng Li <kernellwp@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
> kernel/sched/core.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++-
> kernel/sched/isolation.c | 4 +++
> kernel/sched/sched.h | 2 ++
> 3 files changed, 96 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index fc9fa25..5c0e8b6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3120,7 +3120,94 @@ u64 scheduler_tick_max_deferment(void)
>
> return jiffies_to_nsecs(next - now);
> }
> -#endif
> +
> +struct tick_work {
> + int cpu;
> + struct delayed_work work;
> +};
> +
> +static struct tick_work __percpu *tick_work_cpu;
> +
> +static void sched_tick_remote(struct work_struct *work)
> +{
> + struct delayed_work *dwork = to_delayed_work(work);
> + struct tick_work *twork = container_of(dwork, struct tick_work, work);
> + int cpu = twork->cpu;
> + struct rq *rq = cpu_rq(cpu);
> + struct rq_flags rf;
> +
> + /*
> + * Handle the tick only if it appears the remote CPU is running
> + * in full dynticks mode. The check is racy by nature, but
> + * missing a tick or having one too much is no big deal.
I'd suggest pointing out why it's no big deal:
* missing a tick or having one too much is no big deal,
* because the scheduler tick updates statistics and checks
* timeslices in a time-independent way, regardless of when
* exactly it is running.
> + */
> + if (!idle_cpu(cpu) && tick_nohz_tick_stopped_cpu(cpu)) {
> + struct task_struct *curr;
> + u64 delta;
> +
> + rq_lock_irq(rq, &rf);
> + update_rq_clock(rq);
> + curr = rq->curr;
> + delta = rq_clock_task(rq) - curr->se.exec_start;
> + /* Make sure we tick in a reasonable amount of time */
> + WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
Please add a newline before the comment, and I'd also suggest this wording:
/* Make sure the next tick runs within a reasonable amount of time: */
> + /*
> + * Perform remote tick every second. The arbitrary frequence is
> + * large enough to avoid overload and short enough to keep sched
> + * internal stats alive.
> + */
> + queue_delayed_work(system_unbound_wq, dwork, HZ);
> +}
Typo. I'd also suggest somewhat clearer wording:
/*
* Run the remote tick once per second (1Hz). This arbitrary
* frequency is large enough to avoid overload but short enough
* to keep scheduler internal stats reasonably up to date.
*/
> +#ifdef CONFIG_HOTPLUG_CPU
> +static void sched_tick_stop(int cpu)
> +{
> + struct tick_work *twork;
> +
> + if (housekeeping_cpu(cpu, HK_FLAG_TICK))
> + return;
> +
> + WARN_ON_ONCE(!tick_work_cpu);
> +
> + twork = per_cpu_ptr(tick_work_cpu, cpu);
> + cancel_delayed_work_sync(&twork->work);
> +}
> +#endif /* CONFIG_HOTPLUG_CPU */
> +
> +int __init sched_tick_offload_init(void)
> +{
> + tick_work_cpu = alloc_percpu(struct tick_work);
> + if (!tick_work_cpu) {
> + pr_err("Can't allocate remote tick struct\n");
> + return -ENOMEM;
Printing a warning is not enough. If tick_work_cpu ends up being NULL, then the
tick will crash AFAICS, due to:
> + twork = per_cpu_ptr(tick_work_cpu, cpu);
> + cancel_delayed_work_sync(&twork->work);
... it's much better to crash straight away - i.e. we should use panic().
> +#else
> +static void sched_tick_start(int cpu) { }
> +static void sched_tick_stop(int cpu) { }
> +#endif /* CONFIG_NO_HZ_FULL */
So if we are using #if/else/endif markers, please use them in the #else branch
when it's so short, where they are actually useful:
> +#else /* !CONFIG_NO_HZ_FULL: */
> +static void sched_tick_start(int cpu) { }
> +static void sched_tick_stop(int cpu) { }
> +#endif
(also note the inversion)
Thanks,
Ingo
* Re: [PATCH 4/6] sched/isolation: Residual 1Hz scheduler tick offload
2018-02-09 7:16 ` Ingo Molnar
@ 2018-02-10 10:29 ` Frederic Weisbecker
0 siblings, 0 replies; 19+ messages in thread
From: Frederic Weisbecker @ 2018-02-10 10:29 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
Wanpeng Li, Mike Galbraith, Rik van Riel
On Fri, Feb 09, 2018 at 08:16:12AM +0100, Ingo Molnar wrote:
>
> * Frederic Weisbecker <frederic@kernel.org> wrote:
>
> > When a CPU runs in full dynticks mode, a 1Hz tick remains in order to
> > keep the scheduler stats alive. However this residual tick is a burden
> > for bare metal tasks that can't stand any interruption at all, or want
> > to minimize them.
> >
> > The usual boot parameters "nohz_full=" or "isolcpus=nohz" will now
> > outsource these scheduler ticks to the global workqueue so that a
> > housekeeping CPU handles those remotely. The sched_class::task_tick()
> > implementations have been audited and look safe to be called remotely
> > as the target runqueue and its current task are passed as parameters
> > and don't seem to be accessed locally.
> >
> > Note that in the case of using isolcpus, it's still up to the user to
> > affine the global workqueues to the housekeeping CPUs through
> > /sys/devices/virtual/workqueue/cpumask or domains isolation
> > "isolcpus=nohz,domain".
> >
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Chris Metcalf <cmetcalf@mellanox.com>
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Luiz Capitulino <lcapitulino@redhat.com>
> > Cc: Mike Galbraith <efault@gmx.de>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Wanpeng Li <kernellwp@gmail.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > ---
> > kernel/sched/core.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++-
> > kernel/sched/isolation.c | 4 +++
> > kernel/sched/sched.h | 2 ++
> > 3 files changed, 96 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index fc9fa25..5c0e8b6 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3120,7 +3120,94 @@ u64 scheduler_tick_max_deferment(void)
> >
> > return jiffies_to_nsecs(next - now);
> > }
> > -#endif
> > +
> > +struct tick_work {
> > + int cpu;
> > + struct delayed_work work;
> > +};
> > +
> > +static struct tick_work __percpu *tick_work_cpu;
> > +
> > +static void sched_tick_remote(struct work_struct *work)
> > +{
> > + struct delayed_work *dwork = to_delayed_work(work);
> > + struct tick_work *twork = container_of(dwork, struct tick_work, work);
> > + int cpu = twork->cpu;
> > + struct rq *rq = cpu_rq(cpu);
> > + struct rq_flags rf;
> > +
> > + /*
> > + * Handle the tick only if it appears the remote CPU is running
> > + * in full dynticks mode. The check is racy by nature, but
> > + * missing a tick or having one too much is no big deal.
>
> I'd suggest pointing out why it's no big deal:
>
> * missing a tick or having one too much is no big deal,
> * because the scheduler tick updates statistics and checks
> * timeslices in a time-independent way, regardless of when
> * exactly it is running.
>
> > + */
> > + if (!idle_cpu(cpu) && tick_nohz_tick_stopped_cpu(cpu)) {
> > + struct task_struct *curr;
> > + u64 delta;
> > +
> > + rq_lock_irq(rq, &rf);
> > + update_rq_clock(rq);
> > + curr = rq->curr;
> > + delta = rq_clock_task(rq) - curr->se.exec_start;
> > + /* Make sure we tick in a reasonable amount of time */
> > + WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
>
>
> Please add a newline before the comment, and I'd also suggest this wording:
>
> /* Make sure the next tick runs within a reasonable amount of time: */
>
> > + /*
> > + * Perform remote tick every second. The arbitrary frequence is
> > + * large enough to avoid overload and short enough to keep sched
> > + * internal stats alive.
> > + */
> > + queue_delayed_work(system_unbound_wq, dwork, HZ);
> > +}
>
> Typo. I'd also suggest somewhat clearer wording:
>
> /*
> * Run the remote tick once per second (1Hz). This arbitrary
> * frequency is large enough to avoid overload but short enough
> * to keep scheduler internal stats reasonably up to date.
> */
>
> > +#ifdef CONFIG_HOTPLUG_CPU
> > +static void sched_tick_stop(int cpu)
> > +{
> > + struct tick_work *twork;
> > +
> > + if (housekeeping_cpu(cpu, HK_FLAG_TICK))
> > + return;
> > +
> > + WARN_ON_ONCE(!tick_work_cpu);
> > +
> > + twork = per_cpu_ptr(tick_work_cpu, cpu);
> > + cancel_delayed_work_sync(&twork->work);
> > +}
> > +#endif /* CONFIG_HOTPLUG_CPU */
> > +
> > +int __init sched_tick_offload_init(void)
> > +{
> > + tick_work_cpu = alloc_percpu(struct tick_work);
> > + if (!tick_work_cpu) {
> > + pr_err("Can't allocate remote tick struct\n");
> > + return -ENOMEM;
>
> Printing a warning is not enough. If tick_work_cpu ends up being NULL, then the
> tick will crash AFAICS, due to:
>
> > + twork = per_cpu_ptr(tick_work_cpu, cpu);
> > + cancel_delayed_work_sync(&twork->work);
>
> ... it's much better to crash straight away - i.e. we should use panic().
>
> > +#else
> > +static void sched_tick_start(int cpu) { }
> > +static void sched_tick_stop(int cpu) { }
> > +#endif /* CONFIG_NO_HZ_FULL */
>
> So if we are using #if/else/endif markers, please use them in the #else branch
> when it's so short, where they are actually useful:
>
> > +#else /* !CONFIG_NO_HZ_FULL: */
> > +static void sched_tick_start(int cpu) { }
> > +static void sched_tick_stop(int cpu) { }
> > +#endif
>
> (also note the inversion)
Ok for everything there, I'll fix.
Thanks!
* [PATCH 5/6] sched/nohz: Remove the 1 Hz tick code
2018-02-08 17:59 [PATCH 0/6] isolation: 1Hz residual tick offloading v5 Frederic Weisbecker
` (3 preceding siblings ...)
2018-02-08 17:59 ` [PATCH 4/6] sched/isolation: Residual 1Hz scheduler tick offload Frederic Weisbecker
@ 2018-02-08 17:59 ` Frederic Weisbecker
2018-02-08 17:59 ` [PATCH 6/6] sched/isolation: Tick offload documentation Frederic Weisbecker
2018-02-09 7:00 ` [PATCH 0/6] isolation: 1Hz residual tick offloading v5 Ingo Molnar
6 siblings, 0 replies; 19+ messages in thread
From: Frederic Weisbecker @ 2018-02-08 17:59 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
Paul E . McKenney, Ingo Molnar, Wanpeng Li, Mike Galbraith,
Rik van Riel
Now that the 1Hz tick is offloaded to workqueues, we can safely remove
the residual code that used to handle it locally.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
include/linux/sched/nohz.h | 4 ----
kernel/sched/core.c | 29 -----------------------------
kernel/sched/idle_task.c | 1 -
kernel/sched/sched.h | 11 +----------
kernel/time/tick-sched.c | 6 ------
5 files changed, 1 insertion(+), 50 deletions(-)
diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h
index 3d3a97d..0942172 100644
--- a/include/linux/sched/nohz.h
+++ b/include/linux/sched/nohz.h
@@ -37,8 +37,4 @@ extern void wake_up_nohz_cpu(int cpu);
static inline void wake_up_nohz_cpu(int cpu) { }
#endif
-#ifdef CONFIG_NO_HZ_FULL
-extern u64 scheduler_tick_max_deferment(void);
-#endif
-
#endif /* _LINUX_SCHED_NOHZ_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5c0e8b6..5aeecf2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3091,35 +3091,9 @@ void scheduler_tick(void)
rq->idle_balance = idle_cpu(cpu);
trigger_load_balance(rq);
#endif
- rq_last_tick_reset(rq);
}
#ifdef CONFIG_NO_HZ_FULL
-/**
- * scheduler_tick_max_deferment
- *
- * Keep at least one tick per second when a single
- * active task is running because the scheduler doesn't
- * yet completely support full dynticks environment.
- *
- * This makes sure that uptime, CFS vruntime, load
- * balancing, etc... continue to move forward, even
- * with a very low granularity.
- *
- * Return: Maximum deferment in nanoseconds.
- */
-u64 scheduler_tick_max_deferment(void)
-{
- struct rq *rq = this_rq();
- unsigned long next, now = READ_ONCE(jiffies);
-
- next = rq->last_sched_tick + HZ;
-
- if (time_before_eq(next, now))
- return 0;
-
- return jiffies_to_nsecs(next - now);
-}
struct tick_work {
int cpu;
@@ -6108,9 +6082,6 @@ void __init sched_init(void)
rq->last_load_update_tick = jiffies;
rq->nohz_flags = 0;
#endif
-#ifdef CONFIG_NO_HZ_FULL
- rq->last_sched_tick = 0;
-#endif
#endif /* CONFIG_SMP */
hrtick_rq_init(rq);
atomic_set(&rq->nr_iowait, 0);
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index d518664..a64fc92 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -48,7 +48,6 @@ dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
{
- rq_last_tick_reset(rq);
}
static void task_tick_idle(struct rq *rq, struct task_struct *curr, int queued)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c1c7c78..dc6c8b5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -727,9 +727,7 @@ struct rq {
#endif /* CONFIG_SMP */
unsigned long nohz_flags;
#endif /* CONFIG_NO_HZ_COMMON */
-#ifdef CONFIG_NO_HZ_FULL
- unsigned long last_sched_tick;
-#endif
+
/* capture load from *all* tasks on this cpu: */
struct load_weight load;
unsigned long nr_load_updates;
@@ -1626,13 +1624,6 @@ static inline void sub_nr_running(struct rq *rq, unsigned count)
sched_update_tick_dependency(rq);
}
-static inline void rq_last_tick_reset(struct rq *rq)
-{
-#ifdef CONFIG_NO_HZ_FULL
- rq->last_sched_tick = jiffies;
-#endif
-}
-
extern void update_rq_clock(struct rq *rq);
extern void activate_task(struct rq *rq, struct task_struct *p, int flags);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b517485..0c615a0 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -748,12 +748,6 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
delta = KTIME_MAX;
}
-#ifdef CONFIG_NO_HZ_FULL
- /* Limit the tick delta to the maximum scheduler deferment */
- if (!ts->inidle)
- delta = min(delta, scheduler_tick_max_deferment());
-#endif
-
/* Calculate the next expiry time */
if (delta < (KTIME_MAX - basemono))
expires = basemono + delta;
--
2.7.4
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 6/6] sched/isolation: Tick offload documentation
2018-02-08 17:59 [PATCH 0/6] isolation: 1Hz residual tick offloading v5 Frederic Weisbecker
` (4 preceding siblings ...)
2018-02-08 17:59 ` [PATCH 5/6] sched/nohz: Remove the 1 Hz tick code Frederic Weisbecker
@ 2018-02-08 17:59 ` Frederic Weisbecker
2018-02-09 7:06 ` Ingo Molnar
2018-02-09 7:00 ` [PATCH 0/6] isolation: 1Hz residual tick offloading v5 Ingo Molnar
6 siblings, 1 reply; 19+ messages in thread
From: Frederic Weisbecker @ 2018-02-08 17:59 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Peter Zijlstra, Chris Metcalf,
Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
Paul E . McKenney, Ingo Molnar, Wanpeng Li, Mike Galbraith,
Rik van Riel
Update the documentation to reflect the 1Hz tick offload changes.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
Documentation/admin-guide/kernel-parameters.txt | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 39ac9d4..c851e41 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1762,7 +1762,11 @@
specified in the flag list (default: domain):
nohz
- Disable the tick when a single task runs.
+ Disable the tick when a single task runs. A residual 1Hz
+ tick is offloaded to workqueues that you need to affine
+ to housekeeping through the sysfs file
+ /sys/devices/virtual/workqueue/cpumask or using the below
+ domain flag.
domain
Isolate from the general SMP balancing and scheduling
algorithms. Note that performing domain isolation this way
--
2.7.4
* Re: [PATCH 6/6] sched/isolation: Tick offload documentation
2018-02-08 17:59 ` [PATCH 6/6] sched/isolation: Tick offload documentation Frederic Weisbecker
@ 2018-02-09 7:06 ` Ingo Molnar
2018-02-14 14:52 ` Frederic Weisbecker
0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2018-02-09 7:06 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
Wanpeng Li, Mike Galbraith, Rik van Riel
* Frederic Weisbecker <frederic@kernel.org> wrote:
> Update the documentation to reflect the 1Hz tick offload changes.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Chris Metcalf <cmetcalf@mellanox.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Wanpeng Li <kernellwp@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 39ac9d4..c851e41 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1762,7 +1762,11 @@
> specified in the flag list (default: domain):
>
> nohz
> - Disable the tick when a single task runs.
> + Disable the tick when a single task runs. A residual 1Hz
> + tick is offloaded to workqueues that you need to affine
> + to housekeeping through the sysfs file
> + /sys/devices/virtual/workqueue/cpumask or using the below
> + domain flag.
This is pretty ambiguous and somewhat confusing; I'd suggest something like:
nohz
Disable the tick when a single task runs.
A residual 1Hz tick is offloaded to workqueues, which you
need to affine to housekeeping through the global
workqueue's affinity configured via the
/sys/devices/virtual/workqueue/cpumask sysfs file, or
by using the 'domain' flag described below.
NOTE: by default the global workqueue runs on all CPUs,
so to protect individual CPUs the 'cpumask' file has to
be configured manually after bootup.
Assuming what I wrote is correct: the CPU isolation config space is pretty
confusing all around and should be made a lot more human-friendly ...
Thanks,
Ingo
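Ingo's note about the global workqueue cpumask can be made concrete with a short shell sketch. The CPU layout and the isolcpus= flags below are hypothetical, and writing the sysfs file requires root on a real system:

```shell
# Hypothetical setup: CPUs 0-1 are housekeeping, CPUs 2-7 are isolated
# (e.g. booted with "isolcpus=nohz,domain,2-7").  Compute the hex mask
# covering only the housekeeping CPUs:
mask=$(printf '%x' $(( (1 << 0) | (1 << 1) )))
echo "housekeeping mask: $mask"

# Confine all unbound workqueues, including the offloaded 1Hz tick
# work, to the housekeeping CPUs.  Guarded so the sketch is a no-op
# when the file is absent or not writable:
sysfs=/sys/devices/virtual/workqueue/cpumask
if [ -w "$sysfs" ]; then
	echo "$mask" > "$sysfs"
	cat "$sysfs"
fi
```

This is the manual post-boot step Ingo's NOTE refers to: by default the mask covers all CPUs, so the offloaded tick work can still land on an isolated CPU until the mask is narrowed.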
* Re: [PATCH 6/6] sched/isolation: Tick offload documentation
2018-02-09 7:06 ` Ingo Molnar
@ 2018-02-14 14:52 ` Frederic Weisbecker
0 siblings, 0 replies; 19+ messages in thread
From: Frederic Weisbecker @ 2018-02-14 14:52 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
Wanpeng Li, Mike Galbraith, Rik van Riel
On Fri, Feb 09, 2018 at 08:06:49AM +0100, Ingo Molnar wrote:
>
> * Frederic Weisbecker <frederic@kernel.org> wrote:
>
> > Update the documentation to reflect the 1Hz tick offload changes.
> >
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Chris Metcalf <cmetcalf@mellanox.com>
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Luiz Capitulino <lcapitulino@redhat.com>
> > Cc: Mike Galbraith <efault@gmx.de>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Wanpeng Li <kernellwp@gmail.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > ---
> > Documentation/admin-guide/kernel-parameters.txt | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 39ac9d4..c851e41 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1762,7 +1762,11 @@
> > specified in the flag list (default: domain):
> >
> > nohz
> > - Disable the tick when a single task runs.
> > + Disable the tick when a single task runs. A residual 1Hz
> > + tick is offloaded to workqueues that you need to affine
> > + to housekeeping through the sysfs file
> > + /sys/devices/virtual/workqueue/cpumask or using the below
> > + domain flag.
>
> This is pretty ambiguous and somewhat confusing, I'd suggest something like:
>
> nohz
> Disable the tick when a single task runs.
>
> A residual 1Hz tick is offloaded to workqueues, which you
> need to affine to housekeeping through the global
> workqueue's affinity configured via the
> /sys/devices/virtual/workqueue/cpumask sysfs file, or
> by using the 'domain' flag described below.
>
> NOTE: by default the global workqueue runs on all CPUs,
> so to protect individual CPUs the 'cpumask' file has to
> be configured manually after bootup.
>
> Assuming what I wrote is correct - the CPU isolation config space is pretty
> confusing all around and should be made a lot more human friendly ...
That's right. In fact "nohz_full=" affines the workqueues, and it involves much
more: unbound timer affinity, RCU threads, etc...
So "nohz_full=" is the friendly interface, as it does it all in one go.
The use of "isolcpus=" is meant to be more fine-grained and allow for more control.
Ideally I would like an "unbound" flag that affines all this unbound work, and
perhaps a "per_cpu" flag to disable or offload per-CPU work such as the watchdog.
Then we would only need to do:
isolcpus=nohz,unbound,per_cpu
Or even just:
isolcpus=all
But before extending isolcpus= further, I would like to make sure it can be made
mutable later through cpusets. So this is work in progress.
Thanks.
* Re: [PATCH 0/6] isolation: 1Hz residual tick offloading v5
2018-02-08 17:59 [PATCH 0/6] isolation: 1Hz residual tick offloading v5 Frederic Weisbecker
` (5 preceding siblings ...)
2018-02-08 17:59 ` [PATCH 6/6] sched/isolation: Tick offload documentation Frederic Weisbecker
@ 2018-02-09 7:00 ` Ingo Molnar
2018-02-10 10:24 ` Frederic Weisbecker
6 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2018-02-09 7:00 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
Wanpeng Li, Mike Galbraith, Rik van Riel
* Frederic Weisbecker <frederic@kernel.org> wrote:
> sched/isolation: Residual 1Hz scheduler tick offload
> sched/isolation: Tick offload documentation
Please try to start each title with a verb.
[ ... and preferably not by prepending 'do' ;-) ]
Beyond making changelogs more consistent, this will actually also add real
information to the title, because, for example, any of these possible variants:
sched/isolation: Fix tick offload documentation
sched/isolation: Update tick offload documentation
sched/isolation: Add tick offload documentation
sched/isolation: Remove tick offload documentation
sched/isolation: Fix residual 1Hz scheduler tick offload
sched/isolation: Update residual 1Hz scheduler tick offload
sched/isolation: Introduce residual 1Hz scheduler tick offload
sched/isolation: Remove residual 1Hz scheduler tick offload
will tell us _a lot more_ about the nature of the changes from the shortlog alone!
Thanks,
Ingo
* Re: [PATCH 0/6] isolation: 1Hz residual tick offloading v5
2018-02-09 7:00 ` [PATCH 0/6] isolation: 1Hz residual tick offloading v5 Ingo Molnar
@ 2018-02-10 10:24 ` Frederic Weisbecker
0 siblings, 0 replies; 19+ messages in thread
From: Frederic Weisbecker @ 2018-02-10 10:24 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner,
Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
Wanpeng Li, Mike Galbraith, Rik van Riel
On Fri, Feb 09, 2018 at 08:00:06AM +0100, Ingo Molnar wrote:
>
> * Frederic Weisbecker <frederic@kernel.org> wrote:
>
> > sched/isolation: Residual 1Hz scheduler tick offload
> > sched/isolation: Tick offload documentation
>
> Please try to start each title with a verb.
>
> [ ... and preferably not by prepending 'do' ;-) ]
>
> Beyond making changelogs more consistent, this will actually also add real
> information to the title, because, for example, any of these possible variants:
>
> sched/isolation: Fix tick offload documentation
> sched/isolation: Update tick offload documentation
> sched/isolation: Add tick offload documentation
> sched/isolation: Remove tick offload documentation
>
> sched/isolation: Fix residual 1Hz scheduler tick offload
> sched/isolation: Update residual 1Hz scheduler tick offload
> sched/isolation: Introduce residual 1Hz scheduler tick offload
> sched/isolation: Remove residual 1Hz scheduler tick offload
>
> will tell us _a lot more_ about the nature of the changes from the shortlog alone!
Good point, I've picked up strange habits over the years.
Thanks.