This is a kernel enhancement that configures the cpu affinity of kernel
threads via kernel boot option nohz_full=.

When this option is specified, the cpumask is immediately applied upon
thread launch. This does not affect kernel threads that specify cpu
and node.

This allows CPU isolation (that is, not allowing certain threads
to execute on certain CPUs) without using the isolcpus=domain parameter,
making it possible to enable load balancing on such CPUs
during runtime (see kernel-parameters.txt).

Note-1: this is based on Wind River's patch at
https://github.com/starlingx-staging/stx-integ/blob/master/kernel/kernel-std/centos/patches/affine-compute-kernel-threads.patch

Difference being that this patch is limited to modifying
kernel thread cpumask: Behaviour of other threads can
be controlled via cgroups or sched_setaffinity.

Note-2: Wind River's patch was based off Christoph Lameter's patch at
https://lwn.net/Articles/565932/ with the only difference being
the kernel parameter changed from kthread to kthread_cpus.

v2: use isolcpus= subcommand (Thomas Gleixner)

v3: s/MontaVista/Wind River/ on changelog (Chris Friesen)
    documentation updates (Chris Friesen)
    undeprecate isolcpus (Chris Friesen)
    general cleanups (Frederic Weisbecker)
    separate cpu_possible_mask kthread
    mask change (Frederic Weisbecker)

v4: disable idle load balancing for nohz_full=
    use nohz_full= option for kthread isolation (Frederic Weisbecker)
Next patch will switch unbound kernel threads mask to
housekeeping_cpumask(), a subset of cpu_possible_mask. Switch from
cpu_all_mask to cpu_possible_mask separately, to ease bisection.

Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 kernel/kthread.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/kthread.c
===================================================================
--- linux-2.6.orig/kernel/kthread.c
+++ linux-2.6/kernel/kthread.c
@@ -347,7 +347,7 @@ struct task_struct *__kthread_create_on_
 		 * The kernel thread should not inherit these properties.
 		 */
 		sched_setscheduler_nocheck(task, SCHED_NORMAL, &param);
-		set_cpus_allowed_ptr(task, cpu_all_mask);
+		set_cpus_allowed_ptr(task, cpu_possible_mask);
 	}
 	kfree(create);
 	return task;
@@ -572,7 +572,7 @@ int kthreadd(void *unused)
 	/* Setup a clean context for our children to inherit. */
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
-	set_cpus_allowed_ptr(tsk, cpu_all_mask);
+	set_cpus_allowed_ptr(tsk, cpu_possible_mask);
 	set_mems_allowed(node_states[N_MEMORY]);
 
 	current->flags |= PF_NOFREEZE;
Avoid idle load balancing on nohz_full CPUs. This avoids assigning
tasks to such CPUs, when they enter idle.

Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-2.6/kernel/sched/isolation.c
===================================================================
--- linux-2.6.orig/kernel/sched/isolation.c
+++ linux-2.6/kernel/sched/isolation.c
@@ -140,7 +140,8 @@ static int __init housekeeping_nohz_full
 {
 	unsigned int flags;
 
-	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU | HK_FLAG_MISC;
+	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
+		HK_FLAG_MISC | HK_FLAG_SCHED;
 
 	return housekeeping_setup(str, flags);
 }
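
As an illustration of the flag change in the patch above (a userspace
sketch, not kernel code): the housekeeping flags form a plain bitmask, so
the patch amounts to setting one more bit in the set that nohz_full=
requests. The bit positions for TIMER, RCU, MISC, SCHED and TICK are
assumed to match include/linux/sched/isolation.h of this era (only DOMAIN,
WQ and MANAGED_IRQ are visible in this thread), and the helper names
nohz_full_flags_before(), nohz_full_flags_after() and hk_flag_set() are
hypothetical.

```c
#include <stdbool.h>

/*
 * Userspace model of enum hk_flags. Bits 5-7 are taken from the diffs in
 * this thread; bits 0-4 are an assumption about the kernel header.
 */
enum hk_flags {
	HK_FLAG_TIMER		= 1,
	HK_FLAG_RCU		= (1 << 1),
	HK_FLAG_MISC		= (1 << 2),
	HK_FLAG_SCHED		= (1 << 3),
	HK_FLAG_TICK		= (1 << 4),
	HK_FLAG_DOMAIN		= (1 << 5),
	HK_FLAG_WQ		= (1 << 6),
	HK_FLAG_MANAGED_IRQ	= (1 << 7),
};

/* Flag set requested by nohz_full= before the patch. */
unsigned int nohz_full_flags_before(void)
{
	return HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
	       HK_FLAG_MISC;
}

/* Flag set requested by nohz_full= after the patch: HK_FLAG_SCHED added. */
unsigned int nohz_full_flags_after(void)
{
	return nohz_full_flags_before() | HK_FLAG_SCHED;
}

/* True if the given housekeeping flag is part of the requested set. */
bool hk_flag_set(unsigned int flags, enum hk_flags flag)
{
	return (flags & flag) != 0;
}
```

In this model, HK_FLAG_DOMAIN stays clear in both sets, which matches the
cover letter's point that isolation is obtained without isolcpus=domain.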
This is a kernel enhancement that configures the cpu affinity of kernel
threads via kernel boot option nohz_full=.

When this option is specified, the cpumask is immediately applied upon
thread launch. This does not affect kernel threads that specify cpu
and node.

This allows CPU isolation (that is, not allowing certain threads
to execute on certain CPUs) without using the isolcpus=domain parameter,
making it possible to enable load balancing on such CPUs
during runtime (see kernel-parameters.txt).

Note-1: this is based on Wind River's patch at
https://github.com/starlingx-staging/stx-integ/blob/master/kernel/kernel-std/centos/patches/affine-compute-kernel-threads.patch

Difference being that this patch is limited to modifying
kernel thread cpumask: Behaviour of other threads can
be controlled via cgroups or sched_setaffinity.

Note-2: Wind River's patch was based off Christoph Lameter's patch at
https://lwn.net/Articles/565932/ with the only difference being
the kernel parameter changed from kthread to kthread_cpus.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 Documentation/admin-guide/kernel-parameters.txt |    8 ++++++++
 include/linux/sched/isolation.h                 |    1 +
 kernel/kthread.c                                |    6 ++++--
 kernel/sched/isolation.c                        |    6 ++++++
 4 files changed, 19 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/sched/isolation.h
===================================================================
--- linux-2.6.orig/include/linux/sched/isolation.h
+++ linux-2.6/include/linux/sched/isolation.h
@@ -14,6 +14,7 @@ enum hk_flags {
 	HK_FLAG_DOMAIN		= (1 << 5),
 	HK_FLAG_WQ		= (1 << 6),
 	HK_FLAG_MANAGED_IRQ	= (1 << 7),
+	HK_FLAG_KTHREAD		= (1 << 8),
 };
 
 #ifdef CONFIG_CPU_ISOLATION
Index: linux-2.6/kernel/kthread.c
===================================================================
--- linux-2.6.orig/kernel/kthread.c
+++ linux-2.6/kernel/kthread.c
@@ -23,6 +23,7 @@
 #include <linux/ptrace.h>
 #include <linux/uaccess.h>
 #include <linux/numa.h>
+#include <linux/sched/isolation.h>
 #include <trace/events/sched.h>
 
 static DEFINE_SPINLOCK(kthread_create_lock);
@@ -347,7 +348,8 @@ struct task_struct *__kthread_create_on_
 		 * The kernel thread should not inherit these properties.
 		 */
 		sched_setscheduler_nocheck(task, SCHED_NORMAL, &param);
-		set_cpus_allowed_ptr(task, cpu_possible_mask);
+		set_cpus_allowed_ptr(task,
+				     housekeeping_cpumask(HK_FLAG_KTHREAD));
 	}
 	kfree(create);
 	return task;
@@ -572,7 +574,7 @@ int kthreadd(void *unused)
 	/* Setup a clean context for our children to inherit. */
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
-	set_cpus_allowed_ptr(tsk, cpu_possible_mask);
+	set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_FLAG_KTHREAD));
 	set_mems_allowed(node_states[N_MEMORY]);
 
 	current->flags |= PF_NOFREEZE;
Index: linux-2.6/kernel/sched/isolation.c
===================================================================
--- linux-2.6.orig/kernel/sched/isolation.c
+++ linux-2.6/kernel/sched/isolation.c
@@ -141,7 +141,7 @@ static int __init housekeeping_nohz_full
 	unsigned int flags;
 
 	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
-		HK_FLAG_MISC | HK_FLAG_SCHED;
+		HK_FLAG_MISC | HK_FLAG_SCHED | HK_FLAG_KTHREAD;
 
 	return housekeeping_setup(str, flags);
 }
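
To make the effect of housekeeping_cpumask(HK_FLAG_KTHREAD) in the patch
above concrete, here is a small userspace model (a sketch under stated
assumptions, not the kernel implementation). It assumes the lookup returns
the housekeeping mask only when a boot option has set up housekeeping and
the queried flag is among the requested flags, and falls back to the
possible-CPU mask otherwise. CPU masks are modelled as 64-bit words, and
struct hk_state, model_housekeeping_cpumask() and model_nohz_full_boot()
are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of enum hk_flags as shown in the diff above. */
enum hk_flags {
	HK_FLAG_DOMAIN		= (1 << 5),
	HK_FLAG_WQ		= (1 << 6),
	HK_FLAG_MANAGED_IRQ	= (1 << 7),
	HK_FLAG_KTHREAD		= (1 << 8),
};

/* Hypothetical stand-in for the kernel's housekeeping state. */
struct hk_state {
	bool overridden;		/* a boot option configured housekeeping */
	unsigned int flags;		/* flags requested by that option */
	uint64_t housekeeping_mask;	/* CPUs left for housekeeping work */
	uint64_t possible_mask;		/* all possible CPUs */
};

/* Mask an unbound kthread would be given for the queried flag. */
uint64_t model_housekeeping_cpumask(const struct hk_state *s,
				    enum hk_flags flag)
{
	if (s->overridden && (s->flags & flag))
		return s->housekeeping_mask;
	return s->possible_mask;	/* assumed fallback: no restriction */
}

/* Model a boot with nohz_full=<isolated>: the remaining CPUs do housekeeping. */
struct hk_state model_nohz_full_boot(uint64_t possible, uint64_t isolated,
				     unsigned int flags)
{
	struct hk_state s = {
		.overridden = true,
		.flags = flags,
		.housekeeping_mask = possible & ~isolated,
		.possible_mask = possible,
	};
	return s;
}
```

For example, on an 8-CPU model with CPUs 2-7 isolated and HK_FLAG_KTHREAD
in the requested flags, the kthread mask comes out as CPUs 0-1, while a
query for a flag that was not requested (such as HK_FLAG_DOMAIN) still
yields all 8 CPUs.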
isolcpus is used to control steering of interrupts to managed_irqs and
kernel threads, therefore it is incorrect to state that it is deprecated.

Remove deprecation warning.

Suggested-by: Chris Friesen <chris.friesen@windriver.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 Documentation/admin-guide/kernel-parameters.txt |    1 -
 1 file changed, 1 deletion(-)

Index: linux-2.6/Documentation/admin-guide/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/admin-guide/kernel-parameters.txt
+++ linux-2.6/Documentation/admin-guide/kernel-parameters.txt
@@ -1926,7 +1926,6 @@
 			Format: <RDP>,<reset>,<pci_scan>,<verbosity>
 
 	isolcpus=	[KNL,SMP,ISOL] Isolate a given set of CPUs from disturbance.
-			[Deprecated - use cpusets instead]
 			Format: [flag-list,]<cpu-list>
 
 			Specify one or more CPUs to isolate from disturbances
On 4/1/20 5:10 AM, Marcelo Tosatti wrote:
> This is a kernel enhancement that configures the cpu affinity of kernel
> threads via kernel boot option nohz_full=.
>
> When this option is specified, the cpumask is immediately applied upon
> thread launch. This does not affect kernel threads that specify cpu
> and node.
>
> This allows CPU isolation (that is not allowing certain threads
> to execute on certain CPUs) without using the isolcpus=domain parameter,
> making it possible to enable load balancing on such CPUs
> during runtime (see kernel-parameters.txt).
>
> Note-1: this is based off on Wind River's patch at
> https://github.com/starlingx-staging/stx-integ/blob/master/kernel/kernel-std/centos/patches/affine-compute-kernel-threads.patch
>
> Difference being that this patch is limited to modifying
> kernel thread cpumask: Behaviour of other threads can
> be controlled via cgroups or sched_setaffinity.
>
> Note-2: Wind River's patch was based off Christoph Lameter's patch at
> https://lwn.net/Articles/565932/ with the only difference being
> the kernel parameter changed from kthread to kthread_cpus.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>

Hi Marcelo,

> ---
>  Documentation/admin-guide/kernel-parameters.txt |    8 ++++++++

Patch is missing those Docum bits.

>  include/linux/sched/isolation.h                 |    1 +
>  kernel/kthread.c                                |    6 ++++--
>  kernel/sched/isolation.c                        |    6 ++++++
>  4 files changed, 19 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/include/linux/sched/isolation.h
> ===================================================================
> --- linux-2.6.orig/include/linux/sched/isolation.h
> +++ linux-2.6/include/linux/sched/isolation.h
> @@ -14,6 +14,7 @@ enum hk_flags {
>  	HK_FLAG_DOMAIN		= (1 << 5),
>  	HK_FLAG_WQ		= (1 << 6),
>  	HK_FLAG_MANAGED_IRQ	= (1 << 7),
> +	HK_FLAG_KTHREAD		= (1 << 8),
>  };
>
>  #ifdef CONFIG_CPU_ISOLATION
> Index: linux-2.6/kernel/kthread.c
> ===================================================================
> --- linux-2.6.orig/kernel/kthread.c
> +++ linux-2.6/kernel/kthread.c
> @@ -23,6 +23,7 @@
>  #include <linux/ptrace.h>
>  #include <linux/uaccess.h>
>  #include <linux/numa.h>
> +#include <linux/sched/isolation.h>
>  #include <trace/events/sched.h>
>
>  static DEFINE_SPINLOCK(kthread_create_lock);
> @@ -347,7 +348,8 @@ struct task_struct *__kthread_create_on_
>  		 * The kernel thread should not inherit these properties.
>  		 */
>  		sched_setscheduler_nocheck(task, SCHED_NORMAL, &param);
> -		set_cpus_allowed_ptr(task, cpu_possible_mask);
> +		set_cpus_allowed_ptr(task,
> +				     housekeeping_cpumask(HK_FLAG_KTHREAD));
>  	}
>  	kfree(create);
>  	return task;
> @@ -572,7 +574,7 @@ int kthreadd(void *unused)
>  	/* Setup a clean context for our children to inherit. */
>  	set_task_comm(tsk, "kthreadd");
>  	ignore_signals(tsk);
> -	set_cpus_allowed_ptr(tsk, cpu_possible_mask);
> +	set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_FLAG_KTHREAD));
>  	set_mems_allowed(node_states[N_MEMORY]);
>
>  	current->flags |= PF_NOFREEZE;
> Index: linux-2.6/kernel/sched/isolation.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched/isolation.c
> +++ linux-2.6/kernel/sched/isolation.c
> @@ -141,7 +141,7 @@ static int __init housekeeping_nohz_full
>  	unsigned int flags;
>
>  	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
> -		HK_FLAG_MISC | HK_FLAG_SCHED;
> +		HK_FLAG_MISC | HK_FLAG_SCHED | HK_FLAG_KTHREAD;
>
>  	return housekeeping_setup(str, flags);
>  }
>
>

thanks.
-- 
~Randy
This is a kernel enhancement that configures the cpu affinity of kernel
threads via kernel boot option nohz_full=.

When this option is specified, the cpumask is immediately applied upon
thread launch. This does not affect kernel threads that specify cpu
and node.

This allows CPU isolation (that is, not allowing certain threads
to execute on certain CPUs) without using the isolcpus=domain parameter,
making it possible to enable load balancing on such CPUs
during runtime (see kernel-parameters.txt).

Note-1: this is based on Wind River's patch at
https://github.com/starlingx-staging/stx-integ/blob/master/kernel/kernel-std/centos/patches/affine-compute-kernel-threads.patch

Difference being that this patch is limited to modifying
kernel thread cpumask: Behaviour of other threads can
be controlled via cgroups or sched_setaffinity.

Note-2: Wind River's patch was based off Christoph Lameter's patch at
https://lwn.net/Articles/565932/ with the only difference being
the kernel parameter changed from kthread to kthread_cpus.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 include/linux/sched/isolation.h |    1 +
 kernel/kthread.c                |    6 ++++--
 kernel/sched/isolation.c        |    6 ++++++
 4 files changed, 19 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/sched/isolation.h
===================================================================
--- linux-2.6.orig/include/linux/sched/isolation.h
+++ linux-2.6/include/linux/sched/isolation.h
@@ -14,6 +14,7 @@ enum hk_flags {
 	HK_FLAG_DOMAIN		= (1 << 5),
 	HK_FLAG_WQ		= (1 << 6),
 	HK_FLAG_MANAGED_IRQ	= (1 << 7),
+	HK_FLAG_KTHREAD		= (1 << 8),
 };
 
 #ifdef CONFIG_CPU_ISOLATION
Index: linux-2.6/kernel/kthread.c
===================================================================
--- linux-2.6.orig/kernel/kthread.c
+++ linux-2.6/kernel/kthread.c
@@ -23,6 +23,7 @@
 #include <linux/ptrace.h>
 #include <linux/uaccess.h>
 #include <linux/numa.h>
+#include <linux/sched/isolation.h>
 #include <trace/events/sched.h>
 
 static DEFINE_SPINLOCK(kthread_create_lock);
@@ -347,7 +348,8 @@ struct task_struct *__kthread_create_on_
 		 * The kernel thread should not inherit these properties.
 		 */
 		sched_setscheduler_nocheck(task, SCHED_NORMAL, &param);
-		set_cpus_allowed_ptr(task, cpu_possible_mask);
+		set_cpus_allowed_ptr(task,
+				     housekeeping_cpumask(HK_FLAG_KTHREAD));
 	}
 	kfree(create);
 	return task;
@@ -572,7 +574,7 @@ int kthreadd(void *unused)
 	/* Setup a clean context for our children to inherit. */
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
-	set_cpus_allowed_ptr(tsk, cpu_possible_mask);
+	set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_FLAG_KTHREAD));
 	set_mems_allowed(node_states[N_MEMORY]);
 
 	current->flags |= PF_NOFREEZE;
Index: linux-2.6/kernel/sched/isolation.c
===================================================================
--- linux-2.6.orig/kernel/sched/isolation.c
+++ linux-2.6/kernel/sched/isolation.c
@@ -141,7 +141,7 @@ static int __init housekeeping_nohz_full
 	unsigned int flags;
 
 	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
-		HK_FLAG_MISC | HK_FLAG_SCHED;
+		HK_FLAG_MISC | HK_FLAG_SCHED | HK_FLAG_KTHREAD;
 
 	return housekeeping_setup(str, flags);
 }
On Wed, Apr 01, 2020 at 09:10:18AM -0300, Marcelo Tosatti wrote:
> This is a kernel enhancement that configures the cpu affinity of kernel
> threads via kernel boot option nohz_full=.
>
> When this option is specified, the cpumask is immediately applied upon
> thread launch. This does not affect kernel threads that specify cpu
> and node.
>
> This allows CPU isolation (that is not allowing certain threads
> to execute on certain CPUs) without using the isolcpus=domain parameter,
> making it possible to enable load balancing on such CPUs
> during runtime (see kernel-parameters.txt).
>
> Note-1: this is based off on Wind River's patch at
> https://github.com/starlingx-staging/stx-integ/blob/master/kernel/kernel-std/centos/patches/affine-compute-kernel-threads.patch
>
> Difference being that this patch is limited to modifying
> kernel thread cpumask: Behaviour of other threads can
> be controlled via cgroups or sched_setaffinity.
>
> Note-2: Wind River's patch was based off Christoph Lameter's patch at
> https://lwn.net/Articles/565932/ with the only difference being
> the kernel parameter changed from kthread to kthread_cpus.
>
> v2: use isolcpus= subcommand (Thomas Gleixner)
>
> v3: s/MontaVista/Wind River/ on changelog (Chris Friesen)
> documentation updates (Chris Friesen)
> undeprecate isolcpus (Chris Friesen)
> general cleanups (Frederic Weisbecker)
> separate cpu_possible_mask kthread
> mask change (Frederic Weisbecker)
>
> v4: disable idle load balancing for nohz_full=
> use nohz_full= option for kthread isolation (Frederic Weisbecker)
>
Thanks for the patches. I'm applying them, I may add a few details
to the changelogs and stuff.