linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] CPU isolation improvements
@ 2022-10-13 18:40 Leonardo Bras
  2022-10-13 18:40 ` [PATCH v2 1/4] sched/isolation: Fix style issues reported by checkpatch Leonardo Bras
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Leonardo Bras @ 2022-10-13 18:40 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Leonardo Bras, Frederic Weisbecker, Phil Auld, Antoine Tenart,
	Christophe JAILLET, Wang Yufen, mtosatti
  Cc: linux-crypto, linux-kernel, linux-pci, netdev

Patch 1 removes some noise from isolation.c

Patch 2 adds some information about the housekeeping flags and a short
description on what to expect from the HK functions. I would really like 
some feedback on this one, since I got all that from the flags usage, and 
maybe I am misreading stuff.

In patch 3, I am suggesting making isolcpus have both the _DOMAIN flag and 
the _WQ flag, so the _DOMAIN flag is not responsible for isolating cpus on 
workqueue operations anymore. This will avoid AND'ing both those bitmaps 
every time we need to check for Workqueue isolation, simplifying code and 
avoiding cpumask allocation in most cases. 

Maybe I am missing something in this move, so please provide feedback.

In patch 4 I use the results from patch 3 and I disallow pcrypt to schedule 
work in cpus that are not enabled for workqueue housekeeping, meaning there 
will be less work done in those isolated cpus.

Best regards,
Leo

Leonardo Bras (4):
  sched/isolation: Fix style issues reported by checkpatch
  sched/isolation: Improve documentation
  sched/isolation: Add HK_TYPE_WQ to isolcpus=domain
  crypto/pcrypt: Do not use isolated CPUs for callback

 crypto/pcrypt.c                 |  9 +++++---
 drivers/pci/pci-driver.c        | 13 +----------
 include/linux/sched/isolation.h | 38 ++++++++++++++++++++-------------
 kernel/sched/isolation.c        |  4 ++--
 kernel/workqueue.c              |  1 -
 net/core/net-sysfs.c            |  1 -
 6 files changed, 32 insertions(+), 34 deletions(-)

-- 
2.38.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/4] sched/isolation: Fix style issues reported by checkpatch
  2022-10-13 18:40 [PATCH v2 0/4] CPU isolation improvements Leonardo Bras
@ 2022-10-13 18:40 ` Leonardo Bras
  2022-10-14  8:28   ` Peter Zijlstra
  2022-10-13 18:40 ` [PATCH v2 2/4] sched/isolation: Improve documentation Leonardo Bras
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Leonardo Bras @ 2022-10-13 18:40 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Leonardo Bras, Frederic Weisbecker, Phil Auld, Antoine Tenart,
	Christophe JAILLET, Wang Yufen, mtosatti
  Cc: linux-crypto, linux-kernel, linux-pci, netdev

scripts/checkpatch.pl warns about:
- extern prototypes should be avoided in .h files
- Missing or malformed SPDX-License-Identifier tag in line 1

Fix those issues to avoid extra noise.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/linux/sched/isolation.h | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 8c15abd67aed9..762701f295d1c 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _LINUX_SCHED_ISOLATION_H
 #define _LINUX_SCHED_ISOLATION_H
 
@@ -20,12 +21,12 @@ enum hk_type {
 
 #ifdef CONFIG_CPU_ISOLATION
 DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
-extern int housekeeping_any_cpu(enum hk_type type);
-extern const struct cpumask *housekeeping_cpumask(enum hk_type type);
-extern bool housekeeping_enabled(enum hk_type type);
-extern void housekeeping_affine(struct task_struct *t, enum hk_type type);
-extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
-extern void __init housekeeping_init(void);
+int housekeeping_any_cpu(enum hk_type type);
+const struct cpumask *housekeeping_cpumask(enum hk_type type);
+bool housekeeping_enabled(enum hk_type type);
+void housekeeping_affine(struct task_struct *t, enum hk_type type);
+bool housekeeping_test_cpu(int cpu, enum hk_type type);
+void __init housekeeping_init(void);
 
 #else
 
-- 
2.38.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/4] sched/isolation: Improve documentation
  2022-10-13 18:40 [PATCH v2 0/4] CPU isolation improvements Leonardo Bras
  2022-10-13 18:40 ` [PATCH v2 1/4] sched/isolation: Fix style issues reported by checkpatch Leonardo Bras
@ 2022-10-13 18:40 ` Leonardo Bras
  2022-10-14  8:32   ` Peter Zijlstra
  2022-11-29 11:54   ` Frederic Weisbecker
  2022-10-13 18:40 ` [PATCH v2 3/4] sched/isolation: Add HK_TYPE_WQ to isolcpus=domain Leonardo Bras
  2022-10-13 18:40 ` [PATCH v2 4/4] crypto/pcrypt: Do not use isolated CPUs for callback Leonardo Bras
  3 siblings, 2 replies; 16+ messages in thread
From: Leonardo Bras @ 2022-10-13 18:40 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Leonardo Bras, Frederic Weisbecker, Phil Auld, Antoine Tenart,
	Christophe JAILLET, Wang Yufen, mtosatti
  Cc: linux-crypto, linux-kernel, linux-pci, netdev

Improve documentation on housekeeping types and what to expect from
housekeeping functions.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/linux/sched/isolation.h | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 762701f295d1c..9333c28153a7a 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -7,18 +7,25 @@
 #include <linux/tick.h>
 
 enum hk_type {
-	HK_TYPE_TIMER,
-	HK_TYPE_RCU,
-	HK_TYPE_MISC,
-	HK_TYPE_SCHED,
-	HK_TYPE_TICK,
-	HK_TYPE_DOMAIN,
-	HK_TYPE_WQ,
-	HK_TYPE_MANAGED_IRQ,
-	HK_TYPE_KTHREAD,
+	HK_TYPE_TIMER,		/* Timer interrupt, watchdogs */
+	HK_TYPE_RCU,		/* RCU callbacks */
+	HK_TYPE_MISC,		/* Minor housekeeping categories */
+	HK_TYPE_SCHED,		/* Scheduling and idle load balancing */
+	HK_TYPE_TICK,		/* See isolcpus=nohz boot parameter */
+	HK_TYPE_DOMAIN,		/* See isolcpus=domain boot parameter*/
+	HK_TYPE_WQ,		/* Work Queues*/
+	HK_TYPE_MANAGED_IRQ,	/* See isolcpus=managed_irq boot parameter */
+	HK_TYPE_KTHREAD,	/* kernel threads */
 	HK_TYPE_MAX
 };
 
+/* Kernel parameters like nohz_full and isolcpus allow passing cpu numbers
+ * for disabling housekeeping types.
+ *
+ * The functions bellow work the opposite way, by referencing which cpus
+ * are able to perform the housekeeping type in parameter.
+ */
+
 #ifdef CONFIG_CPU_ISOLATION
 DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
 int housekeeping_any_cpu(enum hk_type type);
-- 
2.38.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/4] sched/isolation: Add HK_TYPE_WQ to isolcpus=domain
  2022-10-13 18:40 [PATCH v2 0/4] CPU isolation improvements Leonardo Bras
  2022-10-13 18:40 ` [PATCH v2 1/4] sched/isolation: Fix style issues reported by checkpatch Leonardo Bras
  2022-10-13 18:40 ` [PATCH v2 2/4] sched/isolation: Improve documentation Leonardo Bras
@ 2022-10-13 18:40 ` Leonardo Bras
  2022-10-14  8:36   ` Peter Zijlstra
  2022-10-13 18:40 ` [PATCH v2 4/4] crypto/pcrypt: Do not use isolated CPUs for callback Leonardo Bras
  3 siblings, 1 reply; 16+ messages in thread
From: Leonardo Bras @ 2022-10-13 18:40 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Leonardo Bras, Frederic Weisbecker, Phil Auld, Antoine Tenart,
	Christophe JAILLET, Wang Yufen, mtosatti
  Cc: linux-crypto, linux-kernel, linux-pci, netdev

Housekeeping code keeps multiple cpumasks in order to keep track of which
cpus can perform given housekeeping category.

Every time the HK_TYPE_WQ cpumask is checked before queueing work at a cpu
WQ it also happens to check for HK_TYPE_DOMAIN. So It can be assumed that
the Domain isolation also ends up isolating work queues.

Delegating current HK_TYPE_DOMAIN's work queue isolation to HK_TYPE_WQ
makes it simpler to check if a cpu can run a task into an work queue, since
code just need to go through a single HK_TYPE_* cpumask.

Make isolcpus=domain aggregate both HK_TYPE_DOMAIN and HK_TYPE_WQ, and
remove a lot of cpumask_and calls.

Also, remove a unnecessary '|=' at housekeeping_isolcpus_setup() since we
are sure that 'flags == 0' here.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 drivers/pci/pci-driver.c | 13 +------------
 kernel/sched/isolation.c |  4 ++--
 kernel/workqueue.c       |  1 -
 net/core/net-sysfs.c     |  1 -
 4 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 107d77f3c8467..550bef2504b8d 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -371,19 +371,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 	    pci_physfn_is_probed(dev)) {
 		cpu = nr_cpu_ids;
 	} else {
-		cpumask_var_t wq_domain_mask;
-
-		if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) {
-			error = -ENOMEM;
-			goto out;
-		}
-		cpumask_and(wq_domain_mask,
-			    housekeeping_cpumask(HK_TYPE_WQ),
-			    housekeeping_cpumask(HK_TYPE_DOMAIN));
-
 		cpu = cpumask_any_and(cpumask_of_node(node),
-				      wq_domain_mask);
-		free_cpumask_var(wq_domain_mask);
+				      housekeeping_cpumask(HK_TYPE_WQ));
 	}
 
 	if (cpu < nr_cpu_ids)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 373d42c707bc5..ced4b78564810 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -204,7 +204,7 @@ static int __init housekeeping_isolcpus_setup(char *str)
 
 		if (!strncmp(str, "domain,", 7)) {
 			str += 7;
-			flags |= HK_FLAG_DOMAIN;
+			flags |= HK_FLAG_DOMAIN | HK_FLAG_WQ;
 			continue;
 		}
 
@@ -234,7 +234,7 @@ static int __init housekeeping_isolcpus_setup(char *str)
 
 	/* Default behaviour for isolcpus without flags */
 	if (!flags)
-		flags |= HK_FLAG_DOMAIN;
+		flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
 
 	return housekeeping_setup(str, flags);
 }
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7cd5f5e7e0a1b..b557daa571f17 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -6004,7 +6004,6 @@ void __init workqueue_init_early(void)
 
 	BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
 	cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_WQ));
-	cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN));
 
 	pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 8409d41405dfe..7b6fb62a118ab 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -852,7 +852,6 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 	}
 
 	if (!cpumask_empty(mask)) {
-		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
 		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ));
 		if (cpumask_empty(mask)) {
 			free_cpumask_var(mask);
-- 
2.38.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 4/4] crypto/pcrypt: Do not use isolated CPUs for callback
  2022-10-13 18:40 [PATCH v2 0/4] CPU isolation improvements Leonardo Bras
                   ` (2 preceding siblings ...)
  2022-10-13 18:40 ` [PATCH v2 3/4] sched/isolation: Add HK_TYPE_WQ to isolcpus=domain Leonardo Bras
@ 2022-10-13 18:40 ` Leonardo Bras
  2023-05-27  0:47   ` Leonardo Bras Soares Passos
  3 siblings, 1 reply; 16+ messages in thread
From: Leonardo Bras @ 2022-10-13 18:40 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Leonardo Bras, Frederic Weisbecker, Phil Auld, Antoine Tenart,
	Christophe JAILLET, Wang Yufen, mtosatti
  Cc: linux-crypto, linux-kernel, linux-pci, netdev

Currently pcrypt_aead_init_tfm() will pick callback cpus (ctx->cb_cpu)
from any online cpus. Later padata_reorder() will queue_work_on() the
chosen cb_cpu.

This is undesired if the chosen cb_cpu is listed as isolated (i.e. using
isolcpus=... or nohz_full=... kernel parameters), since the work queued
will interfere with the workload on the isolated cpu.

Make sure isolated cpus are not used for pcrypt.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 crypto/pcrypt.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
index 9d10b846ccf73..0162629a03957 100644
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -16,6 +16,7 @@
 #include <linux/kobject.h>
 #include <linux/cpu.h>
 #include <crypto/pcrypt.h>
+#include <linux/sched/isolation.h>
 
 static struct padata_instance *pencrypt;
 static struct padata_instance *pdecrypt;
@@ -175,13 +176,15 @@ static int pcrypt_aead_init_tfm(struct crypto_aead *tfm)
 	struct pcrypt_instance_ctx *ictx = aead_instance_ctx(inst);
 	struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(tfm);
 	struct crypto_aead *cipher;
+	const cpumask_t *hk_wq = housekeeping_cpumask(HK_TYPE_WQ);
 
 	cpu_index = (unsigned int)atomic_inc_return(&ictx->tfm_count) %
-		    cpumask_weight(cpu_online_mask);
+		    cpumask_weight_and(hk_wq, cpu_online_mask);
 
-	ctx->cb_cpu = cpumask_first(cpu_online_mask);
+	ctx->cb_cpu = cpumask_first_and(hk_wq, cpu_online_mask);
 	for (cpu = 0; cpu < cpu_index; cpu++)
-		ctx->cb_cpu = cpumask_next(ctx->cb_cpu, cpu_online_mask);
+		ctx->cb_cpu = cpumask_next_and(ctx->cb_cpu, hk_wq,
+					       cpu_online_mask);
 
 	cipher = crypto_spawn_aead(&ictx->spawn);
 
-- 
2.38.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/4] sched/isolation: Fix style issues reported by checkpatch
  2022-10-13 18:40 ` [PATCH v2 1/4] sched/isolation: Fix style issues reported by checkpatch Leonardo Bras
@ 2022-10-14  8:28   ` Peter Zijlstra
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2022-10-14  8:28 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Frederic Weisbecker, Phil Auld, Antoine Tenart,
	Christophe JAILLET, Wang Yufen, mtosatti, linux-crypto,
	linux-kernel, linux-pci, netdev

On Thu, Oct 13, 2022 at 03:40:26PM -0300, Leonardo Bras wrote:
> scripts/checkpatch.pl warns about:
> - extern prototypes should be avoided in .h files

Checkpatch is wrong... :-)

(and yeah, I know the opinions on extern are divided, but I like it)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/4] sched/isolation: Improve documentation
  2022-10-13 18:40 ` [PATCH v2 2/4] sched/isolation: Improve documentation Leonardo Bras
@ 2022-10-14  8:32   ` Peter Zijlstra
  2022-10-14 15:40     ` Leonardo Bras Soares Passos
  2022-11-29 11:54   ` Frederic Weisbecker
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2022-10-14  8:32 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Frederic Weisbecker, Phil Auld, Antoine Tenart,
	Christophe JAILLET, Wang Yufen, mtosatti, linux-crypto,
	linux-kernel, linux-pci, netdev

On Thu, Oct 13, 2022 at 03:40:27PM -0300, Leonardo Bras wrote:

> +/* Kernel parameters like nohz_full and isolcpus allow passing cpu numbers
> + * for disabling housekeeping types.
> + *
> + * The functions bellow work the opposite way, by referencing which cpus
> + * are able to perform the housekeeping type in parameter.
> + */

So checkpatch should have bitten your head off for this drug-indiced
comment style :-)

https://lore.kernel.org/all/CA+55aFyQYJerovMsSoSKS7PessZBr4vNp-3QUUwhqk4A4_jcbg@mail.gmail.com/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/4] sched/isolation: Add HK_TYPE_WQ to isolcpus=domain
  2022-10-13 18:40 ` [PATCH v2 3/4] sched/isolation: Add HK_TYPE_WQ to isolcpus=domain Leonardo Bras
@ 2022-10-14  8:36   ` Peter Zijlstra
  2022-10-14 13:24     ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2022-10-14  8:36 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Frederic Weisbecker, Phil Auld, Antoine Tenart,
	Christophe JAILLET, Wang Yufen, mtosatti, linux-crypto,
	linux-kernel, linux-pci, netdev, fweisbec


+ Frederic; who actually does most of this code

On Thu, Oct 13, 2022 at 03:40:28PM -0300, Leonardo Bras wrote:
> Housekeeping code keeps multiple cpumasks in order to keep track of which
> cpus can perform given housekeeping category.
> 
> Every time the HK_TYPE_WQ cpumask is checked before queueing work at a cpu
> WQ it also happens to check for HK_TYPE_DOMAIN. So It can be assumed that
> the Domain isolation also ends up isolating work queues.
> 
> Delegating current HK_TYPE_DOMAIN's work queue isolation to HK_TYPE_WQ
> makes it simpler to check if a cpu can run a task into an work queue, since
> code just need to go through a single HK_TYPE_* cpumask.
> 
> Make isolcpus=domain aggregate both HK_TYPE_DOMAIN and HK_TYPE_WQ, and
> remove a lot of cpumask_and calls.
> 
> Also, remove a unnecessary '|=' at housekeeping_isolcpus_setup() since we
> are sure that 'flags == 0' here.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

I've long maintained that having all these separate masks is daft;
Frederic do we really need that?

> ---
>  drivers/pci/pci-driver.c | 13 +------------
>  kernel/sched/isolation.c |  4 ++--
>  kernel/workqueue.c       |  1 -
>  net/core/net-sysfs.c     |  1 -
>  4 files changed, 3 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 107d77f3c8467..550bef2504b8d 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -371,19 +371,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>  	    pci_physfn_is_probed(dev)) {
>  		cpu = nr_cpu_ids;
>  	} else {
> -		cpumask_var_t wq_domain_mask;
> -
> -		if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) {
> -			error = -ENOMEM;
> -			goto out;
> -		}
> -		cpumask_and(wq_domain_mask,
> -			    housekeeping_cpumask(HK_TYPE_WQ),
> -			    housekeeping_cpumask(HK_TYPE_DOMAIN));
> -
>  		cpu = cpumask_any_and(cpumask_of_node(node),
> -				      wq_domain_mask);
> -		free_cpumask_var(wq_domain_mask);
> +				      housekeeping_cpumask(HK_TYPE_WQ));
>  	}
>  
>  	if (cpu < nr_cpu_ids)
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 373d42c707bc5..ced4b78564810 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -204,7 +204,7 @@ static int __init housekeeping_isolcpus_setup(char *str)
>  
>  		if (!strncmp(str, "domain,", 7)) {
>  			str += 7;
> -			flags |= HK_FLAG_DOMAIN;
> +			flags |= HK_FLAG_DOMAIN | HK_FLAG_WQ;
>  			continue;
>  		}
>  
> @@ -234,7 +234,7 @@ static int __init housekeeping_isolcpus_setup(char *str)
>  
>  	/* Default behaviour for isolcpus without flags */
>  	if (!flags)
> -		flags |= HK_FLAG_DOMAIN;
> +		flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
>  
>  	return housekeeping_setup(str, flags);
>  }
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 7cd5f5e7e0a1b..b557daa571f17 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -6004,7 +6004,6 @@ void __init workqueue_init_early(void)
>  
>  	BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
>  	cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_WQ));
> -	cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN));
>  
>  	pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
>  
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 8409d41405dfe..7b6fb62a118ab 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -852,7 +852,6 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
>  	}
>  
>  	if (!cpumask_empty(mask)) {
> -		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
>  		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ));
>  		if (cpumask_empty(mask)) {
>  			free_cpumask_var(mask);
> -- 
> 2.38.0
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/4] sched/isolation: Add HK_TYPE_WQ to isolcpus=domain
  2022-10-14  8:36   ` Peter Zijlstra
@ 2022-10-14 13:24     ` Frederic Weisbecker
  2022-10-14 16:27       ` Leonardo Brás
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2022-10-14 13:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Leonardo Bras, Steffen Klassert, Herbert Xu, David S. Miller,
	Bjorn Helgaas, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Phil Auld, Antoine Tenart, Christophe JAILLET, Wang Yufen,
	mtosatti, linux-crypto, linux-kernel, linux-pci, netdev,
	fweisbec

On Fri, Oct 14, 2022 at 10:36:19AM +0200, Peter Zijlstra wrote:
> 
> + Frederic; who actually does most of this code
> 
> On Thu, Oct 13, 2022 at 03:40:28PM -0300, Leonardo Bras wrote:
> > Housekeeping code keeps multiple cpumasks in order to keep track of which
> > cpus can perform given housekeeping category.
> > 
> > Every time the HK_TYPE_WQ cpumask is checked before queueing work at a cpu
> > WQ it also happens to check for HK_TYPE_DOMAIN. So It can be assumed that
> > the Domain isolation also ends up isolating work queues.
> > 
> > Delegating current HK_TYPE_DOMAIN's work queue isolation to HK_TYPE_WQ
> > makes it simpler to check if a cpu can run a task into an work queue, since
> > code just need to go through a single HK_TYPE_* cpumask.
> > 
> > Make isolcpus=domain aggregate both HK_TYPE_DOMAIN and HK_TYPE_WQ, and
> > remove a lot of cpumask_and calls.
> > 
> > Also, remove a unnecessary '|=' at housekeeping_isolcpus_setup() since we
> > are sure that 'flags == 0' here.
> > 
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> 
> I've long maintained that having all these separate masks is daft;
> Frederic do we really need that?

Indeed. In my queue for the cpuset interface to nohz_full, I have the following
patch (but note DOMAIN and WQ have to stay seperate flags because workqueue
affinity can be modified seperately from isolcpus)

---
From: Frederic Weisbecker <frederic@kernel.org>
Date: Tue, 26 Jul 2022 17:03:30 +0200
Subject: [PATCH] sched/isolation: Gather nohz_full related isolation features
 into common flag

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/x86/kvm/x86.c              |  2 +-
 drivers/pci/pci-driver.c        |  2 +-
 include/linux/sched/isolation.h |  7 +------
 kernel/cpu.c                    |  4 ++--
 kernel/kthread.c                |  4 ++--
 kernel/rcu/tasks.h              |  2 +-
 kernel/rcu/tree_plugin.h        |  6 +++---
 kernel/sched/core.c             | 10 +++++-----
 kernel/sched/fair.c             |  6 +++---
 kernel/sched/isolation.c        | 25 +++++++------------------
 kernel/watchdog.c               |  2 +-
 kernel/workqueue.c              |  2 +-
 net/core/net-sysfs.c            |  2 +-
 13 files changed, 29 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1910e1e78b15..d0b73fcf4a1c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9009,7 +9009,7 @@ int kvm_arch_init(void *opaque)
 	}
 
 	if (pi_inject_timer == -1)
-		pi_inject_timer = housekeeping_enabled(HK_TYPE_TIMER);
+		pi_inject_timer = housekeeping_enabled(HK_TYPE_NOHZ_FULL);
 #ifdef CONFIG_X86_64
 	pvclock_gtod_register_notifier(&pvclock_gtod_notifier);
 
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 49238ddd39ee..af3494a39921 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -378,7 +378,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 			goto out;
 		}
 		cpumask_and(wq_domain_mask,
-			    housekeeping_cpumask(HK_TYPE_WQ),
+			    housekeeping_cpumask(HK_TYPE_NOHZ_FULL),
 			    housekeeping_cpumask(HK_TYPE_DOMAIN));
 
 		cpu = cpumask_any_and(cpumask_of_node(node),
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 8c15abd67aed..7ca34e04abe7 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -6,15 +6,10 @@
 #include <linux/tick.h>
 
 enum hk_type {
-	HK_TYPE_TIMER,
-	HK_TYPE_RCU,
-	HK_TYPE_MISC,
+	HK_TYPE_NOHZ_FULL,
 	HK_TYPE_SCHED,
-	HK_TYPE_TICK,
 	HK_TYPE_DOMAIN,
-	HK_TYPE_WQ,
 	HK_TYPE_MANAGED_IRQ,
-	HK_TYPE_KTHREAD,
 	HK_TYPE_MAX
 };
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index bbad5e375d3b..573f14d75a2e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1500,8 +1500,8 @@ int freeze_secondary_cpus(int primary)
 	cpu_maps_update_begin();
 	if (primary == -1) {
 		primary = cpumask_first(cpu_online_mask);
-		if (!housekeeping_cpu(primary, HK_TYPE_TIMER))
-			primary = housekeeping_any_cpu(HK_TYPE_TIMER);
+		if (!housekeeping_cpu(primary, HK_TYPE_NOHZ_FULL))
+			primary = housekeeping_any_cpu(HK_TYPE_NOHZ_FULL);
 	} else {
 		if (!cpu_online(primary))
 			primary = cpumask_first(cpu_online_mask);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 544fd4097406..0719035feba0 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -355,7 +355,7 @@ static int kthread(void *_create)
 	 * back to default in case they have been changed.
 	 */
 	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
-	set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
+	set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
 
 	/* OK, tell user we're spawned, wait for stop or wakeup */
 	__set_current_state(TASK_UNINTERRUPTIBLE);
@@ -721,7 +721,7 @@ int kthreadd(void *unused)
 	/* Setup a clean context for our children to inherit. */
 	set_task_comm(tsk, "kthreadd");
 	ignore_signals(tsk);
-	set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_TYPE_KTHREAD));
+	set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
 	set_mems_allowed(node_states[N_MEMORY]);
 
 	current->flags |= PF_NOFREEZE;
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index f5bf6fb430da..b99f79625b26 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -537,7 +537,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 	struct rcu_tasks *rtp = arg;
 
 	/* Run on housekeeping CPUs by default.  Sysadm can move if desired. */
-	housekeeping_affine(current, HK_TYPE_RCU);
+	housekeeping_affine(current, HK_TYPE_NOHZ_FULL);
 	WRITE_ONCE(rtp->kthread_ptr, current); // Let GPs start!
 
 	/*
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b2219577fbe2..4935b06c3caf 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1237,9 +1237,9 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
 		    cpu != outgoingcpu)
 			cpumask_set_cpu(cpu, cm);
-	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
+	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
 	if (cpumask_empty(cm))
-		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
+		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
 	set_cpus_allowed_ptr(t, cm);
 	mutex_unlock(&rnp->boost_kthread_mutex);
 	free_cpumask_var(cm);
@@ -1294,5 +1294,5 @@ static void rcu_bind_gp_kthread(void)
 {
 	if (!tick_nohz_full_enabled())
 		return;
-	housekeeping_affine(current, HK_TYPE_RCU);
+	housekeeping_affine(current, HK_TYPE_NOHZ_FULL);
 }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f53c0096860b..5ff205f39197 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1079,13 +1079,13 @@ int get_nohz_timer_target(void)
 	struct sched_domain *sd;
 	const struct cpumask *hk_mask;
 
-	if (housekeeping_cpu(cpu, HK_TYPE_TIMER)) {
+	if (housekeeping_cpu(cpu, HK_TYPE_NOHZ_FULL)) {
 		if (!idle_cpu(cpu))
 			return cpu;
 		default_cpu = cpu;
 	}
 
-	hk_mask = housekeeping_cpumask(HK_TYPE_TIMER);
+	hk_mask = housekeeping_cpumask(HK_TYPE_NOHZ_FULL);
 
 	rcu_read_lock();
 	for_each_domain(cpu, sd) {
@@ -1101,7 +1101,7 @@ int get_nohz_timer_target(void)
 	}
 
 	if (default_cpu == -1)
-		default_cpu = housekeeping_any_cpu(HK_TYPE_TIMER);
+		default_cpu = housekeeping_any_cpu(HK_TYPE_NOHZ_FULL);
 	cpu = default_cpu;
 unlock:
 	rcu_read_unlock();
@@ -5562,7 +5562,7 @@ static void sched_tick_start(int cpu)
 	int os;
 	struct tick_work *twork;
 
-	if (housekeeping_cpu(cpu, HK_TYPE_TICK))
+	if (housekeeping_cpu(cpu, HK_TYPE_NOHZ_FULL))
 		return;
 
 	WARN_ON_ONCE(!tick_work_cpu);
@@ -5583,7 +5583,7 @@ static void sched_tick_stop(int cpu)
 	struct tick_work *twork;
 	int os;
 
-	if (housekeeping_cpu(cpu, HK_TYPE_TICK))
+	if (housekeeping_cpu(cpu, HK_TYPE_NOHZ_FULL))
 		return;
 
 	WARN_ON_ONCE(!tick_work_cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 77b2048a9326..ac3b33e00451 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10375,7 +10375,7 @@ static inline int on_null_domain(struct rq *rq)
  * - When one of the busy CPUs notice that there may be an idle rebalancing
  *   needed, they will kick the idle load balancer, which then does idle
  *   load balancing for all the idle CPUs.
- * - HK_TYPE_MISC CPUs are used for this task, because HK_TYPE_SCHED not set
+ * - HK_TYPE_NOHZ_FULL CPUs are used for this task, because HK_TYPE_SCHED not set
  *   anywhere yet.
  */
 
@@ -10384,7 +10384,7 @@ static inline int find_new_ilb(void)
 	int ilb;
 	const struct cpumask *hk_mask;
 
-	hk_mask = housekeeping_cpumask(HK_TYPE_MISC);
+	hk_mask = housekeeping_cpumask(HK_TYPE_NOHZ_FULL);
 
 	for_each_cpu_and(ilb, nohz.idle_cpus_mask, hk_mask) {
 
@@ -10400,7 +10400,7 @@ static inline int find_new_ilb(void)
 
 /*
  * Kick a CPU to do the nohz balancing, if it is time for it. We pick any
- * idle CPU in the HK_TYPE_MISC housekeeping set (if there is one).
+ * idle CPU in the HK_TYPE_NOHZ_FULL housekeeping set (if there is one).
  */
 static void kick_ilb(unsigned int flags)
 {
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 4087718ee5b4..443f1ce83e32 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -4,20 +4,15 @@
  *  any CPU: unbound workqueues, timers, kthreads and any offloadable work.
  *
  * Copyright (C) 2017 Red Hat, Inc., Frederic Weisbecker
- * Copyright (C) 2017-2018 SUSE, Frederic Weisbecker
+ * Copyright (C) 2017-2022 SUSE, Frederic Weisbecker
  *
  */
 
 enum hk_flags {
-	HK_FLAG_TIMER		= BIT(HK_TYPE_TIMER),
-	HK_FLAG_RCU		= BIT(HK_TYPE_RCU),
-	HK_FLAG_MISC		= BIT(HK_TYPE_MISC),
+	HK_FLAG_NOHZ_FULL	= BIT(HK_TYPE_NOHZ_FULL),
 	HK_FLAG_SCHED		= BIT(HK_TYPE_SCHED),
-	HK_FLAG_TICK		= BIT(HK_TYPE_TICK),
 	HK_FLAG_DOMAIN		= BIT(HK_TYPE_DOMAIN),
-	HK_FLAG_WQ		= BIT(HK_TYPE_WQ),
 	HK_FLAG_MANAGED_IRQ	= BIT(HK_TYPE_MANAGED_IRQ),
-	HK_FLAG_KTHREAD		= BIT(HK_TYPE_KTHREAD),
 };
 
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
@@ -88,7 +83,7 @@ void __init housekeeping_init(void)
 
 	static_branch_enable(&housekeeping_overridden);
 
-	if (housekeeping.flags & HK_FLAG_TICK)
+	if (housekeeping.flags & HK_FLAG_NOHZ_FULL)
 		sched_tick_offload_init();
 
 	for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) {
@@ -111,7 +106,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
 	cpumask_var_t non_housekeeping_mask, housekeeping_staging;
 	int err = 0;
 
-	if ((flags & HK_FLAG_TICK) && !(housekeeping.flags & HK_FLAG_TICK)) {
+	if ((flags & HK_FLAG_NOHZ_FULL) && !(housekeeping.flags & HK_FLAG_NOHZ_FULL)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_FULL)) {
 			pr_warn("Housekeeping: nohz unsupported."
 				" Build with CONFIG_NO_HZ_FULL\n");
@@ -163,7 +158,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
 			housekeeping_setup_type(type, housekeeping_staging);
 	}
 
-	if ((flags & HK_FLAG_TICK) && !(housekeeping.flags & HK_FLAG_TICK))
+	if ((flags & HK_FLAG_NOHZ_FULL) && !(housekeeping.flags & HK_FLAG_NOHZ_FULL))
 		tick_nohz_full_setup(non_housekeeping_mask);
 
 	housekeeping.flags |= flags;
@@ -179,12 +174,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
 
 static int __init housekeeping_nohz_full_setup(char *str)
 {
-	unsigned long flags;
-
-	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
-		HK_FLAG_MISC | HK_FLAG_KTHREAD;
-
-	return housekeeping_setup(str, flags);
+	return housekeeping_setup(str, HK_FLAG_NOHZ_FULL);
 }
 __setup("nohz_full=", housekeeping_nohz_full_setup);
 
@@ -198,8 +188,7 @@ static int __init housekeeping_isolcpus_setup(char *str)
 	while (isalpha(*str)) {
 		if (!strncmp(str, "nohz,", 5)) {
 			str += 5;
-			flags |= HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER |
-				HK_FLAG_RCU | HK_FLAG_MISC | HK_FLAG_KTHREAD;
+			flags |= HK_FLAG_NOHZ_FULL;
 			continue;
 		}
 
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 20a7a55e62b6..3e9636f4bac6 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -852,7 +852,7 @@ void __init lockup_detector_init(void)
 		pr_info("Disabling watchdog on nohz_full cores by default\n");
 
 	cpumask_copy(&watchdog_cpumask,
-		     housekeeping_cpumask(HK_TYPE_TIMER));
+		     housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
 
 	if (!watchdog_nmi_probe())
 		nmi_watchdog_available = true;
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1ea50f6be843..3eb283d76d81 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5993,7 +5993,7 @@ void __init workqueue_init_early(void)
 	BUILD_BUG_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));
 
 	BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
-	cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_WQ));
+	cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
 	cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN));
 
 	pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index e319e242dddf..6dddf359b754 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -852,7 +852,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 
 	if (!cpumask_empty(mask)) {
 		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
-		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ));
+		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
 		if (cpumask_empty(mask)) {
 			free_cpumask_var(mask);
 			return -EINVAL;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/4] sched/isolation: Improve documentation
  2022-10-14  8:32   ` Peter Zijlstra
@ 2022-10-14 15:40     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 16+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-10-14 15:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Frederic Weisbecker, Phil Auld, Antoine Tenart,
	Christophe JAILLET, Wang Yufen, mtosatti, linux-crypto,
	linux-kernel, linux-pci, netdev

On Fri, Oct 14, 2022 at 6:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 13, 2022 at 03:40:27PM -0300, Leonardo Bras wrote:
>
> > +/* Kernel parameters like nohz_full and isolcpus allow passing cpu numbers
> > + * for disabling housekeeping types.
> > + *
> > + * The functions bellow work the opposite way, by referencing which cpus
> > + * are able to perform the housekeeping type in parameter.
> > + */
>
> So checkpatch should have bitten your head off for this drug-indiced
> comment style :-)

Oh, my bad on this one, I will fix that now.

>
> https://lore.kernel.org/all/CA+55aFyQYJerovMsSoSKS7PessZBr4vNp-3QUUwhqk4A4_jcbg@mail.gmail.com/
>

Any comments in the text? Is that correct?

Thanks for the feedback!

Best regards,
Leo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/4] sched/isolation: Add HK_TYPE_WQ to isolcpus=domain
  2022-10-14 13:24     ` Frederic Weisbecker
@ 2022-10-14 16:27       ` Leonardo Brás
  2022-11-29 12:10         ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Leonardo Brás @ 2022-10-14 16:27 UTC (permalink / raw)
  To: Frederic Weisbecker, Peter Zijlstra
  Cc: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Phil Auld, Antoine Tenart, Christophe JAILLET, Wang Yufen,
	mtosatti, linux-crypto, linux-kernel, linux-pci, netdev,
	fweisbec

On Fri, 2022-10-14 at 15:24 +0200, Frederic Weisbecker wrote:
> On Fri, Oct 14, 2022 at 10:36:19AM +0200, Peter Zijlstra wrote:
> > 
> > + Frederic; who actually does most of this code
> > 
> > On Thu, Oct 13, 2022 at 03:40:28PM -0300, Leonardo Bras wrote:
> > > Housekeeping code keeps multiple cpumasks in order to keep track of which
> > > cpus can perform given housekeeping category.
> > > 
> > > Every time the HK_TYPE_WQ cpumask is checked before queueing work at a cpu
> > > WQ it also happens to check for HK_TYPE_DOMAIN. So It can be assumed that
> > > the Domain isolation also ends up isolating work queues.
> > > 
> > > Delegating current HK_TYPE_DOMAIN's work queue isolation to HK_TYPE_WQ
> > > makes it simpler to check if a cpu can run a task into an work queue, since
> > > code just need to go through a single HK_TYPE_* cpumask.
> > > 
> > > Make isolcpus=domain aggregate both HK_TYPE_DOMAIN and HK_TYPE_WQ, and
> > > remove a lot of cpumask_and calls.
> > > 
> > > Also, remove a unnecessary '|=' at housekeeping_isolcpus_setup() since we
> > > are sure that 'flags == 0' here.
> > > 
> > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > 
> > I've long maintained that having all these separate masks is daft;
> > Frederic do we really need that?
> 
> Indeed. In my queue for the cpuset interface to nohz_full, I have the following
> patch (but note DOMAIN and WQ have to stay seperate flags because workqueue
> affinity can be modified seperately from isolcpus)
> 
> ---
> From: Frederic Weisbecker <frederic@kernel.org>
> Date: Tue, 26 Jul 2022 17:03:30 +0200
> Subject: [PATCH] sched/isolation: Gather nohz_full related isolation features
>  into common flag
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  arch/x86/kvm/x86.c              |  2 +-
>  drivers/pci/pci-driver.c        |  2 +-
>  include/linux/sched/isolation.h |  7 +------
>  kernel/cpu.c                    |  4 ++--
>  kernel/kthread.c                |  4 ++--
>  kernel/rcu/tasks.h              |  2 +-
>  kernel/rcu/tree_plugin.h        |  6 +++---
>  kernel/sched/core.c             | 10 +++++-----
>  kernel/sched/fair.c             |  6 +++---
>  kernel/sched/isolation.c        | 25 +++++++------------------
>  kernel/watchdog.c               |  2 +-
>  kernel/workqueue.c              |  2 +-
>  net/core/net-sysfs.c            |  2 +-
>  13 files changed, 29 insertions(+), 45 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1910e1e78b15..d0b73fcf4a1c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9009,7 +9009,7 @@ int kvm_arch_init(void *opaque)
>  	}
>  
>  	if (pi_inject_timer == -1)
> -		pi_inject_timer = housekeeping_enabled(HK_TYPE_TIMER);
> +		pi_inject_timer = housekeeping_enabled(HK_TYPE_NOHZ_FULL);
>  #ifdef CONFIG_X86_64
>  	pvclock_gtod_register_notifier(&pvclock_gtod_notifier);
>  
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 49238ddd39ee..af3494a39921 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -378,7 +378,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>  			goto out;
>  		}
>  		cpumask_and(wq_domain_mask,
> -			    housekeeping_cpumask(HK_TYPE_WQ),
> +			    housekeeping_cpumask(HK_TYPE_NOHZ_FULL),
>  			    housekeeping_cpumask(HK_TYPE_DOMAIN));
>  
>  		cpu = cpumask_any_and(cpumask_of_node(node),
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index 8c15abd67aed..7ca34e04abe7 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -6,15 +6,10 @@
>  #include <linux/tick.h>
>  
>  enum hk_type {
> -	HK_TYPE_TIMER,
> -	HK_TYPE_RCU,
> -	HK_TYPE_MISC,
> +	HK_TYPE_NOHZ_FULL,
>  	HK_TYPE_SCHED,
> -	HK_TYPE_TICK,
>  	HK_TYPE_DOMAIN,
> -	HK_TYPE_WQ,
>  	HK_TYPE_MANAGED_IRQ,
> -	HK_TYPE_KTHREAD,
>  	HK_TYPE_MAX
>  };
>  
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index bbad5e375d3b..573f14d75a2e 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1500,8 +1500,8 @@ int freeze_secondary_cpus(int primary)
>  	cpu_maps_update_begin();
>  	if (primary == -1) {
>  		primary = cpumask_first(cpu_online_mask);
> -		if (!housekeeping_cpu(primary, HK_TYPE_TIMER))
> -			primary = housekeeping_any_cpu(HK_TYPE_TIMER);
> +		if (!housekeeping_cpu(primary, HK_TYPE_NOHZ_FULL))
> +			primary = housekeeping_any_cpu(HK_TYPE_NOHZ_FULL);
>  	} else {
>  		if (!cpu_online(primary))
>  			primary = cpumask_first(cpu_online_mask);
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 544fd4097406..0719035feba0 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -355,7 +355,7 @@ static int kthread(void *_create)
>  	 * back to default in case they have been changed.
>  	 */
>  	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
> -	set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
> +	set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
>  
>  	/* OK, tell user we're spawned, wait for stop or wakeup */
>  	__set_current_state(TASK_UNINTERRUPTIBLE);
> @@ -721,7 +721,7 @@ int kthreadd(void *unused)
>  	/* Setup a clean context for our children to inherit. */
>  	set_task_comm(tsk, "kthreadd");
>  	ignore_signals(tsk);
> -	set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_TYPE_KTHREAD));
> +	set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
>  	set_mems_allowed(node_states[N_MEMORY]);
>  
>  	current->flags |= PF_NOFREEZE;
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index f5bf6fb430da..b99f79625b26 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -537,7 +537,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>  	struct rcu_tasks *rtp = arg;
>  
>  	/* Run on housekeeping CPUs by default.  Sysadm can move if desired. */
> -	housekeeping_affine(current, HK_TYPE_RCU);
> +	housekeeping_affine(current, HK_TYPE_NOHZ_FULL);
>  	WRITE_ONCE(rtp->kthread_ptr, current); // Let GPs start!
>  
>  	/*
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index b2219577fbe2..4935b06c3caf 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1237,9 +1237,9 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>  		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
>  		    cpu != outgoingcpu)
>  			cpumask_set_cpu(cpu, cm);
> -	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> +	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
>  	if (cpumask_empty(cm))
> -		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
> +		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
>  	set_cpus_allowed_ptr(t, cm);
>  	mutex_unlock(&rnp->boost_kthread_mutex);
>  	free_cpumask_var(cm);
> @@ -1294,5 +1294,5 @@ static void rcu_bind_gp_kthread(void)
>  {
>  	if (!tick_nohz_full_enabled())
>  		return;
> -	housekeeping_affine(current, HK_TYPE_RCU);
> +	housekeeping_affine(current, HK_TYPE_NOHZ_FULL);
>  }
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f53c0096860b..5ff205f39197 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1079,13 +1079,13 @@ int get_nohz_timer_target(void)
>  	struct sched_domain *sd;
>  	const struct cpumask *hk_mask;
>  
> -	if (housekeeping_cpu(cpu, HK_TYPE_TIMER)) {
> +	if (housekeeping_cpu(cpu, HK_TYPE_NOHZ_FULL)) {
>  		if (!idle_cpu(cpu))
>  			return cpu;
>  		default_cpu = cpu;
>  	}
>  
> -	hk_mask = housekeeping_cpumask(HK_TYPE_TIMER);
> +	hk_mask = housekeeping_cpumask(HK_TYPE_NOHZ_FULL);
>  
>  	rcu_read_lock();
>  	for_each_domain(cpu, sd) {
> @@ -1101,7 +1101,7 @@ int get_nohz_timer_target(void)
>  	}
>  
>  	if (default_cpu == -1)
> -		default_cpu = housekeeping_any_cpu(HK_TYPE_TIMER);
> +		default_cpu = housekeeping_any_cpu(HK_TYPE_NOHZ_FULL);
>  	cpu = default_cpu;
>  unlock:
>  	rcu_read_unlock();
> @@ -5562,7 +5562,7 @@ static void sched_tick_start(int cpu)
>  	int os;
>  	struct tick_work *twork;
>  
> -	if (housekeeping_cpu(cpu, HK_TYPE_TICK))
> +	if (housekeeping_cpu(cpu, HK_TYPE_NOHZ_FULL))
>  		return;
>  
>  	WARN_ON_ONCE(!tick_work_cpu);
> @@ -5583,7 +5583,7 @@ static void sched_tick_stop(int cpu)
>  	struct tick_work *twork;
>  	int os;
>  
> -	if (housekeeping_cpu(cpu, HK_TYPE_TICK))
> +	if (housekeeping_cpu(cpu, HK_TYPE_NOHZ_FULL))
>  		return;
>  
>  	WARN_ON_ONCE(!tick_work_cpu);
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 77b2048a9326..ac3b33e00451 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10375,7 +10375,7 @@ static inline int on_null_domain(struct rq *rq)
>   * - When one of the busy CPUs notice that there may be an idle rebalancing
>   *   needed, they will kick the idle load balancer, which then does idle
>   *   load balancing for all the idle CPUs.
> - * - HK_TYPE_MISC CPUs are used for this task, because HK_TYPE_SCHED not set
> + * - HK_TYPE_NOHZ_FULL CPUs are used for this task, because HK_TYPE_SCHED not set
>   *   anywhere yet.
>   */
>  
> @@ -10384,7 +10384,7 @@ static inline int find_new_ilb(void)
>  	int ilb;
>  	const struct cpumask *hk_mask;
>  
> -	hk_mask = housekeeping_cpumask(HK_TYPE_MISC);
> +	hk_mask = housekeeping_cpumask(HK_TYPE_NOHZ_FULL);
>  
>  	for_each_cpu_and(ilb, nohz.idle_cpus_mask, hk_mask) {
>  
> @@ -10400,7 +10400,7 @@ static inline int find_new_ilb(void)
>  
>  /*
>   * Kick a CPU to do the nohz balancing, if it is time for it. We pick any
> - * idle CPU in the HK_TYPE_MISC housekeeping set (if there is one).
> + * idle CPU in the HK_TYPE_NOHZ_FULL housekeeping set (if there is one).
>   */
>  static void kick_ilb(unsigned int flags)
>  {
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 4087718ee5b4..443f1ce83e32 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -4,20 +4,15 @@
>   *  any CPU: unbound workqueues, timers, kthreads and any offloadable work.
>   *
>   * Copyright (C) 2017 Red Hat, Inc., Frederic Weisbecker
> - * Copyright (C) 2017-2018 SUSE, Frederic Weisbecker
> + * Copyright (C) 2017-2022 SUSE, Frederic Weisbecker
>   *
>   */
>  
>  enum hk_flags {
> -	HK_FLAG_TIMER		= BIT(HK_TYPE_TIMER),
> -	HK_FLAG_RCU		= BIT(HK_TYPE_RCU),
> -	HK_FLAG_MISC		= BIT(HK_TYPE_MISC),
> +	HK_FLAG_NOHZ_FULL	= BIT(HK_TYPE_NOHZ_FULL),
>  	HK_FLAG_SCHED		= BIT(HK_TYPE_SCHED),
> -	HK_FLAG_TICK		= BIT(HK_TYPE_TICK),
>  	HK_FLAG_DOMAIN		= BIT(HK_TYPE_DOMAIN),
> -	HK_FLAG_WQ		= BIT(HK_TYPE_WQ),
>  	HK_FLAG_MANAGED_IRQ	= BIT(HK_TYPE_MANAGED_IRQ),
> -	HK_FLAG_KTHREAD		= BIT(HK_TYPE_KTHREAD),
>  };
>  
>  DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
> @@ -88,7 +83,7 @@ void __init housekeeping_init(void)
>  
>  	static_branch_enable(&housekeeping_overridden);
>  
> -	if (housekeeping.flags & HK_FLAG_TICK)
> +	if (housekeeping.flags & HK_FLAG_NOHZ_FULL)
>  		sched_tick_offload_init();
>  
>  	for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) {
> @@ -111,7 +106,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
>  	cpumask_var_t non_housekeeping_mask, housekeeping_staging;
>  	int err = 0;
>  
> -	if ((flags & HK_FLAG_TICK) && !(housekeeping.flags & HK_FLAG_TICK)) {
> +	if ((flags & HK_FLAG_NOHZ_FULL) && !(housekeeping.flags & HK_FLAG_NOHZ_FULL)) {
>  		if (!IS_ENABLED(CONFIG_NO_HZ_FULL)) {
>  			pr_warn("Housekeeping: nohz unsupported."
>  				" Build with CONFIG_NO_HZ_FULL\n");
> @@ -163,7 +158,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
>  			housekeeping_setup_type(type, housekeeping_staging);
>  	}
>  
> -	if ((flags & HK_FLAG_TICK) && !(housekeeping.flags & HK_FLAG_TICK))
> +	if ((flags & HK_FLAG_NOHZ_FULL) && !(housekeeping.flags & HK_FLAG_NOHZ_FULL))
>  		tick_nohz_full_setup(non_housekeeping_mask);
>  
>  	housekeeping.flags |= flags;
> @@ -179,12 +174,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
>  
>  static int __init housekeeping_nohz_full_setup(char *str)
>  {
> -	unsigned long flags;
> -
> -	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
> -		HK_FLAG_MISC | HK_FLAG_KTHREAD;
> -
> -	return housekeeping_setup(str, flags);
> +	return housekeeping_setup(str, HK_FLAG_NOHZ_FULL);
>  }
>  __setup("nohz_full=", housekeeping_nohz_full_setup);
>  
> @@ -198,8 +188,7 @@ static int __init housekeeping_isolcpus_setup(char *str)
>  	while (isalpha(*str)) {
>  		if (!strncmp(str, "nohz,", 5)) {
>  			str += 5;
> -			flags |= HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER |
> -				HK_FLAG_RCU | HK_FLAG_MISC | HK_FLAG_KTHREAD;
> +			flags |= HK_FLAG_NOHZ_FULL;
>  			continue;
>  		}
>  
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 20a7a55e62b6..3e9636f4bac6 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -852,7 +852,7 @@ void __init lockup_detector_init(void)
>  		pr_info("Disabling watchdog on nohz_full cores by default\n");
>  
>  	cpumask_copy(&watchdog_cpumask,
> -		     housekeeping_cpumask(HK_TYPE_TIMER));
> +		     housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
>  
>  	if (!watchdog_nmi_probe())
>  		nmi_watchdog_available = true;
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 1ea50f6be843..3eb283d76d81 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -5993,7 +5993,7 @@ void __init workqueue_init_early(void)
>  	BUILD_BUG_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));
>  
>  	BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
> -	cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_WQ));
> +	cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
>  	cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN));
>  
>  	pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index e319e242dddf..6dddf359b754 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -852,7 +852,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
>  
>  	if (!cpumask_empty(mask)) {
>  		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
> -		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ));
> +		cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_NOHZ_FULL));
>  		if (cpumask_empty(mask)) {
>  			free_cpumask_var(mask);
>  			return -EINVAL;

Hello Frederic,

So, IIUC you are removing all flags composing nohz_full= parameter in favor of a
unified NOHZ_FULL flag. 

I am very new to the code, and I am probably missing the whole picture, but I
actually think it's a good approach to keep them split for a couple reasons:
1 - They are easier to understand in code (IMHO): 
"This cpu should not do this, because it's not able to do WQ housekeeping" looks
better than "because it's not in DOMAIN or NOHZ_FULL housekeeping"

2 - They are simpler for using: 
Suppose we have this function that should run at a WQ, but we want to keep them
out of the isolated cpus. If we have the unified flags, we need to combine both
DOMAIN and NOHZ_FULL bitmasks, and then combine it again with something like
cpu_online_mask. It usually means allocating a new cpumask_t, and also freeing
it afterwards.
If we have a single WQ flag, we can avoid the allocation altogether by using
for_each_cpu_and(), making the code much simpler.

3 - It makes easier to compose new isolation modes:
In case the future requires a new isolation mode that also uses the types of
isolation we currently have implemented, it would be much easier to just compose
it with the current HK flags, instead of having to go through all usages and do
a cpumask_and() there. Also, new isolation modes would make (2) worse.

What I am able see as a pro in "unified flag" side, is that getting all the
multiple flags doing their jobs should be a lot of trouble. 

While I do understand you are much more experienced than me on that, and that
your decision is more likely to be better, I am unable to see the arguments that
helped you reach it. 

Could you please point them, so I can better understand the decision?

Best regards,
Leo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/4] sched/isolation: Improve documentation
  2022-10-13 18:40 ` [PATCH v2 2/4] sched/isolation: Improve documentation Leonardo Bras
  2022-10-14  8:32   ` Peter Zijlstra
@ 2022-11-29 11:54   ` Frederic Weisbecker
  2022-12-17  5:04     ` Leonardo Brás
  1 sibling, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2022-11-29 11:54 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Phil Auld, Antoine Tenart, Christophe JAILLET, Wang Yufen,
	mtosatti, linux-crypto, linux-kernel, linux-pci, netdev

On Thu, Oct 13, 2022 at 03:40:27PM -0300, Leonardo Bras wrote:
> Improve documentation on housekeeping types and what to expect from
> housekeeping functions.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  include/linux/sched/isolation.h | 25 ++++++++++++++++---------
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index 762701f295d1c..9333c28153a7a 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -7,18 +7,25 @@
>  #include <linux/tick.h>
>  
>  enum hk_type {
> -	HK_TYPE_TIMER,
> -	HK_TYPE_RCU,
> -	HK_TYPE_MISC,
> -	HK_TYPE_SCHED,
> -	HK_TYPE_TICK,
> -	HK_TYPE_DOMAIN,
> -	HK_TYPE_WQ,
> -	HK_TYPE_MANAGED_IRQ,
> -	HK_TYPE_KTHREAD,
> +	HK_TYPE_TIMER,		/* Timer interrupt, watchdogs */

More precisely:

     /* Unbound timer callbacks */

> +	HK_TYPE_RCU,		/* RCU callbacks */

More generally, because it's more than just about callbacks:

     /* Unbound RCU work */

> +	HK_TYPE_MISC,		/* Minor housekeeping categories */
> +	HK_TYPE_SCHED,		/* Scheduling and idle load balancing */
> +	HK_TYPE_TICK,		/* See isolcpus=nohz boot parameter */

Yes or nohz_full=

> +	HK_TYPE_DOMAIN,		/* See isolcpus=domain boot parameter*/
> +	HK_TYPE_WQ,		/* Work Queues*/

  /* Unbound workqueues */

> +	HK_TYPE_MANAGED_IRQ,	/* See isolcpus=managed_irq boot parameter */
> +	HK_TYPE_KTHREAD,	/* kernel threads */

  /* Unbound kernel threads */


>  	HK_TYPE_MAX
>  };
>  
> +/* Kernel parameters like nohz_full and isolcpus allow passing cpu numbers
> + * for disabling housekeeping types.
> + *
> + * The functions bellow work the opposite way, by referencing which cpus
> + * are able to perform the housekeeping type in parameter.
> + */

*below

Thanks!

> +
>  #ifdef CONFIG_CPU_ISOLATION
>  DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
>  int housekeeping_any_cpu(enum hk_type type);
> -- 
> 2.38.0
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/4] sched/isolation: Add HK_TYPE_WQ to isolcpus=domain
  2022-10-14 16:27       ` Leonardo Brás
@ 2022-11-29 12:10         ` Frederic Weisbecker
  2022-12-20  6:57           ` Leonardo Brás
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2022-11-29 12:10 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: Peter Zijlstra, Steffen Klassert, Herbert Xu, David S. Miller,
	Bjorn Helgaas, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Phil Auld, Antoine Tenart, Christophe JAILLET, Wang Yufen,
	mtosatti, linux-crypto, linux-kernel, linux-pci, netdev,
	fweisbec

On Fri, Oct 14, 2022 at 01:27:25PM -0300, Leonardo Brás wrote:
> Hello Frederic,
> 
> So, IIUC you are removing all flags composing nohz_full= parameter in favor of a
> unified NOHZ_FULL flag. 
> 
> I am very new to the code, and I am probably missing the whole picture, but I
> actually think it's a good approach to keep them split for a couple reasons:
> 1 - They are easier to understand in code (IMHO): 
> "This cpu should not do this, because it's not able to do WQ housekeeping" looks
> better than "because it's not in DOMAIN or NOHZ_FULL housekeeping"

A comment above each site may solve that.

> 
> 2 - They are simpler for using: 
> Suppose we have this function that should run at a WQ, but we want to keep them
> out of the isolated cpus. If we have the unified flags, we need to combine both
> DOMAIN and NOHZ_FULL bitmasks, and then combine it again with something like
> cpu_online_mask. It usually means allocating a new cpumask_t, and also freeing
> it afterwards.
> If we have a single WQ flag, we can avoid the allocation altogether by using
> for_each_cpu_and(), making the code much simpler.

I guess having a specific function for workqueues would arrange for it.

> 
> 3 - It makes easier to compose new isolation modes:
> In case the future requires a new isolation mode that also uses the types of
> isolation we currently have implemented, it would be much easier to just compose
> it with the current HK flags, instead of having to go through all usages and do
> a cpumask_and() there. Also, new isolation modes would make (2) worse.

Actually having a new feature merged in HK_NOHZ_FULL would make it easier to
handle as it avoids spreading cpumasks. I'm not sure I understand what you
mean.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/4] sched/isolation: Improve documentation
  2022-11-29 11:54   ` Frederic Weisbecker
@ 2022-12-17  5:04     ` Leonardo Brás
  0 siblings, 0 replies; 16+ messages in thread
From: Leonardo Brás @ 2022-12-17  5:04 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Phil Auld, Antoine Tenart, Christophe JAILLET, Wang Yufen,
	mtosatti, linux-crypto, linux-kernel, linux-pci, netdev

On Tue, 2022-11-29 at 12:54 +0100, Frederic Weisbecker wrote:
> On Thu, Oct 13, 2022 at 03:40:27PM -0300, Leonardo Bras wrote:
> > Improve documentation on housekeeping types and what to expect from
> > housekeeping functions.
> > 
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  include/linux/sched/isolation.h | 25 ++++++++++++++++---------
> >  1 file changed, 16 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > index 762701f295d1c..9333c28153a7a 100644
> > --- a/include/linux/sched/isolation.h
> > +++ b/include/linux/sched/isolation.h
> > @@ -7,18 +7,25 @@
> >  #include <linux/tick.h>
> >  
> >  enum hk_type {
> > -	HK_TYPE_TIMER,
> > -	HK_TYPE_RCU,
> > -	HK_TYPE_MISC,
> > -	HK_TYPE_SCHED,
> > -	HK_TYPE_TICK,
> > -	HK_TYPE_DOMAIN,
> > -	HK_TYPE_WQ,
> > -	HK_TYPE_MANAGED_IRQ,
> > -	HK_TYPE_KTHREAD,
> > +	HK_TYPE_TIMER,		/* Timer interrupt, watchdogs */
> 
> More precisely:
> 
>      /* Unbound timer callbacks */
> 
> > +	HK_TYPE_RCU,		/* RCU callbacks */
> 
> More generally, because it's more than just about callbacks:
> 
>      /* Unbound RCU work */

Both updated, thanks!

Out of curiosity, what does 'Unbound' means in this context?

> 
> > +	HK_TYPE_MISC,		/* Minor housekeeping categories */
> > +	HK_TYPE_SCHED,		/* Scheduling and idle load balancing */
> > +	HK_TYPE_TICK,		/* See isolcpus=nohz boot parameter */
> 
> Yes or nohz_full=
> 
> > +	HK_TYPE_DOMAIN,		/* See isolcpus=domain boot parameter*/
> > +	HK_TYPE_WQ,		/* Work Queues*/
> 
>   /* Unbound workqueues */
> 
> > +	HK_TYPE_MANAGED_IRQ,	/* See isolcpus=managed_irq boot parameter */
> > +	HK_TYPE_KTHREAD,	/* kernel threads */
> 
>   /* Unbound kernel threads */
> 
> 
> >  	HK_TYPE_MAX
> >  };
> >  
> > +/* Kernel parameters like nohz_full and isolcpus allow passing cpu numbers
> > + * for disabling housekeeping types.
> > + *
> > + * The functions bellow work the opposite way, by referencing which cpus
> > + * are able to perform the housekeeping type in parameter.
> > + */
> 
> *below
> 
> Thanks!

Done, done, done, done.
Thanks a lot for reviewing!

Best regards,
Leo

> 
> > +
> >  #ifdef CONFIG_CPU_ISOLATION
> >  DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
> >  int housekeeping_any_cpu(enum hk_type type);
> > -- 
> > 2.38.0
> > 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/4] sched/isolation: Add HK_TYPE_WQ to isolcpus=domain
  2022-11-29 12:10         ` Frederic Weisbecker
@ 2022-12-20  6:57           ` Leonardo Brás
  0 siblings, 0 replies; 16+ messages in thread
From: Leonardo Brás @ 2022-12-20  6:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Steffen Klassert, Herbert Xu, David S. Miller,
	Bjorn Helgaas, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Phil Auld, Antoine Tenart, Christophe JAILLET, Wang Yufen,
	mtosatti, linux-crypto, linux-kernel, linux-pci, netdev,
	fweisbec

On Tue, 2022-11-29 at 13:10 +0100, Frederic Weisbecker wrote:
> On Fri, Oct 14, 2022 at 01:27:25PM -0300, Leonardo Brás wrote:
> > Hello Frederic,
> > 
> > So, IIUC you are removing all flags composing nohz_full= parameter in favor of a
> > unified NOHZ_FULL flag. 
> > 
> > I am very new to the code, and I am probably missing the whole picture, but I
> > actually think it's a good approach to keep them split for a couple reasons:
> > 1 - They are easier to understand in code (IMHO): 
> > "This cpu should not do this, because it's not able to do WQ housekeeping" looks
> > better than "because it's not in DOMAIN or NOHZ_FULL housekeeping"
> 
> A comment above each site may solve that.

Sure, but not having to leave comments would be better. Or am I missing
something?

> 
> > 
> > 2 - They are simpler for using: 
> > Suppose we have this function that should run at a WQ, but we want to keep them
> > out of the isolated cpus. If we have the unified flags, we need to combine both
> > DOMAIN and NOHZ_FULL bitmasks, and then combine it again with something like
> > cpu_online_mask. It usually means allocating a new cpumask_t, and also freeing
> > it afterwards.
> > If we have a single WQ flag, we can avoid the allocation altogether by using
> > for_each_cpu_and(), making the code much simpler.
> 
> I guess having a specific function for workqueues would arrange for it.

You mean keeping a WQ housekeeping bitmap? This could be a solution, but it
would affect only the WQ example.

> 
> > 
> > 3 - It makes easier to compose new isolation modes:
> > In case the future requires a new isolation mode that also uses the types of
> > isolation we currently have implemented, it would be much easier to just compose
> > it with the current HK flags, instead of having to go through all usages and do
> > a cpumask_and() there. Also, new isolation modes would make (2) worse.
> 
> Actually having a new feature merged in HK_NOHZ_FULL would make it easier to
> handle as it avoids spreading cpumasks. I'm not sure I understand what you
> mean.

IIUC, your queued patch merges the housekeeping types HK_TYPE_TIMER,
HK_TYPE_RCU, HK_TYPE_MISC, HK_TYPE_TICK, HK_TYPE_WQ and HK_TYPE_KTHREAD in a
single HK_TYPE_NOHZ_FULL.

Suppose in future we need a new isolation feature in cmdline, say 
isol_new=<cpulist>, and it works exactly like nohz_full=<cpulist>, but also
needs to isolate cpulist against something else, say doing X.

How would this get implemented? IIUC, following the same pattern:
- A new type HK_TYPE_ISOL_NEW would be created together with a cpumask,
- The new cpumask would be used to keep cpulist from doing X
- All places that use HK_TYPE_NOHZ_FULL bitmap for isolation would need to also
bitmask_and() the new cpumask. (sometimes needing a local cpumask_t)

Ok, there may be shortcuts for this, like keeping an intermediary bitmap, but
that can become tricky.

Other more complex example: New isolation feature isol_new2=<cpulist> behaves
like nohz_full=<cpulist>, keeps cpulist from doing X, but allows unbound RCU
work. Now it's even harder to have shortcuts from previous implementation.

What I am trying to defend here is that keeping the HK_type with the idea of
"things to get cpulist isolated from" works better for future implementations
than a single flag with a lot of responsibilities:
- A new type HK_TYPE_X would be created together with a cpumask,
- The new cpumask would be used to keep cpulist from doing X
- isol_new=<cpulist> is composed with the flags for what cpulist is getting
isolated.
- (No need to touch already implemented isolations.)

In fact, I propose that it works better for current implementations also:
The current patch (3/4) takes the WQ isolation responsibility from
HK_TYPE_DOMAIN and focus it in HK_TYPE_WQ, adding it to isolcpus=<cpulist>
flags. This avoids some cpumask_and()s, and a cpumask_t kzalloc, and makes the
code less complex to implement when we need to put isolation in further parts of
the code. (patch 4/4)

I am not sure if I am missing some important point here. 
Please let me know if it's the case. 

> 
> Thanks.
> 

Thank you for replying!
Leo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 4/4] crypto/pcrypt: Do not use isolated CPUs for callback
  2022-10-13 18:40 ` [PATCH v2 4/4] crypto/pcrypt: Do not use isolated CPUs for callback Leonardo Bras
@ 2023-05-27  0:47   ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 16+ messages in thread
From: Leonardo Bras Soares Passos @ 2023-05-27  0:47 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu, David S. Miller, Bjorn Helgaas,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
	Lai Jiangshan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Leonardo Bras, Frederic Weisbecker, Phil Auld, Antoine Tenart,
	Christophe JAILLET, Wang Yufen, mtosatti
  Cc: linux-crypto, linux-kernel, linux-pci, netdev

Friendly ping
(for this single patch)

On Thu, Oct 13, 2022 at 3:41 PM Leonardo Bras <leobras@redhat.com> wrote:
>
> Currently pcrypt_aead_init_tfm() will pick callback cpus (ctx->cb_cpu)
> from any online cpus. Later padata_reorder() will queue_work_on() the
> chosen cb_cpu.
>
> This is undesired if the chosen cb_cpu is listed as isolated (i.e. using
> isolcpus=... or nohz_full=... kernel parameters), since the work queued
> will interfere with the workload on the isolated cpu.
>
> Make sure isolated cpus are not used for pcrypt.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  crypto/pcrypt.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
> index 9d10b846ccf73..0162629a03957 100644
> --- a/crypto/pcrypt.c
> +++ b/crypto/pcrypt.c
> @@ -16,6 +16,7 @@
>  #include <linux/kobject.h>
>  #include <linux/cpu.h>
>  #include <crypto/pcrypt.h>
> +#include <linux/sched/isolation.h>
>
>  static struct padata_instance *pencrypt;
>  static struct padata_instance *pdecrypt;
> @@ -175,13 +176,15 @@ static int pcrypt_aead_init_tfm(struct crypto_aead *tfm)
>         struct pcrypt_instance_ctx *ictx = aead_instance_ctx(inst);
>         struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(tfm);
>         struct crypto_aead *cipher;
> +       const cpumask_t *hk_wq = housekeeping_cpumask(HK_TYPE_WQ);
>
>         cpu_index = (unsigned int)atomic_inc_return(&ictx->tfm_count) %
> -                   cpumask_weight(cpu_online_mask);
> +                   cpumask_weight_and(hk_wq, cpu_online_mask);
>
> -       ctx->cb_cpu = cpumask_first(cpu_online_mask);
> +       ctx->cb_cpu = cpumask_first_and(hk_wq, cpu_online_mask);
>         for (cpu = 0; cpu < cpu_index; cpu++)
> -               ctx->cb_cpu = cpumask_next(ctx->cb_cpu, cpu_online_mask);
> +               ctx->cb_cpu = cpumask_next_and(ctx->cb_cpu, hk_wq,
> +                                              cpu_online_mask);
>
>         cipher = crypto_spawn_aead(&ictx->spawn);
>
> --
> 2.38.0
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-05-27  0:48 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-13 18:40 [PATCH v2 0/4] CPU isolation improvements Leonardo Bras
2022-10-13 18:40 ` [PATCH v2 1/4] sched/isolation: Fix style issues reported by checkpatch Leonardo Bras
2022-10-14  8:28   ` Peter Zijlstra
2022-10-13 18:40 ` [PATCH v2 2/4] sched/isolation: Improve documentation Leonardo Bras
2022-10-14  8:32   ` Peter Zijlstra
2022-10-14 15:40     ` Leonardo Bras Soares Passos
2022-11-29 11:54   ` Frederic Weisbecker
2022-12-17  5:04     ` Leonardo Brás
2022-10-13 18:40 ` [PATCH v2 3/4] sched/isolation: Add HK_TYPE_WQ to isolcpus=domain Leonardo Bras
2022-10-14  8:36   ` Peter Zijlstra
2022-10-14 13:24     ` Frederic Weisbecker
2022-10-14 16:27       ` Leonardo Brás
2022-11-29 12:10         ` Frederic Weisbecker
2022-12-20  6:57           ` Leonardo Brás
2022-10-13 18:40 ` [PATCH v2 4/4] crypto/pcrypt: Do not use isolated CPUs for callback Leonardo Bras
2023-05-27  0:47   ` Leonardo Bras Soares Passos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).