linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2] sched/isolation: isolate from handling managed interrupt
@ 2020-01-17  3:53 Ming Lei
  2020-01-17 15:46 ` Thomas Gleixner
  0 siblings, 1 reply; 2+ messages in thread
From: Ming Lei @ 2020-01-17  3:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ming Lei, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Peter Xu,
	Juri Lelli

Userspace can't change managed interrupt's affinity via /proc interface,
however, applications often require the specified isolated CPUs not
disturbed by interrupts.

Add sub-parameter 'managed_irq' for 'isolcpus', so that we can isolate
from handling managed interrupt.

Not select irq effective CPU from isolated CPUs if the interrupt affinity
includes at least one housekeeping CPU. This way guarantees that isolated
CPUs won't be interrupted by managed irq if IO isn't submitted from any
isolated CPU.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
V2:
	- not allocate cpumask in context with irq_desc::lock held
	- deal with cpu hotplug race with new flag of IRQD_MANAGED_FORCE_MIGRATE
	- use comment doc from Thomas


 .../admin-guide/kernel-parameters.txt         |  9 +++++
 include/linux/irq.h                           |  8 ++++
 include/linux/sched/isolation.h               |  1 +
 kernel/irq/cpuhotplug.c                       |  6 ++-
 kernel/irq/manage.c                           | 37 ++++++++++++++++++-
 kernel/sched/isolation.c                      |  6 +++
 6 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ade4e6ec23e0..e0f18ac866d4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1933,6 +1933,15 @@
 			  <cpu number> begins at 0 and the maximum value is
 			  "number of CPUs in system - 1".
 
+			managed_irq
+			  Isolate from handling managed interrupt. Userspace can't
+			  change managed interrupt's affinity via /proc interface,
+			  however application often requires the specified isolated
+			  CPUs not disturbed by interrupts. This way guarantees that
+			  isolated CPU won't be interrupted if IO isn't submitted
+			  from isolated CPU when managed interrupt is used by IO
+			  drivers.
+
 			The format of <cpu-list> is described above.
 
 
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 7853eb9301f2..0a6bd1c56205 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -209,6 +209,8 @@ struct irq_data {
  * IRQD_SINGLE_TARGET		- IRQ allows only a single affinity target
  * IRQD_DEFAULT_TRIGGER_SET	- Expected trigger already been set
  * IRQD_CAN_RESERVE		- Can use reservation mode
+ * IRQD_MANAGED_FORCE_MIGRATE	- Force to migrate irq after one CPU in its
+ *				  affinity becomes online
  */
 enum {
 	IRQD_TRIGGER_MASK		= 0xf,
@@ -231,6 +233,7 @@ enum {
 	IRQD_SINGLE_TARGET		= (1 << 24),
 	IRQD_DEFAULT_TRIGGER_SET	= (1 << 25),
 	IRQD_CAN_RESERVE		= (1 << 26),
+	IRQD_MANAGED_FORCE_MIGRATE	= (1 << 27),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -390,6 +393,11 @@ static inline bool irqd_can_reserve(struct irq_data *d)
 	return __irqd_to_state(d) & IRQD_CAN_RESERVE;
 }
 
+static inline bool irqd_managed_force_migrate(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_MANAGED_FORCE_MIGRATE;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 6c8512d3be88..0fbcbacd1b29 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -13,6 +13,7 @@ enum hk_flags {
 	HK_FLAG_TICK		= (1 << 4),
 	HK_FLAG_DOMAIN		= (1 << 5),
 	HK_FLAG_WQ		= (1 << 6),
+	HK_FLAG_MANAGED_IRQ	= (1 << 7),
 };
 
 #ifdef CONFIG_CPU_ISOLATION
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 6c7ca2e983a5..20c7704ce019 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -77,7 +77,8 @@ static bool migrate_one_irq(struct irq_desc *desc)
 	 * Note: Do not check desc->action as this might be a chained
 	 * interrupt.
 	 */
-	if (irqd_is_per_cpu(d) || !irqd_is_started(d) || !irq_needs_fixup(d)) {
+	if ((irqd_is_per_cpu(d) || !irqd_is_started(d) || !irq_needs_fixup(d))
+			&& !irqd_managed_force_migrate(d)) {
 		/*
 		 * If an irq move is pending, abort it if the dying CPU is
 		 * the sole target.
@@ -192,6 +193,9 @@ static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
 	 */
 	if (!irqd_is_single_target(data))
 		irq_set_affinity_locked(data, affinity, false);
+
+	if (irqd_managed_force_migrate(data))
+		migrate_one_irq(desc);
 }
 
 /**
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 1753486b440c..046329f2d39a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -18,6 +18,7 @@
 #include <linux/sched.h>
 #include <linux/sched/rt.h>
 #include <linux/sched/task.h>
+#include <linux/sched/isolation.h>
 #include <uapi/linux/sched/types.h>
 #include <linux/task_work.h>
 
@@ -213,11 +214,45 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	struct irq_desc *desc = irq_data_to_desc(data);
 	struct irq_chip *chip = irq_data_get_irq_chip(data);
 	int ret;
+	const struct cpumask *mask_to_set = mask;
 
 	if (!chip || !chip->irq_set_affinity)
 		return -EINVAL;
 
-	ret = chip->irq_set_affinity(data, mask, force);
+	/*
+	 * If this is a managed interrupt check whether the requested
+	 * affinity mask intersects with a housekeeping CPU. If so, then
+	 * remove the isolated CPUs from the mask and just keep the
+	 * housekeeping CPU(s). This prevents the affinity setter from
+	 * routing the interrupt to an isolated CPU to avoid that I/O
+	 * submitted from a housekeeping CPU causes interrupts on an
+	 * isolated one.
+	 *
+	 * If the masks do not intersect or include online CPU(s) then
+	 * keep the requested mask. The isolated target CPUs are only
+	 * receiving interrupts when the I/O operation was submitted
+	 * directly from them.
+	 *
+	 * If all housekeeping CPUs are offline, 'FORCE_MIGRATE' is set
+	 * so that we can migrate the irq from isolate CPU when any
+	 * housekeeping CPU becomes online.
+	 */
+	if (irqd_affinity_is_managed(data)) {
+		static struct cpumask tmp_mask;
+		const struct cpumask *housekeeping =
+			housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
+
+		if (cpumask_intersects(mask, housekeeping)) {
+			cpumask_and(&tmp_mask, mask, housekeeping);
+			if (cpumask_intersects(&tmp_mask, cpu_online_mask)) {
+				mask_to_set = &tmp_mask;
+				irqd_clear(data, IRQD_MANAGED_FORCE_MIGRATE);
+			} else {
+				irqd_set(data, IRQD_MANAGED_FORCE_MIGRATE);
+			}
+		}
+	}
+	ret = chip->irq_set_affinity(data, mask_to_set, force);
 	switch (ret) {
 	case IRQ_SET_MASK_OK:
 	case IRQ_SET_MASK_OK_DONE:
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 9fcb2a695a41..008d6ac2342b 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -163,6 +163,12 @@ static int __init housekeeping_isolcpus_setup(char *str)
 			continue;
 		}
 
+		if (!strncmp(str, "managed_irq,", 12)) {
+			str += 12;
+			flags |= HK_FLAG_MANAGED_IRQ;
+			continue;
+		}
+
 		pr_warn("isolcpus: Error, unknown flag\n");
 		return 0;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH V2] sched/isolation: isolate from handling managed interrupt
  2020-01-17  3:53 [PATCH V2] sched/isolation: isolate from handling managed interrupt Ming Lei
@ 2020-01-17 15:46 ` Thomas Gleixner
  0 siblings, 0 replies; 2+ messages in thread
From: Thomas Gleixner @ 2020-01-17 15:46 UTC (permalink / raw)
  To: Ming Lei, linux-kernel
  Cc: Ming Lei, Ingo Molnar, Peter Zijlstra, Peter Xu, Juri Lelli

Ming Lei <ming.lei@redhat.com> writes:
>  #ifdef CONFIG_CPU_ISOLATION
> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> index 6c7ca2e983a5..20c7704ce019 100644
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -77,7 +77,8 @@ static bool migrate_one_irq(struct irq_desc *desc)
>  	 * Note: Do not check desc->action as this might be a chained
>  	 * interrupt.
>  	 */
> -	if (irqd_is_per_cpu(d) || !irqd_is_started(d) || !irq_needs_fixup(d)) {
> +	if ((irqd_is_per_cpu(d) || !irqd_is_started(d) || !irq_needs_fixup(d))
> +			&& !irqd_managed_force_migrate(d)) {

That's broken. Assume the bit is set and then the remaining CPU goes
offline and shuts down the interupt. As a consequence the interrupt is
in shutdown state when the first CPU of the affinity set comes online
again. It will see the force migrate bit which prevents starting it up
again. FAIL!

>  		/*
>  		 * If an irq move is pending, abort it if the dying CPU is
>  		 * the sole target.
> @@ -192,6 +193,9 @@ static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
>  	 */
>  	if (!irqd_is_single_target(data))
>  		irq_set_affinity_locked(data, affinity, false);
> +
> +	if (irqd_managed_force_migrate(data))
> +		migrate_one_irq(desc);

That's not a good idea, really. This is forceful migration which we only
should do when there is no other option especially on X86 for the
interrupts which cannot be safely migrated outside of an actual
interrupt service routine of that interrupt line. When a CPU is
unplugged then we have to do this, no matter what. This isolation thing
is best effort and does not justify this at all.

I really do not like this force migrate wording either. We force stuff
when there is no other way out, but this is not the case here.

We know that there is an isolation mask set, right? So we can simply do:

-  	if (!irqd_is_single_target(data))
+	if (!irqd_is_single_target(data) || hk_should_isolate(data, affinity))
  		irq_set_affinity_locked(data, affinity, false);

and have

static bool hk_should_isolate(data, affinity)
{
        if (!hk_isolation)
        	return false;

        if (!irqd_is_managed(data))
        	return false;

        if (!cpumask_intersects(affinity, hk_isolmask))
        	return false;

        e = irq_data_get_effective_affinity_mask(data);
        return !cpumask_subset(e, hk_isolmask);
}

and let irq_set_affinity_locked() -> irq_do_set_affinity() do the right
thing, i.e. migrate it directly or mark it pending for migration.

> @@ -213,11 +214,45 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  	struct irq_desc *desc = irq_data_to_desc(data);
>  	struct irq_chip *chip = irq_data_get_irq_chip(data);
>  	int ret;
> +	const struct cpumask *mask_to_set = mask;
>
>  	if (!chip || !chip->irq_set_affinity)
>  		return -EINVAL;
>  
> -	ret = chip->irq_set_affinity(data, mask, force);
>
> +	if (irqd_affinity_is_managed(data)) {
> +		static struct cpumask tmp_mask;

What protects this mask? Assume there are two concurrent callers on
different CPUs for different interrupts. -> FAIL!

The place I pointed you to does TheRightThing.

Aside of that this mask is only needed when ISOLATION is enabled in Kconfig.

> +		const struct cpumask *housekeeping =
> +			housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);

Can you please code in a way which makes these horrible line breaks go away?

> +
> +		if (cpumask_intersects(mask, housekeeping)) {
> +			cpumask_and(&tmp_mask, mask, housekeeping);
> +			if (cpumask_intersects(&tmp_mask, cpu_online_mask)) {
> +				mask_to_set = &tmp_mask;
> +				irqd_clear(data, IRQD_MANAGED_FORCE_MIGRATE);
> +			} else {
> +				irqd_set(data, IRQD_MANAGED_FORCE_MIGRATE);

And with the above this flag dance is not required at all.

Please take your time and analyze the code you are changing properly
before you send out the next half baken version.

I'm happy to review and help, but this frenzy mode sucks.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-01-17 15:46 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-17  3:53 [PATCH V2] sched/isolation: isolate from handling managed interrupt Ming Lei
2020-01-17 15:46 ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).