All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: "Saidi, Ali" <alisaidi@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Jason Cooper <jason@lakedaemon.net>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, "Herrenschmidt,
	Benjamin" <benh@amazon.com>,
	"Woodhouse, David" <dwmw@amazon.co.uk>,
	"Zilberman, Zeev" <zeev@amazon.com>,
	"Machulsky, Zorik" <zorik@amazon.com>
Subject: Re: [PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq
Date: Sun, 31 May 2020 12:09:25 +0100	[thread overview]
Message-ID: <eed907d48de84c96e3ceb27c1ed6f622@kernel.org> (raw)
In-Reply-To: <20200530174929.7bf6d5d7@why>

On 2020-05-30 17:49, Marc Zyngier wrote:
> Hi Ali,
> 
> On Fri, 29 May 2020 12:36:42 +0000
> "Saidi, Ali" <alisaidi@amazon.com> wrote:
> 
>> Hi Marc,
>> 
>> > On May 29, 2020, at 3:33 AM, Marc Zyngier <maz@kernel.org> wrote:
>> >
>> > Hi Ali,
>> >
>> >> On 2020-05-29 02:55, Ali Saidi wrote:
>> >> If an interrupt is disabled the ITS driver has sent a discard removing
>> >> the DeviceID and EventID from the ITT. After this occurs it can't be
>> >> moved to another collection with a MOVI and a command error occurs if
>> >> attempted. Before issuing the MOVI command make sure that the IRQ isn't
>> >> disabled and change the activate code to try and use the previous
>> >> affinity.
>> >>
>> >> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
>> >> ---
>> >> drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
>> >> 1 file changed, 15 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/drivers/irqchip/irq-gic-v3-its.c
>> >> b/drivers/irqchip/irq-gic-v3-its.c
>> >> index 124251b0ccba..1235dd9a2fb2 100644
>> >> --- a/drivers/irqchip/irq-gic-v3-its.c
>> >> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> >> @@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d,
>> >> const struct cpumask *mask_val,
>> >>      /* don't set the affinity when the target cpu is same as current one
>> >> */
>> >>      if (cpu != its_dev->event_map.col_map[id]) {
>> >>              target_col = &its_dev->its->collections[cpu];
>> >> -             its_send_movi(its_dev, target_col, id);
>> >> +
>> >> +             /* If the IRQ is disabled a discard was sent so don't move */
>> >> +             if (!irqd_irq_disabled(d))
>> >> +                     its_send_movi(its_dev, target_col, id);
>> >> +
>> >
>> > This looks wrong. What you are testing here is whether the interrupt
>> > is masked, not that there isn't a valid translation.
>> I’m not exactly sure the correct condition, but what I’m looking for
>> is interrupts which are deactivated and we have thus sent a discard.
> 
> That looks like IRQD_IRQ_STARTED not being set in this case.
> 
>> >
>> > In the commit message, you're saying that we've issued a discard.
>> > This hints at doing a set_affinity on an interrupt that has been
>> > deactivated (mapping removed). Is that actually the case? If so,
>> > why was it deactivated
>> > the first place?
>> This is the case. If we down a NIC, that interface’s MSIs will be
>> deactivated but remain allocated until the device is unbound from the
>> driver or the NIC is brought up.
>> 
>> While stressing down/up a device I’ve found that irqbalance can move
>> interrupts and you end up with the situation described. The device is
>> downed, the interrupts are deactivated but still present and then
>> trying to move one results in sending a MOVI after the DISCARD which
>> is an error per the GIC spec.
> 
> Not great indeed. But this is not, as far as I can tell, a GIC
> driver problem.
> 
> The semantic of activate/deactivate (which maps to started/shutdown
> in the IRQ code) is that the HW resources for a given interrupt are
> only committed when the interrupt is activated. Trying to perform
> actions involving the HW on an interrupt that isn't active cannot be
> guaranteed to take effect.
> 
> I'd rather address it in the core code, by preventing set_affinity (and
> potentially others) to take place when the interrupt is not in the
> STARTED state. Userspace would get an error, which is perfectly
> legitimate, and which it already has to deal with it for plenty of 
> other
> reasons.

How about this:

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 453a8a0f4804..1a2ac1392c0f 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -147,7 +147,8 @@ cpumask_var_t irq_default_affinity;
  static bool __irq_can_set_affinity(struct irq_desc *desc)
  {
  	if (!desc || !irqd_can_balance(&desc->irq_data) ||
-	    !desc->irq_data.chip || !desc->irq_data.chip->irq_set_affinity)
+	    !desc->irq_data.chip || !desc->irq_data.chip->irq_set_affinity ||
+	    !irqd_is_started(&desc->irq_data))
  		return false;
  	return true;
  }

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: "Saidi, Ali" <alisaidi@amazon.com>
Cc: "Herrenschmidt, Benjamin" <benh@amazon.com>,
	Jason Cooper <jason@lakedaemon.net>,
	"Machulsky, Zorik" <zorik@amazon.com>,
	linux-kernel@vger.kernel.org, "Zilberman, Zeev" <zeev@amazon.com>,
	linux-arm-kernel@lists.infradead.org,
	Thomas Gleixner <tglx@linutronix.de>,
	"Woodhouse, David" <dwmw@amazon.co.uk>
Subject: Re: [PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq
Date: Sun, 31 May 2020 12:09:25 +0100	[thread overview]
Message-ID: <eed907d48de84c96e3ceb27c1ed6f622@kernel.org> (raw)
In-Reply-To: <20200530174929.7bf6d5d7@why>

On 2020-05-30 17:49, Marc Zyngier wrote:
> Hi Ali,
> 
> On Fri, 29 May 2020 12:36:42 +0000
> "Saidi, Ali" <alisaidi@amazon.com> wrote:
> 
>> Hi Marc,
>> 
>> > On May 29, 2020, at 3:33 AM, Marc Zyngier <maz@kernel.org> wrote:
>> >
>> > Hi Ali,
>> >
>> >> On 2020-05-29 02:55, Ali Saidi wrote:
>> >> If an interrupt is disabled the ITS driver has sent a discard removing
>> >> the DeviceID and EventID from the ITT. After this occurs it can't be
>> >> moved to another collection with a MOVI and a command error occurs if
>> >> attempted. Before issuing the MOVI command make sure that the IRQ isn't
>> >> disabled and change the activate code to try and use the previous
>> >> affinity.
>> >>
>> >> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
>> >> ---
>> >> drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
>> >> 1 file changed, 15 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/drivers/irqchip/irq-gic-v3-its.c
>> >> b/drivers/irqchip/irq-gic-v3-its.c
>> >> index 124251b0ccba..1235dd9a2fb2 100644
>> >> --- a/drivers/irqchip/irq-gic-v3-its.c
>> >> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> >> @@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d,
>> >> const struct cpumask *mask_val,
>> >>      /* don't set the affinity when the target cpu is same as current one
>> >> */
>> >>      if (cpu != its_dev->event_map.col_map[id]) {
>> >>              target_col = &its_dev->its->collections[cpu];
>> >> -             its_send_movi(its_dev, target_col, id);
>> >> +
>> >> +             /* If the IRQ is disabled a discard was sent so don't move */
>> >> +             if (!irqd_irq_disabled(d))
>> >> +                     its_send_movi(its_dev, target_col, id);
>> >> +
>> >
>> > This looks wrong. What you are testing here is whether the interrupt
>> > is masked, not that there isn't a valid translation.
>> I’m not exactly sure the correct condition, but what I’m looking for
>> is interrupts which are deactivated and we have thus sent a discard.
> 
> That looks like IRQD_IRQ_STARTED not being set in this case.
> 
>> >
>> > In the commit message, you're saying that we've issued a discard.
>> > This hints at doing a set_affinity on an interrupt that has been
>> > deactivated (mapping removed). Is that actually the case? If so,
>> > why was it deactivated
>> > the first place?
>> This is the case. If we down a NIC, that interface’s MSIs will be
>> deactivated but remain allocated until the device is unbound from the
>> driver or the NIC is brought up.
>> 
>> While stressing down/up a device I’ve found that irqbalance can move
>> interrupts and you end up with the situation described. The device is
>> downed, the interrupts are deactivated but still present and then
>> trying to move one results in sending a MOVI after the DISCARD which
>> is an error per the GIC spec.
> 
> Not great indeed. But this is not, as far as I can tell, a GIC
> driver problem.
> 
> The semantic of activate/deactivate (which maps to started/shutdown
> in the IRQ code) is that the HW resources for a given interrupt are
> only committed when the interrupt is activated. Trying to perform
> actions involving the HW on an interrupt that isn't active cannot be
> guaranteed to take effect.
> 
> I'd rather address it in the core code, by preventing set_affinity (and
> potentially others) to take place when the interrupt is not in the
> STARTED state. Userspace would get an error, which is perfectly
> legitimate, and which it already has to deal with it for plenty of 
> other
> reasons.

How about this:

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 453a8a0f4804..1a2ac1392c0f 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -147,7 +147,8 @@ cpumask_var_t irq_default_affinity;
  static bool __irq_can_set_affinity(struct irq_desc *desc)
  {
  	if (!desc || !irqd_can_balance(&desc->irq_data) ||
-	    !desc->irq_data.chip || !desc->irq_data.chip->irq_set_affinity)
+	    !desc->irq_data.chip || !desc->irq_data.chip->irq_set_affinity ||
+	    !irqd_is_started(&desc->irq_data))
  		return false;
  	return true;
  }

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-05-31 11:09 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-29  1:55 [PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq Ali Saidi
2020-05-29  1:55 ` Ali Saidi
2020-05-29  4:07 ` Zenghui Yu
2020-05-29  4:07   ` Zenghui Yu
2020-05-29  8:32 ` Marc Zyngier
2020-05-29  8:32   ` Marc Zyngier
2020-05-29 12:36   ` Saidi, Ali
2020-05-29 12:36     ` Saidi, Ali
2020-05-30 16:49     ` Marc Zyngier
2020-05-30 16:49       ` Marc Zyngier
2020-05-31 11:09       ` Marc Zyngier [this message]
2020-05-31 11:09         ` Marc Zyngier
2020-06-01  0:10         ` Saidi, Ali
2020-06-01  0:10           ` Saidi, Ali
2020-06-01  2:40         ` Herrenschmidt, Benjamin
2020-06-01  2:40           ` Herrenschmidt, Benjamin
2020-06-02 20:54           ` Thomas Gleixner
2020-06-02 20:54             ` Thomas Gleixner
2020-06-03 12:44             ` Marc Zyngier
2020-06-03 12:44               ` Marc Zyngier
2020-05-31  2:53 ` kbuild test robot
2020-05-31  2:53   ` kbuild test robot
2020-05-31  2:53   ` kbuild test robot
2020-07-17 21:34 ` [tip: irq/urgent] genirq/affinity: Handle affinity setting on inactive interrupts correctly tip-bot2 for Thomas Gleixner
2020-06-02 18:47 [PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq Saidi, Ali
2020-06-03 15:16 ` Marc Zyngier
2020-06-03 15:16   ` Marc Zyngier
2020-06-03 22:14   ` Herrenschmidt, Benjamin
2020-06-03 22:14     ` Herrenschmidt, Benjamin
2020-06-08 13:48     ` Thomas Gleixner
2020-06-08 13:48       ` Thomas Gleixner
2020-06-08 21:59       ` Benjamin Herrenschmidt
2020-06-08 21:59         ` Benjamin Herrenschmidt
2020-06-08 23:36         ` Thomas Gleixner
2020-06-08 23:36           ` Thomas Gleixner
2020-06-11 17:44 Saidi, Ali

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eed907d48de84c96e3ceb27c1ed6f622@kernel.org \
    --to=maz@kernel.org \
    --cc=alisaidi@amazon.com \
    --cc=benh@amazon.com \
    --cc=dwmw@amazon.co.uk \
    --cc=jason@lakedaemon.net \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=zeev@amazon.com \
    --cc=zorik@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.