linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq
@ 2020-06-02 18:47 Saidi, Ali
  2020-06-03 15:16 ` Marc Zyngier
  0 siblings, 1 reply; 18+ messages in thread
From: Saidi, Ali @ 2020-06-02 18:47 UTC
  To: Herrenschmidt, Benjamin, maz
  Cc: tglx, jason, linux-kernel, linux-arm-kernel, Woodhouse, David,
	Zilberman, Zeev, Machulsky, Zorik


On 5/31/20, 9:40 PM, "Herrenschmidt, Benjamin" <benh@amazon.com> wrote:

    On Sun, 2020-05-31 at 12:09 +0100, Marc Zyngier wrote:
    > 
    > 
    > > Not great indeed. But this is not, as far as I can tell, a GIC
    > > driver problem.
    > > 
    > > The semantic of activate/deactivate (which maps to started/shutdown
    > > in the IRQ code) is that the HW resources for a given interrupt are
    > > only committed when the interrupt is activated. Trying to perform
    > > actions involving the HW on an interrupt that isn't active cannot be
    > > guaranteed to take effect.
    > > 
    > > I'd rather address it in the core code, by preventing set_affinity (and
    > > potentially others) to take place when the interrupt is not in the
    > > STARTED state. Userspace would get an error, which is perfectly
    > > legitimate, and which it already has to deal with for plenty of
    > > other reasons.
    
    So I finally found time to dig a bit in there :) Code has changed a bit
    since last I looked. But I have memories of the startup code messing
    around with the affinity, and here it is. In irq_startup() :
    
    
    		switch (__irq_startup_managed(desc, aff, force)) {
    		case IRQ_STARTUP_NORMAL:
    			ret = __irq_startup(desc);
    			irq_setup_affinity(desc);
    			break;
    		case IRQ_STARTUP_MANAGED:
    			irq_do_set_affinity(d, aff, false);
    			ret = __irq_startup(desc);
    			break;
    		case IRQ_STARTUP_ABORT:
    			irqd_set_managed_shutdown(d);
    			return 0;
    
    So we have two cases here. Normal and managed.
    
    In the managed case, we set the affinity before startup. I feel like your
    patch might break that or am I missing something ?
    
    Additionally, your patch would break any userspace program that expects to
    be able to change the affinity on an interrupt before it's been started.
    I don't know if such a thing exists but the fact that we hit that bug
    makes me think it might.
    
    Now most controller drivers (at least that I'm familiar with, which doesn't
    include GiC at this point) can deal with that just fine.
    
    Now there's also another possible issue:
    
    Your patch checks irqd_is_started(). Now I always mix up irqd vs irq_state these
    days so I may be wrong but irq_state_set_started() is only done in __irq_startup
    which will *not* be called if the interrupt has NOAUTOEN.
    
    Is that ok ? Do we intend for affinity setting not to work until the first
    enable_irq() for such an interrupt ? We could check activated instead of
    started I suppose. (again provided I didn't mix up two different things
    between the irqd and the irq_state stuff).
    
    For these reasons my gut feeling is we should just fix GIC as Ali wanted to
    do initially.
    
    The basic idea is simply to defer the HW configuration until the interrupt
    has been started. I don't see why that would be an issue. Have set_affinity just
    store the mask (and apply whatever other sanity checking it might want to do)
    until the interrupt is started and when started, apply things to HW.
    
    I might be missing a reason why it's more complicated than that :) But I do
    feel a bit uncomfortable with your approach.
    
Looks like the x86 APIC set_affinity call explicitly checks whether the interrupt is activated in the managed case, which makes sense given the code Ben posted above:
          /*
           * Core code can call here for inactive interrupts. For inactive
           * interrupts which use managed or reservation mode there is no
           * point in going through the vector assignment right now as the
           * activation will assign a vector which fits the destination
           * cpumask. Let the core code store the destination mask and be
           * done with it.
           */
          if (!irqd_is_activated(irqd) &&
              (apicd->is_managed || apicd->can_reserve))    

My original patch should certainly check activated and not disabled. With that, do you still have reservations, Marc?
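
For illustration only, here is a minimal user-space sketch of the deferral Ben describes above: set_affinity always records the requested target, but only touches the hardware when the interrupt is activated, and activation applies whatever target was stored. This is not kernel code. Every name in it (model_irq, model_set_affinity, model_activate, model_deactivate) is a hypothetical stand-in, and the "hardware" is just a pair of fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one LPI; not the kernel's struct irq_data. */
struct model_irq {
	bool activated;   /* stands in for irqd_is_activated() */
	int  cpu;         /* last target CPU recorded by software */
	int  hw_cpu;      /* CPU the modelled hardware routes to, -1 if unmapped */
	int  movi_count;  /* number of modelled MOVI commands issued */
};

/* Record the new target; only touch the "hardware" if the IRQ is activated. */
int model_set_affinity(struct model_irq *irq, int cpu)
{
	assert(cpu >= 0);

	if (cpu == irq->cpu)
		return 0;

	if (irq->activated) {
		irq->hw_cpu = cpu;      /* models the MOVI */
		irq->movi_count++;
	}

	irq->cpu = cpu;                 /* always remember the request */
	return 0;
}

/* Activation maps the event using whatever target was stored earlier. */
void model_activate(struct model_irq *irq)
{
	irq->hw_cpu = irq->cpu;
	irq->activated = true;
}

/* Deactivation unmaps the event; a later MOVI would have nothing to move. */
void model_deactivate(struct model_irq *irq)
{
	irq->hw_cpu = -1;               /* models the DISCARD */
	irq->activated = false;
}
```

In this model an affinity change made while the interrupt is inactive issues no command at all, and the stored target takes effect on the next activation.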

Thanks,
Ali





* Re: [PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq
@ 2020-06-11 17:44 Saidi, Ali
  0 siblings, 0 replies; 18+ messages in thread
From: Saidi, Ali @ 2020-06-11 17:44 UTC
  To: Thomas Gleixner, Herrenschmidt, Benjamin, maz
  Cc: jason, linux-kernel, linux-arm-kernel, Woodhouse, David,
	Zilberman, Zeev, Machulsky, Zorik


On 6/8/20, 8:49 AM, "Thomas Gleixner" <tglx@linutronix.de> wrote:
    
    
    "Herrenschmidt, Benjamin" <benh@amazon.com> writes:
    > On Wed, 2020-06-03 at 16:16 +0100, Marc Zyngier wrote:
    >> > My original patch should certainly check activated and not disabled.
    >> > With that do you still have reservations Marc?
    >>
    >> I'd still prefer it if we could do something in core code, rather
    >> than spreading these checks in the individual drivers. If we can't,
    >> fair enough. But it feels like the core set_affinity function could
    >> just do the same thing in a single place (although the started vs
    >> activated is yet another piece of the puzzle I didn't consider,
    >> and the ITS doesn't need the "can_reserve" thing).
    >
    > For the sake of fixing the problem in a timely and backportable way I
    > would suggest first merging the fix, *then* fixing the core code.
    
    The "fix" is just wrong
    
    >       if (cpu != its_dev->event_map.col_map[id]) {
    >               target_col = &its_dev->its->collections[cpu];
    > -             its_send_movi(its_dev, target_col, id);
    > +
    > +             /* If the IRQ is disabled a discard was sent so don't move */
    > +             if (!irqd_irq_disabled(d))
    
    That check needs to be !irqd_is_activated() because enable_irq() does
    not touch anything affinity related.
    
    > +                     its_send_movi(its_dev, target_col, id);
    > +
    >               its_dev->event_map.col_map[id] = cpu;
    >               irq_data_update_effective_affinity(d, cpumask_of(cpu));
    
    And then these associations are disconnected from reality in any case.
    
    Something like the completely untested patch below should work.

I've been unable to reproduce the problem with your patch on an Arm system.

Thanks,

Ali




* [PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq
@ 2020-05-29  1:55 Ali Saidi
  2020-05-29  4:07 ` Zenghui Yu
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Ali Saidi @ 2020-05-29  1:55 UTC
  To: Thomas Gleixner, Jason Cooper, Marc Zyngier, linux-kernel
  Cc: linux-arm-kernel, benh, dwmw, zeev, zorik, alisaidi

If an interrupt is disabled the ITS driver has sent a discard removing
the DeviceID and EventID from the ITT. After this occurs it can't be
moved to another collection with a MOVI and a command error occurs if
attempted. Before issuing the MOVI command make sure that the IRQ isn't
disabled and change the activate code to try and use the previous
affinity.

Signed-off-by: Ali Saidi <alisaidi@amazon.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 124251b0ccba..1235dd9a2fb2 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 	/* don't set the affinity when the target cpu is same as current one */
 	if (cpu != its_dev->event_map.col_map[id]) {
 		target_col = &its_dev->its->collections[cpu];
-		its_send_movi(its_dev, target_col, id);
+
+		/* If the IRQ is disabled a discard was sent so don't move */
+		if (!irqd_irq_disabled(d))
+			its_send_movi(its_dev, target_col, id);
+
 		its_dev->event_map.col_map[id] = cpu;
 		irq_data_update_effective_affinity(d, cpumask_of(cpu));
 	}
@@ -3439,8 +3443,16 @@ static int its_irq_domain_activate(struct irq_domain *domain,
 	if (its_dev->its->numa_node >= 0)
 		cpu_mask = cpumask_of_node(its_dev->its->numa_node);
 
-	/* Bind the LPI to the first possible CPU */
-	cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
+	/* If the cpu set to a different CPU that is still online use it */
+	cpu = its_dev->event_map.col_map[event];
+
+	cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
+
+	if (!cpumask_test_cpu(cpu, cpu_mask)) {
+		/* Bind the LPI to the first possible CPU */
+		cpu = cpumask_first(cpu_mask);
+	}
+
 	if (cpu >= nr_cpu_ids) {
 		if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
 			return -EINVAL;
-- 
2.24.1.AMZN
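
The CPU selection in the second hunk can be modelled outside the kernel as follows. This is a rough sketch under stated assumptions, not the driver code: a cpumask is reduced to a 64-bit word, and model_mask_first(), model_pick_cpu() and MODEL_NR_CPUS are hypothetical names. The sketch keeps the previously recorded CPU if it is still in the node mask and online, and otherwise falls back to the first candidate. It computes the intersection into a local variable rather than modifying either input mask.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64   /* hypothetical limit: one bit per CPU */

/*
 * Lowest set bit of the mask, or MODEL_NR_CPUS if the mask is empty,
 * mirroring cpumask_first() returning a value >= nr_cpu_ids.
 */
int model_mask_first(uint64_t mask)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return MODEL_NR_CPUS;
}

/* Prefer the previous CPU when it is still a valid candidate. */
int model_pick_cpu(int prev_cpu, uint64_t node_mask, uint64_t online_mask)
{
	uint64_t candidates = node_mask & online_mask;

	assert(prev_cpu >= 0 && prev_cpu < MODEL_NR_CPUS);

	if (candidates & (UINT64_C(1) << prev_cpu))
		return prev_cpu;

	/* Otherwise bind to the first possible CPU, if there is one. */
	return model_mask_first(candidates);
}
```

A return value of MODEL_NR_CPUS corresponds to the `cpu >= nr_cpu_ids` case that the code after this hunk goes on to handle.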



end of thread, other threads:[~2020-06-11 17:44 UTC | newest]

Thread overview: 18+ messages
2020-06-02 18:47 [PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq Saidi, Ali
2020-06-03 15:16 ` Marc Zyngier
2020-06-03 22:14   ` Herrenschmidt, Benjamin
2020-06-08 13:48     ` Thomas Gleixner
2020-06-08 21:59       ` Benjamin Herrenschmidt
2020-06-08 23:36         ` Thomas Gleixner
  -- strict thread matches above, loose matches on Subject: below --
2020-06-11 17:44 Saidi, Ali
2020-05-29  1:55 Ali Saidi
2020-05-29  4:07 ` Zenghui Yu
2020-05-29  8:32 ` Marc Zyngier
2020-05-29 12:36   ` Saidi, Ali
2020-05-30 16:49     ` Marc Zyngier
2020-05-31 11:09       ` Marc Zyngier
2020-06-01  0:10         ` Saidi, Ali
2020-06-01  2:40         ` Herrenschmidt, Benjamin
2020-06-02 20:54           ` Thomas Gleixner
2020-06-03 12:44             ` Marc Zyngier
2020-05-31  2:53 ` kbuild test robot
