linux-pci.vger.kernel.org archive mirror
* MSI interrupt for xhci still lost on 5.6-rc6 after cpu hotplug
@ 2020-03-18 19:25 Mathias Nyman
  2020-03-19 20:24 ` Evan Green
  2020-03-20  9:52 ` Thomas Gleixner
  0 siblings, 2 replies; 23+ messages in thread
From: Mathias Nyman @ 2020-03-18 19:25 UTC
  To: x86
  Cc: linux-pci, Thomas Gleixner, LKML, Bjorn Helgaas, Evan Green,
	Ghorai, Sukumar, Amara, Madhusudanarao, Nandamuri, Srikanth

Hi

I can reproduce the lost MSI interrupt issue on 5.6-rc6, which includes
the "Plug non-maskable MSI affinity race" patch.

I can see this on a couple of platforms. I'm running a script that first
generates a lot of USB traffic and then, in a busy loop, sets IRQ affinity and
turns CPUs off and on:

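# Bring the odd-numbered CPUs online.
for i in 1 3 5 7; do
	echo "1" > /sys/devices/system/cpu/cpu$i/online
done
# Write each mask to every IRQ. The glob must be unquoted and iterated,
# because a redirection target is not expanded to multiple files.
for mask in A A F; do
	for f in /proc/irq/*/smp_affinity; do
		echo "$mask" > "$f"
	done
done
# Take the odd-numbered CPUs offline again.
for i in 1 3 5 7; do
	echo "0" > /sys/devices/system/cpu/cpu$i/online
done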
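
For readers unfamiliar with the smp_affinity format: the values written are
hexadecimal CPU bitmasks, where bit N selects CPU N. A minimal sketch for
decoding them follows. The helper name mask_to_cpus is made up for this
illustration, and it only handles a single hex word without the comma-separated
groups used on larger systems.

```shell
# Hypothetical helper: print the CPU numbers selected by a hex affinity mask.
# Assumes one hex word (no "ff,ffffffff" style groups).
mask_to_cpus() {
	mask=$(( 0x$1 ))
	cpu=0
	out=
	while [ "$mask" -ne 0 ]; do
		# Bit 0 set means the current CPU number is part of the mask.
		[ $(( mask & 1 )) -eq 1 ] && out="${out:+$out,}$cpu"
		mask=$(( mask >> 1 ))
		cpu=$(( cpu + 1 ))
	done
	echo "$out"
}
```

With this, `mask_to_cpus A` prints 1,3 and `mask_to_cpus F` prints 0,1,2,3, so
the script alternates between targeting CPUs 1 and 3 and targeting CPUs 0-3.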

I added some very simple debugging, but I don't really know what to look for.
The xhci interrupts (IRQ 122) just stop after an MSI affinity change, although
they survived many similar msi_set_affinity() calls before that.

I'm not that familiar with the inner workings of this, but I'll be happy to
help out with adding debugging and testing patches.

Details:

 cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
   0:         26          0          0          0          0          0          0          0   IO-APIC    2-edge      timer
   1:          0          0          0          0          0          7          0          0   IO-APIC    1-edge      i8042
   4:          0          4      59941          0          0          0          0          0   IO-APIC    4-edge      ttyS0
   8:          0          0          0          0          0          0          1          0   IO-APIC    8-edge      rtc0
   9:          0         40          8          0          0          0          0          0   IO-APIC    9-fasteoi   acpi
  16:          0          0          0          0          0          0          0          0   IO-APIC   16-fasteoi   i801_smbus
 120:          0          0        293          0          0          0          0          0   PCI-MSI 32768-edge      i915
 121:        728          0          0         58          0          0          0          0   PCI-MSI 520192-edge      enp0s31f6
 122:      63575       2271          0       1957       7262          0          0          0   PCI-MSI 327680-edge      xhci_hcd
 123:          0          0          0          0          0          0          0          0   PCI-MSI 514048-edge      snd_hda_intel:card0
 NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
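
One way to notice the stall from userspace is to sum the per-CPU counters of
the xhci_hcd row and check whether the total still advances while USB traffic
is running. The following is only a sketch: the function names irq_total and
watch_irq are invented here, and it assumes the usual /proc/interrupts layout
with one header field per CPU.

```shell
# Hypothetical helper: sum the per-CPU counts of one IRQ.
# $1: IRQ number, $2: file in /proc/interrupts format (default: /proc/interrupts)
irq_total() {
	awk -v irq="$1:" '
		NR == 1 { ncpu = NF; next }	# header row: one field per CPU
		$1 == irq {
			for (i = 2; i <= ncpu + 1; i++)
				sum += $i
			found = 1
		}
		END { if (found) print sum; else exit 1 }
	' "${2:-/proc/interrupts}"
}

# Hypothetical helper: report whenever the total did not change between samples.
# $1: IRQ number, $2: sampling interval in seconds (default: 1)
watch_irq() {
	prev=$(irq_total "$1") || return 1
	while sleep "${2:-1}"; do
		cur=$(irq_total "$1") || return 1
		[ "$cur" -eq "$prev" ] && echo "irq $1 stalled at $cur"
		prev=$cur
	done
}
```

Running `watch_irq 122` alongside the hotplug script would print a line for
every interval in which no xhci interrupt was counted. An idle bus looks the
same as a lost interrupt, so this is only meaningful under constant traffic.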
 
trace snippet: 
      <idle>-0     [001] d.h.   129.676900: xhci_irq: xhci irq
      <idle>-0     [001] d.h.   129.677507: xhci_irq: xhci irq
      <idle>-0     [001] d.h.   129.677556: xhci_irq: xhci irq
      <idle>-0     [001] d.h.   129.677647: xhci_irq: xhci irq
      <...>-14    [001] d..1   129.679802: msi_set_affinity: direct update msi 122, vector 33 -> 33, apicid: 2 -> 6
      <idle>-0     [003] d.h.   129.682639: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.682769: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.682908: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.683552: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.683677: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.683819: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.689017: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.689140: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.689307: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.689984: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.690107: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.690278: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.695541: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.695674: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.695839: xhci_irq: xhci irq
      <idle>-0     [003] d.H.   129.696667: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.696797: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.696973: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.702288: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.702380: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.702493: xhci_irq: xhci irq
 migration/3-24    [003] d..1   129.703150: msi_set_affinity: direct update msi 122, vector 33 -> 33, apicid: 6 -> 0
 kworker/0:0-5     [000] d.h.   131.328790: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
 kworker/0:0-5     [000] d.h.   133.312704: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
 kworker/0:0-5     [000] d.h.   135.360786: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
      <idle>-0     [000] d.h.   137.344694: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
 kworker/0:0-5     [000] d.h.   139.128679: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
 kworker/0:0-5     [000] d.h.   141.312686: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
 kworker/0:0-5     [000] d.h.   143.360703: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
 kworker/0:0-5     [000] d.h.   145.344791: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
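
To find the affinity change after which the handler went quiet in a longer
trace, something like the sketch below can help. It assumes the trace_printk
message formats from my debug patch (both the "direct update" and the "twostep
update" variants). The function name events_after_last_update is made up for
this example.

```shell
# Hypothetical helper: show the last affinity update for an MSI and count
# how many handler events were traced after it.
# $1: MSI/IRQ number, $2: trace file, $3: handler event tag (default: xhci_irq:)
events_after_last_update() {
	awk -v upd="update msi(, irq)? $1," -v ev="${3:-xhci_irq:}" '
		$0 ~ upd      { last = $0; n = 0; next }	# new update: restart the count
		index($0, ev) { n++ }
		END {
			if (last == "") exit 1		# no update for this MSI in the trace
			sub(/^[ \t]+/, "", last)
			print last
			print ev " events afterwards: " n
		}
	' "$2"
}
```

Fed the snippet above, `events_after_last_update 122 trace.txt` reports the
apicid 6 -> 0 update at 129.703150 with zero xhci_irq events after it.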


--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -92,6 +92,10 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
            cfg->vector == old_cfg.vector ||
            old_cfg.vector == MANAGED_IRQ_SHUTDOWN_VECTOR ||
            cfg->dest_apicid == old_cfg.dest_apicid) {
+               trace_printk("direct update msi %u, vector %u -> %u, apicid: %u -> %u\n",
+                    irqd->irq,
+                    old_cfg.vector, cfg->vector,
+                    old_cfg.dest_apicid, cfg->dest_apicid);
                irq_msi_update_msg(irqd, cfg);
                return ret;
        }
@@ -134,7 +138,10 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
         */
        if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
                this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
-
+       trace_printk("twostep update msi, irq %u, vector %u -> %u, apicid: %u -> %u\n",
+                    irqd->irq,
+                    old_cfg.vector, cfg->vector,
+                    old_cfg.dest_apicid, cfg->dest_apicid);
        /* Redirect it to the new vector on the local CPU temporarily */
        old_cfg.vector = cfg->vector;
        irq_msi_update_msg(irqd, &old_cfg);

Thanks
-Mathias



Thread overview: 23+ messages
     [not found] <20200508005528.GB61703@otc-nc-03>
2020-05-08 11:04 ` MSI interrupt for xhci still lost on 5.6-rc6 after cpu hotplug Thomas Gleixner
2020-05-08 16:09   ` Raj, Ashok
2020-05-08 16:49     ` Thomas Gleixner
2020-05-11 19:03       ` Raj, Ashok
2020-05-11 20:14         ` Thomas Gleixner
2020-03-18 19:25 Mathias Nyman
2020-03-19 20:24 ` Evan Green
2020-03-20  8:07   ` Mathias Nyman
2020-03-20  9:52 ` Thomas Gleixner
2020-03-23  9:42   ` Mathias Nyman
2020-03-23 14:10     ` Thomas Gleixner
2020-03-23 20:32       ` Mathias Nyman
2020-03-24  0:24         ` Thomas Gleixner
2020-03-24 16:17           ` Evan Green
2020-03-24 19:03             ` Thomas Gleixner
2020-05-01 18:43               ` Raj, Ashok
2020-05-05 19:36                 ` Thomas Gleixner
2020-05-05 20:16                   ` Raj, Ashok
2020-05-05 21:47                     ` Thomas Gleixner
2020-05-07 12:18                       ` Raj, Ashok
2020-05-07 12:53                         ` Thomas Gleixner
     [not found]                           ` <20200507175715.GA22426@otc-nc-03>
2020-05-07 19:41                             ` Thomas Gleixner
2020-03-25 17:12             ` Mathias Nyman
