| Message ID | 20161003101708.34795-1-mika.westerberg@linux.intel.com |
|---|---|
| State | New, archived |
Hi Mika,

On 10/03/16 13:17, Mika Westerberg wrote:
> When a CPU is about to be offlined we call fixup_irqs(), which resets IRQ
> affinities related to the CPU in question. The same thing is also done when
> the system is suspended to S-states like S3 (mem).
>
> For each IRQ we try to complete any on-going move regardless of whether the
> IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
> its chip_data, assume it is of type struct apic_chip_data and manipulate it
> by clearing old_domain mask etc. irq_chips that are not part of the
> x86_vector_domain, like those created by various GPIO drivers, will find
> their chip_data changed unexpectedly.
>
> Below is an example where a GPIO chip owned by pinctrl-sunrisepoint.c gets
> corrupted after resume:
>
>   # cat /sys/kernel/debug/gpio
>   gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
>    gpio-511 ( |sysfs ) in hi
>
>   # rtcwake -s10 -mmem
>   <10 seconds pass>
>
>   # cat /sys/kernel/debug/gpio
>   gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
>    gpio-511 ( |sysfs ) in ?
>
> Note '?' in the output. It means the struct gpio_chip ->get function is
> NULL whereas before suspend it was there.
>
> Fix this by first checking that the IRQ belongs to x86_vector_domain before
> we try to use the chip_data as struct apic_chip_data.
>
> Reported-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Thanks for debugging this! I've tested it on the laptop where the SD card
is no longer detected after suspend; with this patch it works fine.

Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6066d945c40e..5d30c5e42bb1 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -661,11 +661,28 @@ void irq_complete_move(struct irq_cfg *cfg)
  */
 void irq_force_complete_move(struct irq_desc *desc)
 {
-	struct irq_data *irqdata = irq_desc_get_irq_data(desc);
-	struct apic_chip_data *data = apic_chip_data(irqdata);
-	struct irq_cfg *cfg = data ? &data->cfg : NULL;
+	struct irq_data *irqdata;
+	struct apic_chip_data *data;
+	struct irq_cfg *cfg;
 	unsigned int cpu;

+	/*
+	 * The function is called for all descriptors regardless of which
+	 * irqdomain they belong to. For example if an IRQ is provided by
+	 * an irq_chip as part of a GPIO driver, the chip data for that
+	 * descriptor is specific to the irq_chip in question.
+	 *
+	 * Check first that the chip_data is what we expect
+	 * (apic_chip_data) before touching it any further.
+	 */
+	irqdata = irq_domain_get_irq_data(x86_vector_domain,
+					  irq_desc_get_irq(desc));
+	if (!irqdata)
+		return;
+
+	data = apic_chip_data(irqdata);
+	cfg = data ? &data->cfg : NULL;
+
 	if (!cfg)
 		return;
When a CPU is about to be offlined we call fixup_irqs(), which resets IRQ
affinities related to the CPU in question. The same thing is also done when
the system is suspended to S-states like S3 (mem).

For each IRQ we try to complete any on-going move regardless of whether the
IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
its chip_data, assume it is of type struct apic_chip_data and manipulate it
by clearing old_domain mask etc. irq_chips that are not part of the
x86_vector_domain, like those created by various GPIO drivers, will find
their chip_data changed unexpectedly.

Below is an example where a GPIO chip owned by pinctrl-sunrisepoint.c gets
corrupted after resume:

  # cat /sys/kernel/debug/gpio
  gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
   gpio-511 ( |sysfs ) in hi

  # rtcwake -s10 -mmem
  <10 seconds pass>

  # cat /sys/kernel/debug/gpio
  gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
   gpio-511 ( |sysfs ) in ?

Note '?' in the output. It means the struct gpio_chip ->get function is
NULL whereas before suspend it was there.

Fix this by first checking that the IRQ belongs to x86_vector_domain before
we try to use the chip_data as struct apic_chip_data.

Reported-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 arch/x86/kernel/apic/vector.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)