* [patch 00/14] x86/irq: Plug various vector cleanup races
@ 2015-12-31 16:30 Thomas Gleixner
2015-12-31 16:30 ` [patch 01/14] x86/irq: Fix a race in x86_vector_free_irqs() Thomas Gleixner
` (14 more replies)
0 siblings, 15 replies; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
Joe reported a nasty race in the vector cleanup code which results in stale
irq descriptors in the vector arrays and a potential use after free.
This series addresses this issue, another race which was found and fixed by
Jiang related to the same area, plus a cpu hotplug issue which is not related
to this.
This lot is intended for stable, but not yet marked so. The diffstat below is
rather large, but this is mostly due to extensive commentry which I added in
the process.
This is also available via git from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/apic
Thanks,
tglx
---
include/asm/irq.h | 5 -
kernel/apic/vector.c | 221 +++++++++++++++++++++++++++++++++------------------
kernel/irq.c | 11 ++
3 files changed, 158 insertions(+), 79 deletions(-)
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 01/14] x86/irq: Fix a race in x86_vector_free_irqs()
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2015-12-31 16:30 ` [patch 02/14] x86/irq: Validate that irq descriptor is still active Thomas Gleixner
` (13 subsequent siblings)
14 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq-fix-a-race-window-in-x86_vector_free_irqs.patch --]
[-- Type: text/plain, Size: 2980 bytes --]
There's a race condition between
x86_vector_free_irqs()
{
free_apic_chip_data(irq_data->chip_data);
xxxxx //irq_data->chip_data has been freed, but the pointer
//hasn't been reset yet
irq_domain_reset_irq_data(irq_data);
}
and
smp_irq_move_cleanup_interrupt()
{
raw_spin_lock(&vector_lock);
data = apic_chip_data(irq_desc_get_irq_data(desc));
access data->xxxx // may access freed memory
raw_spin_unlock(&desc->lock);
}
which may cause smp_irq_move_cleanup_interrupt() to access freed memory.
Call irq_domain_reset_irq_data(), which clears the pointer with vector lock
held.
[ tglx: Free memory outside of lock held region. ]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Link: http://lkml.kernel.org/r/1450880014-11741-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -224,10 +224,8 @@ static int assign_irq_vector_policy(int
static void clear_irq_vector(int irq, struct apic_chip_data *data)
{
struct irq_desc *desc;
- unsigned long flags;
int cpu, vector;
- raw_spin_lock_irqsave(&vector_lock, flags);
BUG_ON(!data->cfg.vector);
vector = data->cfg.vector;
@@ -237,10 +235,8 @@ static void clear_irq_vector(int irq, st
data->cfg.vector = 0;
cpumask_clear(data->domain);
- if (likely(!data->move_in_progress)) {
- raw_spin_unlock_irqrestore(&vector_lock, flags);
+ if (likely(!data->move_in_progress))
return;
- }
desc = irq_to_desc(irq);
for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
@@ -253,7 +249,6 @@ static void clear_irq_vector(int irq, st
}
}
data->move_in_progress = 0;
- raw_spin_unlock_irqrestore(&vector_lock, flags);
}
void init_irq_alloc_info(struct irq_alloc_info *info,
@@ -274,19 +269,24 @@ void copy_irq_alloc_info(struct irq_allo
static void x86_vector_free_irqs(struct irq_domain *domain,
unsigned int virq, unsigned int nr_irqs)
{
+ struct apic_chip_data *apic_data;
struct irq_data *irq_data;
+ unsigned long flags;
int i;
for (i = 0; i < nr_irqs; i++) {
irq_data = irq_domain_get_irq_data(x86_vector_domain, virq + i);
if (irq_data && irq_data->chip_data) {
+ raw_spin_lock_irqsave(&vector_lock, flags);
clear_irq_vector(virq + i, irq_data->chip_data);
- free_apic_chip_data(irq_data->chip_data);
+ apic_data = irq_data->chip_data;
+ irq_domain_reset_irq_data(irq_data);
+ raw_spin_unlock_irqrestore(&vector_lock, flags);
+ free_apic_chip_data(apic_data);
#ifdef CONFIG_X86_IO_APIC
if (virq + i < nr_legacy_irqs())
legacy_irq_data[virq + i] = NULL;
#endif
- irq_domain_reset_irq_data(irq_data);
}
}
}
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 02/14] x86/irq: Validate that irq descriptor is still active
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
2015-12-31 16:30 ` [patch 01/14] x86/irq: Fix a race in x86_vector_free_irqs() Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2016-01-16 21:16 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 04/14] x86/irq: Reorganize the return path in assign_irq_vector Thomas Gleixner
` (12 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq--Validate-that-irq-descriptor-is-still-active.patch --]
[-- Type: text/plain, Size: 773 bytes --]
In fixup_irqs() we unconditionally dereference the irq chip of an irq
descriptor. The descriptor might still be valid, but already cleaned up,
i.e. the chip removed. Add a check for this condition.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/irq.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -470,6 +470,15 @@ void fixup_irqs(void)
}
chip = irq_data_get_irq_chip(data);
+ /*
+ * The interrupt descriptor might have been cleaned up
+ * already, but it is not yet removed from the radix tree
+ */
+ if (!chip) {
+ raw_spin_unlock(&desc->lock);
+ continue;
+ }
+
if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
chip->irq_mask(data);
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 04/14] x86/irq: Reorganize the return path in assign_irq_vector
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
2015-12-31 16:30 ` [patch 01/14] x86/irq: Fix a race in x86_vector_free_irqs() Thomas Gleixner
2015-12-31 16:30 ` [patch 02/14] x86/irq: Validate that irq descriptor is still active Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2016-01-16 21:17 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 03/14] x86/irq: Do not use apic_chip_data.old_domain as temporary buffer Thomas Gleixner
` (11 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq--Reorganize-return-path-in-assign_irq_vector.patch --]
[-- Type: text/plain, Size: 2208 bytes --]
Use an explicit goto for the cases where we have success in the search/update
and return -ENOSPC if the search loop ends due to no space.
Preparatory patch for fixes. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 22 ++++++++--------------
1 file changed, 8 insertions(+), 14 deletions(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -116,13 +116,12 @@ static int __assign_irq_vector(int irq,
*/
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
- int cpu, err;
+ int cpu;
if (d->move_in_progress)
return -EBUSY;
/* Only try and allocate irqs on cpus that are present */
- err = -ENOSPC;
cpumask_clear(d->old_domain);
cpumask_clear(searched_cpumask);
cpu = cpumask_first_and(mask, cpu_online_mask);
@@ -132,9 +131,8 @@ static int __assign_irq_vector(int irq,
apic->vector_allocation_domain(cpu, vector_cpumask, mask);
if (cpumask_subset(vector_cpumask, d->domain)) {
- err = 0;
if (cpumask_equal(vector_cpumask, d->domain))
- break;
+ goto success;
/*
* New cpumask using the vector is a proper subset of
* the current in use mask. So cleanup the vector
@@ -145,7 +143,7 @@ static int __assign_irq_vector(int irq,
d->move_in_progress =
cpumask_intersects(d->old_domain, cpu_online_mask);
cpumask_and(d->domain, d->domain, vector_cpumask);
- break;
+ goto success;
}
vector = current_vector;
@@ -185,17 +183,13 @@ static int __assign_irq_vector(int irq,
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
- err = 0;
- break;
+ goto success;
}
+ return -ENOSPC;
- if (!err) {
- /* cache destination APIC IDs into cfg->dest_apicid */
- err = apic->cpu_mask_to_apicid_and(mask, d->domain,
- &d->cfg.dest_apicid);
- }
-
- return err;
+success:
+ /* cache destination APIC IDs into cfg->dest_apicid */
+ return apic->cpu_mask_to_apicid_and(mask, d->domain, &d->cfg.dest_apicid);
}
static int assign_irq_vector(int irq, struct apic_chip_data *data,
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 03/14] x86/irq: Do not use apic_chip_data.old_domain as temporary buffer
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
` (2 preceding siblings ...)
2015-12-31 16:30 ` [patch 04/14] x86/irq: Reorganize the return path in assign_irq_vector Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2015-12-31 16:30 ` [patch 05/14] x86/irq: Reorganize the search in assign_irq_vector Thomas Gleixner
` (10 subsequent siblings)
14 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq-do-not-reuse-struct-apic_chip_data-old_domain-as-temporary-buffer.patch --]
[-- Type: text/plain, Size: 2246 bytes --]
Function __assign_irq_vector() makes use of apic_chip_data.old_domain as a
temporary buffer, which is in the way of using apic_chip_data.old_domain for
synchronizing the vector cleanup with the vector assignement code.
Use a proper temporary cpumask for this.
[ tglx: Renamed the mask to searched_cpumask for clarity ]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Link: http://lkml.kernel.org/r/1450880014-11741-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -30,7 +30,7 @@ struct apic_chip_data {
struct irq_domain *x86_vector_domain;
static DEFINE_RAW_SPINLOCK(vector_lock);
-static cpumask_var_t vector_cpumask;
+static cpumask_var_t vector_cpumask, searched_cpumask;
static struct irq_chip lapic_controller;
#ifdef CONFIG_X86_IO_APIC
static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
@@ -124,6 +124,7 @@ static int __assign_irq_vector(int irq,
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
cpumask_clear(d->old_domain);
+ cpumask_clear(searched_cpumask);
cpu = cpumask_first_and(mask, cpu_online_mask);
while (cpu < nr_cpu_ids) {
int new_cpu, vector, offset;
@@ -157,9 +158,9 @@ static int __assign_irq_vector(int irq,
}
if (unlikely(current_vector == vector)) {
- cpumask_or(d->old_domain, d->old_domain,
+ cpumask_or(searched_cpumask, searched_cpumask,
vector_cpumask);
- cpumask_andnot(vector_cpumask, mask, d->old_domain);
+ cpumask_andnot(vector_cpumask, mask, searched_cpumask);
cpu = cpumask_first_and(vector_cpumask,
cpu_online_mask);
continue;
@@ -404,6 +405,7 @@ int __init arch_early_irq_init(void)
arch_init_htirq_domain(x86_vector_domain);
BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
+ BUG_ON(!alloc_cpumask_var(&searched_cpumask, GFP_KERNEL));
return arch_early_ioapic_init();
}
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 05/14] x86/irq: Reorganize the search in assign_irq_vector
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
` (3 preceding siblings ...)
2015-12-31 16:30 ` [patch 03/14] x86/irq: Do not use apic_chip_data.old_domain as temporary buffer Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2016-01-16 21:17 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 06/14] x86/irq: Check vector allocation early Thomas Gleixner
` (9 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq--Reorganize-the-search-in-assign_irq_vector.patch --]
[-- Type: text/plain, Size: 1716 bytes --]
Split out the code which advances the target cpu for the search so we can
reuse it for the next patch which adds an early validation check for the
vectormask which we get from the apic.
Add comments while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -155,14 +155,9 @@ static int __assign_irq_vector(int irq,
vector = FIRST_EXTERNAL_VECTOR + offset;
}
- if (unlikely(current_vector == vector)) {
- cpumask_or(searched_cpumask, searched_cpumask,
- vector_cpumask);
- cpumask_andnot(vector_cpumask, mask, searched_cpumask);
- cpu = cpumask_first_and(vector_cpumask,
- cpu_online_mask);
- continue;
- }
+ /* If the search wrapped around, try the next cpu */
+ if (unlikely(current_vector == vector))
+ goto next_cpu;
if (test_bit(vector, used_vectors))
goto next;
@@ -184,6 +179,19 @@ static int __assign_irq_vector(int irq,
d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
goto success;
+
+next_cpu:
+ /*
+ * We exclude the current @vector_cpumask from the requested
+ * @mask and try again with the next online cpu in the
+ * result. We cannot modify @mask, so we use @vector_cpumask
+ * as a temporary buffer here as it will be reassigned when
+ * calling apic->vector_allocation_domain() above.
+ */
+ cpumask_or(searched_cpumask, searched_cpumask, vector_cpumask);
+ cpumask_andnot(vector_cpumask, mask, searched_cpumask);
+ cpu = cpumask_first_and(vector_cpumask, cpu_online_mask);
+ continue;
}
return -ENOSPC;
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 06/14] x86/irq: Check vector allocation early
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
` (4 preceding siblings ...)
2015-12-31 16:30 ` [patch 05/14] x86/irq: Reorganize the search in assign_irq_vector Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2016-01-16 21:17 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 08/14] x86/irq: Get rid of code duplication Thomas Gleixner
` (8 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq--Check-vector-allocation-early.patch --]
[-- Type: text/plain, Size: 4099 bytes --]
__assign_irq_vector() uses the vector_cpumask which is assigned by
apic->vector_allocation_domain() without doing basic sanity checks. That can
result in a situation where the final assignement of a newly found vector
fails in apic->cpu_mask_to_apicid_and(). So we have to do rollbacks for no
reason.
apic->cpu_mask_to_apicid_and() only fails if
vector_cpumask & requested_cpumask & cpu_online_mask
is empty.
Check for this condition right away and if the result is empty try immediately
the next possible cpu in the requested mask. So in case of a failure the old
setting is unchanged and we can remove the rollback code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 38 +++++++++++++++++++++++++-------------
1 file changed, 25 insertions(+), 13 deletions(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -30,7 +30,7 @@ struct apic_chip_data {
struct irq_domain *x86_vector_domain;
static DEFINE_RAW_SPINLOCK(vector_lock);
-static cpumask_var_t vector_cpumask, searched_cpumask;
+static cpumask_var_t vector_cpumask, vector_searchmask, searched_cpumask;
static struct irq_chip lapic_controller;
#ifdef CONFIG_X86_IO_APIC
static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
@@ -128,8 +128,20 @@ static int __assign_irq_vector(int irq,
while (cpu < nr_cpu_ids) {
int new_cpu, vector, offset;
+ /* Get the possible target cpus for @mask/@cpu from the apic */
apic->vector_allocation_domain(cpu, vector_cpumask, mask);
+ /*
+ * Clear the offline cpus from @vector_cpumask for searching
+ * and verify whether the result overlaps with @mask. If true,
+ * then the call to apic->cpu_mask_to_apicid_and() will
+ * succeed as well. If not, no point in trying to find a
+ * vector in this mask.
+ */
+ cpumask_and(vector_searchmask, vector_cpumask, cpu_online_mask);
+ if (!cpumask_intersects(vector_searchmask, mask))
+ goto next_cpu;
+
if (cpumask_subset(vector_cpumask, d->domain)) {
if (cpumask_equal(vector_cpumask, d->domain))
goto success;
@@ -162,7 +174,7 @@ static int __assign_irq_vector(int irq,
if (test_bit(vector, used_vectors))
goto next;
- for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask) {
+ for_each_cpu(new_cpu, vector_searchmask) {
if (!IS_ERR_OR_NULL(per_cpu(vector_irq, new_cpu)[vector]))
goto next;
}
@@ -174,7 +186,7 @@ static int __assign_irq_vector(int irq,
d->move_in_progress =
cpumask_intersects(d->old_domain, cpu_online_mask);
}
- for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
+ for_each_cpu(new_cpu, vector_searchmask)
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
@@ -196,8 +208,14 @@ static int __assign_irq_vector(int irq,
return -ENOSPC;
success:
- /* cache destination APIC IDs into cfg->dest_apicid */
- return apic->cpu_mask_to_apicid_and(mask, d->domain, &d->cfg.dest_apicid);
+ /*
+ * Cache destination APIC IDs into cfg->dest_apicid. This cannot fail
+ * as we already established, that mask & d->domain & cpu_online_mask
+ * is not empty.
+ */
+ BUG_ON(apic->cpu_mask_to_apicid_and(mask, d->domain,
+ &d->cfg.dest_apicid));
+ return 0;
}
static int assign_irq_vector(int irq, struct apic_chip_data *data,
@@ -407,6 +425,7 @@ int __init arch_early_irq_init(void)
arch_init_htirq_domain(x86_vector_domain);
BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
+ BUG_ON(!alloc_cpumask_var(&vector_searchmask, GFP_KERNEL));
BUG_ON(!alloc_cpumask_var(&searched_cpumask, GFP_KERNEL));
return arch_early_ioapic_init();
@@ -496,14 +515,7 @@ static int apic_set_affinity(struct irq_
return -EINVAL;
err = assign_irq_vector(irq, data, dest);
- if (err) {
- if (assign_irq_vector(irq, data,
- irq_data_get_affinity_mask(irq_data)))
- pr_err("Failed to recover vector for irq %d\n", irq);
- return err;
- }
-
- return IRQ_SET_MASK_OK;
+ return err ? err : IRQ_SET_MASK_OK;
}
static struct irq_chip lapic_controller = {
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 07/14] x86/irq: Copy vectormask instead of an AND operation
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
` (6 preceding siblings ...)
2015-12-31 16:30 ` [patch 08/14] x86/irq: Get rid of code duplication Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2016-01-16 21:18 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 09/14] x86/irq: Remove offline cpus from vector cleanup Thomas Gleixner
` (6 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq--Copy-vectormask-instead-of-an-AND-operation.patch --]
[-- Type: text/plain, Size: 799 bytes --]
In the case that the new vector mask is a subset of the existing mask there is
no point to do a AND operation of currentmask & newmask. The result is
newmask. So we can simply copy the new mask to the current mask and be done
with it. Preparatory patch for further consolidation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -154,7 +154,7 @@ static int __assign_irq_vector(int irq,
vector_cpumask);
d->move_in_progress =
cpumask_intersects(d->old_domain, cpu_online_mask);
- cpumask_and(d->domain, d->domain, vector_cpumask);
+ cpumask_copy(d->domain, vector_cpumask);
goto success;
}
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 08/14] x86/irq: Get rid of code duplication
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
` (5 preceding siblings ...)
2015-12-31 16:30 ` [patch 06/14] x86/irq: Check vector allocation early Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2016-01-16 21:18 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 07/14] x86/irq: Copy vectormask instead of an AND operation Thomas Gleixner
` (7 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq--Get-rid-of-code-duplication.patch --]
[-- Type: text/plain, Size: 2767 bytes --]
Reusing an existing vector and assigning a new vector has duplicated
code. Consolidate it.
This is also a preparatory patch for finally plugging the cleanup race.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 33 +++++++++++++++------------------
1 file changed, 15 insertions(+), 18 deletions(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -116,7 +116,7 @@ static int __assign_irq_vector(int irq,
*/
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
- int cpu;
+ int cpu, vector;
if (d->move_in_progress)
return -EBUSY;
@@ -126,7 +126,7 @@ static int __assign_irq_vector(int irq,
cpumask_clear(searched_cpumask);
cpu = cpumask_first_and(mask, cpu_online_mask);
while (cpu < nr_cpu_ids) {
- int new_cpu, vector, offset;
+ int new_cpu, offset;
/* Get the possible target cpus for @mask/@cpu from the apic */
apic->vector_allocation_domain(cpu, vector_cpumask, mask);
@@ -146,16 +146,12 @@ static int __assign_irq_vector(int irq,
if (cpumask_equal(vector_cpumask, d->domain))
goto success;
/*
- * New cpumask using the vector is a proper subset of
- * the current in use mask. So cleanup the vector
- * allocation for the members that are not used anymore.
+ * Mark the cpus which are not longer in the mask for
+ * cleanup.
*/
- cpumask_andnot(d->old_domain, d->domain,
- vector_cpumask);
- d->move_in_progress =
- cpumask_intersects(d->old_domain, cpu_online_mask);
- cpumask_copy(d->domain, vector_cpumask);
- goto success;
+ cpumask_andnot(d->old_domain, d->domain, vector_cpumask);
+ vector = d->cfg.vector;
+ goto update;
}
vector = current_vector;
@@ -181,16 +177,12 @@ static int __assign_irq_vector(int irq,
/* Found one! */
current_vector = vector;
current_offset = offset;
- if (d->cfg.vector) {
+ /* Schedule the old vector for cleanup on all cpus */
+ if (d->cfg.vector)
cpumask_copy(d->old_domain, d->domain);
- d->move_in_progress =
- cpumask_intersects(d->old_domain, cpu_online_mask);
- }
for_each_cpu(new_cpu, vector_searchmask)
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
- d->cfg.vector = vector;
- cpumask_copy(d->domain, vector_cpumask);
- goto success;
+ goto update;
next_cpu:
/*
@@ -207,6 +199,11 @@ static int __assign_irq_vector(int irq,
}
return -ENOSPC;
+update:
+ /* Cleanup required ? */
+ d->move_in_progress = cpumask_intersects(d->old_domain, cpu_online_mask);
+ d->cfg.vector = vector;
+ cpumask_copy(d->domain, vector_cpumask);
success:
/*
* Cache destination APIC IDs into cfg->dest_apicid. This cannot fail
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 09/14] x86/irq: Remove offline cpus from vector cleanup
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
` (7 preceding siblings ...)
2015-12-31 16:30 ` [patch 07/14] x86/irq: Copy vectormask instead of an AND operation Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2016-01-16 21:18 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 10/14] x86/irq: Clear move_in_progress before sending cleanup IPI Thomas Gleixner
` (5 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq--Remove-offline-cpus-from-vector-cleanup.patch --]
[-- Type: text/plain, Size: 799 bytes --]
No point of keeping offline cpus in the cleanup mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -200,8 +200,12 @@ static int __assign_irq_vector(int irq,
return -ENOSPC;
update:
- /* Cleanup required ? */
- d->move_in_progress = cpumask_intersects(d->old_domain, cpu_online_mask);
+ /*
+ * Exclude offline cpus from the cleanup mask and set the
+ * move_in_progress flag when the result is not empty.
+ */
+ cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
+ d->move_in_progress = !cpumask_empty(d->old_domain);
d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
success:
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 10/14] x86/irq: Clear move_in_progress before sending cleanup IPI
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
` (8 preceding siblings ...)
2015-12-31 16:30 ` [patch 09/14] x86/irq: Remove offline cpus from vector cleanup Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2016-01-16 21:19 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 12/14] x86/irq: Remove outgoing CPU from vector cleanup mask Thomas Gleixner
` (4 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq--Protect-send_cleanup_vector-with-vector_lock.patch --]
[-- Type: text/plain, Size: 1989 bytes --]
send_cleanup_vector() fiddles with the old_domain mask unprotected because it
relies on the protection by the move_in_progress flag. But this is fatal, as
the flag is reset after the IPI has been sent. So a cpu which receives the IPI
can still see the flag set and therefor ignores the cleanup request. If no
other cleanup request happens then the vector stays stale on that cpu and in
case of an irq removal the vector still persists. That can lead to use after
free when the next cleanup IPI happens.
Protect the code with vector_lock and clear move_in_progress before sending
the IPI.
This does not plug the race which Joe reported because:
CPU0 CPU1 CPU2
lock_vector()
data->move_in_progress=0
sendIPI()
unlock_vector()
set_affinity()
assign_irq_vector()
lock_vector() handle_IPI
move_in_progress = 1 lock_vector()
unlock_vector()
move_in_progress == 1
The full fix comes with a later patch.
Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -530,6 +530,8 @@ static void __send_cleanup_vector(struct
{
cpumask_var_t cleanup_mask;
+ raw_spin_lock(&vector_lock);
+ data->move_in_progress = 0;
if (unlikely(!alloc_cpumask_var(&cleanup_mask, GFP_ATOMIC))) {
unsigned int i;
@@ -541,7 +543,7 @@ static void __send_cleanup_vector(struct
apic->send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
free_cpumask_var(cleanup_mask);
}
- data->move_in_progress = 0;
+ raw_spin_unlock(&vector_lock);
}
void send_cleanup_vector(struct irq_cfg *cfg)
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 11/14] x86/irq: Remove the cpumask allocation from send_cleanup_vector()
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
` (10 preceding siblings ...)
2015-12-31 16:30 ` [patch 12/14] x86/irq: Remove outgoing CPU from vector cleanup mask Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2016-01-16 21:19 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 13/14] x86/irq: Call irq_force_move_complete with irq descriptor Thomas Gleixner
` (2 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq--Remove-the-cpumask-allocation-from-send_cleanup_vector.patch --]
[-- Type: text/plain, Size: 1311 bytes --]
There is no need to allocate a new cpumask for sending the cleanup vector. The
old_domain mask is now protected by the vector_lock, so we can safely remove
the offline cpus from it and send the IPI with the resulting mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -528,21 +528,11 @@ static struct irq_chip lapic_controller
#ifdef CONFIG_SMP
static void __send_cleanup_vector(struct apic_chip_data *data)
{
- cpumask_var_t cleanup_mask;
-
raw_spin_lock(&vector_lock);
+ cpumask_and(data->old_domain, data->old_domain, cpu_online_mask);
data->move_in_progress = 0;
- if (unlikely(!alloc_cpumask_var(&cleanup_mask, GFP_ATOMIC))) {
- unsigned int i;
-
- for_each_cpu_and(i, data->old_domain, cpu_online_mask)
- apic->send_IPI_mask(cpumask_of(i),
- IRQ_MOVE_CLEANUP_VECTOR);
- } else {
- cpumask_and(cleanup_mask, data->old_domain, cpu_online_mask);
- apic->send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
- free_cpumask_var(cleanup_mask);
- }
+ if (!cpumask_empty(data->old_domain))
+ apic->send_IPI_mask(data->old_domain, IRQ_MOVE_CLEANUP_VECTOR);
raw_spin_unlock(&vector_lock);
}
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 12/14] x86/irq: Remove outgoing CPU from vector cleanup mask
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
` (9 preceding siblings ...)
2015-12-31 16:30 ` [patch 10/14] x86/irq: Clear move_in_progress before sending cleanup IPI Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2016-01-16 21:19 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 11/14] x86/irq: Remove the cpumask allocation from send_cleanup_vector() Thomas Gleixner
` (3 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq--Remove-outgoing-CPU-from-vector-cleanup-mask.patch --]
[-- Type: text/plain, Size: 1089 bytes --]
We want to synchronize new vector assignments with a pending cleanup. Remove a
dying cpu from a pending cleanup mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -631,9 +631,23 @@ void irq_complete_move(struct irq_cfg *c
void irq_force_complete_move(int irq)
{
struct irq_cfg *cfg = irq_cfg(irq);
+ struct apic_chip_data *data;
- if (cfg)
- __irq_complete_move(cfg, cfg->vector);
+ if (!cfg)
+ return;
+
+ __irq_complete_move(cfg, cfg->vector);
+
+ /*
+ * Remove this cpu from the cleanup mask. The IPI might have been sent
+ * just before the cpu was removed from the offline mask, but has not
+ * been processed because the CPU has interrupts disabled and is on
+ * the way out.
+ */
+ raw_spin_lock(&vector_lock);
+ data = container_of(cfg, struct apic_chip_data, cfg);
+ cpumask_clear_cpu(smp_processor_id(), data->old_domain);
+ raw_spin_unlock(&vector_lock);
}
#endif
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 13/14] x86/irq: Call irq_force_move_complete with irq descriptor
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
` (11 preceding siblings ...)
2015-12-31 16:30 ` [patch 11/14] x86/irq: Remove the cpumask allocation from send_cleanup_vector() Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2016-01-16 21:20 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 14/14] x86/irq: Plug vector cleanup race Thomas Gleixner
2016-01-04 15:35 ` [patch 00/14] x86/irq: Plug various vector cleanup races Joe Lawrence
14 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq--Call-irq_force_move_complete-with-irq-descriptor.patch --]
[-- Type: text/plain, Size: 2469 bytes --]
First of all there is no point in looking up the irq descriptor again, but we
also need the descriptor for the final cleanup race fix in the next
patch. Make that change seperate. No functional difference.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/include/asm/irq.h | 5 +++--
arch/x86/kernel/apic/vector.c | 11 +++++++----
arch/x86/kernel/irq.c | 2 +-
3 files changed, 11 insertions(+), 7 deletions(-)
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -23,11 +23,13 @@ extern void irq_ctx_init(int cpu);
#define __ARCH_HAS_DO_SOFTIRQ
+struct irq_desc;
+
#ifdef CONFIG_HOTPLUG_CPU
#include <linux/cpumask.h>
extern int check_irq_vectors_for_cpu_disable(void);
extern void fixup_irqs(void);
-extern void irq_force_complete_move(int);
+extern void irq_force_complete_move(struct irq_desc *desc);
#endif
#ifdef CONFIG_HAVE_KVM
@@ -37,7 +39,6 @@ extern void kvm_set_posted_intr_wakeup_h
extern void (*x86_platform_ipi_callback)(void);
extern void native_init_IRQ(void);
-struct irq_desc;
extern bool handle_irq(struct irq_desc *desc, struct pt_regs *regs);
extern __visible unsigned int do_IRQ(struct pt_regs *regs);
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -628,10 +628,14 @@ void irq_complete_move(struct irq_cfg *c
__irq_complete_move(cfg, ~get_irq_regs()->orig_ax);
}
-void irq_force_complete_move(int irq)
+/*
+ * Called with @desc->lock held and interrupts disabled.
+ */
+void irq_force_complete_move(struct irq_desc *desc)
{
- struct irq_cfg *cfg = irq_cfg(irq);
- struct apic_chip_data *data;
+ struct irq_data *irqdata = irq_desc_get_irq_data(desc);
+ struct apic_chip_data *data = apic_chip_data(irqdata);
+ struct irq_cfg *cfg = data ? &data->cfg : NULL;
if (!cfg)
return;
@@ -645,7 +649,6 @@ void irq_force_complete_move(int irq)
* the way out.
*/
raw_spin_lock(&vector_lock);
- data = container_of(cfg, struct apic_chip_data, cfg);
cpumask_clear_cpu(smp_processor_id(), data->old_domain);
raw_spin_unlock(&vector_lock);
}
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -462,7 +462,7 @@ void fixup_irqs(void)
* non intr-remapping case, we can't wait till this interrupt
* arrives at this cpu before completing the irq move.
*/
- irq_force_complete_move(irq);
+ irq_force_complete_move(desc);
if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
break_affinity = 1;
^ permalink raw reply [flat|nested] 38+ messages in thread
* [patch 14/14] x86/irq: Plug vector cleanup race
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
` (12 preceding siblings ...)
2015-12-31 16:30 ` [patch 13/14] x86/irq: Call irq_force_move_complete with irq descriptor Thomas Gleixner
@ 2015-12-31 16:30 ` Thomas Gleixner
2016-01-16 21:20 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2016-01-04 15:35 ` [patch 00/14] x86/irq: Plug various vector cleanup races Joe Lawrence
14 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-12-31 16:30 UTC (permalink / raw)
To: LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Joe Lawrence,
Jeremiah Mahler, Borislav Petkov, andy.shevchenko, Guenter Roeck
[-- Attachment #1: x86-irq--Plug-vector-race.patch --]
[-- Type: text/plain, Size: 5140 bytes --]
We still can end up with a stale vector due to the following:
CPU0 CPU1 CPU2
lock_vector()
data->move_in_progress=0
sendIPI()
unlock_vector()
set_affinity()
assign_irq_vector()
lock_vector() handle_IPI
move_in_progress = 1 lock_vector()
unlock_vector()
move_in_progress == 1
So we need to serialize the vector assignment against a pending cleanup. The
solution is rather simple now. We not only check for the move_in_progress flag
in assign_irq_vector(), we also check whether there is still a cleanup pending
in the old_domain cpumask. If so, we return -EBUSY to the caller and let him
deal with it. Though we have to be careful in the cpu unplug case. If the
cleanout has not yet completed then the following setaffinity() call would
return -EBUSY. Add code which prevents this.
Full context is here: http://lkml.kernel.org/r/5653B688.4050809@stratus.com
Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 63 +++++++++++++++++++++++++++++++++++-------
1 file changed, 53 insertions(+), 10 deletions(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -118,7 +118,12 @@ static int __assign_irq_vector(int irq,
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, vector;
- if (d->move_in_progress)
+ /*
+ * If there is still a move in progress or the previous move has not
+ * been cleaned up completely, tell the caller to come back later.
+ */
+ if (d->move_in_progress ||
+ cpumask_intersects(d->old_domain, cpu_online_mask))
return -EBUSY;
/* Only try and allocate irqs on cpus that are present */
@@ -257,7 +262,12 @@ static void clear_irq_vector(int irq, st
data->cfg.vector = 0;
cpumask_clear(data->domain);
- if (likely(!data->move_in_progress))
+ /*
+ * If move is in progress or the old_domain mask is not empty,
+ * i.e. the cleanup IPI has not been processed yet, we need to remove
+ * the old references to desc from all cpus vector tables.
+ */
+ if (!data->move_in_progress && cpumask_empty(data->old_domain))
return;
desc = irq_to_desc(irq);
@@ -577,12 +587,25 @@ asmlinkage __visible void smp_irq_move_c
goto unlock;
/*
- * Check if the irq migration is in progress. If so, we
- * haven't received the cleanup request yet for this irq.
+ * Nothing to cleanup if irq migration is in progress
+ * or this cpu is not set in the cleanup mask.
*/
- if (data->move_in_progress)
+ if (data->move_in_progress ||
+ !cpumask_test_cpu(me, data->old_domain))
goto unlock;
+ /*
+ * We have two cases to handle here:
+ * 1) vector is unchanged but the target mask got reduced
+ * 2) vector and the target mask has changed
+ *
+ * #1 is obvious, but in #2 we have two vectors with the same
+ * irq descriptor: the old and the new vector. So we need to
+ * make sure that we only cleanup the old vector. The new
+ * vector has the current @vector number in the config and
+ * this cpu is part of the target mask. We better leave that
+ * one alone.
+ */
if (vector == data->cfg.vector &&
cpumask_test_cpu(me, data->domain))
goto unlock;
@@ -600,6 +623,7 @@ asmlinkage __visible void smp_irq_move_c
goto unlock;
}
__this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
+ cpumask_clear_cpu(me, data->old_domain);
unlock:
raw_spin_unlock(&desc->lock);
}
@@ -643,13 +667,32 @@ void irq_force_complete_move(struct irq_
__irq_complete_move(cfg, cfg->vector);
/*
- * Remove this cpu from the cleanup mask. The IPI might have been sent
- * just before the cpu was removed from the offline mask, but has not
- * been processed because the CPU has interrupts disabled and is on
- * the way out.
+ * This is tricky. If the cleanup of @data->old_domain has not been
+ * done yet, then the following setaffinity call will fail with
+ * -EBUSY. This can leave the interrupt in a stale state.
+ *
+ * The cleanup cannot make progress because we hold @desc->lock. So in
+ * case @data->old_domain is not yet cleaned up, we need to drop the
+ * lock and acquire it again. @desc cannot go away, because the
+ * hotplug code holds the sparse irq lock.
*/
raw_spin_lock(&vector_lock);
- cpumask_clear_cpu(smp_processor_id(), data->old_domain);
+ /* Clean out all offline cpus (including ourself) first. */
+ cpumask_and(data->old_domain, data->old_domain, cpu_online_mask);
+ while (!cpumask_empty(data->old_domain)) {
+ raw_spin_unlock(&vector_lock);
+ raw_spin_unlock(&desc->lock);
+ cpu_relax();
+ raw_spin_lock(&desc->lock);
+ /*
+ * Reevaluate apic_chip_data. It might have been cleared after
+ * we dropped @desc->lock.
+ */
+ data = apic_chip_data(irqdata);
+ if (!data)
+ return;
+ raw_spin_lock(&vector_lock);
+ }
raw_spin_unlock(&vector_lock);
}
#endif
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [patch 00/14] x86/irq: Plug various vector cleanup races
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
` (13 preceding siblings ...)
2015-12-31 16:30 ` [patch 14/14] x86/irq: Plug vector cleanup race Thomas Gleixner
@ 2016-01-04 15:35 ` Joe Lawrence
2016-01-14 8:24 ` Thomas Gleixner
14 siblings, 1 reply; 38+ messages in thread
From: Joe Lawrence @ 2016-01-04 15:35 UTC (permalink / raw)
To: Thomas Gleixner, LKML
Cc: Ingo Molnar, Peter Anvin, Jiang Liu, Jeremiah Mahler,
Borislav Petkov, andy.shevchenko, Guenter Roeck
On 12/31/2015 11:30 AM, Thomas Gleixner wrote:
> Joe reported a nasty race in the vector cleanup code which results in stale
> irq descriptors in the vector arrays and a potential use after free.
>
> This series addresses this issue, another race which was found and fixed by
> Jiang related to the same area, plus a cpu hotplug issue which is not related
> to this.
>
> This lot is intended for stable, but not yet marked so. The diffstat below is
> rather large, but this is mostly due to extensive commentry which I added in
> the process.
Hi Thomas,
No issues running the same PCI device removal and stress tests against
the patchset.
Thanks,
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [patch 00/14] x86/irq: Plug various vector cleanup races
2016-01-04 15:35 ` [patch 00/14] x86/irq: Plug various vector cleanup races Joe Lawrence
@ 2016-01-14 8:24 ` Thomas Gleixner
2016-01-14 10:33 ` Borislav Petkov
2016-01-16 21:15 ` [tip:x86/urgent] x86/irq: Call chip-> irq_set_affinity in proper context tip-bot for Thomas Gleixner
0 siblings, 2 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-01-14 8:24 UTC (permalink / raw)
To: Joe Lawrence
Cc: LKML, Ingo Molnar, Peter Anvin, Jiang Liu, Jeremiah Mahler,
Borislav Petkov, andy.shevchenko, Guenter Roeck
On Mon, 4 Jan 2016, Joe Lawrence wrote:
> No issues running the same PCI device removal and stress tests against
> the patchset.
Thanks for testing!
Though there is yet another long standing bug in that area. Fix below.
Thanks,
tglx
8<--------------------
Subject: x86/irq: Call chip->irq_set_affinity in proper context
From: Thomas Gleixner <tglx@linutronix.de>
Date: Thu, 14 Jan 2016 08:43:38 +0100
setup_ioapic_dest() calls irqchip->irq_set_affinity() completely
unprotected. That's wrong in several aspects:
- it triggers a lockdep splat because vector lock is taken with interrupts
enabled.
- it opens a race window where irq_set_affinity() can be interrupted and the
irq chip left in unconsistent state.
The proper calling convention is irq descriptor lock held and interrupts
disabled.
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
arch/x86/kernel/apic/io_apic.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2521,6 +2521,7 @@ void __init setup_ioapic_dest(void)
{
int pin, ioapic, irq, irq_entry;
const struct cpumask *mask;
+ struct irq_desc *desc;
struct irq_data *idata;
struct irq_chip *chip;
@@ -2536,7 +2537,9 @@ void __init setup_ioapic_dest(void)
if (irq < 0 || !mp_init_irq_at_boot(ioapic, irq))
continue;
- idata = irq_get_irq_data(irq);
+ desc = irq_to_desc(irq);
+ raw_spin_lock_irq(d&desc->lock);
+ idata = irq_desc_get_irq_data(desc);
/*
* Honour affinities which have been set in early boot
@@ -2550,6 +2553,7 @@ void __init setup_ioapic_dest(void)
/* Might be lapic_chip for irq 0 */
if (chip->irq_set_affinity)
chip->irq_set_affinity(idata, mask, false);
+ raw_spin_unlock_irq(d&desc->lock);
}
}
#endif
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [patch 00/14] x86/irq: Plug various vector cleanup races
2016-01-14 8:24 ` Thomas Gleixner
@ 2016-01-14 10:33 ` Borislav Petkov
2016-01-16 21:37 ` Joe Lawrence
2016-01-16 21:15 ` [tip:x86/urgent] x86/irq: Call chip-> irq_set_affinity in proper context tip-bot for Thomas Gleixner
1 sibling, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2016-01-14 10:33 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Joe Lawrence, LKML, Ingo Molnar, Peter Anvin, Jiang Liu,
Jeremiah Mahler, andy.shevchenko, Guenter Roeck
On Thu, Jan 14, 2016 at 09:24:35AM +0100, Thomas Gleixner wrote:
> On Mon, 4 Jan 2016, Joe Lawrence wrote:
> > No issues running the same PCI device removal and stress tests against
> > the patchset.
>
> Thanks for testing!
>
> Though there is yet another long standing bug in that area. Fix below.
>
> Thanks,
>
> tglx
>
> 8<--------------------
>
> Subject: x86/irq: Call chip->irq_set_affinity in proper context
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Thu, 14 Jan 2016 08:43:38 +0100
>
> setup_ioapic_dest() calls irqchip->irq_set_affinity() completely
> unprotected. That's wrong in several aspects:
>
> - it triggers a lockdep splat because vector lock is taken with interrupts
> enabled.
>
> - it opens a race window where irq_set_affinity() can be interrupted and the
> irq chip left in unconsistent state.
>
> The proper calling convention is irq descriptor lock held and interrupts
> disabled.
>
> Reported-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> ---
> arch/x86/kernel/apic/io_apic.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2521,6 +2521,7 @@ void __init setup_ioapic_dest(void)
> {
> int pin, ioapic, irq, irq_entry;
> const struct cpumask *mask;
> + struct irq_desc *desc;
> struct irq_data *idata;
> struct irq_chip *chip;
>
> @@ -2536,7 +2537,9 @@ void __init setup_ioapic_dest(void)
> if (irq < 0 || !mp_init_irq_at_boot(ioapic, irq))
> continue;
>
> - idata = irq_get_irq_data(irq);
> + desc = irq_to_desc(irq);
> + raw_spin_lock_irq(d&desc->lock);
s/d//
> + idata = irq_desc_get_irq_data(desc);
>
> /*
> * Honour affinities which have been set in early boot
> @@ -2550,6 +2553,7 @@ void __init setup_ioapic_dest(void)
> /* Might be lapic_chip for irq 0 */
> if (chip->irq_set_affinity)
> chip->irq_set_affinity(idata, mask, false);
> + raw_spin_unlock_irq(d&desc->lock);
s/d//
With those micro-changes:
Tested-by: Borislav Petkov <bp@suse.de>
:-)
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
^ permalink raw reply [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Call chip-> irq_set_affinity in proper context
2016-01-14 8:24 ` Thomas Gleixner
2016-01-14 10:33 ` Borislav Petkov
@ 2016-01-16 21:15 ` tip-bot for Thomas Gleixner
1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:15 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, joe.lawrence, mingo, jmmahler, tglx, linux, bp,
linux-kernel, jiang.liu
Commit-ID: e23b257c293ce4bcc8cabb2aa3097b6ed8a8261a
Gitweb: http://git.kernel.org/tip/e23b257c293ce4bcc8cabb2aa3097b6ed8a8261a
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 14 Jan 2016 08:43:38 +0100
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:43:58 +0100
x86/irq: Call chip->irq_set_affinity in proper context
setup_ioapic_dest() calls irqchip->irq_set_affinity() completely
unprotected. That's wrong in several aspects:
- it opens a race window where irq_set_affinity() can be interrupted and the
irq chip left in unconsistent state.
- it triggers a lockdep splat when we fix the vector race for 4.3+ because
vector lock is taken with interrupts enabled.
The proper calling convention is irq descriptor lock held and interrupts
disabled.
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1601140919420.3575@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/io_apic.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index f253218..fdb0fbf 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2521,6 +2521,7 @@ void __init setup_ioapic_dest(void)
{
int pin, ioapic, irq, irq_entry;
const struct cpumask *mask;
+ struct irq_desc *desc;
struct irq_data *idata;
struct irq_chip *chip;
@@ -2536,7 +2537,9 @@ void __init setup_ioapic_dest(void)
if (irq < 0 || !mp_init_irq_at_boot(ioapic, irq))
continue;
- idata = irq_get_irq_data(irq);
+ desc = irq_to_desc(irq);
+ raw_spin_lock_irq(&desc->lock);
+ idata = irq_desc_get_irq_data(desc);
/*
* Honour affinities which have been set in early boot
@@ -2550,6 +2553,7 @@ void __init setup_ioapic_dest(void)
/* Might be lapic_chip for irq 0 */
if (chip->irq_set_affinity)
chip->irq_set_affinity(idata, mask, false);
+ raw_spin_unlock_irq(&desc->lock);
}
}
#endif
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Validate that irq descriptor is still active
2015-12-31 16:30 ` [patch 02/14] x86/irq: Validate that irq descriptor is still active Thomas Gleixner
@ 2016-01-16 21:16 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:16 UTC (permalink / raw)
To: linux-tip-commits
Cc: jmmahler, joe.lawrence, tglx, jiang.liu, linux-kernel, linux, bp,
mingo, hpa
Commit-ID: 36f34c8c63da3e272fd66f91089228c22d2b6e8b
Gitweb: http://git.kernel.org/tip/36f34c8c63da3e272fd66f91089228c22d2b6e8b
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 31 Dec 2015 16:30:45 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:43:59 +0100
x86/irq: Validate that irq descriptor is still active
In fixup_irqs() we unconditionally dereference the irq chip of an irq
descriptor. The descriptor might still be valid, but already cleaned up,
i.e. the chip removed. Add a check for this condition.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.236423282@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/irq.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index f8062aa..c0b58dd 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -470,6 +470,15 @@ void fixup_irqs(void)
}
chip = irq_data_get_irq_chip(data);
+ /*
+ * The interrupt descriptor might have been cleaned up
+ * already, but it is not yet removed from the radix tree
+ */
+ if (!chip) {
+ raw_spin_unlock(&desc->lock);
+ continue;
+ }
+
if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
chip->irq_mask(data);
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Reorganize the return path in assign_irq_vector
2015-12-31 16:30 ` [patch 04/14] x86/irq: Reorganize the return path in assign_irq_vector Thomas Gleixner
@ 2016-01-16 21:17 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: mingo, linux-kernel, joe.lawrence, tglx, jmmahler, bp, jiang.liu,
hpa, linux
Commit-ID: 433cbd57d190a1cdd02f243df41c3d7f55ec4b94
Gitweb: http://git.kernel.org/tip/433cbd57d190a1cdd02f243df41c3d7f55ec4b94
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 31 Dec 2015 16:30:46 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:43:59 +0100
x86/irq: Reorganize the return path in assign_irq_vector
Use an explicit goto for the cases where we have success in the search/update
and return -ENOSPC if the search loop ends due to no space.
Preparatory patch for fixes. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.403491024@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 22 ++++++++--------------
1 file changed, 8 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 19082cf..613b1cd 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -118,13 +118,12 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
*/
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
- int cpu, err;
+ int cpu;
if (d->move_in_progress)
return -EBUSY;
/* Only try and allocate irqs on cpus that are present */
- err = -ENOSPC;
cpumask_clear(d->old_domain);
cpumask_clear(searched_cpumask);
cpu = cpumask_first_and(mask, cpu_online_mask);
@@ -134,9 +133,8 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
apic->vector_allocation_domain(cpu, vector_cpumask, mask);
if (cpumask_subset(vector_cpumask, d->domain)) {
- err = 0;
if (cpumask_equal(vector_cpumask, d->domain))
- break;
+ goto success;
/*
* New cpumask using the vector is a proper subset of
* the current in use mask. So cleanup the vector
@@ -147,7 +145,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
d->move_in_progress =
cpumask_intersects(d->old_domain, cpu_online_mask);
cpumask_and(d->domain, d->domain, vector_cpumask);
- break;
+ goto success;
}
vector = current_vector;
@@ -187,17 +185,13 @@ next:
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
- err = 0;
- break;
+ goto success;
}
+ return -ENOSPC;
- if (!err) {
- /* cache destination APIC IDs into cfg->dest_apicid */
- err = apic->cpu_mask_to_apicid_and(mask, d->domain,
- &d->cfg.dest_apicid);
- }
-
- return err;
+success:
+ /* cache destination APIC IDs into cfg->dest_apicid */
+ return apic->cpu_mask_to_apicid_and(mask, d->domain, &d->cfg.dest_apicid);
}
static int assign_irq_vector(int irq, struct apic_chip_data *data,
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Reorganize the search in assign_irq_vector
2015-12-31 16:30 ` [patch 05/14] x86/irq: Reorganize the search in assign_irq_vector Thomas Gleixner
@ 2016-01-16 21:17 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: bp, jmmahler, linux, jiang.liu, linux-kernel, tglx, hpa,
joe.lawrence, mingo
Commit-ID: 95ffeb4b5baca266e1d0d2bc90f1513e6f419cdd
Gitweb: http://git.kernel.org/tip/95ffeb4b5baca266e1d0d2bc90f1513e6f419cdd
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 31 Dec 2015 16:30:47 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:44:00 +0100
x86/irq: Reorganize the search in assign_irq_vector
Split out the code which advances the target cpu for the search so we can
reuse it for the next patch which adds an early validation check for the
vectormask which we get from the apic.
Add comments while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.484562040@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 613b1cd..cef3195 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -157,14 +157,9 @@ next:
vector = FIRST_EXTERNAL_VECTOR + offset;
}
- if (unlikely(current_vector == vector)) {
- cpumask_or(searched_cpumask, searched_cpumask,
- vector_cpumask);
- cpumask_andnot(vector_cpumask, mask, searched_cpumask);
- cpu = cpumask_first_and(vector_cpumask,
- cpu_online_mask);
- continue;
- }
+ /* If the search wrapped around, try the next cpu */
+ if (unlikely(current_vector == vector))
+ goto next_cpu;
if (test_bit(vector, used_vectors))
goto next;
@@ -186,6 +181,19 @@ next:
d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
goto success;
+
+next_cpu:
+ /*
+ * We exclude the current @vector_cpumask from the requested
+ * @mask and try again with the next online cpu in the
+ * result. We cannot modify @mask, so we use @vector_cpumask
+ * as a temporary buffer here as it will be reassigned when
+ * calling apic->vector_allocation_domain() above.
+ */
+ cpumask_or(searched_cpumask, searched_cpumask, vector_cpumask);
+ cpumask_andnot(vector_cpumask, mask, searched_cpumask);
+ cpu = cpumask_first_and(vector_cpumask, cpu_online_mask);
+ continue;
}
return -ENOSPC;
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Check vector allocation early
2015-12-31 16:30 ` [patch 06/14] x86/irq: Check vector allocation early Thomas Gleixner
@ 2016-01-16 21:17 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: jiang.liu, tglx, bp, mingo, hpa, linux-kernel, joe.lawrence,
linux, jmmahler
Commit-ID: 3716fd27a604d61a91cda47083504971486b80f1
Gitweb: http://git.kernel.org/tip/3716fd27a604d61a91cda47083504971486b80f1
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 31 Dec 2015 16:30:48 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:44:00 +0100
x86/irq: Check vector allocation early
__assign_irq_vector() uses the vector_cpumask which is assigned by
apic->vector_allocation_domain() without doing basic sanity checks. That can
result in a situation where the final assignement of a newly found vector
fails in apic->cpu_mask_to_apicid_and(). So we have to do rollbacks for no
reason.
apic->cpu_mask_to_apicid_and() only fails if
vector_cpumask & requested_cpumask & cpu_online_mask
is empty.
Check for this condition right away and if the result is empty try immediately
the next possible cpu in the requested mask. So in case of a failure the old
setting is unchanged and we can remove the rollback code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.561877324@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 38 +++++++++++++++++++++++++-------------
1 file changed, 25 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index cef3195..940e18d 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -31,7 +31,7 @@ struct apic_chip_data {
struct irq_domain *x86_vector_domain;
EXPORT_SYMBOL_GPL(x86_vector_domain);
static DEFINE_RAW_SPINLOCK(vector_lock);
-static cpumask_var_t vector_cpumask, searched_cpumask;
+static cpumask_var_t vector_cpumask, vector_searchmask, searched_cpumask;
static struct irq_chip lapic_controller;
#ifdef CONFIG_X86_IO_APIC
static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
@@ -130,8 +130,20 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
while (cpu < nr_cpu_ids) {
int new_cpu, vector, offset;
+ /* Get the possible target cpus for @mask/@cpu from the apic */
apic->vector_allocation_domain(cpu, vector_cpumask, mask);
+ /*
+ * Clear the offline cpus from @vector_cpumask for searching
+ * and verify whether the result overlaps with @mask. If true,
+ * then the call to apic->cpu_mask_to_apicid_and() will
+ * succeed as well. If not, no point in trying to find a
+ * vector in this mask.
+ */
+ cpumask_and(vector_searchmask, vector_cpumask, cpu_online_mask);
+ if (!cpumask_intersects(vector_searchmask, mask))
+ goto next_cpu;
+
if (cpumask_subset(vector_cpumask, d->domain)) {
if (cpumask_equal(vector_cpumask, d->domain))
goto success;
@@ -164,7 +176,7 @@ next:
if (test_bit(vector, used_vectors))
goto next;
- for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask) {
+ for_each_cpu(new_cpu, vector_searchmask) {
if (!IS_ERR_OR_NULL(per_cpu(vector_irq, new_cpu)[vector]))
goto next;
}
@@ -176,7 +188,7 @@ next:
d->move_in_progress =
cpumask_intersects(d->old_domain, cpu_online_mask);
}
- for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
+ for_each_cpu(new_cpu, vector_searchmask)
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
@@ -198,8 +210,14 @@ next_cpu:
return -ENOSPC;
success:
- /* cache destination APIC IDs into cfg->dest_apicid */
- return apic->cpu_mask_to_apicid_and(mask, d->domain, &d->cfg.dest_apicid);
+ /*
+ * Cache destination APIC IDs into cfg->dest_apicid. This cannot fail
+ * as we already established, that mask & d->domain & cpu_online_mask
+ * is not empty.
+ */
+ BUG_ON(apic->cpu_mask_to_apicid_and(mask, d->domain,
+ &d->cfg.dest_apicid));
+ return 0;
}
static int assign_irq_vector(int irq, struct apic_chip_data *data,
@@ -409,6 +427,7 @@ int __init arch_early_irq_init(void)
arch_init_htirq_domain(x86_vector_domain);
BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
+ BUG_ON(!alloc_cpumask_var(&vector_searchmask, GFP_KERNEL));
BUG_ON(!alloc_cpumask_var(&searched_cpumask, GFP_KERNEL));
return arch_early_ioapic_init();
@@ -498,14 +517,7 @@ static int apic_set_affinity(struct irq_data *irq_data,
return -EINVAL;
err = assign_irq_vector(irq, data, dest);
- if (err) {
- if (assign_irq_vector(irq, data,
- irq_data_get_affinity_mask(irq_data)))
- pr_err("Failed to recover vector for irq %d\n", irq);
- return err;
- }
-
- return IRQ_SET_MASK_OK;
+ return err ? err : IRQ_SET_MASK_OK;
}
static struct irq_chip lapic_controller = {
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Copy vectormask instead of an AND operation
2015-12-31 16:30 ` [patch 07/14] x86/irq: Copy vectormask instead of an AND operation Thomas Gleixner
@ 2016-01-16 21:18 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux, tglx, jiang.liu, linux-kernel, joe.lawrence, bp, mingo,
jmmahler, hpa
Commit-ID: 9ac15b7a8af4cf3337a101498c0ed690d23ade75
Gitweb: http://git.kernel.org/tip/9ac15b7a8af4cf3337a101498c0ed690d23ade75
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 31 Dec 2015 16:30:49 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:44:00 +0100
x86/irq: Copy vectormask instead of an AND operation
In the case that the new vector mask is a subset of the existing mask there is
no point to do a AND operation of currentmask & newmask. The result is
newmask. So we can simply copy the new mask to the current mask and be done
with it. Preparatory patch for further consolidation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.640253454@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 940e18d..1bd29c6 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -156,7 +156,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
vector_cpumask);
d->move_in_progress =
cpumask_intersects(d->old_domain, cpu_online_mask);
- cpumask_and(d->domain, d->domain, vector_cpumask);
+ cpumask_copy(d->domain, vector_cpumask);
goto success;
}
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Get rid of code duplication
2015-12-31 16:30 ` [patch 08/14] x86/irq: Get rid of code duplication Thomas Gleixner
@ 2016-01-16 21:18 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, joe.lawrence, bp, hpa, linux, mingo, jiang.liu,
tglx, jmmahler
Commit-ID: ab25ac02148b600e645f77cfb8b8ea415ed75bb4
Gitweb: http://git.kernel.org/tip/ab25ac02148b600e645f77cfb8b8ea415ed75bb4
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 31 Dec 2015 16:30:49 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:44:00 +0100
x86/irq: Get rid of code duplication
Reusing an existing vector and assigning a new vector has duplicated
code. Consolidate it.
This is also a preparatory patch for finally plugging the cleanup race.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.721599216@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 33 +++++++++++++++------------------
1 file changed, 15 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 1bd29c6..fccfa3f 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -118,7 +118,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
*/
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
- int cpu;
+ int cpu, vector;
if (d->move_in_progress)
return -EBUSY;
@@ -128,7 +128,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
cpumask_clear(searched_cpumask);
cpu = cpumask_first_and(mask, cpu_online_mask);
while (cpu < nr_cpu_ids) {
- int new_cpu, vector, offset;
+ int new_cpu, offset;
/* Get the possible target cpus for @mask/@cpu from the apic */
apic->vector_allocation_domain(cpu, vector_cpumask, mask);
@@ -148,16 +148,12 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
if (cpumask_equal(vector_cpumask, d->domain))
goto success;
/*
- * New cpumask using the vector is a proper subset of
- * the current in use mask. So cleanup the vector
- * allocation for the members that are not used anymore.
+ * Mark the cpus which are not longer in the mask for
+ * cleanup.
*/
- cpumask_andnot(d->old_domain, d->domain,
- vector_cpumask);
- d->move_in_progress =
- cpumask_intersects(d->old_domain, cpu_online_mask);
- cpumask_copy(d->domain, vector_cpumask);
- goto success;
+ cpumask_andnot(d->old_domain, d->domain, vector_cpumask);
+ vector = d->cfg.vector;
+ goto update;
}
vector = current_vector;
@@ -183,16 +179,12 @@ next:
/* Found one! */
current_vector = vector;
current_offset = offset;
- if (d->cfg.vector) {
+ /* Schedule the old vector for cleanup on all cpus */
+ if (d->cfg.vector)
cpumask_copy(d->old_domain, d->domain);
- d->move_in_progress =
- cpumask_intersects(d->old_domain, cpu_online_mask);
- }
for_each_cpu(new_cpu, vector_searchmask)
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
- d->cfg.vector = vector;
- cpumask_copy(d->domain, vector_cpumask);
- goto success;
+ goto update;
next_cpu:
/*
@@ -209,6 +201,11 @@ next_cpu:
}
return -ENOSPC;
+update:
+ /* Cleanup required ? */
+ d->move_in_progress = cpumask_intersects(d->old_domain, cpu_online_mask);
+ d->cfg.vector = vector;
+ cpumask_copy(d->domain, vector_cpumask);
success:
/*
* Cache destination APIC IDs into cfg->dest_apicid. This cannot fail
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Remove offline cpus from vector cleanup
2015-12-31 16:30 ` [patch 09/14] x86/irq: Remove offline cpus from vector cleanup Thomas Gleixner
@ 2016-01-16 21:18 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, linux, bp, jiang.liu, mingo, joe.lawrence, hpa,
tglx, jmmahler
Commit-ID: 847667ef10356b824a11c853fc8a8b1b437b6a8d
Gitweb: http://git.kernel.org/tip/847667ef10356b824a11c853fc8a8b1b437b6a8d
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 31 Dec 2015 16:30:50 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:44:01 +0100
x86/irq: Remove offline cpus from vector cleanup
No point of keeping offline cpus in the cleanup mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.808642683@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index fccfa3f..68d18b3 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -202,8 +202,12 @@ next_cpu:
return -ENOSPC;
update:
- /* Cleanup required ? */
- d->move_in_progress = cpumask_intersects(d->old_domain, cpu_online_mask);
+ /*
+ * Exclude offline cpus from the cleanup mask and set the
+ * move_in_progress flag when the result is not empty.
+ */
+ cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
+ d->move_in_progress = !cpumask_empty(d->old_domain);
d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
success:
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Clear move_in_progress before sending cleanup IPI
2015-12-31 16:30 ` [patch 10/14] x86/irq: Clear move_in_progress before sending cleanup IPI Thomas Gleixner
@ 2016-01-16 21:19 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: bp, joe.lawrence, mingo, hpa, jmmahler, jiang.liu, tglx, linux,
linux-kernel
Commit-ID: c1684f5035b60e9f98566493e869496fb5de1d89
Gitweb: http://git.kernel.org/tip/c1684f5035b60e9f98566493e869496fb5de1d89
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 31 Dec 2015 16:30:51 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:44:01 +0100
x86/irq: Clear move_in_progress before sending cleanup IPI
send_cleanup_vector() fiddles with the old_domain mask unprotected because it
relies on the protection by the move_in_progress flag. But this is fatal, as
the flag is reset after the IPI has been sent. So a cpu which receives the IPI
can still see the flag set and therefor ignores the cleanup request. If no
other cleanup request happens then the vector stays stale on that cpu and in
case of an irq removal the vector still persists. That can lead to use after
free when the next cleanup IPI happens.
Protect the code with vector_lock and clear move_in_progress before sending
the IPI.
This does not plug the race which Joe reported because:
CPU0 CPU1 CPU2
lock_vector()
data->move_in_progress=0
sendIPI()
unlock_vector()
set_affinity()
assign_irq_vector()
lock_vector() handle_IPI
move_in_progress = 1 lock_vector()
unlock_vector()
move_in_progress == 1
The full fix comes with a later patch.
Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.892412198@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 68d18b3..ed62f9c 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -532,6 +532,8 @@ static void __send_cleanup_vector(struct apic_chip_data *data)
{
cpumask_var_t cleanup_mask;
+ raw_spin_lock(&vector_lock);
+ data->move_in_progress = 0;
if (unlikely(!alloc_cpumask_var(&cleanup_mask, GFP_ATOMIC))) {
unsigned int i;
@@ -543,7 +545,7 @@ static void __send_cleanup_vector(struct apic_chip_data *data)
apic->send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
free_cpumask_var(cleanup_mask);
}
- data->move_in_progress = 0;
+ raw_spin_unlock(&vector_lock);
}
void send_cleanup_vector(struct irq_cfg *cfg)
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Remove the cpumask allocation from send_cleanup_vector()
2015-12-31 16:30 ` [patch 11/14] x86/irq: Remove the cpumask allocation from send_cleanup_vector() Thomas Gleixner
@ 2016-01-16 21:19 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: jiang.liu, jmmahler, mingo, tglx, linux, hpa, linux-kernel,
joe.lawrence, bp
Commit-ID: 5da0c1217f05d2ccc9a8ed6e6e5c23a8a1d24dd6
Gitweb: http://git.kernel.org/tip/5da0c1217f05d2ccc9a8ed6e6e5c23a8a1d24dd6
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 31 Dec 2015 16:30:52 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:44:01 +0100
x86/irq: Remove the cpumask allocation from send_cleanup_vector()
There is no need to allocate a new cpumask for sending the cleanup vector. The
old_domain mask is now protected by the vector_lock, so we can safely remove
the offline cpus from it and send the IPI with the resulting mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160106.967993932@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index ed62f9c..91dc274 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -530,21 +530,11 @@ static struct irq_chip lapic_controller = {
#ifdef CONFIG_SMP
static void __send_cleanup_vector(struct apic_chip_data *data)
{
- cpumask_var_t cleanup_mask;
-
raw_spin_lock(&vector_lock);
+ cpumask_and(data->old_domain, data->old_domain, cpu_online_mask);
data->move_in_progress = 0;
- if (unlikely(!alloc_cpumask_var(&cleanup_mask, GFP_ATOMIC))) {
- unsigned int i;
-
- for_each_cpu_and(i, data->old_domain, cpu_online_mask)
- apic->send_IPI_mask(cpumask_of(i),
- IRQ_MOVE_CLEANUP_VECTOR);
- } else {
- cpumask_and(cleanup_mask, data->old_domain, cpu_online_mask);
- apic->send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
- free_cpumask_var(cleanup_mask);
- }
+ if (!cpumask_empty(data->old_domain))
+ apic->send_IPI_mask(data->old_domain, IRQ_MOVE_CLEANUP_VECTOR);
raw_spin_unlock(&vector_lock);
}
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Remove outgoing CPU from vector cleanup mask
2015-12-31 16:30 ` [patch 12/14] x86/irq: Remove outgoing CPU from vector cleanup mask Thomas Gleixner
@ 2016-01-16 21:19 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: jiang.liu, linux-kernel, hpa, tglx, linux, bp, jmmahler,
joe.lawrence, mingo
Commit-ID: 56d7d2f4bbd00fb198b7907cb3ab657d06115a42
Gitweb: http://git.kernel.org/tip/56d7d2f4bbd00fb198b7907cb3ab657d06115a42
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 31 Dec 2015 16:30:52 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:44:01 +0100
x86/irq: Remove outgoing CPU from vector cleanup mask
We want to synchronize new vector assignments with a pending cleanup. Remove a
dying cpu from a pending cleanup mask.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160107.045961667@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 91dc274..a7fa11e 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -633,9 +633,23 @@ void irq_complete_move(struct irq_cfg *cfg)
void irq_force_complete_move(int irq)
{
struct irq_cfg *cfg = irq_cfg(irq);
+ struct apic_chip_data *data;
+
+ if (!cfg)
+ return;
- if (cfg)
- __irq_complete_move(cfg, cfg->vector);
+ __irq_complete_move(cfg, cfg->vector);
+
+ /*
+ * Remove this cpu from the cleanup mask. The IPI might have been sent
+ * just before the cpu was removed from the offline mask, but has not
+ * been processed because the CPU has interrupts disabled and is on
+ * the way out.
+ */
+ raw_spin_lock(&vector_lock);
+ data = container_of(cfg, struct apic_chip_data, cfg);
+ cpumask_clear_cpu(smp_processor_id(), data->old_domain);
+ raw_spin_unlock(&vector_lock);
}
#endif
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Call irq_force_move_complete with irq descriptor
2015-12-31 16:30 ` [patch 13/14] x86/irq: Call irq_force_move_complete with irq descriptor Thomas Gleixner
@ 2016-01-16 21:20 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:20 UTC (permalink / raw)
To: linux-tip-commits
Cc: tglx, mingo, jiang.liu, joe.lawrence, bp, linux-kernel, linux,
jmmahler, hpa
Commit-ID: 90a2282e23f0522e4b3f797ad447c5e91bf7fe32
Gitweb: http://git.kernel.org/tip/90a2282e23f0522e4b3f797ad447c5e91bf7fe32
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 31 Dec 2015 16:30:53 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:44:01 +0100
x86/irq: Call irq_force_move_complete with irq descriptor
First of all there is no point in looking up the irq descriptor again, but we
also need the descriptor for the final cleanup race fix in the next
patch. Make that change seperate. No functional difference.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160107.125211743@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/include/asm/irq.h | 5 +++--
arch/x86/kernel/apic/vector.c | 11 +++++++----
arch/x86/kernel/irq.c | 2 +-
3 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index 881b476..e7de5c9 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -23,11 +23,13 @@ extern void irq_ctx_init(int cpu);
#define __ARCH_HAS_DO_SOFTIRQ
+struct irq_desc;
+
#ifdef CONFIG_HOTPLUG_CPU
#include <linux/cpumask.h>
extern int check_irq_vectors_for_cpu_disable(void);
extern void fixup_irqs(void);
-extern void irq_force_complete_move(int);
+extern void irq_force_complete_move(struct irq_desc *desc);
#endif
#ifdef CONFIG_HAVE_KVM
@@ -37,7 +39,6 @@ extern void kvm_set_posted_intr_wakeup_handler(void (*handler)(void));
extern void (*x86_platform_ipi_callback)(void);
extern void native_init_IRQ(void);
-struct irq_desc;
extern bool handle_irq(struct irq_desc *desc, struct pt_regs *regs);
extern __visible unsigned int do_IRQ(struct pt_regs *regs);
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index a7fa11e..5f78835 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -630,10 +630,14 @@ void irq_complete_move(struct irq_cfg *cfg)
__irq_complete_move(cfg, ~get_irq_regs()->orig_ax);
}
-void irq_force_complete_move(int irq)
+/*
+ * Called with @desc->lock held and interrupts disabled.
+ */
+void irq_force_complete_move(struct irq_desc *desc)
{
- struct irq_cfg *cfg = irq_cfg(irq);
- struct apic_chip_data *data;
+ struct irq_data *irqdata = irq_desc_get_irq_data(desc);
+ struct apic_chip_data *data = apic_chip_data(irqdata);
+ struct irq_cfg *cfg = data ? &data->cfg : NULL;
if (!cfg)
return;
@@ -647,7 +651,6 @@ void irq_force_complete_move(int irq)
* the way out.
*/
raw_spin_lock(&vector_lock);
- data = container_of(cfg, struct apic_chip_data, cfg);
cpumask_clear_cpu(smp_processor_id(), data->old_domain);
raw_spin_unlock(&vector_lock);
}
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index c0b58dd..61521dc 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -462,7 +462,7 @@ void fixup_irqs(void)
* non intr-remapping case, we can't wait till this interrupt
* arrives at this cpu before completing the irq move.
*/
- irq_force_complete_move(irq);
+ irq_force_complete_move(desc);
if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
break_affinity = 1;
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/urgent] x86/irq: Plug vector cleanup race
2015-12-31 16:30 ` [patch 14/14] x86/irq: Plug vector cleanup race Thomas Gleixner
@ 2016-01-16 21:20 ` tip-bot for Thomas Gleixner
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-01-16 21:20 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux, mingo, jmmahler, joe.lawrence, bp, linux-kernel, hpa,
jiang.liu, tglx
Commit-ID: 98229aa36caa9c769b13565523de9b813013c703
Gitweb: http://git.kernel.org/tip/98229aa36caa9c769b13565523de9b813013c703
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 31 Dec 2015 16:30:54 +0000
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 15 Jan 2016 13:44:02 +0100
x86/irq: Plug vector cleanup race
We still can end up with a stale vector due to the following:
CPU0 CPU1 CPU2
lock_vector()
data->move_in_progress=0
sendIPI()
unlock_vector()
set_affinity()
assign_irq_vector()
lock_vector() handle_IPI
move_in_progress = 1 lock_vector()
unlock_vector()
move_in_progress == 1
So we need to serialize the vector assignment against a pending cleanup. The
solution is rather simple now. We not only check for the move_in_progress flag
in assign_irq_vector(), we also check whether there is still a cleanup pending
in the old_domain cpumask. If so, we return -EBUSY to the caller and let him
deal with it. Though we have to be careful in the cpu unplug case. If the
cleanout has not yet completed then the following setaffinity() call would
return -EBUSY. Add code which prevents this.
Full context is here: http://lkml.kernel.org/r/5653B688.4050809@stratus.com
Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160107.207265407@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 63 ++++++++++++++++++++++++++++++++++++-------
1 file changed, 53 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 5f78835..3b670df 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -120,7 +120,12 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, vector;
- if (d->move_in_progress)
+ /*
+ * If there is still a move in progress or the previous move has not
+ * been cleaned up completely, tell the caller to come back later.
+ */
+ if (d->move_in_progress ||
+ cpumask_intersects(d->old_domain, cpu_online_mask))
return -EBUSY;
/* Only try and allocate irqs on cpus that are present */
@@ -259,7 +264,12 @@ static void clear_irq_vector(int irq, struct apic_chip_data *data)
data->cfg.vector = 0;
cpumask_clear(data->domain);
- if (likely(!data->move_in_progress))
+ /*
+ * If move is in progress or the old_domain mask is not empty,
+ * i.e. the cleanup IPI has not been processed yet, we need to remove
+ * the old references to desc from all cpus vector tables.
+ */
+ if (!data->move_in_progress && cpumask_empty(data->old_domain))
return;
desc = irq_to_desc(irq);
@@ -579,12 +589,25 @@ asmlinkage __visible void smp_irq_move_cleanup_interrupt(void)
goto unlock;
/*
- * Check if the irq migration is in progress. If so, we
- * haven't received the cleanup request yet for this irq.
+ * Nothing to cleanup if irq migration is in progress
+ * or this cpu is not set in the cleanup mask.
*/
- if (data->move_in_progress)
+ if (data->move_in_progress ||
+ !cpumask_test_cpu(me, data->old_domain))
goto unlock;
+ /*
+ * We have two cases to handle here:
+ * 1) vector is unchanged but the target mask got reduced
+ * 2) vector and the target mask has changed
+ *
+ * #1 is obvious, but in #2 we have two vectors with the same
+ * irq descriptor: the old and the new vector. So we need to
+ * make sure that we only cleanup the old vector. The new
+ * vector has the current @vector number in the config and
+ * this cpu is part of the target mask. We better leave that
+ * one alone.
+ */
if (vector == data->cfg.vector &&
cpumask_test_cpu(me, data->domain))
goto unlock;
@@ -602,6 +625,7 @@ asmlinkage __visible void smp_irq_move_cleanup_interrupt(void)
goto unlock;
}
__this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
+ cpumask_clear_cpu(me, data->old_domain);
unlock:
raw_spin_unlock(&desc->lock);
}
@@ -645,13 +669,32 @@ void irq_force_complete_move(struct irq_desc *desc)
__irq_complete_move(cfg, cfg->vector);
/*
- * Remove this cpu from the cleanup mask. The IPI might have been sent
- * just before the cpu was removed from the offline mask, but has not
- * been processed because the CPU has interrupts disabled and is on
- * the way out.
+ * This is tricky. If the cleanup of @data->old_domain has not been
+ * done yet, then the following setaffinity call will fail with
+ * -EBUSY. This can leave the interrupt in a stale state.
+ *
+ * The cleanup cannot make progress because we hold @desc->lock. So in
+ * case @data->old_domain is not yet cleaned up, we need to drop the
+ * lock and acquire it again. @desc cannot go away, because the
+ * hotplug code holds the sparse irq lock.
*/
raw_spin_lock(&vector_lock);
- cpumask_clear_cpu(smp_processor_id(), data->old_domain);
+ /* Clean out all offline cpus (including ourself) first. */
+ cpumask_and(data->old_domain, data->old_domain, cpu_online_mask);
+ while (!cpumask_empty(data->old_domain)) {
+ raw_spin_unlock(&vector_lock);
+ raw_spin_unlock(&desc->lock);
+ cpu_relax();
+ raw_spin_lock(&desc->lock);
+ /*
+ * Reevaluate apic_chip_data. It might have been cleared after
+ * we dropped @desc->lock.
+ */
+ data = apic_chip_data(irqdata);
+ if (!data)
+ return;
+ raw_spin_lock(&vector_lock);
+ }
raw_spin_unlock(&vector_lock);
}
#endif
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [patch 00/14] x86/irq: Plug various vector cleanup races
2016-01-14 10:33 ` Borislav Petkov
@ 2016-01-16 21:37 ` Joe Lawrence
2016-01-18 15:00 ` Joe Lawrence
0 siblings, 1 reply; 38+ messages in thread
From: Joe Lawrence @ 2016-01-16 21:37 UTC (permalink / raw)
To: Borislav Petkov, Thomas Gleixner
Cc: LKML, Ingo Molnar, Peter Anvin, Jiang Liu, Jeremiah Mahler,
andy.shevchenko, Guenter Roeck
On 01/14/2016 05:33 AM, Borislav Petkov wrote:
> On Thu, Jan 14, 2016 at 09:24:35AM +0100, Thomas Gleixner wrote:
>> On Mon, 4 Jan 2016, Joe Lawrence wrote:
>>> No issues running the same PCI device removal and stress tests against
>>> the patchset.
>>
>> Thanks for testing!
>>
>> Though there is yet another long standing bug in that area. Fix below.
>>
>> Thanks,
>>
>> tglx
>>
>> 8<--------------------
>>
[ ... snip ... ]
>
> s/d//
>
> With those micro-changes:
>
> Tested-by: Borislav Petkov <bp@suse.de>
>
> :-)
Tests still running ok here (with same micro-change as Borislav).
-- Joe
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [patch 00/14] x86/irq: Plug various vector cleanup races
2016-01-16 21:37 ` Joe Lawrence
@ 2016-01-18 15:00 ` Joe Lawrence
2016-01-18 15:43 ` Borislav Petkov
2016-01-20 3:57 ` Joe Lawrence
0 siblings, 2 replies; 38+ messages in thread
From: Joe Lawrence @ 2016-01-18 15:00 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Borislav Petkov, LKML, Ingo Molnar, Peter Anvin, Jiang Liu,
Jeremiah Mahler, andy.shevchenko, Guenter Roeck
On 01/16/2016 04:37 PM, Joe Lawrence wrote:
> On 01/14/2016 05:33 AM, Borislav Petkov wrote:
>> On Thu, Jan 14, 2016 at 09:24:35AM +0100, Thomas Gleixner wrote:
>>> On Mon, 4 Jan 2016, Joe Lawrence wrote:
>>>> No issues running the same PCI device removal and stress tests against
>>>> the patchset.
>>>
>>> Thanks for testing!
>>>
>>> Though there is yet another long standing bug in that area. Fix below.
>>>
>>> Thanks,
>>>
>>> tglx
>>>
>>> 8<--------------------
>>>
> [ ... snip ... ]
>>
>> s/d//
>>
>> With those micro-changes:
>>
>> Tested-by: Borislav Petkov <bp@suse.de>
>>
>> :-)
>
> Tests still running ok here (with same micro-change as Borislav).
Hi Thomas,
When logging in this morning and looking at the box running the 14
patches + additional patch, I see it hit a hung task timeout in xhci USB
code about 39 hours in. Stack trace below (looks to be waiting on a
completion that never comes).
I didn't see this when running only the *initial* 14 patches. Of
course, before these irq cleanup fixes my tests never ran this long :)
So it may or may not be related to the patchset, I'm still poking around
the generated vmcore. Let me know if there is anything you might be
interested in looking at from the wreckage.
-- Joe
INFO: task kworker/0:1:1506 blocked for more than 120 seconds.
Tainted: P OE 4.3.0sra12+ #50
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/0:1 D 0000000000000000 0 1506 2 0x00000080
Workqueue: usb_hub_wq hub_event
ffff8801e46dba58 0000000000000046 ffff8810375dac00 ffff881038430000
ffff8801e46dc000 ffff88025ac20440 ffff88025ac20438 ffff881038430000
0000000000000000 ffff8801e46dba70 ffffffff81659893 7fffffffffffffff
Call Trace:
[<ffffffff81659893>] schedule+0x33/0x80
[<ffffffff8165c530>] schedule_timeout+0x200/0x2a0
[<ffffffff810e2761>] ? internal_add_timer+0x71/0xb0
[<ffffffff810e4994>] ? mod_timer+0x114/0x210
[<ffffffff8165a371>] wait_for_completion+0xf1/0x130
[<ffffffff810a70d0>] ? wake_up_q+0x70/0x70
[<ffffffff814b14a1>] xhci_discover_or_reset_device+0x1e1/0x540
[<ffffffff814723b8>] hub_port_reset+0x3c8/0x590
[<ffffffff81472aa5>] hub_port_init+0x525/0xb00
[<ffffffff81476068>] hub_port_connect+0x328/0x940
[<ffffffff81476cbc>] hub_event+0x63c/0xb00
[<ffffffff810947dc>] process_one_work+0x14c/0x3c0
[<ffffffff81095044>] worker_thread+0x114/0x470
[<ffffffff8165925f>] ? __schedule+0x2af/0x8b0
[<ffffffff81094f30>] ? rescuer_thread+0x310/0x310
[<ffffffff8109ab88>] kthread+0xd8/0xf0
[<ffffffff8109aab0>] ? kthread_park+0x60/0x60
[<ffffffff8165d75f>] ret_from_fork+0x3f/0x70
[<ffffffff8109aab0>] ? kthread_park+0x60/0x60
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [patch 00/14] x86/irq: Plug various vector cleanup races
2016-01-18 15:00 ` Joe Lawrence
@ 2016-01-18 15:43 ` Borislav Petkov
2016-01-18 16:38 ` Joe Lawrence
2016-01-20 3:57 ` Joe Lawrence
1 sibling, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2016-01-18 15:43 UTC (permalink / raw)
To: Joe Lawrence
Cc: Thomas Gleixner, LKML, Ingo Molnar, Peter Anvin, Jiang Liu,
Jeremiah Mahler, andy.shevchenko, Guenter Roeck
On Mon, Jan 18, 2016 at 10:00:49AM -0500, Joe Lawrence wrote:
> INFO: task kworker/0:1:1506 blocked for more than 120 seconds.
> Tainted: P OE 4.3.0sra12+ #50
P as in Proprietary?
Can you trigger without that module?
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [patch 00/14] x86/irq: Plug various vector cleanup races
2016-01-18 15:43 ` Borislav Petkov
@ 2016-01-18 16:38 ` Joe Lawrence
0 siblings, 0 replies; 38+ messages in thread
From: Joe Lawrence @ 2016-01-18 16:38 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, LKML, Ingo Molnar, Peter Anvin, Jiang Liu,
Jeremiah Mahler, andy.shevchenko, Guenter Roeck
On 01/18/2016 10:43 AM, Borislav Petkov wrote:
> On Mon, Jan 18, 2016 at 10:00:49AM -0500, Joe Lawrence wrote:
>> INFO: task kworker/0:1:1506 blocked for more than 120 seconds.
>> Tainted: P OE 4.3.0sra12+ #50
>
> P as in Proprietary?
>
> Can you trigger without that module?
>
Hi Boris,
Unfortunately, yes to proprietary.
I had a Stratus proprietary driver in place that enables a test to
hotplug add/remove entire PCI subtrees at random points of
initialization. That driver allows us to insert various hardware faults
to facilitate testing the Stratus ftServer platform and OS device
drivers against PCI master abort detection and cleanup.
I can try w/o the driver, though the test will be drastically different
-- polite sysfs PCI removal of a single device (in this case, the xHCI
host controller). An interesting data point to gather nonetheless.
-- Joe
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [patch 00/14] x86/irq: Plug various vector cleanup races
2016-01-18 15:00 ` Joe Lawrence
2016-01-18 15:43 ` Borislav Petkov
@ 2016-01-20 3:57 ` Joe Lawrence
2016-01-20 8:26 ` Borislav Petkov
1 sibling, 1 reply; 38+ messages in thread
From: Joe Lawrence @ 2016-01-20 3:57 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Borislav Petkov, LKML, Ingo Molnar, Peter Anvin, Jiang Liu,
Jeremiah Mahler, andy.shevchenko, Guenter Roeck
On 01/18/2016 10:00 AM, Joe Lawrence wrote:
[... snip ... ]
> Hi Thomas,
>
> When logging in this morning and looking at the box running the 14
> patches + additional patch, I see it hit a hung task timeout in xhci USB
> code about 39 hours in. Stack trace below (looks to be waiting on a
> completion that never comes).
>
> I didn't see this when running only the *initial* 14 patches. Of
> course, before these irq cleanup fixes my tests never ran this long :)
> So it may or may not be related to the patchset, I'm still poking around
> the generated vmcore. Let me know if there is anything you might be
> interested in looking at from the wreckage.
>
> -- Joe
>
>
>
> INFO: task kworker/0:1:1506 blocked for more than 120 seconds.
> Tainted: P OE 4.3.0sra12+ #50
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/0:1 D 0000000000000000 0 1506 2 0x00000080
> Workqueue: usb_hub_wq hub_event
> ffff8801e46dba58 0000000000000046 ffff8810375dac00 ffff881038430000
> ffff8801e46dc000 ffff88025ac20440 ffff88025ac20438 ffff881038430000
> 0000000000000000 ffff8801e46dba70 ffffffff81659893 7fffffffffffffff
> Call Trace:
> [<ffffffff81659893>] schedule+0x33/0x80
> [<ffffffff8165c530>] schedule_timeout+0x200/0x2a0
> [<ffffffff810e2761>] ? internal_add_timer+0x71/0xb0
> [<ffffffff810e4994>] ? mod_timer+0x114/0x210
> [<ffffffff8165a371>] wait_for_completion+0xf1/0x130
> [<ffffffff810a70d0>] ? wake_up_q+0x70/0x70
> [<ffffffff814b14a1>] xhci_discover_or_reset_device+0x1e1/0x540
> [<ffffffff814723b8>] hub_port_reset+0x3c8/0x590
> [<ffffffff81472aa5>] hub_port_init+0x525/0xb00
> [<ffffffff81476068>] hub_port_connect+0x328/0x940
> [<ffffffff81476cbc>] hub_event+0x63c/0xb00
> [<ffffffff810947dc>] process_one_work+0x14c/0x3c0
> [<ffffffff81095044>] worker_thread+0x114/0x470
> [<ffffffff8165925f>] ? __schedule+0x2af/0x8b0
> [<ffffffff81094f30>] ? rescuer_thread+0x310/0x310
> [<ffffffff8109ab88>] kthread+0xd8/0xf0
> [<ffffffff8109aab0>] ? kthread_park+0x60/0x60
> [<ffffffff8165d75f>] ret_from_fork+0x3f/0x70
> [<ffffffff8109aab0>] ? kthread_park+0x60/0x60
Hi Thomas / Boris,
In an effort to exonerate the patchset, I instrumented xHCI to monitor
complementary wait_for_completion / complete calls in that driver,
hoping that an early exit in its probe might be simply skipping the
complete call ... but I ended up collecting two new crashes in
get_next_timer_interrupt:
(Again with proprietary and out-of-tree drivers loaded.)
general protection fault: 0000 [#1] SMP
Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun
matroxfb(OE) ccmod(POE) ftmod(OE) videosw(OE) [ ... snip ... ]
CPU: 10 PID: 0 Comm: swapper/10 Tainted: P OE 4.3.0sra13+ #53
Hardware name: Stratus ftServer 6800/G7LYY, BIOS BIOS Version 8.1:61 09/10/2015
task: ffff881038e35800 ti: ffff881038e3c000 task.ti: ffff881038e3c000
RIP: 0010:[<ffffffff810e4c55>] [<ffffffff810e4c55>] get_next_timer_interrupt+0x1a5/0x240
RSP: 0018:ffff881038e3fde0 EFLAGS: 00010002
RAX: ffff88103fa8e8b8 RBX: 000013629b0c5740 RCX: 000000014140a6d6
RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000001 RDI: 00000000010140a7
RBP: ffff881038e3fe30 R08: 6b6b6b6b6b6b6b6b R09: 0000000000000027
R10: 0000000000000027 R11: ffff881038e3fde8 R12: 000000010140a6d6
R13: ffff88103fa8e080 R14: ffff881038e3fe00 R15: 0000000000000040
FS: 0000000000000000(0000) GS:ffff88103fa80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcbf695a000 CR3: 0000002030582000 CR4: 00000000001406e0
Stack:
ffff88103fa8e8b8 ffff88103fa8eab8 ffff88103fa8ecb8 ffff88103fa8eeb8
c5067d2c236293b0 ffff88103fa8f500 ffff88103fa8d600 000013629b1632f3
000000010140a6d6 000013629b0c5740 ffff881038e3fe88 ffffffff810f421f
Call Trace:
[<ffffffff810f421f>] tick_nohz_stop_sched_tick+0x1bf/0x2b0
[<ffffffff810f43af>] __tick_nohz_idle_enter+0x9f/0x150
[<ffffffff810f47dc>] tick_nohz_idle_enter+0x3c/0x70
[<ffffffff810bf00c>] cpu_startup_entry+0x9c/0x330
[<ffffffff8104e700>] start_secondary+0x160/0x1a0
Code: 95 38 0e 00 00 48 89 55 c8 48 8d 55 b0 4c 8d 5a 08 4c 8d 72 20 41 89 fa 41 83 e2 3f 45 89 d1 49 63 d1 48 8b 14 d0 48 85 d2 74 1e <f6> 42 2a 10 75 10 4c 8b 42 10 be 01 00 00 00 49 39 c8 49 0f 48
RIP [<ffffffff810e4c55>] get_next_timer_interrupt+0x1a5/0x240
RSP <ffff881038e3fde0>
crash> dis -l get_next_timer_interrupt+0x1a5
/root/linus/kernel/time/timer.c: 1289
<get_next_timer_interrupt+0x1a5>: testb $0x10,0x2a(%rdx)
RDX: 6b6b6b6b6b6b6b6b
1246 static unsigned long __next_timer_interrupt(struct tvec_base *base)
...
1251 struct timer_list *nte;
...
1277 /* Check tv2-tv5. */
1278 varray[0] = &base->tv2;
1279 varray[1] = &base->tv3;
1280 varray[2] = &base->tv4;
1281 varray[3] = &base->tv5;
...
1283 for (array = 0; array < 4; array++) {
1284 struct tvec *varp = varray[array];
1285
1286 index = slot = timer_jiffies & TVN_MASK;
1287 do {
1288 hlist_for_each_entry(nte, varp->vec + slot, entry) {
1289 if (nte->flags & TIMER_DEFERRABLE)
So the nte pointer contains the slub_debug poison pattern.
>From the disassembly of __next_timer_interrupt, it looks like r13 is
used to store "base".
R13: ffff88103fa8e080
crash> struct tvec_base ffff88103fa8e080
struct tvec_base {
lock = {
{
rlock = {
raw_lock = {
val = {
counter = 0x1
}
}
}
}
},
running_timer = 0x0,
timer_jiffies = 0x10140a6d7,
next_timer = 0x10140a6d6,
active_timers = 0x7,
all_timers = 0x8,
cpu = 0xa,
migration_enabled = 0x1,
nohz_active = 0x1,
...
tv1 = {
vec = {{
...
first = 0xffff88203800dff8
...
tv2 = {
vec = {{
...
first = 0xffff88100aa9a550 << index 39 points to slub poison
...
first = 0xffff88103fa90ea0
...
tv3 = {
vec = {{
...
first = 0xffff8807452b3928
...
first = 0xffff88103fa8d380
...
tv4 = {
vec = {{
...
tv5 = {
vec = {{
...
first = 0xffff88100c8955e8
crash> struct hlist_node 0xffff88203800dff8
struct hlist_node {
next = 0x0,
pprev = 0xffff88103fa8e790
}
crash> struct hlist_node 0xffff88100aa9a550
struct hlist_node {
next = 0x6b6b6b6b6b6b6b6b, << uhoh!
pprev = 0x6b6b6b6b6b6b6b6b <<
}
crash> struct hlist_node 0xffff88103fa90ea0
struct hlist_node {
next = 0x0,
pprev = 0xffff88103fa8e9f8
}
crash> struct hlist_node 0xffff8807452b3928
struct hlist_node {
next = 0xffff88100aa9da68,
pprev = 0xffff88103fa8eae0
}
crash> struct hlist_node 0xffff88103fa8d380
struct hlist_node {
next = 0x0,
pprev = 0xffff88103fa8eb58
}
crash> struct hlist_node 0xffff88100c8955e8
struct hlist_node {
next = 0xffff881021e47598,
pprev = 0xffff88103fa8eec0
}
crash utility confirms it in its "timer" display:
crash> timer
TVEC_BASES[9]: ffff88103fa4e080
JIFFIES
4315982678
EXPIRES TIMER_LIST FUNCTION
4315982681 ffff88203800ddb0 ffffffff8150e7a0 <intel_pstate_timer_func>
4315982973 ffff88103fa50ea0 ffffffff81092d90 <delayed_work_timer_fn>
4316267973 ffff88103fa4d380 ffffffff81041930 <mce_timer_fn>
timer: invalid list entry: 6b6b6b6b6b6b6b6b
timer: ignoring faulty timer list at index 39 of timer array
timer: invalid list entry: 6b6b6b6b6b6b6b6b
timer: ignoring faulty timer list at index 39 of timer array
TVEC_BASES[10]: ffff88103fa8e080
JIFFIES
4315982678
EXPIRES TIMER_LIST FUNCTION
4315981531 ffff88203800dff8 ffffffff8150e7a0 <intel_pstate_timer_func>
4316034039 ffff88100aa9da68 ffffffff81092d90 <delayed_work_timer_fn>
4316034111 ffff8807452b3928 ffffffff81092d90 <delayed_work_timer_fn>
4316267970 ffff88103fa8d380 ffffffff81041930 <mce_timer_fn>
4401397760 ffff88100c8955e8 ffffffff8160bbe0 <ipv6_regen_rndid>
4401397760 ffff881021e47598 ffffffff8160bbe0 <ipv6_regen_rndid>
7740398493674204011 ffff88100aa9a550 6b6b6b6b6b6b6b6b
crash> struct timer_list ffff88100aa9a550
struct timer_list {
entry = {
next = 0x6b6b6b6b6b6b6b6b,
pprev = 0x6b6b6b6b6b6b6b6b
},
expires = 0x6b6b6b6b6b6b6b6b,
function = 0x6b6b6b6b6b6b6b6b,
data = 0x6b6b6b6b6b6b6b6b,
flags = 0x6b6b6b6b,
slack = 0x6b6b6b6b,
start_pid = 0x6b6b6b6b,
start_site = 0x6b6b6b6b6b6b6b6b,
start_comm = "kkkkkkkkkkkkkkkk"
}
A second crash of the same signature occurred a few hours later.
Unfortunately I only have a single box to run these tests in what
amounts to an after-hours effort. I started testing back in 4.3 but
avoided moving forward to avoid the 4.4 development cycle (and
incidental issues that it might have muddled the waters). That said,
what would be the best way to proceed?
Change device removal tests to avoid proprietary drivers.
What about the other out-of-tree device drivers (mpt3sas, ixgbe,
etc.)? These are open source, but contain much Stratus
device removal paranoia that upstream hasn't adopted.
Rebase evil(TM) proprietary/out-of-tree drivers against 4.4 or
4.5rcX, apply this patchset and any other required device removal
fixups.
If proprietary/out-of-tree drivers are a debugging deal breaker, I
understand. The platform offers a unique hotplug testbed, so I try to
contribute testing and bug reports where I feel they apply equally
to untainted upstream.
Regards,
-- Joe
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [patch 00/14] x86/irq: Plug various vector cleanup races
2016-01-20 3:57 ` Joe Lawrence
@ 2016-01-20 8:26 ` Borislav Petkov
2016-01-22 15:28 ` Joe Lawrence
0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2016-01-20 8:26 UTC (permalink / raw)
To: Joe Lawrence
Cc: Thomas Gleixner, LKML, Ingo Molnar, Peter Anvin, Jiang Liu,
Jeremiah Mahler, andy.shevchenko, Guenter Roeck
On Tue, Jan 19, 2016 at 10:57:35PM -0500, Joe Lawrence wrote:
> Unfortunately I only have a single box to run these tests in what
> amounts to an after-hours effort. I started testing back in 4.3 but
> avoided moving forward to avoid the 4.4 development cycle (and
> incidental issues that it might have muddled the waters). That said,
> what would be the best way to proceed?
>
> Change device removal tests to avoid proprietary drivers.
Right, I'd tend to suggest that. tglx says you could also try enabling
CONFIG_DEBUG_OBJECTS and CONFIG_DEBUG_OBJECTS_TIMERS and collect full
dmesg from when the corruption happens. That might give us some more
insights as to what happens.
Thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [patch 00/14] x86/irq: Plug various vector cleanup races
2016-01-20 8:26 ` Borislav Petkov
@ 2016-01-22 15:28 ` Joe Lawrence
0 siblings, 0 replies; 38+ messages in thread
From: Joe Lawrence @ 2016-01-22 15:28 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, LKML, Ingo Molnar, Peter Anvin, Jiang Liu,
Jeremiah Mahler, andy.shevchenko, Guenter Roeck
On 01/20/2016 03:26 AM, Borislav Petkov wrote:
> ... tglx says you could also try enabling
> CONFIG_DEBUG_OBJECTS and CONFIG_DEBUG_OBJECTS_TIMERS and collect full
> dmesg from when the corruption happens. That might give us some more
> insights as to what happens.
Hi Boris,
Excellent idea. With CONFIG_DEBUG_OBJECTS_FREE=y I've hit this twice:
ODEBUG: free active (active state 0) object type: timer_list hint:
xhci_stop_endpoint_command_watchdog+0x0/0x2b0
So I'll try to hunt that down on the xHCI side and report over to
linux-usb. Sorry for the unrelated noise on this patch -- my guess is
that timing changed just enough for the xHCI driver to hit this. :)
-- Joe
^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2016-01-22 15:28 UTC | newest]
Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-31 16:30 [patch 00/14] x86/irq: Plug various vector cleanup races Thomas Gleixner
2015-12-31 16:30 ` [patch 01/14] x86/irq: Fix a race in x86_vector_free_irqs() Thomas Gleixner
2015-12-31 16:30 ` [patch 02/14] x86/irq: Validate that irq descriptor is still active Thomas Gleixner
2016-01-16 21:16 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 04/14] x86/irq: Reorganize the return path in assign_irq_vector Thomas Gleixner
2016-01-16 21:17 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 03/14] x86/irq: Do not use apic_chip_data.old_domain as temporary buffer Thomas Gleixner
2015-12-31 16:30 ` [patch 05/14] x86/irq: Reorganize the search in assign_irq_vector Thomas Gleixner
2016-01-16 21:17 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 06/14] x86/irq: Check vector allocation early Thomas Gleixner
2016-01-16 21:17 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 08/14] x86/irq: Get rid of code duplication Thomas Gleixner
2016-01-16 21:18 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 07/14] x86/irq: Copy vectormask instead of an AND operation Thomas Gleixner
2016-01-16 21:18 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 09/14] x86/irq: Remove offline cpus from vector cleanup Thomas Gleixner
2016-01-16 21:18 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 10/14] x86/irq: Clear move_in_progress before sending cleanup IPI Thomas Gleixner
2016-01-16 21:19 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 12/14] x86/irq: Remove outgoing CPU from vector cleanup mask Thomas Gleixner
2016-01-16 21:19 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 11/14] x86/irq: Remove the cpumask allocation from send_cleanup_vector() Thomas Gleixner
2016-01-16 21:19 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 13/14] x86/irq: Call irq_force_move_complete with irq descriptor Thomas Gleixner
2016-01-16 21:20 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-12-31 16:30 ` [patch 14/14] x86/irq: Plug vector cleanup race Thomas Gleixner
2016-01-16 21:20 ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2016-01-04 15:35 ` [patch 00/14] x86/irq: Plug various vector cleanup races Joe Lawrence
2016-01-14 8:24 ` Thomas Gleixner
2016-01-14 10:33 ` Borislav Petkov
2016-01-16 21:37 ` Joe Lawrence
2016-01-18 15:00 ` Joe Lawrence
2016-01-18 15:43 ` Borislav Petkov
2016-01-18 16:38 ` Joe Lawrence
2016-01-20 3:57 ` Joe Lawrence
2016-01-20 8:26 ` Borislav Petkov
2016-01-22 15:28 ` Joe Lawrence
2016-01-16 21:15 ` [tip:x86/urgent] x86/irq: Call chip-> irq_set_affinity in proper context tip-bot for Thomas Gleixner
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.