* Re: linux acpi (thunderbolt? bug)
[not found] <CABbc0=RMYWGN0L=z_Y=FuZJUDzD5NVa2XBTVnmpZxX5tnk3-5g@mail.gmail.com>
@ 2018-02-14 12:09 ` Thomas Gleixner
2018-02-14 13:55 ` Andy Shevchenko
2018-02-15 8:52 ` Thomas Gleixner
0 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2018-02-14 12:09 UTC (permalink / raw)
To: Yuriy Vostrikov; +Cc: linux-kernel, x86
On Wed, 14 Feb 2018, Yuriy Vostrikov wrote:
> after boot
> name: VECTOR
> size: 0
> mapped: 64
> flags: 0x00000041
> Online bitmaps: 2
> Global available: 368
> Global reserved: 26
> Total allocated: 38
> System: 41: 0-19,32,50,128,238-255
> | CPU | avl | man | act | vectors
> 0 184 1 19 33-49,51-52
> 1 184 1 19 33-49,51-52
>
> after unplug
> name: VECTOR
> size: 0
> mapped: 55
> flags: 0x00000041
> Online bitmaps: 2
> Global available: 377
> Global reserved: 26
> Total allocated: 29
> System: 41: 0-19,32,50,128,238-255
> | CPU | avl | man | act | vectors
> 0 188 1 15 33-46,48
> 1 189 1 14 33-46
>
> after sleep 1 time
> name: VECTOR
> size: 0
> mapped: 35
> flags: 0x00000041
> Online bitmaps: 2
> Global available: 385
> Global reserved: 12
> Total allocated: 32
> System: 41: 0-19,32,50,128,238-255
> | CPU | avl | man | act | vectors
> 0 185 1 18 33-43,46-49,51-53
> 1 200 1 3 33-37
The accounting is already screwed. CPU1 claims to have 3 allocated vectors,
but the allocation bitmap has 5 bits set !?!
I have no idea yet how that can happen. Lemme stare into the code some more
and I'll come back to you.
Thanks,
tglx
* Re: linux acpi (thunderbolt? bug)
2018-02-14 12:09 ` linux acpi (thunderbolt? bug) Thomas Gleixner
@ 2018-02-14 13:55 ` Andy Shevchenko
2018-02-15 8:52 ` Thomas Gleixner
1 sibling, 0 replies; 11+ messages in thread
From: Andy Shevchenko @ 2018-02-14 13:55 UTC (permalink / raw)
To: Thomas Gleixner, Mika Westerberg
Cc: Yuriy Vostrikov, Linux Kernel Mailing List, x86
+Cc: Mika -- the Thunderbolt guy.
On Wed, Feb 14, 2018 at 2:09 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 14 Feb 2018, Yuriy Vostrikov wrote:
>> after boot
>> name: VECTOR
>> size: 0
>> mapped: 64
>> flags: 0x00000041
>> Online bitmaps: 2
>> Global available: 368
>> Global reserved: 26
>> Total allocated: 38
>> System: 41: 0-19,32,50,128,238-255
>> | CPU | avl | man | act | vectors
>> 0 184 1 19 33-49,51-52
>> 1 184 1 19 33-49,51-52
>>
>> after unplug
>> name: VECTOR
>> size: 0
>> mapped: 55
>> flags: 0x00000041
>> Online bitmaps: 2
>> Global available: 377
>> Global reserved: 26
>> Total allocated: 29
>> System: 41: 0-19,32,50,128,238-255
>> | CPU | avl | man | act | vectors
>> 0 188 1 15 33-46,48
>> 1 189 1 14 33-46
>>
>> after sleep 1 time
>> name: VECTOR
>> size: 0
>> mapped: 35
>> flags: 0x00000041
>> Online bitmaps: 2
>> Global available: 385
>> Global reserved: 12
>> Total allocated: 32
>> System: 41: 0-19,32,50,128,238-255
>> | CPU | avl | man | act | vectors
>> 0 185 1 18 33-43,46-49,51-53
>> 1 200 1 3 33-37
>
> The accounting is already screwed. CPU1 claims to have 3 allocated vectors,
> but the allocation bitmap has 5 bits set !?!
>
> I have no idea yet how that can happen. Lemme stare into the code some more
> and I'll come back to you.
>
> Thanks,
>
> tglx
--
With Best Regards,
Andy Shevchenko
* Re: linux acpi (thunderbolt? bug)
2018-02-14 12:09 ` linux acpi (thunderbolt? bug) Thomas Gleixner
2018-02-14 13:55 ` Andy Shevchenko
@ 2018-02-15 8:52 ` Thomas Gleixner
2018-02-16 15:23 ` Yuriy Vostrikov
1 sibling, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2018-02-15 8:52 UTC (permalink / raw)
To: Yuriy Vostrikov; +Cc: linux-kernel, x86
On Wed, 14 Feb 2018, Thomas Gleixner wrote:
> On Wed, 14 Feb 2018, Yuriy Vostrikov wrote:
> > after sleep 1 time
> > name: VECTOR
> > size: 0
> > mapped: 35
> > flags: 0x00000041
> > Online bitmaps: 2
> > Global available: 385
> > Global reserved: 12
> > Total allocated: 32
> > System: 41: 0-19,32,50,128,238-255
> > | CPU | avl | man | act | vectors
> > 0 185 1 18 33-43,46-49,51-53
> > 1 200 1 3 33-37
>
> The accounting is already screwed. CPU1 claims to have 3 allocated vectors,
> but the allocation bitmap has 5 bits set !?!
>
> I have no idea yet how that can happen. Lemme stare into the code some more
> and I'll come back to you.
Still confused.
Does this happen if you boot w/o the thunderbolt thingy and then do
suspend/resume cycles?
Can you please take snapshots from:
/proc/interrupts
/sys/kernel/debug/irq/*
right after boot, after the unplug, before suspend and after resume?
Thanks,
tglx
* Re: linux acpi (thunderbolt? bug)
2018-02-15 8:52 ` Thomas Gleixner
@ 2018-02-16 15:23 ` Yuriy Vostrikov
2018-02-18 18:03 ` Thomas Gleixner
0 siblings, 1 reply; 11+ messages in thread
From: Yuriy Vostrikov @ 2018-02-16 15:23 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: linux-kernel, x86
On 15 February 2018 at 11:52, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 14 Feb 2018, Thomas Gleixner wrote:
>> On Wed, 14 Feb 2018, Yuriy Vostrikov wrote:
>> > after sleep 1 time
>> > name: VECTOR
>> > size: 0
>> > mapped: 35
>> > flags: 0x00000041
>> > Online bitmaps: 2
>> > Global available: 385
>> > Global reserved: 12
>> > Total allocated: 32
>> > System: 41: 0-19,32,50,128,238-255
>> > | CPU | avl | man | act | vectors
>> > 0 185 1 18 33-43,46-49,51-53
>> > 1 200 1 3 33-37
>>
>> The accounting is already screwed. CPU1 claims to have 3 allocated vectors,
>> but the allocation bitmap has 5 bits set !?!
>>
>> I have no idea yet how that can happen. Lemme stare into the code some more
>> and I'll come back to you.
>
> Still confused.
>
> Does this happen if you boot w/o the thunderbolt thingy and then do
> suspend/resume cycles?
>
> Can you please take snapshots from:
>
> /proc/interrupts
> /sys/kernel/debug/irq/*
>
> right after boot, after the unplug, before suspend and after resume?
>
Apparently, timing is important: the problem manifests if the laptop goes
to sleep shortly after the unplug.
If there is some delay between unplugging and sleeping, then there is
no problem.
I'm attaching a tar.gz with two runs: run-1 with the problem and run-2
without. The dumps include the dmesg output
at the time each snapshot was taken.
Hope this clarifies the situation a bit.
Thank you,
Yuriy.
[-- Attachment #2: dump.tar.gz --]
[-- Type: application/x-gzip, Size: 181264 bytes --]
* Re: linux acpi (thunderbolt? bug)
2018-02-16 15:23 ` Yuriy Vostrikov
@ 2018-02-18 18:03 ` Thomas Gleixner
2018-02-19 14:51 ` Thomas Gleixner
0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2018-02-18 18:03 UTC (permalink / raw)
To: Yuriy Vostrikov; +Cc: linux-kernel, x86
On Fri, 16 Feb 2018, Yuriy Vostrikov wrote:
> On 15 February 2018 at 11:52, Thomas Gleixner <tglx@linutronix.de> wrote:
> > Can you please take snapshots from:
> >
> > /proc/interrupts
> > /sys/kernel/debug/irq/*
> >
> > right after boot, after the unplug, before suspend and after resume?
> >
>
> Apparently, timing is important: the problem manifests if the laptop goes
> to sleep shortly after the unplug.
> If there is some delay between unplugging and sleeping, then there is
> no problem.
> I'm attaching a tar.gz with two runs: run-1 with the problem and run-2
> without. The dumps include the dmesg output
> at the time each snapshot was taken.
>
> Hope this clarifies the situation a bit.
Yes. I finally wrapped my brain around it and I can reproduce now after
understanding the root cause. I have no fix yet, but I should have
something for you to test tomorrow.
Thanks for providing all the info.
tglx
* Re: linux acpi (thunderbolt? bug)
2018-02-18 18:03 ` Thomas Gleixner
@ 2018-02-19 14:51 ` Thomas Gleixner
2018-02-19 17:18 ` Randy Dunlap
2018-02-21 8:45 ` Yuriy Vostrikov
0 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2018-02-19 14:51 UTC (permalink / raw)
To: Yuriy Vostrikov; +Cc: linux-kernel, x86
On Sun, 18 Feb 2018, Thomas Gleixner wrote:
> On Fri, 16 Feb 2018, Yuriy Vostrikov wrote:
> > On 15 February 2018 at 11:52, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > Can you please take snapshots from:
> > >
> > > /proc/interrupts
> > > /sys/kernel/debug/irq/*
> > >
> > > right after boot, after the unplug, before suspend and after resume?
> > >
> >
> > Apparently, timing is important: the problem manifests if the laptop goes
> > to sleep shortly after the unplug.
> > If there is some delay between unplugging and sleeping, then there is
> > no problem.
> > I'm attaching a tar.gz with two runs: run-1 with the problem and run-2
> > without. The dumps include the dmesg output
> > at the time each snapshot was taken.
> >
> > Hope this clarifies the situation a bit.
>
> Yes. I finally wrapped my brain around it and I can reproduce now after
> understanding the root cause. I have no fix yet, but I should have
> something for you to test tomorrow.
The patch below should cure it.
Thanks,
tglx
8<------------------
Subject: genirq/matrix: Handle CPU offlining proper
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 19 Feb 2018 12:59:34 +0100
Add blurb.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/x86/kernel/apic/vector.c | 10 ++++++++++
kernel/irq/matrix.c | 23 ++++++++++++++---------
2 files changed, 24 insertions(+), 9 deletions(-)
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -134,6 +134,7 @@ static void apic_update_vector(struct ir
 {
 	struct apic_chip_data *apicd = apic_chip_data(irqd);
 	struct irq_desc *desc = irq_data_to_desc(irqd);
+	bool managed = irqd_affinity_is_managed(irqd);

 	lockdep_assert_held(&vector_lock);

@@ -146,6 +147,15 @@ static void apic_update_vector(struct ir
 		apicd->prev_vector = apicd->vector;
 		apicd->prev_cpu = apicd->cpu;
 	} else {
+		/*
+		 * Offline case: The current vector needs to be released in
+		 * the matrix allocator.
+		 */
+		if (apicd->vector &&
+		    apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
+			irq_matrix_free(vector_matrix, apicd->cpu,
+					apicd->vector, managed);
+		}
 		apicd->prev_vector = 0;
 	}

--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -16,6 +16,7 @@ struct cpumap {
 	unsigned int		available;
 	unsigned int		allocated;
 	unsigned int		managed;
+	bool			initialized;
 	bool			online;
 	unsigned long		alloc_map[IRQ_MATRIX_SIZE];
 	unsigned long		managed_map[IRQ_MATRIX_SIZE];
@@ -81,9 +82,11 @@ void irq_matrix_online(struct irq_matrix

 	BUG_ON(cm->online);

-	bitmap_zero(cm->alloc_map, m->matrix_bits);
-	cm->available = m->alloc_size - (cm->managed + m->systembits_inalloc);
-	cm->allocated = 0;
+	if (!cm->initialized) {
+		cm->available = m->alloc_size;
+		cm->available -= cm->managed + m->systembits_inalloc;
+		cm->initialized = true;
+	}
 	m->global_available += cm->available;
 	cm->online = true;
 	m->online_maps++;
@@ -370,14 +373,16 @@ void irq_matrix_free(struct irq_matrix *
 	if (WARN_ON_ONCE(bit < m->alloc_start || bit >= m->alloc_end))
 		return;

-	if (cm->online) {
-		clear_bit(bit, cm->alloc_map);
-		cm->allocated--;
+	clear_bit(bit, cm->alloc_map);
+	cm->allocated--;
+
+	if (cm->online)
 		m->total_allocated--;
-		if (!managed) {
-			cm->available++;
+
+	if (!managed) {
+		cm->available++;
+		if (cm->online)
 			m->global_available++;
-		}
 	}
 	trace_irq_matrix_free(bit, cpu, m, cm);
 }
* Re: linux acpi (thunderbolt? bug)
2018-02-19 14:51 ` Thomas Gleixner
@ 2018-02-19 17:18 ` Randy Dunlap
2018-02-19 17:26 ` Thomas Gleixner
2018-02-21 8:45 ` Yuriy Vostrikov
1 sibling, 1 reply; 11+ messages in thread
From: Randy Dunlap @ 2018-02-19 17:18 UTC (permalink / raw)
To: Thomas Gleixner, Yuriy Vostrikov; +Cc: linux-kernel, x86
On 02/19/18 06:51, Thomas Gleixner wrote:
> } else {
> + /*
> + * Offline case: The current vector needs to be released in
> + * the matrix allocator.
> + */
> + if (apicd->vector &&
Drop the "apicd->vector &&" ? (redundant)
> + apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
> + irq_matrix_free(vector_matrix, apicd->cpu,
> + apicd->vector, managed);
> + }
> apicd->prev_vector = 0;
> }
--
~Randy
* Re: linux acpi (thunderbolt? bug)
2018-02-19 17:18 ` Randy Dunlap
@ 2018-02-19 17:26 ` Thomas Gleixner
2018-02-19 17:42 ` Randy Dunlap
2018-02-19 17:46 ` Randy Dunlap
0 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2018-02-19 17:26 UTC (permalink / raw)
To: Randy Dunlap; +Cc: Yuriy Vostrikov, linux-kernel, x86
On Mon, 19 Feb 2018, Randy Dunlap wrote:
> On 02/19/18 06:51, Thomas Gleixner wrote:
> > } else {
> > + /*
> > + * Offline case: The current vector needs to be released in
> > + * the matrix allocator.
> > + */
> > + if (apicd->vector &&
>
> Drop the "apicd->vector &&" ? (redundant)
No. The else path is entered when
apicd->vector == 0
or
apicd->cpu is offline
So we need to check here for vector != 0 as otherwise we'd free vector 0
which is invalid. I'll add a comment explaining the mess.
> > + apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
> > + irq_matrix_free(vector_matrix, apicd->cpu,
> > + apicd->vector, managed);
> > + }
> > apicd->prev_vector = 0;
> > }
Thanks,
tglx
* Re: linux acpi (thunderbolt? bug)
2018-02-19 17:26 ` Thomas Gleixner
@ 2018-02-19 17:42 ` Randy Dunlap
2018-02-19 17:46 ` Randy Dunlap
1 sibling, 0 replies; 11+ messages in thread
From: Randy Dunlap @ 2018-02-19 17:42 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Yuriy Vostrikov, linux-kernel, x86
On 02/19/18 09:26, Thomas Gleixner wrote:
> On Mon, 19 Feb 2018, Randy Dunlap wrote:
>
>> On 02/19/18 06:51, Thomas Gleixner wrote:
>>> } else {
>>> + /*
>>> + * Offline case: The current vector needs to be released in
>>> + * the matrix allocator.
>>> + */
>>> + if (apicd->vector &&
>>
>> Drop the "apicd->vector &&" ? (redundant)
>
> No. The else path is entered when
>
> apicd->vector == 0
>
> or
>
> apicd->cpu is offline
>
> So we need to check here for vector != 0 as otherwise we'd free vector 0
> which is invalid. I'll add a comment explaining the mess.
>
>>> + apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
>>> + irq_matrix_free(vector_matrix, apicd->cpu,
>>> + apicd->vector, managed);
>>> + }
>>> apicd->prev_vector = 0;
>>> }
I see. Thanks.
--
~Randy
* Re: linux acpi (thunderbolt? bug)
2018-02-19 17:26 ` Thomas Gleixner
2018-02-19 17:42 ` Randy Dunlap
@ 2018-02-19 17:46 ` Randy Dunlap
1 sibling, 0 replies; 11+ messages in thread
From: Randy Dunlap @ 2018-02-19 17:46 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Yuriy Vostrikov, linux-kernel, x86
On 02/19/18 09:26, Thomas Gleixner wrote:
> On Mon, 19 Feb 2018, Randy Dunlap wrote:
>
>> On 02/19/18 06:51, Thomas Gleixner wrote:
>>> } else {
>>> + /*
>>> + * Offline case: The current vector needs to be released in
>>> + * the matrix allocator.
>>> + */
>>> + if (apicd->vector &&
>>
>> Drop the "apicd->vector &&" ? (redundant)
>
> No. The else path is entered when
>
> apicd->vector == 0
>
> or
>
> apicd->cpu is offline
>
> So we need to check here for vector != 0 as otherwise we'd free vector 0
> which is invalid. I'll add a comment explaining the mess.
I just need more coffee. I read the != (below) as ==.
:(
>>> + apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
>>> + irq_matrix_free(vector_matrix, apicd->cpu,
>>> + apicd->vector, managed);
>>> + }
>>> apicd->prev_vector = 0;
>>> }
--
~Randy
* Re: linux acpi (thunderbolt? bug)
2018-02-19 14:51 ` Thomas Gleixner
2018-02-19 17:18 ` Randy Dunlap
@ 2018-02-21 8:45 ` Yuriy Vostrikov
1 sibling, 0 replies; 11+ messages in thread
From: Yuriy Vostrikov @ 2018-02-21 8:45 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: linux-kernel, x86
On 19 February 2018 at 17:51, Thomas Gleixner <tglx@linutronix.de> wrote:
> The patch below should cure it.
After applying the patch I'm no longer able to reproduce the bug. Thank you!
Yuriy