* Re: linux acpi (thunderbolt? bug)
  [not found] <CABbc0=RMYWGN0L=z_Y=FuZJUDzD5NVa2XBTVnmpZxX5tnk3-5g@mail.gmail.com>
From: Thomas Gleixner @ 2018-02-14 12:09 UTC
To: Yuriy Vostrikov; +Cc: linux-kernel, x86

On Wed, 14 Feb 2018, Yuriy Vostrikov wrote:
> after boot
> name:   VECTOR
>  size:   0
>  mapped: 64
>  flags:  0x00000041
> Online bitmaps:        2
> Global available:    368
> Global reserved:      26
> Total allocated:      38
> System: 41: 0-19,32,50,128,238-255
>  | CPU | avl | man | act | vectors
>      0   184     1    19  33-49,51-52
>      1   184     1    19  33-49,51-52
>
> after unplug
> name:   VECTOR
>  size:   0
>  mapped: 55
>  flags:  0x00000041
> Online bitmaps:        2
> Global available:    377
> Global reserved:      26
> Total allocated:      29
> System: 41: 0-19,32,50,128,238-255
>  | CPU | avl | man | act | vectors
>      0   188     1    15  33-46,48
>      1   189     1    14  33-46
>
> after sleep 1 time
> name:   VECTOR
>  size:   0
>  mapped: 35
>  flags:  0x00000041
> Online bitmaps:        2
> Global available:    385
> Global reserved:      12
> Total allocated:      32
> System: 41: 0-19,32,50,128,238-255
>  | CPU | avl | man | act | vectors
>      0   185     1    18  33-43,46-49,51-53
>      1   200     1     3  33-37

The accounting is already screwed. CPU1 claims to have 3 allocated vectors,
but the allocation bitmap has 5 bits set !?!

I have no idea yet how that can happen. Lemme stare into the code some more
and I came back to you.

Thanks,

	tglx
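The mismatch pointed out above (an "act" count of 3 against a vector list of 33-37) can be checked mechanically for every row of the dump. What follows is a minimal user-space sketch in C, not kernel code: count_listed_vectors() is a hypothetical helper that counts how many vectors a range list from the "vectors" column names, so the result can be compared with the "act" column by hand.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Count the vectors named by a range list such as "33-43,46-49,51-53".
 * Returns -1 on malformed input.
 * Hypothetical helper for illustration; not part of the kernel.
 */
int count_listed_vectors(const char *list)
{
	int total = 0;

	assert(list != NULL);

	while (*list != '\0') {
		char *end;
		long first = strtol(list, &end, 10);
		long last = first;

		if (end == list)
			return -1;	/* no digits where a number was expected */

		if (*end == '-') {
			const char *p = end + 1;

			last = strtol(p, &end, 10);
			if (end == p || last < first)
				return -1;	/* missing or reversed upper bound */
		}

		total += (int)(last - first + 1);

		if (*end == ',')
			end++;		/* step over the separator */
		else if (*end != '\0')
			return -1;	/* unexpected character */

		list = end;
	}

	return total;
}
```

Applied to the "after sleep 1 time" dump, the CPU 0 list "33-43,46-49,51-53" names 18 vectors, which matches act = 18, while the CPU 1 list "33-37" names 5 vectors against act = 3.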
* Re: linux acpi (thunderbolt? bug)
From: Andy Shevchenko @ 2018-02-14 13:55 UTC
To: Thomas Gleixner, Mika Westerberg
Cc: Yuriy Vostrikov, Linux Kernel Mailing List, x86

+Cc: Mika -- the Thunderbolt guy.

On Wed, Feb 14, 2018 at 2:09 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 14 Feb 2018, Yuriy Vostrikov wrote:
>> after boot
>> name:   VECTOR
>>  size:   0
>>  mapped: 64
>>  flags:  0x00000041
>> Online bitmaps:        2
>> Global available:    368
>> Global reserved:      26
>> Total allocated:      38
>> System: 41: 0-19,32,50,128,238-255
>>  | CPU | avl | man | act | vectors
>>      0   184     1    19  33-49,51-52
>>      1   184     1    19  33-49,51-52
>>
>> after unplug
>> name:   VECTOR
>>  size:   0
>>  mapped: 55
>>  flags:  0x00000041
>> Online bitmaps:        2
>> Global available:    377
>> Global reserved:      26
>> Total allocated:      29
>> System: 41: 0-19,32,50,128,238-255
>>  | CPU | avl | man | act | vectors
>>      0   188     1    15  33-46,48
>>      1   189     1    14  33-46
>>
>> after sleep 1 time
>> name:   VECTOR
>>  size:   0
>>  mapped: 35
>>  flags:  0x00000041
>> Online bitmaps:        2
>> Global available:    385
>> Global reserved:      12
>> Total allocated:      32
>> System: 41: 0-19,32,50,128,238-255
>>  | CPU | avl | man | act | vectors
>>      0   185     1    18  33-43,46-49,51-53
>>      1   200     1     3  33-37
>
> The accounting is already screwed. CPU1 claims to have 3 allocated vectors,
> but the allocation bitmap has 5 bits set !?!
>
> I have no idea yet how that can happen. Lemme stare into the code some more
> and I came back to you.
>
> Thanks,
>
>         tglx

-- 
With Best Regards,
Andy Shevchenko
* Re: linux acpi (thunderbolt? bug)
From: Thomas Gleixner @ 2018-02-15  8:52 UTC
To: Yuriy Vostrikov; +Cc: linux-kernel, x86

On Wed, 14 Feb 2018, Thomas Gleixner wrote:
> On Wed, 14 Feb 2018, Yuriy Vostrikov wrote:
> > after sleep 1 time
> > name:   VECTOR
> >  size:   0
> >  mapped: 35
> >  flags:  0x00000041
> > Online bitmaps:        2
> > Global available:    385
> > Global reserved:      12
> > Total allocated:      32
> > System: 41: 0-19,32,50,128,238-255
> >  | CPU | avl | man | act | vectors
> >      0   185     1    18  33-43,46-49,51-53
> >      1   200     1     3  33-37
>
> The accounting is already screwed. CPU1 claims to have 3 allocated vectors,
> but the allocation bitmap has 5 bits set !?!
>
> I have no idea yet how that can happen. Lemme stare into the code some more
> and I came back to you.

Still confused.

Does this happen if you boot w/o the thunderbolt thingy and then do
suspend/resume cycles?

Can you please take snapshots from:

    /proc/interrupts
    /sys/kernel/debug/irq/*

right after boot, after the unplug, before suspend and after resume?

Thanks,

	tglx
* Re: linux acpi (thunderbolt? bug)
From: Yuriy Vostrikov @ 2018-02-16 15:23 UTC
To: Thomas Gleixner; +Cc: linux-kernel, x86

[-- Attachment #1: Type: text/plain, Size: 1464 bytes --]

On 15 February 2018 at 11:52, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 14 Feb 2018, Thomas Gleixner wrote:
>> On Wed, 14 Feb 2018, Yuriy Vostrikov wrote:
>> > after sleep 1 time
>> > name:   VECTOR
>> >  size:   0
>> >  mapped: 35
>> >  flags:  0x00000041
>> > Online bitmaps:        2
>> > Global available:    385
>> > Global reserved:      12
>> > Total allocated:      32
>> > System: 41: 0-19,32,50,128,238-255
>> >  | CPU | avl | man | act | vectors
>> >      0   185     1    18  33-43,46-49,51-53
>> >      1   200     1     3  33-37
>>
>> The accounting is already screwed. CPU1 claims to have 3 allocated vectors,
>> but the allocation bitmap has 5 bits set !?!
>>
>> I have no idea yet how that can happen. Lemme stare into the code some more
>> and I came back to you.
>
> Still confused.
>
> Does this happen if you boot w/o the thunderbolt thingy and then do
> suspend/resume cycles?
>
> Can you please take snapshots from:
>
>     /proc/interrupts
>     /sys/kernel/debug/irq/*
>
> right after boot, after the unplug, before suspend and after resume?
>

Apparently, timing is important: problem manifests if the laptop goes
to sleep shortly after unplug.
If there is some delay between unplugging and sleeping, then there is
no problem.
I'm attaching tar.gz with two runs: run-1 with the problem and run-2
without. Dumps include output
of dmesg in time of making a snapshot.

Hope this clarifies the situation a bit.

Thank you,
Yuriy.

[-- Attachment #2: dump.tar.gz --]
[-- Type: application/x-gzip, Size: 181264 bytes --]
* Re: linux acpi (thunderbolt? bug)
From: Thomas Gleixner @ 2018-02-18 18:03 UTC
To: Yuriy Vostrikov; +Cc: linux-kernel, x86

On Fri, 16 Feb 2018, Yuriy Vostrikov wrote:
> On 15 February 2018 at 11:52, Thomas Gleixner <tglx@linutronix.de> wrote:
> > Can you please take snapshots from:
> >
> >     /proc/interrupts
> >     /sys/kernel/debug/irq/*
> >
> > right after boot, after the unplug, before suspend and after resume?
> >
>
> Apparently, timing is important: problem manifests if the laptop goes
> to sleep shortly after unplug.
> If there is some delay between unplugging and sleeping, then there is
> no problem.
> I'm attaching tar.gz with two runs: run-1 with the problem and run-2
> without. Dumps include output
> of dmesg in time of making a snapshot.
>
> Hope this clarifies the situation a bit.

Yes. I finally wrapped my brain around it and I can reproduce now after
understanding the root cause. I have no fix yet, but I should have
something for you to test tomorrow.

Thanks for providing all the info.

	tglx
* Re: linux acpi (thunderbolt? bug)
From: Thomas Gleixner @ 2018-02-19 14:51 UTC
To: Yuriy Vostrikov; +Cc: linux-kernel, x86

On Sun, 18 Feb 2018, Thomas Gleixner wrote:
> On Fri, 16 Feb 2018, Yuriy Vostrikov wrote:
> > On 15 February 2018 at 11:52, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > Can you please take snapshots from:
> > >
> > >     /proc/interrupts
> > >     /sys/kernel/debug/irq/*
> > >
> > > right after boot, after the unplug, before suspend and after resume?
> > >
> >
> > Apparently, timing is important: problem manifests if the laptop goes
> > to sleep shortly after unplug.
> > If there is some delay between unplugging and sleeping, then there is
> > no problem.
> > I'm attaching tar.gz with two runs: run-1 with the problem and run-2
> > without. Dumps include output
> > of dmesg in time of making a snapshot.
> >
> > Hope this clarifies the situation a bit.
>
> Yes. I finally wrapped my brain around it and I can reproduce now after
> understanding the root cause. I have no fix yet, but I should have
> something for you to test tomorrow.

The patch below should cure it.

Thanks,

	tglx

8<------------------
Subject: genirq/matrix: Handle CPU offlining proper
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 19 Feb 2018 12:59:34 +0100

Add blurb.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/vector.c |   10 ++++++++++
 kernel/irq/matrix.c           |   23 ++++++++++++++---------
 2 files changed, 24 insertions(+), 9 deletions(-)

--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -134,6 +134,7 @@ static void apic_update_vector(struct ir
 {
 	struct apic_chip_data *apicd = apic_chip_data(irqd);
 	struct irq_desc *desc = irq_data_to_desc(irqd);
+	bool managed = irqd_affinity_is_managed(irqd);
 
 	lockdep_assert_held(&vector_lock);
 
@@ -146,6 +147,15 @@ static void apic_update_vector(struct ir
 		apicd->prev_vector = apicd->vector;
 		apicd->prev_cpu = apicd->cpu;
 	} else {
+		/*
+		 * Offline case: The current vector needs to be released in
+		 * the matrix allocator.
+		 */
+		if (apicd->vector &&
+		    apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
+			irq_matrix_free(vector_matrix, apicd->cpu,
+					apicd->vector, managed);
+		}
 		apicd->prev_vector = 0;
 	}
 
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -16,6 +16,7 @@ struct cpumap {
 	unsigned int		available;
 	unsigned int		allocated;
 	unsigned int		managed;
+	bool			initialized;
 	bool			online;
 	unsigned long		alloc_map[IRQ_MATRIX_SIZE];
 	unsigned long		managed_map[IRQ_MATRIX_SIZE];
@@ -81,9 +82,11 @@ void irq_matrix_online(struct irq_matrix
 
 	BUG_ON(cm->online);
 
-	bitmap_zero(cm->alloc_map, m->matrix_bits);
-	cm->available = m->alloc_size - (cm->managed + m->systembits_inalloc);
-	cm->allocated = 0;
+	if (!cm->initialized) {
+		cm->available = m->alloc_size;
+		cm->available -= cm->managed + m->systembits_inalloc;
+		cm->initialized = true;
+	}
 	m->global_available += cm->available;
 	cm->online = true;
 	m->online_maps++;
@@ -370,14 +373,16 @@ void irq_matrix_free(struct irq_matrix *
 	if (WARN_ON_ONCE(bit < m->alloc_start || bit >= m->alloc_end))
 		return;
 
-	if (cm->online) {
-		clear_bit(bit, cm->alloc_map);
-		cm->allocated--;
+	clear_bit(bit, cm->alloc_map);
+	cm->allocated--;
+
+	if (cm->online)
 		m->total_allocated--;
-		if (!managed) {
-			cm->available++;
+
+	if (!managed) {
+		cm->available++;
+		if (cm->online)
 			m->global_available++;
-		}
 	}
 	trace_irq_matrix_free(bit, cpu, m, cm);
 }
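The irq_matrix_free() hunk in the patch above changes which counters are touched when a vector is freed on an offline CPU: the per-CPU bitmap and counters are always updated, while the global counters are only adjusted when the CPU is online. The following is a simplified user-space model of that flow, assuming a 64-bit bitmap and cut-down structures. The model_* names are hypothetical and exist only for this sketch; the real struct irq_matrix and struct cpumap carry more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the global matrix counters. */
struct model_matrix {
	unsigned int global_available;
	unsigned int total_allocated;
};

/* Hypothetical, cut-down stand-in for the per-CPU map. */
struct model_cpumap {
	unsigned int available;
	unsigned int allocated;
	bool online;
	uint64_t alloc_map;	/* one bit per vector 0..63; the kernel bitmap is larger */
};

/* Number of bits set in the model bitmap. */
unsigned int model_bits_set(uint64_t map)
{
	unsigned int n = 0;

	for (; map != 0; map &= map - 1)	/* clear the lowest set bit */
		n++;
	return n;
}

/* Same order of operations as the patched irq_matrix_free() hunk. */
void model_free(struct model_matrix *m, struct model_cpumap *cm,
		unsigned int bit, bool managed)
{
	assert(bit < 64);

	/* Per-CPU state is updated whether or not the CPU is online. */
	cm->alloc_map &= ~(UINT64_C(1) << bit);
	cm->allocated--;

	/* Global counters only account for online CPUs. */
	if (cm->online)
		m->total_allocated--;

	if (!managed) {
		cm->available++;
		if (cm->online)
			m->global_available++;
	}
}

/* The invariant that was violated in the reported dump. */
bool model_consistent(const struct model_cpumap *cm)
{
	return model_bits_set(cm->alloc_map) == cm->allocated;
}
```

In this model, freeing a vector on an offline CPU leaves model_consistent() true and the global counters untouched, which is the property the "act" column and the vector list should share in the debugfs output.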
* Re: linux acpi (thunderbolt? bug)
From: Randy Dunlap @ 2018-02-19 17:18 UTC
To: Thomas Gleixner, Yuriy Vostrikov; +Cc: linux-kernel, x86

On 02/19/18 06:51, Thomas Gleixner wrote:
>  	} else {
> +		/*
> +		 * Offline case: The current vector needs to be released in
> +		 * the matrix allocator.
> +		 */
> +		if (apicd->vector &&

Drop the "apicd->vector &&" ? (redundant)

> +		    apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
> +			irq_matrix_free(vector_matrix, apicd->cpu,
> +					apicd->vector, managed);
> +		}
>  		apicd->prev_vector = 0;
>  	}

-- 
~Randy
* Re: linux acpi (thunderbolt? bug)
From: Thomas Gleixner @ 2018-02-19 17:26 UTC
To: Randy Dunlap; +Cc: Yuriy Vostrikov, linux-kernel, x86

On Mon, 19 Feb 2018, Randy Dunlap wrote:

> On 02/19/18 06:51, Thomas Gleixner wrote:
> >  	} else {
> > +		/*
> > +		 * Offline case: The current vector needs to be released in
> > +		 * the matrix allocator.
> > +		 */
> > +		if (apicd->vector &&
>
> Drop the "apicd->vector &&" ? (redundant)

No. The else path is entered when

   apicd->vector == 0

or

   apicd->cpu is offline

So we need to check here for vector != 0 as otherwise we'd free vector 0
which is invalid. I'll add a comment explaining the mess.

> > +		    apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
> > +			irq_matrix_free(vector_matrix, apicd->cpu,
> > +					apicd->vector, managed);
> > +		}
> >  		apicd->prev_vector = 0;
> >  	}

Thanks,

	tglx
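The two conditions explained above can be written out as a small decision function. This is only a sketch of the reasoning in the reply, not the kernel code: model_classify() and the enum are hypothetical names, and the shutdown sentinel is passed in as a parameter because the actual value of MANAGED_IRQ_SHUTDOWN_VECTOR is not given in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for one call of the update path. */
enum model_action {
	MODEL_KEEP_AS_PREVIOUS,		/* vector set and CPU online: the "if" path */
	MODEL_RELEASE_NOW,		/* else path, a real vector is freed */
	MODEL_NOTHING_TO_RELEASE	/* else path, vector is 0 or the sentinel */
};

/*
 * The else path is entered when vector == 0 or the CPU is offline, so the
 * "if" path is its negation: vector != 0 and the CPU is online.
 */
enum model_action model_classify(unsigned int vector, bool cpu_online,
				 unsigned int shutdown_vector)
{
	assert(shutdown_vector != 0);	/* the sentinel must differ from "no vector" */

	if (vector != 0 && cpu_online)
		return MODEL_KEEP_AS_PREVIOUS;

	/* Both checks are needed here: vector 0 may not be freed. */
	if (vector != 0 && vector != shutdown_vector)
		return MODEL_RELEASE_NOW;

	return MODEL_NOTHING_TO_RELEASE;
}
```

Without the vector != 0 test, a call with vector 0 would fall into the release case, which is the invalid free described in the reply.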
* Re: linux acpi (thunderbolt? bug)
From: Randy Dunlap @ 2018-02-19 17:42 UTC
To: Thomas Gleixner; +Cc: Yuriy Vostrikov, linux-kernel, x86

On 02/19/18 09:26, Thomas Gleixner wrote:
> On Mon, 19 Feb 2018, Randy Dunlap wrote:
>
>> On 02/19/18 06:51, Thomas Gleixner wrote:
>>>  	} else {
>>> +		/*
>>> +		 * Offline case: The current vector needs to be released in
>>> +		 * the matrix allocator.
>>> +		 */
>>> +		if (apicd->vector &&
>>
>> Drop the "apicd->vector &&" ? (redundant)
>
> No. The else path is entered when
>
>    apicd->vector == 0
>
> or
>
>    apicd->cpu is offline
>
> So we need to check here for vector != 0 as otherwise we'd free vector 0
> which is invalid. I'll add a comment explaining the mess.
>
>>> +		    apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
>>> +			irq_matrix_free(vector_matrix, apicd->cpu,
>>> +					apicd->vector, managed);
>>> +		}
>>>  		apicd->prev_vector = 0;
>>>  	}

I see. Thanks.

-- 
~Randy
* Re: linux acpi (thunderbolt? bug)
From: Randy Dunlap @ 2018-02-19 17:46 UTC
To: Thomas Gleixner; +Cc: Yuriy Vostrikov, linux-kernel, x86

On 02/19/18 09:26, Thomas Gleixner wrote:
> On Mon, 19 Feb 2018, Randy Dunlap wrote:
>
>> On 02/19/18 06:51, Thomas Gleixner wrote:
>>>  	} else {
>>> +		/*
>>> +		 * Offline case: The current vector needs to be released in
>>> +		 * the matrix allocator.
>>> +		 */
>>> +		if (apicd->vector &&
>>
>> Drop the "apicd->vector &&" ? (redundant)
>
> No. The else path is entered when
>
>    apicd->vector == 0
>
> or
>
>    apicd->cpu is offline
>
> So we need to check here for vector != 0 as otherwise we'd free vector 0
> which is invalid. I'll add a comment explaining the mess.

I just need more coffee.  I read the != (below) as ==.  :(

>>> +		    apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
>>> +			irq_matrix_free(vector_matrix, apicd->cpu,
>>> +					apicd->vector, managed);
>>> +		}
>>>  		apicd->prev_vector = 0;
>>>  	}

-- 
~Randy
* Re: linux acpi (thunderbolt? bug)
From: Yuriy Vostrikov @ 2018-02-21  8:45 UTC
To: Thomas Gleixner; +Cc: linux-kernel, x86

On 19 February 2018 at 17:51, Thomas Gleixner <tglx@linutronix.de> wrote:
> The patch below should cure it.

After applying the patch I'm no longer able to reproduce the bug.

Thank you!
Yuriy
end of thread, other threads:[~2018-02-21  8:45 UTC | newest]

Thread overview: 11+ messages
     [not found] <CABbc0=RMYWGN0L=z_Y=FuZJUDzD5NVa2XBTVnmpZxX5tnk3-5g@mail.gmail.com>
2018-02-14 12:09 ` linux acpi (thunderbolt? bug) Thomas Gleixner
2018-02-14 13:55   ` Andy Shevchenko
2018-02-15  8:52   ` Thomas Gleixner
2018-02-16 15:23     ` Yuriy Vostrikov
2018-02-18 18:03       ` Thomas Gleixner
2018-02-19 14:51         ` Thomas Gleixner
2018-02-19 17:18           ` Randy Dunlap
2018-02-19 17:26             ` Thomas Gleixner
2018-02-19 17:42               ` Randy Dunlap
2018-02-19 17:46               ` Randy Dunlap
2018-02-21  8:45           ` Yuriy Vostrikov