All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: linux acpi (thunderbolt? bug)
       [not found] <CABbc0=RMYWGN0L=z_Y=FuZJUDzD5NVa2XBTVnmpZxX5tnk3-5g@mail.gmail.com>
@ 2018-02-14 12:09 ` Thomas Gleixner
  2018-02-14 13:55   ` Andy Shevchenko
  2018-02-15  8:52   ` Thomas Gleixner
  0 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2018-02-14 12:09 UTC (permalink / raw)
  To: Yuriy Vostrikov; +Cc: linux-kernel, x86

On Wed, 14 Feb 2018, Yuriy Vostrikov wrote:
> after boot
> name:   VECTOR
>  size:   0
>  mapped: 64
>  flags:  0x00000041
> Online bitmaps:        2
> Global available:    368
> Global reserved:      26
> Total allocated:      38
> System: 41: 0-19,32,50,128,238-255
>  | CPU | avl | man | act | vectors
>      0   184     1    19  33-49,51-52
>      1   184     1    19  33-49,51-52
> 
> after unplug
> name:   VECTOR
>  size:   0
>  mapped: 55
>  flags:  0x00000041
> Online bitmaps:        2
> Global available:    377
> Global reserved:      26
> Total allocated:      29
> System: 41: 0-19,32,50,128,238-255
>  | CPU | avl | man | act | vectors
>      0   188     1    15  33-46,48
>      1   189     1    14  33-46
> 
> after sleep 1 time
> name:   VECTOR
>  size:   0
>  mapped: 35
>  flags:  0x00000041
> Online bitmaps:        2
> Global available:    385
> Global reserved:      12
> Total allocated:      32
> System: 41: 0-19,32,50,128,238-255
>  | CPU | avl | man | act | vectors
>      0   185     1    18  33-43,46-49,51-53
>      1   200     1     3  33-37

The accounting is already screwed. CPU1 claims to have 3 allocated vectors,
but the allocation bitmap has 5 bits set !?! 

I have no idea yet how that can happen. Lemme stare into the code some more
and I came back to you.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux acpi (thunderbolt? bug)
  2018-02-14 12:09 ` linux acpi (thunderbolt? bug) Thomas Gleixner
@ 2018-02-14 13:55   ` Andy Shevchenko
  2018-02-15  8:52   ` Thomas Gleixner
  1 sibling, 0 replies; 11+ messages in thread
From: Andy Shevchenko @ 2018-02-14 13:55 UTC (permalink / raw)
  To: Thomas Gleixner, Mika Westerberg
  Cc: Yuriy Vostrikov, Linux Kernel Mailing List, x86

+Cc: Mika -- the Thunderbolt guy.

On Wed, Feb 14, 2018 at 2:09 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 14 Feb 2018, Yuriy Vostrikov wrote:
>> after boot
>> name:   VECTOR
>>  size:   0
>>  mapped: 64
>>  flags:  0x00000041
>> Online bitmaps:        2
>> Global available:    368
>> Global reserved:      26
>> Total allocated:      38
>> System: 41: 0-19,32,50,128,238-255
>>  | CPU | avl | man | act | vectors
>>      0   184     1    19  33-49,51-52
>>      1   184     1    19  33-49,51-52
>>
>> after unplug
>> name:   VECTOR
>>  size:   0
>>  mapped: 55
>>  flags:  0x00000041
>> Online bitmaps:        2
>> Global available:    377
>> Global reserved:      26
>> Total allocated:      29
>> System: 41: 0-19,32,50,128,238-255
>>  | CPU | avl | man | act | vectors
>>      0   188     1    15  33-46,48
>>      1   189     1    14  33-46
>>
>> after sleep 1 time
>> name:   VECTOR
>>  size:   0
>>  mapped: 35
>>  flags:  0x00000041
>> Online bitmaps:        2
>> Global available:    385
>> Global reserved:      12
>> Total allocated:      32
>> System: 41: 0-19,32,50,128,238-255
>>  | CPU | avl | man | act | vectors
>>      0   185     1    18  33-43,46-49,51-53
>>      1   200     1     3  33-37
>
> The accounting is already screwed. CPU1 claims to have 3 allocated vectors,
> but the allocation bitmap has 5 bits set !?!
>
> I have no idea yet how that can happen. Lemme stare into the code some more
> and I came back to you.
>
> Thanks,
>
>         tglx



-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux acpi (thunderbolt? bug)
  2018-02-14 12:09 ` linux acpi (thunderbolt? bug) Thomas Gleixner
  2018-02-14 13:55   ` Andy Shevchenko
@ 2018-02-15  8:52   ` Thomas Gleixner
  2018-02-16 15:23     ` Yuriy Vostrikov
  1 sibling, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2018-02-15  8:52 UTC (permalink / raw)
  To: Yuriy Vostrikov; +Cc: linux-kernel, x86

On Wed, 14 Feb 2018, Thomas Gleixner wrote:
> On Wed, 14 Feb 2018, Yuriy Vostrikov wrote:
> > after sleep 1 time
> > name:   VECTOR
> >  size:   0
> >  mapped: 35
> >  flags:  0x00000041
> > Online bitmaps:        2
> > Global available:    385
> > Global reserved:      12
> > Total allocated:      32
> > System: 41: 0-19,32,50,128,238-255
> >  | CPU | avl | man | act | vectors
> >      0   185     1    18  33-43,46-49,51-53
> >      1   200     1     3  33-37
> 
> The accounting is already screwed. CPU1 claims to have 3 allocated vectors,
> but the allocation bitmap has 5 bits set !?! 
> 
> I have no idea yet how that can happen. Lemme stare into the code some more
> and I came back to you.

Still confused.

Does this happen if you boot w/o the thunderbolt thingy and then do
suspend/resume cycles?

Can you please take snapshots from:

    /proc/interrupts
    /sys/kernel/debug/irq/*

right after boot, after the unplug, before suspend and after resume?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux acpi (thunderbolt? bug)
  2018-02-15  8:52   ` Thomas Gleixner
@ 2018-02-16 15:23     ` Yuriy Vostrikov
  2018-02-18 18:03       ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Yuriy Vostrikov @ 2018-02-16 15:23 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, x86

[-- Attachment #1: Type: text/plain, Size: 1464 bytes --]

On 15 February 2018 at 11:52, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 14 Feb 2018, Thomas Gleixner wrote:
>> On Wed, 14 Feb 2018, Yuriy Vostrikov wrote:
>> > after sleep 1 time
>> > name:   VECTOR
>> >  size:   0
>> >  mapped: 35
>> >  flags:  0x00000041
>> > Online bitmaps:        2
>> > Global available:    385
>> > Global reserved:      12
>> > Total allocated:      32
>> > System: 41: 0-19,32,50,128,238-255
>> >  | CPU | avl | man | act | vectors
>> >      0   185     1    18  33-43,46-49,51-53
>> >      1   200     1     3  33-37
>>
>> The accounting is already screwed. CPU1 claims to have 3 allocated vectors,
>> but the allocation bitmap has 5 bits set !?!
>>
>> I have no idea yet how that can happen. Lemme stare into the code some more
>> and I came back to you.
>
> Still confused.
>
> Does this happen if you boot w/o the thunderbolt thingy and then do
> suspend/resume cycles?
>
> Can you please take snapshots from:
>
>     /proc/interrupts
>     /sys/kernel/debug/irq/*
>
> right after boot, after the unplug, before suspend and after resume?
>

Apparently, timing is important: problem manifests if the laptop goes
to sleep shortly after unplug.
If there is some delay between unplugging and sleeping, then there is
no problem.
I'm attaching tar.gz with two runs: run-1 with the problem and run-2
without. Dumps include output
of dmesg in time of making a snapshot.

Hope this clarifies the situation a bit.

Thank you,
Yuriy.

[-- Attachment #2: dump.tar.gz --]
[-- Type: application/x-gzip, Size: 181264 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux acpi (thunderbolt? bug)
  2018-02-16 15:23     ` Yuriy Vostrikov
@ 2018-02-18 18:03       ` Thomas Gleixner
  2018-02-19 14:51         ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2018-02-18 18:03 UTC (permalink / raw)
  To: Yuriy Vostrikov; +Cc: linux-kernel, x86

On Fri, 16 Feb 2018, Yuriy Vostrikov wrote:
> On 15 February 2018 at 11:52, Thomas Gleixner <tglx@linutronix.de> wrote:
> > Can you please take snapshots from:
> >
> >     /proc/interrupts
> >     /sys/kernel/debug/irq/*
> >
> > right after boot, after the unplug, before suspend and after resume?
> >
> 
> Apparently, timing is important: problem manifests if the laptop goes
> to sleep shortly after unplug.
> If there is some delay between unplugging and sleeping, then there is
> no problem.
> I'm attaching tar.gz with two runs: run-1 with the problem and run-2
> without. Dumps include output
> of dmesg in time of making a snapshot.
> 
> Hope this clarifies the situation a bit.

Yes. I finally wrapped my brain around it and I can reproduce now after
understanding the root cause. I have no fix yet, but I should have
something for you to test tomorrow.

Thanks for providing all the info.

       tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux acpi (thunderbolt? bug)
  2018-02-18 18:03       ` Thomas Gleixner
@ 2018-02-19 14:51         ` Thomas Gleixner
  2018-02-19 17:18           ` Randy Dunlap
  2018-02-21  8:45           ` Yuriy Vostrikov
  0 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2018-02-19 14:51 UTC (permalink / raw)
  To: Yuriy Vostrikov; +Cc: linux-kernel, x86

On Sun, 18 Feb 2018, Thomas Gleixner wrote:
> On Fri, 16 Feb 2018, Yuriy Vostrikov wrote:
> > On 15 February 2018 at 11:52, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > Can you please take snapshots from:
> > >
> > >     /proc/interrupts
> > >     /sys/kernel/debug/irq/*
> > >
> > > right after boot, after the unplug, before suspend and after resume?
> > >
> > 
> > Apparently, timing is important: problem manifests if the laptop goes
> > to sleep shortly after unplug.
> > If there is some delay between unplugging and sleeping, then there is
> > no problem.
> > I'm attaching tar.gz with two runs: run-1 with the problem and run-2
> > without. Dumps include output
> > of dmesg in time of making a snapshot.
> > 
> > Hope this clarifies the situation a bit.
> 
> Yes. I finally wrapped my brain around it and I can reproduce now after
> understanding the root cause. I have no fix yet, but I should have
> something for you to test tomorrow.

The patch below should cure it.

Thanks,

	tglx

8<------------------

Subject: genirq/matrix: Handle CPU offlining proper
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 19 Feb 2018 12:59:34 +0100

Add blurb.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/vector.c |   10 ++++++++++
 kernel/irq/matrix.c           |   23 ++++++++++++++---------
 2 files changed, 24 insertions(+), 9 deletions(-)

--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -134,6 +134,7 @@ static void apic_update_vector(struct ir
 {
 	struct apic_chip_data *apicd = apic_chip_data(irqd);
 	struct irq_desc *desc = irq_data_to_desc(irqd);
+	bool managed = irqd_affinity_is_managed(irqd);
 
 	lockdep_assert_held(&vector_lock);
 
@@ -146,6 +147,15 @@ static void apic_update_vector(struct ir
 		apicd->prev_vector = apicd->vector;
 		apicd->prev_cpu = apicd->cpu;
 	} else {
+		/*
+		 * Offline case: The current vector needs to be released in
+		 * the matrix allocator.
+		 */
+		if (apicd->vector &&
+		    apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
+			irq_matrix_free(vector_matrix, apicd->cpu,
+					apicd->vector, managed);
+		}
 		apicd->prev_vector = 0;
 	}
 
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -16,6 +16,7 @@ struct cpumap {
 	unsigned int		available;
 	unsigned int		allocated;
 	unsigned int		managed;
+	bool			initialized;
 	bool			online;
 	unsigned long		alloc_map[IRQ_MATRIX_SIZE];
 	unsigned long		managed_map[IRQ_MATRIX_SIZE];
@@ -81,9 +82,11 @@ void irq_matrix_online(struct irq_matrix
 
 	BUG_ON(cm->online);
 
-	bitmap_zero(cm->alloc_map, m->matrix_bits);
-	cm->available = m->alloc_size - (cm->managed + m->systembits_inalloc);
-	cm->allocated = 0;
+	if (!cm->initialized) {
+		cm->available = m->alloc_size;
+		cm->available -= cm->managed + m->systembits_inalloc;
+		cm->initialized = true;
+	}
 	m->global_available += cm->available;
 	cm->online = true;
 	m->online_maps++;
@@ -370,14 +373,16 @@ void irq_matrix_free(struct irq_matrix *
 	if (WARN_ON_ONCE(bit < m->alloc_start || bit >= m->alloc_end))
 		return;
 
-	if (cm->online) {
-		clear_bit(bit, cm->alloc_map);
-		cm->allocated--;
+	clear_bit(bit, cm->alloc_map);
+	cm->allocated--;
+
+	if (cm->online)
 		m->total_allocated--;
-		if (!managed) {
-			cm->available++;
+
+	if (!managed) {
+		cm->available++;
+		if (cm->online)
 			m->global_available++;
-		}
 	}
 	trace_irq_matrix_free(bit, cpu, m, cm);
 }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux acpi (thunderbolt? bug)
  2018-02-19 14:51         ` Thomas Gleixner
@ 2018-02-19 17:18           ` Randy Dunlap
  2018-02-19 17:26             ` Thomas Gleixner
  2018-02-21  8:45           ` Yuriy Vostrikov
  1 sibling, 1 reply; 11+ messages in thread
From: Randy Dunlap @ 2018-02-19 17:18 UTC (permalink / raw)
  To: Thomas Gleixner, Yuriy Vostrikov; +Cc: linux-kernel, x86

On 02/19/18 06:51, Thomas Gleixner wrote:
>  	} else {
> +		/*
> +		 * Offline case: The current vector needs to be released in
> +		 * the matrix allocator.
> +		 */
> +		if (apicd->vector &&

Drop the "apicd->vector &&" ? (redundant)

> +		    apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
> +			irq_matrix_free(vector_matrix, apicd->cpu,
> +					apicd->vector, managed);
> +		}
>  		apicd->prev_vector = 0;
>  	}


-- 
~Randy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux acpi (thunderbolt? bug)
  2018-02-19 17:18           ` Randy Dunlap
@ 2018-02-19 17:26             ` Thomas Gleixner
  2018-02-19 17:42               ` Randy Dunlap
  2018-02-19 17:46               ` Randy Dunlap
  0 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2018-02-19 17:26 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Yuriy Vostrikov, linux-kernel, x86

On Mon, 19 Feb 2018, Randy Dunlap wrote:

> On 02/19/18 06:51, Thomas Gleixner wrote:
> >  	} else {
> > +		/*
> > +		 * Offline case: The current vector needs to be released in
> > +		 * the matrix allocator.
> > +		 */
> > +		if (apicd->vector &&
> 
> Drop the "apicd->vector &&" ? (redundant)

No. The else path is entered when

    apicd->vector == 0

or

    apicd->cpu is offline

So we need to check here for vector != 0 as otherwise we'd free vector 0
which is invalid. I'll add a comment explaining the mess.

> > +		    apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
> > +			irq_matrix_free(vector_matrix, apicd->cpu,
> > +					apicd->vector, managed);
> > +		}
> >  		apicd->prev_vector = 0;
> >  	}

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux acpi (thunderbolt? bug)
  2018-02-19 17:26             ` Thomas Gleixner
@ 2018-02-19 17:42               ` Randy Dunlap
  2018-02-19 17:46               ` Randy Dunlap
  1 sibling, 0 replies; 11+ messages in thread
From: Randy Dunlap @ 2018-02-19 17:42 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Yuriy Vostrikov, linux-kernel, x86

On 02/19/18 09:26, Thomas Gleixner wrote:
> On Mon, 19 Feb 2018, Randy Dunlap wrote:
> 
>> On 02/19/18 06:51, Thomas Gleixner wrote:
>>>  	} else {
>>> +		/*
>>> +		 * Offline case: The current vector needs to be released in
>>> +		 * the matrix allocator.
>>> +		 */
>>> +		if (apicd->vector &&
>>
>> Drop the "apicd->vector &&" ? (redundant)
> 
> No. The else path is entered when
> 
>     apicd->vector == 0
> 
> or
> 
>     apicd->cpu is offline
> 
> So we need to check here for vector != 0 as otherwise we'd free vector 0
> which is invalid. I'll add a comment explaining the mess.
> 
>>> +		    apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
>>> +			irq_matrix_free(vector_matrix, apicd->cpu,
>>> +					apicd->vector, managed);
>>> +		}
>>>  		apicd->prev_vector = 0;
>>>  	}

I see. Thanks.

-- 
~Randy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux acpi (thunderbolt? bug)
  2018-02-19 17:26             ` Thomas Gleixner
  2018-02-19 17:42               ` Randy Dunlap
@ 2018-02-19 17:46               ` Randy Dunlap
  1 sibling, 0 replies; 11+ messages in thread
From: Randy Dunlap @ 2018-02-19 17:46 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Yuriy Vostrikov, linux-kernel, x86

On 02/19/18 09:26, Thomas Gleixner wrote:
> On Mon, 19 Feb 2018, Randy Dunlap wrote:
> 
>> On 02/19/18 06:51, Thomas Gleixner wrote:
>>>  	} else {
>>> +		/*
>>> +		 * Offline case: The current vector needs to be released in
>>> +		 * the matrix allocator.
>>> +		 */
>>> +		if (apicd->vector &&
>>
>> Drop the "apicd->vector &&" ? (redundant)
> 
> No. The else path is entered when
> 
>     apicd->vector == 0
> 
> or
> 
>     apicd->cpu is offline
> 
> So we need to check here for vector != 0 as otherwise we'd free vector 0
> which is invalid. I'll add a comment explaining the mess.

I just need more coffee.  I read the != (below) as ==.
:(

>>> +		    apicd->vector != MANAGED_IRQ_SHUTDOWN_VECTOR) {
>>> +			irq_matrix_free(vector_matrix, apicd->cpu,
>>> +					apicd->vector, managed);
>>> +		}
>>>  		apicd->prev_vector = 0;
>>>  	}


-- 
~Randy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: linux acpi (thunderbolt? bug)
  2018-02-19 14:51         ` Thomas Gleixner
  2018-02-19 17:18           ` Randy Dunlap
@ 2018-02-21  8:45           ` Yuriy Vostrikov
  1 sibling, 0 replies; 11+ messages in thread
From: Yuriy Vostrikov @ 2018-02-21  8:45 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, x86

On 19 February 2018 at 17:51, Thomas Gleixner <tglx@linutronix.de> wrote:
> The patch below should cure it.

After applying the patch I'm no longer unable to reproduce the bug. Thank you!

Yuriy

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-02-21  8:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CABbc0=RMYWGN0L=z_Y=FuZJUDzD5NVa2XBTVnmpZxX5tnk3-5g@mail.gmail.com>
2018-02-14 12:09 ` linux acpi (thunderbolt? bug) Thomas Gleixner
2018-02-14 13:55   ` Andy Shevchenko
2018-02-15  8:52   ` Thomas Gleixner
2018-02-16 15:23     ` Yuriy Vostrikov
2018-02-18 18:03       ` Thomas Gleixner
2018-02-19 14:51         ` Thomas Gleixner
2018-02-19 17:18           ` Randy Dunlap
2018-02-19 17:26             ` Thomas Gleixner
2018-02-19 17:42               ` Randy Dunlap
2018-02-19 17:46               ` Randy Dunlap
2018-02-21  8:45           ` Yuriy Vostrikov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.