All of lore.kernel.org
 help / color / mirror / Atom feed
* ACPI events broken on non-SMP since 3.16
@ 2015-04-05 14:43 Jim Bos
  2015-04-07 14:34 ` [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 Jiang Liu
  0 siblings, 1 reply; 17+ messages in thread
From: Jim Bos @ 2015-04-05 14:43 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: jiang.liu, tglx


Hello,

Since around 3.16 ACPI events aren't working on a single processor
system, I found this on an old Dell laptop Pentium-M and on AMD geode board.
By ACPI events I mean, acpid (via netlink) and acpi_listen doesn't give
anything when pressing ACPI power button.

Simply 'nosmp' on recent kernel on modern 2-core Celeron system also
stops acpid from working.
Note that booting an SMP kernel with single core isn't enough to break
this, smp really needs to be disabled by 'nosmp' or running it on a
older motherboard without I/O apic ('smbboot: SMP disabled' in dmesg).

With some problems managed to bisect this using VirtualBox guest, which
allows to to specify number of CPU's and having yes/no an I/O Apic.

It bisects to this commit which unfortunately doesn't revert cleanly on
latest 4.0-rcX:

commit 16ee7b3dcc56be14b9a813612cff2cc2339cdced
Author: Jiang Liu <jiang.liu@linux.intel.com>
Date:   Mon Jun 9 16:20:04 2014 +0800

    x86, irq: Simplify the way to handle ISA IRQ

    On startup, setup_IO_APIC_irqs() will program all IOAPIC pins for ISA
    IRQs. Later when mp_map_pin_to_irq() is called, it just returns ISA IRQ
    number without programming corresponding IOAPIC pin.

    This patch consolidates the way to program IOAPIC pins for both ISA and
    non-ISA IRQs into mp_map_pin_to_irq() as below:
    1) For ISA IRQs, mp_irqs array is used to map IOAPIC pin to IRQ and
       mp_irqdomain_map() is used to actually program the pin.
    2) For non-ISA IRQs, irqdomain is used to map IOAPIC pin to IRQ, and
       mp_irqdomain_map() is also used to actually program the pin.


Any ideas ?
_
Jim

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56
  2015-04-05 14:43 ACPI events broken on non-SMP since 3.16 Jim Bos
@ 2015-04-07 14:34 ` Jiang Liu
  2015-04-07 16:49   ` Jim Bos
  0 siblings, 1 reply; 17+ messages in thread
From: Jiang Liu @ 2015-04-07 14:34 UTC (permalink / raw)
  To: jim876, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
  Cc: Linux Kernel Mailing List, Jiang Liu, linux-pm

Hi Jim,
	Could you please help to test this patch against v4.0-rc6?
Thanks!
Gerry

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/kernel/acpi/boot.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 803b684676ff..f7f1fe7cd1b0 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -403,10 +403,14 @@ static int mp_config_acpi_gsi(struct device *dev, u32 gsi, int trigger,
 static int mp_register_gsi(struct device *dev, u32 gsi, int trigger,
 			   int polarity)
 {
-	int irq, node;
+	int i, irq, node;
 
-	if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC)
-		return gsi;
+	if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC) {
+		for (i = 0; i < nr_legacy_irqs(); i++)
+			if (isa_irq_to_gsi[i] == gsi)
+				return i;
+		return -1;
+	}
 
 	trigger = trigger == ACPI_EDGE_SENSITIVE ? 0 : 1;
 	polarity = polarity == ACPI_ACTIVE_HIGH ? 0 : 1;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56
  2015-04-07 14:34 ` [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 Jiang Liu
@ 2015-04-07 16:49   ` Jim Bos
  2015-04-08  5:26     ` Jiang Liu
  0 siblings, 1 reply; 17+ messages in thread
From: Jim Bos @ 2015-04-07 16:49 UTC (permalink / raw)
  To: Jiang Liu, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
  Cc: Linux Kernel Mailing List, linux-pm

[-- Attachment #1: Type: text/plain, Size: 1571 bytes --]

On 04/07/2015 04:34 PM, Jiang Liu wrote:
> Hi Jim,
> 	Could you please help to test this patch against v4.0-rc6?
> Thanks!
> Gerry
> 
> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
> ---
>  arch/x86/kernel/acpi/boot.c |   10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 803b684676ff..f7f1fe7cd1b0 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -403,10 +403,14 @@ static int mp_config_acpi_gsi(struct device *dev, u32 gsi, int trigger,
>  static int mp_register_gsi(struct device *dev, u32 gsi, int trigger,
>  			   int polarity)
>  {
> -	int irq, node;
> +	int i, irq, node;
>  
> -	if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC)
> -		return gsi;
> +	if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC) {
> +		for (i = 0; i < nr_legacy_irqs(); i++)
> +			if (isa_irq_to_gsi[i] == gsi)
> +				return i;
> +		return -1;
> +	}
>  
>  	trigger = trigger == ACPI_EDGE_SENSITIVE ? 0 : 1;
>  	polarity = polarity == ACPI_ACTIVE_HIGH ? 0 : 1;
> 

Jiang,

It definitely seems to be an improvement, using Virtualbox guest with
your patch applied acpi-events work for all combinations (smp/nosmp
with/without I/O APIC assigned to the guest).

However, on the Dell laptop it still doesn't work.  To be sure I built a
3.16 kernel on this laptop and acpi_event power-button lid close/open
are working just fine.

Attached config + dmesg + cat /proc/interrupt for the working 3.16 case
and still not working 4.0-rc6+patch case.

Thanks,
Jim








[-- Attachment #2: 3.16-OK.tgz --]
[-- Type: application/x-compressed-tar, Size: 32855 bytes --]

[-- Attachment #3: 4.0rc6-NOT_OK.tgz --]
[-- Type: application/x-compressed-tar, Size: 33681 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56
  2015-04-07 16:49   ` Jim Bos
@ 2015-04-08  5:26     ` Jiang Liu
  2015-04-08 15:51       ` Jim Bos
  0 siblings, 1 reply; 17+ messages in thread
From: Jiang Liu @ 2015-04-08  5:26 UTC (permalink / raw)
  To: Jim Bos, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
  Cc: Linux Kernel Mailing List, linux-pm

On 2015/4/8 0:49, Jim Bos wrote:
> On 04/07/2015 04:34 PM, Jiang Liu wrote:
>> Hi Jim,
>> 	Could you please help to test this patch against v4.0-rc6?
>> Thanks!
>> Gerry
>>
>> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
>> ---
>>  arch/x86/kernel/acpi/boot.c |   10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>> index 803b684676ff..f7f1fe7cd1b0 100644
>> --- a/arch/x86/kernel/acpi/boot.c
>> +++ b/arch/x86/kernel/acpi/boot.c
>> @@ -403,10 +403,14 @@ static int mp_config_acpi_gsi(struct device *dev, u32 gsi, int trigger,
>>  static int mp_register_gsi(struct device *dev, u32 gsi, int trigger,
>>  			   int polarity)
>>  {
>> -	int irq, node;
>> +	int i, irq, node;
>>  
>> -	if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC)
>> -		return gsi;
>> +	if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC) {
>> +		for (i = 0; i < nr_legacy_irqs(); i++)
>> +			if (isa_irq_to_gsi[i] == gsi)
>> +				return i;
>> +		return -1;
>> +	}
>>  
>>  	trigger = trigger == ACPI_EDGE_SENSITIVE ? 0 : 1;
>>  	polarity = polarity == ACPI_ACTIVE_HIGH ? 0 : 1;
>>
> 
> Jiang,
> 
> It definitely seems to be an improvement, using Virtualbox guest with
> your patch applied acpi-events work for all combinations (smp/nosmp
> with/without I/O APIC assigned to the guest).
> 
> However, on the Dell laptop it still doesn't work.  To be sure I built a
> 3.16 kernel on this laptop and acpi_event power-button lid close/open
> are working just fine.
> 
> Attached config + dmesg + cat /proc/interrupt for the working 3.16 case
> and still not working 4.0-rc6+patch case.
Hi Jim,
	According to the attached files, you are building a UP kernel
with IOAPIC enabled. This configuration works well on my HP laptop.
And according to file IRQs from 4.0-rc6, it shows:
 9:          1    XT-PIC  acpi
That means kernel has received one ACPI SCI interrupt, but no
following-on ACPI SCI interrupts, I can't figure out the root cause yet.

So could you please help to dump ACPI tables from your dell laptop by
using acpidump utility?
Thanks!
Gerry

> 
> Thanks,
> Jim
> 
> 
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56
  2015-04-08  5:26     ` Jiang Liu
@ 2015-04-08 15:51       ` Jim Bos
  2015-04-09 10:15         ` Jiang Liu
  0 siblings, 1 reply; 17+ messages in thread
From: Jim Bos @ 2015-04-08 15:51 UTC (permalink / raw)
  To: Jiang Liu, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
  Cc: Linux Kernel Mailing List, linux-pm

[-- Attachment #1: Type: text/plain, Size: 2559 bytes --]

On 04/08/2015 07:26 AM, Jiang Liu wrote:
> On 2015/4/8 0:49, Jim Bos wrote:
>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
>>> Hi Jim,
>>> 	Could you please help to test this patch against v4.0-rc6?
>>> Thanks!
>>> Gerry
>>>
>>> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
>>> ---
>>>  arch/x86/kernel/acpi/boot.c |   10 +++++++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>>> index 803b684676ff..f7f1fe7cd1b0 100644
>>> --- a/arch/x86/kernel/acpi/boot.c
>>> +++ b/arch/x86/kernel/acpi/boot.c
>>> @@ -403,10 +403,14 @@ static int mp_config_acpi_gsi(struct device *dev, u32 gsi, int trigger,
>>>  static int mp_register_gsi(struct device *dev, u32 gsi, int trigger,
>>>  			   int polarity)
>>>  {
>>> -	int irq, node;
>>> +	int i, irq, node;
>>>  
>>> -	if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC)
>>> -		return gsi;
>>> +	if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC) {
>>> +		for (i = 0; i < nr_legacy_irqs(); i++)
>>> +			if (isa_irq_to_gsi[i] == gsi)
>>> +				return i;
>>> +		return -1;
>>> +	}
>>>  
>>>  	trigger = trigger == ACPI_EDGE_SENSITIVE ? 0 : 1;
>>>  	polarity = polarity == ACPI_ACTIVE_HIGH ? 0 : 1;
>>>
>>
>> Jiang,
>>
>> It definitely seems to be an improvement, using Virtualbox guest with
>> your patch applied acpi-events work for all combinations (smp/nosmp
>> with/without I/O APIC assigned to the guest).
>>
>> However, on the Dell laptop it still doesn't work.  To be sure I built a
>> 3.16 kernel on this laptop and acpi_event power-button lid close/open
>> are working just fine.
>>
>> Attached config + dmesg + cat /proc/interrupt for the working 3.16 case
>> and still not working 4.0-rc6+patch case.
> Hi Jim,
> 	According to the attached files, you are building a UP kernel
> with IOAPIC enabled. This configuration works well on my HP laptop.
> And according to file IRQs from 4.0-rc6, it shows:
>  9:          1    XT-PIC  acpi
> That means kernel has received one ACPI SCI interrupt, but no
> following-on ACPI SCI interrupts, I can't figure out the root cause yet.
> 
> So could you please help to dump ACPI tables from your dell laptop by
> using acpidump utility?
> Thanks!
> Gerry
> 
>>
>> Thanks,
>> Jim
>>

Gerry,

Attached acpidump (binary files and regular dump).
I just tested 4.0-rc6+your_patch on another single core system, AMD
geode board, and that works fine now!
So indeed it seems there is something special about the dell laptop as
that's the only system, I've available here, which still has an issue.
_
Jim


[-- Attachment #2: acpi-dell.tgz --]
[-- Type: application/x-compressed-tar, Size: 20404 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56
  2015-04-08 15:51       ` Jim Bos
@ 2015-04-09 10:15         ` Jiang Liu
  2015-04-09 16:41           ` Jim Bos
  0 siblings, 1 reply; 17+ messages in thread
From: Jiang Liu @ 2015-04-09 10:15 UTC (permalink / raw)
  To: Jim Bos, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
  Cc: Linux Kernel Mailing List, linux-pm

On 2015/4/8 23:51, Jim Bos wrote:
> On 04/08/2015 07:26 AM, Jiang Liu wrote:
>> On 2015/4/8 0:49, Jim Bos wrote:
>>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
>>>> Hi Jim,
>>>> 	Could you please help to test this patch against v4.0-rc6?
>>>> Thanks!
>>>> Gerry
>>>>
>>>> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
>>>> ---
>>>>  arch/x86/kernel/acpi/boot.c |   10 +++++++---
>>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>>>> index 803b684676ff..f7f1fe7cd1b0 100644
>>>> --- a/arch/x86/kernel/acpi/boot.c
>>>> +++ b/arch/x86/kernel/acpi/boot.c
>>>> @@ -403,10 +403,14 @@ static int mp_config_acpi_gsi(struct device *dev, u32 gsi, int trigger,
>>>>  static int mp_register_gsi(struct device *dev, u32 gsi, int trigger,
>>>>  			   int polarity)
>>>>  {
>>>> -	int irq, node;
>>>> +	int i, irq, node;
>>>>  
>>>> -	if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC)
>>>> -		return gsi;
>>>> +	if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC) {
>>>> +		for (i = 0; i < nr_legacy_irqs(); i++)
>>>> +			if (isa_irq_to_gsi[i] == gsi)
>>>> +				return i;
>>>> +		return -1;
>>>> +	}
>>>>  
>>>>  	trigger = trigger == ACPI_EDGE_SENSITIVE ? 0 : 1;
>>>>  	polarity = polarity == ACPI_ACTIVE_HIGH ? 0 : 1;
>>>>
>>>
>>> Jiang,
>>>
>>> It definitely seems to be an improvement, using Virtualbox guest with
>>> your patch applied acpi-events work for all combinations (smp/nosmp
>>> with/without I/O APIC assigned to the guest).
>>>
>>> However, on the Dell laptop it still doesn't work.  To be sure I built a
>>> 3.16 kernel on this laptop and acpi_event power-button lid close/open
>>> are working just fine.
>>>
>>> Attached config + dmesg + cat /proc/interrupt for the working 3.16 case
>>> and still not working 4.0-rc6+patch case.
>> Hi Jim,
>> 	According to the attached files, you are building a UP kernel
>> with IOAPIC enabled. This configuration works well on my HP laptop.
>> And according to file IRQs from 4.0-rc6, it shows:
>>  9:          1    XT-PIC  acpi
>> That means kernel has received one ACPI SCI interrupt, but no
>> following-on ACPI SCI interrupts, I can't figure out the root cause yet.
>>
>> So could you please help to dump ACPI tables from your dell laptop by
>> using acpidump utility?
>> Thanks!
>> Gerry
>>
>>>
>>> Thanks,
>>> Jim
>>>
> 
> Gerry,
> 
> Attached acpidump (binary files and regular dump).
> I just tested 4.0-rc6+your_patch on another single core system, AMD
> geode board, and that works fine now!
> So indeed it seems there is something special about the dell laptop as
> that's the only system, I've available here, which still has an issue.
Hi Jim,
	I'm really confused. I can't even explain why my previous
patch fixes the issue on AMD geode board now:(

For the Dell laptop, seems you have:
1) build a kernel with Local APIC and IOAPIC enabled
2) lapic is disabled by BIOS, so there's no ACPI MADT(APIC)
table at all.
That means the laptop is working with 8259 PICs only.
There's little change between 3.16 and 4.0 related to 8259.

For the AMD geode board, I still think original code is right.
I can't explain why the patch fix the issue.

So could you please help to:
1) Try to enable lapic on Dell laptop in BIOS
2) Dump acpi tables and dmesg on AMD board

If that still doesn't help, I will try to send you some
debug patches to gather more info.
Thanks!
Gerry
> _
> Jim
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56
  2015-04-09 10:15         ` Jiang Liu
@ 2015-04-09 16:41           ` Jim Bos
  2015-04-10  1:56             ` Jiang Liu
  0 siblings, 1 reply; 17+ messages in thread
From: Jim Bos @ 2015-04-09 16:41 UTC (permalink / raw)
  To: Jiang Liu, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
  Cc: Linux Kernel Mailing List, linux-pm

On 04/09/2015 12:15 PM, Jiang Liu wrote:
> On 2015/4/8 23:51, Jim Bos wrote:
>> On 04/08/2015 07:26 AM, Jiang Liu wrote:
>>> On 2015/4/8 0:49, Jim Bos wrote:
>>>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
>>>>> Hi Jim,
>>>>> 	Could you please help to test this patch against v4.0-rc6?
>>>>> Thanks!
>>>>> Gerry
>>>>>
>>>>> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
>>>>> ---
>>>>>  arch/x86/kernel/acpi/boot.c |   10 +++++++---
>>>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>>>>> index 803b684676ff..f7f1fe7cd1b0 100644
>>>>> --- a/arch/x86/kernel/acpi/boot.c
>>>>> +++ b/arch/x86/kernel/acpi/boot.c
>>>>> @@ -403,10 +403,14 @@ static int mp_config_acpi_gsi(struct device *dev, u32 gsi, int trigger,
>>>>>  static int mp_register_gsi(struct device *dev, u32 gsi, int trigger,
>>>>>  			   int polarity)
>>>>>  {
>>>>> -	int irq, node;
>>>>> +	int i, irq, node;
>>>>>  
>>>>> -	if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC)
>>>>> -		return gsi;
>>>>> +	if (acpi_irq_model != ACPI_IRQ_MODEL_IOAPIC) {
>>>>> +		for (i = 0; i < nr_legacy_irqs(); i++)
>>>>> +			if (isa_irq_to_gsi[i] == gsi)
>>>>> +				return i;
>>>>> +		return -1;
>>>>> +	}
>>>>>  
>>>>>  	trigger = trigger == ACPI_EDGE_SENSITIVE ? 0 : 1;
>>>>>  	polarity = polarity == ACPI_ACTIVE_HIGH ? 0 : 1;
>>>>>
>>>>
>>>> Jiang,
>>>>
>>>> It definitely seems to be an improvement, using Virtualbox guest with
>>>> your patch applied acpi-events work for all combinations (smp/nosmp
>>>> with/without I/O APIC assigned to the guest).
>>>>
>>>> However, on the Dell laptop it still doesn't work.  To be sure I built a
>>>> 3.16 kernel on this laptop and acpi_event power-button lid close/open
>>>> are working just fine.
>>>>
>>>> Attached config + dmesg + cat /proc/interrupt for the working 3.16 case
>>>> and still not working 4.0-rc6+patch case.
>>> Hi Jim,
>>> 	According to the attached files, you are building a UP kernel
>>> with IOAPIC enabled. This configuration works well on my HP laptop.
>>> And according to file IRQs from 4.0-rc6, it shows:
>>>  9:          1    XT-PIC  acpi
>>> That means kernel has received one ACPI SCI interrupt, but no
>>> following-on ACPI SCI interrupts, I can't figure out the root cause yet.
>>>
>>> So could you please help to dump ACPI tables from your dell laptop by
>>> using acpidump utility?
>>> Thanks!
>>> Gerry
>>>
>>>>
>>>> Thanks,
>>>> Jim
>>>>
>>
>> Gerry,
>>
>> Attached acpidump (binary files and regular dump).
>> I just tested 4.0-rc6+your_patch on another single core system, AMD
>> geode board, and that works fine now!
>> So indeed it seems there is something special about the dell laptop as
>> that's the only system, I've available here, which still has an issue.
> Hi Jim,
> 	I'm really confused. I can't even explain why my previous
> patch fixes the issue on AMD geode board now:(
> 
> For the Dell laptop, seems you have:
> 1) build a kernel with Local APIC and IOAPIC enabled
> 2) lapic is disabled by BIOS, so there's no ACPI MADT(APIC)
> table at all.
> That means the laptop is working with 8259 PICs only.
> There's little change between 3.16 and 4.0 related to 8259.
> 
> For the AMD geode board, I still think original code is right.
> I can't explain why the patch fix the issue.
> 
> So could you please help to:
> 1) Try to enable lapic on Dell laptop in BIOS
> 2) Dump acpi tables and dmesg on AMD board
> 
> If that still doesn't help, I will try to send you some
> debug patches to gather more info.
> Thanks!
> Gerry
>> _
>> Jim
>>
> 

Gerry,

As you mentioned your patch shouldn't make a difference, run some more
tests, as it turns out:
- geode system  broken on 3.16+ up to and including 3.19, however, on
plain 4.0-rc6 it works!  Root cause appears to be there isn't an ACPI
interrupt assigned in non-working kernels.
- other system I got my hands on: Pentium(R) CPU G3220, broken on 3.19.0
when boot parameter 'nosmp' is specified, again no acpi entry in
/proc/interrupts,  working fine on 4.0-rc6

So obviously between 3.19 and 4.0-rc6 something got fixed here!

The Dell laptop remains the only problem then on 4.0-rc6, there IS an
acpi interrupt  (but firing once apparently).
There isn't an option in BIOS to enable LAPIC, however, when specifying
'lapic' as boot parameter I got interesting result, still not working
and /proc/interrups still shows XT-PIC.  Doing a diff between dmesg on
3.19 and 4.0-rc6 this pops out:

-Local APIC disabled by BIOS -- you can enable it with "lapic"
-APIC: disable apic facility
-APIC: switched to apic NOOP
+Local APIC disabled by BIOS -- reenabling.
+Found and enabled local APIC!

+Enabling APIC mode:  Flat.  Using 0 I/O APICs

Jim


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56
  2015-04-09 16:41           ` Jim Bos
@ 2015-04-10  1:56             ` Jiang Liu
  2015-04-10  6:19               ` Jim Bos
  2015-04-11 15:08               ` [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7 Jim Bos
  0 siblings, 2 replies; 17+ messages in thread
From: Jiang Liu @ 2015-04-10  1:56 UTC (permalink / raw)
  To: Jim Bos, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
  Cc: Linux Kernel Mailing List, linux-pm

On 2015/4/10 0:41, Jim Bos wrote:
> On 04/09/2015 12:15 PM, Jiang Liu wrote:
>> On 2015/4/8 23:51, Jim Bos wrote:
>>> On 04/08/2015 07:26 AM, Jiang Liu wrote:
>>>> On 2015/4/8 0:49, Jim Bos wrote:
>>>>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
<snip>
>> Hi Jim,
>> 	I'm really confused. I can't even explain why my previous
>> patch fixes the issue on AMD geode board now:(
>>
>> For the Dell laptop, seems you have:
>> 1) build a kernel with Local APIC and IOAPIC enabled
>> 2) lapic is disabled by BIOS, so there's no ACPI MADT(APIC)
>> table at all.
>> That means the laptop is working with 8259 PICs only.
>> There's little change between 3.16 and 4.0 related to 8259.
>>
>> For the AMD geode board, I still think original code is right.
>> I can't explain why the patch fix the issue.
>>
>> So could you please help to:
>> 1) Try to enable lapic on Dell laptop in BIOS
>> 2) Dump acpi tables and dmesg on AMD board
>>
>> If that still doesn't help, I will try to send you some
>> debug patches to gather more info.
>> Thanks!
>> Gerry
>>> _
>>> Jim
>>>
>>
> 
> Gerry,
> 
> As you mentioned your patch shouldn't make a difference, run some more
> tests, as it turns out:
> - geode system  broken on 3.16+ up to and including 3.19, however, on
> plain 4.0-rc6 it works!  Root cause appears to be there isn't an ACPI
> interrupt assigned in non-working kernels.
> - other system I got my hands on: Pentium(R) CPU G3220, broken on 3.19.0
> when boot parameter 'nosmp' is specified, again no acpi entry in
> /proc/interrupts,  working fine on 4.0-rc6
> 
> So obviously between 3.19 and 4.0-rc6 something got fixed here!
Hi Jim,
Yes, the bugfix patch should be:
commit 1ea76fbadd66("x86/irq: Fix regression caused by commit b568b8601f05")

> The Dell laptop remains the only problem then on 4.0-rc6, there IS an
> acpi interrupt  (but firing once apparently).
> There isn't an option in BIOS to enable LAPIC, however, when specifying
> 'lapic' as boot parameter I got interesting result, still not working
> and /proc/interrups still shows XT-PIC.  Doing a diff between dmesg on
> 3.19 and 4.0-rc6 this pops out:
> 
> -Local APIC disabled by BIOS -- you can enable it with "lapic"
> -APIC: disable apic facility
> -APIC: switched to apic NOOP
> +Local APIC disabled by BIOS -- reenabling.
> +Found and enabled local APIC!
> 
> +Enabling APIC mode:  Flat.  Using 0 I/O APICs
What's the last know working kernel for Dell laptop?
Does it work as expected with v3.19 kernel?
Do you means this message is from plain v4.0-rc6 kernel?
Thanks!
Gerry

> 
> Jim
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56
  2015-04-10  1:56             ` Jiang Liu
@ 2015-04-10  6:19               ` Jim Bos
  2015-04-11 15:08               ` [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7 Jim Bos
  1 sibling, 0 replies; 17+ messages in thread
From: Jim Bos @ 2015-04-10  6:19 UTC (permalink / raw)
  To: Jiang Liu, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
  Cc: Linux Kernel Mailing List, linux-pm

[-- Attachment #1: Type: text/plain, Size: 3023 bytes --]

On 04/10/2015 03:56 AM, Jiang Liu wrote:
> On 2015/4/10 0:41, Jim Bos wrote:
>> On 04/09/2015 12:15 PM, Jiang Liu wrote:
>>> On 2015/4/8 23:51, Jim Bos wrote:
>>>> On 04/08/2015 07:26 AM, Jiang Liu wrote:
>>>>> On 2015/4/8 0:49, Jim Bos wrote:
>>>>>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
> <snip>
>>> Hi Jim,
>>> 	I'm really confused. I can't even explain why my previous
>>> patch fixes the issue on AMD geode board now:(
>>>
>>> For the Dell laptop, seems you have:
>>> 1) build a kernel with Local APIC and IOAPIC enabled
>>> 2) lapic is disabled by BIOS, so there's no ACPI MADT(APIC)
>>> table at all.
>>> That means the laptop is working with 8259 PICs only.
>>> There's little change between 3.16 and 4.0 related to 8259.
>>>
>>> For the AMD geode board, I still think original code is right.
>>> I can't explain why the patch fix the issue.
>>>
>>> So could you please help to:
>>> 1) Try to enable lapic on Dell laptop in BIOS
>>> 2) Dump acpi tables and dmesg on AMD board
>>>
>>> If that still doesn't help, I will try to send you some
>>> debug patches to gather more info.
>>> Thanks!
>>> Gerry
>>>> _
>>>> Jim
>>>>
>>>
>>
>> Gerry,
>>
>> As you mentioned your patch shouldn't make a difference, run some more
>> tests, as it turns out:
>> - geode system  broken on 3.16+ up to and including 3.19, however, on
>> plain 4.0-rc6 it works!  Root cause appears to be there isn't an ACPI
>> interrupt assigned in non-working kernels.
>> - other system I got my hands on: Pentium(R) CPU G3220, broken on 3.19.0
>> when boot parameter 'nosmp' is specified, again no acpi entry in
>> /proc/interrupts,  working fine on 4.0-rc6
>>
>> So obviously between 3.19 and 4.0-rc6 something got fixed here!
> Hi Jim,
> Yes, the bugfix patch should be:
> commit 1ea76fbadd66("x86/irq: Fix regression caused by commit b568b8601f05")
> 
>> The Dell laptop remains the only problem then on 4.0-rc6, there IS an
>> acpi interrupt  (but firing once apparently).
>> There isn't an option in BIOS to enable LAPIC, however, when specifying
>> 'lapic' as boot parameter I got interesting result, still not working
>> and /proc/interrups still shows XT-PIC.  Doing a diff between dmesg on
>> 3.19 and 4.0-rc6 this pops out:
>>
>> -Local APIC disabled by BIOS -- you can enable it with "lapic"
>> -APIC: disable apic facility
>> -APIC: switched to apic NOOP
>> +Local APIC disabled by BIOS -- reenabling.
>> +Found and enabled local APIC!
>>
>> +Enabling APIC mode:  Flat.  Using 0 I/O APICs
> What's the last know working kernel for Dell laptop?
> Does it work as expected with v3.19 kernel?
> Do you means this message is from plain v4.0-rc6 kernel?
> Thanks!
> Gerry
> 
>>
>> Jim
>>
> 
Last know working version on the Dell laptop is 3.16 and yes this message"

>> -Local APIC disabled by BIOS -- you can enable it with "lapic"
>> -APIC: disable apic facility
>> -APIC: switched to apic NOOP
>> +Local APIC disabled by BIOS -- reenabling.
>> +Found and enabled local APIC!

is from the v4.0-rc6 message. Full dmesg attached.

Jim.



[-- Attachment #2: dmesg-lapic_specified.gz --]
[-- Type: application/x-gzip, Size: 9807 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7
  2015-04-10  1:56             ` Jiang Liu
  2015-04-10  6:19               ` Jim Bos
@ 2015-04-11 15:08               ` Jim Bos
  2015-04-11 16:57                 ` Jiang Liu
  2015-04-12  1:29                 ` Rafael J. Wysocki
  1 sibling, 2 replies; 17+ messages in thread
From: Jim Bos @ 2015-04-11 15:08 UTC (permalink / raw)
  To: Jiang Liu, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
  Cc: Linux Kernel Mailing List, linux-pm

On 04/10/2015 03:56 AM, Jiang Liu wrote:
> On 2015/4/10 0:41, Jim Bos wrote:
>> On 04/09/2015 12:15 PM, Jiang Liu wrote:
>>> On 2015/4/8 23:51, Jim Bos wrote:
>>>> On 04/08/2015 07:26 AM, Jiang Liu wrote:
>>>>> On 2015/4/8 0:49, Jim Bos wrote:
>>>>>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
> <snip>
>>> Hi Jim,
>>> 	I'm really confused. I can't even explain why my previous
>>> patch fixes the issue on AMD geode board now:(
>>>
>>> For the Dell laptop, seems you have:
>>> 1) build a kernel with Local APIC and IOAPIC enabled
>>> 2) lapic is disabled by BIOS, so there's no ACPI MADT(APIC)
>>> table at all.
>>> That means the laptop is working with 8259 PICs only.
>>> There's little change between 3.16 and 4.0 related to 8259.
>>>
>>> For the AMD geode board, I still think original code is right.
>>> I can't explain why the patch fix the issue.
>>>
>>> So could you please help to:
>>> 1) Try to enable lapic on Dell laptop in BIOS
>>> 2) Dump acpi tables and dmesg on AMD board
>>>
>>> If that still doesn't help, I will try to send you some
>>> debug patches to gather more info.
>>> Thanks!
>>> Gerry
>>>> _
>>>> Jim
>>>>
>>>
>>
>> Gerry,
>>
>> As you mentioned your patch shouldn't make a difference, run some more
>> tests, as it turns out:
>> - geode system  broken on 3.16+ up to and including 3.19, however, on
>> plain 4.0-rc6 it works!  Root cause appears to be there isn't an ACPI
>> interrupt assigned in non-working kernels.
>> - other system I got my hands on: Pentium(R) CPU G3220, broken on 3.19.0
>> when boot parameter 'nosmp' is specified, again no acpi entry in
>> /proc/interrupts,  working fine on 4.0-rc6
>>
>> So obviously between 3.19 and 4.0-rc6 something got fixed here!
> Hi Jim,
> Yes, the bugfix patch should be:
> commit 1ea76fbadd66("x86/irq: Fix regression caused by commit b568b8601f05")
> 
>> The Dell laptop remains the only problem then on 4.0-rc6, there IS an
>> acpi interrupt  (but firing once apparently).
>> There isn't an option in BIOS to enable LAPIC, however, when specifying
>> 'lapic' as boot parameter I got interesting result, still not working
>> and /proc/interrups still shows XT-PIC.  Doing a diff between dmesg on
>> 3.19 and 4.0-rc6 this pops out:
>>
>> -Local APIC disabled by BIOS -- you can enable it with "lapic"
>> -APIC: disable apic facility
>> -APIC: switched to apic NOOP
>> +Local APIC disabled by BIOS -- reenabling.
>> +Found and enabled local APIC!
>>
>> +Enabling APIC mode:  Flat.  Using 0 I/O APICs
> What's the last know working kernel for Dell laptop?
> Does it work as expected with v3.19 kernel?
> Do you means this message is from plain v4.0-rc6 kernel?
> Thanks!
> Gerry
> 
>>
>> Jim
>>
> 

Gerry, Rafael,

Ok, found it.
It was very 'interesting' bisect, because there were 2 overlapping
issues here.  On the Dell laptop some kernels there wasn't an ACPI
interrupt at all or it fired once and then seemed to get stuck.
Turns out that kernel version:
3.16: OK
3.17: Broken (no acpi interrupt)
3.18: actually fine
3.19: Broken

So started bisecting between 3.18 or 3.19, in the end found Rafael's
patch which broke it:

==
commit c50f13c672df758b59e026c15b9118f3ed46edc4
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Mon Dec 1 23:50:16 2014 +0100

    ACPICA: Save current masks of enabled GPEs after enable register writes
==

Reverting that patch on top of 4.0-rc7 (with some offsets and one
trivial manual edit) and I finally got a working ACPI interrupt again!

Jim




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7
  2015-04-11 15:08               ` [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7 Jim Bos
@ 2015-04-11 16:57                 ` Jiang Liu
  2015-04-12  1:29                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 17+ messages in thread
From: Jiang Liu @ 2015-04-11 16:57 UTC (permalink / raw)
  To: Jim Bos, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
  Cc: Linux Kernel Mailing List, linux-pm

On 2015/4/11 23:08, Jim Bos wrote:
> On 04/10/2015 03:56 AM, Jiang Liu wrote:
>> On 2015/4/10 0:41, Jim Bos wrote:
>>> On 04/09/2015 12:15 PM, Jiang Liu wrote:
>>>> On 2015/4/8 23:51, Jim Bos wrote:
>>>>> On 04/08/2015 07:26 AM, Jiang Liu wrote:
>>>>>> On 2015/4/8 0:49, Jim Bos wrote:
>>>>>>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
>> <snip>
>>>> Hi Jim,
>>>> 	I'm really confused. I can't even explain why my previous
>>>> patch fixes the issue on AMD geode board now:(
>>>>
>>>> For the Dell laptop, seems you have:
>>>> 1) build a kernel with Local APIC and IOAPIC enabled
>>>> 2) lapic is disabled by BIOS, so there's no ACPI MADT(APIC)
>>>> table at all.
>>>> That means the laptop is working with 8259 PICs only.
>>>> There's little change between 3.16 and 4.0 related to 8259.
>>>>
>>>> For the AMD geode board, I still think original code is right.
>>>> I can't explain why the patch fix the issue.
>>>>
>>>> So could you please help to:
>>>> 1) Try to enable lapic on Dell laptop in BIOS
>>>> 2) Dump acpi tables and dmesg on AMD board
>>>>
>>>> If that still doesn't help, I will try to send you some
>>>> debug patches to gather more info.
>>>> Thanks!
>>>> Gerry
>>>>> _
>>>>> Jim
>>>>>
>>>>
>>>
>>> Gerry,
>>>
>>> As you mentioned your patch shouldn't make a difference, run some more
>>> tests, as it turns out:
>>> - geode system  broken on 3.16+ up to and including 3.19, however, on
>>> plain 4.0-rc6 it works!  Root cause appears to be there isn't an ACPI
>>> interrupt assigned in non-working kernels.
>>> - other system I got my hands on: Pentium(R) CPU G3220, broken on 3.19.0
>>> when boot parameter 'nosmp' is specified, again no acpi entry in
>>> /proc/interrupts,  working fine on 4.0-rc6
>>>
>>> So obviously between 3.19 and 4.0-rc6 something got fixed here!
>> Hi Jim,
>> Yes, the bugfix patch should be:
>> commit 1ea76fbadd66("x86/irq: Fix regression caused by commit b568b8601f05")
>>
>>> The Dell laptop remains the only problem then on 4.0-rc6, there IS an
>>> acpi interrupt  (but firing once apparently).
>>> There isn't an option in BIOS to enable LAPIC, however, when specifying
>>> 'lapic' as boot parameter I got interesting result, still not working
>>> and /proc/interrups still shows XT-PIC.  Doing a diff between dmesg on
>>> 3.19 and 4.0-rc6 this pops out:
>>>
>>> -Local APIC disabled by BIOS -- you can enable it with "lapic"
>>> -APIC: disable apic facility
>>> -APIC: switched to apic NOOP
>>> +Local APIC disabled by BIOS -- reenabling.
>>> +Found and enabled local APIC!
>>>
>>> +Enabling APIC mode:  Flat.  Using 0 I/O APICs
>> What's the last know working kernel for Dell laptop?
>> Does it work as expected with v3.19 kernel?
>> Do you means this message is from plain v4.0-rc6 kernel?
>> Thanks!
>> Gerry
>>
>>>
>>> Jim
>>>
>>
> 
> Gerry, Rafael,
> 
> Ok, found it.
> It was very 'interesting' bisect, because there were 2 overlapping
> issues here.  On the Dell laptop some kernels there wasn't an ACPI
> interrupt at all or it fired once and then seemed to get stuck.
> Turns out that kernel version:
> 3.16: OK
> 3.17: Broken (no acpi interrupt)
> 3.18: actually fine
> 3.19: Broken
> 
> So started bisecting between 3.18 or 3.19, in the end found Rafael's
> patch which broke it:
> 
> ==
> commit c50f13c672df758b59e026c15b9118f3ed46edc4
> Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Date:   Mon Dec 1 23:50:16 2014 +0100
> 
>     ACPICA: Save current masks of enabled GPEs after enable register writes
> ==
> 
> Reverting that patch on top of 4.0-rc7 (with some offsets and one
> trivial manual edit) and I finally got a working ACPI interrupt again!
Hi Jim,
	Thanks for your great effort to bisect the regression.
And it gets reasonable now:)
Thanks!
Gerry

> 
> Jim
> 
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7
  2015-04-11 15:08               ` [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7 Jim Bos
  2015-04-11 16:57                 ` Jiang Liu
@ 2015-04-12  1:29                 ` Rafael J. Wysocki
  2015-04-12  9:03                   ` Jim Bos
  1 sibling, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2015-04-12  1:29 UTC (permalink / raw)
  To: Jim Bos
  Cc: Jiang Liu, Len Brown, Pavel Machek, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Linux Kernel Mailing List, linux-pm

On Saturday, April 11, 2015 05:08:38 PM Jim Bos wrote:
> On 04/10/2015 03:56 AM, Jiang Liu wrote:
> > On 2015/4/10 0:41, Jim Bos wrote:
> >> On 04/09/2015 12:15 PM, Jiang Liu wrote:
> >>> On 2015/4/8 23:51, Jim Bos wrote:
> >>>> On 04/08/2015 07:26 AM, Jiang Liu wrote:
> >>>>> On 2015/4/8 0:49, Jim Bos wrote:
> >>>>>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
> > <snip>
> >>> Hi Jim,
> >>> 	I'm really confused. I can't even explain why my previous
> >>> patch fixes the issue on AMD geode board now:(
> >>>
> >>> For the Dell laptop, seems you have:
> >>> 1) build a kernel with Local APIC and IOAPIC enabled
> >>> 2) lapic is disabled by BIOS, so there's no ACPI MADT(APIC)
> >>> table at all.
> >>> That means the laptop is working with 8259 PICs only.
> >>> There's little change between 3.16 and 4.0 related to 8259.
> >>>
> >>> For the AMD geode board, I still think original code is right.
> >>> I can't explain why the patch fix the issue.
> >>>
> >>> So could you please help to:
> >>> 1) Try to enable lapic on Dell laptop in BIOS
> >>> 2) Dump acpi tables and dmesg on AMD board
> >>>
> >>> If that still doesn't help, I will try to send you some
> >>> debug patches to gather more info.
> >>> Thanks!
> >>> Gerry
> >>>> _
> >>>> Jim
> >>>>
> >>>
> >>
> >> Gerry,
> >>
> >> As you mentioned your patch shouldn't make a difference, run some more
> >> tests, as it turns out:
> >> - geode system  broken on 3.16+ up to and including 3.19, however, on
> >> plain 4.0-rc6 it works!  Root cause appears to be there isn't an ACPI
> >> interrupt assigned in non-working kernels.
> >> - other system I got my hands on: Pentium(R) CPU G3220, broken on 3.19.0
> >> when boot parameter 'nosmp' is specified, again no acpi entry in
> >> /proc/interrupts,  working fine on 4.0-rc6
> >>
> >> So obviously between 3.19 and 4.0-rc6 something got fixed here!
> > Hi Jim,
> > Yes, the bugfix patch should be:
> > commit 1ea76fbadd66("x86/irq: Fix regression caused by commit b568b8601f05")
> > 
> >> The Dell laptop remains the only problem then on 4.0-rc6, there IS an
> >> acpi interrupt  (but firing once apparently).
> >> There isn't an option in BIOS to enable LAPIC, however, when specifying
> >> 'lapic' as boot parameter I got interesting result, still not working
> >> and /proc/interrups still shows XT-PIC.  Doing a diff between dmesg on
> >> 3.19 and 4.0-rc6 this pops out:
> >>
> >> -Local APIC disabled by BIOS -- you can enable it with "lapic"
> >> -APIC: disable apic facility
> >> -APIC: switched to apic NOOP
> >> +Local APIC disabled by BIOS -- reenabling.
> >> +Found and enabled local APIC!
> >>
> >> +Enabling APIC mode:  Flat.  Using 0 I/O APICs
> > What's the last know working kernel for Dell laptop?
> > Does it work as expected with v3.19 kernel?
> > Do you means this message is from plain v4.0-rc6 kernel?
> > Thanks!
> > Gerry
> > 
> >>
> >> Jim
> >>
> > 
> 
> Gerry, Rafael,
> 
> Ok, found it.
> It was very 'interesting' bisect, because there were 2 overlapping
> issues here.  On the Dell laptop some kernels there wasn't an ACPI
> interrupt at all or it fired once and then seemed to get stuck.
> Turns out that kernel version:
> 3.16: OK
> 3.17: Broken (no acpi interrupt)
> 3.18: actually fine
> 3.19: Broken
> 
> So started bisecting between 3.18 or 3.19, in the end found Rafael's
> patch which broke it:
> 
> ==
> commit c50f13c672df758b59e026c15b9118f3ed46edc4
> Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Date:   Mon Dec 1 23:50:16 2014 +0100
> 
>     ACPICA: Save current masks of enabled GPEs after enable register writes
> ==
> 
> Reverting that patch on top of 4.0-rc7 (with some offsets and one
> trivial manual edit) and I finally got a working ACPI interrupt again!

That's unexpected.

Is system suspend/resume involved in the reproduction of the problem in any way?

In any case, does it help if you replace "enable_mask" with "enable_for_run"
in line 127 of drivers/acpi/acpica/hwgpe.c (without reverting the whole commit)?


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7
  2015-04-12  1:29                 ` Rafael J. Wysocki
@ 2015-04-12  9:03                   ` Jim Bos
  2015-04-12 23:30                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 17+ messages in thread
From: Jim Bos @ 2015-04-12  9:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiang Liu, Len Brown, Pavel Machek, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Linux Kernel Mailing List, linux-pm

[-- Attachment #1: Type: text/plain, Size: 4607 bytes --]

On 04/12/2015 03:29 AM, Rafael J. Wysocki wrote:
> On Saturday, April 11, 2015 05:08:38 PM Jim Bos wrote:
>> On 04/10/2015 03:56 AM, Jiang Liu wrote:
>>> On 2015/4/10 0:41, Jim Bos wrote:
>>>> On 04/09/2015 12:15 PM, Jiang Liu wrote:
>>>>> On 2015/4/8 23:51, Jim Bos wrote:
>>>>>> On 04/08/2015 07:26 AM, Jiang Liu wrote:
>>>>>>> On 2015/4/8 0:49, Jim Bos wrote:
>>>>>>>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
>>> <snip>
>>>>> Hi Jim,
>>>>> 	I'm really confused. I can't even explain why my previous
>>>>> patch fixes the issue on AMD geode board now:(
>>>>>
>>>>> For the Dell laptop, seems you have:
>>>>> 1) build a kernel with Local APIC and IOAPIC enabled
>>>>> 2) lapic is disabled by BIOS, so there's no ACPI MADT(APIC)
>>>>> table at all.
>>>>> That means the laptop is working with 8259 PICs only.
>>>>> There's little change between 3.16 and 4.0 related to 8259.
>>>>>
>>>>> For the AMD geode board, I still think original code is right.
>>>>> I can't explain why the patch fix the issue.
>>>>>
>>>>> So could you please help to:
>>>>> 1) Try to enable lapic on Dell laptop in BIOS
>>>>> 2) Dump acpi tables and dmesg on AMD board
>>>>>
>>>>> If that still doesn't help, I will try to send you some
>>>>> debug patches to gather more info.
>>>>> Thanks!
>>>>> Gerry
>>>>>> _
>>>>>> Jim
>>>>>>
>>>>>
>>>>
>>>> Gerry,
>>>>
>>>> As you mentioned your patch shouldn't make a difference, run some more
>>>> tests, as it turns out:
>>>> - geode system  broken on 3.16+ up to and including 3.19, however, on
>>>> plain 4.0-rc6 it works!  Root cause appears to be there isn't an ACPI
>>>> interrupt assigned in non-working kernels.
>>>> - other system I got my hands on: Pentium(R) CPU G3220, broken on 3.19.0
>>>> when boot parameter 'nosmp' is specified, again no acpi entry in
>>>> /proc/interrupts,  working fine on 4.0-rc6
>>>>
>>>> So obviously between 3.19 and 4.0-rc6 something got fixed here!
>>> Hi Jim,
>>> Yes, the bugfix patch should be:
>>> commit 1ea76fbadd66("x86/irq: Fix regression caused by commit b568b8601f05")
>>>
>>>> The Dell laptop remains the only problem then on 4.0-rc6, there IS an
>>>> acpi interrupt  (but firing once apparently).
>>>> There isn't an option in BIOS to enable LAPIC, however, when specifying
>>>> 'lapic' as boot parameter I got interesting result, still not working
>>>> and /proc/interrups still shows XT-PIC.  Doing a diff between dmesg on
>>>> 3.19 and 4.0-rc6 this pops out:
>>>>
>>>> -Local APIC disabled by BIOS -- you can enable it with "lapic"
>>>> -APIC: disable apic facility
>>>> -APIC: switched to apic NOOP
>>>> +Local APIC disabled by BIOS -- reenabling.
>>>> +Found and enabled local APIC!
>>>>
>>>> +Enabling APIC mode:  Flat.  Using 0 I/O APICs
>>> What's the last know working kernel for Dell laptop?
>>> Does it work as expected with v3.19 kernel?
>>> Do you means this message is from plain v4.0-rc6 kernel?
>>> Thanks!
>>> Gerry
>>>
>>>>
>>>> Jim
>>>>
>>>
>>
>> Gerry, Rafael,
>>
>> Ok, found it.
>> It was very 'interesting' bisect, because there were 2 overlapping
>> issues here.  On the Dell laptop some kernels there wasn't an ACPI
>> interrupt at all or it fired once and then seemed to get stuck.
>> Turns out that kernel version:
>> 3.16: OK
>> 3.17: Broken (no acpi interrupt)
>> 3.18: actually fine
>> 3.19: Broken
>>
>> So started bisecting between 3.18 or 3.19, in the end found Rafael's
>> patch which broke it:
>>
>> ==
>> commit c50f13c672df758b59e026c15b9118f3ed46edc4
>> Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Date:   Mon Dec 1 23:50:16 2014 +0100
>>
>>     ACPICA: Save current masks of enabled GPEs after enable register writes
>> ==
>>
>> Reverting that patch on top of 4.0-rc7 (with some offsets and one
>> trivial manual edit) and I finally got a working ACPI interrupt again!
> 
> That's unexpected.
> 
> Is system suspend/resume involved in the reproduction of the problem in any way?
> 
> In any case, does it help if you replace "enable_mask" with "enable_for_run"
> in line 127 of drivers/acpi/acpica/hwgpe.c (without reverting the whole commit)?
> 
> 

No suspends/resumes, but this suggestion works :-)


--- hwgpe.c.ORIG        2015-04-12 10:41:11.754104398 +0200
+++ hwgpe.c     2015-04-12 10:42:38.021283593 +0200
@@ -124,7 +124,7 @@

                /* Only enable if the corresponding enable_mask bit is
set */

-               if (!(register_bit & gpe_register_info->enable_mask)) {
+               if (!(register_bit & gpe_register_info->enable_for_run)) {
                        return (AE_BAD_PARAMETER);
                }

Tested-by: Jim Bos <jim876@xs4all.nl>

Thanks,
Jim

[-- Attachment #2: patch-fix-acpi_irq_dell --]
[-- Type: text/plain, Size: 349 bytes --]

--- hwgpe.c.ORIG	2015-04-12 10:41:11.754104398 +0200
+++ hwgpe.c	2015-04-12 10:42:38.021283593 +0200
@@ -124,7 +124,7 @@
 
 		/* Only enable if the corresponding enable_mask bit is set */
 
-		if (!(register_bit & gpe_register_info->enable_mask)) {
+		if (!(register_bit & gpe_register_info->enable_for_run)) {
 			return (AE_BAD_PARAMETER);
 		}
 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7
  2015-04-12  9:03                   ` Jim Bos
@ 2015-04-12 23:30                     ` Rafael J. Wysocki
  2015-04-13 13:36                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2015-04-12 23:30 UTC (permalink / raw)
  To: Jim Bos
  Cc: Jiang Liu, Len Brown, Pavel Machek, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Linux Kernel Mailing List, linux-pm,
	Lv Zheng

On Sunday, April 12, 2015 11:03:30 AM Jim Bos wrote:
> On 04/12/2015 03:29 AM, Rafael J. Wysocki wrote:
> > On Saturday, April 11, 2015 05:08:38 PM Jim Bos wrote:
> >> On 04/10/2015 03:56 AM, Jiang Liu wrote:
> >>> On 2015/4/10 0:41, Jim Bos wrote:
> >>>> On 04/09/2015 12:15 PM, Jiang Liu wrote:
> >>>>> On 2015/4/8 23:51, Jim Bos wrote:
> >>>>>> On 04/08/2015 07:26 AM, Jiang Liu wrote:
> >>>>>>> On 2015/4/8 0:49, Jim Bos wrote:
> >>>>>>>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
> >>> <snip>
> >>>>> Hi Jim,
> >>>>> 	I'm really confused. I can't even explain why my previous
> >>>>> patch fixes the issue on AMD geode board now:(
> >>>>>
> >>>>> For the Dell laptop, seems you have:
> >>>>> 1) build a kernel with Local APIC and IOAPIC enabled
> >>>>> 2) lapic is disabled by BIOS, so there's no ACPI MADT(APIC)
> >>>>> table at all.
> >>>>> That means the laptop is working with 8259 PICs only.
> >>>>> There's little change between 3.16 and 4.0 related to 8259.
> >>>>>
> >>>>> For the AMD geode board, I still think original code is right.
> >>>>> I can't explain why the patch fix the issue.
> >>>>>
> >>>>> So could you please help to:
> >>>>> 1) Try to enable lapic on Dell laptop in BIOS
> >>>>> 2) Dump acpi tables and dmesg on AMD board
> >>>>>
> >>>>> If that still doesn't help, I will try to send you some
> >>>>> debug patches to gather more info.
> >>>>> Thanks!
> >>>>> Gerry
> >>>>>> _
> >>>>>> Jim
> >>>>>>
> >>>>>
> >>>>
> >>>> Gerry,
> >>>>
> >>>> As you mentioned your patch shouldn't make a difference, run some more
> >>>> tests, as it turns out:
> >>>> - geode system  broken on 3.16+ up to and including 3.19, however, on
> >>>> plain 4.0-rc6 it works!  Root cause appears to be there isn't an ACPI
> >>>> interrupt assigned in non-working kernels.
> >>>> - other system I got my hands on: Pentium(R) CPU G3220, broken on 3.19.0
> >>>> when boot parameter 'nosmp' is specified, again no acpi entry in
> >>>> /proc/interrupts,  working fine on 4.0-rc6
> >>>>
> >>>> So obviously between 3.19 and 4.0-rc6 something got fixed here!
> >>> Hi Jim,
> >>> Yes, the bugfix patch should be:
> >>> commit 1ea76fbadd66("x86/irq: Fix regression caused by commit b568b8601f05")
> >>>
> >>>> The Dell laptop remains the only problem then on 4.0-rc6, there IS an
> >>>> acpi interrupt  (but firing once apparently).
> >>>> There isn't an option in BIOS to enable LAPIC, however, when specifying
> >>>> 'lapic' as boot parameter I got interesting result, still not working
> >>>> and /proc/interrups still shows XT-PIC.  Doing a diff between dmesg on
> >>>> 3.19 and 4.0-rc6 this pops out:
> >>>>
> >>>> -Local APIC disabled by BIOS -- you can enable it with "lapic"
> >>>> -APIC: disable apic facility
> >>>> -APIC: switched to apic NOOP
> >>>> +Local APIC disabled by BIOS -- reenabling.
> >>>> +Found and enabled local APIC!
> >>>>
> >>>> +Enabling APIC mode:  Flat.  Using 0 I/O APICs
> >>> What's the last know working kernel for Dell laptop?
> >>> Does it work as expected with v3.19 kernel?
> >>> Do you means this message is from plain v4.0-rc6 kernel?
> >>> Thanks!
> >>> Gerry
> >>>
> >>>>
> >>>> Jim
> >>>>
> >>>
> >>
> >> Gerry, Rafael,
> >>
> >> Ok, found it.
> >> It was very 'interesting' bisect, because there were 2 overlapping
> >> issues here.  On the Dell laptop some kernels there wasn't an ACPI
> >> interrupt at all or it fired once and then seemed to get stuck.
> >> Turns out that kernel version:
> >> 3.16: OK
> >> 3.17: Broken (no acpi interrupt)
> >> 3.18: actually fine
> >> 3.19: Broken
> >>
> >> So started bisecting between 3.18 or 3.19, in the end found Rafael's
> >> patch which broke it:
> >>
> >> ==
> >> commit c50f13c672df758b59e026c15b9118f3ed46edc4
> >> Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> Date:   Mon Dec 1 23:50:16 2014 +0100
> >>
> >>     ACPICA: Save current masks of enabled GPEs after enable register writes
> >> ==
> >>
> >> Reverting that patch on top of 4.0-rc7 (with some offsets and one
> >> trivial manual edit) and I finally got a working ACPI interrupt again!
> > 
> > That's unexpected.
> > 
> > Is system suspend/resume involved in the reproduction of the problem in any way?
> > 
> > In any case, does it help if you replace "enable_mask" with "enable_for_run"
> > in line 127 of drivers/acpi/acpica/hwgpe.c (without reverting the whole commit)?
> > 
> > 
> 
> No suspends/resumes, but this suggestion works :-)

OK

> --- hwgpe.c.ORIG        2015-04-12 10:41:11.754104398 +0200
> +++ hwgpe.c     2015-04-12 10:42:38.021283593 +0200
> @@ -124,7 +124,7 @@
> 
>                 /* Only enable if the corresponding enable_mask bit is
> set */
> 
> -               if (!(register_bit & gpe_register_info->enable_mask)) {
> +               if (!(register_bit & gpe_register_info->enable_for_run)) {
>                         return (AE_BAD_PARAMETER);
>                 }
> 
> Tested-by: Jim Bos <jim876@xs4all.nl>

No, no, this is not a fix. :-)

It means, though, that enable_for_run and enable_mask diverge at one point and,
moreover, enable_for_run has more bits set, which is *really* mysterious.

So the only place modifying enable_for_run is acpi_ev_update_gpe_enable_mask()
which roughly does this:
	(a) Find the register bit corresponding to the given GPE.
	(b) Clear that bit in enable_for_run.
	(c) If runtime_count is set for the GPE, set that bit in enable_for_run.
Thus the only case when a bit may be set in enable_for_run is when runtime_count
is nonzero for the GPE in acpi_ev_update_gpe_enable_mask().

Now, acpi_ev_update_gpe_enable_mask() is only called from two places,
acpi_ev_add_gpe_reference() and acpi_ev_remove_gpe_reference().  The former
calls it when runtime_count has just been incremented and is now equal to one
and the latter calls it when runtime_count has just been decremented and is now
equal to zero.  Hence, if all of the involved functions return AE_OK,
acpi_ev_add_gpe_reference() may set the corresponding bit in enable_for_run
and acpi_ev_remove_gpe_reference() may clear it.

Further, having set the bit in enable_for_run, acpi_ev_add_gpe_reference() calls
acpi_ev_enable_gpe() which then calls acpi_hw_low_set_gpe() with ACPI_GPE_SAVE_MASK
set in the second argument.  Again, this causes the corresponding bit to be set in
the register's enable_mask, unless any errors are returned.  In that case the
bit will be set in both enable_for_run and enable_mask and analogously for
acpi_ev_remove_gpe_reference().  So if I'm not overlooking anything and if all of
the involved calls are successful, enable_for_run and enable_mask will always be
in sync.

As far as I can say that may change *only* if there's an error, because in that
case (1) we may not save the mask that we attempted to write to the register and
(2) we will reset runtime_count *without* updating enable_for_run which arguably
is a bug.  So the previous patch might just work accidentally.

If that theory holds any water, the patch below may help too (instead of the
previous one), so please test it.  If it doesn't help, we'll need to find out
what exactly happens on that system, but surely it is *not* usual behavior.


---
 drivers/acpi/acpica/hwgpe.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-pm/drivers/acpi/acpica/hwgpe.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpica/hwgpe.c
+++ linux-pm/drivers/acpi/acpica/hwgpe.c
@@ -149,7 +149,7 @@ acpi_hw_low_set_gpe(struct acpi_gpe_even
 	/* Write the updated enable mask */
 
 	status = acpi_hw_write(enable_mask, &gpe_register_info->enable_address);
-	if (ACPI_SUCCESS(status) && (action & ACPI_GPE_SAVE_MASK)) {
+	if (action & ACPI_GPE_SAVE_MASK) {
 		gpe_register_info->enable_mask = (u8)enable_mask;
 	}
 	return (status);


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7
  2015-04-12 23:30                     ` Rafael J. Wysocki
@ 2015-04-13 13:36                       ` Rafael J. Wysocki
  2015-04-13 14:54                         ` Jim Bos
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2015-04-13 13:36 UTC (permalink / raw)
  To: Jim Bos
  Cc: Jiang Liu, Len Brown, Pavel Machek, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Linux Kernel Mailing List, linux-pm,
	Lv Zheng, ACPI Devel Maling List, Bob Moore

On Monday, April 13, 2015 01:30:20 AM Rafael J. Wysocki wrote:
> On Sunday, April 12, 2015 11:03:30 AM Jim Bos wrote:
> > On 04/12/2015 03:29 AM, Rafael J. Wysocki wrote:
> > > On Saturday, April 11, 2015 05:08:38 PM Jim Bos wrote:
> > >> On 04/10/2015 03:56 AM, Jiang Liu wrote:
> > >>> On 2015/4/10 0:41, Jim Bos wrote:
> > >>>> On 04/09/2015 12:15 PM, Jiang Liu wrote:
> > >>>>> On 2015/4/8 23:51, Jim Bos wrote:
> > >>>>>> On 04/08/2015 07:26 AM, Jiang Liu wrote:
> > >>>>>>> On 2015/4/8 0:49, Jim Bos wrote:
> > >>>>>>>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
> > >>> <snip>
> > >>>>> Hi Jim,
> > >>>>> 	I'm really confused. I can't even explain why my previous
> > >>>>> patch fixes the issue on AMD geode board now:(
> > >>>>>
> > >>>>> For the Dell laptop, seems you have:
> > >>>>> 1) build a kernel with Local APIC and IOAPIC enabled
> > >>>>> 2) lapic is disabled by BIOS, so there's no ACPI MADT(APIC)
> > >>>>> table at all.
> > >>>>> That means the laptop is working with 8259 PICs only.
> > >>>>> There's little change between 3.16 and 4.0 related to 8259.
> > >>>>>
> > >>>>> For the AMD geode board, I still think original code is right.
> > >>>>> I can't explain why the patch fix the issue.
> > >>>>>
> > >>>>> So could you please help to:
> > >>>>> 1) Try to enable lapic on Dell laptop in BIOS
> > >>>>> 2) Dump acpi tables and dmesg on AMD board
> > >>>>>
> > >>>>> If that still doesn't help, I will try to send you some
> > >>>>> debug patches to gather more info.
> > >>>>> Thanks!
> > >>>>> Gerry
> > >>>>>> _
> > >>>>>> Jim
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>> Gerry,
> > >>>>
> > >>>> As you mentioned your patch shouldn't make a difference, run some more
> > >>>> tests, as it turns out:
> > >>>> - geode system  broken on 3.16+ up to and including 3.19, however, on
> > >>>> plain 4.0-rc6 it works!  Root cause appears to be there isn't an ACPI
> > >>>> interrupt assigned in non-working kernels.
> > >>>> - other system I got my hands on: Pentium(R) CPU G3220, broken on 3.19.0
> > >>>> when boot parameter 'nosmp' is specified, again no acpi entry in
> > >>>> /proc/interrupts,  working fine on 4.0-rc6
> > >>>>
> > >>>> So obviously between 3.19 and 4.0-rc6 something got fixed here!
> > >>> Hi Jim,
> > >>> Yes, the bugfix patch should be:
> > >>> commit 1ea76fbadd66("x86/irq: Fix regression caused by commit b568b8601f05")
> > >>>
> > >>>> The Dell laptop remains the only problem then on 4.0-rc6, there IS an
> > >>>> acpi interrupt  (but firing once apparently).
> > >>>> There isn't an option in BIOS to enable LAPIC, however, when specifying
> > >>>> 'lapic' as boot parameter I got interesting result, still not working
> > >>>> and /proc/interrups still shows XT-PIC.  Doing a diff between dmesg on
> > >>>> 3.19 and 4.0-rc6 this pops out:
> > >>>>
> > >>>> -Local APIC disabled by BIOS -- you can enable it with "lapic"
> > >>>> -APIC: disable apic facility
> > >>>> -APIC: switched to apic NOOP
> > >>>> +Local APIC disabled by BIOS -- reenabling.
> > >>>> +Found and enabled local APIC!
> > >>>>
> > >>>> +Enabling APIC mode:  Flat.  Using 0 I/O APICs
> > >>> What's the last know working kernel for Dell laptop?
> > >>> Does it work as expected with v3.19 kernel?
> > >>> Do you means this message is from plain v4.0-rc6 kernel?
> > >>> Thanks!
> > >>> Gerry
> > >>>
> > >>>>
> > >>>> Jim
> > >>>>
> > >>>
> > >>
> > >> Gerry, Rafael,
> > >>
> > >> Ok, found it.
> > >> It was very 'interesting' bisect, because there were 2 overlapping
> > >> issues here.  On the Dell laptop some kernels there wasn't an ACPI
> > >> interrupt at all or it fired once and then seemed to get stuck.
> > >> Turns out that kernel version:
> > >> 3.16: OK
> > >> 3.17: Broken (no acpi interrupt)
> > >> 3.18: actually fine
> > >> 3.19: Broken
> > >>
> > >> So started bisecting between 3.18 or 3.19, in the end found Rafael's
> > >> patch which broke it:
> > >>
> > >> ==
> > >> commit c50f13c672df758b59e026c15b9118f3ed46edc4
> > >> Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >> Date:   Mon Dec 1 23:50:16 2014 +0100
> > >>
> > >>     ACPICA: Save current masks of enabled GPEs after enable register writes
> > >> ==
> > >>
> > >> Reverting that patch on top of 4.0-rc7 (with some offsets and one
> > >> trivial manual edit) and I finally got a working ACPI interrupt again!
> > > 
> > > That's unexpected.
> > > 
> > > Is system suspend/resume involved in the reproduction of the problem in any way?
> > > 
> > > In any case, does it help if you replace "enable_mask" with "enable_for_run"
> > > in line 127 of drivers/acpi/acpica/hwgpe.c (without reverting the whole commit)?
> > > 
> > > 
> > 
> > No suspends/resumes, but this suggestion works :-)
> 
> OK
> 
> > --- hwgpe.c.ORIG        2015-04-12 10:41:11.754104398 +0200
> > +++ hwgpe.c     2015-04-12 10:42:38.021283593 +0200
> > @@ -124,7 +124,7 @@
> > 
> >                 /* Only enable if the corresponding enable_mask bit is
> > set */
> > 
> > -               if (!(register_bit & gpe_register_info->enable_mask)) {
> > +               if (!(register_bit & gpe_register_info->enable_for_run)) {
> >                         return (AE_BAD_PARAMETER);
> >                 }
> > 
> > Tested-by: Jim Bos <jim876@xs4all.nl>
> 
> No, no, this is not a fix. :-)
> 
> It means, though, that enable_for_run and enable_mask diverge at one point and,
> moreover, enable_for_run has more bits set, which is *really* mysterious.
> 
> So the only place modifying enable_for_run is acpi_ev_update_gpe_enable_mask()
> which roughly does this:
> 	(a) Find the register bit corresponding to the given GPE.
> 	(b) Clear that bit in enable_for_run.
> 	(c) If runtime_count is set for the GPE, set that bit in enable_for_run.
> Thus the only case when a bit may be set in enable_for_run is when runtime_count
> is nonzero for the GPE in acpi_ev_update_gpe_enable_mask().
> 
> Now, acpi_ev_update_gpe_enable_mask() is only called from two places,
> acpi_ev_add_gpe_reference() and acpi_ev_remove_gpe_reference().  The former
> calls it when runtime_count has just been incremented and is now equal to one
> and the latter calls it when runtime_count has just been decremented and is now
> equal to zero.  Hence, if all of the involved functions return AE_OK,
> acpi_ev_add_gpe_reference() may set the corresponding bit in enable_for_run
> and acpi_ev_remove_gpe_reference() may clear it.
> 
> Further, having set the bit in enable_for_run, acpi_ev_add_gpe_reference() calls
> acpi_ev_enable_gpe() which then calls acpi_hw_low_set_gpe() with ACPI_GPE_SAVE_MASK
> set in the second argument.  Again, this causes the corresponding bit to be set in
> the register's enable_mask, unless any errors are returned.  In that case the
> bit will be set in both enable_for_run and enable_mask and analogously for
> acpi_ev_remove_gpe_reference().  So if I'm not overlooking anything and if all of
> the involved calls are successful, enable_for_run and enable_mask will always be
> in sync.
> 
> As far as I can say that may change *only* if there's an error, because in that
> case (1) we may not save the mask that we attempted to write to the register and
> (2) we will reset runtime_count *without* updating enable_for_run which arguably
> is a bug.  So the previous patch might just work accidentally.
> 
> If that theory holds any water, the patch below may help too (instead of the
> previous one), so please test it.  If it doesn't help, we'll need to find out
> what exactly happens on that system, but surely it is *not* usual behavior.

Actually, the one below is better, so please test this one instead.

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: ACPICA: Store GPE register enable masks upfront

It is reported that ACPI interrupts do not work any more on
Dell Latitude D600 after commit c50f13c672df (ACPICA: Save
current masks of enabled GPEs after enable register writes).
The problem turns out to be related to the fact that the
enable_mask and enable_for_run GPE bit masks are not in
sync (in the absence of any system suspend/resume events)
for at least one GPE register on that machine.

Address this problem by writing the enable_for_run mask into
enable_mask as soon as enable_for_run is updated instead of
doing that only after the subsequent register write has
succeeded.  For consistency, update acpi_hw_gpe_enable_write()
to store the bit mask to be written into the GPE register
in enable_mask unconditionally before the write.

Since the ACPI_GPE_SAVE_MASK flag is not necessary any more after
that, drop it along with the symbols depending on it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/acpi/acpica/evgpe.c |    5 +++--
 drivers/acpi/acpica/hwgpe.c |   11 ++++-------
 include/acpi/actypes.h      |    4 ----
 3 files changed, 7 insertions(+), 13 deletions(-)

Index: linux-pm/drivers/acpi/acpica/hwgpe.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpica/hwgpe.c
+++ linux-pm/drivers/acpi/acpica/hwgpe.c
@@ -89,6 +89,8 @@ u32 acpi_hw_get_gpe_register_bit(struct
  * RETURN:	Status
  *
  * DESCRIPTION: Enable or disable a single GPE in the parent enable register.
+ *              The enable_mask field of the involved GPE register structure
+ *              must be updated by the caller if necessary.
  *
  ******************************************************************************/
 
@@ -119,7 +121,7 @@ acpi_hw_low_set_gpe(struct acpi_gpe_even
 	/* Set or clear just the bit that corresponds to this GPE */
 
 	register_bit = acpi_hw_get_gpe_register_bit(gpe_event_info);
-	switch (action & ~ACPI_GPE_SAVE_MASK) {
+	switch (action) {
 	case ACPI_GPE_CONDITIONAL_ENABLE:
 
 		/* Only enable if the corresponding enable_mask bit is set */
@@ -149,9 +151,6 @@ acpi_hw_low_set_gpe(struct acpi_gpe_even
 	/* Write the updated enable mask */
 
 	status = acpi_hw_write(enable_mask, &gpe_register_info->enable_address);
-	if (ACPI_SUCCESS(status) && (action & ACPI_GPE_SAVE_MASK)) {
-		gpe_register_info->enable_mask = (u8)enable_mask;
-	}
 	return (status);
 }
 
@@ -286,10 +285,8 @@ acpi_hw_gpe_enable_write(u8 enable_mask,
 {
 	acpi_status status;
 
+	gpe_register_info->enable_mask = enable_mask;
 	status = acpi_hw_write(enable_mask, &gpe_register_info->enable_address);
-	if (ACPI_SUCCESS(status)) {
-		gpe_register_info->enable_mask = enable_mask;
-	}
 	return (status);
 }
 
Index: linux-pm/include/acpi/actypes.h
===================================================================
--- linux-pm.orig/include/acpi/actypes.h
+++ linux-pm/include/acpi/actypes.h
@@ -736,10 +736,6 @@ typedef u32 acpi_event_status;
 #define ACPI_GPE_ENABLE                 0
 #define ACPI_GPE_DISABLE                1
 #define ACPI_GPE_CONDITIONAL_ENABLE     2
-#define ACPI_GPE_SAVE_MASK              4
-
-#define ACPI_GPE_ENABLE_SAVE            (ACPI_GPE_ENABLE | ACPI_GPE_SAVE_MASK)
-#define ACPI_GPE_DISABLE_SAVE           (ACPI_GPE_DISABLE | ACPI_GPE_SAVE_MASK)
 
 /*
  * GPE info flags - Per GPE
Index: linux-pm/drivers/acpi/acpica/evgpe.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpica/evgpe.c
+++ linux-pm/drivers/acpi/acpica/evgpe.c
@@ -92,6 +92,7 @@ acpi_ev_update_gpe_enable_mask(struct ac
 		ACPI_SET_BIT(gpe_register_info->enable_for_run,
 			     (u8)register_bit);
 	}
+	gpe_register_info->enable_mask = gpe_register_info->enable_for_run;
 
 	return_ACPI_STATUS(AE_OK);
 }
@@ -123,7 +124,7 @@ acpi_status acpi_ev_enable_gpe(struct ac
 
 	/* Enable the requested GPE */
 
-	status = acpi_hw_low_set_gpe(gpe_event_info, ACPI_GPE_ENABLE_SAVE);
+	status = acpi_hw_low_set_gpe(gpe_event_info, ACPI_GPE_ENABLE);
 	return_ACPI_STATUS(status);
 }
 
@@ -202,7 +203,7 @@ acpi_ev_remove_gpe_reference(struct acpi
 		if (ACPI_SUCCESS(status)) {
 			status =
 			    acpi_hw_low_set_gpe(gpe_event_info,
-						ACPI_GPE_DISABLE_SAVE);
+						ACPI_GPE_DISABLE);
 		}
 
 		if (ACPI_FAILURE(status)) {


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7
  2015-04-13 13:36                       ` Rafael J. Wysocki
@ 2015-04-13 14:54                         ` Jim Bos
  2015-04-13 19:46                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 17+ messages in thread
From: Jim Bos @ 2015-04-13 14:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jiang Liu, Len Brown, Pavel Machek, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Linux Kernel Mailing List, linux-pm,
	Lv Zheng, ACPI Devel Maling List, Bob Moore

On 04/13/2015 03:36 PM, Rafael J. Wysocki wrote:
> On Monday, April 13, 2015 01:30:20 AM Rafael J. Wysocki wrote:
>> On Sunday, April 12, 2015 11:03:30 AM Jim Bos wrote:
>>> On 04/12/2015 03:29 AM, Rafael J. Wysocki wrote:
>>>> On Saturday, April 11, 2015 05:08:38 PM Jim Bos wrote:
>>>>> On 04/10/2015 03:56 AM, Jiang Liu wrote:
>>>>>> On 2015/4/10 0:41, Jim Bos wrote:
>>>>>>> On 04/09/2015 12:15 PM, Jiang Liu wrote:
>>>>>>>> On 2015/4/8 23:51, Jim Bos wrote:
>>>>>>>>> On 04/08/2015 07:26 AM, Jiang Liu wrote:
>>>>>>>>>> On 2015/4/8 0:49, Jim Bos wrote:
>>>>>>>>>>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
>>>>>> <snip>
>>>>>>>> Hi Jim,
>>>>>>>> 	I'm really confused. I can't even explain why my previous
>>>>>>>> patch fixes the issue on AMD geode board now:(
>>>>>>>>
>>>>>>>> For the Dell laptop, seems you have:
>>>>>>>> 1) build a kernel with Local APIC and IOAPIC enabled
>>>>>>>> 2) lapic is disabled by BIOS, so there's no ACPI MADT(APIC)
>>>>>>>> table at all.
>>>>>>>> That means the laptop is working with 8259 PICs only.
>>>>>>>> There's little change between 3.16 and 4.0 related to 8259.
>>>>>>>>
>>>>>>>> For the AMD geode board, I still think original code is right.
>>>>>>>> I can't explain why the patch fix the issue.
>>>>>>>>
>>>>>>>> So could you please help to:
>>>>>>>> 1) Try to enable lapic on Dell laptop in BIOS
>>>>>>>> 2) Dump acpi tables and dmesg on AMD board
>>>>>>>>
>>>>>>>> If that still doesn't help, I will try to send you some
>>>>>>>> debug patches to gather more info.
>>>>>>>> Thanks!
>>>>>>>> Gerry
>>>>>>>>> _
>>>>>>>>> Jim
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Gerry,
>>>>>>>
>>>>>>> As you mentioned your patch shouldn't make a difference, run some more
>>>>>>> tests, as it turns out:
>>>>>>> - geode system  broken on 3.16+ up to and including 3.19, however, on
>>>>>>> plain 4.0-rc6 it works!  Root cause appears to be there isn't an ACPI
>>>>>>> interrupt assigned in non-working kernels.
>>>>>>> - other system I got my hands on: Pentium(R) CPU G3220, broken on 3.19.0
>>>>>>> when boot parameter 'nosmp' is specified, again no acpi entry in
>>>>>>> /proc/interrupts,  working fine on 4.0-rc6
>>>>>>>
>>>>>>> So obviously between 3.19 and 4.0-rc6 something got fixed here!
>>>>>> Hi Jim,
>>>>>> Yes, the bugfix patch should be:
>>>>>> commit 1ea76fbadd66("x86/irq: Fix regression caused by commit b568b8601f05")
>>>>>>
>>>>>>> The Dell laptop remains the only problem then on 4.0-rc6, there IS an
>>>>>>> acpi interrupt  (but firing once apparently).
>>>>>>> There isn't an option in BIOS to enable LAPIC, however, when specifying
>>>>>>> 'lapic' as boot parameter I got interesting result, still not working
>>>>>>> and /proc/interrups still shows XT-PIC.  Doing a diff between dmesg on
>>>>>>> 3.19 and 4.0-rc6 this pops out:
>>>>>>>
>>>>>>> -Local APIC disabled by BIOS -- you can enable it with "lapic"
>>>>>>> -APIC: disable apic facility
>>>>>>> -APIC: switched to apic NOOP
>>>>>>> +Local APIC disabled by BIOS -- reenabling.
>>>>>>> +Found and enabled local APIC!
>>>>>>>
>>>>>>> +Enabling APIC mode:  Flat.  Using 0 I/O APICs
>>>>>> What's the last know working kernel for Dell laptop?
>>>>>> Does it work as expected with v3.19 kernel?
>>>>>> Do you means this message is from plain v4.0-rc6 kernel?
>>>>>> Thanks!
>>>>>> Gerry
>>>>>>
>>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>>>
>>>>>
>>>>> Gerry, Rafael,
>>>>>
>>>>> Ok, found it.
>>>>> It was very 'interesting' bisect, because there were 2 overlapping
>>>>> issues here.  On the Dell laptop some kernels there wasn't an ACPI
>>>>> interrupt at all or it fired once and then seemed to get stuck.
>>>>> Turns out that kernel version:
>>>>> 3.16: OK
>>>>> 3.17: Broken (no acpi interrupt)
>>>>> 3.18: actually fine
>>>>> 3.19: Broken
>>>>>
>>>>> So started bisecting between 3.18 or 3.19, in the end found Rafael's
>>>>> patch which broke it:
>>>>>
>>>>> ==
>>>>> commit c50f13c672df758b59e026c15b9118f3ed46edc4
>>>>> Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>> Date:   Mon Dec 1 23:50:16 2014 +0100
>>>>>
>>>>>     ACPICA: Save current masks of enabled GPEs after enable register writes
>>>>> ==
>>>>>
>>>>> Reverting that patch on top of 4.0-rc7 (with some offsets and one
>>>>> trivial manual edit) and I finally got a working ACPI interrupt again!
>>>>
>>>> That's unexpected.
>>>>
>>>> Is system suspend/resume involved in the reproduction of the problem in any way?
>>>>
>>>> In any case, does it help if you replace "enable_mask" with "enable_for_run"
>>>> in line 127 of drivers/acpi/acpica/hwgpe.c (without reverting the whole commit)?
>>>>
>>>>
>>>
>>> No suspends/resumes, but this suggestion works :-)
>>
>> OK
>>
>>> --- hwgpe.c.ORIG        2015-04-12 10:41:11.754104398 +0200
>>> +++ hwgpe.c     2015-04-12 10:42:38.021283593 +0200
>>> @@ -124,7 +124,7 @@
>>>
>>>                 /* Only enable if the corresponding enable_mask bit is
>>> set */
>>>
>>> -               if (!(register_bit & gpe_register_info->enable_mask)) {
>>> +               if (!(register_bit & gpe_register_info->enable_for_run)) {
>>>                         return (AE_BAD_PARAMETER);
>>>                 }
>>>
>>> Tested-by: Jim Bos <jim876@xs4all.nl>
>>
>> No, no, this is not a fix. :-)
>>
>> It means, though, that enable_for_run and enable_mask diverge at one point and,
>> moreover, enable_for_run has more bits set, which is *really* mysterious.
>>
>> So the only place modifying enable_for_run is acpi_ev_update_gpe_enable_mask()
>> which roughly does this:
>> 	(a) Find the register bit corresponding to the given GPE.
>> 	(b) Clear that bit in enable_for_run.
>> 	(c) If runtime_count is set for the GPE, set that bit in enable_for_run.
>> Thus the only case when a bit may be set in enable_for_run is when runtime_count
>> is nonzero for the GPE in acpi_ev_update_gpe_enable_mask().
>>
>> Now, acpi_ev_update_gpe_enable_mask() is only called from two places,
>> acpi_ev_add_gpe_reference() and acpi_ev_remove_gpe_reference().  The former
>> calls it when runtime_count has just been incremented and is now equal to one
>> and the latter calls it when runtime_count has just been decremented and is now
>> equal to zero.  Hence, if all of the involved functions return AE_OK,
>> acpi_ev_add_gpe_reference() may set the corresponding bit in enable_for_run
>> and acpi_ev_remove_gpe_reference() may clear it.
>>
>> Further, having set the bit in enable_for_run, acpi_ev_add_gpe_reference() calls
>> acpi_ev_enable_gpe() which then calls acpi_hw_low_set_gpe() with ACPI_GPE_SAVE_MASK
>> set in the second argument.  Again, this causes the corresponding bit to be set in
>> the register's enable_mask, unless any errors are returned.  In that case the
>> bit will be set in both enable_for_run and enable_mask and analogously for
>> acpi_ev_remove_gpe_reference().  So if I'm not overlooking anything and if all of
>> the involved calls are successful, enable_for_run and enable_mask will always be
>> in sync.
>>
>> As far as I can say that may change *only* if there's an error, because in that
>> case (1) we may not save the mask that we attempted to write to the register and
>> (2) we will reset runtime_count *without* updating enable_for_run which arguably
>> is a bug.  So the previous patch might just work accidentally.
>>
>> If that theory holds any water, the patch below may help too (instead of the
>> previous one), so please test it.  If it doesn't help, we'll need to find out
>> what exactly happens on that system, but surely it is *not* usual behavior.
> 
> Actually, the one below is better, so please test this one instead.
> 
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: ACPICA: Store GPE register enable masks upfront
> 
> It is reported that ACPI interrupts do not work any more on
> Dell Latitude D600 after commit c50f13c672df (ACPICA: Save
> current masks of enabled GPEs after enable register writes).
> The problem turns out to be related to the fact that the
> enable_mask and enable_for_run GPE bit masks are not in
> sync (in the absence of any system suspend/resume events)
> for at least one GPE register on that machine.
> 
> Address this problem by writing the enable_for_run mask into
> enable_mask as soon as enable_for_run is updated instead of
> doing that only after the subsequent register write has
> succeeded.  For consistency, update acpi_hw_gpe_enable_write()
> to store the bit mask to be written into the GPE register
> in enable_mask unconditionally before the write.
> 
> Since the ACPI_GPE_SAVE_MASK flag is not necessary any more after
> that, drop it along with the symbols depending on it.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/acpi/acpica/evgpe.c |    5 +++--
>  drivers/acpi/acpica/hwgpe.c |   11 ++++-------
>  include/acpi/actypes.h      |    4 ----
>  3 files changed, 7 insertions(+), 13 deletions(-)
> 
> Index: linux-pm/drivers/acpi/acpica/hwgpe.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/acpica/hwgpe.c
> +++ linux-pm/drivers/acpi/acpica/hwgpe.c
> @@ -89,6 +89,8 @@ u32 acpi_hw_get_gpe_register_bit(struct
>   * RETURN:	Status
>   *
>   * DESCRIPTION: Enable or disable a single GPE in the parent enable register.
> + *              The enable_mask field of the involved GPE register structure
> + *              must be updated by the caller if necessary.
>   *
>   ******************************************************************************/
>  
> @@ -119,7 +121,7 @@ acpi_hw_low_set_gpe(struct acpi_gpe_even
>  	/* Set or clear just the bit that corresponds to this GPE */
>  
>  	register_bit = acpi_hw_get_gpe_register_bit(gpe_event_info);
> -	switch (action & ~ACPI_GPE_SAVE_MASK) {
> +	switch (action) {
>  	case ACPI_GPE_CONDITIONAL_ENABLE:
>  
>  		/* Only enable if the corresponding enable_mask bit is set */
> @@ -149,9 +151,6 @@ acpi_hw_low_set_gpe(struct acpi_gpe_even
>  	/* Write the updated enable mask */
>  
>  	status = acpi_hw_write(enable_mask, &gpe_register_info->enable_address);
> -	if (ACPI_SUCCESS(status) && (action & ACPI_GPE_SAVE_MASK)) {
> -		gpe_register_info->enable_mask = (u8)enable_mask;
> -	}
>  	return (status);
>  }
>  
> @@ -286,10 +285,8 @@ acpi_hw_gpe_enable_write(u8 enable_mask,
>  {
>  	acpi_status status;
>  
> +	gpe_register_info->enable_mask = enable_mask;
>  	status = acpi_hw_write(enable_mask, &gpe_register_info->enable_address);
> -	if (ACPI_SUCCESS(status)) {
> -		gpe_register_info->enable_mask = enable_mask;
> -	}
>  	return (status);
>  }
>  
> Index: linux-pm/include/acpi/actypes.h
> ===================================================================
> --- linux-pm.orig/include/acpi/actypes.h
> +++ linux-pm/include/acpi/actypes.h
> @@ -736,10 +736,6 @@ typedef u32 acpi_event_status;
>  #define ACPI_GPE_ENABLE                 0
>  #define ACPI_GPE_DISABLE                1
>  #define ACPI_GPE_CONDITIONAL_ENABLE     2
> -#define ACPI_GPE_SAVE_MASK              4
> -
> -#define ACPI_GPE_ENABLE_SAVE            (ACPI_GPE_ENABLE | ACPI_GPE_SAVE_MASK)
> -#define ACPI_GPE_DISABLE_SAVE           (ACPI_GPE_DISABLE | ACPI_GPE_SAVE_MASK)
>  
>  /*
>   * GPE info flags - Per GPE
> Index: linux-pm/drivers/acpi/acpica/evgpe.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/acpica/evgpe.c
> +++ linux-pm/drivers/acpi/acpica/evgpe.c
> @@ -92,6 +92,7 @@ acpi_ev_update_gpe_enable_mask(struct ac
>  		ACPI_SET_BIT(gpe_register_info->enable_for_run,
>  			     (u8)register_bit);
>  	}
> +	gpe_register_info->enable_mask = gpe_register_info->enable_for_run;
>  
>  	return_ACPI_STATUS(AE_OK);
>  }
> @@ -123,7 +124,7 @@ acpi_status acpi_ev_enable_gpe(struct ac
>  
>  	/* Enable the requested GPE */
>  
> -	status = acpi_hw_low_set_gpe(gpe_event_info, ACPI_GPE_ENABLE_SAVE);
> +	status = acpi_hw_low_set_gpe(gpe_event_info, ACPI_GPE_ENABLE);
>  	return_ACPI_STATUS(status);
>  }
>  
> @@ -202,7 +203,7 @@ acpi_ev_remove_gpe_reference(struct acpi
>  		if (ACPI_SUCCESS(status)) {
>  			status =
>  			    acpi_hw_low_set_gpe(gpe_event_info,
> -						ACPI_GPE_DISABLE_SAVE);
> +						ACPI_GPE_DISABLE);
>  		}
>  
>  		if (ACPI_FAILURE(status)) {
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

Works!  If this is actually the fix ;-)

Tested-by: Jim Bos <jim876@xs4all.nl>


_
Jim



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7
  2015-04-13 14:54                         ` Jim Bos
@ 2015-04-13 19:46                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2015-04-13 19:46 UTC (permalink / raw)
  To: Jim Bos
  Cc: Jiang Liu, Len Brown, Pavel Machek, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Linux Kernel Mailing List, linux-pm,
	Lv Zheng, ACPI Devel Maling List, Bob Moore

On Monday, April 13, 2015 04:54:16 PM Jim Bos wrote:
> On 04/13/2015 03:36 PM, Rafael J. Wysocki wrote:
> > On Monday, April 13, 2015 01:30:20 AM Rafael J. Wysocki wrote:
> >> On Sunday, April 12, 2015 11:03:30 AM Jim Bos wrote:
> >>> On 04/12/2015 03:29 AM, Rafael J. Wysocki wrote:
> >>>> On Saturday, April 11, 2015 05:08:38 PM Jim Bos wrote:
> >>>>> On 04/10/2015 03:56 AM, Jiang Liu wrote:
> >>>>>> On 2015/4/10 0:41, Jim Bos wrote:
> >>>>>>> On 04/09/2015 12:15 PM, Jiang Liu wrote:
> >>>>>>>> On 2015/4/8 23:51, Jim Bos wrote:
> >>>>>>>>> On 04/08/2015 07:26 AM, Jiang Liu wrote:
> >>>>>>>>>> On 2015/4/8 0:49, Jim Bos wrote:
> >>>>>>>>>>> On 04/07/2015 04:34 PM, Jiang Liu wrote:
> >>>>>> <snip>
> >>>>>>>> Hi Jim,
> >>>>>>>> 	I'm really confused. I can't even explain why my previous
> >>>>>>>> patch fixes the issue on AMD geode board now:(
> >>>>>>>>
> >>>>>>>> For the Dell laptop, seems you have:
> >>>>>>>> 1) build a kernel with Local APIC and IOAPIC enabled
> >>>>>>>> 2) lapic is disabled by BIOS, so there's no ACPI MADT(APIC)
> >>>>>>>> table at all.
> >>>>>>>> That means the laptop is working with 8259 PICs only.
> >>>>>>>> There's little change between 3.16 and 4.0 related to 8259.
> >>>>>>>>
> >>>>>>>> For the AMD geode board, I still think original code is right.
> >>>>>>>> I can't explain why the patch fix the issue.
> >>>>>>>>
> >>>>>>>> So could you please help to:
> >>>>>>>> 1) Try to enable lapic on Dell laptop in BIOS
> >>>>>>>> 2) Dump acpi tables and dmesg on AMD board
> >>>>>>>>
> >>>>>>>> If that still doesn't help, I will try to send you some
> >>>>>>>> debug patches to gather more info.
> >>>>>>>> Thanks!
> >>>>>>>> Gerry
> >>>>>>>>> _
> >>>>>>>>> Jim
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> Gerry,
> >>>>>>>
> >>>>>>> As you mentioned your patch shouldn't make a difference, run some more
> >>>>>>> tests, as it turns out:
> >>>>>>> - geode system  broken on 3.16+ up to and including 3.19, however, on
> >>>>>>> plain 4.0-rc6 it works!  Root cause appears to be there isn't an ACPI
> >>>>>>> interrupt assigned in non-working kernels.
> >>>>>>> - other system I got my hands on: Pentium(R) CPU G3220, broken on 3.19.0
> >>>>>>> when boot parameter 'nosmp' is specified, again no acpi entry in
> >>>>>>> /proc/interrupts,  working fine on 4.0-rc6
> >>>>>>>
> >>>>>>> So obviously between 3.19 and 4.0-rc6 something got fixed here!
> >>>>>> Hi Jim,
> >>>>>> Yes, the bugfix patch should be:
> >>>>>> commit 1ea76fbadd66("x86/irq: Fix regression caused by commit b568b8601f05")
> >>>>>>
> >>>>>>> The Dell laptop remains the only problem then on 4.0-rc6, there IS an
> >>>>>>> acpi interrupt  (but firing once apparently).
> >>>>>>> There isn't an option in BIOS to enable LAPIC, however, when specifying
> >>>>>>> 'lapic' as boot parameter I got interesting result, still not working
> >>>>>>> and /proc/interrups still shows XT-PIC.  Doing a diff between dmesg on
> >>>>>>> 3.19 and 4.0-rc6 this pops out:
> >>>>>>>
> >>>>>>> -Local APIC disabled by BIOS -- you can enable it with "lapic"
> >>>>>>> -APIC: disable apic facility
> >>>>>>> -APIC: switched to apic NOOP
> >>>>>>> +Local APIC disabled by BIOS -- reenabling.
> >>>>>>> +Found and enabled local APIC!
> >>>>>>>
> >>>>>>> +Enabling APIC mode:  Flat.  Using 0 I/O APICs
> >>>>>> What's the last know working kernel for Dell laptop?
> >>>>>> Does it work as expected with v3.19 kernel?
> >>>>>> Do you means this message is from plain v4.0-rc6 kernel?
> >>>>>> Thanks!
> >>>>>> Gerry
> >>>>>>
> >>>>>>>
> >>>>>>> Jim
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> Gerry, Rafael,
> >>>>>
> >>>>> Ok, found it.
> >>>>> It was very 'interesting' bisect, because there were 2 overlapping
> >>>>> issues here.  On the Dell laptop some kernels there wasn't an ACPI
> >>>>> interrupt at all or it fired once and then seemed to get stuck.
> >>>>> Turns out that kernel version:
> >>>>> 3.16: OK
> >>>>> 3.17: Broken (no acpi interrupt)
> >>>>> 3.18: actually fine
> >>>>> 3.19: Broken
> >>>>>
> >>>>> So started bisecting between 3.18 or 3.19, in the end found Rafael's
> >>>>> patch which broke it:
> >>>>>
> >>>>> ==
> >>>>> commit c50f13c672df758b59e026c15b9118f3ed46edc4
> >>>>> Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>>>> Date:   Mon Dec 1 23:50:16 2014 +0100
> >>>>>
> >>>>>     ACPICA: Save current masks of enabled GPEs after enable register writes
> >>>>> ==
> >>>>>
> >>>>> Reverting that patch on top of 4.0-rc7 (with some offsets and one
> >>>>> trivial manual edit) and I finally got a working ACPI interrupt again!
> >>>>
> >>>> That's unexpected.
> >>>>
> >>>> Is system suspend/resume involved in the reproduction of the problem in any way?
> >>>>
> >>>> In any case, does it help if you replace "enable_mask" with "enable_for_run"
> >>>> in line 127 of drivers/acpi/acpica/hwgpe.c (without reverting the whole commit)?
> >>>>
> >>>>
> >>>
> >>> No suspends/resumes, but this suggestion works :-)
> >>
> >> OK
> >>
> >>> --- hwgpe.c.ORIG        2015-04-12 10:41:11.754104398 +0200
> >>> +++ hwgpe.c     2015-04-12 10:42:38.021283593 +0200
> >>> @@ -124,7 +124,7 @@
> >>>
> >>>                 /* Only enable if the corresponding enable_mask bit is
> >>> set */
> >>>
> >>> -               if (!(register_bit & gpe_register_info->enable_mask)) {
> >>> +               if (!(register_bit & gpe_register_info->enable_for_run)) {
> >>>                         return (AE_BAD_PARAMETER);
> >>>                 }
> >>>
> >>> Tested-by: Jim Bos <jim876@xs4all.nl>
> >>
> >> No, no, this is not a fix. :-)
> >>
> >> It means, though, that enable_for_run and enable_mask diverge at one point and,
> >> moreover, enable_for_run has more bits set, which is *really* mysterious.
> >>
> >> So the only place modifying enable_for_run is acpi_ev_update_gpe_enable_mask()
> >> which roughly does this:
> >> 	(a) Find the register bit corresponding to the given GPE.
> >> 	(b) Clear that bit in enable_for_run.
> >> 	(c) If runtime_count is set for the GPE, set that bit in enable_for_run.
> >> Thus the only case when a bit may be set in enable_for_run is when runtime_count
> >> is nonzero for the GPE in acpi_ev_update_gpe_enable_mask().
> >>
> >> Now, acpi_ev_update_gpe_enable_mask() is only called from two places,
> >> acpi_ev_add_gpe_reference() and acpi_ev_remove_gpe_reference().  The former
> >> calls it when runtime_count has just been incremented and is now equal to one
> >> and the latter calls it when runtime_count has just been decremented and is now
> >> equal to zero.  Hence, if all of the involved functions return AE_OK,
> >> acpi_ev_add_gpe_reference() may set the corresponding bit in enable_for_run
> >> and acpi_ev_remove_gpe_reference() may clear it.
> >>
> >> Further, having set the bit in enable_for_run, acpi_ev_add_gpe_reference() calls
> >> acpi_ev_enable_gpe() which then calls acpi_hw_low_set_gpe() with ACPI_GPE_SAVE_MASK
> >> set in the second argument.  Again, this causes the corresponding bit to be set in
> >> the register's enable_mask, unless any errors are returned.  In that case the
> >> bit will be set in both enable_for_run and enable_mask and analogously for
> >> acpi_ev_remove_gpe_reference().  So if I'm not overlooking anything and if all of
> >> the involved calls are successful, enable_for_run and enable_mask will always be
> >> in sync.
> >>
> >> As far as I can say that may change *only* if there's an error, because in that
> >> case (1) we may not save the mask that we attempted to write to the register and
> >> (2) we will reset runtime_count *without* updating enable_for_run which arguably
> >> is a bug.  So the previous patch might just work accidentally.
> >>
> >> If that theory holds any water, the patch below may help too (instead of the
> >> previous one), so please test it.  If it doesn't help, we'll need to find out
> >> what exactly happens on that system, but surely it is *not* usual behavior.
> > 
> > Actually, the one below is better, so please test this one instead.
> > 
> > ---
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Subject: ACPICA: Store GPE register enable masks upfront
> > 
> > It is reported that ACPI interrupts do not work any more on
> > Dell Latitude D600 after commit c50f13c672df (ACPICA: Save
> > current masks of enabled GPEs after enable register writes).
> > The problem turns out to be related to the fact that the
> > enable_mask and enable_for_run GPE bit masks are not in
> > sync (in the absence of any system suspend/resume events)
> > for at least one GPE register on that machine.
> > 
> > Address this problem by writing the enable_for_run mask into
> > enable_mask as soon as enable_for_run is updated instead of
> > doing that only after the subsequent register write has
> > succeeded.  For consistency, update acpi_hw_gpe_enable_write()
> > to store the bit mask to be written into the GPE register
> > in enable_mask unconditionally before the write.
> > 
> > Since the ACPI_GPE_SAVE_MASK flag is not necessary any more after
> > that, drop it along with the symbols depending on it.
> > 
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/acpi/acpica/evgpe.c |    5 +++--
> >  drivers/acpi/acpica/hwgpe.c |   11 ++++-------
> >  include/acpi/actypes.h      |    4 ----
> >  3 files changed, 7 insertions(+), 13 deletions(-)
> > 
> > Index: linux-pm/drivers/acpi/acpica/hwgpe.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/acpica/hwgpe.c
> > +++ linux-pm/drivers/acpi/acpica/hwgpe.c
> > @@ -89,6 +89,8 @@ u32 acpi_hw_get_gpe_register_bit(struct
> >   * RETURN:	Status
> >   *
> >   * DESCRIPTION: Enable or disable a single GPE in the parent enable register.
> > + *              The enable_mask field of the involved GPE register structure
> > + *              must be updated by the caller if necessary.
> >   *
> >   ******************************************************************************/
> >  
> > @@ -119,7 +121,7 @@ acpi_hw_low_set_gpe(struct acpi_gpe_even
> >  	/* Set or clear just the bit that corresponds to this GPE */
> >  
> >  	register_bit = acpi_hw_get_gpe_register_bit(gpe_event_info);
> > -	switch (action & ~ACPI_GPE_SAVE_MASK) {
> > +	switch (action) {
> >  	case ACPI_GPE_CONDITIONAL_ENABLE:
> >  
> >  		/* Only enable if the corresponding enable_mask bit is set */
> > @@ -149,9 +151,6 @@ acpi_hw_low_set_gpe(struct acpi_gpe_even
> >  	/* Write the updated enable mask */
> >  
> >  	status = acpi_hw_write(enable_mask, &gpe_register_info->enable_address);
> > -	if (ACPI_SUCCESS(status) && (action & ACPI_GPE_SAVE_MASK)) {
> > -		gpe_register_info->enable_mask = (u8)enable_mask;
> > -	}
> >  	return (status);
> >  }
> >  
> > @@ -286,10 +285,8 @@ acpi_hw_gpe_enable_write(u8 enable_mask,
> >  {
> >  	acpi_status status;
> >  
> > +	gpe_register_info->enable_mask = enable_mask;
> >  	status = acpi_hw_write(enable_mask, &gpe_register_info->enable_address);
> > -	if (ACPI_SUCCESS(status)) {
> > -		gpe_register_info->enable_mask = enable_mask;
> > -	}
> >  	return (status);
> >  }
> >  
> > Index: linux-pm/include/acpi/actypes.h
> > ===================================================================
> > --- linux-pm.orig/include/acpi/actypes.h
> > +++ linux-pm/include/acpi/actypes.h
> > @@ -736,10 +736,6 @@ typedef u32 acpi_event_status;
> >  #define ACPI_GPE_ENABLE                 0
> >  #define ACPI_GPE_DISABLE                1
> >  #define ACPI_GPE_CONDITIONAL_ENABLE     2
> > -#define ACPI_GPE_SAVE_MASK              4
> > -
> > -#define ACPI_GPE_ENABLE_SAVE            (ACPI_GPE_ENABLE | ACPI_GPE_SAVE_MASK)
> > -#define ACPI_GPE_DISABLE_SAVE           (ACPI_GPE_DISABLE | ACPI_GPE_SAVE_MASK)
> >  
> >  /*
> >   * GPE info flags - Per GPE
> > Index: linux-pm/drivers/acpi/acpica/evgpe.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/acpica/evgpe.c
> > +++ linux-pm/drivers/acpi/acpica/evgpe.c
> > @@ -92,6 +92,7 @@ acpi_ev_update_gpe_enable_mask(struct ac
> >  		ACPI_SET_BIT(gpe_register_info->enable_for_run,
> >  			     (u8)register_bit);
> >  	}
> > +	gpe_register_info->enable_mask = gpe_register_info->enable_for_run;
> >  
> >  	return_ACPI_STATUS(AE_OK);
> >  }
> > @@ -123,7 +124,7 @@ acpi_status acpi_ev_enable_gpe(struct ac
> >  
> >  	/* Enable the requested GPE */
> >  
> > -	status = acpi_hw_low_set_gpe(gpe_event_info, ACPI_GPE_ENABLE_SAVE);
> > +	status = acpi_hw_low_set_gpe(gpe_event_info, ACPI_GPE_ENABLE);
> >  	return_ACPI_STATUS(status);
> >  }
> >  
> > @@ -202,7 +203,7 @@ acpi_ev_remove_gpe_reference(struct acpi
> >  		if (ACPI_SUCCESS(status)) {
> >  			status =
> >  			    acpi_hw_low_set_gpe(gpe_event_info,
> > -						ACPI_GPE_DISABLE_SAVE);
> > +						ACPI_GPE_DISABLE);
> >  		}
> >  
> >  		if (ACPI_FAILURE(status)) {
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> 
> Works!  If this is actually the fix ;-)
> 
> Tested-by: Jim Bos <jim876@xs4all.nl>

Yes, this is the fix I'd like to apply unless others have objections.

Thanks!


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-04-13 19:46 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-05 14:43 ACPI events broken on non-SMP since 3.16 Jim Bos
2015-04-07 14:34 ` [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 Jiang Liu
2015-04-07 16:49   ` Jim Bos
2015-04-08  5:26     ` Jiang Liu
2015-04-08 15:51       ` Jim Bos
2015-04-09 10:15         ` Jiang Liu
2015-04-09 16:41           ` Jim Bos
2015-04-10  1:56             ` Jiang Liu
2015-04-10  6:19               ` Jim Bos
2015-04-11 15:08               ` [PATCH] x86/ACPI: Fix regression caused by 16ee7b3dcc56 & c50f13c672df7 Jim Bos
2015-04-11 16:57                 ` Jiang Liu
2015-04-12  1:29                 ` Rafael J. Wysocki
2015-04-12  9:03                   ` Jim Bos
2015-04-12 23:30                     ` Rafael J. Wysocki
2015-04-13 13:36                       ` Rafael J. Wysocki
2015-04-13 14:54                         ` Jim Bos
2015-04-13 19:46                           ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.