* [Bugfix v2] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32bit kernel
@ 2015-06-24  7:43 Jiang Liu
  2015-06-24  8:25 ` Boszormenyi Zoltan
  2015-06-24  8:30 ` Ingo Molnar
  0 siblings, 2 replies; 18+ messages in thread
From: Jiang Liu @ 2015-06-24  7:43 UTC (permalink / raw)
  To: Rafael J . Wysocki, Bjorn Helgaas, Ingo Molnar,
	Boszormenyi Zoltan, Len Brown
  Cc: Jiang Liu, LKML, linux-pci, linux-acpi, x86 @ kernel . org

Since commit 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource
interfaces to simplify implementation"), x86 PCI ACPI host bridge driver
validates ACPI resources by first converting an ACPI resource to
a 'struct resource' structure and then applying checks against the
converted resource structure. The 'start' and 'end' fields in 'struct
resource' are defined to be type of resource_size_t, which may be 32 bits
or 64 bits depending on CONFIG_PHYS_ADDR_T_64BIT.

This may cause incorrect resource validation results with 32 bit kernels
because 64bit ACPI resource descriptors may get truncated when converting
to 32bit 'start' and 'end' fields in 'struct resource'. And eventually
affects PCI resource allocation subsystem and causes some PCI devices
unusable.

So enhance the ACPI resource parsing interfaces to ignore ACPI resource
descriptors with address/offset above 4G when running in 32bit mode.
This reverts to the behavior before commit 593669c2ac0f.

This issue was triggered on a platform running 32bit kernel with an
ACPI resource descriptor with address range [0x400000000-0xfffffffff].
Please refer to https://lkml.org/lkml/2015/6/19/277 for more information.
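
The round-trip check that this patch adds can be sketched in userspace C
roughly as follows. This is only an illustration under stated assumptions:
'res32_t', 'struct window32' and 'decode_window' are made-up names, with
'res32_t' standing in for resource_size_t on a kernel built without
CONFIG_PHYS_ADDR_T_64BIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit resource_size_t. */
typedef uint32_t res32_t;

/* Hypothetical, simplified window: only the fields the check touches. */
struct window32 {
	res32_t start;
	res32_t end;
	res32_t offset;
};

/*
 * Compute the window in 64 bits, store it in the narrow fields, then
 * compare the stored values with the 64-bit originals. Any mismatch
 * means the store truncated a value, so the window is rejected.
 */
static bool decode_window(uint64_t min, uint64_t max, uint64_t offset,
			  struct window32 *win)
{
	uint64_t start = min + offset;
	uint64_t end = max + offset;

	assert(min <= max);

	win->offset = (res32_t)offset;
	win->start = (res32_t)start;
	win->end = (res32_t)end;

	if (offset != win->offset || start != win->start || end != win->end)
		return false;	/* not representable in 32-bit fields */

	return true;
}
```

Under these assumptions, decode_window(0x400000000, 0xfffffffff, 0, &w)
returns false, while a window that lies entirely below 4G (for example
[0xe0000000-0xfebfffff], a range picked only for illustration) is accepted.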

Reported-by: Boszormenyi Zoltan <zboszor@pr.hu>
Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation")
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: stable@vger.kernel.org # 4.0
---

Hi Zoltan,
	Could you please help to test this patch against the latest kernel?
Thanks!
Gerry

---
 drivers/acpi/resource.c |   24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 8244f013f210..f1c966e05078 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -193,6 +193,7 @@ static bool acpi_decode_space(struct resource_win *win,
 	u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 : ACPI_DECODE_16;
 	bool wp = addr->info.mem.write_protect;
 	u64 len = attr->address_length;
+	u64 start, end, offset = 0;
 	struct resource *res = &win->res;
 
 	/*
@@ -204,9 +205,6 @@ static bool acpi_decode_space(struct resource_win *win,
 		pr_debug("ACPI: Invalid address space min_addr_fix %d, max_addr_fix %d, len %llx\n",
 			 addr->min_address_fixed, addr->max_address_fixed, len);
 
-	res->start = attr->minimum;
-	res->end = attr->maximum;
-
 	/*
 	 * For bridges that translate addresses across the bridge,
 	 * translation_offset is the offset that must be added to the
@@ -214,12 +212,22 @@ static bool acpi_decode_space(struct resource_win *win,
 	 * primary side. Non-bridge devices must list 0 for all Address
 	 * Translation offset bits.
 	 */
-	if (addr->producer_consumer == ACPI_PRODUCER) {
-		res->start += attr->translation_offset;
-		res->end += attr->translation_offset;
-	} else if (attr->translation_offset) {
+	if (addr->producer_consumer == ACPI_PRODUCER)
+		offset = attr->translation_offset;
+	else if (attr->translation_offset)
 		pr_debug("ACPI: translation_offset(%lld) is invalid for non-bridge device.\n",
 			 attr->translation_offset);
+	start = attr->minimum + offset;
+	end = attr->maximum + offset;
+
+	win->offset = offset;
+	res->start = start;
+	res->end = end;
+	if (sizeof(resource_size_t) < sizeof(u64) &&
+	    (offset != win->offset || start != res->start || end != res->end)) {
+		pr_warn("acpi resource window ([%#llx-%#llx] ignored, not CPU addressable)\n",
+			attr->minimum, attr->maximum);
+		return false;
 	}
 
 	switch (addr->resource_type) {
@@ -236,8 +244,6 @@ static bool acpi_decode_space(struct resource_win *win,
 		return false;
 	}
 
-	win->offset = attr->translation_offset;
-
 	if (addr->producer_consumer == ACPI_PRODUCER)
 		res->flags |= IORESOURCE_WINDOW;
 
-- 
1.7.10.4



* Re: [Bugfix v2] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32bit kernel
  2015-06-24  7:43 [Bugfix v2] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32bit kernel Jiang Liu
@ 2015-06-24  8:25 ` Boszormenyi Zoltan
  2015-06-24 11:00   ` Boszormenyi Zoltan
  2015-06-24  8:30 ` Ingo Molnar
  1 sibling, 1 reply; 18+ messages in thread
From: Boszormenyi Zoltan @ 2015-06-24  8:25 UTC (permalink / raw)
  To: Jiang Liu, Rafael J . Wysocki, Bjorn Helgaas, Ingo Molnar, Len Brown
  Cc: LKML, linux-pci, linux-acpi, x86 @ kernel . org

On 2015-06-24 09:43, Jiang Liu wrote:
> Since commit 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource
> interfaces to simplify implementation"), x86 PCI ACPI host bridge driver
> validates ACPI resources by first converting an ACPI resource to
> a 'struct resource' structure and then applying checks against the
> converted resource structure. The 'start' and 'end' fields in 'struct
> resource' are defined to be type of resource_size_t, which may be 32 bits
> or 64 bits depending on CONFIG_PHYS_ADDR_T_64BIT.
>
> This may cause incorrect resource validation results with 32 bit kernels
> because 64bit ACPI resource descriptors may get truncated when converting
> to 32bit 'start' and 'end' fields in 'struct resource'. And eventually
> affects PCI resource allocation subsystem and causes some PCI devices
> unusable.
>
> So enhance the ACPI resource parsing interfaces to ignore ACPI resource
> descriptors with address/offset above 4G when running in 32bit mode.
> This reverts to the behavior before commit 593669c2ac0f.
>
> This issue was triggered on a platform running 32bit kernel with an
> ACPI resource descriptor with address range [0x400000000-0xfffffffff].
> Please refer to https://lkml.org/lkml/2015/6/19/277 for more information.
>
> Reported-by: Boszormenyi Zoltan <zboszor@pr.hu>
> Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation")
> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
> Cc: stable@vger.kernel.org # 4.0
> ---
>
> Hi Zoltan,
> 	Could you please help to test this patch against the latest kernel?
> Thanks!
> Gerry

I will, thanks.

Best regards,
Zoltán

>
> ---
>  drivers/acpi/resource.c |   24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
> index 8244f013f210..f1c966e05078 100644
> --- a/drivers/acpi/resource.c
> +++ b/drivers/acpi/resource.c
> @@ -193,6 +193,7 @@ static bool acpi_decode_space(struct resource_win *win,
>  	u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 : ACPI_DECODE_16;
>  	bool wp = addr->info.mem.write_protect;
>  	u64 len = attr->address_length;
> +	u64 start, end, offset = 0;
>  	struct resource *res = &win->res;
>  
>  	/*
> @@ -204,9 +205,6 @@ static bool acpi_decode_space(struct resource_win *win,
>  		pr_debug("ACPI: Invalid address space min_addr_fix %d, max_addr_fix %d, len %llx\n",
>  			 addr->min_address_fixed, addr->max_address_fixed, len);
>  
> -	res->start = attr->minimum;
> -	res->end = attr->maximum;
> -
>  	/*
>  	 * For bridges that translate addresses across the bridge,
>  	 * translation_offset is the offset that must be added to the
> @@ -214,12 +212,22 @@ static bool acpi_decode_space(struct resource_win *win,
>  	 * primary side. Non-bridge devices must list 0 for all Address
>  	 * Translation offset bits.
>  	 */
> -	if (addr->producer_consumer == ACPI_PRODUCER) {
> -		res->start += attr->translation_offset;
> -		res->end += attr->translation_offset;
> -	} else if (attr->translation_offset) {
> +	if (addr->producer_consumer == ACPI_PRODUCER)
> +		offset = attr->translation_offset;
> +	else if (attr->translation_offset)
>  		pr_debug("ACPI: translation_offset(%lld) is invalid for non-bridge device.\n",
>  			 attr->translation_offset);
> +	start = attr->minimum + offset;
> +	end = attr->maximum + offset;
> +
> +	win->offset = offset;
> +	res->start = start;
> +	res->end = end;
> +	if (sizeof(resource_size_t) < sizeof(u64) &&
> +	    (offset != win->offset || start != res->start || end != res->end)) {
> +		pr_warn("acpi resource window ([%#llx-%#llx] ignored, not CPU addressable)\n",
> +			attr->minimum, attr->maximum);
> +		return false;
>  	}
>  
>  	switch (addr->resource_type) {
> @@ -236,8 +244,6 @@ static bool acpi_decode_space(struct resource_win *win,
>  		return false;
>  	}
>  
> -	win->offset = attr->translation_offset;
> -
>  	if (addr->producer_consumer == ACPI_PRODUCER)
>  		res->flags |= IORESOURCE_WINDOW;
>  


* Re: [Bugfix v2] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32bit kernel
  2015-06-24  7:43 [Bugfix v2] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32bit kernel Jiang Liu
  2015-06-24  8:25 ` Boszormenyi Zoltan
@ 2015-06-24  8:30 ` Ingo Molnar
  2015-06-24  9:28     ` Boszormenyi Zoltan
  1 sibling, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2015-06-24  8:30 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Rafael J . Wysocki, Bjorn Helgaas, Boszormenyi Zoltan, Len Brown,
	LKML, linux-pci, linux-acpi, x86 @ kernel . org


* Jiang Liu <jiang.liu@linux.intel.com> wrote:

> Since commit 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to 
> simplify implementation"), x86 PCI ACPI host bridge driver validates ACPI 
> resources by first converting an ACPI resource to a 'struct resource' structure 
> and then applying checks against the converted resource structure. The 'start' 
> and 'end' fields in 'struct resource' are defined to be type of resource_size_t, 
> which may be 32 bits or 64 bits depending on CONFIG_PHYS_ADDR_T_64BIT.
> 
> This may cause incorrect resource validation results with 32 bit kernels because 
> 64bit ACPI resource descriptors may get truncated when converting to 32bit 
> 'start' and 'end' fields in 'struct resource'. And eventually affects PCI 
> resource allocation subsystem and causes some PCI devices unusable.

s/causes some PCI devices unusable.
  makes some PCI devices unusable.

Also, this description is still pretty vague. What exactly happened? Did some PCI 
devices not show up during bootup? Or did they hang? Or did something else happen?

This is _by far_ the most important part of the changelog and determines whether a 
patch gets backported or not. Why does a usable regression description have to be 
coaxed out of you like pulling teeth??

> So enhance the ACPI resource parsing interfaces to ignore ACPI resource 
> descriptors with address/offset above 4G when running in 32bit mode. This 
> reverts to the behavior before commit 593669c2ac0f.
> 
> This issue was triggered on a platform running 32bit kernel with an ACPI 
> resource descriptor with address range [0x400000000-0xfffffffff]. Please refer 
> to https://lkml.org/lkml/2015/6/19/277 for more information.

s/32bit/32-bit
s/64bit/64-bit
s/32 bit/32-bit
s/64 bit/64-bit

Thanks,

    Ingo



* Re: [Bugfix v2] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32bit kernel
  2015-06-24  8:30 ` Ingo Molnar
@ 2015-06-24  9:28     ` Boszormenyi Zoltan
  0 siblings, 0 replies; 18+ messages in thread
From: Boszormenyi Zoltan @ 2015-06-24  9:28 UTC (permalink / raw)
  To: Ingo Molnar, Jiang Liu
  Cc: Rafael J . Wysocki, Bjorn Helgaas, Len Brown, LKML, linux-pci,
	linux-acpi, x86 @ kernel . org

On 2015-06-24 10:30, Ingo Molnar wrote:
> * Jiang Liu <jiang.liu@linux.intel.com> wrote:
>
>> Since commit 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to 
>> simplify implementation"), x86 PCI ACPI host bridge driver validates ACPI 
>> resources by first converting an ACPI resource to a 'struct resource' structure 
>> and then applying checks against the converted resource structure. The 'start' 
>> and 'end' fields in 'struct resource' are defined to be type of resource_size_t, 
>> which may be 32 bits or 64 bits depending on CONFIG_PHYS_ADDR_T_64BIT.
>>
>> This may cause incorrect resource validation results with 32 bit kernels because 
>> 64bit ACPI resource descriptors may get truncated when converting to 32bit 
>> 'start' and 'end' fields in 'struct resource'. And eventually affects PCI 
>> resource allocation subsystem and causes some PCI devices unusable.
> s/causes some PCI devices unusable.
>   makes some PCI devices unusable.
>
> Also, this description is still pretty vague. What exactly happened? Did some PCI 
> devices not show up during bootup? Or did they hang? Or did something else happen?

There's a reference mail URL in the description, but here it is in full glory.

The machine in question started behaving like being drunk without this fix
with 4.0.5 and 4.1.0-rc8 and 4.1.0-final. 3.18.16 was good.

There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID 1565:230e)
network chip on the mainboard. After the r8169 driver loaded, the IRQs in
the machine went berserk. Keyboard keypresses arrived with considerable
latency and were duplicated, so no real work was possible. The machine responded
to the power button but didn't actually power down. It just got stuck at the powering
down message. I had to press the power button for 4 seconds to power it down.

The computer is a POS machine with a big battery inside. Because of this,
either ACPI or the Realtek chip kept the bad state and after rebooting, the
network chip didn't even show up in lspci. Not even the PXE ROM announced
itself during boot. I had to disconnect the battery to beat some sense back
to the computer.

Without the patch I was able to get debugging info out of the machine in this
bad state with:

# modprobe r8169 ; sleep 10 ; dmesg >dmesg.log ; lspci -vvxxx >lspci.log ; \
    sync ; sync ; sync ; poweroff

all in the same command line. Entering commands manually after a single
"modprobe r8169" was impossible. That revealed that the #2 PCIe port
(the one that the Realtek chip is attached to) changed this way:

@@ -211,7 +211,7 @@
 
 00:1c.1 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 2 (rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
-       Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
+       Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        I/O behind bridge: 0000e000-0000efff
@@ -226,7 +226,7 @@
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
-               DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
+               DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #2, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <256ns, L1 <4us
                        ClockPM- Surprise- LLActRep+ BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+

The "uncorrectable error" seems to have pushed it or the device behind it
to a disabled state after reboot and this state was kept because of the battery.

Also, the 32-bit wraparound caused every device in the system
to be reprogrammed to use a different memory address range.
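
As a rough illustration of that wraparound, here is a userspace sketch (not
kernel code) of what an unchecked store into 32-bit fields does to the
reported range. 'struct range32' and 'truncate_range' are made-up names, and
a zero translation offset is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit range, standing in for 'start'/'end' on a 32-bit kernel. */
struct range32 {
	uint32_t start;
	uint32_t end;
};

/*
 * Store a 64-bit range into 32-bit fields without any check.
 * The conversion keeps only the low 32 bits of each address.
 */
static struct range32 truncate_range(uint64_t start, uint64_t end)
{
	struct range32 r;

	assert(start <= end);
	r.start = (uint32_t)start;
	r.end = (uint32_t)end;
	return r;
}
```

For [0x400000000-0xfffffffff] this yields start 0x0 and end 0xffffffff, so
the truncated window appears to cover the whole 32-bit address space.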

With the fix, the behavior of the machine was restored to how 3.18.16 worked,
i.e. the memory range that is over 4GB is ignored again, and lspci -vvxxx shows
that everything is at the same memory window as they were with 3.18.16.

Unrelated to this fix, but I also had an adventure with r8168 (downloaded from
Realtek and compiled from source) vs r8169. Most likely caused by switching
between r8168 and r8169, the network chip was programmed with a bad
MAC address (ff:fc:6d:11:28:ff, the real one is 00:0c:6d:11:28:77) which made
the network start acting weirdly. While the machine was pingable and it was
able to ping others, real networking did not work: the ssh login prompt never appeared,
traceroute took ages, etc. That was also solved by disconnecting the battery
and powering down completely and returning to r8169 with the kernel patched
with a preliminary version of this patch.

>
> This is _by far_ the most important part of the changelog and determines whether a 
> patch gets backported or not. Why does a usable regression description have to be 
> coaxed out of you like pulling teeth??

The commit description by Jiang Liu has the URL for the initial mail where
I reported the symptoms I experienced. If you think the above summary is
not too long for a commit message, then feel free to use it, edited
in any way you like.

Best regards,
Zoltán

>
>> So enhance the ACPI resource parsing interfaces to ignore ACPI resource 
>> descriptors with address/offset above 4G when running in 32bit mode. This 
>> reverts to the behavior before commit 593669c2ac0f.
>>
>> This issue was triggered on a platform running 32bit kernel with an ACPI 
>> resource descriptor with address range [0x400000000-0xfffffffff]. Please refer 
>> to https://lkml.org/lkml/2015/6/19/277 for more information.
> s/32bit/32-bit
> s/64bit/64-bit
> s/32 bit/32-bit
> s/64 bit/64-bit
>
> Thanks,
>
>     Ingo
>
>


* Re: [Bugfix v2] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32bit kernel
  2015-06-24  9:28     ` Boszormenyi Zoltan
  (?)
@ 2015-06-24  9:49     ` Ingo Molnar
  2015-06-24 10:17       ` [Bugfix v3] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel Jiang Liu
  -1 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2015-06-24  9:49 UTC (permalink / raw)
  To: Boszormenyi Zoltan
  Cc: Jiang Liu, Rafael J . Wysocki, Bjorn Helgaas, Len Brown, LKML,
	linux-pci, linux-acpi, x86 @ kernel . org


* Boszormenyi Zoltan <zboszor@pr.hu> wrote:

> On 2015-06-24 10:30, Ingo Molnar wrote:
> > * Jiang Liu <jiang.liu@linux.intel.com> wrote:
> >
> >> Since commit 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to 
> >> simplify implementation"), x86 PCI ACPI host bridge driver validates ACPI 
> >> resources by first converting an ACPI resource to a 'struct resource' structure 
> >> and then applying checks against the converted resource structure. The 'start' 
> >> and 'end' fields in 'struct resource' are defined to be type of resource_size_t, 
> >> which may be 32 bits or 64 bits depending on CONFIG_PHYS_ADDR_T_64BIT.
> >>
> >> This may cause incorrect resource validation results with 32 bit kernels because 
> >> 64bit ACPI resource descriptors may get truncated when converting to 32bit 
> >> 'start' and 'end' fields in 'struct resource'. And eventually affects PCI 
> >> resource allocation subsystem and causes some PCI devices unusable.
> > s/causes some PCI devices unusable.
> >   makes some PCI devices unusable.
> >
> > Also, this description is still pretty vague. What exactly happened? Did some PCI 
> > devices not show up during bootup? Or did they hang? Or did something else happen?
> 
> There's a reference mail URL in the description, but here it is in full glory.
> 
> The machine in question started behaving like being drunk without this fix
> with 4.0.5 and 4.1.0-rc8 and 4.1.0-final. 3.18.16 was good.
> 
> There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID 1565:230e)
> network chip on the mainboard. After the r8169 driver loaded, the IRQs in
> the machine went berserk. Keyboard keypresses arrived with considerable
> latency and were duplicated, so no real work was possible. The machine responded
> to the power button but didn't actually power down. It just got stuck at the powering
> down message. I had to press the power button for 4 seconds to power it down.
> 
> The computer is a POS machine with a big battery inside. Because of this,
> either ACPI or the Realtek chip kept the bad state and after rebooting, the
> network chip didn't even show up in lspci. Not even the PXE ROM announced
> itself during boot. I had to disconnect the battery to beat some sense back
> to the computer.

So my point is that this description is more valuable than all the rest of the 
changelog, and it should be quoted prominently in the first paragraph or so!

And this too should round out the changelog:

> With the fix, the behavior of the machine was restored to how 3.18.16 worked, 
> i.e. the memory range that is over 4GB is ignored again, and lspci -vvxxx shows 
> that everything is at the same memory window as they were with 3.18.16.

as it is far more informative about the practical effects of the fix than anything 
in the previous versions of the changelog.

Thanks,

	Ingo


* [Bugfix v3] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
  2015-06-24  9:49     ` Ingo Molnar
@ 2015-06-24 10:17       ` Jiang Liu
  2015-06-24 10:18         ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Jiang Liu @ 2015-06-24 10:17 UTC (permalink / raw)
  To: Rafael J . Wysocki, Bjorn Helgaas, Ingo Molnar,
	Boszormenyi Zoltan, Len Brown
  Cc: Jiang Liu, LKML, linux-pci, linux-acpi, x86 @ kernel . org

A regression report from Boszormenyi Zoltan <zboszor@pr.hu>:
There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID 1565:230e)
network chip on the mainboard. After the r8169 driver loaded, the IRQs in
the machine went berserk. Keyboard keypresses arrived with considerable
latency and were duplicated, so no real work was possible. The machine responded
to the power button but didn't actually power down. It just got stuck at the
powering down message. I had to press the power button for 4 seconds to power
it down.

The computer is a POS machine with a big battery inside. Because of this,
either ACPI or the Realtek chip kept the bad state and after rebooting, the
network chip didn't even show up in lspci. Not even the PXE ROM announced
itself during boot. I had to disconnect the battery to beat some sense back
to the computer.

The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
good.

The regression is caused by commit 593669c2ac0f ("x86/PCI/ACPI: Use common
ACPI resource interfaces to simplify implementation"). Since commit
593669c2ac0f, x86 PCI ACPI host bridge driver validates ACPI resources by
first converting an ACPI resource to a 'struct resource' structure and
then applying checks against the converted resource structure. The 'start'
and 'end' fields in 'struct resource' are defined to be of type
resource_size_t, which may be 32 bits or 64 bits depending on
CONFIG_PHYS_ADDR_T_64BIT.

This may cause incorrect resource validation results with 32-bit kernels
because 64-bit ACPI resource descriptors may get truncated when converting
to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
affects PCI resource allocation subsystem and makes some PCI devices and
the system behave abnormally due to incorrect resource assignment.

So enhance the ACPI resource parsing interfaces to ignore ACPI resource
descriptors with address/offset above 4G when running in 32-bit mode.

With the fix applied, the behavior of the machine was restored to how
3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
and lspci -vvxxx shows that everything is at the same memory window as
they were with 3.18.16.

Reported-by: Boszormenyi Zoltan <zboszor@pr.hu>
Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation")
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: stable@vger.kernel.org # 4.0
---
Thanks, Ingo and Zoltan!
Will write bugfix commit messages for people who will backport them.
Thanks!
Gerry
---
 drivers/acpi/resource.c |   24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 8244f013f210..f1c966e05078 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -193,6 +193,7 @@ static bool acpi_decode_space(struct resource_win *win,
 	u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 : ACPI_DECODE_16;
 	bool wp = addr->info.mem.write_protect;
 	u64 len = attr->address_length;
+	u64 start, end, offset = 0;
 	struct resource *res = &win->res;
 
 	/*
@@ -204,9 +205,6 @@ static bool acpi_decode_space(struct resource_win *win,
 		pr_debug("ACPI: Invalid address space min_addr_fix %d, max_addr_fix %d, len %llx\n",
 			 addr->min_address_fixed, addr->max_address_fixed, len);
 
-	res->start = attr->minimum;
-	res->end = attr->maximum;
-
 	/*
 	 * For bridges that translate addresses across the bridge,
 	 * translation_offset is the offset that must be added to the
@@ -214,12 +212,22 @@ static bool acpi_decode_space(struct resource_win *win,
 	 * primary side. Non-bridge devices must list 0 for all Address
 	 * Translation offset bits.
 	 */
-	if (addr->producer_consumer == ACPI_PRODUCER) {
-		res->start += attr->translation_offset;
-		res->end += attr->translation_offset;
-	} else if (attr->translation_offset) {
+	if (addr->producer_consumer == ACPI_PRODUCER)
+		offset = attr->translation_offset;
+	else if (attr->translation_offset)
 		pr_debug("ACPI: translation_offset(%lld) is invalid for non-bridge device.\n",
 			 attr->translation_offset);
+	start = attr->minimum + offset;
+	end = attr->maximum + offset;
+
+	win->offset = offset;
+	res->start = start;
+	res->end = end;
+	if (sizeof(resource_size_t) < sizeof(u64) &&
+	    (offset != win->offset || start != res->start || end != res->end)) {
+		pr_warn("acpi resource window ([%#llx-%#llx] ignored, not CPU addressable)\n",
+			attr->minimum, attr->maximum);
+		return false;
 	}
 
 	switch (addr->resource_type) {
@@ -236,8 +244,6 @@ static bool acpi_decode_space(struct resource_win *win,
 		return false;
 	}
 
-	win->offset = attr->translation_offset;
-
 	if (addr->producer_consumer == ACPI_PRODUCER)
 		res->flags |= IORESOURCE_WINDOW;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Bugfix v3] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
  2015-06-24 10:17       ` [Bugfix v3] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel Jiang Liu
@ 2015-06-24 10:18         ` Ingo Molnar
  2015-06-29  8:55           ` Boszormenyi Zoltan
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2015-06-24 10:18 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Rafael J . Wysocki, Bjorn Helgaas, Boszormenyi Zoltan, Len Brown,
	LKML, linux-pci, linux-acpi, x86 @ kernel . org


* Jiang Liu <jiang.liu@linux.intel.com> wrote:

> A regression report from Boszormenyi Zoltan <zboszor@pr.hu>:
> There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID 1565:230e)
> network chip on the mainboard. After the r8169 driver loaded, the IRQs in
> the machine went berserk. Keyboard keypressed arrived with considerable
> latency and duplicated, so no real work was possible. The machine responded
> to the power button but didn't actually power down. It just stuck at the
> powering down message. I had to press the power button for 4 seconds to power
> it down.
> 
> The computer is a POS machine with a big battery inside. Because of this,
> either ACPI or the Realtek chip kept the bad state and after rebooting, the
> network chip didn't even show up in lspci. Not even the PXE ROM announced
> itself during boot. I had to disconnect the battery to beat some sense back
> to the computer.
> 
> The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
> good.

So please put this into quotes, like:

===============
Zoltan Boszormenyi reported this regression:

  "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID 1565:230e)
   network chip on the mainboard. After the r8169 driver loaded, the IRQs in
   the machine went berserk. Keyboard keypressed arrived with considerable
   latency and duplicated, so no real work was possible. The machine responded
   to the power button but didn't actually power down. It just stuck at the
   powering down message. I had to press the power button for 4 seconds to power
   it down.
 
   The computer is a POS machine with a big battery inside. Because of this,
   either ACPI or the Realtek chip kept the bad state and after rebooting, the
   network chip didn't even show up in lspci. Not even the PXE ROM announced
   itself during boot. I had to disconnect the battery to beat some sense back
   to the computer.
 
   The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
   good."

...
===============

Also note the indentation, that helps readability.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugfix v2] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32bit kernel
  2015-06-24  8:25 ` Boszormenyi Zoltan
@ 2015-06-24 11:00   ` Boszormenyi Zoltan
  0 siblings, 0 replies; 18+ messages in thread
From: Boszormenyi Zoltan @ 2015-06-24 11:00 UTC (permalink / raw)
  To: Jiang Liu, Rafael J . Wysocki, Bjorn Helgaas, Ingo Molnar, Len Brown
  Cc: LKML, linux-pci, linux-acpi, x86 @ kernel . org

[-- Attachment #1: Type: text/plain, Size: 776 bytes --]

On 2015-06-24 10:25, Boszormenyi Zoltan wrote:
> On 2015-06-24 09:43, Jiang Liu wrote:
>> Hi Zoltan,
>> 	Could you please help to test this patch against the latest kernel?
>> Thanks!
>> Gerry
> I will, thanks.

I have now tested this v2. I assume later versions will differ only in the
commit message. It works, thank you very much!

There are now differences in the lspci output between 3.18.16 and 4.1.0-final
plus this patch, but I guess they are not relevant to this matter. The i915
chip and the Realtek chip have their IRQs reversed, and the "Data:" part of the
"Address:" line is reversed, too. I attached the lspci -vvxxx output from
3.18.16, 4.1-rc8 with the very first patch, and 4.1-final with the v2 patch,
so you can see whether it is an error or not.

Best regards,
Zoltán


[-- Attachment #2: lspci2.tgz --]
[-- Type: application/x-compressed-tar, Size: 12285 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugfix v3] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
  2015-06-24 10:18         ` Ingo Molnar
@ 2015-06-29  8:55           ` Boszormenyi Zoltan
  2015-06-29 14:28               ` Jiang Liu
  2015-07-08  7:26             ` [Bugfix v4] " Jiang Liu
  0 siblings, 2 replies; 18+ messages in thread
From: Boszormenyi Zoltan @ 2015-06-29  8:55 UTC (permalink / raw)
  To: Ingo Molnar, Jiang Liu
  Cc: Rafael J . Wysocki, Bjorn Helgaas, Len Brown, LKML, linux-pci,
	linux-acpi, x86 @ kernel . org

On 2015-06-24 12:18, Ingo Molnar wrote:
> * Jiang Liu <jiang.liu@linux.intel.com> wrote:
>
>> A regression report from Boszormenyi Zoltan <zboszor@pr.hu>:
>> There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID 1565:230e)
>> network chip on the mainboard. After the r8169 driver loaded, the IRQs in
>> the machine went berserk. Keyboard keypressed arrived with considerable
>> latency and duplicated, so no real work was possible. The machine responded
>> to the power button but didn't actually power down. It just stuck at the
>> powering down message. I had to press the power button for 4 seconds to power
>> it down.
>>
>> The computer is a POS machine with a big battery inside. Because of this,
>> either ACPI or the Realtek chip kept the bad state and after rebooting, the
>> network chip didn't even show up in lspci. Not even the PXE ROM announced
>> itself during boot. I had to disconnect the battery to beat some sense back
>> to the computer.
>>
>> The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
>> good.
> So please put this into quotes, like:
>
> ===============
> Zoltan Boszormenyi reported this regression:
>
>   "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID 1565:230e)
>    network chip on the mainboard. After the r8169 driver loaded, the IRQs in
>    the machine went berserk. Keyboard keypressed arrived with considerable
>    latency and duplicated, so no real work was possible. The machine responded
>    to the power button but didn't actually power down. It just stuck at the
>    powering down message. I had to press the power button for 4 seconds to power
>    it down.
>  
>    The computer is a POS machine with a big battery inside. Because of this,
>    either ACPI or the Realtek chip kept the bad state and after rebooting, the
>    network chip didn't even show up in lspci. Not even the PXE ROM announced
>    itself during boot. I had to disconnect the battery to beat some sense back
>    to the computer.
>  
>    The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
>    good."
>
> ...
> ===============
>
> Also note the indentation, that helps readability.
>
> Thanks,
>
> 	Ingo

So, will there be a v4 with a commit message satisfactory to Ingo
that will be part of 4.0.7/4.1.1 and 4.2?

Best regards,
Zoltán

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugfix v3] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
  2015-06-29  8:55           ` Boszormenyi Zoltan
@ 2015-06-29 14:28               ` Jiang Liu
  2015-07-08  7:26             ` [Bugfix v4] " Jiang Liu
  1 sibling, 0 replies; 18+ messages in thread
From: Jiang Liu @ 2015-06-29 14:28 UTC (permalink / raw)
  To: Boszormenyi Zoltan, Ingo Molnar
  Cc: Rafael J . Wysocki, Bjorn Helgaas, Len Brown, LKML, linux-pci,
	linux-acpi, x86 @ kernel . org

On 2015/6/29 16:55, Boszormenyi Zoltan wrote:
> On 2015-06-24 12:18, Ingo Molnar wrote:
>> * Jiang Liu <jiang.liu@linux.intel.com> wrote:
>>
>> Also note the indentation, that helps readability.
>>
>> Thanks,
>>
>> 	Ingo
> 
> So, will there be a v4 with a commit message satisfactory to Ingo
> that will be part of 4.0.7/4.1.1 and 4.2?

Hi Zoltan,
	I'll wait a few days to see whether there are other comments,
and will send out v4 then.
Thanks!
Gerry

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bugfix v4] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
  2015-06-29  8:55           ` Boszormenyi Zoltan
  2015-06-29 14:28               ` Jiang Liu
@ 2015-07-08  7:26             ` Jiang Liu
  2015-07-10  1:10               ` Rafael J. Wysocki
  2015-11-02 15:27               ` Tomasz Nowicki
  1 sibling, 2 replies; 18+ messages in thread
From: Jiang Liu @ 2015-07-08  7:26 UTC (permalink / raw)
  To: Rafael J . Wysocki, Bjorn Helgaas, Ingo Molnar,
	Boszormenyi Zoltan, Len Brown
  Cc: Jiang Liu, LKML, linux-pci, linux-acpi, x86 @ kernel . org

Zoltan Boszormenyi reported this regression:
  "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
   1565:230e) network chip on the mainboard. After the r8169 driver loaded
   the IRQs in the machine went berserk. Keyboard keypressed arrived with
   considerable latency and duplicated, so no real work was possible.
   The machine responded to the power button but didn't actually power
   down. It just stuck at the powering down message. I had to press the
   power button for 4 seconds to power it down.

   The computer is a POS machine with a big battery inside. Because of this,
   either ACPI or the Realtek chip kept the bad state and after rebooting,
   the network chip didn't even show up in lspci. Not even the PXE ROM
   announced itself during boot. I had to disconnect the battery to beat
   some sense back to the computer.

   The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
   good."

The regression is caused by commit 593669c2ac0f ("x86/PCI/ACPI: Use common
ACPI resource interfaces to simplify implementation"). Since commit
593669c2ac0f, the x86 PCI ACPI host bridge driver validates ACPI resources
by first converting an ACPI resource to a 'struct resource' structure and
then applying checks against the converted resource structure. The 'start'
and 'end' fields in 'struct resource' are of type resource_size_t, which
may be 32 bits or 64 bits depending on CONFIG_PHYS_ADDR_T_64BIT.

This may cause incorrect resource validation results with 32-bit kernels
because 64-bit ACPI resource descriptors may get truncated when converted
to the 32-bit 'start' and 'end' fields in 'struct resource'. This eventually
affects the PCI resource allocation subsystem and makes some PCI devices
and the system behave abnormally due to incorrect resource assignment.

So enhance the ACPI resource parsing interfaces to ignore ACPI resource
descriptors with address/offset above 4G when running in 32-bit mode.

With the fix applied, the behavior of the machine was restored to how
3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
and lspci -vvxxx shows that everything is in the same memory window as
it was with 3.18.16.

Reported-and-Tested-by: Boszormenyi Zoltan <zboszor@pr.hu>
Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation")
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: stable@vger.kernel.org # 4.0
---
 drivers/acpi/resource.c |   24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 10561ce16ed1..e8d281739cbc 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -194,6 +194,7 @@ static bool acpi_decode_space(struct resource_win *win,
 	u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 : ACPI_DECODE_16;
 	bool wp = addr->info.mem.write_protect;
 	u64 len = attr->address_length;
+	u64 start, end, offset = 0;
 	struct resource *res = &win->res;
 
 	/*
@@ -205,9 +206,6 @@ static bool acpi_decode_space(struct resource_win *win,
 		pr_debug("ACPI: Invalid address space min_addr_fix %d, max_addr_fix %d, len %llx\n",
 			 addr->min_address_fixed, addr->max_address_fixed, len);
 
-	res->start = attr->minimum;
-	res->end = attr->maximum;
-
 	/*
 	 * For bridges that translate addresses across the bridge,
 	 * translation_offset is the offset that must be added to the
@@ -215,12 +213,22 @@ static bool acpi_decode_space(struct resource_win *win,
 	 * primary side. Non-bridge devices must list 0 for all Address
 	 * Translation offset bits.
 	 */
-	if (addr->producer_consumer == ACPI_PRODUCER) {
-		res->start += attr->translation_offset;
-		res->end += attr->translation_offset;
-	} else if (attr->translation_offset) {
+	if (addr->producer_consumer == ACPI_PRODUCER)
+		offset = attr->translation_offset;
+	else if (attr->translation_offset)
 		pr_debug("ACPI: translation_offset(%lld) is invalid for non-bridge device.\n",
 			 attr->translation_offset);
+	start = attr->minimum + offset;
+	end = attr->maximum + offset;
+
+	win->offset = offset;
+	res->start = start;
+	res->end = end;
+	if (sizeof(resource_size_t) < sizeof(u64) &&
+	    (offset != win->offset || start != res->start || end != res->end)) {
+		pr_warn("acpi resource window ([%#llx-%#llx] ignored, not CPU addressable)\n",
+			attr->minimum, attr->maximum);
+		return false;
 	}
 
 	switch (addr->resource_type) {
@@ -237,8 +245,6 @@ static bool acpi_decode_space(struct resource_win *win,
 		return false;
 	}
 
-	win->offset = attr->translation_offset;
-
 	if (addr->producer_consumer == ACPI_PRODUCER)
 		res->flags |= IORESOURCE_WINDOW;
 
-- 
1.7.10.4
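
For readers following the patch, the narrowing check it adds can be modeled
in user space. This is only a sketch under stated assumptions, not the kernel
code: resource_size_t is pinned to 32 bits to imitate a kernel built without
CONFIG_PHYS_ADDR_T_64BIT, and struct window and decode_window() are
hypothetical stand-ins for struct resource_win and acpi_decode_space().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: imitates a 32-bit kernel without CONFIG_PHYS_ADDR_T_64BIT. */
typedef uint32_t resource_size_t;

/* Hypothetical stand-in for the fields the patch stores into. */
struct window {
	resource_size_t start;
	resource_size_t end;
	resource_size_t offset;
};

/*
 * Compute the translated range in 64 bits, store it into the narrower
 * fields, then compare the stored values with the 64-bit originals.
 * Any mismatch means the value was truncated, so the window is rejected.
 */
static bool decode_window(struct window *win, uint64_t min, uint64_t max,
			  uint64_t offset)
{
	uint64_t start = min + offset;
	uint64_t end = max + offset;

	win->offset = (resource_size_t)offset;
	win->start = (resource_size_t)start;
	win->end = (resource_size_t)end;

	if (sizeof(resource_size_t) < sizeof(uint64_t) &&
	    (offset != win->offset || start != win->start || end != win->end))
		return false;

	return true;
}
```

With these definitions, decode_window(&win, 0x400000000ULL, 0xfffffffffULL, 0)
returns false, because 0x400000000 does not fit in 32 bits; that is the range
from the v2 report. A window that lies entirely below 4G, such as the made-up
range [0xc0000000-0xfebfffff], returns true.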

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Bugfix v4] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
  2015-07-08  7:26             ` [Bugfix v4] " Jiang Liu
@ 2015-07-10  1:10               ` Rafael J. Wysocki
  2015-11-02 15:27               ` Tomasz Nowicki
  1 sibling, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2015-07-10  1:10 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Bjorn Helgaas, Ingo Molnar, Boszormenyi Zoltan, Len Brown, LKML,
	linux-pci, linux-acpi, x86 @ kernel . org

On Wednesday, July 08, 2015 03:26:39 PM Jiang Liu wrote:
> Zoltan Boszormenyi reported this regression:
>   "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
>    1565:230e) network chip on the mainboard. After the r8169 driver loaded
>    the IRQs in the machine went berserk. Keyboard keypressed arrived with
>    considerable latency and duplicated, so no real work was possible.
>    The machine responded to the power button but didn't actually power
>    down. It just stuck at the powering down message. I had to press the
>    power button for 4 seconds to power it down.
> 
>    The computer is a POS machine with a big battery inside. Because of this,
>    either ACPI or the Realtek chip kept the bad state and after rebooting,
>    the network chip didn't even show up in lspci. Not even the PXE ROM
>    announced itself during boot. I had to disconnect the battery to beat
>    some sense back to the computer.
> 
>    The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
>    good."
> 
> The regression is caused by commit 593669c2ac0f ("x86/PCI/ACPI: Use common
> ACPI resource interfaces to simplify implementation"). Since commit
> 593669c2ac0f, x86 PCI ACPI host bridge driver validates ACPI resources by
> first converting an ACPI resource to a 'struct resource' structure and
> then applying checks against the converted resource structure. The 'start'
> and 'end' fields in 'struct resource' are defined to be type of
> resource_size_t, which may be 32 bits or 64 bits depending on
> CONFIG_PHYS_ADDR_T_64BIT.
> 
> This may cause incorrect resource validation results with 32-bit kernels
> because 64-bit ACPI resource descriptors may get truncated when converting
> to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
> affects PCI resource allocation subsystem and makes some PCI devices and
> the system behave abnormally due to incorrect resource assignment.
> 
> So enhance the ACPI resource parsing interfaces to ignore ACPI resource
> descriptors with address/offset above 4G when running in 32-bit mode.
> 
> With the fix applied, the behavior of the machine was restored to how
> 3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
> and lspci -vvxxx shows that everything is at the same memory window as
> they were with 3.18.16.
> 
> Reported-and-Tested-by: Boszormenyi Zoltan <zboszor@pr.hu>
> Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation")
> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
> Cc: stable@vger.kernel.org # 4.0

OK, I'm happy with the above changelog, so I'm going to apply the patch.

If anyone has any objections, please let me know.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugfix v4] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
  2015-07-08  7:26             ` [Bugfix v4] " Jiang Liu
  2015-07-10  1:10               ` Rafael J. Wysocki
@ 2015-11-02 15:27               ` Tomasz Nowicki
  2015-11-05 12:53                 ` Tomasz Nowicki
  1 sibling, 1 reply; 18+ messages in thread
From: Tomasz Nowicki @ 2015-11-02 15:27 UTC (permalink / raw)
  To: Jiang Liu, Rafael J . Wysocki, Bjorn Helgaas, Ingo Molnar,
	Boszormenyi Zoltan, Len Brown
  Cc: LKML, linux-pci, linux-acpi, x86 @ kernel . org

On 08.07.2015 09:26, Jiang Liu wrote:
> Zoltan Boszormenyi reported this regression:
>    "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
>     1565:230e) network chip on the mainboard. After the r8169 driver loaded
>     the IRQs in the machine went berserk. Keyboard keypressed arrived with
>     considerable latency and duplicated, so no real work was possible.
>     The machine responded to the power button but didn't actually power
>     down. It just stuck at the powering down message. I had to press the
>     power button for 4 seconds to power it down.
>
>     The computer is a POS machine with a big battery inside. Because of this,
>     either ACPI or the Realtek chip kept the bad state and after rebooting,
>     the network chip didn't even show up in lspci. Not even the PXE ROM
>     announced itself during boot. I had to disconnect the battery to beat
>     some sense back to the computer.
>
>     The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
>     good."
>
> The regression is caused by commit 593669c2ac0f ("x86/PCI/ACPI: Use common
> ACPI resource interfaces to simplify implementation"). Since commit
> 593669c2ac0f, x86 PCI ACPI host bridge driver validates ACPI resources by
> first converting an ACPI resource to a 'struct resource' structure and
> then applying checks against the converted resource structure. The 'start'
> and 'end' fields in 'struct resource' are defined to be type of
> resource_size_t, which may be 32 bits or 64 bits depending on
> CONFIG_PHYS_ADDR_T_64BIT.
>
> This may cause incorrect resource validation results with 32-bit kernels
> because 64-bit ACPI resource descriptors may get truncated when converting
> to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
> affects PCI resource allocation subsystem and makes some PCI devices and
> the system behave abnormally due to incorrect resource assignment.
>
> So enhance the ACPI resource parsing interfaces to ignore ACPI resource
> descriptors with address/offset above 4G when running in 32-bit mode.
>
> With the fix applied, the behavior of the machine was restored to how
> 3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
> and lspci -vvxxx shows that everything is at the same memory window as
> they were with 3.18.16.
>
> Reported-and-Tested-by: Boszormenyi Zoltan <zboszor@pr.hu>
> Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation")
> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
> Cc: stable@vger.kernel.org # 4.0
> ---
>   drivers/acpi/resource.c |   24 +++++++++++++++---------
>   1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
> index 10561ce16ed1..e8d281739cbc 100644
> --- a/drivers/acpi/resource.c
> +++ b/drivers/acpi/resource.c
> @@ -194,6 +194,7 @@ static bool acpi_decode_space(struct resource_win *win,
>   	u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 : ACPI_DECODE_16;
>   	bool wp = addr->info.mem.write_protect;
>   	u64 len = attr->address_length;
> +	u64 start, end, offset = 0;
>   	struct resource *res = &win->res;
>
>   	/*
> @@ -205,9 +206,6 @@ static bool acpi_decode_space(struct resource_win *win,
>   		pr_debug("ACPI: Invalid address space min_addr_fix %d, max_addr_fix %d, len %llx\n",
>   			 addr->min_address_fixed, addr->max_address_fixed, len);
>
> -	res->start = attr->minimum;
> -	res->end = attr->maximum;
> -
>   	/*
>   	 * For bridges that translate addresses across the bridge,
>   	 * translation_offset is the offset that must be added to the
> @@ -215,12 +213,22 @@ static bool acpi_decode_space(struct resource_win *win,
>   	 * primary side. Non-bridge devices must list 0 for all Address
>   	 * Translation offset bits.
>   	 */
> -	if (addr->producer_consumer == ACPI_PRODUCER) {
> -		res->start += attr->translation_offset;
> -		res->end += attr->translation_offset;
> -	} else if (attr->translation_offset) {
> +	if (addr->producer_consumer == ACPI_PRODUCER)
> +		offset = attr->translation_offset;
> +	else if (attr->translation_offset)
>   		pr_debug("ACPI: translation_offset(%lld) is invalid for non-bridge device.\n",
>   			 attr->translation_offset);
> +	start = attr->minimum + offset;
> +	end = attr->maximum + offset;

I still see an issue in this area, specifically for ACPI_IO_RANGE. You are
adding the translation offset to attr->minimum and building a resource
structure, which is then passed to acpi_dev_ioresource_flags() and compared
against 0x10003. This causes some I/O ranges to be ignored.
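
To make the concern concrete, here is a rough user-space model of that
comparison. It is an illustration only: the body of
acpi_dev_ioresource_flags() is not shown in this thread, so the direction of
the comparison and the assumption that it is applied to the end of the window
are guesses, and io_end_accepted(), io_window_accepted() and the offset value
in the note below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limit quoted in the thread; how it is applied is an assumption. */
#define IO_PORT_LIMIT 0x10003ULL

/* Assumed check: an I/O resource ending at or above the limit is rejected. */
static bool io_end_accepted(uint64_t end)
{
	return end < IO_PORT_LIMIT;
}

/*
 * Models the order of operations being questioned: the translation offset
 * is added first, and only then is the result compared with the limit.
 */
static bool io_window_accepted(uint64_t bus_max, uint64_t translation_offset)
{
	return io_end_accepted(bus_max + translation_offset);
}
```

Under these assumptions, a window whose bus-side end is 0xffff passes with a
zero offset, but the same window with a made-up offset of 0x1000000000 is
rejected even though its bus-side port numbers are valid.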

> +
> +	win->offset = offset;
> +	res->start = start;
> +	res->end = end;
> +	if (sizeof(resource_size_t) < sizeof(u64) &&
> +	    (offset != win->offset || start != res->start || end != res->end)) {
> +		pr_warn("acpi resource window ([%#llx-%#llx] ignored, not CPU addressable)\n",
> +			attr->minimum, attr->maximum);
> +		return false;
>   	}
>
>   	switch (addr->resource_type) {
> @@ -237,8 +245,6 @@ static bool acpi_decode_space(struct resource_win *win,
>   		return false;
>   	}
>
> -	win->offset = attr->translation_offset;
> -
>   	if (addr->producer_consumer == ACPI_PRODUCER)
>   		res->flags |= IORESOURCE_WINDOW;
>
>

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugfix v4] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
  2015-11-02 15:27               ` Tomasz Nowicki
@ 2015-11-05 12:53                 ` Tomasz Nowicki
  2015-11-05 13:24                   ` Jiang Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Tomasz Nowicki @ 2015-11-05 12:53 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Tomasz Nowicki, Rafael J . Wysocki, Bjorn Helgaas, Ingo Molnar,
	Boszormenyi Zoltan, Len Brown, LKML, linux-pci, linux-acpi,
	x86 @ kernel . org

On 02.11.2015 16:27, Tomasz Nowicki wrote:
> On 08.07.2015 09:26, Jiang Liu wrote:
>> Zoltan Boszormenyi reported this regression:
>>    "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
>>     1565:230e) network chip on the mainboard. After the r8169 driver loaded
>>     the IRQs in the machine went berserk. Keyboard keypressed arrived with
>>     considerable latency and duplicated, so no real work was possible.
>>     The machine responded to the power button but didn't actually power
>>     down. It just stuck at the powering down message. I had to press the
>>     power button for 4 seconds to power it down.
>>
>>     The computer is a POS machine with a big battery inside. Because of this,
>>     either ACPI or the Realtek chip kept the bad state and after rebooting,
>>     the network chip didn't even show up in lspci. Not even the PXE ROM
>>     announced itself during boot. I had to disconnect the battery to beat
>>     some sense back to the computer.
>>
>>     The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
>>     good."
>>
>> The regression is caused by commit 593669c2ac0f ("x86/PCI/ACPI: Use common
>> ACPI resource interfaces to simplify implementation"). Since commit
>> 593669c2ac0f, x86 PCI ACPI host bridge driver validates ACPI resources by
>> first converting an ACPI resource to a 'struct resource' structure and
>> then applying checks against the converted resource structure. The 'start'
>> and 'end' fields in 'struct resource' are defined to be type of
>> resource_size_t, which may be 32 bits or 64 bits depending on
>> CONFIG_PHYS_ADDR_T_64BIT.
>>
>> This may cause incorrect resource validation results with 32-bit kernels
>> because 64-bit ACPI resource descriptors may get truncated when converting
>> to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
>> affects PCI resource allocation subsystem and makes some PCI devices and
>> the system behave abnormally due to incorrect resource assignment.
>>
>> So enhance the ACPI resource parsing interfaces to ignore ACPI resource
>> descriptors with address/offset above 4G when running in 32-bit mode.
>>
>> With the fix applied, the behavior of the machine was restored to how
>> 3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
>> and lspci -vvxxx shows that everything is at the same memory window as
>> they were with 3.18.16.
>>
>> Reported-and-Tested-by: Boszormenyi Zoltan <zboszor@pr.hu>
>> Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation")
>> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
>> Cc: stable@vger.kernel.org # 4.0
>> ---
>>   drivers/acpi/resource.c |   24 +++++++++++++++---------
>>   1 file changed, 15 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
>> index 10561ce16ed1..e8d281739cbc 100644
>> --- a/drivers/acpi/resource.c
>> +++ b/drivers/acpi/resource.c
>> @@ -194,6 +194,7 @@ static bool acpi_decode_space(struct resource_win *win,
>>       u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 : ACPI_DECODE_16;
>>       bool wp = addr->info.mem.write_protect;
>>       u64 len = attr->address_length;
>> +    u64 start, end, offset = 0;
>>       struct resource *res = &win->res;
>>
>>       /*
>> @@ -205,9 +206,6 @@ static bool acpi_decode_space(struct resource_win *win,
>>           pr_debug("ACPI: Invalid address space min_addr_fix %d, max_addr_fix %d, len %llx\n",
>>                addr->min_address_fixed, addr->max_address_fixed, len);
>>
>> -    res->start = attr->minimum;
>> -    res->end = attr->maximum;
>> -
>>       /*
>>        * For bridges that translate addresses across the bridge,
>>        * translation_offset is the offset that must be added to the
>> @@ -215,12 +213,22 @@ static bool acpi_decode_space(struct resource_win *win,
>>        * primary side. Non-bridge devices must list 0 for all Address
>>        * Translation offset bits.
>>        */
>> -    if (addr->producer_consumer == ACPI_PRODUCER) {
>> -        res->start += attr->translation_offset;
>> -        res->end += attr->translation_offset;
>> -    } else if (attr->translation_offset) {
>> +    if (addr->producer_consumer == ACPI_PRODUCER)
>> +        offset = attr->translation_offset;
>> +    else if (attr->translation_offset)
>>           pr_debug("ACPI: translation_offset(%lld) is invalid for non-bridge device.\n",
>>                attr->translation_offset);
>> +    start = attr->minimum + offset;
>> +    end = attr->maximum + offset;
>
> I still see the issue for this area, I mean ACPI_IO_RANGE. You are
> adding translation offset to attr->minimum, build resource structure
> which is then passed to acpi_dev_ioresource_flags and compared against
> 0x10003. It causes some IO ranges to be ignored.
>

Kind reminder: any comments?

Tomasz

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bugfix v4] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
  2015-11-05 12:53                 ` Tomasz Nowicki
@ 2015-11-05 13:24                   ` Jiang Liu
  2015-11-05 13:53                     ` Tomasz Nowicki
  0 siblings, 1 reply; 18+ messages in thread
From: Jiang Liu @ 2015-11-05 13:24 UTC (permalink / raw)
  To: Tomasz Nowicki
  Cc: Tomasz Nowicki, Rafael J . Wysocki, Bjorn Helgaas, Ingo Molnar,
	Boszormenyi Zoltan, Len Brown, LKML, linux-pci, linux-acpi,
	x86 @ kernel . org

[-- Attachment #1: Type: text/plain, Size: 5201 bytes --]

On 2015/11/5 20:53, Tomasz Nowicki wrote:
> On 02.11.2015 16:27, Tomasz Nowicki wrote:
>> On 08.07.2015 09:26, Jiang Liu wrote:
>>> Zoltan Boszormenyi reported this regression:
>>>    "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
>>>     1565:230e) network chip on the mainboard. After the r8169 driver loaded
>>>     the IRQs in the machine went berserk. Keyboard keypressed arrived with
>>>     considerable latency and duplicated, so no real work was possible.
>>>     The machine responded to the power button but didn't actually power
>>>     down. It just stuck at the powering down message. I had to press the
>>>     power button for 4 seconds to power it down.
>>>
>>>     The computer is a POS machine with a big battery inside. Because of this,
>>>     either ACPI or the Realtek chip kept the bad state and after rebooting,
>>>     the network chip didn't even show up in lspci. Not even the PXE ROM
>>>     announced itself during boot. I had to disconnect the battery to beat
>>>     some sense back to the computer.
>>>
>>>     The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
>>>     good."
>>>
>>> The regression is caused by commit 593669c2ac0f ("x86/PCI/ACPI: Use
>>> common
>>> ACPI resource interfaces to simplify implementation"). Since commit
>>> 593669c2ac0f, x86 PCI ACPI host bridge driver validates ACPI
>>> resources by
>>> first converting an ACPI resource to a 'struct resource' structure and
>>> then applying checks against the converted resource structure. The
>>> 'start'
>>> and 'end' fields in 'struct resource' are defined to be type of
>>> resource_size_t, which may be 32 bits or 64 bits depending on
>>> CONFIG_PHYS_ADDR_T_64BIT.
>>>
>>> This may cause incorrect resource validation results with 32-bit kernels
>>> because 64-bit ACPI resource descriptors may get truncated when
>>> converting
>>> to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
>>> affects PCI resource allocation subsystem and makes some PCI devices and
>>> the system behave abnormally due to incorrect resource assignment.
>>>
>>> So enhance the ACPI resource parsing interfaces to ignore ACPI resource
>>> descriptors with address/offset above 4G when running in 32-bit mode.
>>>
>>> With the fix applied, the behavior of the machine was restored to how
>>> 3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
>>> and lspci -vvxxx shows that everything is at the same memory window as
>>> they were with 3.18.16.
>>>
>>> Reported-and-Tested-by: Boszormenyi Zoltan <zboszor@pr.hu>
>>> Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource
>>> interfaces to simplify implementation")
>>> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
>>> Cc: stable@vger.kernel.org # 4.0
>>> ---
>>>   drivers/acpi/resource.c |   24 +++++++++++++++---------
>>>   1 file changed, 15 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
>>> index 10561ce16ed1..e8d281739cbc 100644
>>> --- a/drivers/acpi/resource.c
>>> +++ b/drivers/acpi/resource.c
>>> @@ -194,6 +194,7 @@ static bool acpi_decode_space(struct resource_win
>>> *win,
>>>       u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 :
>>> ACPI_DECODE_16;
>>>       bool wp = addr->info.mem.write_protect;
>>>       u64 len = attr->address_length;
>>> +    u64 start, end, offset = 0;
>>>       struct resource *res = &win->res;
>>>
>>>       /*
>>> @@ -205,9 +206,6 @@ static bool acpi_decode_space(struct resource_win
>>> *win,
>>>           pr_debug("ACPI: Invalid address space min_addr_fix %d,
>>> max_addr_fix %d, len %llx\n",
>>>                addr->min_address_fixed, addr->max_address_fixed, len);
>>>
>>> -    res->start = attr->minimum;
>>> -    res->end = attr->maximum;
>>> -
>>>       /*
>>>        * For bridges that translate addresses across the bridge,
>>>        * translation_offset is the offset that must be added to the
>>> @@ -215,12 +213,22 @@ static bool acpi_decode_space(struct
>>> resource_win *win,
>>>        * primary side. Non-bridge devices must list 0 for all Address
>>>        * Translation offset bits.
>>>        */
>>> -    if (addr->producer_consumer == ACPI_PRODUCER) {
>>> -        res->start += attr->translation_offset;
>>> -        res->end += attr->translation_offset;
>>> -    } else if (attr->translation_offset) {
>>> +    if (addr->producer_consumer == ACPI_PRODUCER)
>>> +        offset = attr->translation_offset;
>>> +    else if (attr->translation_offset)
>>>           pr_debug("ACPI: translation_offset(%lld) is invalid for
>>> non-bridge device.\n",
>>>                attr->translation_offset);
>>> +    start = attr->minimum + offset;
>>> +    end = attr->maximum + offset;
>>
>> I still see an issue in this area, specifically for ACPI_IO_RANGE. You
>> add the translation offset to attr->minimum and build a resource
>> structure, which is then passed to acpi_dev_ioresource_flags() and
>> compared against 0x10003. This causes some I/O ranges to be ignored.
>>
> 
> Kind reminder, any comments?
> 
> Tomasz
Hi Tomasz,
	Thanks for reporting this issue! Could you please test the
attached patch?
Thanks,
Gerry
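
For context on the truncation described in the quoted commit message, here
is a minimal sketch of what happens when a 64-bit ACPI address is stored in
a 32-bit field. It is not the code from the patch: res32_t, fits_res32()
and window_usable_32bit() are hypothetical names, and res32_t only models
resource_size_t on a kernel built without CONFIG_PHYS_ADDR_T_64BIT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit resource_size_t. */
typedef uint32_t res32_t;

/* True if the 64-bit value survives conversion to the 32-bit type unchanged. */
bool fits_res32(uint64_t v)
{
	return (uint64_t)(res32_t)v == v;
}

/*
 * A window is usable on the modelled 32-bit kernel only if both ends
 * fit; otherwise the descriptor should be ignored rather than stored
 * in truncated form.
 */
bool window_usable_32bit(uint64_t start, uint64_t end)
{
	return fits_res32(start) && fits_res32(end);
}
```

With the window reported in this thread, [0x400000000-0xfffffffff], a plain
assignment to the 32-bit fields would keep only the low 32 bits of each
end, giving [0x0-0xffffffff], so window_usable_32bit() returns false and
the descriptor would be skipped.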


[-- Attachment #2: 0001-ACPI-Fix-an-error-in-IO-port-range-validation.patch --]
[-- Type: text/x-patch, Size: 1739 bytes --]

From 2afdf4595dc961a2472ba1a35d7f67046b1845d2 Mon Sep 17 00:00:00 2001
From: Liu Jiang <jiang.liu@linux.intel.com>
Date: Thu, 5 Nov 2015 21:13:23 +0800
Subject: [PATCH] ACPI: Fix an error in IO port range validation


Signed-off-by: Liu Jiang <jiang.liu@linux.intel.com>
---
 drivers/acpi/resource.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 15d22db05054..5bb1daa562b0 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -119,14 +119,14 @@ bool acpi_dev_resource_memory(struct acpi_resource *ares, struct resource *res)
 EXPORT_SYMBOL_GPL(acpi_dev_resource_memory);
 
 static void acpi_dev_ioresource_flags(struct resource *res, u64 len,
-				      u8 io_decode)
+				      u64 offset, u8 io_decode)
 {
 	res->flags = IORESOURCE_IO;
 
 	if (!acpi_dev_resource_len_valid(res->start, res->end, len, true))
 		res->flags |= IORESOURCE_DISABLED | IORESOURCE_UNSET;
 
-	if (res->end >= 0x10003)
+	if (res->end - offset >= 0x10003)
 		res->flags |= IORESOURCE_DISABLED | IORESOURCE_UNSET;
 
 	if (io_decode == ACPI_DECODE_16)
@@ -138,7 +138,7 @@ static void acpi_dev_get_ioresource(struct resource *res, u64 start, u64 len,
 {
 	res->start = start;
 	res->end = start + len - 1;
-	acpi_dev_ioresource_flags(res, len, io_decode);
+	acpi_dev_ioresource_flags(res, len, 0, io_decode);
 }
 
 /**
@@ -231,7 +231,7 @@ static bool acpi_decode_space(struct resource_win *win,
 		acpi_dev_memresource_flags(res, len, wp);
 		break;
 	case ACPI_IO_RANGE:
-		acpi_dev_ioresource_flags(res, len, iodec);
+		acpi_dev_ioresource_flags(res, len, offset, iodec);
 		break;
 	case ACPI_BUS_NUMBER_RANGE:
 		res->flags = IORESOURCE_BUS;
-- 
1.7.10.4


* Re: [Bugfix v4] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel
  2015-11-05 13:24                   ` Jiang Liu
@ 2015-11-05 13:53                     ` Tomasz Nowicki
  0 siblings, 0 replies; 18+ messages in thread
From: Tomasz Nowicki @ 2015-11-05 13:53 UTC (permalink / raw)
  To: Jiang Liu, Tomasz Nowicki
  Cc: Rafael J . Wysocki, Bjorn Helgaas, Ingo Molnar,
	Boszormenyi Zoltan, Len Brown, LKML, linux-pci, linux-acpi,
	x86 @ kernel . org

On 05.11.2015 14:24, Jiang Liu wrote:
> On 2015/11/5 20:53, Tomasz Nowicki wrote:
>> On 02.11.2015 16:27, Tomasz Nowicki wrote:
>>> On 08.07.2015 09:26, Jiang Liu wrote:
>>>> Zoltan Boszormenyi reported this regression:
>>>>     "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
>>>>      1565:230e) network chip on the mainboard. After the r8169 driver
>>>> loaded
>>>>      the IRQs in the machine went berserk. Keyboard keypressed arrived
>>>> with
>>>>      considerable latency and duplicated, so no real work was possible.
>>>>      The machine responded to the power button but didn't actually power
>>>>      down. It just stuck at the powering down message. I had to press the
>>>>      power button for 4 seconds to power it down.
>>>>
>>>>      The computer is a POS machine with a big battery inside. Because
>>>> of this,
>>>>      either ACPI or the Realtek chip kept the bad state and after
>>>> rebooting,
>>>>      the network chip didn't even show up in lspci. Not even the PXE ROM
>>>>      announced itself during boot. I had to disconnect the battery to
>>>> beat
>>>>      some sense back to the computer.
>>>>
>>>>      The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final.
>>>> 3.18.16 was
>>>>      good."
>>>>
>>>> The regression is caused by commit 593669c2ac0f ("x86/PCI/ACPI: Use
>>>> common
>>>> ACPI resource interfaces to simplify implementation"). Since commit
>>>> 593669c2ac0f, x86 PCI ACPI host bridge driver validates ACPI
>>>> resources by
>>>> first converting an ACPI resource to a 'struct resource' structure and
>>>> then applying checks against the converted resource structure. The
>>>> 'start'
>>>> and 'end' fields in 'struct resource' are defined to be type of
>>>> resource_size_t, which may be 32 bits or 64 bits depending on
>>>> CONFIG_PHYS_ADDR_T_64BIT.
>>>>
>>>> This may cause incorrect resource validation results with 32-bit kernels
>>>> because 64-bit ACPI resource descriptors may get truncated when
>>>> converting
>>>> to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
>>>> affects PCI resource allocation subsystem and makes some PCI devices and
>>>> the system behave abnormally due to incorrect resource assignment.
>>>>
>>>> So enhance the ACPI resource parsing interfaces to ignore ACPI resource
>>>> descriptors with address/offset above 4G when running in 32-bit mode.
>>>>
>>>> With the fix applied, the behavior of the machine was restored to how
>>>> 3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
>>>> and lspci -vvxxx shows that everything is at the same memory window as
>>>> they were with 3.18.16.
>>>>
>>>> Reported-and-Tested-by: Boszormenyi Zoltan <zboszor@pr.hu>
>>>> Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource
>>>> interfaces to simplify implementation")
>>>> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
>>>> Cc: stable@vger.kernel.org # 4.0
>>>> ---
>>>>    drivers/acpi/resource.c |   24 +++++++++++++++---------
>>>>    1 file changed, 15 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
>>>> index 10561ce16ed1..e8d281739cbc 100644
>>>> --- a/drivers/acpi/resource.c
>>>> +++ b/drivers/acpi/resource.c
>>>> @@ -194,6 +194,7 @@ static bool acpi_decode_space(struct resource_win
>>>> *win,
>>>>        u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 :
>>>> ACPI_DECODE_16;
>>>>        bool wp = addr->info.mem.write_protect;
>>>>        u64 len = attr->address_length;
>>>> +    u64 start, end, offset = 0;
>>>>        struct resource *res = &win->res;
>>>>
>>>>        /*
>>>> @@ -205,9 +206,6 @@ static bool acpi_decode_space(struct resource_win
>>>> *win,
>>>>            pr_debug("ACPI: Invalid address space min_addr_fix %d,
>>>> max_addr_fix %d, len %llx\n",
>>>>                 addr->min_address_fixed, addr->max_address_fixed, len);
>>>>
>>>> -    res->start = attr->minimum;
>>>> -    res->end = attr->maximum;
>>>> -
>>>>        /*
>>>>         * For bridges that translate addresses across the bridge,
>>>>         * translation_offset is the offset that must be added to the
>>>> @@ -215,12 +213,22 @@ static bool acpi_decode_space(struct
>>>> resource_win *win,
>>>>         * primary side. Non-bridge devices must list 0 for all Address
>>>>         * Translation offset bits.
>>>>         */
>>>> -    if (addr->producer_consumer == ACPI_PRODUCER) {
>>>> -        res->start += attr->translation_offset;
>>>> -        res->end += attr->translation_offset;
>>>> -    } else if (attr->translation_offset) {
>>>> +    if (addr->producer_consumer == ACPI_PRODUCER)
>>>> +        offset = attr->translation_offset;
>>>> +    else if (attr->translation_offset)
>>>>            pr_debug("ACPI: translation_offset(%lld) is invalid for
>>>> non-bridge device.\n",
>>>>                 attr->translation_offset);
>>>> +    start = attr->minimum + offset;
>>>> +    end = attr->maximum + offset;
>>>
>>> I still see an issue in this area, specifically for ACPI_IO_RANGE. You
>>> add the translation offset to attr->minimum and build a resource
>>> structure, which is then passed to acpi_dev_ioresource_flags() and
>>> compared against 0x10003. This causes some I/O ranges to be ignored.
>>>
>>
>> Kind reminder, any comments?
>>
>> Tomasz
> Hi Tomasz,
> 	Thanks for reporting this issue! Could you please help to
> test the attached patch?

I was not able to apply your patch directly, but this part:
-	if (res->end >= 0x10003)
+	if (res->end - offset >= 0x10003)
  		res->flags |= IORESOURCE_DISABLED | IORESOURCE_UNSET;

definitely helps. Thanks!

Tomasz
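
The effect of that one-line change can be sketched in isolation. This is a
simplified model under stated assumptions, not the kernel function:
io_end_rejected() is a hypothetical helper that mirrors only the 0x10003
comparison from acpi_dev_ioresource_flags(), and the offset value used in
the note below is an invented example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the range check in the patch.
 * 'end' is the resource end after the translation offset was added;
 * 'offset' is that translation offset (0 when there is none).
 * The arithmetic is unsigned, so an offset larger than 'end' wraps
 * around and the range is rejected.
 */
bool io_end_rejected(uint64_t end, uint64_t offset)
{
	return end - offset >= 0x10003;
}
```

For a port window ending at 0xffff behind a bridge with an assumed
translation offset of 0x80000000, the translated end is 0x8000ffff. Passing
an offset of 0 models the old check, which rejects the range; passing the
real offset compares 0xffff against 0x10003, so the range is kept.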

Thread overview: 18+ messages
2015-06-24  7:43 [Bugfix v2] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32bit kernel Jiang Liu
2015-06-24  8:25 ` Boszormenyi Zoltan
2015-06-24 11:00   ` Boszormenyi Zoltan
2015-06-24  8:30 ` Ingo Molnar
2015-06-24  9:28   ` Boszormenyi Zoltan
2015-06-24  9:28     ` Boszormenyi Zoltan
2015-06-24  9:49     ` Ingo Molnar
2015-06-24 10:17       ` [Bugfix v3] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel Jiang Liu
2015-06-24 10:18         ` Ingo Molnar
2015-06-29  8:55           ` Boszormenyi Zoltan
2015-06-29 14:28             ` Jiang Liu
2015-06-29 14:28               ` Jiang Liu
2015-07-08  7:26             ` [Bugfix v4] " Jiang Liu
2015-07-10  1:10               ` Rafael J. Wysocki
2015-11-02 15:27               ` Tomasz Nowicki
2015-11-05 12:53                 ` Tomasz Nowicki
2015-11-05 13:24                   ` Jiang Liu
2015-11-05 13:53                     ` Tomasz Nowicki
