linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
@ 2018-11-15 23:16 Alexandru Gagniuc
  2018-11-15 23:16 ` [PATCH 1/2] PCI/AER: Do not use APEI/HEST to disable AER services globally Alexandru Gagniuc
                   ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Alexandru Gagniuc @ 2018-11-15 23:16 UTC (permalink / raw)
  To: helgaas
  Cc: austin_bolen, alex_gagniuc, keith.busch, Shyam_Iyer, lukas,
	Alexandru Gagniuc, Bjorn Helgaas, Rafael J. Wysocki, Len Brown,
	Russell Currey, Sam Bobroff, Oliver O'Halloran, linux-pci,
	linux-acpi, linux-kernel, linuxppc-dev

Thanks to Keith for pointing out that it doesn't make sense to disable
AER services when only one device has a FIRMWARE_FIRST HEST.

AER ownership is an interesting issue brought in by FFS (firmware-first)
model. In a nutshell if FFS handles AER, then OS should not touch any
of the AER bits. FW might set things up so that it receives AER
notifications via SMI. It's theoretically possible to receive SCIs,
but the exact mechanism is platform-dependent. OS touching AER bits
when firmware owns them may interfere with these notifications.

The ACPI mechanism for negotiating control of AER is _OSC, and is
described in detail in ACPI 6.2 Ch. 6.2.11.3. _OSC is negotiated at
the root bus level. Any root port, switch, or endpoint under the bus
would have its AER ownership negotiated in one _OSC call.

Then there is HEST, which is part of ACPI Platform Error Interfaces
(APEI). HEST tables describe the errors that FW may report to the OS.
A types 6,7 and 7 HEST tables describe AER errors from PCIe devices.
As part of this description, we're told if the error source is FFS.

Information in HEST seems to be redundant, as each error reported by
FW will have a CPER table that describes it in detail.

Because HEST describes an error source as firmware-first or not, we've
taken this to mean ownership of AER. Because AER ownership and error
reporting are coupled, _OSC and HEST usually agree on the matter of
ownership. However, that doesn't seem to be required by ACPI.

I've asked around a few people at Dell and they unanimously agree that
_OSC is the correct way to determine ownership of AER. In linux, we
use the result of _OSC to enable AER services, but we use HEST to
determine AER ownership. That's inconsistent. This series drops the
use of HEST in favor of _OSC.

[1] https://lkml.org/lkml/2018/11/15/62

Alexandru Gagniuc (2):
  PCI/AER: Do not use APEI/HEST to disable AER services globally
  PCI/AER: Determine AER ownership based on _OSC instead of HEST

 drivers/acpi/pci_root.c  |  9 +----
 drivers/pci/pcie/aer.c   | 82 ++--------------------------------------
 include/linux/pci-acpi.h |  6 ---
 3 files changed, 5 insertions(+), 92 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/2] PCI/AER: Do not use APEI/HEST to disable AER services globally
  2018-11-15 23:16 [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER Alexandru Gagniuc
@ 2018-11-15 23:16 ` Alexandru Gagniuc
  2018-11-15 23:16 ` [PATCH 2/2] PCI/AER: Determine AER ownership based on _OSC instead of HEST Alexandru Gagniuc
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: Alexandru Gagniuc @ 2018-11-15 23:16 UTC (permalink / raw)
  To: helgaas
  Cc: austin_bolen, alex_gagniuc, keith.busch, Shyam_Iyer, lukas,
	Alexandru Gagniuc, Bjorn Helgaas, Rafael J. Wysocki, Len Brown,
	Russell Currey, Sam Bobroff, Oliver O'Halloran, linux-pci,
	linux-acpi, linux-kernel, linuxppc-dev

HEST is used to describe the meaning of errors received as part of ACPI
Platform Error Interfaces (APEI). HEST is tightly coupled to ownership
of AER, however the correct way to determine AER ownership is the _OSC
method.

_OSC is negotiated per root bus. It's possible to have one HEST
descriptor for all root busses where AER is owned by firmware, and
still have root busses that are not firmware-first. In that case we
would incorrectly disable AER services globally.

Instead of using HEST in making the determination, always request AER
control via _OSC, and only disable AER when firmware denies control.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/acpi/pci_root.c  |  9 ++-------
 drivers/pci/pcie/aer.c   | 25 +------------------------
 include/linux/pci-acpi.h |  6 ------
 3 files changed, 3 insertions(+), 37 deletions(-)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 707aafc7c2aa..32b2053bb0fa 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -491,13 +491,8 @@ static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm,
 	if (IS_ENABLED(CONFIG_HOTPLUG_PCI_SHPC))
 		control |= OSC_PCI_SHPC_NATIVE_HP_CONTROL;
 
-	if (pci_aer_available()) {
-		if (aer_acpi_firmware_first())
-			dev_info(&device->dev,
-				 "PCIe AER handled by firmware\n");
-		else
-			control |= OSC_PCI_EXPRESS_AER_CONTROL;
-	}
+	if (pci_aer_available())
+		control |= OSC_PCI_EXPRESS_AER_CONTROL;
 
 	requested = control;
 	status = acpi_pci_osc_control_set(handle, &control,
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a90a9194ac4a..ac014151b7a6 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -316,29 +316,6 @@ int pcie_aer_get_firmware_first(struct pci_dev *dev)
 	return dev->__aer_firmware_first;
 }
 
-static bool aer_firmware_first;
-
-/**
- * aer_acpi_firmware_first - Check if APEI should control AER.
- */
-bool aer_acpi_firmware_first(void)
-{
-	static bool parsed = false;
-	struct aer_hest_parse_info info = {
-		.pci_dev	= NULL,	/* Check all PCIe devices */
-		.firmware_first	= 0,
-	};
-
-	if (pcie_ports_native)
-		return false;
-
-	if (!parsed) {
-		apei_hest_parse(aer_hest_parse, &info);
-		aer_firmware_first = info.firmware_first;
-		parsed = true;
-	}
-	return aer_firmware_first;
-}
 #endif
 
 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
@@ -1453,7 +1430,7 @@ static struct pcie_port_service_driver aerdriver = {
  */
 int __init pcie_aer_init(void)
 {
-	if (!pci_aer_available() || aer_acpi_firmware_first())
+	if (!pci_aer_available())
 		return -ENXIO;
 	return pcie_port_service_register(&aerdriver);
 }
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 8082b612f561..2e9c0b973eba 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -116,10 +116,4 @@ static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
 static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
 #endif	/* CONFIG_ACPI */
 
-#ifdef CONFIG_ACPI_APEI
-extern bool aer_acpi_firmware_first(void);
-#else
-static inline bool aer_acpi_firmware_first(void) { return false; }
-#endif
-
 #endif	/* _PCI_ACPI_H_ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/2] PCI/AER: Determine AER ownership based on _OSC instead of HEST
  2018-11-15 23:16 [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER Alexandru Gagniuc
  2018-11-15 23:16 ` [PATCH 1/2] PCI/AER: Do not use APEI/HEST to disable AER services globally Alexandru Gagniuc
@ 2018-11-15 23:16 ` Alexandru Gagniuc
  2018-11-15 23:43   ` Keith Busch
  2018-11-16  1:49 ` [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER Sinan Kaya
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Alexandru Gagniuc @ 2018-11-15 23:16 UTC (permalink / raw)
  To: helgaas
  Cc: austin_bolen, alex_gagniuc, keith.busch, Shyam_Iyer, lukas,
	Alexandru Gagniuc, Bjorn Helgaas, Rafael J. Wysocki, Len Brown,
	Russell Currey, Sam Bobroff, Oliver O'Halloran, linux-pci,
	linux-acpi, linux-kernel, linuxppc-dev

HEST is used to describe the meaning of errors received as part of ACPI
Platform Error Interfaces (APEI), however the correct way to determine
AER ownership is the _OSC method.

pci_dev->__aer_firmware_first is used to prevent modification of AER
registers when firmware owns AER. This is synonymous with the AER
ownership negotiated during _OSC. Thus _OSC is the correct way to
use to set this flag, not HEST.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/pci/pcie/aer.c | 57 ++----------------------------------------
 1 file changed, 2 insertions(+), 55 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index ac014151b7a6..dd9594e8ed08 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -240,66 +240,13 @@ static inline bool hest_match_type(struct acpi_hest_header *hest_hdr,
 	return false;
 }
 
-struct aer_hest_parse_info {
-	struct pci_dev *pci_dev;
-	int firmware_first;
-};
-
-static int hest_source_is_pcie_aer(struct acpi_hest_header *hest_hdr)
-{
-	if (hest_hdr->type == ACPI_HEST_TYPE_AER_ROOT_PORT ||
-	    hest_hdr->type == ACPI_HEST_TYPE_AER_ENDPOINT ||
-	    hest_hdr->type == ACPI_HEST_TYPE_AER_BRIDGE)
-		return 1;
-	return 0;
-}
-
-static int aer_hest_parse(struct acpi_hest_header *hest_hdr, void *data)
-{
-	struct aer_hest_parse_info *info = data;
-	struct acpi_hest_aer_common *p;
-	int ff;
-
-	if (!hest_source_is_pcie_aer(hest_hdr))
-		return 0;
-
-	p = (struct acpi_hest_aer_common *)(hest_hdr + 1);
-	ff = !!(p->flags & ACPI_HEST_FIRMWARE_FIRST);
-
-	/*
-	 * If no specific device is supplied, determine whether
-	 * FIRMWARE_FIRST is set for *any* PCIe device.
-	 */
-	if (!info->pci_dev) {
-		info->firmware_first |= ff;
-		return 0;
-	}
 
-	/* Otherwise, check the specific device */
-	if (p->flags & ACPI_HEST_GLOBAL) {
-		if (hest_match_type(hest_hdr, info->pci_dev))
-			info->firmware_first = ff;
-	} else
-		if (hest_match_pci(p, info->pci_dev))
-			info->firmware_first = ff;
-
-	return 0;
-}
 
 static void aer_set_firmware_first(struct pci_dev *pci_dev)
 {
-	int rc;
-	struct aer_hest_parse_info info = {
-		.pci_dev	= pci_dev,
-		.firmware_first	= 0,
-	};
+	struct pci_host_bridge *host = pci_find_host_bridge(pci_dev->bus);
 
-	rc = apei_hest_parse(aer_hest_parse, &info);
-
-	if (rc)
-		pci_dev->__aer_firmware_first = 0;
-	else
-		pci_dev->__aer_firmware_first = info.firmware_first;
+	pci_dev->__aer_firmware_first = !host->native_aer;
 	pci_dev->__aer_firmware_first_valid = 1;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] PCI/AER: Determine AER ownership based on _OSC instead of HEST
  2018-11-15 23:16 ` [PATCH 2/2] PCI/AER: Determine AER ownership based on _OSC instead of HEST Alexandru Gagniuc
@ 2018-11-15 23:43   ` Keith Busch
  0 siblings, 0 replies; 33+ messages in thread
From: Keith Busch @ 2018-11-15 23:43 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: helgaas, austin_bolen, alex_gagniuc, Shyam_Iyer, lukas,
	Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Russell Currey,
	Sam Bobroff, Oliver O'Halloran, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

On Thu, Nov 15, 2018 at 05:16:03PM -0600, Alexandru Gagniuc wrote:
>  static void aer_set_firmware_first(struct pci_dev *pci_dev)
>  {
> -	int rc;
> -	struct aer_hest_parse_info info = {
> -		.pci_dev	= pci_dev,
> -		.firmware_first	= 0,
> -	};
> +	struct pci_host_bridge *host = pci_find_host_bridge(pci_dev->bus);
>  
> -	rc = apei_hest_parse(aer_hest_parse, &info);
> -
> -	if (rc)
> -		pci_dev->__aer_firmware_first = 0;
> -	else
> -		pci_dev->__aer_firmware_first = info.firmware_first;
> +	pci_dev->__aer_firmware_first = !host->native_aer;
>  	pci_dev->__aer_firmware_first_valid = 1;
>  }

I think we can clean this up even more by removing the setter and the
__aer_firmware_first fields, and have the pcie_aer_get_firmware_first()
go directly to the host bride->native_aer.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-15 23:16 [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER Alexandru Gagniuc
  2018-11-15 23:16 ` [PATCH 1/2] PCI/AER: Do not use APEI/HEST to disable AER services globally Alexandru Gagniuc
  2018-11-15 23:16 ` [PATCH 2/2] PCI/AER: Determine AER ownership based on _OSC instead of HEST Alexandru Gagniuc
@ 2018-11-16  1:49 ` Sinan Kaya
  2018-11-19 16:53   ` Tyler Baicar
  2018-11-16 12:37 ` David Laight
  2019-03-05 23:16 ` Bjorn Helgaas
  4 siblings, 1 reply; 33+ messages in thread
From: Sinan Kaya @ 2018-11-16  1:49 UTC (permalink / raw)
  To: Alexandru Gagniuc, helgaas
  Cc: austin_bolen, alex_gagniuc, keith.busch, Shyam_Iyer, lukas,
	Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Russell Currey,
	Sam Bobroff, Oliver O'Halloran, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
> I've asked around a few people at Dell and they unanimously agree that
> _OSC is the correct way to determine ownership of AER. In linux, we
> use the result of _OSC to enable AER services, but we use HEST to
> determine AER ownership. That's inconsistent. This series drops the
> use of HEST in favor of _OSC.
> 
> [1]https://lkml.org/lkml/2018/11/15/62

This change breaks the existing systems that rely on the HEST table
telling the operating system about firmware first presence.

Besides, HEST table has much more granularity about which PCI component
needs firmware such as global/device/switch.

You should probably circulate these ideas for wider consumption in UEFI
forum as UEFI owns the HEST table definition.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-15 23:16 [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER Alexandru Gagniuc
                   ` (2 preceding siblings ...)
  2018-11-16  1:49 ` [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER Sinan Kaya
@ 2018-11-16 12:37 ` David Laight
  2019-03-05 23:16 ` Bjorn Helgaas
  4 siblings, 0 replies; 33+ messages in thread
From: David Laight @ 2018-11-16 12:37 UTC (permalink / raw)
  To: 'Alexandru Gagniuc', helgaas
  Cc: austin_bolen, alex_gagniuc, keith.busch, Shyam_Iyer, lukas,
	Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Russell Currey,
	Sam Bobroff, Oliver O'Halloran, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

From: Alexandru Gagniuc
> Sent: 15 November 2018 23:16
...
> I've asked around a few people at Dell and they unanimously agree that
> _OSC is the correct way to determine ownership of AER.

This is all very well, but we have systems (they might be Dell ones)
where failure of a PCIe link (even when all the drivers are removed)
causes an NMI - and crashes the kernel.

There are other systems which have AER registers available for
some of the PCIe devices, but the BIOS ignores them and _OSC
doesn't let the kernel take control.
In this case the AER registers can contain useful information
after a read returns ~0u.
It would be useful to be able to get this information without
having to grovel through all the PCIe config space.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-16  1:49 ` [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER Sinan Kaya
@ 2018-11-19 16:53   ` Tyler Baicar
  2018-11-19 16:53     ` Keith Busch
  0 siblings, 1 reply; 33+ messages in thread
From: Tyler Baicar @ 2018-11-19 16:53 UTC (permalink / raw)
  To: okaya
  Cc: mr.nuke.me, helgaas, austin_bolen, alex_gagniuc, keith.busch,
	Shyam_Iyer, lukas, bhelgaas, rjw, lenb, ruscur, sbobroff, oohall,
	linux-pci, linux-acpi, Linux Kernel Mailing List, linuxppc-dev

On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya <okaya@kernel.org> wrote:
>
> On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
> > I've asked around a few people at Dell and they unanimously agree that
> > _OSC is the correct way to determine ownership of AER. In linux, we
> > use the result of _OSC to enable AER services, but we use HEST to
> > determine AER ownership. That's inconsistent. This series drops the
> > use of HEST in favor of _OSC.
> >
> > [1]https://lkml.org/lkml/2018/11/15/62
>
> This change breaks the existing systems that rely on the HEST table
> telling the operating system about firmware first presence.
>
> Besides, HEST table has much more granularity about which PCI component
> needs firmware such as global/device/switch.
>
> You should probably circulate these ideas for wider consumption in UEFI
> forum as UEFI owns the HEST table definition.

I agree with Sinan, this will break existing systems, and the granularity of the
HEST definition is more useful than the single bit in _OSC.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 16:53   ` Tyler Baicar
@ 2018-11-19 16:53     ` Keith Busch
  2018-11-19 17:32       ` Sinan Kaya
  0 siblings, 1 reply; 33+ messages in thread
From: Keith Busch @ 2018-11-19 16:53 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: okaya, mr.nuke.me, helgaas, austin_bolen, alex_gagniuc,
	Shyam_Iyer, lukas, bhelgaas, rjw, lenb, ruscur, sbobroff, oohall,
	linux-pci, linux-acpi, Linux Kernel Mailing List, linuxppc-dev

On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote:
> On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya <okaya@kernel.org> wrote:
> >
> > On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
> > > I've asked around a few people at Dell and they unanimously agree that
> > > _OSC is the correct way to determine ownership of AER. In linux, we
> > > use the result of _OSC to enable AER services, but we use HEST to
> > > determine AER ownership. That's inconsistent. This series drops the
> > > use of HEST in favor of _OSC.
> > >
> > > [1]https://lkml.org/lkml/2018/11/15/62
> >
> > This change breaks the existing systems that rely on the HEST table
> > telling the operating system about firmware first presence.
> >
> > Besides, HEST table has much more granularity about which PCI component
> > needs firmware such as global/device/switch.
> >
> > You should probably circulate these ideas for wider consumption in UEFI
> > forum as UEFI owns the HEST table definition.
> 
> I agree with Sinan, this will break existing systems, and the granularity of the
> HEST definition is more useful than the single bit in _OSC.

But we're not using HEST as a fine grain control. We disable native AER
handling if *any* device has FF set in HEST, and that just forces people
to use pcie_ports=native to get around that.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 16:53     ` Keith Busch
@ 2018-11-19 17:32       ` Sinan Kaya
  2018-11-19 17:36         ` Keith Busch
  2018-11-19 17:42         ` Sinan Kaya
  0 siblings, 2 replies; 33+ messages in thread
From: Sinan Kaya @ 2018-11-19 17:32 UTC (permalink / raw)
  To: Keith Busch, Tyler Baicar
  Cc: mr.nuke.me, helgaas, austin_bolen, alex_gagniuc, Shyam_Iyer,
	lukas, bhelgaas, rjw, lenb, ruscur, sbobroff, oohall, linux-pci,
	linux-acpi, Linux Kernel Mailing List, linuxppc-dev

On 11/19/2018 11:53 AM, Keith Busch wrote:
> On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote:
>> On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya <okaya@kernel.org> wrote:
>>>
>>> On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
>>>> I've asked around a few people at Dell and they unanimously agree that
>>>> _OSC is the correct way to determine ownership of AER. In linux, we
>>>> use the result of _OSC to enable AER services, but we use HEST to
>>>> determine AER ownership. That's inconsistent. This series drops the
>>>> use of HEST in favor of _OSC.
>>>>
>>>> [1]https://lkml.org/lkml/2018/11/15/62
>>>
>>> This change breaks the existing systems that rely on the HEST table
>>> telling the operating system about firmware first presence.
>>>
>>> Besides, HEST table has much more granularity about which PCI component
>>> needs firmware such as global/device/switch.
>>>
>>> You should probably circulate these ideas for wider consumption in UEFI
>>> forum as UEFI owns the HEST table definition.
>>
>> I agree with Sinan, this will break existing systems, and the granularity of the
>> HEST definition is more useful than the single bit in _OSC.
> 
> But we're not using HEST as a fine grain control. We disable native AER
> handling if *any* device has FF set in HEST, and that just forces people
> to use pcie_ports=native to get around that.
> 

I don't see *any* in the code.  aer_hest_parse() does the HEST table parsing.
It switches to firmware first mode if global flag in HEST is set. Otherwise
for each BDF in device, hest_match_pci() is used to do a cross-matching against
HEST table contents.

Am I missing something?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 17:32       ` Sinan Kaya
@ 2018-11-19 17:36         ` Keith Busch
  2018-11-19 17:42         ` Sinan Kaya
  1 sibling, 0 replies; 33+ messages in thread
From: Keith Busch @ 2018-11-19 17:36 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Tyler Baicar, mr.nuke.me, helgaas, austin_bolen, alex_gagniuc,
	Shyam_Iyer, lukas, bhelgaas, rjw, lenb, ruscur, sbobroff, oohall,
	linux-pci, linux-acpi, Linux Kernel Mailing List, linuxppc-dev

On Mon, Nov 19, 2018 at 12:32:42PM -0500, Sinan Kaya wrote:
> On 11/19/2018 11:53 AM, Keith Busch wrote:
> > On Mon, Nov 19, 2018 at 11:53:05AM -0500, Tyler Baicar wrote:
> > > On Thu, Nov 15, 2018 at 8:49 PM Sinan Kaya <okaya@kernel.org> wrote:
> > > > 
> > > > On 11/15/2018 3:16 PM, Alexandru Gagniuc wrote:
> > > > > I've asked around a few people at Dell and they unanimously agree that
> > > > > _OSC is the correct way to determine ownership of AER. In linux, we
> > > > > use the result of _OSC to enable AER services, but we use HEST to
> > > > > determine AER ownership. That's inconsistent. This series drops the
> > > > > use of HEST in favor of _OSC.
> > > > > 
> > > > > [1]https://lkml.org/lkml/2018/11/15/62
> > > > 
> > > > This change breaks the existing systems that rely on the HEST table
> > > > telling the operating system about firmware first presence.
> > > > 
> > > > Besides, HEST table has much more granularity about which PCI component
> > > > needs firmware such as global/device/switch.
> > > > 
> > > > You should probably circulate these ideas for wider consumption in UEFI
> > > > forum as UEFI owns the HEST table definition.
> > > 
> > > I agree with Sinan, this will break existing systems, and the granularity of the
> > > HEST definition is more useful than the single bit in _OSC.
> > 
> > But we're not using HEST as a fine grain control. We disable native AER
> > handling if *any* device has FF set in HEST, and that just forces people
> > to use pcie_ports=native to get around that.
> > 
> 
> I don't see *any* in the code.  aer_hest_parse() does the HEST table parsing.
> It switches to firmware first mode if global flag in HEST is set. Otherwise
> for each BDF in device, hest_match_pci() is used to do a cross-matching against
> HEST table contents.
> 
> Am I missing something?

You might be. :)

static int aer_hest_parse(struct acpi_hest_header *hest_hdr, void *data)
{
<snip>
        /*
         * If no specific device is supplied, determine whether
         * FIRMWARE_FIRST is set for *any* PCIe device.
         */
        if (!info->pci_dev) {
                info->firmware_first |= ff;
                return 0;
        }


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 17:42         ` Sinan Kaya
@ 2018-11-19 17:41           ` Keith Busch
  2018-11-19 17:56             ` Sinan Kaya
  0 siblings, 1 reply; 33+ messages in thread
From: Keith Busch @ 2018-11-19 17:41 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Tyler Baicar, mr.nuke.me, helgaas, austin_bolen, alex_gagniuc,
	Shyam_Iyer, lukas, bhelgaas, rjw, lenb, ruscur, sbobroff, oohall,
	linux-pci, linux-acpi, Linux Kernel Mailing List, linuxppc-dev

On Mon, Nov 19, 2018 at 12:42:25PM -0500, Sinan Kaya wrote:
> On 11/19/2018 12:32 PM, Sinan Kaya wrote:
> > > 
> > > But we're not using HEST as a fine grain control. We disable native AER
> > > handling if *any* device has FF set in HEST, and that just forces people
> > > to use pcie_ports=native to get around that.
> > > 
> > 
> > I don't see *any* in the code.  aer_hest_parse() does the HEST table parsing.
> > It switches to firmware first mode if global flag in HEST is set. Otherwise
> > for each BDF in device, hest_match_pci() is used to do a cross-matching against
> > HEST table contents.
> > 
> > Am I missing something?
> 
> I see. I think you are talking about aer_firmware_first, right?
> 
> aer_set_firmware_first() and  pcie_aer_get_firmware_first() seem to do the right
> thing.

Right, but what difference does it make if device specific AER checks do
the right thing if pcie_aer_init() doesn't even register it's port driver?
 
> aer_firmware_first is probably getting set because events are all routed to a
> single root port and aer_acpi_firmware_first() is used to decide whether AER
> should be initialized or not.
> 
> I think I understand what is going on now.
> 
> Still, breaking existing systems that rely on HEST table is not cool.
> I'd rather have users specify "pcie_ports=native" to skip FF rather than
> having broken systems by default to be honest.

The pcie_ports=native work-around ignores FF to potentially unknown
results, though.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 17:32       ` Sinan Kaya
  2018-11-19 17:36         ` Keith Busch
@ 2018-11-19 17:42         ` Sinan Kaya
  2018-11-19 17:41           ` Keith Busch
  1 sibling, 1 reply; 33+ messages in thread
From: Sinan Kaya @ 2018-11-19 17:42 UTC (permalink / raw)
  To: Keith Busch, Tyler Baicar
  Cc: mr.nuke.me, helgaas, austin_bolen, alex_gagniuc, Shyam_Iyer,
	lukas, bhelgaas, rjw, lenb, ruscur, sbobroff, oohall, linux-pci,
	linux-acpi, Linux Kernel Mailing List, linuxppc-dev

On 11/19/2018 12:32 PM, Sinan Kaya wrote:
>>
>> But we're not using HEST as a fine grain control. We disable native AER
>> handling if *any* device has FF set in HEST, and that just forces people
>> to use pcie_ports=native to get around that.
>>
> 
> I don't see *any* in the code.  aer_hest_parse() does the HEST table parsing.
> It switches to firmware first mode if global flag in HEST is set. Otherwise
> for each BDF in device, hest_match_pci() is used to do a cross-matching against
> HEST table contents.
> 
> Am I missing something?

I see. I think you are talking about aer_firmware_first, right?

aer_set_firmware_first() and  pcie_aer_get_firmware_first() seem to do the right
thing.

aer_firmware_first is probably getting set because events are all routed to a
single root port and aer_acpi_firmware_first() is used to decide whether AER
should be initialized or not.

I think I understand what is going on now.

Still, breaking existing systems that rely on HEST table is not cool.
I'd rather have users specify "pcie_ports=native" to skip FF rather than
having broken systems by default to be honest.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 17:41           ` Keith Busch
@ 2018-11-19 17:56             ` Sinan Kaya
  2018-11-19 18:10               ` Keith Busch
  0 siblings, 1 reply; 33+ messages in thread
From: Sinan Kaya @ 2018-11-19 17:56 UTC (permalink / raw)
  To: Keith Busch
  Cc: Tyler Baicar, mr.nuke.me, helgaas, austin_bolen, alex_gagniuc,
	Shyam_Iyer, lukas, bhelgaas, rjw, lenb, ruscur, sbobroff, oohall,
	linux-pci, linux-acpi, Linux Kernel Mailing List, linuxppc-dev

On 11/19/2018 12:41 PM, Keith Busch wrote:
>> Still, breaking existing systems that rely on HEST table is not cool.
>> I'd rather have users specify "pcie_ports=native" to skip FF rather than
>> having broken systems by default to be honest.
> The pcie_ports=native work-around ignores FF to potentially unknown
> results, though.
> 

How about being able to enable/disable FF in BIOS?

We can't really turn off firmware first in the kernel without asking help
from the firmware. Like you said, it causes unpredictable results.

There will be two competing software trying to touch the same registers.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 17:56             ` Sinan Kaya
@ 2018-11-19 18:10               ` Keith Busch
  2018-11-19 18:24                 ` Sinan Kaya
  0 siblings, 1 reply; 33+ messages in thread
From: Keith Busch @ 2018-11-19 18:10 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Tyler Baicar, mr.nuke.me, austin_bolen, alex_gagniuc, Shyam_Iyer,
	lukas, bhelgaas, rjw, lenb, ruscur, sbobroff, oohall, linux-pci,
	linux-acpi, Linux Kernel Mailing List, linuxppc-dev

On Mon, Nov 19, 2018 at 12:56:56PM -0500, Sinan Kaya wrote:
> On 11/19/2018 12:41 PM, Keith Busch wrote:
> > > Still, breaking existing systems that rely on HEST table is not cool.
> > > I'd rather have users specify "pcie_ports=native" to skip FF rather than
> > > having broken systems by default to be honest.
> > The pcie_ports=native work-around ignores FF to potentially unknown
> > results, though.
> > 
> 
> How about being able to enable/disable FF in BIOS?
>
> We can't really turn off firmware first in the kernel without asking help
> from the firmware.

The _OSC method this patch utilizes is the ACPI spec defined way for
the kernel to wrest control from firmware. BIOS specific menu settings
shouldn't be our only recourse when we have a spec authority defining
generic OS interfaces to accomplish the same thing.

Unless there is a disagreement on the _OSC interpreation, we'd have to
accept that platforms breaking from this patch are non-compliant.

> Like you said, it causes unpredictable results.
>
> There will be two competing software trying to touch the same registers.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 18:10               ` Keith Busch
@ 2018-11-19 18:24                 ` Sinan Kaya
  2018-11-19 19:11                   ` Alex G.
  0 siblings, 1 reply; 33+ messages in thread
From: Sinan Kaya @ 2018-11-19 18:24 UTC (permalink / raw)
  To: Keith Busch
  Cc: Tyler Baicar, mr.nuke.me, austin_bolen, alex_gagniuc, Shyam_Iyer,
	lukas, bhelgaas, rjw, lenb, ruscur, sbobroff, oohall, linux-pci,
	linux-acpi, Linux Kernel Mailing List, linuxppc-dev

On 11/19/2018 1:10 PM, Keith Busch wrote:
>> We can't really turn off firmware first in the kernel without asking help
>> from the firmware.
> The _OSC method this patch utilizes is the ACPI spec defined way for
> the kernel to wrest control from firmware. BIOS specific menu settings
> shouldn't be our only recourse when we have a spec authority defining
> generic OS interfaces to accomplish the same thing.
> 
> Unless there is a disagreement on the _OSC interpreation, we'd have to
> accept that platforms breaking from this patch are non-compliant.
> 

It depends on which spec you look :)

UEFI HEST table specification also claims that it should be the ultimate
table for when PCI firmware-first should be disabled/enabled.

I think somebody needs to fix these. I saw an email from Harb Abdulhamid
sent to aswg this morning.

That's why I suggested circulating this idea in UEFI forums first.
Let's see what everybody thinks. We can go from there.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 18:24                 ` Sinan Kaya
@ 2018-11-19 19:11                   ` Alex G.
  2018-11-19 19:32                     ` Sinan Kaya
  0 siblings, 1 reply; 33+ messages in thread
From: Alex G. @ 2018-11-19 19:11 UTC (permalink / raw)
  To: Sinan Kaya, Keith Busch
  Cc: Tyler Baicar, austin_bolen, alex_gagniuc, Shyam_Iyer, lukas,
	bhelgaas, rjw, lenb, ruscur, sbobroff, oohall, linux-pci,
	linux-acpi, Linux Kernel Mailing List, linuxppc-dev

On 11/19/2018 12:24 PM, Sinan Kaya wrote:
> On 11/19/2018 1:10 PM, Keith Busch wrote:
>>> We can't really turn off firmware first in the kernel without asking 
>>> help
>>> from the firmware.
>> The _OSC method this patch utilizes is the ACPI spec defined way for
>> the kernel to wrest control from firmware. BIOS specific menu settings
>> shouldn't be our only recourse when we have a spec authority defining
>> generic OS interfaces to accomplish the same thing.
>>
>> Unless there is a disagreement on the _OSC interpreation, we'd have to
>> accept that platforms breaking from this patch are non-compliant.
>>
> 
> It depends on which spec you look :)
> 
> UEFI HEST table specification also claims that it should be the ultimate
> table for when PCI firmware-first should be disabled/enabled.

IIRC, EFI absorbed ACPI before FFS was a thing. Could you point me to 
the UEFI chapter that says HEST is authoritative?
(not being a smartie, just that my free software PDF readers can't 
search within a file that large)


> I think somebody needs to fix these. I saw an email from Harb Abdulhamid
> sent to aswg this morning.
> 
> That's why I suggested circulating this idea in UEFI forums first.
> Let's see what everybody thinks. We can go from there.

However you look at it, we have a glaring inconsistency in how we handle 
AER control in linux. I'm surprised we didn't see huge issues because of 
mixing HEST/_OSC.

What systems rely on the HEST definition as opposed to _OSC? It doesn't 
make sense to me that you could have a system with mixed FFS and native 
AER on the same root port. The granularity of HEST shouldn't matter here 
because of how AER works.

I'd like see how exactly we break one of those elusive systems with 
_OSC. I suspect _OSC and HEST end up having the same information, and 
that's why we didn't see any real-life issue with mixing the approaches.

Alex


P.S.
(SARCASM ALERT) Isn't UEFI is a pile of stuff that didn't stick to the wall?

P.P.S
I remember someone asking why we don't disable FFS in the BIOS. I think 
it would be next to impossible to get certain platform vendors to 
relinquish FFS control, unless the specs required things to be that way 
-- and had a "standard" way to do so.

Then getting the specs to change in that direction is also a battle.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 19:11                   ` Alex G.
@ 2018-11-19 19:32                     ` Sinan Kaya
  2018-11-19 20:16                       ` Alex_Gagniuc
  0 siblings, 1 reply; 33+ messages in thread
From: Sinan Kaya @ 2018-11-19 19:32 UTC (permalink / raw)
  To: Alex G., Keith Busch
  Cc: Tyler Baicar, austin_bolen, alex_gagniuc, Shyam_Iyer, lukas,
	bhelgaas, rjw, lenb, ruscur, sbobroff, oohall, linux-pci,
	linux-acpi, Linux Kernel Mailing List, linuxppc-dev

>> UEFI HEST table specification also claims that it should be the ultimate
>> table for when PCI firmware-first should be disabled/enabled.
> 
> IIRC, EFI absorbed ACPI before FFS was a thing. Could you point me to the UEFI 
> chapter that says HEST is authoritative?
> (not being a smartie, just that my free software PDF readers can't search within 
> a file that large)
> 

ACPI 6.2:

18.3.2.4 PCI Express Root Port AER Structure

Flags:

Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system 
firmware will handle errors from this source first.
Bit [1] - GLOBAL: If set, indicates that the settings contained in this 
structure apply globally to all PCI Express Devices.
All other bits must be set to zero.

It doesn't say shall, may or might. It says will.

> 
>> I think somebody needs to fix these. I saw an email from Harb Abdulhamid
>> sent to aswg this morning.
>>
>> That's why I suggested circulating this idea in UEFI forums first.
>> Let's see what everybody thinks. We can go from there.
> 
> However you look at it, we have a glaring inconsistency in how we handle AER 
> control in linux. I'm surprised we didn't see huge issues because of mixing 
> HEST/_OSC.
> 
> What systems rely on the HEST definition as opposed to _OSC? It doesn't make 
> sense to me that you could have a system with mixed FFS and native AER on the 
> same root port. The granularity of HEST shouldn't matter here because of how AER 
> works.

I think It depends on your PCI topology.

For other topologies with multiple PCI root complexes, I can see this being
used per root complex flag to indicate which root complex needs firmware first
and which one doesn't.

> 
> I'd like see how exactly we break one of those elusive systems with _OSC. I 
> suspect _OSC and HEST end up having the same information, and that's why we 
> didn't see any real-life issue with mixing the approaches.

I'm already aware of two systems that rely on HEST table to pass information to
the OS that firmware first is enabled. Both of the systems do not change their
_OSC bits during this assuming HEST table has priority over _OSC for firmware
first.

If we add this patch, OS will try to claim the AER address space while firmware
wants exclusive access.

As I said in my previous email, the right place to talk about this is UEFI
forum.

> 
> Alex
> 
> 
> P.S.
> (SARCASM ALERT) Isn't UEFI is a pile of stuff that didn't stick to the wall?
> 
> P.P.S
> I remember someone asking why we don't disable FFS in the BIOS. I think it would 
> be next to impossible to get certain platform vendors to relinquish FFS control, 
> unless the specs required things to be that way -- and had a "standard" way to 
> do so.
> 
> Then getting the specs to change in that direction is also a battle.
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 19:32                     ` Sinan Kaya
@ 2018-11-19 20:16                       ` Alex_Gagniuc
  2018-11-19 20:33                         ` Sinan Kaya
  0 siblings, 1 reply; 33+ messages in thread
From: Alex_Gagniuc @ 2018-11-19 20:16 UTC (permalink / raw)
  To: okaya, mr.nuke.me, keith.busch
  Cc: baicar.tyler, Austin.Bolen, Shyam.Iyer, lukas, bhelgaas, rjw,
	lenb, ruscur, sbobroff, oohall, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

On 11/19/2018 01:32 PM, Sinan Kaya wrote:
> ACPI 6.2:
> 
> 18.3.2.4 PCI Express Root Port AER Structure
> 
> Flags:
> 
> Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system
> firmware will handle errors from this source first.
> Bit [1] - GLOBAL: If set, indicates that the settings contained in this
> structure apply globally to all PCI Express Devices.
> All other bits must be set to zero.
> 
> It doesn't say shall, may or might. It says will.

It says "system firmware will handle errors". It does not say "system 
firmware owns AER registers". In absence on any descriptor text on the 
meaning of these tables, this really looks to me like it should be 
interpreted as a descriptor of APEI error sources, not a mutex on who 
writes to certain bits-- AER in this case.

I don't think that is contradictory or inconsistent.
I also wasn't able to find any reference to HEST in UEFI 2.7, only in 
ACPI spec.

> I think It depends on your PCI topology.
> 
> For other topologies with multiple PCI root complexes, I can see this being
> used per root complex flag to indicate which root complex needs firmware first
> and which one doesn't.

_OSC is per root bus, so it's already granular enough, right? Why would 
it depend on PCI topology?


>> I'd like see how exactly we break one of those elusive systems with _OSC. I
>> suspect _OSC and HEST end up having the same information, and that's why we
>> didn't see any real-life issue with mixing the approaches.
> 
> I'm already aware of two systems that rely on HEST table to pass information to
> the OS that firmware first is enabled. Both of the systems do not change their
> _OSC bits during this assuming HEST table has priority over _OSC for firmware
> first.

Are those hax86 systems?
It seems like the systems have broken firmware. I see several ways to 
handle broken systems like those:
  - Parse both HEST and _OSC, and decide AER ownership with root bridge 
granularity. i.e. host_bridge->native_aer is authoritative, but is 
derived from both HEST and _OSC
  - Add quirks for the broken systems
  - Keep doing what we're doing until current code breaks a new system

> If we add this patch, OS will try to claim the AER address space while firmware
> wants exclusive access.

Yay! FFS wants exclusive access, but does not claim it. Oh, FFS!


> As I said in my previous email, the right place to talk about this is UEFI
> forum.

The way I would present the problem to he spec writers is that, although 
the spec appears to be consistent, we've seen firmware vendors that made 
the wrong assumptions about HEST/_OSC. Instead of describing AER 
ownership with _OSC, they attempted to do it with HEST. So we should add 
an implementation note, or clarification about this.

Alex

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 20:16                       ` Alex_Gagniuc
@ 2018-11-19 20:33                         ` Sinan Kaya
  2018-11-19 23:49                           ` Alex_Gagniuc
  0 siblings, 1 reply; 33+ messages in thread
From: Sinan Kaya @ 2018-11-19 20:33 UTC (permalink / raw)
  To: Alex_Gagniuc, mr.nuke.me, keith.busch
  Cc: baicar.tyler, Austin.Bolen, Shyam.Iyer, lukas, bhelgaas, rjw,
	lenb, ruscur, sbobroff, oohall, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

On 11/19/2018 3:16 PM, Alex_Gagniuc@Dellteam.com wrote:
> On 11/19/2018 01:32 PM, Sinan Kaya wrote:
>> ACPI 6.2:
>>
>> 18.3.2.4 PCI Express Root Port AER Structure
>>
>> Flags:
>>
>> Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system
>> firmware will handle errors from this source first.
>> Bit [1] - GLOBAL: If set, indicates that the settings contained in this
>> structure apply globally to all PCI Express Devices.
>> All other bits must be set to zero.
>>
>> It doesn't say shall, may or might. It says will.
> 
> It says "system firmware will handle errors". It does not say "system
> firmware owns AER registers". In absence on any descriptor text on the
> meaning of these tables, this really looks to me like it should be
> interpreted as a descriptor of APEI error sources, not a mutex on who
> writes to certain bits-- AER in this case.

True. I was trying to get it out in a rush. I omitted words.

However; table assumes governance about for which entities firmware first
should be enabled. There is no cross reference to _OSC or permission
negotiation like _OST.

> 
> I don't think that is contradictory or inconsistent.
> I also wasn't able to find any reference to HEST in UEFI 2.7, only in
> ACPI spec.

You are right. It was a confusion on my side. The right place to look is
ACPI specification. I was involved in this a couple of years ago. Some pieces
were in UEFI spec. Other pieces were in ACPI. I guess they got unified
now.

> 
>> I think It depends on your PCI topology.
>>
>> For other topologies with multiple PCI root complexes, I can see this being
>> used per root complex flag to indicate which root complex needs firmware first
>> and which one doesn't.
> 
> _OSC is per root bus, so it's already granular enough, right? Why would
> it depend on PCI topology?
> 

I was speculating. I don't have the full background on this. Need to consult
the spec developers.

>> As I said in my previous email, the right place to talk about this is UEFI
>> forum.
> 
> The way I would present the problem to he spec writers is that, although
> the spec appears to be consistent, we've seen firmware vendors that made
> the wrong assumptions about HEST/_OSC. Instead of describing AER
> ownership with _OSC, they attempted to do it with HEST. So we should add
> an implementation note, or clarification about this.

I agree.

> 
> Alex
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 20:33                         ` Sinan Kaya
@ 2018-11-19 23:49                           ` Alex_Gagniuc
  2018-11-20  1:54                             ` Sinan Kaya
  0 siblings, 1 reply; 33+ messages in thread
From: Alex_Gagniuc @ 2018-11-19 23:49 UTC (permalink / raw)
  To: okaya, mr.nuke.me, keith.busch
  Cc: baicar.tyler, Austin.Bolen, Shyam.Iyer, lukas, bhelgaas, rjw,
	lenb, ruscur, sbobroff, oohall, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

On 11/19/2018 02:33 PM, Sinan Kaya wrote:
> True. I was trying to get it out in a rush. I omitted words.

Sounds like you'd make an top notch spec writer! :p

> However; table assumes governance about for which entities firmware first
> should be enabled. There is no cross reference to _OSC or permission
> negotiation like _OST.

Well, from an OSPM perspective, is FFS something that can be enabled or 
disabled? FFS seems to be static to OSPM, which would change the sort of 
assumptions we can reasonably make here.


>>> As I said in my previous email, the right place to talk about this is UEFI
>>> forum.
>>
>> The way I would present the problem to he spec writers is that, although
>> the spec appears to be consistent, we've seen firmware vendors that made
>> the wrong assumptions about HEST/_OSC. Instead of describing AER
>> ownership with _OSC, they attempted to do it with HEST. So we should add
>> an implementation note, or clarification about this.
> 
> I agree.

Cool. While the UEFI Secret Society debates, can we figure out if/how 
[patch 1/2] breaks those systems, or is it only [patch 2/2] of this 
series that we suspect?

Alex

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-19 23:49                           ` Alex_Gagniuc
@ 2018-11-20  1:54                             ` Sinan Kaya
  2018-11-20 20:44                               ` Alex_Gagniuc
  0 siblings, 1 reply; 33+ messages in thread
From: Sinan Kaya @ 2018-11-20  1:54 UTC (permalink / raw)
  To: Alex_Gagniuc, mr.nuke.me, keith.busch
  Cc: baicar.tyler, Austin.Bolen, Shyam.Iyer, lukas, bhelgaas, rjw,
	lenb, ruscur, sbobroff, oohall, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

On 11/19/2018 6:49 PM, Alex_Gagniuc@Dellteam.com wrote:
> On 11/19/2018 02:33 PM, Sinan Kaya wrote:
>> However; table assumes governance about for which entities firmware first
>> should be enabled. There is no cross reference to _OSC or permission
>> negotiation like _OST.
> 
> Well, from an OSPM perspective, is FFS something that can be enabled or
> disabled? FFS seems to be static to OSPM, which would change the sort of
> assumptions we can reasonably make here.

IMO, it can be enabled/disabled in BIOS. I have seen this implementation before.
If the trigger is the presence of a statically compiled ACPI HEST table (as the
current code does); presence of FFS would be static from OSPM perspective.
BIOS could patch this table or hide it during boot.

If FFS were to be negotiated via _OSC as indirectly implied in this series, then
same BIOS could patch the ACPI table to return different values for the _OSC
return.

> 
> 
>>>> As I said in my previous email, the right place to talk about this is UEFI
>>>> forum.
>>>
>>> The way I would present the problem to he spec writers is that, although
>>> the spec appears to be consistent, we've seen firmware vendors that made
>>> the wrong assumptions about HEST/_OSC. Instead of describing AER
>>> ownership with _OSC, they attempted to do it with HEST. So we should add
>>> an implementation note, or clarification about this.
>>
>> I agree.
> 
> Cool. While the UEFI Secret Society debates, can we figure out if/how
> [patch 1/2] breaks those systems, or is it only [patch 2/2] of this
> series that we suspect?

I went back and looked at both patches. Both of them are removing references to
HEST table. I think both patches are impacted by this discussion.

> 
> Alex
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-20  1:54                             ` Sinan Kaya
@ 2018-11-20 20:44                               ` Alex_Gagniuc
  2018-11-20 21:02                                 ` Sinan Kaya
  0 siblings, 1 reply; 33+ messages in thread
From: Alex_Gagniuc @ 2018-11-20 20:44 UTC (permalink / raw)
  To: okaya, mr.nuke.me, keith.busch
  Cc: baicar.tyler, Austin.Bolen, Shyam.Iyer, lukas, bhelgaas, rjw,
	lenb, ruscur, sbobroff, oohall, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

On 11/19/2018 07:54 PM, Sinan Kaya wrote:
> On 11/19/2018 6:49 PM, Alex_Gagniuc@Dellteam.com wrote:
>> On 11/19/2018 02:33 PM, Sinan Kaya wrote:
>>> However; table assumes governance about for which entities firmware first
>>> should be enabled. There is no cross reference to _OSC or permission
>>> negotiation like _OST.
>>
>> Well, from an OSPM perspective, is FFS something that can be enabled or
>> disabled? FFS seems to be static to OSPM, which would change the sort of
>> assumptions we can reasonably make here.
> 
> IMO, it can be enabled/disabled in BIOS. I have seen this implementation before.
> If the trigger is the presence of a statically compiled ACPI HEST table (as the
> current code does); presence of FFS would be static from OSPM perspective.
> BIOS could patch this table or hide it during boot.
> 
> If FFS were to be negotiated via _OSC as indirectly implied in this series, then
> same BIOS could patch the ACPI table to return different values for the _OSC
> return.

It is theoretically possible to have proprietary BIOS settings to 
disable FFS. The platform vendors that I've spoken to do not offer this 
option. Though even if, hypothetically, BIOS clears the FFS bit in HEST, 
it won't stop it from commandeering the CPU and doing whatever it wants.

Although, I'm not quite sure why we'd want to negotiate FFS itself. FFS 
is too big of a can of worms (goes far beyond AER error reporting), when 
what we really care about is if OS can use a specific feature or not.
>> Cool. While the UEFI Secret Society debates, can we figure out if/how
>> [patch 1/2] breaks those systems, or is it only [patch 2/2] of this
>> series that we suspect?
> 
> I went back and looked at both patches. Both of them are removing references to
> HEST table. I think both patches are impacted by this discussion.

I'd prefer "sure" instead of "think". "I think it breaks some system I'm 
not telling you about" doesn't help much in figuring out how not to 
break said system(s). :)

Alex

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-20 20:44                               ` Alex_Gagniuc
@ 2018-11-20 21:02                                 ` Sinan Kaya
  2018-11-20 21:42                                   ` Keith Busch
  2018-11-20 21:46                                   ` Alex_Gagniuc
  0 siblings, 2 replies; 33+ messages in thread
From: Sinan Kaya @ 2018-11-20 21:02 UTC (permalink / raw)
  To: Alex_Gagniuc, mr.nuke.me, keith.busch
  Cc: baicar.tyler, Austin.Bolen, Shyam.Iyer, lukas, bhelgaas, rjw,
	lenb, ruscur, sbobroff, oohall, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

On 11/20/2018 3:44 PM, Alex_Gagniuc@Dellteam.com wrote:
> I'd prefer "sure" instead of "think". "I think it breaks some system I'm
> not telling you about" doesn't help much in figuring out how not to
> break said system(s).:)

Sorry, I thought I mentioned why it would break but let me repeat.

The systems I have seen rely on the HEST table presence as an indicator
to the OS that firmware first is enabled. If you go look at the _OSC bits
on such systems, it still says OS owns the AER service.

The assumption here is that HEST table has precedence over the _OSC bits.
That's what needs to be clarified in the UEFI forum.

If this code is to go in and ignore the HEST table presence, then firmware
will think that it owns AER service and OS will think that it owns AER
service too.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-20 21:02                                 ` Sinan Kaya
@ 2018-11-20 21:42                                   ` Keith Busch
  2018-11-20 22:28                                     ` Sinan Kaya
  2018-11-20 21:46                                   ` Alex_Gagniuc
  1 sibling, 1 reply; 33+ messages in thread
From: Keith Busch @ 2018-11-20 21:42 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Alex_Gagniuc, mr.nuke.me, baicar.tyler, Austin.Bolen, Shyam.Iyer,
	lukas, bhelgaas, rjw, lenb, ruscur, sbobroff, oohall, linux-pci,
	linux-acpi, linux-kernel, linuxppc-dev

On Tue, Nov 20, 2018 at 04:02:21PM -0500, Sinan Kaya wrote:
> On 11/20/2018 3:44 PM, Alex_Gagniuc@Dellteam.com wrote:
> > I'd prefer "sure" instead of "think". "I think it breaks some system I'm
> > not telling you about" doesn't help much in figuring out how not to
> > break said system(s).:)
> 
> Sorry, I thought I mentioned why it would break but let me repeat.
> 
> The systems I have seen rely on the HEST table presence as an indicator
> to the OS that firmware first is enabled. If you go look at the _OSC bits
> on such systems, it still says OS owns the AER service.
> 
> The assumption here is that HEST table has precedence over the _OSC bits.
> That's what needs to be clarified in the UEFI forum.
> 
> If this code is to go in and ignore the HEST table presence, then firmware
> will think that it owns AER service and OS will think that it owns AER
> service too.

How does that work? If the OS takes control, it sets up MSIs that FW don't
react to, and disables system errors through PCIe Root Control. Aren't
those sys errs the mechanism FW knows it has something to do, which
means the OS can effectively fence it off?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-20 21:02                                 ` Sinan Kaya
  2018-11-20 21:42                                   ` Keith Busch
@ 2018-11-20 21:46                                   ` Alex_Gagniuc
  2018-11-20 22:08                                     ` Sinan Kaya
  1 sibling, 1 reply; 33+ messages in thread
From: Alex_Gagniuc @ 2018-11-20 21:46 UTC (permalink / raw)
  To: okaya, mr.nuke.me, keith.busch
  Cc: baicar.tyler, Austin.Bolen, Shyam.Iyer, lukas, bhelgaas, rjw,
	lenb, ruscur, sbobroff, oohall, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

On 11/20/2018 03:02 PM, Sinan Kaya wrote:
> On 11/20/2018 3:44 PM, Alex_Gagniuc@Dellteam.com wrote:
>> I'd prefer "sure" instead of "think". "I think it breaks some system I'm
>> not telling you about" doesn't help much in figuring out how not to
>> break said system(s).:)
> 
> Sorry, I thought I mentioned why it would break but let me repeat.

Why, yes, but bets are still being placed on the systems allegedly 
suffering from this.


> The systems I have seen rely on the HEST table presence as an indicator
> to the OS that firmware first is enabled. If you go look at the _OSC bits
> on such systems, it still says OS owns the AER service.
> 
> The assumption here is that HEST table has precedence over the _OSC bits.
> That's what needs to be clarified in the UEFI forum.
> 
> If this code is to go in and ignore the HEST table presence, then firmware
> will think that it owns AER service and OS will think that it owns AER
> service too.

So this seems like exactly the scenario we were hypothesizing.

  * System boots up with FFS enabled. Everything is fine so far.
  * OSPM requests control of AER (set bit 3 in _OSC)
  * FW grants OS control of AER (set bit 3 in _OSC reply)

That's how things are designed to work.


Now, let's assume, for the sake of argument, that the firmware on those 
system's is broken, and it didn't intend to give the OS control of AER. 
OSPM checking HEST instead of _OSC is still wrong, according to the 
spec. Two wrongs don't make a right, they just don't crash.

I think the correct way is to identify those broken systems, and add 
quirks for them. Continuing to have inconsistent and over-complicated 
logic that is not spec compliant is not any better.

Alex

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-20 21:46                                   ` Alex_Gagniuc
@ 2018-11-20 22:08                                     ` Sinan Kaya
  2018-11-20 22:36                                       ` Alex_Gagniuc
  2018-11-27 18:22                                       ` Alex_Gagniuc
  0 siblings, 2 replies; 33+ messages in thread
From: Sinan Kaya @ 2018-11-20 22:08 UTC (permalink / raw)
  To: Alex_Gagniuc, mr.nuke.me, keith.busch
  Cc: baicar.tyler, Austin.Bolen, Shyam.Iyer, lukas, bhelgaas, rjw,
	lenb, ruscur, sbobroff, oohall, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

On 11/20/2018 4:46 PM, Alex_Gagniuc@Dellteam.com wrote:
> Now, let's assume, for the sake of argument, that the firmware on those
> system's is broken, and it didn't intend to give the OS control of AER.
> OSPM checking HEST instead of _OSC is still wrong, according to the
> spec. Two wrongs don't make a right, they just don't crash.
> 
> I think the correct way is to identify those broken systems, and add
> quirks for them. Continuing to have inconsistent and over-complicated
> logic that is not spec compliant is not any better.

Remember that both _OSC and HEST are in the ACPI specification. I don't
think there is a consensus on what is "wrong".

There is certainly a need for spec clarification.

One version is:

"if HEST table is present, ignore _OSC"

or

Another version is:

"if HEST table is present, make sure that FW sets _OSC bit for AER to
false. Otherwise, warn like crazy that this BIOS is broken and needs
an update and can cause all sorts of trouble"

I can see both points of view. The second one can also be worked around
by an SMBIOS quirk too as you suggested. Counting the number of quirks
and random bug reports will be an interesting exercise / regression.

I followed the ASWG thread yesterday. There will be a meeting next week to
discuss this.

My preference is not to introduce new behavior/regression to the kernel.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-20 21:42                                   ` Keith Busch
@ 2018-11-20 22:28                                     ` Sinan Kaya
  2018-11-20 22:35                                       ` Alex G.
  0 siblings, 1 reply; 33+ messages in thread
From: Sinan Kaya @ 2018-11-20 22:28 UTC (permalink / raw)
  To: Keith Busch
  Cc: Alex_Gagniuc, mr.nuke.me, baicar.tyler, Austin.Bolen, Shyam.Iyer,
	lukas, bhelgaas, rjw, lenb, ruscur, sbobroff, oohall, linux-pci,
	linux-acpi, linux-kernel, linuxppc-dev

On 11/20/2018 4:42 PM, Keith Busch wrote:
> How does that work? If the OS takes control, it sets up MSIs that FW don't
> react to, and disables system errors through PCIe Root Control. Aren't
> those sys errs the mechanism FW knows it has something to do, which
> means the OS can effectively fence it off?

I think this is all implementation detail and doesn't necessarily apply
to all firmware-first implementation flavors.

Assumptions are:
1. both FW and OS are listening to MSI interrupts
2. FW monitors the system errors

Some FF implementation could route the AER interrupt to a higher privilege
level. Some other implementation could use INTx or a side-band channel interrupt
for firmware-interrupt too.

I have seen all 3 except MSI :) and also firmware never monitored the system
error bits. I was curious if anybody ever used those legacy bits. Now, I know
someone is using it.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-20 22:28                                     ` Sinan Kaya
@ 2018-11-20 22:35                                       ` Alex G.
  0 siblings, 0 replies; 33+ messages in thread
From: Alex G. @ 2018-11-20 22:35 UTC (permalink / raw)
  To: Sinan Kaya, Keith Busch
  Cc: Alex_Gagniuc, baicar.tyler, Austin.Bolen, Shyam.Iyer, lukas,
	bhelgaas, rjw, lenb, ruscur, sbobroff, oohall, linux-pci,
	linux-acpi, linux-kernel, linuxppc-dev



On 11/20/2018 04:28 PM, Sinan Kaya wrote:
> On 11/20/2018 4:42 PM, Keith Busch wrote:
>> How does that work? If the OS takes control, it sets up MSIs that FW 
>> don't
>> react to, and disables system errors through PCIe Root Control. Aren't
>> those sys errs the mechanism FW knows it has something to do, which
>> means the OS can effectively fence it off?
> 
> I think this is all implementation detail and doesn't necessarily apply
> to all firmware-first implementation flavors.
> 
> Assumptions are:
> 1. both FW and OS are listening to MSI interrupts

On hax86, I'm not sure FW can listen to MSI interrups. FW only exists in 
SMM, not ring 1-4.

> 2. FW monitors the system errors
> 
> Some FF implementation could route the AER interrupt to a higher privilege
> level. Some other implementation could use INTx or a side-band channel 
> interrupt
> for firmware-interrupt too.
> 
> I have seen all 3 except MSI :) and also firmware never monitored the 
> system
> error bits. I was curious if anybody ever used those legacy bits. Now, I 
> know
> someone is using it.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-20 22:08                                     ` Sinan Kaya
@ 2018-11-20 22:36                                       ` Alex_Gagniuc
  2018-11-27 18:22                                       ` Alex_Gagniuc
  1 sibling, 0 replies; 33+ messages in thread
From: Alex_Gagniuc @ 2018-11-20 22:36 UTC (permalink / raw)
  To: okaya, mr.nuke.me, keith.busch
  Cc: baicar.tyler, Austin.Bolen, Shyam.Iyer, lukas, bhelgaas, rjw,
	lenb, ruscur, sbobroff, oohall, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

On 11/20/2018 04:08 PM, Sinan Kaya wrote:
> One version is:
> "if HEST table is present, ignore _OSC"
> or
> Another version is:
> "if HEST table is present, make sure that FW sets _OSC bit for AER to
> false. Otherwise, warn like crazy that this BIOS is broken and needs
> an update and can cause all sorts of trouble"

ACPI 6.2 Ch 6.2.11.3, Table 6-197
	PCI Express Advanced Error Reporting control

	The firmware sets this bit to 1 to grant control over PCI
	Express Advanced Error Reporting.
	If firmware allows the OS control of this feature, then in the
	context of the _OSC method it must ensure that error messages
	are routed to device interrupts as described in the PCI
	Express Base Specification. Additionally, after control is
	transferred to the OS, firmware must not modify the Advanced
	Error Reporting Capability. If control of this feature was
	requested and denied or was not requested, firmware returns
	this bit set to 0.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-20 22:08                                     ` Sinan Kaya
  2018-11-20 22:36                                       ` Alex_Gagniuc
@ 2018-11-27 18:22                                       ` Alex_Gagniuc
  2018-11-27 18:32                                         ` Sinan Kaya
  1 sibling, 1 reply; 33+ messages in thread
From: Alex_Gagniuc @ 2018-11-27 18:22 UTC (permalink / raw)
  To: okaya, mr.nuke.me, keith.busch
  Cc: baicar.tyler, Austin.Bolen, Shyam.Iyer, lukas, bhelgaas, rjw,
	lenb, ruscur, sbobroff, oohall, linux-pci, linux-acpi,
	linux-kernel, linuxppc-dev

On 11/20/2018 04:08 PM, Sinan Kaya wrote:
> I followed the ASWG thread yesterday. There will be a meeting next week to
> discuss this.

Any updates on the meeting?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-27 18:22                                       ` Alex_Gagniuc
@ 2018-11-27 18:32                                         ` Sinan Kaya
  2018-11-27 18:46                                           ` Tyler Baicar
  0 siblings, 1 reply; 33+ messages in thread
From: Sinan Kaya @ 2018-11-27 18:32 UTC (permalink / raw)
  To: Alex_Gagniuc, mr.nuke.me, keith.busch, baicar.tyler
  Cc: Austin.Bolen, Shyam.Iyer, lukas, bhelgaas, rjw, lenb, ruscur,
	sbobroff, oohall, linux-pci, linux-acpi, linux-kernel,
	linuxppc-dev

On 11/27/2018 1:22 PM, Alex_Gagniuc@Dellteam.com wrote:
> On 11/20/2018 04:08 PM, Sinan Kaya wrote:
>> I followed the ASWG thread yesterday. There will be a meeting next week to
>> discuss this.
> 
> Any updates on the meeting?
> 
> 

Tyler should be able to get an update.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-27 18:32                                         ` Sinan Kaya
@ 2018-11-27 18:46                                           ` Tyler Baicar
  0 siblings, 0 replies; 33+ messages in thread
From: Tyler Baicar @ 2018-11-27 18:46 UTC (permalink / raw)
  To: okaya
  Cc: Alex_Gagniuc, Alexandru Gagniuc, keith.busch, Austin.Bolen,
	Shyam.Iyer, lukas, Bjorn Helgaas, rjw, Len Brown, ruscur,
	sbobroff, Oliver, linux-pci, linux-acpi,
	Linux Kernel Mailing List, linuxppc-dev

On Tue, Nov 27, 2018 at 1:32 PM Sinan Kaya <okaya@kernel.org> wrote:
>
> On 11/27/2018 1:22 PM, Alex_Gagniuc@Dellteam.com wrote:
> > On 11/20/2018 04:08 PM, Sinan Kaya wrote:
> >> I followed the ASWG thread yesterday. There will be a meeting next week to
> >> discuss this.
> >
> > Any updates on the meeting?
> >
> >
>
> Tyler should be able to get an update.

As per Harb, the meetings are on Thursdays so it will be discussed this
Thursday since last week was a holiday.

Thanks,
Tyler

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER
  2018-11-15 23:16 [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER Alexandru Gagniuc
                   ` (3 preceding siblings ...)
  2018-11-16 12:37 ` David Laight
@ 2019-03-05 23:16 ` Bjorn Helgaas
  4 siblings, 0 replies; 33+ messages in thread
From: Bjorn Helgaas @ 2019-03-05 23:16 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: helgaas, austin_bolen, alex_gagniuc, keith.busch, Shyam_Iyer,
	lukas, Rafael J. Wysocki, Len Brown, Russell Currey, Sam Bobroff,
	Oliver O'Halloran, linux-pci, linux-acpi, linux-kernel,
	linuxppc-dev

On Thu, Nov 15, 2018 at 05:16:01PM -0600, Alexandru Gagniuc wrote:
> Thanks to Keith for pointing out that it doesn't make sense to disable
> AER services when only one device has a FIRMWARE_FIRST HEST.
> 
> AER ownership is an interesting issue brought in by FFS (firmware-first)
> model. In a nutshell if FFS handles AER, then OS should not touch any
> of the AER bits. FW might set things up so that it receives AER
> notifications via SMI. It's theoretically possible to receive SCIs,
> but the exact mechanism is platform-dependent. OS touching AER bits
> when firmware owns them may interfere with these notifications.
> 
> The ACPI mechanism for negotiating control of AER is _OSC, and is
> described in detail in ACPI 6.2 Ch. 6.2.11.3. _OSC is negotiated at
> the root bus level. Any root port, switch, or endpoint under the bus
> would have its AER ownership negotiated in one _OSC call.
> 
> Then there is HEST, which is part of ACPI Platform Error Interfaces
> (APEI). HEST tables describe the errors that FW may report to the OS.
> A types 6,7 and 7 HEST tables describe AER errors from PCIe devices.
> As part of this description, we're told if the error source is FFS.
> 
> Information in HEST seems to be redundant, as each error reported by
> FW will have a CPER table that describes it in detail.
> 
> Because HEST describes an error source as firmware-first or not, we've
> taken this to mean ownership of AER. Because AER ownership and error
> reporting are coupled, _OSC and HEST usually agree on the matter of
> ownership. However, that doesn't seem to be required by ACPI.
> 
> I've asked around a few people at Dell and they unanimously agree that
> _OSC is the correct way to determine ownership of AER. In linux, we
> use the result of _OSC to enable AER services, but we use HEST to
> determine AER ownership. That's inconsistent. This series drops the
> use of HEST in favor of _OSC.
> 
> [1] https://lkml.org/lkml/2018/11/15/62
> 
> Alexandru Gagniuc (2):
>   PCI/AER: Do not use APEI/HEST to disable AER services globally
>   PCI/AER: Determine AER ownership based on _OSC instead of HEST
> 
>  drivers/acpi/pci_root.c  |  9 +----
>  drivers/pci/pcie/aer.c   | 82 ++--------------------------------------
>  include/linux/pci-acpi.h |  6 ---
>  3 files changed, 5 insertions(+), 92 deletions(-)

I'm pretty sure we do need to do something here, but there was quite a
lot of discussion that didn't seem to really get resolved, so I'm
dropping these for now.

Please repost them with any relevant updates and we'll see if we can
get a consensus that we're going the right direction.

Bjorn

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2019-03-05 23:16 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-15 23:16 [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER Alexandru Gagniuc
2018-11-15 23:16 ` [PATCH 1/2] PCI/AER: Do not use APEI/HEST to disable AER services globally Alexandru Gagniuc
2018-11-15 23:16 ` [PATCH 2/2] PCI/AER: Determine AER ownership based on _OSC instead of HEST Alexandru Gagniuc
2018-11-15 23:43   ` Keith Busch
2018-11-16  1:49 ` [PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER Sinan Kaya
2018-11-19 16:53   ` Tyler Baicar
2018-11-19 16:53     ` Keith Busch
2018-11-19 17:32       ` Sinan Kaya
2018-11-19 17:36         ` Keith Busch
2018-11-19 17:42         ` Sinan Kaya
2018-11-19 17:41           ` Keith Busch
2018-11-19 17:56             ` Sinan Kaya
2018-11-19 18:10               ` Keith Busch
2018-11-19 18:24                 ` Sinan Kaya
2018-11-19 19:11                   ` Alex G.
2018-11-19 19:32                     ` Sinan Kaya
2018-11-19 20:16                       ` Alex_Gagniuc
2018-11-19 20:33                         ` Sinan Kaya
2018-11-19 23:49                           ` Alex_Gagniuc
2018-11-20  1:54                             ` Sinan Kaya
2018-11-20 20:44                               ` Alex_Gagniuc
2018-11-20 21:02                                 ` Sinan Kaya
2018-11-20 21:42                                   ` Keith Busch
2018-11-20 22:28                                     ` Sinan Kaya
2018-11-20 22:35                                       ` Alex G.
2018-11-20 21:46                                   ` Alex_Gagniuc
2018-11-20 22:08                                     ` Sinan Kaya
2018-11-20 22:36                                       ` Alex_Gagniuc
2018-11-27 18:22                                       ` Alex_Gagniuc
2018-11-27 18:32                                         ` Sinan Kaya
2018-11-27 18:46                                           ` Tyler Baicar
2018-11-16 12:37 ` David Laight
2019-03-05 23:16 ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).