* [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
@ 2018-06-28 10:06 ` James Morse
  0 siblings, 0 replies; 36+ messages in thread
From: James Morse @ 2018-06-28 10:06 UTC (permalink / raw)
  To: linux-acpi
  Cc: Lorenzo Pieralisi, Ard Biesheuvel, Geoff Levand, Riku Voipio,
	Mark Salter, James Morse, Hanjun Guo, Sudeep Holla,
	linux-arm-kernel

There are reports[0] that HPE's 'ProLiant m400 Server' (aka moonshot) has
broken RAS support, and adding hest_disable to the kernel cmdline is the
only way to make the board boot if APEI support is built into the kernel.

After Mark Salter's investigation[1] we know that UEFI's ExitBootServices
is doing something that causes a fatal error to be written to GHES.2.
Once the kernel finds this, it falsely assumes it was due to something that
happened during boot, and panic()s.

This series adds a DMI quirks table to hest.c, and adds a helper that lets
us query the UEFI system table version, to set hest_disable on this
platform.

Testing the HEST table vendor and revision is a problem as this would
match all 'HPE ProLiant', some of which may be a totally different CPU
architecture.


I don't have access to an m400; these DMI and UEFI values were taken from
the crashlog report at [0], then tested with the equivalent fields on
Seattle.


Thanks,

James

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1574718
[1] https://www.spinics.net/lists/arm-kernel/msg660956.html

James Morse (2):
  efi: Add helper to retrieve runtime version number
  ACPI / APEI: Add DMI matching quirks for platforms that require
    hest_disable

 drivers/acpi/apei/hest.c | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/efi.h      |  5 +++++
 2 files changed, 43 insertions(+)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC/RFT PATCH 1/2] efi: Add helper to retrieve runtime version number
  2018-06-28 10:06 ` James Morse
@ 2018-06-28 10:06   ` James Morse
  -1 siblings, 0 replies; 36+ messages in thread
From: James Morse @ 2018-06-28 10:06 UTC (permalink / raw)
  To: linux-acpi
  Cc: Lorenzo Pieralisi, Ard Biesheuvel, Geoff Levand, Riku Voipio,
	Mark Salter, James Morse, Hanjun Guo, Sudeep Holla,
	linux-arm-kernel

Some RAS errors are related to an invalid hardware access that was
made by software. Sometimes this happens before the kernel runs.

For example, HPE's ProLiant m400 Server (aka moonshot) trips the
platform's RAS mechanism during ExitBootServices. Once the kernel
probes the RAS error descriptor regions, it finds a stale 'fatal
error', and hits the deck.

Quirking this platform based on the DMI data doesn't capture that
the bug lies in the platform's UEFI firmware. Expose the runtime
version, originally retrieved from the EFI system table's header
revision field, so that any future or unaffected platform firmware
isn't swept up too.

Signed-off-by: James Morse <james.morse@arm.com>
---
 include/linux/efi.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/efi.h b/include/linux/efi.h
index 56add823f190..42ddf8399814 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -966,6 +966,11 @@ extern struct efi {
 	unsigned long flags;
 } efi;
 
+static inline unsigned int efi_get_runtime_version(void)
+{
+	return efi.runtime_version;
+}
+
 extern struct mm_struct efi_mm;
 
 static inline int
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC/RFT PATCH 2/2] ACPI / APEI: Add DMI matching quirks for platforms that require hest_disable
  2018-06-28 10:06 ` James Morse
@ 2018-06-28 10:06   ` James Morse
  -1 siblings, 0 replies; 36+ messages in thread
From: James Morse @ 2018-06-28 10:06 UTC (permalink / raw)
  To: linux-acpi
  Cc: Lorenzo Pieralisi, Ard Biesheuvel, Geoff Levand, Riku Voipio,
	Mark Salter, James Morse, Hanjun Guo, Sudeep Holla,
	linux-arm-kernel

Some RAS errors are related to an invalid hardware access that was
made by software. Sometimes this happens before the kernel runs.

For example, HPE's ProLiant m400 Server (aka moonshot) trips the
platform's RAS mechanism during UEFI's ExitBootServices. Once the
kernel probes the RAS error descriptor regions, it finds a stale
'fatal error', and hits the deck.

Add a table of DMI matches to allow platforms like this to be
quirked. For moonshot we also want to know the UEFI firmware
version, as this appears to be where the faulting access happens.

This quirk causes the following to be printed during boot:
| [    2.491990] HEST: disabled due to firmware quirk
| [    2.496659] HEST: Table parsing disabled.
[...]
| [    6.341314] GHES: HEST is not enabled!

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1574718
Link: https://www.spinics.net/lists/arm-kernel/msg660956.html
Signed-off-by: James Morse <james.morse@arm.com>
CC: Mark Salter <msalter@redhat.com>
CC: Geoff Levand <geoff@infradead.org>
---
 drivers/acpi/apei/hest.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
index b1e9f81ebeea..d0e49f0cd353 100644
--- a/drivers/acpi/apei/hest.c
+++ b/drivers/acpi/apei/hest.c
@@ -23,6 +23,8 @@
  * GNU General Public License for more details.
  */
 
+#include <linux/efi.h>
+#include <linux/dmi.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/init.h>
@@ -212,6 +214,40 @@ err:
 	goto out;
 }
 
+static int __init quirk_hpe_moonshot_m400(const struct dmi_system_id *d)
+{
+	/* Only 'EFI v2.60 by HPE' is known to be affected */
+	unsigned int affected_version = (2<<16) | 60;
+
+	if (!IS_ENABLED(CONFIG_EFI))
+		return 0;
+
+	if (efi_get_runtime_version() == affected_version) {
+		pr_info(HEST_PFX "disabled due to firmware quirk\n");
+		hest_disable = HEST_DISABLED;
+	}
+
+	return 0;
+}
+
+static const struct dmi_system_id hest_quirk_dmi_table[]  __initconst = {
+	{
+		.callback = quirk_hpe_moonshot_m400,
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR,	"HPE"),
+			DMI_MATCH(DMI_PRODUCT_NAME,	"ProLiant m400 Server"),
+			DMI_MATCH(DMI_BOARD_NAME,	"ProLiant m400 Server"),
+			DMI_MATCH(DMI_BIOS_VERSION,	"U02"),
+		},
+	},
+	{},
+};
+
+static void __init acpi_hest_quirks(void)
+{
+	dmi_check_system(hest_quirk_dmi_table);
+}
+
 static int __init setup_hest_disable(char *str)
 {
 	hest_disable = HEST_DISABLED;
@@ -226,6 +262,8 @@ void __init acpi_hest_init(void)
 	int rc = -ENODEV;
 	unsigned int ghes_count = 0;
 
+	acpi_hest_quirks();
+
 	if (hest_disable) {
 		pr_info(HEST_PFX "Table parsing disabled.\n");
 		return;
-- 
2.17.1
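
The table-driven matching this patch relies on can be modelled outside the
kernel. The sketch below is a simplified userspace stand-in, not the
kernel's dmi_check_system(): every type and function name in it is
hypothetical, the table is terminated by an entry with no matches, and each
match is a substring test, which is an assumption mirroring how DMI_MATCH
is understood to behave.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the DMI identification slots used by the quirk. */
enum field { F_SYS_VENDOR, F_PRODUCT_NAME, F_BOARD_NAME, F_BIOS_VERSION, F_MAX };

struct match {
	enum field slot;
	const char *substr;
};

struct quirk {
	int (*callback)(const struct quirk *q);
	struct match matches[4];
	size_t nr_matches;	/* 0 marks the end of the table */
};

/* An entry matches only if every listed field contains its substring. */
static int quirk_matches(const struct quirk *q, const char *const ident[F_MAX])
{
	for (size_t i = 0; i < q->nr_matches; i++) {
		const char *have = ident[q->matches[i].slot];

		if (!have || !strstr(have, q->matches[i].substr))
			return 0;
	}
	return 1;
}

/* Walk the table, run callbacks for matching entries, return the match count. */
static int check_system(const struct quirk *table, const char *const ident[F_MAX])
{
	int count = 0;

	assert(table && ident);
	for (const struct quirk *q = table; q->nr_matches; q++) {
		if (!quirk_matches(q, ident))
			continue;
		count++;
		/* A non-zero return from the callback stops the walk. */
		if (q->callback && q->callback(q))
			break;
	}
	return count;
}

static int hest_disabled;	/* models the hest_disable flag */

static int disable_hest_cb(const struct quirk *q)
{
	(void)q;
	hest_disabled = 1;
	return 0;
}

static const struct quirk quirk_table[] = {
	{
		.callback = disable_hest_cb,
		.matches = {
			{ F_SYS_VENDOR,   "HPE" },
			{ F_PRODUCT_NAME, "ProLiant m400 Server" },
			{ F_BIOS_VERSION, "U02" },
		},
		.nr_matches = 3,
	},
	{ .nr_matches = 0 },
};
```

Because the flag is set from a callback, the table walk has to run before
the flag is tested, which is why the patch calls acpi_hest_quirks() at the
top of acpi_hest_init().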

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-06-28 10:06 ` James Morse
@ 2018-06-28 10:25   ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2018-06-28 10:25 UTC (permalink / raw)
  To: James Morse
  Cc: Lorenzo Pieralisi, Geoff Levand, Riku Voipio, Mark Salter,
	ACPI Devel Maling List, Hanjun Guo, Sudeep Holla,
	linux-arm-kernel

Hi James,

On 28 June 2018 at 12:06, James Morse <james.morse@arm.com> wrote:
> There are reports[0] that HPE's 'ProLiant m400 Server' (aka moonshot) has
> broken RAS support, and adding disable_hest to the kernel cmdline is the
> only way to make the board boot if APEI support is built into the kernel.
>
> After Mark Salter's investigation[1] we know that UEFI's ExitBootServices
> is doing something that causes a fatal error to be written to GHES.2.
> Once the kernel finds this, it falsely assume it was due to something that
> happened during boot, and panic()s.
>
> This series adds a DMI quirks table to hest.c, and adds a helper that lets
> us query the UEFI system table version, to set hest_disabled on this
> platform.
>
> Testing the HEST table vendor and revision is a problem as this would
> match all 'HPE ProLiant', some of which may be a totally different CPU
> architecture.
>
>
> I don't have access to an m400, these DMI and UEFI values were taken from
> the crashlog report at [0], then tested with the equivalent fields on
> Seattle.
>

I understand the desire to keep running these M400s as long as they
have some life left in them, but the reality is that they are end of
life already, and not many were manufactured to begin with.

Given how the upstream kernel is aimed at future development, I don't
think we should fix this in the upstream kernel at all. Distros are
free to do what they like, of course, and I'm sure RedHat already have
a fix for this in their downstream kernel. But putting this upstream
means we will never be able to remove it again, which would be
especially unfortunate given that it is the first ever DMI quirk for
arm64, which we tried *very* hard to avoid, also because we don't
initialize the DMI framework as early as x86 does, and so once we open
the floodgates, we will run into issues where we will need to reorder
the init sequence to make DMI data available early enough.

As for the efi.h patch: I don't object to adding code that makes the
spec revision available, but note that this is *not* a firmware build
number, and so it should not be used as such. Also, given that m400 is
EOL and unmaintained, no firmware updates are expected, and so
assuming that there will be a UEFI 2.7 based update in the future
seems rather optimistic.

Ultimately, it is not up to me to decide whether

a) DMI quirks will be permitted on arm64
b) we care about m400 enough to put this quirk in the upstream kernel

but I'd prefer it if we steered clear of this.
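
To illustrate the distinction drawn above between the spec revision and a
firmware build number: the UEFI system table carries a vendor string and a
vendor-defined firmware revision alongside the header revision. The sketch
below is a simplified model with hypothetical names, not the kernel's or the
specification's structure layout. It only shows that two firmware images can
agree on the spec revision while still being different builds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical model of the identifying fields discussed here. */
struct fw_ident {
	uint32_t spec_revision;	/* header revision: which UEFI spec level */
	const char *fw_vendor;	/* vendor string */
	uint32_t fw_revision;	/* vendor-defined firmware revision */
};

/* True if both images claim the same UEFI specification level. */
static int same_spec(const struct fw_ident *a, const struct fw_ident *b)
{
	assert(a && b);
	return a->spec_revision == b->spec_revision;
}

/* True only if vendor and vendor-defined revision also agree. */
static int same_build(const struct fw_ident *a, const struct fw_ident *b)
{
	assert(a && b && a->fw_vendor && b->fw_vendor);
	return same_spec(a, b) &&
	       a->fw_revision == b->fw_revision &&
	       strcmp(a->fw_vendor, b->fw_vendor) == 0;
}
```

A quirk keyed on same_spec() alone would treat a fixed and a broken build as
identical whenever both implement the same specification level.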

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-06-28 10:25   ` Ard Biesheuvel
@ 2018-06-28 12:51     ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 36+ messages in thread
From: Lorenzo Pieralisi @ 2018-06-28 12:51 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Geoff Levand, Riku Voipio, Mark Salter, ACPI Devel Maling List,
	James Morse, Hanjun Guo, Sudeep Holla, linux-arm-kernel

On Thu, Jun 28, 2018 at 12:25:06PM +0200, Ard Biesheuvel wrote:
> Hi James,
> 
> On 28 June 2018 at 12:06, James Morse <james.morse@arm.com> wrote:
> > There are reports[0] that HPE's 'ProLiant m400 Server' (aka moonshot) has
> > broken RAS support, and adding disable_hest to the kernel cmdline is the
> > only way to make the board boot if APEI support is built into the kernel.
> >
> > After Mark Salter's investigation[1] we know that UEFI's ExitBootServices
> > is doing something that causes a fatal error to be written to GHES.2.
> > Once the kernel finds this, it falsely assume it was due to something that
> > happened during boot, and panic()s.
> >
> > This series adds a DMI quirks table to hest.c, and adds a helper that lets
> > us query the UEFI system table version, to set hest_disabled on this
> > platform.
> >
> > Testing the HEST table vendor and revision is a problem as this would
> > match all 'HPE ProLiant', some of which may be a totally different CPU
> > architecture.
> >
> >
> > I don't have access to an m400, these DMI and UEFI values were taken from
> > the crashlog report at [0], then tested with the equivalent fields on
> > Seattle.
> >
> 
> I understand the desire to keep running these M400s as long as they
> have some life left in them, but the reality is that they are end of
> life already, and not many were manufactured to begin with.
> 
> Given how the upstream kernel is aimed at future development, I don't
> think we should fix this in the upstream kernel at all. Distros are
> free to do what they like, of course, and I'm sure RedHat already have
> a fix for this in their downstream kernel. But putting this upstream
> means we will never be able to remove it again, which would be
> especially unfortunate given that it is the first ever DMI quirk for
> arm64, which we tried *very* hard to avoid, also because we don't
> initialize the DMI framework as early as x86 does, and so once we open
> the floodgates, we will run into issues where we will need to reorder
> the init sequence to make DMI data available early enough.
> 
> As for the efi.h patch: I don't object to adding code that makes the
> spec revision available, but note that this is *not* a firmware build
> number, and so it should not be used as such. Also, given that m400 is
> EOL and unmaintained, no firmware updates are expected, and so
> assuming that there will be a UEFI 2.7 based update in the future
> seems rather optimistic.
> 
> Ultimately, it is not up to me to decide whether
> 
> a) DMI quirks will be permitted on arm64
> b) we care about m400 enough to put this quirk in the upstream kernel
> 
> but I'd prefer it if we steered clear of this.

I apologise to James (and Mark), who went all the way to debug this FW
bug and worked around it with a series that is upstreamable. I was in
two minds about this, but in the end I agree with you: your reasoning is
straightforward and it is an acceptable reason not to merge this series.
If HPE do not care, I do not think we should either. For the time being
let's keep the floodgates watertight, with my apologies.

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-06-28 10:25   ` Ard Biesheuvel
@ 2018-06-28 14:24     ` James Morse
  -1 siblings, 0 replies; 36+ messages in thread
From: James Morse @ 2018-06-28 14:24 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Lorenzo Pieralisi, Geoff Levand, Riku Voipio, Mark Salter,
	ACPI Devel Maling List, Hanjun Guo, Sudeep Holla,
	linux-arm-kernel

Hi Ard,

On 28/06/18 11:25, Ard Biesheuvel wrote:
> I understand the desire to keep running these M400s as long as they
> have some life left in them, but the reality is that they are end of
> life already, and not many were manufactured to begin with.

Sure, all this would really avoid is users having to specify 'hest_disable' on
the cmdline. If these things are as rare as they sound, all the users are
probably experts quite capable of doing this, and must have been doing this
since v4.13.
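
For anyone wanting to confirm the workaround is in place on a given machine,
a user-side check of the command line string is enough. The sketch below is
a hypothetical helper, not how the kernel itself parses parameters. It
assumes the option is spelled hest_disable (matching the setup_hest_disable()
handler visible in the patch context), treats whitespace as the only
separator, and ignores quoting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int is_sep(char c)
{
	return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Hypothetical helper: return 1 if 'cmdline' contains a token that is
 * exactly 'name' or 'name=<value>', else 0.
 */
static int cmdline_has_param(const char *cmdline, const char *name)
{
	const char *p = cmdline;
	size_t n;

	assert(cmdline && name);
	n = strlen(name);
	if (n == 0)
		return 0;

	while (*p) {
		/* Skip separators to reach the start of the next token. */
		while (is_sep(*p))
			p++;
		if (!*p)
			break;

		/* The token must end, or continue with '=', right after 'name'. */
		if (strncmp(p, name, n) == 0 &&
		    (p[n] == '\0' || is_sep(p[n]) || p[n] == '='))
			return 1;

		/* Skip the rest of this token. */
		while (*p && !is_sep(*p))
			p++;
	}
	return 0;
}
```

The string to test would typically be read from /proc/cmdline on the
running system.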

[...]

> As for the efi.h patch: I don't object to adding code that makes the
> spec revision available, but note that this is *not* a firmware build
> number, and so it should not be used as such.

Ah, I thought it was, from:
| efi.runtime_version = efi.systab->hdr.revision;

I read this like an ACPI OEM revision, which it obviously is not.
(It was also motivated by being something I can pick out of dmesg in the bug reports.)

So this is the wrong thing to do. Matching BIOS build dates is clearly silly,
and doesn't scale to a range.


> Also, given that m400 is
> EOL and unmaintained, no firmware updates are expected, and so
> assuming that there will be a UEFI 2.7 based update in the future
> seems rather optimistic.

Not just a future release (although I am forever optimistic), but I was trying
not to match older versions without evidence that they are affected.

I mistakenly thought this was something approximating a firmware build version.


Thanks,

James

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-06-28 10:25   ` Ard Biesheuvel
@ 2018-06-28 16:15     ` Geoff Levand
  -1 siblings, 0 replies; 36+ messages in thread
From: Geoff Levand @ 2018-06-28 16:15 UTC (permalink / raw)
  To: Ard Biesheuvel, James Morse
  Cc: Lorenzo Pieralisi, Riku Voipio, Mark Salter,
	ACPI Devel Maling List, Hanjun Guo, Sudeep Holla,
	linux-arm-kernel

Hi Ard,

On 06/28/2018 03:25 AM, Ard Biesheuvel wrote:

> Given how the upstream kernel is aimed at future development, I don't
> think we should fix this in the upstream kernel at all. Distros are
> free to do what they like, of course, and I'm sure RedHat already have
> a fix for this in their downstream kernel. 

Debian expects the kernel to work correctly, and so won't add a fix.
That means CONFIG_ACPI_APEI can't be enabled, and so users must either
go without APEI or use a custom built kernel. See:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900581

-Geoff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-06-28 16:15     ` Geoff Levand
@ 2018-06-28 20:56       ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2018-06-28 20:56 UTC (permalink / raw)
  To: Geoff Levand
  Cc: Lorenzo Pieralisi, Riku Voipio, Mark Salter,
	ACPI Devel Maling List, James Morse, Hanjun Guo, Sudeep Holla,
	linux-arm-kernel

On 28 June 2018 at 18:15, Geoff Levand <geoff@infradead.org> wrote:
> Hi Ard,
>
> On 06/28/2018 03:25 AM, Ard Biesheuvel wrote:
>
>> Given how the upstream kernel is aimed at future development, I don't
>> think we should fix this in the upstream kernel at all. Distros are
>> free to do what they like, of course, and I'm sure RedHat already have
>> a fix for this in their downstream kernel.
>
> Debian expects the kernel to work correctly, and so won't add a fix.
> That means CONFIG_ACPI_APEI can't be enabled, and so users must either
> go without APEI or use a custom built kernel. See:
>
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900581
>

Hi Geoff,

I am aware of the context of this discussion, and I feel your pain,
given that you don't care about m400 in the first place.

So I guess we should take this up with the Debian folks directly. They
apparently do care about m400, and are reluctant to have to add
'hest_disable=1' to the kernel command line.

But that does not make it an upstream problem. The fact that this is
an EOL platform of which only a couple of hundred are in circulation,
combined with the fact that there is a trivial workaround available
(the command line option) makes it a non-issue in my opinion,
especially given the fact that not a single distro ships pristine
mainline kernels, and so they can carry the quirk themselves.

I

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-06-28 10:25   ` Ard Biesheuvel
@ 2018-07-03  8:44     ` Ian Campbell
  -1 siblings, 0 replies; 36+ messages in thread
From: Ian Campbell @ 2018-07-03  8:44 UTC (permalink / raw)
  To: Ard Biesheuvel, James Morse
  Cc: Lorenzo Pieralisi, Geoff Levand, Riku Voipio, Sudeep Holla,
	ACPI Devel Maling List, Hanjun Guo, Mark Salter,
	linux-arm-kernel

On Thu, 2018-06-28 at 12:25 +0200, Ard Biesheuvel wrote:
> I understand the desire to keep running these M400s as long as they
> have some life left in them, but the reality is that they are end of
> life already, and not many were manufactured to begin with.

Linux has a long history of supporting such devices so long as there is
someone around willing to keep them running (witness for example how
long x86/voyager lived with just 1 in existence in a motivated
developer's basement, probably some number of entire architectures and
I bet a not insubstantial chunk of the platform support in arch/arm).

> Given how the upstream kernel is aimed at future development,

That might be true in some sense but I don't think it can be said to
extend to "not worried about running on existing hardware".

>  I don't
> think we should fix this in the upstream kernel at all. Distros are
> free to do what they like, of course, and I'm sure RedHat already have
> a fix for this in their downstream kernel. But putting this upstream
> means we will never be able to remove it again,

Quirks are pretty self contained and should be very unobtrusive to the
common code paths though. Also I expect that quirks _can_ be removed
once the platform has actually died in reality (not just no longer
produced) or becomes too much of a burden for other reasons (which AIUI
is what eventually happened to Voyager).

>  which would be
> especially unfortunate given that it is the first ever DMI quirk for
> arm64, which we tried *very* hard to avoid, also because we don't
> initialize the DMI framework as early as x86 does, and so once we open
> the floodgates,

The "flood" is inversely proportional to the quality of the firmware
certification and it isn't too overwhelming on x86, which historically
had next to no certification apart from "runs Windows", so it seems
unlikely to me that on arm64, where some attempts have been made at
validation and test suites from very near the start, that the flood
will be all that overwhelming.

>  we will run into issues where we will need to reorder
> the init sequence to make DMI data available early enough.

> a) DMI quirks will be permitted on arm64
> b) we care about m400 enough to put this quirk in the upstream kernel

In general arm64 Linux is going to need to be able to cope with
firmware in the field which is either rubbish to some degree or which
predates the addition of some support in the kernel and turns out not
to be fully functional when that support is enabled (the latter it
seems being what happened in the m400 case).

So, I think DMI quirks are probably, in reality, inevitable unless you
think firmware authors are going to be infallible or the
testing/certification suites will never have any gaps in them.

Given that, the overhead of then supporting m400 seems pretty trivial.

That said, maybe there are more appropriate mechanisms than DMI on
arm64 for detecting and activating quirks?

Cheers,
Ian.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-06-28 20:56       ` Ard Biesheuvel
@ 2018-07-03  8:46         ` Ian Campbell
  -1 siblings, 0 replies; 36+ messages in thread
From: Ian Campbell @ 2018-07-03  8:46 UTC (permalink / raw)
  To: Ard Biesheuvel, Geoff Levand
  Cc: Lorenzo Pieralisi, Riku Voipio, Sudeep Holla,
	ACPI Devel Maling List, James Morse, Hanjun Guo, Mark Salter,
	linux-arm-kernel

On Thu, 2018-06-28 at 22:56 +0200, Ard Biesheuvel wrote:
> But that does not make it an upstream problem. The fact that this is
> an EOL platform of which only a couple of hundred are in circulation,
> combined with the fact that there is a trivial workaround available
> (the command line option) makes it a non-issue in my opinion,
> especially given the fact that not a single distro ships pristine
> mainline kernels, and so they can carry the quirk themselves.

I think that's rather unfortunate given that many distros (Debian
included, AIUI Fedora too, I'm sure others) try to ship kernels as
close to mainline as possible, and the general steer from the kernel
community is to do so whenever possible, to track stable releases
early and often etc.

Ian.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-07-03  8:44     ` Ian Campbell
@ 2018-07-03 15:17       ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2018-07-03 15:17 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Lorenzo Pieralisi, Geoff Levand, Riku Voipio, Sudeep Holla,
	ACPI Devel Maling List, James Morse, Hanjun Guo, Mark Salter,
	linux-arm-kernel

On 3 July 2018 at 10:44, Ian Campbell <ijc@debian.org> wrote:
> On Thu, 2018-06-28 at 12:25 +0200, Ard Biesheuvel wrote:
>> I understand the desire to keep running these M400s as long as they
>> have some life left in them, but the reality is that they are end of
>> life already, and not many were manufactured to begin with.
>
> Linux has a long history of supporting such devices so long as there is
> someone around willing to keep them running (witness for example how
> long x86/voyager lived with just 1 in existence in a motivated
> developer's basement, probably some number of entire architectures and
> I bet a not insubstantial chunk of the platform support in arch/arm).
>

I wonder how many such quirks fall into the 'user cannot be bothered
to add a kernel command line option' category.

>> Given how the upstream kernel is aimed at future development,
>
> That might be true in some sense but I don't think it can be said to
> extend to "not worried about running on existing hardware".
>
>>  I don't
>> think we should fix this in the upstream kernel at all. Distros are
>> free to do what they like, of course, and I'm sure RedHat already have
>> a fix for this in their downstream kernel. But putting this upstream
>> means we will never be able to remove it again,
>
> Quirks are pretty self contained and should be very unobtrusive to the
> common code paths though. Also I expect that quirks _can_ be removed
> once the platform has actually died in reality (not just no longer
> produced) or becomes too much of a burden for other reasons (which AIUI
> is what eventually happened to Voyager).
>
>>  which would be
>> especially unfortunate given that it is the first ever DMI quirk for
>> arm64, which we tried *very* hard to avoid, also because we don't
>> initialize the DMI framework as early as x86 does, and so once we open
>> the floodgates,
>
> The "flood" is inversely proportional to the quality of the firmware
> certification and it isn't too overwhelming on x86, which historically
> had next to no certification apart from "runs Windows", so it seems
> unlikely to me that on arm64, where some attempts have been made at
> validation and test suites from very near the start, that the flood
> will be all that overwhelming.
>
>>  we will run into issues where we will need to reorder
>> the init sequence to make DMI data available early enough.
>
>> a) DMI quirks will be permitted on arm64
>> b) we care about m400 enough to put this quirk in the upstream kernel
>
> In general arm64 Linux is going to need to be able to cope with
> firmware in the field which is either rubbish to some degree or which
> predates the addition of some support in the kernel and turns out not
> to be fully functional when that support is enabled (the latter it
> seems being what happened in the m400 case).
>
> So, I think DMI quirks are probably, in reality, inevitable unless you
> think firmware authors are going to be infallible or the
> testing/certification suites will never have any gaps in them.
>

Oh, obviously. But this is exactly my point about flood gates: we know
we need to implement support for them, but that fact alone does not
justify adding quirks for dead platforms for issues that can be
trivially worked around.

On a related note: what we *could* do to accommodate platforms such as
m400 that are affected by quirks that can be worked around by a
command line parameter: we could teach the stub to look at the
contents of the 'LinuxExtraArgs' EFI environment variable and append
it to the kernel command line. This is trivial to implement, given
that we already manipulate and parse the command line in the stub, and
would allow for a 'fix and forget' tweak to be applied to such
platforms, without having to accumulate quirks for broken platforms
that are difficult to remove later.
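
To make the idea concrete, here is a minimal user-space sketch of the
"append" step only. It is not the EFI stub code: append_extra_args() is
a hypothetical helper name, and how the stub would actually read the
'LinuxExtraArgs' variable from firmware is deliberately left out. It
only shows that joining the loader-supplied command line with the
variable's contents is a small string operation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of the proposal: join the command line passed by
 * the boot loader with the contents of an extra-arguments variable.
 *
 * 'cmdline' must not be NULL. 'extra' may be NULL or empty, which
 * models the variable being absent; the command line is then returned
 * unchanged. Returns a malloc()ed string that the caller must free(),
 * or NULL if the allocation fails.
 */
char *append_extra_args(const char *cmdline, const char *extra)
{
	size_t clen, elen;
	char *out;

	assert(cmdline != NULL);
	if (!extra)
		extra = "";

	clen = strlen(cmdline);
	elen = strlen(extra);

	/* Room for both strings, one separating space and the NUL. */
	out = malloc(clen + elen + 2);
	if (!out)
		return NULL;

	memcpy(out, cmdline, clen);
	/* Only add a separator when both parts are non-empty. */
	if (clen && elen)
		out[clen++] = ' ';
	memcpy(out + clen, extra, elen);
	out[clen + elen] = '\0';

	return out;
}
```

With this model, a board whose variable holds 'hest_disable=1' would
see that option on every boot without the user editing the boot loader
configuration.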

> Given that, the overhead of then supporting m400 seems pretty trivial.
>
> That said, maybe there are more appropriate mechanisms than DMI on
> arm64 for detecting and activating quirks?
>
> Cheers,
> Ian.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-07-03 15:17       ` Ard Biesheuvel
@ 2018-07-03 15:47         ` Ian Campbell
  -1 siblings, 0 replies; 36+ messages in thread
From: Ian Campbell @ 2018-07-03 15:47 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Lorenzo Pieralisi, Geoff Levand, Riku Voipio, Sudeep Holla,
	ACPI Devel Maling List, James Morse, Hanjun Guo, Mark Salter,
	linux-arm-kernel

On Tue, 2018-07-03 at 17:17 +0200, Ard Biesheuvel wrote:
> On 3 July 2018 at 10:44, Ian Campbell <ijc@debian.org> wrote:
> > On Thu, 2018-06-28 at 12:25 +0200, Ard Biesheuvel wrote:
> > > I understand the desire to keep running these M400s as long as they
> > > have some life left in them, but the reality is that they are end of
> > > life already, and not many were manufactured to begin with.
> > 
> > Linux has a long history of supporting such devices so long as there is
> > someone around willing to keep them running (witness for example how
> > long x86/voyager lived with just 1 in existence in a motivated
> > developer's basement, probably some number of entire architectures and
> > I bet a not insubstantial chunk of the platform support in arch/arm).
> > 
> 
> I wonder how many such quirks fall into the 'user cannot be bothered
> to add a kernel command line option' category.

I don't know the overall picture, but the very first one I happened to
look at in arch/x86/kernel/acpi/boot.c (picked by grepping for quirk
and looking for acpi) just now was half a dozen quirks setting
acpi_skip_timer_override which is also settable on the command line.
There's also a bunch in there which just disable ACPI completely which
is also possible on the command line.

My gut feeling is that these are the rule not the exception.

> > So, I think DMI quirks are probably, in reality, inevitable unless you
> > think firmware authors are going to be infallible or the
> > testing/certification suites will never have any gaps in them.
> > 
> 
> Oh, obviously. But this is exactly my point about flood gates: we know
> we need to implement support for them, but that fact alone does not
> justify adding quirks for dead platforms for issues that can be
> trivially worked around.

Is m400 really dead? There certainly seem to be people around who care
about keeping it running and have access to them.

> On a related note: what we *could* do to accommodate platforms such as
> m400 that are affected by quirks that can be worked around by a
> command line parameter: we could teach the stub to look at the
> contents of the 'LinuxExtraArgs' EFI environment variable and append
> it to the kernel command line. This is trivial to implement, given
> that we already manipulate and parse the command line in the stub, and
> would allow for a 'fix and forget' tweak to be applied to such
> platforms, without having to accumulate quirks for broken platforms
> that are difficult to remove later.

Ideally the quirk would be a single entry in a table, which is
unobtrusive enough not to worry about removing.
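
As a rough illustration of what "a single entry in a table" means, here
is a self-contained user-space model. It is only a sketch: the kernel's
real mechanism is a struct dmi_system_id table checked with
dmi_check_system(), whereas struct platform_id, struct quirk and
apply_quirks() below are hypothetical names, and the vendor/product
strings are illustrative and have not been checked against a real m400.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the identification strings firmware reports. */
struct platform_id {
	const char *sys_vendor;
	const char *product_name;
};

/* One quirk: substrings to look for, and the flag to set on a match. */
struct quirk {
	const char *vendor_substr;
	const char *product_substr;
	bool *flag;
};

/* Models the flag that the hest_disable command line option sets. */
bool hest_disabled;

static const struct quirk quirks[] = {
	/* Illustrative strings only, not verified on hardware. */
	{ "HP", "ProLiant m400 Server", &hest_disabled },
};

/*
 * Walk the table and set the flag of every entry whose substrings both
 * occur in the platform's strings. Returns the number of matches.
 */
int apply_quirks(const struct platform_id *id)
{
	int matched = 0;
	size_t i;

	assert(id != NULL);

	for (i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		const struct quirk *q = &quirks[i];

		if (strstr(id->sys_vendor, q->vendor_substr) &&
		    strstr(id->product_name, q->product_substr)) {
			*q->flag = true;
			matched++;
		}
	}

	return matched;
}
```

Supporting another affected board would be one more line in quirks[],
and dropping support later would be deleting that line; the common code
path is just the loop.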

Ian.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-07-03 15:47         ` Ian Campbell
@ 2018-07-03 17:12           ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 36+ messages in thread
From: Lorenzo Pieralisi @ 2018-07-03 17:12 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Ard Biesheuvel, Geoff Levand, Riku Voipio, Sudeep Holla,
	ACPI Devel Maling List, James Morse, Hanjun Guo, Mark Salter,
	linux-arm-kernel

On Tue, Jul 03, 2018 at 04:47:51PM +0100, Ian Campbell wrote:
> On Tue, 2018-07-03 at 17:17 +0200, Ard Biesheuvel wrote:
> > On 3 July 2018 at 10:44, Ian Campbell <ijc@debian.org> wrote:
> > > On Thu, 2018-06-28 at 12:25 +0200, Ard Biesheuvel wrote:
> > > > I understand the desire to keep running these M400s as long as they
> > > > have some life left in them, but the reality is that they are end of
> > > > life already, and not many were manufactured to begin with.
> > > 
> > > Linux has a long history of supporting such devices so long as there is
> > > someone around willing to keep them running (witness for example how
> > > long x86/voyager lived with just 1 in existence in a motivated
> > > developer's basement, probably some number of entire architectures and
> > > I bet a not insubstantial chunk of the platform support in arch/arm).
> > > 
> > 
> > I wonder how many such quirks fall into the 'user cannot be bothered
> > to add a kernel command line option' category.
> 
> I don't know the overall picture, but the very first one I happened to
> look at in arch/x86/kernel/acpi/boot.c (picked by grepping for quirk
> and looking for acpi) just now was half a dozen quirks setting
> acpi_skip_timer_override which is also settable on the command line.
> There's also a bunch in there which just disable ACPI completely which
> is also possible on the command line.
> 
> My gut feeling is that these are the rule not the exception.
> 
> > > So, I think DMI quirks are probably, in reality, inevitable unless
> > > you think firmware authors are going to be infallible or the
> > > testing/certification suites will never have any gaps in them.
> > > 
> > 
> > Oh, obviously. But this is exactly my point about flood gates: we know
> > we need to implement support for them, but that fact alone does not
> > justify adding quirks for dead platforms for issues that can be
> > trivially worked around.
> 
> Is m400 really dead? There certainly seem to be people around who care
> about keeping it running and have access to them.

I do not think anybody is preventing that; it is just that we do not
see the reason for adding a DMI quirk to the mainline kernel to enable
a platform with broken firmware that cripples one of the main features
it is supposed to implement. We can go on forever about this, but that's
the gist.

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
@ 2018-07-03 17:12           ` Lorenzo Pieralisi
  0 siblings, 0 replies; 36+ messages in thread
From: Lorenzo Pieralisi @ 2018-07-03 17:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 03, 2018 at 04:47:51PM +0100, Ian Campbell wrote:
> On Tue, 2018-07-03 at 17:17 +0200, Ard Biesheuvel wrote:
> > On 3 July 2018 at 10:44, Ian Campbell <ijc@debian.org> wrote:
> > > On Thu, 2018-06-28 at 12:25 +0200, Ard Biesheuvel wrote:
> > > > I understand the desire to keep running these M400s as long as they
> > > > have some life left in them, but the reality is that they are end of
> > > > life already, and not many were manufactured to begin with.
> > > 
> > > Linux has a long history of supporting such devices so long as there is
> > > someone around willing to keep them running (witness for example how
> > > long x86/voyager lived with just 1 in existence in a motivated
> > > developer's basement, probably some number of entire architectures and
> > > I bet a not insubstantial chunk of the platform support in arch/arm).
> > > 
> > 
> > I wonder how many such quirks fall into the 'user cannot be bothered
> > to add a kernel command line option' category.
> 
> I don't know the overall picture, but the very first one I happened to
> look at in arch/x86/kernel/acpi/boot.c (picked by grepping for quirk
> and looking for acpi) just now was half a dozen quirks setting
> acpi_skip_timer_override which is also settable on the command line.
> There's also a bunch in there which just disable ACPI completely which
> is also possible on the command line.
> 
> My gut feeling is that these are the rule not the exception.
> 
> > > So, I think DMI quirks are probably, in reality, inevitable unless
> > > you
> > > think firmware authors are going to be infaliable or the
> > > testing/certification suites never has any gaps in it.
> > > 
> > 
> > Oh, obviously. But this is exactly my point about flood gates: we know
> > we need implement support for them, but that fact alone does not
> > justify adding quirks for dead platforms for issues that can be
> > trivially worked around.
> 
> Is m400 really dead? There certainly seem to be people around who care
> about keeping it running and have access to them.

I do not think anybody is preventing that, it is just that we do not
see the reason for adding a DMI quirk to the mainline kernel to enable
a platform with broken firmware that cripples one of the main features
it is supposed to implement, we can go on forever about this but that's
the gist.

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-07-03 17:12           ` Lorenzo Pieralisi
@ 2018-07-03 17:16             ` Ian Campbell
  -1 siblings, 0 replies; 36+ messages in thread
From: Ian Campbell @ 2018-07-03 17:16 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Ard Biesheuvel, Geoff Levand, Riku Voipio, Mark Salter,
	ACPI Devel Maling List, James Morse, Hanjun Guo, Sudeep Holla,
	linux-arm-kernel

On Tue, 2018-07-03 at 18:12 +0100, Lorenzo Pieralisi wrote:
> I do not think anybody is preventing that, it is just that we do not
> see the reason for adding a DMI quirk to the mainline kernel to enable
> a platform with broken firmware that cripples one of the main features
> it is supposed to implement, we can go on forever about this but that's
> the gist.

The quirk turns off a broken feature on the platform where it is
broken, not everywhere, there's no "crippling" of the feature.

Or are you suggesting that you have in mind a way to fix this which
makes HEST work even on m400 and renders the quirk unnecessary?

Ian.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-07-03 17:16             ` Ian Campbell
@ 2018-07-03 17:39               ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 36+ messages in thread
From: Lorenzo Pieralisi @ 2018-07-03 17:39 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Ard Biesheuvel, Geoff Levand, Riku Voipio, Mark Salter,
	ACPI Devel Maling List, James Morse, Hanjun Guo, Sudeep Holla,
	linux-arm-kernel

On Tue, Jul 03, 2018 at 06:16:12PM +0100, Ian Campbell wrote:
> On Tue, 2018-07-03 at 18:12 +0100, Lorenzo Pieralisi wrote:
> > I do not think anybody is preventing that, it is just that we do not
> > see the reason for adding a DMI quirk to the mainline kernel to enable
> > a platform with broken firmware that cripples one of the main features
> > it is supposed to implement, we can go on forever about this but that's
> > the gist.
> 
> The quirk turns off a broken feature on the platform where it is
> broken, not everywhere, there's no "crippling" of the feature.
> 
> Or are you suggesting that you have in mind a way to fix this which
> makes HEST work even on m400 and renders the quirk unnecessary?

HEST error reporting is broken on those platforms and it is one
of the main features we expect from FW in ACPI systems, and that is
glossing over the legacy nature of m400 platforms.

What I do not understand, and Ard pointed that out already, is why we
should add a DMI quirk to the mainline kernel (that he tried/is trying
very hard to prevent since ACPI for ARM64 was merged) to enable a legacy
platform with missing/broken (HEST) key FW functionality in a
distribution.

If we answer that question we can merge this series but I see no
compelling reason for the time being.

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-07-03 17:39               ` Lorenzo Pieralisi
@ 2018-07-03 19:47                 ` Ian Campbell
  -1 siblings, 0 replies; 36+ messages in thread
From: Ian Campbell @ 2018-07-03 19:47 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Ard Biesheuvel, Geoff Levand, Riku Voipio, Mark Salter,
	ACPI Devel Maling List, James Morse, Hanjun Guo, Sudeep Holla,
	linux-arm-kernel

On Tue, 2018-07-03 at 18:39 +0100, Lorenzo Pieralisi wrote:
> On Tue, Jul 03, 2018 at 06:16:12PM +0100, Ian Campbell wrote:
> > On Tue, 2018-07-03 at 18:12 +0100, Lorenzo Pieralisi wrote:
> > > I do not think anybody is preventing that, it is just that we do not
> > > see the reason for adding a DMI quirk to the mainline kernel to enable
> > > a platform with broken firmware that cripples one of the main features
> > > it is supposed to implement, we can go on forever about this but that's
> > > the gist.
> > 
> > The quirk turns off a broken feature on the platform where it is
> > broken, not everywhere, there's no "crippling" of the feature.
> > 
> > Or are you suggesting that you have in mind a way to fix this which
> > makes HEST work even on m400 and renders the quirk unnecessary?
> 
> HEST error reporting is broken on those platforms and it is one
> of the main features we expect from FW in ACPI systems, and that is
> glossing over the legacy nature of m400 platforms.
> 
> What I do not understand, and Ard pointed that out already, is why we
> should add a DMI quirk to the mainline kernel (that he tried/is trying
> very hard to prevent since ACPI for ARM64 was merged) to enable a legacy
> platform with missing/broken (HEST) key FW functionality in a
> distribution.
> 
> If we answer that question we can merge this series but I see no
> compelling reason for the time being.

These systems still exist in the real world and enabling HEST in the
(generic) kernel configuration causes a regression on those systems.

The advice from upstream Linux maintainers for many years has been for
distros to remain as close as possible to upstream and to take stable
updates early and often. Telling distros to carry patches because a
platform is no longer produced seems to me to be completely counter to
that.

Is it the policy now that users of a platform should no longer upgrade
their kernels once the manufacturer has decided the platform is EOL, or
shortly after when the kernel decides it is no longer worth supporting?

Ian.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-07-03 19:47                 ` Ian Campbell
@ 2018-07-04  9:14                   ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 36+ messages in thread
From: Lorenzo Pieralisi @ 2018-07-04  9:14 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Ard Biesheuvel, Geoff Levand, Riku Voipio, Mark Salter,
	ACPI Devel Maling List, James Morse, Hanjun Guo, Sudeep Holla,
	linux-arm-kernel

On Tue, Jul 03, 2018 at 08:47:15PM +0100, Ian Campbell wrote:
> On Tue, 2018-07-03 at 18:39 +0100, Lorenzo Pieralisi wrote:
> > On Tue, Jul 03, 2018 at 06:16:12PM +0100, Ian Campbell wrote:
> > > On Tue, 2018-07-03 at 18:12 +0100, Lorenzo Pieralisi wrote:
> > > > I do not think anybody is preventing that, it is just that we do not
> > > > see the reason for adding a DMI quirk to the mainline kernel to enable
> > > > a platform with broken firmware that cripples one of the main features
> > > > it is supposed to implement, we can go on forever about this but that's
> > > > the gist.
> > > 
> > > The quirk turns off a broken feature on the platform where it is
> > > broken, not everywhere, there's no "crippling" of the feature.
> > > 
> > > Or are you suggesting that you have in mind a way to fix this which
> > > makes HEST work even on m400 and renders the quirk unnecessary?
> > 
> > HEST error reporting is broken on those platforms and it is one
> > of the main features we expect from FW in ACPI systems, and that is
> > glossing over the legacy nature of m400 platforms.
> > 
> > What I do not understand, and Ard pointed that out already, is why we
> > should add a DMI quirk to the mainline kernel (that he tried/is trying
> > very hard to prevent since ACPI for ARM64 was merged) to enable a legacy
> > platform with missing/broken (HEST) key FW functionality in a
> > distribution.
> > 
> > If we answer that question we can merge this series but I see no
> > compelling reason for the time being.
> 
> These systems still exist in the real world and enabling HEST in the
> (generic) kernel configuration causes a regression on those systems.
> 
> The advice from upstream Linux maintainers for many years has been for
> distros to remain as close as possible to upstream and to take stable
> updates early and often. Telling distros to carry patches because a
> platform is no longer produced seems to me to be completely counter to
> that.
> 
> Is it the policy now that users of a platform should no longer upgrade
> their kernels once the manufacturer has decided the platform is EOL, or
> shortly after when the kernel decides it is no longer worth supporting?

I do not think it is an argument as clean-cut as you may want it to
appear. We are talking about one of (if not "the") earliest ACPI
platforms on ARM64 with all the implications on FW that this may
have. We already had to add horrible quirks (PCI) for people to
use it.

We are telling you that it would be preferable to avoid taking quirks
for this platform given its legacy nature and EOL FW, at least not
DMI based quirks.

You are referring to the process in general, I am referring to
a specific platform with its ACPI support in mind that caused all
sorts of issues from an upstream point of view.

Ard provided an alternative to this patch series since he has good
reasons not to want it in the mainline kernel, we understand your point
but I think it is time for you and others to understand ours too.

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware
  2018-07-03 19:47                 ` Ian Campbell
@ 2018-07-04  9:47                   ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2018-07-04  9:47 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Lorenzo Pieralisi, Geoff Levand, Riku Voipio, Mark Salter,
	ACPI Devel Maling List, James Morse, Hanjun Guo, Sudeep Holla,
	linux-arm-kernel

On 3 July 2018 at 21:47, Ian Campbell <ijc@debian.org> wrote:
> On Tue, 2018-07-03 at 18:39 +0100, Lorenzo Pieralisi wrote:
>> On Tue, Jul 03, 2018 at 06:16:12PM +0100, Ian Campbell wrote:
>> > On Tue, 2018-07-03 at 18:12 +0100, Lorenzo Pieralisi wrote:
>> > > I do not think anybody is preventing that, it is just that we do not
>> > > see the reason for adding a DMI quirk to the mainline kernel to enable
>> > > a platform with broken firmware that cripples one of the main features
>> > > it is supposed to implement, we can go on forever about this but that's
>> > > the gist.
>> >
>> > The quirk turns off a broken feature on the platform where it is
>> > broken, not everywhere, there's no "crippling" of the feature.
>> >
>> > Or are you suggesting that you have in mind a way to fix this which
>> > makes HEST work even on m400 and renders the quirk unnecessary?
>>
>> HEST error reporting is broken on those platforms and it is one
>> of the main features we expect from FW in ACPI systems, and that is
>> glossing over the legacy nature of m400 platforms.
>>
>> What I do not understand, and Ard pointed that out already, is why we
>> should add a DMI quirk to the mainline kernel (that he tried/is trying
>> very hard to prevent since ACPI for ARM64 was merged) to enable a legacy
>> platform with missing/broken (HEST) key FW functionality in a
>> distribution.
>>
>> If we answer that question we can merge this series but I see no
>> compelling reason for the time being.
>
> These systems still exist in the real world and enabling HEST in the
> (generic) kernel configuration causes a regression on those systems.
>
> The advice from upstream Linux maintainers for many years has been for
> distros to remain as close as possible to upstream and to take stable
> updates early and often. Telling distros to carry patches because a
> platform is no longer produced seems to me to be completely counter to
> that.
>
> Is it the policy now that users of a platform should no longer upgrade
> their kernels once the manufacturer has decided the platform is EOL, or
> shortly after when the kernel decides it is no longer worth supporting?
>

It is entirely reasonable for a legacy platform to remain on a stable
kernel branch, and as Linaro LEG, we own ~25% of all the M400's
currently in circulation, and we have no desire to run bleeding edge
kernels on them.

It is also entirely reasonable for a legacy platform to require
minimally invasive surgery (e.g., add a kernel command line param to
/etc/default/grub) when moving to a kernel branch that is 5 years
newer than the platform in question.
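
As a rough sketch of that workaround on a GRUB-based distro (the "quiet"
option is only a placeholder for whatever the file already contains, and
the file location and variable name can differ between distros;
hest_disable is the parameter named in patch 2/2 of this series):

```sh
# /etc/default/grub (fragment)
# Append hest_disable to the existing kernel command line options.
GRUB_CMDLINE_LINUX_DEFAULT="quiet hest_disable"
```

The GRUB configuration then has to be regenerated (update-grub on
Debian-style systems) before the option takes effect on the next boot.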

Please don't put it like you have no other option than to stick with
an outdated bug-ridden Linux version because we are refusing to take
your quirk.

Also, 'what x86 does' is not gospel. We are in the fortunate position
to be able to learn from x86, and at the same time, we are maintaining
an architecture, not what amounts to a single platform (i.e.,
virtually all x86s are essentially PCs). So yes, we will most likely
need quirks in the mainline kernel to work around silicon or firmware
issues, but we can try to do a better job than x86 simply because we have the
luxury of hindsight, and quirking trivial things left and right is one
of the things we can surely try to avoid.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2018-07-04  9:47 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-28 10:06 [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware James Morse
2018-06-28 10:06 ` James Morse
2018-06-28 10:06 ` [RFC/RFT PATCH 1/2] efi: Add helper to retrieve runtime version number James Morse
2018-06-28 10:06   ` James Morse
2018-06-28 10:06 ` [RFC/RFT PATCH 2/2] ACPI / APEI: Add DMI matching quirks for platforms that require hest_disable James Morse
2018-06-28 10:06   ` James Morse
2018-06-28 10:25 ` [RFC/RFT PATCH 0/2] disable_hest quirk on HP m400 with bad UEFI firmwware Ard Biesheuvel
2018-06-28 10:25   ` Ard Biesheuvel
2018-06-28 12:51   ` Lorenzo Pieralisi
2018-06-28 12:51     ` Lorenzo Pieralisi
2018-06-28 14:24   ` James Morse
2018-06-28 14:24     ` James Morse
2018-06-28 16:15   ` Geoff Levand
2018-06-28 16:15     ` Geoff Levand
2018-06-28 20:56     ` Ard Biesheuvel
2018-06-28 20:56       ` Ard Biesheuvel
2018-07-03  8:46       ` Ian Campbell
2018-07-03  8:46         ` Ian Campbell
2018-07-03  8:44   ` Ian Campbell
2018-07-03  8:44     ` Ian Campbell
2018-07-03 15:17     ` Ard Biesheuvel
2018-07-03 15:17       ` Ard Biesheuvel
2018-07-03 15:47       ` Ian Campbell
2018-07-03 15:47         ` Ian Campbell
2018-07-03 17:12         ` Lorenzo Pieralisi
2018-07-03 17:12           ` Lorenzo Pieralisi
2018-07-03 17:16           ` Ian Campbell
2018-07-03 17:16             ` Ian Campbell
2018-07-03 17:39             ` Lorenzo Pieralisi
2018-07-03 17:39               ` Lorenzo Pieralisi
2018-07-03 19:47               ` Ian Campbell
2018-07-03 19:47                 ` Ian Campbell
2018-07-04  9:14                 ` Lorenzo Pieralisi
2018-07-04  9:14                   ` Lorenzo Pieralisi
2018-07-04  9:47                 ` Ard Biesheuvel
2018-07-04  9:47                   ` Ard Biesheuvel