All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/12] APEI in_nmi() rework and arm64 SDEI wire-up
@ 2018-04-27 15:34 ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:34 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

The aim of this series is to wire arm64's SDEI into APEI.

Nothing much has changed since v2, patch 3 is new and the KVM IS_ENABLED()
stuff has been cleaned up. Otherwise things are noted in each patch.

This touches a few trees, so I'm not sure how best it should be merged.
Patches 11 and 12 are reducing a race that is made worse by patch 4, I'd
like them to arrive together, even though patch 11 doesn't depend on anything
else in the series. (patches 11&12 could be moved to after 8 if that makes
things easier)


The earlier boiler-plate:

What's SDEI? Its ARM's "Software Delegated Exception Interface" [0]. It's
used by firmware to tell the OS about firmware-first RAS events.

These Software exceptions can interrupt anything, so I describe them as
NMI-like. They aren't the only NMI-like way to notify the OS about
firmware-first RAS events, the ACPI spec also defines 'NOTFIY_SEA' and
'NOTIFY_SEI'.

(Acronyms: SEA, Synchronous External Abort. The CPU requested some memory,
but the owner of that memory said no. These are always synchronous with the
instruction that caused them. SEI, System-Error Interrupt, commonly called
SError. This is an asynchronous external abort, the memory-owner didn't say no
at the right point. Collectively these things are called external-aborts
How is firmware involved? It traps these and re-injects them into the kernel
once its written the CPER records).


APEI's GHES code only expects one source of NMI. If a platform implements
more than one of these mechanisms, APEI needs to handle the interaction.
'SEA' and 'SEI' can interact as 'SEI' is asynchronous. SDEI can interact
with itself: its exceptions can be 'normal' or 'critical', and firmware
could use both types for RAS. (errors using normal, 'panic-now' using
critical).

What does this series do?
Patches 1-4 refactor APEIs 'estatus queue' so it can be used for all
NMI-like notifications. This defers the NMI work to irq_work, which will
happen when we next unmask interrupts.

Patches 5&6 move the arch and KVM code around so that NMI-like notifications
are always called in_nmi().

Patch 7 changes the 'irq or nmi?' path through ghes_copy_tofrom_phys()
to be per-ghes. When called in_nmi(), the struct ghes is expected to
provide a fixmap slot and lock that is safe to use. NMI-like notifications
that mask each other can share these resources. Those that interact should
have their own fixmap slot and lock.

Patch 8 renames NOTIFY_SEA's use of NOTIFY_NMI's infrastructure, as we're
about to have multiple NMI-like users that can't share resources.

Pathes 9&10 add the SDEI helper, and notify methods for APEI.

After this, adding further firmware-first pieces for arm64 is simple
(and safe), and all our NMI-like notifications behave the same as x86's
NOTIFY_NMI.

All of this makes the race between memory_failure_queue() and
ret_to_user worse, as there is now always irq_work involved.

Patch 11 makes the reschedule to memory_failure() run as soon as possible.
Patch 12 makes sure the arch code knows whether the irq_work has run by
the time do_sea() returns. We can skip the signalling step if it has as
APEI has done its work.


ghes.c became clearer to me when I worked out that it has three sets of
functions with 'estatus' in the name. One is a pool of memory that can be
allocated-from atomically. This is grown/shrunk when new NMI users are
allocated.
The second is the estatus-cache, which holds recent notifications so it
can suppress notifications we've already handled.
The last it the estatus-queue, which holds data from NMI-like notifications
(in pool memory) to be processed from irq_work.


Testing?
Tested with the SDEI FVP based software model and a mocked up NOTFIY_SEA using
KVM. I've added a case where 'corrected errors' are discovered at probe time
to exercise ghes_probe() during boot. I've only build tested this on x86.


Thanks,

James

[0] http://infocenter.arm.com/help/topic/com.arm.doc.den0054a/ARM_DEN0054A_Software_Delegated_Exception_Interface.pdf


James Morse (12):
  ACPI / APEI: Move the estatus queue code up, and under its own ifdef
  ACPI / APEI: Generalise the estatus queue's add/remove and notify code
  ACPI / APEI: don't wait to serialise with oops messages when
    panic()ing
  ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
  KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
  arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
  ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple
    in_nmi() users
  ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
  firmware: arm_sdei: Add ACPI GHES registration helper
  ACPI / APEI: Add support for the SDEI GHES Notification type
  mm/memory-failure: increase queued recovery work's priority
  arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work

 arch/arm/include/asm/kvm_ras.h       |  14 +
 arch/arm/include/asm/system_misc.h   |   5 -
 arch/arm64/include/asm/acpi.h        |   5 +-
 arch/arm64/include/asm/daifflags.h   |   1 +
 arch/arm64/include/asm/fixmap.h      |   8 +-
 arch/arm64/include/asm/kvm_ras.h     |  24 ++
 arch/arm64/include/asm/system_misc.h |   2 -
 arch/arm64/kernel/acpi.c             |  49 ++++
 arch/arm64/mm/fault.c                |  30 +-
 drivers/acpi/apei/ghes.c             | 517 ++++++++++++++++++++---------------
 drivers/firmware/arm_sdei.c          |  77 ++++++
 include/acpi/ghes.h                  |   4 +
 include/linux/arm_sdei.h             |   8 +
 mm/memory-failure.c                  |  11 +-
 virt/kvm/arm/mmu.c                   |   4 +-
 15 files changed, 499 insertions(+), 260 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_ras.h
 create mode 100644 arch/arm64/include/asm/kvm_ras.h

-- 
2.16.2

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 00/12] APEI in_nmi() rework and arm64 SDEI wire-up
@ 2018-04-27 15:34 ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:34 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

The aim of this series is to wire arm64's SDEI into APEI.

Nothing much has changed since v2, patch 3 is new and the KVM IS_ENABLED()
stuff has been cleaned up. Otherwise things are noted in each patch.

This touches a few trees, so I'm not sure how best it should be merged.
Patches 11 and 12 are reducing a race that is made worse by patch 4, I'd
like them to arrive together, even though patch 11 doesn't depend on anything
else in the series. (patches 11&12 could be moved to after 8 if that makes
things easier)


The earlier boiler-plate:

What's SDEI? Its ARM's "Software Delegated Exception Interface" [0]. It's
used by firmware to tell the OS about firmware-first RAS events.

These Software exceptions can interrupt anything, so I describe them as
NMI-like. They aren't the only NMI-like way to notify the OS about
firmware-first RAS events, the ACPI spec also defines 'NOTFIY_SEA' and
'NOTIFY_SEI'.

(Acronyms: SEA, Synchronous External Abort. The CPU requested some memory,
but the owner of that memory said no. These are always synchronous with the
instruction that caused them. SEI, System-Error Interrupt, commonly called
SError. This is an asynchronous external abort, the memory-owner didn't say no
at the right point. Collectively these things are called external-aborts
How is firmware involved? It traps these and re-injects them into the kernel
once its written the CPER records).


APEI's GHES code only expects one source of NMI. If a platform implements
more than one of these mechanisms, APEI needs to handle the interaction.
'SEA' and 'SEI' can interact as 'SEI' is asynchronous. SDEI can interact
with itself: its exceptions can be 'normal' or 'critical', and firmware
could use both types for RAS. (errors using normal, 'panic-now' using
critical).

What does this series do?
Patches 1-4 refactor APEIs 'estatus queue' so it can be used for all
NMI-like notifications. This defers the NMI work to irq_work, which will
happen when we next unmask interrupts.

Patches 5&6 move the arch and KVM code around so that NMI-like notifications
are always called in_nmi().

Patch 7 changes the 'irq or nmi?' path through ghes_copy_tofrom_phys()
to be per-ghes. When called in_nmi(), the struct ghes is expected to
provide a fixmap slot and lock that is safe to use. NMI-like notifications
that mask each other can share these resources. Those that interact should
have their own fixmap slot and lock.

Patch 8 renames NOTIFY_SEA's use of NOTIFY_NMI's infrastructure, as we're
about to have multiple NMI-like users that can't share resources.

Pathes 9&10 add the SDEI helper, and notify methods for APEI.

After this, adding further firmware-first pieces for arm64 is simple
(and safe), and all our NMI-like notifications behave the same as x86's
NOTIFY_NMI.

All of this makes the race between memory_failure_queue() and
ret_to_user worse, as there is now always irq_work involved.

Patch 11 makes the reschedule to memory_failure() run as soon as possible.
Patch 12 makes sure the arch code knows whether the irq_work has run by
the time do_sea() returns. We can skip the signalling step if it has as
APEI has done its work.


ghes.c became clearer to me when I worked out that it has three sets of
functions with 'estatus' in the name. One is a pool of memory that can be
allocated-from atomically. This is grown/shrunk when new NMI users are
allocated.
The second is the estatus-cache, which holds recent notifications so it
can suppress notifications we've already handled.
The last it the estatus-queue, which holds data from NMI-like notifications
(in pool memory) to be processed from irq_work.


Testing?
Tested with the SDEI FVP based software model and a mocked up NOTFIY_SEA using
KVM. I've added a case where 'corrected errors' are discovered at probe time
to exercise ghes_probe() during boot. I've only build tested this on x86.


Thanks,

James

[0] http://infocenter.arm.com/help/topic/com.arm.doc.den0054a/ARM_DEN0054A_Software_Delegated_Exception_Interface.pdf


James Morse (12):
  ACPI / APEI: Move the estatus queue code up, and under its own ifdef
  ACPI / APEI: Generalise the estatus queue's add/remove and notify code
  ACPI / APEI: don't wait to serialise with oops messages when
    panic()ing
  ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
  KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
  arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
  ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple
    in_nmi() users
  ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
  firmware: arm_sdei: Add ACPI GHES registration helper
  ACPI / APEI: Add support for the SDEI GHES Notification type
  mm/memory-failure: increase queued recovery work's priority
  arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work

 arch/arm/include/asm/kvm_ras.h       |  14 +
 arch/arm/include/asm/system_misc.h   |   5 -
 arch/arm64/include/asm/acpi.h        |   5 +-
 arch/arm64/include/asm/daifflags.h   |   1 +
 arch/arm64/include/asm/fixmap.h      |   8 +-
 arch/arm64/include/asm/kvm_ras.h     |  24 ++
 arch/arm64/include/asm/system_misc.h |   2 -
 arch/arm64/kernel/acpi.c             |  49 ++++
 arch/arm64/mm/fault.c                |  30 +-
 drivers/acpi/apei/ghes.c             | 517 ++++++++++++++++++++---------------
 drivers/firmware/arm_sdei.c          |  77 ++++++
 include/acpi/ghes.h                  |   4 +
 include/linux/arm_sdei.h             |   8 +
 mm/memory-failure.c                  |  11 +-
 virt/kvm/arm/mmu.c                   |   4 +-
 15 files changed, 499 insertions(+), 260 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_ras.h
 create mode 100644 arch/arm64/include/asm/kvm_ras.h

-- 
2.16.2

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 00/12] APEI in_nmi() rework and arm64 SDEI wire-up
@ 2018-04-27 15:34 ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:34 UTC (permalink / raw)
  To: linux-arm-kernel

The aim of this series is to wire arm64's SDEI into APEI.

Nothing much has changed since v2, patch 3 is new and the KVM IS_ENABLED()
stuff has been cleaned up. Otherwise things are noted in each patch.

This touches a few trees, so I'm not sure how best it should be merged.
Patches 11 and 12 are reducing a race that is made worse by patch 4, I'd
like them to arrive together, even though patch 11 doesn't depend on anything
else in the series. (patches 11&12 could be moved to after 8 if that makes
things easier)


The earlier boiler-plate:

What's SDEI? Its ARM's "Software Delegated Exception Interface" [0]. It's
used by firmware to tell the OS about firmware-first RAS events.

These Software exceptions can interrupt anything, so I describe them as
NMI-like. They aren't the only NMI-like way to notify the OS about
firmware-first RAS events, the ACPI spec also defines 'NOTFIY_SEA' and
'NOTIFY_SEI'.

(Acronyms: SEA, Synchronous External Abort. The CPU requested some memory,
but the owner of that memory said no. These are always synchronous with the
instruction that caused them. SEI, System-Error Interrupt, commonly called
SError. This is an asynchronous external abort, the memory-owner didn't say no
at the right point. Collectively these things are called external-aborts
How is firmware involved? It traps these and re-injects them into the kernel
once its written the CPER records).


APEI's GHES code only expects one source of NMI. If a platform implements
more than one of these mechanisms, APEI needs to handle the interaction.
'SEA' and 'SEI' can interact as 'SEI' is asynchronous. SDEI can interact
with itself: its exceptions can be 'normal' or 'critical', and firmware
could use both types for RAS. (errors using normal, 'panic-now' using
critical).

What does this series do?
Patches 1-4 refactor APEIs 'estatus queue' so it can be used for all
NMI-like notifications. This defers the NMI work to irq_work, which will
happen when we next unmask interrupts.

Patches 5&6 move the arch and KVM code around so that NMI-like notifications
are always called in_nmi().

Patch 7 changes the 'irq or nmi?' path through ghes_copy_tofrom_phys()
to be per-ghes. When called in_nmi(), the struct ghes is expected to
provide a fixmap slot and lock that is safe to use. NMI-like notifications
that mask each other can share these resources. Those that interact should
have their own fixmap slot and lock.

Patch 8 renames NOTIFY_SEA's use of NOTIFY_NMI's infrastructure, as we're
about to have multiple NMI-like users that can't share resources.

Pathes 9&10 add the SDEI helper, and notify methods for APEI.

After this, adding further firmware-first pieces for arm64 is simple
(and safe), and all our NMI-like notifications behave the same as x86's
NOTIFY_NMI.

All of this makes the race between memory_failure_queue() and
ret_to_user worse, as there is now always irq_work involved.

Patch 11 makes the reschedule to memory_failure() run as soon as possible.
Patch 12 makes sure the arch code knows whether the irq_work has run by
the time do_sea() returns. We can skip the signalling step if it has as
APEI has done its work.


ghes.c became clearer to me when I worked out that it has three sets of
functions with 'estatus' in the name. One is a pool of memory that can be
allocated-from atomically. This is grown/shrunk when new NMI users are
allocated.
The second is the estatus-cache, which holds recent notifications so it
can suppress notifications we've already handled.
The last it the estatus-queue, which holds data from NMI-like notifications
(in pool memory) to be processed from irq_work.


Testing?
Tested with the SDEI FVP based software model and a mocked up NOTFIY_SEA using
KVM. I've added a case where 'corrected errors' are discovered at probe time
to exercise ghes_probe() during boot. I've only build tested this on x86.


Thanks,

James

[0] http://infocenter.arm.com/help/topic/com.arm.doc.den0054a/ARM_DEN0054A_Software_Delegated_Exception_Interface.pdf


James Morse (12):
  ACPI / APEI: Move the estatus queue code up, and under its own ifdef
  ACPI / APEI: Generalise the estatus queue's add/remove and notify code
  ACPI / APEI: don't wait to serialise with oops messages when
    panic()ing
  ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
  KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
  arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
  ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple
    in_nmi() users
  ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
  firmware: arm_sdei: Add ACPI GHES registration helper
  ACPI / APEI: Add support for the SDEI GHES Notification type
  mm/memory-failure: increase queued recovery work's priority
  arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work

 arch/arm/include/asm/kvm_ras.h       |  14 +
 arch/arm/include/asm/system_misc.h   |   5 -
 arch/arm64/include/asm/acpi.h        |   5 +-
 arch/arm64/include/asm/daifflags.h   |   1 +
 arch/arm64/include/asm/fixmap.h      |   8 +-
 arch/arm64/include/asm/kvm_ras.h     |  24 ++
 arch/arm64/include/asm/system_misc.h |   2 -
 arch/arm64/kernel/acpi.c             |  49 ++++
 arch/arm64/mm/fault.c                |  30 +-
 drivers/acpi/apei/ghes.c             | 517 ++++++++++++++++++++---------------
 drivers/firmware/arm_sdei.c          |  77 ++++++
 include/acpi/ghes.h                  |   4 +
 include/linux/arm_sdei.h             |   8 +
 mm/memory-failure.c                  |  11 +-
 virt/kvm/arm/mmu.c                   |   4 +-
 15 files changed, 499 insertions(+), 260 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_ras.h
 create mode 100644 arch/arm64/include/asm/kvm_ras.h

-- 
2.16.2

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-04-27 15:34   ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:34 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

To support asynchronous NMI-like notifications on arm64 we need to use
the estatus-queue. These patches refactor it to allow multiple APEI
notification types to use it.

First we move the estatus-queue code higher in the file so that any
notify_foo() handler can make use of it.

This patch moves code around ... and makes the following trivial change:
Freshen the dated comment above ghes_estatus_llist. printk() is no
longer the issue, its the helpers like memory_failure_queue() that
still aren't nmi safe.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>

Notes for cover letter:
ghes.c has three things all called 'estatus'. One is a pool of memory
that has a static size, and is grown/shrunk when new NMI users are
allocated.
The second is the cache, this holds recent notifications so we can
suppress notifications we've already handled.
The last is the queue, which hold data from NMI notifications (in pool
memory) that can't be handled immediatly.
---
 drivers/acpi/apei/ghes.c | 265 ++++++++++++++++++++++++-----------------------
 1 file changed, 137 insertions(+), 128 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1efefe919555..e2af91c92135 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -545,6 +545,16 @@ static int ghes_print_estatus(const char *pfx,
 	return 0;
 }
 
+static void __ghes_panic(struct ghes *ghes)
+{
+	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
+
+	/* reboot to log the error! */
+	if (!panic_timeout)
+		panic_timeout = ghes_panic_timeout;
+	panic("Fatal hardware error!");
+}
+
 /*
  * GHES error status reporting throttle, to report more kinds of
  * errors, instead of just most frequently occurred errors.
@@ -672,6 +682,133 @@ static void ghes_estatus_cache_add(
 	rcu_read_unlock();
 }
 
+#ifdef CONFIG_HAVE_ACPI_APEI_NMI
+/*
+ * Handlers for CPER records may not be NMI safe. For example,
+ * memory_failure_queue() takes spinlocks and calls schedule_work_on().
+ * In any NMI-like handler, memory from ghes_estatus_pool is used to save
+ * estatus, and added to the ghes_estatus_llist. irq_work_queue() causes
+ * ghes_proc_in_irq() to run in IRQ context where each estatus in
+ * ghes_estatus_llist is processed. Each NMI-like error source must grow
+ * the ghes_estatus_pool to ensure memory is available.
+ *
+ * Memory from the ghes_estatus_pool is also used with the ghes_estatus_cache
+ * to suppress frequent messages.
+ */
+static struct llist_head ghes_estatus_llist;
+static struct irq_work ghes_proc_irq_work;
+
+static void ghes_print_queued_estatus(void)
+{
+	struct llist_node *llnode;
+	struct ghes_estatus_node *estatus_node;
+	struct acpi_hest_generic *generic;
+	struct acpi_hest_generic_status *estatus;
+
+	llnode = llist_del_all(&ghes_estatus_llist);
+	/*
+	 * Because the time order of estatus in list is reversed,
+	 * revert it back to proper order.
+	 */
+	llnode = llist_reverse_order(llnode);
+	while (llnode) {
+		estatus_node = llist_entry(llnode, struct ghes_estatus_node,
+					   llnode);
+		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+		generic = estatus_node->generic;
+		ghes_print_estatus(NULL, generic, estatus);
+		llnode = llnode->next;
+	}
+}
+
+/* Save estatus for further processing in IRQ context */
+static void __process_error(struct ghes *ghes)
+{
+#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+	u32 len, node_len;
+	struct ghes_estatus_node *estatus_node;
+	struct acpi_hest_generic_status *estatus;
+
+	if (ghes_estatus_cached(ghes->estatus))
+		return;
+
+	len = cper_estatus_len(ghes->estatus);
+	node_len = GHES_ESTATUS_NODE_LEN(len);
+
+	estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);
+	if (!estatus_node)
+		return;
+
+	estatus_node->ghes = ghes;
+	estatus_node->generic = ghes->generic;
+	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+	memcpy(estatus, ghes->estatus, len);
+	llist_add(&estatus_node->llnode, &ghes_estatus_llist);
+#endif
+}
+
+static unsigned long ghes_esource_prealloc_size(
+	const struct acpi_hest_generic *generic)
+{
+	unsigned long block_length, prealloc_records, prealloc_size;
+
+	block_length = min_t(unsigned long, generic->error_block_length,
+			     GHES_ESTATUS_MAX_SIZE);
+	prealloc_records = max_t(unsigned long,
+				 generic->records_to_preallocate, 1);
+	prealloc_size = min_t(unsigned long, block_length * prealloc_records,
+			      GHES_ESOURCE_PREALLOC_MAX_SIZE);
+
+	return prealloc_size;
+}
+
+static void ghes_estatus_pool_shrink(unsigned long len)
+{
+	ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
+}
+
+static void ghes_proc_in_irq(struct irq_work *irq_work)
+{
+	struct llist_node *llnode, *next;
+	struct ghes_estatus_node *estatus_node;
+	struct acpi_hest_generic *generic;
+	struct acpi_hest_generic_status *estatus;
+	u32 len, node_len;
+
+	llnode = llist_del_all(&ghes_estatus_llist);
+	/*
+	 * Because the time order of estatus in list is reversed,
+	 * revert it back to proper order.
+	 */
+	llnode = llist_reverse_order(llnode);
+	while (llnode) {
+		next = llnode->next;
+		estatus_node = llist_entry(llnode, struct ghes_estatus_node,
+					   llnode);
+		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+		len = cper_estatus_len(estatus);
+		node_len = GHES_ESTATUS_NODE_LEN(len);
+		ghes_do_proc(estatus_node->ghes, estatus);
+		if (!ghes_estatus_cached(estatus)) {
+			generic = estatus_node->generic;
+			if (ghes_print_estatus(NULL, generic, estatus))
+				ghes_estatus_cache_add(generic, estatus);
+		}
+		gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
+			      node_len);
+		llnode = next;
+	}
+}
+
+static void ghes_nmi_init_cxt(void)
+{
+	init_irq_work(&ghes_proc_irq_work, ghes_proc_in_irq);
+}
+
+#else
+static inline void ghes_nmi_init_cxt(void) { }
+#endif /* CONFIG_HAVE_ACPI_APEI_NMI */
+
 static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
 {
 	int rc;
@@ -687,16 +824,6 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
 	return apei_write(val, &gv2->read_ack_register);
 }
 
-static void __ghes_panic(struct ghes *ghes)
-{
-	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
-
-	/* reboot to log the error! */
-	if (!panic_timeout)
-		panic_timeout = ghes_panic_timeout;
-	panic("Fatal hardware error!");
-}
-
 static int ghes_proc(struct ghes *ghes)
 {
 	int rc;
@@ -828,17 +955,6 @@ static inline void ghes_sea_remove(struct ghes *ghes) { }
 #endif /* CONFIG_ACPI_APEI_SEA */
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
-/*
- * printk is not safe in NMI context.  So in NMI handler, we allocate
- * required memory from lock-less memory allocator
- * (ghes_estatus_pool), save estatus into it, put them into lock-less
- * list (ghes_estatus_llist), then delay printk into IRQ context via
- * irq_work (ghes_proc_irq_work).  ghes_estatus_size_request record
- * required pool size by all NMI error source.
- */
-static struct llist_head ghes_estatus_llist;
-static struct irq_work ghes_proc_irq_work;
-
 /*
  * NMI may be triggered on any CPU, so ghes_in_nmi is used for
  * having only one concurrent reader.
@@ -847,88 +963,6 @@ static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
 
 static LIST_HEAD(ghes_nmi);
 
-static void ghes_proc_in_irq(struct irq_work *irq_work)
-{
-	struct llist_node *llnode, *next;
-	struct ghes_estatus_node *estatus_node;
-	struct acpi_hest_generic *generic;
-	struct acpi_hest_generic_status *estatus;
-	u32 len, node_len;
-
-	llnode = llist_del_all(&ghes_estatus_llist);
-	/*
-	 * Because the time order of estatus in list is reversed,
-	 * revert it back to proper order.
-	 */
-	llnode = llist_reverse_order(llnode);
-	while (llnode) {
-		next = llnode->next;
-		estatus_node = llist_entry(llnode, struct ghes_estatus_node,
-					   llnode);
-		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
-		len = cper_estatus_len(estatus);
-		node_len = GHES_ESTATUS_NODE_LEN(len);
-		ghes_do_proc(estatus_node->ghes, estatus);
-		if (!ghes_estatus_cached(estatus)) {
-			generic = estatus_node->generic;
-			if (ghes_print_estatus(NULL, generic, estatus))
-				ghes_estatus_cache_add(generic, estatus);
-		}
-		gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
-			      node_len);
-		llnode = next;
-	}
-}
-
-static void ghes_print_queued_estatus(void)
-{
-	struct llist_node *llnode;
-	struct ghes_estatus_node *estatus_node;
-	struct acpi_hest_generic *generic;
-	struct acpi_hest_generic_status *estatus;
-
-	llnode = llist_del_all(&ghes_estatus_llist);
-	/*
-	 * Because the time order of estatus in list is reversed,
-	 * revert it back to proper order.
-	 */
-	llnode = llist_reverse_order(llnode);
-	while (llnode) {
-		estatus_node = llist_entry(llnode, struct ghes_estatus_node,
-					   llnode);
-		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
-		generic = estatus_node->generic;
-		ghes_print_estatus(NULL, generic, estatus);
-		llnode = llnode->next;
-	}
-}
-
-/* Save estatus for further processing in IRQ context */
-static void __process_error(struct ghes *ghes)
-{
-#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	u32 len, node_len;
-	struct ghes_estatus_node *estatus_node;
-	struct acpi_hest_generic_status *estatus;
-
-	if (ghes_estatus_cached(ghes->estatus))
-		return;
-
-	len = cper_estatus_len(ghes->estatus);
-	node_len = GHES_ESTATUS_NODE_LEN(len);
-
-	estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);
-	if (!estatus_node)
-		return;
-
-	estatus_node->ghes = ghes;
-	estatus_node->generic = ghes->generic;
-	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
-	memcpy(estatus, ghes->estatus, len);
-	llist_add(&estatus_node->llnode, &ghes_estatus_llist);
-#endif
-}
-
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
 	struct ghes *ghes;
@@ -967,26 +1001,6 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 	return ret;
 }
 
-static unsigned long ghes_esource_prealloc_size(
-	const struct acpi_hest_generic *generic)
-{
-	unsigned long block_length, prealloc_records, prealloc_size;
-
-	block_length = min_t(unsigned long, generic->error_block_length,
-			     GHES_ESTATUS_MAX_SIZE);
-	prealloc_records = max_t(unsigned long,
-				 generic->records_to_preallocate, 1);
-	prealloc_size = min_t(unsigned long, block_length * prealloc_records,
-			      GHES_ESOURCE_PREALLOC_MAX_SIZE);
-
-	return prealloc_size;
-}
-
-static void ghes_estatus_pool_shrink(unsigned long len)
-{
-	ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
-}
-
 static void ghes_nmi_add(struct ghes *ghes)
 {
 	unsigned long len;
@@ -1018,14 +1032,9 @@ static void ghes_nmi_remove(struct ghes *ghes)
 	ghes_estatus_pool_shrink(len);
 }
 
-static void ghes_nmi_init_cxt(void)
-{
-	init_irq_work(&ghes_proc_irq_work, ghes_proc_in_irq);
-}
 #else /* CONFIG_HAVE_ACPI_APEI_NMI */
 static inline void ghes_nmi_add(struct ghes *ghes) { }
 static inline void ghes_nmi_remove(struct ghes *ghes) { }
-static inline void ghes_nmi_init_cxt(void) { }
 #endif /* CONFIG_HAVE_ACPI_APEI_NMI */
 
 static int ghes_probe(struct platform_device *ghes_dev)
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
@ 2018-04-27 15:34   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:34 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

To support asynchronous NMI-like notifications on arm64 we need to use
the estatus-queue. These patches refactor it to allow multiple APEI
notification types to use it.

First we move the estatus-queue code higher in the file so that any
notify_foo() handler can make use of it.

This patch moves code around ... and makes the following trivial change:
Freshen the dated comment above ghes_estatus_llist. printk() is no
longer the issue, its the helpers like memory_failure_queue() that
still aren't nmi safe.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>

Notes for cover letter:
ghes.c has three things all called 'estatus'. One is a pool of memory
that has a static size, and is grown/shrunk when new NMI users are
allocated.
The second is the cache, this holds recent notifications so we can
suppress notifications we've already handled.
The last is the queue, which hold data from NMI notifications (in pool
memory) that can't be handled immediatly.
---
 drivers/acpi/apei/ghes.c | 265 ++++++++++++++++++++++++-----------------------
 1 file changed, 137 insertions(+), 128 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1efefe919555..e2af91c92135 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -545,6 +545,16 @@ static int ghes_print_estatus(const char *pfx,
 	return 0;
 }
 
+static void __ghes_panic(struct ghes *ghes)
+{
+	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
+
+	/* reboot to log the error! */
+	if (!panic_timeout)
+		panic_timeout = ghes_panic_timeout;
+	panic("Fatal hardware error!");
+}
+
 /*
  * GHES error status reporting throttle, to report more kinds of
  * errors, instead of just most frequently occurred errors.
@@ -672,6 +682,133 @@ static void ghes_estatus_cache_add(
 	rcu_read_unlock();
 }
 
+#ifdef CONFIG_HAVE_ACPI_APEI_NMI
+/*
+ * Handlers for CPER records may not be NMI safe. For example,
+ * memory_failure_queue() takes spinlocks and calls schedule_work_on().
+ * In any NMI-like handler, memory from ghes_estatus_pool is used to save
+ * estatus, and added to the ghes_estatus_llist. irq_work_queue() causes
+ * ghes_proc_in_irq() to run in IRQ context where each estatus in
+ * ghes_estatus_llist is processed. Each NMI-like error source must grow
+ * the ghes_estatus_pool to ensure memory is available.
+ *
+ * Memory from the ghes_estatus_pool is also used with the ghes_estatus_cache
+ * to suppress frequent messages.
+ */
+static struct llist_head ghes_estatus_llist;
+static struct irq_work ghes_proc_irq_work;
+
+static void ghes_print_queued_estatus(void)
+{
+	struct llist_node *llnode;
+	struct ghes_estatus_node *estatus_node;
+	struct acpi_hest_generic *generic;
+	struct acpi_hest_generic_status *estatus;
+
+	llnode = llist_del_all(&ghes_estatus_llist);
+	/*
+	 * Because the time order of estatus in list is reversed,
+	 * revert it back to proper order.
+	 */
+	llnode = llist_reverse_order(llnode);
+	while (llnode) {
+		estatus_node = llist_entry(llnode, struct ghes_estatus_node,
+					   llnode);
+		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+		generic = estatus_node->generic;
+		ghes_print_estatus(NULL, generic, estatus);
+		llnode = llnode->next;
+	}
+}
+
+/* Save estatus for further processing in IRQ context */
+static void __process_error(struct ghes *ghes)
+{
+#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+	u32 len, node_len;
+	struct ghes_estatus_node *estatus_node;
+	struct acpi_hest_generic_status *estatus;
+
+	if (ghes_estatus_cached(ghes->estatus))
+		return;
+
+	len = cper_estatus_len(ghes->estatus);
+	node_len = GHES_ESTATUS_NODE_LEN(len);
+
+	estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);
+	if (!estatus_node)
+		return;
+
+	estatus_node->ghes = ghes;
+	estatus_node->generic = ghes->generic;
+	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+	memcpy(estatus, ghes->estatus, len);
+	llist_add(&estatus_node->llnode, &ghes_estatus_llist);
+#endif
+}
+
+static unsigned long ghes_esource_prealloc_size(
+	const struct acpi_hest_generic *generic)
+{
+	unsigned long block_length, prealloc_records, prealloc_size;
+
+	block_length = min_t(unsigned long, generic->error_block_length,
+			     GHES_ESTATUS_MAX_SIZE);
+	prealloc_records = max_t(unsigned long,
+				 generic->records_to_preallocate, 1);
+	prealloc_size = min_t(unsigned long, block_length * prealloc_records,
+			      GHES_ESOURCE_PREALLOC_MAX_SIZE);
+
+	return prealloc_size;
+}
+
+static void ghes_estatus_pool_shrink(unsigned long len)
+{
+	ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
+}
+
+static void ghes_proc_in_irq(struct irq_work *irq_work)
+{
+	struct llist_node *llnode, *next;
+	struct ghes_estatus_node *estatus_node;
+	struct acpi_hest_generic *generic;
+	struct acpi_hest_generic_status *estatus;
+	u32 len, node_len;
+
+	llnode = llist_del_all(&ghes_estatus_llist);
+	/*
+	 * Because the time order of estatus in list is reversed,
+	 * revert it back to proper order.
+	 */
+	llnode = llist_reverse_order(llnode);
+	while (llnode) {
+		next = llnode->next;
+		estatus_node = llist_entry(llnode, struct ghes_estatus_node,
+					   llnode);
+		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+		len = cper_estatus_len(estatus);
+		node_len = GHES_ESTATUS_NODE_LEN(len);
+		ghes_do_proc(estatus_node->ghes, estatus);
+		if (!ghes_estatus_cached(estatus)) {
+			generic = estatus_node->generic;
+			if (ghes_print_estatus(NULL, generic, estatus))
+				ghes_estatus_cache_add(generic, estatus);
+		}
+		gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
+			      node_len);
+		llnode = next;
+	}
+}
+
+static void ghes_nmi_init_cxt(void)
+{
+	init_irq_work(&ghes_proc_irq_work, ghes_proc_in_irq);
+}
+
+#else
+static inline void ghes_nmi_init_cxt(void) { }
+#endif /* CONFIG_HAVE_ACPI_APEI_NMI */
+
 static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
 {
 	int rc;
@@ -687,16 +824,6 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
 	return apei_write(val, &gv2->read_ack_register);
 }
 
-static void __ghes_panic(struct ghes *ghes)
-{
-	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
-
-	/* reboot to log the error! */
-	if (!panic_timeout)
-		panic_timeout = ghes_panic_timeout;
-	panic("Fatal hardware error!");
-}
-
 static int ghes_proc(struct ghes *ghes)
 {
 	int rc;
@@ -828,17 +955,6 @@ static inline void ghes_sea_remove(struct ghes *ghes) { }
 #endif /* CONFIG_ACPI_APEI_SEA */
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
-/*
- * printk is not safe in NMI context.  So in NMI handler, we allocate
- * required memory from lock-less memory allocator
- * (ghes_estatus_pool), save estatus into it, put them into lock-less
- * list (ghes_estatus_llist), then delay printk into IRQ context via
- * irq_work (ghes_proc_irq_work).  ghes_estatus_size_request record
- * required pool size by all NMI error source.
- */
-static struct llist_head ghes_estatus_llist;
-static struct irq_work ghes_proc_irq_work;
-
 /*
  * NMI may be triggered on any CPU, so ghes_in_nmi is used for
  * having only one concurrent reader.
@@ -847,88 +963,6 @@ static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
 
 static LIST_HEAD(ghes_nmi);
 
-static void ghes_proc_in_irq(struct irq_work *irq_work)
-{
-	struct llist_node *llnode, *next;
-	struct ghes_estatus_node *estatus_node;
-	struct acpi_hest_generic *generic;
-	struct acpi_hest_generic_status *estatus;
-	u32 len, node_len;
-
-	llnode = llist_del_all(&ghes_estatus_llist);
-	/*
-	 * Because the time order of estatus in list is reversed,
-	 * revert it back to proper order.
-	 */
-	llnode = llist_reverse_order(llnode);
-	while (llnode) {
-		next = llnode->next;
-		estatus_node = llist_entry(llnode, struct ghes_estatus_node,
-					   llnode);
-		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
-		len = cper_estatus_len(estatus);
-		node_len = GHES_ESTATUS_NODE_LEN(len);
-		ghes_do_proc(estatus_node->ghes, estatus);
-		if (!ghes_estatus_cached(estatus)) {
-			generic = estatus_node->generic;
-			if (ghes_print_estatus(NULL, generic, estatus))
-				ghes_estatus_cache_add(generic, estatus);
-		}
-		gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
-			      node_len);
-		llnode = next;
-	}
-}
-
-static void ghes_print_queued_estatus(void)
-{
-	struct llist_node *llnode;
-	struct ghes_estatus_node *estatus_node;
-	struct acpi_hest_generic *generic;
-	struct acpi_hest_generic_status *estatus;
-
-	llnode = llist_del_all(&ghes_estatus_llist);
-	/*
-	 * Because the time order of estatus in list is reversed,
-	 * revert it back to proper order.
-	 */
-	llnode = llist_reverse_order(llnode);
-	while (llnode) {
-		estatus_node = llist_entry(llnode, struct ghes_estatus_node,
-					   llnode);
-		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
-		generic = estatus_node->generic;
-		ghes_print_estatus(NULL, generic, estatus);
-		llnode = llnode->next;
-	}
-}
-
-/* Save estatus for further processing in IRQ context */
-static void __process_error(struct ghes *ghes)
-{
-#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	u32 len, node_len;
-	struct ghes_estatus_node *estatus_node;
-	struct acpi_hest_generic_status *estatus;
-
-	if (ghes_estatus_cached(ghes->estatus))
-		return;
-
-	len = cper_estatus_len(ghes->estatus);
-	node_len = GHES_ESTATUS_NODE_LEN(len);
-
-	estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);
-	if (!estatus_node)
-		return;
-
-	estatus_node->ghes = ghes;
-	estatus_node->generic = ghes->generic;
-	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
-	memcpy(estatus, ghes->estatus, len);
-	llist_add(&estatus_node->llnode, &ghes_estatus_llist);
-#endif
-}
-
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
 	struct ghes *ghes;
@@ -967,26 +1001,6 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 	return ret;
 }
 
-static unsigned long ghes_esource_prealloc_size(
-	const struct acpi_hest_generic *generic)
-{
-	unsigned long block_length, prealloc_records, prealloc_size;
-
-	block_length = min_t(unsigned long, generic->error_block_length,
-			     GHES_ESTATUS_MAX_SIZE);
-	prealloc_records = max_t(unsigned long,
-				 generic->records_to_preallocate, 1);
-	prealloc_size = min_t(unsigned long, block_length * prealloc_records,
-			      GHES_ESOURCE_PREALLOC_MAX_SIZE);
-
-	return prealloc_size;
-}
-
-static void ghes_estatus_pool_shrink(unsigned long len)
-{
-	ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
-}
-
 static void ghes_nmi_add(struct ghes *ghes)
 {
 	unsigned long len;
@@ -1018,14 +1032,9 @@ static void ghes_nmi_remove(struct ghes *ghes)
 	ghes_estatus_pool_shrink(len);
 }
 
-static void ghes_nmi_init_cxt(void)
-{
-	init_irq_work(&ghes_proc_irq_work, ghes_proc_in_irq);
-}
 #else /* CONFIG_HAVE_ACPI_APEI_NMI */
 static inline void ghes_nmi_add(struct ghes *ghes) { }
 static inline void ghes_nmi_remove(struct ghes *ghes) { }
-static inline void ghes_nmi_init_cxt(void) { }
 #endif /* CONFIG_HAVE_ACPI_APEI_NMI */
 
 static int ghes_probe(struct platform_device *ghes_dev)
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
@ 2018-04-27 15:34   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:34 UTC (permalink / raw)
  To: linux-arm-kernel

To support asynchronous NMI-like notifications on arm64 we need to use
the estatus-queue. These patches refactor it to allow multiple APEI
notification types to use it.

First we move the estatus-queue code higher in the file so that any
notify_foo() handler can make use of it.

This patch moves code around ... and makes the following trivial change:
Freshen the dated comment above ghes_estatus_llist. printk() is no
longer the issue, its the helpers like memory_failure_queue() that
still aren't nmi safe.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>

Notes for cover letter:
ghes.c has three things all called 'estatus'. One is a pool of memory
that has a static size, and is grown/shrunk when new NMI users are
allocated.
The second is the cache, this holds recent notifications so we can
suppress notifications we've already handled.
The last is the queue, which hold data from NMI notifications (in pool
memory) that can't be handled immediatly.
---
 drivers/acpi/apei/ghes.c | 265 ++++++++++++++++++++++++-----------------------
 1 file changed, 137 insertions(+), 128 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1efefe919555..e2af91c92135 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -545,6 +545,16 @@ static int ghes_print_estatus(const char *pfx,
 	return 0;
 }
 
+static void __ghes_panic(struct ghes *ghes)
+{
+	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
+
+	/* reboot to log the error! */
+	if (!panic_timeout)
+		panic_timeout = ghes_panic_timeout;
+	panic("Fatal hardware error!");
+}
+
 /*
  * GHES error status reporting throttle, to report more kinds of
  * errors, instead of just most frequently occurred errors.
@@ -672,6 +682,133 @@ static void ghes_estatus_cache_add(
 	rcu_read_unlock();
 }
 
+#ifdef CONFIG_HAVE_ACPI_APEI_NMI
+/*
+ * Handlers for CPER records may not be NMI safe. For example,
+ * memory_failure_queue() takes spinlocks and calls schedule_work_on().
+ * In any NMI-like handler, memory from ghes_estatus_pool is used to save
+ * estatus, and added to the ghes_estatus_llist. irq_work_queue() causes
+ * ghes_proc_in_irq() to run in IRQ context where each estatus in
+ * ghes_estatus_llist is processed. Each NMI-like error source must grow
+ * the ghes_estatus_pool to ensure memory is available.
+ *
+ * Memory from the ghes_estatus_pool is also used with the ghes_estatus_cache
+ * to suppress frequent messages.
+ */
+static struct llist_head ghes_estatus_llist;
+static struct irq_work ghes_proc_irq_work;
+
+static void ghes_print_queued_estatus(void)
+{
+	struct llist_node *llnode;
+	struct ghes_estatus_node *estatus_node;
+	struct acpi_hest_generic *generic;
+	struct acpi_hest_generic_status *estatus;
+
+	llnode = llist_del_all(&ghes_estatus_llist);
+	/*
+	 * Because the time order of estatus in list is reversed,
+	 * revert it back to proper order.
+	 */
+	llnode = llist_reverse_order(llnode);
+	while (llnode) {
+		estatus_node = llist_entry(llnode, struct ghes_estatus_node,
+					   llnode);
+		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+		generic = estatus_node->generic;
+		ghes_print_estatus(NULL, generic, estatus);
+		llnode = llnode->next;
+	}
+}
+
+/* Save estatus for further processing in IRQ context */
+static void __process_error(struct ghes *ghes)
+{
+#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+	u32 len, node_len;
+	struct ghes_estatus_node *estatus_node;
+	struct acpi_hest_generic_status *estatus;
+
+	if (ghes_estatus_cached(ghes->estatus))
+		return;
+
+	len = cper_estatus_len(ghes->estatus);
+	node_len = GHES_ESTATUS_NODE_LEN(len);
+
+	estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);
+	if (!estatus_node)
+		return;
+
+	estatus_node->ghes = ghes;
+	estatus_node->generic = ghes->generic;
+	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+	memcpy(estatus, ghes->estatus, len);
+	llist_add(&estatus_node->llnode, &ghes_estatus_llist);
+#endif
+}
+
+static unsigned long ghes_esource_prealloc_size(
+	const struct acpi_hest_generic *generic)
+{
+	unsigned long block_length, prealloc_records, prealloc_size;
+
+	block_length = min_t(unsigned long, generic->error_block_length,
+			     GHES_ESTATUS_MAX_SIZE);
+	prealloc_records = max_t(unsigned long,
+				 generic->records_to_preallocate, 1);
+	prealloc_size = min_t(unsigned long, block_length * prealloc_records,
+			      GHES_ESOURCE_PREALLOC_MAX_SIZE);
+
+	return prealloc_size;
+}
+
+static void ghes_estatus_pool_shrink(unsigned long len)
+{
+	ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
+}
+
+static void ghes_proc_in_irq(struct irq_work *irq_work)
+{
+	struct llist_node *llnode, *next;
+	struct ghes_estatus_node *estatus_node;
+	struct acpi_hest_generic *generic;
+	struct acpi_hest_generic_status *estatus;
+	u32 len, node_len;
+
+	llnode = llist_del_all(&ghes_estatus_llist);
+	/*
+	 * Because the time order of estatus in list is reversed,
+	 * revert it back to proper order.
+	 */
+	llnode = llist_reverse_order(llnode);
+	while (llnode) {
+		next = llnode->next;
+		estatus_node = llist_entry(llnode, struct ghes_estatus_node,
+					   llnode);
+		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+		len = cper_estatus_len(estatus);
+		node_len = GHES_ESTATUS_NODE_LEN(len);
+		ghes_do_proc(estatus_node->ghes, estatus);
+		if (!ghes_estatus_cached(estatus)) {
+			generic = estatus_node->generic;
+			if (ghes_print_estatus(NULL, generic, estatus))
+				ghes_estatus_cache_add(generic, estatus);
+		}
+		gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
+			      node_len);
+		llnode = next;
+	}
+}
+
+static void ghes_nmi_init_cxt(void)
+{
+	init_irq_work(&ghes_proc_irq_work, ghes_proc_in_irq);
+}
+
+#else
+static inline void ghes_nmi_init_cxt(void) { }
+#endif /* CONFIG_HAVE_ACPI_APEI_NMI */
+
 static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
 {
 	int rc;
@@ -687,16 +824,6 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
 	return apei_write(val, &gv2->read_ack_register);
 }
 
-static void __ghes_panic(struct ghes *ghes)
-{
-	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
-
-	/* reboot to log the error! */
-	if (!panic_timeout)
-		panic_timeout = ghes_panic_timeout;
-	panic("Fatal hardware error!");
-}
-
 static int ghes_proc(struct ghes *ghes)
 {
 	int rc;
@@ -828,17 +955,6 @@ static inline void ghes_sea_remove(struct ghes *ghes) { }
 #endif /* CONFIG_ACPI_APEI_SEA */
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
-/*
- * printk is not safe in NMI context.  So in NMI handler, we allocate
- * required memory from lock-less memory allocator
- * (ghes_estatus_pool), save estatus into it, put them into lock-less
- * list (ghes_estatus_llist), then delay printk into IRQ context via
- * irq_work (ghes_proc_irq_work).  ghes_estatus_size_request record
- * required pool size by all NMI error source.
- */
-static struct llist_head ghes_estatus_llist;
-static struct irq_work ghes_proc_irq_work;
-
 /*
  * NMI may be triggered on any CPU, so ghes_in_nmi is used for
  * having only one concurrent reader.
@@ -847,88 +963,6 @@ static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
 
 static LIST_HEAD(ghes_nmi);
 
-static void ghes_proc_in_irq(struct irq_work *irq_work)
-{
-	struct llist_node *llnode, *next;
-	struct ghes_estatus_node *estatus_node;
-	struct acpi_hest_generic *generic;
-	struct acpi_hest_generic_status *estatus;
-	u32 len, node_len;
-
-	llnode = llist_del_all(&ghes_estatus_llist);
-	/*
-	 * Because the time order of estatus in list is reversed,
-	 * revert it back to proper order.
-	 */
-	llnode = llist_reverse_order(llnode);
-	while (llnode) {
-		next = llnode->next;
-		estatus_node = llist_entry(llnode, struct ghes_estatus_node,
-					   llnode);
-		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
-		len = cper_estatus_len(estatus);
-		node_len = GHES_ESTATUS_NODE_LEN(len);
-		ghes_do_proc(estatus_node->ghes, estatus);
-		if (!ghes_estatus_cached(estatus)) {
-			generic = estatus_node->generic;
-			if (ghes_print_estatus(NULL, generic, estatus))
-				ghes_estatus_cache_add(generic, estatus);
-		}
-		gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
-			      node_len);
-		llnode = next;
-	}
-}
-
-static void ghes_print_queued_estatus(void)
-{
-	struct llist_node *llnode;
-	struct ghes_estatus_node *estatus_node;
-	struct acpi_hest_generic *generic;
-	struct acpi_hest_generic_status *estatus;
-
-	llnode = llist_del_all(&ghes_estatus_llist);
-	/*
-	 * Because the time order of estatus in list is reversed,
-	 * revert it back to proper order.
-	 */
-	llnode = llist_reverse_order(llnode);
-	while (llnode) {
-		estatus_node = llist_entry(llnode, struct ghes_estatus_node,
-					   llnode);
-		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
-		generic = estatus_node->generic;
-		ghes_print_estatus(NULL, generic, estatus);
-		llnode = llnode->next;
-	}
-}
-
-/* Save estatus for further processing in IRQ context */
-static void __process_error(struct ghes *ghes)
-{
-#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	u32 len, node_len;
-	struct ghes_estatus_node *estatus_node;
-	struct acpi_hest_generic_status *estatus;
-
-	if (ghes_estatus_cached(ghes->estatus))
-		return;
-
-	len = cper_estatus_len(ghes->estatus);
-	node_len = GHES_ESTATUS_NODE_LEN(len);
-
-	estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);
-	if (!estatus_node)
-		return;
-
-	estatus_node->ghes = ghes;
-	estatus_node->generic = ghes->generic;
-	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
-	memcpy(estatus, ghes->estatus, len);
-	llist_add(&estatus_node->llnode, &ghes_estatus_llist);
-#endif
-}
-
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
 	struct ghes *ghes;
@@ -967,26 +1001,6 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 	return ret;
 }
 
-static unsigned long ghes_esource_prealloc_size(
-	const struct acpi_hest_generic *generic)
-{
-	unsigned long block_length, prealloc_records, prealloc_size;
-
-	block_length = min_t(unsigned long, generic->error_block_length,
-			     GHES_ESTATUS_MAX_SIZE);
-	prealloc_records = max_t(unsigned long,
-				 generic->records_to_preallocate, 1);
-	prealloc_size = min_t(unsigned long, block_length * prealloc_records,
-			      GHES_ESOURCE_PREALLOC_MAX_SIZE);
-
-	return prealloc_size;
-}
-
-static void ghes_estatus_pool_shrink(unsigned long len)
-{
-	ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
-}
-
 static void ghes_nmi_add(struct ghes *ghes)
 {
 	unsigned long len;
@@ -1018,14 +1032,9 @@ static void ghes_nmi_remove(struct ghes *ghes)
 	ghes_estatus_pool_shrink(len);
 }
 
-static void ghes_nmi_init_cxt(void)
-{
-	init_irq_work(&ghes_proc_irq_work, ghes_proc_in_irq);
-}
 #else /* CONFIG_HAVE_ACPI_APEI_NMI */
 static inline void ghes_nmi_add(struct ghes *ghes) { }
 static inline void ghes_nmi_remove(struct ghes *ghes) { }
-static inline void ghes_nmi_init_cxt(void) { }
 #endif /* CONFIG_HAVE_ACPI_APEI_NMI */
 
 static int ghes_probe(struct platform_device *ghes_dev)
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 02/12] ACPI / APEI: Generalise the estatus queue's add/remove and notify code
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-04-27 15:35   ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Xie XiuQi, Marc Zyngier, Catalin Marinas,
	Tyler Baicar, Will Deacon, Dongjiu Geng, linux-mm,
	Borislav Petkov, James Morse, Naoya Horiguchi, kvmarm,
	linux-arm-kernel, Len Brown

To support asynchronous NMI-like notifications on arm64 we need to use
the estatus-queue. These patches refactor it to allow multiple APEI
notification types to use it.

Refactor the estatus queue's pool grow/shrink code and notification
routine from NOTIFY_NMI's handlers. This will allow another notification
method to use the estatus queue without duplicating this code.

This patch adds rcu_read_lock()/rcu_read_unlock() around the list
list_for_each_entry_rcu() walker. These aren't strictly necessary as
the whole nmi_enter/nmi_exit() window is a spooky RCU read-side
critical section.

Keep the oops_begin() call for x86, arm64 doesn't have one of these,
and APEI is the only thing outside arch code calling this..

The existing ghes_estatus_pool_shrink() is folded into the new
ghes_estatus_queue_shrink_pool() as only the queue uses it.

_in_nmi_notify_one() is separate from the rcu-list walker for a later
caller that doesn't need to walk a list.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>

---
Changes since v1:
 * Tidied up _in_nmi_notify_one().

 drivers/acpi/apei/ghes.c | 100 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 65 insertions(+), 35 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e2af91c92135..c8a6c5b0516e 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -747,6 +747,51 @@ static void __process_error(struct ghes *ghes)
 #endif
 }
 
+static int _in_nmi_notify_one(struct ghes *ghes)
+{
+	int sev;
+
+	if (ghes_read_estatus(ghes, 1)) {
+		ghes_clear_estatus(ghes);
+		return -ENOENT;
+	}
+
+	sev = ghes_severity(ghes->estatus->error_severity);
+	if (sev >= GHES_SEV_PANIC) {
+#ifdef CONFIG_X86
+		oops_begin();
+#endif
+		ghes_print_queued_estatus();
+		__ghes_panic(ghes);
+	}
+
+	if (!(ghes->flags & GHES_TO_CLEAR))
+		return 0;
+
+	__process_error(ghes);
+	ghes_clear_estatus(ghes);
+
+	return 0;
+}
+
+static int ghes_estatus_queue_notified(struct list_head *rcu_list)
+{
+	int ret = -ENOENT;
+	struct ghes *ghes;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ghes, rcu_list, list) {
+		if (!_in_nmi_notify_one(ghes))
+			ret = 0;
+	}
+	rcu_read_unlock();
+
+	if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) && ret == 0)
+		irq_work_queue(&ghes_proc_irq_work);
+
+	return ret;
+}
+
 static unsigned long ghes_esource_prealloc_size(
 	const struct acpi_hest_generic *generic)
 {
@@ -762,11 +807,24 @@ static unsigned long ghes_esource_prealloc_size(
 	return prealloc_size;
 }
 
-static void ghes_estatus_pool_shrink(unsigned long len)
+/* After removing a queue user, we can shrink the pool */
+static void ghes_estatus_queue_shrink_pool(struct ghes *ghes)
 {
+	unsigned long len;
+
+	len = ghes_esource_prealloc_size(ghes->generic);
 	ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
 }
 
+/* Before adding a queue user, grow the pool */
+static void ghes_estatus_queue_grow_pool(struct ghes *ghes)
+{
+	unsigned long len;
+
+	len = ghes_esource_prealloc_size(ghes->generic);
+	ghes_estatus_pool_expand(len);
+}
+
 static void ghes_proc_in_irq(struct irq_work *irq_work)
 {
 	struct llist_node *llnode, *next;
@@ -965,48 +1023,22 @@ static LIST_HEAD(ghes_nmi);
 
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
-	struct ghes *ghes;
-	int sev, ret = NMI_DONE;
+	int ret = NMI_DONE;
 
 	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
 		return ret;
 
-	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
-		if (ghes_read_estatus(ghes, 1)) {
-			ghes_clear_estatus(ghes);
-			continue;
-		} else {
-			ret = NMI_HANDLED;
-		}
-
-		sev = ghes_severity(ghes->estatus->error_severity);
-		if (sev >= GHES_SEV_PANIC) {
-			oops_begin();
-			ghes_print_queued_estatus();
-			__ghes_panic(ghes);
-		}
+	if (!ghes_estatus_queue_notified(&ghes_nmi))
+		ret = NMI_HANDLED;
 
-		if (!(ghes->flags & GHES_TO_CLEAR))
-			continue;
-
-		__process_error(ghes);
-		ghes_clear_estatus(ghes);
-	}
-
-#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	if (ret == NMI_HANDLED)
-		irq_work_queue(&ghes_proc_irq_work);
-#endif
 	atomic_dec(&ghes_in_nmi);
 	return ret;
 }
 
 static void ghes_nmi_add(struct ghes *ghes)
 {
-	unsigned long len;
+	ghes_estatus_queue_grow_pool(ghes);
 
-	len = ghes_esource_prealloc_size(ghes->generic);
-	ghes_estatus_pool_expand(len);
 	mutex_lock(&ghes_list_mutex);
 	if (list_empty(&ghes_nmi))
 		register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes");
@@ -1016,8 +1048,6 @@ static void ghes_nmi_add(struct ghes *ghes)
 
 static void ghes_nmi_remove(struct ghes *ghes)
 {
-	unsigned long len;
-
 	mutex_lock(&ghes_list_mutex);
 	list_del_rcu(&ghes->list);
 	if (list_empty(&ghes_nmi))
@@ -1028,8 +1058,8 @@ static void ghes_nmi_remove(struct ghes *ghes)
 	 * freed after NMI handler finishes.
 	 */
 	synchronize_rcu();
-	len = ghes_esource_prealloc_size(ghes->generic);
-	ghes_estatus_pool_shrink(len);
+
+	ghes_estatus_queue_shrink_pool(ghes);
 }
 
 #else /* CONFIG_HAVE_ACPI_APEI_NMI */
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 02/12] ACPI / APEI: Generalise the estatus queue's add/remove and notify code
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

To support asynchronous NMI-like notifications on arm64 we need to use
the estatus-queue. These patches refactor it to allow multiple APEI
notification types to use it.

Refactor the estatus queue's pool grow/shrink code and notification
routine from NOTIFY_NMI's handlers. This will allow another notification
method to use the estatus queue without duplicating this code.

This patch adds rcu_read_lock()/rcu_read_unlock() around the list
list_for_each_entry_rcu() walker. These aren't strictly necessary as
the whole nmi_enter/nmi_exit() window is a spooky RCU read-side
critical section.

Keep the oops_begin() call for x86, arm64 doesn't have one of these,
and APEI is the only thing outside arch code calling this..

The existing ghes_estatus_pool_shrink() is folded into the new
ghes_estatus_queue_shrink_pool() as only the queue uses it.

_in_nmi_notify_one() is separate from the rcu-list walker for a later
caller that doesn't need to walk a list.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>

---
Changes since v1:
 * Tidied up _in_nmi_notify_one().

 drivers/acpi/apei/ghes.c | 100 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 65 insertions(+), 35 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e2af91c92135..c8a6c5b0516e 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -747,6 +747,51 @@ static void __process_error(struct ghes *ghes)
 #endif
 }
 
+static int _in_nmi_notify_one(struct ghes *ghes)
+{
+	int sev;
+
+	if (ghes_read_estatus(ghes, 1)) {
+		ghes_clear_estatus(ghes);
+		return -ENOENT;
+	}
+
+	sev = ghes_severity(ghes->estatus->error_severity);
+	if (sev >= GHES_SEV_PANIC) {
+#ifdef CONFIG_X86
+		oops_begin();
+#endif
+		ghes_print_queued_estatus();
+		__ghes_panic(ghes);
+	}
+
+	if (!(ghes->flags & GHES_TO_CLEAR))
+		return 0;
+
+	__process_error(ghes);
+	ghes_clear_estatus(ghes);
+
+	return 0;
+}
+
+static int ghes_estatus_queue_notified(struct list_head *rcu_list)
+{
+	int ret = -ENOENT;
+	struct ghes *ghes;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ghes, rcu_list, list) {
+		if (!_in_nmi_notify_one(ghes))
+			ret = 0;
+	}
+	rcu_read_unlock();
+
+	if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) && ret == 0)
+		irq_work_queue(&ghes_proc_irq_work);
+
+	return ret;
+}
+
 static unsigned long ghes_esource_prealloc_size(
 	const struct acpi_hest_generic *generic)
 {
@@ -762,11 +807,24 @@ static unsigned long ghes_esource_prealloc_size(
 	return prealloc_size;
 }
 
-static void ghes_estatus_pool_shrink(unsigned long len)
+/* After removing a queue user, we can shrink the pool */
+static void ghes_estatus_queue_shrink_pool(struct ghes *ghes)
 {
+	unsigned long len;
+
+	len = ghes_esource_prealloc_size(ghes->generic);
 	ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
 }
 
+/* Before adding a queue user, grow the pool */
+static void ghes_estatus_queue_grow_pool(struct ghes *ghes)
+{
+	unsigned long len;
+
+	len = ghes_esource_prealloc_size(ghes->generic);
+	ghes_estatus_pool_expand(len);
+}
+
 static void ghes_proc_in_irq(struct irq_work *irq_work)
 {
 	struct llist_node *llnode, *next;
@@ -965,48 +1023,22 @@ static LIST_HEAD(ghes_nmi);
 
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
-	struct ghes *ghes;
-	int sev, ret = NMI_DONE;
+	int ret = NMI_DONE;
 
 	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
 		return ret;
 
-	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
-		if (ghes_read_estatus(ghes, 1)) {
-			ghes_clear_estatus(ghes);
-			continue;
-		} else {
-			ret = NMI_HANDLED;
-		}
-
-		sev = ghes_severity(ghes->estatus->error_severity);
-		if (sev >= GHES_SEV_PANIC) {
-			oops_begin();
-			ghes_print_queued_estatus();
-			__ghes_panic(ghes);
-		}
+	if (!ghes_estatus_queue_notified(&ghes_nmi))
+		ret = NMI_HANDLED;
 
-		if (!(ghes->flags & GHES_TO_CLEAR))
-			continue;
-
-		__process_error(ghes);
-		ghes_clear_estatus(ghes);
-	}
-
-#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	if (ret == NMI_HANDLED)
-		irq_work_queue(&ghes_proc_irq_work);
-#endif
 	atomic_dec(&ghes_in_nmi);
 	return ret;
 }
 
 static void ghes_nmi_add(struct ghes *ghes)
 {
-	unsigned long len;
+	ghes_estatus_queue_grow_pool(ghes);
 
-	len = ghes_esource_prealloc_size(ghes->generic);
-	ghes_estatus_pool_expand(len);
 	mutex_lock(&ghes_list_mutex);
 	if (list_empty(&ghes_nmi))
 		register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes");
@@ -1016,8 +1048,6 @@ static void ghes_nmi_add(struct ghes *ghes)
 
 static void ghes_nmi_remove(struct ghes *ghes)
 {
-	unsigned long len;
-
 	mutex_lock(&ghes_list_mutex);
 	list_del_rcu(&ghes->list);
 	if (list_empty(&ghes_nmi))
@@ -1028,8 +1058,8 @@ static void ghes_nmi_remove(struct ghes *ghes)
 	 * freed after NMI handler finishes.
 	 */
 	synchronize_rcu();
-	len = ghes_esource_prealloc_size(ghes->generic);
-	ghes_estatus_pool_shrink(len);
+
+	ghes_estatus_queue_shrink_pool(ghes);
 }
 
 #else /* CONFIG_HAVE_ACPI_APEI_NMI */
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 02/12] ACPI / APEI: Generalise the estatus queue's add/remove and notify code
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

To support asynchronous NMI-like notifications on arm64 we need to use
the estatus-queue. These patches refactor it to allow multiple APEI
notification types to use it.

Refactor the estatus queue's pool grow/shrink code and notification
routine from NOTIFY_NMI's handlers. This will allow another notification
method to use the estatus queue without duplicating this code.

This patch adds rcu_read_lock()/rcu_read_unlock() around the list
list_for_each_entry_rcu() walker. These aren't strictly necessary as
the whole nmi_enter/nmi_exit() window is a spooky RCU read-side
critical section.

Keep the oops_begin() call for x86, arm64 doesn't have one of these,
and APEI is the only thing outside arch code calling this..

The existing ghes_estatus_pool_shrink() is folded into the new
ghes_estatus_queue_shrink_pool() as only the queue uses it.

_in_nmi_notify_one() is separate from the rcu-list walker for a later
caller that doesn't need to walk a list.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>

---
Changes since v1:
 * Tidied up _in_nmi_notify_one().

 drivers/acpi/apei/ghes.c | 100 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 65 insertions(+), 35 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e2af91c92135..c8a6c5b0516e 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -747,6 +747,51 @@ static void __process_error(struct ghes *ghes)
 #endif
 }
 
+static int _in_nmi_notify_one(struct ghes *ghes)
+{
+	int sev;
+
+	if (ghes_read_estatus(ghes, 1)) {
+		ghes_clear_estatus(ghes);
+		return -ENOENT;
+	}
+
+	sev = ghes_severity(ghes->estatus->error_severity);
+	if (sev >= GHES_SEV_PANIC) {
+#ifdef CONFIG_X86
+		oops_begin();
+#endif
+		ghes_print_queued_estatus();
+		__ghes_panic(ghes);
+	}
+
+	if (!(ghes->flags & GHES_TO_CLEAR))
+		return 0;
+
+	__process_error(ghes);
+	ghes_clear_estatus(ghes);
+
+	return 0;
+}
+
+static int ghes_estatus_queue_notified(struct list_head *rcu_list)
+{
+	int ret = -ENOENT;
+	struct ghes *ghes;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ghes, rcu_list, list) {
+		if (!_in_nmi_notify_one(ghes))
+			ret = 0;
+	}
+	rcu_read_unlock();
+
+	if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) && ret == 0)
+		irq_work_queue(&ghes_proc_irq_work);
+
+	return ret;
+}
+
 static unsigned long ghes_esource_prealloc_size(
 	const struct acpi_hest_generic *generic)
 {
@@ -762,11 +807,24 @@ static unsigned long ghes_esource_prealloc_size(
 	return prealloc_size;
 }
 
-static void ghes_estatus_pool_shrink(unsigned long len)
+/* After removing a queue user, we can shrink the pool */
+static void ghes_estatus_queue_shrink_pool(struct ghes *ghes)
 {
+	unsigned long len;
+
+	len = ghes_esource_prealloc_size(ghes->generic);
 	ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
 }
 
+/* Before adding a queue user, grow the pool */
+static void ghes_estatus_queue_grow_pool(struct ghes *ghes)
+{
+	unsigned long len;
+
+	len = ghes_esource_prealloc_size(ghes->generic);
+	ghes_estatus_pool_expand(len);
+}
+
 static void ghes_proc_in_irq(struct irq_work *irq_work)
 {
 	struct llist_node *llnode, *next;
@@ -965,48 +1023,22 @@ static LIST_HEAD(ghes_nmi);
 
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
-	struct ghes *ghes;
-	int sev, ret = NMI_DONE;
+	int ret = NMI_DONE;
 
 	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
 		return ret;
 
-	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
-		if (ghes_read_estatus(ghes, 1)) {
-			ghes_clear_estatus(ghes);
-			continue;
-		} else {
-			ret = NMI_HANDLED;
-		}
-
-		sev = ghes_severity(ghes->estatus->error_severity);
-		if (sev >= GHES_SEV_PANIC) {
-			oops_begin();
-			ghes_print_queued_estatus();
-			__ghes_panic(ghes);
-		}
+	if (!ghes_estatus_queue_notified(&ghes_nmi))
+		ret = NMI_HANDLED;
 
-		if (!(ghes->flags & GHES_TO_CLEAR))
-			continue;
-
-		__process_error(ghes);
-		ghes_clear_estatus(ghes);
-	}
-
-#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	if (ret == NMI_HANDLED)
-		irq_work_queue(&ghes_proc_irq_work);
-#endif
 	atomic_dec(&ghes_in_nmi);
 	return ret;
 }
 
 static void ghes_nmi_add(struct ghes *ghes)
 {
-	unsigned long len;
+	ghes_estatus_queue_grow_pool(ghes);
 
-	len = ghes_esource_prealloc_size(ghes->generic);
-	ghes_estatus_pool_expand(len);
 	mutex_lock(&ghes_list_mutex);
 	if (list_empty(&ghes_nmi))
 		register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes");
@@ -1016,8 +1048,6 @@ static void ghes_nmi_add(struct ghes *ghes)
 
 static void ghes_nmi_remove(struct ghes *ghes)
 {
-	unsigned long len;
-
 	mutex_lock(&ghes_list_mutex);
 	list_del_rcu(&ghes->list);
 	if (list_empty(&ghes_nmi))
@@ -1028,8 +1058,8 @@ static void ghes_nmi_remove(struct ghes *ghes)
 	 * freed after NMI handler finishes.
 	 */
 	synchronize_rcu();
-	len = ghes_esource_prealloc_size(ghes->generic);
-	ghes_estatus_pool_shrink(len);
+
+	ghes_estatus_queue_shrink_pool(ghes);
 }
 
 #else /* CONFIG_HAVE_ACPI_APEI_NMI */
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 03/12] ACPI / APEI: don't wait to serialise with oops messages when panic()ing
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-04-27 15:35   ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

oops_begin() exists to group printk() messages with the oops message
printed by die(). To reach this caller we know that platform firmware
took this error first, then notified the OS via NMI with a 'panic'
severity.

Don't wait for another CPU to release the die-lock before we can
panic(), our only goal is to print this fatal error and panic().

This code is always called in_nmi(), and since 42a0bb3f7138 ("printk/nmi:
generic solution for safe printk in NMI"), it has been safe to call
printk() from this context. Messages are batched in a per-cpu buffer
and printed via irq-work, or a call back from panic().

Link: https://patchwork.kernel.org/patch/10313555/
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/acpi/apei/ghes.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index c8a6c5b0516e..ed8ad9898365 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -33,7 +33,6 @@
 #include <linux/interrupt.h>
 #include <linux/timer.h>
 #include <linux/cper.h>
-#include <linux/kdebug.h>
 #include <linux/platform_device.h>
 #include <linux/mutex.h>
 #include <linux/ratelimit.h>
@@ -758,9 +757,6 @@ static int _in_nmi_notify_one(struct ghes *ghes)
 
 	sev = ghes_severity(ghes->estatus->error_severity);
 	if (sev >= GHES_SEV_PANIC) {
-#ifdef CONFIG_X86
-		oops_begin();
-#endif
 		ghes_print_queued_estatus();
 		__ghes_panic(ghes);
 	}
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 03/12] ACPI / APEI: don't wait to serialise with oops messages when panic()ing
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

oops_begin() exists to group printk() messages with the oops message
printed by die(). To reach this caller we know that platform firmware
took this error first, then notified the OS via NMI with a 'panic'
severity.

Don't wait for another CPU to release the die-lock before we can
panic(), our only goal is to print this fatal error and panic().

This code is always called in_nmi(), and since 42a0bb3f7138 ("printk/nmi:
generic solution for safe printk in NMI"), it has been safe to call
printk() from this context. Messages are batched in a per-cpu buffer
and printed via irq-work, or a call back from panic().

Link: https://patchwork.kernel.org/patch/10313555/
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/acpi/apei/ghes.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index c8a6c5b0516e..ed8ad9898365 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -33,7 +33,6 @@
 #include <linux/interrupt.h>
 #include <linux/timer.h>
 #include <linux/cper.h>
-#include <linux/kdebug.h>
 #include <linux/platform_device.h>
 #include <linux/mutex.h>
 #include <linux/ratelimit.h>
@@ -758,9 +757,6 @@ static int _in_nmi_notify_one(struct ghes *ghes)
 
 	sev = ghes_severity(ghes->estatus->error_severity);
 	if (sev >= GHES_SEV_PANIC) {
-#ifdef CONFIG_X86
-		oops_begin();
-#endif
 		ghes_print_queued_estatus();
 		__ghes_panic(ghes);
 	}
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 03/12] ACPI / APEI: don't wait to serialise with oops messages when panic()ing
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

oops_begin() exists to group printk() messages with the oops message
printed by die(). To reach this caller we know that platform firmware
took this error first, then notified the OS via NMI with a 'panic'
severity.

Don't wait for another CPU to release the die-lock before we can
panic(), our only goal is to print this fatal error and panic().

This code is always called in_nmi(), and since 42a0bb3f7138 ("printk/nmi:
generic solution for safe printk in NMI"), it has been safe to call
printk() from this context. Messages are batched in a per-cpu buffer
and printed via irq-work, or a call back from panic().

Link: https://patchwork.kernel.org/patch/10313555/
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/acpi/apei/ghes.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index c8a6c5b0516e..ed8ad9898365 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -33,7 +33,6 @@
 #include <linux/interrupt.h>
 #include <linux/timer.h>
 #include <linux/cper.h>
-#include <linux/kdebug.h>
 #include <linux/platform_device.h>
 #include <linux/mutex.h>
 #include <linux/ratelimit.h>
@@ -758,9 +757,6 @@ static int _in_nmi_notify_one(struct ghes *ghes)
 
 	sev = ghes_severity(ghes->estatus->error_severity);
 	if (sev >= GHES_SEV_PANIC) {
-#ifdef CONFIG_X86
-		oops_begin();
-#endif
 		ghes_print_queued_estatus();
 		__ghes_panic(ghes);
 	}
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 04/12] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-04-27 15:35   ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

Now that the estatus queue can be used by more than one notification
method, we can move notifications that have NMI-like behaviour over to
it, and start abstracting GHES's single in_nmi() path.

Switch NOTIFY_SEA over to use the estatus queue. This makes it behave
in the same way as x86's NOTIFY_NMI.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
---
 drivers/acpi/apei/ghes.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index ed8ad9898365..1859f27c37ff 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -58,6 +58,10 @@
 
 #define GHES_PFX	"GHES: "
 
+#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA)
+#define WANT_NMI_ESTATUS_QUEUE	1
+#endif
+
 #define GHES_ESTATUS_MAX_SIZE		65536
 #define GHES_ESOURCE_PREALLOC_MAX_SIZE	65536
 
@@ -681,7 +685,7 @@ static void ghes_estatus_cache_add(
 	rcu_read_unlock();
 }
 
-#ifdef CONFIG_HAVE_ACPI_APEI_NMI
+#ifdef WANT_NMI_ESTATUS_QUEUE
 /*
  * Handlers for CPER records may not be NMI safe. For example,
  * memory_failure_queue() takes spinlocks and calls schedule_work_on().
@@ -861,7 +865,7 @@ static void ghes_nmi_init_cxt(void)
 
 #else
 static inline void ghes_nmi_init_cxt(void) { }
-#endif /* CONFIG_HAVE_ACPI_APEI_NMI */
+#endif /* WANT_NMI_ESTATUS_QUEUE */
 
 static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
 {
@@ -977,20 +981,13 @@ static LIST_HEAD(ghes_sea);
  */
 int ghes_notify_sea(void)
 {
-	struct ghes *ghes;
-	int ret = -ENOENT;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
-		if (!ghes_proc(ghes))
-			ret = 0;
-	}
-	rcu_read_unlock();
-	return ret;
+	return ghes_estatus_queue_notified(&ghes_sea);
 }
 
 static void ghes_sea_add(struct ghes *ghes)
 {
+	ghes_estatus_queue_grow_pool(ghes);
+
 	mutex_lock(&ghes_list_mutex);
 	list_add_rcu(&ghes->list, &ghes_sea);
 	mutex_unlock(&ghes_list_mutex);
@@ -1002,6 +999,8 @@ static void ghes_sea_remove(struct ghes *ghes)
 	list_del_rcu(&ghes->list);
 	mutex_unlock(&ghes_list_mutex);
 	synchronize_rcu();
+
+	ghes_estatus_queue_shrink_pool(ghes);
 }
 #else /* CONFIG_ACPI_APEI_SEA */
 static inline void ghes_sea_add(struct ghes *ghes) { }
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 04/12] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

Now that the estatus queue can be used by more than one notification
method, we can move notifications that have NMI-like behaviour over to
it, and start abstracting GHES's single in_nmi() path.

Switch NOTIFY_SEA over to use the estatus queue. This makes it behave
in the same way as x86's NOTIFY_NMI.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
---
 drivers/acpi/apei/ghes.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index ed8ad9898365..1859f27c37ff 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -58,6 +58,10 @@
 
 #define GHES_PFX	"GHES: "
 
+#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA)
+#define WANT_NMI_ESTATUS_QUEUE	1
+#endif
+
 #define GHES_ESTATUS_MAX_SIZE		65536
 #define GHES_ESOURCE_PREALLOC_MAX_SIZE	65536
 
@@ -681,7 +685,7 @@ static void ghes_estatus_cache_add(
 	rcu_read_unlock();
 }
 
-#ifdef CONFIG_HAVE_ACPI_APEI_NMI
+#ifdef WANT_NMI_ESTATUS_QUEUE
 /*
  * Handlers for CPER records may not be NMI safe. For example,
  * memory_failure_queue() takes spinlocks and calls schedule_work_on().
@@ -861,7 +865,7 @@ static void ghes_nmi_init_cxt(void)
 
 #else
 static inline void ghes_nmi_init_cxt(void) { }
-#endif /* CONFIG_HAVE_ACPI_APEI_NMI */
+#endif /* WANT_NMI_ESTATUS_QUEUE */
 
 static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
 {
@@ -977,20 +981,13 @@ static LIST_HEAD(ghes_sea);
  */
 int ghes_notify_sea(void)
 {
-	struct ghes *ghes;
-	int ret = -ENOENT;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
-		if (!ghes_proc(ghes))
-			ret = 0;
-	}
-	rcu_read_unlock();
-	return ret;
+	return ghes_estatus_queue_notified(&ghes_sea);
 }
 
 static void ghes_sea_add(struct ghes *ghes)
 {
+	ghes_estatus_queue_grow_pool(ghes);
+
 	mutex_lock(&ghes_list_mutex);
 	list_add_rcu(&ghes->list, &ghes_sea);
 	mutex_unlock(&ghes_list_mutex);
@@ -1002,6 +999,8 @@ static void ghes_sea_remove(struct ghes *ghes)
 	list_del_rcu(&ghes->list);
 	mutex_unlock(&ghes_list_mutex);
 	synchronize_rcu();
+
+	ghes_estatus_queue_shrink_pool(ghes);
 }
 #else /* CONFIG_ACPI_APEI_SEA */
 static inline void ghes_sea_add(struct ghes *ghes) { }
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 04/12] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

Now that the estatus queue can be used by more than one notification
method, we can move notifications that have NMI-like behaviour over to
it, and start abstracting GHES's single in_nmi() path.

Switch NOTIFY_SEA over to use the estatus queue. This makes it behave
in the same way as x86's NOTIFY_NMI.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
---
 drivers/acpi/apei/ghes.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index ed8ad9898365..1859f27c37ff 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -58,6 +58,10 @@
 
 #define GHES_PFX	"GHES: "
 
+#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA)
+#define WANT_NMI_ESTATUS_QUEUE	1
+#endif
+
 #define GHES_ESTATUS_MAX_SIZE		65536
 #define GHES_ESOURCE_PREALLOC_MAX_SIZE	65536
 
@@ -681,7 +685,7 @@ static void ghes_estatus_cache_add(
 	rcu_read_unlock();
 }
 
-#ifdef CONFIG_HAVE_ACPI_APEI_NMI
+#ifdef WANT_NMI_ESTATUS_QUEUE
 /*
  * Handlers for CPER records may not be NMI safe. For example,
  * memory_failure_queue() takes spinlocks and calls schedule_work_on().
@@ -861,7 +865,7 @@ static void ghes_nmi_init_cxt(void)
 
 #else
 static inline void ghes_nmi_init_cxt(void) { }
-#endif /* CONFIG_HAVE_ACPI_APEI_NMI */
+#endif /* WANT_NMI_ESTATUS_QUEUE */
 
 static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
 {
@@ -977,20 +981,13 @@ static LIST_HEAD(ghes_sea);
  */
 int ghes_notify_sea(void)
 {
-	struct ghes *ghes;
-	int ret = -ENOENT;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
-		if (!ghes_proc(ghes))
-			ret = 0;
-	}
-	rcu_read_unlock();
-	return ret;
+	return ghes_estatus_queue_notified(&ghes_sea);
 }
 
 static void ghes_sea_add(struct ghes *ghes)
 {
+	ghes_estatus_queue_grow_pool(ghes);
+
 	mutex_lock(&ghes_list_mutex);
 	list_add_rcu(&ghes->list, &ghes_sea);
 	mutex_unlock(&ghes_list_mutex);
@@ -1002,6 +999,8 @@ static void ghes_sea_remove(struct ghes *ghes)
 	list_del_rcu(&ghes->list);
 	mutex_unlock(&ghes_list_mutex);
 	synchronize_rcu();
+
+	ghes_estatus_queue_shrink_pool(ghes);
 }
 #else /* CONFIG_ACPI_APEI_SEA */
 static inline void ghes_sea_add(struct ghes *ghes) { }
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 05/12] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-04-27 15:35   ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

To split up APEIs in_nmi() path, we need any nmi-like callers to always
be in_nmi(). KVM shouldn't have to know about this, pull the RAS plumbing
out into a header file.

Currently guest synchronous external aborts are claimed as RAS
notifications by handle_guest_sea(), which is hidden in the arch codes
mm/fault.c. 32bit gets a dummy declaration in system_misc.h.

There is going to be more of this in the future if/when we support
the SError-based firmware-first notification mechanism and/or
kernel-first notifications for both synchronous external abort and
SError. Each of these will come with some Kconfig symbols and a
handful of header files.

Create a header file for all this.

This patch gives handle_guest_sea() a 'kvm_' prefix, and moves the
declarations to kvm_ras.h as preparation for a future patch that moves
the ACPI-specific RAS code out of mm/fault.c.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_ras.h       | 14 ++++++++++++++
 arch/arm/include/asm/system_misc.h   |  5 -----
 arch/arm64/include/asm/kvm_ras.h     | 11 +++++++++++
 arch/arm64/include/asm/system_misc.h |  2 --
 arch/arm64/mm/fault.c                |  2 +-
 virt/kvm/arm/mmu.c                   |  4 ++--
 6 files changed, 28 insertions(+), 10 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_ras.h
 create mode 100644 arch/arm64/include/asm/kvm_ras.h

diff --git a/arch/arm/include/asm/kvm_ras.h b/arch/arm/include/asm/kvm_ras.h
new file mode 100644
index 000000000000..aaff56bf338f
--- /dev/null
+++ b/arch/arm/include/asm/kvm_ras.h
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 - Arm Ltd
+
+#ifndef __ARM_KVM_RAS_H__
+#define __ARM_KVM_RAS_H__
+
+#include <linux/types.h>
+
+static inline int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
+{
+	return -1;
+}
+
+#endif /* __ARM_KVM_RAS_H__ */
diff --git a/arch/arm/include/asm/system_misc.h b/arch/arm/include/asm/system_misc.h
index 78f6db114faf..51e5ab50b35f 100644
--- a/arch/arm/include/asm/system_misc.h
+++ b/arch/arm/include/asm/system_misc.h
@@ -23,11 +23,6 @@ extern void (*arm_pm_idle)(void);
 
 extern unsigned int user_debug;
 
-static inline int handle_guest_sea(phys_addr_t addr, unsigned int esr)
-{
-	return -1;
-}
-
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_ARM_SYSTEM_MISC_H */
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
new file mode 100644
index 000000000000..5f72b07b7912
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_ras.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 - Arm Ltd
+
+#ifndef __ARM64_KVM_RAS_H__
+#define __ARM64_KVM_RAS_H__
+
+#include <linux/types.h>
+
+int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr);
+
+#endif /* __ARM64_KVM_RAS_H__ */
diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 28893a0b141d..48ded3628a89 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -45,8 +45,6 @@ extern void __show_regs(struct pt_regs *);
 
 extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
 
-int handle_guest_sea(phys_addr_t addr, unsigned int esr);
-
 #endif	/* __ASSEMBLY__ */
 
 #endif	/* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 4165485e8b6e..d61a886afec7 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -662,7 +662,7 @@ static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 63"			},
 };
 
-int handle_guest_sea(phys_addr_t addr, unsigned int esr)
+int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
 {
 	int ret = -ENOENT;
 
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7f6a944db23d..673141de1e67 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -27,10 +27,10 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_mmio.h>
+#include <asm/kvm_ras.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/virt.h>
-#include <asm/system_misc.h>
 
 #include "trace.h"
 
@@ -1647,7 +1647,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * For RAS the host kernel may handle this abort.
 		 * There is no need to pass the error into the guest.
 		 */
-		if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
+		if (!kvm_handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
 			return 1;
 
 		if (unlikely(!is_iabt)) {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 05/12] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

To split up APEIs in_nmi() path, we need any nmi-like callers to always
be in_nmi(). KVM shouldn't have to know about this, pull the RAS plumbing
out into a header file.

Currently guest synchronous external aborts are claimed as RAS
notifications by handle_guest_sea(), which is hidden in the arch codes
mm/fault.c. 32bit gets a dummy declaration in system_misc.h.

There is going to be more of this in the future if/when we support
the SError-based firmware-first notification mechanism and/or
kernel-first notifications for both synchronous external abort and
SError. Each of these will come with some Kconfig symbols and a
handful of header files.

Create a header file for all this.

This patch gives handle_guest_sea() a 'kvm_' prefix, and moves the
declarations to kvm_ras.h as preparation for a future patch that moves
the ACPI-specific RAS code out of mm/fault.c.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_ras.h       | 14 ++++++++++++++
 arch/arm/include/asm/system_misc.h   |  5 -----
 arch/arm64/include/asm/kvm_ras.h     | 11 +++++++++++
 arch/arm64/include/asm/system_misc.h |  2 --
 arch/arm64/mm/fault.c                |  2 +-
 virt/kvm/arm/mmu.c                   |  4 ++--
 6 files changed, 28 insertions(+), 10 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_ras.h
 create mode 100644 arch/arm64/include/asm/kvm_ras.h

diff --git a/arch/arm/include/asm/kvm_ras.h b/arch/arm/include/asm/kvm_ras.h
new file mode 100644
index 000000000000..aaff56bf338f
--- /dev/null
+++ b/arch/arm/include/asm/kvm_ras.h
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 - Arm Ltd
+
+#ifndef __ARM_KVM_RAS_H__
+#define __ARM_KVM_RAS_H__
+
+#include <linux/types.h>
+
+static inline int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
+{
+	return -1;
+}
+
+#endif /* __ARM_KVM_RAS_H__ */
diff --git a/arch/arm/include/asm/system_misc.h b/arch/arm/include/asm/system_misc.h
index 78f6db114faf..51e5ab50b35f 100644
--- a/arch/arm/include/asm/system_misc.h
+++ b/arch/arm/include/asm/system_misc.h
@@ -23,11 +23,6 @@ extern void (*arm_pm_idle)(void);
 
 extern unsigned int user_debug;
 
-static inline int handle_guest_sea(phys_addr_t addr, unsigned int esr)
-{
-	return -1;
-}
-
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_ARM_SYSTEM_MISC_H */
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
new file mode 100644
index 000000000000..5f72b07b7912
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_ras.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 - Arm Ltd
+
+#ifndef __ARM64_KVM_RAS_H__
+#define __ARM64_KVM_RAS_H__
+
+#include <linux/types.h>
+
+int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr);
+
+#endif /* __ARM64_KVM_RAS_H__ */
diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 28893a0b141d..48ded3628a89 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -45,8 +45,6 @@ extern void __show_regs(struct pt_regs *);
 
 extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
 
-int handle_guest_sea(phys_addr_t addr, unsigned int esr);
-
 #endif	/* __ASSEMBLY__ */
 
 #endif	/* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 4165485e8b6e..d61a886afec7 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -662,7 +662,7 @@ static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 63"			},
 };
 
-int handle_guest_sea(phys_addr_t addr, unsigned int esr)
+int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
 {
 	int ret = -ENOENT;
 
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7f6a944db23d..673141de1e67 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -27,10 +27,10 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_mmio.h>
+#include <asm/kvm_ras.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/virt.h>
-#include <asm/system_misc.h>
 
 #include "trace.h"
 
@@ -1647,7 +1647,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * For RAS the host kernel may handle this abort.
 		 * There is no need to pass the error into the guest.
 		 */
-		if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
+		if (!kvm_handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
 			return 1;
 
 		if (unlikely(!is_iabt)) {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 05/12] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

To split up APEIs in_nmi() path, we need any nmi-like callers to always
be in_nmi(). KVM shouldn't have to know about this, pull the RAS plumbing
out into a header file.

Currently guest synchronous external aborts are claimed as RAS
notifications by handle_guest_sea(), which is hidden in the arch codes
mm/fault.c. 32bit gets a dummy declaration in system_misc.h.

There is going to be more of this in the future if/when we support
the SError-based firmware-first notification mechanism and/or
kernel-first notifications for both synchronous external abort and
SError. Each of these will come with some Kconfig symbols and a
handful of header files.

Create a header file for all this.

This patch gives handle_guest_sea() a 'kvm_' prefix, and moves the
declarations to kvm_ras.h as preparation for a future patch that moves
the ACPI-specific RAS code out of mm/fault.c.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_ras.h       | 14 ++++++++++++++
 arch/arm/include/asm/system_misc.h   |  5 -----
 arch/arm64/include/asm/kvm_ras.h     | 11 +++++++++++
 arch/arm64/include/asm/system_misc.h |  2 --
 arch/arm64/mm/fault.c                |  2 +-
 virt/kvm/arm/mmu.c                   |  4 ++--
 6 files changed, 28 insertions(+), 10 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_ras.h
 create mode 100644 arch/arm64/include/asm/kvm_ras.h

diff --git a/arch/arm/include/asm/kvm_ras.h b/arch/arm/include/asm/kvm_ras.h
new file mode 100644
index 000000000000..aaff56bf338f
--- /dev/null
+++ b/arch/arm/include/asm/kvm_ras.h
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 - Arm Ltd
+
+#ifndef __ARM_KVM_RAS_H__
+#define __ARM_KVM_RAS_H__
+
+#include <linux/types.h>
+
+static inline int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
+{
+	return -1;
+}
+
+#endif /* __ARM_KVM_RAS_H__ */
diff --git a/arch/arm/include/asm/system_misc.h b/arch/arm/include/asm/system_misc.h
index 78f6db114faf..51e5ab50b35f 100644
--- a/arch/arm/include/asm/system_misc.h
+++ b/arch/arm/include/asm/system_misc.h
@@ -23,11 +23,6 @@ extern void (*arm_pm_idle)(void);
 
 extern unsigned int user_debug;
 
-static inline int handle_guest_sea(phys_addr_t addr, unsigned int esr)
-{
-	return -1;
-}
-
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_ARM_SYSTEM_MISC_H */
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
new file mode 100644
index 000000000000..5f72b07b7912
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_ras.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 - Arm Ltd
+
+#ifndef __ARM64_KVM_RAS_H__
+#define __ARM64_KVM_RAS_H__
+
+#include <linux/types.h>
+
+int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr);
+
+#endif /* __ARM64_KVM_RAS_H__ */
diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 28893a0b141d..48ded3628a89 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -45,8 +45,6 @@ extern void __show_regs(struct pt_regs *);
 
 extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
 
-int handle_guest_sea(phys_addr_t addr, unsigned int esr);
-
 #endif	/* __ASSEMBLY__ */
 
 #endif	/* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 4165485e8b6e..d61a886afec7 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -662,7 +662,7 @@ static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 63"			},
 };
 
-int handle_guest_sea(phys_addr_t addr, unsigned int esr)
+int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
 {
 	int ret = -ENOENT;
 
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7f6a944db23d..673141de1e67 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -27,10 +27,10 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_mmio.h>
+#include <asm/kvm_ras.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/virt.h>
-#include <asm/system_misc.h>
 
 #include "trace.h"
 
@@ -1647,7 +1647,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * For RAS the host kernel may handle this abort.
 		 * There is no need to pass the error into the guest.
 		 */
-		if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
+		if (!kvm_handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
 			return 1;
 
 		if (unlikely(!is_iabt)) {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 06/12] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-04-27 15:35   ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

To split up APEIs in_nmi() path, we need the nmi-like callers to always
be in_nmi(). Add a helper to do the work and claim the notification.

When KVM or the arch code takes an exception that might be a RAS
notification, it asks the APEI firmware-first code whether it wants
to claim the exception. We can then go on to see if (a future)
kernel-first mechanism wants to claim the notification, before
falling through to the existing default behaviour.

The NOTIFY_SEA code was merged before we had multiple, possibly
interacting, NMI-like notifications and the need to consider kernel
first in the future. Make the 'claiming' behaviour explicit.

As we're restructuring the APEI code to allow multiple NMI-like
notifications, any notification that might interrupt interrupts-masked
code must always be wrapped in nmi_enter()/nmi_exit(). This allows APEI
to use in_nmi() to choose between the raw/regular spinlock routines.

We mask SError over this window to prevent an asynchronous RAS error
arriving and tripping 'nmi_enter()'s BUG_ON(in_nmi()).

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
Why does apei_claim_sea() take a pt_regs? This gets used later to take
APEI by the hand through NMI->IRQ context, depending on what we
interrupted.

Changes since v2:
 * Added dummy definition for !ACPI and culled IS_ENABLED() checks.

 arch/arm64/include/asm/acpi.h      |  5 ++++-
 arch/arm64/include/asm/daifflags.h |  1 +
 arch/arm64/include/asm/kvm_ras.h   | 15 ++++++++++++++-
 arch/arm64/kernel/acpi.c           | 30 ++++++++++++++++++++++++++++++
 arch/arm64/mm/fault.c              | 29 +++++------------------------
 5 files changed, 54 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 32f465a80e4e..e8e4f05b4776 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -16,6 +16,7 @@
 #include <linux/psci.h>
 
 #include <asm/cputype.h>
+#include <asm/ptrace.h>
 #include <asm/smp_plat.h>
 #include <asm/tlbflush.h>
 
@@ -89,7 +90,6 @@ struct acpi_madt_generic_interrupt *acpi_cpu_get_madt_gicc(int cpu);
 
 static inline void arch_fix_phys_package_id(int num, u32 slot) { }
 void __init acpi_init_cpus(void);
-
 #else
 static inline void acpi_init_cpus(void) { }
 #endif /* CONFIG_ACPI */
@@ -126,6 +126,9 @@ static inline const char *acpi_get_enable_method(int cpu)
  */
 #define acpi_disable_cmcff 1
 pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr);
+int apei_claim_sea(struct pt_regs *regs);
+#else
+static inline int apei_claim_sea(struct pt_regs *regs) { return -ENOENT; }
 #endif /* CONFIG_ACPI_APEI */
 
 #ifdef CONFIG_ACPI_NUMA
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index 22e4c83de5a5..cbd753855bf3 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -20,6 +20,7 @@
 
 #define DAIF_PROCCTX		0
 #define DAIF_PROCCTX_NOIRQ	PSR_I_BIT
+#define DAIF_ERRCTX		(PSR_I_BIT | PSR_A_BIT)
 
 /* mask/save/unmask/restore all exceptions, including interrupts. */
 static inline void local_daif_mask(void)
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
index 5f72b07b7912..52edc9b3b937 100644
--- a/arch/arm64/include/asm/kvm_ras.h
+++ b/arch/arm64/include/asm/kvm_ras.h
@@ -4,8 +4,21 @@
 #ifndef __ARM64_KVM_RAS_H__
 #define __ARM64_KVM_RAS_H__
 
+#include <linux/acpi.h>
+#include <linux/errno.h>
 #include <linux/types.h>
 
-int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr);
+#include <asm/acpi.h>
+
+/*
+ * Was this synchronous external abort a RAS notification?
+ * Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
+ *
+ * Call with irqs unmasked.
+ */
+static inline int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
+{
+	return apei_claim_sea(NULL);
+}
 
 #endif /* __ARM64_KVM_RAS_H__ */
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 7b09487ff8fb..6a4823a3eb5e 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -33,6 +33,8 @@
 
 #ifdef CONFIG_ACPI_APEI
 # include <linux/efi.h>
+# include <acpi/ghes.h>
+# include <asm/daifflags.h>
 # include <asm/pgtable.h>
 #endif
 
@@ -261,4 +263,32 @@ pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
 		return __pgprot(PROT_NORMAL_NC);
 	return __pgprot(PROT_DEVICE_nGnRnE);
 }
+
+
+/*
+ * Claim Synchronous External Aborts as a firmware first notification.
+ *
+ * Used by KVM and the arch do_sea handler.
+ * @regs may be NULL when called from process context.
+ */
+int apei_claim_sea(struct pt_regs *regs)
+{
+	int err = -ENOENT;
+	unsigned long current_flags = arch_local_save_flags();
+
+	if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA))
+		return err;
+
+	/*
+	 * APEI expects an NMI-like notification to always be called
+	 * in NMI context.
+	 */
+	local_daif_restore(DAIF_ERRCTX);
+	nmi_enter();
+	err = ghes_notify_sea();
+	nmi_exit();
+	local_daif_restore(current_flags);
+
+	return err;
+}
 #endif
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index d61a886afec7..d7e89da0e5df 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -18,6 +18,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/acpi.h>
 #include <linux/extable.h>
 #include <linux/signal.h>
 #include <linux/mm.h>
@@ -33,6 +34,7 @@
 #include <linux/preempt.h>
 #include <linux/hugetlb.h>
 
+#include <asm/acpi.h>
 #include <asm/bug.h>
 #include <asm/cmpxchg.h>
 #include <asm/cpufeature.h>
@@ -45,8 +47,6 @@
 #include <asm/tlbflush.h>
 #include <asm/traps.h>
 
-#include <acpi/ghes.h>
-
 struct fault_info {
 	int	(*fn)(unsigned long addr, unsigned int esr,
 		      struct pt_regs *regs);
@@ -569,19 +569,10 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	inf = esr_to_fault_info(esr);
 
 	/*
-	 * Synchronous aborts may interrupt code which had interrupts masked.
-	 * Before calling out into the wider kernel tell the interested
-	 * subsystems.
+	 * Return value ignored as we rely on signal merging.
+	 * Future patches will make this more robust.
 	 */
-	if (IS_ENABLED(CONFIG_ACPI_APEI_SEA)) {
-		if (interrupts_enabled(regs))
-			nmi_enter();
-
-		ghes_notify_sea();
-
-		if (interrupts_enabled(regs))
-			nmi_exit();
-	}
+	apei_claim_sea(regs);
 
 	info.si_signo = inf->sig;
 	info.si_errno = 0;
@@ -662,16 +653,6 @@ static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 63"			},
 };
 
-int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
-{
-	int ret = -ENOENT;
-
-	if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
-		ret = ghes_notify_sea();
-
-	return ret;
-}
-
 asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
 					 struct pt_regs *regs)
 {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 06/12] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

To split up APEIs in_nmi() path, we need the nmi-like callers to always
be in_nmi(). Add a helper to do the work and claim the notification.

When KVM or the arch code takes an exception that might be a RAS
notification, it asks the APEI firmware-first code whether it wants
to claim the exception. We can then go on to see if (a future)
kernel-first mechanism wants to claim the notification, before
falling through to the existing default behaviour.

The NOTIFY_SEA code was merged before we had multiple, possibly
interacting, NMI-like notifications and the need to consider kernel
first in the future. Make the 'claiming' behaviour explicit.

As we're restructuring the APEI code to allow multiple NMI-like
notifications, any notification that might interrupt interrupts-masked
code must always be wrapped in nmi_enter()/nmi_exit(). This allows APEI
to use in_nmi() to choose between the raw/regular spinlock routines.

We mask SError over this window to prevent an asynchronous RAS error
arriving and tripping 'nmi_enter()'s BUG_ON(in_nmi()).

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
Why does apei_claim_sea() take a pt_regs? This gets used later to take
APEI by the hand through NMI->IRQ context, depending on what we
interrupted.

Changes since v2:
 * Added dummy definition for !ACPI and culled IS_ENABLED() checks.

 arch/arm64/include/asm/acpi.h      |  5 ++++-
 arch/arm64/include/asm/daifflags.h |  1 +
 arch/arm64/include/asm/kvm_ras.h   | 15 ++++++++++++++-
 arch/arm64/kernel/acpi.c           | 30 ++++++++++++++++++++++++++++++
 arch/arm64/mm/fault.c              | 29 +++++------------------------
 5 files changed, 54 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 32f465a80e4e..e8e4f05b4776 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -16,6 +16,7 @@
 #include <linux/psci.h>
 
 #include <asm/cputype.h>
+#include <asm/ptrace.h>
 #include <asm/smp_plat.h>
 #include <asm/tlbflush.h>
 
@@ -89,7 +90,6 @@ struct acpi_madt_generic_interrupt *acpi_cpu_get_madt_gicc(int cpu);
 
 static inline void arch_fix_phys_package_id(int num, u32 slot) { }
 void __init acpi_init_cpus(void);
-
 #else
 static inline void acpi_init_cpus(void) { }
 #endif /* CONFIG_ACPI */
@@ -126,6 +126,9 @@ static inline const char *acpi_get_enable_method(int cpu)
  */
 #define acpi_disable_cmcff 1
 pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr);
+int apei_claim_sea(struct pt_regs *regs);
+#else
+static inline int apei_claim_sea(struct pt_regs *regs) { return -ENOENT; }
 #endif /* CONFIG_ACPI_APEI */
 
 #ifdef CONFIG_ACPI_NUMA
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index 22e4c83de5a5..cbd753855bf3 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -20,6 +20,7 @@
 
 #define DAIF_PROCCTX		0
 #define DAIF_PROCCTX_NOIRQ	PSR_I_BIT
+#define DAIF_ERRCTX		(PSR_I_BIT | PSR_A_BIT)
 
 /* mask/save/unmask/restore all exceptions, including interrupts. */
 static inline void local_daif_mask(void)
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
index 5f72b07b7912..52edc9b3b937 100644
--- a/arch/arm64/include/asm/kvm_ras.h
+++ b/arch/arm64/include/asm/kvm_ras.h
@@ -4,8 +4,21 @@
 #ifndef __ARM64_KVM_RAS_H__
 #define __ARM64_KVM_RAS_H__
 
+#include <linux/acpi.h>
+#include <linux/errno.h>
 #include <linux/types.h>
 
-int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr);
+#include <asm/acpi.h>
+
+/*
+ * Was this synchronous external abort a RAS notification?
+ * Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
+ *
+ * Call with irqs unmasked.
+ */
+static inline int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
+{
+	return apei_claim_sea(NULL);
+}
 
 #endif /* __ARM64_KVM_RAS_H__ */
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 7b09487ff8fb..6a4823a3eb5e 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -33,6 +33,8 @@
 
 #ifdef CONFIG_ACPI_APEI
 # include <linux/efi.h>
+# include <acpi/ghes.h>
+# include <asm/daifflags.h>
 # include <asm/pgtable.h>
 #endif
 
@@ -261,4 +263,32 @@ pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
 		return __pgprot(PROT_NORMAL_NC);
 	return __pgprot(PROT_DEVICE_nGnRnE);
 }
+
+
+/*
+ * Claim Synchronous External Aborts as a firmware first notification.
+ *
+ * Used by KVM and the arch do_sea handler.
+ * @regs may be NULL when called from process context.
+ */
+int apei_claim_sea(struct pt_regs *regs)
+{
+	int err = -ENOENT;
+	unsigned long current_flags = arch_local_save_flags();
+
+	if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA))
+		return err;
+
+	/*
+	 * APEI expects an NMI-like notification to always be called
+	 * in NMI context.
+	 */
+	local_daif_restore(DAIF_ERRCTX);
+	nmi_enter();
+	err = ghes_notify_sea();
+	nmi_exit();
+	local_daif_restore(current_flags);
+
+	return err;
+}
 #endif
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index d61a886afec7..d7e89da0e5df 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -18,6 +18,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/acpi.h>
 #include <linux/extable.h>
 #include <linux/signal.h>
 #include <linux/mm.h>
@@ -33,6 +34,7 @@
 #include <linux/preempt.h>
 #include <linux/hugetlb.h>
 
+#include <asm/acpi.h>
 #include <asm/bug.h>
 #include <asm/cmpxchg.h>
 #include <asm/cpufeature.h>
@@ -45,8 +47,6 @@
 #include <asm/tlbflush.h>
 #include <asm/traps.h>
 
-#include <acpi/ghes.h>
-
 struct fault_info {
 	int	(*fn)(unsigned long addr, unsigned int esr,
 		      struct pt_regs *regs);
@@ -569,19 +569,10 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	inf = esr_to_fault_info(esr);
 
 	/*
-	 * Synchronous aborts may interrupt code which had interrupts masked.
-	 * Before calling out into the wider kernel tell the interested
-	 * subsystems.
+	 * Return value ignored as we rely on signal merging.
+	 * Future patches will make this more robust.
 	 */
-	if (IS_ENABLED(CONFIG_ACPI_APEI_SEA)) {
-		if (interrupts_enabled(regs))
-			nmi_enter();
-
-		ghes_notify_sea();
-
-		if (interrupts_enabled(regs))
-			nmi_exit();
-	}
+	apei_claim_sea(regs);
 
 	info.si_signo = inf->sig;
 	info.si_errno = 0;
@@ -662,16 +653,6 @@ static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 63"			},
 };
 
-int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
-{
-	int ret = -ENOENT;
-
-	if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
-		ret = ghes_notify_sea();
-
-	return ret;
-}
-
 asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
 					 struct pt_regs *regs)
 {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 06/12] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

To split up APEIs in_nmi() path, we need the nmi-like callers to always
be in_nmi(). Add a helper to do the work and claim the notification.

When KVM or the arch code takes an exception that might be a RAS
notification, it asks the APEI firmware-first code whether it wants
to claim the exception. We can then go on to see if (a future)
kernel-first mechanism wants to claim the notification, before
falling through to the existing default behaviour.

The NOTIFY_SEA code was merged before we had multiple, possibly
interacting, NMI-like notifications and the need to consider kernel
first in the future. Make the 'claiming' behaviour explicit.

As we're restructuring the APEI code to allow multiple NMI-like
notifications, any notification that might interrupt interrupts-masked
code must always be wrapped in nmi_enter()/nmi_exit(). This allows APEI
to use in_nmi() to choose between the raw/regular spinlock routines.

We mask SError over this window to prevent an asynchronous RAS error
arriving and tripping 'nmi_enter()'s BUG_ON(in_nmi()).

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
Why does apei_claim_sea() take a pt_regs? This gets used later to take
APEI by the hand through NMI->IRQ context, depending on what we
interrupted.

Changes since v2:
 * Added dummy definition for !ACPI and culled IS_ENABLED() checks.

 arch/arm64/include/asm/acpi.h      |  5 ++++-
 arch/arm64/include/asm/daifflags.h |  1 +
 arch/arm64/include/asm/kvm_ras.h   | 15 ++++++++++++++-
 arch/arm64/kernel/acpi.c           | 30 ++++++++++++++++++++++++++++++
 arch/arm64/mm/fault.c              | 29 +++++------------------------
 5 files changed, 54 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 32f465a80e4e..e8e4f05b4776 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -16,6 +16,7 @@
 #include <linux/psci.h>
 
 #include <asm/cputype.h>
+#include <asm/ptrace.h>
 #include <asm/smp_plat.h>
 #include <asm/tlbflush.h>
 
@@ -89,7 +90,6 @@ struct acpi_madt_generic_interrupt *acpi_cpu_get_madt_gicc(int cpu);
 
 static inline void arch_fix_phys_package_id(int num, u32 slot) { }
 void __init acpi_init_cpus(void);
-
 #else
 static inline void acpi_init_cpus(void) { }
 #endif /* CONFIG_ACPI */
@@ -126,6 +126,9 @@ static inline const char *acpi_get_enable_method(int cpu)
  */
 #define acpi_disable_cmcff 1
 pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr);
+int apei_claim_sea(struct pt_regs *regs);
+#else
+static inline int apei_claim_sea(struct pt_regs *regs) { return -ENOENT; }
 #endif /* CONFIG_ACPI_APEI */
 
 #ifdef CONFIG_ACPI_NUMA
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index 22e4c83de5a5..cbd753855bf3 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -20,6 +20,7 @@
 
 #define DAIF_PROCCTX		0
 #define DAIF_PROCCTX_NOIRQ	PSR_I_BIT
+#define DAIF_ERRCTX		(PSR_I_BIT | PSR_A_BIT)
 
 /* mask/save/unmask/restore all exceptions, including interrupts. */
 static inline void local_daif_mask(void)
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
index 5f72b07b7912..52edc9b3b937 100644
--- a/arch/arm64/include/asm/kvm_ras.h
+++ b/arch/arm64/include/asm/kvm_ras.h
@@ -4,8 +4,21 @@
 #ifndef __ARM64_KVM_RAS_H__
 #define __ARM64_KVM_RAS_H__
 
+#include <linux/acpi.h>
+#include <linux/errno.h>
 #include <linux/types.h>
 
-int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr);
+#include <asm/acpi.h>
+
+/*
+ * Was this synchronous external abort a RAS notification?
+ * Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
+ *
+ * Call with irqs unmasked.
+ */
+static inline int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
+{
+	return apei_claim_sea(NULL);
+}
 
 #endif /* __ARM64_KVM_RAS_H__ */
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 7b09487ff8fb..6a4823a3eb5e 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -33,6 +33,8 @@
 
 #ifdef CONFIG_ACPI_APEI
 # include <linux/efi.h>
+# include <acpi/ghes.h>
+# include <asm/daifflags.h>
 # include <asm/pgtable.h>
 #endif
 
@@ -261,4 +263,32 @@ pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
 		return __pgprot(PROT_NORMAL_NC);
 	return __pgprot(PROT_DEVICE_nGnRnE);
 }
+
+
+/*
+ * Claim Synchronous External Aborts as a firmware first notification.
+ *
+ * Used by KVM and the arch do_sea handler.
+ * @regs may be NULL when called from process context.
+ */
+int apei_claim_sea(struct pt_regs *regs)
+{
+	int err = -ENOENT;
+	unsigned long current_flags = arch_local_save_flags();
+
+	if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA))
+		return err;
+
+	/*
+	 * APEI expects an NMI-like notification to always be called
+	 * in NMI context.
+	 */
+	local_daif_restore(DAIF_ERRCTX);
+	nmi_enter();
+	err = ghes_notify_sea();
+	nmi_exit();
+	local_daif_restore(current_flags);
+
+	return err;
+}
 #endif
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index d61a886afec7..d7e89da0e5df 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -18,6 +18,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/acpi.h>
 #include <linux/extable.h>
 #include <linux/signal.h>
 #include <linux/mm.h>
@@ -33,6 +34,7 @@
 #include <linux/preempt.h>
 #include <linux/hugetlb.h>
 
+#include <asm/acpi.h>
 #include <asm/bug.h>
 #include <asm/cmpxchg.h>
 #include <asm/cpufeature.h>
@@ -45,8 +47,6 @@
 #include <asm/tlbflush.h>
 #include <asm/traps.h>
 
-#include <acpi/ghes.h>
-
 struct fault_info {
 	int	(*fn)(unsigned long addr, unsigned int esr,
 		      struct pt_regs *regs);
@@ -569,19 +569,10 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	inf = esr_to_fault_info(esr);
 
 	/*
-	 * Synchronous aborts may interrupt code which had interrupts masked.
-	 * Before calling out into the wider kernel tell the interested
-	 * subsystems.
+	 * Return value ignored as we rely on signal merging.
+	 * Future patches will make this more robust.
 	 */
-	if (IS_ENABLED(CONFIG_ACPI_APEI_SEA)) {
-		if (interrupts_enabled(regs))
-			nmi_enter();
-
-		ghes_notify_sea();
-
-		if (interrupts_enabled(regs))
-			nmi_exit();
-	}
+	apei_claim_sea(regs);
 
 	info.si_signo = inf->sig;
 	info.si_errno = 0;
@@ -662,16 +653,6 @@ static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 63"			},
 };
 
-int kvm_handle_guest_sea(phys_addr_t addr, unsigned int esr)
-{
-	int ret = -ENOENT;
-
-	if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
-		ret = ghes_notify_sea();
-
-	return ret;
-}
-
 asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
 					 struct pt_regs *regs)
 {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-04-27 15:35   ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

Arm64 has multiple NMI-like notifications, but ghes.c only has one
in_nmi() path, risking deadlock if one NMI-like notification can
interrupt another.

To support this we need a fixmap entry and lock for each notification
type. But ghes_probe() attempts to process each struct ghes at probe
time, to ensure any error that was notified before ghes_probe() was
called has been done, and the buffer released (and maybe acknowledge
to firmware) so that future errors can be delivered.

This means NMI-like notifications need two fixmap entries and locks,
one for the ghes_probe() time call, and another for the actual NMI
that could interrupt ghes_probe().

Split this single path up by adding an NMI fixmap idx and lock into
the struct ghes. Any notification that can be called as an NMI can
use these to separate its resources from any other notification it
may interrupt.

The majority of notifications occur in IRQ context, so unless its
called in_nmi(), ghes_copy_tofrom_phys() will use the FIX_APEI_GHES_IRQ
fixmap entry and the ghes_fixmap_lock_irq lock. This allows
NMI-notifications to be processed by ghes_probe(), and then taken
as an NMI.

The double-underscore version of fix_to_virt() is used because the index
to be mapped can't be tested against the end of the enum at compile
time.

Signed-off-by: James Morse <james.morse@arm.com>

---
Changes since v1:
 * Fixed for ghes_proc() always calling every notification in process context.
   Now only NMI-like notifications need an additional fixmap-slot/lock.

 drivers/acpi/apei/ghes.c | 70 +++++++++++++++++-------------------------------
 include/acpi/ghes.h      |  4 +++
 2 files changed, 28 insertions(+), 46 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1859f27c37ff..48d9eb55ebb8 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -117,12 +117,9 @@ static DEFINE_MUTEX(ghes_list_mutex);
  * from BIOS to Linux can be determined only in NMI, IRQ or timer
  * handler, but general ioremap can not be used in atomic context, so
  * the fixmap is used instead.
- *
- * These 2 spinlocks are used to prevent the fixmap entries from being used
- * simultaneously.
  */
-static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi);
-static DEFINE_SPINLOCK(ghes_ioremap_lock_irq);
+static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_nmi);
+static DEFINE_SPINLOCK(ghes_fixmap_lock_irq);
 
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;
@@ -132,38 +129,16 @@ static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
 
-static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
-{
-	phys_addr_t paddr;
-	pgprot_t prot;
-
-	paddr = pfn << PAGE_SHIFT;
-	prot = arch_apei_get_mem_attribute(paddr);
-	__set_fixmap(FIX_APEI_GHES_NMI, paddr, prot);
-
-	return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI);
-}
-
-static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
+static void __iomem *ghes_fixmap_pfn(int fixmap_idx, u64 pfn)
 {
 	phys_addr_t paddr;
 	pgprot_t prot;
 
 	paddr = pfn << PAGE_SHIFT;
 	prot = arch_apei_get_mem_attribute(paddr);
-	__set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot);
+	__set_fixmap(fixmap_idx, paddr, prot);
 
-	return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ);
-}
-
-static void ghes_iounmap_nmi(void)
-{
-	clear_fixmap(FIX_APEI_GHES_NMI);
-}
-
-static void ghes_iounmap_irq(void)
-{
-	clear_fixmap(FIX_APEI_GHES_IRQ);
+	return (void __iomem *) __fix_to_virt(fixmap_idx);
 }
 
 static int ghes_estatus_pool_init(void)
@@ -291,10 +266,11 @@ static inline int ghes_severity(int severity)
 	}
 }
 
-static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
-				  int from_phys)
+static void ghes_copy_tofrom_phys(struct ghes *ghes, void *buffer, u64 paddr,
+				  u32 len, int from_phys)
 {
 	void __iomem *vaddr;
+	int fixmap_idx = FIX_APEI_GHES_IRQ;
 	unsigned long flags = 0;
 	int in_nmi = in_nmi();
 	u64 offset;
@@ -303,12 +279,12 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
 	while (len > 0) {
 		offset = paddr - (paddr & PAGE_MASK);
 		if (in_nmi) {
-			raw_spin_lock(&ghes_ioremap_lock_nmi);
-			vaddr = ghes_ioremap_pfn_nmi(paddr >> PAGE_SHIFT);
+			raw_spin_lock(ghes->nmi_fixmap_lock);
+			fixmap_idx = ghes->nmi_fixmap_idx;
 		} else {
-			spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
-			vaddr = ghes_ioremap_pfn_irq(paddr >> PAGE_SHIFT);
+			spin_lock_irqsave(&ghes_fixmap_lock_irq, flags);
 		}
+		vaddr = ghes_fixmap_pfn(fixmap_idx, paddr >> PAGE_SHIFT);
 		trunk = PAGE_SIZE - offset;
 		trunk = min(trunk, len);
 		if (from_phys)
@@ -318,13 +294,11 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
 		len -= trunk;
 		paddr += trunk;
 		buffer += trunk;
-		if (in_nmi) {
-			ghes_iounmap_nmi();
-			raw_spin_unlock(&ghes_ioremap_lock_nmi);
-		} else {
-			ghes_iounmap_irq();
-			spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
-		}
+		clear_fixmap(fixmap_idx);
+		if (in_nmi)
+			raw_spin_unlock(ghes->nmi_fixmap_lock);
+		else
+			spin_unlock_irqrestore(&ghes_fixmap_lock_irq, flags);
 	}
 }
 
@@ -346,7 +320,7 @@ static int ghes_read_estatus(struct ghes *ghes, int silent)
 	if (!buf_paddr)
 		return -ENOENT;
 
-	ghes_copy_tofrom_phys(ghes->estatus, buf_paddr,
+	ghes_copy_tofrom_phys(ghes, ghes->estatus, buf_paddr,
 			      sizeof(*ghes->estatus), 1);
 	if (!ghes->estatus->block_status)
 		return -ENOENT;
@@ -362,7 +336,7 @@ static int ghes_read_estatus(struct ghes *ghes, int silent)
 		goto err_read_block;
 	if (cper_estatus_check_header(ghes->estatus))
 		goto err_read_block;
-	ghes_copy_tofrom_phys(ghes->estatus + 1,
+	ghes_copy_tofrom_phys(ghes, ghes->estatus + 1,
 			      buf_paddr + sizeof(*ghes->estatus),
 			      len - sizeof(*ghes->estatus), 1);
 	if (cper_estatus_check(ghes->estatus))
@@ -381,7 +355,7 @@ static void ghes_clear_estatus(struct ghes *ghes)
 	ghes->estatus->block_status = 0;
 	if (!(ghes->flags & GHES_TO_CLEAR))
 		return;
-	ghes_copy_tofrom_phys(ghes->estatus, ghes->buffer_paddr,
+	ghes_copy_tofrom_phys(ghes, ghes->estatus, ghes->buffer_paddr,
 			      sizeof(ghes->estatus->block_status), 0);
 	ghes->flags &= ~GHES_TO_CLEAR;
 }
@@ -986,6 +960,8 @@ int ghes_notify_sea(void)
 
 static void ghes_sea_add(struct ghes *ghes)
 {
+	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
+	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
 	ghes_estatus_queue_grow_pool(ghes);
 
 	mutex_lock(&ghes_list_mutex);
@@ -1032,6 +1008,8 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 
 static void ghes_nmi_add(struct ghes *ghes)
 {
+	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
+	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
 	ghes_estatus_queue_grow_pool(ghes);
 
 	mutex_lock(&ghes_list_mutex);
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 8feb0c866ee0..9c6a92730699 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -29,6 +29,10 @@ struct ghes {
 		struct timer_list timer;
 		unsigned int irq;
 	};
+
+	/* If this ghes can be called in NMI contet, these must be populated. */
+	raw_spinlock_t *nmi_fixmap_lock;
+	int nmi_fixmap_idx;
 };
 
 struct ghes_estatus_node {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

Arm64 has multiple NMI-like notifications, but ghes.c only has one
in_nmi() path, risking deadlock if one NMI-like notification can
interrupt another.

To support this we need a fixmap entry and lock for each notification
type. But ghes_probe() attempts to process each struct ghes at probe
time, to ensure any error that was notified before ghes_probe() was
called has been done, and the buffer released (and maybe acknowledge
to firmware) so that future errors can be delivered.

This means NMI-like notifications need two fixmap entries and locks,
one for the ghes_probe() time call, and another for the actual NMI
that could interrupt ghes_probe().

Split this single path up by adding an NMI fixmap idx and lock into
the struct ghes. Any notification that can be called as an NMI can
use these to separate its resources from any other notification it
may interrupt.

The majority of notifications occur in IRQ context, so unless its
called in_nmi(), ghes_copy_tofrom_phys() will use the FIX_APEI_GHES_IRQ
fixmap entry and the ghes_fixmap_lock_irq lock. This allows
NMI-notifications to be processed by ghes_probe(), and then taken
as an NMI.

The double-underscore version of fix_to_virt() is used because the index
to be mapped can't be tested against the end of the enum at compile
time.

Signed-off-by: James Morse <james.morse@arm.com>

---
Changes since v1:
 * Fixed for ghes_proc() always calling every notification in process context.
   Now only NMI-like notifications need an additional fixmap-slot/lock.

 drivers/acpi/apei/ghes.c | 70 +++++++++++++++++-------------------------------
 include/acpi/ghes.h      |  4 +++
 2 files changed, 28 insertions(+), 46 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1859f27c37ff..48d9eb55ebb8 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -117,12 +117,9 @@ static DEFINE_MUTEX(ghes_list_mutex);
  * from BIOS to Linux can be determined only in NMI, IRQ or timer
  * handler, but general ioremap can not be used in atomic context, so
  * the fixmap is used instead.
- *
- * These 2 spinlocks are used to prevent the fixmap entries from being used
- * simultaneously.
  */
-static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi);
-static DEFINE_SPINLOCK(ghes_ioremap_lock_irq);
+static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_nmi);
+static DEFINE_SPINLOCK(ghes_fixmap_lock_irq);
 
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;
@@ -132,38 +129,16 @@ static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
 
-static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
-{
-	phys_addr_t paddr;
-	pgprot_t prot;
-
-	paddr = pfn << PAGE_SHIFT;
-	prot = arch_apei_get_mem_attribute(paddr);
-	__set_fixmap(FIX_APEI_GHES_NMI, paddr, prot);
-
-	return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI);
-}
-
-static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
+static void __iomem *ghes_fixmap_pfn(int fixmap_idx, u64 pfn)
 {
 	phys_addr_t paddr;
 	pgprot_t prot;
 
 	paddr = pfn << PAGE_SHIFT;
 	prot = arch_apei_get_mem_attribute(paddr);
-	__set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot);
+	__set_fixmap(fixmap_idx, paddr, prot);
 
-	return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ);
-}
-
-static void ghes_iounmap_nmi(void)
-{
-	clear_fixmap(FIX_APEI_GHES_NMI);
-}
-
-static void ghes_iounmap_irq(void)
-{
-	clear_fixmap(FIX_APEI_GHES_IRQ);
+	return (void __iomem *) __fix_to_virt(fixmap_idx);
 }
 
 static int ghes_estatus_pool_init(void)
@@ -291,10 +266,11 @@ static inline int ghes_severity(int severity)
 	}
 }
 
-static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
-				  int from_phys)
+static void ghes_copy_tofrom_phys(struct ghes *ghes, void *buffer, u64 paddr,
+				  u32 len, int from_phys)
 {
 	void __iomem *vaddr;
+	int fixmap_idx = FIX_APEI_GHES_IRQ;
 	unsigned long flags = 0;
 	int in_nmi = in_nmi();
 	u64 offset;
@@ -303,12 +279,12 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
 	while (len > 0) {
 		offset = paddr - (paddr & PAGE_MASK);
 		if (in_nmi) {
-			raw_spin_lock(&ghes_ioremap_lock_nmi);
-			vaddr = ghes_ioremap_pfn_nmi(paddr >> PAGE_SHIFT);
+			raw_spin_lock(ghes->nmi_fixmap_lock);
+			fixmap_idx = ghes->nmi_fixmap_idx;
 		} else {
-			spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
-			vaddr = ghes_ioremap_pfn_irq(paddr >> PAGE_SHIFT);
+			spin_lock_irqsave(&ghes_fixmap_lock_irq, flags);
 		}
+		vaddr = ghes_fixmap_pfn(fixmap_idx, paddr >> PAGE_SHIFT);
 		trunk = PAGE_SIZE - offset;
 		trunk = min(trunk, len);
 		if (from_phys)
@@ -318,13 +294,11 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
 		len -= trunk;
 		paddr += trunk;
 		buffer += trunk;
-		if (in_nmi) {
-			ghes_iounmap_nmi();
-			raw_spin_unlock(&ghes_ioremap_lock_nmi);
-		} else {
-			ghes_iounmap_irq();
-			spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
-		}
+		clear_fixmap(fixmap_idx);
+		if (in_nmi)
+			raw_spin_unlock(ghes->nmi_fixmap_lock);
+		else
+			spin_unlock_irqrestore(&ghes_fixmap_lock_irq, flags);
 	}
 }
 
@@ -346,7 +320,7 @@ static int ghes_read_estatus(struct ghes *ghes, int silent)
 	if (!buf_paddr)
 		return -ENOENT;
 
-	ghes_copy_tofrom_phys(ghes->estatus, buf_paddr,
+	ghes_copy_tofrom_phys(ghes, ghes->estatus, buf_paddr,
 			      sizeof(*ghes->estatus), 1);
 	if (!ghes->estatus->block_status)
 		return -ENOENT;
@@ -362,7 +336,7 @@ static int ghes_read_estatus(struct ghes *ghes, int silent)
 		goto err_read_block;
 	if (cper_estatus_check_header(ghes->estatus))
 		goto err_read_block;
-	ghes_copy_tofrom_phys(ghes->estatus + 1,
+	ghes_copy_tofrom_phys(ghes, ghes->estatus + 1,
 			      buf_paddr + sizeof(*ghes->estatus),
 			      len - sizeof(*ghes->estatus), 1);
 	if (cper_estatus_check(ghes->estatus))
@@ -381,7 +355,7 @@ static void ghes_clear_estatus(struct ghes *ghes)
 	ghes->estatus->block_status = 0;
 	if (!(ghes->flags & GHES_TO_CLEAR))
 		return;
-	ghes_copy_tofrom_phys(ghes->estatus, ghes->buffer_paddr,
+	ghes_copy_tofrom_phys(ghes, ghes->estatus, ghes->buffer_paddr,
 			      sizeof(ghes->estatus->block_status), 0);
 	ghes->flags &= ~GHES_TO_CLEAR;
 }
@@ -986,6 +960,8 @@ int ghes_notify_sea(void)
 
 static void ghes_sea_add(struct ghes *ghes)
 {
+	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
+	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
 	ghes_estatus_queue_grow_pool(ghes);
 
 	mutex_lock(&ghes_list_mutex);
@@ -1032,6 +1008,8 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 
 static void ghes_nmi_add(struct ghes *ghes)
 {
+	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
+	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
 	ghes_estatus_queue_grow_pool(ghes);
 
 	mutex_lock(&ghes_list_mutex);
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 8feb0c866ee0..9c6a92730699 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -29,6 +29,10 @@ struct ghes {
 		struct timer_list timer;
 		unsigned int irq;
 	};
+
+	/* If this ghes can be called in NMI contet, these must be populated. */
+	raw_spinlock_t *nmi_fixmap_lock;
+	int nmi_fixmap_idx;
 };
 
 struct ghes_estatus_node {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

Arm64 has multiple NMI-like notifications, but ghes.c only has one
in_nmi() path, risking deadlock if one NMI-like notification can
interrupt another.

To support this we need a fixmap entry and lock for each notification
type. But ghes_probe() attempts to process each struct ghes at probe
time, to ensure any error that was notified before ghes_probe() was
called has been done, and the buffer released (and maybe acknowledge
to firmware) so that future errors can be delivered.

This means NMI-like notifications need two fixmap entries and locks,
one for the ghes_probe() time call, and another for the actual NMI
that could interrupt ghes_probe().

Split this single path up by adding an NMI fixmap idx and lock into
the struct ghes. Any notification that can be called as an NMI can
use these to separate its resources from any other notification it
may interrupt.

The majority of notifications occur in IRQ context, so unless its
called in_nmi(), ghes_copy_tofrom_phys() will use the FIX_APEI_GHES_IRQ
fixmap entry and the ghes_fixmap_lock_irq lock. This allows
NMI-notifications to be processed by ghes_probe(), and then taken
as an NMI.

The double-underscore version of fix_to_virt() is used because the index
to be mapped can't be tested against the end of the enum at compile
time.

Signed-off-by: James Morse <james.morse@arm.com>

---
Changes since v1:
 * Fixed for ghes_proc() always calling every notification in process context.
   Now only NMI-like notifications need an additional fixmap-slot/lock.

 drivers/acpi/apei/ghes.c | 70 +++++++++++++++++-------------------------------
 include/acpi/ghes.h      |  4 +++
 2 files changed, 28 insertions(+), 46 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1859f27c37ff..48d9eb55ebb8 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -117,12 +117,9 @@ static DEFINE_MUTEX(ghes_list_mutex);
  * from BIOS to Linux can be determined only in NMI, IRQ or timer
  * handler, but general ioremap can not be used in atomic context, so
  * the fixmap is used instead.
- *
- * These 2 spinlocks are used to prevent the fixmap entries from being used
- * simultaneously.
  */
-static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi);
-static DEFINE_SPINLOCK(ghes_ioremap_lock_irq);
+static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_nmi);
+static DEFINE_SPINLOCK(ghes_fixmap_lock_irq);
 
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;
@@ -132,38 +129,16 @@ static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
 
-static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
-{
-	phys_addr_t paddr;
-	pgprot_t prot;
-
-	paddr = pfn << PAGE_SHIFT;
-	prot = arch_apei_get_mem_attribute(paddr);
-	__set_fixmap(FIX_APEI_GHES_NMI, paddr, prot);
-
-	return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI);
-}
-
-static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
+static void __iomem *ghes_fixmap_pfn(int fixmap_idx, u64 pfn)
 {
 	phys_addr_t paddr;
 	pgprot_t prot;
 
 	paddr = pfn << PAGE_SHIFT;
 	prot = arch_apei_get_mem_attribute(paddr);
-	__set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot);
+	__set_fixmap(fixmap_idx, paddr, prot);
 
-	return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ);
-}
-
-static void ghes_iounmap_nmi(void)
-{
-	clear_fixmap(FIX_APEI_GHES_NMI);
-}
-
-static void ghes_iounmap_irq(void)
-{
-	clear_fixmap(FIX_APEI_GHES_IRQ);
+	return (void __iomem *) __fix_to_virt(fixmap_idx);
 }
 
 static int ghes_estatus_pool_init(void)
@@ -291,10 +266,11 @@ static inline int ghes_severity(int severity)
 	}
 }
 
-static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
-				  int from_phys)
+static void ghes_copy_tofrom_phys(struct ghes *ghes, void *buffer, u64 paddr,
+				  u32 len, int from_phys)
 {
 	void __iomem *vaddr;
+	int fixmap_idx = FIX_APEI_GHES_IRQ;
 	unsigned long flags = 0;
 	int in_nmi = in_nmi();
 	u64 offset;
@@ -303,12 +279,12 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
 	while (len > 0) {
 		offset = paddr - (paddr & PAGE_MASK);
 		if (in_nmi) {
-			raw_spin_lock(&ghes_ioremap_lock_nmi);
-			vaddr = ghes_ioremap_pfn_nmi(paddr >> PAGE_SHIFT);
+			raw_spin_lock(ghes->nmi_fixmap_lock);
+			fixmap_idx = ghes->nmi_fixmap_idx;
 		} else {
-			spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
-			vaddr = ghes_ioremap_pfn_irq(paddr >> PAGE_SHIFT);
+			spin_lock_irqsave(&ghes_fixmap_lock_irq, flags);
 		}
+		vaddr = ghes_fixmap_pfn(fixmap_idx, paddr >> PAGE_SHIFT);
 		trunk = PAGE_SIZE - offset;
 		trunk = min(trunk, len);
 		if (from_phys)
@@ -318,13 +294,11 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
 		len -= trunk;
 		paddr += trunk;
 		buffer += trunk;
-		if (in_nmi) {
-			ghes_iounmap_nmi();
-			raw_spin_unlock(&ghes_ioremap_lock_nmi);
-		} else {
-			ghes_iounmap_irq();
-			spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
-		}
+		clear_fixmap(fixmap_idx);
+		if (in_nmi)
+			raw_spin_unlock(ghes->nmi_fixmap_lock);
+		else
+			spin_unlock_irqrestore(&ghes_fixmap_lock_irq, flags);
 	}
 }
 
@@ -346,7 +320,7 @@ static int ghes_read_estatus(struct ghes *ghes, int silent)
 	if (!buf_paddr)
 		return -ENOENT;
 
-	ghes_copy_tofrom_phys(ghes->estatus, buf_paddr,
+	ghes_copy_tofrom_phys(ghes, ghes->estatus, buf_paddr,
 			      sizeof(*ghes->estatus), 1);
 	if (!ghes->estatus->block_status)
 		return -ENOENT;
@@ -362,7 +336,7 @@ static int ghes_read_estatus(struct ghes *ghes, int silent)
 		goto err_read_block;
 	if (cper_estatus_check_header(ghes->estatus))
 		goto err_read_block;
-	ghes_copy_tofrom_phys(ghes->estatus + 1,
+	ghes_copy_tofrom_phys(ghes, ghes->estatus + 1,
 			      buf_paddr + sizeof(*ghes->estatus),
 			      len - sizeof(*ghes->estatus), 1);
 	if (cper_estatus_check(ghes->estatus))
@@ -381,7 +355,7 @@ static void ghes_clear_estatus(struct ghes *ghes)
 	ghes->estatus->block_status = 0;
 	if (!(ghes->flags & GHES_TO_CLEAR))
 		return;
-	ghes_copy_tofrom_phys(ghes->estatus, ghes->buffer_paddr,
+	ghes_copy_tofrom_phys(ghes, ghes->estatus, ghes->buffer_paddr,
 			      sizeof(ghes->estatus->block_status), 0);
 	ghes->flags &= ~GHES_TO_CLEAR;
 }
@@ -986,6 +960,8 @@ int ghes_notify_sea(void)
 
 static void ghes_sea_add(struct ghes *ghes)
 {
+	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
+	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
 	ghes_estatus_queue_grow_pool(ghes);
 
 	mutex_lock(&ghes_list_mutex);
@@ -1032,6 +1008,8 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 
 static void ghes_nmi_add(struct ghes *ghes)
 {
+	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
+	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
 	ghes_estatus_queue_grow_pool(ghes);
 
 	mutex_lock(&ghes_list_mutex);
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 8feb0c866ee0..9c6a92730699 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -29,6 +29,10 @@ struct ghes {
 		struct timer_list timer;
 		unsigned int irq;
 	};
+
+	/* If this ghes can be called in NMI contet, these must be populated. */
+	raw_spinlock_t *nmi_fixmap_lock;
+	int nmi_fixmap_idx;
 };
 
 struct ghes_estatus_node {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 08/12] ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-04-27 15:35   ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

Now that ghes uses the fixmap addresses and locks via some indirection
we can support multiple NMI-like notifications on arm64.

These should be named after their notification method. x86's
NOTIFY_NMI already is, change the SEA fixmap entry to be called
FIX_APEI_GHES_SEA.

Future patches can add support for FIX_APEI_GHES_SEI and
FIX_APEI_GHES_SDEI_{NORMAL,CRITICAL}.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
---
 arch/arm64/include/asm/fixmap.h |  4 +++-
 drivers/acpi/apei/ghes.c        | 11 ++++++-----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index ec1e6d6fa14c..c3974517c2cb 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -55,7 +55,9 @@ enum fixed_addresses {
 #ifdef CONFIG_ACPI_APEI_GHES
 	/* Used for GHES mapping from assorted contexts */
 	FIX_APEI_GHES_IRQ,
-	FIX_APEI_GHES_NMI,
+#ifdef CONFIG_ACPI_APEI_SEA
+	FIX_APEI_GHES_SEA,
+#endif
 #endif /* CONFIG_ACPI_APEI_GHES */
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 48d9eb55ebb8..3ddccd170240 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -118,7 +118,6 @@ static DEFINE_MUTEX(ghes_list_mutex);
  * handler, but general ioremap can not be used in atomic context, so
  * the fixmap is used instead.
  */
-static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_nmi);
 static DEFINE_SPINLOCK(ghes_fixmap_lock_irq);
 
 static struct gen_pool *ghes_estatus_pool;
@@ -948,6 +947,7 @@ static struct notifier_block ghes_notifier_hed = {
 
 #ifdef CONFIG_ACPI_APEI_SEA
 static LIST_HEAD(ghes_sea);
+static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_sea);
 
 /*
  * Return 0 only if one of the SEA error sources successfully reported an error
@@ -960,8 +960,8 @@ int ghes_notify_sea(void)
 
 static void ghes_sea_add(struct ghes *ghes)
 {
-	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
-	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
+	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_sea;
+	ghes->nmi_fixmap_idx = FIX_APEI_GHES_SEA;
 	ghes_estatus_queue_grow_pool(ghes);
 
 	mutex_lock(&ghes_list_mutex);
@@ -985,12 +985,13 @@ static inline void ghes_sea_remove(struct ghes *ghes) { }
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
- * NMI may be triggered on any CPU, so ghes_in_nmi is used for
- * having only one concurrent reader.
+ * NOTIFY_NMI may be triggered on any CPU, so ghes_in_nmi is
+ * used for having only one concurrent reader.
  */
 static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
 
 static LIST_HEAD(ghes_nmi);
+static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_nmi);
 
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 08/12] ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

Now that ghes uses the fixmap addresses and locks via some indirection
we can support multiple NMI-like notifications on arm64.

These should be named after their notification method. x86's
NOTIFY_NMI already is, change the SEA fixmap entry to be called
FIX_APEI_GHES_SEA.

Future patches can add support for FIX_APEI_GHES_SEI and
FIX_APEI_GHES_SDEI_{NORMAL,CRITICAL}.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
---
 arch/arm64/include/asm/fixmap.h |  4 +++-
 drivers/acpi/apei/ghes.c        | 11 ++++++-----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index ec1e6d6fa14c..c3974517c2cb 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -55,7 +55,9 @@ enum fixed_addresses {
 #ifdef CONFIG_ACPI_APEI_GHES
 	/* Used for GHES mapping from assorted contexts */
 	FIX_APEI_GHES_IRQ,
-	FIX_APEI_GHES_NMI,
+#ifdef CONFIG_ACPI_APEI_SEA
+	FIX_APEI_GHES_SEA,
+#endif
 #endif /* CONFIG_ACPI_APEI_GHES */
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 48d9eb55ebb8..3ddccd170240 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -118,7 +118,6 @@ static DEFINE_MUTEX(ghes_list_mutex);
  * handler, but general ioremap can not be used in atomic context, so
  * the fixmap is used instead.
  */
-static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_nmi);
 static DEFINE_SPINLOCK(ghes_fixmap_lock_irq);
 
 static struct gen_pool *ghes_estatus_pool;
@@ -948,6 +947,7 @@ static struct notifier_block ghes_notifier_hed = {
 
 #ifdef CONFIG_ACPI_APEI_SEA
 static LIST_HEAD(ghes_sea);
+static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_sea);
 
 /*
  * Return 0 only if one of the SEA error sources successfully reported an error
@@ -960,8 +960,8 @@ int ghes_notify_sea(void)
 
 static void ghes_sea_add(struct ghes *ghes)
 {
-	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
-	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
+	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_sea;
+	ghes->nmi_fixmap_idx = FIX_APEI_GHES_SEA;
 	ghes_estatus_queue_grow_pool(ghes);
 
 	mutex_lock(&ghes_list_mutex);
@@ -985,12 +985,13 @@ static inline void ghes_sea_remove(struct ghes *ghes) { }
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
- * NMI may be triggered on any CPU, so ghes_in_nmi is used for
- * having only one concurrent reader.
+ * NOTIFY_NMI may be triggered on any CPU, so ghes_in_nmi is
+ * used for having only one concurrent reader.
  */
 static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
 
 static LIST_HEAD(ghes_nmi);
+static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_nmi);
 
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 08/12] ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

Now that ghes uses the fixmap addresses and locks via some indirection
we can support multiple NMI-like notifications on arm64.

These should be named after their notification method. x86's
NOTIFY_NMI already is, change the SEA fixmap entry to be called
FIX_APEI_GHES_SEA.

Future patches can add support for FIX_APEI_GHES_SEI and
FIX_APEI_GHES_SDEI_{NORMAL,CRITICAL}.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
---
 arch/arm64/include/asm/fixmap.h |  4 +++-
 drivers/acpi/apei/ghes.c        | 11 ++++++-----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index ec1e6d6fa14c..c3974517c2cb 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -55,7 +55,9 @@ enum fixed_addresses {
 #ifdef CONFIG_ACPI_APEI_GHES
 	/* Used for GHES mapping from assorted contexts */
 	FIX_APEI_GHES_IRQ,
-	FIX_APEI_GHES_NMI,
+#ifdef CONFIG_ACPI_APEI_SEA
+	FIX_APEI_GHES_SEA,
+#endif
 #endif /* CONFIG_ACPI_APEI_GHES */
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 48d9eb55ebb8..3ddccd170240 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -118,7 +118,6 @@ static DEFINE_MUTEX(ghes_list_mutex);
  * handler, but general ioremap can not be used in atomic context, so
  * the fixmap is used instead.
  */
-static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_nmi);
 static DEFINE_SPINLOCK(ghes_fixmap_lock_irq);
 
 static struct gen_pool *ghes_estatus_pool;
@@ -948,6 +947,7 @@ static struct notifier_block ghes_notifier_hed = {
 
 #ifdef CONFIG_ACPI_APEI_SEA
 static LIST_HEAD(ghes_sea);
+static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_sea);
 
 /*
  * Return 0 only if one of the SEA error sources successfully reported an error
@@ -960,8 +960,8 @@ int ghes_notify_sea(void)
 
 static void ghes_sea_add(struct ghes *ghes)
 {
-	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
-	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
+	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_sea;
+	ghes->nmi_fixmap_idx = FIX_APEI_GHES_SEA;
 	ghes_estatus_queue_grow_pool(ghes);
 
 	mutex_lock(&ghes_list_mutex);
@@ -985,12 +985,13 @@ static inline void ghes_sea_remove(struct ghes *ghes) { }
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
- * NMI may be triggered on any CPU, so ghes_in_nmi is used for
- * having only one concurrent reader.
+ * NOTIFY_NMI may be triggered on any CPU, so ghes_in_nmi is
+ * used for having only one concurrent reader.
  */
 static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
 
 static LIST_HEAD(ghes_nmi);
+static DEFINE_RAW_SPINLOCK(ghes_fixmap_lock_nmi);
 
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 09/12] firmware: arm_sdei: Add ACPI GHES registration helper
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-04-27 15:35   ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

APEI's Generic Hardware Error Source structures do not describe
whether the SDEI event is shared or private, as this information is
discoverable via the API.

GHES needs to know whether an event is normal or critical to avoid
sharing locks or fixmap entries.

Add a helper to ask firmware for this information so it can initialise
the struct ghes and register then enable the event.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>

---
Changes since v2:
 * Added header file, thanks kbuild-robot!
 * changed ifdef to the GHES version to match the fixmap definition

Changes since v1:
 * ghes->fixmap_idx variable rename

 arch/arm64/include/asm/fixmap.h |  4 +++
 drivers/firmware/arm_sdei.c     | 77 +++++++++++++++++++++++++++++++++++++++++
 include/linux/arm_sdei.h        |  5 +++
 3 files changed, 86 insertions(+)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index c3974517c2cb..e2b423a5feaf 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -58,6 +58,10 @@ enum fixed_addresses {
 #ifdef CONFIG_ACPI_APEI_SEA
 	FIX_APEI_GHES_SEA,
 #endif
+#ifdef CONFIG_ARM_SDE_INTERFACE
+	FIX_APEI_GHES_SDEI_NORMAL,
+	FIX_APEI_GHES_SDEI_CRITICAL,
+#endif
 #endif /* CONFIG_ACPI_APEI_GHES */
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index 1ea71640fdc2..2c29f435b1f9 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -2,6 +2,7 @@
 // Copyright (C) 2017 Arm Ltd.
 #define pr_fmt(fmt) "sdei: " fmt
 
+#include <acpi/ghes.h>
 #include <linux/acpi.h>
 #include <linux/arm_sdei.h>
 #include <linux/arm-smccc.h>
@@ -32,6 +33,8 @@
 #include <linux/spinlock.h>
 #include <linux/uaccess.h>
 
+#include <asm/fixmap.h>
+
 /*
  * The call to use to reach the firmware.
  */
@@ -887,6 +890,80 @@ static void sdei_smccc_hvc(unsigned long function_id,
 	arm_smccc_hvc(function_id, arg0, arg1, arg2, arg3, arg4, 0, 0, res);
 }
 
+#ifdef CONFIG_ACPI_APEI_GHES
+/* These stop private notifications using the fixmap entries simultaneously */
+static DEFINE_RAW_SPINLOCK(sdei_ghes_fixmap_lock_normal);
+static DEFINE_RAW_SPINLOCK(sdei_ghes_fixmap_lock_critical);
+
+int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *cb)
+{
+	int err;
+	u32 event_num;
+	u64 result;
+
+	if (acpi_disabled)
+		return -EOPNOTSUPP;
+
+	event_num = ghes->generic->notify.vector;
+	if (event_num == 0) {
+		/*
+		 * Event 0 is reserved by the specification for
+		 * SDEI_EVENT_SIGNAL.
+		 */
+		return -EINVAL;
+	}
+
+	err = sdei_api_event_get_info(event_num, SDEI_EVENT_INFO_EV_PRIORITY,
+				      &result);
+	if (err)
+		return err;
+
+	if (result == SDEI_EVENT_PRIORITY_CRITICAL) {
+		ghes->nmi_fixmap_lock = &sdei_ghes_fixmap_lock_critical;
+		ghes->nmi_fixmap_idx = FIX_APEI_GHES_SDEI_CRITICAL;
+	} else {
+		ghes->nmi_fixmap_lock = &sdei_ghes_fixmap_lock_normal;
+		ghes->nmi_fixmap_idx = FIX_APEI_GHES_SDEI_NORMAL;
+	}
+
+	err = sdei_event_register(event_num, cb, ghes);
+	if (!err)
+		err = sdei_event_enable(event_num);
+
+	return err;
+}
+
+int sdei_unregister_ghes(struct ghes *ghes)
+{
+	int i;
+	int err;
+	u32 event_num = ghes->generic->notify.vector;
+
+	might_sleep();
+
+	if (acpi_disabled)
+		return -EOPNOTSUPP;
+
+	/*
+	 * The event may be running on another CPU. Disable it
+	 * to stop new events, then try to unregister a few times.
+	 */
+	err = sdei_event_disable(event_num);
+	if (err)
+		return err;
+
+	for (i = 0; i < 3; i++) {
+		err = sdei_event_unregister(event_num);
+		if (err != -EINPROGRESS)
+			break;
+
+		schedule();
+	}
+
+	return err;
+}
+#endif /* CONFIG_ACPI_APEI_GHES */
+
 static int sdei_get_conduit(struct platform_device *pdev)
 {
 	const char *method;
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index 942afbd544b7..5fdf799be026 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -11,6 +11,7 @@ enum sdei_conduit_types {
 	CONDUIT_HVC,
 };
 
+#include <acpi/ghes.h>
 #include <asm/sdei.h>
 
 /* Arch code should override this to set the entry point from firmware... */
@@ -39,6 +40,10 @@ int sdei_event_unregister(u32 event_num);
 int sdei_event_enable(u32 event_num);
 int sdei_event_disable(u32 event_num);
 
+/* GHES register/unregister helpers */
+int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *cb);
+int sdei_unregister_ghes(struct ghes *ghes);
+
 #ifdef CONFIG_ARM_SDE_INTERFACE
 /* For use by arch code when CPU hotplug notifiers are not appropriate. */
 int sdei_mask_local_cpu(void);
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 09/12] firmware: arm_sdei: Add ACPI GHES registration helper
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

APEI's Generic Hardware Error Source structures do not describe
whether the SDEI event is shared or private, as this information is
discoverable via the API.

GHES needs to know whether an event is normal or critical to avoid
sharing locks or fixmap entries.

Add a helper to ask firmware for this information so it can initialise
the struct ghes and register then enable the event.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>

---
Changes since v2:
 * Added header file, thanks kbuild-robot!
 * changed ifdef to the GHES version to match the fixmap definition

Changes since v1:
 * ghes->fixmap_idx variable rename

 arch/arm64/include/asm/fixmap.h |  4 +++
 drivers/firmware/arm_sdei.c     | 77 +++++++++++++++++++++++++++++++++++++++++
 include/linux/arm_sdei.h        |  5 +++
 3 files changed, 86 insertions(+)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index c3974517c2cb..e2b423a5feaf 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -58,6 +58,10 @@ enum fixed_addresses {
 #ifdef CONFIG_ACPI_APEI_SEA
 	FIX_APEI_GHES_SEA,
 #endif
+#ifdef CONFIG_ARM_SDE_INTERFACE
+	FIX_APEI_GHES_SDEI_NORMAL,
+	FIX_APEI_GHES_SDEI_CRITICAL,
+#endif
 #endif /* CONFIG_ACPI_APEI_GHES */
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index 1ea71640fdc2..2c29f435b1f9 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -2,6 +2,7 @@
 // Copyright (C) 2017 Arm Ltd.
 #define pr_fmt(fmt) "sdei: " fmt
 
+#include <acpi/ghes.h>
 #include <linux/acpi.h>
 #include <linux/arm_sdei.h>
 #include <linux/arm-smccc.h>
@@ -32,6 +33,8 @@
 #include <linux/spinlock.h>
 #include <linux/uaccess.h>
 
+#include <asm/fixmap.h>
+
 /*
  * The call to use to reach the firmware.
  */
@@ -887,6 +890,80 @@ static void sdei_smccc_hvc(unsigned long function_id,
 	arm_smccc_hvc(function_id, arg0, arg1, arg2, arg3, arg4, 0, 0, res);
 }
 
+#ifdef CONFIG_ACPI_APEI_GHES
+/* These stop private notifications using the fixmap entries simultaneously */
+static DEFINE_RAW_SPINLOCK(sdei_ghes_fixmap_lock_normal);
+static DEFINE_RAW_SPINLOCK(sdei_ghes_fixmap_lock_critical);
+
+int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *cb)
+{
+	int err;
+	u32 event_num;
+	u64 result;
+
+	if (acpi_disabled)
+		return -EOPNOTSUPP;
+
+	event_num = ghes->generic->notify.vector;
+	if (event_num == 0) {
+		/*
+		 * Event 0 is reserved by the specification for
+		 * SDEI_EVENT_SIGNAL.
+		 */
+		return -EINVAL;
+	}
+
+	err = sdei_api_event_get_info(event_num, SDEI_EVENT_INFO_EV_PRIORITY,
+				      &result);
+	if (err)
+		return err;
+
+	if (result == SDEI_EVENT_PRIORITY_CRITICAL) {
+		ghes->nmi_fixmap_lock = &sdei_ghes_fixmap_lock_critical;
+		ghes->nmi_fixmap_idx = FIX_APEI_GHES_SDEI_CRITICAL;
+	} else {
+		ghes->nmi_fixmap_lock = &sdei_ghes_fixmap_lock_normal;
+		ghes->nmi_fixmap_idx = FIX_APEI_GHES_SDEI_NORMAL;
+	}
+
+	err = sdei_event_register(event_num, cb, ghes);
+	if (!err)
+		err = sdei_event_enable(event_num);
+
+	return err;
+}
+
+int sdei_unregister_ghes(struct ghes *ghes)
+{
+	int i;
+	int err;
+	u32 event_num = ghes->generic->notify.vector;
+
+	might_sleep();
+
+	if (acpi_disabled)
+		return -EOPNOTSUPP;
+
+	/*
+	 * The event may be running on another CPU. Disable it
+	 * to stop new events, then try to unregister a few times.
+	 */
+	err = sdei_event_disable(event_num);
+	if (err)
+		return err;
+
+	for (i = 0; i < 3; i++) {
+		err = sdei_event_unregister(event_num);
+		if (err != -EINPROGRESS)
+			break;
+
+		schedule();
+	}
+
+	return err;
+}
+#endif /* CONFIG_ACPI_APEI_GHES */
+
 static int sdei_get_conduit(struct platform_device *pdev)
 {
 	const char *method;
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index 942afbd544b7..5fdf799be026 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -11,6 +11,7 @@ enum sdei_conduit_types {
 	CONDUIT_HVC,
 };
 
+#include <acpi/ghes.h>
 #include <asm/sdei.h>
 
 /* Arch code should override this to set the entry point from firmware... */
@@ -39,6 +40,10 @@ int sdei_event_unregister(u32 event_num);
 int sdei_event_enable(u32 event_num);
 int sdei_event_disable(u32 event_num);
 
+/* GHES register/unregister helpers */
+int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *cb);
+int sdei_unregister_ghes(struct ghes *ghes);
+
 #ifdef CONFIG_ARM_SDE_INTERFACE
 /* For use by arch code when CPU hotplug notifiers are not appropriate. */
 int sdei_mask_local_cpu(void);
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 09/12] firmware: arm_sdei: Add ACPI GHES registration helper
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

APEI's Generic Hardware Error Source structures do not describe
whether the SDEI event is shared or private, as this information is
discoverable via the API.

GHES needs to know whether an event is normal or critical to avoid
sharing locks or fixmap entries.

Add a helper to ask firmware for this information so it can initialise
the struct ghes and register then enable the event.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>

---
Changes since v2:
 * Added header file, thanks kbuild-robot!
 * changed ifdef to the GHES version to match the fixmap definition

Changes since v1:
 * ghes->fixmap_idx variable rename

 arch/arm64/include/asm/fixmap.h |  4 +++
 drivers/firmware/arm_sdei.c     | 77 +++++++++++++++++++++++++++++++++++++++++
 include/linux/arm_sdei.h        |  5 +++
 3 files changed, 86 insertions(+)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index c3974517c2cb..e2b423a5feaf 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -58,6 +58,10 @@ enum fixed_addresses {
 #ifdef CONFIG_ACPI_APEI_SEA
 	FIX_APEI_GHES_SEA,
 #endif
+#ifdef CONFIG_ARM_SDE_INTERFACE
+	FIX_APEI_GHES_SDEI_NORMAL,
+	FIX_APEI_GHES_SDEI_CRITICAL,
+#endif
 #endif /* CONFIG_ACPI_APEI_GHES */
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
index 1ea71640fdc2..2c29f435b1f9 100644
--- a/drivers/firmware/arm_sdei.c
+++ b/drivers/firmware/arm_sdei.c
@@ -2,6 +2,7 @@
 // Copyright (C) 2017 Arm Ltd.
 #define pr_fmt(fmt) "sdei: " fmt
 
+#include <acpi/ghes.h>
 #include <linux/acpi.h>
 #include <linux/arm_sdei.h>
 #include <linux/arm-smccc.h>
@@ -32,6 +33,8 @@
 #include <linux/spinlock.h>
 #include <linux/uaccess.h>
 
+#include <asm/fixmap.h>
+
 /*
  * The call to use to reach the firmware.
  */
@@ -887,6 +890,80 @@ static void sdei_smccc_hvc(unsigned long function_id,
 	arm_smccc_hvc(function_id, arg0, arg1, arg2, arg3, arg4, 0, 0, res);
 }
 
+#ifdef CONFIG_ACPI_APEI_GHES
+/* These stop private notifications using the fixmap entries simultaneously */
+static DEFINE_RAW_SPINLOCK(sdei_ghes_fixmap_lock_normal);
+static DEFINE_RAW_SPINLOCK(sdei_ghes_fixmap_lock_critical);
+
+int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *cb)
+{
+	int err;
+	u32 event_num;
+	u64 result;
+
+	if (acpi_disabled)
+		return -EOPNOTSUPP;
+
+	event_num = ghes->generic->notify.vector;
+	if (event_num == 0) {
+		/*
+		 * Event 0 is reserved by the specification for
+		 * SDEI_EVENT_SIGNAL.
+		 */
+		return -EINVAL;
+	}
+
+	err = sdei_api_event_get_info(event_num, SDEI_EVENT_INFO_EV_PRIORITY,
+				      &result);
+	if (err)
+		return err;
+
+	if (result == SDEI_EVENT_PRIORITY_CRITICAL) {
+		ghes->nmi_fixmap_lock = &sdei_ghes_fixmap_lock_critical;
+		ghes->nmi_fixmap_idx = FIX_APEI_GHES_SDEI_CRITICAL;
+	} else {
+		ghes->nmi_fixmap_lock = &sdei_ghes_fixmap_lock_normal;
+		ghes->nmi_fixmap_idx = FIX_APEI_GHES_SDEI_NORMAL;
+	}
+
+	err = sdei_event_register(event_num, cb, ghes);
+	if (!err)
+		err = sdei_event_enable(event_num);
+
+	return err;
+}
+
+int sdei_unregister_ghes(struct ghes *ghes)
+{
+	int i;
+	int err;
+	u32 event_num = ghes->generic->notify.vector;
+
+	might_sleep();
+
+	if (acpi_disabled)
+		return -EOPNOTSUPP;
+
+	/*
+	 * The event may be running on another CPU. Disable it
+	 * to stop new events, then try to unregister a few times.
+	 */
+	err = sdei_event_disable(event_num);
+	if (err)
+		return err;
+
+	for (i = 0; i < 3; i++) {
+		err = sdei_event_unregister(event_num);
+		if (err != -EINPROGRESS)
+			break;
+
+		schedule();
+	}
+
+	return err;
+}
+#endif /* CONFIG_ACPI_APEI_GHES */
+
 static int sdei_get_conduit(struct platform_device *pdev)
 {
 	const char *method;
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index 942afbd544b7..5fdf799be026 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -11,6 +11,7 @@ enum sdei_conduit_types {
 	CONDUIT_HVC,
 };
 
+#include <acpi/ghes.h>
 #include <asm/sdei.h>
 
 /* Arch code should override this to set the entry point from firmware... */
@@ -39,6 +40,10 @@ int sdei_event_unregister(u32 event_num);
 int sdei_event_enable(u32 event_num);
 int sdei_event_disable(u32 event_num);
 
+/* GHES register/unregister helpers */
+int sdei_register_ghes(struct ghes *ghes, sdei_event_callback *cb);
+int sdei_unregister_ghes(struct ghes *ghes);
+
 #ifdef CONFIG_ARM_SDE_INTERFACE
 /* For use by arch code when CPU hotplug notifiers are not appropriate. */
 int sdei_mask_local_cpu(void);
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 10/12] ACPI / APEI: Add support for the SDEI GHES Notification type
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-04-27 15:35   ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

If the GHES notification type is SDEI, register the provided event
number and point the callback at ghes_sdei_callback().

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
---
 drivers/acpi/apei/ghes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/arm_sdei.h |  3 +++
 2 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 3ddccd170240..f348e6540960 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -25,6 +25,7 @@
  * GNU General Public License for more details.
  */
 
+#include <linux/arm_sdei.h>
 #include <linux/kernel.h>
 #include <linux/moduleparam.h>
 #include <linux/init.h>
@@ -58,7 +59,7 @@
 
 #define GHES_PFX	"GHES: "
 
-#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA)
+#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA) || defined(CONFIG_ARM_SDE_INTERFACE)
 #define WANT_NMI_ESTATUS_QUEUE	1
 #endif
 
@@ -747,7 +748,7 @@ static int _in_nmi_notify_one(struct ghes *ghes)
 	return 0;
 }
 
-static int ghes_estatus_queue_notified(struct list_head *rcu_list)
+static int __maybe_unused ghes_estatus_queue_notified(struct list_head *rcu_list)
 {
 	int ret = -ENOENT;
 	struct ghes *ghes;
@@ -1041,6 +1042,49 @@ static inline void ghes_nmi_add(struct ghes *ghes) { }
 static inline void ghes_nmi_remove(struct ghes *ghes) { }
 #endif /* CONFIG_HAVE_ACPI_APEI_NMI */
 
+static int ghes_sdei_callback(u32 event_num, struct pt_regs *regs, void *arg)
+{
+	struct ghes *ghes = arg;
+
+	if (!_in_nmi_notify_one(ghes)) {
+		if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG))
+			irq_work_queue(&ghes_proc_irq_work);
+
+		return 0;
+	}
+
+	return -ENOENT;
+}
+
+static int apei_sdei_register_ghes(struct ghes *ghes)
+{
+	int err = -EINVAL;
+
+	if (IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+		ghes_estatus_queue_grow_pool(ghes);
+
+		err = sdei_register_ghes(ghes, ghes_sdei_callback);
+		if (err)
+			ghes_estatus_queue_shrink_pool(ghes);
+	}
+
+	return err;
+}
+
+static int apei_sdei_unregister_ghes(struct ghes *ghes)
+{
+	int err = -EINVAL;
+
+	if (IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+		err = sdei_unregister_ghes(ghes);
+
+		if (!err)
+			ghes_estatus_queue_shrink_pool(ghes);
+	}
+
+	return err;
+}
+
 static int ghes_probe(struct platform_device *ghes_dev)
 {
 	struct acpi_hest_generic *generic;
@@ -1075,6 +1119,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 			goto err;
 		}
 		break;
+	case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+		if (!IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SDE Interface is not supported!\n",
+				generic->header.source_id);
+			goto err;
+		}
+		break;
 	case ACPI_HEST_NOTIFY_LOCAL:
 		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
 			   generic->header.source_id);
@@ -1142,6 +1193,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
 		break;
+	case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+		rc = apei_sdei_register_ghes(ghes);
+		if (rc)
+			goto err_edac_unreg;
+		break;
 	default:
 		BUG();
 	}
@@ -1163,6 +1219,7 @@ err:
 
 static int ghes_remove(struct platform_device *ghes_dev)
 {
+	int rc;
 	struct ghes *ghes;
 	struct acpi_hest_generic *generic;
 
@@ -1195,6 +1252,11 @@ static int ghes_remove(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
 		break;
+	case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+		rc = apei_sdei_unregister_ghes(ghes);
+		if (rc)
+			return rc;
+		break;
 	default:
 		BUG();
 		break;
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index 5fdf799be026..f49063ca206d 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -12,7 +12,10 @@ enum sdei_conduit_types {
 };
 
 #include <acpi/ghes.h>
+
+#ifdef CONFIG_ARM_SDE_INTERFACE
 #include <asm/sdei.h>
+#endif
 
 /* Arch code should override this to set the entry point from firmware... */
 #ifndef sdei_arch_get_entry_point
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 10/12] ACPI / APEI: Add support for the SDEI GHES Notification type
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

If the GHES notification type is SDEI, register the provided event
number and point the callback at ghes_sdei_callback().

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
---
 drivers/acpi/apei/ghes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/arm_sdei.h |  3 +++
 2 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 3ddccd170240..f348e6540960 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -25,6 +25,7 @@
  * GNU General Public License for more details.
  */
 
+#include <linux/arm_sdei.h>
 #include <linux/kernel.h>
 #include <linux/moduleparam.h>
 #include <linux/init.h>
@@ -58,7 +59,7 @@
 
 #define GHES_PFX	"GHES: "
 
-#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA)
+#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA) || defined(CONFIG_ARM_SDE_INTERFACE)
 #define WANT_NMI_ESTATUS_QUEUE	1
 #endif
 
@@ -747,7 +748,7 @@ static int _in_nmi_notify_one(struct ghes *ghes)
 	return 0;
 }
 
-static int ghes_estatus_queue_notified(struct list_head *rcu_list)
+static int __maybe_unused ghes_estatus_queue_notified(struct list_head *rcu_list)
 {
 	int ret = -ENOENT;
 	struct ghes *ghes;
@@ -1041,6 +1042,49 @@ static inline void ghes_nmi_add(struct ghes *ghes) { }
 static inline void ghes_nmi_remove(struct ghes *ghes) { }
 #endif /* CONFIG_HAVE_ACPI_APEI_NMI */
 
+static int ghes_sdei_callback(u32 event_num, struct pt_regs *regs, void *arg)
+{
+	struct ghes *ghes = arg;
+
+	if (!_in_nmi_notify_one(ghes)) {
+		if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG))
+			irq_work_queue(&ghes_proc_irq_work);
+
+		return 0;
+	}
+
+	return -ENOENT;
+}
+
+static int apei_sdei_register_ghes(struct ghes *ghes)
+{
+	int err = -EINVAL;
+
+	if (IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+		ghes_estatus_queue_grow_pool(ghes);
+
+		err = sdei_register_ghes(ghes, ghes_sdei_callback);
+		if (err)
+			ghes_estatus_queue_shrink_pool(ghes);
+	}
+
+	return err;
+}
+
+static int apei_sdei_unregister_ghes(struct ghes *ghes)
+{
+	int err = -EINVAL;
+
+	if (IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+		err = sdei_unregister_ghes(ghes);
+
+		if (!err)
+			ghes_estatus_queue_shrink_pool(ghes);
+	}
+
+	return err;
+}
+
 static int ghes_probe(struct platform_device *ghes_dev)
 {
 	struct acpi_hest_generic *generic;
@@ -1075,6 +1119,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 			goto err;
 		}
 		break;
+	case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+		if (!IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SDE Interface is not supported!\n",
+				generic->header.source_id);
+			goto err;
+		}
+		break;
 	case ACPI_HEST_NOTIFY_LOCAL:
 		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
 			   generic->header.source_id);
@@ -1142,6 +1193,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
 		break;
+	case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+		rc = apei_sdei_register_ghes(ghes);
+		if (rc)
+			goto err_edac_unreg;
+		break;
 	default:
 		BUG();
 	}
@@ -1163,6 +1219,7 @@ err:
 
 static int ghes_remove(struct platform_device *ghes_dev)
 {
+	int rc;
 	struct ghes *ghes;
 	struct acpi_hest_generic *generic;
 
@@ -1195,6 +1252,11 @@ static int ghes_remove(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
 		break;
+	case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+		rc = apei_sdei_unregister_ghes(ghes);
+		if (rc)
+			return rc;
+		break;
 	default:
 		BUG();
 		break;
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index 5fdf799be026..f49063ca206d 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -12,7 +12,10 @@ enum sdei_conduit_types {
 };
 
 #include <acpi/ghes.h>
+
+#ifdef CONFIG_ARM_SDE_INTERFACE
 #include <asm/sdei.h>
+#endif
 
 /* Arch code should override this to set the entry point from firmware... */
 #ifndef sdei_arch_get_entry_point
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 10/12] ACPI / APEI: Add support for the SDEI GHES Notification type
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

If the GHES notification type is SDEI, register the provided event
number and point the callback at ghes_sdei_callback().

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
---
 drivers/acpi/apei/ghes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/arm_sdei.h |  3 +++
 2 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 3ddccd170240..f348e6540960 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -25,6 +25,7 @@
  * GNU General Public License for more details.
  */
 
+#include <linux/arm_sdei.h>
 #include <linux/kernel.h>
 #include <linux/moduleparam.h>
 #include <linux/init.h>
@@ -58,7 +59,7 @@
 
 #define GHES_PFX	"GHES: "
 
-#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA)
+#if defined(CONFIG_HAVE_ACPI_APEI_NMI) || defined(CONFIG_ACPI_APEI_SEA) || defined(CONFIG_ARM_SDE_INTERFACE)
 #define WANT_NMI_ESTATUS_QUEUE	1
 #endif
 
@@ -747,7 +748,7 @@ static int _in_nmi_notify_one(struct ghes *ghes)
 	return 0;
 }
 
-static int ghes_estatus_queue_notified(struct list_head *rcu_list)
+static int __maybe_unused ghes_estatus_queue_notified(struct list_head *rcu_list)
 {
 	int ret = -ENOENT;
 	struct ghes *ghes;
@@ -1041,6 +1042,49 @@ static inline void ghes_nmi_add(struct ghes *ghes) { }
 static inline void ghes_nmi_remove(struct ghes *ghes) { }
 #endif /* CONFIG_HAVE_ACPI_APEI_NMI */
 
+static int ghes_sdei_callback(u32 event_num, struct pt_regs *regs, void *arg)
+{
+	struct ghes *ghes = arg;
+
+	if (!_in_nmi_notify_one(ghes)) {
+		if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG))
+			irq_work_queue(&ghes_proc_irq_work);
+
+		return 0;
+	}
+
+	return -ENOENT;
+}
+
+static int apei_sdei_register_ghes(struct ghes *ghes)
+{
+	int err = -EINVAL;
+
+	if (IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+		ghes_estatus_queue_grow_pool(ghes);
+
+		err = sdei_register_ghes(ghes, ghes_sdei_callback);
+		if (err)
+			ghes_estatus_queue_shrink_pool(ghes);
+	}
+
+	return err;
+}
+
+static int apei_sdei_unregister_ghes(struct ghes *ghes)
+{
+	int err = -EINVAL;
+
+	if (IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+		err = sdei_unregister_ghes(ghes);
+
+		if (!err)
+			ghes_estatus_queue_shrink_pool(ghes);
+	}
+
+	return err;
+}
+
 static int ghes_probe(struct platform_device *ghes_dev)
 {
 	struct acpi_hest_generic *generic;
@@ -1075,6 +1119,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 			goto err;
 		}
 		break;
+	case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+		if (!IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {
+			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SDE Interface is not supported!\n",
+				generic->header.source_id);
+			goto err;
+		}
+		break;
 	case ACPI_HEST_NOTIFY_LOCAL:
 		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
 			   generic->header.source_id);
@@ -1142,6 +1193,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
 		break;
+	case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+		rc = apei_sdei_register_ghes(ghes);
+		if (rc)
+			goto err_edac_unreg;
+		break;
 	default:
 		BUG();
 	}
@@ -1163,6 +1219,7 @@ err:
 
 static int ghes_remove(struct platform_device *ghes_dev)
 {
+	int rc;
 	struct ghes *ghes;
 	struct acpi_hest_generic *generic;
 
@@ -1195,6 +1252,11 @@ static int ghes_remove(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
 		break;
+	case ACPI_HEST_NOTIFY_SOFTWARE_DELEGATED:
+		rc = apei_sdei_unregister_ghes(ghes);
+		if (rc)
+			return rc;
+		break;
 	default:
 		BUG();
 		break;
diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h
index 5fdf799be026..f49063ca206d 100644
--- a/include/linux/arm_sdei.h
+++ b/include/linux/arm_sdei.h
@@ -12,7 +12,10 @@ enum sdei_conduit_types {
 };
 
 #include <acpi/ghes.h>
+
+#ifdef CONFIG_ARM_SDE_INTERFACE
 #include <asm/sdei.h>
+#endif
 
 /* Arch code should override this to set the entry point from firmware... */
 #ifndef sdei_arch_get_entry_point
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 11/12] mm/memory-failure: increase queued recovery work's priority
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-04-27 15:35   ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

arm64 can take an NMI-like error notification when user-space steps in
some corrupt memory. APEI's GHES code will call memory_failure_queue()
to schedule the recovery work. We then return to user-space, possibly
taking the fault again.

Currently the arch code unconditionally signals user-space from this
path, so we don't get stuck in this loop, but the affected process
never benefits from memory_failure()s recovery work. To fix this we
need to know the recovery work will run before we get back to user-space.

Increase the priority of the recovery work by scheduling it on the
system_highpri_wq, then try to bump the current task off this CPU
so that the recover work starts immediately.

Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
CC: Xie XiuQi <xiexiuqi@huawei.com>
CC: gengdongjiu <gengdongjiu@huawei.com>
---
 mm/memory-failure.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9d142b9b86dc..f0e69d7ac406 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -55,6 +55,7 @@
 #include <linux/hugetlb.h>
 #include <linux/memory_hotplug.h>
 #include <linux/mm_inline.h>
+#include <linux/preempt.h>
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include "internal.h"
@@ -1333,6 +1334,7 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
  */
 void memory_failure_queue(unsigned long pfn, int flags)
 {
+	int cpu = smp_processor_id();
 	struct memory_failure_cpu *mf_cpu;
 	unsigned long proc_flags;
 	struct memory_failure_entry entry = {
@@ -1342,11 +1344,14 @@ void memory_failure_queue(unsigned long pfn, int flags)
 
 	mf_cpu = &get_cpu_var(memory_failure_cpu);
 	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
-	if (kfifo_put(&mf_cpu->fifo, entry))
-		schedule_work_on(smp_processor_id(), &mf_cpu->work);
-	else
+	if (kfifo_put(&mf_cpu->fifo, entry)) {
+		queue_work_on(cpu, system_highpri_wq, &mf_cpu->work);
+		set_tsk_need_resched(current);
+		preempt_set_need_resched();
+	} else {
 		pr_err("Memory failure: buffer overflow when queuing memory failure at %#lx\n",
 		       pfn);
+	}
 	spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
 	put_cpu_var(memory_failure_cpu);
 }
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 11/12] mm/memory-failure: increase queued recovery work's priority
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

arm64 can take an NMI-like error notification when user-space steps in
some corrupt memory. APEI's GHES code will call memory_failure_queue()
to schedule the recovery work. We then return to user-space, possibly
taking the fault again.

Currently the arch code unconditionally signals user-space from this
path, so we don't get stuck in this loop, but the affected process
never benefits from memory_failure()s recovery work. To fix this we
need to know the recovery work will run before we get back to user-space.

Increase the priority of the recovery work by scheduling it on the
system_highpri_wq, then try to bump the current task off this CPU
so that the recover work starts immediately.

Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
CC: Xie XiuQi <xiexiuqi@huawei.com>
CC: gengdongjiu <gengdongjiu@huawei.com>
---
 mm/memory-failure.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9d142b9b86dc..f0e69d7ac406 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -55,6 +55,7 @@
 #include <linux/hugetlb.h>
 #include <linux/memory_hotplug.h>
 #include <linux/mm_inline.h>
+#include <linux/preempt.h>
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include "internal.h"
@@ -1333,6 +1334,7 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
  */
 void memory_failure_queue(unsigned long pfn, int flags)
 {
+	int cpu = smp_processor_id();
 	struct memory_failure_cpu *mf_cpu;
 	unsigned long proc_flags;
 	struct memory_failure_entry entry = {
@@ -1342,11 +1344,14 @@ void memory_failure_queue(unsigned long pfn, int flags)
 
 	mf_cpu = &get_cpu_var(memory_failure_cpu);
 	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
-	if (kfifo_put(&mf_cpu->fifo, entry))
-		schedule_work_on(smp_processor_id(), &mf_cpu->work);
-	else
+	if (kfifo_put(&mf_cpu->fifo, entry)) {
+		queue_work_on(cpu, system_highpri_wq, &mf_cpu->work);
+		set_tsk_need_resched(current);
+		preempt_set_need_resched();
+	} else {
 		pr_err("Memory failure: buffer overflow when queuing memory failure at %#lx\n",
 		       pfn);
+	}
 	spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
 	put_cpu_var(memory_failure_cpu);
 }
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 11/12] mm/memory-failure: increase queued recovery work's priority
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

arm64 can take an NMI-like error notification when user-space steps in
some corrupt memory. APEI's GHES code will call memory_failure_queue()
to schedule the recovery work. We then return to user-space, possibly
taking the fault again.

Currently the arch code unconditionally signals user-space from this
path, so we don't get stuck in this loop, but the affected process
never benefits from memory_failure()s recovery work. To fix this we
need to know the recovery work will run before we get back to user-space.

Increase the priority of the recovery work by scheduling it on the
system_highpri_wq, then try to bump the current task off this CPU
so that the recover work starts immediately.

Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
CC: Xie XiuQi <xiexiuqi@huawei.com>
CC: gengdongjiu <gengdongjiu@huawei.com>
---
 mm/memory-failure.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9d142b9b86dc..f0e69d7ac406 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -55,6 +55,7 @@
 #include <linux/hugetlb.h>
 #include <linux/memory_hotplug.h>
 #include <linux/mm_inline.h>
+#include <linux/preempt.h>
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include "internal.h"
@@ -1333,6 +1334,7 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
  */
 void memory_failure_queue(unsigned long pfn, int flags)
 {
+	int cpu = smp_processor_id();
 	struct memory_failure_cpu *mf_cpu;
 	unsigned long proc_flags;
 	struct memory_failure_entry entry = {
@@ -1342,11 +1344,14 @@ void memory_failure_queue(unsigned long pfn, int flags)
 
 	mf_cpu = &get_cpu_var(memory_failure_cpu);
 	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
-	if (kfifo_put(&mf_cpu->fifo, entry))
-		schedule_work_on(smp_processor_id(), &mf_cpu->work);
-	else
+	if (kfifo_put(&mf_cpu->fifo, entry)) {
+		queue_work_on(cpu, system_highpri_wq, &mf_cpu->work);
+		set_tsk_need_resched(current);
+		preempt_set_need_resched();
+	} else {
 		pr_err("Memory failure: buffer overflow when queuing memory failure at %#lx\n",
 		       pfn);
+	}
 	spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
 	put_cpu_var(memory_failure_cpu);
 }
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 12/12] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-04-27 15:35   ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

APEI is unable to do all of its error handling work in nmi-context, so
it defers non-fatal work onto the irq_work queue. arch_irq_work_raise()
sends an IPI to the calling cpu, but we can't guarantee this will be
taken before we return.

Unless we interrupted a context with irqs-masked, we can call
irq_work_run() to do the work now. Otherwise return -EINPROGRESS to
indicate ghes_notify_sea() found some work to do, but it hasn't
finished yet.

With this we can take apei_claim_sea() returning '0' to mean this
external-abort was also notification of a firmware-first RAS error,
and that APEI has processed the CPER records.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
CC: Xie XiuQi <xiexiuqi@huawei.com>
CC: gengdongjiu <gengdongjiu@huawei.com>
---
Changes since v2:
 * Removed IS_ENABLED() check, done by the caller unless we have a dummy
   definition.

 arch/arm64/kernel/acpi.c | 19 +++++++++++++++++++
 arch/arm64/mm/fault.c    |  9 ++++-----
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 6a4823a3eb5e..a51a7abd98e0 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -22,6 +22,7 @@
 #include <linux/init.h>
 #include <linux/irq.h>
 #include <linux/irqdomain.h>
+#include <linux/irq_work.h>
 #include <linux/memblock.h>
 #include <linux/of_fdt.h>
 #include <linux/smp.h>
@@ -275,10 +276,14 @@ int apei_claim_sea(struct pt_regs *regs)
 {
 	int err = -ENOENT;
 	unsigned long current_flags = arch_local_save_flags();
+	unsigned long interrupted_flags = current_flags;
 
 	if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA))
 		return err;
 
+	if (regs)
+		interrupted_flags = regs->pstate;
+
 	/*
 	 * APEI expects an NMI-like notification to always be called
 	 * in NMI context.
@@ -287,6 +292,20 @@ int apei_claim_sea(struct pt_regs *regs)
 	nmi_enter();
 	err = ghes_notify_sea();
 	nmi_exit();
+
+	/*
+	 * APEI NMI-like notifications are deferred to irq_work. Unless
+	 * we interrupted irqs-masked code, we can do that now.
+	 */
+	if (!err) {
+		if (!arch_irqs_disabled_flags(interrupted_flags)) {
+			local_daif_restore(DAIF_PROCCTX_NOIRQ);
+			irq_work_run();
+		} else {
+			err = -EINPROGRESS;
+		}
+	}
+
 	local_daif_restore(current_flags);
 
 	return err;
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index d7e89da0e5df..0232e9064144 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -568,11 +568,10 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 
 	inf = esr_to_fault_info(esr);
 
-	/*
-	 * Return value ignored as we rely on signal merging.
-	 * Future patches will make this more robust.
-	 */
-	apei_claim_sea(regs);
+	if (apei_claim_sea(regs) == 0) {
+		/* APEI claimed this as a firmware-first notification */
+		return 0;
+	}
 
 	info.si_signo = inf->sig;
 	info.si_errno = 0;
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 12/12] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-acpi
  Cc: kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, Punit Agrawal,
	jonathan.zhang, James Morse

APEI is unable to do all of its error handling work in nmi-context, so
it defers non-fatal work onto the irq_work queue. arch_irq_work_raise()
sends an IPI to the calling cpu, but we can't guarantee this will be
taken before we return.

Unless we interrupted a context with irqs-masked, we can call
irq_work_run() to do the work now. Otherwise return -EINPROGRESS to
indicate ghes_notify_sea() found some work to do, but it hasn't
finished yet.

With this we can take apei_claim_sea() returning '0' to mean this
external-abort was also notification of a firmware-first RAS error,
and that APEI has processed the CPER records.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
CC: Xie XiuQi <xiexiuqi@huawei.com>
CC: gengdongjiu <gengdongjiu@huawei.com>
---
Changes since v2:
 * Removed IS_ENABLED() check, done by the caller unless we have a dummy
   definition.

 arch/arm64/kernel/acpi.c | 19 +++++++++++++++++++
 arch/arm64/mm/fault.c    |  9 ++++-----
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 6a4823a3eb5e..a51a7abd98e0 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -22,6 +22,7 @@
 #include <linux/init.h>
 #include <linux/irq.h>
 #include <linux/irqdomain.h>
+#include <linux/irq_work.h>
 #include <linux/memblock.h>
 #include <linux/of_fdt.h>
 #include <linux/smp.h>
@@ -275,10 +276,14 @@ int apei_claim_sea(struct pt_regs *regs)
 {
 	int err = -ENOENT;
 	unsigned long current_flags = arch_local_save_flags();
+	unsigned long interrupted_flags = current_flags;
 
 	if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA))
 		return err;
 
+	if (regs)
+		interrupted_flags = regs->pstate;
+
 	/*
 	 * APEI expects an NMI-like notification to always be called
 	 * in NMI context.
@@ -287,6 +292,20 @@ int apei_claim_sea(struct pt_regs *regs)
 	nmi_enter();
 	err = ghes_notify_sea();
 	nmi_exit();
+
+	/*
+	 * APEI NMI-like notifications are deferred to irq_work. Unless
+	 * we interrupted irqs-masked code, we can do that now.
+	 */
+	if (!err) {
+		if (!arch_irqs_disabled_flags(interrupted_flags)) {
+			local_daif_restore(DAIF_PROCCTX_NOIRQ);
+			irq_work_run();
+		} else {
+			err = -EINPROGRESS;
+		}
+	}
+
 	local_daif_restore(current_flags);
 
 	return err;
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index d7e89da0e5df..0232e9064144 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -568,11 +568,10 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 
 	inf = esr_to_fault_info(esr);
 
-	/*
-	 * Return value ignored as we rely on signal merging.
-	 * Future patches will make this more robust.
-	 */
-	apei_claim_sea(regs);
+	if (apei_claim_sea(regs) == 0) {
+		/* APEI claimed this as a firmware-first notification */
+		return 0;
+	}
 
 	info.si_signo = inf->sig;
 	info.si_errno = 0;
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v3 12/12] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
@ 2018-04-27 15:35   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-04-27 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

APEI is unable to do all of its error handling work in nmi-context, so
it defers non-fatal work onto the irq_work queue. arch_irq_work_raise()
sends an IPI to the calling cpu, but we can't guarantee this will be
taken before we return.

Unless we interrupted a context with irqs-masked, we can call
irq_work_run() to do the work now. Otherwise return -EINPROGRESS to
indicate ghes_notify_sea() found some work to do, but it hasn't
finished yet.

With this we can take apei_claim_sea() returning '0' to mean this
external-abort was also notification of a firmware-first RAS error,
and that APEI has processed the CPER records.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
CC: Xie XiuQi <xiexiuqi@huawei.com>
CC: gengdongjiu <gengdongjiu@huawei.com>
---
Changes since v2:
 * Removed IS_ENABLED() check, done by the caller unless we have a dummy
   definition.

 arch/arm64/kernel/acpi.c | 19 +++++++++++++++++++
 arch/arm64/mm/fault.c    |  9 ++++-----
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 6a4823a3eb5e..a51a7abd98e0 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -22,6 +22,7 @@
 #include <linux/init.h>
 #include <linux/irq.h>
 #include <linux/irqdomain.h>
+#include <linux/irq_work.h>
 #include <linux/memblock.h>
 #include <linux/of_fdt.h>
 #include <linux/smp.h>
@@ -275,10 +276,14 @@ int apei_claim_sea(struct pt_regs *regs)
 {
 	int err = -ENOENT;
 	unsigned long current_flags = arch_local_save_flags();
+	unsigned long interrupted_flags = current_flags;
 
 	if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA))
 		return err;
 
+	if (regs)
+		interrupted_flags = regs->pstate;
+
 	/*
 	 * APEI expects an NMI-like notification to always be called
 	 * in NMI context.
@@ -287,6 +292,20 @@ int apei_claim_sea(struct pt_regs *regs)
 	nmi_enter();
 	err = ghes_notify_sea();
 	nmi_exit();
+
+	/*
+	 * APEI NMI-like notifications are deferred to irq_work. Unless
+	 * we interrupted irqs-masked code, we can do that now.
+	 */
+	if (!err) {
+		if (!arch_irqs_disabled_flags(interrupted_flags)) {
+			local_daif_restore(DAIF_PROCCTX_NOIRQ);
+			irq_work_run();
+		} else {
+			err = -EINPROGRESS;
+		}
+	}
+
 	local_daif_restore(current_flags);
 
 	return err;
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index d7e89da0e5df..0232e9064144 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -568,11 +568,10 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 
 	inf = esr_to_fault_info(esr);
 
-	/*
-	 * Return value ignored as we rely on signal merging.
-	 * Future patches will make this more robust.
-	 */
-	apei_claim_sea(regs);
+	if (apei_claim_sea(regs) == 0) {
+		/* APEI claimed this as a firmware-first notification */
+		return 0;
+	}
 
 	info.si_signo = inf->sig;
 	info.si_errno = 0;
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
  2018-04-27 15:34   ` James Morse
  (?)
@ 2018-05-01 10:43     ` Punit Agrawal
  -1 siblings, 0 replies; 78+ messages in thread
From: Punit Agrawal @ 2018-05-01 10:43 UTC (permalink / raw)
  To: James Morse
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Xie XiuQi, linux-mm, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-acpi, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

Hi James,

James Morse <james.morse@arm.com> writes:

> To support asynchronous NMI-like notifications on arm64 we need to use
> the estatus-queue. These patches refactor it to allow multiple APEI
> notification types to use it.
>
> First we move the estatus-queue code higher in the file so that any
> notify_foo() handler can make use of it.
>
> This patch moves code around ... and makes the following trivial change:
> Freshen the dated comment above ghes_estatus_llist. printk() is no
> longer the issue, its the helpers like memory_failure_queue() that
> still aren't nmi safe.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
>


> Notes for cover letter:
> ghes.c has three things all called 'estatus'. One is a pool of memory
> that has a static size, and is grown/shrunk when new NMI users are
> allocated.
> The second is the cache, this holds recent notifications so we can
> suppress notifications we've already handled.
> The last is the queue, which hold data from NMI notifications (in pool
> memory) that can't be handled immediatly.


I am guessing you intended to drop the notes before sending the patch
out.

Calling this out as it'd make sense to clean-this up if the series is
ready for merging.

Thanks,
Punit

[...]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
@ 2018-05-01 10:43     ` Punit Agrawal
  0 siblings, 0 replies; 78+ messages in thread
From: Punit Agrawal @ 2018-05-01 10:43 UTC (permalink / raw)
  To: James Morse
  Cc: linux-acpi, kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, jonathan.zhang

Hi James,

James Morse <james.morse@arm.com> writes:

> To support asynchronous NMI-like notifications on arm64 we need to use
> the estatus-queue. These patches refactor it to allow multiple APEI
> notification types to use it.
>
> First we move the estatus-queue code higher in the file so that any
> notify_foo() handler can make use of it.
>
> This patch moves code around ... and makes the following trivial change:
> Freshen the dated comment above ghes_estatus_llist. printk() is no
> longer the issue, its the helpers like memory_failure_queue() that
> still aren't nmi safe.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
>


> Notes for cover letter:
> ghes.c has three things all called 'estatus'. One is a pool of memory
> that has a static size, and is grown/shrunk when new NMI users are
> allocated.
> The second is the cache, this holds recent notifications so we can
> suppress notifications we've already handled.
> The last is the queue, which hold data from NMI notifications (in pool
> memory) that can't be handled immediatly.


I am guessing you intended to drop the notes before sending the patch
out.

Calling this out as it'd make sense to clean-this up if the series is
ready for merging.

Thanks,
Punit

[...]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
@ 2018-05-01 10:43     ` Punit Agrawal
  0 siblings, 0 replies; 78+ messages in thread
From: Punit Agrawal @ 2018-05-01 10:43 UTC (permalink / raw)
  To: linux-arm-kernel

Hi James,

James Morse <james.morse@arm.com> writes:

> To support asynchronous NMI-like notifications on arm64 we need to use
> the estatus-queue. These patches refactor it to allow multiple APEI
> notification types to use it.
>
> First we move the estatus-queue code higher in the file so that any
> notify_foo() handler can make use of it.
>
> This patch moves code around ... and makes the following trivial change:
> Freshen the dated comment above ghes_estatus_llist. printk() is no
> longer the issue, its the helpers like memory_failure_queue() that
> still aren't nmi safe.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
>


> Notes for cover letter:
> ghes.c has three things all called 'estatus'. One is a pool of memory
> that has a static size, and is grown/shrunk when new NMI users are
> allocated.
> The second is the cache, this holds recent notifications so we can
> suppress notifications we've already handled.
> The last is the queue, which hold data from NMI notifications (in pool
> memory) that can't be handled immediatly.


I am guessing you intended to drop the notes before sending the patch
out.

Calling this out as it'd make sense to clean-this up if the series is
ready for merging.

Thanks,
Punit

[...]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
  2018-05-01 10:43     ` Punit Agrawal
  (?)
@ 2018-05-01 12:50       ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-05-01 12:50 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Xie XiuQi, linux-mm, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-acpi, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

Hi Punit,

On 01/05/18 11:43, Punit Agrawal wrote:
> James Morse <james.morse@arm.com> writes:
>> Notes for cover letter:
>> ghes.c has three things all called 'estatus'. One is a pool of memory
>> that has a static size, and is grown/shrunk when new NMI users are
>> allocated.
>> The second is the cache, this holds recent notifications so we can
>> suppress notifications we've already handled.
>> The last is the queue, which hold data from NMI notifications (in pool
>> memory) that can't be handled immediatly.
> 
> 
> I am guessing you intended to drop the notes before sending the patch
> out.

Ha, oops!

> Calling this out as it'd make sense to clean-this up if the series is
> ready for merging.

Yes, thanks for calling that out.

I've deleted them now (copy-paste cover-letter serves the same purpose), so any
v4 will drop this.


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
@ 2018-05-01 12:50       ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-05-01 12:50 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: linux-acpi, kvmarm, linux-arm-kernel, linux-mm, Borislav Petkov,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Tyler Baicar, Dongjiu Geng, Xie XiuQi, jonathan.zhang

Hi Punit,

On 01/05/18 11:43, Punit Agrawal wrote:
> James Morse <james.morse@arm.com> writes:
>> Notes for cover letter:
>> ghes.c has three things all called 'estatus'. One is a pool of memory
>> that has a static size, and is grown/shrunk when new NMI users are
>> allocated.
>> The second is the cache, this holds recent notifications so we can
>> suppress notifications we've already handled.
>> The last is the queue, which hold data from NMI notifications (in pool
>> memory) that can't be handled immediatly.
> 
> 
> I am guessing you intended to drop the notes before sending the patch
> out.

Ha, oops!

> Calling this out as it'd make sense to clean-this up if the series is
> ready for merging.

Yes, thanks for calling that out.

I've deleted them now (copy-paste cover-letter serves the same purpose), so any
v4 will drop this.


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
@ 2018-05-01 12:50       ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-05-01 12:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Punit,

On 01/05/18 11:43, Punit Agrawal wrote:
> James Morse <james.morse@arm.com> writes:
>> Notes for cover letter:
>> ghes.c has three things all called 'estatus'. One is a pool of memory
>> that has a static size, and is grown/shrunk when new NMI users are
>> allocated.
>> The second is the cache, this holds recent notifications so we can
>> suppress notifications we've already handled.
>> The last is the queue, which hold data from NMI notifications (in pool
>> memory) that can't be handled immediatly.
> 
> 
> I am guessing you intended to drop the notes before sending the patch
> out.

Ha, oops!

> Calling this out as it'd make sense to clean-this up if the series is
> ready for merging.

Yes, thanks for calling that out.

I've deleted them now (copy-paste cover-letter serves the same purpose), so any
v4 will drop this.


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 00/12] APEI in_nmi() rework and arm64 SDEI wire-up
  2018-04-27 15:34 ` James Morse
  (?)
@ 2018-05-01 20:15   ` Tyler Baicar
  -1 siblings, 0 replies; 78+ messages in thread
From: Tyler Baicar @ 2018-05-01 20:15 UTC (permalink / raw)
  To: James Morse, linux-acpi
  Cc: jonathan.zhang, Tony Luck, Christoffer Dall, Xie XiuQi,
	Marc Zyngier, Catalin Marinas, Punit Agrawal, Rafael Wysocki,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

Hi James,

I tested this and verified that the NULL pointer issue that I reported on the 
initial version is fixed.

I don't currently have a way to test the SDEI parts of this, but for all other 
patches:

Tested-by: Tyler Baicar <tbaicar@codeaurora.org>

Thanks!
Tyler


On 4/27/2018 11:34 AM, James Morse wrote:
> The aim of this series is to wire arm64's SDEI into APEI.
>
> Nothing much has changed since v2, patch 3 is new and the KVM IS_ENABLED()
> stuff has been cleaned up. Otherwise things are noted in each patch.
>
> This touches a few trees, so I'm not sure how best it should be merged.
> Patches 11 and 12 are reducing a race that is made worse by patch 4, I'd
> like them to arrive together, even though patch 11 doesn't depend on anything
> else in the series. (patches 11&12 could be moved to after 8 if that makes
> things easier)
>
>
> The earlier boiler-plate:
>
> What's SDEI? Its ARM's "Software Delegated Exception Interface" [0]. It's
> used by firmware to tell the OS about firmware-first RAS events.
>
> These Software exceptions can interrupt anything, so I describe them as
> NMI-like. They aren't the only NMI-like way to notify the OS about
> firmware-first RAS events, the ACPI spec also defines 'NOTFIY_SEA' and
> 'NOTIFY_SEI'.
>
> (Acronyms: SEA, Synchronous External Abort. The CPU requested some memory,
> but the owner of that memory said no. These are always synchronous with the
> instruction that caused them. SEI, System-Error Interrupt, commonly called
> SError. This is an asynchronous external abort, the memory-owner didn't say no
> at the right point. Collectively these things are called external-aborts
> How is firmware involved? It traps these and re-injects them into the kernel
> once its written the CPER records).
>
>
> APEI's GHES code only expects one source of NMI. If a platform implements
> more than one of these mechanisms, APEI needs to handle the interaction.
> 'SEA' and 'SEI' can interact as 'SEI' is asynchronous. SDEI can interact
> with itself: its exceptions can be 'normal' or 'critical', and firmware
> could use both types for RAS. (errors using normal, 'panic-now' using
> critical).
>
> What does this series do?
> Patches 1-4 refactor APEIs 'estatus queue' so it can be used for all
> NMI-like notifications. This defers the NMI work to irq_work, which will
> happen when we next unmask interrupts.
>
> Patches 5&6 move the arch and KVM code around so that NMI-like notifications
> are always called in_nmi().
>
> Patch 7 changes the 'irq or nmi?' path through ghes_copy_tofrom_phys()
> to be per-ghes. When called in_nmi(), the struct ghes is expected to
> provide a fixmap slot and lock that is safe to use. NMI-like notifications
> that mask each other can share these resources. Those that interact should
> have their own fixmap slot and lock.
>
> Patch 8 renames NOTIFY_SEA's use of NOTIFY_NMI's infrastructure, as we're
> about to have multiple NMI-like users that can't share resources.
>
> Pathes 9&10 add the SDEI helper, and notify methods for APEI.
>
> After this, adding further firmware-first pieces for arm64 is simple
> (and safe), and all our NMI-like notifications behave the same as x86's
> NOTIFY_NMI.
>
> All of this makes the race between memory_failure_queue() and
> ret_to_user worse, as there is now always irq_work involved.
>
> Patch 11 makes the reschedule to memory_failure() run as soon as possible.
> Patch 12 makes sure the arch code knows whether the irq_work has run by
> the time do_sea() returns. We can skip the signalling step if it has as
> APEI has done its work.
>
>
> ghes.c became clearer to me when I worked out that it has three sets of
> functions with 'estatus' in the name. One is a pool of memory that can be
> allocated-from atomically. This is grown/shrunk when new NMI users are
> allocated.
> The second is the estatus-cache, which holds recent notifications so it
> can suppress notifications we've already handled.
> The last it the estatus-queue, which holds data from NMI-like notifications
> (in pool memory) to be processed from irq_work.
>
>
> Testing?
> Tested with the SDEI FVP based software model and a mocked up NOTFIY_SEA using
> KVM. I've added a case where 'corrected errors' are discovered at probe time
> to exercise ghes_probe() during boot. I've only build tested this on x86.
>
>
> Thanks,
>
> James
>
> [0] http://infocenter.arm.com/help/topic/com.arm.doc.den0054a/ARM_DEN0054A_Software_Delegated_Exception_Interface.pdf
>
>
> James Morse (12):
>    ACPI / APEI: Move the estatus queue code up, and under its own ifdef
>    ACPI / APEI: Generalise the estatus queue's add/remove and notify code
>    ACPI / APEI: don't wait to serialise with oops messages when
>      panic()ing
>    ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
>    KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
>    arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
>    ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple
>      in_nmi() users
>    ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
>    firmware: arm_sdei: Add ACPI GHES registration helper
>    ACPI / APEI: Add support for the SDEI GHES Notification type
>    mm/memory-failure: increase queued recovery work's priority
>    arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
>
>   arch/arm/include/asm/kvm_ras.h       |  14 +
>   arch/arm/include/asm/system_misc.h   |   5 -
>   arch/arm64/include/asm/acpi.h        |   5 +-
>   arch/arm64/include/asm/daifflags.h   |   1 +
>   arch/arm64/include/asm/fixmap.h      |   8 +-
>   arch/arm64/include/asm/kvm_ras.h     |  24 ++
>   arch/arm64/include/asm/system_misc.h |   2 -
>   arch/arm64/kernel/acpi.c             |  49 ++++
>   arch/arm64/mm/fault.c                |  30 +-
>   drivers/acpi/apei/ghes.c             | 517 ++++++++++++++++++++---------------
>   drivers/firmware/arm_sdei.c          |  77 ++++++
>   include/acpi/ghes.h                  |   4 +
>   include/linux/arm_sdei.h             |   8 +
>   mm/memory-failure.c                  |  11 +-
>   virt/kvm/arm/mmu.c                   |   4 +-
>   15 files changed, 499 insertions(+), 260 deletions(-)
>   create mode 100644 arch/arm/include/asm/kvm_ras.h
>   create mode 100644 arch/arm64/include/asm/kvm_ras.h
>

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 00/12] APEI in_nmi() rework and arm64 SDEI wire-up
@ 2018-05-01 20:15   ` Tyler Baicar
  0 siblings, 0 replies; 78+ messages in thread
From: Tyler Baicar @ 2018-05-01 20:15 UTC (permalink / raw)
  To: James Morse, linux-acpi
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, Xie XiuQi, Marc Zyngier, Catalin Marinas,
	Will Deacon, Dongjiu Geng, linux-mm, Borislav Petkov,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

Hi James,

I tested this and verified that the NULL pointer issue that I reported on the 
initial version is fixed.

I don't currently have a way to test the SDEI parts of this, but for all other 
patches:

Tested-by: Tyler Baicar <tbaicar@codeaurora.org>

Thanks!
Tyler


On 4/27/2018 11:34 AM, James Morse wrote:
> The aim of this series is to wire arm64's SDEI into APEI.
>
> Nothing much has changed since v2, patch 3 is new and the KVM IS_ENABLED()
> stuff has been cleaned up. Otherwise things are noted in each patch.
>
> This touches a few trees, so I'm not sure how best it should be merged.
> Patches 11 and 12 are reducing a race that is made worse by patch 4, I'd
> like them to arrive together, even though patch 11 doesn't depend on anything
> else in the series. (patches 11&12 could be moved to after 8 if that makes
> things easier)
>
>
> The earlier boiler-plate:
>
> What's SDEI? Its ARM's "Software Delegated Exception Interface" [0]. It's
> used by firmware to tell the OS about firmware-first RAS events.
>
> These Software exceptions can interrupt anything, so I describe them as
> NMI-like. They aren't the only NMI-like way to notify the OS about
> firmware-first RAS events, the ACPI spec also defines 'NOTFIY_SEA' and
> 'NOTIFY_SEI'.
>
> (Acronyms: SEA, Synchronous External Abort. The CPU requested some memory,
> but the owner of that memory said no. These are always synchronous with the
> instruction that caused them. SEI, System-Error Interrupt, commonly called
> SError. This is an asynchronous external abort, the memory-owner didn't say no
> at the right point. Collectively these things are called external-aborts
> How is firmware involved? It traps these and re-injects them into the kernel
> once its written the CPER records).
>
>
> APEI's GHES code only expects one source of NMI. If a platform implements
> more than one of these mechanisms, APEI needs to handle the interaction.
> 'SEA' and 'SEI' can interact as 'SEI' is asynchronous. SDEI can interact
> with itself: its exceptions can be 'normal' or 'critical', and firmware
> could use both types for RAS. (errors using normal, 'panic-now' using
> critical).
>
> What does this series do?
> Patches 1-4 refactor APEIs 'estatus queue' so it can be used for all
> NMI-like notifications. This defers the NMI work to irq_work, which will
> happen when we next unmask interrupts.
>
> Patches 5&6 move the arch and KVM code around so that NMI-like notifications
> are always called in_nmi().
>
> Patch 7 changes the 'irq or nmi?' path through ghes_copy_tofrom_phys()
> to be per-ghes. When called in_nmi(), the struct ghes is expected to
> provide a fixmap slot and lock that is safe to use. NMI-like notifications
> that mask each other can share these resources. Those that interact should
> have their own fixmap slot and lock.
>
> Patch 8 renames NOTIFY_SEA's use of NOTIFY_NMI's infrastructure, as we're
> about to have multiple NMI-like users that can't share resources.
>
> Pathes 9&10 add the SDEI helper, and notify methods for APEI.
>
> After this, adding further firmware-first pieces for arm64 is simple
> (and safe), and all our NMI-like notifications behave the same as x86's
> NOTIFY_NMI.
>
> All of this makes the race between memory_failure_queue() and
> ret_to_user worse, as there is now always irq_work involved.
>
> Patch 11 makes the reschedule to memory_failure() run as soon as possible.
> Patch 12 makes sure the arch code knows whether the irq_work has run by
> the time do_sea() returns. We can skip the signalling step if it has as
> APEI has done its work.
>
>
> ghes.c became clearer to me when I worked out that it has three sets of
> functions with 'estatus' in the name. One is a pool of memory that can be
> allocated-from atomically. This is grown/shrunk when new NMI users are
> allocated.
> The second is the estatus-cache, which holds recent notifications so it
> can suppress notifications we've already handled.
> The last it the estatus-queue, which holds data from NMI-like notifications
> (in pool memory) to be processed from irq_work.
>
>
> Testing?
> Tested with the SDEI FVP based software model and a mocked up NOTFIY_SEA using
> KVM. I've added a case where 'corrected errors' are discovered at probe time
> to exercise ghes_probe() during boot. I've only build tested this on x86.
>
>
> Thanks,
>
> James
>
> [0] http://infocenter.arm.com/help/topic/com.arm.doc.den0054a/ARM_DEN0054A_Software_Delegated_Exception_Interface.pdf
>
>
> James Morse (12):
>    ACPI / APEI: Move the estatus queue code up, and under its own ifdef
>    ACPI / APEI: Generalise the estatus queue's add/remove and notify code
>    ACPI / APEI: don't wait to serialise with oops messages when
>      panic()ing
>    ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
>    KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
>    arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
>    ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple
>      in_nmi() users
>    ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
>    firmware: arm_sdei: Add ACPI GHES registration helper
>    ACPI / APEI: Add support for the SDEI GHES Notification type
>    mm/memory-failure: increase queued recovery work's priority
>    arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
>
>   arch/arm/include/asm/kvm_ras.h       |  14 +
>   arch/arm/include/asm/system_misc.h   |   5 -
>   arch/arm64/include/asm/acpi.h        |   5 +-
>   arch/arm64/include/asm/daifflags.h   |   1 +
>   arch/arm64/include/asm/fixmap.h      |   8 +-
>   arch/arm64/include/asm/kvm_ras.h     |  24 ++
>   arch/arm64/include/asm/system_misc.h |   2 -
>   arch/arm64/kernel/acpi.c             |  49 ++++
>   arch/arm64/mm/fault.c                |  30 +-
>   drivers/acpi/apei/ghes.c             | 517 ++++++++++++++++++++---------------
>   drivers/firmware/arm_sdei.c          |  77 ++++++
>   include/acpi/ghes.h                  |   4 +
>   include/linux/arm_sdei.h             |   8 +
>   mm/memory-failure.c                  |  11 +-
>   virt/kvm/arm/mmu.c                   |   4 +-
>   15 files changed, 499 insertions(+), 260 deletions(-)
>   create mode 100644 arch/arm/include/asm/kvm_ras.h
>   create mode 100644 arch/arm64/include/asm/kvm_ras.h
>

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 00/12] APEI in_nmi() rework and arm64 SDEI wire-up
@ 2018-05-01 20:15   ` Tyler Baicar
  0 siblings, 0 replies; 78+ messages in thread
From: Tyler Baicar @ 2018-05-01 20:15 UTC (permalink / raw)
  To: linux-arm-kernel

Hi James,

I tested this and verified that the NULL pointer issue that I reported on the 
initial version is fixed.

I don't currently have a way to test the SDEI parts of this, but for all other 
patches:

Tested-by: Tyler Baicar <tbaicar@codeaurora.org>

Thanks!
Tyler


On 4/27/2018 11:34 AM, James Morse wrote:
> The aim of this series is to wire arm64's SDEI into APEI.
>
> Nothing much has changed since v2, patch 3 is new and the KVM IS_ENABLED()
> stuff has been cleaned up. Otherwise things are noted in each patch.
>
> This touches a few trees, so I'm not sure how best it should be merged.
> Patches 11 and 12 are reducing a race that is made worse by patch 4, I'd
> like them to arrive together, even though patch 11 doesn't depend on anything
> else in the series. (patches 11&12 could be moved to after 8 if that makes
> things easier)
>
>
> The earlier boiler-plate:
>
> What's SDEI? Its ARM's "Software Delegated Exception Interface" [0]. It's
> used by firmware to tell the OS about firmware-first RAS events.
>
> These Software exceptions can interrupt anything, so I describe them as
> NMI-like. They aren't the only NMI-like way to notify the OS about
> firmware-first RAS events, the ACPI spec also defines 'NOTFIY_SEA' and
> 'NOTIFY_SEI'.
>
> (Acronyms: SEA, Synchronous External Abort. The CPU requested some memory,
> but the owner of that memory said no. These are always synchronous with the
> instruction that caused them. SEI, System-Error Interrupt, commonly called
> SError. This is an asynchronous external abort, the memory-owner didn't say no
> at the right point. Collectively these things are called external-aborts
> How is firmware involved? It traps these and re-injects them into the kernel
> once its written the CPER records).
>
>
> APEI's GHES code only expects one source of NMI. If a platform implements
> more than one of these mechanisms, APEI needs to handle the interaction.
> 'SEA' and 'SEI' can interact as 'SEI' is asynchronous. SDEI can interact
> with itself: its exceptions can be 'normal' or 'critical', and firmware
> could use both types for RAS. (errors using normal, 'panic-now' using
> critical).
>
> What does this series do?
> Patches 1-4 refactor APEIs 'estatus queue' so it can be used for all
> NMI-like notifications. This defers the NMI work to irq_work, which will
> happen when we next unmask interrupts.
>
> Patches 5&6 move the arch and KVM code around so that NMI-like notifications
> are always called in_nmi().
>
> Patch 7 changes the 'irq or nmi?' path through ghes_copy_tofrom_phys()
> to be per-ghes. When called in_nmi(), the struct ghes is expected to
> provide a fixmap slot and lock that is safe to use. NMI-like notifications
> that mask each other can share these resources. Those that interact should
> have their own fixmap slot and lock.
>
> Patch 8 renames NOTIFY_SEA's use of NOTIFY_NMI's infrastructure, as we're
> about to have multiple NMI-like users that can't share resources.
>
> Pathes 9&10 add the SDEI helper, and notify methods for APEI.
>
> After this, adding further firmware-first pieces for arm64 is simple
> (and safe), and all our NMI-like notifications behave the same as x86's
> NOTIFY_NMI.
>
> All of this makes the race between memory_failure_queue() and
> ret_to_user worse, as there is now always irq_work involved.
>
> Patch 11 makes the reschedule to memory_failure() run as soon as possible.
> Patch 12 makes sure the arch code knows whether the irq_work has run by
> the time do_sea() returns. We can skip the signalling step if it has as
> APEI has done its work.
>
>
> ghes.c became clearer to me when I worked out that it has three sets of
> functions with 'estatus' in the name. One is a pool of memory that can be
> allocated-from atomically. This is grown/shrunk when new NMI users are
> allocated.
> The second is the estatus-cache, which holds recent notifications so it
> can suppress notifications we've already handled.
> The last it the estatus-queue, which holds data from NMI-like notifications
> (in pool memory) to be processed from irq_work.
>
>
> Testing?
> Tested with the SDEI FVP based software model and a mocked up NOTFIY_SEA using
> KVM. I've added a case where 'corrected errors' are discovered at probe time
> to exercise ghes_probe() during boot. I've only build tested this on x86.
>
>
> Thanks,
>
> James
>
> [0] http://infocenter.arm.com/help/topic/com.arm.doc.den0054a/ARM_DEN0054A_Software_Delegated_Exception_Interface.pdf
>
>
> James Morse (12):
>    ACPI / APEI: Move the estatus queue code up, and under its own ifdef
>    ACPI / APEI: Generalise the estatus queue's add/remove and notify code
>    ACPI / APEI: don't wait to serialise with oops messages when
>      panic()ing
>    ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
>    KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
>    arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
>    ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple
>      in_nmi() users
>    ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications
>    firmware: arm_sdei: Add ACPI GHES registration helper
>    ACPI / APEI: Add support for the SDEI GHES Notification type
>    mm/memory-failure: increase queued recovery work's priority
>    arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
>
>   arch/arm/include/asm/kvm_ras.h       |  14 +
>   arch/arm/include/asm/system_misc.h   |   5 -
>   arch/arm64/include/asm/acpi.h        |   5 +-
>   arch/arm64/include/asm/daifflags.h   |   1 +
>   arch/arm64/include/asm/fixmap.h      |   8 +-
>   arch/arm64/include/asm/kvm_ras.h     |  24 ++
>   arch/arm64/include/asm/system_misc.h |   2 -
>   arch/arm64/kernel/acpi.c             |  49 ++++
>   arch/arm64/mm/fault.c                |  30 +-
>   drivers/acpi/apei/ghes.c             | 517 ++++++++++++++++++++---------------
>   drivers/firmware/arm_sdei.c          |  77 ++++++
>   include/acpi/ghes.h                  |   4 +
>   include/linux/arm_sdei.h             |   8 +
>   mm/memory-failure.c                  |  11 +-
>   virt/kvm/arm/mmu.c                   |   4 +-
>   15 files changed, 499 insertions(+), 260 deletions(-)
>   create mode 100644 arch/arm/include/asm/kvm_ras.h
>   create mode 100644 arch/arm64/include/asm/kvm_ras.h
>

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
  2018-04-27 15:34   ` James Morse
  (?)
@ 2018-05-05  9:58     ` Borislav Petkov
  -1 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-05  9:58 UTC (permalink / raw)
  To: James Morse
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Xie XiuQi, linux-mm, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-acpi, Punit Agrawal,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

On Fri, Apr 27, 2018 at 04:34:59PM +0100, James Morse wrote:
> To support asynchronous NMI-like notifications on arm64 we need to use
> the estatus-queue. These patches refactor it to allow multiple APEI
> notification types to use it.
> 
> First we move the estatus-queue code higher in the file so that any
> notify_foo() handler can make use of it.
> 
> This patch moves code around ... and makes the following trivial change:
> Freshen the dated comment above ghes_estatus_llist. printk() is no
> longer the issue, its the helpers like memory_failure_queue() that
> still aren't nmi safe.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
> 
> Notes for cover letter:
> ghes.c has three things all called 'estatus'. One is a pool of memory
> that has a static size, and is grown/shrunk when new NMI users are
> allocated.
> The second is the cache, this holds recent notifications so we can
> suppress notifications we've already handled.
> The last is the queue, which hold data from NMI notifications (in pool
> memory) that can't be handled immediatly.
> ---
>  drivers/acpi/apei/ghes.c | 265 ++++++++++++++++++++++++-----------------------
>  1 file changed, 137 insertions(+), 128 deletions(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
@ 2018-05-05  9:58     ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-05  9:58 UTC (permalink / raw)
  To: James Morse
  Cc: linux-acpi, kvmarm, linux-arm-kernel, linux-mm, Marc Zyngier,
	Christoffer Dall, Will Deacon, Catalin Marinas, Naoya Horiguchi,
	Rafael Wysocki, Len Brown, Tony Luck, Tyler Baicar, Dongjiu Geng,
	Xie XiuQi, Punit Agrawal, jonathan.zhang

On Fri, Apr 27, 2018 at 04:34:59PM +0100, James Morse wrote:
> To support asynchronous NMI-like notifications on arm64 we need to use
> the estatus-queue. These patches refactor it to allow multiple APEI
> notification types to use it.
> 
> First we move the estatus-queue code higher in the file so that any
> notify_foo() handler can make use of it.
> 
> This patch moves code around ... and makes the following trivial change:
> Freshen the dated comment above ghes_estatus_llist. printk() is no
> longer the issue, its the helpers like memory_failure_queue() that
> still aren't nmi safe.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
> 
> Notes for cover letter:
> ghes.c has three things all called 'estatus'. One is a pool of memory
> that has a static size, and is grown/shrunk when new NMI users are
> allocated.
> The second is the cache, this holds recent notifications so we can
> suppress notifications we've already handled.
> The last is the queue, which hold data from NMI notifications (in pool
> memory) that can't be handled immediatly.
> ---
>  drivers/acpi/apei/ghes.c | 265 ++++++++++++++++++++++++-----------------------
>  1 file changed, 137 insertions(+), 128 deletions(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef
@ 2018-05-05  9:58     ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-05  9:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 27, 2018 at 04:34:59PM +0100, James Morse wrote:
> To support asynchronous NMI-like notifications on arm64 we need to use
> the estatus-queue. These patches refactor it to allow multiple APEI
> notification types to use it.
> 
> First we move the estatus-queue code higher in the file so that any
> notify_foo() handler can make use of it.
> 
> This patch moves code around ... and makes the following trivial change:
> Freshen the dated comment above ghes_estatus_llist. printk() is no
> longer the issue, its the helpers like memory_failure_queue() that
> still aren't nmi safe.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
> 
> Notes for cover letter:
> ghes.c has three things all called 'estatus'. One is a pool of memory
> that has a static size, and is grown/shrunk when new NMI users are
> allocated.
> The second is the cache, this holds recent notifications so we can
> suppress notifications we've already handled.
> The last is the queue, which hold data from NMI notifications (in pool
> memory) that can't be handled immediatly.
> ---
>  drivers/acpi/apei/ghes.c | 265 ++++++++++++++++++++++++-----------------------
>  1 file changed, 137 insertions(+), 128 deletions(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 02/12] ACPI / APEI: Generalise the estatus queue's add/remove and notify code
  2018-04-27 15:35   ` James Morse
  (?)
@ 2018-05-05 10:12     ` Borislav Petkov
  -1 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-05 10:12 UTC (permalink / raw)
  To: James Morse
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Xie XiuQi, linux-mm, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-acpi, Punit Agrawal,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

On Fri, Apr 27, 2018 at 04:35:00PM +0100, James Morse wrote:
> To support asynchronous NMI-like notifications on arm64 we need to use
> the estatus-queue. These patches refactor it to allow multiple APEI
> notification types to use it.
> 
> Refactor the estatus queue's pool grow/shrink code and notification
> routine from NOTIFY_NMI's handlers. This will allow another notification
> method to use the estatus queue without duplicating this code.

These two are repeated from patch 1.

> This patch adds rcu_read_lock()/rcu_read_unlock() around the list
> list_for_each_entry_rcu() walker. These aren't strictly necessary as
> the whole nmi_enter/nmi_exit() window is a spooky RCU read-side
> critical section.
> 
> Keep the oops_begin() call for x86, arm64 doesn't have one of these,
> and APEI is the only thing outside arch code calling this..

Next patch removes it so I guess you don't have to talk about it here.

> The existing ghes_estatus_pool_shrink() is folded into the new
> ghes_estatus_queue_shrink_pool() as only the queue uses it.
> 
> _in_nmi_notify_one() is separate from the rcu-list walker for a later
> caller that doesn't need to walk a list.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
> 
> ---
> Changes since v1:
>  * Tidied up _in_nmi_notify_one().
> 
>  drivers/acpi/apei/ghes.c | 100 ++++++++++++++++++++++++++++++-----------------
>  1 file changed, 65 insertions(+), 35 deletions(-)

...

> +static int ghes_estatus_queue_notified(struct list_head *rcu_list)
> +{
> +	int ret = -ENOENT;
> +	struct ghes *ghes;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(ghes, rcu_list, list) {
> +		if (!_in_nmi_notify_one(ghes))
> +			ret = 0;
> +	}
> +	rcu_read_unlock();
> +
> +	if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) && ret == 0)

		... && !ret

like the rest of the file.

> +		irq_work_queue(&ghes_proc_irq_work);
> +
> +	return ret;
> +}
> +
>  static unsigned long ghes_esource_prealloc_size(
>  	const struct acpi_hest_generic *generic)
>  {

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 02/12] ACPI / APEI: Generalise the estatus queue's add/remove and notify code
@ 2018-05-05 10:12     ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-05 10:12 UTC (permalink / raw)
  To: James Morse
  Cc: linux-acpi, kvmarm, linux-arm-kernel, linux-mm, Marc Zyngier,
	Christoffer Dall, Will Deacon, Catalin Marinas, Naoya Horiguchi,
	Rafael Wysocki, Len Brown, Tony Luck, Tyler Baicar, Dongjiu Geng,
	Xie XiuQi, Punit Agrawal, jonathan.zhang

On Fri, Apr 27, 2018 at 04:35:00PM +0100, James Morse wrote:
> To support asynchronous NMI-like notifications on arm64 we need to use
> the estatus-queue. These patches refactor it to allow multiple APEI
> notification types to use it.
> 
> Refactor the estatus queue's pool grow/shrink code and notification
> routine from NOTIFY_NMI's handlers. This will allow another notification
> method to use the estatus queue without duplicating this code.

These two are repeated from patch 1.

> This patch adds rcu_read_lock()/rcu_read_unlock() around the list
> list_for_each_entry_rcu() walker. These aren't strictly necessary as
> the whole nmi_enter/nmi_exit() window is a spooky RCU read-side
> critical section.
> 
> Keep the oops_begin() call for x86, arm64 doesn't have one of these,
> and APEI is the only thing outside arch code calling this..

Next patch removes it so I guess you don't have to talk about it here.

> The existing ghes_estatus_pool_shrink() is folded into the new
> ghes_estatus_queue_shrink_pool() as only the queue uses it.
> 
> _in_nmi_notify_one() is separate from the rcu-list walker for a later
> caller that doesn't need to walk a list.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
> 
> ---
> Changes since v1:
>  * Tidied up _in_nmi_notify_one().
> 
>  drivers/acpi/apei/ghes.c | 100 ++++++++++++++++++++++++++++++-----------------
>  1 file changed, 65 insertions(+), 35 deletions(-)

...

> +static int ghes_estatus_queue_notified(struct list_head *rcu_list)
> +{
> +	int ret = -ENOENT;
> +	struct ghes *ghes;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(ghes, rcu_list, list) {
> +		if (!_in_nmi_notify_one(ghes))
> +			ret = 0;
> +	}
> +	rcu_read_unlock();
> +
> +	if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) && ret == 0)

		... && !ret

like the rest of the file.

> +		irq_work_queue(&ghes_proc_irq_work);
> +
> +	return ret;
> +}
> +
>  static unsigned long ghes_esource_prealloc_size(
>  	const struct acpi_hest_generic *generic)
>  {

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 02/12] ACPI / APEI: Generalise the estatus queue's add/remove and notify code
@ 2018-05-05 10:12     ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-05 10:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 27, 2018 at 04:35:00PM +0100, James Morse wrote:
> To support asynchronous NMI-like notifications on arm64 we need to use
> the estatus-queue. These patches refactor it to allow multiple APEI
> notification types to use it.
> 
> Refactor the estatus queue's pool grow/shrink code and notification
> routine from NOTIFY_NMI's handlers. This will allow another notification
> method to use the estatus queue without duplicating this code.

These two are repeated from patch 1.

> This patch adds rcu_read_lock()/rcu_read_unlock() around the list
> list_for_each_entry_rcu() walker. These aren't strictly necessary as
> the whole nmi_enter/nmi_exit() window is a spooky RCU read-side
> critical section.
> 
> Keep the oops_begin() call for x86, arm64 doesn't have one of these,
> and APEI is the only thing outside arch code calling this..

Next patch removes it so I guess you don't have to talk about it here.

> The existing ghes_estatus_pool_shrink() is folded into the new
> ghes_estatus_queue_shrink_pool() as only the queue uses it.
> 
> _in_nmi_notify_one() is separate from the rcu-list walker for a later
> caller that doesn't need to walk a list.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
> 
> ---
> Changes since v1:
>  * Tidied up _in_nmi_notify_one().
> 
>  drivers/acpi/apei/ghes.c | 100 ++++++++++++++++++++++++++++++-----------------
>  1 file changed, 65 insertions(+), 35 deletions(-)

...

> +static int ghes_estatus_queue_notified(struct list_head *rcu_list)
> +{
> +	int ret = -ENOENT;
> +	struct ghes *ghes;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(ghes, rcu_list, list) {
> +		if (!_in_nmi_notify_one(ghes))
> +			ret = 0;
> +	}
> +	rcu_read_unlock();
> +
> +	if (IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) && ret == 0)

		... && !ret

like the rest of the file.

> +		irq_work_queue(&ghes_proc_irq_work);
> +
> +	return ret;
> +}
> +
>  static unsigned long ghes_esource_prealloc_size(
>  	const struct acpi_hest_generic *generic)
>  {

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
  2018-04-27 15:35   ` James Morse
  (?)
@ 2018-05-05 12:27     ` Borislav Petkov
  -1 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-05 12:27 UTC (permalink / raw)
  To: James Morse
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Xie XiuQi, linux-mm, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-acpi, Punit Agrawal,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

On Fri, Apr 27, 2018 at 04:35:05PM +0100, James Morse wrote:
> Arm64 has multiple NMI-like notifications, but ghes.c only has one
> in_nmi() path, risking deadlock if one NMI-like notification can
> interrupt another.
> 
> To support this we need a fixmap entry and lock for each notification
> type. But ghes_probe() attempts to process each struct ghes at probe
> time, to ensure any error that was notified before ghes_probe() was
> called has been done, and the buffer released (and maybe acknowledge
> to firmware) so that future errors can be delivered.
> 
> This means NMI-like notifications need two fixmap entries and locks,
> one for the ghes_probe() time call, and another for the actual NMI
> that could interrupt ghes_probe().
> 
> Split this single path up by adding an NMI fixmap idx and lock into
> the struct ghes. Any notification that can be called as an NMI can
> use these to separate its resources from any other notification it
> may interrupt.
> 
> The majority of notifications occur in IRQ context, so unless its
> called in_nmi(), ghes_copy_tofrom_phys() will use the FIX_APEI_GHES_IRQ
> fixmap entry and the ghes_fixmap_lock_irq lock. This allows
> NMI-notifications to be processed by ghes_probe(), and then taken
> as an NMI.
> 
> The double-underscore version of fix_to_virt() is used because the index
> to be mapped can't be tested against the end of the enum at compile
> time.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---
> Changes since v1:
>  * Fixed for ghes_proc() always calling every notification in process context.
>    Now only NMI-like notifications need an additional fixmap-slot/lock.

...

> @@ -986,6 +960,8 @@ int ghes_notify_sea(void)
>  
>  static void ghes_sea_add(struct ghes *ghes)
>  {
> +	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
> +	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
>  	ghes_estatus_queue_grow_pool(ghes);
>  
>  	mutex_lock(&ghes_list_mutex);
> @@ -1032,6 +1008,8 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>  
>  static void ghes_nmi_add(struct ghes *ghes)
>  {
> +	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;

Ewww, we're assigning the spinlock to a pointer which we'll take later?
Yuck.

Why?

Do I see it correctly that one can have ACPI_HEST_NOTIFY_SEA and
ACPI_HEST_NOTIFY_NMI coexist in parallel on a single system?

If not, you can use a single spinlock.

If yes, then I'd prefer to make it less ugly and do the notification
type check ghes_probe() does:

	switch (generic->notify.type)

and take the respective spinlock in ghes_copy_tofrom_phys(). This way it
is a bit better than using a spinlock ptr.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-05 12:27     ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-05 12:27 UTC (permalink / raw)
  To: James Morse
  Cc: linux-acpi, kvmarm, linux-arm-kernel, linux-mm, Marc Zyngier,
	Christoffer Dall, Will Deacon, Catalin Marinas, Naoya Horiguchi,
	Rafael Wysocki, Len Brown, Tony Luck, Tyler Baicar, Dongjiu Geng,
	Xie XiuQi, Punit Agrawal, jonathan.zhang

On Fri, Apr 27, 2018 at 04:35:05PM +0100, James Morse wrote:
> Arm64 has multiple NMI-like notifications, but ghes.c only has one
> in_nmi() path, risking deadlock if one NMI-like notification can
> interrupt another.
> 
> To support this we need a fixmap entry and lock for each notification
> type. But ghes_probe() attempts to process each struct ghes at probe
> time, to ensure any error that was notified before ghes_probe() was
> called has been done, and the buffer released (and maybe acknowledge
> to firmware) so that future errors can be delivered.
> 
> This means NMI-like notifications need two fixmap entries and locks,
> one for the ghes_probe() time call, and another for the actual NMI
> that could interrupt ghes_probe().
> 
> Split this single path up by adding an NMI fixmap idx and lock into
> the struct ghes. Any notification that can be called as an NMI can
> use these to separate its resources from any other notification it
> may interrupt.
> 
> The majority of notifications occur in IRQ context, so unless its
> called in_nmi(), ghes_copy_tofrom_phys() will use the FIX_APEI_GHES_IRQ
> fixmap entry and the ghes_fixmap_lock_irq lock. This allows
> NMI-notifications to be processed by ghes_probe(), and then taken
> as an NMI.
> 
> The double-underscore version of fix_to_virt() is used because the index
> to be mapped can't be tested against the end of the enum at compile
> time.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---
> Changes since v1:
>  * Fixed for ghes_proc() always calling every notification in process context.
>    Now only NMI-like notifications need an additional fixmap-slot/lock.

...

> @@ -986,6 +960,8 @@ int ghes_notify_sea(void)
>  
>  static void ghes_sea_add(struct ghes *ghes)
>  {
> +	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
> +	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
>  	ghes_estatus_queue_grow_pool(ghes);
>  
>  	mutex_lock(&ghes_list_mutex);
> @@ -1032,6 +1008,8 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>  
>  static void ghes_nmi_add(struct ghes *ghes)
>  {
> +	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;

Ewww, we're assigning the spinlock to a pointer which we'll take later?
Yuck.

Why?

Do I see it correctly that one can have ACPI_HEST_NOTIFY_SEA and
ACPI_HEST_NOTIFY_NMI coexist in parallel on a single system?

If not, you can use a single spinlock.

If yes, then I'd prefer to make it less ugly and do the notification
type check ghes_probe() does:

	switch (generic->notify.type)

and take the respective spinlock in ghes_copy_tofrom_phys(). This way it
is a bit better than using a spinlock ptr.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-05 12:27     ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-05 12:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 27, 2018 at 04:35:05PM +0100, James Morse wrote:
> Arm64 has multiple NMI-like notifications, but ghes.c only has one
> in_nmi() path, risking deadlock if one NMI-like notification can
> interrupt another.
> 
> To support this we need a fixmap entry and lock for each notification
> type. But ghes_probe() attempts to process each struct ghes at probe
> time, to ensure any error that was notified before ghes_probe() was
> called has been done, and the buffer released (and maybe acknowledge
> to firmware) so that future errors can be delivered.
> 
> This means NMI-like notifications need two fixmap entries and locks,
> one for the ghes_probe() time call, and another for the actual NMI
> that could interrupt ghes_probe().
> 
> Split this single path up by adding an NMI fixmap idx and lock into
> the struct ghes. Any notification that can be called as an NMI can
> use these to separate its resources from any other notification it
> may interrupt.
> 
> The majority of notifications occur in IRQ context, so unless its
> called in_nmi(), ghes_copy_tofrom_phys() will use the FIX_APEI_GHES_IRQ
> fixmap entry and the ghes_fixmap_lock_irq lock. This allows
> NMI-notifications to be processed by ghes_probe(), and then taken
> as an NMI.
> 
> The double-underscore version of fix_to_virt() is used because the index
> to be mapped can't be tested against the end of the enum at compile
> time.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---
> Changes since v1:
>  * Fixed for ghes_proc() always calling every notification in process context.
>    Now only NMI-like notifications need an additional fixmap-slot/lock.

...

> @@ -986,6 +960,8 @@ int ghes_notify_sea(void)
>  
>  static void ghes_sea_add(struct ghes *ghes)
>  {
> +	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
> +	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
>  	ghes_estatus_queue_grow_pool(ghes);
>  
>  	mutex_lock(&ghes_list_mutex);
> @@ -1032,6 +1008,8 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>  
>  static void ghes_nmi_add(struct ghes *ghes)
>  {
> +	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;

Ewww, we're assigning the spinlock to a pointer which we'll take later?
Yuck.

Why?

Do I see it correctly that one can have ACPI_HEST_NOTIFY_SEA and
ACPI_HEST_NOTIFY_NMI coexist in parallel on a single system?

If not, you can use a single spinlock.

If yes, then I'd prefer to make it less ugly and do the notification
type check ghes_probe() does:

	switch (generic->notify.type)

and take the respective spinlock in ghes_copy_tofrom_phys(). This way it
is a bit better than using a spinlock ptr.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
  2018-05-05 12:27     ` Borislav Petkov
  (?)
@ 2018-05-08  8:45       ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-05-08  8:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Xie XiuQi, linux-mm, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-acpi, Punit Agrawal,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

Hi Borislav,

On 05/05/18 13:27, Borislav Petkov wrote:
> On Fri, Apr 27, 2018 at 04:35:05PM +0100, James Morse wrote:
>> Arm64 has multiple NMI-like notifications, but ghes.c only has one
>> in_nmi() path, risking deadlock if one NMI-like notification can
>> interrupt another.
>>
>> To support this we need a fixmap entry and lock for each notification
>> type. But ghes_probe() attempts to process each struct ghes at probe
>> time, to ensure any error that was notified before ghes_probe() was
>> called has been done, and the buffer released (and maybe acknowledge
>> to firmware) so that future errors can be delivered.
>>
>> This means NMI-like notifications need two fixmap entries and locks,
>> one for the ghes_probe() time call, and another for the actual NMI
>> that could interrupt ghes_probe().
>>
>> Split this single path up by adding an NMI fixmap idx and lock into
>> the struct ghes. Any notification that can be called as an NMI can
>> use these to separate its resources from any other notification it
>> may interrupt.
>>
>> The majority of notifications occur in IRQ context, so unless its
>> called in_nmi(), ghes_copy_tofrom_phys() will use the FIX_APEI_GHES_IRQ
>> fixmap entry and the ghes_fixmap_lock_irq lock. This allows
>> NMI-notifications to be processed by ghes_probe(), and then taken
>> as an NMI.
>>
>> The double-underscore version of fix_to_virt() is used because the index
>> to be mapped can't be tested against the end of the enum at compile
>> time.

>> @@ -986,6 +960,8 @@ int ghes_notify_sea(void)
>>  
>>  static void ghes_sea_add(struct ghes *ghes)
>>  {
>> +	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
>> +	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
>>  	ghes_estatus_queue_grow_pool(ghes);
>>  
>>  	mutex_lock(&ghes_list_mutex);
>> @@ -1032,6 +1008,8 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>>  
>>  static void ghes_nmi_add(struct ghes *ghes)
>>  {
>> +	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
> 
> Ewww, we're assigning the spinlock to a pointer which we'll take later?
> Yuck.

> Why?

So that APEI doesn't need to know which lock goes with which fixmap page, and
how these notifications interact.


> Do I see it correctly that one can have ACPI_HEST_NOTIFY_SEA and
> ACPI_HEST_NOTIFY_NMI coexist in parallel on a single system?

NOTIFY_NMI is x86's NMI, arm doesn't have anything that behaves in the same way,
so doesn't use it. The equivalent notifications with NMI-like behaviour are:
* SEA (synchronous external abort)
* SEI (SError Interrupt)
* SDEI (software delegated exception interface)


> If not, you can use a single spinlock.

Today we could, but once we have SDEI and SEI this won't work:
SDEI behaves as two notifications, 'normal' and 'critical', a different fixmap
page is needed for these as they can interrupt each other, and a different lock.

SEA can interrupt SEI, so they need a different fixmap-pages and locks.
We can always disable SEI when we're handling another NMI-like notification.

I doubt anyone would implement all three, but if they did SEA can interrupt the lot.


I'd like to avoid describing any of these interactions in ghes.c, I think it
should be possible that any notification can interrupt any other notification
without the risk of deadlock.


> If yes, then I'd prefer to make it less ugly and do the notification
> type check ghes_probe() does:
> 
> 	switch (generic->notify.type)
> 
> and take the respective spinlock in ghes_copy_tofrom_phys(). This way it
> is a bit better than using a spinlock ptr.

I wanted to avoid duplicating that list, some of the locks are #ifdef'd so it
gets ugly quickly. (We would only need the NMI-like notifications though).

I'd really like to avoid the GHES code having to know about normal/critical SDEI
events.


Alternatively, I can put the fixmap-page and spinlock in some 'struct
ghes_notification' that only the NMI-like struct-ghes need. This is just moving
the indirection up a level, but it does pair the lock with the thing it locks,
and gets rid of assigning spinlock pointers.


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-08  8:45       ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-05-08  8:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-acpi, kvmarm, linux-arm-kernel, linux-mm, Marc Zyngier,
	Christoffer Dall, Will Deacon, Catalin Marinas, Naoya Horiguchi,
	Rafael Wysocki, Len Brown, Tony Luck, Tyler Baicar, Dongjiu Geng,
	Xie XiuQi, Punit Agrawal, jonathan.zhang

Hi Borislav,

On 05/05/18 13:27, Borislav Petkov wrote:
> On Fri, Apr 27, 2018 at 04:35:05PM +0100, James Morse wrote:
>> Arm64 has multiple NMI-like notifications, but ghes.c only has one
>> in_nmi() path, risking deadlock if one NMI-like notification can
>> interrupt another.
>>
>> To support this we need a fixmap entry and lock for each notification
>> type. But ghes_probe() attempts to process each struct ghes at probe
>> time, to ensure any error that was notified before ghes_probe() was
>> called has been done, and the buffer released (and maybe acknowledge
>> to firmware) so that future errors can be delivered.
>>
>> This means NMI-like notifications need two fixmap entries and locks,
>> one for the ghes_probe() time call, and another for the actual NMI
>> that could interrupt ghes_probe().
>>
>> Split this single path up by adding an NMI fixmap idx and lock into
>> the struct ghes. Any notification that can be called as an NMI can
>> use these to separate its resources from any other notification it
>> may interrupt.
>>
>> The majority of notifications occur in IRQ context, so unless its
>> called in_nmi(), ghes_copy_tofrom_phys() will use the FIX_APEI_GHES_IRQ
>> fixmap entry and the ghes_fixmap_lock_irq lock. This allows
>> NMI-notifications to be processed by ghes_probe(), and then taken
>> as an NMI.
>>
>> The double-underscore version of fix_to_virt() is used because the index
>> to be mapped can't be tested against the end of the enum at compile
>> time.

>> @@ -986,6 +960,8 @@ int ghes_notify_sea(void)
>>  
>>  static void ghes_sea_add(struct ghes *ghes)
>>  {
>> +	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
>> +	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
>>  	ghes_estatus_queue_grow_pool(ghes);
>>  
>>  	mutex_lock(&ghes_list_mutex);
>> @@ -1032,6 +1008,8 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>>  
>>  static void ghes_nmi_add(struct ghes *ghes)
>>  {
>> +	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
> 
> Ewww, we're assigning the spinlock to a pointer which we'll take later?
> Yuck.

> Why?

So that APEI doesn't need to know which lock goes with which fixmap page, and
how these notifications interact.


> Do I see it correctly that one can have ACPI_HEST_NOTIFY_SEA and
> ACPI_HEST_NOTIFY_NMI coexist in parallel on a single system?

NOTIFY_NMI is x86's NMI, arm doesn't have anything that behaves in the same way,
so doesn't use it. The equivalent notifications with NMI-like behaviour are:
* SEA (synchronous external abort)
* SEI (SError Interrupt)
* SDEI (software delegated exception interface)


> If not, you can use a single spinlock.

Today we could, but once we have SDEI and SEI this won't work:
SDEI behaves as two notifications, 'normal' and 'critical', a different fixmap
page is needed for these as they can interrupt each other, and a different lock.

SEA can interrupt SEI, so they need a different fixmap-pages and locks.
We can always disable SEI when we're handling another NMI-like notification.

I doubt anyone would implement all three, but if they did SEA can interrupt the lot.


I'd like to avoid describing any of these interactions in ghes.c, I think it
should be possible that any notification can interrupt any other notification
without the risk of deadlock.


> If yes, then I'd prefer to make it less ugly and do the notification
> type check ghes_probe() does:
> 
> 	switch (generic->notify.type)
> 
> and take the respective spinlock in ghes_copy_tofrom_phys(). This way it
> is a bit better than using a spinlock ptr.

I wanted to avoid duplicating that list, some of the locks are #ifdef'd so it
gets ugly quickly. (We would only need the NMI-like notifications though).

I'd really like to avoid the GHES code having to know about normal/critical SDEI
events.


Alternatively, I can put the fixmap-page and spinlock in some 'struct
ghes_notification' that only the NMI-like struct-ghes need. This is just moving
the indirection up a level, but it does pair the lock with the thing it locks,
and gets rid of assigning spinlock pointers.


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-08  8:45       ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-05-08  8:45 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Borislav,

On 05/05/18 13:27, Borislav Petkov wrote:
> On Fri, Apr 27, 2018 at 04:35:05PM +0100, James Morse wrote:
>> Arm64 has multiple NMI-like notifications, but ghes.c only has one
>> in_nmi() path, risking deadlock if one NMI-like notification can
>> interrupt another.
>>
>> To support this we need a fixmap entry and lock for each notification
>> type. But ghes_probe() attempts to process each struct ghes at probe
>> time, to ensure any error that was notified before ghes_probe() was
>> called has been done, and the buffer released (and maybe acknowledge
>> to firmware) so that future errors can be delivered.
>>
>> This means NMI-like notifications need two fixmap entries and locks,
>> one for the ghes_probe() time call, and another for the actual NMI
>> that could interrupt ghes_probe().
>>
>> Split this single path up by adding an NMI fixmap idx and lock into
>> the struct ghes. Any notification that can be called as an NMI can
>> use these to separate its resources from any other notification it
>> may interrupt.
>>
>> The majority of notifications occur in IRQ context, so unless its
>> called in_nmi(), ghes_copy_tofrom_phys() will use the FIX_APEI_GHES_IRQ
>> fixmap entry and the ghes_fixmap_lock_irq lock. This allows
>> NMI-notifications to be processed by ghes_probe(), and then taken
>> as an NMI.
>>
>> The double-underscore version of fix_to_virt() is used because the index
>> to be mapped can't be tested against the end of the enum at compile
>> time.

>> @@ -986,6 +960,8 @@ int ghes_notify_sea(void)
>>  
>>  static void ghes_sea_add(struct ghes *ghes)
>>  {
>> +	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
>> +	ghes->nmi_fixmap_idx = FIX_APEI_GHES_NMI;
>>  	ghes_estatus_queue_grow_pool(ghes);
>>  
>>  	mutex_lock(&ghes_list_mutex);
>> @@ -1032,6 +1008,8 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>>  
>>  static void ghes_nmi_add(struct ghes *ghes)
>>  {
>> +	ghes->nmi_fixmap_lock = &ghes_fixmap_lock_nmi;
> 
> Ewww, we're assigning the spinlock to a pointer which we'll take later?
> Yuck.

> Why?

So that APEI doesn't need to know which lock goes with which fixmap page, and
how these notifications interact.


> Do I see it correctly that one can have ACPI_HEST_NOTIFY_SEA and
> ACPI_HEST_NOTIFY_NMI coexist in parallel on a single system?

NOTIFY_NMI is x86's NMI, arm doesn't have anything that behaves in the same way,
so doesn't use it. The equivalent notifications with NMI-like behaviour are:
* SEA (synchronous external abort)
* SEI (SError Interrupt)
* SDEI (software delegated exception interface)


> If not, you can use a single spinlock.

Today we could, but once we have SDEI and SEI this won't work:
SDEI behaves as two notifications, 'normal' and 'critical', a different fixmap
page is needed for these as they can interrupt each other, and a different lock.

SEA can interrupt SEI, so they need a different fixmap-pages and locks.
We can always disable SEI when we're handling another NMI-like notification.

I doubt anyone would implement all three, but if they did SEA can interrupt the lot.


I'd like to avoid describing any of these interactions in ghes.c, I think it
should be possible that any notification can interrupt any other notification
without the risk of deadlock.


> If yes, then I'd prefer to make it less ugly and do the notification
> type check ghes_probe() does:
> 
> 	switch (generic->notify.type)
> 
> and take the respective spinlock in ghes_copy_tofrom_phys(). This way it
> is a bit better than using a spinlock ptr.

I wanted to avoid duplicating that list, some of the locks are #ifdef'd so it
gets ugly quickly. (We would only need the NMI-like notifications though).

I'd really like to avoid the GHES code having to know about normal/critical SDEI
events.


Alternatively, I can put the fixmap-page and spinlock in some 'struct
ghes_notification' that only the NMI-like struct-ghes need. This is just moving
the indirection up a level, but it does pair the lock with the thing it locks,
and gets rid of assigning spinlock pointers.


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
  2018-05-08  8:45       ` James Morse
  (?)
@ 2018-05-16 11:05         ` Borislav Petkov
  -1 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-16 11:05 UTC (permalink / raw)
  To: James Morse, Tyler Baicar
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, linux-mm, Marc Zyngier, Catalin Marinas,
	Will Deacon, Dongjiu Geng, linux-acpi, Thomas Gleixner,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

On Tue, May 08, 2018 at 09:45:01AM +0100, James Morse wrote:
> NOTIFY_NMI is x86's NMI, arm doesn't have anything that behaves in the same way,
> so doesn't use it. The equivalent notifications with NMI-like behaviour are:
> * SEA (synchronous external abort)
> * SEI (SError Interrupt)
> * SDEI (software delegated exception interface)

Oh wow, three! :)

> Alternatively, I can put the fixmap-page and spinlock in some 'struct
> ghes_notification' that only the NMI-like struct-ghes need. This is just moving
> the indirection up a level, but it does pair the lock with the thing it locks,
> and gets rid of assigning spinlock pointers.

Keeping the lock and what it protects in one place certainly sounds
better. I guess you could so something like this:

struct ghes_fixmap {
 union {
  raw_spinlock_t nmi_lock;
   spinlock_t lock;
 };
 void __iomem *(map)(struct ghes_fixmap *);
};

and assign the proper ghes_ioremap function to ->map.

The spin_lock_irqsave() call in ghes_copy_tofrom_phys() is kinda
questionable. Because we should have disabled interrupts so that you can
do

spin_lock(map->lock);

Except that we do get called with IRQs on and looking at that call of
ghes_proc() at the end of ghes_probe(), that's a deadlock waiting to
happen.

And that comes from:

  77b246b32b2c ("acpi: apei: check for pending errors when probing GHES entries")

Tyler, this can't work in any context: imagine the GHES NMI or IRQ or
the timer fires while that ghes_proc() runs...

What's up?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-16 11:05         ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-16 11:05 UTC (permalink / raw)
  To: James Morse, Tyler Baicar
  Cc: linux-acpi, kvmarm, linux-arm-kernel, linux-mm, Marc Zyngier,
	Christoffer Dall, Will Deacon, Catalin Marinas, Naoya Horiguchi,
	Rafael Wysocki, Len Brown, Tony Luck, Dongjiu Geng, Xie XiuQi,
	Punit Agrawal, jonathan.zhang, Thomas Gleixner

On Tue, May 08, 2018 at 09:45:01AM +0100, James Morse wrote:
> NOTIFY_NMI is x86's NMI, arm doesn't have anything that behaves in the same way,
> so doesn't use it. The equivalent notifications with NMI-like behaviour are:
> * SEA (synchronous external abort)
> * SEI (SError Interrupt)
> * SDEI (software delegated exception interface)

Oh wow, three! :)

> Alternatively, I can put the fixmap-page and spinlock in some 'struct
> ghes_notification' that only the NMI-like struct-ghes need. This is just moving
> the indirection up a level, but it does pair the lock with the thing it locks,
> and gets rid of assigning spinlock pointers.

Keeping the lock and what it protects in one place certainly sounds
better. I guess you could so something like this:

struct ghes_fixmap {
 union {
  raw_spinlock_t nmi_lock;
   spinlock_t lock;
 };
 void __iomem *(map)(struct ghes_fixmap *);
};

and assign the proper ghes_ioremap function to ->map.

The spin_lock_irqsave() call in ghes_copy_tofrom_phys() is kinda
questionable. Because we should have disabled interrupts so that you can
do

spin_lock(map->lock);

Except that we do get called with IRQs on and looking at that call of
ghes_proc() at the end of ghes_probe(), that's a deadlock waiting to
happen.

And that comes from:

  77b246b32b2c ("acpi: apei: check for pending errors when probing GHES entries")

Tyler, this can't work in any context: imagine the GHES NMI or IRQ or
the timer fires while that ghes_proc() runs...

What's up?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-16 11:05         ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-16 11:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 08, 2018 at 09:45:01AM +0100, James Morse wrote:
> NOTIFY_NMI is x86's NMI, arm doesn't have anything that behaves in the same way,
> so doesn't use it. The equivalent notifications with NMI-like behaviour are:
> * SEA (synchronous external abort)
> * SEI (SError Interrupt)
> * SDEI (software delegated exception interface)

Oh wow, three! :)

> Alternatively, I can put the fixmap-page and spinlock in some 'struct
> ghes_notification' that only the NMI-like struct-ghes need. This is just moving
> the indirection up a level, but it does pair the lock with the thing it locks,
> and gets rid of assigning spinlock pointers.

Keeping the lock and what it protects in one place certainly sounds
better. I guess you could so something like this:

struct ghes_fixmap {
 union {
  raw_spinlock_t nmi_lock;
   spinlock_t lock;
 };
 void __iomem *(map)(struct ghes_fixmap *);
};

and assign the proper ghes_ioremap function to ->map.

The spin_lock_irqsave() call in ghes_copy_tofrom_phys() is kinda
questionable. Because we should have disabled interrupts so that you can
do

spin_lock(map->lock);

Except that we do get called with IRQs on and looking at that call of
ghes_proc() at the end of ghes_probe(), that's a deadlock waiting to
happen.

And that comes from:

  77b246b32b2c ("acpi: apei: check for pending errors when probing GHES entries")

Tyler, this can't work in any context: imagine the GHES NMI or IRQ or
the timer fires while that ghes_proc() runs...

What's up?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
  2018-05-16 11:05         ` Borislav Petkov
  (?)
@ 2018-05-16 14:51           ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-05-16 14:51 UTC (permalink / raw)
  To: Borislav Petkov, Tyler Baicar
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, linux-mm, Marc Zyngier, Catalin Marinas,
	Will Deacon, Dongjiu Geng, linux-acpi, Thomas Gleixner,
	Naoya Horiguchi, kvmarm, linux-arm-kernel, Len Brown

Hi Borislav,

On 16/05/18 12:05, Borislav Petkov wrote:
> On Tue, May 08, 2018 at 09:45:01AM +0100, James Morse wrote:
>> NOTIFY_NMI is x86's NMI, arm doesn't have anything that behaves in the same way,
>> so doesn't use it. The equivalent notifications with NMI-like behaviour are:
>> * SEA (synchronous external abort)
>> * SEI (SError Interrupt)
>> * SDEI (software delegated exception interface)
> > Oh wow, three! :)

The first two overload existing architectural behavior, the third improves all
this with a third standard option. Its the standard story!


>> Alternatively, I can put the fixmap-page and spinlock in some 'struct
>> ghes_notification' that only the NMI-like struct-ghes need. This is just moving
>> the indirection up a level, but it does pair the lock with the thing it locks,
>> and gets rid of assigning spinlock pointers.
> 
> Keeping the lock and what it protects in one place certainly sounds
> better.

Yup, I was about to post a v4...


> I guess you could so something like this:
> 
> struct ghes_fixmap {
>  union {
>   raw_spinlock_t nmi_lock;
>    spinlock_t lock;
>  };

(heh, spinlock_t already contains a raw_spinlock_t)

>  void __iomem *(map)(struct ghes_fixmap *);
> };
> 
> and assign the proper ghes_ioremap function to ->map.

The function pointer is a problem because SDEI is effectively two notification
methods. Critical can interrupt normal. I'd really like to keep the differences
buried in the SDEI driver.

v4 has a separate structure for the fixmap-entry and lock, which
ghes_copy_tofrom_phys() reaches into if in_nmi().


> The spin_lock_irqsave() call in ghes_copy_tofrom_phys() is kinda
> questionable. Because we should have disabled interrupts so that you can
> do
> 
> spin_lock(map->lock);

I thought this was for the polled driver, but that must be backed by an
interrupt too...

linux/timer.h has:
|  * An irqsafe timer is executed with IRQ disabled and it's safe to wait for
|  * the completion of the running instance from IRQ handlers, for example,
|  * by calling del_timer_sync().
|  *
|  * Note: The irq disabled callback execution is a special case for
|  * workqueue locking issues. It's not meant for executing random crap
|  * with interrupts disabled. Abuse is monitored!

This irq-disable behaviour is controlled by the flags field:
| #define TIMER_DEFERRABLE	0x00080000
| #define TIMER_IRQSAFE		0x00200000

and ghes_probe() does this:
| timer_setup(&ghes->timer, ghes_poll_func, TIMER_DEFERRABLE);

So I think the ghes_poll_func() can be called with IRQs unmasked, hence the
spin_lock_irqsave().


> Except that we do get called with IRQs on and looking at that call of
> ghes_proc() at the end of ghes_probe(), that's a deadlock waiting to
> happen.
> 
> And that comes from:
> 
>   77b246b32b2c ("acpi: apei: check for pending errors when probing GHES entries")
> 
> Tyler, this can't work in any context: imagine the GHES NMI or IRQ or
> the timer fires while that ghes_proc() runs...

I thought this was safe because its just ghes_copy_tofrom_phys()s access to the
fixmap slots that needs mutual exclusion.

Polled and all the IRQ flavours are kept apart by the spin_lock_irqsave(), and
the NMIs have their own fixmap entry. (This is fine until there is more than
once source of NMI)


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-16 14:51           ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-05-16 14:51 UTC (permalink / raw)
  To: Borislav Petkov, Tyler Baicar
  Cc: linux-acpi, kvmarm, linux-arm-kernel, linux-mm, Marc Zyngier,
	Christoffer Dall, Will Deacon, Catalin Marinas, Naoya Horiguchi,
	Rafael Wysocki, Len Brown, Tony Luck, Dongjiu Geng, Xie XiuQi,
	Punit Agrawal, jonathan.zhang, Thomas Gleixner

Hi Borislav,

On 16/05/18 12:05, Borislav Petkov wrote:
> On Tue, May 08, 2018 at 09:45:01AM +0100, James Morse wrote:
>> NOTIFY_NMI is x86's NMI, arm doesn't have anything that behaves in the same way,
>> so doesn't use it. The equivalent notifications with NMI-like behaviour are:
>> * SEA (synchronous external abort)
>> * SEI (SError Interrupt)
>> * SDEI (software delegated exception interface)
> > Oh wow, three! :)

The first two overload existing architectural behavior, the third improves all
this with a third standard option. Its the standard story!


>> Alternatively, I can put the fixmap-page and spinlock in some 'struct
>> ghes_notification' that only the NMI-like struct-ghes need. This is just moving
>> the indirection up a level, but it does pair the lock with the thing it locks,
>> and gets rid of assigning spinlock pointers.
> 
> Keeping the lock and what it protects in one place certainly sounds
> better.

Yup, I was about to post a v4...


> I guess you could so something like this:
> 
> struct ghes_fixmap {
>  union {
>   raw_spinlock_t nmi_lock;
>    spinlock_t lock;
>  };

(heh, spinlock_t already contains a raw_spinlock_t)

>  void __iomem *(map)(struct ghes_fixmap *);
> };
> 
> and assign the proper ghes_ioremap function to ->map.

The function pointer is a problem because SDEI is effectively two notification
methods. Critical can interrupt normal. I'd really like to keep the differences
buried in the SDEI driver.

v4 has a separate structure for the fixmap-entry and lock, which
ghes_copy_tofrom_phys() reaches into if in_nmi().


> The spin_lock_irqsave() call in ghes_copy_tofrom_phys() is kinda
> questionable. Because we should have disabled interrupts so that you can
> do
> 
> spin_lock(map->lock);

I thought this was for the polled driver, but that must be backed by an
interrupt too...

linux/timer.h has:
|  * An irqsafe timer is executed with IRQ disabled and it's safe to wait for
|  * the completion of the running instance from IRQ handlers, for example,
|  * by calling del_timer_sync().
|  *
|  * Note: The irq disabled callback execution is a special case for
|  * workqueue locking issues. It's not meant for executing random crap
|  * with interrupts disabled. Abuse is monitored!

This irq-disable behaviour is controlled by the flags field:
| #define TIMER_DEFERRABLE	0x00080000
| #define TIMER_IRQSAFE		0x00200000

and ghes_probe() does this:
| timer_setup(&ghes->timer, ghes_poll_func, TIMER_DEFERRABLE);

So I think the ghes_poll_func() can be called with IRQs unmasked, hence the
spin_lock_irqsave().


> Except that we do get called with IRQs on and looking at that call of
> ghes_proc() at the end of ghes_probe(), that's a deadlock waiting to
> happen.
> 
> And that comes from:
> 
>   77b246b32b2c ("acpi: apei: check for pending errors when probing GHES entries")
> 
> Tyler, this can't work in any context: imagine the GHES NMI or IRQ or
> the timer fires while that ghes_proc() runs...

I thought this was safe because its just ghes_copy_tofrom_phys()s access to the
fixmap slots that needs mutual exclusion.

Polled and all the IRQ flavours are kept apart by the spin_lock_irqsave(), and
the NMIs have their own fixmap entry. (This is fine until there is more than
once source of NMI)


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-16 14:51           ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-05-16 14:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Borislav,

On 16/05/18 12:05, Borislav Petkov wrote:
> On Tue, May 08, 2018 at 09:45:01AM +0100, James Morse wrote:
>> NOTIFY_NMI is x86's NMI, arm doesn't have anything that behaves in the same way,
>> so doesn't use it. The equivalent notifications with NMI-like behaviour are:
>> * SEA (synchronous external abort)
>> * SEI (SError Interrupt)
>> * SDEI (software delegated exception interface)
> > Oh wow, three! :)

The first two overload existing architectural behavior, the third improves all
this with a third standard option. Its the standard story!


>> Alternatively, I can put the fixmap-page and spinlock in some 'struct
>> ghes_notification' that only the NMI-like struct-ghes need. This is just moving
>> the indirection up a level, but it does pair the lock with the thing it locks,
>> and gets rid of assigning spinlock pointers.
> 
> Keeping the lock and what it protects in one place certainly sounds
> better.

Yup, I was about to post a v4...


> I guess you could so something like this:
> 
> struct ghes_fixmap {
>  union {
>   raw_spinlock_t nmi_lock;
>    spinlock_t lock;
>  };

(heh, spinlock_t already contains a raw_spinlock_t)

>  void __iomem *(map)(struct ghes_fixmap *);
> };
> 
> and assign the proper ghes_ioremap function to ->map.

The function pointer is a problem because SDEI is effectively two notification
methods. Critical can interrupt normal. I'd really like to keep the differences
buried in the SDEI driver.

v4 has a separate structure for the fixmap-entry and lock, which
ghes_copy_tofrom_phys() reaches into if in_nmi().


> The spin_lock_irqsave() call in ghes_copy_tofrom_phys() is kinda
> questionable. Because we should have disabled interrupts so that you can
> do
> 
> spin_lock(map->lock);

I thought this was for the polled driver, but that must be backed by an
interrupt too...

linux/timer.h has:
|  * An irqsafe timer is executed with IRQ disabled and it's safe to wait for
|  * the completion of the running instance from IRQ handlers, for example,
|  * by calling del_timer_sync().
|  *
|  * Note: The irq disabled callback execution is a special case for
|  * workqueue locking issues. It's not meant for executing random crap
|  * with interrupts disabled. Abuse is monitored!

This irq-disable behaviour is controlled by the flags field:
| #define TIMER_DEFERRABLE	0x00080000
| #define TIMER_IRQSAFE		0x00200000

and ghes_probe() does this:
| timer_setup(&ghes->timer, ghes_poll_func, TIMER_DEFERRABLE);

So I think the ghes_poll_func() can be called with IRQs unmasked, hence the
spin_lock_irqsave().


> Except that we do get called with IRQs on and looking at that call of
> ghes_proc() at the end of ghes_probe(), that's a deadlock waiting to
> happen.
> 
> And that comes from:
> 
>   77b246b32b2c ("acpi: apei: check for pending errors when probing GHES entries")
> 
> Tyler, this can't work in any context: imagine the GHES NMI or IRQ or
> the timer fires while that ghes_proc() runs...

I thought this was safe because its just ghes_copy_tofrom_phys()s access to the
fixmap slots that needs mutual exclusion.

Polled and all the IRQ flavours are kept apart by the spin_lock_irqsave(), and
the NMIs have their own fixmap entry. (This is fine until there is more than
once source of NMI)


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
  2018-05-16 11:05         ` Borislav Petkov
  (?)
@ 2018-05-16 15:38           ` Tyler Baicar
  -1 siblings, 0 replies; 78+ messages in thread
From: Tyler Baicar @ 2018-05-16 15:38 UTC (permalink / raw)
  To: Borislav Petkov, James Morse
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Punit Agrawal, linux-mm, Marc Zyngier, Catalin Marinas,
	Xie XiuQi, Will Deacon, Dongjiu Geng, linux-acpi,
	Thomas Gleixner, Naoya Horiguchi, kvmarm, linux-arm-kernel,
	Len Brown

On 5/16/2018 7:05 AM, Borislav Petkov wrote:
> On Tue, May 08, 2018 at 09:45:01AM +0100, James Morse wrote:
>> Alternatively, I can put the fixmap-page and spinlock in some 'struct
>> ghes_notification' that only the NMI-like struct-ghes need. This is just moving
>> the indirection up a level, but it does pair the lock with the thing it locks,
>> and gets rid of assigning spinlock pointers.
> Keeping the lock and what it protects in one place certainly sounds
> better. I guess you could so something like this:
>
> struct ghes_fixmap {
>   union {
>    raw_spinlock_t nmi_lock;
>     spinlock_t lock;
>   };
>   void __iomem *(map)(struct ghes_fixmap *);
> };
>
> and assign the proper ghes_ioremap function to ->map.
>
> The spin_lock_irqsave() call in ghes_copy_tofrom_phys() is kinda
> questionable. Because we should have disabled interrupts so that you can
> do
>
> spin_lock(map->lock);
>
> Except that we do get called with IRQs on and looking at that call of
> ghes_proc() at the end of ghes_probe(), that's a deadlock waiting to
> happen.
>
> And that comes from:
>
>    77b246b32b2c ("acpi: apei: check for pending errors when probing GHES entries")
>
> Tyler, this can't work in any context: imagine the GHES NMI or IRQ or
> the timer fires while that ghes_proc() runs...
>
> What's up?
Hello Boris,

I haven't seen a deadlock from that, but it looks possible. What if the 
ghes_proc() call in ghes_probe()
is moved before the second switch statement? That way it is before the 
NMI/IRQ/poll is setup. At quick
glance I think that should avoid the deadlock and still provide the 
functionality that call was added for. I
can test that out if you all agree.

Thanks,
Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-16 15:38           ` Tyler Baicar
  0 siblings, 0 replies; 78+ messages in thread
From: Tyler Baicar @ 2018-05-16 15:38 UTC (permalink / raw)
  To: Borislav Petkov, James Morse
  Cc: linux-acpi, kvmarm, linux-arm-kernel, linux-mm, Marc Zyngier,
	Christoffer Dall, Will Deacon, Catalin Marinas, Naoya Horiguchi,
	Rafael Wysocki, Len Brown, Tony Luck, Dongjiu Geng, Xie XiuQi,
	Punit Agrawal, jonathan.zhang, Thomas Gleixner

On 5/16/2018 7:05 AM, Borislav Petkov wrote:
> On Tue, May 08, 2018 at 09:45:01AM +0100, James Morse wrote:
>> Alternatively, I can put the fixmap-page and spinlock in some 'struct
>> ghes_notification' that only the NMI-like struct-ghes need. This is just moving
>> the indirection up a level, but it does pair the lock with the thing it locks,
>> and gets rid of assigning spinlock pointers.
> Keeping the lock and what it protects in one place certainly sounds
> better. I guess you could so something like this:
>
> struct ghes_fixmap {
>   union {
>    raw_spinlock_t nmi_lock;
>     spinlock_t lock;
>   };
>   void __iomem *(map)(struct ghes_fixmap *);
> };
>
> and assign the proper ghes_ioremap function to ->map.
>
> The spin_lock_irqsave() call in ghes_copy_tofrom_phys() is kinda
> questionable. Because we should have disabled interrupts so that you can
> do
>
> spin_lock(map->lock);
>
> Except that we do get called with IRQs on and looking at that call of
> ghes_proc() at the end of ghes_probe(), that's a deadlock waiting to
> happen.
>
> And that comes from:
>
>    77b246b32b2c ("acpi: apei: check for pending errors when probing GHES entries")
>
> Tyler, this can't work in any context: imagine the GHES NMI or IRQ or
> the timer fires while that ghes_proc() runs...
>
> What's up?
Hello Boris,

I haven't seen a deadlock from that, but it looks possible. What if the 
ghes_proc() call in ghes_probe()
is moved before the second switch statement? That way it is before the 
NMI/IRQ/poll is setup. At quick
glance I think that should avoid the deadlock and still provide the 
functionality that call was added for. I
can test that out if you all agree.

Thanks,
Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-16 15:38           ` Tyler Baicar
  0 siblings, 0 replies; 78+ messages in thread
From: Tyler Baicar @ 2018-05-16 15:38 UTC (permalink / raw)
  To: linux-arm-kernel

On 5/16/2018 7:05 AM, Borislav Petkov wrote:
> On Tue, May 08, 2018 at 09:45:01AM +0100, James Morse wrote:
>> Alternatively, I can put the fixmap-page and spinlock in some 'struct
>> ghes_notification' that only the NMI-like struct-ghes need. This is just moving
>> the indirection up a level, but it does pair the lock with the thing it locks,
>> and gets rid of assigning spinlock pointers.
> Keeping the lock and what it protects in one place certainly sounds
> better. I guess you could so something like this:
>
> struct ghes_fixmap {
>   union {
>    raw_spinlock_t nmi_lock;
>     spinlock_t lock;
>   };
>   void __iomem *(map)(struct ghes_fixmap *);
> };
>
> and assign the proper ghes_ioremap function to ->map.
>
> The spin_lock_irqsave() call in ghes_copy_tofrom_phys() is kinda
> questionable. Because we should have disabled interrupts so that you can
> do
>
> spin_lock(map->lock);
>
> Except that we do get called with IRQs on and looking at that call of
> ghes_proc() at the end of ghes_probe(), that's a deadlock waiting to
> happen.
>
> And that comes from:
>
>    77b246b32b2c ("acpi: apei: check for pending errors when probing GHES entries")
>
> Tyler, this can't work in any context: imagine the GHES NMI or IRQ or
> the timer fires while that ghes_proc() runs...
>
> What's up?
Hello Boris,

I haven't seen a deadlock from that, but it looks possible. What if the 
ghes_proc() call in ghes_probe()
is moved before the second switch statement? That way it is before the 
NMI/IRQ/poll is setup. At quick
glance I think that should avoid the deadlock and still provide the 
functionality that call was added for. I
can test that out if you all agree.

Thanks,
Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
  2018-05-16 14:51           ` James Morse
  (?)
@ 2018-05-17 13:36             ` Borislav Petkov
  -1 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-17 13:36 UTC (permalink / raw)
  To: James Morse
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Xie XiuQi, linux-mm, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-acpi, Punit Agrawal,
	Thomas Gleixner, Naoya Horiguchi, kvmarm, linux-arm-kernel,
	Len Brown

On Wed, May 16, 2018 at 03:51:14PM +0100, James Morse wrote:
> The first two overload existing architectural behavior, the third improves all
> this with a third standard option. Its the standard story!

:-)

> I thought this was safe because its just ghes_copy_tofrom_phys()s access to the
> fixmap slots that needs mutual exclusion.
>
> Polled and all the IRQ flavours are kept apart by the spin_lock_irqsave(), and
> the NMIs have their own fixmap entry. (This is fine until there is more than
> once source of NMI)

For example:

ghes_probe()

	switch (generic->notify.type) {

	...

        case ACPI_HEST_NOTIFY_NMI:
		ghes_nmi_add(ghes);
	}

	...

	ghes_proc();
	  ghes_read_estatus();
		 spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);

		 memcpy...

	-> NMI

		ghes_notify_nmi();
		 ghes_read_estatus();
		 ..
		   if (in_nmi) {
			   raw_spin_lock(&ghes_ioremap_lock_nmi);

		...
	<- NMI

ghes->estatus from above, before the NMI fired, has gotten some nice
scribbling over. AFAICT.

Now, I don't know whether this can happen with the ARM facilities but if
they're NMI-like, I don't see why not.

Which means, that this code is not really reentrant and if should be
fixed to be callable from different contexts, then it should use private
buffers and be careful about locking.

Oh, and that

	if (in_nmi)
		lock()
	else
		lock_irqsave()

pattern is really yucky. And it is an explosion waiting to happen.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-17 13:36             ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-17 13:36 UTC (permalink / raw)
  To: James Morse
  Cc: Tyler Baicar, linux-acpi, kvmarm, linux-arm-kernel, linux-mm,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Dongjiu Geng, Xie XiuQi, Punit Agrawal, jonathan.zhang,
	Thomas Gleixner

On Wed, May 16, 2018 at 03:51:14PM +0100, James Morse wrote:
> The first two overload existing architectural behavior, the third improves all
> this with a third standard option. Its the standard story!

:-)

> I thought this was safe because its just ghes_copy_tofrom_phys()s access to the
> fixmap slots that needs mutual exclusion.
>
> Polled and all the IRQ flavours are kept apart by the spin_lock_irqsave(), and
> the NMIs have their own fixmap entry. (This is fine until there is more than
> once source of NMI)

For example:

ghes_probe()

	switch (generic->notify.type) {

	...

        case ACPI_HEST_NOTIFY_NMI:
		ghes_nmi_add(ghes);
	}

	...

	ghes_proc();
	  ghes_read_estatus();
		 spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);

		 memcpy...

	-> NMI

		ghes_notify_nmi();
		 ghes_read_estatus();
		 ..
		   if (in_nmi) {
			   raw_spin_lock(&ghes_ioremap_lock_nmi);

		...
	<- NMI

ghes->estatus from above, before the NMI fired, has gotten some nice
scribbling over. AFAICT.

Now, I don't know whether this can happen with the ARM facilities but if
they're NMI-like, I don't see why not.

Which means, that this code is not really reentrant and if should be
fixed to be callable from different contexts, then it should use private
buffers and be careful about locking.

Oh, and that

	if (in_nmi)
		lock()
	else
		lock_irqsave()

pattern is really yucky. And it is an explosion waiting to happen.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-17 13:36             ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-17 13:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 16, 2018 at 03:51:14PM +0100, James Morse wrote:
> The first two overload existing architectural behavior, the third improves all
> this with a third standard option. Its the standard story!

:-)

> I thought this was safe because its just ghes_copy_tofrom_phys()s access to the
> fixmap slots that needs mutual exclusion.
>
> Polled and all the IRQ flavours are kept apart by the spin_lock_irqsave(), and
> the NMIs have their own fixmap entry. (This is fine until there is more than
> once source of NMI)

For example:

ghes_probe()

	switch (generic->notify.type) {

	...

        case ACPI_HEST_NOTIFY_NMI:
		ghes_nmi_add(ghes);
	}

	...

	ghes_proc();
	  ghes_read_estatus();
		 spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);

		 memcpy...

	-> NMI

		ghes_notify_nmi();
		 ghes_read_estatus();
		 ..
		   if (in_nmi) {
			   raw_spin_lock(&ghes_ioremap_lock_nmi);

		...
	<- NMI

ghes->estatus from above, before the NMI fired, has gotten some nice
scribbling over. AFAICT.

Now, I don't know whether this can happen with the ARM facilities but if
they're NMI-like, I don't see why not.

Which means, that this code is not really reentrant and if should be
fixed to be callable from different contexts, then it should use private
buffers and be careful about locking.

Oh, and that

	if (in_nmi)
		lock()
	else
		lock_irqsave()

pattern is really yucky. And it is an explosion waiting to happen.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
  2018-05-16 15:38           ` Tyler Baicar
  (?)
@ 2018-05-17 13:39             ` Borislav Petkov
  -1 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-17 13:39 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Thomas Gleixner,
	Christoffer Dall, Punit Agrawal, linux-mm, Marc Zyngier,
	Catalin Marinas, Xie XiuQi, Will Deacon, Dongjiu Geng,
	linux-acpi, James Morse, Naoya Horiguchi, kvmarm,
	linux-arm-kernel, Len Brown

On Wed, May 16, 2018 at 11:38:16AM -0400, Tyler Baicar wrote:
> I haven't seen a deadlock from that, but it looks possible. What if
> the ghes_proc() call in ghes_probe() is moved before the second switch
> statement? That way it is before the NMI/IRQ/poll is setup. At quick
> glance I think that should avoid the deadlock and still provide the
> functionality that call was added for. I can test that out if you all
> agree.

Makes sense but please audit it properly before doing the change. That
code is full of landmines and could use a proper scrubbing first.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-17 13:39             ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-17 13:39 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: James Morse, linux-acpi, kvmarm, linux-arm-kernel, linux-mm,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Dongjiu Geng, Xie XiuQi, Punit Agrawal, jonathan.zhang,
	Thomas Gleixner

On Wed, May 16, 2018 at 11:38:16AM -0400, Tyler Baicar wrote:
> I haven't seen a deadlock from that, but it looks possible. What if
> the ghes_proc() call in ghes_probe() is moved before the second switch
> statement? That way it is before the NMI/IRQ/poll is setup. At quick
> glance I think that should avoid the deadlock and still provide the
> functionality that call was added for. I can test that out if you all
> agree.

Makes sense but please audit it properly before doing the change. That
code is full of landmines and could use a proper scrubbing first.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-17 13:39             ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2018-05-17 13:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 16, 2018 at 11:38:16AM -0400, Tyler Baicar wrote:
> I haven't seen a deadlock from that, but it looks possible. What if
> the ghes_proc() call in ghes_probe() is moved before the second switch
> statement? That way it is before the NMI/IRQ/poll is setup. At quick
> glance I think that should avoid the deadlock and still provide the
> functionality that call was added for. I can test that out if you all
> agree.

Makes sense but please audit it properly before doing the change. That
code is full of landmines and could use a proper scrubbing first.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
  2018-05-17 13:36             ` Borislav Petkov
  (?)
@ 2018-05-17 18:11               ` James Morse
  -1 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-05-17 18:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: jonathan.zhang, Rafael Wysocki, Tony Luck, Christoffer Dall,
	Xie XiuQi, linux-mm, Marc Zyngier, Catalin Marinas, Tyler Baicar,
	Will Deacon, Dongjiu Geng, linux-acpi, Punit Agrawal,
	Thomas Gleixner, Naoya Horiguchi, kvmarm, linux-arm-kernel,
	Len Brown

Hi Borislav,

On 17/05/18 14:36, Borislav Petkov wrote:
> On Wed, May 16, 2018 at 03:51:14PM +0100, James Morse wrote:
>> I thought this was safe because its just ghes_copy_tofrom_phys()s access to the
>> fixmap slots that needs mutual exclusion.

and here is where I was wrong: I was only looking at reading the data, we then
dump it in struct ghes assuming it can only be notified on once CPU at a time. Oops.

> For example:

> ghes->estatus from above, before the NMI fired, has gotten some nice
> scribbling over. AFAICT.

Yup, thanks for the example!


> Now, I don't know whether this can happen with the ARM facilities but if
> they're NMI-like, I don't see why not.

NOTIFY_SEA is synchronous so the error has to be something to do with the
instruction that was interrupted. In your example this would mean the APEI
code/data was corrupted, which there is little point trying to handle.

NOTIFY_{SEI, SDEI} on the other hand are asynchronous, so this could happen.


> Which means, that this code is not really reentrant and if should be
> fixed to be callable from different contexts, then it should use private
> buffers and be careful about locking.

... I need to go through this thing again to work out how the firmware-buffers
map on to estatus=>ghes ...


> Oh, and that
> 
> 	if (in_nmi)
> 		lock()
> 	else
> 		lock_irqsave()
> 
> pattern is really yucky. And it is an explosion waiting to happen.

The whole in_nmi()=>other-lock think looks like a hack to make a warning go
away. We could get the notification to take whatever lock is appropriate further
out, but it may mean some code duplication. (I'll put it on my list...)


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-17 18:11               ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-05-17 18:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tyler Baicar, linux-acpi, kvmarm, linux-arm-kernel, linux-mm,
	Marc Zyngier, Christoffer Dall, Will Deacon, Catalin Marinas,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Dongjiu Geng, Xie XiuQi, Punit Agrawal, jonathan.zhang,
	Thomas Gleixner

Hi Borislav,

On 17/05/18 14:36, Borislav Petkov wrote:
> On Wed, May 16, 2018 at 03:51:14PM +0100, James Morse wrote:
>> I thought this was safe because its just ghes_copy_tofrom_phys()s access to the
>> fixmap slots that needs mutual exclusion.

and here is where I was wrong: I was only looking at reading the data, we then
dump it in struct ghes assuming it can only be notified on once CPU at a time. Oops.

> For example:

> ghes->estatus from above, before the NMI fired, has gotten some nice
> scribbling over. AFAICT.

Yup, thanks for the example!


> Now, I don't know whether this can happen with the ARM facilities but if
> they're NMI-like, I don't see why not.

NOTIFY_SEA is synchronous so the error has to be something to do with the
instruction that was interrupted. In your example this would mean the APEI
code/data was corrupted, which there is little point trying to handle.

NOTIFY_{SEI, SDEI} on the other hand are asynchronous, so this could happen.


> Which means, that this code is not really reentrant and if should be
> fixed to be callable from different contexts, then it should use private
> buffers and be careful about locking.

... I need to go through this thing again to work out how the firmware-buffers
map on to estatus=>ghes ...


> Oh, and that
> 
> 	if (in_nmi)
> 		lock()
> 	else
> 		lock_irqsave()
> 
> pattern is really yucky. And it is an explosion waiting to happen.

The whole in_nmi()=>other-lock think looks like a hack to make a warning go
away. We could get the notification to take whatever lock is appropriate further
out, but it may mean some code duplication. (I'll put it on my list...)


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users
@ 2018-05-17 18:11               ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2018-05-17 18:11 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Borislav,

On 17/05/18 14:36, Borislav Petkov wrote:
> On Wed, May 16, 2018 at 03:51:14PM +0100, James Morse wrote:
>> I thought this was safe because its just ghes_copy_tofrom_phys()s access to the
>> fixmap slots that needs mutual exclusion.

and here is where I was wrong: I was only looking at reading the data, we then
dump it in struct ghes assuming it can only be notified on once CPU at a time. Oops.

> For example:

> ghes->estatus from above, before the NMI fired, has gotten some nice
> scribbling over. AFAICT.

Yup, thanks for the example!


> Now, I don't know whether this can happen with the ARM facilities but if
> they're NMI-like, I don't see why not.

NOTIFY_SEA is synchronous so the error has to be something to do with the
instruction that was interrupted. In your example this would mean the APEI
code/data was corrupted, which there is little point trying to handle.

NOTIFY_{SEI, SDEI} on the other hand are asynchronous, so this could happen.


> Which means, that this code is not really reentrant and if should be
> fixed to be callable from different contexts, then it should use private
> buffers and be careful about locking.

... I need to go through this thing again to work out how the firmware-buffers
map on to estatus=>ghes ...


> Oh, and that
> 
> 	if (in_nmi)
> 		lock()
> 	else
> 		lock_irqsave()
> 
> pattern is really yucky. And it is an explosion waiting to happen.

The whole in_nmi()=>other-lock think looks like a hack to make a warning go
away. We could get the notification to take whatever lock is appropriate further
out, but it may mean some code duplication. (I'll put it on my list...)


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2018-05-17 18:14 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-27 15:34 [PATCH v3 00/12] APEI in_nmi() rework and arm64 SDEI wire-up James Morse
2018-04-27 15:34 ` James Morse
2018-04-27 15:34 ` James Morse
2018-04-27 15:34 ` [PATCH v3 01/12] ACPI / APEI: Move the estatus queue code up, and under its own ifdef James Morse
2018-04-27 15:34   ` James Morse
2018-04-27 15:34   ` James Morse
2018-05-01 10:43   ` Punit Agrawal
2018-05-01 10:43     ` Punit Agrawal
2018-05-01 10:43     ` Punit Agrawal
2018-05-01 12:50     ` James Morse
2018-05-01 12:50       ` James Morse
2018-05-01 12:50       ` James Morse
2018-05-05  9:58   ` Borislav Petkov
2018-05-05  9:58     ` Borislav Petkov
2018-05-05  9:58     ` Borislav Petkov
2018-04-27 15:35 ` [PATCH v3 02/12] ACPI / APEI: Generalise the estatus queue's add/remove and notify code James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35   ` James Morse
2018-05-05 10:12   ` Borislav Petkov
2018-05-05 10:12     ` Borislav Petkov
2018-05-05 10:12     ` Borislav Petkov
2018-04-27 15:35 ` [PATCH v3 03/12] ACPI / APEI: don't wait to serialise with oops messages when panic()ing James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35 ` [PATCH v3 04/12] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35 ` [PATCH v3 05/12] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35 ` [PATCH v3 06/12] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35 ` [PATCH v3 07/12] ACPI / APEI: Make the nmi_fixmap_idx per-ghes to allow multiple in_nmi() users James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35   ` James Morse
2018-05-05 12:27   ` Borislav Petkov
2018-05-05 12:27     ` Borislav Petkov
2018-05-05 12:27     ` Borislav Petkov
2018-05-08  8:45     ` James Morse
2018-05-08  8:45       ` James Morse
2018-05-08  8:45       ` James Morse
2018-05-16 11:05       ` Borislav Petkov
2018-05-16 11:05         ` Borislav Petkov
2018-05-16 11:05         ` Borislav Petkov
2018-05-16 14:51         ` James Morse
2018-05-16 14:51           ` James Morse
2018-05-16 14:51           ` James Morse
2018-05-17 13:36           ` Borislav Petkov
2018-05-17 13:36             ` Borislav Petkov
2018-05-17 13:36             ` Borislav Petkov
2018-05-17 18:11             ` James Morse
2018-05-17 18:11               ` James Morse
2018-05-17 18:11               ` James Morse
2018-05-16 15:38         ` Tyler Baicar
2018-05-16 15:38           ` Tyler Baicar
2018-05-16 15:38           ` Tyler Baicar
2018-05-17 13:39           ` Borislav Petkov
2018-05-17 13:39             ` Borislav Petkov
2018-05-17 13:39             ` Borislav Petkov
2018-04-27 15:35 ` [PATCH v3 08/12] ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35 ` [PATCH v3 09/12] firmware: arm_sdei: Add ACPI GHES registration helper James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35 ` [PATCH v3 10/12] ACPI / APEI: Add support for the SDEI GHES Notification type James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35 ` [PATCH v3 11/12] mm/memory-failure: increase queued recovery work's priority James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35 ` [PATCH v3 12/12] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work James Morse
2018-04-27 15:35   ` James Morse
2018-04-27 15:35   ` James Morse
2018-05-01 20:15 ` [PATCH v3 00/12] APEI in_nmi() rework and arm64 SDEI wire-up Tyler Baicar
2018-05-01 20:15   ` Tyler Baicar
2018-05-01 20:15   ` Tyler Baicar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.