linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/5] GHES NMI handler cleanup
@ 2015-03-27  9:22 Borislav Petkov
  2015-03-27  9:22 ` [RFC PATCH 1/5] GHES: Carve out error queueing in a separate function Borislav Petkov
                   ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Borislav Petkov @ 2015-03-27  9:22 UTC (permalink / raw)
  To: linux-edac
  Cc: Borislav Petkov, Rafael J. Wysocki, Len Brown, Tony Luck,
	Tomasz Nowicki, Chen, Gong, Wolfram Sang, Lv Zheng,
	Naoya Horiguchi, linux-acpi, linux-kernel

From: Borislav Petkov <bp@suse.de>

This patchset is the result of what we saw while debugging a customer
issue:

[  118.113136] INFO: NMI handler (ghes_notify_nmi) took too long to run: 1.005 msecs

Looking at that NMI handler, it could use a good scrubbing as it has
grown some needless fat. So let's try it.

The first four patches simplify and clean it up, and the last one makes
the bolder move of designating a single CPU as the status reader, based
on the not-yet-fully-confirmed observation that GHES error sources are
global in the firmware glue and thus a single reader suffices to see all
errors.

This last thing still needs to be confirmed but I'm sending the patches
now so that people can look at them and poke holes. Thus the RFC tag.

Thanks.

Borislav Petkov (4):
  GHES: Carve out error queueing in a separate function
  GHES: Carve out the panic functionality
  GHES: Panic right after detection
  GHES: Eliminate double-loop in the NMI handler

Jiri Kosina (1):
  GHES: Make NMI handler have a single reader

 drivers/acpi/apei/ghes.c | 108 ++++++++++++++++++++++++-----------------------
 1 file changed, 55 insertions(+), 53 deletions(-)

-- 
2.3.3


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH 1/5] GHES: Carve out error queueing in a separate function
  2015-03-27  9:22 [RFC PATCH 0/5] GHES NMI handler cleanup Borislav Petkov
@ 2015-03-27  9:22 ` Borislav Petkov
  2015-03-27  9:22 ` [RFC PATCH 2/5] GHES: Carve out the panic functionality Borislav Petkov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2015-03-27  9:22 UTC (permalink / raw)
  To: linux-edac
  Cc: Borislav Petkov, Rafael J. Wysocki, Len Brown, Tony Luck,
	Tomasz Nowicki, Chen, Gong, Wolfram Sang, Lv Zheng,
	Naoya Horiguchi, linux-acpi, linux-kernel

From: Borislav Petkov <bp@suse.de>

Make the handler more readable.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/acpi/apei/ghes.c | 51 +++++++++++++++++++++++++++---------------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e82d0976a5d0..fe1e41bf5609 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -797,6 +797,32 @@ static void ghes_print_queued_estatus(void)
 	}
 }
 
+/* Save estatus for further processing in IRQ context */
+static void __process_error(struct ghes *ghes)
+{
+#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+	u32 len, node_len;
+	struct ghes_estatus_node *estatus_node;
+	struct acpi_hest_generic_status *estatus;
+
+	if (ghes_estatus_cached(ghes->estatus))
+		return;
+
+	len = cper_estatus_len(ghes->estatus);
+	node_len = GHES_ESTATUS_NODE_LEN(len);
+
+	estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);
+	if (!estatus_node)
+		return;
+
+	estatus_node->ghes = ghes;
+	estatus_node->generic = ghes->generic;
+	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+	memcpy(estatus, ghes->estatus, len);
+	llist_add(&estatus_node->llnode, &ghes_estatus_llist);
+#endif
+}
+
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
 	struct ghes *ghes, *ghes_global = NULL;
@@ -832,32 +858,13 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 	}
 
 	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
-#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-		u32 len, node_len;
-		struct ghes_estatus_node *estatus_node;
-		struct acpi_hest_generic_status *estatus;
-#endif
 		if (!(ghes->flags & GHES_TO_CLEAR))
 			continue;
-#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-		if (ghes_estatus_cached(ghes->estatus))
-			goto next;
-		/* Save estatus for further processing in IRQ context */
-		len = cper_estatus_len(ghes->estatus);
-		node_len = GHES_ESTATUS_NODE_LEN(len);
-		estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool,
-						      node_len);
-		if (estatus_node) {
-			estatus_node->ghes = ghes;
-			estatus_node->generic = ghes->generic;
-			estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
-			memcpy(estatus, ghes->estatus, len);
-			llist_add(&estatus_node->llnode, &ghes_estatus_llist);
-		}
-next:
-#endif
+
+		__process_error(ghes);
 		ghes_clear_estatus(ghes);
 	}
+
 #ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	irq_work_queue(&ghes_proc_irq_work);
 #endif
-- 
2.3.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 2/5] GHES: Carve out the panic functionality
  2015-03-27  9:22 [RFC PATCH 0/5] GHES NMI handler cleanup Borislav Petkov
  2015-03-27  9:22 ` [RFC PATCH 1/5] GHES: Carve out error queueing in a separate function Borislav Petkov
@ 2015-03-27  9:22 ` Borislav Petkov
  2015-03-27  9:22 ` [RFC PATCH 3/5] GHES: Panic right after detection Borislav Petkov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2015-03-27  9:22 UTC (permalink / raw)
  To: linux-edac
  Cc: Borislav Petkov, Rafael J. Wysocki, Len Brown, Tony Luck,
	Tomasz Nowicki, Chen, Gong, Wolfram Sang, Lv Zheng,
	Naoya Horiguchi, linux-acpi, linux-kernel

From: Borislav Petkov <bp@suse.de>

... into another function for more clarity. No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/acpi/apei/ghes.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fe1e41bf5609..712ed95b1dca 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -823,6 +823,18 @@ static void __process_error(struct ghes *ghes)
 #endif
 }
 
+static void __ghes_panic(struct ghes *ghes)
+{
+	oops_begin();
+	ghes_print_queued_estatus();
+	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
+
+	/* reboot to log the error! */
+	if (panic_timeout == 0)
+		panic_timeout = ghes_panic_timeout;
+	panic("Fatal hardware error!");
+}
+
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
 	struct ghes *ghes, *ghes_global = NULL;
@@ -846,16 +858,8 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 	if (ret == NMI_DONE)
 		goto out;
 
-	if (sev_global >= GHES_SEV_PANIC) {
-		oops_begin();
-		ghes_print_queued_estatus();
-		__ghes_print_estatus(KERN_EMERG, ghes_global->generic,
-				     ghes_global->estatus);
-		/* reboot to log the error! */
-		if (panic_timeout == 0)
-			panic_timeout = ghes_panic_timeout;
-		panic("Fatal hardware error!");
-	}
+	if (sev_global >= GHES_SEV_PANIC)
+		__ghes_panic(ghes_global);
 
 	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
 		if (!(ghes->flags & GHES_TO_CLEAR))
-- 
2.3.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 3/5] GHES: Panic right after detection
  2015-03-27  9:22 [RFC PATCH 0/5] GHES NMI handler cleanup Borislav Petkov
  2015-03-27  9:22 ` [RFC PATCH 1/5] GHES: Carve out error queueing in a separate function Borislav Petkov
  2015-03-27  9:22 ` [RFC PATCH 2/5] GHES: Carve out the panic functionality Borislav Petkov
@ 2015-03-27  9:22 ` Borislav Petkov
  2015-03-27  9:22 ` [RFC PATCH 4/5] GHES: Eliminate double-loop in the NMI handler Borislav Petkov
  2015-03-27  9:22 ` [RFC PATCH 5/5] GHES: Make NMI handler have a single reader Borislav Petkov
  4 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2015-03-27  9:22 UTC (permalink / raw)
  To: linux-edac
  Cc: Borislav Petkov, Rafael J. Wysocki, Len Brown, Tony Luck,
	Tomasz Nowicki, Chen, Gong, Wolfram Sang, Lv Zheng,
	Naoya Horiguchi, linux-acpi, linux-kernel

From: Borislav Petkov <bp@suse.de>

The moment we log an error of panic severity, there's no need to noodle
through the ghes_nmi list anymore. So panic instead right then and
there.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/acpi/apei/ghes.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 712ed95b1dca..0de3adcca03e 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -837,9 +837,8 @@ static void __ghes_panic(struct ghes *ghes)
 
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
-	struct ghes *ghes, *ghes_global = NULL;
-	int sev, sev_global = -1;
-	int ret = NMI_DONE;
+	struct ghes *ghes;
+	int sev, ret = NMI_DONE;
 
 	raw_spin_lock(&ghes_nmi_lock);
 	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
@@ -847,20 +846,17 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 			ghes_clear_estatus(ghes);
 			continue;
 		}
+
 		sev = ghes_severity(ghes->estatus->error_severity);
-		if (sev > sev_global) {
-			sev_global = sev;
-			ghes_global = ghes;
-		}
+		if (sev >= GHES_SEV_PANIC)
+			__ghes_panic(ghes);
+
 		ret = NMI_HANDLED;
 	}
 
 	if (ret == NMI_DONE)
 		goto out;
 
-	if (sev_global >= GHES_SEV_PANIC)
-		__ghes_panic(ghes_global);
-
 	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
 		if (!(ghes->flags & GHES_TO_CLEAR))
 			continue;
-- 
2.3.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 4/5] GHES: Eliminate double-loop in the NMI handler
  2015-03-27  9:22 [RFC PATCH 0/5] GHES NMI handler cleanup Borislav Petkov
                   ` (2 preceding siblings ...)
  2015-03-27  9:22 ` [RFC PATCH 3/5] GHES: Panic right after detection Borislav Petkov
@ 2015-03-27  9:22 ` Borislav Petkov
  2015-03-27  9:22 ` [RFC PATCH 5/5] GHES: Make NMI handler have a single reader Borislav Petkov
  4 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2015-03-27  9:22 UTC (permalink / raw)
  To: linux-edac
  Cc: Borislav Petkov, Rafael J. Wysocki, Len Brown, Tony Luck,
	Tomasz Nowicki, Chen, Gong, Wolfram Sang, Lv Zheng,
	Naoya Horiguchi, linux-acpi, linux-kernel

From: Borislav Petkov <bp@suse.de>

There's no real need to iterate twice over the HW error sources in the
NMI handler. With the previous cleanups, eliminating the second loop is
almost trivial.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/acpi/apei/ghes.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0de3adcca03e..94a44bad5576 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -851,25 +851,18 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 		if (sev >= GHES_SEV_PANIC)
 			__ghes_panic(ghes);
 
-		ret = NMI_HANDLED;
-	}
-
-	if (ret == NMI_DONE)
-		goto out;
-
-	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
 		if (!(ghes->flags & GHES_TO_CLEAR))
 			continue;
 
 		__process_error(ghes);
 		ghes_clear_estatus(ghes);
+
+		ret = NMI_HANDLED;
 	}
 
 #ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	irq_work_queue(&ghes_proc_irq_work);
 #endif
-
-out:
 	raw_spin_unlock(&ghes_nmi_lock);
 	return ret;
 }
-- 
2.3.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-03-27  9:22 [RFC PATCH 0/5] GHES NMI handler cleanup Borislav Petkov
                   ` (3 preceding siblings ...)
  2015-03-27  9:22 ` [RFC PATCH 4/5] GHES: Eliminate double-loop in the NMI handler Borislav Petkov
@ 2015-03-27  9:22 ` Borislav Petkov
  2015-04-01  7:45   ` Jiri Kosina
                     ` (2 more replies)
  4 siblings, 3 replies; 32+ messages in thread
From: Borislav Petkov @ 2015-03-27  9:22 UTC (permalink / raw)
  To: linux-edac
  Cc: Jiri Kosina, Borislav Petkov, Rafael J. Wysocki, Len Brown,
	Tony Luck, Tomasz Nowicki, Chen, Gong, Wolfram Sang, Lv Zheng,
	Naoya Horiguchi, linux-acpi, linux-kernel

From: Jiri Kosina <jkosina@suse.cz>

Since GHES sources are global, we theoretically need only a single CPU
reading them per NMI instead of a thundering herd of CPUs waiting on a
spinlock in NMI context for no reason at all.

Do that.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/acpi/apei/ghes.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 94a44bad5576..2bfd53cbfe80 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -729,10 +729,10 @@ static struct llist_head ghes_estatus_llist;
 static struct irq_work ghes_proc_irq_work;
 
 /*
- * NMI may be triggered on any CPU, so ghes_nmi_lock is used for
- * mutual exclusion.
+ * NMI may be triggered on any CPU, so ghes_in_nmi is used for
+ * having only one concurrent reader.
  */
-static DEFINE_RAW_SPINLOCK(ghes_nmi_lock);
+static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
 
 static LIST_HEAD(ghes_nmi);
 
@@ -840,7 +840,9 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 	struct ghes *ghes;
 	int sev, ret = NMI_DONE;
 
-	raw_spin_lock(&ghes_nmi_lock);
+	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
+		return ret;
+
 	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
 		if (ghes_read_estatus(ghes, 1)) {
 			ghes_clear_estatus(ghes);
@@ -863,7 +865,7 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 #ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 	irq_work_queue(&ghes_proc_irq_work);
 #endif
-	raw_spin_unlock(&ghes_nmi_lock);
+	atomic_dec(&ghes_in_nmi);
 	return ret;
 }
 
-- 
2.3.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-03-27  9:22 ` [RFC PATCH 5/5] GHES: Make NMI handler have a single reader Borislav Petkov
@ 2015-04-01  7:45   ` Jiri Kosina
  2015-04-01 13:49     ` Borislav Petkov
  2015-04-28 14:30     ` Don Zickus
  2015-04-27  3:16   ` Zheng, Lv
  2015-04-28 13:38   ` Zheng, Lv
  2 siblings, 2 replies; 32+ messages in thread
From: Jiri Kosina @ 2015-04-01  7:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, Borislav Petkov, Rafael J. Wysocki, Len Brown,
	Tony Luck, Tomasz Nowicki, Chen, Gong, Wolfram Sang, Lv Zheng,
	Naoya Horiguchi, linux-acpi, linux-kernel, Huang Ying

On Fri, 27 Mar 2015, Borislav Petkov wrote:

> From: Jiri Kosina <jkosina@suse.cz>
> 
> Since GHES sources are global, we theoretically need only a single CPU
> reading them per NMI instead of a thundering herd of CPUs waiting on a
> spinlock in NMI context for no reason at all.

I originally wasn't 100% sure whether GHES sources are global (i.e.
whether it really doesn't matter which CPU is reading the registers),
but looking at the code more closely, it actually seems that this is
really the right thing to do.

Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page 
with the registers, performs apei_read() (which is ghes-source specific, 
but not CPU-specific) and unmaps the page again.

There is nothing that would make this CPU-specific. Adding Huang Ying (the 
original author of the code) to confirm this. Huang?

> Do that.

I think this should indeed be pushed forward. It fixes horrible spinlock 
contention on systems under an NMI storm (such as when perf is active) 
that is unrelated to GHES.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-01  7:45   ` Jiri Kosina
@ 2015-04-01 13:49     ` Borislav Petkov
  2015-04-23  8:39       ` Jiri Kosina
  2015-04-28 14:30     ` Don Zickus
  1 sibling, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2015-04-01 13:49 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: linux-edac, Rafael J. Wysocki, Len Brown, Tony Luck,
	Tomasz Nowicki, Chen, Gong, Wolfram Sang, Lv Zheng,
	Naoya Horiguchi, linux-acpi, linux-kernel, Huang Ying

On Wed, Apr 01, 2015 at 09:45:53AM +0200, Jiri Kosina wrote:
> On Fri, 27 Mar 2015, Borislav Petkov wrote:
> 
> > From: Jiri Kosina <jkosina@suse.cz>
> > 
> > Since GHES sources are global, we theoretically need only a single CPU
> > reading them per NMI instead of a thundering herd of CPUs waiting on a
> > spinlock in NMI context for no reason at all.
> 
> I originally wasn't 100% sure whether GHES sources are global (i.e. if it 
> really doesn't matter which CPU is reading the registers), but looking at 
> the code more it actually seems that this is really the right thing to do.
> 
> Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page 
> with the registers, performs apei_read() (which is ghes-source specific, 
> but not CPU-specific) and unmaps the page again.
> 
> There is nothing that would make this CPU-specific. Adding Huang Ying (the 
> original author of the code) to confirm this. Huang?
> 
> > Do that.
> 
> I think this should indeed be pushed forward. It fixes horrible spinlock 
> contention on systems under an NMI storm (such as when perf is active) 
> that is unrelated to GHES.

Right, so I tested injecting an error without your patch and saw the
same behavior. So it all points at global sources AFAICT. It would be
cool, though, if someone who knows the firmware could confirm it
unambiguously.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-01 13:49     ` Borislav Petkov
@ 2015-04-23  8:39       ` Jiri Kosina
  2015-04-23  8:59         ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Jiri Kosina @ 2015-04-23  8:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, Rafael J. Wysocki, Len Brown, Tony Luck,
	Tomasz Nowicki, Chen, Gong, Wolfram Sang, Lv Zheng,
	Naoya Horiguchi, linux-acpi, linux-kernel, Huang Ying

On Wed, 1 Apr 2015, Borislav Petkov wrote:

> > > From: Jiri Kosina <jkosina@suse.cz>
> > > 
> > > Since GHES sources are global, we theoretically need only a single CPU
> > > reading them per NMI instead of a thundering herd of CPUs waiting on a
> > > spinlock in NMI context for no reason at all.
> > 
> > I originally wasn't 100% sure whether GHES sources are global (i.e. if it 
> > really doesn't matter which CPU is reading the registers), but looking at 
> > the code more it actually seems that this is really the right thing to do.
> > 
> > Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page 
> > with the registers, performs apei_read() (which is ghes-source specific, 
> > but not CPU-specific) and unmaps the page again.
> > 
> > There is nothing that would make this CPU-specific. Adding Huang Ying (the 
> > original author of the code) to confirm this. Huang?
> > 
> > > Do that.
> > 
> > I think this should indeed be pushed forward. It fixes horrible spinlock 
> > contention on systems under an NMI storm (such as when perf is active) 
> > that is unrelated to GHES.
> 
> Right, so I tested injecting an error without your patch and saw the
> same behavior. So it all points at global sources AFAICT. It would be
> cool, though, if someone who knows the firmware could confirm it
> unambiguously.

Three weeks have passed, therefore I find this an appropriate time for a 
friendly ping :)

Rafael? Naoya? Huang?

This fixes a spinlock contention problem in NMI context observed on real 
HW, so it would be really nice to have it fixed.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-23  8:39       ` Jiri Kosina
@ 2015-04-23  8:59         ` Borislav Petkov
  2015-04-23 18:00           ` Luck, Tony
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2015-04-23  8:59 UTC (permalink / raw)
  To: Jiri Kosina, Tony Luck
  Cc: linux-edac, Rafael J. Wysocki, Len Brown, Tomasz Nowicki, Chen,
	Gong, Wolfram Sang, Lv Zheng, Naoya Horiguchi, linux-acpi,
	linux-kernel, Huang Ying

On Thu, Apr 23, 2015 at 10:39:58AM +0200, Jiri Kosina wrote:
> Three weeks have passed, therefore I find this an appropriate time for a 
> friendly ping :)
> 
> Rafael? Naoya? Huang?
> 
> This fixes a spinlock contention problem in NMI context observed on real 
> HW, so it would be really nice to have it fixed.

I think we should apply this.

Here's why: nothing in the ghes_notify_nmi() handler does CPU-specific
accesses. It iterates over the list of ghes sources which do NMI
notification but those sources are the *same* regardless of which core
does the access as their addresses are per-source, i.e. in that

	struct acpi_generic_address error_status_address;

thing.

And it is a safe bet to say that all that error information is
serialized in the firmware for the error source to consume.

So I'm going to route this through the RAS tree unless Rafael wants to
take it.

Ok?

Tony, objections?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-23  8:59         ` Borislav Petkov
@ 2015-04-23 18:00           ` Luck, Tony
  2015-04-27 20:23             ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Luck, Tony @ 2015-04-23 18:00 UTC (permalink / raw)
  To: Borislav Petkov, Jiri Kosina
  Cc: linux-edac, Rafael J. Wysocki, Len Brown, Tomasz Nowicki, Chen,
	Gong, Wolfram Sang, Zheng, Lv, Naoya Horiguchi, linux-acpi,
	linux-kernel, Huang, Ying

> I think we should apply this.
>
> Here's why: nothing in the ghes_notify_nmi() handler does CPU-specific
> accesses

This looks to be true.

> Tony, objections?

No objections.

-Tony

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-03-27  9:22 ` [RFC PATCH 5/5] GHES: Make NMI handler have a single reader Borislav Petkov
  2015-04-01  7:45   ` Jiri Kosina
@ 2015-04-27  3:16   ` Zheng, Lv
  2015-04-27  8:46     ` Borislav Petkov
  2015-04-28 13:38   ` Zheng, Lv
  2 siblings, 1 reply; 32+ messages in thread
From: Zheng, Lv @ 2015-04-27  3:16 UTC (permalink / raw)
  To: Borislav Petkov, linux-edac
  Cc: Jiri Kosina, Borislav Petkov, Rafael J. Wysocki, Len Brown, Luck,
	Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang, Naoya Horiguchi,
	linux-acpi, linux-kernel

Hi,

> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Friday, March 27, 2015 5:23 PM
> 
> From: Jiri Kosina <jkosina@suse.cz>
> 
> Since GHES sources are global, we theoretically need only a single CPU
> reading them per NMI instead of a thundering herd of CPUs waiting on a
> spinlock in NMI context for no reason at all.
> 
> Do that.
> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  drivers/acpi/apei/ghes.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 94a44bad5576..2bfd53cbfe80 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -729,10 +729,10 @@ static struct llist_head ghes_estatus_llist;
>  static struct irq_work ghes_proc_irq_work;
> 
>  /*
> - * NMI may be triggered on any CPU, so ghes_nmi_lock is used for
> - * mutual exclusion.
> + * NMI may be triggered on any CPU, so ghes_in_nmi is used for
> + * having only one concurrent reader.
>   */
> -static DEFINE_RAW_SPINLOCK(ghes_nmi_lock);
> +static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
> 
>  static LIST_HEAD(ghes_nmi);
> 
> @@ -840,7 +840,9 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>  	struct ghes *ghes;
>  	int sev, ret = NMI_DONE;
> 
> -	raw_spin_lock(&ghes_nmi_lock);
> +	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
> +		return ret;
> +

Just a simple question: why not just use cmpxchg() here instead of
atomic_add_unless(), so that no atomic_dec() is needed?

Thanks and best regards
-Lv

>  	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
>  		if (ghes_read_estatus(ghes, 1)) {
>  			ghes_clear_estatus(ghes);
> @@ -863,7 +865,7 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>  #ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	irq_work_queue(&ghes_proc_irq_work);
>  #endif
> -	raw_spin_unlock(&ghes_nmi_lock);
> +	atomic_dec(&ghes_in_nmi);
>  	return ret;
>  }
> 
> --
> 2.3.3


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-27  3:16   ` Zheng, Lv
@ 2015-04-27  8:46     ` Borislav Petkov
  2015-04-28  0:44       ` Zheng, Lv
  2015-04-28  2:24       ` Zheng, Lv
  0 siblings, 2 replies; 32+ messages in thread
From: Borislav Petkov @ 2015-04-27  8:46 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: linux-edac, Jiri Kosina, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Luck, Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Naoya Horiguchi, linux-acpi, linux-kernel

On Mon, Apr 27, 2015 at 03:16:00AM +0000, Zheng, Lv wrote:
> > @@ -840,7 +840,9 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
> >  	struct ghes *ghes;
> >  	int sev, ret = NMI_DONE;
> > 
> > -	raw_spin_lock(&ghes_nmi_lock);
> > +	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
> > +		return ret;
> > +
> 
> Just a simple question: why not just use cmpxchg() here instead of
> atomic_add_unless(), so that no atomic_dec() is needed?

What do you think atomic_add_unless ends up doing:

#APP
# 177 "./arch/x86/include/asm/atomic.h" 1
	.pushsection .smp_locks,"a"
.balign 4
.long 671f - .
.popsection
671:
	lock; cmpxchgl %edx,ghes_in_nmi(%rip)	# D.37056, MEM[(volatile u32 *)&ghes_in_nmi]
# 0 "" 2
#NO_APP

And you need to atomic_dec() so that another reader can enter, i.e. how
the exclusion primitive works.

Or did you have something else in mind?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-23 18:00           ` Luck, Tony
@ 2015-04-27 20:23             ` Borislav Petkov
  0 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2015-04-27 20:23 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Jiri Kosina, linux-edac, Rafael J. Wysocki,
	Len Brown, Tomasz Nowicki, Chen, Gong, Wolfram Sang, Zheng, Lv,
	Naoya Horiguchi, linux-acpi, linux-kernel, Huang, Ying

On Thu, Apr 23, 2015 at 06:00:15PM +0000, Luck, Tony wrote:
> > I think we should apply this.
> >
> > Here's why: nothing in the ghes_notify_nmi() handler does CPU-specific
> > accesses
> 
> This looks to be true.
> 
> > Tony, objections?
> 
> No objections.

Thanks, queued for 4.2, pending one last test I'm doing on a machine
which says

[   24.332560] INFO: NMI handler (ghes_notify_nmi) took too long to run: 3.265 msecs
[   24.332567] INFO: NMI handler (ghes_notify_nmi) took too long to run: 5.946 msecs
[   24.332568] INFO: NMI handler (ghes_notify_nmi) took too long to run: 5.948 msecs

during boot.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-27  8:46     ` Borislav Petkov
@ 2015-04-28  0:44       ` Zheng, Lv
  2015-04-28  2:24       ` Zheng, Lv
  1 sibling, 0 replies; 32+ messages in thread
From: Zheng, Lv @ 2015-04-28  0:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, Jiri Kosina, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Luck, Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Naoya Horiguchi, linux-acpi, linux-kernel

Hi,

> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Monday, April 27, 2015 4:47 PM
> 
> On Mon, Apr 27, 2015 at 03:16:00AM +0000, Zheng, Lv wrote:
> > > @@ -840,7 +840,9 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
> > >  	struct ghes *ghes;
> > >  	int sev, ret = NMI_DONE;
> > >
> > > -	raw_spin_lock(&ghes_nmi_lock);
> > > +	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
> > > +		return ret;
> > > +
> >
> > Just a simple question: why not just use cmpxchg() here instead of
> > atomic_add_unless(), so that no atomic_dec() is needed?
> 
> What do you think atomic_add_unless ends up doing:
> 
> #APP
> # 177 "./arch/x86/include/asm/atomic.h" 1
> 	.pushsection .smp_locks,"a"
> .balign 4
> .long 671f - .
> .popsection
> 671:
> 	lock; cmpxchgl %edx,ghes_in_nmi(%rip)	# D.37056, MEM[(volatile u32 *)&ghes_in_nmi]
> # 0 "" 2
> #NO_APP
> 
> And you need to atomic_dec() so that another reader can enter, i.e. how
> the exclusion primitive works.
> 
> Or did you have something else in mind?

My mistake.
I meant a cmpxchg() and xchg() (or atomic_cmpxchg() and atomic_xchg()) pair here, so nothing can be reduced.

But IMO, atomic_add_unless() is implemented via cmpxchg() on many
architectures, and it might be better to use cmpxchg() directly here,
which is a bit faster, as you actually only need a single value switch.

Thanks and best regards
-Lv


> 
> --
> Regards/Gruss,
>     Boris.
> 
> ECO tip #101: Trim your mails when you reply.
> --

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-27  8:46     ` Borislav Petkov
  2015-04-28  0:44       ` Zheng, Lv
@ 2015-04-28  2:24       ` Zheng, Lv
  2015-04-28  7:38         ` Borislav Petkov
  1 sibling, 1 reply; 32+ messages in thread
From: Zheng, Lv @ 2015-04-28  2:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, Jiri Kosina, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Luck, Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Naoya Horiguchi, linux-acpi, linux-kernel

Hi,

> From: Zheng, Lv
> Sent: Tuesday, April 28, 2015 8:44 AM
> 
> Hi,
> 
> > From: Borislav Petkov [mailto:bp@alien8.de]
> > Sent: Monday, April 27, 2015 4:47 PM
> >
> > On Mon, Apr 27, 2015 at 03:16:00AM +0000, Zheng, Lv wrote:
> > > > @@ -840,7 +840,9 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
> > > >  	struct ghes *ghes;
> > > >  	int sev, ret = NMI_DONE;
> > > >
> > > > -	raw_spin_lock(&ghes_nmi_lock);
> > > > +	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
> > > > +		return ret;
> > > > +
> > >
> > > Just a simple question: why not just use cmpxchg() here instead of
> > > atomic_add_unless(), so that no atomic_dec() is needed?
> >
> > What do you think atomic_add_unless ends up doing:
> >
> > #APP
> > # 177 "./arch/x86/include/asm/atomic.h" 1
> > 	.pushsection .smp_locks,"a"
> > .balign 4
> > .long 671f - .
> > .popsection
> > 671:
> > 	lock; cmpxchgl %edx,ghes_in_nmi(%rip)	# D.37056, MEM[(volatile u32 *)&ghes_in_nmi]
> > # 0 "" 2
> > #NO_APP
> >
> > And you need to atomic_dec() so that another reader can enter, i.e. how
> > the exclusion primitive works.
> >
> > Or did you have something else in mind?
> 
> My mistake.
> I meant a cmpxchg() and xchg() (or atomic_cmpxchg() and atomic_xchg()) pair here, so nothing can be reduced.

Let me correct myself: it should be atomic_cmpxchg() and atomic_set() here, as you only need to switch between 0 and 1.
Sorry for the noise.

Thanks and best regards
-Lv

> 
> But IMO, atomic_add_unless() is implemented via cmpxchg on many architectures.
> And it might be better to use it directly here which is a bit faster as you actually only need one value switch here.
> 
> Thanks and best regards
> -Lv
> 
> 
> >
> > --
> > Regards/Gruss,
> >     Boris.
> >
> > ECO tip #101: Trim your mails when you reply.
> > --

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-28  2:24       ` Zheng, Lv
@ 2015-04-28  7:38         ` Borislav Petkov
  0 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2015-04-28  7:38 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: linux-edac, Jiri Kosina, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Luck, Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Naoya Horiguchi, linux-acpi, linux-kernel

On Tue, Apr 28, 2015 at 02:24:16AM +0000, Zheng, Lv wrote:
> > > #APP
> > > # 177 "./arch/x86/include/asm/atomic.h" 1
> > > 	.pushsection .smp_locks,"a"
> > > .balign 4
> > > .long 671f - .
> > > .popsection
> > > 671:
> > > 	lock; cmpxchgl %edx,ghes_in_nmi(%rip)	# D.37056, MEM[(volatile u32 *)&ghes_in_nmi]
> > > # 0 "" 2
> > > #NO_APP
> > >
> > > And you need to atomic_dec() so that another reader can enter, i.e. how
> > > the exclusion primitive works.
> > >
> > > Or did you have something else in mind?
> > 
> > My mistake.
> > I mean cmpxchg() and xchg() (or atomic_cmpxchg() and atomic_xchg()) pair here, so nothing can be reduced.
> 
> Let me correct, it should be atomic_cmpxchg() and atomic_set() here as you only need to switch between 0 and 1.
> Sorry for the noise.

I still don't understand what you want from me here. You need to go into
more detail.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-03-27  9:22 ` [RFC PATCH 5/5] GHES: Make NMI handler have a single reader Borislav Petkov
  2015-04-01  7:45   ` Jiri Kosina
  2015-04-27  3:16   ` Zheng, Lv
@ 2015-04-28 13:38   ` Zheng, Lv
  2015-04-28 13:59     ` Borislav Petkov
  2 siblings, 1 reply; 32+ messages in thread
From: Zheng, Lv @ 2015-04-28 13:38 UTC (permalink / raw)
  To: Borislav Petkov, linux-edac
  Cc: Jiri Kosina, Borislav Petkov, Rafael J. Wysocki, Len Brown, Luck,
	Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang, Naoya Horiguchi,
	linux-acpi, linux-kernel

Hi,

I was talking about this patch.

> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Friday, March 27, 2015 5:23 PM
> Subject: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
> 
> From: Jiri Kosina <jkosina@suse.cz>
> 
> Since GHES sources are global, we theoretically need only a single CPU
> reading them per NMI instead of a thundering herd of CPUs waiting on a
> spinlock in NMI context for no reason at all.
> 
> Do that.
> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  drivers/acpi/apei/ghes.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 94a44bad5576..2bfd53cbfe80 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -729,10 +729,10 @@ static struct llist_head ghes_estatus_llist;
>  static struct irq_work ghes_proc_irq_work;
> 
>  /*
> - * NMI may be triggered on any CPU, so ghes_nmi_lock is used for
> - * mutual exclusion.
> + * NMI may be triggered on any CPU, so ghes_in_nmi is used for
> + * having only one concurrent reader.
>   */
> -static DEFINE_RAW_SPINLOCK(ghes_nmi_lock);
> +static atomic_t ghes_in_nmi = ATOMIC_INIT(0);
> 
>  static LIST_HEAD(ghes_nmi);
> 
> @@ -840,7 +840,9 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>  	struct ghes *ghes;
>  	int sev, ret = NMI_DONE;
> 
> -	raw_spin_lock(&ghes_nmi_lock);
> +	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
> +		return ret;
> +

if (atomic_cmpxchg(&ghes_in_nmi, 0, 1))
	return ret;

>  	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
>  		if (ghes_read_estatus(ghes, 1)) {
>  			ghes_clear_estatus(ghes);
> @@ -863,7 +865,7 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>  #ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	irq_work_queue(&ghes_proc_irq_work);
>  #endif
> -	raw_spin_unlock(&ghes_nmi_lock);
> +	atomic_dec(&ghes_in_nmi);

atomic_set(&ghes_in_nmi, 0);

It seems most of the drivers (under drivers/) are written in this way,
while users of atomic_add_unless() are rare.

Can this work for you?

Thanks and best regards
-Lv

>  	return ret;
>  }
> 
> --
> 2.3.3


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-28 13:38   ` Zheng, Lv
@ 2015-04-28 13:59     ` Borislav Petkov
  2015-04-29  0:24       ` Zheng, Lv
  2015-04-29  0:49       ` Zheng, Lv
  0 siblings, 2 replies; 32+ messages in thread
From: Borislav Petkov @ 2015-04-28 13:59 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: linux-edac, Jiri Kosina, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Luck, Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Naoya Horiguchi, linux-acpi, linux-kernel

On Tue, Apr 28, 2015 at 01:38:41PM +0000, Zheng, Lv wrote:
> > -	raw_spin_lock(&ghes_nmi_lock);
> > +	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
> > +		return ret;
> > +
> 
> if (atomic_cmpxchg(&ghes_in_nmi, 0, 1))
> 	return ret;

Ok, now I understand what you mean.

We absolutely want to use atomic_add_unless() because we get to save us
the expensive

	LOCK; CMPXCHG

if the value was already 1. Which is exactly what this patch is trying
to avoid - a thundering herd of cores CMPXCHGing a global variable.

I.e.,

	movl	ghes_in_nmi(%rip), %ecx	# MEM[(const int *)&ghes_in_nmi], c
	cmpl	$1, %ecx	#, c
	je	.L311	#,				<--- exit here if ghes_in_nmi == 1.
	leal	1(%rcx), %edx	#, D.37163
	movl	%ecx, %eax	# c, c
#APP
# 177 "./arch/x86/include/asm/atomic.h" 1
	.pushsection .smp_locks,"a"
.balign 4
.long 671f - .
.popsection
671:
	lock; cmpxchgl %edx,ghes_in_nmi(%rip)	# D.37163, MEM[(volatile u32 *)&ghes_in_nmi]
# 0 "" 2
#NO_APP

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-01  7:45   ` Jiri Kosina
  2015-04-01 13:49     ` Borislav Petkov
@ 2015-04-28 14:30     ` Don Zickus
  2015-04-28 14:42       ` Don Zickus
  2015-04-28 14:55       ` Borislav Petkov
  1 sibling, 2 replies; 32+ messages in thread
From: Don Zickus @ 2015-04-28 14:30 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Borislav Petkov, linux-edac, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Tony Luck, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Lv Zheng, Naoya Horiguchi, linux-acpi, linux-kernel, Huang Ying

On Wed, Apr 01, 2015 at 09:45:53AM +0200, Jiri Kosina wrote:
> On Fri, 27 Mar 2015, Borislav Petkov wrote:
> 
> > From: Jiri Kosina <jkosina@suse.cz>
> > 
> > Since GHES sources are global, we theoretically need only a single CPU
> > reading them per NMI instead of a thundering herd of CPUs waiting on a
> > spinlock in NMI context for no reason at all.
> 
> I originally wasn't 100% sure whether GHES sources are global (i.e. if it 
> really doesn't matter which CPU is reading the registers), but looking at 
> the code more it actually seems that this is really the right thing to do.
> 
> Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page 
> with the registers, performs apei_read() (which is ghes-source specific, 
> but not CPU-specific) and unmaps the page again.
> 
> There is nothing that would make this CPU-specific. Adding Huang Ying (the 
> original author of the code) to confirm this. Huang?

Hi,

I believe the answer to this question is no, they are not global but
instead external.  All external NMIs are routed to one cpu, normally cpu0.
This spinlock was made global to handle the 'someday' case of hotplugging
the bsp cpu (cpu0).

The other external NMIs (IO_CHECK and SERR) suffer from the same spinlock
problem.  I tried using an irq_workqueue to work around quirks there and
PeterZ wasn't very fond of it (though he couldn't think of a better way to
solve it).

This patch seems interesting but you might still run into the thundering
herd problem with the global spinlock in
arch/x86/kernel/nmi.c::default_do_nmi().  That function grabs a global
spinlock before processing the external NMI list (which GHES is a part of).


So I am assuming this patch solves the 'thundering herd' problem by
minimizing all the useless writes the spinlock would do for each cpu that
noticed it had no work to do?

In that case, I am in favor of this solution and would like to apply a
similar solution to arch/x86/kernel/nmi.c, to see if that helps there too.

Cheers,
Don


> 
> > Do that.
> 
> I think this should indeed be pushed forward. It fixes horrible spinlock 
> contention on systems which are under NMI storm (such as when perf is 
> active) unrelated to GHES.
> 
> Thanks,
> 
> -- 
> Jiri Kosina
> SUSE Labs
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-28 14:30     ` Don Zickus
@ 2015-04-28 14:42       ` Don Zickus
  2015-04-28 14:55       ` Borislav Petkov
  1 sibling, 0 replies; 32+ messages in thread
From: Don Zickus @ 2015-04-28 14:42 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Borislav Petkov, linux-edac, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Tony Luck, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Lv Zheng, Naoya Horiguchi, linux-acpi, linux-kernel, Huang Ying

On Tue, Apr 28, 2015 at 10:30:09AM -0400, Don Zickus wrote:
> On Wed, Apr 01, 2015 at 09:45:53AM +0200, Jiri Kosina wrote:
> > On Fri, 27 Mar 2015, Borislav Petkov wrote:
> > 
> > > From: Jiri Kosina <jkosina@suse.cz>
> > > 
> > > Since GHES sources are global, we theoretically need only a single CPU
> > > reading them per NMI instead of a thundering herd of CPUs waiting on a
> > > spinlock in NMI context for no reason at all.
> > 
> > I originally wasn't 100% sure whether GHES sources are global (i.e. if it 
> > really doesn't matter which CPU is reading the registers), but looking at 
> > the code more it actually seems that this is really the right thing to do.
> > 
> > Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page 
> > with the registers, performs apei_read() (which is ghes-source specific, 
> > but not CPU-specific) and unmaps the page again.
> > 
> > There is nothing that would make this CPU-specific. Adding Huang Ying (the 
> > original author of the code) to confirm this. Huang?
> 
> Hi,
> 
> I believe the answer to this question is no, they are not global but
> instead external.  All external NMIs are routed to one cpu, normally cpu0.
> This spinlock was made global to handle the 'someday' case of hotplugging
> the bsp cpu (cpu0).
> 
> The other external NMIs (IO_CHECK and SERR) suffer from the same spinlock
> problem.  I tried using an irq_workqueue to work around quirks there and
> PeterZ wasn't very fond of it (though he couldn't think of a better way to
> solve it).
> 
> This patch seems interesting but you might still run into the thundering
> herd problem with the global spinlock in
> arch/x86/kernel/nmi.c::default_do_nmi().  That functions grabs a global
> spinlock before processing the external NMI list (which GHES is a part of).

Grr, I misspoke.  I sent a patchset a year ago to split out internal and
external NMIs to simplify the problem.  So I wrote the above paragraph
thinking the GHES NMI handler was wrapped with the external NMI spinlock,
when in fact it isn't.  However, with perf running with lots of events, it
is possible to start 'swallowing' NMIs which requires passing through the
spinlock I just mentioned.  This might cause random delays in your
measurements and is still worth modifying.

Cheers,
Don

> 
> 
> So I am assuming this patch solves the 'thundering herd' problem by
> minimizing all the useless writes the spinlock would do for each cpu that
> noticed it had no work to do?
> 
> In that case, I am in favor of this solution and would like to apply a
> similar solution to arch/x86/kernel/nmi.c, to see if that helps there too.
> 
> Cheers,
> Don
> 
> 
> > 
> > > Do that.
> > 
> > I think this should indeed be pushed forward. It fixes horrible spinlock 
> > contention on systems which are under NMI storm (such as when perf is 
> > active) unrelated to GHES.
> > 
> > Thanks,
> > 
> > -- 
> > Jiri Kosina
> > SUSE Labs
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-28 14:30     ` Don Zickus
  2015-04-28 14:42       ` Don Zickus
@ 2015-04-28 14:55       ` Borislav Petkov
  2015-04-28 15:35         ` Don Zickus
  1 sibling, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2015-04-28 14:55 UTC (permalink / raw)
  To: Don Zickus
  Cc: Jiri Kosina, linux-edac, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Tony Luck, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Lv Zheng, Naoya Horiguchi, linux-acpi, linux-kernel, Huang Ying

On Tue, Apr 28, 2015 at 10:30:09AM -0400, Don Zickus wrote:
> On Wed, Apr 01, 2015 at 09:45:53AM +0200, Jiri Kosina wrote:
> > On Fri, 27 Mar 2015, Borislav Petkov wrote:
> > 
> > > From: Jiri Kosina <jkosina@suse.cz>
> > > 
> > > Since GHES sources are global, we theoretically need only a single CPU
> > > reading them per NMI instead of a thundering herd of CPUs waiting on a
> > > spinlock in NMI context for no reason at all.
> > 
> > I originally wasn't 100% sure whether GHES sources are global (i.e. if it 
> > really doesn't matter which CPU is reading the registers), but looking at 
> > the code more it actually seems that this is really the right thing to do.
> > 
> > Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page 
> > with the registers, performs apei_read() (which is ghes-source specific, 
> > but not CPU-specific) and unmaps the page again.
> > 
> > There is nothing that would make this CPU-specific. Adding Huang Ying (the 
> > original author of the code) to confirm this. Huang?
> 
> Hi,
> 
> I believe the answer to this question is no, they are not global but
> instead external.  All external NMIs are routed to one cpu, normally cpu0.

Actually, the things we're talking about here are not global NMIs but
global error sources. I.e., the GHES sources which use an NMI to report
errors with and which we noodle over in ghes_notify_nmi() twice:

	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
		...

> This spinlock was made global to handle the 'someday' case of hotplugging
> the bsp cpu (cpu0).
> 
> The other external NMIs (IO_CHECK and SERR) suffer from the same spinlock
> problem.  I tried using an irq_workqueue to work around quirks there and
> PeterZ wasn't very fond of it (though he couldn't think of a better way to
> solve it).
> 
> This patch seems interesting but you might still run into the thundering
> herd problem with the global spinlock in
> arch/x86/kernel/nmi.c::default_do_nmi().  That functions grabs a global
> spinlock before processing the external NMI list (which GHES is a part of).

nmi_reason_lock? And our handler is registered with
register_nmi_handler() and so we iterate over it in nmi_handle(). It'd
be interesting to know what NMI reason we get for those GHES NMIs so
that we know whether nmi_handle() is called under the lock or not...

> So I am assuming this patch solves the 'thundering herd' problem by
> minimizing all the useless writes the spinlock would do for each cpu that
> noticed it had no work to do?

Not that spinlock. It used to take another one in ghes_notify_nmi().
Removing it solved the issue. There were even boxes which would do:

[   24.332560] INFO: NMI handler (ghes_notify_nmi) took too long to run: 3.265 msecs
[   24.332567] INFO: NMI handler (ghes_notify_nmi) took too long to run: 5.946 msecs
[   24.332568] INFO: NMI handler (ghes_notify_nmi) took too long to run: 5.948 msecs

just like that during boot.

It was basically a lot of purely useless lock grabbing for no reason
whatsoever.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-28 14:55       ` Borislav Petkov
@ 2015-04-28 15:35         ` Don Zickus
  2015-04-28 16:22           ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Don Zickus @ 2015-04-28 15:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jiri Kosina, linux-edac, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Tony Luck, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Lv Zheng, Naoya Horiguchi, linux-acpi, linux-kernel, Huang Ying

On Tue, Apr 28, 2015 at 04:55:48PM +0200, Borislav Petkov wrote:
> On Tue, Apr 28, 2015 at 10:30:09AM -0400, Don Zickus wrote:
> > On Wed, Apr 01, 2015 at 09:45:53AM +0200, Jiri Kosina wrote:
> > > On Fri, 27 Mar 2015, Borislav Petkov wrote:
> > > 
> > > > From: Jiri Kosina <jkosina@suse.cz>
> > > > 
> > > > Since GHES sources are global, we theoretically need only a single CPU
> > > > reading them per NMI instead of a thundering herd of CPUs waiting on a
> > > > spinlock in NMI context for no reason at all.
> > > 
> > > I originally wasn't 100% sure whether GHES sources are global (i.e. if it 
> > > really doesn't matter which CPU is reading the registers), but looking at 
> > > the code more it actually seems that this is really the right thing to do.
> > > 
> > > Rationale: ghes_ioremap_pfn_nmi() always ioremaps() (exclusively) the page 
> > > with the registers, performs apei_read() (which is ghes-source specific, 
> > > but not CPU-specific) and unmaps the page again.
> > > 
> > > There is nothing that would make this CPU-specific. Adding Huang Ying (the 
> > > original author of the code) to confirm this. Huang?
> > 
> > Hi,
> > 
> > I believe the answer to this question is no, they are not global but
> > instead external.  All external NMIs are routed to one cpu, normally cpu0.
> 
> Actually, the things we're talking about here are not global NMIs but
> global error sources. I.e., the GHES sources which use an NMI to report
> errors with and which we noodle over in ghes_notify_nmi() twice:
> 
> 	list_for_each_entry_rcu(ghes, &ghes_nmi, list) {

Sure, most systems use NMI to report them I believe (at least the ones I
have played with).  Regardless, I have hit performance problems on this lock
while doing perf testing.

I tried to work around it by splitting the NMI list into an
nmi_register(INTERNAL,...) and nmi_register(EXTERNAL,...) to allow perf to
avoid hitting this lock.  But the fallout of that change included missed
NMIs and various solutions from that resulted in complexities that blocked
inclusion last year.

Your solution seems much simpler. :-)

> 		...
> 
> > This spinlock was made global to handle the 'someday' case of hotplugging
> > the bsp cpu (cpu0).
> > 
> > The other external NMIs (IO_CHECK and SERR) suffer from the same spinlock
> > problem.  I tried using an irq_workqueue to work around quirks there and
> > PeterZ wasn't very fond of it (though he couldn't think of a better way to
> > solve it).
> > 
> > This patch seems interesting but you might still run into the thundering
> > herd problem with the global spinlock in
> > arch/x86/kernel/nmi.c::default_do_nmi().  That functions grabs a global
> > spinlock before processing the external NMI list (which GHES is a part of).
> 
> nmi_reason_lock? And our handler is registered with
> register_nmi_handler() and so we iterate over it in nmi_handle(). It'd
> be interesting to know what NMI reason we get for those GHES NMIs so
> that we know whether nmi_handle() is called under the lock or not...

I followed up in another email stating I misspoke.  I forgot this still
uses the NMI_LOCAL shared NMI.  So every perf NMI will also call the GHES
handler to make sure NMIs did not piggyback on each other.  So I don't believe
the NMI reason lock is called a majority of the time (except when the NMI is
swallowed, but that is under heavy perf load...).


> 
> > So I am assuming this patch solves the 'thundering herd' problem by
> > minimizing all the useless writes the spinlock would do for each cpu that
> > noticed it had no work to do?
> 
> Not that spinlock. It used to take another one in ghes_notify_nmi().
> Removing it solved the issue. There were even boxes which would do:
> 
> [   24.332560] INFO: NMI handler (ghes_notify_nmi) took too long to run: 3.265 msecs
> [   24.332567] INFO: NMI handler (ghes_notify_nmi) took too long to run: 5.946 msecs
> [   24.332568] INFO: NMI handler (ghes_notify_nmi) took too long to run: 5.948 msecs
> 
> just like that during boot.
> 
> It was basically a lot of purely useless lock grabbing for no reason
> whatsoever.

Yes, I meant the 'raw_spin_lock(&ghes_nmi_lock);' one.  I poorly typed my
email and confused you into thinking I was referring to the wrong one.

Well, it isn't exactly no reason, but more along the lines of 'we do not
know who gets the external NMI, so everyone tries to talk with the GHES
firmware'.  But I think that is just me squabbling.

We both agree the mechanics of the spinlock are overkill here and cause much
cache contention.  Simplifying it to just 'reads' and return removes most of
the problem.

Cheers,
Don

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-28 15:35         ` Don Zickus
@ 2015-04-28 16:22           ` Borislav Petkov
  2015-04-28 18:44             ` Don Zickus
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2015-04-28 16:22 UTC (permalink / raw)
  To: Don Zickus
  Cc: Jiri Kosina, linux-edac, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Tony Luck, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Lv Zheng, Naoya Horiguchi, linux-acpi, linux-kernel, Huang Ying

On Tue, Apr 28, 2015 at 11:35:21AM -0400, Don Zickus wrote:
> Your solution seems much simpler. :-)

... and I love simpler :-)

> I followed up in another email stating I mis-spoke.  I forgot this still
> uses the NMI_LOCAL shared NMI.  So every perf NMI, will also call the GHES
> handler to make sure NMIs did not piggy back each other.  So I don't believe

And this is something we should really fix - perf and RAS should
not have anything to do with each other. But I don't know the NMI
code to even have an idea how. I don't even know whether we can
differentiate NMIs, hell, I can't imagine the hardware giving us a
different NMI reason through get_nmi_reason(). Maybe that byte returned
from NMI_REASON_PORT is too small and hangs on too much legacy crap to
even be usable. Questions over questions...

> the NMI reason lock is called a majority of the time (except when the NMI is
> swallowed, but that is under heavy perf load...).

..

> We both agree the mechanics of the spinlock are overkill here and cause much
> cache contention.  Simplifying it to just 'reads' and return removes most of
> the problem.

Right.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-28 16:22           ` Borislav Petkov
@ 2015-04-28 18:44             ` Don Zickus
  2015-05-04 15:40               ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Don Zickus @ 2015-04-28 18:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jiri Kosina, linux-edac, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Tony Luck, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Lv Zheng, Naoya Horiguchi, linux-acpi, linux-kernel, Huang Ying

On Tue, Apr 28, 2015 at 06:22:29PM +0200, Borislav Petkov wrote:
> On Tue, Apr 28, 2015 at 11:35:21AM -0400, Don Zickus wrote:
> > Your solution seems much simpler. :-)
> 
> ... and I love simpler :-)
> 
> > I followed up in another email stating I mis-spoke.  I forgot this still
> > uses the NMI_LOCAL shared NMI.  So every perf NMI, will also call the GHES
> > handler to make sure NMIs did not piggy back each other.  So I don't believe
> 
> And this is something we should really fix - perf and RAS should
> not have anything to do with each other. But I don't know the NMI
> code to even have an idea how. I don't even know whether we can
> differentiate NMIs, hell, I can't imagine the hardware giving us a
> different NMI reason through get_nmi_reason(). Maybe that byte returned
> from NMI_REASON_PORT is too small and hangs on too much legacy crap to
> even be usable. Questions over questions...

:-)  Well, let me first clear up some of your questions. 

RAS doesn't go through the legacy ports (ie get_nmi_reason()).  Instead it
triggers the external NMI through a different bit (ioapic I think).

The nmi code has no idea what io_remap'ed address apei is using to map its
error handling register that GHES uses.  Unlike the legacy port which is
always port 0x61.

So NMI is basically a shared interrupt, with no ability to discern who sent
it (and, even worse, no ability to know how _many_ were sent, since NMI is
edge triggered instead of level triggered).  As a result we rely on the NMI
handlers to talk to their address space/registers to determine if they were
the source of the interrupt.

Now I can agree that perf and RAS have nothing to do with each other, but
they both use NMI to interrupt.  Perf is fortunate enough to be internal to
each cpu and therefore needs no global lock unlike GHES (hence part of the
problem).

The only way to determine who sent the NMI is to have each handler read its
register, which is time consuming for GHES.

Of course, we could go back to playing tricks knowing that external NMIs
like GHES and IO_CHECK/SERR are only routed to one cpu (cpu0 mainly) and
optimize things that way, but that inhibits the bsp cpu hotplugging folks.



I also played tricks like last year's patchset that split out the
nmi_handlers into LOCAL and EXTERNAL queues.  Perf would be part of the
LOCAL queue while GHES was part of the EXTERNAL queue.  The thought was to
never touch the EXTERNAL queue if perf claimed an NMI.  This led to all
sorts of missed external NMIs, so it didn't work out.


Anyway, any ideas or thoughts for improvement are always welcomed. :-) 


Cheers,
Don

> 
> > the NMI reason lock is called a majority of the time (except when the NMI is
> > swallowed, but that is under heavy perf load...).
> 
> ..
> 
> > We both agree the mechanics of the spinlock are overkill here and cause much
> > cache contention.  Simplifying it to just 'reads' and return removes most of
> > the problem.
> 
> Right.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> ECO tip #101: Trim your mails when you reply.
> --

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-28 13:59     ` Borislav Petkov
@ 2015-04-29  0:24       ` Zheng, Lv
  2015-04-29  0:49       ` Zheng, Lv
  1 sibling, 0 replies; 32+ messages in thread
From: Zheng, Lv @ 2015-04-29  0:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, Jiri Kosina, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Luck, Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Naoya Horiguchi, linux-acpi, linux-kernel


Hi,

> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Tuesday, April 28, 2015 9:59 PM
> Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
> 
> On Tue, Apr 28, 2015 at 01:38:41PM +0000, Zheng, Lv wrote:
> > > -	raw_spin_lock(&ghes_nmi_lock);
> > > +	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
> > > +		return ret;
> > > +
> >
> > if (atomic_cmpxchg(&ghes_in_nmi, 0, 1))
> > 	return ret;
> 
> Ok, now I understand what you mean.
> 
> We absolutely want to use atomic_add_unless() because we get to save us
> the expensive
> 
> 	LOCK; CMPXCHG
> 
> if the value was already 1. Which is exactly what this patch is trying
> to avoid - a thundering herd of cores CMPXCHGing a global variable.

IMO, on most architectures, the "cmp" part should work just like what you've done with "if".
And on some architectures, if the "xchg" doesn't happen, the "cmp" part won't even cause a pipeline hazard.

Thanks and best regards
-Lv


> 
> I.e.,
> 
> 	movl	ghes_in_nmi(%rip), %ecx	# MEM[(const int *)&ghes_in_nmi], c
> 	cmpl	$1, %ecx	#, c
> 	je	.L311	#,				<--- exit here if ghes_in_nmi == 1.
> 	leal	1(%rcx), %edx	#, D.37163
> 	movl	%ecx, %eax	# c, c
> #APP
> # 177 "./arch/x86/include/asm/atomic.h" 1
> 	.pushsection .smp_locks,"a"
> .balign 4
> .long 671f - .
> .popsection
> 671:
> 	lock; cmpxchgl %edx,ghes_in_nmi(%rip)	# D.37163, MEM[(volatile u32 *)&ghes_in_nmi]
> # 0 "" 2
> #NO_APP
> 
> --
> Regards/Gruss,
>     Boris.
> 
> ECO tip #101: Trim your mails when you reply.
> --

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-28 13:59     ` Borislav Petkov
  2015-04-29  0:24       ` Zheng, Lv
@ 2015-04-29  0:49       ` Zheng, Lv
  2015-04-29  8:13         ` Borislav Petkov
  1 sibling, 1 reply; 32+ messages in thread
From: Zheng, Lv @ 2015-04-29  0:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, Jiri Kosina, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Luck, Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Naoya Horiguchi, linux-acpi, linux-kernel


Hi,

> From: Zheng, Lv
> Sent: Wednesday, April 29, 2015 8:25 AM
> Subject: RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
> 
> Hi,
> 
> > From: Borislav Petkov [mailto:bp@alien8.de]
> > Sent: Tuesday, April 28, 2015 9:59 PM
> > Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
> >
> > On Tue, Apr 28, 2015 at 01:38:41PM +0000, Zheng, Lv wrote:
> > > > -	raw_spin_lock(&ghes_nmi_lock);
> > > > +	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
> > > > +		return ret;
> > > > +
> > >
> > > if (atomic_cmpxchg(&ghes_in_nmi, 0, 1))
> > > 	return ret;
> >
> > Ok, now I understand what you mean.
> >
> > We absolutely want to use atomic_add_unless() because we get to save us
> > the expensive
> >
> > 	LOCK; CMPXCHG
> >
> > if the value was already 1. Which is exactly what this patch is trying
> > to avoid - a thundering herd of cores CMPXCHGing a global variable.
> 
> IMO, on most architectures, the "cmp" part should work just like what you've done with "if".
> And on some architectures, if the "xchg" doesn't happen, the "cmp" part won't even cause a pipeline hazard.

If you mean the LOCK prefix, I understand now.

Thanks and best regards
-Lv

> 
> Thanks and best regards
> -Lv
> 
> 
> >
> > I.e.,
> >
> > 	movl	ghes_in_nmi(%rip), %ecx	# MEM[(const int *)&ghes_in_nmi], c
> > 	cmpl	$1, %ecx	#, c
> > 	je	.L311	#,				<--- exit here if ghes_in_nmi == 1.
> > 	leal	1(%rcx), %edx	#, D.37163
> > 	movl	%ecx, %eax	# c, c
> > #APP
> > # 177 "./arch/x86/include/asm/atomic.h" 1
> > 	.pushsection .smp_locks,"a"
> > .balign 4
> > .long 671f - .
> > .popsection
> > 671:
> > 	lock; cmpxchgl %edx,ghes_in_nmi(%rip)	# D.37163, MEM[(volatile u32 *)&ghes_in_nmi]
> > # 0 "" 2
> > #NO_APP
> >
> > --
> > Regards/Gruss,
> >     Boris.
> >
> > ECO tip #101: Trim your mails when you reply.
> > --

^ permalink raw reply	[flat|nested] 32+ messages in thread
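[The gate being debated above can be sketched outside the kernel. This is not kernel code: a user-space model using C11 atomics, with hypothetical names (`atomic_add_unless_model`, `nmi_enter_single_reader`) standing in for the kernel's `atomic_add_unless()` and the `ghes_in_nmi` guard from the patch. The point Boris makes is visible in the model: when the counter is already at the limit, the plain load satisfies the `while` condition and no compare-and-swap is issued at all.]

```c
#include <stdatomic.h>
#include <assert.h>

/* Simplified model of the kernel's atomic_add_unless(): add 'a' to *v
 * unless *v already equals 'u'; returns non-zero iff the add happened. */
static int atomic_add_unless_model(atomic_int *v, int a, int u)
{
	int c = atomic_load(v);		/* plain read: the cheap fast path */

	while (c != u) {
		if (atomic_compare_exchange_weak(v, &c, c + a))
			return 1;	/* we won the race */
		/* CAS failed and reloaded c for us; retry */
	}
	return 0;			/* already at u: no LOCK CMPXCHG issued */
}

static atomic_int ghes_in_nmi;

/* Returns 1 if this caller becomes the single reader, 0 otherwise. */
static int nmi_enter_single_reader(void)
{
	return atomic_add_unless_model(&ghes_in_nmi, 1, 1);
}

static void nmi_exit_single_reader(void)
{
	atomic_store(&ghes_in_nmi, 0);
}
```

[In the model, a second "CPU" calling `nmi_enter_single_reader()` while the first holds the gate bails out on the plain load alone.]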

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-29  0:49       ` Zheng, Lv
@ 2015-04-29  8:13         ` Borislav Petkov
  2015-04-30  8:05           ` Zheng, Lv
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2015-04-29  8:13 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: linux-edac, Jiri Kosina, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Luck, Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Naoya Horiguchi, linux-acpi, linux-kernel

On Wed, Apr 29, 2015 at 12:49:59AM +0000, Zheng, Lv wrote:
> > > We absolutely want to use atomic_add_unless() because we get to save us
> > > the expensive
> > >
> > > 	LOCK; CMPXCHG
> > >
> > > if the value was already 1. Which is exactly what this patch is trying
> > > to avoid - a thundering herd of cores CMPXCHGing a global variable.
> > 
> > IMO, on most architectures, the "cmp" part should work just like what you've done with "if".
> > And on some architectures, if the "xchg" doesn't happen, the "cmp" part won't even cause a pipeline hazard.

Even if CMPXCHG is being split into several microops, they all still
need to flow down the pipe and require resources and tracking. And you
only know at retire time what the CMP result is and can "discard" the
XCHG part. Provided the uarch is smart enough to do that.

This is probably why CMPXCHG needs 5,6,7,10,22,... cycles depending on
uarch and vendor, if I can trust Agner Fog's tables. And I bet those
numbers are best-case only and in real-life they probably tend to fall
out even worse.

CMP needs only 1. On almost every uarch and vendor. And even that cycle
probably gets hidden with a good branch predictor.

> If you mean the LOCK prefix, I understand now.

And that makes it several times worse: 22, 40, 80, ... cycles.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 32+ messages in thread
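[The cost argument above is about who pays for the locked operation. A toy instrumented model (not kernel code; `locked_cmpxchg` and the counter are made up for illustration) contrasts a bare compare-and-swap gate, where every contender issues a LOCK CMPXCHG, with the test-then-CAS shape that `atomic_add_unless()` gives you, where all the losers bail on a plain 1-cycle compare.]

```c
#include <stdatomic.h>
#include <assert.h>

static int cas_count;		/* instrumentation: locked CMPXCHGs issued */
static atomic_int gate;

/* Stand-in for a LOCK CMPXCHG; counts every invocation. */
static int locked_cmpxchg(atomic_int *v, int old, int new)
{
	cas_count++;
	return atomic_compare_exchange_strong(v, &old, new);
}

/* Bare cmpxchg entry: every caller pays for the locked operation. */
static int enter_cmpxchg(void)
{
	return locked_cmpxchg(&gate, 0, 1);
}

/* Test, then cmpxchg: losers leave after an unlocked load + compare. */
static int enter_test_then_cmpxchg(void)
{
	if (atomic_load(&gate) == 1)
		return 0;
	return locked_cmpxchg(&gate, 0, 1);
}
```

[With ten contenders hitting a taken gate, the bare variant issues ten locked operations; the test-first variant issues one. That is the "thundering herd of cores CMPXCHGing a global variable" the patch avoids.]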

* RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-29  8:13         ` Borislav Petkov
@ 2015-04-30  8:05           ` Zheng, Lv
  2015-04-30  8:48             ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Zheng, Lv @ 2015-04-30  8:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, Jiri Kosina, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Luck, Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Naoya Horiguchi, linux-acpi, linux-kernel

Hi,

> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Wednesday, April 29, 2015 4:14 PM
> Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
> 
> On Wed, Apr 29, 2015 at 12:49:59AM +0000, Zheng, Lv wrote:
> > > > We absolutely want to use atomic_add_unless() because we get to save us
> > > > the expensive
> > > >
> > > > 	LOCK; CMPXCHG
> > > >
> > > > if the value was already 1. Which is exactly what this patch is trying
> > > > to avoid - a thundering herd of cores CMPXCHGing a global variable.
> > >
> > > IMO, on most architectures, the "cmp" part should work just like what you've done with "if".
> > > And on some architectures, if the "xchg" doesn't happen, the "cmp" part won't even cause a pipeline hazard.
> 
> Even if CMPXCHG is being split into several microops, they all still
> need to flow down the pipe and require resources and tracking. And you
> only know at retire time what the CMP result is and can "discard" the
> XCHG part. Provided the uarch is smart enough to do that.
> 
> This is probably why CMPXCHG needs 5,6,7,10,22,... cycles depending on
> uarch and vendor, if I can trust Agner Fog's tables. And I bet those
> numbers are best-case only and in real-life they probably tend to fall
> out even worse.
> 
> CMP needs only 1. On almost every uarch and vendor. And even that cycle
> probably gets hidden with a good branch predictor.

Are there any such data around the SC and LL (MIPS)?

> 
> > If you mean the LOCK prefix, I understand now.
> 
> And that makes it several times worse: 22, 40, 80, ... cycles.

I'm OK if the code still keeps the readability then.

Thanks and best regards
-Lv

> 
> --
> Regards/Gruss,
>     Boris.
> 
> ECO tip #101: Trim your mails when you reply.
> --

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-30  8:05           ` Zheng, Lv
@ 2015-04-30  8:48             ` Borislav Petkov
  2015-05-02  0:34               ` Zheng, Lv
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2015-04-30  8:48 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: linux-edac, Jiri Kosina, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Luck, Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Naoya Horiguchi, linux-acpi, linux-kernel

On Thu, Apr 30, 2015 at 08:05:12AM +0000, Zheng, Lv wrote:
> Are there any such data around the SC and LL (MIPS)?

What, you can't search the internet yourself?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-30  8:48             ` Borislav Petkov
@ 2015-05-02  0:34               ` Zheng, Lv
  0 siblings, 0 replies; 32+ messages in thread
From: Zheng, Lv @ 2015-05-02  0:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, Jiri Kosina, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Luck, Tony, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Naoya Horiguchi, linux-acpi, linux-kernel

Hi,

> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Thursday, April 30, 2015 4:49 PM
> 
> On Thu, Apr 30, 2015 at 08:05:12AM +0000, Zheng, Lv wrote:
> > Are there any such data around the SC and LL (MIPS)?
> 
> What, you can't search the internet yourself?

I mean, if LL can do exactly what you did with the "if", and no ABA problem can break the atomic_add_unless() users,
then no atomic_cmpxchg() users should be broken for the same reason either.
Why don't you put an "if" in atomic_cmpxchg() so that, given the data, other users get the same benefit?

But this is out of the context for this patch.
I actually don't have any API usage preference now.

Thanks
-Lv

> 
> --
> Regards/Gruss,
>     Boris.
> 
> ECO tip #101: Trim your mails when you reply.
> --

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
  2015-04-28 18:44             ` Don Zickus
@ 2015-05-04 15:40               ` Borislav Petkov
  0 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2015-05-04 15:40 UTC (permalink / raw)
  To: Don Zickus
  Cc: Jiri Kosina, linux-edac, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Tony Luck, Tomasz Nowicki, Chen, Gong, Wolfram Sang,
	Lv Zheng, Naoya Horiguchi, linux-acpi, linux-kernel, Huang Ying

On Tue, Apr 28, 2015 at 02:44:28PM -0400, Don Zickus wrote:
> RAS doesn't go through the legacy ports (ie get_nmi_reason()).  Instead it
> triggers the external NMI through a different bit (ioapic I think).

Well, I see it getting registered with __register_nmi_handler() which
adds it to the NMI_LOCAL type, i.e., ghes_notify_nmi() gets called by

default_do_nmi
|-> nmi_handle(NMI_LOCAL, regs, b2b);

AFAICT.

Which explains also the issue we were seeing as that handler is called
on each NMI, even when the machine is running a perf workload.

> The nmi code has no idea what io_remap'ed address apei is using to map its
> error handling register that GHES uses.  Unlike the legacy port which is
> always port 0x61.
> 
> So, with NMI being basically a shared interrupt, with no ability to discern
> who sent the interrupt (and even worse no ability to know how _many_ were sent as
> the NMI is edge triggered instead of level triggered).  As a result we rely
on the NMI handlers to talk to their address space/registers to determine if
they were the source of the interrupt.

I was afraid it would be something like that. We probably should poke hw
people to extend that NMI fun so that we can know who caused it.

<snip stuff I agree with>

> Anyway, any ideas or thoughts for improvement are always welcomed. :-)

Yeah, I'm afraid without hw support, that won't be doable. We need the
hw to tell us who caused the NMI. Otherwise we'll be round-robining
(:-)) through handlers like nuts.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 32+ messages in thread
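[The "shared interrupt with no source ID" situation Don describes can be modeled in a few lines. This is a hypothetical sketch, not the kernel's nmi_handle() (the names and the fixed-size table are invented): because the NMI line is edge-triggered and anonymous, the dispatcher has no choice but to ask every registered handler whether its hardware raised the NMI, and each handler answers by polling its own registers.]

```c
#include <assert.h>

#define MAX_HANDLERS 8

typedef int (*nmi_handler_t)(void);	/* non-zero: "that was mine" */

static nmi_handler_t handlers[MAX_HANDLERS];
static int nr_handlers;

static void register_nmi_handler_model(nmi_handler_t h)
{
	if (nr_handlers < MAX_HANDLERS)
		handlers[nr_handlers++] = h;
}

/* Returns how many handlers claimed the NMI; 0 means "unknown NMI". */
static int nmi_handle_model(void)
{
	int handled = 0;

	/* No source ID, so every handler runs on every NMI. */
	for (int i = 0; i < nr_handlers; i++)
		handled += handlers[i]() ? 1 : 0;
	return handled;
}

/* A GHES-like handler: polls (and clears) its own pending state. */
static int ghes_pending;
static int ghes_handler_model(void)
{
	int p = ghes_pending;

	ghes_pending = 0;
	return p;
}

/* A perf-like handler whose hardware has nothing pending. */
static int perf_handler_model(void) { return 0; }
```

[This also illustrates the complaint earlier in the thread: the GHES handler runs, and reads its registers, on every NMI, including ones raised purely by a perf workload.]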

end of thread, other threads:[~2015-05-04 15:41 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-27  9:22 [RFC PATCH 0/5] GHES NMI handler cleanup Borislav Petkov
2015-03-27  9:22 ` [RFC PATCH 1/5] GHES: Carve out error queueing in a separate function Borislav Petkov
2015-03-27  9:22 ` [RFC PATCH 2/5] GHES: Carve out the panic functionality Borislav Petkov
2015-03-27  9:22 ` [RFC PATCH 3/5] GHES: Panic right after detection Borislav Petkov
2015-03-27  9:22 ` [RFC PATCH 4/5] GHES: Elliminate double-loop in the NMI handler Borislav Petkov
2015-03-27  9:22 ` [RFC PATCH 5/5] GHES: Make NMI handler have a single reader Borislav Petkov
2015-04-01  7:45   ` Jiri Kosina
2015-04-01 13:49     ` Borislav Petkov
2015-04-23  8:39       ` Jiri Kosina
2015-04-23  8:59         ` Borislav Petkov
2015-04-23 18:00           ` Luck, Tony
2015-04-27 20:23             ` Borislav Petkov
2015-04-28 14:30     ` Don Zickus
2015-04-28 14:42       ` Don Zickus
2015-04-28 14:55       ` Borislav Petkov
2015-04-28 15:35         ` Don Zickus
2015-04-28 16:22           ` Borislav Petkov
2015-04-28 18:44             ` Don Zickus
2015-05-04 15:40               ` Borislav Petkov
2015-04-27  3:16   ` Zheng, Lv
2015-04-27  8:46     ` Borislav Petkov
2015-04-28  0:44       ` Zheng, Lv
2015-04-28  2:24       ` Zheng, Lv
2015-04-28  7:38         ` Borislav Petkov
2015-04-28 13:38   ` Zheng, Lv
2015-04-28 13:59     ` Borislav Petkov
2015-04-29  0:24       ` Zheng, Lv
2015-04-29  0:49       ` Zheng, Lv
2015-04-29  8:13         ` Borislav Petkov
2015-04-30  8:05           ` Zheng, Lv
2015-04-30  8:48             ` Borislav Petkov
2015-05-02  0:34               ` Zheng, Lv

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).