linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] x86/mce/AMD: Use saved threshold block info in interrupt handler
@ 2017-05-31 14:11 Yazen Ghannam
  2017-06-01 17:49 ` Borislav Petkov
From: Yazen Ghannam @ 2017-05-31 14:11 UTC
  To: linux-edac; +Cc: Borislav Petkov, Tony Luck, x86, linux-kernel, Yazen Ghannam

From: Yazen Ghannam <yazen.ghannam@amd.com>

In the amd_threshold_interrupt() handler, we loop through every possible
block in each bank and rediscover the block's address and whether it is
usable, i.e. valid, counter present and not locked. However, we already
have the address saved in the threshold blocks list for each CPU and
bank, and that list only contains blocks that have passed all the
validity checks.
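As a rough sketch of the check that list membership already guarantees, the C below expresses "valid, counter present and not locked" as a predicate on the high 32 bits of a block's MISC register. The mask values are assumptions based on the MASK_*_HI definitions in mce_amd.c of this era and should be checked against the tree; block_is_usable() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit masks for the high 32 bits of a threshold MISC register.
 * ASSUMPTION: values as recalled from mce_amd.c; verify before use.
 */
#define MASK_VALID_HI  0x80000000u
#define MASK_CNTP_HI   0x40000000u
#define MASK_LOCKED_HI 0x20000000u

/*
 * Hypothetical helper: a block is usable for thresholding if it is
 * valid, has an error counter present, and is not locked. Per the
 * commit message, only blocks passing these checks end up in the
 * per-CPU threshold block lists, so the handler need not repeat them.
 */
static bool block_is_usable(uint32_t high)
{
	return (high & MASK_VALID_HI) &&
	       (high & MASK_CNTP_HI) &&
	       !(high & MASK_LOCKED_HI);
}
```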

Besides the redundancy, get_block_address() can end up in an
smp_call_function*() (here via rdmsr_safe_on_cpu()), which causes a
warning when called while servicing the interrupt:

 WARNING: CPU: 0 PID: 0 at kernel/smp.c:281 smp_call_function_single+0xdd/0xf0
 ...
 Call Trace:
  <IRQ>
  rdmsr_safe_on_cpu+0x5d/0x90
  get_block_address.isra.2+0x97/0x100
  amd_threshold_interrupt+0xae/0x220
  smp_threshold_interrupt+0x1b/0x40
  threshold_interrupt+0x89/0x90

Drop the redundant validity checks and move the overflow check, logging
and block reset into a separate function, log_and_reset_block().
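A user-space model of the control flow that log_and_reset_block() follows may help: read the block's MISC register, bail out unless the overflow bit is set, log, then reset. fake_rdmsr_safe(), the trimmed struct and its misc_lo/misc_hi/logged fields are hypothetical stand-ins, and the MASK_OVERFLOW_HI value is an assumption taken from mce_amd.c; the real reset, threshold_restart_bank(), reprograms more than the overflow bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MASK_OVERFLOW_HI 0x00010000u /* ASSUMPTION: value from mce_amd.c */

/* Hypothetical, trimmed-down stand-in for struct threshold_block. */
struct threshold_block {
	unsigned int bank;
	uint32_t misc_lo, misc_hi; /* simulated MSR contents */
	unsigned int logged;       /* number of errors "logged" */
};

/* Hypothetical stand-in for rdmsr_safe(): returns 0 on success. */
static int fake_rdmsr_safe(const struct threshold_block *b,
			   uint32_t *lo, uint32_t *hi)
{
	*lo = b->misc_lo;
	*hi = b->misc_hi;
	return 0;
}

/* Mirrors the early-return structure of log_and_reset_block(). */
static void log_and_reset_block(struct threshold_block *block)
{
	uint32_t low = 0, high = 0;

	if (!block)
		return;
	if (fake_rdmsr_safe(block, &low, &high))
		return;
	if (!(high & MASK_OVERFLOW_HI))
		return; /* counter has not overflowed: nothing to log */

	/* The kernel calls log_error_thresholding() here. */
	printf("bank %u: misc=0x%016llx\n", block->bank,
	       (unsigned long long)(((uint64_t)high << 32) | low));
	block->logged++;

	/* The kernel calls threshold_restart_bank(); we only clear overflow. */
	block->misc_hi &= ~MASK_OVERFLOW_HI;
}
```

Calling it twice on the same block logs only once, because the reset clears the overflow condition the check keys on.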

Check the first block, then iterate over the rest. This is needed
because the first block's miscj member is used as the head of the list,
and the list iterators never visit the head itself.
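Why the first block needs separate handling can be seen with a small model of the kernel's intrusive circular lists, assuming, as the commit message states, that the first block's miscj member serves as the list head. The list helpers below are simplified re-implementations in the style of <linux/list.h>, not the kernel's own, and visit_rest()/visit_all() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified circular doubly linked list in the style of <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *new_node, struct list_head *head)
{
	new_node->prev = head->prev;
	new_node->next = head;
	head->prev->next = new_node;
	head->prev = new_node;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct block {
	int id;
	int visits;
	struct list_head miscj; /* same member name as in the patch */
};

/* Visits every entry linked to 'head' EXCEPT the block that owns 'head'. */
static void visit_rest(struct list_head *head)
{
	for (struct list_head *pos = head->next; pos != head; pos = pos->next)
		container_of(pos, struct block, miscj)->visits++;
}

/* The v2 pattern: handle the first block explicitly, then the rest. */
static void visit_all(struct block *first)
{
	first->visits++;
	visit_rest(&first->miscj);
}
```

Iterating from &first->miscj reaches every other block but never the first, because traversal stops once it returns to the head; the explicit log_and_reset_block(first_block) call in the patch fills exactly that gap.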

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/1495658507-7413-3-git-send-email-Yazen.Ghannam@amd.com

Link for patch 2:
https://lkml.kernel.org/r/1495658507-7413-2-git-send-email-Yazen.Ghannam@amd.com

v1->v2:
* Drop patch 2 from the first set.
* Rather than iterating through all blocks in the list, check the first
  block and iterate over the rest. This way we don't need an external
  list_head.
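For contrast, the alternative this note moves away from, a separate list_head that no block owns, lets a single uniform loop visit every block. The v1 code is not reproduced here, so the struct bank layout below is a hypothetical illustration of the trade-off only: it costs an extra list_head per bank but removes the first-block special case.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified list helpers in the style of <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *new_node, struct list_head *head)
{
	new_node->prev = head->prev;
	new_node->next = head;
	head->prev->next = new_node;
	head->prev = new_node;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct block {
	int visits;
	struct list_head miscj;
};

/* HYPOTHETICAL layout: the head lives in the bank, not in any block. */
struct bank {
	struct list_head blocks;
};

/* One uniform loop: every block is a real entry, so none is skipped. */
static void visit_every(struct bank *bank)
{
	for (struct list_head *pos = bank->blocks.next; pos != &bank->blocks;
	     pos = pos->next)
		container_of(pos, struct block, miscj)->visits++;
}
```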
 
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 64 ++++++++++++++++++++----------------
 1 file changed, 35 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index d11f94e..5cbb6ee 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -867,49 +867,55 @@ static void log_error_thresholding(unsigned int bank, u64 misc)
 	_log_error_bank(bank, msr_ops.status(bank), msr_ops.addr(bank), misc);
 }
 
+static void log_and_reset_block(struct threshold_block *block)
+{
+	u32 low = 0, high = 0;
+	struct thresh_restart tr;
+
+	if (!block)
+		return;
+
+	if (rdmsr_safe(block->address, &low, &high))
+		return;
+
+	if (!(high & MASK_OVERFLOW_HI))
+		return;
+
+	/* Log the MCE which caused the threshold event. */
+	log_error_thresholding(block->bank, ((u64)high << 32) | low);
+
+	/* Reset threshold block after logging error. */
+	memset(&tr, 0, sizeof(tr));
+	tr.b = block;
+	threshold_restart_bank(&tr);
+}
+
 /*
  * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
  * goes off when error_count reaches threshold_limit.
  */
 static void amd_threshold_interrupt(void)
 {
-	u32 low = 0, high = 0, address = 0;
-	unsigned int bank, block, cpu = smp_processor_id();
-	struct thresh_restart tr;
+	unsigned int bank, cpu = smp_processor_id();
+	struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
 
 	for (bank = 0; bank < mca_cfg.banks; ++bank) {
 		if (!(per_cpu(bank_map, cpu) & (1 << bank)))
 			continue;
-		for (block = 0; block < NR_BLOCKS; ++block) {
-			address = get_block_address(cpu, address, low, high, bank, block);
-			if (!address)
-				break;
 
-			if (rdmsr_safe(address, &low, &high))
-				break;
+		first_block = per_cpu(threshold_banks, cpu)[bank]->blocks;
 
-			if (!(high & MASK_VALID_HI)) {
-				if (block)
-					continue;
-				else
-					break;
-			}
-
-			if (!(high & MASK_CNTP_HI)  ||
-			     (high & MASK_LOCKED_HI))
-				continue;
-
-			if (!(high & MASK_OVERFLOW_HI))
-				continue;
+		if (!first_block)
+			continue;
 
-			/* Log the MCE which caused the threshold event. */
-			log_error_thresholding(bank, ((u64)high << 32) | low);
+		/*
+		 * The first block is also the head of the list.
+		 * Check it first before iterating over the rest.
+		 */
+		log_and_reset_block(first_block);
 
-			/* Reset threshold block after logging error. */
-			memset(&tr, 0, sizeof(tr));
-			tr.b = &per_cpu(threshold_banks, cpu)[bank]->blocks[block];
-			threshold_restart_bank(&tr);
-		}
+		list_for_each_entry_safe(block, tmp, &first_block->miscj, miscj)
+			log_and_reset_block(block);
 	}
 }
 
-- 
2.7.4


* Re: [PATCH v2] x86/mce/AMD: Use saved threshold block info in interrupt handler
  2017-05-31 14:11 [PATCH v2] x86/mce/AMD: Use saved threshold block info in interrupt handler Yazen Ghannam
@ 2017-06-01 17:49 ` Borislav Petkov
From: Borislav Petkov @ 2017-06-01 17:49 UTC
  To: Yazen Ghannam; +Cc: linux-edac, Tony Luck, x86, linux-kernel

On Wed, May 31, 2017 at 09:11:55AM -0500, Yazen Ghannam wrote:

Much better!

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
