linux-edac.vger.kernel.org archive mirror
From: Avadhut Naik <avadhut.naik@amd.com>
To: <x86@kernel.org>, <linux-edac@vger.kernel.org>
Cc: <bp@alien8.de>, <tony.luck@intel.com>,
	<linux-kernel@vger.kernel.org>, <yazen.ghannam@amd.com>,
	<avadnaik@amd.com>
Subject: [PATCH 1/2] x86/MCE: Extend size of the MCE Records pool
Date: Wed, 7 Feb 2024 16:56:31 -0600
Message-ID: <20240207225632.159276-2-avadhut.naik@amd.com>
In-Reply-To: <20240207225632.159276-1-avadhut.naik@amd.com>

Currently, 2 pages are allocated for the MCE Records pool during system
bootup. Records of MCEs (struct mce) occurring on the system are added to
the pool through mce_gen_pool_add() in MC context. These records are then
decoded later, in process context through notifier chains.

However, on systems with high CPU count, the 2 pages allocated for the
pool might not be sufficient in some instances. Successive MCEs received
back-to-back, before they are decoded through mce_gen_pool_process(), will
result in the pool getting exhausted. Consequently, some MCE records will
be missed. The problem is compounded because the amount of memory
associated with a system typically increases with the CPU count, thereby
increasing the probability of MCEs being received.

The limit of 2 pages for the MCE records pool was set more than 8 years
ago and has not been revised to date. The CPU count and the amount of
memory associated with a system have, however, increased enormously since
then. Additionally, the size of an MCE record (struct mce) has increased
from 88 bytes to 124 bytes.

Extend the size of the MCE Records pool to better serve modern systems.
The increase in size depends on the CPU count of the system: 256 bytes
(CPU_GEN_MEMSZ) are added per present CPU, rounded up to a whole page.
Since the size of struct mce is currently 124 bytes, each logical CPU of
the system will have space for at least 2 MCE records available in the
pool. To avoid allocation problems during early boot, the extension is
performed from a late_initcall().
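
As a rough illustration of the sizing arithmetic described above (not part
of the patch), the following userspace sketch mirrors the computation in
mce_gen_pool_extend(). The sketch_* names are hypothetical, and a 4 KiB
page size is assumed; the 256-byte per-CPU share and the 124-byte record
size are the values quoted in this message. It ignores any per-record
overhead the pool allocator may add.

```c
/* Assumption for illustration only: 4 KiB pages. */
#define SKETCH_PAGE_SIZE	4096UL
/* Mirrors CPU_GEN_MEMSZ from the patch. */
#define SKETCH_CPU_MEMSZ	256UL
/* sizeof(struct mce) as quoted in the commit message. */
#define SKETCH_MCE_SIZE		124UL

/* Round len up to the next multiple of the page size, like PAGE_ALIGN(). */
unsigned long sketch_page_align(unsigned long len)
{
	return (len + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Bytes added to the pool for a given number of present CPUs. */
unsigned long sketch_extension_len(unsigned long num_cpus)
{
	return sketch_page_align(num_cpus * SKETCH_CPU_MEMSZ);
}

/* Whole records that fit in one CPU's share, ignoring allocator overhead. */
unsigned long sketch_records_per_cpu(void)
{
	return SKETCH_CPU_MEMSZ / SKETCH_MCE_SIZE;
}
```

Under these assumptions, a system with 256 present CPUs would gain 64 KiB
on top of the original 2 pages, and each CPU's 256-byte share holds two
124-byte records.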

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
 arch/x86/kernel/cpu/mce/core.c     |  3 +++
 arch/x86/kernel/cpu/mce/genpool.c  | 22 ++++++++++++++++++++++
 arch/x86/kernel/cpu/mce/internal.h |  1 +
 3 files changed, 26 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b5cc557cfc37..5d6d7994d549 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2901,6 +2901,9 @@ static int __init mcheck_late_init(void)
 	if (mca_cfg.recovery)
 		enable_copy_mc_fragile();
 
+	if (mce_gen_pool_extend())
+		pr_info("Couldn't extend MCE records pool!\n");
+
 	mcheck_debugfs_init();
 
 	/*
diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
index fbe8b61c3413..aed01612d342 100644
--- a/arch/x86/kernel/cpu/mce/genpool.c
+++ b/arch/x86/kernel/cpu/mce/genpool.c
@@ -20,6 +20,7 @@
  * 2 pages to save MCE events for now (~80 MCE records at most).
  */
 #define MCE_POOLSZ	(2 * PAGE_SIZE)
+#define CPU_GEN_MEMSZ	256
 
 static struct gen_pool *mce_evt_pool;
 static LLIST_HEAD(mce_event_llist);
@@ -116,6 +117,27 @@ int mce_gen_pool_add(struct mce *mce)
 	return 0;
 }
 
+int mce_gen_pool_extend(void)
+{
+	unsigned long addr, len;
+	int ret = -ENOMEM;
+	u32 num_threads;
+
+	num_threads = num_present_cpus();
+	len = PAGE_ALIGN(num_threads * CPU_GEN_MEMSZ);
+	addr = (unsigned long)kzalloc(len, GFP_KERNEL);
+
+	if (!addr)
+		goto out;
+
+	ret = gen_pool_add(mce_evt_pool, addr, len, -1);
+	if (ret)
+		kfree((void *)addr);
+
+out:
+	return ret;
+}
+
 static int mce_gen_pool_create(void)
 {
 	struct gen_pool *tmpp;
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 01f8f03969e6..81e35ec58ebc 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -33,6 +33,7 @@ void mce_gen_pool_process(struct work_struct *__unused);
 bool mce_gen_pool_empty(void);
 int mce_gen_pool_add(struct mce *mce);
 int mce_gen_pool_init(void);
+int mce_gen_pool_extend(void);
 struct llist_node *mce_gen_pool_prepare_records(void);
 
 int mce_severity(struct mce *a, struct pt_regs *regs, char **msg, bool is_excp);
-- 
2.34.1

