From: "Naik, Avadhut" <avadnaik@amd.com>
To: Sohil Mehta <sohil.mehta@intel.com>,
x86@kernel.org, linux-edac@vger.kernel.org
Cc: bp@alien8.de, tony.luck@intel.com, linux-kernel@vger.kernel.org,
yazen.ghannam@amd.com, Avadhut Naik <avadhut.naik@amd.com>
Subject: [PATCH 1/2] x86/MCE: Extend size of the MCE Records pool
Date: Fri, 9 Feb 2024 13:52:21 -0600
Message-ID: <17b1747a-8487-44d2-b79c-0da03b09c990@amd.com>
In-Reply-To: <75f48901-fbfa-4ef4-99b9-312800d20896@intel.com>
Hi,
On 2/8/2024 15:09, Sohil Mehta wrote:
> On 2/7/2024 2:56 PM, Avadhut Naik wrote:
>
>> Extend the size of the MCE Records pool to better serve modern systems.
>> The increase in size depends on the CPU count of the system. Since the
>> size of struct mce is currently 124 bytes, each logical CPU of the
>> system will have space for at least 2 MCE records available in the pool.
>> To get around allocation woes during early boot, the extension is
>> undertaken through late_initcall().
>>
>
> I guess making this proportional to the number of CPUs is probably fine
> assuming CPUs and memory capacity *would* generally increase in sync.
>
> But, is there some logic to having 2 MCE records per logical cpu or it
> is just a heuristic approach? In practice, the pool is shared amongst
> all MCE sources and can be filled by anyone, right?
>
Yes, the pool is shared among all MCE sources, but the logic behind 256 is
that the genpool was set to 2 pages, i.e. 8192 bytes, back in 2015.
Around that time, AFAIK, the maximum number of logical CPUs on a system
was 32.
So, in the maximum case, each CPU would have had around 256 bytes (8192/32)
in the pool. That equates to approximately 2 MCE records, since
sizeof(struct mce) back then was 88 bytes.
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> ---
>> arch/x86/kernel/cpu/mce/core.c | 3 +++
>> arch/x86/kernel/cpu/mce/genpool.c | 22 ++++++++++++++++++++++
>> arch/x86/kernel/cpu/mce/internal.h | 1 +
>> 3 files changed, 26 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index b5cc557cfc37..5d6d7994d549 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -2901,6 +2901,9 @@ static int __init mcheck_late_init(void)
>> if (mca_cfg.recovery)
>> enable_copy_mc_fragile();
>>
>> + if (mce_gen_pool_extend())
>> + pr_info("Couldn't extend MCE records pool!\n");
>> +
>
> Why do this unconditionally? For a vast majority of low core-count, low
> memory systems the default 2 pages would be good enough.
>
> Should there be a threshold beyond which the extension becomes active?
> Let's say, for example, a check for num_present_cpus() > 32 (Roughly
> based on 8Kb memory and 124b*2 estimate per logical CPU).
>
> Whatever you choose, a comment above the code would be helpful
> describing when the extension is expected to be useful.
>
I put it in unconditionally because, IMO, the increase in memory even for
low-core-count systems isn't substantial: just an additional page for
systems with fewer than 16 CPUs.
But I do get your point. Will add a check in mcheck_late_init() for the
number of present CPUs. Something like below:
@@ -2901,7 +2901,7 @@ static int __init mcheck_late_init(void)
if (mca_cfg.recovery)
enable_copy_mc_fragile();
- if (mce_gen_pool_extend())
+ if ((num_present_cpus() > 32) && mce_gen_pool_extend())
pr_info("Couldn't extend MCE records pool!\n");
Does this look good? The genpool extension will then be undertaken only for
systems with more than 32 CPUs. Will explain this in a comment.
>> mcheck_debugfs_init();
>>
>> /*
>> diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
>> index fbe8b61c3413..aed01612d342 100644
>> --- a/arch/x86/kernel/cpu/mce/genpool.c
>> +++ b/arch/x86/kernel/cpu/mce/genpool.c
>> @@ -20,6 +20,7 @@
>> * 2 pages to save MCE events for now (~80 MCE records at most).
>> */
>> #define MCE_POOLSZ (2 * PAGE_SIZE)
>> +#define CPU_GEN_MEMSZ 256
>>
>
> The comment above MCE_POOLSZ probably needs a complete re-write. Right
> now, it reads as follows:
>
> * This memory pool is only to be used to save MCE records in MCE context.
> * MCE events are rare, so a fixed size memory pool should be enough. Use
> * 2 pages to save MCE events for now (~80 MCE records at most).
>
> Apart from the numbers being incorrect since sizeof(struct mce) has
> increased, this patch is based on the assumption that the current MCE
> memory pool is no longer enough in certain cases.
>
Yes, will change the comment to something like below:

 * This memory pool is only to be used to save MCE records in MCE context.
 * Though MCE events are rare, their frequency typically depends on the
 * system's memory and CPU count.
 * Allocate 2 pages for the MCE records pool during early boot, with the
 * option to extend the pool, as needed, through the command line for
 * systems with a CPU count of more than 32.
 * By default, each logical CPU can then have around 2 MCE records in the
 * pool at the same time.

Sounds good?
>> static struct gen_pool *mce_evt_pool;
>> static LLIST_HEAD(mce_event_llist);
>> @@ -116,6 +117,27 @@ int mce_gen_pool_add(struct mce *mce)
>> return 0;
>> }
>>
>> +int mce_gen_pool_extend(void)
>> +{
>> + unsigned long addr, len;
>
> s/len/size/
>
Noted.
>> + int ret = -ENOMEM;
>> + u32 num_threads;
>> +
>> + num_threads = num_present_cpus();
>> + len = PAGE_ALIGN(num_threads * CPU_GEN_MEMSZ);
>
> Nit: Can the use of the num_threads variable be avoided?
> How about:
>
> size = PAGE_ALIGN(num_present_cpus() * CPU_GEN_MEMSZ);
>
Will do.
>
>
>> + addr = (unsigned long)kzalloc(len, GFP_KERNEL);
>
> Also, shouldn't the new allocation be incremental to the 2 pages already
> present?
>
> Let's say, for example, that you have a 40-CPU system and the calculated
> size in this case comes out to 40 * 2 * 124b = 9920 bytes, i.e. 3 pages.
> You only need to allocate 1 additional page to add to mce_evt_pool
> instead of the 3 pages that the current code does.
>
Will make the allocation incremental when the genpool extension is
undertaken through the default means. Something like below:
@@ -129,6 +134,7 @@ int mce_gen_pool_extend(void)
} else {
num_threads = num_present_cpus();
len = PAGE_ALIGN(num_threads * CPU_GEN_MEMSZ);
+ len -= MCE_POOLSZ;
Does this sound good?
--
Thanks,
Avadhut Naik
> Sohil
>
>> +
>> + if (!addr)
>> + goto out;
>> +
>> + ret = gen_pool_add(mce_evt_pool, addr, len, -1);
>> + if (ret)
>> + kfree((void *)addr);
>> +
>> +out:
>> + return ret;
>> +}
>> +
>> static int mce_gen_pool_create(void)
>> {
>> struct gen_pool *tmpp;
>
>