From: "Naik, Avadhut" <avadnaik@amd.com>
To: Sohil Mehta <sohil.mehta@intel.com>,
	x86@kernel.org, linux-edac@vger.kernel.org
Cc: bp@alien8.de, tony.luck@intel.com, linux-kernel@vger.kernel.org,
	yazen.ghannam@amd.com, Avadhut Naik <avadhut.naik@amd.com>
Subject: [PATCH 1/2] x86/MCE: Extend size of the MCE Records pool
Date: Fri, 9 Feb 2024 13:52:21 -0600
Message-ID: <17b1747a-8487-44d2-b79c-0da03b09c990@amd.com>
In-Reply-To: <75f48901-fbfa-4ef4-99b9-312800d20896@intel.com>

Hi,

On 2/8/2024 15:09, Sohil Mehta wrote:
> On 2/7/2024 2:56 PM, Avadhut Naik wrote:
> 
>> Extend the size of MCE Records pool to better serve modern systems. The
>> increase in size depends on the CPU count of the system. Currently, since
>> size of struct mce is 124 bytes, each logical CPU of the system will have
>> space for at least 2 MCE records available in the pool. To get around the
>> allocation woes during early boot time, the same is undertaken using
>> late_initcall().
>>
> 
> I guess making this proportional to the number of CPUs is probably fine
> assuming CPUs and memory capacity *would* generally increase in sync.
> 
> But, is there some logic to having 2 MCE records per logical cpu or it
> is just a heuristic approach? In practice, the pool is shared amongst
> all MCE sources and can be filled by anyone, right?
> 
Yes, the pool is shared among all MCE sources, but the logic behind the 256
is that the genpool was set to 2 pages, i.e. 8192 bytes, back in 2015.
Around that time, AFAIK, the maximum number of logical CPUs on a system
was 32.
So, in the worst case, each CPU would have had around 256 bytes (8192/32)
available in the pool, which equates to approximately 2 MCE records, since
sizeof(struct mce) back then was 88 bytes.
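
For reference, the back-of-the-envelope arithmetic behind those numbers
(values taken from above; just a sanity check, nothing authoritative):

	2 pages = 2 * 4096           = 8192 bytes  (pool size since 2015)
	8192 bytes / 32 CPUs         =  256 bytes per CPU
	256 bytes / 88 bytes/record ~=  2.9, i.e. ~2 whole records per CPU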
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> ---
>>  arch/x86/kernel/cpu/mce/core.c     |  3 +++
>>  arch/x86/kernel/cpu/mce/genpool.c  | 22 ++++++++++++++++++++++
>>  arch/x86/kernel/cpu/mce/internal.h |  1 +
>>  3 files changed, 26 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index b5cc557cfc37..5d6d7994d549 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -2901,6 +2901,9 @@ static int __init mcheck_late_init(void)
>>  	if (mca_cfg.recovery)
>>  		enable_copy_mc_fragile();
>>  
>> +	if (mce_gen_pool_extend())
>> +		pr_info("Couldn't extend MCE records pool!\n");
>> +
> 
> Why do this unconditionally? For a vast majority of low core-count, low
> memory systems the default 2 pages would be good enough.
> 
> Should there be a threshold beyond which the extension becomes active?
> Let's say, for example, a check for num_present_cpus() > 32 (Roughly
> based on 8Kb memory and 124b*2 estimate per logical CPU).
> 
> Whatever you choose, a comment above the code would be helpful
> describing when the extension is expected to be useful.
> 
Put it in unconditionally because, IMO, the memory increase even for
low-core systems didn't seem substantial: just one additional page for
systems with fewer than 16 CPUs.

But I do get your point. Will add a check in mcheck_late_init() for CPUs
present. Something like below:

@@ -2901,7 +2901,7 @@ static int __init mcheck_late_init(void)
    if (mca_cfg.recovery)
        enable_copy_mc_fragile();

-   if (mce_gen_pool_extend())
+   if ((num_present_cpus() > 32) && mce_gen_pool_extend())
        pr_info("Couldn't extend MCE records pool!\n");

Does this look good? The genpool extension will then be undertaken only on
systems with more than 32 CPUs. I'll explain this in a comment; a rough
sketch of how that could read follows.
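
Something along these lines (the comment wording is tentative, not final):

	/*
	 * The default 2-page pool already provides ~2 records per CPU on
	 * systems with up to 32 CPUs. Extend the pool only beyond that.
	 */
	if ((num_present_cpus() > 32) && mce_gen_pool_extend())
		pr_info("Couldn't extend MCE records pool!\n");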

>>  	mcheck_debugfs_init();
>>  
>>  	/*
>> diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
>> index fbe8b61c3413..aed01612d342 100644
>> --- a/arch/x86/kernel/cpu/mce/genpool.c
>> +++ b/arch/x86/kernel/cpu/mce/genpool.c
>> @@ -20,6 +20,7 @@
>>   * 2 pages to save MCE events for now (~80 MCE records at most).
>>   */
>>  #define MCE_POOLSZ	(2 * PAGE_SIZE)
>> +#define CPU_GEN_MEMSZ	256
>>  
> 
> The comment above MCE_POOLSZ probably needs a complete re-write. Right
> now, it reads as follows:
> 
> * This memory pool is only to be used to save MCE records in MCE context.
> * MCE events are rare, so a fixed size memory pool should be enough. Use
> * 2 pages to save MCE events for now (~80 MCE records at most).
> 
> Apart from the numbers being incorrect since sizeof(struct mce) has
> increased, this patch is based on the assumption that the current MCE
> memory pool is no longer enough in certain cases.
> 
Yes, will change the comment to something like below:

 * This memory pool is only to be used to save MCE records in MCE context.
 * Though MCE events are rare, their frequency typically scales with the
 * system's memory size and CPU count.
 * Allocate 2 pages for the MCE records pool during early boot, with the
 * option to extend the pool, as needed, through the command line on
 * systems with more than 32 CPUs.
 * By default, each logical CPU can have around 2 MCE records in the pool
 * at the same time.

Sounds good?
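
As a quick sanity check of the "around 2 MCE records" figure, using the
current sizeof(struct mce) of 124 bytes from the commit message:

	256 bytes (CPU_GEN_MEMSZ) / 124 bytes per record ~= 2 records per CPU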

>>  static struct gen_pool *mce_evt_pool;
>>  static LLIST_HEAD(mce_event_llist);
>> @@ -116,6 +117,27 @@ int mce_gen_pool_add(struct mce *mce)
>>  	return 0;
>>  }
>>  
>> +int mce_gen_pool_extend(void)
>> +{
>> +	unsigned long addr, len;
> 
> s/len/size/
> 
Noted.
>> +	int ret = -ENOMEM;
>> +	u32 num_threads;
>> +
>> +	num_threads = num_present_cpus();
>> +	len = PAGE_ALIGN(num_threads * CPU_GEN_MEMSZ);
> 
> Nit: Can the use of the num_threads variable be avoided?
> How about:
> 
> 	size = PAGE_ALIGN(num_present_cpus() * CPU_GEN_MEMSZ);
> 
Will do.
> 
> 
>> +	addr = (unsigned long)kzalloc(len, GFP_KERNEL);
> 
> Also, shouldn't the new allocation be incremental to the 2 pages already
> present?
> 
> Let's say, for example, that you have a 40-cpu system and the calculated
> size in this case comes out to 40 * 2 * 128b = 9920bytes  i.e. 3 pages.
> You only need to allocate 1 additional page to add to mce_evt_pool
> instead of the 3 pages that the current code does.
> 
Will make it incremental when the genpool extension is undertaken through
the default means. Something like below:

@@ -129,6 +134,7 @@ int mce_gen_pool_extend(void)
    } else {
        num_threads = num_present_cpus();
        len = PAGE_ALIGN(num_threads * CPU_GEN_MEMSZ);
+       len -= MCE_POOLSZ;

Does this sound good?
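
Putting the suggestions together (s/len/size/, dropping num_threads, and
subtracting MCE_POOLSZ), the function could end up roughly like the
untested sketch below. The early return when the computed size already
fits within the default two pages is my addition, to avoid an underflow
on small systems:

	int mce_gen_pool_extend(void)
	{
		unsigned long addr, size;
		int ret = -ENOMEM;

		/* Per-CPU estimate, rounded up to whole pages. */
		size = PAGE_ALIGN(num_present_cpus() * CPU_GEN_MEMSZ);

		/* The default 2-page pool already suffices. */
		if (size <= MCE_POOLSZ)
			return 0;

		/* Allocate only the delta over the initial MCE_POOLSZ. */
		size -= MCE_POOLSZ;

		addr = (unsigned long)kzalloc(size, GFP_KERNEL);
		if (!addr)
			return ret;

		ret = gen_pool_add(mce_evt_pool, addr, size, -1);
		if (ret)
			kfree((void *)addr);

		return ret;
	}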

-- 
Thanks,
Avadhut Naik

> Sohil
> 
>> +
>> +	if (!addr)
>> +		goto out;
>> +
>> +	ret = gen_pool_add(mce_evt_pool, addr, len, -1);
>> +	if (ret)
>> +		kfree((void *)addr);
>> +
>> +out:
>> +	return ret;
>> +}
>> +
>>  static int mce_gen_pool_create(void)
>>  {
>>  	struct gen_pool *tmpp;
> 
> 


