linux-edac.vger.kernel.org archive mirror
From: "Naik, Avadhut" <avadnaik@amd.com>
To: Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>
Cc: "Mehta, Sohil" <sohil.mehta@intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"yazen.ghannam@amd.com" <yazen.ghannam@amd.com>,
	Avadhut Naik <avadhut.naik@amd.com>
Subject: [PATCH] x86/mce: Dynamically size space for machine check records
Date: Wed, 6 Mar 2024 15:52:34 -0600
Message-ID: <e6675835-46ca-4183-86ce-008fde928e73@amd.com>
In-Reply-To: <Zd--PJp-NbXGrb39@agluck-desk3>

Hi,

On 2/28/2024 17:14, Tony Luck wrote:
> Systems with a large number of CPUs may generate a large
> number of machine check records when things go seriously
> wrong. But Linux has a fixed buffer that can only capture
> a few dozen errors.
> 
> Allocate space based on the number of CPUs (with a minimum
> value based on the historical fixed buffer that could store
> 80 records).
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> 
> Discussion earlier concluded with the realization that it is
> safe to dynamically allocate the mce_evt_pool at boot time.
> So here's a patch to do that. Scaling algorithm here is a
> simple linear "4 records per possible CPU" with a minimum
> of 80 to match the legacy behavior. I'm open to other
> suggestions.
> 
> Note that I threw in a "+1" to the return from ilog2() when
> calling gen_pool_create(). From reading code, and running
> some tests, it appears that the min_alloc_order argument
> needs to be large enough to allocate one of the mce_evt_llist
> structures.
> 
> Some other gen_pool users in Linux may also need this "+1".
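A minimal user-space sketch of why the "+1" matters when the pool is sized as mce_numrecords * (1 << order). It assumes gen_pool rounds every request up to whole granules of 2^min_alloc_order bytes (the round-up done in lib/genalloc.c) and uses a hypothetical 136-byte record in place of sizeof(struct mce_evt_llist). The helpers ilog2_floor(), granules_per_alloc() and records_that_fit() are illustrative only, not kernel APIs.

```c
#include <assert.h>

/* Floor of log2(x) for x > 0, the same value the kernel's ilog2() yields. */
static unsigned int ilog2_floor(unsigned long x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/*
 * Granules of 2^order bytes that one request of 'size' bytes occupies,
 * assuming the pool rounds each request up to whole granules.
 */
static unsigned long granules_per_alloc(unsigned long size, unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));
	return (size + (1UL << order) - 1) >> order;
}

/*
 * Upper bound on records that fit (ignoring fragmentation) when the pool
 * is sized as nrec * (1 << order), the way the patch computes mce_poolsz.
 */
static unsigned long records_that_fit(unsigned long rec_size,
				      unsigned int order, unsigned long nrec)
{
	unsigned long pool_bytes = nrec * (1UL << order);
	unsigned long per_rec = granules_per_alloc(rec_size, order) << order;

	return pool_bytes / per_rec;
}
```

With the 136-byte record, ilog2() gives order 7 and each record spans two 128-byte granules, so a pool sized for 80 records at 1 << 7 bytes each holds only 40. With the "+1", order 8 gives one 256-byte granule per record and all 80 fit. For a record whose size is already a power of two, the "+1" instead doubles the per-record footprint.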
> 
>  arch/x86/kernel/cpu/mce/genpool.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
> index fbe8b61c3413..a1f0a8f29cf5 100644
> --- a/arch/x86/kernel/cpu/mce/genpool.c
> +++ b/arch/x86/kernel/cpu/mce/genpool.c
> @@ -16,14 +16,13 @@
>   * used to save error information organized in a lock-less list.
>   *
>   * This memory pool is only to be used to save MCE records in MCE context.
> - * MCE events are rare, so a fixed size memory pool should be enough. Use
> - * 2 pages to save MCE events for now (~80 MCE records at most).
> + * MCE events are rare, so a fixed size memory pool should be enough.
> + * Allocate on a sliding scale based on number of CPUs.
>   */
> -#define MCE_POOLSZ	(2 * PAGE_SIZE)
> +#define MCE_MIN_ENTRIES	80
>  
>  static struct gen_pool *mce_evt_pool;
>  static LLIST_HEAD(mce_event_llist);
> -static char gen_pool_buf[MCE_POOLSZ];
>  
>  /*
>   * Compare the record "t" with each of the records on list "l" to see if
> @@ -118,14 +117,25 @@ int mce_gen_pool_add(struct mce *mce)
>  
>  static int mce_gen_pool_create(void)
>  {
> +	int mce_numrecords, mce_poolsz;
>  	struct gen_pool *tmpp;
>  	int ret = -ENOMEM;
> +	void *mce_pool;
> +	int order;
>  
> -	tmpp = gen_pool_create(ilog2(sizeof(struct mce_evt_llist)), -1);
> +	order = ilog2(sizeof(struct mce_evt_llist)) + 1;
> +	tmpp = gen_pool_create(order, -1);
>  	if (!tmpp)
>  		goto out;
>  
> -	ret = gen_pool_add(tmpp, (unsigned long)gen_pool_buf, MCE_POOLSZ, -1);
> +	mce_numrecords = max(80, num_possible_cpus() * 4);

Per Boris's suggestion below, shouldn't this be:
	mce_numrecords = max(80, num_possible_cpus() * 16);

>> 	min(4*PAGE_SIZE, num_possible_cpus() * PAGE_SIZE);
> 
> max() ofc.
> 
>> There's a sane minimum and one page pro logical CPU should be fine on
>> pretty much every configuration...

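4 MCE records per CPU equate to only 1024 bytes (256 bytes per record, given the
genpool behavior you explained in the other subthread), i.e., a quarter of a page,
whereas 16 records per CPU would give the one page per logical CPU Boris suggested.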
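A small sketch to check this arithmetic. It assumes 256 bytes per record (the footprint implied by 4 records = 1024 bytes) and a 4096-byte PAGE_SIZE as on x86. pool_bytes() and pool_bytes_page_per_cpu() are hypothetical helpers that mirror the patch's max(80, num_possible_cpus() * N) * (1 << order) and Boris's max(4 * PAGE_SIZE, num_possible_cpus() * PAGE_SIZE); they are not kernel code.

```c
#include <assert.h>

#define REC_BYTES	256UL	/* assumed per-record footprint (1 << order) */
#define PAGE_BYTES	4096UL	/* assumed PAGE_SIZE, as on x86 */
#define MIN_RECORDS	80UL	/* floor carried over from the old fixed pool */

/* Mirrors max(80, num_possible_cpus() * per_cpu) * (1 << order). */
static unsigned long pool_bytes(unsigned long ncpus, unsigned long per_cpu)
{
	unsigned long nrec = ncpus * per_cpu;

	/* Catch multiplication overflow in this toy model. */
	assert(per_cpu == 0 || nrec / per_cpu == ncpus);
	if (nrec < MIN_RECORDS)
		nrec = MIN_RECORDS;
	return nrec * REC_BYTES;
}

/* Boris's variant: max(4 * PAGE_SIZE, num_possible_cpus() * PAGE_SIZE). */
static unsigned long pool_bytes_page_per_cpu(unsigned long ncpus)
{
	unsigned long bytes = ncpus * PAGE_BYTES;

	return bytes > 4 * PAGE_BYTES ? bytes : 4 * PAGE_BYTES;
}
```

With 64 CPUs, 4 records per CPU give 1024 bytes per CPU, while 16 records per CPU give 262144 bytes in total, matching the page-per-CPU formula. The two agree for 5 or more CPUs; below that, the 80-record floor (20480 bytes, five pages) is one page larger than Boris's four-page floor.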

Apart from this, I tested the patch on a couple of AMD systems and didn't observe
any issues.

> +	mce_poolsz = mce_numrecords * (1 << order);
> +	mce_pool = kmalloc(mce_poolsz, GFP_KERNEL);
> +	if (!mce_pool) {
> +		gen_pool_destroy(tmpp);
> +		goto out;
> +	}
> +	ret = gen_pool_add(tmpp, (unsigned long)mce_pool, mce_poolsz, -1);
>  	if (ret) {
>  		gen_pool_destroy(tmpp);
>  		goto out;

-- 
Thanks,
Avadhut Naik
