linux-edac.vger.kernel.org archive mirror
From: Shiju Jose <shiju.jose@huawei.com>
To: Borislav Petkov <bp@alien8.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	"james.morse@arm.com" <james.morse@arm.com>,
	"lenb@kernel.org" <lenb@kernel.org>,
	Linuxarm <linuxarm@huawei.com>
Subject: RE: [PATCH 1/1] RAS: Add CPU Correctable Error Collector to isolate an erroneous CPU core
Date: Tue, 1 Sep 2020 16:20:54 +0000
Message-ID: <512b7b8e6cb846aabaf5a2191cd9b5d4@huawei.com>
In-Reply-To: <20200901143539.GC8392@zn.tnic>

Hi Boris,

>-----Original Message-----
>From: Borislav Petkov [mailto:bp@alien8.de]
>Sent: 01 September 2020 15:36
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-acpi@vger.kernel.org; linux-
>kernel@vger.kernel.org; tony.luck@intel.com; rjw@rjwysocki.net;
>james.morse@arm.com; lenb@kernel.org; Linuxarm
><linuxarm@huawei.com>
>Subject: Re: [PATCH 1/1] RAS: Add CPU Correctable Error Collector to isolate
>an erroneous CPU core
>
>On Tue, Sep 01, 2020 at 03:01:40PM +0100, Shiju Jose wrote:
>> When the CPU correctable errors reported on an ARM64 CPU core too
>> often, it should be isolated. Add the CPU correctable error collector
>> to store the CPU correctable error count.
>>
>> When the correctable error count for a CPU exceed the threshold value
>> in a short time period, it will try to isolate the CPU core.
>> The threshold value, time period etc are configurable.
>>
>> Implementation details is added in the file.
>>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>>  Documentation/ABI/testing/debugfs-cpu-cec |  22 ++
>>  arch/arm64/ras/Kconfig                    |   8 +
>>  drivers/acpi/apei/ghes.c                  |  30 +-
>>  drivers/ras/Kconfig                       |   1 +
>>  drivers/ras/Makefile                      |   1 +
>>  drivers/ras/cpu_cec.c                     | 393 ++++++++++++++++++++++
>
>So instead of adding the ability to collect other error types to the CEC, you're
>duplicating the CEC itself?!
>
>Why?
The CPU CEC borrows only the infrastructure of the CEC. The logic the CEC uses
for CE count storage, CE count calculation and page isolation is specific to
memory pages and does not seem reusable for CPU CEs.
Also, the values set for parameters such as the threshold and the time period
would differ between memory errors and CPU errors.
Thus, extending cec.c to support CPU CEs would mean adding CPU-CEC-specific code
for storing the error count, isolation etc., which I thought would make the code
less tidy and less readable unless we find more reusable logic.
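
To illustrate the kind of CPU-specific bookkeeping meant above, here is a
minimal user-space sketch. It is not the code from the patch: the names
cpu_ce_record, struct cpu_cec_cfg, struct cpu_ce_state and NR_CPUS_SKETCH are
hypothetical, and time is passed in as plain seconds. It keeps one counter per
CPU and reports when the count exceeds a configurable threshold within a
configurable time period.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 8	/* hypothetical CPU count for this sketch */

/* Configurable parameters (hypothetical names). */
struct cpu_cec_cfg {
	unsigned int threshold;	/* CE count that must be exceeded */
	uint64_t period;	/* length of the counting window, in seconds */
};

/* Per-CPU state: one counter per core, no per-page storage. */
struct cpu_ce_state {
	uint64_t window_start;	/* time at which the current window began */
	unsigned int count;	/* CEs seen in the current window */
};

static struct cpu_ce_state cpu_ce[NR_CPUS_SKETCH];

/*
 * Record one correctable error for @cpu at time @now.
 * Returns true when the count has exceeded the threshold inside the
 * current window, i.e. when the caller could try to isolate the core.
 */
bool cpu_ce_record(const struct cpu_cec_cfg *cfg, unsigned int cpu,
		   uint64_t now)
{
	struct cpu_ce_state *s;

	if (cpu >= NR_CPUS_SKETCH)
		return false;

	s = &cpu_ce[cpu];

	/* Start a new window on the first error or once the period expired. */
	if (s->count == 0 || now - s->window_start >= cfg->period) {
		s->window_start = now;
		s->count = 0;
	}

	s->count++;

	if (s->count > cfg->threshold) {
		s->count = 0;	/* start counting afresh after reporting */
		return true;
	}

	return false;
}
```

With threshold = 2 and period = 10, three errors on the same CPU within ten
seconds make the third call return true, while errors spread further apart
only restart the window.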

>
>--
>Regards/Gruss,
>    Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette

Thanks,
Shiju

Thread overview: 11+ messages
2020-09-01 14:01 [PATCH 1/1] RAS: Add CPU Correctable Error Collector to isolate an erroneous CPU core Shiju Jose
2020-09-01 14:35 ` Borislav Petkov
2020-09-01 16:20   ` Shiju Jose [this message]
2020-09-09 12:02     ` Borislav Petkov
2020-09-10 15:29       ` Shiju Jose
2020-09-17  8:40         ` Borislav Petkov
2020-10-01 17:16           ` James Morse
2020-10-01 17:30             ` Borislav Petkov
2020-10-02 12:23               ` Shiju Jose
2020-09-01 17:59 ` kernel test robot
2020-09-01 18:51 ` kernel test robot
