All of lore.kernel.org
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: Shiju Jose <shiju.jose@huawei.com>
Cc: James Morse <james.morse@arm.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	"lenb@kernel.org" <lenb@kernel.org>,
	Linuxarm <linuxarm@huawei.com>
Subject: Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period
Date: Fri, 2 Oct 2020 20:02:18 +0200	[thread overview]
Message-ID: <20201002180125.GD17436@zn.tnic> (raw)
In-Reply-To: <59950d44-906b-684f-c876-e09c76e5f827@arm.com>

On Fri, Oct 02, 2020 at 06:33:17PM +0100, James Morse wrote:
> > I think adding the CPU error collection to the kernel
> > has the following advantages,
> >     1. The CPU error collection and isolation would not be active if the
> >          rasdaemon stopped running or not running on a machine.

Wasn't there this thing called systemd which promised that it would
restart daemons when they fail? And even if it is not there, you can
always do your own cronjob which checks rasdaemon presence and restarts
it if it has died and sends a mail to the admin to check why it had
died.

Everything else I've trimmed but James has put it a lot more eloquently
than me and I cannot agree more with what he says. Doing this in
userspace is better in every aspect you can think of.

The current CEC thing runs in the kernel because it has a completely
different purpose - to limit corrected error reports which turn into
very expensive support calls for errors which were corrected but people
simply don't get that they were corrected. Instead, they throw hands in
the air and go "OMG, my hardware is failing".

Where those are, as James says:

> These are corrected errors. Nothing has gone wrong.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

  reply	other threads:[~2020-10-02 18:02 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-02 12:22 [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period Shiju Jose
2020-10-02 12:22 ` [RFC PATCH 1/7] RAS/CEC: Replace the macro PFN with ELEM_NO Shiju Jose
2020-10-02 12:22 ` [RFC PATCH 2/7] RAS/CEC: Replace pfns_poisoned with elems_poisoned Shiju Jose
2020-10-02 12:22 ` [RFC PATCH 3/7] RAS/CEC: Move X86 MCE specific code under CONFIG_X86_MCE Shiju Jose
2020-10-02 12:22 ` [RFC PATCH 4/7] RAS/CEC: Modify cec_mod_work() for common use Shiju Jose
2020-10-02 12:22 ` [RFC PATCH 5/7] RAS/CEC: Add support for errors count check on short time period Shiju Jose
2020-10-02 12:22 ` [RFC PATCH 6/7] RAS/CEC: Add CPU Correctable Error Collector to isolate an erroneous CPU core Shiju Jose
2020-10-02 16:54   ` kernel test robot
2020-10-02 20:33   ` kernel test robot
2020-10-02 12:22 ` [RFC PATCH 7/7] ACPI / APEI: Add reporting ARM64 CPU correctable errors to the CEC Shiju Jose
2020-10-02 12:43 ` [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period Borislav Petkov
2020-10-02 15:38   ` Shiju Jose
2020-10-02 17:33     ` James Morse
2020-10-02 18:02       ` Borislav Petkov [this message]
2020-10-06 16:13       ` Shiju Jose
2020-10-07 16:45         ` James Morse
2020-10-02 16:04   ` Luck, Tony

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201002180125.GD17436@zn.tnic \
    --to=bp@alien8.de \
    --cc=james.morse@arm.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=rjw@rjwysocki.net \
    --cc=shiju.jose@huawei.com \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.