From: Borislav Petkov <bp@alien8.de>
To: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: "Kani, Toshimitsu" <toshi.kani@hpe.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mchehab@kernel.org" <mchehab@kernel.org>,
	"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	"srinivas.pandruvada@linux.intel.com"
	<srinivas.pandruvada@linux.intel.com>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Subject: Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
Date: Fri, 21 Jul 2017 19:23:45 +0200
Message-ID: <20170721172344.GA11316@nazgul.tnic>
In-Reply-To: <20170721140131.40079805@vento.lan>

On Fri, Jul 21, 2017 at 02:01:31PM -0300, Mauro Carvalho Chehab wrote:
> I see the value of having a threshold in BIOS, provided that it is
> well documented, and whose value can be adjusted, if needed.
> 
> One of the things I wanted to implement in ras-daemon was an
> algorithm that would do such thresholding in software.

We have that now in the kernel: drivers/ras/cec.c

We did it exactly for that purpose - not upsetting users unnecessarily.

> The thing with a BIOS threshold is that the user has no way to
> audit the algorithm. So, when BIOS starts reporting such errors,
> it may already be too late: the systems may be on the verge of
> losing data (or some data was already lost).

Not only that: thresholds depend on the DIMM types, which means BIOS
must know what DIMM types are in there, which I doubt it does. So
exposing that to configuration instead of "deciding" for people would
be better.

> That's critical on cluster systems with thousands of machines:
> while the impact of disabling a cluster node to do some maintenance
> is marginal, the impact of an uncorrected error on a single
> machine may compromise weeks of expensive processing.
> 
> That's why some users prefer to monitor every single corrected
> error and compare the rate against a probability distribution, so
> they know whether the risk of uncorrected errors is acceptable.

Yap, you need to have stuff like that configurable - BIOS can't predict
all possible use cases.
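
To make "configurable" concrete, here is a minimal userspace sketch of
a per-page corrected-error counter with a threshold chosen by the
admin. It is only an illustration of the general idea and is not the
code in drivers/ras/cec.c: all names (ce_tracker, ce_init, ce_record,
CE_SLOTS) are hypothetical, the table size is arbitrary, and it leaves
out things a real collector would want, such as decaying old counts
over time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CE_SLOTS 64	/* hypothetical table size */

struct ce_entry {
	uint64_t pfn;		/* page frame that reported the error */
	unsigned int count;	/* corrected errors seen on it so far */
};

struct ce_tracker {
	struct ce_entry slots[CE_SLOTS];
	size_t used;
	unsigned int threshold;	/* set by the user, not by firmware */
};

void ce_init(struct ce_tracker *t, unsigned int threshold)
{
	assert(t != NULL && threshold > 0);
	t->used = 0;
	t->threshold = threshold;
}

/*
 * Record one corrected error on @pfn. Returns true when the page has
 * reached the configured threshold and the caller should act on it
 * (report it, retire the page, ...). The entry is dropped at that point.
 */
bool ce_record(struct ce_tracker *t, uint64_t pfn)
{
	size_t i, victim = 0;

	assert(t != NULL);

	/* Look for the page; remember the least-hit entry on the way. */
	for (i = 0; i < t->used; i++) {
		if (t->slots[i].pfn == pfn)
			break;
		if (t->slots[i].count < t->slots[victim].count)
			victim = i;
	}

	if (i == t->used) {		/* page not tracked yet */
		if (t->used < CE_SLOTS)
			i = t->used++;
		else
			i = victim;	/* table full: evict least-hit page */
		t->slots[i].pfn = pfn;
		t->slots[i].count = 0;
	}

	if (++t->slots[i].count < t->threshold)
		return false;

	/* Threshold reached: forget the page, let the caller handle it. */
	t->slots[i] = t->slots[--t->used];
	return true;
}
```

With ce_init(&t, 3), the first two errors on a page are absorbed
silently and the third one makes ce_record() return true. A site which
wants to see every single corrected error would pass 1 instead.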

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
