* [PATCH] EDAC/mce_amd: Do not load edac_mce_amd module on guests @ 2021-06-28 17:27 Smita Koralahalli 2021-07-08 20:41 ` Yazen Ghannam ` (3 more replies) 0 siblings, 4 replies; 12+ messages in thread From: Smita Koralahalli @ 2021-06-28 17:27 UTC (permalink / raw) To: x86, linux-edac, linux-kernel Cc: Robert Richter, Tony Luck, James Morse, yazen.ghannam, Smita Koralahalli Hypervisors may not expose SMCA feature to the guest. Check for X86_FEATURE_HYPERVISOR on entry in mce_amd_init() and return -ENODEV if set. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> --- drivers/edac/mce_amd.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c index 5dd905a3f30c..1a1629166aa3 100644 --- a/drivers/edac/mce_amd.c +++ b/drivers/edac/mce_amd.c @@ -1176,6 +1176,9 @@ static int __init mce_amd_init(void) c->x86_vendor != X86_VENDOR_HYGON) return -ENODEV; + if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) + return -ENODEV; + if (boot_cpu_has(X86_FEATURE_SMCA)) { xec_mask = 0x3f; goto out; -- 2.17.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] EDAC/mce_amd: Do not load edac_mce_amd module on guests 2021-06-28 17:27 [PATCH] EDAC/mce_amd: Do not load edac_mce_amd module on guests Smita Koralahalli @ 2021-07-08 20:41 ` Yazen Ghannam 2021-07-23 18:18 ` Kim Phillips ` (2 subsequent siblings) 3 siblings, 0 replies; 12+ messages in thread From: Yazen Ghannam @ 2021-07-08 20:41 UTC (permalink / raw) To: Smita Koralahalli Cc: x86, linux-edac, linux-kernel, Robert Richter, Tony Luck, James Morse On Mon, Jun 28, 2021 at 12:27:40PM -0500, Smita Koralahalli wrote: > Hypervisors may not expose SMCA feature to the guest. > > Check for X86_FEATURE_HYPERVISOR on entry in mce_amd_init() and return > -ENODEV if set. > > Suggested-by: Borislav Petkov <bp@suse.de> > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> > --- > drivers/edac/mce_amd.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c > index 5dd905a3f30c..1a1629166aa3 100644 > --- a/drivers/edac/mce_amd.c > +++ b/drivers/edac/mce_amd.c > @@ -1176,6 +1176,9 @@ static int __init mce_amd_init(void) > c->x86_vendor != X86_VENDOR_HYGON) > return -ENODEV; > > + if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) > + return -ENODEV; > + > if (boot_cpu_has(X86_FEATURE_SMCA)) { > xec_mask = 0x3f; > goto out; > -- Looks good to me. Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Thanks, Yazen ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] EDAC/mce_amd: Do not load edac_mce_amd module on guests 2021-06-28 17:27 [PATCH] EDAC/mce_amd: Do not load edac_mce_amd module on guests Smita Koralahalli 2021-07-08 20:41 ` Yazen Ghannam @ 2021-07-23 18:18 ` Kim Phillips 2021-08-09 10:44 ` Borislav Petkov 2023-09-08 3:08 ` [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests" Elliott Mitchell 3 siblings, 0 replies; 12+ messages in thread From: Kim Phillips @ 2021-07-23 18:18 UTC (permalink / raw) To: Smita Koralahalli, x86, linux-edac, linux-kernel Cc: Robert Richter, Tony Luck, James Morse, yazen.ghannam On 6/28/21 12:27 PM, Smita Koralahalli wrote: > Hypervisors may not expose SMCA feature to the guest. > > Check for X86_FEATURE_HYPERVISOR on entry in mce_amd_init() and return > -ENODEV if set. > > Suggested-by: Borislav Petkov <bp@suse.de> > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> > --- This gets rid of the "Huh? What family is it: 0x19?!" messages in my F19h hosted guest: Tested-by: Kim Phillips <kim.phillips@amd.com> Thanks, Kim ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] EDAC/mce_amd: Do not load edac_mce_amd module on guests 2021-06-28 17:27 [PATCH] EDAC/mce_amd: Do not load edac_mce_amd module on guests Smita Koralahalli 2021-07-08 20:41 ` Yazen Ghannam 2021-07-23 18:18 ` Kim Phillips @ 2021-08-09 10:44 ` Borislav Petkov 2023-09-08 3:08 ` [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests" Elliott Mitchell 3 siblings, 0 replies; 12+ messages in thread From: Borislav Petkov @ 2021-08-09 10:44 UTC (permalink / raw) To: Smita Koralahalli Cc: x86, linux-edac, linux-kernel, Robert Richter, Tony Luck, James Morse, yazen.ghannam On Mon, Jun 28, 2021 at 12:27:40PM -0500, Smita Koralahalli wrote: > Hypervisors may not expose SMCA feature to the guest. > > Check for X86_FEATURE_HYPERVISOR on entry in mce_amd_init() and return > -ENODEV if set. > > Suggested-by: Borislav Petkov <bp@suse.de> > Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> > --- > drivers/edac/mce_amd.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c > index 5dd905a3f30c..1a1629166aa3 100644 > --- a/drivers/edac/mce_amd.c > +++ b/drivers/edac/mce_amd.c > @@ -1176,6 +1176,9 @@ static int __init mce_amd_init(void) > c->x86_vendor != X86_VENDOR_HYGON) > return -ENODEV; > > + if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) > + return -ENODEV; > + > if (boot_cpu_has(X86_FEATURE_SMCA)) { > xec_mask = 0x3f; > goto out; > -- > 2.17.1 Applied, thanks. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests" 2021-06-28 17:27 [PATCH] EDAC/mce_amd: Do not load edac_mce_amd module on guests Smita Koralahalli ` (2 preceding siblings ...) 2021-08-09 10:44 ` Borislav Petkov @ 2023-09-08 3:08 ` Elliott Mitchell 2023-09-08 3:59 ` Borislav Petkov 3 siblings, 1 reply; 12+ messages in thread From: Elliott Mitchell @ 2023-09-08 3:08 UTC (permalink / raw) To: smita.koralahallichannabasappa Cc: linux-edac, linux-kernel, x86, xen-devel, rric, james.morse, tony.luck, yazen.ghannam This reverts commit 767f4b620edadac579c9b8b6660761d4285fa6f9. There are at least 3 valid reasons why a VM may see MCE events/registers. First, the hypervisor may have a bug. Ideally this should be dealt with by fixing the hypervisor. Failing that, the hypervisor and versions effected need to be identified so only they are flagged as buggy. Second, the Linux kernel may be handling adminstrative duties/hardware for a hypervisor. In this case, the events need to be processed and potentially passed back through the hypervisor. Third, the hypervisor may do full virtualization of MCE events. In such case, they should be handled normally. Any of these blanket disabling the functionality is bad. The original patch was wrong. --- drivers/edac/mce_amd.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c index 9215c06783df..1b7fccfbb654 100644 --- a/drivers/edac/mce_amd.c +++ b/drivers/edac/mce_amd.c @@ -1361,9 +1361,6 @@ static int __init mce_amd_init(void) c->x86_vendor != X86_VENDOR_HYGON) return -ENODEV; - if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) - return -ENODEV; - if (boot_cpu_has(X86_FEATURE_SMCA)) { xec_mask = 0x3f; goto out; -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests" 2023-09-08 3:08 ` [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests" Elliott Mitchell @ 2023-09-08 3:59 ` Borislav Petkov 2023-09-13 14:36 ` Yazen Ghannam 2023-09-14 17:02 ` Elliott Mitchell 0 siblings, 2 replies; 12+ messages in thread From: Borislav Petkov @ 2023-09-08 3:59 UTC (permalink / raw) To: Elliott Mitchell Cc: smita.koralahallichannabasappa, linux-edac, linux-kernel, x86, xen-devel, rric, james.morse, tony.luck, yazen.ghannam On Thu, Sep 07, 2023 at 08:08:00PM -0700, Elliott Mitchell wrote: > This reverts commit 767f4b620edadac579c9b8b6660761d4285fa6f9. > > There are at least 3 valid reasons why a VM may see MCE events/registers. Hmm, so they all read like a bunch of handwaving to me, with those probable hypothetical "may" formulations. How about we cut to the chase and you explain what exactly is the concrete issue you're encountering and trying to solve? Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests" 2023-09-08 3:59 ` Borislav Petkov @ 2023-09-13 14:36 ` Yazen Ghannam 2023-09-13 15:50 ` Luck, Tony 2023-09-14 17:02 ` Elliott Mitchell 1 sibling, 1 reply; 12+ messages in thread From: Yazen Ghannam @ 2023-09-13 14:36 UTC (permalink / raw) To: Borislav Petkov, Elliott Mitchell Cc: yazen.ghannam, smita.koralahallichannabasappa, linux-edac, linux-kernel, x86, xen-devel, rric, james.morse, tony.luck On 9/7/23 11:59 PM, Borislav Petkov wrote: > On Thu, Sep 07, 2023 at 08:08:00PM -0700, Elliott Mitchell wrote: >> This reverts commit 767f4b620edadac579c9b8b6660761d4285fa6f9. >> >> There are at least 3 valid reasons why a VM may see MCE events/registers. > > Hmm, so they all read like a bunch of handwaving to me, with those > probable hypothetical "may" formulations. > > How about we cut to the chase and you explain what exactly is the > concrete issue you're encountering and trying to solve? Also, please note that the EDAC modules don't handle MCE events directly. They act on information passed from the MCE subsystem. Furthermore, there are other EDAC modules that have the same !hypervisor check, so why change only this one? Thanks, Yazen ^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests" 2023-09-13 14:36 ` Yazen Ghannam @ 2023-09-13 15:50 ` Luck, Tony 2023-09-13 16:21 ` Yazen Ghannam 0 siblings, 1 reply; 12+ messages in thread From: Luck, Tony @ 2023-09-13 15:50 UTC (permalink / raw) To: Yazen Ghannam, Borislav Petkov, Elliott Mitchell Cc: smita.koralahallichannabasappa, linux-edac, linux-kernel, x86, xen-devel, rric, james.morse > Also, please note that the EDAC modules don't handle MCE events > directly. They act on information passed from the MCE subsystem. > > Furthermore, there are other EDAC modules that have the same !hypervisor > check, so why change only this one? The older Intel EDAC drivers translated system physical addresses to DIMM addresses by digging around in the CONFIG and MMIO space of the memory controller devices. It would seem unwise for a VMM to give access to those addresses to a guest (in general ... perhaps OK for a Xen style "DOM0" guest that is handling many tasks for the VMM?). What system resources do AMD EDAC drivers need access to? Could they work inside a guest? -Tony ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests" 2023-09-13 15:50 ` Luck, Tony @ 2023-09-13 16:21 ` Yazen Ghannam 0 siblings, 0 replies; 12+ messages in thread From: Yazen Ghannam @ 2023-09-13 16:21 UTC (permalink / raw) To: Luck, Tony, Borislav Petkov, Elliott Mitchell Cc: yazen.ghannam, smita.koralahallichannabasappa, linux-edac, linux-kernel, x86, xen-devel, rric, james.morse On 9/13/23 11:50 AM, Luck, Tony wrote: >> Also, please note that the EDAC modules don't handle MCE events >> directly. They act on information passed from the MCE subsystem. >> >> Furthermore, there are other EDAC modules that have the same !hypervisor >> check, so why change only this one? > > The older Intel EDAC drivers translated system physical addresses to DIMM > addresses by digging around in the CONFIG and MMIO space of the memory > controller devices. It would seem unwise for a VMM to give access to those > addresses to a guest (in general ... perhaps OK for a Xen style "DOM0" guest that is > handling many tasks for the VMM?). > > What system resources do AMD EDAC drivers need access to? Could they > work inside a guest? > The MCE decoder may access some newer MCA registers, or request info from the MCE subsystem. But this is for informational error decoding. It won't support any actions that a guest could take. The AMD64 EDAC module reads system-specific memory controller registers through non-architectural interfaces. So also unwise or not useful for a guest to access. Thanks, Yazen ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests" 2023-09-08 3:59 ` Borislav Petkov 2023-09-13 14:36 ` Yazen Ghannam @ 2023-09-14 17:02 ` Elliott Mitchell 2023-09-15 11:56 ` Borislav Petkov 1 sibling, 1 reply; 12+ messages in thread From: Elliott Mitchell @ 2023-09-14 17:02 UTC (permalink / raw) To: Borislav Petkov Cc: Luck, Tony, Yazen Ghannam, smita.koralahallichannabasappa, linux-edac, linux-kernel, x86, xen-devel, rric, james.morse On Fri, Sep 08, 2023 at 05:59:11AM +0200, Borislav Petkov wrote: > On Thu, Sep 07, 2023 at 08:08:00PM -0700, Elliott Mitchell wrote: > > This reverts commit 767f4b620edadac579c9b8b6660761d4285fa6f9. > > > > There are at least 3 valid reasons why a VM may see MCE events/registers. > > Hmm, so they all read like a bunch of handwaving to me, with those > probable hypothetical "may" formulations. Indeed. At what point is the lack of information and response long enough to simply commit a revert due to those lacks? Even with the commit message having been rewritten and the link to: https://lkml.kernel.org/r/20210628172740.245689-1-Smita.KoralahalliChannabasappa@amd.com added, this still reads as roughly: "A hypothetical bug on a hypothetivisor" I rather suspect a genuine issue was observed, but with absolutely no detail this is useless. I can make some guesses, but those guesses relation to reality is dubious. On Wed, Sep 13, 2023 at 03:50:12PM +0000, Luck, Tony wrote: > > Also, please note that the EDAC modules don't handle MCE events > > directly. They act on information passed from the MCE subsystem. > > > > Furthermore, there are other EDAC modules that have the same !hypervisor > > check, so why change only this one? > > The older Intel EDAC drivers translated system physical addresses to DIMM > addresses by digging around in the CONFIG and MMIO space of the memory > controller devices. It would seem unwise for a VMM to give access to those > addresses to a guest (in general ... perhaps OK for a Xen style "DOM0" guest that is > handling many tasks for the VMM?). Which seems oddly similar to: "the Linux kernel may be handling adminstrative duties/hardware for a hypervisor. In this case, the events need to be processed and potentially passed back through the hypervisor." On Wed, Sep 13, 2023 at 12:21:50PM -0400, Yazen Ghannam wrote: > The MCE decoder may access some newer MCA registers, or request info > from the MCE subsystem. But this is for informational error decoding. It > won't support any actions that a guest could take. > > The AMD64 EDAC module reads system-specific memory controller registers > through non-architectural interfaces. So also unwise or not useful for a > guest to access. This could be emulated. With it not being officially specified the emulation may not be too accurate, but it is possible. Admittedly VMware may have abandoned this level of perfect emulation accuracy, but one could do it. Which would be "full virtualization of MCE events." On Wed, Sep 13, 2023 at 10:36:50AM -0400, Yazen Ghannam wrote: > Furthermore, there are other EDAC modules that have the same !hypervisor > check, so why change only this one? Indeed. Those will also need similar treatment, but that wouldn't be a revert of 767f4b620eda. I found 767f4b620eda in the process of looking for the correct hook point. There are at least two, and possibly more, points of view with regards to MCE and virtualization. I keep noticing most implementers are strictly thinking of perfect, full virtualization of hardware, and missing what is actually desired. Full virtualization is where you are renting an actual physical slice of actual hardware, proper virtualization of CEs and UEs is desireable. In reality most clients merely want to rent the processing power the hardware provides and not deal with actually owning the hardware. To them, CEs are an annoyance since they clutter logs and they're not something they're in a position to deal with. Instead the owner of the hardware wants the CEs so they can monitor hardware health. What you want depends on your SLAs, but the most prominent authors keep missing that many clients (VM owners) don't actually want to deal with CEs. A SLA could also state a single UE means discarding current VM state and rolling back to the last known good checkpoint. -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests" 2023-09-14 17:02 ` Elliott Mitchell @ 2023-09-15 11:56 ` Borislav Petkov 2023-09-21 21:18 ` Elliott Mitchell 0 siblings, 1 reply; 12+ messages in thread From: Borislav Petkov @ 2023-09-15 11:56 UTC (permalink / raw) To: Elliott Mitchell Cc: Luck, Tony, Yazen Ghannam, smita.koralahallichannabasappa, linux-edac, linux-kernel, x86, xen-devel, rric, james.morse On Thu, Sep 14, 2023 at 10:02:05AM -0700, Elliott Mitchell wrote: > Indeed. At what point is the lack of information and response long > enough to simply commit a revert due to those lacks? At no point. > Even with the commit message having been rewritten and the link to: > https://lkml.kernel.org/r/20210628172740.245689-1-Smita.KoralahalliChannabasappa@amd.com > added, this still reads as roughly: > > "A hypothetical bug on a hypothetivisor" If "Hypervisors likely do not expose the SMCA feature to the guest" doesn't explain to you what the problem is this commit is fixing, then I can't help you. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests" 2023-09-15 11:56 ` Borislav Petkov @ 2023-09-21 21:18 ` Elliott Mitchell 0 siblings, 0 replies; 12+ messages in thread From: Elliott Mitchell @ 2023-09-21 21:18 UTC (permalink / raw) To: Borislav Petkov Cc: Luck, Tony, Yazen Ghannam, smita.koralahallichannabasappa, linux-edac, linux-kernel, x86, xen-devel, rric, james.morse On Fri, Sep 15, 2023 at 01:56:31PM +0200, Borislav Petkov wrote: > On Thu, Sep 14, 2023 at 10:02:05AM -0700, Elliott Mitchell wrote: > > Indeed. At what point is the lack of information and response long > > enough to simply commit a revert due to those lacks? > > At no point. > > > Even with the commit message having been rewritten and the link to: > > https://lkml.kernel.org/r/20210628172740.245689-1-Smita.KoralahalliChannabasappa@amd.com > > added, this still reads as roughly: > > > > "A hypothetical bug on a hypothetivisor" > > If "Hypervisors likely do not expose the SMCA feature to the guest" > doesn't explain to you what the problem is this commit is fixing, then > I can't help you. Problem is you were objecting to 'probable hypothetical "may" formulations' in what I wrote, yet the original patch message overtly uses that word. In order for the first patch to be correct, it is insufficient for the condition to be unlikely. Ideally it should be mathematically proven impossible. As such I was writing about known counter-examples from the real world. Mainly at least one hypervisor (Xen) does tend to allow a particular VM to access sensitive system registers. Also it is entirely possible some hypervisor could proxy access to the registers and thus properly simulate the events. Not only that, but in fact this very strategy was already actively deployed: https://bugs.debian.org/810964 I'm less than 100% certain this successfully retrieves EDAC events on Xen right now, so I had been taking a look at the situation only to find 767f4b620eda. Perhaps everyone should consult with large-scale system administrators when doing things which effect them? -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445 ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-09-21 21:41 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-06-28 17:27 [PATCH] EDAC/mce_amd: Do not load edac_mce_amd module on guests Smita Koralahalli 2021-07-08 20:41 ` Yazen Ghannam 2021-07-23 18:18 ` Kim Phillips 2021-08-09 10:44 ` Borislav Petkov 2023-09-08 3:08 ` [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests" Elliott Mitchell 2023-09-08 3:59 ` Borislav Petkov 2023-09-13 14:36 ` Yazen Ghannam 2023-09-13 15:50 ` Luck, Tony 2023-09-13 16:21 ` Yazen Ghannam 2023-09-14 17:02 ` Elliott Mitchell 2023-09-15 11:56 ` Borislav Petkov 2023-09-21 21:18 ` Elliott Mitchell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).