* Thoughts on current Xen EDAC/MCE situation
@ 2024-01-22 20:53 Elliott Mitchell
  2024-01-23 10:44 ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Elliott Mitchell @ 2024-01-22 20:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Jan Beulich, Andrew Cooper

I've been mentioning this on a regular basis, but the state of MCE
handling with Xen seems poor.

I find the present handling of MCE in Xen an odd choice.  Having Xen do
most of the handling of MCE events is a behavior matching a traditional
stand-alone hypervisor.  Yet Xen was originally pushing any task not
requiring hypervisor action onto Domain 0.

MCE seems a perfect match for sharing responsibility with Domain 0.
Domain 0 needs to know about any MCE event, this is where system
administrators will expect to find logs.  In fact, if the event is a
Correctable Error, then *only* Domain 0 needs to know.  For a CE, Xen
may need no action at all (an implementation could need help) and
the affected domain would need no action.  It is strictly for
Uncorrectable Errors that action besides logging is needed.

For a UE memory error, the best approach might be for Domain 0 to decode
the error.  Once Domain 0 determines it is a UE, it would invoke a
hypercall to pass the GPFN to Xen.  Xen would then forcibly unmap the
page (similar to what Linux does to userspace for corrupted pages),
identify
what the page was used for, alert the domain and return that to Domain 0.


The key advantage of this approach is it makes MCE handling act very
similar to MCE handling without Xen.  Documentation about how MCEs are
reported/decoded would apply equally to Xen.  Another rather important
issue is it means less maintenance work to keep MCE handling working with
cutting-edge hardware.  I've noticed one vendor being sluggish about
getting patches into Linux and I fear similar issues may apply more
severely to Xen.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Thoughts on current Xen EDAC/MCE situation
  2024-01-22 20:53 Thoughts on current Xen EDAC/MCE situation Elliott Mitchell
@ 2024-01-23 10:44 ` Jan Beulich
  2024-01-23 22:52   ` Elliott Mitchell
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2024-01-23 10:44 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: Andrew Cooper, xen-devel

On 22.01.2024 21:53, Elliott Mitchell wrote:
> I've been mentioning this on a regular basis, but the state of MCE
> handling with Xen seems poor.

I certainly agree here.

> I find the present handling of MCE in Xen an odd choice.  Having Xen do
> most of the handling of MCE events is a behavior matching a traditional
> stand-alone hypervisor.  Yet Xen was originally pushing any task not
> requiring hypervisor action onto Domain 0.

Not exactly. Xen in particular deals with all of CPU and all of memory.
Dom0 may be unaware of the full number of CPUs in the system, or of the
full memory map (without resorting to interfaces specifically making
that information available, but not to be used for Dom0 kernel's own
acting as a kernel).

> MCE seems a perfect match for sharing responsibility with Domain 0.
> Domain 0 needs to know about any MCE event, this is where system
> administrators will expect to find logs.  In fact, if the event is a
> Correctable Error, then *only* Domain 0 needs to know.  For a CE, Xen
> may need no action at all (an implementation could need help) and
> the affected domain would need no action.  It is strictly for
> Uncorrectable Errors that action besides logging is needed.
> 
> For a UE memory error, the best approach might be for Domain 0 to decode
> the error.  Once Domain 0 determines it is UE, invoke a hypercall to pass
> the GPFN to Xen.

What GPFN? Decoding can only possibly find machine addresses in what
hardware supplies.

>  Xen would then forcibly unmap the page (similar to what
> Linux does to userspace for corrupted pages).  Xen would then identify
> what the page was used for, alert the domain and return that to Domain 0.

Some of this is already in place. How well it functions is a different
question.

> The key advantage of this approach is it makes MCE handling act very
> similar to MCE handling without Xen.

While that's true, you're completely omitting all implications towards
what it means to hand off most handling to Dom0. While it is perhaps
possible to make Linux's chipset-specific EDAC drivers Xen PV aware,
it might be yet harder to achieve the same in a PVH Dom0.

>  Documentation about how MCEs are
> reported/decoded would apply equally to Xen.  Another rather important
> issue is it means less maintenance work to keep MCE handling working with
> cutting-edge hardware.  I've noticed one vendor being sluggish about
> getting patches into Linux and I fear similar issues may apply more
> severely to Xen.

With all of your suggestions: Who do you think is going to do all of
the work involved here (properly writing down a design, to take care
of all known difficulties, and then actually implement everything)?
We're already short on people, as you're very likely aware.

Jan


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Thoughts on current Xen EDAC/MCE situation
  2024-01-23 10:44 ` Jan Beulich
@ 2024-01-23 22:52   ` Elliott Mitchell
  2024-01-24  7:23     ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Elliott Mitchell @ 2024-01-23 22:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Tue, Jan 23, 2024 at 11:44:03AM +0100, Jan Beulich wrote:
> On 22.01.2024 21:53, Elliott Mitchell wrote:
> 
> > I find the present handling of MCE in Xen an odd choice.  Having Xen do
> > most of the handling of MCE events is a behavior matching a traditional
> > stand-alone hypervisor.  Yet Xen was originally pushing any task not
> > requiring hypervisor action onto Domain 0.
> 
> Not exactly. Xen in particular deals with all of CPU and all of memory.
> Dom0 may be unaware of the full amount of CPUs in the system, nor the
> full memory map (without resorting to interfaces specifically making
> that information available, but not to be used for Dom0 kernel's own
> acting as a kernel).

Why would this be an issue?

I would expect the handling to be roughly:  NMI -> Xen; Xen schedules a
Dom0 vCPU which is eligible to run on the pCPU onto the pCPU; Dom0
examines registers/MSRs, Dom0 then issues a hypercall to Xen telling
Xen how to resolve the issue (no action, fix memory contents, kill page).

Ideally there would be an idle Dom0 vCPU, but interrupting a busy vCPU
would be viable.  It would even be reasonable to ignore affinity and
grab any Dom0 vCPU.

Dom0 has 2 purposes for the address.  First, to pass it back to Xen.
Second, to report it to a system administrator so they could restart the
system with that address marked as bad.  Dom0 wouldn't care whether the
address was directly accessible to it or not.

The proposed hypercall should report back what was affected by a UE
event.  A given site might have a policy that if $some_domain is hit by a
UE, everything is restarted.  Meanwhile Dom0 or Xen itself being the one
hit could deserve urgent action.


> > MCE seems a perfect match for sharing responsibility with Domain 0.
> > Domain 0 needs to know about any MCE event, this is where system
> > administrators will expect to find logs.  In fact, if the event is a
> > Correctable Error, then *only* Domain 0 needs to know.  For a CE, Xen
> > may need no action at all (an implementation could need help) and
> > the affected domain would need no action.  It is strictly for
> > Uncorrectable Errors that action besides logging is needed.
> > 
> > For a UE memory error, the best approach might be for Domain 0 to decode
> > the error.  Once Domain 0 determines it is UE, invoke a hypercall to pass
> > the GPFN to Xen.
> 
> What GPFN? Decoding can only possibly find machine addresses in what
> hardware supplies.

I may have chosen the wrong term here.

> > The key advantage of this approach is it makes MCE handling act very
> > similar to MCE handling without Xen.
> 
> While that's true, you're completely omitting all implications towards
> what it means to hand off most handling to Dom0. While it is perhaps
> possible to make Linux's chipset-specific EDAC drivers Xen PV aware,
> it might be yet harder to achieve the same in a PVH Dom0.

Much of it *doesn't* need to be Xen-aware.  There needs to be some
mechanism to allow Dom0 to access special MSRs; beyond that, Xen would
only need to interpose between decoding and handling.

> >  Documentation about how MCEs are
> > reported/decoded would apply equally to Xen.  Another rather important
> > issue is it means less maintenance work to keep MCE handling working with
> > cutting-edge hardware.  I've noticed one vendor being sluggish about
> > getting patches into Linux and I fear similar issues may apply more
> > severely to Xen.
> 
> With all of your suggestions: Who do you think is going to do all of
> the work involved here (properly writing down a design, to take care
> of all known difficulties, and then actually implement everything)?
> We're already short on people, as you're very likely aware.

Right now I mostly want to know what general course of action is
planned/desired.

Several of the Linux x86 EDAC drivers have been adding a check for a
hypervisor and refusing to load if one is present.  The stated reason
is to get rid of a message.  The problem is this check is being scattered
across several places and will make paravirtualized handling *much*
harder.  As such, taking action to ensure it lives in a single location
is kind of urgent now.

I kind of wonder if this is quietly being encouraged by a Redmond, WA
company to poison the well for other hypervisors...

(the OS wars are over, we're now into the hypervisor wars)


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Thoughts on current Xen EDAC/MCE situation
  2024-01-23 22:52   ` Elliott Mitchell
@ 2024-01-24  7:23     ` Jan Beulich
  2024-01-24 15:20       ` Elliott Mitchell
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2024-01-24  7:23 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: Andrew Cooper, xen-devel

On 23.01.2024 23:52, Elliott Mitchell wrote:
> On Tue, Jan 23, 2024 at 11:44:03AM +0100, Jan Beulich wrote:
>> On 22.01.2024 21:53, Elliott Mitchell wrote:
>>
>>> I find the present handling of MCE in Xen an odd choice.  Having Xen do
>>> most of the handling of MCE events is a behavior matching a traditional
>>> stand-alone hypervisor.  Yet Xen was originally pushing any task not
>>> requiring hypervisor action onto Domain 0.
>>
>> Not exactly. Xen in particular deals with all of CPU and all of memory.
>> Dom0 may be unaware of the full amount of CPUs in the system, nor the
>> full memory map (without resorting to interfaces specifically making
>> that information available, but not to be used for Dom0 kernel's own
>> acting as a kernel).
> 
> Why would this be an issue?

Well, counter question: For all of ...

> I would expect the handling to be roughly:  NMI -> Xen; Xen schedules a
> Dom0 vCPU which is eligible to run on the pCPU onto the pCPU; Dom0
> examines registers/MSRs, Dom0 then issues a hypercall to Xen telling
> Xen how to resolve the issue (no action, fix memory contents, kill page).
> 
> Ideally there would be an idle Dom0 vCPU, but interrupting a busy vCPU
> would be viable.  It would even be reasonable to ignore affinity and
> grab any Dom0 vCPU.
> 
> Dom0 has 2 purposes for the address.  First, to pass it back to Xen.
> Second, to report it to a system administrator so they could restart the
> system with that address marked as bad.  Dom0 wouldn't care whether the
> address was directly accessible to it or not.
> 
> The proposed hypercall should report back what was affected by a UE
> event.  A given site might have a policy that if $some_domain is hit by a
> UE, everything is restarted.  Meanwhile Dom0 or Xen being the winner
> could deserve urgent action.

... this, did you first look at the code and figure out how what you
suggest could be seamlessly integrated? Part of your suggestion (if I
got it right) is, after all, to make maintenance on the Dom0 kernel side
easy.  I expect such adjustments not being overly intrusive would also
be an acceptance criterion for the maintainers.

Second - since you specifically talk about UE: The more code is involved
in handling, the higher the chance of the #MC ending up fatal to the
system.

Third, as to Dom0's purposes of having the address: If all it is to use
it for is to pass it back to Xen, paths in the respective drivers will
necessarily be entirely different for the Xen vs the native cases.

Jan


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Thoughts on current Xen EDAC/MCE situation
  2024-01-24  7:23     ` Jan Beulich
@ 2024-01-24 15:20       ` Elliott Mitchell
  2024-02-01  0:45         ` Elliott Mitchell
  0 siblings, 1 reply; 6+ messages in thread
From: Elliott Mitchell @ 2024-01-24 15:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Wed, Jan 24, 2024 at 08:23:15AM +0100, Jan Beulich wrote:
> On 23.01.2024 23:52, Elliott Mitchell wrote:
> > On Tue, Jan 23, 2024 at 11:44:03AM +0100, Jan Beulich wrote:
> >> On 22.01.2024 21:53, Elliott Mitchell wrote:
> >>
> >>> I find the present handling of MCE in Xen an odd choice.  Having Xen do
> >>> most of the handling of MCE events is a behavior matching a traditional
> >>> stand-alone hypervisor.  Yet Xen was originally pushing any task not
> >>> requiring hypervisor action onto Domain 0.
> >>
> >> Not exactly. Xen in particular deals with all of CPU and all of memory.
> >> Dom0 may be unaware of the full amount of CPUs in the system, nor the
> >> full memory map (without resorting to interfaces specifically making
> >> that information available, but not to be used for Dom0 kernel's own
> >> acting as a kernel).
> > 
> > Why would this be an issue?
> 
> Well, counter question: For all of ...
> 
> > I would expect the handling to be roughly:  NMI -> Xen; Xen schedules a
> > Dom0 vCPU which is eligible to run on the pCPU onto the pCPU; Dom0
> > examines registers/MSRs, Dom0 then issues a hypercall to Xen telling
> > Xen how to resolve the issue (no action, fix memory contents, kill page).
> > 
> > Ideally there would be an idle Dom0 vCPU, but interrupting a busy vCPU
> > would be viable.  It would even be reasonable to ignore affinity and
> > grab any Dom0 vCPU.
> > 
> > Dom0 has 2 purposes for the address.  First, to pass it back to Xen.
> > Second, to report it to a system administrator so they could restart the
> > system with that address marked as bad.  Dom0 wouldn't care whether the
> > address was directly accessible to it or not.
> > 
> > The proposed hypercall should report back what was affected by a UE
> > event.  A given site might have a policy that if $some_domain is hit by a
> > UE, everything is restarted.  Meanwhile Dom0 or Xen being the winner
> > could deserve urgent action.
> 
> ... this, did you first look at the code and figure out how what you
> suggest could be seamlessly integrated? Part of your suggestion (if I
> got it right) is, after all, to make maintenance on the Dom0 kernel
> side easy.  I expect such adjustments not being overly intrusive would
> also be an acceptance criterion for the maintainers.

Maintenance on the Dom0 kernel isn't the issue.

One issue is for MCE reporting when running on Xen to be consistent
with MCE reporting when not running on Xen: notably a similar level of
information, and ideally the tools which assist with analyzing failures
also working when running on Xen.

Another issue is to do a better job of keeping Xen up to date with MCE
handling as new hardware with new MCE implementations show up.

> Second - since you specifically talk about UE: The more code is involved
> in handling, the higher the chance of the #MC ending up fatal to the
> system.

Indeed.  Yet right now I'm more concerned over whether MCE reporting is
happening at all.  There seem to be very few messages.

> Third, as to Dom0's purposes of having the address: If all it is to use
> it for is to pass it back to Xen, paths in the respective drivers will
> necessarily be entirely different for the Xen vs the native cases.

I'm less than certain of the best place for Xen to intercept MCE events.
For UE memory events, the simplest approach on Linux might be to wrap the
memory_failure() function.  Yet for Linux/x86,
mce_register_decode_chain() also looks like a very good candidate.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Thoughts on current Xen EDAC/MCE situation
  2024-01-24 15:20       ` Elliott Mitchell
@ 2024-02-01  0:45         ` Elliott Mitchell
  0 siblings, 0 replies; 6+ messages in thread
From: Elliott Mitchell @ 2024-02-01  0:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Wed, Jan 24, 2024 at 07:20:56AM -0800, Elliott Mitchell wrote:
> On Wed, Jan 24, 2024 at 08:23:15AM +0100, Jan Beulich wrote:
> > 
> > Third, as to Dom0's purposes of having the address: If all it is to use
> > it for is to pass it back to Xen, paths in the respective drivers will
> > necessarily be entirely different for the Xen vs the native cases.
> 
> I'm less than certain of the best place for Xen to intercept MCE events.
> For UE memory events, the simplest approach on Linux might be to wrap the
> memory_failure() function.  Yet for Linux/x86,
> mce_register_decode_chain() also looks like a very good candidate.

I did hope to get some response.

It really does look like, aside from being x86-only,
mce_register_decode_chain() is the ideal hook point.  Xen could forward
NMIs to Domain 0, then intercept them from the decode chain.  For UEs
Xen would mark the event handled, then create a new event for whichever
domain (if any) was affected.


Right now my main concern is several of the Linux MCE/EDAC drivers are
growing `if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) return -ENODEV;`
checks.

This path is being poisoned and will become quite difficult to use if
the trend isn't stopped.  The justification found for one instance was
that it "removed one message", with no further useful information.  I
cannot help suspecting a hypervisor from Redmond, WA is involved and
that their engineers are encouraged to poison interfaces used by others.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-02-01  0:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-22 20:53 Thoughts on current Xen EDAC/MCE situation Elliott Mitchell
2024-01-23 10:44 ` Jan Beulich
2024-01-23 22:52   ` Elliott Mitchell
2024-01-24  7:23     ` Jan Beulich
2024-01-24 15:20       ` Elliott Mitchell
2024-02-01  0:45         ` Elliott Mitchell
