* HVM/PVH Balloon crash
@ 2021-09-05 22:10 Elliott Mitchell
  2021-09-06  7:52 ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-05 22:10 UTC (permalink / raw)
  To: xen-devel

I brought this up a while back, but it still appears to be present and
the latest observations appear rather serious.

I'm unsure of the entire set of conditions for reproduction.

Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
this is an older AMD IOMMU).

This has been confirmed with Xen 4.11 and Xen 4.14.  This includes
Debian's patches, but those are mostly backports or environment
adjustments.

Domain 0 is presently using a 4.19 kernel.

The trigger is creating an HVM or PVH domain where memory does not equal
maxmem.
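
For illustration, a minimal configuration sketch of the problematic shape
(the name, sizes and disk path below are placeholders, not my actual
configuration) is simply:

    type = "pvh"          # or builder = "hvm"; both variants hit this
    name = "pod-test"     # placeholder name
    memory = 1024         # target allocation in MiB
    maxmem = 4096         # memory != maxmem is what enables populate-on-demand
    vcpus = 2
    disk = [ 'vdev=xvda,format=raw,target=/path/to/disk.img' ]

Setting maxmem equal to memory (or omitting maxmem) avoids PoD entirely
and sidesteps the problem.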


New observations:

I discovered this occurs with PVH domains in addition to HVM ones.

I got PVH GRUB operational.  PVH GRUB appeared to operate normally and
did not trigger the crash/panic.

The crash/panic occurred some number of seconds after the Linux kernel
was loaded.


Mitigating this by not using ballooning with HVM/PVH is workable, but it
leaves quite a large mine in the configuration.

I'm wondering if perhaps it is actually the Linux kernel in Domain 0
which is panicking.

The crash/panic occurring AFTER the main kernel loads suggests some
action the user domain takes is the actual trigger of the crash/panic.


That last point is actually rather worrisome.  There might be a security
hole lurking here.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: HVM/PVH Balloon crash
  2021-09-05 22:10 HVM/PVH Balloon crash Elliott Mitchell
@ 2021-09-06  7:52 ` Jan Beulich
  2021-09-06 20:47   ` HVM/PVH Balloon crash Elliott Mitchell
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2021-09-06  7:52 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: xen-devel

On 06.09.2021 00:10, Elliott Mitchell wrote:
> I brought this up a while back, but it still appears to be present and
> the latest observations appear rather serious.
> 
> I'm unsure of the entire set of conditions for reproduction.
> 
> Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
> this is an older AMD IOMMU).
> 
> This has been confirmed with Xen 4.11 and Xen 4.14.  This includes
> Debian's patches, but those are mostly backports or environment
> adjustments.
> 
> Domain 0 is presently using a 4.19 kernel.
> 
> The trigger is creating a HVM or PVH domain where memory does not equal
> maxmem.

I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
allocations" submitted very early this year? There you said the issue
was with a guest's maxmem exceeding the host memory size. Here you seem
to be talking about PoD in its normal mode of use. Personally I use this
all the time (except when enabling PCI pass-through for a guest, the two
being incompatible). I've not observed any badness as severe as you've
described.

> New observations:
> 
> I discovered this occurs with PVH domains in addition to HVM ones.
> 
> I got PVH GRUB operational.  PVH GRUB appeared at to operate normally
> and not trigger the crash/panic.
> 
> The crash/panic occurred some number of seconds after the Linux kernel
> was loaded.
> 
> 
> Mitigation by not using ballooning with HVM/PVH is workable, but this is
> quite a large mine in the configuration.
> 
> I'm wondering if perhaps it is actually the Linux kernel in Domain 0
> which is panicing.
> 
> The crash/panic occurring AFTER the main kernel loads suggests some
> action by the user domain is doing is the actual trigger of the
> crash/panic.

All of this is pretty vague: If you don't even know what component it
is that crashes / panics, I don't suppose you have any logs. Yet what
do you expect us to do without any technical detail?

Jan




* Re: HVM/PVH Balloon crash
  2021-09-06  7:52 ` Jan Beulich
@ 2021-09-06 20:47   ` Elliott Mitchell
  2021-09-07  8:03     ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-06 20:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Mon, Sep 06, 2021 at 09:52:17AM +0200, Jan Beulich wrote:
> On 06.09.2021 00:10, Elliott Mitchell wrote:
> > I brought this up a while back, but it still appears to be present and
> > the latest observations appear rather serious.
> > 
> > I'm unsure of the entire set of conditions for reproduction.
> > 
> > Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
> > this is an older AMD IOMMU).
> > 
> > This has been confirmed with Xen 4.11 and Xen 4.14.  This includes
> > Debian's patches, but those are mostly backports or environment
> > adjustments.
> > 
> > Domain 0 is presently using a 4.19 kernel.
> > 
> > The trigger is creating a HVM or PVH domain where memory does not equal
> > maxmem.
> 
> I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
> allocations" submitted very early this year? There you said the issue
> was with a guest's maxmem exceeding host memory size. Here you seem to
> be talking of PoD in its normal form of use. Personally I uses this
> all the time (unless enabling PCI pass-through for a guest, for being
> incompatible). I've not observed any badness as severe as you've
> described.

I've got very little idea what is occurring, as I was expecting to be doing
ARM debugging, not x86 debugging.

I was starting to wonder whether this was widespread or not.  As such I
was reporting the factors which might be different in my environment.

The one which sticks out is that the computer has an older AMD processor
(are you a 100% Intel shop?).  The processor has the AMD NPT feature, but a
very early/limited IOMMU (according to Linux, "AMD IOMMUv2 functionality
not available").

Xen 4.14 refused to load the Domain 0 kernel as PVH (not enough of an
IOMMU).


There is also the possibility Debian added a bad patch, but that seems
improbable as there aren't enough bug reports.


> > New observations:
> > 
> > I discovered this occurs with PVH domains in addition to HVM ones.
> > 
> > I got PVH GRUB operational.  PVH GRUB appeared at to operate normally
> > and not trigger the crash/panic.
> > 
> > The crash/panic occurred some number of seconds after the Linux kernel
> > was loaded.
> > 
> > 
> > Mitigation by not using ballooning with HVM/PVH is workable, but this is
> > quite a large mine in the configuration.
> > 
> > I'm wondering if perhaps it is actually the Linux kernel in Domain 0
> > which is panicing.
> > 
> > The crash/panic occurring AFTER the main kernel loads suggests some
> > action by the user domain is doing is the actual trigger of the
> > crash/panic.
> 
> All of this is pretty vague: If you don't even know what component it
> is that crashes / panics, I don't suppose you have any logs. Yet what
> do you expect us to do without any technical detail?

Initially this had looked so spectacular as to be easy to reproduce.

No logs; I wasn't expecting to be doing hardware-level debugging on x86.
I've got several USB to TTL-serial cables (for ARM/MIPS debugging), so I
may need to hunt down a USB to full-voltage EIA-232C cable.
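
As a sketch of what capturing those logs would take (standard Xen/Linux
boot options; the exact com1 parameters below are guesses for this board):

    Xen command line:  com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all sync_console
    Domain 0 kernel:   console=hvc0 console=tty0

and, while the host is still alive, `xl dmesg` dumps the hypervisor's
console ring without needing any serial hardware at all.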


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: HVM/PVH Balloon crash
  2021-09-06 20:47   ` HVM/PVH Balloon crash Elliott Mitchell
@ 2021-09-07  8:03     ` Jan Beulich
  2021-09-07 15:03       ` Elliott Mitchell
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2021-09-07  8:03 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: xen-devel

On 06.09.2021 22:47, Elliott Mitchell wrote:
> On Mon, Sep 06, 2021 at 09:52:17AM +0200, Jan Beulich wrote:
>> On 06.09.2021 00:10, Elliott Mitchell wrote:
>>> I brought this up a while back, but it still appears to be present and
>>> the latest observations appear rather serious.
>>>
>>> I'm unsure of the entire set of conditions for reproduction.
>>>
>>> Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
>>> this is an older AMD IOMMU).
>>>
>>> This has been confirmed with Xen 4.11 and Xen 4.14.  This includes
>>> Debian's patches, but those are mostly backports or environment
>>> adjustments.
>>>
>>> Domain 0 is presently using a 4.19 kernel.
>>>
>>> The trigger is creating a HVM or PVH domain where memory does not equal
>>> maxmem.
>>
>> I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
>> allocations" submitted very early this year? There you said the issue
>> was with a guest's maxmem exceeding host memory size. Here you seem to
>> be talking of PoD in its normal form of use. Personally I uses this
>> all the time (unless enabling PCI pass-through for a guest, for being
>> incompatible). I've not observed any badness as severe as you've
>> described.
> 
> I've got very little idea what is occurring as I'm expecting to be doing
> ARM debugging, not x86 debugging.
> 
> I was starting to wonder whether this was widespread or not.  As such I
> was reporting the factors which might be different in my environment.
> 
> The one which sticks out is the computer has an older AMD processor (you
> a 100% Intel shop?).

No, AMD is as relevant to us as is Intel.

>  The processor has the AMD NPT feature, but a very
> early/limited IOMMU (according to Linux "AMD IOMMUv2 functionality not
> available").
> 
> Xen 4.14 refused to load the Domain 0 kernel as PVH (not enough of an
> IOMMU).

That sounds odd at first glance - PVH simply requires that there be
an (enabled) IOMMU. Hence the only thing I could imagine is that Xen
doesn't enable the IOMMU in the first place for some reason.

Jan




* Re: HVM/PVH Balloon crash
  2021-09-07  8:03     ` Jan Beulich
@ 2021-09-07 15:03       ` Elliott Mitchell
  2021-09-07 15:57         ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-07 15:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Tue, Sep 07, 2021 at 10:03:51AM +0200, Jan Beulich wrote:
> On 06.09.2021 22:47, Elliott Mitchell wrote:
> > On Mon, Sep 06, 2021 at 09:52:17AM +0200, Jan Beulich wrote:
> >> On 06.09.2021 00:10, Elliott Mitchell wrote:
> >>> I brought this up a while back, but it still appears to be present and
> >>> the latest observations appear rather serious.
> >>>
> >>> I'm unsure of the entire set of conditions for reproduction.
> >>>
> >>> Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
> >>> this is an older AMD IOMMU).
> >>>
> >>> This has been confirmed with Xen 4.11 and Xen 4.14.  This includes
> >>> Debian's patches, but those are mostly backports or environment
> >>> adjustments.
> >>>
> >>> Domain 0 is presently using a 4.19 kernel.
> >>>
> >>> The trigger is creating a HVM or PVH domain where memory does not equal
> >>> maxmem.
> >>
> >> I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
> >> allocations" submitted very early this year? There you said the issue
> >> was with a guest's maxmem exceeding host memory size. Here you seem to
> >> be talking of PoD in its normal form of use. Personally I uses this
> >> all the time (unless enabling PCI pass-through for a guest, for being
> >> incompatible). I've not observed any badness as severe as you've
> >> described.
> > 
> > I've got very little idea what is occurring as I'm expecting to be doing
> > ARM debugging, not x86 debugging.
> > 
> > I was starting to wonder whether this was widespread or not.  As such I
> > was reporting the factors which might be different in my environment.
> > 
> > The one which sticks out is the computer has an older AMD processor (you
> > a 100% Intel shop?).
> 
> No, AMD is as relevant to us as is Intel.
> 
> >  The processor has the AMD NPT feature, but a very
> > early/limited IOMMU (according to Linux "AMD IOMMUv2 functionality not
> > available").
> > 
> > Xen 4.14 refused to load the Domain 0 kernel as PVH (not enough of an
> > IOMMU).
> 
> That sounds odd at the first glance - PVH simply requires that there be
> an (enabled) IOMMU. Hence the only thing I could imagine is that Xen
> doesn't enable the IOMMU in the first place for some reason.

Doesn't seem that odd to me.  I don't know the differences between the
first and second versions of the AMD IOMMU, but it could well be that v1
was judged not to have enough functionality to bother with.

What this does make me wonder is: how much testing was done on systems
with functioning NPT, but a disabled IOMMU?  Could be this system is in an
intergenerational hole, and some spot in the PVH/HVM code assumes the
presence of NPT guarantees the presence of an operational IOMMU.
Otherwise, if there was some copy and paste while writing the IOMMU code,
some portion of the IOMMU code might be checking for the presence of NPT
instead of the presence of an IOMMU.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: HVM/PVH Balloon crash
  2021-09-07 15:03       ` Elliott Mitchell
@ 2021-09-07 15:57         ` Jan Beulich
  2021-09-07 21:40           ` Elliott Mitchell
  2021-09-15  2:40           ` Elliott Mitchell
  0 siblings, 2 replies; 16+ messages in thread
From: Jan Beulich @ 2021-09-07 15:57 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: xen-devel

On 07.09.2021 17:03, Elliott Mitchell wrote:
> On Tue, Sep 07, 2021 at 10:03:51AM +0200, Jan Beulich wrote:
>> On 06.09.2021 22:47, Elliott Mitchell wrote:
>>> On Mon, Sep 06, 2021 at 09:52:17AM +0200, Jan Beulich wrote:
>>>> On 06.09.2021 00:10, Elliott Mitchell wrote:
>>>>> I brought this up a while back, but it still appears to be present and
>>>>> the latest observations appear rather serious.
>>>>>
>>>>> I'm unsure of the entire set of conditions for reproduction.
>>>>>
>>>>> Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
>>>>> this is an older AMD IOMMU).
>>>>>
>>>>> This has been confirmed with Xen 4.11 and Xen 4.14.  This includes
>>>>> Debian's patches, but those are mostly backports or environment
>>>>> adjustments.
>>>>>
>>>>> Domain 0 is presently using a 4.19 kernel.
>>>>>
>>>>> The trigger is creating a HVM or PVH domain where memory does not equal
>>>>> maxmem.
>>>>
>>>> I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
>>>> allocations" submitted very early this year? There you said the issue
>>>> was with a guest's maxmem exceeding host memory size. Here you seem to
>>>> be talking of PoD in its normal form of use. Personally I uses this
>>>> all the time (unless enabling PCI pass-through for a guest, for being
>>>> incompatible). I've not observed any badness as severe as you've
>>>> described.
>>>
>>> I've got very little idea what is occurring as I'm expecting to be doing
>>> ARM debugging, not x86 debugging.
>>>
>>> I was starting to wonder whether this was widespread or not.  As such I
>>> was reporting the factors which might be different in my environment.
>>>
>>> The one which sticks out is the computer has an older AMD processor (you
>>> a 100% Intel shop?).
>>
>> No, AMD is as relevant to us as is Intel.
>>
>>>  The processor has the AMD NPT feature, but a very
>>> early/limited IOMMU (according to Linux "AMD IOMMUv2 functionality not
>>> available").
>>>
>>> Xen 4.14 refused to load the Domain 0 kernel as PVH (not enough of an
>>> IOMMU).
>>
>> That sounds odd at the first glance - PVH simply requires that there be
>> an (enabled) IOMMU. Hence the only thing I could imagine is that Xen
>> doesn't enable the IOMMU in the first place for some reason.
> 
> Doesn't seem that odd to me.  I don't know the differences between the
> first and second versions of the AMD IOMMU, but could well be v1 was
> judged not to have enough functionality to bother with.
> 
> What this does make me wonder is, how much testing was done on systems
> with functioning NPT, but disabled IOMMU?

No idea. During development it may happen (rarely) that one disables
the IOMMU on purpose. Beyond that - can't tell.

>  Could be this system is in an
> intergenerational hole, and some spot in the PVH/HVM code makes an
> assumption of the presence of NPT guarantees presence of an operational
> IOMMU.  Otherwise if there was some copy and paste while writing IOMMU
> code, some portion of the IOMMU code might be checking for presence of
> NPT instead of presence of IOMMU.

This is all very speculative; I consider what you suspect not very likely,
but also not entirely impossible. This is not least because for a
long time we've been running without shared page tables on AMD.

I'm afraid without technical data and without knowing how to repro, I
don't see a way forward here.

Jan




* Re: HVM/PVH Balloon crash
  2021-09-07 15:57         ` Jan Beulich
@ 2021-09-07 21:40           ` Elliott Mitchell
  2021-09-15  2:40           ` Elliott Mitchell
  1 sibling, 0 replies; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-07 21:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Tue, Sep 07, 2021 at 05:57:10PM +0200, Jan Beulich wrote:
> On 07.09.2021 17:03, Elliott Mitchell wrote:
> > On Tue, Sep 07, 2021 at 10:03:51AM +0200, Jan Beulich wrote:
> >>
> >> That sounds odd at the first glance - PVH simply requires that there be
> >> an (enabled) IOMMU. Hence the only thing I could imagine is that Xen
> >> doesn't enable the IOMMU in the first place for some reason.
> > 
> > Doesn't seem that odd to me.  I don't know the differences between the
> > first and second versions of the AMD IOMMU, but could well be v1 was
> > judged not to have enough functionality to bother with.
> > 
> > What this does make me wonder is, how much testing was done on systems
> > with functioning NPT, but disabled IOMMU?
> 
> No idea. During development is may happen (rarely) that one disables
> the IOMMU on purpose. Beyond that - can't tell.

Thus this processor having an early and not too capable IOMMU seems
worthy of note.

> >  Could be this system is in an
> > intergenerational hole, and some spot in the PVH/HVM code makes an
> > assumption of the presence of NPT guarantees presence of an operational
> > IOMMU.  Otherwise if there was some copy and paste while writing IOMMU
> > code, some portion of the IOMMU code might be checking for presence of
> > NPT instead of presence of IOMMU.
> 
> This is all very speculative; I consider what you suspect not very likely,
> but also not entirely impossible. This is not the least because for a
> long time we've been running without shared page tables on AMD.
> 
> I'm afraid without technical data and without knowing how to repro, I
> don't see a way forward here.

I cannot report things which do not exist.  This occurs very quickly and
no warning or error messages have ever been observed on the main console
(VGA).

Happens during user domain kernel boot.  The configuration:
builder = "hvm"
name = "kr45h"
memory = 1024
maxmem = 16384
vcpus = 2
vif = [ '' ]
disk = [ 'vdev=sdc,format=raw,access=r,devtype=cdrom,target=/tmp/boot.iso', ]
sdl = 1

has been tested and confirmed to reproduce.  Looks like this was last
examined with a FreeBSD 12.2 AMD64 ISO, but Linux ISOs (un)happily work
too.  It is less than 40 seconds from `xl create` to the first indications
of the hardware's boot process starting.
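
The reproduction itself is nothing more than (the file name is simply
whatever the above configuration was saved as):

    xl create kr45h.cfg
    xl dmesg        # hypervisor messages, if the host survives long enough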

Since there don't appear to be too many reports, the one factor which
now stands out is that this machine has an AMD processor.  Xen confirms
presence of NPT support, but reports "I/O virtualisation disabled"
(older, less capable IOMMU).


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: HVM/PVH Balloon crash
  2021-09-07 15:57         ` Jan Beulich
  2021-09-07 21:40           ` Elliott Mitchell
@ 2021-09-15  2:40           ` Elliott Mitchell
  2021-09-15  6:05             ` Jan Beulich
  1 sibling, 1 reply; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-15  2:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Tue, Sep 07, 2021 at 05:57:10PM +0200, Jan Beulich wrote:
> On 07.09.2021 17:03, Elliott Mitchell wrote:
> >  Could be this system is in an
> > intergenerational hole, and some spot in the PVH/HVM code makes an
> > assumption of the presence of NPT guarantees presence of an operational
> > IOMMU.  Otherwise if there was some copy and paste while writing IOMMU
> > code, some portion of the IOMMU code might be checking for presence of
> > NPT instead of presence of IOMMU.
> 
> This is all very speculative; I consider what you suspect not very likely,
> but also not entirely impossible. This is not the least because for a
> long time we've been running without shared page tables on AMD.
> 
> I'm afraid without technical data and without knowing how to repro, I
> don't see a way forward here.

Downtimes are very expensive even for lower-end servers.  Plus there is
the issue that the system wasn't meant for development and thus never had
an appropriate setup done.

Experimentation with a system of similar age suggested another candidate:
the system has a conventional BIOS.  Might some dependencies on the
presence of UEFI have snuck into the NPT code?


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: HVM/PVH Balloon crash
  2021-09-15  2:40           ` Elliott Mitchell
@ 2021-09-15  6:05             ` Jan Beulich
  2021-09-26 22:53               ` Elliott Mitchell
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2021-09-15  6:05 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: xen-devel

On 15.09.2021 04:40, Elliott Mitchell wrote:
> On Tue, Sep 07, 2021 at 05:57:10PM +0200, Jan Beulich wrote:
>> On 07.09.2021 17:03, Elliott Mitchell wrote:
>>>  Could be this system is in an
>>> intergenerational hole, and some spot in the PVH/HVM code makes an
>>> assumption of the presence of NPT guarantees presence of an operational
>>> IOMMU.  Otherwise if there was some copy and paste while writing IOMMU
>>> code, some portion of the IOMMU code might be checking for presence of
>>> NPT instead of presence of IOMMU.
>>
>> This is all very speculative; I consider what you suspect not very likely,
>> but also not entirely impossible. This is not the least because for a
>> long time we've been running without shared page tables on AMD.
>>
>> I'm afraid without technical data and without knowing how to repro, I
>> don't see a way forward here.
> 
> Downtimes are very expensive even for lower-end servers.  Plus there is
> the issue the system wasn't meant for development and thus never had
> appropriate setup done.
> 
> Experimentation with a system of similar age suggested another candidate.
> System has a conventional BIOS.  Might some dependancies on the presence
> of UEFI snuck into the NPT code?

I can't think of any such, but as all of this is very nebulous I can't
really rule out anything.

Jan




* Re: HVM/PVH Balloon crash
  2021-09-15  6:05             ` Jan Beulich
@ 2021-09-26 22:53               ` Elliott Mitchell
  2021-09-29 13:32                 ` Jan Beulich
  2021-09-30  7:43                 ` Jan Beulich
  0 siblings, 2 replies; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-26 22:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Wed, Sep 15, 2021 at 08:05:05AM +0200, Jan Beulich wrote:
> On 15.09.2021 04:40, Elliott Mitchell wrote:
> > On Tue, Sep 07, 2021 at 05:57:10PM +0200, Jan Beulich wrote:
> >> On 07.09.2021 17:03, Elliott Mitchell wrote:
> >>>  Could be this system is in an
> >>> intergenerational hole, and some spot in the PVH/HVM code makes an
> >>> assumption of the presence of NPT guarantees presence of an operational
> >>> IOMMU.  Otherwise if there was some copy and paste while writing IOMMU
> >>> code, some portion of the IOMMU code might be checking for presence of
> >>> NPT instead of presence of IOMMU.
> >>
> >> This is all very speculative; I consider what you suspect not very likely,
> >> but also not entirely impossible. This is not the least because for a
> >> long time we've been running without shared page tables on AMD.
> >>
> >> I'm afraid without technical data and without knowing how to repro, I
> >> don't see a way forward here.
> > 
> > Downtimes are very expensive even for lower-end servers.  Plus there is
> > the issue the system wasn't meant for development and thus never had
> > appropriate setup done.
> > 
> > Experimentation with a system of similar age suggested another candidate.
> > System has a conventional BIOS.  Might some dependancies on the presence
> > of UEFI snuck into the NPT code?
> 
> I can't think of any such, but as all of this is very nebulous I can't
> really rule out anything.

Getting everything right to recreate this is rather inexact.  Having an
equivalent of `sysctl` to turn on the serial console while running might
be handy...

Luckily I got things together and...

(XEN) mm locking order violation: 48 > 16
(XEN) Xen BUG at mm-locks.h:82
(XEN) ----[ Xen-4.14.3  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff82d0402e8be0>] arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260
(XEN) RFLAGS: 0000000000010292   CONTEXT: hypervisor (d1v0)
(XEN) rax: ffff83080b2f106c   rbx: ffff83081da0f2d0   rcx: 0000000000000000
(XEN) rdx: ffff83080b27ffff   rsi: 000000000000000a   rdi: ffff82d040469738
(XEN) rbp: ffff82d040580688   rsp: ffff83080b27f8b0   r8:  0000000000000002
(XEN) r9:  0000000000008000   r10: ffff82d04058f381   r11: ffff82d040375100
(XEN) r12: ffff82d040580688   r13: ffff83080b27ffff   r14: ffff83081ddf6000
(XEN) r15: 00000000004f8c00   cr0: 000000008005003b   cr4: 00000000000406e0
(XEN) cr3: 000000081dee6000   cr2: 0000000000000000
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0010   gs: 0010   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d0402e8be0> (arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260):
(XEN)  e3 0c 00 e8 30 7f f6 ff <0f> 0b 66 0f 1f 44 00 00 42 8b 34 20 48 8d 3d 8d
(XEN) Xen stack trace from rsp=ffff83080b27f8b0:
(XEN)    ffff83081ddf67c8 ffff83081ddf6810 ffff82d040580688 000000081ddf4067
(XEN)    0000000000000001 ffff82d0402ec51c ffff83081ddf6000 0000000000000000
(XEN)    ffff82d0402e0528 ffff83081da0f010 ffff83081dde9000 ffff830800000002
(XEN)    ffff83081ddf6690 0000000000000001 ffff83081dde9000 ffff83081ddf5000
(XEN)    0000000000000000 ffff83081da0f010 ffff83081da0f010 00000000004f8c00
(XEN)    ffff82d0402f009a 0000000000000067 0000000100000000 ffff83080b27fa00
(XEN)    ffff83081dde9000 000000000081ddf4 ffff83081ddf4000 ffff83081da0f010
(XEN)    0000000000000000 0000000000000006 0000000000000000 0000000000000000
(XEN)    ffff83080b27f9f0 ffff82d0402f1097 0000000000000001 0000000000000000
(XEN)    ffffffffffffffff ffff83081ddf6000 ffff83080b27fa00 0000000400000000
(XEN)    0000000000000000 0000000000000000 ffff83081dde9000 0000000000000000
(XEN)    0000000000000000 ffffffffffffffff 0000000000000001 0000000000000001
(XEN)    0000000000000000 ffff83081ddf6000 0000000000000000 ffff82d0402ea0a6
(XEN)    ffffffffffffffff ffff83081da0f010 0000000700000006 ffff8304f8c00000
(XEN)    ffff83081da0f010 0000000000000000 ffff83080b27fba0 ffff83080b27fc98
(XEN)    0000000000000000 ffff82d0402f4ecd ffff83080b27fac8 ffff83080b27fb20
(XEN)    ffff83081ddf6000 ffff83080b27fae0 0000000100000000 0000000000000007
(XEN)    0000000000000002 ffff83081ca88018 ffff830800000000 0000000000000012
(XEN)    ffff82d0402f023f ffff82d0402f02ed 00000000000fa400 ffff82d0402f00ed
(XEN)    ffff83080b27fb38 ffff82d0402e03da 00000000004f8c00 ffff83081dff1e90
(XEN) Xen call trace:
(XEN)    [<ffff82d0402e8be0>] R arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260
(XEN)    [<ffff82d0402ec51c>] S p2m_flush_nestedp2m+0x1c/0x30
(XEN)    [<ffff82d0402e0528>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x378/0x490
(XEN)    [<ffff82d0402f009a>] S arch/x86/mm/p2m-pt.c#p2m_next_level.constprop.10+0x24a/0x2e0
(XEN)    [<ffff82d0402f1097>] S arch/x86/mm/p2m-pt.c#p2m_pt_set_entry+0x3c7/0x7b0
(XEN)    [<ffff82d0402ea0a6>] S p2m_set_entry+0xa6/0x130
(XEN)    [<ffff82d0402f4ecd>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check+0x1cd/0x440
(XEN)    [<ffff82d0402f023f>] S arch/x86/mm/p2m-pt.c#do_recalc+0x10f/0x470
(XEN)    [<ffff82d0402f02ed>] S arch/x86/mm/p2m-pt.c#do_recalc+0x1bd/0x470
(XEN)    [<ffff82d0402f00ed>] S arch/x86/mm/p2m-pt.c#p2m_next_level.constprop.10+0x29d/0x2e0
(XEN)    [<ffff82d0402e03da>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x22a/0x490
(XEN)    [<ffff82d0402f0fe2>] S arch/x86/mm/p2m-pt.c#p2m_pt_set_entry+0x312/0x7b0
(XEN)    [<ffff82d0402f0c4e>] S arch/x86/mm/p2m-pt.c#p2m_pt_get_entry+0x3fe/0x480
(XEN)    [<ffff82d0402f59aa>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check_superpage+0x17a/0x600
(XEN)    [<ffff82d0402f5ba0>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check_superpage+0x370/0x600
(XEN)    [<ffff82d0402f7c78>] S p2m_pod_demand_populate+0x6b8/0xa90
(XEN)    [<ffff82d0402f0aa6>] S arch/x86/mm/p2m-pt.c#p2m_pt_get_entry+0x256/0x480
(XEN)    [<ffff82d0402e9a1f>] S __get_gfn_type_access+0x6f/0x130
(XEN)    [<ffff82d0402ab12b>] S hvm_hap_nested_page_fault+0xeb/0x760
(XEN)    [<ffff82d04028c87e>] S svm_asm_do_resume+0x12e/0x164
(XEN)    [<ffff82d04028c87e>] S svm_asm_do_resume+0x12e/0x164

The stack trace goes further, but I suspect the rest would be overkill.
That seems to readily qualify as "Xen bug".


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: HVM/PVH Balloon crash
  2021-09-26 22:53               ` Elliott Mitchell
@ 2021-09-29 13:32                 ` Jan Beulich
  2021-09-29 15:31                   ` Elliott Mitchell
  2021-09-30  7:43                 ` Jan Beulich
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2021-09-29 13:32 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: xen-devel

On 27.09.2021 00:53, Elliott Mitchell wrote:
> Getting everything right to recreate is rather inexact.  Having an
> equivalent of `sysctl` to turn on the serial console while running might
> be handy...
> 
> Luckily get things together and...

Thanks; I finally got around to looking at this in at least slightly more
detail.

> (XEN) mm locking order violation: 48 > 16
> (XEN) Xen BUG at mm-locks.h:82
> (XEN) ----[ Xen-4.14.3  x86_64  debug=n   Not tainted ]----
> (XEN) CPU:    2
> (XEN) RIP:    e008:[<ffff82d0402e8be0>] arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260
> (XEN) RFLAGS: 0000000000010292   CONTEXT: hypervisor (d1v0)
> (XEN) rax: ffff83080b2f106c   rbx: ffff83081da0f2d0   rcx: 0000000000000000
> (XEN) rdx: ffff83080b27ffff   rsi: 000000000000000a   rdi: ffff82d040469738
> (XEN) rbp: ffff82d040580688   rsp: ffff83080b27f8b0   r8:  0000000000000002
> (XEN) r9:  0000000000008000   r10: ffff82d04058f381   r11: ffff82d040375100
> (XEN) r12: ffff82d040580688   r13: ffff83080b27ffff   r14: ffff83081ddf6000
> (XEN) r15: 00000000004f8c00   cr0: 000000008005003b   cr4: 00000000000406e0
> (XEN) cr3: 000000081dee6000   cr2: 0000000000000000
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0010   gs: 0010   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d0402e8be0> (arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260):
> (XEN)  e3 0c 00 e8 30 7f f6 ff <0f> 0b 66 0f 1f 44 00 00 42 8b 34 20 48 8d 3d 8d
> (XEN) Xen stack trace from rsp=ffff83080b27f8b0:
> [...]
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0402e8be0>] R arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260
> (XEN)    [<ffff82d0402ec51c>] S p2m_flush_nestedp2m+0x1c/0x30
> (XEN)    [<ffff82d0402e0528>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x378/0x490

hap_write_p2m_entry() calling p2m_flush_nestedp2m() suggests that
nestedhvm_enabled() was true for the domain. While we will want to
fix this, nested virt is experimental (even in current staging),
and hence there at least is no security concern.

Can you confirm that by leaving nested off you don't run into this
(or a similar) issue?
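
I.e. (assuming the guest configuration currently has nested enabled) it
should suffice to set

    nestedhvm = 0

in the xl.cfg, or to drop the nestedhvm line altogether, and then
re-create the domain.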

Of course you not having done this with a debug build (and frame
pointers in particular) leaves a level of uncertainty, i.e. the
real call chain may have been different from what this call trace
suggests.

Jan

> (XEN)    [<ffff82d0402f009a>] S arch/x86/mm/p2m-pt.c#p2m_next_level.constprop.10+0x24a/0x2e0
> (XEN)    [<ffff82d0402f1097>] S arch/x86/mm/p2m-pt.c#p2m_pt_set_entry+0x3c7/0x7b0
> (XEN)    [<ffff82d0402ea0a6>] S p2m_set_entry+0xa6/0x130
> (XEN)    [<ffff82d0402f4ecd>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check+0x1cd/0x440
> (XEN)    [<ffff82d0402f023f>] S arch/x86/mm/p2m-pt.c#do_recalc+0x10f/0x470
> (XEN)    [<ffff82d0402f02ed>] S arch/x86/mm/p2m-pt.c#do_recalc+0x1bd/0x470
> (XEN)    [<ffff82d0402f00ed>] S arch/x86/mm/p2m-pt.c#p2m_next_level.constprop.10+0x29d/0x2e0
> (XEN)    [<ffff82d0402e03da>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x22a/0x490
> (XEN)    [<ffff82d0402f0fe2>] S arch/x86/mm/p2m-pt.c#p2m_pt_set_entry+0x312/0x7b0
> (XEN)    [<ffff82d0402f0c4e>] S arch/x86/mm/p2m-pt.c#p2m_pt_get_entry+0x3fe/0x480
> (XEN)    [<ffff82d0402f59aa>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check_superpage+0x17a/0x600
> (XEN)    [<ffff82d0402f5ba0>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check_superpage+0x370/0x600
> (XEN)    [<ffff82d0402f7c78>] S p2m_pod_demand_populate+0x6b8/0xa90
> (XEN)    [<ffff82d0402f0aa6>] S arch/x86/mm/p2m-pt.c#p2m_pt_get_entry+0x256/0x480
> (XEN)    [<ffff82d0402e9a1f>] S __get_gfn_type_access+0x6f/0x130
> (XEN)    [<ffff82d0402ab12b>] S hvm_hap_nested_page_fault+0xeb/0x760
> (XEN)    [<ffff82d04028c87e>] S svm_asm_do_resume+0x12e/0x164
> (XEN)    [<ffff82d04028c87e>] S svm_asm_do_resume+0x12e/0x164
> 
> The stack trace goes further, but I suspect the rest would be overkill.
> That seems to readily qualify as "Xen bug".
> 
> 




* Re: HVM/PVH Balloon crash
  2021-09-29 13:32                 ` Jan Beulich
@ 2021-09-29 15:31                   ` Elliott Mitchell
  2021-09-30  7:08                     ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-29 15:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Wed, Sep 29, 2021 at 03:32:15PM +0200, Jan Beulich wrote:
> On 27.09.2021 00:53, Elliott Mitchell wrote:
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d0402e8be0>] R arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260
> > (XEN)    [<ffff82d0402ec51c>] S p2m_flush_nestedp2m+0x1c/0x30
> > (XEN)    [<ffff82d0402e0528>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x378/0x490
> 
> hap_write_p2m_entry() calling p2m_flush_nestedp2m() suggests that
> nestedhvm_enabled() was true for the domain. While we will want to
> fix this, nested virt is experimental (even in current staging),
> and hence there at least is no security concern.

Copy and paste from the xl.cfg man page:

       nestedhvm=BOOLEAN
           Enable or disables guest access to hardware virtualisation
           features, e.g. it allows a guest Operating System to also function
           as a hypervisor. You may want this option if you want to run
           another hypervisor (including another copy of Xen) within a Xen
           guest or to support a guest Operating System which uses hardware
           virtualisation extensions (e.g. Windows XP compatibility mode on
           more modern Windows OS).  This option is disabled by default.

"This option is disabled by default." doesn't mean "this is an
experimental feature with no security support and is likely to crash the
hypervisor".

More notably this is fully enabled in default builds of Xen.  Contrast
this with the stance of the ARM side with regards to ACPI.


> Can you confirm that by leaving nested off you don't run into this
> (or a similar) issue?

Hypervisor doesn't panic.  `xl dmesg` does end up with:

(XEN) p2m_pod_demand_populate: Dom72 out of PoD memory! (tot=524304 ents=28773031 dom72)
(XEN) domain_crash called from p2m-pod.c:1233

Which is problematic.  maxmem for this domain is set to allow for trading
memory around, so it is desirable for it to successfully load even when
its maximum isn't available.
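
For context, the "trading memory around" relies on the guest's balloon
driver: once the guest is up, its allocation is meant to be moved between
memory and maxmem from dom0, along the lines of (domain name as in the
earlier configuration; size-suffix syntax per xl(1)):

    xl mem-set kr45h 8192m    # grow toward maxmem, when the host has memory to spare
    xl mem-set kr45h 1024m    # balloon back down to the original target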


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: HVM/PVH Balloon crash
  2021-09-29 15:31                   ` Elliott Mitchell
@ 2021-09-30  7:08                     ` Jan Beulich
  2021-10-02  2:35                       ` Elliott Mitchell
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2021-09-30  7:08 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: xen-devel

On 29.09.2021 17:31, Elliott Mitchell wrote:
> On Wed, Sep 29, 2021 at 03:32:15PM +0200, Jan Beulich wrote:
>> On 27.09.2021 00:53, Elliott Mitchell wrote:
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d0402e8be0>] R arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260
>>> (XEN)    [<ffff82d0402ec51c>] S p2m_flush_nestedp2m+0x1c/0x30
>>> (XEN)    [<ffff82d0402e0528>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x378/0x490
>>
>> hap_write_p2m_entry() calling p2m_flush_nestedp2m() suggests that
>> nestedhvm_enabled() was true for the domain. While we will want to
>> fix this, nested virt is experimental (even in current staging),
>> and hence there at least is no security concern.
> 
> Copy and paste from the xl.cfg man page:
> 
>        nestedhvm=BOOLEAN
>            Enable or disables guest access to hardware virtualisation
>            features, e.g. it allows a guest Operating System to also function
>            as a hypervisor. You may want this option if you want to run
>            another hypervisor (including another copy of Xen) within a Xen
>            guest or to support a guest Operating System which uses hardware
>            virtualisation extensions (e.g. Windows XP compatibility mode on
>            more modern Windows OS).  This option is disabled by default.
> 
> "This option is disabled by default." doesn't mean "this is an
> experimental feature with no security support and is likely to crash the
> hypervisor".

Correct, but this isn't the only place to look at. Quoting
SUPPORT.md:

"### x86/Nested HVM

 This means providing hardware virtulization support to guest VMs
 allowing, for instance, a nested Xen to support both PV and HVM guests.
 It also implies support for other hypervisors,
 such as KVM, Hyper-V, Bromium, and so on as guests.

    Status, x86 HVM: Experimental"

And with an experimental feature you have to expect crashes, even though
we'd prefer that you didn't hit any.

>> Can you confirm that by leaving nested off you don't run into this
>> (or a similar) issue?
> 
> Hypervisor doesn't panic.  `xl dmesg` does end up with:
> 
> (XEN) p2m_pod_demand_populate: Dom72 out of PoD memory! (tot=524304 ents=28773031 dom72)
> (XEN) domain_crash called from p2m-pod.c:1233
> 
> Which is problematic.  maxmem for this domain is set to allow for trading
> memory around, so it is desireable for it to successfully load even when
> its maximum isn't available.

Yet that's still a configuration error (of the guest), not a bug in
Xen.

Thanks for confirming that the issue is nested-hvm related. I'm in the
process of putting together a draft fix, but I'm afraid there's a
bigger underlying issue, so I'm not convinced we would want to go with
that fix even if you were to find that it helps in your case.

Jan




* Re: HVM/PVH Balloon crash
  2021-09-26 22:53               ` Elliott Mitchell
  2021-09-29 13:32                 ` Jan Beulich
@ 2021-09-30  7:43                 ` Jan Beulich
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2021-09-30  7:43 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: xen-devel

On 27.09.2021 00:53, Elliott Mitchell wrote:
> On Wed, Sep 15, 2021 at 08:05:05AM +0200, Jan Beulich wrote:
>> On 15.09.2021 04:40, Elliott Mitchell wrote:
>>> On Tue, Sep 07, 2021 at 05:57:10PM +0200, Jan Beulich wrote:
>>>> On 07.09.2021 17:03, Elliott Mitchell wrote:
>>>>>  Could be this system is in an
>>>>> intergenerational hole, and some spot in the PVH/HVM code makes an
>>>>> assumption of the presence of NPT guarantees presence of an operational
>>>>> IOMMU.  Otherwise if there was some copy and paste while writing IOMMU
>>>>> code, some portion of the IOMMU code might be checking for presence of
>>>>> NPT instead of presence of IOMMU.
>>>>
>>>> This is all very speculative; I consider what you suspect not very likely,
>>>> but also not entirely impossible. This is not the least because for a
>>>> long time we've been running without shared page tables on AMD.
>>>>
>>>> I'm afraid without technical data and without knowing how to repro, I
>>>> don't see a way forward here.
>>>
>>> Downtimes are very expensive even for lower-end servers.  Plus there is
>>> the issue the system wasn't meant for development and thus never had
>>> appropriate setup done.
>>>
>>> Experimentation with a system of similar age suggested another candidate.
>>> System has a conventional BIOS.  Might some dependancies on the presence
>>> of UEFI snuck into the NPT code?
>>
>> I can't think of any such, but as all of this is very nebulous I can't
>> really rule out anything.
> 
> Getting everything right to recreate is rather inexact.  Having an
> equivalent of `sysctl` to turn on the serial console while running might
> be handy...
> 
> Luckily get things together and...
> 
> (XEN) mm locking order violation: 48 > 16
> (XEN) Xen BUG at mm-locks.h:82

Would you give the patch below a try? While it is against current staging,
it looks to apply fine to 4.14.3.

Jan

x86/PoD: defer nested P2M flushes

With NPT or shadow in use, the p2m_set_entry() -> p2m_pt_set_entry() ->
write_p2m_entry() -> p2m_flush_nestedp2m() call sequence triggers a lock
order violation when the PoD lock is held around it. Hence such flushing
needs to be deferred. Steal the approach from p2m_change_type_range().

Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -24,6 +24,7 @@
 #include <xen/mm.h>
 #include <xen/sched.h>
 #include <xen/trace.h>
+#include <asm/hvm/nestedhvm.h>
 #include <asm/page.h>
 #include <asm/paging.h>
 #include <asm/p2m.h>
@@ -494,6 +495,13 @@ p2m_pod_offline_or_broken_replace(struct
 static int
 p2m_pod_zero_check_superpage(struct p2m_domain *p2m, gfn_t gfn);
 
+static void pod_unlock_and_flush(struct p2m_domain *p2m)
+{
+    pod_unlock(p2m);
+    p2m->defer_nested_flush = false;
+    if ( nestedhvm_enabled(p2m->domain) )
+        p2m_flush_nestedp2m(p2m->domain);
+}
 
 /*
  * This function is needed for two reasons:
@@ -514,6 +522,7 @@ p2m_pod_decrease_reservation(struct doma
 
     gfn_lock(p2m, gfn, order);
     pod_lock(p2m);
+    p2m->defer_nested_flush = true;
 
     /*
      * If we don't have any outstanding PoD entries, let things take their
@@ -665,7 +674,7 @@ out_entry_check:
     }
 
 out_unlock:
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     gfn_unlock(p2m, gfn, order);
     return ret;
 }
@@ -1144,8 +1153,10 @@ p2m_pod_demand_populate(struct p2m_domai
      * won't start until we're done.
      */
     if ( unlikely(d->is_dying) )
-        goto out_fail;
-
+    {
+        pod_unlock(p2m);
+        return false;
+    }
 
     /*
      * Because PoD does not have cache list for 1GB pages, it has to remap
@@ -1167,6 +1178,8 @@ p2m_pod_demand_populate(struct p2m_domai
                               p2m_populate_on_demand, p2m->default_access);
     }
 
+    p2m->defer_nested_flush = true;
+
     /* Only reclaim if we're in actual need of more cache. */
     if ( p2m->pod.entry_count > p2m->pod.count )
         pod_eager_reclaim(p2m);
@@ -1229,8 +1242,9 @@ p2m_pod_demand_populate(struct p2m_domai
         __trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t);
     }
 
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     return true;
+
 out_of_memory:
     pod_unlock(p2m);
 
@@ -1239,12 +1253,14 @@ out_of_memory:
            p2m->pod.entry_count, current->domain->domain_id);
     domain_crash(d);
     return false;
+
 out_fail:
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     return false;
+
 remap_and_retry:
     BUG_ON(order != PAGE_ORDER_2M);
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
 
     /*
      * Remap this 2-meg region in singleton chunks. See the comment on the




* Re: HVM/PVH Balloon crash
  2021-09-30  7:08                     ` Jan Beulich
@ 2021-10-02  2:35                       ` Elliott Mitchell
  2021-10-07  7:20                         ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Elliott Mitchell @ 2021-10-02  2:35 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Thu, Sep 30, 2021 at 09:08:34AM +0200, Jan Beulich wrote:
> On 29.09.2021 17:31, Elliott Mitchell wrote:
> > 
> > Copy and paste from the xl.cfg man page:
> > 
> >        nestedhvm=BOOLEAN
> >            Enable or disables guest access to hardware virtualisation
> >            features, e.g. it allows a guest Operating System to also function
> >            as a hypervisor. You may want this option if you want to run
> >            another hypervisor (including another copy of Xen) within a Xen
> >            guest or to support a guest Operating System which uses hardware
> >            virtualisation extensions (e.g. Windows XP compatibility mode on
> >            more modern Windows OS).  This option is disabled by default.
> > 
> > "This option is disabled by default." doesn't mean "this is an
> > experimental feature with no security support and is likely to crash the
> > hypervisor".
> 
> Correct, but this isn't the only place to look at. Quoting
> SUPPORT.md:

You expect everyone to memorize SUPPORT.md (almost 1000 lines) before
trying to use Xen?

Your statement amounts to saying you really expect that.  People who want
to get work done will look at `man xl.cfg` when needed, and follow
instructions.

Mentioning something in `man xl.cfg` amounts to a statement "this is
supported".  Experimental/unsupported options need to be marked
"EXPERIMENTAL: DO NOT ENABLE IN PRODUCTION ENVIRONMENTS".


> Yet that's still a configuration error (of the guest), not a bug in
> Xen.

Documentation that poor amounts to a security vulnerability.



I would suggest this needs 2 extra enablers.

First, this has potential to panic the hypervisor.  As such there needs
to be an "enable_experimental=" option for the Xen command-line.  The
argument would be a list of features to enable ("nestedhvm" for this
case).  If this is absent, the hypervisor should ideally disable as much
of the code related to the unsupported/experimental features as possible.

Second, since this needs to be enabled per-domain, there should be a
similar "enable_experimental" setting for xl.cfg options.



I think this really is bad enough to warrant a security vulnerability
and updates to all branches.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: HVM/PVH Balloon crash
  2021-10-02  2:35                       ` Elliott Mitchell
@ 2021-10-07  7:20                         ` Jan Beulich
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2021-10-07  7:20 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: xen-devel

On 02.10.2021 04:35, Elliott Mitchell wrote:
> On Thu, Sep 30, 2021 at 09:08:34AM +0200, Jan Beulich wrote:
>> On 29.09.2021 17:31, Elliott Mitchell wrote:
>>>
>>> Copy and paste from the xl.cfg man page:
>>>
>>>        nestedhvm=BOOLEAN
>>>            Enable or disables guest access to hardware virtualisation
>>>            features, e.g. it allows a guest Operating System to also function
>>>            as a hypervisor. You may want this option if you want to run
>>>            another hypervisor (including another copy of Xen) within a Xen
>>>            guest or to support a guest Operating System which uses hardware
>>>            virtualisation extensions (e.g. Windows XP compatibility mode on
>>>            more modern Windows OS).  This option is disabled by default.
>>>
>>> "This option is disabled by default." doesn't mean "this is an
>>> experimental feature with no security support and is likely to crash the
>>> hypervisor".
>>
>> Correct, but this isn't the only place to look at. Quoting
>> SUPPORT.md:
> 
> You expect everyone to memorize SUPPORT.md (almost 1000 lines) before
> trying to use Xen?

I don't see why you say "memorize". When the file was introduced, it was
(aiui) indeed the intention for _it_ to become the main reference. Feel
free to propose alternatives.

> Your statement amounts to saying you really expect that.  People who want
> to get work done will look at `man xl.cfg` when needed, and follow
> instructions.
> 
> Mentioning something in `man xl.cfg` amounts to a statment "this is
> supported".  Experimental/unsupported options need to be marked
> "EXPERIMENTAL: DO NOT ENABLE IN PRODUCTION ENVIRONMENTS".
> 
> 
>> Yet that's still a configuration error (of the guest), not a bug in
>> Xen.
> 
> Documentation that poor amounts to a security vulnerability.

I disagree.

> I would suggest this needs 2 extra enablers.
> 
> First, this has potential to panic the hypervisor.  As such there needs
> to be an "enable_experimental=" option for the Xen command-line.  The
> argument would be a list of features to enable ("nestedhvm" for this
> case).  If this is absent, the hypervisor should ideally disable as much
> of the code related to the unsupported/experimental features as possible.
> 
> Second, since this needs to be enabled per-domain, there should be a
> similar "enable_experimental" setting for xl.cfg options.
> 
> 
> 
> I think this really is bad enough to warrant a security vulnerability
> and updates to all branches.

As above, I don't think I agree. But please feel free to propose patches.

What I'm personally more curious about is whether the patch I did send
you actually made a difference.

Jan



