* HVM/PVH Balloon crash
@ 2021-09-05 22:10 Elliott Mitchell
  2021-09-06 7:52 ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-05 22:10 UTC (permalink / raw)
To: xen-devel

I brought this up a while back, but it still appears to be present and
the latest observations appear rather serious.

I'm unsure of the entire set of conditions for reproduction.

Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
this is an older AMD IOMMU).

This has been confirmed with Xen 4.11 and Xen 4.14. This includes
Debian's patches, but those are mostly backports or environment
adjustments.

Domain 0 is presently using a 4.19 kernel.

The trigger is creating a HVM or PVH domain where memory does not equal
maxmem.

New observations:

I discovered this occurs with PVH domains in addition to HVM ones.

I got PVH GRUB operational. PVH GRUB appeared to operate normally and
did not trigger the crash/panic.

The crash/panic occurred some number of seconds after the Linux kernel
was loaded.

Mitigation by not using ballooning with HVM/PVH is workable, but this is
quite a large mine in the configuration.

I'm wondering if perhaps it is actually the Linux kernel in Domain 0
which is panicking.

The crash/panic occurring AFTER the main kernel loads suggests some
action the user domain is performing is the actual trigger of the
crash/panic.

That last point is actually rather worrisome. There might be a security
hole lurking here.

-- 
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

^ permalink raw reply [flat|nested] 16+ messages in thread
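For reference, the "memory does not equal maxmem" trigger corresponds to a guest
configuration along the following lines. This is a minimal illustrative sketch only
(the name and sizes are made up, not taken from the report); whenever memory is less
than maxmem at creation time, xl backs the difference with populate-on-demand (PoD):

    name   = "example-guest"
    type   = "pvh"          # the report says plain HVM is affected as well
    vcpus  = 2
    memory = 1024           # initial target
    maxmem = 4096           # memory != maxmem => populate-on-demand is used

    # The mitigation mentioned above amounts to keeping the two equal,
    # e.g. memory = 4096 and maxmem = 4096, so PoD is never engaged.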
* Re: HVM/PVH Balloon crash
  2021-09-05 22:10 HVM/PVH Balloon crash Elliott Mitchell
@ 2021-09-06 7:52 ` Jan Beulich
  2021-09-06 20:47 ` HVM/PVH Balloon crash Elliott Mitchell
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2021-09-06 7:52 UTC (permalink / raw)
To: Elliott Mitchell; +Cc: xen-devel

On 06.09.2021 00:10, Elliott Mitchell wrote:
> I brought this up a while back, but it still appears to be present and
> the latest observations appear rather serious.
>
> I'm unsure of the entire set of conditions for reproduction.
>
> Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
> this is an older AMD IOMMU).
>
> This has been confirmed with Xen 4.11 and Xen 4.14. This includes
> Debian's patches, but those are mostly backports or environment
> adjustments.
>
> Domain 0 is presently using a 4.19 kernel.
>
> The trigger is creating a HVM or PVH domain where memory does not equal
> maxmem.

I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
allocations" submitted very early this year? There you said the issue
was with a guest's maxmem exceeding host memory size. Here you seem to
be talking of PoD in its normal form of use. Personally I use this all
the time (unless enabling PCI pass-through for a guest, the two being
incompatible). I've not observed any badness as severe as you've
described.

> New observations:
>
> I discovered this occurs with PVH domains in addition to HVM ones.
>
> I got PVH GRUB operational. PVH GRUB appeared at to operate normally
> and not trigger the crash/panic.
>
> The crash/panic occurred some number of seconds after the Linux kernel
> was loaded.
>
>
> Mitigation by not using ballooning with HVM/PVH is workable, but this is
> quite a large mine in the configuration.
>
> I'm wondering if perhaps it is actually the Linux kernel in Domain 0
> which is panicing.
>
> The crash/panic occurring AFTER the main kernel loads suggests some
> action by the user domain is doing is the actual trigger of the
> crash/panic.

All of this is pretty vague: if you don't even know what component it
is that crashes / panics, I don't suppose you have any logs. Yet what
do you expect us to do without any technical detail?

Jan

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: HVM/PVH Balloon crash
  2021-09-06 7:52 ` Jan Beulich
@ 2021-09-06 20:47 ` Elliott Mitchell
  2021-09-07 8:03 ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-06 20:47 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel

On Mon, Sep 06, 2021 at 09:52:17AM +0200, Jan Beulich wrote:
> On 06.09.2021 00:10, Elliott Mitchell wrote:
> > I brought this up a while back, but it still appears to be present and
> > the latest observations appear rather serious.
> >
> > I'm unsure of the entire set of conditions for reproduction.
> >
> > Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
> > this is an older AMD IOMMU).
> >
> > This has been confirmed with Xen 4.11 and Xen 4.14. This includes
> > Debian's patches, but those are mostly backports or environment
> > adjustments.
> >
> > Domain 0 is presently using a 4.19 kernel.
> >
> > The trigger is creating a HVM or PVH domain where memory does not equal
> > maxmem.
>
> I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
> allocations" submitted very early this year? There you said the issue
> was with a guest's maxmem exceeding host memory size. Here you seem to
> be talking of PoD in its normal form of use. Personally I uses this
> all the time (unless enabling PCI pass-through for a guest, for being
> incompatible). I've not observed any badness as severe as you've
> described.

I've got very little idea what is occurring as I'm expecting to be doing
ARM debugging, not x86 debugging.

I was starting to wonder whether this was widespread or not. As such I
was reporting the factors which might be different in my environment.

The one which sticks out is the computer has an older AMD processor (are
you a 100% Intel shop?). The processor has the AMD NPT feature, but a
very early/limited IOMMU (according to Linux "AMD IOMMUv2 functionality
not available").

Xen 4.14 refused to load the Domain 0 kernel as PVH (not enough of an
IOMMU).

There is also the possibility Debian added a bad patch, but that seems
improbable as there aren't enough bug reports.

> > New observations:
> >
> > I discovered this occurs with PVH domains in addition to HVM ones.
> >
> > I got PVH GRUB operational. PVH GRUB appeared at to operate normally
> > and not trigger the crash/panic.
> >
> > The crash/panic occurred some number of seconds after the Linux kernel
> > was loaded.
> >
> > Mitigation by not using ballooning with HVM/PVH is workable, but this is
> > quite a large mine in the configuration.
> >
> > I'm wondering if perhaps it is actually the Linux kernel in Domain 0
> > which is panicing.
> >
> > The crash/panic occurring AFTER the main kernel loads suggests some
> > action by the user domain is doing is the actual trigger of the
> > crash/panic.
>
> All of this is pretty vague: If you don't even know what component it
> is that crashes / panics, I don't suppose you have any logs. Yet what
> do you expect us to do without any technical detail?

Initially this had looked so spectacular as to be easy to reproduce.

No logs, I wasn't expecting to be doing hardware-level debugging on x86.
I've got several USB to TTL-serial cables (ARM/MIPS debug), I may need
to hunt down a USB to full-voltage EIA-232C cable.

-- 
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

^ permalink raw reply [flat|nested] 16+ messages in thread
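As an aside, what Xen itself detected on such a host can usually be read back from
the running system. A hedged sketch of the sort of checks meant here (exact field
names and messages vary by Xen version):

    xl info | grep -E 'xen_caps|virt_caps'    # "hvm" present but "hvm_directio"
                                              # absent matches NPT without a usable IOMMU
    xl dmesg | grep -iE 'iommu|svm|hap'       # hypervisor boot messages about SVM/IOMMU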
* Re: HVM/PVH Balloon crash
  2021-09-06 20:47 ` HVM/PVH Balloon crash Elliott Mitchell
@ 2021-09-07 8:03 ` Jan Beulich
  2021-09-07 15:03 ` Elliott Mitchell
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2021-09-07 8:03 UTC (permalink / raw)
To: Elliott Mitchell; +Cc: xen-devel

On 06.09.2021 22:47, Elliott Mitchell wrote:
> On Mon, Sep 06, 2021 at 09:52:17AM +0200, Jan Beulich wrote:
>> On 06.09.2021 00:10, Elliott Mitchell wrote:
>>> I brought this up a while back, but it still appears to be present and
>>> the latest observations appear rather serious.
>>>
>>> I'm unsure of the entire set of conditions for reproduction.
>>>
>>> Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
>>> this is an older AMD IOMMU).
>>>
>>> This has been confirmed with Xen 4.11 and Xen 4.14. This includes
>>> Debian's patches, but those are mostly backports or environment
>>> adjustments.
>>>
>>> Domain 0 is presently using a 4.19 kernel.
>>>
>>> The trigger is creating a HVM or PVH domain where memory does not equal
>>> maxmem.
>>
>> I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
>> allocations" submitted very early this year? There you said the issue
>> was with a guest's maxmem exceeding host memory size. Here you seem to
>> be talking of PoD in its normal form of use. Personally I uses this
>> all the time (unless enabling PCI pass-through for a guest, for being
>> incompatible). I've not observed any badness as severe as you've
>> described.
>
> I've got very little idea what is occurring as I'm expecting to be doing
> ARM debugging, not x86 debugging.
>
> I was starting to wonder whether this was widespread or not. As such I
> was reporting the factors which might be different in my environment.
>
> The one which sticks out is the computer has an older AMD processor (you
> a 100% Intel shop?).

No, AMD is as relevant to us as is Intel.

> The processor has the AMD NPT feature, but a very
> early/limited IOMMU (according to Linux "AMD IOMMUv2 functionality not
> available").
>
> Xen 4.14 refused to load the Domain 0 kernel as PVH (not enough of an
> IOMMU).

That sounds odd at first glance - PVH simply requires that there be
an (enabled) IOMMU. Hence the only thing I could imagine is that Xen
doesn't enable the IOMMU in the first place for some reason.

Jan

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: HVM/PVH Balloon crash
  2021-09-07 8:03 ` Jan Beulich
@ 2021-09-07 15:03 ` Elliott Mitchell
  2021-09-07 15:57 ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-07 15:03 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel

On Tue, Sep 07, 2021 at 10:03:51AM +0200, Jan Beulich wrote:
> On 06.09.2021 22:47, Elliott Mitchell wrote:
> > On Mon, Sep 06, 2021 at 09:52:17AM +0200, Jan Beulich wrote:
> >> On 06.09.2021 00:10, Elliott Mitchell wrote:
> >>> I brought this up a while back, but it still appears to be present and
> >>> the latest observations appear rather serious.
> >>>
> >>> I'm unsure of the entire set of conditions for reproduction.
> >>>
> >>> Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
> >>> this is an older AMD IOMMU).
> >>>
> >>> This has been confirmed with Xen 4.11 and Xen 4.14. This includes
> >>> Debian's patches, but those are mostly backports or environment
> >>> adjustments.
> >>>
> >>> Domain 0 is presently using a 4.19 kernel.
> >>>
> >>> The trigger is creating a HVM or PVH domain where memory does not equal
> >>> maxmem.
> >>
> >> I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
> >> allocations" submitted very early this year? There you said the issue
> >> was with a guest's maxmem exceeding host memory size. Here you seem to
> >> be talking of PoD in its normal form of use. Personally I uses this
> >> all the time (unless enabling PCI pass-through for a guest, for being
> >> incompatible). I've not observed any badness as severe as you've
> >> described.
> >
> > I've got very little idea what is occurring as I'm expecting to be doing
> > ARM debugging, not x86 debugging.
> >
> > I was starting to wonder whether this was widespread or not. As such I
> > was reporting the factors which might be different in my environment.
> >
> > The one which sticks out is the computer has an older AMD processor (you
> > a 100% Intel shop?).
>
> No, AMD is as relevant to us as is Intel.
>
> > The processor has the AMD NPT feature, but a very
> > early/limited IOMMU (according to Linux "AMD IOMMUv2 functionality not
> > available").
> >
> > Xen 4.14 refused to load the Domain 0 kernel as PVH (not enough of an
> > IOMMU).
>
> That sounds odd at the first glance - PVH simply requires that there be
> an (enabled) IOMMU. Hence the only thing I could imagine is that Xen
> doesn't enable the IOMMU in the first place for some reason.

Doesn't seem that odd to me. I don't know the differences between the
first and second versions of the AMD IOMMU, but it could well be that v1
was judged not to have enough functionality to bother with.

What this does make me wonder is, how much testing was done on systems
with functioning NPT, but a disabled IOMMU? Could be this system is in
an intergenerational hole, and some spot in the PVH/HVM code assumes
that the presence of NPT guarantees the presence of an operational
IOMMU. Otherwise, if there was some copy and paste while writing the
IOMMU code, some portion of the IOMMU code might be checking for the
presence of NPT instead of the presence of an IOMMU.

-- 
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: HVM/PVH Balloon crash
  2021-09-07 15:03 ` Elliott Mitchell
@ 2021-09-07 15:57 ` Jan Beulich
  2021-09-07 21:40 ` Elliott Mitchell
  2021-09-15 2:40 ` Elliott Mitchell
  0 siblings, 2 replies; 16+ messages in thread
From: Jan Beulich @ 2021-09-07 15:57 UTC (permalink / raw)
To: Elliott Mitchell; +Cc: xen-devel

On 07.09.2021 17:03, Elliott Mitchell wrote:
> On Tue, Sep 07, 2021 at 10:03:51AM +0200, Jan Beulich wrote:
>> On 06.09.2021 22:47, Elliott Mitchell wrote:
>>> On Mon, Sep 06, 2021 at 09:52:17AM +0200, Jan Beulich wrote:
>>>> On 06.09.2021 00:10, Elliott Mitchell wrote:
>>>>> I brought this up a while back, but it still appears to be present and
>>>>> the latest observations appear rather serious.
>>>>>
>>>>> I'm unsure of the entire set of conditions for reproduction.
>>>>>
>>>>> Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
>>>>> this is an older AMD IOMMU).
>>>>>
>>>>> This has been confirmed with Xen 4.11 and Xen 4.14. This includes
>>>>> Debian's patches, but those are mostly backports or environment
>>>>> adjustments.
>>>>>
>>>>> Domain 0 is presently using a 4.19 kernel.
>>>>>
>>>>> The trigger is creating a HVM or PVH domain where memory does not equal
>>>>> maxmem.
>>>>
>>>> I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
>>>> allocations" submitted very early this year? There you said the issue
>>>> was with a guest's maxmem exceeding host memory size. Here you seem to
>>>> be talking of PoD in its normal form of use. Personally I uses this
>>>> all the time (unless enabling PCI pass-through for a guest, for being
>>>> incompatible). I've not observed any badness as severe as you've
>>>> described.
>>>
>>> I've got very little idea what is occurring as I'm expecting to be doing
>>> ARM debugging, not x86 debugging.
>>>
>>> I was starting to wonder whether this was widespread or not. As such I
>>> was reporting the factors which might be different in my environment.
>>>
>>> The one which sticks out is the computer has an older AMD processor (you
>>> a 100% Intel shop?).
>>
>> No, AMD is as relevant to us as is Intel.
>>
>>> The processor has the AMD NPT feature, but a very
>>> early/limited IOMMU (according to Linux "AMD IOMMUv2 functionality not
>>> available").
>>>
>>> Xen 4.14 refused to load the Domain 0 kernel as PVH (not enough of an
>>> IOMMU).
>>
>> That sounds odd at the first glance - PVH simply requires that there be
>> an (enabled) IOMMU. Hence the only thing I could imagine is that Xen
>> doesn't enable the IOMMU in the first place for some reason.
>
> Doesn't seem that odd to me. I don't know the differences between the
> first and second versions of the AMD IOMMU, but could well be v1 was
> judged not to have enough functionality to bother with.
>
> What this does make me wonder is, how much testing was done on systems
> with functioning NPT, but disabled IOMMU?

No idea. During development it may happen (rarely) that one disables
the IOMMU on purpose. Beyond that - can't tell.

> Could be this system is in an
> intergenerational hole, and some spot in the PVH/HVM code makes an
> assumption of the presence of NPT guarantees presence of an operational
> IOMMU. Otherwise if there was some copy and paste while writing IOMMU
> code, some portion of the IOMMU code might be checking for presence of
> NPT instead of presence of IOMMU.

This is all very speculative; I consider what you suspect not very
likely, but also not entirely impossible. This is not least because for
a long time we've been running without shared page tables on AMD.

I'm afraid without technical data and without knowing how to repro, I
don't see a way forward here.

Jan

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: HVM/PVH Balloon crash
  2021-09-07 15:57 ` Jan Beulich
@ 2021-09-07 21:40 ` Elliott Mitchell
  0 siblings, 0 replies; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-07 21:40 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel

On Tue, Sep 07, 2021 at 05:57:10PM +0200, Jan Beulich wrote:
> On 07.09.2021 17:03, Elliott Mitchell wrote:
> > On Tue, Sep 07, 2021 at 10:03:51AM +0200, Jan Beulich wrote:
> >>
> >> That sounds odd at the first glance - PVH simply requires that there be
> >> an (enabled) IOMMU. Hence the only thing I could imagine is that Xen
> >> doesn't enable the IOMMU in the first place for some reason.
> >
> > Doesn't seem that odd to me. I don't know the differences between the
> > first and second versions of the AMD IOMMU, but could well be v1 was
> > judged not to have enough functionality to bother with.
> >
> > What this does make me wonder is, how much testing was done on systems
> > with functioning NPT, but disabled IOMMU?
>
> No idea. During development is may happen (rarely) that one disables
> the IOMMU on purpose. Beyond that - can't tell.

Thus this processor having an early and not too capable IOMMU seems
worthy of note.

> > Could be this system is in an
> > intergenerational hole, and some spot in the PVH/HVM code makes an
> > assumption of the presence of NPT guarantees presence of an operational
> > IOMMU. Otherwise if there was some copy and paste while writing IOMMU
> > code, some portion of the IOMMU code might be checking for presence of
> > NPT instead of presence of IOMMU.
>
> This is all very speculative; I consider what you suspect not very likely,
> but also not entirely impossible. This is not the least because for a
> long time we've been running without shared page tables on AMD.
>
> I'm afraid without technical data and without knowing how to repro, I
> don't see a way forward here.

I cannot report things which do not exist. This occurs very quickly and
no warning or error messages have ever been observed on the main console
(VGA). Happens during user domain kernel boot.

The configuration:

builder = "hvm"
name = "kr45h"
memory = 1024
maxmem = 16384
vcpus = 2
vif = [ '' ]
disk = [ 'vdev=sdc,format=raw,access=r,devtype=cdrom,target=/tmp/boot.iso', ]
sdl = 1

has been tested and confirmed to reproduce. Looks like this was last
examined with a FreeBSD 12.2 AMD64 ISO, but Linux ISOs (un)happily work
too. It is less than 40 seconds from `xl create` to indications of
hardware boot process starting.

Since there don't appear to be too many reports, the one factor which
now stands out is this machine has an AMD processor. Xen confirms
presence of NPT support, but reports "I/O virtualisation disabled"
(older, less capable IOMMU).

-- 
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

^ permalink raw reply [flat|nested] 16+ messages in thread
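A hedged sketch of how a report like this is typically captured, in case it helps
reproduction elsewhere (the config path and serial parameters are illustrative, not
taken from the thread):

    # Xen command-line additions for a serial log (conventional BIOS + GRUB):
    #   com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all sync_console
    xl -vvv create /etc/xen/kr45h.cfg
    xl dmesg        # hypervisor message buffer, if the host survives long enough

sync_console slows the system down considerably, but helps when the host dies before
buffered console output has been flushed.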
* Re: HVM/PVH Balloon crash
  2021-09-07 15:57 ` Jan Beulich
  2021-09-07 21:40 ` Elliott Mitchell
@ 2021-09-15 2:40 ` Elliott Mitchell
  2021-09-15 6:05 ` Jan Beulich
  1 sibling, 1 reply; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-15 2:40 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel

On Tue, Sep 07, 2021 at 05:57:10PM +0200, Jan Beulich wrote:
> On 07.09.2021 17:03, Elliott Mitchell wrote:
> > Could be this system is in an
> > intergenerational hole, and some spot in the PVH/HVM code makes an
> > assumption of the presence of NPT guarantees presence of an operational
> > IOMMU. Otherwise if there was some copy and paste while writing IOMMU
> > code, some portion of the IOMMU code might be checking for presence of
> > NPT instead of presence of IOMMU.
>
> This is all very speculative; I consider what you suspect not very likely,
> but also not entirely impossible. This is not the least because for a
> long time we've been running without shared page tables on AMD.
>
> I'm afraid without technical data and without knowing how to repro, I
> don't see a way forward here.

Downtimes are very expensive even for lower-end servers. Plus there is
the issue that the system wasn't meant for development and thus never
had appropriate setup done.

Experimentation with a system of similar age suggested another
candidate. The system has a conventional BIOS. Might some dependencies
on the presence of UEFI have snuck into the NPT code?

-- 
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: HVM/PVH Balloon crash
  2021-09-15 2:40 ` Elliott Mitchell
@ 2021-09-15 6:05 ` Jan Beulich
  2021-09-26 22:53 ` Elliott Mitchell
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2021-09-15 6:05 UTC (permalink / raw)
To: Elliott Mitchell; +Cc: xen-devel

On 15.09.2021 04:40, Elliott Mitchell wrote:
> On Tue, Sep 07, 2021 at 05:57:10PM +0200, Jan Beulich wrote:
>> On 07.09.2021 17:03, Elliott Mitchell wrote:
>>> Could be this system is in an
>>> intergenerational hole, and some spot in the PVH/HVM code makes an
>>> assumption of the presence of NPT guarantees presence of an operational
>>> IOMMU. Otherwise if there was some copy and paste while writing IOMMU
>>> code, some portion of the IOMMU code might be checking for presence of
>>> NPT instead of presence of IOMMU.
>>
>> This is all very speculative; I consider what you suspect not very likely,
>> but also not entirely impossible. This is not the least because for a
>> long time we've been running without shared page tables on AMD.
>>
>> I'm afraid without technical data and without knowing how to repro, I
>> don't see a way forward here.
>
> Downtimes are very expensive even for lower-end servers. Plus there is
> the issue the system wasn't meant for development and thus never had
> appropriate setup done.
>
> Experimentation with a system of similar age suggested another candidate.
> System has a conventional BIOS. Might some dependancies on the presence
> of UEFI snuck into the NPT code?

I can't think of any such, but as all of this is very nebulous I can't
really rule out anything.

Jan

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: HVM/PVH Balloon crash
  2021-09-15 6:05 ` Jan Beulich
@ 2021-09-26 22:53 ` Elliott Mitchell
  2021-09-29 13:32 ` Jan Beulich
  2021-09-30 7:43 ` Jan Beulich
  0 siblings, 2 replies; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-26 22:53 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel

On Wed, Sep 15, 2021 at 08:05:05AM +0200, Jan Beulich wrote:
> On 15.09.2021 04:40, Elliott Mitchell wrote:
> > On Tue, Sep 07, 2021 at 05:57:10PM +0200, Jan Beulich wrote:
> >> On 07.09.2021 17:03, Elliott Mitchell wrote:
> >>> Could be this system is in an
> >>> intergenerational hole, and some spot in the PVH/HVM code makes an
> >>> assumption of the presence of NPT guarantees presence of an operational
> >>> IOMMU. Otherwise if there was some copy and paste while writing IOMMU
> >>> code, some portion of the IOMMU code might be checking for presence of
> >>> NPT instead of presence of IOMMU.
> >>
> >> This is all very speculative; I consider what you suspect not very likely,
> >> but also not entirely impossible. This is not the least because for a
> >> long time we've been running without shared page tables on AMD.
> >>
> >> I'm afraid without technical data and without knowing how to repro, I
> >> don't see a way forward here.
> >
> > Downtimes are very expensive even for lower-end servers. Plus there is
> > the issue the system wasn't meant for development and thus never had
> > appropriate setup done.
> >
> > Experimentation with a system of similar age suggested another candidate.
> > System has a conventional BIOS. Might some dependancies on the presence
> > of UEFI snuck into the NPT code?
>
> I can't think of any such, but as all of this is very nebulous I can't
> really rule out anything.

Getting everything right to recreate is rather inexact. Having an
equivalent of `sysctl` to turn on the serial console while running might
be handy...

Luckily I got things together and...
(XEN) mm locking order violation: 48 > 16 (XEN) Xen BUG at mm-locks.h:82 (XEN) ----[ Xen-4.14.3 x86_64 debug=n Not tainted ]---- (XEN) CPU: 2 (XEN) RIP: e008:[<ffff82d0402e8be0>] arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260 (XEN) RFLAGS: 0000000000010292 CONTEXT: hypervisor (d1v0) (XEN) rax: ffff83080b2f106c rbx: ffff83081da0f2d0 rcx: 0000000000000000 (XEN) rdx: ffff83080b27ffff rsi: 000000000000000a rdi: ffff82d040469738 (XEN) rbp: ffff82d040580688 rsp: ffff83080b27f8b0 r8: 0000000000000002 (XEN) r9: 0000000000008000 r10: ffff82d04058f381 r11: ffff82d040375100 (XEN) r12: ffff82d040580688 r13: ffff83080b27ffff r14: ffff83081ddf6000 (XEN) r15: 00000000004f8c00 cr0: 000000008005003b cr4: 00000000000406e0 (XEN) cr3: 000000081dee6000 cr2: 0000000000000000 (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0010 gs: 0010 ss: 0000 cs: e008 (XEN) Xen code around <ffff82d0402e8be0> (arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260): (XEN) e3 0c 00 e8 30 7f f6 ff <0f> 0b 66 0f 1f 44 00 00 42 8b 34 20 48 8d 3d 8d (XEN) Xen stack trace from rsp=ffff83080b27f8b0: (XEN) ffff83081ddf67c8 ffff83081ddf6810 ffff82d040580688 000000081ddf4067 (XEN) 0000000000000001 ffff82d0402ec51c ffff83081ddf6000 0000000000000000 (XEN) ffff82d0402e0528 ffff83081da0f010 ffff83081dde9000 ffff830800000002 (XEN) ffff83081ddf6690 0000000000000001 ffff83081dde9000 ffff83081ddf5000 (XEN) 0000000000000000 ffff83081da0f010 ffff83081da0f010 00000000004f8c00 (XEN) ffff82d0402f009a 0000000000000067 0000000100000000 ffff83080b27fa00 (XEN) ffff83081dde9000 000000000081ddf4 ffff83081ddf4000 ffff83081da0f010 (XEN) 0000000000000000 0000000000000006 0000000000000000 0000000000000000 (XEN) ffff83080b27f9f0 ffff82d0402f1097 0000000000000001 0000000000000000 (XEN) ffffffffffffffff ffff83081ddf6000 ffff83080b27fa00 0000000400000000 (XEN) 0000000000000000 0000000000000000 ffff83081dde9000 0000000000000000 (XEN) 0000000000000000 ffffffffffffffff 0000000000000001 0000000000000001 (XEN) 0000000000000000 ffff83081ddf6000 0000000000000000 ffff82d0402ea0a6 (XEN) ffffffffffffffff ffff83081da0f010 0000000700000006 ffff8304f8c00000 (XEN) ffff83081da0f010 0000000000000000 ffff83080b27fba0 ffff83080b27fc98 (XEN) 0000000000000000 ffff82d0402f4ecd ffff83080b27fac8 ffff83080b27fb20 (XEN) ffff83081ddf6000 ffff83080b27fae0 0000000100000000 0000000000000007 (XEN) 0000000000000002 ffff83081ca88018 ffff830800000000 0000000000000012 (XEN) ffff82d0402f023f ffff82d0402f02ed 00000000000fa400 ffff82d0402f00ed (XEN) ffff83080b27fb38 ffff82d0402e03da 00000000004f8c00 ffff83081dff1e90 (XEN) Xen call trace: (XEN) [<ffff82d0402e8be0>] R arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260 (XEN) [<ffff82d0402ec51c>] S p2m_flush_nestedp2m+0x1c/0x30 (XEN) [<ffff82d0402e0528>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x378/0x490 (XEN) [<ffff82d0402f009a>] S arch/x86/mm/p2m-pt.c#p2m_next_level.constprop.10+0x24a/0x2e0 (XEN) [<ffff82d0402f1097>] S arch/x86/mm/p2m-pt.c#p2m_pt_set_entry+0x3c7/0x7b0 (XEN) [<ffff82d0402ea0a6>] S p2m_set_entry+0xa6/0x130 (XEN) [<ffff82d0402f4ecd>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check+0x1cd/0x440 (XEN) [<ffff82d0402f023f>] S arch/x86/mm/p2m-pt.c#do_recalc+0x10f/0x470 (XEN) [<ffff82d0402f02ed>] S arch/x86/mm/p2m-pt.c#do_recalc+0x1bd/0x470 (XEN) [<ffff82d0402f00ed>] S arch/x86/mm/p2m-pt.c#p2m_next_level.constprop.10+0x29d/0x2e0 (XEN) [<ffff82d0402e03da>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x22a/0x490 (XEN) [<ffff82d0402f0fe2>] S arch/x86/mm/p2m-pt.c#p2m_pt_set_entry+0x312/0x7b0 (XEN) 
[<ffff82d0402f0c4e>] S arch/x86/mm/p2m-pt.c#p2m_pt_get_entry+0x3fe/0x480 (XEN) [<ffff82d0402f59aa>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check_superpage+0x17a/0x600 (XEN) [<ffff82d0402f5ba0>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check_superpage+0x370/0x600 (XEN) [<ffff82d0402f7c78>] S p2m_pod_demand_populate+0x6b8/0xa90 (XEN) [<ffff82d0402f0aa6>] S arch/x86/mm/p2m-pt.c#p2m_pt_get_entry+0x256/0x480 (XEN) [<ffff82d0402e9a1f>] S __get_gfn_type_access+0x6f/0x130 (XEN) [<ffff82d0402ab12b>] S hvm_hap_nested_page_fault+0xeb/0x760 (XEN) [<ffff82d04028c87e>] S svm_asm_do_resume+0x12e/0x164 (XEN) [<ffff82d04028c87e>] S svm_asm_do_resume+0x12e/0x164 The stack trace goes further, but I suspect the rest would be overkill. That seems to readily qualify as "Xen bug". -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: HVM/PVH Balloon crash 2021-09-26 22:53 ` Elliott Mitchell @ 2021-09-29 13:32 ` Jan Beulich 2021-09-29 15:31 ` Elliott Mitchell 2021-09-30 7:43 ` Jan Beulich 1 sibling, 1 reply; 16+ messages in thread From: Jan Beulich @ 2021-09-29 13:32 UTC (permalink / raw) To: Elliott Mitchell; +Cc: xen-devel On 27.09.2021 00:53, Elliott Mitchell wrote: > Getting everything right to recreate is rather inexact. Having an > equivalent of `sysctl` to turn on the serial console while running might > be handy... > > Luckily get things together and... Thanks; finally got around to look at this in at least slightly more detail. > (XEN) mm locking order violation: 48 > 16 > (XEN) Xen BUG at mm-locks.h:82 > (XEN) ----[ Xen-4.14.3 x86_64 debug=n Not tainted ]---- > (XEN) CPU: 2 > (XEN) RIP: e008:[<ffff82d0402e8be0>] arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260 > (XEN) RFLAGS: 0000000000010292 CONTEXT: hypervisor (d1v0) > (XEN) rax: ffff83080b2f106c rbx: ffff83081da0f2d0 rcx: 0000000000000000 > (XEN) rdx: ffff83080b27ffff rsi: 000000000000000a rdi: ffff82d040469738 > (XEN) rbp: ffff82d040580688 rsp: ffff83080b27f8b0 r8: 0000000000000002 > (XEN) r9: 0000000000008000 r10: ffff82d04058f381 r11: ffff82d040375100 > (XEN) r12: ffff82d040580688 r13: ffff83080b27ffff r14: ffff83081ddf6000 > (XEN) r15: 00000000004f8c00 cr0: 000000008005003b cr4: 00000000000406e0 > (XEN) cr3: 000000081dee6000 cr2: 0000000000000000 > (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000000 > (XEN) ds: 0000 es: 0000 fs: 0010 gs: 0010 ss: 0000 cs: e008 > (XEN) Xen code around <ffff82d0402e8be0> (arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260): > (XEN) e3 0c 00 e8 30 7f f6 ff <0f> 0b 66 0f 1f 44 00 00 42 8b 34 20 48 8d 3d 8d > (XEN) Xen stack trace from rsp=ffff83080b27f8b0: > [...] > (XEN) Xen call trace: > (XEN) [<ffff82d0402e8be0>] R arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260 > (XEN) [<ffff82d0402ec51c>] S p2m_flush_nestedp2m+0x1c/0x30 > (XEN) [<ffff82d0402e0528>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x378/0x490 hap_write_p2m_entry() calling p2m_flush_nestedp2m() suggests that nestedhvm_enabled() was true for the domain. While we will want to fix this, nested virt is experimental (even in current staging), and hence there at least is no security concern. Can you confirm that by leaving nested off you don't run into this (or a similar) issue? Of course you not having done this with a debug build (and frame pointers in particular) leaves a level of uncertainty, i.e. the real call chain may have been different from what this call trace suggests. 
Jan > (XEN) [<ffff82d0402f009a>] S arch/x86/mm/p2m-pt.c#p2m_next_level.constprop.10+0x24a/0x2e0 > (XEN) [<ffff82d0402f1097>] S arch/x86/mm/p2m-pt.c#p2m_pt_set_entry+0x3c7/0x7b0 > (XEN) [<ffff82d0402ea0a6>] S p2m_set_entry+0xa6/0x130 > (XEN) [<ffff82d0402f4ecd>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check+0x1cd/0x440 > (XEN) [<ffff82d0402f023f>] S arch/x86/mm/p2m-pt.c#do_recalc+0x10f/0x470 > (XEN) [<ffff82d0402f02ed>] S arch/x86/mm/p2m-pt.c#do_recalc+0x1bd/0x470 > (XEN) [<ffff82d0402f00ed>] S arch/x86/mm/p2m-pt.c#p2m_next_level.constprop.10+0x29d/0x2e0 > (XEN) [<ffff82d0402e03da>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x22a/0x490 > (XEN) [<ffff82d0402f0fe2>] S arch/x86/mm/p2m-pt.c#p2m_pt_set_entry+0x312/0x7b0 > (XEN) [<ffff82d0402f0c4e>] S arch/x86/mm/p2m-pt.c#p2m_pt_get_entry+0x3fe/0x480 > (XEN) [<ffff82d0402f59aa>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check_superpage+0x17a/0x600 > (XEN) [<ffff82d0402f5ba0>] S arch/x86/mm/p2m-pod.c#p2m_pod_zero_check_superpage+0x370/0x600 > (XEN) [<ffff82d0402f7c78>] S p2m_pod_demand_populate+0x6b8/0xa90 > (XEN) [<ffff82d0402f0aa6>] S arch/x86/mm/p2m-pt.c#p2m_pt_get_entry+0x256/0x480 > (XEN) [<ffff82d0402e9a1f>] S __get_gfn_type_access+0x6f/0x130 > (XEN) [<ffff82d0402ab12b>] S hvm_hap_nested_page_fault+0xeb/0x760 > (XEN) [<ffff82d04028c87e>] S svm_asm_do_resume+0x12e/0x164 > (XEN) [<ffff82d04028c87e>] S svm_asm_do_resume+0x12e/0x164 > > The stack trace goes further, but I suspect the rest would be overkill. > That seems to readily qualify as "Xen bug". > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: HVM/PVH Balloon crash
  2021-09-29 13:32 ` Jan Beulich
@ 2021-09-29 15:31 ` Elliott Mitchell
  2021-09-30 7:08 ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Elliott Mitchell @ 2021-09-29 15:31 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel

On Wed, Sep 29, 2021 at 03:32:15PM +0200, Jan Beulich wrote:
> On 27.09.2021 00:53, Elliott Mitchell wrote:
> > (XEN) Xen call trace:
> > (XEN) [<ffff82d0402e8be0>] R arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260
> > (XEN) [<ffff82d0402ec51c>] S p2m_flush_nestedp2m+0x1c/0x30
> > (XEN) [<ffff82d0402e0528>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x378/0x490
>
> hap_write_p2m_entry() calling p2m_flush_nestedp2m() suggests that
> nestedhvm_enabled() was true for the domain. While we will want to
> fix this, nested virt is experimental (even in current staging),
> and hence there at least is no security concern.

Copy and paste from the xl.cfg man page:

nestedhvm=BOOLEAN
Enable or disables guest access to hardware virtualisation
features, e.g. it allows a guest Operating System to also function
as a hypervisor. You may want this option if you want to run
another hypervisor (including another copy of Xen) within a Xen
guest or to support a guest Operating System which uses hardware
virtualisation extensions (e.g. Windows XP compatibility mode on
more modern Windows OS). This option is disabled by default.

"This option is disabled by default." doesn't mean "this is an
experimental feature with no security support and is likely to crash the
hypervisor".

More notably this is fully enabled in default builds of Xen. Contrast
this with the stance of the ARM side with regards to ACPI.

> Can you confirm that by leaving nested off you don't run into this
> (or a similar) issue?

The hypervisor doesn't panic. `xl dmesg` does end up with:

(XEN) p2m_pod_demand_populate: Dom72 out of PoD memory! (tot=524304 ents=28773031 dom72)
(XEN) domain_crash called from p2m-pod.c:1233

Which is problematic. maxmem for this domain is set to allow for trading
memory around, so it is desirable for it to successfully load even when
its maximum isn't available.

-- 
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: HVM/PVH Balloon crash 2021-09-29 15:31 ` Elliott Mitchell @ 2021-09-30 7:08 ` Jan Beulich 2021-10-02 2:35 ` Elliott Mitchell 0 siblings, 1 reply; 16+ messages in thread From: Jan Beulich @ 2021-09-30 7:08 UTC (permalink / raw) To: Elliott Mitchell; +Cc: xen-devel On 29.09.2021 17:31, Elliott Mitchell wrote: > On Wed, Sep 29, 2021 at 03:32:15PM +0200, Jan Beulich wrote: >> On 27.09.2021 00:53, Elliott Mitchell wrote: >>> (XEN) Xen call trace: >>> (XEN) [<ffff82d0402e8be0>] R arch/x86/mm/p2m.c#p2m_flush_table+0x240/0x260 >>> (XEN) [<ffff82d0402ec51c>] S p2m_flush_nestedp2m+0x1c/0x30 >>> (XEN) [<ffff82d0402e0528>] S arch/x86/mm/hap/hap.c#hap_write_p2m_entry+0x378/0x490 >> >> hap_write_p2m_entry() calling p2m_flush_nestedp2m() suggests that >> nestedhvm_enabled() was true for the domain. While we will want to >> fix this, nested virt is experimental (even in current staging), >> and hence there at least is no security concern. > > Copy and paste from the xl.cfg man page: > > nestedhvm=BOOLEAN > Enable or disables guest access to hardware virtualisation > features, e.g. it allows a guest Operating System to also function > as a hypervisor. You may want this option if you want to run > another hypervisor (including another copy of Xen) within a Xen > guest or to support a guest Operating System which uses hardware > virtualisation extensions (e.g. Windows XP compatibility mode on > more modern Windows OS). This option is disabled by default. > > "This option is disabled by default." doesn't mean "this is an > experimental feature with no security support and is likely to crash the > hypervisor". Correct, but this isn't the only place to look at. Quoting SUPPORT.md: "### x86/Nested HVM This means providing hardware virtulization support to guest VMs allowing, for instance, a nested Xen to support both PV and HVM guests. It also implies support for other hypervisors, such as KVM, Hyper-V, Bromium, and so on as guests. Status, x86 HVM: Experimental" And with an experimental feature you have to expect crashes, no matter that we'd prefer if you wouldn't hit any. >> Can you confirm that by leaving nested off you don't run into this >> (or a similar) issue? > > Hypervisor doesn't panic. `xl dmesg` does end up with: > > (XEN) p2m_pod_demand_populate: Dom72 out of PoD memory! (tot=524304 ents=28773031 dom72) > (XEN) domain_crash called from p2m-pod.c:1233 > > Which is problematic. maxmem for this domain is set to allow for trading > memory around, so it is desireable for it to successfully load even when > its maximum isn't available. Yet that's still a configuration error (of the guest), not a bug in Xen. Thanks for confirming that the issue is nested-hvm related. I'm in the process of putting together a draft fix, but I'm afraid there's a bigger underlying issue, so I'm not convinced we would want to go with that fix even if you were to find that it helps in your case. Jan ^ permalink raw reply [flat|nested] 16+ messages in thread
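For completeness, the usual ways of avoiding the PoD exhaustion shown above are
sketched here; these are generic options, not a statement about what this particular
guest ought to do:

    # (a) Avoid PoD entirely: start with target == maxmem and shrink later,
    #     once the guest's balloon driver is up:
    #         memory = 16384
    #         maxmem = 16384
    #     and then, at run time:
    #         xl mem-set <domain> 1024
    #
    # (b) Keep memory < maxmem, but ensure the guest kernel carries the Xen
    #     balloon driver so it balloons down to its target early, before it
    #     touches more memory than the PoD pool can back.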
* Re: HVM/PVH Balloon crash
  2021-09-30 7:08 ` Jan Beulich
@ 2021-10-02 2:35 ` Elliott Mitchell
  2021-10-07 7:20 ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Elliott Mitchell @ 2021-10-02 2:35 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel

On Thu, Sep 30, 2021 at 09:08:34AM +0200, Jan Beulich wrote:
> On 29.09.2021 17:31, Elliott Mitchell wrote:
> >
> > Copy and paste from the xl.cfg man page:
> >
> > nestedhvm=BOOLEAN
> > Enable or disables guest access to hardware virtualisation
> > features, e.g. it allows a guest Operating System to also function
> > as a hypervisor. You may want this option if you want to run
> > another hypervisor (including another copy of Xen) within a Xen
> > guest or to support a guest Operating System which uses hardware
> > virtualisation extensions (e.g. Windows XP compatibility mode on
> > more modern Windows OS). This option is disabled by default.
> >
> > "This option is disabled by default." doesn't mean "this is an
> > experimental feature with no security support and is likely to crash the
> > hypervisor".
>
> Correct, but this isn't the only place to look at. Quoting
> SUPPORT.md:

You expect everyone to memorize SUPPORT.md (almost 1000 lines) before
trying to use Xen? Your statement amounts to saying you really expect
that. People who want to get work done will look at `man xl.cfg` when
needed, and follow instructions.

Mentioning something in `man xl.cfg` amounts to a statement "this is
supported". Experimental/unsupported options need to be marked
"EXPERIMENTAL: DO NOT ENABLE IN PRODUCTION ENVIRONMENTS".

> Yet that's still a configuration error (of the guest), not a bug in
> Xen.

Documentation that poor amounts to a security vulnerability.

I would suggest this needs 2 extra enablers.

First, this has the potential to panic the hypervisor. As such there
needs to be an "enable_experimental=" option for the Xen command-line.
The argument would be a list of features to enable ("nestedhvm" for this
case). If this is absent, the hypervisor should ideally disable as much
of the code related to the unsupported/experimental features as possible.

Second, since this needs to be enabled per-domain, there should be a
similar "enable_experimental" setting for xl.cfg options.

I think this really is bad enough to warrant a security vulnerability
and updates to all branches.

-- 
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: HVM/PVH Balloon crash 2021-10-02 2:35 ` Elliott Mitchell @ 2021-10-07 7:20 ` Jan Beulich 0 siblings, 0 replies; 16+ messages in thread From: Jan Beulich @ 2021-10-07 7:20 UTC (permalink / raw) To: Elliott Mitchell; +Cc: xen-devel On 02.10.2021 04:35, Elliott Mitchell wrote: > On Thu, Sep 30, 2021 at 09:08:34AM +0200, Jan Beulich wrote: >> On 29.09.2021 17:31, Elliott Mitchell wrote: >>> >>> Copy and paste from the xl.cfg man page: >>> >>> nestedhvm=BOOLEAN >>> Enable or disables guest access to hardware virtualisation >>> features, e.g. it allows a guest Operating System to also function >>> as a hypervisor. You may want this option if you want to run >>> another hypervisor (including another copy of Xen) within a Xen >>> guest or to support a guest Operating System which uses hardware >>> virtualisation extensions (e.g. Windows XP compatibility mode on >>> more modern Windows OS). This option is disabled by default. >>> >>> "This option is disabled by default." doesn't mean "this is an >>> experimental feature with no security support and is likely to crash the >>> hypervisor". >> >> Correct, but this isn't the only place to look at. Quoting >> SUPPORT.md: > > You expect everyone to memorize SUPPORT.md (almost 1000 lines) before > trying to use Xen? I don't see why you say "memorize". When the file was introduced, it was (aiui) indeed the intention for _it_ to become the main reference. Feel free to propose alternatives. > Your statement amounts to saying you really expect that. People who want > to get work done will look at `man xl.cfg` when needed, and follow > instructions. > > Mentioning something in `man xl.cfg` amounts to a statment "this is > supported". Experimental/unsupported options need to be marked > "EXPERIMENTAL: DO NOT ENABLE IN PRODUCTION ENVIRONMENTS". > > >> Yet that's still a configuration error (of the guest), not a bug in >> Xen. > > Documentation that poor amounts to a security vulnerability. I disagree. > I would suggest this needs 2 extra enablers. > > First, this has potential to panic the hypervisor. As such there needs > to be an "enable_experimental=" option for the Xen command-line. The > argument would be a list of features to enable ("nestedhvm" for this > case). If this is absent, the hypervisor should ideally disable as much > of the code related to the unsupported/experimental features as possible. > > Second, since this needs to be enabled per-domain, there should be a > similar "enable_experimental" setting for xl.cfg options. > > > > I think this really is bad enough to warrant a security vulnerability > and updates to all branches. As above, I don't think I agree. But please feel free to propose patches. What I'm personally more curious about is whether the patch I did send you actually made a difference. Jan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: HVM/PVH Balloon crash 2021-09-26 22:53 ` Elliott Mitchell 2021-09-29 13:32 ` Jan Beulich @ 2021-09-30 7:43 ` Jan Beulich 1 sibling, 0 replies; 16+ messages in thread From: Jan Beulich @ 2021-09-30 7:43 UTC (permalink / raw) To: Elliott Mitchell; +Cc: xen-devel On 27.09.2021 00:53, Elliott Mitchell wrote: > On Wed, Sep 15, 2021 at 08:05:05AM +0200, Jan Beulich wrote: >> On 15.09.2021 04:40, Elliott Mitchell wrote: >>> On Tue, Sep 07, 2021 at 05:57:10PM +0200, Jan Beulich wrote: >>>> On 07.09.2021 17:03, Elliott Mitchell wrote: >>>>> Could be this system is in an >>>>> intergenerational hole, and some spot in the PVH/HVM code makes an >>>>> assumption of the presence of NPT guarantees presence of an operational >>>>> IOMMU. Otherwise if there was some copy and paste while writing IOMMU >>>>> code, some portion of the IOMMU code might be checking for presence of >>>>> NPT instead of presence of IOMMU. >>>> >>>> This is all very speculative; I consider what you suspect not very likely, >>>> but also not entirely impossible. This is not the least because for a >>>> long time we've been running without shared page tables on AMD. >>>> >>>> I'm afraid without technical data and without knowing how to repro, I >>>> don't see a way forward here. >>> >>> Downtimes are very expensive even for lower-end servers. Plus there is >>> the issue the system wasn't meant for development and thus never had >>> appropriate setup done. >>> >>> Experimentation with a system of similar age suggested another candidate. >>> System has a conventional BIOS. Might some dependancies on the presence >>> of UEFI snuck into the NPT code? >> >> I can't think of any such, but as all of this is very nebulous I can't >> really rule out anything. > > Getting everything right to recreate is rather inexact. Having an > equivalent of `sysctl` to turn on the serial console while running might > be handy... > > Luckily get things together and... > > (XEN) mm locking order violation: 48 > 16 > (XEN) Xen BUG at mm-locks.h:82 Would you give the patch below a try? While against current staging it looks to apply fine to 4.14.3. Jan x86/PoD: defer nested P2M flushes With NPT or shadow in use, the p2m_set_entry() -> p2m_pt_set_entry() -> write_p2m_entry() -> p2m_flush_nestedp2m() call sequence triggers a lock order violation when the PoD lock is held around it. Hence such flushing needs to be deferred. Steal the approach from p2m_change_type_range(). 
Reported-by: Elliott Mitchell <ehem+xen@m5p.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/mm/p2m-pod.c +++ b/xen/arch/x86/mm/p2m-pod.c @@ -24,6 +24,7 @@ #include <xen/mm.h> #include <xen/sched.h> #include <xen/trace.h> +#include <asm/hvm/nestedhvm.h> #include <asm/page.h> #include <asm/paging.h> #include <asm/p2m.h> @@ -494,6 +495,13 @@ p2m_pod_offline_or_broken_replace(struct static int p2m_pod_zero_check_superpage(struct p2m_domain *p2m, gfn_t gfn); +static void pod_unlock_and_flush(struct p2m_domain *p2m) +{ + pod_unlock(p2m); + p2m->defer_nested_flush = false; + if ( nestedhvm_enabled(p2m->domain) ) + p2m_flush_nestedp2m(p2m->domain); +} /* * This function is needed for two reasons: @@ -514,6 +522,7 @@ p2m_pod_decrease_reservation(struct doma gfn_lock(p2m, gfn, order); pod_lock(p2m); + p2m->defer_nested_flush = true; /* * If we don't have any outstanding PoD entries, let things take their @@ -665,7 +674,7 @@ out_entry_check: } out_unlock: - pod_unlock(p2m); + pod_unlock_and_flush(p2m); gfn_unlock(p2m, gfn, order); return ret; } @@ -1144,8 +1153,10 @@ p2m_pod_demand_populate(struct p2m_domai * won't start until we're done. */ if ( unlikely(d->is_dying) ) - goto out_fail; - + { + pod_unlock(p2m); + return false; + } /* * Because PoD does not have cache list for 1GB pages, it has to remap @@ -1167,6 +1178,8 @@ p2m_pod_demand_populate(struct p2m_domai p2m_populate_on_demand, p2m->default_access); } + p2m->defer_nested_flush = true; + /* Only reclaim if we're in actual need of more cache. */ if ( p2m->pod.entry_count > p2m->pod.count ) pod_eager_reclaim(p2m); @@ -1229,8 +1242,9 @@ p2m_pod_demand_populate(struct p2m_domai __trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t); } - pod_unlock(p2m); + pod_unlock_and_flush(p2m); return true; + out_of_memory: pod_unlock(p2m); @@ -1239,12 +1253,14 @@ out_of_memory: p2m->pod.entry_count, current->domain->domain_id); domain_crash(d); return false; + out_fail: - pod_unlock(p2m); + pod_unlock_and_flush(p2m); return false; + remap_and_retry: BUG_ON(order != PAGE_ORDER_2M); - pod_unlock(p2m); + pod_unlock_and_flush(p2m); /* * Remap this 2-meg region in singleton chunks. See the comment on the ^ permalink raw reply [flat|nested] 16+ messages in thread
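One possible way to give the draft patch a spin against a 4.14.3 source tree is
sketched below; the patch file name is made up, and a distribution package build
would differ:

    cd xen-4.14.3
    patch -p1 < x86-pod-defer-nested-p2m-flushes.patch
    make -C xen -j$(nproc)      # rebuild just the hypervisor
    # install the resulting xen.gz/xen.efi, reboot into it, and retry the
    # PoD (memory != maxmem) guest with nestedhvm enabled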
end of thread, other threads: [~2021-10-07 7:21 UTC | newest]

Thread overview: 16+ messages
2021-09-05 22:10 HVM/PVH Balloon crash Elliott Mitchell
2021-09-06 7:52 ` Jan Beulich
2021-09-06 20:47 ` HVM/PVH Balloon crash Elliott Mitchell
2021-09-07 8:03 ` Jan Beulich
2021-09-07 15:03 ` Elliott Mitchell
2021-09-07 15:57 ` Jan Beulich
2021-09-07 21:40 ` Elliott Mitchell
2021-09-15 2:40 ` Elliott Mitchell
2021-09-15 6:05 ` Jan Beulich
2021-09-26 22:53 ` Elliott Mitchell
2021-09-29 13:32 ` Jan Beulich
2021-09-29 15:31 ` Elliott Mitchell
2021-09-30 7:08 ` Jan Beulich
2021-10-02 2:35 ` Elliott Mitchell
2021-10-07 7:20 ` Jan Beulich
2021-09-30 7:43 ` Jan Beulich