From: Elliott Mitchell <ehem+xen@m5p.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: xen-devel@lists.xenproject.org
Subject: Re: HVM/PVH Balloon crash
Date: Mon, 6 Sep 2021 13:47:26 -0700	[thread overview]
Message-ID: <YTZ+XsnoKNnV4IOz@mattapan.m5p.com> (raw)
In-Reply-To: <84d9137e-a268-c3d8-57d2-76fb596e00d3@suse.com>

On Mon, Sep 06, 2021 at 09:52:17AM +0200, Jan Beulich wrote:
> On 06.09.2021 00:10, Elliott Mitchell wrote:
> > I brought this up a while back, but it still appears to be present and
> > the latest observations appear rather serious.
> > 
> > I'm unsure of the entire set of conditions for reproduction.
> > 
> > Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
> > this is an older AMD IOMMU).
> > 
> > This has been confirmed with Xen 4.11 and Xen 4.14.  This includes
> > Debian's patches, but those are mostly backports or environment
> > adjustments.
> > 
> > Domain 0 is presently using a 4.19 kernel.
> > 
> > The trigger is creating an HVM or PVH domain where memory does not equal
> > maxmem.
> 
> I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
> allocations" submitted very early this year? There you said the issue
> was with a guest's maxmem exceeding host memory size. Here you seem to
> be talking of PoD in its normal form of use. Personally I use this
> all the time (unless enabling PCI pass-through for a guest, the two
> being incompatible). I've not observed any badness as severe as you've
> described.

I've got very little idea what is occurring, as I was expecting to be
doing ARM debugging, not x86 debugging.

I was starting to wonder whether this was widespread or not.  As such I
was reporting the factors which might be different in my environment.

The one which sticks out is that the computer has an older AMD processor
(are you a 100% Intel shop?).  The processor has the AMD NPT feature, but
a very early/limited IOMMU (according to Linux, "AMD IOMMUv2 functionality
not available").

Xen 4.14 refused to load the Domain 0 kernel as PVH (not enough of an
IOMMU).
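
For anyone wanting to compare hardware, something like the following from
Domain 0 should show what Xen and Linux detected (a rough sketch only; the
exact message text varies by version):

    # what Xen detected at boot
    xl dmesg | grep -i -e iommu -e amd-vi -e hap
    # virtualization capabilities as Xen reports them
    xl info | grep -e virt_caps -e xen_caps
    # what the Dom0 Linux kernel says about the AMD IOMMU
    dmesg | grep -i -e amd-vi -e iommu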


There is also the possibility Debian added a bad patch, but that seems
improbable since there would likely be far more bug reports if that were
the case.
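
For concreteness, the triggering configuration is nothing exotic.  A
minimal guest config along these lines should be enough (the name and
sizes here are just placeholder values):

    # guest.cfg -- memory < maxmem is what enables populate-on-demand (PoD)
    type   = "pvh"         # the same behaviour was observed with type = "hvm"
    name   = "podtest"
    vcpus  = 2
    memory = 2048          # MiB actually populated at start
    maxmem = 4096          # MiB ceiling the balloon driver may grow to

Setting memory equal to maxmem (i.e. not using ballooning) is the
workaround mentioned below.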


> > New observations:
> > 
> > I discovered this occurs with PVH domains in addition to HVM ones.
> > 
> > I got PVH GRUB operational.  PVH GRUB appeared to operate normally
> > and not trigger the crash/panic.
> > 
> > The crash/panic occurred some number of seconds after the Linux kernel
> > was loaded.
> > 
> > 
> > Mitigation by not using ballooning with HVM/PVH is workable, but this is
> > quite a large mine in the configuration.
> > 
> > I'm wondering if perhaps it is actually the Linux kernel in Domain 0
> > which is panicking.
> > 
> > The crash/panic occurring AFTER the main kernel loads suggests some
> > action the user domain is taking is the actual trigger of the
> > crash/panic.
> 
> All of this is pretty vague: If you don't even know what component it
> is that crashes / panics, I don't suppose you have any logs. Yet what
> do you expect us to do without any technical detail?

Initially this had looked spectacular enough that I expected it would be
easy to reproduce.

No logs; I wasn't expecting to be doing hardware-level debugging on x86.
I've got several USB to TTL-serial cables (for ARM/MIPS debugging), but I
may need to hunt down a USB to full-voltage EIA-232C cable.
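
Once I have a suitable cable, the plan is roughly the usual serial-console
setup so the next crash actually gets captured somewhere (a sketch only;
the port and baud rate obviously depend on the hardware):

    # Xen command line (via GRUB): hypervisor output to the first serial port
    com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all sync_console
    # Domain 0 Linux command line: use the Xen console
    console=hvc0
    # and, when Domain 0 survives, the hypervisor log can be read with:
    xl dmesg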


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




Thread overview: 16+ messages
2021-09-05 22:10 HVM/PVH Ballon crash Elliott Mitchell
2021-09-06  7:52 ` Jan Beulich
2021-09-06 20:47   ` Elliott Mitchell [this message]
2021-09-07  8:03     ` HVM/PVH Balloon crash Jan Beulich
2021-09-07 15:03       ` Elliott Mitchell
2021-09-07 15:57         ` Jan Beulich
2021-09-07 21:40           ` Elliott Mitchell
2021-09-15  2:40           ` Elliott Mitchell
2021-09-15  6:05             ` Jan Beulich
2021-09-26 22:53               ` Elliott Mitchell
2021-09-29 13:32                 ` Jan Beulich
2021-09-29 15:31                   ` Elliott Mitchell
2021-09-30  7:08                     ` Jan Beulich
2021-10-02  2:35                       ` Elliott Mitchell
2021-10-07  7:20                         ` Jan Beulich
2021-09-30  7:43                 ` Jan Beulich
