* [BUG] Linux pvh vm not getting destroyed on shutdown
@ 2021-02-13 15:36 Maximilian Engelhardt
  2021-02-13 18:21 ` Elliott Mitchell
  2021-02-17 18:19 ` Maximilian Engelhardt
  0 siblings, 2 replies; 6+ messages in thread
From: Maximilian Engelhardt @ 2021-02-13 15:36 UTC (permalink / raw)
  To: xen-devel; +Cc: pkg-xen-devel

Hi,

After a recent upgrade of one of our test systems to Debian Bullseye, we
noticed that when a PVH VM shuts down, Xen does not destroy it
automatically. The VM can still be destroyed manually by issuing an
'xl destroy $vm' command.

We can reproduce the hang reliably with the following vm configuration:

type = 'pvh'
memory = '512'
kernel = '/usr/lib/grub-xen/grub-i386-xen_pvh.bin'
[... disk/name/vif ]
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'
vcpus = '1'
maxvcpus = '2'

and then issuing a shutdown command in the VM (e.g. by calling 'poweroff').


Here are some things I noticed while trying to debug this issue:

* It happens on a Debian buster dom0 as well as on a bullseye dom0

* It seems to only affect pvh vms.

* shutdown from the pvgrub menu ("c" -> "halt") does work

* the vm seems to shut down normally, the last lines in the console are:

[  228.461167] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[  228.476794] systemd-shutdown[1]: Syncing filesystems and block devices.
[  228.477878] systemd-shutdown[1]: Powering off.
[  233.709498] xenbus_probe_frontend: xenbus_frontend_dev_shutdown: device/vif/0 timeout closing device
[  233.745642] reboot: System halted

* issuing a reboot instead of a shutdown does work fine.

* The issue started with Debian kernel 5.8.3+1~exp1 running in the vm, Debian 
kernel 5.7.17-1 does not show the issue.

* setting vcpus equal to maxvcpus does *not* show the hang.
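
The observations above suggest a simple pattern: a PVH guest whose vcpus value is lower than maxvcpus. A minimal Python sketch for flagging such configs follows. It is not part of the original report. The helper names, the sample text and the fallback defaults (vcpus defaulting to 1, maxvcpus defaulting to vcpus) are assumptions made for illustration, and the parser only handles flat `key = 'value'` lines like the ones shown in this report.

```python
import re


def parse_xl_cfg(text):
    """Collect simple `key = 'value'` assignments from an xl-style config."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r"""\s*(\w+)\s*=\s*['"]?([^'"#]*)['"]?""", line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings


def matches_reported_pattern(text):
    """True if the config is a PVH guest with vcpus < maxvcpus."""
    cfg = parse_xl_cfg(text)
    if cfg.get("type") != "pvh":
        return False
    # Assumed defaults for this sketch: vcpus=1, maxvcpus=vcpus.
    vcpus = int(cfg.get("vcpus", "1"))
    maxvcpus = int(cfg.get("maxvcpus", str(vcpus)))
    return vcpus < maxvcpus


# Reduced copy of the configuration from this report (disk/name/vif omitted).
SAMPLE = """\
type = 'pvh'
memory = '512'
on_poweroff = 'destroy'
vcpus = '1'
maxvcpus = '2'
"""

if __name__ == "__main__":
    print(matches_reported_pattern(SAMPLE))
```

A match only means the config resembles the one in this report; it does not prove that the guest will hang.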


Below is the output of "xl debug-keys q; xl dmesg" for the affected vm in the
'hang' state. andyhhp on #xen suggested attaching it to this bug report:

(XEN) General information for domain 55:
(XEN)     refcnt=3 dying=0 pause_count=0
(XEN)     nr_pages=131088 xenheap_pages=4 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=131328
(XEN)     handle=275e3a73-247f-4649-af86-6d5c0c72e8e4 vm_assist=00000020
(XEN)     paging assistance: hap refcounts translate external 
(XEN) Rangesets belonging to domain 55:
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN)     I/O Ports  { }
(XEN)     log-dirty  { }
(XEN) Memory pages belonging to domain 55:
(XEN)     DomPage list too long to display
(XEN)     PoD entries=0 cachesize=0
(XEN)     XenPage 0000000000080125: caf=c000000000000001, taf=e400000000000001
(XEN)     XenPage 00000000001412c9: caf=c000000000000001, taf=e400000000000001
(XEN)     XenPage 0000000000140da0: caf=c000000000000001, taf=e400000000000001
(XEN)     XenPage 0000000000140d9a: caf=c000000000000001, taf=e400000000000001
(XEN)     ExtraPage 00000000001412d3: caf=8040000000000002, taf=e400000000000001
(XEN) NODE affinity for domain 55: [0]
(XEN) VCPU information and callbacks for domain 55:
(XEN)   UNIT0 affinities: hard={0-7} soft={0-3}
(XEN)     VCPU0: CPU2 [has=F] poll=0 upcall_pend=01 upcall_mask=00 
(XEN)     pause_count=0 pause_flags=2
(XEN)     paging assistance: hap, 4 levels
(XEN) No periodic timer
(XEN)   UNIT1 affinities: hard={0-7} soft={0-3}
(XEN)     VCPU1: CPU1 [has=F] poll=0 upcall_pend=00 upcall_mask=00 
(XEN)     pause_count=0 pause_flags=1
(XEN)     paging assistance: hap, 4 levels
(XEN) No periodic timer
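
When comparing several such dumps, it can help to pull the per-vCPU pause state out mechanically. The Python sketch below is not part of the original report. The function name and regular expressions are assumptions based only on the line layout shown above, and pause_flags is assumed to be printed in hexadecimal. The sketch extracts the values and does not try to interpret the individual flag bits.

```python
import re

VCPU_RE = re.compile(r"VCPU(\d+): CPU(\d+)")
FLAGS_RE = re.compile(r"pause_count=(\d+) pause_flags=([0-9a-fA-F]+)")


def vcpu_pause_state(dump):
    """Map vCPU id -> (pause_count, pause_flags) from 'xl dmesg' text."""
    states = {}
    current = None
    for line in dump.splitlines():
        vcpu = VCPU_RE.search(line)
        if vcpu:
            # Remember which vCPU the next pause_count/pause_flags line is for.
            current = int(vcpu.group(1))
            continue
        flags = FLAGS_RE.search(line)
        if flags and current is not None:
            # pause_flags is assumed to be hexadecimal in this sketch.
            states[current] = (int(flags.group(1)), int(flags.group(2), 16))
            current = None
    return states


# Lines copied from the dump in this report.
SAMPLE = """\
(XEN)     refcnt=3 dying=0 pause_count=0
(XEN)   UNIT0 affinities: hard={0-7} soft={0-3}
(XEN)     VCPU0: CPU2 [has=F] poll=0 upcall_pend=01 upcall_mask=00
(XEN)     pause_count=0 pause_flags=2
(XEN)   UNIT1 affinities: hard={0-7} soft={0-3}
(XEN)     VCPU1: CPU1 [has=F] poll=0 upcall_pend=00 upcall_mask=00
(XEN)     pause_count=0 pause_flags=1
"""

if __name__ == "__main__":
    for vcpu_id, (count, flags) in sorted(vcpu_pause_state(SAMPLE).items()):
        print(f"VCPU{vcpu_id}: pause_count={count} pause_flags={flags:#x}")
```

The domain-level pause_count line is skipped because it carries no pause_flags field.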


Please let me know if more information is necessary.

Thanks,
Maxi


* Re: [BUG] Linux pvh vm not getting destroyed on shutdown
  2021-02-13 15:36 [BUG] Linux pvh vm not getting destroyed on shutdown Maximilian Engelhardt
@ 2021-02-13 18:21 ` Elliott Mitchell
  2021-02-14 22:45   ` Maximilian Engelhardt
  2021-02-17 18:19 ` Maximilian Engelhardt
  1 sibling, 1 reply; 6+ messages in thread
From: Elliott Mitchell @ 2021-02-13 18:21 UTC (permalink / raw)
  To: Maximilian Engelhardt; +Cc: xen-devel, pkg-xen-devel

On Sat, Feb 13, 2021 at 04:36:24PM +0100, Maximilian Engelhardt wrote:
> after a recent upgrade of one of our test systems to Debian Bullseye we 
> noticed an issue where on shutdown of a pvh vm the vm was not destroyed by xen 
> automatically. It could still be destroyed by manually issuing a 'xl destroy 
> $vm' command.

Usually I would expect such an issue to show on the Debian bug database
before xen-devel.  In particular as this is a behavior change with
security updates, there is a good chance this isn't attributable to the
Xen Project.  Additionally the Xen Project's support window is rather
narrow.  I've been observing the same (or similar) issue for a bit too.


> Here are some things I noticed while trying to debug this issue:
> 
> * It happens on a Debian buster dom0 as well as on a bullseye dom0

I stick with stable on non-development machines, so I can't say anything
to this.

> * It seems to only affect pvh vms.

I've observed it with pv and hvm VMs as well.

> * shutdown from the pvgrub menu ("c" -> "halt") does work

Woah!  That is quite the observation.  Since I had a handy opportunity
I tried this and this reproduces for me.

> * the vm seems to shut down normal, the last lines in the console are:

I agree with this.  Everything appears typical until the last moment.

> * issuing a reboot instead of a shutdown does work fine.

I disagree with this.  I'm seeing the issue occur with restart attempts
too.

> * The issue started with Debian kernel 5.8.3+1~exp1 running in the vm, Debian 
> kernel 5.7.17-1 does not show the issue.

I think the first kernel update during which I saw the issue was around
linux-image-4.19.0-12-amd64 or linux-image-4.19.0-13-amd64.  I think
the last security update to the Xen packages was in a similar timeframe,
though, so treat this portion as unreliable.  I can definitely state
this occurs with Debian's linux-image-4.19.0-13-amd64 and kernels built
from corresponding source; it may have shown up earlier.

> * setting vcpus equal to maxvcpus does *not* show the hang.

I haven't tried things related to this, so I can't comment on this
part.


Fresh observation.  During a similar timeframe I started noticing VM
creation leaving a `xl create` process behind.  I had discovered this
process could be freely killed without appearing to affect the VM and had
thus been doing so (memory in a lean Dom0 is precious).

While typing this I realized there was another scenario I needed to try.
It turns out that if I boot PV GRUB, get to its command-line (press 'c'),
leave the VM console, kill the `xl create` process, return to the console
and type "halt", the result is a hung VM.

Are you perhaps either killing the `xl create` process for affected VMs,
or migrating the VM and thus splitting the `xl create` process from the
affected VMs?

This seems more a Debian issue than a Xen Project issue right now.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: [BUG] Linux pvh vm not getting destroyed on shutdown
  2021-02-13 18:21 ` Elliott Mitchell
@ 2021-02-14 22:45   ` Maximilian Engelhardt
  2021-02-15  3:27     ` Elliott Mitchell
  0 siblings, 1 reply; 6+ messages in thread
From: Maximilian Engelhardt @ 2021-02-14 22:45 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: xen-devel, pkg-xen-devel

On Saturday, 13 February 2021 19:21:56 CET Elliott Mitchell wrote:
> On Sat, Feb 13, 2021 at 04:36:24PM +0100, Maximilian Engelhardt wrote:
> > after a recent upgrade of one of our test systems to Debian Bullseye we
> > noticed an issue where on shutdown of a pvh vm the vm was not destroyed by
> > xen automatically. It could still be destroyed by manually issuing a 'xl
> > destroy $vm' command.
> 
> Usually I would expect such an issue to show on the Debian bug database
> before xen-devel.  In particular as this is a behavior change with
> security updates, there is a good chance this isn't attributable to the
> Xen Project.  Additionally the Xen Project's support window is rather
> narrow.  I've been observing the same (or similar) issue for a bit too.

I posted this bug report to the xen-devel list because I was told to do so on
the upstream #xen IRC channel.
Before writing my mail, I also checked the Debian kernel packaging for 
anything that might be related to our issue, but could not find anything.
Please note we didn't observe any behavior change in Debian buster on our 
systems and also didn't notice the shutdown issue there. For us the issue 
only started with kernel version 5.8.3+1~exp1.

> > Here are some things I noticed while trying to debug this issue:
> > 
> > * It happens on a Debian buster dom0 as well as on a bullseye dom0
> 
> I stick with stable on non-development machines, so I can't say anything
> to this.
> 
> > * It seems to only affect pvh vms.
> 
> I've observed it with pv and hvm VMs as well.
> 
> > * shutdown from the pvgrub menu ("c" -> "halt") does work
> 
> Woah!  That is quite the observation.  Since I had a handy opportunity
> I tried this and this reproduces for me.
> 
> > * the vm seems to shut down normal, the last lines in the console are:
> I agree with this.  Everything appears typical until the last moment.
> 
> > * issuing a reboot instead of a shutdown does work fine.
> 
> I disagree with this.  I'm seeing the issue occur with restart attempts
> too.
> 
> > * The issue started with Debian kernel 5.8.3+1~exp1 running in the vm,
> > Debian kernel 5.7.17-1 does not show the issue.
> 
> I think the first kernel update during which I saw the issue was around
> linux-image-4.19.0-12-amd64 or linux-image-4.19.0-13-amd64.  I think
> the last security update to the Xen packages was in a similar timeframe
> though.  Rate this portion as unreliable though.  I can definitely state
> this occurs with Debian's linux-image-4.19.0-13-amd64 and kernels built
> from corresponding source, this may have shown earlier.

We don't see any issues with the current Debian buster (Debian stable) kernel 
(4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux) and 
also did not notice any issues with the older kernel packages in buster. Also 
the security update of xen in buster did not cause any behavior change for us. 
In our case everything in buster is working as we expect it to work (using 
latest updates and security updates).

> > * setting vcpus equal to maxvcpus does *not* show the hang.
> 
> I haven't tried things related to this, so I can't comment on this
> part.
> 
> 
> Fresh observation.  During a similar timeframe I started noticing VM
> creation leaving a `xl create` process behind.  I had discovered this
> process could be freely killed without appearing to effect the VM and had
> thus been doing so (memory in a lean Dom0 is precious).
> 
> While typing this I realized there was another scenario I needed to try.
> Turns out if I boot PV GRUB and get to its command-line (press 'c'), then
> get away from the VM console, kill the `xl create` process, return to
> the console and type "halt".  This results in a hung VM.
> 
> Are you perhaps either killing the `xl create` process for effected VMs,
> or migrating the VM and thus splitting the `xl create` process from the
> effected VMs?
> 
> This seems more a Debian issue than a Xen Project issue right now.

We don't migrate the vms, we don't kill any processes running on the dom0 and 
I don't see anything in our logs indicating something gets killed on the dom0. 
On our systems the running 'xl create' processes only use very little memory.

Have you checked whether you still observe your hangs if you don't kill the
xl processes?


* Re: [BUG] Linux pvh vm not getting destroyed on shutdown
  2021-02-14 22:45   ` Maximilian Engelhardt
@ 2021-02-15  3:27     ` Elliott Mitchell
  2021-02-15  9:00       ` Roger Pau Monné
  0 siblings, 1 reply; 6+ messages in thread
From: Elliott Mitchell @ 2021-02-15  3:27 UTC (permalink / raw)
  To: Maximilian Engelhardt; +Cc: xen-devel, pkg-xen-devel

On Sun, Feb 14, 2021 at 11:45:47PM +0100, Maximilian Engelhardt wrote:
> On Saturday, 13 February 2021 19:21:56 CET Elliott Mitchell wrote:
> > On Sat, Feb 13, 2021 at 04:36:24PM +0100, Maximilian Engelhardt wrote:
> > > * The issue started with Debian kernel 5.8.3+1~exp1 running in the vm,
> > > Debian kernel 5.7.17-1 does not show the issue.
> > 
> > I think the first kernel update during which I saw the issue was around
> > linux-image-4.19.0-12-amd64 or linux-image-4.19.0-13-amd64.  I think
> > the last security update to the Xen packages was in a similar timeframe
> > though.  Rate this portion as unreliable though.  I can definitely state
> > this occurs with Debian's linux-image-4.19.0-13-amd64 and kernels built
> > from corresponding source, this may have shown earlier.
> 
> We don't see any issues with the current Debian buster (Debian stable) kernel 
> (4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux) and 
> also did not notice any issues with the older kernel packages in buster. Also 
> the security update of xen in buster did not cause any behavior change for us. 
> In our case everything in buster is working as we expect it to work (using 
> latest updates and security updates).

I can't really say much here.  I keep up to date and I cannot point to a
key ingredient as the one which caused this breakage.


> > Fresh observation.  During a similar timeframe I started noticing VM
> > creation leaving a `xl create` process behind.  I had discovered this
> > process could be freely killed without appearing to effect the VM and had
> > thus been doing so (memory in a lean Dom0 is precious).
> > 
> > While typing this I realized there was another scenario I needed to try.
> > Turns out if I boot PV GRUB and get to its command-line (press 'c'), then
> > get away from the VM console, kill the `xl create` process, return to
> > the console and type "halt".  This results in a hung VM.
> > 
> > Are you perhaps either killing the `xl create` process for effected VMs,
> > or migrating the VM and thus splitting the `xl create` process from the
> > effected VMs?
> > 
> > This seems more a Debian issue than a Xen Project issue right now.
> 
> We don't migrate the vms, we don't kill any processes running on the dom0 and 
> I don't see anything in our logs indicating something gets killed on the dom0. 
> On our systems the running 'xl create' processes only use very little memory.
> 
> Have you tried if you still observer your hangs if you don't kill the xl 
> processes?

That is exactly what I pointed to above.  On stable, killing the
mysterious left-behind `xl create` process causes the problem to
manifest, while leaving it undisturbed appears to make the problem not
manifest.

After a save/restore instead it is a `xl restore` process left behind.
I /suspect/ this plays a similar role, I'm unsure how far this goes
though.  Might you try telling a VM to reboot, then do a save followed
by a restore of it?

I'm curious whether respawning the `xl restore` could work around what is
occurring.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





* Re: [BUG] Linux pvh vm not getting destroyed on shutdown
  2021-02-15  3:27     ` Elliott Mitchell
@ 2021-02-15  9:00       ` Roger Pau Monné
  0 siblings, 0 replies; 6+ messages in thread
From: Roger Pau Monné @ 2021-02-15  9:00 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: Maximilian Engelhardt, xen-devel, pkg-xen-devel

On Sun, Feb 14, 2021 at 07:27:46PM -0800, Elliott Mitchell wrote:
> On Sun, Feb 14, 2021 at 11:45:47PM +0100, Maximilian Engelhardt wrote:
> > On Saturday, 13 February 2021 19:21:56 CET Elliott Mitchell wrote:
> > > On Sat, Feb 13, 2021 at 04:36:24PM +0100, Maximilian Engelhardt wrote:
> > > > * The issue started with Debian kernel 5.8.3+1~exp1 running in the vm,
> > > > Debian kernel 5.7.17-1 does not show the issue.
> > > 
> > > I think the first kernel update during which I saw the issue was around
> > > linux-image-4.19.0-12-amd64 or linux-image-4.19.0-13-amd64.  I think
> > > the last security update to the Xen packages was in a similar timeframe
> > > though.  Rate this portion as unreliable though.  I can definitely state
> > > this occurs with Debian's linux-image-4.19.0-13-amd64 and kernels built
> > > from corresponding source, this may have shown earlier.
> > 
> > We don't see any issues with the current Debian buster (Debian stable) kernel 
> > (4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux) and 
> > also did not notice any issues with the older kernel packages in buster. Also 
> > the security update of xen in buster did not cause any behavior change for us. 
> > In our case everything in buster is working as we expect it to work (using 
> > latest updates and security updates).
> 
> I can't really say much here.  I keep up to date and I cannot point to a
> key ingredient as the one which caused this breakage.
> 
> 
> > > Fresh observation.  During a similar timeframe I started noticing VM
> > > creation leaving a `xl create` process behind.  I had discovered this
> > > process could be freely killed without appearing to effect the VM and had
> > > thus been doing so (memory in a lean Dom0 is precious).
> > > 
> > > While typing this I realized there was another scenario I needed to try.
> > > Turns out if I boot PV GRUB and get to its command-line (press 'c'), then
> > > get away from the VM console, kill the `xl create` process, return to
> > > the console and type "halt".  This results in a hung VM.
> > > 
> > > Are you perhaps either killing the `xl create` process for effected VMs,
> > > or migrating the VM and thus splitting the `xl create` process from the
> > > effected VMs?
> > > 
> > > This seems more a Debian issue than a Xen Project issue right now.
> > 
> > We don't migrate the vms, we don't kill any processes running on the dom0 and 
> > I don't see anything in our logs indicating something gets killed on the dom0. 
> > On our systems the running 'xl create' processes only use very little memory.
> > 
> > Have you tried if you still observer your hangs if you don't kill the xl 
> > processes?
> 
> That is exactly what I pointed to above.  On stable killing the
> mysterious left behind `xl create` process causes the problem to
> manifest, while leaving it undisturbed appears to makes the problem not
> manifest.

You cannot kill the 'xl create' process, or else events for the domain
(like shutdown) won't be handled by the toolstack, and thus the domain
won't be destroyed when the guest shuts down. The same would happen if
the guest tries to reboot, it won't work properly because the reboot
request won't be handled by the toolstack as you have just killed the
xl process that's in charge of doing it.
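
To check whether the monitoring xl process for a domain is still around before expecting on_poweroff or on_reboot to be honoured, one could scan the dom0 process list. The Python sketch below is only an illustration. The function name is made up, the sample listing and its paths are hypothetical, and it assumes the input looks like the output of `ps -eo pid,args`.

```python
def find_xl_monitors(ps_output):
    """Return (pid, command line) for 'xl create' / 'xl restore' processes.

    Expects text shaped like the output of `ps -eo pid,args`.
    """
    monitors = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Need at least: pid, executable, subcommand. Skips the header line.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, argv = int(fields[0]), fields[1:]
        if argv[0].rsplit("/", 1)[-1] != "xl":
            continue
        # First non-option argument after the executable is the subcommand.
        sub = next((arg for arg in argv[1:] if not arg.startswith("-")), None)
        if sub in ("create", "restore"):
            monitors.append((pid, " ".join(argv)))
    return monitors


# Hypothetical process listing, for illustration only.
SAMPLE_PS = """\
  PID COMMAND
  812 /usr/sbin/sshd -D
 1432 /usr/lib/xen-4.14/bin/xl create /etc/xen/test.cfg
 2001 xl restore /var/tmp/test.save
"""

if __name__ == "__main__":
    for pid, cmd in find_xl_monitors(SAMPLE_PS):
        print(pid, cmd)
```

If no such process is listed for a running domain, shutdown and reboot events for it will not be handled, as explained above.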

Roger.



* Re: [BUG] Linux pvh vm not getting destroyed on shutdown
  2021-02-13 15:36 [BUG] Linux pvh vm not getting destroyed on shutdown Maximilian Engelhardt
  2021-02-13 18:21 ` Elliott Mitchell
@ 2021-02-17 18:19 ` Maximilian Engelhardt
  1 sibling, 0 replies; 6+ messages in thread
From: Maximilian Engelhardt @ 2021-02-17 18:19 UTC (permalink / raw)
  To: xen-devel; +Cc: pkg-xen-devel

On Saturday, 13 February 2021 16:36:24 CET Maximilian Engelhardt wrote:
> Here are some things I noticed while trying to debug this issue:
> 
> * It happens on a Debian buster dom0 as well as on a bullseye dom0
> 
> * It seems to only affect pvh vms.
> 
> * shutdown from the pvgrub menu ("c" -> "halt") does work
> 
> * the vm seems to shut down normal, the last lines in the console are:
[...]
> 
> * issuing a reboot instead of a shutdown does work fine.
> 
> * The issue started with Debian kernel 5.8.3+1~exp1 running in the vm,
> Debian kernel 5.7.17-1 does not show the issue.
> 
> * setting vcpus equal to maxvcpus does *not* show the hang.

One thing I just realized I totally forgot to mention in my initial report is 
that this issue is present for us also on a modern kernel. We tested with 
Debian kernel 5.10.13-1.


