* null domains after xl destroy
@ 2017-04-11  5:25 Glenn Enright
  2017-04-11  5:59 ` Juergen Gross
  0 siblings, 1 reply; 28+ messages in thread
From: Glenn Enright @ 2017-04-11  5:25 UTC (permalink / raw)
  To: xen-devel

Hi all

We are seeing an odd issue with domU domains after xl destroy: under 
recent 4.9 kernels a (null) domain is left behind.

This has occurred on a variety of hardware, with no obvious commonality.

4.4.55 does not show this behavior.

On my test machine I have the following packages installed under 
centos6, from https://xen.crc.id.au/

~]# rpm -qa | grep xen
xen47-licenses-4.7.2-4.el6.x86_64
xen47-4.7.2-4.el6.x86_64
kernel-xen-4.9.21-1.el6xen.x86_64
xen47-ocaml-4.7.2-4.el6.x86_64
xen47-libs-4.7.2-4.el6.x86_64
xen47-libcacard-4.7.2-4.el6.x86_64
xen47-hypervisor-4.7.2-4.el6.x86_64
xen47-runtime-4.7.2-4.el6.x86_64
kernel-xen-firmware-4.9.21-1.el6xen.x86_64

I've also replicated the issue with 4.9.17 and 4.9.20

To replicate, on a cleanly booted dom0 with one pv VM, I run the 
following on the VM

{
while true; do
  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
done
}

Then on the dom0 I do this sequence to reliably get a null domain. This 
occurs with both oxenstored and xenstored.

{
xl sync 1
xl destroy 1
}

xl list then renders something like ...

(null)                                       1     4     4     --p--d       9.8     0

From what I can see it appears to be disk related. Affected VMs all use 
LVM storage for their boot disk. lvdisplay of the affected LV shows that 
it is being held open by something.

~]# lvdisplay test/test.img | grep open
   # open                 1

I've not been able to determine what is holding it open as yet. I tried 
lsof, dmsetup and various LVM tools. Waiting for the disk to be released 
does not work.
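
For reference, the sort of checks I ran look roughly like this (device 
paths are from my test setup and purely illustrative):

~]# dmsetup info /dev/test/test.img | grep -i 'open count'    # open count as device-mapper sees it
~]# ls /sys/block/$(basename "$(readlink -f /dev/test/test.img)")/holders/    # stacked devices holding the LV
~]# lsof /dev/test/test.img     # userspace processes with the device open (empty here)
~]# fuser -v /dev/test/test.img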

~]# xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1512     2     r-----      29.0
(null)                                       1     4     4     --p--d       9.8

xenstore-ls reports nothing for the null domain id that I can see.

I can later start and restart the affected domain, but direct operations 
on the LV, such as removing it, don't work since it is 'busy'.

Does anyone have thoughts on this? I'm happy to provide any output that 
might be useful. I'm not subscribed to this list, so please CC me.

Regards, Glenn
http://rimuhosting.com



* Re: null domains after xl destroy
  2017-04-11  5:25 null domains after xl destroy Glenn Enright
@ 2017-04-11  5:59 ` Juergen Gross
  2017-04-11  8:03   ` Glenn Enright
  0 siblings, 1 reply; 28+ messages in thread
From: Juergen Gross @ 2017-04-11  5:59 UTC (permalink / raw)
  To: glenn, xen-devel

On 11/04/17 07:25, Glenn Enright wrote:
> Hi all
> 
> We are seeing an odd issue with domu domains from xl destroy, under
> recent 4.9 kernels a (null) domain is left behind.

I guess this is the dom0 kernel version?

> This has occurred on a variety of hardware, with no obvious commonality.
> 
> 4.4.55 does not show this behavior.
> 
> On my test machine I have the following packages installed under
> centos6, from https://xen.crc.id.au/
> 
> ~]# rpm -qa | grep xen
> xen47-licenses-4.7.2-4.el6.x86_64
> xen47-4.7.2-4.el6.x86_64
> kernel-xen-4.9.21-1.el6xen.x86_64
> xen47-ocaml-4.7.2-4.el6.x86_64
> xen47-libs-4.7.2-4.el6.x86_64
> xen47-libcacard-4.7.2-4.el6.x86_64
> xen47-hypervisor-4.7.2-4.el6.x86_64
> xen47-runtime-4.7.2-4.el6.x86_64
> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
> 
> I've also replicated the issue with 4.9.17 and 4.9.20
> 
> To replicate, on a cleanly booted dom0 with one pv VM, I run the
> following on the VM
> 
> {
> while true; do
>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
> done
> }
> 
> Then on the dom0 I do this sequence to reliably get a null domain. This
> occurs with oxenstored and xenstored both.
> 
> {
> xl sync 1
> xl destroy 1
> }
> 
> xl list then renders something like ...
> 
> (null)                                       1     4     4     --p--d   
> 9.8     0

Something is referencing the domain, e.g. some of its memory pages are
still mapped by dom0.

> From what I can see it appears to be disk related. Affected VMs all use
> lvm storage for their boot disk. lvdisplay of the affected lv shows that
> the lv has is being help open by something.

How are the disks configured? The backend type is especially important.

> 
> ~]# lvdisplay test/test.img | grep open
>   # open                 1
> 
> I've not been able to determine what that thing is as yet. I tried lsof,
> dmsetup, various lv tools. Waiting for the disk to be released does not
> work.
> 
> ~]# xl list
> Name                                        ID   Mem VCPUs      State
> Time(s)
> Domain-0                                     0  1512     2     r-----  
> 29.0
> (null)                                       1     4     4     --p--d   
> 9.8
> 
> xenstore-ls reports nothing for the null domain id that I can see.

Any qemu process related to the domain still running?

Any dom0 kernel messages related to Xen?


Juergen



* Re: null domains after xl destroy
  2017-04-11  5:59 ` Juergen Gross
@ 2017-04-11  8:03   ` Glenn Enright
  2017-04-11  9:49     ` Dietmar Hahn
  0 siblings, 1 reply; 28+ messages in thread
From: Glenn Enright @ 2017-04-11  8:03 UTC (permalink / raw)
  To: Juergen Gross, xen-devel

On 11/04/17 17:59, Juergen Gross wrote:
> On 11/04/17 07:25, Glenn Enright wrote:
>> Hi all
>>
>> We are seeing an odd issue with domu domains from xl destroy, under
>> recent 4.9 kernels a (null) domain is left behind.
>
> I guess this is the dom0 kernel version?
>
>> This has occurred on a variety of hardware, with no obvious commonality.
>>
>> 4.4.55 does not show this behavior.
>>
>> On my test machine I have the following packages installed under
>> centos6, from https://xen.crc.id.au/
>>
>> ~]# rpm -qa | grep xen
>> xen47-licenses-4.7.2-4.el6.x86_64
>> xen47-4.7.2-4.el6.x86_64
>> kernel-xen-4.9.21-1.el6xen.x86_64
>> xen47-ocaml-4.7.2-4.el6.x86_64
>> xen47-libs-4.7.2-4.el6.x86_64
>> xen47-libcacard-4.7.2-4.el6.x86_64
>> xen47-hypervisor-4.7.2-4.el6.x86_64
>> xen47-runtime-4.7.2-4.el6.x86_64
>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
>>
>> I've also replicated the issue with 4.9.17 and 4.9.20
>>
>> To replicate, on a cleanly booted dom0 with one pv VM, I run the
>> following on the VM
>>
>> {
>> while true; do
>>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
>> done
>> }
>>
>> Then on the dom0 I do this sequence to reliably get a null domain. This
>> occurs with oxenstored and xenstored both.
>>
>> {
>> xl sync 1
>> xl destroy 1
>> }
>>
>> xl list then renders something like ...
>>
>> (null)                                       1     4     4     --p--d
>> 9.8     0
>
> Something is referencing the domain, e.g. some of its memory pages are
> still mapped by dom0.
>
>> From what I can see it appears to be disk related. Affected VMs all use
>> lvm storage for their boot disk. lvdisplay of the affected lv shows that
>> the lv has is being help open by something.
>
> How are the disks configured? Especially the backend type is important.
>
>>
>> ~]# lvdisplay test/test.img | grep open
>>   # open                 1
>>
>> I've not been able to determine what that thing is as yet. I tried lsof,
>> dmsetup, various lv tools. Waiting for the disk to be released does not
>> work.
>>
>> ~]# xl list
>> Name                                        ID   Mem VCPUs      State
>> Time(s)
>> Domain-0                                     0  1512     2     r-----
>> 29.0
>> (null)                                       1     4     4     --p--d
>> 9.8
>>
>> xenstore-ls reports nothing for the null domain id that I can see.
>
> Any qemu process related to the domain still running?
>
> Any dom0 kernel messages related to Xen?
>
>
> Juergen
>

Yep, 4.9 dom0 kernel

Typically we see an xl process still running, but that has already gone away 
in this case. The domU is a PV guest using a phy disk definition; the basic 
startup is like this...

xl -v create -f paramfile extra="console=hvc0 elevator=noop xen-blkfront.max=64"

There are no qemu processes or threads anywhere I can see.

I don't see any meaningful messages in the Linux kernel log, and nothing 
at all in the hypervisor log. Here is the output from the dom0 when 
starting and then stopping a domU using the above mechanism:

br0: port 2(vif3.0) entered disabled state
br0: port 2(vif4.0) entered blocking state
br0: port 2(vif4.0) entered disabled state
device vif4.0 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): vif4.0: link is not ready
xen-blkback: backend/vbd/4/51713: using 2 queues, protocol 1 
(x86_64-abi) persistent grants
xen-blkback: backend/vbd/4/51721: using 2 queues, protocol 1 
(x86_64-abi) persistent grants
vif vif-4-0 vif4.0: Guest Rx ready
IPv6: ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
br0: port 2(vif4.0) entered blocking state
br0: port 2(vif4.0) entered forwarding state
br0: port 2(vif4.0) entered disabled state
br0: port 2(vif4.0) entered disabled state
device vif4.0 left promiscuous mode
br0: port 2(vif4.0) entered disabled state

... here is xl info ...

host                   : xxxxxxxxxxxx
release                : 4.9.21-1.el6xen.x86_64
version                : #1 SMP Sat Apr 8 18:03:45 AEST 2017
machine                : x86_64
nr_cpus                : 4
max_cpu_id             : 3
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2394
hw_caps                : b7ebfbff:0000e3bd:20100800:00000001:00000000:00000000:00000000:00000000
virt_caps              :
total_memory           : 8190
free_memory            : 6577
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 7
xen_extra              : .2
xen_version            : 4.7.2
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : dom0_mem=1512M cpufreq=xen dom0_max_vcpus=2 dom0_vcpus_pin log_lvl=all guest_loglvl=all vcpu_migration_delay=1000
cc_compiler            : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
cc_compile_by          : mockbuild
cc_compile_domain      : (none)
cc_compile_date        : Mon Apr  3 12:17:20 AEST 2017
build_id               : 0ec32d14d7c34e5d9deaaf6e3b7ea0c8006d68fa
xend_config_format     : 4


# cat /proc/cmdline
ro root=UUID=xxxxxxxxxx rd_MD_UUID=xxxxxxxxxxxx rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_MD_UUID=xxxxxxxxxxxxx SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM rd_NO_DM rhgb quiet pcie_aspm=off panic=30 max_loop=64 dm_mod.use_blk_mq=y xen-blkfront.max=64

The domU is using an LVM volume on top of an MD RAID1 array, on directly 
connected HDDs. Nothing special hardware-wise. The disk line for that 
domU looks functionally like...

disk = [ 'phy:/dev/testlv/test.img,xvda1,w' ]

I would appreciate any suggestions on how to increase the debug level in 
a relevant way, or on where to look for more useful information about 
what is happening.
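
If it would help, I can also turn on extra debug output from blkback 
along these lines (a sketch only; it assumes the dom0 kernel was built 
with CONFIG_DYNAMIC_DEBUG):

~]# mount -t debugfs none /sys/kernel/debug 2>/dev/null
~]# echo 'module xen_blkback +p' > /sys/kernel/debug/dynamic_debug/control
~]# xl dmesg | tail -n 100    # hypervisor side is already at log_lvl=all guest_loglvl=all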

To clarify the actual shutdown sequence that causes problems...

# xl sysrq $id s
# xl destroy $id


Regards, Glenn



* Re: null domains after xl destroy
  2017-04-11  8:03   ` Glenn Enright
@ 2017-04-11  9:49     ` Dietmar Hahn
  2017-04-11 22:13       ` Glenn Enright
  0 siblings, 1 reply; 28+ messages in thread
From: Dietmar Hahn @ 2017-04-11  9:49 UTC (permalink / raw)
  To: xen-devel, glenn; +Cc: Juergen Gross

On Tuesday, 11 April 2017 at 20:03:14, Glenn Enright wrote:
> On 11/04/17 17:59, Juergen Gross wrote:
> > On 11/04/17 07:25, Glenn Enright wrote:
> >> Hi all
> >>
> >> We are seeing an odd issue with domu domains from xl destroy, under
> >> recent 4.9 kernels a (null) domain is left behind.
> >
> > I guess this is the dom0 kernel version?
> >
> >> This has occurred on a variety of hardware, with no obvious commonality.
> >>
> >> 4.4.55 does not show this behavior.
> >>
> >> On my test machine I have the following packages installed under
> >> centos6, from https://xen.crc.id.au/
> >>
> >> ~]# rpm -qa | grep xen
> >> xen47-licenses-4.7.2-4.el6.x86_64
> >> xen47-4.7.2-4.el6.x86_64
> >> kernel-xen-4.9.21-1.el6xen.x86_64
> >> xen47-ocaml-4.7.2-4.el6.x86_64
> >> xen47-libs-4.7.2-4.el6.x86_64
> >> xen47-libcacard-4.7.2-4.el6.x86_64
> >> xen47-hypervisor-4.7.2-4.el6.x86_64
> >> xen47-runtime-4.7.2-4.el6.x86_64
> >> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
> >>
> >> I've also replicated the issue with 4.9.17 and 4.9.20
> >>
> >> To replicate, on a cleanly booted dom0 with one pv VM, I run the
> >> following on the VM
> >>
> >> {
> >> while true; do
> >>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
> >> done
> >> }
> >>
> >> Then on the dom0 I do this sequence to reliably get a null domain. This
> >> occurs with oxenstored and xenstored both.
> >>
> >> {
> >> xl sync 1
> >> xl destroy 1
> >> }
> >>
> >> xl list then renders something like ...
> >>
> >> (null)                                       1     4     4     --p--d
> >> 9.8     0
> >
> > Something is referencing the domain, e.g. some of its memory pages are
> > still mapped by dom0.

You can try
# xl debug-keys q
and then
# xl dmesg
to see the output of the previous command. The 'q' key dumps domain
(and guest debug) info.
# xl debug-keys h
prints all possible debug keys, for more information.
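
For instance, to capture the dump in a file (the path is just an example):

# xl debug-keys q
# xl dmesg > /tmp/xen-debug-q.txt
# xl debug-keys h
# xl dmesg | tail -n 60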

Dietmar.

> >
> >> From what I can see it appears to be disk related. Affected VMs all use
> >> lvm storage for their boot disk. lvdisplay of the affected lv shows that
> >> the lv has is being help open by something.
> >
> > How are the disks configured? Especially the backend type is important.
> >
> >>
> >> ~]# lvdisplay test/test.img | grep open
> >>   # open                 1
> >>
> >> I've not been able to determine what that thing is as yet. I tried lsof,
> >> dmsetup, various lv tools. Waiting for the disk to be released does not
> >> work.
> >>
> >> ~]# xl list
> >> Name                                        ID   Mem VCPUs      State
> >> Time(s)
> >> Domain-0                                     0  1512     2     r-----
> >> 29.0
> >> (null)                                       1     4     4     --p--d
> >> 9.8
> >>
> >> xenstore-ls reports nothing for the null domain id that I can see.
> >
> > Any qemu process related to the domain still running?
> >
> > Any dom0 kernel messages related to Xen?
> >
> >
> > Juergen
> >
> 
> Yep, 4.9 dom0 kernel
> 
> Typically we see an xl process running, but that has already gone away 
> in this case. The domU is a PV guest using phy definition, the basic 
> startup is like this...
> 
> xl -v create -f paramfile extra="console=hvc0 elevator=noop 
> xen-blkfront.max=64"
> 
> There are no qemu processes or threads anywhere I can see.
> 
> I dont see any meaningful messages in the linux kernel log, and nothing 
> at all in the hypervisor log. Here is output from the dom0 starting and 
> then stopping a domU using the above mechanism
> 
> br0: port 2(vif3.0) entered disabled state
> br0: port 2(vif4.0) entered blocking state
> br0: port 2(vif4.0) entered disabled state
> device vif4.0 entered promiscuous mode
> IPv6: ADDRCONF(NETDEV_UP): vif4.0: link is not ready
> xen-blkback: backend/vbd/4/51713: using 2 queues, protocol 1 
> (x86_64-abi) persistent grants
> xen-blkback: backend/vbd/4/51721: using 2 queues, protocol 1 
> (x86_64-abi) persistent grants
> vif vif-4-0 vif4.0: Guest Rx ready
> IPv6: ADDRCONF(NETDEV_CHANGE): vif4.0: link becomes ready
> br0: port 2(vif4.0) entered blocking state
> br0: port 2(vif4.0) entered forwarding state
> br0: port 2(vif4.0) entered disabled state
> br0: port 2(vif4.0) entered disabled state
> device vif4.0 left promiscuous mode
> br0: port 2(vif4.0) entered disabled state
> 
> ... here is xl info ...
> 
> host                   : xxxxxxxxxxxx
> release                : 4.9.21-1.el6xen.x86_64
> version                : #1 SMP Sat Apr 8 18:03:45 AEST 2017
> machine                : x86_64
> nr_cpus                : 4
> max_cpu_id             : 3
> nr_nodes               : 1
> cores_per_socket       : 4
> threads_per_core       : 1
> cpu_mhz                : 2394
> hw_caps                : 
> b7ebfbff:0000e3bd:20100800:00000001:00000000:00000000:00000000:00000000
> virt_caps              :
> total_memory           : 8190
> free_memory            : 6577
> sharing_freed_memory   : 0
> sharing_used_memory    : 0
> outstanding_claims     : 0
> free_cpus              : 0
> xen_major              : 4
> xen_minor              : 7
> xen_extra              : .2
> xen_version            : 4.7.2
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          :
> xen_commandline        : dom0_mem=1512M cpufreq=xen dom0_max_vcpus=2 
> dom0_vcpus_pin log_lvl=all guest_loglvl=all vcpu_migration_delay=1000
> cc_compiler            : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
> cc_compile_by          : mockbuild
> cc_compile_domain      : (none)
> cc_compile_date        : Mon Apr  3 12:17:20 AEST 2017
> build_id               : 0ec32d14d7c34e5d9deaaf6e3b7ea0c8006d68fa
> xend_config_format     : 4
> 
> 
> # cat /proc/cmdline
> ro root=UUID=xxxxxxxxxx rd_MD_UUID=xxxxxxxxxxxx rd_NO_LUKS 
> KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_MD_UUID=xxxxxxxxxxxxx 
> SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_NO_LVM rd_NO_DM rhgb quiet 
> pcie_aspm=off panic=30 max_loop=64 dm_mod.use_blk_mq=y xen-blkfront.max=64
> 
> The domu is using an lvm on top of a md raid1 array, on direct connected 
> HDDs. Nothing special hardware wise. The disk line for that domU looks 
> functionally like...
> 
> disk = [ 'phy:/dev/testlv/test.img,xvda1,w' ]
> 
> I would appreciate any suggestions on how to increase the debug level in 
> a relevant way or where to look to get more useful information on what 
> is happening.
> 
> To clarify the actual shutdown sequence that causes problems...
> 
> # xl sysrq $id s
> # xl destroy $id
> 
> 
> Regards, Glenn
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

-- 
Company details: http://ts.fujitsu.com/imprint.html



* Re: null domains after xl destroy
  2017-04-11  9:49     ` Dietmar Hahn
@ 2017-04-11 22:13       ` Glenn Enright
  2017-04-11 22:23         ` Andrew Cooper
  0 siblings, 1 reply; 28+ messages in thread
From: Glenn Enright @ 2017-04-11 22:13 UTC (permalink / raw)
  To: Dietmar Hahn, xen-devel; +Cc: Juergen Gross

On 11/04/17 21:49, Dietmar Hahn wrote:
> Am Dienstag, 11. April 2017, 20:03:14 schrieb Glenn Enright:
>> On 11/04/17 17:59, Juergen Gross wrote:
>>> On 11/04/17 07:25, Glenn Enright wrote:
>>>> Hi all
>>>>
>>>> We are seeing an odd issue with domu domains from xl destroy, under
>>>> recent 4.9 kernels a (null) domain is left behind.
>>>
>>> I guess this is the dom0 kernel version?
>>>
>>>> This has occurred on a variety of hardware, with no obvious commonality.
>>>>
>>>> 4.4.55 does not show this behavior.
>>>>
>>>> On my test machine I have the following packages installed under
>>>> centos6, from https://xen.crc.id.au/
>>>>
>>>> ~]# rpm -qa | grep xen
>>>> xen47-licenses-4.7.2-4.el6.x86_64
>>>> xen47-4.7.2-4.el6.x86_64
>>>> kernel-xen-4.9.21-1.el6xen.x86_64
>>>> xen47-ocaml-4.7.2-4.el6.x86_64
>>>> xen47-libs-4.7.2-4.el6.x86_64
>>>> xen47-libcacard-4.7.2-4.el6.x86_64
>>>> xen47-hypervisor-4.7.2-4.el6.x86_64
>>>> xen47-runtime-4.7.2-4.el6.x86_64
>>>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
>>>>
>>>> I've also replicated the issue with 4.9.17 and 4.9.20
>>>>
>>>> To replicate, on a cleanly booted dom0 with one pv VM, I run the
>>>> following on the VM
>>>>
>>>> {
>>>> while true; do
>>>>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
>>>> done
>>>> }
>>>>
>>>> Then on the dom0 I do this sequence to reliably get a null domain. This
>>>> occurs with oxenstored and xenstored both.
>>>>
>>>> {
>>>> xl sync 1
>>>> xl destroy 1
>>>> }
>>>>
>>>> xl list then renders something like ...
>>>>
>>>> (null)                                       1     4     4     --p--d
>>>> 9.8     0
>>>
>>> Something is referencing the domain, e.g. some of its memory pages are
>>> still mapped by dom0.
>
> You can try
> # xl debug-keys q
> and further
> # xl dmesg
> to see the output of the previous command. The 'q' dumps domain
> (and guest debug) info.
> # xl debug-keys h
> prints all possible parameters for more informations.
>
> Dietmar.
>

I've done this as requested, below is the output.

(XEN) 'q' pressed -> dumping domain info (now=0x92:D6C271CE)
(XEN) General information for domain 0:
(XEN)     refcnt=3 dying=0 pause_count=0
(XEN)     nr_pages=387072 xenheap_pages=5 shared_pages=0 paged_pages=0 
dirty_cpus={0-1} max_pages=4294967295
(XEN)     handle=00000000-0000-0000-0000-000000000000 vm_assist=0000000d
(XEN) Rangesets belonging to domain 0:
(XEN)     I/O Ports  { 0-1f, 22-3f, 44-60, 62-9f, a2-cfb, d00-1007, 
100c-ffff }
(XEN)     log-dirty  { }
(XEN)     Interrupts { 1-30 }
(XEN)     I/O Memory { 0-fedff, fef00-ffffff }
(XEN) Memory pages belonging to domain 0:
(XEN)     DomPage list too long to display
(XEN)     XenPage 000000000020e9c5: caf=c000000000000002, 
taf=7400000000000002
(XEN)     XenPage 000000000020e9c4: caf=c000000000000001, 
taf=7400000000000001
(XEN)     XenPage 000000000020e9c3: caf=c000000000000001, 
taf=7400000000000001
(XEN)     XenPage 000000000020e9c2: caf=c000000000000001, 
taf=7400000000000001
(XEN)     XenPage 00000000000e7d2e: caf=c000000000000002, 
taf=7400000000000002
(XEN) NODE affinity for domain 0: [0]
(XEN) VCPU information and callbacks for domain 0:
(XEN)     VCPU0: CPU0 [has=T] poll=0 upcall_pend=01 upcall_mask=00 
dirty_cpus={0}
(XEN)     cpu_hard_affinity={0} cpu_soft_affinity={0-3}
(XEN)     pause_count=0 pause_flags=0
(XEN)     No periodic timer
(XEN)     VCPU1: CPU1 [has=T] poll=0 upcall_pend=00 upcall_mask=00 
dirty_cpus={1}
(XEN)     cpu_hard_affinity={1} cpu_soft_affinity={0-3}
(XEN)     pause_count=0 pause_flags=0
(XEN)     No periodic timer
(XEN) General information for domain 1:
(XEN)     refcnt=1 dying=2 pause_count=2
(XEN)     nr_pages=2114 xenheap_pages=0 shared_pages=0 paged_pages=0 
dirty_cpus={} max_pages=1280256
(XEN)     handle=a481c2eb-31e3-4ae6-9809-290e746c8eec vm_assist=0000000d
(XEN) Rangesets belonging to domain 1:
(XEN)     I/O Ports  { }
(XEN)     log-dirty  { }
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN) Memory pages belonging to domain 1:
(XEN)     DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001
(XEN) NODE affinity for domain 1: [0]
(XEN) VCPU information and callbacks for domain 1:
(XEN)     VCPU0: CPU0 [has=F] poll=0 upcall_pend=00 upcall_mask=01 
dirty_cpus={}
(XEN)     cpu_hard_affinity={0-3} cpu_soft_affinity={0-3}
(XEN)     pause_count=0 pause_flags=0
(XEN)     No periodic timer
(XEN)     VCPU1: CPU1 [has=F] poll=0 upcall_pend=00 upcall_mask=01 
dirty_cpus={}
(XEN)     cpu_hard_affinity={0-3} cpu_soft_affinity={0-3}
(XEN)     pause_count=0 pause_flags=0
(XEN)     No periodic timer
(XEN)     VCPU2: CPU2 [has=F] poll=0 upcall_pend=00 upcall_mask=01 
dirty_cpus={}
(XEN)     cpu_hard_affinity={0-3} cpu_soft_affinity={0-3}
(XEN)     pause_count=0 pause_flags=1
(XEN)     No periodic timer
(XEN)     VCPU3: CPU3 [has=F] poll=0 upcall_pend=00 upcall_mask=01 
dirty_cpus={}
(XEN)     cpu_hard_affinity={0-3} cpu_soft_affinity={0-3}
(XEN)     pause_count=0 pause_flags=0
(XEN)     No periodic timer
(XEN) Notifying guest 0:0 (virq 1, port 4)
(XEN) Notifying guest 0:1 (virq 1, port 10)
(XEN) Notifying guest 1:0 (virq 1, port 0)
(XEN) Notifying guest 1:1 (virq 1, port 0)
(XEN) Notifying guest 1:2 (virq 1, port 0)
(XEN) Notifying guest 1:3 (virq 1, port 0)
(XEN) Shared frames 0 -- Saved frames 0



* Re: null domains after xl destroy
  2017-04-11 22:13       ` Glenn Enright
@ 2017-04-11 22:23         ` Andrew Cooper
  2017-04-11 22:45           ` Glenn Enright
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Cooper @ 2017-04-11 22:23 UTC (permalink / raw)
  To: glenn, Dietmar Hahn, xen-devel; +Cc: Juergen Gross

On 11/04/2017 23:13, Glenn Enright wrote:
> On 11/04/17 21:49, Dietmar Hahn wrote:
>> Am Dienstag, 11. April 2017, 20:03:14 schrieb Glenn Enright:
>>> On 11/04/17 17:59, Juergen Gross wrote:
>>>> On 11/04/17 07:25, Glenn Enright wrote:
>>>>> Hi all
>>>>>
>>>>> We are seeing an odd issue with domu domains from xl destroy, under
>>>>> recent 4.9 kernels a (null) domain is left behind.
>>>>
>>>> I guess this is the dom0 kernel version?
>>>>
>>>>> This has occurred on a variety of hardware, with no obvious
>>>>> commonality.
>>>>>
>>>>> 4.4.55 does not show this behavior.
>>>>>
>>>>> On my test machine I have the following packages installed under
>>>>> centos6, from https://xen.crc.id.au/
>>>>>
>>>>> ~]# rpm -qa | grep xen
>>>>> xen47-licenses-4.7.2-4.el6.x86_64
>>>>> xen47-4.7.2-4.el6.x86_64
>>>>> kernel-xen-4.9.21-1.el6xen.x86_64
>>>>> xen47-ocaml-4.7.2-4.el6.x86_64
>>>>> xen47-libs-4.7.2-4.el6.x86_64
>>>>> xen47-libcacard-4.7.2-4.el6.x86_64
>>>>> xen47-hypervisor-4.7.2-4.el6.x86_64
>>>>> xen47-runtime-4.7.2-4.el6.x86_64
>>>>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
>>>>>
>>>>> I've also replicated the issue with 4.9.17 and 4.9.20
>>>>>
>>>>> To replicate, on a cleanly booted dom0 with one pv VM, I run the
>>>>> following on the VM
>>>>>
>>>>> {
>>>>> while true; do
>>>>>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
>>>>> done
>>>>> }
>>>>>
>>>>> Then on the dom0 I do this sequence to reliably get a null domain.
>>>>> This
>>>>> occurs with oxenstored and xenstored both.
>>>>>
>>>>> {
>>>>> xl sync 1
>>>>> xl destroy 1
>>>>> }
>>>>>
>>>>> xl list then renders something like ...
>>>>>
>>>>> (null)                                       1     4     4     --p--d
>>>>> 9.8     0
>>>>
>>>> Something is referencing the domain, e.g. some of its memory pages are
>>>> still mapped by dom0.
>>
>> You can try
>> # xl debug-keys q
>> and further
>> # xl dmesg
>> to see the output of the previous command. The 'q' dumps domain
>> (and guest debug) info.
>> # xl debug-keys h
>> prints all possible parameters for more informations.
>>
>> Dietmar.
>>
>
> I've done this as requested, below is the output.
>
> <snip>
> (XEN) Memory pages belonging to domain 1:
> (XEN)     DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
> (XEN)     DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001

There are 16 pages still referenced from somewhere.

Can you grab all of `xenstore-ls -f` in dom0?  There is probably a
backend still attached.
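
Something like this (illustrative; exact names will differ):

# xenstore-ls -f | grep -E 'backend|vbd|vif'
# ls -l /sys/bus/xen-backend/devices/

The second command checks whether the dom0 kernel still has backend
devices registered for the destroyed domid.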

~Andrew



* Re: null domains after xl destroy
  2017-04-11 22:23         ` Andrew Cooper
@ 2017-04-11 22:45           ` Glenn Enright
  2017-04-18  8:36             ` Juergen Gross
  0 siblings, 1 reply; 28+ messages in thread
From: Glenn Enright @ 2017-04-11 22:45 UTC (permalink / raw)
  To: Andrew Cooper, Dietmar Hahn, xen-devel; +Cc: Juergen Gross

On 12/04/17 10:23, Andrew Cooper wrote:
> On 11/04/2017 23:13, Glenn Enright wrote:
>> On 11/04/17 21:49, Dietmar Hahn wrote:
>>> Am Dienstag, 11. April 2017, 20:03:14 schrieb Glenn Enright:
>>>> On 11/04/17 17:59, Juergen Gross wrote:
>>>>> On 11/04/17 07:25, Glenn Enright wrote:
>>>>>> Hi all
>>>>>>
>>>>>> We are seeing an odd issue with domu domains from xl destroy, under
>>>>>> recent 4.9 kernels a (null) domain is left behind.
>>>>>
>>>>> I guess this is the dom0 kernel version?
>>>>>
>>>>>> This has occurred on a variety of hardware, with no obvious
>>>>>> commonality.
>>>>>>
>>>>>> 4.4.55 does not show this behavior.
>>>>>>
>>>>>> On my test machine I have the following packages installed under
>>>>>> centos6, from https://xen.crc.id.au/
>>>>>>
>>>>>> ~]# rpm -qa | grep xen
>>>>>> xen47-licenses-4.7.2-4.el6.x86_64
>>>>>> xen47-4.7.2-4.el6.x86_64
>>>>>> kernel-xen-4.9.21-1.el6xen.x86_64
>>>>>> xen47-ocaml-4.7.2-4.el6.x86_64
>>>>>> xen47-libs-4.7.2-4.el6.x86_64
>>>>>> xen47-libcacard-4.7.2-4.el6.x86_64
>>>>>> xen47-hypervisor-4.7.2-4.el6.x86_64
>>>>>> xen47-runtime-4.7.2-4.el6.x86_64
>>>>>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
>>>>>>
>>>>>> I've also replicated the issue with 4.9.17 and 4.9.20
>>>>>>
>>>>>> To replicate, on a cleanly booted dom0 with one pv VM, I run the
>>>>>> following on the VM
>>>>>>
>>>>>> {
>>>>>> while true; do
>>>>>>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
>>>>>> done
>>>>>> }
>>>>>>
>>>>>> Then on the dom0 I do this sequence to reliably get a null domain.
>>>>>> This
>>>>>> occurs with oxenstored and xenstored both.
>>>>>>
>>>>>> {
>>>>>> xl sync 1
>>>>>> xl destroy 1
>>>>>> }
>>>>>>
>>>>>> xl list then renders something like ...
>>>>>>
>>>>>> (null)                                       1     4     4     --p--d
>>>>>> 9.8     0
>>>>>
>>>>> Something is referencing the domain, e.g. some of its memory pages are
>>>>> still mapped by dom0.
>>>
>>> You can try
>>> # xl debug-keys q
>>> and further
>>> # xl dmesg
>>> to see the output of the previous command. The 'q' dumps domain
>>> (and guest debug) info.
>>> # xl debug-keys h
>>> prints all possible parameters for more informations.
>>>
>>> Dietmar.
>>>
>>
>> I've done this as requested, below is the output.
>>
>> <snip>
>> (XEN) Memory pages belonging to domain 1:
>> (XEN)     DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
>> (XEN)     DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001
>
> There are 16 pages still referenced from somewhere.
>
> Can you grab all of `xenstore-ls -f` in dom0?  There is probably a
> backend still attached.
>
> ~Andrew
>

Note this is under oxenstored presently.

# xenstore-ls -f
/local = ""
/local/domain = ""
/local/domain/0 = ""
/local/domain/0/control = ""
/local/domain/0/control/feature-poweroff = "1"
/local/domain/0/control/feature-reboot = "1"
/local/domain/0/control/feature-suspend = "1"
/local/domain/0/domid = "0"
/local/domain/0/name = "Domain-0"
/vm = ""
/libxl = ""

# xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1512     2     r-----      35.0
(null)                                       1     8     4     --p--d       8.5



* Re: null domains after xl destroy
  2017-04-11 22:45           ` Glenn Enright
@ 2017-04-18  8:36             ` Juergen Gross
  2017-04-19  1:02               ` Glenn Enright
  0 siblings, 1 reply; 28+ messages in thread
From: Juergen Gross @ 2017-04-18  8:36 UTC (permalink / raw)
  To: glenn, Andrew Cooper, Dietmar Hahn, xen-devel

[-- Attachment #1: Type: text/plain, Size: 3823 bytes --]

On 12/04/17 00:45, Glenn Enright wrote:
> On 12/04/17 10:23, Andrew Cooper wrote:
>> On 11/04/2017 23:13, Glenn Enright wrote:
>>> On 11/04/17 21:49, Dietmar Hahn wrote:
>>>> Am Dienstag, 11. April 2017, 20:03:14 schrieb Glenn Enright:
>>>>> On 11/04/17 17:59, Juergen Gross wrote:
>>>>>> On 11/04/17 07:25, Glenn Enright wrote:
>>>>>>> Hi all
>>>>>>>
>>>>>>> We are seeing an odd issue with domu domains from xl destroy, under
>>>>>>> recent 4.9 kernels a (null) domain is left behind.
>>>>>>
>>>>>> I guess this is the dom0 kernel version?
>>>>>>
>>>>>>> This has occurred on a variety of hardware, with no obvious
>>>>>>> commonality.
>>>>>>>
>>>>>>> 4.4.55 does not show this behavior.
>>>>>>>
>>>>>>> On my test machine I have the following packages installed under
>>>>>>> centos6, from https://xen.crc.id.au/
>>>>>>>
>>>>>>> ~]# rpm -qa | grep xen
>>>>>>> xen47-licenses-4.7.2-4.el6.x86_64
>>>>>>> xen47-4.7.2-4.el6.x86_64
>>>>>>> kernel-xen-4.9.21-1.el6xen.x86_64
>>>>>>> xen47-ocaml-4.7.2-4.el6.x86_64
>>>>>>> xen47-libs-4.7.2-4.el6.x86_64
>>>>>>> xen47-libcacard-4.7.2-4.el6.x86_64
>>>>>>> xen47-hypervisor-4.7.2-4.el6.x86_64
>>>>>>> xen47-runtime-4.7.2-4.el6.x86_64
>>>>>>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
>>>>>>>
>>>>>>> I've also replicated the issue with 4.9.17 and 4.9.20
>>>>>>>
>>>>>>> To replicate, on a cleanly booted dom0 with one pv VM, I run the
>>>>>>> following on the VM
>>>>>>>
>>>>>>> {
>>>>>>> while true; do
>>>>>>>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
>>>>>>> done
>>>>>>> }
>>>>>>>
>>>>>>> Then on the dom0 I do this sequence to reliably get a null domain.
>>>>>>> This
>>>>>>> occurs with oxenstored and xenstored both.
>>>>>>>
>>>>>>> {
>>>>>>> xl sync 1
>>>>>>> xl destroy 1
>>>>>>> }
>>>>>>>
>>>>>>> xl list then renders something like ...
>>>>>>>
>>>>>>> (null)                                       1     4     4    
>>>>>>> --p--d
>>>>>>> 9.8     0
>>>>>>
>>>>>> Something is referencing the domain, e.g. some of its memory pages
>>>>>> are
>>>>>> still mapped by dom0.
>>>>
>>>> You can try
>>>> # xl debug-keys q
>>>> and further
>>>> # xl dmesg
>>>> to see the output of the previous command. The 'q' dumps domain
>>>> (and guest debug) info.
>>>> # xl debug-keys h
>>>> prints all possible parameters for more informations.
>>>>
>>>> Dietmar.
>>>>
>>>
>>> I've done this as requested, below is the output.
>>>
>>> <snip>
>>> (XEN) Memory pages belonging to domain 1:
>>> (XEN)     DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
>>> (XEN)     DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001
>>
>> There are 16 pages still referenced from somewhere.

Just a wild guess: could you please try the attached kernel patch? This
might give us some more diagnostic data...


Juergen

[-- Attachment #2: debug.patch --]
[-- Type: text/x-patch, Size: 1426 bytes --]

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 8fe61b5dc5a6..304d5d130e0c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -313,7 +313,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 static void xen_blkif_free(struct xen_blkif *blkif)
 {
 
-	xen_blkif_disconnect(blkif);
+	WARN_ON(xen_blkif_disconnect(blkif));
 	xen_vbd_free(&blkif->vbd);
 
 	/* Make sure everything is drained before shutting down */
@@ -505,7 +505,7 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
 	dev_set_drvdata(&dev->dev, NULL);
 
 	if (be->blkif)
-		xen_blkif_disconnect(be->blkif);
+		WARN_ON(xen_blkif_disconnect(be->blkif));
 
 	/* Put the reference we set in xen_blkif_alloc(). */
 	xen_blkif_put(be->blkif);
@@ -792,7 +792,7 @@ static void frontend_changed(struct xenbus_device *dev,
 			 * Clean up so that memory resources can be used by
 			 * other devices. connect_ring reported already error.
 			 */
-			xen_blkif_disconnect(be->blkif);
+			WARN_ON(xen_blkif_disconnect(be->blkif));
 			break;
 		}
 		xen_update_blkif_status(be->blkif);
@@ -803,7 +803,7 @@ static void frontend_changed(struct xenbus_device *dev,
 		break;
 
 	case XenbusStateClosed:
-		xen_blkif_disconnect(be->blkif);
+		WARN_ON(xen_blkif_disconnect(be->blkif));
 		xenbus_switch_state(dev, XenbusStateClosed);
 		if (xenbus_dev_is_online(dev))
 			break;



* Re: null domains after xl destroy
  2017-04-18  8:36             ` Juergen Gross
@ 2017-04-19  1:02               ` Glenn Enright
  2017-04-19  4:39                 ` Juergen Gross
  0 siblings, 1 reply; 28+ messages in thread
From: Glenn Enright @ 2017-04-19  1:02 UTC (permalink / raw)
  To: Juergen Gross, Andrew Cooper, Dietmar Hahn, xen-devel

On 18/04/17 20:36, Juergen Gross wrote:
> On 12/04/17 00:45, Glenn Enright wrote:
>> On 12/04/17 10:23, Andrew Cooper wrote:
>>> On 11/04/2017 23:13, Glenn Enright wrote:
>>>> On 11/04/17 21:49, Dietmar Hahn wrote:
>>>>> Am Dienstag, 11. April 2017, 20:03:14 schrieb Glenn Enright:
>>>>>> On 11/04/17 17:59, Juergen Gross wrote:
>>>>>>> On 11/04/17 07:25, Glenn Enright wrote:
>>>>>>>> Hi all
>>>>>>>>
>>>>>>>> We are seeing an odd issue with domu domains from xl destroy, under
>>>>>>>> recent 4.9 kernels a (null) domain is left behind.
>>>>>>>
>>>>>>> I guess this is the dom0 kernel version?
>>>>>>>
>>>>>>>> This has occurred on a variety of hardware, with no obvious
>>>>>>>> commonality.
>>>>>>>>
>>>>>>>> 4.4.55 does not show this behavior.
>>>>>>>>
>>>>>>>> On my test machine I have the following packages installed under
>>>>>>>> centos6, from https://xen.crc.id.au/
>>>>>>>>
>>>>>>>> ~]# rpm -qa | grep xen
>>>>>>>> xen47-licenses-4.7.2-4.el6.x86_64
>>>>>>>> xen47-4.7.2-4.el6.x86_64
>>>>>>>> kernel-xen-4.9.21-1.el6xen.x86_64
>>>>>>>> xen47-ocaml-4.7.2-4.el6.x86_64
>>>>>>>> xen47-libs-4.7.2-4.el6.x86_64
>>>>>>>> xen47-libcacard-4.7.2-4.el6.x86_64
>>>>>>>> xen47-hypervisor-4.7.2-4.el6.x86_64
>>>>>>>> xen47-runtime-4.7.2-4.el6.x86_64
>>>>>>>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
>>>>>>>>
>>>>>>>> I've also replicated the issue with 4.9.17 and 4.9.20
>>>>>>>>
>>>>>>>> To replicate, on a cleanly booted dom0 with one pv VM, I run the
>>>>>>>> following on the VM
>>>>>>>>
>>>>>>>> {
>>>>>>>> while true; do
>>>>>>>>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
>>>>>>>> done
>>>>>>>> }
>>>>>>>>
>>>>>>>> Then on the dom0 I do this sequence to reliably get a null domain.
>>>>>>>> This
>>>>>>>> occurs with oxenstored and xenstored both.
>>>>>>>>
>>>>>>>> {
>>>>>>>> xl sync 1
>>>>>>>> xl destroy 1
>>>>>>>> }
>>>>>>>>
>>>>>>>> xl list then renders something like ...
>>>>>>>>
>>>>>>>> (null)                                       1     4     4
>>>>>>>> --p--d
>>>>>>>> 9.8     0
>>>>>>>
>>>>>>> Something is referencing the domain, e.g. some of its memory pages
>>>>>>> are
>>>>>>> still mapped by dom0.
>>>>>
>>>>> You can try
>>>>> # xl debug-keys q
>>>>> and further
>>>>> # xl dmesg
>>>>> to see the output of the previous command. The 'q' dumps domain
>>>>> (and guest debug) info.
>>>>> # xl debug-keys h
>>>>> prints all possible parameters for more informations.
>>>>>
>>>>> Dietmar.
>>>>>
>>>>
>>>> I've done this as requested, below is the output.
>>>>
>>>> <snip>
>>>> (XEN) Memory pages belonging to domain 1:
>>>> (XEN)     DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
>>>> (XEN)     DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001
>>>
>>> There are 16 pages still referenced from somewhere.
>
> Just a wild guess: could you please try the attached kernel patch? This
> might give us some more diagnostic data...
>
>
> Juergen
>

Thanks Juergen. I applied that to our 4.9.23 dom0 kernel, which still 
shows the issue. When replicating the leak I now see the trace below 
(via dmesg). Hopefully that is useful.

Please note, I'm going to be offline next week, but I am keen to keep on 
with this; it may just be a while before I follow up.

Regards, Glenn
http://rimuhosting.com


------------[ cut here ]------------
WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508 
xen_blkbk_remove+0x138/0x140
Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev 
xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4 
ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security 
iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc 
ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus 
i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy 
dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
Call Trace:
  [<ffffffff8136b61f>] dump_stack+0x67/0x98
  [<ffffffff8108007d>] __warn+0xfd/0x120
  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
  [<ffffffff814b93a2>] device_del+0x112/0x210
  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
  [<ffffffff814b94c2>] device_unregister+0x22/0x60
  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
  [<ffffffff81447760>] ? split+0xf0/0xf0
  [<ffffffff810a051d>] kthread+0xcd/0xf0
  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
---[ end trace ee097287c9865a62 ]---



* Re: null domains after xl destroy
  2017-04-19  1:02               ` Glenn Enright
@ 2017-04-19  4:39                 ` Juergen Gross
  2017-04-19  7:16                   ` Roger Pau Monné
  0 siblings, 1 reply; 28+ messages in thread
From: Juergen Gross @ 2017-04-19  4:39 UTC (permalink / raw)
  To: glenn, xen-devel, Konrad Rzeszutek Wilk, Roger Pau Monné
  Cc: Andrew Cooper, Dietmar Hahn

On 19/04/17 03:02, Glenn Enright wrote:
> On 18/04/17 20:36, Juergen Gross wrote:
>> On 12/04/17 00:45, Glenn Enright wrote:
>>> On 12/04/17 10:23, Andrew Cooper wrote:
>>>> On 11/04/2017 23:13, Glenn Enright wrote:
>>>>> On 11/04/17 21:49, Dietmar Hahn wrote:
>>>>>> Am Dienstag, 11. April 2017, 20:03:14 schrieb Glenn Enright:
>>>>>>> On 11/04/17 17:59, Juergen Gross wrote:
>>>>>>>> On 11/04/17 07:25, Glenn Enright wrote:
>>>>>>>>> Hi all
>>>>>>>>>
>>>>>>>>> We are seeing an odd issue with domu domains from xl destroy,
>>>>>>>>> under
>>>>>>>>> recent 4.9 kernels a (null) domain is left behind.
>>>>>>>>
>>>>>>>> I guess this is the dom0 kernel version?
>>>>>>>>
>>>>>>>>> This has occurred on a variety of hardware, with no obvious
>>>>>>>>> commonality.
>>>>>>>>>
>>>>>>>>> 4.4.55 does not show this behavior.
>>>>>>>>>
>>>>>>>>> On my test machine I have the following packages installed under
>>>>>>>>> centos6, from https://xen.crc.id.au/
>>>>>>>>>
>>>>>>>>> ~]# rpm -qa | grep xen
>>>>>>>>> xen47-licenses-4.7.2-4.el6.x86_64
>>>>>>>>> xen47-4.7.2-4.el6.x86_64
>>>>>>>>> kernel-xen-4.9.21-1.el6xen.x86_64
>>>>>>>>> xen47-ocaml-4.7.2-4.el6.x86_64
>>>>>>>>> xen47-libs-4.7.2-4.el6.x86_64
>>>>>>>>> xen47-libcacard-4.7.2-4.el6.x86_64
>>>>>>>>> xen47-hypervisor-4.7.2-4.el6.x86_64
>>>>>>>>> xen47-runtime-4.7.2-4.el6.x86_64
>>>>>>>>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
>>>>>>>>>
>>>>>>>>> I've also replicated the issue with 4.9.17 and 4.9.20
>>>>>>>>>
>>>>>>>>> To replicate, on a cleanly booted dom0 with one pv VM, I run the
>>>>>>>>> following on the VM
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>> while true; do
>>>>>>>>>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
>>>>>>>>> done
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Then on the dom0 I do this sequence to reliably get a null domain.
>>>>>>>>> This
>>>>>>>>> occurs with oxenstored and xenstored both.
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>> xl sync 1
>>>>>>>>> xl destroy 1
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> xl list then renders something like ...
>>>>>>>>>
>>>>>>>>> (null)                                       1     4     4
>>>>>>>>> --p--d
>>>>>>>>> 9.8     0
>>>>>>>>
>>>>>>>> Something is referencing the domain, e.g. some of its memory pages
>>>>>>>> are
>>>>>>>> still mapped by dom0.
>>>>>>
>>>>>> You can try
>>>>>> # xl debug-keys q
>>>>>> and further
>>>>>> # xl dmesg
>>>>>> to see the output of the previous command. The 'q' dumps domain
>>>>>> (and guest debug) info.
>>>>>> # xl debug-keys h
>>>>>> prints all possible parameters for more informations.
>>>>>>
>>>>>> Dietmar.
>>>>>>
>>>>>
>>>>> I've done this as requested, below is the output.
>>>>>
>>>>> <snip>
>>>>> (XEN) Memory pages belonging to domain 1:
>>>>> (XEN)     DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
>>>>> (XEN)     DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001
>>>>
>>>> There are 16 pages still referenced from somewhere.
>>
>> Just a wild guess: could you please try the attached kernel patch? This
>> might give us some more diagnostic data...
>>
>>
>> Juergen
>>
> 
> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still
> shows the issue. When replicating the leak I now see this trace (via
> dmesg). Hopefully that is useful.
> 
> Please note, I'm going to be offline next week, but am keen to keep on
> with this, it may just be a while before I followup is all.
> 
> Regards, Glenn
> http://rimuhosting.com
> 
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
> xen_blkbk_remove+0x138/0x140
> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy
> dm_mirror dm_region_hash dm_log dm_mod
> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
> Call Trace:
>  [<ffffffff8136b61f>] dump_stack+0x67/0x98
>  [<ffffffff8108007d>] __warn+0xfd/0x120
>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
>  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
>  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
>  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
>  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
>  [<ffffffff814b93a2>] device_del+0x112/0x210
>  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
>  [<ffffffff814b94c2>] device_unregister+0x22/0x60
>  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
>  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
>  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
>  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
>  [<ffffffff81447760>] ? split+0xf0/0xf0
>  [<ffffffff810a051d>] kthread+0xcd/0xf0
>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
> ---[ end trace ee097287c9865a62 ]---

Konrad, Roger,

this was triggered by a debug patch in xen_blkbk_remove():

	if (be->blkif)
-		xen_blkif_disconnect(be->blkif);
+		WARN_ON(xen_blkif_disconnect(be->blkif));

So I guess we need something like xen_blk_drain_io() for those calls to
xen_blkif_disconnect() which are not allowed to fail (either at the call
sites of xen_blkif_disconnect() or in this function, depending on a new
boolean parameter indicating it should wait for outstanding I/Os).

I can try a patch, but I'd appreciate it if you could confirm this wouldn't
add further problems...


Juergen




* Re: null domains after xl destroy
  2017-04-19  4:39                 ` Juergen Gross
@ 2017-04-19  7:16                   ` Roger Pau Monné
  2017-04-19  7:35                     ` Juergen Gross
  2017-04-19 10:09                     ` Juergen Gross
  0 siblings, 2 replies; 28+ messages in thread
From: Roger Pau Monné @ 2017-04-19  7:16 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Andrew Cooper, xen-devel, Dietmar Hahn, glenn

On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
> On 19/04/17 03:02, Glenn Enright wrote:
> > On 18/04/17 20:36, Juergen Gross wrote:
> >> On 12/04/17 00:45, Glenn Enright wrote:
> >>> On 12/04/17 10:23, Andrew Cooper wrote:
> >>>> On 11/04/2017 23:13, Glenn Enright wrote:
> >>>>> On 11/04/17 21:49, Dietmar Hahn wrote:
> >>>>>> Am Dienstag, 11. April 2017, 20:03:14 schrieb Glenn Enright:
> >>>>>>> On 11/04/17 17:59, Juergen Gross wrote:
> >>>>>>>> On 11/04/17 07:25, Glenn Enright wrote:
> >>>>>>>>> Hi all
> >>>>>>>>>
> >>>>>>>>> We are seeing an odd issue with domu domains from xl destroy,
> >>>>>>>>> under
> >>>>>>>>> recent 4.9 kernels a (null) domain is left behind.
> >>>>>>>>
> >>>>>>>> I guess this is the dom0 kernel version?
> >>>>>>>>
> >>>>>>>>> This has occurred on a variety of hardware, with no obvious
> >>>>>>>>> commonality.
> >>>>>>>>>
> >>>>>>>>> 4.4.55 does not show this behavior.
> >>>>>>>>>
> >>>>>>>>> On my test machine I have the following packages installed under
> >>>>>>>>> centos6, from https://xen.crc.id.au/
> >>>>>>>>>
> >>>>>>>>> ~]# rpm -qa | grep xen
> >>>>>>>>> xen47-licenses-4.7.2-4.el6.x86_64
> >>>>>>>>> xen47-4.7.2-4.el6.x86_64
> >>>>>>>>> kernel-xen-4.9.21-1.el6xen.x86_64
> >>>>>>>>> xen47-ocaml-4.7.2-4.el6.x86_64
> >>>>>>>>> xen47-libs-4.7.2-4.el6.x86_64
> >>>>>>>>> xen47-libcacard-4.7.2-4.el6.x86_64
> >>>>>>>>> xen47-hypervisor-4.7.2-4.el6.x86_64
> >>>>>>>>> xen47-runtime-4.7.2-4.el6.x86_64
> >>>>>>>>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
> >>>>>>>>>
> >>>>>>>>> I've also replicated the issue with 4.9.17 and 4.9.20
> >>>>>>>>>
> >>>>>>>>> To replicate, on a cleanly booted dom0 with one pv VM, I run the
> >>>>>>>>> following on the VM
> >>>>>>>>>
> >>>>>>>>> {
> >>>>>>>>> while true; do
> >>>>>>>>>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
> >>>>>>>>> done
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> Then on the dom0 I do this sequence to reliably get a null domain.
> >>>>>>>>> This
> >>>>>>>>> occurs with oxenstored and xenstored both.
> >>>>>>>>>
> >>>>>>>>> {
> >>>>>>>>> xl sync 1
> >>>>>>>>> xl destroy 1
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> xl list then renders something like ...
> >>>>>>>>>
> >>>>>>>>> (null)                                       1     4     4
> >>>>>>>>> --p--d
> >>>>>>>>> 9.8     0
> >>>>>>>>
> >>>>>>>> Something is referencing the domain, e.g. some of its memory pages
> >>>>>>>> are
> >>>>>>>> still mapped by dom0.
> >>>>>>
> >>>>>> You can try
> >>>>>> # xl debug-keys q
> >>>>>> and further
> >>>>>> # xl dmesg
> >>>>>> to see the output of the previous command. The 'q' dumps domain
> >>>>>> (and guest debug) info.
> >>>>>> # xl debug-keys h
> >>>>>> prints all possible parameters for more informations.
> >>>>>>
> >>>>>> Dietmar.
> >>>>>>
> >>>>>
> >>>>> I've done this as requested, below is the output.
> >>>>>
> >>>>> <snip>
> >>>>> (XEN) Memory pages belonging to domain 1:
> >>>>> (XEN)     DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
> >>>>> (XEN)     DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001
> >>>>
> >>>> There are 16 pages still referenced from somewhere.
> >>
> >> Just a wild guess: could you please try the attached kernel patch? This
> >> might give us some more diagnostic data...
> >>
> >>
> >> Juergen
> >>
> > 
> > Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still
> > shows the issue. When replicating the leak I now see this trace (via
> > dmesg). Hopefully that is useful.
> > 
> > Please note, I'm going to be offline next week, but am keen to keep on
> > with this, it may just be a while before I followup is all.
> > 
> > Regards, Glenn
> > http://rimuhosting.com
> > 
> > 
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
> > xen_blkbk_remove+0x138/0x140
> > Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
> > xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
> > ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
> > iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> > nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
> > ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
> > i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy
> > dm_mirror dm_region_hash dm_log dm_mod
> > CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
> > Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
> >  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
> >  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
> >  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
> > Call Trace:
> >  [<ffffffff8136b61f>] dump_stack+0x67/0x98
> >  [<ffffffff8108007d>] __warn+0xfd/0x120
> >  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
> >  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
> >  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
> >  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
> >  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
> >  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
> >  [<ffffffff814b93a2>] device_del+0x112/0x210
> >  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
> >  [<ffffffff814b94c2>] device_unregister+0x22/0x60
> >  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
> >  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
> >  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
> >  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
> >  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
> >  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
> >  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
> >  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
> >  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
> >  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
> >  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
> >  [<ffffffff81447760>] ? split+0xf0/0xf0
> >  [<ffffffff810a051d>] kthread+0xcd/0xf0
> >  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
> >  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
> >  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
> >  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
> > ---[ end trace ee097287c9865a62 ]---
> 
> Konrad, Roger,
> 
> this was triggered by a debug patch in xen_blkbk_remove():
> 
> 	if (be->blkif)
> -		xen_blkif_disconnect(be->blkif);
> +		WARN_ON(xen_blkif_disconnect(be->blkif));
> 
> So I guess we need something like xen_blk_drain_io() in case of calls to
> xen_blkif_disconnect() which are not allowed to fail (either at the call
> sites of xen_blkif_disconnect() or in this function depending on a new
> boolean parameter indicating it should wait for outstanding I/Os).
> 
> I can try a patch, but I'd appreciate if you could confirm this wouldn't
> add further problems...

Hello,

Thanks for debugging this, the easiest solution seems to be to replace the
ring->inflight atomic_read check in xen_blkif_disconnect with a call to
xen_blk_drain_io instead, and making xen_blkif_disconnect return void (to
prevent further issues like this one).
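
Roughly (an untested sketch of the idea only, assuming xen_blk_drain_io()
is made reachable from xenbus.c), in xen_blkif_disconnect():

-		if (atomic_read(&ring->inflight) > 0)
-			return -EBUSY;
+		/* Wait for in-flight I/O instead of bailing out with -EBUSY. */
+		xen_blk_drain_io(ring);

plus changing the return type to void and adjusting the callers
accordingly.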

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-04-19  7:16                   ` Roger Pau Monné
@ 2017-04-19  7:35                     ` Juergen Gross
  2017-04-19 10:09                     ` Juergen Gross
  1 sibling, 0 replies; 28+ messages in thread
From: Juergen Gross @ 2017-04-19  7:35 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, xen-devel, Dietmar Hahn, glenn

On 19/04/17 09:16, Roger Pau Monné wrote:
> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
>> On 19/04/17 03:02, Glenn Enright wrote:
>>> On 18/04/17 20:36, Juergen Gross wrote:
>>>> On 12/04/17 00:45, Glenn Enright wrote:
>>>>> On 12/04/17 10:23, Andrew Cooper wrote:
>>>>>> On 11/04/2017 23:13, Glenn Enright wrote:
>>>>>>> On 11/04/17 21:49, Dietmar Hahn wrote:
>>>>>>>> Am Dienstag, 11. April 2017, 20:03:14 schrieb Glenn Enright:
>>>>>>>>> On 11/04/17 17:59, Juergen Gross wrote:
>>>>>>>>>> On 11/04/17 07:25, Glenn Enright wrote:
>>>>>>>>>>> Hi all
>>>>>>>>>>>
>>>>>>>>>>> We are seeing an odd issue with domu domains from xl destroy,
>>>>>>>>>>> under
>>>>>>>>>>> recent 4.9 kernels a (null) domain is left behind.
>>>>>>>>>>
>>>>>>>>>> I guess this is the dom0 kernel version?
>>>>>>>>>>
>>>>>>>>>>> This has occurred on a variety of hardware, with no obvious
>>>>>>>>>>> commonality.
>>>>>>>>>>>
>>>>>>>>>>> 4.4.55 does not show this behavior.
>>>>>>>>>>>
>>>>>>>>>>> On my test machine I have the following packages installed under
>>>>>>>>>>> centos6, from https://xen.crc.id.au/
>>>>>>>>>>>
>>>>>>>>>>> ~]# rpm -qa | grep xen
>>>>>>>>>>> xen47-licenses-4.7.2-4.el6.x86_64
>>>>>>>>>>> xen47-4.7.2-4.el6.x86_64
>>>>>>>>>>> kernel-xen-4.9.21-1.el6xen.x86_64
>>>>>>>>>>> xen47-ocaml-4.7.2-4.el6.x86_64
>>>>>>>>>>> xen47-libs-4.7.2-4.el6.x86_64
>>>>>>>>>>> xen47-libcacard-4.7.2-4.el6.x86_64
>>>>>>>>>>> xen47-hypervisor-4.7.2-4.el6.x86_64
>>>>>>>>>>> xen47-runtime-4.7.2-4.el6.x86_64
>>>>>>>>>>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
>>>>>>>>>>>
>>>>>>>>>>> I've also replicated the issue with 4.9.17 and 4.9.20
>>>>>>>>>>>
>>>>>>>>>>> To replicate, on a cleanly booted dom0 with one pv VM, I run the
>>>>>>>>>>> following on the VM
>>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>> while true; do
>>>>>>>>>>>  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
>>>>>>>>>>> done
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Then on the dom0 I do this sequence to reliably get a null domain.
>>>>>>>>>>> This
>>>>>>>>>>> occurs with oxenstored and xenstored both.
>>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>> xl sync 1
>>>>>>>>>>> xl destroy 1
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> xl list then renders something like ...
>>>>>>>>>>>
>>>>>>>>>>> (null)                                       1     4     4
>>>>>>>>>>> --p--d
>>>>>>>>>>> 9.8     0
>>>>>>>>>>
>>>>>>>>>> Something is referencing the domain, e.g. some of its memory pages
>>>>>>>>>> are
>>>>>>>>>> still mapped by dom0.
>>>>>>>>
>>>>>>>> You can try
>>>>>>>> # xl debug-keys q
>>>>>>>> and further
>>>>>>>> # xl dmesg
>>>>>>>> to see the output of the previous command. The 'q' dumps domain
>>>>>>>> (and guest debug) info.
>>>>>>>> # xl debug-keys h
>>>>>>>> prints all possible parameters for more informations.
>>>>>>>>
>>>>>>>> Dietmar.
>>>>>>>>
>>>>>>>
>>>>>>> I've done this as requested, below is the output.
>>>>>>>
>>>>>>> <snip>
>>>>>>> (XEN) Memory pages belonging to domain 1:
>>>>>>> (XEN)     DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
>>>>>>> (XEN)     DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001
>>>>>>
>>>>>> There are 16 pages still referenced from somewhere.
>>>>
>>>> Just a wild guess: could you please try the attached kernel patch? This
>>>> might give us some more diagnostic data...
>>>>
>>>>
>>>> Juergen
>>>>
>>>
>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still
>>> shows the issue. When replicating the leak I now see this trace (via
>>> dmesg). Hopefully that is useful.
>>>
>>> Please note, I'm going to be offline next week, but am keen to keep on
>>> with this, it may just be a while before I followup is all.
>>>
>>> Regards, Glenn
>>> http://rimuhosting.com
>>>
>>>
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
>>> xen_blkbk_remove+0x138/0x140
>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy
>>> dm_mirror dm_region_hash dm_log dm_mod
>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>>  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>>>  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>>>  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
>>> Call Trace:
>>>  [<ffffffff8136b61f>] dump_stack+0x67/0x98
>>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>>  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
>>>  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
>>>  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
>>>  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
>>>  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
>>>  [<ffffffff814b93a2>] device_del+0x112/0x210
>>>  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
>>>  [<ffffffff814b94c2>] device_unregister+0x22/0x60
>>>  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
>>>  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
>>>  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
>>>  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
>>>  [<ffffffff81447760>] ? split+0xf0/0xf0
>>>  [<ffffffff810a051d>] kthread+0xcd/0xf0
>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
>>> ---[ end trace ee097287c9865a62 ]---
>>
>> Konrad, Roger,
>>
>> this was triggered by a debug patch in xen_blkbk_remove():
>>
>> 	if (be->blkif)
>> -		xen_blkif_disconnect(be->blkif);
>> +		WARN_ON(xen_blkif_disconnect(be->blkif));
>>
>> So I guess we need something like xen_blk_drain_io() in case of calls to
>> xen_blkif_disconnect() which are not allowed to fail (either at the call
>> sites of xen_blkif_disconnect() or in this function depending on a new
>> boolean parameter indicating it should wait for outstanding I/Os).
>>
>> I can try a patch, but I'd appreciate if you could confirm this wouldn't
>> add further problems...
> 
> Hello,
> 
> Thanks for debugging this, the easiest solution seems to be to replace the
> ring->inflight atomic_read check in xen_blkif_disconnect with a call to
> xen_blk_drain_io instead, and making xen_blkif_disconnect return void (to
> prevent further issues like this one).

Nah, this isn't going to work. Or at least it won't work as it was
designed to. :-)

The main problem seems to be that xen_blkif_get/put() are used for
multiple purposes: they shouldn't be used by xen_blkif_alloc_rings() and
xen_blkif_disconnect(), as they will prevent xen_blkif_deferred_free()
from being called when an I/O is terminated and xen_blkif_put() is meant
to free all remaining resources.

I'll write a patch to correct this.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-04-19  7:16                   ` Roger Pau Monné
  2017-04-19  7:35                     ` Juergen Gross
@ 2017-04-19 10:09                     ` Juergen Gross
  2017-04-19 16:22                       ` Steven Haigh
  2017-05-01  0:55                       ` Glenn Enright
  1 sibling, 2 replies; 28+ messages in thread
From: Juergen Gross @ 2017-04-19 10:09 UTC (permalink / raw)
  To: glenn; +Cc: Andrew Cooper, xen-devel, Dietmar Hahn, Roger Pau Monné

[-- Attachment #1: Type: text/plain, Size: 4099 bytes --]

On 19/04/17 09:16, Roger Pau Monné wrote:
> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
>> On 19/04/17 03:02, Glenn Enright wrote:
>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still
>>> shows the issue. When replicating the leak I now see this trace (via
>>> dmesg). Hopefully that is useful.
>>>
>>> Please note, I'm going to be offline next week, but am keen to keep on
>>> with this, it may just be a while before I followup is all.
>>>
>>> Regards, Glenn
>>> http://rimuhosting.com
>>>
>>>
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
>>> xen_blkbk_remove+0x138/0x140
>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy
>>> dm_mirror dm_region_hash dm_log dm_mod
>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>>  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>>>  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>>>  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
>>> Call Trace:
>>>  [<ffffffff8136b61f>] dump_stack+0x67/0x98
>>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>>  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
>>>  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
>>>  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
>>>  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
>>>  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
>>>  [<ffffffff814b93a2>] device_del+0x112/0x210
>>>  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
>>>  [<ffffffff814b94c2>] device_unregister+0x22/0x60
>>>  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
>>>  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
>>>  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
>>>  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
>>>  [<ffffffff81447760>] ? split+0xf0/0xf0
>>>  [<ffffffff810a051d>] kthread+0xcd/0xf0
>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
>>> ---[ end trace ee097287c9865a62 ]---
>>
>> Konrad, Roger,
>>
>> this was triggered by a debug patch in xen_blkbk_remove():
>>
>> 	if (be->blkif)
>> -		xen_blkif_disconnect(be->blkif);
>> +		WARN_ON(xen_blkif_disconnect(be->blkif));
>>
>> So I guess we need something like xen_blk_drain_io() in case of calls to
>> xen_blkif_disconnect() which are not allowed to fail (either at the call
>> sites of xen_blkif_disconnect() or in this function depending on a new
>> boolean parameter indicating it should wait for outstanding I/Os).
>>
>> I can try a patch, but I'd appreciate if you could confirm this wouldn't
>> add further problems...
> 
> Hello,
> 
> Thanks for debugging this, the easiest solution seems to be to replace the
> ring->inflight atomic_read check in xen_blkif_disconnect with a call to
> xen_blk_drain_io instead, and making xen_blkif_disconnect return void (to
> prevent further issues like this one).

Glenn,

can you please try the attached patch (in dom0)?


Juergen


[-- Attachment #2: blkback.patch --]
[-- Type: text/x-patch, Size: 2698 bytes --]

commit fd4252549f5f3e4de6d887d9ae4c4d7f35d7de52
Author: Juergen Gross <jgross@suse.com>
Date:   Wed Apr 19 11:19:51 2017 +0200

    xen/blkback: fix disconnect while I/Os in flight
    
    Today disconnecting xen-blkback is broken in case there are still
    I/Os in flight: xen_blkif_disconnect() will bail out early without
    releasing all resources in the hope it will be called again when
    the last request has terminated. This, however, won't happen as
    xen_blkif_free() won't be called on termination of the last running
    request: xen_blkif_put() won't decrement the blkif refcnt to 0 as
    xen_blkif_disconnect() didn't finish before thus some xen_blkif_put()
    calls in xen_blkif_disconnect() didn't happen.
    
    To solve this deadlock xen_blkif_disconnect() and
    xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and
    xen_blkif_get() but use some other way to do their accounting of
    resources.
    
    This at once fixes another error in xen_blkif_disconnect(): when it
    returned early with -EBUSY for another ring than 0 it would call
    xen_blkif_put() again for already handled rings on a subsequent call.
    This will lead to inconsistencies in the refcnt handling.
    
    Cc: stable@vger.kernel.org

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index dea61f6ab8cb..953f38802333 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -281,6 +281,7 @@ struct xen_blkif_ring {
 
 	wait_queue_head_t	wq;
 	atomic_t		inflight;
+	int			active;
 	/* One thread per blkif ring. */
 	struct task_struct	*xenblkd;
 	unsigned int		waiting_reqs;
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 8fe61b5dc5a6..411d2ded2456 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -159,7 +159,7 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
 		init_waitqueue_head(&ring->shutdown_wq);
 		ring->blkif = blkif;
 		ring->st_print = jiffies;
-		xen_blkif_get(blkif);
+		ring->active = 1;
 	}
 
 	return 0;
@@ -249,6 +249,9 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		struct xen_blkif_ring *ring = &blkif->rings[r];
 		unsigned int i = 0;
 
+		if (!ring->active)
+			continue;
+
 		if (ring->xenblkd) {
 			kthread_stop(ring->xenblkd);
 			wake_up(&ring->shutdown_wq);
@@ -296,7 +299,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		BUG_ON(ring->free_pages_num != 0);
 		BUG_ON(ring->persistent_gnt_c != 0);
 		WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
-		xen_blkif_put(blkif);
+		ring->active = 0;
 	}
 	blkif->nr_ring_pages = 0;
 	/*

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-04-19 10:09                     ` Juergen Gross
@ 2017-04-19 16:22                       ` Steven Haigh
  2017-04-21  8:42                         ` Steven Haigh
  2017-05-01  0:55                       ` Glenn Enright
  1 sibling, 1 reply; 28+ messages in thread
From: Steven Haigh @ 2017-04-19 16:22 UTC (permalink / raw)
  To: Juergen Gross, glenn
  Cc: Andrew Cooper, Roger Pau Monné, Dietmar Hahn, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 4754 bytes --]

On 19/04/17 20:09, Juergen Gross wrote:
> On 19/04/17 09:16, Roger Pau Monné wrote:
>> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
>>> On 19/04/17 03:02, Glenn Enright wrote:
>>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still
>>>> shows the issue. When replicating the leak I now see this trace (via
>>>> dmesg). Hopefully that is useful.
>>>>
>>>> Please note, I'm going to be offline next week, but am keen to keep on
>>>> with this, it may just be a while before I followup is all.
>>>>
>>>> Regards, Glenn
>>>> http://rimuhosting.com
>>>>
>>>>
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
>>>> xen_blkbk_remove+0x138/0x140
>>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
>>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
>>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
>>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
>>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy
>>>> dm_mirror dm_region_hash dm_log dm_mod
>>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
>>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>>>  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>>>>  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>>>>  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
>>>> Call Trace:
>>>>  [<ffffffff8136b61f>] dump_stack+0x67/0x98
>>>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>>>  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
>>>>  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
>>>>  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
>>>>  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
>>>>  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
>>>>  [<ffffffff814b93a2>] device_del+0x112/0x210
>>>>  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
>>>>  [<ffffffff814b94c2>] device_unregister+0x22/0x60
>>>>  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
>>>>  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
>>>>  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
>>>>  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
>>>>  [<ffffffff81447760>] ? split+0xf0/0xf0
>>>>  [<ffffffff810a051d>] kthread+0xcd/0xf0
>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
>>>> ---[ end trace ee097287c9865a62 ]---
>>>
>>> Konrad, Roger,
>>>
>>> this was triggered by a debug patch in xen_blkbk_remove():
>>>
>>> 	if (be->blkif)
>>> -		xen_blkif_disconnect(be->blkif);
>>> +		WARN_ON(xen_blkif_disconnect(be->blkif));
>>>
>>> So I guess we need something like xen_blk_drain_io() in case of calls to
>>> xen_blkif_disconnect() which are not allowed to fail (either at the call
>>> sites of xen_blkif_disconnect() or in this function depending on a new
>>> boolean parameter indicating it should wait for outstanding I/Os).
>>>
>>> I can try a patch, but I'd appreciate if you could confirm this wouldn't
>>> add further problems...
>>
>> Hello,
>>
>> Thanks for debugging this, the easiest solution seems to be to replace the
>> ring->inflight atomic_read check in xen_blkif_disconnect with a call to
>> xen_blk_drain_io instead, and making xen_blkif_disconnect return void (to
>> prevent further issues like this one).
> 
> Glenn,
> 
> can you please try the attached patch (in dom0)?

For what it's worth, I have applied this in kernel package 4.9.23-2 as
follows:

* Wed Apr 19 2017 Steven Haigh <netwiz@crc.id.au> - 4.9.23-2
- xen/blkback: fix disconnect while I/Os in flight

It's available from any 'in sync' mirror:
	https://xen.crc.id.au/downloads/

Feedback welcome, for both my sake and Juergen's.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-04-19 16:22                       ` Steven Haigh
@ 2017-04-21  8:42                         ` Steven Haigh
  2017-04-21  8:44                           ` Juergen Gross
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Haigh @ 2017-04-21  8:42 UTC (permalink / raw)
  To: Juergen Gross, glenn
  Cc: Andrew Cooper, xen-devel, Dietmar Hahn, Roger Pau Monné


[-- Attachment #1.1.1: Type: text/plain, Size: 4853 bytes --]

On 20/04/17 02:22, Steven Haigh wrote:
> On 19/04/17 20:09, Juergen Gross wrote:
>> On 19/04/17 09:16, Roger Pau Monné wrote:
>>> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
>>>> On 19/04/17 03:02, Glenn Enright wrote:
>>>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still
>>>>> shows the issue. When replicating the leak I now see this trace (via
>>>>> dmesg). Hopefully that is useful.
>>>>>
>>>>> Please note, I'm going to be offline next week, but am keen to keep on
>>>>> with this, it may just be a while before I followup is all.
>>>>>
>>>>> Regards, Glenn
>>>>> http://rimuhosting.com
>>>>>
>>>>>
>>>>> ------------[ cut here ]------------
>>>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
>>>>> xen_blkbk_remove+0x138/0x140
>>>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
>>>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>>>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
>>>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
>>>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
>>>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy
>>>>> dm_mirror dm_region_hash dm_log dm_mod
>>>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
>>>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>>>>  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>>>>>  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>>>>>  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
>>>>> Call Trace:
>>>>>  [<ffffffff8136b61f>] dump_stack+0x67/0x98
>>>>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>>>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>>>>  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
>>>>>  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
>>>>>  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
>>>>>  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
>>>>>  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
>>>>>  [<ffffffff814b93a2>] device_del+0x112/0x210
>>>>>  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
>>>>>  [<ffffffff814b94c2>] device_unregister+0x22/0x60
>>>>>  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
>>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
>>>>>  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
>>>>>  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
>>>>>  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
>>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>>  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
>>>>>  [<ffffffff81447760>] ? split+0xf0/0xf0
>>>>>  [<ffffffff810a051d>] kthread+0xcd/0xf0
>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>>  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
>>>>> ---[ end trace ee097287c9865a62 ]---
>>>>
>>>> Konrad, Roger,
>>>>
>>>> this was triggered by a debug patch in xen_blkbk_remove():
>>>>
>>>> 	if (be->blkif)
>>>> -		xen_blkif_disconnect(be->blkif);
>>>> +		WARN_ON(xen_blkif_disconnect(be->blkif));
>>>>
>>>> So I guess we need something like xen_blk_drain_io() in case of calls to
>>>> xen_blkif_disconnect() which are not allowed to fail (either at the call
>>>> sites of xen_blkif_disconnect() or in this function depending on a new
>>>> boolean parameter indicating it should wait for outstanding I/Os).
>>>>
>>>> I can try a patch, but I'd appreciate if you could confirm this wouldn't
>>>> add further problems...
>>>
>>> Hello,
>>>
>>> Thanks for debugging this, the easiest solution seems to be to replace the
>>> ring->inflight atomic_read check in xen_blkif_disconnect with a call to
>>> xen_blk_drain_io instead, and making xen_blkif_disconnect return void (to
>>> prevent further issues like this one).
>>
>> Glenn,
>>
>> can you please try the attached patch (in dom0)?

Tested-by: Steven Haigh <netwiz@crc.id.au>

I've tried specifically with 4.9.23 and can no longer make this occur in
my scenario. Also built with 4.9.24 and expecting similar results.

I'm aware Glenn has a much wider test schedule and more systems than
me; however, my testing is successful.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-04-21  8:42                         ` Steven Haigh
@ 2017-04-21  8:44                           ` Juergen Gross
  0 siblings, 0 replies; 28+ messages in thread
From: Juergen Gross @ 2017-04-21  8:44 UTC (permalink / raw)
  To: Steven Haigh, glenn
  Cc: Andrew Cooper, xen-devel, Dietmar Hahn, Roger Pau Monné

On 21/04/17 10:42, Steven Haigh wrote:
> On 20/04/17 02:22, Steven Haigh wrote:
>> On 19/04/17 20:09, Juergen Gross wrote:
>>> On 19/04/17 09:16, Roger Pau Monné wrote:
>>>> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
>>>>> On 19/04/17 03:02, Glenn Enright wrote:
>>>>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still
>>>>>> shows the issue. When replicating the leak I now see this trace (via
>>>>>> dmesg). Hopefully that is useful.
>>>>>>
>>>>>> Please note, I'm going to be offline next week, but am keen to keep on
>>>>>> with this, it may just be a while before I followup is all.
>>>>>>
>>>>>> Regards, Glenn
>>>>>> http://rimuhosting.com
>>>>>>
>>>>>>
>>>>>> ------------[ cut here ]------------
>>>>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
>>>>>> xen_blkbk_remove+0x138/0x140
>>>>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
>>>>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>>>>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
>>>>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>>>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
>>>>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
>>>>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy
>>>>>> dm_mirror dm_region_hash dm_log dm_mod
>>>>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
>>>>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>>>>>  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>>>>>>  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>>>>>>  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
>>>>>> Call Trace:
>>>>>>  [<ffffffff8136b61f>] dump_stack+0x67/0x98
>>>>>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>>>>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>>>>>  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
>>>>>>  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
>>>>>>  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
>>>>>>  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
>>>>>>  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
>>>>>>  [<ffffffff814b93a2>] device_del+0x112/0x210
>>>>>>  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
>>>>>>  [<ffffffff814b94c2>] device_unregister+0x22/0x60
>>>>>>  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
>>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>>  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
>>>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>>  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
>>>>>>  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
>>>>>>  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
>>>>>>  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
>>>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>>>  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
>>>>>>  [<ffffffff81447760>] ? split+0xf0/0xf0
>>>>>>  [<ffffffff810a051d>] kthread+0xcd/0xf0
>>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>>>  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
>>>>>> ---[ end trace ee097287c9865a62 ]---
>>>>>
>>>>> Konrad, Roger,
>>>>>
>>>>> this was triggered by a debug patch in xen_blkbk_remove():
>>>>>
>>>>> 	if (be->blkif)
>>>>> -		xen_blkif_disconnect(be->blkif);
>>>>> +		WARN_ON(xen_blkif_disconnect(be->blkif));
>>>>>
>>>>> So I guess we need something like xen_blk_drain_io() in case of calls to
>>>>> xen_blkif_disconnect() which are not allowed to fail (either at the call
>>>>> sites of xen_blkif_disconnect() or in this function depending on a new
>>>>> boolean parameter indicating it should wait for outstanding I/Os).
>>>>>
>>>>> I can try a patch, but I'd appreciate if you could confirm this wouldn't
>>>>> add further problems...
>>>>
>>>> Hello,
>>>>
>>>> Thanks for debugging this, the easiest solution seems to be to replace the
>>>> ring->inflight atomic_read check in xen_blkif_disconnect with a call to
>>>> xen_blk_drain_io instead, and making xen_blkif_disconnect return void (to
>>>> prevent further issues like this one).
>>>
>>> Glenn,
>>>
>>> can you please try the attached patch (in dom0)?
> 
> Tested-by: Steven Haigh <netwiz@crc.id.au>
> 
> I've tried specifically with 4.9.23 and can no longer make this occur in
> my scenario. Also built with 4.9.24 and expecting similar results.

Thanks for testing!

Will post the patch officially now.


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-04-19 10:09                     ` Juergen Gross
  2017-04-19 16:22                       ` Steven Haigh
@ 2017-05-01  0:55                       ` Glenn Enright
  2017-05-03 10:45                         ` Steven Haigh
  1 sibling, 1 reply; 28+ messages in thread
From: Glenn Enright @ 2017-05-01  0:55 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Andrew Cooper, xen-devel, Dietmar Hahn, Roger Pau Monné

On 19/04/17 22:09, Juergen Gross wrote:
> On 19/04/17 09:16, Roger Pau Monné wrote:
>> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
>>> On 19/04/17 03:02, Glenn Enright wrote:
>>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still
>>>> shows the issue. When replicating the leak I now see this trace (via
>>>> dmesg). Hopefully that is useful.
>>>>
>>>> Please note, I'm going to be offline next week, but am keen to keep on
>>>> with this, it may just be a while before I followup is all.
>>>>
>>>> Regards, Glenn
>>>> http://rimuhosting.com
>>>>
>>>>
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
>>>> xen_blkbk_remove+0x138/0x140
>>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
>>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
>>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
>>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
>>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy
>>>> dm_mirror dm_region_hash dm_log dm_mod
>>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
>>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>>>  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>>>>  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>>>>  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
>>>> Call Trace:
>>>>  [<ffffffff8136b61f>] dump_stack+0x67/0x98
>>>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>>>  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
>>>>  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
>>>>  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
>>>>  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
>>>>  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
>>>>  [<ffffffff814b93a2>] device_del+0x112/0x210
>>>>  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
>>>>  [<ffffffff814b94c2>] device_unregister+0x22/0x60
>>>>  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
>>>>  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
>>>>  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
>>>>  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
>>>>  [<ffffffff81447760>] ? split+0xf0/0xf0
>>>>  [<ffffffff810a051d>] kthread+0xcd/0xf0
>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
>>>> ---[ end trace ee097287c9865a62 ]---
>>>
>>> Konrad, Roger,
>>>
>>> this was triggered by a debug patch in xen_blkbk_remove():
>>>
>>> 	if (be->blkif)
>>> -		xen_blkif_disconnect(be->blkif);
>>> +		WARN_ON(xen_blkif_disconnect(be->blkif));
>>>
>>> So I guess we need something like xen_blk_drain_io() in case of calls to
>>> xen_blkif_disconnect() which are not allowed to fail (either at the call
>>> sites of xen_blkif_disconnect() or in this function depending on a new
>>> boolean parameter indicating it should wait for outstanding I/Os).
>>>
>>> I can try a patch, but I'd appreciate if you could confirm this wouldn't
>>> add further problems...
>>
>> Hello,
>>
>> Thanks for debugging this, the easiest solution seems to be to replace the
>> ring->inflight atomic_read check in xen_blkif_disconnect with a call to
>> xen_blk_drain_io instead, and making xen_blkif_disconnect return void (to
>> prevent further issues like this one).
>
> Glenn,
>
> can you please try the attached patch (in dom0)?
>
>
> Juergen
>

(resending with full CC list)

I'm back. After testing unfortunately I'm still seeing the leak. The 
below trace is with the debug patch applied as well under 4.9.25. It 
looks very similar to me. I am still able to replicate this reliably.

Regards, Glenn
http://rimuhosting.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:511 
xen_blkbk_remove+0x138/0x140
Modules linked in: ebt_ip xen_pciback xen_netback xen_gntalloc 
xen_gntdev xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4 
ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security 
iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc 
ipv6 crc_ccitt ppdev parport_pc parport serio_raw i2c_i801 i2c_smbus 
i2c_core sg e1000e ptp pps_core i3000_edac edac_core raid1 sd_mod ahci 
libahci floppy dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.25-1.el6xen.x86_64 #1
Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
  ffffc90040cfbb98 ffffffff8136b76f 0000000000000013 0000000000000000
  0000000000000000 0000000000000000 ffffc90040cfbbe8 ffffffff8108007d
  ffffea0000141720 000001ff41334434 ffff880000000001 ffff88004d3aedc0
Call Trace:
  [<ffffffff8136b76f>] dump_stack+0x67/0x98
  [<ffffffff8108007d>] __warn+0xfd/0x120
  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
  [<ffffffff814ec0a8>] xen_blkbk_remove+0x138/0x140
  [<ffffffff81449b07>] xenbus_dev_remove+0x47/0xa0
  [<ffffffff814bd2b4>] __device_release_driver+0xb4/0x160
  [<ffffffff814bd38d>] device_release_driver+0x2d/0x40
  [<ffffffff814bc2b4>] bus_remove_device+0x124/0x190
  [<ffffffff814b9682>] device_del+0x112/0x210
  [<ffffffff81448423>] ? xenbus_read+0x53/0x70
  [<ffffffff814b97a2>] device_unregister+0x22/0x60
  [<ffffffff814eda9d>] frontend_changed+0xad/0x4c0
  [<ffffffff81449e67>] xenbus_otherend_changed+0xc7/0x140
  [<ffffffff816f1486>] ? _raw_spin_unlock_irqrestore+0x16/0x20
  [<ffffffff8144a2f0>] frontend_changed+0x10/0x20
  [<ffffffff81447b0c>] xenwatch_thread+0x9c/0x140
  [<ffffffff810bffb0>] ? woken_wake_function+0x20/0x20
  [<ffffffff816ed98a>] ? schedule+0x3a/0xa0
  [<ffffffff816f1486>] ? _raw_spin_unlock_irqrestore+0x16/0x20
  [<ffffffff810c0c6d>] ? complete+0x4d/0x60
  [<ffffffff81447a70>] ? split+0xf0/0xf0
  [<ffffffff810a0535>] kthread+0xe5/0x100
  [<ffffffff810a051d>] ? kthread+0xcd/0x100
  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
  [<ffffffff816f1bc5>] ret_from_fork+0x25/0x30
---[ end trace ea3a48c80e4ad79d ]---

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-05-01  0:55                       ` Glenn Enright
@ 2017-05-03 10:45                         ` Steven Haigh
  2017-05-03 13:38                           ` Juergen Gross
  2017-05-03 15:53                           ` Juergen Gross
  0 siblings, 2 replies; 28+ messages in thread
From: Steven Haigh @ 2017-05-03 10:45 UTC (permalink / raw)
  To: glenn, Juergen Gross
  Cc: Andrew Cooper, Roger Pau Monné, Dietmar Hahn, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 7725 bytes --]

Just wanted to give this a little nudge now people seem to be back on
deck...

On 01/05/17 10:55, Glenn Enright wrote:
> On 19/04/17 22:09, Juergen Gross wrote:
>> On 19/04/17 09:16, Roger Pau Monné wrote:
>>> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
>>>> On 19/04/17 03:02, Glenn Enright wrote:
>>>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still
>>>>> shows the issue. When replicating the leak I now see this trace (via
>>>>> dmesg). Hopefully that is useful.
>>>>>
>>>>> Please note, I'm going to be offline next week, but am keen to keep on
>>>>> with this, it may just be a while before I followup is all.
>>>>>
>>>>> Regards, Glenn
>>>>> http://rimuhosting.com
>>>>>
>>>>>
>>>>> ------------[ cut here ]------------
>>>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
>>>>> xen_blkbk_remove+0x138/0x140
>>>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
>>>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>>>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state
>>>>> iptable_security
>>>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp
>>>>> llc
>>>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801
>>>>> i2c_smbus
>>>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci
>>>>> floppy
>>>>> dm_mirror dm_region_hash dm_log dm_mod
>>>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
>>>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>>>>  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>>>>>  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>>>>>  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
>>>>> Call Trace:
>>>>>  [<ffffffff8136b61f>] dump_stack+0x67/0x98
>>>>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>>>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>>>>  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
>>>>>  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
>>>>>  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
>>>>>  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
>>>>>  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
>>>>>  [<ffffffff814b93a2>] device_del+0x112/0x210
>>>>>  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
>>>>>  [<ffffffff814b94c2>] device_unregister+0x22/0x60
>>>>>  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
>>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
>>>>>  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
>>>>>  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
>>>>>  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
>>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>>  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
>>>>>  [<ffffffff81447760>] ? split+0xf0/0xf0
>>>>>  [<ffffffff810a051d>] kthread+0xcd/0xf0
>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>>  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
>>>>> ---[ end trace ee097287c9865a62 ]---
>>>>
>>>> Konrad, Roger,
>>>>
>>>> this was triggered by a debug patch in xen_blkbk_remove():
>>>>
>>>>     if (be->blkif)
>>>> -        xen_blkif_disconnect(be->blkif);
>>>> +        WARN_ON(xen_blkif_disconnect(be->blkif));
>>>>
>>>> So I guess we need something like xen_blk_drain_io() in case of
>>>> calls to
>>>> xen_blkif_disconnect() which are not allowed to fail (either at the
>>>> call
>>>> sites of xen_blkif_disconnect() or in this function depending on a new
>>>> boolean parameter indicating it should wait for outstanding I/Os).
>>>>
>>>> I can try a patch, but I'd appreciate if you could confirm this
>>>> wouldn't
>>>> add further problems...
>>>
>>> Hello,
>>>
>>> Thanks for debugging this, the easiest solution seems to be to
>>> replace the
>>> ring->inflight atomic_read check in xen_blkif_disconnect with a call to
>>> xen_blk_drain_io instead, and making xen_blkif_disconnect return void
>>> (to
>>> prevent further issues like this one).
>>
>> Glenn,
>>
>> can you please try the attached patch (in dom0)?
>>
>>
>> Juergen
>>
> 
> (resending with full CC list)
> 
> I'm back. After testing unfortunately I'm still seeing the leak. The
> below trace is with the debug patch applied as well under 4.9.25. It
> looks very similar to me. I am still able to replicate this reliably.
> 
> Regards, Glenn
> http://rimuhosting.com
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:511
> xen_blkbk_remove+0x138/0x140
> Modules linked in: ebt_ip xen_pciback xen_netback xen_gntalloc
> xen_gntdev xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
> iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
> ipv6 crc_ccitt ppdev parport_pc parport serio_raw i2c_i801 i2c_smbus
> i2c_core sg e1000e ptp pps_core i3000_edac edac_core raid1 sd_mod ahci
> libahci floppy dm_mirror dm_region_hash dm_log dm_mod
> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.25-1.el6xen.x86_64 #1
> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>  ffffc90040cfbb98 ffffffff8136b76f 0000000000000013 0000000000000000
>  0000000000000000 0000000000000000 ffffc90040cfbbe8 ffffffff8108007d
>  ffffea0000141720 000001ff41334434 ffff880000000001 ffff88004d3aedc0
> Call Trace:
>  [<ffffffff8136b76f>] dump_stack+0x67/0x98
>  [<ffffffff8108007d>] __warn+0xfd/0x120
>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>  [<ffffffff814ec0a8>] xen_blkbk_remove+0x138/0x140
>  [<ffffffff81449b07>] xenbus_dev_remove+0x47/0xa0
>  [<ffffffff814bd2b4>] __device_release_driver+0xb4/0x160
>  [<ffffffff814bd38d>] device_release_driver+0x2d/0x40
>  [<ffffffff814bc2b4>] bus_remove_device+0x124/0x190
>  [<ffffffff814b9682>] device_del+0x112/0x210
>  [<ffffffff81448423>] ? xenbus_read+0x53/0x70
>  [<ffffffff814b97a2>] device_unregister+0x22/0x60
>  [<ffffffff814eda9d>] frontend_changed+0xad/0x4c0
>  [<ffffffff81449e67>] xenbus_otherend_changed+0xc7/0x140
>  [<ffffffff816f1486>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>  [<ffffffff8144a2f0>] frontend_changed+0x10/0x20
>  [<ffffffff81447b0c>] xenwatch_thread+0x9c/0x140
>  [<ffffffff810bffb0>] ? woken_wake_function+0x20/0x20
>  [<ffffffff816ed98a>] ? schedule+0x3a/0xa0
>  [<ffffffff816f1486>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>  [<ffffffff810c0c6d>] ? complete+0x4d/0x60
>  [<ffffffff81447a70>] ? split+0xf0/0xf0
>  [<ffffffff810a0535>] kthread+0xe5/0x100
>  [<ffffffff810a051d>] ? kthread+0xcd/0x100
>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>  [<ffffffff816f1bc5>] ret_from_fork+0x25/0x30
> ---[ end trace ea3a48c80e4ad79d ]---
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-05-03 10:45                         ` Steven Haigh
@ 2017-05-03 13:38                           ` Juergen Gross
  2017-05-03 15:53                           ` Juergen Gross
  1 sibling, 0 replies; 28+ messages in thread
From: Juergen Gross @ 2017-05-03 13:38 UTC (permalink / raw)
  To: Steven Haigh, glenn
  Cc: Andrew Cooper, Roger Pau Monné, Dietmar Hahn, xen-devel

On 03/05/17 12:45, Steven Haigh wrote:
> Just wanted to give this a little nudge now people seem to be back on
> deck...

Things seem to be more complicated than I thought.

There are clearly paths leading to use-after-free scenarios, e.g. the
one in the backtrace below:

xen_blkbk_remove() will free be regardless of the return value of
xen_blkif_disconnect(). If -EBUSY is returned, xen_blkif_disconnect()
will be called again at the end of an I/O still in progress, leading
to a call of xenbus_unmap_ring_vfree(blkif->be->dev, ...) with be
already freed...
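
To make the ordering concrete, here is a minimal stand-alone sketch (plain
C with simplified stand-ins, not the real xen-blkback code; building it
with gcc -fsanitize=address makes the use-after-free visible):

#include <stdio.h>
#include <stdlib.h>

struct backend_info { int dev; };
struct blkif { struct backend_info *be; int inflight; };

static int disconnect(struct blkif *blkif)
{
	if (blkif->inflight > 0)
		return -1;	/* stands in for -EBUSY: retried once the last I/O ends */
	/* unmapping the ring dereferences be, like
	 * xenbus_unmap_ring_vfree(blkif->be->dev, ...) above */
	printf("unmapping ring for dev %d\n", blkif->be->dev);
	return 0;
}

static void remove_backend(struct blkif *blkif)
{
	disconnect(blkif);	/* may fail with "-EBUSY"... */
	free(blkif->be);	/* ...but be is freed unconditionally */
}

int main(void)
{
	struct blkif blkif = { .be = malloc(sizeof(struct backend_info)), .inflight = 1 };

	blkif.be->dev = 1;
	remove_backend(&blkif);	/* busy path: be freed anyway */
	blkif.inflight = 0;	/* last in-flight I/O completes */
	disconnect(&blkif);	/* late disconnect: use-after-free of be */
	return 0;
}

Under that ordering the natural fix is to keep be alive until the blkif
itself is freed, which is what the patch posted later in the thread does
by moving the kfree() calls into xen_blkif_free().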

As Roger already said: this is a complete mess.

Working on a patch now...

BTW: Glenn, the debug patch isn't important any longer. It was just
meant to locate the problem which is now known.


Juergen

> 
> On 01/05/17 10:55, Glenn Enright wrote:
>> On 19/04/17 22:09, Juergen Gross wrote:
>>> On 19/04/17 09:16, Roger Pau Monné wrote:
>>>> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
>>>>> On 19/04/17 03:02, Glenn Enright wrote:
>>>>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still
>>>>>> shows the issue. When replicating the leak I now see this trace (via
>>>>>> dmesg). Hopefully that is useful.
>>>>>>
>>>>>> Please note, I'm going to be offline next week, but am keen to keep on
>>>>>> with this, it may just be a while before I followup is all.
>>>>>>
>>>>>> Regards, Glenn
>>>>>> http://rimuhosting.com
>>>>>>
>>>>>>
>>>>>> ------------[ cut here ]------------
>>>>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
>>>>>> xen_blkbk_remove+0x138/0x140
>>>>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
>>>>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>>>>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state
>>>>>> iptable_security
>>>>>> iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>>>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp
>>>>>> llc
>>>>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801
>>>>>> i2c_smbus
>>>>>> i2c_core e1000e ptp pps_core i3000_edac edac_core raid1 sd_mod ahci libahci
>>>>>> floppy
>>>>>> dm_mirror dm_region_hash dm_log dm_mod
>>>>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
>>>>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>>>>>  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>>>>>>  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>>>>>>  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
>>>>>> Call Trace:
>>>>>>  [<ffffffff8136b61f>] dump_stack+0x67/0x98
>>>>>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>>>>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>>>>>  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
>>>>>>  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
>>>>>>  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
>>>>>>  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
>>>>>>  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
>>>>>>  [<ffffffff814b93a2>] device_del+0x112/0x210
>>>>>>  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
>>>>>>  [<ffffffff814b94c2>] device_unregister+0x22/0x60
>>>>>>  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
>>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>>  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
>>>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>>  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
>>>>>>  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
>>>>>>  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
>>>>>>  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
>>>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>>>  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
>>>>>>  [<ffffffff81447760>] ? split+0xf0/0xf0
>>>>>>  [<ffffffff810a051d>] kthread+0xcd/0xf0
>>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>>>  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
>>>>>> ---[ end trace ee097287c9865a62 ]---
>>>>>
>>>>> Konrad, Roger,
>>>>>
>>>>> this was triggered by a debug patch in xen_blkbk_remove():
>>>>>
>>>>>     if (be->blkif)
>>>>> -        xen_blkif_disconnect(be->blkif);
>>>>> +        WARN_ON(xen_blkif_disconnect(be->blkif));
>>>>>
>>>>> So I guess we need something like xen_blk_drain_io() in case of
>>>>> calls to
>>>>> xen_blkif_disconnect() which are not allowed to fail (either at the
>>>>> call
>>>>> sites of xen_blkif_disconnect() or in this function depending on a new
>>>>> boolean parameter indicating it should wait for outstanding I/Os).
>>>>>
>>>>> I can try a patch, but I'd appreciate if you could confirm this
>>>>> wouldn't
>>>>> add further problems...
>>>>
>>>> Hello,
>>>>
>>>> Thanks for debugging this, the easiest solution seems to be to
>>>> replace the
>>>> ring->inflight atomic_read check in xen_blkif_disconnect with a call to
>>>> xen_blk_drain_io instead, and making xen_blkif_disconnect return void
>>>> (to
>>>> prevent further issues like this one).
>>>
>>> Glenn,
>>>
>>> can you please try the attached patch (in dom0)?
>>>
>>>
>>> Juergen
>>>
>>
>> (resending with full CC list)
>>
>> I'm back. After testing unfortunately I'm still seeing the leak. The
>> below trace is with the debug patch applied as well under 4.9.25. It
>> looks very similar to me. I am still able to replicate this reliably.
>>
>> Regards, Glenn
>> http://rimuhosting.com
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:511
>> xen_blkbk_remove+0x138/0x140
>> Modules linked in: ebt_ip xen_pciback xen_netback xen_gntalloc
>> xen_gntdev xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
>> iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw i2c_i801 i2c_smbus
>> i2c_core sg e1000e ptp pps_core i3000_edac edac_core raid1 sd_mod ahci
>> libahci floppy dm_mirror dm_region_hash dm_log dm_mod
>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.25-1.el6xen.x86_64 #1
>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>  ffffc90040cfbb98 ffffffff8136b76f 0000000000000013 0000000000000000
>>  0000000000000000 0000000000000000 ffffc90040cfbbe8 ffffffff8108007d
>>  ffffea0000141720 000001ff41334434 ffff880000000001 ffff88004d3aedc0
>> Call Trace:
>>  [<ffffffff8136b76f>] dump_stack+0x67/0x98
>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>  [<ffffffff814ec0a8>] xen_blkbk_remove+0x138/0x140
>>  [<ffffffff81449b07>] xenbus_dev_remove+0x47/0xa0
>>  [<ffffffff814bd2b4>] __device_release_driver+0xb4/0x160
>>  [<ffffffff814bd38d>] device_release_driver+0x2d/0x40
>>  [<ffffffff814bc2b4>] bus_remove_device+0x124/0x190
>>  [<ffffffff814b9682>] device_del+0x112/0x210
>>  [<ffffffff81448423>] ? xenbus_read+0x53/0x70
>>  [<ffffffff814b97a2>] device_unregister+0x22/0x60
>>  [<ffffffff814eda9d>] frontend_changed+0xad/0x4c0
>>  [<ffffffff81449e67>] xenbus_otherend_changed+0xc7/0x140
>>  [<ffffffff816f1486>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>  [<ffffffff8144a2f0>] frontend_changed+0x10/0x20
>>  [<ffffffff81447b0c>] xenwatch_thread+0x9c/0x140
>>  [<ffffffff810bffb0>] ? woken_wake_function+0x20/0x20
>>  [<ffffffff816ed98a>] ? schedule+0x3a/0xa0
>>  [<ffffffff816f1486>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>  [<ffffffff810c0c6d>] ? complete+0x4d/0x60
>>  [<ffffffff81447a70>] ? split+0xf0/0xf0
>>  [<ffffffff810a0535>] kthread+0xe5/0x100
>>  [<ffffffff810a051d>] ? kthread+0xcd/0x100
>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>  [<ffffffff816f1bc5>] ret_from_fork+0x25/0x30
>> ---[ end trace ea3a48c80e4ad79d ]---
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> https://lists.xen.org/xen-devel
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-05-03 10:45                         ` Steven Haigh
  2017-05-03 13:38                           ` Juergen Gross
@ 2017-05-03 15:53                           ` Juergen Gross
  2017-05-03 16:58                             ` Steven Haigh
  1 sibling, 1 reply; 28+ messages in thread
From: Juergen Gross @ 2017-05-03 15:53 UTC (permalink / raw)
  To: Steven Haigh, glenn
  Cc: Andrew Cooper, Roger Pau Monné, Dietmar Hahn, xen-devel

[-- Attachment #1: Type: text/plain, Size: 356 bytes --]

On 03/05/17 12:45, Steven Haigh wrote:
> Just wanted to give this a little nudge now people seem to be back on
> deck...

Glenn, could you please give the attached patch a try?

It should be applied on top of the other correction, the old debug
patch should not be applied.

I have added some debug output to make sure we see what is happening.


Juergen


[-- Attachment #2: blk.patch --]
[-- Type: text/x-patch, Size: 1969 bytes --]

commit 246aaf60bd934b7571944b98a31078d519d637c6
Author: Juergen Gross <jgross@suse.com>
Date:   Wed May 3 15:57:18 2017 +0200

    xen/blkback: don't free be structure too early
    
    The be structure must not be freed as long as freeing of the blkif
    structure hasn't been done. Otherwise a use-after-free of be will occur
    when unmapping the ring used for communicating with the frontend, in
    case of a late call of xen_blkif_disconnect() (e.g. due to an I/O still
    active when trying to disconnect).

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 411d2ded2456..0614fb294e2b 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -33,6 +33,7 @@ struct backend_info {
 	unsigned		major;
 	unsigned		minor;
 	char			*mode;
+	int			delayed;
 };
 
 static struct kmem_cache *xen_blkif_cachep;
@@ -262,8 +263,11 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		 * don't have any discard_io or other_io requests. So, checking
 		 * for inflight IO is enough.
 		 */
-		if (atomic_read(&ring->inflight) > 0)
+		if (atomic_read(&ring->inflight) > 0) {
+			pr_warn("xen_blkif_disconnect: busy\n");
+			blkif->be->delayed = 1;
 			return -EBUSY;
+		}
 
 		if (ring->irq) {
 			unbind_from_irqhandler(ring->irq, ring);
@@ -315,9 +319,11 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 
 static void xen_blkif_free(struct xen_blkif *blkif)
 {
-
-	xen_blkif_disconnect(blkif);
+	pr_warn("xen_blkif_free: delayed = %d\n", blkif->be->delayed);
+	WARN_ON(xen_blkif_disconnect(blkif));
 	xen_vbd_free(&blkif->vbd);
+	kfree(blkif->be->mode);
+	kfree(blkif->be);
 
 	/* Make sure everything is drained before shutting down */
 	kmem_cache_free(xen_blkif_cachep, blkif);
@@ -512,8 +518,7 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
 
 	/* Put the reference we set in xen_blkif_alloc(). */
 	xen_blkif_put(be->blkif);
-	kfree(be->mode);
-	kfree(be);
+
 	return 0;
 }
 

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-05-03 15:53                           ` Juergen Gross
@ 2017-05-03 16:58                             ` Steven Haigh
  2017-05-03 22:17                               ` Glenn Enright
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Haigh @ 2017-05-03 16:58 UTC (permalink / raw)
  To: Juergen Gross, glenn
  Cc: Andrew Cooper, Roger Pau Monné, Dietmar Hahn, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 613 bytes --]

On 04/05/17 01:53, Juergen Gross wrote:
> On 03/05/17 12:45, Steven Haigh wrote:
>> Just wanted to give this a little nudge now people seem to be back on
>> deck...
> 
> Glenn, could you please give the attached patch a try?
> 
> It should be applied on top of the other correction, the old debug
> patch should not be applied.
> 
> I have added some debug output to make sure we see what is happening.

This patch is included in kernel-xen-4.9.26-1

It should be in the repos now.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-05-03 16:58                             ` Steven Haigh
@ 2017-05-03 22:17                               ` Glenn Enright
  2017-05-08  9:10                                 ` Juergen Gross
  0 siblings, 1 reply; 28+ messages in thread
From: Glenn Enright @ 2017-05-03 22:17 UTC (permalink / raw)
  To: Steven Haigh, Juergen Gross
  Cc: Andrew Cooper, Roger Pau Monné, Dietmar Hahn, xen-devel

On 04/05/17 04:58, Steven Haigh wrote:
> On 04/05/17 01:53, Juergen Gross wrote:
>> On 03/05/17 12:45, Steven Haigh wrote:
>>> Just wanted to give this a little nudge now people seem to be back on
>>> deck...
>>
>> Glenn, could you please give the attached patch a try?
>>
>> It should be applied on top of the other correction, the old debug
>> patch should not be applied.
>>
>> I have added some debug output to make sure we see what is happening.
>
> This patch is included in kernel-xen-4.9.26-1
>
> It should be in the repos now.
>

Still seeing the same issue. Without the extra debug patch all I see in 
the logs after destroy is this...

xen-blkback: xen_blkif_disconnect: busy
xen-blkback: xen_blkif_free: delayed = 0
br0: port 2(vif1.0) entered disabled state
br0: port 2(vif1.0) entered disabled state
device vif1.0 left promiscuous mode
br0: port 2(vif1.0) entered disabled state

Without the dd running in the domU, the domU exits cleanly on destroy 
and I don't see the busy message from above, e.g.

xen-blkback: xen_blkif_free: delayed = 0
xen-blkback: xen_blkif_free: delayed = 0
br0: port 2(vif5.0) entered disabled state
br0: port 2(vif5.0) entered disabled state
device vif5.0 left promiscuous mode
br0: port 2(vif5.0) entered disabled state


Regards, Glenn
http://rimuhosting.com


{ domid=1; xl sysrq $domid s; sleep 2; xl destroy $domid; }

# xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1512     2     r-----      37.9
(null)                                       1     8     4     --p--d       9.5

# xl inf
host                   : host480.rimuhosting.com
release                : 4.9.26-1.el6xen.x86_64
version                : #1 SMP Thu May 4 02:07:34 AEST 2017
machine                : x86_64
nr_cpus                : 4
max_cpu_id             : 3
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2394
hw_caps                : 
b7ebfbff:0000e3bd:20100800:00000001:00000000:00000000:00000000:00000000
virt_caps              :
total_memory           : 8190
free_memory            : 6573
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 7
xen_extra              : .2
xen_version            : 4.7.2
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : dom0_mem=1512M cpufreq=xen dom0_max_vcpus=2 
dom0_vcpus_pin loglvl=debug vcpu_migration_delay=1000
cc_compiler            : gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
cc_compile_by          : mockbuild
cc_compile_domain      : xen.crc.id.au
cc_compile_date        : Wed Apr 19 18:17:37 AEST 2017
build_id               : d5f616d8c9d8b5decfdbdca9d19eacb60c666418
xend_config_format     : 4

(XEN) 'q' pressed -> dumping domain info (now=0x22E:795D0BA5)
(XEN) General information for domain 0:
(XEN)     refcnt=3 dying=0 pause_count=0
(XEN)     nr_pages=387072 xenheap_pages=5 shared_pages=0 paged_pages=0 
dirty_cpus={0-1} max_pages=4294967295
(XEN)     handle=00000000-0000-0000-0000-000000000000 vm_assist=0000000d
(XEN) Rangesets belonging to domain 0:
(XEN)     I/O Ports  { 0-1f, 22-3f, 44-60, 62-9f, a2-cfb, d00-1007, 
100c-ffff }
(XEN)     log-dirty  { }
(XEN)     Interrupts { 1-30 }
(XEN)     I/O Memory { 0-fedff, fef00-ffffff }
(XEN) Memory pages belonging to domain 0:
(XEN)     DomPage list too long to display
(XEN)     XenPage 000000000020e9c4: caf=c000000000000002, 
taf=7400000000000002
(XEN)     XenPage 000000000020e9c3: caf=c000000000000001, 
taf=7400000000000001
(XEN)     XenPage 000000000020e9c2: caf=c000000000000001, 
taf=7400000000000001
(XEN)     XenPage 000000000020e9c1: caf=c000000000000001, 
taf=7400000000000001
(XEN)     XenPage 00000000000e7d2e: caf=c000000000000002, 
taf=7400000000000002
(XEN) NODE affinity for domain 0: [0]
(XEN) VCPU information and callbacks for domain 0:
(XEN)     VCPU0: CPU0 [has=T] poll=0 upcall_pend=00 upcall_mask=00 
dirty_cpus={0}
(XEN)     cpu_hard_affinity={0} cpu_soft_affinity={0-3}
(XEN)     pause_count=0 pause_flags=0
(XEN)     No periodic timer
(XEN)     VCPU1: CPU1 [has=T] poll=0 upcall_pend=00 upcall_mask=01 
dirty_cpus={1}
(XEN)     cpu_hard_affinity={1} cpu_soft_affinity={0-3}
(XEN)     pause_count=0 pause_flags=0
(XEN)     No periodic timer
(XEN) General information for domain 1:
(XEN)     refcnt=1 dying=2 pause_count=2
(XEN)     nr_pages=2113 xenheap_pages=0 shared_pages=0 paged_pages=0 
dirty_cpus={} max_pages=1280256
(XEN)     handle=7a5642fc-2372-4174-9508-aae2ad1f6908 vm_assist=0000000d
(XEN) Rangesets belonging to domain 1:
(XEN)     I/O Ports  { }
(XEN)     log-dirty  { }
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN) Memory pages belonging to domain 1:
(XEN)     DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
(XEN)     DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001
(XEN) NODE affinity for domain 1: [0]
(XEN) VCPU information and callbacks for domain 1:
(XEN)     VCPU0: CPU0 [has=F] poll=0 upcall_pend=00 upcall_mask=01 
dirty_cpus={}
(XEN)     cpu_hard_affinity={0-3} cpu_soft_affinity={0-3}
(XEN)     pause_count=0 pause_flags=0
(XEN)     No periodic timer
(XEN)     VCPU1: CPU1 [has=F] poll=0 upcall_pend=00 upcall_mask=01 
dirty_cpus={}
(XEN)     cpu_hard_affinity={0-3} cpu_soft_affinity={0-3}
(XEN)     pause_count=0 pause_flags=1
(XEN)     No periodic timer
(XEN)     VCPU2: CPU2 [has=F] poll=0 upcall_pend=00 upcall_mask=01 
dirty_cpus={}
(XEN)     cpu_hard_affinity={0-3} cpu_soft_affinity={0-3}
(XEN)     pause_count=0 pause_flags=0
(XEN)     No periodic timer
(XEN)     VCPU3: CPU3 [has=F] poll=0 upcall_pend=00 upcall_mask=01 
dirty_cpus={}
(XEN)     cpu_hard_affinity={0-3} cpu_soft_affinity={0-3}
(XEN)     pause_count=0 pause_flags=1
(XEN)     No periodic timer
(XEN) Notifying guest 0:0 (virq 1, port 4)
(XEN) Notifying guest 0:1 (virq 1, port 10)
(XEN) Notifying guest 1:0 (virq 1, port 0)
(XEN) Notifying guest 1:1 (virq 1, port 0)
(XEN) Notifying guest 1:2 (virq 1, port 0)
(XEN) Notifying guest 1:3 (virq 1, port 0)
(XEN) Shared frames 0 -- Saved frames 0

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-05-03 22:17                               ` Glenn Enright
@ 2017-05-08  9:10                                 ` Juergen Gross
  2017-05-09  9:24                                   ` Roger Pau Monné
  0 siblings, 1 reply; 28+ messages in thread
From: Juergen Gross @ 2017-05-08  9:10 UTC (permalink / raw)
  To: glenn, Steven Haigh, Jennifer Herbert
  Cc: Andrew Cooper, Roger Pau Monné, Dietmar Hahn, xen-devel

On 04/05/17 00:17, Glenn Enright wrote:
> On 04/05/17 04:58, Steven Haigh wrote:
>> On 04/05/17 01:53, Juergen Gross wrote:
>>> On 03/05/17 12:45, Steven Haigh wrote:
>>>> Just wanted to give this a little nudge now people seem to be back on
>>>> deck...
>>>
>>> Glenn, could you please give the attached patch a try?
>>>
>>> It should be applied on top of the other correction, the old debug
>>> patch should not be applied.
>>>
>>> I have added some debug output to make sure we see what is happening.
>>
>> This patch is included in kernel-xen-4.9.26-1
>>
>> It should be in the repos now.
>>
> 
> Still seeing the same issue. Without the extra debug patch all I see in
> the logs after destroy is this...
> 
> xen-blkback: xen_blkif_disconnect: busy
> xen-blkback: xen_blkif_free: delayed = 0

Hmm, to me it seems as if some grant isn't being unmapped.

Looking at gnttab_unmap_refs_async() I wonder how this is supposed to
work:

I don't see how a grant would ever be unmapped in case of
page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All it
does is deferring the call to the unmap operation again and again. Or
am I missing something here?
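
For reference, the deferral pattern in question looks roughly like this (a
simplified paraphrase of __gnttab_unmap_refs_async() in
drivers/xen/grant-table.c, written from memory rather than copied verbatim):

static void __gnttab_unmap_refs_async(struct gntab_unmap_queue_data *item)
{
	int pc;

	for (pc = 0; pc < item->count; pc++) {
		if (page_count(item->pages[pc]) > 1) {
			/* Page still has extra references: re-queue this
			 * work item and retry later.  Nothing bounds how
			 * often this can happen. */
			schedule_delayed_work(&item->gnttab_work,
					      msecs_to_jiffies(5 * (item->age + 1)));
			return;
		}
	}

	/* Only reached once every page's refcount has dropped to 1. */
	item->done(gnttab_unmap_refs(item->unmap_ops, item->kunmap_ops,
				     item->pages, item->count), item);
}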


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-05-08  9:10                                 ` Juergen Gross
@ 2017-05-09  9:24                                   ` Roger Pau Monné
  2017-05-13  4:02                                     ` Glenn Enright
  0 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monné @ 2017-05-09  9:24 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Steven Haigh, Andrew Cooper, Jennifer Herbert, Dietmar Hahn,
	xen-devel, glenn

On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote:
> On 04/05/17 00:17, Glenn Enright wrote:
> > On 04/05/17 04:58, Steven Haigh wrote:
> >> On 04/05/17 01:53, Juergen Gross wrote:
> >>> On 03/05/17 12:45, Steven Haigh wrote:
> >>>> Just wanted to give this a little nudge now people seem to be back on
> >>>> deck...
> >>>
> >>> Glenn, could you please give the attached patch a try?
> >>>
> >>> It should be applied on top of the other correction, the old debug
> >>> patch should not be applied.
> >>>
> >>> I have added some debug output to make sure we see what is happening.
> >>
> >> This patch is included in kernel-xen-4.9.26-1
> >>
> >> It should be in the repos now.
> >>
> > 
> > Still seeing the same issue. Without the extra debug patch all I see in
> > the logs after destroy is this...
> > 
> > xen-blkback: xen_blkif_disconnect: busy
> > xen-blkback: xen_blkif_free: delayed = 0
> 
> Hmm, to me it seems as if some grant isn't being unmapped.
> 
> Looking at gnttab_unmap_refs_async() I wonder how this is supposed to
> work:
> 
> I don't see how a grant would ever be unmapped in case of
> page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All it
> does is deferring the call to the unmap operation again and again. Or
> am I missing something here?

No, I don't think you are missing anything, but I cannot see how this can be
solved in a better way, unmapping a page that's still referenced is certainly
not the best option, or else we risk triggering a page-fault elsewhere.

IMHO, gnttab_unmap_refs_async should have a timeout, and return an error at
some point. Also, I'm wondering whether there's a way to keep track of who has
references on a specific page, but so far I haven't been able to figure out how
to get this information from Linux.

Also, I've noticed that __gnttab_unmap_refs_async uses page_count, shouldn't it
use page_ref_count instead?
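
One possible shape of such a timeout (purely illustrative, not a proposed
patch; UNMAP_MAX_RETRIES is a made-up constant, and this assumes the queue
item keeps its existing age counter and done callback) would be to bound
the busy branch of the loop sketched earlier:

		if (page_count(item->pages[pc]) > 1) {
			if (item->age++ > UNMAP_MAX_RETRIES) {
				/* Give up and report the failure to the
				 * caller instead of deferring forever. */
				item->done(-EBUSY, item);
				return;
			}
			schedule_delayed_work(&item->gnttab_work,
					      msecs_to_jiffies(5 * item->age));
			return;
		}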

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-05-09  9:24                                   ` Roger Pau Monné
@ 2017-05-13  4:02                                     ` Glenn Enright
  2017-05-15  9:57                                       ` Juergen Gross
  0 siblings, 1 reply; 28+ messages in thread
From: Glenn Enright @ 2017-05-13  4:02 UTC (permalink / raw)
  To: Roger Pau Monné, Juergen Gross
  Cc: Jennifer Herbert, xen-devel, Steven Haigh, Dietmar Hahn, Andrew Cooper

On 09/05/17 21:24, Roger Pau Monné wrote:
> On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote:
>> On 04/05/17 00:17, Glenn Enright wrote:
>>> On 04/05/17 04:58, Steven Haigh wrote:
>>>> On 04/05/17 01:53, Juergen Gross wrote:
>>>>> On 03/05/17 12:45, Steven Haigh wrote:
>>>>>> Just wanted to give this a little nudge now people seem to be back on
>>>>>> deck...
>>>>>
>>>>> Glenn, could you please give the attached patch a try?
>>>>>
>>>>> It should be applied on top of the other correction, the old debug
>>>>> patch should not be applied.
>>>>>
>>>>> I have added some debug output to make sure we see what is happening.
>>>>
>>>> This patch is included in kernel-xen-4.9.26-1
>>>>
>>>> It should be in the repos now.
>>>>
>>>
>>> Still seeing the same issue. Without the extra debug patch all I see in
>>> the logs after destroy is this...
>>>
>>> xen-blkback: xen_blkif_disconnect: busy
>>> xen-blkback: xen_blkif_free: delayed = 0
>>
>> Hmm, to me it seems as if some grant isn't being unmapped.
>>
>> Looking at gnttab_unmap_refs_async() I wonder how this is supposed to
>> work:
>>
>> I don't see how a grant would ever be unmapped in case of
>> page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All it
>> does is deferring the call to the unmap operation again and again. Or
>> am I missing something here?
>
> No, I don't think you are missing anything, but I cannot see how this can be
> solved in a better way, unmapping a page that's still referenced is certainly
> not the best option, or else we risk triggering a page-fault elsewhere.
>
> IMHO, gnttab_unmap_refs_async should have a timeout, and return an error at
> some point. Also, I'm wondering whether there's a way to keep track of who has
> references on a specific page, but so far I haven't been able to figure out how
> to get this information from Linux.
>
> Also, I've noticed that __gnttab_unmap_refs_async uses page_count, shouldn't it
> use page_ref_count instead?
>
> Roger.
>

In case it helps, I have continued to work on this. I noticed processes 
left behind (under 4.9.27). The same issue is ongoing.

# ps auxf | grep [x]vda
root      2983  0.0  0.0      0     0 ?        S    01:44   0:00  \_ [1.xvda1-1]
root      5457  0.0  0.0      0     0 ?        S    02:06   0:00  \_ [3.xvda1-1]
root      7382  0.0  0.0      0     0 ?        S    02:36   0:00  \_ [4.xvda1-1]
root      9668  0.0  0.0      0     0 ?        S    02:51   0:00  \_ [6.xvda1-1]
root     11080  0.0  0.0      0     0 ?        S    02:57   0:00  \_ [7.xvda1-1]

# xl list
Name                              ID   Mem VCPUs      State   Time(s)
Domain-0                          0  1512     2     r-----     118.5
(null)                            1     8     4     --p--d      43.8
(null)                            3     8     4     --p--d       6.3
(null)                            4     8     4     --p--d      73.4
(null)                            6     8     4     --p--d      14.7
(null)                            7     8     4     --p--d      30

Those all have...

[root 11080]# cat wchan
xen_blkif_schedule

[root 11080]# cat stack
[<ffffffff814eaee8>] xen_blkif_schedule+0x418/0xb40
[<ffffffff810a0555>] kthread+0xe5/0x100
[<ffffffff816f1c45>] ret_from_fork+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff

So I can provide anything that is in that process's /proc/<pid> space; 
'maps' and 'pagemap' are empty, fwiw.

I also have perf tools installed on this box, in case anyone has something 
I can run with it that might help.

A reminder: I can replicate this by doing xl destroy $domU while running 
the following inside the domU. Has anyone else been able to repeat this?

{
while true; do
  dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
done
}

This does not reproduce on 4.4 kernels, so there is possibly a regression 
of some sort?

Regards, Glenn

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-05-13  4:02                                     ` Glenn Enright
@ 2017-05-15  9:57                                       ` Juergen Gross
  2017-05-16  0:49                                         ` Glenn Enright
  0 siblings, 1 reply; 28+ messages in thread
From: Juergen Gross @ 2017-05-15  9:57 UTC (permalink / raw)
  To: glenn, Roger Pau Monné
  Cc: Jennifer Herbert, xen-devel, Steven Haigh, Dietmar Hahn, Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 3641 bytes --]

On 13/05/17 06:02, Glenn Enright wrote:
> On 09/05/17 21:24, Roger Pau Monné wrote:
>> On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote:
>>> On 04/05/17 00:17, Glenn Enright wrote:
>>>> On 04/05/17 04:58, Steven Haigh wrote:
>>>>> On 04/05/17 01:53, Juergen Gross wrote:
>>>>>> On 03/05/17 12:45, Steven Haigh wrote:
>>>>>>> Just wanted to give this a little nudge now people seem to be
>>>>>>> back on
>>>>>>> deck...
>>>>>>
>>>>>> Glenn, could you please give the attached patch a try?
>>>>>>
>>>>>> It should be applied on top of the other correction, the old debug
>>>>>> patch should not be applied.
>>>>>>
>>>>>> I have added some debug output to make sure we see what is happening.
>>>>>
>>>>> This patch is included in kernel-xen-4.9.26-1
>>>>>
>>>>> It should be in the repos now.
>>>>>
>>>>
>>>> Still seeing the same issue. Without the extra debug patch all I see in
>>>> the logs after destroy is this...
>>>>
>>>> xen-blkback: xen_blkif_disconnect: busy
>>>> xen-blkback: xen_blkif_free: delayed = 0
>>>
>>> Hmm, to me it seems as if some grant isn't being unmapped.
>>>
>>> Looking at gnttab_unmap_refs_async() I wonder how this is supposed to
>>> work:
>>>
>>> I don't see how a grant would ever be unmapped in case of
>>> page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All it
>>> does is deferring the call to the unmap operation again and again. Or
>>> am I missing something here?
>>
>> No, I don't think you are missing anything, but I cannot see how this
>> can be
>> solved in a better way, unmapping a page that's still referenced is
>> certainly
>> not the best option, or else we risk triggering a page-fault elsewhere.
>>
>> IMHO, gnttab_unmap_refs_async should have a timeout, and return an
>> error at
>> some point. Also, I'm wondering whether there's a way to keep track of
>> who has
>> references on a specific page, but so far I haven't been able to
>> figure out how
>> to get this information from Linux.
>>
>> Also, I've noticed that __gnttab_unmap_refs_async uses page_count,
>> shouldn't it
>> use page_ref_count instead?
>>
>> Roger.
>>
> 
> In case it helps, I have continued to work on this. I notices processed
> left behind (under 4.9.27). The same issue is ongoing.
> 
> # ps auxf | grep [x]vda
> root      2983  0.0  0.0      0     0 ?        S    01:44   0:00  \_
> [1.xvda1-1]
> root      5457  0.0  0.0      0     0 ?        S    02:06   0:00  \_
> [3.xvda1-1]
> root      7382  0.0  0.0      0     0 ?        S    02:36   0:00  \_
> [4.xvda1-1]
> root      9668  0.0  0.0      0     0 ?        S    02:51   0:00  \_
> [6.xvda1-1]
> root     11080  0.0  0.0      0     0 ?        S    02:57   0:00  \_
> [7.xvda1-1]
> 
> # xl list
> Name                              ID   Mem VCPUs      State   Time(s)
> Domain-0                          0  1512     2     r-----     118.5
> (null)                            1     8     4     --p--d      43.8
> (null)                            3     8     4     --p--d       6.3
> (null)                            4     8     4     --p--d      73.4
> (null)                            6     8     4     --p--d      14.7
> (null)                            7     8     4     --p--d      30
> 
> Those all have...
> 
> [root 11080]# cat wchan
> xen_blkif_schedule
> 
> [root 11080]# cat stack
> [<ffffffff814eaee8>] xen_blkif_schedule+0x418/0xb40
> [<ffffffff810a0555>] kthread+0xe5/0x100
> [<ffffffff816f1c45>] ret_from_fork+0x25/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff

And found another reference count bug. Would you like to give the
attached patch (to be applied additionally to the previous ones) a try?


Juergen

[-- Attachment #2: 0003-xen-blkback-don-t-use-xen_blkif_get-in-xen-blkback-k.patch --]
[-- Type: text/x-patch, Size: 1850 bytes --]

From ef37fe77b7b1b1d733011df5615634e543e21c4f Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Mon, 15 May 2017 11:41:18 +0200
Subject: [PATCH 3/3] xen/blkback: don't use xen_blkif_get() in xen-blkback
 kthread

There is no need to use xen_blkif_get()/xen_blkif_put() in the kthread
of xen-blkback. Thread stopping is synchronous, and holding a blkif
reference in the kthread can keep the reference count from ever dropping
to zero at the end of an I/O running concurrently with a disconnect when
multiple rings are in use.

Setting ring->xenblkd to NULL after stopping the kthread isn't needed
as the kthread does this already.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/block/xen-blkback/blkback.c | 3 ---
 drivers/block/xen-blkback/xenbus.c  | 1 -
 2 files changed, 4 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 726c32e35db9..6b14c509f3c7 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -609,8 +609,6 @@ int xen_blkif_schedule(void *arg)
 	unsigned long timeout;
 	int ret;
 
-	xen_blkif_get(blkif);
-
 	set_freezable();
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())
@@ -665,7 +663,6 @@ int xen_blkif_schedule(void *arg)
 		print_stats(ring);
 
 	ring->xenblkd = NULL;
-	xen_blkif_put(blkif);
 
 	return 0;
 }
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index ec7eaeee3765..9f8d136d7636 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -255,7 +255,6 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		if (ring->xenblkd) {
 			kthread_stop(ring->xenblkd);
 			wake_up(&ring->shutdown_wq);
-			ring->xenblkd = NULL;
 		}
 
 		/* The above kthread_stop() guarantees that at this point we
-- 
2.12.0


[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-05-15  9:57                                       ` Juergen Gross
@ 2017-05-16  0:49                                         ` Glenn Enright
  2017-05-16  1:18                                           ` Steven Haigh
  0 siblings, 1 reply; 28+ messages in thread
From: Glenn Enright @ 2017-05-16  0:49 UTC (permalink / raw)
  To: Juergen Gross, Roger Pau Monné
  Cc: Jennifer Herbert, xen-devel, Steven Haigh, Dietmar Hahn, Andrew Cooper

On 15/05/17 21:57, Juergen Gross wrote:
> On 13/05/17 06:02, Glenn Enright wrote:
>> On 09/05/17 21:24, Roger Pau Monné wrote:
>>> On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote:
>>>> On 04/05/17 00:17, Glenn Enright wrote:
>>>>> On 04/05/17 04:58, Steven Haigh wrote:
>>>>>> On 04/05/17 01:53, Juergen Gross wrote:
>>>>>>> On 03/05/17 12:45, Steven Haigh wrote:
>>>>>>>> Just wanted to give this a little nudge now people seem to be
>>>>>>>> back on
>>>>>>>> deck...
>>>>>>>
>>>>>>> Glenn, could you please give the attached patch a try?
>>>>>>>
>>>>>>> It should be applied on top of the other correction, the old debug
>>>>>>> patch should not be applied.
>>>>>>>
>>>>>>> I have added some debug output to make sure we see what is happening.
>>>>>>
>>>>>> This patch is included in kernel-xen-4.9.26-1
>>>>>>
>>>>>> It should be in the repos now.
>>>>>>
>>>>>
>>>>> Still seeing the same issue. Without the extra debug patch all I see in
>>>>> the logs after destroy is this...
>>>>>
>>>>> xen-blkback: xen_blkif_disconnect: busy
>>>>> xen-blkback: xen_blkif_free: delayed = 0
>>>>
>>>> Hmm, to me it seems as if some grant isn't being unmapped.
>>>>
>>>> Looking at gnttab_unmap_refs_async() I wonder how this is supposed to
>>>> work:
>>>>
>>>> I don't see how a grant would ever be unmapped in case of
>>>> page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All it
>>>> does is deferring the call to the unmap operation again and again. Or
>>>> am I missing something here?
>>>
>>> No, I don't think you are missing anything, but I cannot see how this
>>> can be
>>> solved in a better way, unmapping a page that's still referenced is
>>> certainly
>>> not the best option, or else we risk triggering a page-fault elsewhere.
>>>
>>> IMHO, gnttab_unmap_refs_async should have a timeout, and return an
>>> error at
>>> some point. Also, I'm wondering whether there's a way to keep track of
>>> who has
>>> references on a specific page, but so far I haven't been able to
>>> figure out how
>>> to get this information from Linux.
>>>
>>> Also, I've noticed that __gnttab_unmap_refs_async uses page_count,
>>> shouldn't it
>>> use page_ref_count instead?
>>>
>>> Roger.
>>>
>>
>> In case it helps, I have continued to work on this. I notices processed
>> left behind (under 4.9.27). The same issue is ongoing.
>>
>> # ps auxf | grep [x]vda
>> root      2983  0.0  0.0      0     0 ?        S    01:44   0:00  \_
>> [1.xvda1-1]
>> root      5457  0.0  0.0      0     0 ?        S    02:06   0:00  \_
>> [3.xvda1-1]
>> root      7382  0.0  0.0      0     0 ?        S    02:36   0:00  \_
>> [4.xvda1-1]
>> root      9668  0.0  0.0      0     0 ?        S    02:51   0:00  \_
>> [6.xvda1-1]
>> root     11080  0.0  0.0      0     0 ?        S    02:57   0:00  \_
>> [7.xvda1-1]
>>
>> # xl list
>> Name                              ID   Mem VCPUs      State   Time(s)
>> Domain-0                          0  1512     2     r-----     118.5
>> (null)                            1     8     4     --p--d      43.8
>> (null)                            3     8     4     --p--d       6.3
>> (null)                            4     8     4     --p--d      73.4
>> (null)                            6     8     4     --p--d      14.7
>> (null)                            7     8     4     --p--d      30
>>
>> Those all have...
>>
>> [root 11080]# cat wchan
>> xen_blkif_schedule
>>
>> [root 11080]# cat stack
>> [<ffffffff814eaee8>] xen_blkif_schedule+0x418/0xb40
>> [<ffffffff810a0555>] kthread+0xe5/0x100
>> [<ffffffff816f1c45>] ret_from_fork+0x25/0x30
>> [<ffffffffffffffff>] 0xffffffffffffffff
>
> And found another reference count bug. Would you like to give the
> attached patch (to be applied additionally to the previous ones) a try?
>
>
> Juergen
>

This seems to have solved the issue in 4.9.28, with all three patches 
applied. Awesome!

On my main test machine I can no longer replicate what I was originally 
seeing, and in dmesg I now see this flow...

xen-blkback: xen_blkif_disconnect: busy
xen-blkback: xen_blkif_free: delayed = 1
xen-blkback: xen_blkif_free: delayed = 0

xl list is clean, xenstore looks right. No extraneous processes left over.

Thank you so much, Juergen. Really appreciate your persistence with this. 
Anything I can do to help push this upstream, please let me know. Feel 
free to add a Reported-by line with my name if you think it appropriate.

Regards, Glenn
http://rimuhosting.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: null domains after xl destroy
  2017-05-16  0:49                                         ` Glenn Enright
@ 2017-05-16  1:18                                           ` Steven Haigh
  0 siblings, 0 replies; 28+ messages in thread
From: Steven Haigh @ 2017-05-16  1:18 UTC (permalink / raw)
  To: glenn
  Cc: Juergen Gross, Andrew Cooper, Jennifer Herbert, Dietmar Hahn,
	xen-devel, Roger Pau Monné

On 2017-05-16 10:49, Glenn Enright wrote:
> On 15/05/17 21:57, Juergen Gross wrote:
>> On 13/05/17 06:02, Glenn Enright wrote:
>>> On 09/05/17 21:24, Roger Pau Monné wrote:
>>>> On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote:
>>>>> On 04/05/17 00:17, Glenn Enright wrote:
>>>>>> On 04/05/17 04:58, Steven Haigh wrote:
>>>>>>> On 04/05/17 01:53, Juergen Gross wrote:
>>>>>>>> On 03/05/17 12:45, Steven Haigh wrote:
>>>>>>>>> Just wanted to give this a little nudge now people seem to be
>>>>>>>>> back on
>>>>>>>>> deck...
>>>>>>>> 
>>>>>>>> Glenn, could you please give the attached patch a try?
>>>>>>>> 
>>>>>>>> It should be applied on top of the other correction, the old 
>>>>>>>> debug
>>>>>>>> patch should not be applied.
>>>>>>>> 
>>>>>>>> I have added some debug output to make sure we see what is 
>>>>>>>> happening.
>>>>>>> 
>>>>>>> This patch is included in kernel-xen-4.9.26-1
>>>>>>> 
>>>>>>> It should be in the repos now.
>>>>>>> 
>>>>>> 
>>>>>> Still seeing the same issue. Without the extra debug patch all I 
>>>>>> see in
>>>>>> the logs after destroy is this...
>>>>>> 
>>>>>> xen-blkback: xen_blkif_disconnect: busy
>>>>>> xen-blkback: xen_blkif_free: delayed = 0
>>>>> 
>>>>> Hmm, to me it seems as if some grant isn't being unmapped.
>>>>> 
>>>>> Looking at gnttab_unmap_refs_async() I wonder how this is supposed 
>>>>> to
>>>>> work:
>>>>> 
>>>>> I don't see how a grant would ever be unmapped in case of
>>>>> page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All 
>>>>> it
>>>>> does is deferring the call to the unmap operation again and again. 
>>>>> Or
>>>>> am I missing something here?
>>>> 
>>>> No, I don't think you are missing anything, but I cannot see how 
>>>> this
>>>> can be
>>>> solved in a better way, unmapping a page that's still referenced is
>>>> certainly
>>>> not the best option, or else we risk triggering a page-fault 
>>>> elsewhere.
>>>> 
>>>> IMHO, gnttab_unmap_refs_async should have a timeout, and return an
>>>> error at
>>>> some point. Also, I'm wondering whether there's a way to keep track 
>>>> of
>>>> who has
>>>> references on a specific page, but so far I haven't been able to
>>>> figure out how
>>>> to get this information from Linux.
>>>> 
>>>> Also, I've noticed that __gnttab_unmap_refs_async uses page_count,
>>>> shouldn't it
>>>> use page_ref_count instead?
>>>> 
>>>> Roger.
>>>> 
>>> 
>>> In case it helps, I have continued to work on this. I notices 
>>> processed
>>> left behind (under 4.9.27). The same issue is ongoing.
>>> 
>>> # ps auxf | grep [x]vda
>>> root      2983  0.0  0.0      0     0 ?        S    01:44   0:00  \_
>>> [1.xvda1-1]
>>> root      5457  0.0  0.0      0     0 ?        S    02:06   0:00  \_
>>> [3.xvda1-1]
>>> root      7382  0.0  0.0      0     0 ?        S    02:36   0:00  \_
>>> [4.xvda1-1]
>>> root      9668  0.0  0.0      0     0 ?        S    02:51   0:00  \_
>>> [6.xvda1-1]
>>> root     11080  0.0  0.0      0     0 ?        S    02:57   0:00  \_
>>> [7.xvda1-1]
>>> 
>>> # xl list
>>> Name                              ID   Mem VCPUs      State   Time(s)
>>> Domain-0                          0  1512     2     r-----     118.5
>>> (null)                            1     8     4     --p--d      43.8
>>> (null)                            3     8     4     --p--d       6.3
>>> (null)                            4     8     4     --p--d      73.4
>>> (null)                            6     8     4     --p--d      14.7
>>> (null)                            7     8     4     --p--d      30
>>> 
>>> Those all have...
>>> 
>>> [root 11080]# cat wchan
>>> xen_blkif_schedule
>>> 
>>> [root 11080]# cat stack
>>> [<ffffffff814eaee8>] xen_blkif_schedule+0x418/0xb40
>>> [<ffffffff810a0555>] kthread+0xe5/0x100
>>> [<ffffffff816f1c45>] ret_from_fork+0x25/0x30
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>> 
>> And found another reference count bug. Would you like to give the
>> attached patch (to be applied additionally to the previous ones) a 
>> try?
>> 
>> 
>> Juergen
>> 
> 
> This seems to have solved the issue in 4.9.28, with all three patches
> applied. Awesome!
> 
> On my main test machine I can no longer replicate what I was
> originally seeing, and in dmesg I now see this flow...
> 
> xen-blkback: xen_blkif_disconnect: busy
> xen-blkback: xen_blkif_free: delayed = 1
> xen-blkback: xen_blkif_free: delayed = 0
> 
> xl list is clean, xenstore looks right. No extraneous processes left 
> over.
> 
> Thankyou Juergen, so much. Really appreciate your persistence with
> this. Anything I can do to help push this upstream please let me know.
> Feel free to add a reported-by line with my name if you think it
> appropriate.

This is good news.

Juergen, can I request a full patch set posted to the list (please CC me)? 
I'll ensure we can build the kernel with all 3 (?) patches applied 
and test properly.

I'll build up a complete kernel with those patches and give a tested-by 
if all goes well.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-05-16  1:18 UTC | newest]

Thread overview: 28+ messages
2017-04-11  5:25 null domains after xl destroy Glenn Enright
2017-04-11  5:59 ` Juergen Gross
2017-04-11  8:03   ` Glenn Enright
2017-04-11  9:49     ` Dietmar Hahn
2017-04-11 22:13       ` Glenn Enright
2017-04-11 22:23         ` Andrew Cooper
2017-04-11 22:45           ` Glenn Enright
2017-04-18  8:36             ` Juergen Gross
2017-04-19  1:02               ` Glenn Enright
2017-04-19  4:39                 ` Juergen Gross
2017-04-19  7:16                   ` Roger Pau Monné
2017-04-19  7:35                     ` Juergen Gross
2017-04-19 10:09                     ` Juergen Gross
2017-04-19 16:22                       ` Steven Haigh
2017-04-21  8:42                         ` Steven Haigh
2017-04-21  8:44                           ` Juergen Gross
2017-05-01  0:55                       ` Glenn Enright
2017-05-03 10:45                         ` Steven Haigh
2017-05-03 13:38                           ` Juergen Gross
2017-05-03 15:53                           ` Juergen Gross
2017-05-03 16:58                             ` Steven Haigh
2017-05-03 22:17                               ` Glenn Enright
2017-05-08  9:10                                 ` Juergen Gross
2017-05-09  9:24                                   ` Roger Pau Monné
2017-05-13  4:02                                     ` Glenn Enright
2017-05-15  9:57                                       ` Juergen Gross
2017-05-16  0:49                                         ` Glenn Enright
2017-05-16  1:18                                           ` Steven Haigh
