xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* Re: [Xen-users] DomU fails to reboot with storage driver domain
       [not found]   ` <CALhSYYQLPdEZ-pSBdUjKDWnNsmF95zu9t=T7zGZ69Vb1S1hwWg@mail.gmail.com>
@ 2016-04-01 11:04     ` Roger Pau Monné
  2016-04-01 12:07       ` Wei Liu
  0 siblings, 1 reply; 5+ messages in thread
From: Roger Pau Monné @ 2016-04-01 11:04 UTC (permalink / raw)
  To: Alex Velazquez
  Cc: Ian.Jackson, xen-users, wei.liu2, xen-devel, Roger Pau Monné

[-- Attachment #1: Type: text/plain, Size: 4850 bytes --]

Adding xen-devel to the Cc, since this is AFAICT a real bug.

On Thu, 24 Mar 2016, Alex Velazquez wrote:

> On Wed, Mar 23, 2016 at 6:56 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > Hello,
> >
> > On Mon, 21 Mar 2016, Alex Velazquez wrote:
> >> Hello,
> >>
> >> I am running Xen 4.6.0, with Ubuntu 14.04 as my Domain-0.
> >>
> >> I have a storage driver domain (PV guest running Ubuntu 14.04) that
> >> serves a disk backend to a PV DomU (also running Ubuntu 14.04).
> >>
> >> Here is the XL config file of StorageDom:
> >>
> >> > name = "StorageDom"
> >> > memory = 1024
> >> > maxmem = 1024
> >> > vcpus = 2
> >> > maxvcpus = 2
> >> > driver_domain = 1
> >> > pci = [ "84:00.0" ]
> >> > builder = "generic"
> >> > kernel = "/var/lib/xen/images/vmlinuz-3.19.0-56-generic"
> >> > ramdisk = "/var/lib/xen/images/initrd.img-3.19.0-56-generic"
> >> > cmdline = "root=/dev/sda1 ro"
> >>
> >>
> >> Here is the XL config file of ClientDom:
> >>
> >> > name = "ClientDom"
> >> > memory = 1024
> >> > maxmem = 1024
> >> > vcpus = 2
> >> > maxvcpus = 2
> >> > builder = "generic"
> >> > kernel = "/usr/local/lib/xen/boot/pv-grub-x86_64.gz"
> >> > cmdline = "(hd0,0)/boot/grub/menu.lst"
> >> > disk = [ "format=raw,vdev=xvda,access=rw,backend=StorageDom,target=/dev/sdb" ]
> >>
> >>
> >> When I start ClientDom, everything looks good. Here is the backend
> >> entry in xenstore:
> >>
> >> > user@ubuntu ~> $ sudo xenstore-ls /local/domain/1/backend/vbd
> >> > 2 = ""
> >> >  51712 = ""
> >> >   frontend = "/local/domain/2/device/vbd/51712"
> >> >   params = "/dev/sdb"
> >> >   script = "/etc/xen/scripts/block"
> >> >   frontend-id = "2"
> >> >   online = "1"
> >> >   removable = "0"
> >> >   bootable = "1"
> >> >   state = "4"
> >> >   dev = "xvda"
> >> >   type = "phy"
> >> >   mode = "w"
> >> >   device-type = "disk"
> >> >   discard-enable = "1"
> >> >   physical-device = "8:10"
> >> >   feature-flush-cache = "1"
> >> >   feature-discard = "0"
> >> >   feature-barrier = "1"
> >> >   feature-persistent = "1"
> >> >   feature-max-indirect-segments = "256"
> >> >   sectors = "1562824368"
> >> >   info = "2"
> >> >   sector-size = "512"
> >> >   physical-sector-size = "512"
> >> >   hotplug-status = "connected"
> >>
> >>
> >> And here is the corresponding frontend entry:
> >>
> >> > user@ubuntu ~> $ sudo xenstore-ls /local/domain/2/device/vbd
> >> > 51712 = ""
> >> >  backend = "/local/domain/1/backend/vbd/2/51712"
> >> >  backend-id = "1"
> >> >  state = "4"
> >> >  virtual-device = "51712"
> >> >  device-type = "disk"
> >> >  protocol = "x86_64-abi"
> >> >  ring-ref = "8"
> >> >  event-channel = "17"
> >> >  feature-persistent = "1"
> >>
> >>
> >> I run into problems if I try to reboot ClientDom (either from within
> >> the VM, or by calling "xl reboot ClientDom" from Domain-0). As
> >> ClientDom goes down, the backend entry is cleared out:
> >>
> >> > user@ubuntu ~> $ sudo xenstore-ls /local/domain/1/backend/vbd
> >> > 2 = ""
> >>
> >>
> >> Then ClientDom comes back up with ID 3, but the new backend/frontend
> >> are not created:
> >>
> >> > user@ubuntu ~> $ sudo xenstore-ls /local/domain/1/backend/vbd
> >> > 2 = ""
> >>
> >>
> >> > user@ubuntu ~> $ sudo xenstore-ls /local/domain/3/device/vbd
> >> > xenstore-ls: xs_directory (/local/domain/3/device/vbd): No such file or directory
> >>
> >>
> >> > user@ubuntu ~> $ sudo xenstore-ls /local/domain/3/device
> >> > suspend = ""
> >> >  event-channel = ""
> >>
> >>
> >> Connecting to ClientDom's console shows the PvGrub prompt, because it
> >> can't find its boot disk:
> >>
> >> >
> >> >     GNU GRUB  version 0.97  (1048576K lower / 0K upper memory)
> >> >
> >> >        [ Minimal BASH-like line editing is supported.   For
> >> >          the   first   word,  TAB  lists  possible  command
> >> >          completions.  Anywhere else TAB lists the possible
> >> >          completions of a device/filename. ]
> >> >
> >> > grubdom> root (hd0,0)
> >> >
> >> > Error 21: Selected disk does not exist
> >> >
> >> > grubdom>

Hello,

I've been able to reproduce this locally using the latest Xen unstable, 
and it is indeed a bug. The issue happens because libxl compares the JSON 
status with the list of backends it fetches from Domain 0 only, and then 
if the domain is using backends from a driver domain, it is not able to 
find those entries on Domain 0 and discards them. As an example 
libxl__append_disk_list_of_type hardcodes the backend path of Domain 0 as 
the only search path.

TBH, I don't see an easy way to solve this, I've thought about fetching 
the "backend" node from the xenstore frontend path of each device, but 
that's not safe since the guest can modify those entries.

Since it's not clear to me, why do we need to check the JSON internal data 
against the devices on xenstore? Is there a possibility that they became 
out of sync?

Roger.

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Xen-users] DomU fails to reboot with storage driver domain
  2016-04-01 11:04     ` [Xen-users] DomU fails to reboot with storage driver domain Roger Pau Monné
@ 2016-04-01 12:07       ` Wei Liu
  2016-04-01 13:43         ` Ian Jackson
  0 siblings, 1 reply; 5+ messages in thread
From: Wei Liu @ 2016-04-01 12:07 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Alex Velazquez, xen-users, wei.liu2, Ian.Jackson, xen-devel

On Fri, Apr 01, 2016 at 01:04:42PM +0200, Roger Pau Monné wrote:
> Hello,
> 
> I've been able to reproduce this locally using the latest Xen unstable, 
> and it is indeed a bug. The issue happens because libxl compares the JSON 
> status with the list of backends it fetches from Domain 0 only, and then 
> if the domain is using backends from a driver domain, it is not able to 
> find those entries on Domain 0 and discards them. As an example 
> libxl__append_disk_list_of_type hardcodes the backend path of Domain 0 as 
> the only search path.

This is the real bug. I think libxl__append_disk_list_of_type should not
hardcode 0 since we support backend domain other than 0.  The problem
you discovered is not specific to disk device. Nic and channel have the
same problem, too.

> 
> TBH, I don't see an easy way to solve this, I've thought about fetching 
> the "backend" node from the xenstore frontend path of each device, but 
> that's not safe since the guest can modify those entries.
> 

Interrogating frontend  is not entirely unsafe because we can validate
that path before reading from it. There is also a backend-id field that
we can use if that make validation easier -- no need to parse a frontend
provided string.

Another fix is to fetch all backend domain name / domid from JSON, then
fetch all xenstore backend entries. This is safe because JSON is not
controlled by guest. This might require adding locks to multiple APIs,
but luckily that wouldn't change their semantics.

Ian, do you have better ideas?

> Since it's not clear to me, why do we need to check the JSON internal data 
> against the devices on xenstore? Is there a possibility that they became 
> out of sync?
> 

The possibility that they go out of sync is exactly why we need to have
primary reference. In this case, the primary reference is xenstore.

Wei.

> Roger.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Xen-users] DomU fails to reboot with storage driver domain
  2016-04-01 12:07       ` Wei Liu
@ 2016-04-01 13:43         ` Ian Jackson
  2016-04-01 13:54           ` Wei Liu
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Jackson @ 2016-04-01 13:43 UTC (permalink / raw)
  To: Wei Liu; +Cc: Alex Velazquez, xen-users, xen-devel, Roger Pau Monné

Wei Liu writes ("Re: [Xen-users] DomU fails to reboot with storage driver domain"):
> On Fri, Apr 01, 2016 at 01:04:42PM +0200, Roger Pau Monné wrote:
> > TBH, I don't see an easy way to solve this, I've thought about fetching 
> > the "backend" node from the xenstore frontend path of each device, but 
> > that's not safe since the guest can modify those entries.
> > 
> Interrogating frontend  is not entirely unsafe because we can validate
> that path before reading from it. There is also a backend-id field that
> we can use if that make validation easier -- no need to parse a frontend
> provided string.
> 
> Another fix is to fetch all backend domain name / domid from JSON, then
> fetch all xenstore backend entries. This is safe because JSON is not
> controlled by guest. This might require adding locks to multiple APIs,
> but luckily that wouldn't change their semantics.
>
> Ian, do you have better ideas?

Either of these two approaches sound good.

I'm not sure why using the JSON domid would need any additional
locking.  The code here already has the JSON in its hand, doesn't it ?
But using the domain _name_ rather than the domid is wrong, and I
think the JSON might have only the name.

I think the xenstore approach is probably better.  I think it may be
best to use (with checking) the frontend's backend path, since ideally
we would find the corresponding device entry directly.

But I am happy with whatever is most convenient.

I think we should fix this for 4.7 and the fix is a bugfix so OK to go
in after the freeze.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Xen-users] DomU fails to reboot with storage driver domain
  2016-04-01 13:43         ` Ian Jackson
@ 2016-04-01 13:54           ` Wei Liu
  2016-04-12 20:05             ` Alex Velazquez
  0 siblings, 1 reply; 5+ messages in thread
From: Wei Liu @ 2016-04-01 13:54 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Alex Velazquez, xen-users, Wei Liu, xen-devel, Roger Pau Monné

On Fri, Apr 01, 2016 at 02:43:32PM +0100, Ian Jackson wrote:
> Wei Liu writes ("Re: [Xen-users] DomU fails to reboot with storage driver domain"):
> > On Fri, Apr 01, 2016 at 01:04:42PM +0200, Roger Pau Monné wrote:
> > > TBH, I don't see an easy way to solve this, I've thought about fetching 
> > > the "backend" node from the xenstore frontend path of each device, but 
> > > that's not safe since the guest can modify those entries.
> > > 
> > Interrogating frontend  is not entirely unsafe because we can validate
> > that path before reading from it. There is also a backend-id field that
> > we can use if that make validation easier -- no need to parse a frontend
> > provided string.
> > 
> > Another fix is to fetch all backend domain name / domid from JSON, then
> > fetch all xenstore backend entries. This is safe because JSON is not
> > controlled by guest. This might require adding locks to multiple APIs,
> > but luckily that wouldn't change their semantics.
> >
> > Ian, do you have better ideas?
> 
> Either of these two approaches sound good.
> 
> I'm not sure why using the JSON domid would need any additional
> locking.  The code here already has the JSON in its hand, doesn't it ?
> But using the domain _name_ rather than the domid is wrong, and I
> think the JSON might have only the name.
> 

The APIs that retrieve lists of devices will need to lock JSON config
as well, which they don't currently do because they only reference
xenstore.

> I think the xenstore approach is probably better.  I think it may be
> best to use (with checking) the frontend's backend path, since ideally
> we would find the corresponding device entry directly.
> 
> But I am happy with whatever is most convenient.
> 

Interrogating frontend is more convenient.

> I think we should fix this for 4.7 and the fix is a bugfix so OK to go
> in after the freeze.
> 

Yes, it's a bug that should be fixed. I can give it a stab next week
when I'm back in office unless someone else beats me to it.

Wei.

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Xen-users] DomU fails to reboot with storage driver domain
  2016-04-01 13:54           ` Wei Liu
@ 2016-04-12 20:05             ` Alex Velazquez
  0 siblings, 0 replies; 5+ messages in thread
From: Alex Velazquez @ 2016-04-12 20:05 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-users, Ian Jackson, xen-devel, Roger Pau Monné

On Fri, Apr 1, 2016 at 9:54 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> On Fri, Apr 01, 2016 at 02:43:32PM +0100, Ian Jackson wrote:
>> Wei Liu writes ("Re: [Xen-users] DomU fails to reboot with storage driver domain"):
>> > On Fri, Apr 01, 2016 at 01:04:42PM +0200, Roger Pau Monné wrote:
>> > > TBH, I don't see an easy way to solve this, I've thought about fetching
>> > > the "backend" node from the xenstore frontend path of each device, but
>> > > that's not safe since the guest can modify those entries.
>> > >
>> > Interrogating frontend  is not entirely unsafe because we can validate
>> > that path before reading from it. There is also a backend-id field that
>> > we can use if that make validation easier -- no need to parse a frontend
>> > provided string.
>> >
>> > Another fix is to fetch all backend domain name / domid from JSON, then
>> > fetch all xenstore backend entries. This is safe because JSON is not
>> > controlled by guest. This might require adding locks to multiple APIs,
>> > but luckily that wouldn't change their semantics.
>> >
>> > Ian, do you have better ideas?
>>
>> Either of these two approaches sound good.
>>
>> I'm not sure why using the JSON domid would need any additional
>> locking.  The code here already has the JSON in its hand, doesn't it ?
>> But using the domain _name_ rather than the domid is wrong, and I
>> think the JSON might have only the name.
>>
>
> The APIs that retrieve lists of devices will need to lock JSON config
> as well, which they don't currently do because they only reference
> xenstore.
>
>> I think the xenstore approach is probably better.  I think it may be
>> best to use (with checking) the frontend's backend path, since ideally
>> we would find the corresponding device entry directly.
>>
>> But I am happy with whatever is most convenient.
>>
>
> Interrogating frontend is more convenient.
>
>> I think we should fix this for 4.7 and the fix is a bugfix so OK to go
>> in after the freeze.
>>
>
> Yes, it's a bug that should be fixed. I can give it a stab next week
> when I'm back in office unless someone else beats me to it.
>
> Wei.
>
>> Ian.

Hi all,

Not sure that there's any further information I can provide to help
with this, as it sounds like a viable solution has been proposed.
However, if/when there is a patch available, I am happy to test it out
on my end.

Thanks,
Alex

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-04-12 20:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CALhSYYRek3PeJyJEVNSLFE6WCbkDc3n+Qj6L5XYwQKkMgthqPw@mail.gmail.com>
     [not found] ` <alpine.OSX.2.20.1603231152520.753@mac>
     [not found]   ` <CALhSYYQLPdEZ-pSBdUjKDWnNsmF95zu9t=T7zGZ69Vb1S1hwWg@mail.gmail.com>
2016-04-01 11:04     ` [Xen-users] DomU fails to reboot with storage driver domain Roger Pau Monné
2016-04-01 12:07       ` Wei Liu
2016-04-01 13:43         ` Ian Jackson
2016-04-01 13:54           ` Wei Liu
2016-04-12 20:05             ` Alex Velazquez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).