* [RFC, PATCH] docs: Block numbering and naming specification
From: Ian Jackson @ 2010-09-21 15:24 UTC
  To: xen-devel; +Cc: Ian Campbell, Jeremy Fitzhardinge, Stefano Stabellini

This document describes the vbd device numbering and naming.  I've
posted versions of it before.  It should be in docs/misc, so here is a
patch to add it.

This is currently an RFC because the section near the bottom about the
behaviour of Linux guests needs to be checked for accuracy.  In
particular, it would be good for Stefano or Jeremy to confirm the
behaviour of current pvops kernels.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>

diff -r 77a3da957017 docs/misc/block-numbering-naming.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/misc/block-numbering-naming.txt	Tue Sep 21 16:20:53 2010 +0100
@@ -0,0 +1,124 @@
+Xen guest interface
+-------------------
+
+A Xen guest can be provided with block devices.  These are always
+provided as Xen VBDs; for HVM guests they may also be provided as
+emulated IDE or SCSI disks.
+
+The abstract interface involves specifying, for each block device:
+
+ * Nominal disk type: Xen virtual disk (aka xvd*, the default); SCSI
+   (sd*); IDE (hd*).
+
+   For HVM guests, each whole-disk hd* and sd* device is made
+   available _both_ via the emulated IDE or SCSI controller
+   (respectively) _and_ as a Xen VBD.  The HVM guest is entitled to
+   assume that each emulated disk targets the same underlying device
+   as the corresponding Xen VBD (i.e., multipath).
+
+   For PV guests every device is made available to the guest only as a
+   Xen VBD.  For these domains the type is advisory, for use by the
+   guest's device naming scheme.
+
+   The Xen interface does not specify what name a device should have
+   in the guest (nor what major/minor device number it should have in
+   the guest, if the guest has such a concept).
+
+ * Disk number, which is a nonnegative integer,
+   conventionally starting at 0 for the first disk.
+
+ * Partition number, which is a nonnegative integer; by convention,
+   partition 0 indicates the "whole disk".
+
+   Normally, for any disk, _either_ partition 0 should be supplied,
+   in which case the guest is expected to treat it as it would a
+   native whole disk (for example by putting or expecting a partition
+   table or disk label on it);
+
+   _Or_ only non-0 partitions should be supplied, in which case the
+   guest should expect storage management to be done by the host and
+   treat each vbd as it would a partition, slice or LVM volume (for
+   example by putting or expecting a filesystem on it).
+
+   Non-whole disk devices cannot be passed through to HVM guests via
+   the emulated IDE or SCSI controllers.
+
+
+Configuration file syntax
+-------------------------
+
+The config file syntaxes are, for example:
+
+       d0 d0p0  xvda     Xen virtual disk 0 partition 0 (whole disk)
+       d1p2     xvda2    Xen virtual disk 1 partition 2
+       d536p37  xvdtq37  Xen virtual disk 536 partition 37
+       sdb3              SCSI disk 1 partition 3
+       hdc2              IDE disk 2 partition 2
+
+The d*p* syntax is not supported by xm/xend.
+
+To cope with guests which predate this scheme, we preserve the
+existing facility to specify the xenstore numerical value directly,
+by putting a single number (hex, decimal or octal) in the domain
+config file instead of the disk identifier.
+
+
+Concrete encoding in the VBD interface (in xenstore)
+----------------------------------------------------
+
+The information above is encoded in the concrete interface as a
+single integer, written in canonical decimal form in xenstore, laid
+out as follows:
+
+    1 << 28 | disk << 8 | partition      xvd, disks or partitions 16 onwards
+   202 << 8 | disk << 4 | partition      xvd, disks and partitions up to 15
+     8 << 8 | disk << 4 | partition      sd, disks and partitions up to 15
+     3 << 8 | disk << 6 | partition      hd, disks 0..1, partitions 0..63
+    22 << 8 | (disk-2) << 6 | partition  hd, disks 2..3, partitions 0..63
+    2 << 28 onwards                      reserved for future use
+   other values less than 1 << 28        deprecated / reserved
+
+The 1<<28 format handles disks up to (1<<20)-1 and partitions up to
+255.  It will be used only where the 202<<8 format does not have
+enough bits.
+
+Guests MAY support any subset of the formats above except that if they
+support 1<<28 they MUST also support 202<<8.  PV-on-HVM drivers MUST
+support at least one of 3<<8 or 8<<8; 3<<8 is recommended.
+
+Some software has provided essentially Linux-specific encodings for
+SCSI disks beyond disk 15 partition 15, and IDE disks beyond disk 3
+partition 63.  These vbds, and the corresponding encoded integers, are
+deprecated.
+
+Guests SHOULD ignore numbers that they do not understand or
+recognise.  They SHOULD check supplied numbers for validity.
+
+
+Notes on Linux as a guest
+-------------------------
+
+Very old Linux guests (PV and PV-on-HVM) are able to "steal" the
+device numbers and names normally used by the IDE and SCSI
+controllers, so that writing "hda1" in the config file results in
+/dev/hda1 in the guest.  These systems interpret the xenstore integer
+as
+       major << 8 | minor
+where major and minor are the Linux-specific device numbers.  Some old
+configurations may depend on deprecated high-numbered SCSI and IDE
+disks.  This does not work in recent versions of Linux.
+
+So for Linux PV guests, users should supply only xvd* devices.
+Modern PV drivers will map these to identically-named devices in
+the guest.
+
+For Linux HVM guests using PV-on-HVM drivers, users should supply as
+few hd* devices as possible and use pure xvd* devices for the rest.
+Modern PV-on-HVM drivers will map the hd* devices to /dev/xvdHDa
+etc.
+
+Some Linux HVM guests with broken PV-on-HVM drivers do not cope
+properly if both hda and hdc are supplied, nor with both hda and
+xvda, because they map the bottom 8 bits of the xenstore integer
+directly to the Linux guest's minor device number and throw away the
+rest; they can crash due to minor number clashes.

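The encoding table above can be read as a pair of functions. This is a minimal Python sketch, not code from any Xen tool: the helper names `encode_vdev` and `decode_vdev` and the "xvd"/"sd"/"hd" kind strings are assumptions made for illustration. The decoder returns None for reserved or deprecated values, following the rule that guests SHOULD ignore numbers they do not recognise and check them for validity.

```python
from typing import Optional, Tuple


def encode_vdev(kind: str, disk: int, partition: int) -> int:
    """Encode (kind, disk, partition) as the xenstore vbd integer.

    kind is assumed to be one of "xvd", "sd" or "hd".  Raises ValueError
    when the combination has no non-deprecated encoding.
    """
    if disk < 0 or partition < 0:
        raise ValueError("disk and partition must be nonnegative")
    if kind == "xvd":
        if disk <= 15 and partition <= 15:
            return 202 << 8 | disk << 4 | partition
        # Extended format, used only when 202 << 8 runs out of bits.
        if disk < 1 << 20 and partition <= 255:
            return 1 << 28 | disk << 8 | partition
    elif kind == "sd":
        if disk <= 15 and partition <= 15:
            return 8 << 8 | disk << 4 | partition
    elif kind == "hd" and partition <= 63:
        if disk <= 1:
            return 3 << 8 | disk << 6 | partition
        if disk <= 3:
            return 22 << 8 | (disk - 2) << 6 | partition
    raise ValueError(f"no encoding for {kind} disk {disk} partition {partition}")


def decode_vdev(value: int) -> Optional[Tuple[str, int, int]]:
    """Decode a xenstore vbd integer; None means "not understood"."""
    if value < 0 or value >= 2 << 28:
        return None  # negative, or reserved for future use
    if value >= 1 << 28:
        # Bits 8..27 hold the disk, bits 0..7 the partition.
        return ("xvd", (value >> 8) & 0xFFFFF, value & 0xFF)
    major, minor = value >> 8, value & 0xFF
    if major == 202:
        return ("xvd", minor >> 4, minor & 0xF)
    if major == 8:
        return ("sd", minor >> 4, minor & 0xF)
    if major == 3 and minor >> 6 <= 1:
        return ("hd", minor >> 6, minor & 0x3F)
    if major == 22 and minor >> 6 <= 1:
        return ("hd", 2 + (minor >> 6), minor & 0x3F)
    return None  # deprecated or reserved encoding
```

For example, `encode_vdev("xvd", 1, 2)` gives 51730 (202 << 8 | 1 << 4 | 2), the value xenstore would hold for Xen virtual disk 1 partition 2, and `decode_vdev` maps it back.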
-- 


* Re: [RFC, PATCH] docs: Block numbering and naming specification
From: Stefano Stabellini @ 2010-09-22 11:37 UTC
  To: Ian Jackson; +Cc: Ian Campbell, Jeremy Fitzhardinge, xen-devel, Stefano Stabellini

On Tue, 21 Sep 2010, Ian Jackson wrote:
> diff -r 77a3da957017 docs/misc/block-numbering-naming.txt
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/docs/misc/block-numbering-naming.txt	Tue Sep 21 16:20:53 2010 +0100
> @@ -0,0 +1,124 @@
> +Xen guest interface
> +-------------------
> +
> +A Xen guest can be provided with block devices.  These are always
> +provided as Xen VBDs; for HVM guests they may also be provided as
> +emulated IDE or SCSI disks.
> +
> +The abstract interface involves specifying, for each block device:
> +
> + * Nominal disk type: Xen virtual disk (aka xvd*, the default); SCSI
> +   (sd*); IDE (hd*).
> +
> +   For HVM guests, each whole-disk hd* and sd* device is made
> +   available _both_ via emulated IDE resp. SCSI controller, _and_ as a
> +   Xen VBD.  The HVM guest is entitled to assume that the IDE or SCSI
> +   disks available via the emulated IDE controller target the same
> +   underlying devices as the corresponding Xen VBD (ie, multipath).
> +
> +   For PV guests every device is made available to the guest only as a
> +   Xen VBD.  For these domains the type is advisory, for use by the
> +   guest's device naming scheme.
> +
> +   The Xen interface does not specify what name a device should have
> +   in the guest (nor what major/minor device number it should have in
> +   the guest, if the guest has such a concept).
> +

It should be made clear that, for HVM guests, specifying xvd* in the VM
config file means that the user is requesting a PV-only disk without
any corresponding emulated disk.
As a consequence, using only xvd* disks in an HVM config file is a
mistake, because grub (or any other bootloader) wouldn't be able to boot
the OS.


> + * Disk number, which is a nonnegative integer,
> +   conventionally starting at 0 for the first disk.
> +
> + * Partition number, which is a nonnegative integer where by
> +   convention partition 0 indicates the "whole disk".
> +
> +   Normally for any disk _either_ partition 0 should be supplied in
> +   which case the guest is expected to treat it as they would a native
> +   whole disk (for example by putting or expecting a partition table
> +   or disk label on it);
> +
> +   _Or_ only non-0 partitions should be supplied in which case the
> +   guest should expect storage management to be done by the host and
> +   treat each vbd as it would a partition or slice or LVM volume (for
> +   example by putting or expecting a filesystem on it).
> +
> +   Non-whole disk devices cannot be passed through to HVM guests via
> +   the emulated IDE or SCSI controllers.
> +
> +
> +Configuration file syntax
> +-------------------------
> +
> +The config file syntaxes are, for example
> +
> +       d0 d0p0  xvda     Xen virtual disk 0 partition 0 (whole disk)
> +       d1p2     xvda2    Xen virtual disk 1 partition 2

shouldn't this be xvdb2?
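
The xvd names in this table follow the usual Linux disk-letter scheme, a bijective base-26 numbering (a to z, then aa, ab, and so on), under which disk 1 is "b". A minimal Python sketch, assuming that scheme; `xvd_name` is a hypothetical helper, not part of xm or xl.

```python
def xvd_name(disk: int, partition: int = 0) -> str:
    """Build an xvd* name from a disk number and partition number.

    Disk letters use bijective base 26: 0 -> a, 25 -> z, 26 -> aa, ...
    Partition 0 means the whole disk, so it gets no numeric suffix.
    """
    if disk < 0 or partition < 0:
        raise ValueError("disk and partition must be nonnegative")
    letters = ""
    n = disk + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return "xvd" + letters + (str(partition) if partition else "")
```

With this, `xvd_name(1, 2)` is "xvdb2" and `xvd_name(536, 37)` is "xvdtq37", matching the d536p37 line below.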

> +       d536p37  xvdtq37  Xen virtual disk 536 partition 37
> +       sdb3              SCSI disk 1 partition 3
> +       hdc2              IDE disk 2 partition 2
> +
> +The d*p* syntax is not supported by xm/xend.
> +
> +To cope with guests which predate this scheme we therefore preserve
> +the existing facility to specify the xenstore numerical value directly
> +by putting a single number (hex, decimal or octal) in the domain
> +config file instead of the disk identifier.
> +
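
A bare number in the config file has to be turned into the xenstore value somehow. The text does not say exactly which spellings of hex and octal are accepted, so this Python sketch assumes the C strtol(..., 0) convention (0x for hex, a leading 0 for octal, otherwise decimal); `parse_legacy_vdev` is a hypothetical name.

```python
def parse_legacy_vdev(text: str) -> int:
    """Parse a bare vbd number written in hex, octal or decimal.

    Assumed convention (as in C's strtol with base 0): "0x..." is hex,
    any other number with a leading "0" is octal, everything else is
    decimal.
    """
    s = text.strip().lower()
    if s.startswith("0x"):
        return int(s[2:], 16)
    if len(s) > 1 and s.startswith("0"):
        return int(s[1:], 8)
    return int(s, 10)
```

Under that assumption "0xca00", "0145000" and "51712" all give 51712 (202 << 8, i.e. xvda), which would then be written to xenstore in decimal.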
> +
> +Concrete encoding in the VBD interface (in xenstore)
> +----------------------------------------------------
> +
> +The information above is encoded in the concrete interface as an
> +integer (in a canonical decimal format in xenstore), whose value
> +encodes the information above as follows:
> +
> +    1 << 28 | disk << 8 | partition      xvd, disks or partitions 16 onwards
> +   202 << 8 | disk << 4 | partition      xvd, disks and partitions up to 15
> +     8 << 8 | disk << 4 | partition      sd, disks and partitions up to 15
> +     3 << 8 | disk << 6 | partition      hd, disks 0..1, partitions 0..63
> +    22 << 8 | (disk-2) << 6 | partition  hd, disks 2..3, partitions 0..63
> +    2 << 28 onwards                      reserved for future use
> +   other values less than 1 << 28        deprecated / reserved
> +
> +The 1<<28 format handles disks up to (1<<20)-1 and partitions up to
> +255.  It will be used only where the 202<<8 format does not have
> +enough bits.
> +
> +Guests MAY support any subset of the formats above except that if they
> +support 1<<28 they MUST also support 202<<8.  PV-on-HVM drivers MUST
> +support at least one of 3<<8 or 8<<8; 3<<8 is recommended.
> +
> +Some software has provided essentially Linux-specific encodings for
> +SCSI disks beyond disk 15 partition 15, and IDE disks beyond disk 3
> +partition 63.  These vbds, and the corresponding encoded integers, are
> +deprecated.
> +
> +Guests SHOULD ignore numbers that they do not understand or
> +recognise.  They SHOULD check supplied numbers for validity.
> +
> +
> +Notes on Linux as a guest
> +-------------------------
> +
> +Very old Linux guests (PV and PV-on-HVM) are able to "steal" the
> +device numbers and names normally used by the IDE and SCSI
> +controllers, so that writing "hda1" in the config file results in
> +/dev/hda1 in the guest.  These systems interpret the xenstore integer
> +as
> +       major << 8 | minor
> +where major and minor are the Linux-specific device numbers.  Some old
> +configurations may depend on deprecated high-numbered SCSI and IDE
> +disks.  This does not work in recent versions of Linux.
> +
> +So for Linux PV guests, users are recommended to supply xvd* devices
> +only.  Modern PV drivers will map these to identically-named devices
> +in the guest.
> +
> +For Linux HVM guests using PV-on-HVM drivers, users are recommended to
> +supply as few hd* devices as possible and use pure xvd* devices for
> +the rest.  Modern PV-on-HVM drivers will map the hd* devices to
> +/dev/xvdHDa etc.
> +

Modern PV-on-HVM drivers will map the hd* devices to /dev/xvd* etc., so
"hda1" in the config file results in /dev/xvda1 in the guest.

> +Some Linux HVM guests with broken PV-on-HVM drivers do not cope
> +properly if both hda and hdc are supplied, nor with both hda and xvda,
> +because they directly map the bottom 8 bits of the xenstore integer
> +directly to the Linux guest's device number and throw away the rest;
> +they can crash due to minor number clashes.
> 
> -- 
> 

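The minor-number clash in the last quoted paragraph can be checked directly. A minimal Python sketch, assuming the encodings from the patch's table; `legacy_major_minor`, `broken_minor` and `minor_clashes` are hypothetical helpers modelling the two legacy interpretations, not real driver code. hda, hdc and xvda all have 0 in their bottom 8 bits, so a driver that keeps only those bits sees three devices with the same minor number.

```python
from typing import Dict, List, Tuple


def legacy_major_minor(value: int) -> Tuple[int, int]:
    """Very old guests: read the xenstore integer as major << 8 | minor."""
    return value >> 8, value & 0xFF


def broken_minor(value: int) -> int:
    """Broken PV-on-HVM drivers: keep only the bottom 8 bits."""
    return value & 0xFF


def minor_clashes(devices: Dict[str, int]) -> Dict[int, List[str]]:
    """Group device names by the minor a broken driver would use."""
    by_minor: Dict[int, List[str]] = {}
    for name, value in devices.items():
        by_minor.setdefault(broken_minor(value), []).append(name)
    return {m: names for m, names in by_minor.items() if len(names) > 1}


# Encodings taken from the table in the patch.
example = {
    "hda": 3 << 8 | 0 << 6 | 0,
    "hdc": 22 << 8 | (2 - 2) << 6 | 0,
    "xvda": 202 << 8 | 0 << 4 | 0,
}
```

Here `legacy_major_minor(3 << 8 | 1)` is (3, 1), Linux's /dev/hda1, and `minor_clashes(example)` returns {0: ["hda", "hdc", "xvda"]}.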

* Re: [RFC, PATCH] docs: Block numbering and naming specification
From: Ian Jackson @ 2010-09-22 11:59 UTC
  To: Stefano Stabellini; +Cc: Ian Campbell, Jeremy Fitzhardinge, xen-devel

Stefano Stabellini writes ("Re: [RFC, PATCH] docs: Block numbering and naming specification"):
> Modern PV-on-HVM drivers will map the hd* devices to /dev/xvd* etc., so
> "hda1" in the config file results in /dev/xvda1 in the guest.

So what happens if you specify both hda and xvda in the domain config
file ?  The document as currently written seems to contemplate this as
a reasonable configuration.

Ian.


* Re: [RFC, PATCH] docs: Block numbering and naming specification
From: Stefano Stabellini @ 2010-09-22 13:01 UTC
  To: Ian Jackson
  Cc: Ian Campbell, Jeremy Fitzhardinge, xen-devel, Stefano Stabellini

On Wed, 22 Sep 2010, Ian Jackson wrote:
> Stefano Stabellini writes ("Re: [RFC, PATCH] docs: Block numbering and naming specification"):
> > Modern PV-on-HVM drivers will map the hd* devices to /dev/xvd* etc., so
> > "hda1" in the config file results in /dev/xvda1 in the guest.
> 
> So what happens if you specify both hda and xvda in the domain config
> file ?  The document as currently written seems to contemplate this as
> a reasonable configuration.
 
The kernel won't boot, but blkfront will print a warning about the problem.
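
Under the mapping described above for modern PV-on-HVM drivers (hd disk N appears in the guest as xvd disk N, and xvd devices keep their names), hda and xvda both claim the guest name xvda. A minimal Python sketch of that check; it assumes this simplified mapping, handles only hd* and xvd* names for disks a to z, and `guest_name` and `name_clashes` are hypothetical helpers rather than blkfront's code.

```python
from typing import Dict, List, Tuple


def guest_name(kind: str, disk: int) -> str:
    """Guest-side name under the assumed mapping (hd disk N -> xvd disk N)."""
    if kind not in ("hd", "xvd"):
        raise ValueError("this sketch only models hd* and xvd* devices")
    if not 0 <= disk < 26:
        raise ValueError("this sketch only handles disks a to z")
    return "xvd" + chr(ord("a") + disk)


def name_clashes(devices: List[Tuple[str, int]]) -> Dict[str, List[Tuple[str, int]]]:
    """Return guest names claimed by more than one configured device."""
    claimed: Dict[str, List[Tuple[str, int]]] = {}
    for kind, disk in devices:
        claimed.setdefault(guest_name(kind, disk), []).append((kind, disk))
    return {name: devs for name, devs in claimed.items() if len(devs) > 1}
```

So `name_clashes([("hd", 0), ("xvd", 0)])` reports the hda plus xvda case as {"xvda": [("hd", 0), ("xvd", 0)]}, while hda plus xvdb gives an empty result.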


* Re: [RFC, PATCH] docs: Block numbering and naming specification
From: Ian Jackson @ 2010-09-23 18:11 UTC
  To: Stefano Stabellini; +Cc: Ian Campbell, Jeremy Fitzhardinge, xen-devel

Stefano Stabellini writes ("[Xen-devel] Re: [RFC, PATCH] docs: Block numbering and naming specification"):
> On Wed, 22 Sep 2010, Ian Jackson wrote:
> > So what happens if you specify both hda and xvda in the domain config
> > file ?  The document as currently written seems to contemplate this as
> > a reasonable configuration.
>  
> The kernel won't boot, but blkfront will print a warning about the problem.

OK, I'll try to update the document.

Ian.


* Re: [RFC, PATCH] docs: Block numbering and naming specification
From: Ian Campbell @ 2010-09-29  8:47 UTC
  To: Stefano Stabellini; +Cc: Jeremy Fitzhardinge, xen-devel, Ian Jackson

On Wed, 2010-09-22 at 12:37 +0100, Stefano Stabellini wrote:
> It should be made clear that for HVM guests specifying xvd* in the VM
> config file means that the user is requesting a PV only disk without any
> corresponding emulated disks.

> As a consequence using only xvd* disks in an HVM config file is a
> mistake, because grub (or any other bootloader) wouldn't be able to boot
> the OS.

How does one boot in this case then? The current behaviour is that you
get both xvd* and hd* when you ask for only xvd*. I agree that this is
nasty but it is how it works today so we should at least document what
the correct configuration is if we are going to deprecate it.

Is the correct configuration in this case to have both? e.g.:
	disk = ['phy:/dev/VG/VM,xvda,w', 'phy:/dev/VG/VM,hda,w']

Ian


* Re: [RFC, PATCH] docs: Block numbering and naming specification
From: Stefano Stabellini @ 2010-09-29  8:59 UTC
  To: Ian Campbell
  Cc: Jeremy Fitzhardinge, xen-devel, Ian Jackson, Stefano Stabellini

On Wed, 29 Sep 2010, Ian Campbell wrote:
> How does one boot in this case then? The current behaviour is that you
> get both xvd* and hd* when you ask for only xvd*. I agree that this is
> nasty but it is how it works today so we should at least document what
> the correct configuration is if we are going to deprecate it.
> 
> Is the correct configuration in this case to have both? e.g.:
> 	disk = ['phy:/dev/VG/VM,xvda,w', 'phy:/dev/VG/VM,hda,w']
> 
 
I think we should choose a behavior and be consistent, so it would
probably be clearer if in that case you get both devices, and in this case:

 	disk = ['phy:/dev/VG/VM,xvda,w', 'phy:/dev/VG/VM,xvdb,w']

your guest doesn't boot.


* Re: [RFC, PATCH] docs: Block numbering and naming specification
From: Ian Jackson @ 2010-09-29 10:44 UTC
  To: Ian Campbell; +Cc: Jeremy Fitzhardinge, xen-devel, Stefano Stabellini

Ian Campbell writes ("Re: [RFC, PATCH] docs: Block numbering and naming specification"):
> On Wed, 2010-09-22 at 12:37 +0100, Stefano Stabellini wrote:
> > As a consequence using only xvd* disks in an HVM config file is a
> > mistake, because grub (or any other bootloader) wouldn't be able to boot
> > the OS.
> 
> How does one boot in this case then?

Via the network perhaps ?  Or perhaps Stefano is saying "don't do that
then".

>   The current behaviour is that you
> get both xvd* and hd* when you ask for only xvd*. I agree that this is
> nasty but it is how it works today so we should at least document what
> the correct configuration is if we are going to deprecate it.

There's a compatibility hack that means that if you don't specify
_any_ hd* devices but _do_ specify some xvd* devices, qemu-xen treats
all of the xvd*'s as if they were hd*'s.
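
The compatibility rule just described can be written as a small function. A minimal Python sketch, assuming xm-style disk entries of the form "target,vdev,mode" as in the examples in this thread and looking only at the vdev name prefix; `vdev_of` and `emulated_ide_disks` are hypothetical names, and qemu-xen's real logic may differ in details (partitions, CD-ROM entries, sd* devices).

```python
from typing import List


def vdev_of(spec: str) -> str:
    """Guest device field of an xm-style "target,vdev,mode" disk entry."""
    return spec.split(",")[1].strip()


def emulated_ide_disks(disk: List[str]) -> List[str]:
    """Devices that get an emulated IDE disk under the compatibility rule."""
    vdevs = [vdev_of(spec) for spec in disk]
    hd = [v for v in vdevs if v.startswith("hd")]
    if hd:
        return hd
    # No hd* at all: every xvd* is treated as if it were an hd*.
    return [v for v in vdevs if v.startswith("xvd")]
```

With the xvda plus xvdb list from earlier in the thread this returns ['xvda', 'xvdb'], so an emulated disk is still offered to the bootloader; with the xvda plus hda list it returns only ['hda'].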

> Is the correct configuration in this case to have both? e.g.:
> 	disk = ['phy:/dev/VG/VM,xvda,w', 'phy:/dev/VG/VM,hda,w']

I don't think this is ever the correct configuration and nor should we
change things so that it is.

Making all emulated disks available via the PV interfaces is the
appropriate behaviour (and is what we do with networks, too).

Ian.

