* rbd storage pool support for libvirt
@ 2010-11-02  3:52 Sage Weil
  2010-11-02 19:47 ` Wido den Hollander
  2010-11-03 13:59 ` [libvirt] " Daniel P. Berrange
  0 siblings, 2 replies; 14+ messages in thread
From: Sage Weil @ 2010-11-02  3:52 UTC (permalink / raw)
  To: libvir-list; +Cc: ceph-devel

Hi,

We've been working on RBD, a distributed block device backed by the Ceph 
distributed object store.  (Ceph is a highly scalable, fault tolerant 
distributed storage and file system; see http://ceph.newdream.net.)  
Although the Ceph file system client has been in Linux since 2.6.34, the 
RBD block device was just merged for 2.6.37.  We also have patches pending 
for Qemu that use librados to natively talk to the Ceph storage backend, 
avoiding any kernel dependency.

To support disks backed by RBD in libvirt, we originally proposed a 
'virtual' type that simply passed the configuration information through to 
qemu, but that idea was shot down for a variety of reasons:

	http://www.redhat.com/archives/libvir-list/2010-June/thread.html#00257

It sounds like the "right" approach is to create a storage pool type.  
Ceph also has a 'pool' concept that contains some number of RBD images and 
a command line tool to manipulate (create, destroy, resize, rename, 
snapshot, etc.) those images, which seems to map nicely onto the storage 
pool abstraction.  For example,

 $ rbd create foo -s 1000
 rbd image 'foo':
         size 1000 MB in 250 objects
         order 22 (4096 KB objects)
 adding rbd image to directory...
  creating rbd image...
 done.
 $ rbd create bar -s 10000
 [...]
 $ rbd list
 bar
 foo

Something along the lines of

 <pool type="rbd">
   <name>virtimages</name>
   <source mode="kernel">
     <host monitor="ceph-mon1.domain.com:6789"/>
     <host monitor="ceph-mon2.domain.com:6789"/>
     <host monitor="ceph-mon3.domain.com:6789"/>
     <pool name="rbd"/>
   </source>
 </pool>

or whatever (I'm not too familiar with the libvirt schema)?  One 
difference between the existing pool types listed at 
libvirt.org/storage.html is that RBD does not necessarily associate itself 
with a path in the local file system.  If the native qemu driver is used, 
there is no path involved, just a magic string passed to qemu 
(rbd:poolname/imagename).  If the kernel RBD driver is used, it gets 
mapped to a /dev/rbd/$n (or similar, depending on the udev rule), but $n 
is not static across reboots.
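
For illustration, with the qemu patches the disk ends up being specified
with something roughly like the following (the exact option spelling may
differ, and the pool/image names are placeholders):

 $ qemu-system-x86_64 -drive file=rbd:poolname/imagename,if=virtio,format=raw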

In any case, before someone goes off and implements something, does this 
look like the right general approach to adding rbd support to libvirt?

Thanks!
sage



* Re: rbd storage pool support for libvirt
  2010-11-02  3:52 rbd storage pool support for libvirt Sage Weil
@ 2010-11-02 19:47 ` Wido den Hollander
  2010-11-02 19:50   ` Wido den Hollander
  2010-11-03 13:59 ` [libvirt] " Daniel P. Berrange
  1 sibling, 1 reply; 14+ messages in thread
From: Wido den Hollander @ 2010-11-02 19:47 UTC (permalink / raw)
  To: ceph-devel

Hi,

I gave this a try a few months ago; what I found out is that there is
a difference between a storage pool and a disk declaration in libvirt.

I'll take the LVM storage pool as an example:

In src/storage you will find storage_backend_logical.c|h; these are
simple "wrappers" around LVM commands like lvcreate, lvremove, etc.


static int
virStorageBackendLogicalDeleteVol(virConnectPtr conn ATTRIBUTE_UNUSED,
                                  virStoragePoolObjPtr pool ATTRIBUTE_UNUSED,
                                  virStorageVolDefPtr vol,
                                  unsigned int flags ATTRIBUTE_UNUSED)
{
    const char *cmdargv[] = {
        LVREMOVE, "-f", vol->target.path, NULL
    };

    if (virRun(cmdargv, NULL) < 0)
        return -1;

    return 0;
}


virStorageBackend virStorageBackendLogical = {
    .type = VIR_STORAGE_POOL_LOGICAL,

    ....
    ....
    ....
    .deleteVol = virStorageBackendLogicalDeleteVol,
    ....
};

As you can see, libvirt simply calls "lvremove" to remove the volume,
but this does not help you map the LV to a virtual machine; it's just
a mechanism to manage your storage via libvirt, as you can do with
Virt-Manager (which uses libvirt).

Below you'll find two screenshots of how this works in Virt-Manager;
as you can see, you can manage your VGs and attach LVs to a virtual
machine.

* http://zooi.widodh.nl/ceph/qemu-kvm/screenshots/storage_allocation.png
* http://zooi.widodh.nl/ceph/qemu-kvm/screenshots/storage_manager_virt.png

Note, this is Virt-Manager and not libvirt, but it uses libvirt to
perform these actions.

On the CLI you have for example: vol-create, vol-delete, pool-create,
pool-delete

But there is no special disk format for an LV; in my XML there is:

    <disk type='block' device='disk'>
      <source dev='/dev/xen-domains/v3-root'/>
      <target dev='sda' bus='scsi'/>
    </disk>

So libvirt somehow reads "source dev" and maps this back to a VG and LV.

A storage manager for RBD would simply mean implementing wrapper
functions around the "rbd" binary and parsing its output.

Implementing RBD support in libvirt would then mean two things:

1. Storage manager in libvirt
2. A special disk format for RBD

The first one is done as I explained above, but for the second one, I'm
not sure how you could do that.

Libvirt currently expects a disk to always be a file or block device;
virtual disks like RBD and NBD are not supported.

For #2 we would need a "special" disk declaration format, as mentioned
on the Red Hat mailing list:

http://www.redhat.com/archives/libvir-list/2010-June/msg00300.html

<disk type='rbd' device='disk'>
  <driver name='qemu' type='raw' />
  <source pool='rbd' image='alpha' />
  <target dev='vda' bus='virtio' />
</disk>

As RBD images are always "raw", it might seem redundant to define
this, but newer versions of Qemu don't autodetect formats.

Defining a monitor in the disk declaration won't be possible, I think;
I don't see a way to get that parameter down to librados, so we need a
valid /etc/ceph/ceph.conf.

Now, I'm not a libvirt expert; this is just what I found in my search.

Any suggestions / thoughts about this?

Thanks,

Wido

On Mon, 2010-11-01 at 20:52 -0700, Sage Weil wrote:
> Hi,
> 
> We've been working on RBD, a distributed block device backed by the Ceph 
> distributed object store.  (Ceph is a highly scalable, fault tolerant 
> distributed storage and file system; see http://ceph.newdream.net.)  
> Although the Ceph file system client has been in Linux since 2.6.34, the 
> RBD block device was just merged for 2.6.37.  We also have patches pending 
> for Qemu that use librados to natively talk to the Ceph storage backend, 
> avoiding any kernel dependency.
> 
> To support disks backed by RBD in libvirt, we originally proposed a 
> 'virtual' type that simply passed the configuration information through to 
> qemu, but that idea was shot down for a variety of reasons:
> 
> 	http://www.redhat.com/archives/libvir-list/2010-June/thread.html#00257
> 
> It sounds like the "right" approach is to create a storage pool type.  
> Ceph also has a 'pool' concept that contains some number of RBD images and 
> a command line tool to manipulate (create, destroy, resize, rename, 
> snapshot, etc.) those images, which seems to map nicely onto the storage 
> pool abstraction.  For example,
> 
>  $ rbd create foo -s 1000
>  rbd image 'foo':
>          size 1000 MB in 250 objects
>          order 22 (4096 KB objects)
>  adding rbd image to directory...
>   creating rbd image...
>  done.
>  $ rbd create bar -s 10000
>  [...]
>  $ rbd list
>  bar
>  foo
> 
> Something along the lines of
> 
>  <pool type="rbd">
>    <name>virtimages</name>
>    <source mode="kernel">
>      <host monitor="ceph-mon1.domain.com:6789"/>
>      <host monitor="ceph-mon2.domain.com:6789"/>
>      <host monitor="ceph-mon3.domain.com:6789"/>
>      <pool name="rbd"/>
>    </source>
>  </pool>
> 
> or whatever (I'm not too familiar with the libvirt schema)?  One 
> difference between the existing pool types listed at 
> libvirt.org/storage.html is that RBD does not necessarily associate itself 
> with a path in the local file system.  If the native qemu driver is used, 
> there is no path involved, just a magic string passed to qemu 
> (rbd:poolname/imagename).  If the kernel RBD driver is used, it gets 
> mapped to a /dev/rbd/$n (or similar, depending on the udev rule), but $n 
> is not static across reboots.
> 
> In any case, before someone goes off and implements something, does this 
> look like the right general approach to adding rbd support to libvirt?
> 
> Thanks!
> sage
> 



* Re: rbd storage pool support for libvirt
  2010-11-02 19:47 ` Wido den Hollander
@ 2010-11-02 19:50   ` Wido den Hollander
  0 siblings, 0 replies; 14+ messages in thread
From: Wido den Hollander @ 2010-11-02 19:50 UTC (permalink / raw)
  To: ceph-devel

Seems there was somebody recently with the same problem:
http://www.redhat.com/archives/libvir-list/2010-October/msg01247.html

NBD seems to be suffering from the same limitations as RBD.

On Tue, 2010-11-02 at 20:47 +0100, Wido den Hollander wrote:
> Hi,
> 
> I've given this a try a few months ago, what I found out that there is a
> difference between a storage pool and a disk declaration in libvirt.
> 
> I'll take the LVM storage pool as an example:
> 
> In src/storage you will find storage_backend_logical.c|h, these are
> simple "wrappers" around the LVM commands like lvcreate, lvremove, etc,
> etc.
> 
> 
> static int
> virStorageBackendLogicalDeleteVol(virConnectPtr conn ATTRIBUTE_UNUSED,
>                                   virStoragePoolObjPtr pool
> ATTRIBUTE_UNUSED,
>                                   virStorageVolDefPtr vol,
>                                   unsigned int flags ATTRIBUTE_UNUSED)
> {
>     const char *cmdargv[] = {
>         LVREMOVE, "-f", vol->target.path, NULL
>     };
> 
>     if (virRun(cmdargv, NULL) < 0)
>         return -1;
> 
>     return 0;
> }
> 
> 
> virStorageBackend virStorageBackendLogical = {
>     .type = VIR_STORAGE_POOL_LOGICAL,
> 
>     ....
>     ....
>     ....
>     .deleteVol = virStorageBackendLogicalDeleteVol,
>     ....
> };
> 
> As you can see, libvirt simply calls "lvremove" to remove the command,
> but this does not help you mapping the LV to a virtual machine, it's
> just a mechanism to manage your storage via libvirt, as you can do with
> Virt-Manager (which uses libvirt)
> 
> Below you find two screenshots how this works in Virt Manager, as you
> can see, you can manage your VG's and attach LV's to a Virtual Machine.
> 
> * http://zooi.widodh.nl/ceph/qemu-kvm/screenshots/storage_allocation.png
> *
> http://zooi.widodh.nl/ceph/qemu-kvm/screenshots/storage_manager_virt.png
> 
> Note, this is Virt Manager and not libvirt, but it uses libvirt you
> perform these actions.
> 
> On the CLI you have for example: vol-create, vol-delete, pool-create,
> pool-delete
> 
> But, there is no special disk format for a LV, in my XML there is:
> 
>     <disk type='block' device='disk'>
>       <source dev='/dev/xen-domains/v3-root'/>
>       <target dev='sda' bus='scsi'/>
>     </disk>
> 
> So libvirt somehow reads "source dev" and maps this back to a VG and LV.
> 
> A storage manager for RBD would simply mean implementing wrap functions
> around the "rbd" binary and parsing output from it.
> 
> Implementing RBD support in libvirt would then mean two things:
> 
> 1. Storage manager in libvirt
> 2. A special disk format for RBD
> 
> The first one is done as I explained above, but for the second one, I'm
> not sure how you could do that.
> 
> Libvirt now expects a disk to always be a file/block, the virtual disks
> like RBD and NBD are not supported.
> 
> For #2 we should have a "special" disk declaration format, like
> mentioned on the RedHat mailinglist:
> 
> http://www.redhat.com/archives/libvir-list/2010-June/msg00300.html
> 
> <disk type='rbd' device='disk'>
>   <driver name='qemu' type='raw' />
>   <source pool='rbd' image='alpha' />
>   <target dev='vda' bus='virtio' />
> </disk>
> 
> As images on a RBD image are always "raw", it might seem obsolete to
> define this, but newer version of Qemu don't autodetect formats.
> 
> Defining a monitor in the disk declaration won't be possible I think, I
> don't see a way to get that parameter down to librados, so we need a
> valid /etc/ceph/ceph.conf
> 
> Now, I'm not a libvirt expert, this is what I found in my search.
> 
> Any suggestions / thoughts about this?
> 
> Thanks,
> 
> Wido
> 
> On Mon, 2010-11-01 at 20:52 -0700, Sage Weil wrote:
> > Hi,
> > 
> > We've been working on RBD, a distributed block device backed by the Ceph 
> > distributed object store.  (Ceph is a highly scalable, fault tolerant 
> > distributed storage and file system; see http://ceph.newdream.net.)  
> > Although the Ceph file system client has been in Linux since 2.6.34, the 
> > RBD block device was just merged for 2.6.37.  We also have patches pending 
> > for Qemu that use librados to natively talk to the Ceph storage backend, 
> > avoiding any kernel dependency.
> > 
> > To support disks backed by RBD in libvirt, we originally proposed a 
> > 'virtual' type that simply passed the configuration information through to 
> > qemu, but that idea was shot down for a variety of reasons:
> > 
> > 	http://www.redhat.com/archives/libvir-list/2010-June/thread.html#00257
> > 
> > It sounds like the "right" approach is to create a storage pool type.  
> > Ceph also has a 'pool' concept that contains some number of RBD images and 
> > a command line tool to manipulate (create, destroy, resize, rename, 
> > snapshot, etc.) those images, which seems to map nicely onto the storage 
> > pool abstraction.  For example,
> > 
> >  $ rbd create foo -s 1000
> >  rbd image 'foo':
> >          size 1000 MB in 250 objects
> >          order 22 (4096 KB objects)
> >  adding rbd image to directory...
> >   creating rbd image...
> >  done.
> >  $ rbd create bar -s 10000
> >  [...]
> >  $ rbd list
> >  bar
> >  foo
> > 
> > Something along the lines of
> > 
> >  <pool type="rbd">
> >    <name>virtimages</name>
> >    <source mode="kernel">
> >      <host monitor="ceph-mon1.domain.com:6789"/>
> >      <host monitor="ceph-mon2.domain.com:6789"/>
> >      <host monitor="ceph-mon3.domain.com:6789"/>
> >      <pool name="rbd"/>
> >    </source>
> >  </pool>
> > 
> > or whatever (I'm not too familiar with the libvirt schema)?  One 
> > difference between the existing pool types listed at 
> > libvirt.org/storage.html is that RBD does not necessarily associate itself 
> > with a path in the local file system.  If the native qemu driver is used, 
> > there is no path involved, just a magic string passed to qemu 
> > (rbd:poolname/imagename).  If the kernel RBD driver is used, it gets 
> > mapped to a /dev/rbd/$n (or similar, depending on the udev rule), but $n 
> > is not static across reboots.
> > 
> > In any case, before someone goes off and implements something, does this 
> > look like the right general approach to adding rbd support to libvirt?
> > 
> > Thanks!
> > sage
> > 



* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-02  3:52 rbd storage pool support for libvirt Sage Weil
  2010-11-02 19:47 ` Wido den Hollander
@ 2010-11-03 13:59 ` Daniel P. Berrange
  2010-11-05 23:33   ` Sage Weil
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel P. Berrange @ 2010-11-03 13:59 UTC (permalink / raw)
  To: Sage Weil; +Cc: libvir-list, ceph-devel

On Mon, Nov 01, 2010 at 08:52:05PM -0700, Sage Weil wrote:
> Hi,
> 
> We've been working on RBD, a distributed block device backed by the Ceph 
> distributed object store.  (Ceph is a highly scalable, fault tolerant 
> distributed storage and file system; see http://ceph.newdream.net.)  
> Although the Ceph file system client has been in Linux since 2.6.34, the 
> RBD block device was just merged for 2.6.37.  We also have patches pending 
> for Qemu that use librados to natively talk to the Ceph storage backend, 
> avoiding any kernel dependency.
> 
> To support disks backed by RBD in libvirt, we originally proposed a 
> 'virtual' type that simply passed the configuration information through to 
> qemu, but that idea was shot down for a variety of reasons:
> 
> 	http://www.redhat.com/archives/libvir-list/2010-June/thread.html#00257

NB, I'm not against adding new disk types to the guest XML, just
that each type should be explicitly modelled, rather than being
lumped under a generic 'virtual' type.
 
> It sounds like the "right" approach is to create a storage pool type.  

Sort of. There's really two separate aspects to handling storage
in libvirt

 1. How do you configure a VM to use a storage volume
 2. How do you list/create/delete storage volumes

The XML addition proposed in the mailing list post above is attempting
to cater for the first aspect. The storage pool type idea you're 
describing in this post is catering to the second aspect. 

If the storage pool ends up providing real block devices that exist
on the filesystem, then the first item is trivially solved, because
libvirt can already point any guest at a block device. If the storage
pool provides some kind of virtual device, then we'd still need to
decide how to deal with the XML for configuring the guest VM.
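
In the block device case the existing guest XML already suffices, e.g.
something along these lines (device path purely illustrative):

  <disk type='block' device='disk'>
    <source dev='/dev/rbd/0'/>
    <target dev='vda' bus='virtio'/>
  </disk>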

> Ceph also has a 'pool' concept that contains some number of RBD images and 
> a command line tool to manipulate (create, destroy, resize, rename, 
> snapshot, etc.) those images, which seems to map nicely onto the storage 
> pool abstraction.  For example,

Agreed, it does look like it'd map in quite well and let the RBD
functionality more or less 'just work' in virt-manager & other 
apps using storage pool APIs.

>  $ rbd create foo -s 1000
>  rbd image 'foo':
>          size 1000 MB in 250 objects
>          order 22 (4096 KB objects)
>  adding rbd image to directory...
>   creating rbd image...
>  done.
>  $ rbd create bar -s 10000
>  [...]
>  $ rbd list
>  bar
>  foo
> 
> Something along the lines of
> 
>  <pool type="rbd">
>    <name>virtimages</name>
>    <source mode="kernel">
>      <host monitor="ceph-mon1.domain.com:6789"/>
>      <host monitor="ceph-mon2.domain.com:6789"/>
>      <host monitor="ceph-mon3.domain.com:6789"/>
>      <pool name="rbd"/>
>    </source>
>  </pool>

What do the 3 hostnames represent in this context ?

> or whatever (I'm not too familiar with the libvirt schema)?  One 
> difference between the existing pool types listed at 
> libvirt.org/storage.html is that RBD does not necessarily associate itself 
> with a path in the local file system.  If the native qemu driver is used, 
> there is no path involved, just a magic string passed to qemu 
> (rbd:poolname/imagename).  If the kernel RBD driver is used, it gets 
> mapped to a /dev/rbd/$n (or similar, depending on the udev rule), but $n 
> is not static across reboots.

The docs about storage pools are slightly inaccurate. While it is
desirable that the storage volume path exists on the filesystem,
it is not something we strictly require. The only requirement is that
there is some way to map from the storage volume path to the
corresponding guest XML.

If we define a new guest XML syntax for RBD magic strings, then
we can also define a storage pool that provides path data in a
corresponding format.

WRT the issue of /dev/rbd/$n being unstable, this is quite similar
to the issue of /dev/sdXX device names being unstable for SCSI. The
way to cope with this is to drop in a UDEV ruleset that creates
symlinks with sensible names, e.g. perhaps set up symlinks for:

  /dev/disk/by-id/rbd-$poolname-$imagename -> /dev/rbd/0

It might also make sense to wire up /dev/disk/by-path symlinks
for RBD devices.
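
A sketch of what such a rule could look like, assuming a small helper
program that prints "<pool> <image>" for a given rbd device (the
"rbd-namer" helper named here is hypothetical):

  # Build a stable by-id symlink from the helper's output
  KERNEL=="rbd[0-9]*", PROGRAM="/usr/bin/rbd-namer %k", \
      SYMLINK+="disk/by-id/rbd-%c{1}-%c{2}"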

> In any case, before someone goes off and implements something, does this 
> look like the right general approach to adding rbd support to libvirt?

I think this looks reasonable. I'd be inclined to get the storage pool
stuff working with the kernel RBD driver & UDEV rules for stable path
names, since that avoids needing to make any changes to guest XML
format. Support for QEMU with the native librados CEPH driver could
be added as a second patch.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-03 13:59 ` [libvirt] " Daniel P. Berrange
@ 2010-11-05 23:33   ` Sage Weil
  2010-11-08 13:16     ` Daniel P. Berrange
  0 siblings, 1 reply; 14+ messages in thread
From: Sage Weil @ 2010-11-05 23:33 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: libvir-list, ceph-devel

Hi Daniel,

On Wed, 3 Nov 2010, Daniel P. Berrange wrote:
> > Ceph also has a 'pool' concept that contains some number of RBD images and 
> > a command line tool to manipulate (create, destroy, resize, rename, 
> > snapshot, etc.) those images, which seems to map nicely onto the storage 
> > pool abstraction.  For example,
> 
> Agreed, it does look like it'd map in quite well and let the RDB
> functionality more or less 'just work' in virt-manager & other 
> apps using storage pool APIs.

Great!

> > Something along the lines of
> > 
> >  <pool type="rbd">
> >    <name>virtimages</name>
> >    <source mode="kernel">
> >      <host monitor="ceph-mon1.domain.com:6789"/>
> >      <host monitor="ceph-mon2.domain.com:6789"/>
> >      <host monitor="ceph-mon3.domain.com:6789"/>
> >      <pool name="rbd"/>
> >    </source>
> >  </pool>
> 
> What do the 3 hostnames represent in this context ?

They're the host(s) that RBD needs to be given in order to talk to the
storage cluster.  Ideally there's more than one for redundancy.  Does
the above syntax look reasonable, or is there something you would
propose instead?  From the RBD side of things, the key parameters are

 - pool name
 - monitor address(es)
 - user and secret key to authenticate with

If the 'rbd' command line tool is used for this, everything but the pool 
can come out of the default /etc/ceph/ceph.conf config file, or we could
have a way to specify a config path in the XML.
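
For example, something like (assuming the tool's usual -c/-p options):

 $ rbd -c /etc/ceph/ceph.conf -p rbd list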

> > or whatever (I'm not too familiar with the libvirt schema)?  One 
> > difference between the existing pool types listed at 
> > libvirt.org/storage.html is that RBD does not necessarily associate itself 
> > with a path in the local file system.  If the native qemu driver is used, 
> > there is no path involved, just a magic string passed to qemu 
> > (rbd:poolname/imagename).  If the kernel RBD driver is used, it gets 
> > mapped to a /dev/rbd/$n (or similar, depending on the udev rule), but $n 
> > is not static across reboots.
> 
> The docs about storage pool are slightly inaccurate. While it is
> desirable that the storage volume path exists on the filesystem,
> it is not something we strictly require. The only require that
> there is some way to map from the storage volume path to the
> corresponding guest XML
> 
> If we define a new guest XML syntax for RBD magic strings, then
> we can also define a storage pool that provides path data in a
> corresponding format.

Ok thanks, that clarifies things.
 
> WRT to the issue of /dev/rbd/$n being unstable, this is quite similar
> to the issue of /dev/sdXX device names being unstable for SCSI. The
> way to cope with this is to drop in a UDEV ruleset that creates 
> symlinks with sensible names, eg perhaps setup symlinks for: 
> 
>   /dev/disk/by-id/rbd-$poolname-$imagename -> /dev/rbd/0
> 
> It might also make sense to wire up /dev/disk/by-path symlinks
> for RBD devices.

We're putting together some udev rules to do this.

> > In any case, before someone goes off and implements something, does this 
> > look like the right general approach to adding rbd support to libvirt?
> 
> I think this looks reasonable. I'd be inclined to get the storage pool
> stuff working with the kernel RBD driver & UDEV rules for stable path
> names, since that avoids needing to make any changes to guest XML
> format. Support for QEMU with the native librados CEPH driver could
> be added as a second patch.

Okay, that sounds reasonable.  Supporting the QEMU librados driver is 
definitely something we want to target, though, and it seems to be the 
route that more users are interested in.  Is defining the XML syntax 
for a guest VM something we can discuss now as well?

(BTW this is biting NBD users too.  Presumably the guest VM XML should 
look similar?

http://www.redhat.com/archives/libvir-list/2010-October/msg01247.html
)

Thanks!
sage


* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-05 23:33   ` Sage Weil
@ 2010-11-08 13:16     ` Daniel P. Berrange
  2010-11-18  0:33       ` Josh Durgin
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel P. Berrange @ 2010-11-08 13:16 UTC (permalink / raw)
  To: Sage Weil; +Cc: libvir-list, ceph-devel

On Fri, Nov 05, 2010 at 04:33:46PM -0700, Sage Weil wrote:
> On Wed, 3 Nov 2010, Daniel P. Berrange wrote:
> > > Something along the lines of
> > > 
> > >  <pool type="rbd">
> > >    <name>virtimages</name>
> > >    <source mode="kernel">
> > >      <host monitor="ceph-mon1.domain.com:6789"/>
> > >      <host monitor="ceph-mon2.domain.com:6789"/>
> > >      <host monitor="ceph-mon3.domain.com:6789"/>
> > >      <pool name="rbd"/>
> > >    </source>
> > >  </pool>
> > 
> > What do the 3 hostnames represent in this context ?
> 
> They're the host(s) that RBD needs to be fed to talk to the storage 
> cluster.  Ideally there's more than one for redundancy. Does the above 
> syntax look reasonable, or is there something you would propose instead?  
> From the RBD side of things, the key parameters are
> 
>  - pool name
>  - monitor address(es)
>  - user and secret key to authenticate with
> 
> If the 'rbd' command line tool is used for this, everything but the pool 
> can come out of the default /etc/ceph/ceph.conf config file, or we could
> have a way to specify a config path in the XML.

It makes sense to allow the hostname in the XML, because the general
goal is that you should be able to configure storage without needing
to SSH into a machine & manually set up config files. The XML does not
currently allow multiple hosts, but we can extend that.

The pool name / user / key are already covered.
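
For comparison, an iSCSI pool source already carries host and auth
information, roughly like this (quoting the format from memory, so
details may differ):

  <source>
    <host name="iscsi.example.org"/>
    <device path="iqn.2010-10.org.example:target1"/>
    <auth type='chap' login='someuser' passwd='somepassword'/>
  </source>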

> > > In any case, before someone goes off and implements something, does this 
> > > look like the right general approach to adding rbd support to libvirt?
> > 
> > I think this looks reasonable. I'd be inclined to get the storage pool
> > stuff working with the kernel RBD driver & UDEV rules for stable path
> > names, since that avoids needing to make any changes to guest XML
> > format. Support for QEMU with the native librados CEPH driver could
> > be added as a second patch.
> 
> Okay, that sounds reasonable.  Supporting the QEMU librados driver is 
> definitely something we want to target, though, and seems to be route that 
> more users are interested in.  Is defining the XML syntax for a guest VM 
> something we can discuss now as well?
> 
> (BTW this is biting NBD users too.  Presumably the guest VM XML should 
> look similar?

And also Sheepdog storage volumes. To define a syntax for all these we need
to determine what configuration metadata is required at a per-VM level for
each of them. Then try and decide how to represent that in the guest XML.
It looks like at a VM level we'd need a hostname, port number and a volume
name (or path).

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-08 13:16     ` Daniel P. Berrange
@ 2010-11-18  0:33       ` Josh Durgin
  2010-11-18  2:04         ` Josh Durgin
  2010-11-18 10:42         ` Daniel P. Berrange
  0 siblings, 2 replies; 14+ messages in thread
From: Josh Durgin @ 2010-11-18  0:33 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Sage Weil, libvir-list, ceph-devel

Hi Daniel,

On 11/08/2010 05:16 AM, Daniel P. Berrange wrote:
>>>> In any case, before someone goes off and implements something, does this
>>>> look like the right general approach to adding rbd support to libvirt?
>>>
>>> I think this looks reasonable. I'd be inclined to get the storage pool
>>> stuff working with the kernel RBD driver&  UDEV rules for stable path
>>> names, since that avoids needing to make any changes to guest XML
>>> format. Support for QEMU with the native librados CEPH driver could
>>> be added as a second patch.
>>
>> Okay, that sounds reasonable.  Supporting the QEMU librados driver is
>> definitely something we want to target, though, and seems to be route that
>> more users are interested in.  Is defining the XML syntax for a guest VM
>> something we can discuss now as well?
>>
>> (BTW this is biting NBD users too.  Presumably the guest VM XML should
>> look similar?
>
> And also Sheepdog storage volumes. To define a syntax for all these we need
> to determine what configuration metadata is required at a per-VM level for
> each of them. Then try and decide how to represent that in the guest XML.
> It looks like at a VM level we'd need a hostname, port number and a volume
> name (or path).

It looks like that's what Sheepdog needs from the patch that was
submitted earlier today. For RBD, we would want to allow multiple hosts,
and specify the pool and image name when the QEMU librados driver is
used, e.g.:

     <disk type="rbd" device="disk">
       <driver name="qemu" type="raw" />
       <source vdi="image_name" pool="pool_name">
         <host name="mon1.example.org" port="6000" />
         <host name="mon2.example.org" port="6000" />
         <host name="mon3.example.org" port="6000" />
       </source>
       <target dev="vda" bus="virtio" />
     </disk>

As you mentioned earlier, we could just use the existing source format
for the kernel RBD driver.

Does this seem like a reasonable format for the VM XML? Any suggestions?

Thanks,
Josh


* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-18  0:33       ` Josh Durgin
@ 2010-11-18  2:04         ` Josh Durgin
  2010-11-18 10:38           ` Daniel P. Berrange
  2010-11-18 10:42         ` Daniel P. Berrange
  1 sibling, 1 reply; 14+ messages in thread
From: Josh Durgin @ 2010-11-18  2:04 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Sage Weil, libvir-list, ceph-devel

On 11/17/2010 04:33 PM, Josh Durgin wrote:
> Hi Daniel,
>
> On 11/08/2010 05:16 AM, Daniel P. Berrange wrote:
>>>>> In any case, before someone goes off and implements something, does
>>>>> this
>>>>> look like the right general approach to adding rbd support to libvirt?
>>>>
>>>> I think this looks reasonable. I'd be inclined to get the storage pool
>>>> stuff working with the kernel RBD driver& UDEV rules for stable path
>>>> names, since that avoids needing to make any changes to guest XML
>>>> format. Support for QEMU with the native librados CEPH driver could
>>>> be added as a second patch.
>>>
>>> Okay, that sounds reasonable. Supporting the QEMU librados driver is
>>> definitely something we want to target, though, and seems to be route
>>> that
>>> more users are interested in. Is defining the XML syntax for a guest VM
>>> something we can discuss now as well?
>>>
>>> (BTW this is biting NBD users too. Presumably the guest VM XML should
>>> look similar?
>>
>> And also Sheepdog storage volumes. To define a syntax for all these we
>> need
>> to determine what configuration metadata is required at a per-VM level
>> for
>> each of them. Then try and decide how to represent that in the guest XML.
>> It looks like at a VM level we'd need a hostname, port number and a
>> volume
>> name (or path).
>
> It looks like that's what Sheepdog needs from the patch that was
> submitted earlier today. For RBD, we would want to allow multiple hosts,
> and specify the pool and image name when the QEMU librados driver is
> used, e.g.:
>
> <disk type="rbd" device="disk">
> <driver name="qemu" type="raw" />
> <source vdi="image_name" pool="pool_name">
> <host name="mon1.example.org" port="6000">
> <host name="mon2.example.org" port="6000">
> <host name="mon3.example.org" port="6000">
> </source>
> <target dev="vda" bus="virtio" />
> </disk>
>
> As you mentioned earlier, we could just use the existing source format
> for the kernel RBD driver.
>
> Does this seem like a reasonable format for the VM XML? Any suggestions?

Also, it would be convenient to be able to specify which RBD driver to 
use in the guest XML, so that it's independent of the libvirt pool 
configuration. Would having two different rbd disk types be the right 
approach here?


* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-18  2:04         ` Josh Durgin
@ 2010-11-18 10:38           ` Daniel P. Berrange
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel P. Berrange @ 2010-11-18 10:38 UTC (permalink / raw)
  To: Josh Durgin; +Cc: Sage Weil, libvir-list, ceph-devel

On Wed, Nov 17, 2010 at 06:04:50PM -0800, Josh Durgin wrote:
> On 11/17/2010 04:33 PM, Josh Durgin wrote:
> >Hi Daniel,
> >
> >On 11/08/2010 05:16 AM, Daniel P. Berrange wrote:
> >>>>>In any case, before someone goes off and implements something, does
> >>>>>this
> >>>>>look like the right general approach to adding rbd support to libvirt?
> >>>>
> >>>>I think this looks reasonable. I'd be inclined to get the storage pool
> >>>>stuff working with the kernel RBD driver& UDEV rules for stable path
> >>>>names, since that avoids needing to make any changes to guest XML
> >>>>format. Support for QEMU with the native librados CEPH driver could
> >>>>be added as a second patch.
> >>>
> >>>Okay, that sounds reasonable. Supporting the QEMU librados driver is
> >>>definitely something we want to target, though, and seems to be route
> >>>that
> >>>more users are interested in. Is defining the XML syntax for a guest VM
> >>>something we can discuss now as well?
> >>>
> >>>(BTW this is biting NBD users too. Presumably the guest VM XML should
> >>>look similar?
> >>
> >>And also Sheepdog storage volumes. To define a syntax for all these we
> >>need
> >>to determine what configuration metadata is required at a per-VM level
> >>for
> >>each of them. Then try and decide how to represent that in the guest XML.
> >>It looks like at a VM level we'd need a hostname, port number and a
> >>volume
> >>name (or path).
> >
> >It looks like that's what Sheepdog needs from the patch that was
> >submitted earlier today. For RBD, we would want to allow multiple hosts,
> >and specify the pool and image name when the QEMU librados driver is
> >used, e.g.:
> >
> ><disk type="rbd" device="disk">
> ><driver name="qemu" type="raw" />
> ><source vdi="image_name" pool="pool_name">
> ><host name="mon1.example.org" port="6000">
> ><host name="mon2.example.org" port="6000">
> ><host name="mon3.example.org" port="6000">
> ></source>
> ><target dev="vda" bus="virtio" />
> ></disk>
> >
> >As you mentioned earlier, we could just use the existing source format
> >for the kernel RBD driver.
> >
> >Does this seem like a reasonable format for the VM XML? Any suggestions?
> 
> Also, it would be convenient to be able to specify which RBD driver to 
> use in the guest XML, so that it's independent of the libvirt pool 
> configuration. Would having two different rbd disk types be the right 
> approach here?

What do you mean by RBD driver here? Kernel vs native QEMU? If so,
the kernel case is trivially handled by the <disk type='block'> case,
so we only need new syntax for the native QEMU impl.


Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-18  0:33       ` Josh Durgin
  2010-11-18  2:04         ` Josh Durgin
@ 2010-11-18 10:42         ` Daniel P. Berrange
  2010-11-18 17:13           ` Sage Weil
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel P. Berrange @ 2010-11-18 10:42 UTC (permalink / raw)
  To: Josh Durgin; +Cc: Sage Weil, libvir-list, ceph-devel

On Wed, Nov 17, 2010 at 04:33:07PM -0800, Josh Durgin wrote:
> Hi Daniel,
> 
> On 11/08/2010 05:16 AM, Daniel P. Berrange wrote:
> >>>>In any case, before someone goes off and implements something, does this
> >>>>look like the right general approach to adding rbd support to libvirt?
> >>>
> >>>I think this looks reasonable. I'd be inclined to get the storage pool
> >>>stuff working with the kernel RBD driver&  UDEV rules for stable path
> >>>names, since that avoids needing to make any changes to guest XML
> >>>format. Support for QEMU with the native librados CEPH driver could
> >>>be added as a second patch.
> >>
> >>Okay, that sounds reasonable.  Supporting the QEMU librados driver is
> >>definitely something we want to target, though, and seems to be route that
> >>more users are interested in.  Is defining the XML syntax for a guest VM
> >>something we can discuss now as well?
> >>
> >>(BTW this is biting NBD users too.  Presumably the guest VM XML should
> >>look similar?
> >
> >And also Sheepdog storage volumes. To define a syntax for all these we need
> >to determine what configuration metadata is required at a per-VM level for
> >each of them. Then try and decide how to represent that in the guest XML.
> >It looks like at a VM level we'd need a hostname, port number and a volume
> >name (or path).
> 
> It looks like that's what Sheepdog needs from the patch that was
> submitted earlier today. For RBD, we would want to allow multiple hosts,
> and specify the pool and image name when the QEMU librados driver is
> used, e.g.:
> 
>     <disk type="rbd" device="disk">
>       <driver name="qemu" type="raw" />
>       <source vdi="image_name" pool="pool_name">
>         <host name="mon1.example.org" port="6000">
>         <host name="mon2.example.org" port="6000">
>         <host name="mon3.example.org" port="6000">
>       </source>
>       <target dev="vda" bus="virtio" />
>     </disk>
> 
> Does this seem like a reasonable format for the VM XML? Any suggestions?

I'm basically wondering whether we should be going for separate types for
each of NBD, RBD & Sheepdog, as per your proposal & the sheepdog one earlier
today, or whether to merge them into one type 'network' which covers any kind
of network block device, and list a protocol on the source element, eg

     <disk type="network" device="disk">
       <driver name="qemu" type="raw" />
       <source protocol='rbd|sheepdog|nbd' name="...some image identifier...">
         <host name="mon1.example.org" port="6000" />
         <host name="mon2.example.org" port="6000" />
         <host name="mon3.example.org" port="6000" />
       </source>
       <target dev="vda" bus="virtio" />
     </disk>


Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-18 10:42         ` Daniel P. Berrange
@ 2010-11-18 17:13           ` Sage Weil
  2010-11-19  9:27             ` Stefan Hajnoczi
  0 siblings, 1 reply; 14+ messages in thread
From: Sage Weil @ 2010-11-18 17:13 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Josh Durgin, libvir-list, ceph-devel

On Thu, 18 Nov 2010, Daniel P. Berrange wrote:
> On Wed, Nov 17, 2010 at 04:33:07PM -0800, Josh Durgin wrote:
> > Hi Daniel,
> > 
> > On 11/08/2010 05:16 AM, Daniel P. Berrange wrote:
> > >>>>In any case, before someone goes off and implements something, does this
> > >>>>look like the right general approach to adding rbd support to libvirt?
> > >>>
> > >>>I think this looks reasonable. I'd be inclined to get the storage pool
> > >>>stuff working with the kernel RBD driver&  UDEV rules for stable path
> > >>>names, since that avoids needing to make any changes to guest XML
> > >>>format. Support for QEMU with the native librados CEPH driver could
> > >>>be added as a second patch.
> > >>
> > >>Okay, that sounds reasonable.  Supporting the QEMU librados driver is
> > >>definitely something we want to target, though, and seems to be route that
> > >>more users are interested in.  Is defining the XML syntax for a guest VM
> > >>something we can discuss now as well?
> > >>
> > >>(BTW this is biting NBD users too.  Presumably the guest VM XML should
> > >>look similar?
> > >
> > >And also Sheepdog storage volumes. To define a syntax for all these we need
> > >to determine what configuration metadata is required at a per-VM level for
> > >each of them. Then try and decide how to represent that in the guest XML.
> > >It looks like at a VM level we'd need a hostname, port number and a volume
> > >name (or path).
> > 
> > It looks like that's what Sheepdog needs from the patch that was
> > submitted earlier today. For RBD, we would want to allow multiple hosts,
> > and specify the pool and image name when the QEMU librados driver is
> > used, e.g.:
> > 
> >     <disk type="rbd" device="disk">
> >       <driver name="qemu" type="raw" />
> >       <source vdi="image_name" pool="pool_name">
> >         <host name="mon1.example.org" port="6000">
> >         <host name="mon2.example.org" port="6000">
> >         <host name="mon3.example.org" port="6000">
> >       </source>
> >       <target dev="vda" bus="virtio" />
> >     </disk>
> > 
> > Does this seem like a reasonable format for the VM XML? Any suggestions?
> 
> I'm basically wondering whether we should be going for separate types for
> each of NBD, RBD & Sheepdog, as per your proposal & the sheepdog one earlier
> today. Or type to merge them into one type 'nework' which covers any kind of
> network block device, and list a protocol on the  source element, eg
> 
>      <disk type="network" device="disk">
>        <driver name="qemu" type="raw" />
>        <source protocol='rbd|sheepdog|nbd' name="...some image identifier...">
>          <host name="mon1.example.org" port="6000">
>          <host name="mon2.example.org" port="6000">
>          <host name="mon3.example.org" port="6000">
>        </source>
>        <target dev="vda" bus="virtio" />
>      </disk>

That would work...

One thing that I think should be considered, though, is that both RBD and 
NBD can be used for non-qemu instances by mapping a regular block device 
via the host's kernel.  And in that case, there's some sysfs-fu (at least 
in the rbd case; I'm not familiar with how the nbd client works) required 
to set up/tear down the block device.
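
In the rbd case that sysfs-fu currently amounts to something like the
following (the exact argument format here is from memory and may not be
quite right):

 # map an image: monitor address(es), options, pool name, image name
 $ echo "1.2.3.4:6789 name=admin rbd foo" > /sys/bus/rbd/add
 # unmap device id 0
 $ echo 0 > /sys/bus/rbd/remove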

I think the ideal would be if either method (qemu or kernel driver) could 
be used, and libvirt could take care of that process of setting up the 
block device so that RBD (and/or NBD) can be used with non-qemu instances. 
If that means totally separate <disk> descriptions for the two scenarios, 
that's fine, as long as there's a way for a storage pool driver to be used 
to set up both types of mappings...

sage


* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-18 17:13           ` Sage Weil
@ 2010-11-19  9:27             ` Stefan Hajnoczi
  2010-11-19  9:50               ` Daniel P. Berrange
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2010-11-19  9:27 UTC (permalink / raw)
  To: Sage Weil; +Cc: libvir-list, ceph-devel

On Thu, Nov 18, 2010 at 5:13 PM, Sage Weil <sage@newdream.net> wrote:
> On Thu, 18 Nov 2010, Daniel P. Berrange wrote:
>> On Wed, Nov 17, 2010 at 04:33:07PM -0800, Josh Durgin wrote:
>> > Hi Daniel,
>> >
>> > On 11/08/2010 05:16 AM, Daniel P. Berrange wrote:
>> > >>>>In any case, before someone goes off and implements something, does this
>> > >>>>look like the right general approach to adding rbd support to libvirt?
>> > >>>
>> > >>>I think this looks reasonable. I'd be inclined to get the storage pool
>> > >>>stuff working with the kernel RBD driver&  UDEV rules for stable path
>> > >>>names, since that avoids needing to make any changes to guest XML
>> > >>>format. Support for QEMU with the native librados CEPH driver could
>> > >>>be added as a second patch.
>> > >>
>> > >>Okay, that sounds reasonable.  Supporting the QEMU librados driver is
>> > >>definitely something we want to target, though, and seems to be route that
>> > >>more users are interested in.  Is defining the XML syntax for a guest VM
>> > >>something we can discuss now as well?
>> > >>
>> > >>(BTW this is biting NBD users too.  Presumably the guest VM XML should
>> > >>look similar?
>> > >
>> > >And also Sheepdog storage volumes. To define a syntax for all these we need
>> > >to determine what configuration metadata is required at a per-VM level for
>> > >each of them. Then try and decide how to represent that in the guest XML.
>> > >It looks like at a VM level we'd need a hostname, port number and a volume
>> > >name (or path).
>> >
>> > It looks like that's what Sheepdog needs from the patch that was
>> > submitted earlier today. For RBD, we would want to allow multiple hosts,
>> > and specify the pool and image name when the QEMU librados driver is
>> > used, e.g.:
>> >
>> >     <disk type="rbd" device="disk">
>> >       <driver name="qemu" type="raw" />
>> >       <source vdi="image_name" pool="pool_name">
>> >         <host name="mon1.example.org" port="6000">
>> >         <host name="mon2.example.org" port="6000">
>> >         <host name="mon3.example.org" port="6000">
>> >       </source>
>> >       <target dev="vda" bus="virtio" />
>> >     </disk>
>> >
>> > Does this seem like a reasonable format for the VM XML? Any suggestions?
>>
>> I'm basically wondering whether we should be going for separate types for
>> each of NBD, RBD & Sheepdog, as per your proposal & the sheepdog one earlier
>> today. Or type to merge them into one type 'nework' which covers any kind of
>> network block device, and list a protocol on the  source element, eg
>>
>>      <disk type="network" device="disk">
>>        <driver name="qemu" type="raw" />
>>        <source protocol='rbd|sheepdog|nbd' name="...some image identifier...">
>>          <host name="mon1.example.org" port="6000">
>>          <host name="mon2.example.org" port="6000">
>>          <host name="mon3.example.org" port="6000">
>>        </source>
>>        <target dev="vda" bus="virtio" />
>>      </disk>
>
> That would work...
>
> One thing that I think should be considered, though, is that both RBD and
> NBD can be used for non-qemu instances by mapping a regular block device
> via the host's kernel.  And in that case, there's some sysfs-fu (at least
> in the rbd case; I'm not familiar with how the nbd client works) required
> to set up/tear down the block device.

An nbd block device is attached using the nbd-client(1) userspace tool:
$ nbd-client my-server 1234 /dev/nbd0 # <host> <port> <nbd-device>

That program will open the socket, grab /dev/nbd0, and poke it with a
few ioctls so the kernel has the socket and can take it from there.

Stefan


* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-19  9:27             ` Stefan Hajnoczi
@ 2010-11-19  9:50               ` Daniel P. Berrange
  2010-11-19 12:55                 ` Stefan Hajnoczi
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel P. Berrange @ 2010-11-19  9:50 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Sage Weil, libvir-list, ceph-devel

On Fri, Nov 19, 2010 at 09:27:40AM +0000, Stefan Hajnoczi wrote:
> On Thu, Nov 18, 2010 at 5:13 PM, Sage Weil <sage@newdream.net> wrote:
> > On Thu, 18 Nov 2010, Daniel P. Berrange wrote:
> >> On Wed, Nov 17, 2010 at 04:33:07PM -0800, Josh Durgin wrote:
> >> > Hi Daniel,
> >> >
> >> > On 11/08/2010 05:16 AM, Daniel P. Berrange wrote:
> >> > >>>>In any case, before someone goes off and implements something, does this
> >> > >>>>look like the right general approach to adding rbd support to libvirt?
> >> > >>>
> >> > >>>I think this looks reasonable. I'd be inclined to get the storage pool
> >> > >>>stuff working with the kernel RBD driver&  UDEV rules for stable path
> >> > >>>names, since that avoids needing to make any changes to guest XML
> >> > >>>format. Support for QEMU with the native librados CEPH driver could
> >> > >>>be added as a second patch.
> >> > >>
> >> > >>Okay, that sounds reasonable.  Supporting the QEMU librados driver is
> >> > >>definitely something we want to target, though, and seems to be route that
> >> > >>more users are interested in.  Is defining the XML syntax for a guest VM
> >> > >>something we can discuss now as well?
> >> > >>
> >> > >>(BTW this is biting NBD users too.  Presumably the guest VM XML should
> >> > >>look similar?
> >> > >
> >> > >And also Sheepdog storage volumes. To define a syntax for all these we need
> >> > >to determine what configuration metadata is required at a per-VM level for
> >> > >each of them. Then try and decide how to represent that in the guest XML.
> >> > >It looks like at a VM level we'd need a hostname, port number and a volume
> >> > >name (or path).
> >> >
> >> > It looks like that's what Sheepdog needs from the patch that was
> >> > submitted earlier today. For RBD, we would want to allow multiple hosts,
> >> > and specify the pool and image name when the QEMU librados driver is
> >> > used, e.g.:
> >> >
> >> >     <disk type="rbd" device="disk">
> >> >       <driver name="qemu" type="raw" />
> >> >       <source vdi="image_name" pool="pool_name">
> >> >         <host name="mon1.example.org" port="6000">
> >> >         <host name="mon2.example.org" port="6000">
> >> >         <host name="mon3.example.org" port="6000">
> >> >       </source>
> >> >       <target dev="vda" bus="virtio" />
> >> >     </disk>
> >> >
> >> > Does this seem like a reasonable format for the VM XML? Any suggestions?
> >>
> >> I'm basically wondering whether we should be going for separate types for
> >> each of NBD, RBD & Sheepdog, as per your proposal & the sheepdog one earlier
> >> today. Or type to merge them into one type 'nework' which covers any kind of
> >> network block device, and list a protocol on the  source element, eg
> >>
> >>      <disk type="network" device="disk">
> >>        <driver name="qemu" type="raw" />
> >>        <source protocol='rbd|sheepdog|nbd' name="...some image identifier...">
> >>          <host name="mon1.example.org" port="6000">
> >>          <host name="mon2.example.org" port="6000">
> >>          <host name="mon3.example.org" port="6000">
> >>        </source>
> >>        <target dev="vda" bus="virtio" />
> >>      </disk>
> >
> > That would work...
> >
> > One thing that I think should be considered, though, is that both RBD and
> > NBD can be used for non-qemu instances by mapping a regular block device
> > via the host's kernel.  And in that case, there's some sysfs-fu (at least
> > in the rbd case; I'm not familiar with how the nbd client works) required
> > to set up/tear down the block device.
> 
> An nbd block device is attached using the nbd-client(1) userspace tool:
> $ nbd-client my-server 1234 /dev/nbd0 # <host> <port> <nbd-device>
> 
> That program will open the socket, grab /dev/nbd0, and poke it with a
> few ioctls so the kernel has the socket and can take it from there.

We don't need to worry about this for libvirt/QEMU. Since QEMU has native
NBD client support, there's no need to do anything with nbd client tools
to set up the device for use with a VM.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-19  9:50               ` Daniel P. Berrange
@ 2010-11-19 12:55                 ` Stefan Hajnoczi
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2010-11-19 12:55 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Sage Weil, libvir-list, ceph-devel

On Fri, Nov 19, 2010 at 9:50 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
> On Fri, Nov 19, 2010 at 09:27:40AM +0000, Stefan Hajnoczi wrote:
>> On Thu, Nov 18, 2010 at 5:13 PM, Sage Weil <sage@newdream.net> wrote:
>> > On Thu, 18 Nov 2010, Daniel P. Berrange wrote:
>> >> On Wed, Nov 17, 2010 at 04:33:07PM -0800, Josh Durgin wrote:
>> >> > Hi Daniel,
>> >> >
>> >> > On 11/08/2010 05:16 AM, Daniel P. Berrange wrote:
>> >> > >>>>In any case, before someone goes off and implements something, does this
>> >> > >>>>look like the right general approach to adding rbd support to libvirt?
>> >> > >>>
>> >> > >>>I think this looks reasonable. I'd be inclined to get the storage pool
>> >> > >>>stuff working with the kernel RBD driver&  UDEV rules for stable path
>> >> > >>>names, since that avoids needing to make any changes to guest XML
>> >> > >>>format. Support for QEMU with the native librados CEPH driver could
>> >> > >>>be added as a second patch.
>> >> > >>
>> >> > >>Okay, that sounds reasonable.  Supporting the QEMU librados driver is
>> >> > >>definitely something we want to target, though, and seems to be route that
>> >> > >>more users are interested in.  Is defining the XML syntax for a guest VM
>> >> > >>something we can discuss now as well?
>> >> > >>
>> >> > >>(BTW this is biting NBD users too.  Presumably the guest VM XML should
>> >> > >>look similar?
>> >> > >
>> >> > >And also Sheepdog storage volumes. To define a syntax for all these we need
>> >> > >to determine what configuration metadata is required at a per-VM level for
>> >> > >each of them. Then try and decide how to represent that in the guest XML.
>> >> > >It looks like at a VM level we'd need a hostname, port number and a volume
>> >> > >name (or path).
>> >> >
>> >> > It looks like that's what Sheepdog needs from the patch that was
>> >> > submitted earlier today. For RBD, we would want to allow multiple hosts,
>> >> > and specify the pool and image name when the QEMU librados driver is
>> >> > used, e.g.:
>> >> >
>> >> >     <disk type="rbd" device="disk">
>> >> >       <driver name="qemu" type="raw" />
>> >> >       <source vdi="image_name" pool="pool_name">
>> >> >         <host name="mon1.example.org" port="6000">
>> >> >         <host name="mon2.example.org" port="6000">
>> >> >         <host name="mon3.example.org" port="6000">
>> >> >       </source>
>> >> >       <target dev="vda" bus="virtio" />
>> >> >     </disk>
>> >> >
>> >> > Does this seem like a reasonable format for the VM XML? Any suggestions?
>> >>
>> >> I'm basically wondering whether we should be going for separate types for
>> >> each of NBD, RBD & Sheepdog, as per your proposal & the sheepdog one earlier
>> >> today. Or type to merge them into one type 'nework' which covers any kind of
>> >> network block device, and list a protocol on the  source element, eg
>> >>
>> >>      <disk type="network" device="disk">
>> >>        <driver name="qemu" type="raw" />
>> >>        <source protocol='rbd|sheepdog|nbd' name="...some image identifier...">
>> >>          <host name="mon1.example.org" port="6000">
>> >>          <host name="mon2.example.org" port="6000">
>> >>          <host name="mon3.example.org" port="6000">
>> >>        </source>
>> >>        <target dev="vda" bus="virtio" />
>> >>      </disk>
>> >
>> > That would work...
>> >
>> > One thing that I think should be considered, though, is that both RBD and
>> > NBD can be used for non-qemu instances by mapping a regular block device
>> > via the host's kernel.  And in that case, there's some sysfs-fu (at least
>> > in the rbd case; I'm not familiar with how the nbd client works) required
>> > to set up/tear down the block device.
>>
>> An nbd block device is attached using the nbd-client(1) userspace tool:
>> $ nbd-client my-server 1234 /dev/nbd0 # <host> <port> <nbd-device>
>>
>> That program will open the socket, grab /dev/nbd0, and poke it with a
>> few ioctls so the kernel has the socket and can take it from there.
>
> We don't need to worry about this for libvirt/QEMU. Since QEMU has native
> NBD client support there's no need to do anything with nbd client tools
> to setup the device for use with a VM.

I agree it's easier to use the built-in NBD support.  Just wanted to
provide the background on how NBD client works when using the kernel
implementation.

Stefan

