* rbd storage pool support for libvirt
@ 2010-11-02  3:52 Sage Weil
  2010-11-02 19:47 ` Wido den Hollander
  2010-11-03 13:59 ` [libvirt] " Daniel P. Berrange
  0 siblings, 2 replies; 14+ messages in thread

From: Sage Weil @ 2010-11-02  3:52 UTC (permalink / raw)
To: libvir-list; +Cc: ceph-devel

Hi,

We've been working on RBD, a distributed block device backed by the Ceph
distributed object store.  (Ceph is a highly scalable, fault tolerant
distributed storage and file system; see http://ceph.newdream.net.)
Although the Ceph file system client has been in Linux since 2.6.34, the
RBD block device was just merged for 2.6.37.  We also have patches pending
for Qemu that use librados to natively talk to the Ceph storage backend,
avoiding any kernel dependency.

To support disks backed by RBD in libvirt, we originally proposed a
'virtual' type that simply passed the configuration information through
to qemu, but that idea was shot down for a variety of reasons:

  http://www.redhat.com/archives/libvir-list/2010-June/thread.html#00257

It sounds like the "right" approach is to create a storage pool type.
Ceph also has a 'pool' concept that contains some number of RBD images,
and a command line tool to manipulate (create, destroy, resize, rename,
snapshot, etc.) those images, which seems to map nicely onto the storage
pool abstraction.  For example,

  $ rbd create foo -s 1000
  rbd image 'foo':
          size 1000 MB in 250 objects
          order 22 (4096 KB objects)
  adding rbd image to directory...
  creating rbd image...
  done.
  $ rbd create bar -s 10000
  [...]
  $ rbd list
  bar
  foo

Something along the lines of

  <pool type="rbd">
    <name>virtimages</name>
    <source mode="kernel">
      <host monitor="ceph-mon1.domain.com:6789"/>
      <host monitor="ceph-mon2.domain.com:6789"/>
      <host monitor="ceph-mon3.domain.com:6789"/>
      <pool name="rbd"/>
    </source>
  </pool>

or whatever (I'm not too familiar with the libvirt schema)?
One difference between the existing pool types listed at
libvirt.org/storage.html is that RBD does not necessarily associate itself
with a path in the local file system.  If the native qemu driver is used,
there is no path involved, just a magic string passed to qemu
(rbd:poolname/imagename).  If the kernel RBD driver is used, it gets
mapped to a /dev/rbd/$n (or similar, depending on the udev rule), but $n
is not static across reboots.

In any case, before someone goes off and implements something, does this
look like the right general approach to adding rbd support to libvirt?

Thanks!
sage
* Re: rbd storage pool support for libvirt
  2010-11-02  3:52 rbd storage pool support for libvirt Sage Weil
@ 2010-11-02 19:47 ` Wido den Hollander
  2010-11-02 19:50   ` Wido den Hollander
  2010-11-03 13:59 ` [libvirt] " Daniel P. Berrange
  1 sibling, 1 reply; 14+ messages in thread

From: Wido den Hollander @ 2010-11-02 19:47 UTC (permalink / raw)
To: ceph-devel

Hi,

I gave this a try a few months ago; what I found out is that there is a
difference between a storage pool and a disk declaration in libvirt.

I'll take the LVM storage pool as an example.

In src/storage you will find storage_backend_logical.c|h; these are
simple "wrappers" around the LVM commands like lvcreate, lvremove, etc.:

  static int
  virStorageBackendLogicalDeleteVol(virConnectPtr conn ATTRIBUTE_UNUSED,
                                    virStoragePoolObjPtr pool ATTRIBUTE_UNUSED,
                                    virStorageVolDefPtr vol,
                                    unsigned int flags ATTRIBUTE_UNUSED)
  {
      const char *cmdargv[] = {
          LVREMOVE, "-f", vol->target.path, NULL
      };

      if (virRun(cmdargv, NULL) < 0)
          return -1;

      return 0;
  }

  virStorageBackend virStorageBackendLogical = {
      .type = VIR_STORAGE_POOL_LOGICAL,
      ....
      .deleteVol = virStorageBackendLogicalDeleteVol,
      ....
  };

As you can see, libvirt simply calls "lvremove" to remove the volume.
This does not help you map the LV to a virtual machine; it's just a
mechanism to manage your storage via libvirt, as you can do with
Virt-Manager (which uses libvirt).

Below you find two screenshots of how this works in Virt-Manager; as you
can see, you can manage your VGs and attach LVs to a virtual machine:

 * http://zooi.widodh.nl/ceph/qemu-kvm/screenshots/storage_allocation.png
 * http://zooi.widodh.nl/ceph/qemu-kvm/screenshots/storage_manager_virt.png

Note, this is Virt-Manager and not libvirt, but it uses libvirt to
perform these actions.
On the CLI you have, for example: vol-create, vol-delete, pool-create,
pool-delete.

But there is no special disk format for an LV; in my XML there is:

  <disk type='block' device='disk'>
    <source dev='/dev/xen-domains/v3-root'/>
    <target dev='sda' bus='scsi'/>
  </disk>

So libvirt somehow reads "source dev" and maps this back to a VG and LV.

A storage manager for RBD would simply mean implementing wrapper functions
around the "rbd" binary and parsing its output.

Implementing RBD support in libvirt would then mean two things:

1. A storage manager in libvirt
2. A special disk format for RBD

The first one is done as I explained above, but for the second one, I'm
not sure how you could do that.

Libvirt currently expects a disk to always be a file or block device;
virtual disks like RBD and NBD are not supported.

For #2 we should have a "special" disk declaration format, like the one
mentioned on the RedHat mailinglist:

  http://www.redhat.com/archives/libvir-list/2010-June/msg00300.html

  <disk type='rbd' device='disk'>
    <driver name='qemu' type='raw' />
    <source pool='rbd' image='alpha' />
    <target dev='vda' bus='virtio' />
  </disk>

As RBD images are always "raw", it might seem redundant to define this,
but newer versions of Qemu don't autodetect formats.

Defining a monitor in the disk declaration won't be possible, I think; I
don't see a way to get that parameter down to librados, so we need a
valid /etc/ceph/ceph.conf.

Now, I'm not a libvirt expert; this is just what I found in my search.

Any suggestions / thoughts about this?

Thanks,

Wido

On Mon, 2010-11-01 at 20:52 -0700, Sage Weil wrote:
> Hi,
>
> We've been working on RBD, a distributed block device backed by the Ceph
> distributed object store.  (Ceph is a highly scalable, fault tolerant
> distributed storage and file system; see http://ceph.newdream.net.)
> [...]
>
> In any case, before someone goes off and implements something, does this
> look like the right general approach to adding rbd support to libvirt?
>
> Thanks!
> sage
* Re: rbd storage pool support for libvirt
  2010-11-02 19:47 ` Wido den Hollander
@ 2010-11-02 19:50   ` Wido den Hollander
  0 siblings, 0 replies; 14+ messages in thread

From: Wido den Hollander @ 2010-11-02 19:50 UTC (permalink / raw)
To: ceph-devel

Seems there was somebody recently with the same problem:

  http://www.redhat.com/archives/libvir-list/2010-October/msg01247.html

NBD seems to be suffering from the same limitations as RBD.

On Tue, 2010-11-02 at 20:47 +0100, Wido den Hollander wrote:
> [...]
* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-02  3:52 rbd storage pool support for libvirt Sage Weil
  2010-11-02 19:47 ` Wido den Hollander
@ 2010-11-03 13:59 ` Daniel P. Berrange
  2010-11-05 23:33   ` Sage Weil
  1 sibling, 1 reply; 14+ messages in thread

From: Daniel P. Berrange @ 2010-11-03 13:59 UTC (permalink / raw)
To: Sage Weil; +Cc: libvir-list, ceph-devel

On Mon, Nov 01, 2010 at 08:52:05PM -0700, Sage Weil wrote:
> To support disks backed by RBD in libvirt, we originally proposed a
> 'virtual' type that simply passed the configuration information through to
> qemu, but that idea was shot down for a variety of reasons:
>
>   http://www.redhat.com/archives/libvir-list/2010-June/thread.html#00257

NB, I'm not against adding new disk types to the guest XML, just that
each type should be explicitly modelled, rather than being lumped under
a generic 'virtual' type.

> It sounds like the "right" approach is to create a storage pool type.

Sort of.  There are really two separate aspects to handling storage in
libvirt:

 1. How do you configure a VM to use a storage volume
 2. How do you list/create/delete storage volumes

The XML addition proposed in the mailing list post above is attempting
to cater for the first aspect.  The storage pool type idea you're
describing in this post caters to the second aspect.

If the storage pool ends up providing real block devices that exist on
the filesystem, then the first item is trivially solved, because libvirt
can already point any guest at a block device.  If the storage pool
provides some kind of virtual device, then we'd still need to decide how
to deal with the XML for configuring the guest VM.

> Ceph also has a 'pool' concept that contains some number of RBD images and
> a command line tool to manipulate (create, destroy, resize, rename,
> snapshot, etc.) those images, which seems to map nicely onto the storage
> pool abstraction.  For example,

Agreed, it does look like it'd map in quite well and let the RBD
functionality more or less 'just work' in virt-manager & other apps
using the storage pool APIs.

> Something along the lines of
>
>   <pool type="rbd">
>     <name>virtimages</name>
>     <source mode="kernel">
>       <host monitor="ceph-mon1.domain.com:6789"/>
>       <host monitor="ceph-mon2.domain.com:6789"/>
>       <host monitor="ceph-mon3.domain.com:6789"/>
>       <pool name="rbd"/>
>     </source>
>   </pool>

What do the 3 hostnames represent in this context?

> or whatever (I'm not too familiar with the libvirt schema)?  One
> difference between the existing pool types listed at
> libvirt.org/storage.html is that RBD does not necessarily associate itself
> with a path in the local file system.  If the native qemu driver is used,
> there is no path involved, just a magic string passed to qemu
> (rbd:poolname/imagename).  If the kernel RBD driver is used, it gets
> mapped to a /dev/rbd/$n (or similar, depending on the udev rule), but $n
> is not static across reboots.

The docs about storage pools are slightly inaccurate.  While it is
desirable that the storage volume path exists on the filesystem, it is
not something we strictly require.  The only requirement is that there
is some way to map from the storage volume path to the corresponding
guest XML.

If we define a new guest XML syntax for RBD magic strings, then we can
also define a storage pool that provides path data in a corresponding
format.

WRT the issue of /dev/rbd/$n being unstable, this is quite similar to
the issue of /dev/sdXX device names being unstable for SCSI.  The way to
cope with this is to drop in a UDEV ruleset that creates symlinks with
sensible names, e.g. perhaps set up symlinks like:

  /dev/disk/by-id/rbd-$poolname-$imagename -> /dev/rbd/0

It might also make sense to wire up /dev/disk/by-path symlinks for RBD
devices.

> In any case, before someone goes off and implements something, does this
> look like the right general approach to adding rbd support to libvirt?

I think this looks reasonable.  I'd be inclined to get the storage pool
stuff working with the kernel RBD driver & UDEV rules for stable path
names, since that avoids needing to make any changes to the guest XML
format.  Support for QEMU with the native librados CEPH driver could be
added as a second patch.

Regards,
Daniel
--
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
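A ruleset along these lines could create the by-id symlinks Daniel suggests. This is a sketch only: it assumes the kernel driver exposes the pool and image name under /sys/devices/rbd/<n>/, which should be verified against the actual driver before use.

```udev
# /etc/udev/rules.d/50-rbd.rules  (sketch, untested)
# Assumes /sys/devices/rbd/<n>/pool and /sys/devices/rbd/<n>/name exist.
KERNEL=="rbd[0-9]*", SUBSYSTEM=="block", \
  PROGRAM="/bin/sh -c 'echo rbd-$(cat /sys/devices/rbd/%n/pool)-$(cat /sys/devices/rbd/%n/name)'", \
  SYMLINK+="disk/by-id/%c"
```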
* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-03 13:59 ` [libvirt] " Daniel P. Berrange
@ 2010-11-05 23:33   ` Sage Weil
  2010-11-08 13:16     ` Daniel P. Berrange
  0 siblings, 1 reply; 14+ messages in thread

From: Sage Weil @ 2010-11-05 23:33 UTC (permalink / raw)
To: Daniel P. Berrange; +Cc: libvir-list, ceph-devel

Hi Daniel,

On Wed, 3 Nov 2010, Daniel P. Berrange wrote:
> Agreed, it does look like it'd map in quite well and let the RBD
> functionality more or less 'just work' in virt-manager & other apps
> using the storage pool APIs.

Great!

> > Something along the lines of
> >
> >   <pool type="rbd">
> >     <name>virtimages</name>
> >     <source mode="kernel">
> >       <host monitor="ceph-mon1.domain.com:6789"/>
> >       <host monitor="ceph-mon2.domain.com:6789"/>
> >       <host monitor="ceph-mon3.domain.com:6789"/>
> >       <pool name="rbd"/>
> >     </source>
> >   </pool>
>
> What do the 3 hostnames represent in this context?

They're the host(s) that RBD needs to be fed to talk to the storage
cluster.  Ideally there's more than one for redundancy.  Does the above
syntax look reasonable, or is there something you would propose instead?
From the RBD side of things, the key parameters are

 - pool name
 - monitor address(es)
 - user and secret key to authenticate with

If the 'rbd' command line tool is used for this, everything but the pool
can come out of the default /etc/ceph/ceph.conf config file, or we could
have a way to specify a config path in the XML.

> The docs about storage pools are slightly inaccurate.  While it is
> desirable that the storage volume path exists on the filesystem, it is
> not something we strictly require.  The only requirement is that there
> is some way to map from the storage volume path to the corresponding
> guest XML.
>
> If we define a new guest XML syntax for RBD magic strings, then we can
> also define a storage pool that provides path data in a corresponding
> format.

Ok, thanks, that clarifies things.

> WRT the issue of /dev/rbd/$n being unstable, this is quite similar to
> the issue of /dev/sdXX device names being unstable for SCSI.  The way
> to cope with this is to drop in a UDEV ruleset that creates symlinks
> with sensible names, e.g. perhaps set up symlinks like:
>
>   /dev/disk/by-id/rbd-$poolname-$imagename -> /dev/rbd/0
>
> It might also make sense to wire up /dev/disk/by-path symlinks for RBD
> devices.

We're putting together some udev rules to do this.

> I think this looks reasonable.  I'd be inclined to get the storage pool
> stuff working with the kernel RBD driver & UDEV rules for stable path
> names, since that avoids needing to make any changes to the guest XML
> format.  Support for QEMU with the native librados CEPH driver could be
> added as a second patch.

Okay, that sounds reasonable.  Supporting the QEMU librados driver is
definitely something we want to target, though, and seems to be the route
that more users are interested in.  Is defining the XML syntax for a guest
VM something we can discuss now as well?

(BTW this is biting NBD users too.  Presumably the guest VM XML should
look similar?
http://www.redhat.com/archives/libvir-list/2010-October/msg01247.html )

Thanks!
sage
* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-05 23:33 ` Sage Weil
@ 2010-11-08 13:16   ` Daniel P. Berrange
  2010-11-18  0:33     ` Josh Durgin
  0 siblings, 1 reply; 14+ messages in thread

From: Daniel P. Berrange @ 2010-11-08 13:16 UTC (permalink / raw)
To: Sage Weil; +Cc: libvir-list, ceph-devel

On Fri, Nov 05, 2010 at 04:33:46PM -0700, Sage Weil wrote:
> They're the host(s) that RBD needs to be fed to talk to the storage
> cluster.  Ideally there's more than one for redundancy.  Does the above
> syntax look reasonable, or is there something you would propose instead?
> From the RBD side of things, the key parameters are
>
>  - pool name
>  - monitor address(es)
>  - user and secret key to authenticate with
>
> If the 'rbd' command line tool is used for this, everything but the pool
> can come out of the default /etc/ceph/ceph.conf config file, or we could
> have a way to specify a config path in the XML.

It makes sense to allow the hostname in the XML, because the general
goal is that you should be able to configure storage without needing to
SSH into a machine & manually set up config files.  The XML does not
currently allow multiple hosts, but we can extend that.  The pool name /
user / key are already covered.

> Okay, that sounds reasonable.  Supporting the QEMU librados driver is
> definitely something we want to target, though, and seems to be the
> route that more users are interested in.  Is defining the XML syntax
> for a guest VM something we can discuss now as well?
>
> (BTW this is biting NBD users too.  Presumably the guest VM XML should
> look similar?

And also Sheepdog storage volumes.  To define a syntax for all these we
need to determine what configuration metadata is required at a per-VM
level for each of them, then try to decide how to represent that in the
guest XML.  It looks like at a VM level we'd need a hostname, port
number and a volume name (or path).

Regards,
Daniel
--
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-08 13:16 ` Daniel P. Berrange
@ 2010-11-18  0:33   ` Josh Durgin
  2010-11-18  2:04     ` Josh Durgin
  2010-11-18 10:42     ` Daniel P. Berrange
  0 siblings, 2 replies; 14+ messages in thread

From: Josh Durgin @ 2010-11-18  0:33 UTC (permalink / raw)
To: Daniel P. Berrange; +Cc: Sage Weil, libvir-list, ceph-devel

Hi Daniel,

On 11/08/2010 05:16 AM, Daniel P. Berrange wrote:
> And also Sheepdog storage volumes.  To define a syntax for all these we
> need to determine what configuration metadata is required at a per-VM
> level for each of them, then try to decide how to represent that in the
> guest XML.  It looks like at a VM level we'd need a hostname, port
> number and a volume name (or path).

It looks like that's what Sheepdog needs from the patch that was
submitted earlier today.  For RBD, we would want to allow multiple
hosts, and specify the pool and image name when the QEMU librados driver
is used, e.g.:

  <disk type="rbd" device="disk">
    <driver name="qemu" type="raw" />
    <source vdi="image_name" pool="pool_name">
      <host name="mon1.example.org" port="6000" />
      <host name="mon2.example.org" port="6000" />
      <host name="mon3.example.org" port="6000" />
    </source>
    <target dev="vda" bus="virtio" />
  </disk>

As you mentioned earlier, we could just use the existing source format
for the kernel RBD driver.

Does this seem like a reasonable format for the VM XML?  Any suggestions?

Thanks,
Josh
* Re: [libvirt] rbd storage pool support for libvirt
  2010-11-18  0:33 ` Josh Durgin
@ 2010-11-18  2:04   ` Josh Durgin
  2010-11-18 10:38     ` Daniel P. Berrange
  1 sibling, 1 reply; 14+ messages in thread

From: Josh Durgin @ 2010-11-18  2:04 UTC (permalink / raw)
To: Daniel P. Berrange; +Cc: Sage Weil, libvir-list, ceph-devel

On 11/17/2010 04:33 PM, Josh Durgin wrote:
> [...]
>
> Does this seem like a reasonable format for the VM XML?  Any suggestions?

Also, it would be convenient to be able to specify which RBD driver to
use in the guest XML, so that it's independent of the libvirt pool
configuration.  Would having two different rbd disk types be the right
approach here?
* Re: [libvirt] rbd storage pool support for libvirt 2010-11-18 2:04 ` Josh Durgin @ 2010-11-18 10:38 ` Daniel P. Berrange 0 siblings, 0 replies; 14+ messages in thread From: Daniel P. Berrange @ 2010-11-18 10:38 UTC (permalink / raw) To: Josh Durgin; +Cc: Sage Weil, libvir-list, ceph-devel On Wed, Nov 17, 2010 at 06:04:50PM -0800, Josh Durgin wrote: > On 11/17/2010 04:33 PM, Josh Durgin wrote: > >Hi Daniel, > > > >On 11/08/2010 05:16 AM, Daniel P. Berrange wrote: > >>>>>In any case, before someone goes off and implements something, does > >>>>>this > >>>>>look like the right general approach to adding rbd support to libvirt? > >>>> > >>>>I think this looks reasonable. I'd be inclined to get the storage pool > >>>>stuff working with the kernel RBD driver& UDEV rules for stable path > >>>>names, since that avoids needing to make any changes to guest XML > >>>>format. Support for QEMU with the native librados CEPH driver could > >>>>be added as a second patch. > >>> > >>>Okay, that sounds reasonable. Supporting the QEMU librados driver is > >>>definitely something we want to target, though, and seems to be route > >>>that > >>>more users are interested in. Is defining the XML syntax for a guest VM > >>>something we can discuss now as well? > >>> > >>>(BTW this is biting NBD users too. Presumably the guest VM XML should > >>>look similar? > >> > >>And also Sheepdog storage volumes. To define a syntax for all these we > >>need > >>to determine what configuration metadata is required at a per-VM level > >>for > >>each of them. Then try and decide how to represent that in the guest XML. > >>It looks like at a VM level we'd need a hostname, port number and a > >>volume > >>name (or path). > > > >It looks like that's what Sheepdog needs from the patch that was > >submitted earlier today. 
For RBD, we would want to allow multiple hosts, > >and specify the pool and image name when the QEMU librados driver is > >used, e.g.: > > > ><disk type="rbd" device="disk"> > ><driver name="qemu" type="raw" /> > ><source vdi="image_name" pool="pool_name"> > ><host name="mon1.example.org" port="6000"> > ><host name="mon2.example.org" port="6000"> > ><host name="mon3.example.org" port="6000"> > ></source> > ><target dev="vda" bus="virtio" /> > ></disk> > > > >As you mentioned earlier, we could just use the existing source format > >for the kernel RBD driver. > > > >Does this seem like a reasonable format for the VM XML? Any suggestions? > > Also, it would be convenient to be able to specify which RBD driver to > use in the guest XML, so that it's independent of the libvirt pool > configuration. Would having two different rbd disk types be the right > approach here? What do you mean by RBD driver here ? kernel vs native QEMU ? If so, the kernel case is trivially handled by the <disk type='block'> case, so we only need new syntax for the native QEMU impl Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| ^ permalink raw reply [flat|nested] 14+ messages in thread
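[Editorial note: a concrete sketch of Daniel's point — with the kernel driver, the mapped image is just a host block device, so the existing <disk type='block'> syntax suffices. The stable /dev/rbd/... path shown is an assumption about the udev rule mentioned earlier in the thread:]

```xml
<disk type="block" device="disk">
  <driver name="qemu" type="raw" />
  <!-- /dev/rbd/... is a hypothetical stable path provided by a udev rule
       for the kernel-mapped RBD device; the naming scheme is an assumption. -->
  <source dev="/dev/rbd/pool_name/image_name" />
  <target dev="vda" bus="virtio" />
</disk>
```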
* Re: [libvirt] rbd storage pool support for libvirt 2010-11-18 0:33 ` Josh Durgin 2010-11-18 2:04 ` Josh Durgin @ 2010-11-18 10:42 ` Daniel P. Berrange 2010-11-18 17:13 ` Sage Weil 1 sibling, 1 reply; 14+ messages in thread From: Daniel P. Berrange @ 2010-11-18 10:42 UTC (permalink / raw) To: Josh Durgin; +Cc: Sage Weil, libvir-list, ceph-devel On Wed, Nov 17, 2010 at 04:33:07PM -0800, Josh Durgin wrote: > Hi Daniel, > > On 11/08/2010 05:16 AM, Daniel P. Berrange wrote: > >>>>In any case, before someone goes off and implements something, does this > >>>>look like the right general approach to adding rbd support to libvirt? > >>> > >>>I think this looks reasonable. I'd be inclined to get the storage pool > >>>stuff working with the kernel RBD driver& UDEV rules for stable path > >>>names, since that avoids needing to make any changes to guest XML > >>>format. Support for QEMU with the native librados CEPH driver could > >>>be added as a second patch. > >> > >>Okay, that sounds reasonable. Supporting the QEMU librados driver is > >>definitely something we want to target, though, and seems to be route that > >>more users are interested in. Is defining the XML syntax for a guest VM > >>something we can discuss now as well? > >> > >>(BTW this is biting NBD users too. Presumably the guest VM XML should > >>look similar? > > > >And also Sheepdog storage volumes. To define a syntax for all these we need > >to determine what configuration metadata is required at a per-VM level for > >each of them. Then try and decide how to represent that in the guest XML. > >It looks like at a VM level we'd need a hostname, port number and a volume > >name (or path). > > It looks like that's what Sheepdog needs from the patch that was > submitted earlier today. 
> For RBD, we would want to allow multiple hosts,
> and specify the pool and image name when the QEMU librados driver is
> used, e.g.:
>
> <disk type="rbd" device="disk">
> <driver name="qemu" type="raw" />
> <source vdi="image_name" pool="pool_name">
> <host name="mon1.example.org" port="6000">
> <host name="mon2.example.org" port="6000">
> <host name="mon3.example.org" port="6000">
> </source>
> <target dev="vda" bus="virtio" />
> </disk>
>
> Does this seem like a reasonable format for the VM XML? Any suggestions?

I'm basically wondering whether we should be going for separate types for
each of NBD, RBD & Sheepdog, as per your proposal & the sheepdog one earlier
today. Or try to merge them into one type 'network' which covers any kind of
network block device, and list a protocol on the source element, eg

<disk type="network" device="disk">
<driver name="qemu" type="raw" />
<source protocol='rbd|sheepdog|nbd' name="...some image identifier...">
<host name="mon1.example.org" port="6000">
<host name="mon2.example.org" port="6000">
<host name="mon3.example.org" port="6000">
</source>
<target dev="vda" bus="virtio" />
</disk>

Regards,
Daniel

--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 14+ messages in thread
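[Editorial note: under the unified type="network" proposal above, the protocol attribute would select the backend-specific QEMU drive string. A rough sketch of that dispatch; only the rbd: form is confirmed in this thread, and the sheepdog/nbd forms are assumptions based on their QEMU driver syntax, with placeholder values throughout:]

```shell
# Hypothetical mapping from <source protocol=... name=...> plus <host>
# elements to a QEMU drive string (all names/hosts are placeholders).
proto="rbd"
name="pool_name/image_name"
host="mon1.example.org"
port="6000"
case "$proto" in
  rbd)      drive="rbd:${name}" ;;                      # monitors usually come from the ceph config
  sheepdog) drive="sheepdog:${host}:${port}:${name}" ;; # assumed syntax
  nbd)      drive="nbd:${host}:${port}" ;;
esac
echo "$drive"
```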
* Re: [libvirt] rbd storage pool support for libvirt 2010-11-18 10:42 ` Daniel P. Berrange @ 2010-11-18 17:13 ` Sage Weil 2010-11-19 9:27 ` Stefan Hajnoczi 0 siblings, 1 reply; 14+ messages in thread From: Sage Weil @ 2010-11-18 17:13 UTC (permalink / raw) To: Daniel P. Berrange; +Cc: Josh Durgin, libvir-list, ceph-devel On Thu, 18 Nov 2010, Daniel P. Berrange wrote: > On Wed, Nov 17, 2010 at 04:33:07PM -0800, Josh Durgin wrote: > > Hi Daniel, > > > > On 11/08/2010 05:16 AM, Daniel P. Berrange wrote: > > >>>>In any case, before someone goes off and implements something, does this > > >>>>look like the right general approach to adding rbd support to libvirt? > > >>> > > >>>I think this looks reasonable. I'd be inclined to get the storage pool > > >>>stuff working with the kernel RBD driver& UDEV rules for stable path > > >>>names, since that avoids needing to make any changes to guest XML > > >>>format. Support for QEMU with the native librados CEPH driver could > > >>>be added as a second patch. > > >> > > >>Okay, that sounds reasonable. Supporting the QEMU librados driver is > > >>definitely something we want to target, though, and seems to be route that > > >>more users are interested in. Is defining the XML syntax for a guest VM > > >>something we can discuss now as well? > > >> > > >>(BTW this is biting NBD users too. Presumably the guest VM XML should > > >>look similar? > > > > > >And also Sheepdog storage volumes. To define a syntax for all these we need > > >to determine what configuration metadata is required at a per-VM level for > > >each of them. Then try and decide how to represent that in the guest XML. > > >It looks like at a VM level we'd need a hostname, port number and a volume > > >name (or path). > > > > It looks like that's what Sheepdog needs from the patch that was > > submitted earlier today. 
For RBD, we would want to allow multiple hosts, > > and specify the pool and image name when the QEMU librados driver is > > used, e.g.: > > > > <disk type="rbd" device="disk"> > > <driver name="qemu" type="raw" /> > > <source vdi="image_name" pool="pool_name"> > > <host name="mon1.example.org" port="6000"> > > <host name="mon2.example.org" port="6000"> > > <host name="mon3.example.org" port="6000"> > > </source> > > <target dev="vda" bus="virtio" /> > > </disk> > > > > Does this seem like a reasonable format for the VM XML? Any suggestions? > > I'm basically wondering whether we should be going for separate types for > each of NBD, RBD & Sheepdog, as per your proposal & the sheepdog one earlier > today. Or type to merge them into one type 'nework' which covers any kind of > network block device, and list a protocol on the source element, eg > > <disk type="network" device="disk"> > <driver name="qemu" type="raw" /> > <source protocol='rbd|sheepdog|nbd' name="...some image identifier..."> > <host name="mon1.example.org" port="6000"> > <host name="mon2.example.org" port="6000"> > <host name="mon3.example.org" port="6000"> > </source> > <target dev="vda" bus="virtio" /> > </disk> That would work... One thing that I think should be considered, though, is that both RBD and NBD can be used for non-qemu instances by mapping a regular block device via the host's kernel. And in that case, there's some sysfs-fu (at least in the rbd case; I'm not familiar with how the nbd client works) required to set up/tear down the block device. I think the ideal would be if either method (qemu or kernel driver) could be used, and libvirt could take care of that process of setting up the block device so that RBD (and/or NBD) can be used with non-qemu instances. If that means totally separate <disk> descriptions for the two scenarios, that's fine, as long as there's a way for a storage pool driver to be used to set up both types of mappings... 
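[Editorial note: the "sysfs-fu" Sage alludes to for the kernel RBD client looks roughly like the following. The control-file spec format ("<mon_addr> name=<user> <pool> <image>") is an assumption about the /sys/bus/rbd interface of this era, not taken from the thread, and the monitor address and image name are placeholders:]

```shell
# Build the spec string the kernel RBD sysfs interface would consume
# (assumed format; placeholder monitor, pool, and image).
mon="1.2.3.4:6789"
pool="rbd"
image="image_name"
spec="$mon name=admin $pool $image"
echo "$spec"
# Mapping (requires root) would create a /dev/rbd<N> device:
#   echo "$spec" > /sys/bus/rbd/add
# Tear-down by device id (id 0 assumed):
#   echo 0 > /sys/bus/rbd/remove
```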
sage ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [libvirt] rbd storage pool support for libvirt 2010-11-18 17:13 ` Sage Weil @ 2010-11-19 9:27 ` Stefan Hajnoczi 2010-11-19 9:50 ` Daniel P. Berrange 0 siblings, 1 reply; 14+ messages in thread From: Stefan Hajnoczi @ 2010-11-19 9:27 UTC (permalink / raw) To: Sage Weil; +Cc: libvir-list, ceph-devel On Thu, Nov 18, 2010 at 5:13 PM, Sage Weil <sage@newdream.net> wrote: > On Thu, 18 Nov 2010, Daniel P. Berrange wrote: >> On Wed, Nov 17, 2010 at 04:33:07PM -0800, Josh Durgin wrote: >> > Hi Daniel, >> > >> > On 11/08/2010 05:16 AM, Daniel P. Berrange wrote: >> > >>>>In any case, before someone goes off and implements something, does this >> > >>>>look like the right general approach to adding rbd support to libvirt? >> > >>> >> > >>>I think this looks reasonable. I'd be inclined to get the storage pool >> > >>>stuff working with the kernel RBD driver& UDEV rules for stable path >> > >>>names, since that avoids needing to make any changes to guest XML >> > >>>format. Support for QEMU with the native librados CEPH driver could >> > >>>be added as a second patch. >> > >> >> > >>Okay, that sounds reasonable. Supporting the QEMU librados driver is >> > >>definitely something we want to target, though, and seems to be route that >> > >>more users are interested in. Is defining the XML syntax for a guest VM >> > >>something we can discuss now as well? >> > >> >> > >>(BTW this is biting NBD users too. Presumably the guest VM XML should >> > >>look similar? >> > > >> > >And also Sheepdog storage volumes. To define a syntax for all these we need >> > >to determine what configuration metadata is required at a per-VM level for >> > >each of them. Then try and decide how to represent that in the guest XML. >> > >It looks like at a VM level we'd need a hostname, port number and a volume >> > >name (or path). >> > >> > It looks like that's what Sheepdog needs from the patch that was >> > submitted earlier today. 
For RBD, we would want to allow multiple hosts, >> > and specify the pool and image name when the QEMU librados driver is >> > used, e.g.: >> > >> > <disk type="rbd" device="disk"> >> > <driver name="qemu" type="raw" /> >> > <source vdi="image_name" pool="pool_name"> >> > <host name="mon1.example.org" port="6000"> >> > <host name="mon2.example.org" port="6000"> >> > <host name="mon3.example.org" port="6000"> >> > </source> >> > <target dev="vda" bus="virtio" /> >> > </disk> >> > >> > Does this seem like a reasonable format for the VM XML? Any suggestions? >> >> I'm basically wondering whether we should be going for separate types for >> each of NBD, RBD & Sheepdog, as per your proposal & the sheepdog one earlier >> today. Or type to merge them into one type 'nework' which covers any kind of >> network block device, and list a protocol on the source element, eg >> >> <disk type="network" device="disk"> >> <driver name="qemu" type="raw" /> >> <source protocol='rbd|sheepdog|nbd' name="...some image identifier..."> >> <host name="mon1.example.org" port="6000"> >> <host name="mon2.example.org" port="6000"> >> <host name="mon3.example.org" port="6000"> >> </source> >> <target dev="vda" bus="virtio" /> >> </disk> > > That would work... > > One thing that I think should be considered, though, is that both RBD and > NBD can be used for non-qemu instances by mapping a regular block device > via the host's kernel. And in that case, there's some sysfs-fu (at least > in the rbd case; I'm not familiar with how the nbd client works) required > to set up/tear down the block device. An nbd block device is attached using the nbd-client(1) userspace tool: $ nbd-client my-server 1234 /dev/nbd0 # <host> <port> <nbd-device> That program will open the socket, grab /dev/nbd0, and poke it with a few ioctls so the kernel has the socket and can take it from there. Stefan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [libvirt] rbd storage pool support for libvirt 2010-11-19 9:27 ` Stefan Hajnoczi @ 2010-11-19 9:50 ` Daniel P. Berrange 2010-11-19 12:55 ` Stefan Hajnoczi 0 siblings, 1 reply; 14+ messages in thread From: Daniel P. Berrange @ 2010-11-19 9:50 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: Sage Weil, libvir-list, ceph-devel On Fri, Nov 19, 2010 at 09:27:40AM +0000, Stefan Hajnoczi wrote: > On Thu, Nov 18, 2010 at 5:13 PM, Sage Weil <sage@newdream.net> wrote: > > On Thu, 18 Nov 2010, Daniel P. Berrange wrote: > >> On Wed, Nov 17, 2010 at 04:33:07PM -0800, Josh Durgin wrote: > >> > Hi Daniel, > >> > > >> > On 11/08/2010 05:16 AM, Daniel P. Berrange wrote: > >> > >>>>In any case, before someone goes off and implements something, does this > >> > >>>>look like the right general approach to adding rbd support to libvirt? > >> > >>> > >> > >>>I think this looks reasonable. I'd be inclined to get the storage pool > >> > >>>stuff working with the kernel RBD driver& UDEV rules for stable path > >> > >>>names, since that avoids needing to make any changes to guest XML > >> > >>>format. Support for QEMU with the native librados CEPH driver could > >> > >>>be added as a second patch. > >> > >> > >> > >>Okay, that sounds reasonable. Supporting the QEMU librados driver is > >> > >>definitely something we want to target, though, and seems to be route that > >> > >>more users are interested in. Is defining the XML syntax for a guest VM > >> > >>something we can discuss now as well? > >> > >> > >> > >>(BTW this is biting NBD users too. Presumably the guest VM XML should > >> > >>look similar? > >> > > > >> > >And also Sheepdog storage volumes. To define a syntax for all these we need > >> > >to determine what configuration metadata is required at a per-VM level for > >> > >each of them. Then try and decide how to represent that in the guest XML. > >> > >It looks like at a VM level we'd need a hostname, port number and a volume > >> > >name (or path). 
> >> > > >> > It looks like that's what Sheepdog needs from the patch that was > >> > submitted earlier today. For RBD, we would want to allow multiple hosts, > >> > and specify the pool and image name when the QEMU librados driver is > >> > used, e.g.: > >> > > >> > <disk type="rbd" device="disk"> > >> > <driver name="qemu" type="raw" /> > >> > <source vdi="image_name" pool="pool_name"> > >> > <host name="mon1.example.org" port="6000"> > >> > <host name="mon2.example.org" port="6000"> > >> > <host name="mon3.example.org" port="6000"> > >> > </source> > >> > <target dev="vda" bus="virtio" /> > >> > </disk> > >> > > >> > Does this seem like a reasonable format for the VM XML? Any suggestions? > >> > >> I'm basically wondering whether we should be going for separate types for > >> each of NBD, RBD & Sheepdog, as per your proposal & the sheepdog one earlier > >> today. Or type to merge them into one type 'nework' which covers any kind of > >> network block device, and list a protocol on the source element, eg > >> > >> <disk type="network" device="disk"> > >> <driver name="qemu" type="raw" /> > >> <source protocol='rbd|sheepdog|nbd' name="...some image identifier..."> > >> <host name="mon1.example.org" port="6000"> > >> <host name="mon2.example.org" port="6000"> > >> <host name="mon3.example.org" port="6000"> > >> </source> > >> <target dev="vda" bus="virtio" /> > >> </disk> > > > > That would work... > > > > One thing that I think should be considered, though, is that both RBD and > > NBD can be used for non-qemu instances by mapping a regular block device > > via the host's kernel. And in that case, there's some sysfs-fu (at least > > in the rbd case; I'm not familiar with how the nbd client works) required > > to set up/tear down the block device. 
> > An nbd block device is attached using the nbd-client(1) userspace tool: > $ nbd-client my-server 1234 /dev/nbd0 # <host> <port> <nbd-device> > > That program will open the socket, grab /dev/nbd0, and poke it with a > few ioctls so the kernel has the socket and can take it from there. We don't need to worry about this for libvirt/QEMU. Since QEMU has native NBD client support there's no need to do anything with nbd client tools to setup the device for use with a VM. Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
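[Editorial note: to make Daniel's contrast concrete — with the kernel client the guest consumes a host block device, while QEMU's native client takes the server address directly in its nbd: drive syntax. Host and port reuse Stefan's example; the -drive options shown are an assumption:]

```shell
# Kernel-client path (what Stefan describes): attach first, then point
# QEMU at the resulting host device.
#   nbd-client my-server 1234 /dev/nbd0
#   qemu -drive file=/dev/nbd0,format=raw,if=virtio
# Native-client path: no host device or nbd-client needed.
drive="nbd:my-server:1234"
echo "$drive"
#   qemu -drive file="$drive",format=raw,if=virtio
```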
* Re: [libvirt] rbd storage pool support for libvirt 2010-11-19 9:50 ` Daniel P. Berrange @ 2010-11-19 12:55 ` Stefan Hajnoczi 0 siblings, 0 replies; 14+ messages in thread From: Stefan Hajnoczi @ 2010-11-19 12:55 UTC (permalink / raw) To: Daniel P. Berrange; +Cc: Sage Weil, libvir-list, ceph-devel On Fri, Nov 19, 2010 at 9:50 AM, Daniel P. Berrange <berrange@redhat.com> wrote: > On Fri, Nov 19, 2010 at 09:27:40AM +0000, Stefan Hajnoczi wrote: >> On Thu, Nov 18, 2010 at 5:13 PM, Sage Weil <sage@newdream.net> wrote: >> > On Thu, 18 Nov 2010, Daniel P. Berrange wrote: >> >> On Wed, Nov 17, 2010 at 04:33:07PM -0800, Josh Durgin wrote: >> >> > Hi Daniel, >> >> > >> >> > On 11/08/2010 05:16 AM, Daniel P. Berrange wrote: >> >> > >>>>In any case, before someone goes off and implements something, does this >> >> > >>>>look like the right general approach to adding rbd support to libvirt? >> >> > >>> >> >> > >>>I think this looks reasonable. I'd be inclined to get the storage pool >> >> > >>>stuff working with the kernel RBD driver& UDEV rules for stable path >> >> > >>>names, since that avoids needing to make any changes to guest XML >> >> > >>>format. Support for QEMU with the native librados CEPH driver could >> >> > >>>be added as a second patch. >> >> > >> >> >> > >>Okay, that sounds reasonable. Supporting the QEMU librados driver is >> >> > >>definitely something we want to target, though, and seems to be route that >> >> > >>more users are interested in. Is defining the XML syntax for a guest VM >> >> > >>something we can discuss now as well? >> >> > >> >> >> > >>(BTW this is biting NBD users too. Presumably the guest VM XML should >> >> > >>look similar? >> >> > > >> >> > >And also Sheepdog storage volumes. To define a syntax for all these we need >> >> > >to determine what configuration metadata is required at a per-VM level for >> >> > >each of them. Then try and decide how to represent that in the guest XML. 
>> >> > >It looks like at a VM level we'd need a hostname, port number and a volume >> >> > >name (or path). >> >> > >> >> > It looks like that's what Sheepdog needs from the patch that was >> >> > submitted earlier today. For RBD, we would want to allow multiple hosts, >> >> > and specify the pool and image name when the QEMU librados driver is >> >> > used, e.g.: >> >> > >> >> > <disk type="rbd" device="disk"> >> >> > <driver name="qemu" type="raw" /> >> >> > <source vdi="image_name" pool="pool_name"> >> >> > <host name="mon1.example.org" port="6000"> >> >> > <host name="mon2.example.org" port="6000"> >> >> > <host name="mon3.example.org" port="6000"> >> >> > </source> >> >> > <target dev="vda" bus="virtio" /> >> >> > </disk> >> >> > >> >> > Does this seem like a reasonable format for the VM XML? Any suggestions? >> >> >> >> I'm basically wondering whether we should be going for separate types for >> >> each of NBD, RBD & Sheepdog, as per your proposal & the sheepdog one earlier >> >> today. Or type to merge them into one type 'nework' which covers any kind of >> >> network block device, and list a protocol on the source element, eg >> >> >> >> <disk type="network" device="disk"> >> >> <driver name="qemu" type="raw" /> >> >> <source protocol='rbd|sheepdog|nbd' name="...some image identifier..."> >> >> <host name="mon1.example.org" port="6000"> >> >> <host name="mon2.example.org" port="6000"> >> >> <host name="mon3.example.org" port="6000"> >> >> </source> >> >> <target dev="vda" bus="virtio" /> >> >> </disk> >> > >> > That would work... >> > >> > One thing that I think should be considered, though, is that both RBD and >> > NBD can be used for non-qemu instances by mapping a regular block device >> > via the host's kernel. And in that case, there's some sysfs-fu (at least >> > in the rbd case; I'm not familiar with how the nbd client works) required >> > to set up/tear down the block device. 
>> >> An nbd block device is attached using the nbd-client(1) userspace tool: >> $ nbd-client my-server 1234 /dev/nbd0 # <host> <port> <nbd-device> >> >> That program will open the socket, grab /dev/nbd0, and poke it with a >> few ioctls so the kernel has the socket and can take it from there. > > We don't need to worry about this for libvirt/QEMU. Since QEMU has native > NBD client support there's no need to do anything with nbd client tools > to setup the device for use with a VM. I agree it's easier to use the built-in NBD support. Just wanted to provide the background on how NBD client works when using the kernel implementation. Stefan -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2010-11-19 12:55 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-02  3:52 rbd storage pool support for libvirt Sage Weil
2010-11-02 19:47 ` Wido den Hollander
2010-11-02 19:50   ` Wido den Hollander
2010-11-03 13:59 ` [libvirt] " Daniel P. Berrange
2010-11-05 23:33   ` Sage Weil
2010-11-08 13:16     ` Daniel P. Berrange
2010-11-18  0:33       ` Josh Durgin
2010-11-18  2:04         ` Josh Durgin
2010-11-18 10:38           ` Daniel P. Berrange
2010-11-18 10:42         ` Daniel P. Berrange
2010-11-18 17:13           ` Sage Weil
2010-11-19  9:27             ` Stefan Hajnoczi
2010-11-19  9:50               ` Daniel P. Berrange
2010-11-19 12:55                 ` Stefan Hajnoczi