* RBD performance with many childs and snapshots
From: Wido den Hollander @ 2015-12-21 19:06 UTC (permalink / raw)
  To: ceph-devel

Hi,

While implementing the buildvolfrom method in libvirt for RBD, I got
stuck on one point.

$ virsh vol-clone --pool myrbdpool image1 image2

This would clone image1 to a new RBD image called 'image2'.

The code I've written now does:

1. Create a snapshot called image1@libvirt-<epochtimestamp>
2. Protect the snapshot
3. Clone the snapshot to 'image2'
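
For reference, the three steps above correspond roughly to the following
rbd CLI invocations. This is only a sketch that builds the argument lists
without running them; the helper name clone_commands and the pool name are
illustrative assumptions, and a libvirt backend would presumably go through
librbd rather than the CLI.

```python
import time


def clone_commands(pool, parent, child, timestamp=None):
    """Return the rbd CLI argument lists for snapshot, protect, clone."""
    # Snapshot name follows the 'libvirt-<epochtimestamp>' pattern from the mail.
    ts = int(time.time()) if timestamp is None else int(timestamp)
    snap_spec = f"{pool}/{parent}@libvirt-{ts}"
    return [
        ["rbd", "snap", "create", snap_spec],            # 1. create the snapshot
        ["rbd", "snap", "protect", snap_spec],           # 2. protect it before cloning
        ["rbd", "clone", snap_spec, f"{pool}/{child}"],  # 3. clone to the new image
    ]


# Print the commands for the example in this mail (nothing is executed).
for argv in clone_commands("libvirt", "image1", "image2", timestamp=1450724650):
    print(" ".join(argv))
```

The commands are returned as lists so that a caller could hand them to
subprocess.run() without shell quoting issues.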

wido@wido-desktop:~/repos/libvirt$ ./tools/virsh vol-clone --pool
rbdpool image1 image2
Vol image2 cloned from image1

wido@wido-desktop:~/repos/libvirt$

root@alpha:~# rbd -p libvirt info image2
rbd image 'image2':
	size 10240 MB in 2560 objects
	order 22 (4096 kB objects)
	block_name_prefix: rbd_data.1976451ead36b
	format: 2
	features: layering, striping
	flags:
	parent: libvirt/image1@libvirt-1450724650
	overlap: 10240 MB
	stripe unit: 4096 kB
	stripe count: 1
root@alpha:~#
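
As a quick sanity check on the numbers in the output above: "order 22"
means each object is 2^22 bytes (4096 kB), so a 10240 MB image is split
into 2560 objects. A small sketch of that arithmetic (the function name
object_count is just for illustration):

```python
def object_count(size_bytes, order):
    """Number of RADOS objects backing an image of the given size."""
    object_size = 1 << order                 # order 22 -> 4194304 bytes
    return -(-size_bytes // object_size)     # ceiling division


size_bytes = 10240 * 1024 * 1024             # 10240 MB, as reported by rbd info
print((1 << 22) // 1024)                     # object size in kB
print(object_count(size_bytes, 22))          # number of objects
```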

But this could potentially lead to a lot of snapshots with children on
'image1'.

image1 itself will probably never change, but I'm wondering about the
negative performance impact this might have on an OSD.

I'd rather not hardcode a snapshot name like 'libvirt-parent-snapshot'
into libvirt. There is however no way to pass something like a snapshot
name in libvirt when cloning.

Any bright suggestions? Or is it fine to create so many snapshots?

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on


* Re: RBD performance with many childs and snapshots
From: Josh Durgin @ 2015-12-21 22:51 UTC (permalink / raw)
  To: Wido den Hollander, ceph-devel

On 12/21/2015 11:06 AM, Wido den Hollander wrote:
> But this could potentially lead to a lot of snapshots with children on
> 'image1'.
>
> image1 itself will probably never change, but I'm wondering about the
> negative performance impact this might have on a OSD.

Creating them isn't so bad: snapshots that don't change don't have
much effect on the OSDs. Deleting them is what's expensive, since the
OSDs need to scan the objects to see which ones are part of the
snapshot and can be deleted. If you have too many snapshots created and
deleted, it can affect cluster load, so I'd rather avoid always
creating a snapshot.

> I'd rather not hardcode a snapshot name like 'libvirt-parent-snapshot'
> into libvirt. There is however no way to pass something like a snapshot
> name in libvirt when cloning.
>
> Any bright suggestions? Or is it fine to create so many snapshots?

You could have canonical names for the libvirt snapshots like you 
suggest, 'libvirt-<timestamp>', and check via rbd_diff_iterate2()
whether the parent image changed since the last snapshot. That's a bit
slower than plain cloning, but with object map + fast diff it's fast
again, since it doesn't need to scan all the objects anymore.
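
The check described above could be sketched like this. It is not taken
from the libvirt patch: the helper names are hypothetical, and `image` is
assumed to be an object exposing size() and
diff_iterate(offset, length, from_snapshot, callback) in the style of the
Python rbd bindings, whereas the thread itself refers to the C call
rbd_diff_iterate2().

```python
import re

SNAP_RE = re.compile(r"^libvirt-(\d+)$")


def latest_libvirt_snapshot(snap_names):
    """Return the newest 'libvirt-<timestamp>' name, or None if there is none."""
    stamped = []
    for name in snap_names:
        m = SNAP_RE.match(name)
        if m:
            stamped.append((int(m.group(1)), name))
    return max(stamped)[1] if stamped else None


def changed_since(image, snap_name):
    """True if any extent of the image differs from snapshot snap_name."""
    extents = []

    def on_extent(offset, length, exists):
        # Invoked once per changed extent; any call means the image diverged.
        extents.append((offset, length, exists))

    image.diff_iterate(0, image.size(), snap_name, on_extent)
    return bool(extents)


def snapshot_to_clone_from(image, snap_names, now):
    """Reuse the latest libvirt snapshot if unchanged, else name a new one.

    Returns (snapshot_name, must_create).
    """
    latest = latest_libvirt_snapshot(snap_names)
    if latest is not None and not changed_since(image, latest):
        return latest, False
    # Caller still has to create and protect this snapshot before cloning.
    return f"libvirt-{int(now)}", True
```

With this approach a parent that never changes ends up with a single
snapshot, no matter how many clones are made from it.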

I think libvirt would need to expand its API a bit to be able to really
use it effectively to manage RBD. Hiding the snapshots becomes
cumbersome if the application wants to use them too. If libvirt's
current model of clones lets parents be deleted before children,
that may be a hassle to hide too...

Josh


* Re: RBD performance with many childs and snapshots
From: Wido den Hollander @ 2015-12-22 13:34 UTC (permalink / raw)
  To: Josh Durgin, ceph-devel



On 21-12-15 23:51, Josh Durgin wrote:
> You could have canonical names for the libvirt snapshots like you
> suggest, 'libvirt-<timestamp>', and check via rbd_diff_iterate2()
> whether the parent image changed since the last snapshot. That's a bit
> slower than plain cloning, but with object map + fast diff it's fast
> again, since it doesn't need to scan all the objects anymore.
> 

I'll give that a try, seems like a good suggestion!

I'll have to use rbd_diff_iterate() though, since rbd_diff_iterate2() is
post-Hammer and will not be available on all systems.

> I think libvirt would need to expand its api a bit to be able to really
> use it effectively to manage rbd. Hiding the snapshots becomes
> cumbersome if the application wants to use them too. If libvirt's
> current model of clones lets parents be deleted before children,
> that may be a hassle to hide too...
> 

Yes, I would love to see:

- vol-snap-list
- vol-snap-create
- vol-snap-delete
- vol-snap-revert

And then:

- vol-clone --snapshot <mysnap> --pool <mypool> image1 image2

But this would need some more work inside libvirt. Would be very nice
though.

At CloudStack we want to do as much as possible using libvirt: the more
features it has there, the less we have to do in Java code :)

Wido

> Josh


* Re: RBD performance with many childs and snapshots
From: Wido den Hollander @ 2015-12-22 21:55 UTC (permalink / raw)
  To: Josh Durgin, ceph-devel

On 12/21/2015 11:51 PM, Josh Durgin wrote:
> You could have canonical names for the libvirt snapshots like you
> suggest, 'libvirt-<timestamp>', and check via rbd_diff_iterate2()
> whether the parent image changed since the last snapshot. That's a bit
> slower than plain cloning, but with object map + fast diff it's fast
> again, since it doesn't need to scan all the objects anymore.
> 
> I think libvirt would need to expand its api a bit to be able to really
> use it effectively to manage rbd. Hiding the snapshots becomes
> cumbersome if the application wants to use them too. If libvirt's
> current model of clones lets parents be deleted before children,
> that may be a hassle to hide too...
> 

I gave it a shot. Callback functions are a bit new to me, but here is
my attempt:
https://github.com/wido/libvirt/commit/756dca8023027616f53c39fa73c52a6d8f86a223

Could you take a look?

> Josh




* Re: RBD performance with many childs and snapshots
From: Josh Durgin @ 2015-12-23  2:03 UTC (permalink / raw)
  To: Wido den Hollander, ceph-devel

On 12/22/2015 05:34 AM, Wido den Hollander wrote:
> Yes, I would love to see:
>
> - vol-snap-list
> - vol-snap-create
> - vol-snap-delete
> - vol-snap-revert
>
> And then:
>
> - vol-clone --snapshot <mysnap> --pool <mypool> image1 image2
>
> But this would need some more work inside libvirt. Would be very nice
> though.

Yeah, those would be nice.

> At CloudStack we want to do as much as possible using libvirt, the more
> features it has there, the less we have to do in Java code :)

Dan Berrange has talked about using libvirt storage pools for managing
RBD and other storage from OpenStack Nova too, for the same reason. I'm
not sure if there are any current plans for that, but you may want to
ask him about it on the libvirt list.

Josh


* Re: RBD performance with many childs and snapshots
From: Josh Durgin @ 2015-12-23  2:04 UTC (permalink / raw)
  To: Wido den Hollander, ceph-devel

On 12/22/2015 01:55 PM, Wido den Hollander wrote:
> I gave it a shot. callback functions are a bit new to me, but I gave it
> a try:
> https://github.com/wido/libvirt/commit/756dca8023027616f53c39fa73c52a6d8f86a223
>
> Could you take a look?

Left some comments on the commits. Looks good in general.

Josh

