* RBD image "lightweight snapshots"
From: Piotr Dałek @ 2018-08-09 13:01 UTC (permalink / raw)
  To: ceph-devel, ceph-users

Hello,

At OVH we're heavily utilizing snapshots for our backup system. We think 
there's an interesting optimization opportunity regarding snapshots that 
I'd like to discuss here.

The idea is to introduce the concept of a "lightweight" snapshot - such a 
snapshot would not contain data, only information about what has changed 
on the image since it was created (so basically only the object map part 
of a snapshot).

Our backup solution (which seems to be a pretty common practice) is as follows:

1. Create a snapshot of the image we want to back up.
2. If there's a previous backup snapshot, export a diff and apply it to the 
backup image.
3. If there's no older snapshot, just do a full backup of the image.
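For illustration, the loop above can be modeled with a toy block map (a
dict standing in for an image; names and the diff format are illustrative,
not RBD's - with the real rbd CLI this corresponds roughly to `rbd snap
create`, `rbd export-diff --from-snap`, and `rbd import-diff`):

```python
# Toy model of the backup loop: an image is a dict of block -> bytes,
# and a snapshot is a frozen point-in-time copy of that mapping
# (real RBD does COW per 4MB object instead of copying everything).

def take_snapshot(image):
    return dict(image)

def export_diff(prev_snap, curr_snap):
    # Blocks that changed (or appeared) between the two snapshots.
    return {b: data for b, data in curr_snap.items()
            if prev_snap.get(b) != data}

def backup_cycle(image, prev_snap, backup):
    snap = take_snapshot(image)                      # step 1
    if prev_snap is not None:
        backup.update(export_diff(prev_snap, snap))  # step 2
    else:
        backup.clear()
        backup.update(snap)                          # step 3
    return snap   # kept as the "previous backup snapshot"

image = {0: b"boot", 1: b"data-v1"}
backup = {}
snap1 = backup_cycle(image, None, backup)   # first run: full backup
image[1] = b"data-v2"                       # client writes land
image[2] = b"new"
snap2 = backup_cycle(image, snap1, backup)  # later run: incremental
```

(Block deletions are ignored here; a real diff format would also need to
record discarded extents.)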

This introduces one big issue: it forces a COW snapshot on the image, meaning 
that the original image's access latencies and consumed space increase. 
"Lightweight" snapshots would remove these inefficiencies - no COW 
performance or storage overhead.

At first glance, it seems like this could be implemented as an extension to 
the current RBD snapshot system, leaving out the machinery required for 
copy-on-write. In theory it could even coexist with regular snapshots. 
Removal of these "lightweight" snapshots would be instant (or near instant).

So what do others think about this?

-- 
Piotr Dałek
piotr.dalek@corp.ovh.com
https://www.ovhcloud.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


* Re: RBD image "lightweight snapshots"
From: Sage Weil @ 2018-08-09 13:15 UTC (permalink / raw)
  To: Piotr Dałek; +Cc: ceph-devel, ceph-users


On Thu, 9 Aug 2018, Piotr Dałek wrote:
> Hello,
> 
> At OVH we're heavily utilizing snapshots for our backup system. We think
> there's an interesting optimization opportunity regarding snapshots I'd like
> to discuss here.
> 
> The idea is to introduce a concept of a "lightweight" snapshots - such
> snapshot would not contain data but only the information about what has
> changed on the image since it was created (so basically only the object map
> part of snapshots).
> 
> Our backup solution (which seems to be a pretty common practice) is as
> follows:
> 
> 1. Create snapshot of the image we want to backup
> 2. If there's a previous backup snapshot, export diff and apply it on the
> backup image
> 3. If there's no older snapshot, just do a full backup of image
> 
> This introduces one big issue: it enforces COW snapshot on image, meaning that
> original image access latencies and consumed space increases. "Lightweight"
> snapshots would remove these inefficiencies - no COW performance and storage
> overhead.

The snapshot in 1 would be the lightweight one, you mean?  And you'd do the 
backup some (short) time later based on a diff with changed extents?

I'm pretty sure this will export a garbage image.  I mean, it will usually 
be non-garbage, but the result won't be crash consistent, and in some 
(many?) cases won't be usable.

Consider:

- take reference snapshot
- back up this image (assume for now it is perfect)
- write A to location 1
- take lightweight snapshot
- write B to location 1
- backup process copies location 1 (B) to target

That's the wrong data.  Maybe that change is harmless, but maybe location 
1 belongs to the filesystem journal, and you have some records that now 
reference location 10 that has an A-era value, or hasn't been written at 
all yet, and now your filesystem journal won't replay and you can't 
mount...
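To make the failure concrete, the sequence above can be replayed in a toy
model (hypothetical structures, not RBD code). The backup ends up with B,
while the snapshot-time value A was never preserved anywhere:

```python
# A "lightweight" snapshot records only WHICH blocks changed, not their
# contents, so the backup process has to read the live image - and can
# pick up writes made after the snapshot was taken.

image = {1: b"initial"}
reference = dict(image)       # reference snapshot, fully backed up
backup = dict(reference)

image[1] = b"A"               # write A to location 1
lightweight = {1}             # lightweight snapshot: changed blocks only
image[1] = b"B"               # write B to location 1, after the snapshot

for block in lightweight:     # backup copies from the *live* image
    backup[block] = image[block]
```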

sage
 
> At first glance, it seems like it could be implemented as extension to current
> RBD snapshot system, leaving out the machinery required for copy-on-write. In
> theory it could even co-exist with regular snapshots. Removal of these
> "lightweight" snapshots would be instant (or near instant).
> 
> So what do others think about this?
> 
> -- 
> Piotr Dałek
> piotr.dalek@corp.ovh.com
> https://www.ovhcloud.com




* Re: RBD image "lightweight snapshots"
From: Alex Elder @ 2018-08-09 14:39 UTC (permalink / raw)
  To: Sage Weil, Piotr Dałek; +Cc: ceph-devel, ceph-users

On 08/09/2018 08:15 AM, Sage Weil wrote:
> On Thu, 9 Aug 2018, Piotr Dałek wrote:
>> Hello,
>>
>> At OVH we're heavily utilizing snapshots for our backup system. We think
>> there's an interesting optimization opportunity regarding snapshots I'd like
>> to discuss here.
>>
>> The idea is to introduce a concept of a "lightweight" snapshots - such
>> snapshot would not contain data but only the information about what has
>> changed on the image since it was created (so basically only the object map
>> part of snapshots).
>>
>> Our backup solution (which seems to be a pretty common practice) is as
>> follows:
>>
>> 1. Create snapshot of the image we want to backup
>> 2. If there's a previous backup snapshot, export diff and apply it on the
>> backup image
>> 3. If there's no older snapshot, just do a full backup of image
>>
>> This introduces one big issue: it enforces COW snapshot on image, meaning that
>> original image access latencies and consumed space increases. "Lightweight"
>> snapshots would remove these inefficiencies - no COW performance and storage
>> overhead.
> 
> The snapshot in 1 would be lightweight you mean?  And you'd do the backup 
> some (short) time later based on a diff with changed extents?
> 
> I'm pretty sure this will export a garbage image.  I mean, it will usually 
> be non-garbage, but the result won't be crash consistent, and in some 
> (many?) cases won't be usable.
> 
> Consider:
> 
> - take reference snapshot
> - back up this image (assume for now it is perfect)
> - write A to location 1
> - take lightweight snapshot
> - write B to location 1
> - backup process copies location 1 (B) to target
> 
> That's the wrong data.  Maybe that change is harmless, but maybe location 
> 1 belongs to the filesystem journal, and you have some records that now 
> reference location 10 that has an A-era value, or hasn't been written at 
> all yet, and now your file system journal won't replay and you can't 
> mount...

Forgive me if I'm misunderstanding; this just caught my attention.

The goal here seems to be to reduce the storage needed to do backups of an
RBD image, and I think there's something to that.

This seems to be no different from any other incremental backup scheme.  It's
layered, and it's ultimately based on an "epoch" complete backup image (what
you call the reference snapshot).

If you're using that model, it would be useful to be able to back up only
the data present in a second snapshot that's the child of the reference
snapshot.  (And so on, with snapshot 2 building on snapshot 1, etc.)
RBD internally *knows* this information, but I'm not sure how (or whether)
it's formally exposed.

Restoring an image in this scheme requires restoring the epoch, then the
incrementals, in order.  The cost to restore is higher, but the cost
of incremental backups is significantly smaller than doing full ones.
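That restore path can be sketched with a toy block-map model (illustrative,
not RBD's actual diff format); the last line shows why the replay order
matters:

```python
# Restore = replay the epoch (full backup), then each incremental diff
# oldest-first; a later diff must overwrite an earlier one.

def restore(epoch, diffs):
    image = dict(epoch)
    for diff in diffs:
        image.update(diff)
    return image

epoch = {0: b"base", 1: b"base"}
diff1 = {1: b"day1"}                  # changes between snap0 and snap1
diff2 = {1: b"day2", 2: b"day2-new"}  # changes between snap1 and snap2

good = restore(epoch, [diff1, diff2])
bad = restore(epoch, [diff2, diff1])  # wrong order resurrects stale data
```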

I'm not sure how the "lightweight" snapshot would work though.  Without
references to objects there's no guarantee the data taken at the time of
the snapshot still exists when you want to back it up.

					-Alex

> 
> sage
>  
>> At first glance, it seems like it could be implemented as extension to current
>> RBD snapshot system, leaving out the machinery required for copy-on-write. In
>> theory it could even co-exist with regular snapshots. Removal of these
>> "lightweight" snapshots would be instant (or near instant).
>>
>> So what do others think about this?
>>
>> -- 
>> Piotr Dałek
>> piotr.dalek@corp.ovh.com
>> https://www.ovhcloud.com



* Re: RBD image "lightweight snapshots"
From: Paweł Sadowski @ 2018-08-10 11:53 UTC (permalink / raw)
  To: Alex Elder, Sage Weil, Piotr Dałek; +Cc: ceph-devel, ceph-users

On 08/09/2018 04:39 PM, Alex Elder wrote:
> On 08/09/2018 08:15 AM, Sage Weil wrote:
>> On Thu, 9 Aug 2018, Piotr Dałek wrote:
>>> Hello,
>>>
>>> At OVH we're heavily utilizing snapshots for our backup system. We think
>>> there's an interesting optimization opportunity regarding snapshots I'd like
>>> to discuss here.
>>>
>>> The idea is to introduce a concept of a "lightweight" snapshots - such
>>> snapshot would not contain data but only the information about what has
>>> changed on the image since it was created (so basically only the object map
>>> part of snapshots).
>>>
>>> Our backup solution (which seems to be a pretty common practice) is as
>>> follows:
>>>
>>> 1. Create snapshot of the image we want to backup
>>> 2. If there's a previous backup snapshot, export diff and apply it on the
>>> backup image
>>> 3. If there's no older snapshot, just do a full backup of image
>>>
>>> This introduces one big issue: it enforces COW snapshot on image, meaning that
>>> original image access latencies and consumed space increases. "Lightweight"
>>> snapshots would remove these inefficiencies - no COW performance and storage
>>> overhead.
>>
>> The snapshot in 1 would be lightweight you mean?  And you'd do the backup 
>> some (short) time later based on a diff with changed extents?
>>
>> I'm pretty sure this will export a garbage image.  I mean, it will usually 
>> be non-garbage, but the result won't be crash consistent, and in some 
>> (many?) cases won't be usable.
>>
>> Consider:
>>
>> - take reference snapshot
>> - back up this image (assume for now it is perfect)
>> - write A to location 1
>> - take lightweight snapshot
>> - write B to location 1
>> - backup process copies location 1 (B) to target

The way I (we) see it working is a bit different:
 - take snapshot (1)
 - data writes might occur; that's ok - CoW kicks in here to preserve data
 - convert snapshot (1) to a lightweight one (don't create a new one):
   * from now on, just remember which blocks have been modified instead
     of doing CoW
   * you can get rid of the previously CoW'd data blocks (they've been
     exported already)
 - more writes
 - take snapshot (2)
 - export diff - only blocks modified since snap (1)
 - convert snapshot (2) to a lightweight one
 - ...


That way I don't see room for data corruption. Of course this has
some drawbacks - you can't rollback/export data from such a lightweight
snapshot anymore. But on the other hand we are reducing the need for CoW -
and that's the main goal of this idea. Instead of doing CoW ~all the
time, it's needed only while exporting the image/modified blocks.
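The flow can be walked through in a toy model (names and block granularity
are illustrative; real RBD does COW per 4MB object): the snapshot does COW
only until its data is exported, then it degrades into a bare
changed-block record.

```python
class Image:
    def __init__(self, blocks):
        self.blocks = dict(blocks)   # live data: block number -> bytes
        self.cow = None              # pre-write data, while COW is active
        self.dirty = set()           # blocks written since last snapshot

    def take_snapshot(self):
        """Begin COW; return the blocks changed since the previous snap."""
        self.cow = {}
        changed, self.dirty = self.dirty, set()
        return changed

    def write(self, block, data):
        if self.cow is not None:                       # full snapshot: COW
            self.cow.setdefault(block, self.blocks.get(block))
        self.dirty.add(block)                          # always cheap
        self.blocks[block] = data

    def snapshot_view(self):
        """Point-in-time contents; only valid while COW is active."""
        return {**self.blocks, **self.cow}

    def convert_to_lightweight(self):
        self.cow = None              # free preserved blocks, stop COWing

img = Image({1: b"v0", 2: b"v0"})
img.take_snapshot()                    # snapshot (1)
img.write(1, b"v1")                    # COW preserves b"v0"
exported = img.snapshot_view()         # export data
img.convert_to_lightweight()           # keep only the dirty-block record
img.write(2, b"v1")                    # more writes, no COW cost now
changed = img.take_snapshot()          # snapshot (2)
view = img.snapshot_view()
diff = {b: view[b] for b in changed}   # export diff: blocks since snap (1)
backup = dict(exported)
backup.update(diff)
```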

>> That's the wrong data.  Maybe that change is harmless, but maybe location 
>> 1 belongs to the filesystem journal, and you have some records that now 
>> reference location 10 that has an A-era value, or hasn't been written at 
>> all yet, and now your file system journal won't replay and you can't 
>> mount...
> 
> Forgive me if I'm misunderstanding; this just caught my attention.
> 
> The goal here seems to be to reduce the storage needed to do backups of an
> RBD image, and I think there's something to that.

Storage reduction is only a side effect here. We want to get rid of CoW as
much as possible. For example - we are doing a snapshot every 24h - this
means that every 24h we start doing CoW from scratch on every
image. This has a big impact on cluster latency.

As for the storage need, with a 24h backup period we see a space usage
increase of about 5% on our clusters. But this clearly depends on client
traffic.

> This seems to be no different from any other incremental backup scheme.  It's
> layered, and it's ultimately based on an "epoch" complete backup image (what
> you call the reference snapshot).
> 
> If you're using that model, it would be useful to be able to back up only
> the data present in a second snapshot that's the child of the reference
> snapshot.  (And so on, with snapshot 2 building on snapshot 1, etc.)
> RBD internally *knows* this information, but I'm not sure how (or whether)
> it's formally exposed.
> 
> Restoring an image in this scheme requires restoring the epoch, then the
> incrementals, in order.  The cost to restore is higher, but the cost
> of incremental backups is significantly smaller than doing full ones.

It depends on how we store the exported data. We might just want to merge
all diffs into the base image right after export, to keep only a single
copy. But that is out of scope of the main topic here, IMHO.

> I'm not sure how the "lightweight" snapshot would work though.  Without
> references to objects there's no guarantee the data taken at the time of
> the snapshot still exists when you want to back it up.
> 
> 					-Alex
> 
>>
>> sage
>>  
>>> At first glance, it seems like it could be implemented as extension to current
>>> RBD snapshot system, leaving out the machinery required for copy-on-write. In
>>> theory it could even co-exist with regular snapshots. Removal of these
>>> "lightweight" snapshots would be instant (or near instant).
>>>
>>> So what do others think about this?
>>>
>>> -- 
>>> Piotr Dałek
>>> piotr.dalek@corp.ovh.com
>>> https://www.ovhcloud.com



* Re: RBD image "lightweight snapshots"
From: Sage Weil @ 2018-08-10 12:29 UTC (permalink / raw)
  To: Paweł Sadowski; +Cc: ceph-users, Alex Elder, ceph-devel


On Fri, 10 Aug 2018, Paweł Sadowski wrote:
> On 08/09/2018 04:39 PM, Alex Elder wrote:
> > On 08/09/2018 08:15 AM, Sage Weil wrote:
> >> On Thu, 9 Aug 2018, Piotr Dałek wrote:
> >>> Hello,
> >>>
> >>> At OVH we're heavily utilizing snapshots for our backup system. We think
> >>> there's an interesting optimization opportunity regarding snapshots I'd like
> >>> to discuss here.
> >>>
> >>> The idea is to introduce a concept of a "lightweight" snapshots - such
> >>> snapshot would not contain data but only the information about what has
> >>> changed on the image since it was created (so basically only the object map
> >>> part of snapshots).
> >>>
> >>> Our backup solution (which seems to be a pretty common practice) is as
> >>> follows:
> >>>
> >>> 1. Create snapshot of the image we want to backup
> >>> 2. If there's a previous backup snapshot, export diff and apply it on the
> >>> backup image
> >>> 3. If there's no older snapshot, just do a full backup of image
> >>>
> >>> This introduces one big issue: it enforces COW snapshot on image, meaning that
> >>> original image access latencies and consumed space increases. "Lightweight"
> >>> snapshots would remove these inefficiencies - no COW performance and storage
> >>> overhead.
> >>
> >> The snapshot in 1 would be lightweight you mean?  And you'd do the backup 
> >> some (short) time later based on a diff with changed extents?
> >>
> >> I'm pretty sure this will export a garbage image.  I mean, it will usually 
> >> be non-garbage, but the result won't be crash consistent, and in some 
> >> (many?) cases won't be usable.
> >>
> >> Consider:
> >>
> >> - take reference snapshot
> >> - back up this image (assume for now it is perfect)
> >> - write A to location 1
> >> - take lightweight snapshot
> >> - write B to location 1
> >> - backup process copies location 1 (B) to target
> 
> The way I (we) see it working is a bit different:
>  - take snapshot (1)
>  - data write might occur, it's ok - CoW kicks in here to preserve data
>  - export data
>  - convert snapshot (1) to a lightweight one (not create new):
>    * from now on just remember which blocks have been modified instead
>      of doing CoW
>    * you can get rid of previously CoW'd data blocks (they've been
>      exported already)
>  - more writes
>  - take snapshot (2)
>  - export diff - only blocks modified since snap (1)
>  - convert snapshot (2) to a lightweight one
>  - ...
> 
> 
> That way I don't see a place for data corruption. Of course this has
> some drawbacks - you can't rollback/export data from such lightweight
> snapshot anymore. But on the other hand we are reducing need for CoW -
> and that's the main goal with this idea. Instead of making CoW ~all the
> time it's needed only for the time of exporting image/modified blocks.

Ok, so this is a bit different.  I'm still a bit fuzzy on how the 
'lightweight' snapshot (1) will be implemented, but basically I think 
you just mean saving on its storage overhead while keeping enough metadata 
to make a fully consistent (2) for the purposes of the backup.

Maybe Jason has a better idea for how this would work in practice?  I 
haven't thought about the RBD snapshots in a while (not above the rados 
layer at least).

> >> That's the wrong data.  Maybe that change is harmless, but maybe location 
> >> 1 belongs to the filesystem journal, and you have some records that now 
> >> reference location 10 that has an A-era value, or hasn't been written at 
> >> all yet, and now your file system journal won't replay and you can't 
> >> mount...
> > 
> > Forgive me if I'm misunderstanding; this just caught my attention.
> > 
> > The goal here seems to be to reduce the storage needed to do backups of an
> > RBD image, and I think there's something to that.
> 
> Storage reduction is only side effect here. We want to get rid of CoW as
> much as possible. In an example - we are doing snapshot every 24h - this
> means that every 24h we will start doing CoW from the beginning on every
> image. This has big impact on a cluster latency
> 
> As for the storage need, with 24h backup period we see a space usage
> increase by about 5% on our clusters. But this clearly depends on client
> traffic.

One thing to keep in mind here is that the CoW/clone overhead goes *way* 
down with BlueStore.  On FileStore we are literally blocking to make 
a copy of each 4MB object.  With BlueStore there is a bit of metadata 
overhead for the tracking, but it does CoW at the lowest layer.

Lightweight snapshots might be a big win for FileStore, but that advantage 
will mostly evaporate once you repave the OSDs.
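A back-of-the-envelope sketch of that difference (numbers are illustrative;
BlueStore's effective COW granularity depends on allocation and blob sizes):

```python
# First small write to each snapshotted object: FileStore copies the
# whole 4MB object up front, while BlueStore clones metadata and COWs
# roughly only the overwritten extent at the allocator level.

OBJECT_SIZE = 4 * 2**20   # 4MB RBD object
WRITE_SIZE = 4 * 2**10    # a 4KB client write

def filestore_cow_bytes(objects_touched):
    return objects_touched * OBJECT_SIZE   # blocking full-object copy

def bluestore_cow_bytes(objects_touched):
    return objects_touched * WRITE_SIZE    # extent-level COW

fs = filestore_cow_bytes(100)   # 100 snapshotted objects, one write each
bs = bluestore_cow_bytes(100)
```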

sage


> > This seems to be no different from any other incremental backup scheme.  It's
> > layered, and it's ultimately based on an "epoch" complete backup image (what
> > you call the reference snapshot).
> > 
> > If you're using that model, it would be useful to be able to back up only
> > the data present in a second snapshot that's the child of the reference
> > snapshot.  (And so on, with snapshot 2 building on snapshot 1, etc.)
> > RBD internally *knows* this information, but I'm not sure how (or whether)
> > it's formally exposed.
> > 
> > Restoring an image in this scheme requires restoring the epoch, then the
> > incrementals, in order.  The cost to restore is higher, but the cost
> > of incremental backups is significantly smaller than doing full ones.
> 
> It depends how we will store exported data. We might just want to merge
> all diffs into base image right after export to keep only single copy.
> But that is out of scope of main topic here, IMHO.
> 
> > I'm not sure how the "lightweight" snapshot would work though.  Without
> > references to objects there's no guarantee the data taken at the time of
> > the snapshot still exists when you want to back it up.
> > 
> > 					-Alex
> > 
> >>
> >> sage
> >>  
> >>> At first glance, it seems like it could be implemented as extension to current
> >>> RBD snapshot system, leaving out the machinery required for copy-on-write. In
> >>> theory it could even co-exist with regular snapshots. Removal of these
> >>> "lightweight" snapshots would be instant (or near instant).
> >>>
> >>> So what do others think about this?
> >>>
> >>> -- 
> >>> Piotr Dałek
> >>> piotr.dalek@corp.ovh.com
> >>> https://www.ovhcloud.com




* Re: RBD image "lightweight snapshots"
From: Gregory Farnum @ 2018-08-10 16:24 UTC (permalink / raw)
  To: Paweł Sadowski; +Cc: Alex Elder, ceph-users, ceph-devel

On Fri, Aug 10, 2018 at 4:53 AM, Paweł Sadowski <ceph@sadziu.pl> wrote:
> On 08/09/2018 04:39 PM, Alex Elder wrote:
>> On 08/09/2018 08:15 AM, Sage Weil wrote:
>>> On Thu, 9 Aug 2018, Piotr Dałek wrote:
>>>> Hello,
>>>>
>>>> At OVH we're heavily utilizing snapshots for our backup system. We think
>>>> there's an interesting optimization opportunity regarding snapshots I'd like
>>>> to discuss here.
>>>>
>>>> The idea is to introduce a concept of a "lightweight" snapshots - such
>>>> snapshot would not contain data but only the information about what has
>>>> changed on the image since it was created (so basically only the object map
>>>> part of snapshots).
>>>>
>>>> Our backup solution (which seems to be a pretty common practice) is as
>>>> follows:
>>>>
>>>> 1. Create snapshot of the image we want to backup
>>>> 2. If there's a previous backup snapshot, export diff and apply it on the
>>>> backup image
>>>> 3. If there's no older snapshot, just do a full backup of image
>>>>
>>>> This introduces one big issue: it enforces COW snapshot on image, meaning that
>>>> original image access latencies and consumed space increases. "Lightweight"
>>>> snapshots would remove these inefficiencies - no COW performance and storage
>>>> overhead.
>>>
>>> The snapshot in 1 would be lightweight you mean?  And you'd do the backup
>>> some (short) time later based on a diff with changed extents?
>>>
>>> I'm pretty sure this will export a garbage image.  I mean, it will usually
>>> be non-garbage, but the result won't be crash consistent, and in some
>>> (many?) cases won't be usable.
>>>
>>> Consider:
>>>
>>> - take reference snapshot
>>> - back up this image (assume for now it is perfect)
>>> - write A to location 1
>>> - take lightweight snapshot
>>> - write B to location 1
>>> - backup process copies location 1 (B) to target
>
> The way I (we) see it working is a bit different:
>  - take snapshot (1)
>  - data write might occur, it's ok - CoW kicks in here to preserve data
>  - export data
>  - convert snapshot (1) to a lightweight one (not create new):
>    * from now on just remember which blocks have been modified instead
>      of doing CoW
>    * you can get rid of previously CoW'd data blocks (they've been
>      exported already)
>  - more writes
>  - take snapshot (2)
>  - export diff - only blocks modified since snap (1)
>  - convert snapshot (2) to a lightweight one
>  - ...
>
>
> That way I don't see a place for data corruption. Of course this has
> some drawbacks - you can't rollback/export data from such lightweight
> snapshot anymore. But on the other hand we are reducing need for CoW -
> and that's the main goal with this idea. Instead of making CoW ~all the
> time it's needed only for the time of exporting image/modified blocks.

What's the advantage of remembering the blocks changed for a
"lightweight snapshot" once the actual data diff is no longer there?
Is there a meaningful difference between this and just immediately
deleting a snapshot after doing the export?
-Greg


* Re: RBD image "lightweight snapshots"
From: Paweł Sadowski @ 2018-08-11  5:56 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Alex Elder, ceph-users, ceph-devel

On 08/10/2018 06:24 PM, Gregory Farnum wrote:
> On Fri, Aug 10, 2018 at 4:53 AM, Paweł Sadowski <ceph@sadziu.pl> wrote:
>> On 08/09/2018 04:39 PM, Alex Elder wrote:
>>> On 08/09/2018 08:15 AM, Sage Weil wrote:
>>>> On Thu, 9 Aug 2018, Piotr Dałek wrote:
>>>>> Hello,
>>>>>
>>>>> At OVH we're heavily utilizing snapshots for our backup system. We think
>>>>> there's an interesting optimization opportunity regarding snapshots I'd like
>>>>> to discuss here.
>>>>>
>>>>> The idea is to introduce a concept of a "lightweight" snapshots - such
>>>>> snapshot would not contain data but only the information about what has
>>>>> changed on the image since it was created (so basically only the object map
>>>>> part of snapshots).
>>>>>
>>>>> Our backup solution (which seems to be a pretty common practice) is as
>>>>> follows:
>>>>>
>>>>> 1. Create snapshot of the image we want to backup
>>>>> 2. If there's a previous backup snapshot, export diff and apply it on the
>>>>> backup image
>>>>> 3. If there's no older snapshot, just do a full backup of image
>>>>>
>>>>> This introduces one big issue: it enforces COW snapshot on image, meaning that
>>>>> original image access latencies and consumed space increases. "Lightweight"
>>>>> snapshots would remove these inefficiencies - no COW performance and storage
>>>>> overhead.
>>>> The snapshot in 1 would be lightweight you mean?  And you'd do the backup
>>>> some (short) time later based on a diff with changed extents?
>>>>
>>>> I'm pretty sure this will export a garbage image.  I mean, it will usually
>>>> be non-garbage, but the result won't be crash consistent, and in some
>>>> (many?) cases won't be usable.
>>>>
>>>> Consider:
>>>>
>>>> - take reference snapshot
>>>> - back up this image (assume for now it is perfect)
>>>> - write A to location 1
>>>> - take lightweight snapshot
>>>> - write B to location 1
>>>> - backup process copies location 1 (B) to target
>> The way I (we) see it working is a bit different:
>>   - take snapshot (1)
>>   - data write might occur, it's ok - CoW kicks in here to preserve data
>>   - export data
>>   - convert snapshot (1) to a lightweight one (not create new):
>>     * from now on just remember which blocks have been modified instead
>>       of doing CoW
>>     * you can get rid of previously CoW'd data blocks (they've been
>>       exported already)
>>   - more writes
>>   - take snapshot (2)
>>   - export diff - only blocks modified since snap (1)
>>   - convert snapshot (2) to a lightweight one
>>   - ...
>>
>>
>> That way I don't see a place for data corruption. Of course this has
>> some drawbacks - you can't rollback/export data from such lightweight
>> snapshot anymore. But on the other hand we are reducing need for CoW -
>> and that's the main goal with this idea. Instead of making CoW ~all the
>> time it's needed only for the time of exporting image/modified blocks.
> What's the advantage of remembering the blocks changed for a
> "lightweight snapshot" once the actual data diff is no longer there?
> Is there a meaningful difference between this and just immediately
> deleting a snapshot after doing the export?
> -Greg

The advantage is that when I need to export a diff I know which blocks
changed, without checking (reading) the others, so I can just export them
for backup. If I delete the snapshot after export, next time I'll have to
read the whole image again - there's no possibility of doing a
differential backup.

But as Sage wrote, we are doing this on FileStore. I don't know how
BlueStore works with snapshots (are whole 4MB chunks copied, or only the
area of the current write?), so performance might be much better - we need
to test it.

Our main goal with this idea is to improve performance in the case where
all images have at least one snapshot taken every *backup period* (24h or
lower).
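The difference can be sketched like this (a toy model; `changed` plays the
role of the lightweight snapshot's changed-block record):

```python
# With a changed-block record, the next differential export reads only
# the recorded blocks; without it (snapshot deleted right after export),
# the whole image has to be read and compared again.

image = {b: str(b).encode() for b in range(1000)}   # 1000-block image
changed = set()                                     # lightweight record

def write(block, data):
    image[block] = data
    changed.add(block)

write(7, b"seven")
write(42, b"forty-two")

diff = {b: image[b] for b in changed}    # differential export
blocks_read_with_record = len(changed)
blocks_read_without_record = len(image)  # full rescan otherwise
```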

-- 
PS



* Re: RBD image "lightweight snapshots"
From: Bartosz Rabiega @ 2018-08-13 10:22 UTC (permalink / raw)
  To: Paweł Sadowski, Gregory Farnum; +Cc: Alex Elder, ceph-users, ceph-devel



On 08/11/2018 07:56 AM, Paweł Sadowski wrote:
> On 08/10/2018 06:24 PM, Gregory Farnum wrote:
>> On Fri, Aug 10, 2018 at 4:53 AM, Paweł Sadowsk <ceph@sadziu.pl> wrote:
>>> On 08/09/2018 04:39 PM, Alex Elder wrote:
>>>> On 08/09/2018 08:15 AM, Sage Weil wrote:
>>>>> On Thu, 9 Aug 2018, Piotr Dałek wrote:
>>>>>> Hello,
>>>>>>
>>>>>> At OVH we're heavily utilizing snapshots for our backup system. 
>>>>>> We think
>>>>>> there's an interesting optimization opportunity regarding 
>>>>>> snapshots I'd like
>>>>>> to discuss here.
>>>>>>
>>>>>> The idea is to introduce a concept of a "lightweight" snapshots - 
>>>>>> such
>>>>>> snapshot would not contain data but only the information about 
>>>>>> what has
>>>>>> changed on the image since it was created (so basically only the 
>>>>>> object map
>>>>>> part of snapshots).
>>>>>>
>>>>>> Our backup solution (which seems to be a pretty common practice) 
>>>>>> is as
>>>>>> follows:
>>>>>>
>>>>>> 1. Create snapshot of the image we want to backup
>>>>>> 2. If there's a previous backup snapshot, export diff and apply 
>>>>>> it on the
>>>>>> backup image
>>>>>> 3. If there's no older snapshot, just do a full backup of image
>>>>>>
>>>>>> This introduces one big issue: it enforces COW snapshot on image, 
>>>>>> meaning that
>>>>>> original image access latencies and consumed space increases. 
>>>>>> "Lightweight"
>>>>>> snapshots would remove these inefficiencies - no COW performance 
>>>>>> and storage
>>>>>> overhead.
>>>>> The snapshot in 1 would be lightweight you mean?  And you'd do the 
>>>>> backup
>>>>> some (short) time later based on a diff with changed extents?
>>>>>
>>>>> I'm pretty sure this will export a garbage image.  I mean, it will 
>>>>> usually
>>>>> be non-garbage, but the result won't be crash consistent, and in some
>>>>> (many?) cases won't be usable.
>>>>>
>>>>> Consider:
>>>>>
>>>>> - take reference snapshot
>>>>> - back up this image (assume for now it is perfect)
>>>>> - write A to location 1
>>>>> - take lightweight snapshot
>>>>> - write B to location 1
>>>>> - backup process copie location 1 (B) to target
>>> The way I (we) see it working is a bit different:
>>>   - take snapshot (1)
>>>   - data write might occur, it's ok - CoW kicks in here to preserve 
>>> data
>>>   - export data
>>>   - convert snapshot (1) to a lightweight one (not create new):
>>>     * from now on just remember which blocks has been modified instead
>>>       of doing CoW
>>>     * you can get rid on previously CoW data blocks (they've been
>>>       exported already)
>>>   - more writes
>>>   - take snapshot (2)
>>>   - export diff - only blocks modified since snap (1)
>>>   - convert snapshot (2) to a lightweight one
>>>   - ...
>>>
>>>
>>> That way I don't see a place for data corruption. Of course this has
>>> some drawbacks - you can't rollback/export data from such lightweight
>>> snapshot anymore. But on the other hand we are reducing need for CoW -
>>> and that's the main goal with this idea. Instead of making CoW ~all the
>>> time it's needed only for the time of exporting image/modified blocks.
>> What's the advantage of remembering the blocks changed for a
>> "lightweight snapshot" once the actual data diff is no longer there?
>> Is there a meaningful difference between this and just immediately
>> deleting a snapshot after doing the export?
>> -Greg
>
> Advantage is that when I need to export diff I know which blocks changed,
> without checking (reading) others so I can just export them for backup.
> If i delete snapshot after export, next time I'll have to read whole 
> image
> again - no possibility to do differential backup.
>
> But as Sage wrote, we are doing this on Filestore. I don't know how 
> Bluestore
> works with snapshots (are whole 4MB chunks copied or only area of 
> current write)
> so performance might be much better - need to test it.
>
> Our main goal with this idea is to improve performance in case where 
> all images
> have at least one snapshot taken every *backup period* (24h or lower).
>

The actual advantage lies in keeping COW to a minimum.

Assume you want to do differential backups every 24h.

With normal snapshots:
1. Create snapshot A, do full image export, takes 3h
2. Typical client IO, all writes are COW for 24h
3. After 24h, create snapshot B and do an export-diff (A -> B), takes 0.5h
4. Remove snapshot A, as it's no longer needed
5. Typical client IO, all writes are COW for 24h
6. After 24h, create snapshot C and do an export-diff (B -> C), takes 0.5h
7. Remove snapshot B, as it's no longer needed
8. Typical client IO, all writes are COW for 24h

Simplified estimation:
COW applies to all writes the whole time a snapshot exists: 3 x 24h = 72h of COW

With 'lightweight' snapshots:
1. Create snapshot A, do full image export, takes 3h
2. Convert snapshot A to lightweight
3. Typical client IO, COW was active for 3h only
4. After 24h, create snapshot B and do an export-diff (A -> B), takes 0.5h
5. Remove snapshot A, as it's no longer needed
6. Convert snapshot B to lightweight
7. Typical client IO, COW was active for 0.5h only
8. After 24h, create snapshot C and do an export-diff (B -> C), takes 0.5h
9. Remove snapshot B, as it's no longer needed
10. Convert snapshot C to lightweight
11. Typical client IO, COW was active for 0.5h only

Simplified estimation:
COW is active only while a full-weight snapshot exists: 3h + 0.5h + 0.5h = 4h of COW

The longer this goes on, the bigger the advantage.
I'm not sure how smart COW is with BlueStore, but even so, for such a use case
'lightweight' snapshots would probably yield significant savings in COW
overhead (CPU + storage IO).
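The simplified estimate above can be checked with a few lines (durations in hours taken straight from the scenario; they are of course illustrative):

```python
# Reproduce the simplified COW-duration estimate over three 24h backup
# cycles. Durations (in hours) come from the scenario above.

full_export = 3.0   # initial full image export
diff_export = 0.5   # each subsequent export-diff
cycles = 2          # two differential backups after the full one

# Normal snapshots: a snapshot exists (and forces COW) the whole time.
normal_cow = 24.0 * 3  # three days with a snapshot always present

# Lightweight snapshots: COW is only active while an export is running.
lightweight_cow = full_export + cycles * diff_export

print(normal_cow, lightweight_cow)  # -> 72.0 4.0
```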

Bartosz Rabiega

* Re: RBD image "lightweight snapshots"
       [not found]               ` <alpine.DEB.2.11.1808101222580.15696-qHenpvqtifaMSRpgCs4c+g@public.gmane.org>
@ 2018-08-13 15:20                 ` Jason Dillaman
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Dillaman @ 2018-08-13 15:20 UTC (permalink / raw)
  To: Sage Weil
  Cc: pawel.sadowski-Bj5ZXqqQV65mR6Xm/wNWPw, ceph-users, ceph-devel,
	Alex Elder


On Fri, Aug 10, 2018 at 8:29 AM Sage Weil <sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org> wrote:

> On Fri, 10 Aug 2018, Paweł Sadowski wrote:
> > On 08/09/2018 04:39 PM, Alex Elder wrote:
> > > On 08/09/2018 08:15 AM, Sage Weil wrote:
> > >> On Thu, 9 Aug 2018, Piotr Dałek wrote:
> > >>> Hello,
> > >>>
> > >>> At OVH we're heavily utilizing snapshots for our backup system. We
> think
> > >>> there's an interesting optimization opportunity regarding snapshots
> I'd like
> > >>> to discuss here.
> > >>>
> > >>> The idea is to introduce a concept of a "lightweight" snapshots -
> such
> > >>> snapshot would not contain data but only the information about what
> has
> > >>> changed on the image since it was created (so basically only the
> object map
> > >>> part of snapshots).
> > >>>
> > >>> Our backup solution (which seems to be a pretty common practice) is
> as
> > >>> follows:
> > >>>
> > >>> 1. Create snapshot of the image we want to backup
> > >>> 2. If there's a previous backup snapshot, export diff and apply it
> on the
> > >>> backup image
> > >>> 3. If there's no older snapshot, just do a full backup of image
> > >>>
> > >>> This introduces one big issue: it enforces COW snapshot on image,
> meaning that
> > >>> original image access latencies and consumed space increases.
> "Lightweight"
> > >>> snapshots would remove these inefficiencies - no COW performance and
> storage
> > >>> overhead.
> > >>
> > >> The snapshot in 1 would be lightweight you mean?  And you'd do the
> backup
> > >> some (short) time later based on a diff with changed extents?
> > >>
> > >> I'm pretty sure this will export a garbage image.  I mean, it will
> usually
> > >> be non-garbage, but the result won't be crash consistent, and in some
> > >> (many?) cases won't be usable.
> > >>
> > >> Consider:
> > >>
> > >> - take reference snapshot
> > >> - back up this image (assume for now it is perfect)
> > >> - write A to location 1
> > >> - take lightweight snapshot
> > >> - write B to location 1
> > >> - backup process copie location 1 (B) to target
> >
> > The way I (we) see it working is a bit different:
> >  - take snapshot (1)
> >  - data write might occur, it's ok - CoW kicks in here to preserve data
> >  - export data
> >  - convert snapshot (1) to a lightweight one (not create new):
> >    * from now on just remember which blocks has been modified instead
> >      of doing CoW
> >    * you can get rid on previously CoW data blocks (they've been
> >      exported already)
> >  - more writes
> >  - take snapshot (2)
> >  - export diff - only blocks modified since snap (1)
> >  - convert snapshot (2) to a lightweight one
> >  - ...
> >
> >
> > That way I don't see a place for data corruption. Of course this has
> > some drawbacks - you can't rollback/export data from such lightweight
> > snapshot anymore. But on the other hand we are reducing need for CoW -
> > and that's the main goal with this idea. Instead of making CoW ~all the
> > time it's needed only for the time of exporting image/modified blocks.
>
> Ok, so this is a bit different.  I'm a bit fuzzy still on how the
> 'lightweight' (1) snapshot will be implemented, but basically I think
> you just mean saving on its storage overhead, but keeping enough metadata
> to make a fully consistent (2) for the purposes of the backup.
>
> Maybe Jason has a better idea for how this would work in practice?  I
> haven't thought about the RBD snapshots in a while (not above the rados
> layer at least).
>

The 'fast-diff' object map already tracks objects updated since a snapshot
was taken, so I think such an approach would just require deleting the
RADOS self-managed snapshot when converting to "lightweight" mode and then
using the existing "--whole-object" option for "rbd export-diff" to
utilize the 'fast-diff' object map for calculating deltas instead of
relying on RADOS snap diffs.

If you don't mind getting your hands dirty writing a little Python code to
invoke "remove_self_managed_snap" using the snap id provided by "rbd snap
ls", you should be able to test it out now. If it were to be incorporated
into RBD core, I think it would need some sanity checks to ensure it relies
on 'fast-diff' when handling a lightweight snapshot. However, I would also
be interested to know whether BlueStore alleviates a lot of your latency
concerns, given that it attempts to redirect-on-write by updating metadata
instead of copying data.
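A rough simulation of what "--whole-object" derives from the 'fast-diff' object map (pure Python; the state values are meant to mirror RBD's object-map semantics, but this is a sketch of the idea, not the librbd API):

```python
# Sketch: deriving a "--whole-object" delta from a fast-diff object map
# alone, with no RADOS snap diffs. State values follow RBD's object map
# (0 = nonexistent, 1 = exists/dirty, 2 = pending, 3 = exists-clean),
# but the functions here are illustrative, not librbd calls.

NONEXISTENT, EXISTS, PENDING, EXISTS_CLEAN = 0, 1, 2, 3
OBJ = 4 * 1024 * 1024  # default RBD object size

def take_snapshot(object_map):
    # On snapshot creation, dirty objects are demoted to "exists-clean":
    # from now on, EXISTS means "written since the snapshot".
    return [EXISTS_CLEAN if s == EXISTS else s for s in object_map]

def write_object(object_map, idx):
    # Any write marks the object dirty again.
    object_map[idx] = EXISTS

def whole_object_diff(object_map):
    # Changed extents = every whole object written since the snapshot.
    return [(i * OBJ, OBJ) for i, s in enumerate(object_map)
            if s in (EXISTS, PENDING)]

om = [EXISTS, EXISTS, NONEXISTENT, EXISTS]  # image with 4 objects
om = take_snapshot(om)                      # snapshot taken, map cleaned
write_object(om, 1)                         # client rewrites object 1
print(whole_object_diff(om))                # -> [(4194304, 4194304)]
```

Note the granularity trade-off this implies: the diff is per whole object, so a 1-byte write still exports the full 4 MiB object.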


> > >> That's the wrong data.  Maybe that change is harmless, but maybe
> location
> > >> 1 belongs to the filesystem journal, and you have some records that
> now
> > >> reference location 10 that as an A-era value, or haven't been written
> at
> > >> all yet, and now your file system journal won't replay and you can't
> > >> mount...
> > >
> > > Forgive me if I'm misunderstanding; this just caught my attention.
> > >
> > > The goal here seems to be to reduce the storage needed to do backups
> of an
> > > RBD image, and I think there's something to that.
> >
> > Storage reduction is only side effect here. We want to get rid of CoW as
> > much as possible. In an example - we are doing snapshot every 24h - this
> > means that every 24h we will start doing CoW from the beginning on every
> > image. This has big impact on a cluster latency
> >
> > As for the storage need, with 24h backup period we see a space usage
> > increase by about 5% on our clusters. But this clearly depends on client
> > traffic.
>
> One thing to keep in mind here is that the CoW/clone overheard goes *way*
> down with BlueStore.  On FileStore we are literally blocking to make
> a copy of each 4MB object.  With BlueStore there is a bit of metadata
> overhead for the tracking but it is doing CoW at the lowest layer.
>
> Lightweight snapshots might be a big win for FileStore but that advantage
> will mostly evaporate once you repave the OSDs.
>
> sage
>
>
> > > This seems to be no different from any other incremental backup
> scheme.  It's
> > > layered, and it's ultimately based on an "epoch" complete backup image
> (what
> > > you call the reference snapshot).
> > >
> > > If you're using that model, it would be useful to be able to back up
> only
> > > the data present in a second snapshot that's the child of the reference
> > > snapshot.  (And so on, with snapshot 2 building on snapshot 1, etc.)
> > > RBD internally *knows* this information, but I'm not sure how (or
> whether)
> > > it's formally exposed.
> > >
> > > Restoring an image in this scheme requires restoring the epoch, then
> the
> > > incrementals, in order.  The cost to restore is higher, but the cost
> > > of incremental backups is significantly smaller than doing full ones.
> >
> > It depends how we will store exported data. We might just want to merge
> > all diffs into base image right after export to keep only single copy.
> > But that is out of scope of main topic here, IMHO.
> >
> > > I'm not sure how the "lightweight" snapshot would work though.  Without
> > > references to objects there's no guarantee the data taken at the time
> of
> > > the snapshot still exists when you want to back it up.
> > >
> > >                                     -Alex
> > >
> > >>
> > >> sage
> > >>
> > >>> At first glance, it seems like it could be implemented as extension
> to current
> > >>> RBD snapshot system, leaving out the machinery required for
> copy-on-write. In
> > >>> theory it could even co-exist with regular snapshots. Removal of
> these
> > >>> "lightweight" snapshots would be instant (or near instant).
> > >>>
> > >>> So what do others think about this?
> > >>>
> > >>> --
> > >>> Piotr Dałek
> > >>> piotr.dalek-Rm6v+N6rxxBWk0Htik3J/w@public.gmane.org
> > >>> https://www.ovhcloud.com
> > >>> --
> > >>> To unsubscribe from this list: send the line "unsubscribe
> ceph-devel" in
> > >>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >>>
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> in
> > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >
> >
> > _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Jason


* Re: RBD image "lightweight snapshots"
       [not found]                           ` <537635ea-00db-f1ff-6fff-d6fd86af7e7e-Rm6v+N6rxxBWk0Htik3J/w@public.gmane.org>
@ 2018-08-27 13:12                             ` Jason Dillaman
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Dillaman @ 2018-08-27 13:12 UTC (permalink / raw)
  To: bartosz.rabiega-Rm6v+N6rxxBWk0Htik3J/w; +Cc: ceph-users, ceph-devel

On Mon, Aug 27, 2018 at 3:29 AM Bartosz Rabiega
<bartosz.rabiega@corp.ovh.com> wrote:
>
> Bumping the topic.
>
>
> So, what do you think guys?

Not sure if you saw my response from August 13th, but I stated that
this is something that you should be able to build right now using the
RADOS Python bindings and the rbd CLI. It would be pretty dangerous
for the average user to use without adding a lot of safety guardrails
to the entire process, however.

Of course, now that I think about it some more, I am not sure how the
OSDs would behave if sent a snap set with a deleted snapshot. They
used to just filter the errant entry, but I'm not sure how they would
behave under the removed snapshot interval set cleanup logic [1].
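The situation being worried about can be illustrated with the interval-set bookkeeping involved (a toy model; the real filtering lives in the OSD code referenced in [1], and the function names here are hypothetical):

```python
# Toy model: a client sends a snap set that still names a snapshot the
# cluster has already deleted. Historically the OSD would just filter
# such errant entries against its removed-snaps interval set; this
# sketches that filtering. Real logic is in the OSD (see [1]).

def in_interval_set(intervals, snap_id):
    # intervals: list of (start, length) pairs, like Ceph's interval_set
    return any(start <= snap_id < start + length
               for start, length in intervals)

def filter_snap_set(snap_set, removed_intervals):
    # Drop snap ids the cluster already considers deleted.
    return [s for s in snap_set
            if not in_interval_set(removed_intervals, s)]

removed = [(4, 2)]        # snap ids 4 and 5 have been removed
client_snaps = [6, 5, 2]  # client still references deleted snap 5
print(filter_snap_set(client_snaps, removed))  # -> [6, 2]
```

The open question above is precisely whether this tolerant filtering still holds once the interval-set cleanup logic starts trimming entries the OSD assumes no client will ever send again.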

> On 08/13/2018 12:22 PM, Bartosz Rabiega wrote:
> >
> >
> > On 08/11/2018 07:56 AM, Paweł Sadowski wrote:
> >> On 08/10/2018 06:24 PM, Gregory Farnum wrote:
> >>> On Fri, Aug 10, 2018 at 4:53 AM, Paweł Sadowsk <ceph@sadziu.pl> wrote:
> >>>> On 08/09/2018 04:39 PM, Alex Elder wrote:
> >>>>> On 08/09/2018 08:15 AM, Sage Weil wrote:
> >>>>>> On Thu, 9 Aug 2018, Piotr Dałek wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> At OVH we're heavily utilizing snapshots for our backup system.
> >>>>>>> We think
> >>>>>>> there's an interesting optimization opportunity regarding
> >>>>>>> snapshots I'd like
> >>>>>>> to discuss here.
> >>>>>>>
> >>>>>>> The idea is to introduce a concept of a "lightweight" snapshots
> >>>>>>> - such
> >>>>>>> snapshot would not contain data but only the information about
> >>>>>>> what has
> >>>>>>> changed on the image since it was created (so basically only the
> >>>>>>> object map
> >>>>>>> part of snapshots).
> >>>>>>>
> >>>>>>> Our backup solution (which seems to be a pretty common practice)
> >>>>>>> is as
> >>>>>>> follows:
> >>>>>>>
> >>>>>>> 1. Create snapshot of the image we want to backup
> >>>>>>> 2. If there's a previous backup snapshot, export diff and apply
> >>>>>>> it on the
> >>>>>>> backup image
> >>>>>>> 3. If there's no older snapshot, just do a full backup of image
> >>>>>>>
> >>>>>>> This introduces one big issue: it enforces COW snapshot on
> >>>>>>> image, meaning that
> >>>>>>> original image access latencies and consumed space increases.
> >>>>>>> "Lightweight"
> >>>>>>> snapshots would remove these inefficiencies - no COW performance
> >>>>>>> and storage
> >>>>>>> overhead.
> >>>>>> The snapshot in 1 would be lightweight you mean?  And you'd do
> >>>>>> the backup
> >>>>>> some (short) time later based on a diff with changed extents?
> >>>>>>
> >>>>>> I'm pretty sure this will export a garbage image.  I mean, it
> >>>>>> will usually
> >>>>>> be non-garbage, but the result won't be crash consistent, and in
> >>>>>> some
> >>>>>> (many?) cases won't be usable.
> >>>>>>
> >>>>>> Consider:
> >>>>>>
> >>>>>> - take reference snapshot
> >>>>>> - back up this image (assume for now it is perfect)
> >>>>>> - write A to location 1
> >>>>>> - take lightweight snapshot
> >>>>>> - write B to location 1
> >>>>>> - backup process copie location 1 (B) to target
> >>>> The way I (we) see it working is a bit different:
> >>>>   - take snapshot (1)
> >>>>   - data write might occur, it's ok - CoW kicks in here to preserve
> >>>> data
> >>>>   - export data
> >>>>   - convert snapshot (1) to a lightweight one (not create new):
> >>>>     * from now on just remember which blocks has been modified instead
> >>>>       of doing CoW
> >>>>     * you can get rid on previously CoW data blocks (they've been
> >>>>       exported already)
> >>>>   - more writes
> >>>>   - take snapshot (2)
> >>>>   - export diff - only blocks modified since snap (1)
> >>>>   - convert snapshot (2) to a lightweight one
> >>>>   - ...
> >>>>
> >>>>
> >>>> That way I don't see a place for data corruption. Of course this has
> >>>> some drawbacks - you can't rollback/export data from such lightweight
> >>>> snapshot anymore. But on the other hand we are reducing need for CoW -
> >>>> and that's the main goal with this idea. Instead of making CoW ~all
> >>>> the
> >>>> time it's needed only for the time of exporting image/modified blocks.
> >>> What's the advantage of remembering the blocks changed for a
> >>> "lightweight snapshot" once the actual data diff is no longer there?
> >>> Is there a meaningful difference between this and just immediately
> >>> deleting a snapshot after doing the export?
> >>> -Greg
> >>
> >> Advantage is that when I need to export diff I know which blocks
> >> changed,
> >> without checking (reading) others so I can just export them for backup.
> >> If i delete snapshot after export, next time I'll have to read whole
> >> image
> >> again - no possibility to do differential backup.
> >>
> >> But as Sage wrote, we are doing this on Filestore. I don't know how
> >> Bluestore
> >> works with snapshots (are whole 4MB chunks copied or only area of
> >> current write)
> >> so performance might be much better - need to test it.
> >>
> >> Our main goal with this idea is to improve performance in case where
> >> all images
> >> have at least one snapshot taken every *backup period* (24h or lower).
> >>
> >
> > The actual advantage lies in keeping COW at minimum.
> >
> > Assuming that you want to do differential backups every 24h.
> >
> > With normal snapshots:
> > 1. Create snapshot A, do full image export, takes 3h
> > 2. Typical client IO, all writes are COW for 24h
> > 3. After 24h Create snapshot B, and do export diff (A -> B), takes 0.5h
> > 4. Remove snapshot A, as it's no longer needed
> > 5. Typical client IO, all writes are COW for 24h
> > 6. After 24h Create snapshot C, and do export diff (B -> C), takes 0.5h
> > 7. Remove snapshot B, as it's no longer needed
> > 8. Typical client IO, all writes are COW for 24h
> >
> > Simplified estimation:
> > COW done for writes all the time since snapshot A = 72h of COW
> >
> > With 'lightweight' snapshots
> > 1. Create snapshot A, do full image export, takes 3h
> > 2. Convert snapshot A to lightweight
> > 3. Typical client IO, COW was done for 3h only
> > 4. After 24h Create snapshot B, and do export diff (A -> B), takes 0.5h
> > 5. Remove snapshot A, as it's no longer needed
> > 6. Convert snapshot B to lightweight
> > 7. Typical client IO, COW was done only for 0.5h
> > 8. After 24h Create snapshot C, and do export diff (B -> C), takes 0.5h
> > 9. Remove snapshot B, as it's no longer needed
> > 10. Convert snapshot C to lightweight
> > 11. Typical client IO, all writes are COW for 0.5h
> >
> > Simplified estimation:
> > COW done for full snapshot lifespan - 3h + 0.5h + 0.5h = 4h of COW
> >
> > The longer it lasts the bigger the advantage.
> > I'm not sure how smart COW with bluestore is but still for such use
> > case 'lightweight' snapshots would probably give much savings (COW
> > overhead (CPU + storage IO).
> >
> > Bartosz Rabiega
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] http://github.com/ceph/ceph/pull/18276

-- 
Jason

end of thread, other threads:[~2018-08-27 13:12 UTC | newest]

Thread overview: 10+ messages
2018-08-09 13:01 RBD image "lightweight snapshots" Piotr Dałek
     [not found] ` <45c564fb-63ed-af46-851b-6467649ae56d-Rm6v+N6rxxBWk0Htik3J/w@public.gmane.org>
2018-08-09 13:15   ` Sage Weil
     [not found]     ` <alpine.DEB.2.11.1808091311360.15688-qHenpvqtifaMSRpgCs4c+g@public.gmane.org>
2018-08-09 14:39       ` Alex Elder
     [not found]         ` <cdc2a3bb-d3aa-0208-503a-e37c8bb3ed5d-EkmVulN54Sk@public.gmane.org>
2018-08-10 11:53           ` Paweł Sadowsk
     [not found]             ` <d0c3e492-24da-5188-1a5a-72bc74dac4c9-Bj5ZXqqQV65mR6Xm/wNWPw@public.gmane.org>
2018-08-10 16:24               ` Gregory Farnum
     [not found]                 ` <CAJ4mKGacWb=DsN2WeSeyGbSb3QzWoFccXws4H7VBDgBsGw72Eg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-08-11  5:56                   ` Paweł Sadowski
     [not found]                     ` <9251af62-3bcc-6645-4ee0-a30cee3aa1ff-Bj5ZXqqQV65mR6Xm/wNWPw@public.gmane.org>
2018-08-13 10:22                       ` Bartosz Rabiega
     [not found]                         ` <537635ea-00db-f1ff-6fff-d6fd86af7e7e@corp.ovh.com>
     [not found]                           ` <537635ea-00db-f1ff-6fff-d6fd86af7e7e-Rm6v+N6rxxBWk0Htik3J/w@public.gmane.org>
2018-08-27 13:12                             ` Jason Dillaman
     [not found]         ` <27efd295-8a87-94ea-aa77-fcaee4e5f250@sadziu.pl>
     [not found]           ` <27efd295-8a87-94ea-aa77-fcaee4e5f250-Bj5ZXqqQV65mR6Xm/wNWPw@public.gmane.org>
2018-08-10 12:29             ` Sage Weil
     [not found]               ` <alpine.DEB.2.11.1808101222580.15696-qHenpvqtifaMSRpgCs4c+g@public.gmane.org>
2018-08-13 15:20                 ` Jason Dillaman
