* btrfs send receive, clone
@ 2014-04-24 15:23 Chris Murphy
  2014-04-24 15:55 ` Hugo Mills
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Murphy @ 2014-04-24 15:23 UTC
  To: Btrfs BTRFS



I don't understand the btrfs send -c <clone-src> man page text, or really even the use case. In part this is what it says:

> You must not specify clone sources unless you
>  guarantee that these snapshots are exactly in the same state on both
>  sides, the sender and the receiver.

If the snapshots are the same on both sides, then why would I be using clone in the first place?

> -c <clone-src> Use this snapshot as a clone source for an 
> incremental send (multiple allowed)

Incremental send implies the sender and receiver are not in the same state now, but will be after the command is executed. Are one or both of the snapshots rw for -c?

Anyway, I'm lost on the specifics, but clearly I'm even lost when it comes to the basic difference between -p and -c.


Thanks,

Chris Murphy


* Re: btrfs send receive, clone
  2014-04-24 15:23 btrfs send receive, clone Chris Murphy
@ 2014-04-24 15:55 ` Hugo Mills
  2014-04-24 16:29   ` Hugo Mills
  2014-04-24 17:22   ` Chris Murphy
  0 siblings, 2 replies; 5+ messages in thread
From: Hugo Mills @ 2014-04-24 15:55 UTC
  To: Chris Murphy; +Cc: Btrfs BTRFS


On Thu, Apr 24, 2014 at 09:23:28AM -0600, Chris Murphy wrote:
> 
> 
> I don't understand the btrfs send -c <clone-src> man page text, or really even the use case. In part this is what it says:
> 
> > You must not specify clone sources unless you
> >  guarantee that these snapshots are exactly in the same state on both
> >  sides, the sender and the receiver.
> 
> If the snapshots are the same on both sides, then why would I be using clone in the first place?

   To copy over another snapshot which shares data with them.

> > -c <clone-src> Use this snapshot as a clone source for an 
> > incremental send (multiple allowed)
> 
> Incremental send implies the sender and receiver are not in the same state now, but will be after the command is executed. Are one or both of the snapshots rw for -c?
> 
> Anyway, I'm lost on the specifics, but clearly I'm even lost when it comes to the basic difference between -p and -c.

(Note: I've not actually tried the second case in what follows, but
it's what I think is going on. This may be subject to corrections.)

   OK, call the sending system "S" and the receiving system "R". Let's
say we've got three subvolumes on S:

S:A2, the current /home (say)
S:A1, a snapshot of an earlier version of S:A2
S:B, a separate subvolume that's had some CoW copies of files in both
     S:A1 and S:A2 made into it.

   If we send S:A1 to R, then we'll have to send the whole thing,
because R doesn't have any subvolumes yet.

   If we now want to send S:A2 to R, then we can use -p S:A1, and it
will send just the differences between those two. This means that the
send stream can potentially ignore a load of the metadata as well as
the data. It's effectively saying, "you can clone R:A1, then do these
things to it to get R:A2".
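
   A minimal sketch of those two sends, written as a dry run that only
prints the commands (a real run needs root and a btrfs filesystem on
both ends). The mount points /mnt/S and /mnt/R and the helper names are
assumptions for illustration. Note that btrfs send only accepts
read-only snapshots, so A2 here stands for a read-only snapshot of the
current /home.

```shell
# Dry run: print the commands instead of executing them.
SRC=/mnt/S   # hypothetical mount point on the sending system S
DST=/mnt/R   # hypothetical mount point on the receiving system R

# Full send: R has nothing yet, so the whole snapshot goes across.
full_send() {
    echo "btrfs send $SRC/$1 | btrfs receive $DST"
}

# Incremental send: only the differences between parent $1 and snapshot $2.
incremental_send() {
    echo "btrfs send -p $SRC/$1 $SRC/$2 | btrfs receive $DST"
}

full_send A1
incremental_send A1 A2
```

   When S and R are separate machines, the receive half of each
pipeline would typically run on R over ssh.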

   If we now want to send S:B to R, then we can use -c S:A1 -c S:A2.
Note that S:B doesn't have any metadata in common with either of the
As, only data. This will send all of the metadata ("start with an
empty subvolume and do these things to it to get R:B"), but because
it's known to share data with some subvols on S, and those subvols
also exist on R, we can avoid sending that data again by simply
specifying where the data can be found and reflinked from on R.
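
   One way the S:B case could come about and then be sent, again as a
dry run that only prints each step. The file names (old.img, new.img),
the B.ro snapshot name and the mount points are hypothetical. The
cp --reflink=always copies are what give B data extents in common with
A1 and A2, and the read-only snapshot of B is there because send needs
a read-only source.

```shell
# Dry run: each step is printed, not executed.
SRC=/mnt/S   # hypothetical mount point on the sending system S
DST=/mnt/R   # hypothetical mount point on the receiving system R

plan_clone_send() {
    # B ends up sharing data extents (not metadata) with A1 and A2.
    echo "cp --reflink=always $SRC/A1/old.img $SRC/B/old.img"
    echo "cp --reflink=always $SRC/A2/new.img $SRC/B/new.img"
    # send needs a read-only subvolume, so take a read-only snapshot of B.
    echo "btrfs subvolume snapshot -r $SRC/B $SRC/B.ro"
    # A1 and A2 already exist on R, so name them as clone sources.
    echo "btrfs send -c $SRC/A1 -c $SRC/A2 $SRC/B.ro | btrfs receive $DST"
}

plan_clone_send
```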

   So, if you have a load of snapshots, you can do one of two things
to duplicate all of them:

btrfs send <snap 0>
for n = 1 to N
   btrfs send -p <snap n-1> <snap n>

   Or, in any order,

btrfs send <snap s1>
for n = 2 to N
   btrfs send -c <snap s1> -c <snap s2> ... -c <snap s(n-1)> <snap sn>

where each subvolume that's been sent before gets added as a -c to the
next send command. This second approach means that all possible
reflinks between subvolumes can be captured, but it will send all of
the metadata across each time. The first approach may lose some manual
reflink efficiency, but is better at sending only the necessary
changed metadata. You should be able to combine the two methods, I
think.
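
   The two approaches above can be sketched as shell loops. This is a
dry run that prints the send commands rather than running them, and
the SNAPS list of read-only snapshot paths (oldest first) is a
hypothetical example. In real use each command would be piped into
btrfs receive on the other side.

```shell
# Hypothetical list of read-only snapshots, oldest first.
SNAPS="/mnt/S/snap0 /mnt/S/snap1 /mnt/S/snap2"

# Approach 1: a chain of -p sends, each snapshot relative to the previous one.
parent_chain() {
    prev=""
    for snap in $SNAPS; do
        if [ -z "$prev" ]; then
            echo "btrfs send $snap"            # first one is a full send
        else
            echo "btrfs send -p $prev $snap"
        fi
        prev=$snap
    done
}

# Approach 2: every snapshot already sent becomes a -c for the next send.
clone_chain() {
    sent=""
    for snap in $SNAPS; do
        echo "btrfs send$sent $snap"
        sent="$sent -c $snap"
    done
}

parent_chain
clone_chain
```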

   I'm trying to think of a case where -c is useful that doesn't
involve someone having done cp --reflink=always between subvolumes,
but I can't. So, I think the summary is:

 * Use -p to deal with parent-child reflinks through snapshots
 * Use -c to specify other subvolumes (present on both sides) that
   might contain reflinked data

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Well, you don't get to be a kernel hacker simply by looking ---   
                    good in Speedos. -- Rusty Russell                    


* Re: btrfs send receive, clone
  2014-04-24 15:55 ` Hugo Mills
@ 2014-04-24 16:29   ` Hugo Mills
  2014-04-24 17:22   ` Chris Murphy
  1 sibling, 0 replies; 5+ messages in thread
From: Hugo Mills @ 2014-04-24 16:29 UTC
  To: Chris Murphy, Btrfs BTRFS


On Thu, Apr 24, 2014 at 04:55:10PM +0100, Hugo Mills wrote:
>    I'm trying to think of a case where -c is useful that doesn't
> involve someone having done cp --reflink=always between subvolumes,
> but I can't.

   OK, you can use -c if you don't have a record of the relationships
between the subvolumes you want to send, but know that they're related
in some way. As above, you send the first subvol "bare", and then
supply a -c for each one that you've already sent.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
               --- echo "killall cat" > ~/curiosity.sh ---               


* Re: btrfs send receive, clone
  2014-04-24 15:55 ` Hugo Mills
  2014-04-24 16:29   ` Hugo Mills
@ 2014-04-24 17:22   ` Chris Murphy
  2014-04-24 18:45     ` Hugo Mills
  1 sibling, 1 reply; 5+ messages in thread
From: Chris Murphy @ 2014-04-24 17:22 UTC
  To: Hugo Mills; +Cc: Btrfs BTRFS


On Apr 24, 2014, at 9:55 AM, Hugo Mills <hugo@carfax.org.uk> wrote:

> On Thu, Apr 24, 2014 at 09:23:28AM -0600, Chris Murphy wrote:
>> 
>> 
>> I don't understand the btrfs send -c <clone-src> man page text, or really even the use case. In part this is what it says:
>> 
>>> You must not specify clone sources unless you
>>> guarantee that these snapshots are exactly in the same state on both
>>> sides, the sender and the receiver.
>> 
>> If the snapshots are the same on both sides, then why would I be using clone in the first place?
> 
>   To copy over another snapshot which shares data with them.
> 
>>> -c <clone-src> Use this snapshot as a clone source for an 
>>> incremental send (multiple allowed)
>> 
>> Incremental send implies the sender and receiver are not in the same state now, but will be after the command is executed. Are one or both of the snapshots rw for -c?
>> 
>> Anyway, I'm lost on the specifics, but clearly I'm even lost when it comes to the basic difference between -p and -c.
> 
> (Note: I've not actually tried the second case in what follows, but
> it's what I think is going on. This may be subject to corrections.)
> 
>   OK, call the sending system "S" and the receiving system "R". Let's
> say we've got three subvolumes on S:
> 
> S:A2, the current /home (say)
> S:A1, a snapshot of an earlier version of S:A2
> S:B, a separate subvolume that's had some CoW copies of files in both
>     S:A1 and S:A2 made into it.
> 
>   If we send S:A1 to R, then we'll have to send the whole thing,
> because R doesn't have any subvolumes yet.
> 
>   If we now want to send S:A2 to R, then we can use -p S:A1, and it
> will send just the differences between those two. This means that the
> send stream can potentially ignore a load of the metadata as well as
> the data. It's effectively saying, "you can clone R:A1, then do these
> things to it to get R:A2".
> 
>   If we now want to send S:B to R, then we can use -c S:A1 -c S:A2.

OK this makes sense now, thanks.

Does the use of -c always require at least two -c instances? Is there an example where -c is used once? From the man page I'm not grokking that there must be at least two -c's.


>   I'm trying to think of a case where -c is useful that doesn't
> involve someone having done cp --reflink=always between subvolumes,
> but I can't.

OK.


> So, I think the summary is:
> 
> * Use -p to deal with parent-child reflinks through snapshots
> * Use -c to specify other subvolumes (present on both sides) that
>   might contain reflinked data

I think the key is that -c implies a minimum of five subvolumes: two subvolumes on the source, which have (identical) counterparts on the destination (that's four subvolumes), and then one additional somehow related subvolume B on the source that I want on the destination.

Whereas -p implies three subvolumes (one on the source which is the parent, its counterpart on the destination, and a child on the source which I want on the destination). I necessarily must understand the relationship among them in order to get the desired incremental result on the destination.


Chris Murphy



* Re: btrfs send receive, clone
  2014-04-24 17:22   ` Chris Murphy
@ 2014-04-24 18:45     ` Hugo Mills
  0 siblings, 0 replies; 5+ messages in thread
From: Hugo Mills @ 2014-04-24 18:45 UTC
  To: Chris Murphy; +Cc: Btrfs BTRFS


On Thu, Apr 24, 2014 at 11:22:40AM -0600, Chris Murphy wrote:
> 
> On Apr 24, 2014, at 9:55 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
> 
> > On Thu, Apr 24, 2014 at 09:23:28AM -0600, Chris Murphy wrote:
> >> 
> >> 
> >> I don't understand the btrfs send -c <clone-src> man page text, or really even the use case. In part this is what it says:
> >> 
> >>> You must not specify clone sources unless you
> >>> guarantee that these snapshots are exactly in the same state on both
> >>> sides, the sender and the receiver.
> >> 
> >> If the snapshots are the same on both sides, then why would I be using clone in the first place?
> > 
> >   To copy over another snapshot which shares data with them.
> > 
> >>> -c <clone-src> Use this snapshot as a clone source for an 
> >>> incremental send (multiple allowed)
> >> 
> >> Incremental send implies the sender and receiver are not in the same state now, but will be after the command is executed. Are one or both of the snapshots rw for -c?
> >> 
> >> Anyway, I'm lost on the specifics, but clearly I'm even lost when it comes to the basic difference between -p and -c.
> > 
> > (Note: I've not actually tried the second case in what follows, but
> > it's what I think is going on. This may be subject to corrections.)
> > 
> >   OK, call the sending system "S" and the receiving system "R". Let's
> > say we've got three subvolumes on S:
> > 
> > S:A2, the current /home (say)
> > S:A1, a snapshot of an earlier version of S:A2
> > S:B, a separate subvolume that's had some CoW copies of files in both
> >     S:A1 and S:A2 made into it.
> > 
> >   If we send S:A1 to R, then we'll have to send the whole thing,
> > because R doesn't have any subvolumes yet.
> > 
> >   If we now want to send S:A2 to R, then we can use -p S:A1, and it
> > will send just the differences between those two. This means that the
> > send stream can potentially ignore a load of the metadata as well as
> > the data. It's effectively saying, "you can clone R:A1, then do these
> > things to it to get R:A2".
> > 
> >   If we now want to send S:B to R, then we can use -c S:A1 -c S:A2.
> 
> OK this makes sense now, thanks.
> 
> Does the use of -c always require at least two -c instances? Is there an example where -c is used once? From the man page I'm not grokking that there must be at least two -c's.

   No, my understanding is that you could have any number (0 or more).
It just allows the sending side to tell the receiving side that
there's some shared data in use that it's already got the data for,
and it just needs to hook up the extents. The reason I used two -cs
above was because there's data that S:B shares with those two
subvolumes (because that's the example scenario I picked). If S:B only
shared with one subvolume, you would use only one -c.
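
   To make the "any number" point concrete, here is a small sketch of
a helper that takes the subvolume to send followed by zero or more
clone sources, and adds one -c per source. It only prints the
resulting command. The helper name and the /mnt/S and /mnt/R mount
points are assumptions for illustration.

```shell
SRC=/mnt/S   # hypothetical mount point on the sending system S
DST=/mnt/R   # hypothetical mount point on the receiving system R

# Usage: clone_send <subvol> [clone-src ...]
clone_send() {
    target=$1
    shift
    flags=""
    # One -c per clone source; with no sources this is a plain full send.
    for src in "$@"; do
        flags="$flags -c $SRC/$src"
    done
    echo "btrfs send$flags $SRC/$target | btrfs receive $DST"
}

clone_send B            # no shared data known
clone_send B A1         # B shares data with A1 only
clone_send B A1 A2      # B shares data with both
```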

> >   I'm trying to think of a case where -c is useful that doesn't
> > involve someone having done cp --reflink=always between subvolumes,
> > but I can't.
> 
> OK.
> 
> 
> > So, I think the summary is:
> > 
> > * Use -p to deal with parent-child reflinks through snapshots
> > * Use -c to specify other subvolumes (present on both sides) that
> >   might contain reflinked data
> 
> I think the key is that -c implies a minimum of five subvolumes: two subvolumes on the source, which have (identical) counterparts on the destination (that's four subvolumes), and then one additional somehow related subvolume B on the source that I want on the destination.

   No, -c implies three subvolumes that exist: the one provided to the
-c, which must exist on both sides as a data source, and the one being
sent, which exists on the sending side, and will be recreated on the
receiving side, with any shared extents replicated.

> Whereas -p implies three subvolumes (one on the source which is the parent, its counterpart on the destination, and a child on the source which I want on the destination). I necessarily must understand the relationship among them in order to get the desired incremental result on the destination.

   I don't think you have to know that the subvol being sent is the
"child" of the subvol provided with -p. I suspect that the operation
would work just as well round the other way (i.e., if you've already
sent the latest snapshot, you could do a cheaper copy of older
snapshots by sending them with -p <latest_subvol>). Remember, there's
not really any deep FS-level concept of parent/child with snapshots.
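
   A sketch of that reverse ordering, untested like the suggestion
itself: the latest snapshot goes over in full, and older ones are then
sent with -p pointing at it. It is a dry run that only prints the
commands, and the snapshot paths are hypothetical.

```shell
LATEST=/mnt/S/snap3                  # hypothetical newest snapshot
OLDER="/mnt/S/snap2 /mnt/S/snap1"    # hypothetical older snapshots

reverse_sends() {
    # The newest snapshot is sent in full first.
    echo "btrfs send $LATEST"
    # Each older snapshot is then sent as a difference against the newest.
    for snap in $OLDER; do
        echo "btrfs send -p $LATEST $snap"
    done
}

reverse_sends
```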

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
           --- Sometimes, when I'm alone, I Google myself. ---           
