linux-btrfs.vger.kernel.org archive mirror
* Questions regarding COW-related behaviors
@ 2010-11-08 14:23 João Eduardo Luís
  2010-11-08 17:31 ` Goffredo Baroncelli
  2010-11-08 22:45 ` Sean Bartell
  0 siblings, 2 replies; 6+ messages in thread
From: João Eduardo Luís @ 2010-11-08 14:23 UTC (permalink / raw)
  To: linux-btrfs

Hello list.


I've been working on my MSc thesis, and I believe the time has come for
it to cross paths with BTRFS.

However, I have a couple of standing questions I haven't been able to
answer, even after reading Ohad Rodeh's paper and most of the wiki's
documentation, looking at BTRFS' code and testing it myself. I don't
rule out having missed some information somewhere, though.

Basically, I need to understand how COW works in BTRFS, and what it may
allow one to achieve. Questions follow.

1) Is COW only used when creating or updating a file? While testing
BTRFS, using 'btrfs subvolume find-new', I got the idea that neither the
creation of directories nor any kind of deletion is covered by COW. Is
this right?

2) Each time a COW happens, is there any kind of implicit 'snapshotting'
that may keep track of changes around the filesystem for each COW?
From Rodeh's paper and some info on the wiki, I gather that a new root is
created for each COW, due to shadowing, but will the previous tree be
kept? The wiki, at "BTRFS Design", states that "after the commit
finishes, the older subvolume root items may be removed". This would
make it impossible to track changes to files, but 'btrfs subvolume
find-new' still manages to output file generations, so there must be
some info left behind.

3) Following (2), is there any way to access this information and,
let's say, recover an older version of a given file? Or an entire
previous tree?

4) From Rodeh's paper I got the idea that BTRFS uses periodic
checkpointing, in order to assign generations to operations. Using
'btrfs subvolume find-new' I confirmed my suspicions. After copying two
different directories into the same subvolume at the same time, all
files got assigned the same generation and it took a while until they
all showed up. This raises the question: what triggers a new checkpoint?
Is it based on elapsed time since the last checkpoint? Is it triggered
by a COW, and then all COWs happening at the same time will be put
together and create a big new generation?

5) If we have multiple jobs updating the same file at the same time, I
assume the system will shadow their updates; when the time for
committing comes, will there be any kind of 'conflict' between
concurrent updates, or will they be applied by order of commit, ignoring
whether there were previous commits or not? Regarding checkpointing,
will all the changes be shown as part of the generation, or will they be
considered as only one?


I would greatly appreciate any answer regarding any of these topics,
including any pointers to additional documentation that I may have
missed.


Regards.


---
João Eduardo Luís




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Questions regarding COW-related behaviors
  2010-11-08 14:23 Questions regarding COW-related behaviors João Eduardo Luís
@ 2010-11-08 17:31 ` Goffredo Baroncelli
  2010-11-08 22:45 ` Sean Bartell
  1 sibling, 0 replies; 6+ messages in thread
From: Goffredo Baroncelli @ 2010-11-08 17:31 UTC (permalink / raw)
  To: João Eduardo Luís; +Cc: linux-btrfs

On Monday, 08 November, 2010, João Eduardo Luís wrote:
> Hello list.
[...]
> Basically, I need to be aware how the COW works in BTRFS, and what it
> may allow one to achieve. Questions follow.
>
> 1) Is COW only used when creating or updating a file? While testing
> BTRFS, using 'btrfs subvolume find-new', I got the idea that neither
> creation of directories, nor any kind of deletion are covered by COW.
> Is this right?

The command "btrfs subvolume find-new" searches through the keys with
type == BTRFS_EXTENT_DATA_KEY.
Because COW works on a per-disk-block basis, it is not so simple to track
a change in metadata (a single disk block holds a lot of metadata). In
fact, there are a lot of false positives.
I thought a bit about a way to compare two trees, but it is not so
simple. If I understood correctly, even if only a leaf is different, you
have to compare a full branch (from the root to the leaf) of different
disk blocks.

>
> 2) Each time a COW happens, is there any kind implicit 'snapshotting'
> that may keep track of changes around the filesystem for each COW?
> By Rodeh's paper and some info on the wiki, I gather that a new root is
> created for each COW, due to shadowing, but will the previous tree be
> kept? The wiki, at "BTRFS Design", states that "after the commit
> finishes, the older subvolume root items may be removed". This would
> make impossible to track changes to files, but 'btrfs subvolume
> find-new' still manages to output file generations, so there must be
> some info left behind.
>
> 3) Following (2), is there any way to access this informations and,
> let's say, recover an older version of a given file? Or an entire
> previous tree?

Snapshotting a tree is the way to keep "an older version of a given
file" (or of several files), or of a whole tree.
[...]
>
> I would greatly appreciate any answer regarding any of this topics,
> including any pointers to additional documentation that I may have
> missed.
>
>
> Regards.
>
>
> ---
> João Eduardo Luís

Ciao
Goffredo
-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E  C054 BF04 F161 3DC5 0512

* Re: Questions regarding COW-related behaviors
  2010-11-08 14:23 Questions regarding COW-related behaviors João Eduardo Luís
  2010-11-08 17:31 ` Goffredo Baroncelli
@ 2010-11-08 22:45 ` Sean Bartell
  2010-11-08 23:05   ` Chris Samuel
  2010-11-09 14:18   ` João Eduardo Luís
  1 sibling, 2 replies; 6+ messages in thread
From: Sean Bartell @ 2010-11-08 22:45 UTC (permalink / raw)
  To: João Eduardo Luís; +Cc: linux-btrfs

(sorry for sending twice)

On Mon, Nov 08, 2010 at 02:23:13PM +0000, João Eduardo Luís wrote:

> Basically, I need to be aware how the COW works in BTRFS, and what it
> may allow one to achieve. Questions follow.

From your questions, you don't seem to understand CoW. CoW is basically
an alternative to the logging/journalling used by most filesystems.

When you change a data structure in a journalling filesystem, like ext4,
you actually write two copies--one into the journal, and one that
overwrites the old data structure. If a crash happens, at least one copy
will still be valid, making recovery possible.

When you change a data structure in a CoW filesystem, like btrfs, you
only write one copy, but you DON'T write it over the old data structure.
You write it to a new, unallocated space. This means the location of the
data structure changed, so you have to change the parent data structure;
you use CoW for that and so on up to the superblocks, which actually are
overwritten. Once that's finished, the old versions are no longer
needed, so they will be unallocated and eventually overwritten. If a
crash happens, the superblocks will still point to the old version of
the data structures.
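
A minimal sketch of the shadowing described above, assuming a toy
in-memory tree with a fixed fanout rather than the real btrfs on-disk
format. All names here (struct node, node_shadow, cow_update) are
hypothetical and chosen for illustration. Updating a leaf copies every
node on the path from the root down to that leaf and leaves the old
nodes untouched, so the old root keeps describing the old state until
the caller switches to the new root, which stands in for overwriting
the superblocks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FANOUT 4

/* Hypothetical, simplified tree node; NOT the btrfs on-disk layout. */
struct node {
    int is_leaf;
    int value;                   /* payload, meaningful for leaves only */
    struct node *child[FANOUT];  /* meaningful for interior nodes only */
};

/* Allocate a leaf holding `value`. */
struct node *node_new_leaf(int value)
{
    struct node *n = calloc(1, sizeof(*n));
    assert(n != NULL);
    n->is_leaf = 1;
    n->value = value;
    return n;
}

/* Allocate an interior node with no children yet. */
struct node *node_new_interior(void)
{
    struct node *n = calloc(1, sizeof(*n));
    assert(n != NULL);
    return n;
}

/* Copy one node into freshly allocated space (its "shadow"). */
struct node *node_shadow(const struct node *old)
{
    struct node *copy = malloc(sizeof(*copy));
    assert(copy != NULL);
    memcpy(copy, old, sizeof(*copy));
    return copy;
}

/*
 * Set the leaf reached by following path[0..depth-1] from `root` to
 * `new_value` without modifying any existing node. Only the nodes on
 * that path are copied; every other subtree is shared between the old
 * and the new tree. Returns the new root.
 */
struct node *cow_update(const struct node *root, const int *path,
                        int depth, int new_value)
{
    struct node *copy = node_shadow(root);

    if (depth == 0) {
        assert(copy->is_leaf);
        copy->value = new_value;
        return copy;
    }

    assert(!copy->is_leaf);
    assert(path[0] >= 0 && path[0] < FANOUT);
    assert(root->child[path[0]] != NULL);
    /* The child moved, so the (copied) parent must point at the copy. */
    copy->child[path[0]] = cow_update(root->child[path[0]], path + 1,
                                      depth - 1, new_value);
    return copy;
}
```

After cow_update() returns, the old and the new root share every
subtree that was not on the modified path; only the copied path takes
up new space.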

This makes it relatively easy to add snapshot features--just add
reference counting, and don't free old versions of data structures if
they're still being used. However, this only happens if the user
explicitly requests a snapshot. Otherwise, the old data structures are
freed immediately once the new ones are completely written.
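
The reference-counting idea can be sketched as follows. This is an
illustration only: it keeps a single counter per tree root, whereas a
real filesystem would have to count references on the individual shared
structures, and the names root_version, root_create, root_snapshot and
root_release are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handle for one version of a tree root. */
struct root_version {
    unsigned refs;             /* who still needs this version */
    unsigned long generation;  /* generation that produced it */
};

/* A new root starts with one reference, held by the live filesystem. */
struct root_version *root_create(unsigned long generation)
{
    struct root_version *r = malloc(sizeof(*r));
    assert(r != NULL);
    r->refs = 1;
    r->generation = generation;
    return r;
}

/* An explicit snapshot request pins the version with an extra reference. */
void root_snapshot(struct root_version *r)
{
    r->refs++;
}

/*
 * Drop one reference, e.g. when a newer root has been committed.
 * Returns 1 if this was the last reference and the version was freed,
 * 0 if a snapshot still keeps it alive.
 */
int root_release(struct root_version *r)
{
    assert(r->refs > 0);
    if (--r->refs == 0) {
        free(r);
        return 1;
    }
    return 0;
}
```

Without a snapshot, the first root_release() after a commit frees the
old version right away; with a snapshot it survives until the snapshot
itself is released.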

> 1) Is COW only used when creating or updating a file? While testing
> BTRFS, using 'btrfs subvolume find-new', I got the idea that neither
> creation of directories, nor any kind of deletion are covered by COW.
> Is this right?

CoW is used anytime any structure is changed. find-new is not directly
related to CoW.

> 2) Each time a COW happens, is there any kind implicit 'snapshotting'
> that may keep track of changes around the filesystem for each COW?
> By Rodeh's paper and some info on the wiki, I gather that a new root is
> created for each COW, due to shadowing, but will the previous tree be
> kept? The wiki, at "BTRFS Design", states that "after the commit
> finishes, the older subvolume root items may be removed". This would
> make impossible to track changes to files, but 'btrfs subvolume
> find-new' still manages to output file generations, so there must be
> some info left behind.

The old tree is discarded unless the user requested a snapshot of it.

Every time btrfs updates the roots is a new generation. Some data
structures have "generation" fields, indicating the generation in which
they were most recently changed. This is mostly used to verify the
filesystem is correct, but it's also possible to scan the generation
fields and find out which files have changed.

> 3) Following (2), is there any way to access this informations and,
> let's say, recover an older version of a given file? Or an entire
> previous tree?

No, unless the user requests a snapshot. I'm assuming you're not talking
about tools like PhotoRec, that try to reassemble files from whatever
disk data looks valid.

> 4) From Rodeh's paper I got the idea that BTRFS uses periodic
> checkpointing, in order to assign generations to operations. Using
> 'btrfs subvolume find-new' I confirmed my suspicions. After copying two
> different directories into the same subvolume at the same time, all
> files got assigned the same generation and it took a while until they
> all showed up. This raises the question: what triggers a new
> checkpoint? Is it based on elapsed time since last checkpoint? Is it
> triggered by a COW and then, all COWs happening at the same time will
> be put together and create a big new generation?

Again, periodic checkpointing is probably the wrong way to think about
it. It would be wasteful to overwrite the superblocks every time a
change is made; instead, btrfs may combine multiple changes into one
generation and only update the superblocks once. I'm not sure exactly
how btrfs decides when to write a new generation.
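
A small sketch of what "combining multiple changes into one generation"
could look like, under simplifying assumptions: fs_state, item,
fs_modify and fs_commit are hypothetical names, and the decision of
when to call fs_commit() is deliberately left to the caller, since this
thread does not settle what triggers it. Every change made between two
commits is stamped with the same generation number, which would match
the observation that files copied at the same time end up in the same
generation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical filesystem-wide state. */
struct fs_state {
    uint64_t committed_gen;  /* generation written by the last commit */
    unsigned pending;        /* changes made since the last commit */
};

/* Hypothetical item carrying a "generation" field. */
struct item {
    uint64_t generation;     /* generation of the most recent change */
    int data;
};

/*
 * Apply a change. The item is stamped with the generation that the
 * NEXT commit will write, so all changes made before that commit share
 * one generation number.
 */
void fs_modify(struct fs_state *fs, struct item *it, int data)
{
    it->data = data;
    it->generation = fs->committed_gen + 1;
    fs->pending++;
}

/*
 * Commit: a single root/superblock update covers every pending change.
 * Returns how many changes were folded into the new generation.
 */
unsigned fs_commit(struct fs_state *fs)
{
    unsigned folded = fs->pending;

    if (folded == 0)
        return 0;            /* nothing changed, no new generation */

    fs->committed_gen++;
    fs->pending = 0;
    return folded;
}
```

Two fs_modify() calls followed by one fs_commit() yield a single new
generation covering both changes; a change made after that commit is
stamped with the following generation.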

> 5) If we have multiple jobs updating the same file at the same time,
> I assume the system will shadow their updates; when the time for
> committing comes, will there be any kind of 'conflict' between
> concurrent updates, or will they be applied by order of commit,
> ignoring whether there were previous commits or not? Regarding
> checkpointing, will all the changes be shown as part of the generation,
> or will they be considered as only one?

This is handled just like in any other filesystem. There are no
concurrent generations; if two threads both update a file, btrfs will
handle the updates sequentially, one at a time.

* Re: Questions regarding COW-related behaviors
  2010-11-08 22:45 ` Sean Bartell
@ 2010-11-08 23:05   ` Chris Samuel
  2010-11-09 14:18   ` João Eduardo Luís
  1 sibling, 0 replies; 6+ messages in thread
From: Chris Samuel @ 2010-11-08 23:05 UTC (permalink / raw)
  To: The development of BTRFS

On 09/11/10 09:45, Sean Bartell wrote:

> No, unless the user request a snapshot. I'm assuming you're not talking
> about tools like PhotoRec, that try to reassemble files from whatever
> disk data looks valid.

It may be that he has confused it with nilfs which does have
automatic periodic checkpointing (and expiry thereof) based on
a user configurable policy.  These can be converted to persistent
snapshots from the command line.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


* Re: Questions regarding COW-related behaviors
  2010-11-08 22:45 ` Sean Bartell
  2010-11-08 23:05   ` Chris Samuel
@ 2010-11-09 14:18   ` João Eduardo Luís
  2010-11-09 19:37     ` Goffredo Baroncelli
  1 sibling, 1 reply; 6+ messages in thread
From: João Eduardo Luís @ 2010-11-09 14:18 UTC (permalink / raw)
  To: linux-btrfs

First of all, thanks for all the replies; they've been quite insightful.


On Nov 8, 2010, at 10:45 PM, Sean Bartell wrote:

> From your questions, you don't seem to understand CoW. CoW is basically
> an alternative to the logging/journalling used by most filesystems.
>

Actually, I do understand how CoW works. Although, maybe due to naiveté,
I do lack some understanding of how it is applied in a full-fledged
filesystem.


>> 2) Each time a COW happens, is there any kind implicit 'snapshotting'
>> that may keep track of changes around the filesystem for each COW?
>> By Rodeh's paper and some info on the wiki, I gather that a new root
>> is created for each COW, due to shadowing, but will the previous tree
>> be kept? The wiki, at "BTRFS Design", states that "after the commit
>> finishes, the older subvolume root items may be removed". This would
>> make impossible to track changes to files, but 'btrfs subvolume
>> find-new' still manages to output file generations, so there must be
>> some info left behind.
>
> The old tree is discarded unless the user requested a snapshot of it.
>
> Every time btrfs updates the roots is a new generation. Some data
> structures have "generation" fields, indicating the generation in which
> they were most recently changed. This is mostly used to verify the
> filesystem is correct, but it's also possible to scan the generation
> fields and find out which files have changed.

As Goffredo Baroncelli explained in a previous reply to my questions,
the "find-new" command searches through keys with type
BTRFS_EXTENT_DATA_KEY. This command does print several changes to the
same files throughout history since a given generation. My new question
on this is rather simple: does BTRFS actually keep the data from those
generations to which "find-new" has access, or is it only able to access
information that records these changes?


>
>> 3) Following (2), is there any way to access this informations and,
>> let's say, recover an older version of a given file? Or an entire
>> previous tree?
>
> No, unless the user request a snapshot. I'm assuming you're not talking
> about tools like PhotoRec, that try to reassemble files from whatever
> disk data looks valid.

You're right. What I mean is for one to be able to actually recover an
old, recently modified version of a file, somewhat like a versioning
system. I believe I have read that both WAFL and ZFS have similar
support. I do understand now that this is only possible if one
explicitly creates a snapshot.

However, I thought that, by using shadowing of the changed blocks, it
could be quite inexpensive (aside from a storage point of view) to
implicitly keep multiple "versions" of the tree: unchanged blocks would
be kept shared among "tree versions" until CoWed, if they were ever
changed. With these versions one would be able to recover, or restore, a
file or the entire filesystem, regardless of having created an explicit
checkpoint. Then again, I understand this is not how it works with
BTRFS, and I have no clue whether such support is feasible.

>
>> 4) From Rodeh's paper I got the idea that BTRFS uses periodic
>> checkpointing, in order to assign generations to operations. Using
>> 'btrfs subvolume find-new' I confirmed my suspicions. After copying
>> two different directories into the same subvolume at the same time,
>> all files got assigned the same generation and it took a while until
>> they all showed up. This raises the question: what triggers a new
>> checkpoint? Is it based on elapsed time since last checkpoint? Is it
>> triggered by a COW and then, all COWs happening at the same time will
>> be put together and create a big new generation?
>
> Again, periodic checkpointing is probably the wrong way to think about
> it. It would be wasteful to overwrite the superblocks every time a
> change is made; instead, btrfs may combine multiple changes into one
> generation and only update the superblocks once. I'm not sure exactly
> how btrfs decides when to write a new generation.

As Chris Samuel stated in another reply, at some point I did make the
link between BTRFS' checkpointing and NILFS'. However, I assumed BTRFS'
checkpointing was hardcoded somewhere in the code. If this is not the
case, I'm still wondering how such a decision is made, for I have not
yet found where this checkpointing happens in the code.


Regards.

---
João Eduardo Luís

* Re: Questions regarding COW-related behaviors
  2010-11-09 14:18   ` João Eduardo Luís
@ 2010-11-09 19:37     ` Goffredo Baroncelli
  0 siblings, 0 replies; 6+ messages in thread
From: Goffredo Baroncelli @ 2010-11-09 19:37 UTC (permalink / raw)
  To: João Eduardo Luís; +Cc: linux-btrfs

On Tuesday, 09 November, 2010, João Eduardo Luís wrote:
> > The old tree is discarded unless the user requested a snapshot of it.
> >
> > Every time btrfs updates the roots is a new generation. Some data
> > structures have "generation" fields, indicating the generation in
> > which they were most recently changed. This is mostly used to verify
> > the filesystem is correct, but it's also possible to scan the
> > generation fields and find out which files have changed.
>
> As Goffredo Baroncelli explained in a previous reply to my questions,
> the "find-new" command will search through keys with type
> BTRFS_EXTENT_DATA_TYPE. This command does print several changes to the
> same files throughout history since a given generation. My new question
> to this is rather simple: does BTRFS actually keep the data from this
> generations to which "find-new" has access, or is it only able to
> access information that records this changes?
>

Btrfs stores the info in a B-tree. With the exception of the leaves,
every block of the tree contains a list of (key, pointer) pairs, defined
by the struct btrfs_key_ptr:

struct btrfs_disk_key {
        __le64 objectid;
        u8 type;
        __le64 offset;
} __attribute__ ((__packed__));

struct btrfs_key_ptr {
        struct btrfs_disk_key key;
        __le64 blockptr;
        __le64 generation;
} __attribute__ ((__packed__));

The generation field contains the information you are interested in.

So I think that the correct answer to your question is the second one:
btrfs is only able to access the information that records the changes.
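
As a rough illustration of how a generation stored next to each block
pointer can be used, here is a sketch of a walk that counts the items
changed at or after a given generation. It uses simplified in-memory
stand-ins (sketch_key_ptr, sketch_item, sketch_node and count_newer are
hypothetical names, not kernel structures): the child is reached
through a C pointer instead of a blockptr, and the key is reduced to an
objectid. The pruning relies on the CoW behaviour described earlier in
the thread: changing a leaf rewrites every block up to the root, so a
pointer whose generation is older than the threshold cannot lead to a
newer item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 4

struct sketch_node;

/* Stand-in for btrfs_key_ptr: key, child reference, generation. */
struct sketch_key_ptr {
    uint64_t objectid;
    uint64_t generation;        /* generation in which the child was written */
    struct sketch_node *child;  /* in-memory replacement for blockptr */
};

/* Stand-in for a leaf item that carries its own generation. */
struct sketch_item {
    uint64_t objectid;
    uint64_t generation;
};

struct sketch_node {
    int level;                              /* 0 means leaf */
    int nritems;
    struct sketch_key_ptr ptrs[MAX_SLOTS];  /* used when level > 0 */
    struct sketch_item items[MAX_SLOTS];    /* used when level == 0 */
};

/*
 * Count the items whose generation is >= min_gen. Subtrees reached
 * through a pointer with an older generation are skipped entirely.
 * *visited is incremented once per node actually read.
 */
int count_newer(const struct sketch_node *n, uint64_t min_gen, int *visited)
{
    int found = 0;

    (*visited)++;
    for (int i = 0; i < n->nritems; i++) {
        if (n->level == 0) {
            if (n->items[i].generation >= min_gen)
                found++;
        } else if (n->ptrs[i].generation >= min_gen) {
            assert(n->ptrs[i].child != NULL);
            found += count_newer(n->ptrs[i].child, min_gen, visited);
        }
    }
    return found;
}
```

Note that the walk only reports which items are new; it gives no access
to the contents they had in an earlier generation.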

Regards
G.Baroncelli

-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E  C054 BF04 F161 3DC5 0512