All of lore.kernel.org
* BTRFS messes up snapshot LV with origin
@ 2014-11-16 21:35 MegaBrutal
  2014-11-17  1:42 ` Duncan
  0 siblings, 1 reply; 64+ messages in thread
From: MegaBrutal @ 2014-11-16 21:35 UTC (permalink / raw)
  To: linux-btrfs

Hello guys,

I think you'll like this...
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429


MegaBrutal


* Re: BTRFS messes up snapshot LV with origin
  2014-11-16 21:35 BTRFS messes up snapshot LV with origin MegaBrutal
@ 2014-11-17  1:42 ` Duncan
  2014-11-17  6:59   ` Brendan Hide
  0 siblings, 1 reply; 64+ messages in thread
From: Duncan @ 2014-11-17  1:42 UTC (permalink / raw)
  To: linux-btrfs

MegaBrutal posted on Sun, 16 Nov 2014 22:35:26 +0100 as excerpted:

> Hello guys,
> 
> I think you'll like this...
> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429

UUID is an initialism for "Universally Unique IDentifier".[1]

If the UUID isn't unique, by definition, then, it can't be a UUID, and 
that's a bug in whatever is making the non-unique would-be UUID that 
isn't unique and thus cannot be a universally unique ID.  In this case 
that would appear to be LVM.

Meanwhile, if two or more devices are btrfs and have the same UUID, btrfs 
considers them part of the same filesystem, since btrfs /can/ be a multi-
device filesystem.  That's not a bug; that's the way btrfs IDs multiple 
devices as part of the same filesystem, because a UUID, by definition, 
can be relied upon to be unique, or it's no longer a UUID.  Additionally, 
the UUID is actually written into the metadata of the filesystem in such 
a way that it's /not/ a simple task to change the UUID.  Put simply, it's 
"ingrained" into the filesystem so deeply it cannot be changed, at least 
not without rewriting pretty much all the metadata.  (FWIW, a btrfs 
balance does just that, rewrite the data, metadata, or both.  However, I 
don't believe a balance plugin to change the UUID is yet available.  
You're simply not supposed to change the UUID once the filesystem is 
created.)
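
A minimal sketch of the grouping behaviour described above, assuming a scan that simply keys devices by filesystem UUID. The function name, device paths and UUID values are hypothetical and are not taken from btrfs itself; the point is only that a block-level copy lands in the same group as its origin.

```python
from collections import defaultdict


def group_by_fsid(devices):
    """Group (device_path, fsid) pairs by filesystem UUID.

    Illustrative only: any two devices reporting the same UUID are
    treated as members of one multi-device filesystem.
    """
    groups = defaultdict(list)
    for path, fsid in devices:
        groups[fsid].append(path)
    return dict(groups)


# Hypothetical scan result: the snapshot LV carries the origin's UUID.
ROOT_FSID = "11111111-2222-3333-4444-555555555555"
scanned = [
    ("/dev/vg0/root", ROOT_FSID),
    ("/dev/vg0/root-snap", ROOT_FSID),  # block-level copy of the origin
    ("/dev/sdb1", "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"),
]
groups = group_by_fsid(scanned)
for fsid, paths in groups.items():
    print(fsid, paths)
```

With these inputs the origin and its snapshot end up in one group, which is the situation a UUID-keyed design cannot tell apart from a genuine two-device filesystem.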

So if LVM snapshots duplicate a UUID, as I believe they do, then there's 
your bug, because they're breaking the definition of Universally *UNIQUE* 
ID.  That being the case, using them with btrfs is pretty essentially 
broken, because btrfs depends on UUIDs to be what they say on the label, 
actually "unique", and UUIDs are deeply enough ingrained into the very 
fabric of btrfs that it's simply not possible to change that on the btrfs 
side.

Meanwhile, since btrfs *DOES* depend on UUIDs being unique, if there's 
multiple btrfs that accidentally have the same UUID, btrfs will not 
distinguish between them and will very possibly be writing into both of 
them.  If I found myself in that situation, I'd very carefully copy all 
the data I wanted to save off the filesystem and do a new mkfs as soon as 
possible, because I would not consider the filesystem as it was at all 
stable, and I'd count myself very lucky if I got everything off the 
filesystem without damage.  In actuality, since the second device was a 
snapshot of the first, if you catch it reasonably quickly you likely 
won't have too many issues.  However, a btrfs in that condition is in an 
undefined state, and the longer it exists in that state, the more likely 
things are to go wrong, possibly VERY VERY wrong.  So if you don't 
already have backups for anything you consider valuable on that thing, 
get it off there as soon as you possibly can, and consider yourself very 
lucky if nothing's damaged as a result.

---
[1] http://en.wiktionary.org/wiki/UUID

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: BTRFS messes up snapshot LV with origin
  2014-11-17  1:42 ` Duncan
@ 2014-11-17  6:59   ` Brendan Hide
  2014-11-17  7:35     ` Daniel Dressler
                       ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Brendan Hide @ 2014-11-17  6:59 UTC (permalink / raw)
  To: linux-btrfs; +Cc: bug-grub

cc'd bug-grub@gnu.org for FYI

On 2014/11/17 03:42, Duncan wrote:
> MegaBrutal posted on Sun, 16 Nov 2014 22:35:26 +0100 as excerpted:
>
>> Hello guys,
>>
>> I think you'll like this...
>> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429
> UUID is an initialism for "Universally Unique IDentifier".[1]
>
> If the UUID isn't unique, by definition, then, it can't be a UUID, and
> that's a bug in whatever is making the non-unique would-be UUID that
> isn't unique and thus cannot be a universally unique ID.  In this case
> that would appear to be LVM.
Perhaps the right question to ask is "Where should this bug be fixed?".

TL;DR: This needs more thought and input from btrfs devs. To LVM, the 
bug is likely seen as being "out of scope". The "correct" fix probably 
lies in the ecosystem design, which requires co-operation from btrfs.

Making a snapshot in LVM is a fundamental thing - and I feel LVM, in 
making its snapshot, is doing its job "exactly as expected".

Additionally, there are other ways to get to a similar state without 
LVM: ddrescue backup, SAN snapshot, old "missing" disk re-introduced, etc.

That leaves two places where this can be fixed: grub and btrfs

Grub is already a little smart here - it avoids snapshots. But in this 
case it is relying on the UUID and only finding it in the snapshot. So 
possibly this is a bug in grub affecting the bug reporter specifically - 
but perhaps the bug is in btrfs where grub is relying on btrfs code.

Yes, I'd rather use btrfs' snapshot mechanism - but this is often a 
choice that is left to the user/admin/distro. I don't think saying "LVM 
snapshots are incompatible with btrfs" is the right way to go either.

That leaves two aspects of this issue which I view as two separate bugs:
a) Btrfs cannot gracefully handle separate filesystems that have the 
same UUID. At all.
b) Grub appears to pick the wrong filesystem when presented with two 
filesystems with the same UUID.

I feel a) is a btrfs bug.
I feel b) is a bug that is more about "ecosystem design" than grub being 
silly.

I imagine a couple of aspects that could help fix a):
- Utilise a "unique drive identifier" in the btrfs metadata (surely this 
exists already?). This way, any two filesystems will always have 
different drive identifiers *except* in cases like a ddrescue'd copy or 
a block-level snapshot. This will provide a sensible mechanism for 
"defined behaviour", preventing corruption - even if that "defined 
behaviour" is to simply give out lots of "PEBKAC" errors and panic.
- Utilise a "drive list" to ensure that two unrelated filesystems with 
the same UUID cannot get "mixed up". Yes, the user/admin would likely be 
the culprit here (perhaps a VM rollout process that always gives out the 
same UUID in all its filesystems). Again, does btrfs not already have 
something like this built-in that we're simply not utilising fully?

I'm not exactly sure of the "correct" way to fix b) except that I 
imagine it would be trivial to fix once a) is fixed.

-- 
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



* Re: BTRFS messes up snapshot LV with origin
  2014-11-17  6:59   ` Brendan Hide
@ 2014-11-17  7:35     ` Daniel Dressler
  2014-11-17  9:00       ` Brendan Hide
  2014-11-17 19:04     ` Goffredo Baroncelli
  2014-11-18  6:21     ` Chris Murphy
  2 siblings, 1 reply; 64+ messages in thread
From: Daniel Dressler @ 2014-11-17  7:35 UTC (permalink / raw)
  To: Brendan Hide; +Cc: open list:BTRFS FILE SYSTEM, bug-grub

If a UUID is not unique enough how will adding a second UUID or
"unique drive identifier" help?

A UUID only serves any purpose when it is unique. Thus duplicate UUIDs
are themselves a failure state.

The solution should be to make it harder to get into this failure
state, not to make all programs resilient against running under this
failure state. It isn't a btrfs bug that it requires Universally Unique
IDs to be universally unique.

Daniel



* Re: BTRFS messes up snapshot LV with origin
  2014-11-17  7:35     ` Daniel Dressler
@ 2014-11-17  9:00       ` Brendan Hide
  0 siblings, 0 replies; 64+ messages in thread
From: Brendan Hide @ 2014-11-17  9:00 UTC (permalink / raw)
  Cc: linux-btrfs, bug-grub

On 2014/11/17 09:35, Daniel Dressler top-posted:
> If a UUID is not unique enough how will adding a second UUID or
> "unique drive identifier" help?
A UUID is *supposed* to be unique by design. Isolated, the design is 
adequate.

But the bigger picture clearly shows the design is naive. And broken.

A second per-disk id (note I said "unique" - but I never said universal 
as in "UUID") would allow for better-defined behaviour where, presently, 
we're simply saying "current behaviour is undefined and you're likely to 
get corruption".

On the other hand, I asked already if we have IDs of some sort (how else 
do we know which disk a chunk is stored on?), thus I don't think we need 
to add anything to the format.

A simple scenario similar to the one the OP introduced:

Disk sda -> says it is UUID Z with diskid 0
Disk sdb -> says it is UUID Z with diskid 0

If we're ignoring the fact that there are two disks with the same UUID 
and diskid and it causes corruption, then the kernel is doing something 
"stupid but fixable". We have some choices:
- give a clear warning and ignore one of the disks (could just pick the 
first one - or be a little smarter and pick one based on some heuristic 
- for example extent generation number)
- give a clear error and panic

Normal multi-disk scenario:
Disk sda -> UUID Z with diskid 1
Disk sdb -> UUID Z with diskid 2

These two disks are in the same filesystem and are supposed to work 
together - no issues.

My second suggestion covers another scenario as well:

Disk sda -> UUID Z with diskid 1; root block indicates that only diskid 
1 is recorded as being part of the filesystem
Disk sdb -> UUID Z with diskid 3; root block indicates that only diskid 
3 is recorded as being part of the filesystem

Again, based on the existing featureset, it seems reasonable that this 
information should already be recorded in the fs metadata. If the 
behaviour is "undefined" and causing corruption, again the kernel is 
currently doing something "stupid but fixable". Again, we have similar 
choices:
- give a clear warning and ignore bad disk(s)
- give a clear error and panic
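
The "warn and ignore one of the disks" choice in the scenarios above could be sketched roughly as follows. This is an illustration under stated assumptions, not how the kernel behaves: the `Superblock` record, its `generation` field and the `resolve_duplicates` helper are all hypothetical names, and the tie-break (highest generation wins, first seen wins on a tie) is just one possible heuristic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Superblock:
    path: str
    fsid: str        # filesystem UUID
    devid: int       # per-device id within that filesystem
    generation: int  # hypothetical tie-breaker value


def resolve_duplicates(superblocks):
    """Split devices into (accepted, ignored).

    Devices sharing both fsid and devid are assumed to be clones of
    one another; only the one with the highest generation is kept.
    """
    by_key = defaultdict(list)
    for sb in superblocks:
        by_key[(sb.fsid, sb.devid)].append(sb)

    accepted, ignored = [], []
    for candidates in by_key.values():
        winner = max(candidates, key=lambda sb: sb.generation)
        accepted.append(winner)
        ignored.extend(sb for sb in candidates if sb is not winner)
    return accepted, ignored


# Clone scenario: same UUID and same diskid on both disks.
sda = Superblock("/dev/sda", "Z", 0, 120)
sdb = Superblock("/dev/sdb", "Z", 0, 100)
accepted, ignored = resolve_duplicates([sda, sdb])
for sb in ignored:
    print(f"warning: ignoring {sb.path}, duplicate of fsid {sb.fsid} devid {sb.devid}")

# Normal multi-disk scenario: same UUID, different diskids.
multi_ok, multi_ignored = resolve_duplicates(
    [Superblock("/dev/sda", "Z", 1, 50), Superblock("/dev/sdb", "Z", 2, 50)]
)
```

In the clone scenario one disk is accepted and the other is reported and ignored; in the normal multi-disk scenario both are accepted and nothing is ignored.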



-- 
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



* Re: BTRFS messes up snapshot LV with origin
  2014-11-17  6:59   ` Brendan Hide
  2014-11-17  7:35     ` Daniel Dressler
@ 2014-11-17 19:04     ` Goffredo Baroncelli
       [not found]       ` <CAE8gLh=VubBbZdeKTAuWRjOxPF7C+ouUeeVvmGfT2ckYWGhQVA@mail.gmail.com>
  2014-11-21  4:24       ` Zygo Blaxell
  2014-11-18  6:21     ` Chris Murphy
  2 siblings, 2 replies; 64+ messages in thread
From: Goffredo Baroncelli @ 2014-11-17 19:04 UTC (permalink / raw)
  To: Brendan Hide, linux-btrfs; +Cc: bug-grub

On 2014-11-17 07:59, Brendan Hide wrote:
> 
> That leaves two aspects of this issue which I view as two separate bugs:
> a) Btrfs cannot gracefully handle separate filesystems that have the same UUID. At all.
> b) Grub appears to pick the wrong filesystem when presented with two filesystems with the same UUID.
> 
> I feel a) is a btrfs bug.
> I feel b) is a bug that is more about "ecosystem design" than grub being silly.

Regarding a)
IIRC, btrfs collects filesystem information by UUID; if two 
filesystems have the same UUID (as in the LVM-snapshot case), the
last filesystem discovered overwrites the first one.

Filesystem discovery is done in user space, so it should be simple
to skip a filesystem on an LVM snapshot.

Regarding b)
I am a bit confused: if I understood correctly, the root filesystem was
picked from an LVM snapshot, so grub-probe *correctly* reported that
the root device is the snapshot.
The problem was in filesystem discovery during boot: the *real* device
was scanned first, then the LVM snapshot; the latter
overwrote the former, so the system booted from the LVM snapshot.

My conclusion is that we should improve the btrfs scan so that:
- in the udev rules, a partition that is an LVM snapshot is not
scanned by "btrfs dev scan" by default
- "btrfs dev scan" skips LVM snapshots during partition discovery.
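
A rough sketch of that proposal, written as a user-space filter. It assumes the list of snapshot LVs comes from text shaped like the output of `lvs --noheadings --separator , -o lv_path,origin`, where a non-empty origin column marks a snapshot. The helper names, paths and UUID are hypothetical, and the registry is only a model of the "last one discovered wins" behaviour described above.

```python
def parse_lvs_origins(lvs_output):
    """Return the set of LV paths that have an origin, i.e. snapshots.

    Expects one 'lv_path,origin' pair per line; the origin column is
    empty for ordinary LVs.
    """
    snapshots = set()
    for line in lvs_output.splitlines():
        line = line.strip()
        if not line:
            continue
        lv_path, _, origin = line.partition(",")
        if origin.strip():
            snapshots.add(lv_path.strip())
    return snapshots


def scan(devices, snapshot_paths):
    """Model a UUID-keyed registry in which later devices overwrite
    earlier ones, skipping any device known to be a snapshot."""
    registry = {}
    for path, fsid in devices:
        if path in snapshot_paths:
            continue  # proposed behaviour: do not register snapshots
        registry[fsid] = path
    return registry


# Hypothetical sample data.
sample_lvs = "  /dev/vg0/root,\n  /dev/vg0/root-snap,root\n"
fsid = "11111111-2222-3333-4444-555555555555"
devices = [("/dev/vg0/root", fsid), ("/dev/vg0/root-snap", fsid)]

unfiltered = scan(devices, set())                        # snapshot wins
filtered = scan(devices, parse_lvs_origins(sample_lvs))  # origin wins
print(unfiltered[fsid], filtered[fsid])
```

Without the filter the snapshot overwrites the origin in the registry; with it, only the origin is registered.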

BR
G.Baroncelli



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Fwd: BTRFS messes up snapshot LV with origin
       [not found]       ` <CAE8gLh=VubBbZdeKTAuWRjOxPF7C+ouUeeVvmGfT2ckYWGhQVA@mail.gmail.com>
@ 2014-11-17 19:45         ` MegaBrutal
  2014-11-17 20:32           ` Goffredo Baroncelli
  2014-11-18  6:16           ` Chris Murphy
  0 siblings, 2 replies; 64+ messages in thread
From: MegaBrutal @ 2014-11-17 19:45 UTC (permalink / raw)
  To: kreijack, Brendan Hide, linux-btrfs

2014-11-17 20:04 GMT+01:00 Goffredo Baroncelli <kreijack@inwind.it>:
>
> Regarding b)
> I am bit confused: if I understood correctly, the root filesystem was
> picked from a LVM-snapshot, so grub-probe *correctly* reported that
> the root device is the snapshot.


This is not what happens. The system doesn't even get a reboot when
the mix-up happens.

You boot from the original device, create an LVM-snapshot*, and mount
starts to report the snapshot as the root device, while in fact it
isn't.

I know my initial descriptions of the bug were misleading, as I myself
didn't know what the heck was going on.

From this point, please take these comments as reference:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429/comments/2
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429/comments/4


* I know I shouldn't make an LVM-snapshot of a mounted file system,
but this is not the point.


P.S.: E-mail sent twice, as lists didn't accept it in HTML. Plus I'm
not on the GRUB list, and can't post there.


* Re: Fwd: BTRFS messes up snapshot LV with origin
  2014-11-17 19:45         ` Fwd: " MegaBrutal
@ 2014-11-17 20:32           ` Goffredo Baroncelli
  2014-11-18  6:16           ` Chris Murphy
  1 sibling, 0 replies; 64+ messages in thread
From: Goffredo Baroncelli @ 2014-11-17 20:32 UTC (permalink / raw)
  To: MegaBrutal, Brendan Hide, linux-btrfs

On 2014-11-17 20:45, MegaBrutal wrote:
> * I know I shouldn't make an LVM-snapshot of a mounted file system,
> but this is not the point.

This should be supported for filesystems that support freezing.

See http://stackoverflow.com/questions/1940093/lvm-snapshot-of-mounted-filesystem


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: BTRFS messes up snapshot LV with origin
  2014-11-17 19:45         ` Fwd: " MegaBrutal
  2014-11-17 20:32           ` Goffredo Baroncelli
@ 2014-11-18  6:16           ` Chris Murphy
  2014-11-18 15:42             ` Phillip Susi
  1 sibling, 1 reply; 64+ messages in thread
From: Chris Murphy @ 2014-11-18  6:16 UTC (permalink / raw)
  Cc: Btrfs BTRFS


On Nov 17, 2014, at 12:45 PM, MegaBrutal <megabrutal@gmail.com> wrote:

> 2014-11-17 20:04 GMT+01:00 Goffredo Baroncelli <kreijack@inwind.it>:
>> 
>> Regarding b)
>> I am bit confused: if I understood correctly, the root filesystem was
>> picked from a LVM-snapshot, so grub-probe *correctly* reported that
>> the root device is the snapshot.
> 
> 
> This is not what happens. The system doesn't even get a reboot when
> the mix-up happens.
> 
> You boot from the original device, create an LVM-snapshot*, and mount
> starts to report the snapshot as the root device, while in fact it
> isn’t.

If fstab specifies rootfs as UUID, and there are two volumes with the same UUID, it’s now ambiguous which one at boot time is the intended rootfs. It’s no different than the days of /dev/sdXY, where X would change designations between boots; that ambiguity is why we went to UUID. 

So we kinda need a way to distinguish derivative volumes. Maybe XFS and ext4 could easily change the volume UUID, but my vague recollection is this is difficult on Btrfs? So that led me to the idea of a way to create an on-the-fly (but consistent) “virtual volume UUID” maybe based on a hash of both the LVM LV and fs volume UUID.
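
One way to picture such a "virtual volume UUID" is a name-based (version 5) UUID derived from both identifiers. This is only a sketch of the idea: the function name, the choice of `uuid.NAMESPACE_OID` as the namespace, the `/` separator and the sample identifiers are all arbitrary assumptions, not an existing LVM or btrfs mechanism.

```python
import uuid


def virtual_volume_uuid(lv_uuid: str, fs_uuid: str) -> uuid.UUID:
    """Derive a stable identifier from the LV UUID plus the fs UUID.

    The same pair always hashes to the same value, while a snapshot LV
    (different LV UUID, same fs UUID) gets a different one.
    """
    return uuid.uuid5(uuid.NAMESPACE_OID, f"{lv_uuid}/{fs_uuid}")


# Hypothetical identifiers; LV UUIDs are treated as opaque strings.
fs_uuid = "11111111-2222-3333-4444-555555555555"
origin_id = virtual_volume_uuid("lv-origin-example", fs_uuid)
snapshot_id = virtual_volume_uuid("lv-snapshot-example", fs_uuid)
print(origin_id, snapshot_id)
```

Because the derivation is deterministic, nothing would need to be stored on disk; the cost is that anything consuming the identifier has to know about the LV layer.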


Chris Murphy


* Re: BTRFS messes up snapshot LV with origin
  2014-11-17  6:59   ` Brendan Hide
  2014-11-17  7:35     ` Daniel Dressler
  2014-11-17 19:04     ` Goffredo Baroncelli
@ 2014-11-18  6:21     ` Chris Murphy
  2014-11-18 12:13       ` Duncan
  2014-11-18 20:01       ` Goffredo Baroncelli
  2 siblings, 2 replies; 64+ messages in thread
From: Chris Murphy @ 2014-11-18  6:21 UTC (permalink / raw)
  Cc: Btrfs BTRFS, bug-grub


On Nov 16, 2014, at 11:59 PM, Brendan Hide <brendan@swiftspirit.co.za> wrote:

> cc'd bug-grub@gnu.org for FYI
> 
> On 2014/11/17 03:42, Duncan wrote:
>> MegaBrutal posted on Sun, 16 Nov 2014 22:35:26 +0100 as excerpted:
>> 
>>> Hello guys,
>>> 
>>> I think you'll like this...
>>> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429
>> UUID is an initialism for "Universally Unique IDentifier".[1]
>> 
>> If the UUID isn't unique, by definition, then, it can't be a UUID, and
>> that's a bug in whatever is making the non-unique would-be UUID that
>> isn't unique and thus cannot be a universally unique ID.  In this case
>> that would appear to be LVM.
> Perhaps the right question to ask is "Where should this bug be fixed?”.
> 
> TL;DR: This needs more thought and input from btrfs devs. To LVM, the bug is likely seen as being "out of scope". The "correct" fix probably lies in the ecosystem design, which requires co-operation from btrfs.

I think the libblkid folks should be brought into this discussion to see what their take on this is.

LVM conventional snapshots causing this problem is rare / self-limiting as they’re short lived. LVM thinp snapshots mean there can be dozens, and they can sanely endure for the life of the thin pool.

Effectively we have derivative volumes. At snapshot time, should we:
a.) change the fs volume UUID;
b.) have each fs add an additional/secondary volume UUID;
c.) have each fs add a derivative/version indicator, i.e. 0 at mkfs time and maybe an epoch time stamp at snapshot time; or
d.) not use the fs UUID to identify volume uniqueness at all, and instead use a virtual volume UUID which is externally determined based on whether the fs is on an LV snapshot?



> Making a snapshot in LVM is a fundamental thing - and I feel LVM, in making its snapshot, is doing its job "exactly as expected".
> 
> Additionally, there are other ways to get to a similar state without LVM: ddrescue backup, SAN snapshot, old "missing" disk re-introduced, etc.

Sure, and likewise a self-limiting problem. LVM thinp snapshots actually do make this confusion of multiple instances of the same volume UUID much, much more likely.

> 
> That leaves two places where this can be fixed: grub and btrfs

The GRUB os-prober and grub-mkconfig paradigm, I think, needs to come to an end. The grub.cfg is not supposed to be externally modified; the design is that os-prober + grub-mkconfig obliterate it and generate a whole new one from scratch anytime the system boot state changes, i.e. anytime a new kernel is added.

GRUB isn’t good at OS discovery now, I think it should just be abandoned. It can have its grub.cfg generated to do whatever complex things are needed, but the individual boot menu entries should exist as drop-in scripts managed by whatever is changing the OS boot state. This is the fundamental part of the two bootloaderspecs:
http://www.freedesktop.org/wiki/Specifications/BootLoaderSpec/
http://www.freedesktop.org/wiki/MatthewGarrett/BootLoaderSpec/

And it’s a fundamental part of OSTree which supports multiple bootable trees on any filesystem, and currently uses a variation on bootloaderspec drop-in scripts to inform GRUB how to boot such a system:
https://wiki.gnome.org/action/show/Projects/OSTree?action=show&redirect=OSTree




> 
> Grub is already a little smart here - it avoids snapshots. But in this case it is relying on the UUID and only finding it in the snapshot. So possibly this is a bug in grub affecting the bug reporter specifically - but perhaps the bug is in btrfs where grub is relying on btrfs code.
> 
> Yes, I'd rather use btrfs' snapshot mechanism - but this is often a choice that is left to the user/admin/distro. I don't think saying "LVM snapshots are incompatible with btrfs" is the right way to go either.
> 
> That leaves two aspects of this issue which I view as two separate bugs:
> a) Btrfs cannot gracefully handle separate filesystems that have the same UUID. At all.
> b) Grub appears to pick the wrong filesystem when presented with two filesystems with the same UUID.
> 
> I feel a) is a btrfs bug.
> I feel b) is a bug that is more about "ecosystem design" than grub being silly.

I think we’re well past the expiration date on grub.cfg. A line should be drawn in the sand to deprecate routine use of os-prober + grub-mkconfig and move to drop-in scripts written by whatever the distro presumes will be responsible for managing which “tree” will be booted or offered as a boot option. All GRUB needs to learn is how to use that drop-in script file format.

Ergo, just because I’ve snapshotted my root does not mean grub-mkconfig should be creating boot entries for it. But whatever userspace tool I’m using to do those snapshots (ostree, snapper, whatever the GNOME folks might come up with) should be the thing that creates the boot entry script; and as simple as this 2-4 line script should be, it could even be hand-written by a user, unlike the current grub.cfg file format.
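
To give a feel for how small such a drop-in entry can be, here is a sketch of a snapshot tool writing one. The key names (title, linux, initrd, options) follow the freedesktop Boot Loader Specification entry format; the helper name, the title, the kernel and initrd paths and the kernel command line are hypothetical examples.

```python
def render_boot_entry(title, linux, initrd, options):
    """Render one drop-in boot menu entry as 'key value' lines."""
    lines = [
        f"title {title}",
        f"linux {linux}",
        f"initrd {initrd}",
        f"options {options}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical entry a snapshot tool might write for one bootable tree.
entry = render_boot_entry(
    title="Example Linux (snapshot 2014-11-18)",
    linux="/vmlinuz-example",
    initrd="/initramfs-example.img",
    options="root=/dev/vg0/root-snap ro",
)
print(entry, end="")
```

The tool that creates the snapshot would write this text into its own file in the entries directory and delete that file when the snapshot goes away, so no global configuration has to be regenerated.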

Further I’d like to get more traction from the syslinux/extlinux folks to support the same drop-in boot file format. There’s no good reason for us to not support a single file format for boot menu entries. 


Chris Murphy


* Re: BTRFS messes up snapshot LV with origin
  2014-11-18  6:21     ` Chris Murphy
@ 2014-11-18 12:13       ` Duncan
  2014-11-18 20:01       ` Goffredo Baroncelli
  1 sibling, 0 replies; 64+ messages in thread
From: Duncan @ 2014-11-18 12:13 UTC (permalink / raw)
  To: linux-btrfs; +Cc: bug-grub

Chris Murphy posted on Mon, 17 Nov 2014 23:21:57 -0700 as excerpted:

> I think we’re well past the expiration date on grub.cfg, a line should
> be drawn in the sand to deprecate routine use of os-prober +
> grub-mkconfig,
> and move to drop-in scripts by whatever the distro presumes will be
> responsible for managing what “tree” will be booted or will be offered
> as a boot option, all GRUB needs to learn is how to use that drop in
> script file format.
> 
> Ergo just because I’ve snapshot my root does not mean grub-mkconfig
> should be creating boot entries for it. But whatever usespace tool I’m
> using to do those snapshots (ostree, snapper, whatever the GNOME folks
> might come up with) should be the thing that creates the boot entry
> script; or as simple as this 2-4 line script should be, even hand done
> by a user, unlike the current grub.cfg file format.

FWIW, I hand-edit my grub.cfg here, grub-probe was taking /forever/ on my 
system back when I upgraded to grub2, and the "direct drive" 
configuration of direct grub.cfg editing was /far/ more flexible, or at 
least /far/ easier to learn how to do what I wanted to do than to figure 
out how to do it thru the translation layer, in any case.

The configuration is advanced enough it has individual choices to set 
standard init and init=/bin/bash, current/fallback/stable kernels, 
current/backup/second-backup roots, etc, plus a choice to interactively 
type in additional kernel commandline options, loading those choices into 
grub variables as I change them, then another choice to boot using the 
loaded variables to select the kernel and setup the kernel commandline.  
The initial grub.cfg has the default boot option, plus others that load 
either a troubleshooting menu or the backups choices menu, from separate 
included config files, as necessary.  Just /thinking/ about trying to do 
that via the cumbersome translation layer gives me a headache, and since 
I had to learn the grub scripting layer language to set it up anyway, I 
might as well just write and troubleshoot it in that directly rather than 
trying to figure out how to get the translation layer to write it, and 
then have to troubleshoot BOTH the translation layer and the lower level 
script.

Then I deleted grub-probe and grub-mkconfig so they couldn't be run 
accidentally with unconfigured/default translation-level options to undo 
all my hard work, and set a mask on them so updating the package wouldn't 
reinstall them.

So deprecate/kill os-prober and grub-mkconfig if you want, but grub.cfg 
needs to stay working!

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-18  6:16           ` Chris Murphy
@ 2014-11-18 15:42             ` Phillip Susi
  2014-11-18 19:17               ` Chris Murphy
                                 ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Phillip Susi @ 2014-11-18 15:42 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/18/2014 1:16 AM, Chris Murphy wrote:
> If fstab specifies rootfs as UUID, and there are two volumes with
> the same UUID, it’s now ambiguous which one at boot time is the
> intended rootfs. It’s no different than the days of /dev/sdXY where
> X would change designations between boots = ambiguity and why we
> went to UUID.

He already said he has NOT rebooted, so there is no way that the
snapshot has actually been mounted, even if it were UUID confusion.

> So we kinda need a way to distinguish derivative volumes. Maybe
> XFS and ext4 could easily change the volume UUID, but my vague 
> recollection is this is difficult on Btrfs? So that led me to the 
> idea of a way to create an on-the-fly (but consistent) “virtual 
> volume UUID” maybe based on a hash of both the LVM LV and fs
> volume UUID.

When using LVM, you should be referring to the volume by the LVM name
rather than UUID.  LVM names are stable, and don't have the duplicate
uuid problem.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUa2j4AAoJEI5FoCIzSKrwvywH/3yS25MAIwsGfIwBfCrNN5Qo
NlBttcUcrYgOD/nQHEuulHdilWrvz3q6jGwVL9W8MQsHm0Ah5dMatT5e5zr1DSNC
ZqSEXSE8jsYJu99FUWevxO7wtb94ioKa+OF1u0zsaA5yQUdaj5smPqK3iUfskUhs
jE/vsJmws5iBv0dxnZI/6n3YqOB1Qck4PcMItRj8xvZQ0GjARIVw36pgJnmboGfY
vWRmUXnTeLMu9ilHWhqNUIh3lTTUvRdaYoZtTr6eYh9sIntDCegN71WGmO8FfdjP
vXhikg7Yx7FhkhxAl1X2NzM93d7fUSQDeQfTLYLMDbbTV/n2HwcoZ6G2+IQEJnQ=
=3Lv1
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-18 15:42             ` Phillip Susi
@ 2014-11-18 19:17               ` Chris Murphy
  2014-11-18 20:17                 ` Phillip Susi
  2014-11-18 20:41               ` MegaBrutal
  2014-11-19  1:29               ` Robert White
  2 siblings, 1 reply; 64+ messages in thread
From: Chris Murphy @ 2014-11-18 19:17 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Btrfs BTRFS


On Nov 18, 2014, at 8:42 AM, Phillip Susi <psusi@ubuntu.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 11/18/2014 1:16 AM, Chris Murphy wrote:
>> If fstab specifies rootfs as UUID, and there are two volumes with
>> the same UUID, it’s now ambiguous which one at boot time is the
>> intended rootfs. It’s no different than the days of /dev/sdXY where
>> X would change designations between boots = ambiguity and why we
>> went to UUID.
> 
> He already said he has NOT rebooted, so there is no way that the
> snapshot has actually been mounted, even if it were UUID confusion.
> 
>> So we kinda need a way to distinguish derivative volumes. Maybe
>> XFS and ext4 could easily change the volume UUID, but my vague 
>> recollection is this is difficult on Btrfs? So that led me to the 
>> idea of a way to create an on-the-fly (but consistent) “virtual 
>> volume UUID” maybe based on a hash of both the LVM LV and fs
>> volume UUID.
> 
> When using LVM, you should be referring to the volume by the LVM name
> rather than UUID.  LVM names are stable, and don't have the duplicate
> uuid problem.
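
A minimal sketch of the "virtual volume UUID" idea quoted above, assuming a name-based (SHA-1) UUID built from the filesystem UUID and an LV identifier. The helper `virtual_volume_uuid`, the LV names, and the sample UUID are all hypothetical; neither LVM nor btrfs offers anything like this today.

```python
import uuid


def virtual_volume_uuid(lv_id: str, fs_uuid: str) -> uuid.UUID:
    """Derive a stable ID from an LV identifier and a filesystem UUID.

    Hypothetical helper for illustration only.
    """
    # uuid5 hashes (namespace, name) with SHA-1, so the same pair always
    # gives the same result, while a different LV gives a different one.
    return uuid.uuid5(uuid.UUID(fs_uuid), lv_id)


fs = "0c7a1b2e-3d4f-4a5b-8c6d-7e8f9a0b1c2d"  # made-up filesystem UUID
origin = virtual_volume_uuid("vg0/root", fs)
snapshot = virtual_volume_uuid("vg0/root-snap", fs)
print(origin != snapshot)  # True
```

The result is consistent across boots as long as the LV identifier is, which is the property the quoted proposal asks for.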

What if you have a Btrfs raid1 volume using two LV’s and then snapshot both LV’s?

Of course I’d specify one of the devices by VG-LV name. But Btrfs finds additional devices itself, it doesn’t support explicitly naming additional member devices. And in this example, there are two identical candidates, so it’s ambiguous to Btrfs which one to use. And further it’s unknown to the user which one Btrfs chose because neither mount, nor /proc/mounts right now shows anything other than the first device that’s mounted. So it’s using one of those two VG-LV’s automatically but not informing us which one.

I think there’s some metadata that can be set on each LV to control whether it’s automatically activated (e.g. at boot time), so I think the thing to do would be to make sure the snapshot LV’s are not activated. Their UUID’s then shouldn’t be visible to Btrfs, and it won’t automatically discover and use the wrong LV. But I haven’t tested this.


Chris Murphy

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-18  6:21     ` Chris Murphy
  2014-11-18 12:13       ` Duncan
@ 2014-11-18 20:01       ` Goffredo Baroncelli
  1 sibling, 0 replies; 64+ messages in thread
From: Goffredo Baroncelli @ 2014-11-18 20:01 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, bug-grub

On 2014-11-18 07:21, Chris Murphy wrote:
> Ergo just because I’ve snapshot my root does not mean grub-mkconfig
> should be creating boot entries for it.

I find this a useful feature: a snapshot of / is taken in order to roll
back some changes, so why not let grub boot (the kernel) from it?

Anyway, I find grub-mkconfig quite useful for a "standard" user.
For more advanced use cases, editing grub.cfg by hand may be possible.

BR
G.Baroncelli



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-18 19:17               ` Chris Murphy
@ 2014-11-18 20:17                 ` Phillip Susi
  2014-11-19  2:54                   ` Chris Murphy
  0 siblings, 1 reply; 64+ messages in thread
From: Phillip Susi @ 2014-11-18 20:17 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/18/2014 2:17 PM, Chris Murphy wrote:
> What if you have a Btrfs raid1 volume using two LV’s and then 
> snapshot both LV’s?

That's even more silly than a single lvm snapshot under btrfs.  Just
don't do it.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUa6lsAAoJEI5FoCIzSKrwzicIAJLXrsVpWxsI+wq8xGGumwoy
s2QGUZ3Soknr30FAeZWFpS7diXOuuOWXjaObTlFMcUAqGE134d4I3W+k2PxejHns
AfdKSdyiactcndea6aw5zBGzdk5N5bLaoCaS8GSeKVdIMWlLFh+lMzHX2q6tC+cS
8RWJI7GYk193RmWkHKUhX57J9tnP7eJmXTkqdRJIDXmaaceYLR8057LZbNsuurFA
h0ZptXKFUhp6dsEMV5JPnxKZ9l62ZNcL5zEE3D7sVU20ll/YEP7UHOYY/JTGwdLN
KWOUIJ89gM6LqWTz2gFuz8JhPhmZCIKrpN6Fu/pKDHYSrdYyazZV/D6P/dX5TUA=
=3LdX
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-18 15:42             ` Phillip Susi
  2014-11-18 19:17               ` Chris Murphy
@ 2014-11-18 20:41               ` MegaBrutal
  2014-11-19  1:29               ` Robert White
  2 siblings, 0 replies; 64+ messages in thread
From: MegaBrutal @ 2014-11-18 20:41 UTC (permalink / raw)
  To: linux-btrfs

2014-11-18 16:42 GMT+01:00 Phillip Susi <psusi@ubuntu.com>:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 11/18/2014 1:16 AM, Chris Murphy wrote:
> > If fstab specifies rootfs as UUID, and there are two volumes with
> > the same UUID, it’s now ambiguous which one at boot time is the
> > intended rootfs. It’s no different than the days of /dev/sdXY where
> > X would change designations between boots = ambiguity and why we
> > went to UUID.
>
> He already said he has NOT rebooted, so there is no way that the
> snapshot has actually been mounted, even if it were UUID confusion.
>

That's right.

Anyway, I've built a system to reproduce the bug. You can download the
image and run it with KVM or other virtualization technology.
Instructions are straightforward – if you start the VM, you'll know
what to do, and you'll see what I was talking about.

http://undead.megabrutal.com/kvm-reproduce-1391429.img.xz

Download size: 113 MB; Unpacked image size: 2 GB.


> > So we kinda need a way to distinguish derivative volumes. Maybe
> > XFS and ext4 could easily change the volume UUID, but my vague
> > recollection is this is difficult on Btrfs? So that led me to the
> > idea of a way to create an on-the-fly (but consistent) “virtual
> > volume UUID” maybe based on a hash of both the LVM LV and fs
> > volume UUID.
>
> When using LVM, you should be referring to the volume by the LVM name
> rather than UUID.  LVM names are stable, and don't have the duplicate
> uuid problem.
>

I use LVM names to identify volumes. I initially suspected a UUID
confusion, because I thought grub-probe looks for the volume by
UUID. But now I think the problem has nothing to do with UUIDs.
I probably should have looked deeper into the problem before I
hypothesized.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-18 15:42             ` Phillip Susi
  2014-11-18 19:17               ` Chris Murphy
  2014-11-18 20:41               ` MegaBrutal
@ 2014-11-19  1:29               ` Robert White
  2014-11-19  3:37                 ` Duncan
  2 siblings, 1 reply; 64+ messages in thread
From: Robert White @ 2014-11-19  1:29 UTC (permalink / raw)
  To: Phillip Susi, Chris Murphy; +Cc: Btrfs BTRFS

On 11/18/2014 07:42 AM, Phillip Susi wrote:

> On 11/18/2014 1:16 AM, Chris Murphy wrote:
>> (stuff about UUIDs and LVM snapshots).
 > (suggestion to use LVM paths instead).

This is also an XFS+LVM+LVM_Snapshot problem going back to at least 
2009. It's inherent to the block-device-level snapshot phenomenon.

q.v. http://www.miljan.org/main/2009/11/16/lvm-snapshots-and-xfs/ et al

In XFS you attack the snapshot with a command to regenerate the UUID as 
soon as you take the snapshot. I don't think there is a "regenerate all 
my UUIDs" command for BTRFS.

There are other places this can bone you, like old-format mdadm mirrors, 
where the metadata was only at the end of the partition so you could 
accidentally see two copies of your RAID1 file system if you hadn't 
built/started the array.

There is no really good way to prevent this other than "being really 
careful" or "not doing that at all".

Sorry. Cost of doing business. Cheers...
Rob.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-18 20:17                 ` Phillip Susi
@ 2014-11-19  2:54                   ` Chris Murphy
  2014-11-19 15:20                     ` Phillip Susi
  0 siblings, 1 reply; 64+ messages in thread
From: Chris Murphy @ 2014-11-19  2:54 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Btrfs BTRFS


On Nov 18, 2014, at 1:17 PM, Phillip Susi <psusi@ubuntu.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 11/18/2014 2:17 PM, Chris Murphy wrote:
>> What if you have a Btrfs raid1 volume using two LV’s and then 
>> snapshot both LV’s?
> 
> That's even more silly than a single lvm snapshot under btrfs.  Just
> don't do it.

Why is it silly? Btrfs on a thin volume has a practical use case aside from just being thinly provisioned: its snapshots are block device based, not merely those of an fs tree.

Looks like lvm.conf does have a way to affect LV autoactivation, and there may be another way to achieve this also. Right after the snapshot(s) they’d need to have their autoactivation disabled to avoid UUID confusion.
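
One way to read "another way to achieve this" is LVM's per-LV activation-skip flag, set with `lvchange --setactivationskip y`. The sketch below only builds and prints that command as a dry run; the VG and LV names are placeholders, and whether this fully keeps a snapshot out of btrfs device scanning in this scenario is an untested assumption.

```python
import shlex
from typing import List


def skip_activation_cmd(vg: str, lv: str) -> List[str]:
    """Build the lvchange call that sets the activation-skip flag on an LV."""
    return ["lvchange", "--setactivationskip", "y", f"{vg}/{lv}"]


# Dry run: print the command rather than executing it (needs root and LVM).
cmd = skip_activation_cmd("vg0", "root-snap")  # placeholder names
print(shlex.join(cmd))
```

An LV flagged this way is skipped by normal activation and has to be activated explicitly when it is actually wanted.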


Chris Murphy

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-19  1:29               ` Robert White
@ 2014-11-19  3:37                 ` Duncan
  0 siblings, 0 replies; 64+ messages in thread
From: Duncan @ 2014-11-19  3:37 UTC (permalink / raw)
  To: linux-btrfs

Robert White posted on Tue, 18 Nov 2014 17:29:12 -0800 as excerpted:

> On 11/18/2014 07:42 AM, Phillip Susi wrote:
> 
>> On 11/18/2014 1:16 AM, Chris Murphy wrote:
>>> (stuff about UUIDs and LVM snapshots).
>  > (suggestion to use LVM paths instead).
> 
> This is also an XFS+LVM+LVM_Snapshot problem going back to at least
> 2009. It's inherent to the block-device-level snapshot phenomenon.
> 
> q.v. http://www.miljan.org/main/2009/11/16/lvm-snapshots-and-xfs/ et al
> 
> In XFS you attack the snapshot with a command to regenerate the UUID as
> soon as you take the snapshot. I don't think there is a "regenerate all
> my UUIDs" command for BTRFS.

Which was part of my point in my reply.  Btrfs embeds the UUID in the 
metadata deeply enough that it's no simple task to simply change it to 
something else and be done.  It's quite a complicated operation for any 
(future, none current) tool that attempts it, with the most likely 
candidate being an option to btrfs balance or the like, but even then, 
we're looking at a timescale of hours for spinning rust.

So while it's possible in theory, in practice such a regenerate-all UUIDs 
command for btrfs isn't available yet, and given the time involved in 
rewriting all those metadata UUIDs to something else, during which the 
filesystem's in a critically unstable state, and the limited use-case 
with other alternatives, such a tool isn't all /that/ practical in any 
case.

Making an entirely new btrfs and doing a btrfs send/receive for the 
duplicate, or using btrfs snapshots, is a more practical way to go.  (Tho 
watch out for the implications of btrfs snapshots on nocow files!)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-19  2:54                   ` Chris Murphy
@ 2014-11-19 15:20                     ` Phillip Susi
  2014-11-19 18:35                       ` Chris Murphy
  2014-11-21  4:28                       ` Zygo Blaxell
  0 siblings, 2 replies; 64+ messages in thread
From: Phillip Susi @ 2014-11-19 15:20 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/18/2014 9:54 PM, Chris Murphy wrote:
> Why is it silly? Btrfs on a thin volume has practical use case
> aside from just being thinly provisioned, its snapshots are block
> device based, not merely that of an fs tree.

Umm... because one of the big selling points of btrfs is that it is in
a much better position to make snapshots being aware of the fs tree
rather than doing it in the block layer.

So it is kind of silly in the first place to be using lvm snapshots
under btrfs, but it is doubly silly to use lvm for snapshots, and
btrfs for the mirroring rather than lvm.  Pick one layer and use it
for both functions.  Even if that is lvm, then it should also be
handling the mirroring.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUbLUxAAoJEI5FoCIzSKrwh0oH/3TZ2oo8u2BjHYO3b0x8800/
LFkmGFWrZFSnAvtWuN5B1WlhMXku4dxLRXz14fJKFp3fNmnYRNVvw3tu9btvsBsC
sZdwLaKwKPHTK8RS+QCI2pZPX+cGB+F7/z9PCHrzIzzCKk/4SvnJ76e2nnZFpY1m
Md3f1BCHEVUPMMXbqv6Ry6v7PDs/8bx8WITYyAL9uh3tjh0dXQsjbZJn5u4XDitS
/CoE8eX4rf1vc7qHI4K56TtArCcXQxAHcC56fXmcmS03bVhAkkJ5Z+/uwi6+TkJe
55rMFCd7UFy9pwKha3Q2flJHtDYG6ns7Njyff6BSL9Yzq7tHh4wLk1H3XxaOCP8=
=ktv/
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-19 15:20                     ` Phillip Susi
@ 2014-11-19 18:35                       ` Chris Murphy
  2014-11-19 19:23                         ` Phillip Susi
  2014-11-21  4:28                       ` Zygo Blaxell
  1 sibling, 1 reply; 64+ messages in thread
From: Chris Murphy @ 2014-11-19 18:35 UTC (permalink / raw)
  To: Btrfs BTRFS

On Wed, Nov 19, 2014 at 8:20 AM, Phillip Susi <psusi@ubuntu.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 11/18/2014 9:54 PM, Chris Murphy wrote:
>> Why is it silly? Btrfs on a thin volume has practical use case
>> aside from just being thinly provisioned, its snapshots are block
>> device based, not merely that of an fs tree.
>
> Umm... because one of the big selling points of btrfs is that it is in
> a much better position to make snapshots being aware of the fs tree
> rather than doing it in the block layer.

This is why we have fsfreeze before taking block level snapshots. And
I point out that consistent snapshots with Btrfs have posed challenges
too, there's a recent fstest "snapshoting after file write + truncate"
for this reason.

A block layer snapshot will snapshot the entire file system, not just
one tree. We don't have a way in Btrfs to snapshot the entire volume.
Considering how things still aren't exactly stable yet, in particular
with many snapshots, it's not unreasonable to want to freeze then
snapshot the entire volume before doing some possibly risky testing or
usage where even a Btrfs snapshot doesn't protect your entire volume
should things go wrong.

>
>
> So it is kind of silly in the first place to be using lvm snapshots
> under btrfs, but it is doubly silly to use lvm for snapshots, and
> btrfs for the mirroring rather than lvm.  Pick one layer and use it
> for both functions.  Even if that is lvm, then it should also be
> handling the mirroring.


Thin volumes are more efficient. And the user creating them doesn't
have to mess around with locating physical devices or possibly
partitioning them. Plus in enterprise environments with lots of
storage and many different kinds of use cases, even knowledgeable users
aren't always granted full access to the physical storage anyway. They
get a VG to play with, or now they can have a thin pool and only
consume on storage what is actually used, and not what they've
reserved. You can mkfs a 4TB virtual size volume, while it only uses
1MB of physical extents on storage. And all of that is orthogonal to
using XFS or Btrfs which again comes down to use case. And whether I'd
have LVM mirror or Btrfs mirror is again a question of use case, maybe
I'm OK with LVM mirroring and I just get the rare corrupt file warning
and that's OK. In another use case, corruption isn't OK, I need higher
availability of known good data therefore I need Btrfs doing the
mirroring.
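
As an analogy only (a sparse file, not an LVM thin pool), the gap between virtual size and consumed storage can be seen with a few lines of Python. This assumes a Linux filesystem that supports sparse files and reports `st_blocks` in 512-byte units; the 1 GiB size is an arbitrary stand-in.

```python
import os
import tempfile

VIRTUAL_SIZE = 1 << 30  # 1 GiB, stand-in for a much larger thin volume

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "thin.img")
    with open(path, "wb") as f:
        f.truncate(VIRTUAL_SIZE)      # set the apparent size, write no data
    st = os.stat(path)
    apparent = st.st_size             # what the "volume" claims to hold
    allocated = st.st_blocks * 512    # what is actually backed by storage
    print(apparent, allocated)
```

A thin pool behaves similarly at the block layer: extents are only allocated once something is written to them.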

So I find your argument thus far uncompelling.


Chris Murphy

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-19 18:35                       ` Chris Murphy
@ 2014-11-19 19:23                         ` Phillip Susi
  0 siblings, 0 replies; 64+ messages in thread
From: Phillip Susi @ 2014-11-19 19:23 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/19/2014 1:33 PM, Chris Murphy wrote:
> Thin volumes are more efficient. And the user creating them doesn't
> have to mess around with locating physical devices or possibly
> partitioning them. Plus in enterprise environments with lots of
> storage and many different kinds of use cases, even knowledgeable
> users aren't always granted full access to the physical storage
> anyway. They get a VG to play with, or now they can have a thin
> pool and only consume on storage what is actually used, and not
> what they've reserved. You can mkfs a 4TB virtual size volume, 
> while it only uses 1MB of physical extents on storage. And all of 
> that is orthogonal to using XFS or Btrfs which again comes down to 
> use case. And whether I'd have LVM mirror or Btrfs mirror is again 
> a question of use case, maybe I'm OK with LVM mirroring and I just 
> get the rare corrupt file warning and that's OK. In another use 
> case, corruption isn't OK, I need higher availability of known
> good data therefore I need Btrfs doing the mirroring.

Correct me if I'm wrong, but this kind of setup is basically where you
have a provider running an lvm thin pool volume on their hardware, and
exposing it to the customer's vm as a virtual disk.  In that case,
then the provider can do their snapshots and it won't cause this
problem since the snapshots aren't visible to the vm.  Also in these
cases the provider is normally already providing data protection by
having the vg on a raid6 or raid60 or something, so having the client
vm mirror the data in btrfs is a bit redundant.




-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUbO4nAAoJEI5FoCIzSKrwl/QIAJ7arJ0ZXVc16pBRjE2F66uV
GAOhatdx8pLhGey6by+gV8Ltvx4bK3BG40dkvQIM9RN9UFC5vofQ4FnzIn1nfXZB
qyyITE2mF+lE3RNCb8ZKxwG58rfa9NOModPCeNVFWkS6+fyyhGY23sliWbVO6b15
w6BD5xu/Pp7Fhgkx81AL07XpusR9c8pKZd8ZHw4nozFHw20+13XuL+2g8axpZS+O
Xd9W5GRlC+0k9jQ0q9xGi1jh6QpjMSWVj54MNS5jRubsY65TtmFPkdvgaMGD4U5k
bADSEUMfij9NRMw8VwA4ik/JEi1IbukD4u1geKeZTowMGXReel2RimeA/PhFYcc=
=tmDI
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-17 19:04     ` Goffredo Baroncelli
       [not found]       ` <CAE8gLh=VubBbZdeKTAuWRjOxPF7C+ouUeeVvmGfT2ckYWGhQVA@mail.gmail.com>
@ 2014-11-21  4:24       ` Zygo Blaxell
  1 sibling, 0 replies; 64+ messages in thread
From: Zygo Blaxell @ 2014-11-21  4:24 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: Brendan Hide, linux-btrfs, bug-grub

[-- Attachment #1: Type: text/plain, Size: 2507 bytes --]

On Mon, Nov 17, 2014 at 08:04:05PM +0100, Goffredo Baroncelli wrote:
> On 2014-11-17 07:59, Brendan Hide wrote:
> > 
> > That leaves two aspects of this issue which I view as two separate bugs:
> > a) Btrfs cannot gracefully handle separate filesystems that have the same UUID. At all.
> > b) Grub appears to pick the wrong filesystem when presented with two filesystems with the same UUID.
> > 
> > I feel a) is a btrfs bug.
> > I feel b) is a bug that is more about "ecosystem design" than grub being silly.
> 
> Regarding a)
> IIRC, btrfs collects the filesystem information by UUID; if two 
> filesystems have the same UUID (like the LVM-snapshot case), the
> last filesystem discovered overwrite the first one.
> 
> The filesystem discovering is done in user-space; so it should be simple
> to skip a filesystem on a LVM-snapshot.
> 
> Regarding b)
> I am bit confused: if I understood correctly, the root filesystem was
> picked from a LVM-snapshot, so grub-probe *correctly* reported that
> the root device is the snapshot.
> The problem was that during the boot filesystem discovering: first
> scanned the *real* device, then the LVM-snapshot; the latter
> overwrote the former so the system booted from the LVM-snapshot.

IMHO if the device UUID search finds multiple devices with the same device
UUID, it should ignore _all_ of them as the identification problem
is unsolvable without further user input.  This is what the 'device='
mount option is for.
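
The "ignore all of them" policy can be sketched as follows. This is an illustration of the proposal, not how btrfs device scanning is implemented; the function name, device paths, and short UUID strings are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def unambiguous_devices(scanned: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map device UUID -> path, dropping every UUID seen on more than one path."""
    by_uuid: Dict[str, List[str]] = defaultdict(list)
    for path, dev_uuid in scanned:
        by_uuid[dev_uuid].append(path)
    # Keep only UUIDs that identify exactly one device.
    return {u: paths[0] for u, paths in by_uuid.items() if len(paths) == 1}


scan = [
    ("/dev/vg0/root", "aaaa"),
    ("/dev/vg0/root-snap", "aaaa"),  # LVM snapshot: same UUID as its origin
    ("/dev/vg0/home", "bbbb"),
]
print(unambiguous_devices(scan))  # {'bbbb': '/dev/vg0/home'}
```

Anything dropped here would then have to be named explicitly by the user, which is the role suggested for 'device=' above.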

> My conclusion is that we should improve the btrfs scan so:
> - in udev rules, a partition that is a LVM snapshot by default 
> should be not scanned by "btrfs dev scan"
> - "btrfs dev scan", during the partition discovery should skip the 
> lvm-snapshot.

That would mean I can't do this:

	1.  lvm snapshot of ext4 filesystem

	2.  btrfs-convert the snapshot

	3.  mount the snapshot, make sure it's OK

	4.  merge LVM snapshot to overwrite original ext4 filesystem

which would be a shame since that's the only way I ever convert ext3/4
filesystems to btrfs (btrfs-convert is a little buggy still).

> BR
> G.Baroncelli
> 
> 
> 
> -- 
> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-19 15:20                     ` Phillip Susi
  2014-11-19 18:35                       ` Chris Murphy
@ 2014-11-21  4:28                       ` Zygo Blaxell
  2014-11-21  6:22                         ` Duncan
  2014-11-22 17:34                         ` Goffredo Baroncelli
  1 sibling, 2 replies; 64+ messages in thread
From: Zygo Blaxell @ 2014-11-21  4:28 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 3108 bytes --]

On Wed, Nov 19, 2014 at 10:20:17AM -0500, Phillip Susi wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 11/18/2014 9:54 PM, Chris Murphy wrote:
> > Why is it silly? Btrfs on a thin volume has practical use case
> > aside from just being thinly provisioned, its snapshots are block
> > device based, not merely that of an fs tree.
> 
> Umm... because one of the big selling points of btrfs is that it is in
> a much better position to make snapshots being aware of the fs tree
> rather than doing it in the block layer.

One of the big selling points of LVM is that it is in a much better
position to make snapshots so you can run btrfsck on the shattered
remains of your broken btrfs filesystem.

The UUID-driven behavior of btrfs is _really extremely annoying_.
No other filesystem forces me to jump through the hoops btrfs does
to get routine admin tasks done.

e.g. if an ext4 filesystem explodes, I can:

	1.  make a LVM snapshot of the broken filesystem

	2.  run e2fsck on the snapshot

	3.  mount and repair the snapshot, e.g. rsync any missing files
	from backups, salvage anything that survived

	4.  LVM merge the snapshot to its origin volume

	5.  umount the origin volume and mount the merged volume
	(or just reboot)

...and I can do all of this on a running system, in-place, with only a
few minutes of downtime in the must-reboot case.
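
The ext4 steps above can be written down as a command plan. This is a dry-run sketch that only prints the commands; the VG, LV, size, and mount point are placeholders, and the manual repair work of step 3 (rsync from backups and so on) is left out.

```python
import shlex
from typing import List


def ext4_repair_plan(vg: str, origin: str, snap: str,
                     size: str, mnt: str) -> List[List[str]]:
    """Return the commands for the snapshot, fsck, and merge workflow."""
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        # 1. snapshot the broken filesystem
        ["lvcreate", "--snapshot", "--size", size, "--name", snap,
         f"{vg}/{origin}"],
        # 2. check the snapshot, not the origin
        ["e2fsck", "-f", snap_dev],
        # 3. mount it for manual repair, then unmount again
        ["mount", snap_dev, mnt],
        ["umount", mnt],
        # 4. merge the repaired snapshot back into its origin
        ["lvconvert", "--merge", f"{vg}/{snap}"],
    ]


plan = ext4_repair_plan("vg0", "root", "root-fix", "5G", "/mnt/fix")
for cmd in plan:
    print(shlex.join(cmd))
```

Step 5 (remounting or rebooting) is not scripted here, since it depends on whether the origin is in use at merge time.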

None of the above works with btrfs at all.  Multi-device btrfs fails
at 2, and mounting the filesystem fails at 3.  The closest I've gotten
to this workflow is to set up a kvm instance that can see only the LVM
snapshots, and run the btrfsck or rsync there--and hope that the
system doesn't crash and reboot during that time, or the filesystem will
be more or less destroyed by the random combination of origin and
snapshot LVs.

I've also learned the hard way to always make an LVM snapshot before
running btrfsck, just in case you discover a new btrfsck bug with your
filesystem.  That at least works for single-device btrfs filesystems.

> So it is kind of silly in the first place to be using lvm snapshots
> under btrfs, but it is doubly silly to use lvm for snapshots, and
> btrfs for the mirroring rather than lvm.  Pick one layer and use it
> for both functions.  Even if that is lvm, then it should also be
> handling the mirroring.
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.17 (MingW32)
> 
> iQEcBAEBAgAGBQJUbLUxAAoJEI5FoCIzSKrwh0oH/3TZ2oo8u2BjHYO3b0x8800/
> LFkmGFWrZFSnAvtWuN5B1WlhMXku4dxLRXz14fJKFp3fNmnYRNVvw3tu9btvsBsC
> sZdwLaKwKPHTK8RS+QCI2pZPX+cGB+F7/z9PCHrzIzzCKk/4SvnJ76e2nnZFpY1m
> Md3f1BCHEVUPMMXbqv6Ry6v7PDs/8bx8WITYyAL9uh3tjh0dXQsjbZJn5u4XDitS
> /CoE8eX4rf1vc7qHI4K56TtArCcXQxAHcC56fXmcmS03bVhAkkJ5Z+/uwi6+TkJe
> 55rMFCd7UFy9pwKha3Q2flJHtDYG6ns7Njyff6BSL9Yzq7tHh4wLk1H3XxaOCP8=
> =ktv/
> -----END PGP SIGNATURE-----

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-21  4:28                       ` Zygo Blaxell
@ 2014-11-21  6:22                         ` Duncan
  2014-11-21 11:35                           ` Robert White
                                             ` (2 more replies)
  2014-11-22 17:34                         ` Goffredo Baroncelli
  1 sibling, 3 replies; 64+ messages in thread
From: Duncan @ 2014-11-21  6:22 UTC (permalink / raw)
  To: linux-btrfs

Zygo Blaxell posted on Thu, 20 Nov 2014 23:28:14 -0500 as excerpted:

> On Wed, Nov 19, 2014 at 10:20:17AM -0500, Phillip Susi wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>> 
>> On 11/18/2014 9:54 PM, Chris Murphy wrote:
>> > Why is it silly? Btrfs on a thin volume has practical use case aside
>> > from just being thinly provisioned, its snapshots are block device
>> > based, not merely that of an fs tree.
>> 
>> Umm... because one of the big selling points of btrfs is that it is in
>> a much better position to make snapshots being aware of the fs tree
>> rather than doing it in the block layer.
> 
> One of the big selling points of LVM is that it is in a much better
> position to make snapshots so you can run btrfsck on the shattered
> remains of your broken btrfs filesystem.
> 
> The UUID-driven behavior of btrfs is _really extremely annoying_.
> No other filesystem forces me to jump through the hoops btrfs does to
> get routine admin tasks done.
> 
> e.g. if an ext4 filesystem explodes, I can:
> 
> 	1.  make a LVM snapshot of the broken filesystem
> 
> 	2.  run e2fsck on the snapshot
> 
> 	3.  mount and repair the snapshot, e.g. rsync any missing files from
> 	backups, salvage anything that survived
> 
> 	4.  LVM merge the snapshot to its origin volume
> 
> 	5.  umount the origin volume and mount the merged volume (or just
> 	reboot)
> 
> ...and I can do all of this on a running system, in-place, with only a
> few minutes of downtime in the must-reboot case.
> 
> None of the above works with btrfs at all.  Multi-device btrfs fails at
> 2, and mounting the filesystem fails at 3.  The closest I've gotten to
> this workflow is to set up a kvm instance that can see only the LVM
> snapshots, and run the btrfsck or rsync there--and hope that the
> system doesn't crash and reboot during that time, or the filesystem will
> be more or less destroyed by the random combination of origin and
> snapshot LVs.
> 
> I've also learned the hard way to always make an LVM snapshot before
> running btrfsck, just in case you discover a new btrfsck bug with your
> filesystem.  That at least works for single-device btrfs filesystems.

When I have such a filesystem level problem, I simply dd from the backing 
device to some other location, generally to a file that's on a different 
filesystem (preferably non-btrfs; I use reiserfs as I've found it very 
resilient, here), in which case btrfs device scan won't see the UUID on 
the copy as it scans block devices, not inside non-device files.

After all, an LVM block-level snapshot takes the same space as a file 
containing the same raw data, and if there's room for the data in an LVM 
snapshot, given a different layout, there's room for exactly the same 
amount of data as a file on a different filesystem, piped thru some 
compressor if necessary due to tight datasize constraints.
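
A rough Python equivalent of the dd-through-a-compressor approach, as a sketch. `image_to_file` is a hypothetical helper, and the demonstration runs on an ordinary temporary file standing in for the backing device, since reading a real block device needs root.

```python
import gzip
import os
import shutil
import tempfile


def image_to_file(device: str, dest: str, chunk: int = 1 << 20) -> None:
    """Copy a block device (or any file) into a gzip-compressed image file."""
    with open(device, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk)


# Demonstration on a regular file standing in for the backing device.
with tempfile.TemporaryDirectory() as tmp:
    fake_dev = os.path.join(tmp, "fake-device")
    with open(fake_dev, "wb") as f:
        f.write(b"\0" * 4096 + b"filesystem payload")
    image = os.path.join(tmp, "backup.img.gz")
    image_to_file(fake_dev, image)
    with gzip.open(image, "rb") as f:
        restored = f.read()
    with open(fake_dev, "rb") as f:
        original = f.read()
    print(restored == original)  # True
```

Because the copy lives inside a regular file, a scan of block devices never sees its UUID.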

But while other filesystems might allow un-UUIDs (heh, UUUIDs or U3IDs 
=:^), because they're no longer unique, requiring them to be unique just 
as the label says cannot be considered a bug.  It's simply stricter 
enforcement of the rules, which are, after all, plainly stated in the 
descriptive name.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-21  6:22                         ` Duncan
@ 2014-11-21 11:35                           ` Robert White
  2014-11-21 11:54                             ` Duncan
  2014-11-21 17:56                           ` Zygo Blaxell
  2014-11-21 18:23                           ` Chris Murphy
  2 siblings, 1 reply; 64+ messages in thread
From: Robert White @ 2014-11-21 11:35 UTC (permalink / raw)
  To: Duncan, linux-btrfs

On 11/20/2014 10:22 PM, Duncan wrote:
> But while other filesystems might allow un-UUIDs (heh, UUUIDs or U3IDs
> =:^), because they're no longer unique, requiring them to be unique just
> as the label says cannot be considered a bug.  It's simply stricter
> enforcement of the rules, which are, after all, plainly stated in the
> descriptive name.

You take "U"s away, not add them

UID = unique ID
GUID = globally unique ID
UUID = universally unique ID


And other file systems have the same issues. XFS, for example, uses UUIDs 
in the same way. It just has a command to re-brand the filesystem's UUID, 
which you apply to the LVM snapshot immediately after taking the 
snapshot. (The problem has been established and understood since 2009 or so.)

I don't know if this approach would work for btrfs with subvolumes.

Example Citation :: 
http://www.miljan.org/main/2009/11/16/lvm-snapshots-and-xfs/

XFS also has the nouuid mount option.

btrfs has the device= mount option.

But any system with unique ids will have this identical issue when 
block-snapshot support is added underneath.

-- Rob.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-21 11:35                           ` Robert White
@ 2014-11-21 11:54                             ` Duncan
  0 siblings, 0 replies; 64+ messages in thread
From: Duncan @ 2014-11-21 11:54 UTC (permalink / raw)
  To: linux-btrfs

Robert White posted on Fri, 21 Nov 2014 03:35:05 -0800 as excerpted:

> On 11/20/2014 10:22 PM, Duncan wrote:
>> But while other filesystems might allow un-UUIDs (heh, UUUIDs or U3IDs
>> =:^), because they're no longer unique, requiring them to be unique
>> just as the label says cannot be considered a bug.  It's simply
>> stricter enforcement of the rules, which are, after all, plainly stated
>> in the descriptive name.
> 
> You take "U"s away, not add them
> 
> UID = unique ID
> GUID = globally unique ID
> UUID = universally unique ID

I was making a joke, as I happened to notice un-UUID = 3 U-s just as I was 
writing that.  Universally unique ID = UUID, un-UUID (not universally 
unique ID) = UUUID = U^3ID. =:^)

Of course formally it'd be NUID (not/non- unique) or some such, but un-
UUID served my purpose well enough, including the joke once I noticed it, 
so...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-21  6:22                         ` Duncan
  2014-11-21 11:35                           ` Robert White
@ 2014-11-21 17:56                           ` Zygo Blaxell
  2014-11-21 23:09                             ` Duncan
  2014-11-21 18:23                           ` Chris Murphy
  2 siblings, 1 reply; 64+ messages in thread
From: Zygo Blaxell @ 2014-11-21 17:56 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Fri, Nov 21, 2014 at 06:22:57AM +0000, Duncan wrote:
> After all, an LVM block-level snapshot takes the same space as a file 
> containing the same raw data, and if there's room for the data in an LVM 
> snapshot, given a different layout, there's room for exactly the same 
> amount of data as a file on a different filesystem, piped thru some 
> compressor if necessary due to tight datasize constraints.

That isn't true at all.  An LVM snapshot only has to hold the blocks that
change while it exists: a repairing fsck can take less than 1% of the
overall volume size, and a full conversion from another filesystem type
can take less than 10%.  Usually I can find enough space by blowing away
the swap LV for a few hours.

I do NOT usually have 13TB of slack space lying around in a 26TB disk
array, nor do I have enough bandwidth to move those 13TB to another
machine without great inconvenience.
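
To put rough numbers on that (a back-of-the-envelope sketch only: the
13 TB volume size is assumed from the figures above, the helper name
snapshot_space_tb is made up for illustration, and the 1% and 10%
fractions are the upper bounds just quoted):

```python
# Compare the space needed for a copy-on-write LVM snapshot with the space
# needed for a full dd image of the same volume.
# VOLUME_TB is an assumed figure; the fractions are upper bounds from the text.

VOLUME_TB = 13.0


def snapshot_space_tb(volume_tb: float, changed_fraction: float) -> float:
    """Space a COW snapshot needs if only `changed_fraction` of blocks change."""
    return volume_tb * changed_fraction


fsck_tb = snapshot_space_tb(VOLUME_TB, 0.01)      # repairing fsck: < 1%
convert_tb = snapshot_space_tb(VOLUME_TB, 0.10)   # fs conversion: < 10%
full_copy_tb = VOLUME_TB                          # uncompressed dd image

print(f"fsck snapshot:       up to {fsck_tb:.2f} TB")
print(f"conversion snapshot: up to {convert_tb:.2f} TB")
print(f"full dd image:       {full_copy_tb:.2f} TB")
```

Under those assumptions the snapshot for a repair fits in space the size
of a large swap LV, while the full image needs as much room as the volume
itself.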

> But while other filesystems might allow un-UUIDs (heh, UUUIDs or U3IDs 
> =:^), because they're no longer unique, requiring them to be unique just 
> as the label says cannot be considered a bug.  It's simply stricter 
> enforcement of the rules, which are, after all, plainly stated in the 
> descriptive name.

It's not a bug as long as I can completely control which devices are
searched for UUIDs, and the system behaves sanely when multiple UUIDs
are found through automatic discovery; otherwise, it's not only a bug,
it's a DoS attack security vulnerability.  Consider what happens if
someone looks at /sys/fs/btrfs, reads the non-secret UUIDs, builds a fake
filesystem with those UUIDs, puts the fake filesystem on a USB stick,
and plugs it back into the victim machine...

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-21  6:22                         ` Duncan
  2014-11-21 11:35                           ` Robert White
  2014-11-21 17:56                           ` Zygo Blaxell
@ 2014-11-21 18:23                           ` Chris Murphy
  2014-11-21 22:49                             ` Duncan
  2 siblings, 1 reply; 64+ messages in thread
From: Chris Murphy @ 2014-11-21 18:23 UTC (permalink / raw)
  To: Duncan; +Cc: Btrfs BTRFS

On Thu, Nov 20, 2014 at 11:22 PM, Duncan <1i5t5.duncan@cox.net> wrote:

>
> When I have such a filesystem level problem, I simply dd from the backing
> device to some other location, generally to a file that's on a different
> filesystem (preferably non-btrfs, I use reiserfs as I've found it very
> resilient, here), in which case btrfs device scan won't see the UUID on
> the copy as it scans block devices, not inside non-device files.

That's hours of dd and you have to find space to do it.


> After all, an LVM block-level snapshot takes the same space as a file
> containing the same raw data, and if there's room for the data in an LVM
> snapshot, given a different layout, there's room for exactly the same
> amount of data as a file on a different filesystem, piped thru some
> compressor if necessary due to tight datasize constraints.

That's not true for thin volume snapshots. They take up next to no
space upon creation, they don't need space reserved in advance.
They're more like a qcow2 snapshot than a conventional LVM snapshot; a
big difference being if you delete the snapshot, or you delete a bunch
of files in a thin volume and follow it with fstrim, the unused
extents are returned to the thin pool.

There has been a fragmentation problem with thin volumes; I don't know
if that's solved yet. And I don't know if it exacerbates things with
Btrfs fragmentation.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-21 18:23                           ` Chris Murphy
@ 2014-11-21 22:49                             ` Duncan
  2014-11-21 23:41                               ` Duncan
  0 siblings, 1 reply; 64+ messages in thread
From: Duncan @ 2014-11-21 22:49 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Fri, 21 Nov 2014 11:23:45 -0700 as excerpted:

> On Thu, Nov 20, 2014 at 11:22 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> 
> 
>> When I have such a filesystem level problem, I simply dd from the
>> backing device to some other location, generally to a file that's on a
>> different filesystem (preferably non-btrfs, I use reiserfs as I've
>> found it very resilient, here), in which case btrfs device scan won't
>> see the UUID on the copy as it scans block devices, not inside
>> non-device files.
> 
> That's hours of dd and you have to find space to do it.

I did it recently here.  There's a method to my sub-100-GiB partition 
madness! =:^)  The partitions in question were on SSD, and were small 
enough that I could simply dd them to files on my media filesystem, which 
was after all designed to be able to take full ISO images, etc.

Additionally, due to size and reasonably consistent linear intra-file 
access patterns, the media filesystem's still on much cheaper spinning 
rust, while most of the system's on much faster to random-access but far 
more expensive SSD, so in this case one side was SSD, the other spinning 
rust.

Tho granted, if you're doing single-partition/filesystem multi-TiB 
filesystems, it does get to be a problem.  As there would have been if 
the filesystem in question was the media filesystem, altho that one's not 
yet btrfs for a reason.  But still, if there's room enough for an LVM 
snapshot in the first place, with a different layout, there'd be room for 
the same data as a file.  That's pretty basic.

>> After all, an LVM block-level snapshot takes the same space as a file
>> containing the same raw data, and if there's room for the data in an
>> LVM snapshot, given a different layout, there's room for exactly the
>> same amount of data as a file on a different filesystem, piped thru
>> some compressor if necessary due to tight datasize constraints.
> 
> That's not true for thin volume snapshots. They take up next to no space
> upon creation, they don't need space reserved in advance.

Thus the mention of compression if necessary.  Thin-volume snapshots are 
effectively compression by another name, and a raw dd from them should 
compress pretty much equally well, depending on compression method 
chosen, of course. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-21 17:56                           ` Zygo Blaxell
@ 2014-11-21 23:09                             ` Duncan
  0 siblings, 0 replies; 64+ messages in thread
From: Duncan @ 2014-11-21 23:09 UTC (permalink / raw)
  To: linux-btrfs

Zygo Blaxell posted on Fri, 21 Nov 2014 12:56:23 -0500 as excerpted:

> It's not a bug as long as I can completely control which devices are
> searched for UUIDs, and the system behaves sanely when multiple UUIDs
> are found through automatic discovery; otherwise, it's not only a bug,
> it's a DoS attack security vulnerability.  Consider what happens if
> someone looks at /sys/fs/btrfs, reads the non-secret UUIDs, builds a
> fake filesystem with those UUIDs, puts the fake filesystem on a USB
> stick, and plugs it back into the victim machine...

With the current state of USB vulnerability (firmware reprogrammed as an 
input device, etc; the vuln has been all over the tech news for some 
months now), anyone with USB access to the machine is simply another case 
of anyone with physical access to the machine.  They're normally assumed 
to be able to at minimum take down the machine, the ultimate DoS, in any 
case, and often to have effective root, tho that can be mitigated to some 
extent with encryption, etc.  It's generally assumed that if you have 
physical access, as required to plug in that USB, it's game over: the 
machine is effectively p40wn3d.  At the /very/ least, with physical 
access it's vulnerable to the sledgehammer DoS, and there's little to be 
done about that but prevent physical access by all means necessary (armed 
guards, nuclear silo hosting, etc) in the first place.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-21 22:49                             ` Duncan
@ 2014-11-21 23:41                               ` Duncan
  2014-11-21 23:51                                 ` Duncan
  0 siblings, 1 reply; 64+ messages in thread
From: Duncan @ 2014-11-21 23:41 UTC (permalink / raw)
  To: linux-btrfs

Duncan posted on Fri, 21 Nov 2014 22:49:06 +0000 as excerpted:

> Chris Murphy posted...

>> That's not true for thin volume snapshots. They take up next to no
>> space upon creation, they don't need space reserved in advance.
> 
> Thus the mention of compression if necessary.  Thin-volume snapshots are
> effectively compression by another name, and a raw dd from them should
> compress pretty much equally well, depending on compression method
> chosen, of course. =:^)

Oops, I mis-parsed "thin".  Good point and thanks, Chris.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-21 23:41                               ` Duncan
@ 2014-11-21 23:51                                 ` Duncan
  0 siblings, 0 replies; 64+ messages in thread
From: Duncan @ 2014-11-21 23:51 UTC (permalink / raw)
  To: linux-btrfs

Duncan posted on Fri, 21 Nov 2014 23:41:49 +0000 as excerpted:

> Duncan posted on Fri, 21 Nov 2014 22:49:06 +0000 as excerpted:
> 
>> Chris Murphy posted...
> 
>>> That's not true for thin volume snapshots. They take up next to no
>>> space upon creation, they don't need space reserved in advance.
>> 
>> Thus the mention of compression if necessary.  Thin-volume snapshots
>> are effectively compression by another name, and a raw dd from them
>> should compress pretty much equally well, depending on compression
>> method chosen, of course. =:^)
> 
> Oops, I mis-parsed "thin".  Good point and thanks, Chris.

... And Zygo, who pointed out my error as well. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-21  4:28                       ` Zygo Blaxell
  2014-11-21  6:22                         ` Duncan
@ 2014-11-22 17:34                         ` Goffredo Baroncelli
  2014-11-23  0:19                           ` Zygo Blaxell
  1 sibling, 1 reply; 64+ messages in thread
From: Goffredo Baroncelli @ 2014-11-22 17:34 UTC (permalink / raw)
  To: Zygo Blaxell, Phillip Susi; +Cc: Chris Murphy, Btrfs BTRFS

On 11/21/2014 05:28 AM, Zygo Blaxell wrote:
> e.g. if an ext4 filesystem explodes, I can:
> 
> 	1.  make a LVM snapshot of the broken filesystem
> 
> 	2.  run e2fsck on the snapshot
> 
> 	3.  mount and repair the snapshot, e.g. rsync any missing files
> 	from backups, salvage anything that survived
> 
> 	4.  LVM merge the snapshot to its origin volume
> 
> 	5.  umount the origin volume and mount the merged volume
> 	(or just reboot)
> 
> ...and I can do all of this on a running system, in-place, with only a
> few minutes of downtime in the must-reboot case.
> 
> None of the above works with btrfs at all.  Multi-device btrfs fails
> at 2, 

You can't compare ext4 with btrfs if you are talking about a multi-device 
filesystem: ext4 doesn't have this capability. 
Try making an md-raid over snapshotted logical volume(s); I have never
tried that, but I suppose there would be the same problems...

> and mounting the filesystem fails at 3.  
Are you sure?

ghigo@venice:/tmp$ # create a btrfs filesystem in a logical volume
ghigo@venice:/tmp$ sudo truncate -s +10G disk.img
ghigo@venice:/tmp$ sudo losetup -f disk.img 
ghigo@venice:/tmp$ sudo pvcreate /dev/loop0 
ghigo@venice:/tmp$ sudo vgcreate vgtest /dev/loop0 
ghigo@venice:/tmp$ sudo lvcreate -n lvone -L 3G vgtest
ghigo@venice:/tmp$ sudo mkfs.btrfs /dev/vgtest/lvone 
ghigo@venice:/tmp$ mkdir t

ghigo@venice:/tmp$ # create a file inside a btrfs fs
ghigo@venice:/tmp$ sudo mount /dev/vgtest/lvone t/
ghigo@venice:/tmp$ sudo dd if=/dev/zero of=t/disk-orig bs=1M count=1
ghigo@venice:/tmp$ sudo umount t

ghigo@venice:/tmp$ # make a lvm snapshot and add a 2nd file
ghigo@venice:/tmp$ sudo lvcreate -s -n lvone_snap -L 3G vgtest/lvone
ghigo@venice:/tmp$ sudo mount /dev/vgtest/lvone_snap t/
ghigo@venice:/tmp$ sudo dd if=/dev/zero of=t/disk-snap bs=1M count=1
ghigo@venice:/tmp$ sudo umount t

ghigo@venice:/tmp$ # mount the original lv, and check the file
ghigo@venice:/tmp$ sudo mount /dev/vgtest/lvone t/
ghigo@venice:/tmp$ ls -l t
total 1024
-rw-r--r-- 1 root root 1048576 Nov 22 18:11 disk-orig
ghigo@venice:/tmp$ sudo umount t

ghigo@venice:/tmp$ # mount the snapshot lv, and check the files
ghigo@venice:/tmp$ sudo mount /dev/vgtest/lvone_snap t/
ghigo@venice:/tmp$ ls -l t
total 2048
-rw-r--r-- 1 root root 1048576 Nov 22 18:11 disk-orig
-rw-r--r-- 1 root root 1048576 Nov 22 18:12 disk-snap

Based on the example above, BTRFS seems to me to work properly when you
mount a "single-disk" filesystem. You only have to take care not to mount
the two filesystems at the same time.

BR
G.Baroncelli


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-22 17:34                         ` Goffredo Baroncelli
@ 2014-11-23  0:19                           ` Zygo Blaxell
  2014-11-25 16:34                             ` Goffredo Baroncelli
  0 siblings, 1 reply; 64+ messages in thread
From: Zygo Blaxell @ 2014-11-23  0:19 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: Phillip Susi, Chris Murphy, Btrfs BTRFS

On Sat, Nov 22, 2014 at 06:34:38PM +0100, Goffredo Baroncelli wrote:
> On 11/21/2014 05:28 AM, Zygo Blaxell wrote:
> > e.g. if an ext4 filesystem explodes, I can:
> > 
> > 	1.  make a LVM snapshot of the broken filesystem
> > 
> > 	2.  run e2fsck on the snapshot
> > 
> > 	3.  mount and repair the snapshot, e.g. rsync any missing files
> > 	from backups, salvage anything that survived
> > 
> > 	4.  LVM merge the snapshot to its origin volume
> > 
> > 	5.  umount the origin volume and mount the merged volume
> > 	(or just reboot)
> > 
> > ...and I can do all of this on a running system, in-place, with only a
> > few minutes of downtime in the must-reboot case.
> > 
> > None of the above works with btrfs at all.  Multi-device btrfs fails
> > at 2, 
> 
> You can't compare ext4 with btrfs, if you are talking about a multi-device 
> filesystem: ext4 haven't this capability. 

btrfs fails this comparison as a single-device filesystem.

> Try to make a md-raid over a snapshotted logical volume(s); I never tried
> that, but I suppose that there will be the same problems...

md-raid works as long as you specify the devices, and because it's always
the lowest layer it can ignore LVs (snapshot or otherwise).  It's also
not a particularly common use case, while making an LV snapshot of a
filesystem is a typical use case.

> > and mounting the filesystem fails at 3.  
> Are you sure ?

Yes, I'm sure.  I've had to replace filesystems destroyed this way.

>[working instance snipped]

> On the basis of the example above, in case you want to mount a 
> "single-disk", BTRFS seems me to work properly. You have to pay
> attention only to not mount the two filesystem at the same time.

The problem is btrfs stops searching when it sees one disk with each UUID,
so the set of disks (snapshot vs origin) that you get is *random*.
For a pair of origin + snapshots, there's a 50% chance it works, 50%
chance it eats your data.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-23  0:19                           ` Zygo Blaxell
@ 2014-11-25 16:34                             ` Goffredo Baroncelli
  2014-11-25 20:29                               ` Zygo Blaxell
  0 siblings, 1 reply; 64+ messages in thread
From: Goffredo Baroncelli @ 2014-11-25 16:34 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs

On 11/23/2014 01:19 AM, Zygo Blaxell wrote:
[...]
> md-raid works as long as you specify the devices, and because it's always
> the lowest layer it can ignore LVs (snapshot or otherwise).  It's also
> not a particularly common use case, while making an LV snapshot of a
> filesystem is a typical use case.

I fully agree; but you still consider a *multi-device* btrfs over lvm...
This is like a dm over lvm... which doesn't make sense at all (as you 
already wrote)

> 
>>> and mounting the filesystem fails at 3.  
>> Are you sure ?
> 
> Yes, I'm sure.  I've had to replace filesystems destroyed this way.
> 
>> [working instance snipped]
> 
>> On the basis of the example above, in case you want to mount a 
>> "single-disk", BTRFS seems me to work properly. You have to pay
>> attention only to not mount the two filesystem at the same time.
> 
> The problem is btrfs stops searching when it sees one disk with each UUID,

BTRFS doesn't search for anything. It is udev which "pushes" the information
to the kernel module. The btrfs module groups this information by UUID.
When a new disk is inserted, its information overwrites that of the old one.
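
A toy model of that behaviour, as described here, could look like the
following (a sketch only, not the kernel code: the Registry type, the
register_device helper, the placeholder fsid and the choice to key on
filesystem UUID plus device id are all assumptions made for illustration;
the LV paths are the ones from the earlier test):

```python
from typing import Dict, Tuple

# (filesystem UUID, device id within that filesystem) -> device path
Registry = Dict[Tuple[str, int], str]


def register_device(registry: Registry, fsid: str, devid: int, path: str) -> None:
    """Record a scanned device; a later scan with the same key replaces the path."""
    registry[(fsid, devid)] = path


registry: Registry = {}
FSID = "hypothetical-fsid"  # placeholder, not a real UUID

# udev announces the origin LV first, then its snapshot (same fsid and devid).
register_device(registry, FSID, 1, "/dev/vgtest/lvone")
register_device(registry, FSID, 1, "/dev/vgtest/lvone_snap")

print(registry[(FSID, 1)])
```

In this model only the most recently announced path survives for a given
key, so which LV is remembered depends purely on announcement order.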


> so the set of disks (snapshot vs origin) that you get is *random*.
> For a pair of origin + snapshots, there's a 50% chance it works, 50%
> chance it eats your data.

Sorry, but I have to disagree: the code is quite clear 
(see fs/btrfs/volumes.c, near line 512):

[...]

        } else if (!device->name || strcmp(device->name->str, path)) {
                /*
                 * When FS is already mounted.
                 * 1. If you are here and if the device->name is NULL that
                 *    means this device was missing at time of FS mount.
                 * 2. If you are here and if the device->name is different
                 *    from 'path' that means either
                 *      a. The same device disappeared and reappeared with
                 *         different name. or
                 *      b. The missing-disk-which-was-replaced, has
                 *         reappeared now.
                 *
                 * We must allow 1 and 2a above. But 2b would be a spurious
                 * and unintentional.

[...]

This is case 2a: btrfs stores the new name and mounts it.

Anyway, I made a small test: I created one btrfs filesystem and made an
lvm-snapshot of it. Then I created two different files, one in the snapshot
and one in the original. I ran a program which randomly mounts one or the
other and checks whether the correct file is present; after more than
130 tests I never saw your "50% chance it works": it always works.

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-25 16:34                             ` Goffredo Baroncelli
@ 2014-11-25 20:29                               ` Zygo Blaxell
  2014-11-25 21:59                                 ` Goffredo Baroncelli
  0 siblings, 1 reply; 64+ messages in thread
From: Zygo Blaxell @ 2014-11-25 20:29 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: linux-btrfs

On Tue, Nov 25, 2014 at 05:34:15PM +0100, Goffredo Baroncelli wrote:
> On 11/23/2014 01:19 AM, Zygo Blaxell wrote:
> [...]
> > md-raid works as long as you specify the devices, and because it's always
> > the lowest layer it can ignore LVs (snapshot or otherwise).  It's also
> > not a particularly common use case, while making an LV snapshot of a
> > filesystem is a typical use case.
> 
> I fully agree; but you still consider a *multi-device* btrfs over lvm...
> This is like a dm over lvm... which doesn't make sense at all (as you 
> already wrote)

It makes sense for btrfs because btrfs can productively use LVs on
different PVs (e.g. btrfs-raid1 on two LVs, one on each PV).  LVM is
the bottom layer because not everything in the world is btrfs--things
like ephemeral /tmp, boot, swap, and temporary backup copies of the btrfs
(e.g.  before running btrfsck) have to live on the same physical drives
as the btrfs filesystems.

> >>> and mounting the filesystem fails at 3.  
> >> Are you sure ?
> > 
> > Yes, I'm sure.  I've had to replace filesystems destroyed this way.
> > 
> >> [working instance snipped]
> > 
> >> On the basis of the example above, in case you want to mount a 
> >> "single-disk", BTRFS seems me to work properly. You have to pay
> >> attention only to not mount the two filesystem at the same time.
> > 
> > The problem is btrfs stops searching when it sees one disk with each UUID,
> 
> BTRFS doesn't search anything. It is udev which "push" the information
> on the kernel module. The btrfs module groups these information by UUID.
> When a new disk is inserted, overwrite the information of the old one.

Same result:  when presented with multiple devices with the same UUID,
one is chosen arbitrarily instead of rejecting all of them.

> > so the set of disks (snapshot vs origin) that you get is *random*.
> > For a pair of origin + snapshots, there's a 50% chance it works, 50%
> > chance it eats your data.
> 
> Sorry but I have to disagree: the code is quite clear 
> (see fs/btrfs/volumes.c, near line 512):
> 
> [...]
> 
>         } else if (!device->name || strcmp(device->name->str, path)) {
>                 /*
>                  * When FS is already mounted.
>                  * 1. If you are here and if the device->name is NULL that
>                  *    means this device was missing at time of FS mount.
>                  * 2. If you are here and if the device->name is different
>                  *    from 'path' that means either
>                  *      a. The same device disappeared and reappeared with
>                  *         different name. or
>                  *      b. The missing-disk-which-was-replaced, has
>                  *         reappeared now.

If the FS is already mounted then there is no issue.  It's when you're trying
to mount the FS that the fun occurs.

>                  *
>                  * We must allow 1 and 2a above. But 2b would be a spurious
>                  * and unintentional.
> 
> [...]
> 
> The case is the 2a; in this case btrfs store the new name and mount it.
> 
> Anyway I made a small test: I created 1 btrfs filesystem, and 
> made a lvm-snapshot. Then create two different file in the snapshot and in
> the original one. I run a program which mounts randomly the first or
> the latter, checks if the correct file is present; after more than 130 tests I
> never saw your "50% chance it works": it always works.

One btrfs filesystem on two LVs with a snapshot of each LV also present.
So you'd have:

	lv00 - btrfs device 1
	lv01 - btrfs device 2
	lv00snap - snapshot of lv00
	lv01snap - snapshot of lv01

If you mount by device UUID then you get one of these results at random:

	lv00 + lv01 - OK
	lv00snap + lv01snap - also OK
	lv00 + lv01snap - failure
	lv00snap + lv01 - failure

2 failures, 2 successes = 50% failure rate.

If you mount by the name of one of the devices then you only get the two
rows of the above table that match the device you named, but you still
get one success row and one failure row.

Which result you get seems to depend on the order in which LVM enumerates
the LVs, so if you are doing a mount/umount loop then you won't see any
problems as btrfs will consistently make the same choice of LVs over
and over again.  Rebooting or creating other LVs in between mounts will
definitely cause problems.
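
The table above can be reproduced by brute-force enumeration (a sketch
only: it uses the four LV names from the table, the helper names
is_consistent and outcomes are made up for illustration, and it treats
every combination as equally likely, which is an assumption, since the
real choice depends on enumeration order as just described):

```python
from itertools import product

# Each btrfs device slot can be satisfied by the origin LV or by its snapshot.
CANDIDATES = {
    1: ["lv00", "lv00snap"],
    2: ["lv01", "lv01snap"],
}


def is_consistent(picked):
    """True when every picked LV is an origin, or every one is a snapshot."""
    return len({name.endswith("snap") for name in picked}) == 1


def outcomes(named=None):
    """Enumerate assemblies; if `named` is given, keep only those containing it."""
    combos = product(*(CANDIDATES[devid] for devid in sorted(CANDIDATES)))
    return [(c, is_consistent(c)) for c in combos if named is None or named in c]


by_uuid = outcomes()
by_name = outcomes(named="lv00")

for combo, ok in by_uuid:
    print(" + ".join(combo), "-", "OK" if ok else "failure")

failures = sum(1 for _, ok in by_uuid if not ok)
print(f"{failures} of {len(by_uuid)} assemblies mix origin and snapshot")
```

Naming lv00 explicitly narrows the list to two assemblies, but one of
those two still pairs lv00 with lv01snap.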

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-25 20:29                               ` Zygo Blaxell
@ 2014-11-25 21:59                                 ` Goffredo Baroncelli
  2014-11-25 22:21                                   ` Zygo Blaxell
  2014-11-26  3:22                                   ` Duncan
  0 siblings, 2 replies; 64+ messages in thread
From: Goffredo Baroncelli @ 2014-11-25 21:59 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs

On 11/25/2014 09:29 PM, Zygo Blaxell wrote:
> On Tue, Nov 25, 2014 at 05:34:15PM +0100, Goffredo Baroncelli wrote:
>> On 11/23/2014 01:19 AM, Zygo Blaxell wrote:
>> [...]
>>> md-raid works as long as you specify the devices, and because it's always
>>> the lowest layer it can ignore LVs (snapshot or otherwise).  It's also
>>> not a particularly common use case, while making an LV snapshot of a
>>> filesystem is a typical use case.
>>
>> I fully agree; but you still consider a *multi-device* btrfs over lvm...
>> This is like a dm over lvm... which doesn't make sense at all (as you 
>> already wrote)
> 
> It makes sense for btrfs because btrfs can productively use LVs on
> different PVs (e.g. btrfs-raid1 on two LVs, one on each PV).  LVM is
> the bottom layer because not everything in the world is btrfs--things
> like ephemeral /tmp, boot, swap, and temporary backup copies of the btrfs
> (e.g.  before running btrfsck) have to live on the same physical drives
> as the btrfs filesystems.

Let me summarize:

1) btrfs-single-disk on lvm works fine
2) btrfs-w/multiple-disk on lvm works fine
3) btrfs-single-disk on lvm works fine even with snapshot

4) btrfs-w/multiple-disk doesn't work with lvm AND snapshot

However, I still don't understand: why do you want btrfs-w/multiple-disk over LVM?



> 
>>>>> and mounting the filesystem fails at 3.  
>>>> Are you sure ?
>>>
>>> Yes, I'm sure.  I've had to replace filesystems destroyed this way.

In a previous email you wrote:
>> Multi-device btrfs fails at 2, 
So I assumed that points 3 onwards related to a "single-disk" btrfs.



[...]


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-25 21:59                                 ` Goffredo Baroncelli
@ 2014-11-25 22:21                                   ` Zygo Blaxell
  2014-11-25 22:47                                     ` Chris Murphy
                                                       ` (2 more replies)
  2014-11-26  3:22                                   ` Duncan
  1 sibling, 3 replies; 64+ messages in thread
From: Zygo Blaxell @ 2014-11-25 22:21 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2765 bytes --]

On Tue, Nov 25, 2014 at 10:59:53PM +0100, Goffredo Baroncelli wrote:
> On 11/25/2014 09:29 PM, Zygo Blaxell wrote:
> > On Tue, Nov 25, 2014 at 05:34:15PM +0100, Goffredo Baroncelli wrote:
> >> On 11/23/2014 01:19 AM, Zygo Blaxell wrote:
> >> [...]
> >>> md-raid works as long as you specify the devices, and because it's always
> >>> the lowest layer it can ignore LVs (snapshot or otherwise).  It's also
> >>> not a particularly common use case, while making an LV snapshot of a
> >>> filesystem is a typical use case.
> >>
> >> I fully agree; but you still consider a *multi-device* btrfs over lvm...
> >> This is like a dm over lvm... which doesn't make sense at all (as you 
> >> already wrote)
> > 
> > It makes sense for btrfs because btrfs can productively use LVs on
> > different PVs (e.g. btrfs-raid1 on two LVs, one on each PV).  LVM is
> > the bottom layer because not everything in the world is btrfs--things
> > like ephemeral /tmp, boot, swap, and temporary backup copies of the btrfs
> > (e.g.  before running btrfsck) have to live on the same physical drives
> > as the btrfs filesystems.
> 
> Let me to summrize
> 
> 1) btrfs-single-disk on lvm works fine
> 2) btrfs-w/multiple-disk on lvm works fine
> 3) btrfs-single-disk on lvm works fine even with snapshot
> 
> 4) btrfs-w/multiple-disk doesn't work with lvm AND snapshot
> 
> However I still doesn't understood why you want btrfs-w/multiple disk over LVM ?

I want to split a few disks into partitions, but I want to create,
move, and resize the partitions from time to time.  Only LVM can do
that without taking the machine down, reducing RAID integrity levels,
hotplugging drives, or leaving installed drives idle most of the time.

I want btrfs-raid1 because of its ability to replace corrupted or lost
data from one disk using the other.  If I run a single-volume btrfs
on LVM-RAID1 (or dm-RAID1, or RAID1 at any other layer of the storage
stack), I can detect lost data, but not replace it automatically from
the other mirror.

Since I want both things at the same time, I have btrfs w/multiple disks
on LVM.

The LVM snapshots are for providing an 'undo' capability when I experiment
with some btrfs or btrfsck feature that destroys the filesystem.

> >>>>> and mounting the filesystem fails at 3.  
> >>>> Are you sure ?
> >>>
> >>> Yes, I'm sure.  I've had to replace filesystems destroyed this way.
> 
> In a previous email you wrote:
> >> Multi-device btrfs fails at 2, 
> So I assumed that the point 3 onwards were related to a "single-disk" btrfs.
> 
> 
> 
> [...]
> 
> 
> -- 
> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-25 22:21                                   ` Zygo Blaxell
@ 2014-11-25 22:47                                     ` Chris Murphy
       [not found]                                     ` <CAJCQCtQUM=viSoPtcJMcyKquYb1DLmEsqBi=p++uXPy63+r3Ow@mail.gmail.com>
  2014-11-26 17:19                                     ` Goffredo Baroncelli
  2 siblings, 0 replies; 64+ messages in thread
From: Chris Murphy @ 2014-11-25 22:47 UTC (permalink / raw)
  To: linux-btrfs

What happens when all btrfs LVs are unmounted, and you lvchange -an
the LVs (the pair) you do not want mounted; and then btrfs dev scan;
and then mount one of the devices? It should only find the matching LV
because the others are deactivated. I know this isn't ideal, but it's
better than corruption.


Chris Murphy

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-25 21:59                                 ` Goffredo Baroncelli
  2014-11-25 22:21                                   ` Zygo Blaxell
@ 2014-11-26  3:22                                   ` Duncan
  2014-11-26  5:11                                     ` Chris Murphy
  2014-11-26 22:08                                     ` Robert White
  1 sibling, 2 replies; 64+ messages in thread
From: Duncan @ 2014-11-26  3:22 UTC (permalink / raw)
  To: linux-btrfs

Goffredo Baroncelli posted on Tue, 25 Nov 2014 22:59:53 +0100 as
excerpted:

> However I still doesn't understood why you want btrfs-w/multiple disk
> over LVM ?

While I'm not an LVM person here, and he already replied with essentially 
the same point, I think it's worth repeating...

Btrfs' checksummed error detection and automatic rewrite from a different 
copy isn't a small thing, and simply isn't available at all with most 
would-be alternatives (zfs being the only similar thing I know of for 
Linux, and of course it has its own issues both technical and social/
legal/license).  That alone is worth running multi-device btrfs to get.  
That makes btrfs a near-mandatory part of the picture, whatever it's on.

And for people wanting LVM's volume management (including partitioning 
without many of the limitations), the direct result is multi-device btrfs 
on lvm.

From my perspective, however, btrfs is simply incompatible with lvm 
snapshots, because the basic assumptions are incompatible.  Btrfs assumes 
UUIDs will be exactly what they say on the label, /unique/, while lvm's 
snapshot feature directly breaks that uniqueness by copying the (former) 
UUID, thus making the former UUID no longer unique and thus no longer 
truly UUID.  Thus, part of the lvm /feature/ of snapshots is in direct 
contradiction to a basic assumption of btrfs, that UUIDs are exactly 
that, unique, making that feature directly incompatible with btrfs on a 
very basic level.

So people can have their btrfs on lvm, but if they do, they have to forego 
LVM snapshots because btrfs isn't compatible with their usage.  To me 
it's as simple as that, and people can choose either btrfs or lvm 
snapshots, but not both, it's one XOR the other.  So for me it's simply 
choose the one you will have the most difficulty doing without and forgo 
the other one.  Not a problem, just make your choice and move on.

OTOH, there's that common signature about the reasonable man folding to 
the circumstance while the unreasonable man insists on folding the 
circumstance to his wishes instead, so progress depends on the 
unreasonable man...

But that's exactly what I see here, an unreasonable man insisting that 
entirely logical circumstance bend to his will.  Which, given someone to 
actually code it up, it might well do. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
       [not found]                                       ` <20141126021134.GR17380@hungrycats.org>
@ 2014-11-26  4:48                                         ` Chris Murphy
  0 siblings, 0 replies; 64+ messages in thread
From: Chris Murphy @ 2014-11-26  4:48 UTC (permalink / raw)
  To: Btrfs BTRFS

On Tue, Nov 25, 2014 at 7:11 PM, Zygo Blaxell <zblaxell@furryterror.org> wrote:
> On Tue, Nov 25, 2014 at 03:46:32PM -0700, Chris Murphy wrote:
>> What happens when all btrfs LVs are unmounted, and you lvchange -an
>> the LVs (the pair) you do not want mounted; and then btrfs dev scan;
>> and then mount one of the devices? It should only find the matching LV
>> because the others are deactivated. I know this isn't ideal, but it's
>> better than corruption.
>
> This is one of two possible ways to assemble the btrfs correctly.
> The other is to explicitly name all of the devices when mounting.

OK I didn't realize it was possible to explicitly name all of them,
the last time I'd tried this (about 9 epochs ago) mount didn't
understand being passed two devices before the mount point.

>
> The challenge for the poor end-user (or inexperienced sysadmin) is to
> defeat all the defaults in system installers, initramfs-tools, lvm2,
> udev, etc. to prevent btrfs from destroying a filesystem accidentally.

I agree if it finds two identical volumes it should fail to mount with
some coherent error.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-26  3:22                                   ` Duncan
@ 2014-11-26  5:11                                     ` Chris Murphy
  2014-11-26 22:08                                     ` Robert White
  1 sibling, 0 replies; 64+ messages in thread
From: Chris Murphy @ 2014-11-26  5:11 UTC (permalink / raw)
  To: Btrfs BTRFS

On Tue, Nov 25, 2014 at 8:22 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> From my perspective, however, btrfs is simply incompatible with lvm
> snapshots, because the basic assumptions are incompatible.  Btrfs assumes
> UUIDs will be exactly what they say on the label, /unique/, while lvm's
> snapshot feature directly breaks that uniqueness by copying the (former)
> UUID, thus making the former UUID no longer unique and thus no longer
> truly UUID.

The seed device has a mechanism to change volume UUID without
rewriting a bunch of stuff in the original, the gotcha is that it
requires adding a device.

man fsfreeze says "fsfreeze is unnecessary for device-mapper devices.
The device-mapper (and LVM) automatically freezes filesystem on the
device when a snapshot creation is requested." So if it's possible to
communicate snapshotting/freezing to the fs at snapshot time, then
maybe btrfs could 'btrfstune -S 1' the volume in the snapshot. That
way that snapshot actually contains a btrfs seed device, which is read
only. At least the snapshot copy isn't going to get obliterated in an
accident; even though most people would probably want the origin LV to
be protected while considering the snapshot disposable.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-25 22:21                                   ` Zygo Blaxell
  2014-11-25 22:47                                     ` Chris Murphy
       [not found]                                     ` <CAJCQCtQUM=viSoPtcJMcyKquYb1DLmEsqBi=p++uXPy63+r3Ow@mail.gmail.com>
@ 2014-11-26 17:19                                     ` Goffredo Baroncelli
  2014-11-27  4:15                                       ` Zygo Blaxell
  2 siblings, 1 reply; 64+ messages in thread
From: Goffredo Baroncelli @ 2014-11-26 17:19 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs

On 11/25/2014 11:21 PM, Zygo Blaxell wrote:
>> > However I still doesn't understood why you want btrfs-w/multiple disk over LVM ?
> I want to split a few disks into partitions, but I want to create,
> move, and resize the partitions from time to time.  Only LVM can do
> that without taking the machine down, reducing RAID integrity levels,
> hotplugging drives, or leaving installed drives idle most of the time.
> 
> I want btrfs-raid1 because of its ability to replace corrupted or lost
> data from one disk using the other.  If I run a single-volume btrfs
> on LVM-RAID1 (or dm-RAID1, or RAID1 at any other layer of the storage
> stack), I can detect lost data, but not replace it automatically from
> the other mirror.
OK, now I understand.

Anyway, as a workaround, take into account that you can pass the
devices explicitly:

mount -o device=/dev/sda,device=/dev/sdb,device=/dev/sdc /dev/sdd /mnt

(supposing that the filesystem is on /dev/sda.../dev/sdd)
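
As a rough illustration of the command above, here is a small Python
sketch that builds the same kind of mount invocation from a list of
member devices. The function name and the example paths are placeholders
chosen for this example, not part of any existing tool; the code only
builds the argument list and does not run mount.

```python
def btrfs_mount_argv(devices, mountpoint):
    """Build a mount command line that names every btrfs member device.

    All devices but the last go into -o device=...; the last one is the
    device argument handed to mount itself, as in the example above.
    """
    if not devices:
        raise ValueError("need at least one device")
    *extra, primary = devices
    argv = ["mount"]
    if extra:
        argv += ["-o", ",".join(f"device={d}" for d in extra)]
    argv += [primary, mountpoint]
    return argv


# Reproduces the command line from the message above.
print(" ".join(btrfs_mount_argv(
    ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"], "/mnt")))
```

Naming the members explicitly means the assembly does not have to rely
only on whatever a device scan happened to find.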

I am working on a mount.btrfs helper. The aim of this helper is to manage
the assembly of multiple devices; the main points will be:
- wait until all the devices have appeared
- allow (if required) mounting in degraded mode after a timeout
- at this point it could/should also skip the lvm-snapshotted devices (but first
I have to know how to recognize these)

I hope to post the patches next week.

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-26  3:22                                   ` Duncan
  2014-11-26  5:11                                     ` Chris Murphy
@ 2014-11-26 22:08                                     ` Robert White
  2014-11-27  9:08                                       ` Duncan
  1 sibling, 1 reply; 64+ messages in thread
From: Robert White @ 2014-11-26 22:08 UTC (permalink / raw)
  To: Duncan, linux-btrfs

On 11/25/2014 07:22 PM, Duncan wrote:
> From my perspective, however, btrfs is simply incompatible with lvm
> snapshots, because the basic assumptions are incompatible.  Btrfs assumes
> UUIDs will be exactly what they say on the label, /unique/, while lvm's
> snapshot feature directly breaks that uniqueness by copying the (former)
> UUID, thus making the former UUID no longer unique and thus no longer
> truly UUID.  Thus, part of the lvm /feature/ of snapshots is in direct
> contradiction to a basic assumption of btrfs, that UUIDs are exactly
> that, unique, making that feature directly incompatible with btrfs on a
> very basic level.

A finer point here. LVM doesn't "copy" the UUID. An LVM snapshot is a 
copy-on-write entity so it _exposes_ the single sector(s) of the 
superblock(s) in both views of the underlying storage. This is universal 
to the idea of a snapshot. Just as a "btrfs subvol snap /old /new" 
exposes all the "unique" elements of "/old" under the name "/new" (in 
preparation for the user to implement subsequent divergence), "lvcreate 
--snapshot Old New" causes every block-N of Old to be identically 
available as block-N of New (in preparation for the user to implement 
subsequent divergence).

In point of fact the LVM snapshot operation is a zero-copy operation at 
its heart. After the snapshot is established, when a block is modified 
in Old, its original content is saved in New. When blocks are written 
in New, they are written in place and the reference to the block content 
in Old is overwritten.
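
The behavior described above can be modeled with a toy sketch in Python.
This is only a simplified illustration of copy-on-write semantics, not
how device-mapper is actually implemented; the class and attribute names
are invented for the example.

```python
class Origin:
    """Toy origin LV: a list of blocks plus the snapshots taken of it."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # Zero-copy: the snapshot starts with no private blocks at all.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, n, data):
        # Before overwriting, save the old content in every snapshot
        # that has not yet diverged at block n.
        for snap in self.snapshots:
            snap.exceptions.setdefault(n, self.blocks[n])
        self.blocks[n] = data


class Snapshot:
    """Toy snapshot LV: private blocks, falling back to the origin."""

    def __init__(self, origin):
        self.origin = origin
        self.exceptions = {}  # block number -> private content

    def read(self, n):
        return self.exceptions.get(n, self.origin.blocks[n])

    def write(self, n, data):
        # Written in place; the reference to the origin block is dropped.
        self.exceptions[n] = data


old = Origin(["superblock with fsid 1234", "data A"])
new = old.snapshot()
old.write(1, "data B")
print(new.read(0))  # same superblock, and thus same UUID, as the origin
print(new.read(1))  # still the content from snapshot time
```

Block 0 is never copied or changed, so both views present the very same
superblock, which is exactly why btrfs sees the same UUID twice.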

This is the reason that fsfreeze is unnecessary for things above LVM 
snapshots as the instant-in-time divergence is _instant_. It's not that 
LVM goes out and does an fsfreeze equivalent action, it's that the switch 
to write-divergence is essentially atomic. A bunch of metadata is set up 
and then all-at-once one write behavior is switched with another by 
re-mapping the device access routines.

So while you may have a point about btrfs being unprepared for LVM, 
neither party is particularly "at fault" in any way.

The "damn you photocopier for making photocopies so identically" nature 
of your problem with LVM seems to be leading you to misplaced conclusions.

If you need to harmonize these sorts of things, you need to be able to 
re-write the blocks in question with disambiguating information (like new 
UUIDs) or restrict your accesses in some other manner.

If you are waiting for someone to "code it up" perhaps you should do so. 
But it will _never_ be automatic because the use cases that don't match 
your expectations may need the founding assumptions to be as they are today.

In other words, your belief that your position is "entirely logical" may 
be a little off, particularly if you think LVM is "Copying" things when 
it does a snapshot.

As previously stated, XFS solved this problem by providing a tool that 
would change the UUID of a file system. This tool could then be pointed 
at either (or both) the original and/or snapshot volumes as needed.

I don't see a "re-make the btrfs" option for changing UUIDs and LVM 
doesn't care _at_ _all_ about what is actually in its volumes (okay, 
lvresize has some fsck nonsense, but that's just messy).

It might even be "wrong" to try to harmonize those features, like trying 
to put a manual clutch into a car with an automatic transmission... it 
may just not fit.

Given that BTRFS wants to play in the same level of abstraction as LVM, 
it's kind of a given that they'll butt heads over things like conflicting 
definitions of what it means to take a snapshot.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-26 17:19                                     ` Goffredo Baroncelli
@ 2014-11-27  4:15                                       ` Zygo Blaxell
  2014-11-28 17:05                                         ` Goffredo Baroncelli
  0 siblings, 1 reply; 64+ messages in thread
From: Zygo Blaxell @ 2014-11-27  4:15 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3001 bytes --]

On Wed, Nov 26, 2014 at 06:19:05PM +0100, Goffredo Baroncelli wrote:
> On 11/25/2014 11:21 PM, Zygo Blaxell wrote:
> >> > However I still doesn't understood why you want btrfs-w/multiple disk over LVM ?
> > I want to split a few disks into partitions, but I want to create,
> > move, and resize the partitions from time to time.  Only LVM can do
> > that without taking the machine down, reducing RAID integrity levels,
> > hotplugging drives, or leaving installed drives idle most of the time.
> > 
> > I want btrfs-raid1 because of its ability to replace corrupted or lost
> > data from one disk using the other.  If I run a single-volume btrfs
> > on LVM-RAID1 (or dm-RAID1, or RAID1 at any other layer of the storage
> > stack), I can detect lost data, but not replace it automatically from
> > the other mirror.
> OK, now I have understood.
> 
> Anyway as workaround, take in account that you can pass explicitly the
> devices as:
> 
> mount -o device=/dev/sda,device=/dev/sdb,device=/dev/sdc /dev/sdd /mnt
> 
> (supposing that the filesystem is on /dev/sda.../dev/sdd)
> 
> I am working to a mount.btrfs helper. The aim of this helper is to manage
> the assembling of multiple devices; the main points will be:
> - wait until all the devices appeared

...and make sure there are no duplicate UUIDs.

> - allow (if required) to mount in degraded mode after a timeout

This is a terrible idea with current btrfs, at least for read-write
degraded mounting (fallback to read-only degraded would be OK).
Mounting a filesystem read-write and degraded is something you only want
to do immediately before you replace all the missing disks and bring the
filesystem up to a non-degraded state, and after you've ensured that the
missing disks can never, ever come back; otherwise, btrfs eats your data
in a slightly different way than we have discussed so far...

> - at this point it could/should also skip the lvm-snapshotted devices (but before 
> I have to know how recognize these) 

You don't have to recognize them as snapshots (and it's probably better
not to treat snapshots specially anyway--how do you know whether the
snapshot or the origin LVs are wanted for mounting?).  You just have to
detect duplicate UUIDs at the btrfs subdevice level, and if any are found,
stop immediately (or get a hint from the admin).

This is a weakness of the current udev and asynchronous device hotplug
concept:  there is no notion of bus enumeration in progress, so we can be
trying to assemble multi-device storage before we have all the devices
visible.  Assembly of aggregate storage (whatever it is--btrfs, md,
lvm2...) has to wait until all known storage buses are fully enumerated
in order to detect if there are duplicates.
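
A minimal sketch of the duplicate check suggested above, in Python. It
assumes the caller has already collected, for each candidate block
device, its path, the filesystem UUID and the per-device UUID (for
example from blkid output); how that collection happens is left out.
The function name, paths and UUID strings are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_members(devices):
    """Return {(fs_uuid, sub_uuid): [paths]} for pairs seen more than once.

    `devices` is an iterable of (path, fs_uuid, sub_uuid) tuples.
    """
    seen = defaultdict(list)
    for path, fs_uuid, sub_uuid in devices:
        seen[(fs_uuid, sub_uuid)].append(path)
    return {key: paths for key, paths in seen.items() if len(paths) > 1}


devices = [
    ("/dev/vg/a", "fs-1", "dev-1"),
    ("/dev/vg/b", "fs-1", "dev-2"),
    ("/dev/vg/a_snap", "fs-1", "dev-1"),  # snapshot of /dev/vg/a
]
duplicates = find_duplicate_members(devices)
if duplicates:
    # Stop here and ask the admin which path is wanted.
    print("ambiguous members:", duplicates)
```

Note that the check does not need to know which path is the snapshot and
which is the origin; it only has to refuse to guess.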

> I hope to issue the patches in the next week
> 
> BR
> G.Baroncelli
> 
> -- 
> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-26 22:08                                     ` Robert White
@ 2014-11-27  9:08                                       ` Duncan
  2014-11-28  7:10                                         ` Chris Murphy
  0 siblings, 1 reply; 64+ messages in thread
From: Duncan @ 2014-11-27  9:08 UTC (permalink / raw)
  To: linux-btrfs

Robert White posted on Wed, 26 Nov 2014 14:08:14 -0800 as excerpted:

> On 11/25/2014 07:22 PM, Duncan wrote:
>> From my perspective, however, btrfs is simply incompatible with lvm
>> snapshots, because the basic assumptions are incompatible.  Btrfs
>> assumes UUIDs will be exactly what they say on the label, /unique/,
>> while lvm's snapshot feature directly breaks that uniqueness by copying
>> the (former) UUID, thus making the former UUID no longer unique and
>> thus no longer truly UUID.  Thus, part of the lvm /feature/ of
>> snapshots is in direct contradiction to a basic assumption of btrfs,
>> that UUIDs are exactly that, unique, making that feature directly
>> incompatible with btrfs on a very basic level.
> 
> A finer point here. LVM doesn't "copy" the UUID. AN LVM snapshot is a
> copy-on-write entity so it _exposes_ the single sector(s) of the
> superblock(s) in both views of the underlying storage.

I /hate/ it when this happens, which is why my posts often end up so 
long.  People keep saying shorten them, but when I try, invariably I end 
up shortcutting something like this and get called on it! =:^(

So, umm... kinda late now, but read that "copy" as if it had a footnote 
attached, saying "Yes, I know it's not an actual copy, it's two views of the 
same thing using COW, but my point is, from the btrfs perspective it's a 
copy, the "universally UNIQUE ID" no longer looks "unique" and thus no 
longer can be properly called a UUID at all."

Which kinda makes most of the rest of what you said, which I agree with 
in general were it the case that I actually thought of it as a literal 
copy, unnecessary...

Tho I can't fault you for catching and pointing out my shortcut as an 
error, because you're absolutely correct in that case, and I'd almost 
certainly be doing the same thing were the situation reversed.

> So while you may have a point about btrfs being unprepared for LVM,
> neither party is particularly "at fault" in any way.
> 
> The "damn you photocopier for making photocopies so identically" nature
> of your problem with LVM seems to be leading you to misplaced
> conclusions.

Well, to the extent that I tried to take an unwarranted logical shortcut 
and didn't properly describe it...

But... I'd still say LVM is "at fault" to the extent that anyone is, as 
it /knows/ it's dealing with UUIDs because after all that's part of 
what's /on/ what it's snapshotting, and it doesn't make any effort to 
deal with the situation, despite the at least theoretical (and now in 
fact) confusion that may occur when former UUIDs are no longer unique and 
thus no longer UUIDs.

However, the point remains, they are pretty much incompatible, in that 
one assumes "unique" means that a second one won't pop up elsewhere and 
depends on exactly that, while the functionality of the other is exactly 
that, to make another view of the same thing, including the otherwise 
unique ID, pop up elsewhere, with COW semantics.

> If you are waiting for someone to "code it up" perhaps you should do so.

I'm not sure if that was the singular or plural "you", but in any case, 
it won't be /me/, because I'm not a coder, simply another sysadmin 
willing to guinea-pig this fascinating new filesystem toy. =:^)

> As previously stated XFS solved this problem by providing a tool that
> would change the UUID of a file system. This tool cold then be pointed
> at either (or both) the original and/or snapshot volumes as needed.

I think that'll eventually happen.  Actually, I see it's on the wiki 
project ideas page, now (see 1.2.25 and 1.2.26, online/offline UUID 
changes, respectively):

https://btrfs.wiki.kernel.org/index.php/Project_ideas

There's even POC code. =:^)  Wiki page history says Kdave added that on 
06 Oct. 2014, so the entry is reasonably new, and the POC's encouraging, 
but will it go anywhere from there?

> Given that BTRFS want's to play in the same level of abstraction as LVM,
> its kind of a given that they'll butt heads over things like conflicting
> definitions of what it means to take a snapshot.

Agreed.

Actually, given btrfs is already doing much of it, it'd be interesting if 
it eventually got the ability to specify where subvolumes went and limit 
them in size (ideally more directly than the existing btrfs quotas 
related functionality does, etc), thus avoiding having to rely on LVM for 
that and eliminating the need for it in scenarios where that's desired.  
Couple that with the better snapshot handling that is already in the 
works, and would there /still/ be a need for LVM under btrfs then; for 
what if so, and could it too be integrated into btrfs?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-27  9:08                                       ` Duncan
@ 2014-11-28  7:10                                         ` Chris Murphy
  2014-11-29  7:29                                           ` Duncan
  0 siblings, 1 reply; 64+ messages in thread
From: Chris Murphy @ 2014-11-28  7:10 UTC (permalink / raw)
  Cc: Btrfs BTRFS

On Thu, Nov 27, 2014 at 2:08 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> So, umm... kinda late now, but read that "copy" as if it had a footnote
> attached, saying "Yes, I know it's not actual copy, it's two views of the
> same thing using COW, but my point is, from the btrfs perspective it's a
> copy, the "universally UNIQUE ID" no longer looks "unique" and thus no
> longer can be properly called a UUID at all."

The copy is sort of a misnomer anyway because up until the computer
age the copy was a derivative, a facsimile, like a photocopy. But a
copy of a digital file is actually another original. Therein lies the
problem with the LVM snapshot in this context, we don't want another
original. We want a copy, as in we want something we know has been
derived from something else, and therefore can be discriminated.

And that's the same problem with subvolume UUIDs being "reused" when
creating new Btrfs volumes, which have new volume UUIDs, from a Btrfs
seed device. There are now multiple originals of those subvolumes,
there's no distinguishing them by their UUID alone.


> But... I'd still say LVM is "at fault" to the extent that anyone is, as
> it /knows/ it's dealing with UUIDs because after all that's part of
> what's /on/ what it's snapshotting, and it doesn't make any effort to
> deal with the situation, despite the at least theoretical (and now in
> fact) confusion that may occur when former UUIDs are no longer unique and
> thus no longer UUIDs.

Well, I don't think RFC 4122 would say it's not a UUID; the uniqueness
is only guaranteed at the time of UUID creation. And duplication isn't
creation so it's not going to say these things are no longer UUIDs,
they're just UUIDs that have been recycled. That RFC doesn't specify
workflow, but if it did, I think it'd basically say "oh crap, why'd
you go and do that?" After all a major point of UUIDs is that they are
effectively unlimited in quantity, therefore a.) we don't need a central
registry to avoid (unintended) collisions because they're so uncommon,
b.) we're encouraged to not be attached to specific UUIDs: when in
doubt, just create another one.
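
To make the creation versus duplication distinction concrete, here is a
tiny Python sketch using the standard uuid module. It is only an
illustration of the general point, not anything btrfs or LVM does.

```python
import uuid

original = uuid.uuid4()               # creation: a fresh random UUID
another = uuid.uuid4()                # when in doubt, just create another
duplicate = uuid.UUID(str(original))  # duplication: same value, not a new ID

print(original == another)    # False in practice; a collision is astronomically unlikely
print(original == duplicate)  # True
```

Nothing about the duplicate value marks it as derived from the original,
which is the discrimination problem described above.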

A very good example of WTF reuse of a UUID that irks me to no end is
that GNU parted devs decided to recycle the Microsoft Windows Basic
Data partition type GUID for Linux partitions. It's like watching
someone get run over by a zamboni with 50 feet of advance notice...



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-27  4:15                                       ` Zygo Blaxell
@ 2014-11-28 17:05                                         ` Goffredo Baroncelli
  2014-11-29  1:25                                           ` Robert White
  2014-11-29  4:59                                           ` Zygo Blaxell
  0 siblings, 2 replies; 64+ messages in thread
From: Goffredo Baroncelli @ 2014-11-28 17:05 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs

On 11/27/2014 05:15 AM, Zygo Blaxell wrote:
> On Wed, Nov 26, 2014 at 06:19:05PM +0100, Goffredo Baroncelli wrote:
>> On 11/25/2014 11:21 PM, Zygo Blaxell wrote:
>>>>> However I still doesn't understood why you want btrfs-w/multiple disk over LVM ?
>>> I want to split a few disks into partitions, but I want to create,
>>> move, and resize the partitions from time to time.  Only LVM can do
>>> that without taking the machine down, reducing RAID integrity levels,
>>> hotplugging drives, or leaving installed drives idle most of the time.
>>>
>>> I want btrfs-raid1 because of its ability to replace corrupted or lost
>>> data from one disk using the other.  If I run a single-volume btrfs
>>> on LVM-RAID1 (or dm-RAID1, or RAID1 at any other layer of the storage
>>> stack), I can detect lost data, but not replace it automatically from
>>> the other mirror.
>> OK, now I have understood.
>>
>> Anyway as workaround, take in account that you can pass explicitly the
>> devices as:
>>
>> mount -o device=/dev/sda,device=/dev/sdb,device=/dev/sdc /dev/sdd /mnt
>>
>> (supposing that the filesystem is on /dev/sda.../dev/sdd)
>>
>> I am working to a mount.btrfs helper. The aim of this helper is to manage
>> the assembling of multiple devices; the main points will be:
>> - wait until all the devices appeared
> 
> ...and make sure there are no duplicate UUIDs.
Yes, in the end I implemented the "snapshot" detection this way:
if two autodetected devices have the same DISK_UUID (reported as 
UUID_SUB by blkid), the mount process stops. I also check the 
num_devices field of the superblock.
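
The patches themselves are not shown here, so the following Python
fragment is only a guess at the shape of such a check, under the
assumption that the helper knows the per-device UUIDs (UUID_SUB in blkid
output) of the devices it found and the device count recorded in the
superblock. The function name and the three state names are invented.

```python
def assembly_state(sub_uuids, num_devices):
    """Classify the autodetected members of one btrfs filesystem.

    sub_uuids: per-device UUIDs of the devices found so far.
    num_devices: device count recorded in the superblock.
    """
    if len(set(sub_uuids)) < len(sub_uuids):
        return "ambiguous"   # same device UUID seen twice: stop
    if len(sub_uuids) > num_devices:
        return "ambiguous"   # more members than the filesystem has
    if len(sub_uuids) < num_devices:
        return "incomplete"  # keep waiting for the missing devices
    return "complete"


print(assembly_state(["dev-1", "dev-2"], 2))
print(assembly_state(["dev-1", "dev-1", "dev-2"], 2))
```

Only the "complete" case would be safe to mount automatically; the
"ambiguous" case is the one an LVM snapshot of a member produces.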

> 
>> - allow (if required) to mount in degraded mode after a timeout
> 
> This is a terrible idea with current btrfs, at least for read-write
> degraded mounting (fallback to read-only degraded would be OK).
> Mounting a filesystem read-write and degraded is something you only want
> to do immediately before you replace all the missing disks and bring the
> filesystem up to a non-degraded space and after you've ensured that the
> missing disks can never, ever come back; otherwise, btrfs eats your data
> in a slightly different way than we have discussed so far...

I don't care. If the user passes "degraded" in the mount options, 
he gets it. Anyway, I hope this (wrong) btrfs behavior will be
fixed.
> 
>> - at this point it could/should also skip the lvm-snapshotted devices (but before 
>> I have to know how recognize these) 
> 
> You don't have to recognize them as snapshots (and it's probably better
> not to treat snapshots specially anyway--how do you know whether the
> snapshot or the origin LVs are wanted for mounting?).  You just have to
> detect duplicate UUIDs at the btrfs subdevice level, and if any are found,
> stop immediately (or get a hint from the admin).

For the disk autodetection, I am still convinced that it is a "sane" default
to skip the lvm-snapshot

> 
> This is a weakness of the current udev and asynchronous device hotplug
> concept:  there is no notion of bus enumeration in progress, so we can be
> trying to assemble multi-device storage before we have all the devices
> visible.  Assembly of aggregate storage (whatever it is--btrfs, md,
> lvm2...) has to wait until all known storage buses are fully enumerated
> in order to detect if there are duplicates.

It is more complex than that. Some devices may appear after the "1st" bus
enumeration.


> 
>> I hope to issue the patches in the next week
>>
>> BR
>> G.Baroncelli
>>
>> -- 
>> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
>> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-28 17:05                                         ` Goffredo Baroncelli
@ 2014-11-29  1:25                                           ` Robert White
  2014-11-29  7:35                                             ` Goffredo Baroncelli
  2014-11-29  7:37                                             ` MegaBrutal
  2014-11-29  4:59                                           ` Zygo Blaxell
  1 sibling, 2 replies; 64+ messages in thread
From: Robert White @ 2014-11-29  1:25 UTC (permalink / raw)
  To: kreijack, Zygo Blaxell; +Cc: linux-btrfs

On 11/28/2014 09:05 AM, Goffredo Baroncelli wrote:
> For the disk autodetection, I am still convinced that it is a "sane" default
> to skip the lvm-snapshot.

No... please don't...

Maybe offer an option to select between snapshots or no-snapshots but in 
much the same way there is no _functional_ difference between a 
subvolume and a snapshot in btrfs, there is no "degenerate" status to an 
LVM snapshot.

It would be way more useful if the helper dumped a message via stderr or 
syslog that said something like "UUID=xxxxxxxx ambiguous, must select 
between /dev/AA and /dev/BB using device= to mount filesystem."



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-28 17:05                                         ` Goffredo Baroncelli
  2014-11-29  1:25                                           ` Robert White
@ 2014-11-29  4:59                                           ` Zygo Blaxell
  2014-11-29  7:55                                             ` Robert White
  1 sibling, 1 reply; 64+ messages in thread
From: Zygo Blaxell @ 2014-11-29  4:59 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1147 bytes --]

On Fri, Nov 28, 2014 at 06:05:48PM +0100, Goffredo Baroncelli wrote:
> On 11/27/2014 05:15 AM, Zygo Blaxell wrote:
> > This is a weakness of the current udev and asynchronous device hotplug
> > concept:  there is no notion of bus enumeration in progress, so we can be
> > trying to assemble multi-device storage before we have all the devices
> > visible.  Assembly of aggregate storage (whatever it is--btrfs, md,
> > lvm2...) has to wait until all known storage buses are fully enumerated
> > in order to detect if there are duplicates.
> 
> It is more complex than that. Some devices may appear after the "1st" bus
> enumeration.

That case is well handled already--a new enumeration will start with the
second (and all later) hotplug events.

The problem arises when we try to assemble disk arrays before the
known end of the "1st" (or any) enumeration.  There is no way for an
enumerating agent to tell other agents "this is definitely not the
complete list of devices yet, other devices may be inserted imminently"
and defer all the multi-device assembly until the address space of the
enumerating bus is fully covered.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-28  7:10                                         ` Chris Murphy
@ 2014-11-29  7:29                                           ` Duncan
  2014-11-29  8:20                                             ` Robert White
  0 siblings, 1 reply; 64+ messages in thread
From: Duncan @ 2014-11-29  7:29 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Fri, 28 Nov 2014 00:10:40 -0700 as excerpted:

> On Thu, Nov 27, 2014 at 2:08 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>> So, umm... kinda late now, but read that "copy" as if it had a footnote
>> attached, saying "Yes, I know it's not actual copy, it's two views of
>> the same thing using COW, but my point is, from the btrfs perspective
>> it's a copy, the "universally UNIQUE ID" no longer looks "unique" and
>> thus no longer can be properly called a UUID at all."
> 
> The copy is sort of a misnomer anyway because up until the computer age
> the copy was a derivative, a facsimile, like a photocopy. But a copy of
> a digital file is actually another original. Therein lies the problem
> with the LVM snapshot in this context, we don't want another original.
> We want a copy, as in we want something we know has been derived from
> something else, and therefore can be discriminated.

Very good point.  I had all the pieces but hadn't put them together yet, 
so thanks. =:^)

> Well RFC 4122 I don't think would say it's not a UUID, the uniqueness is
> only guaranteed at the time of UUID creation. And duplication isn't
> creation so it's not going to say these things are no longer UUIDs,
> they're just UUIDs that have been recycled. That RFC doesn't specify
> workflow, but if it did, I think it'd basically say "oh crap, why'd you
> go and do that?" After all a major point of UUIDs is that they are
> effectively unlimited in quantity, therefore a.) we don't need central
> registry to avoid (unintended) collisions because they're so uncommon,
> b.) we're encouraged to not be attached to specific UUIDs when in doubt
> just create another one.

Another good point.  One common and less RFC/technical way of putting it, 
that I had thought about a few times but hadn't actually posted yet IIRC, 
is the old "If it hurts when you bang your head against the wall, quit 
banging!" =:^)

IOW, LVM could change the UUIDs in its "copies", COWing that bit in
order to do so.  While that wouldn't change the same UUIDs embedded in,
for instance, btrfs internals, it would provide a mechanism to keep initial
scans from confusing things, and filesystems or other UUID applications
that duplicated the number for their own internals would then need to
provide tools that rewrote them to match the LVM-changed master location
UUID.  Those that failed to do so would fail to function unless/until the
master location version was changed back, but the tools would likely
eventually be provided, as I expect they will be here, and the difference
would be that at least it'd keep mixups like this from happening.

> A very good example of WTF reusage of a UUID that irks me to no end is
> GNU parted devs decided to recycle the Microsoft Windows Basic Data
> partition type GUID for Linux partitions. It's like watching someone get
> run over by a zamboni with 50 feet of advance notice...

At least I don't have to worry about that one, since I no longer agree to 
"WE REFUSE TO TELL YOU SPECIFICALLY WHAT THIS SOFTWARE DOES AS WE DON'T 
SUPPLY THE SOURCES, BUT YOU ARE STILL REQUIRED TO ACCEPT ALL 
RESPONSIBILITY FOR IT, REGARDLESS OF WHAT IT DOES AND REGARDLESS OF 
WHETHER WE'VE BEEN WARNED" style EULAs, which is basically all of them, 
which means I have no legal way to run that software, so I don't.  Note 
that the GPL among others has similar liability disclaimer wording (and 
to be fair it'd be hard not to, since the sources are there and the 
original author can hardly be held responsible for later modifications to 
them), but because it actually gives you the sources too, it allows you 
to fairly make your own decision about the responsibility you're about to 
take on.

Since I can't/won't run pretty much anything proprietary, there's little 
chance of it being taken as anything but Linux, here.  (Tho I actually 
use (c)gdisk for partitioning here and it appears to use a different GUID. 
(0700 in its short form which AFAIK is gdisk specific, for MS basic data, 
while it uses 8300 for general Linux filesystems.  I could look up the 
long form GUIDs, but meh...)


-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-29  1:25                                           ` Robert White
@ 2014-11-29  7:35                                             ` Goffredo Baroncelli
  2014-11-29  8:02                                               ` Robert White
  2014-11-29  7:37                                             ` MegaBrutal
  1 sibling, 1 reply; 64+ messages in thread
From: Goffredo Baroncelli @ 2014-11-29  7:35 UTC (permalink / raw)
  To: Robert White, Zygo Blaxell; +Cc: linux-btrfs

On 11/29/2014 02:25 AM, Robert White wrote:
> On 11/28/2014 09:05 AM, Goffredo Baroncelli wrote:
>> For the disk autodetection, I am still convinced that it is a "sane"
>> default to skip the lvm-snapshot.
> 
> No... please don't...
> 
> Maybe offer an option to select between snapshots or no-snapshots but
> in much the same way there is no _functional_ difference between a
> subvolume and a snapshot in btrfs, there is no "degenerate" status to
> an LVM snapshot.

I agree with you; but I have to find a "default" so during the boot
a system can start even if snapshots are present.

And pay attention that there could be cases where multiple
snapshots are present: how to group these? Maybe by generation number?

Anyway, for the moment my helper simply refuses to mount if there is
a conflict of dev_uuid.

> 
> It would be way more useful if the helper dumped a message via stderr
> or syslog that said something like "UUID=xxxxxxxx ambiguous, 

This is what is printed when the helper finds a duplicate uuid:

ghigo@emulato:~$ sudo lvdisplay | grep "LV Path"
  LV Path                /dev/test/lv01
  LV Path                /dev/test/lv02
  LV Path                /dev/test/lv02_snap
  LV Path                /dev/test/lv01_snap

ghigo@emulato:~$ sudo mount /dev/test/lv01 /mnt/btrfs1/
ERROR: disk '/dev/mapper/test-lv01' and '/dev/mapper/test-lv01_snap' have the same disk uuid
ERROR: disk '/dev/mapper/test-lv02_snap' and '/dev/mapper/test-lv02' have the same disk uuid

> must
> select between /dev/AA and /dev/BB using device= to mount
> filesystem."

But anyway I can force the disk to mount:

ghigo@emulato:~$ sudo mount /dev/test/lv01_snap -o device=/dev/test/lv02_snap /mnt/btrfs1/
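
The behaviour shown above (refuse on a duplicate, but accept an explicit choice) can be sketched as follows. This is a minimal Python illustration under assumptions, not the helper's real code: `resolve_devices` and its arguments are hypothetical names, and the error text borrows the wording suggested earlier in the thread.

```python
def resolve_devices(candidates, explicit=()):
    """Pick one path per device UUID, or raise with a hint for the admin.

    candidates: dict mapping dev_uuid -> list of device paths that carry it.
    explicit:   paths named by the admin (mount source or device= options).
    """
    chosen = []
    for dev_uuid, paths in sorted(candidates.items()):
        if len(paths) == 1:
            chosen.append(paths[0])
            continue
        # Duplicate dev_uuid: never guess, only accept an explicit choice.
        picked = [p for p in paths if p in explicit]
        if len(picked) != 1:
            raise RuntimeError(
                f"UUID={dev_uuid} ambiguous, must select between "
                f"{' and '.join(sorted(paths))} using device= to mount filesystem."
            )
        chosen.append(picked[0])
    return chosen
```

The sketch deliberately has no "prefer the origin" or "prefer the snapshot" fallback; whether such a default is sane is exactly what is being debated here.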

> 
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-29  1:25                                           ` Robert White
  2014-11-29  7:35                                             ` Goffredo Baroncelli
@ 2014-11-29  7:37                                             ` MegaBrutal
  1 sibling, 0 replies; 64+ messages in thread
From: MegaBrutal @ 2014-11-29  7:37 UTC (permalink / raw)
  To: linux-btrfs

2014-11-29 2:25 GMT+01:00 Robert White <rwhite@pobox.com>:
>
> On 11/28/2014 09:05 AM, Goffredo Baroncelli wrote:
>>
>> For the disk autodetection, I am still convinced that it is a "sane" default
>> to skip the lvm-snapshot.
>
>
> No... please don't...
>
> Maybe offer an option to select between snapshots or no-snapshots but in much the same way there is no _functional_ difference between a subvolume and a snapshot in btrfs, there is no "degenerate" status to an LVM snapshot.
>
> It would be way more useful if the helper dumped a message via stderr or syslog that said something like "UUID=xxxxxxxx ambiguous, must select between /dev/AA and /dev/BB using device= to mount filesystem."
>


I agree with this. Sometimes people will want to do exactly that:
mount the snapshot devices and not the origins. Listing devices in the
device= mount option sounds perfectly sane.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-29  4:59                                           ` Zygo Blaxell
@ 2014-11-29  7:55                                             ` Robert White
  2014-12-01 15:25                                               ` Zygo Blaxell
  0 siblings, 1 reply; 64+ messages in thread
From: Robert White @ 2014-11-29  7:55 UTC (permalink / raw)
  To: Zygo Blaxell, Goffredo Baroncelli; +Cc: linux-btrfs

On 11/28/2014 08:59 PM, Zygo Blaxell wrote:
> On Fri, Nov 28, 2014 at 06:05:48PM +0100, Goffredo Baroncelli wrote:
>> On 11/27/2014 05:15 AM, Zygo Blaxell wrote:
>>> This is a weakness of the current udev and asynchronous device hotplug
>>> concept:  there is no notion of bus enumeration in progress, so we can be
>>> trying to assemble multi-device storage before we have all the devices
>>> visible.  Assembly of aggregate storage (whatever it is--btrfs, md,
>>> lvm2...) has to wait until all known storage buses are fully enumerated
>>> in order to detect if there are duplicates.
>>
>> It is more complex than that. Some devices may appear after the "1st" bus
>> enumeration.
>
> That case is well handled already--a new enumeration will start with the
> second (and all later) hotplug events.
>
> The problem arises when we try to assemble disk arrays before the
> known end of the "1st" (or any) enumeration.  There is no way for an
> enumerating agent to tell other agents "this is definitely not the
> complete list of devices yet, other devices may be inserted imminently"
> and defer all the multi-device assembly until the address space of the
> enumerating bus is fully covered.
>
MDADM has an "attached" but not "started" state for arrays that handles
this condition during incremental assembly (see "mdadm --incremental
/dev/whatever").

To slightly misuse the vocabulary, as each partition is encountered and 
submitted to the system it's checked for a superblock. If one is found 
then it has the identity of an array encoded on it and if that array 
doesn't exist it is allocated, otherwise the device is added to the 
existent array. The array is only started if all the devices are 
accounted for unless an option is added to allow earlier starts, and 
even then "enough" of the devices must be present to make sense (e.g. 
only one device missing from a RAID5, or a correct pair of devices for a 
RAID10 etc.)
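
The incremental assembly just described can be modelled with a toy sketch. This is Python written for illustration only and is not mdadm's real logic: the class, the `(array_id, expected)` superblock tuple, and the `max_missing` parameter are all assumptions, and "enough" is reduced to a simple count (a real RAID10 check would need to know which devices form a pair).

```python
class IncrementalAssembler:
    """Toy model of mdadm-style incremental assembly."""

    def __init__(self):
        # array_id -> {"expected": int, "members": set, "started": bool}
        self.arrays = {}

    def add_device(self, path, superblock, allow_degraded=False, max_missing=1):
        """Attach one device; return True once its array is started."""
        if superblock is None:
            return False  # no superblock: not part of any array
        array_id, expected = superblock
        # Allocate the array on first sight, otherwise add to the existing one.
        array = self.arrays.setdefault(
            array_id, {"expected": expected, "members": set(), "started": False}
        )
        array["members"].add(path)
        missing = array["expected"] - len(array["members"])
        if missing == 0 or (allow_degraded and missing <= max_missing):
            array["started"] = True
        return array["started"]
```

Until `started` flips, the array exists but is unusable, which is the "attached but not started" state.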

So we'd need a "partially assembled but not started" state and some 
ioctls to do things like force-start or force-disown a filesystem that 
cannot be "finished" automatically.

That sort of thing is very easy to do with devices because devices don't 
have to be opened and can reject an open attempt, or at least the 
read/writes after an open and such.

Unfortunately a filesystem can really only exist as a mounted thing, and 
can really only be controlled by remounting thereafter. The most 
efficient way to do this would be to have an alternate file system
operations structure that was filled mostly with dummy operations that 
would return ENOENT and friends. Then the remount that finally fulfilled 
the file system's requirements would then switch out that struct for the 
fully functional one. That remount would need an "adddev=" and some 
other such options (much like AUFS adds layers).
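
To make the operations-swap idea concrete, here is a user-space Python sketch. It is an analogy only: the kernel would swap a C structure of function pointers, "adddev=" is the hypothetical option named above and not an existing btrfs mount option, and every class and method name here is invented for illustration.

```python
import errno


class PendingOps:
    """Dummy operations used while the filesystem is not yet assembled."""

    def read(self, path):
        raise OSError(errno.ENOENT, "filesystem not ready", path)


class ReadyOps:
    """Stand-in for the fully functional operations."""

    def __init__(self, files):
        self.files = files

    def read(self, path):
        return self.files[path]


class Mount:
    def __init__(self, expected_devices):
        self.expected = expected_devices
        self.devices = set()
        self.ops = PendingOps()

    def remount_adddev(self, device, files):
        """Model of a remount carrying the hypothetical adddev= option."""
        self.devices.add(device)
        if len(self.devices) == self.expected:
            self.ops = ReadyOps(files)  # swap in the real operations

    def read(self, path):
        return self.ops.read(path)
```

Callers always go through `Mount.read`, so they see ENOENT until the last device arrives and real data afterwards, without the mount point itself changing.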

It's all doable. But it stretches to near breaking the "mount" paradigm.
You would need an operation that looked like "mount -t btrfs -o 
do_we_need_this /dev/whatever /this/datum/means/nothing" to match and 
attach a device "wherever it goes" or you might end up needing to do the 
Cartesian product of trial attachments of each new device to all active 
filesystems to match it up, which is an ugly external scripting requirement.

As far as waiting for the address space to be fully covered. Meh. If a 
ready-or-not, or ready-enough, status is established in the file system 
it would be undesirable for it to know anything about any other subsystem.

We don't care if enumeration is "done"; we only care if we have a
rational set of storage, and whether that rational set is "enough" to be
fully ready, enough to be only read-ready, or just plain not enough.

In theory, the idempotent mount command could be

mount -t btrfs some-uuid-instead-of-device /mount/point
mount -t btrfs some-other-uuid-here /other/mount/point

to create the zero-devices involved entity, followed by

mount -t btrfs -o trydev /dev/something /this/bit/is/ignored

repeated for all possible somethings. /mount/point and 
/other/mount/point would be returning ENOENT for their contents until 
they were ready-enough.

In practice this is very impure compared to how mdadm has the /dev/md- 
namespace in which to build its devices before any actual mount is possible.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-29  7:35                                             ` Goffredo Baroncelli
@ 2014-11-29  8:02                                               ` Robert White
  0 siblings, 0 replies; 64+ messages in thread
From: Robert White @ 2014-11-29  8:02 UTC (permalink / raw)
  To: kreijack, Zygo Blaxell; +Cc: linux-btrfs

On 11/28/2014 11:35 PM, Goffredo Baroncelli wrote:
> I agree with you; but I have to find a "default" so during the boot
> a system can start even if snapshots are present.

No, you really _don't_ need to find such a default.

Better a system that doesn't boot than one that boots based on a guess.

I've been spending a lot of time thinking about booting while writing 
underdog (http://underdog.sourceforge.net) and while booting is fragile, 
an even partially incorrect boot is a system and _security_ nightmare.

If you start making preferential guesses then an intruder could trick 
the system into booting from a thumb-drive or other alternate media by 
coercing a UUID collision in a way that the system picks the new media.

Conflicts should _never_ be guessed at during boot. Ever.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-29  7:29                                           ` Duncan
@ 2014-11-29  8:20                                             ` Robert White
  2014-11-29  9:41                                               ` Duncan
                                                                 ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Robert White @ 2014-11-29  8:20 UTC (permalink / raw)
  To: Duncan, linux-btrfs

On 11/28/2014 11:29 PM, Duncan wrote:
> Since I can't/won't run pretty much anything proprietary, there's little
> chance of it being taken as anything but Linux, here.  (Tho I actually
> use (c)gdisk for partitioning here and it appears to use a different GUID.
> (0700 in its short form which AFAIK is gdisk specific, for MS basic data,
> while it uses 8300 for general Linux filesystems.  I could look up the
> long form GUIDs, but meh...)

Partition type codes (e.g. 0700, 8300, EF00, etc) have _nothing_ to do 
with UUIDs. They are type codes. They aren't "short form" of anything 
else at all. In fact 0700 is the _long_ _form_ of the original code of 
"7", but in big-endian order now that it went from one byte to two.

Microsoft started using pre-assigned UUIDs as "classes", e.g. type codes 
they could cram into their various registry files. If you actually read 
the registry you'll find a lot of places where "rational word" is 
defined as {some_uuid_here} and then elsewhere {some_uuid_here} has a
bunch of data items attached to it.

So gpartd didn't "reuse" microsoft UUIDs.

In some/many of the older formats there was a code for "operating system
data" (which I think is what 7 was originally). Others came by and said
"since we're going to put in a type code for 'linux swap' (82), then let's
put in a code for linux data as well (83)", and all this before the whole
byte expansion to turn these things from bytes into two-byte words.

Once everybody else picked their own type codes for their data 
partitions, everybody just started calling "7" microsoft data. And linux 
doesn't care at all since it's noise since every partition just ends up 
as /dev/[sh]d? anyway.

All this stuff has historical reasons. GNU/Linux attempts to be an 
egalitarian actor so it adapts to whatever you do.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-29  8:20                                             ` Robert White
@ 2014-11-29  9:41                                               ` Duncan
  2014-11-29 16:33                                                 ` Robert White
  2014-11-29 16:50                                               ` Robert White
  2014-11-29 21:15                                               ` Chris Murphy
  2 siblings, 1 reply; 64+ messages in thread
From: Duncan @ 2014-11-29  9:41 UTC (permalink / raw)
  To: linux-btrfs

Robert White posted on Sat, 29 Nov 2014 00:20:11 -0800 as excerpted:

> On 11/28/2014 11:29 PM, Duncan wrote:
>> (Tho I actually use (c)gdisk for partitioning here and it appears to
>> use a different GUID. (0700 in its short form which AFAIK is gdisk
>> specific, for MS basic data, while it uses 8300 for general Linux
>> filesystems.  I could look up the long form GUIDs, but meh...)
> 
> Partition type codes (e.g. 0700, 8300, EF00, etc) have _nothing_ to do
> with UUIDs. They are type codes. They aren't "short form" of anything
> else at all. In fact 0700 is the _long_ _form_ of the original code of
> "7", but in big-endian order now that it went from one byte to two.

You obviously know where the short forms originated (MBR type codes), but 
you haven't the foggiest what you're talking about in relation to gdisk, 
where they're used as 4-hex-char entry shortcuts for the similar GPT/EFI 
GUIDs.  Now that's what I expected with the mention of a different 
partition editor, thus my mention that they were shortcuts for GUIDs, 
apparently gdisk specific, but in gdisk they certainly ARE shortcuts to 
the various GUIDs and you certainly do *NOT* know what you're talking 
about saying they are not even related.

From the gdisk (8) manpage entry for the l/list action:

l	Display a summary of partition types. GPT uses a GUID to
	identify partition types for particular OSes and purposes. For
	ease of data entry, gdisk compresses these into two-byte
	(four-digit hexadecimal) values that are related to their
	equivalent MBR codes.  Specifically, the MBR code is multiplied
	by hexadecimal 0x0100. For instance, the code for Linux swap
	space in MBR is 0x82, and it's 0x8200 in gdisk. A one-to-one
	correspondence is impossible, though. Most notably, the codes
	for all varieties of FAT and NTFS partition correspond to a
	single GPT code (entered as 0x0700 in sgdisk).  Some OSes use a
	single MBR code but employ many more codes in GPT. For these,
	gdisk adds code numbers sequentially, such as 0xa500 for a
	FreeBSD disklabel, 0xa501 for FreeBSD boot, 0xa502 for FreeBSD
	swap, and so on. Note that these two-byte codes are unique to
	gdisk.
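
The arithmetic in that manpage excerpt is small enough to show directly. The function name below is made up for illustration, and it only covers the base case (code times 0x0100), not the sequential extras such as 0xa501.

```python
def mbr_to_gdisk_code(mbr_type):
    """Map a one-byte MBR partition type to gdisk's base two-byte code."""
    if not 0 <= mbr_type <= 0xFF:
        raise ValueError("MBR partition types are a single byte")
    return mbr_type * 0x0100


print(f"{mbr_to_gdisk_code(0x82):04x}")  # Linux swap, prints 8200
```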

See also the gdisk home page:

http://www.rodsbooks.com/gdisk/

In particular, see the gdisk walkthru here:

http://www.rodsbooks.com/gdisk/walkthrough.html

... and the gdisk manpage I quoted above here:

http://www.rodsbooks.com/gdisk/gdisk.html


So as I said, gdisk uses a 4-hexit short code based on the legacy MBR 
type-code as an easy entry and display form referencing the longer and 
much less human readable GUIDs, just like I said, and such usage is gdisk 
specific, just like I said I thought it was.

And you might have known the legacy MBR type-codes from which they were 
derived, but obviously you had no idea what I was talking about here, and 
despite my saying it was gdisk specific you decided to simply claim I 
didn't know what I was talking about without actually checking the 
situation, despite my telling you exactly what app I was referring to and 
that I thought those references were app-specific, giving you plenty of 
chance to actually look it up yourself if you decided to, or simply not 
argue that point if you weren't interested in checking out the app-
specific stuff.

=:^(

> Microsoft started using pre-assigned UUIDs as "classes", e.g. type codes
> they could cram into their various registry files. If you actually read
> the registry you'll find a lot of places where "rational word" is
> defined as {some_uuid_here} and then elsewhere {some_uuid_here} has a
> bunch of data items attached to it.

FWIW I know about the MS registry stuff from actually doing MS-registry 
and API related programming (hobbyist/VB level but using the regular API
not just the VB exposed stuff) back before the turn of the century.  I've 
not touched it in nearing a decade and a half now and my knowledge is 
consequently dated 9x vintage, but it obviously had the registry and I 
used to be /quite/ familiar with it, including of course the UUIDs.

> So gpartd didn't "reuse" microsoft UUIDs.
> 
> In some/many of the older formats there was a code for "operating system
> data" (which I think is what 7 was originally). Others came by and said
> "since we're going to put in a type code for "linux swap" (82) then lets
> put in a code for linux data as well (83), and all this before the whole
> byte expansion to turn these things from bytes into two-byte words.
> 
> Once everybody else picked their own type codes for their data
> partitions, everybody just started calling "7" microsoft data. And linux
> doesn't care at all since it's noise since every partition just ends up
> as /dev/[sh]d? anyway.
> 
> All this stuff has historical reasons. GNU/Linux attempts to be an
> egalitarian actor so it adapts to whatever you do.

This part I have no disagreement with...


-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-29  9:41                                               ` Duncan
@ 2014-11-29 16:33                                                 ` Robert White
  0 siblings, 0 replies; 64+ messages in thread
From: Robert White @ 2014-11-29 16:33 UTC (permalink / raw)
  To: Duncan, linux-btrfs

On 11/29/2014 01:41 AM, Duncan wrote:
> Robert White posted on Sat, 29 Nov 2014 00:20:11 -0800 as excerpted:
> l	Display a summary of partition types. GPT uses a GUID to
> 	identify partition types for particular OSes and purposes. For
> 	ease of data entry, gdisk compresses these into two-byte
> 	(four-digit hexadecimal) values that are related to their
> 		equivalent MBR codes.  Specifically, the MBR code is multiplied
> 	by hexadecimal 0x0100.

That EFI uses GUIDs is one thing. That the standard allows these to be 
selected based on type codes originally derived from ms-dos partition 
type codes ("compressed" is the wrong word) is something else. If they 
were "compressed" then it would be a relationship that could represent 
any GUID at all. It's marginally hashed, in that there is a table
lookup, but it's not properly a hash, as the "hash function" is
undefined for virtually all possible input values.


The other partition GUID is actually more interesting.


> So as I said, gdisk uses a 4-hexit short code based on the legacy MBR
> type-code as an easy entry and display form referencing the longer and
> much less human readable GUIDs, just like I said, and such usage is gdisk
> specific, just like I said I thought it was.

Which is not what you said. None of the above was mentioned in the email 
to which I responded.

What you actually said ::

[QUOTE]
Since I can't/won't run pretty much anything proprietary, there's little 
chance of it being taken as anything but Linux, here.  (Tho I actually 
use (c)gdisk for partitioning here and it appears to use a different 
GUID. (0700 in its short form which AFAIK is gdisk specific, for MS 
basic data, while it uses 8300 for general Linux filesystems.  I could 
look up the long form GUIDs, but meh...)
[/QUOTE]

None of which is "gdisk specific", and all of which is based on EFI and 
the GUID partition table.

What I mistakenly attributed to you and was key to my initial response 
was your extension of Chris Murphy:
 >>> Chris Murphy posted on Fri, 28 Nov 2014 00:10:40 -0700 as excerpted:
 >>>> A very good example of WTF reusage of a UUID that irks me to no end is
 >>>> GNU parted devs decided to recycle the Microsoft Windows Basic Data
 >>>> partition type GUID for Linux partitions. It's like watching someone get
 >>>> run over by a zamboni with 50 feet of advance notice...

[So my bad there on the quoting...]

The irking there being dumb because the universally used "type GUID" has 
nothing to do with the second GUID that universally identifies the 
partition regardless of type.

But here is the thing... for all the screed about open and closed 
source... (and I am an open source guy myself) The actual EFI standard 
dictates these partition numbers and whatnot so if you used the 
microsoft tools you'd get the same results.

http://en.wikipedia.org/wiki/GUID_Partition_Table#Partition_type_GUIDs

AND microsoft was one of several principal players in the EFI and its
GUID partition subparts.

So his being "irked to no end" and your agreement and "that's why I used 
gdisk" response are both completely misplaced, and potentially 
misleading to others.

I just went a little off the rails while trying to explain. /D'oh.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-29  8:20                                             ` Robert White
  2014-11-29  9:41                                               ` Duncan
@ 2014-11-29 16:50                                               ` Robert White
  2014-11-30  6:46                                                 ` Duncan
  2014-11-29 21:15                                               ` Chris Murphy
  2 siblings, 1 reply; 64+ messages in thread
From: Robert White @ 2014-11-29 16:50 UTC (permalink / raw)
  To: linux-btrfs

To those reading along who don't already know. My explanation below is 
factually inadequate or wrong in various places...

The "type codes" as presented in the various EFI/GUID disk partitioning 
tools as 0700, 8200, 8300, EF02, and so on are never written to disk as 
such. They are short-hand values (chosen to be deliberately similar to 
the MS-DOS partitioning type codes of 07, 82, 83, etc) to select 
standardized GUIDs for the partition type field.

So there is the two-digit code from the ms-dos partitioning scheme, then
there are the four-digit codes that let you select which type GUID will 
be written in an EFI partition scheme.
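
A short Python sketch of that selection step: the four-digit code is only a lookup key, and what lands in the partition entry is the 16-byte type GUID. The table holds a handful of well-known entries (short codes as gdisk uses them, GUIDs as in the Wikipedia table linked below); the dictionary and function names are made up for illustration.

```python
import uuid

# Short code -> GPT partition type GUID (a few common entries only).
GPT_TYPE_GUIDS = {
    0x0700: uuid.UUID("EBD0A0A2-B9E5-4433-87C0-68B6B72699C7"),  # Microsoft basic data
    0x8200: uuid.UUID("0657FD6D-A4AB-43C4-84E5-0933C84B4F4F"),  # Linux swap
    0x8300: uuid.UUID("0FC63DAF-8483-4772-8E79-3D69D8477DE4"),  # Linux filesystem
    0xEF00: uuid.UUID("C12A7328-F81F-11D2-BA4B-00A0C93EC93B"),  # EFI System
    0xEF02: uuid.UUID("21686148-6449-6E6F-744E-656564454649"),  # BIOS boot
}


def type_guid_on_disk(code):
    """Return the 16 bytes stored in the partition entry's type field.

    GPT stores GUIDs in mixed-endian form (first three fields
    little-endian), which is what uuid.UUID.bytes_le produces.
    """
    return GPT_TYPE_GUIDS[code].bytes_le
```

Any short code missing from the table raises KeyError, which reflects the point that the codes are a fixed lookup and not a general encoding of GUIDs.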

The question of "reuse" is still improper as the type codes were 
assigned by the EFI standard for specific use as type codes. The EFI 
tool used (gdisk, or windows disk partitioning tool, etc) is immaterial 
as the result codes are selected by standard.

I could have, and should have, been _way_ more clear, and/or less wrong. 8-)

http://en.wikipedia.org/wiki/GUID_Partition_Table#Partition_type_GUIDs


On 11/29/2014 12:20 AM, Robert White wrote:
> On 11/28/2014 11:29 PM, Duncan wrote:
>> Since I can't/won't run pretty much anything proprietary, there's little
>> chance of it being taken as anything but Linux, here.  (Tho I actually
>> use (c)gdisk for partitioning here and it appears to use a different
>> GUID.
>> (0700 in its short form which AFAIK is gdisk specific, for MS basic data,
>> while it uses 8300 for general Linux filesystems.  I could look up the
>> long form GUIDs, but meh...)
>
> Partition type codes (e.g. 0700, 8300, EF00, etc) have _nothing_ to do
> with UUIDs. They are type codes. They aren't "short form" of anything
> else at all. In fact 0700 is the _long_ _form_ of the original code of
> "7", but in big-endian order now that it went from one byte to two.
>
> Microsoft started using pre-assigned UUIDs as "classes", e.g. type codes
> they could cram into their various registry files. If you actually read
> the registry you'll find a lot of places where "rational word" is
> defined as {some_uuid_here} and then elsewhere {some_uuid_here} has a
> bunch of data items attached to it.
>
> So gpartd didn't "reuse" microsoft UUIDs.
>
> In some/many of the older formats there was a code for "operating system
> data" (which I think is what 7 was originally). Others came by and said
> "since we're going to put in a type code for 'linux swap' (82) then let's
> put in a code for linux data as well (83)", and all this before the whole
> byte expansion to turn these things from bytes into two-byte words.
>
> Once everybody else picked their own type codes for their data
> partitions, everybody just started calling "7" microsoft data. And linux
> doesn't care at all since it's noise since every partition just ends up
> as /dev/[sh]d? anyway.
>
> All this stuff has historical reasons. GNU/Linux attempts to be an
> egalitarian actor so it adapts to whatever you do.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-29  8:20                                             ` Robert White
  2014-11-29  9:41                                               ` Duncan
  2014-11-29 16:50                                               ` Robert White
@ 2014-11-29 21:15                                               ` Chris Murphy
  2 siblings, 0 replies; 64+ messages in thread
From: Chris Murphy @ 2014-11-29 21:15 UTC (permalink / raw)
  To: Robert White; +Cc: Duncan, Btrfs BTRFS

On Sat, Nov 29, 2014 at 1:20 AM, Robert White <rwhite@pobox.com> wrote:
> On 11/28/2014 11:29 PM, Duncan wrote:
>>
>> Since I can't/won't run pretty much anything proprietary, there's little
>> chance of it being taken as anything but Linux, here.  (Tho I actually
>> use (c)gdisk for partitioning here and it appears to use a different GUID.
>> (0700 in its short form which AFAIK is gdisk specific, for MS basic data,
>> while it uses 8300 for general Linux filesystems.  I could look up the
>> long form GUIDs, but meh...)
>
>
> Partition type codes (e.g. 0700, 8300, EF00, etc) have _nothing_ to do with
> UUIDs. They are type codes. They aren't "short form" of anything else at
> all. In fact 0700 is the _long_ _form_ of the original code of "7", but in
> big-endian order now that it went from one byte to two.

No that's not correct. These four digit type codes are a user facing
friendly type code, the actual on-disk "partitiontype GUID" is a UUID
in that at the time of creation that UUID followed RFC 4122 so it was
unique: no one else was using the UUID. That UUID in the context of a
partitiontype GUID is intended to describe the purpose of that
partition: what OS, what file system, where it should mount or be used
for, etc. This is elaborately detailed in the GPT (GUID partition
table) portion of the UEFI specification. A 128-bit type code is
rather difficult for humans to remember and interact with, hence gdisk
and recently fdisk now use a four digit type code as a front end for
the partitiontypeGUID. The selection of four digits was to account for
the fact there are many many many more type codes now possible,
essentially unlimited.
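
To make the on-disk side concrete, here is a minimal sketch that decodes one 128-byte GPT partition entry and pulls out the partition type GUID. The field layout (type GUID, unique GUID, first LBA, last LBA, attributes, UTF-16LE name) follows the GPT portion of the UEFI specification; the function name parse_gpt_entry and the returned dict keys are invented for this example.

```python
import struct
import uuid

# Two well-known partition type GUIDs, for comparison with parsed entries.
LINUX_FS = uuid.UUID("0FC63DAF-8483-4772-8E79-3D69D8477DE4")
MS_BASIC_DATA = uuid.UUID("EBD0A0A2-B9E5-4433-87C0-68B6B72699C7")

# type GUID, unique GUID, first LBA, last LBA, attributes, name (UTF-16LE)
ENTRY_FORMAT = "<16s16sQQQ72s"


def parse_gpt_entry(entry):
    """Decode one 128-byte GPT partition entry into a dict."""
    if len(entry) < 128:
        raise ValueError("a GPT partition entry is at least 128 bytes")
    type_raw, unique_raw, first_lba, last_lba, attrs, name_raw = struct.unpack(
        ENTRY_FORMAT, entry[:128]
    )
    return {
        # GUIDs are stored mixed-endian, hence bytes_le
        "type_guid": uuid.UUID(bytes_le=type_raw),
        "unique_guid": uuid.UUID(bytes_le=unique_raw),
        "first_lba": first_lba,
        "last_lba": last_lba,
        "attributes": attrs,
        # the name is NUL-padded UTF-16LE
        "name": name_raw.decode("utf-16-le").split("\x00", 1)[0],
    }
```

Comparing the parsed type_guid against LINUX_FS or MS_BASIC_DATA is what a four-digit code such as 8300 or 0700 boils down to.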

This is a case where UUIDs are reused effectively.



> Microsoft started using pre-assigned UUIDs as "classes", e.g. type codes
> they could cram into their various registry files. If you actually read the
> registry you'll find a lot of places where "rational word" is defined as
> {some_uuid_here} and then elsewhere {some_uuid_here} has a bunch of data items
> attached to it.
>
> So gpartd didn't "reuse" microsoft UUIDs.

GNU parted absolutely re-used partitiontypeGUID
EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 for Linux, by default. This you
know as gdisk (and friends) type code 0700. It's the same thing as
using type code 07 on an MBR partitioned disk instead of 83. It's
ridiculous that this happened considering we had a distinction on MBR
with limited type code availability, and on GPT with unlimited type
codes the decision was to use an already existing type code,
EBD0A0A2-B9E5-4433-87C0-68B6B72699C7.

http://www.rodsbooks.com/linux-fs-code/

The Linux partitiontype GUID is now
0FC63DAF-8483-4772-8E79-3D69D8477DE4. And actually some others have
been created also for encryption, RAID, LVM, swap, and a pile of GUIDs
from the 'discoverable partitions spec' hosted at freedesktop.org for
autodiscovery by systemd. Only very recent versions of parted support
type GUID 0FC63DAF-8483-4772-8E79-3D69D8477DE4.


> All this stuff has historical reasons. GNU/Linux attempts to be an
> egalitarian actor so it adapts to whatever you do.

With respect to this particular reuse of a Windows type code, it did a
total face plant on adaptation. The very decision to reuse that GUID
was a huge, weird mistake that we'll live with for years to come. Data
loss will result from it. And then it was made worse: once the conflict
was recognized as probably not a good idea, timely patching of GNU
parted was undermined. The patch to fix the problem, from the
gdisk author, sat around for two years before parted upstream merged
it. There really isn't good diplomatic language to use for this. Some
people flat out dropped the ball, and just didn't give a crap.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-29 16:50                                               ` Robert White
@ 2014-11-30  6:46                                                 ` Duncan
  0 siblings, 0 replies; 64+ messages in thread
From: Duncan @ 2014-11-30  6:46 UTC (permalink / raw)
  To: linux-btrfs

Robert White posted on Sat, 29 Nov 2014 08:50:57 -0800 as excerpted:

> To those reading along who don't already know. My explanation below is
> factually inadequate or wrong in various places...
> 
> The "type codes" as presented in the various EFI/GUID disk partitioning
> tools as 0700, 8200, 8300, EF02, and so on are never written to disk as
> such. They are short-hand values (chosen to be deliberately similar to
> the MS-DOS partitioning type codes of 07, 82, 83, etc) to select
> standardized GUIDs for the partition type field.

> I could have, and should have, been _way_ more clear, and/or less wrong.
> 8-)
> 
> http://en.wikipedia.org/wiki/GUID_Partition_Table#Partition_type_GUIDs

Thanks.

While I guess we all end up eating humble pie occasionally, you handled it 
with rather more grace than I often do, and by taking such a hard 
line myself I didn't make it as easy as I might have.


-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
  2014-11-29  7:55                                             ` Robert White
@ 2014-12-01 15:25                                               ` Zygo Blaxell
  0 siblings, 0 replies; 64+ messages in thread
From: Zygo Blaxell @ 2014-12-01 15:25 UTC (permalink / raw)
  To: Robert White; +Cc: Goffredo Baroncelli, linux-btrfs


On Fri, Nov 28, 2014 at 11:55:07PM -0800, Robert White wrote:
> On 11/28/2014 08:59 PM, Zygo Blaxell wrote:
> >On Fri, Nov 28, 2014 at 06:05:48PM +0100, Goffredo Baroncelli wrote:
> >>On 11/27/2014 05:15 AM, Zygo Blaxell wrote:
> >>>This is a weakness of the current udev and asynchronous device hotplug
> >>>concept:  there is no notion of bus enumeration in progress, so we can be
> >>>trying to assemble multi-device storage before we have all the devices
> >>>visible.  Assembly of aggregate storage (whatever it is--btrfs, md,
> >>>lvm2...) has to wait until all known storage buses are fully enumerated
> >>>in order to detect if there are duplicates.
> >>
> >>It is more complex than that. Some devices may appear after the "1st" bus
> >>enumeration.
> >
> >That case is well handled already--a new enumeration will start with the
> >second (and all later) hotplug events.
> >
> >The problem arises when we try to assemble disk arrays before the
> >known end of the "1st" (or any) enumeration.  There is no way for an
> >enumerating agent to tell other agents "this is definitely not the
> >complete list of devices yet, other devices may be inserted imminently"
> >and defer all the multi-device assembly until the address space of the
> >enumerating bus is fully covered.
> >
> MDADM has an "attached" but not "started" state for arrays that
> handles this condition during incremental assembly. (see "mdadm
> --incremental /dev/whatever"),

> [...very complicated mdadm-architecture-invades-the-filesystem-layer
> thing snipped...]

I don't see why it can't all be done in user-space more or less the same
way LVM does.  Scan all the partitions known to be available, build a
table of devices with UUIDs matching the target filesystem, check for
sufficiency, check for uniqueness, and if the configuration passes all the
sanity checks (or we have hints from the user that resolve ambiguity),
submit the entire list of devices to the kernel as a BTRFS filesystem.
If there are UUID duplicates or missing devices, submit nothing to the
kernel at all.
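
The procedure above could be sketched roughly as follows. This is only an illustration under assumptions: the scan results are modelled as (path, fsid, devid, num_devices) tuples, a hypothetical summary of what a user-space scanner might read from each superblock, and plan_assembly is an invented name, not an existing btrfs-progs function. User hints for resolving ambiguity are left out.

```python
from collections import defaultdict


def plan_assembly(scanned, target_fsid):
    """Return the device paths to submit to the kernel, or None.

    `scanned` is an iterable of (path, fsid, devid, num_devices) tuples
    (hypothetical scanner output). None means "submit nothing": the
    filesystem was not found, a device is missing, or a devid appears
    more than once (for example an LVM snapshot next to its origin).
    """
    paths_by_devid = defaultdict(list)
    expected_counts = set()
    for path, fsid, devid, num_devices in scanned:
        if fsid != target_fsid:
            continue
        paths_by_devid[devid].append(path)
        expected_counts.add(num_devices)

    if len(expected_counts) != 1:
        return None  # not found, or members disagree about the device count
    if any(len(paths) > 1 for paths in paths_by_devid.values()):
        return None  # uniqueness check failed: duplicate devid
    if len(paths_by_devid) != expected_counts.pop():
        return None  # sufficiency check failed: devices missing
    return sorted(paths[0] for paths in paths_by_devid.values())
```

With an origin LV and its snapshot both visible, the same fsid and devid show up twice, so the function returns None and nothing is handed to the kernel.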

initramfs-less multi-disk configurations can calculate all that in
advance and generate a rootflags parameter for the kernel command line.
It's not necessary to resolve every possible situation in the kernel.
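
As a small sketch of the pre-computed variant: once the device list is known, the kernel command line fragment can be generated ahead of time. The helper name rootflags_for is made up here; the only things assumed about the real system are the kernel's rootflags= parameter and the btrfs device= mount option.

```python
def rootflags_for(devices):
    """Build a rootflags= fragment naming every member device.

    `devices` is the already-verified list of member device paths.
    Each one becomes a btrfs `device=` mount option.
    """
    if not devices:
        raise ValueError("at least one device is required")
    return "rootflags=" + ",".join("device=" + dev for dev in devices)
```

For a two-device filesystem this yields something like rootflags=device=/dev/sda2,device=/dev/sdb2, to be placed next to the usual root= parameter.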



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: BTRFS messes up snapshot LV with origin
@ 2014-11-17  8:00 MegaBrutal
  0 siblings, 0 replies; 64+ messages in thread
From: MegaBrutal @ 2014-11-17  8:00 UTC (permalink / raw)
  To: linux-btrfs

2014-11-17 7:59 GMT+01:00 Brendan Hide <brendan@swiftspirit.co.za>:
>
> Grub is already a little smart here - it avoids snapshots. But in this case it is relying on the UUID and only finding it in the snapshot. So possibly this is a bug in grub affecting the bug reporter specifically - but perhaps the bug is in btrfs where grub is relying on btrfs code.


Yesterday, when I reproduced the phenomenon on a VM, I found
something rather interesting: even /proc/mounts incorrectly reports
that the snapshot is mounted instead of the root FS. Note that there
was no reboot. Just create an LVM snapshot and then
check /proc/mounts.
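
A minimal way to script that check, assuming the usual whitespace-separated /proc/mounts layout (device, mount point, type, options, ...). The function name root_source is made up for this sketch, and it simply takes the last entry mounted at "/", since an earlier rootfs entry may also be listed.

```python
def root_source(mounts_text):
    """Return the device field of the last entry mounted at "/", or None."""
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "/":
            source = fields[0]
    return source
```

Running root_source(open("/proc/mounts").read()) before and after creating the snapshot would show whether the reported device changes.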

I couldn't reproduce the same with non-root file systems. It seems
this only appears when the device in question is mounted as root FS.


> Yes, I'd rather use btrfs' snapshot mechanism - but this is often a choice that is left to the user/admin/distro. I don't think saying "LVM snapshots are incompatible with btrfs" is the right way to go either.


Before I did a release upgrade, just to be safe, I made both an LVM
and a btrfs snapshot.


>
> That leaves two aspects of this issue which I view as two separate bugs:
> a) Btrfs cannot gracefully handle separate filesystems that have the same UUID. At all.
> b) Grub appears to pick the wrong filesystem when presented with two filesystems with the same UUID.
>
> I feel a) is a btrfs bug.
> I feel b) is a bug that is more about "ecosystem design" than grub being silly.
>
> I imagine a couple of aspects that could help fix a):
> - Utilise a "unique drive identifier" in the btrfs metadata (surely this exists already?). This way, any two filesystems will always have different drive identifiers *except* in cases like a ddrescue'd copy or a block-level snapshot. This will provide a sensible mechanism for "defined behaviour", preventing corruption - even if that "defined behaviour" is to simply give out lots of "PEBKAC" errors and panic.
> - Utilise a "drive list" to ensure that two unrelated filesystems with the same UUID cannot get "mixed up". Yes, the user/admin would likely be the culprit here (perhaps a VM rollout process that always gives out the same UUID in all its filesystems). Again, does btrfs not already have something like this built-in that we're simply not utilising fully?
>
> I'm not exactly sure of the "correct" way to fix b) except that I imagine it would be trivial to fix once a) is fixed.


Note that everything that is written into the file system's metadata
gets duplicated with an LVM snapshot. So a "unique drive identifier"
wouldn't solve the problem, as it would also get replicated, and BTRFS
would still see two identical devices.
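
To illustrate why a block-level copy carries the same identity, here is a sketch that reads the filesystem UUID straight out of a btrfs superblock image. The offsets assume the primary superblock at 64 KiB, with a 32-byte checksum, then the 16-byte fsid, and the "_BHRfS_M" magic at offset 0x40 within the superblock. The name read_fsid is invented for this example, and it works on an in-memory bytes object rather than a real device.

```python
import uuid

SUPERBLOCK_OFFSET = 0x10000  # primary superblock lives at 64 KiB
FSID_OFFSET = 0x20           # fsid follows the 32-byte checksum
MAGIC_OFFSET = 0x40          # magic, relative to the superblock start
MAGIC = b"_BHRfS_M"


def read_fsid(image):
    """Return the btrfs fsid found in `image` (bytes), or None.

    `image` holds the start of a block device, at least 64 KiB plus
    0x48 bytes. None is returned when the magic does not match.
    """
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 0x48]
    if len(sb) < 0x48 or sb[MAGIC_OFFSET:MAGIC_OFFSET + 8] != MAGIC:
        return None
    return uuid.UUID(bytes=sb[FSID_OFFSET:FSID_OFFSET + 16])
```

Because a snapshot starts out as a byte-for-byte copy, read_fsid gives the same answer for the origin and the snapshot, and the same holds for any other identifier stored in the metadata.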

But devices on Linux have major and minor numbers that uniquely
identify devices while they are attached. The original and the
snapshot device have different major/minor numbers, and it would be
quite enough to differentiate the devices while they are being
opened/mounted.
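
A small sketch of that idea, using only the Python standard library. The function names device_numbers and same_device are made up here; how BTRFS itself would use the numbers is not shown and is not implied.

```python
import os
import stat


def device_numbers(path):
    """Return (major, minor) for the block device node at `path`."""
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(path + " is not a block device")
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def same_device(rdev_a, rdev_b):
    """Compare two raw device numbers by their major/minor pairs."""
    return (os.major(rdev_a), os.minor(rdev_a)) == (
        os.major(rdev_b),
        os.minor(rdev_b),
    )
```

An origin LV and its snapshot would typically show up as two device-mapper nodes with the same major number but different minor numbers, so same_device would tell them apart even though their on-disk metadata is identical.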

By the way, I actually did an entire release upgrade while the
snapshot was there and being reported incorrectly. Had BTRFS really
been writing to the snapshot device, this would have caused enough
corruption in the file system that I would surely have noticed it.
But I didn't perceive any data corruption. BTRFS didn't
actually write to the snapshot device. It seems the device is only
mixed up in /proc/mounts, so the problem is probably not as severe as
we think, and wouldn't require fundamental changes to BTRFS to fix it.

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2014-12-01 15:25 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-16 21:35 BTRFS messes up snapshot LV with origin MegaBrutal
2014-11-17  1:42 ` Duncan
2014-11-17  6:59   ` Brendan Hide
2014-11-17  7:35     ` Daniel Dressler
2014-11-17  9:00       ` Brendan Hide
2014-11-17 19:04     ` Goffredo Baroncelli
     [not found]       ` <CAE8gLh=VubBbZdeKTAuWRjOxPF7C+ouUeeVvmGfT2ckYWGhQVA@mail.gmail.com>
2014-11-17 19:45         ` Fwd: " MegaBrutal
2014-11-17 20:32           ` Goffredo Baroncelli
2014-11-18  6:16           ` Chris Murphy
2014-11-18 15:42             ` Phillip Susi
2014-11-18 19:17               ` Chris Murphy
2014-11-18 20:17                 ` Phillip Susi
2014-11-19  2:54                   ` Chris Murphy
2014-11-19 15:20                     ` Phillip Susi
2014-11-19 18:35                       ` Chris Murphy
2014-11-19 19:23                         ` Phillip Susi
2014-11-21  4:28                       ` Zygo Blaxell
2014-11-21  6:22                         ` Duncan
2014-11-21 11:35                           ` Robert White
2014-11-21 11:54                             ` Duncan
2014-11-21 17:56                           ` Zygo Blaxell
2014-11-21 23:09                             ` Duncan
2014-11-21 18:23                           ` Chris Murphy
2014-11-21 22:49                             ` Duncan
2014-11-21 23:41                               ` Duncan
2014-11-21 23:51                                 ` Duncan
2014-11-22 17:34                         ` Goffredo Baroncelli
2014-11-23  0:19                           ` Zygo Blaxell
2014-11-25 16:34                             ` Goffredo Baroncelli
2014-11-25 20:29                               ` Zygo Blaxell
2014-11-25 21:59                                 ` Goffredo Baroncelli
2014-11-25 22:21                                   ` Zygo Blaxell
2014-11-25 22:47                                     ` Chris Murphy
     [not found]                                     ` <CAJCQCtQUM=viSoPtcJMcyKquYb1DLmEsqBi=p++uXPy63+r3Ow@mail.gmail.com>
     [not found]                                       ` <20141126021134.GR17380@hungrycats.org>
2014-11-26  4:48                                         ` Chris Murphy
2014-11-26 17:19                                     ` Goffredo Baroncelli
2014-11-27  4:15                                       ` Zygo Blaxell
2014-11-28 17:05                                         ` Goffredo Baroncelli
2014-11-29  1:25                                           ` Robert White
2014-11-29  7:35                                             ` Goffredo Baroncelli
2014-11-29  8:02                                               ` Robert White
2014-11-29  7:37                                             ` MegaBrutal
2014-11-29  4:59                                           ` Zygo Blaxell
2014-11-29  7:55                                             ` Robert White
2014-12-01 15:25                                               ` Zygo Blaxell
2014-11-26  3:22                                   ` Duncan
2014-11-26  5:11                                     ` Chris Murphy
2014-11-26 22:08                                     ` Robert White
2014-11-27  9:08                                       ` Duncan
2014-11-28  7:10                                         ` Chris Murphy
2014-11-29  7:29                                           ` Duncan
2014-11-29  8:20                                             ` Robert White
2014-11-29  9:41                                               ` Duncan
2014-11-29 16:33                                                 ` Robert White
2014-11-29 16:50                                               ` Robert White
2014-11-30  6:46                                                 ` Duncan
2014-11-29 21:15                                               ` Chris Murphy
2014-11-18 20:41               ` MegaBrutal
2014-11-19  1:29               ` Robert White
2014-11-19  3:37                 ` Duncan
2014-11-21  4:24       ` Zygo Blaxell
2014-11-18  6:21     ` Chris Murphy
2014-11-18 12:13       ` Duncan
2014-11-18 20:01       ` Goffredo Baroncelli
2014-11-17  8:00 MegaBrutal
