* Subvolume UUID, data corruption?
@ 2015-12-04 12:05 S.J
  2015-12-04 13:07 ` Hugo Mills
  0 siblings, 1 reply; 51+ messages in thread
From: S.J @ 2015-12-04 12:05 UTC (permalink / raw)
  To: linux-btrfs

Hello

As we know, two file systems with the same UUID (as reported by e.g. "blkid") are problematic; in particular, mounting both at the same time leads to data corruption. So copying a BTRFS partition to another one with e.g. dd and using the copy immediately is bad. To prevent this, "btrfstune -u /dev/sdaX" changes the UUID of the given partition.

However, BTRFS subvolumes have their own UUIDs, which can be viewed e.g. with "btrfs sub list -u /mountpoint". These UUIDs are not changed by the command above, and apparently there is no other way to change them.

My question is: Is this a problem similar to the main UUID? Can mounting two BTRFS partitions with equal subvolume UUIDs (but different main UUIDs) cause data corruption?

(...well, and maybe someone could explain to me what these subvol UUIDs are for in the first place. Subvolumes already have a unique number, and from a user's p.o.v. there isn't anything for which the subvol UUIDs can be used at all (?))

Thank you

PS: Apologies for sending a second mail, somehow my first try didn't contain any text


* Re: Subvolume UUID, data corruption?
  2015-12-04 12:05 Subvolume UUID, data corruption? S.J
@ 2015-12-04 13:07 ` Hugo Mills
  2015-12-05  3:28   ` Christoph Anton Mitterer
  0 siblings, 1 reply; 51+ messages in thread
From: Hugo Mills @ 2015-12-04 13:07 UTC (permalink / raw)
  To: S.J; +Cc: linux-btrfs


On Fri, Dec 04, 2015 at 01:05:28PM +0100, S.J wrote:
> Hello
> 
> As we know, two file systems with the same UUID (as reported by e.g. "blkid") are problematic; in particular, mounting both at the same time leads to data corruption. So copying a BTRFS partition to another one with e.g. dd and using the copy immediately is bad. To prevent this, "btrfstune -u /dev/sdaX" changes the UUID of the given partition.
>
> However, BTRFS subvolumes have their own UUIDs, which can be viewed e.g. with "btrfs sub list -u /mountpoint". These UUIDs are not changed by the command above, and apparently there is no other way to change them.
>
> My question is: Is this a problem similar to the main UUID? Can mounting two BTRFS partitions with equal subvolume UUIDs (but different main UUIDs) cause data corruption?

   I don't think it'll cause problems. The UUIDs on subvols are only
really used internally to that filesystem, so the kernel doesn't have
a chance to get confused. The main thing that could be confused is
send/receive, but that's a matter of possibly losing some validation
(thus allowing you to do something that will fail) rather than causing
active damage, as in the duplicate-FS-UUID case.

> (...well, and maybe someone could explain to me what these subvol UUIDs are for in the first place. Subvolumes already have a unique number, and from a user's p.o.v. there isn't anything for which the subvol UUIDs can be used at all (?))

   The subvol UUIDs are used to identify them through send/receive
operations. There are three main UUID fields on a subvol: the actual
UUID (u), the Received_UUID (r) and the Parent_UUID (p), and these are
used to identify whether an incremental send could function correctly
when received. (I can give you chapter and verse on how they're used
if you like, but that's a bit excessive just for answering your
question here).
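
   To make the role of the three fields concrete, here is a minimal
Python sketch of the kind of check a receiver could make before applying
an incremental stream. The Subvol class, the find_incremental_base
function and the matching rule (the receiver-side Received_UUID equals
the UUID of the sender-side parent) are assumptions for illustration
only, not the actual btrfs receive code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subvol:
    name: str
    uuid: str                            # (u) identity of this subvolume
    parent_uuid: Optional[str] = None    # (p) subvolume it was snapshotted from
    received_uuid: Optional[str] = None  # (r) UUID it had on the sending side


def find_incremental_base(sent_parent_uuid: str,
                          receiver_subvols: List[Subvol]) -> Optional[Subvol]:
    """Return the receiver-side copy of the sender's parent, or None.

    Assumed rule: a usable base is a subvolume whose Received_UUID
    matches the UUID of the parent named by the sender.
    """
    for sv in receiver_subvols:
        if sv.received_uuid == sent_parent_uuid:
            return sv
    return None


# Hypothetical receiver state: "snap1" was received earlier from a
# sender-side subvolume whose UUID was "aaaa".
receiver = [Subvol("snap1", uuid="bbbb", received_uuid="aaaa")]

base = find_incremental_base("aaaa", receiver)     # finds snap1
missing = find_incremental_base("cccc", receiver)  # no base, returns None
```

   In this model a duplicated subvol UUID could at worst make the lookup
match something it should not, i.e. a validation is lost; nothing here
writes to a second filesystem.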

   Hugo.

> Thank you
> 
> PS: Apologies for sending a second mail, somehow my first try didn't contain any text

-- 
Hugo Mills             | Do not meddle in the affairs of system
hugo@... carfax.org.uk | administrators, for they are subtle, and quick to
http://carfax.org.uk/  | anger.
PGP: E2AB1DE4          |


* Re: Subvolume UUID, data corruption?
  2015-12-04 13:07 ` Hugo Mills
@ 2015-12-05  3:28   ` Christoph Anton Mitterer
  2015-12-05  5:52     ` attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?) Christoph Anton Mitterer
                       ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-05  3:28 UTC (permalink / raw)
  To: Hugo Mills; +Cc: linux-btrfs


On Fri, 2015-12-04 at 13:07 +0000, Hugo Mills wrote:
> I don't think it'll cause problems.
Is there any guaranteed behaviour when btrfs encounters two filesystems
(i.e. not talking about the subvols now) with the same UUID?

Given that it's long-standing behaviour that people could clone
filesystems (dd, etc.) and this just worked™, btrfs should at least
handle such a case gracefully.
For example, when more than one block device carrying a btrfs with
the same UUID is already known, it should refuse to mount any of them.

And if one is already known and another device pops up, it should refuse
to mount that one and continue to use the already mounted one normally.



Cheers,
Chris.


* Re: attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?)
  2015-12-05  3:28   ` Christoph Anton Mitterer
@ 2015-12-05  5:52     ` Christoph Anton Mitterer
  2015-12-05 12:01     ` Subvolume UUID, data corruption? Hugo Mills
  2015-12-05 13:19     ` Duncan
  2 siblings, 0 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-05  5:52 UTC (permalink / raw)
  To: Hugo Mills; +Cc: linux-btrfs

Thinking a bit more about that, I came to the conclusion that it's actually security relevant that btrfs deals gracefully with filesystems having the same UUID:

Getting to know someone else's filesystem's UUID may be more easily possible than one may think.
It's usually not considered secret and for example included in debug reports (e.g. several Debian packages do this).

The only thing an attacker then needs to do is somehow make another filesystem with that UUID available in the victim's system.
The simplest way is via a USB stick, given local access.
Thanks to some stupid desktop environments, chances aren't too bad that the system will even auto-mount the stick.

If btrfs doesn't handle this gracefully, the attacker may damage or destroy the original filesystem, or, if things get awkwardly corrupted (and data is written to the fake btrfs), even get data out of such a system (despite any screen locks or dm-crypt).

Cheers
Chris.


* Re: Subvolume UUID, data corruption?
  2015-12-05  3:28   ` Christoph Anton Mitterer
  2015-12-05  5:52     ` attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?) Christoph Anton Mitterer
@ 2015-12-05 12:01     ` Hugo Mills
  2015-12-06  1:51       ` attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?) Christoph Anton Mitterer
  2015-12-11 12:33       ` Subvolume UUID, data corruption? Austin S. Hemmelgarn
  2015-12-05 13:19     ` Duncan
  2 siblings, 2 replies; 51+ messages in thread
From: Hugo Mills @ 2015-12-05 12:01 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: linux-btrfs


On Sat, Dec 05, 2015 at 04:28:24AM +0100, Christoph Anton Mitterer wrote:
> On Fri, 2015-12-04 at 13:07 +0000, Hugo Mills wrote:
> > I don't think it'll cause problems.
> Is there any guaranteed behaviour when btrfs encounters two filesystems
> (i.e. not talking about the subvols now) with the same UUID?

   Nothing guaranteed, but the likelihood is that things will go badly
wrong, in the sense of corrupt filesystems.

> Given that it's long standing behaviour that people could clone
> filesystems (dd, etc.) and this just worked™, btrfs should at least
> handle such case gracefully.
> For example, when already more than one block device with a btrfs of
> the same UUID are known, then it should refuse to mount any of them.
> And if one is already known and another device pops up it should refuse
> to mount that and continue to normally use the already mounted one.

   Except that that's exactly the mechanism that btrfs uses to handle
multi-device filesystems, so you've just broken anything with more
than one device in the FS.

   If you inspect the devid on each device as well, and refuse
duplicates of those, you've just broken any multipathing
configurations.

   Even if you can handle that, if you have two copies of dev1, and
two copies of dev2, how do you guarantee that the "right" pair of dev1
and dev2 is selected? (e.g. if you have them as network devices, and
the device enumeration order is unstable on each boot).
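
   The ambiguity can be sketched in a few lines of Python. The Dev
tuple, the device paths and the fsid string are made up for
illustration; the only thing taken from the discussion is that devices
are identified by filesystem UUID plus devid.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, NamedTuple, Tuple


class Dev(NamedTuple):
    path: str
    fsid: str   # filesystem UUID, shared by all members of one filesystem
    devid: int  # per-device number within that filesystem


def find_ambiguous(devices: List[Dev]) -> Dict[Tuple[str, int], List[str]]:
    """Map each (fsid, devid) that occurs more than once to its paths."""
    seen = defaultdict(list)
    for d in devices:
        seen[(d.fsid, d.devid)].append(d.path)
    return {key: paths for key, paths in seen.items() if len(paths) > 1}


# Two copies of dev1 and two copies of dev2. From the IDs alone, sdc
# could be a dd clone of sda or a second (multipath) route to it.
devices = [
    Dev("/dev/sda", "fs-A", 1),
    Dev("/dev/sdb", "fs-A", 2),
    Dev("/dev/sdc", "fs-A", 1),
    Dev("/dev/sdd", "fs-A", 2),
]

ambiguous = find_ambiguous(devices)

# Every way of picking one path per devid is a candidate "pair".
pairings = list(product(*ambiguous.values()))
```

   With the sample data there are four candidate pairings, and nothing
in the identifiers says which one is the "right" one.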

   Hugo.

-- 
Hugo Mills             | Geek, n.:
hugo@... carfax.org.uk | Circus sideshow performer specialising in the eating
http://carfax.org.uk/  | of live animals.
PGP: E2AB1DE4          |                                                   OED


* Re: Subvolume UUID, data corruption?
  2015-12-05  3:28   ` Christoph Anton Mitterer
  2015-12-05  5:52     ` attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?) Christoph Anton Mitterer
  2015-12-05 12:01     ` Subvolume UUID, data corruption? Hugo Mills
@ 2015-12-05 13:19     ` Duncan
  2015-12-06  1:51       ` attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?) Christoph Anton Mitterer
  2 siblings, 1 reply; 51+ messages in thread
From: Duncan @ 2015-12-05 13:19 UTC (permalink / raw)
  To: linux-btrfs

Christoph Anton Mitterer posted on Sat, 05 Dec 2015 04:28:24 +0100 as
excerpted:

> On Fri, 2015-12-04 at 13:07 +0000, Hugo Mills wrote:
>> I don't think it'll cause problems.
> Is there any guaranteed behaviour when btrfs encounters two filesystems
> (i.e. not talking about the subvols now) with the same UUID?
> 
> Given that it's long standing behaviour that people could clone
> filesystems (dd, etc.) and this just worked™, btrfs should at least
> handle such case gracefully.
> For example, when already more than one block device with a btrfs of the
> same UUID are known, then it should refuse to mount any of them.
> 
> And if one is already known and another device pops up it should refuse
> to mount that and continue to normally use the already mounted one.

The problem with btrfs is that because (unlike traditional filesystems) 
it's multi-device, it needs some way to identify what devices belong to a 
particular filesystem.

And UUID is, by definition and expansion, Universally Unique ID.  Btrfs
simply depends on it being what it says on the tin, universally
unique, to ID the components of the filesystem and assemble them
correctly.

Besides dd, etc, LVM snapshots are another case where this goes screwy.  
If the UUID isn't UUID, do a btrfs device scan (which udev normally does 
by default these days) so the duplicate UUID is detected, and btrfs 
*WILL* eventually start trying to write to all the "newly added" devices 
that scan found, identified by their Universally Unique IDs, aka UUIDs.  
It's not a matter of if, but when.


And the UUID is embedded so deeply within the filesystem and its
operations, as an inextricable part of the metadata, that changing the
UUID is not the simple operation of changing a few bytes in the
superblock that it is on other filesystems, which is why there's now a
tool to go thru all those metadata entries and change it.  (This design
avoids the problem reiserfs had, where a reiserfs stored in a loopback
file on a reiserfs would screw up reiserfsck; on btrfs, the loopback
file would have a different UUID and thus couldn't be mixed up.)


So an aware btrfs admin simply takes pains to avoid triggering a btrfs 
device scan at the wrong time, and to immediately hide their LVM 
snapshots, immediately unplug their directly dd-ed devices, etc, and thus 
doesn't have to deal with the filesystem corruption that'd be a when not 
if, if they didn't take such precautions with their dupped UUIDs that 
actually aren't as UUID as the name suggests...

And as your followup suggests in a security context, they consider 
masking out their UUIDs before posting them, as well, tho most kernel 
hackers generally consider unsupervised physical access to be game-over, 
security-wise.  (After all, in that case there's often little or nothing 
preventing a reboot to that USB stick, if desired, or simply yanking the 
devices and duping them or plugging them in elsewhere, if the BIOS is 
password protected, with the only thing standing in the way at that point 
being possible device encryption.)
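
Masking UUIDs before posting a log or debug report can be automated.
The following Python sketch replaces every UUID-shaped string with a
stable placeholder; the function name mask_uuids, the placeholder format
and the sample UUID are made up for illustration.

```python
import re

# Canonical textual UUID form: 8-4-4-4-12 hexadecimal digits.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def mask_uuids(text: str) -> str:
    """Replace each distinct UUID with <uuid-1>, <uuid-2>, ...

    The same UUID always gets the same placeholder, so a reader can
    still see which lines refer to the same filesystem.
    """
    mapping = {}

    def repl(match):
        key = match.group(0).lower()
        if key not in mapping:
            mapping[key] = "<uuid-%d>" % (len(mapping) + 1)
        return mapping[key]

    return UUID_RE.sub(repl, text)


# Made-up sample line, not taken from a real system.
sample = "root=UUID=01234567-89ab-cdef-0123-456789abcdef ro quiet"
masked = mask_uuids(sample)
```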


The UUID *as* a UUID, _unique_ at least on that system (if not actually 
universally) as it says on the tin, is so deeply embedded in btrfs that 
at this point it's not going to be removed.  The only real alternative if 
you don't like it is using a different filesystem.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?)
  2015-12-05 12:01     ` Subvolume UUID, data corruption? Hugo Mills
@ 2015-12-06  1:51       ` Christoph Anton Mitterer
  2015-12-11 12:33       ` Subvolume UUID, data corruption? Austin S. Hemmelgarn
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-06  1:51 UTC (permalink / raw)
  To: Hugo Mills; +Cc: linux-btrfs


On Sat, 2015-12-05 at 12:01 +0000, Hugo Mills wrote:
> On Sat, Dec 05, 2015 at 04:28:24AM +0100, Christoph Anton Mitterer
> wrote:
> > On Fri, 2015-12-04 at 13:07 +0000, Hugo Mills wrote:
> > > I don't think it'll cause problems.
> > Is there any guaranteed behaviour when btrfs encounters two
> > filesystems
> > (i.e. not talking about the subvols now) with the same UUID?
> 
>    Nothing guaranteed, but the likelihood is that things will go
> badly
> wrong, in the sense of corrupt filesystems.
Phew... well sorry, but I think that's really something that makes
btrfs not productively usable until fixed.



>    Except that that's exactly the mechanism that btrfs uses to handle
> multi-device filesystems, so you've just broken anything with more
> than one device in the FS.
Don't other containers (e.g. LVM) do something similar, and yet they
don't fail badly in case e.g. multiple PVs with the same UUID appear,
AFAICS.

And shouldn't there be some kind of device UUID, which distinguishes
the different parts of the same btrfs (with the same fs UUID) that sit
on different devices?!


>    If you inspect the devid on each device as well, and refuse
> duplicates of those, you've just broken any multipathing
> configurations.
Well, how many people are actually doing this? A minority. So it
would simply be necessary that multipathing doesn't work out of the box
and one needs to specifically tell the kernel to consider a device with
the same btrfs UUID as not a clone but another path to the same device.

In any case, a rare feature like multipathing cannot justify the
possibility of data corruption.
The situation as it is now is IMHO completely unacceptable.



>    Even if you can handle that, if you have two copies of dev1, and
> two copies of dev2, how do you guarantee that the "right" pair of
> dev1
> and dev2 is selected? (e.g. if you have them as network devices, and
> the device enumeration order is unstable on each boot).
Not sure what you mean now:
The multipathing case?
Then, as I've said, such situations would simply require to manually
set things up and explicitly tell the kernel that the devices foo and
bar are to be used (despite their dup UUID).

If you mean what happens when I have e.g. two clones of a 2-device
btrfs, as in
fsdev1
fsdev2
fsdev1_clone
fsdev2_clone
Then as I've said before... if one pair of them
is already mounted (i.e. when the *_clone devices appear), then it's
likely that these actually belong together and the kernel should
continue to use them and ignore any others.
If all appear before any is mounted, then
either it should refuse to mount/use any of them, or it should require
manually specifying which devices are to be used (i.e. via /dev/sda or so).
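
The policy proposed here can be written down as a small Python sketch.
It is only a model of the rules stated in this mail (an explicit device
list wins, devices already in use are kept, otherwise duplicates mean
refusal); the Dev tuple and the choose_devices function are hypothetical
and do not describe existing btrfs behaviour.

```python
from typing import Iterable, List, NamedTuple, Optional


class Dev(NamedTuple):
    path: str
    fsid: str
    devid: int


def choose_devices(candidates: List[Dev],
                   in_use: Iterable[str] = (),
                   explicit: Optional[Iterable[str]] = None) -> List[Dev]:
    """Pick the devices to assemble one filesystem from.

    candidates: all known devices sharing one fsid.
    in_use:     paths belonging to an already mounted instance.
    explicit:   paths named by the admin (e.g. a device= option).
    """
    in_use = set(in_use)
    if explicit is not None:
        wanted = set(explicit)
        chosen = [d for d in candidates if d.path in wanted]
    elif in_use:
        # Something is mounted already: keep it, ignore newcomers.
        chosen = [d for d in candidates if d.path in in_use]
    else:
        chosen = list(candidates)

    devids = [d.devid for d in chosen]
    if len(devids) != len(set(devids)):
        raise ValueError("duplicate devid: refusing to assemble "
                         "without an explicit device list")
    return chosen


devs = [
    Dev("fsdev1", "fs-A", 1),
    Dev("fsdev2", "fs-A", 2),
    Dev("fsdev1_clone", "fs-A", 1),
    Dev("fsdev2_clone", "fs-A", 2),
]

# Clones appear while the originals are mounted: originals are kept.
kept = choose_devices(devs, in_use={"fsdev1", "fsdev2"})

# Nothing mounted, but the admin names the clones explicitly.
by_hand = choose_devices(devs, explicit=["fsdev1_clone", "fsdev2_clone"])
```

Calling choose_devices(devs) with neither in_use nor explicit raises
ValueError in this sketch, which corresponds to "refuse to mount/use any
of them".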


Cheers,
Chris.


* Re: attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?)
  2015-12-05 13:19     ` Duncan
@ 2015-12-06  1:51       ` Christoph Anton Mitterer
  2015-12-06  4:06         ` Duncan
  2015-12-06 14:34         ` attacking btrfs filesystems via UUID collisions? Qu Wenruo
  0 siblings, 2 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-06  1:51 UTC (permalink / raw)
  To: Duncan, linux-btrfs


On Sat, 2015-12-05 at 13:19 +0000, Duncan wrote:
> The problem with btrfs is that because (unlike traditional
> filesystems) 
> it's multi-device, it needs some way to identify what devices belong
> to a 
> particular filesystem.
Sure, but that applies to lvm, or MD as well... and I wouldn't know of
any random corruption issues there.


> And UUID is, by definition and expansion, Universally Unique ID.
Nitpicking doesn't help here,... reality is they're not,.. either by
people doing stuff like dd, other forms of clones, LVM, etc. ... or as
I've described maliciously.


> Btrfs 
> simply depends on it being what it says on the tin, universally 
> unique, to ID the components of the filesystem and assemble them 
> correctly.
Admittedly, I'm not an expert on the internals of btrfs, but it seems
other multi-device containers can handle UUID duplicates fine, or at
least in such a way that you don't get any data corruption (or leaks).

This is a showstopper - maybe not under lab conditions but surely in
real-world scenarios.
I'm actually quite surprised that no one else has complained about
this before, given how long btrfs has existed.


> Besides dd, etc, LVM snapshots are another case where this goes
> screwy.  
> If the UUID isn't UUID, do a btrfs device scan (which udev normally
> does 
> by default these days) so the duplicate UUID is detected, and btrfs 
> *WILL* eventually start trying to write to all the "newly added"
> devices 
> that scan found, identified by their Universally Unique IDs, aka
> UUIDs.  
> It's not a matter of if, but when.
Well.. as I said... quite scary, with respect to both, accidental and
malicious cases of duplicate UUIDs.


> And the UUID is embedded so deeply within the filesystem and its 
> operations, as an inextricable part of the metadata (thus avoiding
> the 
> problem reiserfs had where a reiserfs stored in a loopback file on a 
> reiserfs, would screw up reiserfsck, on btrfs, the loopback file
> would 
> have a different UUID and thus couldn't be mixed up), that changing
> the 
> UUID is not the simple operation of changing a few bytes in the
> superblock 
> that it is on other filesystems, which is why there's now a tool to
> go 
> thru all those metadata entries and change it.
I don't think that this design is bad per se, or that it prevents the
kernel from handling such situations gracefully.

I would expect that in addition to the fs UUID, it needs a form of
device ID... so why not simply ignore any new device for which there
already is a matching fs UUID and device ID, unless the respective tool
(mount, btrfs, etc.) is explicitly told otherwise via some
device=/dev/sda,/dev/sdb option.

If that means that fewer things work out of the box (in the sense of
"auto-assembly"), well, then that is simply necessary.
Data security and consistency are definitely much more important than
any fancy auto-magic.



> So an aware btrfs admin simply takes pains to avoid triggering a
> btrfs 
> device scan at the wrong time, and to immediately hide their LVM 
> snapshots, immediately unplug their directly dd-ed devices, etc, and
> thus 
> doesn't have to deal with the filesystem corruption that'd be a when
> not 
> if, if they didn't take such precautions with their dupped UUIDs that
> actually aren't as UUID as the name suggests...
a) People shouldn't need to do days of study to be able to use btrfs
securely. Of course it's more advanced and not everything can be
simplified in a way so that users don't need to know anything (e.g. all
the well-known effects of CoW)... but when the point is reached where
security and data integrity is threatened, there's definitely a hard
border that mustn't be crossed.

b) Given how complex software is, I doubt that it's easily possible,
even for the aware admin, to really prevent every situation that can
lead to such problems.
Not to mention any attack scenarios.



> And as your followup suggests in a security context, they consider 
> masking out their UUIDs before posting them, as well, tho most kernel
> hackers generally consider unsupervised physical access to be game-
> over, 
> security-wise.
Do they? I rather thought many of them had a rather practical and real-
world-situations-based POV.

> (After all, in that case there's often little or nothing 
> preventing a reboot to that USB stick, if desired, or simply yanking
> the 
> devices and duping them or plugging them in elsewhere, if the BIOS is
> password protected, with the only thing standing in the way at that
> point 
> being possible device encryption.)
There's hardware which, when it detects physical intrusion (like
yanking), locks itself up (securely clearing the memory and
disconnecting itself from other nodes, which might be compromised as
well if the filesystem on the attacked node went crazy).

You have things like ATMs, which are physically usually quite well
secured, but which do have rather easily accessible maintenance ports.
All of us have seen such embedded devices rebooting themselves, where
you see kernel messages.
That's the point where an attacker could easily get the btrfs UUID:
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.2.0-1-amd64
root=UUID=bd1ea5a0-9bba-11e5-82fa-502690aa641f

If you can attack such devices already by just having access to a USB
port... then holy sh**...


> The only real
> alternative if 
> you don't like it is using a different filesystem.
As I've said, I don't have a problem with UUIDs... I just can't quite
believe that btrfs and the userland cannot be modified so that it
handles such cases gracefully.

If not, then, to be quite honest, that would really be a major
showstopper for many usage areas.
And I'm not only talking about ATMs (or any other embedded devices where
people may have unsupervised access - e.g. TVs in a mall,
entertainment systems in planes) but also about normal desktops/laptops
where colleagues, fellow students, etc. may want to play some "prank".


Cheers,
Chris.


* Re: attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?)
  2015-12-06  1:51       ` attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?) Christoph Anton Mitterer
@ 2015-12-06  4:06         ` Duncan
  2015-12-09  5:07           ` Christoph Anton Mitterer
  2015-12-06 14:34         ` attacking btrfs filesystems via UUID collisions? Qu Wenruo
  1 sibling, 1 reply; 51+ messages in thread
From: Duncan @ 2015-12-06  4:06 UTC (permalink / raw)
  To: linux-btrfs

Christoph Anton Mitterer posted on Sun, 06 Dec 2015 02:51:20 +0100 as
excerpted:

> You have things like ATMs, which are physically usually quite well
> secured, but which do have rather easily accessible maintenance ports.
> All of us have seen such embedded devices rebooting themselves, where
> you see kernel messages.
> That's the point where an attacker could easily get the btrfs UUID:
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.2.0-1-amd64
> root=UUID=bd1ea5a0-9bba-11e5-82fa-502690aa641f
> 
> If you can attack such devices already by just having access to a USB
> port... then holy sh**...

There's actually a number of USB-based hardware and software vulns out
there, from the under $10 common-component-capacitor-based charge-and-zap
(charges off the 5V USB line, zaps the port with several hundred volts
reverse-polarity, if the machine survives the first pulse and continues
supplying 5V power, repeat...), to the ones that act like USB-based input
devices and "type" in whatever commands, to simple USB-boot to a forensic
distro and let you inspect attached hardware (which is where the encrypted
storage comes in, they've got everything that's not encrypted),
to the plain old fashioned boot-sector viruses that quickly jump to
everything else on the system that's not boot-sector protected and/or
secure-boot locked, to...

Which is why most people in the know say if you have unsupervised physical
access, you effectively own the machine and everything on it, at least
that's not encrypted.

There's a reason some places hot-glue the USB ports.  If you're plugging
anything untrusted into them... and that's a well known social engineering
hack as well, simply drop a few thumb drives in the target parking lot and
wait to see who picks them up and plugs them in, so they can call home... 
Pen-testers do it.  NSA does it.  It's said a form of that is how they
bridged the air-gap to the Iranian centrifuges...

If you haven't been keeping up, you really have some reading to do.  If
you're plugging in untrusted USB devices, seriously, a thumb drive with a
few duplicated btrfs UUIDs is the least of your worries!

>> The only real alternative if you don't like it is using a different
>> filesystem.

> As I've said, I don't have a problem with UUIDs... I just can't quite
> believe that btrfs and the userland cannot be modified so that it
> handles such cases gracefully.

As I implied, UUID usage is so deeply embedded that fixing btrfs to not
work that way is pretty much impossible.  You'd be pretty much starting
from scratch and using some of the same ideas; it wouldn't be btrfs any
longer.

> If not, then, to be quite honest, that would really be a major
> showstopper for many usage areas.

Consider the show stopped, then.

> And I'm not only talking about ATMs (or any other embedded devices where
> people may have unsupervised access - e.g. TVs in a mall,
> entertainment systems in planes) but also about normal desktops/laptops
> where colleagues, fellow students, etc. may want to play some "prank".

As I said, if you're plugging in or allowing to be plugged in untrusted
USB devices, show's over, they're already playing pretty much any prank
they want, including zapping the hardware.  USB's now less trusted than a
raw Internet hookup with all services exposed.  The only controlling
factor now is the physical presence limitation, and if you're plugging in
devices you get for instance as "gifts just for trying us out" or
whatever, that someone mails to you... worse than running MS and
mindlessly running any exe someone sends you.


BTW, this is documented (in somewhat simpler "do not do XX" form) on the
wiki's Gotchas page.

https://btrfs.wiki.kernel.org/index.php/Gotchas#Block-level_copies_of_devices


-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-06  1:51       ` attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?) Christoph Anton Mitterer
  2015-12-06  4:06         ` Duncan
@ 2015-12-06 14:34         ` Qu Wenruo
  2015-12-06 20:55           ` Chris Murphy
  2015-12-09  5:39           ` Christoph Anton Mitterer
  1 sibling, 2 replies; 51+ messages in thread
From: Qu Wenruo @ 2015-12-06 14:34 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Duncan, linux-btrfs



On 12/06/2015 09:51 AM, Christoph Anton Mitterer wrote:
> On Sat, 2015-12-05 at 13:19 +0000, Duncan wrote:
>> The problem with btrfs is that because (unlike traditional
>> filesystems)
>> it's multi-device, it needs some way to identify what devices belong
>> to a
>> particular filesystem.
> Sure, but that applies to lvm, or MD as well... and I wouldn't know of
> any random corruption issues there.

Not sure about LVM/MD, but they should suffer from the same UUID
conflict problem.

The only idea I have can only improve the behavior, but never fully fix it.
For example, if multiple btrfs devices with the same devid are found,
just refuse to mount.
And for an already mounted btrfs, ignore any duplicated fsid/devid.

The problem can get even trickier in cases where a device goes missing
for a while and then appears again.


But just as you mentioned, it *IS* a real problem, and we need to
improve it.

>
>
>> And UUID is, by definition and expansion, Universally Unique ID.
> Nitpicking doesn't help here,... reality is they're not,.. either by
> people doing stuff like dd, other forms of clones, LVM, etc. ... or as
> I've described maliciously.
>
>
>> Btrfs
>> simply depends on it being what it says on the tin, universally
>> unique, to ID the components of the filesystem and assemble them
>> correctly.
> Admittedly, I'm not an expert to the internals of btrfs, but it seems
> other multi-device containers can handle UUID duplicates fine, or at
> least so that you don't get any data corruption (or leaks).

I'd like to see how LVM/DM behaves first, at least as a reference for
whether they are really that safe.
For example, I have a whole disk as the following configuration:

0         10G         20G
| test_lv |           |
----------------
|  test_vg     |
-----------------------
|       test_pv       |
-----------------------
|    /dev/sdb         |
-----------------------

If I did a dd copy of /dev/sdb to /dev/sdc,
what will pv/vg/lv rescan show if test_pv/vg/lv is already active?
And what will rescan show if they are not active? Or after a reboot?

>
> This is a showstopper - maybe not under lab conditions but surely under
> real world scenarios.
> I'm actually quite surprised that no-one else didn't complain about
> that before, given how long btrfs exists.
>
>
>> Besides dd, etc, LVM snapshots are another case where this goes
>> screwy.
>> If the UUID isn't UUID, do a btrfs device scan (which udev normally
>> does
>> by default these days) so the duplicate UUID is detected, and btrfs
>> *WILL* eventually start trying to write to all the "newly added"
>> devices
>> that scan found, identified by their Universally Unique IDs, aka
>> UUIDs.
>> It's not a matter of if, but when.
> Well.. as I said... quite scary, with respect to both, accidental and
> malicious cases of duplicate UUIDs.
>
>
>> And the UUID is embedded so deeply within the filesystem and its
>> operations, as an inextricable part of the metadata (thus avoiding
>> the
>> problem reiserfs had where a reiserfs stored in a loopback file on a
>> reiserfs, would screw up reiserfsck, on btrfs, the loopback file
>> would
>> have a different UUID and thus couldn't be mixed up), that changing
>> the
>> UUID is not the simple operation of changing a few bytes in the
>> superblock
>> that it is on other filesystems, which is why there's now a tool to
>> go
>> thru all those metadata entries and change it.
> I don't think that this design is per se bad and prevents the kernel to
> handle such situations gracefully.
>
> I would expect that in addition to the fs UUID, it needs a form of
> device ID... so why not simply ignoring any new device for which there
> already is a matching fs UUID and device ID, unless the respective tool
> (mount, btrfs, etc.) is explicitly told so via some
> device=/dev/sda,/dev/sdb option.

IIRC, there were some btrfs-progs patches for such behavior; I'm not sure
about the kernel part, though.
But it is at least an interesting way to solve the problem.
(Better than just refusing to mount any of them.)
>
> If that means that less things work out of the box (in the sense of
> "auto-assembly") well than this is simply necessary.
> data security and consistency is definitely much more important than
> any fancy auto-magic.

I couldn't agree more.
Especially when the automatic behavior is wrong (like kernel-version-based
probing).



After all, this topic reminds me of the bug reports about fuzzed (but
csum-recalculated) images.
I used to ignore them, thinking that would never happen in practice.

But the reporter is right, it's a btrfs security problem, and now I'm
very happy to see such reports.
As it's easy to fix, I can submit some patches if nobody else is faster
than me. :)

So for this one, as long as we agree on a good behavior to solve it, it
won't be a big deal.

Thanks,
Qu
>
>
>
>> So an aware btrfs admin simply takes pains to avoid triggering a
>> btrfs
>> device scan at the wrong time, and to immediately hide their LVM
>> snapshots, immediately unplug their directly dd-ed devices, etc, and
>> thus
>> doesn't have to deal with the filesystem corruption that'd be a when
>> not
>> if, if they didn't take such precautions with their dupped UUIDs that
>> actually aren't as UUID as the name suggests...
> a) People shouldn't need to do days of study to be able to use btrfs
> securely.

> Of course it's more advanced and not everything can be
> simplified in a way so that users don't need to know anything (e.g. all
> the well-known effects of CoW)... but when the point is reached where
> security and data integrity is threatened, there's definitely a hard
> border that mustn't be crossed.
>
> b) Given how complex software is, I doubt that it's easily possible,
> even for the aware admin, to really prevent all situations that can
> lead to such situations.
> Not to talk about about any attack-scenarios.
>
>
>
>> And as your followup suggests in a security context, they consider
>> masking out their UUIDs before posting them, as well, tho most kernel
>> hackers generally consider unsupervised physical access to be game-
>> over,
>> security-wise.
> Do they? I rather thought many of them had a rather practical and real-
> world-situations-based POV.
>
>> (After all, in that case there's often little or nothing
>> preventing a reboot to that USB stick, if desired, or simply yanking
>> the
>> devices and duping them or plugging them in elsewhere, if the BIOS is
>> password protected, with the only thing standing in the way at that
>> point
>> being possible device encryption.)
> There's hardware which would, when it detects physicals intrusion (like
> yanking) lock up itself (securely clearing the memory, disconnecting
> itself from other nodes, which may be compromised as well, when the
> filesystem on the attacked node would go crazy.
>
> You have things like ATMs, which are physically usually quite well
> secured, but which do have rather easily accessible maintenance ports.
> All of us have seen such embedded devices rebooting themselves, where
> you see kernel messages.
> That's the point where an attacker could easily get the btrfs UUID:
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.2.0-1-amd64
> root=UUID=bd1ea5a0-9bba-11e5-82fa-502690aa641f
>
> If you can attack such devices already by just having access to a USB
> port... then holly sh**...
>
>
>> The only real
>> alternative if
>> you don't like it is using a different filesystem.
> As I've said, I don't have a problem with UUIDs... I just can't quite
> believe that btrfs and the userland cannot be modified so that it
> handles such cases gracefully.
>
> If not, than, to be quite honest, that would be really a major
> showstopper for many usage areas.
> And I'm not talking about ATMs (or any other embedded devices where
> people may have non-supervides access - e.g. TVs in a mall,
> entertainment systems in planes) but also the normal desktops/laptops
> where colleagues, fellow students, etc. may want to play some "prank".
>
>
> Cheers,
> Chris.
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-06 14:34         ` attacking btrfs filesystems via UUID collisions? Qu Wenruo
@ 2015-12-06 20:55           ` Chris Murphy
  2015-12-09  5:39           ` Christoph Anton Mitterer
  1 sibling, 0 replies; 51+ messages in thread
From: Chris Murphy @ 2015-12-06 20:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Anton Mitterer, Duncan, Btrfs BTRFS

On Sun, Dec 6, 2015 at 7:34 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

> But just as you mentioned, it *IS* a real problem, and we should need to
> enhance it.

LVM sorta avoids the problem, because its snapshots aren't active by
default so the underlying fs (and its UUID and superblock) don't
appear to the kernel.

In no order:

1. Better practices: we really need to tell users and documentation
writers that using dd (or a variant) to copy Btrfs volumes has
consequences and should not be done to make copies.

2. Btrfs needs a better way to make a copy of a volume when there are
snapshots (including even rw snapshots); e.g. permit send/receive to
work on rw snapshots if the fs is ro mounted; e.g. a way to do
"recursive" send/receive.

3. Some way to fail gracefully, when there's ambiguity that cannot be
resolved. Once there are duplicate devs (dd or lvm snapshots, etc)
then there's simply no way to resolve the ambiguity automatically, and
the volume should just refuse to rw mount until the user resolves the
ambiguity. I think it's OK to fallback to ro mount (maybe) by default
in such a case rather than totally fail to mount.



> I'd like to see how LVM/DM behaves first, at least as a reference if they
> are really so safe.
> For example, I have a whole disk as the following configuration:
>
> 0         10G         20G
> | test_lv |           |
> ----------------
> |  test_vg     |
> -----------------------
> |       test_pv       |
> -----------------------
> |    /dev/sdb         |
> -----------------------
>
> If I did a dd copy of /dev/sdb to /dev/sdc,
> what will pv/vg/lv rescan show if test_pv/vg/lv is already active?
> And what will rescan show if they are not active? Or after a reboot?

I haven't tested it recently, but my recollection is that it flat
refused to activate the VG/LV whenever two PVs with identical UUIDs
were visible; that is, it would not use either PV until I resolved the
ambiguity by physical PV removal, or using pvremove, or using wipefs.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?)
  2015-12-06  4:06         ` Duncan
@ 2015-12-09  5:07           ` Christoph Anton Mitterer
  2015-12-09 11:54             ` Duncan
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-09  5:07 UTC (permalink / raw)
  To: Duncan, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 4170 bytes --]

On Sun, 2015-12-06 at 04:06 +0000, Duncan wrote:
> There's actually a number of USB-based hardware and software vulns
> out
> there, from the under $10 common-component-capacitor-based charge-
> and-zap
> (charges off the 5V USB line, zaps the port with several hundred
> volts
> reverse-polarity, if the machine survives the first pulse and
> continues
> supplying 5V power, repeat...), to the ones that act like USB-based
> input
> devices and "type" in whatever commands, to simple USB-boot to a
> forensic
> distro and let you inspect attached hardware (which is where the
> encrypted
> storage comes in, they've got everything that's not encrypted),
> to the plain old fashioned boot-sector viruses that quickly jump to
> everything else on the system that's not boot-sector protected and/or
> secure-boot locked, to...
Well, this is all well known - at least to security folks ;) - but to be
quite honest:
It's not an excuse for allowing even more attack surface, in this case via
the filesystem.
One will *always* find a weaker element in the security chain, and
could always use that as an argument not to fix one's own issues.

"Well, there's no need to fix that possible collision-data-leakage-
issue in btrfs[0]! Why? Well an attacker could still simply abduct the
bank manager, torture him for hours until he gives any secret with
pleasure"
;-)


> Which is why most people in the know say if you have unsupervised
> physical
> access, you effectively own the machine and everything on it, at
> least
> that's not encrypted.
Sorry, I wouldn't say so. Ultimately you're of course right, which is
why my fully dm-crypted notebook is never left alone when it runs (cold
boot or USB firmware attacks)... but in practice things are a bit
different, I think.
Take the ATM example.

Or take real-world life in big computing centres.
Fact is, many people usually have access, from the actual core
personnel to electricians and the cleaning personnel.
Whacking a device or attacking it via USB firmware tricks is of course
possible for them, but it's much more likely to be noticed (making noise,
taking time and so on)... so there is no need to offer another attack
surface on top of that.


> If you haven't been keeping up, you really have some reading to
> do.  If
> you're plugging in untrusted USB devices, seriously, a thumb drive
> with a
> few duplicated btrfs UUIDs is the least of your worries!
Well, as I've said, getting that in via USB may be only one way.
We're already so far that GNOME & Co. automount devices when plugged in...
who says that the next step isn't that this happens remotely in some
form, e.g. a btrfs image on Dropbox, automounted by Nautilus?
Okay, that may be a bit contrived, but it should demonstrate that
there could be plenty of ways for that to happen which we don't even
think of (and usually these are the worst in security).


You said it's basically not fixable in btrfs:
It's absolutely clear that I'm no btrfs expert (or even developer), but
my poor man's approach, which I think I've described before, doesn't seem
so impossible, does it?
1) Don't simply "activate" btrfs devices that are found but rather:
2) Check if there are other devices of the same fs UUID + device ID, or
more generally said: check if there are any collisions
3) If there are, and some of them are already active, continue to use
them, don't activate the newly appeared ones
4) If there are, and none of them are already active, refuse to
activate *any* of them unless the user manually instructs to do so via
device= like options.
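
The four steps above can be sketched as a small decision function. This
is only an illustration of the proposed policy, not how the btrfs kernel
code or btrfs-progs actually behave; the names Device and decide, and
the explicit set (standing in for devices named via a device=-like
option), are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    path: str   # e.g. "/dev/sdb"
    fsid: str   # filesystem UUID
    devid: int  # per-filesystem device ID


def decide(new, known, active, explicit=frozenset()):
    """Return 'activate', 'ignore' or 'refuse' for a newly scanned device.

    known    -- devices seen by earlier scans
    active   -- paths of devices that are currently in use
    explicit -- paths the user named by hand (hypothetical device= option)
    """
    clashes = [d for d in known
               if d.path != new.path
               and (d.fsid, d.devid) == (new.fsid, new.devid)]
    if not clashes:
        return "activate"   # step 2: no collision found
    if any(d.path in active for d in clashes):
        return "ignore"     # step 3: keep using the already active device
    if new.path in explicit:
        return "activate"   # step 4: the user resolved the ambiguity by hand
    return "refuse"         # step 4: ambiguous, activate none of them


# Example: /dev/sdc is a dd clone of /dev/sdb.
orig = Device("/dev/sdb", "bd1ea5a0-9bba-11e5-82fa-502690aa641f", 1)
clone = Device("/dev/sdc", "bd1ea5a0-9bba-11e5-82fa-502690aa641f", 1)
print(decide(clone, [orig], active={"/dev/sdb"}))
print(decide(clone, [orig], active=set()))
```

With the original in use the clone is ignored; with neither in use the
sketch refuses until one path is passed in explicit.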


> BTW, this is documented (in somewhat simpler "do not do XX" form) on
> the wiki, gotchas page.
> 
> https://btrfs.wiki.kernel.org/index.php/Gotchas#Block-level_copies_of_devices
I know, but it doesn't really spell out all possible consequences, and
again, it's unlikely that the end-user (even if possibly heavily
affected by it) will stumble over that.


Cheers,
Chris.


[0] Assuming there actually is one; I haven't really verified that and
base it solely on what people said, namely that basically arbitrary
corruption may happen on both devices.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-06 14:34         ` attacking btrfs filesystems via UUID collisions? Qu Wenruo
  2015-12-06 20:55           ` Chris Murphy
@ 2015-12-09  5:39           ` Christoph Anton Mitterer
  2015-12-09 21:48             ` S.J.
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-09  5:39 UTC (permalink / raw)
  To: Qu Wenruo, Duncan, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 8444 bytes --]

On Sun, 2015-12-06 at 22:34 +0800, Qu Wenruo wrote:
> Not sure about LVM/MD, but they should suffer the same UUID conflict
> problem.
Well, I actually had that quite often in LVM (i.e. the same UUIDs visible
on the same system), basically because we made clones from one template VM
image, and when that is booted normally, LVM doesn't allow changing the
UUIDs of already active PVs/VGs/LVs (or maybe just some of these three,
I forgot the details).

But there was never any issue, LVM on the host system, when one set was
already used, continues to use that just fine and the toolset reports
which it would use (more below).


> The only idea I have can only enhance the behavior, but never fix it.
> For example, if found multiple btrfs devices with same devid, just 
> refuse to mount.
> And for already mounted btrfs, ignore any duplicated fsid/devid.
Well I think that's already a perfectly valid solution... basically the
idea that I had before.
I'd call that a 100% fix, not just a workaround.

If the tools (i.e. btrfstune) then allow changing the UUID of the
duplicate set of devices (perhaps again with the necessity to specify
each of them via device=/dev/sda,etc.), I'd be completely happy again...
and the show could go on ;)

> The problem can get even tricky for case like device missing for a
> while 
> and appear again case.
I had thought about that too:
a) In the non-malicious case, this could e.g. mean that a device from a
btrfs RAID was missing and a clone with the same UUID / dev ID gets
added to the system.
Possible consequences, AFAICS:
- The data is simply auto-rebuilt on the clone.
- Some corruptions occur when the clone is older, and data that was
only on the newer device is now missing (not sure if this can happen at
all or whether generation IDs prevent it).

b) In the malicious/attack case, one possible scenario could be:
A device is missing from a btrfs RAID... the machine is left
unattended. An attacker comes and plugs in a USB stick with the missing
UUID. Does the rebuild (and thus the data leakage) now happen
automatically?

In any case though, a simple solution could be that no automatic
assembly happens by default, and that the people who still want it
are properly warned about the possible implications in the docs.


> But just as you mentioned, it *IS* a real problem, and we should need
> to 
> enhance it.
Should one (or I) add this as a ticket to the kernel bugzilla, or as an
entry to the btrfs wiki?


> I'd like to see how LVM/DM behaves first, at least as a reference if 
> they are really so safe.
Well that's very simple to check, I did it here for the LV case only:
root@lcg-lrz-admin:~# truncate -s 1G image1
root@lcg-lrz-admin:~# losetup -f image1 
root@lcg-lrz-admin:~# pvcreate /dev/loop0
  Physical volume "/dev/loop0" successfully created
root@lcg-lrz-admin:~# losetup -d /dev/loop0 
root@lcg-lrz-admin:~# cp image1 image2
root@lcg-lrz-admin:~# losetup -f image1 
root@lcg-lrz-admin:~# pvscan 
  PV /dev/sdb     VG vg_data     lvm2 [50,00 GiB / 0    free]
  PV /dev/sda1    VG vg_system   lvm2 [9,99 GiB / 0    free]
  PV /dev/loop0                  lvm2 [1,00 GiB]
  Total: 3 [60,99 GiB] / in use: 2 [59,99 GiB] / in no VG: 1 [1,00 GiB]
root@lcg-lrz-admin:~# losetup -f image2 
root@lcg-lrz-admin:~# pvscan 
  Found duplicate PV tSK9Cdpw6bcmocZnxFPD6ThNz1opRXsB: using /dev/loop1 not /dev/loop0
  PV /dev/sdb     VG vg_data     lvm2 [50,00 GiB / 0    free]
  PV /dev/sda1    VG vg_system   lvm2 [9,99 GiB / 0    free]
  PV /dev/loop1                  lvm2 [1,00 GiB]
  Total: 3 [60,99 GiB] / in use: 2 [59,99 GiB] / in no VG: 1 [1,00 GiB]

Obviously, with PVs alone, there is no "x is already used" case. As one
can see, it just says it would ignore one of them, which I think is
rather stupid in that particular case (i.e. none of the devices already
being used somehow), because it probably just "randomly" decides which
one is to be used, which is ambiguous.
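
For scripts that want to catch this case instead of relying on someone
reading the warning, the line can be picked out of the tool output. A
minimal sketch, assuming the warning is worded exactly as in the
transcript above (other LVM versions may phrase it differently);
duplicate_pvs is a made-up helper name and works on text that has
already been captured.

```python
import re

# Warning format as printed by the LVM tools in the transcript above;
# this wording is an assumption for any other LVM version.
DUP_RE = re.compile(r"Found duplicate PV (\S+): using (\S+) not (\S+)")


def duplicate_pvs(output):
    """Return (pv_uuid, used_device, ignored_device) for each warning line."""
    matches = (DUP_RE.search(line) for line in output.splitlines())
    return [m.groups() for m in matches if m]


sample = """\
  Found duplicate PV tSK9Cdpw6bcmocZnxFPD6ThNz1opRXsB: using /dev/loop1 not /dev/loop0
  PV /dev/sdb     VG vg_data     lvm2 [50,00 GiB / 0    free]
  PV /dev/loop1                  lvm2 [1,00 GiB]
"""

for uuid, used, ignored in duplicate_pvs(sample):
    print(f"PV {uuid}: tools picked {used}, ignored {ignored}")
```

A non-empty result would be the point at which a script stops and asks
the admin rather than continuing with whichever device was picked.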


> And what will rescan show if they are not active?
My experience was always (it's just quite late and I don't want to
simulate everything right now, which is trivial anyway):
- It shows warnings about the duplicates in the tools
- It continues to use the already active devices (if any)
- Unfortunately, while the kernel continues to use the already used
devices, the toolset may use another device (kinda stupid, but at least
it warns and the already used devices seem to be still properly used):

continuation from the setup above:
root@lcg-lrz-admin:~# losetup -d /dev/loop1 
(now only image1 is seen as loop0)
root@lcg-lrz-admin:~# vgcreate vg_test /dev/loop0
  Volume group "vg_test" successfully created
root@lcg-lrz-admin:~# lvcreate -n test vg_test -l 100
  Logical volume "test" created
root@lcg-lrz-admin:~# mkfs.ext4 /dev/vg_test/test 
mke2fs 1.42.12 (29-Aug-2014)
...
root@lcg-lrz-admin:~# mount /dev/vg_test/test /mnt/
root@lcg-lrz-admin:~# losetup -a
/dev/loop0: [64768]:518297 (/root/image1)
root@lcg-lrz-admin:~# losetup -f image2 
root@lcg-lrz-admin:~# vgs
  Found duplicate PV tSK9Cdpw6bcmocZnxFPD6ThNz1opRXsB: using /dev/loop1 not /dev/loop0
  VG        #PV #LV #SN Attr   VSize  VFree
  vg_data     1   1   0 wz--n- 50,00g    0 
  vg_system   1   2   0 wz--n-  9,99g    0 
root@lcg-lrz-admin:~# lvs
  Found duplicate PV tSK9Cdpw6bcmocZnxFPD6ThNz1opRXsB: using /dev/loop1 not /dev/loop0
  LV   VG        Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data vg_data   -wi-ao----   50,00g                                                    
  root vg_system -wi-ao----    9,02g                                                    
  swap vg_system -wi-ao---- 1000,00m                                                    

As you can see, even though loop0 is used (by the kernel) the toolset
would use loop1... o.O
Yeah, don't ask me why... I once had a discussion with Alastair from
the LVM people about that, forgot the exact reasons (if there were any)
and I was simply happy that it continued to use the already open
devices properly.


>  Or after a reboot?
Haven't checked this right now but I guess it again just decides on one
of them (which is pretty bad).


> > I would expect that in addition to the fs UUID, it needs a form of
> > device ID... so why not simply ignoring any new device for which
> > there
> > already is a matching fs UUID and device ID, unless the respective
> > tool
> > (mount, btrfs, etc.) is explicitly told so via some
> > device=/dev/sda,/dev/sdb option.
> 
> IIRC, there were some btrfs-progs patches for such behavior, not sure
> about kernel part though.
> But at least an interesting method to solve the problem.
> (Better than just rejecting mounting any)
Of course if the user wouldn't specify those, it would still need to
reject mounting/using/activating/fsck'ing/etc. ...


> > If that means that less things work out of the box (in the sense of
> > "auto-assembly") well than this is simply necessary.
> > data security and consistency is definitely much more important
> > than
> > any fancy auto-magic.
> 
> Can't agree any more.
> Especially when auto leads to wrong behavior (Like kernel version
> based 
> probing).
Good to hear... well... you're the developer... spread the word :D


> And after all, this topic makes me remember the bugreport of fuzzed
> (but 
> csum recalculated) images.
> I used to ignore them and I think that wouldn't happen.
> 
> But the reporter is right, it's a btrfs security problem, and now I'm
> super happy to see such report.
As I've said, I've been quite surprised that no one seems to have
thought about that before (especially the security aspect of that
issue).


> As it's easy to fix, I can always submit some patches if there is no 
> other guy faster than me. :)
Awesome... showstopper number #1 just seems to be about to walk away :D


> So for this one, as long as we find a good behavior to solve it, it 
> won't be a big thing.
Great... keep me/us updated :)


Cheers,
Chris.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?)
  2015-12-09  5:07           ` Christoph Anton Mitterer
@ 2015-12-09 11:54             ` Duncan
  0 siblings, 0 replies; 51+ messages in thread
From: Duncan @ 2015-12-09 11:54 UTC (permalink / raw)
  To: linux-btrfs

Christoph Anton Mitterer posted on Wed, 09 Dec 2015 06:07:38 +0100 as
excerpted:

> Well as I've said, getting that in via USB may be only one way.
> We're already so far that GNOME&Co. automount devices when plugged...

Ugh.  ... And many know that's the sort of thing that made MS so much of 
a security headache, and want no part of it!

FWIW, of course gentoo allows far more configurability in this regard 
than many distros, but no automount here, and while I don't do gnome 
because I like my system configurable and they'd just as soon it be their 
way or the highway (echoes of proprietaryware attitude there if you ask 
me, but I'm very glad gnome's available for them to work on as otherwise 
they'd be troubling kde and etc to go the same way), I do have a much 
more limited than usual kde installed, without stuff like the device 
notifier plasmoid or underlying infrastructure like udisks, as the only 
things I want mounted are the things I've either configured to be mounted 
via fstab, or the things I've manually mounted.  (FWIW, the semantic-
desktop crap is opted out at build-time too, so it's not even there to 
turn off at runtime, the best most distros allow for those not interested 
in that stuff.  It meant dumping a few apps and some missing features in 
others, but I don't have indexing taking gigs of space and major IO 
bandwidth at the most inconvenient times (any time!) for nothing I'm 
going to make use of, either!)

> You said it's basically not fixable in btrfs:
> It's absolutely clear that I'm no btrfs expert (or even developer), but
> my poor man approach which I think I've written before doesn't seem so
> impossible, does it?
> 1) Don't simply "activate" btrfs devices that are found but rather:
> 2) Check if there are other devices of the same fs UUID + device ID,
> or more generally said: check if there are any collisions
> 3) If there are, and some of them are already active,
> continue to use them, don't activate the newly appeared ones
> 4) If there are, and none of them are already active, refuse to
> activate *any* of them unless the user manually instructs to do so
> via device= like options.

The underlying issue pretty much isn't fixable, but as Qu has suggested 
on that subthread, there's ameliorations that can be done, basically in 
line with your suggestions above, and you've indicated that you'd 
consider that fixed, tho neither he nor I consider it "fixed", only 
hidden to some extent.

Anyway, he's a dev and actively involved now, while I'm not a dev,
so he can take it from there. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-09  5:39           ` Christoph Anton Mitterer
@ 2015-12-09 21:48             ` S.J.
  2015-12-10 12:08               ` Austin S Hemmelgarn
                                 ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: S.J. @ 2015-12-09 21:48 UTC (permalink / raw)
  To: linux-btrfs

> 1. better practices, we really need to tell users, and documentation
> writers, that using dd (or variant) to copy Btrfs volumes has a
> consequence and should not be used to make copies.

> 2. Btrfs needs a better way to make a copy of a volume when there are
> snapshots (including even rw snapshots); e.g. permit send/receive to
> work on rw snapshots if the fs is ro mounted; e.g. a way to do
> "recursive" send/receive.

> 3. Some way to fail gracefully, when there's ambiguity that cannot be
> resolved. Once there are duplicate devs (dd or lvm snapshots, etc)
> then there's simply no way to resolve the ambiguity automatically, and
> the volume should just refuse to rw mount until the user resolves the
> ambiguity. I think it's OK to fallback to ro mount (maybe) by default
> in such a case rather than totally fail to mount.

About 3:
An RO fallback for the second device/partition is not good.
It won't prevent the two partitions from being confused, and even if both
are RO, thinking it's OK to read and then reading the wrong data is bad.

About 1 and 2 ... if 3 gets fulfilled, why?
dd itself is not a problem *if* the UUID is changed afterwards
(which takes a command as simple as dd), and if someone doesn't
know that, he/she will notice when mount refuses to work
because of the duplicate UUID.
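
Until mount refuses on its own, the same check can be done by hand
before mounting a fresh copy. A minimal sketch that looks for clones in
blkid-style text, assuming blkid prints the filesystem UUID as UUID and
the per-device UUID as UUID_SUB for btrfs; the helper name and the
UUID_SUB values in the sample are made up. Members of one multi-device
filesystem share UUID but differ in UUID_SUB, whereas a dd copy repeats
both, so the pair is used as the key.

```python
import re
from collections import defaultdict

# Matches KEY="value" pairs as printed in blkid's default output format.
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')


def cloned_btrfs_devices(blkid_output):
    """Map (UUID, UUID_SUB) to device paths for pairs seen more than once."""
    seen = defaultdict(list)
    for line in blkid_output.splitlines():
        dev, sep, rest = line.partition(": ")
        if not sep:
            continue  # not a "device: tags" line
        tags = dict(PAIR_RE.findall(rest))
        if tags.get("TYPE") == "btrfs" and "UUID" in tags:
            seen[(tags["UUID"], tags.get("UUID_SUB", ""))].append(dev)
    return {key: devs for key, devs in seen.items() if len(devs) > 1}


# Made-up sample: sdc1 is a dd copy of sdb1, sdd1 is a second member
# of the same filesystem.
sample = """\
/dev/sda1: UUID="0f0e0d0c-0000-0000-0000-000000000000" TYPE="ext4"
/dev/sdb1: UUID="bd1ea5a0-9bba-11e5-82fa-502690aa641f" UUID_SUB="aaaaaaaa-0000-0000-0000-000000000001" TYPE="btrfs"
/dev/sdc1: UUID="bd1ea5a0-9bba-11e5-82fa-502690aa641f" UUID_SUB="aaaaaaaa-0000-0000-0000-000000000001" TYPE="btrfs"
/dev/sdd1: UUID="bd1ea5a0-9bba-11e5-82fa-502690aa641f" UUID_SUB="aaaaaaaa-0000-0000-0000-000000000002" TYPE="btrfs"
"""

for (fsid, sub), devs in cloned_btrfs_devices(sample).items():
    print(f"{fsid} / {sub} appears on: {', '.join(devs)}")
```

Any device reported this way would need its UUID changed (btrfstune -u)
or be detached before the other copy is mounted.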


PS: Kudos to C.A. Mitterer for discovering that problem


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-09 21:48             ` S.J.
@ 2015-12-10 12:08               ` Austin S Hemmelgarn
  2015-12-10 12:41                 ` Hugo Mills
  2015-12-10 19:42               ` Chris Murphy
  2015-12-11 22:06               ` Christoph Anton Mitterer
  2 siblings, 1 reply; 51+ messages in thread
From: Austin S Hemmelgarn @ 2015-12-10 12:08 UTC (permalink / raw)
  To: S.J., linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1633 bytes --]

On 2015-12-09 16:48, S.J. wrote:
>> 1. better practices, we really need to tell users, and documentation
>> writers, that using dd (or variant) to copy Btrfs volumes has a
>> consequence and should not be used to make copies.
> 
>> 2. Btrfs needs a better way to make a copy of a volume when there are
>> snapshots (including even rw snapshots); e.g. permit send/receive to
>> work on rw snapshots if the fs is ro mounted; e.g. a way to do
>> "recursive" send/receive.
> 
>> 3. Some way to fail gracefully, when there's ambiguity that cannot be
>> resolved. Once there are duplicate devs (dd or lvm snapshots, etc)
>> then there's simply no way to resolve the ambiguity automatically, and
>> the volume should just refuse to rw mount until the user resolves the
>> ambiguity. I think it's OK to fallback to ro mount (maybe) by default
>> in such a case rather than totally fail to mount.
> 
> About 3:
> RO fallback for the second device/partitions is not good.
> It won't stop confusing the two partitions, and even if both are RO,
> thinking it's ok to read and then reading the wrong data is bad.
> 
> About 1 and 2 ... if 3 gets fulfilled, why?
> DD itself is not a problem "if" the UUID is changed after it
> (which is a command as simple as dd), and if someone doesn't
> know that, he/she will notice when mount refuses to work
> because UUID duplicate.
Unless things have changed significantly, changing the UUID on a BTRFS
image is not anywhere near as simple as copying it with dd.  The UUID
gets used internally somehow, and changing it would require rewriting
_all_ the metadata blocks.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-10 12:08               ` Austin S Hemmelgarn
@ 2015-12-10 12:41                 ` Hugo Mills
  2015-12-10 12:57                   ` S.J.
  0 siblings, 1 reply; 51+ messages in thread
From: Hugo Mills @ 2015-12-10 12:41 UTC (permalink / raw)
  To: Austin S Hemmelgarn; +Cc: S.J., linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1984 bytes --]

On Thu, Dec 10, 2015 at 07:08:51AM -0500, Austin S Hemmelgarn wrote:
> On 2015-12-09 16:48, S.J. wrote:
> >> 1. better practices, we really need to tell users, and documentation
> >> writers, that using dd (or variant) to copy Btrfs volumes has a
> >> consequence and should not be used to make copies.
> > 
> >> 2. Btrfs needs a better way to make a copy of a volume when there are
> >> snapshots (including even rw snapshots); e.g. permit send/receive to
> >> work on rw snapshots if the fs is ro mounted; e.g. a way to do
> >> "recursive" send/receive.
> > 
> >> 3. Some way to fail gracefully, when there's ambiguity that cannot be
> >> resolved. Once there are duplicate devs (dd or lvm snapshots, etc)
> >> then there's simply no way to resolve the ambiguity automatically, and
> >> the volume should just refuse to rw mount until the user resolves the
> >> ambiguity. I think it's OK to fallback to ro mount (maybe) by default
> >> in such a case rather than totally fail to mount.
> > 
> > About 3:
> > RO fallback for the second device/partitions is not good.
> > It won't stop confusing the two partitions, and even if both are RO,
> > thinking it's ok to read and then reading the wrong data is bad.
> > 
> > About 1 and 2 ... if 3 gets fulfilled, why?
> > DD itself is not a problem "if" the UUID is changed after it
> > (which is a command as simple as dd), and if someone doesn't
> > know that, he/she will notice when mount refuses to work
> > because UUID duplicate.
> Unless things have changed significantly, changing the UUID on a BTRFS
> image is not anywhere near as simple as copying it with dd.  The UUID
> gets used internally somehow, and changing it would require rewriting
> _all_ the metadata blocks.

   Indeed, but there is now a tool to do that. :) (btrfstune -u or -U)

   Hugo.

-- 
Hugo Mills             | Go not to the elves for counsel, for they will say
hugo@... carfax.org.uk | both no and yes.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-10 12:41                 ` Hugo Mills
@ 2015-12-10 12:57                   ` S.J.
  0 siblings, 0 replies; 51+ messages in thread
From: S.J. @ 2015-12-10 12:57 UTC (permalink / raw)
  To: linux-btrfs


On 10.12.2015 13:41, Hugo Mills wrote:
> On Thu, Dec 10, 2015 at 07:08:51AM -0500, Austin S Hemmelgarn wrote:
>> On 2015-12-09 16:48, S.J. wrote:
>>>> 1. better practices, we really need to tell users, and documentation
>>>> writers, that using dd (or variant) to copy Btrfs volumes has a
>>>> consequence and should not be used to make copies.
>>>> 2. Btrfs needs a better way to make a copy of a volume when there are
>>>> snapshots (including even rw snapshots); e.g. permit send/receive to
>>>> work on rw snapshots if the fs is ro mounted; e.g. a way to do
>>>> "recursive" send/receive.
>>>> 3. Some way to fail gracefully, when there's ambiguity that cannot be
>>>> resolved. Once there are duplicate devs (dd or lvm snapshots, etc)
>>>> then there's simply no way to resolve the ambiguity automatically, and
>>>> the volume should just refuse to rw mount until the user resolves the
>>>> ambiguity. I think it's OK to fallback to ro mount (maybe) by default
>>>> in such a case rather than totally fail to mount.
>>> About 3:
>>> RO fallback for the second device/partitions is not good.
>>> It won't stop confusing the two partitions, and even if both are RO,
>>> thinking it's ok to read and then reading the wrong data is bad.
>>>
>>> About 1 and 2 ... if 3 gets fulfilled, why?
>>> DD itself is not a problem "if" the UUID is changed after it
>>> (which is a command as simple as dd), and if someone doesn't
>>> know that, he/she will notice when mount refuses to work
>>> because UUID duplicate.
>> Unless things have changed significantly, changing the UUID on a BTRFS
>> image is not anywhere near as simple as copying it with dd.  The UUID
>> gets used internally somehow, and changing it would require rewriting
>> _all_ the metadata blocks.
>     Indeed, but there is now a tool to do that. :) (btrfstune -u or -U)
>
>     Hugo.
>
Yes, I meant that :)
I'm not saying that the tool is internally as simple as a
"dumb" dd block copy, but calling it certainly is.



* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-09 21:48             ` S.J.
  2015-12-10 12:08               ` Austin S Hemmelgarn
@ 2015-12-10 19:42               ` Chris Murphy
  2015-12-11 22:21                 ` Christoph Anton Mitterer
  2015-12-11 22:06               ` Christoph Anton Mitterer
  2 siblings, 1 reply; 51+ messages in thread
From: Chris Murphy @ 2015-12-10 19:42 UTC (permalink / raw)
  To: S.J.; +Cc: Btrfs BTRFS

On Wed, Dec 9, 2015 at 2:48 PM, S.J. <sorry@anonym.com> wrote:
>> 1. better practices, we really need to tell users, and documentation
>> writers, that using dd (or variant) to copy Btrfs volumes has a
>> consequence and should not be used to make copies.
>
>
>> 2. Btrfs needs a better way to make a copy of a volume when there are
>> snapshots (including even rw snapshots); e.g. permit send/receive to
>> work on rw snapshots if the fs is ro mounted; e.g. a way to do
>> "recursive" send/receive.
>
>
>> 3. Some way to fail gracefully, when there's ambiguity that cannot be
>> resolved. Once there are duplicate devs (dd or lvm snapshots, etc)
>> then there's simply no way to resolve the ambiguity automatically, and
>> the volume should just refuse to rw mount until the user resolves the
>> ambiguity. I think it's OK to fallback to ro mount (maybe) by default
>> in such a case rather than totally fail to mount.
>
>
> About 3:
> RO fallback for the second device/partitions is not good.
> It won't stop confusing the two partitions, and even if both are RO,
> thinking it's ok to read and then reading the wrong data is bad.

That isn't what I'm suggesting. In the multiple device volume case
where there are two exact (same UUID, same devid, same generation)
instances of one of the block devices, Btrfs could randomly choose
either one if it's an RO mount.

It may very well be safer to just refuse to mount it with an error
indicating the ambiguity, and suggesting the user explicitly specify
the devices to use to assemble the volume, and if the generations
differ on those chosen devices, at least warn about that also.
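
To make the "refuse on ambiguity" idea concrete, here is a minimal sketch in Python. It is not how the kernel or btrfs-progs scan devices; ScannedDevice, find_ambiguous and check_mountable are hypothetical names, and the (fsid, devid, generation) values are assumed to have been read from each device's superblock by some other means.

```python
from collections import defaultdict
from typing import NamedTuple


class ScannedDevice(NamedTuple):
    """Hypothetical summary of one block device's Btrfs superblock."""
    path: str
    fsid: str        # volume UUID
    devid: int       # device id within the volume
    generation: int  # superblock generation


def find_ambiguous(devices):
    """Group scanned devices by (fsid, devid); any group with more than
    one member is ambiguous and is returned for the user to resolve."""
    groups = defaultdict(list)
    for dev in devices:
        groups[(dev.fsid, dev.devid)].append(dev)
    return {key: devs for key, devs in groups.items() if len(devs) > 1}


def check_mountable(devices):
    """Return a list of human-readable problems; empty means no ambiguity."""
    problems = []
    for (fsid, devid), devs in sorted(find_ambiguous(devices).items()):
        paths = ", ".join(d.path for d in devs)
        msg = f"fsid {fsid} devid {devid} seen on: {paths}"
        # Differing generations among the duplicates get an extra warning.
        if len({d.generation for d in devs}) > 1:
            msg += " (generations differ)"
        problems.append(msg)
    return problems
```

A caller would refuse the mount whenever check_mountable() returns a non-empty list, print the messages, and ask the user to name the devices explicitly.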


>
> About 1 and 2 ... if 3 gets fulfilled, why?
> DD itself is not a problem "if" the UUID is changed after it
> (which is a command as simple as dd), and if someone doesn't
> know that, he/she will notice when mount refuses to work
> because UUID duplicate.

dd is not a copy operation. It's creating a 2nd original. You don't
end up with an original and a copy (or clone). A copy or clone has
some distinguishing difference. Volume UUID is used throughout Btrfs
metadata, it's not just in the superblocks. Changing volume UUID
requires a rewrite of all metadata. This is inefficient for two
reasons: one dd copies unused sectors; two it copies metadata that
will have to be completely rewritten by btrfstune to change volume
UUID; and also the subvolume UUIDs aren't changed, so it's an
incomplete solution that has problems (see other threads).

If your workflow requires making an exact copy (for the shelf or for
an emergency) then dd might be OK. But most often it's used because
it's been easy, not because it's a good practice. Note that Btrfs is
not unique, XFS v5 does a very similar thing with volume UUID as well,
and resulted in this change:
http://oss.sgi.com/pipermail/xfs/2015-April/041267.html

Using dd also means the volume is offline. For even medium sized
multiple device volumes, it's a huge penalty. dd does not scale. Using
dd means source and destination physical configurations are identical
(at least the number of devices and the data and metadata profiles)
which I may not want or need for a clone. Maybe I want a 1x6TB clone
for the 5x1TB raid5 volume.

Even for an online full volume copy/clone of a 5x1TB raid5, moving all
subvolume+snapshots to a new 3x4TB raid5 (or whatever), that could be
hundreds of subvolumes to btrfs send/receive. OK yeah script it. But
that's tedious even assuming I have a script friendly subvolume naming
convention to get the send/receive order correct, which I don't.

Anyway, I think it's a nice to have now, that'll eventually be a need.
And dd is just totally disqualified outside of very specific edge case
need.


-- 
Chris Murphy


* Re: Subvolume UUID, data corruption?
  2015-12-05 12:01     ` Subvolume UUID, data corruption? Hugo Mills
  2015-12-06  1:51       ` attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?) Christoph Anton Mitterer
@ 2015-12-11 12:33       ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 51+ messages in thread
From: Austin S. Hemmelgarn @ 2015-12-11 12:33 UTC (permalink / raw)
  To: Hugo Mills, Christoph Anton Mitterer, linux-btrfs

On 2015-12-05 07:01, Hugo Mills wrote:
> On Sat, Dec 05, 2015 at 04:28:24AM +0100, Christoph Anton Mitterer wrote:
>> On Fri, 2015-12-04 at 13:07 +0000, Hugo Mills wrote:
>>> I don't think it'll cause problems.
>> Is there any guaranteed behaviour when btrfs encounters two filesystems
>> (i.e. not talking about the subvols now) with the same UUID?
>
>     Nothing guaranteed, but the likelihood is that things will go badly
> wrong, in the sense of corrupt filesystems.
>
>> Given that it's long standing behaviour that people could clone
>> filesystems (dd, etc.) and this just worked™, btrfs should at least
>> handle such case gracefully.
>> For example, when already more than one block device with a btrfs of
>> the same UUID are known, then it should refuse to mount any of them.
>> And if one is already known and another device pops up it should refuse
>> to mount that and continue to normally use the already mounted one.
>
>     Except that that's exactly the mechanism that btrfs uses to handle
> multi-device filesystems, so you've just broken anything with more
> than one device in the FS.
>
>     If you inspect the devid on each device as well, and refuse
> duplicates of those, you've just broken any multipathing
> configurations.
This already potentially breaks multipath configurations, as well as 
dm-cache, some soft raid configurations, and probably other things as well.
>
>     Even if you can handle that, if you have two copies of dev1, and
> two copies of dev2, how do you guarantee that the "right" pair of dev1
> and dev2 is selected? (e.g. if you have them as network devices, and
> the device enumeration order is unstable on each boot).
In some cases it can be done without much effort.  Take dm-cache for 
example.  The hierarchy of devices in a dm-cache device looks like this:
cached-device
+ backing-device
+ cache-pool
   + pool-storage
   + pool-metadata

At a minimum, the cached device and the backing device contain identical 
data (the cached-device just has a writeback or writethrough cache on 
it), and the pool storage device may under some circumstances look like 
a BTRFS filesystem as well.  In this case, it's pretty obvious that the 
only device that BTRFS should be accessing is the cached device, not the 
backing device or the pool storage device.  For this, if we simply 
blacklist all devices that are themselves components in device-mapper 
tables, then we avoid the issue here, and possibly in some other as of 
yet undiscovered cases.
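
A rough sketch of that blacklist idea, in Python and from userspace rather than in the kernel: as far as I know, Linux exposes a "holders" directory under /sys/class/block/<name>/ that lists the devices stacked on top of <name> (dm, md, and so on). The function names below are made up for illustration, and the sysfs root is a parameter only so the logic can be tried against a fake directory tree.

```python
import os


def is_stacked_component(name, sysfs_root="/sys/class/block"):
    """Return True if another block device (dm, md, ...) is stacked on
    top of the device `name`, i.e. its sysfs `holders` directory is
    non-empty. Unknown devices are treated as not stacked."""
    holders = os.path.join(sysfs_root, name, "holders")
    try:
        return len(os.listdir(holders)) > 0
    except FileNotFoundError:
        return False


def scan_candidates(names, sysfs_root="/sys/class/block"):
    """Keep only devices that are not components of a stacked device."""
    return [n for n in names if not is_stacked_component(n, sysfs_root)]
```

In the dm-cache layout above, the backing device and the pool devices would have the cached device among their holders and be skipped, while the cached device itself has no holders and stays a candidate.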


* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-09 21:48             ` S.J.
  2015-12-10 12:08               ` Austin S Hemmelgarn
  2015-12-10 19:42               ` Chris Murphy
@ 2015-12-11 22:06               ` Christoph Anton Mitterer
  2 siblings, 0 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-11 22:06 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Qu Wenruo, S.J.

On Wed, 2015-12-09 at 22:48 +0100, S.J. wrote:
> > 3. Some way to fail gracefully, when there's ambiguity that cannot
> > be
> > resolved. Once there are duplicate devs (dd or lvm snapshots, etc)
> > then there's simply no way to resolve the ambiguity automatically,
> > and
> > the volume should just refuse to rw mount until the user resolves
> > the
> > ambiguity. I think it's OK to fallback to ro mount (maybe) by
> > default
> > in such a case rather than totally fail to mount.
> About 3:
> RO fallback for the second device/partitions is not good.
> It won't stop confusing the two partitions, and even if both are RO,
> thinking it's ok to read and then reading the wrong data is bad.
Adding my two cents about that, just to emphasise it, even though S.J.
already covered it:

Even ro mounts, if anything is ambiguous, are evil:
Even if the filesystem itself wouldn't be destroyed by that, it could
mean that bogus data (or even evil data by an attacker) shows up in the
system that is then used and causes damage by being used.

In the "accidental" scenario, data from the wrong device could e.g.
contain outdated binaries, that still have security holes, or they
could contain lists of datasets to be deleted by some software, but
since being outdated or simply garbage, the wrong data could be
deleted.

In the "attacker" scenario,... well again as above, old binaries could
get used, or garbage data injected into the system (even if ro) could
make it compromised or be used for DoS.




In general, the longer I think about it, the more I come to the
conclusion that any form of auto activation (mounting, assembling,
rebuilding, etc.) is kind of dangerous... (see below)

And this applies in general, not just when using UUIDs,... but since in
btrfs UUIDs are the main criterion for selecting/auto-assembling these
devices, it's what applies for us here.

We have several stages, where wrong devices could be picked up and lead
to damage (either accidentally or as part of a tricky attack):
1) When the system boots, i.e. replacing parts of the system (e.g. 
   root fs) itself.
   There's little we can do here in general (regardless of UUID,
   labels or device=/dev/sda,/dev/sdb). If an attacker can exchange
   one of the devices, he may do evil things.
   That's bad of course, but I think "fixing" it, is beyond the scope
   of btrfs.
   - If e.g. the ATM has an unsecured BIOS/UEFI/bootloader and allows 
     the attacker to easily access these and select which device to 
     boot from,... well then I don't feel sorry for the owner (their 
     fault).
   - If they configure their grub/initrd/etc. to boot by LABEL/UUID... 
     well that's certainly handy, but it's also stupid if these boots 
     happen unattended, and there is a way around it (specify the 
     device paths, e.g. /dev/sda)... if the HDDs are properly
     secured by steel, and an attacker cannot use the possibly more
     easily accessible USB bus.
   - Another way to partially help here is: use disk dm-crypt and 
     boot/assemble your system based on the dm-crypt devices.
     E.g. boot from the multi-device-btrfs 
     device=/dev/mapper/crypt1,/dev/mapper/crypt2 and so on.
     As long as the kernel and initrd (which does all that) are secure 
     (which is assumed here), then even when the attacker manages to 
     replace one of the devices, it wouldn't help him, as he 
     couldn't present a device for which a dm-crypt mapping can be set 
     up (unless he has the keys, but then game's over anyway)

=> Long story short, if the system boots unattended, then people
   should not use UUID/LABEL to select the device; if they do, it's 
   their fault, not in btrfs' scope.
   If boots are attended, there's no problem anyway.
=> IMHO, this conceptually "fixes" (in the sense that there's nothing
   to do specifically from the btrfs side) the possible problems of
   such a system being booted with an attacker having replaced or 
   added some devices to it (especially when unattended).
   And also the situation that such a system was left behind in an
   incomplete multi-device state (i.e. left unattended with a
   degraded RAID).


In other words, I think any problems resulting from
auto-assembly/activation/mounting based on UUIDs/device-scanning/etc.
that affect the valid system coming up (i.e. booting) are beyond our
scope here.
Yes there are problems, but one can at least try to avoid them by
using dm-crypt or device paths instead of LABELs/UUIDs, and properly
securing (i.e. steel and so on) the system disks, mainboard, BIOS, etc.


So the remaining issues are those we discussed already before:
The system runs already.
1) Further devices show up with colliding UUIDs / device IDs.
   a) Either none of them are used (mounted, fsck, etc.) already.
   b) Or     some of them are used (mounted, fsck, etc.) already.
2) Further devices show up that have no UUID / device ID collisions,
   but that may fit to an already used multi-device btrfs.
   E.g. in the sense of: I have a degraded RAID1 btrfs which my system
   runs upon. A new device shows up that would fit to that btrfs.

(1) we already discussed:
Effects:
- it leads to data corruption
- attackers may use it to cause damage or even get out data
Possible solutions:
If such situations occur:
- In case (a) refuse to do anything (mount, fsck, anything else from
  the btrfs tools) unless the user specified the devices to be used
  manually (i.e. device=/dev/sda,/dev/sdb), perhaps even checking
  whether the given value may accidentally be a UUID or label path,
  e.g. /dev/disk/by-uuid/*
- In case (b), continue to use the already used/active/assembled 
  devices (because we must assume they actually belong together),
  refuse to do anything (including mounting, adding to a multi-device
  fs, starting rebuild, etc. pp.) with the others unless the user
  manually says so via device=foo,bar,baz
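
Just to illustrate the proposed policy for cases (a) and (b), a small Python sketch. Nothing like this exists in btrfs today as far as I know; allowed_devices and is_id_based_path are hypothetical names, and "user_specified" stands in for whatever a device=... option would carry.

```python
def is_id_based_path(path):
    """True for paths that are themselves derived from a UUID or label,
    which would defeat the purpose of naming devices explicitly."""
    return path.startswith(("/dev/disk/by-uuid/", "/dev/disk/by-label/"))


def allowed_devices(colliding, in_use, user_specified=None):
    """Decide which of a set of colliding device paths may be used.

    colliding:      paths sharing the same UUID / device ID
    in_use:         subset of `colliding` already active (mounted etc.)
    user_specified: paths named explicitly by the user, if any
    """
    colliding = set(colliding)
    in_use = set(in_use) & colliding
    explicit = {p for p in (user_specified or ())
                if p in colliding and not is_id_based_path(p)}
    if in_use:
        # Case (b): keep using what is already active; touch the others
        # only if the user named them explicitly.
        return in_use | explicit
    # Case (a): nothing is active yet, so use only what the user named.
    return explicit
```

With no explicit list and nothing in use, the result is the empty set, i.e. refuse to do anything.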

(2) is similar to (1), but I think we haven't discussed it already in depth.
The effects here are the same as above (i.e. accidental data corruption, or possible attacks), but here they would happen if btrfs would ever automatically assemble/add devices to an already active (possibly degraded) fs.
Examples:
- I have a degraded RAID6, one disk missing, the system is e.g.
  unattended and an attacker can plug in a USB stick with IDs that
  just match perfectly.
  If btrfs would then start to automatically add that newly appeared
  device to the fs, being happy about the fact that it can now start
  to rebuild, we'd have a problem.
  In that example, because the attacker may use that to get data out of
  the system.
  Take the same example without an attacker: a sysadmin may just
  accidentally plug in the wrong HDD, one that should actually serve
  as a backup... it would start to get written to (this is why many HW
  RAID controllers have auto-activation disabled).
- One has a *non-degraded* RAID1, and an attacker manages to plug a
  device with matching IDs...
  If then btrfs would be happy about being able to enlarge the RAID to
  one more device, and automatically start to use that new device,
  perhaps even starting a balance, then same problem as above.
Possible solutions:
Long story short, never do auto-assemblies (i.e. add to an already active fs) in multi-device scenarios.
That is, don't do it per default.
I'd be fine if it was an option, e.g. a kernel parameter or whatever that enables btrfs to do such auto-assemblies, and if the documentation clearly explains the possible issues (especially security issues) implied by it... but it shouldn't be the default.


- IMHO, a fs should be secure by default, thus I think, adding devices to an already active fs (e.g. for rebuild), should never happen (by default) automatically.

- But perhaps it would be useful to have one additional option, which generally disables that (i.e. not just in case of already active devices).
That option would make it mandatory in *all* cases, that the user specifies device=/dev/foo,/dev/bar.
That behaviour may be preferred for some special use cases, and having a true option for it, may be better than just trying to get it by removing any udev scripts or so (which may get accidentally added back by the distro).





> PS: Kudos to C.A. Mitterer for discovering that problem
Thanks, guess I have a knack for thinking about such "higher-level"
attacks,... unfortunately in most cases people aren't as open
about it as here :-/


Cheers,
Chris.


* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-10 19:42               ` Chris Murphy
@ 2015-12-11 22:21                 ` Christoph Anton Mitterer
  2015-12-11 22:32                   ` Christoph Anton Mitterer
                                     ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-11 22:21 UTC (permalink / raw)
  To: Chris Murphy, S.J.; +Cc: Btrfs BTRFS

On Thu, 2015-12-10 at 12:42 -0700, Chris Murphy wrote:
> That isn't what I'm suggesting. In the multiple device volume case
> where there are two exact (same UUID, same devid, same generation)
> instances of one of the block devices, Btrfs could randomly choose
> either one if it's an RO mount.
No, for the same reasons as just stated in my mail a few minutes ago.
An attacker could probably find out the UUID/devid/generation... it
would probably be possible for him to craft a device with exactly those
and try to use it.
If btrfs would then select any of these, it may also select the wrong
one - ro or rw, this may well lead to problems.




> > About 1 and 2 ... if 3 gets fulfilled, why?
> > DD itself is not a problem "if" the UUID is changed after it
> > (which is a command as simple as dd), and if someone doesn't
> > know that, he/she will notice when mount refuses to work
> > because UUID duplicate.
> 
> dd is not a copy operation. It's creating a 2nd original. You don't
> end up with an original and a copy (or clone). A copy or clone has
> some distinguishing difference. Volume UUID is used throughout Btrfs
> metadata, it's not just in the superblocks. Changing volume UUID
> requires a rewrite of all metadata. This is inefficient for two
> reasons: one dd copies unused sectors; two it copies metadata that
> will have to be completely rewritten by btrfstune to change volume
> UUID; and also the subvolume UUIDs aren't changed, so it's an
> incomplete solution that has problems (see other threads).
Well dd is surely not the only thing that can be used to create a clone
(i.e. a bitwise identical copy - I guess we don't really care which is
the "original" and which are the "clones", or whether these are "2nd
originals").
We always just use it here as an example for scenarios in which bitwise
identical copies are created.

And even if internally it's a big thing, from the user's PoV, changing
the UUID is pretty simple (I guess that's what S.J. meant).


> If your workflow requires making an exact copy (for the shelf or for
> an emergency) then dd might be OK. But most often it's used because
> it's been easy, not because it's a good practice.
Ufff.. I wouldn't go that far to call something here bad or good
practice.
At least, I do not see any reason to call it a bad practice, except
that systems got much more complex over time and haven't dealt properly
with the problems that can occur by using dd.
Again, I don't demand magical "solutions" (i.e. the btrfs or LVM people
getting code into all dd like tools, so that these auto-detect when they
duplicate such data and auto-change the UUIDs)... they just should
handle the situations gracefully.


>  Note that Btrfs is
> not unique, XFS v5 does a very similar thing with volume UUID as
> well,
> and resulted in this change:
> http://oss.sgi.com/pipermail/xfs/2015-April/041267.html
Do you mean that xfs may suffer from the same issues that we're talking
about here? If so, one should probably give them a notice.



> Using dd also means the volume is offline.
Not really, you could do it on a snapshotted LV, while the "original"
is still running.
Or in emergency cases one could do it on a ro-remounted fs... probably
not guaranteed to work, but may do so in practice.


Cheers,
Chris.


* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-11 22:21                 ` Christoph Anton Mitterer
@ 2015-12-11 22:32                   ` Christoph Anton Mitterer
  2015-12-11 23:06                   ` Chris Murphy
  2015-12-11 23:14                   ` Eric Sandeen
  2 siblings, 0 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-11 22:32 UTC (permalink / raw)
  To: Chris Murphy, S.J.; +Cc: Btrfs BTRFS

Sorry, I'm just about to change my mail system, and used a bogus test
From: address in the previous mail (please replace fo@fo with
calestyo@scientia.net).

Apologies for any inconveniences and this noise here.

Cheers,
Chris.


* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-11 22:21                 ` Christoph Anton Mitterer
  2015-12-11 22:32                   ` Christoph Anton Mitterer
@ 2015-12-11 23:06                   ` Chris Murphy
  2015-12-12  1:34                     ` S.J.
  2015-12-14  0:27                     ` Christoph Anton Mitterer
  2015-12-11 23:14                   ` Eric Sandeen
  2 siblings, 2 replies; 51+ messages in thread
From: Chris Murphy @ 2015-12-11 23:06 UTC (permalink / raw)
  To: Btrfs BTRFS

On Fri, Dec 11, 2015 at 3:21 PM, Christoph Anton Mitterer <fo@fo> wrote:
> On Thu, 2015-12-10 at 12:42 -0700, Chris Murphy wrote:
>> That isn't what I'm suggesting. In the multiple device volume case
>> where there are two exact (same UUID, same devid, same generation)
>> instances of one of the block devices, Btrfs could randomly choose
>> either one if it's an RO mount.
> No, for the same reasons as just stated in my mail few minutes ago.
> An attacker could probably find out the UUID/devid/generation... it
> would probably be possible for him to craft a device with exactly those
> and try to use it.

For anything but a new and empty Btrfs volume, this hypothetical
attack would be a ton easier to do on LVM and mdadm raid because they
have a tiny amount of metadata to spoof compared to a Btrfs volume
with even a little bit of data on it. I think this concern is
overblown.



> If then btrfs would select any of these, it may also select the wrong
> one - ro or rw, this may likely lead to problems.





>> dd is not a copy operation. It's creating a 2nd original. You don't
>> end up with an original and a copy (or clone). A copy or clone has
>> some distinguishing difference. Volume UUID is used throughout Btrfs
>> metadata, it's not just in the superblocks. Changing volume UUID
>> requires a rewrite of all metadata. This is inefficient for two
>> reasons: one dd copies unused sectors; two it copies metadata that
>> will have to be completely rewritten by btrfstune to change volume
>> UUID; and also the subvolume UUIDs aren't changed, so it's an
>> incomplete solution that has problems (see other threads).
> Well dd is surely not the only thing that can be used to create a clone
> (i.e. a bitwise identical copy - I guess we don't really care which is
> the "original" and which are the "clones", or whether these are "2nd
> originals).
> We always just use it here as an example for scenarios in which bitwise
> identical copies are created.

I'm suggesting bitwise identical copies being created is not what is
wanted most of the time, except in edge cases.





>
> And even if internally it's a big thing, from the user's PoV, changing
> the UUID is pretty simple (I guess that's what S.J. meant).
>
>
>> If your workflow requires making an exact copy (for the shelf or for
>> an emergency) then dd might be OK. But most often it's used because
>> it's been easy, not because it's a good practice.
> Ufff.. I wouldn't go that far to call something here bad or good
> practice.

It's not just bad practice, it's sufficiently sloppy that it's very
nearly user sabotage. That this is due to innocent ignorance, and a
long standing practice that's bad advice being handed down from
previous generations doesn't absolve the practice and mean we should
invent esoteric work arounds for what is not a good practice. We have
all sorts of exhibits why it's not a good idea.


> At least, I do not see any reason to call it a bad practice, except
> that systems got over time much more complex and haven't dealt properly
> with the problems that can occur by using dd.

The lack of maturity in tools to make it just as easy, or easier, and
faster, to make a *data* bitwise identical copy, that preserves the
intent and integrity of UUID by ensuring there aren't duplicates of
them floating around, as well as profile reshaping on the fly, as well
as a means to account for format changes, etc is a completely
reasonable excuse for continuing to use dd - but it's still suboptimal
which is what I mean by bad idea.


> Again, I don't demand magical "solutions" (i.e. the btrfs or LVM people
> getting code into all dd like tools, so that these auto-detect when they
> duplicate such data and auto-change the UUIDs)... they just should
> handle the situations gracefully.

I disagree. It was due to the rudimentary nature of earlier
filesystems' metadata paradigm that it worked. That's no longer the
case.

Sure, the kernel code should get smarter about refusing to mount in
ambiguous cases, so that a file system isn't nerfed. That shouldn't
happen. But we also need to get away from this idea that dd is
actually an appropriate tool for making a file system copy.



>
>
>>  Note that Btrfs is
>> not unique, XFS v5 does a very similar thing with volume UUID as
>> well,
>> and resulted in this change:
>> http://oss.sgi.com/pipermail/xfs/2015-April/041267.html
> Do you mean that xfs may suffer from the same issues that we're talking
> about here? If so, one should probably give them a notice.

They're aware, that's why xfs_db had the option to change the UUID in
the first place. And the XFS kernel code knows not to mount a 2nd
instance of a volume UUID. But it doesn't support multiple devices, so
it's nowhere near as prone to problems in this area. If you're using
LVM snapshots, the duplicate UUID problem certainly comes up. While
there is a 'nouuid' mount option for XFS, I have no idea what problems
this might cause for V5 filesystems.




-- 
Chris Murphy


* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-11 22:21                 ` Christoph Anton Mitterer
  2015-12-11 22:32                   ` Christoph Anton Mitterer
  2015-12-11 23:06                   ` Chris Murphy
@ 2015-12-11 23:14                   ` Eric Sandeen
  2 siblings, 0 replies; 51+ messages in thread
From: Eric Sandeen @ 2015-12-11 23:14 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Chris Murphy, S.J.; +Cc: Btrfs BTRFS

On 12/11/15 4:21 PM, Christoph Anton Mitterer wrote:
>> Note that Btrfs is
>> not unique, XFS v5 does a very similar thing with volume UUID as
>> well,
>> and resulted in this change:
>> http://oss.sgi.com/pipermail/xfs/2015-April/041267.html
> Do you mean that xfs may suffer from the same issues that we're talking
> about here? If so, one should probably give them a notice.

That was disabled temporarily because changing the fs UUID meant that
every piece of checksummed metadata with an embedded UUID would then
mismatch.

It was fixed (re-allowed) with

ce748ea xfs: create new metadata UUID field and incompat flag

in the kernel and

9c4e12f xfsprogs: Add new sb_meta_uuid field, update userspace tools to manipulate it

in xfsprogs. 
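
For readers who want the idea rather than the commits, here is a toy model in Python of how a separate metadata UUID avoids rewriting every block. This is only my simplified reading of the approach, not the XFS on-disk format or code; ToySuperblock, ToyFs and their fields are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToySuperblock:
    uuid: str                        # user-visible filesystem UUID
    meta_uuid: Optional[str] = None  # set only once uuid has been changed

    def metadata_uuid(self):
        """UUID that metadata blocks are expected to carry."""
        return self.meta_uuid if self.meta_uuid is not None else self.uuid


@dataclass
class ToyFs:
    sb: ToySuperblock
    block_uuids: List[str] = field(default_factory=list)

    def add_block(self):
        # New metadata blocks are stamped with the metadata UUID.
        self.block_uuids.append(self.sb.metadata_uuid())

    def change_uuid(self, new_uuid):
        # Remember the original UUID once, then change only the
        # user-visible one; no metadata block is rewritten.
        if self.sb.meta_uuid is None:
            self.sb.meta_uuid = self.sb.uuid
        self.sb.uuid = new_uuid

    def verify(self):
        """Check every metadata block against the expected UUID."""
        expected = self.sb.metadata_uuid()
        return all(u == expected for u in self.block_uuids)
```

In this model, changing the UUID touches only the superblock, and blocks written before and after the change still verify against the same metadata UUID.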

-Eric


* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-11 23:06                   ` Chris Murphy
@ 2015-12-12  1:34                     ` S.J.
  2015-12-14  0:28                       ` Christoph Anton Mitterer
  2015-12-14  0:27                     ` Christoph Anton Mitterer
  1 sibling, 1 reply; 51+ messages in thread
From: S.J. @ 2015-12-12  1:34 UTC (permalink / raw)
  To: Btrfs BTRFS

A bit more about the dd-is-bad-topic:

IMHO it doesn't matter at all.

a) For this specific problem here, fixing the security problem
automatically fixes the risk of data corruption from careless
cloning+mounting (without UUID adjustments) too.
So, if the user likes to use dd with its disadvantages, like waiting
hours to copy lots of free space, and bad practice, etc. etc., why
should it concern the Btrfs developers and/or us here?

b) At a wider scope: while Btrfs is more complex than XFS etc.,
currently there is no other reason why things could go bad when dd'ing
something. As long as this holds, is there really a place in the
official Btrfs documentation for telling the users "dd is bad
[practice]"?
...




* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-11 23:06                   ` Chris Murphy
  2015-12-12  1:34                     ` S.J.
@ 2015-12-14  0:27                     ` Christoph Anton Mitterer
  2015-12-14 13:23                       ` Austin S. Hemmelgarn
  2015-12-14 20:55                       ` Chris Murphy
  1 sibling, 2 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-14  0:27 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS

On Fri, 2015-12-11 at 16:06 -0700, Chris Murphy wrote:
> For anything but a new and empty Btrfs volume
What's the influence of the fs being new/empty?

> this hypothetical
> attack would be a ton easier to do on LVM and mdadm raid because they
> have a tiny amount of metadata to spoof compared to a Btrfs volume
> with even a little bit of data on it.
Uhm I haven't said that other systems properly handle this kind of
attack. ;-)
Guess that would need to be evaluated...


>  I think this concern is overblown.
I don't think so. Let me give you an example: There is an attack[0]
against crypto, where the attacker listens via a smartphone's
microphone, and based on the acoustics of a computer where gnupg runs.
This is surely not an attack many people would have considered even
remotely possible, but in fact it works, at least under lab conditions.

I guess the same applies for possible attack vectors like this one.
The stronger actual crypto gets, and the stronger software gets in
terms of classical security holes (buffer overruns and so on), the more
attackers will try to go alternative ways.


> I'm suggesting bitwise identical copies being created is not what is
> wanted most of the time, except in edge cases.
mhh,.. well there's the VM case, e.g. duplicating a template VM,
booting it, deploying software. Guess that's already common enough.
There are people who want to use btrfs on top of LVM and using the
snapshot functionality of that... another use case.
Some people may want to use it on top of MD (for whatever reason)... at
least in the mirroring RAID case, the kernel would see the same btrfs
twice.

Apart from that, btrfs should be a general purpose fs, and not just a
desktop or server fs.
So edge cases like forensics (where it's common that you create bitwise
identical images) shouldn't be forgotten either.


> > >If your workflow requires making an exact copy (for the shelf or
> > > for
> > > an emergency) then dd might be OK. But most often it's used
> > > because
> > > it's been easy, not because it's a good practice.
> > Ufff.. I wouldn't got that far to call something here bad or good
> > practice.
> 
> It's not just bad practice, it's sufficiently sloppy that it's very
> nearly user sabotage. That this is due to innocent ignorance, and a
> long standing practice that's bad advice being handed down from
> previous generations doesn't absolve the practice and mean we should
> invent esoteric work arounds for what is not a good practice. We have
> all sorts of exhibits why it's not a good idea.
Well if you don't give any real arguments or technical reasons (apart
from "working around software that doesn't handle this well") I
consider this just repetition of the baseless claim that long standing
practise would be bad.


> I disagree. It was due to the rudimentary nature of earlier
> filesystems' metadata paradigm that it worked. That's no longer the
> case.
Well in the end it's of course up to the developers to decide whether
this is acceptable or not, but being on the admin/end-user side, I can
at least say that not everyone on there would accept "this is no longer
the case" as valid explanation when their fs was corrupted or attacked.


> Sure, the kernel code should get smarter about refusing to mount in
> ambiguous cases, so that a file system isn't nerfed. That shouldn't
> happen. But we also need to get away from this idea that dd is
> actually an appropriate tool for making a file system copy.
Uhm... your view is a bit narrow-sighted... again take the forensics
example.

But apart from that,... I never said that dd should be the regular tool
for people to copy a btrfs image. Typically it would be simply slower
than other means.

But for some solutions, it may still be the better choice, or at least
the only choice implemented right now (e.g. I wouldn't know of a
hypervisor system that looks at an existing disk image, finds any
btrfs in it (possibly "hidden" below further block layers), and
cleanly copies the data into a freshly created btrfs image with the same
structure).
AFAIK, there's not even a solution right now, that copies a complete
btrfs, with snapshots, etc. preserving all ref-links. At least nothing
official that works in one command.

Long story short, I think we can agree that - dd or not - corruptions
or attack vectors shouldn't be possible,
be it just to protect against the case of btrfs on a hardware RAID1
that is accidentally switched to JBOD mode...


Cheers,
Chris.


[0] http://www.tau.ac.il/~tromer/papers/acoustic-20131218.pdf


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-12  1:34                     ` S.J.
@ 2015-12-14  0:28                       ` Christoph Anton Mitterer
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-14  0:28 UTC (permalink / raw)
  To: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 894 bytes --]

On Sat, 2015-12-12 at 02:34 +0100, S.J. wrote:
> A bit more about the dd-is-bad-topic:
> 
> IMHO it doesn't matter at all.
Yes, fully agree.


> a) For this specific problem here, fixing a security problem
> automatically
> fixes the risk of data corruption because careless cloning+mounting
> (without UUID adjustments) too.
> So, if the user likes to use dd with its disadvantages, like waiting 
> hours to
> copy lots of free space, and bad practice, etc.etc., why should it
> concern
> the Btrfs developers and/or us here?
> 
> b) At wider scope; while Btrfs is more complex than Xfs etc.,
> currently
> there is no other reason why things could go bad when dd'ing
> something.
> As long as this holds, is there really a place in the official Btrfs 
> documentation
> for telling the users "dd is bad [practice]"?
> ...
fully agree as well. :-)


Cheers,
Chris.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-14  0:27                     ` Christoph Anton Mitterer
@ 2015-12-14 13:23                       ` Austin S. Hemmelgarn
  2015-12-14 21:26                         ` Chris Murphy
  2015-12-15  0:08                         ` Christoph Anton Mitterer
  2015-12-14 20:55                       ` Chris Murphy
  1 sibling, 2 replies; 51+ messages in thread
From: Austin S. Hemmelgarn @ 2015-12-14 13:23 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Chris Murphy, Btrfs BTRFS

On 2015-12-13 19:27, Christoph Anton Mitterer wrote:
> On Fri, 2015-12-11 at 16:06 -0700, Chris Murphy wrote:
>> For anything but a new and empty Btrfs volume
> What's the influence of the fs being new/empty?
>
>> this hypothetical
>> attack would be a ton easier to do on LVM and mdadm raid because they
>> have a tiny amount of metadata to spoof compared to a Btrfs volume
>> with even a little bit of data on it.
> Uhm I haven't said that other systems properly handle this kind of
> attack. ;-)
> Guess that would need to be evaluated...
>
>
>>   I think this concern is overblown.
> I don't think so. Let me give you an example: There is an attack[0]
> against crypto, where the attacker listens via a smartphone's
> microphone, and based on the acoustics of a computer where gnupg runs.
> This is surely not an attack many people would have considered even
> remotely possible, but in fact it works, at least under lab conditions.
>
> I guess the same applies for possible attack vectors like this here.
> The stronger actual crypto and the strong software gets in terms of
> classical security holes (buffer overruns and so), the more attackers
> will try to go alternative ways.
The reason that this isn't quite as high of a concern is because 
performing this attack requires either root access, or direct physical 
access to the hardware, and in either case, your system is already 
compromised.

I still think that that isn't a sufficient excuse for not fixing the 
issue, as there are a number of non-security related issues that can 
result from this (there are some things that are common practice with 
LVM or mdraid that can't be done with BTRFS because of this).
>
>> I'm suggesting bitwise identical copies being created is not what is
>> wanted most of the time, except in edge cases.
> mhh,.. well there's the VM case, e.g. duplicating a template VM,
> booting it deploying software. Guess that's already common enough.
> There are people who want to use btrfs on top of LVM and using the
> snapshot functionality of that... another use case.
> Some people may want to use it on top of MD (for whatever reason)... at
> least in the mirroring RAID case, the kernel would see the same btrfs
> twice.
Also, using flat DM-RAID (and yes, people do use DM-RAID without LVM), 
using the DM-cache target, some multi-path setups, some shared storage 
setups, a couple of other DM targets, and probably a number of other 
things I haven't thought of yet.
>
> Apart from that, btrfs should be a general purpose fs, and not just a
> desktop or server fs.
> So edge cases like forensics (where it's common that you create bitwise
> identical images) shouldn't be forgotten either.
While I would normally agree, there are ways to work around this in the 
forensics case that don't work for any other case (namely, if BTRFS is 
built as a module, you can unmount everything, unload the module, reload 
it, and only scan the devices you want).
>
>
>>>> If your workflow requires making an exact copy (for the shelf or
>>>> for
>>>> an emergency) then dd might be OK. But most often it's used
>>>> because
>>>> it's been easy, not because it's a good practice.
>>> Ufff.. I wouldn't got that far to call something here bad or good
>>> practice.
>>
>> It's not just bad practice, it's sufficiently sloppy that it's very
>> nearly user sabotage. That this is due to innocent ignorance, and a
>> long standing practice that's bad advice being handed down from
>> previous generations doesn't absolve the practice and mean we should
>> invent esoteric work arounds for what is not a good practice. We have
>> all sorts of exhibits why it's not a good idea.
> Well if you don't give any real arguments or technical reasons (apart
> from "working around software that doesn't handle this well") I
> consider this just repetition of the baseless claim that long standing
> practise would be bad.
Agreed, if you can't substantiate _why_ it's bad practice, then you
aren't making a valid argument.  The fact that there is software that 
doesn't handle it well would say to me based on established practice 
that that software is what's broken, not common practice.

The assumption that a UUID is actually unique is an inherently flawed 
one, because it depends both on the method of generation guaranteeing 
it's unique (and none of the defined methods guarantee that), and a 
distinct absence of malicious intent.
>
>> I disagree. It was due to the rudimentary nature of earlier
>> filesystems' metadata paradigm that it worked. That's no longer the
>> case.
> Well in the end it's of course up to the developers to decide whether
> this is acceptable or not, but being on the admin/end-user side, I can
> at least say that not everyone on there would accept "this is no longer
> the case" as valid explanation when their fs was corrupted or attacked.
On that note, why exactly is it better to make the filesystem UUID such 
an integral part of the filesystem?  The other thing I'm reading out of 
this all, is that by writing a total of 64 bytes to a specific location 
in a single disk in a multi-device BTRFS filesystem, you can make the 
whole filesystem fall apart, which is absolutely absurd.
>
>> Sure, the kernel code should get smarter about refusing to mount in
>> ambiguous cases, so that a file system isn't nerfed. That shouldn't
>> happen. But we also need to get away from this idea that dd is
>> actually an appropriate tool for making a file system copy.
> Uhm... your view is a bit narrow-sighted... again take the forensics
> example.
And some recovery situations (think along the lines of no recovery disk, 
and you only have busybox or something similar to work with).
>
> But apart from that,... I never said that dd should be the regular tool
> for people to copy a btrfs image. Typically it would be simply slower
> than other means.
>
> But for some solutions, it may still be the better choice, or at least
> the only choice implemented right now (e.g. I wouldn't know of a
> hypervisor system, that looks at an existing disk image, finds any
> btrfs in that (possibly "hidden" below further block layers), and
> cleanly copies the data into freshly created btrfs image, with the same
> structure.
> AFAIK, there's not even a solution right now, that copies a complete
> btrfs, with snapshots, etc. preserving all ref-links. At least nothing
> official that works in one command.
Send-receive kind of works for that, but requires down time because the 
subvolumes all have to be read-only.  In theory, it's possible, but it 
would take a lot of work, and a lot of special case handling to 
implement properly.
>
> Long story, short, I think we can agree, that - dd or not - corruptions
> or attack vectors shouldn't be possible.
> And be it just, to protect against the btrfs on hardware RAID1 case,
> which is accidentally switched to JBOD mode...



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-14  0:27                     ` Christoph Anton Mitterer
  2015-12-14 13:23                       ` Austin S. Hemmelgarn
@ 2015-12-14 20:55                       ` Chris Murphy
  2015-12-15  0:22                         ` Christoph Anton Mitterer
  1 sibling, 1 reply; 51+ messages in thread
From: Chris Murphy @ 2015-12-14 20:55 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Chris Murphy, Btrfs BTRFS

On Sun, Dec 13, 2015 at 5:27 PM, Christoph Anton Mitterer
<calestyo@scientia.net> wrote:
> On Fri, 2015-12-11 at 16:06 -0700, Chris Murphy wrote:
>> For anything but a new and empty Btrfs volume
> What's the influence of the fs being new/empty?
>
>> this hypothetical
>> attack would be a ton easier to do on LVM and mdadm raid because they
>> have a tiny amount of metadata to spoof compared to a Btrfs volume
>> with even a little bit of data on it.
> Uhm I haven't said that other systems properly handle this kind of
> attack. ;-)
> Guess that would need to be evaluated...
>
>
>>  I think this concern is overblown.
> I don't think so. Let me give you an example: There is an attack[0]
> against crypto, where the attacker listens via a smartphone's
> microphone, and based on the acoustics of a computer where gnupg runs.
> This is surely not an attack many people would have considered even
> remotely possible, but in fact it works, at least under lab conditions.

I'm aware of this proof of concept. I'd put it, and this one, in the
realm of a targeted attack, so it's not nearly as likely as other
problems needing fixing. That doesn't mean we shouldn't understand it
better so it can be fixed. It means we should understand it before
arriving at a risk assessment, let alone conclusions.



> Apart from that, btrfs should be a general purpose fs, and not just a
> desktop or server fs.
> So edge cases like forensics (where it's common that you create bitwise
> identical images) shouldn't be forgotten either.

I didn't. I did state there are edge cases, not normal use. My
criticism of dd for copying a volume is for general purpose copying,
not edge cases.



>
>
>> > >If your workflow requires making an exact copy (for the shelf or
>> > > for
>> > > an emergency) then dd might be OK. But most often it's used
>> > > because
>> > > it's been easy, not because it's a good practice.
>> > Ufff.. I wouldn't got that far to call something here bad or good
>> > practice.
>>
>> It's not just bad practice, it's sufficiently sloppy that it's very
>> nearly user sabotage. That this is due to innocent ignorance, and a
>> long standing practice that's bad advice being handed down from
>> previous generations doesn't absolve the practice and mean we should
>> invent esoteric work arounds for what is not a good practice. We have
>> all sorts of exhibits why it's not a good idea.
> Well if you don't give any real arguments or technical reasons (apart
> from "working around software that doesn't handle this well") I
> consider this just repetition of the baseless claim that long standing
> practise would be bad.

I already have, as have others.

Does the user want cake or pie? The computer doesn't have that level
of granular information when there are two apparently bitwise
identical devices. The file system sees them both as dessert, without
other distinction. So option a is to simply fail and let the user
resolve the ambiguity. Option b is maybe to leverage btrfs check code
and find out if there's more to the story, some indication that one of
the apparently identical copies isn't really identical. But that's a
lot of work for something that probably won't happen. What's more
likely is they aren't just apparently identical, they are in fact
identical because it's an LVM snapshot or a dd copy that's making them
appear identical. That's not something btrfs can resolve alone.

Automating the distinction requires more information. If it's LVM,
possibly LVM and Btrfs could work together, where LVM LV UUID * Btrfs
volume UUID = Btrfs volume UUID' (as in a derivative), and treat it
internally with a new temporary, throwaway UUID.

If it's a raw device, I still see this as the user's problem. They
created it, they'll have to help resolve the ambiguity by yanking one
of the drives.
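
As a rough illustration of "option a" (fail and let the user resolve the
ambiguity), here is a minimal Python sketch. It is not how the kernel
implements device scanning; ScannedDevice, find_ambiguous and
check_mountable are hypothetical names, and the scan results are assumed
to be already available as (path, fsid, devid) records.

```python
from collections import defaultdict
from typing import NamedTuple


class ScannedDevice(NamedTuple):
    """Hypothetical record of what a device scan found on one block device."""
    path: str   # illustrative device path, e.g. "/dev/sdb1"
    fsid: str   # filesystem (volume) UUID read from the superblock
    devid: int  # per-device id within that filesystem


def find_ambiguous(devices):
    """Return {(fsid, devid): [paths]} for every pair claimed by more than one device."""
    seen = defaultdict(list)
    for dev in devices:
        seen[(dev.fsid, dev.devid)].append(dev.path)
    return {key: paths for key, paths in seen.items() if len(paths) > 1}


def check_mountable(devices, fsid):
    """Refuse (raise) instead of guessing when two devices claim the same identity."""
    clashes = {k: v for k, v in find_ambiguous(devices).items() if k[0] == fsid}
    if clashes:
        raise RuntimeError(f"refusing to mount {fsid}: ambiguous members {clashes}")
    return sorted(d.path for d in devices if d.fsid == fsid)
```

The point of the sketch is only that the ambiguous case is detectable
from the scan data alone; which copy is the "right" one is still left to
the user.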



> Long story, short, I think we can agree, that - dd or not - corruptions
> or attack vectors shouldn't be possible.

Yes.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-14 13:23                       ` Austin S. Hemmelgarn
@ 2015-12-14 21:26                         ` Chris Murphy
  2015-12-15  0:35                           ` Christoph Anton Mitterer
  2015-12-15 13:54                           ` Austin S. Hemmelgarn
  2015-12-15  0:08                         ` Christoph Anton Mitterer
  1 sibling, 2 replies; 51+ messages in thread
From: Chris Murphy @ 2015-12-14 21:26 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Christoph Anton Mitterer, Btrfs BTRFS

On Mon, Dec 14, 2015 at 6:23 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
>
> Agreed, if you can't substantiate _why_ it's bad practice, then you aren't
> making a valid argument.  The fact that there is software that doesn't
> handle it well would say to me based on established practice that that
> software is what's broken, not common practice.

The automobile is invented and due to the ensuing chaos, common
practice of doing whatever the F you wanted came to an end in favor of
rules of the road and traffic lights. I'm sure some people went
ballistic, but for the most part things were much better without the
brokenness of prior common practice.

So the fact we're going to have this problem with all file systems
that incorporate the volume UUID into the metadata stream, tells me
that the very rudimentary common practice of using dd needs to go
away, in general practice. I've already said data recovery (including
forensics) and sticking drives away on a shelf could be reasonable.

> The assumption that a UUID is actually unique is an inherently flawed one,
> because it depends both on the method of generation guaranteeing it's unique
> (and none of the defined methods guarantee that), and a distinct absence of
> malicious intent.

http://www.ietf.org/rfc/rfc4122.txt
"A UUID is 128 bits long, and can guarantee uniqueness across space and time."

Also see security considerations in section 6.


> On that note, why exactly is it better to make the filesystem UUID such an
> integral part of the filesystem?  The other thing I'm reading out of this
> all, is that by writing a total of 64 bytes to a specific location in a
> single disk in a multi-device BTRFS filesystem, you can make the whole
> filesystem fall apart, which is absolutely absurd.


OK maybe I'm  missing something.

1. UUID is 128 bits. So where are you getting the additional 48 bytes from?
2. The volume UUID is in every superblock, which for all practical
purposes means at least two instances of that UUID per device.

Are you saying the file system falls apart when changing just one of
those volume UUIDs in one superblock? And how does it fall apart? I'd
say all volume UUID instances (each superblock, on every device)
should be checked and if any of them mismatch then fail to mount.
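
For reference, a small Python sketch of where the 16-byte volume UUID
sits in a superblock. The offsets are my understanding of the on-disk
format (primary superblock at 64 KiB, a 32-byte checksum field followed
by the fsid, magic "_BHRfS_M" at byte 0x40) and should be checked against
the btrfs documentation before relying on them; read_fsid is a
hypothetical helper that works on an in-memory image, not on a live
device.

```python
import uuid

SUPERBLOCK_OFFSET = 0x10000  # primary superblock at 64 KiB (assumed layout)
SUPERBLOCK_SIZE = 4096
FSID_OFFSET = 0x20           # fsid follows the 32-byte checksum field
MAGIC_OFFSET = 0x40
BTRFS_MAGIC = b"_BHRfS_M"


def read_fsid(image: bytes) -> uuid.UUID:
    """Extract the volume UUID from the primary superblock of a raw image."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE]
    if sb[MAGIC_OFFSET:MAGIC_OFFSET + 8] != BTRFS_MAGIC:
        raise ValueError("no btrfs superblock magic found")
    # A UUID is 128 bits, i.e. exactly 16 bytes.
    return uuid.UUID(bytes=sb[FSID_OFFSET:FSID_OFFSET + 16])
```

Comparing the value returned for each superblock copy on each device
would be one way to implement the "check all instances, fail on
mismatch" idea.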

There could be some leveraging of the device WWN, or absent that its
serial number, propagated into all of the volume's devices (cross
referencing each other's devid to WWN or serial). And then that way
there's a way to differentiate. In the dd case, there would be
mismatching real device WWN/serial number and the one written in
metadata on all drives, including the copy. This doesn't say what
policy should happen next, just that at least it's known there's a
mismatch.
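
To make the idea concrete, a purely hypothetical Python sketch: it
assumes the metadata carried a devid -> WWN/serial table (which, as far
as I know, Btrfs does not store today) and compares it with what the
hardware actually reports. find_identity_mismatches and both mappings
are illustrative names only.

```python
def find_identity_mismatches(recorded, observed):
    """Compare hardware identifiers recorded in metadata with the real ones.

    recorded: devid -> WWN/serial as (hypothetically) written into metadata
    observed: devid -> WWN/serial reported by the device currently attached
    Returns {devid: (recorded_id, observed_id)} for every devid that differs.
    """
    mismatches = {}
    for devid, expected in recorded.items():
        actual = observed.get(devid)  # None if the device is missing
        if actual != expected:
            mismatches[devid] = (expected, actual)
    return mismatches
```

A dd copy onto a different drive would show up here as a mismatch for
that devid; the sketch deliberately says nothing about policy.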


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-14 13:23                       ` Austin S. Hemmelgarn
  2015-12-14 21:26                         ` Chris Murphy
@ 2015-12-15  0:08                         ` Christoph Anton Mitterer
  2015-12-15 14:19                           ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-15  0:08 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 4049 bytes --]

On Mon, 2015-12-14 at 08:23 -0500, Austin S. Hemmelgarn wrote:
> The reason that this isn't quite as high of a concern is because
> performing this attack requires either root access, or direct
> physical 
> access to the hardware, and in either case, your system is already 
> compromised.
Not necessarily.
Apart from the ATM image (where most people wouldn't call it
compromised just because it's openly accessible on the street),
imagine you're running a VM hosting service where you allow users to
upload images and have them deployed.
In the "cheap" case these will end up as regular files, where they
couldn't do any harm (even with colliding UUIDs)... but even there one
would have to expect that the hypervisor admin may losetup them for
whichever reason.
But if you offer more professional services, you may give your clients
e.g. direct access to some storage backend, whose devices are then probably
also seen on the host by its kernel.
And here we already have the case that a client could remotely trigger
such a collision.

And remember, things only sound far-fetched until they actually happen
the first time ;)


> I still think that that isn't a sufficient excuse for not fixing the 
> issue, as there are a number of non-security related issues that can 
> result from this (there are some things that are common practice with
> LVM or mdraid that can't be done with BTRFS because of this).
Sure, I guess we agree on that,...


> > Apart from that, btrfs should be a general purpose fs, and not just
> > a
> > desktop or server fs.
> > So edge cases like forensics (where it's common that you create
> > bitwise
> > identical images) shouldn't be forgotten either.
> While I would normally agree, there are ways to work around this in
> the 
> forensics case that don't work for any other case (namely, if BTRFS
> is 
> built as a module, you can unmount everything, unload the module,
> reload 
> it, and only scan the devices you want).
see below (*)


> On that note, why exactly is it better to make the filesystem UUID
> such 
> an integral part of the filesystem?
Well, I think it's a proper way to e.g. handle the multi-device case.
You have n devices and you want to tell them apart,... using a pseudo-random
UUID is surely better than giving them numbers.
Same for the fs UUID, e.g. when used for mounting devices whose paths
aren't stable.

As said before, using the UUID isn't the problem - not protecting
against collisions is.


> The other thing I'm reading out of 
> this all, is that by writing a total of 64 bytes to a specific
> location 
> in a single disk in a multi-device BTRFS filesystem, you can make the
> whole filesystem fall apart, which is absolutely absurd.
Well,... I don't think that writing *into* the filesystem is covered by
common practise anymore.

In UNIX, a device (which holds the filesystem) is a file. Therefore one
can argue: if one copies/duplicates one file (i.e. the fs) neither of
the two's contents should get corrupted.
But if you actively write *into* the file yourself,... then you're
simply on your own: either you know what you're doing, or you may just
corrupt *that* specific file. Of course, it should again not lead to any
of its clones becoming corrupted as well.



> And some recovery situations (think along the lines of no recovery
> disk, 
> and you only have busybox or something similar to work with).
(*) which is, however, also why you may not be able to unmount the
device or unload btrfs.
Maybe you have reasons why you must/want to do the forensics in the running
system.


> > AFAIK, there's not even a solution right now, that copies a
> > complete
> > btrfs, with snapshots, etc. preserving all ref-links. At least
> > nothing
> > official that works in one command.
> Send-receive kind of works for that
I added the "in one command" because of that... O:-)
If the btrfs has subvols/snapshots... the user would need
to do the recursion himself...
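
Just to illustrate what "doing the recursion oneself" amounts to, a
minimal Python sketch that only builds the command lines (it runs
nothing). plan_send_receive and the paths are hypothetical; the
commands assume the usual "btrfs subvolume snapshot -r" and
"btrfs send | btrfs receive" usage. Note that plain full sends like
these do not preserve extent sharing between the subvolumes; that
would need the parent/clone-source options of btrfs send.

```python
import shlex


def plan_send_receive(subvolumes, snapshot_dir, destination):
    """Build shell commands to copy each subvolume via a read-only snapshot."""
    commands = []
    for subvol in subvolumes:
        # Derive a flat snapshot name from the subvolume path (illustrative scheme).
        name = subvol.strip("/").replace("/", "_") or "root"
        snap = f"{snapshot_dir.rstrip('/')}/{name}.ro"
        # send requires a read-only source, hence the -r snapshot first.
        commands.append(
            f"btrfs subvolume snapshot -r {shlex.quote(subvol)} {shlex.quote(snap)}"
        )
        commands.append(
            f"btrfs send {shlex.quote(snap)} | btrfs receive {shlex.quote(destination)}"
        )
    return commands
```

Snapshotting first avoids having to set the live subvolumes read-only,
at the cost of the copy reflecting the moment of the snapshot.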


Cheers,
Chris.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-14 20:55                       ` Chris Murphy
@ 2015-12-15  0:22                         ` Christoph Anton Mitterer
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-15  0:22 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 3604 bytes --]

On Mon, 2015-12-14 at 13:55 -0700, Chris Murphy wrote:
> I'm aware of this proof of concept. I'd put it, and this one, in the
> realm of a targeted attack, so it's not nearly as likely as other
> problems needing fixing. That doesn't mean don't understand it better
> so it can be fixed. It means understand before arriving at risk
> assessment let alone conclusions.
Assessing the actual risk of any such attack vector is IMHO quite
difficult... but at least past experience has shown countless times
that any system in which people already saw potential issues
was sooner or later actively attacked.

Take all the things from online banking... TAN, iTAN... at some point
came two-factor auth via mobileTAN, where some people already warned that
this would be rather easy to attack... banks and proponents of the
system said that this is rather unrealistic in practice.
I think in Germany alone we had some 8 million Euros that were stolen
by hacking mTANs last year.


> I didn't. I did state there are edge cases, not normal use. My
> criticism of dd for copying a volume is for general purpose copying,
> not edge cases.
Sure... but I guess we've never needed to argue about that.
If a howto were to be written on "how to best copy a btrfs filesystem"
and someone said "me! take dd"... I'd surely be on your side,
saying "Naaahh... stupid... you copy empty blocks and the like".

But here we talk about something completely different... namely all
those cases where UUID collisions could happen, including those where a
bit-identical copy is, for whichever reason, the best solution.



> I already have, as have others.
So far you've only said it would be bad practise as it wouldn't work
well with filesystems that do use UUIDs.
I agree with what Austin gave you as an answer upon that.


> Does the user want cake or pie? The computer doesn't have that level
> of granular information when there are two apparently bitwise
> identical devices.
I'm quite sure the computer has some concept of device path, and UUID
isn't the only way to identify a device. If that were so, then any
cloned ext4 would suffer from corruptions as well, as the fs would
choose the device based on UUID.

btrfs does of course more, especially in the multi-device case,...
where it needs to distinguish devices based on their content, not on their
path (which may be unstable).
But such a case can surely be detected, and as you said yourself below:

> So option a is to simply fail and let the user
> resolve the ambiguity.
... one could e.g. simply require the user to resolve the situation
manually.
And I guess that's exactly what I've written here several times in this
thread, for mounting situations, for rebuild/fsck/repair/etc.
situations.


>  Option b is maybe to leverage btrfs check code
> and find out if there's more to the story, some indication that one
> of
> the apparently identical copies isn't really identical.
Can't believe that this would be possible... if they're bitwise
identical, they're bitwise identical; the only thing that distinguishes them
is how they're connected, e.g. USB port 1, SATA port 2, etc.
But as this is unstable (just swap two SATA disks), it cannot be used.


> That's not something btrfs can resolve alone.
Sure, I've never demanded that.
I always said "handle it gracefully" (i.e. no corruptions, no new
mounts, fsck's, etc.), require the user to manually sort out things.
Not automagically determine which of the devices are actually the right
ones and use them.


Cheers,
Chris.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-14 21:26                         ` Chris Murphy
@ 2015-12-15  0:35                           ` Christoph Anton Mitterer
  2015-12-15 13:54                           ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-15  0:35 UTC (permalink / raw)
  To: Chris Murphy, Austin S. Hemmelgarn; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 2133 bytes --]

On Mon, 2015-12-14 at 14:26 -0700, Chris Murphy wrote:
> The automobile is invented and due to the ensuing chaos, common
> practice of doing whatever the F you wanted came to an end in favor
> of
> rules of the road and traffic lights. I'm sure some people went
> ballistic, but for the most part things were much better without the
> brokenness or prior common practice.
Okay, then take your road traffic example and apply it to filesystems.

In road traffic you have rules, e.g. pedestrians may cross the road
when their light shows green and that of the cars red.
That could be the rule, similar to "don't have duplicate UUIDs with
btrfs".

Even though we have the rule (cars stop at red, pedestrians walk at green),
we still teach our kids: "look both ways on the road, only cross if
there's no car (or tank or whatever ;) ) crossing".
Applying that to filesystems would be: "hope that everyone plays by the
rules, but don't kill yourself if one doesn't (and there are duplicate
IDs)".

 
> So the fact we're going to have this problem with all file systems
> that incorporate the volume UUID into the metadata stream, tells me
> that the very rudimentary common practice of using dd needs to go
> away, in general practice.
Sure, for those that use multiple devices (LVM, MD, etc.), or for those
that actually use the UUID to select the block device for each
write/read (rather than using it only "once" to get the right major/minor
dev id, or whatever the kernel uses internally for path-based
addressing).


> http://www.ietf.org/rfc/rfc4122.txt
> "A UUID is 128 bits long, and can guarantee uniqueness across space
> and time."
But of course not in terms of the problems we're talking about here,
where UUIDs may be accidentally or maliciously duplicated.

> Also see security considerations in section 6.
Doesn't section 6 basically imply that you cannot 100% guarantee
they're unique? E.g. bad random seed on multiple systems?

Also, IIRC, one of the UUID algos just used some combination of MAC,
time and PID... which especially in VMs may even lead to dupes.
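
For what it's worth, RFC 4122 version 1 is built from a 60-bit
timestamp, a clock sequence and a 48-bit node (usually the MAC), so
identical inputs give an identical UUID. A small Python sketch of that,
using only the standard uuid module; make_v1 is a hypothetical helper,
and the input values in any call are made up.

```python
import uuid


def make_v1(timestamp_100ns: int, clock_seq: int, node: int) -> uuid.UUID:
    """Assemble a version 1 UUID from explicit inputs (no randomness involved)."""
    fields = (
        timestamp_100ns & 0xFFFFFFFF,      # time_low
        (timestamp_100ns >> 32) & 0xFFFF,  # time_mid
        (timestamp_100ns >> 48) & 0x0FFF,  # time_hi (version bits are set below)
        (clock_seq >> 8) & 0x3F,           # clock_seq_hi (variant bits are set below)
        clock_seq & 0xFF,                  # clock_seq_low
        node & 0xFFFFFFFFFFFF,             # 48-bit node, typically the MAC address
    )
    # version=1 makes the constructor set the RFC 4122 version and variant bits.
    return uuid.UUID(fields=fields, version=1)
```

Two cloned VMs with the same MAC, the same clock sequence and the same
clock reading would therefore produce the same value.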



Cheers,
Chris.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-14 21:26                         ` Chris Murphy
  2015-12-15  0:35                           ` Christoph Anton Mitterer
@ 2015-12-15 13:54                           ` Austin S. Hemmelgarn
  2015-12-15 14:18                             ` Hugo Mills
  2015-12-16 12:03                             ` Christoph Anton Mitterer
  1 sibling, 2 replies; 51+ messages in thread
From: Austin S. Hemmelgarn @ 2015-12-15 13:54 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Christoph Anton Mitterer, Btrfs BTRFS

On 2015-12-14 16:26, Chris Murphy wrote:
> On Mon, Dec 14, 2015 at 6:23 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>>
>> Agreed, if you can't substantiate _why_ it's bad practice, then you aren't
>> making a valid argument.  The fact that there is software that doesn't
>> handle it well would say to me based on established practice that that
>> software is what's broken, not common practice.
>
> The automobile is invented and due to the ensuing chaos, common
> practice of doing whatever the F you wanted came to an end in favor of
> rules of the road and traffic lights. I'm sure some people went
> ballistic, but for the most part things were much better without the
> brokenness or prior common practice.
Except for one thing:  Automobiles actually provide a measurable 
significant benefit to society.  What specific benefit does embedding 
the filesystem UUID in the metadata actually provide?
>
> So the fact we're going to have this problem with all file systems
> that incorporate the volume UUID into the metadata stream, tells me
> that the very rudimentary common practice of using dd needs to go
> away, in general practice. I've already said data recovery (including
> forensics) and sticking drives away on a shelf could be reasonable.
>
>> The assumption that a UUID is actually unique is an inherently flawed one,
>> because it depends both on the method of generation guaranteeing it's unique
>> (and none of the defined methods guarantee that), and a distinct absence of
>> malicious intent.
>
> http://www.ietf.org/rfc/rfc4122.txt
> "A UUID is 128 bits long, and can guarantee uniqueness across space and time."
>
> Also see security considerations in section 6.
Both aspects ignore the facts that:
Version 1 is easy to cause a collision with (MAC addresses are by no 
means unique, and are easy to spoof, and so are timestamps).
Version 2 is relatively easy to cause a collision with, because UID and 
GID numbers are a fixed size namespace.
Version 3 is slightly better, but still not by any means unique because 
you just have to guess the seed string (or a collision for it).
Version 4 is probably the hardest to get a collision with, but only if 
you are using a true RNG, and even then, 122 bits of entropy is not much
protection.
Version 5 has the same issues as Version 3, but is more secure against 
hash collisions.
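
For a sense of scale on the accidental side only, here is a rough
sketch using the birthday-bound approximation for version 4 UUIDs. It
assumes a good random source and says nothing about deliberate or
block-level copies, which is the case this thread is really about. The
function name is just for illustration.

```python
import uuid

# Version 4 UUIDs carry 122 random bits; the other 6 bits of the 128
# are fixed version/variant markers.
RANDOM_BITS = 122


def accidental_collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Birthday-bound approximation: p ~= n*(n-1)/2 / 2**bits.

    Only meaningful while the result is much smaller than 1, and only
    for UUIDs drawn independently from a good random source.
    """
    pairs = n * (n - 1) // 2
    return pairs / 2**bits


fresh = uuid.uuid4()
print(fresh, "version", fresh.version)

for n in (10**6, 10**9, 10**12):
    p = accidental_collision_probability(n)
    print(f"{n:>16,d} filesystems: p ~= {p:.3e}")

# A block-level copy bypasses all of this: the clone carries the same
# bytes, so the duplicate is certain no matter how much entropy went
# into generating the original.
clone = uuid.UUID(bytes=fresh.bytes)
print("clone equals original:", clone == fresh)
```

So the entropy only matters for freshly generated IDs; a dd copy or a
deliberate forgery gets an identical UUID for free.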

In general, you should only use UUID's when either:
a. You have absolutely 100% complete control of the storage of them, 
such that you can guarantee they don't get reused.
b. They can be guaranteed to be relatively unique for the system using them.
>
>
>> On that note, why exactly is it better to make the filesystem UUID such an
>> integral part of the filesystem?  The other thing I'm reading out of this
>> all, is that by writing a total of 64 bytes to a specific location in a
>> single disk in a multi-device BTRFS filesystem, you can make the whole
>> filesystem fall apart, which is absolutely absurd.
>
>
> OK maybe I'm  missing something.
>
> 1. UUID is 128 bits. So where are you getting the additional 48 bytes from?
> 2. The volume UUID is in every superblock, which for all practical
> purposes means at least two instances of that UUID per device.
>
> Are you saying the file system falls apart when changing just one of
> those volume UUIDs in one superblock? And how does it fall apart? I'd
> say all volume UUID instances (each superblock, on every device)
> should be checked and if any of them mismatch then fail to mount.
You're right, it would probably take writing all the SB's (although I'm 
not 100% certain that we actually check that the SB UUID's match).
The extra bytes, which I grossly miscalculated, are for the SB checksum, 
which would have to be updated to match the new SB.
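
To make the "check that all SB UUIDs match" idea concrete, here is a
minimal userspace sketch. The offsets are my reading of the on-disk
format (copies at 64 KiB, 64 MiB and 256 GiB, a 32-byte checksum field
first, then the 16-byte fsid, magic at 0x40) and should be verified
against the btrfs documentation. It does not verify the checksum, and
make_fake_superblock() is only a stand-in for demonstration.

```python
import uuid

# Assumed layout, to be checked against the btrfs on-disk format docs.
SUPERBLOCK_OFFSETS = (0x10000, 0x4000000, 0x4000000000)  # 64 KiB, 64 MiB, 256 GiB
SUPERBLOCK_SIZE = 4096
FSID_OFFSET = 0x20   # the 32 bytes before this hold the checksum
FSID_SIZE = 16
MAGIC_OFFSET = 0x40
MAGIC = b"_BHRfS_M"


def parse_fsid(sb: bytes) -> uuid.UUID:
    """Return the filesystem UUID stored in one superblock copy."""
    if len(sb) < SUPERBLOCK_SIZE:
        raise ValueError("short superblock")
    if sb[MAGIC_OFFSET:MAGIC_OFFSET + len(MAGIC)] != MAGIC:
        raise ValueError("bad magic, not a btrfs superblock")
    return uuid.UUID(bytes=sb[FSID_OFFSET:FSID_OFFSET + FSID_SIZE])


def read_fsids(image: bytes, offsets=SUPERBLOCK_OFFSETS) -> list:
    """Collect the fsid from every superblock copy that fits in the image."""
    fsids = []
    for off in offsets:
        if off + SUPERBLOCK_SIZE <= len(image):
            fsids.append(parse_fsid(image[off:off + SUPERBLOCK_SIZE]))
    return fsids


def superblocks_agree(fsids: list) -> bool:
    """True if at least one copy was found and all copies share one fsid."""
    return len(fsids) > 0 and len(set(fsids)) == 1


def make_fake_superblock(fsid: uuid.UUID) -> bytes:
    """Zero-filled stand-in for demonstration; NOT a valid superblock."""
    sb = bytearray(SUPERBLOCK_SIZE)
    sb[FSID_OFFSET:FSID_OFFSET + FSID_SIZE] = fsid.bytes
    sb[MAGIC_OFFSET:MAGIC_OFFSET + len(MAGIC)] = MAGIC
    return bytes(sb)


# Tiny demonstration with two copies at made-up offsets 0 and 8192.
original = uuid.uuid4()
image = bytearray(3 * SUPERBLOCK_SIZE)
image[0:SUPERBLOCK_SIZE] = make_fake_superblock(original)
image[8192:8192 + SUPERBLOCK_SIZE] = make_fake_superblock(original)
print(superblocks_agree(read_fsids(bytes(image), offsets=(0, 8192))))

# Overwrite the fsid in only one copy: the copies now disagree.
image[FSID_OFFSET:FSID_OFFSET + FSID_SIZE] = uuid.uuid4().bytes
print(superblocks_agree(read_fsids(bytes(image), offsets=(0, 8192))))
```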
>
> There could be some leveraging of the device WWN, or absent that its
> serial number, propagated into all of the volume's devices (cross
> referencing each other's devid to WWN or serial). And then that way
> there's a way to differentiate. In the dd case, there would be
> mismatching real device WWN/serial number and the one written in
> metadata on all drives, including the copy. This doesn't say what
> policy should happen next, just that at least it's known there's a
> mismatch.
>
That gets tricky too, because for example you have stuff like flat files 
used as filesystem images.

However, if we then use some separate UUID (possibly hashed off of the 
file location) in place of the device serial/WWN, that could 
theoretically provide some better protection.  The obvious solution in 
the case of a mismatch would be to refuse the mount until either the 
issue is fixed using the tools, or the user specifies some particular 
mount option to either fix it automatically, or ignore copies with a
mismatching serial.
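
As a rough illustration of what I mean (nothing like this exists in
btrfs today; every name below is made up for the sketch), the
per-copy identity could come from the hardware serial/WWN when there
is one, and from a hash of the file location otherwise:

```python
import uuid
from typing import Optional


def backing_identity(path: str, hw_serial: Optional[str] = None) -> uuid.UUID:
    """Hypothetical per-copy identity.

    Uses the hardware serial/WWN when one is available, otherwise a
    name-based (version 5) UUID hashed from the file location.
    """
    source = f"serial:{hw_serial}" if hw_serial else f"path:{path}"
    return uuid.uuid5(uuid.NAMESPACE_URL, source)


def may_mount(recorded: uuid.UUID, path: str,
              hw_serial: Optional[str] = None,
              ignore_mismatch: bool = False) -> bool:
    """Refuse the mount on a mismatch unless the user explicitly opts in."""
    return ignore_mismatch or backing_identity(path, hw_serial) == recorded


# Identity recorded when the image was created at its original location.
recorded = backing_identity("/srv/images/a.img")

print(may_mount(recorded, "/srv/images/a.img"))        # same location
print(may_mount(recorded, "/srv/images/copy-of-a.img"))  # a copy elsewhere
print(may_mount(recorded, "/srv/images/copy-of-a.img", ignore_mismatch=True))
```

The obvious weakness is that legitimately moving or renaming the image
also changes the hash, so the tools would need a way to re-record it.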



* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-15 13:54                           ` Austin S. Hemmelgarn
@ 2015-12-15 14:18                             ` Hugo Mills
  2015-12-15 14:27                               ` Austin S. Hemmelgarn
  2015-12-16 12:03                               ` Christoph Anton Mitterer
  2015-12-16 12:03                             ` Christoph Anton Mitterer
  1 sibling, 2 replies; 51+ messages in thread
From: Hugo Mills @ 2015-12-15 14:18 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Christoph Anton Mitterer, Btrfs BTRFS


On Tue, Dec 15, 2015 at 08:54:01AM -0500, Austin S. Hemmelgarn wrote:
> On 2015-12-14 16:26, Chris Murphy wrote:
> >On Mon, Dec 14, 2015 at 6:23 AM, Austin S. Hemmelgarn
> ><ahferroin7@gmail.com> wrote:
> >>
> >>Agreed, if you can't substantiate _why_ it's bad practice, then you aren't
> >>making a valid argument.  The fact that there is software that doesn't
> >>handle it well would say to me based on established practice that that
> >>software is what's broken, not common practice.
> >
> >The automobile is invented and due to the ensuing chaos, common
> >practice of doing whatever the F you wanted came to an end in favor of
> >rules of the road and traffic lights. I'm sure some people went
> >ballistic, but for the most part things were much better without the
> >brokenness or prior common practice.
> Except for one thing:  Automobiles actually provide a measurable
> significant benefit to society.  What specific benefit does
> embedding the filesystem UUID in the metadata actually provide?

   That one's easy to answer. It deals with a major issue that
reiserfs had: if you have a filesystem with another filesystem image
stored on it, reiserfsck could end up deciding that both the metadata
blocks of the main filesystem *and* the metadata blocks of the image
were part of the same FS (because they're on the same block device),
and so would splice both filesystems into one, generally complaining
loudly along the way that there was a lot of corruption present that
it was trying to fix.

   Putting the UUID of the FS into the metadata blocks means that the
kind of low-level check/repair attempt which scans for "stuff that
looks like metadata" can at least distinguish between the stuff that's
really metadata and the stuff that's just data that looks like
metadata.

   Hugo.

> >So the fact we're going to have this problem with all file systems
> >that incorporate the volume UUID into the metadata stream, tells me
> >that the very rudimentary common practice of using dd needs to go
> >away, in general practice. I've already said data recovery (including
> >forensics) and sticking drives away on a shelf could be reasonable.
> >
> >>The assumption that a UUID is actually unique is an inherently flawed one,
> >>because it depends both on the method of generation guaranteeing it's unique
> >>(and none of the defined methods guarantee that), and a distinct absence of
> >>malicious intent.
> >
> >http://www.ietf.org/rfc/rfc4122.txt
> >"A UUID is 128 bits long, and can guarantee uniqueness across space and time."
> >
> >Also see security considerations in section 6.
> Both aspects ignore the facts that:
> Version 1 is easy to cause a collision with (MAC addresses are by no
> means unique, and are easy to spoof, and so are timestamps).
> Version 2 is relatively easy to cause a collision with, because UID
> and GID numbers are a fixed size namespace.
> Version 3 is slightly better, but still not by any means unique
> because you just have to guess the seed string (or a collision for
> it).
> Version 4 is probably the hardest to get a collision with, but only
> if you are using a true RNG, and even then, 122 bits of entropy is
> not much protection.
> Version 5 has the same issues as Version 3, but is more secure
> against hash collisions.
> 
> In general, you should only use UUID's when either:
> a. You have absolutely 100% complete control of the storage of them,
> such that you can guarantee they don't get reused.
> b. They can be guaranteed to be relatively unique for the system using them.
> >
> >
> >>On that note, why exactly is it better to make the filesystem UUID such an
> >>integral part of the filesystem?  The other thing I'm reading out of this
> >>all, is that by writing a total of 64 bytes to a specific location in a
> >>single disk in a multi-device BTRFS filesystem, you can make the whole
> >>filesystem fall apart, which is absolutely absurd.
> >
> >
> >OK maybe I'm  missing something.
> >
> >1. UUID is 128 bits. So where are you getting the additional 48 bytes from?
> >2. The volume UUID is in every superblock, which for all practical
> >purposes means at least two instances of that UUID per device.
> >
> >Are you saying the file system falls apart when changing just one of
> >those volume UUIDs in one superblock? And how does it fall apart? I'd
> >say all volume UUID instances (each superblock, on every device)
> >should be checked and if any of them mismatch then fail to mount.
> You're right, it would probably take writing all the SB's (although
> I'm not 100% certain that we actually check that the SB UUID's
> match).
> The extra bytes, which I grossly miscalculated, are for the SB
> checksum, which would have to be updated to match the new SB.
> >
> >There could be some leveraging of the device WWN, or absent that its
> >serial number, propagated into all of the volume's devices (cross
> >referencing each other's devid to WWN or serial). And then that way
> >there's a way to differentiate. In the dd case, there would be
> >mismatching real device WWN/serial number and the one written in
> >metadata on all drives, including the copy. This doesn't say what
> >policy should happen next, just that at least it's known there's a
> >mismatch.
> >
> That gets tricky too, because for example you have stuff like flat
> files used as filesystem images.
> 
> However, if we then use some separate UUID (possibly hashed off of
> the file location) in place of the device serial/WWN, that could
> theoretically provide some better protection.  The obvious solution
> in the case of a mismatch would be to refuse the mount until either
> the issue is fixed using the tools, or the user specifies some
> particular mount option to either fix it automatically, or ignore
> copies with a mismatching serial.
> 

-- 
Hugo Mills             | I think that everything darkling says is actually a
hugo@... carfax.org.uk | joke. It's just that we haven't worked out most of
http://carfax.org.uk/  | them yet.
PGP: E2AB1DE4          |                                                Vashka


* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-15  0:08                         ` Christoph Anton Mitterer
@ 2015-12-15 14:19                           ` Austin S. Hemmelgarn
  2015-12-16 12:56                             ` Christoph Anton Mitterer
  0 siblings, 1 reply; 51+ messages in thread
From: Austin S. Hemmelgarn @ 2015-12-15 14:19 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Chris Murphy, Btrfs BTRFS

On 2015-12-14 19:08, Christoph Anton Mitterer wrote:
> On Mon, 2015-12-14 at 08:23 -0500, Austin S. Hemmelgarn wrote:
>> The reason that this isn't quite as high of a concern is because
>> performing this attack requires either root access, or direct
>> physical
>> access to the hardware, and in either case, your system is already
>> compromised.
> Not necessarily.
> Apart from the ATM image (where most people wouldn't call it
> compromised, just because it's openly accessible on the street)
Um, no you don't have direct physical access to the hardware with an 
ATM, at least, not unless you are going to take apart the cover and 
anything else in your way (and probably set off internal alarms).  And 
even without that, it's still possible to DoS an ATM without much 
effort.  Most of them have a 3.5mm headphone jack for TTS for people 
with poor vision, and that's more than enough to overload at least part 
of the system with a relatively simple to put together bit of 
electronics that would cost you less than 10 USD.
> imagine you're running a VM hosting service, where you allow users to
> upload images and have them deployed.
> In the "cheap" case these will end up as regular files, where they
> couldn't do any harm (even if colliding UUIDs)... but even there one
> would have to expect, that the hypervisor admin may losetup them for
> whichever reason.
> But if you offer more professional services, you may give your clients
> e.g. direct access to some storage backend, which are then probably
> also seen on the host by its kernel.
> And here we already have the case, that a client could remotely trigger
> such collision.
In that particular situation, it's not relevant unless the host admin 
goes to mount them.  UUID collisions are only an issue if the 
filesystems get mounted.
>
> And remember, things only sound far-fetched until it actually happens
> the first time ;)
>
>
>> I still think that that isn't a sufficient excuse for not fixing the
>> issue, as there are a number of non-security related issues that can
>> result from this (there are some things that are common practice with
>> LVM or mdraid that can't be done with BTRFS because of this).
> Sure, I guess we agree on that,...
>
>
>>> Apart from that, btrfs should be a general purpose fs, and not just
>>> a
>>> desktop or server fs.
>>> So edge cases like forensics (where it's common that you create
>>> bitwise
> >>> identical images) shouldn't be forgotten either.
>> While I would normally agree, there are ways to work around this in
>> the
>> forensics case that don't work for any other case (namely, if BTRFS
>> is
>> built as a module, you can unmount everything, unload the module,
>> reload
>> it, and only scan the devices you want).
> see below (*)
>
>
>> On that note, why exactly is it better to make the filesystem UUID
>> such
>> an integral part of the filesystem?
> Well I think it's a proper way to e.g. handle the multi-device case.
> You have n devices, you want to differ them,... using a pseudo-random
> UUID is surely better than giving them numbers.
That's debatable, the same issues are obviously present in both cases 
(individual numbers can collide too).
> Same for the fs UUID, e.g. when used for mounting devices whose paths
> aren't stable.
In the case of a sanely designed system using LVM for example, device 
paths are stable.
>
> As said before, using the UUID isn't the problem - not protecting
> against collisions is.
No, the issues are:
1. We assume that the UUID will be unique for the life of the 
filesystem, which is not a safe assumption.
2. We don't sanely handle things if it isn't unique.
>
>
>> The other thing I'm reading out of
>> this all, is that by writing a total of 64 bytes to a specific
>> location
>> in a single disk in a multi-device BTRFS filesystem, you can make the
>> whole filesystem fall apart, which is absolutely absurd.
> Well,... I don't think that writing *into* the filesystem is covered by
> common practise anymore.
For end users, I agree.  Part of the discussion involves attacks on the 
system, and for an attacker it's not a far stretch to write directly to
the block device if possible (and it's even common practice for 
bypassing permission checks done in the VFS layer).
>
> In UNIX, a device (which holds the filesystem) is a file. Therefore one
> can argue: if one copies/duplicates one file (i.e. the fs) neither of
> the two's contents should get corrupted.
> But if you actively write *into* the file by yourself,... then you're
> simply on your own, either you know what you do, or you may just
> corrupt *that* specific file. Of course it should again not lead to any
> of its clones becoming corrupted as well.
My point is that by changing the UUID in a superblock (and properly 
updating the checksum for the superblock), you can trivially break a 
multi-device filesystem.  And it's a whole lot easier to do that than it 
is to do the equivalent for LVM.
>
>
>> And some recovery situations (think along the lines of no recovery
>> disk,
>> and you only have busybox or something similar to work with).
> (*) which is however also, why you may not be able to unmount the
> device anymore or unload btrfs.
> Maybe you have reasons you must/want to do any forensics in the running
> system.
>
>
>>> AFAIK, there's not even a solution right now, that copies a
>>> complete
>>> btrfs, with snapshots, etc. preserving all ref-links. At least
>>> nothing
>>> official that works in one command.
>> Send-receive kind of works for that
> I've added the "in one command" for that... O:-)
> In case the btrfs would have subvols/snapshots... the user would need
> to make the recursion himself...



* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-15 14:18                             ` Hugo Mills
@ 2015-12-15 14:27                               ` Austin S. Hemmelgarn
  2015-12-15 14:42                                 ` Hugo Mills
  2015-12-16 12:03                               ` Christoph Anton Mitterer
  1 sibling, 1 reply; 51+ messages in thread
From: Austin S. Hemmelgarn @ 2015-12-15 14:27 UTC (permalink / raw)
  To: Hugo Mills, Chris Murphy, Christoph Anton Mitterer, Btrfs BTRFS

On 2015-12-15 09:18, Hugo Mills wrote:
> On Tue, Dec 15, 2015 at 08:54:01AM -0500, Austin S. Hemmelgarn wrote:
>> On 2015-12-14 16:26, Chris Murphy wrote:
>>> On Mon, Dec 14, 2015 at 6:23 AM, Austin S. Hemmelgarn
>>> <ahferroin7@gmail.com> wrote:
>>>>
>>>> Agreed, if you can't substantiate _why_ it's bad practice, then you aren't
>>>> making a valid argument.  The fact that there is software that doesn't
>>>> handle it well would say to me based on established practice that that
>>>> software is what's broken, not common practice.
>>>
>>> The automobile is invented and due to the ensuing chaos, common
>>> practice of doing whatever the F you wanted came to an end in favor of
>>> rules of the road and traffic lights. I'm sure some people went
>>> ballistic, but for the most part things were much better without the
>>> brokenness or prior common practice.
>> Except for one thing:  Automobiles actually provide a measurable
>> significant benefit to society.  What specific benefit does
>> embedding the filesystem UUID in the metadata actually provide?
>
>     That one's easy to answer. It deals with a major issue that
> reiserfs had: if you have a filesystem with another filesystem image
> stored on it, reiserfsck could end up deciding that both the metadata
> blocks of the main filesystem *and* the metadata blocks of the image
> were part of the same FS (because they're on the same block device),
> and so would splice both filesystems into one, generally complaining
> loudly along the way that there was a lot of corruption present that
> it was trying to fix.
IIRC, that was because of the way the SB was designed, and is why other 
filesystems have a UUID in the superblock.

I probably should have been clearer with my statement, what I meant was:
What specific benefit does using the UUID for multi-device filesystems 
to identify the various devices provide?



* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-15 14:27                               ` Austin S. Hemmelgarn
@ 2015-12-15 14:42                                 ` Hugo Mills
  2015-12-15 16:03                                   ` Austin S. Hemmelgarn
  2015-12-16 12:10                                   ` Christoph Anton Mitterer
  0 siblings, 2 replies; 51+ messages in thread
From: Hugo Mills @ 2015-12-15 14:42 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Christoph Anton Mitterer, Btrfs BTRFS


On Tue, Dec 15, 2015 at 09:27:12AM -0500, Austin S. Hemmelgarn wrote:
> On 2015-12-15 09:18, Hugo Mills wrote:
> >On Tue, Dec 15, 2015 at 08:54:01AM -0500, Austin S. Hemmelgarn wrote:
> >>On 2015-12-14 16:26, Chris Murphy wrote:
> >>>On Mon, Dec 14, 2015 at 6:23 AM, Austin S. Hemmelgarn
> >>><ahferroin7@gmail.com> wrote:
> >>>>
> >>>>Agreed, if you can't substantiate _why_ it's bad practice, then you aren't
> >>>>making a valid argument.  The fact that there is software that doesn't
> >>>>handle it well would say to me based on established practice that that
> >>>>software is what's broken, not common practice.
> >>>
> >>>The automobile is invented and due to the ensuing chaos, common
> >>>practice of doing whatever the F you wanted came to an end in favor of
> >>>rules of the road and traffic lights. I'm sure some people went
> >>>ballistic, but for the most part things were much better without the
> >>>brokenness or prior common practice.
> >>Except for one thing:  Automobiles actually provide a measurable
> >>significant benefit to society.  What specific benefit does
> >>embedding the filesystem UUID in the metadata actually provide?
> >
> >    That one's easy to answer. It deals with a major issue that
> >reiserfs had: if you have a filesystem with another filesystem image
> >stored on it, reiserfsck could end up deciding that both the metadata
> >blocks of the main filesystem *and* the metadata blocks of the image
> >were part of the same FS (because they're on the same block device),
> >and so would splice both filesystems into one, generally complaining
> >loudly along the way that there was a lot of corruption present that
> >it was trying to fix.
> IIRC, that was because of the way the SB was designed, and is why
> other filesystems have a UUID in the superblock.
> 
> I probably should have been clearer with my statement, what I meant was:
> What specific benefit does using the UUID for multi-device
> filesystems to identify the various devices provide?

   Well, given a bunch of block devices, how do you identify which
ones to use for each of the (unknown number of) filesystems in the
system?

   You can either use some kind of config file, which is going to get
out of date as device enumeration orders change or as devices are
added/deleted from the FS, or you can try to identify the devices that
belong together automatically in some way. btrfs uses the latter
option (with the former option kind of supported using the device=
mount option). The use of a UUID isn't fundamental to the latter
process, but anything that you replaced the UUID with would have the
same issues that we're seeing here -- make a duplicate of the device
at the block level, and you get additional devices that look like they
should be part of the FS.

   The question is not how you avoid duplicating the UUIDs, but how
you identify that there are duplicates present, and how you deal with
that issue once you've detected them. This is complicated by the fact
that it's perfectly legitimate to have two block devices in the system
that identify themselves as the same device for the same filesystem --
this happens when they're different views of the same underlying
storage through multipathing.

   I would suggest trying to migrate to a state where detecting more
than one device with the same UUID and devid is cause to prevent the
FS from mounting, unless there's also a "mount_duplicates_yes_i_
know_this_is_dangerous_and_i_know_what_im_doing" mount flag present,
for the multipathing people. That will break existing userspace
behaviour for the multipathing case, but the migration can probably be
managed. (e.g. NFS has successfully changed default behaviour for one
of its mount options in the last few(?) years).
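
   As a sketch of the detection half of that (userspace Python for
illustration only, not kernel code; the ScannedDevice type and the
allow_duplicates switch are made up for the example, the latter
standing in for the deliberately awkward mount flag), grouping scan
results by (fsid, devid) would look something like this:

```python
import uuid
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class ScannedDevice(NamedTuple):
    """What a device scan would report for one block device (hypothetical)."""
    path: str
    fsid: uuid.UUID
    devid: int


def find_duplicates(
    devices: List[ScannedDevice],
) -> Dict[Tuple[uuid.UUID, int], List[str]]:
    """Map each (fsid, devid) claimed by more than one device to its paths."""
    groups = defaultdict(list)
    for dev in devices:
        groups[(dev.fsid, dev.devid)].append(dev.path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


def devices_for_mount(devices: List[ScannedDevice], fsid: uuid.UUID,
                      allow_duplicates: bool = False) -> List[str]:
    """Return the member devices of fsid, refusing if any devid is duplicated."""
    dupes = {k: v for k, v in find_duplicates(devices).items() if k[0] == fsid}
    if dupes and not allow_duplicates:
        detail = "; ".join(
            f"devid {devid}: {', '.join(paths)}"
            for (_, devid), paths in sorted(dupes.items(), key=lambda kv: kv[0][1])
        )
        raise RuntimeError(f"refusing to mount {fsid}: duplicates ({detail})")
    return [dev.path for dev in devices if dev.fsid == fsid]


fs = uuid.uuid4()
scan = [
    ScannedDevice("/dev/sdb", fs, 1),
    ScannedDevice("/dev/sdc", fs, 2),
    ScannedDevice("/dev/sdd", fs, 1),   # dd copy of sdb, or a second path to it
]
print(find_duplicates(scan))
```

   Note that this cannot tell a dd copy from a multipath view of the
same storage; that distinction is exactly what the flag is for.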

   Hugo.

-- 
Hugo Mills             | I think that everything darkling says is actually a
hugo@... carfax.org.uk | joke. It's just that we haven't worked out most of
http://carfax.org.uk/  | them yet.
PGP: E2AB1DE4          |                                                Vashka


* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-15 14:42                                 ` Hugo Mills
@ 2015-12-15 16:03                                   ` Austin S. Hemmelgarn
  2015-12-16 12:14                                     ` Christoph Anton Mitterer
  2015-12-16 12:10                                   ` Christoph Anton Mitterer
  1 sibling, 1 reply; 51+ messages in thread
From: Austin S. Hemmelgarn @ 2015-12-15 16:03 UTC (permalink / raw)
  To: Hugo Mills, Chris Murphy, Christoph Anton Mitterer, Btrfs BTRFS

On 2015-12-15 09:42, Hugo Mills wrote:
> On Tue, Dec 15, 2015 at 09:27:12AM -0500, Austin S. Hemmelgarn wrote:
>> On 2015-12-15 09:18, Hugo Mills wrote:
>>> On Tue, Dec 15, 2015 at 08:54:01AM -0500, Austin S. Hemmelgarn wrote:
>>>> On 2015-12-14 16:26, Chris Murphy wrote:
>>>>> On Mon, Dec 14, 2015 at 6:23 AM, Austin S. Hemmelgarn
>>>>> <ahferroin7@gmail.com> wrote:
>>>>>>
>>>>>> Agreed, if you can't substantiate _why_ it's bad practice, then you aren't
>>>>>> making a valid argument.  The fact that there is software that doesn't
>>>>>> handle it well would say to me based on established practice that that
>>>>>> software is what's broken, not common practice.
>>>>>
>>>>> The automobile is invented and due to the ensuing chaos, common
>>>>> practice of doing whatever the F you wanted came to an end in favor of
>>>>> rules of the road and traffic lights. I'm sure some people went
>>>>> ballistic, but for the most part things were much better without the
>>>>> brokenness or prior common practice.
>>>> Except for one thing:  Automobiles actually provide a measurable
>>>> significant benefit to society.  What specific benefit does
>>>> embedding the filesystem UUID in the metadata actually provide?
>>>
>>>     That one's easy to answer. It deals with a major issue that
>>> reiserfs had: if you have a filesystem with another filesystem image
>>> stored on it, reiserfsck could end up deciding that both the metadata
>>> blocks of the main filesystem *and* the metadata blocks of the image
>>> were part of the same FS (because they're on the same block device),
>>> and so would splice both filesystems into one, generally complaining
>>> loudly along the way that there was a lot of corruption present that
>>> it was trying to fix.
>> IIRC, that was because of the way the SB was designed, and is why
>> other filesystems have a UUID in the superblock.
>>
>> I probably should have been clearer with my statement, what I meant was:
>> What specific benefit does using the UUID for multi-device
>> filesystems to identify the various devices provide?
>
>     Well, given a bunch of block devices, how do you identify which
> ones to use for each of the (unknown number of) filesystems in the
> system?
>
>     You can either use some kind of config file, which is going to get
> out of date as device enumeration orders change or as devices are
> added/deleted from the FS, or you can try to identify the devices that
> belong together automatically in some way. btrfs uses the latter
> option (with the former option kind of supported using the device=
> mount option). The use of a UUID isn't fundamental to the latter
> process, but anything that you replaced the UUID with would have the
> same issues that we're seeing here -- make a duplicate of the device
> at the block level, and you get additional devices that look like they
> should be part of the FS.
>
>     The question is not how you avoid duplicating the UUIDs, but how
> you identify that there are duplicates present, and how you deal with
> that issue once you've detected them. This is complicated by the fact
> that it's perfectly legitimate to have two block devices in the system
> that identify themselves as the same device for the same filesystem --
> this happens when they're different views of the same underlying
> storage through multipathing.
>
>     I would suggest trying to migrate to a state where detecting more
> than one device with the same UUID and devid is cause to prevent the
> FS from mounting, unless there's also a "mount_duplicates_yes_i_
> know_this_is_dangerous_and_i_know_what_im_doing" mount flag present,
> for the multipathing people. That will break existing userspace
> behaviour for the multipathing case, but the migration can probably be
> managed. (e.g. NFS has successfully changed default behaviour for one
> of its mount options in the last few(?) years).
May I propose the alternative option of adding a flag to tell mount to
_only_ use the devices specified in the options?  That would allow
people to work around the common issues (multipath, dm-cache, etc), and
would give people who have stable device enumeration a way to mitigate
the possibility of an attack.
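
Roughly, the selection logic I have in mind looks like the sketch below
(illustrative Python, not a patch; the only_listed switch is a
hypothetical name for the proposed flag, and listed stands for the
devices given via device= in the mount options):

```python
import uuid
from typing import Iterable, List, Optional, Tuple


def select_devices(scanned: Iterable[Tuple[str, uuid.UUID]],
                   fsid: uuid.UUID,
                   listed: Optional[List[str]] = None,
                   only_listed: bool = False) -> List[str]:
    """Pick the devices to assemble for the filesystem with this fsid.

    Default behaviour: every scanned device claiming the fsid is used.
    With only_listed set, devices that were not named explicitly are
    ignored even if they claim the same fsid.
    """
    matching = [path for path, dev_fsid in scanned if dev_fsid == fsid]
    if only_listed:
        wanted = set(listed or [])
        return [path for path in matching if path in wanted]
    return matching


fs = uuid.uuid4()
scan = [
    ("/dev/sdb", fs),
    ("/dev/sdc", fs),
    ("/dev/loop0", fs),   # an uploaded image carrying a colliding fsid
]

print(select_devices(scan, fs))
print(select_devices(scan, fs, listed=["/dev/sdb", "/dev/sdc"], only_listed=True))
```

With the flag set, the colliding image on /dev/loop0 is never
considered, whatever its superblock claims.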



* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-15 13:54                           ` Austin S. Hemmelgarn
  2015-12-15 14:18                             ` Hugo Mills
@ 2015-12-16 12:03                             ` Christoph Anton Mitterer
  2015-12-17  2:43                               ` Duncan
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-16 12:03 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Chris Murphy; +Cc: Btrfs BTRFS


On Tue, 2015-12-15 at 08:54 -0500, Austin S. Hemmelgarn wrote:
> Except for one thing:  Automobiles actually provide a measurable
> significant benefit to society.  What specific benefit does embedding
> the filesystem UUID in the metadata actually provide?
I guess that's quite obvious.
You want something that can be used to address devices in a stable way
(i.e. not by their "path" like sda, sdb). So either some ID or a label.
Human-readable labels are basically guaranteed to collide, so UUIDs are
the clean solution.
Since there is however no guarantee that they don't collide (either by
accident or malicious intent), you need to protect against that.

Analogous for the device IDs of multi-device fs or containers.

> > "A UUID is 128 bits long, and can guarantee uniqueness across space
> > and time."
> > 
> > Also see security considerations in section 6.
> Both aspects ignore the facts that:
> Version 1 is easy to cause a collision with (MAC addresses are by no 
> means unique, and are easy to spoof, and so are timestamps).
> Version 2 is relatively easy to cause a collision with, because UID
> and 
> GID numbers are a fixed size namespace.
> Version 3 is slightly better, but still not by any means unique
> because 
> you just have to guess the seed string (or a collision for it).
> Version 4 is probably the hardest to get a collision with, but only
> if 
> you are using a true RNG, and even then, 122 bits of entropy is not
> much 
> protection.
> Version 5 has the same issues as Version 3, but is more secure
> against 
> hash collisions.
I guess we don't need to discuss how unique UUIDs are when they're
*freshly created*, since this is the only thing that the RFC
"guarantees"...
That's mostly irrelevant for us here, as we have two far stronger
cases: accidental duplication and malicious collisions.
The chance that a UUID collision occurs by normal means (e.g.
mkfs.btrfs) is small, but solving the two actual cases here
will solve that one as well.
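
To put a rough number on "small", here is a birthday-bound sketch in
Python. It is not from the thread: it assumes version 4 UUIDs drawn from
a good RNG (122 random bits), and the function name is made up for
illustration.

```python
import math


def collision_probability(n_ids: int, random_bits: int = 122) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -n_ids * (n_ids - 1) / 2.0 ** (random_bits + 1)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)


# A million, a trillion, and roughly 2^61 freshly created UUIDs
for n in (10**6, 10**12, 2**61):
    print(f"{n:.3e} UUIDs -> p(collision) ~ {collision_probability(n):.3e}")
```

This only covers honestly generated UUIDs; it says nothing about the dd
and attacker cases, which are the ones that actually matter here.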

Apart from that, I've noticed in several of your mails that either
something with the indentation level goes wrong, or you mix contents from
multiple mails from different people.
E.g. that "Also see security considerations in section 6." wasn't from
me, which was at quotation level 1 in your mail, but the example with
the automobile, which was also on level 1, was from me.
That's kinda confusing...


> In general, you should only use UUID's when either:
> a. You have absolutely 100% complete control of the storage of them, 
> such that you can guarantee they don't get reused.
> b. They can be guaranteed to be relatively unique for the system
> using them.
No, these aren't necessary constraints. And in fact they would make
multi-device practically impossible (you always need some ID, unless you
want to open the door to countless errors where people wrongly assemble
their devices... whether it's a UUID or anything else doesn't matter).
The only thing that one needs to do is handle collisions gracefully
and not do auto-assemblies... all as I've described in the mail from
"Fri, 11 Dec 2015 23:06:03 +0100"
(http://thread.gmane.org/gmane.comp.file-systems.btrfs/50909/focus=51147)



> > There could be some leveraging of the device WWN, or absent that
> > its
> > serial number, propagated into all of the volume's devices (cross
> > referencing each other's devid to WWN or serial). And then that way
> > there's a way to differentiate. In the dd case, there would be
> > mismatching real device WWN/serial number and the one written in
> > metadata on all drives, including the copy. This doesn't say what
> > policy should happen next, just that at least it's known there's a
> > mismatch.
> > 
> That gets tricky too, because for example you have stuff like flat
> files 
> used as filesystem images.
Plus... one cannot be sure whether any hardware device IDs, like serial
numbers, are unique... a powerful attacker could surely change these as
well.
Or imagine you have a failing hard disk and dd its content to
another... the btrfs part would stay identical, while the hardware
device IDs change and confuse everything.
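
A small Python sketch of why a serial cross-check alone cannot tell
these cases apart. Everything in it is hypothetical: btrfs does not
store device serials in its metadata, and the Member fields and
serial_mismatches() only illustrate the proposal quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    path: str
    devid: int
    recorded_serial: str  # hypothetical field: serial written into fs metadata
    actual_serial: str    # serial the hardware reports right now


def serial_mismatches(members):
    """Return the members whose recorded and actual serials differ."""
    return [m for m in members if m.recorded_serial != m.actual_serial]


# An accidental or malicious dd clone sitting next to the original ...
clone_case = [
    Member("/dev/sda1", 1, "SER-A", "SER-A"),
    Member("/dev/sdc1", 1, "SER-A", "SER-C"),
]
# ... and a legitimate dd rescue of a failing disk onto a new one.
rescue_case = [
    Member("/dev/sdd1", 1, "SER-A", "SER-D"),
]

for name, case in (("clone", clone_case), ("rescue", rescue_case)):
    print(name, [m.path for m in serial_mismatches(case)])
```

Both cases report a mismatch, so the check only says "something
changed"; deciding what to do next is still up to the tools and the
admin.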



> However, if we then use some separate UUID (possibly hashed off of
> the 
> file location) in place of the device serial/WWN, that could 
> theoretically provide some better protection.
Not really... it just delegates the problem one level further.
The only real protection is that the kernel and userland tools deal
correctly with the situation.


> The obvious solution in 
> the case of a mismatch would be to refuse the mount until either the 
> issue is fixed using the tools, or the user specifies some particular
> mount option to either fix it automatically, or ignore copies with a 
> mismatching serial.
Sure, as I've said before :-)


Cheers,
Chris

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5313 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-15 14:18                             ` Hugo Mills
  2015-12-15 14:27                               ` Austin S. Hemmelgarn
@ 2015-12-16 12:03                               ` Christoph Anton Mitterer
  2015-12-16 14:41                                 ` Chris Mason
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-16 12:03 UTC (permalink / raw)
  To: Hugo Mills, Austin S. Hemmelgarn; +Cc: Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1746 bytes --]

On Tue, 2015-12-15 at 14:18 +0000, Hugo Mills wrote:
>    That one's easy to answer. It deals with a major issue that
> reiserfs had: if you have a filesystem with another filesystem image
> stored on it, reiserfsck could end up deciding that both the metadata
> blocks of the main filesystem *and* the metadata blocks of the image
> were part of the same FS (because they're on the same block device),
> and so would splice both filesystems into one, generally complaining
> loudly along the way that there was a lot of corruption present that
> it was trying to fix.
Hmm that's a bit strange though, and to me it rather sounds like other
bugs...
You can have a ext4 on a file in an ext4, with or without the same
UUIDs, and it will just work.
If the filesystem takes contents from a normal file as possible
metadata, then something else is severely screwed up... or in case of
the fsck: it probably means it's a bit too liberal in searching places.

I'd be quite shocked if this is the case in btrfs, cause it would mean
again, that we have a vulnerability against UUID collisions.
Imagine some attacker finds out the UUID of a filesystem (which is
probably rather easy)... next he uploads some file (e.g. it's a
webserver which allows image uploads, a forum perhaps) that in reality
contains what looks like btrfs metadata and uses a matching UUID.

It would run into the same issues as what you describe for reiser,..
the UUID would be no real help to solve that problem.


Does anyone know whether btrfsck (or other userland) tools do such
things? I.e. search more or less arbitrary blocks, where it cannot be
sure it's *not* data, for what it would interpret as meta-data
subsequently?


Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5313 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-15 14:42                                 ` Hugo Mills
  2015-12-15 16:03                                   ` Austin S. Hemmelgarn
@ 2015-12-16 12:10                                   ` Christoph Anton Mitterer
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-16 12:10 UTC (permalink / raw)
  To: Hugo Mills, Austin S. Hemmelgarn; +Cc: Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1377 bytes --]

On Tue, 2015-12-15 at 14:42 +0000, Hugo Mills wrote:
>    I would suggest trying to migrate to a state where detecting more
> than one device with the same UUID and devid is cause to prevent the
> FS from mounting, unless there's also a "mount_duplicates_yes_i_
> know_this_is_dangerous_and_i_know_what_im_doing" mount flag present,
> for the multipathing people. That will break existing userspace
> behaviour for the multipathing case, but the migration can probably
> be
> managed. (e.g. NFS has successfully changed default behaviour for one
> of its mount options in the last few(?) years).

I don't think that a single mount option a la "force-and-do-it" is a
proper solution here. It would still leave an opening for attacks and
also for accidents.
In the case where multi-pathing is used, the only realistic way seems to
be manually specifying the devices a la device=/dev/sda,/dev/sdb.

Of course btrfs would still use the UUIDs/device IDs of these, but *only*
of those devices that have been whitelisted with the device= option.

In the case of a general "mount_duplicates_yes_iknow_th..." option you
could end up with e.g. three duplicates, two being actual
multi-paths and the third one being a losetup or USB clone of the
image... again allowing the aforementioned attacks to happen, and
again allowing severe corruption to occur.
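
A minimal Python sketch of that policy, to make the difference to a
blanket "allow duplicates" flag concrete. It is an assumption-laden
illustration, not how btrfs behaves: Dev, select_members() and the fsid
string are invented here. Duplicates of the same (fsid, devid) pair
refuse assembly unless the caller narrows the candidates to an explicit
device list.

```python
from collections import defaultdict
from typing import NamedTuple


class Dev(NamedTuple):
    path: str
    fsid: str
    devid: int


def select_members(scanned, fsid, allowed_paths=None):
    """Return the devices to assemble for `fsid`, or raise on ambiguity."""
    candidates = [d for d in scanned if d.fsid == fsid]
    if allowed_paths is not None:
        # Only devices the admin named explicitly are ever considered.
        candidates = [d for d in candidates if d.path in allowed_paths]
    by_devid = defaultdict(list)
    for d in candidates:
        by_devid[d.devid].append(d.path)
    dupes = {devid: paths for devid, paths in by_devid.items() if len(paths) > 1}
    if dupes:
        raise ValueError(f"duplicate (fsid, devid) pairs, refusing: {dupes}")
    return sorted(candidates, key=lambda d: d.devid)


FSID = "11111111-2222-3333-4444-555555555555"  # made-up example value
scanned = [
    Dev("/dev/sda1", FSID, 1),
    Dev("/dev/sdb1", FSID, 2),
    Dev("/dev/loop0", FSID, 2),  # a clone of sdb1 that showed up later
]

try:
    select_members(scanned, FSID)
except ValueError as err:
    print("refused:", err)

print(select_members(scanned, FSID, allowed_paths={"/dev/sda1", "/dev/sdb1"}))
```

Note that even with the whitelist, duplicates inside the whitelisted set
are still refused; how a real multi-path setup would be expressed is
left open here.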


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5313 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-15 16:03                                   ` Austin S. Hemmelgarn
@ 2015-12-16 12:14                                     ` Christoph Anton Mitterer
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-16 12:14 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Hugo Mills, Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 891 bytes --]

On Tue, 2015-12-15 at 11:03 -0500, Austin S. Hemmelgarn wrote:
> May I propose the alternative option of adding a flag to tell mount
> to 
> _only_ use the devices specified in the options?
That's one part of exactly what I've been proposing for a few days :-P
(no one seems to read my mails ;-) )
Plus, this applies not only to mounts, but also to fsck, repair,
and all other userland tool operations.

But it's only part of the solution to the whole problem, the other one
is that automatic device activations/rebuilds/etc. of _already active_
devices should generally not happen (manual of course may happen, again
with device= options, specifying *which* devices are actually meant).

See my mail from "Fri, 11 Dec 2015 23:06:03 +0100"
(http://thread.gmane.org/gmane.comp.file-systems.btrfs/50909/focus=51147)
which I think covers pretty much all cases.

Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5313 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-15 14:19                           ` Austin S. Hemmelgarn
@ 2015-12-16 12:56                             ` Christoph Anton Mitterer
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-16 12:56 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 6035 bytes --]

On Tue, 2015-12-15 at 09:19 -0500, Austin S. Hemmelgarn wrote:
> Um, no you don't have direct physical access to the hardware with an
> ATM, at least, not unless you are going to take apart the cover and 
> anything else in your way (and probably set off internal alarms).
Well, access to the service ports (which may be USB) is typically much
easier, and doesn't require completely dismantling the steel and so on...
Simply because service teams also need to access these "regularly".

But even if we don't count ATMs here, take any other publicly accessible
computer terminals.
Library computers, the entertainment systems in airplanes, TVs in a
shopping centre, etc.


>   And 
> even without that, it's still possible to DoS an ATM without much 
> effort.  Most of them have a 3.5mm headphone jack for TTS for people 
> with poor vision, and that's more than enough to overload at least
> part 
> of the system with a relatively simple to put together bit of 
> electronics that would cost you less than 10 USD.
As I've said before... you can always find another weak link, of course...
as was pointed out before, USB itself is quite a security problem
(firmware attacks and the like).

But the fact that there are other issues right now is no
justification for making btrfs "weak" as well... because this just leads
to the vicious circle where everyone has security issues and, unwilling
to solve them, points to others as an excuse.


> > imagine you're running a VM hosting service, where you allow users
> > to
> > upload images and have them deployed.
> > In the "cheap" case these will end up as regular files, where they
> > couldn't do any harm (even if colliding UUIDs)... but even there
> > one
> > would have to expect, that the hypervisor admin may losetup them
> > for
> > whichever reason.
> > But if you offer more professional services, you may give your
> > clients
> > e.g. direct access to some storage backend, which are then probably
> > also seen on the host by its kernel.
> > And here we already have the case, that a client could remotely
> > trigger
> > such collision.
> In that particular situation, it's not relevant unless the host admin
> goes to mount them.  UUID collisions are only an issue if the 
> filesystems get mounted.
Hmm, from the impression I got so far, it was not only a problem when
actually mounting... but even if it were, this doesn't change the situation.
Same problem as before: the host system may have btrfs filesystems
whose IDs have leaked, the attacker may upload them as VM images as
described above, and even if the host's admin doesn't want to mount
those, he may mount what he considers his filesystems, which however
also collide.
Boom. Same issues as before.

Turn it as you want, resistance is futile ;-)


> > Well I think it's a proper way to e.g. handle the multi-device
> > case.
> > You have n devices, you want to differ them,... using a pseudo-
> > random
> > UUID is surely better than giving them numbers.
> That's debatable, the same issues are obviously present in both cases
> (individual numbers can collide too).
Sure, as I've said. You must always handle the case of accidentally or
maliciously colliding IDs if you count on data integrity and security.
But using UUIDs at least makes the chances small that you run into
collisions (that users must then manually resolve somehow) *even when*
you just create fresh filesystems, have no attacker, and no dd or the
like gets in your way.


> > Same for the fs UUID, e.g. when used for mounting devices whose
> > paths
> > aren't stable.
> In the case of a sanely designed system using LVM for example, device
> paths are stable.
Well, but LVM itself works with UUIDs again, so you just delegate the
problem.
And apart from that, with btrfs, I thought, we rather want to avoid
using LVM below.


> > As said before, using the UUID isn't the problem - not protecting
> > against collisions is.
> No, the issues are:
> 1. We assume that the UUID will be unique for the life of the 
> filesystem, which is not a safe assumption.
> 2. We don't sanely handle things if it isn't unique.
Well isn't that what I've said? At least it's what I've meant ;)


> > Well,... I don't think that writing *into* the filesystem is
> > covered by
> > common practise anymore.
> For end users, I agree.  Part of the discussion involves attacks on
> the 
> system, and for a attacker it's not a far stretch to write directly
> to 
> the block device if possible (and it's even common practice for 
> bypassing permission checks done in the VFS layer).
Well, but that's something else, which I don't think we can cover here.
What we must assume is that devices show up with colliding IDs, either
by "accident" or by means like dd... or by an attacker somehow being able
to make them show up (USB, the image upload scenarios I've described
before, and so on).

If the attacker can however write to *arbitrary* (and not just "his")
devices, bypassing checks in the VFS layer or anything else... well,
then it's game over.
He wouldn't need to bother with a probably complex attack based on
colliding btrfs UUIDs - he could simply overwrite the root filesystem
and reboot with his own malicious kernel/etc.


> My point is that by changing the UUID in a superblock (and properly 
> updating the checksum for the superblock), you can trivially break a 
> multi-device filesystem.  And it's a whole lot easier to do that than
> it 
> is to do the equivalent for LVM.
I'm a bit unsure what you're trying to show:
- Arbitrarily writing into a device (e.g. in the superblock) is IMHO
  not common practice, and not justified by common or historical use.
- If one does however do it (and it doesn't matter if the admin does it
  or an attacker), one would of course end up in the situation where
  btrfs should detect this and refuse mounting, fsck'ing, and the
  like.
  Problem solved.



Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5313 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-16 12:03                               ` Christoph Anton Mitterer
@ 2015-12-16 14:41                                 ` Chris Mason
  2015-12-16 15:04                                   ` Christoph Anton Mitterer
  0 siblings, 1 reply; 51+ messages in thread
From: Chris Mason @ 2015-12-16 14:41 UTC (permalink / raw)
  To: Christoph Anton Mitterer
  Cc: Hugo Mills, Austin S. Hemmelgarn, Chris Murphy, Btrfs BTRFS

On Wed, Dec 16, 2015 at 01:03:38PM +0100, Christoph Anton Mitterer wrote:
> On Tue, 2015-12-15 at 14:18 +0000, Hugo Mills wrote:
> >    That one's easy to answer. It deals with a major issue that
> > reiserfs had: if you have a filesystem with another filesystem image
> > stored on it, reiserfsck could end up deciding that both the metadata
> > blocks of the main filesystem *and* the metadata blocks of the image
> > were part of the same FS (because they're on the same block device),
> > and so would splice both filesystems into one, generally complaining
> > loudly along the way that there was a lot of corruption present that
> > it was trying to fix.
> Hmm that's a bit strange though, and to me it rather sounds like other
> bugs...
> You can have a ext4 on a file in an ext4, with or without the same
> UUIDs, and it will just work.

Hugo is right here.  reiserfs had tools that would scan an entire block
device for metadata blocks and try to reconstruct the filesystem based
on what it found.  Since there was no uuid, it was impossible to tell if
a block from the scan was really part of this filesystem or part of some
image file that happened to be sitting there.

Adding UUIDs doesn't make that whole class of problem go away (you could
have an image of the filesystem inside that filesystem), but it does
make it dramatically less likely.  
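
A toy Python sketch of that scanning problem. The block layout here
(a 4-byte magic followed by a 16-byte fsid) is invented for
illustration and is not the btrfs or reiserfs on-disk format; scan()
and make_block() are likewise made-up names.

```python
import uuid

BLOCK = 64          # toy block size
MAGIC = b"META"     # toy marker for "this block is metadata"


def make_block(fsid: uuid.UUID, payload: bytes) -> bytes:
    """Build one toy metadata block: magic, fsid, payload, zero padding."""
    return (MAGIC + fsid.bytes + payload).ljust(BLOCK, b"\0")


def scan(device: bytes, fsid=None):
    """Return offsets of blocks that look like metadata.

    Without `fsid` every block carrying the magic is accepted, which is
    the situation described for the old scanning tools. With `fsid`,
    blocks stamped with a different filesystem UUID are skipped.
    """
    hits = []
    for off in range(0, len(device) - BLOCK + 1, BLOCK):
        blk = device[off:off + BLOCK]
        if blk[:4] != MAGIC:
            continue
        if fsid is not None and blk[4:20] != fsid.bytes:
            continue
        hits.append(off)
    return hits


outer = uuid.UUID(int=1)   # the filesystem on the block device
inner = uuid.UUID(int=2)   # a filesystem image stored as a plain file

device = (
    make_block(outer, b"root")
    + b"\xff" * BLOCK                   # ordinary file data
    + make_block(inner, b"image-root")  # metadata inside the image file
    + make_block(outer, b"leaf")
)

print("no uuid check:", scan(device))
print("uuid check:   ", scan(device, outer))
```

If the image file were a copy of the outer filesystem itself, its
blocks would carry the same fsid and pass the check, which is the
remaining case mentioned above.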

At the end of the day it's just a best practice mechanism to help
recovery and prevent admin mistakes.  It's also a building block of the
multi-device support.

We could change the multi-device support to allow duplicate uuids in
single device filesystems.  But I'd much rather see a variation on seed
devices enable transitioning from one uuid to another.

> If the filesystem takes contents from a normal file as possible
> metadata, then something else is severely screwed up... or in case of
> the fsck: it probably means it's a bit too liberal in searching places.
> 
> I'd be quite shocked if this is the case in btrfs, cause it would mean
> again, that we have a vulnerability against UUID collisions.

> Imagine some attacker finds out the UUID of a filesystem (which is
> probably rather easy)... next he uploads some file (e.g. it's a
> webserver which allows image uploads, a forum perhaps) that in reality
> contains what looks like btrfs metadata and uses a matching UUID.
> 
> It would run into the same issues as what you describe for reiser,..
> the UUID would be no real help to solve that problem.
> 
> 
> Does anyone know whether btrfsck (or other userland) tools do such
> things? I.e. search more or less arbitrary blocks, where it cannot be
> sure it's *not* data, for what it would interpret as meta-data
> subsequently?

These are emergency tools, btrfs restore and find-roots can do some
scanning.  We don't do it the way reiserfs did because it would be very
difficult to reconstruct shared data and metadata from snapshots.

-chris

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-16 14:41                                 ` Chris Mason
@ 2015-12-16 15:04                                   ` Christoph Anton Mitterer
  2015-12-17  3:25                                     ` Duncan
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-16 15:04 UTC (permalink / raw)
  To: Chris Mason; +Cc: Hugo Mills, Austin S. Hemmelgarn, Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1682 bytes --]

On Wed, 2015-12-16 at 09:41 -0500, Chris Mason wrote:
> Hugo is right here.  reiserfs had tools that would scan an entire
> block
> device for metadata blocks and try to reconstruct the filesystem
> based
> on what it found.
Creepy... at least when talking about a "normal" fsck... good that
btrfs is going to be the next-gen-ext, and not reiser4 ;)

> Adding UUIDs doesn't make that whole class of problem go away (you
> could
> have an image of the filesystem inside that filesystem), but it does
> make it dramatically less likely.  
Sure...


> > Does anyone know whether btrfsck (or other userland) tools do such
> > things? I.e. search more or less arbitrary blocks, where it cannot
> > be
> > sure it's *not* data, for what it would interpret as meta-data
> > subsequently?
> 
> These are emergency tools, btrfs restore and find-roots can do some
> scanning.  We don't do it the way reiserfs did because it would be
> very
> difficult to reconstruct shared data and metadata from snapshots.

Hmm, I agree that it's valid for such tools to do these kinds of scans
(i.e. scan for meta-data in places that are not known for sure to be
meta-data) when doing some last-resort rescue attempts... or for rescue
operations where it's clearly documented that this is done.

But I think it shouldn't happen e.g. during a normal fsck - only when
special options are given.
And it should be properly documented (i.e. telling people in the docs
that this does a block-by-block scan for meta-data even within normal
data, and that if they had e.g. another fs with the same UUID within,
the results may be completely bogus).


Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5313 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-16 12:03                             ` Christoph Anton Mitterer
@ 2015-12-17  2:43                               ` Duncan
  0 siblings, 0 replies; 51+ messages in thread
From: Duncan @ 2015-12-17  2:43 UTC (permalink / raw)
  To: linux-btrfs

Christoph Anton Mitterer posted on Wed, 16 Dec 2015 13:03:24 +0100 as
excerpted:

> Human readable labels are basically guaranteed to collide,

Heh, not here, tho one could argue that my labels aren't "human 
readable", I suppose.

grep LABEL= /etc/fstab | cut -f1
LABEL=bt0238gcn1+35l0
LABEL=bt0238gcn0+35l0
LABEL=bt0465gsg0+47f0
LABEL=rt0238gcnx+35l0
LABEL=rt0238gcnx+35l1
LABEL=rt0465gsg0+47f0
LABEL=hm0238gcnx+35l0
LABEL=pk0238gcnx+35l0
LABEL=nr0238gcnx+35l0
LABEL=hm0238gcnx+35l1
LABEL=pk0238gcnx+35l1
LABEL=nr0238gcnx+35l1
LABEL=hm0465gsg0+47f0
LABEL=pk0465gsg0+47f0
LABEL=nr0465gsg0+47f0
LABEL=lg0238gcnx+35l0
LABEL=lg0465gsg0+47f0
LABEL=mm0465gsg0+2550
LABEL=mm0465gsg0+2551
#LABEL=sw0465gsg0+47f0

The scheme was originally designed with reiserfs' 15-char limited labels 
in mind, so it's 15-char.  These days I use it for both fs labels and gpt 
partition names/labels, with the two generally matched except for the 
device sequential, which is x in the multi-device case.

* function:	2 chars	bt=boot, hm=home, etc

* device-id:	8	uniq-in-scope device id
** size: 	5	0238g=238 GiB
** brand:	2	sg=seagate, cn=corsair neutron, etc
** dev-seq:	1	can be more than one 465 GiB seagate

* target:	1	+=home workstation, . for the netbook, etc

* date:		3	date of original partition creation
** year:	1	last digit of year, gives decade scope
** month:	1	1-9abc
** day:		1	1-9a-v (2char would be nice here, but...)

* func-seq:	1	0=working, backup-N

2+8+1+3+1=15 chars =:^)
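
A short Python sketch that splits such a label back into its fields,
going strictly by the field widths listed above. parse_label() and the
Label tuple are names made up for this illustration; the date digits are
returned as raw numbers (year digit, month 1-12, day 1-31) with no
attempt to guess the decade.

```python
from typing import NamedTuple

# 1-9 followed by a-v, i.e. the values 1..31 in one character each
DIGITS = "123456789abcdefghijklmnopqrstuv"


class Label(NamedTuple):
    function: str
    size: str
    brand: str
    dev_seq: str
    target: str
    year_digit: int
    month: int
    day: int
    func_seq: str


def parse_label(label: str) -> Label:
    """Split a 15-char label into function, device-id, target, date, func-seq."""
    if len(label) != 15:
        raise ValueError(f"expected 15 characters, got {len(label)}")
    date = label[11:14]
    month = DIGITS.index(date[1]) + 1   # 1-9abc
    if month > 12:
        raise ValueError(f"bad month character {date[1]!r}")
    return Label(
        function=label[0:2],
        size=label[2:7],
        brand=label[7:9],
        dev_seq=label[9],
        target=label[10],
        year_digit=int(date[0]),
        month=month,
        day=DIGITS.index(date[2]) + 1,  # 1-9a-v
        func_seq=label[14],
    )


print(parse_label("bt0465gsg0+47f0"))
```

The fixed offsets are what the 15-char budget buys: no separators are
needed, at the price of every field having a hard maximum width.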

So for example rt0238gcnx+35l0 is root, on a 238 GiB Corsair Neutron
(multi-device), targeted at the workstation, with the partitions originally
set up in 2013, June (something, whatever l is), working copy.

(Hmm...  Only apropos to this thread due to the tangential btrfs angle, 
but that's two and a half years ago.  Which since that's when I first 
deployed btrfs permanently, I've been running btrfs for two and a half 
years now. ... =:^)

The function tells me at a glance what it's intended to be used for.

The target (which also functions as a visual separator) tells me at a 
glance where the device is intended to be used.

The func-seq tells me at a glance whether I'm dealing with the working 
copy or what level of backup, and taken together with the function and 
target, uniquely ID the partition/filesystem "software device".

The dev-id is uniq-in-scope, easily IDing size, brand, and number of 
"hardware device", and size is ridiculously scalable from bytes to PiB 
and beyond.  For multi-device btrfs, dev-seq is "x", while the individual 
device partitions composing it still have their sequence numbers in their 
gpt labels.

The date (along with size, of course) provides some idea of the age of 
the device, or at least the partitioning scheme on it, as well as 
providing more bits of "software device" and overall unique-id.

Both sequence numbers can easily and intuitively scale to 61 (1-9a-zA-Z) 
if needed, and less intuitively a bit higher if it's really necessary.  
Target would lose its separator status if it scaled too far, but 
certainly gives me as an individual /reasonable/ number of machines 
flexibility.

This scheme self-evidently and easily scales to a library well into the 
multi-hundreds if not thousands of physical devices, portable or 
permanently installed, partitioned up as needed.  I haven't yet found the 
need as my "device library" is small enough, but were I to need to, I 
could reasonably easily put together a database tracking where various 
files (and even various versions of those files) are located.  With the 
"software device" and "hardware device" IDed separately, I can easily 
substitute out or add/remove hardware devices from software devices, or 
the reverse, as necessary.

The biggest problem is the 15-char limit; I had to pack the fields rather 
tighter and more cryptically than I'd have liked, so it's not as easily 
human readable as I'd have liked.  And of course it'd need to be adapted for 
deployment scales on the level of facebook/google/nsa, where 60-some 
device-scaling in the sequence numbers, and the target scaling as well, 
is pitifully laughable, but it's certainly reasonable on an individual 
scale, and with a couple revisions for mdraid and btrfs (basically, md 
for brand when I was doing partitioned mdraid, and substituting x for 
individual sequence number for multi-device), the scheme has served me 
surprisingly well over the years since I came up with it, and should 
continue to do so, I suppose, until I no longer have the need (death, or 
near-vegetable in a nursing home or whatever).  Tho if HP's "the machine" 
were to ever take off in my lifetime, it could prove somewhat... 
challenging to the mental and nomenclature model, but that pretty much 
applies to the entire computer field, both hardware and software, as we 
know it, so I'm far from alone, there.

But, despite the debatable human-readability, it's a h*** of a lot more 
readable than UUIDs, and works very well indeed in LABEL= usage in fstab, 
being a h*** of a lot easier to work with there than UUIDs! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-16 15:04                                   ` Christoph Anton Mitterer
@ 2015-12-17  3:25                                     ` Duncan
  2015-12-18  0:56                                       ` Christoph Anton Mitterer
  2015-12-22  2:13                                       ` Kai Krakow
  0 siblings, 2 replies; 51+ messages in thread
From: Duncan @ 2015-12-17  3:25 UTC (permalink / raw)
  To: linux-btrfs

Christoph Anton Mitterer posted on Wed, 16 Dec 2015 16:04:03 +0100 as
excerpted:

> On Wed, 2015-12-16 at 09:41 -0500, Chris Mason wrote:

>> Hugo is right here.  reiserfs had tools that would scan an entire
>> block device for metadata blocks and try to reconstruct the filesystem
>> based on what it found.

> Creepy... at least when talking about a "normal" fsck... good that btrfs
> is going to be the next-gen-ext, and not reiser4 ;)

What often gets lost in discussions of this nature is that it _wasn't_ 
"normal" fsck that had the problem, but rather, a parameter 
(--rebuild-tree, IIRC) much like the ones btrfs check (--init-csum-tree,
--init-extent-tree) and rescue (chunk-recover) use for blowing away and 
recreating the checksum tree, extent tree, chunk tree, etc.

So it's definitely _not_ something that reiserfsck would do in a "normal" 
fsck, only when doing "I'm desperate and don't have backups, go to the 
ends of the earth if necessary to recover what you can of my data, and 
yes, I understand it could be a bit risky or end up rather disordered, 
but I'm willing to take that risk because I _am_ that desperate", level 
recovery.

Arguably, however, the problem was that reiserfs (heh, that's the second 
time I almost wrote btrfs and caught it, hope I didn't miss any! =:^) had 
a rather minor items repair mode, and an "I'm desperate, ends of the 
earth and I don't care about the risk as anything is better than nothing" 
mode, but not a lot of choice in between the two.  Additionally, now 
looking at btrfs (a correct reference this time! =:^), the "desperate" 
solution in btrfs is rather more fine-grained, including at least the 
three above options plus one for the superblock, with an additional read-
only restore tool that can often restore most or all data to elsewhere, 
in the case of a missed or not current backup, that reiserfs never had.

But AFAIK reiser4 (which I never actually tried as it never made 
mainline, which in general I prefer to stick to, but I read about it) 
improved on the reiserfs model in this regard as well -- indeed, it would 
have been surprising if it didn't, since both reiser4 and btrfs had the 
lessons of reiserfs to build upon.

And of course reiserfs might have gotten the same sort of tool changes 
too, except for Hans Reiser's controversial policy of letting stable be 
stable, and putting the improvements into reiser4, which of course was 
intended to get into mainline in some reasonable time and thus wouldn't 
have left reiserfs users so in the lurch as actually happened, because 
reiser4 never did hit mainline due to $reasons, most/all of which I agree 
with, or at least understand, where I don't entirely agree.

But anyway, for anyone with half a tech-oriented brain, it was very 
evident that the required options were "desperate" level, and for people 
without half a tech-oriented brain, the documentation clearly suggested 
danger ahead, you should have backups if you're going to do this as it's 
a risky process that could destroy chances of recovery instead of fixing 
things, as well.

But of course so many don't read the docs, they just do it... and 
sometimes they suffer the consequences when they do... and sometimes then 
try to blame others for it.  <shrug>  That's the way of the world; not 
something we're going to change.

Even the required, actually spelled-out "yes" confirmation, not just "y", 
didn't stop people, either from doing it or from blaming reiserfs for 
problems that were in fact mostly their own, when they went ahead anyway.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-17  3:25                                     ` Duncan
@ 2015-12-18  0:56                                       ` Christoph Anton Mitterer
  2015-12-22  2:13                                       ` Kai Krakow
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Anton Mitterer @ 2015-12-18  0:56 UTC (permalink / raw)
  To: Duncan, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 659 bytes --]

On Thu, 2015-12-17 at 03:25 +0000, Duncan wrote:
> So it's definitely _not_ something that reiserfsck would do in a
> "normal" 
> fsck, only when doing "I'm desperate and don't have backups, go to
> the 
> ends of the earth if necessary to recover what you can of my data,
> and 
> yes, I understand it could be a bit risky or end up rather
> disordered, 
> but I'm willing to take that risk because I _am_ that desperate",
> level 
> recovery.
Well, as long as that was/is clearly documented (which in the btrfs
case would need to include any warnings about issues with multi-dev, if
any), then it's IMHO completely okay.
:)

Cheers,
Chris.


* Re: attacking btrfs filesystems via UUID collisions?
  2015-12-17  3:25                                     ` Duncan
  2015-12-18  0:56                                       ` Christoph Anton Mitterer
@ 2015-12-22  2:13                                       ` Kai Krakow
  1 sibling, 0 replies; 51+ messages in thread
From: Kai Krakow @ 2015-12-22  2:13 UTC (permalink / raw)
  To: linux-btrfs

On Thu, 17 Dec 2015 03:25:50 +0000 (UTC),
Duncan <1i5t5.duncan@cox.net> wrote:

> So it's definitely _not_ something that reiserfsck would do in a
> "normal" fsck, only when doing "I'm desperate and don't have backups,
> go to the ends of the earth if necessary to recover what you can of
> my data, and yes, I understand it could be a bit risky or end up
> rather disordered, but I'm willing to take that risk because I _am_
> that desperate", level recovery.

What's fascinating: reiserfs was actually quite good at that, and it
actually saved me from an "I'm desperate and don't have backups,
go to the ends of the earth if necessary to recover what you can of
my data, and yes, I understand it could be a bit risky or end up
rather disordered, but I'm willing to take that risk because I _am_
that desperate" situation (phew, that's long). According to checksums,
all files except some in-flight temporary data were completely intact
(in addition to many files which came back out of nowhere, even ending
up in their original directory, but not so intact). Lucky me... :-D

Cause of this was an unstable RAID controller which switched one hard
disk after the next into offline mode, then completely went offline
itself - leaving me with a system still running acceptably from cache
only. It was strange...

And reiserfs did this magic twice for me (but the second time I had
current backups, just wanted to have a copy of files created since the
nightly backup).

BTW: Ext3 partitions on the same hardware were broken beyond repair and
had to be recreated. e2fsck only made it worse.

Apparently, reiserfs did not scale at all to multithreaded
workloads, which is why I switched to xfs. It seemed pretty good at
that, especially on RAID with its behavior of distributing data
diagonally across the disks, though I wouldn't recommend it without a
BBU, as it tends to nullify file contents during log replay. It has
proven similarly stable in case of hardware havoc.

BTW2: The server with the "RAID controller accident" is still in
production, but it has since been converted to XFS and migrated into
virtualization. And yes: It has a daily backup schedule. :-)

-- 
Regards,
Kai

Replies to list-only preferred.



Thread overview: 51+ messages
2015-12-04 12:05 Subvolume UUID, data corruption? S.J
2015-12-04 13:07 ` Hugo Mills
2015-12-05  3:28   ` Christoph Anton Mitterer
2015-12-05  5:52     ` attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?) Christoph Anton Mitterer
2015-12-05 12:01     ` Subvolume UUID, data corruption? Hugo Mills
2015-12-06  1:51       ` attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?) Christoph Anton Mitterer
2015-12-11 12:33       ` Subvolume UUID, data corruption? Austin S. Hemmelgarn
2015-12-05 13:19     ` Duncan
2015-12-06  1:51       ` attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?) Christoph Anton Mitterer
2015-12-06  4:06         ` Duncan
2015-12-09  5:07           ` Christoph Anton Mitterer
2015-12-09 11:54             ` Duncan
2015-12-06 14:34         ` attacking btrfs filesystems via UUID collisions? Qu Wenruo
2015-12-06 20:55           ` Chris Murphy
2015-12-09  5:39           ` Christoph Anton Mitterer
2015-12-09 21:48             ` S.J.
2015-12-10 12:08               ` Austin S Hemmelgarn
2015-12-10 12:41                 ` Hugo Mills
2015-12-10 12:57                   ` S.J.
2015-12-10 19:42               ` Chris Murphy
2015-12-11 22:21                 ` Christoph Anton Mitterer
2015-12-11 22:32                   ` Christoph Anton Mitterer
2015-12-11 23:06                   ` Chris Murphy
2015-12-12  1:34                     ` S.J.
2015-12-14  0:28                       ` Christoph Anton Mitterer
2015-12-14  0:27                     ` Christoph Anton Mitterer
2015-12-14 13:23                       ` Austin S. Hemmelgarn
2015-12-14 21:26                         ` Chris Murphy
2015-12-15  0:35                           ` Christoph Anton Mitterer
2015-12-15 13:54                           ` Austin S. Hemmelgarn
2015-12-15 14:18                             ` Hugo Mills
2015-12-15 14:27                               ` Austin S. Hemmelgarn
2015-12-15 14:42                                 ` Hugo Mills
2015-12-15 16:03                                   ` Austin S. Hemmelgarn
2015-12-16 12:14                                     ` Christoph Anton Mitterer
2015-12-16 12:10                                   ` Christoph Anton Mitterer
2015-12-16 12:03                               ` Christoph Anton Mitterer
2015-12-16 14:41                                 ` Chris Mason
2015-12-16 15:04                                   ` Christoph Anton Mitterer
2015-12-17  3:25                                     ` Duncan
2015-12-18  0:56                                       ` Christoph Anton Mitterer
2015-12-22  2:13                                       ` Kai Krakow
2015-12-16 12:03                             ` Christoph Anton Mitterer
2015-12-17  2:43                               ` Duncan
2015-12-15  0:08                         ` Christoph Anton Mitterer
2015-12-15 14:19                           ` Austin S. Hemmelgarn
2015-12-16 12:56                             ` Christoph Anton Mitterer
2015-12-14 20:55                       ` Chris Murphy
2015-12-15  0:22                         ` Christoph Anton Mitterer
2015-12-11 23:14                   ` Eric Sandeen
2015-12-11 22:06               ` Christoph Anton Mitterer
