* PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
@ 2014-12-01 12:56 MegaBrutal
  2014-12-01 17:27 ` Robert White
  2014-12-01 21:45 ` Konstantin
  0 siblings, 2 replies; 31+ messages in thread
From: MegaBrutal @ 2014-12-01 12:56 UTC (permalink / raw)
  To: linux-btrfs

Hi all,

I've reported the bug I previously posted about in "BTRFS messes up
snapshot LV with origin" to the Kernel Bug Tracker:
https://bugzilla.kernel.org/show_bug.cgi?id=89121

Since the other thread went off into theoretical debates about UUIDs
and their generic relation to BTRFS, their everyday use cases, and the
philosophical meaning behind the uniqueness of copies and UUIDs, I'd
like to specifically ask you to post here only about the ACTUAL problem
at hand. Don't get me wrong, I find the discussion in the other thread
really interesting and I'm following it, but it is only very remotely
related to the original issue, so please keep it there! If you're
interested in catching up on the actual bug symptoms, please read the
bug report linked above and (optionally) reproduce the problem
yourself!

A virtual machine image on which I've already reproduced the
conditions can be downloaded here:
http://undead.megabrutal.com/kvm-reproduce-1391429.img.xz
(Download size: 113 MB; Unpacked image size: 2 GB.)

Re-tested with mainline kernel 3.18.0-rc7 just today.


Regards,
MegaBrutal

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-01 12:56 PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots MegaBrutal
@ 2014-12-01 17:27 ` Robert White
  2014-12-01 22:10   ` MegaBrutal
  2014-12-01 21:45 ` Konstantin
  1 sibling, 1 reply; 31+ messages in thread
From: Robert White @ 2014-12-01 17:27 UTC (permalink / raw)
  To: MegaBrutal, linux-btrfs

On 12/01/2014 04:56 AM, MegaBrutal wrote:
> Since the other thread went off into theoretical debates about UUIDs
> and their generic relation to BTRFS, their everyday use cases, and the
> philosophical meaning behind uniqueness of copies and UUIDs; I'd like
> to specifically ask you to only post here about the ACTUAL problem at
> hand. Don't get me wrong, I find the discussion in the other thread
> really interesting, I'm following it, but it is only very remotely
> related to the original issue, so please keep it there! If you're
> interested to catch up about the actual bug symptoms, please read the
> bug report linked above, and (optionally) reproduce the problem
> yourself!

That discussion _was_ the actual discussion of the actual problem. A 
problem that is not particularly theoretical, a problem that is common 
to block-level snapshots, and a discussion that contained the actual 
work-arounds.

I suggest a re-read. 8-)


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-01 12:56 PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots MegaBrutal
  2014-12-01 17:27 ` Robert White
@ 2014-12-01 21:45 ` Konstantin
  2014-12-02  5:47   ` MegaBrutal
  2014-12-02 19:19   ` Phillip Susi
  1 sibling, 2 replies; 31+ messages in thread
From: Konstantin @ 2014-12-01 21:45 UTC (permalink / raw)
  To: MegaBrutal, linux-btrfs


MegaBrutal schrieb am 01.12.2014 um 13:56:
> Hi all,
>
> I've reported the bug I've previously posted about in "BTRFS messes up
> snapshot LV with origin" in the Kernel Bug Tracker.
> https://bugzilla.kernel.org/show_bug.cgi?id=89121
Hi MegaBrutal. If I understand your report correctly, I can give you
another example where this bug appears. It is so bad that it ends up
freezing the system, and I'm quite sure it's the same thing. I was
thinking about filing a bug but haven't had the time for that yet. Maybe
you could add this case to your bug report as well.

The bug also appears when using mdadm RAID1 - when one of the drives is
detached from the array, the OS discovers it and after a while (not
immediately, it takes several minutes) it shows up under /proc/mounts:
instead of /dev/md0p1 I see /dev/sdb1 there. And usually after an hour
or so (depending on system workload) the PC completely freezes. So
whatever the conclusion about the uniqueness of UUIDs, a crashing
kernel is telling me that there is a serious bug.

While in my case the detaching was intentional, there are several real
scenarios in which a RAID1 disk can get detached, and currently this
leads to crashing the server when using BTRFS. That's not what is
intended when using RAID ;-).

In my case I wanted to do something which had worked perfectly for
years with all other file systems - checking the file system of the
root disk while the server is running. The procedure is simple (rough
commands follow the list):

1. detach one of the disks
2. do fsck on the disk device
3. mdadm --zero-superblock on the device so it gets completely rewritten
4. mdadm --add it to the array
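
In commands, that procedure looks roughly like this (a sketch only; the
device names are illustrative, with /dev/md0 as the array, /dev/sdb as the
detached member and the btrfs partition on /dev/sdb1):

mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb   # 1. detach one mirror
btrfsck /dev/sdb1                                  # 2. check the detached copy
mdadm --zero-superblock /dev/sdb                   # 3. wipe its RAID metadata
mdadm /dev/md0 --add /dev/sdb                      # 4. re-add, let it resync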

There were some surprises with BTRFS: if step 2 is not done directly
after step 1, btrfsck refuses to check the disk because /proc/mounts
reports it as mounted. And during step 2, or even after finishing it,
the system would freeze. If I got to step 4 fast enough everything was
OK, but again, that's not what I expect from a good operating system.
Any objections?

Konstantin


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-01 17:27 ` Robert White
@ 2014-12-01 22:10   ` MegaBrutal
  2014-12-01 23:24     ` Robert White
  0 siblings, 1 reply; 31+ messages in thread
From: MegaBrutal @ 2014-12-01 22:10 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robert White

2014-12-01 18:27 GMT+01:00 Robert White <rwhite@pobox.com>:
> On 12/01/2014 04:56 AM, MegaBrutal wrote:
>>
>> Since the other thread went off into theoretical debates about UUIDs
>> and their generic relation to BTRFS, their everyday use cases, and the
>> philosophical meaning behind uniqueness of copies and UUIDs; I'd like
>> to specifically ask you to only post here about the ACTUAL problem at
>> hand. Don't get me wrong, I find the discussion in the other thread
>> really interesting, I'm following it, but it is only very remotely
>> related to the original issue, so please keep it there! If you're
>> interested to catch up about the actual bug symptoms, please read the
>> bug report linked above, and (optionally) reproduce the problem
>> yourself!
>
>
> That discussion _was_ the actual discussion of the actual problem. A problem
> that is not particularly theoretical, a problem that is common to
> block-level snapshots, and a discussion that contained the actual
> work-arounds.
>
> I suggest a re-read. 8-)
>

The majority of the discussion was about how the kernel should react
UPON mounting a file system when more than one device with the same
UUID exists on the system. While that is a very legitimate problem
worth discussing and mitigating, it is not the same situation as how
the kernel behaves when an identical device appears WHILE the file
system is already mounted.

Actually, I would not identify devices by UUID when I know that
duplicates could exist due to snapshots, which is why I mount devices
by their LVM paths. And when a file system is already mounted with all
its devices, that is a clear situation: all devices are open and held
by the kernel, so any mixup at that point is an error. What about
multiple-device file systems? Supply all their devices with device=
mount options (see the example below). Just don't identify devices by
UUID when you know there could be duplicates; use UUIDs when you don't
use LVM. Identifying file systems by UUID was invented because classic
/dev/sdXX device names might change. But LVM names don't change - they
only change when you intentionally rename them, e.g. with lvrename.
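
For example, a two-device btrfs could be pinned to its LVM paths like this
(the vg-data1/vg-data2 names are made up for illustration):

mount -t btrfs -o device=/dev/mapper/vg-data1,device=/dev/mapper/vg-data2 \
    /dev/mapper/vg-data1 /mnt

The same device= options can also go into an fstab entry, or into rootflags=
on the kernel command line for the root filesystem.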

Since having duplicate UUIDs on devices is not a problem for me - I can
tell the devices apart by their LVM names - the discussion is of little
relevance to my use case. Of course it's interesting and I enjoy
following it, but it is not about the actual problem at hand.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-01 22:10   ` MegaBrutal
@ 2014-12-01 23:24     ` Robert White
  2014-12-02  0:15       ` MegaBrutal
  0 siblings, 1 reply; 31+ messages in thread
From: Robert White @ 2014-12-01 23:24 UTC (permalink / raw)
  To: MegaBrutal, linux-btrfs

On 12/01/2014 02:10 PM, MegaBrutal wrote:
> Since having duplicate UUIDs on devices is not a problem for me since
> I can tell them apart by LVM names, the discussion is of little
> relevance to my use case. Of course it's interesting and I like to
> read it along, it is not about the actual problem at hand.
>

Which is why you use the device= mount option, which would take LVM 
names and which was repeatedly discussed as solving this very problem.

Once you decide to duplicate the UUIDs with LVM snapshots you take up 
the burden of disambiguating your storage.

Which is part of why re-reading was suggested as this was covered in 
some depth and _is_ _exactly_ about the problem at hand.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-01 23:24     ` Robert White
@ 2014-12-02  0:15       ` MegaBrutal
  2014-12-02  7:50         ` Goffredo Baroncelli
  0 siblings, 1 reply; 31+ messages in thread
From: MegaBrutal @ 2014-12-02  0:15 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robert White

2014-12-02 0:24 GMT+01:00 Robert White <rwhite@pobox.com>:
> On 12/01/2014 02:10 PM, MegaBrutal wrote:
>>
>> Since having duplicate UUIDs on devices is not a problem for me since
>> I can tell them apart by LVM names, the discussion is of little
>> relevance to my use case. Of course it's interesting and I like to
>> read it along, it is not about the actual problem at hand.
>>
>
> Which is why you use the device= mount option, which would take LVM names
> and which was repeatedly discussed as solving this very problem.
>
> Once you decide to duplicate the UUIDs with LVM snapshots you take up the
> burden of disambiguating your storage.
>
> Which is part of why re-reading was suggested as this was covered in some
> depth and _is_ _exactly_ about the problem at hand.

Nope.

root@reproduce-1391429:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic
root=/dev/mapper/vg-rootlv ro
rootflags=device=/dev/mapper/vg-rootlv,subvol=@

Observe, device= mount option is added.


root@reproduce-1391429:~# ./reproduce-1391429.sh
#!/bin/sh -v
lvs
  LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
  rootlv vg   -wi-ao---   1.00g
  swap0  vg   -wi-ao--- 256.00m

grub-probe --target=device /
/dev/mapper/vg-rootlv

grep " / " /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-1 / btrfs rw,relatime,space_cache 0 0

lvcreate --snapshot --size=128M --name z vg/rootlv
  Logical volume "z" created

lvs
  LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
  rootlv vg   owi-aos--   1.00g
  swap0  vg   -wi-ao--- 256.00m
  z      vg   swi-a-s-- 128.00m      rootlv   0.11

ls -l /dev/vg/
total 0
lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2

grub-probe --target=device /
/dev/mapper/vg-z

grep " / " /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-2 / btrfs rw,relatime,space_cache 0 0

lvremove --force vg/z
  Logical volume "z" successfully removed

grub-probe --target=device /
/dev/mapper/vg-rootlv

grep " / " /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-1 / btrfs rw,relatime,space_cache 0 0


Problem still reproduces.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-01 21:45 ` Konstantin
@ 2014-12-02  5:47   ` MegaBrutal
  2014-12-02 19:19   ` Phillip Susi
  1 sibling, 0 replies; 31+ messages in thread
From: MegaBrutal @ 2014-12-02  5:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Konstantin

2014-12-01 22:45 GMT+01:00 Konstantin <newsbox1026@web.de>:
>
> MegaBrutal schrieb am 01.12.2014 um 13:56:
>> Hi all,
>>
>> I've reported the bug I've previously posted about in "BTRFS messes up
>> snapshot LV with origin" in the Kernel Bug Tracker.
>> https://bugzilla.kernel.org/show_bug.cgi?id=89121
> Hi MegaBrutal. If I understand your report correctly, I can give you
> another example where this bug is appearing. It is so bad that it leads
> to freezing the system and I'm quite sure it's the same thing. I was
> thinking about filing a bug but didn't have the time for that yet. Maybe
> you could add this case to your bug report as well.
>
> The bug appears also when using mdadm RAID1 - when one of the drives is
> detached from the array then the OS discovers it and after a while (not
> directly, it takes several minutes) it appears under /proc/mounts:
> instead of /dev/md0p1 I see there /dev/sdb1. And usually after some hour
> or so (depending on system workload) the PC completely freezes. So
> discussion about the uniqueness of UUIDs or not, a crashing kernel is
> telling me that there is a serious bug.
>

Hmm, I also suspect our symptoms have the same root cause. It seems
the same thing happens: the BTRFS module notices another device with
the same file system and starts to report it as the root device. It
seems like it has no idea that it's part of a RAID configuration or
anything.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-02  0:15       ` MegaBrutal
@ 2014-12-02  7:50         ` Goffredo Baroncelli
  2014-12-02  8:28           ` MegaBrutal
  0 siblings, 1 reply; 31+ messages in thread
From: Goffredo Baroncelli @ 2014-12-02  7:50 UTC (permalink / raw)
  To: MegaBrutal, linux-btrfs; +Cc: Robert White

On 12/02/2014 01:15 AM, MegaBrutal wrote:
> 2014-12-02 0:24 GMT+01:00 Robert White <rwhite@pobox.com>:
>> On 12/01/2014 02:10 PM, MegaBrutal wrote:
>>>
>>> Since having duplicate UUIDs on devices is not a problem for me since
>>> I can tell them apart by LVM names, the discussion is of little
>>> relevance to my use case. Of course it's interesting and I like to
>>> read it along, it is not about the actual problem at hand.
>>>
>>
>> Which is why you use the device= mount option, which would take LVM names
>> and which was repeatedly discussed as solving this very problem.
>>
>> Once you decide to duplicate the UUIDs with LVM snapshots you take up the
>> burden of disambiguating your storage.
>>
>> Which is part of why re-reading was suggested as this was covered in some
>> depth and _is_ _exactly_ about the problem at hand.
> 
> Nope.
> 
> root@reproduce-1391429:~# cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic
> root=/dev/mapper/vg-rootlv ro
> rootflags=device=/dev/mapper/vg-rootlv,subvol=@
> 
> Observe, device= mount option is added.

The device= option is needed only in a btrfs multi-volume scenario.
If you have only one disk, it is not needed.

> 
> 
> root@reproduce-1391429:~# ./reproduce-1391429.sh
> #!/bin/sh -v
> lvs
>   LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
>   rootlv vg   -wi-ao---   1.00g
>   swap0  vg   -wi-ao--- 256.00m
> 
> grub-probe --target=device /
> /dev/mapper/vg-rootlv
> 
> grep " / " /proc/mounts
> rootfs / rootfs rw 0 0
> /dev/dm-1 / btrfs rw,relatime,space_cache 0 0
> 
> lvcreate --snapshot --size=128M --name z vg/rootlv
>   Logical volume "z" created
> 
> lvs
>   LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
>   rootlv vg   owi-aos--   1.00g
>   swap0  vg   -wi-ao--- 256.00m
>   z      vg   swi-a-s-- 128.00m      rootlv   0.11
> 
> ls -l /dev/vg/
> total 0
> lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
> lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
> lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2
> 
> grub-probe --target=device /
> /dev/mapper/vg-z
>
> grep " / " /proc/mounts
> rootfs / rootfs rw 0 0
> /dev/dm-2 / btrfs rw,relatime,space_cache 0 0

What does /proc/self/mountinfo contain?

And a more important question: is only the value
reported by /proc/mounts wrong, or is the filesystem
content affected as well?

> 
> lvremove --force vg/z
>   Logical volume "z" successfully removed
> 
> grub-probe --target=device /
> /dev/mapper/vg-rootlv
> 
> grep " / " /proc/mounts
> rootfs / rootfs rw 0 0
> /dev/dm-1 / btrfs rw,relatime,space_cache 0 0
> 
> 
> Problem still reproduces.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-02  7:50         ` Goffredo Baroncelli
@ 2014-12-02  8:28           ` MegaBrutal
  2014-12-02 11:14             ` Goffredo Baroncelli
  0 siblings, 1 reply; 31+ messages in thread
From: MegaBrutal @ 2014-12-02  8:28 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kreijack, Robert White

2014-12-02 8:50 GMT+01:00 Goffredo Baroncelli <kreijack@inwind.it>:
> On 12/02/2014 01:15 AM, MegaBrutal wrote:
>> 2014-12-02 0:24 GMT+01:00 Robert White <rwhite@pobox.com>:
>>> On 12/01/2014 02:10 PM, MegaBrutal wrote:
>>>>
>>>> Since having duplicate UUIDs on devices is not a problem for me since
>>>> I can tell them apart by LVM names, the discussion is of little
>>>> relevance to my use case. Of course it's interesting and I like to
>>>> read it along, it is not about the actual problem at hand.
>>>>
>>>
>>> Which is why you use the device= mount option, which would take LVM names
>>> and which was repeatedly discussed as solving this very problem.
>>>
>>> Once you decide to duplicate the UUIDs with LVM snapshots you take up the
>>> burden of disambiguating your storage.
>>>
>>> Which is part of why re-reading was suggested as this was covered in some
>>> depth and _is_ _exactly_ about the problem at hand.
>>
>> Nope.
>>
>> root@reproduce-1391429:~# cat /proc/cmdline
>> BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic
>> root=/dev/mapper/vg-rootlv ro
>> rootflags=device=/dev/mapper/vg-rootlv,subvol=@
>>
>> Observe, device= mount option is added.
>
> device= options is needed only in a btrfs multi-volume scenario.
> If you have only one disk, this is not needed
>

I know. I only did this as a demonstration for Robert. He insisted it
would certainly solve the problem. Well, it doesn't.


>>
>> root@reproduce-1391429:~# ./reproduce-1391429.sh
>> #!/bin/sh -v
>> lvs
>>   LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
>>   rootlv vg   -wi-ao---   1.00g
>>   swap0  vg   -wi-ao--- 256.00m
>>
>> grub-probe --target=device /
>> /dev/mapper/vg-rootlv
>>
>> grep " / " /proc/mounts
>> rootfs / rootfs rw 0 0
>> /dev/dm-1 / btrfs rw,relatime,space_cache 0 0
>>
>> lvcreate --snapshot --size=128M --name z vg/rootlv
>>   Logical volume "z" created
>>
>> lvs
>>   LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
>>   rootlv vg   owi-aos--   1.00g
>>   swap0  vg   -wi-ao--- 256.00m
>>   z      vg   swi-a-s-- 128.00m      rootlv   0.11
>>
>> ls -l /dev/vg/
>> total 0
>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2
>>
>> grub-probe --target=device /
>> /dev/mapper/vg-z
>>
>> grep " / " /proc/mounts
>> rootfs / rootfs rw 0 0
>> /dev/dm-2 / btrfs rw,relatime,space_cache 0 0
>
> What /proc/self/mountinfo contains ?

Before creating snapshot:

15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
17 20 0:5 / /dev rw,relatime - devtmpfs udev
rw,size=241692k,nr_inodes=60423,mode=755
18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts
rw,gid=5,mode=620,ptmxmode=000
19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs
rw,size=50084k,mode=755
20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-1 rw,space_cache
<----- THIS!
21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none
rw,size=5120k
26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none
rw,size=102400k,mode=755
28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw


After creating snapshot:

15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
17 20 0:5 / /dev rw,relatime - devtmpfs udev
rw,size=241692k,nr_inodes=60423,mode=755
18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts
rw,gid=5,mode=620,ptmxmode=000
19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs
rw,size=50084k,mode=755
20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-2 rw,space_cache
<----- WTF?!
21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none
rw,size=5120k
26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none
rw,size=102400k,mode=755
28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw


So it's consistent with what /proc/mounts reports.


>
> And more important question: it is only the value
> returned by /proc/mount wrongly or also the filesystem
> content is affected ?
>

I quote my bug report on this:

"The information reported in /proc/mounts is certainly bogus, since
still the origin device is being written, the kernel does not actually
mix up the devices for write operations, and such, the phenomenon does
not cause data corruption. (I did an entire distro release upgrade
while the conditions were present, and I centainly would have suffered
severe data corruption otherwise. Fortunately, the origin device had
the new distro, and the snapshot device had the old one, so besides
the mixup in /proc/mounts, no actual damage happened.)"

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-02  8:28           ` MegaBrutal
@ 2014-12-02 11:14             ` Goffredo Baroncelli
  2014-12-02 11:54               ` Anand Jain
  0 siblings, 1 reply; 31+ messages in thread
From: Goffredo Baroncelli @ 2014-12-02 11:14 UTC (permalink / raw)
  To: MegaBrutal, linux-btrfs; +Cc: Robert White, Anand Jain, Chris Mason

I investigated this issue further.

MegaBrutal reported the following issue: when an LVM snapshot is taken of the device of a
mounted btrfs fs, the new snapshot device name replaces the name of the original
device in the output of /proc/mounts. This confuses tools like grub-probe, which
then report a wrong root device.

It has to be pointed out that, in contrast, the link under /sys/fs/btrfs/<fsid>/devices is
correct.


What happens is that, *even while the filesystem is mounted*, doing a
"btrfs dev scan" of a snapshot (of the real volume) replaces the device name of the
filesystem with the snapshot's name.
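
A quick way to see it (commands follow MegaBrutal's reproduction setup; the
wildcard stands for the fsid directory):

btrfs device scan /dev/vg/z         # scan the snapshot while / is mounted
grep " / " /proc/mounts             # now reports /dev/dm-2 (the snapshot)
ls -l /sys/fs/btrfs/*/devices/      # still points at dm-1, the device in use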

Anand tried to fix this with b96de000b; however, further regressions appeared
and Chris reverted that commit (see below).

BR
G.Baroncelli

commit b96de000bc8bc9688b3a2abea4332bd57648a49f
Author: Anand Jain <anand.jain@oracle.com>
Date:   Thu Jul 3 18:22:05 2014 +0800

    Btrfs: device_list_add() should not update list when mounted
[...]


commit 0f23ae74f589304bf33233f85737f4fd368549eb
Author: Chris Mason <clm@fb.com>
Date:   Thu Sep 18 07:49:05 2014 -0700

    Revert "Btrfs: device_list_add() should not update list when mounted"
    
    This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.
    
    This commit is triggering failures to mount by subvolume id in some
    configurations.  The main problem is how many different ways this
    scanning function is used, both for scanning while mounted and
    unmounted.  A proper cleanup is too big for late rcs.
    
[...]

On 12/02/2014 09:28 AM, MegaBrutal wrote:
> 2014-12-02 8:50 GMT+01:00 Goffredo Baroncelli <kreijack@inwind.it>:
>> On 12/02/2014 01:15 AM, MegaBrutal wrote:
>>> 2014-12-02 0:24 GMT+01:00 Robert White <rwhite@pobox.com>:
>>>> On 12/01/2014 02:10 PM, MegaBrutal wrote:
>>>>>
>>>>> Since having duplicate UUIDs on devices is not a problem for me since
>>>>> I can tell them apart by LVM names, the discussion is of little
>>>>> relevance to my use case. Of course it's interesting and I like to
>>>>> read it along, it is not about the actual problem at hand.
>>>>>
>>>>
>>>> Which is why you use the device= mount option, which would take LVM names
>>>> and which was repeatedly discussed as solving this very problem.
>>>>
>>>> Once you decide to duplicate the UUIDs with LVM snapshots you take up the
>>>> burden of disambiguating your storage.
>>>>
>>>> Which is part of why re-reading was suggested as this was covered in some
>>>> depth and _is_ _exactly_ about the problem at hand.
>>>
>>> Nope.
>>>
>>> root@reproduce-1391429:~# cat /proc/cmdline
>>> BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic
>>> root=/dev/mapper/vg-rootlv ro
>>> rootflags=device=/dev/mapper/vg-rootlv,subvol=@
>>>
>>> Observe, device= mount option is added.
>>
>> device= options is needed only in a btrfs multi-volume scenario.
>> If you have only one disk, this is not needed
>>
> 
> I know. I only did this as a demonstration for Robert. He insisted it
> will certainly solve the problem. Well, it doesn't.
> 
> 
>>>
>>> root@reproduce-1391429:~# ./reproduce-1391429.sh
>>> #!/bin/sh -v
>>> lvs
>>>   LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
>>>   rootlv vg   -wi-ao---   1.00g
>>>   swap0  vg   -wi-ao--- 256.00m
>>>
>>> grub-probe --target=device /
>>> /dev/mapper/vg-rootlv
>>>
>>> grep " / " /proc/mounts
>>> rootfs / rootfs rw 0 0
>>> /dev/dm-1 / btrfs rw,relatime,space_cache 0 0
>>>
>>> lvcreate --snapshot --size=128M --name z vg/rootlv
>>>   Logical volume "z" created
>>>
>>> lvs
>>>   LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
>>>   rootlv vg   owi-aos--   1.00g
>>>   swap0  vg   -wi-ao--- 256.00m
>>>   z      vg   swi-a-s-- 128.00m      rootlv   0.11
>>>
>>> ls -l /dev/vg/
>>> total 0
>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2
>>>
>>> grub-probe --target=device /
>>> /dev/mapper/vg-z
>>>
>>> grep " / " /proc/mounts
>>> rootfs / rootfs rw 0 0
>>> /dev/dm-2 / btrfs rw,relatime,space_cache 0 0
>>
>> What /proc/self/mountinfo contains ?
> 
> Before creating snapshot:
> 
> 15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
> 16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
> 17 20 0:5 / /dev rw,relatime - devtmpfs udev
> rw,size=241692k,nr_inodes=60423,mode=755
> 18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts
> rw,gid=5,mode=620,ptmxmode=000
> 19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs
> rw,size=50084k,mode=755
> 20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-1 rw,space_cache
> <----- THIS!
> 21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
> 22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
> 23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
> 24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
> 25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none
> rw,size=5120k
> 26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
> 27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none
> rw,size=102400k,mode=755
> 28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
> 29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw
> 
> 
> After creating snapshot:
> 
> 15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
> 16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
> 17 20 0:5 / /dev rw,relatime - devtmpfs udev
> rw,size=241692k,nr_inodes=60423,mode=755
> 18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts
> rw,gid=5,mode=620,ptmxmode=000
> 19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs
> rw,size=50084k,mode=755
> 20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-2 rw,space_cache
> <----- WTF?!
> 21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
> 22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
> 23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
> 24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
> 25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none
> rw,size=5120k
> 26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
> 27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none
> rw,size=102400k,mode=755
> 28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
> 29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw
> 
> 
> So it's consistent with what /proc/mounts reports.
> 
> 
>>
>> And more important question: it is only the value
>> returned by /proc/mount wrongly or also the filesystem
>> content is affected ?
>>
> 
> I quote my bug report on this:
> 
> "The information reported in /proc/mounts is certainly bogus, since
> still the origin device is being written, the kernel does not actually
> mix up the devices for write operations, and such, the phenomenon does
> not cause data corruption. (I did an entire distro release upgrade
> while the conditions were present, and I centainly would have suffered
> severe data corruption otherwise. Fortunately, the origin device had
> the new distro, and the snapshot device had the old one, so besides
> the mixup in /proc/mounts, no actual damage happened.)"
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-02 11:14             ` Goffredo Baroncelli
@ 2014-12-02 11:54               ` Anand Jain
  2014-12-02 12:23                 ` Austin S Hemmelgarn
                                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Anand Jain @ 2014-12-02 11:54 UTC (permalink / raw)
  To: kreijack, MegaBrutal, linux-btrfs; +Cc: Robert White, Chris Mason




On 02/12/2014 19:14, Goffredo Baroncelli wrote:
> I further investigate this issue.
>
> MegaBrutal, reported the following issue: doing a lvm snapshot of the device of a
> mounted btrfs fs, the new snapshot device name replaces the name of the original
> device in the output of /proc/mounts. This confused tools like grub-probe which
> report a wrong root device.

Very good test case indeed, thanks.

Actual IO will still go to the original device until the FS is remounted.


> It has to be pointed out that instead the link under /sys/fs/btrfs/<fsid>/devices is
> correct.

In this context the above sysfs path will be out of sync with
reality; it's just a stale sysfs entry.

>
> What happens is that *even if the filesystem is mounted*, doing a
> "btrfs dev scan" of a snapshot (of the real volume), the device name of the
> filesystem is replaced with the snapshot one.

We have something fundamentally wrong here. My original patch tried
to fix it, but we later discovered that some external entities like
systemd and the boot process were using that bug as a feature, and we
had to revert the patch.

Fundamentally, the SCSI inquiry serial number is the only number which
is unique to the device (including virtual devices, though there could
be some legacy virtual devices which didn't follow that strictly;
anyway, those I deem to be device-side issues). Btrfs depends on the
combination of fsid, uuid and devid (and generation number) to identify
the unique device of a volume, which is weak and easy to get wrong.
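
To illustrate (made-up output; the UUID value is just a placeholder): an LVM
snapshot is a byte-identical copy of its origin, so both carry the same btrfs
fsid and devid, and the superblocks alone cannot tell them apart:

blkid /dev/vg/rootlv /dev/vg/z
/dev/vg/rootlv: UUID="0a1b2c3d-0000-0000-0000-000000000000" TYPE="btrfs"
/dev/vg/z: UUID="0a1b2c3d-0000-0000-0000-000000000000" TYPE="btrfs"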


> Anand, with b96de000b, tried to fix it; however further regression appeared
> and Chris reverted this commit (see below).
>
> BR
> G.Baroncelli
>
> commit b96de000bc8bc9688b3a2abea4332bd57648a49f
> Author: Anand Jain <anand.jain@oracle.com>
> Date:   Thu Jul 3 18:22:05 2014 +0800
>
>      Btrfs: device_list_add() should not update list when mounted
> [...]
>
>
> commit 0f23ae74f589304bf33233f85737f4fd368549eb
> Author: Chris Mason <clm@fb.com>
> Date:   Thu Sep 18 07:49:05 2014 -0700
>
>      Revert "Btrfs: device_list_add() should not update list when mounted"
>
>      This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.
>
>      This commit is triggering failures to mount by subvolume id in some
>      configurations.  The main problem is how many different ways this
>      scanning function is used, both for scanning while mounted and
>      unmounted.  A proper cleanup is too big for late rcs.
>
> [...]
>
> On 12/02/2014 09:28 AM, MegaBrutal wrote:
>> 2014-12-02 8:50 GMT+01:00 Goffredo Baroncelli <kreijack@inwind.it>:
>>> On 12/02/2014 01:15 AM, MegaBrutal wrote:
>>>> 2014-12-02 0:24 GMT+01:00 Robert White <rwhite@pobox.com>:
>>>>> On 12/01/2014 02:10 PM, MegaBrutal wrote:
>>>>>>
>>>>>> Since having duplicate UUIDs on devices is not a problem for me since
>>>>>> I can tell them apart by LVM names, the discussion is of little
>>>>>> relevance to my use case. Of course it's interesting and I like to
>>>>>> read it along, it is not about the actual problem at hand.
>>>>>>
>>>>>
>>>>> Which is why you use the device= mount option, which would take LVM names
>>>>> and which was repeatedly discussed as solving this very problem.
>>>>>
>>>>> Once you decide to duplicate the UUIDs with LVM snapshots you take up the
>>>>> burden of disambiguating your storage.
>>>>>
>>>>> Which is part of why re-reading was suggested as this was covered in some
>>>>> depth and _is_ _exactly_ about the problem at hand.
>>>>
>>>> Nope.
>>>>
>>>> root@reproduce-1391429:~# cat /proc/cmdline
>>>> BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic
>>>> root=/dev/mapper/vg-rootlv ro
>>>> rootflags=device=/dev/mapper/vg-rootlv,subvol=@
>>>>
>>>> Observe, device= mount option is added.
>>>
>>> device= options is needed only in a btrfs multi-volume scenario.
>>> If you have only one disk, this is not needed
>>>
>>
>> I know. I only did this as a demonstration for Robert. He insisted it
>> will certainly solve the problem. Well, it doesn't.
>>
>>
>>>>
>>>> root@reproduce-1391429:~# ./reproduce-1391429.sh
>>>> #!/bin/sh -v
>>>> lvs
>>>>    LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
>>>>    rootlv vg   -wi-ao---   1.00g
>>>>    swap0  vg   -wi-ao--- 256.00m
>>>>
>>>> grub-probe --target=device /
>>>> /dev/mapper/vg-rootlv
>>>>
>>>> grep " / " /proc/mounts
>>>> rootfs / rootfs rw 0 0
>>>> /dev/dm-1 / btrfs rw,relatime,space_cache 0 0
>>>>
>>>> lvcreate --snapshot --size=128M --name z vg/rootlv
>>>>    Logical volume "z" created
>>>>
>>>> lvs
>>>>    LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
>>>>    rootlv vg   owi-aos--   1.00g
>>>>    swap0  vg   -wi-ao--- 256.00m
>>>>    z      vg   swi-a-s-- 128.00m      rootlv   0.11
>>>>
>>>> ls -l /dev/vg/
>>>> total 0
>>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
>>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
>>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2
>>>>
>>>> grub-probe --target=device /
>>>> /dev/mapper/vg-z
>>>>
>>>> grep " / " /proc/mounts
>>>> rootfs / rootfs rw 0 0
>>>> /dev/dm-2 / btrfs rw,relatime,space_cache 0 0
>>>
>>> What /proc/self/mountinfo contains ?
>>
>> Before creating snapshot:
>>
>> 15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
>> 16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
>> 17 20 0:5 / /dev rw,relatime - devtmpfs udev
>> rw,size=241692k,nr_inodes=60423,mode=755
>> 18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts
>> rw,gid=5,mode=620,ptmxmode=000
>> 19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs
>> rw,size=50084k,mode=755
>> 20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-1 rw,space_cache
>> <----- THIS!
>> 21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
>> 22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
>> 23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
>> 24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
>> 25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none
>> rw,size=5120k
>> 26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
>> 27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none
>> rw,size=102400k,mode=755
>> 28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
>> 29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw
>>
>>
>> After creating snapshot:
>>
>> 15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
>> 16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
>> 17 20 0:5 / /dev rw,relatime - devtmpfs udev
>> rw,size=241692k,nr_inodes=60423,mode=755
>> 18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts
>> rw,gid=5,mode=620,ptmxmode=000
>> 19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs
>> rw,size=50084k,mode=755
>> 20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-2 rw,space_cache
>> <----- WTF?!
>> 21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
>> 22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
>> 23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
>> 24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
>> 25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none
>> rw,size=5120k
>> 26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
>> 27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none
>> rw,size=102400k,mode=755
>> 28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
>> 29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw
>>
>>
>> So it's consistent with what /proc/mounts reports.
>>
>>
>>>
>>> And more important question: it is only the value
>>> returned by /proc/mount wrongly or also the filesystem
>>> content is affected ?
>>>
>>
>> I quote my bug report on this:
>>
>> "The information reported in /proc/mounts is certainly bogus, since
>> still the origin device is being written, the kernel does not actually
>> mix up the devices for write operations, and such, the phenomenon does
>> not cause data corruption. (I did an entire distro release upgrade
>> while the conditions were present, and I centainly would have suffered
>> severe data corruption otherwise. Fortunately, the origin device had
>> the new distro, and the snapshot device had the old one, so besides
>> the mixup in /proc/mounts, no actual damage happened.)"
>>
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-02 11:54               ` Anand Jain
@ 2014-12-02 12:23                 ` Austin S Hemmelgarn
  2014-12-02 19:11                   ` Phillip Susi
  2014-12-02 19:14                 ` Phillip Susi
  2014-12-08  0:05                 ` Konstantin
  2 siblings, 1 reply; 31+ messages in thread
From: Austin S Hemmelgarn @ 2014-12-02 12:23 UTC (permalink / raw)
  To: Anand Jain, kreijack, MegaBrutal, linux-btrfs; +Cc: Robert White, Chris Mason

[-- Attachment #1: Type: text/plain, Size: 8985 bytes --]

On 2014-12-02 06:54, Anand Jain wrote:
>
>
>
> On 02/12/2014 19:14, Goffredo Baroncelli wrote:
>> I further investigate this issue.
>>
>> MegaBrutal, reported the following issue: doing a lvm snapshot of the
>> device of a
>> mounted btrfs fs, the new snapshot device name replaces the name of
>> the original
>> device in the output of /proc/mounts. This confused tools like
>> grub-probe which
>> report a wrong root device.
>
> very good test case indeed thanks.
>
> Actual IO would still go to the original device, until FS is remounted.
>
>
>> It has to be pointed out that instead the link under
>> /sys/fs/btrfs/<fsid>/devices is
>> correct.
>
> In this context the above sysfs path will be out of sync with the
> reality, its just stale sysfs entry.
>
>>
>> What happens is that *even if the filesystem is mounted*, doing a
>> "btrfs dev scan" of a snapshot (of the real volume), the device name
>> of the
>> filesystem is replaced with the snapshot one.
>
> we have some fundamentally wrong stuff. My original patch tried
> to fix it. But later discovered that some external entities like
> systmed and boot process is using that bug as a feature and we had
> to revert the patch.
>
> Fundamentally scsi inquiry serial number is only number which is unique
> to the device (including the virtual device, but there could be some
> legacy virtual device which didn't follow that strictly, Anyway those
> I deem to be device side issue.) Btrfs depends on the combination of
> fsid, uuid and devid (and generation number) to identify the unique
> device volume, which is weak and easy to go wrong.
>
>
>> Anand, with b96de000b, tried to fix it; however further regression
>> appeared
>> and Chris reverted this commit (see below).
>>
>> BR
>> G.Baroncelli
>>
>> commit b96de000bc8bc9688b3a2abea4332bd57648a49f
>> Author: Anand Jain <anand.jain@oracle.com>
>> Date:   Thu Jul 3 18:22:05 2014 +0800
>>
>>      Btrfs: device_list_add() should not update list when mounted
>> [...]
>>
>>
>> commit 0f23ae74f589304bf33233f85737f4fd368549eb
>> Author: Chris Mason <clm@fb.com>
>> Date:   Thu Sep 18 07:49:05 2014 -0700
>>
>>      Revert "Btrfs: device_list_add() should not update list when
>> mounted"
>>
>>      This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.
>>
>>      This commit is triggering failures to mount by subvolume id in some
>>      configurations.  The main problem is how many different ways this
>>      scanning function is used, both for scanning while mounted and
>>      unmounted.  A proper cleanup is too big for late rcs.
>>
>> [...]
>>
>> On 12/02/2014 09:28 AM, MegaBrutal wrote:
>>> 2014-12-02 8:50 GMT+01:00 Goffredo Baroncelli <kreijack@inwind.it>:
>>>> On 12/02/2014 01:15 AM, MegaBrutal wrote:
>>>>> 2014-12-02 0:24 GMT+01:00 Robert White <rwhite@pobox.com>:
>>>>>> On 12/01/2014 02:10 PM, MegaBrutal wrote:
>>>>>>>
>>>>>>> Since having duplicate UUIDs on devices is not a problem for me
>>>>>>> since
>>>>>>> I can tell them apart by LVM names, the discussion is of little
>>>>>>> relevance to my use case. Of course it's interesting and I like to
>>>>>>> read it along, it is not about the actual problem at hand.
>>>>>>>
>>>>>>
>>>>>> Which is why you use the device= mount option, which would take
>>>>>> LVM names
>>>>>> and which was repeatedly discussed as solving this very problem.
>>>>>>
>>>>>> Once you decide to duplicate the UUIDs with LVM snapshots you take
>>>>>> up the
>>>>>> burden of disambiguating your storage.
>>>>>>
>>>>>> Which is part of why re-reading was suggested as this was covered
>>>>>> in some
>>>>>> depth and _is_ _exactly_ about the problem at hand.
>>>>>
>>>>> Nope.
>>>>>
>>>>> root@reproduce-1391429:~# cat /proc/cmdline
>>>>> BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic
>>>>> root=/dev/mapper/vg-rootlv ro
>>>>> rootflags=device=/dev/mapper/vg-rootlv,subvol=@
>>>>>
>>>>> Observe, device= mount option is added.
>>>>
>>>> device= options is needed only in a btrfs multi-volume scenario.
>>>> If you have only one disk, this is not needed
>>>>
>>>
>>> I know. I only did this as a demonstration for Robert. He insisted it
>>> will certainly solve the problem. Well, it doesn't.
>>>
>>>
>>>>>
>>>>> root@reproduce-1391429:~# ./reproduce-1391429.sh
>>>>> #!/bin/sh -v
>>>>> lvs
>>>>>    LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%
>>>>> Convert
>>>>>    rootlv vg   -wi-ao---   1.00g
>>>>>    swap0  vg   -wi-ao--- 256.00m
>>>>>
>>>>> grub-probe --target=device /
>>>>> /dev/mapper/vg-rootlv
>>>>>
>>>>> grep " / " /proc/mounts
>>>>> rootfs / rootfs rw 0 0
>>>>> /dev/dm-1 / btrfs rw,relatime,space_cache 0 0
>>>>>
>>>>> lvcreate --snapshot --size=128M --name z vg/rootlv
>>>>>    Logical volume "z" created
>>>>>
>>>>> lvs
>>>>>    LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%
>>>>> Convert
>>>>>    rootlv vg   owi-aos--   1.00g
>>>>>    swap0  vg   -wi-ao--- 256.00m
>>>>>    z      vg   swi-a-s-- 128.00m      rootlv   0.11
>>>>>
>>>>> ls -l /dev/vg/
>>>>> total 0
>>>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
>>>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
>>>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2
>>>>>
>>>>> grub-probe --target=device /
>>>>> /dev/mapper/vg-z
>>>>>
>>>>> grep " / " /proc/mounts
>>>>> rootfs / rootfs rw 0 0
>>>>> /dev/dm-2 / btrfs rw,relatime,space_cache 0 0
>>>>
>>>> What /proc/self/mountinfo contains ?
>>>
>>> Before creating snapshot:
>>>
>>> 15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
>>> 16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
>>> 17 20 0:5 / /dev rw,relatime - devtmpfs udev
>>> rw,size=241692k,nr_inodes=60423,mode=755
>>> 18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts
>>> rw,gid=5,mode=620,ptmxmode=000
>>> 19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs
>>> rw,size=50084k,mode=755
>>> 20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-1 rw,space_cache
>>> <----- THIS!
>>> 21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
>>> 22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
>>> 23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
>>> 24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
>>> 25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none
>>> rw,size=5120k
>>> 26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
>>> 27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none
>>> rw,size=102400k,mode=755
>>> 28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
>>> 29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw
>>>
>>>
>>> After creating snapshot:
>>>
>>> 15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
>>> 16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
>>> 17 20 0:5 / /dev rw,relatime - devtmpfs udev
>>> rw,size=241692k,nr_inodes=60423,mode=755
>>> 18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts
>>> rw,gid=5,mode=620,ptmxmode=000
>>> 19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs
>>> rw,size=50084k,mode=755
>>> 20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-2 rw,space_cache
>>> <----- WTF?!
>>> 21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
>>> 22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
>>> 23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
>>> 24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
>>> 25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none
>>> rw,size=5120k
>>> 26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
>>> 27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none
>>> rw,size=102400k,mode=755
>>> 28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
>>> 29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw
>>>
>>>
>>> So it's consistent with what /proc/mounts reports.
>>>
>>>
>>>>
>>>> And more important question: it is only the value
>>>> returned by /proc/mount wrongly or also the filesystem
>>>> content is affected ?
>>>>
>>>
>>> I quote my bug report on this:
>>>
>>> "The information reported in /proc/mounts is certainly bogus, since
>>> still the origin device is being written, the kernel does not actually
>>> mix up the devices for write operations, and such, the phenomenon does
>>> not cause data corruption. (I did an entire distro release upgrade
>>> while the conditions were present, and I centainly would have suffered
>>> severe data corruption otherwise. Fortunately, the origin device had
>>> the new distro, and the snapshot device had the old one, so besides
>>> the mixup in /proc/mounts, no actual damage happened.)"
Stupid thought, why don't we just add blacklisting based on device path 
like LVM has for pvscan?
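
(For reference, the LVM mechanism being referred to is the device filter in
/etc/lvm/lvm.conf; an illustrative entry that rejects one path and accepts
everything else would look like:)

devices {
    filter = [ "r|^/dev/vg/z$|", "a|.*|" ]
}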



[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2455 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-02 12:23                 ` Austin S Hemmelgarn
@ 2014-12-02 19:11                   ` Phillip Susi
  2014-12-03  8:24                     ` Goffredo Baroncelli
  0 siblings, 1 reply; 31+ messages in thread
From: Phillip Susi @ 2014-12-02 19:11 UTC (permalink / raw)
  To: Austin S Hemmelgarn, Anand Jain, kreijack, MegaBrutal, linux-btrfs
  Cc: Robert White, Chris Mason

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/2/2014 7:23 AM, Austin S Hemmelgarn wrote:
> Stupid thought, why don't we just add blacklisting based on device
> path like LVM has for pvscan?

That isn't logic that belongs in the kernel, so that goes down the
path of yanking the device auto-probing out of btrfs and instead
writing a mount.btrfs helper that can apply policies like blacklisting
to locate all of the correct devices and pass them all to the
kernel at mount time.
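
A very rough sketch of what such a helper could do (hypothetical - no such
tool exists, and the /etc/btrfs/blacklist file is invented for illustration):

#!/bin/sh
# mount.btrfs <device> <mountpoint>  -- sketch only
SRC="$1"; MNT="$2"
FSID=$(blkid -o value -s UUID "$SRC")
# every device carrying this fsid, minus locally blacklisted paths
DEVS=$(blkid -t UUID="$FSID" -o device | grep -v -x -f /etc/btrfs/blacklist)
OPTS=$(printf 'device=%s,' $DEVS)
# -i keeps mount(8) from calling this helper again
exec mount -i -t btrfs -o "${OPTS%?}" "$SRC" "$MNT"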


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUfg7lAAoJENRVrw2cjl5RAakIAKLsIKgjzUO8J/PBBDTmcCQh
IvkEMlQ6ME+Zi7xCKM9p+J5Skcu22zj8w2Ip0s/zNo3ydGorajxehUqtU983l5Hd
VklKOuNGZ0wrOtwCH8IkRt9HUvT3I7982jByi2Uk9jxpRbL/BruaJ4NF+Z9HnvHO
cmMNavcKvwOkYpPHPPbeyjNwWALe/WRZZ2cgsKqs/vB2nakxFntUc1UOsnIMfLJ7
dMF0l9GudoIoNaqRUNoxV1/Lh9MxKx0p9mBK6Pc+V+wLulUyOUSQ6OkUTsznCabk
iUyzX9IYiF83hWO3g+1vxR+GCeYNVGvC/Rj8ZkLSt9Tpi7JH0kbXnq6wKedSfE0=
=Lxfb
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-02 11:54               ` Anand Jain
  2014-12-02 12:23                 ` Austin S Hemmelgarn
@ 2014-12-02 19:14                 ` Phillip Susi
  2014-12-08  0:05                 ` Konstantin
  2 siblings, 0 replies; 31+ messages in thread
From: Phillip Susi @ 2014-12-02 19:14 UTC (permalink / raw)
  To: Anand Jain, kreijack, MegaBrutal, linux-btrfs; +Cc: Robert White, Chris Mason

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/2/2014 6:54 AM, Anand Jain wrote:
> we have some fundamentally wrong stuff. My original patch tried to
> fix it. But later discovered that some external entities like 
> systmed and boot process is using that bug as a feature and we had 
> to revert the patch.

If systemd is depending on the kernel lying about what device it has
mounted, then something is *extremely* broken there and that should be
fixed instead of breaking the kernel.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUfg+BAAoJENRVrw2cjl5REm8H/j2MEbF2yeTsGtOGhszl82rZ
ngSvVfEEPq1D+tpi28+oZnSLYxIKEGudqTciyeb8Z1jCTD065D/T0xpGJZyd6pUG
KGahBpnPvhP5xg4RaoSxSzNcFzPPFfz+EIPyV+l3OlHbyeq0whkKj5OAq15Grz6c
RDWViqRFRE+dC2k70fAt6mlxWs7ChCVs9fPuuWVTFW+lXBoCKUZhnZ5Kc2orsKx6
rVTNTo6LxZQX7m+9WzIy5lqH+WgqxtfEacAlM/6jXWwPe09DDT3z0s3ogf+dfO0D
3/efDv1XJ/LwmbyQrGxiS0LQWoPA+d+MX0Od3XRcaeml3d7k/tZjDsrFOY6anIg=
=Rxh6
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-01 21:45 ` Konstantin
  2014-12-02  5:47   ` MegaBrutal
@ 2014-12-02 19:19   ` Phillip Susi
  2014-12-03  3:01     ` Russell Coker
  2014-12-08  0:32     ` Konstantin
  1 sibling, 2 replies; 31+ messages in thread
From: Phillip Susi @ 2014-12-02 19:19 UTC (permalink / raw)
  To: Konstantin, MegaBrutal, linux-btrfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/1/2014 4:45 PM, Konstantin wrote:
> The bug appears also when using mdadm RAID1 - when one of the
> drives is detached from the array then the OS discovers it and
> after a while (not directly, it takes several minutes) it appears
> under /proc/mounts: instead of /dev/md0p1 I see there /dev/sdb1.
> And usually after some hour or so (depending on system workload)
> the PC completely freezes. So discussion about the uniqueness of
> UUIDs or not, a crashing kernel is telling me that there is a
> serious bug.

I'm guessing you are using metadata format 0.9 or 1.0, which put the
metadata at the end of the drive while the filesystem still starts at
sector zero.  1.2 is now the default and would not have this problem,
as its metadata is at the start of the disk (well, 4k from the start)
and the fs starts further down.
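
For example (device names are placeholders), the metadata version of an
existing member can be checked, and a new array created with the 1.2 format:

mdadm --examine /dev/sdb1 | grep Version
mdadm --create /dev/md0 --metadata=1.2 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1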



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUfhC6AAoJENRVrw2cjl5RQ2EH/0Z0iCFjOs3e5oGuGqT5Wtlc
rXV8R1EfGSxESK0g6QAe7QIvJu+0CdIgccDp8z3ezfPcm1/YRfBXxXA/Y1Wl4hqw
0wuk3bNqMjUmNwIFjEZCkgOSn4Whuppbh3hOOVGNropr4cwd84GP1Cr2vrzwYnkm
If1I3RTaBhAJRSngkP9X+L5J6zBBjaZLlF4AjC/WP/1bd5vkHpGqnFpRTquCPiNV
9LFWQIB+xYdoRdK2l7huS2jQ5kfw+qLZUQO17dU3fcicwwNk56V4HcLEPg9nx9es
pxJo9BAWmQXDpeMcCL4eFECoeAhn0IXoaXb363mmpq11qyYj73r3FzhNQ+ALzPY=
=U65Z
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-02 19:19   ` Phillip Susi
@ 2014-12-03  3:01     ` Russell Coker
  2014-12-08  0:32     ` Konstantin
  1 sibling, 0 replies; 31+ messages in thread
From: Russell Coker @ 2014-12-03  3:01 UTC (permalink / raw)
  To: Phillip Susi, Konstantin, MegaBrutal, linux-btrfs

Maybe we should have a warning in mkfs.btrfs for the problematic RAID layouts.

On December 3, 2014 6:19:22 AM GMT+11:00, Phillip Susi <psusi@ubuntu.com> wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>On 12/1/2014 4:45 PM, Konstantin wrote:
>> The bug appears also when using mdadm RAID1 - when one of the
>> drives is detached from the array then the OS discovers it and
>> after a while (not directly, it takes several minutes) it appears
>> under /proc/mounts: instead of /dev/md0p1 I see there /dev/sdb1.
>> And usually after some hour or so (depending on system workload)
>> the PC completely freezes. So discussion about the uniqueness of
>> UUIDs or not, a crashing kernel is telling me that there is a
>> serious bug.
>
>I'm guessing you are using metadata format 0.9 or 1.0, which put the
>metadata at the end of the drive and the filesystem still starts in
>sector zero.  1.2 is now the default and would not have this problem
>as its metadata is at the start of the disk ( well, 4k from the start
>) and the fs starts further down.
>
>
>
>-----BEGIN PGP SIGNATURE-----
>Version: GnuPG v2.0.17 (MingW32)
>
>iQEcBAEBAgAGBQJUfhC6AAoJENRVrw2cjl5RQ2EH/0Z0iCFjOs3e5oGuGqT5Wtlc
>rXV8R1EfGSxESK0g6QAe7QIvJu+0CdIgccDp8z3ezfPcm1/YRfBXxXA/Y1Wl4hqw
>0wuk3bNqMjUmNwIFjEZCkgOSn4Whuppbh3hOOVGNropr4cwd84GP1Cr2vrzwYnkm
>If1I3RTaBhAJRSngkP9X+L5J6zBBjaZLlF4AjC/WP/1bd5vkHpGqnFpRTquCPiNV
>9LFWQIB+xYdoRdK2l7huS2jQ5kfw+qLZUQO17dU3fcicwwNk56V4HcLEPg9nx9es
>pxJo9BAWmQXDpeMcCL4eFECoeAhn0IXoaXb363mmpq11qyYj73r3FzhNQ+ALzPY=
>=U65Z
>-----END PGP SIGNATURE-----

-- 
Sent from my Samsung Galaxy Note 3 with K-9 Mail.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-02 19:11                   ` Phillip Susi
@ 2014-12-03  8:24                     ` Goffredo Baroncelli
  2014-12-04  3:09                       ` Phillip Susi
  0 siblings, 1 reply; 31+ messages in thread
From: Goffredo Baroncelli @ 2014-12-03  8:24 UTC (permalink / raw)
  To: Phillip Susi, Austin S Hemmelgarn, Anand Jain, MegaBrutal, linux-btrfs
  Cc: Robert White, Chris Mason

On 12/02/2014 08:11 PM, Phillip Susi wrote:
> On 12/2/2014 7:23 AM, Austin S Hemmelgarn wrote:
>> Stupid thought, why don't we just add blacklisting based on device
>> path like LVM has for pvscan?
> 
> That isn't logic that belongs in the kernel, so that is going down the
> path of yanking out the device auto probing from btrfs and instead
> writing a mount.btrfs helper that can use policies like blacklisting
> to auto locate all of the correct devices and pass them all to the
> kernel at mount time.
> 
I am thinking about that. Today, device discovery happens:
a) when a device appears, two udev rules run "btrfs dev scan <device>"

/lib/udev/rules.d/70-btrfs.rules
/lib/udev/rules.d/80-btrfs-lvm.rules

b) during boot, a "btrfs device scan" is run, which scans all
the devices (this happens on Debian; other distros may differ)

c) after mkfs.btrfs, which starts a device scan on each device of
the new filesystem

d) by the user


Regarding a), the problem is simply solved by adding a line like:

ENV{DM_UDEV_LOW_PRIORITY_FLAG}=="1", GOTO="btrfs_end"

Regarding c), it is not a problem

Regarding b) and d), the only solution I found is to query the
udev DB inside the "btrfs dev scan" program and skip the devices
with DM_UDEV_LOW_PRIORITY_FLAG==1. Implementing this would
solve all of the points a), b), c), d) in one shot!
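
A minimal sketch of that check from a shell, assuming the flag really is
persisted as a property in the udev DB for the snapshot device (device
names are only illustrative):

  for dev in /dev/dm-*; do
      if udevadm info --query=property --name="$dev" 2>/dev/null \
              | grep -qx 'DM_UDEV_LOW_PRIORITY_FLAG=1'; then
          continue    # low-priority device (e.g. an LVM snapshot): skip it
      fi
      btrfs device scan "$dev"
  done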


BR
G.Baroncelli

P.S.
This is the comment LVM makes about DM_UDEV_LOW_PRIORITY_FLAG:

/*
 * DM_UDEV_LOW_PRIORITY_FLAG is set in case we need to instruct the
 * udev rules to give low priority to the device that is currently
 * processed. For example, this provides a way to select which symlinks
 * could be overwritten by high priority ones if their names are equal.
 * Common situation is a name based on FS UUID while using origin and
 * snapshot devices.
 */
#define DM_UDEV_LOW_PRIORITY_FLAG 0x0010

https://git.fedorahosted.org/cgit/lvm2.git/tree/libdm/libdevmapper.h#n1969



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-03  8:24                     ` Goffredo Baroncelli
@ 2014-12-04  3:09                       ` Phillip Susi
  2014-12-04  5:15                         ` Duncan
  0 siblings, 1 reply; 31+ messages in thread
From: Phillip Susi @ 2014-12-04  3:09 UTC (permalink / raw)
  To: kreijack, Austin S Hemmelgarn, Anand Jain, MegaBrutal, linux-btrfs
  Cc: Robert White, Chris Mason

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 12/03/2014 03:24 AM, Goffredo Baroncelli wrote:
> I am thinking about that. Today the device discovery happens: a)
> when a device appears, two udev rules run "btrfs dev scan
> <device>"
> 
> /lib/udev/rules.d/70-btrfs.rules 
> /lib/udev/rules.d/80-btrfs-lvm.rules
> 
> b) during the boot it is ran a "btrfs device scan", which scan all
>  the device (this happens in debian for other distros may be
> different)
> 
> c) after a btrfs.mkfs, which starts a device scan on each devices
> of the new filesystem
> 
> d) by the user

Are you sure the kernel only gains awareness of btrfs volumes when
user space runs btrfs device scan?  If that is so then that means you
can not boot from a multi device btrfs root without using an
initramfs.  I thought the kernel auto scanned all devices if you tried
to mount a multi device volume, but if this is so, then yes, the udev
rules could be fixed to not call btrfs device scan on an lvm snapshot.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCgAGBQJUf9BpAAoJENRVrw2cjl5RgcQIALCGfplK/xgX/QaiRjNW96l2
DWNPQMIhPesci0gF7Th3sNboew0hrc3g6S0a55wAO12CBhMPdzHxHjd9iFVpKi9O
vzvU36XyzwdcPJkBqRdPJMT2kX+428gYUW7jkyC8usj5eSCyeiIodJuxirGDL5Nb
3TttEJOpbPHGlTzHjAqEcK2ybzYi9HCN3CD3fuLagP9n+4zmFE7tGaGglZ9+7P58
wZjlP5xKDCR4Cu5Hr+5ErrmT2EoOvFC+PLKOT8xXhD9Y2emk2AtuY+5l/w7I+SIS
42gTUqPOx/8AOxBhOhkI0pPO8eK7S/lP1LKoXF0WWHhX8CgJLIHwj5KniDYcjBA=
=HI90
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-04  3:09                       ` Phillip Susi
@ 2014-12-04  5:15                         ` Duncan
  2014-12-04  8:20                           ` MegaBrutal
  0 siblings, 1 reply; 31+ messages in thread
From: Duncan @ 2014-12-04  5:15 UTC (permalink / raw)
  To: linux-btrfs

Phillip Susi posted on Wed, 03 Dec 2014 22:09:29 -0500 as excerpted:

> Are you sure the kernel only gains awareness of btrfs volumes when user
> space runs btrfs device scan?  If that is so then that means you can not
> boot from a multi device btrfs root without using an initramfs.  I
> thought the kernel auto scanned all devices if you tried to mount a
> multi device volume, but if this is so, then yes, the udev rules could
> be fixed to not call btrfs device scan on an lvm snapshot.

That has indeed been the case in the past, and to my knowledge remains 
the case.

Unless it has changed in the last cycle or two (and I've not seen patches 
to that effect on the list nor any hint of such, so I doubt it), the 
kernel doesn't do any such scanning without userspace telling it to.  The 
device= mount option can be used instead, but it didn't work with 
rootflags= on the kernel command line last I tried, so for a multi-device 
btrfs root, yes, an initramfs/initrd is required.

Which is why I'm running an initramfs for the first time since I 
switched to a btrfs raid1 mode root, as I quit using initrds back before 
initramfs was an option.  An initramfs appended to the kernel image beats 
a separate initrd, but I'd still love to see the kernel command-line 
parsing fixed so that it splits at the correct = in rootflags=device= 
(that seemed to be the problem: the kernel didn't appear to recognize 
rootflags at all, apparently treating it as a parameter named 
rootflags=device instead of rootflags), so I could be rid of the 
initramfs again.
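
For reference, the device= route looks roughly like this when done from
userspace (device names here are made up):

  mount -t btrfs -o device=/dev/sdb1,device=/dev/sdc1 /dev/sdb1 /mnt

and the kernel-command-line form that reportedly wasn't parsed correctly
would be along the lines of:

  root=/dev/sdb1 rootflags=device=/dev/sdb1,device=/dev/sdc1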

FWIW, I'm using dracut to generate the cpio archive, which, with the 
right kernel config options set, the kernel build process then appends to 
the kernel image.  The dracut btrfs module is enabled of course; most of 
the rest are force-disabled, as I run a monolithic kernel and don't need 
module loading, etc.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-04  5:15                         ` Duncan
@ 2014-12-04  8:20                           ` MegaBrutal
  2014-12-04 13:14                             ` Duncan
  0 siblings, 1 reply; 31+ messages in thread
From: MegaBrutal @ 2014-12-04  8:20 UTC (permalink / raw)
  To: linux-btrfs

2014-12-04 6:15 GMT+01:00 Duncan <1i5t5.duncan@cox.net>:
>
> Which is why I'm running an initramfs for the first time since I've
> switched to btrfs raid1 mode root, as I quit with initrds back before
> initramfs was an option.  An initramfs appended to the kernel image beats
> a separate initrd, but I'd still love to see the kernel commandline
> parsing fixed so it broke at the correct = in rootflags=device= (which
> seemed to be the problem, the kernel then didn't seem to recognize
> rootflags at all, as it was apparently seeing it as a parameter called
> rootflags=device, instead of rootflags), so I could be rid of the
> initramfs again.
>

Are you sure it isn't fixed? At least it parses "rootflags=subvol=@"
fine, which also has multiple = signs. And the last time I tried
"rootflags=device=/dev/mapper/vg-rootlv,subvol=@", it didn't cause any
problems. "device=" shouldn't have an effect in this case anyway, but I
didn't get any complaints about it. Then again, I use an initrd.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-04  8:20                           ` MegaBrutal
@ 2014-12-04 13:14                             ` Duncan
  0 siblings, 0 replies; 31+ messages in thread
From: Duncan @ 2014-12-04 13:14 UTC (permalink / raw)
  To: linux-btrfs

MegaBrutal posted on Thu, 04 Dec 2014 09:20:12 +0100 as excerpted:

> Are you sure it isn't fixed? At least, it parses "rootflags=subvol=@"
> well, which also has multiple = signs. And last time I've tried this,
> and didn't cause any problems:
> "rootflags=device=/dev/mapper/vg-rootlv,subvol=@". Though "device="
> shouldn't have an effect in this case anyway, but I didn't get any
> complaints against it. Though I use an initrd.

AFAIK lvm requires userspace anyway, thus an initr*, and once you have 
that initr* handling the lvm, it's almost certainly the initr* parsing 
the rootflags= from the kernel command line as well.  So in that case the 
kernel doesn't /need/ to be able to parse rootflags=, as all it does is 
pass the kernel command line straight through to the initr*, which would 
seem, in your case at least, to parse it correctly.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-02 11:54               ` Anand Jain
  2014-12-02 12:23                 ` Austin S Hemmelgarn
  2014-12-02 19:14                 ` Phillip Susi
@ 2014-12-08  0:05                 ` Konstantin
  2 siblings, 0 replies; 31+ messages in thread
From: Konstantin @ 2014-12-08  0:05 UTC (permalink / raw)
  To: Anand Jain, kreijack, MegaBrutal, linux-btrfs; +Cc: Robert White, Chris Mason


Anand Jain wrote on 02.12.2014 at 12:54:
>
>
>
> On 02/12/2014 19:14, Goffredo Baroncelli wrote:
>> I further investigate this issue.
>>
>> MegaBrutal, reported the following issue: doing a lvm snapshot of the
>> device of a
>> mounted btrfs fs, the new snapshot device name replaces the name of
>> the original
>> device in the output of /proc/mounts. This confused tools like
>> grub-probe which
>> report a wrong root device.
>
> very good test case indeed thanks.
>
> Actual IO would still go to the original device, until FS is remounted.
This seems to be correct, at least at the beginning, but I wouldn't be so
sure - why else is the system crashing in my case after a while when the
second drive is present?! If the kernel were not using it in some way,
nothing else should happen apart from the wrong /proc/mounts.

>
>> It has to be pointed out that instead the link under
>> /sys/fs/btrfs/<fsid>/devices is
>> correct.
>
> In this context the above sysfs path will be out of sync with the
> reality, its just stale sysfs entry.
>
>>
>> What happens is that *even if the filesystem is mounted*, doing a
>> "btrfs dev scan" of a snapshot (of the real volume), the device name
>> of the
>> filesystem is replaced with the snapshot one.
>
> we have some fundamentally wrong stuff. My original patch tried
> to fix it. But later discovered that some external entities like
> systmed and boot process is using that bug as a feature and we had
> to revert the patch.
>
> Fundamentally scsi inquiry serial number is only number which is unique
> to the device (including the virtual device, but there could be some
> legacy virtual device which didn't follow that strictly, Anyway those
> I deem to be device side issue.) Btrfs depends on the combination of
> fsid, uuid and devid (and generation number) to identify the unique
> device volume, which is weak and easy to go wrong.
>
>
>> Anand, with b96de000b, tried to fix it; however further regression
>> appeared
>> and Chris reverted this commit (see below).
>>
>> BR
>> G.Baroncelli
>>
>> commit b96de000bc8bc9688b3a2abea4332bd57648a49f
>> Author: Anand Jain <anand.jain@oracle.com>
>> Date:   Thu Jul 3 18:22:05 2014 +0800
>>
>>      Btrfs: device_list_add() should not update list when mounted
>> [...]
>>
>>
>> commit 0f23ae74f589304bf33233f85737f4fd368549eb
>> Author: Chris Mason <clm@fb.com>
>> Date:   Thu Sep 18 07:49:05 2014 -0700
>>
>>      Revert "Btrfs: device_list_add() should not update list when
>> mounted"
>>
>>      This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.
>>
>>      This commit is triggering failures to mount by subvolume id in some
>>      configurations.  The main problem is how many different ways this
>>      scanning function is used, both for scanning while mounted and
>>      unmounted.  A proper cleanup is too big for late rcs.
>>
>> [...]
>>
>> On 12/02/2014 09:28 AM, MegaBrutal wrote:
>>> 2014-12-02 8:50 GMT+01:00 Goffredo Baroncelli <kreijack@inwind.it>:
>>>> On 12/02/2014 01:15 AM, MegaBrutal wrote:
>>>>> 2014-12-02 0:24 GMT+01:00 Robert White <rwhite@pobox.com>:
>>>>>> On 12/01/2014 02:10 PM, MegaBrutal wrote:
>>>>>>>
>>>>>>> Since having duplicate UUIDs on devices is not a problem for me
>>>>>>> since
>>>>>>> I can tell them apart by LVM names, the discussion is of little
>>>>>>> relevance to my use case. Of course it's interesting and I like to
>>>>>>> read it along, it is not about the actual problem at hand.
>>>>>>>
>>>>>>
>>>>>> Which is why you use the device= mount option, which would take
>>>>>> LVM names
>>>>>> and which was repeatedly discussed as solving this very problem.
>>>>>>
>>>>>> Once you decide to duplicate the UUIDs with LVM snapshots you
>>>>>> take up the
>>>>>> burden of disambiguating your storage.
>>>>>>
>>>>>> Which is part of why re-reading was suggested as this was covered
>>>>>> in some
>>>>>> depth and _is_ _exactly_ about the problem at hand.
>>>>>
>>>>> Nope.
>>>>>
>>>>> root@reproduce-1391429:~# cat /proc/cmdline
>>>>> BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic
>>>>> root=/dev/mapper/vg-rootlv ro
>>>>> rootflags=device=/dev/mapper/vg-rootlv,subvol=@
>>>>>
>>>>> Observe, device= mount option is added.
>>>>
>>>> device= options is needed only in a btrfs multi-volume scenario.
>>>> If you have only one disk, this is not needed
>>>>
>>>
>>> I know. I only did this as a demonstration for Robert. He insisted it
>>> will certainly solve the problem. Well, it doesn't.
>>>
>>>
>>>>>
>>>>> root@reproduce-1391429:~# ./reproduce-1391429.sh
>>>>> #!/bin/sh -v
>>>>> lvs
>>>>>    LV     VG   Attr      LSize   Pool Origin Data%  Move Log
>>>>> Copy%  Convert
>>>>>    rootlv vg   -wi-ao---   1.00g
>>>>>    swap0  vg   -wi-ao--- 256.00m
>>>>>
>>>>> grub-probe --target=device /
>>>>> /dev/mapper/vg-rootlv
>>>>>
>>>>> grep " / " /proc/mounts
>>>>> rootfs / rootfs rw 0 0
>>>>> /dev/dm-1 / btrfs rw,relatime,space_cache 0 0
>>>>>
>>>>> lvcreate --snapshot --size=128M --name z vg/rootlv
>>>>>    Logical volume "z" created
>>>>>
>>>>> lvs
>>>>>    LV     VG   Attr      LSize   Pool Origin Data%  Move Log
>>>>> Copy%  Convert
>>>>>    rootlv vg   owi-aos--   1.00g
>>>>>    swap0  vg   -wi-ao--- 256.00m
>>>>>    z      vg   swi-a-s-- 128.00m      rootlv   0.11
>>>>>
>>>>> ls -l /dev/vg/
>>>>> total 0
>>>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
>>>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
>>>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2
>>>>>
>>>>> grub-probe --target=device /
>>>>> /dev/mapper/vg-z
>>>>>
>>>>> grep " / " /proc/mounts
>>>>> rootfs / rootfs rw 0 0
>>>>> /dev/dm-2 / btrfs rw,relatime,space_cache 0 0
>>>>
>>>> What /proc/self/mountinfo contains ?
>>>
>>> Before creating snapshot:
>>>
>>> 15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
>>> 16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
>>> 17 20 0:5 / /dev rw,relatime - devtmpfs udev
>>> rw,size=241692k,nr_inodes=60423,mode=755
>>> 18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts
>>> rw,gid=5,mode=620,ptmxmode=000
>>> 19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs
>>> rw,size=50084k,mode=755
>>> 20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-1 rw,space_cache
>>> <----- THIS!
>>> 21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none
>>> rw,size=4k,mode=755
>>> 22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
>>> 23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
>>> 24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
>>> 25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none
>>> rw,size=5120k
>>> 26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
>>> 27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none
>>> rw,size=102400k,mode=755
>>> 28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
>>> 29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw
>>>
>>>
>>> After creating snapshot:
>>>
>>> 15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
>>> 16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
>>> 17 20 0:5 / /dev rw,relatime - devtmpfs udev
>>> rw,size=241692k,nr_inodes=60423,mode=755
>>> 18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts
>>> rw,gid=5,mode=620,ptmxmode=000
>>> 19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs
>>> rw,size=50084k,mode=755
>>> 20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-2 rw,space_cache
>>> <----- WTF?!
>>> 21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none
>>> rw,size=4k,mode=755
>>> 22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
>>> 23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
>>> 24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
>>> 25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none
>>> rw,size=5120k
>>> 26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
>>> 27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none
>>> rw,size=102400k,mode=755
>>> 28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
>>> 29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw
>>>
>>>
>>> So it's consistent with what /proc/mounts reports.
>>>
>>>
>>>>
>>>> And more important question: it is only the value
>>>> returned by /proc/mount wrongly or also the filesystem
>>>> content is affected ?
>>>>
>>>
>>> I quote my bug report on this:
>>>
>>> "The information reported in /proc/mounts is certainly bogus, since
>>> still the origin device is being written, the kernel does not actually
>>> mix up the devices for write operations, and such, the phenomenon does
>>> not cause data corruption. (I did an entire distro release upgrade
>>> while the conditions were present, and I centainly would have suffered
>>> severe data corruption otherwise. Fortunately, the origin device had
>>> the new distro, and the snapshot device had the old one, so besides
>>> the mixup in /proc/mounts, no actual damage happened.)"



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-02 19:19   ` Phillip Susi
  2014-12-03  3:01     ` Russell Coker
@ 2014-12-08  0:32     ` Konstantin
  2014-12-08 14:59       ` Phillip Susi
  2014-12-08 17:20       ` Robert White
  1 sibling, 2 replies; 31+ messages in thread
From: Konstantin @ 2014-12-08  0:32 UTC (permalink / raw)
  To: Phillip Susi, MegaBrutal, linux-btrfs

Phillip Susi wrote on 02.12.2014 at 20:19:
> On 12/1/2014 4:45 PM, Konstantin wrote:
> > The bug appears also when using mdadm RAID1 - when one of the
> > drives is detached from the array then the OS discovers it and
> > after a while (not directly, it takes several minutes) it appears
> > under /proc/mounts: instead of /dev/md0p1 I see there /dev/sdb1.
> > And usually after some hour or so (depending on system workload)
> > the PC completely freezes. So discussion about the uniqueness of
> > UUIDs or not, a crashing kernel is telling me that there is a
> > serious bug.
>
> I'm guessing you are using metadata format 0.9 or 1.0, which put the
> metadata at the end of the drive and the filesystem still starts in
> sector zero.  1.2 is now the default and would not have this problem
> as its metadata is at the start of the disk ( well, 4k from the start
> ) and the fs starts further down.
I know this, and I'm using 0.9 on purpose. I need to boot from these
disks, so I can't use the 1.2 format, as the BIOS wouldn't recognize the
partitions. Having an additional non-RAID disk for booting introduces a
single point of failure, which is contrary to the idea of RAID>0.

Anyway, to avoid a futile discussion: mdraid and its format are not the
problem, they are just one example of it. Using dm-raid would cause the
same trouble, and apparently LVM too. I can think of a bunch of other
cases, including the use of hardware-based RAID controllers. OK, it's not
the majority's problem, but that's no argument for keeping a bug/flaw
capable of crashing your system.

Nice as it is that the kernel apparently scans for drives and
automatically identifies BTRFS ones, the feature seems useless to me.
When a BTRFS RAID disk fails in a live system, it is not sufficient to
hot-replace it; the kernel will not automatically rebalance. Commands are
still needed for the task, as with mdraid. So the only point I can see at
the moment where this auto-detect feature makes sense is when mounting
the device for the first time. If I remember the documentation correctly,
you mount one of the RAID devices and the others are automagically
attached as well. But outside of the mount process, what is this
auto-detect used for?

So here are a couple of rather simple solutions which, as far as I can
see, could solve the problem:

1. Limit the auto-detect to the mount process and don't do it when
devices are appearing.

2. When a BTRFS device is detected and its metadata is identical to one
already mounted, just ignore it.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-08  0:32     ` Konstantin
@ 2014-12-08 14:59       ` Phillip Susi
  2014-12-08 22:25         ` Konstantin
  2014-12-10  3:10         ` Anand Jain
  2014-12-08 17:20       ` Robert White
  1 sibling, 2 replies; 31+ messages in thread
From: Phillip Susi @ 2014-12-08 14:59 UTC (permalink / raw)
  To: Konstantin, MegaBrutal, linux-btrfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/7/2014 7:32 PM, Konstantin wrote:
>> I'm guessing you are using metadata format 0.9 or 1.0, which put
>> the metadata at the end of the drive and the filesystem still
>> starts in sector zero.  1.2 is now the default and would not have
>> this problem as its metadata is at the start of the disk ( well,
>> 4k from the start ) and the fs starts further down.
> I know this and I'm using 0.9 on purpose. I need to boot from
> these disks so I can't use 1.2 format as the BIOS wouldn't
> recognize the partitions. Having an additional non-RAID disk for
> booting introduces a single point of failure which contrary to the
> idea of RAID>0.

The BIOS does not know or care about partitions.  All you need is a
partition table in the MBR; you can install grub there and have it
boot the system from an mdadm 1.1 or 1.2 format array housed in a
partition on the rest of the disk.  The only time you really *have* to
use 0.9 or 1.0 (and you really should be using 1.0 instead, since it
handles larger arrays and can't be confused vis. whole-disk vs.
partition components) is if you are running a raid1 on the raw disk,
with no partition table, and then partition inside the array instead;
and really, you just shouldn't be doing that.
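
A minimal sketch of that bog-standard layout, with made-up device names
(two disks, one RAID partition on each, v1.2 metadata, grub installed in
the MBR of both disks so either can boot):

  parted --script /dev/sda mklabel msdos mkpart primary 1MiB 100%
  parted --script /dev/sdb mklabel msdos mkpart primary 1MiB 100%
  mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=1.2 \
        /dev/sda1 /dev/sdb1
  grub-install /dev/sda
  grub-install /dev/sdb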

> Anyway, to avoid a futile discussion, mdraid and its format is not
> the problem, it is just an example of the problem. Using dm-raid
> would do the same trouble, LVM apparently, too. I could think of a
> bunch of other cases including the use of hardware based RAID
> controllers. OK, it's not the majority's problem, but that's not
> the argument to keep a bug/flaw capable of crashing your system.

dmraid solves the problem by removing the partitions from the
underlying physical device ( /dev/sda ), and only exposing them on the
array ( /dev/mapper/whatever ).  LVM only has the problem when you
take a snapshot.  User space tools face the same issue and they
resolve it by ignoring or deprioritizing the snapshot.

> As it is a nice feature that the kernel apparently scans for drives
> and automatically identifies BTRFS ones, it seems to me that this
> feature is useless. When in a live system a BTRFS RAID disk fails,
> it is not sufficient to hot-replace it, the kernel will not
> automatically rebalance. Commands are still needed for the task as
> are with mdraid. So the only point I can see at the moment where
> this auto-detect feature makes sense is when mounting the device
> for the first time. If I remember the documentation correctly, you
> mount one of the RAID devices and the others are automagically
> attached as well. But outside of the mount process, what is this
> auto-detect used for?
> 
> So here a couple of rather simple solutions which, as far as I can
> see, could solve the problem:
> 
> 1. Limit the auto-detect to the mount process and don't do it when 
> devices are appearing.
> 
> 2. When a BTRFS device is detected and its metadata is identical to
> one already mounted, just ignore it.

That doesn't really solve the problem since you can still pick the
wrong one to mount in the first place.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUhbztAAoJENRVrw2cjl5RomkH/26Q3M6LXVaF0qEcEzFTzGEL
uVAOKBY040Ui5bSK0WQYnH0XtE8vlpLSFHxrRa7Ygpr3jhffSsu6ZsmbOclK64ZA
Z8rNEmRFhOxtFYTcQwcUbeBtXEN3k/5H49JxbjUDItnVPBoeK3n7XG4i1Lap5IdY
GXyLbh7ogqd/p+wX6Om20NkJSx4xzyU85E4ZvDADQA+2RIBaXva5tDPx5/UD4XBQ
h8ai+wS1iC8EySKxwKBEwzwb7+Z6w7nOWO93v/lL34fwTg0OIY9uEfTaAy5KcDjz
z6QXWTmvrbiFpyy/qyGSqBGlPjZ+r98mVEDbYWCVfK8AoD6UmteD7R8WAWkWiWY=
=PJww
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-08  0:32     ` Konstantin
  2014-12-08 14:59       ` Phillip Susi
@ 2014-12-08 17:20       ` Robert White
  2014-12-08 22:38         ` Konstantin
  1 sibling, 1 reply; 31+ messages in thread
From: Robert White @ 2014-12-08 17:20 UTC (permalink / raw)
  To: Konstantin, Phillip Susi, MegaBrutal, linux-btrfs

On 12/07/2014 04:32 PM, Konstantin wrote:
> I know this and I'm using 0.9 on purpose. I need to boot from these
> disks so I can't use 1.2 format as the BIOS wouldn't recognize the
> partitions. Having an additional non-RAID disk for booting introduces a
> single point of failure which contrary to the idea of RAID>0.

GRUB2 has raid 1.1 and 1.2 metadata support via the mdraid1x module. LVM 
is also supported. I don't know if a stack of both is supported.

There is, BTW, no such thing as a (commodity) computer without a single 
point of failure in it somewhere. I've watched government contracts 
chase this demon for decades. Be it disk, controller, network card, bus 
chip, cpu or stick-of-ram you've got a single point of failure 
somewhere. Actually you likely have several such points of potential 
failure.

For instance, are you _sure_ your BIOS is going to check the second 
drive if it gets a read failure after starting in on your first drive? 
Chances are it won't, because that four-hundred-byte-or-so boot loader 
on the first disk has no way to branch back into the BIOS.

You can waste a lot of your life chasing that ghost and you'll still 
discover you've missed it and have to whip out your backup boot media.

It may well be worth having a second copy of /boot around, but make sure 
you stay out of bandersnatch territory when designing your system. "The 
more you over-think the plumbing, the easier it is to stop up the pipes."

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-08 14:59       ` Phillip Susi
@ 2014-12-08 22:25         ` Konstantin
  2014-12-09 16:04           ` Phillip Susi
  2014-12-10  3:10         ` Anand Jain
  1 sibling, 1 reply; 31+ messages in thread
From: Konstantin @ 2014-12-08 22:25 UTC (permalink / raw)
  To: Phillip Susi, MegaBrutal, linux-btrfs


Phillip Susi schrieb am 08.12.2014 um 15:59:
> On 12/7/2014 7:32 PM, Konstantin wrote:
> >> I'm guessing you are using metadata format 0.9 or 1.0, which put
> >> the metadata at the end of the drive and the filesystem still
> >> starts in sector zero.  1.2 is now the default and would not have
> >> this problem as its metadata is at the start of the disk ( well,
> >> 4k from the start ) and the fs starts further down.
> > I know this and I'm using 0.9 on purpose. I need to boot from
> > these disks so I can't use 1.2 format as the BIOS wouldn't
> > recognize the partitions. Having an additional non-RAID disk for
> > booting introduces a single point of failure which contrary to the
> > idea of RAID>0.
>
> The bios does not know or care about partitions.  All you need is a
That's only true for older BIOSs. With current EFI boards they not only
care but some also mess around with GPT partition tables.
> partition table in the MBR and you can install grub there and have it
> boot the system from a mdadm 1.1 or 1.2 format array housed in a
> partition on the rest of the disk.  The only time you really *have* to
I was thinking of this solution as well, but as I'm not aware of any
partitioning tool that cares about mdadm metadata, I rejected it. It
requires a non-standard layout that leaves reserved empty space for the
mdadm metadata. It's possible, but it isn't documented as far as I know,
and rather than lose hours trying, I chose the obvious route.
> use 0.9 or 1.0 ( and you really should be using 1.0 instead since it
> handles larger arrays and can't be confused vis. whole disk vs.
> partition components ) is if you are running a raid1 on the raw disk,
> with no partition table and then partition inside the array instead,
> and really, you just shouldn't be doing that.
That's exactly what I want to do - run RAID1 on the whole disk, as
most hardware-based RAID systems do. Before that I ran RAID on disk
partitions for some years, but that was quite a pain in comparison.
Hot(un)plugging a drive brings a lot of issues with failing mdadm
commands, as they don't like concurrent execution when the same physical
device is affected. And rebuilds of RAID partitions are done sequentially
in no deterministic order. We could talk for hours about that, but if
you're interested, better in private, as it is not BTRFS related.
> > Anyway, to avoid a futile discussion, mdraid and its format is not
> > the problem, it is just an example of the problem. Using dm-raid
> > would do the same trouble, LVM apparently, too. I could think of a
> > bunch of other cases including the use of hardware based RAID
> > controllers. OK, it's not the majority's problem, but that's not
> > the argument to keep a bug/flaw capable of crashing your system.
>
> dmraid solves the problem by removing the partitions from the
> underlying physical device ( /dev/sda ), and only exposing them on the
> array ( /dev/mapper/whatever ).  LVM only has the problem when you
> take a snapshot.  User space tools face the same issue and they
> resolve it by ignoring or deprioritizing the snapshot.
I don't agree. dmraid and mdraid both remove the partitions. That is not
a solution: BTRFS will still crash the PC using /dev/mapper/whatever, or
whatever device appears in the system providing the BTRFS volume.
> > As it is a nice feature that the kernel apparently scans for drives
> > and automatically identifies BTRFS ones, it seems to me that this
> > feature is useless. When in a live system a BTRFS RAID disk fails,
> > it is not sufficient to hot-replace it, the kernel will not
> > automatically rebalance. Commands are still needed for the task as
> > are with mdraid. So the only point I can see at the moment where
> > this auto-detect feature makes sense is when mounting the device
> > for the first time. If I remember the documentation correctly, you
> > mount one of the RAID devices and the others are automagically
> > attached as well. But outside of the mount process, what is this
> > auto-detect used for?
>
> > So here a couple of rather simple solutions which, as far as I can
> > see, could solve the problem:
>
> > 1. Limit the auto-detect to the mount process and don't do it when
> > devices are appearing.
>
> > 2. When a BTRFS device is detected and its metadata is identical to
> > one already mounted, just ignore it.
>
> That doesn't really solve the problem since you can still pick the
> wrong one to mount in the first place.
Oh, it does solve the problem; you are speaking of another problem,
which is always there when you have several disks in a system. In the
case I'm describing, mounting the wrong device can happen if you use the
UUID, label, or some other metadata-related information to mount it. You
won't try to do that when you insert a disk you know has the same
metadata. It will not happen (unless user tools outsmart you ;-)) when
using the device name(s). I think a user mounting things manually can be
expected to know or learn which device node is which drive. On the other
hand, in my case one of the drives is already mounted, so getting it
confused with a freshly inserted drive is not easy. Oh, I forgot that
part of this bug is that /proc/mounts starts to give wrong information,
so in that case, yes, it gets much more likely that you pick the wrong
drive. It can even happen that you format the mounted drive, as user
tools would refuse to work on the non-mounted drive but may go for the
mounted one... not so funny.

Speaking of BTRFS tools, I am still somewhat confused that the problem
of confusing or mixing devices happens at all. I don't know the metadata
of a BTRFS RAID setup, but I assume there must be something like a drive
index in there, as the order of RAID5 drives does matter. So a second
device with identical metadata should be considered invalid for
auto-adding anyway.

But all this fighting is in the wrong place. My understanding of a
good system is not that users should learn which flaws can cause an
unintentional crash, but that it should not have such flaws. We have
identified such a flaw. Regardless of the discussion that there are
better ways to organize your drives, the system should not shoot itself
when you do suboptimal things. So I'm still convinced that the Linux guys
should be interested in making the system better. Is the general opinion
that there is a bug which needs to be fixed, or is this a problem people
will have to live with?



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-08 17:20       ` Robert White
@ 2014-12-08 22:38         ` Konstantin
  2014-12-08 23:17           ` Robert White
  0 siblings, 1 reply; 31+ messages in thread
From: Konstantin @ 2014-12-08 22:38 UTC (permalink / raw)
  To: Robert White, Phillip Susi, MegaBrutal, linux-btrfs


Robert White schrieb am 08.12.2014 um 18:20:
> On 12/07/2014 04:32 PM, Konstantin wrote:
>> I know this and I'm using 0.9 on purpose. I need to boot from these
>> disks so I can't use 1.2 format as the BIOS wouldn't recognize the
>> partitions. Having an additional non-RAID disk for booting introduces a
>> single point of failure which contrary to the idea of RAID>0.
>
> GRUB2 has raid 1.1 and 1.2 metadata support via the mdraid1x module.
> LVM is also supported. I don't know if a stack of both is supported.
>
> There is, BTW, no such thing as a (commodity) computer without a
> single point of failure in it somewhere. I've watched government
> contracts chase this demon for decades. Be it disk, controller,
> network card, bus chip, cpu or stick-of-ram you've got a single point
> of failure somewhere. Actually you likely have several such points of
> potential failure.
>
> For instance, are you _sure_ your BIOS is going to check the second
> drive if it gets read failure after starting in on your first drive?
> Chances are it won't because that four-hundred bytes-or-so boot loader
> on that first disk has no way to branch back into the bios.
>
> You can waste a lot of your life chasing that ghost and you'll still
> discover you've missed it and have to whip out your backup boot media.
>
> It may well be worth having a second copy of /boot around, but make
> sure you stay out of bandersnatch territory when designing your
> system. "The more you over-think the plumbing, the easier it is to
> stop up the pipes."
You are right, there is almost always a single point of failure
somewhere, even if it is the power plant providing your electricity ;-).
I should have written "introduces an additional single point of failure"
to be 100% correct, but I thought this was obvious. As I have replaced
dozens of damaged hard disks but only a few CPUs, RAM sticks, etc., it is
more important for me to reduce the most frequent and easiest-to-solve
points of failure. For more important systems there are high-availability
solutions which alleviate many of the problems you mention, but that's
not the point here when speaking about the major bug in BTRFS which can
make your system crash.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-08 22:38         ` Konstantin
@ 2014-12-08 23:17           ` Robert White
  0 siblings, 0 replies; 31+ messages in thread
From: Robert White @ 2014-12-08 23:17 UTC (permalink / raw)
  To: Konstantin, Phillip Susi, MegaBrutal, linux-btrfs

On 12/08/2014 02:38 PM, Konstantin wrote:
> For more important systems there are high availability
> solutions which alleviate many of the problems you mention of but that's
> not the point here when speaking about the major bug in BTRFS which can
> make your system crash.

I think you missed the part where I told you that you could use GRUB2 
and then use the 1.2 metadata on your raid and have your system work as 
desired.

Trying to make this all about BTRFS is more than a touch disingenuous as 
you are doing things that can make many systems fail in exactly the same 
way.

Undefined behavior is undefined.

The MDADM people made the later metadata layouts to address your issue, 
and it's up to you to use them. Need it to boot? GRUB2 will boot it, and 
it's up to you to use it.

New software fixes problems evident in the old, but once you decide to 
stick with the old despite the new, your problem becomes uninteresting 
because it was already fixed.

So yes, if you use the woefully out-of-date metadata and boot loader you 
will have problems. If you use the distro scripts that scan the volumes 
you don't want scanned, you will have problems. People are working on 
making sure that those problems have workarounds. And sometimes the 
workaround for "doctor, it hurts when I do this" is "don't do that any 
more".

It is multiplicatively impossible to build BTRFS such that it can dance 
through the entire Cartesian product of all possible storage management 
solutions, just as it was impossible for LVM and MDADM before it. If 
your system is layered, _you_ bear the burden of making sure that the 
layers are applied. Each tool is evolving to help you, but it's still you 
doing the system design.

GRUB has put in modules for everything you need (so far) to boot. mdadm 
has better signatures if you use them. LVM has always had device offsets 
built into its metadata block.

But answering the negative, i.e. the sort of question that might be 
phrased "how do you know it's _not_ an old-style mdadm signature?", is an 
unbounded coding problem: not because any one case is impossible to code, 
but because an endless stream of possibilities keeps coming down the 
pipe. A striped storage controller might make a system look as though 
/dev/sdb is a stand-alone BTRFS file system if the controller doesn't 
start and the mdadm and lvm signatures are on /dev/sda and take up "just 
the right amount of room".

If I do an mkfs.ext2 on a medium, then do a cryptsetup luksFormat on 
that same medium, I can mount it either way, with disastrous consequences 
for the other semantic layout.

The bad combinations available are virtually limitless.

There comes a point where the System Architect that decided how to build 
the individual system has to take responsibility for his actions.

Note that the same "it didn't protect me" errors can happen _easily_ 
with other filesystems. Try building an NTFS on a disk, then build an 
ext4 on the same disk, then mount it as either or both. (Though nowadays 
you may need to build the ext4 and then the NTFS, since I think mkfs.ext4 
now has a little dedicated wiper to de-NTFS a disk after that went sour a 
few too many times.)
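
For what it's worth, util-linux's wipefs is a handy way to spot exactly
this kind of leftover-signature conflict (device name is only an example):

  wipefs /dev/sdb1        # list every filesystem/raid signature found
  wipefs -a /dev/sdb1     # erase them all (destructive!)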

When storage signatures conflict you will get "exciting" outcomes. It 
will always be that way, and it's not an "error" in any of that 
filesystem code. You, the System Architect, bear a burden here.

The system isn't shooting "itself" when you do certain things. The 
System Architect is shooting the system with a bad layout bullet.

You don't want some LV to be scanned... don't scan it... If your tools 
scan it automatically, don't use those tools that way. "But my distro 
automatically" is just a reason to look twice at your distro or your design.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-08 22:25         ` Konstantin
@ 2014-12-09 16:04           ` Phillip Susi
  0 siblings, 0 replies; 31+ messages in thread
From: Phillip Susi @ 2014-12-09 16:04 UTC (permalink / raw)
  To: Konstantin, MegaBrutal, linux-btrfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/8/2014 5:25 PM, Konstantin wrote:
> 
> Phillip Susi schrieb am 08.12.2014 um 15:59:
>> The bios does not know or care about partitions.  All you need is
>> a
> That's only true for older BIOSs. With current EFI boards they not
> only care but some also mess around with GPT partition tables.

EFI is a whole other beast that we aren't talking about.

>> partition table in the MBR and you can install grub there and
>> have it boot the system from a mdadm 1.1 or 1.2 format array
>> housed in a partition on the rest of the disk.  The only time you
>> really *have* to
> I was thinking of this solution as well but as I'm not aware of
> any partitioning tool caring about mdadm metadata so I rejected it.
> It requires a non-standard layout leaving reserved empty spaces for
> mdadm metadata. It's possible but it isn't documented so far I know
> and before losing hours of trying I chose the obvious one.

What on earth are you talking about?  A partitioning tool that cares
about mdadm?  A non-standard layout?  I am talking about the bog-standard
layout where you create a partition, then use that partition to build an
mdadm array.  mdadm takes care of its own metadata.  There isn't anything
unusual, non-obvious, or undocumented here.

>> use 0.9 or 1.0 ( and you really should be using 1.0 instead since
>> it handles larger arrays and can't be confused vis. whole disk
>> vs. partition components ) is if you are running a raid1 on the
>> raw disk, with no partition table and then partition inside the
>> array instead, and really, you just shouldn't be doing that.
> That's exactly what I want to do - running RAID1 on the whole disk
> as most hardware based RAID systems do. Before that I was running
> RAID on disk partitions for some years but this was quite a pain in
> comparison. Hot(un)plugging a drive brings you a lot of issues with
> failing mdadm commands as they don't like concurrent execution when
> the same physical device is affected. And rebuild of RAID
> partitions is done sequentially with no deterministic order. We
> could talk for hours about that but if interested maybe better in
> private as it is not BTRFS related.

So don't create more than one raid partition on the disk.

>> dmraid solves the problem by removing the partitions from the 
>> underlying physical device ( /dev/sda ), and only exposing them
>> on the array ( /dev/mapper/whatever ).  LVM only has the problem
>> when you take a snapshot.  User space tools face the same issue
>> and they resolve it by ignoring or deprioritizing the snapshot.
> I don't agree. dmraid and mdraid both remove the partitions. This
> is not a solution BTRFS will still crash the PC using
> /dev/mapper/whatever or whatever device appears in the system
> providing the BTRFS volume.

You just said btrfs will crash by accessing the *correct* volume after
the *incorrect* one has been removed.  You aren't making any sense.
The problem only arises when the same partition is visible on *both*
the raw disk, and the md device.

> Speaking of BTRFS tools, I am still somehow confused that the
> problem confusing or mixing devices happens at all. I don't know
> the metadata of a BTRFS RAID setup but I assume there must be
> something like a drive index in there, as the order of RAID5 drives
> does matter. So having a second device with identical metadata
> should be considered invalid for auto-adding anyway.

Again, the problem is when you first boot up and/or mount the volume.
 Which of the duplicate devices shows up first is indeterminate so
just saying ignore the second one doesn't help.  Even saying "well
error out if there are two" doesn't help since that leaves open a race
condition where the second volume has not appeared yet at the time you
do the check.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUhx16AAoJENRVrw2cjl5R+IYH/R+ftOiy444+W/K+C0cFKBdi
RlMa2Op9Q0322Rae1IiJvkX/TPUQEnr7sFXcOIhYL9/HKB8zGMr+CQq+9rq8lGdB
QurLcI0MpWbwZZCJCTzrJxRBqqPOXKJ1aU9vWLuuGhS9tCdkfxfy9qcXPnmC2Qta
PfN1Qlr4Invb3Kb/NuB2w7S4nhzYLgBa1KgBDm3EWdCzG03WHMAxwSiBgMvf3nzc
DJ/JMF5TP70760yrlWCvFIa1fgWbGVp7fT9yArDab8N53FYAuE8WIunn+g1hHyue
MTF5ZPhEjVKUVHY1Tl1dqdv0i35TXCbXiVwCwk02veV2+lf95zeNcynmB9kUiSc=
=gvB2
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-08 14:59       ` Phillip Susi
  2014-12-08 22:25         ` Konstantin
@ 2014-12-10  3:10         ` Anand Jain
  2014-12-10 15:57           ` Phillip Susi
  1 sibling, 1 reply; 31+ messages in thread
From: Anand Jain @ 2014-12-10  3:10 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Konstantin, MegaBrutal, linux-btrfs



On 08/12/2014 22:59, Phillip Susi wrote:
> On 12/7/2014 7:32 PM, Konstantin wrote:
>>> I'm guessing you are using metadata format 0.9 or 1.0, which put
>>> the metadata at the end of the drive and the filesystem still
>>> starts in sector zero.  1.2 is now the default and would not have
>>> this problem as its metadata is at the start of the disk ( well,
>>> 4k from the start ) and the fs starts further down.
>> I know this and I'm using 0.9 on purpose. I need to boot from
>> these disks so I can't use 1.2 format as the BIOS wouldn't
>> recognize the partitions. Having an additional non-RAID disk for
>> booting introduces a single point of failure which contrary to the
>> idea of RAID>0.
>
> The bios does not know or care about partitions.  All you need is a
> partition table in the MBR and you can install grub there and have it
> boot the system from a mdadm 1.1 or 1.2 format array housed in a
> partition on the rest of the disk.  The only time you really *have* to
> use 0.9 or 1.0 ( and you really should be using 1.0 instead since it
> handles larger arrays and can't be confused vis. whole disk vs.
> partition components ) is if you are running a raid1 on the raw disk,
> with no partition table and then partition inside the array instead,
> and really, you just shouldn't be doing that.
>
>> Anyway, to avoid a futile discussion, mdraid and its format is not
>> the problem, it is just an example of the problem. Using dm-raid
>> would do the same trouble, LVM apparently, too. I could think of a
>> bunch of other cases including the use of hardware based RAID
>> controllers. OK, it's not the majority's problem, but that's not
>> the argument to keep a bug/flaw capable of crashing your system.
>
> dmraid solves the problem by removing the partitions from the
> underlying physical device ( /dev/sda ), and only exposing them on the
> array ( /dev/mapper/whatever ).  LVM only has the problem when you
> take a snapshot.  User space tools face the same issue and they
> resolve it by ignoring or deprioritizing the snapshot.
>
>> As it is a nice feature that the kernel apparently scans for drives
>> and automatically identifies BTRFS ones, it seems to me that this
>> feature is useless. When in a live system a BTRFS RAID disk fails,
>> it is not sufficient to hot-replace it, the kernel will not
>> automatically rebalance. Commands are still needed for the task as
>> are with mdraid. So the only point I can see at the moment where
>> this auto-detect feature makes sense is when mounting the device
>> for the first time. If I remember the documentation correctly, you
>> mount one of the RAID devices and the others are automagically
>> attached as well. But outside of the mount process, what is this
>> auto-detect used for?
>>
>> So here a couple of rather simple solutions which, as far as I can
>> see, could solve the problem:
>>
>> 1. Limit the auto-detect to the mount process and don't do it when
>> devices are appearing.

  In the test case provided earlier, who is triggering the scan?
  grub-probe?


>> 2. When a BTRFS device is detected and its metadata is identical to
>> one already mounted, just ignore it.

  Seems like patch:
    commit b96de000bc8bc9688b3a2abea4332bd57648a49f
    Author: Anand Jain <anand.jain@oracle.com>
    Date:   Thu Jul 3 18:22:05 2014 +0800

      Btrfs: device_list_add() should not update list when mounted


But we had to revert it, since the btrfs bug had become a feature for the
system boot process, and fixing it broke mounting by subvol at boot.

  commit 0f23ae74f589304bf33233f85737f4fd368549eb
  Author: Chris Mason <clm@fb.com>
  Date:   Thu Sep 18 07:49:05 2014 -0700

    Revert "Btrfs: device_list_add() should not update list when mounted"

      This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.


> That doesn't really solve the problem since you can still pick the
> wrong one to mount in the first place.

  The question is: do both devices have the same generation number?
  If not, then this fix takes care of picking the device with the
  larger generation number during mount.

commit 77bdae4d136e167bab028cbec58b988f91cf73c0
Author: Anand Jain <anand.jain@oracle.com>
Date:   Thu Jul 3 18:22:06 2014 +0800

     btrfs: check generation as replace duplicates devid+uuid


  Yes, if there are two devices with the same
    fsid + devid + uuid + generation

  then the one probed last is used during mount.
  OR
  if the device is already mounted, just the device path is updated,
  but the original device will still be in use (bug).
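
  A quick way to eyeball those fields on two candidate devices, using the
  btrfs-progs of that era (newer releases use "btrfs inspect-internal
  dump-super" instead; the device names follow MegaBrutal's reproduction
  and the exact field labels may vary between versions):

    btrfs-show-super /dev/dm-1 | egrep 'fsid|devid|uuid|generation'
    btrfs-show-super /dev/dm-2 | egrep 'fsid|devid|uuid|generation'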

Thanks



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
  2014-12-10  3:10         ` Anand Jain
@ 2014-12-10 15:57           ` Phillip Susi
  0 siblings, 0 replies; 31+ messages in thread
From: Phillip Susi @ 2014-12-10 15:57 UTC (permalink / raw)
  To: Anand Jain; +Cc: Konstantin, MegaBrutal, linux-btrfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/9/2014 10:10 PM, Anand Jain wrote:
> In the test case provided earlier who is triggering the scan ? 
> grub-probe ?

The scan is initiated by udev.  grub-probe only comes into it because
it is looking at /proc/mounts to find out what device is mounted, and
/proc/mounts is lying.

> But we had to revert, Since btrfs bug become a feature for the
> system boot process and fixing that breaks mount at boot with
> subvol.

How is this?  Also are we talking about updating the cached list of
devices that *can* be mounted, or what device already *is* mounted?  I
can see doing the former, but the latter should never happen.

> if the device is already mounted, just the device path is updated 
> but still the original device will be still in use (bug).

Yep, that is the bug that started all of this.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUiG1MAAoJENRVrw2cjl5Rm0gIAJ6sq72zKSEfCuCjigknx25T
a97wjtMeb+yeaECc5FfwN7Fm454GSSuj6RFCRVjo3sCgJP3sUEH49syJnvW1QiEP
A5ktXfTpz6/zaeP9DbGPDCiVix0RdsJ6bCjP/8InsASueXOENCpxxmblxrbE4Wxj
Mdz8lu9L8G+fc6btbLLb0N4i0clSiImQds90zTQ1cXihJ/4wUIO3qgq+rruSYMqI
A182FS7NTUQrRcJ/rbcha3dCyD/urbCaRTUztMvTnSs3a7hK5p+SBNbfxEORC6ni
HrRMxpOlgHOTMnL3EHw843OuGv0Us3VqVbuPG3K6L4+G4W1sFxgKEAnLvEbjzAI=
=Vpre
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2014-12-10 15:57 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-01 12:56 PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots MegaBrutal
2014-12-01 17:27 ` Robert White
2014-12-01 22:10   ` MegaBrutal
2014-12-01 23:24     ` Robert White
2014-12-02  0:15       ` MegaBrutal
2014-12-02  7:50         ` Goffredo Baroncelli
2014-12-02  8:28           ` MegaBrutal
2014-12-02 11:14             ` Goffredo Baroncelli
2014-12-02 11:54               ` Anand Jain
2014-12-02 12:23                 ` Austin S Hemmelgarn
2014-12-02 19:11                   ` Phillip Susi
2014-12-03  8:24                     ` Goffredo Baroncelli
2014-12-04  3:09                       ` Phillip Susi
2014-12-04  5:15                         ` Duncan
2014-12-04  8:20                           ` MegaBrutal
2014-12-04 13:14                             ` Duncan
2014-12-02 19:14                 ` Phillip Susi
2014-12-08  0:05                 ` Konstantin
2014-12-01 21:45 ` Konstantin
2014-12-02  5:47   ` MegaBrutal
2014-12-02 19:19   ` Phillip Susi
2014-12-03  3:01     ` Russell Coker
2014-12-08  0:32     ` Konstantin
2014-12-08 14:59       ` Phillip Susi
2014-12-08 22:25         ` Konstantin
2014-12-09 16:04           ` Phillip Susi
2014-12-10  3:10         ` Anand Jain
2014-12-10 15:57           ` Phillip Susi
2014-12-08 17:20       ` Robert White
2014-12-08 22:38         ` Konstantin
2014-12-08 23:17           ` Robert White

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.