* RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
@ 2017-08-02  8:38 Brendan Hide
  2017-08-02  9:11 ` Wang Shilong
                   ` (5 more replies)
  0 siblings, 6 replies; 63+ messages in thread
From: Brendan Hide @ 2017-08-02  8:38 UTC (permalink / raw)
  To: linux-btrfs

The title seems alarmist to me - and I suspect it is going to be 
misconstrued. :-/

From the release notes at 
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html

"Btrfs has been deprecated

The Btrfs file system has been in Technology Preview state since the 
initial release of Red Hat Enterprise Linux 6. Red Hat will not be 
moving Btrfs to a fully supported feature and it will be removed in a 
future major release of Red Hat Enterprise Linux.

The Btrfs file system did receive numerous updates from the upstream in 
Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat 
Enterprise Linux 7 series. However, this is the last planned update to 
this feature.

Red Hat will continue to invest in future technologies to address the 
use cases of our customers, specifically those related to snapshots, 
compression, NVRAM, and ease of use. We encourage feedback through your 
Red Hat representative on features and requirements you have for file 
systems and storage technology."



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-02  8:38 RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut? Brendan Hide
@ 2017-08-02  9:11 ` Wang Shilong
  2017-08-03 19:18   ` Chris Murphy
  2017-08-02 11:25 ` Austin S. Hemmelgarn
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 63+ messages in thread
From: Wang Shilong @ 2017-08-02  9:11 UTC (permalink / raw)
  To: Brendan Hide, linux-btrfs

I haven't seen active btrfs developers from Red Hat for some time; Red Hat
looks to have put most of its effort into XFS. It is time to switch to SLES/openSUSE!


On Wed, Aug 2, 2017 at 4:38 PM, Brendan Hide <brendan@swiftspirit.co.za> wrote:
> The title seems alarmist to me - and I suspect it is going to be
> misconstrued. :-/
>
> From the release notes at
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html
>
> "Btrfs has been deprecated
>
> The Btrfs file system has been in Technology Preview state since the initial
> release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a
> fully supported feature and it will be removed in a future major release of
> Red Hat Enterprise Linux.
>
> The Btrfs file system did receive numerous updates from the upstream in Red
> Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise
> Linux 7 series. However, this is the last planned update to this feature.
>
> Red Hat will continue to invest in future technologies to address the use
> cases of our customers, specifically those related to snapshots,
> compression, NVRAM, and ease of use. We encourage feedback through your Red
> Hat representative on features and requirements you have for file systems
> and storage technology."
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-02  8:38 RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut? Brendan Hide
  2017-08-02  9:11 ` Wang Shilong
@ 2017-08-02 11:25 ` Austin S. Hemmelgarn
  2017-08-02 12:55   ` Lutz Vieweg
  2017-08-02 18:44 ` Chris Mason
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-02 11:25 UTC (permalink / raw)
  To: Brendan Hide, linux-btrfs

On 2017-08-02 04:38, Brendan Hide wrote:
> The title seems alarmist to me - and I suspect it is going to be 
> misconstrued. :-/
> 
> From the release notes at 
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html 
> 
> 
> "Btrfs has been deprecated
> 
> The Btrfs file system has been in Technology Preview state since the 
> initial release of Red Hat Enterprise Linux 6. Red Hat will not be 
> moving Btrfs to a fully supported feature and it will be removed in a 
> future major release of Red Hat Enterprise Linux.
> 
> The Btrfs file system did receive numerous updates from the upstream in 
> Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat 
> Enterprise Linux 7 series. However, this is the last planned update to 
> this feature.
> 
> Red Hat will continue to invest in future technologies to address the 
> use cases of our customers, specifically those related to snapshots, 
> compression, NVRAM, and ease of use. We encourage feedback through your 
> Red Hat representative on features and requirements you have for file 
> systems and storage technology."

And this is a worst-case result of the fact that most distros added 
BTRFS support long before it was ready.

I'm betting some RH customer lost a lot of data because they didn't pay 
attention to the warnings, didn't do their research, and were using 
raid5/6, and thus RH is considering it not worth investing in.  That, or 
they got fed up with the grandiose plans with no realistic timeline. 
There have been a number of cases of mishandled patches (chunk-level 
degraded check, anyone?), and a lot of important (in an enterprise-usage 
sense) features that have been proposed but, to a naive outsider, have 
seen little to no progress (hot-spare support, device failure 
detection and handling, higher-order replication, working erasure coding 
(raid56), etc.), and from both aspects I can understand them not wanting 
to deal with it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-02 11:25 ` Austin S. Hemmelgarn
@ 2017-08-02 12:55   ` Lutz Vieweg
  2017-08-02 13:47     ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 63+ messages in thread
From: Lutz Vieweg @ 2017-08-02 12:55 UTC (permalink / raw)
  To: linux-btrfs

On 08/02/2017 01:25 PM, Austin S. Hemmelgarn wrote:
> And this is a worst-case result of the fact that most
> distros added BTRFS support long before it was ready.

RedHat still advertises "Ceph", and given Ceph initially recommended btrfs as
the filesystem to use for its nodes, it is interesting to read how clearly
they recommend against btrfs now:

http://docs.ceph.com/docs/master/rados/configuration/filesystem-recommendations/
> We recommend against using btrfs due to the lack of a stable version
> to test against and frequent bugs in the ENOSPC handling.

German IT magazine "Golem" speculates that RedHat's decision
is influenced by its recent acquisition of Permabit.

But I don't really see how XFS or Permabit tackle the problem
that if you need to create consistent backups of file systems while they are
in use, block-device level snapshots damage the write performance
big time.

(That backup topic is the one reason we use btrfs for a lot of
/home/ directories.)
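
(For the curious, the btrfs variant is filesystem-level and cheap -- a
minimal sketch, with made-up paths and hostname:

  # freeze the backup source with a read-only snapshot
  btrfs subvolume snapshot -r /home /home/.snapshots/home-20170802
  # stream it to the backup machine; add -p <older-snapshot> to send a delta
  btrfs send /home/.snapshots/home-20170802 | ssh backuphost btrfs receive /backup

No block-device snapshot is involved, so normal writes to /home keep
their usual performance.)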

I understand that XFS is expected to get some COW-features in the future
as well - but it remains to be seen what performance and robustness
implications that will have on XFS.

Regards,

Lutz Vieweg


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-02 12:55   ` Lutz Vieweg
@ 2017-08-02 13:47     ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-02 13:47 UTC (permalink / raw)
  To: Lutz Vieweg, linux-btrfs

On 2017-08-02 08:55, Lutz Vieweg wrote:
> On 08/02/2017 01:25 PM, Austin S. Hemmelgarn wrote:
>> And this is a worst-case result of the fact that most
>> distros added BTRFS support long before it was ready.
> 
> RedHat still advertises "Ceph", and given Ceph initially recommended 
> btrfs as
> the filesystem to use for its nodes, it is interesting to read how clearly
> they recommend against btrfs now:
> 
> http://docs.ceph.com/docs/master/rados/configuration/filesystem-recommendations/ 
> 
>> We recommend against using btrfs due to the lack of a stable version
>> to test against and frequent bugs in the ENOSPC handling.
Yes, and the one thing they don't mention there is that Ceph is already 
doing most of the same things that BTRFS is, so you end up having 
performance issues due to duplicated work too.  What they specifically 
call out, though, is first the reason it should not yet be supported 
in RHEL, OEL, and many other distros (I'm explicitly leaving 
SLES/OpenSUSE off that list, because while I disagree with their 
choices of default behavior WRT BTRFS, they are actively involved in 
its development, unlike most of the other distros that 'support' it), 
and second one of the biggest issues for regular usage.
> 
> German IT magazine "Golem" speculates that RedHat's decision
> is influenced by its recent acquisition of Permabit.
> 
> But I don't really see how XFS or Permabit tackle the problem
> that if you need to create consistent backups of file systems while they 
> are
> in use, block-device level snapshots damage the write performance
> big time.
When you're talking about data safety though, most people are willing to 
sacrifice write performance in favor of significantly lowering perceived 
risk.  The misguided early support of BTRFS by many distros, without 
sufficient explanation of exactly how 'in-development' it is, means that 
there are far more stories of issues and failures with BTRFS than of 
success (partly also because the filesystem is one of those things that 
people tend to complain about if it breaks, and not praise all that much 
if it works), and as a result the general perception outside of people 
who use it actively is that it's pretty risky to use (which is absolutely 
accurate if you don't do routine maintenance on it).
> 
> (That backup topic is the one reason we use btrfs for a lot of
> /home/ directories.)
> 
> I understand that XFS is expected to get some COW-features in the future
> as well - but it remains to be seen what performance and robustness
> implications that will have on XFS.
I believe basic reflink functionality is already upstream, and I wasn't 
aware of any other specific development for XFS.
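
For anyone wanting to play with it, a quick sketch (assuming a recent
enough xfsprogs -- reflink is still experimental and off by default, and
the device name is made up):

  mkfs.xfs -m reflink=1 /dev/sdb1    # reflink must be enabled at mkfs time
  mount /dev/sdb1 /mnt
  cp --reflink=always /mnt/big.img /mnt/clone.img   # shared extents, CoW on write

Note that this only gives you shared extents, not snapshots or checksums.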

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-02  8:38 RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut? Brendan Hide
  2017-08-02  9:11 ` Wang Shilong
  2017-08-02 11:25 ` Austin S. Hemmelgarn
@ 2017-08-02 18:44 ` Chris Mason
  2017-08-02 22:12   ` Fajar A. Nugraha
  2017-08-02 22:22 ` Chris Murphy
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 63+ messages in thread
From: Chris Mason @ 2017-08-02 18:44 UTC (permalink / raw)
  To: Brendan Hide, linux-btrfs

On 08/02/2017 04:38 AM, Brendan Hide wrote:
> The title seems alarmist to me - and I suspect it is going to be 
> misconstrued. :-/

Supporting any filesystem is a huge amount of work.  I don't have a 
problem with Redhat or any distro picking and choosing the projects they 
want to support.

At least inside of FB, our own internal btrfs usage is continuing to 
grow.  Btrfs is becoming a big part of how we ship containers and other 
workloads where snapshots improve performance.

We also heavily use XFS, so I'm happy to see RH's long standing 
investment there continue.

-chris

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-02 18:44 ` Chris Mason
@ 2017-08-02 22:12   ` Fajar A. Nugraha
  0 siblings, 0 replies; 63+ messages in thread
From: Fajar A. Nugraha @ 2017-08-02 22:12 UTC (permalink / raw)
  To: linux-btrfs

On Thu, Aug 3, 2017 at 1:44 AM, Chris Mason <clm@fb.com> wrote:
>
> On 08/02/2017 04:38 AM, Brendan Hide wrote:
>>
>> The title seems alarmist to me - and I suspect it is going to be misconstrued. :-/
>
>
> Supporting any filesystem is a huge amount of work.  I don't have a problem with Redhat or any distro picking and choosing the projects they want to support.
>

It'd help a lot of people if things like
https://btrfs.wiki.kernel.org/index.php/Status were kept up-to-date and
'promoted', so at least users would be more informed about what they're
getting into and could choose which features (stable / still in dev /
likely to destroy your data) they want to use.

For example, https://btrfs.wiki.kernel.org/index.php/Status says
compression is 'mostly OK' ('auto-repair and compression may crash'
looks pretty scary, as from a newcomer's perspective it might be
interpreted as 'potential data loss'), while
https://en.opensuse.org/SDB:BTRFS#Compressed_btrfs_filesystems says
they support compression on newer openSUSE versions.


>
> At least inside of FB, our own internal btrfs usage is continuing to grow.  Btrfs is becoming a big part of how we ship containers and other workloads where snapshots improve performance.
>

Ubuntu also supports btrfs as part of their container implementation
(lxd), and (reading the lxd mailing list) some people use lxd+btrfs in
their production environments. IIRC the last btrfs problem posted on the
lxd list was about how 'btrfs send/receive (used by lxd copy) is
slower than rsync for a full/initial copy'.
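
(Roughly this comparison, with made-up paths:

  # full/initial copy via send/receive, which is what a container copy boils down to
  btrfs send /pool/.snapshots/c1-snap0 | btrfs receive /backup
  # versus a plain rsync of the same tree
  rsync -aHAX /pool/containers/c1/ /backup/c1/

)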

-- 
Fajar

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-02  8:38 RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut? Brendan Hide
                   ` (2 preceding siblings ...)
  2017-08-02 18:44 ` Chris Mason
@ 2017-08-02 22:22 ` Chris Murphy
  2017-08-03  9:59   ` Lutz Vieweg
  2017-08-03 18:08 ` waxhead
  2017-08-04 14:05 ` Qu Wenruo
  5 siblings, 1 reply; 63+ messages in thread
From: Chris Murphy @ 2017-08-02 22:22 UTC (permalink / raw)
  To: Brendan Hide; +Cc: Btrfs BTRFS

On Wed, Aug 2, 2017 at 2:38 AM, Brendan Hide <brendan@swiftspirit.co.za> wrote:
> The title seems alarmist to me - and I suspect it is going to be
> misconstrued. :-/

Josef pushed back on the HN thread with very sound reasoning about why
this is totally unsurprising. RHEL runs old kernels, and they have no
upstream Btrfs developers. So it's a huge PITA to backport the tons of
changes Btrfs has been going through (thousands of lines changed per
kernel cycle).

What's more interesting to me is whether this means
-  CONFIG_BTRFS_FS=m
+  # CONFIG_BTRFS_FS is not set

In particular in elrepo.org kernels.
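
Easy enough to check on any given kernel, e.g.:

  # is btrfs still enabled in the running kernel's config?
  grep CONFIG_BTRFS_FS /boot/config-$(uname -r)
  # and is the module actually shipped?
  modinfo -n btrfs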

Also more interesting is this Stratis project that started up a few months ago:

https://github.com/stratis-storage/stratisd

Which also includes this design document:
https://stratis-storage.github.io/StratisSoftwareDesign.pdf

Basically they're creating a file system manager manifesting as a
daemon, new CLI tools, and new metadata formats for the volume
manager. So it's going to use existing device mapper, md, some LVM
stuff, XFS, in a layered approach abstracted from the user.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-02 22:22 ` Chris Murphy
@ 2017-08-03  9:59   ` Lutz Vieweg
  0 siblings, 0 replies; 63+ messages in thread
From: Lutz Vieweg @ 2017-08-03  9:59 UTC (permalink / raw)
  To: linux-btrfs

On 08/03/2017 12:22 AM, Chris Murphy wrote:
> Also more interesting is this Stratis project that started up a few months ago:
> https://github.com/stratis-storage/stratisd
>
> Which also includes this design document:
> https://stratis-storage.github.io/StratisSoftwareDesign.pdf

This concept, if successfully implemented, does not seem to achieve
anything beyond "hide the complexity of its implementation from the user".

No actual new functionality, no reason to assume any additional robustness
or stability, and certainly not a new filesystem - just yet-another-wrapper.

Keeping users from understanding the complexity of a storage system
they use is of no benefit except in the most trivial use cases.

And I find it symptomatic that the section "D-Bus Access Control" in
StratisSoftwareDesign.pdf is empty.

> So it's going to use existing device mapper, md, some LVM
> stuff, XFS

That is the only part of the Stratis concept that looks reasonable to me.





^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-02  8:38 RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut? Brendan Hide
                   ` (3 preceding siblings ...)
  2017-08-02 22:22 ` Chris Murphy
@ 2017-08-03 18:08 ` waxhead
  2017-08-03 18:29   ` Christoph Anton Mitterer
                     ` (2 more replies)
  2017-08-04 14:05 ` Qu Wenruo
  5 siblings, 3 replies; 63+ messages in thread
From: waxhead @ 2017-08-03 18:08 UTC (permalink / raw)
  To: Brendan Hide, linux-btrfs

Brendan Hide wrote:
> The title seems alarmist to me - and I suspect it is going to be 
> misconstrued. :-/
>
> From the release notes at 
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html
>
> "Btrfs has been deprecated
>
> The Btrfs file system has been in Technology Preview state since the 
> initial release of Red Hat Enterprise Linux 6. Red Hat will not be 
> moving Btrfs to a fully supported feature and it will be removed in a 
> future major release of Red Hat Enterprise Linux.
>
> The Btrfs file system did receive numerous updates from the upstream 
> in Red Hat Enterprise Linux 7.4 and will remain available in the Red 
> Hat Enterprise Linux 7 series. However, this is the last planned 
> update to this feature.
>
> Red Hat will continue to invest in future technologies to address the 
> use cases of our customers, specifically those related to snapshots, 
> compression, NVRAM, and ease of use. We encourage feedback through 
> your Red Hat representative on features and requirements you have for 
> file systems and storage technology."
>
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
First of all I am not a BTRFS dev, but I use it for various projects and 
have high hopes for what it can become.

Now, the fact that Red Hat deprecates BTRFS does not mean that BTRFS is 
deprecated. It is not removed from the kernel, and so far BTRFS offers 
features that other filesystems don't have. ZFS is something that people 
brag about all the time as a viable alternative, but for me it seems to 
be a pain to manage properly. E.g. grow, add/remove devices, shrink 
etc... good luck doing that right!
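
With btrfs, by contrast, all of that is a handful of one-liners
(device names made up, obviously):

  btrfs device add /dev/sdc /mnt        # grow by adding a device
  btrfs device remove /dev/sdb /mnt     # shrink by removing one; data migrates off
  btrfs filesystem resize -50G /mnt     # or shrink in place
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt  # even restripe online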

BTRFS's biggest problem is not that some bits and pieces are 
thoroughly screwed up (raid5/6, which just got some fixes by the 
way), but the fact that the documentation is rather dated.

There is a simple status page here 
https://btrfs.wiki.kernel.org/index.php/Status

As others have pointed out already, the explanations on the status page 
are not exactly good. For example compression (which was also mentioned) 
is, as of writing this, marked as 'Mostly ok'  '(needs verification and 
source) - auto repair and compression may crash'

Now, I am aware that many use compression without trouble. I am not sure 
how many use compression on disks with issues and don't have trouble, 
but I would at least expect to see more people yelling on the mailing 
list if that were the case. The problem here is that this message is 
rather scary and certainly does NOT sound like 'mostly ok' for most people.

What exactly needs verification and a source? The 'mostly ok' statement or 
something else?! A more detailed explanation would be required here to 
avoid scaring people away.

Same thing with the trim feature, which is marked OK. It clearly says 
that it has performance implications. It is marked OK, so one would 
expect it not to cause the filesystem to fail, but if the performance 
becomes so slow that the filesystem gets practically unusable it is of 
course not "OK". The relevant information for people to make a decent 
choice is missing, and I certainly don't know how serious these 
performance implications are, or if they are relevant at all...

Most people interested in BTRFS are probably a bit more paranoid and 
concerned about their data than the average computer user. What people 
tend to forget is that other filesystems have NONE of the redundancy, 
auto-repair and other fancy features that BTRFS has. So for the 
compression example above... if you keep compressed files on ext4 and 
your disk gets some corruption, you are in no better a state than you 
would be with btrfs (in fact probably worse). Also, nothing is 
stopping you from putting btrfs DUP on an mdadm raid5 or 6, which means 
you should be VERY safe.
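
(Something like this, sketched with made-up devices -- note that -d dup
on a single device needs a reasonably recent mkfs.btrfs:

  mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[b-e]
  mkfs.btrfs -m dup -d dup /dev/md0   # btrfs checksums + DUP on top of md's parity

)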

Simple documentation is the key so HERE ARE MY DEMANDS!!!..... ehhh.... 
so here is what I think should be done:

1. The documentation needs to be improved (or old, non-relevant 
stuff simply removed / archived somewhere)
2. The status page MUST always be up to date for the latest kernel 
release (It's OK so far; let's hope nobody sleeps here)
3. Proper explanations must be given so that laymen and reasonably 
technical people understand the risks / issues for non-OK stuff.
4. There should be links to roadmaps for each feature on the status page 
that clearly state what is being worked on for the NEXT kernel release






^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-03 18:08 ` waxhead
@ 2017-08-03 18:29   ` Christoph Anton Mitterer
  2017-08-03 19:22     ` Austin S. Hemmelgarn
  2017-08-03 19:03   ` Austin S. Hemmelgarn
  2017-08-16 18:07   ` David Sterba
  2 siblings, 1 reply; 63+ messages in thread
From: Christoph Anton Mitterer @ 2017-08-03 18:29 UTC (permalink / raw)
  To: linux-btrfs


On Thu, 2017-08-03 at 20:08 +0200, waxhead wrote:
> Brendan Hide wrote:
> > The title seems alarmist to me - and I suspect it is going to be 
> > misconstrued. :-/
> > 
> > From the release notes at 
> > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Li
> > nux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-
> > 7.4_Release_Notes-Deprecated_Functionality.html
> > "Btrfs has been deprecated
> > 

Wow... not that this would have any direct effect... it's still quite
alarming, isn't it?

This is not meant as criticism, but I often wonder myself where
btrfs is going!? :-/

It's in the kernel now since when? 2009? And while the extremely basic
things (snapshots, etc.) seem to work quite stably... other things seem
to be rather stuck (RAID?)... not to mention many things that have
been kinda "promised" (fancy different compression algos, n-parity-
raid).
There are no higher-level management tools (e.g. RAID
management/monitoring, etc.)... there are still some kinda serious
issues (the attacks/corruptions likely possible via UUID collisions)...
One thing I have missed for a long time is checksumming with
nodatacow.
Also, it has always been said that the actual performance tuning still
lies ahead?!


I really like btrfs and use it on all my personal systems... and I
haven't had any data loss so far (only a number of serious-looking
false positives due to bugs in btrfs check ;-) )... but one still
reads every now and then about people here on the list who seem to
suffer more serious losses.



So is there any concrete roadmap? Or priority tasks? Is there a lack of
developers?

Cheers,
Chris.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-03 18:08 ` waxhead
  2017-08-03 18:29   ` Christoph Anton Mitterer
@ 2017-08-03 19:03   ` Austin S. Hemmelgarn
  2017-08-04  9:48     ` Duncan
  2017-08-16 18:07   ` David Sterba
  2 siblings, 1 reply; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-03 19:03 UTC (permalink / raw)
  To: waxhead, Brendan Hide, linux-btrfs

On 2017-08-03 14:08, waxhead wrote:
> Brendan Hide wrote:
>> The title seems alarmist to me - and I suspect it is going to be 
>> misconstrued. :-/
>>
>> From the release notes at 
>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html 
>>
>>
>> "Btrfs has been deprecated
>>
>> The Btrfs file system has been in Technology Preview state since the 
>> initial release of Red Hat Enterprise Linux 6. Red Hat will not be 
>> moving Btrfs to a fully supported feature and it will be removed in a 
>> future major release of Red Hat Enterprise Linux.
>>
>> The Btrfs file system did receive numerous updates from the upstream 
>> in Red Hat Enterprise Linux 7.4 and will remain available in the Red 
>> Hat Enterprise Linux 7 series. However, this is the last planned 
>> update to this feature.
>>
>> Red Hat will continue to invest in future technologies to address the 
>> use cases of our customers, specifically those related to snapshots, 
>> compression, NVRAM, and ease of use. We encourage feedback through 
>> your Red Hat representative on features and requirements you have for 
>> file systems and storage technology."
>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> First of all I am not a BTRFS dev, but I use it for various projects and 
> have high hopes for what it can become.
> 
> Now, the fact that Red Hat deprecates BTRFS does not mean that BTRFS is 
> deprecated. It is not removed from the kernel, and so far BTRFS offers 
> features that other filesystems don't have. ZFS is something that people 
> brag about all the time as a viable alternative, but for me it seems to 
> be a pain to manage properly. E.g. grow, add/remove devices, shrink 
> etc... good luck doing that right!
> 
> BTRFS's biggest problem is not that some bits and pieces are 
> thoroughly screwed up (raid5/6, which just got some fixes by the 
> way), but the fact that the documentation is rather dated.
> 
> There is a simple status page here 
> https://btrfs.wiki.kernel.org/index.php/Status
> 
> As others have pointed out already, the explanations on the status page 
> are not exactly good. For example compression (which was also mentioned) 
> is, as of writing this, marked as 'Mostly ok'  '(needs verification and 
> source) - auto repair and compression may crash'
> 
> Now, I am aware that many use compression without trouble. I am not sure 
> how many use compression on disks with issues and don't have trouble, 
> but I would at least expect to see more people yelling on the mailing 
> list if that were the case. The problem here is that this message is 
> rather scary and certainly does NOT sound like 'mostly ok' for most people.
> 
> What exactly needs verification and a source? The 'mostly ok' statement or 
> something else?! A more detailed explanation would be required here to 
> avoid scaring people away.
Not certain what was meant here; there were (a while back) some 
known issues with compressed extents, but I thought those had been fixed.
> 
> Same thing with the trim feature, which is marked OK. It clearly says 
> that it has performance implications. It is marked OK, so one would 
> expect it not to cause the filesystem to fail, but if the performance 
> becomes so slow that the filesystem gets practically unusable it is of 
> course not "OK". The relevant information for people to make a decent 
> choice is missing, and I certainly don't know how serious these 
> performance implications are, or if they are relevant at all...
The performance implications bit shouldn't be listed, that's a given for 
any filesystem with discard (TRIM is the ATA and eMMC command, UNMAP is 
the SCSI one, and ERASE is the name on SD cards, discard is the generic 
kernel term) support.  The issue arises from devices that don't have 
support for queuing such commands, which is quite rare for SSD's these days.
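
(Checking for basic discard support is trivial, by the way, though
whether the commands get queued is not visible here:

  lsblk --discard /dev/sda    # non-zero DISC-GRAN/DISC-MAX means discard works
  cat /sys/block/sda/queue/discard_max_bytes

)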
> 
> Most people interested in BTRFS are probably a bit more paranoid and 
> concerned about their data than the average computer user. What people 
> tend to forget is that other filesystems have NONE of the redundancy, 
> auto-repair and other fancy features that BTRFS has. So for the 
> compression example above... if you keep compressed files on ext4 and 
> your disk gets some corruption, you are in no better a state than you 
> would be with btrfs (in fact probably worse). Also, nothing is 
> stopping you from putting btrfs DUP on an mdadm raid5 or 6, which means 
> you should be VERY safe.
> 
> Simple documentation is the key so HERE ARE MY DEMANDS!!!..... ehhh.... 
> so here is what I think should be done:
> 
> 1. The documentation needs to be improved (or old, non-relevant 
> stuff simply removed / archived somewhere)
> 2. The status page MUST always be up to date for the latest kernel 
> release (It's OK so far; let's hope nobody sleeps here)
> 3. Proper explanations must be given so that laymen and reasonably 
> technical people understand the risks / issues for non-OK stuff.
> 4. There should be links to roadmaps for each feature on the status page 
> that clearly state what is being worked on for the NEXT kernel release
I entirely agree on all of this, but there is a severe lack of people 
willing to maintain it (I for example do not have the patience to 
maintain it, let alone the time).

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-02  9:11 ` Wang Shilong
@ 2017-08-03 19:18   ` Chris Murphy
  0 siblings, 0 replies; 63+ messages in thread
From: Chris Murphy @ 2017-08-03 19:18 UTC (permalink / raw)
  To: Wang Shilong; +Cc: Brendan Hide, linux-btrfs

On Wed, Aug 2, 2017 at 3:11 AM, Wang Shilong <wangshilong1991@gmail.com> wrote:
> I haven't seen active btrfs developers from Red Hat for some time; Red Hat
> looks to have put most of its effort into XFS. It is time to switch to SLES/openSUSE!

I disagree. We need one or more Btrfs developers involved in Fedora.

Fedora runs fairly unmodified upstream kernels, which are kept up to
date. By default, Fedora 24, 25, and 26 users today are on kernel 4.11.11
or 4.11.12. Fedora 25 and 26 will soon be rebased, probably to 4.12.5.
That's the stable repo. You can optionally get newer non-rc kernels from
the testing repo. And nightly Rawhide kernels are built as well, with the
latest patchset in between rc's. Both Btrfs and Fedora are doing heavy
development around containerized deployments, so it seems like a good fit
for both camps.

The problem is the Fedora kernel team has no one sufficiently familiar
with Btrfs, nor anyone at Red Hat to fall back on. But they do have
this with ext4, XFS, device-mapper, and LVM developers. So they're not
going to take on a burden like Btrfs by default without a
knowledgeable pair of eyes to triage issues as they come up. And
instead they're moving to XFS + overlayfs.

There's more opportunity for Btrfs than just as a default file system.
I like the idea of using Btrfs on install media to eliminate the
monolithic isomd5sum most users skip to test their USB install media;
eliminate device-mapper based persistent overlay for the install media
and use Btrfs seed/sprout instead (which would help the Sugar on a
Stick project as well); and at least for nightly composes eliminate
squashfs xz-based images in favor of Btrfs compression (faster
compression and decompression, bigger file sizes, but these are daily
throwaways so I think time is more important).

Anyway - point is that converging on SUSE doesn't help. If anything I
think it'll shrink the market for Btrfs as a general purpose file
system, rather than grow it.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-03 18:29   ` Christoph Anton Mitterer
@ 2017-08-03 19:22     ` Austin S. Hemmelgarn
  2017-08-03 20:45       ` Brendan Hide
  0 siblings, 1 reply; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-03 19:22 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs

On 2017-08-03 14:29, Christoph Anton Mitterer wrote:
> On Thu, 2017-08-03 at 20:08 +0200, waxhead wrote:
>> Brendan Hide wrote:
>>> The title seems alarmist to me - and I suspect it is going to be
>>> misconstrued. :-/
>>>
>>> From the release notes at
>>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Li
>>> nux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-
>>> 7.4_Release_Notes-Deprecated_Functionality.html
>>> "Btrfs has been deprecated
>>>
> 
> Wow... not that this would have any direct effect... it's still quite
> alarming, isn't it?
> 
> This is not meant as criticism, but I often wonder myself where
> btrfs is going!? :-/
> 
> It's in the kernel now since when? 2009? And while the extremely basic
> things (snapshots, etc.) seem to work quite stably... other things seem
> to be rather stuck (RAID?)... not to mention many things that have
> been kinda "promised" (fancy different compression algos, n-parity-
> raid).
I assume you mean the erasure coding the devs and docs call raid56 when 
you're talking about stuck features, and you're right, it has been 
stuck, but it arguably should have been better tested and verified 
before being merged at all.  As far as other 'raid' profiles, raid1 and 
raid0 work fine, and raid10 is mostly fine once you wrap your head 
around the implications of the inconsistent component device ordering.
> There are no higher-level management tools (e.g. RAID
> management/monitoring, etc.)... there are still some kinda serious
> issues (the attacks/corruptions likely possible via UUID collisions)...
The UUID collision issue is present in almost all volume managers and 
filesystems, it just does more damage in BTRFS, and is exacerbated by 
the brain-dead 'scan everything for BTRFS' policy in udev.

As far as 'higher-level' management tools, you're using your system 
wrong if you _need_ them.  There is no need for there to be a GUI, or a 
web interface, or a DBus interface, or any other such bloat in the main 
management tools, they work just fine as is and are mostly on par with 
the interfaces provided by LVM, MD, and ZFS (other than the lack of 
machine parseable output).  I'd also argue that if you can't reassemble 
your storage stack by hand without using 'higher-level' tools, you 
should not be using that storage stack as you don't properly understand it.

On the subject of monitoring specifically, part of the issue there is 
kernel side, any monitoring system currently needs to be polling-based, 
not event-based, and as a result monitoring tends to be a very system 
specific affair based on how much overhead you're willing to tolerate. 
The limited stuff that does exist is also trivial to integrate with many 
pieces of existing monitoring infrastructure (like Nagios or monit), and 
therefore the people who care about it a lot (like me) are either 
monitoring by hand, or are just using the tools with their existing 
infrastructure (for example, I use monit already on all my systems, so I 
just make sure to have entries in the config for that to check error 
counters and scrub results), so there's not much in the way of incentive 
for the concerned parties to reinvent the wheel.
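
(A minimal sketch of that kind of check, with a made-up mountpoint;
the -c/--check flag needs a recent btrfs-progs:

  #!/bin/sh
  # exits non-zero if any btrfs device error counter on /data is non-zero,
  # which something like monit's "check program" can turn into an alert
  exec btrfs device stats -c /data

)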
> One thing I have missed for a long time is checksumming with
> nodatacow.
It has been stated multiple times on the list that this is not possible 
without making nodatacow prone to data loss.
> Also, it has always been said that the actual performance tuning still
> lies ahead?!
While there hasn't been anything touted specifically as performance 
tuning, performance has improved slightly since I started using BTRFS.
> 
> 
> I really like btrfs and use it on all my personal systems... and I
> haven't had any data loss so far (only a number of serious-looking
> false positives due to bugs in btrfs check ;-) )... but one still
> reads every now and then about people here on the list who seem to
> suffer more serious losses.
And this brings up part of the issue with uptake.  People are quick to 
post about issues, but not successes.  I've been running BTRFS on almost 
everything (I don't use it in VM's because of the performance 
implications of having multiple CoW layers) since around kernel 3.9, 
have had no critical issues (ones resulting in data loss) since about 
3.16, and have actually survived quite a few pieces of marginal or 
failed hardware as a result of BTRFS.
> 
> 
> 
> So is there any concrete roadmap? Or priority tasks? Is there a lack of
> developers?
In order, no, in theory yes but not in practice, and somewhat.

As a general rule, all FOSS projects are short on developers.  Most of 
the work that is occurring on BTRFS is being sponsored by SUSE, 
Facebook, or Fujitsu (at least, I'm pretty sure those are the primary 
sponsors), and their priorities will not necessarily coincide with 
normal end-user priorities.  I'd say though that testing and review are 
just as much short on manpower as development.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-03 19:22     ` Austin S. Hemmelgarn
@ 2017-08-03 20:45       ` Brendan Hide
  2017-08-03 22:00         ` Chris Murphy
  2017-08-04 11:26         ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 63+ messages in thread
From: Brendan Hide @ 2017-08-03 20:45 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Christoph Anton Mitterer, linux-btrfs



On 08/03/2017 09:22 PM, Austin S. Hemmelgarn wrote:
> On 2017-08-03 14:29, Christoph Anton Mitterer wrote:
>> On Thu, 2017-08-03 at 20:08 +0200, waxhead wrote:
>> There are no higher-level management tools (e.g. RAID
>> management/monitoring, etc.)...
[snip]

> As far as 'higher-level' management tools, you're using your system 
> wrong if you _need_ them.  There is no need for there to be a GUI, or a 
> web interface, or a DBus interface, or any other such bloat in the main 
> management tools, they work just fine as is and are mostly on par with 
> the interfaces provided by LVM, MD, and ZFS (other than the lack of 
> machine parseable output).  I'd also argue that if you can't reassemble 
> your storage stack by hand without using 'higher-level' tools, you 
> should not be using that storage stack as you don't properly understand it.
> 
> On the subject of monitoring specifically, part of the issue there is 
> kernel side, any monitoring system currently needs to be polling-based, 
> not event-based, and as a result monitoring tends to be a very system 
> specific affair based on how much overhead you're willing to tolerate. 
> The limited stuff that does exist is also trivial to integrate with many 
> pieces of existing monitoring infrastructure (like Nagios or monit), and 
> therefore the people who care about it a lot (like me) are either 
> monitoring by hand, or are just using the tools with their existing 
> infrastructure (for example, I use monit already on all my systems, so I 
> just make sure to have entries in the config for that to check error 
> counters and scrub results), so there's not much in the way of incentive 
> for the concerned parties to reinvent the wheel.

To counter, I think this is a big problem with btrfs, especially in 
terms of user attrition. We don't need "GUI" tools. At all. But we do 
need btrfs to be self-sufficient enough that regular users don't get 
burnt by what they would view as unexpected behaviour. We currently 
have a situation where btrfs is too demanding of inexperienced users.

I feel we need better worst-case behaviours. For example, if *I* have a 
btrfs on its second-to-last-available chunk, it means I'm not 
micro-managing properly. But users shouldn't have to micro-manage in the 
first place. Btrfs (or a management tool) should just know to balance 
the least-used chunk and/or delete the lowest-priority snapshot, etc. It 
shouldn't cause my services/apps to give diskspace errors when, clearly, 
there is free space available.

The other "high-level" aspect would be along the lines of better 
guidance and standardisation for distros on how best to configure btrfs. 
This would include guidance/best practices for things like appropriate 
subvolume mountpoints and snapshot paths, sensible schedules or logic 
(or perhaps even example tools/scripts) for balancing and scrubbing the 
filesystem.
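
For a concrete example of the kind of convention I mean -- the '@'
names below are just the pattern some distros already use, not any
kind of standard:

  btrfs subvolume create /mnt/@           # will be mounted as /
  btrfs subvolume create /mnt/@home       # will be mounted as /home
  btrfs subvolume create /mnt/@snapshots  # keeps snapshots out of the rootfs
  # fstab then pins each mount to its subvolume:
  #   /dev/sda2  /      btrfs  subvol=@      0 0
  #   /dev/sda2  /home  btrfs  subvol=@home  0 0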

I don't have all the answers. But I also don't want to have to tell 
people they can't adopt it because a) they don't (or never will) 
understand it; and b) they're going to resent me for their irresponsibly 
losing their own data.

-- 
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-03 20:45       ` Brendan Hide
@ 2017-08-03 22:00         ` Chris Murphy
  2017-08-04 11:26         ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 63+ messages in thread
From: Chris Murphy @ 2017-08-03 22:00 UTC (permalink / raw)
  To: Brendan Hide; +Cc: Austin S. Hemmelgarn, Christoph Anton Mitterer, Btrfs BTRFS

On Thu, Aug 3, 2017 at 2:45 PM, Brendan Hide <brendan@swiftspirit.co.za> wrote:
>
> To counter, I think this is a big problem with btrfs, especially in terms of
> user attrition. We don't need "GUI" tools. At all. But we do need btrfs
> to be self-sufficient enough that regular users don't get burnt by what they
> would view as unexpected behaviour. We currently have a situation where
> btrfs is too demanding of inexperienced users.

I think the top complaint is the manual nature of balancing to avoid
enospc when there's free space, followed by balancing needed to
avoid/reduce free space fragmentation and thus maintain consistent
performance.

Obviously the kernel needs more intelligent code to free up partially
full block groups, more correctly to free up contiguous space for it
to write to. That solves both problems.

But in the meantime, btrfs-progs should ship a policy to do some
minimal balance to totally obviate this and find the edge cases. Maybe
it's dusage=3 every day. And dusage=10 once a week. And dusage=20
musage=20 once a month. I don't know, but some iteration in this area
is better than saying, PUNT! and putting it in the user's lap.
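
Even something as dumb as a stock cron file would be a start, e.g.
(numbers straight from the guesses above):

  # /etc/cron.weekly/btrfs-balance (sketch)
  #!/bin/sh
  btrfs balance start -dusage=10 /
  # and monthly, something heavier:
  #   btrfs balance start -dusage=20 -musage=20 /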

Better would be a trigger that is statistics based rather than time
based. The metric might be some combination of workload (i.e. idle)
and ratios found in sysfs.

Anyway, the first step is for people on this list to stop
micromanaging their own volumes, and try to center on a sane one-size-
fits-all solution. And then iterate better and better solutions as we
determine the edge cases where one size doesn't fit all. We're
throwing hammers at the problem by default because it's a learned
behavior. We all need to just stop balancing and act like regular
users. And then figure out how to automatically optimize.




>
> I feel we need better worst-case behaviours. For example, if *I* have a
> btrfs on its second-to-last-available chunk, it means I'm not micro-managing
> properly. But users shouldn't have to micro-manage in the first place. Btrfs
> (or a management tool) should just know to balance the least-used chunk
> and/or delete the lowest-priority snapshot, etc. It shouldn't cause my
> services/apps to give diskspace errors when, clearly, there is free space
> available.

Ideally the kernel code needs to do a better job freeing up partial
block groups. But in the meantime, this can be set as an optimization
policy in user space. And it should be in btrfs-progs so it's
consistent across distros. SUSE has a distro-specific balancer, on a
systemd timer, but I don't think it's enabled by default and I also
think it's weirdly too aggressive.

If it could be made smarter, with a trigger other than a timer, that'd
be even better.

But doing nothing has been one of the most consistently negative user
responses about Btrfs is the manual balance to maintain performance.



> The other "high-level" aspect would be along the lines of better guidance
> and standardisation for distros on how best to configure btrfs. This would
> include guidance/best practices for things like appropriate subvolume
> mountpoints and snapshot paths, sensible schedules or logic (or perhaps even
> example tools/scripts) for balancing and scrubbing the filesystem.

Would they listen? My experience with openSUSE is, nope.


> I don't have all the answers. But I also don't want to have to tell people
> they can't adopt it because a) they don't (or never will) understand it; and
> b) they're going to resent me for their irresponsibly losing their own data.

Sure.

You can read on the linux-raid@ list where there are still constant
problems with users doing crazy things like mdadm --create to fix a
raid assembly problem, obliterating their data by doing this and
then getting pissed. It's like, where the hell do people keep getting
the idea of doing that?

There are six ways to Sunday of fixing a Btrfs volume. It reads
like a choose-your-own-adventure book. No, actually it's worse, because
at least the choose-your-own-adventure book tells you what page to go
to next, whereas Btrfs gives you zero advice on what order to try things in.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-03 19:03   ` Austin S. Hemmelgarn
@ 2017-08-04  9:48     ` Duncan
  0 siblings, 0 replies; 63+ messages in thread
From: Duncan @ 2017-08-04  9:48 UTC (permalink / raw)
  To: linux-btrfs

Austin S. Hemmelgarn posted on Thu, 03 Aug 2017 15:03:53 -0400 as
excerpted:

>> Same thing with the trim feature that is marked OK . It clearly says
>> that is has performance implications. It is marked OK so one would
>> expect it to not cause the filesystem to fail, but if the performance
>> becomes so slow that the filesystem gets practically unusable it is of
>> course not "OK". The relevant information is missing for people to make
>> a decent choice and I certainly don't know how serious these
>> performance implications are, if they are at all relevant...
> The performance implications bit shouldn't be listed, that's a given for
> any filesystem with discard (TRIM is the ATA and eMMC command, UNMAP is
> the SCSI one, and ERASE is the name on SD cards, discard is the generic
> kernel term) support.  The issue arises from devices that don't have
> support for queuing such commands, which is quite rare for SSD's these
> days.

Not so entirely rare.  The generally well-regarded Samsung 850 EVO/Pro 
ssd series don't support queued-trim, and indeed, due to a fiasco where 
new firmware lied about such support[1], the kernel now blacklists queued-
trim on all samsung ssds.

(I actually bought a pair of samsung evo 1TB ssds after seeing them well 
recommended both on this list and in various reviews.  Only AFTER I had 
them, and was googling for samsung evo queued trim specifically to see 
whether I could now add discard to my btrfs mount options, did I find out 
about this fiasco and about samsung not supporting linux because "anyone 
can write the code", or I'd have certainly reconsidered and would very 
likely have spent my money elsewhere.  I did actually check the current 
kernel's blacklisting code and verified it, tho I also noted the kernel 
whitelists samsung ssds as actually honoring flush directives, where the 
code treats non-whitelisted ssds as not honoring them, apparently because 
too many claim to do so while not actually doing so, to get better 
performance.  So it's a mixed bag: one whitelisting for actually flushing 
when it claims to, one blacklisting for not reliably handling queued-trim 
despite some firmware claiming to do so.  But the worst, IMO, is samsung 
support blackballing linux because anyone can write the code. =:^(  That's 
worth blackballing samsung for, in my book; I just wish I'd found out 
before the purchase instead of after, tho the linux devs have at least 
made sure samsung ssd users don't lose data on linux due to samsung's 
lies, despite samsung's horrible support policy blackballing linux, at 
least at the time.)

---
[1] The firmware said it supported a new ata standard where queued trim 
is apparently mandatory, but the result was repeatedly corrupted data. 
Samsung support repeatedly said they don't support Linux because anyone 
can write code to execute, and that they weren't seeing the problem on MS 
yet simply because MS hadn't issued a release that supported the new 
standard, and had queued-trim disabled by default with the older 
standards due to such problems when it was enabled.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-03 20:45       ` Brendan Hide
  2017-08-03 22:00         ` Chris Murphy
@ 2017-08-04 11:26         ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-04 11:26 UTC (permalink / raw)
  To: Brendan Hide, Christoph Anton Mitterer, linux-btrfs

On 2017-08-03 16:45, Brendan Hide wrote:
> 
> 
> On 08/03/2017 09:22 PM, Austin S. Hemmelgarn wrote:
>> On 2017-08-03 14:29, Christoph Anton Mitterer wrote:
>>> On Thu, 2017-08-03 at 20:08 +0200, waxhead wrote:
>>> There are no higher-level management tools (e.g. RAID
>>> management/monitoring, etc.)...
> [snip]
> 
>> As far as 'higher-level' management tools, you're using your system 
>> wrong if you _need_ them.  There is no need for there to be a GUI, or 
>> a web interface, or a DBus interface, or any other such bloat in the 
>> main management tools, they work just fine as is and are mostly on par 
>> with the interfaces provided by LVM, MD, and ZFS (other than the lack 
>> of machine parseable output).  I'd also argue that if you can't 
>> reassemble your storage stack by hand without using 'higher-level' 
>> tools, you should not be using that storage stack as you don't 
>> properly understand it.
>>
>> On the subject of monitoring specifically, part of the issue there is 
>> kernel side, any monitoring system currently needs to be 
>> polling-based, not event-based, and as a result monitoring tends to be 
>> a very system specific affair based on how much overhead you're 
>> willing to tolerate. The limited stuff that does exist is also trivial 
>> to integrate with many pieces of existing monitoring infrastructure 
>> (like Nagios or monit), and therefore the people who care about it a 
>> lot (like me) are either monitoring by hand, or are just using the 
>> tools with their existing infrastructure (for example, I use monit 
>> already on all my systems, so I just make sure to have entries in the 
>> config for that to check error counters and scrub results), so there's 
>> not much in the way of incentive for the concerned parties to reinvent 
>> the wheel.
> 
> To counter, I think this is a big problem with btrfs, especially in 
> terms of user attrition. We don't need "GUI" tools. At all. But we do 
> need btrfs to be self-sufficient enough that regular users don't get 
> burnt by what they would view as unexpected behaviour. We currently 
> have a situation where btrfs is too demanding of inexperienced users.
> 
> I feel we need better worst-case behaviours. For example, if *I* have a 
> btrfs on its second-to-last-available chunk, it means I'm not 
> micro-managing properly. But users shouldn't have to micro-manage in the 
> first place. Btrfs (or a management tool) should just know to balance 
> the least-used chunk and/or delete the lowest-priority snapshot, etc. It 
> shouldn't cause my services/apps to give diskspace errors when, clearly, 
> there is free space available.
That's not just an issue with BTRFS, it's an issue with the distros too. 
  The only one that ships any kind of scheduled regular maintenance as 
far as I know is SUSE.  We don't need some daemon, or even special 
handling in the kernel, we just need to provide people with standard 
maintenance tools, and proper advice for monitoring.  I've been meaning 
to write up some wrappers and a couple of cron files to handle this a 
bit better, but just haven't had time.  I may look at getting that done 
either today or early next week.
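
Something along these lines is what I have in mind (just a sketch):

  # /etc/cron.monthly/btrfs-scrub (sketch)
  #!/bin/sh
  # -B runs in the foreground so cron mails the summary; -d gives per-device stats
  btrfs scrub start -Bd /
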
> 
> The other "high-level" aspect would be along the lines of better 
> guidance and standardisation for distros on how best to configure btrfs. 
> This would include guidance/best practices for things like appropriate 
> subvolume mountpoints and snapshot paths, sensible schedules or logic 
> (or perhaps even example tools/scripts) for balancing and scrubbing the 
> filesystem.
There are currently three standards for this:
1. The snapper way, used by at least SUSE and Ubuntu, which IMO ends up 
being way too complicated for not much benefit.
2. The traditional filesystem way, used by most other distros, which 
doesn't use subvolumes at all.
3. The user choice way, used by stuff like Arch and Gentoo, which pretty 
much says the rest of the OS couldn't care less how the filesystems and 
subvolumes are organized, as long as things work.

Overall, other than the first one, this is no different than with 
regular filesystems.
> 
> I don't have all the answers. But I also don't want to have to tell 
> people they can't adopt it because a) they don't (or never will) 
> understand it; and b) they're going to resent me for their irresponsibly 
> losing their own data.
> 


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-02  8:38 RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut? Brendan Hide
                   ` (4 preceding siblings ...)
  2017-08-03 18:08 ` waxhead
@ 2017-08-04 14:05 ` Qu Wenruo
  2017-08-04 23:55   ` Wang Shilong
  2017-08-07 15:27   ` Chris Murphy
  5 siblings, 2 replies; 63+ messages in thread
From: Qu Wenruo @ 2017-08-04 14:05 UTC (permalink / raw)
  To: Brendan Hide, linux-btrfs



On 2017-08-02 16:38, Brendan Hide wrote:
> The title seems alarmist to me - and I suspect it is going to be 
> misconstrued. :-/
> 
> From the release notes at 
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html 
> 
> 
> "Btrfs has been deprecated
> 
> The Btrfs file system has been in Technology Preview state since the 
> initial release of Red Hat Enterprise Linux 6. Red Hat will not be 
> moving Btrfs to a fully supported feature and it will be removed in a 
> future major release of Red Hat Enterprise Linux.
> 
> The Btrfs file system did receive numerous updates from the upstream in 
> Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat 
> Enterprise Linux 7 series. However, this is the last planned update to 
> this feature.
> 
> Red Hat will continue to invest in future technologies to address the 
> use cases of our customers, specifically those related to snapshots, 
> compression, NVRAM, and ease of use. We encourage feedback through your 
> Red Hat representative on features and requirements you have for file 
> systems and storage technology."

Personally speaking, unlike most of the btrfs supporters, I think Red 
Hat is doing the correct thing for their enterprise use case.

(To clarify, I'm not going to Red Hat, just in case anyone wonders why 
I'm not supporting btrfs)

[Good things of btrfs]
Btrfs is indeed a technical pioneer in a lot of aspects (at least in the 
linux world):

1) Metadata CoW instead of traditional journal
2) Snapshot and delta-backup
     I think this is the killer feature of Btrfs, and why SUSE is using 
it for root fs.
3) Default data CoW
4) Data checksum and scrubbing
5) Multi-device management
6) Online resize/balancing
And a lot more.

[Bad things of btrfs]
But for enterprise usage, it's too advanced and has several problems 
preventing it from being widely adopted:

1) Low performance from metadata/data CoW
     This is a somewhat complicated dilemma.
     Although Btrfs can disable data CoW, nodatacow also disables data 
checksum, which is another main feature for btrfs.
     So Btrfs can't default to nodatacow, unlike XFS.

     And metadata CoW causes extra metadata writes along with superblock 
updates (FUA), further degrading performance.

     Such a pioneering design makes traditional performance-intensive 
use cases very unhappy.
     Especially almost all kinds of databases. (Note that nodatacow 
can't always solve the performance problem.)
     Most performance-intensive usage is still based on traditional fs 
design (journal with no CoW).

2) Low concurrency caused by tree design.
      Unlike the traditional one-tree-per-inode design, btrfs uses 
one-tree-per-subvolume.
      This design makes snapshot implementation very easy, while making 
the tree very hot when a lot of writers are trying to modify any metadata.

      Btrfs has a lot of different ways to mitigate this.
      For the extent tree (the busiest tree), we use delayed refs to 
speed up extent tree updates.
      For fs tree fsync, we have the log tree to speed things up.
      These approaches work, at the cost of complexity and bugs, and fs 
tree modification is still slow.

3) Low code reuse of device-mapper.
      I totally understand that: due to its unique support for data 
csum, btrfs can't use device-mapper directly, as we must verify the data 
read out from the device before passing it to higher levels.
     So Btrfs uses its own device-mapper-like implementation to handle 
multi-device management.

     The result is mixed. For easy-to-handle cases like RAID0/1/10, 
btrfs is doing well.
     For RAID5/6, everyone knows the result.

     Such a btrfs *enhanced* re-implementation not only makes btrfs 
larger but also more complex and bug-prone.

In short, btrfs is too advanced for generic use cases (performance) and 
developers (bugs), unfortunately.

And even SUSE is just pushing btrfs as the root fs, mainly for the 
snapshot feature, while still recommending ext4/xfs for data or 
performance-intensive use cases.


[Other solution on the table]
On the other hand, I think Red Hat is pushing storage technology based 
on LVM (thin) and XFS.

Traditional LVM is stable, but its snapshot design is old-fashioned 
and low-performance.
The new thin-provisioned LVM solves that problem using a method much 
like Btrfs, but at the block level.

And XFS is still traditionally designed: journal-based, 
one-tree-per-inode, but with fancy new features like data CoW.

Even though XFS + LVM-thin lacks the ability to shrink the fs, scrub 
data, or do delta backups, it can do a lot of things just like Btrfs, 
from snapshots to multi-device management.

And more importantly, it has better performance for things like DBs.

So, for old use cases, the performance stays almost the same.
For developers, people keep focusing on their own fields, with less to 
worry about and more focus on debugging. The old UNIX philosophy still 
works here: do one thing and do it well.

It provides some of the fancy features of btrfs, but not too fancy.
It's a compromise, but a good move for enterprise usage.

[The future]
When btrfs is almost as good as traditional solutions for both 
performance and stability, I think it will be widely adopted no matter 
whether Red Hat uses it or not, especially since btrfs still has 
features which LVM-thin + XFS can't provide.

But the future is still full of challenges.
1) Complexity of btrfs makes development slow.
     Developers are already doing their work well, but the line count is 
roughly twice that of a traditional fs.

2) New device-mapper based solutions may come out fast
     Dm-thin is already here, and I won't be surprised if one day there 
are hooks/APIs for device-mapper to communicate with higher levels.

     For example, if one day there is some dm-csum that supports 
verifying csums of given ranges (and skipping unrelated ones specified 
by higher levels), btrfs support for data csum will no longer be an 
exclusive feature.

Thanks,
Qu
> 
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-04 14:05 ` Qu Wenruo
@ 2017-08-04 23:55   ` Wang Shilong
  2017-08-07 15:27   ` Chris Murphy
  1 sibling, 0 replies; 63+ messages in thread
From: Wang Shilong @ 2017-08-04 23:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Brendan Hide, linux-btrfs

Hi Qu,

On Fri, Aug 4, 2017 at 10:05 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> [...]


Fair enough, and a good conclusion. I think most of the reasons come down
to btrfs stability, and that Red Hat could not get good developers for
Btrfs like you.

I think, for the future, the most difficult thing for Btrfs will be
performance: Btrfs does not scale as the CPU count increases, and that is
bad for metadata-heavy loads or even small random IO reads/writes.

Thanks,
Shilong


>
> Thanks,
> Qu
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-04 14:05 ` Qu Wenruo
  2017-08-04 23:55   ` Wang Shilong
@ 2017-08-07 15:27   ` Chris Murphy
  2017-08-10  0:35     ` Qu Wenruo
  1 sibling, 1 reply; 63+ messages in thread
From: Chris Murphy @ 2017-08-07 15:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Brendan Hide, Btrfs BTRFS

On Fri, Aug 4, 2017 at 8:05 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>     For example, if one day there is some dm-csum that supports
> verifying csums of given ranges (and skipping unrelated ones specified
> by higher levels), btrfs support for data csum will no longer be an
> exclusive feature.

How would dm-csum differ from dm-integrity?
https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.txt

By that description it uses a journal to guarantee atomicity. With
multiqueue, maybe the performance implications are neutral. But
certainly on spinning drives it would slow things down, especially
if the file system is also journaling and the workload is metadata
heavy.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-07 15:27   ` Chris Murphy
@ 2017-08-10  0:35     ` Qu Wenruo
  2017-08-12  0:10       ` Christoph Anton Mitterer
  0 siblings, 1 reply; 63+ messages in thread
From: Qu Wenruo @ 2017-08-10  0:35 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Brendan Hide, Btrfs BTRFS



On 2017年08月07日 23:27, Chris Murphy wrote:
> On Fri, Aug 4, 2017 at 8:05 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>      For example, if one day there is some dm-csum that supports
>> verifying csums of given ranges (and skipping unrelated ones specified
>> by higher levels), btrfs support for data csum will no longer be an
>> exclusive feature.
> 
> How would dm-csum differ from dm-integrity?
> https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.txt

Well, that's pretty much what I want.

My idea is an n-way buffered csum update (n=2 should be the most 
common case).
That is to say, for the CRC32 (4 bytes) of a 4K write, the csum space is 
reserved as 4 bytes * n.
Even if a crash happens, one can check all the csum slots to tell 
whether the on-disk block is the complete old or the complete new data.

That's just a degraded journal anyway, and it may still cause data loss 
on power loss if the data itself was only updated half way.
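
Roughly, the scheme would behave like this toy model (illustrative 
Python only, my sketch of the idea, not kernel code):

import zlib

N_SLOTS = 2  # n-way csum slots; n=2 as the common case

class Block:
    """Toy model of one 4K nodatacow block with n csum slots."""
    def __init__(self, data):
        self.data = data
        self.slots = [zlib.crc32(data)] * N_SLOTS
        self.spare = 1  # slot that receives the next csum

    def write(self, new, crash_point=None):
        # 1) The csum of the new data lands in the spare slot first...
        self.slots[self.spare] = zlib.crc32(new)
        self.spare = (self.spare + 1) % N_SLOTS
        if crash_point == "before-data":
            return  # old data on disk, old csum still in the other slot
        if crash_point == "mid-data":
            self.data = new[:len(new) // 2] + self.data[len(new) // 2:]
            return  # torn write: matches no slot, i.e. real data loss
        # 2) ...then the data itself is updated in place.
        self.data = new

    def read(self):
        # Data is accepted if it matches *any* slot: then it is a
        # complete old or complete new version, never silently torn.
        if zlib.crc32(self.data) in self.slots:
            return self.data
        raise IOError("no csum slot matches: block torn mid-update")

b = Block(b"old-version " * 8)
b.write(b"new-version " * 8, crash_point="before-data")
assert b.read() == b"old-version " * 8  # old data still verifies
b.write(b"new-version " * 8)
assert b.read() == b"new-version " * 8  # clean update verifies too
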

> 
> By that description it uses a journal to guarantee atomicity. With
> multiqueue, maybe the performance implications are neutral. But
> certainly on spinning drives it would slow things down, especially
> if the file system is also journaling and the workload is metadata
> heavy.

That's what btrfs is good at: better co-operation between different 
layers.
But this doesn't mean the traditional dm solution can't find its own way.

[No double-csum for metadata for btrfs]
Btrfs does not calculate a second csum for metadata, which carries its 
own csum in its header.
And nodatacow data does not trigger csum calculation at all.

But if we had extra bio flag bits for the fs and the dm/block driver to 
co-operate, this could still be solved (maybe even more easily).

For example, if there were an extra bio flag telling dm-integrity or any 
supporting block device driver not to calculate a csum for the given 
bio, we could avoid such useless double-csum for metadata or nodatacow 
writes.
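
In pseudo-Python, the kind of co-operation I mean would look roughly 
like this (the flag name is made up; nothing like it exists in the 
kernel today):

import zlib
from dataclasses import dataclass, field

REQ_SKIP_CSUM = 0x1  # hypothetical flag: "the fs already checksums this bio"

@dataclass
class Bio:
    sector: int
    data: bytes
    flags: int = 0

@dataclass
class IntegrityTarget:
    """Toy dm-integrity-like layer keeping one csum per sector."""
    csums: dict = field(default_factory=dict)

    def submit_write(self, bio):
        if not bio.flags & REQ_SKIP_CSUM:
            # Normal path: the block layer maintains its own csum.
            self.csums[bio.sector] = zlib.crc32(bio.data)
        # Flagged path: btrfs metadata (or nodatacow data) is already
        # covered by the fs, so the redundant second csum is skipped.
        self.device_write(bio)

    def device_write(self, bio):
        pass  # stand-in for the real block device

t = IntegrityTarget()
t.submit_write(Bio(0, b"ordinary data"))                   # csummed here
t.submit_write(Bio(8, b"btrfs metadata", REQ_SKIP_CSUM))   # skipped
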

[Good solution on data cow and csum]
Beyond that possible performance improvement, btrfs also solves the 
problem of asynchronous data and csum writes by disabling csum 
completely for nocow contents, so there is no need to journal csum and 
data writes (journaling data is super slow).

However, nowadays an fs like XFS also has its own extent backref tree 
and knows whether a given write is a new (or CoWed) write or a rewrite.

So, following the method above, if we had another flag telling 
dm-integrity whether a given bio is a rewrite, this could be handled 
much better.

For example, if a bio is rewriting data and dm-integrity is configured 
for better performance, just let dm-integrity mark that bio range as 
nocsum and ignore the existing csum.
That way dm-integrity can avoid most of its data and csum journaling 
(new or CoWed writes won't need to be journaled, just like what btrfs 
is doing).

In short, there is always a method to do more or less the same things 
btrfs can do.
So I will not be surprised if one day there is a solution that does 
everything current btrfs can do, with a robust code base and fewer 
modifications to the current kernel.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-10  0:35     ` Qu Wenruo
@ 2017-08-12  0:10       ` Christoph Anton Mitterer
  2017-08-12  7:42         ` Christoph Hellwig
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Anton Mitterer @ 2017-08-12  0:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1996 bytes --]

Qu Wenruo wrote:
>Although Btrfs can disable data CoW, nodatacow also disables data 
>checksum, which is another main feature for btrfs.

Then the two should probably be decoupled, and support for
nodatacow+checksumming implemented?!

I'm not an expert, but I don't see why this shouldn't be possible
(especially since metadata is AFAIC *always* CoWed +
checksummed anyway).


Nearly a year ago I exchanged some off-list mails with CM, and AFAIU
he said it would technically be possible...


What's the worst thing that can happen?! IMO, that noCoWed data would
have been correctly written before a crash, but not the checksum, whereby
the (stale) checksum would invalidate the actually good data.
How likely is that compared to the other way round? I'd guess not very.
And even then, it's IMO still better to have false positives (which
the higher application layers should take care of anyway) than not to
notice silent data corruption at all.


Of course checksumming could impact performance, but anyone who cares
more about performance than data integrity could still use
nodatacow+nochecksum (or any other fs).
But all those who focus on integrity would get it, even in the
nodatacow case.


IIRC, CM's argument was that some people would rather get the bad
data than nothing at all (i.e. EIO)... but for those, btrfs is
probably a bad choice anyway (at least in the normal non-nodatacow
case)... also, any application should deal properly with EIO... and
last but not least, one could still provide a special tool that, after a
crash (with possibly non-matching data/csums), allows a user to find
such cases and decide what to do. A user/admin who would rather take the
bad data and try forensic recovery could be given a tool like
btrfs csum --recompute-invalid-csums (or some better name), which
re-writes either all csums (or just those under some paths) in case
they don't match.


Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5930 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-12  0:10       ` Christoph Anton Mitterer
@ 2017-08-12  7:42         ` Christoph Hellwig
  2017-08-12 11:51           ` Christoph Anton Mitterer
  2017-08-14  6:36           ` Qu Wenruo
  0 siblings, 2 replies; 63+ messages in thread
From: Christoph Hellwig @ 2017-08-12  7:42 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Qu Wenruo, Btrfs BTRFS

On Sat, Aug 12, 2017 at 02:10:18AM +0200, Christoph Anton Mitterer wrote:
> Qu Wenruo wrote:
> >Although Btrfs can disable data CoW, nodatacow also disables data 
> >checksum, which is another main feature for btrfs.
> 
> Then the two should probably be decoupled, and support for
> nodatacow+checksumming implemented?!

And how are you going to write your data and checksum atomically when
doing in-place updates?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-12  7:42         ` Christoph Hellwig
@ 2017-08-12 11:51           ` Christoph Anton Mitterer
  2017-08-12 12:12             ` Hugo Mills
  2017-08-14  6:36           ` Qu Wenruo
  1 sibling, 1 reply; 63+ messages in thread
From: Christoph Anton Mitterer @ 2017-08-12 11:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1275 bytes --]

On Sat, 2017-08-12 at 00:42 -0700, Christoph Hellwig wrote:
> And how are you going to write your data and checksum atomically when
> doing in-place updates?

Maybe I misunderstand something, but what's the big deal with not doing
it atomically (I assume you mean in terms of actually writing to the
physical medium)? Isn't that already a problem anyway in case of a
crash?

And isn't that the case also with all forms of e.g. software RAID (when
not having a journal)?

And as I've said, what's the worst thing that can happen? Either way the
data would not have been completely written - with or without
checksumming. So what's the difference if we try the checksumming (and
do it successfully in all non-crash cases)?
My understanding was (but that may be wrong of course, I'm not a
filesystem expert at all) that the worst that can happen is that data
and csum aren't *both* fully written (in all possible combinations), so
we'd have four cases in total:

data=good csum=good => fine
data=bad  csum=bad  => doesn't matter whether csum or not and whether atomic or not
data=bad  csum=good => the csum will tell us that the data is bad
data=good csum=bad  => the only real problem: the data would actually be
                       good, but the csum is not


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5930 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-12 11:51           ` Christoph Anton Mitterer
@ 2017-08-12 12:12             ` Hugo Mills
  2017-08-13 14:08               ` Goffredo Baroncelli
  0 siblings, 1 reply; 63+ messages in thread
From: Hugo Mills @ 2017-08-12 12:12 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Christoph Hellwig, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 3069 bytes --]

On Sat, Aug 12, 2017 at 01:51:46PM +0200, Christoph Anton Mitterer wrote:
> On Sat, 2017-08-12 at 00:42 -0700, Christoph Hellwig wrote:
> > And how are you going to write your data and checksum atomically when
> > doing in-place updates?
> 
> Maybe I misunderstand something, but what's the big deal with not doing
> it atomically (I assume you mean in terms of actually writing to the
> physical medium)? Isn't that already a problem anyway in case of a
> crash?

   With normal CoW operations, the atomicity is achieved by
constructing a completely new metadata tree containing both changes
(references to the data, and the csum metadata), and then atomically
changing the superblock to point to the new tree, so it really is
atomic.

   With nodatacow, that approach doesn't work, because the new data
replaces the old on the physical medium, so you'd have to make the
data write atomic with the superblock write -- which can't be done,
because it's (at least) two distinct writes.

> And isn't that the case also with all forms of e.g. software RAID (when
> not having a journal)?
> 
> And as I've said, what's the worst thing that can happen? Either way the
> data would not have been completely written - with or without
> checksumming. So what's the difference if we try the checksumming (and
> do it successfully in all non-crash cases)?
> My understanding was (but that may be wrong of course, I'm not a
> filesystem expert at all) that the worst that can happen is that data
> and csum aren't *both* fully written (in all possible combinations), so
> we'd have four cases in total:
> 
> data=good csum=good => fine
> data=bad  csum=bad  => doesn't matter whether csum or not and whether atomic or not
> data=bad  csum=good => the csum will tell us that the data is bad
> data=good csum=bad  => the only real problem: the data would actually be
>                        good, but the csum is not

   I don't think this is a particularly good description of the
problem. I'd say it's more like this:

   If you write data and metadata separately (which you have to do in
the nodatacow case), and the system halts between the two writes, then
you either have the new data with the old csum, or the old data with
the new csum. Both data and csum are "good", but good from different
states of the FS. In both cases (data first or metadata first), the
csum doesn't match the data, and so you now have an I/O error reported
when trying to read that data.

   You can't easily fix this, because when the data and csum don't
match, you need to know the _reason_ they don't match -- is it because
the machine was interrupted during write (in which case you can fix
it), or is it because the hard disk has had someone write data to it
directly, and the data is now toast (in which case you shouldn't fix
the I/O error)?

   Basically, nodatacow bypasses the very mechanisms that are meant to
provide consistency in the filesystem.
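
   The failure mode fits in a few lines of toy Python (an illustration
of the ordering, not btrfs code):

import zlib

data = b"old contents"
csum = zlib.crc32(data)          # lives in the (CoW) csum tree

def nodatacow_update(new, crash_between=False):
    global data, csum
    data = new                   # in-place data write hits the disk first
    if crash_between:
        return                   # power loss before the csum tree commits
    csum = zlib.crc32(new)       # metadata commit lands later

nodatacow_update(b"new contents", crash_between=True)
if zlib.crc32(data) != csum:
    # New data + old csum is indistinguishable from rotted data +
    # correct csum, hence the EIO on read.
    print("csum mismatch -> EIO")
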

   Hugo.

-- 
Hugo Mills             | vi vi vi: the Editor of the Beast.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-12 12:12             ` Hugo Mills
@ 2017-08-13 14:08               ` Goffredo Baroncelli
  2017-08-14  7:08                 ` Qu Wenruo
  0 siblings, 1 reply; 63+ messages in thread
From: Goffredo Baroncelli @ 2017-08-13 14:08 UTC (permalink / raw)
  To: Hugo Mills, Christoph Anton Mitterer, Christoph Hellwig, Btrfs BTRFS

On 08/12/2017 02:12 PM, Hugo Mills wrote:
> On Sat, Aug 12, 2017 at 01:51:46PM +0200, Christoph Anton Mitterer wrote:
>> On Sat, 2017-08-12 at 00:42 -0700, Christoph Hellwig wrote:
[...]      
>>               good, but csum is not
> 
>    I don't think this is a particularly good description of the
> problem. I'd say it's more like this:
> 
>    If you write data and metadata separately (which you have to do in
> the nodatacow case), and the system halts between the two writes, then
> you either have the new data with the old csum, or the old data with
> the new csum. Both data and csum are "good", but good from different
> states of the FS. In both cases (data first or metadata first), the
> csum doesn't match the data, and so you now have an I/O error reported
> when trying to read that data.
> 
>    You can't easily fix this, because when the data and csum don't
> match, you need to know the _reason_ they don't match -- is it because
> the machine was interrupted during write (in which case you can fix
> it), or is it because the hard disk has had someone write data to it
> directly, and the data is now toast (in which case you shouldn't fix
> the I/O error)?

I am still inclined to think that this kind of problem could be solved using a journal: if you track which blocks are updated in the transaction and their checksums, then if the transaction is interrupted you can always rebuild the data/checksum pair.
In case a transaction is interrupted:
- all COW data is trashed
- some NOCOW data might be written
- all metadata (which is COW) is trashed

Supposing BTRFS logged, for each transaction, which "data NOCOW blocks" will be updated and their checksums, then in case a transaction is interrupted you know which blocks have to be checked and are able to verify whether the checksum matches and correct any mismatch. Logging the checksum as well could help to identify whether:
- the data is old
- the data is updated
- the updated data is correct

The same approach could also be used to solve the issue related to the infamous RAID5/6 hole: by logging which blocks are updated, in case of an aborted transaction you can check which parity has to be rebuilt.
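
In toy Python, the recovery logic I have in mind would be something
like this (an illustrative sketch of the proposal only):

import zlib

disk = {7: b"old", 9: b"old"}  # block number -> contents
csum_tree = {n: zlib.crc32(d) for n, d in disk.items()}
intent_log = []

def begin_tx(updates):
    # Log (block, csum-of-new-data) before touching NOCOW blocks.
    for block, new in updates.items():
        intent_log.append((block, zlib.crc32(new)))

def crashy_write(updates, crash_after):
    # NOCOW: in-place writes; the csum tree is not updated yet.
    for i, (block, new) in enumerate(updates.items()):
        if i >= crash_after:
            return  # power loss mid-transaction
        disk[block] = new

def recover():
    # Only the logged blocks need to be re-examined after a crash.
    for block, new_csum in intent_log:
        on_disk = zlib.crc32(disk[block])
        if on_disk == new_csum:
            csum_tree[block] = new_csum  # new data made it: adopt new csum
        elif on_disk != csum_tree[block]:
            print(f"block {block}: matches neither csum, real corruption")
        # else: old data intact, old csum still valid -> nothing to do
    intent_log.clear()

updates = {7: b"new7", 9: b"new9"}
begin_tx(updates)
crashy_write(updates, crash_after=1)  # only block 7 reaches the disk
recover()  # both blocks verify cleanly again, no false -EIO
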

> 
>    Basically, nodatacow bypasses the very mechanisms that are meant to
> provide consistency in the filesystem.
> 
>    Hugo.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-12  7:42         ` Christoph Hellwig
  2017-08-12 11:51           ` Christoph Anton Mitterer
@ 2017-08-14  6:36           ` Qu Wenruo
  2017-08-14  7:43             ` Paul Jones
  2017-08-14 12:24             ` Christoph Anton Mitterer
  1 sibling, 2 replies; 63+ messages in thread
From: Qu Wenruo @ 2017-08-14  6:36 UTC (permalink / raw)
  To: Christoph Hellwig, Christoph Anton Mitterer; +Cc: Btrfs BTRFS



On 2017年08月12日 15:42, Christoph Hellwig wrote:
> On Sat, Aug 12, 2017 at 02:10:18AM +0200, Christoph Anton Mitterer wrote:
>> Qu Wenruo wrote:
>>> Although Btrfs can disable data CoW, nodatacow also disables data
>>> checksum, which is another main feature for btrfs.
>>
>> Then the two should probably be decoupled, and support for
>> nodatacow+checksumming implemented?!
> 
> And how are you going to write your data and checksum atomically when
> doing in-place updates?

Exactly, that's the main reason I can think of for why btrfs disables 
checksum for nodatacow.

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-13 14:08               ` Goffredo Baroncelli
@ 2017-08-14  7:08                 ` Qu Wenruo
  2017-08-14 14:23                   ` Goffredo Baroncelli
  0 siblings, 1 reply; 63+ messages in thread
From: Qu Wenruo @ 2017-08-14  7:08 UTC (permalink / raw)
  To: kreijack, Hugo Mills, Christoph Anton Mitterer,
	Christoph Hellwig, Btrfs BTRFS



On 2017年08月13日 22:08, Goffredo Baroncelli wrote:
> On 08/12/2017 02:12 PM, Hugo Mills wrote:
>> On Sat, Aug 12, 2017 at 01:51:46PM +0200, Christoph Anton Mitterer wrote:
>>> On Sat, 2017-08-12 at 00:42 -0700, Christoph Hellwig wrote:
> [...]
>>>                good, but csum is not
>>
>>     I don't think this is a particularly good description of the
>> problem. I'd say it's more like this:
>>
>>     If you write data and metadata separately (which you have to do in
>> the nodatacow case), and the system halts between the two writes, then
>> you either have the new data with the old csum, or the old data with
>> the new csum. Both data and csum are "good", but good from different
>> states of the FS. In both cases (data first or metadata first), the
>> csum doesn't match the data, and so you now have an I/O error reported
>> when trying to read that data.
>>
>>     You can't easily fix this, because when the data and csum don't
>> match, you need to know the _reason_ they don't match -- is it because
>> the machine was interrupted during write (in which case you can fix
>> it), or is it because the hard disk has had someone write data to it
>> directly, and the data is now toast (in which case you shouldn't fix
>> the I/O error)?
> 
> I am still inclined to think that this kind of problem could be solved using a journal: if you track which blocks are updated in the transaction and their checksums, then if the transaction is interrupted you can always rebuild the data/checksum pair.
> In case a transaction is interrupted:
> - all COW data is trashed
> - some NOCOW data might be written
> - all metadata (which is COW) is trashed

The idea itself sounds good; however, btrfs doesn't use a journal (yet), 
which means we would need to introduce one, even though btrfs uses 
metadata CoW to handle most of the work a journal does.

> 
> Supposing BTRFS logged, for each transaction, which "data NOCOW blocks" will be updated and their checksums, then in case a transaction is interrupted you know which blocks have to be checked and are able to verify whether the checksum matches and correct any mismatch. Logging the checksum as well could help to identify whether:
> - the data is old
> - the data is updated
> - the updated data is correct
> 
> The same approach could also be used to solve the issue related to the infamous RAID5/6 hole: by logging which blocks are updated, in case of an aborted transaction you can check which parity has to be rebuilt.
Indeed, Liu is using a journal to solve the RAID5/6 write hole.

But to address the lack-of-journal nature of btrfs, he introduced a 
dedicated journal device to handle it; since btrfs metadata is either 
written or trashed, we can't rely on the existing btrfs metadata to 
handle a journal.

PS: This reminds me why ZFS still uses a journal (called the ZFS intent 
log) rather than the mandatory metadata CoW of btrfs.

Thanks,
Qu

> 
>>
>>     Basically, nodatacow bypasses the very mechanisms that are meant to
>> provide consistency in the filesystem.
>>
>>     Hugo.
>>
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14  6:36           ` Qu Wenruo
@ 2017-08-14  7:43             ` Paul Jones
  2017-08-14  7:46               ` Qu Wenruo
  2017-08-14 12:24             ` Christoph Anton Mitterer
  1 sibling, 1 reply; 63+ messages in thread
From: Paul Jones @ 2017-08-14  7:43 UTC (permalink / raw)
  To: Qu Wenruo, Christoph Hellwig, Christoph Anton Mitterer; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1418 bytes --]

> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Monday, 14 August 2017 4:37 PM
> To: Christoph Hellwig <hch@infradead.org>; Christoph Anton Mitterer
> <calestyo@scientia.net>
> Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
> Subject: Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
> 
> 
> 
> On 2017年08月12日 15:42, Christoph Hellwig wrote:
> > On Sat, Aug 12, 2017 at 02:10:18AM +0200, Christoph Anton Mitterer wrote:
> >> Qu Wenruo wrote:
> >>> Although Btrfs can disable data CoW, nodatacow also disables data
> >>> checksum, which is another main feature for btrfs.
> >>
> >> Then the two should probably be decoupled, and support for
> >> nodatacow+checksumming implemented?!
> >
> > And how are you going to write your data and checksum atomically when
> > doing in-place updates?
> 
> Exactly, that's the main reason I can think of for why btrfs disables
> checksum for nodatacow.

But does it matter if it's not strictly atomic? Turning off COW implies 
you accept the risk of an ill-timed failure.
Although from my point of view, any reason that would require COW to be 
disabled implies you're using the wrong filesystem anyway.

Paul.








^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14  7:43             ` Paul Jones
@ 2017-08-14  7:46               ` Qu Wenruo
  2017-08-14 12:32                 ` Christoph Anton Mitterer
  0 siblings, 1 reply; 63+ messages in thread
From: Qu Wenruo @ 2017-08-14  7:46 UTC (permalink / raw)
  To: Paul Jones, Christoph Hellwig, Christoph Anton Mitterer; +Cc: Btrfs BTRFS



On 2017年08月14日 15:43, Paul Jones wrote:
>> -----Original Message-----
>> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
>> owner@vger.kernel.org] On Behalf Of Qu Wenruo
>> Sent: Monday, 14 August 2017 4:37 PM
>> To: Christoph Hellwig <hch@infradead.org>; Christoph Anton Mitterer
>> <calestyo@scientia.net>
>> Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
>> Subject: Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
>>
>>
>>
>> On 2017年08月12日 15:42, Christoph Hellwig wrote:
>>> On Sat, Aug 12, 2017 at 02:10:18AM +0200, Christoph Anton Mitterer wrote:
>>>> Qu Wenruo wrote:
>>>>> Although Btrfs can disable data CoW, nodatacow also disables data
>>>>> checksum, which is another main feature for btrfs.
>>>>
>>>> Then the two should probably be decoupled, and support for
>>>> nodatacow+checksumming implemented?!
>>>
>>> And how are you going to write your data and checksum atomically when
>>> doing in-place updates?
>>
>> Exactly, that's the main reason I can think of for why btrfs disables
>> checksum for nodatacow.
> 
> But does it matter if it's not strictly atomic? Turning off COW implies you accept the risk of an ill-timed failure.

The problem here is: if you enable csum, then even if the data is 
updated correctly and only the metadata is trashed, you can't even read 
out the correct data.

The btrfs csum checker will simply prevent you from reading out any data 
which doesn't match its csum.

Then it's not just data corruption, but data loss.

Thanks,
Qu

> Although from my point of view, any reason that would require COW to be disabled implies you're using the wrong filesystem anyway.
> 
> Paul.
> 
> 
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14  6:36           ` Qu Wenruo
  2017-08-14  7:43             ` Paul Jones
@ 2017-08-14 12:24             ` Christoph Anton Mitterer
  2017-08-14 14:23               ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 63+ messages in thread
From: Christoph Anton Mitterer @ 2017-08-14 12:24 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 993 bytes --]

On Mon, 2017-08-14 at 14:36 +0800, Qu Wenruo wrote:
> > And how are you going to write your data and checksum atomically
> > when
> > doing in-place updates?
> 
> Exactly, that's the main reason I can think of for why btrfs disables
> checksum for nodatacow.

Still, I don't get the problem here...

Yes, it cannot be done atomically (without workarounds like a journal or
so), but this should only be an issue in case of a crash or similar.

And in that case nodatacow+nochecksum is already bad anyway; it's also
not atomic, so the data may be complete garbage (e.g. half written)...
just that no one will ever notice.

The only problem that nodatacow + checksumming + non-atomic should give
is when the data was actually correctly written at a crash but the
checksum was not, in which case the bogus checksum would invalidate the
good data on the next read.

Or do I miss something?


To me that still sounds much better than having no protection at all.


Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5930 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14  7:46               ` Qu Wenruo
@ 2017-08-14 12:32                 ` Christoph Anton Mitterer
  2017-08-14 12:58                   ` Qu Wenruo
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Anton Mitterer @ 2017-08-14 12:32 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1705 bytes --]

On Mon, 2017-08-14 at 15:46 +0800, Qu Wenruo wrote:
> The problem here is: if you enable csum, then even if the data is
> updated correctly and only the metadata is trashed, you can't even
> read out the correct data.

So what?
This problem occurs *only* in case of a crash... and *only* if
nodatacow+checksumming were used.
A case in which, currently, the user can only hope that his data
is fine (unless higher levels provide some checksumming means[0]), or
needs to recover from a backup anyway.

Intuitively I'd also say it's much less likely that the data (which is
more in terms of space) is written correctly while the checksum is not.
Or is it?



[0] When I investigated this back when the discussion first came
up and some list member claimed that most typical cases (DBs, VM
images) would do their own checksumming anyway... I came to the
conclusion that most did not even support it, and even where they do,
it's not enabled by default and is not really *full* checksumming in
most cases.



> The btrfs csum checker will simply prevent you from reading out any
> data which doesn't match its csum.
As I've said before, a tool could be provided that re-computes the
checksums then (making the data accessible again)... or one could
simply mount the fs with nochecksum or some other special option which
allows bypassing any checks.

> Then it's not just data corruption, but data loss.
I think the former is worse than the latter. The latter gives you a
chance of noticing it, and either recovering from a backup, regenerating
the data (if possible) or manually marking the data as "good" (though
corrupted) again.


Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5930 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14 12:32                 ` Christoph Anton Mitterer
@ 2017-08-14 12:58                   ` Qu Wenruo
  0 siblings, 0 replies; 63+ messages in thread
From: Qu Wenruo @ 2017-08-14 12:58 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Btrfs BTRFS



On 2017年08月14日 20:32, Christoph Anton Mitterer wrote:
> On Mon, 2017-08-14 at 15:46 +0800, Qu Wenruo wrote:
>> The problem here is: if you enable csum, then even if the data is
>> updated correctly and only the metadata is trashed, you can't even
>> read out the correct data.
> 
> So what?
> This problem occurs *only* in case of a crash... and *only* if
> nodatacow+checksumming were used.
> A case in which, currently, the user can only hope that his data
> is fine (unless higher levels provide some checksumming means[0]), or
> needs to recover from a backup anyway.

Let's make clear the combinations and their results in the power-loss 
case:

Datacow + datasum: Good old data

Datacow + nodatasum: Good old data

Nodatacow + datasum: Good old data (data not committed yet) or -EIO 
(data updated)
Not supported yet, so I just assume it would use the current csum 
checking behavior.

Nodatacow + nodatasum: Good old data (data not committed yet) or 
uncertain data.

The uncertain part is how it should behave when the data has been 
updated.

If we really need to implement nodatacow + datasum, I would prefer to 
make it consistent with the nodatacow + nodatasum behavior: at least 
read out the data and give some csum warning, instead of refusing to 
read and returning -EIO.

> 
> Intuitively I'd also say it's much less likely that the data (which is
> more in terms of space) is written correctly while the checksum is not.
> Or is it?

Checksums are protected by mandatory metadata CoW, so metadata updates 
are always atomic.
A checksum will either be updated correctly or trashed entirely, unlike 
data.

And this is quite likely to happen. When synchronising a filesystem, we 
write data first, then metadata (data and metadata may be cached by the 
disk controller, but at least we submit the requests to disk), then 
flush all data and metadata to disk, and finally update the superblock.

Since metadata is updated with CoW, until the superblock is written to 
disk we are always reading the old metadata trees (including the csum 
tree).

So if power loss happens between the data being written to disk and the 
final superblock update, it's quite likely to hit the problem.
And considering the data/metadata ratio, we spend more time flushing 
data than metadata, which increases the likelihood even further.
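
As a rough timeline (toy Python, purely to illustrate the window; not 
actual btrfs code):

class Tx:
    """Toy transaction: each step just records that it happened."""
    def __init__(self):
        self.steps = []
    def write_data(self):         self.steps.append("data")
    def write_metadata_cow(self): self.steps.append("metadata")
    def flush(self):              self.steps.append("flush")
    def write_superblock(self):   self.steps.append("superblock")

def commit(tx, crash_at=None):
    tx.write_data()          # 1) data blocks (in place for nodatacow)
    if crash_at == "after-data":
        return "new data on disk, superblock still points at OLD csum tree"
    tx.write_metadata_cow()  # 2) new metadata trees, incl. the csum tree
    tx.flush()               # 3) FLUSH/FUA for data and metadata
    if crash_at == "before-super":
        return "new data on disk, superblock still points at OLD csum tree"
    tx.write_superblock()    # 4) atomic pointer flip: commit complete
    return "consistent: superblock points at the NEW trees"

print(commit(Tx(), crash_at="after-data"))
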

> 
> [0] When I investigated this back when the discussion first came
> up and some list member claimed that most typical cases (DBs, VM
> images) would do their own checksumming anyway... I came to the
> conclusion that most did not even support it, and even where they do,
> it's not enabled by default and is not really *full* checksumming in
> most cases.
> 
> 
> 
>> The btrfs csum checker will simply prevent you from reading out any
>> data which doesn't match its csum.
> As I've said before, a tool could be provided that re-computes the
> checksums then (making the data accessible again)... or one could
> simply mount the fs with nochecksum or some other special option which
> allows bypassing any checks.

Just as you pointed out, such csum bypassing would be a prerequisite 
for nodatacow+datasum.
And unfortunately, we don't have such a facility yet.

> 
>> Then it's not just data corruption, but data loss.
> I think the former is worse than the latter. The latter gives you a
> chance of noticing it, and either recovering from a backup, regenerating
> the data (if possible) or manually marking the data as "good" (though
> corrupted) again.

This depends.
If the upper layer has its own error detection mechanism, like keeping a 
special file fsynced before each write (or just call it a journal), then 
allowing the corrupted data to be read out gives it a chance to find the 
data good and continue.
Just returning -EIO kills that chance entirely.

BTW, normal user-space programs can handle a csum mismatch better than 
-EIO.
Zip files, for example, have their own checksums but can't handle -EIO 
at all.
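
For illustration, upper-layer verification can be as simple as this 
(a generic toy example, nothing btrfs-specific):

import struct
import zlib

def pack(payload):
    # The application stores its own CRC in front of the record.
    return struct.pack(">I", zlib.crc32(payload)) + payload

def unpack(blob):
    stored = struct.unpack(">I", blob[:4])[0]
    payload = blob[4:]
    if zlib.crc32(payload) != stored:
        raise ValueError("application-level checksum mismatch")
    return payload

blob = pack(b"record contents")
print(unpack(blob))    # the app can verify for itself and carry on
# If read() had returned -EIO, the app would never see blob at all,
# whether the data was good or not.
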

Thanks,
Qu

> 
> 
> Cheers,
> Chris.
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14  7:08                 ` Qu Wenruo
@ 2017-08-14 14:23                   ` Goffredo Baroncelli
  2017-08-14 19:08                     ` Chris Murphy
  0 siblings, 1 reply; 63+ messages in thread
From: Goffredo Baroncelli @ 2017-08-14 14:23 UTC (permalink / raw)
  To: Qu Wenruo, Hugo Mills, Christoph Anton Mitterer,
	Christoph Hellwig, Btrfs BTRFS, Liu Bo

On 08/14/2017 09:08 AM, Qu Wenruo wrote:
> 
>>
>> Supposing BTRFS logged, for each transaction, which "data NOCOW blocks" will be updated and their checksums, then in case a transaction is interrupted you know which blocks have to be checked and are able to verify whether the checksum matches and correct any mismatch. Logging the checksum as well could help to identify whether:
>> - the data is old
>> - the data is updated
>> - the updated data is correct
>>
>> The same approach could also be used to solve the issue related to the infamous RAID5/6 hole: by logging which blocks are updated, in case of an aborted transaction you can check which parity has to be rebuilt.
> Indeed, Liu is using a journal to solve the RAID5/6 write hole.
> 
> But to address the lack-of-journal nature of btrfs, he introduced a dedicated journal device to handle it; since btrfs metadata is either written or trashed, we can't rely on the existing btrfs metadata to handle a journal.

Liu's solution is a lot heavier: with it, you need to write both the data and the parity twice. I am only suggesting to track the blocks to update, and this would only be needed for the stripes involved in a RMW cycle. That is a lot less data to write (8 bytes vs 4 KiB).

> 
> PS: This reminds me why ZFS still uses a journal (called the ZFS intent log) rather than the mandatory metadata CoW of btrfs.

From a theoretical point of view, if you have a "pure" COW filesystem, you don't need a journal. Unfortunately a RAID5/6 stripe update is a RMW cycle, so you need a journal to keep it in sync. The same is true for NOCOW files (and their checksums).


> 
> Thanks,
> Qu


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14 12:24             ` Christoph Anton Mitterer
@ 2017-08-14 14:23               ` Austin S. Hemmelgarn
  2017-08-14 15:13                 ` Graham Cobb
  2017-08-14 19:39                 ` Christoph Anton Mitterer
  0 siblings, 2 replies; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-14 14:23 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Qu Wenruo; +Cc: Btrfs BTRFS

On 2017-08-14 08:24, Christoph Anton Mitterer wrote:
> On Mon, 2017-08-14 at 14:36 +0800, Qu Wenruo wrote:
>>> And how are you going to write your data and checksum atomically
>>> when
>>> doing in-place updates?
>>
>> Exactly, that's the main reason I can think of for why btrfs disables
>> checksum for nodatacow.
> 
> Still, I don't get the problem here...
> 
> Yes, it cannot be done atomically (without workarounds like a journal or
> so), but this should only be an issue in case of a crash or similar.
> 
> And in that case nodatacow+nochecksum is already bad anyway; it's also
> not atomic, so the data may be complete garbage (e.g. half written)...
> just that no one will ever notice.
> 
> The only problem that nodatacow + checksumming + non-atomic should give
> is when the data was actually correctly written at a crash but the
> checksum was not, in which case the bogus checksum would invalidate the
> good data on the next read.
> 
> Or do I miss something?
> 
> 
> To me that still sounds much better than having no protection at all.
Assume you have higher level verification.  Would you rather not be able 
to read the data regardless of if it's correct or not, or be able to 
read it and determine yourself if it's correct or not?  For almost 
anybody, the answer is going to be the second case, because the 
application knows better than the OS if the data is correct (and 
'correct' may be a threshold, not some binary determination).  At that 
point, you need to make the checksum error a warning instead of 
returning -EIO.  How do you intend to communicate that warning back to 
the application?  The kernel log won't work, because on any reasonably 
secure system it's not visible to anyone but root.  There's also no side 
channel for the read() system calls that you can utilize.  That then 
means that the checksums end up just being a means for the administrator 
to know some data wasn't written correctly, but they should know that 
anyway because the system crashed.  As a result, the whole thing ends up 
reduced to some extra work for a pointless notification that some people 
may not even see.

Looking at this from a different angle: Without background, what would 
you assume the behavior to be for this?  For most people, the assumption 
would be that this provides the same degree of data safety that the 
checksums do when the data is CoW.  We already have enough issues with 
people misunderstanding how things work and losing data as a result 
(keep in mind that the average user doesn't read documentation and will 
often blindly follow any random advice they see online), and we don't 
need more that are liable to cause data loss.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14 14:23               ` Austin S. Hemmelgarn
@ 2017-08-14 15:13                 ` Graham Cobb
  2017-08-14 15:53                   ` Austin S. Hemmelgarn
  2017-08-14 19:39                 ` Christoph Anton Mitterer
  1 sibling, 1 reply; 63+ messages in thread
From: Graham Cobb @ 2017-08-14 15:13 UTC (permalink / raw)
  To: Btrfs BTRFS

On 14/08/17 15:23, Austin S. Hemmelgarn wrote:
> Assume you have higher level verification.  

But almost no applications do. In real life, the decision
making/correction process will be manual and labour-intensive (for
example, running fsck on a virtual disk or restoring a file from backup).

> Would you rather not be able
> to read the data regardless of if it's correct or not, or be able to
> read it and determine yourself if it's correct or not?  

It must be controllable on a per-file basis, of course. For the tiny
number of files where the app can both spot the problem and correct it
(for example if it has a journal) the current behaviour could be used.

But, on MY system, I absolutely would **always** select the first option
(-EIO). I need to know that a potential problem may have occurred and
will take manual action to decide what to do. Of course, this also needs
a special utility (as Christoph proposed) to be able to force the read
(to allow me to examine the data) and to be able to reset the checksum
(although that is presumably as simple as rewriting the data).

This is what happens normally with any filesystem when a disk block goes
bad, but with the additional benefit of being able to examine a
"possibly valid" version of the data block before overwriting it.

> Looking at this from a different angle: Without background, what would
> you assume the behavior to be for this?  For most people, the assumption
> would be that this provides the same degree of data safety that the
> checksums do when the data is CoW.  

Exactly. The naive expectation is that turning off datacow does not
prevent the bitrot checking from working. Also, the naive expectation
(for any filesystem operation) is that if there is any doubt about the
reliability of the data, the error is reported for the user to deal with.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14 15:13                 ` Graham Cobb
@ 2017-08-14 15:53                   ` Austin S. Hemmelgarn
  2017-08-14 16:42                     ` Graham Cobb
  2017-08-14 19:54                     ` Christoph Anton Mitterer
  0 siblings, 2 replies; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-14 15:53 UTC (permalink / raw)
  To: Graham Cobb, Btrfs BTRFS

On 2017-08-14 11:13, Graham Cobb wrote:
> On 14/08/17 15:23, Austin S. Hemmelgarn wrote:
>> Assume you have higher level verification.
> 
> But almost no applications do. In real life, the decision
> making/correction process will be manual and labour-intensive (for
> example, running fsck on a virtual disk or restoring a file from backup).
Quite a few applications actually _do_ have some degree of secondary 
verification or protection from a crash.  Go look at almost any database 
software.  It usually will not have checksumming, but it will almost 
always have support for a journal, which is enough to cover the 
particular data loss scenario we're talking about (unexpected unclean 
shutdown).
> 
>> Would you rather not be able
>> to read the data regardless of if it's correct or not, or be able to
>> read it and determine yourself if it's correct or not?
> 
> It must be controllable on a per-file basis, of course. For the tiny
> number of files where the app can both spot the problem and correct it
> (for example if it has a journal) the current behaviour could be used.
In my own experience, the things that use nodatacow fall into one of 4 
categories:
1. Cases where the data is non-critical, and data loss will be 
inconvenient but not fatal.  Systemd journal files are a good example of 
this, as are web browser profiles when you're using profile sync.
2. Cases where the upper level can reasonably be expected to have some 
degree of handling, even if it's not correction.  VM disk images and 
most database applications fall into this category.
3. Cases where data corruption will take out the application anyway. 
Poorly written database software is the primary example of this.
4. Things that shouldn't be using nodatacow because data safety is the 
most important aspect of the system.

The first two cases work perfectly fine with the current behavior and 
are arguably no better off either way.  The third is functionally fine 
with the current behavior provided that the crash doesn't change state 
(which isn't a guarantee), but could theoretically benefit from the 
determinism of knowing the app will die if the data is bad.  The fourth 
is what most people seem to want this for, and don't realize that even 
if this is implemented, they will be no better off on average.
> 
> But, on MY system, I absolutely would **always** select the first option
> (-EIO). I need to know that a potential problem may have occurred and
> will take manual action to decide what to do. Of course, this also needs
> a special utility (as Christoph proposed) to be able to force the read
> (to allow me to examine the data) and to be able to reset the checksum
> (although that is presumably as simple as rewriting the data).
And I and most other sysadmins I know would prefer the opposite with the 
addition of a secondary notification method.  You can still hook the 
notification to stop the application, but you don't have to if you don't 
want to (and in cases 1 and 2 I listed above, you probably don't want to).
> 
> This is what happens normally with any filesystem when a disk block goes
> bad, but with the additional benefit of being able to examine a
> "possibly valid" version of the data block before overwriting it.
> 
>> Looking at this from a different angle: Without background, what would
>> you assume the behavior to be for this?  For most people, the assumption
>> would be that this provides the same degree of data safety that the
>> checksums do when the data is CoW.
> 
> Exactly. The naive expectation is that turning off datacow does not
> prevent the bitrot checking from working. Also, the naive expectation
> (for any filesystem operation) is that if there is any doubt about the
> reliability of the data, the error is reported for the user to deal with.
The problem is that the naive expectation about data safety appears to 
be that adding checksumming support for nodatacow will improve safety, 
which it WILL NOT do.  All it will do is add some reporting that will 
have a 50%+ rate of false positives (there is the very real possibility 
that the unexpected power loss will corrupt the checksum or the data if 
you're on anything but a traditional hard drive).  If you have something 
that you need data safety for and can't be arsed to pay attention to 
whether or not your system had an unclean shutdown, then you have two 
practical options:
1. Don't use nodatacow.
2. Do some form of higher level verification.
Nothing about that is going to magically change because you suddenly 
have checksums telling you the data might be bad.
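To illustrate the mechanism behind those false positives, here's a toy
Python model (not btrfs code) of an in-place write where the data block
and its checksum are two separate on-disk updates; a crash between them
produces a mismatch even though the block holds perfectly valid data:

    import zlib

    disk  = {"block0": b"old data"}
    csums = {"block0": zlib.crc32(b"old data")}

    def nocow_write(addr, data, crash_between=False):
        disk[addr] = data                 # update 1: data, in place
        if crash_between:
            return                        # power lost before update 2
        csums[addr] = zlib.crc32(data)    # update 2: checksum

    nocow_write("block0", b"new data", crash_between=True)

    # After reboot the block holds valid new data, but the stale csum
    # makes it look corrupt: a false positive unique to overwrite-in-place.
    assert zlib.crc32(disk["block0"]) != csums["block0"]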

Now, you _might_ be better off in a situation where the data got 
corrupted for some other reason (say, a media error for example), but 
even then you should have higher level verification, and it won't 
provide much benefit unless you're using replication or parity in BTRFS.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14 15:53                   ` Austin S. Hemmelgarn
@ 2017-08-14 16:42                     ` Graham Cobb
  2017-08-14 19:54                     ` Christoph Anton Mitterer
  1 sibling, 0 replies; 63+ messages in thread
From: Graham Cobb @ 2017-08-14 16:42 UTC (permalink / raw)
  To: Btrfs BTRFS

On 14/08/17 16:53, Austin S. Hemmelgarn wrote:
> Quite a few applications actually _do_ have some degree of secondary
> verification or protection from a crash.  

I am glad your applications do and you have no need of this feature.
You are welcome not to use it. I, on the other hand, definitely want
this feature and would have it enabled by default on all my systems
despite the need for manual actions after some unclean shutdowns.

> Go look at almost any database
> software.  It usually will not have checksumming, but it will almost
> always have support for a journal, which is enough to cover the
> particular data loss scenario we're talking about (unexpected unclean
> shutdown).

No, the problem we are talking about is the data-at-rest corruption that
checksumming is designed to deal with. That is why I want it. The
unclean shutdown is a side issue that means there is a trade-off to
using it.

No one is suggesting that checksums are any significant help with the
unclean shutdown case, just that the existence of that atomicity issue
does not **prevent** them being very useful for the function for which
they were designed. The degree to which any particular sysadmin will
choose to enable or disable checksums on nodatacow files will depend on
how much they value the checksum protection vs. the impact of manually
fixing problems after some unclean shutdowns.

In my particular case, many of these nodatacow files are large, very
long-lived and only in use intermittently. I would like my monthly
"btrfs scrub" to know they haven't gone bad but they are extremely
unlikely to be in the middle of a write during an unclean shutdown so I
am likely to have very few false errors. They are all backed up, but
without checksumming I don't know that the backup needs to be restored
(or even that I am not backing up now-bad data).

Graham

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14 14:23                   ` Goffredo Baroncelli
@ 2017-08-14 19:08                     ` Chris Murphy
  2017-08-14 20:27                       ` Goffredo Baroncelli
  0 siblings, 1 reply; 63+ messages in thread
From: Chris Murphy @ 2017-08-14 19:08 UTC (permalink / raw)
  To: Goffredo Baroncelli
  Cc: Qu Wenruo, Hugo Mills, Christoph Anton Mitterer,
	Christoph Hellwig, Btrfs BTRFS, Liu Bo

On Mon, Aug 14, 2017 at 8:23 AM, Goffredo Baroncelli <kreijack@inwind.it> wrote:

> From a theoretical point of view, if you have a "PURE" COW file-system, you don't need a journal. Unfortunately a RAID5/6 stripe update is a RMW cycle, so you need a journal to keep it in sync. The same is true for NOCOW files (and their checksums)
>

I'm pretty sure the raid56 rmw is in memory only; I don't think we
have a case where a stripe is getting partial writes (a block in a
stripe being overwritten). Partial stripe updates with rmw *on
disk* would mean Btrfs raid56 is not CoW.
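For readers following the parity math, here is the RMW in question as a
toy Python model of a RAID5 parity update (illustrative only, not the
kernel code): updating one data block means folding the old data out of
the parity and the new data in, and the data and parity writes are
separate device writes, so a crash between them leaves the stripe
inconsistent, the classic write hole:

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    # toy stripe: two data blocks and one parity block
    d0, d1 = b"\x01" * 4, b"\x02" * 4
    parity = xor(d0, d1)

    # RMW update of d0 only: read old d0 + old parity, recompute parity
    new_d0 = b"\xff" * 4
    new_parity = xor(xor(parity, d0), new_d0)  # strip old d0, fold in new

    # two separate writes must both land; a crash in between desyncs them
    d0, parity = new_d0, new_parity
    assert parity == xor(d0, d1)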

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14 14:23               ` Austin S. Hemmelgarn
  2017-08-14 15:13                 ` Graham Cobb
@ 2017-08-14 19:39                 ` Christoph Anton Mitterer
  1 sibling, 0 replies; 63+ messages in thread
From: Christoph Anton Mitterer @ 2017-08-14 19:39 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 3244 bytes --]

On Mon, 2017-08-14 at 10:23 -0400, Austin S. Hemmelgarn wrote:
> Assume you have higher level verification.  Would you rather not be
> able 
> to read the data regardless of if it's correct or not, or be able to 
> read it and determine yourself if it's correct or not?

What would be the difference here then to the CoW+checksumming+some-
data-corruption-case?!
btrfs would also give EIO and all these applications you mention would
fail then.

As I've said previously, one could provide end users with the means to
still access the faulty data. Or they could simply mount with
nochecksum.




> For almost 
> anybody, the answer is going to be the second case, because the 
> application knows better than the OS if the data is correct (and 
> 'correct' may be a threshold, not some binary determination).
You've made that claim already once with VMs and DBs, and your claim
proved simply wrong.

Most applications don't do this kind of verification.

And those that do probably rather just check whether the data is valid
and if not give an error, or at best fall back to some automatic
backups (e.g. what package managers do).

I know of only a few programs that would really be capable of using
data they know is bogus and recovering from that automagically... the
only examples I know of are some archive formats which include error
correcting codes.
And I really mean using the blocks for recovery for which the csum
wouldn't verify (i.e. the ones that give an EIO)... without ECCs, how
would a program know what to do with such data?


I cannot imagine that many people would choose the second option, to be
honest.
Working with bogus data?! What would be the benefit of this?



>   At that 
> point, you need to make the checksum error a warning instead of 
> returning -EIO.  How do you intend to communicate that warning back
> to 
> the application?  The kernel log won't work, because on any
> reasonably 
> secure system it's not visible to anyone but root.

Still same problem with CoW + any data corruption...

>   There's also no side 
> channel for the read() system calls that you can utilize.  That then 
> means that the checksums end up just being a means for the
> administrator 
> to know some data wasn't written correctly, but they should know
> that 
> anyway because the system crashed.

No, they'd have no idea if any / which data was written during the
crash.



> Looking at this from a different angle: Without background, what
> would 
> you assume the behavior to be for this?  For most people, the
> assumption 
> would be that this provides the same degree of data safety that the 
> checksums do when the data is CoW.

I don't think the average user would have any such assumption. Most
people likely don't even know that there is implicitly no checksumming
if nodatacow is enabled.


What people may however have heard is that btrfs does do
checksumming, and they'd assume that their filesystem always gives them
just valid data (or an error)... and IMO that's actually what every
modern fs should do by default.
Relying on higher levels to provide such means is simply not realistic.



Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5930 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14 15:53                   ` Austin S. Hemmelgarn
  2017-08-14 16:42                     ` Graham Cobb
@ 2017-08-14 19:54                     ` Christoph Anton Mitterer
  2017-08-15 11:37                       ` Austin S. Hemmelgarn
  2017-08-16 13:12                       ` Chris Mason
  1 sibling, 2 replies; 63+ messages in thread
From: Christoph Anton Mitterer @ 2017-08-14 19:54 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 2409 bytes --]

On Mon, 2017-08-14 at 11:53 -0400, Austin S. Hemmelgarn wrote:
> Quite a few applications actually _do_ have some degree of secondary 
> verification or protection from a crash.  Go look at almost any
> database 
> software.
Then please give proper references for this!

This is from 2015, where you claimed this already and I looked up all
the bigger DBs and they either couldn't do it at all, didn't do it by
default, or it required application support (i.e. from the programs
using the DB)
https://www.spinics.net/lists/linux-btrfs/msg50258.html


> It usually will not have checksumming, but it will almost 
> always have support for a journal, which is enough to cover the 
> particular data loss scenario we're talking about (unexpected
> unclean 
> shutdown).

I don't think that's what we're talking about:
we're talking about people wanting checksumming to notice e.g. silent
data corruption.

The crash case is only the corner case of what happens if data is
written correctly but the csums are not.


> In my own experience, the things that use nodatacow fall into one of
> 4 
> categories:
> 1. Cases where the data is non-critical, and data loss will be 
> inconvenient but not fatal.  Systemd journal files are a good example
> of 
> this, as are web browser profiles when you're using profile sync.

I'd guess many people would want to have their log files valid and
complete. Same for their profiles (especially since people concerned
about their integrity might not want to have these synced to
Mozilla/Google etc.)


> 2. Cases where the upper level can reasonably be expected to have
> some 
> degree of handling, even if it's not correction.  VM disk images and 
> most database applications fall into this category.

No. Wrong. Or prove to me that I'm wrong ;-)
And these two (VMs, DBs) are actually *the* main cases for nodatacow.


> And I and most other sysadmins I know would prefer the opposite with
> the 
> addition of a secondary notification method.  You can still hook the 
> notification to stop the application, but you don't have to if you
> don't 
> want to (and in cases 1 and 2 I listed above, you probably don't want
> to).

Then I guess btrfs is generally not the right thing for such people, as
in the CoW case it will also give them EIO on any corruptions and their
programs will fail.



Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5930 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14 19:08                     ` Chris Murphy
@ 2017-08-14 20:27                       ` Goffredo Baroncelli
  0 siblings, 0 replies; 63+ messages in thread
From: Goffredo Baroncelli @ 2017-08-14 20:27 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Qu Wenruo, Hugo Mills, Christoph Anton Mitterer,
	Christoph Hellwig, Btrfs BTRFS, Liu Bo

On 08/14/2017 09:08 PM, Chris Murphy wrote:
> On Mon, Aug 14, 2017 at 8:23 AM, Goffredo Baroncelli <kreijack@inwind.it> wrote:
> 
>> From a theoretical point of view, if you have a "PURE" COW file-system, you don't need a journal. Unfortunately a RAID5/6 stripe update is a RMW cycle, so you need a journal to keep it in sync. The same is true for NOCOW files (and their checksums)
>>
> 
> I'm pretty sure the raid56 rmw is in memory only, I don't think we
> have a case where a stripe is getting partial writes (a block in a
> stripe is being overwritten). Partial stripe updates with rmw *on
> disk* would mean Btrfs raid56 is not CoW.
> 

I am not sure about that. Consider the following cases:
- what if we have to write less than a stripe?
- suppose we remove a file with a length of 4k. If you don't allow a RMW cycle, this means the space would be lost forever....

Pay attention that the size of a stripe could (theoretically) be quite big: suppose you have an (insanely wide) raid composed of 20 disks; the stripe would be about 20*64k = ~1.2MB....

BR
G.Baroncelli


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14 19:54                     ` Christoph Anton Mitterer
@ 2017-08-15 11:37                       ` Austin S. Hemmelgarn
  2017-08-15 14:41                         ` Christoph Anton Mitterer
  2017-08-16 13:12                       ` Chris Mason
  1 sibling, 1 reply; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-15 11:37 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Btrfs BTRFS

On 2017-08-14 15:54, Christoph Anton Mitterer wrote:
> On Mon, 2017-08-14 at 11:53 -0400, Austin S. Hemmelgarn wrote:
>> Quite a few applications actually _do_ have some degree of secondary
>> verification or protection from a crash.  Go look at almost any
>> database
>> software.
> Then please give proper references for this!
> 
> This is from 2015, where you claimed this already and I looked up all
> the bigger DBs and they either couldn't do it at all, didn't do it by
> default, or it required application support (i.e. from the programs
> using the DB)
> https://www.spinics.net/lists/linux-btrfs/msg50258.html
Go look at Chrome, or Firefox, or Opera, or any other major web browser. 
  At minimum, they will safely bail out if they detect corruption in the 
user profile and can trivially resync the profile from another system if 
the user has profile sync set up.  Go take a look at any enterprise 
database application from a reasonable company, it will almost always 
support replication across systems and validate data it reads.  Note 
that in both cases this isn't the same as BTRFS checking block 
checksums, and I never said that the application had to work without 
issue, even BTRFS and ZFS can only provide that guarantee with multiple 
devices or dup profiles on a single disk, but I can count on one hand 
the software I've used in the last few years that didn't at least fail 
gracefully when fed bad data (and sending -EIO when a checksum fails is 
essentially the same thing).
> 
>> It usually will not have checksumming, but it will almost
>> always have support for a journal, which is enough to cover the
>> particular data loss scenario we're talking about (unexpected
>> unclean
>> shutdown).
> 
> I don't think that's what we're talking about:
> we're talking about people wanting checksumming to notice e.g. silent
> data corruption.
> 
> The crash case is only the corner case of what happens if data is
> written correctly but the csums are not.
> 
> 
>> In my own experience, the things that use nodatacow fall into one of
>> 4
>> categories:
>> 1. Cases where the data is non-critical, and data loss will be
>> inconvenient but not fatal.  Systemd journal files are a good example
>> of
>> this, as are web browser profiles when you're using profile sync.
> 
> I'd guess many people would want to have their log files valid and
> complete. Same for their profiles (especially since people concerned
> about their integrity might not want to have these synced to
> Mozilla/Google etc.)
Agreed, but there's also the counter argument for log files that most 
people who are not running servers rarely (if ever) look at old logs, 
and it's the old logs that are the most likely to have at rest 
corruption (the longer something sits idle on media, the more likely it 
will suffer from a media error).
> 
> 
>> 2. Cases where the upper level can reasonably be expected to have
>> some
>> degree of handling, even if it's not correction.  VM disk images and
>> most database applications fall into this category.
> 
> No. Wrong. Or prove to me that I'm wrong ;-)
> And these two (VMs, DBs) are actually *the* main cases for nodatacow.
Go install OpenSUSE in a VM.  Look at what filesystem it uses.  Go 
install Solaris in a VM, lo and behold it uses ZFS _with no option for 
anything else_ as its root filesystem.  Go install a recent version of 
Windows server in a VM, notice that it also has the option of a properly 
checked filesystem (ReFS).  Go install FreeBSD in a VM, notice that it 
provides the option (which is actively recommended by many people who 
use FreeBSD) to install with root on ZFS.  Install Android or Chrome OS 
(or AOSP or Chromium OS) in a VM.  Root the system and take a look at 
the storage stack, both of them use dm-verity, and Android (and possibly 
Chrome OS too, not 100% certain) uses per-file AEAD through the VFS 
encryption API on encrypted devices.  The fact that some OS'es blindly 
trust the underlying storage hardware is not our issue, it's their 
issue, and it shouldn't be 'fixed' by BTRFS because it doesn't just 
affect their customers who run the OS in a VM on BTRFS.

As far as databases, I know of only one piece of enterprise level 
database software that doesn't have some kind of handling for this type 
of thing, and it's a horribly designed piece of software in other 
respects too.  Most enterprise database apps offer support for replication, 
and quite a few do their own data validation when reading from the 
database.  And if you care about non-enterprise database apps, then you 
need to worry about the edge case caused by unclean shutdown.
> 
> 
>> And I and most other sysadmins I know would prefer the opposite with
>> the
>> addition of a secondary notification method.  You can still hook the
>> notification to stop the application, but you don't have to if you
>> don't
>> want to (and in cases 1 and 2 I listed above, you probably don't want
>> to).
> 
> Then I guess btrfs is generally not the right thing for such people, as
> in the CoW case it will also give them EIO on any corruptions and their
> programs will fail.
For a single disk?  Yes, I'd agree that BTRFS isn't the correct answer 
unless you're running dup for all profiles on said single disk when you 
care about data safety.  Once you add another though, it's far superior 
to regular RAID because it knows inherently which copy is wrong.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-15 11:37                       ` Austin S. Hemmelgarn
@ 2017-08-15 14:41                         ` Christoph Anton Mitterer
  2017-08-15 15:43                           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Anton Mitterer @ 2017-08-15 14:41 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 3959 bytes --]

On Tue, 2017-08-15 at 07:37 -0400, Austin S. Hemmelgarn wrote:
> Go look at Chrome, or Firefox, or Opera, or any other major web
> browser. 
>   At minimum, they will safely bail out if they detect corruption in
> the 
> user profile and can trivially resync the profile from another system
> if 
> the user has profile sync set up.

Aha,... I'd rather see a concrete reference to some white paper or
code, where one can really see that these programs actually *do* their
own checksumming.
But even from what you claim here now (that they'd only detect the
corruption and then resync from another system - which is nothing else
than recovering from a backup), I wouldn't see the big problem with
EIO.


> Go take a look at any enterprise 
> database application from a reasonable company, it will almost
> always 
> support replication across systems and validate data it reads.

Okay, I already showed you that PostgreSQL, MySQL, BDB, sqlite can't
or don't do it by default... so which do you mean by the enterprise DB
(Oracle?) and where's the reference that shows that they really do
general checksumming? And that EIO would be a problem for their recovery
strategies?

And again, we're not talking about the WALs (or whatever these programs
call them) which are there to handle a crash... we are talking about
silent data corruption.



> Agreed, but there's also the counter argument for log files that
> most 
> people who are not running servers rarely (if ever) look at old
> logs, 
> and it's the old logs that are the most likely to have at rest 
> corruption (the longer something sits idle on media, the more likely
> it 
> will suffer from a media error).

I wouldn't have any valid proof that it's really the "idle" data which
is the most likely to have silent corruption (at least not for all
types of storage media), but even if this is the case as you say...
then it's probably more likely to hit the /usr/, /lib/ and so on stuff
on stable distros... logs are typically rotated and then re-written at
least once (when compressed).


> Go install OpenSUSE in a VM.  Look at what filesystem it uses.  Go 
> install Solaris in a VM, lo and behold it uses ZFS _with no option
> for 
> anything else_ as its root filesystem.  Go install a recent version
> of 
> Windows server in a VM, notice that it also has the option of a
> properly 
> checked filesystem (ReFS).  Go install FreeBSD in a VM, notice that
> it 
> provides the option (which is actively recommended by many people
> who 
> use FreeBSD) to install with root on ZFS.  Install Android or Chrome
> OS 
> (or AOSP or Chromium OS) in a VM.  Root the system and take a look
> at 
> the storage stack, both of them use dm-verity, and Android (and
> possibly 
> Chrome OS too, not 100% certain) uses per-file AEAD through the VFS 
> encryption API on encrypted devices.

So your argument for not adding support for this is basically:
People don't or shouldn't use btrfs for this? o.O



>   The fact that some OS'es blindly 
> trust the underlying storage hardware is not our issue, it's their 
> issue, and it shouldn't be 'fixed' by BTRFS because it doesn't just 
> affect their customers who run the OS in a VM on BTRFS.

Then you can probably drop checksumming from btrfs altogether. And with
the same "argument" any other advanced feature.
For resilience there is hardware RAID or Linux' MD raid... so no need
to keep it in btrfs o.O


> Most enterprise database apps offer support for
> replication, 
> and quite a few do their own data validation when reading from the 
> database.
First of all,... replication != the capability to detect silent data
corruption.

You still haven't named a single one which does checksumming by
default. At least those which are quite popular in the FLOSS world
don't seem to.



Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5930 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-15 14:41                         ` Christoph Anton Mitterer
@ 2017-08-15 15:43                           ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-15 15:43 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Btrfs BTRFS

On 2017-08-15 10:41, Christoph Anton Mitterer wrote:
> On Tue, 2017-08-15 at 07:37 -0400, Austin S. Hemmelgarn wrote:
>> Go look at Chrome, or Firefox, or Opera, or any other major web
>> browser.
>>    At minimum, they will safely bail out if they detect corruption in
>> the
>> user profile and can trivially resync the profile from another system
>> if
>> the user has profile sync set up.
> 
> Aha,... I'd rather see a concrete reference to some white paper or
> code, where one can really see that these programs actually *do* their
> own checksumming.
> But even from what you claim here now (that they'd only detect the
> corruption and then resync from another system - which is nothing else
> than recovering from a backup), I wouldn't see the big problem with
> EIO.
It isn't a problem if it isn't a false positive.  It is a problem when 
the checksum is wrong but the data is accurate.  This breaks from current 
behavior on BTRFS in a not insignificant way.  As things stand right 
now, -EIO on BTRFS means one of two things:
* The underlying device returned an IO error.
* The data there is incorrect.

While it technically is possible for there to be a false positive with 
CoW, it is a statistical impossibility even at Google and Facebook scale 
(I will comment that I've had this happen (exactly once), but it 
resulted from severe widespread media issues in the storage device that 
should have caused catastrophic failure of the device).

There is no way to avoid false positives without CoW or journaling.  We 
have CoW, and people aren't using it for performance reasons.  Adding 
journaling instead will make performance worse (and brings up the 
important question of whether or not the journal is CoW) for NOCOW, and 
has the potential to make performance worse than without NOCOW.
> 
> 
>> Go take a look at any enterprise
>> database application from a reasonable company, it will almost
>> always
>> support replication across systems and validate data it reads.
> 
> Okay, I already showed you that PostgreSQL, MySQL, BDB, sqlite can't
> or don't do it by default... so which do you mean by the enterprise DB
> (Oracle?) and where's the reference that shows that they really do
> general checksumming? And that EIO would be a problem for their recovery
> strategies?
Again, I never said it had to be checksumming.  Type and range checking 
and validation of the metadata (not through checksumming, but through 
verifying that the metadata makes sense, essentially the equivalent of 
fsck on older filesystems) _is_ done by almost everything dealing with 
databases these days except for trivial one-off stuff.

As far as EIO, see my reply above.
> 
> And again, we're not talking about the WALs (or whatever these programs
> call it) which are there to handle a crash... we are talking about
> silent data corruption.
Reread what I said.  Database _APPLICATION_ is not the same as database 
system.  PGSQL, MySQL, BDB, SQLite, MSSQL, Oracle, etc, are all database 
systems, they provide a database that an application can build on top 
of, and yes, none of them provide any significant protection (except 
possibly MSSQL, but I'm not sure about that and it's not hugely relevant 
to this particular discussion).  Things like MythTV, Bugzilla, Kodi, and 
other stuff that utilize the database for back-end storage (including 
things like many media players and web browsers) are database 
applications.  The distinction here is no different from Linux 
applications versus Linux systems.

In the context of actual applications using the database, it's still not 
rigorous verification like you seem to think I'm talking about, but most 
of them do enough sanity checking that most stuff beyond single bit 
errors in numeric and string types will be caught and at least reported.
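As a concrete (hypothetical) illustration of that kind of sanity
checking, no checksums involved, just type, range, and enum checks of
the sort such applications do on the rows they read:

    def validate_row(row):
        # structural sanity checks: does this row even make sense?
        if not isinstance(row.get("id"), int) or row["id"] <= 0:
            raise ValueError("bad id: %r" % row.get("id"))
        if row.get("state") not in {"new", "active", "closed"}:
            raise ValueError("bad state: %r" % row.get("state"))
        ts = row.get("timestamp")
        if not isinstance(ts, (int, float)) or not (0 < ts < 2**33):
            raise ValueError("timestamp out of range: %r" % ts)
        return row

    validate_row({"id": 7, "state": "active", "timestamp": 1502899200})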
> 
>> Agreed, but there's also the counter argument for log files that
>> most
>> people who are not running servers rarely (if ever) look at old
>> logs,
>> and it's the old logs that are the most likely to have at rest
>> corruption (the longer something sits idle on media, the more likely
>> it
>> will suffer from a media error).
> 
> I wouldn't have any valid proof that it's really the "idle" data which
> is the most likely to have silent corruption (at least not for all
> types of storage media), but even if this is the case as you say...
> then it's probably more likely to hit the /usr/, /lib/ and so on stuff
> on stable distros... logs are typically rotated and then re-written at
> least once (when compressed).
Except that /usr and /lib are trivial to validate on any modern Linux or 
BSD system because the package manager almost certainly has file 
validation built in.  At minimum, emerge, Entropy, DNF, yum, FreeBSD 
pkg-ng, pkgin, Zypper, YaST2, Nix, and Alpine APK, all have this 
functionality, and there is at least one readily available piece of 
software (debsigs) for dpkg based systems.  Sensible, security-minded 
individuals generally already have this type of validation in a cron job 
or systemd timer.
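For illustration, a minimal sketch of what such a scheduled check might
look like (hypothetical Python, not any package manager's actual verify
code): hash every file and compare against a stored sha256sum-style
manifest.

    import hashlib, os, sys

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify(manifest):
        # manifest: lines of "<hex digest>  <path>", as sha256sum produces
        bad = 0
        with open(manifest) as f:
            for line in f:
                digest, path = line.rstrip("\n").split("  ", 1)
                if not os.path.exists(path) or sha256(path) != digest:
                    print("MISMATCH:", path)
                    bad += 1
        return bad

    if __name__ == "__main__":
        sys.exit(1 if verify(sys.argv[1]) else 0)

Run that from cron or a systemd timer against a manifest generated at
install time and you get the same class of at-rest validation.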
> 
> 
>> Go install OpenSUSE in a VM.  Look at what filesystem it uses.  Go
>> install Solaris in a VM, lo and behold it uses ZFS _with no option
>> for
>> anything else_ as it's root filesystem.  Go install a recent version
>> of
>> Windows server in a VM, notice that it also has the option of a
>> properly
>> checked filesystem (ReFS).  Go install FreeBSD in a VM, notice that
>> it
>> provides the option (which is actively recommended by many people
>> who
>> use FreeBSD) to install with root on ZFS.  Install Android or Chrome
>> OS
>> (or AOSP or Chromium OS) in a VM.  Root the system and take a look
>> at
>> the storage stack, both of them use dm-verity, and Android (and
>> possibly
>> Chrome OS too, not 100% certain) uses per-file AEAD through the VFS
>> encryption API on encrypted devices.
> 
> So your argument for not adding support for this is basically:
> People don't or shouldn't use btrfs for this? o.O
No, you shouldn't be using a CoW filesystem directly for VM image 
storage if you care at all about performance, and especially not BTRFS. 
Even with NOCOW, performance of this on BTRFS is absolutely horrendous. 
This goes double if you're using QCOW2 or other allocate-on-demand 
formats.  Ideal order of decreasing preference if you care about 
performance is:
* Native block devices
* SAN devices
* LVM or ZFS ZVols (believe it or not, ZVols actually get remarkably 
good performance despite being on a CoW backend)
* Simple filesystems like ext4 or XFS that don't do CoW or use log 
structures for data
* Files on ZFS or F2FS
* Most other CoW or log structured filesystems
* BTRFS

BTRFS should literally be your last resort for VM image storage if you 
care about performance.
> 
> 
> 
>>    The fact that some OS'es blindly
>> trust the underlying storage hardware is not our issue, it's their
>> issue, and it shouldn't be 'fixed' by BTRFS because it doesn't just
>> affect their customers who run the OS in a VM on BTRFS.
> 
> Then you can probably drop checksumming from btrfs altogether. And with
> the same "argument" any other advanced feature.
> For resilience there is hardware RAID or Linux' MD raid... so no need
> to keep it in btrfs o.O
**NO**.  That is not what I'm arguing.  That would be regressing BTRFS 
to a state that I'm arguing needs to be _FIXED_ in other systems.  My 
complaint is that operating systems (and by extension, VM's) should be 
doing the checking themselves because they inherently can't rely on the 
underlying storage in almost all cases, in particular in the ones in 
which they are almost always used.

Notice in particular that I mentioned OpenSUSE, which has this 
validation _because_ it uses BTRFS by default for the root filesystem. 
I would have thought that that would not need to be explained here, but 
apparently I was wrong.
> 
> 
>> Most enterprise database apps offer support for
>> replication,
>> and quite a few do their own data validation when reading from the
>> database.
> First of all,... replication != the capability to detect silent data
> corruption.
So how is proper verified replication not able to detect silent data 
corruption exactly?  I mean, that's what RAID1 is and it does provide 
the ability to detect such things (unless your RAID implementation is 
brain dead), it just doesn't fix it reliably by itself.
> 
> You still haven't named a single one which does checksumming by
> default. At least those which are quite popular in the FLOSS world
> don't seem to.
Again, checksumming is not the only way to detect data corruption. 
Comparison to other copies, metadata validation (databases aren't just a 
jumble of data, there is required structure that can be validated), and 
type and range checking are all ways of detecting silent corruption.

Are they perfect? No.
Is checksumming better? In some circumstances.
Are they sufficient for most use cases? Absolutely.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-14 19:54                     ` Christoph Anton Mitterer
  2017-08-15 11:37                       ` Austin S. Hemmelgarn
@ 2017-08-16 13:12                       ` Chris Mason
  2017-08-16 13:31                         ` Christoph Anton Mitterer
                                           ` (3 more replies)
  1 sibling, 4 replies; 63+ messages in thread
From: Chris Mason @ 2017-08-16 13:12 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Austin S. Hemmelgarn, Btrfs BTRFS

On Mon, Aug 14, 2017 at 09:54:48PM +0200, Christoph Anton Mitterer wrote:
>On Mon, 2017-08-14 at 11:53 -0400, Austin S. Hemmelgarn wrote:
>> Quite a few applications actually _do_ have some degree of secondary 
>> verification or protection from a crash.  Go look at almost any
>> database 
>> software.
>Then please give proper references for this!
>
>This is from 2015, where you claimed this already and I looked up all
>the bigger DBs and they either couldn't do it at all, didn't do it by
>default, or it required application support (i.e. from the programs
>using the DB)
>https://www.spinics.net/lists/linux-btrfs/msg50258.html
>
>
>> It usually will not have checksumming, but it will almost 
>> always have support for a journal, which is enough to cover the 
>> particular data loss scenario we're talking about (unexpected
>> unclean 
>> shutdown).
>
>I don't think that's what we're talking about:
>we're talking about people wanting checksumming to notice e.g. silent
>data corruption.
>
>The crash case is only the corner case of what happens if data is
>written correctly but the csums are not.

We use the crcs to catch storage gone wrong, both in terms of simple 
things like cabling, bus errors, drives gone crazy or exotic problems 
like every time I reboot the box a handful of sectors return EFI 
partition table headers instead of the data I wrote.  You don't need 
data center scale for this to happen, but it does help...

So, we do catch crc errors in prod and they do keep us from replicating 
bad data over good data.  Some databases also crc, and all drives have 
correction bits of of some kind.  There's nothing wrong with crcs 
happening at lots of layers.

Btrfs couples the crcs with COW because it's the least complicated way 
to protect against:

* bits flipping
* IO getting lost on the way to the drive, leaving stale but valid data 
in place
* IO from sector A going to sector B instead, overwriting valid data 
with other valid data.

It's possible to protect against all three without COW, but all 
solutions have their own tradeoffs and this is the setup we chose.  It's 
easy to trust and easy to debug and at scale that really helps.
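As a toy model of why the coupling works (illustrative Python, not
btrfs's actual on-disk logic): every write goes to a never-used
location, and the checksum travels with the pointer in the metadata, so
a lost write leaves stale contents at the new address that cannot match
the recorded csum.

    import zlib

    disk = {}                   # addr -> bytes
    meta = {}                   # name -> (addr, crc of expected contents)
    next_addr = 0

    def cow_write(name, data, io_lost=False):
        global next_addr
        addr, next_addr = next_addr, next_addr + 1
        if not io_lost:
            disk[addr] = data   # data written to a fresh location
        # metadata commit: pointer and checksum are updated together
        meta[name] = (addr, zlib.crc32(data))

    def read(name):
        addr, crc = meta[name]
        data = disk.get(addr, b"")  # lost write: stale bytes at addr
        if zlib.crc32(data) != crc:
            raise IOError("csum mismatch (EIO)")
        return data

    cow_write("f", b"hello")
    assert read("f") == b"hello"
    cow_write("f", b"world", io_lost=True)  # the drive dropped this write
    try:
        read("f")
    except IOError:
        pass                    # stale data cannot masquerade as valid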

In general, production storage environments prefer clearly defined 
errors when the storage has the wrong data.  EIOs happen often, and you 
want to be able to quickly pitch the bad data and replicate in good 
data.

My real goal is to make COW fast enough that we can leave it on for the 
database applications too.  Obviously I haven't quite finished that one 
yet ;) But I'd rather keep the building block of all the other btrfs 
features in place than try to do crcs differently.

-chris

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-16 13:12                       ` Chris Mason
@ 2017-08-16 13:31                         ` Christoph Anton Mitterer
  2017-08-16 13:53                           ` Austin S. Hemmelgarn
  2017-08-16 16:54                           ` Peter Grandi
  2017-08-16 13:56                         ` Austin S. Hemmelgarn
                                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 63+ messages in thread
From: Christoph Anton Mitterer @ 2017-08-16 13:31 UTC (permalink / raw)
  To: Chris Mason; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 2536 bytes --]

Just out of curiosity:


On Wed, 2017-08-16 at 09:12 -0400, Chris Mason wrote:
> Btrfs couples the crcs with COW because

this (which sounds like you want it to stay coupled that way)...

plus


> It's possible to protect against all three without COW, but all 
> solutions have their own tradeoffs and this is the setup we
> chose.  It's 
> easy to trust and easy to debug and at scale that really helps.

... this (which sounds more like you think the checksumming is so
helpful that it would be nice in the nodatacow case as well).

What does that mean now? Things will stay as they are... or it may
become a goal to get checksumming for nodatacow (while of course still
retaining the possibility to disable both, datacow AND checksumming)?


> In general, production storage environments prefer clearly defined 
> errors when the storage has the wrong data.  EIOs happen often, and
> you 
> want to be able to quickly pitch the bad data and replicate in good 
> data.

Which would also rather point towards getting clear EIOs (and thus
checksumming) in the nodatacow case.



> My real goal is to make COW fast enough that we can leave it on for
> the 
> database applications too.  Obviously I haven't quite finished that
> one 
> yet ;)

Well the question is, even if you manage that sooner or later, will
everyone be fully satisfied by this?!
I've mentioned earlier on the list that I manage one of the many big
data/computing centres for LHC.
Our use case is typically big plain storage servers connected via some
higher level storage management system (http://dcache.org/)... with
mostly write once/read many.

So apart from some central DBs for the storage management system
itself, CoW is mostly no issue for us.
But I've talked to some friend at the local super computing centre and
they have rather general issues with CoW at their virtualisation
cluster.
Like SUSE's snapper making many snapshots, apparently leading the
storage images of VMs to explode (in terms of space usage).
For some of their storage backends there simply seems to be no de-
duplication available (or other reasons that prevent its usage).

From that I'd guess there would still be people who want the nice
features of btrfs (snapshots, checksumming, etc.), while still being
able to nodatacow in specific cases.


> But I'd rather keep the building block of all the other btrfs 
> features in place than try to do crcs differently.

Mhh I see, what a pity.


Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5930 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-16 13:31                         ` Christoph Anton Mitterer
@ 2017-08-16 13:53                           ` Austin S. Hemmelgarn
  2017-08-16 14:11                             ` Christoph Anton Mitterer
  2017-08-16 18:19                             ` David Sterba
  2017-08-16 16:54                           ` Peter Grandi
  1 sibling, 2 replies; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-16 13:53 UTC (permalink / raw)
  To: Christoph Anton Mitterer, Chris Mason; +Cc: Btrfs BTRFS

On 2017-08-16 09:31, Christoph Anton Mitterer wrote:
> Just out of curiosity:
> 
> 
> On Wed, 2017-08-16 at 09:12 -0400, Chris Mason wrote:
>> Btrfs couples the crcs with COW because
> 
> this (which sounds like you want it to stay coupled that way)...
> 
> plus
> 
> 
>> It's possible to protect against all three without COW, but all
>> solutions have their own tradeoffs and this is the setup we
>> chose.  It's
>> easy to trust and easy to debug and at scale that really helps.
> 
> ... this (which sounds more like you think the checksumming is so
> helpful that it would be nice in the nodatacow case as well).
> 
> What does that mean now? Things will stay as they are... or it may
> become a goal to get checksumming for nodatacow (while of course still
> retaining the possibility to disable both, datacow AND checksumming)?
It means that you have other options if you want this so badly that you 
need to keep pestering the developers about it but can't be arsed to try 
to code it yourself.  Go try BTRFS on top of dm-integrity, or on a 
system with T10-DIF or T13-EPP support (which you should have access to 
given the amount of funding CERN gets), or even on a ZFS zvol if you're 
crazy enough.  It works wonderfully in the first two cases, and reliably 
(but not efficiently) in the third, and all of them provide exactly what 
you want, plus the bonus that they do a slightly better job of 
differentiating between media and memory errors.
> 
> 
>> In general, production storage environments prefer clearly defined
>> errors when the storage has the wrong data.  EIOs happen often, and
>> you
>> want to be able to quickly pitch the bad data and replicate in good
>> data.
> 
> Which would also rather point towards getting clear EIOs (and thus
> checksumming) in the nodatacow case.
Except it isn't clear with nodatacow, because it might be a false positive.
> 
> 
> 
>> My real goal is to make COW fast enough that we can leave it on for
>> the
>> database applications too.  Obviously I haven't quite finished that
>> one
>> yet ;)
> 
> Well the question is, even if you manage that sooner or later, will
> everyone be fully satisfied by this?!
> I've mentioned earlier on the list that I manage one of the many big
> data/computing centres for LHC.
> Our use case is typically big plain storage servers connected via some
> higher level storage management system (http://dcache.org/)... with
> mostly write once/read many.
> 
> So apart from some central DBs for the storage management system
> itself, CoW is mostly no issue for us.
> But I've talked to some friend at the local super computing centre and
> they have rather general issues with CoW at their virtualisation
> cluster.
> Like SUSE's snapper making many snapshots, apparently leading the
> storage images of VMs to explode (in terms of space usage).
SUSE is a pathological case of brain-dead defaults.  Snapper needs to 
either die or have some serious sense beat into it.  When you turn off 
the automatic snapshot generation for everything but updates and set the 
retention policy to not keep almost everything, it's actually not bad at 
all.
> For some of their storage backends there simply seems to be no de-
> duplication available (or other reasons that prevent its usage).
If the snapshots are being CoW'ed, then dedupe won't save them any 
space.  Also, nodatacow is inherently at odds with reflinks used for dedupe.
> 
>  From that I'd guess there would still be people who want the nice
> features of btrfs (snapshots, checksumming, etc.), while still being
> able to nodatacow in specific cases.
Snapshots work fine with nodatacow, each block gets CoW'ed once when 
it's first written to, and then goes back to being NOCOW.  The only 
caveat is that you probably want to defrag either once everything has 
been rewritten, or right after the snapshot.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-16 13:12                       ` Chris Mason
  2017-08-16 13:31                         ` Christoph Anton Mitterer
@ 2017-08-16 13:56                         ` Austin S. Hemmelgarn
  2017-08-16 14:01                         ` Qu Wenruo
  2017-08-16 16:44                         ` Peter Grandi
  3 siblings, 0 replies; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-16 13:56 UTC (permalink / raw)
  To: Chris Mason, Btrfs BTRFS; +Cc: Christoph Anton Mitterer

On 2017-08-16 09:12, Chris Mason wrote:
> My real goal is to make COW fast enough that we can leave it on for the 
> database applications too.  Obviously I haven't quite finished that one 
> yet ;) But I'd rather keep the building block of all the other btrfs 
> features in place than try to do crcs differently.
In general, the performance issue isn't because of the time it takes to 
CoW the blocks, it's because of the fragmentation it introduces.  That 
fragmentation could in theory be mitigated by making CoW happen at a 
larger chunk size, but that would push the issue more towards being one 
of CoW performance, not fragmentation.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-16 13:12                       ` Chris Mason
  2017-08-16 13:31                         ` Christoph Anton Mitterer
  2017-08-16 13:56                         ` Austin S. Hemmelgarn
@ 2017-08-16 14:01                         ` Qu Wenruo
  2017-08-16 19:52                           ` Chris Murphy
  2017-08-16 16:44                         ` Peter Grandi
  3 siblings, 1 reply; 63+ messages in thread
From: Qu Wenruo @ 2017-08-16 14:01 UTC (permalink / raw)
  To: Chris Mason, Christoph Anton Mitterer, Austin S. Hemmelgarn, Btrfs BTRFS



On 2017年08月16日 21:12, Chris Mason wrote:
> On Mon, Aug 14, 2017 at 09:54:48PM +0200, Christoph Anton Mitterer wrote:
>> On Mon, 2017-08-14 at 11:53 -0400, Austin S. Hemmelgarn wrote:
>>> Quite a few applications actually _do_ have some degree of secondary
>>> verification or protection from a crash.  Go look at almost any
>>> database
>>> software.
>> Then please give proper references for this!
>>
>> This is from 2015, where you claimed this already and I looked up all
>> the bigger DBs and they either couldn't do it at all, didn't do it by
>> default, or it required application support (i.e. from the programs
>> using the DB)
>> https://www.spinics.net/lists/linux-btrfs/msg50258.html
>>
>>
>>> It usually will not have checksumming, but it will almost
>>> always have support for a journal, which is enough to cover the
>>> particular data loss scenario we're talking about (unexpected
>>> unclean
>>> shutdown).
>>
>> I don't think that's what we're talking about:
>> we're talking about people wanting checksumming to notice e.g. silent
>> data corruption.
>>
>> The crash case is only the corner case of what happens if data is
>> written correctly but the csums are not.
> 
> We use the crcs to catch storage gone wrong, both in terms of simple 
> things like cabling, bus errors, drives gone crazy or exotic problems 
> like every time I reboot the box a handful of sectors return EFI 
> partition table headers instead of the data I wrote.  You don't need 
> data center scale for this to happen, but it does help...
> 
> So, we do catch crc errors in prod and they do keep us from replicating 
> bad data over good data.  Some databases also crc, and all drives have 
> correction bits of some kind.  There's nothing wrong with crcs 
> happening at lots of layers.
> 
> Btrfs couples the crcs with COW because it's the least complicated way 
> to protect against:
> 
> * bits flipping
> * IO getting lost on the way to the drive, leaving stale but valid data 
> in place
> * IO from sector A going to sector B instead, overwriting valid data 
> with other valid data.
> 
> It's possible to protect against all three without COW, but all 
> solutions have their own tradeoffs and this is the setup we chose.  It's 
> easy to trust and easy to debug and at scale that really helps.
> 
> In general, production storage environments prefer clearly defined 
> errors when the storage has the wrong data.  EIOs happen often, and you 
> want to be able to quickly pitch the bad data and replicate in good data.

Btrfs csum is really good, especially for cases like RAID1/5/6 where csum 
can provide extra info about which mirror/stripe/parity can be trusted, 
with minimal space wasted.

The DM layer should really have the ability to verify its data at read 
time, like btrfs does.
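A sketch of that recovery logic in illustrative Python: with a trusted
csum the filesystem can pick the mirror that matches and rewrite the
other, while plain RAID1 only knows the copies differ, not which one is
right.

    import zlib

    def read_with_repair(mirrors, expected_crc):
        # mirrors: candidate copies of the same block, e.g. RAID1 legs
        for copy in mirrors:
            if zlib.crc32(copy) == expected_crc:
                for i in range(len(mirrors)):  # rewrite the bad copies
                    mirrors[i] = copy
                return copy
        raise IOError("all copies bad (EIO)")

    good = b"payload"
    mirrors = [b"payl0ad", good]               # leg 0 silently corrupted
    assert read_with_repair(mirrors, zlib.crc32(good)) == good
    assert mirrors[0] == good                  # bad leg repaired in place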

> 
> My real goal is to make COW fast enough that we can leave it on for the 
> database applications too.

Yes, most of the complexity of nodatasum/nodatacow comes from those 
special workloads.

BTW, when Fujitsu tested the postgresql workload on btrfs, the results 
were quite interesting.

For HDD, when the number of clients is low, btrfs shows an obvious 
performance drop.
The problem seems to be mandatory metadata COW, which leads to 
superblock FUA updates.
When the number of clients grows, the difference between btrfs and other 
fses gets much smaller; the bottleneck is the HDD itself.

While for SSD, when the number of clients is low, btrfs performs almost 
the same as other fses; nodatacow/nodatasum provides only a marginal 
difference.
But when the number of clients grows, btrfs falls far behind other fses.
The reason seems to be related to how postgresql commits its 
transactions: it always fsyncs its journal sequentially, without 
concurrency.
Since Btrfs needs to wait for its data writes before updating its log 
tree, most of its time is wasted waiting on data IO.
In that case, nodatacow does improve the performance, by allowing btrfs 
to update its log tree without waiting on data IO.

But in both cases, CoW itself, like allocating new extents or 
calculating csums, is not the main cause of the slowdown.
That is to say, nodatacow is not as important as we used to think.

If we can get rid of nodatacow/nodatasum, there will be much less for 
us developers to consider, and fewer related bugs.

Thanks,
Qu

>  Obviously I haven't quite finished that one 
> yet ;) But I'd rather keep the building block of all the other btrfs 
> features in place than try to do crcs differently.
> 
> -chris
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-16 13:53                           ` Austin S. Hemmelgarn
@ 2017-08-16 14:11                             ` Christoph Anton Mitterer
  2017-08-16 15:07                               ` Austin S. Hemmelgarn
  2017-08-16 18:19                             ` David Sterba
  1 sibling, 1 reply; 63+ messages in thread
From: Christoph Anton Mitterer @ 2017-08-16 14:11 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1761 bytes --]

On Wed, 2017-08-16 at 09:53 -0400, Austin S. Hemmelgarn wrote:
> Go try BTRFS on top of dm-integrity, or on a 
> system with T10-DIF or T13-EPP support

When dm-integrity is used... would that be enough for btrfs to do a
proper repair in the RAID+nodatacow case? I assume it can't do repairs
now there, because how should it know which copy is valid.


>  (which you should have access to 
> given the amount of funding CERN gets)
Hehe, CERN may get that funding (I don't know),... but the universities
rather don't ;-)


> Except it isn't clear with nodatacow, because it might be a false
> positive.

Sure, never claimed the opposite... just that I'd expect this to be
less likely than the other way round, and less of a problem in
practice.



> SUSE is a pathological case of brain-dead defaults.  Snapper needs to 
> either die or have some serious sense beat into it.  When you turn
> off 
> the automatic snapshot generation for everything but updates and set
> the 
> retention policy to not keep almost everything, it's actually not bad
> at 
> all.

Well, still, with CoW (unless you have some form of deduplication,
which in e.g. their use case would have to be on the layers below
btrfs), your storage usage will probably grow more significantly than
without.

And as you've mentioned yourself in the other mail, there's still the
issue with fragmentation.


> Snapshots work fine with nodatacow, each block gets CoW'ed once when 
> it's first written to, and then goes back to being NOCOW.  The only 
> caveat is that you probably want to defrag either once everything
> has 
> been rewritten, or right after the snapshot.

I thought defrag would unshare the reflinks?

Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5930 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-16 14:11                             ` Christoph Anton Mitterer
@ 2017-08-16 15:07                               ` Austin S. Hemmelgarn
  2017-08-16 17:26                                 ` Peter Grandi
  0 siblings, 1 reply; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-16 15:07 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: Btrfs BTRFS

On 2017-08-16 10:11, Christoph Anton Mitterer wrote:
> On Wed, 2017-08-16 at 09:53 -0400, Austin S. Hemmelgarn wrote:
>> Go try BTRFS on top of dm-integrity, or on a
>> system with T10-DIF or T13-EPP support
> 
> When dm-integrity is used... would that be enough for btrfs to do a
> proper repair in the RAID+nodatacow case? I assume it can't do repairs
> now there, because how should it know which copy is valid.
dm-integrity is functionally a 1:1 mapping target (it uses a secondary 
device for storing the integrity info, but it requires one table per 
target).  It takes one backing device, and gives one mapped device.  The 
setup I'm suggesting would involve putting that on each device that you 
have BTRFS configured to use.  When the checksum there fails, you get a 
read error (AFAIK at least), which will trigger the regular BTRFS 
recovery code just like a failed checksum.  So in this case, it should 
recover just fine if one copy is bogus (assuming it's a media issue and 
not something between the block device and the filesystem).

In all honesty, putting BTRFS on dm-integrity is going to be slow.  If 
you can find some T10 DIF or T13 EPP hardware, that will almost 
certainly be faster.
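To make the stacking concrete, a toy Python model (the real dm-integrity
is of course far more involved): a 1:1 layer keeps a per-sector tag and
turns a tag mismatch into a read error, which the filesystem above
treats exactly like a failed csum and satisfies from the other mirror.

    import zlib

    class IntegrityDev:
        # toy 1:1 mapping: data plus a per-sector tag on the backing store
        def __init__(self):
            self.sectors = {}   # lba -> (data, tag)

        def write(self, lba, data):
            self.sectors[lba] = (data, zlib.crc32(data))

        def read(self, lba):
            data, tag = self.sectors[lba]
            if zlib.crc32(data) != tag:
                raise IOError("integrity tag mismatch (EIO)")
            return data

    dev_a, dev_b = IntegrityDev(), IntegrityDev()
    for d in (dev_a, dev_b):
        d.write(0, b"block")
    dev_a.sectors[0] = (b"blocK", dev_a.sectors[0][1])  # silent flip, leg A

    def fs_read(lba):
        # the fs above sees plain block devices; an EIO on one leg
        # simply triggers recovery from the mirror
        for leg in (dev_a, dev_b):
            try:
                return leg.read(lba)
            except IOError:
                continue
        raise IOError("unrecoverable")

    assert fs_read(0) == b"block"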
> 
> 
>>   (which you should have access to
>> given the amount of funding CERN gets)
> Hehe, CERN may get that funding (I don't know),... but the universities
> rather don't ;-)
Point taken, I often forget that funding isn't exactly distributed in 
the most obvious ways.
> 
> 
>> Except it isn't clear with nodatacow, because it might be a false
>> positive.
> 
> Sure, never claimed the opposite... just that I'd expect this to be
> less likely than the other way round, and less of a problem in
> practice.
Any number of hardware failures or errors can cause the same net effect 
as an unclean shutdown, and even some much more complicated issues (a 
loose data cable to a storage device is probably one of the best 
examples, as it's trivial to explain and not as rare as most people think).
> 
> 
> 
>> SUSE is a pathological case of brain-dead defaults.  Snapper needs to
>> either die or have some serious sense beat into it.  When you turn
>> off
>> the automatic snapshot generation for everything but updates and set
>> the
>> retention policy to not keep almost everything, it's actually not bad
>> at
>> all.
> 
> Well, still, with CoW (unless you have some form of deduplication,
> which in e.g. their use case would have to be on the layers below
> btrfs), your storage usage will probably grow more significantly than
> without.
Yes, and for most VM use cases I would advocate not using BTRFS 
snapshots inside the VM and instead using snapshot functionality in the 
VM software itself.  That still has performance issues in some cases, 
but at least it's easier to see where the data is actually being used.
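
For instance, at the VM layer that might be (a sketch with qemu-img on
a qcow2 image; the image name and snapshot tag are hypothetical):

  qemu-img snapshot -c before-upgrade guest.qcow2   # create a snapshot
  qemu-img snapshot -l guest.qcow2                  # list snapshots
  qemu-img snapshot -a before-upgrade guest.qcow2   # roll back to it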
> 
> And as you've mentioned yourself in the other mail, there's still the
> issue with fragmentation.
> 
> 
>> Snapshots work fine with nodatacow, each block gets CoW'ed once when
>> it's first written to, and then goes back to being NOCOW.  The only
>> caveat is that you probably want to defrag either once everything
>> has
>> been rewritten, or right after the snapshot.
> 
> I thought defrag would unshare the reflinks?
Which is exactly why you might want to do it.  It will get rid of the 
overhead of the single CoW operation, and it will make sure there is 
minimal fragmentation.  IOW, when mixing NOCOW and snapshots, you either 
have to use extra space, or you deal with performance issues.  Aside 
from that though, it works just fine and has no special issues as 
compared to snapshots without NOCOW.
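
Roughly, that workflow might look like this (a sketch only; it assumes
/srv is a BTRFS subvolume, the paths are hypothetical, and chattr +C
only affects files created after the attribute is set):

  mkdir /srv/images
  chattr +C /srv/images        # new files in here are created NOCOW
  # ... create VM/DB files, run the workload ...
  btrfs subvolume snapshot /srv /srv/.snap-1
  # once the post-snapshot CoW has settled, unshare and compact:
  btrfs filesystem defragment -r /srv/images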

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-16 13:12                       ` Chris Mason
                                           ` (2 preceding siblings ...)
  2017-08-16 14:01                         ` Qu Wenruo
@ 2017-08-16 16:44                         ` Peter Grandi
  3 siblings, 0 replies; 63+ messages in thread
From: Peter Grandi @ 2017-08-16 16:44 UTC (permalink / raw)
  To: Linux fs Btrfs

> We use the crcs to catch storage gone wrong, [ ... ]

And that's an opportunistically feasible idea given that current
CPUs can do that in real time.

> [ ... ] It's possible to protect against all three without COW,
> but all solutions have their own tradeoffs and this is the setup
> we chose. It's easy to trust and easy to debug and at scale that
> really helps.

Indeed all filesystem designs have pathological workloads, and
system administrators and application developers who are "more
prepared" know which one is best for which workload, or try to
figure it out.

> Some databases also crc, and all drives have correction bits of
> of some kind. There's nothing wrong with crcs happening at lots
> of layers.

Well, there is: in theory checksumming should be end-to-end, that
is entirely application level, so applications that don't need it
don't pay the price, but having it done at other layers can help
the very many applications that don't do it and should do it, and
it is cheap, and can help when troubleshooting exactly where the
problem is. It is an opportunistic thing to do.
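
As a trivial sketch of what "end-to-end" means at the application
level, with nothing but coreutils (file names hypothetical):

  sha256sum important.dat > important.dat.sha256   # record at write time
  sha256sum -c important.dat.sha256                # verify at read time

Checksums at the filesystem or device layers are then merely an
opportunistic diagnostic aid below that.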

> [ ... ] My real goal is to make COW fast enough that we can
> leave it on for the database applications too.  Obviously I
> haven't quite finished that one yet ;) [ ... ]

And this worries me because it portends the usual "marketing" goal
of making Btrfs all things to all workloads, the "OpenStack of
filesystems", with little consideration for complexity,
maintainability, or even sometimes reality.

The reality is that all known storage media have hugely
anisotropic performance envelopes, both as to functionality, cost,
speed, reliability, and there is no way to have an automagic
filesystem that "just works" in all cases, despite the constant
demands for one by "less prepared" storage administrators and
application developers. The reality is also that if one such
filesystem could automagically adapt to cover optimally the
performance envelopes of every possible device and workload, it
would be so complex as to be unmaintainable in practice.

So Btrfs, in its base "Rodeh" functionality, with COW, checksums,
subvolumes, snapshots, *on a single device*, works pretty well and
reliably and it is already very useful, for most workloads. Some
people also like some of its exotic complexities like in-place
compression and defragmentation, but they come at a high cost.

For workloads that inflict lots of small random in-place updates
on storage, like tablespaces for DBMSes etc, perhaps simpler less
featureful storage abstraction layers are more appropriate, from
OCFS2 to simple DM/LVM2 LVs, and Btrfs NOCOW approximates them
well.

BTW as to the specifics of DBMSes and filesystems, there is a
classic paper making eminently reasonable, practical suggestions
that have been ignored for only 35 years and some:

  %A M. R. Stonebraker
  %T Operating system support for database management
  %J CACM
  %V 24
  %D JUL 1981
  %P 412-418

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-16 13:31                         ` Christoph Anton Mitterer
  2017-08-16 13:53                           ` Austin S. Hemmelgarn
@ 2017-08-16 16:54                           ` Peter Grandi
  1 sibling, 0 replies; 63+ messages in thread
From: Peter Grandi @ 2017-08-16 16:54 UTC (permalink / raw)
  To: Linux fs Btrfs

[ ... ]

> But I've talked to some friend at the local super computing
> centre and they have rather general issues with CoW at their
> virtualisation cluster.

Amazing news! :-)

> Like SUSE's snapper making many snapshots leading the storage
> images of VMs apparently to explode (in terms of space usage).

Well, this could be an argument that some of your friends are being
"challenged" by running the storage systems of a "super computing
centre" and that they could become "more prepared" about system
administration, for example as to the principle "know which tool to
use for which workload". Or else it could be an argument that they
expect Btrfs to do their job while they watch cat videos from the
intertubes. :-)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-16 15:07                               ` Austin S. Hemmelgarn
@ 2017-08-16 17:26                                 ` Peter Grandi
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Grandi @ 2017-08-16 17:26 UTC (permalink / raw)
  To: Linux fs Btrfs

[ ... ]

>>> Snapshots work fine with nodatacow, each block gets CoW'ed
>>> once when it's first written to, and then goes back to being
>>> NOCOW.
>>> The only caveat is that you probably want to defrag either
>>> once everything has been rewritten, or right after the
>>> snapshot.

>> I thought defrag would unshare the reflinks?
 
> Which is exactly why you might want to do it. It will get rid
> of the overhead of the single CoW operation, and it will make
> sure there is minimal fragmentation.
> IOW, when mixing NOCOW and snapshots, you either have to use
> extra space, or you deal with performance issues. Aside from
> that though, it works just fine and has no special issues as
> compared to snapshots without NOCOW.

The above illustrates my guess as to why RHEL 7.4 dropped Btrfs
support, which is:

  * RHEL is sold to managers who want to minimize the cost of
    upgrades and sysadm skills.
  * Every time a customer creates a ticket, RH profits fall.
  * RH had adopted 'ext3' because it was an in-place upgrade
    from 'ext2' and "just worked", 'ext4' because it was an
    in-place upgrade from 'ext3' and was supposed to "just
    work", and then was looking at Btrfs as an in-place upgrade
    from 'ext4', and presumably also a replacement for MD RAID,
    that would "just work".
  * 'ext4' (and XFS before that) already created trouble a few
    years ago because of the 'O_PONIES' controversy.
  * Not only does Btrfs still have "challenges" as to multi-device
    functionality (and in-place upgrades from 'ext4' have
    "challenges" too), it also has many "special cases" that need
    skill and discretion to handle, because it tries to cover so
    many different cases, and the first thing many a RH customer
    would do is create a ticket asking what to do, or how to
    fix a choice already made.

Try to imagine the impact on the RH ticketing system of a switch
from 'ext4' to Btrfs, with explanations like the above, about
NOCOW, defrag, snapshots, balance, reflinks, and the exact order
in which they have to be performed for best results.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-03 18:08 ` waxhead
  2017-08-03 18:29   ` Christoph Anton Mitterer
  2017-08-03 19:03   ` Austin S. Hemmelgarn
@ 2017-08-16 18:07   ` David Sterba
  2 siblings, 0 replies; 63+ messages in thread
From: David Sterba @ 2017-08-16 18:07 UTC (permalink / raw)
  To: waxhead; +Cc: Brendan Hide, linux-btrfs

On Thu, Aug 03, 2017 at 08:08:59PM +0200, waxhead wrote:
> BTRFS's biggest problem is not that there are some bits and pieces that 
> are thoroughly screwed up (raid5/6, which just got some fixes by the 
> way), but the fact that the documentation is rather dated.
> 
> There is a simple status page here 
> https://btrfs.wiki.kernel.org/index.php/Status
> 
> As others have pointed out already, the explanations on the status page 
> are not exactly good. For example compression (which was also mentioned) 
> is, as of writing this, marked as 'Mostly ok'  '(needs verification and 
> source) - auto repair and compression may crash'
> 
> Now, I am aware that many use compression without trouble. I am not sure 
> how many have compression with disk issues and don't have trouble, 
> but I would at least expect to see more people yelling on the mailing 
> list if that were the case. The problem here is that this message is 
> rather scary and certainly does NOT sound like 'mostly ok' to most people.
> 
> What exactly needs verification and a source? The 'mostly ok' statement, or 
> something else?! A more detailed explanation would be required here to 
> avoid scaring people away.
> 
> Same thing with the trim feature that is marked OK. It clearly says 
> that it has performance implications. It is marked OK so one would 
> expect it not to cause the filesystem to fail, but if performance 
> degrades so much that the filesystem becomes practically unusable it is of 
> course not "OK". The relevant information is missing for people to make 
> a decent choice, and I certainly don't know how serious these performance 
> implications are, if they are at all relevant...

I'll try to restructure the page so it reflects the status of the features
from more aspects, like overall/performance/"known bad scenarios". The
in-row notes are probably a bad idea as they are short on details; the
section under the table will be better for that.

> Most people interested in BTRFS are probably a bit more paranoid and 
> concerned about their data than the average computer user. What people 
> tend to forget is that other filesystems simply do NOT have the redundancy, 
> auto-repair and other fancy features that BTRFS has. So for the 
> compression example above... if you run compressed files on ext4 and 
> your disk gets some corruption, you are in no better a state than 
> you would be with btrfs (in fact probably worse). Also nothing is 
> stopping you from putting btrfs DUP on an mdadm raid5 or 6, which means 
> you should be VERY safe.
> 
> Simple documentation is the key so HERE ARE MY DEMANDS!!!..... ehhh.... 
> so here is what I think should be done:
> 
> 1. The documentation needs to either be improved (or old non-relevant 
> stuff simply removed / archived somewhere)

Agreed, this happens from time to time.

> 2. The status page MUST always be up to date for the latest kernel 
> release (It's ok so far , let's hope nobody sleeps here)

I'm watching over the page. It's been locked from edits so there's a
mandatory review of the new contents, the update process is documented
on the page.

> 3. Proper explanations must be given so the layman and reasonably 
> technical people understand the risks / issues for non-ok stuff.

This can be hard; the audience includes both technical and non-technical
users. The page is supposed to give a quick overview, and the more detailed
information is either in the notes or on separate pages linked from
there. I believe this structure should be able to cover what you need,
but the actual content hasn't been written and there are not enough
people willing/capable of writing it.

> 4. There should be links to roadmaps for each feature on the status page 
> that clearly state what is being worked on for the NEXT kernel release

We've tried something like that in the past, the page got out of sync
with reality over time and was deleted.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-16 13:53                           ` Austin S. Hemmelgarn
  2017-08-16 14:11                             ` Christoph Anton Mitterer
@ 2017-08-16 18:19                             ` David Sterba
  1 sibling, 0 replies; 63+ messages in thread
From: David Sterba @ 2017-08-16 18:19 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Christoph Anton Mitterer, Chris Mason, Btrfs BTRFS

On Wed, Aug 16, 2017 at 09:53:57AM -0400, Austin S. Hemmelgarn wrote:
> > So apart from some central DBs for the storage management system
> > itself, CoW is mostly no issue for us.
> > But I've talked to some friend at the local super computing centre and
> > they have rather general issues with CoW at their virtualisation
> > cluster.
> > Like SUSE's snapper making many snapshots leading the storage images of
> > VMs apparently to explode (in terms of space usage).
> SUSE is a pathological case of brain-dead defaults.  Snapper needs to 
> either die or have some serious sense beat into it.  When you turn off 
> the automatic snapshot generation for everything but updates and set the 
> retention policy to not keep almost everything, it's actually not bad at 
> all.

The defaults for timeline are really bad; the partition is almost never
big enough to hold 10 months' worth of data updates, let alone 10 years.
A rolling distro can fill the space even with the daily or weekly
settings set to low numbers. But certain people had a different opinion
and I was not successful in changing that. The least I could do was document
some of the use cases and the hints that could allow one to have a bit
more understanding of the effects.

https://github.com/kdave/btrfsmaintenance#tuning-periodic-snapshotting
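
For instance, a much less aggressive timeline retention might look like
this (a sketch of /etc/snapper/configs/root; the keys are snapper's
timeline-cleanup settings, the values purely illustrative):

  TIMELINE_CREATE="yes"
  TIMELINE_LIMIT_HOURLY="2"
  TIMELINE_LIMIT_DAILY="7"
  TIMELINE_LIMIT_WEEKLY="0"
  TIMELINE_LIMIT_MONTHLY="0"
  TIMELINE_LIMIT_YEARLY="0"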

> > For some of their storage backends there simply seems to be no de-
> > duplication available (or other reasons that prevent its usage).
> If the snapshots are being CoW'ed, then dedupe won't save them any 
> space.  Also, nodatacow is inherently at odds with reflinks used for dedupe.
> > 
> >  From that I'd guess there would still be people who want the nice
> > features of btrfs (snapshots, checksumming, etc.), while still being
> > able to nodatacow in specific cases.
> Snapshots work fine with nodatacow, each block gets CoW'ed once when 
> it's first written to, and then goes back to being NOCOW.  The only 
> caveat is that you probably want to defrag either once everything has 
> been rewritten, or right after the snapshot.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-16 14:01                         ` Qu Wenruo
@ 2017-08-16 19:52                           ` Chris Murphy
  2017-08-17  6:25                             ` GWB
  0 siblings, 1 reply; 63+ messages in thread
From: Chris Murphy @ 2017-08-16 19:52 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Chris Mason, Christoph Anton Mitterer, Austin S. Hemmelgarn, Btrfs BTRFS

On Wed, Aug 16, 2017 at 8:01 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

> BTW, when Fujitsu tested the postgresql workload on btrfs, the result is
> quite interesting.
>
> For HDD, when the number of clients is low, btrfs shows an obvious
> performance drop.
> And the problem seems to be mandatory metadata COW, which leads to
> superblock FUA updates.
> And when the number of clients grows, the difference between btrfs and
> other fses gets much smaller; the bottleneck is the HDD itself.
>
> While for SSD, when the number of clients is low, btrfs performs almost
> the same as other fses; nodatacow/nodatasum only makes a marginal
> difference.
> But when the number of clients grows, btrfs falls far behind other fses.
> The reason seems to be related to how postgresql commits its transactions,
> always fsyncing its journal sequentially without concurrency.


I wonder to what degree fsync is used as a hammer for a problem that
needs more granular indicators to solve, like fadvise() and even
extensions of it?

But I'm also curious how the behaviors you report above change when
combining SSD and HDD via either dm-cache or bcache. Do the worst
aspects of SSD and HDD get muted in that case? Or do the worst aspects
become even worse across the board?


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-16 19:52                           ` Chris Murphy
@ 2017-08-17  6:25                             ` GWB
  2017-08-17 11:47                               ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 63+ messages in thread
From: GWB @ 2017-08-17  6:25 UTC (permalink / raw)
  To: Peter Grandi, Linux fs Btrfs

<<
Or else it could be an argument that they
expect Btrfs to do their job while they watch cat videos from the
intertubes. :-)
>>

My favourite quote from the list this week, and, well, obviously, that
is the main selling point of file systems like btrfs, zfs, and various
other lvm and raid set ups.  The need to free up time to watch cat
videos on the intertubes (whilst at work) has driven most
technological innovations, going back at least to the time of the
Roman Empire.

So, sure, I'll be happy to admit that I like it very much when a file
system or some other software or hardware component makes my job
easier (which gives me more time to watch cat videos).  But if hours
on hours of cat videos have taught me one thing, it is that
catastrophe (pun intended) awaits those who assume that btrfs (or zfs
or nilfs or whatever) will magically work well in all use cases.

That may be what their customers assumed about btrfs, but did Red Hat
make that claim implicitly or explicitly?  I don't know, but it seems
unlikely, and all the things mentioned in this thread make sense to
me.  It looks like Red Hat is pushing "GFS" (Red Hat Global File
System) for its clustered file system:

https://www.redhat.com/whitepapers/rha/gfs/GFS_INS0032US.pdf

XFS is now the standard "on disk" fs for Red Hat, but I can't tell if
XFS is the DMU (backing file system or Data Management Unit) for GFS
(zfs is the dmu for lustre).  Probably, but why does GFS still have a
file size limit of 100TB, while XFS has a 500TB limit, according to
Red Hat?

https://access.redhat.com/articles/rhel-limits

And btrfs is gone from that list.

So does this mean that Red Hat deprecating btrfs will have a tangible
effect on its development, future improvements, and adoption?  It
doesn't help, but maybe it's not too bad.  From reading the list, my
impression is that the typical Red Hat customer with large data arrays
might do fine running xfs over lvm2 over hardware raid (or at least
the customers who are paying attention to the monitor stats between
cat videos).  That's not for me, because I prefer mirrors, not
stripes, and "hot spares" that I can pull out of the enclosure, place
in another machine, and get running again (which points me back to
btrfs and zfs).  But it must work great for a lot of data silos.

On the plus side, btrfs is one of the backing file systems in ceph; on
the minus side, with Red Hat out, btrfs might lose some developers and
support:

http://www.h-online.com/open/features/Kernel-Log-Coming-in-2-6-37-Part-2-File-systems-1148305.html%3Fpage=2

As long as FaceBook keeps using btrfs, I wouldn't worry too much about
large firm adoption.  Chris (from facebook, post above) points out
that Facebook runs both xfs and btrfs as backing file systems for
Gluster:

https://www.linux.com/news/learn/intro-to-linux/how-facebook-uses-linux-and-btrfs-interview-chris-mason

And Gluster is... owned by Red Hat (since 2011), which now advertises
its "Red Hat Global File System", which would be... Gluster?  Chris,
is that right?  So Facebook runs Gluster (which might be Red Hat
Global File System) with both xfs and btrfs as the backing fs, and Red
Hat... advertises Red Hat GFS as a platform for Oracle RAC Database
Clustering.  But not (presumably) running with btrfs as the backing
fs, but rather xfs.  So could one Gluster "grid" run over two file
systems, xfs for the applications, and btrfs for the primary data
storage?

So Oracle still supports btrfs.  Facebook still uses it.  And it would
be very funny if Red Hat GFS does use btrfs (eventually, at some point
in the future) as the backing fs, but their customers probably won't
notice the difference.

I'm not too worried.  I'll keep using btrfs as it is now, within the
limits of what it can consistently do, and do what I can to help
support the effort.  I'm not a file system coder, but I very much
appreciate the enormous amount of work that goes into btrfs.

Steady on, ButterFS people.  Back now to cat videos.

Gordon

On Aug 16, 2017 at 11:54 AM, Peter Grandi <pg@btrfs.list.sabi.co.uk> wrote:
> [ ... ]
>
>> But I've talked to some friend at the local super computing
>> centre and they have rather general issues with CoW at their
>> virtualisation cluster.
>
> Amazing news! :-)
>
>> Like SUSE's snapper making many snapshots leading the storage
>> images of VMs apparently to explode (in terms of space usage).
>
> Well, this could be an argument that some of your friends are being
> "challenged" by running the storage systems of a "super computing
> centre" and that they could become "more prepared" about system
> administration, for example as to the principle "know which tool to
> use for which workload". Or else it could be an argument that they
> expect Btrfs to do their job while they watch cat videos from the
> intertubes. :-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-17  6:25                             ` GWB
@ 2017-08-17 11:47                               ` Austin S. Hemmelgarn
  2017-08-17 19:00                                 ` Chris Murphy
  0 siblings, 1 reply; 63+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-17 11:47 UTC (permalink / raw)
  To: GWB, Peter Grandi, Linux fs Btrfs

On 2017-08-17 02:25, GWB wrote:
> <<
> Or else it could be an argument that they
> expect Btrfs to do their job while they watch cat videos from the
> intertubes. :-)
> >>
> 
> My favourite quote from the list this week, and, well, obviously, that
> is the main selling point of file systems like btrfs, zfs, and various
> other lvm and raid setups.  The need to free up time to watch cat
> videos on the intertubes (whilst at work) has driven most
> technological innovations, going back at least to the time of the
> Roman Empire.
> 
> So, sure, I'll be happy to admit that I like it very much when a file
> system or some other software or hardware component makes my job
> easier (which gives me more time to watch cat videos).  But if hours
> on hours of cat videos have taught me one thing, it is that
> catastrophe (pun intended) awaits those who assume that btrfs (or zfs
> or nilfs or whatever) will magically work well in all use cases.
Agreed, and I will comment that far more catastrophes are caused 
by sysadmins' complacency or not properly understanding what they're 
administering than by almost anything else.
> 
> That may be what their customers assumed about btrfs, but did Red Hat
> make that claim implicitly or explicitly?  I don't know, but it seems
> unlikely, and all the things mentioned in this thread make sense to
> me.  It looks like Red Hat is pushing "GFS" (Red Hat Global File
> System) for its clustered file system:
> 
> https://www.redhat.com/whitepapers/rha/gfs/GFS_INS0032US.pdf
Huh, I could have sworn they were pushing Gluster...
> 
> XFS is now the standard "on disk" fs for Red Hat, but I can't tell if
> XFS is the DMU (backing file system or Data Management Unit) for GFS
> (zfs is the dmu for lustre).  Probably, but why does GFS still have a
> file size limit of 100TB, while XFS has a 500TB limit, according to
> Red Hat?
GFS2 (which is what I think they're talking about) has its own on-disk 
format, and actually works as a single-node filesystem.  It's a lot 
closer to OCFS2 in terms of design than it is to Lustre, though I'm not 
sure if it needs shared storage or not.

Also, both of those file size 'limits' are customer support limits from 
what I can tell.  XFS supports files (in theory at least) up to 8 EB 
minus one byte, and I'm not able to find any other documentation on this 
regarding GFS2, but I seriously doubt that it has a 100TB file size limit.
> 
> https://access.redhat.com/articles/rhel-limits
> 
> And btrfs is gone from that list.
> 
> So does this mean that Red Hat deprecating btrfs have a tangible
> effect on its development, future improvements, and adoption?  It
> doesn't help, but maybe its not too bad.  From reading the list, my
> impression is that the typical Red Hat customer with large data arrays
> might do fine running xfs over lvm2 over hardware raid (or at least
> the customers who are paying attention to the monitor stats between
> cat videos).  That's not for me, because I prefer mirrors, not
> stripes, and "hot spares" that I can pull out of the enclosure, place
> in another machine, and get running again (which points me back to
> btrfs and zfs).  But it must work great for a lot of data silos.
> 
> On the plus side, btrfs is one of the backing file systems in ceph; on
> the minus side, with Red Hat out, btrfs might lose some developers and
> support:
> 
> http://www.h-online.com/open/features/Kernel-Log-Coming-in-2-6-37-Part-2-File-systems-1148305.html%3Fpage=2
I'm pretty certain that Ceph has officially stopped recommending BTRFS 
as a backend filesystem.  TBH, it was never that amazing of an idea to 
begin with: Ceph does a lot of the same things that BTRFS does, so 
you're replicating a not insignificant amount of work, and the big thing 
was really snapshot support anyway.

Also, I don't think I've ever seen any patches posted from a Red Hat 
address on the ML, so I don't think they were really all that involved 
in development to begin with.
> 
> As long as FaceBook keeps using btrfs, I wouldn't worry too much about
> large firm adoption.  Chris (from facebook, post above) points out
> that Facebook runs both xfs and btrfs as backing file systems for
> Gluster:
> 
> https://www.linux.com/news/learn/intro-to-linux/how-facebook-uses-linux-and-btrfs-interview-chris-mason
> 
> And Gluster is... owned by Red Hat (since 2011), which now advertises
> its "Red Hat Global File System", which would be... Gluster?  Chris,
> is that right?  So Facebook runs Gluster (which might be Red Hat
> Global File System) with both xfs and btrfs as the backing fs, and Red
> Hat... advertises Red Hat GFS as a platform for Oracle RAC Database
> Clustering.  But not (presumably) running with btrfs as the backing
> fs, but rather xfs.  So could one Gluster "grid" run over two file
> systems, xfs for the applications, and btrfs for the primary data
> storage?
GFS and GlusterFS are different technologies, unless Red Hat's marketing 
department is trying to be actively deceptive.

GFS is a traditional cluster filesystem which requires fencing hardware 
and has its own on-disk format.  It originated on IRIX, got ported to 
Linux, got updated to GFS2 to add splice() support and a few other 
things, and hasn't seen much development from what I can tell since that 
happened in 2009 (at least, not much beyond standard bug fixes and 
maintenance).

GlusterFS is a more modern cluster filesystem design, uses separate 
backing storage (like Lustre and Ceph do), has the rather nice advantage 
that the layout on the back-end storage exactly replicates the layout in 
the GlusterFS volume (assuming you're just using replication), and 
doesn't require any special hardware.  It also runs reasonably well on 
top of BTRFS, other than some scalability issues with directories with 
thousands of files in them (both Gluster and BTRFS have issues there, 
and they compound when used in a stack like this).  It doesn't directly 
use any special functionality of BTRFS, although it in theory could make 
use of the snapshotting functionality (the current snapshot support in 
Gluster assumes the use of a backing FS that supports freezefs on top of 
LVM2).
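
(For reference, the freeze-plus-LVM pattern those Gluster snapshots
assume looks roughly like this; a sketch with hypothetical names:

  fsfreeze --freeze /bricks/brick1      # quiesce the brick filesystem
  lvcreate --snapshot --name brick1-snap --size 5G /dev/vg0/brick1
  fsfreeze --unfreeze /bricks/brick1    # resume writes

A native BTRFS snapshot could in theory replace the LVM step.)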
> 
> So Oracle still supports btrfs.  Facebook still uses it.  And it would
> be very funny if Red Hat GFS does use btrfs (eventually, at some point
> in the future) as the backing fs, but their customers probably won't
> notice the difference.
SUSE is also pretty actively involved in the development, and I 
think Fujitsu is as well.
> 
> I'm not too worried.  I'll keep using btrfs as it is now, within the
> limits of what it can consistently do, and do what I can to help
> support the effort.  I'm not a file system coder, but I very much
> appreciate the enormous amount of work that goes into btrfs.
> 
> Steady on, ButterFS people.  Back now to cat videos.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-17 11:47                               ` Austin S. Hemmelgarn
@ 2017-08-17 19:00                                 ` Chris Murphy
  2017-08-17 20:34                                   ` GWB
  0 siblings, 1 reply; 63+ messages in thread
From: Chris Murphy @ 2017-08-17 19:00 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: GWB, Peter Grandi, Linux fs Btrfs

On Thu, Aug 17, 2017 at 5:47 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

> Also, I don't think I've ever seen any patches posted from a Red Hat address
> on the ML, so I don't think they were really all that involved in
> development to begin with.

Unfortunately the email domain doesn't tell the whole story of who's
backing development, the company or the individual.

[chris@f26s linux]$ git log --since="2016-01-01" --pretty=format:"%an %ae" \
    --no-merges -- fs/btrfs | sort -u | grep redhat
Andreas Gruenbacher agruenba@redhat.com
David Howells dhowells@redhat.com
Eric Sandeen sandeen@redhat.com
Jeff Layton jlayton@redhat.com
Mike Christie mchristi@redhat.com
Miklos Szeredi mszeredi@redhat.com
$



> GFS and GlusterFS are different technologies, unless Red Hat's marketing
> department is trying to be actively deceptive.

https://www.redhat.com/en/technologies/storage

Seems very clear. I don't even see GFS or GFS2 on here. It's Gluster and Ceph.


>
> SUSE is also pretty actively involved in the development, and I think
> Fujitsu is as well.



>>
>>
>> I'm not too worried.  I'll keep using btrfs as it is now, within the
>> limits of what it can consistently do, and do what I can to help
>> support the effort.  I'm not a file system coder, but I very much
>> appreciate the enormous amount of work that goes into btrfs.
>>
>> Steady on, ButterFS people.  Back now to cat videos.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Big bunch of SUSE contributions (yes David Sterba is counted three
times here), and Fujitsu.

[chris@f26s linux]$ git log --since="2016-01-01" --pretty=format:"%an %ae" \
    --no-merges -- fs/btrfs | sort -u | grep suse
Borislav Petkov bp@suse.de
David Sterba dsterba@suse.com
David Sterba DSterba@suse.com
David Sterba dsterba@suse.cz
Edmund Nadolski enadolski@suse.com
Filipe Manana fdmanana@suse.com
Goldwyn Rodrigues rgoldwyn@suse.com
Guoqing Jiang gqjiang@suse.com
Jan Kara jack@suse.cz
Jeff Mahoney jeffm@suse.com
Jiri Kosina jkosina@suse.cz
Mark Fasheh mfasheh@suse.de
Michal Hocko mhocko@suse.com
NeilBrown neilb@suse.com
Nikolay Borisov nborisov@suse.com
Petr Mladek pmladek@suse.com

[chris@f26s linux]$ git log --since="2016-01-01" --pretty=format:"%an %ae" \
    --no-merges -- fs/btrfs | sort -u | grep fujitsu
Lu Fengqi lufq.fnst@cn.fujitsu.com
Qu Wenruo quwenruo@cn.fujitsu.com
Satoru Takeuchi takeuchi_satoru@jp.fujitsu.com
Su Yue suy.fnst@cn.fujitsu.com
Tsutomu Itoh t-itoh@jp.fujitsu.com
Wang Xiaoguang wangxg.fnst@cn.fujitsu.com
Xiaoguang Wang wangxg.fnst@cn.fujitsu.com
Zhao Lei zhaolei@cn.fujitsu.com


Over the past 18 months, it's about 100 Btrfs contributors, 71 ext4,
63 XFS. So all three have many contributors. That of course does not
tell the whole story by any means.
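
(For the record, the totals presumably come from the same pattern as
above, just counted, e.g.:

  git log --since="2016-01-01" --pretty=format:"%an %ae" \
      --no-merges -- fs/btrfs | sort -u | wc -l

and likewise for fs/ext4 and fs/xfs.)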

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
  2017-08-17 19:00                                 ` Chris Murphy
@ 2017-08-17 20:34                                   ` GWB
  0 siblings, 0 replies; 63+ messages in thread
From: GWB @ 2017-08-17 20:34 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Austin S. Hemmelgarn, Peter Grandi, Linux fs Btrfs

Yep, and thank you to Suse, Fujitsu, and all the contributors.

I suppose we can all be charitable when reading this from the Red Hat
Whitepaper at:

https://www.redhat.com/whitepapers/rha/gfs/GFS_INS0032US.pdf

<<
 Red Hat GFS is the world’s leading cluster file system for Linux.
>>

If that is GFS2, it is a different use case than Gluster
(https://www.redhat.com/en/technologies/storage).  So perhaps
marketing might tweak that a little bit, maybe:

<<
 Red Hat GFS is the world’s leading cluster file system for Linux for
Oracle RAC Database Clustering.
>>

But you can see how Oracle might quibble with that.  So Red Hat goes
as far as it can in the Whitepaper:

<<
 Red Hat GFS simplifies the installation, configuration, and on-going
maintenance of the SAN infrastructure necessary for Oracle RAC
clustering. Oracle tables, log files, program files, and archive
information can all be stored in GFS files, avoiding the complexity
and difficulties of managing raw storage devices on a SAN while
achieving excellent performance.
>>

Which avoids a comparison between, say, an Oracle SPARC server
(probably made by Fujitsu) hosting Oracle RAC clusters on Solaris.
Given the price of Oracle's sparc servers, Red Hat may be as good as
an Oracle RAC DB server can get for a price less than the annual
budget of a small country.

Well, great news, Austin and Chris, that clears it up for me, and now
I know of yet another use case for btrfs as the dmu for Gluster.  So,
again, I'm not too worried about Red Hat deprecating btrfs, given the
number of supporters and developers.  If Oracle or Suse drops out,
then I would worry.

Gordon

On Thu, Aug 17, 2017 at 2:00 PM, Chris Murphy <lists@colorremedies.com> wrote:
> On Thu, Aug 17, 2017 at 5:47 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>
>> Also, I don't think I've ever seen any patches posted from a Red Hat address
>> on the ML, so I don't think they were really all that involved in
>> development to begin with.
>
> Unfortunately the email domain doesn't tell the whole story of who's
> backing development, the company or the individual.
>
> [chris@f26s linux]$ git log --since="2016-01-01" --pretty=format:"%an %ae" \
>     --no-merges -- fs/btrfs | sort -u | grep redhat
> Andreas Gruenbacher agruenba@redhat.com
> David Howells dhowells@redhat.com
> Eric Sandeen sandeen@redhat.com
> Jeff Layton jlayton@redhat.com
> Mike Christie mchristi@redhat.com
> Miklos Szeredi mszeredi@redhat.com
> $
>
>
>
>> GFS and GlusterFS are different technologies, unless Red Hat's marketing
>> department is trying to be actively deceptive.
>
> https://www.redhat.com/en/technologies/storage
>
> Seems very clear. I don't even see GFS or GFS2 on here. It's Gluster and Ceph.
>
>
>>
>> SUSE is also pretty actively involved in the development, and I think
>> Fujitsu is as well.
>
>
>
>>>
>>>
>>> I'm not too worried.  I'll keep using btrfs as it is now, within the
>>> limits of what it can consistently do, and do what I can to help
>>> support the effort.  I'm not a file system coder, but I very much
>>> appreciate the enormous amount of work that goes into btrfs.
>>>
>>> Steady on, ButterFS people.  Back now to cat videos.
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> Big bunch of SUSE contributions (yes David Sterba is counted three
> times here), and Fujitsu.
>
> [chris@f26s linux]$ git log --since="2016-01-01" --pretty=format:"%an %ae" \
>     --no-merges -- fs/btrfs | sort -u | grep suse
> Borislav Petkov bp@suse.de
> David Sterba dsterba@suse.com
> David Sterba DSterba@suse.com
> David Sterba dsterba@suse.cz
> Edmund Nadolski enadolski@suse.com
> Filipe Manana fdmanana@suse.com
> Goldwyn Rodrigues rgoldwyn@suse.com
> Guoqing Jiang gqjiang@suse.com
> Jan Kara jack@suse.cz
> Jeff Mahoney jeffm@suse.com
> Jiri Kosina jkosina@suse.cz
> Mark Fasheh mfasheh@suse.de
> Michal Hocko mhocko@suse.com
> NeilBrown neilb@suse.com
> Nikolay Borisov nborisov@suse.com
> Petr Mladek pmladek@suse.com
>
> [chris@f26s linux]$ git log --since="2016-01-01" --pretty=format:"%an %ae" \
>     --no-merges -- fs/btrfs | sort -u | grep fujitsu
> Lu Fengqi lufq.fnst@cn.fujitsu.com
> Qu Wenruo quwenruo@cn.fujitsu.com
> Satoru Takeuchi takeuchi_satoru@jp.fujitsu.com
> Su Yue suy.fnst@cn.fujitsu.com
> Tsutomu Itoh t-itoh@jp.fujitsu.com
> Wang Xiaoguang wangxg.fnst@cn.fujitsu.com
> Xiaoguang Wang wangxg.fnst@cn.fujitsu.com
> Zhao Lei zhaolei@cn.fujitsu.com
>
>
> Over the past 18 months, it's about 100 Btrfs contributors, 71 ext4,
> 63 XFS. So all three have many contributors. That of course does not
> tell the whole story by any means.
>
> --
> Chris Murphy

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2017-08-17 20:34 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-02  8:38 RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut? Brendan Hide
2017-08-02  9:11 ` Wang Shilong
2017-08-03 19:18   ` Chris Murphy
2017-08-02 11:25 ` Austin S. Hemmelgarn
2017-08-02 12:55   ` Lutz Vieweg
2017-08-02 13:47     ` Austin S. Hemmelgarn
2017-08-02 18:44 ` Chris Mason
2017-08-02 22:12   ` Fajar A. Nugraha
2017-08-02 22:22 ` Chris Murphy
2017-08-03  9:59   ` Lutz Vieweg
2017-08-03 18:08 ` waxhead
2017-08-03 18:29   ` Christoph Anton Mitterer
2017-08-03 19:22     ` Austin S. Hemmelgarn
2017-08-03 20:45       ` Brendan Hide
2017-08-03 22:00         ` Chris Murphy
2017-08-04 11:26         ` Austin S. Hemmelgarn
2017-08-03 19:03   ` Austin S. Hemmelgarn
2017-08-04  9:48     ` Duncan
2017-08-16 18:07   ` David Sterba
2017-08-04 14:05 ` Qu Wenruo
2017-08-04 23:55   ` Wang Shilong
2017-08-07 15:27   ` Chris Murphy
2017-08-10  0:35     ` Qu Wenruo
2017-08-12  0:10       ` Christoph Anton Mitterer
2017-08-12  7:42         ` Christoph Hellwig
2017-08-12 11:51           ` Christoph Anton Mitterer
2017-08-12 12:12             ` Hugo Mills
2017-08-13 14:08               ` Goffredo Baroncelli
2017-08-14  7:08                 ` Qu Wenruo
2017-08-14 14:23                   ` Goffredo Baroncelli
2017-08-14 19:08                     ` Chris Murphy
2017-08-14 20:27                       ` Goffredo Baroncelli
2017-08-14  6:36           ` Qu Wenruo
2017-08-14  7:43             ` Paul Jones
2017-08-14  7:46               ` Qu Wenruo
2017-08-14 12:32                 ` Christoph Anton Mitterer
2017-08-14 12:58                   ` Qu Wenruo
2017-08-14 12:24             ` Christoph Anton Mitterer
2017-08-14 14:23               ` Austin S. Hemmelgarn
2017-08-14 15:13                 ` Graham Cobb
2017-08-14 15:53                   ` Austin S. Hemmelgarn
2017-08-14 16:42                     ` Graham Cobb
2017-08-14 19:54                     ` Christoph Anton Mitterer
2017-08-15 11:37                       ` Austin S. Hemmelgarn
2017-08-15 14:41                         ` Christoph Anton Mitterer
2017-08-15 15:43                           ` Austin S. Hemmelgarn
2017-08-16 13:12                       ` Chris Mason
2017-08-16 13:31                         ` Christoph Anton Mitterer
2017-08-16 13:53                           ` Austin S. Hemmelgarn
2017-08-16 14:11                             ` Christoph Anton Mitterer
2017-08-16 15:07                               ` Austin S. Hemmelgarn
2017-08-16 17:26                                 ` Peter Grandi
2017-08-16 18:19                             ` David Sterba
2017-08-16 16:54                           ` Peter Grandi
2017-08-16 13:56                         ` Austin S. Hemmelgarn
2017-08-16 14:01                         ` Qu Wenruo
2017-08-16 19:52                           ` Chris Murphy
2017-08-17  6:25                             ` GWB
2017-08-17 11:47                               ` Austin S. Hemmelgarn
2017-08-17 19:00                                 ` Chris Murphy
2017-08-17 20:34                                   ` GWB
2017-08-16 16:44                         ` Peter Grandi
2017-08-14 19:39                 ` Christoph Anton Mitterer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.