* btrfs and write barriers
@ 2019-04-01 19:22 Hendrik Friedel
  2019-04-02  0:13 ` Qu Wenruo
  2019-04-02 11:46 ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 8+ messages in thread
From: Hendrik Friedel @ 2019-04-01 19:22 UTC
  To: linux-btrfs

Dear btrfs team,

I am aware that barriers are essential for btrfs [1].
I have some questions on that topic:
1) I am not aware of any way to determine whether barriers are supported,
except by searching dmesg for a message that barriers are disabled. Is
that correct? It would be nice if that could be determined before
creating the FS.
2) I find the location of the (only?) warning, dmesg, well hidden. I
think it would be better to notify the user when creating the
file system.
3) Going further, it would be good if btrfs disabled the write cache
in that case, so that one does not need to rely on the user.
4) If [2] is still valid, there are drives 'lying' about their barrier
support. Can someone comment? If that is the case, it would even be
advisable to provide a test of the actual capability. In fact, if
this is still valid, it may be the reason for some of the btrfs
corruptions that have been discussed here. [I did read that
LVM/Device-Mapper does not support barriers, but I think that this is
outdated.]

Greetings,
Hendrik


[1] 
https://btrfs.wiki.kernel.org/index.php/FAQ#I_see_a_warning_in_dmesg_about_barriers_being_disabled_when_mounting_my_filesystem._What_does_that_mean.3F
[2] https://brad.livejournal.com/2116715.html



* Re: btrfs and write barriers
  2019-04-01 19:22 btrfs and write barriers Hendrik Friedel
@ 2019-04-02  0:13 ` Qu Wenruo
       [not found]   ` <em07dd5637-7710-4eaa-8659-8d8eef1fc709@ryzen>
  2019-04-02 11:46 ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2019-04-02  0:13 UTC
  To: Hendrik Friedel, linux-btrfs


On 2019/4/2 3:22 AM, Hendrik Friedel wrote:
> Dear btrfs-team,
> 
> I am aware, that barriers are essential for btrfs [1].
> I have some questions on that topic:
> 1) I am not aware how to determine, whether barriers are supported,
> except for searching dmesg for a message that barriers are disabled. Is
> that correct? It would be nice, if that could be determined before
> creating the FS.

That wiki entry about barriers is, in fact, out of date.

The truth is, the SATA/SCSI/NVMe specifications require all devices to
support at least FLUSH (which flushes all cached data in the controller
to disk or to another persistent cache).

As for FUA (Force Unit Access: the write is only reported complete once
the data is on persistent media), if the controller doesn't support it,
the kernel will emulate it with FLUSH (e.g. FLUSH/WRITE/FLUSH).

So even if you see something like the following in dmesg:
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

it doesn't mean anything is wrong. Don't panic.
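A minimal sketch of pulling the relevant fields out of such a line, assuming the wording shown above (and "supports DPO and FUA" for drives that do have it, which is also an assumption about the exact text). `SdCacheInfo` and `parse_sd_cache_line` are hypothetical names; `fua=False` here is exactly the harmless case described above:

```python
import re
from typing import NamedTuple, Optional


class SdCacheInfo(NamedTuple):
    disk: str
    write_cache: bool
    fua: Optional[bool]  # None if the wording wasn't recognised


# Assumed message layout, based on the example line quoted above.
_SD_CACHE_RE = re.compile(
    r"\[(?P<disk>[^\]]+)\] Write cache: (?P<wc>enabled|disabled), "
    r"read cache: (?:enabled|disabled), (?P<rest>.*)$"
)


def parse_sd_cache_line(line: str) -> Optional[SdCacheInfo]:
    """Extract write-cache and FUA status from one sd kernel message, or None."""
    m = _SD_CACHE_RE.search(line)
    if m is None:
        return None
    rest = m.group("rest")
    if rest.startswith("doesn't support DPO or FUA"):
        fua: Optional[bool] = False
    elif rest.startswith("supports DPO and FUA"):
        fua = True
    else:
        fua = None
    return SdCacheInfo(m.group("disk"), m.group("wc") == "enabled", fua)


if __name__ == "__main__":
    line = ("sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, "
            "doesn't support DPO or FUA")
    print(parse_sd_cache_line(line))
    # SdCacheInfo(disk='sda', write_cache=True, fua=False)
```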

In fact, for most SATA devices, libata disables FUA by default,
and all other filesystems are fine with FLUSH only.

> 2) I find the location of the (only?) warning -dmesg- well hidden. I
> think it would be better to notify the user when creating the file-system.
> 3) Even more, it would be good, if btrfs would disable the write cache
> in that case, so that one does not need to rely on the user

Personally speaking, if users really believe the write cache is causing
the problem, or want to be extra safe, they should disable the cache.

As long as FLUSH is implemented without problems, the only part that could
be faulty is btrfs itself, and I haven't found any proof of either yet.

Thanks,
Qu

> 4) If [2] is still valid, there are drives 'lying' about their barrier
> support. Can someone comment? If that is the case, it would be even
> advisable to provide a test to test the actual capability. In fact, if
> this is still valid, this may be the reason for some btrfs corruptions
> that have been discussed here. [I did read, that LVM/Device-Mapper does
> not support barriers, but I think that this is outdated]
> 
> Greetings,
> Hendrik
> 
> 
> [1]
> https://btrfs.wiki.kernel.org/index.php/FAQ#I_see_a_warning_in_dmesg_about_barriers_being_disabled_when_mounting_my_filesystem._What_does_that_mean.3F
> 
> [2] https://brad.livejournal.com/2116715.html
> 



* Re: btrfs and write barriers
  2019-04-01 19:22 btrfs and write barriers Hendrik Friedel
  2019-04-02  0:13 ` Qu Wenruo
@ 2019-04-02 11:46 ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 8+ messages in thread
From: Austin S. Hemmelgarn @ 2019-04-02 11:46 UTC
  To: Hendrik Friedel, linux-btrfs

On 2019-04-01 15:22, Hendrik Friedel wrote:
> Dear btrfs-team,
> 
> I am aware, that barriers are essential for btrfs [1].
> I have some questions on that topic:
> 1) I am not aware how to determine, whether barriers are supported, 
> except for searching dmesg for a message that barriers are disabled. Is 
> that correct? It would be nice, if that could be determined before 
> creating the FS.
AFAIK, this is correct.  However, not supporting DPO or FUA is
non-critical, because the kernel emulates them properly (there would be
many problems far beyond BTRFS if it didn't, as most manufacturers treat
FUA the same way they treat SCT ERC: it's an 'enterprise' feature, so
consumers aren't allowed to have it).
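On reasonably recent kernels the same information is also exposed per disk in sysfs, as `/sys/block/<disk>/queue/write_cache` ("write back" or "write through") and `/sys/block/<disk>/queue/fua` ("1" or "0"). A minimal sketch of reading them, assuming those attributes exist on the running kernel (older kernels lack them); the helper names are hypothetical. These values only reflect what the kernel has been told about the device, so they cannot detect firmware that lies about FLUSH:

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_queue_attr(disk: str, attr: str, sysfs: Path = Path("/sys")) -> Optional[str]:
    """Return the contents of <sysfs>/block/<disk>/queue/<attr>, or None if absent."""
    try:
        return (sysfs / "block" / disk / "queue" / attr).read_text().strip()
    except FileNotFoundError:
        return None


def describe_disk(disk: str, sysfs: Path = Path("/sys")) -> str:
    """Summarise how the kernel will handle flushes for `disk` (hypothetical helper)."""
    wc = read_queue_attr(disk, "write_cache", sysfs)  # "write back" or "write through"
    fua = read_queue_attr(disk, "fua", sysfs)         # "1" or "0"
    if wc is None:
        return f"{disk}: write_cache not exposed, check dmesg instead"
    if wc == "write through":
        return f"{disk}: no volatile write cache, nothing to flush"
    if fua == "1":
        return f"{disk}: volatile write cache, native FUA"
    if fua == "0":
        return f"{disk}: volatile write cache, FUA emulated with FLUSH"
    return f"{disk}: volatile write cache, FUA support unknown"


if __name__ == "__main__":
    # Demo against a fake sysfs tree, so it runs without real hardware.
    with tempfile.TemporaryDirectory() as tmp:
        queue = Path(tmp) / "block" / "sda" / "queue"
        queue.mkdir(parents=True)
        (queue / "write_cache").write_text("write back\n")
        (queue / "fua").write_text("0\n")
        print(describe_disk("sda", Path(tmp)))
        # sda: volatile write cache, FUA emulated with FLUSH
```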

> 2) I find the location of the (only?) warning -dmesg- well hidden. I 
> think it would be better to notify the user when creating the file-system.
A notification when creating the volume, and when adding devices
(either via `device add` or via a replace operation), would indeed be
nice, but we should still keep the kernel log warning.  Note also that
messages like the one Qu described as being fine come from the SCSI
layer (yes, even ATA and USB disks go through the SCSI layer in Linux),
not from BTRFS.

> 3) Even more, it would be good, if btrfs would disable the write cache 
> in that case, so that one does not need to rely on the user
I would tend to disagree here.  We should definitely _recommend_ this to 
the user if we know there is no barrier support, but just doing it 
behind their back is not a good idea. There are also plenty of valid 
reasons to want to use the write cache anyway.

> 4) If [2] is still valid, there are drives 'lying' about their barrier 
> support. Can someone comment? If that is the case, it would be even 
> advisable to provide a test to test the actual capability. In fact, if 
> this is still valid, this may be the reason for some btrfs corruptions 
> that have been discussed here. [I did read, that LVM/Device-Mapper does 
> not support barriers, but I think that this is outdated]
There are two things to consider here: the FLUSH command, which is
mandatory as per SCSI, ATA, and pretty much every other storage protocol
specification, and FUA/DPO, which is not.  If you have FLUSH, you can
emulate FUA/DPO.
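A simplified model of that emulation, loosely following how the Linux block layer sequences a write that asks for a preceding flush (PREFLUSH) and/or FUA. It is a sketch, not kernel code; `plan_write` and the command labels are made up for illustration:

```python
from typing import List


def plan_write(preflush: bool, fua: bool,
               has_write_cache: bool, has_native_fua: bool) -> List[str]:
    """Return the device commands a single write request turns into (simplified)."""
    if not has_write_cache:
        # Nothing volatile to flush: the PREFLUSH and FUA requests are dropped.
        return ["WRITE"]
    cmds: List[str] = []
    if preflush:
        cmds.append("FLUSH")       # push earlier writes out of the cache first
    if fua and has_native_fua:
        cmds.append("WRITE(FUA)")  # the device itself writes through to media
    else:
        cmds.append("WRITE")
        if fua:
            cmds.append("FLUSH")   # emulate FUA: flush after the write completes
    return cmds


if __name__ == "__main__":
    # Write cache enabled but no native FUA:
    print(plan_write(preflush=True, fua=True, has_write_cache=True, has_native_fua=False))
    # ['FLUSH', 'WRITE', 'FLUSH']
```

With no volatile write cache the flush requests are simply dropped, which is why a write-through device never needs FLUSH at all.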

The only modern devices I know of that actually _lied_ about FLUSH are
OCZ SSDs.  They've stopped making them because the associated data-loss
issues killed any consumer trust in the product.  The only other devices
I've ever seen _any_ issue with the FLUSH implementation in are some
ancient SCSI-2 5.25-inch full-height disk drives where I work, which
have a firmware bug that reports the FLUSH as completed before the last
sector in the write cache is written out (they still write that last
sector, they just report command completion early).
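Regarding a test of the actual capability: the usual approach is to write records, fsync() each one, report every acknowledged record to a second machine, cut power mid-run, and then check that every acknowledged record survived. A minimal sketch of the two local halves, assuming that hypothetical setup (`write_records` and `missing_records` are illustrative names, and the reporting to the second machine is left out):

```python
import os
import struct
import tempfile
from typing import Iterable, List

RECORD = struct.Struct("<Q")  # one 8-byte little-endian sequence number per record


def write_records(path: str, count: int) -> List[int]:
    """Append `count` sequence numbers, calling fsync() after each one.

    Returns the sequence numbers whose fsync() completed. In a real test each
    acknowledgement must be sent to a second machine as it happens, since this
    machine loses its memory in the power cut.
    """
    acked: List[int] = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for seq in range(count):
            os.write(fd, RECORD.pack(seq))
            os.fsync(fd)  # should reach the drive as a FLUSH (or a FUA write)
            acked.append(seq)
    finally:
        os.close(fd)
    return acked


def missing_records(path: str, acked: Iterable[int]) -> List[int]:
    """After power is restored, list acknowledged records that did not survive."""
    with open(path, "rb") as f:
        data = f.read()
    data = data[: len(data) - len(data) % RECORD.size]  # drop a torn final record
    present = {seq for (seq,) in RECORD.iter_unpack(data)}
    return [seq for seq in acked if seq not in present]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "records.bin")
        acked = write_records(log, 100)
        # Without a power cut nothing should be missing.
        print(missing_records(log, acked))  # []
```

A non-empty result after a real power cut means some layer acknowledged data it had not made durable; on its own it does not say which layer.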

As for FUA/DPO, I know of exactly _zero_ devices that claim to
implement it but don't.  Unlike FLUSH, which is a required part of
almost all modern storage protocols, FUA/DPO isn't required, so there's
essentially zero incentive to claim you implement it when you don't
(people who would be looking for it generally know what they're doing).

As for the article you linked about disks lying, note first that
it's just over 14 years old (almost three times the typical service life
of a normal hard drive), and much has changed since then.  The actual
issue there is not the disks doing write caching (which is what is
actually being complained about), but the fact that Linux previously did
not issue a FLUSH command to the disks when you called fsync in userspace.
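For reference, a minimal sketch of how a userspace program makes a file update durable on Linux: write a temporary file, fsync() it, rename it into place, then fsync() the directory. With a volatile write cache, the fsync() calls are what typically lead the filesystem to send the FLUSH mentioned above. `durable_write` is a hypothetical helper, not a standard API:

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace the contents of `path` so the new data survives a crash once this returns."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may be a partial write
            view = view[written:]
        os.fsync(fd)  # data and metadata of the temporary file
    finally:
        os.close(fd)
    os.replace(tmp, path)  # atomic rename over any old version
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # make the rename itself durable
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "state.txt")
        durable_write(target, b"first")
        durable_write(target, b"second")
        with open(target, "rb") as f:
            print(f.read())  # b'second'
```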
> 
> Greetings,
> Hendrik
> 
> 
> [1] 
> https://btrfs.wiki.kernel.org/index.php/FAQ#I_see_a_warning_in_dmesg_about_barriers_being_disabled_when_mounting_my_filesystem._What_does_that_mean.3F 
> 
> [2] https://brad.livejournal.com/2116715.html
> 



* Re: btrfs and write barriers
       [not found]   ` <em07dd5637-7710-4eaa-8659-8d8eef1fc709@ryzen>
@ 2019-04-03 18:44     ` Austin S. Hemmelgarn
  2019-04-28 19:27       ` Re[2]: " Hendrik Friedel
       [not found]     ` <eme2e3d545-ea78-4120-9800-6a33db6c506b@ryzen>
  2019-04-04  1:00     ` Qu Wenruo
  2 siblings, 1 reply; 8+ messages in thread
From: Austin S. Hemmelgarn @ 2019-04-03 18:44 UTC
  To: Hendrik Friedel, Qu Wenruo, linux-btrfs

On 2019-04-03 14:17, Hendrik Friedel wrote:
> Hello,
> 
> thanks for your reply.
> 
>>> 3) Even more, it would be good, if btrfs would disable the write cache
>>> in that case, so that one does not need to rely on the user
>> Personally speaking, if user really believes it's write cache causing
>> the problem or want to be extra safe, then they should disable cache.
> How many percent of the users will be able to judge that?
>> As long as FLUSH is implemented without problem, the only faulty part is
>> btrfs itself and I haven't found any proof of either yet.
> But you have searched?
> 
>  >>2) I find the location of the (only?) warning -dmesg- well hidden. I 
> think it would be better to notify the user when creating the file-system.
>  >A notification on creating the volume and ones when adding devices 
> (either via `device add` or via a replace operation)
>  >would indeed be nice, but we should still keep the kernel log warning.
> 
> Ok, so what would be the way to move forward on that? Would it help if I 
> create an issue in a https://bugzilla.kernel.org/ ?
The biggest issue is actually figuring out whether the devices support
write barriers at all.  On Linux, lacking barrier support means no FLUSH
or a broken FLUSH, not a lack of FUA/DPO: as long as the device properly
implements FLUSH (and most do), Linux will provide a FUA emulation that
works for write barriers.  Once you've got that, it should be pretty
trivial to add to the messages.
> 
>  >>3) Even more, it would be good, if btrfs would disable the write 
> cache in that case, so that one does not need to rely on the user
>  > I would tend to disagree here. We should definitely _recommend_ this 
> to the user if we know there is no barrier support, but just
>  > doing it behind their back is not a good idea.
> 
> Well, there is some room between 'automatic' and 'behind their back. E.g.
> "Barriers are not supported by /dev/sda. Automatically disabling 
> write-cache on mount. You can suppress this with the 
> 'enable-cache-despite-no-barrier-support-I-know-what-I-am-doing' mount 
> option (maybe, we can shorten the option).
And that's still 'behind the back' because it's a layering violation. 
Even LVM and MD don't do this, and they have even worse issues than we 
do because they aren't CoW.
> 
>  > There are also plenty of valid reasons to want to use the write cache 
> anyway.
> I cannot think of one. Who would sacrifice data integrity/potential 
> total loss of the filesystem for speed?
There are quite a few cases where the risk of data loss _just doesn't 
matter_, and any data that could be invalid is also inherently stale. 
Some trivial examples:

* /run on any modern Linux system. Primarily contains sockets used by 
running services, PID files for daemons, and other similar things that 
only matter for the duration of the current boot of the system. These 
days, it's usually in-memory, but some people with really tight memory 
constraints still use persistent storage for it to save memory.
* /tmp on any sane UNIX system. Similar case to above, but usually for 
stuff that only matters on the scale of session lifetimes, or even just 
process lifetimes.
* /var/tmp on most Linux systems. Usually the same case as /tmp.
* /var/cache on any sane UNIX system. By definition, if the data here is 
lost, it doesn't matter, as it only exists for performance reasons 
anyway. Smart applications will even validate the files they put here, 
so corruption isn't an issue either.

There are plenty of other examples I could list, but they are all
far more situational and application-specific.
> 
>  > As far as FUA/DPO, I know of exactly _zero_ devices that lie about 
> implementing it and don't.
> ...
>  > but the fact that Linux used to not issue a FLUSH command to the 
> disks when you called fsync in userspace.
> Ok, thanks for that clarification.


* Re[3]: btrfs and write barriers
       [not found]     ` <eme2e3d545-ea78-4120-9800-6a33db6c506b@ryzen>
@ 2019-04-03 19:38       ` Hendrik Friedel
  0 siblings, 0 replies; 8+ messages in thread
From: Hendrik Friedel @ 2019-04-03 19:38 UTC
  To: linux-btrfs

Hello,
thanks for your reply.

 >>3) Even more, it would be good, if btrfs would disable the write cache
 >> in that case, so that one does not need to rely on the user
 >
 >Personally speaking, if user really believes it's write cache causing
 >the problem or want to be extra safe, then they should disable cache.
What percentage of users will be able to judge that?

 >As long as FLUSH is implemented without problem, the only faulty part is
 >btrfs itself and I haven't found any proof of either yet.
But have you searched?

 >>2) I find the location of the (only?) warning -dmesg- well hidden. I
 >>think it would be better to notify the user when creating the
 >>file-system.
 >A notification on creating the volume and ones when adding devices
 >(either via `device add` or via a replace operation)
 >would indeed be nice, but we should still keep the kernel log warning.

OK, so what would be the way forward on that? Would it help if I
created an issue at https://bugzilla.kernel.org/ ?

 >>3) Even more, it would be good, if btrfs would disable the write cache
 >>in that case, so that one does not need to rely on the user
 > I would tend to disagree here. We should definitely _recommend_ this
 >to the user if we know there is no barrier support, but just
 > doing it behind their back is not a good idea.

Well, there is some room between 'automatic' and 'behind their back'.
For example:
"Barriers are not supported by /dev/sda. Automatically disabling
write cache on mount. You can suppress this with the
'enable-cache-despite-no-barrier-support-I-know-what-I-am-doing' mount
option." (Maybe we can shorten the option.)

 > There are also plenty of valid reasons to want to use the write cache
 >anyway.

I cannot think of one. Who would sacrifice data integrity, and risk
total loss of the filesystem, for speed?

 > As far as FUA/DPO, I know of exactly _zero_ devices that lie about
 >implementing it and don't.
...
 > but the fact that Linux used to not issue a FLUSH command to the disks
 >when you called fsync in userspace.

OK, thanks for that clarification.


Greetings,
Hendrik



* Re: btrfs and write barriers
       [not found]   ` <em07dd5637-7710-4eaa-8659-8d8eef1fc709@ryzen>
  2019-04-03 18:44     ` Austin S. Hemmelgarn
       [not found]     ` <eme2e3d545-ea78-4120-9800-6a33db6c506b@ryzen>
@ 2019-04-04  1:00     ` Qu Wenruo
  2 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2019-04-04  1:00 UTC
  To: Hendrik Friedel, linux-btrfs, Austin S. Hemmelgarn


On 2019/4/4 2:17 AM, Hendrik Friedel wrote:
> Hello,
> 
> thanks for your reply.
> 
>>> 3) Even more, it would be good, if btrfs would disable the write cache
>>> in that case, so that one does not need to rely on the user
>>  
>> Personally speaking, if user really believes it's write cache causing
>> the problem or want to be extra safe, then they should disable cache.
> How many percent of the users will be able to judge that?

No need, unless the user really has a faulty device.
And in that case, even an experienced developer can't easily spot such a
problem, so it doesn't really matter until it happens.

>> As long as FLUSH is implemented without problem, the only faulty part is
>> btrfs itself and I haven't found any proof of either yet.
> But you have searched?

Yes, using dm-log-writes with various workloads, to ensure btrfs issues
the correct barrier/FUA requests so that metadata is CoWed between transactions.
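The ordering property being checked can be illustrated with a toy crash model. This is not how dm-log-writes works (it records the real write stream, including flush and FUA requests, so it can be replayed to intermediate points and checked); it just enumerates what a volatile write cache could leave on disk at a power cut, and shows why a CoW commit needs a FLUSH between writing the new tree and pointing the superblock at it. The block names and commit sequence are simplified assumptions, not the btrfs on-disk format:

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

Op = Tuple[str, ...]  # ("write", block, value) or ("flush",)


def crash_states(ops: List[Op], initial: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every on-disk state a power cut could leave at any point in `ops`.

    Toy model of a disk with a volatile write cache: writes stay in the cache
    until a flush, and a crash may persist any subset of the cached writes.
    """
    durable = dict(initial)
    cached: List[Tuple[str, str]] = []

    def snapshots() -> Iterator[Dict[str, str]]:
        for r in range(len(cached) + 1):
            for subset in combinations(cached, r):
                state = dict(durable)
                state.update(subset)  # applied in issue order, later writes win
                yield state

    for op in ops:
        yield from snapshots()
        if op[0] == "write":
            cached.append((op[1], op[2]))
        elif op[0] == "flush":
            durable.update(cached)
            cached.clear()
    yield from snapshots()


def is_consistent(state: Dict[str, str]) -> bool:
    """The superblock may only point at a tree root that is fully on disk."""
    return state.get(state["super"]) == "tree"


def commit_is_crash_safe(ops: List[Op], initial: Dict[str, str]) -> bool:
    return all(is_consistent(s) for s in crash_states(ops, initial))


if __name__ == "__main__":
    old = {"super": "A", "A": "tree"}  # superblock points at committed root A
    with_flush = [("write", "B", "tree"), ("flush",), ("write", "super", "B"), ("flush",)]
    no_flush = [("write", "B", "tree"), ("write", "super", "B"), ("flush",)]
    print(commit_is_crash_safe(with_flush, old))  # True
    print(commit_is_crash_safe(no_flush, old))    # False
```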

> 
>>>2) I find the location of the (only?) warning -dmesg- well hidden. I
> think it would be better to notify the user when creating the file-system.

As stated already, unless the user has a faulty device, enabling the write
cache won't cause any problems.
You guys are paying a lot of attention to a problem that almost
nobody hits.

Thanks,
Qu

>>A notification on creating the volume and ones when adding devices
> (either via `device add` or via a replace operation) 
>>would indeed be nice, but we should still keep the kernel log warning. 
> 
> Ok, so what would be the way to move forward on that? Would it help if I
> create an issue in a https://bugzilla.kernel.org/ ?
> 
>>>3) Even more, it would be good, if btrfs would disable the write cache
> in that case, so that one does not need to rely on the user
>> I would tend to disagree here. We should definitely _recommend_ this
> to the user if we know there is no barrier support, but just 
>> doing it behind their back is not a good idea.
> 
> Well, there is some room between 'automatic' and 'behind their back. E.g.
> "Barriers are not supported by /dev/sda. Automatically disabling
> write-cache on mount. You can suppress this with the
> 'enable-cache-despite-no-barrier-support-I-know-what-I-am-doing' mount
> option (maybe, we can shorten the option). 
> 
>> There are also plenty of valid reasons to want to use the write cache
> anyway.
> I cannot think of one. Who would sacrifice data integrity/potential
> total loss of the filesystem for speed?
> 
>> As far as FUA/DPO, I know of exactly _zero_ devices that lie about
> implementing it and don't. 
> ...
>> but the fact that Linux used to not issue a FLUSH command to the disks
> when you called fsync in userspace.
> Ok, thanks for that clarification.
> 
> 
> Greetings,
> Hendrik
> 



* Re[2]: btrfs and write barriers
  2019-04-03 18:44     ` Austin S. Hemmelgarn
@ 2019-04-28 19:27       ` Hendrik Friedel
  2019-04-28 23:53         ` Qu Wenruo
  0 siblings, 1 reply; 8+ messages in thread
From: Hendrik Friedel @ 2019-04-28 19:27 UTC
  To: Austin S. Hemmelgarn, Qu Wenruo, linux-btrfs

Hello,
thanks for your reply.

 >>3) Even more, it would be good, if btrfs would disable the write cache
 >> in that case, so that one does not need to rely on the user
 >
 >Personally speaking, if user really believes it's write cache causing
 >the problem or want to be extra safe, then they should disable cache.
What percentage of users will be able to judge that?

 >As long as FLUSH is implemented without problem, the only faulty part is
 >btrfs itself and I haven't found any proof of either yet.
But have you searched?

 >>2) I find the location of the (only?) warning -dmesg- well hidden. I
 >>think it would be better to notify the user when creating the
 >>file-system.
 >A notification on creating the volume and ones when adding devices
 >(either via `device add` or via a replace operation)
 >would indeed be nice, but we should still keep the kernel log warning.

OK, so what would be the way forward on that? Would it help if I
created an issue at https://bugzilla.kernel.org/ ?

 >>3) Even more, it would be good, if btrfs would disable the write cache
 >>in that case, so that one does not need to rely on the user
 > I would tend to disagree here. We should definitely _recommend_ this
 >to the user if we know there is no barrier support, but just
 > doing it behind their back is not a good idea.

Well, there is some room between 'automatic' and 'behind their back'.
For example:
"Barriers are not supported by /dev/sda. Automatically disabling
write cache on mount. You can suppress this with the
'enable-cache-despite-no-barrier-support-I-know-what-I-am-doing' mount
option." (Maybe we can shorten the option.)

 > There are also plenty of valid reasons to want to use the write cache
 >anyway.

I cannot think of one. Who would sacrifice data integrity, and risk
total loss of the filesystem, for speed?

 > As far as FUA/DPO, I know of exactly _zero_ devices that lie about
 >implementing it and don't.
...
 > but the fact that Linux used to not issue a FLUSH command to the disks
 >when you called fsync in userspace.

OK, thanks for that clarification.


Greetings,
Hendrik

------ Original Message ------
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: "Hendrik Friedel" <hendrik@friedels.name>; "Qu Wenruo" 
<quwenruo.btrfs@gmx.com>; linux-btrfs@vger.kernel.org
Sent: 03.04.2019 20:44:09
Subject: Re: btrfs and write barriers

>On 2019-04-03 14:17, Hendrik Friedel wrote:
>>Hello,
>>
>>thanks for your reply.
>>
>>>>3) Even more, it would be good, if btrfs would disable the write cache
>>>>in that case, so that one does not need to rely on the user
>>>Personally speaking, if user really believes it's write cache causing
>>>the problem or want to be extra safe, then they should disable cache.
>>How many percent of the users will be able to judge that?
>>>As long as FLUSH is implemented without problem, the only faulty part is
>>>btrfs itself and I haven't found any proof of either yet.
>>But you have searched?
>>
>>  >>2) I find the location of the (only?) warning -dmesg- well hidden. I think it would be better to notify the user when creating the file-system.
>>  >A notification on creating the volume and ones when adding devices (either via `device add` or via a replace operation)
>>  >would indeed be nice, but we should still keep the kernel log warning.
>>
>>Ok, so what would be the way to move forward on that? Would it help if I create an issue in a https://bugzilla.kernel.org/ ?
>The biggest issue is actually figuring out if the devices don't support write barriers (which means no FLUSH or broken FLUSH on Linux, not no FUA/DPO, because as long as the device properly implements FLUSH (and most do), Linux will provide a FUA emulation which works for write barriers).  Once you've got that, it should be pretty trivial to add to the messages.
>>
>>  >>3) Even more, it would be good, if btrfs would disable the write cache in that case, so that one does not need to rely on the user
>>  > I would tend to disagree here. We should definitely _recommend_ this to the user if we know there is no barrier support, but just
>>  > doing it behind their back is not a good idea.
>>
>>Well, there is some room between 'automatic' and 'behind their back. E.g.
>>"Barriers are not supported by /dev/sda. Automatically disabling write-cache on mount. You can suppress this with the 'enable-cache-despite-no-barrier-support-I-know-what-I-am-doing' mount option (maybe, we can shorten the option).
>And that's still 'behind the back' because it's a layering violation. Even LVM and MD don't do this, and they have even worse issues than we do because they aren't CoW.
>>
>>  > There are also plenty of valid reasons to want to use the write cache anyway.
>>I cannot think of one. Who would sacrifice data integrity/potential total loss of the filesystem for speed?
>There are quite a few cases where the risk of data loss _just doesn't matter_, and any data that could be invalid is also inherently stale. Some trivial examples:
>
>* /run on any modern Linux system. Primarily contains sockets used by running services, PID files for daemons, and other similar things that only matter for the duration of the current boot of the system. These days, it's usually in-memory, but some people with really tight memory constraints still use persistent storage for it to save memory.
>* /tmp on any sane UNIX system. Similar case to above, but usually for stuff that only matters on the scale of session lifetimes, or even just process lifetimes.
>* /var/tmp on most Linux systems. Usually the same case as /tmp.
>* /var/cache on any sane UNIX system. By definition, if the data here is lost, it doesn't matter, as it only exists for performance reasons anyway. Smart applications will even validate the files they put here, so corruption isn't an issue either.
>
>There are bunches of other examples I could list, but all of them are far more situational and application specific.
>>
>>  > As far as FUA/DPO, I know of exactly _zero_ devices that lie about implementing it and don't.
>>...
>>  > but the fact that Linux used to not issue a FLUSH command to the disks when you called fsync in userspace.
>>Ok, thanks for that clarification.



* Re: btrfs and write barriers
  2019-04-28 19:27       ` Re[2]: " Hendrik Friedel
@ 2019-04-28 23:53         ` Qu Wenruo
  0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2019-04-28 23:53 UTC
  To: Hendrik Friedel, Austin S. Hemmelgarn, linux-btrfs


On 2019/4/29 3:27 AM, Hendrik Friedel wrote:
> Hello,
> thanks for your reply.
> 
>>>3) Even more, it would be good, if btrfs would disable the write cache
>>> in that case, so that one does not need to rely on the user
>>
>>Personally speaking, if user really believes it's write cache causing
>>the problem or want to be extra safe, then they should disable cache.
> How many percent of the users will be able to judge that?
> 
>>As long as FLUSH is implemented without problem, the only faulty part is
>>btrfs itself and I haven't found any proof of either yet.
> But you have searched?
> 
>>>2) I find the location of the (only?) warning -dmesg- well hidden. I
> think it would be better to notify the user when creating the file-system.
>>A notification on creating the volume and ones when adding devices
> (either via `device add` or via a replace operation)
>>would indeed be nice, but we should still keep the kernel log warning.
> 
> Ok, so what would be the way to move forward on that? Would it help if I
> create an issue in a https://bugzilla.kernel.org/ ?

No need. See comment below.

> 
>>>3) Even more, it would be good, if btrfs would disable the write cache
> in that case, so that one does not need to rely on the user
>> I would tend to disagree here. We should definitely _recommend_ this
> to the user if we know there is no barrier support, but just
>> doing it behind their back is not a good idea.
> 
> Well, there is some room between 'automatic' and 'behind their back. E.g.
> "Barriers are not supported by /dev/sda. Automatically disabling
> write-cache on mount. You can suppress this with the
> 'enable-cache-despite-no-barrier-support-I-know-what-I-am-doing' mount
> option (maybe, we can shorten the option).

There is no problem using the write cache as long as the device supports
flush. The SATA/NVMe specifications require all devices to support flush.

As long as flush is supported, FUA can be emulated.
Thus the write cache is not a problem at all, as long as flush is
implemented correctly.

> 
>> There are also plenty of valid reasons to want to use the write cache
> anyway.
> 
> I cannot think of one. Who would sacrifice data integrity/potential
> total loss of the filesystem for speed?

No data integrity is lost, and performance is greatly improved with
the write cache.

Thanks,
Qu
> 
>> As far as FUA/DPO, I know of exactly _zero_ devices that lie about
> implementing it and don't.
> ...
>> but the fact that Linux used to not issue a FLUSH command to the disks
> when you called fsync in userspace.
> 
> Ok, thanks for that clarification.
> 
> 
> Greetings,
> Hendrik
> 
> ------ Original Message ------
> From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
> To: "Hendrik Friedel" <hendrik@friedels.name>; "Qu Wenruo"
> <quwenruo.btrfs@gmx.com>; linux-btrfs@vger.kernel.org
> Sent: 03.04.2019 20:44:09
> Subject: Re: btrfs and write barriers
> 
>> On 2019-04-03 14:17, Hendrik Friedel wrote:
>>> Hello,
>>>
>>> thanks for your reply.
>>>
>>>>> 3) Even more, it would be good, if btrfs would disable the write cache
>>>>> in that case, so that one does not need to rely on the user
>>>> Personally speaking, if user really believes it's write cache causing
>>>> the problem or want to be extra safe, then they should disable cache.
>>> How many percent of the users will be able to judge that?
>>>> As long as FLUSH is implemented without problem, the only faulty
>>>> part is
>>>> btrfs itself and I haven't found any proof of either yet.
>>> But you have searched?
>>>
>>>  >>2) I find the location of the (only?) warning -dmesg- well hidden.
>>> I think it would be better to notify the user when creating the
>>> file-system.
>>>  >A notification on creating the volume and ones when adding devices
>>> (either via `device add` or via a replace operation)
>>>  >would indeed be nice, but we should still keep the kernel log warning.
>>>
>>> Ok, so what would be the way to move forward on that? Would it help
>>> if I create an issue in a https://bugzilla.kernel.org/ ?
>> The biggest issue is actually figuring out if the devices don't
>> support write barriers (which means no FLUSH or broken FLUSH on Linux,
>> not no FUA/DPO, because as long as the device properly implements
>> FLUSH (and most do), Linux will provide a FUA emulation which works
>> for write barriers).  Once you've got that, it should be pretty
>> trivial to add to the messages.
>>>
>>>  >>3) Even more, it would be good, if btrfs would disable the write
>>> cache in that case, so that one does not need to rely on the user
>>>  > I would tend to disagree here. We should definitely _recommend_
>>> this to the user if we know there is no barrier support, but just
>>>  > doing it behind their back is not a good idea.
>>>
>>> Well, there is some room between 'automatic' and 'behind their back.
>>> E.g.
>>> "Barriers are not supported by /dev/sda. Automatically disabling
>>> write-cache on mount. You can suppress this with the
>>> 'enable-cache-despite-no-barrier-support-I-know-what-I-am-doing'
>>> mount option (maybe, we can shorten the option).
>> And that's still 'behind the back' because it's a layering violation.
>> Even LVM and MD don't do this, and they have even worse issues than we
>> do because they aren't CoW.
>>>
>>>  > There are also plenty of valid reasons to want to use the write
>>> cache anyway.
>>> I cannot think of one. Who would sacrifice data integrity/potential
>>> total loss of the filesystem for speed?
>> There are quite a few cases where the risk of data loss _just doesn't
>> matter_, and any data that could be invalid is also inherently stale.
>> Some trivial examples:
>>
>> * /run on any modern Linux system. Primarily contains sockets used by
>> running services, PID files for daemons, and other similar things that
>> only matter for the duration of the current boot of the system. These
>> days, it's usually in-memory, but some people with really tight memory
>> constraints still use persistent storage for it to save memory.
>> * /tmp on any sane UNIX system. Similar case to above, but usually for
>> stuff that only matters on the scale of session lifetimes, or even
>> just process lifetimes.
>> * /var/tmp on most Linux systems. Usually the same case as /tmp.
>> * /var/cache on any sane UNIX system. By definition, if the data here
>> is lost, it doesn't matter, as it only exists for performance reasons
>> anyway. Smart applications will even validate the files they put here,
>> so corruption isn't an issue either.
>>
>> There are bunches of other examples I could list, but all of them are
>> far more situational and application specific.
>>>
>>>  > As far as FUA/DPO, I know of exactly _zero_ devices that lie about
>>> implementing it and don't.
>>> ...
>>>  > but the fact that Linux used to not issue a FLUSH command to the
>>> disks when you called fsync in userspace.
>>> Ok, thanks for that clarification.
> 



Thread overview: 8+ messages
2019-04-01 19:22 btrfs and write barriers Hendrik Friedel
2019-04-02  0:13 ` Qu Wenruo
     [not found]   ` <em07dd5637-7710-4eaa-8659-8d8eef1fc709@ryzen>
2019-04-03 18:44     ` Austin S. Hemmelgarn
2019-04-28 19:27       ` Re[2]: " Hendrik Friedel
2019-04-28 23:53         ` Qu Wenruo
     [not found]     ` <eme2e3d545-ea78-4120-9800-6a33db6c506b@ryzen>
2019-04-03 19:38       ` Re[3]: " Hendrik Friedel
2019-04-04  1:00     ` Qu Wenruo
2019-04-02 11:46 ` Austin S. Hemmelgarn
