On 2019/4/29 3:27 AM, Hendrik Friedel wrote:
> Hello,
> thanks for your reply.
>
>>> 3) Even more, it would be good if btrfs would disable the write cache
>>> in that case, so that one does not need to rely on the user
>>
>> Personally speaking, if the user really believes it's the write cache
>> causing the problem or wants to be extra safe, then they should disable
>> the cache.
>
> How many percent of the users will be able to judge that?
>
>> As long as FLUSH is implemented without problem, the only faulty part
>> is btrfs itself, and I haven't found any proof of either yet.
>
> But you have searched?
>
>>> 2) I find the location of the (only?) warning -dmesg- well hidden. I
>>> think it would be better to notify the user when creating the
>>> file-system.
>>
>> A notification on creating the volume and one when adding devices
>> (either via `device add` or via a replace operation) would indeed be
>> nice, but we should still keep the kernel log warning.
>
> Ok, so what would be the way to move forward on that? Would it help if I
> create an issue on https://bugzilla.kernel.org/ ?

No need. See comment below.

>>> 3) Even more, it would be good if btrfs would disable the write cache
>>> in that case, so that one does not need to rely on the user
>>
>> I would tend to disagree here. We should definitely _recommend_ this to
>> the user if we know there is no barrier support, but just doing it
>> behind their back is not a good idea.
>
> Well, there is some room between 'automatic' and 'behind their back'.
> E.g.:
> "Barriers are not supported by /dev/sda. Automatically disabling
> write-cache on mount. You can suppress this with the
> 'enable-cache-despite-no-barrier-support-I-know-what-I-am-doing' mount
> option" (maybe we can shorten the option).

There is no problem using the write cache as long as the device supports
flush. The SATA and NVMe protocols specify that all devices should
support flush, and as long as flush is supported, FUA can be emulated.

Thus the write cache is not a problem at all, as long as flush is
implemented correctly.
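
To illustrate the point, here is a minimal sketch (not btrfs-specific; the
file path is only an example) of all an application has to do to get a
durable write while the volatile write cache stays enabled. The assumption
is that the filesystem issues a device cache FLUSH as part of fsync(),
which is exactly the write barrier being discussed here:

/* Durable write on top of an enabled volatile write cache.
 * Assumes the filesystem (e.g. btrfs) flushes the device cache on
 * fsync(); the path below is just an example. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const char buf[] = "important data\n";
        int fd = open("/mnt/btrfs/important.dat",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd < 0) {
                perror("open");
                return EXIT_FAILURE;
        }
        if (write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf)) {
                perror("write");
                close(fd);
                return EXIT_FAILURE;
        }
        /* fsync() returns only after the filesystem has flushed the
         * device's volatile write cache, so the data survives a power
         * loss even though the cache was never disabled. */
        if (fsync(fd) < 0) {
                perror("fsync");
                close(fd);
                return EXIT_FAILURE;
        }
        close(fd);
        return EXIT_SUCCESS;
}

So the cost of durability is one flush per fsync(), not a permanently
disabled cache.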
>
>> There are also plenty of valid reasons to want to use the write cache
>> anyway.
>
> I cannot think of one. Who would sacrifice data integrity/potential
> total loss of the filesystem for speed?

No data integrity is lost, and performance is greatly improved with the
write cache.

Thanks,
Qu

>
>> As far as FUA/DPO, I know of exactly _zero_ devices that lie about
>> implementing it and don't.
> ...
>> but the fact that Linux used to not issue a FLUSH command to the disks
>> when you called fsync in userspace.
>
> Ok, thanks for that clarification.
>
> Greetings,
> Hendrik
>
> ------ Original Message ------
> From: "Austin S. Hemmelgarn"
> To: "Hendrik Friedel"; "Qu Wenruo"; linux-btrfs@vger.kernel.org
> Sent: 03.04.2019 20:44:09
> Subject: Re: btrfs and write barriers
>
>> On 2019-04-03 14:17, Hendrik Friedel wrote:
>>> Hello,
>>>
>>> thanks for your reply.
>>>
>>>>> 3) Even more, it would be good if btrfs would disable the write
>>>>> cache in that case, so that one does not need to rely on the user
>>>>
>>>> Personally speaking, if the user really believes it's the write
>>>> cache causing the problem or wants to be extra safe, then they
>>>> should disable the cache.
>>>
>>> How many percent of the users will be able to judge that?
>>>
>>>> As long as FLUSH is implemented without problem, the only faulty
>>>> part is btrfs itself, and I haven't found any proof of either yet.
>>>
>>> But you have searched?
>>>
>>>>> 2) I find the location of the (only?) warning -dmesg- well hidden.
>>>>> I think it would be better to notify the user when creating the
>>>>> file-system.
>>>>
>>>> A notification on creating the volume and one when adding devices
>>>> (either via `device add` or via a replace operation) would indeed
>>>> be nice, but we should still keep the kernel log warning.
>>>
>>> Ok, so what would be the way to move forward on that? Would it help
>>> if I create an issue on https://bugzilla.kernel.org/ ?
>>
>> The biggest issue is actually figuring out if the devices don't
>> support write barriers (which means no FLUSH or broken FLUSH on Linux,
>> not no FUA/DPO, because as long as the device properly implements
>> FLUSH (and most do), Linux will provide a FUA emulation which works
>> for write barriers). Once you've got that, it should be pretty
>> trivial to add to the messages.
>>
>>>>> 3) Even more, it would be good if btrfs would disable the write
>>>>> cache in that case, so that one does not need to rely on the user
>>>>
>>>> I would tend to disagree here. We should definitely _recommend_
>>>> this to the user if we know there is no barrier support, but just
>>>> doing it behind their back is not a good idea.
>>>
>>> Well, there is some room between 'automatic' and 'behind their back'.
>>> E.g.:
>>> "Barriers are not supported by /dev/sda. Automatically disabling
>>> write-cache on mount. You can suppress this with the
>>> 'enable-cache-despite-no-barrier-support-I-know-what-I-am-doing'
>>> mount option" (maybe we can shorten the option).
>>
>> And that's still 'behind the back' because it's a layering violation.
>> Even LVM and MD don't do this, and they have even worse issues than we
>> do because they aren't CoW.
>>
>>>> There are also plenty of valid reasons to want to use the write
>>>> cache anyway.
>>>
>>> I cannot think of one. Who would sacrifice data integrity/potential
>>> total loss of the filesystem for speed?
>>
>> There are quite a few cases where the risk of data loss _just doesn't
>> matter_, and any data that could be invalid is also inherently stale.
>> Some trivial examples:
>>
>> * /run on any modern Linux system. Primarily contains sockets used by
>> running services, PID files for daemons, and other similar things that
>> only matter for the duration of the current boot of the system. These
>> days, it's usually in-memory, but some people with really tight memory
>> constraints still use persistent storage for it to save memory.
>> * /tmp on any sane UNIX system. Similar case to the above, but usually
>> for stuff that only matters on the scale of session lifetimes, or even
>> just process lifetimes.
>> * /var/tmp on most Linux systems. Usually the same case as /tmp.
>> * /var/cache on any sane UNIX system. By definition, if the data here
>> is lost, it doesn't matter, as it only exists for performance reasons
>> anyway. Smart applications will even validate the files they put here,
>> so corruption isn't an issue either.
>>
>> There are bunches of other examples I could list, but all of them are
>> far more situational and application-specific.
>>
>>>> As far as FUA/DPO, I know of exactly _zero_ devices that lie about
>>>> implementing it and don't.
>>> ...
>>>> but the fact that Linux used to not issue a FLUSH command to the
>>>> disks when you called fsync in userspace.
>>>
>>> Ok, thanks for that clarification.
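
Coming back to the point above that the hard part is figuring out whether
a device's FLUSH is actually in play: here is a rough sketch of how to
check, from userspace, what the kernel has concluded about a device's
volatile write cache and native FUA support. The device name "sda" and
the sysfs attribute paths are only examples and assume a kernel recent
enough to expose them:

/* Report the block layer's view of a device's write cache and FUA
 * support via sysfs. "sda" is just an example device name. */
#include <stdio.h>
#include <string.h>

static void show(const char *dev, const char *attr)
{
        char path[256], value[64] = "";
        FILE *f;

        snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);
        f = fopen(path, "r");
        if (!f) {
                printf("%s: not available\n", path);
                return;
        }
        if (fgets(value, sizeof(value), f))
                value[strcspn(value, "\n")] = '\0';
        fclose(f);
        printf("%s: %s\n", path, value);
}

int main(void)
{
        /* "write back" means the kernel treats the device as having a
         * volatile cache and will send FLUSH (emulating FUA when the
         * fua attribute reads 0); "write through" means no volatile
         * cache is assumed and flushes are not needed. */
        show("sda", "write_cache");
        show("sda", "fua");
        return 0;
}

If a device reports "write back" here, the barrier path (FLUSH, plus FUA
or its emulation) is what protects the metadata, and the write cache can
stay enabled.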