linux-fsdevel.vger.kernel.org archive mirror
From: Hannes Reinecke <hare@suse.de>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	Alberto Bursi <alberto.bursi@outlook.it>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: Is it possible that certain physical disk doesn't implement flush correctly?
Date: Sun, 31 Mar 2019 15:36:53 +0200
Message-ID: <a340c3a4-65e7-a1f1-89a0-6b922dfb8755@suse.de>
In-Reply-To: <371167e3-b1d1-48f5-e8a3-501cc41bddf6@gmx.com>

On 3/31/19 2:00 PM, Qu Wenruo wrote:
> 
> 
> On 2019/3/31 7:27 PM, Alberto Bursi wrote:
>>
>> On 30/03/19 13:31, Qu Wenruo wrote:
>>> Hi,
>>>
>>> I'm wondering whether some physical devices fail to handle flush
>>> correctly.
>>>
>>> E.g. does some vendor implement complex logic in their HDD controller to
>>> skip certain flush requests (but not all, obviously) to improve performance?
>>>
>>> Has anyone seen such reports?
>>>
>>> And if this has happened before, how can we users detect such a problem?
>>>
>>> Can we just compare the flush time against the writes issued before the flush?
>>> E.g. write X random blocks to the device, call fsync() on it, and measure
>>> the execution time. Repeat Y times, and compare the avg/std.
>>> Then change X to 2X/4X/... and repeat the check.
>>>
>>> Thanks,
>>> Qu
>>>
>>>
>>
>> Afaik HDDs and SSDs do lie to fsync()
> 
> fsync() on a block device is translated into a FLUSH bio.
> 
> If all/most consumer-level SATA HDD/SSD devices were lying, there would be
> no power-loss safety at all for any fs, as most filesystems rely on FLUSH
> bios to implement barriers.
> 
> And filesystems with a generation check would all report metadata from
> the future every time a crash happens, or even worse, gracefully
> unmounting the fs would cause corruption.
> 
Please, stop making assumptions.

Disks don't 'lie' about anything; they report things according to the
(SCSI) standard.
And the SCSI standard has two ways of ensuring that things are written
to disk: the SYNCHRONIZE_CACHE command and the FUA (force unit access)
bit in the command.
The latter provides a way of ensuring that a single command made it to
disk, and the former instructs the drive to:

"a) perform a write medium operation to the LBA using the logical block 
data in volatile cache; or
b) write the logical block to the non-volatile cache, if any."

which means it's perfectly fine to treat the write cache as a
_non-volatile_ cache if the RAID HBA is battery backed, and thus can
make sure that outstanding I/O can be written back even in the case of a
power failure.
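
This also bears on the timing test proposed at the top of the thread: since
the standard allows a flush to complete once data is in a non-volatile cache,
a short flush time is not by itself evidence of a skipped flush. For
illustration only, a minimal sketch of that measurement follows. The names
time_flush, summarize and run_sweep, the 4096-byte block size and the
1024-block span are assumptions made for this sketch, not anything taken from
the thread. Pointing it at a raw block device would overwrite data, so it
should only be run against a scratch file or a disposable device.

```python
import os
import random
import statistics
import time

BLOCK_SIZE = 4096  # assumed block size for this sketch


def time_flush(path, blocks_per_round, rounds, span_blocks=1024, seed=0):
    """Write `blocks_per_round` random blocks, call fsync(), time the fsync().

    Returns one fsync() duration in seconds per round.
    WARNING: this overwrites data at `path`.
    """
    rng = random.Random(seed)
    durations = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for _ in range(rounds):
            for _ in range(blocks_per_round):
                # Pick a random block-aligned offset inside the span.
                offset = rng.randrange(span_blocks) * BLOCK_SIZE
                os.pwrite(fd, os.urandom(BLOCK_SIZE), offset)
            start = time.perf_counter()
            os.fsync(fd)
            durations.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return durations


def summarize(durations):
    """Return (mean, population standard deviation) of the durations."""
    return statistics.mean(durations), statistics.pstdev(durations)


def run_sweep(path, sizes=(16, 32, 64, 128), rounds=10):
    """Repeat the measurement for X, 2X, 4X, ... blocks per round."""
    return {x: summarize(time_flush(path, x, rounds)) for x in sizes}
```

Calling run_sweep("/path/to/scratch.bin") returns a mapping from X to
(avg, std) of the fsync() time. If the average does not grow with X, that is
consistent with a skipped flush, but equally with a battery-backed or
otherwise non-volatile cache, so the numbers alone cannot tell the two apart.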

The FUA handling, OTOH, is another matter, and indeed raises some
eyebrows when compared to the spec. But that's another story.
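
As a side note for readers who want to see what the Linux block layer
believes about a given device: the queue attributes "write_cache" and "fua"
under /sys/block/<dev>/queue/ report the cache mode ("write back" or
"write through") and whether FUA writes are supported (0 or 1). The sketch
below only reads those two files. The helper names read_queue_attr and
describe_cache and the sysfs_root parameter are made up for this example, and
older kernels may not expose both attributes, in which case None is returned.

```python
from pathlib import Path


def read_queue_attr(device, attr, sysfs_root="/sys/block"):
    """Return the stripped contents of <sysfs_root>/<device>/queue/<attr>.

    Returns None if the attribute is missing or unreadable.
    """
    path = Path(sysfs_root) / device / "queue" / attr
    try:
        return path.read_text().strip()
    except OSError:
        return None


def describe_cache(device, sysfs_root="/sys/block"):
    """Summarize the block layer's view of the device's write cache and FUA."""
    write_cache = read_queue_attr(device, "write_cache", sysfs_root)
    fua = read_queue_attr(device, "fua", sysfs_root)
    return {
        # Raw value as reported by the kernel.
        "write_cache": write_cache,
        # "write back" means the kernel assumes a volatile cache and
        # therefore issues flushes to the device.
        "needs_flush": None if write_cache is None else write_cache == "write back",
        # "1" means the driver advertises FUA support for writes.
        "fua": None if fua is None else fua == "1",
    }
```

For example, describe_cache("sda") inspects /sys/block/sda/queue/. Note that
this shows what the device reported and what the kernel acts on; it says
nothing about how the firmware actually behaves.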

Cheers,

Hannes
-- 
Dr. Hannes Reinecke           Teamlead Storage & Networking
hare@suse.de                              +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
