linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kenichi Okuyama <okuyamak@dd.iij4u.or.jp>
To: jgarzik@pobox.com
Cc: gene.heskett@verizon.net, linux-kernel@vger.kernel.org
Subject: Re: Disk write cache
Date: Mon, 16 May 2005 02:20:40 +0900 (JST)	[thread overview]
Message-ID: <20050516.022040.81456242.okuyamak@dd.iij4u.or.jp> (raw)
In-Reply-To: <42877C1B.2030008@pobox.com>

>>>>> "Jeff" == Jeff Garzik <jgarzik@pobox.com> writes:

Jeff> Kenichi Okuyama wrote:
>>>>>>> "Jeff" == Jeff Garzik <jgarzik@pobox.com> writes:
>> 
>> 
Jeff> On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote:
>> 
>>>> On Sunday 15 May 2005 11:00, Mikulas Patocka wrote:
>>>> 
>>>>> On Sun, 15 May 2005, Tomasz Torcz wrote:
>>>>> 
>>>>>> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote:
>>>>>> 
>>>>>>>>>> However they've patched the FreeBSD kernel to
>>>>>>>>>> "workaround?" it:
>>>>>>>>>> ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht
>>>>>>>>>> t5.patch
>>>>>>>>> 
>>>>>>>>> That's a similar stupid idea as they did with the disk write
>>>>>>>>> cache (lowering the MTBFs of their disks by considerable
>>>>>>>>> factors, which is much worse than the power off data loss
>>>>>>>>> problem) Let's not go down this path please.
>>>>>>>> 
>>>>>>>> What wrong did they do with disk write cache?
>>>>>>> 
>>>>>>> They turned it off by default, which according to disk vendors
>>>>>>> lowers the MTBF of your disk to a fraction of the original
>>>>>>> value.
>>>>>>> 
>>>>>>> I bet the total amount of valuable data lost for FreeBSD users
>>>>>>> because of broken disks is much much bigger than what they
>>>>>>> gained from not losing in the rather hard to hit power off
>>>>>>> cases.
>>>>>> 
>>>>> Aren't I/O barriers a way to safely use write cache?
>>>>> 
>>>>> FreeBSD used these barriers (FLUSH CACHE command) long time ago.
>>>>> 
>>>>> There are rumors that some disks ignore FLUSH CACHE command just to
>>>>> get higher benchmarks in Windows. But I haven't heart of any proof.
>>>>> Does anybody know, what companies fake this command?
>>>>> 
>>>> 
>>>>> From a story I read elsewhere just a few days ago, this problem is 
>>>> virtually universal even in the umpty-bucks 15,000 rpm scsi server 
>>>> drives.  It appears that this is just another way to crank up the 
>>>> numbers and make each drive seem faster than its competition.
>>>> 
>>>> My gut feeling is that if this gets enough ink to get under the drive 
>>>> makers skins, we will see the issuance of a utility from the makers 
>>>> that will re-program the drives therefore enabling the proper 
>>>> handling of the FLUSH CACHE command.  This would be an excellent 
>>>> chance IMO, to make a bit of noise if the utility comes out, but only 
>>>> runs on windows.  In that event, we hold their feet to the fire (the 
>>>> prefereable method), or a wrapper is written that allows it to run on 
>>>> any os with a bash-like shell manager.
>> 
>> 
>> 
Jeff> There is a large amount of yammering and speculation in this thread.
>> 
Jeff> Most disks do seem to obey SYNC CACHE / FLUSH CACHE.
>> 
>> 
>> Then it must be file system who's not controlling properly.  And
>> because this is so widely spread among Linux, there must be at least
>> one bug existing in VFS ( or there was, and everyone copied it ).
>> 
>> At least, from:
>> 
>> http://developer.osdl.jp/projects/doubt/
>> 
>> there is project name "diskio" which does black box test about this:
>> 
>> http://developer.osdl.jp/projects/doubt/diskio/index.html
>> 
>> And if we assume for Read after Write access semantics of HDD for
>> "SURELY" checking the data image on disk surface ( by HDD, I mean ),
>> on both SCSI and ATA, ALL the file system does not pass the test.
>> 
>> And I was wondering who's bad. File system? Device driver of both
>> SCSI and ATA? or criterion? From Jeff's point, it seems like file
>> system or criterion...

Jeff> The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE 
Jeff> command to be generated has only been present in the most recent 2.6.x 
Jeff> kernels.  See the "write barrier" stuff that people have been discussing.

Jeff> Furthermore, read-after-write implies nothing at all.  The only way to 
Jeff> you can be assured that your data has "hit the platter" is
Jeff> (1) issuing [FLUSH|SYNC] CACHE, or
Jeff> (2) using FUA-style disk commands

Jeff> It sounds like your test (or reasoning) is invalid.

Thank you for you information, Jeff.

I didn't see the reason why my reasoning is invalid, for they are
black box test and doesn't care about implementation.

But with your explanation and some logs, I see where to look for.
I'll run test with FreeBSD as soon as I got time.
If FreeBSD fails, there must be something wrong with reasoning.

Thanks again for great hint.
regards,
---- 
Kenichi Okuyama

  parent reply	other threads:[~2005-05-15 17:23 UTC|newest]

Thread overview: 142+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-05-13  5:51 Hyper-Threading Vulnerability Gabor MICSKO
2005-05-13 12:47 ` Barry K. Nathan
2005-05-13 14:10   ` Jeff Garzik
2005-05-13 14:23     ` Daniel Jacobowitz
2005-05-13 14:32       ` Jeff Garzik
2005-05-13 17:13         ` Andy Isaacson
2005-05-13 18:30           ` Vadim Lobanov
2005-05-13 19:02             ` Andy Isaacson
2005-05-15  9:31               ` Adrian Bunk
2005-05-13 17:14         ` Gabor MICSKO
2005-05-13 20:23     ` Barry K. Nathan
2005-05-13 18:03 ` Andi Kleen
2005-05-13 18:34   ` Eric Rannaud
2005-05-13 18:35   ` Alan Cox
2005-05-13 18:49     ` Scott Robert Ladd
2005-05-13 19:08       ` Andi Kleen
2005-05-13 19:36       ` Grant Coady
2005-05-16 17:00       ` Linus Torvalds
2005-05-16 12:37         ` Tommy Reynolds
2005-05-18 19:07     ` Bill Davidsen
2005-05-13 18:38   ` Richard F. Rebel
2005-05-13 19:05     ` Andi Kleen
2005-05-13 21:26       ` Andy Isaacson
2005-05-13 21:59         ` Matt Mackall
2005-05-13 22:47           ` Alan Cox
2005-05-13 23:00             ` Lee Revell
2005-05-13 23:27               ` Dave Jones
2005-05-13 23:38                 ` Lee Revell
2005-05-13 23:44                   ` Dave Jones
2005-05-14  7:37                     ` Lee Revell
2005-05-14 15:33                       ` Andrea Arcangeli
2005-05-15  1:07                         ` Christer Weinigel
2005-05-15  9:48                         ` Andi Kleen
2005-05-14 15:23                   ` Alan Cox
2005-05-14 15:45                     ` andrea
2005-05-15 13:38                       ` Mikulas Patocka
2005-05-16  7:06                         ` andrea
2005-05-14 16:30                     ` Lee Revell
2005-05-14 16:44                       ` Arjan van de Ven
2005-05-14 17:56                         ` Lee Revell
2005-05-14 18:01                           ` Arjan van de Ven
2005-05-14 19:21                             ` Lee Revell
2005-05-14 19:48                               ` Arjan van de Ven
2005-05-14 23:40                                 ` Lee Revell
2005-05-15  7:30                                   ` Arjan van de Ven
2005-05-15 20:41                                     ` Alan Cox
2005-05-15 20:48                                       ` Arjan van de Ven
2005-05-15 21:10                                         ` Lee Revell
2005-05-15 22:55                                           ` Dave Jones
2005-05-15 23:10                                             ` Lee Revell
2005-05-16  7:25                                               ` Arjan van de Ven
2005-05-15  9:37                                   ` Andi Kleen
2005-05-15  3:19                                 ` dean gaudet
2005-05-15 10:01                             ` Andi Kleen
2005-05-15 10:23                               ` 2.6.4 timer and helper functions kernel
2005-05-19  0:38                                 ` George Anzinger
2005-05-15  9:33                           ` Hyper-Threading Vulnerability Adrian Bunk
2005-05-14 17:04                       ` Jindrich Makovicka
2005-05-14 18:27                         ` Lee Revell
2005-05-15  9:58                       ` Andi Kleen
2005-05-14  0:39         ` dean gaudet
2005-05-16 13:41           ` Andrea Arcangeli
2005-05-15  9:43         ` Andi Kleen
2005-05-15 18:42           ` David Schwartz
2005-05-15 18:56             ` Dr. David Alan Gilbert
2005-05-16  7:10           ` Eric W. Biederman
2005-05-16 11:04             ` Andi Kleen
2005-05-16 19:14               ` Eric W. Biederman
2005-05-16 20:05                 ` Valdis.Kletnieks
2005-05-15 14:00         ` Mikulas Patocka
2005-05-15 14:26         ` Andi Kleen
2005-05-13 23:32       ` Paul Jakma
2005-05-14 16:29         ` Paul Jakma
2005-05-13 19:14     ` Jim Crilly
2005-05-13 20:18       ` Barry K. Nathan
2005-05-13 23:14         ` Jim Crilly
2005-05-13 19:16   ` Diego Calleja
2005-05-13 19:42     ` Frank Denis (Jedi/Sector One)
2005-05-15  9:54     ` Andi Kleen
2005-05-15 13:51       ` Mikulas Patocka
2005-05-15 14:12         ` Andi Kleen
2005-05-15 14:21           ` Mikulas Patocka
2005-05-15 14:52           ` Tomasz Torcz
2005-05-15 15:00             ` Disk write cache (Was: Hyper-Threading Vulnerability) Mikulas Patocka
2005-05-15 15:21               ` Gene Heskett
2005-05-15 15:29                 ` Jeff Garzik
2005-05-15 16:27                   ` Disk write cache Kenichi Okuyama
2005-05-15 16:43                     ` Jeff Garzik
2005-05-15 16:50                       ` Kyle Moffett
2005-05-15 16:56                       ` Andi Kleen
2005-05-15 20:44                         ` Andrew Morton
2005-05-15 23:31                           ` Cache based insecurity/CPU cache/Disk Cache Tradeoffs Brian O'Mahoney
2005-05-15 16:58                       ` Disk write cache Mikulas Patocka
2005-05-15 17:20                       ` Kenichi Okuyama [this message]
2005-05-16 11:02                       ` Linux does not care for data integrity (was: Disk write cache) Matthias Andree
2005-05-16 11:12                         ` Arjan van de Ven
2005-05-16 11:29                           ` Matthias Andree
2005-05-16 14:02                             ` Arjan van de Ven
2005-05-16 14:48                               ` Matthias Andree
2005-05-16 15:06                                 ` Alan Cox
2005-05-16 15:40                                   ` Matthias Andree
2005-05-16 18:04                                     ` Alan Cox
2005-05-16 19:11                                       ` Linux does not care for data integrity Florian Weimer
2005-05-29 21:02                                   ` Linux does not care for data integrity (was: Disk write cache) Greg Stark
2005-05-29 21:16                                     ` Matthias Andree
2005-05-30  6:04                                       ` Greg Stark
2005-05-30  8:21                                         ` Matthias Andree
2005-06-01 19:02                                       ` Linux does not care for data integrity Bill Davidsen
2005-06-01 22:02                                         ` Matthias Andree
2005-06-02  0:12                                           ` Bill Davidsen
2005-06-02  0:36                                         ` Jeff Garzik
2005-06-02  1:37                                           ` Bill Davidsen
2005-06-02  1:54                                             ` Jeff Garzik
2005-06-02  8:53                                         ` Helge Hafting
2005-06-02 12:00                                           ` Bill Davidsen
2005-06-02 13:33                                             ` Lennart Sorensen
2005-06-04 13:37                                               ` Bill Davidsen
2005-06-04 15:31                                                 ` Bernd Eckenfels
2005-05-16 14:57                           ` Linux does not care for data integrity (was: Disk write cache) Alan Cox
2005-05-16 13:48                         ` Linux does not care for data integrity Mark Lord
2005-05-16 14:59                           ` Matthias Andree
2005-05-16  1:56                   ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett
2005-05-16  2:11                     ` Jeff Garzik
2005-05-16  2:24                     ` Mikulas Patocka
2005-05-16  3:05                       ` Gene Heskett
2005-05-16  2:32                     ` Mark Lord
2005-05-16  3:08                       ` Gene Heskett
2005-05-16 13:44                         ` Mark Lord
2005-05-18  4:03                       ` Eric D. Mudama
2005-05-15 16:24                 ` Mikulas Patocka
2005-05-16 11:18                   ` Matthias Andree
2005-05-16 14:33                     ` Jeff Garzik
2005-05-16 15:26                       ` Richard B. Johnson
2005-05-16 16:00                         ` [OT] drive behavior on power-off (was: Disk write cache) Matthias Andree
2005-05-16 18:11                       ` Disk write cache (Was: Hyper-Threading Vulnerability) Valdis.Kletnieks
2005-05-16 14:54                     ` Alan Cox
2005-05-17 13:15                       ` Bill Davidsen
2005-05-17 21:41                         ` Kyle Moffett
2005-05-18  4:06                     ` Eric D. Mudama
2005-05-15 21:38                 ` Tomasz Torcz
2005-05-16 14:50               ` Alan Cox
2005-05-15 15:00             ` Hyper-Threading Vulnerability Arjan van de Ven

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050516.022040.81456242.okuyamak@dd.iij4u.or.jp \
    --to=okuyamak@dd.iij4u.or.jp \
    --cc=gene.heskett@verizon.net \
    --cc=jgarzik@pobox.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).