Date: Tue, 2 Dec 2003 15:16:49 -0500
From: Jeff Garzik
To: Greg Stark
Cc: Mike Fedyk, Erik Steffl, linux-kernel@vger.kernel.org
Subject: Re: libata in 2.4.24?
Message-ID: <20031202201649.GB17779@gtf.org>
In-Reply-To: <877k1f9e1g.fsf@stark.dyndns.tv>

On Tue, Dec 02, 2003 at 03:10:19PM -0500, Greg Stark wrote:
> Jeff Garzik writes:
>
> > So, today, no acknowledgement occurs until the data _really_ is in the
> > drive's buffers.
>
> The drive's buffers isn't good enough. If power is lost the write will be lost
> and the database corrupt. It needs to be on the platters.

Certainly agreed.

> > > This doesn't happen with SCSI disks where multiple requests can be pending so
> > > there's no urgency to reporting a false success. The request doesn't complete
> > > until the write hits disk. As a result SCSI disks are reliable for database
> > > operation and IDE disks aren't unless write caching is disabled.
> >
> > This is not really true.
> >
> > Regardless of TCQ, if the OS driver has not issued a FLUSH CACHE (IDE)
> > or SYNCHRONIZE CACHE (SCSI), then the data is not guaranteed to be on
> > the disk media. Plain and simple.
>
> That doesn't agree with people's experience. People seem to find that SCSI
> drives never cache writes. This sort of makes sense since there's just not
> much reason to report a write success before the write can be performed.
> There's no performance advantage as long as more requests can be queued up.

Some IDE _and/or_ SCSI drives do not cache writes. For these drives,
the _absence_ of an OS flush-cache command still means your data gets to
the platter.

The core problem is not issuing a flush-cache command, it sounds like.
The drive technology (wcache, or no) is largely irrelevant.

> > If fsync(2) returns without a flush-cache, then your data is not
> > guaranteed to be on the disk. And as you noted, flush-cache destroys
> > performance.
>
> It's my understanding that it doesn't. There was some discussion in the past

eh? flush-cache very definitely hurts performance, on both IDE and
SCSI, for drives that support write caching.

> > There are three levels:
> >
> > a) Data is successfully transferred to the controller/drive queue (TCQ).
> > b) Data is successfully transferred to the drive's internal buffers.
> > c) The drive successfully transfers data to the media.
>
> Only the third is of interest to Postgres or other databases. In fact, I

Certainly.

> suspect only the third is of interest to other systems that are supposed to be
> reliable like MTAs etc. I think Wietse and others would be shocked if they
> were told fsync wasn't guaranteed to have waited until the writes had actually
> hit the media.

As well he should be :)

	Jeff
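
To make the fsync(2) point above concrete from the application side, here
is a minimal sketch in Python (picked only for brevity). The function name
durable_write and the file name are hypothetical, not taken from Postgres,
any MTA, or the kernel. It shows the most an application can do: write,
call fsync, and check for errors. Whether a successful return corresponds
to level (c) rather than level (b) is an assumption about the kernel driver
issuing a flush-cache command and the drive honoring it; the code itself
cannot verify that.

```python
import os
import tempfile


def durable_write(path: str, payload: bytes) -> int:
    """Write payload to path, then call fsync(2) on the descriptor.

    Returns the number of bytes written. Success means only that fsync()
    reported no error; it does NOT prove the data reached the media. That
    depends on the driver issuing FLUSH CACHE / SYNCHRONIZE CACHE and on
    the drive honoring it (an assumption this code cannot check).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop.
        while written < len(payload):
            written += os.write(fd, payload[written:])
        # Raises OSError if the kernel reports a failure to flush.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    # Hypothetical usage: persist one record to a scratch file.
    with tempfile.TemporaryDirectory() as scratch:
        record_path = os.path.join(scratch, "commit.log")
        count = durable_write(record_path, b"commit record\n")
        print("bytes written and fsync'ed:", count)
```

An application that cannot trust the stack below fsync has no portable
way to detect that from here; disabling the drive's write cache, as
mentioned earlier in the thread, is the workaround on the hardware side.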