From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: with ECARTIS (v1.0.0; list xfs); Sun, 25 Mar 2007 20:14:29 -0700 (PDT)
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130])
	by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l2Q3EL6p003689
	for ; Sun, 25 Mar 2007 20:14:24 -0700
Date: Mon, 26 Mar 2007 14:14:07 +1100
From: David Chinner
Subject: Re: XFS and write barriers.
Message-ID: <20070326031407.GG32597093@melbourne.sgi.com>
References: <17923.11463.459927.628762@notabene.brown>
	<20070323053043.GD32602149@melbourne.sgi.com>
	<17923.34462.210758.852042@notabene.brown>
	<20070325041755.GJ32602149@melbourne.sgi.com>
	<17927.1031.996460.858328@notabene.brown>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <17927.1031.996460.858328@notabene.brown>
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
List-Id: xfs
To: Neil Brown
Cc: David Chinner , xfs@oss.sgi.com, hch@infradead.org

On Mon, Mar 26, 2007 at 09:21:43AM +1000, Neil Brown wrote:
> My point was that if the functionality cannot be provided in the
> lowest-level firmware (as it cannot for raid0 as there is no single
> lowest-level firmware), then it should be implemented at the
> filesystem level. Implementing barriers in md or dm doesn't make any
> sense (though passing barriers through can in some situations).

Hold on - you've said that the barrier support in a block device can
change because of MD doing hot swap. Now you're saying there is no
barrier implementation in md.

Can you explain *exactly* what barrier support there is in MD?

> > > Trying to emulate it
> > > in the md layer doesn't make any sense as the filesystem is in a much
> > > better position to do any emulation required.
> >
> > You're saying that the emulation of block layer functionality is the
> > responsibility of layers above the block layer. Why is this not
> > considered a layering violation?
> > :-)
>
> Maybe it depends on your perspective. I think this is filesystem
> layer functionality. Making sure blocks are written in the right
> order sounds like something that the filesystem should be primarily
> responsible for.

Sure, but only if the filesystem requires the block layer to provide
those ordering semantics to it, e.g. via barrier I/Os. Remember,
different filesystems have different levels of data+metadata safety,
and many of them do nothing to guarantee write ordering.

> The most straight-forward way to implement this is to make sure all
> preceding blocks have been written before writing the barrier block.
> All filesystems should be able to do this (if it is important to them).
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^

And that is the key point - XFS provides no guarantee that your data
is on spinning rust other than via I/O barriers when you have volatile
write caches. IOWs, if you turn barriers off, we provide *no
guarantees* about the consistency of your filesystem after a power
failure if you are using volatile write caching. This mode is for use
with non-cached disks or disks with NVRAM caches, where there is no
need for barriers.

> Because block IO tends to have long pipelines and because this
> operation will stall the pipeline, it makes sense for a block IO
> subsystem to provide the possibility of implementing this sequencing
> without a complete stall, and the 'barrier' flag makes that possible.
> But that doesn't mean it is block-layer functionality. It means (to
> me) it is common fs functionality that the block layer is helping out
> with.

I disagree - it is a function supported and defined by the block
layer. The errors returned to the filesystem are defined by the block
layer, the ordering guarantees are provided by the block layer, and
the changes in semantics appear to be defined by the block layer......

> > wait for all queued I/Os to complete
> > call blkdev_issue_flush
> > schedule the write of the 'barrier' block
> > call blkdev_issue_flush again.
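For reference, that four-step sequence looks like this in user-space
terms (a minimal sketch only, not kernel code: os.fsync() stands in
for blkdev_issue_flush, and the path and function name are
hypothetical):

```python
import os

def write_with_barrier(path, preceding_blocks, barrier_block):
    """Emulate a barrier write using only a flush primitive."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # 1. Issue all the preceding writes.
        for block in preceding_blocks:
            os.write(fd, block)
        # 2. Wait for them to reach stable storage
        #    ("wait for all queued I/Os" + the first flush).
        os.fsync(fd)
        # 3. Write the 'barrier' block itself.
        os.write(fd, barrier_block)
        # 4. Flush again so the barrier block is also stable.
        os.fsync(fd)
    finally:
        os.close(fd)

write_with_barrier("/tmp/barrier_demo", [b"data1", b"data2"], b"commit")
print("barrier sequence complete")
```

The point of contention below is where steps 1-4 should live: the
block layer can only wait for *all* outstanding I/O, while the
filesystem knows which writes actually precede the barrier.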
> >
> > And not involve the filesystem at all? i.e. why should the filesystem
> > have to do this?
>
> Certainly it could.
> However
> a/ The block layer would have to wait for *all* queued I/O,
>    where-as the filesystem would only have to wait for queued IO
>    which has a semantic dependence on the barrier block. So the
>    filesystem can potentially perform the operation more efficiently.

Assuming the filesystem can do it more efficiently. What if it can't?
What if, like XFS with barriers turned off, the filesystem provides
*no* guarantees?

> b/ Some block devices don't support barriers, so the filesystem needs
>    to have the mechanisms in place to do this already.

No, you turn write caching off on the drive. This is an especially
important consideration given that many older drives lied about cache
flushes being complete (i.e. they were implemented as no-ops).

> (c/ md/raid0 doesn't track all the outstanding requests...:-)

XFS doesn't track all outstanding requests either....

> What did XFS do before the block layer supported barriers?

Either turn off write caching or use non-volatile write caches.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group