From: Neil Brown <neilb@suse.de>
To: David Chinner <dgc@sgi.com>
Cc: xfs@oss.sgi.com, hch@infradead.org
Subject: Re: XFS and write barriers.
Date: Mon, 26 Mar 2007 09:21:43 +1000
Message-ID: <17927.1031.996460.858328@notabene.brown>
In-Reply-To: message from David Chinner on Sunday March 25

On Sunday March 25, dgc@sgi.com wrote:
> > Barriers only make sense inside drive firmware.
> 
> I disagree. e.g. Barriers have to be handled by the block layer to
> prevent reordering of I/O in the request queues as well. The
> block layer is responsible for ensuring barrier I/Os, as
> indicated by the filesystem, act as real barriers.

Absolutely.  The block layer needs to understand about barriers and
allow them to do their job, which means not re-ordering requests
around barriers.
My point was that if the functionality cannot be provided in the
lowest-level firmware (and for raid0 it cannot, because there is no
single lowest-level firmware), then it should be implemented at the
filesystem level.  Implementing barriers in md or dm doesn't make any
sense (though passing barriers through can in some situations).

> 
> > Trying to emulate it
> > in the md layer doesn't make any sense as the filesystem is in a much
> > better position to do any emulation required.
> 
> You're saying that the emulation of block layer functionality is the
> responsibility of layers above the block layer. Why is this not
> considered a layering violation?

:-)
Maybe it depends on your perspective.  I think this is filesystem
layer functionality.  Making sure blocks are written in the right
order sounds like something that the filesystem should be primarily
responsible for.

The most straightforward way to implement this is to make sure all
preceding blocks have been written before writing the barrier block.
All filesystems should be able to do this (if it is important to
them).

Because block IO tends to have long pipelines and because this
operation will stall the pipeline, it makes sense for a block IO
subsystem to provide the possibility of implementing this sequencing
without a complete stall, and the 'barrier' flag makes that possible.
But that doesn't mean it is block-layer functionality.  It means (to
me) it is common fs functionality that the block layer is helping out
with.
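
To illustrate what "not re-ordering requests around barriers" means
for a queue, here is a toy model in Python.  It is only a sketch, not
kernel code: the Request class and schedule() function are
hypothetical names, and sorting by sector is an assumed stand-in for
whatever an I/O scheduler really does.  Requests are reordered freely
within a segment, but never across a request flagged as a barrier, so
the pipeline need not drain completely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    sector: int
    barrier: bool = False  # hypothetical flag: "do not reorder across me"


def schedule(queue):
    """Sort requests by sector, but only within barrier-delimited segments."""
    out, segment = [], []
    for req in queue:
        if req.barrier:
            # Everything queued before the barrier is issued first...
            out.extend(sorted(segment, key=lambda r: r.sector))
            # ...then the barrier itself, and a new segment starts after it.
            out.append(req)
            segment = []
        else:
            segment.append(req)
    out.extend(sorted(segment, key=lambda r: r.sector))
    return out


queue = [Request(90), Request(10), Request(500, barrier=True),
         Request(70), Request(20)]
order = [r.sector for r in schedule(queue)]
print(order)  # [10, 90, 500, 20, 70]
```

Sectors 70 and 20 are sorted relative to each other, but neither
moves ahead of the barrier request at sector 500.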

> > 
> > There should never be a possibility of filesystem corruption.
> > If a barrier request fails, the filesystem should:
> >   wait for any dependent request to complete
> >   call blkdev_issue_flush
> >   schedule the write of the 'barrier' block
> >   call blkdev_issue_flush again.
> 
> IOWs, the filesystem has to use block device calls to emulate a block device
> barrier I/O. Why can't the block layer, on reception of a barrier write
> and detecting that barriers are no longer supported by the underlying
> device (i.e. in MD), do:
> 
> 	wait for all queued I/Os to complete
> 	call blkdev_issue_flush
> 	schedule the write of the 'barrier' block
> 	call blkdev_issue_flush again.
> 
> And not involve the filesystem at all? i.e. why should the filesystem
> have to do this?

Certainly it could.
However
 a/ The block layer would have to wait for *all* queued I/O,
    whereas the filesystem would only have to wait for queued I/O
    which has a semantic dependence on the barrier block.  So the
    filesystem can potentially perform the operation more efficiently.
 b/ Some block devices don't support barriers, so the filesystem needs
    to have the mechanisms in place to do this already.  Why duplicate
    it in the block layer?
(c/ md/raid0 doesn't track all the outstanding requests...:-)
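
Point a/ can be made concrete with a small Python sketch of the
four-step fallback (wait, flush, write, flush).  This is a toy model
under assumptions, not kernel code: fallback_barrier_write() and the
request names are hypothetical, and "flush" merely stands in for a
call to blkdev_issue_flush.  Passing depends_on=None models a layer
that cannot tell which requests matter and so waits for everything.

```python
def fallback_barrier_write(in_flight, depends_on, barrier_block):
    """Return the ordered steps used to emulate a barrier write.

    in_flight     -- ids of requests currently queued (hypothetical)
    depends_on    -- ids the barrier block semantically depends on,
                     or None if the caller cannot tell (wait for all)
    barrier_block -- id of the block that must be written last
    """
    if depends_on is None:
        wait_for = list(in_flight)
    else:
        wait_for = [r for r in in_flight if r in depends_on]
    steps = [("wait", r) for r in wait_for]
    steps.append(("flush", None))           # stands in for blkdev_issue_flush
    steps.append(("write", barrier_block))  # the 'barrier' block itself
    steps.append(("flush", None))           # flush again after the write
    return steps


in_flight = ["data1", "data2", "log7", "log8"]

# A layer with no knowledge of dependencies waits for every request.
block_layer = fallback_barrier_write(in_flight, None, "commit")

# A filesystem that knows only the log writes matter waits for those.
filesystem = fallback_barrier_write(in_flight, {"log7", "log8"}, "commit")

waits_block = [s for s in block_layer if s[0] == "wait"]
waits_fs = [s for s in filesystem if s[0] == "wait"]
print(len(waits_block), len(waits_fs))  # 4 2
```

Both callers end with the same flush/write/flush tail; the only
difference is how many queued requests they have to wait for first.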

I think the block device should support barriers when it can do so
more efficiently than the filesystem.  For a single SCSI drive, it
can.  For a logical volume striped over multiple physical devices, it
cannot.

> 
> > My understanding is that that sequence is as safe as a barrier, but maybe
> > not as fast.
> 
> Yes, and my understanding is that the block device is perfectly capable
> of implementing this just as safely as the filesystem.
> 

But possibly not as efficiently...

What did XFS do before the block layer supported barriers?


> > The patch looks at least believable.  As you can imagine it is awkward
> > to test thoroughly.
> 
> As well as being pretty much impossible to test reliably with an
> automated testing framework. Hence ongoing test coverage will
> approach zero.....

This is a problem with barriers in general.... it is very hard to test
that the data is encoded on the platter at any given time :-(

NeilBrown
