From: David Chinner <dgc@sgi.com>
To: Neil Brown <neilb@suse.de>
Cc: David Chinner <dgc@sgi.com>, xfs@oss.sgi.com, hch@infradead.org
Subject: Re: XFS and write barriers.
Date: Sun, 25 Mar 2007 15:17:55 +1100
Message-ID: <20070325041755.GJ32602149@melbourne.sgi.com>
In-Reply-To: <17923.34462.210758.852042@notabene.brown>

On Fri, Mar 23, 2007 at 06:49:50PM +1100, Neil Brown wrote:
> On Friday March 23, dgc@sgi.com wrote:
> > On Fri, Mar 23, 2007 at 12:26:31PM +1100, Neil Brown wrote:
> > > Secondly, if a barrier write fails due to EOPNOTSUPP, it should be
> > > retried without the barrier (after possibly waiting for dependent
> > > requests to complete).  This is what other filesystems do, but I
> > > cannot find the code in xfs which does this.
> > 
> > XFS doesn't handle this - I was unaware that the barrier status of the
> > underlying block device could change....
> > 
> > OOC, when did this behaviour get introduced?
> 
> Probably when md/raid1 started supporting barriers....
> 
> The problem is that this interface is (as far as I can see) undocumented
> and not fully specified.

And not communicated very far, either.

> Barriers only make sense inside drive firmware.

I disagree. For example, barriers have to be handled by the block layer to
prevent reordering of I/O in the request queues as well. The
block layer is responsible for ensuring barrier I/Os, as
indicated by the filesystem, act as real barriers.
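
To illustrate the request-queue point: an I/O scheduler that sorts requests by sector may reorder freely between barriers, but must never move a request across one in either direction. The sketch below is a simplified userspace model under that assumption; `struct req` and `elevator_sort` are hypothetical and far simpler than any real kernel elevator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request record; a real block-layer request carries much more. */
struct req {
    unsigned long sector;
    bool barrier;
};

static int cmp_sector(const void *a, const void *b)
{
    const struct req *x = a, *y = b;
    return (x->sector > y->sector) - (x->sector < y->sector);
}

/*
 * Sort requests by sector to reduce seeking, but only within each run of
 * requests between barriers. A barrier keeps its position, so nothing
 * issued before it can be moved after it, and vice versa.
 */
static void elevator_sort(struct req *q, size_t n)
{
    size_t start = 0;

    for (size_t i = 0; i <= n; i++) {
        if (i == n || q[i].barrier) {
            qsort(q + start, i - start, sizeof(*q), cmp_sector);
            start = i + 1;  /* the barrier at index i stays put */
        }
    }
}
```

For a queue of sectors 50, 10, barrier 30, 40, 20 this yields 10, 50, barrier 30, 20, 40: the two halves are sorted independently and the barrier still separates them.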

> Trying to emulate it
> in the md layer doesn't make any sense as the filesystem is in a much
> better position to do any emulation required.

You're saying that the emulation of block layer functionality is the
responsibility of layers above the block layer. Why is this not
considered a layering violation?

> > > This is particularly important for md/raid1 as it is quite possible
> > > that barriers will be supported at first, but after a failure a
> > > different device on a different controller could be swapped in that
> > > does not support barriers.
> > 
> > I/O errors are not the way this should be handled. What happens if
> > the opposite happens? A drive that needs barriers is used as a
> > replacement on a filesystem that has barriers disabled because they
> > weren't needed? Now a crash can result in filesystem corruption, but
> > the filesystem has not been able to warn the admin that this
> > situation occurred. 
> 
> There should never be a possibility of filesystem corruption.
> If a barrier request fails, the filesystem should:
>   wait for any dependent request to complete
>   call blkdev_issue_flush
>   schedule the write of the 'barrier' block
>   call blkdev_issue_flush again.
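
The filesystem-side fallback quoted above can be sketched as follows. This is a userspace model, not kernel code: `struct mock_dev` and the `mock_*` helpers are hypothetical stand-ins for submitting a barrier write, waiting on outstanding I/O and calling blkdev_issue_flush; only the order of the four steps comes from the message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical in-memory device model used only for this sketch. */
struct mock_dev {
    bool supports_barriers; /* can change at runtime, e.g. after a raid1 member swap */
    int in_flight;          /* outstanding writes the barrier block depends on */
    int writes;             /* blocks written so far */
    int flushes;            /* cache flushes issued so far */
};

/* Stand-in for a barrier write; fails the way the thread describes. */
static int mock_barrier_write(struct mock_dev *dev)
{
    if (!dev->supports_barriers)
        return -EOPNOTSUPP;
    dev->writes++;
    return 0;
}

static void mock_wait_in_flight(struct mock_dev *dev) { dev->in_flight = 0; }
static void mock_flush(struct mock_dev *dev) { dev->flushes++; }
/* Plain write; synchronous in this model, so the following flush covers it. */
static void mock_plain_write(struct mock_dev *dev) { dev->writes++; }

/* Try the barrier write; on EOPNOTSUPP, emulate it with waits and flushes. */
static int fs_write_barrier_block(struct mock_dev *dev)
{
    int err = mock_barrier_write(dev);

    if (err != -EOPNOTSUPP)
        return err;             /* success, or an error the fallback cannot fix */

    mock_wait_in_flight(dev);   /* 1. wait for dependent requests to complete */
    mock_flush(dev);            /* 2. push them to stable storage */
    mock_plain_write(dev);      /* 3. write the 'barrier' block without the flag */
    mock_flush(dev);            /* 4. flush again so the block itself is stable */
    return 0;
}
```

With barriers supported the block is written once with no explicit flushes; once support disappears, the same call drains outstanding I/O and issues two flushes around a plain write.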

IOWs, the filesystem has to use block device calls to emulate a block device
barrier I/O. Why can't the block layer, on reception of a barrier write
and detecting that barriers are no longer supported by the underlying
device (i.e. in MD), do:

	wait for all queued I/Os to complete
	call blkdev_issue_flush
	schedule the write of the 'barrier' block
	call blkdev_issue_flush again.

And not involve the filesystem at all? i.e. why should the filesystem
have to do this?
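
The alternative proposed here moves the same sequence below the filesystem. A minimal userspace sketch, assuming a hypothetical two-way mirror (`struct mirror`, `struct member` and all helpers are illustrative, not md's actual code): the layer checks whether every member still supports barriers and, if not, emulates the barrier itself, so the caller never sees EOPNOTSUPP.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_MEMBERS 4

/* Hypothetical member device of a mirror; not an md data structure. */
struct member {
    bool supports_barriers;
    int writes;
    int flushes;
};

/* Hypothetical stacking layer (think md/raid1) sitting under the filesystem. */
struct mirror {
    struct member m[MAX_MEMBERS];
    size_t n;
    int in_flight;              /* queued I/O the barrier must wait for */
};

static bool all_support_barriers(const struct mirror *md)
{
    for (size_t i = 0; i < md->n; i++)
        if (!md->m[i].supports_barriers)
            return false;
    return true;
}

static void flush_all(struct mirror *md)
{
    for (size_t i = 0; i < md->n; i++)
        md->m[i].flushes++;
}

/* Writes are synchronous in this model, so a later flush covers them. */
static void write_all(struct mirror *md)
{
    for (size_t i = 0; i < md->n; i++)
        md->m[i].writes++;
}

/*
 * Barrier write as seen by the filesystem. It always returns 0 here:
 * when a member has lost barrier support, the layer emulates the barrier.
 */
static int mirror_barrier_write(struct mirror *md)
{
    if (all_support_barriers(md)) {
        write_all(md);          /* pass the barrier write down to every member */
        return 0;
    }
    md->in_flight = 0;          /* wait for all queued I/Os to complete */
    flush_all(md);              /* flush every member */
    write_all(md);              /* write the 'barrier' block to every member */
    flush_all(md);              /* flush again */
    return 0;
}
```

The design point this illustrates is that the emulation is written once, in the layer that knows when a member's capabilities change (for example when a replacement disk is added), rather than in every filesystem above it.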

> My understanding is that that sequence is as safe as a barrier, but maybe
> not as fast.

Yes, and my understanding is that the block device is perfectly capable
of implementing this just as safely as the filesystem.

> The patch looks at least believable.  As you can imagine it is awkward
> to test thoroughly.

As well as being pretty much impossible to test reliably with an
automated testing framework. Hence ongoing test coverage will
approach zero.....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
