linux-kernel.vger.kernel.org archive mirror
From: Alex Bligh <alex@alex.org.uk>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alex Bligh <alex@alex.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, nbd-general@lists.sf.net,
	Paul Clements <Paul.Clements@steeleye.com>
Subject: Re: [PATCH 1/3] nbd: support FLUSH requests
Date: Wed, 13 Feb 2013 15:55:01 +0000
Message-ID: <3EDEF735-9A67-439E-BA65-089C6AAFD1BF@alex.org.uk>
In-Reply-To: <511B8E65.5020507@redhat.com>

Paolo,

On 13 Feb 2013, at 13:00, Paolo Bonzini wrote:

> But as far as I can test with free servers, the FUA bits have no
> advantage over flush.  Also, I wasn't sure if SEND_FUA without
> SEND_FLUSH is valid, and if so how to handle this combination (treat it
> as writethrough and add FUA to all requests? warn and do nothing?).

On the main open-source nbd client, the following applies:

What REQ_FUA does is an fdatasync() after the write. Code extract and
comments below from Christoph Hellwig.

What REQ_FLUSH does is to do an fsync().

The way I read Christoph's comment, provided the Linux block layer always
issues a REQ_FLUSH before a REQ_FUA, there is no performance problem.

However, a REQ_FUA is going to do an f(data)?sync AFTER the write, whereas
the preceding REQ_FLUSH is going to do an fsync() BEFORE the write. It seems
to me, therefore, that either the FUA and FLUSH semantics are different
(and we need FUA), or Christoph's comment is wrong and you
are guaranteed a REQ_FLUSH *after* the write with REQ_FUA.
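
The ordering difference described above can be sketched as follows. This
is a minimal illustration, not the nbd code itself: handle_flush(),
handle_write() and fua_demo() are hypothetical names, and the sketch
assumes a Linux/POSIX system where fsync(), fdatasync() and pwrite() are
available.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: service a flush request. The fsync() only covers
 * writes that completed BEFORE it was issued. */
static int handle_flush(int fd)
{
    return fsync(fd);
}

/* Hypothetical helper: service a write, optionally with FUA semantics.
 * The fdatasync() runs AFTER the write, so it covers this write itself. */
static int handle_write(int fd, const void *buf, size_t len, off_t off,
                        bool fua)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n <= 0)
            return -1;          /* treat short-circuit and errors alike */
        p += n;
        len -= (size_t)n;
        off += n;
    }
    if (fua && fdatasync(fd) != 0)
        return -1;
    return 0;
}

/* Hypothetical demo: a flush followed by a FUA write, as a Linux client
 * is said to issue them. Returns 0 if the data reads back intact. */
int fua_demo(const char *path)
{
    char out[4] = {0};
    int rc = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (handle_flush(fd) == 0 &&                      /* sync BEFORE write */
        handle_write(fd, "abcd", 4, 0, true) == 0 &&  /* sync AFTER write */
        pread(fd, out, sizeof out, 0) == 4 &&
        memcmp(out, "abcd", 4) == 0)
        rc = 0;
    close(fd);
    unlink(path);
    return rc;
}
```

In this sketch the flush cannot make the FUA write durable, because the
write has not happened yet when fsync() runs; only the trailing
fdatasync() covers it.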

-- 
Alex Bligh




        } else if (fua) {

          /* This is where we would do the following
           *   #ifdef USE_SYNC_FILE_RANGE
           * However, we don't, for the reasons set out below
           * by Christoph Hellwig <hch@infradead.org>
           *
           * [BEGINS] 
           * fdatasync is equivalent to fsync except that it does not flush
           * non-essential metadata (basically just timestamps in practice), but it
           * does flush metadata required to find the data again, e.g. allocation
           * information and extent maps.  sync_file_range does nothing but flush
           * out pagecache content - it means you basically won't get your data
           * back in case of a crash if you either:
           * 
           *  a) have a volatile write cache in your disk (e.g. any normal SATA disk)
           *  b) are using a sparse file on a filesystem
           *  c) are using a fallocate-preallocated file on a filesystem
           *  d) use any file on a COW filesystem like btrfs
           * 
           * e.g. it only does anything useful for you if you do not have a volatile
           * write cache, and either use a raw block device node, or just overwrite
           * an already fully allocated (and not preallocated) file on a non-COW
           * filesystem.
           * [ENDS]
           *
           * What we should do is open a second FD with O_DSYNC set, then write to
           * that when appropriate. However, with a Linux client, every REQ_FUA
           * immediately follows a REQ_FLUSH, so fdatasync does not cause performance
           * problems.
           *
           */
#if 0
                sync_file_range(fhandle, foffset, len,
                                SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
                                SYNC_FILE_RANGE_WAIT_AFTER);
#else
                fdatasync(fhandle);
#endif
        }
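
The alternative mentioned in the comment above (a second FD opened with
O_DSYNC, used only for FUA writes) might look roughly like the sketch
below. This is an assumption about how such a scheme could be laid out,
not code from nbd: struct export_fds, export_open(), export_write(),
export_close() and dsync_demo() are all hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pair of descriptors referring to the same backing file. */
struct export_fds {
    int plain;  /* ordinary writes, made durable later by a flush */
    int dsync;  /* O_DSYNC: write returns after the data is synced */
};

static int export_open(struct export_fds *e, const char *path)
{
    e->plain = open(path, O_RDWR | O_CREAT, 0600);
    if (e->plain < 0)
        return -1;
    e->dsync = open(path, O_RDWR | O_DSYNC);
    if (e->dsync < 0) {
        close(e->plain);
        return -1;
    }
    return 0;
}

/* Route FUA writes to the O_DSYNC descriptor, everything else to the
 * plain one, so no separate fdatasync() call is needed per FUA write. */
static ssize_t export_write(const struct export_fds *e, const void *buf,
                            size_t len, off_t off, bool fua)
{
    return pwrite(fua ? e->dsync : e->plain, buf, len, off);
}

static void export_close(struct export_fds *e)
{
    close(e->dsync);
    close(e->plain);
}

/* Hypothetical demo: one plain write, one FUA write, then read back. */
int dsync_demo(const char *path)
{
    struct export_fds e;
    char out[2] = {0};
    int rc = -1;

    if (export_open(&e, path) != 0)
        return -1;
    if (export_write(&e, "a", 1, 0, false) == 1 &&
        export_write(&e, "b", 1, 1, true) == 1 &&
        pread(e.plain, out, 2, 0) == 2 &&
        out[0] == 'a' && out[1] == 'b')
        rc = 0;
    export_close(&e);
    unlink(path);
    return rc;
}
```

The point of the two descriptors is that O_DSYNC applies to every write
on the descriptor it was opened with, so non-FUA writes keep using the
plain one and stay cheap.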




Thread overview: 20+ messages
2013-02-12 16:06 [PATCH 0/3] NBD fixes for caching and block device flags Paolo Bonzini
2013-02-12 16:06 ` [PATCH 1/3] nbd: support FLUSH requests Paolo Bonzini
2013-02-12 17:37   ` Alex Bligh
2013-02-12 18:06     ` Paolo Bonzini
2013-02-12 21:32       ` Andrew Morton
2013-02-13  0:03         ` Alex Bligh
2013-02-13 13:00           ` Paolo Bonzini
2013-02-13 15:55             ` Alex Bligh [this message]
2013-02-13 16:02               ` Paolo Bonzini
2013-02-13 17:35                 ` Alex Bligh
2013-02-13  0:00       ` Alex Bligh
2013-02-12 22:07   ` Paul Clements
2013-02-12 16:06 ` [PATCH 2/3] nbd: fsync and kill block device on shutdown Paolo Bonzini
2013-02-12 21:41   ` Andrew Morton
2013-02-13 13:05     ` Paolo Bonzini
2013-02-12 22:15   ` Paul Clements
2013-02-12 16:06 ` [PATCH 3/3] nbd: show read-only state in sysfs Paolo Bonzini
2013-02-12 22:16   ` Paul Clements
2013-02-12 21:43 ` [PATCH 0/3] NBD fixes for caching and block device flags Andrew Morton
2013-02-13 17:14   ` [Nbd] " Wouter Verhelst
