linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-block@vger.kernel.org
Subject: [RFC] failure atomic writes for file systems and block devices
Date: Tue, 28 Feb 2017 06:57:25 -0800	[thread overview]
Message-ID: <20170228145737.19016-1-hch@lst.de> (raw)

Hi all,

this series implements a new O_ATOMIC flag for failure atomic writes
to files.   It is based on and tries to unify to earlier proposals,
the first one for block devices by Chris Mason:

	https://lwn.net/Articles/573092/

and the second one for regular files, published by HP Research at
Usenix FAST 2015:

	https://www.usenix.org/conference/fast15/technical-sessions/presentation/verma

It adds a new O_ATOMIC flag for open, which requests writes to be
failure-atomic, that is either the whole write makes it to persistent
storage, or none of it, even in case of power of other failures.

There are two implementation various of this:  on block devices O_ATOMIC
must be combined with O_(D)SYNC so that storage devices that can handle
large writes atomically can simply do that without any additional work.
This case is supported by NVMe.

The second case is for file systems, where we simply write new blocks
out of places and then remap them into the file atomically on either
completion of an O_(D)SYNC write or when fsync is called explicitly.

The semantics of the latter case are explained in detail at the Usenix
paper above.

Last but not least a new fcntl is implemented to provide information
about I/O restrictions such as alignment requirements and the maximum
atomic write size.

The implementation is simple and clean, but I'm rather unhappy about
the interface as it has too many failure modes to bullet proof.  For
one old kernels ignore unknown open flags silently, so applications
have to check the F_IOINFO fcntl before, which is a bit of a killer.
Because of that I've also not implemented any other validity checks
yet, as they might make thing even worse when an open on a not supported
file system or device fails, but not on an old kernel.  Maybe we need
a new open version that checks arguments properly first?

Also I'm really worried about the NVMe failure modes - devices simply
advertise an atomic write size, with no way for the device to know
that the host requested a given write to be atomic, and thus no
error reporting.  This is made worse by NVMe 1.2 adding per-namespace
atomic I/O parameters that devices can use to introduce additional
odd alignment quirks - while there is some language in the spec
requiring them not to weaken the per-controller guarantees it all
looks rather weak and I'm not too confident in all implementations
getting everything right.

Last but not least this depends on a few XFS patches, so to actually
apply / run the patches please use this git tree:

    git://git.infradead.org/users/hch/vfs.git O_ATOMIC

Gitweb:

    http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/O_ATOMIC

             reply	other threads:[~2017-02-28 14:58 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-28 14:57 Christoph Hellwig [this message]
2017-02-28 14:57 ` [PATCH 01/12] uapi/fs: add O_ATOMIC to the open flags Christoph Hellwig
2017-02-28 14:57 ` [PATCH 02/12] iomap: pass IOMAP_* flags to actors Christoph Hellwig
2017-02-28 14:57 ` [PATCH 03/12] iomap: add a IOMAP_ATOMIC flag Christoph Hellwig
2017-02-28 14:57 ` [PATCH 04/12] fs: add a BH_Atomic flag Christoph Hellwig
2017-02-28 14:57 ` [PATCH 05/12] fs: add a F_IOINFO fcntl Christoph Hellwig
2017-02-28 16:51   ` Darrick J. Wong
2017-03-01 15:11     ` Christoph Hellwig
2017-02-28 14:57 ` [PATCH 06/12] xfs: cleanup is_reflink checks Christoph Hellwig
2017-02-28 14:57 ` [PATCH 07/12] xfs: implement failure-atomic writes Christoph Hellwig
2017-02-28 23:09   ` Darrick J. Wong
2017-03-01 15:17     ` Christoph Hellwig
2017-02-28 14:57 ` [PATCH 08/12] xfs: implement the F_IOINFO fcntl Christoph Hellwig
2017-02-28 14:57 ` [PATCH 09/12] block: advertize max atomic write limit Christoph Hellwig
2017-02-28 14:57 ` [PATCH 10/12] block_dev: set REQ_NOMERGE for O_ATOMIC writes Christoph Hellwig
2017-02-28 14:57 ` [PATCH 11/12] block_dev: implement the F_IOINFO fcntl Christoph Hellwig
2017-02-28 14:57 ` [PATCH 12/12] nvme: export the atomic write limit Christoph Hellwig
2017-02-28 20:48 ` [RFC] failure atomic writes for file systems and block devices Chris Mason
2017-03-01 15:07   ` Christoph Hellwig
2017-02-28 23:22 ` Darrick J. Wong
2017-03-01 15:09   ` Christoph Hellwig
2017-03-01 11:21 ` Amir Goldstein
2017-03-01 15:07   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170228145737.19016-1-hch@lst.de \
    --to=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).