From: Chris Mason <clm@fb.com>
To: Christoph Hellwig <hch@lst.de>, <linux-fsdevel@vger.kernel.org>,
	<linux-xfs@vger.kernel.org>, <linux-block@vger.kernel.org>
Subject: Re: [RFC] failure atomic writes for file systems and block devices
Date: Tue, 28 Feb 2017 15:48:16 -0500	[thread overview]
Message-ID: <e4bc2911-99ee-0049-a11d-3944c1770cff@fb.com> (raw)
In-Reply-To: <20170228145737.19016-1-hch@lst.de>



On 02/28/2017 09:57 AM, Christoph Hellwig wrote:
> Hi all,
>
> this series implements a new O_ATOMIC flag for failure atomic writes
> to files. It is based on and tries to unify two earlier proposals,
> the first one for block devices by Chris Mason:
>
> 	https://urldefense.proofpoint.com/v2/url?u=https-3A__lwn.net_Articles_573092_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=9QPtTAxcitoznaWRKKHoEQ&m=P5byIhbDCF-kdlNpZVpxMKG3E36-cQ-lK27coqUFUng&s=rqXtuRMvf2rijHel_VAiO-KQ8AtQ5DXEI2obnCI_ljQ&e=
>
> and the second one for regular files, published by HP Research at
> Usenix FAST 2015:
>
> 	https://urldefense.proofpoint.com/v2/url?u=https-3A__www.usenix.org_conference_fast15_technical-2Dsessions_presentation_verma&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=9QPtTAxcitoznaWRKKHoEQ&m=P5byIhbDCF-kdlNpZVpxMKG3E36-cQ-lK27coqUFUng&s=ilnrrNs8nG4_UV2xx7tc2Efm20d2Wa8PHoJE8WUTCwI&e=
>
> It adds a new O_ATOMIC flag for open, which requests writes to be
> failure-atomic, that is either the whole write makes it to persistent
> storage, or none of it, even in case of power or other failures.
>
> There are two implementation variants of this: on block devices O_ATOMIC
> must be combined with O_(D)SYNC so that storage devices that can handle
> large writes atomically can simply do that without any additional work.
> This case is supported by NVMe.
>

Hi Christoph,

This is great, and the supporting code in both dio and bio gets rid of
some of the warts from when I tried.  The DIO_PAGES define used to be an
upper limit on the max contiguous bio that would get built, but that's
much better now.

One thing that isn't clear to me is how we're dealing with boundary bio
mappings, which will get submitted by submit_page_section():

sdio->boundary = buffer_boundary(map_bh);

In btrfs I'd just chain things together and do the extent pointer swap
afterwards, but I didn't follow the XFS code well enough to see how it's
handled there.  Either way, it feels like an error-prone surprise
waiting for later, and one gap we really want to get right in the FS
support is O_ATOMIC across a fragmented extent.

If I'm reading the XFS patches right, the code always COWs for atomic
writes.  Are you planning on adding an optimization to use atomic support
in the device to skip COW when possible?

To turn off MySQL double buffering, we only need 16K or 64K writes,
which most of the time you'd be able to pass down directly without COW.

-chris


Thread overview: 25+ messages
2017-02-28 14:57 [RFC] failure atomic writes for file systems and block devices Christoph Hellwig
2017-02-28 14:57 ` [PATCH 01/12] uapi/fs: add O_ATOMIC to the open flags Christoph Hellwig
2017-02-28 14:57 ` [PATCH 02/12] iomap: pass IOMAP_* flags to actors Christoph Hellwig
2017-02-28 14:57 ` [PATCH 03/12] iomap: add a IOMAP_ATOMIC flag Christoph Hellwig
2017-02-28 14:57 ` [PATCH 04/12] fs: add a BH_Atomic flag Christoph Hellwig
2017-02-28 14:57 ` [PATCH 05/12] fs: add a F_IOINFO fcntl Christoph Hellwig
2017-02-28 16:51   ` Darrick J. Wong
2017-03-01 15:11     ` Christoph Hellwig
2017-02-28 14:57 ` [PATCH 06/12] xfs: cleanup is_reflink checks Christoph Hellwig
2017-02-28 14:57 ` [PATCH 07/12] xfs: implement failure-atomic writes Christoph Hellwig
2017-02-28 23:09   ` Darrick J. Wong
2017-03-01 15:17     ` Christoph Hellwig
2017-02-28 14:57 ` [PATCH 08/12] xfs: implement the F_IOINFO fcntl Christoph Hellwig
2017-02-28 14:57 ` [PATCH 09/12] block: advertize max atomic write limit Christoph Hellwig
2017-02-28 14:57 ` [PATCH 10/12] block_dev: set REQ_NOMERGE for O_ATOMIC writes Christoph Hellwig
2017-02-28 14:57 ` [PATCH 11/12] block_dev: implement the F_IOINFO fcntl Christoph Hellwig
2017-02-28 14:57 ` [PATCH 12/12] nvme: export the atomic write limit Christoph Hellwig
2017-02-28 20:48 ` Chris Mason [this message]
2017-03-01 15:07   ` Christoph Hellwig
2017-02-28 23:22 ` Darrick J. Wong
2017-03-01 15:09   ` Christoph Hellwig
2017-03-01 11:21 ` Amir Goldstein
2017-03-01 15:07   ` Christoph Hellwig
