linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steve French <smfrench@gmail.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Allison Collins <allison.henderson@oracle.com>,
	lsf-pc@lists.linux-foundation.org,
	xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Atomic Writes
Date: Thu, 20 Feb 2020 15:30:42 -0600	[thread overview]
Message-ID: <CAH2r5mvGHbibnxzERepYqbG0+yacD+pfLanBz52j16WkNm6-1g@mail.gmail.com> (raw)
In-Reply-To: <20200214044242.GI6870@magnolia>

The idea of using O_TMPFILE is interesting ... but opening an
O_TMPFILE is awkward for network file systems because it is not an
atomic operation either ... (create/close then open)

On Thu, Feb 13, 2020 at 10:43 PM Darrick J. Wong
<darrick.wong@oracle.com> wrote:
>
> On Thu, Feb 13, 2020 at 03:33:08PM -0700, Allison Collins wrote:
> > Hi all,
> >
> > I know there's a lot of discussion on the list right now, but I'd like to
> > get this out before too much time gets away.  I would like to propose the
> > topic of atomic writes.  I realize the topic has been discussed before, but
> > I have not found much activity for it recently so perhaps we can revisit it.
> > We do have a customer who may have an interest, so I would like to discuss
> > the current state of things, and how we can move forward.  If efforts are in
> > progress, and if not, what have we learned from the attempt.
> >
> > I also understand there are multiple ways to solve this problem that people
> > may have opinions on.  I've noticed some older patch sets trying to use a
> > flag to control when dirty pages are flushed, though I think our customer
> > would like to see a hardware solution via NVMe devices.  So I would like to
> > see if others have similar interests as well and what their thoughts may be.
> > Thanks everyone!
>
> Hmmm well there are a number of different ways one could do this--
>
> 1) Userspace allocates an O_TMPFILE file, clones all the file data to
> it, makes whatever changes it wants (thus invoking COW writes), and then
> calls some ioctl to swap the differing extent maps atomically.  For XFS
> we have most of those pieces, but we'd have to add a log intent item to
> track the progress of the remap so that we can complete the remap if the
> system goes down.  This has potentially the best flexibility (multiple
> processes can coordinate to stage multiple updates to non-overlapping
> ranges of the file) but is also a nice foot bazooka.
>
> 2) Set O_ATOMIC on the file, ensure that all writes are staged via COW,
> and defer the cow remap step until we hit the synchronization point.
> When that happens, we persist the new mappings somewhere (e.g. well
> beyond all possible EOF in the XFS case) and then start an atomic remap
> operation to move the new blocks into place in the file.  (XFS would
> still have to add a new log intent item here to finish the remapping if
> the system goes down.)  Less foot bazooka but leaves lingering questions
> like what do you do if multiple processes want to run their own atomic
> updates?
>
> (Note that I think you have some sort of higher level progress tracking
> of the remap operation because we can't leave a torn write just because
> the computer crashed.)
>
> 3) Magic pwritev2 API that lets userspace talk directly to hardware
> atomic writes, though I don't know how userspace discovers what the
> hardware limits are.   I'm assuming the usual sysfs knobs?
>
> Note that #1 and #2 are done entirely in software, which makes them less
> performant but OTOH there's effectively no limit (besides available
> physical space) on how much data or how many non-contiguous extents we
> can stage and commit.
>
> --D
>
> > Allison



-- 
Thanks,

Steve

      parent reply	other threads:[~2020-02-20 21:30 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-13 22:33 [Lsf-pc] [LSF/MM/BPF TOPIC] Atomic Writes Allison Collins
2020-02-14  4:42 ` Darrick J. Wong
2020-02-15 19:53   ` Matthew Wilcox
2020-02-16 21:41     ` Dave Chinner
2020-02-20 21:30   ` Steve French [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAH2r5mvGHbibnxzERepYqbG0+yacD+pfLanBz52j16WkNm6-1g@mail.gmail.com \
    --to=smfrench@gmail.com \
    --cc=allison.henderson@oracle.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).