From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: xfs <linux-xfs@vger.kernel.org>, Brian Foster <bfoster@redhat.com>
Subject: Re: [PATCH] xfs: handle large CoW remapping requests
Date: Tue, 2 May 2017 11:02:20 -0700
Message-ID: <20170502180220.GA5973@birch.djwong.org>
In-Reply-To: <20170502075021.GA7916@infradead.org>

On Tue, May 02, 2017 at 12:50:21AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 27, 2017 at 02:27:54PM -0700, Darrick J. Wong wrote:
> > XFS transactions are constrained both by space and block reservation
> > limits and the fact that we have to avoid doing 64-bit divisions.  This
> > means that we can't remap more than 2^32 blocks at a time.  However,
> > file logical blocks are 64-bit in size, so if we encounter a huge remap
> > request we have to break it up into smaller pieces.
> 
> But where would we get that huge remap request from?

Nowhere, at the moment.  I had O_ATOMIC in mind for this though, since
it'll call end_cow on the entire file at fsync time.  What if you've
written 8GB to a file that you've opened with O_ATOMIC and then fsync
it?  That would trigger a remap longer than MAX_RW_COUNT, which will
trip the assert, right?
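
To make the size problem concrete, here is a minimal userspace sketch of
splitting one oversized request into pieces no larger than a cap.  It is
not the code from the patch.  The names remap_in_chunks, chunk_fn,
count_chunk and SKETCH_MAX_RW_COUNT are hypothetical, and the cap assumes
4096-byte pages, so that MAX_RW_COUNT (INT_MAX & PAGE_MASK) comes out to
0x7ffff000.  Splitting like this gives up atomicity across the pieces.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4096-byte pages, so INT_MAX & PAGE_MASK == 0x7ffff000. */
#define SKETCH_MAX_RW_COUNT UINT64_C(0x7ffff000)

/* Hypothetical per-chunk callback; returns 0 on success. */
typedef int (*chunk_fn)(uint64_t off, uint64_t len, void *priv);

struct chunk_stats {
	uint64_t calls;		/* number of pieces handed out */
	uint64_t bytes;		/* total bytes across all pieces */
};

/* Example callback: only tallies what it was asked to remap. */
static int count_chunk(uint64_t off, uint64_t len, void *priv)
{
	struct chunk_stats *st = priv;

	(void)off;
	st->calls++;
	st->bytes += len;
	return 0;
}

/* Walk [off, off + len) in pieces of at most max_chunk bytes. */
static int remap_in_chunks(uint64_t off, uint64_t len, uint64_t max_chunk,
			   chunk_fn fn, void *priv)
{
	assert(max_chunk > 0);
	while (len > 0) {
		uint64_t this_len = len < max_chunk ? len : max_chunk;
		int error = fn(off, this_len, priv);

		if (error)
			return error;	/* earlier pieces stay done */
		off += this_len;
		len -= this_len;
	}
	return 0;
}
```

Under that page-size assumption, an 8GiB range is handed to the callback
in five pieces: four of 0x7ffff000 bytes and a final one of 16384 bytes.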

> We already did the BUILD_BUG_ON for the max read/write size at least.
> Also the remaps would now not be atomic, which would be a problem for
> my O_ATOMIC implementation at least.

Hm... you're right, if we crash midway through the remap then ideally
we'd recover by finishing whatever remapping steps we didn't get to.

The current remapping mechanism only guarantees that whatever little
part of the data fork we've bunmapi'd for each cow fork extent will also
get remapped.  There isn't anything in there that guarantees a remap of
the parts we haven't touched yet.  If one CoW fork extent maps to 2000
data fork extents, we'll atomically remap each of the 2000 extents.  If
we fail at extent 900, the remaining 1100 extents are fed to the CoW
cleanup at the next mount time.  This patch doesn't try to change that
behavior.
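
The failure case above can be modelled with a toy counter.  This is only
an illustration of the bookkeeping, not XFS code: toy_remap_extents and
struct remap_result are hypothetical names, and the sketch assumes that
"fail at extent 900" means 900 per-extent remaps have committed before
the failure.

```c
#include <assert.h>
#include <stddef.h>

struct remap_result {
	size_t remapped;	/* extents whose remap committed */
	size_t leftover;	/* extents left for CoW cleanup at next mount */
};

/*
 * Toy model: each data fork extent is remapped in its own atomic step.
 * fail_after is the number of steps that commit before the simulated
 * failure; pass nr_extents to simulate a run with no failure.
 */
static struct remap_result toy_remap_extents(size_t nr_extents,
					     size_t fail_after)
{
	struct remap_result res = { 0, 0 };
	size_t i;

	assert(fail_after <= nr_extents);
	for (i = 0; i < nr_extents; i++) {
		if (i == fail_after)
			break;		/* nothing records the untouched rest */
		res.remapped++;
	}
	res.leftover = nr_extents - res.remapped;
	return res;
}
```

With nr_extents = 2000 and fail_after = 900, the model reports 900
remapped extents and 1100 leftover ones, matching the numbers above.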

For O_ATOMIC I think we'll have to put in some extra log intent items
to help us track all the extents we intend to remap so that we can
pick up where we left off during recovery.  Hm.  It would be difficult
to avoid running into log space problems if there are a lot of extents.

Second half-baked idea: play games with a shadow inode -- allocate an
unlinked inode, persist all the written CoW fork extents to the shadow
inode, and reflink the extents from the shadow to the original inode.
If we crash then we can just re-reflink everything in the shadow inode.
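
A rough userspace model of why re-running the reflink would be safe: the
operation is idempotent, so replaying it from the start after a crash
ends in the same state as one uninterrupted run.  Everything here is
hypothetical (toy_inode, struct extent, toy_reflink_from_shadow); a real
implementation would also have to deal with refcounts, locking and
logging, none of which is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS 8

/* One extent persisted in the (hypothetical) shadow inode. */
struct extent {
	uint64_t file_off;	/* first file block */
	uint64_t disk_block;	/* first disk block */
	uint64_t len;		/* length in blocks */
};

/* Toy inode: a flat file block -> disk block map. */
struct toy_inode {
	uint64_t map[NR_SLOTS];
};

/* Point dst's file blocks at the shadow's disk blocks; safe to repeat. */
static void toy_reflink_extent(struct toy_inode *dst, const struct extent *e)
{
	uint64_t i;

	assert(e->file_off + e->len <= NR_SLOTS);
	for (i = 0; i < e->len; i++)
		dst->map[e->file_off + i] = e->disk_block + i;
}

/*
 * Apply the shadow's extents in order.  crash_after < n simulates a
 * crash after that many extents; returns how many were applied.
 */
static size_t toy_reflink_from_shadow(struct toy_inode *dst,
				      const struct extent *shadow,
				      size_t n, size_t crash_after)
{
	size_t i;

	for (i = 0; i < n && i < crash_after; i++)
		toy_reflink_extent(dst, &shadow[i]);
	return i;
}
```

Calling toy_reflink_from_shadow() once with a small crash_after and then
again with crash_after equal to n leaves the map the same as a single
complete call would.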

--D

