All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>,
	david@fromorbit.com, linux-btrfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/6] iomap: Use a IOMAP_COW/srcmap for a read-modify-write I/O
Date: Wed, 26 Jun 2019 10:34:25 -0700	[thread overview]
Message-ID: <20190626173425.GA5164@magnolia> (raw)
In-Reply-To: <20190626161017.rzkktei2ngznhbat@fiona>

On Wed, Jun 26, 2019 at 11:10:17AM -0500, Goldwyn Rodrigues wrote:
> On  8:39 26/06, Christoph Hellwig wrote:
> > On Tue, Jun 25, 2019 at 02:14:42PM -0500, Goldwyn Rodrigues wrote:
> > > > I can't say I'm a huge fan of this two iomaps in one method call
> > > > approach.  I always though two separate iomap iterations would be nicer,
> > > > but compared to that even the older hack with just the additional
> > > > src_addr seems a little better.
> > > 
> > > I am just expanding on your idea of using multiple iterations for the Cow case
> > > in the hope we can come out of a good design:
> > > 
> > > 1. iomap_file_buffered_write calls iomap_apply with IOMAP_WRITE flag.
> > >    which calls iomap_begin() for the respective filesystem.
> > > 2. btrfs_iomap_begin() sets up iomap->type as IOMAP_COW and fills iomap
> > >    struct with read addr information.
> > > 3. iomap_apply() conditionally for IOMAP_COW calls do_cow(new function)
> > >    and calls ops->iomap_begin() with flag IOMAP_COW_READ_DONE(new flag).
> > > 4. btrfs_iomap_begin() fills up iomap structure with write information.
> > > 
> > > Step 3 seems out of place because iomap_apply should be iomap.type agnostic.
> > > Right?
> > > Should we be adding another flag IOMAP_COW_DONE, just to figure out that
> > > this is the "real" write for iomap_begin to fill iomap?
> > > 
> > > If this is not how you imagined, could you elaborate on the dual iteration
> > > sequence?
> > 
> > Here are my thoughts from dealing with this from a while ago, all
> > XFS based of course.
> > 
> > If iomap_file_buffered_write is called on a page that is inside a COW
> > extent we have the following options:
> > 
> >  a) the page is updatodate or entirely overwritten.  We cn just allocate
> >     new COW blocks and return them, and we are done
> >  b) the page is not/partially uptodate and not entirely overwritten.
> > 
> > The latter case is the interesting one.  My thought was that iff the
> > IOMAP_F_SHARED flag is set __iomap_write_begin / iomap_read_page_sync
> > will then have to retreive the source information in some form.
> > 
> > My original plan was to just do a nested iomap_apply call, which would
> > need a special nested flag to not duplicate any locking the file
> > system might be holding between ->iomap_begin and ->iomap_end.
> > 
> > The upside here is that there is no additional overhead for the non-COW
> > path and the architecture looks relatively clean.  The downside is that
> > at least for XFS we usually have to look up the source information
> > anyway before allocating the COW destination extent, so we'd have to
> > cache that information somewhere or redo it, which would be rather
> > pointless.  At that point the idea of a srcaddr in the iomap becomes
> > interesting again - while it looks a little ugly from the architectural
> > POV it actually ends up having very practical benefits.

I think it's less complicated to pass both mappings out in a single
->iomap_begin call rather than have this dance where the fs tells iomap
to call back for the read mapping and then iomap calls back for the read
mapping with a special "don't take locks" flag.

For XFS specifically this means we can serve both mappings with a single
ILOCK cycle.

> So, do we move back to the design of adding an extra field of srcaddr?

TLDR: Please no.

> Honestly, I find the design of using an extra field srcaddr in iomap better
> and simpler versus passing additional iomap srcmap or multiple iterations.

Putting my long-range planning hat on, the usage story (from the fs'
perspective) here is:

"iomap wants to know how a file write should map to a disk write.  If
we're doing a straight overwrite of disk blocks then I should send back
the relevant mapping.  Sometimes I might need the write to go to a
totally different location than where the data is currently stored, so I
need to send back a second mapping."

Because iomap is now a general-purpose API, we need to think about the
read mapping for a moment:

 - For all disk-based filesystems we'll want the source address for the
   read mapping.

 - For filesystems that support "inline data" (which really just means
   the fs maintains its own buffers to file data) we'll also need the
   inline_data pointer.

 - For filesystems that support multiple devices (like btrfs) we'll also
   need a pointer to a block_device because we could be writing to a
   different device than the one that stores the data.  The prime
   example I can think of is reading data from disk A in some RAID
   stripe and writing to disk B in a different RAID stripe to solve the
   RAID5 hole... but you could just be lazy-migrating file data to less
   full or newer drives or whatever.

 - If we ever encounter a filesystem that supports multiple dax devices
   then we'll need a pointer to the dax_device too.  (That would be
   btrfs, since I thought your larger goal was to enable dax there...)

 - We're probably going to need the ability to pass flags for the read
   mapping at some point or another, so we need that too.

From this, you can see that we need half the fields in the existing
struct iomap, and that's how I arrived at the idea of passing to
iomap_begin pointers to two iomaps instead of bolting these five fields
into struct iomap.

In XFS parlance (where the data fork stores mappings for on-disk data
and the cow fork stores mappings for), this means that our iomap_begin
data paths remain fairly straightforward:

	xfs_bmapi_read(ip, offset, XFS_DATA_FORK, &imap...);
	xfs_bmapi_read(ip, offset, XFS_COW_FORK, &cmap...);

	xfs_bmbt_to_iomap(ip, iomap, &imap...);
	iomap->type = IOMAP_COW;
	xfs_bmbt_to_iomap(ip, srcmap, &cmap...);

(It's more complicated than that, but that's approximately what we do
now.)

> Also, should we add another iomap type IOMAP_COW, or (re)use the flag
> IOMAP_F_SHARED during writes? IOW iomap type vs iomap flag.

I think they need to remain distinct types because the IOMAP_F_SHARED
flag means that the storage is shared amongst multiple owners (which
iomap_fiemap reports to userspace), whereas the IOMAP_COW type means
that RMW is required for unaligned writes.

btrfs has a strong tendency to require copy writes, but that doesn't
mean the extent is necessarily shared.

> Dave/Darrick, what are your thoughts?

I liked where the first 3 patches of this series were heading. :)

--D

> 
> -- 
> Goldwyn

  reply	other threads:[~2019-06-26 17:34 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-21 19:28 [PATCH 0/6] Btrfs iomap Goldwyn Rodrigues
2019-06-21 19:28 ` [PATCH 1/6] iomap: Use a IOMAP_COW/srcmap for a read-modify-write I/O Goldwyn Rodrigues
2019-06-22  0:46   ` Darrick J. Wong
2019-06-25 19:17     ` Goldwyn Rodrigues
2019-06-26  6:21     ` Christoph Hellwig
2019-06-24  7:07   ` Christoph Hellwig
2019-06-25 19:14     ` Goldwyn Rodrigues
2019-06-26  1:36       ` Shiyang Ruan
2019-06-26  6:39       ` Christoph Hellwig
2019-06-26 16:10         ` Goldwyn Rodrigues
2019-06-26 17:34           ` Darrick J. Wong [this message]
2019-06-26 18:00       ` Darrick J. Wong
2019-06-26 18:42         ` Goldwyn Rodrigues
2019-06-21 19:28 ` [PATCH 2/6] iomap: Read page from srcmap for IOMAP_COW Goldwyn Rodrigues
2019-06-22  0:41   ` Darrick J. Wong
2019-06-21 19:28 ` [PATCH 3/6] iomap: Check iblocksize before transforming page->private Goldwyn Rodrigues
2019-06-22  0:21   ` Darrick J. Wong
2019-06-25 19:22     ` Goldwyn Rodrigues
2019-06-24  7:05   ` Christoph Hellwig
2019-06-25 18:56     ` Goldwyn Rodrigues
2019-06-25 20:04       ` Filipe Manana
2019-06-26  3:03         ` Goldwyn Rodrigues
2019-06-26  6:42           ` Nikolay Borisov
2019-06-26  6:16       ` Christoph Hellwig
2019-06-21 19:28 ` [PATCH 4/6] btrfs: Add a simple buffered iomap write Goldwyn Rodrigues
2019-06-21 19:28 ` [PATCH 5/6] btrfs: Add CoW in iomap based writes Goldwyn Rodrigues
2019-06-21 19:28 ` [PATCH 6/6] btrfs: remove buffered write code made unnecessary Goldwyn Rodrigues

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190626173425.GA5164@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=rgoldwyn@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.