linux-btrfs.vger.kernel.org archive mirror
From: Goldwyn Rodrigues <rgoldwyn@suse.de>
To: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-btrfs@vger.kernel.org, kilobyte@angband.pl,
	linux-fsdevel@vger.kernel.org, jack@suse.cz, david@fromorbit.com,
	willy@infradead.org, hch@lst.de, dsterba@suse.cz,
	nborisov@suse.com, linux-nvdimm@lists.01.org
Subject: Re: [PATCH 04/18] dax: Introduce IOMAP_DAX_COW to CoW edges during writes
Date: Thu, 23 May 2019 06:51:09 -0500
Message-ID: <20190523115109.2o4txdjq2ft7fzzc@fiona>
In-Reply-To: <1e9951c1-d320-e480-3130-dc1f4b81ef2c@cn.fujitsu.com>

On 17:05 23/05, Shiyang Ruan wrote:
> 
> 
> On 5/22/19 12:51 AM, Darrick J. Wong wrote:
> > On Mon, Apr 29, 2019 at 12:26:35PM -0500, Goldwyn Rodrigues wrote:
> > > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > > 
> > > The IOMAP_DAX_COW is a iomap type which performs copy of
> > > edges of data while performing a write if start/end are
> > > not page aligned. The source address is expected in
> > > iomap->inline_data.
> > > 
> > > dax_copy_edges() is a helper functions performs a copy from
> > > one part of the device to another for data not page aligned.
> > > If iomap->inline_data is NULL, it memset's the area to zero.
> > > 
> > > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > > ---
> > >   fs/dax.c              | 46 +++++++++++++++++++++++++++++++++++++++++++++-
> > >   include/linux/iomap.h |  1 +
> > >   2 files changed, 46 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/dax.c b/fs/dax.c
> > > index e5e54da1715f..610bfa861a28 100644
> > > --- a/fs/dax.c
> > > +++ b/fs/dax.c
> > > @@ -1084,6 +1084,42 @@ int __dax_zero_page_range(struct block_device *bdev,
> > >   }
> > >   EXPORT_SYMBOL_GPL(__dax_zero_page_range);
> > > +/*
> > > + * dax_copy_edges - Copies the part of the pages not included in
> > > + * 		    the write, but required for CoW because
> > > + * 		    offset/offset+length are not page aligned.
> > > + */
> > > +static int dax_copy_edges(struct inode *inode, loff_t pos, loff_t length,
> > > +			   struct iomap *iomap, void *daddr)
> > > +{
> > > +	unsigned offset = pos & (PAGE_SIZE - 1);
> > > +	loff_t end = pos + length;
> > > +	loff_t pg_end = round_up(end, PAGE_SIZE);
> > > +	void *saddr = iomap->inline_data;
> > > +	int ret = 0;
> > > +	/*
> > > +	 * Copy the first part of the page
> > > +	 * Note: we pass offset as length
> > > +	 */
> > > +	if (offset) {
> > > +		if (saddr)
> > > +			ret = memcpy_mcsafe(daddr, saddr, offset);
> > > +		else
> > > +			memset(daddr, 0, offset);
> > > +	}
> > > +
> > > +	/* Copy the last part of the range */
> > > +	if (end < pg_end) {
> > > +		if (saddr)
> > > +			ret = memcpy_mcsafe(daddr + offset + length,
> > > +			       saddr + offset + length,	pg_end - end);
> > > +		else
> > > +			memset(daddr + offset + length, 0,
> > > +					pg_end - end);
> > > +	}
> > > +	return ret;
> > > +}
> > > +
> > >   static loff_t
> > >   dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > >   		struct iomap *iomap)
> > > @@ -1105,9 +1141,11 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > >   			return iov_iter_zero(min(length, end - pos), iter);
> > >   	}
> > > -	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
> > > +	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED
> > > +			 && iomap->type != IOMAP_DAX_COW))
> > 
> > I reiterate (from V3) that the && goes on the previous line...
> > 
> > 	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED &&
> > 			 iomap->type != IOMAP_DAX_COW))
> > 
> > >   		return -EIO;
> > > +
> > >   	/*
> > >   	 * Write can allocate block for an area which has a hole page mapped
> > >   	 * into page tables. We have to tear down these mappings so that data
> > > @@ -1144,6 +1182,12 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > >   			break;
> > >   		}
> > > +		if (iomap->type == IOMAP_DAX_COW) {
> > > +			ret = dax_copy_edges(inode, pos, length, iomap, kaddr);
> > > +			if (ret)
> > > +				break;
> > > +		}
> > > +
> > >   		map_len = PFN_PHYS(map_len);
> > >   		kaddr += offset;
> > >   		map_len -= offset;
> > > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > > index 0fefb5455bda..6e885c5a38a3 100644
> > > --- a/include/linux/iomap.h
> > > +++ b/include/linux/iomap.h
> > > @@ -25,6 +25,7 @@ struct vm_fault;
> > >   #define IOMAP_MAPPED	0x03	/* blocks allocated at @addr */
> > >   #define IOMAP_UNWRITTEN	0x04	/* blocks allocated at @addr in unwritten state */
> > >   #define IOMAP_INLINE	0x05	/* data inline in the inode */
> > 
> > > +#define IOMAP_DAX_COW	0x06
> > 
> > DAX isn't going to be the only scenario where we need a way to
> > communicate to iomap actors the need to implement copy on write.
> > 
> > XFS also uses struct iomap to hand out file leases to clients.  The
> > lease code /currently/ doesn't support files with shared blocks (because
> > the only user is pNFS) but one could easily imagine a future where some
> > client wants to lease a file with shared blocks, in which case XFS will
> > want to convey the COW details to the lessee.
> > 
> > > +/* Copy data pointed by inline_data before write*/
> > 
> > A month ago during the V3 patchset review, I wrote (possibly in an other
> > thread, sorry) about something that I'm putting my foot down about now
> > for the V4 patchset, which is the {re,ab}use of @inline_data for the
> > data source address.
> > 
> > We cannot use @inline_data to convey the source address.  @inline_data
> > (so far) is used to point to the in-memory representation of the storage
> > described by @addr.  For data writes, @addr is the location of the write
> > on disk and @inline_data is the location of the write in memory.
> > 
> > Reusing @inline_data here to point to the location of the source data in
> > memory is a totally different thing and will likely result in confusion.
> > On a practical level, this also means that we cannot support the case of
> > COW && INLINE because the type codes collide and so would the users of
> > @inline_data.  This isn't required *right now*, but if you had a pmem
> > filesystem that stages inode updates in memory and flips a pointer to
> > commit changes then the ->iomap_begin function will need to convey two
> > pointers at once.
> > 
> > So this brings us back to Dave's suggestion during the V1 patchset
> > review that instead of adding more iomap flags/types and overloading
> > fields, we simply pass two struct iomaps into ->iomap_begin:
> > 
> >   - Change iomap_apply() to "struct iomap iomap[2] = 0;" and pass
> >     &iomap[0] into the ->iomap_begin and ->iomap_end functions.  The
> >     first iomap will be filled in with the destination for the write (as
> >     all implementations do now), and the second iomap can be filled in
> >     with the source information for a COW operation.
> > 
> >   - If the ->iomap_begin implementation decides that COW is necessary for
> >     the requested operation, then it should fill out that second iomap
> >     with information about the extent that the actor must copied before
> >     returning.  The second iomap's offset and length must match the
> >     first.  If COW isn't necessary, the ->iomap_begin implementation
> 
> Hi,
> 
> I'm working on reflink & dax in XFS, here are some thoughts on this:
> 
> As mentioned above: the second iomap's offset and length must match the
> first.  I thought so at the beginning, but later found that the only
> difference between these two iomaps is @addr.  So, what about adding a
> @saddr, which means the source address of COW extent, into the struct iomap.
> The ->iomap_begin() fills @saddr if the extent is COW, and 0 if not.  Then
> handle this @saddr in each ->actor().  No more modifications in other
> functions.

Yes, I started off with that exact idea before Dave recommended this.
I used two fields instead of one, namely cow_pos and cow_addr, which
described the source. I had implemented it as an iomap flag rather than a
type, which of course did not go over well.
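
For illustration only, here is a user-space sketch of what carrying the
CoW source in two extra fields could look like. Everything named sketch_*
is hypothetical: this is neither the kernel's struct iomap nor the code
from my earlier patchset. Only the field names cow_pos and cow_addr come
from the description above, and the meaning given to them here is an
assumption. The helper follows the head/tail logic of dax_copy_edges()
from the quoted patch, with plain memcpy() standing in for
memcpy_mcsafe().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the sketch */

/* Hypothetical, simplified stand-in for struct iomap. */
struct sketch_iomap {
	uint64_t addr;		/* destination of the write */
	uint64_t offset;	/* file offset of the mapping */
	uint64_t length;	/* length of the mapping */
	uint64_t cow_pos;	/* assumed: file offset of the CoW source;
				 * not used by the helper below */
	const void *cow_addr;	/* assumed: start of the source page(s),
				 * NULL when there is nothing to copy */
};

/*
 * Fill the parts of the destination pages that the write itself does
 * not cover. daddr points at the start of the page containing pos.
 * With a source, the edges are copied; without one, they are zeroed.
 */
static void sketch_copy_edges(void *daddr, const struct sketch_iomap *map,
			      uint64_t pos, uint64_t length)
{
	size_t head = (size_t)(pos & (SKETCH_PAGE_SIZE - 1));
	uint64_t end = pos + length;
	uint64_t pg_end = (end + SKETCH_PAGE_SIZE - 1) &
			  ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
	size_t tail = (size_t)(pg_end - end);
	unsigned char *dst = daddr;
	const unsigned char *src = map->cow_addr;

	assert(daddr != NULL);

	/* Leading edge: bytes from the page start up to pos. */
	if (head) {
		if (src)
			memcpy(dst, src, head);
		else
			memset(dst, 0, head);
	}

	/* Trailing edge: bytes from the end of the write to the page end. */
	if (tail) {
		if (src)
			memcpy(dst + head + length, src + head + length, tail);
		else
			memset(dst + head + length, 0, tail);
	}
}
```

With pos = 100 and length = 5000, the helper would touch bytes 0..99 and
5100..8191 of a two-page destination and leave the written range alone.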

We may want to use iomaps for cases where two inodes are involved.
One such scenario, where the offsets may differ, is file comparison
for dedup: vfs_dedup_file_range_compare(). However, that would need
two inodes in the iomap as well.

> 
> My RFC patchset[1] is implemented in this way and works for me, though it is
> far away from perfectness.
> 
> [1]: https://patchwork.kernel.org/cover/10904307/
> 

-- 
Goldwyn