All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
To: "Darrick J. Wong" <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org,
	Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>,
	linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Subject: Re: [PATCH 10/12] xfs: use iomap to implement DAX
Date: Thu, 15 Sep 2016 08:43:20 +0200	[thread overview]
Message-ID: <20160915064320.GA7665@lst.de> (raw)
In-Reply-To: <20160915052933.GH9314-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>

On Wed, Sep 14, 2016 at 10:29:33PM -0700, Darrick J. Wong wrote:
> I'm a little confused about xfs_file_iomap_begin() and IOMAP_WRITE --
> prior to this patchset, it was only called via page_mkwrite and what
> used to be write_begin, and all it did was create a delalloc reservation
> to back the write (or actually allocate blocks if extsize was set).

No, it was called from write itself in addition, of course.

> Thinking ahead to integrating reflink with DAX (and presumably
> directio) in both cases _file_iomap_begin has to return a real extent.
> If the range we're writing is shared, then that means that I'd have to
> ensure the range is reserved in the COW fork (the reflink patches already do
> this).  Next, if the caller requires a firm mapping (i.e. dax or dio)
> I'd have to allocate the range in the COW fork and have that mapping
> returned to the caller.

Yes.  No different than the existing buffered write case with an
extent size hint, really.

> So I guess it would look something like this:
> 
> /* reserve reflink cow range like the reflink patches already do */
> if (flags & (IOMAP_WRITE | IOMAP_ZERO) && xfs_is_reflink_inode(ip))
> 	xfs_reflink_reserve_cow_range(ip, offset, length);

For zeroing we don't need to reserve anything IFF we are finding a hole,
so this is too early.

> /* do the cow thing if need be */
> if ((flags & IOMAP_WRITE) &&
>     xfs_is_reflink_inode(ip) &&
>     (IS_DAX(inode) || (flags & IOMAP_DIO)) {
> 	xfs_reflink_allocate_cow_range(ip, offset, length);
> 
> 	if (xfs_reflink_is_cow_pending(ip, offset))
> 		xfs_reflink_find_cow_mapping(ip, offset, &imap, &need_alloc);
> 	else
> 		xfs_bmapi_read(offset, &imap)
> } else
> 	xfs_bmapi_read(offset, &imap) /* non-cow io */


We really have three write block allocation variants, of which the first
two look basically the same except for I/O completion handling:

 1) For DAX we need to allocate real extents and zero them before use.
 2) For buffered I/O with an extent size hint or direct I/O we need to
    allocate unwritten extents (and convert after the I/O succeeded for
    the range that succeeded)
 3) for buffered I/O without and extent size hint we need to created
    a delayed allocation

That logic exists both in the old __xfs_get_blocks and and the new
iomap code (although we don't check for the dio case yet).

So I don't think we should need to change anything in terms of COW
in relation to DAX or direct I/O here.  I do however need to fully
audit the usage of the reflink calls for the cases 1 and 2.  For one
it doesn't really seem nessecary to split the reserve/allocate case
ehre, and I hope that a lot of the bmap btree looks could be saved,
similar to my rewrite of the delalloc case.
    

> Does that look plausible?  If _iomap_begin will have one type of caller
> that only needs a delalloc reservation and another type that actually
> needs a firm mapping, should we make a new IOMAP flag to indicate that
> the caller really needs a firm mapping?  IS_DAX is fine for detecting
> the DAX case as this patchset shows, but what about directio?
> 
> (At this point Dave suggests on IRC that we just make a dio specific
> iomap_begin.  That eliminates the flag, though it'd be a fairly similar
> function.)

As explained above, DIO is exaxtly the same as buffered I/O with
an extent size hint.  We already handle it fine.  DAX is almost
the same, keyed off a DAX check in lower level code to convert unwritten
extents and zero the blocks.

WARNING: multiple messages have this Message-ID
From: Christoph Hellwig <hch@lst.de>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-nvdimm@ml01.01.org, ross.zwisler@linux.intel.com,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 10/12] xfs: use iomap to implement DAX
Date: Thu, 15 Sep 2016 08:43:20 +0200	[thread overview]
Message-ID: <20160915064320.GA7665@lst.de> (raw)
In-Reply-To: <20160915052933.GH9314@birch.djwong.org>

On Wed, Sep 14, 2016 at 10:29:33PM -0700, Darrick J. Wong wrote:
> I'm a little confused about xfs_file_iomap_begin() and IOMAP_WRITE --
> prior to this patchset, it was only called via page_mkwrite and what
> used to be write_begin, and all it did was create a delalloc reservation
> to back the write (or actually allocate blocks if extsize was set).

No, it was called from write itself in addition, of course.

> Thinking ahead to integrating reflink with DAX (and presumably
> directio) in both cases _file_iomap_begin has to return a real extent.
> If the range we're writing is shared, then that means that I'd have to
> ensure the range is reserved in the COW fork (the reflink patches already do
> this).  Next, if the caller requires a firm mapping (i.e. dax or dio)
> I'd have to allocate the range in the COW fork and have that mapping
> returned to the caller.

Yes.  No different than the existing buffered write case with an
extent size hint, really.

> So I guess it would look something like this:
> 
> /* reserve reflink cow range like the reflink patches already do */
> if (flags & (IOMAP_WRITE | IOMAP_ZERO) && xfs_is_reflink_inode(ip))
> 	xfs_reflink_reserve_cow_range(ip, offset, length);

For zeroing we don't need to reserve anything IFF we are finding a hole,
so this is too early.

> /* do the cow thing if need be */
> if ((flags & IOMAP_WRITE) &&
>     xfs_is_reflink_inode(ip) &&
>     (IS_DAX(inode) || (flags & IOMAP_DIO)) {
> 	xfs_reflink_allocate_cow_range(ip, offset, length);
> 
> 	if (xfs_reflink_is_cow_pending(ip, offset))
> 		xfs_reflink_find_cow_mapping(ip, offset, &imap, &need_alloc);
> 	else
> 		xfs_bmapi_read(offset, &imap)
> } else
> 	xfs_bmapi_read(offset, &imap) /* non-cow io */


We really have three write block allocation variants, of which the first
two look basically the same except for I/O completion handling:

 1) For DAX we need to allocate real extents and zero them before use.
 2) For buffered I/O with an extent size hint or direct I/O we need to
    allocate unwritten extents (and convert after the I/O succeeded for
    the range that succeeded)
 3) for buffered I/O without and extent size hint we need to created
    a delayed allocation

That logic exists both in the old __xfs_get_blocks and and the new
iomap code (although we don't check for the dio case yet).

So I don't think we should need to change anything in terms of COW
in relation to DAX or direct I/O here.  I do however need to fully
audit the usage of the reflink calls for the cases 1 and 2.  For one
it doesn't really seem nessecary to split the reserve/allocate case
ehre, and I hope that a lot of the bmap btree looks could be saved,
similar to my rewrite of the delalloc case.
    

> Does that look plausible?  If _iomap_begin will have one type of caller
> that only needs a delalloc reservation and another type that actually
> needs a firm mapping, should we make a new IOMAP flag to indicate that
> the caller really needs a firm mapping?  IS_DAX is fine for detecting
> the DAX case as this patchset shows, but what about directio?
> 
> (At this point Dave suggests on IRC that we just make a dio specific
> iomap_begin.  That eliminates the flag, though it'd be a fairly similar
> function.)

As explained above, DIO is exaxtly the same as buffered I/O with
an extent size hint.  We already handle it fine.  DAX is almost
the same, keyed off a DAX check in lower level code to convert unwritten
extents and zero the blocks.

  parent reply	other threads:[~2016-09-15  6:43 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-14 10:01 iomap based DAX path V2 Christoph Hellwig
2016-09-14 10:01 ` Christoph Hellwig
     [not found] ` <1473847291-18913-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-09-14 10:01   ` [PATCH 01/12] iomap: add IOMAP_F_NEW flag Christoph Hellwig
2016-09-14 10:01     ` Christoph Hellwig
2016-09-14 10:01   ` [PATCH 02/12] iomap: expose iomap_apply outside iomap.c Christoph Hellwig
2016-09-14 10:01     ` Christoph Hellwig
2016-09-14 10:01   ` [PATCH 12/12] ext2: use iomap to implement DAX Christoph Hellwig
2016-09-14 10:01     ` Christoph Hellwig
     [not found]     ` <1473847291-18913-13-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-09-14 22:51       ` Ross Zwisler
2016-09-14 22:51         ` Ross Zwisler
2016-09-15  5:14         ` Christoph Hellwig
2016-09-14 10:01 ` [PATCH 03/12] dax: don't pass buffer_head to dax_insert_mapping Christoph Hellwig
2016-09-14 10:01 ` [PATCH 04/12] dax: don't pass buffer_head to copy_user_dax Christoph Hellwig
2016-09-14 10:01 ` [PATCH 05/12] dax: provide an iomap based dax read/write path Christoph Hellwig
     [not found]   ` <1473847291-18913-6-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-09-14 17:26     ` Ross Zwisler
2016-09-14 17:26       ` Ross Zwisler
2016-09-14 10:01 ` [PATCH 06/12] dax: provide an iomap based fault handler Christoph Hellwig
2016-09-14 17:27   ` Ross Zwisler
     [not found]     ` <20160914172717.GB30852-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-09-15  5:13       ` Christoph Hellwig
2016-09-15  5:13         ` Christoph Hellwig
2016-09-14 10:01 ` [PATCH 07/12] xfs: fix locking for DAX writes Christoph Hellwig
2016-09-14 10:01 ` [PATCH 08/12] xfs: take the ilock shared if possible in xfs_file_iomap_begin Christoph Hellwig
2016-09-14 10:01 ` [PATCH 09/12] xfs: refactor xfs_setfilesize Christoph Hellwig
2016-09-14 10:01 ` [PATCH 10/12] xfs: use iomap to implement DAX Christoph Hellwig
     [not found]   ` <1473847291-18913-11-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-09-14 17:32     ` Ross Zwisler
2016-09-14 17:32       ` Ross Zwisler
     [not found]       ` <20160914173247.GC30852-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2016-09-15  5:14         ` Christoph Hellwig
2016-09-15  5:14           ` Christoph Hellwig
2016-09-15  5:29   ` Darrick J. Wong
     [not found]     ` <20160915052933.GH9314-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-09-15  6:43       ` Christoph Hellwig [this message]
2016-09-15  6:43         ` Christoph Hellwig
2016-09-14 10:01 ` [PATCH 11/12] ext2: stop passing buffer_head to ext2_get_blocks Christoph Hellwig
     [not found]   ` <1473847291-18913-12-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-09-14 22:42     ` Ross Zwisler
2016-09-14 22:42       ` Ross Zwisler
2016-09-16 11:27 iomap based DAX path V3 Christoph Hellwig
     [not found] ` <1474025234-13804-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-09-16 11:27   ` [PATCH 10/12] xfs: use iomap to implement DAX Christoph Hellwig
2016-09-16 11:27     ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160915064320.GA7665@lst.de \
    --to=hch-jcswghmuv9g@public.gmane.org \
    --cc=darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    --cc=david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org \
    --cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org \
    --cc=linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --subject='Re: [PATCH 10/12] xfs: use iomap to implement DAX' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.