From: Ross Zwisler <ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
To: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>,
	linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org
Subject: Re: [PATCH 06/10] dax: provide an iomap based fault handler
Date: Fri, 23 Sep 2016 15:02:37 -0600
Message-ID: <20160923210237.GA23346@linux.intel.com>
In-Reply-To: <20160914070633.GA17278-jcswGhMUV9g@public.gmane.org>

On Wed, Sep 14, 2016 at 09:06:33AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 13, 2016 at 09:51:26AM -0600, Ross Zwisler wrote:
> > I'm working on this right now.  I expect that most/all of the infrastructure
> > between the bh+get_block_t version and the iomap version to be shared, it'll
> > just be a matter of having a PMD version of the iomap fault handler.  This
> > should be pretty minor.
> 
> Yes, I looked at it (although I didn't do any work yet), and the work
> should be fairly easy.
> 
> > Let's see how it goes, but right now my plan is to have both - I'd like to
> > keep feature parity between ext2/ext4 and XFS, and that means having PMD
> > faults in ext4 via bh+get_block_t until they move over to iomap.
> > 
> > Regarding coordination, the PMD v2 series hasn't gotten much review so far, so
> > I'm not sure it'll go in for v4.9.  At this point I'm planning on just
> > rebasing on top of your iomap series, though if it gets taken sooner I
> > wouldn't object.
> 
> So let's do iomap first.  I've got stable ext2 support, as well as support
> for the block device, although I'm not sure what the proper testing
> protocol for that is.  I've started ext4 and read / zero was easy, but
> now I'm stuck in the convoluted mess that is the ext4 direct I/O and
> DAX path.
> 
> Maybe we should get the iomap work into 4.9 and then convert over ext4
> as well as adding PMD fault support in the next release.

I was doing some testing of my PMD patches, and was surprised to see that
ext4 + PMDs + generic/074 now takes 23 minutes to complete.  With ext4 and
PTE faults this test takes ~50 seconds in the same setup, and XFS + PMDs
takes 27 seconds.

The root cause is that ext4 is using the direct I/O path for reads and writes,
and that the DIO path thinks it needs to flush dirty data from the radix tree
on each I/O.

Each read ends up writing back PMDs via:

  vfs_read()
    __vfs_read()
      new_sync_read()
        generic_file_read_iter()
          filemap_write_and_wait_range()

I believe we have an analogous problem for writes.

This results in us flushing 12 TiB worth of data during the generic/074 test,
one cache line at a time...

With your recent changes to detangle the DAX and DIO paths in XFS, we
avoid this because XFS no longer uses generic_file_read_iter() for DAX, but
instead uses xfs_file_read_iter(), which skips the writeback and just calls
xfs_file_dax_read() to do the I/O.

This is obviously an existing issue in ext4 that we need to address.  Even
with the PTE path we are doing tons of unnecessary flushing, but the move from
flushing PTEs to flushing PMDs is what killed us.

I can just add a hack to hop over the writeback in generic_file_read_iter(),
but I hesitate to do this because it seems like the correct thing to do is to
separate the ext4 DAX & DIO paths, which I think you are already doing.

I believe that my DAX PMD patches are ready to go, but because of this issue
they currently only support XFS.  I'm tempted to send them out as they are
right now since they add a bunch of complexity to DAX that we need to review,
and that review can fully happen with only XFS support. We can add ext4
support back in later when it's ready.

Thoughts?
