From: Dave Chinner <david@fromorbit.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Jan Kara <jack@suse.cz>, Mike Snitzer <snitzer@redhat.com>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: Snapshot target and DAX-capable devices
Date: Fri, 31 Aug 2018 09:38:09 +1000	[thread overview]
Message-ID: <20180830233809.GH1572@dastard> (raw)
In-Reply-To: <alpine.LRH.2.02.1808301545200.30950@file01.intranet.prod.int.rdu2.redhat.com>

On Thu, Aug 30, 2018 at 03:47:32PM -0400, Mikulas Patocka wrote:
> 
> 
> On Thu, 30 Aug 2018, Jeff Moyer wrote:
> 
> > Mike Snitzer <snitzer@redhat.com> writes:
> > 
> > > Until we properly add DAX support to dm-snapshot I'm afraid we really do
> > > need to tolerate this "regression".  Since reality is the original
> > > support for snapshot of a DAX DM device never worked in a robust way.
> > 
> > Agreed.
> > 
> > -Jeff
> 
> You can't support dax on snapshot - if someone maps a block and the block 
> needs to be moved, then what?

This is only a problem for access via mmap and page faults.

At the filesystem level, it's no different to the existing direct IO
algorithm for read/write IO - we simply allocate new space, copy
across whatever existing data needs to be preserved (there may be
nothing to copy), and then write the new data into the new space. I'm
pretty sure that for bio-based IO to dm-snapshot devices the
algorithm will be exactly the same.
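
To make the shape of that concrete, here's a minimal user-space model
of the allocate/copy/write sequence. The struct and helper below are
made up for illustration - this is not XFS or dm code, just the
algorithm:

#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE	4096

struct cow_block {
	char data[BLOCK_SIZE];
};

/*
 * Write @len bytes from @buf at offset @off, COW style: allocate
 * new space, preserve whatever existing data we must, then write
 * the new data into the new space.  The old block is left
 * untouched for the snapshot to keep using.
 */
static struct cow_block *cow_write(const struct cow_block *old,
				   size_t off, const void *buf,
				   size_t len)
{
	struct cow_block *new = malloc(sizeof(*new));

	if (!new)
		return NULL;

	/* copy step - effectively a no-op if the write covers the block */
	memcpy(new->data, old->data, BLOCK_SIZE);

	/* write the new data into the new space */
	memcpy(new->data + off, buf, len);

	/* the caller then repoints the logical block at the new copy */
	return new;
}

For bio-based IO that's essentially all there is to it.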

However, for direct access via mmap, we have to modify how the
userspace virtual address is mapped to the physical location. IOWs,
during the COW operation, we have to invalidate all existing user
mappings we have for that physical address. This means we have to do
an invalidation after the allocate/copy part of the COW operation.

If we are doing this during a page fault, it means we'll probably
have to restart the page fault so it can look up the new physical
address associated with the faulting user address. After we've done
the invalidation, any new (or restarted) page fault finds the
location of the new copy we just made, maps it into the user address
space, updates the ptes, and we're all good.
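
In pseudo-C the fault-side ordering would look something like the
sketch below. To be clear, every helper name and return code here is
a hypothetical stand-in, not an existing kernel or XFS interface - it
only shows the order of operations:

typedef unsigned long paddr_t;

/* hypothetical helpers, named only to show the sequence */
paddr_t cow_allocate_and_copy(paddr_t old_paddr);
void fs_remap_extent(unsigned long file_off, paddr_t new_paddr);
void invalidate_user_mappings(paddr_t old_paddr);

#define FAULT_RESTART	1

int dax_cow_write_fault(unsigned long file_off, paddr_t old_paddr)
{
	/* allocate new space and copy the existing data into it */
	paddr_t new_paddr = cow_allocate_and_copy(old_paddr);

	/* point the file offset at the new physical location */
	fs_remap_extent(file_off, new_paddr);

	/*
	 * Invalidate all existing user mappings of the old physical
	 * address.  This has to happen after the allocate/copy part
	 * of the COW operation.
	 */
	invalidate_user_mappings(old_paddr);

	/*
	 * Restart the fault: the retried fault looks up file_off,
	 * finds new_paddr, maps it into the user address space and
	 * updates the ptes.
	 */
	return FAULT_RESTART;
}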

Well, that's the theory. We haven't implemented this for XFS yet, so
it might end up a little different, and we might yet hit unexpected
problems (it's DAX, that's what happens :/).

It's a whole different ballgame for a dm-snapshot device - block
devices are completely unaware of page faults to DAX file mappings.
We'll need the filesystem to be aware it's on a remappable block
device, and when we take a DAX write fault we'll need to ask the
underlying device to remap the block and treat it like the
filesystem COW case above. We'll need to do this remap/invalidate
dance in the write IO path, too, because a COW by the block device
is no different to filesystem COW in that path.
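
One way to picture that is a remap hook the filesystem could call
from its DAX write fault and write IO paths. No such interface exists
today; the structure and names below are invented purely to
illustrate the idea:

struct dax_device;	/* opaque to the filesystem */

struct dax_remap_ops {
	/*
	 * Copy @nr_sectors starting at @old_sector to newly
	 * allocated media and return the new location in
	 * *@new_sector.  The filesystem then does the same
	 * remap/invalidate dance as in its own COW path.
	 */
	int (*remap_for_write)(struct dax_device *dax_dev,
			       unsigned long long old_sector,
			       unsigned int nr_sectors,
			       unsigned long long *new_sector);
};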

Basically, it's the same algorithm as the filesystem COW case; we
just get the physical location of the data, and the notification that
a block is changing physical location, from a different interface.

Hmmmm. ISTR that someone has been making a few noises recently about
virtual block address space mapping interfaces that could help solve
this problem....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 28+ messages
2018-08-27 16:07 Snapshot target and DAX-capable devices Jan Kara
2018-08-27 16:43 ` Kani, Toshi
2018-08-28  7:50   ` Jan Kara
2018-08-28 17:56     ` Mike Snitzer
2018-08-28 22:38       ` Kani, Toshi
2018-08-30  9:30       ` Jan Kara
2018-08-30 18:49         ` Mike Snitzer
2018-08-30 19:32           ` Jeff Moyer
2018-08-30 19:47             ` Mikulas Patocka
2018-08-30 19:53               ` Jeff Moyer
2018-08-30 23:38               ` Dave Chinner [this message]
2018-08-31  9:42                 ` Jan Kara
2018-09-05  1:25                   ` Dave Chinner
2018-12-12 16:11                   ` Huaisheng Ye
2018-12-12 16:12                     ` Christoph Hellwig
2018-12-12 17:50                       ` Mike Snitzer
2018-12-12 19:49                         ` Kani, Toshi
2018-12-12 21:15                         ` Theodore Y. Ts'o
2018-12-12 22:43                           ` Mike Snitzer
2018-12-14  4:11                             ` [dm-devel] " Theodore Y. Ts'o
2018-12-14  8:24                             ` [External] " Huaisheng HS1 Ye
2018-12-18 19:49                               ` Mike Snitzer
2018-08-30 19:44           ` Mikulas Patocka
2018-08-31 10:01             ` Jan Kara
2018-08-30 22:55           ` Dave Chinner
2018-08-31  9:54           ` Jan Kara
2018-08-30 19:17         ` [dm-devel] " Jeff Moyer
2018-08-31  9:14           ` Jan Kara
