From: Jan Kara <jack@suse.cz>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Jan Kara <jack@suse.cz>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
"dm-devel@redhat.com" <dm-devel@redhat.com>,
Mikulas Patocka <mpatocka@redhat.com>,
Ross Zwisler <zwisler@kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: Snapshot target and DAX-capable devices
Date: Fri, 31 Aug 2018 11:54:35 +0200
Message-ID: <20180831095435.GC11622@quack2.suse.cz>
In-Reply-To: <20180830184907.GA14867@redhat.com>

On Thu 30-08-18 14:49:07, Mike Snitzer wrote:
> On Thu, Aug 30 2018 at 5:30am -0400,
> Jan Kara <jack@suse.cz> wrote:
>
> > On Tue 28-08-18 13:56:30, Mike Snitzer wrote:
> > > On Tue, Aug 28 2018 at 3:50am -0400,
> > > Jan Kara <jack@suse.cz> wrote:
> > >
> > > > On Mon 27-08-18 16:43:28, Kani, Toshi wrote:
> > > > > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I've been analyzing why fstest generic/081 fails when the backing device is
> > > > > > capable of DAX. The problem boils down to the failure of:
> > > > > >
> > > > > > lvm vgcreate -f vg0 /dev/pmem0
> > > > > > lvm lvcreate -L 128M -n lv0 vg0
> > > > > > lvm lvcreate -s -L 4M -n snap0 vg0/lv0
> > > > > >
> > > > > > The last command fails like:
> > > > > >
> > > > > > device-mapper: reload ioctl on (253:0) failed: Invalid argument
> > > > > > Failed to lock logical volume vg0/lv0.
> > > > > > Aborting. Manual intervention required.
> > > > > >
> > > > > > And the core of the problem is that volume vg0/lv0 is originally of
> > > > > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to
> > > > > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX.
> > > > > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent
> > > > > > DAX mounts if not supported".
> > > > > >
> > > > > > The question is whether / how this should be fixed. The current inability
> > > > > > to create snapshots of DAX-capable devices looks weird and the cryptic
> > > > > > failure makes it even worse (it took me quite a while to understand what is
> > > > > > failing and why). OTOH I see the rationale behind Ross' change as well.
> > > > >
> > > > > Here are the dm-snap changes that went along with the original DAX
> > > > > support.
> > > > >
> > > > > commit b5ab4a9ba55
> > > > > commit f6e629bd237
> > > > >
> > > > > Basically, snapshots can be added/removed to DAX-capable devices, but
> > > > > snapshots need to be mounted without dax option.
> > > >
> > > > Yes, and after these two commits things were working. But then commit
> > > > dbc626597 broke things again so currently snapshotting DAX-capable devices
> > > > does not work. Just try with 4.18...
> > >
> > > Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as
> > > such. But commit dbc626597 has caused us to regress.. so we need to fix
> > > it.
> > >
> > > We could remove DM_TYPE_DAX_BIO_BASED completely. But in the past I was
> > > reluctant to do so because it really is unclear how/if we can even
> > > support a device switching from DAX to non-DAX while IO is in-flight. DM
> > > supports suspending without flushing (via dmsetup suspend --noflush) and
> > > that could really be problematic if we leave DAX IO inflight and then
> > > switch the DM table such that the DM device no longer supports DAX.
> >
> > Well, changing device from DAX-capable to DAX-incapable is problematic for
> > filesystem on top of it as well. Filesystems simply don't expect this
> > feature of a device can change so they would fail in unexpected ways. Also
> > PFNs from the pmem (DAX-capable) device that are already mapped to user page
> > tables won't magically become unmapped so those processes will still have
> > DAX access to those areas of the device.
> >
> > But, if both original bdev and COW device are DAX-capable, we *should* be
> > able to support snapshotting (and refusing mixing of DAX-capable and
> > DAX-incapable devices in a snapshot is IMHO not very surprising to users).
> > When creating a snapshot of a device, we need to freeze the filesystem
> > using it. That will writeprotect all page tables so we are sure we'll get
> > page faults (and thus ->direct_access requests from DM POV) for each write
> > attempt to any mapping. Then ->direct_access method of snapshot-origin can
> > make sure to copy original contents to the COW-device before returning PFN
> > from ->direct_access. Similarly ->direct_access of COW-device can provide
> > remapped PFN so everything should work seamlessly from user POV.
> >
> > So something like the above would seem like the best solution from user
> > POV. Implementation of the above would not be completely trivial though as
> > far as I'm looking into DM code. We'd have to implement ->direct_access
> > paths for dm-snap and also I have a vague memory ->direct_access is not
> > allowed to sleep these days and DM uses sleeping locks all around... Dan
> > should know how big an obstacle it would be to reintroduce the sleeping
> > possibility (I'm not currently aware of any particular problem with that
> > but I'm not paying close attention to those parts of NVDIMM code).
>
> Thanks for these details Jan. Think Dan is on sabbatical so we'll need
> Ross to weigh in.

Ross was on vacation as well and I didn't get any email from him for a few
weeks. I'm not sure when he'll be back. So I guess we are on our own.

> As you point out, how are the upper layers (e.g. filesystems) supposed
> to reliably cope with this runtime switch from DAX to non-DAX access?

As Dave wrote, switching of underlying device from DAX to non-DAX would be
very difficult to implement currently so that's IMHO a no-go.
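
To illustrate why the reload from the original report fails, here is a toy
model of the type check. It is a sketch, not the actual DM code: the two
DM_TYPE_* names come from this thread, while the toy_* functions and the
rule that a device may not go from the DAX type to the non-DAX type are
assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* The two type names appear in the thread; everything else is hypothetical. */
enum toy_dm_type { DM_TYPE_BIO_BASED, DM_TYPE_DAX_BIO_BASED };

/*
 * A table (n >= 1 targets) counts as DAX-typed only if every target
 * in it supports DAX.
 */
static enum toy_dm_type toy_table_type(const bool *target_dax, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!target_dax[i])
			return DM_TYPE_BIO_BASED;
	return DM_TYPE_DAX_BIO_BASED;
}

/*
 * Assumed rule: a reload must not take a DAX device to a non-DAX type.
 * On refusal the device keeps its old type and -EINVAL is returned.
 */
static int toy_table_reload(enum toy_dm_type *dev_type,
			    const bool *target_dax, size_t n)
{
	enum toy_dm_type new_type = toy_table_type(target_dax, n);

	if (*dev_type == DM_TYPE_DAX_BIO_BASED && new_type == DM_TYPE_BIO_BASED)
		return -EINVAL;

	*dev_type = new_type;
	return 0;
}
```

In this model, reloading a DM_TYPE_DAX_BIO_BASED device with a table whose
only target lacks DAX support returns -EINVAL and leaves the type unchanged,
which corresponds to the "Invalid argument" from the reload ioctl reported
at the start of the thread.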
> It does look like we'll need the more elaborate work you outlined
> above. It could be that Mikulas will have interest, DAX expertise and
> time to do the work.

OK, thanks for including him.
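
As a rough illustration of the work outlined above, the copy-before-write
behaviour of ->direct_access could look like the following userspace sketch.
It is a toy model under stated assumptions, not dm-snap code: pages are tiny
byte arrays instead of PFNs, the struct and the toy_* functions are
hypothetical, and locking (the sleeping-lock concern mentioned above) is
ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_PAGES 4
#define PAGE_SZ  8	/* tiny "pages" keep the example readable */

/* Hypothetical toy model of an origin device plus its COW device. */
struct toy_snapshot {
	char origin[NR_PAGES][PAGE_SZ];	/* live data on the origin */
	char cow[NR_PAGES][PAGE_SZ];	/* preserved copies on the COW device */
	bool copied[NR_PAGES];		/* has this page been preserved yet? */
};

/*
 * Origin side: assumed to run on a write fault, after the filesystem
 * freeze has write-protected all mappings. The old contents are copied
 * to the COW device before the writable page is handed out.
 */
static char *toy_origin_direct_access(struct toy_snapshot *s, int pg)
{
	if (!s->copied[pg]) {
		memcpy(s->cow[pg], s->origin[pg], PAGE_SZ);
		s->copied[pg] = true;
	}
	return s->origin[pg];
}

/*
 * Snapshot side: return the remapped (preserved) page if one exists,
 * otherwise the still-unmodified origin page.
 */
static const char *toy_snapshot_direct_access(const struct toy_snapshot *s,
					       int pg)
{
	return s->copied[pg] ? s->cow[pg] : s->origin[pg];
}
```

A write that goes through toy_origin_direct_access() changes only the origin
page; a reader going through toy_snapshot_direct_access() keeps seeing the
contents from the time of the snapshot.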
> In general I _never_ should have taken commit f6e629bd237 ("dm snap: add
> fake origin_direct_access"). It gave the illusion that DAX is supported
> by dm-snapshot-origin when in reality it simply returns -EIO. Expecting
> that this will "just work" because the bio-based path would be used
> instead is extremely fragile.
>
> Until we properly add DAX support to dm-snapshot I'm afraid we really do
> need to tolerate this "regression", since in reality the original
> support for snapshot of a DAX DM device never worked in a robust way.
> I'm running the risk of making people's heads explode but I cannot just
> drop everything and scramble to implement all the required DAX changes
> in dm-snapshot.

Yeah, I don't think the regression is critical. I just wanted to point out
that it exists and that we should look into fixing it...

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR