From: Jeff Moyer <jmoyer@redhat.com>
To: Jan Kara <jack@suse.cz>
Cc: Mike Snitzer <snitzer@redhat.com>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [dm-devel] Snapshot target and DAX-capable devices
Date: Thu, 30 Aug 2018 15:17:16 -0400
Message-ID: <x498t4naclf.fsf@segfault.boston.devel.redhat.com>
In-Reply-To: <20180830093028.GC1767@quack2.suse.cz> (Jan Kara's message of "Thu, 30 Aug 2018 11:30:28 +0200")

Jan Kara <jack@suse.cz> writes:

> On Tue 28-08-18 13:56:30, Mike Snitzer wrote:
>> On Tue, Aug 28 2018 at  3:50am -0400,
>> Jan Kara <jack@suse.cz> wrote:
>> 
>> > On Mon 27-08-18 16:43:28, Kani, Toshi wrote:
>> > > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote:
>> > > > Hi,
>> > > > 
>> > > > I've been analyzing why fstest generic/081 fails when the backing device is
>> > > > capable of DAX. The problem boils down to the failure of:
>> > > > 
>> > > > lvm vgcreate -f vg0 /dev/pmem0
>> > > > lvm lvcreate -L 128M -n lv0 vg0
>> > > > lvm lvcreate -s -L 4M -n snap0 vg0/lv0
>> > > > 
>> > > > The last command fails like:
>> > > > 
>> > > >   device-mapper: reload ioctl on (253:0) failed: Invalid argument
>> > > >   Failed to lock logical volume vg0/lv0.
>> > > >   Aborting. Manual intervention required.
>> > > > 
>> > > > And the core of the problem is that volume vg0/lv0 is originally of
>> > > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to
>> > > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX.
>> > > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent
>> > > > DAX mounts if not supported".
>> > > > 
>> > > > The question is whether / how this should be fixed. The current inability
>> > > > to create snapshots of DAX-capable devices looks weird and the cryptic
>> > > > failure makes it even worse (it took me quite a while to understand what is
>> > > > failing and why). OTOH I see the rationale behind Ross' change as well.
>> > > 
>> > > Here are the dm-snap changes that went along with the original DAX
>> > > support.
>> > > 
>> > > commit b5ab4a9ba55
>> > > commit f6e629bd237
>> > > 
>> > > Basically, snapshots can be added/removed to DAX-capable devices, but
>> > > snapshots need to be mounted without dax option.
>> > 
>> > Yes, and after these two commits things were working. But then commit
>> > dbc626597 broke things again so currently snapshotting DAX-capable devices
>> > does not work. Just try with 4.18...
>> 
>> Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as
>> such.  But commit dbc626597 has caused us to regress.. so we need to fix
>> it.
>> 
>> We could remove DM_TYPE_DAX_BIO_BASED completely.  But in the past I was
>> reluctant to do so because it really is unclear how/if we can even
>> support a device switching from DAX to non-DAX while IO is in-flight. DM
>> supports suspending without flushing (via dmsetup suspend --noflush) and
>> that could really be problematic if we leave DAX IO inflight and then
>> switch the DM table such that the DM device no longer supports DAX.
>
> Well, changing device from DAX-capable to DAX-incapable is problematic for
> filesystem on top of it as well. Filesystems simply don't expect this
> feature of a device can change so they would fail in unexpected ways. Also
> PFNs from the pmem (DAX-capable) device that are already mapped to user page
> tables won't magically become unmapped so those processes will still have
> DAX access to those areas of the device.
>
> But, if both original bdev and COW device are DAX-capable, we *should* be
> able to support snapshotting (and refusing mixing of DAX-capable and
> DAX-incapable devices in a snapshot is IMHO not very surprising to users).
> When creating a snapshot of a device, we need to freeze the filesystem
> using it. That will writeprotect all page tables so we are sure we'll get
> page faults (and thus ->direct_access requests from DM POV) for each write
> attempt to any mapping. Then ->direct_access method of snapshot-origin can
> make sure to copy original contents to the COW-device before returning PFN
> from ->direct_access. Similarly ->direct_access of COW-device can provide
> remapped PFN so everything should work seamlessly from user POV.

In your example above, if two processes have a file mapped with
MAP_SHARED, and P1 does a store, the new contents will not be reflected
in P2, right?  That differs from what applications expect of a shared
mapping, and from what happens when the page cache is involved.
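
For reference, the page cache behaviour being compared against can be
checked from userspace.  This is only a minimal sketch, not something
from the thread: it assumes a Linux/Unix host, uses a temporary file,
and lets two MAP_SHARED mappings in one process stand in for P1 and P2.
The helper name shared_store_is_visible() is made up for illustration.

```python
import mmap
import os
import tempfile


def shared_store_is_visible() -> bool:
    """Store through one MAP_SHARED mapping, read back through another."""
    fd, path = tempfile.mkstemp()
    try:
        # The file must be at least as long as the region we map.
        os.ftruncate(fd, mmap.PAGESIZE)
        # Two independent shared mappings of the same file page
        # (stand-ins for the mappings held by P1 and P2).
        m1 = mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        m2 = mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        try:
            m1[0:5] = b"hello"      # "P1" does a store
            return m2[0:5] == b"hello"  # "P2" loads from its own mapping
        finally:
            m1.close()
            m2.close()
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print("store visible through second mapping:", shared_store_is_visible())
```

With the page cache both mappings resolve to the same physical page, so
the store is visible without any msync() or remapping.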

I think you'd need to unmap all mappings on a CoW, whether triggered by
a store to an existing mapping or a write(2).
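
A toy model of the concern, in case it helps.  This is not kernel code
and nothing in it is a real DM or DAX interface: ToyDevice, cow_store()
and the unmap_all flag are hypothetical names, and it assumes the CoW
moves the file page to a new PFN while other processes still hold the
old PFN in their page tables.

```python
class ToyDevice:
    """Toy model: per-process page tables map a file page to a PFN."""

    def __init__(self):
        self.memory = {0: b"old"}   # PFN -> contents
        self.block_map = {0: 0}     # file page -> PFN currently backing it
        self.page_tables = {}       # process -> {file page: PFN}
        self.next_pfn = 1

    def fault(self, proc, page):
        """Install the current PFN for this page in the process's table."""
        pfn = self.block_map[page]
        self.page_tables.setdefault(proc, {})[page] = pfn
        return pfn

    def cow_store(self, proc, page, data, unmap_all):
        """Store that triggers a CoW: the page moves to a fresh PFN."""
        new_pfn = self.next_pfn
        self.next_pfn += 1
        self.memory[new_pfn] = data
        self.block_map[page] = new_pfn
        if unmap_all:
            # Drop every existing mapping so others re-fault to the new PFN.
            for table in self.page_tables.values():
                table.pop(page, None)
        # The storing process always ends up mapped to the new PFN.
        self.page_tables.setdefault(proc, {})[page] = new_pfn

    def load(self, proc, page):
        """Read through the process's mapping, faulting it in if absent."""
        table = self.page_tables.setdefault(proc, {})
        if page not in table:
            self.fault(proc, page)
        return self.memory[table[page]]


def p2_view_after_p1_store(unmap_all):
    dev = ToyDevice()
    dev.fault("P1", 0)
    dev.fault("P2", 0)
    dev.cow_store("P1", 0, b"new", unmap_all)
    return dev.load("P2", 0)


if __name__ == "__main__":
    print("without unmap:", p2_view_after_p1_store(False))
    print("with unmap:   ", p2_view_after_p1_store(True))
```

Without the unmap step P2 keeps reading through its stale PFN; with it,
P2's next access faults and picks up the remapped page.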

-Jeff