linux-fsdevel.vger.kernel.org archive mirror
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Toshi Kani <toshi.kani@hpe.com>,
	dm-devel@redhat.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2 4/7] dm: prevent DAX mounts if not supported
Date: Mon, 25 Jun 2018 13:20:36 -0600
Message-ID: <20180625192036.GA11672@linux.intel.com>
In-Reply-To: <20180620151748.GA4847@redhat.com>

On Wed, Jun 20, 2018 at 11:17:49AM -0400, Mike Snitzer wrote:
> On Mon, Jun 04 2018 at  7:15pm -0400,
> Ross Zwisler <ross.zwisler@linux.intel.com> wrote:
> 
> > On Fri, Jun 01, 2018 at 05:55:13PM -0400, Mike Snitzer wrote:
> > > On Tue, May 29 2018 at  3:51pm -0400,
> > > Ross Zwisler <ross.zwisler@linux.intel.com> wrote:
> > > 
> > > > Currently the code in dm_dax_direct_access() only checks whether the target
> > > > type has a direct_access() operation defined, not whether the underlying
> > > > block devices all support DAX.  This latter property can be seen by looking
> > > > at whether we set the QUEUE_FLAG_DAX request queue flag when creating the
> > > > DM device.
> > > 
> > > Wait... I thought DAX support was all or nothing?
> > 
> > Right, it is, and that's what I'm trying to capture.  The point of this series
> > is to make sure that we don't use DAX thru DM if one of the DM members doesn't
> > support DAX.
> > 
> > This is a bit tricky, though, because as you've pointed out there are a lot of
> > elements that go into a block device actually supporting DAX.  
> > 
> > First, the block device has to have a direct_access() operation defined in its
> > struct dax_operations table.  This is a static definition in the drivers,
> > though, so it's necessary but not sufficient.  For example, the PMEM driver
> > always defines a direct_access() operation, but depending on the mode of the
> > namespace (raw, fsdax or sector) it may or may not support DAX.
> > 
> > The next step is that a driver needs to say that the block queue supports
> > QUEUE_FLAG_DAX.  This again is necessary but not sufficient.  The PMEM driver
> > currently sets this for all namespace modes, but I agree that this should be
> > restricted to modes that support DAX.  Even once we do that, though, for the
> > block driver this isn't fully sufficient.  We'd really like users to call
> > bdev_dax_supported() so it can run some additional tests to make sure that DAX
> > will work.
> > 
> > So, the real test that filesystems rely on is bdev_dax_supported().
> > 
> > The trick is that with DM we need to verify each block device via
> > bdev_dax_supported() just like a filesystem would, and then have some way of
> > communicating the result of all those checks to the filesystem which is
> > eventually mounted on the DM device.  At DAX mount time the filesystem will
> > call bdev_dax_supported() on the DM device, but it'll really only check the
> > first device.  
> > 
> > So, the strategy is to have DM manually check each member device via
> > bdev_dax_supported() then if they all pass set QUEUE_FLAG_DAX.  This then
> > becomes our one source of truth on whether or not a DM device supports DAX.
> > When the filesystem mounts with DAX support it'll also run
> > bdev_dax_supported(), but if we have QUEUE_FLAG_DAX set on the DM device, we
> > know that this check will pass.
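The per-member strategy can be sketched the same way. This is again a userspace model with hypothetical names (`model_dm_dev`, `model_dm_set_queue_flags()`), not the dm-table.c code: each member's bdev_dax_supported() result is reduced to one boolean, and the DM device's flag is set only when every member passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a DM table; not the real DM code. */
struct model_member {
	const char *name;
	bool dax_supported;	/* result of bdev_dax_supported() on this member */
};

struct model_dm_dev {
	const struct model_member *members;
	size_t nr_members;
	bool queue_flag_dax;	/* stands in for QUEUE_FLAG_DAX on the DM queue */
};

/* Run the per-member check on every device; any failure disables DAX. */
static bool model_all_members_dax(const struct model_member *m, size_t n)
{
	assert(n == 0 || m != NULL);
	if (n == 0)
		return false;	/* assumption: an empty table never advertises DAX */
	for (size_t i = 0; i < n; i++)
		if (!m[i].dax_supported)
			return false;
	return true;
}

/* Table-load time: the flag becomes the single source of truth. */
static void model_dm_set_queue_flags(struct model_dm_dev *dm)
{
	dm->queue_flag_dax = model_all_members_dax(dm->members, dm->nr_members);
}

/* Mount time: with the flag as the source of truth, the check on the DM
 * device itself reduces to reading it. */
static bool model_dm_dax_supported(const struct model_dm_dev *dm)
{
	return dm->queue_flag_dax;
}
```

Treating an empty table as not DAX-capable is a choice made for this sketch only.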
> > 
> > > > This is problematic if we have, for example, a dm-linear device made up of
> > > > a PMEM namespace in fsdax mode followed by a ramdisk from BRD.
> > > > QUEUE_FLAG_DAX won't be set on the dm-linear device's request queue, but
> > > > we have a working direct_access() entry point and the first member of the
> > > > dm-linear set *does* support DAX.
> > > 
> > > If you don't have a uniformly capable device then it is very dangerous
> > > to advertise that the entire device has a certain capability.  That
> > > completely bit me in the past with discard (because for every IO I
> > > wasn't then checking if the destination device supported discards).
> > >
> > > It is all well and good that you're adding that check here.  But what I
> > > don't like is how you're saying QUEUE_FLAG_DAX implies direct_access()
> > > operation exists, yet for raw PMEM namespaces we just discussed how
> > > that is a lie.
> > 
> > QUEUE_FLAG_DAX does imply that direct_access() exists.  However, as discussed
> > above for a given bdev we really do need to check bdev_dax_supported().
> > 
> > > So this type of change showcases how the QUEUE_FLAG_DAX doesn't _really_
> > > imply direct_access() exists.
> > > 
> > > > This allows the user to create a filesystem on the dm-linear device, and
> > > > then mount it with DAX.  The filesystem's bdev_dax_supported() test will
> > > > pass because it'll operate on the first member of the dm-linear device,
> > > > which happens to be a fsdax PMEM namespace.
> > > > 
> > > > All DAX I/O will then fail to that dm-linear device because the lack of
> > > > QUEUE_FLAG_DAX prevents fs_dax_get_by_bdev() from working.  This means that
> > > > the struct dax_device isn't ever set in the filesystem, so
> > > > dax_direct_access() will always return -EOPNOTSUPP.
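A sketch of the failure sequence just described, using hypothetical `model_*` helpers as stand-ins for the mount-time check, fs_dax_get_by_bdev() and dax_direct_access(). It assumes, as the text states, that the pre-fix mount check only ever examines the first member.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a dm-linear device: pmem (fsdax) followed by brd. */
struct model_linear_member {
	const char *dev;
	bool dax_supported;
};

struct model_dm_linear {
	const struct model_linear_member *members;
	size_t n;
	bool queue_flag_dax;	/* not set, because one member lacks DAX */
};

struct model_dax_device { int unused; };
static struct model_dax_device model_dm_dax_dev;

/* Pre-fix mount-time check: only the first member is examined. */
static bool model_mount_check(const struct model_dm_linear *dm)
{
	assert(dm->n > 0);
	return dm->members[0].dax_supported;
}

/* Stand-in for fs_dax_get_by_bdev(): no flag, no dax_device. */
static struct model_dax_device *model_get_dax_dev(const struct model_dm_linear *dm)
{
	return dm->queue_flag_dax ? &model_dm_dax_dev : NULL;
}

/* Stand-in for dax_direct_access() as the filesystem sees it: without a
 * dax_device every call fails, as described above. */
static long model_direct_access(const struct model_dax_device *dax_dev)
{
	return dax_dev ? 0 : -EOPNOTSUPP;
}
```

The mount check passes while every subsequent access fails, which is the mismatch between the two code paths that the patch targets.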
> > > 
> > > Now you've lost me... these past 2 paragraphs.  Why can a user mount it
> > > in DAX mode?  Because bdev_dax_supported() only accesses the first
> > > portion (which happens to have DAX capabilities?)
> > 
> > Right.  bdev_dax_supported() runs all of its checks, and because they are
> > running against the first block device in the dm set, they all pass.  But the
> > overall DM device does not actually support DAX.
> > 
> > > Isn't this exactly why you should be checking for QUEUE_FLAG_DAX in the
> > > caller (bdev_dax_supported)?  Why not use bdev_get_queue() and verify
> > > QUEUE_FLAG_DAX is set in there?
> > 
> > I'll look into that for the next revision, thanks.
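One way to picture the suggestion above, as a userspace model: the check first looks up the queue and bails out unless the DAX flag is set. `MODEL_QUEUE_FLAG_DAX`, `model_bdev_get_queue()` and the other names are hypothetical stand-ins for QUEUE_FLAG_DAX and bdev_get_queue(); the remaining per-device tests are collapsed into a single boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_FLAG_DAX (1UL << 0)	/* hypothetical bit position */

struct model_queue { unsigned long queue_flags; };

struct model_dm_bdev {
	struct model_queue *queue;
	bool other_checks_ok;	/* remaining per-device tests, collapsed */
};

/* Stand-in for bdev_get_queue(). */
static struct model_queue *model_bdev_get_queue(const struct model_dm_bdev *bdev)
{
	return bdev->queue;
}

static bool model_bdev_dax_supported_gated(const struct model_dm_bdev *bdev)
{
	const struct model_queue *q;

	assert(bdev != NULL);
	q = model_bdev_get_queue(bdev);
	/* Gate on the queue flag first, as suggested above. */
	if (q == NULL || !(q->queue_flags & MODEL_QUEUE_FLAG_DAX))
		return false;
	return bdev->other_checks_ok;
}
```

With the gate in place, a mixed DM device is rejected even when its first member would pass every other test.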
> 
> Have you made any progress on a new revision?
> 
> > > > By failing out of dm_dax_direct_access() if QUEUE_FLAG_DAX isn't set we let
> > > > the filesystem know we don't support DAX at mount time.  The filesystem
> > > > will then silently fall back and remove the dax mount option, causing it to
> > > > work properly.
> > > 
> > > This shouldn't be needed.  Again, QUEUE_FLAG_DAX wasn't set, so don't
> > > allow code to falsely try operations that should've been gated by the
> > > fact it wasn't set.
> > 
> > Right, the goal is to make QUEUE_FLAG_DAX our one source of truth for whether
> > DM devices support DAX, and not have it half defined by that and half by the
> > DM_TYPE_DAX_BIO_BASED.
> 
> My hope is that you can ignore the DM-internal book-keeping
> (DM_TYPE_DAX_BIO_BASED) for now and just focus on fixing the real issue
> of needing proper checking (as well as properly _not_ setting
> QUEUE_FLAG_DAX in the case of pmem "raw").
> 
> Please advise, thanks Ross!

I'm back working on this, and will send out another revision in the next day
or so.

Thread overview: 42+ messages
2018-05-29 19:50 [PATCH v2 0/7] Fix DM DAX handling Ross Zwisler
2018-05-29 19:51 ` [PATCH v2 1/7] fs: allow per-device dax status checking for filesystems Ross Zwisler
2018-05-29 19:51 ` [PATCH v2 2/7] dax: change bdev_dax_supported() to support boolean returns Ross Zwisler
2018-05-29 21:25   ` Darrick J. Wong
2018-05-29 22:01     ` Ross Zwisler
2018-05-31 19:13       ` Darrick J. Wong
2018-05-31 20:34         ` Ross Zwisler
2018-05-31 20:35         ` Dan Williams
2018-05-31 20:41         ` Ross Zwisler
2018-05-31 20:52         ` Mike Snitzer
2018-05-31 22:26           ` [dm-devel] " Darrick J. Wong
2018-06-01 20:59             ` Ross Zwisler
2018-06-01  1:26         ` Dave Chinner
2018-06-01  1:57           ` Dan Williams
2018-06-01  2:24             ` Dave Chinner
2018-06-01  4:02               ` Dan Williams
2018-06-03 22:20                 ` Dave Chinner
2018-06-04  0:25                   ` Dave Chinner
2018-06-04  1:48                     ` Dan Williams
2018-06-04 23:40                       ` Dan Williams
2018-06-05  0:33                         ` Mike Snitzer
2018-06-05  5:55                           ` Dave Chinner
2018-06-05  3:32                         ` Dan Williams
2018-05-29 19:51 ` [PATCH v2 3/7] dm: fix test for DAX device support Ross Zwisler
2018-06-01 20:19   ` Mike Snitzer
2018-06-01 20:46     ` Mike Snitzer
2018-06-01 21:11       ` Ross Zwisler
2018-06-01 21:16       ` Dan Williams
2018-05-29 19:51 ` [PATCH v2 4/7] dm: prevent DAX mounts if not supported Ross Zwisler
2018-06-01 21:55   ` Mike Snitzer
2018-06-04 23:15     ` Ross Zwisler
2018-06-20 15:17       ` Mike Snitzer
2018-06-25 19:20         ` Ross Zwisler [this message]
2018-05-29 19:51 ` [PATCH v2 5/7] dm: remove DM_TYPE_DAX_BIO_BASED dm_queue_mode Ross Zwisler
2018-06-01 22:04   ` Mike Snitzer
2018-06-04 23:24     ` Ross Zwisler
2018-06-04 23:49       ` Kani, Toshi
2018-06-05  0:46       ` Mike Snitzer
2018-06-06 17:24         ` Ross Zwisler
2018-06-06 22:29           ` Mike Snitzer
2018-05-29 19:51 ` [PATCH v2 6/7] dm-snap: remove unnecessary direct_access() stub Ross Zwisler
2018-05-29 19:51 ` [PATCH v2 7/7] dm-error: " Ross Zwisler
