linux-fsdevel.vger.kernel.org archive mirror
* Re: [PATCH 0/3] ext2, ext4, xfs: hard fail dax mount on unsupported devices
       [not found]           ` <116ef687-f23d-b45c-1b48-fd444b346719@sandeen.net>
@ 2018-10-17 21:31             ` Jeff Moyer
  2018-10-17 21:44               ` Dan Williams
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff Moyer @ 2018-10-17 21:31 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Ross Zwisler, Christoph Hellwig, Dan Williams, Jan Kara,
	linux-xfs, linux-ext4, linux-fsdevel

Eric Sandeen <sandeen@sandeen.net> writes:

> I've been thinking about the per-inode stuff a bit, and while I don't know
> how to resolve some of the trickier issues, at least the expected behavior
> seems like something we can narrow down and specify.
>
> Because it's an on-disk flag (in xfs today, in any case) it seems that
> the only sane behavior to expect is either/or, i.e.:
>
> Mount option: All files always dax, per-inode flags ignored (or rejected)
> Per-inode: Mount option cannot be specified; only inodes explicitly flagged are dax
>
> Think about it; what would mount-option-plus-per-inode mean?  We have
> no "negative" dax flag, so while mount-option-with-flag surely means
> "dax", what the heck does mount-option-without-flag mean, and how is it
> distinguishable from mount option only?
>
> I submit that flags can only have meaning w/o the fs-wide mount option
> enabled, so the question of "should we hard fail mount -o dax for devices
> that cannot support it" seems to be orthogonal to the per-inode question.
>
> i.e. mount -o dax really can only mean "I want dax on everything" and so
> again, I think we probably need to fail the mount if that can't be honored.
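
The either/or rule quoted above can be written down as a tiny truth
table. This is only a sketch of the proposed semantics, not kernel
code; effective_dax() and print_dax_truth_table() are hypothetical
names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of the either/or rule described above.
 * Illustration only; this is not kernel code.
 */
bool effective_dax(bool mount_opt_dax, bool inode_flag_dax)
{
    if (mount_opt_dax)
        return true;          /* fs-wide dax: per-inode flag is ignored */
    return inode_flag_dax;    /* no mount option: only flagged inodes */
}

/* Print all four combinations of mount option and inode flag. */
void print_dax_truth_table(void)
{
    for (int m = 0; m <= 1; m++)
        for (int f = 0; f <= 1; f++)
            printf("mount_opt=%d inode_flag=%d -> dax=%d\n",
                   m, f, effective_dax(m != 0, f != 0));

    /* With the mount option set, the flag cannot change the outcome. */
    assert(effective_dax(true, false) == effective_dax(true, true));
}
```

The last assertion is the point of the argument: with no "negative"
flag, mount-option-without-flag and mount-option-with-flag collapse
into the same result.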

I hate to even open up this can of worms, but what about killing the dax
mount option?

To quote Christoph:
  How does an application "make use of DAX"?  What actual user visible
  semantics are associated with a file that has this flag set?

We're already talking about making caching decisions automatically, so
does DAX even mean anything at that point?  If the storage and the file
system support it, enable it.

From what we've seen so far, applications want:
1) to be able to make data persistent from userspace
   For this, we have MAP_SYNC.
2) to determine whether or not page cache will be used
   For this, we have O_DIRECT for read/write access, and MAP_SYNC for
   mmap access (and maybe a third option coming, we'll see).
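
For reference, here is roughly what requesting MAP_SYNC looks like from
userspace. It is a minimal sketch: map_persistent() and
demo_map_persistent() are hypothetical helpers, and the fallback
constant values are assumed from the generic Linux UAPI headers for
the case where the libc headers predate MAP_SYNC.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed fallback values (generic Linux UAPI) for older libc headers. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Map len bytes of fd shared and writable. MAP_SYNC is tried first;
 * if the kernel or filesystem refuses it, fall back to plain MAP_SHARED.
 * *synced reports whether the MAP_SYNC request was granted.
 */
void *map_persistent(int fd, size_t len, bool *synced)
{
    assert(len > 0 && synced != NULL);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *synced = true;
        return p;
    }
    *synced = false;
    if (errno != EOPNOTSUPP && errno != EINVAL)
        return MAP_FAILED;    /* unrelated failure, e.g. EBADF */
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Demo: map a scratch file and store one byte through the mapping. */
int demo_map_persistent(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    bool synced = false;
    int rc = -1;
    if (ftruncate(fd, 4096) == 0) {
        char *p = map_persistent(fd, 4096, &synced);
        if (p != MAP_FAILED) {
            p[0] = 'x';
            printf("MAP_SYNC granted: %s\n", synced ? "yes" : "no");
            munmap(p, 4096);
            rc = 0;
        }
    }
    fclose(f);
    return rc;
}
```

On a file that is not on dax-capable storage the first mmap() is
expected to fail with EOPNOTSUPP, so the demo normally reports "no".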

The only thing users gain from a mount option is the ability to turn OFF
dax.  I suppose there might be a use case that wants this, but I'm not
aware of it.

Cheers,
Jeff


* Re: [PATCH 0/3] ext2, ext4, xfs: hard fail dax mount on unsupported devices
  2018-10-17 21:31             ` [PATCH 0/3] ext2, ext4, xfs: hard fail dax mount on unsupported devices Jeff Moyer
@ 2018-10-17 21:44               ` Dan Williams
  2018-10-18  1:05                 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Dan Williams @ 2018-10-17 21:44 UTC (permalink / raw)
  To: jmoyer
  Cc: Eric Sandeen, zwisler, Christoph Hellwig, Jan Kara, linux-xfs,
	linux-ext4, linux-fsdevel

On Wed, Oct 17, 2018 at 2:31 PM Jeff Moyer <jmoyer@redhat.com> wrote:
>
> Eric Sandeen <sandeen@sandeen.net> writes:
>
> > I've been thinking about the per-inode stuff a bit, and while I don't know
> > how to resolve some of the trickier issues, at least the expected behavior
> > seems like something we can narrow down and specify.
> >
> > Because it's an on-disk flag (in xfs today, in any case) it seems that
> > the only sane behavior to expect is either/or, i.e.:
> >
> > Mount option: All files always dax, per-inode flags ignored (or rejected)
> > Per-inode: Mount option cannot be specified; only inodes explicitly flagged are dax
> >
> > Think about it; what would mount-option-plus-per-inode mean?  We have
> > no "negative" dax flag, so while mount-option-with-flag surely means
> > "dax", what the heck does mount-option-without-flag mean, and how is it
> > distinguishable from mount option only?
> >
> > I submit that flags can only have meaning w/o the fs-wide mount option
> > enabled, so the question of "should we hard fail mount -o dax for devices
> > that cannot support it" seems to be orthogonal to the per-inode question.
> >
> > i.e. mount -o dax really can only mean "I want dax on everything" and so
> > again, I think we probably need to fail the mount if that can't be honored.
>
> I hate to even open up this can of worms, but what about killing the dax
> mount option?
>
> To quote Christoph:
>   How does an application "make use of DAX"?  What actual user visible
>   semantics are associated with a file that has this flag set?
>
> We're already talking about making caching decisions automatically, so
> does DAX even mean anything at that point?  If the storage and the file
> system support it, enable it.
>
> From what we've seen so far, applications want:
> 1) to be able to make data persistent from userspace
>    For this, we have MAP_SYNC.
> 2) to determine whether or not page cache will be used
>    For this, we have O_DIRECT for read/write access, and MAP_SYNC for
>    mmap access (and maybe a third option coming, we'll see).

As Jan has said, it's not safe to assume that 'no page cache' is
implied with MAP_SYNC. It's a side effect not a contract of the
current implementation.

> The only thing users gain from a mount option is the ability to turn OFF
> dax.  I suppose there might be a use case that wants this, but I'm not
> aware of it.

I think we're stuck with it as many scripts would break if it ever
went completely away. However, we could mark it deprecated / ignored
provided we had a way for applications to query and override if DAX is
enabled. I also think it's important to keep separate the dax-mmap
behavior from the dax-read/write behavior. dax-mmap is where an
application would make different decisions if it can get a mapping
without page cache, dax-read/write does not appear to have any
justification to be advertised because the application would not do
anything different whether that is present or not.
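
As a sketch of what a "query" interface could look like: kernels
released after this thread (5.8 and later, to the best of my
knowledge) report a STATX_ATTR_DAX attribute through statx(2). The
helper below is hypothetical, the fallback constant value is an
assumption taken from the UAPI headers, and nothing here was
available when this message was written.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* exposes statx() in glibc */
#endif
#include <assert.h>
#include <fcntl.h>            /* AT_FDCWD */
#include <stddef.h>
#include <sys/stat.h>         /* statx(), struct statx */

/* Assumed fallback value for headers older than Linux 5.8. */
#ifndef STATX_ATTR_DAX
#define STATX_ATTR_DAX 0x00200000U
#endif

enum dax_state { DAX_UNKNOWN = -1, DAX_OFF = 0, DAX_ON = 1 };

/*
 * Report whether path is currently in the DAX state.
 * DAX_UNKNOWN means statx() failed or the kernel does not
 * advertise the attribute at all.
 */
enum dax_state query_dax(const char *path)
{
    struct statx stx;

    assert(path != NULL);
    if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx) != 0)
        return DAX_UNKNOWN;
    if (!(stx.stx_attributes_mask & STATX_ATTR_DAX))
        return DAX_UNKNOWN;
    return (stx.stx_attributes & STATX_ATTR_DAX) ? DAX_ON : DAX_OFF;
}
```

Note that this only covers the "query" half; it says nothing about
how an application would override the mode.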


* Re: [PATCH 0/3] ext2, ext4, xfs: hard fail dax mount on unsupported devices
  2018-10-17 21:44               ` Dan Williams
@ 2018-10-18  1:05                 ` Dave Chinner
  2018-10-18  2:01                   ` Dan Williams
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2018-10-18  1:05 UTC (permalink / raw)
  To: Dan Williams
  Cc: jmoyer, Eric Sandeen, zwisler, Christoph Hellwig, Jan Kara,
	linux-xfs, linux-ext4, linux-fsdevel

On Wed, Oct 17, 2018 at 02:44:55PM -0700, Dan Williams wrote:
> On Wed, Oct 17, 2018 at 2:31 PM Jeff Moyer <jmoyer@redhat.com> wrote:
> >
> > Eric Sandeen <sandeen@sandeen.net> writes:
> >
> > > I've been thinking about the per-inode stuff a bit, and while I don't know
> > > how to resolve some of the trickier issues, at least the expected behavior
> > > seems like something we can narrow down and specify.
> > >
> > > Because it's an on-disk flag (in xfs today, in any case) it seems that
> > > the only sane behavior to expect is either/or, i.e.:
> > >
> > > Mount option: All files always dax, per-inode flags ignored (or rejected)
> > > Per-inode: Mount option cannot be specified; only inodes explicitly flagged are dax
> > >
> > > Think about it; what would mount-option-plus-per-inode mean?  We have
> > > no "negative" dax flag, so while mount-option-with-flag surely means
> > > "dax", what the heck does mount-option-without-flag mean, and how is it
> > > distinguishable from mount option only?
> > >
> > > I submit that flags can only have meaning w/o the fs-wide mount option
> > > enabled, so the question of "should we hard fail mount -o dax for devices
> > > that cannot support it" seems to be orthogonal to the per-inode question.
> > >
> > > i.e. mount -o dax really can only mean "I want dax on everything" and so
> > > again, I think we probably need to fail the mount if that can't be honored.
> >
> > I hate to even open up this can of worms, but what about killing the dax
> > mount option?
> >
> > To quote Christoph:
> >   How does an application "make use of DAX"?  What actual user visible
> >   semantics are associated with a file that has this flag set?
> >
> > We're already talking about making caching decisions automatically, so
> > does DAX even mean anything at that point?  If the storage and the file
> > system support it, enable it.
> >
> > From what we've seen so far, applications want:
> > 1) to be able to make data persistent from userspace
> >    For this, we have MAP_SYNC.
> > 2) to determine whether or not page cache will be used
> >    For this, we have O_DIRECT for read/write access, and MAP_SYNC for
> >    mmap access (and maybe a third option coming, we'll see).
> 
> As Jan has said, it's not safe to assume that 'no page cache' is
> implied with MAP_SYNC. It's a side effect not a contract of the
> current implementation.

Even MAP_DIRECT shouldn't mean "no page cache". O_DIRECT is a hint,
not a guarantee, and so it may very well use the page cache if it
needs to (as I've just explained in detail in a different thread).
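
To illustrate the "hint, not a guarantee" point from the application
side, here is a minimal sketch that asks for O_DIRECT but accepts
buffered I/O when the filesystem rejects the flag.
open_prefer_direct() is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* exposes O_DIRECT in <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Open path read/write, preferring O_DIRECT. If the filesystem rejects
 * the flag with EINVAL, retry without it. *flag_accepted only records
 * that open() took the flag; the kernel may still go through the page
 * cache for individual I/Os.
 */
int open_prefer_direct(const char *path, bool *flag_accepted)
{
    assert(path != NULL && flag_accepted != NULL);

    int fd = open(path, O_RDWR | O_DIRECT);
    if (fd >= 0) {
        *flag_accepted = true;
        return fd;
    }
    *flag_accepted = false;
    if (errno != EINVAL)
        return -1;            /* unrelated failure, e.g. ENOENT or EACCES */
    return open(path, O_RDWR);
}
```

An application written this way behaves the same whether or not the
page cache is bypassed, which is why O_DIRECT cannot serve as a
"no page cache" contract.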

> > The only thing users gain from a mount option is the ability to turn OFF
> > dax.  I suppose there might be a use case that wants this, but I'm not
> > aware of it.
> 
> I think we're stuck with it as many scripts would break if it ever
> went completely away. However, we could mark it deprecated / ignored

I don't really care that much about this - it is still marked
experimental.

That said, deprecation is the best way forward here if we are going
to remove the mount option. We've done this for other XFS mount
options recently (e.g. barrier/nobarrier) where the functionality is
now fully baked into the filesystem and there's no user option to
control it anymore.

What we really need is a document describing the expected behaviour
of filesystems on dax-capable storage. Let's nail down exactly what
we need to do to pull DAX out of the experimental state before we
start changing things. We've been doing things in a very ad-hoc way
for a while now, and we're not really converging on an endpoint where we
can say "we're done, have at it".

I think we need to decide on:

- default filesystem behaviour on dax-capable block devices
- what information about DAX do applications actually need? What
  makes sense to provide them with that information?
- how to provide hints to the kernel for desired behaviour
  - on-disk inode flags, or something else?
  - dax/nodax mount options or root dir inode flags become default
    global hints?
  - is a single hint flag sufficient or do we also need an
    explicit "do not use dax" flag?
- behaviour of MAP_SYNC w.r.t. non-DAX filesystems that can provide
  required MAP_SYNC semantics
- behaviour of MAP_DIRECT - hint like O_DIRECT or guarantee?
- default read/write path behaviour of dax-capable block devices
  - automatically bypass the pagecache if bdev is capable?
- default mmap behaviour on dax capable devices
  - use dax always?
- DAX vs get_user_pages_longterm
  - turns off DAX dynamically?
  - how do DAX-enabled filesystems interact with page fault capable
    hardware? Can we allow DAX in those cases?
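
One way to picture the "single hint flag vs. explicit do-not-use-dax
flag" question is a tri-state hint. The sketch below is purely
hypothetical: the enum, the function name and the precedence order
(inode hint over mount hint over filesystem default) are assumptions
for illustration, not anything the thread has decided.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state hint; none of these names exist in the kernel. */
enum dax_hint { DAX_HINT_UNSET, DAX_HINT_ON, DAX_HINT_OFF };

/*
 * Resolve the effective mode for one inode. Assumed precedence:
 * the device must be capable, then the inode hint wins, then the
 * mount-wide hint, then the filesystem default.
 */
bool resolve_dax(bool bdev_capable, bool fs_default,
                 enum dax_hint mount_hint, enum dax_hint inode_hint)
{
    assert(mount_hint <= DAX_HINT_OFF && inode_hint <= DAX_HINT_OFF);

    if (!bdev_capable)
        return false;         /* hints cannot create capability */
    if (inode_hint != DAX_HINT_UNSET)
        return inode_hint == DAX_HINT_ON;
    if (mount_hint != DAX_HINT_UNSET)
        return mount_hint == DAX_HINT_ON;
    return fs_default;
}
```

With only a two-state flag, the DAX_HINT_OFF rows cannot be
expressed, which is the gap the "explicit do not use dax" item is
asking about.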

I'm sure there's a heap more we need to document and nail down.
There's a lot of stuff to sort out before we start hammering on
random bits of code....

> provided we had a way for applications to query and override if DAX is
> enabled. I also think it's important to keep separate the dax-mmap
> behavior from the dax-read/write behavior. dax-mmap is where an
> application would make different decisions if it can get a mapping
> without page cache,

The functionality people keep saying "requires DAX" really doesn't -
what it really requires is that mmap() exposes filesystem tracked
pmem in a CPU addressable memory range. DAX is not the only way to
do that - a filesystem with a pmem-based persistent page cache can
provide MAP_SYNC semantics to userspace without being a DAX
filesystem.

(see other thread again)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 0/3] ext2, ext4, xfs: hard fail dax mount on unsupported devices
  2018-10-18  1:05                 ` Dave Chinner
@ 2018-10-18  2:01                   ` Dan Williams
  0 siblings, 0 replies; 4+ messages in thread
From: Dan Williams @ 2018-10-18  2:01 UTC (permalink / raw)
  To: david
  Cc: jmoyer, Eric Sandeen, zwisler, Christoph Hellwig, Jan Kara,
	linux-xfs, linux-ext4, linux-fsdevel

On Wed, Oct 17, 2018 at 6:05 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Oct 17, 2018 at 02:44:55PM -0700, Dan Williams wrote:
> > On Wed, Oct 17, 2018 at 2:31 PM Jeff Moyer <jmoyer@redhat.com> wrote:
> > >
> > > Eric Sandeen <sandeen@sandeen.net> writes:
> > >
> > > > I've been thinking about the per-inode stuff a bit, and while I don't know
> > > > how to resolve some of the trickier issues, at least the expected behavior
> > > > seems like something we can narrow down and specify.
> > > >
> > > > Because it's an on-disk flag (in xfs today, in any case) it seems that
> > > > the only sane behavior to expect is either/or, i.e.:
> > > >
> > > > Mount option: All files always dax, per-inode flags ignored (or rejected)
> > > > Per-inode: Mount option cannot be specified; only inodes explicitly flagged are dax
> > > >
> > > > Think about it; what would mount-option-plus-per-inode mean?  We have
> > > > no "negative" dax flag, so while mount-option-with-flag surely means
> > > > "dax", what the heck does mount-option-without-flag mean, and how is it
> > > > distinguishable from mount option only?
> > > >
> > > > I submit that flags can only have meaning w/o the fs-wide mount option
> > > > enabled, so the question of "should we hard fail mount -o dax for devices
> > > > that cannot support it" seems to be orthogonal to the per-inode question.
> > > >
> > > > i.e. mount -o dax really can only mean "I want dax on everything" and so
> > > > again, I think we probably need to fail the mount if that can't be honored.
> > >
> > > I hate to even open up this can of worms, but what about killing the dax
> > > mount option?
> > >
> > > To quote Christoph:
> > >   How does an application "make use of DAX"?  What actual user visible
> > >   semantics are associated with a file that has this flag set?
> > >
> > > We're already talking about making caching decisions automatically, so
> > > does DAX even mean anything at that point?  If the storage and the file
> > > system support it, enable it.
> > >
> > > From what we've seen so far, applications want:
> > > 1) to be able to make data persistent from userspace
> > >    For this, we have MAP_SYNC.
> > > 2) to determine whether or not page cache will be used
> > >    For this, we have O_DIRECT for read/write access, and MAP_SYNC for
> > >    mmap access (and maybe a third option coming, we'll see).
> >
> > As Jan has said, it's not safe to assume that 'no page cache' is
> > implied with MAP_SYNC. It's a side effect not a contract of the
> > current implementation.
>
> Even MAP_DIRECT shouldn't mean "no page cache". O_DIRECT is a hint,
> not a guarantee, and so it may very well use the page cache if it
> needs to (as I've just explained in detail in a different thread).
>
> > > The only thing users gain from a mount option is the ability to turn OFF
> > > dax.  I suppose there might be a use case that wants this, but I'm not
> > > aware of it.
> >
> > I think we're stuck with it as many scripts would break if it ever
> > went completely away. However, we could mark it deprecated / ignored
>
> I don't really care that much about this - it is still marked
> experimental.
>
> That said, deprecation is the best way forward here if we are going
> to remove the mount option. We've done this for other XFS mount
> options recently (e.g. barrier/nobarrier) where the functionality is
> now fully baked into the filesystem and there's no user option to
> control it anymore.
>
> What we really need is a document describing the expected behaviour
> of filesystems on dax-capable storage. Let's nail down exactly what
> we need to do to pull DAX out of the experimental state before we
> start changing things. We've been doing things in a very ad-hoc way
> for a while now, and we're not really converging on an endpoint where we
> can say "we're done, have at it".
>
> I think we need to decide on:
>
> - default filesystem behaviour on dax-capable block devices
> - what information about DAX do applications actually need? What
>   makes sense to provide them with that information?
> - how to provide hints to the kernel for desired behaviour
>   - on-disk inode flags, or something else?
>   - dax/nodax mount options or root dir inode flags become default
>     global hints?
>   - is a single hint flag sufficient or do we also need an
>     explicit "do not use dax" flag?
> - behaviour of MAP_SYNC w.r.t. non-DAX filesystems that can provide
>   required MAP_SYNC semantics
> - behaviour of MAP_DIRECT - hint like O_DIRECT or guarantee?
> - default read/write path behaviour of dax-capable block devices
>   - automatically bypass the pagecache if bdev is capable?
> - default mmap behaviour on dax capable devices
>   - use dax always?
> - DAX vs get_user_pages_longterm
>   - turns off DAX dynamically?
>   - how do DAX-enabled filesystems interact with page fault capable
>     hardware? Can we allow DAX in those cases?
>
> I'm sure there's a heap more we need to document and nail down.
> There's a lot of stuff to sort out before we start hammering on
> random bits of code....

Nice, yes, I'll add some more:

- Is MADV_DIRECT_ACCESS a hint or a requirement?
- How does the kernel communicate the effective mode of a mapping
  taking into account madvise(), inode flags, mount options, and / or
  default fs behavior? New madvice() syscall?
- What is the behavior of dax in the presence of reflink'd extents?
  Just failing seems the 'experimental' behavior. What to do about
  page->index when page belongs to more than 1 file via reflink?
- Is there ever a case to force disable dax operation? To date we've
  only ever thought about interfaces to force *enable* dax operation
- The virtio-pmem use case wants dax mappings but requires an explicit
  fsync() instead of MAP_SYNC to flush software buffers. It's a DAX
  sub-set; should it have its own name?
- DAX operation is loosely tied to block devices. There have been
  discussions of mounting filesystems on /dev/dax devices directly.
  Should we take that to its logical conclusion and support a
  block-layer-less conversion of dax-capable file systems?
- Willy has proposed that the Xarray cache file-offset-to-physical
  address lookups; currently it only tracks dirty mapping state
- The NVDIMM sub-system tracks badblocks, but the filesystem currently
  only finds out about them late when it attempts dax_direct_access().
  Applications want to be able to list files+offsets that have
  experienced media corruption.
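
For the virtio-pmem item above, the application-side pattern would be
a shared mapping without MAP_SYNC followed by an explicit fsync(). A
minimal sketch, with hypothetical helper names and an ordinary
scratch file standing in for the device:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Copy n bytes into a shared file mapping, then make them durable
 * with an explicit fsync() on the backing descriptor. This is the
 * path for mappings that were NOT created with MAP_SYNC.
 */
int store_then_fsync(int fd, char *map, size_t off,
                     const void *src, size_t n)
{
    assert(map != NULL && src != NULL);

    memcpy(map + off, src, n);
    return fsync(fd);         /* 0 on success, -1 with errno set */
}

/* Demo on a scratch file; a real user would map a file on the device. */
int demo_store_then_fsync(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    int rc = -1;
    if (ftruncate(fd, 4096) == 0) {
        char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            rc = store_then_fsync(fd, map, 0, "hello", 5);
            if (rc == 0 && memcmp(map, "hello", 5) != 0)
                rc = -1;
            munmap(map, 4096);
        }
    }
    fclose(f);
    return rc;
}
```

The difference from the MAP_SYNC case is that durability depends on
a system call after the stores, not on the mapping flags alone.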

> > provided we had a way for applications to query and override if DAX is
> > enabled. I also think it's important to keep separate the dax-mmap
> > behavior from the dax-read/write behavior. dax-mmap is where an
> > application would make different decisions if it can get a mapping
> > without page cache,
>
> The functionality people keep saying "requires DAX" really doesn't -
> what it really requires is that mmap() exposes filesystem tracked
> pmem in a CPU addressable memory range. DAX is not the only way to
> do that - a filesystem with a pmem-based persistent page cache can
> provide MAP_SYNC semantics to userspace without being a DAX
> filesystem.

*nod*

