From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.cz>, Jan Kara <jack@suse.cz>,
linux-nvdimm <linux-nvdimm@lists.01.org>,
Christoph Hellwig <hch@infradead.org>,
Linux MM <linux-mm@kvack.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps
Date: Fri, 19 Oct 2018 14:01:03 +1100
Message-ID: <20181019030103.GG18822@dastard>
In-Reply-To: <CAPcyv4gEmCt3OwQ_AoFCmpX5fmmBppvaxtQ+uPT=_f2MXezcGg@mail.gmail.com>
On Thu, Oct 18, 2018 at 12:10:13PM -0700, Dan Williams wrote:
> The only caveat to address all the use cases for applications making
> decisions based on the presence of DAX
And that's how we've got into this mess.
Applications need to focus on the functionality they require, not
the technology that provides it. That's the root of the problem we are
trying to solve here and really I don't care if we have to break
existing applications to do it. i.e. we've made no promises about
API/ABI stability and the functionality is still experimental.
Fundamentally, DAX is a technology, not an API property. The two
"DAX" API properties that matter to applications are:
1. does mmap allow us to use CPU flush instructions for data
integrity operations safely? And
2. can mmap directly access the backing store without
incurring any additional overhead?
MAP_SYNC provides #1, MAP_DIRECT provides #2, and DAX provides both.
However, they do not define DAX, nor does DAX define them. e.g.
  - MAP_SYNC can be provided by a persistent memory page cache.
    But a persistent memory page cache does not provide
    MAP_DIRECT.
  - MAP_SYNC can be provided by filesystem DAX, but *only* when
    direct access is used. i.e. MAP_SYNC | MAP_DIRECT
  - MAP_DIRECT can be provided by filesystem DAX, but it does
    not imply or require MAP_SYNC behaviour.
IOWs, using MAP_SYNC and/or MAP_DIRECT to answer an "is DAX
present" question ties the API to a technology rather than to the
functionality the technology provides applications.
i.e. If the requested behaviour/property is not available from the
underlying technology, then the app needs to handle that error and
use a different access method.
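As a rough sketch of "ask for the behaviour, handle the error": the
helper below requests MAP_SYNC and falls back to an ordinary shared
mapping when the kernel or filesystem refuses. MAP_SYNC and
MAP_SHARED_VALIDATE are existing Linux mmap flags; MAP_DIRECT is only
a proposal in this thread, so it is not requested here. The names
pmem_map and pmem_map_file are hypothetical, made up for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers (asm-generic Linux UAPI values). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/* Hypothetical helper type: records which guarantee the mapping got. */
struct pmem_map {
	void *addr;
	size_t len;
	int cpu_flush_ok; /* 1: MAP_SYNC granted, 0: use msync()/fsync() */
};

/* Returns 0 on success, -1 with errno set on failure. */
static int pmem_map_file(int fd, size_t len, struct pmem_map *out)
{
	/* Ask for the property we want, not for "DAX". */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (p != MAP_FAILED) {
		out->addr = p;
		out->len = len;
		out->cpu_flush_ok = 1;
		return 0;
	}

	/*
	 * EOPNOTSUPP: this file cannot provide MAP_SYNC.
	 * EINVAL: kernel predates MAP_SHARED_VALIDATE.
	 * Anything else is a real error, so report it.
	 */
	if (errno != EOPNOTSUPP && errno != EINVAL)
		return -1;

	/* Different access method: plain shared mapping + msync/fsync. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return -1;

	out->addr = p;
	out->len = len;
	out->cpu_flush_ok = 0;
	return 0;
}
```

The caller only looks at cpu_flush_ok to pick its data integrity
method; it never needs to know which technology sits underneath.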
> applications making
> decisions based on the presence of DAX
> is to make MADV_DIRECT_ACCESS
> fail if the mapping was not established with MAP_SYNC.
And so this is wrong - MADV_DIRECT_ACCESS does not require MAP_SYNC.
It is perfectly legal for MADV_DIRECT_ACCESS to be used without
MAP_SYNC - the app just needs to use msync/fsync instead.
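To illustrate the msync path for a mapping that was not established
with MAP_SYNC, here is a minimal sketch. It only uses memcpy, sysconf
and msync(MS_SYNC); the function name store_and_msync is hypothetical,
and it assumes map_base is the page-aligned address returned by mmap
for a MAP_SHARED file mapping.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Copy n bytes into the mapping at offset off, then make them durable
 * with msync() instead of CPU flush instructions.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int store_and_msync(void *map_base, size_t map_len, size_t off,
			   const void *src, size_t n)
{
	/* Refuse stores that would run past the end of the mapping. */
	if (off > map_len || n > map_len - off) {
		errno = EINVAL;
		return -1;
	}

	memcpy((char *)map_base + off, src, n);

	/* msync() wants a page-aligned start, so round the offset down. */
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t start = off & ~(page - 1);

	return msync((char *)map_base + start, (off - start) + n, MS_SYNC);
}
```

The same call works whether or not the mapping is direct access, which
is why requiring MAP_SYNC for direct access would be an unnecessary
restriction.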
Wanting to enable full userspace CPU data sync semantics via
madvise() implies we also need MADV_SYNC in addition to
MADV_DIRECT_ACCESS.
i.e. Apps that are currently testing for DAX should use
mmap(MAP_SYNC|MAP_DIRECT) or madvise(MADV_SYNC|MADV_DIRECT_ACCESS),
which will fail if the underlying storage is not DAX capable. The app
doesn't need to poke at anything else to see if DAX is enabled - if
the functionality is there then it will work, otherwise they need to
handle the error and do something else.
> That way we
> have both a way to assert that page cache resources are not being
> consumed, and that the kernel is handling metadata synchronization for
> any write-faults.
Yes, we need to do that, but not at the cost of having the API
prevent apps from ever being able to use direct access + msync/fsync
data integrity operations.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com