Linux-Fsdevel Archive on lore.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>, Johannes Thumshirn <jthumshirn@suse.de>,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Michal Hocko <mhocko@suse.cz>
Subject: Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps
Date: Fri, 19 Oct 2018 14:01:03 +1100
Message-ID: <20181019030103.GG18822@dastard>
In-Reply-To: <CAPcyv4gEmCt3OwQ_AoFCmpX5fmmBppvaxtQ+uPT=_f2MXezcGg@mail.gmail.com>

On Thu, Oct 18, 2018 at 12:10:13PM -0700, Dan Williams wrote:
> The only caveat to address all the use cases for applications making
> decisions based on the presence of DAX

And that's how we've got into this mess.

Applications need to focus on the functionality they require, not
the technology that provides it. That's the root of the problem we
are trying to solve here, and really I don't care if we have to break
existing applications to do it. i.e. we've made no promises about
API/ABI stability and the functionality is still experimental.

Fundamentally, DAX is a technology, not an API property. The two
"DAX" API properties that matter to applications are:

	1. does mmap allow us to use CPU flush instructions for data
	integrity operations safely? And
	2. can mmap directly access the backing store without
	incurring any additional overhead?

MAP_SYNC provides #1, MAP_DIRECT provides #2, and DAX provides both.
However, they do not define DAX, nor does DAX define them. e.g.

	MAP_SYNC can be provided by a persistent memory page cache.
	But a persistent memory page cache does not provide
	MAP_DIRECT.

	MAP_SYNC can be provided by filesystem DAX, but *only* when
	direct access is used. i.e. MAP_SYNC | MAP_DIRECT

	MAP_DIRECT can be provided by filesystem DAX, but it does
	not imply or require MAP_SYNC behaviour.

IOWs, using MAP_SYNC and/or MAP_DIRECT to answer an "is DAX
present" question ties the API to a technology rather than to the
functionality the technology provides applications.
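
As a rough illustration of the examples above, the sketch below
models each backing technology as the set of properties it can
provide, and fails any request that asks for more than that.
Everything in it is hypothetical: the CAP_* bits, the backend names
and request_mapping() are invented for the example and are not a
kernel or libc interface.

```c
#include <errno.h>

/* Hypothetical capability bits, one per API property discussed above. */
#define CAP_SYNC   (1u << 0)	/* CPU flush instructions are safe for data integrity */
#define CAP_DIRECT (1u << 1)	/* mapping reaches the backing store without extra overhead */

/* Hypothetical backing technologies. */
enum backend { PAGE_CACHE, PMEM_PAGE_CACHE, FS_DAX };

/* What each technology is able to provide. */
static unsigned int backend_caps(enum backend b)
{
	switch (b) {
	case PMEM_PAGE_CACHE:
		return CAP_SYNC;		/* sync semantics, but never direct access */
	case FS_DAX:
		return CAP_SYNC | CAP_DIRECT;	/* sync is only on offer because access is direct */
	default:
		return 0;			/* volatile page cache: neither property */
	}
}

/*
 * The application asks for properties, never for a technology.
 * Returns 0 if the backend can honour every requested property,
 * -EOPNOTSUPP otherwise.
 */
static int request_mapping(enum backend b, unsigned int want)
{
	if (want & ~backend_caps(b))
		return -EOPNOTSUPP;
	return 0;
}
```

Note that nothing in the caller ever asks "is this DAX?"; it only
learns whether the properties it asked for were granted.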

i.e. If the requested behaviour/property is not available from the
underlying technology, then the app needs to handle that error and
use a different access method.
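
For the MAP_SYNC half of this, which already exists, the error
handling could look something like the sketch below. It is only a
sketch: struct mapping and map_file() are names made up for the
example, the helper creates and sizes the file itself just to stay
self-contained, and the fallback #defines assume the asm-generic flag
values used on x86. MAP_DIRECT is not shown because it is only a
proposal at this point.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older libc headers (assumed asm-generic values). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/* Hypothetical result type for this example. */
struct mapping {
	void *addr;
	size_t len;
	int cpu_flush_ok;	/* 1: MAP_SYNC granted, 0: use msync()/fsync() */
};

/* Map len bytes of path; returns 0 on success, -1 on failure. */
static int map_file(const char *path, size_t len, struct mapping *m)
{
	int fd = open(path, O_RDWR | O_CREAT, 0600);
	void *p;

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}

	/*
	 * Ask for the functionality. MAP_SHARED_VALIDATE makes the kernel
	 * reject flags it cannot honour instead of silently ignoring them.
	 */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	m->cpu_flush_ok = (p != MAP_FAILED);

	/* Not available here: handle the error and use a different method. */
	if (p == MAP_FAILED)
		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	close(fd);	/* the mapping stays valid after the fd is closed */
	if (p == MAP_FAILED)
		return -1;

	m->addr = p;
	m->len = len;
	return 0;
}
```

The caller then looks at cpu_flush_ok to pick its data integrity
method; at no point does it need to know what provided the mapping.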

> applications making
> decisions based on the presence of DAX
> is to make MADV_DIRECT_ACCESS
> fail if the mapping was not established with MAP_SYNC.

And so this is wrong - MADV_DIRECT_ACCESS does not require MAP_SYNC.

It is perfectly legal for MADV_DIRECT_ACCESS to be used without
MAP_SYNC - the app just needs to use msync/fsync instead.
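
A minimal sketch of that msync path, for a mapping that was not
established with MAP_SYNC. persist_msync() and store_durably() are
hypothetical helper names, and the file is created and sized inside
the helper only to keep the example self-contained.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Write back [addr, addr + len) of a shared file mapping via the kernel. */
static int persist_msync(const void *addr, size_t len)
{
	uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
	/* msync() wants a page-aligned start address, so round down. */
	uintptr_t start = (uintptr_t)addr & ~(page - 1);
	size_t span = ((uintptr_t)addr - start) + len;

	return msync((void *)start, span, MS_SYNC);
}

/*
 * Store msg at byte offset off of path through a plain MAP_SHARED
 * mapping, then make it durable with msync() rather than CPU flushes.
 * Returns 0 on success, -1 on failure.
 */
static int store_durably(const char *path, size_t off, const char *msg)
{
	size_t n = strlen(msg);
	size_t len = off + n;
	int fd = open(path, O_RDWR | O_CREAT, 0600);
	char *p;
	int ret;

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	if (p == MAP_FAILED)
		return -1;

	memcpy(p + off, msg, n);		/* ordinary CPU stores */
	ret = persist_msync(p + off, n);	/* kernel-mediated data integrity */
	munmap(p, len);
	return ret;
}
```

Whether the mapping is direct or goes through the page cache makes no
difference to this code, which is the point.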

Wanting to enable full userspace CPU data sync semantics via
madvise() implies we also need MADV_SYNC in addition to
MADV_DIRECT_ACCESS.

i.e. Apps that are currently testing for DAX should use
mmap(MAP_SYNC|MAP_DIRECT) or madvise(MADV_SYNC|MADV_DIRECT_ACCESS)
and they will fail if the underlying storage is not DAX capable. The
app doesn't need to poke at anything else to see if DAX is enabled -
if the functionality is there then it will work, otherwise they need
to handle the error and do something else.

> That way we
> have both a way to assert that page cache resources are not being
> consumed, and that the kernel is handling metadata synchronization for
> any write-faults.

Yes, we need to do that, but not at the cost of having the API
prevent apps from ever being able to use direct access + msync/fsync
data integrity operations.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
