From: Jan Kara <jack@suse.cz>
To: Johannes Thumshirn <jthumshirn@suse.de>
Cc: linux-nvdimm@lists.01.org, linux-api@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org
Subject: Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps
Date: Tue, 2 Oct 2018 16:29:59 +0200
Message-ID: <20181002142959.GD9127@quack2.suse.cz>
In-Reply-To: <20181002121039.GA3274@linux-x5ow.site>

[Added ext4, xfs, and linux-api folks to CC for the interface discussion]

On Tue 02-10-18 14:10:39, Johannes Thumshirn wrote:
> On Tue, Oct 02, 2018 at 12:05:31PM +0200, Jan Kara wrote:
> > Hello,
> > 
> > commit e1fb4a086495 "dax: remove VM_MIXEDMAP for fsdax and device dax" has
> > removed the VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the
> > meantime a certain customer of ours started poking into /proc/<pid>/smaps,
> > looking at the VMA flags there, and if VM_MIXEDMAP is missing among the VMA
> > flags, the application just fails to start, complaining that DAX support is
> > missing in the kernel. The question now is how do we go about this?
> 
> OK, naive question from me: how do we want an application to be able to
> check if it is running on a DAX mapping?
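For reference, the kind of check the customer's application performs can be sketched roughly as below. This assumes it looks for the "mm" mnemonic (VM_MIXEDMAP) on the VmFlags line of /proc/<pid>/smaps for the VMA backing its mapping; the helper names are hypothetical, not taken from that application.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if the VmFlags line contains the whitespace-separated mnemonic
 * `flag` (e.g. "mm" for VM_MIXEDMAP). Hypothetical helper. */
bool vmflags_has(const char *line, const char *flag)
{
    const char *p = strchr(line, ':');
    size_t flen = strlen(flag);

    if (p == NULL)
        return false;
    p++;                                  /* skip the ':' */
    while (*p != '\0') {
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;                          /* skip separators */
        const char *tok = p;
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;                          /* find end of token */
        if ((size_t)(p - tok) == flen && strncmp(tok, flag, flen) == 0)
            return true;
    }
    return false;
}

/* Scan an smaps stream for the VMA containing `addr` and check its
 * VmFlags line. Returns 1 if `flag` is present, 0 if it is absent, and
 * -1 if no VMA containing `addr` (or no VmFlags line for it) was found. */
int smaps_vma_has_flag(FILE *smaps, unsigned long addr, const char *flag)
{
    char line[4096];
    bool in_vma = false;

    while (fgets(line, sizeof(line), smaps) != NULL) {
        unsigned long start, end;

        /* VMA header lines start with "start-end", both in hex. */
        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            in_vma = (addr >= start && addr < end);
            continue;
        }
        if (in_vma && strncmp(line, "VmFlags:", 8) == 0)
            return vmflags_has(line, flag) ? 1 : 0;
    }
    return -1;
}
```

An application would open /proc/self/smaps and pass the address returned by mmap(). Since commit e1fb4a086495, a DAX VMA no longer carries VM_MIXEDMAP, so this check returns 0 and the application wrongly concludes that DAX support is missing.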

My question is: should applications really care? After all, DAX is just a
caching decision. Sure, it affects performance characteristics and the
kernel's memory usage, but it is not a correctness issue (in particular, we
took care that MAP_SYNC returns EOPNOTSUPP if the feature cannot be
supported for the current mapping). Also, the details of what we do with
DAX mappings may change in the future - e.g. I could imagine we might decide
to cache writes in DRAM but do direct PMEM access on reads, and all this
could be auto-tuned based on media properties. We don't want to tie our
hands by specifying too narrowly how the kernel is going to behave.
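The MAP_SYNC behaviour mentioned above already gives applications a way to probe for the one DAX-related property that matters for correctness. A minimal sketch, assuming Linux 4.15 or later; the fallback constants are the asm-generic values and may differ (or be unsupported) on some architectures, and map_sync_supported() is a hypothetical name:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03   /* assumption: value from linux/mman.h */
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000           /* assumption: asm-generic value */
#endif

/* Probe whether `fd` (opened O_RDWR, at least `len` > 0 bytes long) can
 * be mapped with MAP_SYNC. Returns 1 if the mapping succeeds, 0 if the
 * kernel refuses it (EOPNOTSUPP, or EINVAL on kernels that predate
 * MAP_SHARED_VALIDATE), and -1 on any other error (errno is kept). */
int map_sync_supported(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p == MAP_FAILED)
        return (errno == EOPNOTSUPP || errno == EINVAL) ? 0 : -1;
    munmap(p, len);              /* probe only; drop the mapping */
    return 1;
}
```

Note that this answers "is synchronous page-fault semantics available for this file?", not "is the kernel using DAX for caching?", which is exactly the distinction made above.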

OTOH, I understand that, e.g., for a large database application the
difference between a DAX and a non-DAX mapping can be the difference between
performing fine and performing terribly / killing the machine, so such an
application might want to determine or force the caching policy to save the
sysadmin from debugging why the application is misbehaving.

> AFAIU DAX is always associated with a file descriptor of some kind (be
> it a real file with filesystem dax or the /dev/dax device file for
> device dax). So could a new fcntl() be of any help here? IS_DAX() only
> checks for the S_DAX flag in inode::i_flags, so this should be doable
> for both fsdax and devdax.

So an fcntl() to query DAX usage is one option. Another option is the
GETFLAGS ioctl, with which you can query the state of the S_DAX flag
(currently this works only for XFS). But AFAIK that inode flag was meant
more as a hint ("use DAX if available"), so it's probably not really
suitable for querying whether DAX is actually in use. Since DAX is really
about caching policy, I was also thinking that we could use madvise /
fadvise for this, i.e., something like MADV_DIRECT_ACCESS which would
return success if DAX is in use and an error if not. Later, the kernel could
use it as a hint to really force DAX on a mapping and not try clever caching
policies... Thoughts?
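As an illustration of the ioctl route, here is a sketch that reads the per-inode DAX flag, assuming the flag in question is the XFS per-inode FS_XFLAG_DAX as reported by FS_IOC_FSGETXATTR. inode_dax_hint() is a hypothetical name. As said above, a set flag only means "use DAX if available", not that DAX is actually in use.

```c
#include <sys/ioctl.h>
#include <linux/fs.h>

#ifndef FS_XFLAG_DAX
#define FS_XFLAG_DAX 0x00008000    /* assumption: value from linux/fs.h */
#endif

/* Returns 1 if the per-inode DAX hint flag is set on `fd`, 0 if it is
 * not, and -1 if the filesystem does not support FS_IOC_FSGETXATTR
 * (errno says why). A set flag is a hint, not proof that DAX is used. */
int inode_dax_hint(int fd)
{
    struct fsxattr fa = {0};

    if (ioctl(fd, FS_IOC_FSGETXATTR, &fa) != 0)
        return -1;
    return (fa.fsx_xflags & FS_XFLAG_DAX) ? 1 : 0;
}
```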

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
