From: Jan Kara <jack@suse.cz>
To: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Jan Kara <jack@suse.cz>, Dan Williams <dan.j.williams@intel.com>,
	Dave Jiang <dave.jiang@intel.com>, linux-nvdimm@lists.01.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-api@vger.kernel.org
Subject: Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps
Date: Tue, 2 Oct 2018 16:29:59 +0200
Message-ID: <20181002142959.GD9127@quack2.suse.cz>
In-Reply-To: <20181002121039.GA3274@linux-x5ow.site>

[Added ext4, xfs, and linux-api folks to CC for the interface discussion]

On Tue 02-10-18 14:10:39, Johannes Thumshirn wrote:
> On Tue, Oct 02, 2018 at 12:05:31PM +0200, Jan Kara wrote:
> > Hello,
> >
> > commit e1fb4a086495 "dax: remove VM_MIXEDMAP for fsdax and device dax" has
> > removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the
> > mean time certain customer of ours started poking into /proc/<pid>/smaps
> > and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA
> > flags, the application just fails to start complaining that DAX support is
> > missing in the kernel. The question now is how do we go about this?
>
> OK naive question from me, how do we want an application to be able to
> check if it is running on a DAX mapping?

The question from me is: Should application really care? After all DAX is
just a caching decision. Sure it affects performance characteristics and
memory usage of the kernel but it is not a correctness issue (in particular
we took care for MAP_SYNC to return EOPNOTSUPP if the feature cannot be
supported for current mapping). And in the future the details of what we do
with DAX mapping can change - e.g. I could imagine we might decide to cache
writes in DRAM but do direct PMEM access on reads. And all this could be
auto-tuned based on media properties. And we don't want to tie our hands by
specifying too narrowly how the kernel is going to behave.

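The MAP_SYNC behaviour mentioned above can be probed from userspace. The following is only a rough sketch and not part of the original message: the helper name probe_map_sync() is hypothetical, and the fallback flag values are the asm-generic ones, which may differ on some architectures. It assumes fd was opened read-write and that len is non-zero.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers. These are the asm-generic Linux
 * UAPI values; a few architectures define MAP_SYNC differently. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * probe_map_sync() is a hypothetical helper, not an existing API.
 *
 * Returns  1 if a MAP_SYNC mapping of fd could be established,
 *          0 if the kernel refused it with EOPNOTSUPP,
 *         -1 on any other mmap() failure (errno is left set).
 *
 * MAP_SHARED_VALIDATE is required: with plain MAP_SHARED the kernel
 * silently ignores unknown flags such as MAP_SYNC.
 */
int probe_map_sync(int fd, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);

	if (p == MAP_FAILED)
		return errno == EOPNOTSUPP ? 0 : -1;

	munmap(p, len);
	return 1;
}
```

A return of 0 only says that synchronous page faults are unavailable for this mapping. As argued above, that is a statement about MAP_SYNC, not a general "is DAX in use" query.
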
OTOH I understand that e.g. for a large database application the difference
between DAX and non-DAX mapping can be a difference between performs fine
and performs terribly / kills the machine so such application might want to
determine / force caching policy to save sysadmin from debugging why the
application is misbehaving.

> AFAIU DAX is always associated with a file descriptor of some kind (be
> it a real file with filesystem dax or the /dev/dax device file for
> device dax). So could a new fcntl() be of any help here? IS_DAX() only
> checks for the S_DAX flag in inode::i_flags, so this should be doable
> for both fsdax and devdax.

So fcntl() to query DAX usage is one option. Another option is the GETFLAGS
ioctl with which you can query the state of S_DAX flag (works only for XFS
currently). But that inode flag was meant more as a hint "use DAX if
available" AFAIK so that's probably not really suitable for querying
whether DAX is really in use or not.

Since DAX is really about caching policy, I was also thinking that we could
use madvise / fadvise for this. I.e., something like MADV_DIRECT_ACCESS
which would return with success if DAX is in use, with error if not. Later,
kernel could use it as a hint to really force DAX on a mapping and not try
clever caching policies... Thoughts?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
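
The inode-flag query discussed in the message might look roughly like the sketch below. It is not from the original message and rests on an assumption: that the "GETFLAGS ioctl" refers to the extended-attribute flags interface (FS_IOC_FSGETXATTR with FS_XFLAG_DAX) that XFS exposes, which the message does not name explicitly. The helper name query_dax_hint() is hypothetical.

```c
#include <linux/fs.h>
#include <sys/ioctl.h>

/* Fallback for older kernel headers; value taken from linux/fs.h. */
#ifndef FS_XFLAG_DAX
#define FS_XFLAG_DAX 0x00008000
#endif

/*
 * query_dax_hint() is a hypothetical helper, not an existing API.
 *
 * Returns  1 if the per-inode DAX flag is set on fd,
 *          0 if it is clear,
 *         -1 if the ioctl failed (e.g. bad fd, or a filesystem that
 *            does not implement FS_IOC_FSGETXATTR).
 *
 * This reports only the on-disk hint, not whether the kernel is
 * actually using DAX for the file.
 */
int query_dax_hint(int fd)
{
	struct fsxattr fsx;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;

	return (fsx.fsx_xflags & FS_XFLAG_DAX) ? 1 : 0;
}
```

Because the flag is a "use DAX if available" hint, a result of 1 does not prove that a mapping of the file bypasses the page cache, which is the limitation pointed out in the message.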