Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Johannes Thumshirn <jthumshirn@suse.de>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-nvdimm@lists.01.org, mhocko@suse.cz,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps
Date: Tue, 2 Oct 2018 16:20:10 +0200
Message-ID: <20181002142010.GB4963@linux-x5ow.site> (raw)
In-Reply-To: <20181002121039.GA3274@linux-x5ow.site>

On Tue, Oct 02, 2018 at 02:10:39PM +0200, Johannes Thumshirn wrote:
> On Tue, Oct 02, 2018 at 12:05:31PM +0200, Jan Kara wrote:
> > Hello,
> > 
> > commit e1fb4a086495 "dax: remove VM_MIXEDMAP for fsdax and device dax" has
> > removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the
> > mean time certain customer of ours started poking into /proc/<pid>/smaps
> > and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA
> > flags, the application just fails to start complaining that DAX support is
> > missing in the kernel. The question now is how do we go about this?
> 
> OK naive question from me, how do we want an application to be able to
> check if it is running on a DAX mapping?
> 
> AFAIU DAX is always associated with a file descriptor of some kind (be
> it a real file with filesystem dax or the /dev/dax device file for
> device dax). So could a new fcntl() be of any help here? IS_DAX() only
> checks for the S_DAX flag in inode::i_flags, so this should be doable
> for both fsdax and devdax.
> 
> I haven't tried it yet but it should be fairly easy to come up with
> something like this.

OK now I did on a normal file on BTFS (without DAX obviously) and on a
file on XFS with the -o dax mount option.

Here's the RFC:

commit 3a8f0d23c421e8c91bc9d8bd3a956e1ffe3f754b
Author: Johannes Thumshirn <jthumshirn@suse.de>
Date:   Tue Oct 2 14:51:33 2018 +0200

    fcntl: provide F_GETDAX for applications to query DAX capabilities
    
    Provide a F_GETDAX fcntl(2) command so an application can query
    whether it can make use of DAX or not.
    
    Both file-system DAX as well as device DAX mark the DAX capability in
    struct inode::i_flags using the S_DAX flag, so we can query it using
    the IS_DAX() macro on a struct file's inode.
    
    If the file descriptor is either device DAX or on a DAX capable
    file-system '1' is returned back to user-space, if DAX isn't usable
    for some reason '0' is returned back.
    
    This patch can be tested with the following small C program:
    
     #include <stdio.h>
     #include <stdlib.h>
     #include <unistd.h>
     #include <fcntl.h>
     #include <libgen.h>
    
     #ifndef F_LINUX_SPECIFIC_BASE
     #define F_LINUX_SPECIFIC_BASE 1024
     #endif
    
     #define F_GETDAX               (F_LINUX_SPECIFIC_BASE + 15)
    
     int main(int argc, char **argv)
     {
            int dax;
            int fd;
            int rc;
    
            if (argc != 2) {
                    printf("Usage: %s file\n", basename(argv[0]));
                    exit(EXIT_FAILURE);
            }
    
            fd = open(argv[1], O_RDONLY);
            if (fd < 0) {
                    perror("open");
                    exit(EXIT_FAILURE);
            }
    
            rc = fcntl(fd, F_GETDAX, &dax);
            if (rc < 0) {
                    perror("fcntl");
                    close(fd);
                    exit(EXIT_FAILURE);
            }
    
            if (dax) {
                    printf("fd %d is dax capable\n", fd);
                    exit(EXIT_FAILURE);
            } else {
                    printf("fd %d is not dax capable\n", fd);
                    exit(EXIT_SUCCESS);
            }
     }
    
    Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Dan Williams <dan.j.williams@intel.com>

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 4137d96534a6..0b53f968f569 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -32,6 +32,22 @@
 
 #define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME)
 
+static int fcntl_get_dax(struct file *filp, unsigned long arg)
+{
+	struct inode *inode = file_inode(filp);
+	u64 *argp = (u64 __user *)arg;
+	u64 dax;
+
+	if (IS_DAX(inode))
+		dax = 1;
+	else
+		dax = 0;
+
+	if (copy_to_user(argp, &dax, sizeof(*argp)))
+		return -EFAULT;
+	return 0;
+}
+
 static int setfl(int fd, struct file * filp, unsigned long arg)
 {
 	struct inode * inode = file_inode(filp);
@@ -426,6 +442,9 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 	case F_SET_FILE_RW_HINT:
 		err = fcntl_rw_hint(filp, cmd, arg);
 		break;
+	case F_GETDAX:
+		err = fcntl_get_dax(filp, arg);
+		break;
 	default:
 		break;
 	}
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..65a59c3cc46d 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -52,6 +52,7 @@
 #define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
 #define F_GET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 13)
 #define F_SET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 14)
+#define F_GETDAX		(F_LINUX_SPECIFIC_BASE + 15)
 
 /*
  * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be


-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N�rnberg
GF: Felix Imend�rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N�rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

  reply index

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-02 10:05 Jan Kara
2018-10-02 10:50 ` Michal Hocko
2018-10-02 13:32   ` Jan Kara
2018-10-02 12:10 ` Johannes Thumshirn
2018-10-02 14:20   ` Johannes Thumshirn [this message]
2018-10-02 14:45     ` Christoph Hellwig
2018-10-02 15:01       ` Johannes Thumshirn
2018-10-02 15:06         ` Christoph Hellwig
2018-10-04 10:09           ` Johannes Thumshirn
2018-10-05  6:25             ` Christoph Hellwig
2018-10-05  6:35               ` Johannes Thumshirn
2018-10-06  1:17                 ` Dan Williams
2018-10-14 15:47                   ` Dan Williams
2018-10-17 20:01                     ` Dan Williams
2018-10-18 17:43                       ` Jan Kara
2018-10-18 19:10                         ` Dan Williams
2018-10-19  3:01                           ` Dave Chinner
2018-10-02 14:29   ` Jan Kara
2018-10-02 14:37     ` Christoph Hellwig
2018-10-02 14:44       ` Johannes Thumshirn
2018-10-02 14:52         ` Christoph Hellwig
2018-10-02 15:31           ` Jan Kara
2018-10-02 20:18             ` Dan Williams
2018-10-03 12:50               ` Jan Kara
2018-10-03 14:38                 ` Dan Williams
2018-10-03 15:06                   ` Jan Kara
2018-10-03 15:13                     ` Dan Williams
2018-10-03 16:44                       ` Jan Kara
2018-10-03 21:13                         ` Dan Williams
2018-10-04 10:04                         ` Johannes Thumshirn
2018-10-02 15:07       ` Jan Kara
2018-10-17 20:23     ` Jeff Moyer
2018-10-18  0:25       ` Dave Chinner
2018-10-18 14:55         ` Jan Kara
2018-10-19  0:43           ` Dave Chinner
2018-10-30  6:30             ` Dan Williams
2018-10-30 22:49               ` Dave Chinner
2018-10-30 22:59                 ` Dan Williams
2018-10-31  5:59                 ` y-goto
2018-11-01 23:00                   ` Dave Chinner
2018-11-02  1:43                     ` y-goto
2018-10-18 21:05         ` Jeff Moyer
2018-10-09 19:43 ` Jeff Moyer
2018-10-16  8:25   ` Jan Kara
2018-10-16 12:35     ` Jeff Moyer

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181002142010.GB4963@linux-x5ow.site \
    --to=jthumshirn@suse.de \
    --cc=dan.j.williams@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git