* [PATCH 0/5] DAX updates for 4.2
@ 2015-06-29 20:02 Matthew Wilcox
  2015-06-29 20:02 ` [PATCH 1/5] dax: Add block size note to documentation Matthew Wilcox
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Matthew Wilcox @ 2015-06-29 20:02 UTC (permalink / raw)
  To: linux-fsdevel, Alexander Viro; +Cc: Matthew Wilcox

Five small independent changes for DAX.  Al, please can you take these
through the VFS tree?

Matthew Wilcox (5):
  dax: Add block size note to documentation
  dax: Use copy_from_iter_nocache
  block: Add support for DAX on block devices
  ext4: Use ext4_get_block_write() for DAX
  vfs: Allow truncate, chmod and chown to be interrupted by fatal
    signals

 Documentation/filesystems/dax.txt |  6 ++++--
 fs/block_dev.c                    | 38 ++++++++++++++++++++++++++++++++++++--
 fs/dax.c                          |  8 +++++---
 fs/ext4/file.c                    |  5 ++---
 fs/open.c                         |  9 ++++++---
 5 files changed, 53 insertions(+), 13 deletions(-)

-- 
2.1.4



* [PATCH 1/5] dax: Add block size note to documentation
  2015-06-29 20:02 [PATCH 0/5] DAX updates for 4.2 Matthew Wilcox
@ 2015-06-29 20:02 ` Matthew Wilcox
  2015-06-29 20:02 ` [PATCH 2/5] dax: Use copy_from_iter_nocache Matthew Wilcox
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Matthew Wilcox @ 2015-06-29 20:02 UTC (permalink / raw)
  To: linux-fsdevel, Alexander Viro; +Cc: Matthew Wilcox

From: Matthew Wilcox <willy@linux.intel.com>

For block devices which are small enough, mkfs will default to creating
a filesystem with block sizes smaller than page size.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 Documentation/filesystems/dax.txt | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index baf4111..7af2851 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -18,8 +18,10 @@ Usage
 -----
 
 If you have a block device which supports DAX, you can make a filesystem
-on it as usual.  When mounting it, use the -o dax option manually
-or add 'dax' to the options in /etc/fstab.
+on it as usual.  The DAX code currently only supports files with a block
+size equal to your kernel's PAGE_SIZE, so you may need to specify a block
+size when creating the filesystem.  When mounting it, use the "-o dax"
+option on the command line or add 'dax' to the options in /etc/fstab.
 
 
 Implementation Tips for Block Driver Writers
-- 
2.1.4
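
The block size requirement in the documentation change above can be checked from userspace before mounting with -o dax. What follows is only a minimal sketch: the helper name fs_block_size_is_page_size is hypothetical, and it assumes that statvfs() reports the filesystem block size in f_bsize, which holds for ext4 on Linux but is not guaranteed for every filesystem.

```c
#include <sys/statvfs.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch.
 * Returns 1 if the filesystem mounted at (or containing) path uses a
 * block size equal to the kernel's page size, 0 if it does not, and
 * -1 if either value could not be determined.
 */
int fs_block_size_is_page_size(const char *path)
{
	struct statvfs sv;
	long page = sysconf(_SC_PAGESIZE);

	if (page < 0 || statvfs(path, &sv) != 0)
		return -1;
	return sv.f_bsize == (unsigned long)page;
}
```

A result of 0 suggests the filesystem would have to be recreated with an explicit block size before DAX can be used on it.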



* [PATCH 2/5] dax: Use copy_from_iter_nocache
  2015-06-29 20:02 [PATCH 0/5] DAX updates for 4.2 Matthew Wilcox
  2015-06-29 20:02 ` [PATCH 1/5] dax: Add block size note to documentation Matthew Wilcox
@ 2015-06-29 20:02 ` Matthew Wilcox
  2015-06-29 20:02 ` [PATCH 3/5] block: Add support for DAX on block devices Matthew Wilcox
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Matthew Wilcox @ 2015-06-29 20:02 UTC (permalink / raw)
  To: linux-fsdevel, Alexander Viro; +Cc: Matthew Wilcox

From: Matthew Wilcox <willy@linux.intel.com>

When userspace does a write, there's no need for the written data to
pollute the CPU cache.  This matches the original XIP code.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 fs/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 6f65f00..159f796 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -155,7 +155,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 		}
 
 		if (iov_iter_rw(iter) == WRITE)
-			len = copy_from_iter(addr, max - pos, iter);
+			len = copy_from_iter_nocache(addr, max - pos, iter);
 		else if (!hole)
 			len = copy_to_iter(addr, max - pos, iter);
 		else
-- 
2.1.4
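
For readers unfamiliar with "nocache" copies, here is a rough userspace analogue. It is not how copy_from_iter_nocache() is implemented; it only illustrates the general idea of writing data with non-temporal (streaming) stores so that the destination does not displace useful lines in the CPU cache. The function name copy_nocache_sketch is hypothetical, the streaming path assumes an x86 build with SSE2, and everything else falls back to plain memcpy().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy len bytes, hinting that dst need not be cached. */
void copy_nocache_sketch(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

#if defined(__SSE2__)
	/* Streaming stores require a 16-byte aligned destination. */
	if (((uintptr_t)d & 15) == 0) {
		while (len >= 16) {
			__m128i v = _mm_loadu_si128((const __m128i *)s);

			_mm_stream_si128((__m128i *)d, v);
			d += 16;
			s += 16;
			len -= 16;
		}
		/* Make the streaming stores visible before later stores. */
		_mm_sfence();
	}
#endif
	/* Tail bytes, unaligned destinations and non-SSE2 builds. */
	memcpy(d, s, len);
}
```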



* [PATCH 3/5] block: Add support for DAX on block devices
  2015-06-29 20:02 [PATCH 0/5] DAX updates for 4.2 Matthew Wilcox
  2015-06-29 20:02 ` [PATCH 1/5] dax: Add block size note to documentation Matthew Wilcox
  2015-06-29 20:02 ` [PATCH 2/5] dax: Use copy_from_iter_nocache Matthew Wilcox
@ 2015-06-29 20:02 ` Matthew Wilcox
  2015-06-30 11:19   ` Christoph Hellwig
  2015-06-29 20:02 ` [PATCH 4/5] ext4: Use ext4_get_block_write() for DAX Matthew Wilcox
  2015-06-29 20:02 ` [PATCH 5/5] vfs: Allow truncate, chmod and chown to be interrupted by fatal signals Matthew Wilcox
  4 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2015-06-29 20:02 UTC (permalink / raw)
  To: linux-fsdevel, Alexander Viro; +Cc: Matthew Wilcox

From: Matthew Wilcox <willy@linux.intel.com>

Without this patch, accesses to a file on a filesystem on a block device
could be done without the page cache, but accessing the block device
itself would always go through the page cache.

Now reads and writes to a block device that is capable of DAX will always
bypass the page cache.  Loads and stores to an mmapped block device will
bypass the page cache if the user specified O_DIRECT.  This opt-in from
the user is necessary because DAX mappings are currently incompatible
with RDMA and O_DIRECT I/Os with non-DAX files.

Include support for the DIO_SKIP_DIO_COUNT flag in DAX, which is only
used by the block device driver.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 fs/block_dev.c | 38 ++++++++++++++++++++++++++++++++++++--
 fs/dax.c       |  6 ++++--
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index f04c873..e3fab8c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -152,6 +152,9 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
 
+	if (IS_DAX(inode))
+		return dax_do_io(iocb, inode, iter, offset, blkdev_get_block,
+				 NULL, DIO_SKIP_DIO_COUNT);
 	return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter, offset,
 				    blkdev_get_block, NULL, NULL,
 				    DIO_SKIP_DIO_COUNT);
@@ -333,7 +336,37 @@ static loff_t block_llseek(struct file *file, loff_t offset, int whence)
 	mutex_unlock(&bd_inode->i_mutex);
 	return retval;
 }
-	
+
+#ifdef CONFIG_FS_DAX
+static int blkdev_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	return dax_fault(vma, vmf, blkdev_get_block);
+}
+
+static int blkdev_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	return dax_mkwrite(vma, vmf, blkdev_get_block);
+}
+
+static const struct vm_operations_struct blkdev_dax_vm_ops = {
+	.fault		= blkdev_dax_fault,
+	.page_mkwrite	= blkdev_dax_mkwrite,
+};
+
+static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if ((IS_DAX(file->f_mapping->host)) && (file->f_flags & O_DIRECT)) {
+		file_accessed(file);
+		vma->vm_ops = &blkdev_dax_vm_ops;
+		vma->vm_flags |= VM_MIXEDMAP;
+		return 0;
+	}
+	return generic_file_mmap(file, vma);
+}
+#else
+#define blkdev_mmap	generic_file_mmap
+#endif
+
 int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 {
 	struct inode *bd_inode = filp->f_mapping->host;
@@ -1170,6 +1203,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 		bdev->bd_disk = disk;
 		bdev->bd_queue = disk->queue;
 		bdev->bd_contains = bdev;
+		bdev->bd_inode->i_flags = disk->fops->direct_access ? S_DAX : 0;
 		if (!partno) {
 			ret = -ENXIO;
 			bdev->bd_part = disk_get_part(disk, partno);
@@ -1670,7 +1704,7 @@ const struct file_operations def_blk_fops = {
 	.llseek		= block_llseek,
 	.read_iter	= blkdev_read_iter,
 	.write_iter	= blkdev_write_iter,
-	.mmap		= generic_file_mmap,
+	.mmap		= blkdev_mmap,
 	.fsync		= blkdev_fsync,
 	.unlocked_ioctl	= block_ioctl,
 #ifdef CONFIG_COMPAT
diff --git a/fs/dax.c b/fs/dax.c
index 159f796..37a0c48 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -209,7 +209,8 @@ ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
 	}
 
 	/* Protects against truncate */
-	inode_dio_begin(inode);
+	if (!(flags & DIO_SKIP_DIO_COUNT))
+		inode_dio_begin(inode);
 
 	retval = dax_io(inode, iter, pos, end, get_block, &bh);
 
@@ -219,7 +220,8 @@ ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
 	if ((retval > 0) && end_io)
 		end_io(iocb, pos, retval, bh.b_private);
 
-	inode_dio_end(inode);
+	if (!(flags & DIO_SKIP_DIO_COUNT))
+		inode_dio_end(inode);
  out:
 	return retval;
 }
-- 
2.1.4
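
From userspace, the opt-in described in this patch would look roughly like the sketch below: open the block device with O_DIRECT, then mmap it. The helper name map_blkdev_direct is hypothetical, and a path such as /dev/pmem0 is only an assumed example of a DAX-capable device. Whether loads and stores through the mapping really bypass the page cache depends on the kernel carrying this patch; on other kernels the same calls should simply produce an ordinary page-cache mapping.

```c
#define _GNU_SOURCE		/* O_DIRECT is a GNU extension in <fcntl.h> */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch.
 * Maps the first len bytes of the block device at path, opened with
 * O_DIRECT.  Returns the mapping, or MAP_FAILED on any error.
 */
void *map_blkdev_direct(const char *path, size_t len)
{
	void *p;
	int fd = open(path, O_RDWR | O_DIRECT);

	if (fd < 0)
		return MAP_FAILED;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	/* The mapping stays valid after the descriptor is closed. */
	close(fd);
	return p;
}
```

A caller would check the result against MAP_FAILED and release the mapping with munmap() when done.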



* [PATCH 4/5] ext4: Use ext4_get_block_write() for DAX
  2015-06-29 20:02 [PATCH 0/5] DAX updates for 4.2 Matthew Wilcox
                   ` (2 preceding siblings ...)
  2015-06-29 20:02 ` [PATCH 3/5] block: Add support for DAX on block devices Matthew Wilcox
@ 2015-06-29 20:02 ` Matthew Wilcox
  2015-06-29 20:02 ` [PATCH 5/5] vfs: Allow truncate, chmod and chown to be interrupted by fatal signals Matthew Wilcox
  4 siblings, 0 replies; 9+ messages in thread
From: Matthew Wilcox @ 2015-06-29 20:02 UTC (permalink / raw)
  To: linux-fsdevel, Alexander Viro
  Cc: Matthew Wilcox, Theodore Ts'o, Andreas Dilger, linux-ext4

From: Matthew Wilcox <willy@linux.intel.com>

DAX relies on the get_block function either zeroing newly allocated blocks
before they're findable by subsequent calls to get_block, or marking newly
allocated blocks as unwritten.  ext4_get_block() cannot create unwritten
extents, but ext4_get_block_write() can.

Reported-by: Andy Rudoff <andy.rudoff@intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 fs/ext4/file.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index ac517f1..f66f3da 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -194,13 +194,12 @@ out:
 #ifdef CONFIG_FS_DAX
 static int ext4_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-	return dax_fault(vma, vmf, ext4_get_block);
-					/* Is this the right get_block? */
+	return dax_fault(vma, vmf, ext4_get_block_write);
 }
 
 static int ext4_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-	return dax_mkwrite(vma, vmf, ext4_get_block);
+	return dax_mkwrite(vma, vmf, ext4_get_block_write);
 }
 
 static const struct vm_operations_struct ext4_dax_vm_ops = {
-- 
2.1.4



* [PATCH 5/5] vfs: Allow truncate, chmod and chown to be interrupted by fatal signals
  2015-06-29 20:02 [PATCH 0/5] DAX updates for 4.2 Matthew Wilcox
                   ` (3 preceding siblings ...)
  2015-06-29 20:02 ` [PATCH 4/5] ext4: Use ext4_get_block_write() for DAX Matthew Wilcox
@ 2015-06-29 20:02 ` Matthew Wilcox
  4 siblings, 0 replies; 9+ messages in thread
From: Matthew Wilcox @ 2015-06-29 20:02 UTC (permalink / raw)
  To: linux-fsdevel, Alexander Viro; +Cc: Matthew Wilcox, Matthew Wilcox

If another task dies while holding the i_mutex, tasks trying to truncate,
chmod or chown the file will hang.  Allowing a fatal signal to kill
the waiting task helps the system administrator get the machine to
shut down more cleanly.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 fs/open.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index e0250bd..4b0061e 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -56,7 +56,8 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
 	if (ret)
 		newattrs.ia_valid |= ret | ATTR_FORCE;
 
-	mutex_lock(&dentry->d_inode->i_mutex);
+	if (mutex_lock_killable(&dentry->d_inode->i_mutex))
+		return -EINTR;
 	/* Note any delegations or leases have already been broken: */
 	ret = notify_change(dentry, &newattrs, NULL);
 	mutex_unlock(&dentry->d_inode->i_mutex);
@@ -508,7 +509,8 @@ static int chmod_common(struct path *path, umode_t mode)
 	if (error)
 		return error;
 retry_deleg:
-	mutex_lock(&inode->i_mutex);
+	if (mutex_lock_killable(&inode->i_mutex))
+		return -EINTR;
 	error = security_path_chmod(path, mode);
 	if (error)
 		goto out_unlock;
@@ -591,7 +593,8 @@ retry_deleg:
 	if (!S_ISDIR(inode->i_mode))
 		newattrs.ia_valid |=
 			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
-	mutex_lock(&inode->i_mutex);
+	if (mutex_lock_killable(&inode->i_mutex))
+		return -EINTR;
 	error = security_path_chown(path, uid, gid);
 	if (!error)
 		error = notify_change(path->dentry, &newattrs, &delegated_inode);
-- 
2.1.4
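
The shape of the change in this patch can be modelled in a few lines of single-threaded C. This is a toy sketch, not kernel code: struct toy_mutex, struct toy_task, toy_lock_killable() and toy_truncate() are all hypothetical stand-ins. The point it illustrates is that the early return happens before the lock is held and before any state is modified, so the error path has nothing to unlock or undo.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins, NOT kernel types. */
struct toy_mutex { bool held; };
struct toy_task { bool fatal_signal_pending; };

/*
 * Models the contract of mutex_lock_killable(): 0 once the lock is
 * taken, -EINTR if a fatal signal ends the wait.  A real waiter would
 * sleep on contention; this toy cannot, so it reports -EAGAIN instead.
 */
int toy_lock_killable(struct toy_mutex *m, const struct toy_task *t)
{
	if (m->held) {
		if (t->fatal_signal_pending)
			return -EINTR;	/* gave up; lock NOT acquired */
		return -EAGAIN;		/* toy-only case */
	}
	m->held = true;
	return 0;
}

/* Models do_truncate() after the patch: bail out before touching state. */
int toy_truncate(struct toy_mutex *m, const struct toy_task *t,
		 long *size, long length)
{
	int ret = toy_lock_killable(m, t);

	if (ret)
		return ret;	/* no lock held, nothing to undo */
	*size = length;
	m->held = false;	/* unlock */
	return 0;
}
```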



* Re: [PATCH 3/5] block: Add support for DAX on block devices
  2015-06-29 20:02 ` [PATCH 3/5] block: Add support for DAX on block devices Matthew Wilcox
@ 2015-06-30 11:19   ` Christoph Hellwig
  2015-06-30 19:56     ` Matthew Wilcox
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2015-06-30 11:19 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, Alexander Viro, Matthew Wilcox

On Mon, Jun 29, 2015 at 04:02:30PM -0400, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> Without this patch, accesses to a file on a filesystem on a block device
> could be done without the page cache, but accessing the block device
> itself would always go through the page cache.
> 
> Now reads and writes to a block device that is capable of DAX will always
> bypass the page cache.  Loads and stores to an mmapped block device will
> bypass the page cache if the user specified O_DIRECT.  This opt-in from
> the user is necessary because DAX mappings are currently incompatible
> with RDMA and O_DIRECT I/Os with non-DAX files.

Using O_DIRECT for this seems like a pretty horrible hack, so I'd like
to see a really good justification of using this over other interfaces.

Also it needs a Cc to linux-api and an entry in the open man page, and
an even better explanation of why we only support this interface on
block devices but not file systems.

Last but not least, I suspect we'll need a runtime option for direct_access
support in the brd devices, as we're now going to use the regular
block device path less and less.



* Re: [PATCH 3/5] block: Add support for DAX on block devices
  2015-06-30 11:19   ` Christoph Hellwig
@ 2015-06-30 19:56     ` Matthew Wilcox
  2015-07-01  7:19       ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2015-06-30 19:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Matthew Wilcox, linux-fsdevel, Alexander Viro

On Tue, Jun 30, 2015 at 04:19:49AM -0700, Christoph Hellwig wrote:
> On Mon, Jun 29, 2015 at 04:02:30PM -0400, Matthew Wilcox wrote:
> > From: Matthew Wilcox <willy@linux.intel.com>
> > 
> > Without this patch, accesses to a file on a filesystem on a block device
> > could be done without the page cache, but accessing the block device
> > itself would always go through the page cache.
> > 
> > Now reads and writes to a block device that is capable of DAX will always
> > bypass the page cache.  Loads and stores to an mmapped block device will
> > bypass the page cache if the user specified O_DIRECT.  This opt-in from
> > the user is necessary because DAX mappings are currently incompatible
> > with RDMA and O_DIRECT I/Os with non-DAX files.
> 
> Using O_DIRECT for this seems like a pretty horrible hack, so I'd like
> to see a really good justification of using this over other interfaces.

O_DIRECT means "bypass the page cache", which is what this does (now it's
able to apply to mmap too).

> Also it needs a Cc to linux-api and an entry in the open man page, and
> an even better explanation of why we only support this interface on
> block devices but not file systems.

Um, we do support this for filesystems with DAX.  The inconsistency we
have is that if you have a direct-access-capable block device, currently
files in a filesystem on it get the bypass-page-cache treatment, but if
you use the raw block device directly, that mapping doesn't.

> Last but not least, I suspect we'll need a runtime option for direct_access
> support in the brd devices, as we're now going to use the regular
> block device path less and less.

I'm getting there; I was working on getting DAX to dynamically map the
pages that it used (rather than relying on them being permanently part
of the direct mapping), but I had to set that work aside temporarily.
That lets us just delete the compile option, and have direct_access
always work on brd.


* Re: [PATCH 3/5] block: Add support for DAX on block devices
  2015-06-30 19:56     ` Matthew Wilcox
@ 2015-07-01  7:19       ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2015-07-01  7:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Matthew Wilcox, linux-fsdevel, Alexander Viro

On Tue, Jun 30, 2015 at 03:56:15PM -0400, Matthew Wilcox wrote:
> > Using O_DIRECT for this seems like a pretty horrible hack, so I'd like
> > to see a really good justification of using this over other interfaces.
> 
> O_DIRECT means "bypass the page cache", which is what this does (now it's
> able to apply to mmap too).

It never had a meaning for mmap.

> > Also it needs a Cc to linux-api and an entry in the open man page, and
> > an even better explanation of why we only support this interface on
> > block devices but not file systems.
> 
> Um, we do support this for filesystems with DAX.  The inconsistency we
> have is that if you have a direct-access-capable block device, currently
> files in a filesystem on it get the bypass-page-cache treatment, but if
> you use the raw block device directly, that mapping doesn't.

I don't see this O_DIRECT check done anywhere in filesystems.
Filesystems seem to get your O_DIRECT treatment when mounted with the
dax option, as far as I can tell, without the need for additional options.

The block device equivalent would be a sysfs flag, which seems like the
better implementation choice here.

