linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* aio fixes for use after free and freeze protection
@ 2016-10-30 16:42 Christoph Hellwig
  2016-10-30 16:42 ` [PATCH 1/4] aio: hold an extra file reference over AIO read/write operations Christoph Hellwig
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Christoph Hellwig @ 2016-10-30 16:42 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro
  Cc: Jan Kara, Dmitry Monakhov, Jeff Moyer, linux-fsdevel, linux-aio,
	linux-kernel

Hi Linus, hi Al,

below is the new version of the aio fix(es).

Patch one just holds an additional file reference over AIO ops.  This
one is minimally invasive and a clear 4.9 and stable candidate.

The next one drops the never implemented aio_fsync methods because it
makes my life easier later on.  No user visible change as we always
ended up returning EINVAL anyway.

Patch three refators the aio code so that it's not a spaghetti monster,
and patch four is Jan's original free patch rebased on top of this whole
stack.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] aio: hold an extra file reference over AIO read/write operations
  2016-10-30 16:42 aio fixes for use after free and freeze protection Christoph Hellwig
@ 2016-10-30 16:42 ` Christoph Hellwig
  2016-10-30 16:42 ` [PATCH 2/4] fs: remove the never implemented aio_fsync file operation Christoph Hellwig
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2016-10-30 16:42 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro
  Cc: Jan Kara, Dmitry Monakhov, Jeff Moyer, linux-fsdevel, linux-aio,
	linux-kernel

Otherwise we might dereference an already freed file and/or inode
when aio_complete is called before we return from the read_iter or
write_iter method.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/aio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 1157e13..0aa71d3 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1460,6 +1460,7 @@ static ssize_t aio_run_iocb(struct kiocb *req, unsigned opcode,
 			return ret;
 		}
 
+		get_file(file);
 		if (rw == WRITE)
 			file_start_write(file);
 
@@ -1467,6 +1468,7 @@ static ssize_t aio_run_iocb(struct kiocb *req, unsigned opcode,
 
 		if (rw == WRITE)
 			file_end_write(file);
+		fput(file);
 		kfree(iovec);
 		break;
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] fs: remove the never implemented aio_fsync file operation
  2016-10-30 16:42 aio fixes for use after free and freeze protection Christoph Hellwig
  2016-10-30 16:42 ` [PATCH 1/4] aio: hold an extra file reference over AIO read/write operations Christoph Hellwig
@ 2016-10-30 16:42 ` Christoph Hellwig
  2016-10-30 23:23   ` Dave Chinner
  2016-10-30 16:42 ` [PATCH 3/4] fs: remove aio_run_iocb Christoph Hellwig
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2016-10-30 16:42 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro
  Cc: Jan Kara, Dmitry Monakhov, Jeff Moyer, linux-fsdevel, linux-aio,
	linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/filesystems/Locking |  1 -
 Documentation/filesystems/vfs.txt |  1 -
 fs/aio.c                          | 14 --------------
 fs/ntfs/dir.c                     |  2 --
 include/linux/fs.h                |  1 -
 5 files changed, 19 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 14cdc10..1b5f156 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -447,7 +447,6 @@ prototypes:
 	int (*flush) (struct file *);
 	int (*release) (struct inode *, struct file *);
 	int (*fsync) (struct file *, loff_t start, loff_t end, int datasync);
-	int (*aio_fsync) (struct kiocb *, int datasync);
 	int (*fasync) (int, struct file *, int);
 	int (*lock) (struct file *, int, struct file_lock *);
 	ssize_t (*readv) (struct file *, const struct iovec *, unsigned long,
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index d619c8d..b5039a0 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -828,7 +828,6 @@ struct file_operations {
 	int (*flush) (struct file *, fl_owner_t id);
 	int (*release) (struct inode *, struct file *);
 	int (*fsync) (struct file *, loff_t, loff_t, int datasync);
-	int (*aio_fsync) (struct kiocb *, int datasync);
 	int (*fasync) (int, struct file *, int);
 	int (*lock) (struct file *, int, struct file_lock *);
 	ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
diff --git a/fs/aio.c b/fs/aio.c
index 0aa71d3..2a6030a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1472,20 +1472,6 @@ static ssize_t aio_run_iocb(struct kiocb *req, unsigned opcode,
 		kfree(iovec);
 		break;
 
-	case IOCB_CMD_FDSYNC:
-		if (!file->f_op->aio_fsync)
-			return -EINVAL;
-
-		ret = file->f_op->aio_fsync(req, 1);
-		break;
-
-	case IOCB_CMD_FSYNC:
-		if (!file->f_op->aio_fsync)
-			return -EINVAL;
-
-		ret = file->f_op->aio_fsync(req, 0);
-		break;
-
 	default:
 		pr_debug("EINVAL: no operation provided\n");
 		return -EINVAL;
diff --git a/fs/ntfs/dir.c b/fs/ntfs/dir.c
index a186135..0ee19ec 100644
--- a/fs/ntfs/dir.c
+++ b/fs/ntfs/dir.c
@@ -1544,8 +1544,6 @@ const struct file_operations ntfs_dir_ops = {
 	.iterate	= ntfs_readdir,		/* Read directory contents. */
 #ifdef NTFS_RW
 	.fsync		= ntfs_dir_fsync,	/* Sync a directory to disk. */
-	/*.aio_fsync	= ,*/			/* Sync all outstanding async
-						   i/o operations on a kiocb. */
 #endif /* NTFS_RW */
 	/*.ioctl	= ,*/			/* Perform function on the
 						   mounted filesystem. */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 16d2b6e..ff7bcd9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1709,7 +1709,6 @@ struct file_operations {
 	int (*flush) (struct file *, fl_owner_t id);
 	int (*release) (struct inode *, struct file *);
 	int (*fsync) (struct file *, loff_t, loff_t, int datasync);
-	int (*aio_fsync) (struct kiocb *, int datasync);
 	int (*fasync) (int, struct file *, int);
 	int (*lock) (struct file *, int, struct file_lock *);
 	ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] fs: remove aio_run_iocb
  2016-10-30 16:42 aio fixes for use after free and freeze protection Christoph Hellwig
  2016-10-30 16:42 ` [PATCH 1/4] aio: hold an extra file reference over AIO read/write operations Christoph Hellwig
  2016-10-30 16:42 ` [PATCH 2/4] fs: remove the never implemented aio_fsync file operation Christoph Hellwig
@ 2016-10-30 16:42 ` Christoph Hellwig
  2016-10-30 16:42 ` [PATCH 4/4] aio: fix freeze protection of aio writes Christoph Hellwig
  2016-10-30 16:58 ` aio fixes for use after free and freeze protection Al Viro
  4 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2016-10-30 16:42 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro
  Cc: Jan Kara, Dmitry Monakhov, Jeff Moyer, linux-fsdevel, linux-aio,
	linux-kernel

Pass the ABI iocb structure to aio_setup_rw and let it handle the
non-vectored I/O case as well.  With that and a new helper for the AIO
return value handling we can now define new aio_read and aio_write
helpers that implement reads and writes in a self-contained way without
duplicating too much code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/aio.c | 182 +++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 94 insertions(+), 88 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 2a6030a..c197551 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1392,110 +1392,100 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
 	return -EINVAL;
 }
 
-typedef ssize_t (rw_iter_op)(struct kiocb *, struct iov_iter *);
-
-static int aio_setup_vectored_rw(int rw, char __user *buf, size_t len,
-				 struct iovec **iovec,
-				 bool compat,
-				 struct iov_iter *iter)
+static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec,
+		bool vectored, bool compat, struct iov_iter *iter)
 {
+	void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf;
+	size_t len = iocb->aio_nbytes;
+
+	if (!vectored) {
+		ssize_t ret = import_single_range(rw, buf, len, *iovec, iter);
+		*iovec = NULL;
+		return ret;
+	}
 #ifdef CONFIG_COMPAT
 	if (compat)
-		return compat_import_iovec(rw,
-				(struct compat_iovec __user *)buf,
-				len, UIO_FASTIOV, iovec, iter);
+		return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec,
+				iter);
 #endif
-	return import_iovec(rw, (struct iovec __user *)buf,
-				len, UIO_FASTIOV, iovec, iter);
+	return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
 }
 
-/*
- * aio_run_iocb:
- *	Performs the initial checks and io submission.
- */
-static ssize_t aio_run_iocb(struct kiocb *req, unsigned opcode,
-			    char __user *buf, size_t len, bool compat)
+static inline ssize_t aio_ret(struct kiocb *req, ssize_t ret)
+{
+	switch (ret) {
+	case -EIOCBQUEUED:
+		return ret;
+	case -ERESTARTSYS:
+	case -ERESTARTNOINTR:
+	case -ERESTARTNOHAND:
+	case -ERESTART_RESTARTBLOCK:
+		/*
+		 * There's no easy way to restart the syscall since other AIO's
+		 * may be already running. Just fail this IO with EINTR.
+		 */
+		ret = -EINTR;
+		/*FALLTHRU*/
+	default:
+		aio_complete(req, ret, 0);
+		return 0;
+	}
+}
+
+static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored,
+		bool compat)
 {
 	struct file *file = req->ki_filp;
-	ssize_t ret;
-	int rw;
-	fmode_t mode;
-	rw_iter_op *iter_op;
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
+	ssize_t ret;
 
-	switch (opcode) {
-	case IOCB_CMD_PREAD:
-	case IOCB_CMD_PREADV:
-		mode	= FMODE_READ;
-		rw	= READ;
-		iter_op	= file->f_op->read_iter;
-		goto rw_common;
-
-	case IOCB_CMD_PWRITE:
-	case IOCB_CMD_PWRITEV:
-		mode	= FMODE_WRITE;
-		rw	= WRITE;
-		iter_op	= file->f_op->write_iter;
-		goto rw_common;
-rw_common:
-		if (unlikely(!(file->f_mode & mode)))
-			return -EBADF;
-
-		if (!iter_op)
-			return -EINVAL;
-
-		if (opcode == IOCB_CMD_PREADV || opcode == IOCB_CMD_PWRITEV)
-			ret = aio_setup_vectored_rw(rw, buf, len,
-						&iovec, compat, &iter);
-		else {
-			ret = import_single_range(rw, buf, len, iovec, &iter);
-			iovec = NULL;
-		}
-		if (!ret)
-			ret = rw_verify_area(rw, file, &req->ki_pos,
-					     iov_iter_count(&iter));
-		if (ret < 0) {
-			kfree(iovec);
-			return ret;
-		}
-
-		get_file(file);
-		if (rw == WRITE)
-			file_start_write(file);
+	if (unlikely(!(file->f_mode & FMODE_READ)))
+		return -EBADF;
+	if (unlikely(!file->f_op->read_iter))
+		return -EINVAL;
 
-		ret = iter_op(req, &iter);
+	ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter);
+	if (ret)
+		return ret;
+	ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
+	if (!ret)
+		ret = aio_ret(req, file->f_op->read_iter(req, &iter));
+	kfree(iovec);
+	return ret;
+}
 
-		if (rw == WRITE)
-			file_end_write(file);
-		fput(file);
-		kfree(iovec);
-		break;
+static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
+		bool compat)
+{
+	struct file *file = req->ki_filp;
+	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
+	struct iov_iter iter;
+	ssize_t ret;
 
-	default:
-		pr_debug("EINVAL: no operation provided\n");
+	if (unlikely(!(file->f_mode & FMODE_WRITE)))
+		return -EBADF;
+	if (unlikely(!file->f_op->write_iter))
 		return -EINVAL;
-	}
 
-	if (ret != -EIOCBQUEUED) {
-		/*
-		 * There's no easy way to restart the syscall since other AIO's
-		 * may be already running. Just fail this IO with EINTR.
-		 */
-		if (unlikely(ret == -ERESTARTSYS || ret == -ERESTARTNOINTR ||
-			     ret == -ERESTARTNOHAND ||
-			     ret == -ERESTART_RESTARTBLOCK))
-			ret = -EINTR;
-		aio_complete(req, ret, 0);
+	ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter);
+	if (ret)
+		return ret;
+	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
+	if (!ret) {
+		file_start_write(file);
+		ret = aio_ret(req, file->f_op->write_iter(req, &iter));
+		file_end_write(file);
 	}
-
-	return 0;
+	kfree(iovec);
+	return ret;
 }
 
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			 struct iocb *iocb, bool compat)
 {
 	struct aio_kiocb *req;
+	struct file *file;
 	ssize_t ret;
 
 	/* enforce forwards compatibility on users */
@@ -1518,7 +1508,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	if (unlikely(!req))
 		return -EAGAIN;
 
-	req->common.ki_filp = fget(iocb->aio_fildes);
+	req->common.ki_filp = file = fget(iocb->aio_fildes);
 	if (unlikely(!req->common.ki_filp)) {
 		ret = -EBADF;
 		goto out_put_req;
@@ -1553,13 +1543,29 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	req->ki_user_iocb = user_iocb;
 	req->ki_user_data = iocb->aio_data;
 
-	ret = aio_run_iocb(&req->common, iocb->aio_lio_opcode,
-			   (char __user *)(unsigned long)iocb->aio_buf,
-			   iocb->aio_nbytes,
-			   compat);
-	if (ret)
-		goto out_put_req;
+	get_file(file);
+	switch (iocb->aio_lio_opcode) {
+	case IOCB_CMD_PREAD:
+		ret = aio_read(&req->common, iocb, false, compat);
+		break;
+	case IOCB_CMD_PWRITE:
+		ret = aio_write(&req->common, iocb, false, compat);
+		break;
+	case IOCB_CMD_PREADV:
+		ret = aio_read(&req->common, iocb, true, compat);
+		break;
+	case IOCB_CMD_PWRITEV:
+		ret = aio_write(&req->common, iocb, true, compat);
+		break;
+	default:
+		pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
+		ret = -EINVAL;
+		break;
+	}
+	fput(file);
 
+	if (ret && ret != -EIOCBQUEUED)
+		goto out_put_req;
 	return 0;
 out_put_req:
 	put_reqs_available(ctx, 1);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] aio: fix freeze protection of aio writes
  2016-10-30 16:42 aio fixes for use after free and freeze protection Christoph Hellwig
                   ` (2 preceding siblings ...)
  2016-10-30 16:42 ` [PATCH 3/4] fs: remove aio_run_iocb Christoph Hellwig
@ 2016-10-30 16:42 ` Christoph Hellwig
  2016-10-30 16:58 ` aio fixes for use after free and freeze protection Al Viro
  4 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2016-10-30 16:42 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro
  Cc: Jan Kara, Dmitry Monakhov, Jeff Moyer, linux-fsdevel, linux-aio,
	linux-kernel

Currently we dropped freeze protection of aio writes just after IO was
submitted. Thus aio write could be in flight while the filesystem was
frozen and that could result in unexpected situation like aio completion
wanting to convert extent type on frozen filesystem. Testcase from
Dmitry triggering this is like:

for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
    --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite

Fix the problem by dropping freeze protection only once IO is completed
in aio_complete().

Reported-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
[hch: forward ported on top of various VFS and aio changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/aio.c           | 19 ++++++++++++++++++-
 include/linux/fs.h |  1 +
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/aio.c b/fs/aio.c
index c197551..428484f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1078,6 +1078,17 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
 	unsigned tail, pos, head;
 	unsigned long	flags;
 
+	if (kiocb->ki_flags & IOCB_WRITE) {
+		struct file *file = kiocb->ki_filp;
+
+		/*
+		 * Tell lockdep we inherited freeze protection from submission
+		 * thread.
+		 */
+		__sb_writers_acquired(file_inode(file)->i_sb, SB_FREEZE_WRITE);
+		file_end_write(file);
+	}
+
 	/*
 	 * Special case handling for sync iocbs:
 	 *  - events go directly into the iocb for fast handling
@@ -1473,9 +1484,15 @@ static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
 		return ret;
 	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret) {
+		req->ki_flags |= IOCB_WRITE;
 		file_start_write(file);
 		ret = aio_ret(req, file->f_op->write_iter(req, &iter));
-		file_end_write(file);
+		/*
+		 * We release freeze protection in aio_complete().  Fool lockdep
+		 * by telling it the lock got released so that it doesn't
+		 * complain about held lock when we return to userspace.
+		 */
+		__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
 	}
 	kfree(iovec);
 	return ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ff7bcd9..dc0478c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -321,6 +321,7 @@ struct writeback_control;
 #define IOCB_HIPRI		(1 << 3)
 #define IOCB_DSYNC		(1 << 4)
 #define IOCB_SYNC		(1 << 5)
+#define IOCB_WRITE		(1 << 6)
 
 struct kiocb {
 	struct file		*ki_filp;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: aio fixes for use after free and freeze protection
  2016-10-30 16:42 aio fixes for use after free and freeze protection Christoph Hellwig
                   ` (3 preceding siblings ...)
  2016-10-30 16:42 ` [PATCH 4/4] aio: fix freeze protection of aio writes Christoph Hellwig
@ 2016-10-30 16:58 ` Al Viro
  2016-10-30 17:01   ` Christoph Hellwig
  2016-11-09  2:41   ` Christoph Hellwig
  4 siblings, 2 replies; 15+ messages in thread
From: Al Viro @ 2016-10-30 16:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Jan Kara, Dmitry Monakhov, Jeff Moyer,
	linux-fsdevel, linux-aio, linux-kernel

On Sun, Oct 30, 2016 at 11:42:00AM -0500, Christoph Hellwig wrote:
> Hi Linus, hi Al,
> 
> below is the new version of the aio fix(es).
> 
> Patch one just holds an additional file reference over AIO ops.  This
> one is minimally invasive and a clear 4.9 and stable candidate.
> 
> The next one drops the never implemented aio_fsync methods because it
> makes my life easier later on.  No user visible change as we always
> ended up returning EINVAL anyway.
> 
> Patch three refators the aio code so that it's not a spaghetti monster,
> and patch four is Jan's original free patch rebased on top of this whole
> stack.

Applied.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: aio fixes for use after free and freeze protection
  2016-10-30 16:58 ` aio fixes for use after free and freeze protection Al Viro
@ 2016-10-30 17:01   ` Christoph Hellwig
  2016-11-09  2:41   ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2016-10-30 17:01 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Linus Torvalds, Jan Kara, Dmitry Monakhov,
	Jeff Moyer, linux-fsdevel, linux-aio, linux-kernel

Btw, patch 4 should be attributed to Jan - for some reason git keeps resetting
the attribution whenever there is a conflict during a rebase.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] fs: remove the never implemented aio_fsync file operation
  2016-10-30 16:42 ` [PATCH 2/4] fs: remove the never implemented aio_fsync file operation Christoph Hellwig
@ 2016-10-30 23:23   ` Dave Chinner
  2016-10-31 13:07     ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2016-10-30 23:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Al Viro, Jan Kara, Dmitry Monakhov, Jeff Moyer,
	linux-fsdevel, linux-aio, linux-kernel

On Sun, Oct 30, 2016 at 11:42:02AM -0500, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

This doesn't belong in this patchset.

Regardless, can we just implement the damned thing rather than
removing it?  Plenty of people have asked for it and they still want
this functionality. I've sent a couple of different prototypes that
worked but got bikeshedded to death, and IIRC Ben also tried to get
it implemented but that went nowhere because other parts of his
patchset got bikeshedded to death.

If nothing else, just let me implement it in XFS like I did the
first time so when all the bikshedding stops we can convert it to
the One True AIO Interface that is decided on.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] fs: remove the never implemented aio_fsync file operation
  2016-10-30 23:23   ` Dave Chinner
@ 2016-10-31 13:07     ` Christoph Hellwig
  2016-10-31 20:25       ` Dave Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2016-10-31 13:07 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Linus Torvalds, Al Viro, Jan Kara,
	Dmitry Monakhov, Jeff Moyer, linux-fsdevel, linux-aio,
	linux-kernel

On Mon, Oct 31, 2016 at 10:23:31AM +1100, Dave Chinner wrote:
> This doesn't belong in this patchset.

It does.  I can't fix up the calling conventions for a methods that
was never implemented.

> Regardless, can we just implement the damned thing rather than
> removing it?  Plenty of people have asked for it and they still want
> this functionality. I've sent a couple of different prototypes that
> worked but got bikeshedded to death, and IIRC Ben also tried to get
> it implemented but that went nowhere because other parts of his
> patchset got bikeshedded to death.
> 
> If nothing else, just let me implement it in XFS like I did the
> first time so when all the bikshedding stops we can convert it to
> the One True AIO Interface that is decided on.

I'm not going to complain about a proper implementation, but right now
we don't have any, and I'm not even sure the method signature is
all that suitable.  E.g. for the in-kernel users we'd really want a 
ranged fsync like the normal fsync anyway.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] fs: remove the never implemented aio_fsync file operation
  2016-10-31 13:07     ` Christoph Hellwig
@ 2016-10-31 20:25       ` Dave Chinner
  2016-11-01  1:30         ` Linus Torvalds
  2016-11-01 14:25         ` Christoph Hellwig
  0 siblings, 2 replies; 15+ messages in thread
From: Dave Chinner @ 2016-10-31 20:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Al Viro, Jan Kara, Dmitry Monakhov, Jeff Moyer,
	linux-fsdevel, linux-aio, linux-kernel

On Mon, Oct 31, 2016 at 02:07:54PM +0100, Christoph Hellwig wrote:
> On Mon, Oct 31, 2016 at 10:23:31AM +1100, Dave Chinner wrote:
> > This doesn't belong in this patchset.
> 
> It does.  I can't fix up the calling conventions for a methods that
> was never implemented.

That sounds like a problem with your fix - it should work
regardless of whether a valid/implemented AIO function is called
or not, right? There's no difference between an invalid command,
IOCB_CMD_FSYNC where ->aio_fsync() is null, or some supported
command that immediately returns -EIO, the end result should
be the same...

> > Regardless, can we just implement the damned thing rather than
> > removing it?  Plenty of people have asked for it and they still want
> > this functionality. I've sent a couple of different prototypes that
> > worked but got bikeshedded to death, and IIRC Ben also tried to get
> > it implemented but that went nowhere because other parts of his
> > patchset got bikeshedded to death.
> > 
> > If nothing else, just let me implement it in XFS like I did the
> > first time so when all the bikshedding stops we can convert it to
> > the One True AIO Interface that is decided on.
> 
> I'm not going to complain about a proper implementation, but right now
> we don't have any, and I'm not even sure the method signature is
> all that suitable.  E.g. for the in-kernel users we'd really want a 
> ranged fsync like the normal fsync anyway.

You mean like this version I posted a year ago:

https://lkml.org/lkml/2015/10/29/517


Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] fs: remove the never implemented aio_fsync file operation
  2016-10-31 20:25       ` Dave Chinner
@ 2016-11-01  1:30         ` Linus Torvalds
  2016-11-01 14:30           ` Christoph Hellwig
  2016-11-01 14:25         ` Christoph Hellwig
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2016-11-01  1:30 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Al Viro, Jan Kara, Dmitry Monakhov,
	Jeff Moyer, linux-fsdevel, linux-aio, Linux Kernel Mailing List

On Mon, Oct 31, 2016 at 1:25 PM, Dave Chinner <david@fromorbit.com> wrote:
>
> You mean like this version I posted a year ago:
>
> https://lkml.org/lkml/2015/10/29/517

I still suspect that if we want to do this, we should strive to expose
all the other syncing flags from sync_file_range() too.

Not everybody wants to do just synchronous syncs. Especially if you're
doing async work, you might well want to have one async operation to
*start* the writeback on a range, then do something else, and then do
one to wait for the sync to actually have succeeded.

Yeah, that's more of a "keep writes streaming" interface than a
fsync() like interface, but I think the two really do fit together.
It's kind of sad how we have this very fragmented interface to
writeback, where  some operations take that "data vs metadata", some
operations take a range of bytes, and some operations take that "start
writeback vs wait for it", but nothing does all of the above. They are
really just different faces of the same writeback coin.

               Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] fs: remove the never implemented aio_fsync file operation
  2016-10-31 20:25       ` Dave Chinner
  2016-11-01  1:30         ` Linus Torvalds
@ 2016-11-01 14:25         ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2016-11-01 14:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Linus Torvalds, Al Viro, Jan Kara,
	Dmitry Monakhov, Jeff Moyer, linux-fsdevel, linux-aio,
	linux-kernel

On Tue, Nov 01, 2016 at 07:25:21AM +1100, Dave Chinner wrote:
> That sounds like a problem with your fix - it should work
> regardless of whether a valid/implemented AIO function is called
> or not, right? There's no difference between an invalid command,
> IOCB_CMD_FSYNC where ->aio_fsync() is null, or some supported
> command that immediately returns -EIO, the end result should
> be the same...

We would need the same increased file refcount if aio_fsync actually
was implemented using -EIOCBQUEUED returns.  We wouldn't nessecarily need
it without that.

> > I'm not going to complain about a proper implementation, but right now
> > we don't have any, and I'm not even sure the method signature is
> > all that suitable.  E.g. for the in-kernel users we'd really want a 
> > ranged fsync like the normal fsync anyway.
> 
> You mean like this version I posted a year ago:
> 
> https://lkml.org/lkml/2015/10/29/517

I'd love to see that one in - but it doesn't use the aio_fsync method
either..

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] fs: remove the never implemented aio_fsync file operation
  2016-11-01  1:30         ` Linus Torvalds
@ 2016-11-01 14:30           ` Christoph Hellwig
  2016-11-01 15:07             ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2016-11-01 14:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Christoph Hellwig, Al Viro, Jan Kara,
	Dmitry Monakhov, Jeff Moyer, linux-fsdevel, linux-aio,
	Linux Kernel Mailing List

On Mon, Oct 31, 2016 at 06:30:11PM -0700, Linus Torvalds wrote:
> I still suspect that if we want to do this, we should strive to expose
> all the other syncing flags from sync_file_range() too.

sync_file_range is entirely different from fsync -
sync_file_range allows you detailed control of data writeback, but it
does not allow to commit metadata at all, i.e. it's not a data integrity
operation. 

> Yeah, that's more of a "keep writes streaming" interface than a
> fsync() like interface, but I think the two really do fit together.
> It's kind of sad how we have this very fragmented interface to
> writeback, where  some operations take that "data vs metadata", some
> operations take a range of bytes, and some operations take that "start
> writeback vs wait for it", but nothing does all of the above. They are
> really just different faces of the same writeback coin.

sync_file_range at the moment actually doesn't involve the fs, which
has it's own set of problems.  So yes, maybe we need a full blown
sync method unifying fsync, sync_file_range and which enables ranged
data integrity fsync and asynchronous operations of all this.

But to go back to Dave's argument - none of that can archived with that
aio_fsync method added more than 10 years ago and never implemented.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] fs: remove the never implemented aio_fsync file operation
  2016-11-01 14:30           ` Christoph Hellwig
@ 2016-11-01 15:07             ` Linus Torvalds
  0 siblings, 0 replies; 15+ messages in thread
From: Linus Torvalds @ 2016-11-01 15:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Al Viro, Jan Kara, Dmitry Monakhov, Jeff Moyer,
	linux-fsdevel, linux-aio, Linux Kernel Mailing List

On Tue, Nov 1, 2016 at 7:30 AM, Christoph Hellwig <hch@lst.de> wrote:
>
> sync_file_range is entirely different from fsync -

Absolutely.

And I think it's a shame. Particularly the fact that we don't have any
way to actually tie into the filesystem, and we just work on the page
cache level, which "mostly works" for writeback, but is really not
right. It's not like a filesystem even has to work on that level at
all..

> sync_file_range at the moment actually doesn't involve the fs, which
> has it's own set of problems.  So yes, maybe we need a full blown
> sync method unifying fsync, sync_file_range and which enables ranged
> data integrity fsync and asynchronous operations of all this.

I *think* we could just expand the existing filesystem "f_op->fsync()"
interface to take more than just the single-bit argument, and have the
full interface.

But being Linux-only, it sadly will never get much use. Even if the
other flags really are very useful for getting overlapping streaming
writes with minimal dirty state. It might be more productive to have
some easier-to-use interface to some generic writebehind logic, but..

> But to go back to Dave's argument - none of that can archived with that
> aio_fsync method added more than 10 years ago and never implemented.

Yeah, no, I actually agree with just removing that. I think we are
often much too slow to just say "nobody uses it, get rid of it". Even
if it were to ever be needed in the future, we could just resurrect
it, but it's not clear that it would ever be in that particular form.

               Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: aio fixes for use after free and freeze protection
  2016-10-30 16:58 ` aio fixes for use after free and freeze protection Al Viro
  2016-10-30 17:01   ` Christoph Hellwig
@ 2016-11-09  2:41   ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2016-11-09  2:41 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Linus Torvalds, Jan Kara, Dmitry Monakhov,
	Jeff Moyer, linux-fsdevel, linux-aio, linux-kernel

Al,

did you get a chance to send this series (or at very least patch 1
which is the most critical) to Linus?  I'd hate to miss another rc
for the use after free fix.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-11-09  2:41 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-30 16:42 aio fixes for use after free and freeze protection Christoph Hellwig
2016-10-30 16:42 ` [PATCH 1/4] aio: hold an extra file reference over AIO read/write operations Christoph Hellwig
2016-10-30 16:42 ` [PATCH 2/4] fs: remove the never implemented aio_fsync file operation Christoph Hellwig
2016-10-30 23:23   ` Dave Chinner
2016-10-31 13:07     ` Christoph Hellwig
2016-10-31 20:25       ` Dave Chinner
2016-11-01  1:30         ` Linus Torvalds
2016-11-01 14:30           ` Christoph Hellwig
2016-11-01 15:07             ` Linus Torvalds
2016-11-01 14:25         ` Christoph Hellwig
2016-10-30 16:42 ` [PATCH 3/4] fs: remove aio_run_iocb Christoph Hellwig
2016-10-30 16:42 ` [PATCH 4/4] aio: fix freeze protection of aio writes Christoph Hellwig
2016-10-30 16:58 ` aio fixes for use after free and freeze protection Al Viro
2016-10-30 17:01   ` Christoph Hellwig
2016-11-09  2:41   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).