linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* aio fsync revisited
@ 2018-01-17 19:47 Christoph Hellwig
  2018-01-17 19:47 ` [PATCH] aio: resurrect IOCB_CMD_FSYNC and IOCB_CMD_FDSYNC support Christoph Hellwig
  2018-01-18 22:46 ` aio fsync revisited Dave Chinner
  0 siblings, 2 replies; 4+ messages in thread
From: Christoph Hellwig @ 2018-01-17 19:47 UTC (permalink / raw)
  To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel

Hi all,

this patch adds workqueue based fsync offload.  Version of this
patch have been floating around for a couple years, but we now
have a user with seastar used by ScyllaDB (who sponsored this
work) that really wants this in addition to the aio poll support.
More details are in the patch itself.

Because the iocb types have been defined sine day one (and probably
were supported by RHEL3) libaio already supports these calls as-is.

As the patch is on top of of the aio.c changes in the aio poll
series I've also provided a git tree:

    git://git.infradead.org/users/hch/vfs.git aio-fsync

Gitweb:

    http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-fsync

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] aio: resurrect IOCB_CMD_FSYNC and IOCB_CMD_FDSYNC support
  2018-01-17 19:47 aio fsync revisited Christoph Hellwig
@ 2018-01-17 19:47 ` Christoph Hellwig
  2018-01-18 22:46 ` aio fsync revisited Dave Chinner
  1 sibling, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2018-01-17 19:47 UTC (permalink / raw)
  To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel

These and the ->aio_fsync method had been merged together with the
initial aio support, but no ->aio_fsync method had ever been implemented
in mainline, so it got removed a while ago.

This patch wires up the iocb commands to a simple workqueue based offload
that already shows great performance.  In the future an aio_fsync method
could be added if we grow more elaborate implementations, but for now
an 6 to 8 fold improvement in the fsync rate in fs_mark should be good
enough to go with this simple version.

Note that this does not wire up the offset and length fields and thus
does not provide a ranged fsync.  The reasons for that are that in
all current file system ranges only matter for writing back page cache,
which doesn't mix with AIO anyway (as AIO only does direct I/O), and
also because these fields would bloat the aio_kiocb over the size of
the normal read/write and poll iocbs, which is worth it given the
condition above.  But the offset and length fields are checked for
being zero, so such a support could be added later if needed.

Based on an earlier patch from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/aio.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 0cddd24e7316..e1df7e8408ea 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -164,10 +164,17 @@ struct poll_iocb {
 	struct wait_queue_entry	wait;
 };
 
+struct fsync_iocb {
+	struct work_struct	work;
+	struct file		*file;
+	bool			datasync;
+};
+
 struct aio_kiocb {
 	union {
 		struct kiocb		rw;
 		struct poll_iocb	poll;
+		struct fsync_iocb	fsync;
 	};
 
 	struct kioctx		*ki_ctx;
@@ -1660,6 +1667,61 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
 	return -EIOCBQUEUED;
 }
 
+static void aio_fsync_work(struct work_struct *work)
+{
+	struct fsync_iocb *req = container_of(work, struct fsync_iocb, work);
+	int ret;
+
+	ret = vfs_fsync(req->file, req->datasync);
+	fput(req->file);
+	aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0);
+}
+
+static int generic_aio_fsync(struct fsync_iocb *req)
+{
+	struct super_block *sb = file_inode(req->file)->i_sb;
+
+	if (unlikely(!sb->s_dio_done_wq)) {
+		int ret = sb_init_dio_done_wq(sb);
+		if (ret)
+			return ret;
+	}
+
+	/*
+	 * Use the direct I/O completion workqueue, as that is used to queue
+	 * fsyncs for O_(D)SYNC writes already.
+	 */
+	INIT_WORK(&req->work, aio_fsync_work);
+	queue_work(sb->s_dio_done_wq, &req->work);
+	return -EIOCBQUEUED;
+}
+
+static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
+{
+	int ret;
+
+	if (iocb->aio_buf)
+		return -EINVAL;
+	if (iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)
+		return -EINVAL;
+
+	req->file = fget(iocb->aio_fildes);
+	if (unlikely(!req->file))
+		return -EBADF;
+
+	ret = -EINVAL;
+	if (!req->file->f_op->fsync)
+		goto out_fput;
+
+	req->datasync = datasync;
+
+	ret = generic_aio_fsync(req);
+out_fput:
+	if (unlikely(ret && ret != -EIOCBQUEUED))
+		fput(req->file);
+	return ret;
+}
+
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			 struct iocb *iocb, bool compat)
 {
@@ -1723,6 +1785,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	case IOCB_CMD_PWRITEV:
 		ret = aio_write(&req->rw, iocb, true, compat);
 		break;
+	case IOCB_CMD_FSYNC:
+		ret = aio_fsync(&req->fsync, iocb, false);
+		break;
+	case IOCB_CMD_FDSYNC:
+		ret = aio_fsync(&req->fsync, iocb, true);
+		break;
 	case IOCB_CMD_POLL:
 		ret = aio_poll(req, iocb);
 		break;
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: aio fsync revisited
  2018-01-17 19:47 aio fsync revisited Christoph Hellwig
  2018-01-17 19:47 ` [PATCH] aio: resurrect IOCB_CMD_FSYNC and IOCB_CMD_FDSYNC support Christoph Hellwig
@ 2018-01-18 22:46 ` Dave Chinner
  2018-01-19 19:06   ` Christoph Hellwig
  1 sibling, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2018-01-18 22:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel

On Wed, Jan 17, 2018 at 08:47:46PM +0100, Christoph Hellwig wrote:
> Hi all,
> 
> this patch adds workqueue based fsync offload.  Version of this
> patch have been floating around for a couple years, but we now
> have a user with seastar used by ScyllaDB (who sponsored this
> work) that really wants this in addition to the aio poll support.
> More details are in the patch itself.
> 
> Because the iocb types have been defined sine day one (and probably
> were supported by RHEL3) libaio already supports these calls as-is.
> 
> As the patch is on top of of the aio.c changes in the aio poll
> series I've also provided a git tree:
> 
>     git://git.infradead.org/users/hch/vfs.git aio-fsync
> 
> Gitweb:
> 
>     http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-fsync

After I get back from LCA (all next week) I'll update the fsmark
aio patches I have and retest this. The code looks pretty similar to
the last "generic aio fsync" patch I wrote, so I'm guessing that the
results will be pretty similar, too.

It would be good to finally get this implemented in the kernel...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: aio fsync revisited
  2018-01-18 22:46 ` aio fsync revisited Dave Chinner
@ 2018-01-19 19:06   ` Christoph Hellwig
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2018-01-19 19:06 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, viro, Avi Kivity, linux-aio, linux-fsdevel,
	linux-api, linux-kernel

On Fri, Jan 19, 2018 at 09:46:14AM +1100, Dave Chinner wrote:
> After I get back from LCA (all next week) I'll update the fsmark
> aio patches I have and retest this. The code looks pretty similar to
> the last "generic aio fsync" patch I wrote, so I'm guessing that the
> results will be pretty similar, too.

Your patch applied as-is, as that's what I've been using to test the
feature.  Howerever I needed the following fixups to actually make
the compiler and linker happy:

diff --git a/Makefile b/Makefile
index 9b75ce3..ce5f54b 100644
--- a/Makefile
+++ b/Makefile
@@ -11,13 +11,14 @@ DIR2= /test/dir2
 
 COBJS= fs_mark.o lib_timing.o
 CFLAGS= -O2 -Wall -D_FILE_OFFSET_BITS=64
+LDFLAGS= -laio
 
 all: fs_mark 
 
 fs_mark.o: fs_mark.c fs_mark.h
 
 fs_mark: fs_mark.o lib_timing.o
-	${CC} -o fs_mark fs_mark.o lib_timing.o
+	${CC} ${LDFLAGS} -o fs_mark fs_mark.o lib_timing.o
 
 test: fs_mark
 	./fs_mark -d ${DIR1} -d ${DIR2} -s 51200 -n 4096
diff --git a/fs_mark.c b/fs_mark.c
index 8f8fb84..4a4103d 100644
--- a/fs_mark.c
+++ b/fs_mark.c
@@ -135,7 +135,7 @@ get_fsync_completions(int threshold)
 		aio_flight -= r;
 		for (i = 0; i < r; ++i) {
 			if (ioevents[i].res)
-				printf("FAIL! aio_fsync returned %d\n",
+				printf("FAIL! aio_fsync returned %zd\n",
 					ioevents[i].res);
 		}
 		usleep(1000);
@@ -162,6 +162,7 @@ do_fsync(int fd)
 		cleanup_exit();
 	}
 
+	return 0;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-01-19 19:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-17 19:47 aio fsync revisited Christoph Hellwig
2018-01-17 19:47 ` [PATCH] aio: resurrect IOCB_CMD_FSYNC and IOCB_CMD_FDSYNC support Christoph Hellwig
2018-01-18 22:46 ` aio fsync revisited Dave Chinner
2018-01-19 19:06   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).