linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/6] fuse: process direct IO asynchronously
@ 2012-12-14 15:20 Maxim V. Patlasov
  2012-12-14 15:20 ` [PATCH 1/6] fuse: move fuse_release_user_pages() up Maxim V. Patlasov
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Maxim V. Patlasov @ 2012-12-14 15:20 UTC
  To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

Hi,

The existing fuse implementation always processes direct IO synchronously: it
submits the next request to userspace fuse only when the previous one has
completed. This is suboptimal because: 1) libaio DIO works in a blocking way;
2) userspace fuse can't achieve parallelism by processing several requests
simultaneously (e.g. in the case of distributed network storage); 3) userspace
fuse can't merge requests before passing them to the actual storage.

The idea of the patch-set is to submit fuse requests in a non-blocking way
(where possible) and either return -EIOCBQUEUED or wait for their
completion synchronously. The patch-set is to be applied on top of the
for-next branch of Miklos' git repo.

To estimate the performance improvement I used a slightly modified fusexmp
over tmpfs (clearing the O_DIRECT bit from fi->flags in xmp_open). For
synchronous operations I used 'dd' like this:

dd of=/dev/null if=/fuse/mnt/file bs=2M count=256 iflag=direct
dd if=/dev/zero of=/fuse/mnt/file bs=2M count=256 oflag=direct conv=notrunc

For AIO I used 'aio-stress' like this:

aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 1 /fuse/mnt/file
aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 0 /fuse/mnt/file

The throughput on some commodity (rather feeble) server was (in MB/sec):

             original / patched

dd reads:    ~322     / ~382
dd writes:   ~277     / ~288

aio reads:   ~380     / ~459
aio writes:  ~319     / ~353

Changed in v2 - cleanups suggested by Brian:
 - Updated fuse_io_priv with an async field and file pointer to preserve
   the current style of interface (i.e., use this instead of iocb).
 - Trigger the type of request submission based on the async field.
 - Pulled up the fuse_write_update_size() call out of __fuse_direct_write()
   to make the separate paths more consistent.

Thanks,
Maxim

---

Maxim V. Patlasov (6):
      fuse: move fuse_release_user_pages() up
      fuse: add support of async IO
      fuse: make fuse_direct_io() aware about AIO
      fuse: enable asynchronous processing direct IO
      fuse: truncate file if async dio failed
      fuse: optimize short direct reads


 fs/fuse/cuse.c   |    6 +
 fs/fuse/file.c   |  290 +++++++++++++++++++++++++++++++++++++++++++++++-------
 fs/fuse/fuse_i.h |   19 +++-
 3 files changed, 276 insertions(+), 39 deletions(-)



* [PATCH 1/6] fuse: move fuse_release_user_pages() up
  2012-12-14 15:20 [PATCH v2 0/6] fuse: process direct IO asynchronously Maxim V. Patlasov
@ 2012-12-14 15:20 ` Maxim V. Patlasov
  2012-12-14 15:20 ` [PATCH 2/6] fuse: add support of async IO Maxim V. Patlasov
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Maxim V. Patlasov @ 2012-12-14 15:20 UTC
  To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

fuse_release_user_pages() will be indirectly used by fuse_send_read/write
in future patches.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 19b50e7..6685cb0 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -491,6 +491,18 @@ void fuse_read_fill(struct fuse_req *req, struct file *file, loff_t pos,
 	req->out.args[0].size = count;
 }
 
+static void fuse_release_user_pages(struct fuse_req *req, int write)
+{
+	unsigned i;
+
+	for (i = 0; i < req->num_pages; i++) {
+		struct page *page = req->pages[i];
+		if (write)
+			set_page_dirty_lock(page);
+		put_page(page);
+	}
+}
+
 static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 			     loff_t pos, size_t count, fl_owner_t owner)
 {
@@ -1035,18 +1047,6 @@ out:
 	return written ? written : err;
 }
 
-static void fuse_release_user_pages(struct fuse_req *req, int write)
-{
-	unsigned i;
-
-	for (i = 0; i < req->num_pages; i++) {
-		struct page *page = req->pages[i];
-		if (write)
-			set_page_dirty_lock(page);
-		put_page(page);
-	}
-}
-
 static inline void fuse_page_descs_length_init(struct fuse_req *req,
 		unsigned index, unsigned nr_pages)
 {



* [PATCH 2/6] fuse: add support of async IO
  2012-12-14 15:20 [PATCH v2 0/6] fuse: process direct IO asynchronously Maxim V. Patlasov
  2012-12-14 15:20 ` [PATCH 1/6] fuse: move fuse_release_user_pages() up Maxim V. Patlasov
@ 2012-12-14 15:20 ` Maxim V. Patlasov
  2013-04-22 16:34   ` Miklos Szeredi
  2012-12-14 15:20 ` [PATCH 3/6] fuse: make fuse_direct_io() aware about AIO Maxim V. Patlasov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Maxim V. Patlasov @ 2012-12-14 15:20 UTC
  To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

The patch implements a framework to process an IO request asynchronously. The
idea is to associate several fuse requests with a single kiocb by means of
the fuse_io_priv structure. The structure plays the same role for FUSE as
'struct dio' does for direct-io.c.

The framework is supposed to be used like this:
 - someone (who wants to process an IO asynchronously) allocates fuse_io_priv
   and initializes it, setting the 'async' field to a non-zero value.
 - as soon as a fuse request is filled, it can be submitted (in a non-blocking
   way) by fuse_async_req_send()
 - when all submitted requests are ACKed by userspace, io->reqs drops to zero,
   triggering aio_complete()

In case of IO initiated by libaio, aio_complete() finishes processing the
same way as when dio_complete() calls aio_complete(). But the framework may
also be used internally by FUSE when the initial IO request was synchronous
(from the user's perspective), but it's beneficial to process it
asynchronously. Then the caller should wait on the kiocb explicitly and
aio_complete() will wake the caller up.
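
The accounting described above can be illustrated with a small userspace
model. This is only a sketch, not kernel code: 'struct io_model' and the
io_model_*() helpers are hypothetical names, locking is omitted, and the
'done'/'result' fields stand in for the call to aio_complete(). The model
assumes the submitter holds one initial reference (reqs == 1), as a later
patch in this series does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical userspace stand-in for struct fuse_io_priv (no locking). */
struct io_model {
	unsigned reqs;	/* in-flight requests plus the submitter's reference */
	ssize_t bytes;	/* minimal non-negative 'pos' seen so far, or -1 */
	size_t size;	/* total number of bytes submitted */
	bool write;	/* true for a write, false for a read */
	int err;	/* first error reported, 0 if none */
	bool done;	/* set once the last reference is dropped */
	long result;	/* value that would be passed to aio_complete() */
};

static void io_model_init(struct io_model *io, bool write)
{
	io->reqs = 1;	/* the submitter's own reference */
	io->bytes = -1;
	io->size = 0;
	io->write = write;
	io->err = 0;
	io->done = false;
	io->result = 0;
}

/* Models fuse_async_req_send(): account for one more in-flight request. */
static void io_model_submit(struct io_model *io, size_t num_bytes)
{
	io->size += num_bytes;
	io->reqs++;
}

/*
 * Models fuse_aio_complete(): 'pos' is -1 for a full transfer, otherwise
 * the end position of a short transfer relative to the start of the IO.
 */
static void io_model_complete(struct io_model *io, int err, ssize_t pos)
{
	assert(io->reqs > 0);

	if (err) {
		if (!io->err)
			io->err = err;	/* keep the first error only */
	} else if (pos >= 0 && (io->bytes < 0 || pos < io->bytes)) {
		io->bytes = pos;	/* track the minimal short position */
	}

	if (--io->reqs)
		return;			/* other requests still pending */

	if (io->err)
		io->result = io->err;
	else if (io->bytes >= 0 && io->write)
		io->result = -EIO;	/* short write is reported as an error */
	else
		io->result = io->bytes < 0 ? (long)io->size : (long)io->bytes;
	io->done = true;
}
```

With two 32K read requests where the second one is ACKed after only 1K, the
model reports 33K once the submitter drops its own reference, which matches
the example in the comment above fuse_aio_complete() in the diff below.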

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c   |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/fuse_i.h |   17 ++++++++++
 2 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 6685cb0..8dd931f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -503,6 +503,98 @@ static void fuse_release_user_pages(struct fuse_req *req, int write)
 	}
 }
 
+/**
+ * In case of short read, the caller sets 'pos' to the position of
+ * actual end of fuse request in IO request. Otherwise, if bytes_requested
+ * == bytes_transferred or rw == WRITE, the caller sets 'pos' to -1.
+ *
+ * An example:
+ * User requested DIO read of 64K. It was split into two 32K fuse requests,
+ * both submitted asynchronously. The first of them was ACKed by userspace as
+ * fully completed (req->out.args[0].size == 32K) resulting in pos == -1. The
+ * second request was ACKed as short, e.g. only 1K was read, resulting in
+ * pos == 33K.
+ *
+ * Thus, when all fuse requests are completed, the minimal non-negative 'pos'
+ * will be equal to the length of the longest contiguous fragment of
+ * transferred data starting from the beginning of IO request.
+ */
+static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
+{
+	int left;
+
+	spin_lock(&io->lock);
+	if (err)
+		io->err = io->err ? : err;
+	else if (pos >= 0 && (io->bytes < 0 || pos < io->bytes))
+		io->bytes = pos;
+
+	left = --io->reqs;
+	spin_unlock(&io->lock);
+
+	if (!left) {
+		long res;
+
+		if (io->err)
+			res = io->err;
+		else if (io->bytes >= 0 && io->write)
+			res = -EIO;
+		else {
+			res = io->bytes < 0 ? io->size : io->bytes;
+
+			if (!is_sync_kiocb(io->iocb)) {
+				struct path *path = &io->iocb->ki_filp->f_path;
+				struct inode *inode = path->dentry->d_inode;
+				struct fuse_conn *fc = get_fuse_conn(inode);
+				struct fuse_inode *fi = get_fuse_inode(inode);
+
+				spin_lock(&fc->lock);
+				fi->attr_version = ++fc->attr_version;
+				spin_unlock(&fc->lock);
+			}
+		}
+
+		aio_complete(io->iocb, res, 0);
+		kfree(io);
+	}
+}
+
+static void fuse_aio_complete_req(struct fuse_conn *fc, struct fuse_req *req)
+{
+	struct fuse_io_priv *io = req->io;
+	ssize_t pos = -1;
+
+	fuse_release_user_pages(req, !io->write);
+
+	if (io->write) {
+		if (req->misc.write.in.size != req->misc.write.out.size)
+			pos = req->misc.write.in.offset - io->offset +
+				req->misc.write.out.size;
+	} else {
+		if (req->misc.read.in.size != req->out.args[0].size)
+			pos = req->misc.read.in.offset - io->offset +
+				req->out.args[0].size;
+	}
+
+	fuse_aio_complete(io, req->out.h.error, pos);
+}
+
+static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req,
+		size_t num_bytes, struct fuse_io_priv *io)
+{
+	spin_lock(&io->lock);
+	io->size += num_bytes;
+	io->reqs++;
+	spin_unlock(&io->lock);
+
+	req->io = io;
+	req->end = fuse_aio_complete_req;
+
+	fuse_request_send_background(fc, req);
+
+	return num_bytes;
+}
+
 static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 			     loff_t pos, size_t count, fl_owner_t owner)
 {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e4f70ea..e0a5b65 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -219,6 +219,20 @@ enum fuse_req_state {
 	FUSE_REQ_FINISHED
 };
 
+/** The request IO state (for asynchronous processing) */
+struct fuse_io_priv {
+	int async;
+	spinlock_t lock;
+	unsigned reqs;
+	ssize_t bytes;
+	size_t size;
+	__u64 offset;
+	bool write;
+	int err;
+	struct kiocb *iocb;
+	struct file *file;
+};
+
 /**
  * A request to the client
  */
@@ -323,6 +337,9 @@ struct fuse_req {
 	/** Inode used in the request or NULL */
 	struct inode *inode;
 
+	/** AIO control block */
+	struct fuse_io_priv *io;
+
 	/** Link on fi->writepages */
 	struct list_head writepages_entry;
 



* [PATCH 3/6] fuse: make fuse_direct_io() aware about AIO
  2012-12-14 15:20 [PATCH v2 0/6] fuse: process direct IO asynchronously Maxim V. Patlasov
  2012-12-14 15:20 ` [PATCH 1/6] fuse: move fuse_release_user_pages() up Maxim V. Patlasov
  2012-12-14 15:20 ` [PATCH 2/6] fuse: add support of async IO Maxim V. Patlasov
@ 2012-12-14 15:20 ` Maxim V. Patlasov
  2012-12-14 15:21 ` [PATCH 4/6] fuse: enable asynchronous processing direct IO Maxim V. Patlasov
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Maxim V. Patlasov @ 2012-12-14 15:20 UTC
  To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

The patch passes "struct fuse_io_priv *io" down the stack to
fuse_send_read/write, where it is used to submit requests asynchronously.
io->async==0 designates synchronous processing.

The non-trivial part of the patch is the changes in fuse_direct_io():
resources like fuse requests and user pages cannot be released immediately
in the async case.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/cuse.c   |    6 +++--
 fs/fuse/file.c   |   69 +++++++++++++++++++++++++++++++++++++++---------------
 fs/fuse/fuse_i.h |    2 +-
 3 files changed, 55 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 65ce10a..d890901 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -92,8 +92,9 @@ static ssize_t cuse_read(struct file *file, char __user *buf, size_t count,
 {
 	loff_t pos = 0;
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
+	struct fuse_io_priv io = { .async = 0, .file = file };
 
-	return fuse_direct_io(file, &iov, 1, count, &pos, 0);
+	return fuse_direct_io(&io, &iov, 1, count, &pos, 0);
 }
 
 static ssize_t cuse_write(struct file *file, const char __user *buf,
@@ -101,12 +102,13 @@ static ssize_t cuse_write(struct file *file, const char __user *buf,
 {
 	loff_t pos = 0;
 	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count };
+	struct fuse_io_priv io = { .async = 0, .file = file };
 
 	/*
 	 * No locking or generic_write_checks(), the server is
 	 * responsible for locking and sanity checks.
 	 */
-	return fuse_direct_io(file, &iov, 1, count, &pos, 1);
+	return fuse_direct_io(&io, &iov, 1, count, &pos, 1);
 }
 
 static int cuse_open(struct inode *inode, struct file *file)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8dd931f..6c2ca8a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -595,9 +595,10 @@ static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req,
 	return num_bytes;
 }
 
-static size_t fuse_send_read(struct fuse_req *req, struct file *file,
+static size_t fuse_send_read(struct fuse_req *req, struct fuse_io_priv *io,
 			     loff_t pos, size_t count, fl_owner_t owner)
 {
+	struct file *file = io->file;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 
@@ -608,6 +609,10 @@ static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 		inarg->read_flags |= FUSE_READ_LOCKOWNER;
 		inarg->lock_owner = fuse_lock_owner_id(fc, owner);
 	}
+
+	if (io->async)
+		return fuse_async_req_send(fc, req, count, io);
+
 	fuse_request_send(fc, req);
 	return req->out.args[0].size;
 }
@@ -628,6 +633,7 @@ static void fuse_read_update_size(struct inode *inode, loff_t size,
 
 static int fuse_readpage(struct file *file, struct page *page)
 {
+	struct fuse_io_priv io = { .async = 0, .file = file };
 	struct inode *inode = page->mapping->host;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_req *req;
@@ -660,7 +666,7 @@ static int fuse_readpage(struct file *file, struct page *page)
 	req->num_pages = 1;
 	req->pages[0] = page;
 	req->page_descs[0].length = count;
-	num_read = fuse_send_read(req, file, pos, count, NULL);
+	num_read = fuse_send_read(req, &io, pos, count, NULL);
 	err = req->out.h.error;
 	fuse_put_request(fc, req);
 
@@ -862,9 +868,10 @@ static void fuse_write_fill(struct fuse_req *req, struct fuse_file *ff,
 	req->out.args[0].value = outarg;
 }
 
-static size_t fuse_send_write(struct fuse_req *req, struct file *file,
+static size_t fuse_send_write(struct fuse_req *req, struct fuse_io_priv *io,
 			      loff_t pos, size_t count, fl_owner_t owner)
 {
+	struct file *file = io->file;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 	struct fuse_write_in *inarg = &req->misc.write.in;
@@ -875,6 +882,10 @@ static size_t fuse_send_write(struct fuse_req *req, struct file *file,
 		inarg->write_flags |= FUSE_WRITE_LOCKOWNER;
 		inarg->lock_owner = fuse_lock_owner_id(fc, owner);
 	}
+
+	if (io->async)
+		return fuse_async_req_send(fc, req, count, io);
+
 	fuse_request_send(fc, req);
 	return req->misc.write.out.size;
 }
@@ -898,11 +909,12 @@ static size_t fuse_send_write_pages(struct fuse_req *req, struct file *file,
 	size_t res;
 	unsigned offset;
 	unsigned i;
+	struct fuse_io_priv io = { .async = 0, .file = file };
 
 	for (i = 0; i < req->num_pages; i++)
 		fuse_wait_on_page_writeback(inode, req->pages[i]->index);
 
-	res = fuse_send_write(req, file, pos, count, NULL);
+	res = fuse_send_write(req, &io, pos, count, NULL);
 
 	offset = req->page_descs[0].offset;
 	count = res;
@@ -1240,10 +1252,11 @@ static inline int fuse_iter_npages(const struct iov_iter *ii_p)
 	return min(npages, FUSE_MAX_PAGES_PER_REQ);
 }
 
-ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
+ssize_t fuse_direct_io(struct fuse_io_priv *io, const struct iovec *iov,
 		       unsigned long nr_segs, size_t count, loff_t *ppos,
 		       int write)
 {
+	struct file *file = io->file;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 	size_t nmax = write ? fc->max_write : fc->max_read;
@@ -1264,16 +1277,20 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		size_t nbytes = min(count, nmax);
 		int err = fuse_get_user_pages(req, &ii, &nbytes, write);
 		if (err) {
+			if (io->async)
+				fuse_put_request(fc, req);
+
 			res = err;
 			break;
 		}
 
 		if (write)
-			nres = fuse_send_write(req, file, pos, nbytes, owner);
+			nres = fuse_send_write(req, io, pos, nbytes, owner);
 		else
-			nres = fuse_send_read(req, file, pos, nbytes, owner);
+			nres = fuse_send_read(req, io, pos, nbytes, owner);
 
-		fuse_release_user_pages(req, !write);
+		if (!io->async)
+			fuse_release_user_pages(req, !write);
 		if (req->out.h.error) {
 			if (!res)
 				res = req->out.h.error;
@@ -1288,13 +1305,14 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		if (nres != nbytes)
 			break;
 		if (count) {
-			fuse_put_request(fc, req);
+			if (!io->async)
+				fuse_put_request(fc, req);
 			req = fuse_get_req(fc, fuse_iter_npages(&ii));
 			if (IS_ERR(req))
 				break;
 		}
 	}
-	if (!IS_ERR(req))
+	if (!IS_ERR(req) && !io->async)
 		fuse_put_request(fc, req);
 	if (res > 0)
 		*ppos = pos;
@@ -1303,16 +1321,17 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 }
 EXPORT_SYMBOL_GPL(fuse_direct_io);
 
-static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
+static ssize_t __fuse_direct_read(struct fuse_io_priv *io, const struct iovec *iov,
 				  unsigned long nr_segs, loff_t *ppos)
 {
 	ssize_t res;
+	struct file *file = io->file;
 	struct inode *inode = file->f_path.dentry->d_inode;
 
 	if (is_bad_inode(inode))
 		return -EIO;
 
-	res = fuse_direct_io(file, iov, nr_segs, iov_length(iov, nr_segs),
+	res = fuse_direct_io(io, iov, nr_segs, iov_length(iov, nr_segs),
 			     ppos, 0);
 
 	fuse_invalidate_attr(inode);
@@ -1323,21 +1342,23 @@ static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
 static ssize_t fuse_direct_read(struct file *file, char __user *buf,
 				     size_t count, loff_t *ppos)
 {
+	struct fuse_io_priv io = { .async = 0, .file = file };
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
-	return __fuse_direct_read(file, &iov, 1, ppos);
+	return __fuse_direct_read(&io, &iov, 1, ppos);
 }
 
-static ssize_t __fuse_direct_write(struct file *file, const struct iovec *iov,
+static ssize_t __fuse_direct_write(struct fuse_io_priv *io, const struct iovec *iov,
 				   unsigned long nr_segs, loff_t *ppos)
 {
+	struct file *file = io->file;
 	struct inode *inode = file->f_path.dentry->d_inode;
 	size_t count = iov_length(iov, nr_segs);
 	ssize_t res;
 
 	res = generic_write_checks(file, ppos, &count, 0);
 	if (!res) {
-		res = fuse_direct_io(file, iov, nr_segs, count, ppos, 1);
-		if (res > 0)
+		res = fuse_direct_io(io, iov, nr_segs, count, ppos, 1);
+		if (!io->async && res > 0)
 			fuse_write_update_size(inode, *ppos);
 	}
 
@@ -1352,13 +1373,14 @@ static ssize_t fuse_direct_write(struct file *file, const char __user *buf,
 	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count };
 	struct inode *inode = file->f_path.dentry->d_inode;
 	ssize_t res;
+	struct fuse_io_priv io = { .async = 0, .file = file };
 
 	if (is_bad_inode(inode))
 		return -EIO;
 
 	/* Don't allow parallel writes to the same file */
 	mutex_lock(&inode->i_mutex);
-	res = __fuse_direct_write(file, &iov, 1, ppos);
+	res = __fuse_direct_write(&io, &iov, 1, ppos);
 	mutex_unlock(&inode->i_mutex);
 
 	return res;
@@ -2326,14 +2348,23 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	ssize_t ret = 0;
 	struct file *file = NULL;
 	loff_t pos = 0;
+	struct fuse_io_priv *io;
 
 	file = iocb->ki_filp;
 	pos = offset;
 
+	io = kzalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
+	if (!io)
+		return -ENOMEM;
+
+	io->file = file;
+
 	if (rw == WRITE)
-		ret = __fuse_direct_write(file, iov, nr_segs, &pos);
+		ret = __fuse_direct_write(io, iov, nr_segs, &pos);
 	else
-		ret = __fuse_direct_read(file, iov, nr_segs, &pos);
+		ret = __fuse_direct_read(io, iov, nr_segs, &pos);
+
+	kfree(io);
 
 	return ret;
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e0a5b65..91b5192 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -828,7 +828,7 @@ int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid,
 
 int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
 		 bool isdir);
-ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
+ssize_t fuse_direct_io(struct fuse_io_priv *io, const struct iovec *iov,
 		       unsigned long nr_segs, size_t count, loff_t *ppos,
 		       int write);
 long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg,



* [PATCH 4/6] fuse: enable asynchronous processing direct IO
  2012-12-14 15:20 [PATCH v2 0/6] fuse: process direct IO asynchronously Maxim V. Patlasov
                   ` (2 preceding siblings ...)
  2012-12-14 15:20 ` [PATCH 3/6] fuse: make fuse_direct_io() aware about AIO Maxim V. Patlasov
@ 2012-12-14 15:21 ` Maxim V. Patlasov
  2012-12-14 15:21 ` [PATCH 5/6] fuse: truncate file if async dio failed Maxim V. Patlasov
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Maxim V. Patlasov @ 2012-12-14 15:21 UTC
  To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

In case of a synchronous DIO request (i.e. read(2) or write(2) on a file
opened with O_DIRECT), the patch submits fuse requests asynchronously, but
waits for their completion before returning from fuse_direct_IO().

In case of an asynchronous DIO request (i.e. libaio io_submit() on a file
opened with O_DIRECT), the patch submits fuse requests asynchronously and
returns -EIOCBQUEUED immediately.

The only special case is async DIO extending the file. Here the patch falls
back to the old behaviour because we can't return -EIOCBQUEUED and update
i_size later without holding i_mutex. And we have no method to wait on real
async I/O requests.
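
The three cases above can be summarized as a small decision function. This is
a userspace sketch only: 'enum dio_mode' and dio_pick_mode() are hypothetical
names that do not appear in the patch, and the condition follows the check
added to fuse_direct_IO() in this patch (a later patch in the series relaxes
the write-only restriction).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three behaviours described above. */
enum dio_mode {
	DIO_ASYNC_AND_WAIT,	/* sync kiocb: submit async, wait for completion */
	DIO_ASYNC_AND_QUEUE,	/* async kiocb: submit async, return -EIOCBQUEUED */
	DIO_SYNC_FALLBACK	/* async extending write: old synchronous path */
};

static enum dio_mode dio_pick_mode(bool sync_kiocb, bool is_write,
				   long long offset, size_t count,
				   long long i_size)
{
	assert(offset >= 0 && i_size >= 0);

	/* An async write that extends the file cannot update i_size later. */
	if (!sync_kiocb && is_write && offset + (long long)count > i_size)
		return DIO_SYNC_FALLBACK;

	return sync_kiocb ? DIO_ASYNC_AND_WAIT : DIO_ASYNC_AND_QUEUE;
}
```

Note that a synchronous extending write still takes the async-and-wait path,
since the caller is able to update i_size after the wait returns.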

The patch also cleans up __fuse_direct_write(): it's better to update i_size
in its callers. Thanks to Brian for the suggestion.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 6c2ca8a..05eed23 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1356,11 +1356,8 @@ static ssize_t __fuse_direct_write(struct fuse_io_priv *io, const struct iovec *
 	ssize_t res;
 
 	res = generic_write_checks(file, ppos, &count, 0);
-	if (!res) {
+	if (!res)
 		res = fuse_direct_io(io, iov, nr_segs, count, ppos, 1);
-		if (!io->async && res > 0)
-			fuse_write_update_size(inode, *ppos);
-	}
 
 	fuse_invalidate_attr(inode);
 
@@ -1381,6 +1378,8 @@ static ssize_t fuse_direct_write(struct file *file, const char __user *buf,
 	/* Don't allow parallel writes to the same file */
 	mutex_lock(&inode->i_mutex);
 	res = __fuse_direct_write(&io, &iov, 1, ppos);
+	if (res > 0)
+		fuse_write_update_size(inode, *ppos);
 	mutex_unlock(&inode->i_mutex);
 
 	return res;
@@ -2348,23 +2347,61 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	ssize_t ret = 0;
 	struct file *file = NULL;
 	loff_t pos = 0;
+	struct inode *inode;
+	loff_t i_size;
+	size_t count = iov_length(iov, nr_segs);
 	struct fuse_io_priv *io;
 
 	file = iocb->ki_filp;
 	pos = offset;
+	inode = file->f_mapping->host;
+	i_size = i_size_read(inode);
 
-	io = kzalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
+	io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
 	if (!io)
 		return -ENOMEM;
-
+	spin_lock_init(&io->lock);
+	io->reqs = 1;
+	io->bytes = -1;
+	io->size = 0;
+	io->offset = offset;
+	io->write = (rw == WRITE);
+	io->err = 0;
 	io->file = file;
+	/*
+	 * By default, we want to optimize all I/Os with async request submission
+	 * to the client filesystem.
+	 */
+	io->async = 1;
+	io->iocb = iocb;
+
+	/*
+	 * We cannot asynchronously extend the size of a file. We have no method
+	 * to wait on real async I/O requests, so we must submit this request
+	 * synchronously.
+	 */
+	if (!is_sync_kiocb(iocb) && (offset + count > i_size) && rw == WRITE)
+		io->async = 0;
 
 	if (rw == WRITE)
 		ret = __fuse_direct_write(io, iov, nr_segs, &pos);
 	else
 		ret = __fuse_direct_read(io, iov, nr_segs, &pos);
 
-	kfree(io);
+	if (io->async) {
+		fuse_aio_complete(io, ret == count ? 0 : -EIO, -1);
+		
+		/* we have a non-extending, async request, so return */
+		if (!is_sync_kiocb(iocb))
+			return -EIOCBQUEUED;
+
+		ret = wait_on_sync_kiocb(iocb);
+	} else {
+		kfree(io);
+	}
+
+	if (rw == WRITE && ret > 0)
+		fuse_write_update_size(inode, pos);
 
 	return ret;
 }



* [PATCH 5/6] fuse: truncate file if async dio failed
  2012-12-14 15:20 [PATCH v2 0/6] fuse: process direct IO asynchronously Maxim V. Patlasov
                   ` (3 preceding siblings ...)
  2012-12-14 15:21 ` [PATCH 4/6] fuse: enable asynchronous processing direct IO Maxim V. Patlasov
@ 2012-12-14 15:21 ` Maxim V. Patlasov
  2012-12-14 20:16   ` Brian Foster
  2012-12-18 10:05   ` [PATCH] fuse: truncate file if async dio failed - v2 Maxim V. Patlasov
  2012-12-14 15:21 ` [PATCH 6/6] fuse: optimize short direct reads Maxim V. Patlasov
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Maxim V. Patlasov @ 2012-12-14 15:21 UTC
  To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

The patch improves error handling in fuse_direct_IO(): if we successfully
submitted several fuse requests on behalf of a synchronous direct write
extending the file and some of them failed, let's do our best to clean up.
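
The post-write decision this patch adds at the end of fuse_direct_IO() can be
sketched in userspace as follows. The names 'enum write_cleanup' and
write_cleanup_action() are hypothetical and used for illustration only; the
real code calls fuse_write_update_size() or fuse_do_truncate() directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for what fuse_direct_IO() does after the IO. */
enum write_cleanup {
	CLEANUP_NONE,		/* nothing to do */
	CLEANUP_UPDATE_SIZE,	/* write made progress: update i_size */
	CLEANUP_TRUNCATE	/* failed extending write: truncate to i_size */
};

static enum write_cleanup write_cleanup_action(bool is_write, long ret,
					       long long offset, size_t count,
					       long long i_size)
{
	assert(offset >= 0 && i_size >= 0);

	if (!is_write)
		return CLEANUP_NONE;
	if (ret > 0)
		return CLEANUP_UPDATE_SIZE;
	/* Only a write that would have extended the file needs undoing. */
	if (ret < 0 && offset + (long long)count > i_size)
		return CLEANUP_TRUNCATE;
	return CLEANUP_NONE;
}
```

Here 'i_size' is the size sampled before the IO was submitted, so the
truncate brings the file back to its size before the failed write.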

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 05eed23..b6e9b8d 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2340,6 +2340,53 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 	return 0;
 }
 
+static void fuse_do_truncate(struct file *file)
+{
+	struct fuse_file *ff = file->private_data;
+	struct inode *inode = file->f_mapping->host;
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_req *req;
+	struct fuse_setattr_in inarg;
+	struct fuse_attr_out outarg;
+	int err;
+
+	req = fuse_get_req_nopages(fc);
+	if (IS_ERR(req)) {
+		printk(KERN_WARNING "failed to allocate req for truncate "
+		       "(%ld)\n", PTR_ERR(req));
+		return;
+	}
+
+	memset(&inarg, 0, sizeof(inarg));
+	memset(&outarg, 0, sizeof(outarg));
+
+	inarg.valid |= FATTR_SIZE;
+	inarg.size = i_size_read(inode);
+
+	inarg.valid |= FATTR_FH;
+	inarg.fh = ff->fh;
+
+	req->in.h.opcode = FUSE_SETATTR;
+	req->in.h.nodeid = get_node_id(inode);
+	req->in.numargs = 1;
+	req->in.args[0].size = sizeof(inarg);
+	req->in.args[0].value = &inarg;
+	req->out.numargs = 1;
+	if (fc->minor < 9)
+		req->out.args[0].size = FUSE_COMPAT_ATTR_OUT_SIZE;
+	else
+		req->out.args[0].size = sizeof(outarg);
+	req->out.args[0].value = &outarg;
+
+	fuse_request_send(fc, req);
+	err = req->out.h.error;
+	fuse_put_request(fc, req);
+
+	if (err)
+		printk(KERN_WARNING "failed to truncate to %lld with error "
+		       "%d\n", i_size_read(inode), err);
+}
+
 static ssize_t
 fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 			loff_t offset, unsigned long nr_segs)
@@ -2400,8 +2447,12 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 		kfree(io);
 	}
 
-	if (rw == WRITE && ret > 0)
-		fuse_write_update_size(inode, pos);
+	if (rw == WRITE) {
+		if (ret > 0)
+			fuse_write_update_size(inode, pos);
+		else if (ret < 0 && offset + count > i_size)
+			fuse_do_truncate(file);
+	}
 
 	return ret;
 }



* [PATCH 6/6] fuse: optimize short direct reads
  2012-12-14 15:20 [PATCH v2 0/6] fuse: process direct IO asynchronously Maxim V. Patlasov
                   ` (4 preceding siblings ...)
  2012-12-14 15:21 ` [PATCH 5/6] fuse: truncate file if async dio failed Maxim V. Patlasov
@ 2012-12-14 15:21 ` Maxim V. Patlasov
  2012-12-18 14:14 ` [PATCH v2 0/6] fuse: process direct IO asynchronously Brian Foster
  2013-04-11 11:22 ` [fuse-devel] " Maxim V. Patlasov
  7 siblings, 0 replies; 19+ messages in thread
From: Maxim V. Patlasov @ 2012-12-14 15:21 UTC
  To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

If the user requested a direct read beyond EOF, we can skip sending fuse
requests for positions beyond EOF because userspace would ACK them with zero
bytes read anyway. We can trust i_size in fuse_direct_IO() for such cases
because it's called from fuse_file_aio_read() and the latter updates fuse
attributes including i_size.
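
The clamping can be shown with a small userspace sketch. The helper name
clamp_read_count() is hypothetical; in the patch the same logic is written
inline in fuse_direct_IO().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return how many bytes of a read at 'offset' are worth requesting when
 * the file is 'i_size' bytes long. Zero means the read starts at or
 * beyond EOF and no fuse request needs to be sent at all.
 */
static size_t clamp_read_count(long long offset, size_t count,
			       long long i_size)
{
	unsigned long long remaining;

	assert(offset >= 0 && i_size >= 0);

	if (offset >= i_size)
		return 0;

	/* Bytes between 'offset' and EOF; positive because offset < i_size. */
	remaining = (unsigned long long)(i_size - offset);
	return count > remaining ? (size_t)remaining : count;
}
```

For example, a 200-byte read at offset 900 of a 1000-byte file is trimmed to
100 bytes, while a read entirely within the file is left unchanged.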

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b6e9b8d..ceacd20 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1322,7 +1322,8 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, const struct iovec *iov,
 EXPORT_SYMBOL_GPL(fuse_direct_io);
 
 static ssize_t __fuse_direct_read(struct fuse_io_priv *io, const struct iovec *iov,
-				  unsigned long nr_segs, loff_t *ppos)
+				  unsigned long nr_segs, loff_t *ppos,
+				  size_t count)
 {
 	ssize_t res;
 	struct file *file = io->file;
@@ -1331,8 +1332,7 @@ static ssize_t __fuse_direct_read(struct fuse_io_priv *io, const struct iovec *i
 	if (is_bad_inode(inode))
 		return -EIO;
 
-	res = fuse_direct_io(io, iov, nr_segs, iov_length(iov, nr_segs),
-			     ppos, 0);
+	res = fuse_direct_io(io, iov, nr_segs, count, ppos, 0);
 
 	fuse_invalidate_attr(inode);
 
@@ -1344,7 +1344,7 @@ static ssize_t fuse_direct_read(struct file *file, char __user *buf,
 {
 	struct fuse_io_priv io = { .async = 0, .file = file };
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
-	return __fuse_direct_read(&io, &iov, 1, ppos);
+	return __fuse_direct_read(&io, &iov, 1, ppos, count);
 }
 
 static ssize_t __fuse_direct_write(struct fuse_io_priv *io, const struct iovec *iov,
@@ -2404,6 +2404,13 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	inode = file->f_mapping->host;
 	i_size = i_size_read(inode);
 
+	/* optimization for short read */
+	if (rw != WRITE && offset + count > i_size) {
+		if (offset >= i_size)
+			return 0;
+		count = i_size - offset;
+	}
+
 	io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
 	if (!io)
 		return -ENOMEM;
@@ -2427,13 +2434,13 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	 * to wait on real async I/O requests, so we must submit this request
 	 * synchronously.
 	 */
-	if (!is_sync_kiocb(iocb) && (offset + count > i_size) && rw == WRITE)
+	if (!is_sync_kiocb(iocb) && (offset + count > i_size))
 		io->async = 0;
 
 	if (rw == WRITE)
 		ret = __fuse_direct_write(io, iov, nr_segs, &pos);
 	else
-		ret = __fuse_direct_read(io, iov, nr_segs, &pos);
+		ret = __fuse_direct_read(io, iov, nr_segs, &pos, count);
 
 	if (io->async) {
 		fuse_aio_complete(io, ret == count ? 0 : -EIO, -1);



* Re: [PATCH 5/6] fuse: truncate file if async dio failed
  2012-12-14 15:21 ` [PATCH 5/6] fuse: truncate file if async dio failed Maxim V. Patlasov
@ 2012-12-14 20:16   ` Brian Foster
  2012-12-17 14:13     ` Maxim V. Patlasov
  2012-12-18 10:05   ` [PATCH] fuse: truncate file if async dio failed - v2 Maxim V. Patlasov
  1 sibling, 1 reply; 19+ messages in thread
From: Brian Foster @ 2012-12-14 20:16 UTC
  To: Maxim V. Patlasov; +Cc: miklos, dev, xemul, fuse-devel, linux-kernel, devel

On 12/14/2012 10:21 AM, Maxim V. Patlasov wrote:
> The patch improves error handling in fuse_direct_IO(): if we successfully
> submitted several fuse requests on behalf of synchronous direct write
> extending file and some of them failed, let's try to do our best to clean-up.
> 
> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
> ---
>  fs/fuse/file.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 05eed23..b6e9b8d 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -2340,6 +2340,53 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
>  	return 0;
>  }
>  
> +static void fuse_do_truncate(struct file *file)
> +{
> +	struct fuse_file *ff = file->private_data;
> +	struct inode *inode = file->f_mapping->host;
> +	struct fuse_conn *fc = get_fuse_conn(inode);
> +	struct fuse_req *req;
> +	struct fuse_setattr_in inarg;
> +	struct fuse_attr_out outarg;
> +	int err;
> +
> +	req = fuse_get_req_nopages(fc);
> +	if (IS_ERR(req)) {
> +		printk(KERN_WARNING "failed to allocate req for truncate "
> +		       "(%ld)\n", PTR_ERR(req));
> +		return;
> +	}
> +
> +	memset(&inarg, 0, sizeof(inarg));
> +	memset(&outarg, 0, sizeof(outarg));
> +
> +	inarg.valid |= FATTR_SIZE;
> +	inarg.size = i_size_read(inode);
> +
> +	inarg.valid |= FATTR_FH;
> +	inarg.fh = ff->fh;
> +
> +	req->in.h.opcode = FUSE_SETATTR;
> +	req->in.h.nodeid = get_node_id(inode);
> +	req->in.numargs = 1;
> +	req->in.args[0].size = sizeof(inarg);
> +	req->in.args[0].value = &inarg;
> +	req->out.numargs = 1;
> +	if (fc->minor < 9)
> +		req->out.args[0].size = FUSE_COMPAT_ATTR_OUT_SIZE;
> +	else
> +		req->out.args[0].size = sizeof(outarg);
> +	req->out.args[0].value = &outarg;
> +
> +	fuse_request_send(fc, req);
> +	err = req->out.h.error;
> +	fuse_put_request(fc, req);
> +
> +	if (err)
> +		printk(KERN_WARNING "failed to truncate to %lld with error "
> +		       "%d\n", i_size_read(inode), err);
> +}
> +

fuse_do_truncate() looks fairly close to fuse_do_setattr(). Is there any
reason we couldn't make fuse_do_setattr() non-static, change the dentry
parameter to an inode and use that?

Brian

>  static ssize_t
>  fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
>  			loff_t offset, unsigned long nr_segs)
> @@ -2400,8 +2447,12 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
>  		kfree(io);
>  	}
>  
> -	if (rw == WRITE && ret > 0)
> -		fuse_write_update_size(inode, pos);
> +	if (rw == WRITE) {
> +		if (ret > 0)
> +			fuse_write_update_size(inode, pos);
> +		else if (ret < 0 && offset + count > i_size)
> +			fuse_do_truncate(file);
> +	}
>  
>  	return ret;
>  }
> 



* Re: [PATCH 5/6] fuse: truncate file if async dio failed
  2012-12-14 20:16   ` Brian Foster
@ 2012-12-17 14:13     ` Maxim V. Patlasov
  2012-12-17 19:04       ` Brian Foster
  0 siblings, 1 reply; 19+ messages in thread
From: Maxim V. Patlasov @ 2012-12-17 14:13 UTC (permalink / raw)
  To: Brian Foster; +Cc: miklos, dev, xemul, fuse-devel, linux-kernel, devel

Hi,

12/15/2012 12:16 AM, Brian Foster wrote:
> On 12/14/2012 10:21 AM, Maxim V. Patlasov wrote:
>> The patch improves error handling in fuse_direct_IO(): if we successfully
>> submitted several fuse requests on behalf of synchronous direct write
>> extending file and some of them failed, let's try to do our best to clean-up.
>>
>> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
>> ---
>>   fs/fuse/file.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>   1 files changed, 53 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index 05eed23..b6e9b8d 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -2340,6 +2340,53 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
>>   	return 0;
>>   }
>>   
>> +static void fuse_do_truncate(struct file *file)
>> +{
>> +	struct fuse_file *ff = file->private_data;
>> +	struct inode *inode = file->f_mapping->host;
>> +	struct fuse_conn *fc = get_fuse_conn(inode);
>> +	struct fuse_req *req;
>> +	struct fuse_setattr_in inarg;
>> +	struct fuse_attr_out outarg;
>> +	int err;
>> +
>> +	req = fuse_get_req_nopages(fc);
>> +	if (IS_ERR(req)) {
>> +		printk(KERN_WARNING "failed to allocate req for truncate "
>> +		       "(%ld)\n", PTR_ERR(req));
>> +		return;
>> +	}
>> +
>> +	memset(&inarg, 0, sizeof(inarg));
>> +	memset(&outarg, 0, sizeof(outarg));
>> +
>> +	inarg.valid |= FATTR_SIZE;
>> +	inarg.size = i_size_read(inode);
>> +
>> +	inarg.valid |= FATTR_FH;
>> +	inarg.fh = ff->fh;
>> +
>> +	req->in.h.opcode = FUSE_SETATTR;
>> +	req->in.h.nodeid = get_node_id(inode);
>> +	req->in.numargs = 1;
>> +	req->in.args[0].size = sizeof(inarg);
>> +	req->in.args[0].value = &inarg;
>> +	req->out.numargs = 1;
>> +	if (fc->minor < 9)
>> +		req->out.args[0].size = FUSE_COMPAT_ATTR_OUT_SIZE;
>> +	else
>> +		req->out.args[0].size = sizeof(outarg);
>> +	req->out.args[0].value = &outarg;
>> +
>> +	fuse_request_send(fc, req);
>> +	err = req->out.h.error;
>> +	fuse_put_request(fc, req);
>> +
>> +	if (err)
>> +		printk(KERN_WARNING "failed to truncate to %lld with error "
>> +		       "%d\n", i_size_read(inode), err);
>> +}
>> +
> fuse_do_truncate() looks fairly close to fuse_do_setattr(). Is there any
> reason we couldn't make fuse_do_setattr() non-static, change the dentry
> parameter to an inode and use that?

fuse_do_setattr() performs extra checks that fuse_do_truncate() doesn't need.
Some of them are harmless, some are not: fuse_allow_task() may return 0 if
the task's credentials have changed. E.g. the super-user successfully opens
a file, then calls setuid(other_user_uid), then write(2)s to the file.
write(2) doesn't check the uid, but fuse_do_truncate() would (via
fuse_allow_task()) if it were built on fuse_do_setattr().

This non-POSIX behaviour (ftruncate(2) returning -1 with errno==EACCES) 
was introduced a long time ago:

> commit e57ac68378a287d6336d187b26971f35f7ee7251
> Author: Miklos Szeredi <mszeredi@suse.cz>
> Date:   Thu Oct 18 03:06:58 2007 -0700
>
>     fuse: fix allowing operations
>
>     The following operation didn't check if sending the request was 
> allowed:
>
>       setattr
>       listxattr
>       statfs
>
>     Some other operations don't explicitly do the check, but VFS calls
>     ->permission() which checks this.
>
>     Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

and I'm not sure whether it was done intentionally or not. Maybe Miklos 
could shed some light on it...
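
The scenario described above (a process opens a file, changes its uid, then keeps using the already-open descriptor) can be sketched in userspace as follows. This is only an illustration under stated assumptions: write_then_truncate_as() is a hypothetical helper, the path and uid are supplied by the caller, and the sketch makes no claim about what any particular FUSE mount returns. Going by the explanation above, on a FUSE mount that restricts access to the mounting user one would expect write(2) to succeed and ftruncate(2) to fail with EACCES; on an ordinary local filesystem both should succeed. Note that setuid() is process-wide and, when called by root, cannot be undone.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` with the current credentials, switch
 * to `uid`, then write and ftruncate through the already-open descriptor.
 * Returns 0 if every step succeeded, otherwise the errno of the first
 * step that failed (EIO for a short write).
 */
int write_then_truncate_as(const char *path, uid_t uid)
{
	ssize_t n;
	int fd;
	int err = 0;

	assert(path != NULL);

	fd = open(path, O_WRONLY | O_CREAT, 0600);
	if (fd < 0)
		return errno;

	if (setuid(uid) < 0) {
		err = errno;
		goto out;
	}

	/* write(2) relies on the permission check done at open time. */
	n = write(fd, "x", 1);
	if (n < 0) {
		err = errno;
		goto out;
	} else if (n != 1) {
		err = EIO;
		goto out;
	}

	/* ftruncate(2) reaches the filesystem's setattr path. */
	if (ftruncate(fd, 0) < 0)
		err = errno;
out:
	close(fd);
	return err;
}
```

Calling write_then_truncate_as("/fuse/mnt/file", other_uid) as root would exercise the case discussed here; a return value of EACCES would then point at the ftruncate step, since open, setuid and write are not expected to fail that way in this scenario.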

Thanks,
Maxim


* Re: [PATCH 5/6] fuse: truncate file if async dio failed
  2012-12-17 14:13     ` Maxim V. Patlasov
@ 2012-12-17 19:04       ` Brian Foster
  2012-12-18  8:12         ` Maxim V. Patlasov
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Foster @ 2012-12-17 19:04 UTC (permalink / raw)
  To: Maxim V. Patlasov; +Cc: miklos, dev, xemul, fuse-devel, linux-kernel, devel

On 12/17/2012 09:13 AM, Maxim V. Patlasov wrote:
> Hi,
> 
> 12/15/2012 12:16 AM, Brian Foster wrote:
>> On 12/14/2012 10:21 AM, Maxim V. Patlasov wrote:
...
>>> +
>> fuse_do_truncate() looks fairly close to fuse_do_setattr(). Is there any
>> reason we couldn't make fuse_do_setattr() non-static, change the dentry
>> parameter to an inode and use that?
> 
> fuse_do_setattr() performs extra checks that fuse_do_truncate() needn't.
> Some of them are harmless, some not: fuse_allow_task() may return 0 if
> task credentials changed. E.g. super-user successfully opened a file,
> then setuid(other_user_uid), then write(2) to the file. write(2) doesn't
> check uid, but fuse_do_truncate() - via fuse_allow_task() - does.
> 

Conversely, what about the extra error handling bits in
fuse_do_setattr() that do not appear in fuse_do_truncate() (i.e., the
inode mode check, the change attributes call, updating the inode size,
etc.)? It seems like we would want some of that code here.

fuse_setattr() is the only caller of fuse_do_setattr(), so why not embed
some of the initial checks (such as fuse_allow_task()) there? I suppose
we could pull out some of the error handling checks there as well if
they are considered harmful to this post-write error truncate situation.

FWIW, I just tested a quick change that pulls up the fuse_allow_task()
check (via instrumenting a write error) and it seems to work as
expected. I can forward a patch if interested...

Brian

> This non-POSIX behaviour (ftruncate(2) returning -1 with errno==EACCES)
> was introduced long time ago:
> 
>> commit e57ac68378a287d6336d187b26971f35f7ee7251
>> Author: Miklos Szeredi <mszeredi@suse.cz>
>> Date:   Thu Oct 18 03:06:58 2007 -0700
>>
>>     fuse: fix allowing operations
>>
>>     The following operation didn't check if sending the request was
>> allowed:
>>
>>       setattr
>>       listxattr
>>       statfs
>>
>>     Some other operations don't explicitly do the check, but VFS calls
>>     ->permission() which checks this.
>>
>>     Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
>>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> 
> and I'm not sure whether it was done intentionally or not. Maybe Miklos
> could shed some light on it...
> 
> Thanks,
> Maxim



* Re: [PATCH 5/6] fuse: truncate file if async dio failed
  2012-12-17 19:04       ` Brian Foster
@ 2012-12-18  8:12         ` Maxim V. Patlasov
  2013-04-17 20:42           ` Miklos Szeredi
  0 siblings, 1 reply; 19+ messages in thread
From: Maxim V. Patlasov @ 2012-12-18  8:12 UTC (permalink / raw)
  To: Brian Foster; +Cc: miklos, dev, xemul, fuse-devel, linux-kernel, devel

12/17/2012 11:04 PM, Brian Foster wrote:
> On 12/17/2012 09:13 AM, Maxim V. Patlasov wrote:
>> Hi,
>>
>> 12/15/2012 12:16 AM, Brian Foster wrote:
>>> On 12/14/2012 10:21 AM, Maxim V. Patlasov wrote:
> ...
>>>> +
>>> fuse_do_truncate() looks fairly close to fuse_do_setattr(). Is there any
>>> reason we couldn't make fuse_do_setattr() non-static, change the dentry
>>> parameter to an inode and use that?
>> fuse_do_setattr() performs extra checks that fuse_do_truncate() needn't.
>> Some of them are harmless, some not: fuse_allow_task() may return 0 if
>> task credentials changed. E.g. super-user successfully opened a file,
>> then setuid(other_user_uid), then write(2) to the file. write(2) doesn't
>> check uid, but fuse_do_truncate() - via fuse_allow_task() - does.
>>
> Conversely, what about the extra error handling bits in
> fuse_do_setattr() that do not appear in fuse_do_truncate() (i.e., the
> inode mode check, the change attributes call, updating the inode size,
> etc.)? It seems like we would want some of that code here.

Yes, they won't do any harm.

>
> fuse_setattr() is the only caller of fuse_do_setattr(), so why not embed
> some of the initial checks (such as fuse_allow_task()) there? I suppose
> we could pull out some of the error handling checks there as well if
> they are considered harmful to this post-write error truncate situation.

Makes sense. I especially like it because it avoids duplicating the code 
that handles the FUSE_SETATTR request.

> FWIW, I just tested a quick change that pulls up the fuse_allow_task()
> check (via instrumenting a write error) and it seems to work as
> expected. I can forward a patch if interested...

I did exactly the same before sending the previous email :) In my tests it 
works as expected too (modulo fuse_allow_task(), which we can move up). 
I'll re-send a corrected patch soon.

Thanks,
Maxim


* [PATCH] fuse: truncate file if async dio failed - v2
  2012-12-14 15:21 ` [PATCH 5/6] fuse: truncate file if async dio failed Maxim V. Patlasov
  2012-12-14 20:16   ` Brian Foster
@ 2012-12-18 10:05   ` Maxim V. Patlasov
  1 sibling, 0 replies; 19+ messages in thread
From: Maxim V. Patlasov @ 2012-12-18 10:05 UTC (permalink / raw)
  To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

The patch improves error handling in fuse_direct_IO(): if we successfully
submitted several fuse requests on behalf of a synchronous direct write that
extends the file, and some of them failed, do our best to clean up by
truncating the file back to its previous size.

Changed in v2: reuse fuse_do_setattr(). Thanks to Brian for the suggestion.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/dir.c    |   17 +++++++++--------
 fs/fuse/file.c   |   27 +++++++++++++++++++++++++--
 fs/fuse/fuse_i.h |    3 +++
 3 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 20b52a5..049d4c2 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1532,10 +1532,9 @@ void fuse_release_nowrite(struct inode *inode)
  * vmtruncate() doesn't allow for this case, so do the rlimit checking
  * and the actual truncation by hand.
  */
-static int fuse_do_setattr(struct dentry *entry, struct iattr *attr,
-			   struct file *file)
+int fuse_do_setattr(struct inode *inode, struct iattr *attr,
+		    struct file *file)
 {
-	struct inode *inode = entry->d_inode;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_req *req;
 	struct fuse_setattr_in inarg;
@@ -1544,9 +1543,6 @@ static int fuse_do_setattr(struct dentry *entry, struct iattr *attr,
 	loff_t oldsize;
 	int err;
 
-	if (!fuse_allow_task(fc, current))
-		return -EACCES;
-
 	if (!(fc->flags & FUSE_DEFAULT_PERMISSIONS))
 		attr->ia_valid |= ATTR_FORCE;
 
@@ -1641,10 +1637,15 @@ error:
 
 static int fuse_setattr(struct dentry *entry, struct iattr *attr)
 {
+	struct inode *inode = entry->d_inode;
+
+	if (!fuse_allow_task(get_fuse_conn(inode), current))
+		return -EACCES;
+
 	if (attr->ia_valid & ATTR_FILE)
-		return fuse_do_setattr(entry, attr, attr->ia_file);
+		return fuse_do_setattr(inode, attr, attr->ia_file);
 	else
-		return fuse_do_setattr(entry, attr, NULL);
+		return fuse_do_setattr(inode, attr, NULL);
 }
 
 static int fuse_getattr(struct vfsmount *mnt, struct dentry *entry,
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 05eed23..d9a0568 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2340,6 +2340,25 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 	return 0;
 }
 
+static void fuse_do_truncate(struct file *file)
+{
+	struct inode *inode = file->f_mapping->host;
+	struct iattr attr;
+	int err;
+
+	attr.ia_valid = ATTR_SIZE;
+	attr.ia_size = i_size_read(inode);
+
+	attr.ia_file = file;
+	attr.ia_valid |= ATTR_FILE;
+
+	err = fuse_do_setattr(inode, &attr, file);
+
+	if (err)
+		printk(KERN_WARNING "failed to truncate to %lld with error "
+		       "%d\n", i_size_read(inode), err);
+}
+
 static ssize_t
 fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 			loff_t offset, unsigned long nr_segs)
@@ -2400,8 +2419,12 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 		kfree(io);
 	}
 
-	if (rw == WRITE && ret > 0)
-		fuse_write_update_size(inode, pos);
+	if (rw == WRITE) {
+		if (ret > 0)
+			fuse_write_update_size(inode, pos);
+		else if (ret < 0 && offset + count > i_size)
+			fuse_do_truncate(file);
+	}
 
 	return ret;
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 91b5192..d4f7f07 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -840,4 +840,7 @@ int fuse_dev_release(struct inode *inode, struct file *file);
 
 void fuse_write_update_size(struct inode *inode, loff_t pos);
 
+int fuse_do_setattr(struct inode *inode, struct iattr *attr,
+		    struct file *file);
+
 #endif /* _FS_FUSE_I_H */



* Re: [PATCH v2 0/6] fuse: process direct IO asynchronously
  2012-12-14 15:20 [PATCH v2 0/6] fuse: process direct IO asynchronously Maxim V. Patlasov
                   ` (5 preceding siblings ...)
  2012-12-14 15:21 ` [PATCH 6/6] fuse: optimize short direct reads Maxim V. Patlasov
@ 2012-12-18 14:14 ` Brian Foster
  2013-04-11 11:22 ` [fuse-devel] " Maxim V. Patlasov
  7 siblings, 0 replies; 19+ messages in thread
From: Brian Foster @ 2012-12-18 14:14 UTC (permalink / raw)
  To: Maxim V. Patlasov; +Cc: miklos, dev, xemul, fuse-devel, linux-kernel, devel

On 12/14/2012 10:20 AM, Maxim V. Patlasov wrote:
> Hi,
> 
...
> The throughput on some commodity (rather feeble) server was (in MB/sec):
> 
>              original / patched
> 
> dd reads:    ~322     / ~382
> dd writes:   ~277     / ~288
> 
> aio reads:   ~380     / ~459
> aio writes:  ~319     / ~353
> 
> Changed in v2 - cleanups suggested by Brian:
>  - Updated fuse_io_priv with an async field and file pointer to preserve
>    the current style of interface (i.e., use this instead of iocb).
>  - Trigger the type of request submission based on the async field.
>  - Pulled up the fuse_write_update_size() call out of __fuse_direct_write()
>    to make the separate paths more consistent.
> 

This version plus the updated "fuse: truncate file if async dio failed
- v2" patch address all the questions I had on the set, so consider it:

Reviewed-by: Brian Foster <bfoster@redhat.com>

I also ran some of your aio/dio performance tests on a basic gluster
volume (single client to server) and reproduced the positive results. The
results include rewrite numbers (file-extending writes generally matched
the original throughput). Results in MB/s:

		original / patched
1GigE
dd reads:	~74	/	~104
dd rewrites:	~67	/	~103
aio reads:	~53	/	~110
aio rewrites:	~52	/	~112

10GigE
dd reads:	~175	/	~437
dd rewrites:	~134	/	~390
aio reads:	~84	/	~417
aio rewrites:	~88	/	~401

Brian

> Thanks,
> Maxim
> 
> ---
> 
> Maxim V. Patlasov (6):
>       fuse: move fuse_release_user_pages() up
>       fuse: add support of async IO
>       fuse: make fuse_direct_io() aware about AIO
>       fuse: enable asynchronous processing direct IO
>       fuse: truncate file if async dio failed
>       fuse: optimize short direct reads
> 
> 
>  fs/fuse/cuse.c   |    6 +
>  fs/fuse/file.c   |  290 +++++++++++++++++++++++++++++++++++++++++++++++-------
>  fs/fuse/fuse_i.h |   19 +++-
>  3 files changed, 276 insertions(+), 39 deletions(-)
> 



* Re: [fuse-devel] [PATCH v2 0/6] fuse: process direct IO asynchronously
  2012-12-14 15:20 [PATCH v2 0/6] fuse: process direct IO asynchronously Maxim V. Patlasov
                   ` (6 preceding siblings ...)
  2012-12-18 14:14 ` [PATCH v2 0/6] fuse: process direct IO asynchronously Brian Foster
@ 2013-04-11 11:22 ` Maxim V. Patlasov
  2013-04-11 16:07   ` Miklos Szeredi
  7 siblings, 1 reply; 19+ messages in thread
From: Maxim V. Patlasov @ 2013-04-11 11:22 UTC (permalink / raw)
  To: miklos; +Cc: dev, xemul, fuse-devel, linux-kernel, devel

Hi Miklos,

Any feedback would be highly appreciated.

Thanks,
Maxim

12/14/2012 07:20 PM, Maxim V. Patlasov wrote:
> Hi,
>
> Existing fuse implementation always processes direct IO synchronously: it
> submits next request to userspace fuse only when previous is completed. This
> is suboptimal because: 1) libaio DIO works in blocking way; 2) userspace fuse
> can't achieve parallelism  processing several requests simultaneously (e.g.
> in case of distributed network storage); 3) userspace fuse can't merge
> requests before passing it to actual storage.
>
> The idea of the patch-set is to submit fuse requests in non-blocking way
> (where it's possible) and either return -EIOCBQUEUED or wait for their
> completion synchronously. The patch-set to be applied on top of for-next of
> Miklos' git repo.
>
> To estimate performance improvement I used slightly modified fusexmp over
> tmpfs (clearing O_DIRECT bit from fi->flags in xmp_open). For synchronous
> operations I used 'dd' like this:
>
> dd of=/dev/null if=/fuse/mnt/file bs=2M count=256 iflag=direct
> dd if=/dev/zero of=/fuse/mnt/file bs=2M count=256 oflag=direct conv=notrunc
>
> For AIO I used 'aio-stress' like this:
>
> aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 1 /fuse/mnt/file
> aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 0 /fuse/mnt/file
>
> The throughput on some commodity (rather feeble) server was (in MB/sec):
>
>               original / patched
>
> dd reads:    ~322     / ~382
> dd writes:   ~277     / ~288
>
> aio reads:   ~380     / ~459
> aio writes:  ~319     / ~353
>
> Changed in v2 - cleanups suggested by Brian:
>   - Updated fuse_io_priv with an async field and file pointer to preserve
>     the current style of interface (i.e., use this instead of iocb).
>   - Trigger the type of request submission based on the async field.
>   - Pulled up the fuse_write_update_size() call out of __fuse_direct_write()
>     to make the separate paths more consistent.
>
> Thanks,
> Maxim
>
> ---
>
> Maxim V. Patlasov (6):
>        fuse: move fuse_release_user_pages() up
>        fuse: add support of async IO
>        fuse: make fuse_direct_io() aware about AIO
>        fuse: enable asynchronous processing direct IO
>        fuse: truncate file if async dio failed
>        fuse: optimize short direct reads
>
>
>   fs/fuse/cuse.c   |    6 +
>   fs/fuse/file.c   |  290 +++++++++++++++++++++++++++++++++++++++++++++++-------
>   fs/fuse/fuse_i.h |   19 +++-
>   3 files changed, 276 insertions(+), 39 deletions(-)
>



* Re: [fuse-devel] [PATCH v2 0/6] fuse: process direct IO asynchronously
  2013-04-11 11:22 ` [fuse-devel] " Maxim V. Patlasov
@ 2013-04-11 16:07   ` Miklos Szeredi
  2013-04-11 16:43     ` Maxim V. Patlasov
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2013-04-11 16:07 UTC (permalink / raw)
  To: Maxim V. Patlasov; +Cc: dev, xemul, fuse-devel, linux-kernel, devel

Hi Maxim,

On Thu, Apr 11, 2013 at 1:22 PM, Maxim V. Patlasov
<mpatlasov@parallels.com> wrote:
> Hi Miklos,
>
> Any feedback would be highly appreciated.

What is the order of all these patchsets with regards to each other?

Thanks,
Miklos


* Re: [fuse-devel] [PATCH v2 0/6] fuse: process direct IO asynchronously
  2013-04-11 16:07   ` Miklos Szeredi
@ 2013-04-11 16:43     ` Maxim V. Patlasov
  0 siblings, 0 replies; 19+ messages in thread
From: Maxim V. Patlasov @ 2013-04-11 16:43 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: dev, xemul, fuse-devel, linux-kernel, devel

Hi,

04/11/2013 08:07 PM, Miklos Szeredi wrote:
> Hi Maxim,
>
> On Thu, Apr 11, 2013 at 1:22 PM, Maxim V. Patlasov
> <mpatlasov@parallels.com> wrote:
>> Hi Miklos,
>>
>> Any feedback would be highly appreciated.
> What is the order of all these patchsets with regards to each other?

They are logically independent, so I prepared each of them to apply without 
the others. There might be some minor conflicts between them (if you try to 
apply one patch-set on top of another). So, as soon as you take one of them 
into fuse-next, I'll update the others to apply cleanly. Alternatively, we 
can settle on an order now, and I'll do it in advance.

Thanks,
Maxim


* Re: [PATCH 5/6] fuse: truncate file if async dio failed
  2012-12-18  8:12         ` Maxim V. Patlasov
@ 2013-04-17 20:42           ` Miklos Szeredi
  0 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2013-04-17 20:42 UTC (permalink / raw)
  To: Maxim V. Patlasov
  Cc: Brian Foster, Kirill Korotaev, Pavel Emelianov, fuse-devel,
	Kernel Mailing List, devel

On Tue, Dec 18, 2012 at 9:12 AM, Maxim V. Patlasov
<mpatlasov@parallels.com> wrote:
> I did exactly the same before sending previous email :) In my tests it works
> as expected too (modulo fuse_allow_task() that we can move up). I'll re-send
> corrected patch soon.

This patch was not yet re-sent.  I applied the rest of them in this
series (with small changes) and pushed to for-next.

Thanks,
Miklos


* Re: [PATCH 2/6] fuse: add support of async IO
  2012-12-14 15:20 ` [PATCH 2/6] fuse: add support of async IO Maxim V. Patlasov
@ 2013-04-22 16:34   ` Miklos Szeredi
  2013-04-23 12:21     ` Maxim V. Patlasov
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2013-04-22 16:34 UTC (permalink / raw)
  To: Maxim V. Patlasov; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

On Fri, Dec 14, 2012 at 07:20:41PM +0400, Maxim V. Patlasov wrote:
> The patch implements a framework to process an IO request asynchronously. The
> idea is to associate several fuse requests with a single kiocb by means of
> fuse_io_priv structure. The structure plays the same role for FUSE as 'struct
> dio' for direct-io.c.
> 
> The framework is supposed to be used like this:
>  - someone (who wants to process an IO asynchronously) allocates fuse_io_priv
>    and initializes it setting 'async' field to non-zero value.
>  - as soon as fuse request is filled, it can be submitted (in non-blocking way)
>    by fuse_async_req_send()
>  - when all submitted requests are ACKed by userspace, io->reqs drops to zero
>    triggering aio_complete()
> 
> In case of IO initiated by libaio, aio_complete() will finish processing the
> same way as in case of dio_complete() calling aio_complete(). But the
> framework may be also used for internal FUSE use when initial IO request
> was synchronous (from user perspective), but it's beneficial to process it
> asynchronously. Then the caller should wait on kiocb explicitly and
> aio_complete() will wake the caller up.
> 
> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
> ---
>  fs/fuse/file.c   |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/fuse/fuse_i.h |   17 ++++++++++
>  2 files changed, 109 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 6685cb0..8dd931f 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -503,6 +503,98 @@ static void fuse_release_user_pages(struct fuse_req *req, int write)
>  	}
>  }
>  
> +/**
> + * In case of short read, the caller sets 'pos' to the position of
> + * actual end of fuse request in IO request. Otherwise, if bytes_requested
> + * == bytes_transferred or rw == WRITE, the caller sets 'pos' to -1.
> + *
> + * An example:
> + * User requested DIO read of 64K. It was splitted into two 32K fuse requests,
> + * both submitted asynchronously. The first of them was ACKed by userspace as
> + * fully completed (req->out.args[0].size == 32K) resulting in pos == -1. The
> + * second request was ACKed as short, e.g. only 1K was read, resulting in
> + * pos == 33K.
> + *
> + * Thus, when all fuse requests are completed, the minimal non-negative 'pos'
> + * will be equal to the length of the longest contiguous fragment of
> + * transferred data starting from the beginning of IO request.
> + */
> +static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
> +{
> +	int left;
> +
> +	spin_lock(&io->lock);
> +	if (err)
> +		io->err = io->err ? : err;
> +	else if (pos >= 0 && (io->bytes < 0 || pos < io->bytes))
> +		io->bytes = pos;
> +
> +	left = --io->reqs;
> +	spin_unlock(&io->lock);
> +
> +	if (!left) {
> +		long res;
> +
> +		if (io->err)
> +			res = io->err;
> +		else if (io->bytes >= 0 && io->write)
> +			res = -EIO;
> +		else {
> +			res = io->bytes < 0 ? io->size : io->bytes;
> +
> +			if (!is_sync_kiocb(io->iocb)) {
> +				struct path *path = &io->iocb->ki_filp->f_path;
> +				struct inode *inode = path->dentry->d_inode;
> +				struct fuse_conn *fc = get_fuse_conn(inode);
> +				struct fuse_inode *fi = get_fuse_inode(inode);
> +
> +				spin_lock(&fc->lock);
> +				fi->attr_version = ++fc->attr_version;
> +				spin_unlock(&fc->lock);

Hmm, what is this?  Incrementing the attr version without setting any attributes
doesn't make sense.

Thanks,
Miklos


> +			}
> +		}
> +
> +		aio_complete(io->iocb, res, 0);
> +		kfree(io);
> +	}
> +}
> +
> +static void fuse_aio_complete_req(struct fuse_conn *fc, struct fuse_req *req)
> +{
> +	struct fuse_io_priv *io = req->io;
> +	ssize_t pos = -1;
> +
> +	fuse_release_user_pages(req, !io->write);
> +
> +	if (io->write) {
> +		if (req->misc.write.in.size != req->misc.write.out.size)
> +			pos = req->misc.write.in.offset - io->offset +
> +				req->misc.write.out.size;
> +	} else {
> +		if (req->misc.read.in.size != req->out.args[0].size)
> +			pos = req->misc.read.in.offset - io->offset +
> +				req->out.args[0].size;
> +	}
> +
> +	fuse_aio_complete(io, req->out.h.error, pos);
> +}
> +
> +static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req,
> +		size_t num_bytes, struct fuse_io_priv *io)
> +{
> +	spin_lock(&io->lock);
> +	io->size += num_bytes;
> +	io->reqs++;
> +	spin_unlock(&io->lock);
> +
> +	req->io = io;
> +	req->end = fuse_aio_complete_req;
> +
> +	fuse_request_send_background(fc, req);
> +
> +	return num_bytes;
> +}
> +
>  static size_t fuse_send_read(struct fuse_req *req, struct file *file,
>  			     loff_t pos, size_t count, fl_owner_t owner)
>  {
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index e4f70ea..e0a5b65 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -219,6 +219,20 @@ enum fuse_req_state {
>  	FUSE_REQ_FINISHED
>  };
>  
> +/** The request IO state (for asynchronous processing) */
> +struct fuse_io_priv {
> +	int async;
> +	spinlock_t lock;
> +	unsigned reqs;
> +	ssize_t bytes;
> +	size_t size;
> +	__u64 offset;
> +	bool write;
> +	int err;
> +	struct kiocb *iocb;
> +	struct file *file;
> +};
> +
>  /**
>   * A request to the client
>   */
> @@ -323,6 +337,9 @@ struct fuse_req {
>  	/** Inode used in the request or NULL */
>  	struct inode *inode;
>  
> +	/** AIO control block */
> +	struct fuse_io_priv *io;
> +
>  	/** Link on fi->writepages */
>  	struct list_head writepages_entry;
>  
> 
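
The completion accounting in the quoted patch (several fuse requests attached to one fuse_io_priv, with the final result derived from the smallest short-transfer position) can be modelled in userspace. The sketch below is an illustration only: struct io_model and the io_model_*() functions are hypothetical names, locking and the kiocb are left out, and the initial reference held by the submitter (reqs starting at 1, dropped with pos == -1 once everything is submitted) is an assumption inferred from the fuse_aio_complete(io, ..., -1) call in fuse_direct_IO().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the bookkeeping fields of struct fuse_io_priv. */
struct io_model {
	unsigned reqs;	/* references: in-flight requests + submitter */
	ssize_t bytes;	/* smallest short-transfer position, or -1 */
	size_t size;	/* total bytes submitted so far */
	bool write;	/* true for a write, false for a read */
	int err;	/* first error reported by any request */
	bool done;	/* set once the last reference is dropped */
	long res;	/* final result, valid only when done */
};

void io_model_init(struct io_model *io, bool write)
{
	io->reqs = 1;	/* assumed: the submitter holds one reference */
	io->bytes = -1;
	io->size = 0;
	io->write = write;
	io->err = 0;
	io->done = false;
	io->res = 0;
}

/* Models fuse_async_req_send(): account for one submitted request. */
void io_model_submit(struct io_model *io, size_t num_bytes)
{
	assert(!io->done);
	io->size += num_bytes;
	io->reqs++;
}

/* Models fuse_aio_complete(): fold in one completion, finish on the last. */
void io_model_complete(struct io_model *io, int err, ssize_t pos)
{
	assert(io->reqs > 0);

	if (err) {
		if (!io->err)
			io->err = err;	/* keep the first error only */
	} else if (pos >= 0 && (io->bytes < 0 || pos < io->bytes)) {
		io->bytes = pos;	/* track the minimal non-negative pos */
	}

	if (--io->reqs)
		return;

	if (io->err)
		io->res = io->err;
	else if (io->bytes >= 0 && io->write)
		io->res = -EIO;		/* a short write is reported as an error */
	else
		io->res = io->bytes < 0 ? (long)io->size : (long)io->bytes;
	io->done = true;
}

/* The 64K read from the patch comment: two 32K requests, the second short. */
long example_short_read(void)
{
	struct io_model io;

	io_model_init(&io, false);
	io_model_submit(&io, 32 * 1024);
	io_model_submit(&io, 32 * 1024);
	io_model_complete(&io, 0, -1);		/* submitter drops its reference */
	io_model_complete(&io, 0, 33 * 1024);	/* second request: only 1K read */
	io_model_complete(&io, 0, -1);		/* first request: fully completed */
	return io.res;
}
```

In this model the completion order does not matter: the result is the length of the contiguous prefix that was actually transferred, which for the example in the patch comment is 33K.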


* Re: [PATCH 2/6] fuse: add support of async IO
  2013-04-22 16:34   ` Miklos Szeredi
@ 2013-04-23 12:21     ` Maxim V. Patlasov
  0 siblings, 0 replies; 19+ messages in thread
From: Maxim V. Patlasov @ 2013-04-23 12:21 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

Hi Miklos,

04/22/2013 08:34 PM, Miklos Szeredi wrote:
> On Fri, Dec 14, 2012 at 07:20:41PM +0400, Maxim V. Patlasov wrote:
>> The patch implements a framework to process an IO request asynchronously. The
>> idea is to associate several fuse requests with a single kiocb by means of
>> fuse_io_priv structure. The structure plays the same role for FUSE as 'struct
>> dio' for direct-io.c.
>>
>> The framework is supposed to be used like this:
>>   - someone (who wants to process an IO asynchronously) allocates fuse_io_priv
>>     and initializes it setting 'async' field to non-zero value.
>>   - as soon as fuse request is filled, it can be submitted (in non-blocking way)
>>     by fuse_async_req_send()
>>   - when all submitted requests are ACKed by userspace, io->reqs drops to zero
>>     triggering aio_complete()
>>
>> In case of IO initiated by libaio, aio_complete() will finish processing the
>> same way as in case of dio_complete() calling aio_complete(). But the
>> framework may be also used for internal FUSE use when initial IO request
>> was synchronous (from user perspective), but it's beneficial to process it
>> asynchronously. Then the caller should wait on kiocb explicitly and
>> aio_complete() will wake the caller up.
>>
>> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
>> ---
>>   fs/fuse/file.c   |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/fuse/fuse_i.h |   17 ++++++++++
>>   2 files changed, 109 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index 6685cb0..8dd931f 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -503,6 +503,98 @@ static void fuse_release_user_pages(struct fuse_req *req, int write)
>>   	}
>>   }
>>   
>> +/**
>> + * In case of short read, the caller sets 'pos' to the position of
>> + * actual end of fuse request in IO request. Otherwise, if bytes_requested
>> + * == bytes_transferred or rw == WRITE, the caller sets 'pos' to -1.
>> + *
>> + * An example:
>> + * User requested DIO read of 64K. It was split into two 32K fuse requests,
>> + * both submitted asynchronously. The first of them was ACKed by userspace as
>> + * fully completed (req->out.args[0].size == 32K) resulting in pos == -1. The
>> + * second request was ACKed as short, e.g. only 1K was read, resulting in
>> + * pos == 33K.
>> + *
>> + * Thus, when all fuse requests are completed, the minimal non-negative 'pos'
>> + * will be equal to the length of the longest contiguous fragment of
>> + * transferred data starting from the beginning of IO request.
>> + */
>> +static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
>> +{
>> +	int left;
>> +
>> +	spin_lock(&io->lock);
>> +	if (err)
>> +		io->err = io->err ? : err;
>> +	else if (pos >= 0 && (io->bytes < 0 || pos < io->bytes))
>> +		io->bytes = pos;
>> +
>> +	left = --io->reqs;
>> +	spin_unlock(&io->lock);
>> +
>> +	if (!left) {
>> +		long res;
>> +
>> +		if (io->err)
>> +			res = io->err;
>> +		else if (io->bytes >= 0 && io->write)
>> +			res = -EIO;
>> +		else {
>> +			res = io->bytes < 0 ? io->size : io->bytes;
>> +
>> +			if (!is_sync_kiocb(io->iocb)) {
>> +				struct path *path = &io->iocb->ki_filp->f_path;
>> +				struct inode *inode = path->dentry->d_inode;
>> +				struct fuse_conn *fc = get_fuse_conn(inode);
>> +				struct fuse_inode *fi = get_fuse_inode(inode);
>> +
>> +				spin_lock(&fc->lock);
>> +				fi->attr_version = ++fc->attr_version;
>> +				spin_unlock(&fc->lock);
> Hmm, what is this?  Incrementing the attr version without setting any attributes
> doesn't make sense.

It makes sense at least for writes. __fuse_direct_write() always called
fuse_write_update_size(), and the latter always incremented attr_version,
even if *ppos <= inode->i_size. I believed this was implemented that way
intentionally: if a write succeeded, the file has changed on the server,
hence attrs requested from the server earlier should be regarded as stale.

With async IO support added to fuse, a case emerges where
fuse_write_update_size() won't be called: the incoming direct IO write is
asynchronous (e.g. it came from libaio) and it is not an extending write,
so it's permissible to process it by submitting fuse requests to the
background and returning -EIOCBQUEUED without waiting for completions
(see the 4th patch of this patch-set). But in this case the file on the
server will be changed anyway. That's why I bump attr_version in
fuse_aio_complete(): to be consistent with the model we had before this
patch-set.
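
To make the staleness argument concrete, here is a minimal user-space
sketch (not kernel code). All model_* names are hypothetical stand-ins.
It assumes the connection-wide counter is sampled when an attribute
request is sent, and that a reply is dropped if the inode's attr_version
has moved past that sample in the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for struct fuse_conn / fuse_inode. */
struct model_conn  { uint64_t attr_version; };
struct model_inode { uint64_t attr_version; uint64_t size; };

/* Sample the connection-wide counter when an attribute request is sent. */
static uint64_t model_sample_version(const struct model_conn *fc)
{
	return fc->attr_version;
}

/* What a completed write does: fi->attr_version = ++fc->attr_version. */
static void model_bump_version(struct model_conn *fc, struct model_inode *fi)
{
	/* Invariant of the model: an inode never runs ahead of the counter. */
	assert(fi->attr_version <= fc->attr_version);
	fi->attr_version = ++fc->attr_version;
}

/*
 * Apply the size carried by an attribute reply. Returns false (reply
 * dropped as stale) if the inode was modified after 'sampled' was taken.
 */
static bool model_apply_size(struct model_inode *fi, uint64_t sampled,
			     uint64_t size)
{
	if (sampled != 0 && fi->attr_version > sampled)
		return false;
	fi->size = size;
	return true;
}

/* A write completes while an attribute request is still in flight. */
static bool model_reply_survives_write(bool bump_on_write)
{
	struct model_conn fc = { .attr_version = 1 };
	struct model_inode fi = { .attr_version = 0, .size = 4096 };
	uint64_t sampled = model_sample_version(&fc);	/* request sent */

	if (bump_on_write)
		model_bump_version(&fc, &fi);		/* write completed */

	/* The late reply carries a size fetched before the write. */
	return model_apply_size(&fi, sampled, 1024);
}
```

In this model, model_reply_survives_write(true) returns false: the late
reply is discarded. Without the bump the same reply would be applied,
which is the situation an async non-extending write would create.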

The fact that I did the trick for both writes and reads was probably an
oversight. I'd suggest fixing it like this:

> -			if (!is_sync_kiocb(io->iocb)) {
> +			if (!is_sync_kiocb(io->iocb) && io->write) {
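
For reference, the completion logic with this change can be modelled in
user space roughly as below. This is a sketch only: struct model_io and
the model_* functions are hypothetical, locking is omitted, 'async'
stands in for !is_sync_kiocb(io->iocb), and 'bytes' is assumed to start
at -1.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct fuse_io_priv. */
struct model_io {
	int reqs;		/* outstanding fuse requests */
	long bytes;		/* minimal non-negative 'pos' so far, or -1 */
	long size;		/* total size of the IO request */
	int err;		/* first error reported, if any */
	bool write;
	bool async;		/* stands in for !is_sync_kiocb(io->iocb) */
	long res;		/* final result, valid once reqs == 0 */
	unsigned attr_bumps;	/* how many times attr_version was bumped */
};

/* Mirrors the structure of fuse_aio_complete(), without the spinlock. */
static void model_aio_complete(struct model_io *io, int err, long pos)
{
	assert(io->reqs > 0);

	if (err)
		io->err = io->err ? io->err : err;
	else if (pos >= 0 && (io->bytes < 0 || pos < io->bytes))
		io->bytes = pos;

	if (--io->reqs)
		return;		/* other requests are still in flight */

	if (io->err) {
		io->res = io->err;
	} else if (io->bytes >= 0 && io->write) {
		io->res = -EIO;	/* short write */
	} else {
		io->res = io->bytes < 0 ? io->size : io->bytes;
		/* Proposed fix: bump only for asynchronous writes. */
		if (io->async && io->write)
			io->attr_bumps++;
	}
}

/*
 * 64K async IO split into two 32K requests. The first completes fully
 * (pos == -1); the second reports 'pos_second'.
 */
static long model_run(bool write, long pos_second, unsigned *bumps)
{
	struct model_io io = {
		.reqs = 2, .bytes = -1, .size = 64 * 1024,
		.write = write, .async = true,
	};

	model_aio_complete(&io, 0, -1);
	model_aio_complete(&io, 0, pos_second);
	*bumps = io.attr_bumps;
	return io.res;
}
```

With the example from the comment in the patch (second read request
short, pos == 33K), model_run(false, 33 * 1024, &bumps) yields 33K and
no bump. A fully completed async write yields 64K and exactly one bump.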

Thanks,
Maxim

end of thread, other threads:[~2013-04-23 12:23 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-14 15:20 [PATCH v2 0/6] fuse: process direct IO asynchronously Maxim V. Patlasov
2012-12-14 15:20 ` [PATCH 1/6] fuse: move fuse_release_user_pages() up Maxim V. Patlasov
2012-12-14 15:20 ` [PATCH 2/6] fuse: add support of async IO Maxim V. Patlasov
2013-04-22 16:34   ` Miklos Szeredi
2013-04-23 12:21     ` Maxim V. Patlasov
2012-12-14 15:20 ` [PATCH 3/6] fuse: make fuse_direct_io() aware about AIO Maxim V. Patlasov
2012-12-14 15:21 ` [PATCH 4/6] fuse: enable asynchronous processing direct IO Maxim V. Patlasov
2012-12-14 15:21 ` [PATCH 5/6] fuse: truncate file if async dio failed Maxim V. Patlasov
2012-12-14 20:16   ` Brian Foster
2012-12-17 14:13     ` Maxim V. Patlasov
2012-12-17 19:04       ` Brian Foster
2012-12-18  8:12         ` Maxim V. Patlasov
2013-04-17 20:42           ` Miklos Szeredi
2012-12-18 10:05   ` [PATCH] fuse: truncate file if async dio failed - v2 Maxim V. Patlasov
2012-12-14 15:21 ` [PATCH 6/6] fuse: optimize short direct reads Maxim V. Patlasov
2012-12-18 14:14 ` [PATCH v2 0/6] fuse: process direct IO asynchronously Brian Foster
2013-04-11 11:22 ` [fuse-devel] " Maxim V. Patlasov
2013-04-11 16:07   ` Miklos Szeredi
2013-04-11 16:43     ` Maxim V. Patlasov
