linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/1] FUSE: Parallel direct writes on the same file
@ 2022-04-08  6:18 Dharmendra Singh
  2022-04-08  6:18 ` [PATCH 1/1] FUSE: Allow parallel " Dharmendra Singh
  0 siblings, 1 reply; 7+ messages in thread
From: Dharmendra Singh @ 2022-04-08  6:18 UTC (permalink / raw)
  To: bschubert, miklos
  Cc: Dharmendra Singh, linux-fsdevel, fuse-devel, linux-kernel

Currently in FUSE, for direct writes, we hold the inode lock for the
full duration of the request. As a result, only one direct write
request can proceed on a given file at a time. I think it is kept
this way because many FUSE user-space implementations rely on this
serialization for cache/data integrity, which is fine. But it badly
hurts FUSE implementations which have their own mechanism for keeping
data/cache integrity and can handle parallel writes to the same file
or region.

This patch allows parallel writes to proceed on the same file by not
holding the inode lock for the whole duration of the request,
acquiring it only when certain inode fields need to be updated. The
default behaviour remains the same, i.e. one direct write at a time.

I carried out performance tests of these changes with
example/passthrough (part of libfuse), with the direct-io and
parallel_direct_writes flags set on the file.
Note that we disabled writes to the underlying file system from
passthrough, as we wanted to measure the gain for FUSE alone. Fio was
used to test the impact of these changes for file-per-job and
single-shared-file (SSF) workloads.
CPU binding was applied to the passthrough process only.

Job file for SSF:
[global]
directory=/tmp/dest
filename=ssf
size=100g
blocksize=1m
ioengine=sync
group_reporting=1
fallocate=none
runtime=60
stonewall

[write]
rw=randwrite:256
rw_sequencer=sequential
fsync_on_close=1


Job file for file-per-job:
[sequential-write]
rw=write
size=100G
directory=/tmp/dest/
group_reporting
name=sequential-write-direct
bs=1M
runtime=60


RESULT:

Unpatched:-

Single shared file:
numjobs: 1  WRITE: bw=2679MiB/s (2809MB/s), 2679MiB/s-2679MiB/s (2809MB/s-2809MB/s), io=100GiB (107GB), run=38220-38220msec
numjobs: 2  WRITE: bw=4902MiB/s (5140MB/s), 4902MiB/s-4902MiB/s (5140MB/s-5140MB/s), io=200GiB (215GB), run=41778-41778msec
numjobs: 4  WRITE: bw=2756MiB/s (2890MB/s), 2756MiB/s-2756MiB/s (2890MB/s-2890MB/s), io=161GiB (173GB), run=60002-60002msec
numjobs: 8  WRITE: bw=4444MiB/s (4659MB/s), 4444MiB/s-4444MiB/s (4659MB/s-4659MB/s), io=260GiB (280GB), run=60003-60003msec
numjobs: 16  WRITE: bw=3045MiB/s (3192MB/s), 3045MiB/s-3045MiB/s (3192MB/s-3192MB/s), io=178GiB (192GB), run=60006-60006msec
numjobs: 32  WRITE: bw=2977MiB/s (3122MB/s), 2977MiB/s-2977MiB/s (3122MB/s-3122MB/s), io=174GiB (187GB), run=60014-60014msec

File per job:
numjobs: 1  WRITE: bw=3236MiB/s (3393MB/s), 3236MiB/s-3236MiB/s (3393MB/s-3393MB/s), io=100GiB (107GB), run=31647-31647msec
numjobs: 2  WRITE: bw=8087MiB/s (8480MB/s), 8087MiB/s-8087MiB/s (8480MB/s-8480MB/s), io=200GiB (215GB), run=25324-25324msec
numjobs: 4  WRITE: bw=13.8GiB/s (14.8GB/s), 13.8GiB/s-13.8GiB/s (14.8GB/s-14.8GB/s), io=400GiB (429GB), run=28951-28951msec
numjobs: 8  WRITE: bw=20.4GiB/s (21.9GB/s), 20.4GiB/s-20.4GiB/s (21.9GB/s-21.9GB/s), io=800GiB (859GB), run=39266-39266msec
numjobs: 16  WRITE: bw=24.4GiB/s (26.2GB/s), 24.4GiB/s-24.4GiB/s (26.2GB/s-26.2GB/s), io=1462GiB (1569GB), run=60001-60001msec
numjobs: 32  WRITE: bw=20.1GiB/s (21.6GB/s), 20.1GiB/s-20.1GiB/s (21.6GB/s-21.6GB/s), io=1205GiB (1294GB), run=60002-60002msec



Patched:-

Single shared file:
numjobs: 1  WRITE: bw=2674MiB/s (2804MB/s), 2674MiB/s-2674MiB/s (2804MB/s-2804MB/s), io=100GiB (107GB), run=38288-38288msec
numjobs: 2  WRITE: bw=7945MiB/s (8331MB/s), 7945MiB/s-7945MiB/s (8331MB/s-8331MB/s), io=200GiB (215GB), run=25777-25777msec
numjobs: 4  WRITE: bw=14.3GiB/s (15.4GB/s), 14.3GiB/s-14.3GiB/s (15.4GB/s-15.4GB/s), io=400GiB (429GB), run=27935-27935msec
numjobs: 8  WRITE: bw=22.5GiB/s (24.2GB/s), 22.5GiB/s-22.5GiB/s (24.2GB/s-24.2GB/s), io=800GiB (859GB), run=35566-35566msec
numjobs: 16  WRITE: bw=23.7GiB/s (25.5GB/s), 23.7GiB/s-23.7GiB/s (25.5GB/s-25.5GB/s), io=1423GiB (1528GB), run=60001-60001msec
numjobs: 32  WRITE: bw=20.5GiB/s (22.1GB/s), 20.5GiB/s-20.5GiB/s (22.1GB/s-22.1GB/s), io=1233GiB (1324GB), run=60002-60002msec


File per job:
numjobs: 1  WRITE: bw=3546MiB/s (3718MB/s), 3546MiB/s-3546MiB/s (3718MB/s-3718MB/s), io=100GiB (107GB), run=28878-28878msec
numjobs: 2  WRITE: bw=7899MiB/s (8283MB/s), 7899MiB/s-7899MiB/s (8283MB/s-8283MB/s), io=200GiB (215GB), run=25927-25927msec
numjobs: 4  WRITE: bw=14.0GiB/s (15.0GB/s), 14.0GiB/s-14.0GiB/s (15.0GB/s-15.0GB/s), io=400GiB (429GB), run=28548-28548msec
numjobs: 8  WRITE: bw=20.9GiB/s (22.4GB/s), 20.9GiB/s-20.9GiB/s (22.4GB/s-22.4GB/s), io=800GiB (859GB), run=38308-38308msec
numjobs: 16  WRITE: bw=23.2GiB/s (24.9GB/s), 23.2GiB/s-23.2GiB/s (24.9GB/s-24.9GB/s), io=1391GiB (1493GB), run=60001-60001msec
numjobs: 32  WRITE: bw=20.3GiB/s (21.8GB/s), 20.3GiB/s-20.3GiB/s (21.8GB/s-21.8GB/s), io=1218GiB (1308GB), run=60002-60002msec


SSF gain in percentage (computed from the total io= figures, as
several runs are capped by runtime=60):
For 1 fio thread: +0%
For 2 fio threads: +0%
For 4 fio threads: +148%
For 8 fio threads: +206.8%
For 16 fio threads: +695.8%
For 32 fio threads: +608%

As can be seen, the gain is huge. It also brings single-shared-file
performance in line with file-per-job.

Dharmendra Singh (1):
  Allow parallel direct writes on the same file.

 fs/fuse/file.c            | 38 ++++++++++++++++++++++++++++++++++----
 include/uapi/linux/fuse.h |  2 ++
 2 files changed, 36 insertions(+), 4 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/1] FUSE: Allow parallel direct writes on the same file
  2022-04-08  6:18 [PATCH 0/1] FUSE: Parallel direct writes on the same file Dharmendra Singh
@ 2022-04-08  6:18 ` Dharmendra Singh
  2022-04-21 15:22   ` Miklos Szeredi
  0 siblings, 1 reply; 7+ messages in thread
From: Dharmendra Singh @ 2022-04-08  6:18 UTC (permalink / raw)
  To: bschubert, miklos
  Cc: Dharmendra Singh, linux-fsdevel, fuse-devel, linux-kernel,
	Dharmendra Singh

As of now, in FUSE, direct writes to the same file are serialized
over the inode lock, i.e. we hold the inode lock for the whole
duration of the write request. This serialization works well for
FUSE user-space implementations which rely on the inode lock for
their cache/data integrity etc. But it badly hurts FUSE
implementations which have their own ways of maintaining data/cache
integrity and do not use this serialization at all.

This patch allows parallel direct writes on the same file with the
help of a flag called FOPEN_PARALLEL_WRITES. If this flag is set on
the file (the flag is passed from libfuse to the fuse kernel as part
of file open/create), we do not hold the inode lock for the whole
duration of the request, but instead acquire it only to protect
updates to certain fields of the inode. FUSE implementations which
rely on the inode lock can continue to do so, and this remains the
default behaviour.

Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
---
 fs/fuse/file.c            | 38 ++++++++++++++++++++++++++++++++++----
 include/uapi/linux/fuse.h |  2 ++
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 37eebfb90500..d3e8f44c1228 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1465,6 +1465,8 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 	int err = 0;
 	struct fuse_io_args *ia;
 	unsigned int max_pages;
+	bool p_write = write &&
+		(ff->open_flags & FOPEN_PARALLEL_WRITES) ? true : false;
 
 	max_pages = iov_iter_npages(iter, fc->max_pages);
 	ia = fuse_io_alloc(io, max_pages);
@@ -1472,10 +1474,11 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 		return -ENOMEM;
 
 	if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
-		if (!write)
+		/* Parallel write does not come with inode lock held */
+		if (!write || p_write)
 			inode_lock(inode);
 		fuse_sync_writes(inode);
-		if (!write)
+		if (!write || p_write)
 			inode_unlock(inode);
 	}
 
@@ -1568,22 +1571,36 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to)
 static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
+	struct file *file = iocb->ki_filp;
+	struct fuse_file *ff = file->private_data;
 	struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
 	ssize_t res;
+	bool p_write = ff->open_flags & FOPEN_PARALLEL_WRITES ? true : false;
+	bool unlock_inode = true;
 
 	/* Don't allow parallel writes to the same file */
 	inode_lock(inode);
 	res = generic_write_checks(iocb, from);
 	if (res > 0) {
+		/* Allow parallel writes on the inode by unlocking it */
+		if (p_write) {
+			inode_unlock(inode);
+			unlock_inode = false;
+		}
 		if (!is_sync_kiocb(iocb) && iocb->ki_flags & IOCB_DIRECT) {
 			res = fuse_direct_IO(iocb, from);
 		} else {
 			res = fuse_direct_io(&io, from, &iocb->ki_pos,
 					     FUSE_DIO_WRITE);
+			if (p_write) {
+				inode_lock(inode);
+				unlock_inode = true;
+			}
 			fuse_write_update_attr(inode, iocb->ki_pos, res);
 		}
 	}
-	inode_unlock(inode);
+	if (unlock_inode)
+		inode_unlock(inode);
 
 	return res;
 }
@@ -2850,10 +2867,16 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	size_t count = iov_iter_count(iter), shortened = 0;
 	loff_t offset = iocb->ki_pos;
 	struct fuse_io_priv *io;
-
+	bool p_write = (iov_iter_rw(iter) == WRITE &&
+			ff->open_flags & FOPEN_PARALLEL_WRITES);
 	pos = offset;
 	inode = file->f_mapping->host;
+
+	if (p_write)
+		inode_lock(inode);
 	i_size = i_size_read(inode);
+	if (p_write)
+		inode_unlock(inode);
 
 	if ((iov_iter_rw(iter) == READ) && (offset >= i_size))
 		return 0;
@@ -2924,9 +2947,16 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	kref_put(&io->refcnt, fuse_io_release);
 
 	if (iov_iter_rw(iter) == WRITE) {
+
+		if (p_write)
+			inode_lock(inode);
+
 		fuse_write_update_attr(inode, pos, ret);
 		if (ret < 0 && offset + count > i_size)
 			fuse_do_truncate(file);
+
+		if (p_write)
+			inode_unlock(inode);
 	}
 
 	return ret;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index a28dd60078ff..07f00dfeb0ce 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -301,6 +301,7 @@ struct fuse_file_lock {
  * FOPEN_CACHE_DIR: allow caching this directory
  * FOPEN_STREAM: the file is stream-like (no file position at all)
  * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
+ * FOPEN_PARALLEL_WRITES: Allow concurrent writes on the same inode
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
@@ -308,6 +309,7 @@ struct fuse_file_lock {
 #define FOPEN_CACHE_DIR		(1 << 3)
 #define FOPEN_STREAM		(1 << 4)
 #define FOPEN_NOFLUSH		(1 << 5)
+#define FOPEN_PARALLEL_WRITES	(1 << 6)
 
 /**
  * INIT request/reply flags
-- 
2.17.1



* Re: [PATCH 1/1] FUSE: Allow parallel direct writes on the same file
  2022-04-08  6:18 ` [PATCH 1/1] FUSE: Allow parallel " Dharmendra Singh
@ 2022-04-21 15:22   ` Miklos Szeredi
  2022-04-22 14:30     ` Dharmendra Hans
  0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2022-04-21 15:22 UTC (permalink / raw)
  To: Dharmendra Singh
  Cc: Bernd Schubert, linux-fsdevel, fuse-devel, linux-kernel,
	Dharmendra Singh

On Fri, 8 Apr 2022 at 08:18, Dharmendra Singh <dharamhans87@gmail.com> wrote:
>
> As of now, in FUSE, direct writes to the same file are serialized
> over the inode lock, i.e. we hold the inode lock for the whole
> duration of the write request. This serialization works well for
> FUSE user-space implementations which rely on the inode lock for
> their cache/data integrity etc. But it badly hurts FUSE
> implementations which have their own ways of maintaining data/cache
> integrity and do not use this serialization at all.
>
> This patch allows parallel direct writes on the same file with the
> help of a flag called FOPEN_PARALLEL_WRITES. If this flag is set on
> the file (the flag is passed from libfuse to the fuse kernel as part
> of file open/create), we do not hold the inode lock for the whole
> duration of the request, but instead acquire it only to protect
> updates to certain fields of the inode. FUSE implementations which
> rely on the inode lock can continue to do so, and this remains the
> default behaviour.
>
> Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
> ---
>  fs/fuse/file.c            | 38 ++++++++++++++++++++++++++++++++++----
>  include/uapi/linux/fuse.h |  2 ++
>  2 files changed, 36 insertions(+), 4 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 37eebfb90500..d3e8f44c1228 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1465,6 +1465,8 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
>         int err = 0;
>         struct fuse_io_args *ia;
>         unsigned int max_pages;
> +       bool p_write = write &&
> +               (ff->open_flags & FOPEN_PARALLEL_WRITES) ? true : false;
>
>         max_pages = iov_iter_npages(iter, fc->max_pages);
>         ia = fuse_io_alloc(io, max_pages);
> @@ -1472,10 +1474,11 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
>                 return -ENOMEM;
>
>         if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
> -               if (!write)
> +               /* Parallel write does not come with inode lock held */
> +               if (!write || p_write)

Probably would be good to add an inode_is_locked() assert in
fuse_sync_writes() to make sure we don't miss cases silently.

>                         inode_lock(inode);
>                 fuse_sync_writes(inode);
> -               if (!write)
> +               if (!write || p_write)
>                         inode_unlock(inode);
>         }
>
> @@ -1568,22 +1571,36 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  {
>         struct inode *inode = file_inode(iocb->ki_filp);
> +       struct file *file = iocb->ki_filp;
> +       struct fuse_file *ff = file->private_data;
>         struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
>         ssize_t res;
> +       bool p_write = ff->open_flags & FOPEN_PARALLEL_WRITES ? true : false;
> +       bool unlock_inode = true;
>
>         /* Don't allow parallel writes to the same file */
>         inode_lock(inode);
>         res = generic_write_checks(iocb, from);

I don't think this needs the inode lock.  At least
nfs_file_direct_write() doesn't have it.

What it does have, however, is taking the inode lock shared for the
actual write operation, which is probably something that fuse needs as
well.

Also I worry about size extending writes not holding the inode lock
exclusive.  Would that be a problem in your use case?

>         if (res > 0) {
> +               /* Allow parallel writes on the inode by unlocking it */
> +               if (p_write) {
> +                       inode_unlock(inode);
> +                       unlock_inode = false;
> +               }
>                 if (!is_sync_kiocb(iocb) && iocb->ki_flags & IOCB_DIRECT) {
>                         res = fuse_direct_IO(iocb, from);
>                 } else {
>                         res = fuse_direct_io(&io, from, &iocb->ki_pos,
>                                              FUSE_DIO_WRITE);
> +                       if (p_write) {
> +                               inode_lock(inode);
> +                               unlock_inode = true;
> +                       }
>                         fuse_write_update_attr(inode, iocb->ki_pos, res);

This doesn't need the inode lock either if the actual write wasn't locked.

>                 }
>         }
> -       inode_unlock(inode);
> +       if (unlock_inode)
> +               inode_unlock(inode);
>
>         return res;
>  }
> @@ -2850,10 +2867,16 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>         size_t count = iov_iter_count(iter), shortened = 0;
>         loff_t offset = iocb->ki_pos;
>         struct fuse_io_priv *io;
> -
> +       bool p_write = (iov_iter_rw(iter) == WRITE &&
> +                       ff->open_flags & FOPEN_PARALLEL_WRITES);
>         pos = offset;
>         inode = file->f_mapping->host;
> +
> +       if (p_write)
> +               inode_lock(inode);
>         i_size = i_size_read(inode);

Neither this.

> +       if (p_write)
> +               inode_unlock(inode);
>
>         if ((iov_iter_rw(iter) == READ) && (offset >= i_size))
>                 return 0;
> @@ -2924,9 +2947,16 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>         kref_put(&io->refcnt, fuse_io_release);
>
>         if (iov_iter_rw(iter) == WRITE) {
> +
> +               if (p_write)
> +                       inode_lock(inode);
> +
>                 fuse_write_update_attr(inode, pos, ret);
>                 if (ret < 0 && offset + count > i_size)
>                         fuse_do_truncate(file);
> +
> +               if (p_write)
> +                       inode_unlock(inode);

Truncation needs the inode lock, though I don't completely understand
why this truncation is needed.  But, for example, here it is assumed
that the file size won't change, which means that non-extending writes
should hold the inode lock shared and extending writes should hold it
exclusive to meet this assumption.

Thanks,
Miklos


* Re: [PATCH 1/1] FUSE: Allow parallel direct writes on the same file
  2022-04-21 15:22   ` Miklos Szeredi
@ 2022-04-22 14:30     ` Dharmendra Hans
  2022-04-22 14:48       ` Miklos Szeredi
  0 siblings, 1 reply; 7+ messages in thread
From: Dharmendra Hans @ 2022-04-22 14:30 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, fuse-devel, linux-kernel, Bernd Schubert,
	Dharmendra Singh

On Thu, Apr 21, 2022 at 8:52 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 8 Apr 2022 at 08:18, Dharmendra Singh <dharamhans87@gmail.com> wrote:
> >
> > As of now, in FUSE, direct writes to the same file are serialized
> > over the inode lock, i.e. we hold the inode lock for the whole
> > duration of the write request. This serialization works well for
> > FUSE user-space implementations which rely on the inode lock for
> > their cache/data integrity etc. But it badly hurts FUSE
> > implementations which have their own ways of maintaining data/cache
> > integrity and do not use this serialization at all.
> >
> > This patch allows parallel direct writes on the same file with the
> > help of a flag called FOPEN_PARALLEL_WRITES. If this flag is set on
> > the file (the flag is passed from libfuse to the fuse kernel as part
> > of file open/create), we do not hold the inode lock for the whole
> > duration of the request, but instead acquire it only to protect
> > updates to certain fields of the inode. FUSE implementations which
> > rely on the inode lock can continue to do so, and this remains the
> > default behaviour.
> >
> > Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
> > ---
> >  fs/fuse/file.c            | 38 ++++++++++++++++++++++++++++++++++----
> >  include/uapi/linux/fuse.h |  2 ++
> >  2 files changed, 36 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 37eebfb90500..d3e8f44c1228 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -1465,6 +1465,8 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
> >         int err = 0;
> >         struct fuse_io_args *ia;
> >         unsigned int max_pages;
> > +       bool p_write = write &&
> > +               (ff->open_flags & FOPEN_PARALLEL_WRITES) ? true : false;
> >
> >         max_pages = iov_iter_npages(iter, fc->max_pages);
> >         ia = fuse_io_alloc(io, max_pages);
> > @@ -1472,10 +1474,11 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
> >                 return -ENOMEM;
> >
> >         if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
> > -               if (!write)
> > +               /* Parallel write does not come with inode lock held */
> > +               if (!write || p_write)
>
> Probably would be good to add an inode_is_locked() assert in
> fuse_sync_writes() to make sure we don't miss cases silently.

I think fuse_set_nowrite() called from fuse_sync_writes() already has
this assertion.

>
> >                         inode_lock(inode);
> >                 fuse_sync_writes(inode);
> > -               if (!write)
> > +               if (!write || p_write)
> >                         inode_unlock(inode);
> >         }
> >
> > @@ -1568,22 +1571,36 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >  static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  {
> >         struct inode *inode = file_inode(iocb->ki_filp);
> > +       struct file *file = iocb->ki_filp;
> > +       struct fuse_file *ff = file->private_data;
> >         struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
> >         ssize_t res;
> > +       bool p_write = ff->open_flags & FOPEN_PARALLEL_WRITES ? true : false;
> > +       bool unlock_inode = true;
> >
> >         /* Don't allow parallel writes to the same file */
> >         inode_lock(inode);
> >         res = generic_write_checks(iocb, from);
>
> I don't think this needs inode lock.  At least nfs_file_direct_write()
> doesn't have it.
>
> What it does have, however is taking the inode lock for shared for the
> actual write operation, which is probably something that fuse needs as
> well.
>
> Also I worry about size extending writes not holding the inode lock
> exclusive.  Would that be a problem in your use case?

Thanks for pointing out this issue. There is indeed a problem with
appending writes: until the current appending write has finished and
updated i_size, the next appending write cannot be allowed, as
otherwise one request would overwrite data written by another.
For other kinds of writes I do not see an issue, as the i_size update
can be handled as it is done currently; these writes use a fixed
offset instead of generating their offset from i_size.

So here is how I am thinking of handling this:
1) Take an exclusive lock on the inode for appending writes.
2) Take a shared lock on the inode for writes other than appending
   writes. This shared lock also prevents concurrent truncation of
   the inode, which could otherwise cause problems with i_size.

Please note that we use fi->lock to update i_size, so we would not
need to upgrade the shared lock to an exclusive one when updating
i_size. Having a shared lock for non-appending write requests
therefore fulfils our purpose.

>
> >         if (res > 0) {
> > +               /* Allow parallel writes on the inode by unlocking it */
> > +               if (p_write) {
> > +                       inode_unlock(inode);
> > +                       unlock_inode = false;
> > +               }
> >                 if (!is_sync_kiocb(iocb) && iocb->ki_flags & IOCB_DIRECT) {
> >                         res = fuse_direct_IO(iocb, from);
> >                 } else {
> >                         res = fuse_direct_io(&io, from, &iocb->ki_pos,
> >                                              FUSE_DIO_WRITE);
> > +                       if (p_write) {
> > +                               inode_lock(inode);
> > +                               unlock_inode = true;
> > +                       }
> >                         fuse_write_update_attr(inode, iocb->ki_pos, res);
>
> This doesn't need the inode lock either if the actual write wasn't locked.

Will remove it.

>
> >                 }
> >         }
> > -       inode_unlock(inode);
> > +       if (unlock_inode)
> > +               inode_unlock(inode);
> >
> >         return res;
> >  }
> > @@ -2850,10 +2867,16 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
> >         size_t count = iov_iter_count(iter), shortened = 0;
> >         loff_t offset = iocb->ki_pos;
> >         struct fuse_io_priv *io;
> > -
> > +       bool p_write = (iov_iter_rw(iter) == WRITE &&
> > +                       ff->open_flags & FOPEN_PARALLEL_WRITES);
> >         pos = offset;
> >         inode = file->f_mapping->host;
> > +
> > +       if (p_write)
> > +               inode_lock(inode);
> >         i_size = i_size_read(inode);
>
> Neither this.

We would not take the exclusive lock for requests other than appending writes.

>
> > +       if (p_write)
> > +               inode_unlock(inode);
> >
> >         if ((iov_iter_rw(iter) == READ) && (offset >= i_size))
> >                 return 0;
> > @@ -2924,9 +2947,16 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
> >         kref_put(&io->refcnt, fuse_io_release);
> >
> >         if (iov_iter_rw(iter) == WRITE) {
> > +
> > +               if (p_write)
> > +                       inode_lock(inode);
> > +
> >                 fuse_write_update_attr(inode, pos, ret);
> >                 if (ret < 0 && offset + count > i_size)
> >                         fuse_do_truncate(file);
> > +
> > +               if (p_write)
> > +                       inode_unlock(inode);
>
> Truncation needs the inode lock, though I don't completely understand
> why this truncation is needed.  But, for example, here it is assumed
> that the file size won't change, which means that non-extending writes
> should hold the inode lock shared and extending writes should hold it
> exclusive to meet this assumption.

I also did not fully get why this truncation is needed here. But we
would take the exclusive lock in this case, as the file size can now
change before we reach this point.

If we agree, I will send the updated patch shortly.
(Also, please take a look at my other patches for atomic-open; they
have been pending for a couple of weeks.)

Thanks,
Dharmendra


* Re: [PATCH 1/1] FUSE: Allow parallel direct writes on the same file
  2022-04-22 14:30     ` Dharmendra Hans
@ 2022-04-22 14:48       ` Miklos Szeredi
  2022-04-22 15:20         ` Bernd Schubert
  0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2022-04-22 14:48 UTC (permalink / raw)
  To: Dharmendra Hans
  Cc: linux-fsdevel, fuse-devel, linux-kernel, Bernd Schubert,
	Dharmendra Singh

On Fri, 22 Apr 2022 at 16:30, Dharmendra Hans <dharamhans87@gmail.com> wrote:
>
> On Thu, Apr 21, 2022 at 8:52 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Fri, 8 Apr 2022 at 08:18, Dharmendra Singh <dharamhans87@gmail.com> wrote:
> > >
> > > As of now, in FUSE, direct writes to the same file are serialized
> > > over the inode lock, i.e. we hold the inode lock for the whole
> > > duration of the write request. This serialization works well for
> > > FUSE user-space implementations which rely on the inode lock for
> > > their cache/data integrity etc. But it badly hurts FUSE
> > > implementations which have their own ways of maintaining data/cache
> > > integrity and do not use this serialization at all.
> > >
> > > This patch allows parallel direct writes on the same file with the
> > > help of a flag called FOPEN_PARALLEL_WRITES. If this flag is set on
> > > the file (the flag is passed from libfuse to the fuse kernel as part
> > > of file open/create), we do not hold the inode lock for the whole
> > > duration of the request, but instead acquire it only to protect
> > > updates to certain fields of the inode. FUSE implementations which
> > > rely on the inode lock can continue to do so, and this remains the
> > > default behaviour.
> > >
> > > Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
> > > ---
> > >  fs/fuse/file.c            | 38 ++++++++++++++++++++++++++++++++++----
> > >  include/uapi/linux/fuse.h |  2 ++
> > >  2 files changed, 36 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > > index 37eebfb90500..d3e8f44c1228 100644
> > > --- a/fs/fuse/file.c
> > > +++ b/fs/fuse/file.c
> > > @@ -1465,6 +1465,8 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
> > >         int err = 0;
> > >         struct fuse_io_args *ia;
> > >         unsigned int max_pages;
> > > +       bool p_write = write &&
> > > +               (ff->open_flags & FOPEN_PARALLEL_WRITES) ? true : false;
> > >
> > >         max_pages = iov_iter_npages(iter, fc->max_pages);
> > >         ia = fuse_io_alloc(io, max_pages);
> > > @@ -1472,10 +1474,11 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
> > >                 return -ENOMEM;
> > >
> > >         if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
> > > -               if (!write)
> > > +               /* Parallel write does not come with inode lock held */
> > > +               if (!write || p_write)
> >
> > Probably would be good to add an inode_is_locked() assert in
> > fuse_sync_writes() to make sure we don't miss cases silently.
>
> I think fuse_set_nowrite() called from fuse_sync_writes() already has
> this assertion.

Ah, okay.

>
> >
> > >                         inode_lock(inode);
> > >                 fuse_sync_writes(inode);
> > > -               if (!write)
> > > +               if (!write || p_write)
> > >                         inode_unlock(inode);
> > >         }
> > >
> > > @@ -1568,22 +1571,36 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > >  static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
> > >  {
> > >         struct inode *inode = file_inode(iocb->ki_filp);
> > > +       struct file *file = iocb->ki_filp;
> > > +       struct fuse_file *ff = file->private_data;
> > >         struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
> > >         ssize_t res;
> > > +       bool p_write = ff->open_flags & FOPEN_PARALLEL_WRITES ? true : false;
> > > +       bool unlock_inode = true;
> > >
> > >         /* Don't allow parallel writes to the same file */
> > >         inode_lock(inode);
> > >         res = generic_write_checks(iocb, from);
> >
> > I don't think this needs inode lock.  At least nfs_file_direct_write()
> > doesn't have it.
> >
> > What it does have, however is taking the inode lock for shared for the
> > actual write operation, which is probably something that fuse needs as
> > well.
> >
> > Also I worry about size extending writes not holding the inode lock
> > exclusive.  Would that be a problem in your use case?
>
> Thanks for pointing out this issue. There is indeed a problem with
> appending writes: until the current appending write has finished and
> updated i_size, the next appending write cannot be allowed, as
> otherwise one request would overwrite data written by another.
> For other kinds of writes I do not see an issue, as the i_size update
> can be handled as it is done currently; these writes use a fixed
> offset instead of generating their offset from i_size.

That's true, but I still worry...  Does your workload include
non-append extending writes?  Seems to me making those run in parallel
is asking for trouble.

> If we agree, I will send the updated patch shortly.
> (Also, please take a look at my other patches for atomic-open; they
> have been pending for a couple of weeks.)

I'm looking at that currently.

Thanks,
Miklos


* Re: [PATCH 1/1] FUSE: Allow parallel direct writes on the same file
  2022-04-22 14:48       ` Miklos Szeredi
@ 2022-04-22 15:20         ` Bernd Schubert
  2022-04-25  8:40           ` Miklos Szeredi
  0 siblings, 1 reply; 7+ messages in thread
From: Bernd Schubert @ 2022-04-22 15:20 UTC (permalink / raw)
  To: Miklos Szeredi, Dharmendra Hans
  Cc: linux-fsdevel, fuse-devel, linux-kernel, Dharmendra Singh



On 4/22/22 16:48, Miklos Szeredi wrote:
> On Fri, 22 Apr 2022 at 16:30, Dharmendra Hans <dharamhans87@gmail.com> wrote:
>>
>> On Thu, Apr 21, 2022 at 8:52 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>
>>> On Fri, 8 Apr 2022 at 08:18, Dharmendra Singh <dharamhans87@gmail.com> wrote:
>>>>
> 
> That's true, but I still worry...  Does your workload include
> non-append extending writes?  Seems to me making those run in parallel
> is asking for trouble.

Our main use case is MPI-IO for now, and I don't think it first sets
the file size and then writes to these sparse files. Fixing all the
different MPI implementations, including closed-source stacks, is
probably out of the question.
Given that MPI-IO also supports direct calls into its stack, we do
support that for some MPIs, but not all stacks. Direct calls bypassing
the VFS also have their own issues, including security. So it would be
really great if we could find a way to avoid the inode lock.

Would you mind sharing what you worry about in detail?


> 
>> If we agree, I will send the updated patch shortly.
>> (Also, please take a look at my other patches for atomic-open; they
>> have been pending for a couple of weeks.)
> 
> I'm looking at that currently.

Thank you! There are two more optimizations in the same area, but these 
require VFS changes - let's first get the 'easy' things done...



Thanks,
Bernd


* Re: [PATCH 1/1] FUSE: Allow parallel direct writes on the same file
  2022-04-22 15:20         ` Bernd Schubert
@ 2022-04-25  8:40           ` Miklos Szeredi
  0 siblings, 0 replies; 7+ messages in thread
From: Miklos Szeredi @ 2022-04-25  8:40 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Dharmendra Hans, linux-fsdevel, fuse-devel, linux-kernel,
	Dharmendra Singh

On Fri, 22 Apr 2022 at 17:20, Bernd Schubert <bschubert@ddn.com> wrote:
>
>
>
> On 4/22/22 16:48, Miklos Szeredi wrote:
> > On Fri, 22 Apr 2022 at 16:30, Dharmendra Hans <dharamhans87@gmail.com> wrote:
> >>
> >> On Thu, Apr 21, 2022 at 8:52 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>>
> >>> On Fri, 8 Apr 2022 at 08:18, Dharmendra Singh <dharamhans87@gmail.com> wrote:
> >>>>
> >
> > That's true, but I still worry...  Does your workload include
> > non-append extending writes?  Seems to me making those run in parallel
> > is asking for trouble.
>
> Our main use case is MPI-IO for now, and I don't think it first sets
> the file size and then writes to these sparse files. Fixing all the
> different MPI implementations, including closed-source stacks, is
> probably out of the question.

Okay.

> Given that MPI-IO also supports direct calls into its stack, we do
> support that for some MPIs, but not all stacks. Direct calls bypassing
> the VFS also have their own issues, including security. So it would be
> really great if we could find a way to avoid the inode lock.
>
> Would you mind sharing what you worry about in detail?

I worry about breaking the assumption that i_size does not change
while the inode lock is held (exclusive or shared).

Maybe that's not an issue, but we'd need to look very carefully.

Thanks,
Miklos


end of thread, other threads:[~2022-04-25  8:41 UTC | newest]

Thread overview: 7+ messages
2022-04-08  6:18 [PATCH 0/1] FUSE: Parallel direct writes on the same file Dharmendra Singh
2022-04-08  6:18 ` [PATCH 1/1] FUSE: Allow parallel " Dharmendra Singh
2022-04-21 15:22   ` Miklos Szeredi
2022-04-22 14:30     ` Dharmendra Hans
2022-04-22 14:48       ` Miklos Szeredi
2022-04-22 15:20         ` Bernd Schubert
2022-04-25  8:40           ` Miklos Szeredi
