linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes
@ 2023-03-07 17:20 Jens Axboe
  2023-03-07 17:20 ` [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag Jens Axboe
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Jens Axboe @ 2023-03-07 17:20 UTC (permalink / raw)
  To: io-uring, linux-fsdevel, linux-xfs, linux-ext4

Hi,

This has been on my TODO list for a while, and now that ext4 supports
parallel dio writes as well, time to dust it off and send it out... This
adds an FMODE flag to inform users that a given file supports parallel
dio writes. io_uring can use this to avoid serializing dio writes
upfront, in case it isn't needed. A few details in patch #2, patch 1 does
nothing by itself.

-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-03-07 17:20 [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes Jens Axboe
@ 2023-03-07 17:20 ` Jens Axboe
  2023-04-12 13:40   ` Bernd Schubert
  2023-03-07 17:20 ` [PATCH 2/2] io_uring: avoid hashing O_DIRECT writes if the filesystem doesn't need it Jens Axboe
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2023-03-07 17:20 UTC (permalink / raw)
  To: io-uring, linux-fsdevel, linux-xfs, linux-ext4; +Cc: Jens Axboe

Some filesystems support multiple threads writing to the same file with
O_DIRECT without requiring exclusive access to it. io_uring can use this
hint to avoid serializing dio writes to this inode, instead allowing them
to run in parallel.

XFS and ext4 both fall into this category, so set the flag for both of
them.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/ext4/file.c     | 3 ++-
 fs/xfs/xfs_file.c  | 3 ++-
 include/linux/fs.h | 3 +++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 0b8b4499e5ca..d101b3b0c7da 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -899,7 +899,8 @@ static int ext4_file_open(struct inode *inode, struct file *filp)
 			return ret;
 	}
 
-	filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
+	filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC |
+			FMODE_DIO_PARALLEL_WRITE;
 	return dquot_file_open(inode, filp);
 }
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 705250f9f90a..863289aaa441 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1171,7 +1171,8 @@ xfs_file_open(
 {
 	if (xfs_is_shutdown(XFS_M(inode->i_sb)))
 		return -EIO;
-	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
+	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC |
+			FMODE_DIO_PARALLEL_WRITE;
 	return generic_file_open(inode, file);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c85916e9f7db..475d88640d3d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -168,6 +168,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 
 #define	FMODE_NOREUSE		((__force fmode_t)0x800000)
 
+/* File supports non-exclusive O_DIRECT writes from multiple threads */
+#define FMODE_DIO_PARALLEL_WRITE	((__force fmode_t)0x1000000)
+
 /* File was opened by fanotify and shouldn't generate fanotify events */
 #define FMODE_NONOTIFY		((__force fmode_t)0x4000000)
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/2] io_uring: avoid hashing O_DIRECT writes if the filesystem doesn't need it
  2023-03-07 17:20 [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes Jens Axboe
  2023-03-07 17:20 ` [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag Jens Axboe
@ 2023-03-07 17:20 ` Jens Axboe
  2023-03-15 17:40 ` [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes Jens Axboe
  2023-04-03 12:24 ` Christian Brauner
  3 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2023-03-07 17:20 UTC (permalink / raw)
  To: io-uring, linux-fsdevel, linux-xfs, linux-ext4; +Cc: Jens Axboe

io_uring hashes writes to a given file/inode so that it can serialize
them. This is useful if the file system needs exclusive access to the
file to perform the write, as otherwise we end up with a ton of io-wq
threads trying to lock the inode at the same time. This can cause
excessive system time.

But if the file system has flagged that it supports parallel O_DIRECT
writes, then there's no need to serialize the writes. Check for that
through FMODE_DIO_PARALLEL_WRITE and don't hash it if we don't need to.

In a basic test of 8 threads writing to a file on XFS on a gen2 Optane,
with each thread writing in 4k chunks, it improves performance from
~1350K IOPS (or ~5290MiB/sec) to ~1410K IOPS (or ~5500MiB/sec).

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index fd9ba840c4a2..93cc1ff5e9cd 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -429,7 +429,13 @@ static void io_prep_async_work(struct io_kiocb *req)
 	}
 
 	if (req->flags & REQ_F_ISREG) {
-		if (def->hash_reg_file || (ctx->flags & IORING_SETUP_IOPOLL))
+		bool should_hash = def->hash_reg_file;
+
+		/* don't serialize this request if the fs doesn't need it */
+		if (should_hash && (req->file->f_flags & O_DIRECT) &&
+		    (req->file->f_mode & FMODE_DIO_PARALLEL_WRITE))
+			should_hash = false;
+		if (should_hash || (ctx->flags & IORING_SETUP_IOPOLL))
 			io_wq_hash_work(&req->work, file_inode(req->file));
 	} else if (!req->file || !S_ISBLK(file_inode(req->file)->i_mode)) {
 		if (def->unbound_nonreg_file)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes
  2023-03-07 17:20 [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes Jens Axboe
  2023-03-07 17:20 ` [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag Jens Axboe
  2023-03-07 17:20 ` [PATCH 2/2] io_uring: avoid hashing O_DIRECT writes if the filesystem doesn't need it Jens Axboe
@ 2023-03-15 17:40 ` Jens Axboe
  2023-03-16  4:29   ` Darrick J. Wong
  2023-04-03 12:24 ` Christian Brauner
  3 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2023-03-15 17:40 UTC (permalink / raw)
  To: io-uring, linux-fsdevel, linux-xfs, linux-ext4

On 3/7/23 10:20 AM, Jens Axboe wrote:
> Hi,
> 
> This has been on my TODO list for a while, and now that ext4 supports
> parallel dio writes as well, time to dust it off and send it out... This
> adds an FMODE flag to inform users that a given file supports parallel
> dio writes. io_uring can use this to avoid serializing dio writes
> upfront, in case it isn't needed. A few details in patch #2, patch 1 does
> nothing by itself.

I'm assuming silence is consent here and folks are fine with this
change?

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes
  2023-03-15 17:40 ` [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes Jens Axboe
@ 2023-03-16  4:29   ` Darrick J. Wong
  2023-03-17  2:53     ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2023-03-16  4:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, linux-fsdevel, linux-xfs, linux-ext4

On Wed, Mar 15, 2023 at 11:40:02AM -0600, Jens Axboe wrote:
> On 3/7/23 10:20 AM, Jens Axboe wrote:
> > Hi,
> > 
> > This has been on my TODO list for a while, and now that ext4 supports
> > parallel dio writes as well, time to dust it off and send it out... This
> > adds an FMODE flag to inform users that a given file supports parallel
> > dio writes. io_uring can use this to avoid serializing dio writes
> > upfront, in case it isn't needed. A few details in patch #2, patch 1 does
> > nothing by itself.
> 
> I'm assuming silence is consent here and folks are fine with this
> change?

Oh, yeah, this one fell off my radar.

LGTM,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> -- 
> Jens Axboe
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes
  2023-03-16  4:29   ` Darrick J. Wong
@ 2023-03-17  2:53     ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2023-03-17  2:53 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: io-uring, linux-fsdevel, linux-xfs, linux-ext4

On 3/15/23 10:29 PM, Darrick J. Wong wrote:
> On Wed, Mar 15, 2023 at 11:40:02AM -0600, Jens Axboe wrote:
>> On 3/7/23 10:20 AM, Jens Axboe wrote:
>>> Hi,
>>>
>>> This has been on my TODO list for a while, and now that ext4 supports
>>> parallel dio writes as well, time to dust it off and send it out... This
>>> adds an FMODE flag to inform users that a given file supports parallel
>>> dio writes. io_uring can use this to avoid serializing dio writes
>>> upfront, in case it isn't needed. A few details in patch #2, patch 1 does
>>> nothing by itself.
>>
>> I'm assuming silence is consent here and folks are fine with this
>> change?
> 
> Oh, yeah, this one fell off my radar.
> 
> LGTM,
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks Darrick.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes
  2023-03-07 17:20 [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes Jens Axboe
                   ` (2 preceding siblings ...)
  2023-03-15 17:40 ` [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes Jens Axboe
@ 2023-04-03 12:24 ` Christian Brauner
  3 siblings, 0 replies; 20+ messages in thread
From: Christian Brauner @ 2023-04-03 12:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, linux-fsdevel, linux-xfs, linux-ext4

On Tue, Mar 07, 2023 at 10:20:13AM -0700, Jens Axboe wrote:
> Hi,
> 
> This has been on my TODO list for a while, and now that ext4 supports
> parallel dio writes as well, time to dust it off and send it out... This
> adds an FMODE flag to inform users that a given file supports parallel
> dio writes. io_uring can use this to avoid serializing dio writes
> upfront, in case it isn't needed. A few details in patch #2, patch 1 does
> nothing by itself.

Looks good,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-03-07 17:20 ` [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag Jens Axboe
@ 2023-04-12 13:40   ` Bernd Schubert
  2023-04-12 13:43     ` Bernd Schubert
  2023-04-13  7:40     ` Miklos Szeredi
  0 siblings, 2 replies; 20+ messages in thread
From: Bernd Schubert @ 2023-04-12 13:40 UTC (permalink / raw)
  To: axboe; +Cc: io-uring, linux-ext4, linux-fsdevel, linux-xfs, dsingh, bschubert

Miklos, Jens,

could we please also set this flag for fuse?


Thanks,
Bernd


fuse: Set FMODE_DIO_PARALLEL_WRITE flag

From: Bernd Schubert <bschubert@ddn.com>

Fuse can also do parallel DIO writes, if userspace has enabled it.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
 fs/fuse/file.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 875314ee6f59..46e7f1196fd1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -215,6 +215,9 @@ void fuse_finish_open(struct inode *inode, struct file *file)
 	}
 	if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
 		fuse_link_write_file(file);
+
+	if (ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES)
+		file->f_mode |= FMODE_DIO_PARALLEL_WRITE;
 }
 
 int fuse_open_common(struct inode *inode, struct file *file, bool isdir)

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-04-12 13:40   ` Bernd Schubert
@ 2023-04-12 13:43     ` Bernd Schubert
  2023-04-13  7:40     ` Miklos Szeredi
  1 sibling, 0 replies; 20+ messages in thread
From: Bernd Schubert @ 2023-04-12 13:43 UTC (permalink / raw)
  To: axboe
  Cc: io-uring, linux-ext4, linux-fsdevel, linux-xfs, Dharmendra Singh,
	Miklos Szeredi

Sorry, had forgotten to CC Miklos.

On 4/12/23 15:40, Bernd Schubert wrote:
> Miklos, Jens,
> 
> could we please also set this flag for fuse?
> 
> 
> Thanks,
> Bernd
> 
> 
> fuse: Set FMODE_DIO_PARALLEL_WRITE flag
> 
> From: Bernd Schubert <bschubert@ddn.com>
> 
> Fuse can also do parallel DIO writes, if userspace has enabled it.
> 
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> ---
>   fs/fuse/file.c |    3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 875314ee6f59..46e7f1196fd1 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -215,6 +215,9 @@ void fuse_finish_open(struct inode *inode, struct file *file)
>   	}
>   	if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
>   		fuse_link_write_file(file);
> +
> +	if (ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES)
> +		file->f_mode |= FMODE_DIO_PARALLEL_WRITE;
>   }
>   
>   int fuse_open_common(struct inode *inode, struct file *file, bool isdir)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-04-12 13:40   ` Bernd Schubert
  2023-04-12 13:43     ` Bernd Schubert
@ 2023-04-13  7:40     ` Miklos Szeredi
  2023-04-13  9:25       ` Bernd Schubert
  2023-04-14  5:11       ` Christoph Hellwig
  1 sibling, 2 replies; 20+ messages in thread
From: Miklos Szeredi @ 2023-04-13  7:40 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: axboe, io-uring, linux-ext4, linux-fsdevel, linux-xfs, dsingh

On Wed, 12 Apr 2023 at 15:42, Bernd Schubert <bschubert@ddn.com> wrote:
>
> Miklos, Jens,
>
> could we please also set this flag for fuse?
>
>
> Thanks,
> Bernd
>
>
> fuse: Set FMODE_DIO_PARALLEL_WRITE flag
>
> From: Bernd Schubert <bschubert@ddn.com>
>
> Fuse can also do parallel DIO writes, if userspace has enabled it.
>
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> ---
>  fs/fuse/file.c |    3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 875314ee6f59..46e7f1196fd1 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -215,6 +215,9 @@ void fuse_finish_open(struct inode *inode, struct file *file)
>         }
>         if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
>                 fuse_link_write_file(file);
> +
> +       if (ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES)
> +               file->f_mode |= FMODE_DIO_PARALLEL_WRITE;

fuse_direct_write_iter():

bool exclusive_lock =
    !(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) ||
    iocb->ki_flags & IOCB_APPEND ||
    fuse_direct_write_extending_i_size(iocb, from);

If the write is size extending, then it will take the lock exclusive.
OTOH, I guess that it would be unusual for lots of  size extending
writes to be done in parallel.

What would be the effect of giving the  FMODE_DIO_PARALLEL_WRITE hint
and then still serializing the writes?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-04-13  7:40     ` Miklos Szeredi
@ 2023-04-13  9:25       ` Bernd Schubert
  2023-04-14  5:11       ` Christoph Hellwig
  1 sibling, 0 replies; 20+ messages in thread
From: Bernd Schubert @ 2023-04-13  9:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: axboe, io-uring, linux-ext4, linux-fsdevel, linux-xfs, Dharmendra Singh

On 4/13/23 09:40, Miklos Szeredi wrote:
> On Wed, 12 Apr 2023 at 15:42, Bernd Schubert <bschubert@ddn.com> wrote:
>>
>> Miklos, Jens,
>>
>> could we please also set this flag for fuse?
>>
>>
>> Thanks,
>> Bernd
>>
>>
>> fuse: Set FMODE_DIO_PARALLEL_WRITE flag
>>
>> From: Bernd Schubert <bschubert@ddn.com>
>>
>> Fuse can also do parallel DIO writes, if userspace has enabled it.
>>
>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
>> ---
>>   fs/fuse/file.c |    3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index 875314ee6f59..46e7f1196fd1 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -215,6 +215,9 @@ void fuse_finish_open(struct inode *inode, struct file *file)
>>          }
>>          if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
>>                  fuse_link_write_file(file);
>> +
>> +       if (ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES)
>> +               file->f_mode |= FMODE_DIO_PARALLEL_WRITE;
> 
> fuse_direct_write_iter():
> 
> bool exclusive_lock =
>      !(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) ||
>      iocb->ki_flags & IOCB_APPEND ||
>      fuse_direct_write_extending_i_size(iocb, from);
> 
> If the write is size extending, then it will take the lock exclusive.
> OTOH, I guess that it would be unusual for lots of  size extending
> writes to be done in parallel.
> 
> What would be the effect of giving the  FMODE_DIO_PARALLEL_WRITE hint
> and then still serializing the writes?

It used here

https://lore.kernel.org/io-uring/20230403-wound-roundworm-c1660e059b8c@brauner/T/#m5f86985d6c67dd1d01a977475dab542c338372dd


fuse_finish_open has its own lock, so letting uring handle requests in 
parallel should not hurt? Is this going like

application -> uring does parallel requests -> fuse.ko -> fuse-deamon

So when fuse-deamon signals that it can handle parallel DIO, it is just 
fuse.ko that might need its own lock to extend the file?


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-04-13  7:40     ` Miklos Szeredi
  2023-04-13  9:25       ` Bernd Schubert
@ 2023-04-14  5:11       ` Christoph Hellwig
  2023-04-14 15:36         ` Darrick J. Wong
  1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2023-04-14  5:11 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, axboe, io-uring, linux-ext4, linux-fsdevel,
	linux-xfs, dsingh

On Thu, Apr 13, 2023 at 09:40:29AM +0200, Miklos Szeredi wrote:
> fuse_direct_write_iter():
> 
> bool exclusive_lock =
>     !(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) ||
>     iocb->ki_flags & IOCB_APPEND ||
>     fuse_direct_write_extending_i_size(iocb, from);
> 
> If the write is size extending, then it will take the lock exclusive.
> OTOH, I guess that it would be unusual for lots of  size extending
> writes to be done in parallel.
> 
> What would be the effect of giving the  FMODE_DIO_PARALLEL_WRITE hint
> and then still serializing the writes?

I have no idea how this flags work, but XFS also takes i_rwsem
exclusively for appends, when the positions and size aren't aligned to
the block size, and a few other cases.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-04-14  5:11       ` Christoph Hellwig
@ 2023-04-14 15:36         ` Darrick J. Wong
  2023-04-15 13:15           ` Jens Axboe
  2023-04-16  5:54           ` Christoph Hellwig
  0 siblings, 2 replies; 20+ messages in thread
From: Darrick J. Wong @ 2023-04-14 15:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miklos Szeredi, Bernd Schubert, axboe, io-uring, linux-ext4,
	linux-fsdevel, linux-xfs, dsingh

On Thu, Apr 13, 2023 at 10:11:28PM -0700, Christoph Hellwig wrote:
> On Thu, Apr 13, 2023 at 09:40:29AM +0200, Miklos Szeredi wrote:
> > fuse_direct_write_iter():
> > 
> > bool exclusive_lock =
> >     !(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) ||
> >     iocb->ki_flags & IOCB_APPEND ||
> >     fuse_direct_write_extending_i_size(iocb, from);
> > 
> > If the write is size extending, then it will take the lock exclusive.
> > OTOH, I guess that it would be unusual for lots of  size extending
> > writes to be done in parallel.
> > 
> > What would be the effect of giving the  FMODE_DIO_PARALLEL_WRITE hint
> > and then still serializing the writes?
> 
> I have no idea how this flags work, but XFS also takes i_rwsem
> exclusively for appends, when the positions and size aren't aligned to
> the block size, and a few other cases.

IIUC uring wants to avoid the situation where someone sends 300 writes
to the same file, all of which end up in background workers, and all of
which then contend on exclusive i_rwsem.  Hence it has some hashing
scheme that executes io requests serially if they hash to the same value
(which iirc is the inode number?) to prevent resource waste.

This flag turns off that hashing behavior on the assumption that each of
those 300 writes won't serialize on the other 299 writes, hence it's ok
to start up 300 workers.

(apologies for precoffee garbled response)

--D

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-04-14 15:36         ` Darrick J. Wong
@ 2023-04-15 13:15           ` Jens Axboe
  2023-04-18 12:42             ` Miklos Szeredi
  2023-04-16  5:54           ` Christoph Hellwig
  1 sibling, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2023-04-15 13:15 UTC (permalink / raw)
  To: Darrick J. Wong, Christoph Hellwig
  Cc: Miklos Szeredi, Bernd Schubert, io-uring, linux-ext4,
	linux-fsdevel, linux-xfs, dsingh

On 4/14/23 9:36?AM, Darrick J. Wong wrote:
> On Thu, Apr 13, 2023 at 10:11:28PM -0700, Christoph Hellwig wrote:
>> On Thu, Apr 13, 2023 at 09:40:29AM +0200, Miklos Szeredi wrote:
>>> fuse_direct_write_iter():
>>>
>>> bool exclusive_lock =
>>>     !(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) ||
>>>     iocb->ki_flags & IOCB_APPEND ||
>>>     fuse_direct_write_extending_i_size(iocb, from);
>>>
>>> If the write is size extending, then it will take the lock exclusive.
>>> OTOH, I guess that it would be unusual for lots of  size extending
>>> writes to be done in parallel.
>>>
>>> What would be the effect of giving the  FMODE_DIO_PARALLEL_WRITE hint
>>> and then still serializing the writes?
>>
>> I have no idea how this flags work, but XFS also takes i_rwsem
>> exclusively for appends, when the positions and size aren't aligned to
>> the block size, and a few other cases.
> 
> IIUC uring wants to avoid the situation where someone sends 300 writes
> to the same file, all of which end up in background workers, and all of
> which then contend on exclusive i_rwsem.  Hence it has some hashing
> scheme that executes io requests serially if they hash to the same value
> (which iirc is the inode number?) to prevent resource waste.
> 
> This flag turns off that hashing behavior on the assumption that each of
> those 300 writes won't serialize on the other 299 writes, hence it's ok
> to start up 300 workers.
> 
> (apologies for precoffee garbled response)

Yep, that is pretty much it. If all writes to that inode are serialized
by a lock on the fs side, then we'll get a lot of contention on that
mutex. And since, originally, nothing supported async writes, everything
would get punted to the io-wq workers. io_uring added per-inode hashing
for this, so that any punt to io-wq of a write would get serialized.

IOW, it's an efficiency thing, not a correctness thing.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-04-14 15:36         ` Darrick J. Wong
  2023-04-15 13:15           ` Jens Axboe
@ 2023-04-16  5:54           ` Christoph Hellwig
  2023-04-19  1:29             ` Jens Axboe
  1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2023-04-16  5:54 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Miklos Szeredi, Bernd Schubert, axboe,
	io-uring, linux-ext4, linux-fsdevel, linux-xfs, dsingh

On Fri, Apr 14, 2023 at 08:36:12AM -0700, Darrick J. Wong wrote:
> IIUC uring wants to avoid the situation where someone sends 300 writes
> to the same file, all of which end up in background workers, and all of
> which then contend on exclusive i_rwsem.  Hence it has some hashing
> scheme that executes io requests serially if they hash to the same value
> (which iirc is the inode number?) to prevent resource waste.
> 
> This flag turns off that hashing behavior on the assumption that each of
> those 300 writes won't serialize on the other 299 writes, hence it's ok
> to start up 300 workers.
> 
> (apologies for precoffee garbled response)

It might be useful if someone (Jens?) could clearly document the
assumptions for this flag.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-04-15 13:15           ` Jens Axboe
@ 2023-04-18 12:42             ` Miklos Szeredi
  2023-04-18 12:55               ` Bernd Schubert
  0 siblings, 1 reply; 20+ messages in thread
From: Miklos Szeredi @ 2023-04-18 12:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Darrick J. Wong, Christoph Hellwig, Bernd Schubert, io-uring,
	linux-ext4, linux-fsdevel, linux-xfs, dsingh

On Sat, 15 Apr 2023 at 15:15, Jens Axboe <axboe@kernel.dk> wrote:

> Yep, that is pretty much it. If all writes to that inode are serialized
> by a lock on the fs side, then we'll get a lot of contention on that
> mutex. And since, originally, nothing supported async writes, everything
> would get punted to the io-wq workers. io_uring added per-inode hashing
> for this, so that any punt to io-wq of a write would get serialized.
>
> IOW, it's an efficiency thing, not a correctness thing.

We could still get a performance regression if the majority of writes
still trigger the exclusive locking.  The questions are:

 - how often does that happen in real life?
 - how bad the performance regression would be?

Without first attempting to answer those questions, I'd be reluctant
to add  FMODE_DIO_PARALLEL_WRITE to fuse.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-04-18 12:42             ` Miklos Szeredi
@ 2023-04-18 12:55               ` Bernd Schubert
  2023-04-18 22:13                 ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Bernd Schubert @ 2023-04-18 12:55 UTC (permalink / raw)
  To: Miklos Szeredi, Jens Axboe
  Cc: Darrick J. Wong, Christoph Hellwig, io-uring, linux-ext4,
	linux-fsdevel, linux-xfs, Dharmendra Singh

On 4/18/23 14:42, Miklos Szeredi wrote:
> On Sat, 15 Apr 2023 at 15:15, Jens Axboe <axboe@kernel.dk> wrote:
> 
>> Yep, that is pretty much it. If all writes to that inode are serialized
>> by a lock on the fs side, then we'll get a lot of contention on that
>> mutex. And since, originally, nothing supported async writes, everything
>> would get punted to the io-wq workers. io_uring added per-inode hashing
>> for this, so that any punt to io-wq of a write would get serialized.
>>
>> IOW, it's an efficiency thing, not a correctness thing.
> 
> We could still get a performance regression if the majority of writes
> still trigger the exclusive locking.  The questions are:
> 
>   - how often does that happen in real life?

Application depending? My personal opinion is that 
applications/developers knowing about uring would also know that they 
should set the right file size first. Like MPIIO is extending files 
persistently and it is hard to fix with all these different MPI stacks 
(I can try to notify mpich and mvapich developers). So best would be to 
document it somewhere in the uring man page that parallel extending 
files might have negative side effects?


>   - how bad the performance regression would be?

I can give it a try with fio and fallocate=none over fuse during the 
next days.

> 
> Without first attempting to answer those questions, I'd be reluctant
> to add  FMODE_DIO_PARALLEL_WRITE to fuse.
> 


Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-04-18 12:55               ` Bernd Schubert
@ 2023-04-18 22:13                 ` Dave Chinner
  2023-04-19  1:28                   ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2023-04-18 22:13 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Darrick J. Wong, Christoph Hellwig,
	io-uring, linux-ext4, linux-fsdevel, linux-xfs, Dharmendra Singh

On Tue, Apr 18, 2023 at 12:55:40PM +0000, Bernd Schubert wrote:
> On 4/18/23 14:42, Miklos Szeredi wrote:
> > On Sat, 15 Apr 2023 at 15:15, Jens Axboe <axboe@kernel.dk> wrote:
> > 
> >> Yep, that is pretty much it. If all writes to that inode are serialized
> >> by a lock on the fs side, then we'll get a lot of contention on that
> >> mutex. And since, originally, nothing supported async writes, everything
> >> would get punted to the io-wq workers. io_uring added per-inode hashing
> >> for this, so that any punt to io-wq of a write would get serialized.
> >>
> >> IOW, it's an efficiency thing, not a correctness thing.
> > 
> > We could still get a performance regression if the majority of writes
> > still trigger the exclusive locking.  The questions are:
> > 
> >   - how often does that happen in real life?
> 
> Application depending? My personal opinion is that 
> applications/developers knowing about uring would also know that they 
> should set the right file size first. Like MPIIO is extending files 
> persistently and it is hard to fix with all these different MPI stacks 
> (I can try to notify mpich and mvapich developers). So best would be to 
> document it somewhere in the uring man page that parallel extending 
> files might have negative side effects?

There are relatively few applications running concurrent async
RWF_APPEND DIO writes. IIRC SycallaDB was the first we came across a
few years ago. Apps that use RWF_APPEND for individual DIOs expect
that it doesn't cause performance anomolies.

These days XFS will run concurrent append DIO writes and it doesn't
serialise RWF_APPEND IO against other RWF_APPEND IOs. Avoiding data
corruption due to racing append IOs doing file extension has been
delegated to the userspace application similar to how we delegate
the responsibility for avoiding data corruption due to overlapping
concurrent DIO to userspace.

> >   - how bad the performance regression would be?
> 
> I can give it a try with fio and fallocate=none over fuse during the 
> next days.

It depends on where the lock that triggers serialisation is, and how
bad the contention on it is. rwsems suck for write contention
because of the "spin on owner" "optimisations" for write locking and
long write holds that occur in the IO path. In general, it will be
no worse than using userspace threads to issue the exact same IO
pattern using concurrent sync IO.

> > Without first attempting to answer those questions, I'd be reluctant
> > to add  FMODE_DIO_PARALLEL_WRITE to fuse.

I'd tag it with this anyway - for the majority of apps that are
doing concurrent DIO within EOF, shared locking is big win. If
there's a corner case that apps trigger that is slow, deal with them
when they are reported....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-04-18 22:13                 ` Dave Chinner
@ 2023-04-19  1:28                   ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2023-04-19  1:28 UTC (permalink / raw)
  To: Dave Chinner, Bernd Schubert
  Cc: Miklos Szeredi, Darrick J. Wong, Christoph Hellwig, io-uring,
	linux-ext4, linux-fsdevel, linux-xfs, Dharmendra Singh

On 4/18/23 4:13?PM, Dave Chinner wrote:
>>> Without first attempting to answer those questions, I'd be reluctant
>>> to add  FMODE_DIO_PARALLEL_WRITE to fuse.
> 
> I'd tag it with this anyway - for the majority of apps that are
> doing concurrent DIO within EOF, shared locking is big win. If
> there's a corner case that apps trigger that is slow, deal with them
> when they are reported....

Agree, the common/fast case will be fine, which is really the most
important part.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag
  2023-04-16  5:54           ` Christoph Hellwig
@ 2023-04-19  1:29             ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2023-04-19  1:29 UTC (permalink / raw)
  To: Christoph Hellwig, Darrick J. Wong
  Cc: Miklos Szeredi, Bernd Schubert, io-uring, linux-ext4,
	linux-fsdevel, linux-xfs, dsingh

On 4/15/23 11:54?PM, Christoph Hellwig wrote:
> On Fri, Apr 14, 2023 at 08:36:12AM -0700, Darrick J. Wong wrote:
>> IIUC uring wants to avoid the situation where someone sends 300 writes
>> to the same file, all of which end up in background workers, and all of
>> which then contend on exclusive i_rwsem.  Hence it has some hashing
>> scheme that executes io requests serially if they hash to the same value
>> (which iirc is the inode number?) to prevent resource waste.
>>
>> This flag turns off that hashing behavior on the assumption that each of
>> those 300 writes won't serialize on the other 299 writes, hence it's ok
>> to start up 300 workers.
>>
>> (apologies for precoffee garbled response)
> 
> It might be useful if someone (Jens?) could clearly document the
> assumptions for this flag.

I guess it can be summed up as the common case should not be using
exclusive (per file/inode) locking. If file extensions need exclusive
locking that's less of a concern, as I don't think it's unreasonable to
expect that to require stricter locking.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-04-19  1:29 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-07 17:20 [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes Jens Axboe
2023-03-07 17:20 ` [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag Jens Axboe
2023-04-12 13:40   ` Bernd Schubert
2023-04-12 13:43     ` Bernd Schubert
2023-04-13  7:40     ` Miklos Szeredi
2023-04-13  9:25       ` Bernd Schubert
2023-04-14  5:11       ` Christoph Hellwig
2023-04-14 15:36         ` Darrick J. Wong
2023-04-15 13:15           ` Jens Axboe
2023-04-18 12:42             ` Miklos Szeredi
2023-04-18 12:55               ` Bernd Schubert
2023-04-18 22:13                 ` Dave Chinner
2023-04-19  1:28                   ` Jens Axboe
2023-04-16  5:54           ` Christoph Hellwig
2023-04-19  1:29             ` Jens Axboe
2023-03-07 17:20 ` [PATCH 2/2] io_uring: avoid hashing O_DIRECT writes if the filesystem doesn't need it Jens Axboe
2023-03-15 17:40 ` [PATCHSET for-next 0/2] Flag file systems as supporting parallel dio writes Jens Axboe
2023-03-16  4:29   ` Darrick J. Wong
2023-03-17  2:53     ` Jens Axboe
2023-04-03 12:24 ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).