From: Dave Chinner <david@fromorbit.com>
To: Andres Freund <andres@anarazel.de>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>,
	linux-fsdevel@vger.kernel.org, jack@suse.com, hch@infradead.org
Subject: Re: Odd locking pattern introduced as part of "nowait aio support"
Date: Wed, 11 Sep 2019 14:04:20 +1000
Message-ID: <20190911040420.GB27547@dread.disaster.area>
In-Reply-To: <20190910223327.mnegfoggopwqqy33@alap3.anarazel.de>

On Tue, Sep 10, 2019 at 03:33:27PM -0700, Andres Freund wrote:
> Hi,
> 
> Especially with buffered io it's fairly easy to hit contention on the
> inode lock, during writes. With something like io_uring, it's even
> easier, because it currently (but see [1]) farms out buffered writes to
> workers, which then can easily contend on the inode lock, even if only
> one process submits writes.  But I've seen it in plenty other cases too.
> 
> Looking at the code I noticed that several parts of the "nowait aio
> support" (cf 728fbc0e10b7f3) series introduced code like:
> 
> static ssize_t
> ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> {
> ...
> 	if (!inode_trylock(inode)) {
> 		if (iocb->ki_flags & IOCB_NOWAIT)
> 			return -EAGAIN;
> 		inode_lock(inode);
> 	}

The ext4 code is just buggy here - we don't support RWF_NOWAIT on
buffered writes. Buffered reads, and dio/dax reads and writes, yes,
but not buffered writes because they are almost guaranteed to block
somewhere. See xfs_file_buffered_aio_write():

	if (iocb->ki_flags & IOCB_NOWAIT)
		return -EOPNOTSUPP;

generic_write_checks() will also reject IOCB_NOWAIT on buffered
writes, so that code in ext4 is likely in the wrong place...
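
To illustrate the kind of early rejection described above, here is a
minimal userspace sketch. It is a model, not the kernel code: the
MODEL_IOCB_* bits and model_write_precheck() are hypothetical names made
up for this example, and the errno simply mirrors the XFS snippet quoted
above (no claim is made about which errno generic_write_checks() itself
returns).

```c
#include <errno.h>

/* Hypothetical flag bits for this model only; the kernel's IOCB_* values differ. */
#define MODEL_IOCB_DIRECT (1u << 0)
#define MODEL_IOCB_NOWAIT (1u << 1)

/*
 * Model of a write-path precheck: a nowait request that is not direct I/O
 * (i.e. a buffered write) is refused before any lock is taken, because a
 * buffered write is almost guaranteed to block somewhere later on.
 * Returns 0 when the request may proceed, or a negative errno.
 */
static long model_write_precheck(unsigned int ki_flags)
{
	if ((ki_flags & MODEL_IOCB_NOWAIT) && !(ki_flags & MODEL_IOCB_DIRECT))
		return -EOPNOTSUPP;
	return 0;
}
```

With a check like this ahead of the locking code, the nowait branch of
the lock acquisition is never reached for buffered writes.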

> 
> isn't trylocking and then locking in a blocking fashion an inefficient
> pattern? I.e. I think this should be
> 
> 	if (iocb->ki_flags & IOCB_NOWAIT) {
> 		if (!inode_trylock(inode))
> 			return -EAGAIN;
> 	} else {
> 		inode_lock(inode);
> 	}

Yes, you are right.
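
The two patterns can be compared side by side in a small userspace
sketch. This is only an approximation under stated assumptions: the
inode lock is an rwsem in the kernel, whereas this model uses a pthread
mutex, and struct model_inode, lock_try_then_block() and lock_by_mode()
are hypothetical names that exist only in this example.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; the mutex models the inode lock. */
struct model_inode {
	pthread_mutex_t lock;
};

/*
 * First pattern: always attempt a trylock, then fall back to a blocking
 * lock. A blocking caller that loses the trylock touches the contended
 * lock twice.
 */
static int lock_try_then_block(struct model_inode *inode, bool nowait)
{
	if (pthread_mutex_trylock(&inode->lock) != 0) {
		if (nowait)
			return -EAGAIN;
		pthread_mutex_lock(&inode->lock);
	}
	return 0;
}

/*
 * Second pattern: trylock only when the caller asked not to block;
 * blocking callers go straight to the blocking lock.
 */
static int lock_by_mode(struct model_inode *inode, bool nowait)
{
	if (nowait) {
		if (pthread_mutex_trylock(&inode->lock) != 0)
			return -EAGAIN;
	} else {
		pthread_mutex_lock(&inode->lock);
	}
	return 0;
}
```

Both helpers return 0 with the lock held or -EAGAIN without it, so they
are interchangeable from the caller's point of view; the difference is
only in how many operations a blocking caller issues against a
contended lock.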

History: commit 91f9943e1c7b ("fs: support RWF_NOWAIT for buffered
reads") introduced the first locking pattern you describe into XFS.

That was followed soon after by:

commit 942491c9e6d631c012f3c4ea8e7777b0b02edeab
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Oct 23 18:31:50 2017 -0700

    xfs: fix AIM7 regression
    
    Apparently our current rwsem code doesn't like doing the trylock, then
    lock for real scheme.  So change our read/write methods to just do the
    trylock for the RWF_NOWAIT case.  This fixes a ~25% regression in
    AIM7.
    
    Fixes: 91f9943e ("fs: support RWF_NOWAIT for buffered reads")
    Reported-by: kernel test robot <xiaolong.ye@intel.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

That commit changed all the trylock/eagain/lock patterns in XFS to the
second form you quote. None of the other filesystems had AIM7
regressions reported against them, so nobody changed them....

> Obviously this isn't going to improve scalability to a very significant
> degree. But not unnecessarily doing two atomic ops on a contended lock
> can't hurt scalability either. Also, the current code just seems
> confusing.
> 
> Am I missing something?

Just that the sort of performance regression testing that uncovers
this sort of thing isn't widely done, and most filesystems are
concurrency limited in some way before they hit inode lock
scalability issues. Hence filesystem concurrency-focused benchmarks
that could uncover it (like aim7) don't, because the inode locks
don't end up stressed enough to make a difference to benchmark
performance.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 19+ messages
2019-09-10 22:33 Odd locking pattern introduced as part of "nowait aio support" Andres Freund
2019-09-11  4:04 ` Dave Chinner [this message]
2019-09-11  9:39   ` Andres Freund
2019-09-11 10:19     ` Christoph Hellwig
2019-09-11 10:31     ` Ritesh Harjani
2019-09-11 10:55     ` Goldwyn Rodrigues
2019-09-11 16:45     ` Fix inode sem regression for nowait Goldwyn Rodrigues
2019-09-11 16:45       ` [PATCH 1/3] btrfs: fix inode rwsem regression Goldwyn Rodrigues
2019-09-11 17:21         ` David Sterba
2019-09-11 16:45       ` [PATCH 2/3] ext4: " Goldwyn Rodrigues
2019-09-12  8:52         ` Ritesh Harjani
2019-09-12  9:26           ` Matthew Bobrowski
2019-09-23 10:10         ` Jan Kara
2019-09-23 13:18           ` Theodore Y. Ts'o
2019-09-11 16:45       ` [PATCH 3/3] f2fs: " Goldwyn Rodrigues
2019-09-12  6:17         ` Chao Yu
2019-09-13 19:46         ` Jaegeuk Kim
2019-09-16  1:16           ` Chao Yu
2019-09-11 12:25   ` Odd locking pattern introduced as part of "nowait aio support" Goldwyn Rodrigues
