From: Andres Freund <andres@anarazel.de> To: Dave Chinner <david@fromorbit.com>, David Sterba <dsterba@suse.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>, linux-fsdevel@vger.kernel.org, jack@suse.com, hch@infradead.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org Subject: Re: Odd locking pattern introduced as part of "nowait aio support" Date: Wed, 11 Sep 2019 02:39:26 -0700 Message-ID: <20190911093926.pfkkx25mffzeuo32@alap3.anarazel.de> (raw) In-Reply-To: <20190911040420.GB27547@dread.disaster.area> Hi, On 2019-09-11 14:04:20 +1000, Dave Chinner wrote: > On Tue, Sep 10, 2019 at 03:33:27PM -0700, Andres Freund wrote: > > Hi, > > > > Especially with buffered io it's fairly easy to hit contention on the > > inode lock, during writes. With something like io_uring, it's even > > easier, because it currently (but see [1]) farms out buffered writes to > > workers, which then can easily contend on the inode lock, even if only > > one process submits writes. But I've seen it in plenty other cases too. > > > > Looking at the code I noticed that several parts of the "nowait aio > > support" (cf 728fbc0e10b7f3) series introduced code like: > > > > static ssize_t > > ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > > { > > ... > > if (!inode_trylock(inode)) { > > if (iocb->ki_flags & IOCB_NOWAIT) > > return -EAGAIN; > > inode_lock(inode); > > } > > The ext4 code is just buggy here - we don't support RWF_NOWAIT on > buffered writes. But both buffered and non-buffered writes go through ext4_file_write_iter(). And there's a preceding check, at least these days, preventing IOCB_NOWAIT to apply to buffered writes: if (!o_direct && (iocb->ki_flags & IOCB_NOWAIT)) return -EOPNOTSUPP; I do really wish buffered NOWAIT was supported... The overhead of having to do async buffered writes through the workqueue in io_uring, even if an already existing page is targeted, is quite noticable. Even if it failed with EAGAIN as soon as the buffered write's target isn't in the page cache, it'd be worthwhile. > > isn't trylocking and then locking in a blocking fashion an inefficient > > pattern? I.e. I think this should be > > > > if (iocb->ki_flags & IOCB_NOWAIT) { > > if (!inode_trylock(inode)) > > return -EAGAIN; > > } > > else > > inode_lock(inode); > > Yes, you are right. > > History: commit 91f9943e1c7b ("fs: support RWF_NOWAIT > for buffered reads") which introduced the first locking pattern > you describe in XFS. Seems, as part of the nowait work, the pattern was introduced in ext4, xfs and btrfs. And fixed in xfs. > That was followed soon after by: > > commit 942491c9e6d631c012f3c4ea8e7777b0b02edeab > Author: Christoph Hellwig <hch@lst.de> > Date: Mon Oct 23 18:31:50 2017 -0700 > > xfs: fix AIM7 regression > > Apparently our current rwsem code doesn't like doing the trylock, then > lock for real scheme. So change our read/write methods to just do the > trylock for the RWF_NOWAIT case. This fixes a ~25% regression in > AIM7. > > Fixes: 91f9943e ("fs: support RWF_NOWAIT for buffered reads") > Reported-by: kernel test robot <xiaolong.ye@intel.com> > Signed-off-by: Christoph Hellwig <hch@lst.de> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > Which changed all the trylock/eagain/lock patterns to the second > form you quote. None of the other filesystems had AIM7 regressions > reported against them, so nobody changed them.... :( > > Obviously this isn't going to improve scalability to a very significant > > degree. But not unnecessarily doing two atomic ops on a contended lock > > can't hurt scalability either. Also, the current code just seems > > confusing. > > > > Am I missing something? > > Just that the sort of performance regression testing that uncovers > this sort of thing isn't widely done, and most filesystems are > concurrency limited in some way before they hit inode lock > scalability issues. Hence filesystem concurrency foccussed > benchmarks that could uncover it (like aim7) won't because the inode > locks don't end up stressed enough to make a difference to > benchmark performance. Ok. Goldwyn, do you want to write a patch, or do you want me to write one up? Greetings, Andres Freund
next parent reply index Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top [not found] <20190910223327.mnegfoggopwqqy33@alap3.anarazel.de> [not found] ` <20190911040420.GB27547@dread.disaster.area> 2019-09-11 9:39 ` Andres Freund [this message] 2019-09-11 10:19 ` Christoph Hellwig 2019-09-11 10:31 ` Ritesh Harjani 2019-09-11 10:55 ` Goldwyn Rodrigues 2019-09-11 16:45 ` Fix inode sem regression for nowait Goldwyn Rodrigues 2019-09-11 16:45 ` [PATCH 1/3] btrfs: fix inode rwsem regression Goldwyn Rodrigues 2019-09-11 17:21 ` David Sterba 2019-09-11 16:45 ` [PATCH 2/3] ext4: " Goldwyn Rodrigues 2019-09-12 8:52 ` Ritesh Harjani 2019-09-12 9:26 ` Matthew Bobrowski 2019-09-23 10:10 ` Jan Kara 2019-09-23 13:18 ` Theodore Y. Ts'o 2019-09-11 16:45 ` [PATCH 3/3] f2fs: " Goldwyn Rodrigues 2019-09-12 6:17 ` Chao Yu 2019-09-13 19:46 ` Jaegeuk Kim 2019-09-16 1:16 ` Chao Yu
Reply instructions: You may reply publically to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20190911093926.pfkkx25mffzeuo32@alap3.anarazel.de \ --to=andres@anarazel.de \ --cc=david@fromorbit.com \ --cc=dsterba@suse.com \ --cc=hch@infradead.org \ --cc=jack@suse.com \ --cc=linux-btrfs@vger.kernel.org \ --cc=linux-ext4@vger.kernel.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=rgoldwyn@suse.de \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Linux-BTRFS Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \ linux-btrfs@vger.kernel.org public-inbox-index linux-btrfs Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs AGPL code for this site: git clone https://public-inbox.org/public-inbox.git