From: Ritesh Harjani <riteshh@linux.ibm.com> To: Andres Freund <andres@anarazel.de>, Dave Chinner <david@fromorbit.com>, David Sterba <dsterba@suse.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>, linux-fsdevel@vger.kernel.org, jack@suse.com, hch@infradead.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org Subject: Re: Odd locking pattern introduced as part of "nowait aio support" Date: Wed, 11 Sep 2019 16:01:14 +0530 Message-ID: <20190911103117.E32C34C044@d06av22.portsmouth.uk.ibm.com> (raw) In-Reply-To: <20190911093926.pfkkx25mffzeuo32@alap3.anarazel.de> Hi, On 9/11/19 3:09 PM, Andres Freund wrote: > Hi, > > On 2019-09-11 14:04:20 +1000, Dave Chinner wrote: >> On Tue, Sep 10, 2019 at 03:33:27PM -0700, Andres Freund wrote: >>> Hi, >>> >>> Especially with buffered io it's fairly easy to hit contention on the >>> inode lock, during writes. With something like io_uring, it's even >>> easier, because it currently (but see [1]) farms out buffered writes to >>> workers, which then can easily contend on the inode lock, even if only >>> one process submits writes. But I've seen it in plenty other cases too. >>> >>> Looking at the code I noticed that several parts of the "nowait aio >>> support" (cf 728fbc0e10b7f3) series introduced code like: >>> >>> static ssize_t >>> ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) >>> { >>> ... >>> if (!inode_trylock(inode)) { >>> if (iocb->ki_flags & IOCB_NOWAIT) >>> return -EAGAIN; >>> inode_lock(inode); >>> } >> >> The ext4 code is just buggy here - we don't support RWF_NOWAIT on >> buffered write > > But both buffered and non-buffered writes go through > ext4_file_write_iter(). And there's a preceding check, at least these > days, preventing IOCB_NOWAIT to apply to buffered writes: > > if (!o_direct && (iocb->ki_flags & IOCB_NOWAIT)) > return -EOPNOTSUPP; > -EOPNOTSUPP is now taken care in ext4 iomap patch series as well. > > I do really wish buffered NOWAIT was supported... The overhead of having > to do async buffered writes through the workqueue in io_uring, even if > an already existing page is targeted, is quite noticable. Even if it > failed with EAGAIN as soon as the buffered write's target isn't in the > page cache, it'd be worthwhile. > > >>> isn't trylocking and then locking in a blocking fashion an inefficient >>> pattern? I.e. I think this should be >>> >>> if (iocb->ki_flags & IOCB_NOWAIT) { >>> if (!inode_trylock(inode)) >>> return -EAGAIN; >>> } >>> else >>> inode_lock(inode); >> >> Yes, you are right. >> >> History: commit 91f9943e1c7b ("fs: support RWF_NOWAIT >> for buffered reads") which introduced the first locking pattern >> you describe in XFS. > > Seems, as part of the nowait work, the pattern was introduced in ext4, > xfs and btrfs. And fixed in xfs. > > >> That was followed soon after by: >> >> commit 942491c9e6d631c012f3c4ea8e7777b0b02edeab >> Author: Christoph Hellwig <hch@lst.de> >> Date: Mon Oct 23 18:31:50 2017 -0700 >> >> xfs: fix AIM7 regression >> >> Apparently our current rwsem code doesn't like doing the trylock, then >> lock for real scheme. So change our read/write methods to just do the >> trylock for the RWF_NOWAIT case. This fixes a ~25% regression in >> AIM7. >> >> Fixes: 91f9943e ("fs: support RWF_NOWAIT for buffered reads") >> Reported-by: kernel test robot <xiaolong.ye@intel.com> >> Signed-off-by: Christoph Hellwig <hch@lst.de> >> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> >> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> >> >> Which changed all the trylock/eagain/lock patterns to the second >> form you quote. None of the other filesystems had AIM7 regressions >> reported against them, so nobody changed them.... > > :( > > >>> Obviously this isn't going to improve scalability to a very significant >>> degree. But not unnecessarily doing two atomic ops on a contended lock >>> can't hurt scalability either. Also, the current code just seems >>> confusing. >>> >>> Am I missing something? >> >> Just that the sort of performance regression testing that uncovers >> this sort of thing isn't widely done, and most filesystems are >> concurrency limited in some way before they hit inode lock >> scalability issues. Hence filesystem concurrency foccussed >> benchmarks that could uncover it (like aim7) won't because the inode >> locks don't end up stressed enough to make a difference to >> benchmark performance. > > Ok. Goldwyn, do you want to write a patch, or do you want me to write > one up? I am anyways looking into ext4 performance issue of mixed parallel DIO workload. This will require some new APIs for inode locking similar to that of XFS. In that I can take care of this symantics reported here by you (which is taken care by XFS in above patch) for ext4. Thanks -ritesh > > > Greetings, > > Andres Freund >
next prev parent reply index Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top [not found] <20190910223327.mnegfoggopwqqy33@alap3.anarazel.de> [not found] ` <20190911040420.GB27547@dread.disaster.area> 2019-09-11 9:39 ` Andres Freund 2019-09-11 10:19 ` Christoph Hellwig 2019-09-11 10:31 ` Ritesh Harjani [this message] 2019-09-11 10:55 ` Goldwyn Rodrigues 2019-09-11 16:45 ` Fix inode sem regression for nowait Goldwyn Rodrigues 2019-09-11 16:45 ` [PATCH 1/3] btrfs: fix inode rwsem regression Goldwyn Rodrigues 2019-09-11 17:21 ` David Sterba 2019-09-11 16:45 ` [PATCH 2/3] ext4: " Goldwyn Rodrigues 2019-09-12 8:52 ` Ritesh Harjani 2019-09-12 9:26 ` Matthew Bobrowski 2019-09-23 10:10 ` Jan Kara 2019-09-23 13:18 ` Theodore Y. Ts'o 2019-09-11 16:45 ` [PATCH 3/3] f2fs: " Goldwyn Rodrigues 2019-09-12 6:17 ` Chao Yu 2019-09-13 19:46 ` Jaegeuk Kim 2019-09-16 1:16 ` Chao Yu
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20190911103117.E32C34C044@d06av22.portsmouth.uk.ibm.com \ --to=riteshh@linux.ibm.com \ --cc=andres@anarazel.de \ --cc=david@fromorbit.com \ --cc=dsterba@suse.com \ --cc=hch@infradead.org \ --cc=jack@suse.com \ --cc=linux-btrfs@vger.kernel.org \ --cc=linux-ext4@vger.kernel.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=rgoldwyn@suse.de \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Linux-BTRFS Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \ linux-btrfs@vger.kernel.org public-inbox-index linux-btrfs Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs AGPL code for this site: git clone https://public-inbox.org/public-inbox.git