From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43A80CCA473 for ; Wed, 22 Jun 2022 03:23:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356516AbiFVDXU (ORCPT ); Tue, 21 Jun 2022 23:23:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46128 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1356339AbiFVDXT (ORCPT ); Tue, 21 Jun 2022 23:23:19 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58ABA5FE1; Tue, 21 Jun 2022 20:23:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=gd8gvPDKJUfSWTRzefun4vrxfLe3EUIpuUk+m4wcIrs=; b=mUFsyC5knxIPjYhfw0K0A1iQbj dB7zXIFcdmPikLtc9dOvk0aQMDn3+ulelVlNM8gwKIiRST+4tHtOU6io6l+N+lnNLrEtFOqqVrzP0 61wHQXv/yurjgu/PLcmJUOTOYgEbk9aU3mlv9R/OUxgozAAjKVI3oVZIHXv0+pCWw/TDqbgD05JQR 1HNklpv+YWC5deP5pJ7UXh/16VeFdVSSabmSA9liHLfheNHY0nxLFb6Q9UO52Y7YdBwfBH6Y9p4Rg BcpO1bThZ8+pGXnbs+bUI1VFr4pjw50X7/rmuJ5RLqNS/l9vs+6QCdrS+3JluE3zfaae4jS57bTsj TQa0YaqA==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1o3qxP-006gdu-B9; Wed, 22 Jun 2022 03:23:07 +0000 Date: Wed, 22 Jun 2022 04:23:07 +0100 From: Matthew Wilcox To: Amir Goldstein Cc: Jan Kara , "Darrick J . Wong" , Dave Chinner , Christoph Hellwig , linux-xfs , linux-fsdevel Subject: Re: [POC][PATCH] xfs: reduce ilock contention on buffered randrw workload Message-ID: References: <20190408141114.GC15023@quack2.suse.cz> <20190409082605.GA8107@quack2.suse.cz> <20220617151135.yc6vytge6hjabsuz@quack3> <20220620091136.4uosazpwkmt65a5d@quack3.lan> <20220621085956.y5wyopfgzmqkaeiw@quack3.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Tue, Jun 21, 2022 at 03:53:33PM +0300, Amir Goldstein wrote: > On Tue, Jun 21, 2022 at 11:59 AM Jan Kara wrote: > > > > On Tue 21-06-22 10:49:48, Amir Goldstein wrote: > > > > How exactly do you imagine the synchronization of buffered read against > > > > buffered write would work? Lock all pages for the read range in the page > > > > cache? You'd need to be careful to not bring the machine OOM when someone > > > > asks to read a huge range... > > > > > > I imagine that the atomic r/w synchronisation will remain *exactly* as it is > > > today by taking XFS_IOLOCK_SHARED around generic_file_read_iter(), > > > when reading data into user buffer, but before that, I would like to issue > > > and wait for read of the pages in the range to reduce the probability > > > of doing the read I/O under XFS_IOLOCK_SHARED. > > > > > > The pre-warm of page cache does not need to abide to the atomic read > > > semantics and it is also tolerable if some pages are evicted in between > > > pre-warn and read to user buffer - in the worst case this will result in > > > I/O amplification, but for the common case, it will be a big win for the > > > mixed random r/w performance on xfs. > > > > > > To reduce risk of page cache thrashing we can limit this optimization > > > to a maximum number of page cache pre-warm. > > > > > > The questions are: > > > 1. Does this plan sound reasonable? > > > > Ah, I see now. So essentially the idea is to pull the readahead (which is > > currently happening from filemap_read() -> filemap_get_pages()) out from under > > the i_rwsem. It looks like a fine idea to me. > > Great! > Anyone doesn't like the idea or has another suggestion? I guess I'm still confused. The problem was the the XFS IOLOCK was being held while we waited for readahead to complete. To fix this, you're planning on waiting for readahead to complete with the invalidate lock held? I don't see the benefit. I see the invalidate_lock as being roughly equivalent to the IOLOCK, just pulled up to the VFS. Is that incorrect?