Date: Thu, 18 Apr 2019 13:10:13 +1000
From: Dave Chinner <david@fromorbit.com>
To: Davidlohr Bueso
Cc: Jan Kara, Amir Goldstein, "Darrick J. Wong", Christoph Hellwig,
	Matthew Wilcox, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [POC][PATCH] xfs: reduce ilock contention on buffered randrw
	workload
Message-ID: <20190418031013.GX29573@dread.disaster.area>
References: <20190404165737.30889-1-amir73il@gmail.com>
	<20190404211730.GD26298@dastard>
	<20190408103303.GA18239@quack2.suse.cz>
	<1554741429.3326.43.camel@suse.com>
	<20190411011117.GC29573@dread.disaster.area>
	<20190416122240.GN29573@dread.disaster.area>
In-Reply-To: <20190416122240.GN29573@dread.disaster.area>

On Tue, Apr 16, 2019 at 10:22:40PM +1000, Dave Chinner wrote:
> On Thu, Apr 11, 2019 at 11:11:17AM +1000, Dave Chinner wrote:
> > On Mon, Apr 08, 2019 at 09:37:09AM -0700, Davidlohr Bueso wrote:
> > > On Mon, 2019-04-08 at 12:33 +0200, Jan Kara wrote:
> > > > On Fri 05-04-19 08:17:30, Dave Chinner wrote:
> > > > > FYI, I'm working on a range lock implementation that should
> > > > > both solve the performance issue and the reader starvation
> > > > > issue at the same time by allowing concurrent buffered reads
> > > > > and writes to different file ranges.
> > > >
> > > > Are you aware of range locks Davidlohr has implemented [1]? It
> > > > didn't get merged because he had no in-tree user at the time
> > > > (he was more aiming at converting mmap_sem, which is rather
> > > > difficult). But the generic lock implementation should be well
> > > > usable.
> > > >
> > > > Added Davidlohr to CC.
.....
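[ For anyone who hasn't dug through either patchset, here's a toy
userspace model of the byte-range lock semantics being discussed.
Names and data structure are hypothetical - this is neither
Davidlohr's implementation nor mine - it just shows the semantics:
disjoint ranges never block each other, and overlapping ranges only
conflict when a writer is involved. ]

/* Toy byte-range lock: a flat list of held ranges under one mutex. */
#include <pthread.h>
#include <stdbool.h>

struct range {
	unsigned long	start, end;	/* inclusive byte range */
	bool		write;		/* writers exclude any overlap */
	struct range	*next;
};

static struct range *held;		/* list of currently held ranges */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t released = PTHREAD_COND_INITIALIZER;

static bool conflicts(const struct range *a, const struct range *b)
{
	if (a->end < b->start || b->end < a->start)
		return false;		/* disjoint ranges never conflict */
	return a->write || b->write;	/* overlapping reads are fine */
}

void range_lock(struct range *r)
{
	struct range *h;

	pthread_mutex_lock(&list_lock);
retry:
	for (h = held; h; h = h->next) {
		if (conflicts(r, h)) {
			/* sleep until something unlocks, then rescan */
			pthread_cond_wait(&released, &list_lock);
			goto retry;
		}
	}
	r->next = held;			/* no conflicts: take the lock */
	held = r;
	pthread_mutex_unlock(&list_lock);
}

void range_unlock(struct range *r)
{
	struct range **p;

	pthread_mutex_lock(&list_lock);
	for (p = &held; *p != r; p = &(*p)->next)
		;			/* caller must hold r */
	*p = r->next;
	pthread_cond_broadcast(&released);	/* wake waiters to recheck */
	pthread_mutex_unlock(&list_lock);
}

[ The single mutex and linear scan in that toy are, of course, exactly
the serialisation a real implementation has to avoid - which is where
the argument below about which tree structure to index ranges with
comes from. ]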
> Fio randrw numbers on a single file on a pmem device on a 16p
> machine using 4kB AIO-DIO iodepth 128 w/ fio on 5.1.0-rc3:
>
>                    IOPS read/write (direct IO)
> fio processes       rwsem           rangelock
>       1            78k / 78k        75k / 75k
>       2           131k / 131k      123k / 123k
>       4           267k / 267k      183k / 183k
>       8           372k / 372k      177k / 177k
>      16           315k / 315k      135k / 135k
....
> FWIW, I'm not convinced about the scalability of the rb/interval
> tree, to tell you the truth. We got rid of the rbtree in XFS for
> cache indexing because the multi-level pointer chasing was just too
> expensive to do under a spinlock - it's just not a cache efficient
> structure for random index object storage.

Yeah, definitely not convinced an rbtree is the right structure
here. Locking of the tree is the limitation....

> FWIW, I have a basic hack to replace the i_rwsem in XFS with a full
> range read or write lock with my XFS range lock implementation, so
> it just behaves like a rwsem at this point. It is not in any way
> optimised yet. Numbers for the same AIO-DIO test are:

Now the stuff I've been working on has the same interface as
Davidlohr's patch, so I can swap and change them without thinking
about it. It's still completely unoptimised, but:

                   IOPS read/write (direct IO)
processes       rwsem         DB rangelock    XFS rangelock
    1          78k / 78k       75k / 75k       72k / 72k
    2         131k / 131k     123k / 123k     133k / 133k
    4         267k / 267k     183k / 183k     237k / 237k
    8         372k / 372k     177k / 177k     265k / 265k
   16         315k / 315k     135k / 135k     228k / 228k

It's still substantially faster than the interval tree code.

BTW, if I take away the rwsem serialisation altogether, this test
tops out at just under 500k/500k at 8 threads, and at 16 threads it
has started dropping off (~440k/440k). So the rwsem is a scalability
limitation at just 8 threads....

/me goes off and thinks more about adding optimistic lock coupling
to the XFS iext btree to get rid of the need for tree-wide locking
altogether
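[ For the archives, the shape of that trick is the seqcount-style
protocol sketched below - hypothetical userspace C, not the actual
iext btree code. Readers traverse nodes locklessly and revalidate a
per-node version afterwards, so pure lookups never take a lock or
dirty a shared cache line; writers make the version odd around their
modifications so racing readers discard what they saw and retry. ]

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct olc_node {
	atomic_ulong	version;	/* odd while a writer owns the node */
	pthread_mutex_t	lock;		/* serialises writers only */
	/* keys, records, child pointers, etc. */
};

/* Reader entry: snapshot the version, back off if a writer is active. */
static bool read_begin(struct olc_node *n, unsigned long *v)
{
	*v = atomic_load_explicit(&n->version, memory_order_acquire);
	return !(*v & 1);
}

/* Reader exit: fail if the node changed while we were looking at it. */
static bool read_validate(struct olc_node *n, unsigned long v)
{
	atomic_thread_fence(memory_order_acquire);	/* rmb before reload */
	return atomic_load_explicit(&n->version,
				    memory_order_relaxed) == v;
}

static void write_lock_node(struct olc_node *n)
{
	pthread_mutex_lock(&n->lock);
	/* version goes odd: optimistic readers now back off and retry */
	atomic_fetch_add_explicit(&n->version, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* wmb before stores */
}

static void write_unlock_node(struct olc_node *n)
{
	atomic_thread_fence(memory_order_release);	/* stores before bump */
	/* version goes even again but changed: validation now fails */
	atomic_fetch_add_explicit(&n->version, 1, memory_order_relaxed);
	pthread_mutex_unlock(&n->lock);
}

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com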