From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sage Weil
Subject: Re: Fwd: Fwd: [newstore (again)] how disable double write WAL
Date: Mon, 22 Feb 2016 07:01:43 -0500 (EST)
Message-ID:
References: <5661F3A9.8070703@redhat.com> <20151208044640.GL1983@devil.localdomain> <20160216033538.GB2005@devil.localdomain> <20160219052637.GF2005@devil.localdomain>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Return-path:
Received: from cobra.newdream.net ([66.33.216.30]:59714 "EHLO cobra.newdream.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754549AbcBVMCL (ORCPT ); Mon, 22 Feb 2016 07:02:11 -0500
In-Reply-To: <20160219052637.GF2005@devil.localdomain>
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Dave Chinner
Cc: David Casier, Ric Wheeler, Ceph Development, Brian Foster, Eric Sandeen, Benoît LORIOT

On Fri, 19 Feb 2016, Dave Chinner wrote:
> On Tue, Feb 16, 2016 at 09:39:28AM +0100, David Casier wrote:
> > "With this model, filestore rearrange the tree very
> > frequently : + 40 I/O every 32 objects link/unlink."
> > It is the consequence of parameters :
> > filestore_merge_threshold = 2
> > filestore_split_multiple = 1
> >
> > Not of ext4 customization.
>
> It's a function of the directory structure you are using to work
> around the scalability deficiencies of the ext4 directory structure.
> i.e. the root cause is that you are working around an ext4 problem.

If only it were just that :(.  The other problem is that we need
in-order enumeration of files/objects (with a particular sort order we
define) and POSIX doesn't give us that.  Small directories let us read
the whole thing and sort in memory.

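
A rough sketch of the two points above, in Python for brevity (this is
not the FileStore code). It assumes the split rule given in the
FileStore config documentation, where a directory splits once it holds
more than filestore_split_multiple * abs(filestore_merge_threshold) * 16
objects, which would account for the "every 32 objects" figure quoted
above. The helper names, the "name_HEX" file naming, and the sort key
are all made up for illustration; the real sort order is different.

```python
import os
import tempfile


def split_threshold(split_multiple, merge_threshold):
    """Objects a directory may hold before it is split.

    Assumes the rule from the FileStore config documentation:
    split_multiple * abs(merge_threshold) * 16.
    """
    return split_multiple * abs(merge_threshold) * 16


def example_key(name):
    """Hypothetical sort order: numeric value of the hex suffix after '_'."""
    return int(name.rsplit("_", 1)[1], 16)


def list_in_order(path, sort_key):
    """Read the whole directory, then impose our own order in memory.

    os.scandir() yields entries in arbitrary order, so the sort has to
    happen after everything has been read. That is only cheap while the
    directory stays small.
    """
    names = [entry.name for entry in os.scandir(path) if entry.is_file()]
    return sorted(names, key=sort_key)


def demo():
    """Create three empty files and list them in the hypothetical order."""
    with tempfile.TemporaryDirectory() as d:
        for name in ("a_0F", "b_02", "c_A0"):
            open(os.path.join(d, name), "w").close()
        return list_in_order(d, example_key)


if __name__ == "__main__":
    print(split_threshold(split_multiple=1, merge_threshold=2))
    print(demo())
```

The cost of list_in_order() grows with the directory size, since every
entry is read before the first one can be returned in order; that is
the trade-off behind keeping directories small.
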
If there is a 'good' directory size that tends to have a small/minimal
number of IOs for listing all files, it may make sense to change the
defaults (picked semi-randomly several years back), but beyond that
there isn't much to do here except wait for the replacement for this
whole module that doesn't try to map our namespace onto POSIX's.
Optimizations to any of this FileStore code will see limited mileage
since it'll be deprecated shortly anyway...

sage