Date: Fri, 2 Jul 2010 03:52:30 +1000
From: Nick Piggin
To: Linus Torvalds
Cc: Dave Chinner, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	John Stultz, Frank Mayhar
Subject: Re: [patch 00/52] vfs scalability patches updated
Message-ID: <20100701175230.GC1830@laptop>
References: <20100624030212.676457061@suse.de> <20100630113054.GL24712@dastard>
	<20100630124049.GH21358@laptop>
User-Agent: Mutt/1.5.20 (2009-06-14)

On Thu, Jul 01, 2010 at 10:35:35AM -0700, Linus Torvalds wrote:
> On Wed, Jun 30, 2010 at 5:40 AM, Nick Piggin wrote:
> >>
> >> That's a pretty big ouch. Why does RCU freeing of inodes cause that
> >> much regression? The RCU freeing is out of line, so where does the big
> >> impact come from?
> >
> > That comes mostly from inability to reuse the cache-hot inode structure,
> > and the cost to go over the deferred RCU list and free them after they
> > get cache cold.
>
> I do wonder if this isn't a big design bug.

It's possible, yes. Although a lot of that drop does come from hitting
RCU and overrunning slab allocator queues. It was, what, closer to 10%
when doing small numbers of creat/unlink loops.

> Most of the time with RCU, we don't need to wait to actually do the
> _freeing_ of the individual data structure, we only need to make sure
> that the data structure remains of the same _type_. IOW, we can free
> it (and re-use it), but the backing storage cannot be released to the
> page cache. That's what SLAB_DESTROY_BY_RCU should give us.
>
> Is that not possible in this situation? Do we really need to keep the
> inode _identity_ around for RCU?
>
> If you use just SLAB_DESTROY_BY_RCU, then inode re-use remains, and
> cache behavior would be much improved. The usual requirement for
> SLAB_DESTROY_BY_RCU is that you only touch a lock (and perhaps
> re-validate the identity) in the RCU-reader paths. Could that be made
> to work?

I definitely thought of that. I actually thought it would not be
possible with the store-free path walk patches, though, because we need
to check some inode properties (e.g. permission), so I was thinking the
usual approach of taking a per-entry lock defeats the whole purpose of
store-free path walk.

But you've got me to think about it again, and it should be possible to
do using just the dentry seqlock. IOW, if the inode gets disconnected
from the dentry (and can then possibly be freed and reused), just retry
the lookup.

It may be a little tricky. I'll wait until the path-walk code is more
polished first.
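Something like the following is the kind of revalidation I mean. This
is only an untested sketch, not code from the patchset: the d_seq
seqcount field and the function name are assumed here, and the mode
test is just a stand-in for the real permission check.

#include <linux/dcache.h>
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/seqlock.h>

/*
 * Sketch only.  Assumes the inode cache is SLAB_DESTROY_BY_RCU: the
 * memory always stays inode-typed, but this particular inode may be
 * freed and reused at any time.  Caller holds rcu_read_lock().
 */
static int exec_permission_rcu(struct dentry *dentry)
{
	struct inode *inode;
	unsigned int seq;
	int ret;

retry:
	/* Sample the dentry seqcount before peeking at the inode. */
	seq = read_seqcount_begin(&dentry->d_seq);
	inode = dentry->d_inode;
	if (!inode)
		return -ENOENT;

	/* May read a recycled inode, but it is still an inode. */
	ret = (inode->i_mode & S_IXUGO) ? 0 : -EACCES;

	/*
	 * If the dentry changed underneath us (e.g. the inode was
	 * disconnected and possibly freed/reused), nothing we read can
	 * be trusted: retry, or drop back to the refcounted slow path.
	 */
	if (read_seqcount_retry(&dentry->d_seq, seq))
		goto retry;
	return ret;
}

The nice part is that the reader stays store-free: no per-inode lock or
reference is taken, only the seqcount is re-checked.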
> Because that 27% drop really is pretty distressing.
>
> That said, open (of the non-creating kind), close, and stat are
> certainly more important than creating and freeing files. So as a
> trade-off, it's probably the right thing to do. But if we can get all
> the improvement _without_ that big downside, that would obviously be
> better yet.

We actually have bigger regressions than that in other code paths. The
RCU freeing of files structs causes a similar regression, about 20-30%,
in open/close. I have a (proper) patch to make that use DESTROY_BY_RCU
too. It slows down fd lookup by a tiny bit, though (lock, load, branch,
increment, unlock versus an atomic inc), but it is the same number of
atomic ops.
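To illustrate the comparison, the two fast-path shapes look roughly
like this. This is an illustrative sketch, not the actual patch: the
function names are made up, and the DESTROY_BY_RCU variant assumes a
scheme in which close() clears the descriptor slot and drops its
reference under the same per-file lock.

#include <linux/fdtable.h>
#include <linux/file.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>

/*
 * Today: struct file is freed via call_rcu(), so the object cannot be
 * reused while we are inside rcu_read_lock(); one atomic op takes the
 * reference.
 */
static struct file *fget_rcu_sketch(struct files_struct *files, unsigned int fd)
{
	struct file *file;

	rcu_read_lock();
	file = fcheck_files(files, fd);
	if (file && !atomic_long_inc_not_zero(&file->f_count))
		file = NULL;		/* raced with the final fput() */
	rcu_read_unlock();
	return file;
}

/*
 * With SLAB_DESTROY_BY_RCU (sketch): the memory stays a struct file
 * but may have been recycled, so lock it, re-check that the fd table
 * still points at this file, then bump the count.  Lock, load, branch,
 * increment, unlock -- still a single atomic op (the lock acquire).
 */
static struct file *fget_dbr_sketch(struct files_struct *files, unsigned int fd)
{
	struct file *file;

	rcu_read_lock();
	file = fcheck_files(files, fd);
	if (file) {
		spin_lock(&file->f_lock);
		if (fcheck_files(files, fd) == file) {
			/* fd table still owns a ref, so this is safe */
			atomic_long_inc(&file->f_count);
			spin_unlock(&file->f_lock);
		} else {
			spin_unlock(&file->f_lock);
			file = NULL;	/* slot was reused; caller retries */
		}
	}
	rcu_read_unlock();
	return file;
}

Both variants do one atomic operation in the common case; the
DESTROY_BY_RCU one just adds the load, compare and unlock around it,
which is where the small fd lookup slowdown comes from.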