From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161200AbcBQKhr (ORCPT ); Wed, 17 Feb 2016 05:37:47 -0500 Received: from ipmail05.adl6.internode.on.net ([150.101.137.143]:32984 "EHLO ipmail05.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161178AbcBQKho (ORCPT ); Wed, 17 Feb 2016 05:37:44 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2AbBwBoTMRWXJbY03ZegzqBP4Joo3gBAQEBAQEGi3KFSIQIhgcEAgKBP00BAQEBAQEHRECEQgEBBCcTHCMQCAMYCSUPBSUDBxoTiBm7GwEBAQEGAh4YhTKFA4hvBYdVhVSJWo1QjnyOR4JkGYFcKC6IYQEBAQ Date: Wed, 17 Feb 2016 21:37:39 +1100 From: Dave Chinner To: Waiman Long Cc: Alexander Viro , Jan Kara , Jeff Layton , "J. Bruce Fields" , Tejun Heo , Christoph Lameter , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Ingo Molnar , Peter Zijlstra , Andi Kleen , Dave Chinner , Scott J Norton , Douglas Hatch Subject: Re: [RRC PATCH 2/2] vfs: Use per-cpu list for superblock's inode list Message-ID: <20160217103739.GP14668@dastard> References: <1455672680-7153-1-git-send-email-Waiman.Long@hpe.com> <1455672680-7153-3-git-send-email-Waiman.Long@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1455672680-7153-3-git-send-email-Waiman.Long@hpe.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Feb 16, 2016 at 08:31:20PM -0500, Waiman Long wrote: > When many threads are trying to add or delete inode to or from > a superblock's s_inodes list, spinlock contention on the list can > become a performance bottleneck. > > This patch changes the s_inodes field to become a per-cpu list with > per-cpu spinlocks. > > With an exit microbenchmark that creates a large number of threads, > attachs many inodes to them and then exits. The runtimes of that > microbenchmark with 1000 threads before and after the patch on a > 4-socket Intel E7-4820 v3 system (40 cores, 80 threads) were as > follows: > > Kernel Elapsed Time System Time > ------ ------------ ----------- > Vanilla 4.5-rc4 65.29s 82m14s > Patched 4.5-rc4 22.81s 23m03s Pretty good :) My fsmark tests usually show up a fair bit of contention - moving 250k inodes through the cache every second over 16p does generate a bit of load on the list. The patch makes the inode list add/del operations disappear completely from the perf profiles, and there's a marginal decrease in runtime (~4m40s vs 4m30s). I think the global lock is right on the edge of breakdown under this load, though, so if I was testing on a larger system I think the difference would be much bigger. I'll run some more testing on it, see if anything breaks. A few comments on the code follow. > @@ -1866,8 +1866,8 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg) > { > struct inode *inode, *old_inode = NULL; > > - spin_lock(&blockdev_superblock->s_inode_list_lock); > - list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) { > + for_all_percpu_list_entries_simple(inode, percpu_lock, > + blockdev_superblock->s_inodes_cpu, i_sb_list) { This is kind what I meant about names getting way too long. How about something like: #define walk_sb_inodes(inode, sb, pcpu_lock) \ for_all_percpu_list_entries_simple(inode, pcpu_lock, \ sb->s_inodes_list, i_sb_list) #define walk_sb_inodes_end(pcpu_lock) end_all_percpu_list_entries(pcpu_lock) for brevity? > @@ -189,7 +190,7 @@ void fsnotify_unmount_inodes(struct super_block *sb) > spin_unlock(&inode->i_lock); > > /* In case the dropping of a reference would nuke next_i. */ > - while (&next_i->i_sb_list != &sb->s_inodes) { > + while (&next_i->i_sb_list.list != percpu_head) { > spin_lock(&next_i->i_lock); > if (!(next_i->i_state & (I_FREEING | I_WILL_FREE)) && > atomic_read(&next_i->i_count)) { > @@ -199,16 +200,16 @@ void fsnotify_unmount_inodes(struct super_block *sb) > break; > } > spin_unlock(&next_i->i_lock); > - next_i = list_next_entry(next_i, i_sb_list); > + next_i = list_next_entry(next_i, i_sb_list.list); pcpu_list_next_entry(next_i, i_sb_list)? > @@ -1397,9 +1398,8 @@ struct super_block { > */ > int s_stack_depth; > > - /* s_inode_list_lock protects s_inodes */ > - spinlock_t s_inode_list_lock ____cacheline_aligned_in_smp; > - struct list_head s_inodes; /* all inodes */ > + /* The percpu locks protect s_inodes_cpu */ > + PERCPU_LIST_HEAD(s_inodes_cpu); /* all inodes */ There is no need to encode the type of list into the name. i.e. drop the "_cpu" suffix - we can see it's a percpu list from the declaration. Cheers, Dave. -- Dave Chinner david@fromorbit.com