Date: Wed, 17 Feb 2016 08:16:32 +0100
From: Ingo Molnar
To: Waiman Long
Cc: Alexander Viro, Jan Kara, Jeff Layton, "J. Bruce Fields", Tejun Heo,
    Christoph Lameter, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Andi Kleen,
    Dave Chinner, Scott J Norton, Douglas Hatch, Linus Torvalds,
    Andrew Morton, Peter Zijlstra, Thomas Gleixner
Subject: Re: [RRC PATCH 2/2] vfs: Use per-cpu list for superblock's inode list
Message-ID: <20160217071632.GA18403@gmail.com>
References: <1455672680-7153-1-git-send-email-Waiman.Long@hpe.com>
    <1455672680-7153-3-git-send-email-Waiman.Long@hpe.com>
In-Reply-To: <1455672680-7153-3-git-send-email-Waiman.Long@hpe.com>
X-Mailing-List: linux-kernel@vger.kernel.org


* Waiman Long wrote:

> When many threads are trying to add or delete inodes to or from
> a superblock's s_inodes list, spinlock contention on the list can
> become a performance bottleneck.
>
> This patch changes the s_inodes field to become a per-cpu list with
> per-cpu spinlocks.
>
> The test is an exit microbenchmark that creates a large number of
> threads, attaches many inodes to them and then exits. The runtimes of
> that microbenchmark with 1000 threads before and after the patch on a
> 4-socket Intel E7-4820 v3 system (40 cores, 80 threads) were as
> follows:
>
>   Kernel            Elapsed Time    System Time
>   ------            ------------    -----------
>   Vanilla 4.5-rc4      65.29s         82m14s
>   Patched 4.5-rc4      22.81s         23m03s
>
> Before the patch, spinlock contention at the inode_sb_list_add()
> function at the startup phase and the inode_sb_list_del() function at
> the exit phase were about 79% and 93% of total CPU time respectively
> (as measured by perf). After the patch, the percpu_list_add()
> function consumed only about 0.04% of CPU time at startup phase. The
> percpu_list_del() function consumed about 0.4% of CPU time at exit
> phase. There was still some spinlock contention, but it happened
> elsewhere.

Pretty impressive IMHO!

Just for the record, here's your former 'batched list' number inserted into the
above table:

  Kernel                       Elapsed Time    System Time
  ------                       ------------    -----------
  Vanilla      [v4.5-rc4]      65.29s          82m14s
  batched list [v4.4]          45.69s          49m44s
  percpu list  [v4.5-rc4]      22.81s          23m03s

i.e. the proper per-CPU data structure and the resulting improvement in cache
locality gave another doubling in performance (45.69s -> 22.81s elapsed).

Just out of curiosity, could you post the profile of the latest patches - is
there any (bigger) SMP overhead left, or is the profile pretty flat now?

Thanks,

	Ingo
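
For readers without the patch at hand, the idea of "a per-cpu list with per-cpu spinlocks" can be illustrated with a small userspace sketch. This is not the code from the patch: the names `sharded_list`, `shard`, `node`, `sharded_list_add()`, `sharded_list_del()` and `sharded_list_count()` are hypothetical, `NR_SHARDS` stands in for the number of CPUs, pthread mutexes stand in for kernel spinlocks, and the caller passes the CPU number explicitly. Each node remembers which sub-list it was added to, so a delete takes only that sub-list's lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_SHARDS 4             /* assumption: stand-in for the CPU count */

struct shard;

struct node {
    struct node *prev, *next;
    struct shard *home;         /* shard whose lock protects this node */
};

struct shard {
    pthread_mutex_t lock;       /* stand-in for a per-CPU spinlock */
    struct node head;           /* sentinel of a circular doubly linked list */
};

struct sharded_list {
    struct shard shards[NR_SHARDS];
};

static void sharded_list_init(struct sharded_list *l)
{
    for (int i = 0; i < NR_SHARDS; i++) {
        struct shard *s = &l->shards[i];

        pthread_mutex_init(&s->lock, NULL);
        s->head.prev = s->head.next = &s->head;
        s->head.home = s;
    }
}

/* Insert on the caller's "local" shard; only that shard's lock is taken. */
static void sharded_list_add(struct sharded_list *l, struct node *n,
                             unsigned int cpu)
{
    struct shard *s = &l->shards[cpu % NR_SHARDS];

    pthread_mutex_lock(&s->lock);
    n->home = s;
    n->next = s->head.next;
    n->prev = &s->head;
    s->head.next->prev = n;
    s->head.next = n;
    pthread_mutex_unlock(&s->lock);
}

/*
 * Remove a node from whichever shard it was added to. The caller is
 * assumed to own the node, so reading n->home without the lock is safe
 * in this sketch.
 */
static void sharded_list_del(struct node *n)
{
    struct shard *s = n->home;

    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
    n->home = NULL;
    pthread_mutex_unlock(&s->lock);
}

/* Walk every shard in turn; whole-list iteration is the slow path. */
static size_t sharded_list_count(struct sharded_list *l)
{
    size_t total = 0;

    for (int i = 0; i < NR_SHARDS; i++) {
        struct shard *s = &l->shards[i];

        pthread_mutex_lock(&s->lock);
        for (struct node *n = s->head.next; n != &s->head; n = n->next)
            total++;
        pthread_mutex_unlock(&s->lock);
    }
    return total;
}
```

The trade-off in this sketch is that adds and deletes on different shards never contend for the same lock, while anything that needs the whole list has to visit every shard.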