From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932446AbcBAVoL (ORCPT ); Mon, 1 Feb 2016 16:44:11 -0500 Received: from g1t6213.austin.hp.com ([15.73.96.121]:46604 "EHLO g1t6213.austin.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753608AbcBAVoH (ORCPT ); Mon, 1 Feb 2016 16:44:07 -0500 Message-ID: <56AFD1A2.1060902@hpe.com> Date: Mon, 01 Feb 2016 16:44:02 -0500 From: Waiman Long User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109 Thunderbird/10.0.12 MIME-Version: 1.0 To: Ingo Molnar CC: Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , Alexander Viro , linux-fsdevel@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Peter Zijlstra , Andi Kleen , Scott J Norton , Douglas Hatch Subject: Re: [PATCH v2 3/3] vfs: Enable list batching for the superblock's inode list References: <1454095846-19628-1-git-send-email-Waiman.Long@hpe.com> <1454095846-19628-4-git-send-email-Waiman.Long@hpe.com> <20160130083557.GA31749@gmail.com> In-Reply-To: <20160130083557.GA31749@gmail.com> Content-Type: multipart/mixed; boundary="------------060907050406000808040700" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is a multi-part message in MIME format. --------------060907050406000808040700 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit On 01/30/2016 03:35 AM, Ingo Molnar wrote: > * Waiman Long wrote: > >> The inode_sb_list_add() and inode_sb_list_del() functions in the vfs >> layer just perform list addition and deletion under lock. So they can >> use the new list batching facility to speed up the list operations >> when many CPUs are trying to do it simultaneously. >> >> In particular, the inode_sb_list_del() function can be a performance >> bottleneck when large applications with many threads and associated >> inodes exit. 
With an exit microbenchmark that creates a large number
>> of threads, attaches many inodes to them and then exits, the runtimes
>> of that microbenchmark with 1000 threads before and after the patch
>> on a 4-socket Intel E7-4820 v3 system (48 cores, 96 threads) were
>> as follows:
>>
>>    Kernel         Elapsed Time    System Time
>>    ------         ------------    -----------
>>    Vanilla 4.4       65.29s         82m14s
>>    Patched 4.4       45.69s         49m44s
>>
>> The elapsed time and the reported system time were reduced by 30%
>> and 40% respectively.
> That's pretty impressive!
>
> I'm wondering, why are inode_sb_list_add()/del() even called for a presumably
> reasonably well cached benchmark running on a system with enough RAM? Are these
> perhaps thousands of temporary files, already deleted, and released when all the
> file descriptors are closed as part of sys_exit()?

The inodes that need to be deleted were actually procfs files, which have
to go away when the processes/threads exit. I encountered this problem
when running the SPECjbb2013 benchmark on a large machine, where it
sometimes seemed to hang for 30 minutes or so after the benchmark
completed. I wrote a simple microbenchmark to simulate this situation;
it is in the attachment.

> If that's the case then I suspect an even bigger win would be not just to batch
> the (sb-)global list fiddling, but to potentially turn the sb list into a
> percpu_alloc() managed set of per CPU lists? It's a bigger change, but it could
> speed up a lot of other temporary file intensive usecases as well, not just
> batched delete.
>
> Thanks,
>
> 	Ingo

Yes, that is another possibility. I will investigate it further.
Thanks for the suggestion.

Cheers,
Longman

--------------060907050406000808040700
Content-Type: text/plain;
 name="exit_test.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="exit_test.c"

/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * Authors: Waiman Long
 */
/*
 * This is an exit test
 */
#include <ctype.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>

#define do_exit()	syscall(SYS_exit, 0)
#define gettid()	syscall(SYS_gettid)
#define MAX_THREADS	2048

static inline void cpu_relax(void)
{
	__asm__ __volatile__("rep;nop": : :"memory");
}

static inline void atomic_inc(volatile int *v)
{
	__asm__ __volatile__("lock incl %0": "+m" (*v));
}

static volatile int exit_now = 0;
static volatile int threadcnt = 0;

/*
 * Walk the calling thread's /proc/<tid> tree so that its entries
 * fill the dentry cache
 */
static void walk_procfs(void)
{
	char cmdbuf[256];
	pid_t tid = gettid();

	snprintf(cmdbuf, sizeof(cmdbuf), "find /proc/%d > /dev/null 2>&1",
		 tid);
	if (system(cmdbuf) < 0)
		perror("system() failed!");
}

static void *exit_thread(void *dummy)
{
	(void)dummy;	/* thread index, unused */

	walk_procfs();
	atomic_inc(&threadcnt);
	/*
	 * Sleep until the exit_now flag is set and then exit this thread.
	 * Nothing in this program sets the flag; the threads are torn
	 * down by the SIGKILL sent from exit_test().
	 */
	while (!exit_now)
		sleep(1);
	do_exit();
	return NULL;	/* Not reachable */
}

static void exit_test(int threads)
{
	pthread_t thread[threads];
	long i;
	int err;
	time_t start = time(NULL);

	for (i = 1; i <= threads; i++) {
		/* pthread_create() returns the error code; errno is not set */
		err = pthread_create(&thread[i - 1], NULL, exit_thread,
				     (void *)i);
		if (err) {
			fprintf(stderr, "pthread_create: %s\n", strerror(err));
			exit(1);
		}
#if 0
		/*
		 * Pipelining to reduce contention & improve speed
		 */
		if ((i & 0xf) == 0)
			while (i - threadcnt > 12)
				usleep(1);
#endif
	}
	while (threadcnt != threads)
		usleep(1);
	walk_procfs();
	printf("Setup time = %lus\n", (unsigned long)(time(NULL) - start));
	printf("Process ready to exit!\n");
	kill(0, SIGKILL);
	exit(0);
}

int main(int argc, char *argv[])
{
	int   tcnt;	/* Thread count */
	char *cmd = argv[0];

	if ((argc != 2) || !isdigit((unsigned char)argv[1][0])) {
		fprintf(stderr, "Usage: %s <thread count>\n", cmd);
		exit(1);
	}
	tcnt = strtoul(argv[1], NULL, 10);
	if (tcnt > MAX_THREADS) {
		fprintf(stderr, "Error: thread count should be <= %d\n",
			MAX_THREADS);
		exit(1);
	}
	exit_test(tcnt);
	return 0;	/* Not reachable */
}

--------------060907050406000808040700--
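
The per-CPU list idea raised in the thread above can be illustrated with a small userspace sketch. This is not kernel code and not the list batching patch: every name here (NR_SHARDS, struct shard, shard_list_add(), shard_list_del(), shard_list_count()) is hypothetical, a pthread mutex stands in for a spinlock, and a fixed array stands in for a per-CPU allocation. It only shows the shape of the trade-off: additions and deletions contend on one sublist lock instead of a single global one, while a full walk has to visit every sublist.

```c
#include <pthread.h>

#define NR_SHARDS 4	/* stand-in for the number of CPUs (assumption) */

struct node {
	struct node *prev, *next;
	unsigned int shard;	/* sublist this node was added to */
};

struct shard {
	pthread_mutex_t lock;	/* protects this sublist only */
	struct node head;	/* sentinel of a circular list */
};

static struct shard shards[NR_SHARDS];

static void shards_init(void)
{
	for (unsigned int i = 0; i < NR_SHARDS; i++) {
		pthread_mutex_init(&shards[i].lock, NULL);
		shards[i].head.prev = &shards[i].head;
		shards[i].head.next = &shards[i].head;
		shards[i].head.shard = i;
	}
}

/* Add to the sublist chosen by "cpu"; only that sublist's lock is taken. */
static void shard_list_add(struct node *n, unsigned int cpu)
{
	struct shard *s = &shards[cpu % NR_SHARDS];

	pthread_mutex_lock(&s->lock);
	n->shard = cpu % NR_SHARDS;
	n->next = s->head.next;
	n->prev = &s->head;
	s->head.next->prev = n;
	s->head.next = n;
	pthread_mutex_unlock(&s->lock);
}

/* Delete from whichever sublist the node was added to. */
static void shard_list_del(struct node *n)
{
	struct shard *s = &shards[n->shard];

	pthread_mutex_lock(&s->lock);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
	pthread_mutex_unlock(&s->lock);
}

/* Walking the whole list now means visiting every sublist in turn. */
static int shard_list_count(void)
{
	int count = 0;

	for (unsigned int i = 0; i < NR_SHARDS; i++) {
		struct node *n;

		pthread_mutex_lock(&shards[i].lock);
		for (n = shards[i].head.next; n != &shards[i].head; n = n->next)
			count++;
		pthread_mutex_unlock(&shards[i].lock);
	}
	return count;
}
```

A caller would run shards_init() once, then pass some per-thread index (for example a CPU number) to shard_list_add(). Because a node may be deleted by a different thread than the one that added it, the node records its own sublist so that shard_list_del() takes the matching lock.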