From: Waiman Long <waiman.long@hpe.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kara <jack@suse.cz>, Alexander Viro <viro@zeniv.linux.org.uk>,
	Jan Kara <jack@suse.com>, Jeff Layton <jlayton@poochiereds.net>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andi Kleen <andi@firstfloor.org>,
	Dave Chinner <dchinner@redhat.com>,
	Scott J Norton <scott.norton@hp.com>,
	Douglas Hatch <doug.hatch@hp.com>
Subject: Re: [PATCH v3 3/3] vfs: Use per-cpu list for superblock's inode list
Date: Thu, 25 Feb 2016 09:43:46 -0500	[thread overview]
Message-ID: <56CF1322.2040609@hpe.com> (raw)
In-Reply-To: <20160225080635.GB10611@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3052 bytes --]

On 02/25/2016 03:06 AM, Ingo Molnar wrote:
> * Jan Kara <jack@suse.cz> wrote:
>
>>>>> With an exit microbenchmark that creates a large number of threads,
>>>>> attaches many inodes to them and then exits. The runtimes of that
>>>>> microbenchmark with 1000 threads before and after the patch on a 4-socket
>>>>> Intel E7-4820 v3 system (40 cores, 80 threads) were as follows:
>>>>>
>>>>>    Kernel            Elapsed Time    System Time
>>>>>    ------            ------------    -----------
>>>>>    Vanilla 4.5-rc4      65.29s         82m14s
>>>>>    Patched 4.5-rc4      22.81s         23m03s
>>>>>
>>>>> Before the patch, spinlock contention at the inode_sb_list_add() function
>>>>> at the startup phase and the inode_sb_list_del() function at the exit
>>>>> phase were about 79% and 93% of total CPU time respectively (as measured
>>>>> by perf). After the patch, the percpu_list_add() function consumed only
>>>>> about 0.04% of CPU time at startup phase. The percpu_list_del() function
>>>>> consumed about 0.4% of CPU time at exit phase. There was still some
>>>>> spinlock contention, but it happened elsewhere.
>>>> While looking through this patch, I have noticed that the
>>>> list_for_each_entry_safe() iterations in evict_inodes() and
>>>> invalidate_inodes() are actually unnecessary. So if you first apply the
>>>> attached patch, you don't have to implement safe iteration variants at all.
>>>>
>>>> As a second comment, I'd note that this patch grows struct inode by 1
>>>> pointer. It is probably acceptable for large machines given the speedup but
>>>> it should be noted in the changelog. Furthermore for UP or even small SMP
>>>> systems this is IMHO undesired bloat since the speedup won't be noticeable.
>>>>
>>>> So for these small systems it would be good if per-cpu list magic would just
>>>> fall back to single linked list with a spinlock. Do you think that is
>>>> reasonably doable?
>>> Even many 'small' systems tend to be SMP these days.
>> Yes, I know. But my tablet with 4 ARM cores is unlikely to benefit from this
>> change either. [...]
> I'm not sure about that at all, the above numbers are showing a 3x-4x speedup in
> system time, which ought to be noticeable on smaller SMP systems as well.
>
> Waiman, could you please post the microbenchmark?
>
> Thanks,
>
> 	Ingo

The microbenchmark that I used is attached.

I do agree that the performance benefit will decrease as the number of CPUs
gets smaller. The system that I used for testing has 4 sockets with 40
cores (80 threads). Dave Chinner ran his fstests on a 16-core system
(probably 2-socket), which showed a modest improvement in performance
(~4m40s vs 4m30s in runtime).
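
For reference, the speedup ratios implied by the figures quoted in this
thread can be worked out with a small C helper. This is only an
illustrative sketch: the function names are made up here, and the fstests
times are the approximate ones given above, so the last ratio is rough.

```c
#include <stdio.h>

/* Ratio of the "before" time to the "after" time; > 1 means faster. */
static double speedup(double before_s, double after_s)
{
	return before_s / after_s;
}

/* Convert a minutes/seconds pair to seconds. */
static double to_seconds(int minutes, int seconds)
{
	return minutes * 60.0 + seconds;
}

/* Print the ratios for the numbers quoted in this thread. */
static void print_speedups(void)
{
	/* 80-thread system, exit microbenchmark with 1000 threads */
	printf("elapsed: %.2fx\n", speedup(65.29, 22.81));
	printf("system:  %.2fx\n",
	       speedup(to_seconds(82, 14), to_seconds(23, 3)));
	/* 16-core system, fstests (approximate runtimes) */
	printf("fstests: %.2fx\n",
	       speedup(to_seconds(4, 40), to_seconds(4, 30)));
}
```

On the 80-thread system this works out to a bit under 3x in elapsed time
and roughly 3.6x in system time, versus only about 4% for the fstests run
on the 16-core system.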

This patch enables parallel insertion and deletion to/from the inode
list, which used to be a serialized operation. So if that list operation
is a bottleneck, you will see a significant improvement. If it is not,
you may not notice much of a difference. For a single-socket 4-core
system, I agree that the performance benefit, if any, will be limited.
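
To illustrate the general idea of replacing one lock-protected list with
several independently locked sublists, here is a minimal userspace sketch
using pthreads. It is not the kernel code from the patch: NR_SHARDS, the
struct names and the sharded_list_*() functions are all hypothetical, and
the "hint" argument merely stands in for the current CPU number.

```c
#include <pthread.h>
#include <stddef.h>

#define NR_SHARDS 4	/* hypothetical: stands in for the number of CPUs */

struct shard;

struct node {
	struct node *prev, *next;
	struct shard *home;	/* sublist this node was added to */
};

struct shard {
	pthread_mutex_t lock;	/* protects only this sublist */
	struct node head;	/* circular list sentinel */
};

static struct shard shards[NR_SHARDS];

static void sharded_list_init(void)
{
	for (int i = 0; i < NR_SHARDS; i++) {
		pthread_mutex_init(&shards[i].lock, NULL);
		shards[i].head.prev = shards[i].head.next = &shards[i].head;
	}
}

/* Add to the sublist selected by hint; other sublists stay unlocked. */
static void sharded_list_add(struct node *n, unsigned int hint)
{
	struct shard *s = &shards[hint % NR_SHARDS];

	pthread_mutex_lock(&s->lock);
	n->home = s;
	n->next = s->head.next;
	n->prev = &s->head;
	s->head.next->prev = n;
	s->head.next = n;
	pthread_mutex_unlock(&s->lock);
}

/* Delete from whichever sublist the node lives on. */
static void sharded_list_del(struct node *n)
{
	struct shard *s = n->home;

	pthread_mutex_lock(&s->lock);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
	n->home = NULL;
	pthread_mutex_unlock(&s->lock);
}

/* Iteration has to visit every sublist, taking each lock in turn. */
static size_t sharded_list_count(void)
{
	size_t total = 0;

	for (int i = 0; i < NR_SHARDS; i++) {
		struct node *p;

		pthread_mutex_lock(&shards[i].lock);
		for (p = shards[i].head.next; p != &shards[i].head; p = p->next)
			total++;
		pthread_mutex_unlock(&shards[i].lock);
	}
	return total;
}
```

In this sketch, adds and deletes that land on different sublists never
contend on the same lock, at the cost of one extra pointer per node and a
slower full traversal. With only a few threads the single lock is rarely
contended in the first place, so the split buys little.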

Cheers,
Longman


[-- Attachment #2: exit_test.c --]
[-- Type: text/plain, Size: 2665 bytes --]

/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * Authors: Waiman Long <waiman.long@hp.com>
 */
/*
 * This is an exit test
 */
#include <ctype.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/syscall.h>


#define do_exit()	syscall(SYS_exit, 0)	/* exit only the calling thread */
#define	gettid()	syscall(SYS_gettid)
#define	MAX_THREADS	2048

static inline void cpu_relax(void)
{
	__asm__ __volatile__("rep;nop": : :"memory");
}

static inline void atomic_inc(volatile int *v)
{
	__asm__ __volatile__("lock incl %0": "+m" (*v));
}

static volatile int exit_now  = 0;
static volatile int threadcnt = 0;

/*
 * Walk the /proc/<tid> directory tree so that its entries fill the dentry cache
 */
static void walk_procfs(void)
{
	char cmdbuf[256];
	pid_t tid = gettid();

	snprintf(cmdbuf, sizeof(cmdbuf), "find /proc/%d > /dev/null 2>&1", tid);
	if (system(cmdbuf) < 0)
		perror("system() failed!");
}

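static void *exit_thread(void *dummy)
{
	(void)dummy;	/* thread index is not used */

	walk_procfs();
	atomic_inc(&threadcnt);
	/*
	 * Sleep-poll until the exit_now flag is set, then exit this thread.
	 * (exit_now is never set in this test; the threads are killed by
	 * the SIGKILL sent from exit_test().)
	 */
	while (!exit_now)
		sleep(1);
	do_exit();
	return NULL;	/* Not reached */
}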

static void exit_test(int threads)
{
	pthread_t thread[threads];
	long i = 0, finish;
	time_t start = time(NULL);

	while (i++ < threads) {
		if (pthread_create(thread + i - 1, NULL, exit_thread,
				  (void *)i)) {
			perror("pthread_create");
			exit(1);
		}
#if 0
		/*
		 * Pipelining to reduce contention & improve speed
		 */
		if ((i & 0xf) == 0)
			 while (i - threadcnt > 12)
				usleep(1);
#endif
	}
	while (threadcnt != threads)
		usleep(1);
	walk_procfs();
	printf("Setup time = %lds\n", (long)(time(NULL) - start));
	printf("Process ready to exit!\n");
	kill(0, SIGKILL);
	exit(0);
}

int main(int argc, char *argv[])
{
	int   tcnt;	/* Thread counts */
	char *cmd = argv[0];

	if ((argc != 2) || !isdigit((unsigned char)argv[1][0])) {
		fprintf(stderr, "Usage: %s <thread count>\n", cmd);
		exit(1);
	}
	tcnt = strtoul(argv[1], NULL, 10);
	if ((tcnt < 1) || (tcnt > MAX_THREADS)) {
		fprintf(stderr, "Error: thread count should be between 1 and %d\n",
			MAX_THREADS);
		exit(1);
	}
	exit_test(tcnt);
	return 0;	/* Not reachable */
}


Thread overview: 15+ messages
2016-02-23 19:04 [PATCH v3 0/3] vfs: Use per-cpu list for SB's s_inodes list Waiman Long
2016-02-23 19:04 ` [PATCH v3 1/3] lib/percpu-list: Per-cpu list with associated per-cpu locks Waiman Long
2016-02-24  2:00   ` Boqun Feng
2016-02-24  4:01     ` Waiman Long
2016-02-24  7:56   ` Jan Kara
2016-02-24 19:51     ` Waiman Long
2016-02-23 19:04 ` [PATCH v3 2/3] fsnotify: Simplify inode iteration on umount Waiman Long
2016-02-23 19:04 ` [PATCH v3 3/3] vfs: Use per-cpu list for superblock's inode list Waiman Long
2016-02-24  8:28   ` Jan Kara
2016-02-24  8:36     ` Ingo Molnar
2016-02-24  8:58       ` Jan Kara
2016-02-25  8:06         ` Ingo Molnar
2016-02-25 14:43           ` Waiman Long [this message]
2016-02-24 20:23     ` Waiman Long
2016-02-25 14:50       ` Waiman Long
