* [RFC PATCH] Perf Bench: Locking Microbenchmark
@ 2014-09-30 23:49 Tuan Bui
  2014-10-01  5:28 ` Ingo Molnar
  0 siblings, 1 reply; 9+ messages in thread
From: Tuan Bui @ 2014-09-30 23:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: dbueso, a.p.zijlstra, paulus, acme, artagnon, jolsa, dvhart,
	Aswin Chandramouleeswaran, Jason Low, akpm, mingo, Tuan Bui

Subject: [RFC PATCH] Perf Bench: Locking Microbenchmark

In response to this thread https://lkml.org/lkml/2014/2/11/93, this is
a microbenchmark that stresses locking contention in the kernel with
the creat(2) system call by spawning multiple processes that hammer
this system call.  This workload generates contention and results
similar to the AIM7 fserver workload, but produces output within
seconds.
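
For example, an illustrative invocation (using the options added by
this patch; the defaults are -s 100 -e 1100 -i 100 -r 5) might be:

  $ perf bench locking creat -s 200 -e 1000 -i 200 -r 10

which runs the creat workload for 10 seconds each at 200, 400, 600,
800 and 1000 processes.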

With the creat(2) system call, the contention varies depending on
which locks the particular file system uses.  I have run this
benchmark only on the ext4 and xfs file systems.

Running the creat workload on ext4 shows contention on the mutex used
by ext4_orphan_add() and ext4_orphan_del() to add or delete an inode
from the orphan inode list.  Running the same workload on xfs shows
contention on the spinlock used by xfs_log_commit_cil() to commit a
transaction to the Committed Item List.

Here is a comparison of this benchmark with AIM7 running the fserver
workload at 500-1000 users, along with a perf trace of each, both on
an ext4 file system.

The test machine is an 8-socket, 80-core Westmere system with HT off,
running v3.17-rc6.

	AIM7		AIM7		perf-bench	perf-bench
Users	Jobs/min	Jobs/min/child	Ops/sec		Ops/sec/child
500	119668.25	239.34		104249		208
600	126074.90	210.12		106136		176
700	128662.42	183.80		106175		151
800	119822.05	149.78		106290		132
900	106150.25	117.94		105230		116
1000	104681.29	104.68		106489		106
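
(For reference, the per-child columns are just the aggregate divided by
the user count, e.g. 119668.25 / 500 ~= 239.34 jobs/min/child and
104249 / 500 ~= 208 ops/sec/child.)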

Perf trace for AIM7 fserver:
14.51%	reaim  		[kernel.kallsyms]	[k] osq_lock
4.98%	reaim  		reaim			[.] add_long
4.98%	reaim  		reaim			[.] add_int
4.31%	reaim  		[kernel.kallsyms]	[k] mutex_spin_on_owner
...

Perf trace of perf bench creat
22.37%	locking-creat  [kernel.kallsyms]	[k] osq_lock
5.77%	locking-creat  [kernel.kallsyms]	[k] mutex_spin_on_owner
5.31%	locking-creat  [kernel.kallsyms]	[k] _raw_spin_lock
5.15%	locking-creat  [jbd2]			[k] jbd2_journal_put_journal_head
...
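
(These profiles can be reproduced with something along the lines of
the following -- illustrative only, the exact record options used are
not stated here:

  $ perf record -a -g -- perf bench locking creat -s 1000 -e 1000 -r 5
  $ perf report

with an analogous run of reaim for the AIM7 numbers.)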

Signed-off-by: Tuan Bui <tuan.d.bui@hp.com>
---
 tools/perf/Documentation/perf-bench.txt |    8 ++
 tools/perf/Makefile.perf                |    1 +
 tools/perf/bench/bench.h                |    1 +
 tools/perf/bench/locking.c              |  234 +++++++++++++++++++++++++++++++
 tools/perf/builtin-bench.c              |    8 ++
 5 files changed, 252 insertions(+)
 create mode 100644 tools/perf/bench/locking.c

diff --git a/tools/perf/Documentation/perf-bench.txt b/tools/perf/Documentation/perf-bench.txt
index f6480cb..e679135 100644
--- a/tools/perf/Documentation/perf-bench.txt
+++ b/tools/perf/Documentation/perf-bench.txt
@@ -58,6 +58,9 @@ SUBSYSTEM
 'futex'::
 	Futex stressing benchmarks.
 
+'locking'::
+        Locking stressing benchmarks.
+
 'all'::
 	All benchmark subsystems.
 
@@ -213,6 +216,11 @@ Suite for evaluating wake calls.
 *requeue*::
 Suite for evaluating requeue calls.
 
+SUITES FOR 'locking'
+~~~~~~~~~~~~~~~~~~
+*creat*::
+Suite for evaluating locking contention through creat(2).
+
 SEE ALSO
 --------
 linkperf:perf[1]
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 2240974..e5a9d23 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -438,6 +438,7 @@ BUILTIN_OBJS += $(OUTPUT)bench/mem-memset.o
 BUILTIN_OBJS += $(OUTPUT)bench/futex-hash.o
 BUILTIN_OBJS += $(OUTPUT)bench/futex-wake.o
 BUILTIN_OBJS += $(OUTPUT)bench/futex-requeue.o
+BUILTIN_OBJS += $(OUTPUT)bench/locking.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-diff.o
 BUILTIN_OBJS += $(OUTPUT)builtin-evlist.o
diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
index 3c4dd44..d31c0ba 100644
--- a/tools/perf/bench/bench.h
+++ b/tools/perf/bench/bench.h
@@ -34,6 +34,7 @@ extern int bench_mem_memset(int argc, const char **argv, const char *prefix);
 extern int bench_futex_hash(int argc, const char **argv, const char *prefix);
 extern int bench_futex_wake(int argc, const char **argv, const char *prefix);
 extern int bench_futex_requeue(int argc, const char **argv, const char *prefix);
+extern int bench_locking_creat(int argc, const char **argv, const char *prefix);
 
 #define BENCH_FORMAT_DEFAULT_STR	"default"
 #define BENCH_FORMAT_DEFAULT		0
diff --git a/tools/perf/bench/locking.c b/tools/perf/bench/locking.c
new file mode 100644
index 0000000..8cbe8a6
--- /dev/null
+++ b/tools/perf/bench/locking.c
@@ -0,0 +1,234 @@
+/*
+ * locking.c
+ *
+ * Simple microbenchmark that stresses kernel locking contention
+ * with the creat(2) system call by spawning multiple processes
+ * that repeatedly invoke it.
+ *
+ * Results reported are average operations/sec for all threads and
+ * average operations/sec per thread.
+ *
+ * Tuan Bui <tuan.d.bui@hp.com>
+ */
+
+#include "../perf.h"
+#include "../util/util.h"
+#include "../util/stat.h"
+#include "../util/parse-options.h"
+#include "../util/header.h"
+#include "bench.h"
+
+#include <err.h>
+#include <stdlib.h>
+#include <sys/time.h>
+#include <unistd.h>
+#include <sys/resource.h>
+#include <linux/futex.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <signal.h>
+#include <dirent.h>
+
+struct worker {
+	pid_t pid;
+	unsigned int order_id;
+	char str[50];
+};
+
+struct timeval start, end, total;
+static unsigned int start_nr_threads = 100;
+static unsigned int end_nr_threads = 1100;
+static unsigned int increment_threads_by = 100;
+static unsigned int bench_dur = 5;
+
+/* Variables shared between forked processes */
+unsigned int *finished, *setup;
+unsigned long long *shared_workers;
+/* all processes will block on the same futex */
+u_int32_t *futex;
+
+static const struct option options[] = {
+	OPT_UINTEGER('s', "start", &start_nr_threads, "Number of processes to start"),
+	OPT_UINTEGER('e', "end", &end_nr_threads, "Number of processes to end"),
+	OPT_UINTEGER('i', "increment", &increment_threads_by, "Number of processes to increment by"),
+	OPT_UINTEGER('r', "runtime", &bench_dur, "Specify benchmark runtime in seconds"),
+	OPT_END()
+};
+
+static const char * const bench_locking_creat_usage[] = {
+	"perf bench locking creat <options>",
+	NULL
+};
+
+/* Running bench creat workload */
+static void *run_bench_creat(struct worker *workers)
+{
+	int fd;
+	unsigned long long nr_ops = 0;
+	int ret;
+
+	sprintf(workers->str, "%d-XXXXXX", getpid());
+	ret = mkstemp(workers->str);
+	if (ret < 0)
+		err(EXIT_FAILURE, "mkstemp");
+
+	/* Signal the parent process and wait until all workers are ready to run */
+	setup[workers->order_id] = 1;
+	syscall(SYS_futex, futex, FUTEX_WAIT, 0, NULL, NULL, 0);
+
+	/* Start of the benchmark: loop until the parent signals completion */
+	while (!*finished) {
+		fd = creat(workers->str, S_IRWXU);
+		if (fd < 0)
+			err(EXIT_FAILURE, "creat");
+		nr_ops++;
+		close(fd);
+	}
+
+	unlink(workers->str);
+	shared_workers[workers->order_id] = nr_ops;
+	setup[workers->order_id] = 0;
+	exit(0);
+}
+
+/* Set up the shared variables: finished, shared_workers, setup and futex */
+static void setup_shared(void)
+{
+	unsigned int *finished_tmp, *setup_tmp;
+	unsigned long long *shared_workers_tmp;
+	u_int32_t *futex_tmp;
+
+	/* The finished flag signals the start and end of the benchmark */
+	finished_tmp = (void *)mmap(0, sizeof(unsigned int), PROT_READ|PROT_WRITE,
+			MAP_SHARED|MAP_ANONYMOUS, -1, 0);
+	if (finished_tmp == (void *) -1)
+		err(EXIT_FAILURE, "mmap finished");
+	finished = finished_tmp;
+
+	/* shared_workers is an array of the ops performed by each process */
+	shared_workers_tmp = (void *)mmap(0, sizeof(unsigned long long)*end_nr_threads,
+			PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
+	if (shared_workers_tmp == (void *) -1)
+		err(EXIT_FAILURE, "mmap shared_workers");
+	shared_workers = shared_workers_tmp;
+
+	/* setup is used by each process to signal that it is done
+	 * setting up for the benchmark and is ready to run */
+	setup_tmp = (void *)mmap(0, sizeof(unsigned int)*end_nr_threads,
+			PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
+	if (setup_tmp == (void *) -1)
+		err(EXIT_FAILURE, "mmap shared_workers");
+	setup = setup_tmp;
+
+	/* Processes will sleep on this futex until all other processes
+	 * are done setting up and are ready to run */
+	futex_tmp = (void *)mmap(0, sizeof(u_int32_t), PROT_READ|PROT_WRITE,
+			MAP_SHARED|MAP_ANONYMOUS, -1, 0);
+	if (futex_tmp == (void *) -1)
+		err(EXIT_FAILURE, "mmap finished");
+	futex = futex_tmp;
+	(*futex) = 0;
+}
+
+/* Freeing shared variables */
+static void free_resources(void)
+{
+	if ((munmap(finished, sizeof(unsigned int)) == -1))
+		err(EXIT_FAILURE, "munmap finished");
+
+	if ((munmap(shared_workers, sizeof(unsigned long long) * end_nr_threads) == -1))
+		err(EXIT_FAILURE, "munmap shared_workers");
+
+	if ((munmap(setup, sizeof(unsigned int) * end_nr_threads) == -1))
+		err(EXIT_FAILURE, "munmap shared_workers");
+
+	if ((munmap(futex, sizeof(u_int32_t))) == -1)
+		err(EXIT_FAILURE, "munmap finished");
+}
+
+/* Spawn the workers and wait until all of them have been
+ * created before starting the workload */
+static void spawn_workers(void *(*bench_ptr) (struct worker *))
+{
+	pid_t parent, child;
+	unsigned int i, j, k;
+	struct worker workers;
+	unsigned long long total_ops;
+	unsigned int total_workers;
+
+	parent = getpid();
+	setup_shared();
+
+	/* Loop over the runs, incrementing the count by increment_threads_by */
+	for (i = start_nr_threads; i <= end_nr_threads; i += increment_threads_by) {
+
+		for (j = 0; j < i; j++) {
+			if (!fork())
+				break;
+		}
+
+		child = getpid();
+		/* Initialize child worker struct and run benchmark */
+		if (child != parent) {
+			workers.order_id = j;
+			workers.pid = child;
+			bench_ptr(&workers);
+		}
+		/* The parent sleeps for the duration of the benchmark */
+		else {
+			/* Make sure all child processes are created and set up
+			 * before starting the benchmark for bench_dur seconds */
+			do {
+				total_workers = 0;
+				for (k = 0; k < i; k++)
+					total_workers = total_workers + setup[k];
+			} while (total_workers != i);
+
+			/* Wake up all sleeping processes to run the benchmark */
+			(*futex) = 1;
+			syscall(SYS_futex, futex, FUTEX_WAKE, i, NULL, NULL, 0);
+
+			/* All processes have been signaled to run; start timing */
+			gettimeofday(&start, NULL);
+			sleep(bench_dur);
+			(*finished) = 1;
+			gettimeofday(&end, NULL);
+			timersub(&end, &start, &total);
+
+			/* Wait for all processes to terminate before collecting results */
+			for (k = 0; k < i; k++)
+				wait(NULL);
+
+			/* Sum up the ops from each process and report */
+			total_ops = 0;
+			for (k = 0; k < i; k++)
+				total_ops = total_ops + shared_workers[k];
+
+			printf("\n%6d threads: throughput = %llu average opts/sec all threads\n",
+				i, (total_ops / total.tv_sec));
+
+			printf("%6d threads: throughput = %llu average opts/sec per thread\n",
+				i, ((total_ops/total.tv_sec)/(!i ? 1:i)));
+
+			/* Reset back to 0 for next run */
+			(*finished) = 0;
+			(*futex) = 0;
+		}
+	}
+}
+
+int bench_locking_creat(int argc, const char **argv,
+			const char *prefix __maybe_unused)
+{
+	argc = parse_options(argc, argv, options, bench_locking_creat_usage, 0);
+
+	if (argc) {
+		usage_with_options(bench_locking_creat_usage, options);
+		exit(EXIT_FAILURE);
+	}
+
+	spawn_workers(run_bench_creat);
+	free_resources();
+	return 0;
+}
diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
index b9a56fa..cdaa84a 100644
--- a/tools/perf/builtin-bench.c
+++ b/tools/perf/builtin-bench.c
@@ -63,6 +63,13 @@ static struct bench futex_benchmarks[] = {
 	{ NULL,		NULL,						NULL			}
 };
 
+static struct bench locking_benchmarks[] = {
+	{ "creat",      "Benchmark using creat(2)",			bench_locking_creat     },
+	{ "all",        "Run all benchmarks in this suite",		NULL			},
+	{ NULL,		NULL,						NULL			}
+};
+
+
 struct collection {
 	const char	*name;
 	const char	*summary;
@@ -76,6 +83,7 @@ static struct collection collections[] = {
 	{ "numa",	"NUMA scheduling and MM benchmarks",		numa_benchmarks		},
 #endif
 	{"futex",       "Futex stressing benchmarks",                   futex_benchmarks        },
+	{"locking",     "Kernel locking benchmarks",                    locking_benchmarks      },
 	{ "all",	"All benchmarks",				NULL			},
 	{ NULL,		NULL,						NULL			}
 };
-- 
1.7.9.5




^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] Perf Bench: Locking Microbenchmark
  2014-09-30 23:49 [RFC PATCH] Perf Bench: Locking Microbenchmark Tuan Bui
@ 2014-10-01  5:28 ` Ingo Molnar
  2014-10-01 17:12   ` Arnaldo Carvalho de Melo
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Ingo Molnar @ 2014-10-01  5:28 UTC (permalink / raw)
  To: Tuan Bui
  Cc: linux-kernel, dbueso, a.p.zijlstra, paulus, acme, artagnon,
	jolsa, dvhart, Aswin Chandramouleeswaran, Jason Low, akpm


* Tuan Bui <tuan.d.bui@hp.com> wrote:

> Subject: [RFC PATCH] Perf Bench: Locking Microbenchmark
> 
> In response to this thread https://lkml.org/lkml/2014/2/11/93, 
> this is a microbenchmark that stresses locking contention in 
> the kernel with the creat(2) system call by spawning multiple 
> processes that hammer this system call.  This workload generates 
> contention and results similar to the AIM7 fserver workload, but 
> produces output within seconds.
> 
> With the creat(2) system call, the contention varies depending 
> on which locks the particular file system uses.  I have run this 
> benchmark only on the ext4 and xfs file systems.
> 
> Running the creat workload on ext4 shows contention on the mutex 
> used by ext4_orphan_add() and ext4_orphan_del() to add or delete 
> an inode from the orphan inode list.  Running the same workload 
> on xfs shows contention on the spinlock used by 
> xfs_log_commit_cil() to commit a transaction to the Committed 
> Item List.
> 
> Here is a comparison of this benchmark with AIM7 running the 
> fserver workload at 500-1000 users, along with a perf trace of 
> each, both on an ext4 file system.
> 
> The test machine is an 8-socket, 80-core Westmere system with 
> HT off, running v3.17-rc6.
> 
> 	AIM7		AIM7		perf-bench	perf-bench
> Users	Jobs/min	Jobs/min/child	Ops/sec		Ops/sec/child
> 500	119668.25	239.34		104249		208
> 600	126074.90	210.12		106136		176
> 700	128662.42	183.80		106175		151
> 800	119822.05	149.78		106290		132
> 900	106150.25	117.94		105230		116
> 1000	104681.29	104.68		106489		106
> 
> Perf trace for AIM7 fserver:
> 14.51%	reaim  		[kernel.kallsyms]	[k] osq_lock
> 4.98%	reaim  		reaim			[.] add_long
> 4.98%	reaim  		reaim			[.] add_int
> 4.31%	reaim  		[kernel.kallsyms]	[k] mutex_spin_on_owner
> ...
> 
> Perf trace of perf bench creat
> 22.37%	locking-creat  [kernel.kallsyms]	[k] osq_lock
> 5.77%	locking-creat  [kernel.kallsyms]	[k] mutex_spin_on_owner
> 5.31%	locking-creat  [kernel.kallsyms]	[k] _raw_spin_lock
> 5.15%	locking-creat  [jbd2]			[k] jbd2_journal_put_journal_head
> ...

Very nice!

If you compare an strace of AIM7 steady state and 'perf bench 
lock' steady state, is it comparable, i.e. do the syscalls and 
other behavioral patterns match up?

> +'locking'::
> +        Locking stressing benchmarks.
> +
>  'all'::
>  	All benchmark subsystems.
>  
> @@ -213,6 +216,11 @@ Suite for evaluating wake calls.
>  *requeue*::
>  Suite for evaluating requeue calls.
>  
> +SUITES FOR 'locking'
> +~~~~~~~~~~~~~~~~~~
> +*creat*::
> +Suite for evaluating locking contention through creat(2).

So I'd display it in the help text prominently that it's a 
workload similar to the AIM7 workload.

> +static const struct option options[] = {
> +	OPT_UINTEGER('s', "start", &start_nr_threads, "Number of processes to start"),
> +	OPT_UINTEGER('e', "end", &end_nr_threads, "Number of processes to end"),
> +	OPT_UINTEGER('i', "increment", &increment_threads_by, "Number of processes to increment by"),
> +	OPT_UINTEGER('r', "runtime", &bench_dur, "Specify benchmark runtime in seconds"),
> +	OPT_END()
> +};

Is this the kind of parameters that AIM7 takes as well?

In any case, this is a very nice benchmarking utility.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] Perf Bench: Locking Microbenchmark
  2014-10-01  5:28 ` Ingo Molnar
@ 2014-10-01 17:12   ` Arnaldo Carvalho de Melo
  2014-10-03  4:57     ` Davidlohr Bueso
  2014-10-08 22:11     ` Tuan Bui
  2014-10-03  4:52   ` Davidlohr Bueso
  2014-10-08 22:13   ` Tuan Bui
  2 siblings, 2 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-10-01 17:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tuan Bui, linux-kernel, dbueso, a.p.zijlstra, paulus, artagnon,
	jolsa, dvhart, Aswin Chandramouleeswaran, Jason Low, akpm

Em Wed, Oct 01, 2014 at 07:28:32AM +0200, Ingo Molnar escreveu:
> If you compare an strace of AIM7 steady state and 'perf bench 
> lock' steady state, is it comparable, i.e. do the syscalls and 

Isn't "lock" too generic? Isn't this stressing some specific lock and if
so shouldn't that be made abundantly clear in the 'perf bench' test name
and in the docs?

Or is this the case that it started by using 'creat' calls to stress
some locking and will go on adding more syscalls to stress more kernel
locks?

- Arnaldo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] Perf Bench: Locking Microbenchmark
  2014-10-01  5:28 ` Ingo Molnar
  2014-10-01 17:12   ` Arnaldo Carvalho de Melo
@ 2014-10-03  4:52   ` Davidlohr Bueso
  2014-10-08 22:13   ` Tuan Bui
  2 siblings, 0 replies; 9+ messages in thread
From: Davidlohr Bueso @ 2014-10-03  4:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tuan Bui, linux-kernel, a.p.zijlstra, paulus, acme, artagnon,
	jolsa, dvhart, Aswin Chandramouleeswaran, Jason Low, akpm

On Wed, 2014-10-01 at 07:28 +0200, Ingo Molnar wrote:
> If you compare an strace of AIM7 steady state and 'perf bench 
> lock' steady state, is it comparable, i.e. do the syscalls and 
> other behavioral patterns match up?

With more than 1000 users I'm seeing:

-  33.74%    locking-creat  [kernel.kallsyms]              [k] mspin_lock
   + mspin_lock
   + __mutex_lock_slowpath
   + mutex_lock
-   7.97%    locking-creat  [kernel.kallsyms]              [k] mutex_spin_on_owner
   + mutex_spin_on_owner
   + __mutex_lock_slowpath
   + mutex_lock

Lower user counts just show the syscall entries.

Of course, the AIM7 setup was running on a ramdisk, thus avoiding any
I/O overhead in the traces.
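
(For reference, one way to set up such an ext4-on-ramdisk target --
the details here are illustrative, not necessarily what was used:

  # modprobe brd rd_size=4194304
  # mkfs.ext4 /dev/ram0
  # mkdir -p /mnt/ramdisk && mount /dev/ram0 /mnt/ramdisk

and then run the benchmark with /mnt/ramdisk as the working directory.)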

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] Perf Bench: Locking Microbenchmark
  2014-10-01 17:12   ` Arnaldo Carvalho de Melo
@ 2014-10-03  4:57     ` Davidlohr Bueso
  2014-10-08 22:14       ` Tuan Bui
  2014-10-08 22:11     ` Tuan Bui
  1 sibling, 1 reply; 9+ messages in thread
From: Davidlohr Bueso @ 2014-10-03  4:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Tuan Bui, linux-kernel, a.p.zijlstra, paulus,
	artagnon, jolsa, dvhart, Aswin Chandramouleeswaran, Jason Low,
	akpm

On Wed, 2014-10-01 at 14:12 -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Oct 01, 2014 at 07:28:32AM +0200, Ingo Molnar escreveu:
> > If you compare an strace of AIM7 steady state and 'perf bench 
> > lock' steady state, is it comparable, i.e. do the syscalls and 
> 
> Isn't "lock" too generic? Isn't this stressing some specific lock and if
> so shouldn't that be made abundantly clear in the 'perf bench' test name
> and in the docs?

yeah, and 'perf bench locking creat' just doesn't sound right.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] Perf Bench: Locking Microbenchmark
  2014-10-01 17:12   ` Arnaldo Carvalho de Melo
  2014-10-03  4:57     ` Davidlohr Bueso
@ 2014-10-08 22:11     ` Tuan Bui
  1 sibling, 0 replies; 9+ messages in thread
From: Tuan Bui @ 2014-10-08 22:11 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, dbueso, a.p.zijlstra, paulus,
	artagnon, jolsa, dvhart, Aswin Chandramouleeswaran, Jason Low,
	akpm

On Wed, 2014-10-01 at 14:12 -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Oct 01, 2014 at 07:28:32AM +0200, Ingo Molnar escreveu:
> > If you compare an strace of AIM7 steady state and 'perf bench 
> > lock' steady state, is it comparable, i.e. do the syscalls and 
> 
> Isn't "lock" too generic? Isn't this stressing some specific lock and if
> so shouldn't that be made abundantly clear in the 'perf bench' test name
> and in the docs?
> 

In this microbenchmark, I am trying to exhibit the same locking
contention seen in the AIM7 fserver workload.  Since the creat(2)
system call is file-system dependent, running this on different file
systems shows different locks being contended; that is why I did not
name a specific lock in the doc.  Do you have a suggestion for how I
should name this benchmark?

> Or is this the case that it started by using 'creat' calls to stress
> some locking and will go on adding more syscalls to stress more kernel
> locks?
> 

When I ran all the AIM7 workloads looking for locking contention to
reproduce, creat was the only call I found interesting and useful for
stressing locking contention.





^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] Perf Bench: Locking Microbenchmark
  2014-10-01  5:28 ` Ingo Molnar
  2014-10-01 17:12   ` Arnaldo Carvalho de Melo
  2014-10-03  4:52   ` Davidlohr Bueso
@ 2014-10-08 22:13   ` Tuan Bui
  2014-10-09  7:21     ` Ingo Molnar
  2 siblings, 1 reply; 9+ messages in thread
From: Tuan Bui @ 2014-10-08 22:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, dbueso, a.p.zijlstra, paulus, acme, artagnon,
	jolsa, dvhart, Aswin Chandramouleeswaran, Jason Low, akpm

On Wed, 2014-10-01 at 07:28 +0200, Ingo Molnar wrote:

> > 
> > Perf trace of perf bench creat
> > 22.37%	locking-creat  [kernel.kallsyms]	[k] osq_lock
> > 5.77%	locking-creat  [kernel.kallsyms]	[k] mutex_spin_on_owner
> > 5.31%	locking-creat  [kernel.kallsyms]	[k] _raw_spin_lock
> > 5.15%	locking-creat  [jbd2]			[k] jbd2_journal_put_journal_head
> > ...
> 
> Very nice!
> 
> If you compare an strace of AIM7 steady state and 'perf bench 
> lock' steady state, is it comparable, i.e. do the syscalls and 
> other behavioral patterns match up?
> 

Here is an strace -cf of my perf bench and of the AIM7 fserver
workload, both at 1000 users on an ext4 file system.  My perf bench
results look comparable to the AIM7 fserver workload to me.  What do
you think?

strace -cf for perf bench locking creat at 1000 users

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ---------
 79.29    4.421000         221     20018           creat
 13.07    0.729000         729      1000           unlink
  6.47    0.361000          18     20032           close
  0.60    0.033213          33      1000           wait4
  0.37    0.020365          20      1000           clone
  0.20    0.011000          11      1003         2 futex
  0.00    0.000037           6         6           munmap
  0.00    0.000010           0        24           mprotect
  0.00    0.000009           0        44           mmap
  0.00    0.000000           0        12           read
  0.00    0.000000           0         4           write
  0.00    0.000000           0      1027        14 open

strace -cf for AIM7 fserver workload at 1000 users

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- -----------
 24.42  163.436284          50   3243016           creat
 18.15  121.475390          17   7148543           brk
 14.49   96.990556       85229      1138        35 wait4
  7.86   52.605030          15   3394990           close
  5.73   38.310323          31   1222317           write
  4.99   33.389587          17   2000001           kill
  4.85   32.432000          16   2001035      1000 rt_sigreturn
  4.64   31.050979          64    483800           getdents
  4.38   29.316247          14   2029311           rt_sigaction
  3.10   20.744360          45    464016      5000 unlink
  2.57   17.171514          15   1153825           read
  1.13    7.588489          35    215104           link
  0.89    5.945480           8    786320       433 stat
  0.60    4.045701          11    366004           lseek
  0.36    2.420812           9    263006           times
  0.34    2.272305          18    124982       129 open


> > +'locking'::
> > +        Locking stressing benchmarks.
> > +
> >  'all'::
> >  	All benchmark subsystems.
> >  
> > @@ -213,6 +216,11 @@ Suite for evaluating wake calls.
> >  *requeue*::
> >  Suite for evaluating requeue calls.
> >  
> > +SUITES FOR 'locking'
> > +~~~~~~~~~~~~~~~~~~
> > +*creat*::
> > +Suite for evaluating locking contention through creat(2).
> 
> So I'd display it in the help text prominently that it's a 
> workload similar to the AIM7 workload.
> 

Thank you, Ingo.  I will add more comments to make it clearer that it
is similar to the AIM7 fserver workload.

> > +static const struct option options[] = {
> > +	OPT_UINTEGER('s', "start", &start_nr_threads, "Number of processes to start"),
> > +	OPT_UINTEGER('e', "end", &end_nr_threads, "Number of processes to end"),
> > +	OPT_UINTEGER('i', "increment", &increment_threads_by, "Number of processes to increment by"),
> > +	OPT_UINTEGER('r', "runtime", &bench_dur, "Specify benchmark runtime in seconds"),
> > +	OPT_END()
> > +};
> 
> Is this the kind of parameters that AIM7 takes as well?
> 
> In any case, this is a very nice benchmarking utility.

Yes, these parameters are similar to what AIM7 takes, except for the
runtime parameter.  AIM7 does not have an option to specify how long
the benchmark will run.  In AIM7 you can also specify the number of
jobs per run, which I did not include since I added a runtime
parameter to the benchmark.





^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] Perf Bench: Locking Microbenchmark
  2014-10-03  4:57     ` Davidlohr Bueso
@ 2014-10-08 22:14       ` Tuan Bui
  0 siblings, 0 replies; 9+ messages in thread
From: Tuan Bui @ 2014-10-08 22:14 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	a.p.zijlstra, paulus, artagnon, jolsa, dvhart,
	Aswin Chandramouleeswaran, Jason Low, akpm

On Thu, 2014-10-02 at 21:57 -0700, Davidlohr Bueso wrote:
> On Wed, 2014-10-01 at 14:12 -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Oct 01, 2014 at 07:28:32AM +0200, Ingo Molnar escreveu:
> > > If you compare an strace of AIM7 steady state and 'perf bench 
> > > lock' steady state, is it comparable, i.e. do the syscalls and 
> > 
> > Isn't "lock" too generic? Isn't this stressing some specific lock and if
> > so shouldn't that be made abundantly clear in the 'perf bench' test name
> > and in the docs?
> 
> yeah, and 'perf bench locking creat' just doesn't sound right.
> 

Do you have any suggestions on how I should name it?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] Perf Bench: Locking Microbenchmark
  2014-10-08 22:13   ` Tuan Bui
@ 2014-10-09  7:21     ` Ingo Molnar
  0 siblings, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2014-10-09  7:21 UTC (permalink / raw)
  To: Tuan Bui
  Cc: linux-kernel, dbueso, a.p.zijlstra, paulus, acme, artagnon,
	jolsa, dvhart, Aswin Chandramouleeswaran, Jason Low, akpm


* Tuan Bui <tuan.d.bui@hp.com> wrote:

> > > +static const struct option options[] = {
> > > +	OPT_UINTEGER('s', "start", &start_nr_threads, "Number of processes to start"),
> > > +	OPT_UINTEGER('e', "end", &end_nr_threads, "Number of processes to end"),
> > > +	OPT_UINTEGER('i', "increment", &increment_threads_by, "Number of processes to increment by"),
> > > +	OPT_UINTEGER('r', "runtime", &bench_dur, "Specify benchmark runtime in seconds"),
> > > +	OPT_END()
> > > +};
> > 
> > Is this the kind of parameters that AIM7 takes as well?
> > 
> > In any case, this is a very nice benchmarking utility.
> 
> Yes, these parameters are similar to what AIM7 takes, except for 
> the runtime parameter.  AIM7 does not have an option to specify 
> how long the benchmark will run.  In AIM7 you can also specify 
> the number of jobs per run, which I did not include since I 
> added a runtime parameter to the benchmark.

It might make sense to add that parameter - which would only be 
allowed if no runtime is specified, or so.

I.e. to make it as easy for people to use this new tool when they 
come with AIM7 benchmarking knowledge.
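
(Purely as an illustration, and following the OPT_UINTEGER pattern the
patch already uses, such a knob could look something like

	OPT_UINTEGER('j', "jobs", &nr_jobs, "Number of creat() calls per process (mutually exclusive with --runtime)"),

where nr_jobs would be a new variable that does not exist in the patch
yet, rejected whenever a runtime is also given.)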

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread

Thread overview: 9+ messages
2014-09-30 23:49 [RFC PATCH] Perf Bench: Locking Microbenchmark Tuan Bui
2014-10-01  5:28 ` Ingo Molnar
2014-10-01 17:12   ` Arnaldo Carvalho de Melo
2014-10-03  4:57     ` Davidlohr Bueso
2014-10-08 22:14       ` Tuan Bui
2014-10-08 22:11     ` Tuan Bui
2014-10-03  4:52   ` Davidlohr Bueso
2014-10-08 22:13   ` Tuan Bui
2014-10-09  7:21     ` Ingo Molnar
