linux-kernel.vger.kernel.org archive mirror
* [PATCH] perf bench: add --prefault option for causing page faults before benchmark
@ 2010-11-05 17:06 Hitoshi Mitake
  2010-11-10  9:29 ` Ingo Molnar
  0 siblings, 1 reply; 3+ messages in thread
From: Hitoshi Mitake @ 2010-11-05 17:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, mitake, h.mitake, Ma Ling, Zhao Yakui,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Paul Mackerras,
	Frederic Weisbecker, Steven Rostedt, Thomas Gleixner,
	H. Peter Anvin

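This patch adds a --prefault option to perf bench mem memcpy.
If the user specifies this option, the selected memcpy() routine is
run once, untimed, before the measurement. The cost of the page faults
taken on first touch of the buffers is therefore excluded from the
memcpy() score.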

Example of usage:
| % ./perf bench mem memcpy -l 500MB
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes from 0x7fc036749010 to 0x7fc055b4a010 ...
|
|      628.526821 MB/Sec
| mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --prefault
| # Running mem/memcpy benchmark...
| # Copying 500MB Bytes from 0x7ff1b45e2010 to 0x7ff1d39e3010 ...
|
|        4.849256 GB/Sec
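To illustrate why prefaulting changes the score so much, here is a minimal user-space sketch (not the perf code itself) that times a single memcpy() with and without an untimed warm-up copy. It assumes a POSIX system with clock_gettime(CLOCK_MONOTONIC), and the helper name memcpy_throughput() is hypothetical. The buffers come from calloc(). On glibc, large calloc() allocations are typically served by freshly mmap()ed memory, so without a warm-up the timed copy also pays for first-touch page faults.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L	/* for clock_gettime() */
#endif
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Monotonic timestamp in seconds. */
static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: time one copy of len bytes and return the
 * throughput in bytes per second (-1.0 on error). If prefault is
 * nonzero, an untimed copy runs first, so the faults taken on first
 * touch of src and dst are not charged to the timed copy.
 */
double memcpy_throughput(size_t len, int prefault)
{
	/* Calling through a volatile pointer (much like perf's routines[i].fn)
	 * stops the compiler from eliding or merging the two copies. */
	void *(*volatile copy)(void *, const void *, size_t) = memcpy;
	char *src, *dst;
	double start, elapsed;

	if (len == 0)
		return -1.0;
	src = calloc(1, len);
	dst = calloc(1, len);
	if (!src || !dst) {
		free(src);
		free(dst);
		return -1.0;
	}

	if (prefault)
		copy(dst, src, len);	/* untimed warm-up: maps the pages in */

	start = now_sec();
	copy(dst, src, len);
	elapsed = now_sec() - start;

	free(src);
	free(dst);

	if (elapsed <= 0.0)
		elapsed = 1e-9;		/* guard against a coarse clock */
	return (double)len / elapsed;
}
```

Comparing memcpy_throughput(500 << 20, 0) with memcpy_throughput(500 << 20, 1) typically shows the same kind of gap as the example output above, though the exact numbers depend on the machine.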

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ma Ling <ling.ma@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
---
 tools/perf/bench/mem-memcpy.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/tools/perf/bench/mem-memcpy.c b/tools/perf/bench/mem-memcpy.c
index 38dae74..be31ddb 100644
--- a/tools/perf/bench/mem-memcpy.c
+++ b/tools/perf/bench/mem-memcpy.c
@@ -23,8 +23,9 @@
 
 static const char	*length_str	= "1MB";
 static const char	*routine	= "default";
-static bool		use_clock	= false;
+static bool		use_clock;
 static int		clock_fd;
+static bool		prefault;
 
 static const struct option options[] = {
 	OPT_STRING('l', "length", &length_str, "1MB",
@@ -34,6 +35,8 @@ static const struct option options[] = {
 		    "Specify routine to copy"),
 	OPT_BOOLEAN('c', "clock", &use_clock,
 		    "Use CPU clock for measuring"),
+	OPT_BOOLEAN('p', "prefault", &prefault,
+		    "Cause page faults before memcpy()"),
 	OPT_END()
 };
 
@@ -139,6 +142,10 @@ int bench_mem_memcpy(int argc, const char **argv,
 		       length_str, src, dst);
 	}
 
+
+	if (prefault)
+		routines[i].fn(dst, src, length);
+
 	if (use_clock) {
 		init_clock();
 		clock_start = get_clock();
-- 
1.7.1.1
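The last hunk reuses the selected entry of perf's routines[] table for the warm-up call. Here is a rough standalone sketch of that dispatch pattern. It assumes a table of name/description/function-pointer entries; the struct layout, find_routine() and run_copy() are illustrative and are not perf's actual definitions.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Hypothetical routine table; only a "default" entry is shown. */
struct routine {
	const char *name;
	const char *desc;
	memcpy_fn fn;
};

static const struct routine routines[] = {
	{ "default", "Default memcpy() provided by the C library", memcpy },
	{ NULL, NULL, NULL },	/* terminator */
};

/* Return the index of the routine called name, or -1 if there is none. */
int find_routine(const char *name)
{
	for (int i = 0; routines[i].name; i++)
		if (!strcmp(routines[i].name, name))
			return i;
	return -1;
}

/* One iteration: optional untimed prefault, then the copy to be timed. */
void run_copy(int i, void *dst, const void *src, size_t length, int prefault)
{
	if (prefault)
		routines[i].fn(dst, src, length);	/* faults pages in */

	/* ...start the timer here... */
	routines[i].fn(dst, src, length);
	/* ...stop the timer here... */
}
```

Because the warm-up is a complete copy, buffers that fit in the CPU caches may also start the timed copy with warm caches. For small -l values, prefaulting this way therefore removes more than just the page-fault cost.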



* Re: [PATCH] perf bench: add --prefault option for causing page faults before benchmark
  2010-11-05 17:06 [PATCH] perf bench: add --prefault option for causing page faults before benchmark Hitoshi Mitake
@ 2010-11-10  9:29 ` Ingo Molnar
  2010-11-15 15:58   ` Hitoshi Mitake
  0 siblings, 1 reply; 3+ messages in thread
From: Ingo Molnar @ 2010-11-10  9:29 UTC (permalink / raw)
  To: Hitoshi Mitake
  Cc: linux-kernel, h.mitake, Ma Ling, Zhao Yakui, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Paul Mackerras, Frederic Weisbecker,
	Steven Rostedt, Thomas Gleixner, H. Peter Anvin


* Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> wrote:

> This patch adds --prefault option to perf bench mem memcpy.
> If user specify this option to perf bench mem memcpy, overhead of
> page faults will be removed from the score of memcpy().
> 
> Example of usage:
> | % ./perf bench mem memcpy -l 500MB
> | # Running mem/memcpy benchmark...
> | # Copying 500MB Bytes from 0x7fc036749010 to 0x7fc055b4a010 ...
> |
> |      628.526821 MB/Sec
> | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --prefault
> | # Running mem/memcpy benchmark...
> | # Copying 500MB Bytes from 0x7ff1b45e2010 to 0x7ff1d39e3010 ...
> |
> |        4.849256 GB/Sec

Ok, looks rather useful.

We are rather close to being able to apply these bits. We need a resolution for the 
arch/x86/lib/memcpy_64.S details. The ugliest are these kinds of #ifdefs:

+#ifndef PERF_BENCH
 .Lmemcpy_e:
        .previous
+#endif

What happens if we keep that label in place?

This:

+#ifndef PERF_BENCH
 ENTRY(__memcpy)
 ENTRY(memcpy)
        CFI_STARTPROC
+#else
+	.globl  memcpy_x86_64_unrolled
+memcpy_x86_64_unrolled:
+#endif

Could be removed if you defined an ENTRY() macro in perf, right?

This:

+#ifndef PERF_BENCH
+
        CFI_ENDPROC
 ENDPROC(memcpy)
 ENDPROC(__memcpy)

Could be solved by defining ENDPROC()/etc. macros in perf, right?

We could remove this #ifdef:

+#ifndef PERF_BENCH
+
 #include <linux/linkage.h>

 #include <asm/cpufeature.h>
 #include <asm/dwarf2.h>

+#endif /* PERF_BENCH */

if you added empty linkage.h, cpufeature.h and dwarf2.h files as 
tools/perf/util/include/linux/linkage.h, tools/perf/util/include/asm/cpufeature.h.

That linkage.h file could even contain a short perf version of the ENTRY() macro, 
etc.

That way we can avoid having to touch arch/x86/lib/memcpy_64.S altogether.
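Below is a rough sketch of the perf-side headers suggested above. It assumes the kernel's ENTRY()/ENDPROC() can be reduced to a plain global label for user-space use. The linkage.h path comes from this mail, but the contents are illustrative and are not necessarily what perf ends up with. One caveat: an "empty" dwarf2.h is not quite enough. memcpy_64.S uses CFI_STARTPROC and CFI_ENDPROC as bare tokens, so they still need to be defined to nothing, or the assembler will reject them.

```c
/* Hypothetical tools/perf/util/include/linux/linkage.h (sketch). */
#ifndef PERF_LINUX_LINKAGE_H_
#define PERF_LINUX_LINKAGE_H_

/*
 * The kernel's ENTRY() also aligns the label and its ENDPROC() emits
 * .type/.size directives; this sketch keeps only the global label.
 */
#define ENTRY(name)		\
	.globl name;		\
	name:

#define ENDPROC(name)

#endif	/* PERF_LINUX_LINKAGE_H_ */

/* Hypothetical perf copy of asm/dwarf2.h: CFI annotations become no-ops. */
#ifndef PERF_ASM_DWARF2_H_
#define PERF_ASM_DWARF2_H_

#define CFI_STARTPROC
#define CFI_ENDPROC

#endif	/* PERF_ASM_DWARF2_H_ */
```

With these in the include path, the ENTRY(__memcpy)/ENTRY(memcpy) and ENDPROC() lines in memcpy_64.S would assemble unchanged, so no PERF_BENCH #ifdefs would be needed for them.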

Thanks,

	Ingo


* Re: [PATCH] perf bench: add --prefault option for causing page faults before benchmark
  2010-11-10  9:29 ` Ingo Molnar
@ 2010-11-15 15:58   ` Hitoshi Mitake
  0 siblings, 0 replies; 3+ messages in thread
From: Hitoshi Mitake @ 2010-11-15 15:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, h.mitake, Ma Ling, Zhao Yakui, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Paul Mackerras, Frederic Weisbecker,
	Steven Rostedt, Thomas Gleixner, H. Peter Anvin

On 2010-11-10 18:29, Ingo Molnar wrote:
>
> * Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> wrote:
>
>> This patch adds --prefault option to perf bench mem memcpy.
>> If user specify this option to perf bench mem memcpy, overhead of
>> page faults will be removed from the score of memcpy().
>>
>> Example of usage:
>> | % ./perf bench mem memcpy -l 500MB
>> | # Running mem/memcpy benchmark...
>> | # Copying 500MB Bytes from 0x7fc036749010 to 0x7fc055b4a010 ...
>> |
>> |      628.526821 MB/Sec
>> | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --prefault
>> | # Running mem/memcpy benchmark...
>> | # Copying 500MB Bytes from 0x7ff1b45e2010 to 0x7ff1d39e3010 ...
>> |
>> |        4.849256 GB/Sec
>
> Ok, looks rather useful.
>
> We are rather close to being able to apply these bits. We need a resolution for the
> arch/x86/lib/memcpy_64.S details. The ugliest are these kinds of #ifdefs:
>
> +#ifndef PERF_BENCH
>   .Lmemcpy_e:
>          .previous
> +#endif
>
> What happens if we keep that label in place?

The following is part of the output of objdump -D arch/x86/lib/memcpy_64.o:

Disassembly of section .altinstr_replacement:

0000000000000000 <.altinstr_replacement>:
    0:   48 89 f8                mov    %rdi,%rax
    3:   89 d1                   mov    %edx,%ecx
    5:   c1 e9 03                shr    $0x3,%ecx
    8:   83 e2 07                and    $0x7,%edx
    b:   f3 48 a5                rep movsq %ds:(%rsi),%es:(%rdi)
    e:   89 d1                   mov    %edx,%ecx
   10:   f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
   12:   c3                      retq
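The .altinstr_replacement code above is the rep-string variant of memcpy(). Here is a rough C rendering of what those eight instructions do. The function name is hypothetical, and the loops only mimic the semantics of rep movsq / rep movsb, not their performance. One detail: the assembly derives its counts from the 32-bit %edx, so only the low 32 bits of the length are used, whereas this sketch uses the full size_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical C rendering of the .altinstr_replacement code:
 * copy len / 8 quadwords, then len % 8 single bytes, return dst.
 */
void *memcpy_rep_sketch(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;			/* %rdi */
	const unsigned char *s = src;		/* %rsi */
	size_t qwords = len >> 3;		/* mov %edx,%ecx; shr $0x3,%ecx */
	size_t tail = len & 7;			/* and $0x7,%edx */

	while (qwords--) {			/* rep movsq */
		uint64_t q;

		memcpy(&q, s, sizeof(q));	/* alignment-safe 8-byte load */
		memcpy(d, &q, sizeof(q));	/* ...and store */
		s += 8;
		d += 8;
	}
	while (tail--)				/* mov %edx,%ecx; rep movsb */
		*d++ = *s++;

	return dst;	/* mov %rdi,%rax (done first in the asm) */
}
```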

I didn't know that we could use symbol names starting with '.L'.
It seems that such local symbols are not emitted into the object file.

We can still find the start address of .Lmemcpy_c, the rep-based version
of memcpy(), because that address is recorded in another section,
.altinstructions.

This information can be exploited for our purpose; I'll try it.

>
> This:
>
> +#ifndef PERF_BENCH
>   ENTRY(__memcpy)
>   ENTRY(memcpy)
>          CFI_STARTPROC
> +#else
> +	.globl  memcpy_x86_64_unrolled
> +memcpy_x86_64_unrolled:
> +#endif
>
> Could be removed if you defined an ENTRY() macro in perf, right?
>
> This:
>
> +#ifndef PERF_BENCH
> +
>          CFI_ENDPROC
>   ENDPROC(memcpy)
>   ENDPROC(__memcpy)
>
> Could be solved by defining ENDPROC()/etc. macros in perf, right?
>
> We could remove this #ifdef:
>
> +#ifndef PERF_BENCH
> +
>   #include <linux/linkage.h>
>
>   #include <asm/cpufeature.h>
>   #include <asm/dwarf2.h>
>
> +#endif /* PERF_BENCH */
>
> if you added empty linkage.h, cpufeature.h and dwarf2.h files as
> tools/perf/util/include/linux/linkage.h, tools/perf/util/include/asm/cpufeature.h.
>
> That linkage.h file could even contain a short perf version of the ENTRY() macro,
> etc.
>
> That way we can avoid having to touch arch/x86/lib/memcpy_64.S altogether.

Thanks for your advice. Adding empty headers and macros would be a smart way
to include memcpy_64.S without modification.

Thanks,



