linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
@ 2016-07-04 10:27 Byungchul Park
  2016-07-04 10:27 ` [PATCH 2/2] x86/dumpstack: Add save_stack_trace_norm() Byungchul Park
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Byungchul Park @ 2016-07-04 10:27 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, walken

I suggested this patch at https://lkml.org/lkml/2016/6/20/22. However,
I want to proceed with it separately since the two are somewhat
independent of each other. Frankly speaking, I would like this patchset
to be accepted first so that the crossrelease feature can use the
optimized save_stack_trace_norm(), which makes crossrelease work
smoothly.

----->8-----
From 1ceb4cee520cfc562d5d63471f6db4e9a8d9ff42 Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul.park@lge.com>
Date: Mon, 4 Jul 2016 18:31:09 +0900
Subject: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace

Currently, the x86 implementation of save_stack_trace() walks the whole
stack region word by word, regardless of trace->max_entries. However,
it is unnecessary to keep walking once the caller's requirement has
been fulfilled, that is, once trace->nr_entries >= trace->max_entries.

I measured the overhead by printing the difference of sched_clock()
around each call on my QEMU x86 machine. The latency improved by over
70% with trace->max_entries = 5.
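
For reference, the numbers below came from ad-hoc instrumentation
roughly like the following sketch; the wrapper function is illustrative
only and is not part of this patch:

	/*
	 * Illustrative measurement sketch (not part of this patch):
	 * time a single save_stack_trace() call with sched_clock().
	 */
	static void measure_save_stack_trace(void)
	{
		unsigned long entries[5];
		struct stack_trace trace = {
			.nr_entries	= 0,
			.max_entries	= 5,
			.entries	= entries,
			.skip		= 0,
		};
		u64 start = sched_clock();

		save_stack_trace(&trace);
		printk("save_stack_trace() takes %llu ns\n",
		       sched_clock() - start);
	}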

Before this patch:

[    2.326940] save_stack_trace() takes 83931 ns
[    2.326389] save_stack_trace() takes 62576 ns
[    2.327575] save_stack_trace() takes 58826 ns
[    2.327000] save_stack_trace() takes 88980 ns
[    2.327424] save_stack_trace() takes 59831 ns
[    2.327575] save_stack_trace() takes 58482 ns
[    2.327597] save_stack_trace() takes 87114 ns
[    2.327931] save_stack_trace() takes 121140 ns
[    2.327434] save_stack_trace() takes 64321 ns
[    2.328632] save_stack_trace() takes 84997 ns
[    2.328000] save_stack_trace() takes 115037 ns
[    2.328460] save_stack_trace() takes 72292 ns
[    2.328632] save_stack_trace() takes 61236 ns
[    2.328567] save_stack_trace() takes 76666 ns
[    2.328867] save_stack_trace() takes 79525 ns
[    2.328460] save_stack_trace() takes 64902 ns
[    2.329585] save_stack_trace() takes 58760 ns
[    2.329000] save_stack_trace() takes 91349 ns
[    2.329414] save_stack_trace() takes 60069 ns
[    2.329585] save_stack_trace() takes 61012 ns
[    2.329573] save_stack_trace() takes 76820 ns
[    2.329863] save_stack_trace() takes 62131 ns
[    2.330000] save_stack_trace() takes 99476 ns
[    2.329846] save_stack_trace() takes 62419 ns
[    2.330000] save_stack_trace() takes 88918 ns
[    2.330253] save_stack_trace() takes 73669 ns
[    2.330520] save_stack_trace() takes 67876 ns
[    2.330671] save_stack_trace() takes 75963 ns
[    2.330983] save_stack_trace() takes 95079 ns
[    2.330451] save_stack_trace() takes 62352 ns

After this patch:

[    2.780735] save_stack_trace() takes 19902 ns
[    2.780718] save_stack_trace() takes 20240 ns
[    2.781692] save_stack_trace() takes 45215 ns
[    2.781477] save_stack_trace() takes 20191 ns
[    2.781694] save_stack_trace() takes 20044 ns
[    2.782589] save_stack_trace() takes 20292 ns
[    2.782706] save_stack_trace() takes 20024 ns
[    2.782706] save_stack_trace() takes 19881 ns
[    2.782881] save_stack_trace() takes 24577 ns
[    2.782706] save_stack_trace() takes 19901 ns
[    2.783621] save_stack_trace() takes 24381 ns
[    2.783621] save_stack_trace() takes 20205 ns
[    2.783760] save_stack_trace() takes 19956 ns
[    2.783718] save_stack_trace() takes 20280 ns
[    2.784179] save_stack_trace() takes 20099 ns
[    2.784835] save_stack_trace() takes 20055 ns
[    2.785922] save_stack_trace() takes 20157 ns
[    2.785922] save_stack_trace() takes 20140 ns
[    2.786178] save_stack_trace() takes 20040 ns
[    2.786877] save_stack_trace() takes 20102 ns
[    2.795000] save_stack_trace() takes 21147 ns
[    2.795397] save_stack_trace() takes 20230 ns
[    2.795397] save_stack_trace() takes 31274 ns
[    2.795739] save_stack_trace() takes 19706 ns
[    2.796484] save_stack_trace() takes 20266 ns
[    2.796484] save_stack_trace() takes 20902 ns
[    2.797000] save_stack_trace() takes 38110 ns
[    2.797510] save_stack_trace() takes 20224 ns
[    2.798181] save_stack_trace() takes 20172 ns
[    2.798837] save_stack_trace() takes 20824 ns

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 arch/x86/include/asm/stacktrace.h | 1 +
 arch/x86/kernel/dumpstack.c       | 4 ++++
 arch/x86/kernel/dumpstack_32.c    | 2 ++
 arch/x86/kernel/stacktrace.c      | 7 +++++++
 4 files changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 70bbe39..fc572e7 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -41,6 +41,7 @@ struct stacktrace_ops {
 	/* On negative return stop dumping */
 	int (*stack)(void *data, char *name);
 	walk_stack_t	walk_stack;
+	int (*end_walk)(void *data);
 };
 
 void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 9c30acf..89f68f3 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -115,6 +115,8 @@ print_context_stack(struct thread_info *tinfo,
 			print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
 		}
 		stack++;
+		if (ops->end_walk && ops->end_walk(data))
+			break;
 	}
 	return bp;
 }
@@ -139,6 +141,8 @@ print_context_stack_bp(struct thread_info *tinfo,
 		frame = frame->next_frame;
 		ret_addr = &frame->return_address;
 		print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
+		if (ops->end_walk && ops->end_walk(data))
+			break;
 	}
 
 	return (unsigned long)frame;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index 464ffd6..cc51419 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -71,6 +71,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		context = task_thread_info(task);
 		bp = ops->walk_stack(context, stack, bp, ops, data,
 				     end_stack, &graph);
+		if (ops->end_walk && ops->end_walk(data))
+			break;
 
 		/* Stop if not on irq stack */
 		if (!end_stack)
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index fdd0c64..9545719 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -43,10 +43,17 @@ save_stack_address_nosched(void *data, unsigned long addr, int reliable)
 	return __save_stack_address(data, addr, reliable, true);
 }
 
+static int save_stack_end(void *data)
+{
+	struct stack_trace *trace = data;
+	return trace->nr_entries >= trace->max_entries;
+}
+
 static const struct stacktrace_ops save_stack_ops = {
 	.stack		= save_stack_stack,
 	.address	= save_stack_address,
 	.walk_stack	= print_context_stack,
+	.end_walk	= save_stack_end,
 };
 
 static const struct stacktrace_ops save_stack_ops_nosched = {
-- 
1.9.1


* [PATCH 2/2] x86/dumpstack: Add save_stack_trace_norm()
  2016-07-04 10:27 [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace Byungchul Park
@ 2016-07-04 10:27 ` Byungchul Park
  2016-07-07 10:17 ` [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace Byungchul Park
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Byungchul Park @ 2016-07-04 10:27 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, walken

In the non-oops case, it is usually unnecessary to check every word of
the stack area to extract a backtrace. Instead, the same result can be
achieved by following the frame pointer chain. So make it possible to
save a stack trace lightly in the normal case.
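
As an illustration (not part of this patch), a caller that only needs a
few entries could use the new helper along these lines:

	/* Illustrative caller of the new helper, not part of this patch. */
	unsigned long entries[5];
	struct stack_trace trace = {
		.nr_entries	= 0,
		.max_entries	= ARRAY_SIZE(entries),
		.entries	= entries,
		.skip		= 2,	/* skip the immediate wrapper frames */
	};

	save_stack_trace_norm(&trace);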

I measured the overhead by printing the difference of sched_clock()
around each call on my QEMU x86 machine. The latency improved by over
80% with trace->max_entries = 5.

Before this patch:

[    2.780735] save_stack_trace takes 19902 (sched_lock)
[    2.780718] save_stack_trace takes 20240 (sched_lock)
[    2.781692] save_stack_trace takes 45215 (sched_lock)
[    2.781477] save_stack_trace takes 20191 (sched_lock)
[    2.781694] save_stack_trace takes 20044 (sched_lock)
[    2.782589] save_stack_trace takes 20292 (sched_lock)
[    2.782706] save_stack_trace takes 20024 (sched_lock)
[    2.782706] save_stack_trace takes 19881 (sched_lock)
[    2.782881] save_stack_trace takes 24577 (sched_lock)
[    2.782706] save_stack_trace takes 19901 (sched_lock)
[    2.783621] save_stack_trace takes 24381 (sched_lock)
[    2.783621] save_stack_trace takes 20205 (sched_lock)
[    2.783760] save_stack_trace takes 19956 (sched_lock)
[    2.783718] save_stack_trace takes 20280 (sched_lock)
[    2.784179] save_stack_trace takes 20099 (sched_lock)
[    2.784835] save_stack_trace takes 20055 (sched_lock)
[    2.785922] save_stack_trace takes 20157 (sched_lock)
[    2.785922] save_stack_trace takes 20140 (sched_lock)
[    2.786178] save_stack_trace takes 20040 (sched_lock)
[    2.786877] save_stack_trace takes 20102 (sched_lock)
[    2.795000] save_stack_trace takes 21147 (sched_lock)
[    2.795397] save_stack_trace takes 20230 (sched_lock)
[    2.795397] save_stack_trace takes 31274 (sched_lock)
[    2.795739] save_stack_trace takes 19706 (sched_lock)
[    2.796484] save_stack_trace takes 20266 (sched_lock)
[    2.796484] save_stack_trace takes 20902 (sched_lock)
[    2.797000] save_stack_trace takes 38110 (sched_lock)
[    2.797510] save_stack_trace takes 20224 (sched_lock)
[    2.798181] save_stack_trace takes 20172 (sched_lock)
[    2.798837] save_stack_trace takes 20824 (sched_lock)

After this patch:

[    3.100520] save_stack_trace takes 3817 (sched_lock)
[    3.100680] save_stack_trace takes 3812 (sched_lock)
[    3.101740] save_stack_trace takes 5012 (sched_lock)
[    3.101809] save_stack_trace takes 3868 (sched_lock)
[    3.103264] save_stack_trace takes 3657 (sched_lock)
[    3.103264] save_stack_trace takes 3821 (sched_lock)
[    3.129442] save_stack_trace takes 3923 (sched_lock)
[    3.129983] save_stack_trace takes 3405 (sched_lock)
[    3.130000] save_stack_trace takes 3603 (sched_lock)
[    3.130246] save_stack_trace takes 3362 (sched_lock)
[    3.130598] save_stack_trace takes 3674 (sched_lock)
[    3.130968] save_stack_trace takes 3609 (sched_lock)
[    3.131379] save_stack_trace takes 6376 (sched_lock)
[    3.131847] save_stack_trace takes 3221 (sched_lock)
[    3.132000] save_stack_trace takes 3597 (sched_lock)
[    3.132043] save_stack_trace takes 3400 (sched_lock)
[    3.132572] save_stack_trace takes 3283 (sched_lock)
[    3.132714] save_stack_trace takes 3335 (sched_lock)
[    3.133039] save_stack_trace takes 3358 (sched_lock)
[    3.133476] save_stack_trace takes 3160 (sched_lock)
[    3.133807] save_stack_trace takes 3297 (sched_lock)
[    3.133954] save_stack_trace takes 3330 (sched_lock)
[    3.134235] save_stack_trace takes 3517 (sched_lock)
[    3.134711] save_stack_trace takes 3773 (sched_lock)
[    3.135000] save_stack_trace takes 3685 (sched_lock)
[    3.135541] save_stack_trace takes 4757 (sched_lock)
[    3.135865] save_stack_trace takes 3420 (sched_lock)
[    3.136000] save_stack_trace takes 3329 (sched_lock)
[    3.137000] save_stack_trace takes 4058 (sched_lock)
[    3.137000] save_stack_trace takes 3499 (sched_lock)

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 arch/x86/kernel/stacktrace.c | 25 +++++++++++++++++++++++++
 include/linux/stacktrace.h   |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 9545719..f1ca767 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -49,6 +49,10 @@ static int save_stack_end(void *data)
 	return trace->nr_entries >= trace->max_entries;
 }
 
+/*
+ * This operation should be used in the oops case, where the
+ * stack might be broken.
+ */
 static const struct stacktrace_ops save_stack_ops = {
 	.stack		= save_stack_stack,
 	.address	= save_stack_address,
@@ -56,6 +60,13 @@ static const struct stacktrace_ops save_stack_ops = {
 	.end_walk	= save_stack_end,
 };
 
+static const struct stacktrace_ops save_stack_ops_norm = {
+	.stack		= save_stack_stack,
+	.address	= save_stack_address,
+	.walk_stack	= print_context_stack_bp,
+	.end_walk	= save_stack_end,
+};
+
 static const struct stacktrace_ops save_stack_ops_nosched = {
 	.stack		= save_stack_stack,
 	.address	= save_stack_address_nosched,
@@ -64,6 +75,7 @@ static const struct stacktrace_ops save_stack_ops_nosched = {
 
 /*
  * Save stack-backtrace addresses into a stack_trace buffer.
+ * It works even in the oops case.
  */
 void save_stack_trace(struct stack_trace *trace)
 {
@@ -73,6 +85,19 @@ void save_stack_trace(struct stack_trace *trace)
 }
 EXPORT_SYMBOL_GPL(save_stack_trace);
 
+/*
+ * Save stack-backtrace addresses into a stack_trace buffer.
+ * This is preferred in the normal case, where we expect the
+ * stack to be reliable.
+ */
+void save_stack_trace_norm(struct stack_trace *trace)
+{
+	dump_trace(current, NULL, NULL, 0, &save_stack_ops_norm, trace);
+	if (trace->nr_entries < trace->max_entries)
+		trace->entries[trace->nr_entries++] = ULONG_MAX;
+}
+EXPORT_SYMBOL_GPL(save_stack_trace_norm);
+
 void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
 {
 	dump_trace(current, regs, NULL, 0, &save_stack_ops, trace);
diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h
index 0a34489..58d5176 100644
--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -14,6 +14,7 @@ struct stack_trace {
 };
 
 extern void save_stack_trace(struct stack_trace *trace);
+extern void save_stack_trace_norm(struct stack_trace *trace);
 extern void save_stack_trace_regs(struct pt_regs *regs,
 				  struct stack_trace *trace);
 extern void save_stack_trace_tsk(struct task_struct *tsk,
@@ -31,6 +32,7 @@ extern void save_stack_trace_user(struct stack_trace *trace);
 
 #else
 # define save_stack_trace(trace)			do { } while (0)
+# define save_stack_trace_norm(trace)			do { } while (0)
 # define save_stack_trace_tsk(tsk, trace)		do { } while (0)
 # define save_stack_trace_user(trace)			do { } while (0)
 # define print_stack_trace(trace, spaces)		do { } while (0)
-- 
1.9.1


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-04 10:27 [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace Byungchul Park
  2016-07-04 10:27 ` [PATCH 2/2] x86/dumpstack: Add save_stack_trace_norm() Byungchul Park
@ 2016-07-07 10:17 ` Byungchul Park
  2016-07-08 10:08   ` Ingo Molnar
  2016-07-08 14:08 ` Josh Poimboeuf
  2016-07-08 14:44 ` Frederic Weisbecker
  3 siblings, 1 reply; 17+ messages in thread
From: Byungchul Park @ 2016-07-07 10:17 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, walken

On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> I want to proceed saperately since it's somewhat independent from each
> other. Frankly speaking, I want this patchset to be accepted at first so
> that the crossfeature can use this optimized save_stack_trace_norm()
> which makes crossrelease work smoothly.

What do you think about this way to improve it?

> 
> ----->8-----
> >From 1ceb4cee520cfc562d5d63471f6db4e9a8d9ff42 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul.park@lge.com>
> Date: Mon, 4 Jul 2016 18:31:09 +0900
> Subject: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
> 
> Currently, x86 implementation of save_stack_trace() is walking all stack
> region word by word regardless of what the trace->max_entries is.
> However, it's unnecessary to walk after already fulfilling caller's
> requirement, say, if trace->nr_entries >= trace->max_entries is true.
> 
> I measured its overhead and printed its difference of sched_clock() with
> my QEMU x86 machine. The latency was improved over 70% when
> trace->max_entries = 5.
> 
> Before this patch:
> 
> [    2.326940] save_stack_trace() takes 83931 ns
> [    2.326389] save_stack_trace() takes 62576 ns
> [    2.327575] save_stack_trace() takes 58826 ns
> [    2.327000] save_stack_trace() takes 88980 ns
> [    2.327424] save_stack_trace() takes 59831 ns
> [    2.327575] save_stack_trace() takes 58482 ns
> [    2.327597] save_stack_trace() takes 87114 ns
> [    2.327931] save_stack_trace() takes 121140 ns
> [    2.327434] save_stack_trace() takes 64321 ns
> [    2.328632] save_stack_trace() takes 84997 ns
> [    2.328000] save_stack_trace() takes 115037 ns
> [    2.328460] save_stack_trace() takes 72292 ns
> [    2.328632] save_stack_trace() takes 61236 ns
> [    2.328567] save_stack_trace() takes 76666 ns
> [    2.328867] save_stack_trace() takes 79525 ns
> [    2.328460] save_stack_trace() takes 64902 ns
> [    2.329585] save_stack_trace() takes 58760 ns
> [    2.329000] save_stack_trace() takes 91349 ns
> [    2.329414] save_stack_trace() takes 60069 ns
> [    2.329585] save_stack_trace() takes 61012 ns
> [    2.329573] save_stack_trace() takes 76820 ns
> [    2.329863] save_stack_trace() takes 62131 ns
> [    2.330000] save_stack_trace() takes 99476 ns
> [    2.329846] save_stack_trace() takes 62419 ns
> [    2.330000] save_stack_trace() takes 88918 ns
> [    2.330253] save_stack_trace() takes 73669 ns
> [    2.330520] save_stack_trace() takes 67876 ns
> [    2.330671] save_stack_trace() takes 75963 ns
> [    2.330983] save_stack_trace() takes 95079 ns
> [    2.330451] save_stack_trace() takes 62352 ns
> 
> After this patch:
> 
> [    2.780735] save_stack_trace() takes 19902 ns
> [    2.780718] save_stack_trace() takes 20240 ns
> [    2.781692] save_stack_trace() takes 45215 ns
> [    2.781477] save_stack_trace() takes 20191 ns
> [    2.781694] save_stack_trace() takes 20044 ns
> [    2.782589] save_stack_trace() takes 20292 ns
> [    2.782706] save_stack_trace() takes 20024 ns
> [    2.782706] save_stack_trace() takes 19881 ns
> [    2.782881] save_stack_trace() takes 24577 ns
> [    2.782706] save_stack_trace() takes 19901 ns
> [    2.783621] save_stack_trace() takes 24381 ns
> [    2.783621] save_stack_trace() takes 20205 ns
> [    2.783760] save_stack_trace() takes 19956 ns
> [    2.783718] save_stack_trace() takes 20280 ns
> [    2.784179] save_stack_trace() takes 20099 ns
> [    2.784835] save_stack_trace() takes 20055 ns
> [    2.785922] save_stack_trace() takes 20157 ns
> [    2.785922] save_stack_trace() takes 20140 ns
> [    2.786178] save_stack_trace() takes 20040 ns
> [    2.786877] save_stack_trace() takes 20102 ns
> [    2.795000] save_stack_trace() takes 21147 ns
> [    2.795397] save_stack_trace() takes 20230 ns
> [    2.795397] save_stack_trace() takes 31274 ns
> [    2.795739] save_stack_trace() takes 19706 ns
> [    2.796484] save_stack_trace() takes 20266 ns
> [    2.796484] save_stack_trace() takes 20902 ns
> [    2.797000] save_stack_trace() takes 38110 ns
> [    2.797510] save_stack_trace() takes 20224 ns
> [    2.798181] save_stack_trace() takes 20172 ns
> [    2.798837] save_stack_trace() takes 20824 ns
> 
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  arch/x86/include/asm/stacktrace.h | 1 +
>  arch/x86/kernel/dumpstack.c       | 4 ++++
>  arch/x86/kernel/dumpstack_32.c    | 2 ++
>  arch/x86/kernel/stacktrace.c      | 7 +++++++
>  4 files changed, 14 insertions(+)
> 
> diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
> index 70bbe39..fc572e7 100644
> --- a/arch/x86/include/asm/stacktrace.h
> +++ b/arch/x86/include/asm/stacktrace.h
> @@ -41,6 +41,7 @@ struct stacktrace_ops {
>  	/* On negative return stop dumping */
>  	int (*stack)(void *data, char *name);
>  	walk_stack_t	walk_stack;
> +	int (*end_walk)(void *data);
>  };
>  
>  void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index 9c30acf..89f68f3 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -115,6 +115,8 @@ print_context_stack(struct thread_info *tinfo,
>  			print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
>  		}
>  		stack++;
> +		if (ops->end_walk && ops->end_walk(data))
> +			break;
>  	}
>  	return bp;
>  }
> @@ -139,6 +141,8 @@ print_context_stack_bp(struct thread_info *tinfo,
>  		frame = frame->next_frame;
>  		ret_addr = &frame->return_address;
>  		print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
> +		if (ops->end_walk && ops->end_walk(data))
> +			break;
>  	}
>  
>  	return (unsigned long)frame;
> diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
> index 464ffd6..cc51419 100644
> --- a/arch/x86/kernel/dumpstack_32.c
> +++ b/arch/x86/kernel/dumpstack_32.c
> @@ -71,6 +71,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
>  		context = task_thread_info(task);
>  		bp = ops->walk_stack(context, stack, bp, ops, data,
>  				     end_stack, &graph);
> +		if (ops->end_walk && ops->end_walk(data))
> +			break;
>  
>  		/* Stop if not on irq stack */
>  		if (!end_stack)
> diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> index fdd0c64..9545719 100644
> --- a/arch/x86/kernel/stacktrace.c
> +++ b/arch/x86/kernel/stacktrace.c
> @@ -43,10 +43,17 @@ save_stack_address_nosched(void *data, unsigned long addr, int reliable)
>  	return __save_stack_address(data, addr, reliable, true);
>  }
>  
> +static int save_stack_end(void *data)
> +{
> +	struct stack_trace *trace = data;
> +	return trace->nr_entries >= trace->max_entries;
> +}
> +
>  static const struct stacktrace_ops save_stack_ops = {
>  	.stack		= save_stack_stack,
>  	.address	= save_stack_address,
>  	.walk_stack	= print_context_stack,
> +	.end_walk	= save_stack_end,
>  };
>  
>  static const struct stacktrace_ops save_stack_ops_nosched = {
> -- 
> 1.9.1


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-07 10:17 ` [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace Byungchul Park
@ 2016-07-08 10:08   ` Ingo Molnar
  2016-07-08 14:29     ` Josh Poimboeuf
                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Ingo Molnar @ 2016-07-08 10:08 UTC (permalink / raw)
  To: Byungchul Park
  Cc: peterz, linux-kernel, walken, Frédéric Weisbecker,
	Josh Poimboeuf, Peter Zijlstra


* Byungchul Park <byungchul.park@lge.com> wrote:

> On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> > I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> > I want to proceed saperately since it's somewhat independent from each
> > other. Frankly speaking, I want this patchset to be accepted at first so
> > that the crossfeature can use this optimized save_stack_trace_norm()
> > which makes crossrelease work smoothly.
> 
> What do you think about this way to improve it?

I like both of your improvements, the speed up is impressive:

  [    2.327597] save_stack_trace() takes 87114 ns
  ...
  [    2.781694] save_stack_trace() takes 20044 ns
  ...
  [    3.103264] save_stack_trace takes 3821 (sched_lock)

Could you please also measure call graph recording (perf record -g), how much 
faster does it get with your patches and what are our remaining performance hot 
spots?

Could you please merge your patches to the latest -tip tree, because this commit I 
merged earlier today:

  81c2949f7fdc x86/dumpstack: Add show_stack_regs() and use it

conflicts with your patches. (I'll push this commit out later today.)

Also, could you please rename the _norm names to _fast or so, to signal that this 
is a faster but less reliable method to get a stack dump? Nobody knows what 
'_norm' means, but '_fast' is pretty self-explanatory.

Thanks,

	Ingo


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-04 10:27 [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace Byungchul Park
  2016-07-04 10:27 ` [PATCH 2/2] x86/dumpstack: Add save_stack_trace_norm() Byungchul Park
  2016-07-07 10:17 ` [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace Byungchul Park
@ 2016-07-08 14:08 ` Josh Poimboeuf
  2016-07-08 14:44 ` Frederic Weisbecker
  3 siblings, 0 replies; 17+ messages in thread
From: Josh Poimboeuf @ 2016-07-08 14:08 UTC (permalink / raw)
  To: Byungchul Park; +Cc: peterz, mingo, linux-kernel, walken

On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> I want to proceed saperately since it's somewhat independent from each
> other. Frankly speaking, I want this patchset to be accepted at first so
> that the crossfeature can use this optimized save_stack_trace_norm()
> which makes crossrelease work smoothly.
> 
> ----->8-----
> From 1ceb4cee520cfc562d5d63471f6db4e9a8d9ff42 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul.park@lge.com>
> Date: Mon, 4 Jul 2016 18:31:09 +0900
> Subject: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
> 
> Currently, x86 implementation of save_stack_trace() is walking all stack
> region word by word regardless of what the trace->max_entries is.
> However, it's unnecessary to walk after already fulfilling caller's
> requirement, say, if trace->nr_entries >= trace->max_entries is true.
> 
> I measured its overhead and printed its difference of sched_clock() with
> my QEMU x86 machine. The latency was improved over 70% when
> trace->max_entries = 5.
> 
> Before this patch:
> 
> [    2.326940] save_stack_trace() takes 83931 ns
> [    2.326389] save_stack_trace() takes 62576 ns
> [    2.327575] save_stack_trace() takes 58826 ns
> [    2.327000] save_stack_trace() takes 88980 ns
> [    2.327424] save_stack_trace() takes 59831 ns
> [    2.327575] save_stack_trace() takes 58482 ns
> [    2.327597] save_stack_trace() takes 87114 ns
> [    2.327931] save_stack_trace() takes 121140 ns
> [    2.327434] save_stack_trace() takes 64321 ns
> [    2.328632] save_stack_trace() takes 84997 ns
> [    2.328000] save_stack_trace() takes 115037 ns
> [    2.328460] save_stack_trace() takes 72292 ns
> [    2.328632] save_stack_trace() takes 61236 ns
> [    2.328567] save_stack_trace() takes 76666 ns
> [    2.328867] save_stack_trace() takes 79525 ns
> [    2.328460] save_stack_trace() takes 64902 ns
> [    2.329585] save_stack_trace() takes 58760 ns
> [    2.329000] save_stack_trace() takes 91349 ns
> [    2.329414] save_stack_trace() takes 60069 ns
> [    2.329585] save_stack_trace() takes 61012 ns
> [    2.329573] save_stack_trace() takes 76820 ns
> [    2.329863] save_stack_trace() takes 62131 ns
> [    2.330000] save_stack_trace() takes 99476 ns
> [    2.329846] save_stack_trace() takes 62419 ns
> [    2.330000] save_stack_trace() takes 88918 ns
> [    2.330253] save_stack_trace() takes 73669 ns
> [    2.330520] save_stack_trace() takes 67876 ns
> [    2.330671] save_stack_trace() takes 75963 ns
> [    2.330983] save_stack_trace() takes 95079 ns
> [    2.330451] save_stack_trace() takes 62352 ns
> 
> After this patch:
> 
> [    2.780735] save_stack_trace() takes 19902 ns
> [    2.780718] save_stack_trace() takes 20240 ns
> [    2.781692] save_stack_trace() takes 45215 ns
> [    2.781477] save_stack_trace() takes 20191 ns
> [    2.781694] save_stack_trace() takes 20044 ns
> [    2.782589] save_stack_trace() takes 20292 ns
> [    2.782706] save_stack_trace() takes 20024 ns
> [    2.782706] save_stack_trace() takes 19881 ns
> [    2.782881] save_stack_trace() takes 24577 ns
> [    2.782706] save_stack_trace() takes 19901 ns
> [    2.783621] save_stack_trace() takes 24381 ns
> [    2.783621] save_stack_trace() takes 20205 ns
> [    2.783760] save_stack_trace() takes 19956 ns
> [    2.783718] save_stack_trace() takes 20280 ns
> [    2.784179] save_stack_trace() takes 20099 ns
> [    2.784835] save_stack_trace() takes 20055 ns
> [    2.785922] save_stack_trace() takes 20157 ns
> [    2.785922] save_stack_trace() takes 20140 ns
> [    2.786178] save_stack_trace() takes 20040 ns
> [    2.786877] save_stack_trace() takes 20102 ns
> [    2.795000] save_stack_trace() takes 21147 ns
> [    2.795397] save_stack_trace() takes 20230 ns
> [    2.795397] save_stack_trace() takes 31274 ns
> [    2.795739] save_stack_trace() takes 19706 ns
> [    2.796484] save_stack_trace() takes 20266 ns
> [    2.796484] save_stack_trace() takes 20902 ns
> [    2.797000] save_stack_trace() takes 38110 ns
> [    2.797510] save_stack_trace() takes 20224 ns
> [    2.798181] save_stack_trace() takes 20172 ns
> [    2.798837] save_stack_trace() takes 20824 ns
> 
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  arch/x86/include/asm/stacktrace.h | 1 +
>  arch/x86/kernel/dumpstack.c       | 4 ++++
>  arch/x86/kernel/dumpstack_32.c    | 2 ++
>  arch/x86/kernel/stacktrace.c      | 7 +++++++
>  4 files changed, 14 insertions(+)
> 
> diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
> index 70bbe39..fc572e7 100644
> --- a/arch/x86/include/asm/stacktrace.h
> +++ b/arch/x86/include/asm/stacktrace.h
> @@ -41,6 +41,7 @@ struct stacktrace_ops {
>  	/* On negative return stop dumping */
>  	int (*stack)(void *data, char *name);
>  	walk_stack_t	walk_stack;
> +	int (*end_walk)(void *data);
>  };
>  
>  void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index 9c30acf..89f68f3 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -115,6 +115,8 @@ print_context_stack(struct thread_info *tinfo,
>  			print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
>  		}
>  		stack++;
> +		if (ops->end_walk && ops->end_walk(data))
> +			break;

Instead of adding a new callback, why not just check the ops->address()
return value?  It already returns an error if the array is full. 
 
I think that would be cleaner and would help prevent more callback
sprawl.


>  	}
>  	return bp;
>  }
> @@ -139,6 +141,8 @@ print_context_stack_bp(struct thread_info *tinfo,
>  		frame = frame->next_frame;
>  		ret_addr = &frame->return_address;
>  		print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
> +		if (ops->end_walk && ops->end_walk(data))
> +			break;

Same here, and print_context_stack_bp() already checks the
ops->address() return value anyway.
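
For illustration, an untested sketch of what that could look like in
print_context_stack() (the surrounding context is reproduced from
memory, so please double-check it against the tree):

		if (__kernel_text_address(addr)) {
			if ((unsigned long) stack == bp + sizeof(long)) {
				/* stop once the consumer says it is done */
				if (ops->address(data, addr, 1))
					break;
				frame = frame->next_frame;
				bp = (unsigned long) frame;
			} else if (ops->address(data, addr, 0)) {
				break;
			}
			print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
		}
		stack++;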

-- 
Josh


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-08 10:08   ` Ingo Molnar
@ 2016-07-08 14:29     ` Josh Poimboeuf
  2016-07-08 14:48       ` Ingo Molnar
  2016-07-08 15:02       ` Frederic Weisbecker
  2016-07-08 15:07     ` Frederic Weisbecker
  2016-07-18  2:37     ` Byungchul Park
  2 siblings, 2 replies; 17+ messages in thread
From: Josh Poimboeuf @ 2016-07-08 14:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Byungchul Park, peterz, linux-kernel, walken,
	Frédéric Weisbecker, Peter Zijlstra

On Fri, Jul 08, 2016 at 12:08:19PM +0200, Ingo Molnar wrote:
> 
> * Byungchul Park <byungchul.park@lge.com> wrote:
> 
> > On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> > > I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> > > I want to proceed saperately since it's somewhat independent from each
> > > other. Frankly speaking, I want this patchset to be accepted at first so
> > > that the crossfeature can use this optimized save_stack_trace_norm()
> > > which makes crossrelease work smoothly.
> > 
> > What do you think about this way to improve it?
> 
> I like both of your improvements, the speed up is impressive:
> 
>   [    2.327597] save_stack_trace() takes 87114 ns
>   ...
>   [    2.781694] save_stack_trace() takes 20044 ns
>   ...
>   [    3.103264] save_stack_trace takes 3821 (sched_lock)
> 
> Could you please also measure call graph recording (perf record -g), how much 
> faster does it get with your patches and what are our remaining performance hot 
> spots?
> 
> Could you please merge your patches to the latest -tip tree, because this commit I 
> merged earlier today:
> 
>   81c2949f7fdc x86/dumpstack: Add show_stack_regs() and use it
> 
> conflicts with your patches. (I'll push this commit out later today.)
> 
> Also, could you please rename the _norm names to _fast or so, to signal that this 
> is a faster but less reliable method to get a stack dump? Nobody knows what 
> '_norm' means, but '_fast' is pretty self-explanatory.

Hm, but is the print_context_stack_bp() variant really less reliable?  From
what I can tell, its only differences vs print_context_stack() are:

- It doesn't scan the stack for "guesses" (which are 'unreliable' and
  are ignored by the ops->address() callback anyway).

- It stops if ops->address() returns an error (which in this case means
  the array is full anyway).

- It stops if the address isn't a kernel text address.  I think this
  shouldn't normally be possible unless there's some generated code like
  bpf on the stack.  Maybe it could be slightly improved for this case.

So instead of adding a new save_stack_trace_fast() variant, why don't we
just modify the existing save_stack_trace() to use
print_context_stack_bp()?

-- 
Josh


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-04 10:27 [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace Byungchul Park
                   ` (2 preceding siblings ...)
  2016-07-08 14:08 ` Josh Poimboeuf
@ 2016-07-08 14:44 ` Frederic Weisbecker
  2016-07-18  3:25   ` Byungchul Park
  3 siblings, 1 reply; 17+ messages in thread
From: Frederic Weisbecker @ 2016-07-08 14:44 UTC (permalink / raw)
  To: Byungchul Park; +Cc: peterz, mingo, linux-kernel, walken

On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> I want to proceed saperately since it's somewhat independent from each
> other. Frankly speaking, I want this patchset to be accepted at first so
> that the crossfeature can use this optimized save_stack_trace_norm()
> which makes crossrelease work smoothly.
> 
> ----->8-----
> From 1ceb4cee520cfc562d5d63471f6db4e9a8d9ff42 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul.park@lge.com>
> Date: Mon, 4 Jul 2016 18:31:09 +0900
> Subject: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
> 
> Currently, x86 implementation of save_stack_trace() is walking all stack
> region word by word regardless of what the trace->max_entries is.
> However, it's unnecessary to walk after already fulfilling caller's
> requirement, say, if trace->nr_entries >= trace->max_entries is true.
> 
> I measured its overhead and printed its difference of sched_clock() with
> my QEMU x86 machine. The latency was improved over 70% when
> trace->max_entries = 5.
> 
> Before this patch:
> 
> [    2.326940] save_stack_trace() takes 83931 ns
> [    2.326389] save_stack_trace() takes 62576 ns
> [    2.327575] save_stack_trace() takes 58826 ns
> [    2.327000] save_stack_trace() takes 88980 ns
> [    2.327424] save_stack_trace() takes 59831 ns
> [    2.327575] save_stack_trace() takes 58482 ns
> [    2.327597] save_stack_trace() takes 87114 ns
> [    2.327931] save_stack_trace() takes 121140 ns
> [    2.327434] save_stack_trace() takes 64321 ns
> [    2.328632] save_stack_trace() takes 84997 ns
> [    2.328000] save_stack_trace() takes 115037 ns
> [    2.328460] save_stack_trace() takes 72292 ns
> [    2.328632] save_stack_trace() takes 61236 ns
> [    2.328567] save_stack_trace() takes 76666 ns
> [    2.328867] save_stack_trace() takes 79525 ns
> [    2.328460] save_stack_trace() takes 64902 ns
> [    2.329585] save_stack_trace() takes 58760 ns
> [    2.329000] save_stack_trace() takes 91349 ns
> [    2.329414] save_stack_trace() takes 60069 ns
> [    2.329585] save_stack_trace() takes 61012 ns
> [    2.329573] save_stack_trace() takes 76820 ns
> [    2.329863] save_stack_trace() takes 62131 ns
> [    2.330000] save_stack_trace() takes 99476 ns
> [    2.329846] save_stack_trace() takes 62419 ns
> [    2.330000] save_stack_trace() takes 88918 ns
> [    2.330253] save_stack_trace() takes 73669 ns
> [    2.330520] save_stack_trace() takes 67876 ns
> [    2.330671] save_stack_trace() takes 75963 ns
> [    2.330983] save_stack_trace() takes 95079 ns
> [    2.330451] save_stack_trace() takes 62352 ns
> 
> After this patch:
> 
> [    2.780735] save_stack_trace() takes 19902 ns
> [    2.780718] save_stack_trace() takes 20240 ns
> [    2.781692] save_stack_trace() takes 45215 ns
> [    2.781477] save_stack_trace() takes 20191 ns
> [    2.781694] save_stack_trace() takes 20044 ns
> [    2.782589] save_stack_trace() takes 20292 ns
> [    2.782706] save_stack_trace() takes 20024 ns
> [    2.782706] save_stack_trace() takes 19881 ns
> [    2.782881] save_stack_trace() takes 24577 ns
> [    2.782706] save_stack_trace() takes 19901 ns
> [    2.783621] save_stack_trace() takes 24381 ns
> [    2.783621] save_stack_trace() takes 20205 ns
> [    2.783760] save_stack_trace() takes 19956 ns
> [    2.783718] save_stack_trace() takes 20280 ns
> [    2.784179] save_stack_trace() takes 20099 ns
> [    2.784835] save_stack_trace() takes 20055 ns
> [    2.785922] save_stack_trace() takes 20157 ns
> [    2.785922] save_stack_trace() takes 20140 ns
> [    2.786178] save_stack_trace() takes 20040 ns
> [    2.786877] save_stack_trace() takes 20102 ns
> [    2.795000] save_stack_trace() takes 21147 ns
> [    2.795397] save_stack_trace() takes 20230 ns
> [    2.795397] save_stack_trace() takes 31274 ns
> [    2.795739] save_stack_trace() takes 19706 ns
> [    2.796484] save_stack_trace() takes 20266 ns
> [    2.796484] save_stack_trace() takes 20902 ns
> [    2.797000] save_stack_trace() takes 38110 ns
> [    2.797510] save_stack_trace() takes 20224 ns
> [    2.798181] save_stack_trace() takes 20172 ns
> [    2.798837] save_stack_trace() takes 20824 ns
> 
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  arch/x86/include/asm/stacktrace.h | 1 +
>  arch/x86/kernel/dumpstack.c       | 4 ++++
>  arch/x86/kernel/dumpstack_32.c    | 2 ++
>  arch/x86/kernel/stacktrace.c      | 7 +++++++
>  4 files changed, 14 insertions(+)
> 
> diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
> index 70bbe39..fc572e7 100644
> --- a/arch/x86/include/asm/stacktrace.h
> +++ b/arch/x86/include/asm/stacktrace.h
> @@ -41,6 +41,7 @@ struct stacktrace_ops {
>  	/* On negative return stop dumping */
>  	int (*stack)(void *data, char *name);
>  	walk_stack_t	walk_stack;
> +	int (*end_walk)(void *data);

Nice improvement but how about doing that with the return value of
stacktrace_ops::address() instead?

print_context_stack_bp() uses that for example. This behaviour could
be extended.
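
For reference (quoting from memory, so double-check against the tree),
the existing check in print_context_stack_bp() is simply:

	if (ops->address(data, addr, 1))
		break;

so honouring the same return value in print_context_stack() would keep
the two walkers consistent.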


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-08 14:29     ` Josh Poimboeuf
@ 2016-07-08 14:48       ` Ingo Molnar
  2016-07-08 15:02       ` Frederic Weisbecker
  1 sibling, 0 replies; 17+ messages in thread
From: Ingo Molnar @ 2016-07-08 14:48 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Byungchul Park, peterz, linux-kernel, walken,
	Frédéric Weisbecker, Peter Zijlstra


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Fri, Jul 08, 2016 at 12:08:19PM +0200, Ingo Molnar wrote:
> > 
> > * Byungchul Park <byungchul.park@lge.com> wrote:
> > 
> > > On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> > > > I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> > > > I want to proceed saperately since it's somewhat independent from each
> > > > other. Frankly speaking, I want this patchset to be accepted at first so
> > > > that the crossfeature can use this optimized save_stack_trace_norm()
> > > > which makes crossrelease work smoothly.
> > > 
> > > What do you think about this way to improve it?
> > 
> > I like both of your improvements, the speed up is impressive:
> > 
> >   [    2.327597] save_stack_trace() takes 87114 ns
> >   ...
> >   [    2.781694] save_stack_trace() takes 20044 ns
> >   ...
> >   [    3.103264] save_stack_trace takes 3821 (sched_lock)
> > 
> > Could you please also measure call graph recording (perf record -g), how much 
> > faster does it get with your patches and what are our remaining performance hot 
> > spots?
> > 
> > Could you please merge your patches to the latest -tip tree, because this commit I 
> > merged earlier today:
> > 
> >   81c2949f7fdc x86/dumpstack: Add show_stack_regs() and use it
> > 
> > conflicts with your patches. (I'll push this commit out later today.)
> > 
> > Also, could you please rename the _norm names to _fast or so, to signal that this 
> > is a faster but less reliable method to get a stack dump? Nobody knows what 
> > '_norm' means, but '_fast' is pretty self-explanatory.
> 
> Hm, but is print_context_stack_bp() variant really less reliable?  From
> what I can tell, its only differences vs print_context_stack() are:
> 
> - It doesn't scan the stack for "guesses" (which are 'unreliable' and
>   are ignored by the ops->address() callback anyway).
> 
> - It stops if ops->address() returns an error (which in this case means
>   the array is full anyway).
> 
> - It stops if the address isn't a kernel text address.  I think this
>   shouldn't normally be possible unless there's some generated code like
>   bpf on the stack.  Maybe it could be slightly improved for this case.
> 
> So instead of adding a new save_stack_trace_fast() variant, why don't we
> just modify the existing save_stack_trace() to use
> print_context_stack_bp()?

Ok, agreed!

Thanks,

	Ingo


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-08 14:29     ` Josh Poimboeuf
  2016-07-08 14:48       ` Ingo Molnar
@ 2016-07-08 15:02       ` Frederic Weisbecker
  2016-07-08 15:22         ` Josh Poimboeuf
  2016-07-18  2:42         ` Byungchul Park
  1 sibling, 2 replies; 17+ messages in thread
From: Frederic Weisbecker @ 2016-07-08 15:02 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Byungchul Park, peterz, linux-kernel, walken,
	Peter Zijlstra

On Fri, Jul 08, 2016 at 09:29:29AM -0500, Josh Poimboeuf wrote:
> On Fri, Jul 08, 2016 at 12:08:19PM +0200, Ingo Molnar wrote:
> > 
> > * Byungchul Park <byungchul.park@lge.com> wrote:
> > 
> > > On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> > > > I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> > > > I want to proceed saperately since it's somewhat independent from each
> > > > other. Frankly speaking, I want this patchset to be accepted at first so
> > > > that the crossfeature can use this optimized save_stack_trace_norm()
> > > > which makes crossrelease work smoothly.
> > > 
> > > What do you think about this way to improve it?
> > 
> > I like both of your improvements, the speed up is impressive:
> > 
> >   [    2.327597] save_stack_trace() takes 87114 ns
> >   ...
> >   [    2.781694] save_stack_trace() takes 20044 ns
> >   ...
> >   [    3.103264] save_stack_trace takes 3821 (sched_lock)
> > 
> > Could you please also measure call graph recording (perf record -g), how much 
> > faster does it get with your patches and what are our remaining performance hot 
> > spots?
> > 
> > Could you please merge your patches to the latest -tip tree, because this commit I 
> > merged earlier today:
> > 
> >   81c2949f7fdc x86/dumpstack: Add show_stack_regs() and use it
> > 
> > conflicts with your patches. (I'll push this commit out later today.)
> > 
> > Also, could you please rename the _norm names to _fast or so, to signal that this 
> > is a faster but less reliable method to get a stack dump? Nobody knows what 
> > '_norm' means, but '_fast' is pretty self-explanatory.
> 
> Hm, but is print_context_stack_bp() variant really less reliable?  From
> what I can tell, its only differences vs print_context_stack() are:
> 
> - It doesn't scan the stack for "guesses" (which are 'unreliable' and
>   are ignored by the ops->address() callback anyway).
> 
> - It stops if ops->address() returns an error (which in this case means
>   the array is full anyway).
> 
> - It stops if the address isn't a kernel text address.  I think this
>   shouldn't normally be possible unless there's some generated code like
>   bpf on the stack.  Maybe it could be slightly improved for this case.
> 
> So instead of adding a new save_stack_trace_fast() variant, why don't we
> just modify the existing save_stack_trace() to use
> print_context_stack_bp()?

I'm not sure this is a good idea. First of all, if the kernel isn't built with
frame pointers, all you have is wild walk guesses. Also, even if frame pointers
are built in, the bp-non-validated "guesses" are important clues for debugging
because they tell about previous calls that happened, or callbacks that were
referred to by the stack.

There are several different users of save_stack_trace() in the kernel; we can't
be sure that all of them are interested in dropping those guesses.

So I'd rather advocate in favour of a new separate helper.

Thanks.


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-08 10:08   ` Ingo Molnar
  2016-07-08 14:29     ` Josh Poimboeuf
@ 2016-07-08 15:07     ` Frederic Weisbecker
  2016-07-18  2:37     ` Byungchul Park
  2 siblings, 0 replies; 17+ messages in thread
From: Frederic Weisbecker @ 2016-07-08 15:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Byungchul Park, peterz, linux-kernel, walken, Josh Poimboeuf,
	Peter Zijlstra

On Fri, Jul 08, 2016 at 12:08:19PM +0200, Ingo Molnar wrote:
> 
> * Byungchul Park <byungchul.park@lge.com> wrote:
> 
> > On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> > > I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> > > I want to proceed saperately since it's somewhat independent from each
> > > other. Frankly speaking, I want this patchset to be accepted at first so
> > > that the crossfeature can use this optimized save_stack_trace_norm()
> > > which makes crossrelease work smoothly.
> > 
> > What do you think about this way to improve it?
> 
> I like both of your improvements, the speed up is impressive:
> 
>   [    2.327597] save_stack_trace() takes 87114 ns
>   ...
>   [    2.781694] save_stack_trace() takes 20044 ns
>   ...
>   [    3.103264] save_stack_trace takes 3821 (sched_lock)
> 
> Could you please also measure call graph recording (perf record -g), how much 
> faster does it get with your patches and what are our remaining performance hot 
> spots?

I don't think it will impact perf much, because print_context_stack_bp()
checks the return value of ops->address(), which for perf is
perf_callchain_store() and tells us whether we crossed the buffer limit.

But it's worth checking anyway.

Thanks.


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-08 15:02       ` Frederic Weisbecker
@ 2016-07-08 15:22         ` Josh Poimboeuf
  2016-07-18  3:14           ` Byungchul Park
  2016-07-18  2:42         ` Byungchul Park
  1 sibling, 1 reply; 17+ messages in thread
From: Josh Poimboeuf @ 2016-07-08 15:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Byungchul Park, peterz, linux-kernel, walken,
	Peter Zijlstra

On Fri, Jul 08, 2016 at 05:02:33PM +0200, Frederic Weisbecker wrote:
> On Fri, Jul 08, 2016 at 09:29:29AM -0500, Josh Poimboeuf wrote:
> > On Fri, Jul 08, 2016 at 12:08:19PM +0200, Ingo Molnar wrote:
> > > 
> > > * Byungchul Park <byungchul.park@lge.com> wrote:
> > > 
> > > > On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> > > > > I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> > > > > I want to proceed saperately since it's somewhat independent from each
> > > > > other. Frankly speaking, I want this patchset to be accepted at first so
> > > > > that the crossfeature can use this optimized save_stack_trace_norm()
> > > > > which makes crossrelease work smoothly.
> > > > 
> > > > What do you think about this way to improve it?
> > > 
> > > I like both of your improvements, the speed up is impressive:
> > > 
> > >   [    2.327597] save_stack_trace() takes 87114 ns
> > >   ...
> > >   [    2.781694] save_stack_trace() takes 20044 ns
> > >   ...
> > >   [    3.103264] save_stack_trace takes 3821 (sched_lock)
> > > 
> > > Could you please also measure call graph recording (perf record -g), how much 
> > > faster does it get with your patches and what are our remaining performance hot 
> > > spots?
> > > 
> > > Could you please merge your patches to the latest -tip tree, because this commit I 
> > > merged earlier today:
> > > 
> > >   81c2949f7fdc x86/dumpstack: Add show_stack_regs() and use it
> > > 
> > > conflicts with your patches. (I'll push this commit out later today.)
> > > 
> > > Also, could you please rename the _norm names to _fast or so, to signal that this 
> > > is a faster but less reliable method to get a stack dump? Nobody knows what 
> > > '_norm' means, but '_fast' is pretty self-explanatory.
> > 
> > Hm, but is print_context_stack_bp() variant really less reliable?  From
> > what I can tell, its only differences vs print_context_stack() are:
> > 
> > - It doesn't scan the stack for "guesses" (which are 'unreliable' and
> >   are ignored by the ops->address() callback anyway).
> > 
> > - It stops if ops->address() returns an error (which in this case means
> >   the array is full anyway).
> > 
> > - It stops if the address isn't a kernel text address.  I think this
> >   shouldn't normally be possible unless there's some generated code like
> >   bpf on the stack.  Maybe it could be slightly improved for this case.
> > 
> > So instead of adding a new save_stack_trace_fast() variant, why don't we
> > just modify the existing save_stack_trace() to use
> > print_context_stack_bp()?
> 
> I'm not sure this is a good idea. First of all if the kernel isn't built with
> frame pointers, all you have is wild walk guesses.

True, though I'd argue that if frame pointers are disabled then
save_stack_trace() should return an empty trace.  But admittedly, that
would be different from today's behavior.

> Also even if frame pointers
> is built, the bp-non-validated "guesses" are important clues for debugging because
> they tell about previous calls that happened, or callbacks that were reffered to by
> the stack.

Actually with frame pointers, the guesses are ignored.  From the
beginning of __save_stack_address():

#ifdef CONFIG_FRAME_POINTER
	if (!reliable)
		return 0;
#endif

And I'd argue that's the right approach.  The guesses are just
breadcrumbs which are indeed useful for debugging, but not what you'd
normally expect when asking for a stack trace.

> There are several different users of save_stack_trace() in the kernel, we can't
> be sure that all of them are interested in dropping those guesses.
> 
> So I'd rather advocate in favour of a new seperate helper.

So how about we change save_stack_trace() to use print_context_stack()
for CONFIG_FRAME_POINTER=n and print_context_stack_bp() for
CONFIG_FRAME_POINTER=y?  That would preserve the existing behavior, no?
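
Untested sketch of that idea, purely illustrative:

	/*
	 * Pick the stack walker at build time: keep the conservative
	 * word-by-word scan without frame pointers, and use the faster
	 * frame-pointer walk otherwise.
	 */
	static const struct stacktrace_ops save_stack_ops = {
		.stack		= save_stack_stack,
		.address	= save_stack_address,
	#ifdef CONFIG_FRAME_POINTER
		.walk_stack	= print_context_stack_bp,
	#else
		.walk_stack	= print_context_stack,
	#endif
	};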

-- 
Josh


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-08 10:08   ` Ingo Molnar
  2016-07-08 14:29     ` Josh Poimboeuf
  2016-07-08 15:07     ` Frederic Weisbecker
@ 2016-07-18  2:37     ` Byungchul Park
  2 siblings, 0 replies; 17+ messages in thread
From: Byungchul Park @ 2016-07-18  2:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: peterz, linux-kernel, walken, Frédéric Weisbecker,
	Josh Poimboeuf, Peter Zijlstra

On Fri, Jul 08, 2016 at 12:08:19PM +0200, Ingo Molnar wrote:
> 
> * Byungchul Park <byungchul.park@lge.com> wrote:
> 
> > On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> > > I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> > > I want to proceed saperately since it's somewhat independent from each
> > > other. Frankly speaking, I want this patchset to be accepted at first so
> > > that the crossfeature can use this optimized save_stack_trace_norm()
> > > which makes crossrelease work smoothly.
> > 
> > What do you think about this way to improve it?
> 
> I like both of your improvements, the speed up is impressive:
> 
>   [    2.327597] save_stack_trace() takes 87114 ns
>   ...
>   [    2.781694] save_stack_trace() takes 20044 ns
>   ...
>   [    3.103264] save_stack_trace takes 3821 (sched_lock)
> 
> Could you please also measure call graph recording (perf record -g), how much 
> faster does it get with your patches and what are our remaining performance hot 
> spots?

Of course, I will.

> 
> Could you please merge your patches to the latest -tip tree, because this commit I 
> merged earlier today:

Sure.

> 
>   81c2949f7fdc x86/dumpstack: Add show_stack_regs() and use it
> 
> conflicts with your patches. (I'll push this commit out later today.)
> 
> Also, could you please rename the _norm names to _fast or so, to signal that this 
> is a faster but less reliable method to get a stack dump? Nobody knows what 
> '_norm' means, but '_fast' is pretty self-explanatory.

It would be better.

Thank you,
Byungchul

> 
> Thanks,
> 
> 	Ingo


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-08 15:02       ` Frederic Weisbecker
  2016-07-08 15:22         ` Josh Poimboeuf
@ 2016-07-18  2:42         ` Byungchul Park
  1 sibling, 0 replies; 17+ messages in thread
From: Byungchul Park @ 2016-07-18  2:42 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Josh Poimboeuf, Ingo Molnar, peterz, linux-kernel, walken,
	Peter Zijlstra

On Fri, Jul 08, 2016 at 05:02:33PM +0200, Frederic Weisbecker wrote:
> On Fri, Jul 08, 2016 at 09:29:29AM -0500, Josh Poimboeuf wrote:
> > On Fri, Jul 08, 2016 at 12:08:19PM +0200, Ingo Molnar wrote:
> > > 
> > > * Byungchul Park <byungchul.park@lge.com> wrote:
> > > 
> > > > On Mon, Jul 04, 2016 at 07:27:54PM +0900, Byungchul Park wrote:
> > > > > I suggested this patch on https://lkml.org/lkml/2016/6/20/22. However,
> > > > > I want to proceed saperately since it's somewhat independent from each
> > > > > other. Frankly speaking, I want this patchset to be accepted at first so
> > > > > that the crossfeature can use this optimized save_stack_trace_norm()
> > > > > which makes crossrelease work smoothly.
> > > > 
> > > > What do you think about this way to improve it?
> > > 
> > > I like both of your improvements, the speed up is impressive:
> > > 
> > >   [    2.327597] save_stack_trace() takes 87114 ns
> > >   ...
> > >   [    2.781694] save_stack_trace() takes 20044 ns
> > >   ...
> > >   [    3.103264] save_stack_trace takes 3821 (sched_lock)
> > > 
> > > Could you please also measure call graph recording (perf record -g), how much 
> > > faster does it get with your patches and what are our remaining performance hot 
> > > spots?
> > > 
> > > Could you please merge your patches to the latest -tip tree, because this commit I 
> > > merged earlier today:
> > > 
> > >   81c2949f7fdc x86/dumpstack: Add show_stack_regs() and use it
> > > 
> > > conflicts with your patches. (I'll push this commit out later today.)
> > > 
> > > Also, could you please rename the _norm names to _fast or so, to signal that this 
> > > is a faster but less reliable method to get a stack dump? Nobody knows what 
> > > '_norm' means, but '_fast' is pretty self-explanatory.
> > 
> > Hm, but is print_context_stack_bp() variant really less reliable?  From
> > what I can tell, its only differences vs print_context_stack() are:
> > 
> > - It doesn't scan the stack for "guesses" (which are 'unreliable' and
> >   are ignored by the ops->address() callback anyway).
> > 
> > - It stops if ops->address() returns an error (which in this case means
> >   the array is full anyway).
> > 
> > - It stops if the address isn't a kernel text address.  I think this
> >   shouldn't normally be possible unless there's some generated code like
> >   bpf on the stack.  Maybe it could be slightly improved for this case.
> > 
> > So instead of adding a new save_stack_trace_fast() variant, why don't we
> > just modify the existing save_stack_trace() to use
> > print_context_stack_bp()?
> 
> I'm not sure this is a good idea. First of all, if the kernel isn't built with
> frame pointers, all you have is wild walk guesses. Also, even if frame pointers
> are built in, the bp-non-validated "guesses" are important clues for debugging
> because they tell about previous calls that happened, or callbacks that were
> referred to by the stack.

This is exactly what I intended.

> 
> There are several different users of save_stack_trace() in the kernel, we can't
> be sure that all of them are interested in dropping those guesses.
> 
> So I'd rather advocate in favour of a new separate helper.
> 
> Thanks.


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-08 15:22         ` Josh Poimboeuf
@ 2016-07-18  3:14           ` Byungchul Park
  2016-07-18 13:09             ` Josh Poimboeuf
  0 siblings, 1 reply; 17+ messages in thread
From: Byungchul Park @ 2016-07-18  3:14 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Frederic Weisbecker, Ingo Molnar, peterz, linux-kernel, walken,
	Peter Zijlstra

On Fri, Jul 08, 2016 at 10:22:46AM -0500, Josh Poimboeuf wrote:
> > > > Also, could you please rename the _norm names to _fast or so, to signal that this 
> > > > is a faster but less reliable method to get a stack dump? Nobody knows what 
> > > > '_norm' means, but '_fast' is pretty self-explanatory.
> > > 
> > > Hm, but is print_context_stack_bp() variant really less reliable?  From
> > > what I can tell, its only differences vs print_context_stack() are:
> > > 
> > > - It doesn't scan the stack for "guesses" (which are 'unreliable' and
> > >   are ignored by the ops->address() callback anyway).
> > > 
> > > - It stops if ops->address() returns an error (which in this case means
> > >   the array is full anyway).
> > > 
> > > - It stops if the address isn't a kernel text address.  I think this
> > >   shouldn't normally be possible unless there's some generated code like
> > >   bpf on the stack.  Maybe it could be slightly improved for this case.
> > > 
> > > So instead of adding a new save_stack_trace_fast() variant, why don't we
> > > just modify the existing save_stack_trace() to use
> > > print_context_stack_bp()?
> > 
> > I'm not sure this is a good idea. First of all if the kernel isn't built with
> > frame pointers, all you have is wild walk guesses.
> 
> True, though I'd argue that if frame pointers are disabled then
> save_stack_trace() should return an empty trace.  But admittedly, that

As Frederic said, I think some save_stack_trace() users may want to see
the 'guesses'; in other words, it's not a good idea for save_stack_trace()
to return an empty trace when frame pointers are disabled. No?

> > There are several different users of save_stack_trace() in the kernel, we can't
> > be sure that all of them are interested in dropping those guesses.
> > 
> > So I'd rather advocate in favour of a new separate helper.
> 
> So how about we change save_stack_trace() to use print_context_stack()
> for CONFIG_FRAME_POINTER=n and print_context_stack_bp() for
> CONFIG_FRAME_POINTER=y?  That would preserve the existing behavior, no?

Even if CONFIG_FRAME_POINTER=y, someone may want the guesses, don't they?
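
Just so we are on the same page, I read that proposal as something like the
sketch below. (This is only my rough reconstruction from memory of the ops
wiring in arch/x86/kernel/stacktrace.c; the field names may differ in the
latest -tip tree.)

static const struct stacktrace_ops save_stack_ops = {
	.stack		= save_stack_stack,
	.address	= save_stack_address,
#ifdef CONFIG_FRAME_POINTER
	/* frame pointers available: follow only the validated bp chain */
	.walk_stack	= print_context_stack_bp,
#else
	/* no frame pointers: keep the scanning walk and its guesses */
	.walk_stack	= print_context_stack,
#endif
};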

> 
> -- 
> Josh


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-08 14:44 ` Frederic Weisbecker
@ 2016-07-18  3:25   ` Byungchul Park
  0 siblings, 0 replies; 17+ messages in thread
From: Byungchul Park @ 2016-07-18  3:25 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: peterz, mingo, linux-kernel, walken

On Fri, Jul 08, 2016 at 04:44:04PM +0200, Frederic Weisbecker wrote:
> Nice improvement but how about doing that with the return value of
> stacktrace_ops::address() instead?
> 
> print_context_stack_bp() uses that for example. This behaviour could
> be extended.

Yes. I will revert my change and leave print_context_stack_bp() unchanged.

But frankly speaking, I thought adding end_walk improves readability, so I
am not sure which way is better. Could anyone give me additional opinions on
which way to implement it? I will follow them.
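
For reference, what I understand from your suggestion is something in the
spirit of what print_context_stack_bp() already does, roughly like the loop
below. (This is a simplified sketch from my memory, not the exact dumpstack
code; it omits the exception-stack and ftrace-graph handling.)

	while (valid_stack_ptr(tinfo, ret_addr, sizeof(*ret_addr), end)) {
		unsigned long addr = *ret_addr;

		if (!__kernel_text_address(addr))
			break;
		/* the callback returns nonzero once the caller has enough entries */
		if (ops->address(data, addr, 1))
			break;
		frame = frame->next_frame;
		ret_addr = &frame->return_address;
	}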

Thank you,
Byungchul


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-18  3:14           ` Byungchul Park
@ 2016-07-18 13:09             ` Josh Poimboeuf
  2016-07-19  0:08               ` Byungchul Park
  0 siblings, 1 reply; 17+ messages in thread
From: Josh Poimboeuf @ 2016-07-18 13:09 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Frederic Weisbecker, Ingo Molnar, peterz, linux-kernel, walken,
	Peter Zijlstra

On Mon, Jul 18, 2016 at 12:14:22PM +0900, Byungchul Park wrote:
> On Fri, Jul 08, 2016 at 10:22:46AM -0500, Josh Poimboeuf wrote:
> > > > > Also, could you please rename the _norm names to _fast or so, to signal that this 
> > > > > is a faster but less reliable method to get a stack dump? Nobody knows what 
> > > > > '_norm' means, but '_fast' is pretty self-explanatory.
> > > > 
> > > > Hm, but is print_context_stack_bp() variant really less reliable?  From
> > > > what I can tell, its only differences vs print_context_stack() are:
> > > > 
> > > > - It doesn't scan the stack for "guesses" (which are 'unreliable' and
> > > >   are ignored by the ops->address() callback anyway).
> > > > 
> > > > - It stops if ops->address() returns an error (which in this case means
> > > >   the array is full anyway).
> > > > 
> > > > - It stops if the address isn't a kernel text address.  I think this
> > > >   shouldn't normally be possible unless there's some generated code like
> > > >   bpf on the stack.  Maybe it could be slightly improved for this case.
> > > > 
> > > > So instead of adding a new save_stack_trace_fast() variant, why don't we
> > > > just modify the existing save_stack_trace() to use
> > > > print_context_stack_bp()?
> > > 
> > > I'm not sure this is a good idea. First of all if the kernel isn't built with
> > > frame pointers, all you have is wild walk guesses.
> > 
> > True, though I'd argue that if frame pointers are disabled then
> > save_stack_trace() should return an empty trace.  But admittedly, that
> 
> As Frederic said, I think some save_stack_trace() users may want to see
> the 'guesses'; in other words, it's not a good idea for save_stack_trace()
> to return an empty trace when frame pointers are disabled. No?

With frame pointers disabled, yes, maybe guesses are better than
nothing.

> > > There are several different users of save_stack_trace() in the kernel, we can't
> > > be sure that all of them are interested in dropping those guesses.
> > > 
> > > So I'd rather advocate in favour of a new separate helper.
> > 
> > So how about we change save_stack_trace() to use print_context_stack()
> > for CONFIG_FRAME_POINTER=n and print_context_stack_bp() for
> > CONFIG_FRAME_POINTER=y?  That would preserve the existing behavior, no?
> 
> Even if CONFIG_FRAME_POINTER=y, someone may want the guesses, don't they?

For CONFIG_FRAME_POINTER=y, the guesses are ignored by
__save_stack_address() and only the reliable addresses are saved.
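
From memory it is roughly the shape below (simplified; the real callback
also handles the nosched and skip cases):

static int
__save_stack_address(void *data, unsigned long addr, int reliable)
{
	struct stack_trace *trace = data;

#ifdef CONFIG_FRAME_POINTER
	if (!reliable)
		return 0;	/* drop the scanned "guesses" */
#endif
	if (trace->nr_entries < trace->max_entries) {
		trace->entries[trace->nr_entries++] = addr;
		return 0;
	}
	return -1;	/* no more room, the walker can stop */
}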

We shouldn't change that behavior, unless you actually know of a caller
who wants the guesses.  And even then the "guess" variation should be
named accordingly to make it clear that it's not a "reliable" stack
trace, even though frame pointers are enabled.

-- 
Josh


* Re: [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace
  2016-07-18 13:09             ` Josh Poimboeuf
@ 2016-07-19  0:08               ` Byungchul Park
  0 siblings, 0 replies; 17+ messages in thread
From: Byungchul Park @ 2016-07-19  0:08 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Frederic Weisbecker, Ingo Molnar, peterz, linux-kernel, walken,
	Peter Zijlstra

On Mon, Jul 18, 2016 at 08:09:10AM -0500, Josh Poimboeuf wrote:
> > > > There are several different users of save_stack_trace() in the kernel, we can't
> > > > be sure that all of them are interested in dropping those guesses.
> > > > 
> > > > So I'd rather advocate in favour of a new separate helper.
> > > 
> > > So how about we change save_stack_trace() to use print_context_stack()
> > > for CONFIG_FRAME_POINTER=n and print_context_stack_bp() for
> > > CONFIG_FRAME_POINTER=y?  That would preserve the existing behavior, no?
> > 
> > Even if CONFIG_FRAME_POINTER=y, someone may want the guesses, don't they?
> 
> For CONFIG_FRAME_POINTER=y, the guesses are ignored by
> __save_stack_address() and only the reliable addresses are saved.

Indeed. I was confused.

> We shouldn't change that behavior, unless you actually know of a caller
> who wants the guesses.  And even then the "guess" variation should be
> named accordingly to make it clear that it's not a "reliable" stack
> trace, even though frame pointers are enabled.

My question came from that confusion. I agree with you.

> 
> -- 
> Josh


end of thread, other threads:[~2016-07-19  0:10 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
2016-07-04 10:27 [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace Byungchul Park
2016-07-04 10:27 ` [PATCH 2/2] x86/dumpstack: Add save_stack_trace_norm() Byungchul Park
2016-07-07 10:17 ` [PATCH 1/2] x86/dumpstack: Optimize save_stack_trace Byungchul Park
2016-07-08 10:08   ` Ingo Molnar
2016-07-08 14:29     ` Josh Poimboeuf
2016-07-08 14:48       ` Ingo Molnar
2016-07-08 15:02       ` Frederic Weisbecker
2016-07-08 15:22         ` Josh Poimboeuf
2016-07-18  3:14           ` Byungchul Park
2016-07-18 13:09             ` Josh Poimboeuf
2016-07-19  0:08               ` Byungchul Park
2016-07-18  2:42         ` Byungchul Park
2016-07-08 15:07     ` Frederic Weisbecker
2016-07-18  2:37     ` Byungchul Park
2016-07-08 14:08 ` Josh Poimboeuf
2016-07-08 14:44 ` Frederic Weisbecker
2016-07-18  3:25   ` Byungchul Park
