linux-kernel.vger.kernel.org archive mirror
* [RFC] ARC: ARCv2: Introduce SmaRT support
@ 2018-10-19 14:27 Eugeniy Paltsev
  2018-10-19 14:33 ` [RFC] ARC: ARCv2: Introduce SmaRT support : lmbench results Eugeniy Paltsev
  2018-10-24 18:28 ` [RFC] ARC: ARCv2: Introduce SmaRT support Vineet Gupta
  0 siblings, 2 replies; 3+ messages in thread
From: Eugeniy Paltsev @ 2018-10-19 14:27 UTC (permalink / raw)
  To: linux-snps-arc, Vineet Gupta
  Cc: linux-kernel, Alexey Brodkin, Eugeniy Paltsev

Add compile-time 'ARC_USE_SMART' option for enabling SmaRT support.

Small real time trace (SmaRT) is an optional on-chip debug hardware
component that captures instruction-trace history. It stores the
addresses of the most recently executed non-sequential instructions in
an internal buffer.

Usually we use the MetaWare debugger to enable SmaRT and display trace
information.

This patch makes it possible to display the decoded content of the
SmaRT buffer without the MetaWare debugger. It does so by extending the
ordinary exception message with the decoded SmaRT instruction-trace
history.

In some cases this is really useful, as it shows the pre-exception
instruction trace before it is tainted by exception handler code,
printk code, etc.

Nevertheless, this option has a negative performance impact because of
the implementation: we dump the SmaRT buffer content into an external
memory buffer at the beginning of every slowpath exception handler.
We chose this implementation as a compromise between performance
impact and SmaRT buffer tainting.
Although the performance impact is not really significant (according
to lmbench), we leave this option disabled by default.
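
The dump step can be sketched in stand-alone C. This is a simulation
under stated assumptions, not the kernel code: the aux-register
accessors are replaced by an in-memory model (all fake_* names, the
smart_fetch/smart_snapshot_take helpers and the 4-entry depth are
hypothetical), and the two-bit width of the data-select field is
inferred from the SMART_CTL_* positions defined in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Control-word layout, taken from the SMART_CTL_* defines in this patch. */
#define SMART_CTL_EN        (1u << 0)
#define SMART_CTL_DATA_POS  8
#define SMART_CTL_DATA_SRC  0u
#define SMART_CTL_DATA_DST  1u
#define SMART_CTL_DATA_FLAG 2u
#define SMART_CTL_IDX_POS   10

#define FAKE_DEPTH 4u	/* hypothetical trace depth, for the simulation only */

/* Hypothetical model of the trace hardware: [entry][src, dst, flags]. */
static uint32_t fake_hw[FAKE_DEPTH][3];
static uint32_t fake_control;

/* Stand-in for write_aux_reg(ARC_AUX_SMART_CONTROL, val). */
static void fake_write_control(uint32_t val)
{
	fake_control = val;
}

/* Stand-in for read_aux_reg(ARC_AUX_SMART_DATA): returns the selected field. */
static uint32_t fake_read_data(void)
{
	uint32_t type = (fake_control >> SMART_CTL_DATA_POS) & 0x3u;
	uint32_t idx = fake_control >> SMART_CTL_IDX_POS;

	assert(type <= SMART_CTL_DATA_FLAG && idx < FAKE_DEPTH);
	return fake_hw[idx][type];
}

struct smart_snapshot {
	uint32_t src[FAKE_DEPTH];
	uint32_t dst[FAKE_DEPTH];
	uint32_t flags[FAKE_DEPTH];
};

/* Select one field of one entry, then read it back through the data port. */
static uint32_t smart_fetch(uint32_t type, uint32_t idx)
{
	fake_write_control((type << SMART_CTL_DATA_POS) |
			   (idx << SMART_CTL_IDX_POS));
	return fake_read_data();
}

/* Stop tracing, copy every entry out, then re-enable tracing. */
static void smart_snapshot_take(struct smart_snapshot *snap)
{
	uint32_t i;

	assert(snap != 0);
	fake_write_control(0);
	for (i = 0; i < FAKE_DEPTH; i++) {
		snap->src[i] = smart_fetch(SMART_CTL_DATA_SRC, i);
		snap->dst[i] = smart_fetch(SMART_CTL_DATA_DST, i);
		snap->flags[i] = smart_fetch(SMART_CTL_DATA_FLAG, i);
	}
	fake_write_control(SMART_CTL_EN);
}
```

Tracing is stopped for the duration of the copy so that the handler's
own branches do not overwrite the entries being read.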

Here are examples of user-space and kernel-space fault messages with
the 'ARC_USE_SMART' option enabled:

User-space exception:
----------------------->8-------------------------
Exception: u_hell[99]: at 0x103a2 [off 0x103a2 in /root/u_hell, VMA: 00010000:00012000]
  ECR: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103a2
SmaRT (64 entries):
 [   0]    V 0x90232358 -> 0x9022ce3c [src do_page_fault+0x2c/0x2d8] [dst populate_smart+0x0/0x9c]
 [   1]    V 0x9022e3f8 -> 0x9023232c [src EV_TLBProtV+0xec/0xf0] [dst do_page_fault+0x0/0x2d8]
 [   2]    V 0x90233194 -> 0x9022e30c [src do_slow_path_pf+0x10/0x14] [dst EV_TLBProtV+0x0/0xf0]
 [   3]    V 0x90233120 -> 0x90233184 [src EV_TLBMissD+0x80/0xe0] [dst do_slow_path_pf+0x0/0x14]
 [   4]  E V 0x000103a2 -> 0x902330a0 [off 0x103a2 in /root/u_hell, VMA: 00010000:00012000] [dst EV_TLBMissD+0x0/0xe0]
 [   5] U  V 0x2004f238 -> 0x00010398 [off 0x43238 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x10398 in /root/u_hell, VMA: 00010000:00012000]
 [   6] U  V 0x20049a82 -> 0x2004f214 [off 0x3da82 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x43214 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
 [   7] U  V 0x20049a64 -> 0x20049a76 [off 0x3da64 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x3da76 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
 [   8] U  V 0x2001d8e4 -> 0x20049a58 [off 0x118e4 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x3da58 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
 [   9] U  V 0x2001d8f4 -> 0x2001d8a8 [off 0x118f4 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x118a8 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
 [  10] U  V 0x2001d5c8 -> 0x2001d8f0 [off 0x115c8 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x118f0 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
...[snip]...
----------------------->8-------------------------

Kernel-space exception:
----------------------->8-------------------------
Exception: at dw_mci_probe+0xf0/0x944:
  ECR: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x905e26e0
SmaRT (64 entries):
 [   0]    V 0x90232358 -> 0x9022ce3c [src do_page_fault+0x2c/0x2d8] [dst populate_smart+0x0/0x9c]
 [   1]    V 0x9022e3f8 -> 0x9023232c [src EV_TLBProtV+0xec/0xf0] [dst do_page_fault+0x0/0x2d8]
 [   2]    V 0x9022e398 -> 0x9022e3a0 [src EV_TLBProtV+0x8c/0xf0] [dst EV_TLBProtV+0x94/0xf0]
 [   3]    V 0x90233194 -> 0x9022e30c [src do_slow_path_pf+0x10/0x14] [dst EV_TLBProtV+0x0/0xf0]
 [   4]    V 0x902330dc -> 0x90233184 [src EV_TLBMissD+0x3c/0xe0] [dst do_slow_path_pf+0x0/0x14]
 [   5]  E V 0x905e26e0 -> 0x902330a0 [src dw_mci_probe+0xf0/0x944] [dst EV_TLBMissD+0x0/0xe0]
 [   6]    V 0x904cb940 -> 0x905e26d6 [src clk_get_rate+0xdc/0x208] [dst dw_mci_probe+0xe6/0x944]
 [   7]    V 0x9074e14a -> 0x904cb932 [src mutex_unlock+0x1e/0x24] [dst clk_get_rate+0xce/0x208]
 [   8]    V 0x904cb92e -> 0x9074e12c [src clk_get_rate+0xca/0x208] [dst mutex_unlock+0x0/0x24]
 [   9]    V 0x904cb8ea -> 0x904cb91e [src clk_get_rate+0x86/0x208] [dst clk_get_rate+0xba/0x208]
 [  10]    V 0x904cb8c2 -> 0x904cb8d0 [src clk_get_rate+0x5e/0x208] [dst clk_get_rate+0x6c/0x208]
 [  11]    V 0x9074e01a -> 0x904cb884 [src mutex_trylock+0x4e/0x50] [dst clk_get_rate+0x20/0x208]
 [  12]    V 0x9074e00e -> 0x9074e018 [src mutex_trylock+0x42/0x50] [dst mutex_trylock+0x4c/0x50]
 [  13]    V 0x9074dfdc -> 0x9074dff0 [src mutex_trylock+0x10/0x50] [dst mutex_trylock+0x24/0x50]
 [  14]    V 0x904cb880 -> 0x9074dfcc [src clk_get_rate+0x1c/0x208] [dst mutex_trylock+0x0/0x50]
 [  15]    V 0x905e26d2 -> 0x904cb864 [src dw_mci_probe+0xe2/0x944] [dst clk_get_rate+0x0/0x208]
...[snip]...
----------------------->8-------------------------
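
The four-character flag column in the traces above comes from the
per-entry flags word. A minimal stand-alone sketch of that decoding,
using the SMART_FLAG_* bit positions defined in this patch; the helper
name smart_flags_str is hypothetical and only mirrors the column layout
that show_smart() prints:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits, taken from the SMART_FLAG_* defines in this patch. */
#define SMART_FLAG_U (1u << 8)
#define SMART_FLAG_E (1u << 9)
#define SMART_FLAG_R (1u << 10)
#define SMART_FLAG_V (1u << 31)

/*
 * Fill out[] with the "UERV" column: one letter per set flag, a blank
 * for each clear flag, followed by a terminating NUL.
 */
static void smart_flags_str(uint32_t flags, char out[5])
{
	assert(out != NULL);
	out[0] = (flags & SMART_FLAG_U) ? 'U' : ' ';
	out[1] = (flags & SMART_FLAG_E) ? 'E' : ' ';
	out[2] = (flags & SMART_FLAG_R) ? 'R' : ' ';
	out[3] = (flags & SMART_FLAG_V) ? 'V' : ' ';
	out[4] = '\0';
}
```

With this layout, a flags word with only E and V set yields " E V", as
in the entry that records the faulting instruction.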

TODO:
  Add runtime procfs options to configure/suspend SmaRT.
  Add SmaRT BCR encoding struct.
  Check SmaRT version number in BCR.

NOTE:
This RFC has a prerequisite:
  http://patchwork.ozlabs.org/patch/986820/

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
---
 arch/arc/Kconfig               |   8 +++
 arch/arc/include/asm/arcregs.h |  14 ++++++
 arch/arc/include/asm/bug.h     |   7 +++
 arch/arc/kernel/setup.c        |   6 +++
 arch/arc/kernel/traps.c        |   7 +++
 arch/arc/kernel/troubleshoot.c | 112 +++++++++++++++++++++++++++++++++++++++++
 arch/arc/mm/fault.c            |   1 +
 7 files changed, 155 insertions(+)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index a045f3086047..f006fe81e085 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -528,6 +528,14 @@ config ARC_DBG_TLB_PARANOIA
 	bool "Paranoia Checks in Low Level TLB Handlers"
 	default n
 
+config ARC_USE_SMART
+	bool "Enable real time trace on-chip debug HW"
+	depends on ISA_ARCV2
+	help
+	  Enable the Small real time trace (SmaRT) on-chip debug hardware
+	  component, which captures instruction-trace history, if it exists.
+	  The trace is printed whenever an unexpected exception occurs.
+
 endif
 
 config ARC_UBOOT_SUPPORT
diff --git a/arch/arc/include/asm/arcregs.h b/arch/arc/include/asm/arcregs.h
index 49bfbd879caa..11f4c5c0df4d 100644
--- a/arch/arc/include/asm/arcregs.h
+++ b/arch/arc/include/asm/arcregs.h
@@ -118,6 +118,20 @@
 #define ARC_AUX_DPFP_2H         0x304
 #define ARC_AUX_DPFP_STAT       0x305
 
+/* SmaRT registers */
+#define ARC_AUX_SMART_CONTROL	0x700
+#define SMART_CTL_EN		BIT(0)
+#define SMART_CTL_DATA_POS	8
+#define SMART_CTL_DATA_SRC	0
+#define SMART_CTL_DATA_DST	1
+#define SMART_CTL_DATA_FLAG	2
+#define SMART_CTL_IDX_POS	10
+#define ARC_AUX_SMART_DATA	0x701
+#define SMART_FLAG_U		BIT(8)
+#define SMART_FLAG_E		BIT(9)
+#define SMART_FLAG_R		BIT(10)
+#define SMART_FLAG_V		BIT(31)
+
 #ifndef __ASSEMBLY__
 
 #include <soc/arc/aux.h>
diff --git a/arch/arc/include/asm/bug.h b/arch/arc/include/asm/bug.h
index b68f7f82f2d8..bc9c14bcee2a 100644
--- a/arch/arc/include/asm/bug.h
+++ b/arch/arc/include/asm/bug.h
@@ -15,6 +15,13 @@
 
 struct task_struct;
 
+#ifdef CONFIG_ARC_USE_SMART
+void populate_smart(void);
+#define POPULATE_SMART()	populate_smart()
+#else
+#define POPULATE_SMART()
+#endif /* CONFIG_ARC_USE_SMART */
+
 void show_regs(struct pt_regs *regs);
 void show_exception_mesg(struct pt_regs *regs);
 void show_stacktrace(struct task_struct *tsk, struct pt_regs *regs);
diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c
index b2cae79a25d7..e83bbc86397e 100644
--- a/arch/arc/kernel/setup.c
+++ b/arch/arc/kernel/setup.c
@@ -447,6 +447,12 @@ void setup_processor(void)
 	pr_info("%s", arc_platform_smp_cpuinfo());
 
 	arc_chk_core_config();
+
+#ifdef CONFIG_ARC_USE_SMART
+	if (cpuinfo_arc700[cpu_id].extn.smart)
+		write_aux_reg(ARC_AUX_SMART_CONTROL, SMART_CTL_EN);
+#endif
+
 }
 
 static inline int is_kernel(unsigned long addr)
diff --git a/arch/arc/kernel/traps.c b/arch/arc/kernel/traps.c
index e66fd40296b3..0228f7182e6d 100644
--- a/arch/arc/kernel/traps.c
+++ b/arch/arc/kernel/traps.c
@@ -70,6 +70,7 @@ int name(unsigned long address, struct pt_regs *regs) \
 {						\
 	siginfo_t info;				\
 						\
+	POPULATE_SMART();			\
 	clear_siginfo(&info);			\
 	info.si_signo = signr;			\
 	info.si_errno = 0;			\
@@ -96,6 +97,8 @@ DO_ERROR_INFO(SIGSEGV, "gcc generated __builtin_trap", do_trap5_error, 0)
 int do_misaligned_access(unsigned long address, struct pt_regs *regs,
 			 struct callee_regs *cregs)
 {
+	POPULATE_SMART();
+
 	/* If emulation not enabled, or failed, kill the task */
 	if (misaligned_fixup(address, regs, cregs) != 0)
 		return do_misaligned_error(address, regs);
@@ -128,6 +131,8 @@ void do_non_swi_trap(unsigned long address, struct pt_regs *regs)
 {
 	unsigned int param = regs->ecr_param;
 
+	POPULATE_SMART();
+
 	switch (param) {
 	case 1:
 		trap_is_brkpt(address, regs);
@@ -159,6 +164,8 @@ void do_insterror_or_kprobe(unsigned long address, struct pt_regs *regs)
 {
 	int rc;
 
+	POPULATE_SMART();
+
 	/* Check if this exception is caused by kprobes */
 	rc = notify_die(DIE_IERR, "kprobe_ierr", regs, address, 0, SIGILL);
 	if (rc == NOTIFY_STOP)
diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
index fdfba1942a06..22017474a29c 100644
--- a/arch/arc/kernel/troubleshoot.c
+++ b/arch/arc/kernel/troubleshoot.c
@@ -14,10 +14,23 @@
 #include <linux/file.h>
 #include <linux/sched/mm.h>
 #include <linux/sched/debug.h>
+#include <linux/percpu-defs.h>
 
 #include <asm/arcregs.h>
 #include <asm/irqflags.h>
 
+#ifdef CONFIG_ARC_USE_SMART
+#define MAX_SMART_BUFF	4096
+
+struct smart_buff {
+	u32 src[MAX_SMART_BUFF];
+	u32 dst[MAX_SMART_BUFF];
+	u32 flags[MAX_SMART_BUFF];
+};
+
+DEFINE_PER_CPU(struct smart_buff, smart_buff_log);
+#endif /* CONFIG_ARC_USE_SMART */
+
 /*
  * Common routine to print scratch regs (r0-r12) or callee regs (r13-r25)
  *   -Prints 3 regs per line and a CR.
@@ -189,9 +202,104 @@ static inline void show_exception_mesg_k(struct pt_regs *regs)
 	show_ecr_verbose(regs);
 }
 
+#ifdef CONFIG_ARC_USE_SMART
+static inline bool smart_exist(void)
+{
+	struct bcr_generic bcr;
+
+	READ_BCR(ARC_REG_SMART_BCR, bcr);
+	return !!bcr.ver;
+}
+
+static inline u32 smart_stack_size(void)
+{
+	return read_aux_reg(ARC_REG_SMART_BCR) >> SMART_CTL_IDX_POS;
+}
+
+static inline void smart_enable(void)
+{
+	write_aux_reg(ARC_AUX_SMART_CONTROL, SMART_CTL_EN);
+}
+
+static inline void smart_disable(void)
+{
+	write_aux_reg(ARC_AUX_SMART_CONTROL, 0);
+}
+
+static void show_smart(void)
+{
+	struct smart_buff *smart_buff_cpu = this_cpu_ptr(&smart_buff_log);
+	int i, stack_size;
+	char *buf;
+
+	if (!smart_exist())
+		return;
+
+	stack_size = smart_stack_size();
+	pr_info("SmaRT (%d entries):\n", stack_size);
+
+	buf = (char *)__get_free_page(GFP_NOWAIT);
+	if (!buf)
+		return;
+
+	for (i = 0; i < stack_size; i++) {
+		pr_info(" [%4d] %s%s%s%s %#010x -> %#010x ", i,
+			smart_buff_cpu->flags[i] & SMART_FLAG_U ? "U" : " ",
+			smart_buff_cpu->flags[i] & SMART_FLAG_E ? "E" : " ",
+			smart_buff_cpu->flags[i] & SMART_FLAG_R ? "R" : " ",
+			smart_buff_cpu->flags[i] & SMART_FLAG_V ? "V" : " ",
+			smart_buff_cpu->src[i], smart_buff_cpu->dst[i]);
+
+		if (smart_buff_cpu->src[i] < 0x80000000)
+			show_faulting_vma(smart_buff_cpu->src[i], buf);
+		else
+			pr_cont("[src %pS] ", (void *)smart_buff_cpu->src[i]);
+
+		if (smart_buff_cpu->dst[i] < 0x80000000)
+			show_faulting_vma(smart_buff_cpu->dst[i], buf);
+		else
+			pr_cont("[dst %pS]\n", (void *)smart_buff_cpu->dst[i]);
+	}
+
+	free_page((unsigned long)buf);
+}
+#endif /* CONFIG_ARC_USE_SMART */
+
 /************************************************************************
  *  API called by rest of kernel
  ***********************************************************************/
+#ifdef CONFIG_ARC_USE_SMART
+void populate_smart(void)
+{
+	struct smart_buff *smart_buff_cpu;
+	u32 stack_size;
+	int i;
+
+	if (!smart_exist())
+		return;
+
+	smart_disable();
+
+	smart_buff_cpu = this_cpu_ptr(&smart_buff_log);
+	stack_size = smart_stack_size();
+	for (i = 0; i < stack_size; i++) {
+		write_aux_reg(ARC_AUX_SMART_CONTROL,
+			(SMART_CTL_DATA_SRC << SMART_CTL_DATA_POS)
+			| (i << SMART_CTL_IDX_POS));
+		smart_buff_cpu->src[i] = read_aux_reg(ARC_AUX_SMART_DATA);
+		write_aux_reg(ARC_AUX_SMART_CONTROL,
+			(SMART_CTL_DATA_DST << SMART_CTL_DATA_POS)
+			| (i << SMART_CTL_IDX_POS));
+		smart_buff_cpu->dst[i] = read_aux_reg(ARC_AUX_SMART_DATA);
+		write_aux_reg(ARC_AUX_SMART_CONTROL,
+			(SMART_CTL_DATA_FLAG << SMART_CTL_DATA_POS)
+			| (i << SMART_CTL_IDX_POS));
+		smart_buff_cpu->flags[i] = read_aux_reg(ARC_AUX_SMART_DATA);
+	}
+
+	smart_enable();
+}
+#endif /* CONFIG_ARC_USE_SMART */
 
 void show_exception_mesg(struct pt_regs *regs)
 {
@@ -199,6 +307,10 @@ void show_exception_mesg(struct pt_regs *regs)
 		show_exception_mesg_u(regs);
 	else
 		show_exception_mesg_k(regs);
+
+#ifdef CONFIG_ARC_USE_SMART
+	show_smart();
+#endif /* CONFIG_ARC_USE_SMART */
 }
 
 void show_regs(struct pt_regs *regs)
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 026d662a7668..7a759fd874de 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -72,6 +72,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 	int write = regs->ecr_cause & ECR_C_PROTV_STORE;  /* ST/EX */
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
+	POPULATE_SMART();
 	clear_siginfo(&info);
 
 	/*
-- 
2.14.4



* [RFC] ARC: ARCv2: Introduce SmaRT support : lmbench results
  2018-10-19 14:27 [RFC] ARC: ARCv2: Introduce SmaRT support Eugeniy Paltsev
@ 2018-10-19 14:33 ` Eugeniy Paltsev
  2018-10-24 18:28 ` [RFC] ARC: ARCv2: Introduce SmaRT support Vineet Gupta
  1 sibling, 0 replies; 3+ messages in thread
From: Eugeniy Paltsev @ 2018-10-19 14:33 UTC (permalink / raw)
  To: Vineet Gupta, linux-snps-arc; +Cc: linux-kernel, Alexey Brodkin

[-- Attachment #1: Type: text/plain, Size: 93 bytes --]

Lmbench summary with enabled and disabled SmaRT support is attached.

-- 
 Eugeniy Paltsev

[-- Attachment #2: smart_compare.log --]
[-- Type: text/x-log; name="smart_compare.log", Size: 10957 bytes --]


                 L M B E N C H  3 . 0   S U M M A R Y
                 ------------------------------------
		 (Alpha software, do not distribute)

Basic system parameters
--------------------------------------------------------------------------------------
        Host                 OS Description                             Mhz  tlb  cache  mem   scal
                                                                             pages line   par   load
                                                                                   bytes
----------------- ------------- --------------------------------------- ---- ----- ----- ------ ----
ena               Linux 4.19.0- ena                                     1000     8   128 1.7500    1
ena               Linux 4.19.0- ena                                     1000     8       1.7500    1
ena               Linux 4.19.0- ena                                     1000     8       1.7300    1
ena               Linux 4.19.0- ena                                     1000     8   128 1.7500    1
dis               Linux 4.19.0- dis                                     1000     8       1.7400    1
dis               Linux 4.19.0- dis                                     1000     8       1.7500    1
dis               Linux 4.19.0- dis                                     1000     8   128 1.7500    1
dis               Linux 4.19.0- dis                                     1000     8       1.7400    1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh  
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
ena       Linux 4.19.0- 1000 0.38 0.59 2.46 4.87 23.4 0.55 2.20 388. 1975 3673
ena       Linux 4.19.0- 1000 0.38 0.59 2.45 4.90 23.4 0.55 2.20 366. 1979 3671
ena       Linux 4.19.0- 1000 0.38 0.59 2.44 4.89 23.4 0.55 2.19 381. 1961 3696
ena       Linux 4.19.0- 1000 0.38 0.59 2.43 5.22 23.4 0.55 2.18 388. 1972 3702
dis       Linux 4.19.0- 1000 0.39 0.61 2.53 4.71 23.6 0.55 2.20 311. 1793 3370
dis       Linux 4.19.0- 1000 0.39 0.59 2.55 4.74 23.6 0.55 2.20 311. 1807 3363
dis       Linux 4.19.0- 1000 0.39 0.59 2.55 4.77 23.5 0.58 2.20 314. 1787 3373
dis       Linux 4.19.0- 1000 0.39 0.59 2.54 4.78 23.5 0.61 2.20 314. 1786 3395

Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host                 OS  intgr intgr  intgr  intgr  intgr  
                          bit   add    mul    div    mod   
--------- ------------- ------ ------ ------ ------ ------ 
ena       Linux 4.19.0- 1.0000 0.5500          13.0 8.0000
ena       Linux 4.19.0- 1.0000 0.5500          13.0 8.0100
ena       Linux 4.19.0- 1.0000 0.5500          13.0 8.0100
ena       Linux 4.19.0- 1.0000 0.5500          13.0 8.0000
dis       Linux 4.19.0- 1.0000 0.5500          13.0 8.0000
dis       Linux 4.19.0- 1.0000 0.5500          13.0 8.0000
dis       Linux 4.19.0- 1.0000 0.5500          13.0 8.0000
dis       Linux 4.19.0- 1.0000 0.5500          13.0 8.0000

Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS int64  int64  int64  int64  int64  
                         bit    add    mul    div    mod   
--------- ------------- ------ ------ ------ ------ ------ 
ena       Linux 4.19.0-    11.          10.3  116.2   97.0
ena       Linux 4.19.0-    12.          10.3  116.2   96.4
ena       Linux 4.19.0-    11.          10.3  116.2   96.5
ena       Linux 4.19.0-    11.          10.3  116.2   96.4
dis       Linux 4.19.0-    12.          10.3  116.2   96.4
dis       Linux 4.19.0-    11.          10.3  116.2   96.4
dis       Linux 4.19.0-    11.          10.3  116.2   96.5
dis       Linux 4.19.0-    11.          10.3  116.2   96.4

Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host                 OS  float  float  float  float
                         add    mul    div    bogo
--------- ------------- ------ ------ ------ ------ 
ena       Linux 4.19.0-  145.2  144.7  288.2  730.9
ena       Linux 4.19.0-  145.9  144.6  288.0  729.0
ena       Linux 4.19.0-  145.9  144.7  288.2  730.2
ena       Linux 4.19.0-  145.9  144.7  288.2  729.6
dis       Linux 4.19.0-  146.0  144.6  288.3  730.2
dis       Linux 4.19.0-  145.2  144.7  288.2  729.0
dis       Linux 4.19.0-  145.4  144.7  288.2  729.9
dis       Linux 4.19.0-  146.0  144.7  288.2  765.1

Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS  double double double double
                         add    mul    div    bogo
--------- ------------- ------  ------ ------ ------ 
ena       Linux 4.19.0-  196.1  273.5 1506.9 2170.0
ena       Linux 4.19.0-  196.7  273.5 1506.9 2112.0
ena       Linux 4.19.0-  196.7  273.5 1506.9 2121.0
ena       Linux 4.19.0-  196.7  273.5 1506.4 2081.3
dis       Linux 4.19.0-  196.2  273.5 1497.8 2125.7
dis       Linux 4.19.0-  196.7  273.5 1506.7 2082.0
dis       Linux 4.19.0-  196.2  273.5 1506.7 2119.7
dis       Linux 4.19.0-  196.2  273.5 1506.9 2119.0

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
ena       Linux 4.19.0- 3.7100 3.8000 1.0600 7.9600   66.8    14.8   120.7
ena       Linux 4.19.0- 3.9100 4.5000 7.5400 7.7400   92.7    19.8   121.8
ena       Linux 4.19.0- 4.5400 5.2400   14.8 6.6600   86.7    17.3   124.9
ena       Linux 4.19.0- 3.5800 5.6800   12.8 6.2300   75.4    17.3   124.1
dis       Linux 4.19.0- 4.1100 4.5500   14.5 7.6700   72.8    18.4   123.4
dis       Linux 4.19.0- 3.7100 4.1100 9.0300   12.7   86.0    17.3   125.3
dis       Linux 4.19.0- 4.2400 3.7000 8.9100 7.1500   72.2    15.3   123.0
dis       Linux 4.19.0- 4.4300 5.1800   10.1 6.7100   74.7    18.0   123.7

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
ena       Linux 4.19.0- 3.710  14.9 12.3  33.4        47.5        93.
ena       Linux 4.19.0- 3.910  14.4 12.5  33.2        47.0        94.
ena       Linux 4.19.0- 4.540  14.9 12.4  32.8        47.5        93.
ena       Linux 4.19.0- 3.580  14.6 15.2  32.6        47.4        95.
dis       Linux 4.19.0- 4.110  14.4 15.6  31.7        49.1        94.
dis       Linux 4.19.0- 3.710  14.3 12.4  31.7        48.5        93.
dis       Linux 4.19.0- 4.240  14.5 12.5  32.1        48.4        94.
dis       Linux 4.19.0- 4.430  14.3 15.8  31.1        47.9       154.

*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS   UDP  RPC/  TCP   RPC/ TCP
                               UDP         TCP  conn
--------- ------------- ----- ----- ----- ----- ----
ena       Linux 4.19.0-                             
ena       Linux 4.19.0-                             
ena       Linux 4.19.0-                             
ena       Linux 4.19.0-                             
dis       Linux 4.19.0-                             
dis       Linux 4.19.0-                             
dis       Linux 4.19.0-                             
dis       Linux 4.19.0-                             

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
ena       Linux 4.19.0-   10.4 7.5078   53.9   12.7   611.0 4.288 1.66740 8.318
ena       Linux 4.19.0-   10.5 7.4928   54.7   12.6   612.0 4.339 1.66900 8.316
ena       Linux 4.19.0-   10.5 7.5434   54.8   12.6   610.0 4.337 1.66820 8.348
ena       Linux 4.19.0-   10.6 7.5052   53.9   12.5   629.0 4.293 1.65990 8.317
dis       Linux 4.19.0-   10.5 7.4831   54.3   12.4   210.0 0.416 1.17590 8.369
dis       Linux 4.19.0-   10.5 7.5618   53.9   12.5   212.0 0.499 1.17850 8.395
dis       Linux 4.19.0-   10.7 7.4484   54.9   12.4   211.0 0.422 1.17710 8.399
dis       Linux 4.19.0-   10.7 7.4860   55.0   12.4   212.0 0.405 1.18190 8.373

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
ena       Linux 4.19.0- 219. 455. 106.  248.2  404.9  383.9  233.6 405. 389.7
ena       Linux 4.19.0- 213. 455. 106.  246.9  405.1  383.7  233.5 405. 389.1
ena       Linux 4.19.0- 220. 463. 157.  250.1  405.2  383.7  233.5 405. 389.5
ena       Linux 4.19.0- 218. 454. 147.  252.7  405.2  383.6  233.5 405. 389.5
dis       Linux 4.19.0- 221. 448. 104.  250.5  405.2  384.2  233.4 405. 389.7
dis       Linux 4.19.0- 220. 447. 105.  257.0  405.3  383.9  233.6 405. 389.5
dis       Linux 4.19.0- 210. 442. 106.  251.1  405.2  383.9  233.4 405. 389.4
dis       Linux 4.19.0- 217. 465. 118.  249.5  405.3  383.7  233.3 405. 389.4

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
--------- -------------   ---   ----   ----    --------    --------    -------
ena       Linux 4.19.0-  1000 3.0420   24.3       210.5       371.4
ena       Linux 4.19.0-  1000 3.0420   27.3       210.7       368.3
ena       Linux 4.19.0-  1000 3.0420   27.3       210.4       370.4
ena       Linux 4.19.0-  1000 3.0420   22.8       210.4       368.8
dis       Linux 4.19.0-  1000 3.0420   27.3       210.4       368.4
dis       Linux 4.19.0-  1000 3.0420   27.3       210.4       368.4
dis       Linux 4.19.0-  1000 3.0420   24.3       210.4       370.1
dis       Linux 4.19.0-  1000 3.0420   25.8       210.4       368.6


* Re: [RFC] ARC: ARCv2: Introduce SmaRT support
  2018-10-19 14:27 [RFC] ARC: ARCv2: Introduce SmaRT support Eugeniy Paltsev
  2018-10-19 14:33 ` [RFC] ARC: ARCv2: Introduce SmaRT support : lmbench results Eugeniy Paltsev
@ 2018-10-24 18:28 ` Vineet Gupta
  1 sibling, 0 replies; 3+ messages in thread
From: Vineet Gupta @ 2018-10-24 18:28 UTC (permalink / raw)
  To: Eugeniy Paltsev; +Cc: Alexey Brodkin, lkml, arcml


On 10/19/2018 07:27 AM, Eugeniy Paltsev wrote:
> Add compile-time 'ARC_USE_SMART' option for enabling SmaRT support.

Nice !

> Small real time trace (SmaRT) is an optional on-chip debug hardware
> component that captures instruction-trace history. It stores the
> address of the most recent non-sequential instructions executed into
> internal buffer.
>
> Usually we use MetaWare debugger to enable SmaRT and display trace
> information.
>
> This patch allows to display the decoded content of SmaRT buffer
> without MetaWare debugger. It is done by extending ordinary exception
> message with decoded SmaRT instruction-trace history.
>
> In some cases it's really useful as it allows to show pre-exception
> instruction-trace which was not tainted by exception handler code,
> printk code, etc...

So the reason is not so much the lack of mdb as reducing the trace clutter. It's
funny, because mdb goes to great lengths to generate the clutter (i.e. to
reconstruct the interim disassembly from the sparse SmaRT entries).

> Nevertheless this option has negative performance impact due to
> implementation as we dump SmaRT buffer content into external memory
> buffer in the beginning of every slowpath exception handler code.
> We choose this implementation as a compromise between performance
> impact and SmaRT buffer tainting.
> Although the performance impact is not really significant (according
> to lmbench) we leave this option disabled by default.

Oh yes, this is a debug feature and is intrusive even if it doesn't show up in
profiles, so it needs to be disabled by default.

> Here are examples of user-space and kernel-space fault messages with
> 'ARC_USE_SMART' option enabled:
>
> User-space exception:
> ----------------------->8-------------------------
> Exception: u_hell[99]: at 0x103a2 [off 0x103a2 in /root/u_hell, VMA: 00010000:00012000]
>   ECR: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103a2
> SmaRT (64 entries):
>  [   0]    V 0x90232358 -> 0x9022ce3c [src do_page_fault+0x2c/0x2d8] [dst populate_smart+0x0/0x9c]

So I had to dig into the SmaRT spec to understand this src/dst stuff. What it
implies is that at the @src PC, a branch to @dst was taken.
Say we have samples SRC1:DST1 and SRC2:DST2. All this implies is that these 4 PCs
were observed. So just flatten out the SRC/DST pairs and print them in order, with
only 1 PC entry per line. That makes it easier to follow and comprehend.
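
One possible reading of this flattening, as a stand-alone C sketch:
every (src, dst) sample is expanded into two consecutive PCs so that a
single PC can be printed per line. struct smart_sample and
smart_flatten are hypothetical names, and the oldest-first output
order is an assumption (in the traces quoted above, index 0 holds the
most recent branch).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for one trace sample: a taken branch. */
struct smart_sample {
	uint32_t src;	/* PC of the branch instruction */
	uint32_t dst;	/* PC the branch landed on */
};

/*
 * Flatten n samples (index 0 = newest) into a list of PCs in
 * chronological order. Returns the number of PCs written, 2 * n.
 */
static size_t smart_flatten(const struct smart_sample *samples, size_t n,
			    uint32_t *pcs, size_t cap)
{
	size_t out = 0;

	assert(samples != NULL && pcs != NULL);
	assert(cap >= 2 * n);

	/* Walk from the oldest sample to the newest. */
	for (size_t i = n; i-- > 0; ) {
		pcs[out++] = samples[i].src;
		pcs[out++] = samples[i].dst;
	}
	return out;
}
```

Each element of pcs can then be symbolized and printed on its own line.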

>  [   1]    V 0x9022e3f8 -> 0x9023232c [src EV_TLBProtV+0xec/0xf0] [dst do_page_fault+0x0/0x2d8]
>  [   2]    V 0x90233194 -> 0x9022e30c [src do_slow_path_pf+0x10/0x14] [dst EV_TLBProtV+0x0/0xf0]
>  [   3]    V 0x90233120 -> 0x90233184 [src EV_TLBMissD+0x80/0xe0] [dst do_slow_path_pf+0x0/0x14]
>  [   4]  E V 0x000103a2 -> 0x902330a0 [off 0x103a2 in /root/u_hell, VMA: 00010000:00012000] [dst EV_TLBMissD+0x0/0xe0]
>  [   5] U  V 0x2004f238 -> 0x00010398 [off 0x43238 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x10398 in /root/u_hell, VMA: 00010000:00012000]
>  [   6] U  V 0x20049a82 -> 0x2004f214 [off 0x3da82 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x43214 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]

Once we do the above, we can reduce the print clutter by only printing the vma if
it changed - again, less printing means the brain has to process less information.
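
A sketch of the "print the vma only if it changed" idea in stand-alone
C, assuming each PC in the flattened list has already been resolved to
the start address of its VMA. struct pc_entry and mark_vma_changes are
hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened trace entry: a PC plus the VMA that contains it. */
struct pc_entry {
	uint32_t pc;
	uint32_t vma_start;
};

/*
 * Set print_vma[i] to true for the first entry and for every entry
 * whose VMA differs from the previous one. Returns how many VMA
 * descriptions would be printed.
 */
static size_t mark_vma_changes(const struct pc_entry *e, size_t n,
			       bool *print_vma)
{
	size_t printed = 0;

	assert(e != NULL && print_vma != NULL);

	for (size_t i = 0; i < n; i++) {
		print_vma[i] = (i == 0) ||
			       (e[i].vma_start != e[i - 1].vma_start);
		if (print_vma[i])
			printed++;
	}
	return printed;
}
```

For a run of PCs inside libuClibc followed by one PC in /root/u_hell,
only the first libuClibc line and the u_hell line would carry the VMA
text.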

> ...[snip]...
> ----------------------->8-------------------------
>
> TODO:
>   Add runtime procfs options to configure/suspend SmaRT.

Good.

>   Add SmaRT BCR encoding struct.
>   Check SmaRT version number in BCR.

Do we also need to think about how to coexist with mdb? What if the user enables
it in mdb before hitting run, etc.?

> NOTE:
> this RFC has prerequisite:
>   http://patchwork.ozlabs.org/patch/986820/

Right, I'm still not happy with our approach there and I will respond separately
after a few trials and tribulations of my own, so please be patient with that.
See below for some coding comments.

>  
> +config ARC_USE_SMART

ARC_SMART_TRACE ? I know why you picked the _USE_, but the semantics are different
here.


> +	bool "Enable real time trace on-chip debug HW"

This might be confused with RTT, so keep the smaRT keyword here with hungarian case.

> diff --git a/arch/arc/include/asm/bug.h b/arch/arc/include/asm/bug.h
>  
> +#ifdef CONFIG_ARC_USE_SMART
> +void populate_smart(void);
> +#define POPULATE_SMART()	populate_smart()
> +#else
> +#define POPULATE_SMART()
> +#endif /* CONFIG_ARC_USE_SMART */
> +

Let's keep all SmaRT-related stuff in files of its own: smart.h and smart.c

> diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c
>  
>  	arc_chk_core_config();
> +
> +#ifdef CONFIG_ARC_USE_SMART

> +	if (cpuinfo_arc700[cpu_id].extn.smart)

IS_ENABLED() is better here

> +		write_aux_reg(ARC_AUX_SMART_CONTROL, SMART_CTL_EN);
> +#endif
> +
>  }
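
A sketch of what the IS_ENABLED() form could look like, written so it
builds outside the kernel. The macro below is a simplified stand-in
for the one in <linux/kconfig.h> (it ignores the =m case), the SK_*
helper names are hypothetical, and smart_boot_enable() with its
control pointer replaces the real write_aux_reg() call purely for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for the kernel's IS_ENABLED(): evaluates to 1
 * when the option is defined as 1 and to 0 when it is undefined.
 */
#define SK_ARG_PLACEHOLDER_1		0,
#define SK_TAKE_SECOND(ignored, val, ...) val
#define SK_IS_DEFINED3(arg1_or_junk)	SK_TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define SK_IS_DEFINED2(val)		SK_IS_DEFINED3(SK_ARG_PLACEHOLDER_##val)
#define SK_IS_DEFINED(x)		SK_IS_DEFINED2(x)
#define IS_ENABLED(option)		SK_IS_DEFINED(option)

#define CONFIG_ARC_USE_SMART 1	/* what Kconfig would emit for =y */
#define SMART_CTL_EN (1u << 0)

/*
 * Enable tracing only when the option is built in and the hardware
 * reports the feature. Both branches are always compiled, so a build
 * with the option off still type-checks this code.
 */
static bool smart_boot_enable(bool hw_has_smart, uint32_t *control)
{
	assert(control != NULL);

	if (IS_ENABLED(CONFIG_ARC_USE_SMART) && hw_has_smart) {
		*control = SMART_CTL_EN;
		return true;
	}
	return false;
}
```

Because the condition is a compile-time constant, the compiler can
drop the branch entirely when the option is off, without an #ifdef in
the function body.
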
>  
> diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
>  
> +#ifdef CONFIG_ARC_USE_SMART
> +#define MAX_SMART_BUFF	4096
> +
> +struct smart_buff {
> +	u32 src[MAX_SMART_BUFF];
> +	u32 dst[MAX_SMART_BUFF];
> +	u32 flags[MAX_SMART_BUFF];
> +};

Move all of this into smart.c

> +#ifdef CONFIG_ARC_USE_SMART
> +static inline bool smart_exist(void)
> +{
> +	struct bcr_generic bcr;
> +
> +	READ_BCR(ARC_REG_SMART_BCR, bcr);
> +	return !!bcr.ver ;
> +}
> +
> +static inline u32 smart_stack_size(void)
> +{
> +	return read_aux_reg(ARC_REG_SMART_BCR) >> SMART_CTL_IDX_POS;
> +}
> +
> +static inline void smart_enable(void)
> +{
> +	write_aux_reg(ARC_AUX_SMART_CONTROL, SMART_CTL_EN);
> +}
> +
> +static inline void smart_disable(void)
> +{
> +	write_aux_reg(ARC_AUX_SMART_CONTROL, 0);
> +}

Good coding style.

> +
> +static void show_smart(void)
> +{
> +	struct smart_buff *smart_buff_cpu = this_cpu_ptr(&smart_buff_log);
> +	int i, stack_size;
> +	char *buf;
> +
> +	if (!smart_exist())
> +		return;
> +
> +	stack_size = smart_stack_size();
> +	pr_info("SmaRT (%d entries):\n", stack_size);
> +
> +	buf = (char *)__get_free_page(GFP_NOWAIT);
> +	if (!buf)
> +		return;
> +
> +	for (i = 0; i < stack_size; i++) {
> +		pr_info(" [%4d] %s%s%s%s %#010x -> %#010x ", i,
> +			smart_buff_cpu->flags[i] & SMART_FLAG_U ? "U" : " ",
> +			smart_buff_cpu->flags[i] & SMART_FLAG_E ? "E" : " ",
> +			smart_buff_cpu->flags[i] & SMART_FLAG_R ? "R" : " ",
> +			smart_buff_cpu->flags[i] & SMART_FLAG_V ? "V" : " ",
> +			smart_buff_cpu->src[i], smart_buff_cpu->dst[i]);
> +
> +		if (smart_buff_cpu->src[i] < 0x80000000)
> +			show_faulting_vma(smart_buff_cpu->src[i], buf);
> +		else
> +			pr_cont("[src %pS] ", (void *)smart_buff_cpu->src[i]);
> +
> +		if (smart_buff_cpu->dst[i] < 0x80000000)
> +			show_faulting_vma(smart_buff_cpu->dst[i], buf);
> +		else
> +			pr_cont("[dst %pS]\n", (void *)smart_buff_cpu->dst[i]);

Some of this will change, based on my comments above about flattening src/dst!

> @@ -199,6 +307,10 @@ void show_exception_mesg(struct pt_regs *regs)
>  		show_exception_mesg_u(regs);
>  	else
>  		show_exception_mesg_k(regs);
> +
> +#ifdef CONFIG_ARC_USE_SMART
> +	show_smart();
> +#endif /* CONFIG_ARC_USE_SMART */

Provide 2 variants in header to avoid #ifdef here.
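
A minimal sketch of the two-variant header, assuming the declarations
move to a new header (the file name in the comment is hypothetical).
With the option off, the calls compile to nothing and the call sites
need no #ifdef.

```c
/* Hypothetical arch/arc/include/asm/smart.h */
#ifdef CONFIG_ARC_USE_SMART
/* Real implementations live in the SmaRT source file. */
void populate_smart(void);
void show_smart(void);
#else
/* Empty stubs: callers stay unconditional, the compiler drops the calls. */
static inline void populate_smart(void) { }
static inline void show_smart(void) { }
#endif /* CONFIG_ARC_USE_SMART */
```

show_exception_mesg() could then call show_smart() directly, and the
POPULATE_SMART() macro would no longer be needed.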

