* [RFC] ARC: ARCv2: Introduce SmaRT support
@ 2018-10-19 14:27 Eugeniy Paltsev
2018-10-19 14:33 ` [RFC] ARC: ARCv2: Introduce SmaRT support : lmbench results Eugeniy Paltsev
2018-10-24 18:28 ` [RFC] ARC: ARCv2: Introduce SmaRT support Vineet Gupta
0 siblings, 2 replies; 3+ messages in thread
From: Eugeniy Paltsev @ 2018-10-19 14:27 UTC (permalink / raw)
To: linux-snps-arc, Vineet Gupta
Cc: linux-kernel, Alexey Brodkin, Eugeniy Paltsev
Add compile-time 'ARC_USE_SMART' option for enabling SmaRT support.
Small real time trace (SmaRT) is an optional on-chip debug hardware
component that captures instruction-trace history. It stores the
addresses of the most recent non-sequential instructions executed in
an internal buffer.
Usually we use the MetaWare debugger to enable SmaRT and display trace
information.
This patch makes it possible to display the decoded content of the SmaRT
buffer without the MetaWare debugger. This is done by extending the ordinary
exception message with the decoded SmaRT instruction-trace history.
In some cases this is really useful, as it shows the pre-exception
instruction trace before it is tainted by exception handler code,
printk code, etc.
Nevertheless, this option has a negative performance impact because of
its implementation: we dump the SmaRT buffer content into an external memory
buffer at the beginning of every slow-path exception handler.
We chose this implementation as a compromise between performance
impact and SmaRT buffer tainting.
Although the performance impact is not really significant (according
to lmbench), we leave this option disabled by default.
Here are examples of user-space and kernel-space fault messages with
the 'ARC_USE_SMART' option enabled:
User-space exception:
----------------------->8-------------------------
Exception: u_hell[99]: at 0x103a2 [off 0x103a2 in /root/u_hell, VMA: 00010000:00012000]
ECR: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103a2
SmaRT (64 entries):
[ 0] V 0x90232358 -> 0x9022ce3c [src do_page_fault+0x2c/0x2d8] [dst populate_smart+0x0/0x9c]
[ 1] V 0x9022e3f8 -> 0x9023232c [src EV_TLBProtV+0xec/0xf0] [dst do_page_fault+0x0/0x2d8]
[ 2] V 0x90233194 -> 0x9022e30c [src do_slow_path_pf+0x10/0x14] [dst EV_TLBProtV+0x0/0xf0]
[ 3] V 0x90233120 -> 0x90233184 [src EV_TLBMissD+0x80/0xe0] [dst do_slow_path_pf+0x0/0x14]
[ 4] E V 0x000103a2 -> 0x902330a0 [off 0x103a2 in /root/u_hell, VMA: 00010000:00012000] [dst EV_TLBMissD+0x0/0xe0]
[ 5] U V 0x2004f238 -> 0x00010398 [off 0x43238 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x10398 in /root/u_hell, VMA: 00010000:00012000]
[ 6] U V 0x20049a82 -> 0x2004f214 [off 0x3da82 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x43214 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
[ 7] U V 0x20049a64 -> 0x20049a76 [off 0x3da64 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x3da76 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
[ 8] U V 0x2001d8e4 -> 0x20049a58 [off 0x118e4 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x3da58 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
[ 9] U V 0x2001d8f4 -> 0x2001d8a8 [off 0x118f4 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x118a8 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
[ 10] U V 0x2001d5c8 -> 0x2001d8f0 [off 0x115c8 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x118f0 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
...[snip]...
----------------------->8-------------------------
Kernel-space exception:
----------------------->8-------------------------
Exception: at dw_mci_probe+0xf0/0x944:
ECR: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x905e26e0
SmaRT (64 entries):
[ 0] V 0x90232358 -> 0x9022ce3c [src do_page_fault+0x2c/0x2d8] [dst populate_smart+0x0/0x9c]
[ 1] V 0x9022e3f8 -> 0x9023232c [src EV_TLBProtV+0xec/0xf0] [dst do_page_fault+0x0/0x2d8]
[ 2] V 0x9022e398 -> 0x9022e3a0 [src EV_TLBProtV+0x8c/0xf0] [dst EV_TLBProtV+0x94/0xf0]
[ 3] V 0x90233194 -> 0x9022e30c [src do_slow_path_pf+0x10/0x14] [dst EV_TLBProtV+0x0/0xf0]
[ 4] V 0x902330dc -> 0x90233184 [src EV_TLBMissD+0x3c/0xe0] [dst do_slow_path_pf+0x0/0x14]
[ 5] E V 0x905e26e0 -> 0x902330a0 [src dw_mci_probe+0xf0/0x944] [dst EV_TLBMissD+0x0/0xe0]
[ 6] V 0x904cb940 -> 0x905e26d6 [src clk_get_rate+0xdc/0x208] [dst dw_mci_probe+0xe6/0x944]
[ 7] V 0x9074e14a -> 0x904cb932 [src mutex_unlock+0x1e/0x24] [dst clk_get_rate+0xce/0x208]
[ 8] V 0x904cb92e -> 0x9074e12c [src clk_get_rate+0xca/0x208] [dst mutex_unlock+0x0/0x24]
[ 9] V 0x904cb8ea -> 0x904cb91e [src clk_get_rate+0x86/0x208] [dst clk_get_rate+0xba/0x208]
[ 10] V 0x904cb8c2 -> 0x904cb8d0 [src clk_get_rate+0x5e/0x208] [dst clk_get_rate+0x6c/0x208]
[ 11] V 0x9074e01a -> 0x904cb884 [src mutex_trylock+0x4e/0x50] [dst clk_get_rate+0x20/0x208]
[ 12] V 0x9074e00e -> 0x9074e018 [src mutex_trylock+0x42/0x50] [dst mutex_trylock+0x4c/0x50]
[ 13] V 0x9074dfdc -> 0x9074dff0 [src mutex_trylock+0x10/0x50] [dst mutex_trylock+0x24/0x50]
[ 14] V 0x904cb880 -> 0x9074dfcc [src clk_get_rate+0x1c/0x208] [dst mutex_trylock+0x0/0x50]
[ 15] V 0x905e26d2 -> 0x904cb864 [src dw_mci_probe+0xe2/0x944] [dst clk_get_rate+0x0/0x208]
...[snip]...
----------------------->8-------------------------
TODO:
Add runtime procfs options to configure/suspend SmaRT.
Add SmaRT BCR encoding struct.
Check SmaRT version number in BCR.
NOTE:
This RFC has a prerequisite:
http://patchwork.ozlabs.org/patch/986820/
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
---
arch/arc/Kconfig | 8 +++
arch/arc/include/asm/arcregs.h | 14 ++++++
arch/arc/include/asm/bug.h | 7 +++
arch/arc/kernel/setup.c | 6 +++
arch/arc/kernel/traps.c | 7 +++
arch/arc/kernel/troubleshoot.c | 112 +++++++++++++++++++++++++++++++++++++++++
arch/arc/mm/fault.c | 1 +
7 files changed, 155 insertions(+)
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index a045f3086047..f006fe81e085 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -528,6 +528,14 @@ config ARC_DBG_TLB_PARANOIA
bool "Paranoia Checks in Low Level TLB Handlers"
default n
+config ARC_USE_SMART
+ bool "Enable real time trace on-chip debug HW"
+ depends on ISA_ARCV2
+ help
+ Enable the Small real time trace (SmaRT) on-chip debug hardware
+ component, which captures instruction-trace history, if it exists.
+ The trace is printed whenever an unexpected exception occurs.
+
endif
config ARC_UBOOT_SUPPORT
diff --git a/arch/arc/include/asm/arcregs.h b/arch/arc/include/asm/arcregs.h
index 49bfbd879caa..11f4c5c0df4d 100644
--- a/arch/arc/include/asm/arcregs.h
+++ b/arch/arc/include/asm/arcregs.h
@@ -118,6 +118,20 @@
#define ARC_AUX_DPFP_2H 0x304
#define ARC_AUX_DPFP_STAT 0x305
+/* SmaRT registers */
+#define ARC_AUX_SMART_CONTROL 0x700
+#define SMART_CTL_EN BIT(0)
+#define SMART_CTL_DATA_POS 8
+#define SMART_CTL_DATA_SRC 0
+#define SMART_CTL_DATA_DST 1
+#define SMART_CTL_DATA_FLAG 2
+#define SMART_CTL_IDX_POS 10
+#define ARC_AUX_SMART_DATA 0x701
+#define SMART_FLAG_U BIT(8)
+#define SMART_FLAG_E BIT(9)
+#define SMART_FLAG_R BIT(10)
+#define SMART_FLAG_V BIT(31)
+
#ifndef __ASSEMBLY__
#include <soc/arc/aux.h>
diff --git a/arch/arc/include/asm/bug.h b/arch/arc/include/asm/bug.h
index b68f7f82f2d8..bc9c14bcee2a 100644
--- a/arch/arc/include/asm/bug.h
+++ b/arch/arc/include/asm/bug.h
@@ -15,6 +15,13 @@
struct task_struct;
+#ifdef CONFIG_ARC_USE_SMART
+void populate_smart(void);
+#define POPULATE_SMART() populate_smart()
+#else
+#define POPULATE_SMART()
+#endif /* CONFIG_ARC_USE_SMART */
+
void show_regs(struct pt_regs *regs);
void show_exception_mesg(struct pt_regs *regs);
void show_stacktrace(struct task_struct *tsk, struct pt_regs *regs);
diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c
index b2cae79a25d7..e83bbc86397e 100644
--- a/arch/arc/kernel/setup.c
+++ b/arch/arc/kernel/setup.c
@@ -447,6 +447,12 @@ void setup_processor(void)
pr_info("%s", arc_platform_smp_cpuinfo());
arc_chk_core_config();
+
+#ifdef CONFIG_ARC_USE_SMART
+ if (cpuinfo_arc700[cpu_id].extn.smart)
+ write_aux_reg(ARC_AUX_SMART_CONTROL, SMART_CTL_EN);
+#endif
+
}
static inline int is_kernel(unsigned long addr)
diff --git a/arch/arc/kernel/traps.c b/arch/arc/kernel/traps.c
index e66fd40296b3..0228f7182e6d 100644
--- a/arch/arc/kernel/traps.c
+++ b/arch/arc/kernel/traps.c
@@ -70,6 +70,7 @@ int name(unsigned long address, struct pt_regs *regs) \
{ \
siginfo_t info; \
\
+ POPULATE_SMART(); \
clear_siginfo(&info); \
info.si_signo = signr; \
info.si_errno = 0; \
@@ -96,6 +97,8 @@ DO_ERROR_INFO(SIGSEGV, "gcc generated __builtin_trap", do_trap5_error, 0)
int do_misaligned_access(unsigned long address, struct pt_regs *regs,
struct callee_regs *cregs)
{
+ POPULATE_SMART();
+
/* If emulation not enabled, or failed, kill the task */
if (misaligned_fixup(address, regs, cregs) != 0)
return do_misaligned_error(address, regs);
@@ -128,6 +131,8 @@ void do_non_swi_trap(unsigned long address, struct pt_regs *regs)
{
unsigned int param = regs->ecr_param;
+ POPULATE_SMART();
+
switch (param) {
case 1:
trap_is_brkpt(address, regs);
@@ -159,6 +164,8 @@ void do_insterror_or_kprobe(unsigned long address, struct pt_regs *regs)
{
int rc;
+ POPULATE_SMART();
+
/* Check if this exception is caused by kprobes */
rc = notify_die(DIE_IERR, "kprobe_ierr", regs, address, 0, SIGILL);
if (rc == NOTIFY_STOP)
diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
index fdfba1942a06..22017474a29c 100644
--- a/arch/arc/kernel/troubleshoot.c
+++ b/arch/arc/kernel/troubleshoot.c
@@ -14,10 +14,23 @@
#include <linux/file.h>
#include <linux/sched/mm.h>
#include <linux/sched/debug.h>
+#include <linux/percpu-defs.h>
#include <asm/arcregs.h>
#include <asm/irqflags.h>
+#ifdef CONFIG_ARC_USE_SMART
+#define MAX_SMART_BUFF 4096
+
+struct smart_buff {
+ u32 src[MAX_SMART_BUFF];
+ u32 dst[MAX_SMART_BUFF];
+ u32 flags[MAX_SMART_BUFF];
+};
+
+DEFINE_PER_CPU(struct smart_buff, smart_buff_log);
+#endif /* CONFIG_ARC_USE_SMART */
+
/*
* Common routine to print scratch regs (r0-r12) or callee regs (r13-r25)
* -Prints 3 regs per line and a CR.
@@ -189,9 +202,104 @@ static inline void show_exception_mesg_k(struct pt_regs *regs)
show_ecr_verbose(regs);
}
+#ifdef CONFIG_ARC_USE_SMART
+static inline bool smart_exist(void)
+{
+ struct bcr_generic bcr;
+
+ READ_BCR(ARC_REG_SMART_BCR, bcr);
+ return !!bcr.ver;
+}
+
+static inline u32 smart_stack_size(void)
+{
+ return read_aux_reg(ARC_REG_SMART_BCR) >> SMART_CTL_IDX_POS;
+}
+
+static inline void smart_enable(void)
+{
+ write_aux_reg(ARC_AUX_SMART_CONTROL, SMART_CTL_EN);
+}
+
+static inline void smart_disable(void)
+{
+ write_aux_reg(ARC_AUX_SMART_CONTROL, 0);
+}
+
+static void show_smart(void)
+{
+ struct smart_buff *smart_buff_cpu = this_cpu_ptr(&smart_buff_log);
+ int i, stack_size;
+ char *buf;
+
+ if (!smart_exist())
+ return;
+
+ stack_size = smart_stack_size();
+ pr_info("SmaRT (%d entries):\n", stack_size);
+
+ buf = (char *)__get_free_page(GFP_NOWAIT);
+ if (!buf)
+ return;
+
+ for (i = 0; i < stack_size; i++) {
+ pr_info(" [%4d] %s%s%s%s %#010x -> %#010x ", i,
+ smart_buff_cpu->flags[i] & SMART_FLAG_U ? "U" : " ",
+ smart_buff_cpu->flags[i] & SMART_FLAG_E ? "E" : " ",
+ smart_buff_cpu->flags[i] & SMART_FLAG_R ? "R" : " ",
+ smart_buff_cpu->flags[i] & SMART_FLAG_V ? "V" : " ",
+ smart_buff_cpu->src[i], smart_buff_cpu->dst[i]);
+
+ if (smart_buff_cpu->src[i] < 0x80000000)
+ show_faulting_vma(smart_buff_cpu->src[i], buf);
+ else
+ pr_cont("[src %pS] ", (void *)smart_buff_cpu->src[i]);
+
+ if (smart_buff_cpu->dst[i] < 0x80000000)
+ show_faulting_vma(smart_buff_cpu->dst[i], buf);
+ else
+ pr_cont("[dst %pS]\n", (void *)smart_buff_cpu->dst[i]);
+ }
+
+ free_page((unsigned long)buf);
+}
+#endif /* CONFIG_ARC_USE_SMART */
+
/************************************************************************
* API called by rest of kernel
***********************************************************************/
+#ifdef CONFIG_ARC_USE_SMART
+void populate_smart(void)
+{
+ struct smart_buff *smart_buff_cpu;
+ u32 stack_size;
+ int i;
+
+ if (!smart_exist())
+ return;
+
+ smart_disable();
+
+ smart_buff_cpu = this_cpu_ptr(&smart_buff_log);
+ stack_size = smart_stack_size();
+ for (i = 0; i < stack_size; i++) {
+ write_aux_reg(ARC_AUX_SMART_CONTROL,
+ (SMART_CTL_DATA_SRC << SMART_CTL_DATA_POS)
+ | (i << SMART_CTL_IDX_POS));
+ smart_buff_cpu->src[i] = read_aux_reg(ARC_AUX_SMART_DATA);
+ write_aux_reg(ARC_AUX_SMART_CONTROL,
+ (SMART_CTL_DATA_DST << SMART_CTL_DATA_POS)
+ | (i << SMART_CTL_IDX_POS));
+ smart_buff_cpu->dst[i] = read_aux_reg(ARC_AUX_SMART_DATA);
+ write_aux_reg(ARC_AUX_SMART_CONTROL,
+ (SMART_CTL_DATA_FLAG << SMART_CTL_DATA_POS)
+ | (i << SMART_CTL_IDX_POS));
+ smart_buff_cpu->flags[i] = read_aux_reg(ARC_AUX_SMART_DATA);
+ }
+
+ smart_enable();
+}
+#endif /* CONFIG_ARC_USE_SMART */
void show_exception_mesg(struct pt_regs *regs)
{
@@ -199,6 +307,10 @@ void show_exception_mesg(struct pt_regs *regs)
show_exception_mesg_u(regs);
else
show_exception_mesg_k(regs);
+
+#ifdef CONFIG_ARC_USE_SMART
+ show_smart();
+#endif /* CONFIG_ARC_USE_SMART */
}
void show_regs(struct pt_regs *regs)
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 026d662a7668..7a759fd874de 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -72,6 +72,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
int write = regs->ecr_cause & ECR_C_PROTV_STORE; /* ST/EX */
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ POPULATE_SMART();
clear_siginfo(&info);
/*
--
2.14.4
* [RFC] ARC: ARCv2: Introduce SmaRT support : lmbench results
From: Eugeniy Paltsev @ 2018-10-19 14:33 UTC (permalink / raw)
To: Vineet Gupta, linux-snps-arc; +Cc: linux-kernel, Alexey Brodkin
[-- Attachment #1: Type: text/plain, Size: 93 bytes --]
Lmbench summary with enabled and disabled SmaRT support is attached.
--
Eugeniy Paltsev
[-- Attachment #2: smart_compare.log --]
[-- Type: text/x-log; name="smart_compare.log", Size: 10957 bytes --]
L M B E N C H 3 . 0 S U M M A R Y
------------------------------------
(Alpha software, do not distribute)
Basic system parameters
--------------------------------------------------------------------------------------
Host OS Description Mhz tlb cache mem scal
pages line par load
bytes
----------------- ------------- --------------------------------------- ---- ----- ----- ------ ----
ena Linux 4.19.0- ena 1000 8 128 1.7500 1
ena Linux 4.19.0- ena 1000 8 1.7500 1
ena Linux 4.19.0- ena 1000 8 1.7300 1
ena Linux 4.19.0- ena 1000 8 128 1.7500 1
dis Linux 4.19.0- dis 1000 8 1.7400 1
dis Linux 4.19.0- dis 1000 8 1.7500 1
dis Linux 4.19.0- dis 1000 8 128 1.7500 1
dis Linux 4.19.0- dis 1000 8 1.7400 1
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host      OS            Mhz  null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
ena       Linux 4.19.0- 1000 0.38 0.59 2.46 4.87 23.4 0.55 2.20 388. 1975 3673
ena       Linux 4.19.0- 1000 0.38 0.59 2.45 4.90 23.4 0.55 2.20 366. 1979 3671
ena       Linux 4.19.0- 1000 0.38 0.59 2.44 4.89 23.4 0.55 2.19 381. 1961 3696
ena       Linux 4.19.0- 1000 0.38 0.59 2.43 5.22 23.4 0.55 2.18 388. 1972 3702
dis       Linux 4.19.0- 1000 0.39 0.61 2.53 4.71 23.6 0.55 2.20 311. 1793 3370
dis       Linux 4.19.0- 1000 0.39 0.59 2.55 4.74 23.6 0.55 2.20 311. 1807 3363
dis       Linux 4.19.0- 1000 0.39 0.59 2.55 4.77 23.5 0.58 2.20 314. 1787 3373
dis       Linux 4.19.0- 1000 0.39 0.59 2.54 4.78 23.5 0.61 2.20 314. 1786 3395
Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host OS intgr intgr intgr intgr intgr
bit add mul div mod
--------- ------------- ------ ------ ------ ------ ------
ena Linux 4.19.0- 1.0000 0.5500 13.0 8.0000
ena Linux 4.19.0- 1.0000 0.5500 13.0 8.0100
ena Linux 4.19.0- 1.0000 0.5500 13.0 8.0100
ena Linux 4.19.0- 1.0000 0.5500 13.0 8.0000
dis Linux 4.19.0- 1.0000 0.5500 13.0 8.0000
dis Linux 4.19.0- 1.0000 0.5500 13.0 8.0000
dis Linux 4.19.0- 1.0000 0.5500 13.0 8.0000
dis Linux 4.19.0- 1.0000 0.5500 13.0 8.0000
Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host OS int64 int64 int64 int64 int64
bit add mul div mod
--------- ------------- ------ ------ ------ ------ ------
ena Linux 4.19.0- 11. 10.3 116.2 97.0
ena Linux 4.19.0- 12. 10.3 116.2 96.4
ena Linux 4.19.0- 11. 10.3 116.2 96.5
ena Linux 4.19.0- 11. 10.3 116.2 96.4
dis Linux 4.19.0- 12. 10.3 116.2 96.4
dis Linux 4.19.0- 11. 10.3 116.2 96.4
dis Linux 4.19.0- 11. 10.3 116.2 96.5
dis Linux 4.19.0- 11. 10.3 116.2 96.4
Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host OS float float float float
add mul div bogo
--------- ------------- ------ ------ ------ ------
ena Linux 4.19.0- 145.2 144.7 288.2 730.9
ena Linux 4.19.0- 145.9 144.6 288.0 729.0
ena Linux 4.19.0- 145.9 144.7 288.2 730.2
ena Linux 4.19.0- 145.9 144.7 288.2 729.6
dis Linux 4.19.0- 146.0 144.6 288.3 730.2
dis Linux 4.19.0- 145.2 144.7 288.2 729.0
dis Linux 4.19.0- 145.4 144.7 288.2 729.9
dis Linux 4.19.0- 146.0 144.7 288.2 765.1
Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host OS double double double double
add mul div bogo
--------- ------------- ------ ------ ------ ------
ena Linux 4.19.0- 196.1 273.5 1506.9 2170.0
ena Linux 4.19.0- 196.7 273.5 1506.9 2112.0
ena Linux 4.19.0- 196.7 273.5 1506.9 2121.0
ena Linux 4.19.0- 196.7 273.5 1506.4 2081.3
dis Linux 4.19.0- 196.2 273.5 1497.8 2125.7
dis Linux 4.19.0- 196.7 273.5 1506.7 2082.0
dis Linux 4.19.0- 196.2 273.5 1506.7 2119.7
dis Linux 4.19.0- 196.2 273.5 1506.9 2119.0
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host      OS            2p/0K  2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
ena       Linux 4.19.0- 3.7100 3.8000 1.0600 7.9600   66.8    14.8   120.7
ena       Linux 4.19.0- 3.9100 4.5000 7.5400 7.7400   92.7    19.8   121.8
ena       Linux 4.19.0- 4.5400 5.2400   14.8 6.6600   86.7    17.3   124.9
ena       Linux 4.19.0- 3.5800 5.6800   12.8 6.2300   75.4    17.3   124.1
dis       Linux 4.19.0- 4.1100 4.5500   14.5 7.6700   72.8    18.4   123.4
dis       Linux 4.19.0- 3.7100 4.1100 9.0300   12.7   86.0    17.3   125.3
dis       Linux 4.19.0- 4.2400 3.7000 8.9100 7.1500   72.2    15.3   123.0
dis       Linux 4.19.0- 4.4300 5.1800   10.1 6.7100   74.7    18.0   123.7
*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
ena Linux 4.19.0- 3.710 14.9 12.3 33.4 47.5 93.
ena Linux 4.19.0- 3.910 14.4 12.5 33.2 47.0 94.
ena Linux 4.19.0- 4.540 14.9 12.4 32.8 47.5 93.
ena Linux 4.19.0- 3.580 14.6 15.2 32.6 47.4 95.
dis Linux 4.19.0- 4.110 14.4 15.6 31.7 49.1 94.
dis Linux 4.19.0- 3.710 14.3 12.4 31.7 48.5 93.
dis Linux 4.19.0- 4.240 14.5 12.5 32.1 48.4 94.
dis Linux 4.19.0- 4.430 14.3 15.8 31.1 47.9 154.
*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS UDP RPC/ TCP RPC/ TCP
UDP TCP conn
--------- ------------- ----- ----- ----- ----- ----
ena Linux 4.19.0-
ena Linux 4.19.0-
ena Linux 4.19.0-
ena Linux 4.19.0-
dis Linux 4.19.0-
dis Linux 4.19.0-
dis Linux 4.19.0-
dis Linux 4.19.0-
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host      OS               0K File      10K File    Mmap    Prot  Page    100fd
                        Create Delete Create Delete Latency Fault Fault   selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
ena       Linux 4.19.0-   10.4 7.5078   53.9   12.7   611.0 4.288 1.66740 8.318
ena       Linux 4.19.0-   10.5 7.4928   54.7   12.6   612.0 4.339 1.66900 8.316
ena       Linux 4.19.0-   10.5 7.5434   54.8   12.6   610.0 4.337 1.66820 8.348
ena       Linux 4.19.0-   10.6 7.5052   53.9   12.5   629.0 4.293 1.65990 8.317
dis       Linux 4.19.0-   10.5 7.4831   54.3   12.4   210.0 0.416 1.17590 8.369
dis       Linux 4.19.0-   10.5 7.5618   53.9   12.5   212.0 0.499 1.17850 8.395
dis       Linux 4.19.0-   10.7 7.4484   54.9   12.4   211.0 0.422 1.17710 8.399
dis       Linux 4.19.0-   10.7 7.4860   55.0   12.4   212.0 0.405 1.18190 8.373
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host      OS            Pipe AF   TCP  File   Mmap   Bcopy  Bcopy  Mem  Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
ena       Linux 4.19.0- 219. 455. 106.  248.2  404.9  383.9  233.6 405. 389.7
ena       Linux 4.19.0- 213. 455. 106.  246.9  405.1  383.7  233.5 405. 389.1
ena       Linux 4.19.0- 220. 463. 157.  250.1  405.2  383.7  233.5 405. 389.5
ena       Linux 4.19.0- 218. 454. 147.  252.7  405.2  383.6  233.5 405. 389.5
dis       Linux 4.19.0- 221. 448. 104.  250.5  405.2  384.2  233.4 405. 389.7
dis       Linux 4.19.0- 220. 447. 105.  257.0  405.3  383.9  233.6 405. 389.5
dis       Linux 4.19.0- 210. 442. 106.  251.1  405.2  383.9  233.4 405. 389.4
dis       Linux 4.19.0- 217. 465. 118.  249.5  405.3  383.7  233.3 405. 389.4
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
--------- ------------- --- ---- ---- -------- -------- -------
ena Linux 4.19.0- 1000 3.0420 24.3 210.5 371.4
ena Linux 4.19.0- 1000 3.0420 27.3 210.7 368.3
ena Linux 4.19.0- 1000 3.0420 27.3 210.4 370.4
ena Linux 4.19.0- 1000 3.0420 22.8 210.4 368.8
dis Linux 4.19.0- 1000 3.0420 27.3 210.4 368.4
dis Linux 4.19.0- 1000 3.0420 27.3 210.4 368.4
dis Linux 4.19.0- 1000 3.0420 24.3 210.4 370.1
dis Linux 4.19.0- 1000 3.0420 25.8 210.4 368.6
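To compare the "ena" and "dis" rows of any one column in the summary, a small helper along these lines may be handy. It is only a sketch: the function names mean() and slowdown() are made up for this note and are not part of lmbench or of the patch.

```c
#include <stddef.h>

/* Arithmetic mean of n samples; n must be non-zero. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/*
 * Ratio of the mean value measured with SmaRT enabled to the mean value
 * measured with SmaRT disabled. For "smaller is better" columns a ratio
 * above 1.0 means the enabled runs were slower.
 */
static double slowdown(const double *ena, const double *dis, size_t n)
{
	return mean(ena, n) / mean(dis, n);
}
```

Fed with the four "ena" and four "dis" samples of the "Page Fault" column, this gives a ratio of roughly 1.4; for the "fork proc" column it gives roughly 1.2.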
* Re: [RFC] ARC: ARCv2: Introduce SmaRT support
From: Vineet Gupta @ 2018-10-24 18:28 UTC (permalink / raw)
To: Eugeniy Paltsev; +Cc: Alexey Brodkin, lkml, arcml
On 10/19/2018 07:27 AM, Eugeniy Paltsev wrote:
> Add compile-time 'ARC_USE_SMART' option for enabling SmaRT support.
Nice !
> Small real time trace (SmaRT) is an optional on-chip debug hardware
> component that captures instruction-trace history. It stores the
> addresses of the most recent non-sequential instructions executed in
> an internal buffer.
>
> Usually we use the MetaWare debugger to enable SmaRT and display trace
> information.
>
> This patch makes it possible to display the decoded content of the SmaRT
> buffer without the MetaWare debugger. This is done by extending the ordinary
> exception message with the decoded SmaRT instruction-trace history.
>
> In some cases this is really useful, as it shows the pre-exception
> instruction trace before it is tainted by exception handler code,
> printk code, etc.
So the reason is not so much the lack of mdb as reducing the trace clutter. It's
funny, because mdb goes to great lengths to generate the clutter (i.e. to reconstruct
the interim disassembly from the sparse SmaRT entries).
> Nevertheless, this option has a negative performance impact because of
> its implementation: we dump the SmaRT buffer content into an external memory
> buffer at the beginning of every slow-path exception handler.
> We chose this implementation as a compromise between performance
> impact and SmaRT buffer tainting.
> Although the performance impact is not really significant (according
> to lmbench), we leave this option disabled by default.
Oh yes, this is a debug feature and is intrusive even if it doesn't show up in
profiles, so it needs to be disabled by default.
> Here are examples of user-space and kernel-space fault messages with
> the 'ARC_USE_SMART' option enabled:
>
> User-space exception:
> ----------------------->8-------------------------
> Exception: u_hell[99]: at 0x103a2 [off 0x103a2 in /root/u_hell, VMA: 00010000:00012000]
> ECR: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103a2
> SmaRT (64 entries):
> [ 0] V 0x90232358 -> 0x9022ce3c [src do_page_fault+0x2c/0x2d8] [dst populate_smart+0x0/0x9c]
So I had to dig into the SmaRT spec to understand this src/dst stuff. What it implies
is that at the @src PC, a branch to @dst was taken.
Say we have samples SRC1:DST1 and SRC2:DST2. All this implies is that these 4 PCs
were observed. So just flatten out the SRC/DST pairs and print them in order, with only 1
PC entry per line. That makes it easier to follow and comprehend.
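The flattening could look roughly like the userspace sketch below. It assumes entry 0 is the newest sample (as the dumps in this thread suggest) and emits the PCs oldest first; struct smart_sample and smart_flatten() are hypothetical names, not something the patch provides.

```c
#include <stddef.h>
#include <stdint.h>

/* One SmaRT sample: a taken branch from src to dst (hypothetical layout). */
struct smart_sample {
	uint32_t src;
	uint32_t dst;
};

/*
 * Flatten n samples (index 0 = newest, an assumption based on the dumps)
 * into a chronological list of PCs: oldest src, oldest dst, ...,
 * newest src, newest dst. 'out' must have room for 2 * n entries.
 * Returns the number of PCs written.
 */
static size_t smart_flatten(const struct smart_sample *s, size_t n,
			    uint32_t *out)
{
	size_t k = 0;

	/* Walk from the oldest sample (highest index) down to the newest. */
	for (size_t i = n; i-- > 0; ) {
		out[k++] = s[i].src;
		out[k++] = s[i].dst;
	}
	return k;
}
```

Each element of 'out' would then be printed on a line of its own, instead of one "src -> dst" pair per line.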
> [ 1] V 0x9022e3f8 -> 0x9023232c [src EV_TLBProtV+0xec/0xf0] [dst do_page_fault+0x0/0x2d8]
> [ 2] V 0x90233194 -> 0x9022e30c [src do_slow_path_pf+0x10/0x14] [dst EV_TLBProtV+0x0/0xf0]
> [ 3] V 0x90233120 -> 0x90233184 [src EV_TLBMissD+0x80/0xe0] [dst do_slow_path_pf+0x0/0x14]
> [ 4] E V 0x000103a2 -> 0x902330a0 [off 0x103a2 in /root/u_hell, VMA: 00010000:00012000] [dst EV_TLBMissD+0x0/0xe0]
> [ 5] U V 0x2004f238 -> 0x00010398 [off 0x43238 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x10398 in /root/u_hell, VMA: 00010000:00012000]
> [ 6] U V 0x20049a82 -> 0x2004f214 [off 0x3da82 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000] [off 0x43214 in /lib/libuClibc-1.0.18.so, VMA: 2000c000:20072000]
Once we do the above, we can reduce the print clutter by only printing the VMA if
it changed; again, less printing means the brain has to process less information.
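One possible way to decide when the VMA needs printing again is to remember the last range that was printed. This is a sketch only: struct vma_range and vma_needs_print() are hypothetical stand-ins for whatever show_faulting_vma() would look up, and a real version would also have to handle kernel addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the mapping that the VMA lookup reports. */
struct vma_range {
	uint32_t start;
	uint32_t end;
};

/*
 * Return true when 'cur' differs from the range printed last time, i.e.
 * when the mapping info has to be printed again; 'last' is then updated.
 * Return false when the PC still belongs to the mapping already printed.
 */
static bool vma_needs_print(struct vma_range *last,
			    const struct vma_range *cur)
{
	if (last->start == cur->start && last->end == cur->end)
		return false;

	*last = *cur;
	return true;
}
```

With the user-space dump quoted above, consecutive PCs inside libuClibc would print the mapping once, and it would be printed again only when the trace moves into /root/u_hell.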
> ...[snip]...
> ----------------------->8-------------------------
>
> TODO:
> Add runtime procfs options to configure/suspend SmaRT.
Good.
> Add SmaRT BCR encoding struct.
> Check SmaRT version number in BCR.
Do we also need to think about how to co-exist with mdb? What if the user enables it
in mdb before hitting run, etc.?
> NOTE:
> This RFC has a prerequisite:
> http://patchwork.ozlabs.org/patch/986820/
Right, I'm still not happy with our approach there, and I will respond separately
after a few trials and tribulations of my own, so please be patient with that.
See below for some coding comments.
>
> +config ARC_USE_SMART
ARC_SMART_TRACE ? I know why you picked the _USE_, but the semantics are different
here.
> + bool "Enable real time trace on-chip debug HW"
This might be confused with RTT, so keep the smaRT keyword here, with hungarian case.
> diff --git a/arch/arc/include/asm/bug.h b/arch/arc/include/asm/bug.h
>
> +#ifdef CONFIG_ARC_USE_SMART
> +void populate_smart(void);
> +#define POPULATE_SMART() populate_smart()
> +#else
> +#define POPULATE_SMART()
> +#endif /* CONFIG_ARC_USE_SMART */
> +
Let's keep all SmaRT-related stuff in files of its own: smart.h and smart.c
> diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c
>
> arc_chk_core_config();
> +
> +#ifdef CONFIG_ARC_USE_SMART
> + if (cpuinfo_arc700[cpu_id].extn.smart)
IS_ENABLED() is better here
> + write_aux_reg(ARC_AUX_SMART_CONTROL, SMART_CTL_EN);
> +#endif
> +
> }
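For reference, the IS_ENABLED() style replaces the preprocessor block with an ordinary C condition that the compiler can still drop when the option is off. The sketch below is a userspace approximation, not the kernel header: the SK_* helper macros are invented here and only handle options defined to 1, and setup_smart(), hw_has_smart and smart_enabled are hypothetical stand-ins for the code in setup_processor().

```c
/* Userspace approximation of IS_ENABLED() for options defined to 1. */
#define SK_ARG_PLACEHOLDER_1		0,
#define SK_TAKE_SECOND(ignored, val, ...)	val
#define SK_IS_DEFINED(x)		SK_IS_DEFINED2(x)
#define SK_IS_DEFINED2(val)		SK_IS_DEFINED3(SK_ARG_PLACEHOLDER_##val)
#define SK_IS_DEFINED3(arg1_or_junk)	SK_TAKE_SECOND(arg1_or_junk 1, 0)
#define IS_ENABLED(option)		SK_IS_DEFINED(option)

#define CONFIG_ARC_USE_SMART 1	/* pretend the option is set to y */

static int smart_enabled;	/* stand-in for the enable bit in SMART_CONTROL */

/* Both conditions are plain C, so no #ifdef is needed around the body. */
static void setup_smart(int hw_has_smart)
{
	if (IS_ENABLED(CONFIG_ARC_USE_SMART) && hw_has_smart)
		smart_enabled = 1;
}
```

When the option is not defined, IS_ENABLED() evaluates to 0 and the whole branch is dead code, yet it is still parsed and type-checked.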
>
> diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
>
> +#ifdef CONFIG_ARC_USE_SMART
> +#define MAX_SMART_BUFF 4096
> +
> +struct smart_buff {
> + u32 src[MAX_SMART_BUFF];
> + u32 dst[MAX_SMART_BUFF];
> + u32 flags[MAX_SMART_BUFF];
> +};
Move all of this into smart.c
> +#ifdef CONFIG_ARC_USE_SMART
> +static inline bool smart_exist(void)
> +{
> + struct bcr_generic bcr;
> +
> + READ_BCR(ARC_REG_SMART_BCR, bcr);
> + return !!bcr.ver;
> +}
> +
> +static inline u32 smart_stack_size(void)
> +{
> + return read_aux_reg(ARC_REG_SMART_BCR) >> SMART_CTL_IDX_POS;
> +}
> +
> +static inline void smart_enable(void)
> +{
> + write_aux_reg(ARC_AUX_SMART_CONTROL, SMART_CTL_EN);
> +}
> +
> +static inline void smart_disable(void)
> +{
> + write_aux_reg(ARC_AUX_SMART_CONTROL, 0);
> +}
Good coding style.
> +
> +static void show_smart(void)
> +{
> + struct smart_buff *smart_buff_cpu = this_cpu_ptr(&smart_buff_log);
> + int i, stack_size;
> + char *buf;
> +
> + if (!smart_exist())
> + return;
> +
> + stack_size = smart_stack_size();
> + pr_info("SmaRT (%d entries):\n", stack_size);
> +
> + buf = (char *)__get_free_page(GFP_NOWAIT);
> + if (!buf)
> + return;
> +
> + for (i = 0; i < stack_size; i++) {
> + pr_info(" [%4d] %s%s%s%s %#010x -> %#010x ", i,
> + smart_buff_cpu->flags[i] & SMART_FLAG_U ? "U" : " ",
> + smart_buff_cpu->flags[i] & SMART_FLAG_E ? "E" : " ",
> + smart_buff_cpu->flags[i] & SMART_FLAG_R ? "R" : " ",
> + smart_buff_cpu->flags[i] & SMART_FLAG_V ? "V" : " ",
> + smart_buff_cpu->src[i], smart_buff_cpu->dst[i]);
> +
> + if (smart_buff_cpu->src[i] < 0x80000000)
> + show_faulting_vma(smart_buff_cpu->src[i], buf);
> + else
> + pr_cont("[src %pS] ", (void *)smart_buff_cpu->src[i]);
> +
> + if (smart_buff_cpu->dst[i] < 0x80000000)
> + show_faulting_vma(smart_buff_cpu->dst[i], buf);
> + else
> + pr_cont("[dst %pS]\n", (void *)smart_buff_cpu->dst[i]);
Some of this will change, based on my comments above about flattening src/dst!
> @@ -199,6 +307,10 @@ void show_exception_mesg(struct pt_regs *regs)
> show_exception_mesg_u(regs);
> else
> show_exception_mesg_k(regs);
> +
> +#ifdef CONFIG_ARC_USE_SMART
> + show_smart();
> +#endif /* CONFIG_ARC_USE_SMART */
Provide 2 variants in header to avoid #ifdef here.
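The two-variant pattern would be something like the sketch below, written as a standalone file so it compiles outside the kernel. The counter smart_dumps and the caller show_exception_mesg_sketch() are hypothetical and exist only to make the sketch checkable; the option is deliberately left undefined here so the empty stub is the variant that gets built.

```c
/* #define CONFIG_ARC_USE_SMART 1 */	/* left unset in this sketch */

static int smart_dumps;	/* counts real dumps, for illustration only */

#ifdef CONFIG_ARC_USE_SMART
/* Real variant: would print the decoded SmaRT buffer. */
static inline void show_smart(void)
{
	smart_dumps++;
}
#else
/* Stub variant: compiles away when the option is off. */
static inline void show_smart(void) { }
#endif

/* The caller needs no #ifdef, whichever variant is built. */
static void show_exception_mesg_sketch(void)
{
	show_smart();
}
```

In the patch this pair of definitions would live in the header (smart.h, per the comment above), so that show_exception_mesg() can call show_smart() unconditionally.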