linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 0/9] Add data type profiling support for powerpc
@ 2024-05-06 12:18 Athira Rajeev
  2024-05-06 12:18 ` [PATCH V2 1/9] tools/perf: Move the data structures related to register type to header file Athira Rajeev
                   ` (9 more replies)
  0 siblings, 10 replies; 29+ messages in thread
From: Athira Rajeev @ 2024-05-06 12:18 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy
  Cc: atrajeev, kjain, linux-kernel, akanksha, linux-perf-users, maddy,
	disgoel, linuxppc-dev

The patchset from Namhyung added support for data type profiling
in perf tool. This enabled support to associate PMU samples to data
types they refer using DWARF debug information. With the upstream
perf, currently it possible to run perf report or perf annotate to
view the data type information on x86.

Initial patchset posted here had changes need to enable data type
profiling support for powerpc.

https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/

Main change were:
1. powerpc instruction nmemonic table to associate load/store
instructions with move_ops which is use to identify if instruction
is a memory access one.
2. To get register number and access offset from the given
instruction, code uses fields from "struct arch" -> objump.
Added entry for powerpc here.
3. A get_arch_regnum to return register number from the
register name string.

But the apporach used in the initial patchset used parsing of
disassembled code which the current perf tool implementation does.

Example: lwz     r10,0(r9)

This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset. Also to find whether there is a memory
reference in the operands, "memory_ref_char" field of objdump is used.
For x86, "(" is used as memory_ref_char to tackle instructions of the
form "mov  (%rax), %rcx".

In case of powerpc, not all instructions using "(" are the only memory
instructions. Example, above instruction can also be of extended form (X
form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
and extract the source/target registers, this patchset adds support to use
raw instruction. With raw instruction, macros are added to extract opcode
and register fields.

Example representation using --show-raw-insn in objdump gives result:

38 01 81 e8     ld      r4,312(r1)

Here "38 01 81 e8" is the raw instruction representation. In powerpc,
this translates to instruction form: "ld RT,DS(RA)" and binary code
as:
_____________________________________
| 58 |  RT  |  RA |      DS       | |
-------------------------------------
0    6     11    16              30 31

Patchset adds support to pick the opcode and reg fields from this
raw/binary instruction code. This approach came in from review comment
by Segher Boessenkool for the initial patchset.

Apart from that, instruction tracking is enabled for powerpc and
support function is added to find variables defined as registers
Example, in powerpc, two registers are
defined to represent variable:
1. r13: represents local_paca
register struct paca_struct *local_paca asm("r13");

2. r1: represents stack_pointer
register void *__stack_pointer asm("r1");

These are handled in this patchset.

- Patch 1 is to rearrange register state type structures to header file
so that it can referred from other arch specific files
- Patch 2 is to make instruction tracking as a callback to"struct arch"
so that it can be implemented by other archs easily and defined in arch
specific files
- Patch 3 is to fix a small comment
- Patch 4 adds support to capture and parse raw instruction in objdump
by keeping existing approach intact.
- Patch 5 update parameters for reg extract functions to use raw
instruction on powerpc
- Patch 6 and patch 7 handles instruction tracking for powerpc.
- Patch 8 and Patch 8 handles support to find global register variables

With the current patchset:

 ./perf record -a -e mem-loads sleep 1
 ./perf report -s type,typeoff --hierarchy --group --stdio
 ./perf annotate --data-type --insn-stat

perf annotate logs:

Annotate Instruction stats
total 562, ok 441 (78.5%), bad 121 (21.5%)

  Name      :  Good   Bad
-----------------------------------------------------------
  ld        :   313    54
  lwz       :    51    32
  lbz       :    31     5
  ldx       :     6    21
  lhz       :    23     0
  lwa       :     4     3
  lwarx     :     5     0
  lwzx      :     2     2
  ldarx     :     3     0
  lwzu      :     2     0
  stdcx.    :     0     1
  nop       :     0     1
  ldu       :     1     0
  lbzx      :     0     1
  lwax      :     0     1

perf report logs:

# Samples: 1K of event 'mem-loads'
# Event count (approx.): 937238
#
# Overhead  Data Type  Data Type Offset
# ........  .........  ................
#
    48.81%  (unknown)  (unknown) +0 (no field)
    12.85%  long unsigned int  long unsigned int +0 (current_stack_pointer)
     4.68%  struct paca_struct  struct paca_struct +2312 (__current)
     4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
     2.68%  struct paca_struct  struct paca_struct +8 (paca_index)
     2.64%  struct paca_struct  struct paca_struct +2808 (canary)
     2.24%  struct paca_struct  struct paca_struct +48 (data_offset)
     1.41%  struct vm_fault  struct vm_fault +0 (vma)
     1.29%  struct task_struct  struct task_struct +276 (flags)
     1.03%  struct pt_regs  struct pt_regs +264 (user_regs.msr)
     1.00%  struct menu_device  struct menu_device +4 (tick_wakeup)
     0.90%  struct security_hook_list  struct security_hook_list +0 (list.next)
     0.76%  struct irq_desc  struct irq_desc +304 (irq_data.chip)
     0.76%  struct rq  struct rq +2856 (cpu)

Thanks
Athira Rajeev

Changelog:
From v1->v2:
- Addressed suggestion from Christophe Leroy and Segher Boessenkool
  to use the binary code (raw insn) to fetch opcode, register and
  offset fields.
- Added support for instruction tracking in powerpc
- Find the register defined variables (r13 and r1 which points to
  local_paca and current_stack_pointer in powerpc)

Athira Rajeev (9):
  tools/perf: Move the data structures related to register  type to
    header file
  tools/perf: Add "update_insn_state" callback function to handle arch
    specific instruction tracking
  tools/perf: Fix a comment about multi_regs in extract_reg_offset
    function
  tools/perf: Add support to capture and parse raw instruction in
    objdump
  tools/perf: Update parameters for reg extract functions to use raw
    instruction on powerpc
  tools/perf: Update instruction tracking for powerpc
  tools/perf: Update instruction tracking with add instruction
  tools/perf: Add support to find global register variables using
    find_data_type_global_reg
  tools/perf: Add support for global_die to capture name of variable in
    case of register defined variable

 tools/include/linux/string.h                  |   2 +
 tools/lib/string.c                            |  13 +
 .../perf/arch/powerpc/annotate/instructions.c |  84 +++
 tools/perf/arch/powerpc/util/dwarf-regs.c     |  52 ++
 tools/perf/arch/x86/annotate/instructions.c   | 383 +++++++++++++
 tools/perf/util/annotate-data.c               | 519 +++---------------
 tools/perf/util/annotate-data.h               |  78 +++
 tools/perf/util/annotate.c                    |  32 +-
 tools/perf/util/annotate.h                    |   1 +
 tools/perf/util/disasm.c                      | 109 +++-
 tools/perf/util/disasm.h                      |  17 +-
 tools/perf/util/dwarf-aux.c                   |   1 +
 tools/perf/util/dwarf-aux.h                   |   1 +
 tools/perf/util/include/dwarf-regs.h          |  12 +
 tools/perf/util/sort.c                        |   7 +-
 15 files changed, 854 insertions(+), 457 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V2 1/9] tools/perf: Move the data structures related to register  type to header file
  2024-05-06 12:18 [PATCH V2 0/9] Add data type profiling support for powerpc Athira Rajeev
@ 2024-05-06 12:18 ` Athira Rajeev
  2024-05-06 12:18 ` [PATCH V2 2/9] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking Athira Rajeev
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2024-05-06 12:18 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy
  Cc: atrajeev, kjain, linux-kernel, akanksha, linux-perf-users, maddy,
	disgoel, linuxppc-dev

Data type profiling uses instruction tracking by checking each
instruction and updating the register type state in some data
structures. This is useful to find the data type in cases when the
register state gets transferred from one reg to another. Example, in
x86, "mov" instruction and in powerpc, "mr" instruction. Currently these
structures are defined in annotate-data.c and instruction tracking is
implemented only for x86. Move these data structures to
"annotate-data.h" header file so that other arch implementations can use
it in arch specific files as well.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/annotate-data.c | 53 +------------------------------
 tools/perf/util/annotate-data.h | 55 +++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 52 deletions(-)

diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 2c98813f95cd..e812dec09c99 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -30,15 +30,6 @@
 
 static void delete_var_types(struct die_var_type *var_types);
 
-enum type_state_kind {
-	TSR_KIND_INVALID = 0,
-	TSR_KIND_TYPE,
-	TSR_KIND_PERCPU_BASE,
-	TSR_KIND_CONST,
-	TSR_KIND_POINTER,
-	TSR_KIND_CANARY,
-};
-
 #define pr_debug_dtp(fmt, ...)					\
 do {								\
 	if (debug_type_profile)					\
@@ -139,49 +130,7 @@ static void pr_debug_location(Dwarf_Die *die, u64 pc, int reg)
 	}
 }
 
-/*
- * Type information in a register, valid when @ok is true.
- * The @caller_saved registers are invalidated after a function call.
- */
-struct type_state_reg {
-	Dwarf_Die type;
-	u32 imm_value;
-	bool ok;
-	bool caller_saved;
-	u8 kind;
-};
-
-/* Type information in a stack location, dynamically allocated */
-struct type_state_stack {
-	struct list_head list;
-	Dwarf_Die type;
-	int offset;
-	int size;
-	bool compound;
-	u8 kind;
-};
-
-/* FIXME: This should be arch-dependent */
-#define TYPE_STATE_MAX_REGS  16
-
-/*
- * State table to maintain type info in each register and stack location.
- * It'll be updated when new variable is allocated or type info is moved
- * to a new location (register or stack).  As it'd be used with the
- * shortest path of basic blocks, it only maintains a single table.
- */
-struct type_state {
-	/* state of general purpose registers */
-	struct type_state_reg regs[TYPE_STATE_MAX_REGS];
-	/* state of stack location */
-	struct list_head stack_vars;
-	/* return value register */
-	int ret_reg;
-	/* stack pointer register */
-	int stack_reg;
-};
-
-static bool has_reg_type(struct type_state *state, int reg)
+bool has_reg_type(struct type_state *state, int reg)
 {
 	return (unsigned)reg < ARRAY_SIZE(state->regs);
 }
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index 0a57d9f5ee78..ef235b1b15e1 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -6,6 +6,9 @@
 #include <linux/compiler.h>
 #include <linux/rbtree.h>
 #include <linux/types.h>
+#include "dwarf-aux.h"
+#include "annotate.h"
+#include "debuginfo.h"
 
 struct annotated_op_loc;
 struct debuginfo;
@@ -15,6 +18,15 @@ struct hist_entry;
 struct map_symbol;
 struct thread;
 
+enum type_state_kind {
+	TSR_KIND_INVALID = 0,
+	TSR_KIND_TYPE,
+	TSR_KIND_PERCPU_BASE,
+	TSR_KIND_CONST,
+	TSR_KIND_POINTER,
+	TSR_KIND_CANARY,
+};
+
 /**
  * struct annotated_member - Type of member field
  * @node: List entry in the parent list
@@ -142,6 +154,48 @@ struct annotated_data_stat {
 };
 extern struct annotated_data_stat ann_data_stat;
 
+/*
+ * Type information in a register, valid when @ok is true.
+ * The @caller_saved registers are invalidated after a function call.
+ */
+struct type_state_reg {
+	Dwarf_Die type;
+	u32 imm_value;
+	bool ok;
+	bool caller_saved;
+	u8 kind;
+};
+
+/* Type information in a stack location, dynamically allocated */
+struct type_state_stack {
+	struct list_head list;
+	Dwarf_Die type;
+	int offset;
+	int size;
+	bool compound;
+	u8 kind;
+};
+
+/* FIXME: This should be arch-dependent */
+#define TYPE_STATE_MAX_REGS  32
+
+/*
+ * State table to maintain type info in each register and stack location.
+ * It'll be updated when new variable is allocated or type info is moved
+ * to a new location (register or stack).  As it'd be used with the
+ * shortest path of basic blocks, it only maintains a single table.
+ */
+struct type_state {
+	/* state of general purpose registers */
+	struct type_state_reg regs[TYPE_STATE_MAX_REGS];
+	/* state of stack location */
+	struct list_head stack_vars;
+	/* return value register */
+	int ret_reg;
+	/* stack pointer register */
+	int stack_reg;
+};
+
 #ifdef HAVE_DWARF_SUPPORT
 
 /* Returns data type at the location (ip, reg, offset) */
@@ -160,6 +214,7 @@ void global_var_type__tree_delete(struct rb_root *root);
 
 int hist_entry__annotate_data_tty(struct hist_entry *he, struct evsel *evsel);
 
+bool has_reg_type(struct type_state *state, int reg);
 #else /* HAVE_DWARF_SUPPORT */
 
 static inline struct annotated_data_type *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V2 2/9] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking
  2024-05-06 12:18 [PATCH V2 0/9] Add data type profiling support for powerpc Athira Rajeev
  2024-05-06 12:18 ` [PATCH V2 1/9] tools/perf: Move the data structures related to register type to header file Athira Rajeev
@ 2024-05-06 12:18 ` Athira Rajeev
  2024-05-06 12:19 ` [PATCH V2 3/9] tools/perf: Fix a comment about multi_regs in extract_reg_offset function Athira Rajeev
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2024-05-06 12:18 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy
  Cc: atrajeev, kjain, linux-kernel, akanksha, linux-perf-users, maddy,
	disgoel, linuxppc-dev

Add "update_insn_state" callback to "struct arch" to handle instruction
tracking. Currently updating instruction state is handled by static
function "update_insn_state_x86" which is defined in "annotate-data.c".
Make this as a callback for specific arch and move to archs specific
file "arch/x86/annotate/instructions.c" . This will help to add helper
function for other platforms in file:
"arch/<platform>/annotate/instructions.c and make changes/updates
easier.

Define callback "update_insn_state" as part of "struct arch", also make
some of the debug functions non-static so that it can be referenced from
other places.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/arch/x86/annotate/instructions.c | 383 +++++++++++++++++++
 tools/perf/util/annotate-data.c             | 391 +-------------------
 tools/perf/util/annotate-data.h             |  23 ++
 tools/perf/util/disasm.c                    |   2 +
 tools/perf/util/disasm.h                    |   7 +
 5 files changed, 423 insertions(+), 383 deletions(-)

diff --git a/tools/perf/arch/x86/annotate/instructions.c b/tools/perf/arch/x86/annotate/instructions.c
index 5cdf457f5cbe..cd2fa59a8034 100644
--- a/tools/perf/arch/x86/annotate/instructions.c
+++ b/tools/perf/arch/x86/annotate/instructions.c
@@ -206,3 +206,386 @@ static int x86__annotate_init(struct arch *arch, char *cpuid)
 	arch->initialized = true;
 	return err;
 }
+
+#ifdef HAVE_DWARF_SUPPORT
+static void update_insn_state_x86(struct type_state *state,
+				  struct data_loc_info *dloc, Dwarf_Die *cu_die,
+				  struct disasm_line *dl)
+{
+	struct annotated_insn_loc loc;
+	struct annotated_op_loc *src = &loc.ops[INSN_OP_SOURCE];
+	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
+	struct type_state_reg *tsr;
+	Dwarf_Die type_die;
+	u32 insn_offset = dl->al.offset;
+	int fbreg = dloc->fbreg;
+	int fboff = 0;
+
+	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
+		return;
+
+	if (ins__is_call(&dl->ins)) {
+		struct symbol *func = dl->ops.target.sym;
+
+		if (func == NULL)
+			return;
+
+		/* __fentry__ will preserve all registers */
+		if (!strcmp(func->name, "__fentry__"))
+			return;
+
+		pr_debug_dtp("call [%x] %s\n", insn_offset, func->name);
+
+		/* Otherwise invalidate caller-saved registers after call */
+		for (unsigned i = 0; i < ARRAY_SIZE(state->regs); i++) {
+			if (state->regs[i].caller_saved)
+				state->regs[i].ok = false;
+		}
+
+		/* Update register with the return type (if any) */
+		if (die_find_func_rettype(cu_die, func->name, &type_die)) {
+			tsr = &state->regs[state->ret_reg];
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("call [%x] return -> reg%d",
+				     insn_offset, state->ret_reg);
+			pr_debug_type_name(&type_die, tsr->kind);
+		}
+		return;
+	}
+
+	if (!strncmp(dl->ins.name, "add", 3)) {
+		u64 imm_value = -1ULL;
+		int offset;
+		const char *var_name = NULL;
+		struct map_symbol *ms = dloc->ms;
+		u64 ip = ms->sym->start + dl->al.offset;
+
+		if (!has_reg_type(state, dst->reg1))
+			return;
+
+		tsr = &state->regs[dst->reg1];
+
+		if (src->imm)
+			imm_value = src->offset;
+		else if (has_reg_type(state, src->reg1) &&
+			 state->regs[src->reg1].kind == TSR_KIND_CONST)
+			imm_value = state->regs[src->reg1].imm_value;
+		else if (src->reg1 == DWARF_REG_PC) {
+			u64 var_addr = annotate_calc_pcrel(dloc->ms, ip,
+							   src->offset, dl);
+
+			if (get_global_var_info(dloc, var_addr,
+						&var_name, &offset) &&
+			    !strcmp(var_name, "this_cpu_off") &&
+			    tsr->kind == TSR_KIND_CONST) {
+				tsr->kind = TSR_KIND_PERCPU_BASE;
+				imm_value = tsr->imm_value;
+			}
+		}
+		else
+			return;
+
+		if (tsr->kind != TSR_KIND_PERCPU_BASE)
+			return;
+
+		if (get_global_var_type(cu_die, dloc, ip, imm_value, &offset,
+					&type_die) && offset == 0) {
+			/*
+			 * This is not a pointer type, but it should be treated
+			 * as a pointer.
+			 */
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_POINTER;
+			tsr->ok = true;
+
+			pr_debug_dtp("add [%x] percpu %#"PRIx64" -> reg%d",
+				     insn_offset, imm_value, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		return;
+	}
+
+	if (strncmp(dl->ins.name, "mov", 3))
+		return;
+
+	if (dloc->fb_cfa) {
+		u64 ip = dloc->ms->sym->start + dl->al.offset;
+		u64 pc = map__rip_2objdump(dloc->ms->map, ip);
+
+		if (die_get_cfa(dloc->di->dbg, pc, &fbreg, &fboff) < 0)
+			fbreg = -1;
+	}
+
+	/* Case 1. register to register or segment:offset to register transfers */
+	if (!src->mem_ref && !dst->mem_ref) {
+		if (!has_reg_type(state, dst->reg1))
+			return;
+
+		tsr = &state->regs[dst->reg1];
+		if (map__dso(dloc->ms->map)->kernel &&
+		    src->segment == INSN_SEG_X86_GS && src->imm) {
+			u64 ip = dloc->ms->sym->start + dl->al.offset;
+			u64 var_addr;
+			int offset;
+
+			/*
+			 * In kernel, %gs points to a per-cpu region for the
+			 * current CPU.  Access with a constant offset should
+			 * be treated as a global variable access.
+			 */
+			var_addr = src->offset;
+
+			if (var_addr == 40) {
+				tsr->kind = TSR_KIND_CANARY;
+				tsr->ok = true;
+
+				pr_debug_dtp("mov [%x] stack canary -> reg%d\n",
+					     insn_offset, dst->reg1);
+				return;
+			}
+
+			if (!get_global_var_type(cu_die, dloc, ip, var_addr,
+						 &offset, &type_die) ||
+			    !die_get_member_type(&type_die, offset, &type_die)) {
+				tsr->ok = false;
+				return;
+			}
+
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] this-cpu addr=%#"PRIx64" -> reg%d",
+				     insn_offset, var_addr, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+			return;
+		}
+
+		if (src->imm) {
+			tsr->kind = TSR_KIND_CONST;
+			tsr->imm_value = src->offset;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] imm=%#x -> reg%d\n",
+				     insn_offset, tsr->imm_value, dst->reg1);
+			return;
+		}
+
+		if (!has_reg_type(state, src->reg1) ||
+		    !state->regs[src->reg1].ok) {
+			tsr->ok = false;
+			return;
+		}
+
+		tsr->type = state->regs[src->reg1].type;
+		tsr->kind = state->regs[src->reg1].kind;
+		tsr->ok = true;
+
+		pr_debug_dtp("mov [%x] reg%d -> reg%d",
+			     insn_offset, src->reg1, dst->reg1);
+		pr_debug_type_name(&tsr->type, tsr->kind);
+	}
+	/* Case 2. memory to register transers */
+	if (src->mem_ref && !dst->mem_ref) {
+		int sreg = src->reg1;
+
+		if (!has_reg_type(state, dst->reg1))
+			return;
+
+		tsr = &state->regs[dst->reg1];
+
+retry:
+		/* Check stack variables with offset */
+		if (sreg == fbreg) {
+			struct type_state_stack *stack;
+			int offset = src->offset - fboff;
+
+			stack = find_stack_state(state, offset);
+			if (stack == NULL) {
+				tsr->ok = false;
+				return;
+			} else if (!stack->compound) {
+				tsr->type = stack->type;
+				tsr->kind = stack->kind;
+				tsr->ok = true;
+			} else if (die_get_member_type(&stack->type,
+						       offset - stack->offset,
+						       &type_die)) {
+				tsr->type = type_die;
+				tsr->kind = TSR_KIND_TYPE;
+				tsr->ok = true;
+			} else {
+				tsr->ok = false;
+				return;
+			}
+
+			pr_debug_dtp("mov [%x] -%#x(stack) -> reg%d",
+				     insn_offset, -offset, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		/* And then dereference the pointer if it has one */
+		else if (has_reg_type(state, sreg) && state->regs[sreg].ok &&
+			 state->regs[sreg].kind == TSR_KIND_TYPE &&
+			 die_deref_ptr_type(&state->regs[sreg].type,
+					    src->offset, &type_die)) {
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] %#x(reg%d) -> reg%d",
+				     insn_offset, src->offset, sreg, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		/* Or check if it's a global variable */
+		else if (sreg == DWARF_REG_PC) {
+			struct map_symbol *ms = dloc->ms;
+			u64 ip = ms->sym->start + dl->al.offset;
+			u64 addr;
+			int offset;
+
+			addr = annotate_calc_pcrel(ms, ip, src->offset, dl);
+
+			if (!get_global_var_type(cu_die, dloc, ip, addr, &offset,
+						 &type_die) ||
+			    !die_get_member_type(&type_die, offset, &type_die)) {
+				tsr->ok = false;
+				return;
+			}
+
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] global addr=%"PRIx64" -> reg%d",
+				     insn_offset, addr, dst->reg1);
+			pr_debug_type_name(&type_die, tsr->kind);
+		}
+		/* And check percpu access with base register */
+		else if (has_reg_type(state, sreg) &&
+			 state->regs[sreg].kind == TSR_KIND_PERCPU_BASE) {
+			u64 ip = dloc->ms->sym->start + dl->al.offset;
+			u64 var_addr = src->offset;
+			int offset;
+
+			if (src->multi_regs) {
+				int reg2 = (sreg == src->reg1) ? src->reg2 : src->reg1;
+
+				if (has_reg_type(state, reg2) && state->regs[reg2].ok &&
+				    state->regs[reg2].kind == TSR_KIND_CONST)
+					var_addr += state->regs[reg2].imm_value;
+			}
+
+			/*
+			 * In kernel, %gs points to a per-cpu region for the
+			 * current CPU.  Access with a constant offset should
+			 * be treated as a global variable access.
+			 */
+			if (get_global_var_type(cu_die, dloc, ip, var_addr,
+						&offset, &type_die) &&
+			    die_get_member_type(&type_die, offset, &type_die)) {
+				tsr->type = type_die;
+				tsr->kind = TSR_KIND_TYPE;
+				tsr->ok = true;
+
+				if (src->multi_regs) {
+					pr_debug_dtp("mov [%x] percpu %#x(reg%d,reg%d) -> reg%d",
+						     insn_offset, src->offset, src->reg1,
+						     src->reg2, dst->reg1);
+				} else {
+					pr_debug_dtp("mov [%x] percpu %#x(reg%d) -> reg%d",
+						     insn_offset, src->offset, sreg, dst->reg1);
+				}
+				pr_debug_type_name(&tsr->type, tsr->kind);
+			} else {
+				tsr->ok = false;
+			}
+		}
+		/* And then dereference the calculated pointer if it has one */
+		else if (has_reg_type(state, sreg) && state->regs[sreg].ok &&
+			 state->regs[sreg].kind == TSR_KIND_POINTER &&
+			 die_get_member_type(&state->regs[sreg].type,
+					     src->offset, &type_die)) {
+			tsr->type = type_die;
+			tsr->kind = TSR_KIND_TYPE;
+			tsr->ok = true;
+
+			pr_debug_dtp("mov [%x] pointer %#x(reg%d) -> reg%d",
+				     insn_offset, src->offset, sreg, dst->reg1);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		/* Or try another register if any */
+		else if (src->multi_regs && sreg == src->reg1 &&
+			 src->reg1 != src->reg2) {
+			sreg = src->reg2;
+			goto retry;
+		}
+		else {
+			int offset;
+			const char *var_name = NULL;
+
+			/* it might be per-cpu variable (in kernel) access */
+			if (src->offset < 0) {
+				if (get_global_var_info(dloc, (s64)src->offset,
+							&var_name, &offset) &&
+				    !strcmp(var_name, "__per_cpu_offset")) {
+					tsr->kind = TSR_KIND_PERCPU_BASE;
+
+					pr_debug_dtp("mov [%x] percpu base reg%d\n",
+						     insn_offset, dst->reg1);
+				}
+			}
+
+			tsr->ok = false;
+		}
+	}
+	/* Case 3. register to memory transfers */
+	if (!src->mem_ref && dst->mem_ref) {
+		if (!has_reg_type(state, src->reg1) ||
+		    !state->regs[src->reg1].ok)
+			return;
+
+		/* Check stack variables with offset */
+		if (dst->reg1 == fbreg) {
+			struct type_state_stack *stack;
+			int offset = dst->offset - fboff;
+
+			tsr = &state->regs[src->reg1];
+
+			stack = find_stack_state(state, offset);
+			if (stack) {
+				/*
+				 * The source register is likely to hold a type
+				 * of member if it's a compound type.  Do not
+				 * update the stack variable type since we can
+				 * get the member type later by using the
+				 * die_get_member_type().
+				 */
+				if (!stack->compound)
+					set_stack_state(stack, offset, tsr->kind,
+							&tsr->type);
+			} else {
+				findnew_stack_state(state, offset, tsr->kind,
+						    &tsr->type);
+			}
+
+			pr_debug_dtp("mov [%x] reg%d -> -%#x(stack)",
+				     insn_offset, src->reg1, -offset);
+			pr_debug_type_name(&tsr->type, tsr->kind);
+		}
+		/*
+		 * Ignore other transfers since it'd set a value in a struct
+		 * and won't change the type.
+		 */
+	}
+	/* Case 4. memory to memory transfers (not handled for now) */
+}
+#else /* HAVE_DWARF_SUPPORT */
+static void update_insn_state_x86(struct type_state *state __maybe_unused, struct data_loc_info *dloc __maybe_unused,
+		Dwarf_Die *cu_die __maybe_unused, struct disasm_line *dl __maybe_unused)
+{
+	return;
+}
+#endif /* HAVE_DWARF_SUPPORT */
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index e812dec09c99..9d6d4f472c85 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -38,7 +38,7 @@ do {								\
 		pr_debug3(fmt, ##__VA_ARGS__);			\
 } while (0)
 
-static void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind)
+void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind)
 {
 	struct strbuf sb;
 	char *str;
@@ -389,7 +389,7 @@ static int check_variable(struct data_loc_info *dloc, Dwarf_Die *var_die,
 	return 0;
 }
 
-static struct type_state_stack *find_stack_state(struct type_state *state,
+struct type_state_stack *find_stack_state(struct type_state *state,
 						 int offset)
 {
 	struct type_state_stack *stack;
@@ -405,7 +405,7 @@ static struct type_state_stack *find_stack_state(struct type_state *state,
 	return NULL;
 }
 
-static void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
+void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
 			    Dwarf_Die *type_die)
 {
 	int tag;
@@ -432,7 +432,7 @@ static void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
 	}
 }
 
-static struct type_state_stack *findnew_stack_state(struct type_state *state,
+struct type_state_stack *findnew_stack_state(struct type_state *state,
 						    int offset, u8 kind,
 						    Dwarf_Die *type_die)
 {
@@ -536,7 +536,7 @@ void global_var_type__tree_delete(struct rb_root *root)
 	}
 }
 
-static bool get_global_var_info(struct data_loc_info *dloc, u64 addr,
+bool get_global_var_info(struct data_loc_info *dloc, u64 addr,
 				const char **var_name, int *var_offset)
 {
 	struct addr_location al;
@@ -610,7 +610,7 @@ static void global_var__collect(struct data_loc_info *dloc)
 	}
 }
 
-static bool get_global_var_type(Dwarf_Die *cu_die, struct data_loc_info *dloc,
+bool get_global_var_type(Dwarf_Die *cu_die, struct data_loc_info *dloc,
 				u64 ip, u64 var_addr, int *var_offset,
 				Dwarf_Die *type_die)
 {
@@ -721,381 +721,6 @@ static void update_var_state(struct type_state *state, struct data_loc_info *dlo
 	}
 }
 
-static void update_insn_state_x86(struct type_state *state,
-				  struct data_loc_info *dloc, Dwarf_Die *cu_die,
-				  struct disasm_line *dl)
-{
-	struct annotated_insn_loc loc;
-	struct annotated_op_loc *src = &loc.ops[INSN_OP_SOURCE];
-	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
-	struct type_state_reg *tsr;
-	Dwarf_Die type_die;
-	u32 insn_offset = dl->al.offset;
-	int fbreg = dloc->fbreg;
-	int fboff = 0;
-
-	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
-		return;
-
-	if (ins__is_call(&dl->ins)) {
-		struct symbol *func = dl->ops.target.sym;
-
-		if (func == NULL)
-			return;
-
-		/* __fentry__ will preserve all registers */
-		if (!strcmp(func->name, "__fentry__"))
-			return;
-
-		pr_debug_dtp("call [%x] %s\n", insn_offset, func->name);
-
-		/* Otherwise invalidate caller-saved registers after call */
-		for (unsigned i = 0; i < ARRAY_SIZE(state->regs); i++) {
-			if (state->regs[i].caller_saved)
-				state->regs[i].ok = false;
-		}
-
-		/* Update register with the return type (if any) */
-		if (die_find_func_rettype(cu_die, func->name, &type_die)) {
-			tsr = &state->regs[state->ret_reg];
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("call [%x] return -> reg%d",
-				     insn_offset, state->ret_reg);
-			pr_debug_type_name(&type_die, tsr->kind);
-		}
-		return;
-	}
-
-	if (!strncmp(dl->ins.name, "add", 3)) {
-		u64 imm_value = -1ULL;
-		int offset;
-		const char *var_name = NULL;
-		struct map_symbol *ms = dloc->ms;
-		u64 ip = ms->sym->start + dl->al.offset;
-
-		if (!has_reg_type(state, dst->reg1))
-			return;
-
-		tsr = &state->regs[dst->reg1];
-
-		if (src->imm)
-			imm_value = src->offset;
-		else if (has_reg_type(state, src->reg1) &&
-			 state->regs[src->reg1].kind == TSR_KIND_CONST)
-			imm_value = state->regs[src->reg1].imm_value;
-		else if (src->reg1 == DWARF_REG_PC) {
-			u64 var_addr = annotate_calc_pcrel(dloc->ms, ip,
-							   src->offset, dl);
-
-			if (get_global_var_info(dloc, var_addr,
-						&var_name, &offset) &&
-			    !strcmp(var_name, "this_cpu_off") &&
-			    tsr->kind == TSR_KIND_CONST) {
-				tsr->kind = TSR_KIND_PERCPU_BASE;
-				imm_value = tsr->imm_value;
-			}
-		}
-		else
-			return;
-
-		if (tsr->kind != TSR_KIND_PERCPU_BASE)
-			return;
-
-		if (get_global_var_type(cu_die, dloc, ip, imm_value, &offset,
-					&type_die) && offset == 0) {
-			/*
-			 * This is not a pointer type, but it should be treated
-			 * as a pointer.
-			 */
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_POINTER;
-			tsr->ok = true;
-
-			pr_debug_dtp("add [%x] percpu %#"PRIx64" -> reg%d",
-				     insn_offset, imm_value, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		return;
-	}
-
-	if (strncmp(dl->ins.name, "mov", 3))
-		return;
-
-	if (dloc->fb_cfa) {
-		u64 ip = dloc->ms->sym->start + dl->al.offset;
-		u64 pc = map__rip_2objdump(dloc->ms->map, ip);
-
-		if (die_get_cfa(dloc->di->dbg, pc, &fbreg, &fboff) < 0)
-			fbreg = -1;
-	}
-
-	/* Case 1. register to register or segment:offset to register transfers */
-	if (!src->mem_ref && !dst->mem_ref) {
-		if (!has_reg_type(state, dst->reg1))
-			return;
-
-		tsr = &state->regs[dst->reg1];
-		if (map__dso(dloc->ms->map)->kernel &&
-		    src->segment == INSN_SEG_X86_GS && src->imm) {
-			u64 ip = dloc->ms->sym->start + dl->al.offset;
-			u64 var_addr;
-			int offset;
-
-			/*
-			 * In kernel, %gs points to a per-cpu region for the
-			 * current CPU.  Access with a constant offset should
-			 * be treated as a global variable access.
-			 */
-			var_addr = src->offset;
-
-			if (var_addr == 40) {
-				tsr->kind = TSR_KIND_CANARY;
-				tsr->ok = true;
-
-				pr_debug_dtp("mov [%x] stack canary -> reg%d\n",
-					     insn_offset, dst->reg1);
-				return;
-			}
-
-			if (!get_global_var_type(cu_die, dloc, ip, var_addr,
-						 &offset, &type_die) ||
-			    !die_get_member_type(&type_die, offset, &type_die)) {
-				tsr->ok = false;
-				return;
-			}
-
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] this-cpu addr=%#"PRIx64" -> reg%d",
-				     insn_offset, var_addr, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-			return;
-		}
-
-		if (src->imm) {
-			tsr->kind = TSR_KIND_CONST;
-			tsr->imm_value = src->offset;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] imm=%#x -> reg%d\n",
-				     insn_offset, tsr->imm_value, dst->reg1);
-			return;
-		}
-
-		if (!has_reg_type(state, src->reg1) ||
-		    !state->regs[src->reg1].ok) {
-			tsr->ok = false;
-			return;
-		}
-
-		tsr->type = state->regs[src->reg1].type;
-		tsr->kind = state->regs[src->reg1].kind;
-		tsr->ok = true;
-
-		pr_debug_dtp("mov [%x] reg%d -> reg%d",
-			     insn_offset, src->reg1, dst->reg1);
-		pr_debug_type_name(&tsr->type, tsr->kind);
-	}
-	/* Case 2. memory to register transers */
-	if (src->mem_ref && !dst->mem_ref) {
-		int sreg = src->reg1;
-
-		if (!has_reg_type(state, dst->reg1))
-			return;
-
-		tsr = &state->regs[dst->reg1];
-
-retry:
-		/* Check stack variables with offset */
-		if (sreg == fbreg) {
-			struct type_state_stack *stack;
-			int offset = src->offset - fboff;
-
-			stack = find_stack_state(state, offset);
-			if (stack == NULL) {
-				tsr->ok = false;
-				return;
-			} else if (!stack->compound) {
-				tsr->type = stack->type;
-				tsr->kind = stack->kind;
-				tsr->ok = true;
-			} else if (die_get_member_type(&stack->type,
-						       offset - stack->offset,
-						       &type_die)) {
-				tsr->type = type_die;
-				tsr->kind = TSR_KIND_TYPE;
-				tsr->ok = true;
-			} else {
-				tsr->ok = false;
-				return;
-			}
-
-			pr_debug_dtp("mov [%x] -%#x(stack) -> reg%d",
-				     insn_offset, -offset, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		/* And then dereference the pointer if it has one */
-		else if (has_reg_type(state, sreg) && state->regs[sreg].ok &&
-			 state->regs[sreg].kind == TSR_KIND_TYPE &&
-			 die_deref_ptr_type(&state->regs[sreg].type,
-					    src->offset, &type_die)) {
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] %#x(reg%d) -> reg%d",
-				     insn_offset, src->offset, sreg, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		/* Or check if it's a global variable */
-		else if (sreg == DWARF_REG_PC) {
-			struct map_symbol *ms = dloc->ms;
-			u64 ip = ms->sym->start + dl->al.offset;
-			u64 addr;
-			int offset;
-
-			addr = annotate_calc_pcrel(ms, ip, src->offset, dl);
-
-			if (!get_global_var_type(cu_die, dloc, ip, addr, &offset,
-						 &type_die) ||
-			    !die_get_member_type(&type_die, offset, &type_die)) {
-				tsr->ok = false;
-				return;
-			}
-
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] global addr=%"PRIx64" -> reg%d",
-				     insn_offset, addr, dst->reg1);
-			pr_debug_type_name(&type_die, tsr->kind);
-		}
-		/* And check percpu access with base register */
-		else if (has_reg_type(state, sreg) &&
-			 state->regs[sreg].kind == TSR_KIND_PERCPU_BASE) {
-			u64 ip = dloc->ms->sym->start + dl->al.offset;
-			u64 var_addr = src->offset;
-			int offset;
-
-			if (src->multi_regs) {
-				int reg2 = (sreg == src->reg1) ? src->reg2 : src->reg1;
-
-				if (has_reg_type(state, reg2) && state->regs[reg2].ok &&
-				    state->regs[reg2].kind == TSR_KIND_CONST)
-					var_addr += state->regs[reg2].imm_value;
-			}
-
-			/*
-			 * In kernel, %gs points to a per-cpu region for the
-			 * current CPU.  Access with a constant offset should
-			 * be treated as a global variable access.
-			 */
-			if (get_global_var_type(cu_die, dloc, ip, var_addr,
-						&offset, &type_die) &&
-			    die_get_member_type(&type_die, offset, &type_die)) {
-				tsr->type = type_die;
-				tsr->kind = TSR_KIND_TYPE;
-				tsr->ok = true;
-
-				if (src->multi_regs) {
-					pr_debug_dtp("mov [%x] percpu %#x(reg%d,reg%d) -> reg%d",
-						     insn_offset, src->offset, src->reg1,
-						     src->reg2, dst->reg1);
-				} else {
-					pr_debug_dtp("mov [%x] percpu %#x(reg%d) -> reg%d",
-						     insn_offset, src->offset, sreg, dst->reg1);
-				}
-				pr_debug_type_name(&tsr->type, tsr->kind);
-			} else {
-				tsr->ok = false;
-			}
-		}
-		/* And then dereference the calculated pointer if it has one */
-		else if (has_reg_type(state, sreg) && state->regs[sreg].ok &&
-			 state->regs[sreg].kind == TSR_KIND_POINTER &&
-			 die_get_member_type(&state->regs[sreg].type,
-					     src->offset, &type_die)) {
-			tsr->type = type_die;
-			tsr->kind = TSR_KIND_TYPE;
-			tsr->ok = true;
-
-			pr_debug_dtp("mov [%x] pointer %#x(reg%d) -> reg%d",
-				     insn_offset, src->offset, sreg, dst->reg1);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		/* Or try another register if any */
-		else if (src->multi_regs && sreg == src->reg1 &&
-			 src->reg1 != src->reg2) {
-			sreg = src->reg2;
-			goto retry;
-		}
-		else {
-			int offset;
-			const char *var_name = NULL;
-
-			/* it might be per-cpu variable (in kernel) access */
-			if (src->offset < 0) {
-				if (get_global_var_info(dloc, (s64)src->offset,
-							&var_name, &offset) &&
-				    !strcmp(var_name, "__per_cpu_offset")) {
-					tsr->kind = TSR_KIND_PERCPU_BASE;
-
-					pr_debug_dtp("mov [%x] percpu base reg%d\n",
-						     insn_offset, dst->reg1);
-				}
-			}
-
-			tsr->ok = false;
-		}
-	}
-	/* Case 3. register to memory transfers */
-	if (!src->mem_ref && dst->mem_ref) {
-		if (!has_reg_type(state, src->reg1) ||
-		    !state->regs[src->reg1].ok)
-			return;
-
-		/* Check stack variables with offset */
-		if (dst->reg1 == fbreg) {
-			struct type_state_stack *stack;
-			int offset = dst->offset - fboff;
-
-			tsr = &state->regs[src->reg1];
-
-			stack = find_stack_state(state, offset);
-			if (stack) {
-				/*
-				 * The source register is likely to hold a type
-				 * of member if it's a compound type.  Do not
-				 * update the stack variable type since we can
-				 * get the member type later by using the
-				 * die_get_member_type().
-				 */
-				if (!stack->compound)
-					set_stack_state(stack, offset, tsr->kind,
-							&tsr->type);
-			} else {
-				findnew_stack_state(state, offset, tsr->kind,
-						    &tsr->type);
-			}
-
-			pr_debug_dtp("mov [%x] reg%d -> -%#x(stack)",
-				     insn_offset, src->reg1, -offset);
-			pr_debug_type_name(&tsr->type, tsr->kind);
-		}
-		/*
-		 * Ignore other transfers since it'd set a value in a struct
-		 * and won't change the type.
-		 */
-	}
-	/* Case 4. memory to memory transfers (not handled for now) */
-}
-
 /**
  * update_insn_state - Update type state for an instruction
  * @state: type state table
@@ -1114,8 +739,8 @@ static void update_insn_state_x86(struct type_state *state,
 static void update_insn_state(struct type_state *state, struct data_loc_info *dloc,
 			      Dwarf_Die *cu_die, struct disasm_line *dl)
 {
-	if (arch__is(dloc->arch, "x86"))
-		update_insn_state_x86(state, dloc, cu_die, dl);
+	if (dloc->arch->update_insn_state)
+		dloc->arch->update_insn_state(state, dloc, cu_die, dl);
 }
 
 /*
diff --git a/tools/perf/util/annotate-data.h b/tools/perf/util/annotate-data.h
index ef235b1b15e1..2bc870e61c74 100644
--- a/tools/perf/util/annotate-data.h
+++ b/tools/perf/util/annotate-data.h
@@ -7,6 +7,7 @@
 #include <linux/rbtree.h>
 #include <linux/types.h>
 #include "dwarf-aux.h"
+#include "dwarf-regs.h"
 #include "annotate.h"
 #include "debuginfo.h"
 
@@ -18,6 +19,14 @@ struct hist_entry;
 struct map_symbol;
 struct thread;
 
+#define pr_debug_dtp(fmt, ...)					\
+do {								\
+	if (debug_type_profile)					\
+		pr_info(fmt, ##__VA_ARGS__);			\
+	else							\
+		pr_debug3(fmt, ##__VA_ARGS__);			\
+} while (0)
+
 enum type_state_kind {
 	TSR_KIND_INVALID = 0,
 	TSR_KIND_TYPE,
@@ -215,6 +224,20 @@ void global_var_type__tree_delete(struct rb_root *root);
 int hist_entry__annotate_data_tty(struct hist_entry *he, struct evsel *evsel);
 
 bool has_reg_type(struct type_state *state, int reg);
+struct type_state_stack *findnew_stack_state(struct type_state *state,
+						int offset, u8 kind,
+						Dwarf_Die *type_die);
+void set_stack_state(struct type_state_stack *stack, int offset, u8 kind,
+				Dwarf_Die *type_die);
+struct type_state_stack *find_stack_state(struct type_state *state,
+						int offset);
+bool get_global_var_type(Dwarf_Die *cu_die, struct data_loc_info *dloc,
+				u64 ip, u64 var_addr, int *var_offset,
+				Dwarf_Die *type_die);
+bool get_global_var_info(struct data_loc_info *dloc, u64 addr,
+				const char **var_name, int *var_offset);
+void pr_debug_type_name(Dwarf_Die *die, enum type_state_kind kind);
+
 #else /* HAVE_DWARF_SUPPORT */
 
 static inline struct annotated_data_type *
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 6d1125e687b7..2de66a092cab 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -12,6 +12,7 @@
 #include <subcmd/run-command.h>
 
 #include "annotate.h"
+#include "annotate-data.h"
 #include "build-id.h"
 #include "debug.h"
 #include "disasm.h"
@@ -145,6 +146,7 @@ static struct arch architectures[] = {
 			.memory_ref_char = '(',
 			.imm_char = '$',
 		},
+		.update_insn_state = update_insn_state_x86,
 	},
 	{
 		.name = "powerpc",
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 3d381a043520..718177fa4775 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -3,12 +3,16 @@
 #define __PERF_UTIL_DISASM_H
 
 #include "map_symbol.h"
+#include "dwarf-aux.h"
 
 struct annotation_options;
 struct disasm_line;
 struct ins;
 struct evsel;
 struct symbol;
+struct data_loc_info;
+struct type_state;
+struct disasm_line;
 
 struct arch {
 	const char	*name;
@@ -32,6 +36,9 @@ struct arch {
 		char memory_ref_char;
 		char imm_char;
 	} objdump;
+	void		(*update_insn_state)(struct type_state *state,
+				struct data_loc_info *dloc, Dwarf_Die *cu_die,
+				struct disasm_line *dl);
 };
 
 struct ins {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V2 3/9] tools/perf: Fix a comment about multi_regs in extract_reg_offset function
  2024-05-06 12:18 [PATCH V2 0/9] Add data type profiling support for powerpc Athira Rajeev
  2024-05-06 12:18 ` [PATCH V2 1/9] tools/perf: Move the data structures related to register type to header file Athira Rajeev
  2024-05-06 12:18 ` [PATCH V2 2/9] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking Athira Rajeev
@ 2024-05-06 12:19 ` Athira Rajeev
  2024-05-07  4:40   ` Namhyung Kim
  2024-05-06 12:19 ` [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump Athira Rajeev
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Athira Rajeev @ 2024-05-06 12:19 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy
  Cc: atrajeev, kjain, linux-kernel, akanksha, linux-perf-users, maddy,
	disgoel, linuxppc-dev

Fix a comment in function which explains how multi_regs field gets set
for an instruction. In the example, "mov  %rsi, 8(%rbx,%rcx,4)", the
comment mistakenly referred to "dst_multi_regs = 0". Correct it to use
"src_multi_regs = 0"

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/annotate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index f5b6b5e5e757..0f5e10654d09 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2093,7 +2093,7 @@ static int extract_reg_offset(struct arch *arch, const char *str,
  *   mov  0x18, %r8      # src_reg1 = -1, src_mem = 0
  *                       # dst_reg1 = r8, dst_mem = 0
  *
- *   mov  %rsi, 8(%rbx,%rcx,4)  # src_reg1 = rsi, src_mem = 0, dst_multi_regs = 0
+ *   mov  %rsi, 8(%rbx,%rcx,4)  # src_reg1 = rsi, src_mem = 0, src_multi_regs = 0
  *                              # dst_reg1 = rbx, dst_reg2 = rcx, dst_mem = 1
  *                              # dst_multi_regs = 1, dst_offset = 8
  */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump
  2024-05-06 12:18 [PATCH V2 0/9] Add data type profiling support for powerpc Athira Rajeev
                   ` (2 preceding siblings ...)
  2024-05-06 12:19 ` [PATCH V2 3/9] tools/perf: Fix a comment about multi_regs in extract_reg_offset function Athira Rajeev
@ 2024-05-06 12:19 ` Athira Rajeev
  2024-05-07  4:57   ` Namhyung Kim
  2024-05-07  9:35   ` Christophe Leroy
  2024-05-06 12:19 ` [PATCH V2 5/9] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc Athira Rajeev
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 29+ messages in thread
From: Athira Rajeev @ 2024-05-06 12:19 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy
  Cc: atrajeev, kjain, linux-kernel, akanksha, linux-perf-users, maddy,
	disgoel, linuxppc-dev

Add support to capture and parse raw instruction in objdump.
Currently, the perf tool infrastructure uses "--no-show-raw-insn" option
with "objdump" while disassemble. Example from powerpc with this option
for an instruction address is:

Snippet from:
objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>

c0000000010224b4:	lwz     r10,0(r9)

This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset. Also to find whether there is a memory
reference in the operands, "memory_ref_char" field of objdump is used.
For x86, "(" is used as memory_ref_char to tackle instructions of the
form "mov  (%rax), %rcx".

In case of powerpc, not all instructions using "(" are the only memory
instructions. Example, above instruction can also be of extended form (X
form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
and extract the source/target registers, patch adds support to use raw
instruction. With raw instruction, macros are added to extract opcode
and register fields.

"struct ins_operands" and "struct ins" is updated to carry opcode and
raw instruction binary code (raw_insn). Function "disasm_line__parse"
is updated to fill the raw instruction hex value and opcode in newly
added fields. There is no changes in existing code paths, which parses
the disassembled code. The architecture using the instruction name and
present approach is not altered. Since this approach targets powerpc,
the macro implementation is added for powerpc as of now.

Example:
representation using --show-raw-insn in objdump gives result:

38 01 81 e8     ld      r4,312(r1)

Here "38 01 81 e8" is the raw instruction representation. In powerpc,
this translates to instruction form: "ld RT,DS(RA)" and binary code
as:
_____________________________________
| 58 |  RT  |  RA |      DS       | |
-------------------------------------
0    6     11    16              30 31

Function "disasm_line__parse" is updated to capture:

line:    38 01 81 e8     ld      r4,312(r1)
opcode and raw instruction "38 01 81 e8"
Raw instruction is used later to extract the reg/offset fields.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/include/linux/string.h              |  2 +
 tools/lib/string.c                        | 13 +++++++
 tools/perf/arch/powerpc/util/dwarf-regs.c | 19 ++++++++++
 tools/perf/util/disasm.c                  | 46 +++++++++++++++++++----
 tools/perf/util/disasm.h                  |  6 +++
 tools/perf/util/include/dwarf-regs.h      |  9 +++++
 6 files changed, 88 insertions(+), 7 deletions(-)

diff --git a/tools/include/linux/string.h b/tools/include/linux/string.h
index db5c99318c79..0acb1fc14e19 100644
--- a/tools/include/linux/string.h
+++ b/tools/include/linux/string.h
@@ -46,5 +46,7 @@ extern char * __must_check skip_spaces(const char *);
 
 extern char *strim(char *);
 
+extern void remove_spaces(char *s);
+
 extern void *memchr_inv(const void *start, int c, size_t bytes);
 #endif /* _TOOLS_LINUX_STRING_H_ */
diff --git a/tools/lib/string.c b/tools/lib/string.c
index 8b6892f959ab..21d273e69951 100644
--- a/tools/lib/string.c
+++ b/tools/lib/string.c
@@ -153,6 +153,19 @@ char *strim(char *s)
 	return skip_spaces(s);
 }
 
+/*
+ * remove_spaces - Removes whitespaces from @s
+ */
+void remove_spaces(char *s)
+{
+	char *d = s;
+	do {
+		while (*d == ' ') {
+			++d;
+		}
+	} while ((*s++ = *d++));
+}
+
 /**
  * strreplace - Replace all occurrences of character in string.
  * @s: The string to operate on.
diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
index 0c4f4caf53ac..e60a71fd846e 100644
--- a/tools/perf/arch/powerpc/util/dwarf-regs.c
+++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
@@ -98,3 +98,22 @@ int regs_query_register_offset(const char *name)
 			return roff->ptregs_offset;
 	return -EINVAL;
 }
+
+#define	PPC_OP(op)	(((op) >> 26) & 0x3F)
+#define PPC_RA(a)	(((a) >> 16) & 0x1f)
+#define PPC_RT(t)	(((t) >> 21) & 0x1f)
+
+int get_opcode_insn(unsigned int raw_insn)
+{
+	return PPC_OP(raw_insn);
+}
+
+int get_source_reg(unsigned int raw_insn)
+{
+	return PPC_RA(raw_insn);
+}
+
+int get_target_reg(unsigned int raw_insn)
+{
+	return PPC_RT(raw_insn);
+}
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 2de66a092cab..85692f73e78f 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -43,7 +43,7 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
 			   struct ins_operands *ops, int max_ins_name);
 
 static void ins__sort(struct arch *arch);
-static int disasm_line__parse(char *line, const char **namep, char **rawp);
+static int disasm_line__parse(char *line, const char **namep, char **rawp, int *opcode, int *rawp_insn);
 
 static __attribute__((constructor)) void symbol__init_regexpr(void)
 {
@@ -512,7 +512,7 @@ static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_s
 	if (ops->locked.ops == NULL)
 		return 0;
 
-	if (disasm_line__parse(ops->raw, &ops->locked.ins.name, &ops->locked.ops->raw) < 0)
+	if (disasm_line__parse(ops->raw, &ops->locked.ins.name, &ops->locked.ops->raw, &ops->locked.ins.opcode, &ops->locked.ops->raw_insn) < 0)
 		goto out_free_ops;
 
 	ops->locked.ins.ops = ins__find(arch, ops->locked.ins.name);
@@ -815,11 +815,38 @@ static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, str
 		dl->ins.ops = NULL;
 }
 
-static int disasm_line__parse(char *line, const char **namep, char **rawp)
+int __weak get_opcode_insn(unsigned int raw_insn __maybe_unused)
 {
-	char tmp, *name = skip_spaces(line);
+	return -1;
+}
+
+int __weak get_source_reg(unsigned int raw_insn __maybe_unused)
+{
+	return -1;
+}
+
+int __weak get_target_reg(unsigned int raw_insn __maybe_unused)
+{
+	return -1;
+}
+
+/*
+ * Parses the objdump result captured with --show-raw-insn.
+ * Example, objdump line from powerpc:
+ * line:    38 01 81 e8     ld      r4,312(r1)
+ * namep : ld
+ * rawp  : "ld r4,312(r1)"
+ * opcode: fetched from arch specific get_opcode_insn
+ * rawp_insn: e8810138
+ *
+ * rawp_insn is used later to extract the reg/offset fields
+ */
+static int disasm_line__parse(char *line, const char **namep, char **rawp, int *opcode, int *rawp_insn)
+{
+	char tmp, *tmp_opcode, *name_opcode = skip_spaces(line);
+	char *name = skip_spaces(name_opcode + 11);
 
-	if (name[0] == '\0')
+	if (name_opcode[0] == '\0')
 		return -1;
 
 	*rawp = name + 1;
@@ -829,13 +856,18 @@ static int disasm_line__parse(char *line, const char **namep, char **rawp)
 
 	tmp = (*rawp)[0];
 	(*rawp)[0] = '\0';
+	tmp_opcode = strdup(name_opcode);
+	tmp_opcode[11] = '\0';
+	remove_spaces(tmp_opcode);
 	*namep = strdup(name);
+	*opcode = get_opcode_insn(be32_to_cpu(strtol(tmp_opcode, NULL, 16)));
 
 	if (*namep == NULL)
 		goto out;
 
 	(*rawp)[0] = tmp;
 	*rawp = strim(*rawp);
+	*rawp_insn = be32_to_cpu(strtol(tmp_opcode, NULL, 16));
 
 	return 0;
 
@@ -896,7 +928,7 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
 		goto out_delete;
 
 	if (args->offset != -1) {
-		if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
+		if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw, &dl->ins.opcode, &dl->ops.raw_insn) < 0)
 			goto out_free_line;
 
 		disasm_line__init_ins(dl, args->arch, &args->ms);
@@ -1726,7 +1758,7 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
 		 map__rip_2objdump(map, sym->start),
 		 map__rip_2objdump(map, sym->end),
 		 opts->show_linenr ? "-l" : "",
-		 opts->show_asm_raw ? "" : "--no-show-raw-insn",
+		 opts->show_asm_raw ? "" : "--show-raw-insn",
 		 opts->annotate_src ? "-S" : "",
 		 opts->prefix ? "--prefix " : "",
 		 opts->prefix ? '"' : ' ',
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 718177fa4775..5c1520010ca7 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -43,14 +43,18 @@ struct arch {
 
 struct ins {
 	const char     *name;
+	int	opcode;
 	struct ins_ops *ops;
 };
 
 struct ins_operands {
 	char	*raw;
+	int	raw_insn;
 	struct {
 		char	*raw;
 		char	*name;
+		int	opcode;
+		int	raw_insn;
 		struct symbol *sym;
 		u64	addr;
 		s64	offset;
@@ -62,6 +66,8 @@ struct ins_operands {
 		struct {
 			char	*raw;
 			char	*name;
+			int	opcode;
+			int	raw_insn;
 			u64	addr;
 			bool	multi_regs;
 		} source;
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 01fb25a1150a..2a4e1e16e52c 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -31,6 +31,15 @@ static inline int get_dwarf_regnum(const char *name __maybe_unused,
 }
 #endif
 
+/*
+ * get_opcode_insn - Return opcode from raw instruction
+ * get_source_reg - Return source reg from raw instruction
+ * get_target_reg - Return target reg from raw instruction
+ */
+int get_opcode_insn(unsigned int raw_insn);
+int get_source_reg(unsigned int raw_insn);
+int get_target_reg(unsigned int raw_insn);
+
 #ifdef HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET
 /*
  * Arch should support fetching the offset of a register in pt_regs
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V2 5/9] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc
  2024-05-06 12:18 [PATCH V2 0/9] Add data type profiling support for powerpc Athira Rajeev
                   ` (3 preceding siblings ...)
  2024-05-06 12:19 ` [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump Athira Rajeev
@ 2024-05-06 12:19 ` Athira Rajeev
  2024-05-07  9:48   ` Christophe Leroy
  2024-05-06 12:19 ` [PATCH V2 6/9] tools/perf: Update instruction tracking for powerpc Athira Rajeev
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Athira Rajeev @ 2024-05-06 12:19 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy
  Cc: atrajeev, kjain, linux-kernel, akanksha, linux-perf-users, maddy,
	disgoel, linuxppc-dev

Use the raw instruction code and macros to identify memory instructions,
extract register fields and also offset. The implementation addresses
the D-form, X-form, DS-form instructions. Two main functions are added.
New parse function "load_store__parse" as instruction ops parser for
memory instructions. Unlink other parser (like mov__parse), this parser
fills in only the "raw" field for source/target and new added "mem_ref"
field. This is because, here there is no need to parse the disassembled
code and arch specific macros will take care of extracting offset and
regs which is easier and will be precise.

In powerpc, all instructions with a primary opcode from 32 to 63
are memory instructions. Update "ins__find" function to have "opcode"
also as a parameter. Don't use the "extract_reg_offset", instead use
newly added function "get_arch_regs" which will set these fields: reg1,
reg2, offset depending of where it is source or target ops.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/arch/powerpc/util/dwarf-regs.c | 33 +++++++++++++
 tools/perf/util/annotate.c                | 22 ++++++++-
 tools/perf/util/disasm.c                  | 59 +++++++++++++++++++++--
 tools/perf/util/disasm.h                  |  4 +-
 tools/perf/util/include/dwarf-regs.h      |  4 +-
 5 files changed, 114 insertions(+), 8 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
index e60a71fd846e..3121c70dc0d3 100644
--- a/tools/perf/arch/powerpc/util/dwarf-regs.c
+++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
@@ -102,6 +102,9 @@ int regs_query_register_offset(const char *name)
 #define	PPC_OP(op)	(((op) >> 26) & 0x3F)
 #define PPC_RA(a)	(((a) >> 16) & 0x1f)
 #define PPC_RT(t)	(((t) >> 21) & 0x1f)
+#define PPC_RB(b)	(((b) >> 11) & 0x1f)
+#define PPC_D(D)	((D) & 0xfffe)
+#define PPC_DS(DS)	((DS) & 0xfffc)
 
 int get_opcode_insn(unsigned int raw_insn)
 {
@@ -117,3 +120,33 @@ int get_target_reg(unsigned int raw_insn)
 {
 	return PPC_RT(raw_insn);
 }
+
+int get_offset_opcode(int raw_insn __maybe_unused)
+{
+	int opcode = PPC_OP(raw_insn);
+
+	/* DS- form */
+	if ((opcode == 58) || (opcode == 62))
+		return PPC_DS(raw_insn);
+	else
+		return PPC_D(raw_insn);
+}
+
+/*
+ * Fills the required fields for op_loc depending on if it
+ * is a source of target.
+ * D form: ins RT,D(RA) -> src_reg1 = RA, offset = D, dst_reg1 = RT
+ * DS form: ins RT,DS(RA) -> src_reg1 = RA, offset = DS, dst_reg1 = RT
+ * X form: ins RT,RA,RB -> src_reg1 = RA, src_reg2 = RB, dst_reg1 = RT
+ */
+void get_arch_regs(int raw_insn __maybe_unused, int is_source __maybe_unused, struct annotated_op_loc *op_loc __maybe_unused)
+{
+	if (is_source)
+		op_loc->reg1 = get_source_reg(raw_insn);
+	else
+		op_loc->reg1 = get_target_reg(raw_insn);
+	if (op_loc->multi_regs)
+		op_loc->reg2 = PPC_RB(raw_insn);
+	if (op_loc->mem_ref)
+		op_loc->offset = get_offset_opcode(raw_insn);
+}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 0f5e10654d09..48739c7ffdc7 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2073,6 +2073,12 @@ static int extract_reg_offset(struct arch *arch, const char *str,
 	return 0;
 }
 
+__weak void get_arch_regs(int raw_insn __maybe_unused, int is_source __maybe_unused,
+		struct annotated_op_loc *op_loc __maybe_unused)
+{
+	return;
+}
+
 /**
  * annotate_get_insn_location - Get location of instruction
  * @arch: the architecture info
@@ -2117,10 +2123,12 @@ int annotate_get_insn_location(struct arch *arch, struct disasm_line *dl,
 	for_each_insn_op_loc(loc, i, op_loc) {
 		const char *insn_str = ops->source.raw;
 		bool multi_regs = ops->source.multi_regs;
+		bool mem_ref = ops->source.mem_ref;
 
 		if (i == INSN_OP_TARGET) {
 			insn_str = ops->target.raw;
 			multi_regs = ops->target.multi_regs;
+			mem_ref = ops->target.mem_ref;
 		}
 
 		/* Invalidate the register by default */
@@ -2130,7 +2138,19 @@ int annotate_get_insn_location(struct arch *arch, struct disasm_line *dl,
 		if (insn_str == NULL)
 			continue;
 
-		if (strchr(insn_str, arch->objdump.memory_ref_char)) {
+		/*
+		 * For powerpc, call get_arch_regs function which extracts the
+		 * required fields for op_loc, ie reg1, reg2, offset from the
+		 * raw instruction.
+		 */
+		if (arch__is(arch, "powerpc")) {
+			op_loc->mem_ref = mem_ref;
+			if ((!strchr(insn_str, '(')) && (i == INSN_OP_SOURCE))
+				op_loc->multi_regs = true;
+			get_arch_regs(ops->raw_insn, !i, op_loc);
+			if (op_loc->multi_regs)
+				op_loc->offset = 0;
+		} else if (strchr(insn_str, arch->objdump.memory_ref_char)) {
 			op_loc->mem_ref = true;
 			op_loc->multi_regs = multi_regs;
 			extract_reg_offset(arch, insn_str, op_loc);
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 85692f73e78f..f41a0fadeab4 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -515,7 +515,7 @@ static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_s
 	if (disasm_line__parse(ops->raw, &ops->locked.ins.name, &ops->locked.ops->raw, &ops->locked.ins.opcode, &ops->locked.ops->raw_insn) < 0)
 		goto out_free_ops;
 
-	ops->locked.ins.ops = ins__find(arch, ops->locked.ins.name);
+	ops->locked.ins.ops = ins__find(arch, ops->locked.ins.name, 0);
 
 	if (ops->locked.ins.ops == NULL)
 		goto out_free_ops;
@@ -670,6 +670,46 @@ static struct ins_ops mov_ops = {
 	.scnprintf = mov__scnprintf,
 };
 
+static int load_store__scnprintf(struct ins *ins, char *bf, size_t size,
+		struct ins_operands *ops, int max_ins_name)
+{
+	return scnprintf(bf, size, "%-*s %s", max_ins_name, ins->name,
+			ops->raw);
+}
+
+/*
+ * Sets only two fields "raw" and "mem_ref".
+ * "mem_ref" is set for ops->source which is later used to
+ * fill the objdump->memory_ref-char field. This ops is currently
+ * used by powerpc and since binary instruction code is used to
+ * extract opcode, regs and offset, no other parsing is needed here
+ */
+static int load_store__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map_symbol *ms __maybe_unused)
+{
+	ops->source.raw = strdup(ops->raw);
+	ops->source.mem_ref = true;
+
+	if (ops->source.raw == NULL)
+		return -1;
+
+	ops->target.raw = strdup(ops->raw);
+	ops->target.mem_ref = false;
+
+	if (ops->target.raw == NULL)
+		goto out_free_source;
+
+	return 0;
+
+out_free_source:
+	zfree(&ops->source.raw);
+	return -1;
+}
+
+static struct ins_ops load_store_ops = {
+	.parse     = load_store__parse,
+	.scnprintf = load_store__scnprintf,
+};
+
 static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map_symbol *ms __maybe_unused)
 {
 	char *target, *comment, *s, prev;
@@ -760,11 +800,20 @@ static void ins__sort(struct arch *arch)
 	qsort(arch->instructions, nmemb, sizeof(struct ins), ins__cmp);
 }
 
-static struct ins_ops *__ins__find(struct arch *arch, const char *name)
+static struct ins_ops *__ins__find(struct arch *arch, const char *name, int opcode)
 {
 	struct ins *ins;
 	const int nmemb = arch->nr_instructions;
 
+	if (arch__is(arch, "powerpc")) {
+		/*
+		 * Instructions with a primary opcode from 32 to 63
+		 * are memory instructions in powerpc.
+		 */
+		if ((opcode >= 31) && (opcode <= 63))
+			return &load_store_ops;
+	}
+
 	if (!arch->sorted_instructions) {
 		ins__sort(arch);
 		arch->sorted_instructions = true;
@@ -794,9 +843,9 @@ static struct ins_ops *__ins__find(struct arch *arch, const char *name)
 	return ins ? ins->ops : NULL;
 }
 
-struct ins_ops *ins__find(struct arch *arch, const char *name)
+struct ins_ops *ins__find(struct arch *arch, const char *name, int opcode)
 {
-	struct ins_ops *ops = __ins__find(arch, name);
+	struct ins_ops *ops = __ins__find(arch, name, opcode);
 
 	if (!ops && arch->associate_instruction_ops)
 		ops = arch->associate_instruction_ops(arch, name);
@@ -806,7 +855,7 @@ struct ins_ops *ins__find(struct arch *arch, const char *name)
 
 static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, struct map_symbol *ms)
 {
-	dl->ins.ops = ins__find(arch, dl->ins.name);
+	dl->ins.ops = ins__find(arch, dl->ins.name, dl->ins.opcode);
 
 	if (!dl->ins.ops)
 		return;
diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
index 5c1520010ca7..ed8875b35bfe 100644
--- a/tools/perf/util/disasm.h
+++ b/tools/perf/util/disasm.h
@@ -61,6 +61,7 @@ struct ins_operands {
 		bool	offset_avail;
 		bool	outside;
 		bool	multi_regs;
+		bool	mem_ref;
 	} target;
 	union {
 		struct {
@@ -70,6 +71,7 @@ struct ins_operands {
 			int	raw_insn;
 			u64	addr;
 			bool	multi_regs;
+			bool	mem_ref;
 		} source;
 		struct {
 			struct ins	    ins;
@@ -103,7 +105,7 @@ struct annotate_args {
 struct arch *arch__find(const char *name);
 bool arch__is(struct arch *arch, const char *name);
 
-struct ins_ops *ins__find(struct arch *arch, const char *name);
+struct ins_ops *ins__find(struct arch *arch, const char *name, int opcode);
 int ins__scnprintf(struct ins *ins, char *bf, size_t size,
 		   struct ins_operands *ops, int max_ins_name);
 
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 2a4e1e16e52c..d691245dd703 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _PERF_DWARF_REGS_H_
 #define _PERF_DWARF_REGS_H_
+#include "annotate.h"
 
 #define DWARF_REG_PC  0xd3af9c /* random number */
 #define DWARF_REG_FB  0xd3affb /* random number */
@@ -39,7 +40,8 @@ static inline int get_dwarf_regnum(const char *name __maybe_unused,
 int get_opcode_insn(unsigned int raw_insn);
 int get_source_reg(unsigned int raw_insn);
 int get_target_reg(unsigned int raw_insn);
-
+int get_offset_opcode(int raw_insn);
+void get_arch_regs(int raw_insn, int is_source, struct annotated_op_loc *op_loc);
 #ifdef HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET
 /*
  * Arch should support fetching the offset of a register in pt_regs
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V2 6/9] tools/perf: Update instruction tracking for powerpc
  2024-05-06 12:18 [PATCH V2 0/9] Add data type profiling support for powerpc Athira Rajeev
                   ` (4 preceding siblings ...)
  2024-05-06 12:19 ` [PATCH V2 5/9] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc Athira Rajeev
@ 2024-05-06 12:19 ` Athira Rajeev
  2024-05-07  9:52   ` Christophe Leroy
  2024-05-06 12:19 ` [PATCH V2 7/9] tools/perf: Update instruction tracking with add instruction Athira Rajeev
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Athira Rajeev @ 2024-05-06 12:19 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy
  Cc: atrajeev, kjain, linux-kernel, akanksha, linux-perf-users, maddy,
	disgoel, linuxppc-dev

Add instruction tracking function "update_insn_state_powerpc" for
powerpc. Example sequence in powerpc:

ld      r10,264(r3)
mr      r31,r3
<<after some sequence>
ld      r9,312(r31)

Consider ithe sample is pointing to: "ld r9,312(r31)".
Here the memory reference is hit at "312(r31)" where 312 is the offset
and r31 is the source register. Previous instruction sequence shows that
register state of r3 is moved to r31. So to identify the data type for r31
access, the previous instruction ("mr") needs to be tracked and the
state type entry has to be updated. Current instruction tracking support
in perf tools infrastructure is specific to x86. Patch adds this for
powerpc and adds "mr" instruction to be tracked.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 .../perf/arch/powerpc/annotate/instructions.c | 63 +++++++++++++++++++
 tools/perf/util/annotate-data.c               |  9 ++-
 tools/perf/util/disasm.c                      |  1 +
 3 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index a3f423c27cae..cce7023951fe 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -49,6 +49,69 @@ static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, con
 	return ops;
 }
 
+/*
+ * Instruction tracking function to track register state moves.
+ * Example sequence:
+ *    ld      r10,264(r3)
+ *    mr      r31,r3
+ *    <<after some sequence>
+ *    ld      r9,312(r31)
+ *
+ * Previous instruction sequence shows that register state of r3
+ * is moved to r31. update_insn_state_powerpc tracks these state
+ * changes
+ */
+#ifdef HAVE_DWARF_SUPPORT
+static void update_insn_state_powerpc(struct type_state *state,
+		struct data_loc_info *dloc, Dwarf_Die *cu_die __maybe_unused,
+		struct disasm_line *dl)
+{
+	struct annotated_insn_loc loc;
+	struct annotated_op_loc *src = &loc.ops[INSN_OP_SOURCE];
+	struct annotated_op_loc *dst = &loc.ops[INSN_OP_TARGET];
+	struct type_state_reg *tsr;
+	u32 insn_offset = dl->al.offset;
+
+	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
+		return;
+
+	if (strncmp(dl->ins.name, "mr", 2))
+		return;
+
+	if (!strncmp(dl->ins.name, "mr", 2)) {
+		int src_reg = src->reg1;
+
+		src->reg1 = dst->reg1;
+		dst->reg1 = src_reg;
+	}
+
+	if (!has_reg_type(state, dst->reg1))
+		return;
+
+	tsr = &state->regs[dst->reg1];
+
+	if (!has_reg_type(state, src->reg1) ||
+			!state->regs[src->reg1].ok) {
+		tsr->ok = false;
+		return;
+	}
+
+	tsr->type = state->regs[src->reg1].type;
+	tsr->kind = state->regs[src->reg1].kind;
+	tsr->ok = true;
+
+	pr_debug("mov [%x] reg%d -> reg%d",
+			insn_offset, src->reg1, dst->reg1);
+	pr_debug_type_name(&tsr->type, tsr->kind);
+}
+#else /* HAVE_DWARF_SUPPORT */
+static void update_insn_state_powerpc(struct type_state *state __maybe_unused, struct data_loc_info *dloc __maybe_unused,
+		Dwarf_Die *cu_die __maybe_unused, struct disasm_line *dl __maybe_unused)
+{
+	return;
+}
+#endif /* HAVE_DWARF_SUPPORT */
+
 static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 {
 	if (!arch->initialized) {
diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 9d6d4f472c85..e22ba35c93b2 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -1079,6 +1079,13 @@ static int find_data_type_insn(struct data_loc_info *dloc,
 	return ret;
 }
 
+static int arch_supports_insn_tracking(struct data_loc_info *dloc)
+{
+	if ((arch__is(dloc->arch, "x86")) || (arch__is(dloc->arch, "powerpc")))
+		return 1;
+	return 0;
+}
+
 /*
  * Construct a list of basic blocks for each scope with variables and try to find
  * the data type by updating a type state table through instructions.
@@ -1093,7 +1100,7 @@ static int find_data_type_block(struct data_loc_info *dloc,
 	int ret = -1;
 
 	/* TODO: other architecture support */
-	if (!arch__is(dloc->arch, "x86"))
+	if (!arch_supports_insn_tracking(dloc))
 		return -1;
 
 	prev_dst_ip = dst_ip = dloc->ip;
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index f41a0fadeab4..ac6b8b8da38a 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -151,6 +151,7 @@ static struct arch architectures[] = {
 	{
 		.name = "powerpc",
 		.init = powerpc__annotate_init,
+		.update_insn_state = update_insn_state_powerpc,
 	},
 	{
 		.name = "riscv64",
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V2 7/9] tools/perf: Update instruction tracking with add instruction
  2024-05-06 12:18 [PATCH V2 0/9] Add data type profiling support for powerpc Athira Rajeev
                   ` (5 preceding siblings ...)
  2024-05-06 12:19 ` [PATCH V2 6/9] tools/perf: Update instruction tracking for powerpc Athira Rajeev
@ 2024-05-06 12:19 ` Athira Rajeev
  2024-05-07  9:58   ` Christophe Leroy
  2024-05-06 12:19 ` [PATCH V2 8/9] tools/perf: Add support to find global register variables using find_data_type_global_reg Athira Rajeev
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Athira Rajeev @ 2024-05-06 12:19 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy
  Cc: atrajeev, kjain, linux-kernel, akanksha, linux-perf-users, maddy,
	disgoel, linuxppc-dev

Update instruction tracking with add instruction. Apart from "mr"
instruction, the register state is carried on by other insns, ie,
"add, addi, addis". Since these are not memory instructions and doesn't
fall in the range of (32 to 63), add these as part of nmemonic table.
For now, add* instructions are added. There is possibility of getting
more added here. But to extract regs, still the binary code will be
used. So associate this with "load_store_ops" itself and no other
changes required.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 .../perf/arch/powerpc/annotate/instructions.c | 21 +++++++++++++++++++
 tools/perf/util/disasm.c                      |  1 +
 2 files changed, 22 insertions(+)

diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
index cce7023951fe..1f35d8a65bb4 100644
--- a/tools/perf/arch/powerpc/annotate/instructions.c
+++ b/tools/perf/arch/powerpc/annotate/instructions.c
@@ -1,6 +1,17 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/compiler.h>
 
+/*
+ * powerpc instruction nmemonic table to associate load/store instructions with
+ * move_ops. mov_ops is used to identify add/mr to do instruction tracking.
+ */
+static struct ins powerpc__instructions[] = {
+	{ .name = "mr",         .ops = &load_store_ops,  },
+	{ .name = "addi",       .ops = &load_store_ops,   },
+	{ .name = "addis",      .ops = &load_store_ops,  },
+	{ .name = "add",        .ops = &load_store_ops,  },
+};
+
 static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, const char *name)
 {
 	int i;
@@ -75,6 +86,9 @@ static void update_insn_state_powerpc(struct type_state *state,
 	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
 		return;
 
+	if (!strncmp(dl->ins.name, "add", 3))
+		goto regs_check;
+
 	if (strncmp(dl->ins.name, "mr", 2))
 		return;
 
@@ -85,6 +99,7 @@ static void update_insn_state_powerpc(struct type_state *state,
 		dst->reg1 = src_reg;
 	}
 
+regs_check:
 	if (!has_reg_type(state, dst->reg1))
 		return;
 
@@ -115,6 +130,12 @@ static void update_insn_state_powerpc(struct type_state *state __maybe_unused, s
 static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 {
 	if (!arch->initialized) {
+		arch->nr_instructions = ARRAY_SIZE(powerpc__instructions);
+		arch->instructions = calloc(arch->nr_instructions, sizeof(struct ins));
+		if (!arch->instructions)
+			return -ENOMEM;
+		memcpy(arch->instructions, powerpc__instructions, sizeof(struct ins) * arch->nr_instructions);
+		arch->nr_instructions_allocated = arch->nr_instructions;
 		arch->initialized = true;
 		arch->associate_instruction_ops = powerpc__associate_instruction_ops;
 		arch->objdump.comment_char      = '#';
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index ac6b8b8da38a..32cf506a9010 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -36,6 +36,7 @@ static struct ins_ops mov_ops;
 static struct ins_ops nop_ops;
 static struct ins_ops lock_ops;
 static struct ins_ops ret_ops;
+static struct ins_ops load_store_ops;
 
 static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
 			   struct ins_operands *ops, int max_ins_name);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V2 8/9] tools/perf: Add support to find global register variables using find_data_type_global_reg
  2024-05-06 12:18 [PATCH V2 0/9] Add data type profiling support for powerpc Athira Rajeev
                   ` (6 preceding siblings ...)
  2024-05-06 12:19 ` [PATCH V2 7/9] tools/perf: Update instruction tracking with add instruction Athira Rajeev
@ 2024-05-06 12:19 ` Athira Rajeev
  2024-05-07 10:03   ` Christophe Leroy
  2024-05-06 12:19 ` [PATCH V2 9/9] tools/perf: Add support for global_die to capture name of variable in case of register defined variable Athira Rajeev
  2024-05-07  4:39 ` [PATCH V2 0/9] Add data type profiling support for powerpc Namhyung Kim
  9 siblings, 1 reply; 29+ messages in thread
From: Athira Rajeev @ 2024-05-06 12:19 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy
  Cc: atrajeev, kjain, linux-kernel, akanksha, linux-perf-users, maddy,
	disgoel, linuxppc-dev

There are cases where define a global register variable and associate it
with a specified register. Example, in powerpc, two registers are
defined to represent variable:
1. r13: represents local_paca
register struct paca_struct *local_paca asm("r13");

2. r1: represents stack_pointer
register void *__stack_pointer asm("r1");

These regs are present in dwarf debug as DW_OP_reg as part of variables
in the cu_die (compile unit). These are not present in die search done
in the list of nested scopes since these are global register variables.

Example for local_paca represented by r13:

<<>>
 <1><18dc6b4>: Abbrev Number: 128 (DW_TAG_variable)
    <18dc6b6>   DW_AT_name        : (indirect string, offset: 0x3861): local_paca
    <18dc6ba>   DW_AT_decl_file   : 48
    <18dc6bb>   DW_AT_decl_line   : 36
    <18dc6bc>   DW_AT_decl_column : 30
    <18dc6bd>   DW_AT_type        : <0x18dc6c3>
    <18dc6c1>   DW_AT_external    : 1
    <18dc6c1>   DW_AT_location    : 1 byte block: 5d    (DW_OP_reg13 (r13))

 <1><18dc6c3>: Abbrev Number: 3 (DW_TAG_pointer_type)
    <18dc6c4>   DW_AT_byte_size   : 8
    <18dc6c4>   DW_AT_type        : <0x18dc353>

Where  DW_AT_type : <0x18dc6c3> further points to :

 <1><18dc6c3>: Abbrev Number: 3 (DW_TAG_pointer_type)
    <18dc6c4>   DW_AT_byte_size   : 8
    <18dc6c4>   DW_AT_type        : <0x18dc353>

which belongs to:

 <1><18dc353>: Abbrev Number: 67 (DW_TAG_structure_type)
    <18dc354>   DW_AT_name        : (indirect string, offset: 0x56cd): paca_struct
    <18dc358>   DW_AT_byte_size   : 2944
    <18dc35a>   DW_AT_alignment   : 128
    <18dc35b>   DW_AT_decl_file   : 48
    <18dc35c>   DW_AT_decl_line   : 61
    <18dc35d>   DW_AT_decl_column : 8
    <18dc35d>   DW_AT_sibling     : <0x18dc6b4>
<<>>

Similar is case with "r1".

<<>>
 <1><18dd772>: Abbrev Number: 129 (DW_TAG_variable)
    <18dd774>   DW_AT_name        : (indirect string, offset: 0x11ba): current_stack_pointer
    <18dd778>   DW_AT_decl_file   : 51
    <18dd779>   DW_AT_decl_line   : 1468
    <18dd77b>   DW_AT_decl_column : 24
    <18dd77c>   DW_AT_type        : <0x18da5cd>
    <18dd780>   DW_AT_external    : 1
    <18dd780>   DW_AT_location    : 1 byte block: 51    (DW_OP_reg1 (r1))

 where 18da5cd is:

 <1><18da5cd>: Abbrev Number: 47 (DW_TAG_base_type)
    <18da5ce>   DW_AT_byte_size   : 8
    <18da5cf>   DW_AT_encoding    : 7   (unsigned)
    <18da5d0>   DW_AT_name        : (indirect string, offset: 0x55c7): long unsigned int
<<>>

To identify data type for these two special cases, iterate over
variables in the CU die (Compile Unit) and match it with the register.
If the variable is a base type, ie die_get_real_type will return NULL
here, set offset to zero. With the changes, data type for "paca_struct"
and "long unsigned int" for r1 is identified.

Snippet from ./perf report -s type,type_off

    12.85%  long unsigned int  long unsigned int +0 (no field)
     4.68%  struct paca_struct  struct paca_struct +2312 (__current)
     4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/annotate-data.c      | 40 ++++++++++++++++++++++++++++
 tools/perf/util/annotate.c           |  8 ++++++
 tools/perf/util/annotate.h           |  1 +
 tools/perf/util/include/dwarf-regs.h |  1 +
 4 files changed, 50 insertions(+)

diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index e22ba35c93b2..ab2168c4ef41 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -1169,6 +1169,40 @@ static int find_data_type_block(struct data_loc_info *dloc,
 	return ret;
 }
 
+/*
+ * Handle cases where define a global register variable and
+ * associate it with a specified register. These regs are
+ * present in dwarf debug as DW_OP_reg as part of variables
+ * in the cu_die (compile unit). Iterate over variables in the
+ * cu_die and match with reg to identify data type die.
+ */
+static int find_data_type_global_reg(struct data_loc_info *dloc, int reg, Dwarf_Die *cu_die,
+		Dwarf_Die *type_die)
+{
+	Dwarf_Die vr_die;
+	int ret = -1;
+	struct die_var_type *var_types = NULL;
+
+	die_collect_vars(cu_die, &var_types);
+	while (var_types) {
+		if (var_types->reg == reg) {
+			if (dwarf_offdie(dloc->di->dbg, var_types->die_off, &vr_die)) {
+				if (die_get_real_type(&vr_die, type_die) == NULL) {
+					dloc->type_offset = 0;
+					dwarf_offdie(dloc->di->dbg, var_types->die_off, type_die);
+				}
+				pr_debug_type_name(type_die, TSR_KIND_TYPE);
+				ret = 0;
+				pr_debug_dtp("found by CU for %s (die:%#lx)\n",
+						dwarf_diename(type_die), (long)dwarf_dieoffset(type_die));
+			}
+			break;
+		}
+		var_types = var_types->next;
+	}
+	return ret;
+}
+
 /* The result will be saved in @type_die */
 static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
 {
@@ -1216,6 +1250,12 @@ static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
 	pr_debug_dtp("CU for %s (die:%#lx)\n",
 		     dwarf_diename(&cu_die), (long)dwarf_dieoffset(&cu_die));
 
+	if (loc->reg_type == DWARF_REG_GLOBAL) {
+		ret = find_data_type_global_reg(dloc, reg, &cu_die, type_die);
+		if (!ret)
+			goto out;
+	}
+
 	if (reg == DWARF_REG_PC) {
 		if (get_global_var_type(&cu_die, dloc, dloc->ip, dloc->var_addr,
 					&offset, type_die)) {
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 48739c7ffdc7..2b0b5279f86d 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2426,6 +2426,14 @@ struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he)
 			op_loc->reg1 = DWARF_REG_PC;
 		}
 
+		/* Global reg variable 13 and 1
+		 * assign to DWARF_REG_GLOBAL
+		 */
+		if (arch__is(arch, "powerpc")) {
+			if ((op_loc->reg1 == 13) || (op_loc->reg1 == 1))
+				op_loc->reg_type = DWARF_REG_GLOBAL;
+		}
+
 		mem_type = find_data_type(&dloc);
 
 		if (mem_type == NULL && is_stack_canary(arch, op_loc)) {
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index d5c821c22f79..43ae75d8356b 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -472,6 +472,7 @@ struct annotated_op_loc {
 	bool mem_ref;
 	bool multi_regs;
 	bool imm;
+	int reg_type;
 };
 
 enum annotated_insn_ops {
diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index d691245dd703..adcce167d9ca 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -5,6 +5,7 @@
 
 #define DWARF_REG_PC  0xd3af9c /* random number */
 #define DWARF_REG_FB  0xd3affb /* random number */
+#define DWARF_REG_GLOBAL 0xd3affc /* random number */
 
 #ifdef HAVE_DWARF_SUPPORT
 const char *get_arch_regstr(unsigned int n);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V2 9/9] tools/perf: Add support for global_die to capture name of variable in case of register defined variable
  2024-05-06 12:18 [PATCH V2 0/9] Add data type profiling support for powerpc Athira Rajeev
                   ` (7 preceding siblings ...)
  2024-05-06 12:19 ` [PATCH V2 8/9] tools/perf: Add support to find global register variables using find_data_type_global_reg Athira Rajeev
@ 2024-05-06 12:19 ` Athira Rajeev
  2024-05-07  4:39 ` [PATCH V2 0/9] Add data type profiling support for powerpc Namhyung Kim
  9 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2024-05-06 12:19 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung, segher, christophe.leroy
  Cc: atrajeev, kjain, linux-kernel, akanksha, linux-perf-users, maddy,
	disgoel, linuxppc-dev

In case of register defined variable (found using
find_data_type_global_reg), if the type of variable happens to be base
type (example, long unsigned int), perf report captures it as:

    12.85%  long unsigned int  long unsigned int +0 (no field)

The above data type is actually referring to samples captured while
accessing "r1" which represents current stack pointer in powerpc.
register void *__stack_pointer asm("r1");

The dwarf debug contains this as:

<<>>
 <1><18dd772>: Abbrev Number: 129 (DW_TAG_variable)
    <18dd774>   DW_AT_name        : (indirect string, offset: 0x11ba): current_stack_pointer
    <18dd778>   DW_AT_decl_file   : 51
    <18dd779>   DW_AT_decl_line   : 1468
    <18dd77b>   DW_AT_decl_column : 24
    <18dd77c>   DW_AT_type        : <0x18da5cd>
    <18dd780>   DW_AT_external    : 1
    <18dd780>   DW_AT_location    : 1 byte block: 51    (DW_OP_reg1 (r1))

 where 18da5cd is:

 <1><18da5cd>: Abbrev Number: 47 (DW_TAG_base_type)
    <18da5ce>   DW_AT_byte_size   : 8
    <18da5cf>   DW_AT_encoding    : 7   (unsigned)
    <18da5d0>   DW_AT_name        : (indirect string, offset: 0x55c7): long unsigned int
<<>>

To make it more clear to the user, capture the DW_AT_name of the
variable and save it as part of Dwarf_Global. Dwarf_Global is used so
that it can be used and retrieved while presenting the result.

Update "dso__findnew_data_type" function to set "var_name" if
variable name is set as part of Dwarf_Global. Updated
"hist_entry__typeoff_snprintf" to print var_name if it is set.
With the changes, along with "long unsigned int" report also says the
variable name as current_stack_pointer

Snippet of result:

    12.85%  long unsigned int  long unsigned int +0 (current_stack_pointer)
     4.68%  struct paca_struct  struct paca_struct +2312 (__current)
     4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/util/annotate-data.c | 30 ++++++++++++++++++++++++------
 tools/perf/util/dwarf-aux.c     |  1 +
 tools/perf/util/dwarf-aux.h     |  1 +
 tools/perf/util/sort.c          |  7 +++++--
 4 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index ab2168c4ef41..9f72d4b6a5f4 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -267,23 +267,32 @@ static void delete_members(struct annotated_member *member)
 }
 
 static struct annotated_data_type *dso__findnew_data_type(struct dso *dso,
-							  Dwarf_Die *type_die)
+							  Dwarf_Die *type_die, Dwarf_Global *global_die)
 {
 	struct annotated_data_type *result = NULL;
 	struct annotated_data_type key;
 	struct rb_node *node;
 	struct strbuf sb;
+	struct strbuf sb_var_name;
 	char *type_name;
+	char *var_name;
 	Dwarf_Word size;
 
 	strbuf_init(&sb, 32);
+	strbuf_init(&sb_var_name, 32);
 	if (die_get_typename_from_type(type_die, &sb) < 0)
 		strbuf_add(&sb, "(unknown type)", 14);
+	if (global_die->name) {
+		strbuf_addstr(&sb_var_name, global_die->name);
+		var_name = strbuf_detach(&sb_var_name, NULL);
+	}
 	type_name = strbuf_detach(&sb, NULL);
 	dwarf_aggregate_size(type_die, &size);
 
 	/* Check existing nodes in dso->data_types tree */
 	key.self.type_name = type_name;
+	if (global_die->name)
+		key.self.var_name = var_name;
 	key.self.size = size;
 	node = rb_find(&key, &dso->data_types, data_type_cmp);
 	if (node) {
@@ -300,6 +309,8 @@ static struct annotated_data_type *dso__findnew_data_type(struct dso *dso,
 	}
 
 	result->self.type_name = type_name;
+	if (global_die->name)
+		result->self.var_name = var_name;
 	result->self.size = size;
 	INIT_LIST_HEAD(&result->self.children);
 
@@ -1177,7 +1188,7 @@ static int find_data_type_block(struct data_loc_info *dloc,
  * cu_die and match with reg to identify data type die.
  */
 static int find_data_type_global_reg(struct data_loc_info *dloc, int reg, Dwarf_Die *cu_die,
-		Dwarf_Die *type_die)
+		Dwarf_Die *type_die, Dwarf_Global *global_die)
 {
 	Dwarf_Die vr_die;
 	int ret = -1;
@@ -1189,8 +1200,11 @@ static int find_data_type_global_reg(struct data_loc_info *dloc, int reg, Dwarf_
 			if (dwarf_offdie(dloc->di->dbg, var_types->die_off, &vr_die)) {
 				if (die_get_real_type(&vr_die, type_die) == NULL) {
 					dloc->type_offset = 0;
+					global_die->name = var_types->name;
 					dwarf_offdie(dloc->di->dbg, var_types->die_off, type_die);
 				}
+				global_die->die_offset = (long)dwarf_dieoffset(type_die);
+				global_die->cu_offset = (long)dwarf_dieoffset(cu_die);
 				pr_debug_type_name(type_die, TSR_KIND_TYPE);
 				ret = 0;
 				pr_debug_dtp("found by CU for %s (die:%#lx)\n",
@@ -1204,7 +1218,8 @@ static int find_data_type_global_reg(struct data_loc_info *dloc, int reg, Dwarf_
 }
 
 /* The result will be saved in @type_die */
-static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
+static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die,
+		Dwarf_Global *global_die)
 {
 	struct annotated_op_loc *loc = dloc->op;
 	Dwarf_Die cu_die, var_die;
@@ -1218,6 +1233,8 @@ static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
 	u64 pc;
 	char buf[64];
 
+	memset(global_die, 0, sizeof(Dwarf_Global));
+
 	if (dloc->op->multi_regs)
 		snprintf(buf, sizeof(buf), "reg%d, reg%d", dloc->op->reg1, dloc->op->reg2);
 	else if (dloc->op->reg1 == DWARF_REG_PC)
@@ -1251,7 +1268,7 @@ static int find_data_type_die(struct data_loc_info *dloc, Dwarf_Die *type_die)
 		     dwarf_diename(&cu_die), (long)dwarf_dieoffset(&cu_die));
 
 	if (loc->reg_type == DWARF_REG_GLOBAL) {
-		ret = find_data_type_global_reg(dloc, reg, &cu_die, type_die);
+		ret = find_data_type_global_reg(dloc, reg, &cu_die, type_die, global_die);
 		if (!ret)
 			goto out;
 	}
@@ -1387,6 +1404,7 @@ struct annotated_data_type *find_data_type(struct data_loc_info *dloc)
 	struct annotated_data_type *result = NULL;
 	struct dso *dso = map__dso(dloc->ms->map);
 	Dwarf_Die type_die;
+	Dwarf_Global global_die;
 
 	dloc->di = debuginfo__new(dso->long_name);
 	if (dloc->di == NULL) {
@@ -1402,10 +1420,10 @@ struct annotated_data_type *find_data_type(struct data_loc_info *dloc)
 
 	dloc->fbreg = -1;
 
-	if (find_data_type_die(dloc, &type_die) < 0)
+	if (find_data_type_die(dloc, &type_die, &global_die) < 0)
 		goto out;
 
-	result = dso__findnew_data_type(dso, &type_die);
+	result = dso__findnew_data_type(dso, &type_die, &global_die);
 
 out:
 	debuginfo__delete(dloc->di);
diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c
index c0a492e65388..d0478d4407d9 100644
--- a/tools/perf/util/dwarf-aux.c
+++ b/tools/perf/util/dwarf-aux.c
@@ -1608,6 +1608,7 @@ static int __die_collect_vars_cb(Dwarf_Die *die_mem, void *arg)
 	vt->reg = reg_from_dwarf_op(ops);
 	vt->offset = offset_from_dwarf_op(ops);
 	vt->next = *var_types;
+	vt->name = dwarf_diename(die_mem);
 	*var_types = vt;
 
 	return DIE_FIND_CB_SIBLING;
diff --git a/tools/perf/util/dwarf-aux.h b/tools/perf/util/dwarf-aux.h
index 24446412b869..406a5b1e269b 100644
--- a/tools/perf/util/dwarf-aux.h
+++ b/tools/perf/util/dwarf-aux.h
@@ -146,6 +146,7 @@ struct die_var_type {
 	u64 addr;
 	int reg;
 	int offset;
+	const char *name;
 };
 
 /* Return type info of a member at offset */
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index add8601c57fd..c7dfeea1a23c 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2303,9 +2303,12 @@ static int hist_entry__typeoff_snprintf(struct hist_entry *he, char *bf,
 	char buf[4096];
 
 	buf[0] = '\0';
-	if (list_empty(&he_type->self.children))
+	if (list_empty(&he_type->self.children)) {
 		snprintf(buf, sizeof(buf), "no field");
-	else
+		if (he_type->self.var_name)
+			strcpy(buf, he_type->self.var_name);
+
+	} else
 		fill_member_name(buf, sizeof(buf), &he_type->self,
 				 he->mem_type_off, true);
 	buf[4095] = '\0';
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 0/9] Add data type profiling support for powerpc
  2024-05-06 12:18 [PATCH V2 0/9] Add data type profiling support for powerpc Athira Rajeev
                   ` (8 preceding siblings ...)
  2024-05-06 12:19 ` [PATCH V2 9/9] tools/perf: Add support for global_die to capture name of variable in case of register defined variable Athira Rajeev
@ 2024-05-07  4:39 ` Namhyung Kim
  2024-05-13  7:32   ` Athira Rajeev
  9 siblings, 1 reply; 29+ messages in thread
From: Namhyung Kim @ 2024-05-07  4:39 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: irogers, maddy, kjain, adrian.hunter, linux-kernel,
	linux-perf-users, acme, jolsa, akanksha, disgoel, linuxppc-dev

Hello,

On Mon, May 6, 2024 at 5:19 AM Athira Rajeev
<atrajeev@linux.vnet.ibm.com> wrote:
>
> The patchset from Namhyung added support for data type profiling
> in perf tool. This enabled support to associate PMU samples to data
> types they refer using DWARF debug information. With the upstream
> perf, currently it possible to run perf report or perf annotate to
> view the data type information on x86.
>
> Initial patchset posted here had changes need to enable data type
> profiling support for powerpc.
>
> https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/
>
> Main change were:
> 1. powerpc instruction nmemonic table to associate load/store
> instructions with move_ops which is use to identify if instruction
> is a memory access one.
> 2. To get register number and access offset from the given
> instruction, code uses fields from "struct arch" -> objump.
> Added entry for powerpc here.
> 3. A get_arch_regnum to return register number from the
> register name string.
>
> But the apporach used in the initial patchset used parsing of
> disassembled code which the current perf tool implementation does.
>
> Example: lwz     r10,0(r9)
>
> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> registers names and offset. Also to find whether there is a memory
> reference in the operands, "memory_ref_char" field of objdump is used.
> For x86, "(" is used as memory_ref_char to tackle instructions of the
> form "mov  (%rax), %rcx".
>
> In case of powerpc, not all instructions using "(" are the only memory
> instructions. Example, above instruction can also be of extended form (X
> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
> and extract the source/target registers, this patchset adds support to use
> raw instruction. With raw instruction, macros are added to extract opcode
> and register fields.
>
> Example representation using --show-raw-insn in objdump gives result:
>
> 38 01 81 e8     ld      r4,312(r1)
>
> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> this translates to instruction form: "ld RT,DS(RA)" and binary code
> as:
> _____________________________________
> | 58 |  RT  |  RA |      DS       | |
> -------------------------------------
> 0    6     11    16              30 31
>
> Patchset adds support to pick the opcode and reg fields from this
> raw/binary instruction code. This approach came in from review comment
> by Segher Boessenkool for the initial patchset.
>
> Apart from that, instruction tracking is enabled for powerpc and
> support function is added to find variables defined as registers
> Example, in powerpc, two registers are
> defined to represent variable:
> 1. r13: represents local_paca
> register struct paca_struct *local_paca asm("r13");
>
> 2. r1: represents stack_pointer
> register void *__stack_pointer asm("r1");
>
> These are handled in this patchset.
>
> - Patch 1 is to rearrange register state type structures to header file
> so that it can referred from other arch specific files
> - Patch 2 is to make instruction tracking as a callback to"struct arch"
> so that it can be implemented by other archs easily and defined in arch
> specific files
> - Patch 3 is to fix a small comment
> - Patch 4 adds support to capture and parse raw instruction in objdump
> by keeping existing approach intact.
> - Patch 5 update parameters for reg extract functions to use raw
> instruction on powerpc
> - Patch 6 and patch 7 handles instruction tracking for powerpc.
> - Patch 8 and Patch 8 handles support to find global register variables
>
> With the current patchset:
>
>  ./perf record -a -e mem-loads sleep 1
>  ./perf report -s type,typeoff --hierarchy --group --stdio
>  ./perf annotate --data-type --insn-stat
>
> perf annotate logs:
>
> Annotate Instruction stats
> total 562, ok 441 (78.5%), bad 121 (21.5%)
>
>   Name      :  Good   Bad
> -----------------------------------------------------------
>   ld        :   313    54
>   lwz       :    51    32
>   lbz       :    31     5
>   ldx       :     6    21
>   lhz       :    23     0
>   lwa       :     4     3
>   lwarx     :     5     0
>   lwzx      :     2     2
>   ldarx     :     3     0
>   lwzu      :     2     0
>   stdcx.    :     0     1
>   nop       :     0     1
>   ldu       :     1     0
>   lbzx      :     0     1
>   lwax      :     0     1
>
> perf report logs:
>
> # Samples: 1K of event 'mem-loads'
> # Event count (approx.): 937238
> #
> # Overhead  Data Type  Data Type Offset
> # ........  .........  ................
> #
>     48.81%  (unknown)  (unknown) +0 (no field)
>     12.85%  long unsigned int  long unsigned int +0 (current_stack_pointer)
>      4.68%  struct paca_struct  struct paca_struct +2312 (__current)
>      4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
>      2.68%  struct paca_struct  struct paca_struct +8 (paca_index)
>      2.64%  struct paca_struct  struct paca_struct +2808 (canary)
>      2.24%  struct paca_struct  struct paca_struct +48 (data_offset)
>      1.41%  struct vm_fault  struct vm_fault +0 (vma)
>      1.29%  struct task_struct  struct task_struct +276 (flags)
>      1.03%  struct pt_regs  struct pt_regs +264 (user_regs.msr)
>      1.00%  struct menu_device  struct menu_device +4 (tick_wakeup)
>      0.90%  struct security_hook_list  struct security_hook_list +0 (list.next)
>      0.76%  struct irq_desc  struct irq_desc +304 (irq_data.chip)
>      0.76%  struct rq  struct rq +2856 (cpu)

Looks great!  I'm glad it worked on powerpc too.
We still need to verify the returned type is properly annotated.
But overall it looks really good, I will leave comments in reply.

Thanks,
Namhyung

>
> Thanks
> Athira Rajeev
>
> Changelog:
> From v1->v2:
> - Addressed suggestion from Christophe Leroy and Segher Boessenkool
>   to use the binary code (raw insn) to fetch opcode, register and
>   offset fields.
> - Added support for instruction tracking in powerpc
> - Find the register defined variables (r13 and r1 which points to
>   local_paca and current_stack_pointer in powerpc)
>
> Athira Rajeev (9):
>   tools/perf: Move the data structures related to register  type to
>     header file
>   tools/perf: Add "update_insn_state" callback function to handle arch
>     specific instruction tracking
>   tools/perf: Fix a comment about multi_regs in extract_reg_offset
>     function
>   tools/perf: Add support to capture and parse raw instruction in
>     objdump
>   tools/perf: Update parameters for reg extract functions to use raw
>     instruction on powerpc
>   tools/perf: Update instruction tracking for powerpc
>   tools/perf: Update instruction tracking with add instruction
>   tools/perf: Add support to find global register variables using
>     find_data_type_global_reg
>   tools/perf: Add support for global_die to capture name of variable in
>     case of register defined variable
>
>  tools/include/linux/string.h                  |   2 +
>  tools/lib/string.c                            |  13 +
>  .../perf/arch/powerpc/annotate/instructions.c |  84 +++
>  tools/perf/arch/powerpc/util/dwarf-regs.c     |  52 ++
>  tools/perf/arch/x86/annotate/instructions.c   | 383 +++++++++++++
>  tools/perf/util/annotate-data.c               | 519 +++---------------
>  tools/perf/util/annotate-data.h               |  78 +++
>  tools/perf/util/annotate.c                    |  32 +-
>  tools/perf/util/annotate.h                    |   1 +
>  tools/perf/util/disasm.c                      | 109 +++-
>  tools/perf/util/disasm.h                      |  17 +-
>  tools/perf/util/dwarf-aux.c                   |   1 +
>  tools/perf/util/dwarf-aux.h                   |   1 +
>  tools/perf/util/include/dwarf-regs.h          |  12 +
>  tools/perf/util/sort.c                        |   7 +-
>  15 files changed, 854 insertions(+), 457 deletions(-)
>
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 3/9] tools/perf: Fix a comment about multi_regs in extract_reg_offset function
  2024-05-06 12:19 ` [PATCH V2 3/9] tools/perf: Fix a comment about multi_regs in extract_reg_offset function Athira Rajeev
@ 2024-05-07  4:40   ` Namhyung Kim
  2024-05-07 14:52     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 29+ messages in thread
From: Namhyung Kim @ 2024-05-07  4:40 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: irogers, maddy, kjain, adrian.hunter, linux-kernel,
	linux-perf-users, acme, jolsa, akanksha, disgoel, linuxppc-dev

On Mon, May 6, 2024 at 5:19 AM Athira Rajeev
<atrajeev@linux.vnet.ibm.com> wrote:
>
> Fix a comment in function which explains how multi_regs field gets set
> for an instruction. In the example, "mov  %rsi, 8(%rbx,%rcx,4)", the
> comment mistakenly referred to "dst_multi_regs = 0". Correct it to use
> "src_multi_regs = 0"
>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung

> ---
>  tools/perf/util/annotate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index f5b6b5e5e757..0f5e10654d09 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -2093,7 +2093,7 @@ static int extract_reg_offset(struct arch *arch, const char *str,
>   *   mov  0x18, %r8      # src_reg1 = -1, src_mem = 0
>   *                       # dst_reg1 = r8, dst_mem = 0
>   *
> - *   mov  %rsi, 8(%rbx,%rcx,4)  # src_reg1 = rsi, src_mem = 0, dst_multi_regs = 0
> + *   mov  %rsi, 8(%rbx,%rcx,4)  # src_reg1 = rsi, src_mem = 0, src_multi_regs = 0
>   *                              # dst_reg1 = rbx, dst_reg2 = rcx, dst_mem = 1
>   *                              # dst_multi_regs = 1, dst_offset = 8
>   */
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump
  2024-05-06 12:19 ` [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump Athira Rajeev
@ 2024-05-07  4:57   ` Namhyung Kim
  2024-05-07  9:35   ` Christophe Leroy
  1 sibling, 0 replies; 29+ messages in thread
From: Namhyung Kim @ 2024-05-07  4:57 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: irogers, maddy, kjain, adrian.hunter, linux-kernel,
	linux-perf-users, acme, jolsa, akanksha, disgoel, linuxppc-dev

On Mon, May 6, 2024 at 5:21 AM Athira Rajeev
<atrajeev@linux.vnet.ibm.com> wrote:
>
> Add support to capture and parse raw instruction in objdump.
> Currently, the perf tool infrastructure uses "--no-show-raw-insn" option
> with "objdump" while disassemble. Example from powerpc with this option
> for an instruction address is:
>
> Snippet from:
> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
>
> c0000000010224b4:       lwz     r10,0(r9)
>
> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> registers names and offset. Also to find whether there is a memory
> reference in the operands, "memory_ref_char" field of objdump is used.
> For x86, "(" is used as memory_ref_char to tackle instructions of the
> form "mov  (%rax), %rcx".
>
> In case of powerpc, not all instructions using "(" are the only memory
> instructions. Example, above instruction can also be of extended form (X
> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
> and extract the source/target registers, patch adds support to use raw
> instruction. With raw instruction, macros are added to extract opcode
> and register fields.
>
> "struct ins_operands" and "struct ins" is updated to carry opcode and
> raw instruction binary code (raw_insn). Function "disasm_line__parse"
> is updated to fill the raw instruction hex value and opcode in newly
> added fields. There is no changes in existing code paths, which parses
> the disassembled code. The architecture using the instruction name and
> present approach is not altered. Since this approach targets powerpc,
> the macro implementation is added for powerpc as of now.
>
> Example:
> representation using --show-raw-insn in objdump gives result:
>
> 38 01 81 e8     ld      r4,312(r1)
>
> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> this translates to instruction form: "ld RT,DS(RA)" and binary code
> as:
> _____________________________________
> | 58 |  RT  |  RA |      DS       | |
> -------------------------------------
> 0    6     11    16              30 31
>
> Function "disasm_line__parse" is updated to capture:
>
> line:    38 01 81 e8     ld      r4,312(r1)
> opcode and raw instruction "38 01 81 e8"
> Raw instruction is used later to extract the reg/offset fields.
>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  tools/include/linux/string.h              |  2 +
>  tools/lib/string.c                        | 13 +++++++
>  tools/perf/arch/powerpc/util/dwarf-regs.c | 19 ++++++++++
>  tools/perf/util/disasm.c                  | 46 +++++++++++++++++++----
>  tools/perf/util/disasm.h                  |  6 +++
>  tools/perf/util/include/dwarf-regs.h      |  9 +++++
>  6 files changed, 88 insertions(+), 7 deletions(-)
>
> diff --git a/tools/include/linux/string.h b/tools/include/linux/string.h
> index db5c99318c79..0acb1fc14e19 100644
> --- a/tools/include/linux/string.h
> +++ b/tools/include/linux/string.h
> @@ -46,5 +46,7 @@ extern char * __must_check skip_spaces(const char *);
>
>  extern char *strim(char *);
>
> +extern void remove_spaces(char *s);
> +
>  extern void *memchr_inv(const void *start, int c, size_t bytes);
>  #endif /* _TOOLS_LINUX_STRING_H_ */
> diff --git a/tools/lib/string.c b/tools/lib/string.c
> index 8b6892f959ab..21d273e69951 100644
> --- a/tools/lib/string.c
> +++ b/tools/lib/string.c
> @@ -153,6 +153,19 @@ char *strim(char *s)
>         return skip_spaces(s);
>  }
>
> +/*
> + * remove_spaces - Removes whitespaces from @s
> + */
> +void remove_spaces(char *s)
> +{
> +       char *d = s;
> +       do {
> +               while (*d == ' ') {
> +                       ++d;
> +               }
> +       } while ((*s++ = *d++));
> +}
> +
>  /**
>   * strreplace - Replace all occurrences of character in string.
>   * @s: The string to operate on.
> diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
> index 0c4f4caf53ac..e60a71fd846e 100644
> --- a/tools/perf/arch/powerpc/util/dwarf-regs.c
> +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
> @@ -98,3 +98,22 @@ int regs_query_register_offset(const char *name)
>                         return roff->ptregs_offset;
>         return -EINVAL;
>  }
> +
> +#define        PPC_OP(op)      (((op) >> 26) & 0x3F)
> +#define PPC_RA(a)      (((a) >> 16) & 0x1f)
> +#define PPC_RT(t)      (((t) >> 21) & 0x1f)
> +
> +int get_opcode_insn(unsigned int raw_insn)
> +{
> +       return PPC_OP(raw_insn);
> +}
> +
> +int get_source_reg(unsigned int raw_insn)
> +{
> +       return PPC_RA(raw_insn);
> +}
> +
> +int get_target_reg(unsigned int raw_insn)
> +{
> +       return PPC_RT(raw_insn);
> +}
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index 2de66a092cab..85692f73e78f 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -43,7 +43,7 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
>                            struct ins_operands *ops, int max_ins_name);
>
>  static void ins__sort(struct arch *arch);
> -static int disasm_line__parse(char *line, const char **namep, char **rawp);
> +static int disasm_line__parse(char *line, const char **namep, char **rawp, int *opcode, int *rawp_insn);
>
>  static __attribute__((constructor)) void symbol__init_regexpr(void)
>  {
> @@ -512,7 +512,7 @@ static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map_s
>         if (ops->locked.ops == NULL)
>                 return 0;
>
> -       if (disasm_line__parse(ops->raw, &ops->locked.ins.name, &ops->locked.ops->raw) < 0)
> +       if (disasm_line__parse(ops->raw, &ops->locked.ins.name, &ops->locked.ops->raw, &ops->locked.ins.opcode, &ops->locked.ops->raw_insn) < 0)

This line is too long.


>                 goto out_free_ops;
>
>         ops->locked.ins.ops = ins__find(arch, ops->locked.ins.name);
> @@ -815,11 +815,38 @@ static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, str
>                 dl->ins.ops = NULL;
>  }
>
> -static int disasm_line__parse(char *line, const char **namep, char **rawp)
> +int __weak get_opcode_insn(unsigned int raw_insn __maybe_unused)
>  {
> -       char tmp, *name = skip_spaces(line);
> +       return -1;
> +}
> +
> +int __weak get_source_reg(unsigned int raw_insn __maybe_unused)
> +{
> +       return -1;
> +}
> +
> +int __weak get_target_reg(unsigned int raw_insn __maybe_unused)
> +{
> +       return -1;
> +}

I prefer having conditional code with #ifdef rather than weak
functions especially if it isn't needed for every arch.

> +
> +/*
> + * Parses the objdump result captured with --show-raw-insn.
> + * Example, objdump line from powerpc:
> + * line:    38 01 81 e8     ld      r4,312(r1)
> + * namep : ld
> + * rawp  : "ld r4,312(r1)"
> + * opcode: fetched from arch specific get_opcode_insn
> + * rawp_insn: e8810138
> + *
> + * rawp_insn is used later to extract the reg/offset fields
> + */
> +static int disasm_line__parse(char *line, const char **namep, char **rawp, int *opcode, int *rawp_insn)
> +{
> +       char tmp, *tmp_opcode, *name_opcode = skip_spaces(line);
> +       char *name = skip_spaces(name_opcode + 11);
>
> -       if (name[0] == '\0')
> +       if (name_opcode[0] == '\0')
>                 return -1;
>
>         *rawp = name + 1;
> @@ -829,13 +856,18 @@ static int disasm_line__parse(char *line, const char **namep, char **rawp)
>
>         tmp = (*rawp)[0];
>         (*rawp)[0] = '\0';
> +       tmp_opcode = strdup(name_opcode);
> +       tmp_opcode[11] = '\0';

Looks like powerpc specific.  Can we move it under arch check?


> +       remove_spaces(tmp_opcode);
>         *namep = strdup(name);
> +       *opcode = get_opcode_insn(be32_to_cpu(strtol(tmp_opcode, NULL, 16)));

This as well.  Maybe we could have per-arch disasm_line__parse().

>
>         if (*namep == NULL)
>                 goto out;
>
>         (*rawp)[0] = tmp;
>         *rawp = strim(*rawp);
> +       *rawp_insn = be32_to_cpu(strtol(tmp_opcode, NULL, 16));
>
>         return 0;
>
> @@ -896,7 +928,7 @@ struct disasm_line *disasm_line__new(struct annotate_args *args)
>                 goto out_delete;
>
>         if (args->offset != -1) {
> -               if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw) < 0)
> +               if (disasm_line__parse(dl->al.line, &dl->ins.name, &dl->ops.raw, &dl->ins.opcode, &dl->ops.raw_insn) < 0)
>                         goto out_free_line;
>
>                 disasm_line__init_ins(dl, args->arch, &args->ms);
> @@ -1726,7 +1758,7 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
>                  map__rip_2objdump(map, sym->start),
>                  map__rip_2objdump(map, sym->end),
>                  opts->show_linenr ? "-l" : "",
> -                opts->show_asm_raw ? "" : "--no-show-raw-insn",
> +                opts->show_asm_raw ? "" : "--show-raw-insn",

This is not what we want.  According to the man page of objdump
the --show-raw-insn should be enabled by default.  So I think the
default value prints nothing.  But if it's not the case on powerpc
(maybe on x86 too?) we could add it in the positive case too and
make powerpc init code reset the option.

Thanks,
Namhyung


>                  opts->annotate_src ? "-S" : "",
>                  opts->prefix ? "--prefix " : "",
>                  opts->prefix ? '"' : ' ',
> diff --git a/tools/perf/util/disasm.h b/tools/perf/util/disasm.h
> index 718177fa4775..5c1520010ca7 100644
> --- a/tools/perf/util/disasm.h
> +++ b/tools/perf/util/disasm.h
> @@ -43,14 +43,18 @@ struct arch {
>
>  struct ins {
>         const char     *name;
> +       int     opcode;
>         struct ins_ops *ops;
>  };
>
>  struct ins_operands {
>         char    *raw;
> +       int     raw_insn;
>         struct {
>                 char    *raw;
>                 char    *name;
> +               int     opcode;
> +               int     raw_insn;
>                 struct symbol *sym;
>                 u64     addr;
>                 s64     offset;
> @@ -62,6 +66,8 @@ struct ins_operands {
>                 struct {
>                         char    *raw;
>                         char    *name;
> +                       int     opcode;
> +                       int     raw_insn;
>                         u64     addr;
>                         bool    multi_regs;
>                 } source;
> diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
> index 01fb25a1150a..2a4e1e16e52c 100644
> --- a/tools/perf/util/include/dwarf-regs.h
> +++ b/tools/perf/util/include/dwarf-regs.h
> @@ -31,6 +31,15 @@ static inline int get_dwarf_regnum(const char *name __maybe_unused,
>  }
>  #endif
>
> +/*
> + * get_opcode_insn - Return opcode from raw instruction
> + * get_source_reg - Return source reg from raw instruction
> + * get_target_reg - Return target reg from raw instruction
> + */
> +int get_opcode_insn(unsigned int raw_insn);
> +int get_source_reg(unsigned int raw_insn);
> +int get_target_reg(unsigned int raw_insn);
> +
>  #ifdef HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET
>  /*
>   * Arch should support fetching the offset of a register in pt_regs
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump
  2024-05-06 12:19 ` [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump Athira Rajeev
  2024-05-07  4:57   ` Namhyung Kim
@ 2024-05-07  9:35   ` Christophe Leroy
  2024-05-09 17:26     ` Athira Rajeev
  1 sibling, 1 reply; 29+ messages in thread
From: Christophe Leroy @ 2024-05-07  9:35 UTC (permalink / raw)
  To: Athira Rajeev, acme, jolsa, adrian.hunter, irogers, namhyung, segher
  Cc: maddy, kjain, linux-kernel, akanksha, linux-perf-users, disgoel,
	linuxppc-dev



Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
> Add support to capture and parse raw instruction in objdump.

What's the purpose of using 'objdump' for reading raw instructions ? 
Can't they be read directly without invoking 'objdump' ? It looks odd to 
me to use objdump to provide readable text and then parse it back.

> Currently, the perf tool infrastructure uses "--no-show-raw-insn" option
> with "objdump" while disassemble. Example from powerpc with this option
> for an instruction address is:

Yes and that makes sense because the purpose of objdump is to provide 
human readable annotations, not to perform automated analysis. Am I 
missing something ?

> 
> Snippet from:
> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
> 
> c0000000010224b4:	lwz     r10,0(r9)
> 
> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> registers names and offset. Also to find whether there is a memory
> reference in the operands, "memory_ref_char" field of objdump is used.
> For x86, "(" is used as memory_ref_char to tackle instructions of the
> form "mov  (%rax), %rcx".
> 
> In case of powerpc, not all instructions using "(" are the only memory
> instructions. Example, above instruction can also be of extended form (X
> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
> and extract the source/target registers, patch adds support to use raw
> instruction. With raw instruction, macros are added to extract opcode
> and register fields.
> 
> "struct ins_operands" and "struct ins" is updated to carry opcode and
> raw instruction binary code (raw_insn). Function "disasm_line__parse"
> is updated to fill the raw instruction hex value and opcode in newly
> added fields. There is no changes in existing code paths, which parses
> the disassembled code. The architecture using the instruction name and
> present approach is not altered. Since this approach targets powerpc,
> the macro implementation is added for powerpc as of now.
> 
> Example:
> representation using --show-raw-insn in objdump gives result:
> 
> 38 01 81 e8     ld      r4,312(r1)
> 
> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> this translates to instruction form: "ld RT,DS(RA)" and binary code
> as:
> _____________________________________
> | 58 |  RT  |  RA |      DS       | |
> -------------------------------------
> 0    6     11    16              30 31
> 
> Function "disasm_line__parse" is updated to capture:
> 
> line:    38 01 81 e8     ld      r4,312(r1)
> opcode and raw instruction "38 01 81 e8"
> Raw instruction is used later to extract the reg/offset fields.
> 
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 5/9] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc
  2024-05-06 12:19 ` [PATCH V2 5/9] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc Athira Rajeev
@ 2024-05-07  9:48   ` Christophe Leroy
  2024-05-22 14:06     ` Athira Rajeev
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe Leroy @ 2024-05-07  9:48 UTC (permalink / raw)
  To: Athira Rajeev, acme, jolsa, adrian.hunter, irogers, namhyung, segher
  Cc: maddy, kjain, linux-kernel, akanksha, linux-perf-users, disgoel,
	linuxppc-dev



Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
> Use the raw instruction code and macros to identify memory instructions,
> extract register fields and also offset. The implementation addresses
> the D-form, X-form, DS-form instructions. Two main functions are added.
> New parse function "load_store__parse" as instruction ops parser for
> memory instructions. Unlink other parser (like mov__parse), this parser
> fills in only the "raw" field for source/target and new added "mem_ref"
> field. This is because, here there is no need to parse the disassembled
> code and arch specific macros will take care of extracting offset and
> regs which is easier and will be precise.
> 
> In powerpc, all instructions with a primary opcode from 32 to 63
> are memory instructions. Update "ins__find" function to have "opcode"
> also as a parameter. Don't use the "extract_reg_offset", instead use
> newly added function "get_arch_regs" which will set these fields: reg1,
> reg2, offset depending of where it is source or target ops.

Yes all instructions with a primary opcode from 32 to 63 are memory 
instructions, but not all memory instructions have opcode 32 to 63.

> 
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>   tools/perf/arch/powerpc/util/dwarf-regs.c | 33 +++++++++++++
>   tools/perf/util/annotate.c                | 22 ++++++++-
>   tools/perf/util/disasm.c                  | 59 +++++++++++++++++++++--
>   tools/perf/util/disasm.h                  |  4 +-
>   tools/perf/util/include/dwarf-regs.h      |  4 +-
>   5 files changed, 114 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
> index e60a71fd846e..3121c70dc0d3 100644
> --- a/tools/perf/arch/powerpc/util/dwarf-regs.c
> +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
> @@ -102,6 +102,9 @@ int regs_query_register_offset(const char *name)
>   #define	PPC_OP(op)	(((op) >> 26) & 0x3F)
>   #define PPC_RA(a)	(((a) >> 16) & 0x1f)
>   #define PPC_RT(t)	(((t) >> 21) & 0x1f)
> +#define PPC_RB(b)	(((b) >> 11) & 0x1f)
> +#define PPC_D(D)	((D) & 0xfffe)
> +#define PPC_DS(DS)	((DS) & 0xfffc)
>   
>   int get_opcode_insn(unsigned int raw_insn)
>   {
> @@ -117,3 +120,33 @@ int get_target_reg(unsigned int raw_insn)
>   {
>   	return PPC_RT(raw_insn);
>   }
> +
> +int get_offset_opcode(int raw_insn __maybe_unused)
> +{
> +	int opcode = PPC_OP(raw_insn);
> +
> +	/* DS- form */
> +	if ((opcode == 58) || (opcode == 62))

Can you re-use macros from arch/powerpc/include/asm/ppc-opcode.h ?

#define OP_LD		58
#define OP_STD		62


> +		return PPC_DS(raw_insn);
> +	else
> +		return PPC_D(raw_insn);
> +}
> +
> +/*
> + * Fills the required fields for op_loc depending on if it
> + * is a source of target.
> + * D form: ins RT,D(RA) -> src_reg1 = RA, offset = D, dst_reg1 = RT
> + * DS form: ins RT,DS(RA) -> src_reg1 = RA, offset = DS, dst_reg1 = RT
> + * X form: ins RT,RA,RB -> src_reg1 = RA, src_reg2 = RB, dst_reg1 = RT
> + */
> +void get_arch_regs(int raw_insn __maybe_unused, int is_source __maybe_unused, struct annotated_op_loc *op_loc __maybe_unused)
> +{
> +	if (is_source)
> +		op_loc->reg1 = get_source_reg(raw_insn);
> +	else
> +		op_loc->reg1 = get_target_reg(raw_insn);
> +	if (op_loc->multi_regs)
> +		op_loc->reg2 = PPC_RB(raw_insn);
> +	if (op_loc->mem_ref)
> +		op_loc->offset = get_offset_opcode(raw_insn);
> +}

> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index 85692f73e78f..f41a0fadeab4 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -760,11 +800,20 @@ static void ins__sort(struct arch *arch)
>   	qsort(arch->instructions, nmemb, sizeof(struct ins), ins__cmp);
>   }
>   
> -static struct ins_ops *__ins__find(struct arch *arch, const char *name)
> +static struct ins_ops *__ins__find(struct arch *arch, const char *name, int opcode)
>   {
>   	struct ins *ins;
>   	const int nmemb = arch->nr_instructions;
>   
> +	if (arch__is(arch, "powerpc")) {
> +		/*
> +		 * Instructions with a primary opcode from 32 to 63
> +		 * are memory instructions in powerpc.
> +		 */
> +		if ((opcode >= 31) && (opcode <= 63))

Could just be if ((opcode & 0x20)) I guess.

By the way your test is wrong because opcode 31 is not only memory 
instructions, see example below (not exhaustive):

#define OP_31_XOP_TRAP      4		==> No
#define OP_31_XOP_LDX       21		==> Yes
#define OP_31_XOP_LWZX      23		==> Yes
#define OP_31_XOP_LDUX      53		==> Yes
#define OP_31_XOP_DCBST     54		==> No
#define OP_31_XOP_LWZUX     55		==> Yes
#define OP_31_XOP_TRAP_64   68		==> No
#define OP_31_XOP_DCBF      86		==> No
#define OP_31_XOP_LBZX      87		==> Yes
#define OP_31_XOP_STDX      149		==> Yes
#define OP_31_XOP_STWX      151		==> Yes




> +			return &load_store_ops;
> +	}
> +
>   	if (!arch->sorted_instructions) {
>   		ins__sort(arch);
>   		arch->sorted_instructions = true;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 6/9] tools/perf: Update instruction tracking for powerpc
  2024-05-06 12:19 ` [PATCH V2 6/9] tools/perf: Update instruction tracking for powerpc Athira Rajeev
@ 2024-05-07  9:52   ` Christophe Leroy
  2024-05-23 13:58     ` Athira Rajeev
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe Leroy @ 2024-05-07  9:52 UTC (permalink / raw)
  To: Athira Rajeev, acme, jolsa, adrian.hunter, irogers, namhyung, segher
  Cc: maddy, kjain, linux-kernel, akanksha, linux-perf-users, disgoel,
	linuxppc-dev



Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
> Add instruction tracking function "update_insn_state_powerpc" for
> powerpc. Example sequence in powerpc:
> 
> ld      r10,264(r3)
> mr      r31,r3
> <<after some sequence>
> ld      r9,312(r31)

Your approach looks fragile.

mr is a simplified instruction which in fact is  "or r31, r3, r3"

By the way, the compiler sometimes does it different, like below:

lwz	r10,264(r3)
addi	r31, r3, 312
lwz	r9, 0(r31)

And what about sequences with lwzu ?

> 
> Consider ithe sample is pointing to: "ld r9,312(r31)".
> Here the memory reference is hit at "312(r31)" where 312 is the offset
> and r31 is the source register. Previous instruction sequence shows that
> register state of r3 is moved to r31. So to identify the data type for r31
> access, the previous instruction ("mr") needs to be tracked and the
> state type entry has to be updated. Current instruction tracking support
> in perf tools infrastructure is specific to x86. Patch adds this for
> powerpc and adds "mr" instruction to be tracked.
> 
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 7/9] tools/perf: Update instruction tracking with add instruction
  2024-05-06 12:19 ` [PATCH V2 7/9] tools/perf: Update instruction tracking with add instruction Athira Rajeev
@ 2024-05-07  9:58   ` Christophe Leroy
  2024-05-23 13:55     ` Athira Rajeev
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe Leroy @ 2024-05-07  9:58 UTC (permalink / raw)
  To: Athira Rajeev, acme, jolsa, adrian.hunter, irogers, namhyung, segher
  Cc: maddy, kjain, linux-kernel, akanksha, linux-perf-users, disgoel,
	linuxppc-dev



Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
> Update instruction tracking with add instruction. Apart from "mr"
> instruction, the register state is carried on by other insns, ie,
> "add, addi, addis". Since these are not memory instructions and doesn't
> fall in the range of (32 to 63), add these as part of nmemonic table.
> For now, add* instructions are added. There is possibility of getting
> more added here. But to extract regs, still the binary code will be
> used. So associate this with "load_store_ops" itself and no other
> changes required.

Looks fragile.

How do you handle addc, adde, addic ?
And also addme, addze, which only have two operands instead of 3 ?

> 
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>   .../perf/arch/powerpc/annotate/instructions.c | 21 +++++++++++++++++++
>   tools/perf/util/disasm.c                      |  1 +
>   2 files changed, 22 insertions(+)
> 
> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
> index cce7023951fe..1f35d8a65bb4 100644
> --- a/tools/perf/arch/powerpc/annotate/instructions.c
> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
> @@ -1,6 +1,17 @@
>   // SPDX-License-Identifier: GPL-2.0
>   #include <linux/compiler.h>
>   
> +/*
> + * powerpc instruction nmemonic table to associate load/store instructions with
> + * move_ops. mov_ops is used to identify add/mr to do instruction tracking.
> + */
> +static struct ins powerpc__instructions[] = {
> +	{ .name = "mr",         .ops = &load_store_ops,  },
> +	{ .name = "addi",       .ops = &load_store_ops,   },
> +	{ .name = "addis",      .ops = &load_store_ops,  },
> +	{ .name = "add",        .ops = &load_store_ops,  },
> +};
> +
>   static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, const char *name)
>   {
>   	int i;
> @@ -75,6 +86,9 @@ static void update_insn_state_powerpc(struct type_state *state,
>   	if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
>   		return;
>   
> +	if (!strncmp(dl->ins.name, "add", 3))
> +		goto regs_check;
> +
>   	if (strncmp(dl->ins.name, "mr", 2))
>   		return;
>   
> @@ -85,6 +99,7 @@ static void update_insn_state_powerpc(struct type_state *state,
>   		dst->reg1 = src_reg;
>   	}
>   
> +regs_check:
>   	if (!has_reg_type(state, dst->reg1))
>   		return;
>   
> @@ -115,6 +130,12 @@ static void update_insn_state_powerpc(struct type_state *state __maybe_unused, s
>   static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>   {
>   	if (!arch->initialized) {
> +		arch->nr_instructions = ARRAY_SIZE(powerpc__instructions);
> +		arch->instructions = calloc(arch->nr_instructions, sizeof(struct ins));
> +		if (!arch->instructions)
> +			return -ENOMEM;
> +		memcpy(arch->instructions, powerpc__instructions, sizeof(struct ins) * arch->nr_instructions);
> +		arch->nr_instructions_allocated = arch->nr_instructions;
>   		arch->initialized = true;
>   		arch->associate_instruction_ops = powerpc__associate_instruction_ops;
>   		arch->objdump.comment_char      = '#';
> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
> index ac6b8b8da38a..32cf506a9010 100644
> --- a/tools/perf/util/disasm.c
> +++ b/tools/perf/util/disasm.c
> @@ -36,6 +36,7 @@ static struct ins_ops mov_ops;
>   static struct ins_ops nop_ops;
>   static struct ins_ops lock_ops;
>   static struct ins_ops ret_ops;
> +static struct ins_ops load_store_ops;
>   
>   static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
>   			   struct ins_operands *ops, int max_ins_name);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 8/9] tools/perf: Add support to find global register variables using find_data_type_global_reg
  2024-05-06 12:19 ` [PATCH V2 8/9] tools/perf: Add support to find global register variables using find_data_type_global_reg Athira Rajeev
@ 2024-05-07 10:03   ` Christophe Leroy
  2024-05-24 12:17     ` Athira Rajeev
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe Leroy @ 2024-05-07 10:03 UTC (permalink / raw)
  To: Athira Rajeev, acme, jolsa, adrian.hunter, irogers, namhyung, segher
  Cc: maddy, kjain, linux-kernel, akanksha, linux-perf-users, disgoel,
	linuxppc-dev



Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
> There are cases where define a global register variable and associate it
> with a specified register. Example, in powerpc, two registers are
> defined to represent variable:
> 1. r13: represents local_paca
> register struct paca_struct *local_paca asm("r13");
> 
> 2. r1: represents stack_pointer
> register void *__stack_pointer asm("r1");

What about r2:

register struct task_struct *current asm ("r2");

> 
> These regs are present in dwarf debug as DW_OP_reg as part of variables
> in the cu_die (compile unit). These are not present in die search done
> in the list of nested scopes since these are global register variables.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 3/9] tools/perf: Fix a comment about multi_regs in extract_reg_offset function
  2024-05-07  4:40   ` Namhyung Kim
@ 2024-05-07 14:52     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-05-07 14:52 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: irogers, Athira Rajeev, kjain, adrian.hunter, linux-kernel,
	linux-perf-users, maddy, jolsa, akanksha, disgoel, linuxppc-dev

On Mon, May 06, 2024 at 09:40:15PM -0700, Namhyung Kim wrote:
> On Mon, May 6, 2024 at 5:19 AM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com> wrote:
> >
> > Fix a comment in function which explains how multi_regs field gets set
> > for an instruction. In the example, "mov  %rsi, 8(%rbx,%rcx,4)", the
> > comment mistakenly referred to "dst_multi_regs = 0". Correct it to use
> > "src_multi_regs = 0"
> >
> > Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> 
> Acked-by: Namhyung Kim <namhyung@kernel.org>

Cherry picked this one into perf-tools-next.

Thanks,

- Arnaldo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump
  2024-05-07  9:35   ` Christophe Leroy
@ 2024-05-09 17:26     ` Athira Rajeev
  2024-05-09 20:55       ` Namhyung Kim
  2024-05-10 14:26       ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 29+ messages in thread
From: Athira Rajeev @ 2024-05-09 17:26 UTC (permalink / raw)
  To: Christophe Leroy, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers
  Cc: maddy, kjain, adrian.hunter, linux-kernel, linux-perf-users,
	jolsa, akanksha, disgoel, linuxppc-dev



> On 7 May 2024, at 3:05 PM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> 
> 
> 
> Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
>> Add support to capture and parse raw instruction in objdump.
> 
> What's the purpose of using 'objdump' for reading raw instructions ? 
> Can't they be read directly without invoking 'objdump' ? It looks odd to 
> me to use objdump to provide readable text and then parse it back.

Hi Christophe,

Thanks for your review comments.

Current implementation for data type profiling on X86 uses "objdump" tool to get the disassembled code.
And then the objdump result lines are parsed to get the instruction name and register fields. The initial patchset I posted to enable the data type profiling feature in powerpc was using the same way by getting disassembled code from objdump and parsing the disassembled lines. But in V2, we are introducing change for powerpc to use "raw instruction" and fetch opcode, reg fields from the raw instruction.

I tried to explain below that current objdump uses option "--no-show-raw-insn" which doesn't capture raw instruction.  So to capture raw instruction, V2 patchset has changes to use default option "--show-raw-insn" and get the raw instruction [ for powerpc ] along with human readable annotation [ which is used by other archs ]. Since perf tool already has objdump implementation in place, I went in the direction to enhance it to use "--show-raw-insn" for powerpc purpose.

But as you mentioned, we can directly read raw instruction without using "objdump" tool.
perf has support to read object code. The dso open/read utilities and helper functions are already present in "util/dso.c" And "dso__data_read_offset" function reads data from dso file offset. We can use these functions and I can make changes to directly read binary instruction without using objdump.

Namhyung, Arnaldo, Christophe
Looking for your valuable feedback on this approach. Please suggest if this approach looks fine


Thanks
Athira
> 
>> Currently, the perf tool infrastructure uses "--no-show-raw-insn" option
>> with "objdump" while disassemble. Example from powerpc with this option
>> for an instruction address is:
> 
> Yes and that makes sense because the purpose of objdump is to provide 
> human readable annotations, not to perform automated analysis. Am I 
> missing something ?
> 
>> 
>> Snippet from:
>> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
>> 
>> c0000000010224b4: lwz     r10,0(r9)
>> 
>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>> registers names and offset. Also to find whether there is a memory
>> reference in the operands, "memory_ref_char" field of objdump is used.
>> For x86, "(" is used as memory_ref_char to tackle instructions of the
>> form "mov  (%rax), %rcx".
>> 
>> In case of powerpc, not all instructions using "(" are the only memory
>> instructions. Example, above instruction can also be of extended form (X
>> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
>> and extract the source/target registers, patch adds support to use raw
>> instruction. With raw instruction, macros are added to extract opcode
>> and register fields.
>> 
>> "struct ins_operands" and "struct ins" is updated to carry opcode and
>> raw instruction binary code (raw_insn). Function "disasm_line__parse"
>> is updated to fill the raw instruction hex value and opcode in newly
>> added fields. There is no changes in existing code paths, which parses
>> the disassembled code. The architecture using the instruction name and
>> present approach is not altered. Since this approach targets powerpc,
>> the macro implementation is added for powerpc as of now.
>> 
>> Example:
>> representation using --show-raw-insn in objdump gives result:
>> 
>> 38 01 81 e8     ld      r4,312(r1)
>> 
>> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
>> this translates to instruction form: "ld RT,DS(RA)" and binary code
>> as:
>> _____________________________________
>> | 58 |  RT  |  RA |      DS       | |
>> -------------------------------------
>> 0    6     11    16              30 31
>> 
>> Function "disasm_line__parse" is updated to capture:
>> 
>> line:    38 01 81 e8     ld      r4,312(r1)
>> opcode and raw instruction "38 01 81 e8"
>> Raw instruction is used later to extract the reg/offset fields.
>> 
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump
  2024-05-09 17:26     ` Athira Rajeev
@ 2024-05-09 20:55       ` Namhyung Kim
  2024-05-10 14:26       ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 29+ messages in thread
From: Namhyung Kim @ 2024-05-09 20:55 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: Ian Rogers, maddy, kjain, adrian.hunter,
	Arnaldo Carvalho de Melo, linux-kernel, linux-perf-users, jolsa,
	akanksha, disgoel, linuxppc-dev

On Thu, May 9, 2024 at 10:27 AM Athira Rajeev
<atrajeev@linux.vnet.ibm.com> wrote:
>
>
>
> > On 7 May 2024, at 3:05 PM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> >
> >
> >
> > Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
> >> Add support to capture and parse raw instruction in objdump.
> >
> > What's the purpose of using 'objdump' for reading raw instructions ?
> > Can't they be read directly without invoking 'objdump' ? It looks odd to
> > me to use objdump to provide readable text and then parse it back.
>
> Hi Christophe,
>
> Thanks for your review comments.
>
> Current implementation for data type profiling on X86 uses "objdump" tool to get the disassembled code.
> And then the objdump result lines are parsed to get the instruction name and register fields. The initial patchset I posted to enable the data type profiling feature in powerpc was using the same way by getting disassembled code from objdump and parsing the disassembled lines. But in V2, we are introducing change for powerpc to use "raw instruction" and fetch opcode, reg fields from the raw instruction.
>
> I tried to explain below that current objdump uses option "--no-show-raw-insn" which doesn't capture raw instruction.  So to capture raw instruction, V2 patchset has changes to use default option "--show-raw-insn" and get the raw instruction [ for powerpc ] along with human readable annotation [ which is used by other archs ]. Since perf tool already has objdump implementation in place, I went in the direction to enhance it to use "--show-raw-insn" for powerpc purpose.
>
> But as you mentioned, we can directly read raw instruction without using "objdump" tool.
> perf has support to read object code. The dso open/read utilities and helper functions are already present in "util/dso.c" And "dso__data_read_offset" function reads data from dso file offset. We can use these functions and I can make changes to directly read binary instruction without using objdump.
>
> Namhyung, Arnaldo, Christophe
> Looking for your valuable feedback on this approach. Please suggest if this approach looks fine

Looks like you want to implement instruction decoding
like in arch/x86/lib/{insn,inat}.c.  I think it's ok to do that
but you need to decide which way is more convenient.

Also it works on the struct disasm_line so you need to
fill in the necessary info when not using objdump.  As
long as it produces the same output I don't care much
if you use objdump or not.  Actually it uses libcapstone
to disassemble x86 instructions if possible.  Maybe you
can use that on powerpc too.

Thanks,
Namhyung

>
>
> Thanks
> Athira
> >
> >> Currently, the perf tool infrastructure uses "--no-show-raw-insn" option
> >> with "objdump" while disassemble. Example from powerpc with this option
> >> for an instruction address is:
> >
> > Yes and that makes sense because the purpose of objdump is to provide
> > human readable annotations, not to perform automated analysis. Am I
> > missing something ?
> >
> >>
> >> Snippet from:
> >> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
> >>
> >> c0000000010224b4: lwz     r10,0(r9)
> >>
> >> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> >> registers names and offset. Also to find whether there is a memory
> >> reference in the operands, "memory_ref_char" field of objdump is used.
> >> For x86, "(" is used as memory_ref_char to tackle instructions of the
> >> form "mov  (%rax), %rcx".
> >>
> >> In case of powerpc, not all instructions using "(" are the only memory
> >> instructions. Example, above instruction can also be of extended form (X
> >> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
> >> and extract the source/target registers, patch adds support to use raw
> >> instruction. With raw instruction, macros are added to extract opcode
> >> and register fields.
> >>
> >> "struct ins_operands" and "struct ins" is updated to carry opcode and
> >> raw instruction binary code (raw_insn). Function "disasm_line__parse"
> >> is updated to fill the raw instruction hex value and opcode in newly
> >> added fields. There is no changes in existing code paths, which parses
> >> the disassembled code. The architecture using the instruction name and
> >> present approach is not altered. Since this approach targets powerpc,
> >> the macro implementation is added for powerpc as of now.
> >>
> >> Example:
> >> representation using --show-raw-insn in objdump gives result:
> >>
> >> 38 01 81 e8     ld      r4,312(r1)
> >>
> >> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> >> this translates to instruction form: "ld RT,DS(RA)" and binary code
> >> as:
> >> _____________________________________
> >> | 58 |  RT  |  RA |      DS       | |
> >> -------------------------------------
> >> 0    6     11    16              30 31
> >>
> >> Function "disasm_line__parse" is updated to capture:
> >>
> >> line:    38 01 81 e8     ld      r4,312(r1)
> >> opcode and raw instruction "38 01 81 e8"
> >> Raw instruction is used later to extract the reg/offset fields.
> >>
> >> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> >> ---
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump
  2024-05-09 17:26     ` Athira Rajeev
  2024-05-09 20:55       ` Namhyung Kim
@ 2024-05-10 14:26       ` Arnaldo Carvalho de Melo
  2024-05-22 13:58         ` Athira Rajeev
  1 sibling, 1 reply; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-05-10 14:26 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: Ian Rogers, maddy, kjain, adrian.hunter, linux-kernel,
	linux-perf-users, jolsa, akanksha, Namhyung Kim, disgoel,
	linuxppc-dev

On Thu, May 09, 2024 at 10:56:23PM +0530, Athira Rajeev wrote:
> 
> 
> > On 7 May 2024, at 3:05 PM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> > 
> > 
> > 
> > Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
> >> Add support to capture and parse raw instruction in objdump.
> > 
> > What's the purpose of using 'objdump' for reading raw instructions ? 
> > Can't they be read directly without invoking 'objdump' ? It looks odd to 
> > me to use objdump to provide readable text and then parse it back.
> 
> Hi Christophe,
> 
> Thanks for your review comments.
> 
> Current implementation for data type profiling on X86 uses "objdump" tool to get the disassembled code.

commit 6d17edc113de1e21fc66afa76be475a4f7c91826
Author: Namhyung Kim <namhyung@kernel.org>
Date:   Fri Mar 29 14:58:11 2024 -0700

    perf annotate: Use libcapstone to disassemble
    
    Now it can use the capstone library to disassemble the instructions.
    Let's use that (if available) for perf annotate to speed up.  Currently
    it only supports x86 architecture.  With this change I can see ~3x speed
    up in data type profiling.
    
    But note that capstone cannot give the source file and line number info.
    For now, users should use the external objdump for that by specifying
    the --objdump option explicitly.
    
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Tested-by: Ian Rogers <irogers@google.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Changbin Du <changbin.du@huawei.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Kan Liang <kan.liang@linux.intel.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20240329215812.537846-5-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

From a quick look at http://www.capstone-engine.org/compile.html it
seems PowerPC is supported.

But since we did it first with objdump output parsing, its good to have
it as an alternative and sometimes a fallback:

commit f35847de2a65137e011e559f38a3de5902a5463f
Author: Namhyung Kim <namhyung@kernel.org>
Date:   Wed Apr 24 17:51:56 2024 -0700

    perf annotate: Fallback disassemble to objdump when capstone fails
    
    I found some cases that capstone failed to disassemble.  Probably my
    capstone is an old version but anyway there's a chance it can fail.  And
    then it silently stopped in the middle.  In my case, it didn't
    understand "RDPKRU" instruction.
    
    Let's check if the capstone disassemble reached the end of the function
    and fallback to objdump if not

---------------

- Arnaldo

> And then the objdump result lines are parsed to get the instruction
> name and register fields. The initial patchset I posted to enable the
> data type profiling feature in powerpc was using the same way by
> getting disassembled code from objdump and parsing the disassembled
> lines. But in V2, we are introducing change for powerpc to use "raw
> instruction" and fetch opcode, reg fields from the raw instruction.
 
> I tried to explain below that current objdump uses option
> "--no-show-raw-insn" which doesn't capture raw instruction.  So to
> capture raw instruction, V2 patchset has changes to use default option
> "--show-raw-insn" and get the raw instruction [ for powerpc ] along
> with human readable annotation [ which is used by other archs ]. Since
> perf tool already has objdump implementation in place, I went in the
> direction to enhance it to use "--show-raw-insn" for powerpc purpose.
 
> But as you mentioned, we can directly read raw instruction without
> using "objdump" tool.  perf has support to read object code. The dso
> open/read utilities and helper functions are already present in
> "util/dso.c" And "dso__data_read_offset" function reads data from dso
> file offset. We can use these functions and I can make changes to
> directly read binary instruction without using objdump.
 
> Namhyung, Arnaldo, Christophe
> Looking for your valuable feedback on this approach. Please suggest if this approach looks fine
> 
> 
> Thanks
> Athira
> > 
> >> Currently, the perf tool infrastructure uses "--no-show-raw-insn" option
> >> with "objdump" while disassemble. Example from powerpc with this option
> >> for an instruction address is:
> > 
> > Yes and that makes sense because the purpose of objdump is to provide 
> > human readable annotations, not to perform automated analysis. Am I 
> > missing something ?
> > 
> >> 
> >> Snippet from:
> >> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
> >> 
> >> c0000000010224b4: lwz     r10,0(r9)
> >> 
> >> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> >> registers names and offset. Also to find whether there is a memory
> >> reference in the operands, "memory_ref_char" field of objdump is used.
> >> For x86, "(" is used as memory_ref_char to tackle instructions of the
> >> form "mov  (%rax), %rcx".
> >> 
> >> In case of powerpc, not all instructions using "(" are the only memory
> >> instructions. Example, above instruction can also be of extended form (X
> >> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
> >> and extract the source/target registers, patch adds support to use raw
> >> instruction. With raw instruction, macros are added to extract opcode
> >> and register fields.
> >> 
> >> "struct ins_operands" and "struct ins" is updated to carry opcode and
> >> raw instruction binary code (raw_insn). Function "disasm_line__parse"
> >> is updated to fill the raw instruction hex value and opcode in newly
> >> added fields. There is no changes in existing code paths, which parses
> >> the disassembled code. The architecture using the instruction name and
> >> present approach is not altered. Since this approach targets powerpc,
> >> the macro implementation is added for powerpc as of now.
> >> 
> >> Example:
> >> representation using --show-raw-insn in objdump gives result:
> >> 
> >> 38 01 81 e8     ld      r4,312(r1)
> >> 
> >> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> >> this translates to instruction form: "ld RT,DS(RA)" and binary code
> >> as:
> >> _____________________________________
> >> | 58 |  RT  |  RA |      DS       | |
> >> -------------------------------------
> >> 0    6     11    16              30 31
> >> 
> >> Function "disasm_line__parse" is updated to capture:
> >> 
> >> line:    38 01 81 e8     ld      r4,312(r1)
> >> opcode and raw instruction "38 01 81 e8"
> >> Raw instruction is used later to extract the reg/offset fields.
> >> 
> >> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> >> ---

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 0/9] Add data type profiling support for powerpc
  2024-05-07  4:39 ` [PATCH V2 0/9] Add data type profiling support for powerpc Namhyung Kim
@ 2024-05-13  7:32   ` Athira Rajeev
  0 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2024-05-13  7:32 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Madhavan Srinivasan, Kajol Jain, Adrian Hunter, LKML,
	linux-perf-users, Arnaldo Carvalho de Melo, Jiri Olsa, akanksha,
	Disha Goel, linuxppc-dev



> On 7 May 2024, at 10:09 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> Hello,
> 
> On Mon, May 6, 2024 at 5:19 AM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com> wrote:
>> 
>> The patchset from Namhyung added support for data type profiling
>> in perf tool. This enabled support to associate PMU samples to data
>> types they refer using DWARF debug information. With the upstream
>> perf, currently it possible to run perf report or perf annotate to
>> view the data type information on x86.
>> 
>> Initial patchset posted here had changes need to enable data type
>> profiling support for powerpc.
>> 
>> https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/
>> 
>> Main change were:
>> 1. powerpc instruction nmemonic table to associate load/store
>> instructions with move_ops which is use to identify if instruction
>> is a memory access one.
>> 2. To get register number and access offset from the given
>> instruction, code uses fields from "struct arch" -> objump.
>> Added entry for powerpc here.
>> 3. A get_arch_regnum to return register number from the
>> register name string.
>> 
>> But the apporach used in the initial patchset used parsing of
>> disassembled code which the current perf tool implementation does.
>> 
>> Example: lwz     r10,0(r9)
>> 
>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>> registers names and offset. Also to find whether there is a memory
>> reference in the operands, "memory_ref_char" field of objdump is used.
>> For x86, "(" is used as memory_ref_char to tackle instructions of the
>> form "mov  (%rax), %rcx".
>> 
>> In case of powerpc, not all instructions using "(" are the only memory
>> instructions. Example, above instruction can also be of extended form (X
>> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
>> and extract the source/target registers, this patchset adds support to use
>> raw instruction. With raw instruction, macros are added to extract opcode
>> and register fields.
>> 
>> Example representation using --show-raw-insn in objdump gives result:
>> 
>> 38 01 81 e8     ld      r4,312(r1)
>> 
>> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
>> this translates to instruction form: "ld RT,DS(RA)" and binary code
>> as:
>> _____________________________________
>> | 58 |  RT  |  RA |      DS       | |
>> -------------------------------------
>> 0    6     11    16              30 31
>> 
>> Patchset adds support to pick the opcode and reg fields from this
>> raw/binary instruction code. This approach came in from review comment
>> by Segher Boessenkool for the initial patchset.
>> 
>> Apart from that, instruction tracking is enabled for powerpc and
>> support function is added to find variables defined as registers
>> Example, in powerpc, two registers are
>> defined to represent variable:
>> 1. r13: represents local_paca
>> register struct paca_struct *local_paca asm("r13");
>> 
>> 2. r1: represents stack_pointer
>> register void *__stack_pointer asm("r1");
>> 
>> These are handled in this patchset.
>> 
>> - Patch 1 is to rearrange register state type structures to header file
>> so that it can referred from other arch specific files
>> - Patch 2 is to make instruction tracking as a callback to"struct arch"
>> so that it can be implemented by other archs easily and defined in arch
>> specific files
>> - Patch 3 is to fix a small comment
>> - Patch 4 adds support to capture and parse raw instruction in objdump
>> by keeping existing approach intact.
>> - Patch 5 update parameters for reg extract functions to use raw
>> instruction on powerpc
>> - Patch 6 and patch 7 handles instruction tracking for powerpc.
>> - Patch 8 and Patch 8 handles support to find global register variables
>> 
>> With the current patchset:
>> 
>> ./perf record -a -e mem-loads sleep 1
>> ./perf report -s type,typeoff --hierarchy --group --stdio
>> ./perf annotate --data-type --insn-stat
>> 
>> perf annotate logs:
>> 
>> Annotate Instruction stats
>> total 562, ok 441 (78.5%), bad 121 (21.5%)
>> 
>>  Name      :  Good   Bad
>> -----------------------------------------------------------
>>  ld        :   313    54
>>  lwz       :    51    32
>>  lbz       :    31     5
>>  ldx       :     6    21
>>  lhz       :    23     0
>>  lwa       :     4     3
>>  lwarx     :     5     0
>>  lwzx      :     2     2
>>  ldarx     :     3     0
>>  lwzu      :     2     0
>>  stdcx.    :     0     1
>>  nop       :     0     1
>>  ldu       :     1     0
>>  lbzx      :     0     1
>>  lwax      :     0     1
>> 
>> perf report logs:
>> 
>> # Samples: 1K of event 'mem-loads'
>> # Event count (approx.): 937238
>> #
>> # Overhead  Data Type  Data Type Offset
>> # ........  .........  ................
>> #
>>    48.81%  (unknown)  (unknown) +0 (no field)
>>    12.85%  long unsigned int  long unsigned int +0 (current_stack_pointer)
>>     4.68%  struct paca_struct  struct paca_struct +2312 (__current)
>>     4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
>>     2.68%  struct paca_struct  struct paca_struct +8 (paca_index)
>>     2.64%  struct paca_struct  struct paca_struct +2808 (canary)
>>     2.24%  struct paca_struct  struct paca_struct +48 (data_offset)
>>     1.41%  struct vm_fault  struct vm_fault +0 (vma)
>>     1.29%  struct task_struct  struct task_struct +276 (flags)
>>     1.03%  struct pt_regs  struct pt_regs +264 (user_regs.msr)
>>     1.00%  struct menu_device  struct menu_device +4 (tick_wakeup)
>>     0.90%  struct security_hook_list  struct security_hook_list +0 (list.next)
>>     0.76%  struct irq_desc  struct irq_desc +304 (irq_data.chip)
>>     0.76%  struct rq  struct rq +2856 (cpu)
> 
> Looks great!  I'm glad it worked on powerpc too.
> We still need to verify the returned type is properly annotated.
> But overall it looks really good, I will leave comments in reply.

Hi Namhyung

Thanks a lot for looking at the patchset and sharing the review comments.
I will address review comments and respond to them sooner.

Thanks
Athira
> 
> Thanks,
> Namhyung
> 
>> 
>> Thanks
>> Athira Rajeev
>> 
>> Changelog:
>> From v1->v2:
>> - Addressed suggestion from Christophe Leroy and Segher Boessenkool
>>  to use the binary code (raw insn) to fetch opcode, register and
>>  offset fields.
>> - Added support for instruction tracking in powerpc
>> - Find the register defined variables (r13 and r1 which points to
>>  local_paca and current_stack_pointer in powerpc)
>> 
>> Athira Rajeev (9):
>>  tools/perf: Move the data structures related to register  type to
>>    header file
>>  tools/perf: Add "update_insn_state" callback function to handle arch
>>    specific instruction tracking
>>  tools/perf: Fix a comment about multi_regs in extract_reg_offset
>>    function
>>  tools/perf: Add support to capture and parse raw instruction in
>>    objdump
>>  tools/perf: Update parameters for reg extract functions to use raw
>>    instruction on powerpc
>>  tools/perf: Update instruction tracking for powerpc
>>  tools/perf: Update instruction tracking with add instruction
>>  tools/perf: Add support to find global register variables using
>>    find_data_type_global_reg
>>  tools/perf: Add support for global_die to capture name of variable in
>>    case of register defined variable
>> 
>> tools/include/linux/string.h                  |   2 +
>> tools/lib/string.c                            |  13 +
>> .../perf/arch/powerpc/annotate/instructions.c |  84 +++
>> tools/perf/arch/powerpc/util/dwarf-regs.c     |  52 ++
>> tools/perf/arch/x86/annotate/instructions.c   | 383 +++++++++++++
>> tools/perf/util/annotate-data.c               | 519 +++---------------
>> tools/perf/util/annotate-data.h               |  78 +++
>> tools/perf/util/annotate.c                    |  32 +-
>> tools/perf/util/annotate.h                    |   1 +
>> tools/perf/util/disasm.c                      | 109 +++-
>> tools/perf/util/disasm.h                      |  17 +-
>> tools/perf/util/dwarf-aux.c                   |   1 +
>> tools/perf/util/dwarf-aux.h                   |   1 +
>> tools/perf/util/include/dwarf-regs.h          |  12 +
>> tools/perf/util/sort.c                        |   7 +-
>> 15 files changed, 854 insertions(+), 457 deletions(-)
>> 
>> --
>> 2.43.0



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump
  2024-05-10 14:26       ` Arnaldo Carvalho de Melo
@ 2024-05-22 13:58         ` Athira Rajeev
  0 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2024-05-22 13:58 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Namhyung Kim
  Cc: Ian Rogers, maddy, kjain, adrian.hunter, linux-kernel,
	linux-perf-users, jolsa, akanksha, disgoel, linuxppc-dev



> On 10 May 2024, at 7:56 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> On Thu, May 09, 2024 at 10:56:23PM +0530, Athira Rajeev wrote:
>> 
>> 
>>> On 7 May 2024, at 3:05 PM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
>>> 
>>> 
>>> 
>>> Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
>>>> Add support to capture and parse raw instruction in objdump.
>>> 
>>> What's the purpose of using 'objdump' for reading raw instructions ? 
>>> Can't they be read directly without invoking 'objdump' ? It looks odd to 
>>> me to use objdump to provide readable text and then parse it back.
>> 
>> Hi Christophe,
>> 
>> Thanks for your review comments.
>> 
>> Current implementation for data type profiling on X86 uses "objdump" tool to get the disassembled code.
> 
> commit 6d17edc113de1e21fc66afa76be475a4f7c91826
> Author: Namhyung Kim <namhyung@kernel.org>
> Date:   Fri Mar 29 14:58:11 2024 -0700
> 
>    perf annotate: Use libcapstone to disassemble
> 
>    Now it can use the capstone library to disassemble the instructions.
>    Let's use that (if available) for perf annotate to speed up.  Currently
>    it only supports x86 architecture.  With this change I can see ~3x speed
>    up in data type profiling.
> 
>    But note that capstone cannot give the source file and line number info.
>    For now, users should use the external objdump for that by specifying
>    the --objdump option explicitly.
> 
>    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>    Tested-by: Ian Rogers <irogers@google.com>
>    Cc: Adrian Hunter <adrian.hunter@intel.com>
>    Cc: Changbin Du <changbin.du@huawei.com>
>    Cc: Ingo Molnar <mingo@kernel.org>
>    Cc: Jiri Olsa <jolsa@kernel.org>
>    Cc: Kan Liang <kan.liang@linux.intel.com>
>    Cc: Peter Zijlstra <peterz@infradead.org>
>    Link: https://lore.kernel.org/r/20240329215812.537846-5-namhyung@kernel.org
>    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> From a quick look at http://www.capstone-engine.org/compile.html it
> seems PowerPC is supported.
> 
> But since we did it first with objdump output parsing, its good to have
> it as an alternative and sometimes a fallback:

Hi Arnaldo, Namhyung

Thanks for the suggestions. libcapstone is a good option and it is faster too.
I will address these changes in V3.

Thanks
Athira
> 
> commit f35847de2a65137e011e559f38a3de5902a5463f
> Author: Namhyung Kim <namhyung@kernel.org>
> Date:   Wed Apr 24 17:51:56 2024 -0700
> 
>    perf annotate: Fallback disassemble to objdump when capstone fails
> 
>    I found some cases that capstone failed to disassemble.  Probably my
>    capstone is an old version but anyway there's a chance it can fail.  And
>    then it silently stopped in the middle.  In my case, it didn't
>    understand "RDPKRU" instruction.
> 
>    Let's check if the capstone disassemble reached the end of the function
>    and fallback to objdump if not
> 
> ---------------
> 
> - Arnaldo
> 
>> And then the objdump result lines are parsed to get the instruction
>> name and register fields. The initial patchset I posted to enable the
>> data type profiling feature in powerpc was using the same way by
>> getting disassembled code from objdump and parsing the disassembled
>> lines. But in V2, we are introducing change for powerpc to use "raw
>> instruction" and fetch opcode, reg fields from the raw instruction.
> 
>> I tried to explain below that current objdump uses option
>> "--no-show-raw-insn" which doesn't capture raw instruction.  So to
>> capture raw instruction, V2 patchset has changes to use default option
>> "--show-raw-insn" and get the raw instruction [ for powerpc ] along
>> with human readable annotation [ which is used by other archs ]. Since
>> perf tool already has objdump implementation in place, I went in the
>> direction to enhance it to use "--show-raw-insn" for powerpc purpose.
> 
>> But as you mentioned, we can directly read raw instruction without
>> using "objdump" tool.  perf has support to read object code. The dso
>> open/read utilities and helper functions are already present in
>> "util/dso.c" And "dso__data_read_offset" function reads data from dso
>> file offset. We can use these functions and I can make changes to
>> directly read binary instruction without using objdump.
> 
>> Namhyung, Arnaldo, Christophe
>> Looking for your valuable feedback on this approach. Please suggest if this approach looks fine
>> 
>> 
>> Thanks
>> Athira
>>> 
>>>> Currently, the perf tool infrastructure uses "--no-show-raw-insn" option
>>>> with "objdump" while disassemble. Example from powerpc with this option
>>>> for an instruction address is:
>>> 
>>> Yes and that makes sense because the purpose of objdump is to provide 
>>> human readable annotations, not to perform automated analysis. Am I 
>>> missing something ?
>>> 
>>>> 
>>>> Snippet from:
>>>> objdump  --start-address=<address> --stop-address=<address>  -d --no-show-raw-insn -C <vmlinux>
>>>> 
>>>> c0000000010224b4: lwz     r10,0(r9)
>>>> 
>>>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>>>> registers names and offset. Also to find whether there is a memory
>>>> reference in the operands, "memory_ref_char" field of objdump is used.
>>>> For x86, "(" is used as memory_ref_char to tackle instructions of the
>>>> form "mov  (%rax), %rcx".
>>>> 
>>>> In case of powerpc, not all instructions using "(" are the only memory
>>>> instructions. Example, above instruction can also be of extended form (X
>>>> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
>>>> and extract the source/target registers, patch adds support to use raw
>>>> instruction. With raw instruction, macros are added to extract opcode
>>>> and register fields.
>>>> 
>>>> "struct ins_operands" and "struct ins" is updated to carry opcode and
>>>> raw instruction binary code (raw_insn). Function "disasm_line__parse"
>>>> is updated to fill the raw instruction hex value and opcode in newly
>>>> added fields. There is no changes in existing code paths, which parses
>>>> the disassembled code. The architecture using the instruction name and
>>>> present approach is not altered. Since this approach targets powerpc,
>>>> the macro implementation is added for powerpc as of now.
>>>> 
>>>> Example:
>>>> representation using --show-raw-insn in objdump gives result:
>>>> 
>>>> 38 01 81 e8     ld      r4,312(r1)
>>>> 
>>>> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
>>>> this translates to instruction form: "ld RT,DS(RA)" and binary code
>>>> as:
>>>> _____________________________________
>>>> | 58 |  RT  |  RA |      DS       | |
>>>> -------------------------------------
>>>> 0    6     11    16              30 31
>>>> 
>>>> Function "disasm_line__parse" is updated to capture:
>>>> 
>>>> line:    38 01 81 e8     ld      r4,312(r1)
>>>> opcode and raw instruction "38 01 81 e8"
>>>> Raw instruction is used later to extract the reg/offset fields.
>>>> 
>>>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>>>> ---



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 5/9] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc
  2024-05-07  9:48   ` Christophe Leroy
@ 2024-05-22 14:06     ` Athira Rajeev
  0 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2024-05-22 14:06 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: irogers, maddy, kjain, adrian.hunter, acme, linux-kernel,
	linux-perf-users, jolsa, akanksha, namhyung, disgoel,
	linuxppc-dev



> On 7 May 2024, at 3:18 PM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> 
> 
> 
> Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
>> Use the raw instruction code and macros to identify memory instructions,
>> extract register fields and also offset. The implementation addresses
>> the D-form, X-form, DS-form instructions. Two main functions are added.
>> New parse function "load_store__parse" as instruction ops parser for
>> memory instructions. Unlink other parser (like mov__parse), this parser
>> fills in only the "raw" field for source/target and new added "mem_ref"
>> field. This is because, here there is no need to parse the disassembled
>> code and arch specific macros will take care of extracting offset and
>> regs which is easier and will be precise.
>> 
>> In powerpc, all instructions with a primary opcode from 32 to 63
>> are memory instructions. Update "ins__find" function to have "opcode"
>> also as a parameter. Don't use the "extract_reg_offset", instead use
>> newly added function "get_arch_regs" which will set these fields: reg1,
>> reg2, offset depending of where it is source or target ops.
> 
> Yes all instructions with a primary opcode from 32 to 63 are memory 
> instructions, but not all memory instructions have opcode 32 to 63.
> 
>> 
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>>  tools/perf/arch/powerpc/util/dwarf-regs.c | 33 +++++++++++++
>>  tools/perf/util/annotate.c                | 22 ++++++++-
>>  tools/perf/util/disasm.c                  | 59 +++++++++++++++++++++--
>>  tools/perf/util/disasm.h                  |  4 +-
>>  tools/perf/util/include/dwarf-regs.h      |  4 +-
>>  5 files changed, 114 insertions(+), 8 deletions(-)
>> 
>> diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
>> index e60a71fd846e..3121c70dc0d3 100644
>> --- a/tools/perf/arch/powerpc/util/dwarf-regs.c
>> +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
>> @@ -102,6 +102,9 @@ int regs_query_register_offset(const char *name)
>>  #define PPC_OP(op) (((op) >> 26) & 0x3F)
>>  #define PPC_RA(a) (((a) >> 16) & 0x1f)
>>  #define PPC_RT(t) (((t) >> 21) & 0x1f)
>> +#define PPC_RB(b) (((b) >> 11) & 0x1f)
>> +#define PPC_D(D) ((D) & 0xfffe)
>> +#define PPC_DS(DS) ((DS) & 0xfffc)
>> 
>>  int get_opcode_insn(unsigned int raw_insn)
>>  {
>> @@ -117,3 +120,33 @@ int get_target_reg(unsigned int raw_insn)
>>  {
>>   return PPC_RT(raw_insn);
>>  }
>> +
>> +int get_offset_opcode(int raw_insn __maybe_unused)
>> +{
>> + int opcode = PPC_OP(raw_insn);
>> +
>> + /* DS- form */
>> + if ((opcode == 58) || (opcode == 62))
> 
> Can you re-use macros from arch/powerpc/include/asm/ppc-opcode.h ?
> 
> #define OP_LD 58
> #define OP_STD 62

Hi Christophe

Sure, I will make these changes

> 
> 
>> + return PPC_DS(raw_insn);
>> + else
>> + return PPC_D(raw_insn);
>> +}
>> +
>> +/*
>> + * Fills the required fields for op_loc depending on if it
>> + * is a source of target.
>> + * D form: ins RT,D(RA) -> src_reg1 = RA, offset = D, dst_reg1 = RT
>> + * DS form: ins RT,DS(RA) -> src_reg1 = RA, offset = DS, dst_reg1 = RT
>> + * X form: ins RT,RA,RB -> src_reg1 = RA, src_reg2 = RB, dst_reg1 = RT
>> + */
>> +void get_arch_regs(int raw_insn __maybe_unused, int is_source __maybe_unused, struct annotated_op_loc *op_loc __maybe_unused)
>> +{
>> + if (is_source)
>> + op_loc->reg1 = get_source_reg(raw_insn);
>> + else
>> + op_loc->reg1 = get_target_reg(raw_insn);
>> + if (op_loc->multi_regs)
>> + op_loc->reg2 = PPC_RB(raw_insn);
>> + if (op_loc->mem_ref)
>> + op_loc->offset = get_offset_opcode(raw_insn);
>> +}
> 
>> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
>> index 85692f73e78f..f41a0fadeab4 100644
>> --- a/tools/perf/util/disasm.c
>> +++ b/tools/perf/util/disasm.c
>> @@ -760,11 +800,20 @@ static void ins__sort(struct arch *arch)
>>   qsort(arch->instructions, nmemb, sizeof(struct ins), ins__cmp);
>>  }
>> 
>> -static struct ins_ops *__ins__find(struct arch *arch, const char *name)
>> +static struct ins_ops *__ins__find(struct arch *arch, const char *name, int opcode)
>>  {
>>   struct ins *ins;
>>   const int nmemb = arch->nr_instructions;
>> 
>> + if (arch__is(arch, "powerpc")) {
>> + /*
>> +  * Instructions with a primary opcode from 32 to 63
>> +  * are memory instructions in powerpc.
>> +  */
>> + if ((opcode >= 31) && (opcode <= 63))
> 
> Could just be if ((opcode & 0x20)) I guess.
Ok,, 
> 
> By the way your test is wrong because opcode 31 is not only memory 
> instructions, see example below (not exhaustive):
> 
> #define OP_31_XOP_TRAP      4 ==> No
> #define OP_31_XOP_LDX       21 ==> Yes
> #define OP_31_XOP_LWZX      23 ==> Yes
> #define OP_31_XOP_LDUX      53 ==> Yes
> #define OP_31_XOP_DCBST     54 ==> No
> #define OP_31_XOP_LWZUX     55 ==> Yes
> #define OP_31_XOP_TRAP_64   68 ==> No
> #define OP_31_XOP_DCBF      86 ==> No
> #define OP_31_XOP_LBZX      87 ==> Yes
> #define OP_31_XOP_STDX      149 ==> Yes
> #define OP_31_XOP_STWX      151 ==> Yes

Thanks for pointing this. I checked through others in this list in arch/powerpc/include/asm/ppc-opcode.h 
I will have these filtered in V3

Thanks
Athira
> 
> 
> 
> 
>> + return &load_store_ops;
>> + }
>> +
>>   if (!arch->sorted_instructions) {
>>   ins__sort(arch);
>>   arch->sorted_instructions = true;



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 7/9] tools/perf: Update instruction tracking with add instruction
  2024-05-07  9:58   ` Christophe Leroy
@ 2024-05-23 13:55     ` Athira Rajeev
  0 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2024-05-23 13:55 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: irogers, maddy, kjain, adrian.hunter, acme, linux-kernel,
	linux-perf-users, jolsa, akanksha, namhyung, disgoel,
	linuxppc-dev



> On 7 May 2024, at 3:28 PM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> 
> 
> 
> Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
>> Update instruction tracking with add instruction. Apart from "mr"
>> instruction, the register state is carried on by other insns, ie,
>> "add, addi, addis". Since these are not memory instructions and doesn't
>> fall in the range of (32 to 63), add these as part of nmemonic table.
>> For now, add* instructions are added. There is possibility of getting
>> more added here. But to extract regs, still the binary code will be
>> used. So associate this with "load_store_ops" itself and no other
>> changes required.
> 
> Looks fragile.
> 
> How do you handle addc, adde, addic ?
> And also addme, addze, which only have two operands instead of 3 ?

Hi Christophe,

Thanks for pointing these cases. Will address these cases in next version

Thanks
Athira
> 
>> 
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>>  .../perf/arch/powerpc/annotate/instructions.c | 21 +++++++++++++++++++
>>  tools/perf/util/disasm.c                      |  1 +
>>  2 files changed, 22 insertions(+)
>> 
>> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c b/tools/perf/arch/powerpc/annotate/instructions.c
>> index cce7023951fe..1f35d8a65bb4 100644
>> --- a/tools/perf/arch/powerpc/annotate/instructions.c
>> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
>> @@ -1,6 +1,17 @@
>>  // SPDX-License-Identifier: GPL-2.0
>>  #include <linux/compiler.h>
>> 
>> +/*
>> + * powerpc instruction nmemonic table to associate load/store instructions with
>> + * move_ops. mov_ops is used to identify add/mr to do instruction tracking.
>> + */
>> +static struct ins powerpc__instructions[] = {
>> + { .name = "mr",         .ops = &load_store_ops,  },
>> + { .name = "addi",       .ops = &load_store_ops,   },
>> + { .name = "addis",      .ops = &load_store_ops,  },
>> + { .name = "add",        .ops = &load_store_ops,  },
>> +};
>> +
>>  static struct ins_ops *powerpc__associate_instruction_ops(struct arch *arch, const char *name)
>>  {
>>   int i;
>> @@ -75,6 +86,9 @@ static void update_insn_state_powerpc(struct type_state *state,
>>   if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0)
>>   return;
>> 
>> + if (!strncmp(dl->ins.name, "add", 3))
>> + goto regs_check;
>> +
>>   if (strncmp(dl->ins.name, "mr", 2))
>>   return;
>> 
>> @@ -85,6 +99,7 @@ static void update_insn_state_powerpc(struct type_state *state,
>>   dst->reg1 = src_reg;
>>   }
>> 
>> +regs_check:
>>   if (!has_reg_type(state, dst->reg1))
>>   return;
>> 
>> @@ -115,6 +130,12 @@ static void update_insn_state_powerpc(struct type_state *state __maybe_unused, s
>>  static int powerpc__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
>>  {
>>   if (!arch->initialized) {
>> + arch->nr_instructions = ARRAY_SIZE(powerpc__instructions);
>> + arch->instructions = calloc(arch->nr_instructions, sizeof(struct ins));
>> + if (!arch->instructions)
>> + return -ENOMEM;
>> + memcpy(arch->instructions, powerpc__instructions, sizeof(struct ins) * arch->nr_instructions);
>> + arch->nr_instructions_allocated = arch->nr_instructions;
>>   arch->initialized = true;
>>   arch->associate_instruction_ops = powerpc__associate_instruction_ops;
>>   arch->objdump.comment_char      = '#';
>> diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
>> index ac6b8b8da38a..32cf506a9010 100644
>> --- a/tools/perf/util/disasm.c
>> +++ b/tools/perf/util/disasm.c
>> @@ -36,6 +36,7 @@ static struct ins_ops mov_ops;
>>  static struct ins_ops nop_ops;
>>  static struct ins_ops lock_ops;
>>  static struct ins_ops ret_ops;
>> +static struct ins_ops load_store_ops;
>> 
>>  static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
>>     struct ins_operands *ops, int max_ins_name);


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 6/9] tools/perf: Update instruction tracking for powerpc
  2024-05-07  9:52   ` Christophe Leroy
@ 2024-05-23 13:58     ` Athira Rajeev
  0 siblings, 0 replies; 29+ messages in thread
From: Athira Rajeev @ 2024-05-23 13:58 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: irogers, maddy, kjain, adrian.hunter, acme, linux-kernel,
	linux-perf-users, jolsa, akanksha, namhyung, disgoel,
	linuxppc-dev



> On 7 May 2024, at 3:22 PM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> 
> 
> 
> Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
>> Add instruction tracking function "update_insn_state_powerpc" for
>> powerpc. Example sequence in powerpc:
>> 
>> ld      r10,264(r3)
>> mr      r31,r3
>> <<after some sequence>
>> ld      r9,312(r31)
> 
> Your approach looks fragile.
> 
> mr is a simplified instruction which in fact is  "or r31, r3, r3"
> 
> By the way, the compiler sometimes does it different, like below:
> 
> lwz r10,264(r3)
> addi r31, r3, 312
> lwz r9, 0(r31)
> 
> And what about sequences with lwzu ?
Hi Christophe,

This patch added “mr”. In next patch in same series, add instruction also is added to instruction tracking.
I will be posting a V3 with some changes to the logic for annotating add instructions.

Thanks
Athira
> 
>> 
>> Consider ithe sample is pointing to: "ld r9,312(r31)".
>> Here the memory reference is hit at "312(r31)" where 312 is the offset
>> and r31 is the source register. Previous instruction sequence shows that
>> register state of r3 is moved to r31. So to identify the data type for r31
>> access, the previous instruction ("mr") needs to be tracked and the
>> state type entry has to be updated. Current instruction tracking support
>> in perf tools infrastructure is specific to x86. Patch adds this for
>> powerpc and adds "mr" instruction to be tracked.
>> 
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 8/9] tools/perf: Add support to find global register variables using find_data_type_global_reg
  2024-05-07 10:03   ` Christophe Leroy
@ 2024-05-24 12:17     ` Athira Rajeev
  2024-05-24 12:47       ` Christophe Leroy
  0 siblings, 1 reply; 29+ messages in thread
From: Athira Rajeev @ 2024-05-24 12:17 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: irogers, maddy, kjain, adrian.hunter, acme, linux-kernel,
	linux-perf-users, jolsa, akanksha, namhyung, disgoel,
	linuxppc-dev



> On 7 May 2024, at 3:33 PM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> 
> 
> 
> Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
>> There are cases where define a global register variable and associate it
>> with a specified register. Example, in powerpc, two registers are
>> defined to represent variable:
>> 1. r13: represents local_paca
>> register struct paca_struct *local_paca asm("r13");
>> 
>> 2. r1: represents stack_pointer
>> register void *__stack_pointer asm("r1");
> 
> What about r2:
> 
> register struct task_struct *current asm ("r2");

Hi Christophe,

Referring to arch/powerpc/include/asm/current.h, “current” in powerpc 64 bit is from paca_struct which is handled with r13
R2 definition which you shared above is for 32 bit case.

Thanks
Athira
> 
>> 
>> These regs are present in dwarf debug as DW_OP_reg as part of variables
>> in the cu_die (compile unit). These are not present in die search done
>> in the list of nested scopes since these are global register variables.
>> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2 8/9] tools/perf: Add support to find global register variables using find_data_type_global_reg
  2024-05-24 12:17     ` Athira Rajeev
@ 2024-05-24 12:47       ` Christophe Leroy
  0 siblings, 0 replies; 29+ messages in thread
From: Christophe Leroy @ 2024-05-24 12:47 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: irogers, maddy, kjain, adrian.hunter, acme, linux-kernel,
	linux-perf-users, jolsa, akanksha, namhyung, disgoel,
	linuxppc-dev



Le 24/05/2024 à 14:17, Athira Rajeev a écrit :
> 
> 
>> On 7 May 2024, at 3:33 PM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
>>
>>
>>
>> Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
>>> There are cases where define a global register variable and associate it
>>> with a specified register. Example, in powerpc, two registers are
>>> defined to represent variable:
>>> 1. r13: represents local_paca
>>> register struct paca_struct *local_paca asm("r13");
>>>
>>> 2. r1: represents stack_pointer
>>> register void *__stack_pointer asm("r1");
>>
>> What about r2:
>>
>> register struct task_struct *current asm ("r2");
> 
> Hi Christophe,
> 
> Referring to arch/powerpc/include/asm/current.h, “current” in powerpc 64 bit is from paca_struct which is handled with r13
> R2 definition which you shared above is for 32 bit case.
> 

Hi Athira,

Yes I know.

Your patches are meant to handle both powerpc/64 and powerpc/32, aren't 
they ?

Christophe

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2024-05-24 12:56 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-06 12:18 [PATCH V2 0/9] Add data type profiling support for powerpc Athira Rajeev
2024-05-06 12:18 ` [PATCH V2 1/9] tools/perf: Move the data structures related to register type to header file Athira Rajeev
2024-05-06 12:18 ` [PATCH V2 2/9] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking Athira Rajeev
2024-05-06 12:19 ` [PATCH V2 3/9] tools/perf: Fix a comment about multi_regs in extract_reg_offset function Athira Rajeev
2024-05-07  4:40   ` Namhyung Kim
2024-05-07 14:52     ` Arnaldo Carvalho de Melo
2024-05-06 12:19 ` [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump Athira Rajeev
2024-05-07  4:57   ` Namhyung Kim
2024-05-07  9:35   ` Christophe Leroy
2024-05-09 17:26     ` Athira Rajeev
2024-05-09 20:55       ` Namhyung Kim
2024-05-10 14:26       ` Arnaldo Carvalho de Melo
2024-05-22 13:58         ` Athira Rajeev
2024-05-06 12:19 ` [PATCH V2 5/9] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc Athira Rajeev
2024-05-07  9:48   ` Christophe Leroy
2024-05-22 14:06     ` Athira Rajeev
2024-05-06 12:19 ` [PATCH V2 6/9] tools/perf: Update instruction tracking for powerpc Athira Rajeev
2024-05-07  9:52   ` Christophe Leroy
2024-05-23 13:58     ` Athira Rajeev
2024-05-06 12:19 ` [PATCH V2 7/9] tools/perf: Update instruction tracking with add instruction Athira Rajeev
2024-05-07  9:58   ` Christophe Leroy
2024-05-23 13:55     ` Athira Rajeev
2024-05-06 12:19 ` [PATCH V2 8/9] tools/perf: Add support to find global register variables using find_data_type_global_reg Athira Rajeev
2024-05-07 10:03   ` Christophe Leroy
2024-05-24 12:17     ` Athira Rajeev
2024-05-24 12:47       ` Christophe Leroy
2024-05-06 12:19 ` [PATCH V2 9/9] tools/perf: Add support for global_die to capture name of variable in case of register defined variable Athira Rajeev
2024-05-07  4:39 ` [PATCH V2 0/9] Add data type profiling support for powerpc Namhyung Kim
2024-05-13  7:32   ` Athira Rajeev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).