* [PATCH V6 00/16] Add the page size in the perf record
@ 2020-08-10 21:24 Kan Liang
  2020-08-10 21:24 ` [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
                   ` (15 more replies)
  0 siblings, 16 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov, Kan Liang

Changes since V5
- Introduce a new universal page walker for the page size in the perf
  subsystem.
- Rebased on Peter's tree.

Current perf can report both virtual addresses and physical addresses,
but not the page size. Without the page size information of the utilized
page, users cannot decide whether to promote/demote large pages to
optimize memory usage.

The patch set was submitted a year ago.
https://lkml.kernel.org/r/1549648509-12704-1-git-send-email-kan.liang@linux.intel.com
It introduced a __weak function, perf_get_page_size(), aiming to retrieve
the page size for a given virtual address in the generic code, and
implemented an x86-specific version of perf_get_page_size().
However, the proposal was rejected because it was a pure x86
implementation.
https://lkml.kernel.org/r/20190208200731.GN32511@hirez.programming.kicks-ass.net

At that time, it was not easy to support perf_get_page_size() universally,
because some key functions, e.g., p?d_large(), were not supported by some
architectures.

Now, the generic p?d_leaf() functions have been added to the latest kernel.
https://lkml.kernel.org/r/20191218162402.45610-2-steven.price@arm.com
Starting from V6, a new universal perf_get_page_size() function is
implemented based on the generic p?d_leaf() functions.

On some platforms, e.g., X86, the page walker is invoked in an NMI
handler, so it must be IRQ-safe and have low overhead. In addition,
the page walker should work for both user and kernel virtual addresses.
The existing generic page walkers, e.g., walk_page_range_novma(), are
rather complex and do not guarantee IRQ safety, and follow_page() only
handles user virtual addresses. So a simpler page walk function is
implemented here.

Kan Liang (11):
  perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
  perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE
  tools headers UAPI: Update tools's copy of linux/perf_event.h
  perf record: Support new sample type for data page size
  perf script: Use ULL for enum perf_output_field
  perf script: Support data page size
  perf sort: Add sort option for data page size
  perf mem: Factor out a function to generate sort order
  perf mem: Clean up output format
  perf mem: Support data page size
  perf test: Add test case for PERF_SAMPLE_DATA_PAGE_SIZE

Stephane Eranian (5):
  perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
  perf tools: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
  perf script: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
  perf report: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
  perf test: Add test case for PERF_SAMPLE_CODE_PAGE_SIZE

 arch/x86/events/intel/ds.c                |  11 +-
 include/linux/perf_event.h                |   2 +
 include/uapi/linux/perf_event.h           |   6 +-
 kernel/events/core.c                      | 132 ++++++++++++++++++-
 tools/include/uapi/linux/perf_event.h     |   6 +-
 tools/perf/Documentation/perf-mem.txt     |   3 +
 tools/perf/Documentation/perf-record.txt  |   6 +
 tools/perf/Documentation/perf-report.txt  |   2 +
 tools/perf/Documentation/perf-script.txt  |   5 +-
 tools/perf/builtin-mem.c                  | 150 ++++++++++++----------
 tools/perf/builtin-record.c               |   4 +
 tools/perf/builtin-script.c               |  90 ++++++++-----
 tools/perf/tests/sample-parsing.c         |  10 +-
 tools/perf/util/event.h                   |   5 +
 tools/perf/util/evsel.c                   |  18 +++
 tools/perf/util/hist.c                    |   5 +
 tools/perf/util/hist.h                    |   2 +
 tools/perf/util/machine.c                 |   7 +-
 tools/perf/util/map_symbol.h              |   1 +
 tools/perf/util/perf_event_attr_fprintf.c |   2 +-
 tools/perf/util/record.h                  |   2 +
 tools/perf/util/session.c                 |  26 ++++
 tools/perf/util/sort.c                    |  56 ++++++++
 tools/perf/util/sort.h                    |   3 +
 tools/perf/util/synthetic-events.c        |  16 +++
 25 files changed, 456 insertions(+), 114 deletions(-)

-- 
2.17.1



* [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:35   ` Peter Zijlstra
                     ` (2 more replies)
  2020-08-10 21:24 ` [PATCH V6 02/16] perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
                   ` (14 subsequent siblings)
  15 siblings, 3 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov, Kan Liang

Current perf can report both virtual addresses and physical addresses,
but not the page size. Without the page size information of the utilized
page, users cannot decide whether to promote/demote large pages to
optimize memory usage.

Add a new sample type for the data page size.

Current perf already has a facility to collect data virtual addresses.
A page walker is required to walk the page tables and calculate the
page size from a given virtual address.

On some platforms, e.g., X86, the page walker is invoked in an NMI
handler, so it must be IRQ-safe and have low overhead. In addition,
the page walker should work for both user and kernel virtual addresses.
The existing generic page walkers, e.g., walk_page_range_novma(), are
rather complex and do not guarantee IRQ safety, and follow_page() only
handles user virtual addresses.

Add a new function perf_get_page_size() to walk the page tables and
calculate the page size. In the function:
- Interrupts have to be disabled to prevent any teardown of the page
  tables.
- The size of a normal page is from the pre-defined page size macros.
- The size of a compound page is retrieved from the helper function,
  page_size().
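
The walk order described above can be illustrated outside the kernel.
What follows is only a user-space toy model, not the kernel code: the
names toy_entry, toy_level_size and toy_get_page_size() are hypothetical,
the level sizes assume x86-64 with 4 KiB base pages, only three levels
are modelled, and the compound-page case handled by page_size() is left
out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of one page-table entry; not a kernel type. */
struct toy_entry {
	int present;	/* the entry maps something */
	int leaf;	/* the entry maps a (huge) page directly */
};

/*
 * Size mapped by a leaf at each modelled level, top-down.
 * Assumption: x86-64 with 4 KiB base pages.
 */
static const uint64_t toy_level_size[] = {
	1ULL << 30,	/* PUD leaf: 1 GiB */
	1ULL << 21,	/* PMD leaf: 2 MiB */
	1ULL << 12,	/* PTE:      4 KiB */
};

#define TOY_LEVELS (sizeof(toy_level_size) / sizeof(toy_level_size[0]))

/*
 * Walk the entries on the path of one address, top-down.
 * Return 0 at the first non-present entry, the size of the level at
 * the first leaf, and the base page size at the last level.
 */
static uint64_t toy_get_page_size(const struct toy_entry path[TOY_LEVELS])
{
	size_t i;

	for (i = 0; i < TOY_LEVELS; i++) {
		if (!path[i].present)
			return 0;
		if (path[i].leaf || i == TOY_LEVELS - 1)
			return toy_level_size[i];
	}
	return 0;
}
```

In this model a path whose second entry is a leaf reports 2 MiB, and a
path with a non-present entry reports 0, which matches the "unknown"
return value used by the real function.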

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 include/linux/perf_event.h      |   1 +
 include/uapi/linux/perf_event.h |   4 +-
 kernel/events/core.c            | 121 ++++++++++++++++++++++++++++++++
 3 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3737e653f47e..5de95f36d7a8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1035,6 +1035,7 @@ struct perf_sample_data {
 
 	u64				phys_addr;
 	u64				cgroup;
+	u64				data_page_size;
 } ____cacheline_aligned;
 
 /* default value for data source */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 52ca2093831c..32484accc7a3 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -143,8 +143,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
 	PERF_SAMPLE_AUX				= 1U << 20,
 	PERF_SAMPLE_CGROUP			= 1U << 21,
+	PERF_SAMPLE_DATA_PAGE_SIZE		= 1U << 22,
 
-	PERF_SAMPLE_MAX = 1U << 22,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 23,		/* non-ABI */
 
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
@@ -879,6 +880,7 @@ enum perf_event_type {
 	 *	{ u64			phys_addr;} && PERF_SAMPLE_PHYS_ADDR
 	 *	{ u64			size;
 	 *	  char			data[size]; } && PERF_SAMPLE_AUX
+	 *	{ u64			data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
 	 * };
 	 */
 	PERF_RECORD_SAMPLE			= 9,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0fbf17ff7cc5..00becacfd15e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -51,6 +51,7 @@
 #include <linux/proc_ns.h>
 #include <linux/mount.h>
 #include <linux/min_heap.h>
+#include <linux/highmem.h>
 
 #include "internal.h"
 
@@ -1895,6 +1896,9 @@ static void __perf_event_header_size(struct perf_event *event, u64 sample_type)
 	if (sample_type & PERF_SAMPLE_CGROUP)
 		size += sizeof(data->cgroup);
 
+	if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
+		size += sizeof(data->data_page_size);
+
 	event->header_size = size;
 }
 
@@ -6939,6 +6943,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_CGROUP)
 		perf_output_put(handle, data->cgroup);
 
+	if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
+		perf_output_put(handle, data->data_page_size);
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		perf_output_put(handle, data->aux_size);
 
@@ -6996,6 +7003,117 @@ static u64 perf_virt_to_phys(u64 virt)
 	return phys_addr;
 }
 
+#ifdef CONFIG_MMU
+
+static u64 __perf_get_page_size(struct mm_struct *mm, unsigned long addr)
+{
+	struct page *page;
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	pgd = pgd_offset(mm, addr);
+	if (pgd_none(*pgd))
+		return 0;
+
+	p4d = p4d_offset(pgd, addr);
+	if (!p4d_present(*p4d))
+		return 0;
+
+#if (defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
+	if (p4d_leaf(*p4d)) {
+		page = p4d_page(*p4d);
+
+		if (PageCompound(page))
+			return page_size(compound_head(page));
+
+		return P4D_SIZE;
+	}
+#endif
+
+	pud = pud_offset(p4d, addr);
+	if (!pud_present(*pud))
+		return 0;
+
+#if (defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
+	if (pud_leaf(*pud)) {
+		page = pud_page(*pud);
+
+		if (PageCompound(page))
+			return page_size(compound_head(page));
+
+		return PUD_SIZE;
+	}
+#endif
+
+	pmd = pmd_offset(pud, addr);
+	if (!pmd_present(*pmd))
+		return 0;
+
+#if (defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
+	if (pmd_leaf(*pmd)) {
+		page = pmd_page(*pmd);
+
+		if (PageCompound(page))
+			return page_size(compound_head(page));
+
+		return PMD_SIZE;
+	}
+#endif
+
+	pte = pte_offset_map(pmd, addr);
+	if (!pte_present(*pte)) {
+		pte_unmap(pte);
+		return 0;
+	}
+
+	pte_unmap(pte);
+	return PAGE_SIZE;
+}
+
+#else
+
+static u64 __perf_get_page_size(struct mm_struct *mm, unsigned long addr)
+{
+	return 0;
+}
+
+#endif
+
+/* Return the page size of a given virtual address. */
+static u64 perf_get_page_size(unsigned long addr)
+{
+	struct mm_struct *mm;
+	unsigned long flags;
+	u64 size;
+
+	if (!addr)
+		return 0;
+
+	/*
+	 * Software page-table walkers must disable IRQs,
+	 * which prevents any tear down of the page tables.
+	 */
+	local_irq_save(flags);
+
+	mm = current->mm;
+	if (!mm) {
+		/*
+		 * For kernel threads and the like, use init_mm so that
+		 * we can find kernel memory.
+		 */
+		mm = &init_mm;
+	}
+
+	size = __perf_get_page_size(mm, addr);
+
+	local_irq_restore(flags);
+
+	return size;
+}
+
 static struct perf_callchain_entry __empty_callchain = { .nr = 0, };
 
 struct perf_callchain_entry *
@@ -7151,6 +7269,9 @@ void perf_prepare_sample(struct perf_event_header *header,
 	}
 #endif
 
+	if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
+		data->data_page_size = perf_get_page_size(data->addr);
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		u64 size;
 
-- 
2.17.1



* [PATCH V6 02/16] perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
  2020-08-10 21:24 ` [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:40   ` Peter Zijlstra
  2020-08-10 21:24 ` [PATCH V6 03/16] perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE Kan Liang
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov, Kan Liang

The new sample type, PERF_SAMPLE_DATA_PAGE_SIZE, requires the virtual
address. Update data->addr if the sample type is set.

Large PEBS is disabled when this sample type is set, because perf doesn't
support munmap tracking yet. The large PEBS buffer cannot be flushed on
each munmap, so a wrong page size may be calculated for a stale mapping.
Large PEBS can be enabled separately later, once munmap tracking is
supported.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/ds.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 86848c57b55e..861cb5178cb6 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -961,7 +961,8 @@ static void adaptive_pebs_record_size_update(void)
 
 #define PERF_PEBS_MEMINFO_TYPE	(PERF_SAMPLE_ADDR | PERF_SAMPLE_DATA_SRC |   \
 				PERF_SAMPLE_PHYS_ADDR | PERF_SAMPLE_WEIGHT | \
-				PERF_SAMPLE_TRANSACTION)
+				PERF_SAMPLE_TRANSACTION |		     \
+				PERF_SAMPLE_DATA_PAGE_SIZE)
 
 static u64 pebs_update_adaptive_cfg(struct perf_event *event)
 {
@@ -1337,6 +1338,10 @@ static u64 get_data_src(struct perf_event *event, u64 aux)
 	return val;
 }
 
+#define PERF_SAMPLE_ADDR_TYPE	(PERF_SAMPLE_ADDR |		\
+				 PERF_SAMPLE_PHYS_ADDR |	\
+				 PERF_SAMPLE_DATA_PAGE_SIZE)
+
 static void setup_pebs_fixed_sample_data(struct perf_event *event,
 				   struct pt_regs *iregs, void *__pebs,
 				   struct perf_sample_data *data,
@@ -1451,7 +1456,7 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
 	}
 
 
-	if ((sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR)) &&
+	if ((sample_type & PERF_SAMPLE_ADDR_TYPE) &&
 	    x86_pmu.intel_cap.pebs_format >= 1)
 		data->addr = pebs->dla;
 
@@ -1579,7 +1584,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
 		if (sample_type & PERF_SAMPLE_DATA_SRC)
 			data->data_src.val = get_data_src(event, meminfo->aux);
 
-		if (sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR))
+		if (sample_type & PERF_SAMPLE_ADDR_TYPE)
 			data->addr = meminfo->address;
 
 		if (sample_type & PERF_SAMPLE_TRANSACTION)
-- 
2.17.1



* [PATCH V6 03/16] perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
  2020-08-10 21:24 ` [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
  2020-08-10 21:24 ` [PATCH V6 02/16] perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:41   ` Peter Zijlstra
  2020-08-10 21:24 ` [PATCH V6 04/16] tools headers UAPI: Update tools's copy of linux/perf_event.h Kan Liang
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov

From: Stephane Eranian <eranian@google.com>

When studying code layout, it is useful to capture the page size of the
sampled code address.

Add a new sample type for the code page size.
The new sample type requires collecting the ip. The code page size is
calculated by the IRQ-safe perf_get_page_size().

Only the generic support is covered. The large PEBS will be disabled
with this sample type.
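
For readers of the sample stream, the two new fields can be picked up as
sketched below. This is a simplified illustration, not the perf tool
parser: the TOY_* names and toy_parse_page_sizes() are hypothetical, the
bit values are copied from the enum in this series, and the field order
(data page size, then code page size, after the cgroup field and before
the AUX data) follows the order used in perf_output_sample() above.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit values copied from enum perf_event_sample_format in this series. */
#define TOY_SAMPLE_DATA_PAGE_SIZE	(1ULL << 22)
#define TOY_SAMPLE_CODE_PAGE_SIZE	(1ULL << 23)

struct toy_page_sizes {
	uint64_t data_page_size;
	uint64_t code_page_size;
};

/*
 * 'array' points at the first u64 that follows the cgroup field of a
 * sample and 'n' is the number of u64 words left in the record.
 * Returns the number of words consumed, or -1 if the record is too
 * short for the requested sample type.
 */
static int toy_parse_page_sizes(uint64_t type, const uint64_t *array,
				size_t n, struct toy_page_sizes *out)
{
	size_t used = 0;

	out->data_page_size = 0;
	out->code_page_size = 0;

	if (type & TOY_SAMPLE_DATA_PAGE_SIZE) {
		if (used >= n)
			return -1;
		out->data_page_size = array[used++];
	}

	if (type & TOY_SAMPLE_CODE_PAGE_SIZE) {
		if (used >= n)
			return -1;
		out->code_page_size = array[used++];
	}

	return (int)used;
}
```

A field that was not requested is reported as 0, the same value the
kernel writes when the page size cannot be determined.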

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 include/linux/perf_event.h      |  1 +
 include/uapi/linux/perf_event.h |  4 +++-
 kernel/events/core.c            | 11 ++++++++++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5de95f36d7a8..f3d5ca63d831 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1036,6 +1036,7 @@ struct perf_sample_data {
 	u64				phys_addr;
 	u64				cgroup;
 	u64				data_page_size;
+	u64				code_page_size;
 } ____cacheline_aligned;
 
 /* default value for data source */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 32484accc7a3..01c73860da48 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -144,8 +144,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_AUX				= 1U << 20,
 	PERF_SAMPLE_CGROUP			= 1U << 21,
 	PERF_SAMPLE_DATA_PAGE_SIZE		= 1U << 22,
+	PERF_SAMPLE_CODE_PAGE_SIZE		= 1U << 23,
 
-	PERF_SAMPLE_MAX = 1U << 23,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 24,		/* non-ABI */
 
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
@@ -881,6 +882,7 @@ enum perf_event_type {
 	 *	{ u64			size;
 	 *	  char			data[size]; } && PERF_SAMPLE_AUX
 	 *	{ u64			data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
+	 *	{ u64			code_page_size;} && PERF_SAMPLE_CODE_PAGE_SIZE
 	 * };
 	 */
 	PERF_RECORD_SAMPLE			= 9,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 00becacfd15e..6bfd7fd16d06 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1899,6 +1899,9 @@ static void __perf_event_header_size(struct perf_event *event, u64 sample_type)
 	if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
 		size += sizeof(data->data_page_size);
 
+	if (sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)
+		size += sizeof(data->code_page_size);
+
 	event->header_size = size;
 }
 
@@ -6946,6 +6949,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
 		perf_output_put(handle, data->data_page_size);
 
+	if (sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)
+		perf_output_put(handle, data->code_page_size);
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		perf_output_put(handle, data->aux_size);
 
@@ -7149,7 +7155,7 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 	__perf_event_header__init_id(header, data, event);
 
-	if (sample_type & PERF_SAMPLE_IP)
+	if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
 		data->ip = perf_instruction_pointer(regs);
 
 	if (sample_type & PERF_SAMPLE_CALLCHAIN) {
@@ -7272,6 +7278,9 @@ void perf_prepare_sample(struct perf_event_header *header,
 	if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
 		data->data_page_size = perf_get_page_size(data->addr);
 
+	if (sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)
+		data->code_page_size = perf_get_page_size(data->ip);
+
 	if (sample_type & PERF_SAMPLE_AUX) {
 		u64 size;
 
-- 
2.17.1



* [PATCH V6 04/16] tools headers UAPI: Update tools's copy of linux/perf_event.h
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (2 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 03/16] perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:24 ` [PATCH V6 05/16] perf record: Support new sample type for data page size Kan Liang
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov, Kan Liang

To get the changes in:
   ("perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE")
   ("perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE")

This silences this perf tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h'
differs from latest version at 'include/uapi/linux/perf_event.h'
  diff -u tools/include/uapi/linux/perf_event.h
include/uapi/linux/perf_event.h

This update is a prerequisite to adding support for the data and code
page size sample types to the perf tools.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/include/uapi/linux/perf_event.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 7b2d6fc9e6ed..4666405adce6 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -143,8 +143,10 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
 	PERF_SAMPLE_AUX				= 1U << 20,
 	PERF_SAMPLE_CGROUP			= 1U << 21,
+	PERF_SAMPLE_DATA_PAGE_SIZE		= 1U << 22,
+	PERF_SAMPLE_CODE_PAGE_SIZE		= 1U << 23,
 
-	PERF_SAMPLE_MAX = 1U << 22,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 24,		/* non-ABI */
 
 	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
 };
@@ -878,6 +880,8 @@ enum perf_event_type {
 	 *	{ u64			phys_addr;} && PERF_SAMPLE_PHYS_ADDR
 	 *	{ u64			size;
 	 *	  char			data[size]; } && PERF_SAMPLE_AUX
+	 *	{ u64			data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
+	 *	{ u64			code_page_size;} && PERF_SAMPLE_CODE_PAGE_SIZE
 	 * };
 	 */
 	PERF_RECORD_SAMPLE			= 9,
-- 
2.17.1



* [PATCH V6 05/16] perf record: Support new sample type for data page size
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (3 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 04/16] tools headers UAPI: Update tools's copy of linux/perf_event.h Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:24 ` [PATCH V6 06/16] perf script: Use ULL for enum perf_output_field Kan Liang
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov, Kan Liang

Support the new sample type PERF_SAMPLE_DATA_PAGE_SIZE for the data page
size.

Add a new option, --data-page-size, to record the page size of the
sampled data address.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt  | 3 +++
 tools/perf/builtin-record.c               | 2 ++
 tools/perf/util/event.h                   | 1 +
 tools/perf/util/evsel.c                   | 9 +++++++++
 tools/perf/util/perf_event_attr_fprintf.c | 2 +-
 tools/perf/util/record.h                  | 1 +
 tools/perf/util/synthetic-events.c        | 8 ++++++++
 7 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index fa8a5fcd27ab..cbc3f7fdf48d 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -289,6 +289,9 @@ OPTIONS
 --phys-data::
 	Record the sample physical addresses.
 
+--data-page-size::
+	Record the sampled data address data page size
+
 -T::
 --timestamp::
 	Record the sample timestamps. Use it with 'perf report -D' to see the
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index a37e7910e9e9..27d8e563fe33 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2445,6 +2445,8 @@ static struct option __record_options[] = {
 	OPT_BOOLEAN('d', "data", &record.opts.sample_address, "Record the sample addresses"),
 	OPT_BOOLEAN(0, "phys-data", &record.opts.sample_phys_addr,
 		    "Record the sample physical addresses"),
+	OPT_BOOLEAN(0, "data-page-size", &record.opts.sample_data_page_size,
+		    "Record the sampled data address data page size"),
 	OPT_BOOLEAN(0, "sample-cpu", &record.opts.sample_cpu, "Record the sample cpu"),
 	OPT_BOOLEAN_SET('T', "timestamp", &record.opts.sample_time,
 			&record.opts.sample_time_set,
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 6ae01c3c2ffa..69cdf14c23fa 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -135,6 +135,7 @@ struct perf_sample {
 	u32 raw_size;
 	u64 data_src;
 	u64 phys_addr;
+	u64 data_page_size;
 	u64 cgroup;
 	u32 flags;
 	u16 insn_len;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ef802f6d40c1..9e5e986b56bc 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1075,6 +1075,9 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 		evsel__set_sample_bit(evsel, CGROUP);
 	}
 
+	if (opts->sample_data_page_size)
+		evsel__set_sample_bit(evsel, DATA_PAGE_SIZE);
+
 	if (opts->record_switch_events)
 		attr->context_switch = track;
 
@@ -2245,6 +2248,12 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 		array++;
 	}
 
+	data->data_page_size = 0;
+	if (type & PERF_SAMPLE_DATA_PAGE_SIZE) {
+		data->data_page_size = *array;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		OVERFLOW_CHECK_u64(array);
 		sz = *array++;
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index b94fa07f5d32..68188c7c188a 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -35,7 +35,7 @@ static void __p_sample_type(char *buf, size_t size, u64 value)
 		bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER),
 		bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC),
 		bit_name(WEIGHT), bit_name(PHYS_ADDR), bit_name(AUX),
-		bit_name(CGROUP),
+		bit_name(CGROUP), bit_name(DATA_PAGE_SIZE),
 		{ .name = NULL, }
 	};
 #undef bit_name
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 39d1de4b2a36..924fcfbebd18 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -22,6 +22,7 @@ struct record_opts {
 	bool	      raw_samples;
 	bool	      sample_address;
 	bool	      sample_phys_addr;
+	bool	      sample_data_page_size;
 	bool	      sample_weight;
 	bool	      sample_time;
 	bool	      sample_time_set;
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 89b390623b63..0de5f8c0b867 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1406,6 +1406,9 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_CGROUP)
 		result += sizeof(u64);
 
+	if (type & PERF_SAMPLE_DATA_PAGE_SIZE)
+		result += sizeof(u64);
+
 	if (type & PERF_SAMPLE_AUX) {
 		result += sizeof(u64);
 		result += sample->aux_sample.size;
@@ -1585,6 +1588,11 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 		array++;
 	}
 
+	if (type & PERF_SAMPLE_DATA_PAGE_SIZE) {
+		*array = sample->data_page_size;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		sz = sample->aux_sample.size;
 		*array++ = sz;
-- 
2.17.1



* [PATCH V6 06/16] perf script: Use ULL for enum perf_output_field
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (4 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 05/16] perf record: Support new sample type for data page size Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-12 12:21   ` Arnaldo Carvalho de Melo
  2020-08-10 21:24 ` [PATCH V6 07/16] perf script: Support data page size Kan Liang
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov, Kan Liang

The bitwise shift operator (1U << ) is used in the enum
perf_output_field, which has already reached its capacity (32 items).
If more items are added, a compile error will be triggered.

Change the U to ULL, which extends the capacity to 64 items.

The enum perf_output_field is only used to calculate a value for the
'fields' member of the output structure, which is a u64, so the change
doesn't break anything.
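
A small stand-alone sketch of why the wider constant is needed: with a
32-bit unsigned int, 1U can only be shifted by 0..31, so a 33rd flag has
to be built from a 64-bit operand. The enum and function names below are
hypothetical and only the shift widths matter. Like the perf sources,
the sketch assumes a compiler such as GCC or Clang that accepts
enumerator values wider than int.

```c
#include <stdint.h>

/* Hypothetical field names; only the shift widths are of interest. */
enum toy_output_field {
	TOY_OUTPUT_LAST_32BIT	= 1ULL << 31,	/* last flag that fits in 32 bits */
	TOY_OUTPUT_FIRST_NEW	= 1ULL << 32,	/* needs a 64-bit operand */
};

/* Combine the flags into a u64 mask, as done for the 'fields' value. */
static uint64_t toy_fields_mask(void)
{
	return (uint64_t)TOY_OUTPUT_LAST_32BIT |
	       (uint64_t)TOY_OUTPUT_FIRST_NEW;
}
```

Because the mask is stored in a u64, both flags survive; a 32-bit mask
would silently drop the second one.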

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-script.c | 64 ++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 447457786362..214bec350971 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -82,38 +82,38 @@ static bool			native_arch;
 unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
 
 enum perf_output_field {
-	PERF_OUTPUT_COMM            = 1U << 0,
-	PERF_OUTPUT_TID             = 1U << 1,
-	PERF_OUTPUT_PID             = 1U << 2,
-	PERF_OUTPUT_TIME            = 1U << 3,
-	PERF_OUTPUT_CPU             = 1U << 4,
-	PERF_OUTPUT_EVNAME          = 1U << 5,
-	PERF_OUTPUT_TRACE           = 1U << 6,
-	PERF_OUTPUT_IP              = 1U << 7,
-	PERF_OUTPUT_SYM             = 1U << 8,
-	PERF_OUTPUT_DSO             = 1U << 9,
-	PERF_OUTPUT_ADDR            = 1U << 10,
-	PERF_OUTPUT_SYMOFFSET       = 1U << 11,
-	PERF_OUTPUT_SRCLINE         = 1U << 12,
-	PERF_OUTPUT_PERIOD          = 1U << 13,
-	PERF_OUTPUT_IREGS	    = 1U << 14,
-	PERF_OUTPUT_BRSTACK	    = 1U << 15,
-	PERF_OUTPUT_BRSTACKSYM	    = 1U << 16,
-	PERF_OUTPUT_DATA_SRC	    = 1U << 17,
-	PERF_OUTPUT_WEIGHT	    = 1U << 18,
-	PERF_OUTPUT_BPF_OUTPUT	    = 1U << 19,
-	PERF_OUTPUT_CALLINDENT	    = 1U << 20,
-	PERF_OUTPUT_INSN	    = 1U << 21,
-	PERF_OUTPUT_INSNLEN	    = 1U << 22,
-	PERF_OUTPUT_BRSTACKINSN	    = 1U << 23,
-	PERF_OUTPUT_BRSTACKOFF	    = 1U << 24,
-	PERF_OUTPUT_SYNTH           = 1U << 25,
-	PERF_OUTPUT_PHYS_ADDR       = 1U << 26,
-	PERF_OUTPUT_UREGS	    = 1U << 27,
-	PERF_OUTPUT_METRIC	    = 1U << 28,
-	PERF_OUTPUT_MISC            = 1U << 29,
-	PERF_OUTPUT_SRCCODE	    = 1U << 30,
-	PERF_OUTPUT_IPC             = 1U << 31,
+	PERF_OUTPUT_COMM            = 1ULL << 0,
+	PERF_OUTPUT_TID             = 1ULL << 1,
+	PERF_OUTPUT_PID             = 1ULL << 2,
+	PERF_OUTPUT_TIME            = 1ULL << 3,
+	PERF_OUTPUT_CPU             = 1ULL << 4,
+	PERF_OUTPUT_EVNAME          = 1ULL << 5,
+	PERF_OUTPUT_TRACE           = 1ULL << 6,
+	PERF_OUTPUT_IP              = 1ULL << 7,
+	PERF_OUTPUT_SYM             = 1ULL << 8,
+	PERF_OUTPUT_DSO             = 1ULL << 9,
+	PERF_OUTPUT_ADDR            = 1ULL << 10,
+	PERF_OUTPUT_SYMOFFSET       = 1ULL << 11,
+	PERF_OUTPUT_SRCLINE         = 1ULL << 12,
+	PERF_OUTPUT_PERIOD          = 1ULL << 13,
+	PERF_OUTPUT_IREGS	    = 1ULL << 14,
+	PERF_OUTPUT_BRSTACK	    = 1ULL << 15,
+	PERF_OUTPUT_BRSTACKSYM	    = 1ULL << 16,
+	PERF_OUTPUT_DATA_SRC	    = 1ULL << 17,
+	PERF_OUTPUT_WEIGHT	    = 1ULL << 18,
+	PERF_OUTPUT_BPF_OUTPUT	    = 1ULL << 19,
+	PERF_OUTPUT_CALLINDENT	    = 1ULL << 20,
+	PERF_OUTPUT_INSN	    = 1ULL << 21,
+	PERF_OUTPUT_INSNLEN	    = 1ULL << 22,
+	PERF_OUTPUT_BRSTACKINSN	    = 1ULL << 23,
+	PERF_OUTPUT_BRSTACKOFF	    = 1ULL << 24,
+	PERF_OUTPUT_SYNTH           = 1ULL << 25,
+	PERF_OUTPUT_PHYS_ADDR       = 1ULL << 26,
+	PERF_OUTPUT_UREGS	    = 1ULL << 27,
+	PERF_OUTPUT_METRIC	    = 1ULL << 28,
+	PERF_OUTPUT_MISC            = 1ULL << 29,
+	PERF_OUTPUT_SRCCODE	    = 1ULL << 30,
+	PERF_OUTPUT_IPC             = 1ULL << 31,
 };
 
 struct output_option {
-- 
2.17.1



* [PATCH V6 07/16] perf script: Support data page size
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (5 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 06/16] perf script: Use ULL for enum perf_output_field Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:24 ` [PATCH V6 08/16] perf sort: Add sort option for " Kan Liang
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov, Kan Liang

Display the data page size if it is available.

The field can be selected by the user, for example:
  perf script --fields comm,event,phys_addr,data_page_size
            dtlb mem-loads:uP:        3fec82ea8 4K
            dtlb mem-loads:uP:        3fec82e90 4K
            dtlb mem-loads:uP:        3e23700a4 4K
            dtlb mem-loads:uP:        3fec82f20 4K
            dtlb mem-loads:uP:        3e23700a4 4K
            dtlb mem-loads:uP:        3b4211bec 4K
            dtlb mem-loads:uP:        382205dc0 2M
            dtlb mem-loads:uP:        36fa082c0 2M
            dtlb mem-loads:uP:        377607340 2M
            dtlb mem-loads:uP:        330010180 2M
            dtlb mem-loads:uP:        33200fd80 2M
            dtlb mem-loads:uP:        31b012b80 2M
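
The "4K" and "2M" labels above are byte counts reduced to a short unit
suffix. A minimal standalone sketch of that kind of formatting follows.
It is not the helper added by this patch: sketch_page_size_name() and
SKETCH_NAME_LEN are hypothetical names, and rendering a size of 0 as
"N/A" is an assumption carried over from the patch below.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical buffer length: up to 8 digits, one suffix character, NUL. */
#define SKETCH_NAME_LEN 10

/*
 * Format a page size given in bytes as a short label such as "4K" or "2M".
 * A size of 0 is rendered as "N/A" (assumed to mean "not available").
 */
static char *sketch_page_size_name(uint64_t size, char *buf, size_t len)
{
	static const char suffixes[] = { 'B', 'K', 'M', 'G', 'T' };
	size_t i = 0;

	assert(buf != NULL && len > 0);

	if (size == 0) {
		snprintf(buf, len, "N/A");
		return buf;
	}

	/* Divide by 1024 until the value fits or the last suffix is reached. */
	while (size >= 1024 && i < sizeof(suffixes) - 1) {
		size /= 1024;
		i++;
	}

	/* PRIu64 keeps the format correct on both 32-bit and 64-bit builds. */
	snprintf(buf, len, "%" PRIu64 "%c", size, suffixes[i]);
	return buf;
}
```

Stopping the loop at the last suffix keeps the index inside the suffix
table even for implausibly large inputs, and the division truncates, so
a size that is not a power of 1024 is rounded down.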

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-script.txt |  5 +++--
 tools/perf/builtin-script.c              | 17 +++++++++++++++--
 tools/perf/util/event.h                  |  3 +++
 tools/perf/util/session.c                | 23 +++++++++++++++++++++++
 4 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 372dfd110e6d..27a49f2e6cb7 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -116,8 +116,9 @@ OPTIONS
 --fields::
         Comma separated list of fields to print. Options are:
         comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
-        srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output, brstackinsn,
-        brstackoff, callindent, insn, insnlen, synth, phys_addr, metric, misc, srccode, ipc.
+	srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
+	brstackinsn, brstackoff, callindent, insn, insnlen, synth, phys_addr,
+	metric, misc, srccode, ipc, data_page_size.
         Field list can be prepended with the type, trace, sw or hw,
         to indicate to which event type the field list applies.
         e.g., -F sw:comm,tid,time,ip,sym  and -F trace:time,cpu,trace
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 214bec350971..69773025cc58 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -30,6 +30,7 @@
 #include "util/thread-stack.h"
 #include "util/time-utils.h"
 #include "util/path.h"
+#include "util/event.h"
 #include "ui/ui.h"
 #include "print_binary.h"
 #include "archinsn.h"
@@ -114,6 +115,7 @@ enum perf_output_field {
 	PERF_OUTPUT_MISC            = 1ULL << 29,
 	PERF_OUTPUT_SRCCODE	    = 1ULL << 30,
 	PERF_OUTPUT_IPC             = 1ULL << 31,
+	PERF_OUTPUT_DATA_PAGE_SIZE  = 1ULL << 32,
 };
 
 struct output_option {
@@ -152,6 +154,7 @@ struct output_option {
 	{.str = "misc", .field = PERF_OUTPUT_MISC},
 	{.str = "srccode", .field = PERF_OUTPUT_SRCCODE},
 	{.str = "ipc", .field = PERF_OUTPUT_IPC},
+	{.str = "data_page_size", .field = PERF_OUTPUT_DATA_PAGE_SIZE},
 };
 
 enum {
@@ -224,7 +227,8 @@ static struct {
 			      PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
 			      PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD |
 			      PERF_OUTPUT_ADDR | PERF_OUTPUT_DATA_SRC |
-			      PERF_OUTPUT_WEIGHT | PERF_OUTPUT_PHYS_ADDR,
+			      PERF_OUTPUT_WEIGHT | PERF_OUTPUT_PHYS_ADDR |
+			      PERF_OUTPUT_DATA_PAGE_SIZE,
 
 		.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
 	},
@@ -473,6 +477,10 @@ static int perf_evsel__check_attr(struct evsel *evsel, struct perf_session *sess
 	    evsel__check_stype(evsel, PERF_SAMPLE_PHYS_ADDR, "PHYS_ADDR", PERF_OUTPUT_PHYS_ADDR))
 		return -EINVAL;
 
+	if (PRINT_FIELD(DATA_PAGE_SIZE) &&
+	    evsel__check_stype(evsel, PERF_SAMPLE_DATA_PAGE_SIZE, "DATA_PAGE_SIZE", PERF_OUTPUT_DATA_PAGE_SIZE))
+		return -EINVAL;
+
 	return 0;
 }
 
@@ -1853,6 +1861,7 @@ static void process_event(struct perf_script *script,
 	unsigned int type = output_type(attr->type);
 	struct evsel_script *es = evsel->priv;
 	FILE *fp = es->fp;
+	char str[PAGE_SIZE_NAME_LEN];
 
 	if (output[type].fields == 0)
 		return;
@@ -1941,6 +1950,9 @@ static void process_event(struct perf_script *script,
 	if (PRINT_FIELD(PHYS_ADDR))
 		fprintf(fp, "%16" PRIx64, sample->phys_addr);
 
+	if (PRINT_FIELD(DATA_PAGE_SIZE))
+		fprintf(fp, " %s", get_page_size_name(sample->data_page_size, str));
+
 	perf_sample__fprintf_ipc(sample, attr, fp);
 
 	fprintf(fp, "\n");
@@ -3423,7 +3435,8 @@ int cmd_script(int argc, const char **argv)
 		     "Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,"
 		     "addr,symoff,srcline,period,iregs,uregs,brstack,"
 		     "brstacksym,flags,bpf-output,brstackinsn,brstackoff,"
-		     "callindent,insn,insnlen,synth,phys_addr,metric,misc,ipc",
+		     "callindent,insn,insnlen,synth,phys_addr,metric,misc,ipc,"
+		     "data_page_size",
 		     parse_output_fields),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 69cdf14c23fa..9db85c515a8b 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -404,4 +404,7 @@ extern int sysctl_perf_event_max_stack;
 extern int sysctl_perf_event_max_contexts_per_stack;
 extern unsigned int proc_map_timeout;
 
+#define PAGE_SIZE_NAME_LEN	10
+char *get_page_size_name(u64 size, char *str);
+
 #endif /* __PERF_RECORD_H */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 1a157e84a04a..f810b07d10d2 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1225,10 +1225,30 @@ static void dump_event(struct evlist *evlist, union perf_event *event,
 	       event->header.size, perf_event__name(event->header.type));
 }
 
+char *get_page_size_name(u64 size, char *str)
+{
+	const char suffixes[5] = { 'B', 'K', 'M', 'G', 'T' };
+	int i;
+
+	if (size == 0) {
+		snprintf(str, PAGE_SIZE_NAME_LEN, "%s", "N/A");
+		return str;
+	}
+	for (i = 0; i < 4; i++) {
+		if (size < 1024)
+			break;
+		size /= 1024;
+	}
+
+	snprintf(str, PAGE_SIZE_NAME_LEN, "%" PRIu64 "%c", size, suffixes[i]);
+	return str;
+}
+
 static void dump_sample(struct evsel *evsel, union perf_event *event,
 			struct perf_sample *sample)
 {
 	u64 sample_type;
+	char str[PAGE_SIZE_NAME_LEN];
 
 	if (!dump_trace)
 		return;
@@ -1263,6 +1283,9 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		printf(" .. phys_addr: 0x%"PRIx64"\n", sample->phys_addr);
 
+	if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
+		printf(" .. data page size: %s\n", get_page_size_name(sample->data_page_size, str));
+
 	if (sample_type & PERF_SAMPLE_TRANSACTION)
 		printf("... transaction: %" PRIx64 "\n", sample->transaction);
 
-- 
2.17.1



* [PATCH V6 08/16] perf sort: Add sort option for data page size
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (6 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 07/16] perf script: Support data page size Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:24 ` [PATCH V6 09/16] perf mem: Factor out a function to generate sort order Kan Liang
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov, Kan Liang

Add a new sort option "data_page_size" for --mem-mode sort.  With this
option applied, perf can sort and report samples by their data page size.

Here is an example.
perf report --stdio --mem-mode --sort=comm,symbol,phys_daddr,data_page_size

 # To display the perf.data header info, please use
 # --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 9K of event 'mem-loads:uP'
 # Total weight : 9028
 # Sort order   : comm,symbol,phys_daddr,data_page_size
 #
 # Overhead  Command  Symbol                        Data Physical Address   Data Page Size
 # ........  .......  ............................  ......................  ......................
 #
    11.19%  dtlb     [.] touch_buffer              [.] 0x00000003fec82ea8  4K
     8.61%  dtlb     [.] GetTickCount              [.] 0x00000003c4f2c8a8  4K
     4.52%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f58  4K
     4.33%  dtlb     [.] __gettimeofday            [.] 0x00000003fec82f48  4K
     4.32%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f78  4K
     4.28%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f50  4K
     4.23%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f70  4K
     4.11%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f68  4K
     4.00%  dtlb     [.] Calibrate                 [.] 0x00000003fec82f98  4K
     3.91%  dtlb     [.] Calibrate                 [.] 0x00000003fec82f90  4K
     3.43%  dtlb     [.] touch_buffer              [.] 0x00000003fec82e98  4K
     3.42%  dtlb     [.] touch_buffer              [.] 0x00000003fec82e90  4K
     0.09%  dtlb     [.] DoDependentLoads          [.] 0x000000036ea084c0  2M
     0.08%  dtlb     [.] DoDependentLoads          [.] 0x000000032b010b80  2M
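
As an illustration of ordering entries by page size, here is a small
standalone sketch. It is not the comparator added by this patch:
struct sketch_entry and both function names are hypothetical, and the
sketch compares the two unsigned values directly instead of casting
their difference to a signed type. Larger page sizes sort first, which
matches the "r - l" direction used in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a histogram entry. */
struct sketch_entry {
	uint64_t data_page_size;	/* page size in bytes, 0 if unknown */
};

/* Order entries by descending page size without subtracting unsigned values. */
static int sketch_page_size_cmp(const void *a, const void *b)
{
	const struct sketch_entry *l = a;
	const struct sketch_entry *r = b;

	/* Yields -1, 0 or 1; the entry with the larger page size sorts first. */
	return (r->data_page_size > l->data_page_size) -
	       (r->data_page_size < l->data_page_size);
}

static void sketch_sort_by_page_size(struct sketch_entry *entries, size_t nr)
{
	assert(entries != NULL || nr == 0);

	if (nr == 0)
		return;

	qsort(entries, nr, sizeof(*entries), sketch_page_size_cmp);
}
```

Comparing instead of subtracting avoids any dependence on how a large
unsigned difference converts to a signed return value.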

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/util/hist.c                   |  3 +++
 tools/perf/util/hist.h                   |  1 +
 tools/perf/util/machine.c                |  7 ++++--
 tools/perf/util/map_symbol.h             |  1 +
 tools/perf/util/sort.c                   | 30 ++++++++++++++++++++++++
 tools/perf/util/sort.h                   |  1 +
 7 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index d068103690cc..8f7f4e9605d8 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -150,6 +150,7 @@ OPTIONS
 	- snoop: type of snoop (if any) for the data at the time of the sample
 	- dcacheline: the cacheline the data address is on at the time of the sample
 	- phys_daddr: physical address of data being executed on at the time of sample
+	- data_page_size: the data page size of data being executed on at the time of sample
 
 	And the default sort keys are changed to local_weight, mem, sym, dso,
 	symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 8a793e4c9400..7829ecd7ea59 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -188,6 +188,9 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 		hists__new_col_len(hists, HISTC_MEM_PHYS_DADDR,
 				   unresolved_col_width + 4 + 2);
 
+		hists__new_col_len(hists, HISTC_MEM_DATA_PAGE_SIZE,
+				   unresolved_col_width + 4 + 2);
+
 	} else {
 		symlen = unresolved_col_width + 4 + 2;
 		hists__new_col_len(hists, HISTC_MEM_DADDR_SYMBOL, symlen);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 96b1c13bbccc..e44cf5bb655f 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -56,6 +56,7 @@ enum hist_column {
 	HISTC_MEM_DADDR_SYMBOL,
 	HISTC_MEM_DADDR_DSO,
 	HISTC_MEM_PHYS_DADDR,
+	HISTC_MEM_DATA_PAGE_SIZE,
 	HISTC_MEM_LOCKED,
 	HISTC_MEM_TLB,
 	HISTC_MEM_LVL,
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index d5384807372b..3f078ad65e95 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1967,11 +1967,12 @@ static void ip__resolve_ams(struct thread *thread,
 	ams->ms.sym = al.sym;
 	ams->ms.map = al.map;
 	ams->phys_addr = 0;
+	ams->data_page_size = 0;
 }
 
 static void ip__resolve_data(struct thread *thread,
 			     u8 m, struct addr_map_symbol *ams,
-			     u64 addr, u64 phys_addr)
+			     u64 addr, u64 phys_addr, u64 daddr_page_size)
 {
 	struct addr_location al;
 
@@ -1985,6 +1986,7 @@ static void ip__resolve_data(struct thread *thread,
 	ams->ms.sym = al.sym;
 	ams->ms.map = al.map;
 	ams->phys_addr = phys_addr;
+	ams->data_page_size = daddr_page_size;
 }
 
 struct mem_info *sample__resolve_mem(struct perf_sample *sample,
@@ -1997,7 +1999,8 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 
 	ip__resolve_ams(al->thread, &mi->iaddr, sample->ip);
 	ip__resolve_data(al->thread, al->cpumode, &mi->daddr,
-			 sample->addr, sample->phys_addr);
+			 sample->addr, sample->phys_addr,
+			 sample->data_page_size);
 	mi->data_src.val = sample->data_src;
 
 	return mi;
diff --git a/tools/perf/util/map_symbol.h b/tools/perf/util/map_symbol.h
index 5b8ca93798e9..7d22ade082c8 100644
--- a/tools/perf/util/map_symbol.h
+++ b/tools/perf/util/map_symbol.h
@@ -19,5 +19,6 @@ struct addr_map_symbol {
 	u64	      addr;
 	u64	      al_addr;
 	u64	      phys_addr;
+	u64	      data_page_size;
 };
 #endif // __PERF_MAP_SYMBOL
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index d42339df20f8..ad9666db07fb 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1462,6 +1462,35 @@ struct sort_entry sort_mem_phys_daddr = {
 	.se_width_idx	= HISTC_MEM_PHYS_DADDR,
 };
 
+static int64_t
+sort__data_page_size_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	uint64_t l = 0, r = 0;
+
+	if (left->mem_info)
+		l = left->mem_info->daddr.data_page_size;
+	if (right->mem_info)
+		r = right->mem_info->daddr.data_page_size;
+
+	return (int64_t)(r - l);
+}
+
+static int hist_entry__data_page_size_snprintf(struct hist_entry *he, char *bf,
+					  size_t size, unsigned int width)
+{
+	char str[PAGE_SIZE_NAME_LEN];
+
+	return repsep_snprintf(bf, size, "%-*s", width,
+			       get_page_size_name(he->mem_info->daddr.data_page_size, str));
+}
+
+struct sort_entry sort_mem_data_page_size = {
+	.se_header	= "Data Page Size",
+	.se_cmp		= sort__data_page_size_cmp,
+	.se_snprintf	= hist_entry__data_page_size_snprintf,
+	.se_width_idx	= HISTC_MEM_DATA_PAGE_SIZE,
+};
+
 static int64_t
 sort__abort_cmp(struct hist_entry *left, struct hist_entry *right)
 {
@@ -1740,6 +1769,7 @@ static struct sort_dimension memory_sort_dimensions[] = {
 	DIM(SORT_MEM_SNOOP, "snoop", sort_mem_snoop),
 	DIM(SORT_MEM_DCACHELINE, "dcacheline", sort_mem_dcacheline),
 	DIM(SORT_MEM_PHYS_DADDR, "phys_daddr", sort_mem_phys_daddr),
+	DIM(SORT_MEM_DATA_PAGE_SIZE, "data_page_size", sort_mem_data_page_size),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 66d39c4cfe2b..e50f2b695bc4 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -255,6 +255,7 @@ enum sort_type {
 	SORT_MEM_DCACHELINE,
 	SORT_MEM_IADDR_SYMBOL,
 	SORT_MEM_PHYS_DADDR,
+	SORT_MEM_DATA_PAGE_SIZE,
 };
 
 /*
-- 
2.17.1



* [PATCH V6 09/16] perf mem: Factor out a function to generate sort order
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (7 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 08/16] perf sort: Add sort option for " Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:24 ` [PATCH V6 10/16] perf mem: Clean up output format Kan Liang
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov, Kan Liang

Currently, "--phys-data" is the only option that affects the sort order,
so a simple "if else" is enough to handle it. More options that also
affect the sort order, e.g. "--data-page-size", will be added, and the
code would then become too complex to maintain.

Divide the sort order string into several small pieces.
The first piece is always the default sort string for LOAD/STORE.
Append the option-specific sort string if the related option is applied.

No functional change.
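
The piecewise construction described above can be sketched as a
standalone function. This is an illustration under assumptions, not the
code in the diff below: struct sketch_mem_opts, sketch_append() and
sketch_sort_order() are hypothetical names, and the sketch checks the
remaining buffer space on every append rather than relying on a fixed
128-byte buffer being large enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set; the field names are assumptions for this sketch. */
struct sketch_mem_opts {
	bool load;		/* load samples were requested */
	bool phys_addr;		/* stands in for --phys-data */
};

/* Append piece to dst only if it fits, including the terminating NUL. */
static bool sketch_append(char *dst, size_t size, const char *piece)
{
	size_t used = strlen(dst);
	size_t add = strlen(piece);

	if (used + add + 1 > size)
		return false;
	memcpy(dst + used, piece, add + 1);
	return true;
}

/*
 * Build a "--sort=..." argument from a base string plus optional pieces.
 * Returns a heap-allocated string the caller must free(), or NULL when no
 * override of the default order is needed or when building it fails.
 */
static char *sketch_sort_order(const struct sketch_mem_opts *opts)
{
	char sort[128] = "";
	char *copy;
	bool ok;

	assert(opts != NULL);

	/* First piece: the base sort string for stores or for loads. */
	if (!opts->load)
		ok = sketch_append(sort, sizeof(sort),
				   "--sort=mem,sym,dso,symbol_daddr,dso_daddr,tlb,locked");
	else if (opts->phys_addr)
		ok = sketch_append(sort, sizeof(sort),
				   "--sort=local_weight,mem,sym,dso,symbol_daddr,"
				   "dso_daddr,snoop,tlb,locked");
	else
		return NULL;

	/* Optional pieces: one append per enabled option. */
	if (ok && opts->phys_addr)
		ok = sketch_append(sort, sizeof(sort), ",phys_daddr");
	if (!ok)
		return NULL;

	copy = malloc(strlen(sort) + 1);
	if (copy)
		memcpy(copy, sort, strlen(sort) + 1);
	return copy;
}
```

With this shape, supporting a further option only needs one more
conditional append, which is the maintainability argument made above.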

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-mem.c | 41 ++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 3523279af6af..7fb04f41cd99 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -265,11 +265,35 @@ static int report_raw_events(struct perf_mem *mem)
 	perf_session__delete(session);
 	return ret;
 }
+static char *get_sort_order(struct perf_mem *mem)
+{
+	bool has_extra_options = mem->phys_addr ? true : false;
+	char sort[128];
+
+	/*
+	 * there is no weight (cost) associated with stores, so don't print
+	 * the column
+	 */
+	if (!(mem->operation & MEM_OPERATION_LOAD)) {
+		strcpy(sort, "--sort=mem,sym,dso,symbol_daddr,"
+			     "dso_daddr,tlb,locked");
+	} else if (has_extra_options) {
+		strcpy(sort, "--sort=local_weight,mem,sym,dso,symbol_daddr,"
+			     "dso_daddr,snoop,tlb,locked");
+	} else
+		return NULL;
+
+	if (mem->phys_addr)
+		strcat(sort, ",phys_daddr");
+
+	return strdup(sort);
+}
 
 static int report_events(int argc, const char **argv, struct perf_mem *mem)
 {
 	const char **rep_argv;
 	int ret, i = 0, j, rep_argc;
+	char *new_sort_order;
 
 	if (mem->dump_raw)
 		return report_raw_events(mem);
@@ -283,20 +307,9 @@ static int report_events(int argc, const char **argv, struct perf_mem *mem)
 	rep_argv[i++] = "--mem-mode";
 	rep_argv[i++] = "-n"; /* display number of samples */
 
-	/*
-	 * there is no weight (cost) associated with stores, so don't print
-	 * the column
-	 */
-	if (!(mem->operation & MEM_OPERATION_LOAD)) {
-		if (mem->phys_addr)
-			rep_argv[i++] = "--sort=mem,sym,dso,symbol_daddr,"
-					"dso_daddr,tlb,locked,phys_daddr";
-		else
-			rep_argv[i++] = "--sort=mem,sym,dso,symbol_daddr,"
-					"dso_daddr,tlb,locked";
-	} else if (mem->phys_addr)
-		rep_argv[i++] = "--sort=local_weight,mem,sym,dso,symbol_daddr,"
-				"dso_daddr,snoop,tlb,locked,phys_daddr";
+	new_sort_order = get_sort_order(mem);
+	if (new_sort_order)
+		rep_argv[i++] = new_sort_order;
 
 	for (j = 1; j < argc; j++, i++)
 		rep_argv[i] = argv[j];
-- 
2.17.1



* [PATCH V6 10/16] perf mem: Clean up output format
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (8 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 09/16] perf mem: Factor out a function to generate sort order Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:24 ` [PATCH V6 11/16] perf mem: Support data page size Kan Liang
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov, Kan Liang

Currently, "--phys-data" is the only option that affects the output
format, so a simple "if else" is enough to handle it. More options that
also affect the output format, e.g. "--data-page-size", will be added,
and the code would then become too complex to maintain.

Divide the big printf into several small pieces. Output a specific
piece only if the related option is applied.

No functional change.
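
The same idea can be shown with a reduced, standalone formatter that
writes the fixed leading fields, then an optional field, then the fixed
trailing fields. It is only a sketch: struct sketch_sample and
sketch_format_sample() are hypothetical, the record carries far fewer
fields than a real perf sample, and the sketch writes into a caller
buffer instead of calling printf() so the result can be inspected.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, reduced sample record used only for this sketch. */
struct sketch_sample {
	int pid;
	uint64_t addr;
	uint64_t phys_addr;
	uint64_t weight;
};

/*
 * Format one record in pieces: leading fields, an optional physical
 * address, then the trailing field. Returns the number of characters
 * written, or -1 if the buffer is too small.
 */
static int sketch_format_sample(char *buf, size_t size,
				const struct sketch_sample *s,
				bool show_phys, const char *sep)
{
	size_t off = 0;
	int n;

	assert(buf != NULL && size > 0 && s != NULL && sep != NULL);

	/* Piece 1: fields that are always printed first. */
	n = snprintf(buf + off, size - off, "%d%s0x%" PRIx64 "%s",
		     s->pid, sep, s->addr, sep);
	if (n < 0 || (size_t)n >= size - off)
		return -1;
	off += (size_t)n;

	/* Piece 2: printed only when the related option is enabled. */
	if (show_phys) {
		n = snprintf(buf + off, size - off, "0x%016" PRIx64 "%s",
			     s->phys_addr, sep);
		if (n < 0 || (size_t)n >= size - off)
			return -1;
		off += (size_t)n;
	}

	/* Piece 3: fields that are always printed last. */
	n = snprintf(buf + off, size - off, "%" PRIu64 "\n", s->weight);
	if (n < 0 || (size_t)n >= size - off)
		return -1;
	off += (size_t)n;

	return (int)off;
}
```

Each optional column then costs one extra block between the first and
last pieces, rather than another full copy of the format string.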

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-mem.c | 93 ++++++++++++++++------------------------
 1 file changed, 38 insertions(+), 55 deletions(-)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 7fb04f41cd99..200ff7c9d7b7 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -147,7 +147,7 @@ dump_raw_samples(struct perf_tool *tool,
 {
 	struct perf_mem *mem = container_of(tool, struct perf_mem, tool);
 	struct addr_location al;
-	const char *fmt;
+	const char *fmt, *field_sep;
 
 	if (machine__resolve(machine, &al, sample) < 0) {
 		fprintf(stderr, "problem processing %d event, skipping it.\n",
@@ -161,60 +161,41 @@ dump_raw_samples(struct perf_tool *tool,
 	if (al.map != NULL)
 		al.map->dso->hit = 1;
 
-	if (mem->phys_addr) {
-		if (symbol_conf.field_sep) {
-			fmt = "%d%s%d%s0x%"PRIx64"%s0x%"PRIx64"%s0x%016"PRIx64
-			      "%s%"PRIu64"%s0x%"PRIx64"%s%s:%s\n";
-		} else {
-			fmt = "%5d%s%5d%s0x%016"PRIx64"%s0x016%"PRIx64
-			      "%s0x%016"PRIx64"%s%5"PRIu64"%s0x%06"PRIx64
-			      "%s%s:%s\n";
-			symbol_conf.field_sep = " ";
-		}
-
-		printf(fmt,
-			sample->pid,
-			symbol_conf.field_sep,
-			sample->tid,
-			symbol_conf.field_sep,
-			sample->ip,
-			symbol_conf.field_sep,
-			sample->addr,
-			symbol_conf.field_sep,
-			sample->phys_addr,
-			symbol_conf.field_sep,
-			sample->weight,
-			symbol_conf.field_sep,
-			sample->data_src,
-			symbol_conf.field_sep,
-			al.map ? (al.map->dso ? al.map->dso->long_name : "???") : "???",
-			al.sym ? al.sym->name : "???");
+	field_sep = symbol_conf.field_sep;
+	if (field_sep) {
+		fmt = "%d%s%d%s0x%"PRIx64"%s0x%"PRIx64"%s";
 	} else {
-		if (symbol_conf.field_sep) {
-			fmt = "%d%s%d%s0x%"PRIx64"%s0x%"PRIx64"%s%"PRIu64
-			      "%s0x%"PRIx64"%s%s:%s\n";
-		} else {
-			fmt = "%5d%s%5d%s0x%016"PRIx64"%s0x016%"PRIx64
-			      "%s%5"PRIu64"%s0x%06"PRIx64"%s%s:%s\n";
-			symbol_conf.field_sep = " ";
-		}
+		fmt = "%5d%s%5d%s0x%016"PRIx64"%s0x016%"PRIx64"%s";
+		symbol_conf.field_sep = " ";
+	}
+	printf(fmt,
+		sample->pid,
+		symbol_conf.field_sep,
+		sample->tid,
+		symbol_conf.field_sep,
+		sample->ip,
+		symbol_conf.field_sep,
+		sample->addr,
+		symbol_conf.field_sep);
 
-		printf(fmt,
-			sample->pid,
-			symbol_conf.field_sep,
-			sample->tid,
-			symbol_conf.field_sep,
-			sample->ip,
-			symbol_conf.field_sep,
-			sample->addr,
-			symbol_conf.field_sep,
-			sample->weight,
-			symbol_conf.field_sep,
-			sample->data_src,
-			symbol_conf.field_sep,
-			al.map ? (al.map->dso ? al.map->dso->long_name : "???") : "???",
-			al.sym ? al.sym->name : "???");
+	if (mem->phys_addr) {
+		printf("0x%016"PRIx64"%s",
+			sample->phys_addr,
+			symbol_conf.field_sep);
 	}
+
+	if (field_sep)
+		fmt = "%"PRIu64"%s0x%"PRIx64"%s%s:%s\n";
+	else
+		fmt = "%5"PRIu64"%s0x%06"PRIx64"%s%s:%s\n";
+
+	printf(fmt,
+		sample->weight,
+		symbol_conf.field_sep,
+		sample->data_src,
+		symbol_conf.field_sep,
+		al.map ? (al.map->dso ? al.map->dso->long_name : "???") : "???",
+		al.sym ? al.sym->name : "???");
 out_put:
 	addr_location__put(&al);
 	return 0;
@@ -254,10 +235,12 @@ static int report_raw_events(struct perf_mem *mem)
 	if (ret < 0)
 		goto out_delete;
 
+	printf("# PID, TID, IP, ADDR, ");
+
 	if (mem->phys_addr)
-		printf("# PID, TID, IP, ADDR, PHYS ADDR, LOCAL WEIGHT, DSRC, SYMBOL\n");
-	else
-		printf("# PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL\n");
+		printf("PHYS ADDR, ");
+
+	printf("LOCAL WEIGHT, DSRC, SYMBOL\n");
 
 	ret = perf_session__process_events(session);
 
-- 
2.17.1



* [PATCH V6 11/16] perf mem: Support data page size
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (9 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 10/16] perf mem: Clean up output format Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:24 ` [PATCH V6 12/16] perf test: Add test case for PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov, Kan Liang

Add option --data-page-size in "perf mem" to record/report data page
size.

Here are some examples.
perf mem --phys-data --data-page-size report -D

 # PID, TID, IP, ADDR, PHYS ADDR, DATA PAGE SIZE, LOCAL WEIGHT, DSRC, SYMBOL
20134 20134 0xffffffffb5bd2fd0 0x016ffff9a274e96a308 0x000000044e96a308 4K  1168 0x5080144 /lib/modules/4.18.0-rc7+/build/vmlinux:perf_ctx_unlock
20134 20134 0xffffffffb63f645c 0xffffffffb752b814 0xcfb52b814 2M 225 0x26a100142 /lib/modules/4.18.0-rc7+/build/vmlinux:_raw_spin_lock
20134 20134 0xffffffffb660300c 0xfffffe00016b8bb0 0x0 4K 0 0x5080144 /lib/modules/4.18.0-rc7+/build/vmlinux:__x86_indirect_thunk_rax

perf mem --phys-data --data-page-size report --stdio

 # To display the perf.data header info, please use
 # --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 5K of event 'cpu/mem-loads,ldlat=30/P'
 # Total weight : 281234
 # Sort order   : mem,sym,dso,symbol_daddr,dso_daddr,tlb,locked,phys_daddr,data_page_size
 #
 # Overhead       Samples  Memory access             Symbol                            Shared Object     Data Symbol                                  Data Object              TLB access              Locked  Data Physical Address   Data Page Size
 # ........  ............  ........................  ................................  ................  ...........................................  .......................  ......................  ......  ......................  ......................
 #
    28.54%          1826  L1 or L1 hit              [k] __x86_indirect_thunk_rax      [kernel.vmlinux]  [k] 0xffffb0df31b0ff28                       [unknown]                L1 or L2 hit            No      [k] 0000000000000000    4K
     6.02%           256  L1 or L1 hit              [.] touch_buffer                  dtlb              [.] 0x00007ffd50109da8                       [stack]                  L1 or L2 hit            No      [.] 0x000000042454ada8  4K
     3.23%             5  L1 or L1 hit              [k] clear_huge_page               [kernel.vmlinux]  [k] 0xffff9a2753b8ce60                       [unknown]                L1 or L2 hit            No      [k] 0x0000000453b8ce60  2M
     2.98%             4  L1 or L1 hit              [k] clear_page_erms               [kernel.vmlinux]  [k] 0xffffb0df31b0fd00                       [unknown]                L1 or L2 hit            No      [k] 0000000000000000    4K

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-mem.txt |  3 +++
 tools/perf/builtin-mem.c              | 20 +++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 199ea0f0a6c0..66177511c5c4 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -63,6 +63,9 @@ OPTIONS
 --phys-data::
 	Record/Report sample physical addresses
 
+--data-page-size::
+	Record/Report sample data address page size
+
 RECORD OPTIONS
 --------------
 -e::
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 200ff7c9d7b7..433e02ab45eb 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -29,6 +29,7 @@ struct perf_mem {
 	bool			dump_raw;
 	bool			force;
 	bool			phys_addr;
+	bool			data_page_size;
 	int			operation;
 	const char		*cpu_list;
 	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
@@ -100,6 +101,9 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
 	if (mem->phys_addr)
 		rec_argv[i++] = "--phys-data";
 
+	if (mem->data_page_size)
+		rec_argv[i++] = "--data-page-size";
+
 	for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
 		if (!perf_mem_events[j].record)
 			continue;
@@ -148,6 +152,7 @@ dump_raw_samples(struct perf_tool *tool,
 	struct perf_mem *mem = container_of(tool, struct perf_mem, tool);
 	struct addr_location al;
 	const char *fmt, *field_sep;
+	char str[PAGE_SIZE_NAME_LEN];
 
 	if (machine__resolve(machine, &al, sample) < 0) {
 		fprintf(stderr, "problem processing %d event, skipping it.\n",
@@ -184,6 +189,12 @@ dump_raw_samples(struct perf_tool *tool,
 			symbol_conf.field_sep);
 	}
 
+	if (mem->data_page_size) {
+		printf("%s%s",
+			get_page_size_name(sample->data_page_size, str),
+			symbol_conf.field_sep);
+	}
+
 	if (field_sep)
 		fmt = "%"PRIu64"%s0x%"PRIx64"%s%s:%s\n";
 	else
@@ -240,6 +251,9 @@ static int report_raw_events(struct perf_mem *mem)
 	if (mem->phys_addr)
 		printf("PHYS ADDR, ");
 
+	if (mem->data_page_size)
+		printf("DATA PAGE SIZE, ");
+
 	printf("LOCAL WEIGHT, DSRC, SYMBOL\n");
 
 	ret = perf_session__process_events(session);
@@ -250,7 +264,7 @@ static int report_raw_events(struct perf_mem *mem)
 }
 static char *get_sort_order(struct perf_mem *mem)
 {
-	bool has_extra_options = mem->phys_addr ? true : false;
+	bool has_extra_options = (mem->phys_addr | mem->data_page_size) ? true : false;
 	char sort[128];
 
 	/*
@@ -269,6 +283,9 @@ static char *get_sort_order(struct perf_mem *mem)
 	if (mem->phys_addr)
 		strcat(sort, ",phys_daddr");
 
+	if (mem->data_page_size)
+		strcat(sort, ",data_page_size");
+
 	return strdup(sort);
 }
 
@@ -410,6 +427,7 @@ int cmd_mem(int argc, const char **argv)
 		   " between columns '.' is reserved."),
 	OPT_BOOLEAN('f', "force", &mem.force, "don't complain, do it"),
 	OPT_BOOLEAN('p', "phys-data", &mem.phys_addr, "Record/Report sample physical addresses"),
+	OPT_BOOLEAN(0, "data-page-size", &mem.data_page_size, "Record/Report sample data address page size"),
 	OPT_END()
 	};
 	const char *const mem_subcommands[] = { "record", "report", NULL };
-- 
2.17.1



* [PATCH V6 12/16] perf test: Add test case for PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (10 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 11/16] perf mem: Support data page size Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:24 ` [PATCH V6 13/16] perf tools: Add support for PERF_SAMPLE_CODE_PAGE_SIZE Kan Liang
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov, Kan Liang

Extend the sample-parsing test cases to cover the new sample type
PERF_SAMPLE_DATA_PAGE_SIZE.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/tests/sample-parsing.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index a0bdaf390ac8..6baed165c850 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -154,6 +154,9 @@ static bool samples_same(const struct perf_sample *s1,
 	if (type & PERF_SAMPLE_CGROUP)
 		COMP(cgroup);
 
+	if (type & PERF_SAMPLE_DATA_PAGE_SIZE)
+		COMP(data_page_size);
+
 	if (type & PERF_SAMPLE_AUX) {
 		COMP(aux_sample.size);
 		if (memcmp(s1->aux_sample.data, s2->aux_sample.data,
@@ -234,6 +237,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 		},
 		.phys_addr	= 113,
 		.cgroup		= 114,
+		.data_page_size	= 4096,
 		.aux_sample	= {
 			.size	= sizeof(aux_data),
 			.data	= (void *)aux_data,
@@ -340,7 +344,7 @@ int test__sample_parsing(struct test *test __maybe_unused, int subtest __maybe_u
 	 * were added.  Please actually update the test rather than just change
 	 * the condition below.
 	 */
-	if (PERF_SAMPLE_MAX > PERF_SAMPLE_CGROUP << 1) {
+	if (PERF_SAMPLE_MAX > PERF_SAMPLE_DATA_PAGE_SIZE << 1) {
 		pr_debug("sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating\n");
 		return -1;
 	}
-- 
2.17.1



* [PATCH V6 13/16] perf tools: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (11 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 12/16] perf test: Add test case for PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:24 ` [PATCH V6 14/16] perf script: " Kan Liang
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov

From: Stephane Eranian <eranian@google.com>

Add the infrastructure to sample the code address page size.

Introduce a new --code-page-size option for perf record.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/Documentation/perf-record.txt  | 3 +++
 tools/perf/builtin-record.c               | 2 ++
 tools/perf/util/event.h                   | 1 +
 tools/perf/util/evsel.c                   | 9 +++++++++
 tools/perf/util/perf_event_attr_fprintf.c | 2 +-
 tools/perf/util/record.h                  | 1 +
 tools/perf/util/synthetic-events.c        | 8 ++++++++
 7 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index cbc3f7fdf48d..3994318ac5b4 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -292,6 +292,9 @@ OPTIONS
 --data-page-size::
 	Record the sampled data address data page size
 
+--code-page-size::
+	Record the sampled code address (ip) page size
+
 -T::
 --timestamp::
 	Record the sample timestamps. Use it with 'perf report -D' to see the
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 27d8e563fe33..e44da98124ba 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2447,6 +2447,8 @@ static struct option __record_options[] = {
 		    "Record the sample physical addresses"),
 	OPT_BOOLEAN(0, "data-page-size", &record.opts.sample_data_page_size,
 		    "Record the sampled data address data page size"),
+	OPT_BOOLEAN(0, "code-page-size", &record.opts.sample_code_page_size,
+		    "Record the sampled code address (ip) page size"),
 	OPT_BOOLEAN(0, "sample-cpu", &record.opts.sample_cpu, "Record the sample cpu"),
 	OPT_BOOLEAN_SET('T', "timestamp", &record.opts.sample_time,
 			&record.opts.sample_time_set,
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 9db85c515a8b..5bd1b31f7b7f 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -136,6 +136,7 @@ struct perf_sample {
 	u64 data_src;
 	u64 phys_addr;
 	u64 data_page_size;
+	u64 code_page_size;
 	u64 cgroup;
 	u32 flags;
 	u16 insn_len;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 9e5e986b56bc..f96fdac83a02 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1078,6 +1078,9 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 	if (opts->sample_data_page_size)
 		evsel__set_sample_bit(evsel, DATA_PAGE_SIZE);
 
+	if (opts->sample_code_page_size)
+		evsel__set_sample_bit(evsel, CODE_PAGE_SIZE);
+
 	if (opts->record_switch_events)
 		attr->context_switch = track;
 
@@ -2254,6 +2257,12 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 		array++;
 	}
 
+	data->code_page_size = 0;
+	if (type & PERF_SAMPLE_CODE_PAGE_SIZE) {
+		data->code_page_size = *array;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		OVERFLOW_CHECK_u64(array);
 		sz = *array++;
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 68188c7c188a..a97a95f3b1be 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -35,7 +35,7 @@ static void __p_sample_type(char *buf, size_t size, u64 value)
 		bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER),
 		bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC),
 		bit_name(WEIGHT), bit_name(PHYS_ADDR), bit_name(AUX),
-		bit_name(CGROUP), bit_name(DATA_PAGE_SIZE),
+		bit_name(CGROUP), bit_name(DATA_PAGE_SIZE), bit_name(CODE_PAGE_SIZE),
 		{ .name = NULL, }
 	};
 #undef bit_name
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 924fcfbebd18..20141cb7a953 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -23,6 +23,7 @@ struct record_opts {
 	bool	      sample_address;
 	bool	      sample_phys_addr;
 	bool	      sample_data_page_size;
+	bool	      sample_code_page_size;
 	bool	      sample_weight;
 	bool	      sample_time;
 	bool	      sample_time_set;
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 0de5f8c0b867..88f8f42c6b76 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1409,6 +1409,9 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_DATA_PAGE_SIZE)
 		result += sizeof(u64);
 
+	if (type & PERF_SAMPLE_CODE_PAGE_SIZE)
+		result += sizeof(u64);
+
 	if (type & PERF_SAMPLE_AUX) {
 		result += sizeof(u64);
 		result += sample->aux_sample.size;
@@ -1593,6 +1596,11 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 		array++;
 	}
 
+	if (type & PERF_SAMPLE_CODE_PAGE_SIZE) {
+		*array = sample->code_page_size;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_AUX) {
 		sz = sample->aux_sample.size;
 		*array++ = sz;
-- 
2.17.1



* [PATCH V6 14/16] perf script: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (12 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 13/16] perf tools: Add support for PERF_SAMPLE_CODE_PAGE_SIZE Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:24 ` [PATCH V6 15/16] perf report: " Kan Liang
  2020-08-10 21:24 ` [PATCH V6 16/16] perf test: Add test case " Kan Liang
  15 siblings, 0 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov

From: Stephane Eranian <eranian@google.com>

Display sampled code page sizes when PERF_SAMPLE_CODE_PAGE_SIZE was set.

For example,
perf script --fields comm,event,ip,code_page_size
            dtlb mem-loads:uP:            445777 4K
            dtlb mem-loads:uP:            40f724 4K
            dtlb mem-loads:uP:            474926 4K
            dtlb mem-loads:uP:            401075 4K
            dtlb mem-loads:uP:            401095 4K
            dtlb mem-loads:uP:            401095 4K
            dtlb mem-loads:uP:            4010cc 4K
            dtlb mem-loads:uP:            440b6f 4K

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/Documentation/perf-script.txt |  2 +-
 tools/perf/builtin-script.c              | 13 +++++++++++--
 tools/perf/util/session.c                |  3 +++
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 27a49f2e6cb7..fd7ec6cb971c 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -118,7 +118,7 @@ OPTIONS
         comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
 	srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
 	brstackinsn, brstackoff, callindent, insn, insnlen, synth, phys_addr,
-	metric, misc, srccode, ipc, data_page_size.
+	metric, misc, srccode, ipc, data_page_size, code_page_size.
         Field list can be prepended with the type, trace, sw or hw,
         to indicate to which event type the field list applies.
         e.g., -F sw:comm,tid,time,ip,sym  and -F trace:time,cpu,trace
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 69773025cc58..e6341216e6f7 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -116,6 +116,7 @@ enum perf_output_field {
 	PERF_OUTPUT_SRCCODE	    = 1ULL << 30,
 	PERF_OUTPUT_IPC             = 1ULL << 31,
 	PERF_OUTPUT_DATA_PAGE_SIZE  = 1ULL << 32,
+	PERF_OUTPUT_CODE_PAGE_SIZE  = 1ULL << 33,
 };
 
 struct output_option {
@@ -155,6 +156,7 @@ struct output_option {
 	{.str = "srccode", .field = PERF_OUTPUT_SRCCODE},
 	{.str = "ipc", .field = PERF_OUTPUT_IPC},
 	{.str = "data_page_size", .field = PERF_OUTPUT_DATA_PAGE_SIZE},
+	{.str = "code_page_size", .field = PERF_OUTPUT_CODE_PAGE_SIZE},
 };
 
 enum {
@@ -228,7 +230,7 @@ static struct {
 			      PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD |
 			      PERF_OUTPUT_ADDR | PERF_OUTPUT_DATA_SRC |
 			      PERF_OUTPUT_WEIGHT | PERF_OUTPUT_PHYS_ADDR |
-			      PERF_OUTPUT_DATA_PAGE_SIZE,
+			      PERF_OUTPUT_DATA_PAGE_SIZE | PERF_OUTPUT_CODE_PAGE_SIZE,
 
 		.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
 	},
@@ -481,6 +483,10 @@ static int perf_evsel__check_attr(struct evsel *evsel, struct perf_session *sess
 	    evsel__check_stype(evsel, PERF_SAMPLE_DATA_PAGE_SIZE, "DATA_PAGE_SIZE", PERF_OUTPUT_DATA_PAGE_SIZE))
 		return -EINVAL;
 
+	if (PRINT_FIELD(CODE_PAGE_SIZE) &&
+	    evsel__check_stype(evsel, PERF_SAMPLE_CODE_PAGE_SIZE, "CODE_PAGE_SIZE", PERF_OUTPUT_CODE_PAGE_SIZE))
+		return -EINVAL;
+
 	return 0;
 }
 
@@ -1953,6 +1959,9 @@ static void process_event(struct perf_script *script,
 	if (PRINT_FIELD(DATA_PAGE_SIZE))
 		fprintf(fp, " %s", get_page_size_name(sample->data_page_size, str));
 
+	if (PRINT_FIELD(CODE_PAGE_SIZE))
+		fprintf(fp, " %s", get_page_size_name(sample->code_page_size, str));
+
 	perf_sample__fprintf_ipc(sample, attr, fp);
 
 	fprintf(fp, "\n");
@@ -3436,7 +3445,7 @@ int cmd_script(int argc, const char **argv)
 		     "addr,symoff,srcline,period,iregs,uregs,brstack,"
 		     "brstacksym,flags,bpf-output,brstackinsn,brstackoff,"
 		     "callindent,insn,insnlen,synth,phys_addr,metric,misc,ipc,"
-		     "data_page_size",
+		     "data_page_size,code_page_size",
 		     parse_output_fields),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index f810b07d10d2..83d4680044a2 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1286,6 +1286,9 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
 	if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
 		printf(" .. data page size: %s\n", get_page_size_name(sample->data_page_size, str));
 
+	if (sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)
+		printf(" .. code page size: %s\n", get_page_size_name(sample->code_page_size, str));
+
 	if (sample_type & PERF_SAMPLE_TRANSACTION)
 		printf("... transaction: %" PRIx64 "\n", sample->transaction);
 
-- 
2.17.1



* [PATCH V6 15/16] perf report: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (13 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 14/16] perf script: " Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  2020-08-10 21:24 ` [PATCH V6 16/16] perf test: Add test case " Kan Liang
  15 siblings, 0 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov

From: Stephane Eranian <eranian@google.com>

Add a new sort dimension "code_page_size" for common sort.
With this option applied, perf can sort and report by sample's code page
size.

For example,
perf report --stdio --sort=comm,symbol,code_page_size
 # To display the perf.data header info, please use
 # --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 3K of event 'mem-loads:uP'
 # Event count (approx.): 1470769
 #
 # Overhead  Command  Symbol                        Code Page Size  IPC   [IPC Coverage]
 # ........  .......  ............................  ..............  ....................
 #
     69.56%  dtlb     [.] GetTickCount              4K              -      -
     17.93%  dtlb     [.] Calibrate                 4K              -      -
     11.40%  dtlb     [.] __gettimeofday            4K              -      -

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/util/hist.c                   |  2 ++
 tools/perf/util/hist.h                   |  1 +
 tools/perf/util/sort.c                   | 26 ++++++++++++++++++++++++
 tools/perf/util/sort.h                   |  2 ++
 5 files changed, 32 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 8f7f4e9605d8..e44045842c5c 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -108,6 +108,7 @@ OPTIONS
 	- period: Raw number of event count of sample
 	- time: Separate the samples by time stamp with the resolution specified by
 	--time-quantum (default 100ms). Specify with overhead and before it.
+	- code_page_size: the code page size of sampled code address (ip)
 
 	By default, comm, dso and symbol keys are used.
 	(i.e. --sort comm,dso,symbol)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 7829ecd7ea59..af948da14d94 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -212,6 +212,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 		hists__new_col_len(hists, HISTC_TIME, 16);
 	else
 		hists__new_col_len(hists, HISTC_TIME, 12);
+	hists__new_col_len(hists, HISTC_CODE_PAGE_SIZE, 6);
 
 	if (h->srcline) {
 		len = MAX(strlen(h->srcline), strlen(sort_srcline.se_header));
@@ -718,6 +719,7 @@ __hists__add_entry(struct hists *hists,
 		.cpumode = al->cpumode,
 		.ip	 = al->addr,
 		.level	 = al->level,
+		.code_page_size = sample->code_page_size,
 		.stat = {
 			.nr_events = 1,
 			.period	= sample->period,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index e44cf5bb655f..6500c00ae7be 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -53,6 +53,7 @@ enum hist_column {
 	HISTC_DSO_TO,
 	HISTC_LOCAL_WEIGHT,
 	HISTC_GLOBAL_WEIGHT,
+	HISTC_CODE_PAGE_SIZE,
 	HISTC_MEM_DADDR_SYMBOL,
 	HISTC_MEM_DADDR_DSO,
 	HISTC_MEM_PHYS_DADDR,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index ad9666db07fb..bc79d446bcbd 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1491,6 +1491,31 @@ struct sort_entry sort_mem_data_page_size = {
 	.se_width_idx	= HISTC_MEM_DATA_PAGE_SIZE,
 };
 
+static int64_t
+sort__code_page_size_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	uint64_t l = left->code_page_size;
+	uint64_t r = right->code_page_size;
+
+	return (int64_t)(r - l);
+}
+
+static int hist_entry__code_page_size_snprintf(struct hist_entry *he, char *bf,
+					  size_t size, unsigned int width)
+{
+	char str[PAGE_SIZE_NAME_LEN];
+
+	return repsep_snprintf(bf, size, "%-*s", width,
+			       get_page_size_name(he->code_page_size, str));
+}
+
+struct sort_entry sort_code_page_size = {
+	.se_header	= "Code Page Size",
+	.se_cmp		= sort__code_page_size_cmp,
+	.se_snprintf	= hist_entry__code_page_size_snprintf,
+	.se_width_idx	= HISTC_CODE_PAGE_SIZE,
+};
+
 static int64_t
 sort__abort_cmp(struct hist_entry *left, struct hist_entry *right)
 {
@@ -1735,6 +1760,7 @@ static struct sort_dimension common_sort_dimensions[] = {
 	DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
 	DIM(SORT_SYM_IPC_NULL, "ipc_null", sort_sym_ipc_null),
 	DIM(SORT_TIME, "time", sort_time),
+	DIM(SORT_CODE_PAGE_SIZE, "code_page_size", sort_code_page_size),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index e50f2b695bc4..cab4172a6ec3 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -106,6 +106,7 @@ struct hist_entry {
 	u64			transaction;
 	s32			socket;
 	s32			cpu;
+	u64			code_page_size;
 	u8			cpumode;
 	u8			depth;
 
@@ -229,6 +230,7 @@ enum sort_type {
 	SORT_CGROUP_ID,
 	SORT_SYM_IPC_NULL,
 	SORT_TIME,
+	SORT_CODE_PAGE_SIZE,
 
 	/* branch stack specific sort keys */
 	__SORT_BRANCH_STACK,
-- 
2.17.1



* [PATCH V6 16/16] perf test: Add test case for PERF_SAMPLE_CODE_PAGE_SIZE
  2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
                   ` (14 preceding siblings ...)
  2020-08-10 21:24 ` [PATCH V6 15/16] perf report: " Kan Liang
@ 2020-08-10 21:24 ` Kan Liang
  15 siblings, 0 replies; 32+ messages in thread
From: Kan Liang @ 2020-08-10 21:24 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak,
	dave.hansen, kirill.shutemov

From: Stephane Eranian <eranian@google.com>

Extend sample-parsing test cases to support new sample type
PERF_SAMPLE_CODE_PAGE_SIZE.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/tests/sample-parsing.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index 6baed165c850..5e780b85d952 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -157,6 +157,9 @@ static bool samples_same(const struct perf_sample *s1,
 	if (type & PERF_SAMPLE_DATA_PAGE_SIZE)
 		COMP(data_page_size);
 
+	if (type & PERF_SAMPLE_CODE_PAGE_SIZE)
+		COMP(code_page_size);
+
 	if (type & PERF_SAMPLE_AUX) {
 		COMP(aux_sample.size);
 		if (memcmp(s1->aux_sample.data, s2->aux_sample.data,
@@ -238,6 +241,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 		.phys_addr	= 113,
 		.cgroup		= 114,
 		.data_page_size	= 4096,
+		.code_page_size	= 4096,
 		.aux_sample	= {
 			.size	= sizeof(aux_data),
 			.data	= (void *)aux_data,
@@ -344,7 +348,7 @@ int test__sample_parsing(struct test *test __maybe_unused, int subtest __maybe_u
 	 * were added.  Please actually update the test rather than just change
 	 * the condition below.
 	 */
-	if (PERF_SAMPLE_MAX > PERF_SAMPLE_DATA_PAGE_SIZE << 1) {
+	if (PERF_SAMPLE_MAX > PERF_SAMPLE_CODE_PAGE_SIZE << 1) {
 		pr_debug("sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating\n");
 		return -1;
 	}
-- 
2.17.1



* Re: [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-10 21:24 ` [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
@ 2020-08-10 21:35   ` Peter Zijlstra
  2020-08-10 21:39   ` Peter Zijlstra
  2020-08-10 21:47   ` Dave Hansen
  2 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2020-08-10 21:35 UTC (permalink / raw)
  To: Kan Liang
  Cc: acme, mingo, linux-kernel, mark.rutland, alexander.shishkin,
	jolsa, eranian, ak, dave.hansen, kirill.shutemov

On Mon, Aug 10, 2020 at 02:24:21PM -0700, Kan Liang wrote:
> Current perf can report both virtual addresses and physical addresses,
> but not the page size. Without the page size information of the utilized
> page, users cannot decide whether to promote/demote large pages to
> optimize memory usage.
> 
> Add a new sample type for the data page size.
> 
> Current perf already has a facility to collect data virtual addresses.
> A page walker is required to walk the pages tables and calculate the
> page size from a given virtual address.
> 
> On some platforms, e.g., X86, the page walker is invoked in an NMI
> handler. So the page walker must be IRQ-safe and low overhead. Besides,
> the page walker should work for both user and kernel virtual address.
> The existing generic page walker, e.g., walk_page_range_novma(), is a
> little bit complex and doesn't guarantee the IRQ-safe. The follow_page()
> is only for user-virtual address.

s/IRQ/NMI/g


* Re: [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-10 21:24 ` [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
  2020-08-10 21:35   ` Peter Zijlstra
@ 2020-08-10 21:39   ` Peter Zijlstra
  2020-08-10 22:36     ` Liang, Kan
  2020-08-10 21:47   ` Dave Hansen
  2 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-08-10 21:39 UTC (permalink / raw)
  To: Kan Liang
  Cc: acme, mingo, linux-kernel, mark.rutland, alexander.shishkin,
	jolsa, eranian, ak, dave.hansen, kirill.shutemov

On Mon, Aug 10, 2020 at 02:24:21PM -0700, Kan Liang wrote:
> Current perf can report both virtual addresses and physical addresses,
> but not the page size. Without the page size information of the utilized
> page, users cannot decide whether to promote/demote large pages to
> optimize memory usage.
> 
> Add a new sample type for the data page size.
> 
> Current perf already has a facility to collect data virtual addresses.
> A page walker is required to walk the pages tables and calculate the
> page size from a given virtual address.
> 
> On some platforms, e.g., X86, the page walker is invoked in an NMI
> handler. So the page walker must be IRQ-safe and low overhead. Besides,
> the page walker should work for both user and kernel virtual address.
> The existing generic page walker, e.g., walk_page_range_novma(), is a
> little bit complex and doesn't guarantee the IRQ-safe. The follow_page()
> is only for user-virtual address.
> 
> Add a new function perf_get_page_size() to walk the page tables and
> calculate the page size. In the function:
> - Interrupts have to be disabled to prevent any teardown of the page
>   tables.
> - The size of a normal page is from the pre-defined page size macros.
> - The size of a compound page is retrieved from the helper function,
>   page_size().
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>

>  /* default value for data source */
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 52ca2093831c..32484accc7a3 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -143,8 +143,9 @@ enum perf_event_sample_format {
>  	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
>  	PERF_SAMPLE_AUX				= 1U << 20,
>  	PERF_SAMPLE_CGROUP			= 1U << 21,
> +	PERF_SAMPLE_DATA_PAGE_SIZE		= 1U << 22,
>  
> -	PERF_SAMPLE_MAX = 1U << 22,		/* non-ABI */
> +	PERF_SAMPLE_MAX = 1U << 23,		/* non-ABI */
>  
>  	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
>  };

> @@ -7151,6 +7269,9 @@ void perf_prepare_sample(struct perf_event_header *header,
>  	}
>  #endif
>  
> +	if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
> +		data->data_page_size = perf_get_page_size(data->addr);
> +

We could just make SAMPLE_DATA_PAGE_SIZE require SAMPLE_ADDR.
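
One way to read the suggestion above is as a check on the requested
sample_type when the event is created. The following is only a minimal
sketch under that assumption: the SKETCH_ names and the function are
hypothetical, the bit positions follow the uapi enum quoted above, and
none of this is code from the posted series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit positions mirror enum perf_event_sample_format; the SKETCH_ names are hypothetical. */
#define SKETCH_SAMPLE_ADDR           (1U << 3)
#define SKETCH_SAMPLE_DATA_PAGE_SIZE (1U << 22)

/*
 * Assumed policy, not the posted patch: refuse DATA_PAGE_SIZE unless
 * ADDR is requested as well, so data->addr is always an explicit input.
 */
static inline int sketch_check_sample_type(uint64_t sample_type)
{
	if ((sample_type & SKETCH_SAMPLE_DATA_PAGE_SIZE) &&
	    !(sample_type & SKETCH_SAMPLE_ADDR))
		return -EINVAL;

	return 0;
}

/* Usage: the page size alone is rejected, the pair is accepted. */
static inline void sketch_check_demo(void)
{
	assert(sketch_check_sample_type(SKETCH_SAMPLE_DATA_PAGE_SIZE) == -EINVAL);
	assert(sketch_check_sample_type(SKETCH_SAMPLE_ADDR |
					SKETCH_SAMPLE_DATA_PAGE_SIZE) == 0);
}
```

The trade-off of such a check is that userspace has to ask for the
address even when it only wants the page size.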


* Re: [PATCH V6 02/16] perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-10 21:24 ` [PATCH V6 02/16] perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
@ 2020-08-10 21:40   ` Peter Zijlstra
  2020-08-10 22:36     ` Liang, Kan
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-08-10 21:40 UTC (permalink / raw)
  To: Kan Liang
  Cc: acme, mingo, linux-kernel, mark.rutland, alexander.shishkin,
	jolsa, eranian, ak, dave.hansen, kirill.shutemov

On Mon, Aug 10, 2020 at 02:24:22PM -0700, Kan Liang wrote:
> The new sample type, PERF_SAMPLE_DATA_PAGE_SIZE, requires the virtual
> address. Update the data->addr if the sample type is set.
> 
> The large PEBS is disabled with the sample type, because perf doesn't
> support munmap tracking yet. The PEBS buffer for large PEBS cannot be
> flushed for each munmap. Wrong page size may be calculated. The large
> PEBS can be enabled later separately when munmap tracking is supported.


You also get to fix up Power.

arch/powerpc/perf/core-book3s.c:                    (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR))



* Re: [PATCH V6 03/16] perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
  2020-08-10 21:24 ` [PATCH V6 03/16] perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE Kan Liang
@ 2020-08-10 21:41   ` Peter Zijlstra
  2020-08-10 22:37     ` Liang, Kan
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-08-10 21:41 UTC (permalink / raw)
  To: Kan Liang
  Cc: acme, mingo, linux-kernel, mark.rutland, alexander.shishkin,
	jolsa, eranian, ak, dave.hansen, kirill.shutemov

On Mon, Aug 10, 2020 at 02:24:23PM -0700, Kan Liang wrote:
> From: Stephane Eranian <eranian@google.com>
> 
> When studying code layout, it is useful to capture the page size of the
> sampled code address.
> 
> Add a new sample type for code page size.
> The new sample type requires collecting the ip. The code page size can
> be calculated from the IRQ-safe perf_get_page_size().
> 
> Only the generic support is covered. The large PEBS will be disabled
> with this sample type.

-ENOREASON


* Re: [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-10 21:24 ` [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
  2020-08-10 21:35   ` Peter Zijlstra
  2020-08-10 21:39   ` Peter Zijlstra
@ 2020-08-10 21:47   ` Dave Hansen
  2020-08-10 22:38     ` Liang, Kan
  2 siblings, 1 reply; 32+ messages in thread
From: Dave Hansen @ 2020-08-10 21:47 UTC (permalink / raw)
  To: Kan Liang, peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak, kirill.shutemov

On 8/10/20 2:24 PM, Kan Liang wrote:
> +static u64 __perf_get_page_size(struct mm_struct *mm, unsigned long addr)
> +{
> +	struct page *page;
> +	pgd_t *pgd;
> +	p4d_t *p4d;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +
> +	pgd = pgd_offset(mm, addr);
> +	if (pgd_none(*pgd))
> +		return 0;
> +
> +	p4d = p4d_offset(pgd, addr);
> +	if (!p4d_present(*p4d))
> +		return 0;
> +
> +#if (defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
> +	if (p4d_leaf(*p4d)) {
> +		page = p4d_page(*p4d);
> +
> +		if (PageCompound(page))
> +			return page_size(compound_head(page));
> +
> +		return P4D_SIZE;
> +	}
> +#endif
> +
> +	pud = pud_offset(p4d, addr);
> +	if (!pud_present(*pud))
> +		return 0;
> +
> +#if (defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
> +	if (pud_leaf(*pud)) {
> +		page = pud_page(*pud);
> +
> +		if (PageCompound(page))
> +			return page_size(compound_head(page));
> +
> +		return PUD_SIZE;
> +	}
> +#endif
> +
> +	pmd = pmd_offset(pud, addr);
> +	if (!pmd_present(*pmd))
> +		return 0;
> +
> +#if (defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
> +	if (pmd_leaf(*pmd)) {
> +		page = pmd_page(*pmd);
> +
> +		if (PageCompound(page))
> +			return page_size(compound_head(page));
> +
> +		return PMD_SIZE;
> +	}
> +#endif
> +
> +	pte = pte_offset_map(pmd, addr);
> +	if (!pte_present(*pte)) {
> +		pte_unmap(pte);
> +		return 0;
> +	}
> +
> +	pte_unmap(pte);
> +	return PAGE_SIZE;
> +}

It's probably best if we very carefully define up front what is getting
reported here.  For instance, I believe we already have some fun cases
with huge tmpfs where a compound page is mapped with 4k PTEs.  Kirill
also found a few drivers doing this as well.  I think there were also
some weird cases for ARM hugetlbfs where there were multiple hardware
page table entries mapping a single hugetlbfs page.  These would be
cases where compound_head() size would be greater than the size of the
leaf paging structure entry.

This is also why we have KernelPageSize and MMUPageSize in /proc/$pid/smaps.

So, is this returning the kernel software page size or the MMU size?
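
To make the two candidate answers concrete, here is a small userspace
model of a single mapping. It is a sketch only: the struct, the SKETCH_
names and the assumed 4K base page are hypothetical and are not kernel
data structures. The "kernel" size follows the usual base page size
shifted by the compound order; the "MMU" size is whatever the leaf
page-table entry covers.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BASE_PAGE_SIZE 4096ULL	/* assumed 4K base pages */

/* Hypothetical model of one mapping; not a kernel structure. */
struct sketch_mapping {
	uint64_t leaf_entry_size;	/* bytes covered by the leaf page-table entry */
	unsigned int compound_order;	/* order of the backing compound page, 0 if none */
};

/* Size the hardware translates with: the leaf entry size. */
static inline uint64_t sketch_mmu_page_size(const struct sketch_mapping *m)
{
	return m->leaf_entry_size;
}

/* Size the kernel allocated: base page size scaled by the compound order. */
static inline uint64_t sketch_kernel_page_size(const struct sketch_mapping *m)
{
	return SKETCH_BASE_PAGE_SIZE << m->compound_order;
}

/* Usage: a 2M compound page (order 9) mapped with 4K PTEs. */
static inline void sketch_page_size_demo(void)
{
	struct sketch_mapping m = { .leaf_entry_size = 4096, .compound_order = 9 };

	assert(sketch_mmu_page_size(&m) == 4096);
	assert(sketch_kernel_page_size(&m) == 2 * 1024 * 1024);
}
```

In this model the two sizes differ exactly in the cases listed above,
where the compound page is larger than the leaf entry that maps it.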



* Re: [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-10 21:39   ` Peter Zijlstra
@ 2020-08-10 22:36     ` Liang, Kan
  0 siblings, 0 replies; 32+ messages in thread
From: Liang, Kan @ 2020-08-10 22:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, mark.rutland, alexander.shishkin,
	jolsa, eranian, ak, dave.hansen, kirill.shutemov



On 8/10/2020 5:39 PM, Peter Zijlstra wrote:
> On Mon, Aug 10, 2020 at 02:24:21PM -0700, Kan Liang wrote:
>> Current perf can report both virtual addresses and physical addresses,
>> but not the page size. Without the page size information of the utilized
>> page, users cannot decide whether to promote/demote large pages to
>> optimize memory usage.
>>
>> Add a new sample type for the data page size.
>>
>> Current perf already has a facility to collect data virtual addresses.
>> A page walker is required to walk the pages tables and calculate the
>> page size from a given virtual address.
>>
>> On some platforms, e.g., X86, the page walker is invoked in an NMI
>> handler. So the page walker must be IRQ-safe and low overhead. Besides,
>> the page walker should work for both user and kernel virtual address.
>> The existing generic page walker, e.g., walk_page_range_novma(), is a
>> little bit complex and doesn't guarantee the IRQ-safe. The follow_page()
>> is only for user-virtual address.
>>
>> Add a new function perf_get_page_size() to walk the page tables and
>> calculate the page size. In the function:
>> - Interrupts have to be disabled to prevent any teardown of the page
>>    tables.
>> - The size of a normal page is from the pre-defined page size macros.
>> - The size of a compound page is retrieved from the helper function,
>>    page_size().
>>
>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> 
>>   /* default value for data source */
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index 52ca2093831c..32484accc7a3 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -143,8 +143,9 @@ enum perf_event_sample_format {
>>   	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
>>   	PERF_SAMPLE_AUX				= 1U << 20,
>>   	PERF_SAMPLE_CGROUP			= 1U << 21,
>> +	PERF_SAMPLE_DATA_PAGE_SIZE		= 1U << 22,
>>   
>> -	PERF_SAMPLE_MAX = 1U << 22,		/* non-ABI */
>> +	PERF_SAMPLE_MAX = 1U << 23,		/* non-ABI */
>>   
>>   	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
>>   };
> 
>> @@ -7151,6 +7269,9 @@ void perf_prepare_sample(struct perf_event_header *header,
>>   	}
>>   #endif
>>   
>> +	if (sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)
>> +		data->data_page_size = perf_get_page_size(data->addr);
>> +
> 
> We could just make SAMPLE_DATA_PAGE_SIZE require SAMPLE_ADDR.
> 

If only SAMPLE_DATA_PAGE_SIZE is requested, without SAMPLE_ADDR,
data->addr will still be updated implicitly, but its value will not be
dumped to the userspace tool. I will add a comment here.
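
The behaviour described above can be illustrated with a simplified
record writer. This is a sketch, not the kernel's output path: the
struct, the SKETCH_ names and the two-field record are hypothetical,
and only the bit positions are taken from the uapi enum. The address is
always available to the writer, but it reaches the record only when the
ADDR bit is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions mirror enum perf_event_sample_format; the SKETCH_ names are hypothetical. */
#define SKETCH_SAMPLE_ADDR           (1U << 3)
#define SKETCH_SAMPLE_DATA_PAGE_SIZE (1U << 22)

/* Hypothetical in-kernel view of one sample: both values are filled in. */
struct sketch_sample {
	uint64_t addr;
	uint64_t data_page_size;
};

/* Write only the fields selected by sample_type; return the number of u64 words. */
static inline size_t sketch_emit(uint64_t sample_type,
				 const struct sketch_sample *s, uint64_t *out)
{
	size_t n = 0;

	if (sample_type & SKETCH_SAMPLE_ADDR)
		out[n++] = s->addr;
	if (sample_type & SKETCH_SAMPLE_DATA_PAGE_SIZE)
		out[n++] = s->data_page_size;

	return n;
}

/* Usage: with only DATA_PAGE_SIZE set, the address is not written out. */
static inline void sketch_emit_demo(void)
{
	struct sketch_sample s = { .addr = 0x7f0000001000ULL, .data_page_size = 4096 };
	uint64_t out[2] = { 0, 0 };

	assert(sketch_emit(SKETCH_SAMPLE_DATA_PAGE_SIZE, &s, out) == 1);
	assert(out[0] == 4096);
}
```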

Thanks,
Kan


* Re: [PATCH V6 02/16] perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-10 21:40   ` Peter Zijlstra
@ 2020-08-10 22:36     ` Liang, Kan
  0 siblings, 0 replies; 32+ messages in thread
From: Liang, Kan @ 2020-08-10 22:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, mark.rutland, alexander.shishkin,
	jolsa, eranian, ak, dave.hansen, kirill.shutemov



On 8/10/2020 5:40 PM, Peter Zijlstra wrote:
> On Mon, Aug 10, 2020 at 02:24:22PM -0700, Kan Liang wrote:
>> The new sample type, PERF_SAMPLE_DATA_PAGE_SIZE, requires the virtual
>> address. Update the data->addr if the sample type is set.
>>
>> The large PEBS is disabled with the sample type, because perf doesn't
>> support munmap tracking yet. The PEBS buffer for large PEBS cannot be
>> flushed for each munmap. Wrong page size may be calculated. The large
>> PEBS can be enabled later separately when munmap tracking is supported.
> 
> 
> You also get to fix up Power.
> 
> arch/powerpc/perf/core-book3s.c:                    (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR))

Sure. I will add one patch for Power.

Thanks,
Kan



* Re: [PATCH V6 03/16] perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
  2020-08-10 21:41   ` Peter Zijlstra
@ 2020-08-10 22:37     ` Liang, Kan
  2020-08-10 22:44       ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Liang, Kan @ 2020-08-10 22:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, mark.rutland, alexander.shishkin,
	jolsa, eranian, ak, dave.hansen, kirill.shutemov



On 8/10/2020 5:41 PM, Peter Zijlstra wrote:
> On Mon, Aug 10, 2020 at 02:24:23PM -0700, Kan Liang wrote:
>> From: Stephane Eranian <eranian@google.com>
>>
>> When studying code layout, it is useful to capture the page size of the
>> sampled code address.
>>
>> Add a new sample type for code page size.
>> The new sample type requires collecting the ip. The code page size can
>> be calculated from the IRQ-safe perf_get_page_size().
>>
>> Only the generic support is covered. The large PEBS will be disabled
>> with this sample type.
> 
> -ENOREASON

I think the reason is similar to PERF_SAMPLE_DATA_PAGE_SIZE. For large 
PEBS, the mapping could be gone for the earlier PEBS records. Invalid 
page size may be retrieved. I will update the commit message.

Thanks,
Kan


* Re: [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-10 21:47   ` Dave Hansen
@ 2020-08-10 22:38     ` Liang, Kan
  2020-08-10 22:47       ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Liang, Kan @ 2020-08-10 22:38 UTC (permalink / raw)
  To: Dave Hansen, peterz, acme, mingo, linux-kernel
  Cc: mark.rutland, alexander.shishkin, jolsa, eranian, ak, kirill.shutemov



On 8/10/2020 5:47 PM, Dave Hansen wrote:
> On 8/10/20 2:24 PM, Kan Liang wrote:
>> +static u64 __perf_get_page_size(struct mm_struct *mm, unsigned long addr)
>> +{
>> +	struct page *page;
>> +	pgd_t *pgd;
>> +	p4d_t *p4d;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *pte;
>> +
>> +	pgd = pgd_offset(mm, addr);
>> +	if (pgd_none(*pgd))
>> +		return 0;
>> +
>> +	p4d = p4d_offset(pgd, addr);
>> +	if (!p4d_present(*p4d))
>> +		return 0;
>> +
>> +#if (defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
>> +	if (p4d_leaf(*p4d)) {
>> +		page = p4d_page(*p4d);
>> +
>> +		if (PageCompound(page))
>> +			return page_size(compound_head(page));
>> +
>> +		return P4D_SIZE;
>> +	}
>> +#endif
>> +
>> +	pud = pud_offset(p4d, addr);
>> +	if (!pud_present(*pud))
>> +		return 0;
>> +
>> +#if (defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
>> +	if (pud_leaf(*pud)) {
>> +		page = pud_page(*pud);
>> +
>> +		if (PageCompound(page))
>> +			return page_size(compound_head(page));
>> +
>> +		return PUD_SIZE;
>> +	}
>> +#endif
>> +
>> +	pmd = pmd_offset(pud, addr);
>> +	if (!pmd_present(*pmd))
>> +		return 0;
>> +
>> +#if (defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
>> +	if (pmd_leaf(*pmd)) {
>> +		page = pmd_page(*pmd);
>> +
>> +		if (PageCompound(page))
>> +			return page_size(compound_head(page));
>> +
>> +		return PMD_SIZE;
>> +	}
>> +#endif
>> +
>> +	pte = pte_offset_map(pmd, addr);
>> +	if (!pte_present(*pte)) {
>> +		pte_unmap(pte);
>> +		return 0;
>> +	}
>> +
>> +	pte_unmap(pte);
>> +	return PAGE_SIZE;
>> +}
> 
> It's probably best if we very carefully define up front what is getting
> reported here.  For instance, I believe we already have some fun cases
> with huge tmpfs where a compound page is mapped with 4k PTEs.  Kirill
> also found a few drivers doing this as well.  I think there were also
> some weird cases for ARM hugetlbfs where there were multiple hardware
> page table entries mapping a single hugetlbfs page.  These would be
> cases where compound_head() size would be greater than the size of the
> leaf paging structure entry.
> 
> This is also why we have KernelPageSize and MMUPageSize in /proc/$pid/smaps.
> 
> So, is this returning the kernel software page size or the MMU size?
> 

This tries to return the kernel software page size. I will add a comment 
to the function. For the above cases, I think they can be detected by 
PageCompound(page). The current code should already cover them. Is my 
understanding correct?

Thanks,
Kan
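
For readers who want to look at the two smaps fields mentioned above,
the following userspace sketch pulls the first KernelPageSize and
MMUPageSize values out of an smaps-formatted stream. The function names
are made up for illustration, and the parser assumes the usual
"Field:   N kB" layout.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Scan an smaps-formatted stream and report the first KernelPageSize
 * and MMUPageSize values, in kB. Returns 0 if both were found, -1
 * otherwise.
 */
int first_page_sizes_kb(FILE *f, long *kernel_kb, long *mmu_kb)
{
	char line[256];
	int have_kernel = 0, have_mmu = 0;

	while (fgets(line, sizeof(line), f)) {
		if (!have_kernel &&
		    sscanf(line, "KernelPageSize: %ld kB", kernel_kb) == 1)
			have_kernel = 1;
		else if (!have_mmu &&
			 sscanf(line, "MMUPageSize: %ld kB", mmu_kb) == 1)
			have_mmu = 1;

		if (have_kernel && have_mmu)
			return 0;
	}
	return -1;
}

/* Print the values for the first mapping of the calling process. */
int print_own_page_sizes(void)
{
	long kernel_kb = 0, mmu_kb = 0;
	FILE *f = fopen("/proc/self/smaps", "r");
	int ret;

	if (!f)
		return -1;
	ret = first_page_sizes_kb(f, &kernel_kb, &mmu_kb);
	fclose(f);

	if (ret == 0) {
		assert(kernel_kb > 0 && mmu_kb > 0);
		printf("KernelPageSize=%ld kB MMUPageSize=%ld kB\n",
		       kernel_kb, mmu_kb);
	}
	return ret;
}
```

Calling print_own_page_sizes() on a Linux system with /proc mounted
prints one line for the first mapping; on systems without smaps it
simply returns -1.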



* Re: [PATCH V6 03/16] perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
  2020-08-10 22:37     ` Liang, Kan
@ 2020-08-10 22:44       ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2020-08-10 22:44 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, mark.rutland, alexander.shishkin,
	jolsa, eranian, ak, dave.hansen, kirill.shutemov

On Mon, Aug 10, 2020 at 06:37:08PM -0400, Liang, Kan wrote:
> 
> 
> On 8/10/2020 5:41 PM, Peter Zijlstra wrote:
> > On Mon, Aug 10, 2020 at 02:24:23PM -0700, Kan Liang wrote:
> > > From: Stephane Eranian <eranian@google.com>
> > > 
> > > When studying code layout, it is useful to capture the page size of the
> > > sampled code address.
> > > 
> > > Add a new sample type for code page size.
> > > The new sample type requires collecting the ip. The code page size can
> > > be calculated from the IRQ-safe perf_get_page_size().
> > > 
> > > Only the generic support is covered. The large PEBS will be disabled
> > > with this sample type.
> > 
> > -ENOREASON
> 
> I think the reason is similar to PERF_SAMPLE_DATA_PAGE_SIZE. For large PEBS,
> the mapping could be gone for the earlier PEBS records. Invalid page size
> may be retrieved. I will update the commit message.

That's extremely unlikely though.. We might as well just do it and pray.
The worst case is that we return '0' page-size because it's gone, that
seems fairly sane.

Alternatively, we can register mmu_notifiers or something and flush PEBS
buffers when ranges get invalidated.


* Re: [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-10 22:38     ` Liang, Kan
@ 2020-08-10 22:47       ` Peter Zijlstra
  2020-08-12 13:39         ` Liang, Kan
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-08-10 22:47 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Dave Hansen, acme, mingo, linux-kernel, mark.rutland,
	alexander.shishkin, jolsa, eranian, ak, kirill.shutemov

On Mon, Aug 10, 2020 at 06:38:35PM -0400, Liang, Kan wrote:
> On 8/10/2020 5:47 PM, Dave Hansen wrote:

> > It's probably best if we very carefully define up front what is getting
> > reported here.  For instance, I believe we already have some fun cases
> > with huge tmpfs where a compound page is mapped with 4k PTEs.  Kirill
> > also found a few drivers doing this as well.  I think there were also
> > some weird cases for ARM hugetlbfs where there were multiple hardware
> > page table entries mapping a single hugetlbfs page.  These would be
> > cases where compound_head() size would be greater than the size of the
> > leaf paging structure entry.
> > 
> > This is also why we have KernelPageSize and MMUPageSize in /proc/$pid/smaps.
> > 
> > So, is this returning the kernel software page size or the MMU size?
> > 
> 
> This tries to return the kernel software page size. I will add a comment to
> the function. For the above cases, I think they can be detected by
> PageCompound(page). The current code should already cover them. Is my
> understanding correct?

But the rationale for the whole feature was to measure and possibly
drive large page promotion/demotion, which requires the mmu page-size.
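
The distinction being drawn here can be made concrete with a small
userspace model. This is a sketch under stated assumptions: x86-64 with
4 KiB base pages, 2 MiB PMD leaves and 1 GiB PUD leaves. All sketch_*
names are hypothetical. The MMU page size follows the level of the leaf
page-table entry, while the kernel software page size follows the order
of the backing compound page, so the two can disagree.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* 4 KiB base page (x86-64 assumption) */

enum sketch_leaf_level { SKETCH_LEAF_PTE, SKETCH_LEAF_PMD, SKETCH_LEAF_PUD };

/* Size covered by the leaf page-table entry: the MMU page size. */
uint64_t sketch_mmu_page_size(enum sketch_leaf_level level)
{
	switch (level) {
	case SKETCH_LEAF_PUD:
		return 1ULL << 30;	/* 1 GiB leaf */
	case SKETCH_LEAF_PMD:
		return 1ULL << 21;	/* 2 MiB leaf */
	default:
		return 1ULL << SKETCH_PAGE_SHIFT;
	}
}

/* Size of the backing compound page: the kernel software page size. */
uint64_t sketch_kernel_page_size(unsigned int compound_order)
{
	return 1ULL << (SKETCH_PAGE_SHIFT + compound_order);
}

void sketch_huge_tmpfs_demo(void)
{
	/* An order-9 compound page (2 MiB) mapped through 4 KiB PTEs. */
	assert(sketch_kernel_page_size(9) == 2ULL * 1024 * 1024);
	assert(sketch_mmu_page_size(SKETCH_LEAF_PTE) == 4096);
}
```

In the huge tmpfs case from Dave's mail, a walker that reports the
compound page size would return 2 MiB, whereas the TLB only ever sees
4 KiB translations.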



* Re: [PATCH V6 06/16] perf script: Use ULL for enum perf_output_field
  2020-08-10 21:24 ` [PATCH V6 06/16] perf script: Use ULL for enum perf_output_field Kan Liang
@ 2020-08-12 12:21   ` Arnaldo Carvalho de Melo
  2020-08-12 13:42     ` Liang, Kan
  0 siblings, 1 reply; 32+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-08-12 12:21 UTC (permalink / raw)
  To: Kan Liang
  Cc: peterz, mingo, linux-kernel, mark.rutland, alexander.shishkin,
	jolsa, eranian, ak, dave.hansen, kirill.shutemov

Em Mon, Aug 10, 2020 at 02:24:26PM -0700, Kan Liang escreveu:
> The Bitwise-Shift operator (1U << ) is used in the enum
> perf_output_field, which has already reached its capacity (32 items).
> If more items are added, a compile error will be triggered.
> 
> Change the U to ULL, which extends the capacity to 64 items.
> 
> The enum perf_output_field is only used to calculate a value for the
> 'fields' in the output structure. The 'fields' is u64. The change
> doesn't break anything.

Jiri did this already:

https://git.kernel.org/torvalds/c/60e5eeb56a1
 
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  tools/perf/builtin-script.c | 64 ++++++++++++++++++-------------------
>  1 file changed, 32 insertions(+), 32 deletions(-)
> 
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 447457786362..214bec350971 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -82,38 +82,38 @@ static bool			native_arch;
>  unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
>  
>  enum perf_output_field {
> -	PERF_OUTPUT_COMM            = 1U << 0,
> -	PERF_OUTPUT_TID             = 1U << 1,
> -	PERF_OUTPUT_PID             = 1U << 2,
> -	PERF_OUTPUT_TIME            = 1U << 3,
> -	PERF_OUTPUT_CPU             = 1U << 4,
> -	PERF_OUTPUT_EVNAME          = 1U << 5,
> -	PERF_OUTPUT_TRACE           = 1U << 6,
> -	PERF_OUTPUT_IP              = 1U << 7,
> -	PERF_OUTPUT_SYM             = 1U << 8,
> -	PERF_OUTPUT_DSO             = 1U << 9,
> -	PERF_OUTPUT_ADDR            = 1U << 10,
> -	PERF_OUTPUT_SYMOFFSET       = 1U << 11,
> -	PERF_OUTPUT_SRCLINE         = 1U << 12,
> -	PERF_OUTPUT_PERIOD          = 1U << 13,
> -	PERF_OUTPUT_IREGS	    = 1U << 14,
> -	PERF_OUTPUT_BRSTACK	    = 1U << 15,
> -	PERF_OUTPUT_BRSTACKSYM	    = 1U << 16,
> -	PERF_OUTPUT_DATA_SRC	    = 1U << 17,
> -	PERF_OUTPUT_WEIGHT	    = 1U << 18,
> -	PERF_OUTPUT_BPF_OUTPUT	    = 1U << 19,
> -	PERF_OUTPUT_CALLINDENT	    = 1U << 20,
> -	PERF_OUTPUT_INSN	    = 1U << 21,
> -	PERF_OUTPUT_INSNLEN	    = 1U << 22,
> -	PERF_OUTPUT_BRSTACKINSN	    = 1U << 23,
> -	PERF_OUTPUT_BRSTACKOFF	    = 1U << 24,
> -	PERF_OUTPUT_SYNTH           = 1U << 25,
> -	PERF_OUTPUT_PHYS_ADDR       = 1U << 26,
> -	PERF_OUTPUT_UREGS	    = 1U << 27,
> -	PERF_OUTPUT_METRIC	    = 1U << 28,
> -	PERF_OUTPUT_MISC            = 1U << 29,
> -	PERF_OUTPUT_SRCCODE	    = 1U << 30,
> -	PERF_OUTPUT_IPC             = 1U << 31,
> +	PERF_OUTPUT_COMM            = 1ULL << 0,
> +	PERF_OUTPUT_TID             = 1ULL << 1,
> +	PERF_OUTPUT_PID             = 1ULL << 2,
> +	PERF_OUTPUT_TIME            = 1ULL << 3,
> +	PERF_OUTPUT_CPU             = 1ULL << 4,
> +	PERF_OUTPUT_EVNAME          = 1ULL << 5,
> +	PERF_OUTPUT_TRACE           = 1ULL << 6,
> +	PERF_OUTPUT_IP              = 1ULL << 7,
> +	PERF_OUTPUT_SYM             = 1ULL << 8,
> +	PERF_OUTPUT_DSO             = 1ULL << 9,
> +	PERF_OUTPUT_ADDR            = 1ULL << 10,
> +	PERF_OUTPUT_SYMOFFSET       = 1ULL << 11,
> +	PERF_OUTPUT_SRCLINE         = 1ULL << 12,
> +	PERF_OUTPUT_PERIOD          = 1ULL << 13,
> +	PERF_OUTPUT_IREGS	    = 1ULL << 14,
> +	PERF_OUTPUT_BRSTACK	    = 1ULL << 15,
> +	PERF_OUTPUT_BRSTACKSYM	    = 1ULL << 16,
> +	PERF_OUTPUT_DATA_SRC	    = 1ULL << 17,
> +	PERF_OUTPUT_WEIGHT	    = 1ULL << 18,
> +	PERF_OUTPUT_BPF_OUTPUT	    = 1ULL << 19,
> +	PERF_OUTPUT_CALLINDENT	    = 1ULL << 20,
> +	PERF_OUTPUT_INSN	    = 1ULL << 21,
> +	PERF_OUTPUT_INSNLEN	    = 1ULL << 22,
> +	PERF_OUTPUT_BRSTACKINSN	    = 1ULL << 23,
> +	PERF_OUTPUT_BRSTACKOFF	    = 1ULL << 24,
> +	PERF_OUTPUT_SYNTH           = 1ULL << 25,
> +	PERF_OUTPUT_PHYS_ADDR       = 1ULL << 26,
> +	PERF_OUTPUT_UREGS	    = 1ULL << 27,
> +	PERF_OUTPUT_METRIC	    = 1ULL << 28,
> +	PERF_OUTPUT_MISC            = 1ULL << 29,
> +	PERF_OUTPUT_SRCCODE	    = 1ULL << 30,
> +	PERF_OUTPUT_IPC             = 1ULL << 31,
>  };
>  
>  struct output_option {
> -- 
> 2.17.1
> 

-- 

- Arnaldo
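
As a standalone illustration of the U to ULL change discussed in this
message, the sketch below declares a bit-flag enum whose last enumerator
sits at bit 32. The SKETCH_* names are hypothetical and do not appear in
builtin-script.c. Enumerators wider than int rely on GCC/Clang behavior
(standardized in C23), which the quoted patch also depends on.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_output_field {
	SKETCH_OUTPUT_FIRST = 1ULL << 0,
	SKETCH_OUTPUT_IPC   = 1ULL << 31,	/* last bit reachable with 1U */
	SKETCH_OUTPUT_NEXT  = 1ULL << 32,	/* hypothetical 33rd field */
};

void sketch_fields_demo(void)
{
	uint64_t fields = 0;	/* same width as the u64 'fields' member */

	fields |= SKETCH_OUTPUT_NEXT;
	assert(fields == 0x100000000ULL);
	assert((fields & SKETCH_OUTPUT_IPC) == 0);
}
```

With a 1U base, a shift by 32 would exceed the width of unsigned int,
which is why the 33rd entry needs the wider constant.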


* Re: [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-10 22:47       ` Peter Zijlstra
@ 2020-08-12 13:39         ` Liang, Kan
  2020-08-12 13:53           ` Dave Hansen
  0 siblings, 1 reply; 32+ messages in thread
From: Liang, Kan @ 2020-08-12 13:39 UTC (permalink / raw)
  To: Peter Zijlstra, Dave Hansen
  Cc: acme, mingo, linux-kernel, mark.rutland, alexander.shishkin,
	jolsa, eranian, ak, kirill.shutemov



On 8/10/2020 6:47 PM, Peter Zijlstra wrote:
> On Mon, Aug 10, 2020 at 06:38:35PM -0400, Liang, Kan wrote:
>> On 8/10/2020 5:47 PM, Dave Hansen wrote:
> 
>>> It's probably best if we very carefully define up front what is getting
>>> reported here.  For instance, I believe we already have some fun cases
>>> with huge tmpfs where a compound page is mapped with 4k PTEs.  Kirill
>>> also found a few drivers doing this as well.  I think there were also
>>> some weird cases for ARM hugetlbfs where there were multiple hardware
>>> page table entries mapping a single hugetlbfs page.  These would be
>>> cases where compound_head() size would be greater than the size of the
>>> leaf paging structure entry.
>>>
>>> This is also why we have KernelPageSize and MMUPageSize in /proc/$pid/smaps.
>>>
>>> So, is this returning the kernel software page size or the MMU size?
>>>
>>
>> This tries to return the kernel software page size. I will add a comment to
>> the function. For the above cases, I think they can be detected by
>> PageCompound(page). The current code should already cover them. Is my
>> understanding correct?
> 
> But the rationale for the whole feature was to measure and possibly
> drive large page promotion/demotion, which requires the mmu page-size.

Yes, the MMU page-size is better here.

I still have some questions regarding MMUPageSize vs. KernelPageSize.
Could you please clarify?

I checked the show_smap code in fs/proc/task_mmu.c. A __weak function 
is defined for vma_mmu_pagesize(), which invokes vma_kernel_pagesize(). 
The comment also says that "In the majority of cases, the page size used 
by the kernel matches the MMU size. On architectures where it differs, 
an architecture-specific 'strong' version of this symbol is required."
I searched for vma_mmu_pagesize(). It seems that PowerPC is the only one 
that defines a 'strong' function. In other words, MMUPageSize and 
KernelPageSize are the same for X86. However, that does not seem to be 
true for the above compound page cases. Is it a bug in smaps? Or am I 
missing anything?

Thanks,
Kan


* Re: [PATCH V6 06/16] perf script: Use ULL for enum perf_output_field
  2020-08-12 12:21   ` Arnaldo Carvalho de Melo
@ 2020-08-12 13:42     ` Liang, Kan
  0 siblings, 0 replies; 32+ messages in thread
From: Liang, Kan @ 2020-08-12 13:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: peterz, mingo, linux-kernel, mark.rutland, alexander.shishkin,
	jolsa, eranian, ak, dave.hansen, kirill.shutemov



On 8/12/2020 8:21 AM, Arnaldo Carvalho de Melo wrote:
> Em Mon, Aug 10, 2020 at 02:24:26PM -0700, Kan Liang escreveu:
>> The Bitwise-Shift operator (1U << ) is used in the enum
>> perf_output_field, which has already reached its capacity (32 items).
>> If more items are added, a compile error will be triggered.
>>
>> Change the U to ULL, which extends the capacity to 64 items.
>>
>> The enum perf_output_field is only used to calculate a value for the
>> 'fields' in the output structure. The 'fields' is u64. The change
>> doesn't break anything.
> 
> Jiri did this already:
> 
> https://git.kernel.org/torvalds/c/60e5eeb56a1

Thanks for pointing it out.

I will rebase the code on top of it.

Thanks,
Kan

>   
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
>>   tools/perf/builtin-script.c | 64 ++++++++++++++++++-------------------
>>   1 file changed, 32 insertions(+), 32 deletions(-)
>>
>> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
>> index 447457786362..214bec350971 100644
>> --- a/tools/perf/builtin-script.c
>> +++ b/tools/perf/builtin-script.c
>> @@ -82,38 +82,38 @@ static bool			native_arch;
>>   unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
>>   
>>   enum perf_output_field {
>> -	PERF_OUTPUT_COMM            = 1U << 0,
>> -	PERF_OUTPUT_TID             = 1U << 1,
>> -	PERF_OUTPUT_PID             = 1U << 2,
>> -	PERF_OUTPUT_TIME            = 1U << 3,
>> -	PERF_OUTPUT_CPU             = 1U << 4,
>> -	PERF_OUTPUT_EVNAME          = 1U << 5,
>> -	PERF_OUTPUT_TRACE           = 1U << 6,
>> -	PERF_OUTPUT_IP              = 1U << 7,
>> -	PERF_OUTPUT_SYM             = 1U << 8,
>> -	PERF_OUTPUT_DSO             = 1U << 9,
>> -	PERF_OUTPUT_ADDR            = 1U << 10,
>> -	PERF_OUTPUT_SYMOFFSET       = 1U << 11,
>> -	PERF_OUTPUT_SRCLINE         = 1U << 12,
>> -	PERF_OUTPUT_PERIOD          = 1U << 13,
>> -	PERF_OUTPUT_IREGS	    = 1U << 14,
>> -	PERF_OUTPUT_BRSTACK	    = 1U << 15,
>> -	PERF_OUTPUT_BRSTACKSYM	    = 1U << 16,
>> -	PERF_OUTPUT_DATA_SRC	    = 1U << 17,
>> -	PERF_OUTPUT_WEIGHT	    = 1U << 18,
>> -	PERF_OUTPUT_BPF_OUTPUT	    = 1U << 19,
>> -	PERF_OUTPUT_CALLINDENT	    = 1U << 20,
>> -	PERF_OUTPUT_INSN	    = 1U << 21,
>> -	PERF_OUTPUT_INSNLEN	    = 1U << 22,
>> -	PERF_OUTPUT_BRSTACKINSN	    = 1U << 23,
>> -	PERF_OUTPUT_BRSTACKOFF	    = 1U << 24,
>> -	PERF_OUTPUT_SYNTH           = 1U << 25,
>> -	PERF_OUTPUT_PHYS_ADDR       = 1U << 26,
>> -	PERF_OUTPUT_UREGS	    = 1U << 27,
>> -	PERF_OUTPUT_METRIC	    = 1U << 28,
>> -	PERF_OUTPUT_MISC            = 1U << 29,
>> -	PERF_OUTPUT_SRCCODE	    = 1U << 30,
>> -	PERF_OUTPUT_IPC             = 1U << 31,
>> +	PERF_OUTPUT_COMM            = 1ULL << 0,
>> +	PERF_OUTPUT_TID             = 1ULL << 1,
>> +	PERF_OUTPUT_PID             = 1ULL << 2,
>> +	PERF_OUTPUT_TIME            = 1ULL << 3,
>> +	PERF_OUTPUT_CPU             = 1ULL << 4,
>> +	PERF_OUTPUT_EVNAME          = 1ULL << 5,
>> +	PERF_OUTPUT_TRACE           = 1ULL << 6,
>> +	PERF_OUTPUT_IP              = 1ULL << 7,
>> +	PERF_OUTPUT_SYM             = 1ULL << 8,
>> +	PERF_OUTPUT_DSO             = 1ULL << 9,
>> +	PERF_OUTPUT_ADDR            = 1ULL << 10,
>> +	PERF_OUTPUT_SYMOFFSET       = 1ULL << 11,
>> +	PERF_OUTPUT_SRCLINE         = 1ULL << 12,
>> +	PERF_OUTPUT_PERIOD          = 1ULL << 13,
>> +	PERF_OUTPUT_IREGS	    = 1ULL << 14,
>> +	PERF_OUTPUT_BRSTACK	    = 1ULL << 15,
>> +	PERF_OUTPUT_BRSTACKSYM	    = 1ULL << 16,
>> +	PERF_OUTPUT_DATA_SRC	    = 1ULL << 17,
>> +	PERF_OUTPUT_WEIGHT	    = 1ULL << 18,
>> +	PERF_OUTPUT_BPF_OUTPUT	    = 1ULL << 19,
>> +	PERF_OUTPUT_CALLINDENT	    = 1ULL << 20,
>> +	PERF_OUTPUT_INSN	    = 1ULL << 21,
>> +	PERF_OUTPUT_INSNLEN	    = 1ULL << 22,
>> +	PERF_OUTPUT_BRSTACKINSN	    = 1ULL << 23,
>> +	PERF_OUTPUT_BRSTACKOFF	    = 1ULL << 24,
>> +	PERF_OUTPUT_SYNTH           = 1ULL << 25,
>> +	PERF_OUTPUT_PHYS_ADDR       = 1ULL << 26,
>> +	PERF_OUTPUT_UREGS	    = 1ULL << 27,
>> +	PERF_OUTPUT_METRIC	    = 1ULL << 28,
>> +	PERF_OUTPUT_MISC            = 1ULL << 29,
>> +	PERF_OUTPUT_SRCCODE	    = 1ULL << 30,
>> +	PERF_OUTPUT_IPC             = 1ULL << 31,
>>   };
>>   
>>   struct output_option {
>> -- 
>> 2.17.1
>>
> 


* Re: [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
  2020-08-12 13:39         ` Liang, Kan
@ 2020-08-12 13:53           ` Dave Hansen
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Hansen @ 2020-08-12 13:53 UTC (permalink / raw)
  To: Liang, Kan, Peter Zijlstra
  Cc: acme, mingo, linux-kernel, mark.rutland, alexander.shishkin,
	jolsa, eranian, ak, kirill.shutemov

On 8/12/20 6:39 AM, Liang, Kan wrote:
> I searched for vma_mmu_pagesize(). It seems that PowerPC is the only
> one that defines a 'strong' function. In other words, MMUPageSize
> and KernelPageSize are the same for X86. However, that does not seem
> to be true for the above compound page cases. Is it a bug in smaps?
> Or am I missing anything?

__weak unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
{
        return vma_kernel_pagesize(vma);
}

unsigned long vma_kernel_pagesize(struct vm_area_struct *vma)
{
        if (vma->vm_ops && vma->vm_ops->pagesize)
                return vma->vm_ops->pagesize(vma);
        return PAGE_SIZE;
}

It can be overridden with vm_ops too, not just a weak symbol.
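
A userspace mock of the dispatch quoted above may make the point easier
to see. This is only a sketch: the sketch_* types are hypothetical
stand-ins for vm_area_struct and vm_operations_struct, and the 2 MiB
hook is an assumed hugetlb-like example. A mapping without a pagesize
hook reports the base page size, while one with a hook reports whatever
the hook returns.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

struct sketch_vma;

struct sketch_vm_ops {
	unsigned long (*pagesize)(struct sketch_vma *vma);
};

struct sketch_vma {
	const struct sketch_vm_ops *vm_ops;
};

/* Same shape as the vma_kernel_pagesize() logic quoted above. */
unsigned long sketch_vma_kernel_pagesize(struct sketch_vma *vma)
{
	if (vma->vm_ops && vma->vm_ops->pagesize)
		return vma->vm_ops->pagesize(vma);
	return SKETCH_PAGE_SIZE;
}

/* Hypothetical hugetlb-like hook reporting 2 MiB pages. */
static unsigned long sketch_huge_pagesize(struct sketch_vma *vma)
{
	(void)vma;
	return 2UL * 1024 * 1024;
}

void sketch_vm_ops_demo(void)
{
	static const struct sketch_vm_ops huge_ops = {
		.pagesize = sketch_huge_pagesize,
	};
	struct sketch_vma plain = { .vm_ops = NULL };
	struct sketch_vma huge = { .vm_ops = &huge_ops };

	assert(sketch_vma_kernel_pagesize(&plain) == 4096UL);
	assert(sketch_vma_kernel_pagesize(&huge) == 2097152UL);
}
```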



Thread overview: 32+ messages
2020-08-10 21:24 [PATCH V6 00/16] Add the page size in the perf record Kan Liang
2020-08-10 21:24 ` [PATCH V6 01/16] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
2020-08-10 21:35   ` Peter Zijlstra
2020-08-10 21:39   ` Peter Zijlstra
2020-08-10 22:36     ` Liang, Kan
2020-08-10 21:47   ` Dave Hansen
2020-08-10 22:38     ` Liang, Kan
2020-08-10 22:47       ` Peter Zijlstra
2020-08-12 13:39         ` Liang, Kan
2020-08-12 13:53           ` Dave Hansen
2020-08-10 21:24 ` [PATCH V6 02/16] perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
2020-08-10 21:40   ` Peter Zijlstra
2020-08-10 22:36     ` Liang, Kan
2020-08-10 21:24 ` [PATCH V6 03/16] perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE Kan Liang
2020-08-10 21:41   ` Peter Zijlstra
2020-08-10 22:37     ` Liang, Kan
2020-08-10 22:44       ` Peter Zijlstra
2020-08-10 21:24 ` [PATCH V6 04/16] tools headers UAPI: Update tools's copy of linux/perf_event.h Kan Liang
2020-08-10 21:24 ` [PATCH V6 05/16] perf record: Support new sample type for data page size Kan Liang
2020-08-10 21:24 ` [PATCH V6 06/16] perf script: Use ULL for enum perf_output_field Kan Liang
2020-08-12 12:21   ` Arnaldo Carvalho de Melo
2020-08-12 13:42     ` Liang, Kan
2020-08-10 21:24 ` [PATCH V6 07/16] perf script: Support data page size Kan Liang
2020-08-10 21:24 ` [PATCH V6 08/16] perf sort: Add sort option for " Kan Liang
2020-08-10 21:24 ` [PATCH V6 09/16] perf mem: Factor out a function to generate sort order Kan Liang
2020-08-10 21:24 ` [PATCH V6 10/16] perf mem: Clean up output format Kan Liang
2020-08-10 21:24 ` [PATCH V6 11/16] perf mem: Support data page size Kan Liang
2020-08-10 21:24 ` [PATCH V6 12/16] perf test: Add test case for PERF_SAMPLE_DATA_PAGE_SIZE Kan Liang
2020-08-10 21:24 ` [PATCH V6 13/16] perf tools: Add support for PERF_SAMPLE_CODE_PAGE_SIZE Kan Liang
2020-08-10 21:24 ` [PATCH V6 14/16] perf script: " Kan Liang
2020-08-10 21:24 ` [PATCH V6 15/16] perf report: " Kan Liang
2020-08-10 21:24 ` [PATCH V6 16/16] perf test: Add test case " Kan Liang
