[V5,02/14] perf/x86: Add perf_get_page_size support

Message ID 1549648509-12704-2-git-send-email-kan.liang@linux.intel.com
State New
Series
  • [V5,01/14] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE

Commit Message

Liang, Kan Feb. 8, 2019, 5:54 p.m. UTC
From: Kan Liang <kan.liang@linux.intel.com>

Implement an x86-specific version of perf_get_page_size(), which does a
full page-table walk of a given virtual address to retrieve the page
size. For x86, disabling IRQs over the walk is sufficient to prevent any
tear down of the page tables.

The new sample type requires collecting the virtual address. The virtual
address will not be output unless SAMPLE_ADDR is applied.

Large PEBS is disabled with this sample type, because flushing the PEBS
buffer for large PEBS would require tracking munmap, and perf doesn't
support munmap tracking yet. Large PEBS can be enabled separately later,
once munmap tracking is supported.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V4
- Split patch 1 of V4 into two patches.
  This patch adds the x86 implementation.

 arch/x86/events/core.c     | 31 +++++++++++++++++++++++++++++++
 arch/x86/events/intel/ds.c |  3 ++-
 2 files changed, 33 insertions(+), 1 deletion(-)

Comments

Thomas Gleixner Feb. 8, 2019, 6:47 p.m. UTC | #1
On Fri, 8 Feb 2019, kan.liang@linux.intel.com wrote:
> +u64 perf_get_page_size(u64 virt)
> +{
> +	unsigned long flags;
> +	unsigned int level;
> +	pte_t *pte;
> +
> +	if (!virt)
> +		return 0;
> +
> +	/*
> +	 * Interrupts are disabled, so it prevents any tear down
> +	 * of the page tables.
> +	 * See the comment near struct mmu_table_batch.
> +	 */
> +	local_irq_save(flags);
> +	if (virt >= TASK_SIZE)
> +		pte = lookup_address(virt, &level);
> +	else {
> +		if (current->mm) {
> +			pte = lookup_address_in_pgd(pgd_offset(current->mm, virt),
> +						    virt, &level);
> +		} else
> +			level = PG_LEVEL_NUM;
> +	}

This still lacks quite a few curly brackets, and aside from that you can
write it so it becomes readable:

	if (virt >= TASK_SIZE) {
		pte = lookup_address(virt, &level);
	} else if (current->mm) {
		pte = lookup_address_in_pgd(pgd_offset(current->mm, virt),
					    virt, &level);
	} else {
		level = PG_LEVEL_NUM;
	}


> +	local_irq_restore(flags);
> +	if (level >= PG_LEVEL_NUM)
> +		return 0;
> +
> +	return (u64)page_level_size(level);
> +}
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index e9acf1d..720dc9e 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1274,7 +1274,8 @@ static void setup_pebs_sample_data(struct perf_event *event,
>  	}
>  
>  
> -	if ((sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR)) &&
> +	if ((sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR
> +			    | PERF_SAMPLE_DATA_PAGE_SIZE)) &&
>  	    x86_pmu.intel_cap.pebs_format >= 1)

Can you please define a mask from those constants and use that instead of
adding this ugly line break.

Thanks,

	tglx
Peter Zijlstra Feb. 8, 2019, 8:07 p.m. UTC | #2
On Fri, Feb 08, 2019 at 09:54:57AM -0800, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Implement an x86-specific version of perf_get_page_size(), which does a
> full page-table walk of a given virtual address to retrieve the page
> size. For x86, disabling IRQs over the walk is sufficient to prevent any
> tear down of the page tables.
> 
> The new sample type requires collecting the virtual address. The virtual
> address will not be output unless SAMPLE_ADDR is applied.
> 
> Large PEBS is disabled with this sample type, because flushing the PEBS
> buffer for large PEBS would require tracking munmap, and perf doesn't
> support munmap tracking yet. Large PEBS can be enabled separately later,
> once munmap tracking is supported.
> 
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
> 
> Changes since V4
> - Split patch 1 of V4 into two patches.
>   This patch add the x86 implementation
> 
>  arch/x86/events/core.c     | 31 +++++++++++++++++++++++++++++++
>  arch/x86/events/intel/ds.c |  3 ++-
>  2 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 374a197..229a73b 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2578,3 +2578,34 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
>  	cap->events_mask_len	= x86_pmu.events_mask_len;
>  }
>  EXPORT_SYMBOL_GPL(perf_get_x86_pmu_capability);
> +
> +u64 perf_get_page_size(u64 virt)
> +{
> +	unsigned long flags;
> +	unsigned int level;
> +	pte_t *pte;
> +
> +	if (!virt)
> +		return 0;
> +
> +	/*
> +	 * Interrupts are disabled, so it prevents any tear down
> +	 * of the page tables.
> +	 * See the comment near struct mmu_table_batch.
> +	 */
> +	local_irq_save(flags);
> +	if (virt >= TASK_SIZE)
> +		pte = lookup_address(virt, &level);
> +	else {
> +		if (current->mm) {
> +			pte = lookup_address_in_pgd(pgd_offset(current->mm, virt),
> +						    virt, &level);
> +		} else
> +			level = PG_LEVEL_NUM;
> +	}
> +	local_irq_restore(flags);
> +	if (level >= PG_LEVEL_NUM)
> +		return 0;
> +
> +	return (u64)page_level_size(level);
> +}

Full NAK on a pure x86 implementation.

Patch

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 374a197..229a73b 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2578,3 +2578,34 @@  void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 	cap->events_mask_len	= x86_pmu.events_mask_len;
 }
 EXPORT_SYMBOL_GPL(perf_get_x86_pmu_capability);
+
+u64 perf_get_page_size(u64 virt)
+{
+	unsigned long flags;
+	unsigned int level;
+	pte_t *pte;
+
+	if (!virt)
+		return 0;
+
+	/*
+	 * Interrupts are disabled, so it prevents any tear down
+	 * of the page tables.
+	 * See the comment near struct mmu_table_batch.
+	 */
+	local_irq_save(flags);
+	if (virt >= TASK_SIZE)
+		pte = lookup_address(virt, &level);
+	else {
+		if (current->mm) {
+			pte = lookup_address_in_pgd(pgd_offset(current->mm, virt),
+						    virt, &level);
+		} else
+			level = PG_LEVEL_NUM;
+	}
+	local_irq_restore(flags);
+	if (level >= PG_LEVEL_NUM)
+		return 0;
+
+	return (u64)page_level_size(level);
+}
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index e9acf1d..720dc9e 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1274,7 +1274,8 @@  static void setup_pebs_sample_data(struct perf_event *event,
 	}
 
 
-	if ((sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR)) &&
+	if ((sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR
+			    | PERF_SAMPLE_DATA_PAGE_SIZE)) &&
 	    x86_pmu.intel_cap.pebs_format >= 1)
 		data->addr = pebs->dla;