From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 529D6C169C4 for ; Thu, 31 Jan 2019 14:58:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 26895218AF for ; Thu, 31 Jan 2019 14:58:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727510AbfAaO6N (ORCPT ); Thu, 31 Jan 2019 09:58:13 -0500 Received: from mga07.intel.com ([134.134.136.100]:33389 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725796AbfAaO6N (ORCPT ); Thu, 31 Jan 2019 09:58:13 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 31 Jan 2019 06:58:12 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,545,1539673200"; d="scan'208";a="120985220" Received: from linux.intel.com ([10.54.29.200]) by fmsmga008.fm.intel.com with ESMTP; 31 Jan 2019 06:58:11 -0800 Received: from [10.251.7.163] (kliang2-mobl1.ccr.corp.intel.com [10.251.7.163]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by linux.intel.com (Postfix) with ESMTPS id A5C9B580297; Thu, 31 Jan 2019 06:58:10 -0800 (PST) Subject: Re: [PATCH V3 01/13] perf/core, x86: Add PERF_SAMPLE_DATA_PAGE_SIZE To: Peter Zijlstra Cc: acme@kernel.org, tglx@linutronix.de, mingo@redhat.com, linux-kernel@vger.kernel.org, eranian@google.com, jolsa@redhat.com, namhyung@kernel.org, ak@linux.intel.com, Andy Lutomirski References: <1548858234-8872-1-git-send-email-kan.liang@linux.intel.com> <20190131123725.GB31516@hirez.programming.kicks-ass.net> From: "Liang, Kan" Message-ID: <6ecaa29b-af17-938d-8638-acb9de5454be@linux.intel.com> Date: Thu, 31 Jan 2019 09:58:09 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0 MIME-Version: 1.0 In-Reply-To: <20190131123725.GB31516@hirez.programming.kicks-ass.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 1/31/2019 7:37 AM, Peter Zijlstra wrote: > On Wed, Jan 30, 2019 at 06:23:42AM -0800, kan.liang@linux.intel.com wrote: >> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c >> index 374a197..03bf45d 100644 >> --- a/arch/x86/events/core.c >> +++ b/arch/x86/events/core.c >> @@ -2578,3 +2578,45 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap) >> cap->events_mask_len = x86_pmu.events_mask_len; >> } >> EXPORT_SYMBOL_GPL(perf_get_x86_pmu_capability); >> + >> +/* >> + * map x86 page levels to perf page sizes >> + */ >> +static const enum perf_page_size perf_page_size_map[PG_LEVEL_NUM] = { >> + [PG_LEVEL_NONE] = PERF_PAGE_SIZE_NONE, >> + [PG_LEVEL_4K] = PERF_PAGE_SIZE_4K, >> + [PG_LEVEL_2M] = PERF_PAGE_SIZE_2M, >> + [PG_LEVEL_1G] = PERF_PAGE_SIZE_1G, >> + [PG_LEVEL_512G] = PERF_PAGE_SIZE_512G, >> +}; >> + >> +u64 perf_get_page_size(u64 virt) >> +{ >> + unsigned long flags; >> + unsigned int level; >> + pte_t *pte; >> + >> + if (!virt) >> + return 0; >> + >> + /* >> + * Interrupts are disabled, so it prevents any tear down >> + * of the page tables. >> + * See the comment near struct mmu_table_batch. >> + */ >> + local_irq_save(flags); >> + if (virt >= TASK_SIZE) >> + pte = lookup_address(virt, &level); >> + else { >> + if (current->mm) >> + pte = lookup_address_in_pgd(pgd_offset(current->mm, virt), >> + virt, &level); > > Aside from all the missin {}, I'm fairly sure this is broken since this > happens from NMI context. This can interrupt switch_mm() and things like > use_temporary_mm(). > > Also; why does this live in the x86 code and not in the generic code? > This is x86 implementation. In generic code, there is a __weak function. I'll make it clear in the change log in v4. +/* Return page size of given virtual address. IRQ-safe required. */ +u64 __weak perf_get_page_size(u64 virt) +{ + return PERF_PAGE_SIZE_NONE; +} >> + else >> + level = PG_LEVEL_NUM; >> + } >> + local_irq_restore(flags); >> + if (level >= PG_LEVEL_NUM) >> + return PERF_PAGE_SIZE_NONE; >> + >> + return (u64)perf_page_size_map[level]; >> +} > >> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h >> index 7198ddd..79daacd 100644 >> --- a/include/uapi/linux/perf_event.h >> +++ b/include/uapi/linux/perf_event.h >> @@ -141,8 +141,9 @@ enum perf_event_sample_format { >> PERF_SAMPLE_TRANSACTION = 1U << 17, >> PERF_SAMPLE_REGS_INTR = 1U << 18, >> PERF_SAMPLE_PHYS_ADDR = 1U << 19, >> + PERF_SAMPLE_DATA_PAGE_SIZE = 1U << 20, >> >> - PERF_SAMPLE_MAX = 1U << 20, /* non-ABI */ >> + PERF_SAMPLE_MAX = 1U << 21, /* non-ABI */ >> >> __PERF_SAMPLE_CALLCHAIN_EARLY = 1ULL << 63, /* non-ABI; internal use */ >> }; >> @@ -863,6 +864,7 @@ enum perf_event_type { >> * { u64 abi; # enum perf_sample_regs_abi >> * u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR >> * { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR >> + * { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE >> * }; >> */ >> PERF_RECORD_SAMPLE = 9, >> @@ -1150,6 +1152,18 @@ union perf_mem_data_src { >> #define PERF_MEM_S(a, s) \ >> (((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT) >> >> + >> +enum perf_page_size { >> + PERF_PAGE_SIZE_NONE, >> + PERF_PAGE_SIZE_4K, >> + PERF_PAGE_SIZE_8K, >> + PERF_PAGE_SIZE_16K, >> + PERF_PAGE_SIZE_64K, >> + PERF_PAGE_SIZE_2M, >> + PERF_PAGE_SIZE_1G, >> + PERF_PAGE_SIZE_512G, >> +}; > > Since you have a u64 to store this in, WTH do you use this limited enum? > Are you very sure this covers all the possible page sizes for all > architectures? > > Why not simply report the page size in bytes? > I will use the page size instead in V4. Thanks, Kan