Subject: Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
From: German Gomez
To: Leo Yan
Cc: Namhyung Kim, James Clark, Arnaldo Carvalho de Melo, Jiri Olsa,
    Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
    Stephane Eranian, Adrian Hunter
References: <20211004062638.GB174271@leoy-ThinkPad-X240s>
    <20211006093620.GA14400@leoy-ThinkPad-X240s>
    <87dad53f-a9a5-cd36-7348-ee10f4edd8fb@arm.com>
    <20211011142940.GB37383@leoy-ThinkPad-X240s>
    <8a1eafe3-d19e-40d6-f659-de0e9daa5877@arm.com>
    <20211018132328.GG130233@leoy-ThinkPad-X240s>
Message-ID: <354d76da-5402-5c24-516f-c1f7e58590fc@arm.com>
Date: Fri, 29 Oct 2021 11:51:16 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Leo,

The current plan is to define a global flag in `struct arm_spe` to
select the method of pid tracing (context packets or switch events):

    struct arm_spe {
       /* ... */
       u8        use_ctx_pkt_for_pid;
    };

The method could be determined by peeking at the top element of the
`struct auxtrace_heap` at the beginning of perf-report. If context
packets have been collected, the first one should have a
context_id != -1.

We could then tweak this part of Namhyung's patch slightly:

    if (!spe->use_ctx_pkt_for_pid &&
        (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
         event->header.type == PERF_RECORD_SWITCH))
            err = arm_spe_context_switch(spe, event, sample);

Then we could apply patch [1], which wasn't fully merged in the end,
including a similar `if (spe->use_ctx_pkt_for_pid)` check to collect the
pid/tid from the context packets.

What do you think?

Thanks,
German

[1] https://lore.kernel.org/lkml/20210119144658.793-8-james.clark@arm.com/

On 19/10/2021 13:21, German Gomez wrote:
> Hi Leo,
>
> Many thanks for your comments as always and sorry for the rushed patch.
>
> On 18/10/2021 14:23, Leo Yan wrote:
>> Hi German,
>>
>> On Mon, Oct 18, 2021 at 12:01:27PM +0100, German Gomez wrote:
>>> Hi,
>>>
>>> What do you think of the patch below? PERF_RECORD_SWITCH events are also
>>> included for tracing forks. The patch would sit on top of Namhyung's.
>> Yeah, it's good to add PERF_RECORD_SWITCH.
>>
>>> On 12/10/2021 12:07, German Gomez wrote:
>>>> Hi, Leo and Namhyung,
>>>>
>>>> I want to make sure I'm on the same page as you regarding this topic.
>>>>
>>>> [...]
>>>>
>>>> If we are not considering patching the driver at this stage, so we allow
>>>> hardware tracing on non-root namespaces.
>>>> I think we could proceed like this:
>>>>
>>>>   - For userspace, always use context-switch events as they are
>>>>     accurate and consistent with namespaces.
>> I don't think you can always use context-switch events for userspace
>> samples. The underlying mechanism is that when a context-switch
>> event or context packet arrives, it invokes the function
>> machine__set_current_tid() to set the current pid/tid; afterwards, we
>> can retrieve the current pid/tid with the function
>> arm_spe_set_pid_tid_cpu().
>>
>> The problem is that if we want to use the tid/pid info from both
>> context-switch events and context packets at the same time, then it's
>> hard to maintain. E.g. we need to create multiple thread contexts: one
>> to track pid info coming from context-switch events and
>> another to track pid info from context packets.
> My thinking was to use only one of the methods for the entire run, but
> the code below is not expressive enough I'm afraid and I agree it could
> become hard to maintain. I need to polish it up.
>
>> To simplify the code, I still think we should give context packets
>> priority and use them if they are available, and fall back to
>> context-switch events for pid/tid when they are not.
> OK if it simplifies things. I think context-pkt availability can be
> determined before any events are processed by looking at the top record
> in the auxtrace_heap, or any of the auxtrace_queues.
>
>>>>   - For kernel tracing, if context packets are enabled, use them, but
>>>>     warn the user that the PIDs correspond to the root namespace.
>>>>   - Otherwise, use context-switch events and warn the user of the time
>>>>     inaccuracies.
>>>>
>>>> Later, if the driver is patched to disable context packets outside the
>>>> root namespace, kernel tracing could fall back to using context-switch
>>>> events and warn the user with a single message about the time
>>>> inaccuracies.
>>>>
>>>> If we are aligned, we could collect your feedback and share an updated
>>>> patch that considers the warnings.
>>>>
>>>> Many thanks
>>>> Best regards
>>> ---
>>>  tools/perf/util/arm-spe.c | 66 +++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 63 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
>>> index 708323d7c93c..6a2f7a484a80 100644
>>> --- a/tools/perf/util/arm-spe.c
>>> +++ b/tools/perf/util/arm-spe.c
>>> @@ -71,6 +71,17 @@ struct arm_spe {
>>>      u64                kernel_start;
>>>
>>>      unsigned long            num_events;
>>> +
>>> +    /*
>>> +     * Used for PID tracing.
>>> +     */
>>> +    u8                exclude_kernel;
>>> +
>>> +    /*
>>> +     * Warning messages.
>>> +     */
>>> +    u8                warn_context_pkt_namesapce;
>>> +    u8                warn_context_switch_ev_accuracy;
>>>  };
>>>
>>>  struct arm_spe_queue {
>>> @@ -586,11 +597,42 @@ static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
>>>      return timeless_decoding;
>>>  }
>>>
>>> +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) {
>>> +    struct evsel *evsel;
>>> +    struct evlist *evlist = spe->session->evlist;
>>> +
>>> +    evlist__for_each_entry(evlist, evsel) {
>>> +    if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel)
>>> +        return true;
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> +
>>>  static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
>>>                      struct auxtrace_queue *queue)
>>>  {
>>>      struct arm_spe_queue *speq = queue->priv;
>>> -    pid_t tid;
>>> +    pid_t tid = machine__get_current_tid(spe->machine, speq->cpu);
>>> +    u64 context_id = speq->decoder->record.context_id;
>>> +
>>> +    /*
>>> +    * We're tracing the kernel.
>>> +    */
>>> +    if (!spe->exclude_kernel) {
>> This is incorrect ...
>> 'exclude_kernel' is a global variable and if
>> it's set then perf will always run the code below.
>>
>> I think here you want to avoid using context packets for user space
>> samples, but checking 'exclude_kernel' cannot help for this purpose
>> since 'exclude_kernel' cannot be used to decide the sample mode (kernel
>> mode or user mode).
>>
>> Thanks,
>> Leo
>>
>>> +        /*
>>> +         * Use CONTEXT packets in kernel tracing if available and warn the user of the
>>> +         * values correspond to the root PID namespace.
>>> +         *
>>> +         * If CONTEXT packets aren't available but context-switch events are, warn the user
>>> +         * of the time inaccuracies.
>>> +         */
>>> +        if (context_id != (u64) -1) {
>>> +            tid = speq->decoder->record.context_id;
>>> +            spe->warn_context_pkt_namesapce = true;
>>> +        } else if (tid != -1 && context_id == (u64) -1)
>>> +            spe->warn_context_switch_ev_accuracy = true;
>>> +    }
>>>
>>>      tid = machine__get_current_tid(spe->machine, speq->cpu);
>>>      if (tid != -1) {
>>> @@ -740,7 +782,8 @@ static int arm_spe_process_event(struct perf_session *session,
>>>          if (err)
>>>              return err;
>>>
>>> -        if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
>>> +        if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
>>> +            event->header.type == PERF_RECORD_SWITCH)
>>>              err = arm_spe_context_switch(spe, event, sample);
>>>      }
>>>
>>> @@ -807,7 +850,20 @@ static int arm_spe_flush(struct perf_session *session __maybe_unused,
>>>          return arm_spe_process_timeless_queues(spe, -1,
>>>                  MAX_TIMESTAMP - 1);
>>>
>>> -    return arm_spe_process_queues(spe, MAX_TIMESTAMP);
>>> +    ret = arm_spe_process_queues(spe, MAX_TIMESTAMP);
>>> +
>>> +    if (spe->warn_context_pkt_namesapce)
>>> +        ui__warning(
>>> +            "Arm SPE CONTEXT packets used for PID/TID tracing.\n\n"
>>> +            "PID values correspond to the root PID namespace.\n\n");
>>> +
>>> +    if (spe->warn_context_switch_ev_accuracy)
>>> +        ui__warning(
>>> +            "No Arm SPE CONTEXT packets found within traces.\n\n"
>>> +            "Fallback to PERF_RECORD_SWITCH events for PID/TID tracing will have\n"
>>> +            "workload-dependant timing inaccuracies.\n\n");
>>> +
>>> +    return ret;
>>>  }
>>>
>>>  static void arm_spe_free_queue(void *priv)
>>> @@ -1083,6 +1139,10 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>>>
>>>      spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
>>>
>>> +    spe->exclude_kernel = arm_spe__is_exclude_kernel(spe);
>>> +    spe->warn_context_pkt_namesapce = false;
>>> +    spe->warn_context_switch_ev_accuracy = false;
>>> +
>>>      /*
>>>       * The synthesized event PERF_RECORD_TIME_CONV has been handled ahead
>>>       * and the parameters for hardware clock are stored in the session
>>> --
>>> 2.17.1
>
> Thanks,
> German
>