Subject: Re: [PATCH v4 0/2] Make sysFS functional on topologies with per core sink
To: Linu Cherian
References: <20200904024106.21478-1-lcherian@marvell.com>
 <2bd65f2d-5660-10b3-f51f-448221d78d3d@arm.com>
From: Suzuki K Poulose
Date: Mon, 26 Oct 2020 18:17:39 +0000
Cc: linux-arm-kernel, Coresight ML, Mathieu Poirier, Linu Cherian, Mike Leach

Hi Linu,

Thanks for the feedback. My responses inline.

On 10/26/20 4:33 AM, Linu Cherian wrote:
> Hi Suzuki,
>
> On Mon, Oct 5, 2020 at 4:52 PM Suzuki K Poulose wrote:
>>
>> Hi Linu,
>>
>> On 09/04/2020 03:41 AM, Linu Cherian wrote:
>>> This patch series tries to fix the sysfs breakage on topologies
>>> with per core sink.
>>>
>>> Changes since v3:
>>> - References to coresight_get_enabled_sink in the perf interface
>>>   have been removed and it is marked deprecated in a new patch.
>>> - To avoid changes to coresight_find_sink, for ease of maintenance,
>>>   a search function specific to sysfs usage has been added.
>>> - Sysfs being the only user of coresight_get_enabled_sink, the
>>>   reset option is removed as well.
>>
>> Have you tried running perf with the --per-thread option? I believe
>> this will be impacted as well, as we choose a single sink at the
>> moment and it may not be reachable from the other CPUs where the
>> event may be scheduled, eventually losing trace for the duration
>> where the task is scheduled on a different CPU.
>>
>> Please could you try this patch and see if it helps? I have lightly
>> tested it on a fast model.
>
> We are seeing some issues while testing with this patch.
> The issue is that buffer allocation for the sink always happens on the
> first core in the cpu mask, and this doesn't match the core on which
> the event is started. Please see below for additional comments.

Please could you clarify the "issues"? How is the buffer allocation a
problem?

>
>
>>
>> ---8>---
>>
>> coresight: etm-perf: Allow an event to use multiple sinks
>>
>> When there are multiple sinks on the system, in the absence
>> of a specified sink, it is quite possible that the default sink
>> for one ETM could be different from that of another ETM (e.g., on
>> systems with per-CPU sinks). However, we do not support having
>> multiple sinks for an event yet. This patch allows the event to
>> use the default sinks on the ETMs where it is scheduled, as
>> long as the sinks are of the same type.
>>
>> e.g., if we have a 1x1 topology with per-CPU ETRs, the event can
>> use the per-CPU ETR for the session. However, if the sinks
>> are of different types, e.g. a TMC-ETR on one and a custom sink
>> on another, the event will only trace on the first detected
>> sink (just like we have today).
>>
>> Cc: Linu Cherian
>> Cc: Mathieu Poirier
>> Cc: Mike Leach
>> Signed-off-by: Suzuki K Poulose
>> ---
>>   .../hwtracing/coresight/coresight-etm-perf.c | 69 +++++++++++++------
>>   1 file changed, 49 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
>> index c2c9b127d074..19fe38010474 100644
>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
>> @@ -204,14 +204,28 @@ static void etm_free_aux(void *data)
>>   	schedule_work(&event_data->work);
>>   }
>>
>> +/*
>> + * When an event could be scheduled on more than one CPUs, we have to make
>> + * sure that the sinks are of the same type, so that the sink_buffer could
>> + * be reused.
>> + */
>> +static bool sinks_match(struct coresight_device *a, struct coresight_device *b)
>> +{
>> +	if (!a || !b)
>> +		return false;
>> +	return (sink_ops(a) == sink_ops(b)) &&
>> +	       (a->subtype.sink_subtype == b->subtype.sink_subtype);
>> +}
>> +
>>   static void *etm_setup_aux(struct perf_event *event, void **pages,
>>   			   int nr_pages, bool overwrite)
>>   {
>>   	u32 id;
>>   	int cpu = event->cpu;
>>   	cpumask_t *mask;
>> -	struct coresight_device *sink;
>> +	struct coresight_device *sink = NULL;
>>   	struct etm_event_data *event_data = NULL;
>> +	bool sink_forced = false;
>>
>>   	event_data = alloc_event_data(cpu);
>>   	if (!event_data)
>> @@ -222,6 +236,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
>>   	if (event->attr.config2) {
>>   		id = (u32)event->attr.config2;
>>   		sink = coresight_get_sink_by_id(id);
>> +		sink_forced = true;
>>   	}
>>
>>   	mask = &event_data->mask;
>> @@ -235,7 +250,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
>>   	 */
>>   	for_each_cpu(cpu, mask) {
>>   		struct list_head *path;
>> -		struct coresight_device *csdev;
>> +		struct coresight_device *csdev, *cpu_sink;
>>
>>   		csdev = per_cpu(csdev_src, cpu);
>>   		/*
>> @@ -243,33 +258,42 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
>>   		 * the mask and continue with the rest. If ever we try to trace
>>   		 * on this CPU, we handle it accordingly.
>>   		 */
>> -		if (!csdev) {
>> -			cpumask_clear_cpu(cpu, mask);
>> -			continue;
>> -		}
>> -
>> +		if (!csdev)
>> +			goto clear_cpu;
>>   		/*
>> -		 * No sink provided - look for a default sink for one of the
>> -		 * devices. At present we only support topology where all CPUs
>> -		 * use the same sink [N:1], so only need to find one sink. The
>> -		 * coresight_build_path later will remove any CPU that does not
>> -		 * attach to the sink, or if we have not found a sink.
>> +		 * No sink provided - look for a default sink for all the devices.
>> +		 * We only support multiple sinks, only if all the default sinks
>> +		 * are of the same type, so that the sink buffer can be shared
>> +		 * as the event moves around. As earlier, we don't trace on a
>> +		 * CPU, if we can't find a suitable sink.
>> 		 */
>> -		if (!sink)
>> -			sink = coresight_find_default_sink(csdev);
>> +		if (!sink_forced) {
>> +			cpu_sink = coresight_find_default_sink(csdev);
>> +			if (!cpu_sink)
>> +				goto clear_cpu;
>> +			/* First sink for this event */
>> +			if (!sink) {
>> +				sink = cpu_sink;
>> +			} else if (!sinks_match(cpu_sink, sink)) {
>> +				goto clear_cpu;
>> +			}
>>
>
>
> In per-thread option, cpu_sink always happens to be on core 0,
> since all cores are enabled in the cpu mask.
> So it feels like we need to take into consideration the core on which
> the event gets started (the core on which the process gets executed)
> while doing the buffer allocation.

An event can only be scheduled on a single CPU at any time (even though
the mask indicates it could be scheduled on any CPU in the "cpu_mask").
So it doesn't really matter where the "buffer" is allocated, unless NUMA
comes into the picture. But that doesn't help anyway with the AUX buffer,
which is attached to the "event" and not to the ETMs. When the event
moves, the buffer moves with it to the "sink" attached to the new CPU,
and we make sure that the new sink can use the buffer. So it should work
fine. (A rough, self-contained model of what I mean is at the end of this
mail.)

>
> On a related note, I had another question on this.
> Don't we also need to address cases where multiple threads are forked
> in a process?

I don't think we should worry about that in the underlying driver. All we
care about is an event, where it can be scheduled, and which sink we can
use. --per-thread doesn't handle multiple threads anyway. In the normal
mode (i.e. when none of -a / -C / --per-thread is specified), we should
get an event per CPU. Each of those events would get a buffer allocated
on the "sink" closest to it. In reality, buffer allocation is only an
issue on NUMA systems. When a thread is scheduled on a CPU, the event for
that CPU kicks in and its sink is automatically used.

One potential solution for the NUMA case is to allocate a buffer per node.
But the AUX buffer would need handling too, and that is controlled by the
generic perf core.

Suzuki

>
> Thanks.
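The rough model mentioned above, for anyone following along: this is plain
userspace C, not the driver code. Every type and name in it (struct sink,
struct aux_event, move_event(), etc.) is made up purely for illustration;
the only thing borrowed from the patch is the idea behind sinks_match() of
comparing the sink ops and subtype before letting a new sink reuse the
event's buffer.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins for the driver structures; not the kernel types. */
struct sink_ops {
	const char *driver;
};

struct sink {
	const struct sink_ops *ops;	/* stands in for sink_ops(csdev) */
	int subtype;			/* stands in for subtype.sink_subtype */
	const char *name;
};

struct aux_event {
	void *buffer;			/* one AUX buffer, owned by the event */
	const struct sink *cur_sink;	/* sink currently consuming that buffer */
};

static bool sinks_match(const struct sink *a, const struct sink *b)
{
	if (!a || !b)
		return false;
	return a->ops == b->ops && a->subtype == b->subtype;
}

/* The event migrates to @cpu: hand its buffer to that CPU's local sink. */
static void move_event(struct aux_event *ev, const struct sink *per_cpu_sink[],
		       int cpu)
{
	const struct sink *next = per_cpu_sink[cpu];

	/* Setup time already dropped CPUs whose sink is of a different type. */
	if (!sinks_match(ev->cur_sink, next))
		return;

	ev->cur_sink = next;
	printf("event now traces to %s, reusing buffer %p\n",
	       next->name, ev->buffer);
}

int main(void)
{
	static const struct sink_ops etr_ops = { "tmc-etr" };
	static const struct sink etr0 = { &etr_ops, 0, "etr0" };
	static const struct sink etr1 = { &etr_ops, 0, "etr1" };
	const struct sink *per_cpu_sink[] = { &etr0, &etr1 };
	char buf[4096];
	struct aux_event ev = { buf, per_cpu_sink[0] };

	/* The traced task (and hence the event) bounces between CPU0 and CPU1. */
	move_event(&ev, per_cpu_sink, 1);
	move_event(&ev, per_cpu_sink, 0);
	return 0;
}

The point is simply that there is exactly one buffer per event, and
whichever per-CPU sink the event lands on can consume it, because setup
time has already dropped any CPU whose default sink is of a different type.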