From: Valentin Schneider
To: Marcelo Tosatti
Cc: linux-alpha@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
 linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org,
 linux-ia64@vger.kernel.org, loongarch@lists.linux.dev,
 linux-mips@vger.kernel.org, openrisc@lists.librecores.org,
 linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
 linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
 linux-xtensa@linux-xtensa.org, x86@kernel.org, "Paul E.
 McKenney", Steven Rostedt, Peter Zijlstra, Thomas Gleixner,
 Sebastian Andrzej Siewior, Juri Lelli, Daniel Bristot de Oliveira,
 Frederic Weisbecker, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", Marc Zyngier, Mark Rutland, Russell King,
 Nicholas Piggin, Guo Ren, "David S. Miller", Douglas RAILLARD
Subject: Re: [RFC PATCH 0/5] Generic IPI sending tracepoint
References: <20221007154145.1877054-1-vschneid@redhat.com>
Date: Tue, 11 Oct 2022 17:17:04 +0100
X-Mailing-List: linux-sh@vger.kernel.org

+Cc Douglas

On 07/10/22 17:01, Marcelo Tosatti wrote:
> Hi Valentin,
>
> On Fri, Oct 07, 2022 at 04:41:40PM +0100, Valentin Schneider wrote:
>> Background
>> ==========
>>
>> As for the targeted CPUs, the existing tracepoint does export them, albeit in
>> cpumask form, which is quite inconvenient from a tooling perspective. For
>> instance, as far as I'm aware, it's not possible to do event filtering on a
>> cpumask via trace-cmd.
>
> https://man7.org/linux/man-pages/man1/trace-cmd-set.1.html
>
>        -f filter
>            Specify a filter for the previous event. This must come after
>            a -e. This will filter what events get recorded based on the
>            content of the event. Filtering is passed to the kernel
>            directly so what filtering is allowed may depend on what
>            version of the kernel you have. Basically, it will let you
>            use C notation to check if an event should be processed or
>            not.
>
>                ==, >=, <=, >, <, &, |, && and ||
>
>            The above are usually safe to use to compare fields.
>
> This looks overkill to me (consider large number of bits set in mask).
>
> +#define trace_ipi_send_cpumask(callsite, mask) do {              \
> +       if (static_key_false(&__tracepoint_ipi_send_cpu.key)) {  \
> +               int cpu;                                         \
> +               for_each_cpu(cpu, mask)                          \
> +                       trace_ipi_send_cpu(callsite, cpu);       \
> +       }                                                        \
> +} while (0)
>

Indeed, I expected pushback on this :-)

I went for this due to how much simpler an int is to process/use compared to a
cpumask. There is the trigger example I listed above, but there is also the
consumption of the trace event itself. Consider this event collected on an
arm64 QEMU instance (output from trace-cmd):

  <...>-234 [001] 37.251567: ipi_raise: target_mask=00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004 (Function call interrupts)

That sort of formatting has been an issue downstream for things like LISA [1]
where events are aggregated into Pandas tables, and we need to play silly
games for performance reasons because bitmasks aren't a native Python type.

I had a look at libtraceevent to see how this data is exposed and if the
answer would be better tooling:

tep_get_field_val() just yields an unsigned long long of value 0x200018, which
AFAICT is just the [length, offset] thing associated with dynamic arrays. Not
really usable, and I don't see anything exported in the lib to extract and use
those values.

tep_get_field_raw() is better, it handles the dynamic array for us and yields
a pointer to the cpumask array at the tail of the record. With that it's easy
to get an output such as: cpumask[size=32]=[4,0,0,0,]. Still, this isn't a
native type for many programming languages.

In contrast, this is immediately readable and consumable by userspace tools:

  <...>-234 [001] 37.250882: ipi_send_cpu: callsite=__smp_call_single_queue+0x5c target_cpu=2

Thinking out loud, it makes way more sense to record a cpumask in the
tracepoint, but perhaps we could have a postprocessing step to transform those
into N events each targeting a single CPU?
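
To make the postprocessing idea a bit more concrete, here is a rough Python
sketch of what such a step could look like in userspace. It is only an
illustration, not what LISA or trace-cmd actually do: all function names and
the dict layout of an "event" are made up for the example. It assumes the
usual dynamic-array encoding (offset in the low 16 bits, length in the high
16 bits), the comma-separated hex mask text shown above (most significant
32-bit group first), and a little-endian trace for the raw-bytes variant.

```python
from typing import Dict, Iterator, List, Tuple


def decode_data_loc(value: int) -> Tuple[int, int]:
    """Split a dynamic-array descriptor into (offset, length).

    Assumption: offset lives in the low 16 bits, length in the high 16 bits.
    """
    return value & 0xFFFF, value >> 16


def cpus_from_bitmask(bits: int) -> List[int]:
    """Return the indices of the set bits, i.e. the targeted CPU numbers."""
    cpus = []
    cpu = 0
    while bits:
        if bits & 1:
            cpus.append(cpu)
        bits >>= 1
        cpu += 1
    return cpus


def cpus_from_mask_text(text: str) -> List[int]:
    """Parse a printed mask such as '00000000,00000004'.

    The groups are 32-bit hex words, most significant first, so dropping the
    commas yields one big hexadecimal number.
    """
    return cpus_from_bitmask(int(text.replace(",", ""), 16))


def cpus_from_mask_bytes(raw: bytes) -> List[int]:
    """Same, but from the raw cpumask bytes (assumes a little-endian trace)."""
    return cpus_from_bitmask(int.from_bytes(raw, "little"))


def explode(event: Dict[str, object]) -> Iterator[Dict[str, object]]:
    """Turn one cpumask event into one event per targeted CPU.

    'event' is a hypothetical dict with 'timestamp', 'reason' and
    'target_mask' keys; a real tool would use its own record type.
    """
    for cpu in cpus_from_mask_text(str(event["target_mask"])):
        yield {
            "timestamp": event["timestamp"],
            "reason": event["reason"],
            "target_cpu": cpu,
        }


if __name__ == "__main__":
    print(decode_data_loc(0x200018))  # (24, 32): 32 bytes at offset 24
    sample = {
        "timestamp": 37.251567,
        "reason": "Function call interrupts",
        "target_mask": "00000000,00000000,00000000,00000000,"
                       "00000000,00000000,00000000,00000004",
    }
    for single in explode(sample):
        print(single)
```

With the ipi_raise sample above this yields a single event with target_cpu=2,
which lines up with the ipi_send_cpu output. The cost is that a mask with many
bits set expands into as many rows, but that cost is paid offline rather than
in the tracepoint itself.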
[1]: https://github.com/ARM-software/lisa/blob/37b51243a94b27ea031ff62bb4ce818a59a7f6ef/lisa/trace.py#L4756

>
>>
>> Because of the above points, this is introducing a new tracepoint.
>>
>> Patches
>> =======
>>
>> This results in having trace events for:
>>
>> o smp_call_function*()
>> o smp_send_reschedule()
>> o irq_work_queue*()
>>
>> This is incomplete, just looking at arm64 there's more IPI types that aren't covered:
>>
>>   IPI_CPU_STOP,
>>   IPI_CPU_CRASH_STOP,
>>   IPI_TIMER,
>>   IPI_WAKEUP,
>>
>> ... But it feels like a good starting point.
>
> Can't you have a single tracepoint (or variant with cpumask) that would
> cover such cases as well?
>
> Maybe (as parameters for tracepoint):
>
> * type (reschedule, smp_call_function, timer, wakeup, ...).
>
> * function address: valid for smp_call_function, irq_work_queue
>   types.
>

That's a good point, I wasn't sure about having a parameter serving as
discriminant for another, but the function address would be either valid or
NULL which is fine. So perhaps:

o callsite (i.e. _RET_IP_), serves as type
o address of callback tied to IPI, if any
o target CPUs

>> Another thing worth mentioning is that depending on the callsite, the _RET_IP_
>> fed to the tracepoint is not always useful - generic_exec_single() doesn't tell
>> you much about the actual callback being sent via IPI, so there might be value
>> in exploding the single tracepoint into at least one variant for smp_calls.
>
> Not sure i grasp what you mean by "exploding the single tracepoint...",
> but yes knowing the function or irq work function is very useful.
>

Sorry; I meant having several "specialized" tracepoints instead of a single one.