From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753333AbbGWOBp (ORCPT ); Thu, 23 Jul 2015 10:01:45 -0400 Received: from mail.kernel.org ([198.145.29.136]:40420 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753262AbbGWOBc (ORCPT ); Thu, 23 Jul 2015 10:01:32 -0400 Date: Thu, 23 Jul 2015 11:01:27 -0300 From: Arnaldo Carvalho de Melo To: Masami Hiramatsu Cc: Hemant Kumar , Peter Zijlstra , linux-kernel@vger.kernel.org, Adrian Hunter , Ingo Molnar , Jiri Olsa , Namhyung Kim , Borislav Petkov Subject: Re: Re: [RFC PATCH perf/core v2 00/16] perf-probe --cache and SDT support Message-ID: <20150723140127.GD3152@kernel.org> References: <20150715091352.8915.87480.stgit@localhost.localdomain> <55A7215F.40803@linux.vnet.ibm.com> <55A874C6.5030202@hitachi.com> <55AFA4E2.4040801@linux.vnet.ibm.com> <55B0E872.8030206@hitachi.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <55B0E872.8030206@hitachi.com> X-Url: http://acmel.wordpress.com User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Em Thu, Jul 23, 2015 at 10:13:22PM +0900, Masami Hiramatsu escreveu: > On 2015/07/22 23:12, Hemant Kumar wrote: > > On 07/17/2015 08:51 AM, Masami Hiramatsu wrote: > >> On 2015/07/16 12:13, Hemant Kumar wrote: > > The idea behind '%' was to identify the SDT events and take a different path > > to lookup through the cache, put a probe, record and then delete the probe. > > Or, do you want "perf record" to record any event this way (not just an sdt > > event). > I see, but I think that is not good by following reasons, > - when we record event with "-e %provider:event", it will be shown as > "provider:event" > - if perf-list shows the SDT(cached) events as "%provider:event", that > will not match the recorded result. > - it is somewhat fragile that we temporary add the SDT event and remove it > after record, because the event will not hide from ftrace users (this > means that we'll fail removing the event by -EBUSY if someone use it > via ftrace) We should avoid that, if we say record event "foo:bar", then we should consistently show "foo:bar" everywhere this is referenced. > - if we set SDT events perf-probe, it will be shown as "provider:event" name > because "%" will be rejected by ftrace. In that case, what the perf-list show > those events, both of %provider:event and provider:event ? > thus I pushed the "%" as a "special remembering mark" only for looking > up the event from cache by perf-probe. > So I'd like to suggest that the following behavior > 1) perf-list shows the cached-with-name and SDT events as Tracepoint events > even if it is not yet probed. I can agree with 'perf list' showing what can be used as events, and SDTs, AFAIK, match that definition, i.e. we have somewhere (in the DSOs, right?) information about where to ask for an event to be enabled. If there are details on how that needs to be obtained, then passed to the kernel somehow to then become really, really accessible, then these details are completely internal. > # perf list > > List of pre-defined events (to be used in -e): > ... > libc:memory_heap_new [Tracepoint event] Is it like this or is it like [ku]probes where we already have a namespace qualifier, i.e.: [root@zoo ~]# perf probe icmp_rcv Added new event: probe:icmp_rcv (on icmp_rcv) You can now use it in all perf tools, such as: perf record -e probe:icmp_rcv -aR sleep 1 [root@zoo ~]# [root@zoo ~]# perf probe /lib64/libc-2.20.so malloc Added new events: probe_libc:malloc (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_1 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_2 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_3 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_4 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_5 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_6 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_7 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_8 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_9 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_10 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_11 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_12 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_13 (on malloc in /lib64/libc-2.20.so) probe_libc:malloc_14 (on malloc in /lib64/libc-2.20.so) You can now use it in all perf tools, such as: perf record -e probe_libc:malloc_14 -aR sleep 1 [root@zoo ~]# "probe" for kernel events, "probe_%s" % DSO basename for userspace events. Why not continue with that and have SDTs use the probe_%s: namespace? Sorry if this was already discussed here... If there is some ambiguity, that can be resolved by explicitely setting a new name, 'perf probe' has provision for that, right? I.e.: [root@zoo ~]# perf probe /lib64/libc-2.20.so another_name=malloc Added new events: probe_libc:another_name_14 (on malloc in /lib64/libc-2.20.so) You can now use it in all perf tools, such as: perf record -e probe_libc:another_name_14 -aR sleep 1 [root@zoo ~]# > ... > probes:myevent [Tracepoint event] > ... > 2) perf-record -e with no-probed event should try to set up the given probe > by using perf-probe. It is possible to remove that the probe after recording, > but also ignore if it fails by -EBUSY. (anyway, there is no difference for > users) Right, being able to add new probes _without_ calling 'perf probe', but instead functions used by 'perf probe' is something the eBPF does (I'm almost getting there... :) ) and that I want to do in other tools, like 'trace' as well... There are permission problems about how to add new probes and how to _enable_ them, i.e. I think it is ok to allow users to ask for probe:foo, if they are monitoring just their workloads... [root@zoo ~]# perf probe 'vfs_getname=getname_flags:72 pathname=filename:string' Added new event: probe:vfs_getname (on getname_flags:72 with pathname=filename:string) You can now use it in all perf tools, such as: perf record -e probe:vfs_getname -aR sleep 1 [root@zoo ~]# Then, as !root: [acme@zoo linux]$ trace cat /etc/passwd Error: No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit) Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug' By default, !root users can't see debugfs, so, no dice in trying to use the raw_syscalls tracepoints, bummer, then: [acme@zoo linux]$ sudo mount -o remount,mode=755 /sys/kernel/debug [sudo] password for acme: [acme@zoo linux]$ trace cat /etc/passwd Error: No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit) Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing' Ouch, now this tracefs thing inside debugfs, at least sudo doesn't asks me for the password this time! But then, since 'perf trace' knows there is a probe:vfs_getname syscall in place, it will try to use it, to monitor a workload it is about to start, but: [acme@zoo linux]$ sudo mount -o remount,mode=755 /sys/kernel/debug/tracing [acme@zoo linux]$ trace cat /etc/passwd Error: Operation not permitted. Hint: Check /proc/sys/kernel/perf_event_paranoid setting. Hint: For system wide tracing it needs to be set to -1. Hint: Try: 'sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid"' Hint: The current value is 1. [acme@zoo linux]$ This needs refining, i.e. I should just warn that albeit "probe:vfs_getname" is available, it can't be used, unless we open the doors wide. Sorry for the digression, but these permission issues will hit us with SDT as well, no? > This rule will solve the contradiction between the event name on recorded > data and listed events. However, as we discussed there are other clashes. > A) clash among binaries: Since the binary builders can freely use the > provider name, it is possible to clash to other binaries' SDTs. Yes, one way to disambiguate is to use the buildid, i.e. content based, when/if the need arises. We should _always_ store the build-id, together with any probe used, so that we can bail out when trying to run those with a non matching DSO (kernel, module, library, whatever). When specifying some SDT that is ambiguous, we should bail out and ask for further namespacing, like: [acme@zoo linux]$ git show --oneline 25b6 error: short SHA1 25b6 is ambiguous. error: short SHA1 25b6 is ambiguous. fatal: ambiguous argument '25b6': unknown revision or path not in the working tree. Use '--' to separate paths from revisions, like this: 'git [...] -- [...]' [acme@zoo linux]$ git show --oneline 25b67 error: short SHA1 25b67 is ambiguous. error: short SHA1 25b67 is ambiguous. fatal: ambiguous argument '25b67': unknown revision or path not in the working tree. Use '--' to separate paths from revisions, like this: 'git [...] -- [...]' It can be via build-id or by asking for the full pathname to the DSO, etc. > B) clash among different versions: Of course the different versions of binaries > can be co-exist on the system. Those usually have the same SDTs and same > basename, just different build-ids. Right, in that case the build-id or the full pathname needs to be specified somehow > These issues are not solved by using "%" because it happens among SDTs. > So we need to find another way to distinguish the SDTs. >>From what I've read so far (not much): agreed. - Arnaldo