* [RFC] perf record: missing buildid for callstack modules @ 2016-01-07 21:56 Stephane Eranian 2016-01-07 21:57 ` Andi Kleen 2016-01-07 21:59 ` Arnaldo Carvalho de Melo 0 siblings, 2 replies; 47+ messages in thread From: Stephane Eranian @ 2016-01-07 21:56 UTC (permalink / raw) To: LKML Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Adrian Hunter, ak Hi, Whenever you do: $ perf record -g -a sleep 10 Perf will collect the callstack for each sample. At the end of the run, perf record adds the buildid for all dso with at least one sample. But when it does this, it only looks at the sampled IP and ignore the modules traversed by the callstack. That means that, it is not possible to uniquely identify the modules executed, unless they had at least one IP sample captured. But this is not always the case. How about providing an option to perf record to force collecting buildid for all IPs captured in the callstack? I understand that would cost more at the end of the collection, but this would be beneficial to several monitoring scenarios. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-07 21:56 [RFC] perf record: missing buildid for callstack modules Stephane Eranian @ 2016-01-07 21:57 ` Andi Kleen 2016-01-07 22:00 ` Stephane Eranian 2016-01-07 21:59 ` Arnaldo Carvalho de Melo 1 sibling, 1 reply; 47+ messages in thread From: Andi Kleen @ 2016-01-07 21:57 UTC (permalink / raw) To: Stephane Eranian Cc: LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Adrian Hunter On Thu, Jan 07, 2016 at 01:56:14PM -0800, Stephane Eranian wrote: > Hi, > > Whenever you do: > > $ perf record -g -a sleep 10 > > Perf will collect the callstack for each sample. At the end of the > run, perf record > adds the buildid for all dso with at least one sample. But when it does this, it > only looks at the sampled IP and ignore the modules traversed by the callstack. > That means that, it is not possible to uniquely identify the modules executed, > unless they had at least one IP sample captured. But this is not > always the case. > > How about providing an option to perf record to force collecting > buildid for all IPs > captured in the callstack? I understand that would cost more at the end of the > collection, but this would be beneficial to several monitoring scenarios. If you do that I would rather collect buildid for everything that is synthesized (everything in /proc) -Andi ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-07 21:57 ` Andi Kleen @ 2016-01-07 22:00 ` Stephane Eranian 0 siblings, 0 replies; 47+ messages in thread From: Stephane Eranian @ 2016-01-07 22:00 UTC (permalink / raw) To: Andi Kleen Cc: LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Adrian Hunter On Thu, Jan 7, 2016 at 1:57 PM, Andi Kleen <ak@linux.intel.com> wrote: > On Thu, Jan 07, 2016 at 01:56:14PM -0800, Stephane Eranian wrote: >> Hi, >> >> Whenever you do: >> >> $ perf record -g -a sleep 10 >> >> Perf will collect the callstack for each sample. At the end of the >> run, perf record >> adds the buildid for all dso with at least one sample. But when it does this, it >> only looks at the sampled IP and ignore the modules traversed by the callstack. >> That means that, it is not possible to uniquely identify the modules executed, >> unless they had at least one IP sample captured. But this is not >> always the case. >> >> How about providing an option to perf record to force collecting >> buildid for all IPs >> captured in the callstack? I understand that would cost more at the end of the >> collection, but this would be beneficial to several monitoring scenarios. > > If you do that I would rather collect buildid for everything that is synthesized > (everything in /proc) > But that would be much more than whatever would be captured with hits in callstacks. A valid scenario here is sampling on tracepoints (which are kernel only events) but you want to know where the syscall originated from or which user code was interrupted. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-07 21:56 [RFC] perf record: missing buildid for callstack modules Stephane Eranian 2016-01-07 21:57 ` Andi Kleen @ 2016-01-07 21:59 ` Arnaldo Carvalho de Melo 2016-01-07 22:00 ` Stephane Eranian 1 sibling, 1 reply; 47+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-01-07 21:59 UTC (permalink / raw) To: Stephane Eranian Cc: LKML, Jiri Olsa, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Adrian Hunter, ak Em Thu, Jan 07, 2016 at 01:56:14PM -0800, Stephane Eranian escreveu: > Hi, > > Whenever you do: > > $ perf record -g -a sleep 10 > > Perf will collect the callstack for each sample. At the end of the > run, perf record > adds the buildid for all dso with at least one sample. But when it does this, it > only looks at the sampled IP and ignore the modules traversed by the callstack. > That means that, it is not possible to uniquely identify the modules executed, > unless they had at least one IP sample captured. But this is not > always the case. > > How about providing an option to perf record to force collecting > buildid for all IPs > captured in the callstack? I understand that would cost more at the end of the > collection, but this would be beneficial to several monitoring scenarios. I agree, would consider applying a patch that provides the option but does not do this by default. - Arnaldo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-07 21:59 ` Arnaldo Carvalho de Melo @ 2016-01-07 22:00 ` Stephane Eranian 2016-01-07 22:47 ` Namhyung Kim 0 siblings, 1 reply; 47+ messages in thread From: Stephane Eranian @ 2016-01-07 22:00 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: LKML, Jiri Olsa, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Adrian Hunter, ak On Thu, Jan 7, 2016 at 1:59 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Em Thu, Jan 07, 2016 at 01:56:14PM -0800, Stephane Eranian escreveu: >> Hi, >> >> Whenever you do: >> >> $ perf record -g -a sleep 10 >> >> Perf will collect the callstack for each sample. At the end of the >> run, perf record >> adds the buildid for all dso with at least one sample. But when it does this, it >> only looks at the sampled IP and ignore the modules traversed by the callstack. >> That means that, it is not possible to uniquely identify the modules executed, >> unless they had at least one IP sample captured. But this is not >> always the case. >> >> How about providing an option to perf record to force collecting >> buildid for all IPs >> captured in the callstack? I understand that would cost more at the end of the >> collection, but this would be beneficial to several monitoring scenarios. > > I agree, would consider applying a patch that provides the option but > does not do this by default. > I agree, not the default. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-07 22:00 ` Stephane Eranian @ 2016-01-07 22:47 ` Namhyung Kim 2016-01-07 23:47 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 47+ messages in thread From: Namhyung Kim @ 2016-01-07 22:47 UTC (permalink / raw) To: Stephane Eranian, Arnaldo Carvalho de Melo Cc: LKML, Jiri Olsa, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Adrian Hunter, ak On January 8, 2016 7:00:35 AM GMT+09:00, Stephane Eranian <eranian@google.com> wrote: >On Thu, Jan 7, 2016 at 1:59 PM, Arnaldo Carvalho de Melo ><acme@kernel.org> wrote: >> Em Thu, Jan 07, 2016 at 01:56:14PM -0800, Stephane Eranian escreveu: >>> Hi, >>> >>> Whenever you do: >>> >>> $ perf record -g -a sleep 10 >>> >>> Perf will collect the callstack for each sample. At the end of the >>> run, perf record >>> adds the buildid for all dso with at least one sample. But when it >does this, it >>> only looks at the sampled IP and ignore the modules traversed by the >callstack. >>> That means that, it is not possible to uniquely identify the modules >executed, >>> unless they had at least one IP sample captured. But this is not >>> always the case. >>> >>> How about providing an option to perf record to force collecting >>> buildid for all IPs >>> captured in the callstack? I understand that would cost more at the >end of the >>> collection, but this would be beneficial to several monitoring >scenarios. >> >> I agree, would consider applying a patch that provides the option but >> does not do this by default. >> >I agree, not the default. Hi Stephane, Please see https://lkml.org/lkml/2015/3/22/249 Thanks, Namhyung -- Sent from my Android device with K-9 Mail. Please excuse my brevity. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-07 22:47 ` Namhyung Kim @ 2016-01-07 23:47 ` Arnaldo Carvalho de Melo 2016-01-08 18:01 ` Stephane Eranian 0 siblings, 1 reply; 47+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-01-07 23:47 UTC (permalink / raw) To: Namhyung Kim Cc: Stephane Eranian, LKML, Jiri Olsa, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Adrian Hunter, ak Em Fri, Jan 08, 2016 at 07:47:03AM +0900, Namhyung Kim escreveu: > On January 8, 2016 7:00:35 AM GMT+09:00, Stephane Eranian <eranian@google.com> wrote: > >On Thu, Jan 7, 2016 at 1:59 PM, Arnaldo Carvalho de Melo > ><acme@kernel.org> wrote: > >> Em Thu, Jan 07, 2016 at 01:56:14PM -0800, Stephane Eranian escreveu: > >>> Hi, > >>> > >>> Whenever you do: > >>> > >>> $ perf record -g -a sleep 10 > >>> > >>> Perf will collect the callstack for each sample. At the end of the > >>> run, perf record > >>> adds the buildid for all dso with at least one sample. But when it > >does this, it > >>> only looks at the sampled IP and ignore the modules traversed by the > >callstack. > >>> That means that, it is not possible to uniquely identify the modules > >executed, > >>> unless they had at least one IP sample captured. But this is not > >>> always the case. > >>> > >>> How about providing an option to perf record to force collecting > >>> buildid for all IPs > >>> captured in the callstack? I understand that would cost more at the > >end of the > >>> collection, but this would be beneficial to several monitoring > >scenarios. > >> > >> I agree, would consider applying a patch that provides the option but > >> does not do this by default. > >> > >I agree, not the default. > > Hi Stephane, > > Please see > > https://lkml.org/lkml/2015/3/22/249 Oops, Stephane, please try this, so that we can finally merge it :-\ - Arnaldo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-07 23:47 ` Arnaldo Carvalho de Melo @ 2016-01-08 18:01 ` Stephane Eranian 2016-01-08 18:19 ` Arnaldo Carvalho de Melo 2016-01-09 10:31 ` Namhyung Kim 0 siblings, 2 replies; 47+ messages in thread From: Stephane Eranian @ 2016-01-08 18:01 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Adrian Hunter, ak On Thu, Jan 7, 2016 at 3:47 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Em Fri, Jan 08, 2016 at 07:47:03AM +0900, Namhyung Kim escreveu: >> On January 8, 2016 7:00:35 AM GMT+09:00, Stephane Eranian <eranian@google.com> wrote: >> >On Thu, Jan 7, 2016 at 1:59 PM, Arnaldo Carvalho de Melo >> ><acme@kernel.org> wrote: >> >> Em Thu, Jan 07, 2016 at 01:56:14PM -0800, Stephane Eranian escreveu: >> >>> Hi, >> >>> >> >>> Whenever you do: >> >>> >> >>> $ perf record -g -a sleep 10 >> >>> >> >>> Perf will collect the callstack for each sample. At the end of the >> >>> run, perf record >> >>> adds the buildid for all dso with at least one sample. But when it >> >does this, it >> >>> only looks at the sampled IP and ignore the modules traversed by the >> >callstack. >> >>> That means that, it is not possible to uniquely identify the modules >> >executed, >> >>> unless they had at least one IP sample captured. But this is not >> >>> always the case. >> >>> >> >>> How about providing an option to perf record to force collecting >> >>> buildid for all IPs >> >>> captured in the callstack? I understand that would cost more at the >> >end of the >> >>> collection, but this would be beneficial to several monitoring >> >scenarios. >> >> >> >> I agree, would consider applying a patch that provides the option but >> >> does not do this by default. >> >> >> >I agree, not the default. >> >> Hi Stephane, >> >> Please see >> >> https://lkml.org/lkml/2015/3/22/249 > > > Oops, Stephane, please try this, so that we can finally merge it :-\ > I will try it today. However, I am a bit worried about the performance impact. Unless I am missing something in this approach we may end up looking up N times the same module if it appears in N callstacks. In Andi's suggested approach, there would be only one pass at the beginning (or the end of the run). But you could miss some modules if they are gone by the time you run the pass. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-08 18:01 ` Stephane Eranian @ 2016-01-08 18:19 ` Arnaldo Carvalho de Melo 2016-01-11 17:30 ` Peter Zijlstra 2016-01-09 10:31 ` Namhyung Kim 1 sibling, 1 reply; 47+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-01-08 18:19 UTC (permalink / raw) To: Stephane Eranian Cc: Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Adrian Hunter, ak Em Fri, Jan 08, 2016 at 10:01:24AM -0800, Stephane Eranian escreveu: > On Thu, Jan 7, 2016 at 3:47 PM, Arnaldo Carvalho de Melo > <acme@kernel.org> wrote: > > Em Fri, Jan 08, 2016 at 07:47:03AM +0900, Namhyung Kim escreveu: > >> On January 8, 2016 7:00:35 AM GMT+09:00, Stephane Eranian <eranian@google.com> wrote: > >> >>> How about providing an option to perf record to force collecting > >> >>> buildid for all IPs > >> >>> captured in the callstack? I understand that would cost more at the > >> >end of the > >> >>> collection, but this would be beneficial to several monitoring > >> >scenarios. > >> >> I agree, would consider applying a patch that provides the option but > >> >> does not do this by default. > >> >I agree, not the default. > >> Please see > >> https://lkml.org/lkml/2015/3/22/249 > > Oops, Stephane, please try this, so that we can finally merge it :-\ > I will try it today. However, I am a bit worried about the performance > impact. Unless I am missing something in this approach we may end up > looking up N times the same module if it appears in N callstacks. In > Andi's suggested approach, there would be only one pass at the beginning > (or the end of the run). But you could miss some modules if they are gone > by the time you run the pass. For kernel modules, yeah, since we'll have to synthesize them at session start, we could as well save the buildids at that point, but then this, as well as the saving of buildids for normal DSOs is racy, since we collect it just at the end of the session. We already discussed how to solve it, and it involves extending once more PERF_RECORD_MMAP, so that, when we load a DSO we stash its build-id in a per-DSO data structure in the kernel, then, when generating PERF_RECORD_MMAP3 we put the buildid there, this way if any of those binaries gets replaced while we're recording samples, we would notice, i.e. we wouldn't care that much about the pathname, looking everything by the content based buildid instead. And also for modules, what if a module is loaded during the session? We'll miss, or gets replaced by a newer version? We'll miss as well. We need to hook into module loading/unloading and generate PERF_RECORD_MMAP3 records there as well. Best thing we can do right now (by adding extra events to the 'perf record' command line) is to hook into places to detect these issues and at least warn the user that some of the DSOs with hits got updated during the session. Namhyungs approach is no better than what we do now, i.e. it doesn't detects such flaps, but will be just fine if no updates take place, guaranteeing that later, at analysis time, we pick the right ELF/kallsyms files, keyed by their build ids. - Arnaldo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-08 18:19 ` Arnaldo Carvalho de Melo @ 2016-01-11 17:30 ` Peter Zijlstra 2016-01-11 18:22 ` Arnaldo Carvalho de Melo 2016-01-12 10:39 ` Ingo Molnar 0 siblings, 2 replies; 47+ messages in thread From: Peter Zijlstra @ 2016-01-11 17:30 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Ingo Molnar, Adrian Hunter, ak On Fri, Jan 08, 2016 at 03:19:42PM -0300, Arnaldo Carvalho de Melo wrote: > We already discussed how to solve it, and it involves extending once > more PERF_RECORD_MMAP, so that, when we load a DSO we stash its build-id > in a per-DSO data structure in the kernel, then, when generating > PERF_RECORD_MMAP3 we put the buildid there, this way if any of those > binaries gets replaced while we're recording samples, we would notice, > i.e. we wouldn't care that much about the pathname, looking everything > by the content based buildid instead. Does the kernel even know about the buildid crap? AFAIK the binfmt stuff doesn't know or care about things like that. Heck, we support binfmts that do not even have a buildid. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-11 17:30 ` Peter Zijlstra @ 2016-01-11 18:22 ` Arnaldo Carvalho de Melo 2016-01-11 20:06 ` Stephane Eranian 2016-01-12 10:39 ` Ingo Molnar 1 sibling, 1 reply; 47+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-01-11 18:22 UTC (permalink / raw) To: Peter Zijlstra Cc: Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Ingo Molnar, Adrian Hunter, ak Em Mon, Jan 11, 2016 at 06:30:36PM +0100, Peter Zijlstra escreveu: > On Fri, Jan 08, 2016 at 03:19:42PM -0300, Arnaldo Carvalho de Melo wrote: > > We already discussed how to solve it, and it involves extending once > > more PERF_RECORD_MMAP, so that, when we load a DSO we stash its build-id > > in a per-DSO data structure in the kernel, then, when generating > > PERF_RECORD_MMAP3 we put the buildid there, this way if any of those > > binaries gets replaced while we're recording samples, we would notice, > > i.e. we wouldn't care that much about the pathname, looking everything > > by the content based buildid instead. > Does the kernel even know about the buildid crap? AFAIK the binfmt stuff > doesn't know or care about things like that. Heck, we support binfmts > that do not even have a buildid. Well, we need some cookie like that, build-id or something that allows us to find the right binary for doing symbol resolution, annotation, etc. And we need to do it at mmap time, i.e. we don't know upfront what DSOs and if we do at the end we will incur in things we also don't like: workload wide scanning of used DSOs, and they could've been replaced... If we do it using these build id ELF sections distros have for quite a while or with something else, we'd have to try and see. I.e. the MMAP3 would have whatever is in a content-based pre-calculated unique cookie slot, per DSO. How to populate that slot? Well, at load time we could get that build-id section and put there, for ELF DSOs. Or we could generate that at load time and later recalculate when looking for the DSO to match what is in a perf.data file. - Arnaldo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-11 18:22 ` Arnaldo Carvalho de Melo @ 2016-01-11 20:06 ` Stephane Eranian 0 siblings, 0 replies; 47+ messages in thread From: Stephane Eranian @ 2016-01-11 20:06 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Peter Zijlstra, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Ingo Molnar, Adrian Hunter, ak Hi, On Mon, Jan 11, 2016 at 10:22 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Em Mon, Jan 11, 2016 at 06:30:36PM +0100, Peter Zijlstra escreveu: >> On Fri, Jan 08, 2016 at 03:19:42PM -0300, Arnaldo Carvalho de Melo wrote: >> > We already discussed how to solve it, and it involves extending once >> > more PERF_RECORD_MMAP, so that, when we load a DSO we stash its build-id >> > in a per-DSO data structure in the kernel, then, when generating >> > PERF_RECORD_MMAP3 we put the buildid there, this way if any of those >> > binaries gets replaced while we're recording samples, we would notice, >> > i.e. we wouldn't care that much about the pathname, looking everything >> > by the content based buildid instead. > >> Does the kernel even know about the buildid crap? AFAIK the binfmt stuff >> doesn't know or care about things like that. Heck, we support binfmts >> that do not even have a buildid. > It does not have to right now. > Well, we need some cookie like that, build-id or something that allows > us to find the right binary for doing symbol resolution, annotation, > etc. > Well, you need a cookie that the kernel can have access to quickly and cheaply. That cookie would have to be accessible to the user as well and has to be unique per DSO. So it would have to be in the DSO headers and calculated from known fileds in the DSO header (which fields depends on the DSO binary format, does not have to be ELF). And you'd need a unique cookie -> filepath mapping. So yeah, it would have to be in a MMAP-type record. > And we need to do it at mmap time, i.e. we don't know upfront what DSOs > and if we do at the end we will incur in things we also don't like: > workload wide scanning of used DSOs, and they could've been replaced... > > If we do it using these build id ELF sections distros have for quite a > while or with something else, we'd have to try and see. > > I.e. the MMAP3 would have whatever is in a content-based pre-calculated > unique cookie slot, per DSO. How to populate that slot? Well, at load > time we could get that build-id section and put there, for ELF DSOs. > > Or we could generate that at load time and later recalculate when > looking for the DSO to match what is in a perf.data file. > I can see clearly how to make this work for ELF + buildid. It it much harder for formats without buildids. You'd have to have a rule between the kernel and user tool that controls what fields in the DSO and used to compute the hash. Buildid are at most 20 bytes IIRC. So any hash would have to fit in these. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-11 17:30 ` Peter Zijlstra 2016-01-11 18:22 ` Arnaldo Carvalho de Melo @ 2016-01-12 10:39 ` Ingo Molnar 2016-01-12 11:35 ` Peter Zijlstra 2016-01-12 14:23 ` Arnaldo Carvalho de Melo 1 sibling, 2 replies; 47+ messages in thread From: Ingo Molnar @ 2016-01-12 10:39 UTC (permalink / raw) To: Peter Zijlstra Cc: Arnaldo Carvalho de Melo, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak * Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, Jan 08, 2016 at 03:19:42PM -0300, Arnaldo Carvalho de Melo wrote: > > > We already discussed how to solve it, and it involves extending once more > > PERF_RECORD_MMAP, so that, when we load a DSO we stash its build-id in a > > per-DSO data structure in the kernel, then, when generating PERF_RECORD_MMAP3 > > we put the buildid there, this way if any of those binaries gets replaced > > while we're recording samples, we would notice, i.e. we wouldn't care that > > much about the pathname, looking everything by the content based buildid > > instead. > > Does the kernel even know about the buildid crap? AFAIK the binfmt stuff doesn't > know or care about things like that. Heck, we support binfmts that do not even > have a buildid. The kernel's exec() code does not care about the past, it will execute whatever is fit to execute right now. But perf tooling cares very much: it can lead to subtle bugs and bad data if we display a profile with the wrong DSO or binary. 'Bad' profiles resulting out of binary mismatch can be very convincing and can send developers down the wrong path for hours. I'd expect my tooling to not do that. Path names alone (the thing that exec() cares about) are not unique enough to identify the binary that was profiled. So we need a content hash - hence the build-ID. Can you suggest a better solution than a build-time calculated content hash? As for binary formats that suck and don't allow for a content hash: we do our best, but of course the risk of data mismatch is there. We could perhaps cache the binary inode's mtime field to at least produce a 'profile data is older than binary/DSO modification date!' warning. (Which check won't catch all cases, like cross-system profiling data matches.) Thanks, Ingo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 10:39 ` Ingo Molnar @ 2016-01-12 11:35 ` Peter Zijlstra 2016-01-12 12:18 ` Ingo Molnar ` (2 more replies) 2016-01-12 14:23 ` Arnaldo Carvalho de Melo 1 sibling, 3 replies; 47+ messages in thread From: Peter Zijlstra @ 2016-01-12 11:35 UTC (permalink / raw) To: Ingo Molnar Cc: Arnaldo Carvalho de Melo, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Tue, Jan 12, 2016 at 11:39:43AM +0100, Ingo Molnar wrote: > > Does the kernel even know about the buildid crap? AFAIK the binfmt stuff doesn't > > know or care about things like that. Heck, we support binfmts that do not even > > have a buildid. > > The kernel's exec() code does not care about the past, it will execute whatever is > fit to execute right now. > > But perf tooling cares very much: it can lead to subtle bugs and bad data if we > display a profile with the wrong DSO or binary. 'Bad' profiles resulting out of > binary mismatch can be very convincing and can send developers down the wrong path > for hours. I'd expect my tooling to not do that. Well, it really is rather a rare case, replacing binaries you're profiling. Sure, if it happens (by accident or otherwise) it can be a pain, but the cost of fixing this 'problem' is huge. > Path names alone (the thing that exec() cares about) are not unique enough to > identify the binary that was profiled. So we need a content hash - hence the > build-ID. > > Can you suggest a better solution than a build-time calculated content hash? Not really, but the current 'solution' is a massive pain. The result is that perf-record needs to do a full scan of the recorded data after completion and look for buildids across the system. On my system that pass takes longer than the actual workload (of building a kernel). Furthermore, the resulting data is useless for me. > As for binary formats that suck and don't allow for a content hash: we do our > best, but of course the risk of data mismatch is there. We could perhaps cache the > binary inode's mtime field to at least produce a 'profile data is older than > binary/DSO modification date!' warning. (Which check won't catch all cases, like > cross-system profiling data matches.) So my problem with the kernel side thing is that I fear it will, again, be a partial solution, and we'll still end up scanning the perf-record output, ie. nothing better than we are now. Sure, maybe we can have binfmt_elf read the buildid and cache it someplace, maybe we can even have the other binfmt thingies do something similar (at small cost, we obviously cannot compute hashes over files at exec() time, that would upset people). But what do we do for DSOs, does dlopen() ever end up in the binfmt code? I would think not, I would fully expect the dynamic linker to just mmap() the relevant bits and be done with it. And we cannot, at mmap() time, 'assume' the file is ELF and try prodding into it to find a buildid or whatnot. And all for some weird corner case. ~ Peter, who thinks buildid stuff stinks. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 11:35 ` Peter Zijlstra @ 2016-01-12 12:18 ` Ingo Molnar 2016-01-12 13:40 ` Peter Zijlstra 2016-01-12 13:08 ` One Thousand Gnomes 2016-01-12 14:34 ` Arnaldo Carvalho de Melo 2 siblings, 1 reply; 47+ messages in thread From: Ingo Molnar @ 2016-01-12 12:18 UTC (permalink / raw) To: Peter Zijlstra Cc: Arnaldo Carvalho de Melo, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak * Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, Jan 12, 2016 at 11:39:43AM +0100, Ingo Molnar wrote: > > > Does the kernel even know about the buildid crap? AFAIK the binfmt stuff doesn't > > > know or care about things like that. Heck, we support binfmts that do not even > > > have a buildid. > > > > The kernel's exec() code does not care about the past, it will execute whatever is > > fit to execute right now. > > > > But perf tooling cares very much: it can lead to subtle bugs and bad data if we > > display a profile with the wrong DSO or binary. 'Bad' profiles resulting out of > > binary mismatch can be very convincing and can send developers down the wrong path > > for hours. I'd expect my tooling to not do that. > > Well, it really is rather a rare case, replacing binaries you're > profiling. Sure, if it happens (by accident or otherwise) it can be a > pain, but the cost of fixing this 'problem' is huge. But isn't this the common case for developers, who rebuild their binaries all the time, while profiling them? Looking at the wrong profile without having an indication that it's wrong is a problem. > > Path names alone (the thing that exec() cares about) are not unique enough to > > identify the binary that was profiled. So we need a content hash - hence the > > build-ID. > > > > Can you suggest a better solution than a build-time calculated content hash? > > Not really, but the current 'solution' is a massive pain. The result is that > perf-record needs to do a full scan of the recorded data after completion and > look for buildids across the system. > > On my system that pass takes longer than the actual workload (of building a > kernel). Furthermore, the resulting data is useless for me. Hm, that's a powerful performance argument. Why is it so slow? I'd assume that by default we only need to save the build-ID itself per object - which is like 20 bytes? > > As for binary formats that suck and don't allow for a content hash: we do our > > best, but of course the risk of data mismatch is there. We could perhaps cache > > the binary inode's mtime field to at least produce a 'profile data is older > > than binary/DSO modification date!' warning. (Which check won't catch all > > cases, like cross-system profiling data matches.) > > So my problem with the kernel side thing is that I fear it will, again, be a > partial solution, and we'll still end up scanning the perf-record output, ie. > nothing better than we are now. > > Sure, maybe we can have binfmt_elf read the buildid and cache it someplace, > maybe we can even have the other binfmt thingies do something similar (at small > cost, we obviously cannot compute hashes over files at exec() time, that would > upset people). > > But what do we do for DSOs, does dlopen() ever end up in the binfmt code? I > would think not, I would fully expect the dynamic linker to just mmap() the > relevant bits and be done with it. > > And we cannot, at mmap() time, 'assume' the file is ELF and try prodding into it > to find a buildid or whatnot. > > And all for some weird corner case. So could we perhaps just switch the whole thing over to be mtime based: mtime is pretty indicative of whether a binary is the right one or not. And mtime could be checked at perf report time, not at perf record time: we'd only have to check whether the mtime of the object we read at perf report time is newer than the mtime of the perf.data (the creation of the profile). This does not solve rare corner cases like cross-system profiling, but I think the common case should not be burdened with the overhead of a rare case. Thanks, Ingo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 12:18 ` Ingo Molnar @ 2016-01-12 13:40 ` Peter Zijlstra 2016-01-12 14:38 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2016-01-12 13:40 UTC (permalink / raw) To: Ingo Molnar Cc: Arnaldo Carvalho de Melo, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Tue, Jan 12, 2016 at 01:18:05PM +0100, Ingo Molnar wrote: > > Well, it really is rather a rare case, replacing binaries you're > > profiling. Sure, if it happens (by accident or otherwise) it can be a > > pain, but the cost of fixing this 'problem' is huge. > > But isn't this the common case for developers, who rebuild their binaries all the > time, while profiling them? Looking at the wrong profile without having an > indication that it's wrong is a problem. I tend to: 1: edit code compile code (perf) run code inspect profile goto 1 which does not have this problem at all. Only if you want to inspect 'old' profiles does this problem occur. > > On my system that pass takes longer than the actual workload (of building a > > kernel). Furthermore, the resulting data is useless for me. > > Hm, that's a powerful performance argument. Why is it so slow? I'd assume that by > default we only need to save the build-ID itself per object - which is like 20 > bytes? There is no buildid in the recorded data, I think it looks at every MMAP record, finds the associated file, extracts the buildid and copies crap into .debug directory. Also, just parsing the gigabytes of data that comes out of perf-record takes significant time, let alone poking around the filesystem and copying files around. Furthermore, I have 40 CPUs generating data, while only a single one is doing all this post-processing. # rm -rf ~/.debug/ # make O=defconfig-build/ clean; perf record make O=defconfig-build/ -j80 -s # ls -lah perf.data -rw------- 1 root root 2.7G Jan 12 14:18 perf.data # du -sh ~/.debug/ 240M /root/.debug/ That's a lot of pointless work. > > And all for some weird corner case. > > So could we perhaps just switch the whole thing over to be mtime based: mtime is > pretty indicative of whether a binary is the right one or not. > > And mtime could be checked at perf report time, not at perf record time: we'd only > have to check whether the mtime of the object we read at perf report time is newer > than the mtime of the perf.data (the creation of the profile). > > This does not solve rare corner cases like cross-system profiling, but I think the > common case should not be burdened with the overhead of a rare case. That might work, we have easy access to the mtime data for any file. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 13:40 ` Peter Zijlstra @ 2016-01-12 14:38 ` Arnaldo Carvalho de Melo 2016-01-12 15:34 ` Peter Zijlstra 0 siblings, 1 reply; 47+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-01-12 14:38 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak Em Tue, Jan 12, 2016 at 02:40:12PM +0100, Peter Zijlstra escreveu: > On Tue, Jan 12, 2016 at 01:18:05PM +0100, Ingo Molnar wrote: > > > Well, it really is rather a rare case, replacing binaries you're > > > profiling. Sure, if it happens (by accident or otherwise) it can be a > > > pain, but the cost of fixing this 'problem' is huge. > > > > But isn't this the common case for developers, who rebuild their binaries all the > > time, while profiling them? Looking at the wrong profile without having an > > indication that it's wrong is a problem. > > I tend to: > > 1: > edit code > compile code > (perf) run code > inspect profile > goto 1 > > which does not have this problem at all. Only if you want to inspect > 'old' profiles does this problem occur. > > > > On my system that pass takes longer than the actual workload (of building a > > > kernel). Furthermore, the resulting data is useless for me. > > > > Hm, that's a powerful performance argument. Why is it so slow? I'd assume that by > > default we only need to save the build-ID itself per object - which is like 20 > > bytes? > > There is no buildid in the recorded data, I think it looks at every MMAP > record, finds the associated file, extracts the buildid and copies crap > into .debug directory. $ perf record -h build Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -B, --no-buildid do not collect buildids in perf.data -N, --no-buildid-cache do not update the buildid cache [acme@zoo linux]$ > Also, just parsing the gigabytes of data that comes out of perf-record > takes significant time, let alone poking around the filesystem and Right, that is what we would elliminate with stashing the content-based cookie into a PERF_RECORD_MMAP3 record. > copying files around. > > Furthermore, I have 40 CPUs generating data, while only a single one is > doing all this post-processing. > > # rm -rf ~/.debug/ > # make O=defconfig-build/ clean; perf record make O=defconfig-build/ -j80 -s > # ls -lah perf.data > -rw------- 1 root root 2.7G Jan 12 14:18 perf.data > # du -sh ~/.debug/ > 240M /root/.debug/ > > That's a lot of pointless work. Right, for you -B is the only sane way (or doing that in ~/.perfconfig and disabling this for good). BTW, mtime would incur in postprocessing it all. - Arnaldo > > > And all for some weird corner case. > > > > So could we perhaps just switch the whole thing over to be mtime based: mtime is > > pretty indicative of whether a binary is the right one or not. > > > > And mtime could be checked at perf report time, not at perf record time: we'd only > > have to check whether the mtime of the object we read at perf report time is newer > > than the mtime of the perf.data (the creation of the profile). > > > > This does not solve rare corner cases like cross-system profiling, but I think the > > common case should not be burdened with the overhead of a rare case. > > That might work, we have easy access to the mtime data for any file. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 14:38 ` Arnaldo Carvalho de Melo @ 2016-01-12 15:34 ` Peter Zijlstra 2016-01-12 15:48 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2016-01-12 15:34 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Ingo Molnar, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Tue, Jan 12, 2016 at 11:38:05AM -0300, Arnaldo Carvalho de Melo wrote: > > Also, just parsing the gigabytes of data that comes out of perf-record > > takes significant time, let alone poking around the filesystem and > > Right, that is what we would elliminate with stashing the content-based > cookie into a PERF_RECORD_MMAP3 record. Again, how would you go about getting that cookie for a DSO? The whole kernel isn't involved with dlopen(), all it sees is a mmap(PROT_EXEC). > BTW, mtime would incur in postprocessing it all. mtime can still warn you if things are non-matching at report time without this post-processing, and thereby solves the problem of staring at broken/wrong data. It doesn't get you right data, but knowing your data is broken allows you to manually do things 'right'. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 15:34 ` Peter Zijlstra @ 2016-01-12 15:48 ` Arnaldo Carvalho de Melo 2016-01-12 16:10 ` Peter Zijlstra 2016-01-12 21:02 ` Stephane Eranian 0 siblings, 2 replies; 47+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-01-12 15:48 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak Em Tue, Jan 12, 2016 at 04:34:40PM +0100, Peter Zijlstra escreveu: > On Tue, Jan 12, 2016 at 11:38:05AM -0300, Arnaldo Carvalho de Melo wrote: > > > Also, just parsing the gigabytes of data that comes out of perf-record > > > takes significant time, let alone poking around the filesystem and > > > > Right, that is what we would elliminate with stashing the content-based > > cookie into a PERF_RECORD_MMAP3 record. > > Again, how would you go about getting that cookie for a DSO? The whole > kernel isn't involved with dlopen(), all it sees is a mmap(PROT_EXEC). > > > BTW, mtime would incur in postprocessing it all. > > mtime can still warn you if things are non-matching at report time > without this post-processing, and thereby solves the problem of staring > at broken/wrong data. How will we collect the mtime for the DSOs in PERF_RECORD_MMAP records if we don't look at those records? What mtime are you talking about? > It doesn't get you right data, but knowing your data is broken allows > you to manually do things 'right'. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 15:48 ` Arnaldo Carvalho de Melo @ 2016-01-12 16:10 ` Peter Zijlstra 2016-01-12 16:27 ` Peter Zijlstra 2016-01-12 21:02 ` Stephane Eranian 1 sibling, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2016-01-12 16:10 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Ingo Molnar, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Tue, Jan 12, 2016 at 12:48:12PM -0300, Arnaldo Carvalho de Melo wrote: > Em Tue, Jan 12, 2016 at 04:34:40PM +0100, Peter Zijlstra escreveu: > > On Tue, Jan 12, 2016 at 11:38:05AM -0300, Arnaldo Carvalho de Melo wrote: > > > > Also, just parsing the gigabytes of data that comes out of perf-record > > > > takes significant time, let alone poking around the filesystem and > > > > > > Right, that is what we would elliminate with stashing the content-based > > > cookie into a PERF_RECORD_MMAP3 record. > > > > Again, how would you go about getting that cookie for a DSO? The whole > > kernel isn't involved with dlopen(), all it sees is a mmap(PROT_EXEC). > > > > > BTW, mtime would incur in postprocessing it all. > > > > mtime can still warn you if things are non-matching at report time > > without this post-processing, and thereby solves the problem of staring > > at broken/wrong data. > > How will we collect the mtime for the DSOs in PERF_RECORD_MMAP records > if we don't look at those records? Kernel side, the vma has a vm_file member. So vma->vm_file->f_inode->i_mtime will get us that for all file based mmap()s. (or maybe ctime, not sure how popular it is these days to switch off mtime accounting). > What mtime are you talking about? perf-report looks at those records, it can compare the recorded mtime vs the currently observed mtime and complain if non-matching. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 16:10 ` Peter Zijlstra @ 2016-01-12 16:27 ` Peter Zijlstra 2016-01-12 17:15 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2016-01-12 16:27 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Ingo Molnar, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Tue, Jan 12, 2016 at 05:10:27PM +0100, Peter Zijlstra wrote: > > How will we collect the mtime for the DSOs in PERF_RECORD_MMAP records > > if we don't look at those records? > > Kernel side, the vma has a vm_file member. So > vma->vm_file->f_inode->i_mtime will get us that for all file based > mmap()s. > > (or maybe ctime, not sure how popular it is these days to switch off > mtime accounting). And this would (obviously) be a good time to see what else we'd like to stuff in there. --- include/uapi/linux/perf_event.h | 26 +++++++++++++++++++++++++- kernel/events/core.c | 19 +++++++++++++++---- 2 files changed, 40 insertions(+), 5 deletions(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 1afe9623c1a7..a0f43140a0e2 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -340,7 +340,8 @@ struct perf_event_attr { comm_exec : 1, /* flag comm events that are due to an exec */ use_clockid : 1, /* use @clockid for time fields */ context_switch : 1, /* context switch data */ - __reserved_1 : 37; + mmap3 : 1, /* include mmap with mtime */ + __reserved_1 : 36; union { __u32 wakeup_events; /* wakeup every n events */ @@ -856,6 +857,29 @@ enum perf_event_type { */ PERF_RECORD_SWITCH_CPU_WIDE = 15, + /* + * The MMAP3 records are an augmented version of MMAP2, they add + * mtime. + * + * struct { + * struct perf_event_header header; + * + * u32 pid, tid; + * u64 addr; + * u64 len; + * u64 pgoff; + * u32 maj; + * u32 min; + * u64 ino; + * u64 ino_generation; + * u32 prot, flags; + * u64 mtime; + * char filename[]; + * struct sample_id sample_id; + * } + */ + PERF_RECORD_MMAP3 = 16, + PERF_RECORD_MAX, /* non-ABI */ }; diff --git a/kernel/events/core.c b/kernel/events/core.c index bf8244190d0f..d400da14b923 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5661,7 +5661,7 @@ perf_event_aux(perf_event_aux_output_cb output, void *data, /* * task tracking -- fork/exit * - * enabled by: attr.comm | attr.mmap | attr.mmap2 | attr.mmap_data | attr.task + * enabled by: attr.comm | attr.mmap | attr.mmap2 | attr.mmap3 | attr.mmap_data | attr.task */ struct perf_task_event { @@ -5682,8 +5682,8 @@ struct perf_task_event { static int perf_event_task_match(struct perf_event *event) { return event->attr.comm || event->attr.mmap || - event->attr.mmap2 || event->attr.mmap_data || - event->attr.task; + event->attr.mmap2 || event->attr.mmap3 || + event->attr.mmap_data || event->attr.task; } static void perf_event_task_output(struct perf_event *event, @@ -5872,6 +5872,7 @@ struct perf_mmap_event { u64 ino; u64 ino_generation; u32 prot, flags; + u64 mtime; struct { struct perf_event_header header; @@ -5892,7 +5893,7 @@ static int perf_event_mmap_match(struct perf_event *event, int executable = vma->vm_flags & VM_EXEC; return (!executable && event->attr.mmap_data) || - (executable && (event->attr.mmap || event->attr.mmap2)); + (executable && (event->attr.mmap || event->attr.mmap2 || event-attr.mmap3)); } static void perf_event_mmap_output(struct perf_event *event, @@ -5916,6 +5917,9 @@ static void perf_event_mmap_output(struct perf_event *event, mmap_event->event_id.header.size += sizeof(mmap_event->prot); mmap_event->event_id.header.size += sizeof(mmap_event->flags); } + if (event->attr.mmap3) { + mmap_event->event_id.header.size += sizeof(mmap_event->mtime); + } perf_event_header__init_id(&mmap_event->event_id.header, &sample, event); ret = perf_output_begin(&handle, event, @@ -5936,6 +5940,9 @@ static void perf_event_mmap_output(struct perf_event *event, perf_output_put(&handle, mmap_event->prot); perf_output_put(&handle, mmap_event->flags); } + if (event->attr.mmap3) { + perf_output_put(&handle, mmap_event->mtime); + } __output_copy(&handle, mmap_event->file_name, mmap_event->file_size); @@ -5954,6 +5961,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) int maj = 0, min = 0; u64 ino = 0, gen = 0; u32 prot = 0, flags = 0; + u64 mtime = 0; unsigned int size; char tmp[16]; char *buf = NULL; @@ -5984,6 +5992,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) gen = inode->i_generation; maj = MAJOR(dev); min = MINOR(dev); + mtime = timespec_to_ns(inode->i_mtime); if (vma->vm_flags & VM_READ) prot |= PROT_READ; @@ -6054,6 +6063,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) mmap_event->ino_generation = gen; mmap_event->prot = prot; mmap_event->flags = flags; + mmap_event->mtime = mtime; if (!(vma->vm_flags & VM_EXEC)) mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA; @@ -6096,6 +6106,7 @@ void perf_event_mmap(struct vm_area_struct *vma) /* .ino_generation (attr_mmap2 only) */ /* .prot (attr_mmap2 only) */ /* .flags (attr_mmap2 only) */ + /* .mtime (attr_mmap3 only) */ }; perf_event_mmap_event(&mmap_event); ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 16:27 ` Peter Zijlstra @ 2016-01-12 17:15 ` Arnaldo Carvalho de Melo 2016-01-13 10:21 ` Ingo Molnar 0 siblings, 1 reply; 47+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-01-12 17:15 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak Em Tue, Jan 12, 2016 at 05:27:19PM +0100, Peter Zijlstra escreveu: > On Tue, Jan 12, 2016 at 05:10:27PM +0100, Peter Zijlstra wrote: > > > How will we collect the mtime for the DSOs in PERF_RECORD_MMAP records > > > if we don't look at those records? > > Kernel side, the vma has a vm_file member. So Gotcha, via PERF_RECORD_MMAP3, using something we have (mtime) that while not good as a content-based cookie (that we don't easily have at PERF_RECORD_MMAP3 generation time) will allow us to do minimal detection of mismatched DSOs at analysis time. > > vma->vm_file->f_inode->i_mtime will get us that for all file based > > mmap()s. > > (or maybe ctime, not sure how popular it is these days to switch off > > mtime accounting). > And this would (obviously) be a good time to see what else we'd like to > stuff in there. Perhaps a version number? Else we'll be using those reserved bits again when we decide to introduce MMAP4 ;-\ But I still think we should have a content-based cookie to be delivered via that record, so at a minimum leave a cookie there starting with a size, that, for now, would be zero. If present, that would get into the current build-id infrastructure existing in the tooling side, something like: + /* + * The MMAP3 records are an augmented version of MMAP2, they add + * mtime and an optional content-based cookie, if available. + * struct perf_event_header header; + * + * u32 pid, tid; + * u64 addr; + * u64 len; + * u64 pgoff; + * u32 maj; + * u32 min; + * u64 ino; + * u64 ino_generation; + * u32 prot, flags; + * u64 mtime; + * u32 cookie_len + * char cookie[2+]; + * char filename[]; + * struct sample_id sample_id; Then someone (I want to do this, after processing more stuff from a neverending backlog) should try to experiment with the varios ELF loaders, etc to, in the process of loading, set that content-based cookie somehow (ioctl? prctl? whatever). - Arnaldo > --- > include/uapi/linux/perf_event.h | 26 +++++++++++++++++++++++++- > kernel/events/core.c | 19 +++++++++++++++---- > 2 files changed, 40 insertions(+), 5 deletions(-) > > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h > index 1afe9623c1a7..a0f43140a0e2 100644 > --- a/include/uapi/linux/perf_event.h > +++ b/include/uapi/linux/perf_event.h > @@ -340,7 +340,8 @@ struct perf_event_attr { > comm_exec : 1, /* flag comm events that are due to an exec */ > use_clockid : 1, /* use @clockid for time fields */ > context_switch : 1, /* context switch data */ > - __reserved_1 : 37; > + mmap3 : 1, /* include mmap with mtime */ > + __reserved_1 : 36; > > union { > __u32 wakeup_events; /* wakeup every n events */ > @@ -856,6 +857,29 @@ enum perf_event_type { > */ > PERF_RECORD_SWITCH_CPU_WIDE = 15, > > + /* > + * The MMAP3 records are an augmented version of MMAP2, they add > + * mtime. > + * > + * struct { > + * struct perf_event_header header; > + * > + * u32 pid, tid; > + * u64 addr; > + * u64 len; > + * u64 pgoff; > + * u32 maj; > + * u32 min; > + * u64 ino; > + * u64 ino_generation; > + * u32 prot, flags; > + * u64 mtime; > + * char filename[]; > + * struct sample_id sample_id; > + * } > + */ > + PERF_RECORD_MMAP3 = 16, > + > PERF_RECORD_MAX, /* non-ABI */ > }; > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index bf8244190d0f..d400da14b923 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -5661,7 +5661,7 @@ perf_event_aux(perf_event_aux_output_cb output, void *data, > /* > * task tracking -- fork/exit > * > - * enabled by: attr.comm | attr.mmap | attr.mmap2 | attr.mmap_data | attr.task > + * enabled by: attr.comm | attr.mmap | attr.mmap2 | attr.mmap3 | attr.mmap_data | attr.task > */ > > struct perf_task_event { > @@ -5682,8 +5682,8 @@ struct perf_task_event { > static int perf_event_task_match(struct perf_event *event) > { > return event->attr.comm || event->attr.mmap || > - event->attr.mmap2 || event->attr.mmap_data || > - event->attr.task; > + event->attr.mmap2 || event->attr.mmap3 || > + event->attr.mmap_data || event->attr.task; > } > > static void perf_event_task_output(struct perf_event *event, > @@ -5872,6 +5872,7 @@ struct perf_mmap_event { > u64 ino; > u64 ino_generation; > u32 prot, flags; > + u64 mtime; > > struct { > struct perf_event_header header; > @@ -5892,7 +5893,7 @@ static int perf_event_mmap_match(struct perf_event *event, > int executable = vma->vm_flags & VM_EXEC; > > return (!executable && event->attr.mmap_data) || > - (executable && (event->attr.mmap || event->attr.mmap2)); > + (executable && (event->attr.mmap || event->attr.mmap2 || event-attr.mmap3)); > } > > static void perf_event_mmap_output(struct perf_event *event, > @@ -5916,6 +5917,9 @@ static void perf_event_mmap_output(struct perf_event *event, > mmap_event->event_id.header.size += sizeof(mmap_event->prot); > mmap_event->event_id.header.size += sizeof(mmap_event->flags); > } > + if (event->attr.mmap3) { > + mmap_event->event_id.header.size += sizeof(mmap_event->mtime); > + } > > perf_event_header__init_id(&mmap_event->event_id.header, &sample, event); > ret = perf_output_begin(&handle, event, > @@ -5936,6 +5940,9 @@ static void perf_event_mmap_output(struct perf_event *event, > perf_output_put(&handle, mmap_event->prot); > perf_output_put(&handle, mmap_event->flags); > } > + if (event->attr.mmap3) { > + perf_output_put(&handle, mmap_event->mtime); > + } > > __output_copy(&handle, mmap_event->file_name, > mmap_event->file_size); > @@ -5954,6 +5961,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) > int maj = 0, min = 0; > u64 ino = 0, gen = 0; > u32 prot = 0, flags = 0; > + u64 mtime = 0; > unsigned int size; > char tmp[16]; > char *buf = NULL; > @@ -5984,6 +5992,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) > gen = inode->i_generation; > maj = MAJOR(dev); > min = MINOR(dev); > + mtime = timespec_to_ns(inode->i_mtime); > > if (vma->vm_flags & VM_READ) > prot |= PROT_READ; > @@ -6054,6 +6063,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) > mmap_event->ino_generation = gen; > mmap_event->prot = prot; > mmap_event->flags = flags; > + mmap_event->mtime = mtime; > > if (!(vma->vm_flags & VM_EXEC)) > mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA; > @@ -6096,6 +6106,7 @@ void perf_event_mmap(struct vm_area_struct *vma) > /* .ino_generation (attr_mmap2 only) */ > /* .prot (attr_mmap2 only) */ > /* .flags (attr_mmap2 only) */ > + /* .mtime (attr_mmap3 only) */ > }; > > perf_event_mmap_event(&mmap_event); ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 17:15 ` Arnaldo Carvalho de Melo @ 2016-01-13 10:21 ` Ingo Molnar 2016-01-13 12:40 ` Peter Zijlstra 0 siblings, 1 reply; 47+ messages in thread From: Ingo Molnar @ 2016-01-13 10:21 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Peter Zijlstra, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > And this would (obviously) be a good time to see what else we'd like to > > stuff in there. > > Perhaps a version number? Else we'll be using those reserved bits again > when we decide to introduce MMAP4 ;-\ No version numbers please. Cannot we have a size field and be done with it? The size is the 'version', if we only ever expand the record. (which is the typical case) Thanks, Ingo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-13 10:21 ` Ingo Molnar @ 2016-01-13 12:40 ` Peter Zijlstra 2016-01-14 11:27 ` Ingo Molnar 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2016-01-13 12:40 UTC (permalink / raw) To: Ingo Molnar Cc: Arnaldo Carvalho de Melo, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Wed, Jan 13, 2016 at 11:21:07AM +0100, Ingo Molnar wrote: > > * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > > > And this would (obviously) be a good time to see what else we'd like to > > > stuff in there. > > > > Perhaps a version number? Else we'll be using those reserved bits again > > when we decide to introduce MMAP4 ;-\ > > No version numbers please. Cannot we have a size field and be done with it? The > size is the 'version', if we only ever expand the record. (which is the typical > case) The current problem with that is that we use the 'remaining' size as the length field for a string (with the PERF_RECORD_MMAP* records). We could of course fix that no problem. --- include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++++++- kernel/events/core.c | 35 ++++++++++++++++++++++++----------- 2 files changed, 50 insertions(+), 12 deletions(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 1afe9623c1a7..b7b673387581 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -340,7 +340,8 @@ struct perf_event_attr { comm_exec : 1, /* flag comm events that are due to an exec */ use_clockid : 1, /* use @clockid for time fields */ context_switch : 1, /* context switch data */ - __reserved_1 : 37; + mmap3 : 1, /* include mmap with mtime */ + __reserved_1 : 36; union { __u32 wakeup_events; /* wakeup every n events */ @@ -856,6 +857,30 @@ enum perf_event_type { */ PERF_RECORD_SWITCH_CPU_WIDE = 15, + /* + * The MMAP3 records are an augmented version of MMAP2, they add + * mtime and filename_len, allowing for size based extensions. + * + * struct { + * struct perf_event_header header; + * + * u32 pid, tid; + * u64 addr; + * u64 len; + * u64 pgoff; + * u32 maj; + * u32 min; + * u64 ino; + * u64 ino_generation; + * u32 prot, flags; + * u64 mtime; + * u32 filename_len; + * char filename[2+]; + * struct sample_id sample_id; + * } + */ + PERF_RECORD_MMAP3 = 16, + PERF_RECORD_MAX, /* non-ABI */ }; diff --git a/kernel/events/core.c b/kernel/events/core.c index bf8244190d0f..1cf15793f96c 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5661,7 +5661,7 @@ perf_event_aux(perf_event_aux_output_cb output, void *data, /* * task tracking -- fork/exit * - * enabled by: attr.comm | attr.mmap | attr.mmap2 | attr.mmap_data | attr.task + * enabled by: attr.comm | attr.mmap | attr.mmap2 | attr.mmap3 | attr.mmap_data | attr.task */ struct perf_task_event { @@ -5682,8 +5682,8 @@ struct perf_task_event { static int perf_event_task_match(struct perf_event *event) { return event->attr.comm || event->attr.mmap || - event->attr.mmap2 || event->attr.mmap_data || - event->attr.task; + event->attr.mmap2 || event->attr.mmap3 || + event->attr.mmap_data || event->attr.task; } static void perf_event_task_output(struct perf_event *event, @@ -5867,11 +5867,12 @@ struct perf_mmap_event { struct vm_area_struct *vma; const char *file_name; - int file_size; + u32 file_size; int maj, min; u64 ino; u64 ino_generation; u32 prot, flags; + u64 mtime; struct { struct perf_event_header header; @@ -5892,11 +5893,10 @@ static int perf_event_mmap_match(struct perf_event *event, int executable = vma->vm_flags & VM_EXEC; return (!executable && event->attr.mmap_data) || - (executable && (event->attr.mmap || event->attr.mmap2)); + (executable && (event->attr.mmap || event->attr.mmap2 || event->attr.mmap3)); } -static void perf_event_mmap_output(struct perf_event *event, - void *data) +static void perf_event_mmap_output(struct perf_event *event, void *data) { struct perf_mmap_event *mmap_event = data; struct perf_output_handle handle; @@ -5907,7 +5907,7 @@ static void perf_event_mmap_output(struct perf_event *event, if (!perf_event_mmap_match(event, data)) return; - if (event->attr.mmap2) { + if (event->attr.mmap2 || event->attr.mmap3) { mmap_event->event_id.header.type = PERF_RECORD_MMAP2; mmap_event->event_id.header.size += sizeof(mmap_event->maj); mmap_event->event_id.header.size += sizeof(mmap_event->min); @@ -5916,6 +5916,11 @@ static void perf_event_mmap_output(struct perf_event *event, mmap_event->event_id.header.size += sizeof(mmap_event->prot); mmap_event->event_id.header.size += sizeof(mmap_event->flags); } + if (event->attr.mmap3) { + mmap_event->event_id.header.type = PERF_RECORD_MMAP3; + mmap_event->event_id.header.size += sizeof(mmap_event->mtime); + mmap_event->event_id.header.size += sizeof(mmap_event->file_size); + } perf_event_header__init_id(&mmap_event->event_id.header, &sample, event); ret = perf_output_begin(&handle, event, @@ -5928,7 +5933,7 @@ static void perf_event_mmap_output(struct perf_event *event, perf_output_put(&handle, mmap_event->event_id); - if (event->attr.mmap2) { + if (event->attr.mmap2 || event->attr.mmap3) { perf_output_put(&handle, mmap_event->maj); perf_output_put(&handle, mmap_event->min); perf_output_put(&handle, mmap_event->ino); @@ -5936,6 +5941,10 @@ static void perf_event_mmap_output(struct perf_event *event, perf_output_put(&handle, mmap_event->prot); perf_output_put(&handle, mmap_event->flags); } + if (event->attr.mmap3) { + perf_output_put(&handle, mmap_event->mtime); + perf_output_put(&handle, mmap_event->file_size); + } __output_copy(&handle, mmap_event->file_name, mmap_event->file_size); @@ -5954,8 +5963,9 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) int maj = 0, min = 0; u64 ino = 0, gen = 0; u32 prot = 0, flags = 0; + u64 mtime = 0; unsigned int size; - char tmp[16]; + char tmp[18]; char *buf = NULL; char *name; @@ -5984,6 +5994,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) gen = inode->i_generation; maj = MAJOR(dev); min = MINOR(dev); + mtime = timespec_to_ns(&inode->i_mtime); if (vma->vm_flags & VM_READ) prot |= PROT_READ; @@ -6043,7 +6054,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) * zero'd out to avoid leaking random bits to userspace. */ size = strlen(name)+1; - while (!IS_ALIGNED(size, sizeof(u64))) + while (!IS_ALIGNED(2+size, sizeof(u64))) name[size++] = '\0'; mmap_event->file_name = name; @@ -6054,6 +6065,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) mmap_event->ino_generation = gen; mmap_event->prot = prot; mmap_event->flags = flags; + mmap_event->mtime = mtime; if (!(vma->vm_flags & VM_EXEC)) mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA; @@ -6096,6 +6108,7 @@ void perf_event_mmap(struct vm_area_struct *vma) /* .ino_generation (attr_mmap2 only) */ /* .prot (attr_mmap2 only) */ /* .flags (attr_mmap2 only) */ + /* .mtime (attr_mmap3 only) */ }; perf_event_mmap_event(&mmap_event); ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-13 12:40 ` Peter Zijlstra @ 2016-01-14 11:27 ` Ingo Molnar 2016-01-14 11:36 ` Peter Zijlstra 0 siblings, 1 reply; 47+ messages in thread From: Ingo Molnar @ 2016-01-14 11:27 UTC (permalink / raw) To: Peter Zijlstra Cc: Arnaldo Carvalho de Melo, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak * Peter Zijlstra <peterz@infradead.org> wrote: > The current problem with that is that we use the 'remaining' size as > the length field for a string (with the PERF_RECORD_MMAP* records). > > We could of course fix that no problem. > > > --- > include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++++++- > kernel/events/core.c | 35 ++++++++++++++++++++++++----------- > 2 files changed, 50 insertions(+), 12 deletions(-) > > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h > index 1afe9623c1a7..b7b673387581 100644 > --- a/include/uapi/linux/perf_event.h > +++ b/include/uapi/linux/perf_event.h > @@ -340,7 +340,8 @@ struct perf_event_attr { > comm_exec : 1, /* flag comm events that are due to an exec */ > use_clockid : 1, /* use @clockid for time fields */ > context_switch : 1, /* context switch data */ > - __reserved_1 : 37; > + mmap3 : 1, /* include mmap with mtime */ > + __reserved_1 : 36; > > union { > __u32 wakeup_events; /* wakeup every n events */ > @@ -856,6 +857,30 @@ enum perf_event_type { > */ > PERF_RECORD_SWITCH_CPU_WIDE = 15, > > + /* > + * The MMAP3 records are an augmented version of MMAP2, they add > + * mtime and filename_len, allowing for size based extensions. > + * > + * struct { > + * struct perf_event_header header; > + * > + * u32 pid, tid; > + * u64 addr; > + * u64 len; > + * u64 pgoff; > + * u32 maj; > + * u32 min; > + * u64 ino; > + * u64 ino_generation; > + * u32 prot, flags; > + * u64 mtime; > + * u32 filename_len; > + * char filename[2+]; > + * struct sample_id sample_id; > + * } > + */ > + PERF_RECORD_MMAP3 = 16, > + > PERF_RECORD_MAX, /* non-ABI */ > }; Yeah, very nice! And this means v3 should be the last ever version - all future extensions can happen via the length field. Acked-by: Ingo Molnar <mingo@kernel.org> Thanks, Ingo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-14 11:27 ` Ingo Molnar @ 2016-01-14 11:36 ` Peter Zijlstra 2016-01-15 1:59 ` Stephane Eranian 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2016-01-14 11:36 UTC (permalink / raw) To: Ingo Molnar Cc: Arnaldo Carvalho de Melo, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Thu, Jan 14, 2016 at 12:27:34PM +0100, Ingo Molnar wrote: > > + * u32 filename_len; > > + * char filename[2+]; > Acked-by: Ingo Molnar <mingo@kernel.org> except of course that sizeof(u32) == 4 :/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-14 11:36 ` Peter Zijlstra @ 2016-01-15 1:59 ` Stephane Eranian 2016-01-15 9:34 ` Peter Zijlstra 0 siblings, 1 reply; 47+ messages in thread From: Stephane Eranian @ 2016-01-15 1:59 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak Peter, On Thu, Jan 14, 2016 at 3:36 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Thu, Jan 14, 2016 at 12:27:34PM +0100, Ingo Molnar wrote: >> > + * u32 filename_len; >> > + * char filename[2+]; > >> Acked-by: Ingo Molnar <mingo@kernel.org> > > except of course that sizeof(u32) == 4 :/ There is no padding here. Are you concerned about running out of bits in filename_len? Any extension possible because header.size - sizeof(mmap3) - filename_len sizing what's after filename, right? ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-15 1:59 ` Stephane Eranian @ 2016-01-15 9:34 ` Peter Zijlstra 2016-01-15 18:58 ` Stephane Eranian 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2016-01-15 9:34 UTC (permalink / raw) To: Stephane Eranian Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Thu, Jan 14, 2016 at 05:59:48PM -0800, Stephane Eranian wrote: > Peter, > > On Thu, Jan 14, 2016 at 3:36 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Thu, Jan 14, 2016 at 12:27:34PM +0100, Ingo Molnar wrote: > >> > + * u32 filename_len; > >> > + * char filename[2+]; > > > >> Acked-by: Ingo Molnar <mingo@kernel.org> > > > > except of course that sizeof(u32) == 4 :/ > There is no padding here. Are you concerned about running out of bits > in filename_len? No, I just made a mess of it :-) filename_len should have been u16 and filename should then be 6+8n in size. > Any extension possible because header.size - sizeof(mmap3) - > filename_len sizing what's after filename, right? Right, current MMAP records use the remaining size as the filename length, but by explicitly specifying that we can add optional fields. These fields must be after filename_len, otherwise you'd not be able to find filename_len and you could not compute the extra size. And given alignment constraints it makes sense to do it after filename[]. So suppose we wanted to also add atime and ctime, we could do. PERF_RECORD_MMAP3 { ... u16 filename_len; char filename[6+8n]; if (extra_size >= 16) { u64 stime; u64 ctime; }; } or something like that. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-15 9:34 ` Peter Zijlstra @ 2016-01-15 18:58 ` Stephane Eranian 2016-01-15 19:49 ` Arnaldo Carvalho de Melo 2016-01-15 21:36 ` Peter Zijlstra 0 siblings, 2 replies; 47+ messages in thread From: Stephane Eranian @ 2016-01-15 18:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Fri, Jan 15, 2016 at 1:34 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Thu, Jan 14, 2016 at 05:59:48PM -0800, Stephane Eranian wrote: >> Peter, >> >> On Thu, Jan 14, 2016 at 3:36 AM, Peter Zijlstra <peterz@infradead.org> wrote: >> > On Thu, Jan 14, 2016 at 12:27:34PM +0100, Ingo Molnar wrote: >> >> > + * u32 filename_len; >> >> > + * char filename[2+]; >> > >> >> Acked-by: Ingo Molnar <mingo@kernel.org> >> > >> > except of course that sizeof(u32) == 4 :/ > >> There is no padding here. Are you concerned about running out of bits >> in filename_len? > > No, I just made a mess of it :-) > > filename_len should have been u16 and filename should then be 6+8n in > size. > why don't you make it more explicit: u16 filename_len u16 extra_len then it would be clear what is what, no field with dual meaning. Now, that assumes that no pathname can be longer than 65535 bytes. >> Any extension possible because header.size - sizeof(mmap3) - >> filename_len sizing what's after filename, right? > > Right, current MMAP records use the remaining size as the filename > length, but by explicitly specifying that we can add optional fields. > > These fields must be after filename_len, otherwise you'd not be able to > find filename_len and you could not compute the extra size. And given > alignment constraints it makes sense to do it after filename[]. > > So suppose we wanted to also add atime and ctime, we could do. > > PERF_RECORD_MMAP3 { > ... > u16 filename_len; > char filename[6+8n]; > > if (extra_size >= 16) { > u64 stime; > u64 ctime; > }; > } > > or something like that. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-15 18:58 ` Stephane Eranian @ 2016-01-15 19:49 ` Arnaldo Carvalho de Melo 2016-01-15 21:49 ` Stephane Eranian 2016-01-15 21:36 ` Peter Zijlstra 1 sibling, 1 reply; 47+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-01-15 19:49 UTC (permalink / raw) To: Stephane Eranian Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak Em Fri, Jan 15, 2016 at 10:58:48AM -0800, Stephane Eranian escreveu: > On Fri, Jan 15, 2016 at 1:34 AM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Thu, Jan 14, 2016 at 05:59:48PM -0800, Stephane Eranian wrote: > >> Peter, > >> > >> On Thu, Jan 14, 2016 at 3:36 AM, Peter Zijlstra <peterz@infradead.org> wrote: > >> > On Thu, Jan 14, 2016 at 12:27:34PM +0100, Ingo Molnar wrote: > >> >> > + * u32 filename_len; > >> >> > + * char filename[2+]; > >> > > >> >> Acked-by: Ingo Molnar <mingo@kernel.org> > >> > > >> > except of course that sizeof(u32) == 4 :/ > > > >> There is no padding here. Are you concerned about running out of bits > >> in filename_len? > > > > No, I just made a mess of it :-) > > > > filename_len should have been u16 and filename should then be 6+8n in > > size. > > > why don't you make it more explicit: > u16 filename_len > u16 extra_len > then it would be clear what is what, no field with dual meaning. > Now, that assumes that no pathname can be longer than 65535 bytes. There is no requirement for the filename to start at a u64 boundary, right? So it can start right after the filename_len, be it u16 or u32, and that extra_len can be computed as Peter described right here: > >> Any extension possible because header.size - sizeof(mmap3) - > >> filename_len sizing what's after filename, right? So, no need to have that extra_len :-) Ah PATH_MAX is 4096, so I guess u16 should be enough, no? - Arnaldo > > Right, current MMAP records use the remaining size as the filename > > length, but by explicitly specifying that we can add optional fields. > > > > These fields must be after filename_len, otherwise you'd not be able to > > find filename_len and you could not compute the extra size. And given > > alignment constraints it makes sense to do it after filename[]. > > > > So suppose we wanted to also add atime and ctime, we could do. > > > > PERF_RECORD_MMAP3 { > > ... > > u16 filename_len; > > char filename[6+8n]; > > > > if (extra_size >= 16) { > > u64 stime; > > u64 ctime; > > }; > > } > > > > or something like that. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-15 19:49 ` Arnaldo Carvalho de Melo @ 2016-01-15 21:49 ` Stephane Eranian 0 siblings, 0 replies; 47+ messages in thread From: Stephane Eranian @ 2016-01-15 21:49 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Fri, Jan 15, 2016 at 11:49 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Em Fri, Jan 15, 2016 at 10:58:48AM -0800, Stephane Eranian escreveu: >> On Fri, Jan 15, 2016 at 1:34 AM, Peter Zijlstra <peterz@infradead.org> wrote: >> > On Thu, Jan 14, 2016 at 05:59:48PM -0800, Stephane Eranian wrote: >> >> Peter, >> >> >> >> On Thu, Jan 14, 2016 at 3:36 AM, Peter Zijlstra <peterz@infradead.org> wrote: >> >> > On Thu, Jan 14, 2016 at 12:27:34PM +0100, Ingo Molnar wrote: >> >> >> > + * u32 filename_len; >> >> >> > + * char filename[2+]; >> >> > >> >> >> Acked-by: Ingo Molnar <mingo@kernel.org> >> >> > >> >> > except of course that sizeof(u32) == 4 :/ >> > >> >> There is no padding here. Are you concerned about running out of bits >> >> in filename_len? >> > >> > No, I just made a mess of it :-) >> > >> > filename_len should have been u16 and filename should then be 6+8n in >> > size. >> > >> why don't you make it more explicit: >> u16 filename_len >> u16 extra_len > >> then it would be clear what is what, no field with dual meaning. >> Now, that assumes that no pathname can be longer than 65535 bytes. > > There is no requirement for the filename to start at a u64 boundary, > right? So it can start right after the filename_len, be it u16 or u32, > and that extra_len can be computed as Peter described right here: > >> >> Any extension possible because header.size - sizeof(mmap3) - >> >> filename_len sizing what's after filename, right? > > So, no need to have that extra_len :-) > I know it can. I just find this cumbersome to have to have to compose fields to get the extra length. That would be extra = header.size - sizeof(header) - sizeof(all_the_data_fields) - filename_len; That's ugly and error prone. I know it's been used for other record types. But maybe we could avoid it here. > Ah PATH_MAX is 4096, so I guess u16 should be enough, no? > > - Arnaldo > >> > Right, current MMAP records use the remaining size as the filename >> > length, but by explicitly specifying that we can add optional fields. >> > >> > These fields must be after filename_len, otherwise you'd not be able to >> > find filename_len and you could not compute the extra size. And given >> > alignment constraints it makes sense to do it after filename[]. >> > >> > So suppose we wanted to also add atime and ctime, we could do. >> > >> > PERF_RECORD_MMAP3 { >> > ... >> > u16 filename_len; >> > char filename[6+8n]; >> > >> > if (extra_size >= 16) { >> > u64 stime; >> > u64 ctime; >> > }; >> > } >> > >> > or something like that. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-15 18:58 ` Stephane Eranian 2016-01-15 19:49 ` Arnaldo Carvalho de Melo @ 2016-01-15 21:36 ` Peter Zijlstra 1 sibling, 0 replies; 47+ messages in thread From: Peter Zijlstra @ 2016-01-15 21:36 UTC (permalink / raw) To: Stephane Eranian Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Fri, Jan 15, 2016 at 10:58:48AM -0800, Stephane Eranian wrote: > then it would be clear what is what, no field with dual meaning. > Now, that assumes that no pathname can be longer than 65535 bytes. include/uapi/linux/limits.h:#define PATH_MAX 4096 /* # chars in a path name including nul */ Also, our RECORDS cannot exceed 64k. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 15:48 ` Arnaldo Carvalho de Melo 2016-01-12 16:10 ` Peter Zijlstra @ 2016-01-12 21:02 ` Stephane Eranian 1 sibling, 0 replies; 47+ messages in thread From: Stephane Eranian @ 2016-01-12 21:02 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Tue, Jan 12, 2016 at 7:48 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Em Tue, Jan 12, 2016 at 04:34:40PM +0100, Peter Zijlstra escreveu: >> On Tue, Jan 12, 2016 at 11:38:05AM -0300, Arnaldo Carvalho de Melo wrote: >> > > Also, just parsing the gigabytes of data that comes out of perf-record >> > > takes significant time, let alone poking around the filesystem and >> > >> > Right, that is what we would elliminate with stashing the content-based >> > cookie into a PERF_RECORD_MMAP3 record. >> >> Again, how would you go about getting that cookie for a DSO? The whole >> kernel isn't involved with dlopen(), all it sees is a mmap(PROT_EXEC). >> >> > BTW, mtime would incur in postprocessing it all. >> >> mtime can still warn you if things are non-matching at report time >> without this post-processing, and thereby solves the problem of staring >> at broken/wrong data. > > How will we collect the mtime for the DSOs in PERF_RECORD_MMAP records > if we don't look at those records? What mtime are you talking about? > I think the post-processing of MMAP could be sped up if they were saved out-of-band inside of with the samples. Isn't that what the auxbuffer allows you to do? >> It doesn't get you right data, but knowing your data is broken allows >> you to manually do things 'right'. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 11:35 ` Peter Zijlstra 2016-01-12 12:18 ` Ingo Molnar @ 2016-01-12 13:08 ` One Thousand Gnomes 2016-01-12 14:34 ` Arnaldo Carvalho de Melo 2 siblings, 0 replies; 47+ messages in thread From: One Thousand Gnomes @ 2016-01-12 13:08 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak > And we cannot, at mmap() time, 'assume' the file is ELF and try prodding > into it to find a buildid or whatnot. And you cannot at any time assume that something like NFS won't change the page contents under you. In fact it's valid (but completely insane) to use MAP_DISCARD to throw away part of your program and page it back in again so that a daemon on the other side of the NFS can change the code you are executing 8) Alan ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 11:35 ` Peter Zijlstra 2016-01-12 12:18 ` Ingo Molnar 2016-01-12 13:08 ` One Thousand Gnomes @ 2016-01-12 14:34 ` Arnaldo Carvalho de Melo 2016-01-12 15:37 ` Peter Zijlstra 2 siblings, 1 reply; 47+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-01-12 14:34 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak Em Tue, Jan 12, 2016 at 12:35:21PM +0100, Peter Zijlstra escreveu: > On Tue, Jan 12, 2016 at 11:39:43AM +0100, Ingo Molnar wrote: > > > Does the kernel even know about the buildid crap? AFAIK the binfmt stuff doesn't > > > know or care about things like that. Heck, we support binfmts that do not even > > > have a buildid. > > > > The kernel's exec() code does not care about the past, it will execute whatever is > > fit to execute right now. > > > > But perf tooling cares very much: it can lead to subtle bugs and bad data if we > > display a profile with the wrong DSO or binary. 'Bad' profiles resulting out of > > binary mismatch can be very convincing and can send developers down the wrong path > > for hours. I'd expect my tooling to not do that. > > Well, it really is rather a rare case, replacing binaries you're > profiling. Sure, if it happens (by accident or otherwise) it can be a > pain, but the cost of fixing this 'problem' is huge. Humm, for most things today, i.e. ELF, most distros (all? maybe this is a gcc switch that is default on, haven't checked) come with such pre-computed cookie its just a way for efficiently passing it to the tooling via a new record. With that, no post processing, etc. But then someone would need to prototype this... [acme@zoo linux]$ perf record -h build Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -B, --no-buildid do not collect buildids in perf.data -N, --no-buildid-cache do not update the buildid cache [acme@zoo linux]$ Have you ever played, when you noticed those overheads, with -N? Or just used the -B big hammer and moved on? - Arnaldo > > Path names alone (the thing that exec() cares about) are not unique enough to > > identify the binary that was profiled. So we need a content hash - hence the > > build-ID. > > > > Can you suggest a better solution than a build-time calculated content hash? > > Not really, but the current 'solution' is a massive pain. The result is > that perf-record needs to do a full scan of the recorded data after > completion and look for buildids across the system. > > On my system that pass takes longer than the actual workload (of > building a kernel). Furthermore, the resulting data is useless for me. > > > As for binary formats that suck and don't allow for a content hash: we do our > > best, but of course the risk of data mismatch is there. We could perhaps cache the > > binary inode's mtime field to at least produce a 'profile data is older than > > binary/DSO modification date!' warning. (Which check won't catch all cases, like > > cross-system profiling data matches.) > > So my problem with the kernel side thing is that I fear it will, again, > be a partial solution, and we'll still end up scanning the perf-record > output, ie. nothing better than we are now. > > Sure, maybe we can have binfmt_elf read the buildid and cache it > someplace, maybe we can even have the other binfmt thingies do something > similar (at small cost, we obviously cannot compute hashes over files at > exec() time, that would upset people). > > But what do we do for DSOs, does dlopen() ever end up in the binfmt > code? I would think not, I would fully expect the dynamic linker to just > mmap() the relevant bits and be done with it. > > And we cannot, at mmap() time, 'assume' the file is ELF and try prodding > into it to find a buildid or whatnot. > > And all for some weird corner case. > > ~ Peter, who thinks buildid stuff stinks. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 14:34 ` Arnaldo Carvalho de Melo @ 2016-01-12 15:37 ` Peter Zijlstra 2016-01-13 10:25 ` Ingo Molnar 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2016-01-12 15:37 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Ingo Molnar, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak On Tue, Jan 12, 2016 at 11:34:54AM -0300, Arnaldo Carvalho de Melo wrote: > Have you ever played, when you noticed those overheads, with -N? Or just > used the -B big hammer and moved on? I yelled on irc, jolsa told me to use -B, I moved on ;-) Not sure -N really buys me anything, its still slower and I really don't need any of this. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 15:37 ` Peter Zijlstra @ 2016-01-13 10:25 ` Ingo Molnar 0 siblings, 0 replies; 47+ messages in thread From: Ingo Molnar @ 2016-01-13 10:25 UTC (permalink / raw) To: Peter Zijlstra Cc: Arnaldo Carvalho de Melo, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak * Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, Jan 12, 2016 at 11:34:54AM -0300, Arnaldo Carvalho de Melo wrote: > > > Have you ever played, when you noticed those overheads, with -N? Or just used > > the -B big hammer and moved on? > > I yelled on irc, jolsa told me to use -B, I moved on ;-) So I think we should grow mtime protection, to not display incorrect data - and then that should become the default. (in addition to per CPU recording threads.) Way too slow recording is something I experience as well - and to have a slow _performance_ profiling tool is pretty ironic! ;-) Thanks, Ingo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 10:39 ` Ingo Molnar 2016-01-12 11:35 ` Peter Zijlstra @ 2016-01-12 14:23 ` Arnaldo Carvalho de Melo 2016-01-13 9:57 ` Ingo Molnar 1 sibling, 1 reply; 47+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-01-12 14:23 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak Em Tue, Jan 12, 2016 at 11:39:43AM +0100, Ingo Molnar escreveu: > But perf tooling cares very much: it can lead to subtle bugs and bad data if we > display a profile with the wrong DSO or binary. 'Bad' profiles resulting out of > binary mismatch can be very convincing and can send developers down the wrong path > for hours. I'd expect my tooling to not do that. > Path names alone (the thing that exec() cares about) are not unique enough to > identify the binary that was profiled. So we need a content hash - hence the > build-ID. > Can you suggest a better solution than a build-time calculated content hash? > As for binary formats that suck and don't allow for a content hash: we do our > best, but of course the risk of data mismatch is there. We could perhaps cache the > binary inode's mtime field to at least produce a 'profile data is older than > binary/DSO modification date!' warning. (Which check won't catch all cases, like > cross-system profiling data matches.) So, we could think of this as: binary formats that want to aid observability tools to: 1) Detect mismatches in contents for DSOs present at recording time to those to be used at analysis time. 2) Find symtabs, DSO binary contents, CFI tables, present in the DSO where samples were taken. Using mtime, as suggested in other messages will help with #1, but not with #2. Checking for inefficiencies in the current approach of right-after-recording post-processing looking for PERF_RECORD_MMAPs, Adrian suggested something here, also disabling the saving into ~/.debug/ will help, collecting numbers would be great. But the mtime thing also requires traversing the whole perf.data contents looking for those paths in PERF_RECORD_MMAP records. - Arnaldo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-12 14:23 ` Arnaldo Carvalho de Melo @ 2016-01-13 9:57 ` Ingo Molnar 2016-01-13 15:27 ` Arnaldo Carvalho de Melo 2016-01-19 14:56 ` Namhyung Kim 0 siblings, 2 replies; 47+ messages in thread From: Ingo Molnar @ 2016-01-13 9:57 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Peter Zijlstra, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Em Tue, Jan 12, 2016 at 11:39:43AM +0100, Ingo Molnar escreveu: > > But perf tooling cares very much: it can lead to subtle bugs and bad data if we > > display a profile with the wrong DSO or binary. 'Bad' profiles resulting out of > > binary mismatch can be very convincing and can send developers down the wrong path > > for hours. I'd expect my tooling to not do that. > > > Path names alone (the thing that exec() cares about) are not unique enough to > > identify the binary that was profiled. So we need a content hash - hence the > > build-ID. > > > Can you suggest a better solution than a build-time calculated content hash? > > > As for binary formats that suck and don't allow for a content hash: we do our > > best, but of course the risk of data mismatch is there. We could perhaps cache the > > binary inode's mtime field to at least produce a 'profile data is older than > > binary/DSO modification date!' warning. (Which check won't catch all cases, like > > cross-system profiling data matches.) > > So, we could think of this as: binary formats that want to aid > observability tools to: > > 1) Detect mismatches in contents for DSOs present at recording time to > those to be used at analysis time. > > 2) Find symtabs, DSO binary contents, CFI tables, present in the DSO > where samples were taken. > > Using mtime, as suggested in other messages will help with #1, but not > with #2. But but ... why is #2 a problem with mtime? If we have an out of date record in the perf.data, then the perf.data is uninteresting in 99% of the usecases! It's out of date, most likely because the binary the developer is working on got rebuilt, or the system got upgraded - in both cases the developer does not care about the old records anymore... What matters is #1, to detect mismatches, to be a reliable tool. Once we've detected that, we can inform the user and our job is mostly done. But reliable != perfect time machine. Really, #2 is a second, third order concern that should never cause slowdowns on the magnitude that Peter is complaining about! I realize that there might be special workflows (such as system-wide monitoring) where collecting at recording time might be useful, but those are not the common case - and they should not slow down the common case. > Checking for inefficiencies in the current approach of > right-after-recording post-processing looking for PERF_RECORD_MMAPs, > Adrian suggested something here, also disabling the saving into > ~/.debug/ will help, collecting numbers would be great. I think Peter mentioned a number: the kernel build time almost _doubles_ with this. That's clearly unacceptable. > But the mtime thing also requires traversing the whole perf.data > contents looking for those paths in PERF_RECORD_MMAP records. But why? Why cannot we do it at perf report time, when we will parse them anyway? Thanks, Ingo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-13 9:57 ` Ingo Molnar @ 2016-01-13 15:27 ` Arnaldo Carvalho de Melo 2016-01-19 14:56 ` Namhyung Kim 1 sibling, 0 replies; 47+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-01-13 15:27 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, Stephane Eranian, Namhyung Kim, LKML, Jiri Olsa, Namhyung Kim, Adrian Hunter, ak Em Wed, Jan 13, 2016 at 10:57:38AM +0100, Ingo Molnar escreveu: > * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > Em Tue, Jan 12, 2016 at 11:39:43AM +0100, Ingo Molnar escreveu: > > > But perf tooling cares very much: it can lead to subtle bugs and bad data if we > > > display a profile with the wrong DSO or binary. 'Bad' profiles resulting out of > > > binary mismatch can be very convincing and can send developers down the wrong path > > > for hours. I'd expect my tooling to not do that. > > > Path names alone (the thing that exec() cares about) are not unique enough to > > > identify the binary that was profiled. So we need a content hash - hence the > > > build-ID. > > > Can you suggest a better solution than a build-time calculated content hash? > > > As for binary formats that suck and don't allow for a content hash: we do our > > > best, but of course the risk of data mismatch is there. We could perhaps cache the > > > binary inode's mtime field to at least produce a 'profile data is older than > > > binary/DSO modification date!' warning. (Which check won't catch all cases, like > > > cross-system profiling data matches.) > > So, we could think of this as: binary formats that want to aid > > observability tools to: > > 1) Detect mismatches in contents for DSOs present at recording time to > > those to be used at analysis time. > > 2) Find symtabs, DSO binary contents, CFI tables, present in the DSO > > where samples were taken. > > Using mtime, as suggested in other messages will help with #1, but not > > with #2. > > But but ... why is #2 a problem with mtime? If we have an out of date record in > the perf.data, then the perf.data is uninteresting in 99% of the usecases! It's > out of date, most likely because the binary the developer is working on got > rebuilt, or the system got upgraded - in both cases the developer does not care > about the old records anymore... Oh, with mtime we'll be able to, with some effort, most of the time, find the right symtab/DSO contents, it is not as good as the content-based build-id, but it improves the current situation, for, to reuse an euphemism, 99% of the cases ;-) It is just a pity that what would arguably be the last step to make content-based DSO identifiers a first class citizen, already available mostly everywhere, i.e. in ELF binaries will have to wait a bit more. With PeterZ's change to make the filename length part of the PERF_RECORD_MMAP3 we at least leave the door open to including that cookie without having to introduce PERF_RECORD_MMAP4 :-) > What matters is #1, to detect mismatches, to be a reliable tool. Once we've > detected that, we can inform the user and our job is mostly done. Well, we could tell the user to install the package with that symtab, in fedora it is called 'foo-debuginfo' and can be installed via, for instance (example taken from a gdb post somewhere): [root@zoo ~]# dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/3d/f5385c6be529423a8ae3dd39a3deb9425201cc <SNIP> Using metadata from Wed Jan 13 12:12:16 2016 (0:02:56 hours old) Dependencies resolved. ================================================================== Package Arch Version Repository Size ================================================================== Installing: glibc-debuginfo x86_64 2.20-8.fc21 updates-debuginfo 9.2 M Transaction Summary ================================================================== Install 1 Package Total download size: 9.2 M Installed size: 58 M Is this ok [y/N]: ----------------------------------- I.e. infrastructure is in place to get this "time machine" you mention below, is somewhat in place to get a symtab for a DSO, be it the latest version of some versions ago. Which could be useful to understand the behaviour of code in production while an update can't be applied (not vetted by powers that be, whatever reason). > But reliable != perfect time machine. Really, #2 is a second, third order concern > that should never cause slowdowns on the magnitude that Peter is complaining > about! Sure, mtime will improve peterz's usecase, no question about it. > I realize that there might be special workflows (such as system-wide monitoring) > where collecting at recording time might be useful, but those are not the common > case - and they should not slow down the common case. Sure, no question about this. > > Checking for inefficiencies in the current approach of > > right-after-recording post-processing looking for PERF_RECORD_MMAPs, > > Adrian suggested something here, also disabling the saving into > > ~/.debug/ will help, collecting numbers would be great. > > I think Peter mentioned a number: the kernel build time almost _doubles_ with > this. That's clearly unacceptable. And for him, most of the time, not a problem, in fact we could argue that he has not a problem, Stephane seems to have ;-) > > But the mtime thing also requires traversing the whole perf.data > > contents looking for those paths in PERF_RECORD_MMAP records. > But why? Why cannot we do it at perf report time, when we will parse them anyway? Hey, I was talking only about 'record' time. And at record, we don'have to traverse the whole perf.data contents as soon as we add a new PERF_RECORD_MMAP3, just not with the initial motivation for it (a content-based cookie), but instead the DSOs's mtime :-) - Arnaldo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-13 9:57 ` Ingo Molnar 2016-01-13 15:27 ` Arnaldo Carvalho de Melo @ 2016-01-19 14:56 ` Namhyung Kim 2016-01-19 15:27 ` Peter Zijlstra 1 sibling, 1 reply; 47+ messages in thread From: Namhyung Kim @ 2016-01-19 14:56 UTC (permalink / raw) To: Ingo Molnar Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Stephane Eranian, LKML, Jiri Olsa, Adrian Hunter, ak On Wed, Jan 13, 2016 at 10:57:38AM +0100, Ingo Molnar wrote: > > * Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > > Em Tue, Jan 12, 2016 at 11:39:43AM +0100, Ingo Molnar escreveu: > > > But perf tooling cares very much: it can lead to subtle bugs and bad data if we > > > display a profile with the wrong DSO or binary. 'Bad' profiles resulting out of > > > binary mismatch can be very convincing and can send developers down the wrong path > > > for hours. I'd expect my tooling to not do that. > > > > > Path names alone (the thing that exec() cares about) are not unique enough to > > > identify the binary that was profiled. So we need a content hash - hence the > > > build-ID. > > > > > Can you suggest a better solution than a build-time calculated content hash? > > > > > As for binary formats that suck and don't allow for a content hash: we do our > > > best, but of course the risk of data mismatch is there. We could perhaps cache the > > > binary inode's mtime field to at least produce a 'profile data is older than > > > binary/DSO modification date!' warning. (Which check won't catch all cases, like > > > cross-system profiling data matches.) > > > > So, we could think of this as: binary formats that want to aid > > observability tools to: > > > > 1) Detect mismatches in contents for DSOs present at recording time to > > those to be used at analysis time. > > > > 2) Find symtabs, DSO binary contents, CFI tables, present in the DSO > > where samples were taken. > > > > Using mtime, as suggested in other messages will help with #1, but not > > with #2. > > But but ... why is #2 a problem with mtime? If we have an out of date record in > the perf.data, then the perf.data is uninteresting in 99% of the usecases! It's > out of date, most likely because the binary the developer is working on got > rebuilt, or the system got upgraded - in both cases the developer does not care > about the old records anymore... We have 'perf diff' command which compares old and new performance results of a same program. People can use it to see how much improved in the new version than the baseline. In this case, the old binary should be found from the old perf.data. Thanks, Namhyung > > What matters is #1, to detect mismatches, to be a reliable tool. Once we've > detected that, we can inform the user and our job is mostly done. > > But reliable != perfect time machine. Really, #2 is a second, third order concern > that should never cause slowdowns on the magnitude that Peter is complaining > about! > > I realize that there might be special workflows (such as system-wide monitoring) > where collecting at recording time might be useful, but those are not the common > case - and they should not slow down the common case. > > > Checking for inefficiencies in the current approach of > > right-after-recording post-processing looking for PERF_RECORD_MMAPs, > > Adrian suggested something here, also disabling the saving into > > ~/.debug/ will help, collecting numbers would be great. > > I think Peter mentioned a number: the kernel build time almost _doubles_ with > this. That's clearly unacceptable. > > > But the mtime thing also requires traversing the whole perf.data > > contents looking for those paths in PERF_RECORD_MMAP records. > > But why? Why cannot we do it at perf report time, when we will parse them anyway? > > Thanks, > > Ingo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-19 14:56 ` Namhyung Kim @ 2016-01-19 15:27 ` Peter Zijlstra 2016-01-19 15:48 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2016-01-19 15:27 UTC (permalink / raw) To: Namhyung Kim Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian, LKML, Jiri Olsa, Adrian Hunter, ak On Tue, Jan 19, 2016 at 11:56:40PM +0900, Namhyung Kim wrote: > > But but ... why is #2 a problem with mtime? If we have an out of date record in > > the perf.data, then the perf.data is uninteresting in 99% of the usecases! It's > > out of date, most likely because the binary the developer is working on got > > rebuilt, or the system got upgraded - in both cases the developer does not care > > about the old records anymore... > > We have 'perf diff' command which compares old and new performance > results of a same program. People can use it to see how much improved > in the new version than the baseline. In this case, the old binary > should be found from the old perf.data. Just means they'll have to use perf-archive or whatnot before that works. Making the regular perf-record dead slow just so that a few more complex workloads work doesn't make sense. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-19 15:27 ` Peter Zijlstra @ 2016-01-19 15:48 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 47+ messages in thread From: Arnaldo Carvalho de Melo @ 2016-01-19 15:48 UTC (permalink / raw) To: Peter Zijlstra Cc: Namhyung Kim, Ingo Molnar, Stephane Eranian, LKML, Jiri Olsa, Adrian Hunter, ak Em Tue, Jan 19, 2016 at 04:27:42PM +0100, Peter Zijlstra escreveu: > On Tue, Jan 19, 2016 at 11:56:40PM +0900, Namhyung Kim wrote: > > > > But but ... why is #2 a problem with mtime? If we have an out of date record in > > > the perf.data, then the perf.data is uninteresting in 99% of the usecases! It's > > > out of date, most likely because the binary the developer is working on got > > > rebuilt, or the system got upgraded - in both cases the developer does not care > > > about the old records anymore... > > We have 'perf diff' command which compares old and new performance > > results of a same program. People can use it to see how much improved > > in the new version than the baseline. In this case, the old binary > > should be found from the old perf.data. > Just means they'll have to use perf-archive or whatnot before that > works. Making the regular perf-record dead slow just so that a few more > complex workloads work doesn't make sense. And perf-diff will be able to find the right binaries in most cases, as we'll end up using that mtime to pick the right binary for the baseline and the other perf.data files used. So, if both files are reachable via the usual path (/usr/lib/debug, /bin/, /lib64, ~/.debug/, etc) it'll work. For users wanting the convenience of auto-saving binaries + its sources, then they will have to incur the cost of postprocessing the just generated perf.data file to do that, and make sure debuginfo is enabled, for having the sources embedded into those fat binaries. With the current situation, without any disambiguation mechanism, we _need_ to traverse the perf.data file to find that disambiguation bit, the build-id (we could'be been using the mtime, but having to traverse it all we may as well use something better, the build-id), and while doing that we could as well do a hardlink into ~/.debug/ for the DSOs found in PERF_RECORD_MMAP(2) meta events (a copy if that hardlink isn't possible, more overhead). Now we'll use that mtime, which should be good enough for both figuring out if a file can be used to even choose among various entries for a same pathname. And we'll cover the case where a DSO is replaced during a record session, as newer PERF_RECORD_MMAP3 emitted after the update will have a different mtime, yay! The extra, content based verification to double check that is the real DSO used, which we're not using anyway right now (but could) will not be possible, i.e. regenerate the build-id from the DSO contents to check that it is really what was present at the record phase. But for those super stringent needs, the door is open to add the content-based cookie at the end of the PERF_RECORD_MMAP3, as we have in it the filename-len, etc. So now, with PERF_RECORD_MMAP3 the onus of using this all is back at where it belongs, away from kernel developers and into tools/perf/ developers, tooling should use build-ids if available and if not, fallback to mtime, if even that is not available (older kernels) then warn the user and hope the pathname is enough. - Arnaldo ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-08 18:01 ` Stephane Eranian 2016-01-08 18:19 ` Arnaldo Carvalho de Melo @ 2016-01-09 10:31 ` Namhyung Kim 2016-01-11 9:27 ` Adrian Hunter 1 sibling, 1 reply; 47+ messages in thread From: Namhyung Kim @ 2016-01-09 10:31 UTC (permalink / raw) To: Stephane Eranian, Adrian Hunter Cc: Arnaldo Carvalho de Melo, LKML, Jiri Olsa, Peter Zijlstra, Ingo Molnar, ak Hi Stephane, On Fri, Jan 08, 2016 at 10:01:24AM -0800, Stephane Eranian wrote: > On Thu, Jan 7, 2016 at 3:47 PM, Arnaldo Carvalho de Melo > <acme@kernel.org> wrote: > > Em Fri, Jan 08, 2016 at 07:47:03AM +0900, Namhyung Kim escreveu: > >> On January 8, 2016 7:00:35 AM GMT+09:00, Stephane Eranian <eranian@google.com> wrote: > >> >On Thu, Jan 7, 2016 at 1:59 PM, Arnaldo Carvalho de Melo > >> ><acme@kernel.org> wrote: > >> >> Em Thu, Jan 07, 2016 at 01:56:14PM -0800, Stephane Eranian escreveu: > >> >>> Hi, > >> >>> > >> >>> Whenever you do: > >> >>> > >> >>> $ perf record -g -a sleep 10 > >> >>> > >> >>> Perf will collect the callstack for each sample. At the end of the > >> >>> run, perf record > >> >>> adds the buildid for all dso with at least one sample. But when it > >> >does this, it > >> >>> only looks at the sampled IP and ignore the modules traversed by the > >> >callstack. > >> >>> That means that, it is not possible to uniquely identify the modules > >> >executed, > >> >>> unless they had at least one IP sample captured. But this is not > >> >>> always the case. > >> >>> > >> >>> How about providing an option to perf record to force collecting > >> >>> buildid for all IPs > >> >>> captured in the callstack? I understand that would cost more at the > >> >end of the > >> >>> collection, but this would be beneficial to several monitoring > >> >scenarios. > >> >> > >> >> I agree, would consider applying a patch that provides the option but > >> >> does not do this by default. > >> >> > >> >I agree, not the default. > >> > >> Hi Stephane, > >> > >> Please see > >> > >> https://lkml.org/lkml/2015/3/22/249 > > > > > > Oops, Stephane, please try this, so that we can finally merge it :-\ > > > I will try it today. However, I am a bit worried about the performance > impact. Unless I am missing something in this approach we may end up > looking up N times the same module if it appears in N callstacks. In > Andi's suggested approach, there would be only one pass at the beginning > (or the end of the run). But you could miss some modules if they are gone > by the time you run the pass. How about this then? Adrian, is it ok to skip process_buildids() for the auxtrace? Thanks, Namhyung diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 3a1a32f5479f..fbceb631387c 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -338,6 +338,9 @@ Options passed to clang when compiling BPF scriptlets. Specify vmlinux path which has debuginfo. (enabled when BPF prologue is on) +--buildid-all:: +Record build-id of all DSOs regardless whether it's actually hit or not. + SEE ALSO -------- linkperf:perf-stat[1], linkperf:perf-list[1] diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index dc4e0adf5c5b..ab18db3153a6 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -50,6 +50,7 @@ struct record { int realtime_prio; bool no_buildid; bool no_buildid_cache; + bool buildid_all; unsigned long long samples; }; @@ -755,14 +756,10 @@ out_child: file->size = lseek(perf_data_file__fd(file), 0, SEEK_CUR); if (!rec->no_buildid) { - process_buildids(rec); - /* - * We take all buildids when the file contains - * AUX area tracing data because we do not decode the - * trace because it would take too long. - */ - if (rec->opts.full_auxtrace) + if (rec->buildid_all) dsos__hit_all(rec->session); + else + process_buildids(rec); } perf_session__write_header(rec->session, rec->evlist, fd, true); } @@ -1138,6 +1135,8 @@ struct option __record_options[] = { "options passed to clang when compiling BPF scriptlets"), OPT_STRING(0, "vmlinux", &symbol_conf.vmlinux_name, "file", "vmlinux pathname"), + OPT_BOOLEAN(0, "buildid-all", &record.buildid_all, + "Record build-id of all DSOs regardless of hits"), OPT_END() }; @@ -1255,6 +1254,14 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused) if (err) goto out_symbol_exit; + /* + * We take all buildids when the file contains + * AUX area tracing data because we do not decode the + * trace because it would take too long. + */ + if (rec->opts.full_auxtrace) + rec->buildid_all = true; + if (record_opts__config(&rec->opts)) { err = -EINVAL; goto out_symbol_exit; ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-09 10:31 ` Namhyung Kim @ 2016-01-11 9:27 ` Adrian Hunter 2016-01-11 11:02 ` Namhyung Kim 0 siblings, 1 reply; 47+ messages in thread From: Adrian Hunter @ 2016-01-11 9:27 UTC (permalink / raw) To: Namhyung Kim, Stephane Eranian Cc: Arnaldo Carvalho de Melo, LKML, Jiri Olsa, Peter Zijlstra, Ingo Molnar, ak On 09/01/16 12:31, Namhyung Kim wrote: > Hi Stephane, > > On Fri, Jan 08, 2016 at 10:01:24AM -0800, Stephane Eranian wrote: >> On Thu, Jan 7, 2016 at 3:47 PM, Arnaldo Carvalho de Melo >> <acme@kernel.org> wrote: >>> Em Fri, Jan 08, 2016 at 07:47:03AM +0900, Namhyung Kim escreveu: >>>> On January 8, 2016 7:00:35 AM GMT+09:00, Stephane Eranian <eranian@google.com> wrote: >>>>> On Thu, Jan 7, 2016 at 1:59 PM, Arnaldo Carvalho de Melo >>>>> <acme@kernel.org> wrote: >>>>>> Em Thu, Jan 07, 2016 at 01:56:14PM -0800, Stephane Eranian escreveu: >>>>>>> Hi, >>>>>>> >>>>>>> Whenever you do: >>>>>>> >>>>>>> $ perf record -g -a sleep 10 >>>>>>> >>>>>>> Perf will collect the callstack for each sample. At the end of the >>>>>>> run, perf record >>>>>>> adds the buildid for all dso with at least one sample. But when it >>>>> does this, it >>>>>>> only looks at the sampled IP and ignore the modules traversed by the >>>>> callstack. >>>>>>> That means that, it is not possible to uniquely identify the modules >>>>> executed, >>>>>>> unless they had at least one IP sample captured. But this is not >>>>>>> always the case. >>>>>>> >>>>>>> How about providing an option to perf record to force collecting >>>>>>> buildid for all IPs >>>>>>> captured in the callstack? I understand that would cost more at the >>>>> end of the >>>>>>> collection, but this would be beneficial to several monitoring >>>>> scenarios. >>>>>> >>>>>> I agree, would consider applying a patch that provides the option but >>>>>> does not do this by default. >>>>>> >>>>> I agree, not the default. >>>> >>>> Hi Stephane, >>>> >>>> Please see >>>> >>>> https://lkml.org/lkml/2015/3/22/249 >>> >>> >>> Oops, Stephane, please try this, so that we can finally merge it :-\ >>> >> I will try it today. However, I am a bit worried about the performance >> impact. Unless I am missing something in this approach we may end up >> looking up N times the same module if it appears in N callstacks. In >> Andi's suggested approach, there would be only one pass at the beginning >> (or the end of the run). But you could miss some modules if they are gone >> by the time you run the pass. > > How about this then? > > Adrian, is it ok to skip process_buildids() for the auxtrace? If you don't post-process (i.e. call process_buildids), then where do the DSOs come from? i.e. dsos__hit_all() just hits the DSOs that exist. > > Thanks, > Namhyung > > > diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt > index 3a1a32f5479f..fbceb631387c 100644 > --- a/tools/perf/Documentation/perf-record.txt > +++ b/tools/perf/Documentation/perf-record.txt > @@ -338,6 +338,9 @@ Options passed to clang when compiling BPF scriptlets. > Specify vmlinux path which has debuginfo. > (enabled when BPF prologue is on) > > +--buildid-all:: > +Record build-id of all DSOs regardless whether it's actually hit or not. > + > SEE ALSO > -------- > linkperf:perf-stat[1], linkperf:perf-list[1] > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c > index dc4e0adf5c5b..ab18db3153a6 100644 > --- a/tools/perf/builtin-record.c > +++ b/tools/perf/builtin-record.c > @@ -50,6 +50,7 @@ struct record { > int realtime_prio; > bool no_buildid; > bool no_buildid_cache; > + bool buildid_all; > unsigned long long samples; > }; > > @@ -755,14 +756,10 @@ out_child: > file->size = lseek(perf_data_file__fd(file), 0, SEEK_CUR); > > if (!rec->no_buildid) { > - process_buildids(rec); > - /* > - * We take all buildids when the file contains > - * AUX area tracing data because we do not decode the > - * trace because it would take too long. > - */ > - if (rec->opts.full_auxtrace) > + if (rec->buildid_all) > dsos__hit_all(rec->session); > + else > + process_buildids(rec); > } > perf_session__write_header(rec->session, rec->evlist, fd, true); > } > @@ -1138,6 +1135,8 @@ struct option __record_options[] = { > "options passed to clang when compiling BPF scriptlets"), > OPT_STRING(0, "vmlinux", &symbol_conf.vmlinux_name, > "file", "vmlinux pathname"), > + OPT_BOOLEAN(0, "buildid-all", &record.buildid_all, > + "Record build-id of all DSOs regardless of hits"), > OPT_END() > }; > > @@ -1255,6 +1254,14 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused) > if (err) > goto out_symbol_exit; > > + /* > + * We take all buildids when the file contains > + * AUX area tracing data because we do not decode the > + * trace because it would take too long. > + */ > + if (rec->opts.full_auxtrace) > + rec->buildid_all = true; > + > if (record_opts__config(&rec->opts)) { > err = -EINVAL; > goto out_symbol_exit; > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-11 9:27 ` Adrian Hunter @ 2016-01-11 11:02 ` Namhyung Kim 2016-01-11 11:54 ` Adrian Hunter 0 siblings, 1 reply; 47+ messages in thread From: Namhyung Kim @ 2016-01-11 11:02 UTC (permalink / raw) To: Adrian Hunter Cc: Stephane Eranian, Arnaldo Carvalho de Melo, LKML, Jiri Olsa, Peter Zijlstra, Ingo Molnar, ak Hi Adrian, On Mon, Jan 11, 2016 at 11:27:56AM +0200, Adrian Hunter wrote: > On 09/01/16 12:31, Namhyung Kim wrote: > > Hi Stephane, > > > > On Fri, Jan 08, 2016 at 10:01:24AM -0800, Stephane Eranian wrote: > >> On Thu, Jan 7, 2016 at 3:47 PM, Arnaldo Carvalho de Melo > >> <acme@kernel.org> wrote: > >>> Em Fri, Jan 08, 2016 at 07:47:03AM +0900, Namhyung Kim escreveu: > >>>> On January 8, 2016 7:00:35 AM GMT+09:00, Stephane Eranian <eranian@google.com> wrote: > >>>>> On Thu, Jan 7, 2016 at 1:59 PM, Arnaldo Carvalho de Melo > >>>>> <acme@kernel.org> wrote: > >>>>>> Em Thu, Jan 07, 2016 at 01:56:14PM -0800, Stephane Eranian escreveu: > >>>>>>> Hi, > >>>>>>> > >>>>>>> Whenever you do: > >>>>>>> > >>>>>>> $ perf record -g -a sleep 10 > >>>>>>> > >>>>>>> Perf will collect the callstack for each sample. At the end of the > >>>>>>> run, perf record > >>>>>>> adds the buildid for all dso with at least one sample. But when it > >>>>> does this, it > >>>>>>> only looks at the sampled IP and ignore the modules traversed by the > >>>>> callstack. > >>>>>>> That means that, it is not possible to uniquely identify the modules > >>>>> executed, > >>>>>>> unless they had at least one IP sample captured. But this is not > >>>>>>> always the case. > >>>>>>> > >>>>>>> How about providing an option to perf record to force collecting > >>>>>>> buildid for all IPs > >>>>>>> captured in the callstack? I understand that would cost more at the > >>>>> end of the > >>>>>>> collection, but this would be beneficial to several monitoring > >>>>> scenarios. > >>>>>> > >>>>>> I agree, would consider applying a patch that provides the option but > >>>>>> does not do this by default. > >>>>>> > >>>>> I agree, not the default. > >>>> > >>>> Hi Stephane, > >>>> > >>>> Please see > >>>> > >>>> https://lkml.org/lkml/2015/3/22/249 > >>> > >>> > >>> Oops, Stephane, please try this, so that we can finally merge it :-\ > >>> > >> I will try it today. However, I am a bit worried about the performance > >> impact. Unless I am missing something in this approach we may end up > >> looking up N times the same module if it appears in N callstacks. In > >> Andi's suggested approach, there would be only one pass at the beginning > >> (or the end of the run). But you could miss some modules if they are gone > >> by the time you run the pass. > > > > How about this then? > > > > Adrian, is it ok to skip process_buildids() for the auxtrace? > > If you don't post-process (i.e. call process_buildids), then where do the > DSOs come from? i.e. dsos__hit_all() just hits the DSOs that exist. Ah, right. I somehow thought that it was processed already elsewhere. Then, how about this? >From 38b2bc273329afe4334a11d87d20ef71639132fb Mon Sep 17 00:00:00 2001 From: Namhyung Kim <namhyung@kernel.org> Date: Sat, 9 Jan 2016 22:37:55 +0900 Subject: [PATCH] perf record: Add --buildid-all option The --buildid-all option is to record build-id of all DSOs in the file. It might be very costly to postprocess samples to find which DSO hits. Signed-off-by: Namhyung Kim <namhyung@kernel.org> --- tools/perf/Documentation/perf-record.txt | 3 +++ tools/perf/builtin-record.c | 26 ++++++++++++++++++++------ 2 files changed, 23 insertions(+), 6 deletions(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 3a1a32f5479f..fbceb631387c 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -338,6 +338,9 @@ Options passed to clang when compiling BPF scriptlets. Specify vmlinux path which has debuginfo. (enabled when BPF prologue is on) +--buildid-all:: +Record build-id of all DSOs regardless whether it's actually hit or not. + SEE ALSO -------- linkperf:perf-stat[1], linkperf:perf-list[1] diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index dc4e0adf5c5b..319712a4e02b 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -50,6 +50,7 @@ struct record { int realtime_prio; bool no_buildid; bool no_buildid_cache; + bool buildid_all; unsigned long long samples; }; @@ -362,6 +363,13 @@ static int process_buildids(struct record *rec) */ symbol_conf.ignore_vmlinux_buildid = true; + /* + * If --buildid-all is given, it marks all DSO regardless of hits, + * so no need to process samples. + */ + if (rec->buildid_all) + rec->tool.sample = NULL; + return perf_session__process_events(session); } @@ -756,12 +764,8 @@ out_child: if (!rec->no_buildid) { process_buildids(rec); - /* - * We take all buildids when the file contains - * AUX area tracing data because we do not decode the - * trace because it would take too long. - */ - if (rec->opts.full_auxtrace) + + if (rec->buildid_all) dsos__hit_all(rec->session); } perf_session__write_header(rec->session, rec->evlist, fd, true); @@ -1138,6 +1142,8 @@ struct option __record_options[] = { "options passed to clang when compiling BPF scriptlets"), OPT_STRING(0, "vmlinux", &symbol_conf.vmlinux_name, "file", "vmlinux pathname"), + OPT_BOOLEAN(0, "buildid-all", &record.buildid_all, + "Record build-id of all DSOs regardless of hits"), OPT_END() }; @@ -1255,6 +1261,14 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused) if (err) goto out_symbol_exit; + /* + * We take all buildids when the file contains + * AUX area tracing data because we do not decode the + * trace because it would take too long. + */ + if (rec->opts.full_auxtrace) + rec->buildid_all = true; + if (record_opts__config(&rec->opts)) { err = -EINVAL; goto out_symbol_exit; -- 2.6.4 ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [RFC] perf record: missing buildid for callstack modules 2016-01-11 11:02 ` Namhyung Kim @ 2016-01-11 11:54 ` Adrian Hunter 0 siblings, 0 replies; 47+ messages in thread From: Adrian Hunter @ 2016-01-11 11:54 UTC (permalink / raw) To: Namhyung Kim Cc: Stephane Eranian, Arnaldo Carvalho de Melo, LKML, Jiri Olsa, Peter Zijlstra, Ingo Molnar, ak On 11/01/16 13:02, Namhyung Kim wrote: > Hi Adrian, > > On Mon, Jan 11, 2016 at 11:27:56AM +0200, Adrian Hunter wrote: >> On 09/01/16 12:31, Namhyung Kim wrote: >>> Hi Stephane, >>> >>> On Fri, Jan 08, 2016 at 10:01:24AM -0800, Stephane Eranian wrote: >>>> On Thu, Jan 7, 2016 at 3:47 PM, Arnaldo Carvalho de Melo >>>> <acme@kernel.org> wrote: >>>>> Em Fri, Jan 08, 2016 at 07:47:03AM +0900, Namhyung Kim escreveu: >>>>>> On January 8, 2016 7:00:35 AM GMT+09:00, Stephane Eranian <eranian@google.com> wrote: >>>>>>> On Thu, Jan 7, 2016 at 1:59 PM, Arnaldo Carvalho de Melo >>>>>>> <acme@kernel.org> wrote: >>>>>>>> Em Thu, Jan 07, 2016 at 01:56:14PM -0800, Stephane Eranian escreveu: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Whenever you do: >>>>>>>>> >>>>>>>>> $ perf record -g -a sleep 10 >>>>>>>>> >>>>>>>>> Perf will collect the callstack for each sample. At the end of the >>>>>>>>> run, perf record >>>>>>>>> adds the buildid for all dso with at least one sample. But when it >>>>>>> does this, it >>>>>>>>> only looks at the sampled IP and ignore the modules traversed by the >>>>>>> callstack. >>>>>>>>> That means that, it is not possible to uniquely identify the modules >>>>>>> executed, >>>>>>>>> unless they had at least one IP sample captured. But this is not >>>>>>>>> always the case. >>>>>>>>> >>>>>>>>> How about providing an option to perf record to force collecting >>>>>>>>> buildid for all IPs >>>>>>>>> captured in the callstack? I understand that would cost more at the >>>>>>> end of the >>>>>>>>> collection, but this would be beneficial to several monitoring >>>>>>> scenarios. >>>>>>>> >>>>>>>> I agree, would consider applying a patch that provides the option but >>>>>>>> does not do this by default. >>>>>>>> >>>>>>> I agree, not the default. >>>>>> >>>>>> Hi Stephane, >>>>>> >>>>>> Please see >>>>>> >>>>>> https://lkml.org/lkml/2015/3/22/249 >>>>> >>>>> >>>>> Oops, Stephane, please try this, so that we can finally merge it :-\ >>>>> >>>> I will try it today. However, I am a bit worried about the performance >>>> impact. Unless I am missing something in this approach we may end up >>>> looking up N times the same module if it appears in N callstacks. In >>>> Andi's suggested approach, there would be only one pass at the beginning >>>> (or the end of the run). But you could miss some modules if they are gone >>>> by the time you run the pass. >>> >>> How about this then? >>> >>> Adrian, is it ok to skip process_buildids() for the auxtrace? >> >> If you don't post-process (i.e. call process_buildids), then where do the >> DSOs come from? i.e. dsos__hit_all() just hits the DSOs that exist. > > Ah, right. I somehow thought that it was processed already elsewhere. > > Then, how about this? Looks OK for auxtrace. In other respects you should probably clarify what happens when --no-buildid and --buildid-all are used together. Maybe it should be an error. > > > >>From 38b2bc273329afe4334a11d87d20ef71639132fb Mon Sep 17 00:00:00 2001 > From: Namhyung Kim <namhyung@kernel.org> > Date: Sat, 9 Jan 2016 22:37:55 +0900 > Subject: [PATCH] perf record: Add --buildid-all option > > The --buildid-all option is to record build-id of all DSOs in the file. > It might be very costly to postprocess samples to find which DSO hits. > > Signed-off-by: Namhyung Kim <namhyung@kernel.org> > --- > tools/perf/Documentation/perf-record.txt | 3 +++ > tools/perf/builtin-record.c | 26 ++++++++++++++++++++------ > 2 files changed, 23 insertions(+), 6 deletions(-) > > diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt > index 3a1a32f5479f..fbceb631387c 100644 > --- a/tools/perf/Documentation/perf-record.txt > +++ b/tools/perf/Documentation/perf-record.txt > @@ -338,6 +338,9 @@ Options passed to clang when compiling BPF scriptlets. > Specify vmlinux path which has debuginfo. > (enabled when BPF prologue is on) > > +--buildid-all:: > +Record build-id of all DSOs regardless whether it's actually hit or not. > + > SEE ALSO > -------- > linkperf:perf-stat[1], linkperf:perf-list[1] > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c > index dc4e0adf5c5b..319712a4e02b 100644 > --- a/tools/perf/builtin-record.c > +++ b/tools/perf/builtin-record.c > @@ -50,6 +50,7 @@ struct record { > int realtime_prio; > bool no_buildid; > bool no_buildid_cache; > + bool buildid_all; > unsigned long long samples; > }; > > @@ -362,6 +363,13 @@ static int process_buildids(struct record *rec) > */ > symbol_conf.ignore_vmlinux_buildid = true; > > + /* > + * If --buildid-all is given, it marks all DSO regardless of hits, > + * so no need to process samples. > + */ > + if (rec->buildid_all) > + rec->tool.sample = NULL; > + > return perf_session__process_events(session); > } > > @@ -756,12 +764,8 @@ out_child: > > if (!rec->no_buildid) { > process_buildids(rec); > - /* > - * We take all buildids when the file contains > - * AUX area tracing data because we do not decode the > - * trace because it would take too long. > - */ > - if (rec->opts.full_auxtrace) > + > + if (rec->buildid_all) > dsos__hit_all(rec->session); > } > perf_session__write_header(rec->session, rec->evlist, fd, true); > @@ -1138,6 +1142,8 @@ struct option __record_options[] = { > "options passed to clang when compiling BPF scriptlets"), > OPT_STRING(0, "vmlinux", &symbol_conf.vmlinux_name, > "file", "vmlinux pathname"), > + OPT_BOOLEAN(0, "buildid-all", &record.buildid_all, > + "Record build-id of all DSOs regardless of hits"), > OPT_END() > }; > > @@ -1255,6 +1261,14 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused) > if (err) > goto out_symbol_exit; > > + /* > + * We take all buildids when the file contains > + * AUX area tracing data because we do not decode the > + * trace because it would take too long. > + */ > + if (rec->opts.full_auxtrace) > + rec->buildid_all = true; > + > if (record_opts__config(&rec->opts)) { > err = -EINVAL; > goto out_symbol_exit; > ^ permalink raw reply [flat|nested] 47+ messages in thread
end of thread, other threads:[~2016-01-19 15:48 UTC | newest] Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-01-07 21:56 [RFC] perf record: missing buildid for callstack modules Stephane Eranian 2016-01-07 21:57 ` Andi Kleen 2016-01-07 22:00 ` Stephane Eranian 2016-01-07 21:59 ` Arnaldo Carvalho de Melo 2016-01-07 22:00 ` Stephane Eranian 2016-01-07 22:47 ` Namhyung Kim 2016-01-07 23:47 ` Arnaldo Carvalho de Melo 2016-01-08 18:01 ` Stephane Eranian 2016-01-08 18:19 ` Arnaldo Carvalho de Melo 2016-01-11 17:30 ` Peter Zijlstra 2016-01-11 18:22 ` Arnaldo Carvalho de Melo 2016-01-11 20:06 ` Stephane Eranian 2016-01-12 10:39 ` Ingo Molnar 2016-01-12 11:35 ` Peter Zijlstra 2016-01-12 12:18 ` Ingo Molnar 2016-01-12 13:40 ` Peter Zijlstra 2016-01-12 14:38 ` Arnaldo Carvalho de Melo 2016-01-12 15:34 ` Peter Zijlstra 2016-01-12 15:48 ` Arnaldo Carvalho de Melo 2016-01-12 16:10 ` Peter Zijlstra 2016-01-12 16:27 ` Peter Zijlstra 2016-01-12 17:15 ` Arnaldo Carvalho de Melo 2016-01-13 10:21 ` Ingo Molnar 2016-01-13 12:40 ` Peter Zijlstra 2016-01-14 11:27 ` Ingo Molnar 2016-01-14 11:36 ` Peter Zijlstra 2016-01-15 1:59 ` Stephane Eranian 2016-01-15 9:34 ` Peter Zijlstra 2016-01-15 18:58 ` Stephane Eranian 2016-01-15 19:49 ` Arnaldo Carvalho de Melo 2016-01-15 21:49 ` Stephane Eranian 2016-01-15 21:36 ` Peter Zijlstra 2016-01-12 21:02 ` Stephane Eranian 2016-01-12 13:08 ` One Thousand Gnomes 2016-01-12 14:34 ` Arnaldo Carvalho de Melo 2016-01-12 15:37 ` Peter Zijlstra 2016-01-13 10:25 ` Ingo Molnar 2016-01-12 14:23 ` Arnaldo Carvalho de Melo 2016-01-13 9:57 ` Ingo Molnar 2016-01-13 15:27 ` Arnaldo Carvalho de Melo 2016-01-19 14:56 ` Namhyung Kim 2016-01-19 15:27 ` Peter Zijlstra 2016-01-19 15:48 ` Arnaldo Carvalho de Melo 2016-01-09 10:31 ` Namhyung Kim 2016-01-11 9:27 ` Adrian Hunter 2016-01-11 11:02 ` Namhyung Kim 2016-01-11 11:54 ` Adrian Hunter
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.