* perf/jit doesn't cope well with mprotect() to jit containing pages @ 2016-12-10 5:02 Andres Freund 2016-12-12 8:49 ` Peter Zijlstra 2017-01-26 20:32 ` perf/jit doesn't cope well with mprotect() to jit containing pages Stephane Eranian 0 siblings, 2 replies; 29+ messages in thread From: Andres Freund @ 2016-12-10 5:02 UTC (permalink / raw) To: linux-kernel, Stephane Eranian Cc: acme, jolsa, peterz, mingo, anton, namhyung, Stephane Eranian Hi, While working on optionally jit-compiling parts of postgres using llvm (MCJIT currently, but Orc would have the same issue afaics), I'm trying to use perf jit support to make profiling of those JITed parts easier. Turns out the current jit support in perf doesn't work that well for LLVM - but it doesn't primarily look like LLVM's fault. Syscall-wise llvm does (heavily filtered): mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000 mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0 mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0 write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68 mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000 mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000 mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0 mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0 write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68 ... i.e. it mmaps single pages for the each JITed function's sections. Which makes sense, because the first function is JITed independently from the second one. The corresponding MMAP2 records according to perf perf script --show-mmap-events are: postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon Note how the size of the mapping continually increases, so that the each MMAP2 record covers previous sections. If one perf inject --jit into that it looks like: postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon postgres 4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon postgres 4107 595444.884232: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x45) @ 0x40 fd:02 33434599 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-2.so Note how the first injected record is also covered by the following "//anon" event. This leads to the the curious effect that samples for the first function (evalexpr0) are associated with the right generated .so, until the second function is JITed. I hacked up perf inject to omit such MMAP2 records by adding if (event->mmap2.prot == 0) return 0; to perf_event__jit_repipe_mmap2() and suddenly things work. I presume the increasing MMAP2 size is triggered by the consecutive pages being represented as a single page-range in the kernel? If I, to work around such consecutive pages, force another page to be mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes away. Am I doing something wrong, or is there a bug here? FWIW, this is on linux 4.8.8, with perf from master (v4.9-rc8-108-g810ac7b7558d). BTW, it's also a bit weird that those MMAP2 records triggered by mprotect/mmap, have prot set to 0... Regards, Andres ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2016-12-10 5:02 perf/jit doesn't cope well with mprotect() to jit containing pages Andres Freund @ 2016-12-12 8:49 ` Peter Zijlstra 2016-12-12 9:01 ` Andres Freund 2017-01-26 22:15 ` Peter Zijlstra 2017-01-26 20:32 ` perf/jit doesn't cope well with mprotect() to jit containing pages Stephane Eranian 1 sibling, 2 replies; 29+ messages in thread From: Peter Zijlstra @ 2016-12-12 8:49 UTC (permalink / raw) To: Andres Freund Cc: linux-kernel, Stephane Eranian, acme, jolsa, mingo, anton, namhyung, Stephane Eranian On Fri, Dec 09, 2016 at 09:02:18PM -0800, Andres Freund wrote: > > I presume the increasing MMAP2 size is triggered by the consecutive > pages being represented as a single page-range in the kernel? Yes, we print struct vm_area_struct based information, if vmas get merged, that shows. > If I, to work around such consecutive pages, force another page to be > mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes > away. This would indeed inhibit vma merging. > Am I doing something wrong, or is there a bug here? Expected behaviour afaict > BTW, it's also a bit weird that those MMAP2 records triggered by > mprotect/mmap, have prot set to 0... Yes, mprotect() does: vma->vm_flags = newflags; before calling perf_event_mmap(vma); which then looks at VM_{READ,WRITE,EXEC} bits in that word to generate the prot value. So that being 0 is a bit weird. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2016-12-12 8:49 ` Peter Zijlstra @ 2016-12-12 9:01 ` Andres Freund 2016-12-12 9:28 ` Peter Zijlstra 2017-01-26 22:15 ` Peter Zijlstra 1 sibling, 1 reply; 29+ messages in thread From: Andres Freund @ 2016-12-12 9:01 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Stephane Eranian, acme, jolsa, mingo, anton, namhyung, Stephane Eranian On December 12, 2016 12:49:03 AM PST, Peter Zijlstra <peterz@infradead.org> wrote: >On Fri, Dec 09, 2016 at 09:02:18PM -0800, Andres Freund wrote: >> Am I doing something wrong, or is there a bug here? > >Expected behaviour afaict So I need to prevent vma merging to use perf jit support? That seems a bit weird. Possibly the inject --jit pass needs to do something about this? A hack might be to re-emit the fake mmap2 records after ones potentially overwriting previously emitted records. Don't know perf internals well enough to suggest something prettier. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2016-12-12 9:01 ` Andres Freund @ 2016-12-12 9:28 ` Peter Zijlstra 2017-01-26 1:25 ` Andres Freund 0 siblings, 1 reply; 29+ messages in thread From: Peter Zijlstra @ 2016-12-12 9:28 UTC (permalink / raw) To: Andres Freund Cc: linux-kernel, Stephane Eranian, acme, jolsa, mingo, anton, namhyung, Stephane Eranian On Mon, Dec 12, 2016 at 01:01:48AM -0800, Andres Freund wrote: > > > On December 12, 2016 12:49:03 AM PST, Peter Zijlstra <peterz@infradead.org> wrote: > >On Fri, Dec 09, 2016 at 09:02:18PM -0800, Andres Freund wrote: > > >> Am I doing something wrong, or is there a bug here? > > > >Expected behaviour afaict > > So I need to prevent vma merging to use perf jit support? That seems > a bit weird. No idea about that, I don't really have the whole JIT stuff present atm. I just know how the mmap stuff works. Stephane? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2016-12-12 9:28 ` Peter Zijlstra @ 2017-01-26 1:25 ` Andres Freund 0 siblings, 0 replies; 29+ messages in thread From: Andres Freund @ 2017-01-26 1:25 UTC (permalink / raw) To: Stephane Eranian Cc: linux-kernel, Peter Zijlstra, acme, jolsa, mingo, anton, namhyung, Stephane Eranian Hi, On 2016-12-12 10:28:13 +0100, Peter Zijlstra wrote: > On Mon, Dec 12, 2016 at 01:01:48AM -0800, Andres Freund wrote: > > > > > > On December 12, 2016 12:49:03 AM PST, Peter Zijlstra <peterz@infradead.org> wrote: > > >On Fri, Dec 09, 2016 at 09:02:18PM -0800, Andres Freund wrote: > > > > >> Am I doing something wrong, or is there a bug here? > > > > > >Expected behaviour afaict > > > > So I need to prevent vma merging to use perf jit support? That seems > > a bit weird. > > No idea about that, I don't really have the whole JIT stuff present atm. > I just know how the mmap stuff works. > > Stephane? Ping? Andres ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2016-12-12 8:49 ` Peter Zijlstra 2016-12-12 9:01 ` Andres Freund @ 2017-01-26 22:15 ` Peter Zijlstra 2017-01-26 23:04 ` Andres Freund 2017-01-30 11:52 ` [tip:perf/core] perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory tip-bot for Peter Zijlstra 1 sibling, 2 replies; 29+ messages in thread From: Peter Zijlstra @ 2017-01-26 22:15 UTC (permalink / raw) To: Andres Freund Cc: linux-kernel, Stephane Eranian, acme, jolsa, mingo, anton, namhyung, Stephane Eranian On Mon, Dec 12, 2016 at 09:49:03AM +0100, Peter Zijlstra wrote: > > BTW, it's also a bit weird that those MMAP2 records triggered by > > mprotect/mmap, have prot set to 0... > > Yes, mprotect() does: vma->vm_flags = newflags; before calling > perf_event_mmap(vma); which then looks at VM_{READ,WRITE,EXEC} bits in > that word to generate the prot value. > > So that being 0 is a bit weird. OK, that was a silly bug :/ With that fixed, the following proglet: #include <unistd.h> #include <signal.h> #include <stdio.h> #include <malloc.h> #include <stdlib.h> #include <errno.h> #include <sys/mman.h> void main(void) { void *ptr[5]; int i; for (i=0; i<5; i++) { ptr[i] = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); *(char *)ptr[i] = '\0'; mprotect(ptr[i], 4096, PROT_EXEC); } } Gives: root@ivb-ep:~# perf record -d ./mprot ... root@ivb-ep:~# perf report -D | grep MMAP2 ... 43805378481 0xc60 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec0862000(0x1000) @ 0x7f6ec0862000 00:00 0 0]: --xp //anon 43805381755 0xcc0 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec0861000(0x1000) @ 0x7f6ec0861000 00:00 0 0]: rw-p //anon 43805392374 0xd20 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec0861000(0x2000) @ 0x7f6ec0861000 00:00 0 0]: --xp //anon 43805395093 0xd80 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec0860000(0x1000) @ 0x7f6ec0860000 00:00 0 0]: rw-p //anon 43805402842 0xde0 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec0860000(0x3000) @ 0x7f6ec0860000 00:00 0 0]: --xp //anon 43805405388 0xe40 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec085f000(0x1000) @ 0x7f6ec085f000 00:00 0 0]: rw-p //anon 43805412605 0xea0 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec085f000(0x4000) @ 0x7f6ec085f000 00:00 0 0]: --xp //anon 43805415134 0xf00 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec085e000(0x1000) @ 0x7f6ec085e000 00:00 0 0]: rw-p //anon 43805422225 0xf60 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec085e000(0x5000) @ 0x7f6ec085e000 00:00 0 0]: --xp //anon If you omit the mmap_data=1 thing, it looks like: 123087941779 0x500 [0x60]: PERF_RECORD_MMAP2 2084/2084: [0x7fca5a8bd000(0x1000) @ 0x7fca5a8bd000 00:00 0 0]: --xp //anon 123087953825 0x560 [0x60]: PERF_RECORD_MMAP2 2084/2084: [0x7fca5a8bc000(0x2000) @ 0x7fca5a8bc000 00:00 0 0]: --xp //anon 123087962944 0x5c0 [0x60]: PERF_RECORD_MMAP2 2084/2084: [0x7fca5a8bb000(0x3000) @ 0x7fca5a8bb000 00:00 0 0]: --xp //anon 123087971135 0x620 [0x60]: PERF_RECORD_MMAP2 2084/2084: [0x7fca5a8ba000(0x4000) @ 0x7fca5a8ba000 00:00 0 0]: --xp //anon 123087979360 0x680 [0x60]: PERF_RECORD_MMAP2 2084/2084: [0x7fca5a8b9000(0x5000) @ 0x7fca5a8b9000 00:00 0 0]: --xp //anon --- diff --git a/kernel/events/core.c b/kernel/events/core.c index 896aa53..e67c5ba 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -6660,6 +6683,27 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) char *buf = NULL; char *name; + if (vma->vm_flags & VM_READ) + prot |= PROT_READ; + if (vma->vm_flags & VM_WRITE) + prot |= PROT_WRITE; + if (vma->vm_flags & VM_EXEC) + prot |= PROT_EXEC; + + if (vma->vm_flags & VM_MAYSHARE) + flags = MAP_SHARED; + else + flags = MAP_PRIVATE; + + if (vma->vm_flags & VM_DENYWRITE) + flags |= MAP_DENYWRITE; + if (vma->vm_flags & VM_MAYEXEC) + flags |= MAP_EXECUTABLE; + if (vma->vm_flags & VM_LOCKED) + flags |= MAP_LOCKED; + if (vma->vm_flags & VM_HUGETLB) + flags |= MAP_HUGETLB; + if (file) { struct inode *inode; dev_t dev; @@ -6686,27 +6730,6 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) maj = MAJOR(dev); min = MINOR(dev); - if (vma->vm_flags & VM_READ) - prot |= PROT_READ; - if (vma->vm_flags & VM_WRITE) - prot |= PROT_WRITE; - if (vma->vm_flags & VM_EXEC) - prot |= PROT_EXEC; - - if (vma->vm_flags & VM_MAYSHARE) - flags = MAP_SHARED; - else - flags = MAP_PRIVATE; - - if (vma->vm_flags & VM_DENYWRITE) - flags |= MAP_DENYWRITE; - if (vma->vm_flags & VM_MAYEXEC) - flags |= MAP_EXECUTABLE; - if (vma->vm_flags & VM_LOCKED) - flags |= MAP_LOCKED; - if (vma->vm_flags & VM_HUGETLB) - flags |= MAP_HUGETLB; - goto got_name; } else { if (vma->vm_ops && vma->vm_ops->name) { ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 22:15 ` Peter Zijlstra @ 2017-01-26 23:04 ` Andres Freund 2017-01-30 11:52 ` [tip:perf/core] perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory tip-bot for Peter Zijlstra 1 sibling, 0 replies; 29+ messages in thread From: Andres Freund @ 2017-01-26 23:04 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Stephane Eranian, acme, jolsa, mingo, anton, namhyung, Stephane Eranian On 2017-01-26 23:15:08 +0100, Peter Zijlstra wrote: > On Mon, Dec 12, 2016 at 09:49:03AM +0100, Peter Zijlstra wrote: > > > > BTW, it's also a bit weird that those MMAP2 records triggered by > > > mprotect/mmap, have prot set to 0... > > > > Yes, mprotect() does: vma->vm_flags = newflags; before calling > > perf_event_mmap(vma); which then looks at VM_{READ,WRITE,EXEC} bits in > > that word to generate the prot value. > > > > So that being 0 is a bit weird. > > OK, that was a silly bug :/ Heh, that explains that part. Thanks for looking into it! > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 896aa53..e67c5ba 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -6660,6 +6683,27 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) > char *buf = NULL; > char *name; > > + if (vma->vm_flags & VM_READ) > + prot |= PROT_READ; > + if (vma->vm_flags & VM_WRITE) > + prot |= PROT_WRITE; > + if (vma->vm_flags & VM_EXEC) > + prot |= PROT_EXEC; > + > + if (vma->vm_flags & VM_MAYSHARE) > + flags = MAP_SHARED; > + else > + flags = MAP_PRIVATE; > + > + if (vma->vm_flags & VM_DENYWRITE) > + flags |= MAP_DENYWRITE; > + if (vma->vm_flags & VM_MAYEXEC) > + flags |= MAP_EXECUTABLE; > + if (vma->vm_flags & VM_LOCKED) > + flags |= MAP_LOCKED; > + if (vma->vm_flags & VM_HUGETLB) > + flags |= MAP_HUGETLB; > + > if (file) { > struct inode *inode; > dev_t dev; > @@ -6686,27 +6730,6 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) > maj = MAJOR(dev); > min = MINOR(dev); > > - if (vma->vm_flags & VM_READ) > - prot |= PROT_READ; > - if (vma->vm_flags & VM_WRITE) > - prot |= PROT_WRITE; > - if (vma->vm_flags & VM_EXEC) > - prot |= PROT_EXEC; > - > - if (vma->vm_flags & VM_MAYSHARE) > - flags = MAP_SHARED; > - else > - flags = MAP_PRIVATE; > - > - if (vma->vm_flags & VM_DENYWRITE) > - flags |= MAP_DENYWRITE; > - if (vma->vm_flags & VM_MAYEXEC) > - flags |= MAP_EXECUTABLE; > - if (vma->vm_flags & VM_LOCKED) > - flags |= MAP_LOCKED; > - if (vma->vm_flags & VM_HUGETLB) > - flags |= MAP_HUGETLB; > - > goto got_name; > } else { > if (vma->vm_ops && vma->vm_ops->name) { I'm not a mm/perf (or kernel in general) expert. But this looks obviously buggy to me. And the fix makes it look correct. So if that's helpful (I'm not actually sure): Signed-off-by: Andres Freund <andres@anarazel.de> Not sure if this on its own has enough consequences to be worth being put into stable? Not that it's this fix's responsibility, but I did note that that piece of code doesn't maintain MAP_GROWSDOWN, nor set the bits controlling the size of the HUGETLB mapping. Seems harmless enough, but given that I already looked I thought it'd bring it up. Greetings, Andres Freund ^ permalink raw reply [flat|nested] 29+ messages in thread
* [tip:perf/core] perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory 2017-01-26 22:15 ` Peter Zijlstra 2017-01-26 23:04 ` Andres Freund @ 2017-01-30 11:52 ` tip-bot for Peter Zijlstra 1 sibling, 0 replies; 29+ messages in thread From: tip-bot for Peter Zijlstra @ 2017-01-30 11:52 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, torvalds, mingo, jolsa, andres, acme, tglx, peterz, alexander.shishkin, eranian, hpa, eranian Commit-ID: 0b3589be9b98994ce3d5aeca52445d1f5627c4ba Gitweb: http://git.kernel.org/tip/0b3589be9b98994ce3d5aeca52445d1f5627c4ba Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Thu, 26 Jan 2017 23:15:08 +0100 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Mon, 30 Jan 2017 11:41:26 +0100 perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory Andres reported that MMAP2 records for anonymous memory always have their protection field 0. Turns out, someone daft put the prot/flags generation code in the file branch, leaving them unset for anonymous memory. Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Don Zickus <dzickus@redhat.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: anton@ozlabs.org Cc: namhyung@kernel.org Cc: stable@vger.kernel.org # v3.16+ Fixes: f972eb63b100 ("perf: Pass protection and flags bits through mmap2 interface") Link: http://lkml.kernel.org/r/20170126221508.GF6536@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/events/core.c | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 4e1f4c0..e5aaa80 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -6632,6 +6632,27 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) char *buf = NULL; char *name; + if (vma->vm_flags & VM_READ) + prot |= PROT_READ; + if (vma->vm_flags & VM_WRITE) + prot |= PROT_WRITE; + if (vma->vm_flags & VM_EXEC) + prot |= PROT_EXEC; + + if (vma->vm_flags & VM_MAYSHARE) + flags = MAP_SHARED; + else + flags = MAP_PRIVATE; + + if (vma->vm_flags & VM_DENYWRITE) + flags |= MAP_DENYWRITE; + if (vma->vm_flags & VM_MAYEXEC) + flags |= MAP_EXECUTABLE; + if (vma->vm_flags & VM_LOCKED) + flags |= MAP_LOCKED; + if (vma->vm_flags & VM_HUGETLB) + flags |= MAP_HUGETLB; + if (file) { struct inode *inode; dev_t dev; @@ -6658,27 +6679,6 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event) maj = MAJOR(dev); min = MINOR(dev); - if (vma->vm_flags & VM_READ) - prot |= PROT_READ; - if (vma->vm_flags & VM_WRITE) - prot |= PROT_WRITE; - if (vma->vm_flags & VM_EXEC) - prot |= PROT_EXEC; - - if (vma->vm_flags & VM_MAYSHARE) - flags = MAP_SHARED; - else - flags = MAP_PRIVATE; - - if (vma->vm_flags & VM_DENYWRITE) - flags |= MAP_DENYWRITE; - if (vma->vm_flags & VM_MAYEXEC) - flags |= MAP_EXECUTABLE; - if (vma->vm_flags & VM_LOCKED) - flags |= MAP_LOCKED; - if (vma->vm_flags & VM_HUGETLB) - flags |= MAP_HUGETLB; - goto got_name; } else { if (vma->vm_ops && vma->vm_ops->name) { ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2016-12-10 5:02 perf/jit doesn't cope well with mprotect() to jit containing pages Andres Freund 2016-12-12 8:49 ` Peter Zijlstra @ 2017-01-26 20:32 ` Stephane Eranian 2017-01-26 21:00 ` Andres Freund 1 sibling, 1 reply; 29+ messages in thread From: Stephane Eranian @ 2017-01-26 20:32 UTC (permalink / raw) To: Andres Freund Cc: LKML, Stephane Eranian, Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar, anton, Namhyung Kim Hi Andres, Sorry for the delay. On Fri, Dec 9, 2016 at 9:02 PM, Andres Freund <andres@anarazel.de> wrote: > Hi, > > While working on optionally jit-compiling parts of postgres using llvm > (MCJIT currently, but Orc would have the same issue afaics), I'm trying > to use perf jit support to make profiling of those JITed parts easier. > > Turns out the current jit support in perf doesn't work that well for > LLVM - but it doesn't primarily look like LLVM's fault. Syscall-wise > llvm does (heavily filtered): > Ok, when you say you are using perf jit with LLVM, I want to understand how? The perf jit support requires cooperation from the jitted runtime. For the case of Java there is a JVMTI agent (shared lib) that generates a jitdump info file with information about how the runtime jitted functions and what is mapped where. The kernel is still recording the mmap syscalls. But you need more to map an address to a symbol. You need to know where and when the JIT compiler mapped functions, when they are jitted, re-jitted or moved. This is the role of the agent to encode all of this in the jitdump file which is then used by perf inject to rewrite the perf.data file with the new mmap records that point to the .so files that it generates for each jitted function. Did you write such support? Is it already there in LLVM? If not, then you will not be able to correctly correlate symbols. > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000 > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000 > mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0 > mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0 > write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68 > > mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000 > mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000 > mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0 > mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0 > write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68 > > ... > > i.e. it mmaps single pages for the each JITed function's sections. Which > makes sense, because the first function is JITed independently from the > second one. > > The corresponding MMAP2 records according to perf perf script > --show-mmap-events are: > postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon > postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon > postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon > postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon > Note how the size of the mapping continually increases, so that the each > MMAP2 record covers previous sections. > > If one perf inject --jit into that it looks like: > postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon > postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon > postgres 4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so > postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon > postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon > postgres 4107 595444.884232: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x45) @ 0x40 fd:02 33434599 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-2.so > > Note how the first injected record is also covered by the following > "//anon" event. This leads to the the curious effect that samples for > the first function (evalexpr0) are associated with the right generated > .so, until the second function is JITed. > > I hacked up perf inject to omit such MMAP2 records by adding > if (event->mmap2.prot == 0) > return 0; > to perf_event__jit_repipe_mmap2() and suddenly things work. > > I presume the increasing MMAP2 size is triggered by the consecutive > pages being represented as a single page-range in the kernel? > > If I, to work around such consecutive pages, force another page to be > mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes > away. > > Am I doing something wrong, or is there a bug here? > > FWIW, this is on linux 4.8.8, with perf from master > (v4.9-rc8-108-g810ac7b7558d). > > BTW, it's also a bit weird that those MMAP2 records triggered by > mprotect/mmap, have prot set to 0... > > Regards, > > Andres > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 20:32 ` perf/jit doesn't cope well with mprotect() to jit containing pages Stephane Eranian @ 2017-01-26 21:00 ` Andres Freund 2017-01-26 21:17 ` Stephane Eranian 2017-01-26 22:19 ` Peter Zijlstra 0 siblings, 2 replies; 29+ messages in thread From: Andres Freund @ 2017-01-26 21:00 UTC (permalink / raw) To: eranian Cc: LKML, Stephane Eranian, Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar, anton, Namhyung Kim Hi Stephane, On 2017-01-26 12:32:02 -0800, Stephane Eranian wrote: > On Fri, Dec 9, 2016 at 9:02 PM, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > > > While working on optionally jit-compiling parts of postgres using llvm > > (MCJIT currently, but Orc would have the same issue afaics), I'm trying > > to use perf jit support to make profiling of those JITed parts easier. > > > > Turns out the current jit support in perf doesn't work that well for > > LLVM - but it doesn't primarily look like LLVM's fault. Syscall-wise > > llvm does (heavily filtered): > > > Ok, when you say you are using perf jit with LLVM, I want to understand how? Sure. > The perf jit support requires cooperation from the jitted runtime. For > the case of Java > there is a JVMTI agent (shared lib) that generates a jitdump info file > with information about > how the runtime jitted functions and what is mapped where. The kernel > is still recording > the mmap syscalls. But you need more to map an address to a symbol. > You need to know > where and when the JIT compiler mapped functions, when they are > jitted, re-jitted or moved. This is the > role of the agent to encode all of this in the jitdump file which is > then used by perf inject > to rewrite the perf.data file with the new mmap records that point to > the .so files that it generates > for each jitted function. Right. > Did you write such support? Yes. LLVM's jit has callback support both for symbol and debugging information. You can register additional sets of callbacks to be called (at runtime). There's basically +void PerfJITEventListener::NotifyObjectEmitted( + const ObjectFile &Obj, + const RuntimeDyld::LoadedObjectInfo &L) { and +void PerfJITEventListener::NotifyFreeingObject(const ObjectFile &Obj) { which is enough to use the perf jit support. > Is it already there in LLVM? Nope, not yet. I didn't want to submit an implementation that has the ugly hack of mmap()ing /dev/zero pages to prevent VMA merging ;). But once that's resolved I plan to push it upstream (they indicated interest). As long as I somehow prevent VMA merging (or just filter out PERF_RECORD_MMAP2 records with prot 0), that support works. The problem is that (quoted below) without that hack the subsequent mmaps just expand the previous VMAs which leads to perf loosing its (address,range) -> symbol mappings for previously (in the same expanded range) known symbols. Greetings, Andres > If not, then you will not be able to correctly correlate symbols. > > > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000 > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000 > > mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0 > > mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0 > > write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68 > > > > mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000 > > mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000 > > mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0 > > mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0 > > write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68 > > > > ... > > > > i.e. it mmaps single pages for the each JITed function's sections. Which > > makes sense, because the first function is JITed independently from the > > second one. > > > > The corresponding MMAP2 records according to perf perf script > > --show-mmap-events are: > > postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon > > postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon > > postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon > > postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon > > Note how the size of the mapping continually increases, so that the each > > MMAP2 record covers previous sections. > > > > If one perf inject --jit into that it looks like: > > postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon > > postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon > > postgres 4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so > > postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon > > postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon > > postgres 4107 595444.884232: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x45) @ 0x40 fd:02 33434599 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-2.so > > > > Note how the first injected record is also covered by the following > > "//anon" event. This leads to the the curious effect that samples for > > the first function (evalexpr0) are associated with the right generated > > .so, until the second function is JITed. > > > > I hacked up perf inject to omit such MMAP2 records by adding > > if (event->mmap2.prot == 0) > > return 0; > > to perf_event__jit_repipe_mmap2() and suddenly things work. > > > > I presume the increasing MMAP2 size is triggered by the consecutive > > pages being represented as a single page-range in the kernel? > > > > If I, to work around such consecutive pages, force another page to be > > mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes > > away. > > > > Am I doing something wrong, or is there a bug here? > > > > FWIW, this is on linux 4.8.8, with perf from master > > (v4.9-rc8-108-g810ac7b7558d). > > > > BTW, it's also a bit weird that those MMAP2 records triggered by > > mprotect/mmap, have prot set to 0... > > > > Regards, > > > > Andres > > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 21:00 ` Andres Freund @ 2017-01-26 21:17 ` Stephane Eranian 2017-01-26 21:22 ` Andres Freund 2017-01-26 22:19 ` Peter Zijlstra 1 sibling, 1 reply; 29+ messages in thread From: Stephane Eranian @ 2017-01-26 21:17 UTC (permalink / raw) To: Andres Freund Cc: eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar, Anton Blanchard, Namhyung Kim On Thu, Jan 26, 2017 at 1:00 PM, Andres Freund <andres@anarazel.de> wrote: > > Hi Stephane, > > On 2017-01-26 12:32:02 -0800, Stephane Eranian wrote: > > On Fri, Dec 9, 2016 at 9:02 PM, Andres Freund <andres@anarazel.de> wrote: > > > Hi, > > > > > > While working on optionally jit-compiling parts of postgres using llvm > > > (MCJIT currently, but Orc would have the same issue afaics), I'm trying > > > to use perf jit support to make profiling of those JITed parts easier. > > > > > > Turns out the current jit support in perf doesn't work that well for > > > LLVM - but it doesn't primarily look like LLVM's fault. Syscall-wise > > > llvm does (heavily filtered): > > > > > Ok, when you say you are using perf jit with LLVM, I want to understand how? > > Sure. > > > > The perf jit support requires cooperation from the jitted runtime. For > > the case of Java > > there is a JVMTI agent (shared lib) that generates a jitdump info file > > with information about > > how the runtime jitted functions and what is mapped where. The kernel > > is still recording > > the mmap syscalls. But you need more to map an address to a symbol. > > You need to know > > where and when the JIT compiler mapped functions, when they are > > jitted, re-jitted or moved. This is the > > role of the agent to encode all of this in the jitdump file which is > > then used by perf inject > > to rewrite the perf.data file with the new mmap records that point to > > the .so files that it generates > > for each jitted function. > > Right. > > > > Did you write such support? > > Yes. LLVM's jit has callback support both for symbol and debugging > information. You can register additional sets of callbacks to be called > (at runtime). > > There's basically > +void PerfJITEventListener::NotifyObjectEmitted( > + const ObjectFile &Obj, > + const RuntimeDyld::LoadedObjectInfo &L) { > and > +void PerfJITEventListener::NotifyFreeingObject(const ObjectFile &Obj) { > > which is enough to use the perf jit support. > > > > Is it already there in LLVM? > > Nope, not yet. I didn't want to submit an implementation that has the > ugly hack of mmap()ing /dev/zero pages to prevent VMA merging ;). But > once that's resolved I plan to push it upstream (they indicated > interest). As long as I somehow prevent VMA merging (or just filter > out PERF_RECORD_MMAP2 records with prot 0), that support works. > > The problem is that (quoted below) without that hack the subsequent > mmaps just expand the previous VMAs which leads to perf loosing its > (address,range) -> symbol mappings for previously (in the same expanded > range) known symbols. > Which timeclock source are you using to timestamp the jitdump records? It needs to match whatever you are using with perf record. Otherwise, inject will not work correctly. If you use CLOCK_MONOTONIC, then you need to use perf record -k mono as well. > > Greetings, > > Andres > > > If not, then you will not be able to correctly correlate symbols. > > > > > > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000 > > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000 > > > mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0 > > > mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0 > > > write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68 > > > > > > mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000 > > > mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000 > > > mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0 > > > mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0 > > > write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68 > > > > > > ... > > > > > > i.e. it mmaps single pages for the each JITed function's sections. Which > > > makes sense, because the first function is JITed independently from the > > > second one. > > > > > > The corresponding MMAP2 records according to perf perf script > > > --show-mmap-events are: > > > postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon > > > postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon > > > postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon > > > postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon > > > Note how the size of the mapping continually increases, so that the each > > > MMAP2 record covers previous sections. > > > > > > If one perf inject --jit into that it looks like: > > > postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon > > > postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon > > > postgres 4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so > > > postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon > > > postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon > > > postgres 4107 595444.884232: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x45) @ 0x40 fd:02 33434599 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-2.so > > > > > > Note how the first injected record is also covered by the following > > > "//anon" event. This leads to the the curious effect that samples for > > > the first function (evalexpr0) are associated with the right generated > > > .so, until the second function is JITed. > > > > > > I hacked up perf inject to omit such MMAP2 records by adding > > > if (event->mmap2.prot == 0) > > > return 0; > > > to perf_event__jit_repipe_mmap2() and suddenly things work. > > > > > > I presume the increasing MMAP2 size is triggered by the consecutive > > > pages being represented as a single page-range in the kernel? > > > > > > If I, to work around such consecutive pages, force another page to be > > > mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes > > > away. > > > > > > Am I doing something wrong, or is there a bug here? > > > > > > FWIW, this is on linux 4.8.8, with perf from master > > > (v4.9-rc8-108-g810ac7b7558d). > > > > > > BTW, it's also a bit weird that those MMAP2 records triggered by > > > mprotect/mmap, have prot set to 0... > > > > > > Regards, > > > > > > Andres > > > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 21:17 ` Stephane Eranian @ 2017-01-26 21:22 ` Andres Freund 2017-01-26 21:34 ` Stephane Eranian 0 siblings, 1 reply; 29+ messages in thread From: Andres Freund @ 2017-01-26 21:22 UTC (permalink / raw) To: Stephane Eranian Cc: eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar, Anton Blanchard, Namhyung Kim On 2017-01-26 13:17:56 -0800, Stephane Eranian wrote: > > Nope, not yet. I didn't want to submit an implementation that has the > > ugly hack of mmap()ing /dev/zero pages to prevent VMA merging ;). But > > once that's resolved I plan to push it upstream (they indicated > > interest). As long as I somehow prevent VMA merging (or just filter > > out PERF_RECORD_MMAP2 records with prot 0), that support works. > > > > The problem is that (quoted below) without that hack the subsequent > > mmaps just expand the previous VMAs which leads to perf loosing its > > (address,range) -> symbol mappings for previously (in the same expanded > > range) known symbols. > > > > Which timeclock source are you using to timestamp the jitdump records? > It needs to match whatever you are using with perf record. Otherwise, inject > will not work correctly. If you use CLOCK_MONOTONIC, then you need to > use perf record -k mono as well. Neither works without preventing the same VMA from being reused, both work with. Look at the following strace output from a demo application creating two JITed functions: mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000 mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0 mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0 write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68 mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000 mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000 mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0 mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0 write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68 ... i.e. it mmaps single pages for the each JITed function's sections. Which makes sense, because the first function is JITed independently from the second one. The corresponding MMAP2 records according to perf perf script --show-mmap-events are: postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon Note how the size of the mapping continually increases, so that the each MMAP2 record covers previous sections. If one perf inject --jit into that it looks like: postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon postgres 4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon postgres 4107 595444.884232: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x45) @ 0x40 fd:02 33434599 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-2.so Note how the first injected record is also covered by the following "//anon" event. This leads to the the curious effect that samples for the first function (evalexpr0) are associated with the right generated .so, until the second function is JITed. I hacked up perf inject to omit such MMAP2 records by adding if (event->mmap2.prot == 0) return 0; to perf_event__jit_repipe_mmap2() and suddenly things work. I presume the increasing MMAP2 size is triggered by the consecutive pages being represented as a single page-range in the kernel? If I, to work around such consecutive pages, force another page to be mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes away. BTW, it's also a bit weird that those MMAP2 records triggered by mprotect/mmap, have prot set to 0... - Andres ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 21:22 ` Andres Freund @ 2017-01-26 21:34 ` Stephane Eranian 2017-01-26 21:51 ` Andres Freund 0 siblings, 1 reply; 29+ messages in thread From: Stephane Eranian @ 2017-01-26 21:34 UTC (permalink / raw) To: Andres Freund Cc: eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar, Anton Blanchard, Namhyung Kim On Thu, Jan 26, 2017 at 1:22 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2017-01-26 13:17:56 -0800, Stephane Eranian wrote: > > > Nope, not yet. I didn't want to submit an implementation that has the > > > ugly hack of mmap()ing /dev/zero pages to prevent VMA merging ;). But > > > once that's resolved I plan to push it upstream (they indicated > > > interest). As long as I somehow prevent VMA merging (or just filter > > > out PERF_RECORD_MMAP2 records with prot 0), that support works. > > > > > > The problem is that (quoted below) without that hack the subsequent > > > mmaps just expand the previous VMAs which leads to perf loosing its > > > (address,range) -> symbol mappings for previously (in the same expanded > > > range) known symbols. > > > > > > > Which timeclock source are you using to timestamp the jitdump records? > > It needs to match whatever you are using with perf record. Otherwise, inject > > will not work correctly. If you use CLOCK_MONOTONIC, then you need to > > use perf record -k mono as well. > > Neither works without preventing the same VMA from being reused, both > work with. > There is something I am not following here unless there is a bug in inject or perf report. The timestamps should take care of ordering what is mapped where and when. If look at the timestamps: postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon postgres 4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit- The MMAP2 injected for the jitted function appears after the 2 mappings. If you have samples timestamped between 595444.867737 and 595444.868140 - 1 coming from that function, they will be symbolized correctly. Otherwise things should work as perf report will consider from 595444.868140 that you jitted function cover the range of addresses. Unless there is something wrong in perf report or I am not understanding the problem. postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon postgres 4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit- > > Look at the following strace output from a demo application creating two > JITed functions: > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000 > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000 > mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0 > mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0 > write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68 > > mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000 > mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000 > mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0 > mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0 > write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68 > > ... > > i.e. it mmaps single pages for the each JITed function's sections. Which > makes sense, because the first function is JITed independently from the > second one. > > The corresponding MMAP2 records according to perf perf script > --show-mmap-events are: > postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon > postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon > postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon > postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon > Note how the size of the mapping continually increases, so that the each > MMAP2 record covers previous sections. > > If one perf inject --jit into that it looks like: > postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon > postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon > postgres 4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so > postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon > postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon > postgres 4107 595444.884232: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x45) @ 0x40 fd:02 33434599 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-2.so > > Note how the first injected record is also covered by the following > "//anon" event. This leads to the the curious effect that samples for > the first function (evalexpr0) are associated with the right generated > .so, until the second function is JITed. > > I hacked up perf inject to omit such MMAP2 records by adding > if (event->mmap2.prot == 0) > return 0; > to perf_event__jit_repipe_mmap2() and suddenly things work. > > I presume the increasing MMAP2 size is triggered by the consecutive > pages being represented as a single page-range in the kernel? > > If I, to work around such consecutive pages, force another page to be > mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes > away. > > BTW, it's also a bit weird that those MMAP2 records triggered by > mprotect/mmap, have prot set to 0... > > - Andres ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 21:34 ` Stephane Eranian @ 2017-01-26 21:51 ` Andres Freund 0 siblings, 0 replies; 29+ messages in thread From: Andres Freund @ 2017-01-26 21:51 UTC (permalink / raw) To: Stephane Eranian Cc: eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar, Anton Blanchard, Namhyung Kim Hi, On 2017-01-26 13:34:35 -0800, Stephane Eranian wrote: > On Thu, Jan 26, 2017 at 1:22 PM, Andres Freund <andres@anarazel.de> wrote: > > > > On 2017-01-26 13:17:56 -0800, Stephane Eranian wrote: > > > > Nope, not yet. I didn't want to submit an implementation that has the > > > > ugly hack of mmap()ing /dev/zero pages to prevent VMA merging ;). But > > > > once that's resolved I plan to push it upstream (they indicated > > > > interest). As long as I somehow prevent VMA merging (or just filter > > > > out PERF_RECORD_MMAP2 records with prot 0), that support works. > > > > > > > > The problem is that (quoted below) without that hack the subsequent > > > > mmaps just expand the previous VMAs which leads to perf loosing its > > > > (address,range) -> symbol mappings for previously (in the same expanded > > > > range) known symbols. > > > > > > > > > > Which timeclock source are you using to timestamp the jitdump records? > > > It needs to match whatever you are using with perf record. Otherwise, inject > > > will not work correctly. If you use CLOCK_MONOTONIC, then you need to > > > use perf record -k mono as well. > > > > Neither works without preventing the same VMA from being reused, both > > work with. > > > There is something I am not following here unless there is a bug in inject > or perf report. > The timestamps should take care of ordering what is mapped where and when. > If look at the timestamps: > > postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: > [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon > > postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: > [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon > > postgres 4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: > [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs > /home/andres/.debug/jit/llvm-IR-jit- > > The MMAP2 injected for the jitted function appears after the 2 > mappings. If you have samples timestamped between > > 595444.867737 and 595444.868140 - 1 coming from that function, they > will be symbolized correctly. Otherwise things should work as perf > report will consider from 595444.868140 that you jitted function > cover the range of addresses. Unless there is something wrong in perf > report or I am not understanding the problem. If I just have a single JITed function it *is* symbolized correctly. But if I have multiple ones only the last one is correctly symbolized... The problem appears to be triggered by a single VMA being used for all the mappings. Check how the length of the overall mapping increases from 0x7efd3866e000(0x1000) increasing to 0x7efd3866b000(0x4000) (that's the same mapping if you add the length, just increased in size by another page). When script shows > postgres 4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so > postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon all hits for 0x7efd3866e000 are symbolized correctly if, and only if, they occur before the second PERF_RECORD_MMAP2 mapping to //anon. The next function (JITing which triggered enlarging the VMA) is then correctly symbolized, until the third function is JITed... I've confirmed this by ignoring these //anon mappings by adding if (event->mmap2.prot == 0) return 0; to perf_event__jit_repipe_mmap2() - then everything works. (note that I don't understand why prot is 0 in the first place for these, but it was a convenient hack to filter the records). Which seems to suggest to me that perf's tracking of memory regions is over-eager in overwriting more specific mappings like 0x7efd3866e000(0x45) to /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so with the larger 0x7efd3866c000(0x3000) to //anon mapping. Similarly, if I prevent the independent mmap()s from sharing a VMA (by creating a mapping inbetween the individual function mappings which can't be shared), everything works as well. - Andres ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 21:00 ` Andres Freund 2017-01-26 21:17 ` Stephane Eranian @ 2017-01-26 22:19 ` Peter Zijlstra 2017-01-26 22:26 ` Stephane Eranian 1 sibling, 1 reply; 29+ messages in thread From: Peter Zijlstra @ 2017-01-26 22:19 UTC (permalink / raw) To: Andres Freund Cc: eranian, LKML, Stephane Eranian, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, anton, Namhyung Kim On Thu, Jan 26, 2017 at 01:00:52PM -0800, Andres Freund wrote: > The problem is that (quoted below) without that hack the subsequent > mmaps just expand the previous VMAs which leads to perf loosing its > (address,range) -> symbol mappings for previously (in the same expanded > range) known symbols. I'm assuming this is the userspace tool doing that, right? Acme, is that something we can cure? Instead of throwing away eveything known of the previous map, merge them when they grow? So I think the problem was that we don't track munmap(), so if you see two overlapping mappings, we have no clue and assume an munmap() has happened or something. I can't really remember, its been many many years since I wrote all that :-( ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 22:19 ` Peter Zijlstra @ 2017-01-26 22:26 ` Stephane Eranian 2017-01-26 22:38 ` Andres Freund 0 siblings, 1 reply; 29+ messages in thread From: Stephane Eranian @ 2017-01-26 22:26 UTC (permalink / raw) To: Peter Zijlstra Cc: Andres Freund, eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim On Thu, Jan 26, 2017 at 2:19 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Thu, Jan 26, 2017 at 01:00:52PM -0800, Andres Freund wrote: >> The problem is that (quoted below) without that hack the subsequent >> mmaps just expand the previous VMAs which leads to perf loosing its >> (address,range) -> symbol mappings for previously (in the same expanded >> range) known symbols. > > I'm assuming this is the userspace tool doing that, right? > > Acme, is that something we can cure? Instead of throwing away eveything > known of the previous map, merge them when they grow? > > So I think the problem was that we don't track munmap(), so if you see > two overlapping mappings, we have no clue and assume an munmap() has > happened or something. I can't really remember, its been many many years > since I wrote all that :-( But perf report takes care of overlapping mmaps normally, see util/map.c:map_groups__fixup_overlappings VMA merging does not change anything at the user level. The kernel will see two mmaps and so will perf record and perf report. This is what you have. In the samples I quoted from your example, this is what should happen. If it does not then there is something wrong in perf report. In the case of Java, if I recall, there is a single large mmap for the code cache and it works fine. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 22:26 ` Stephane Eranian @ 2017-01-26 22:38 ` Andres Freund 2017-01-26 22:51 ` Stephane Eranian 0 siblings, 1 reply; 29+ messages in thread From: Andres Freund @ 2017-01-26 22:38 UTC (permalink / raw) To: Stephane Eranian Cc: Peter Zijlstra, eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim On 2017-01-26 14:26:12 -0800, Stephane Eranian wrote: > On Thu, Jan 26, 2017 at 2:19 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Thu, Jan 26, 2017 at 01:00:52PM -0800, Andres Freund wrote: > >> The problem is that (quoted below) without that hack the subsequent > >> mmaps just expand the previous VMAs which leads to perf loosing its > >> (address,range) -> symbol mappings for previously (in the same expanded > >> range) known symbols. > > > > I'm assuming this is the userspace tool doing that, right? > > > > Acme, is that something we can cure? Instead of throwing away eveything > > known of the previous map, merge them when they grow? > > > > So I think the problem was that we don't track munmap(), so if you see > > two overlapping mappings, we have no clue and assume an munmap() has > > happened or something. I can't really remember, its been many many years > > since I wrote all that :-( > > But perf report takes care of overlapping mmaps normally, > see util/map.c:map_groups__fixup_overlappings > > VMA merging does not change anything at the user level. > The kernel will see two mmaps and so will perf record and perf report. But the kernel *didn't* report two independent mmaps. Looking at strace shows that the mmaps were done independently in page sized chunks: mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000 mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0 mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0 write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68 (first function) mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000 mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000 mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0 mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0 write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68 (second function) But then looking at the unannotated perf script --show-mmap-events: postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon these four mmaps regions are: 0x7efd3866e000(0x1000): 0x7efd3866e000 - 0x7efd3866f000 0x7efd3866d000(0x2000): 0x7efd3866d000 - 0x7efd3866f000 0x7efd3866c000(0x3000): 0x7efd3866c000 - 0x7efd3866f000 0x7efd3866b000(0x4000): 0x7efd3866b000 - 0x7efd3866f000 So there's just a single mapping being reported. > This is what you have. I'm a bit confused - isn't the trace showing exactly that the mmaps *are* reported with a merged VMA? > In the case of Java, if I recall, there is a single large mmap for the > code cache and it works fine. That explains why it's working there... Greetings, Andres Freund ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 22:38 ` Andres Freund @ 2017-01-26 22:51 ` Stephane Eranian 2017-01-26 23:09 ` Andres Freund 0 siblings, 1 reply; 29+ messages in thread From: Stephane Eranian @ 2017-01-26 22:51 UTC (permalink / raw) To: Andres Freund Cc: Peter Zijlstra, eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim On Thu, Jan 26, 2017 at 2:38 PM, Andres Freund <andres@anarazel.de> wrote: > On 2017-01-26 14:26:12 -0800, Stephane Eranian wrote: >> On Thu, Jan 26, 2017 at 2:19 PM, Peter Zijlstra <peterz@infradead.org> wrote: >> > On Thu, Jan 26, 2017 at 01:00:52PM -0800, Andres Freund wrote: >> >> The problem is that (quoted below) without that hack the subsequent >> >> mmaps just expand the previous VMAs which leads to perf loosing its >> >> (address,range) -> symbol mappings for previously (in the same expanded >> >> range) known symbols. >> > >> > I'm assuming this is the userspace tool doing that, right? >> > >> > Acme, is that something we can cure? Instead of throwing away eveything >> > known of the previous map, merge them when they grow? >> > >> > So I think the problem was that we don't track munmap(), so if you see >> > two overlapping mappings, we have no clue and assume an munmap() has >> > happened or something. I can't really remember, its been many many years >> > since I wrote all that :-( >> >> But perf report takes care of overlapping mmaps normally, >> see util/map.c:map_groups__fixup_overlappings >> >> VMA merging does not change anything at the user level. >> The kernel will see two mmaps and so will perf record and perf report. > > But the kernel *didn't* report two independent mmaps. > > Looking at strace shows that the mmaps were done independently in page > sized chunks: > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000 > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000 > mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0 > mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0 > write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68 > > (first function) > > mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000 > mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000 > mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0 > mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0 > write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68 > > (second function) > > > But then looking at the unannotated perf script --show-mmap-events: > > postgres 4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon > postgres 4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon > postgres 4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon > postgres 4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon > > these four mmaps regions are: > 0x7efd3866e000(0x1000): 0x7efd3866e000 - 0x7efd3866f000 > 0x7efd3866d000(0x2000): 0x7efd3866d000 - 0x7efd3866f000 > 0x7efd3866c000(0x3000): 0x7efd3866c000 - 0x7efd3866f000 > 0x7efd3866b000(0x4000): 0x7efd3866b000 - 0x7efd3866f000 > > So there's just a single mapping being reported. > > >> This is what you have. > > I'm a bit confused - isn't the trace showing exactly that the mmaps > *are* reported with a merged VMA? > > >> In the case of Java, if I recall, there is a single large mmap for the >> code cache and it works fine. > > That explains why it's working there... > Ok, I think I see the problem now (sorry was slow....): the timeline is as follows as seen from the user in your case: T0: mmap(0x1000) for func1() T1 mmap(0x2000) for func1(); T3: jit emits info func1() [0x1000-0x1fff] T4: mmap(0x3000) for func2() T5: mmap(0x4000) for funcs2() T6: jit emits info for func2() [0x2000-0x3fff] But the problem is that each mmap covers existing mmaps and thus supersedes the others as per the time stamp. The problem is not specific to jit, it just reveals itself in your case. The logic in perf is that a more recent mmap supersedes an older one, so you have: T3: 0x1000-0x2000 owned by func1 T4: 0x1000-0x3000 owned by anon T5: 0x1000-0x4000 owned by anon T6: 0x1000-0x4000 owned partially by func2() And thus perf cannot symbolize func1() anymore because it has nothing mapped in 0x1000-0x1fff but anon. Did I get the problem right this time? This is tricky to solve here because the tool does not know about the merging of the VMAs and assume you are overlapping mmaps and not merging them. Again the problem is not specific to jit, merging of VMA can happen anytime with any app. > Greetings, > > Andres Freund ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 22:51 ` Stephane Eranian @ 2017-01-26 23:09 ` Andres Freund 2017-01-26 23:16 ` Stephane Eranian 0 siblings, 1 reply; 29+ messages in thread From: Andres Freund @ 2017-01-26 23:09 UTC (permalink / raw) To: Stephane Eranian Cc: Peter Zijlstra, eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim Hi Stephane, On 2017-01-26 14:51:02 -0800, Stephane Eranian wrote: > Ok, I think I see the problem now (sorry was slow....): No worries ;) > the timeline is as follows as seen from the user in your case: > > T0: mmap(0x1000) for func1() > T1 mmap(0x2000) for func1(); > T3: jit emits info func1() [0x1000-0x1fff] > T4: mmap(0x3000) for func2() > T5: mmap(0x4000) for funcs2() > T6: jit emits info for func2() [0x2000-0x3fff] > > But the problem is that each mmap covers existing mmaps and thus > supersedes the others as per the time stamp. Yes, I think that's whats happening. Not that I actually know what I'm talking about here :) > The problem is not specific to jit, it just reveals itself in your case. > > The logic in perf is that a more recent mmap supersedes an older one, > so you have: > T3: 0x1000-0x2000 owned by func1 > T4: 0x1000-0x3000 owned by anon > T5: 0x1000-0x4000 owned by anon > T6: 0x1000-0x4000 owned partially by func2() > > And thus perf cannot symbolize func1() anymore because it has nothing > mapped in 0x1000-0x1fff but anon. > > Did I get the problem right this time? Yep. > This is tricky to solve here because the tool does not know about the > merging of the VMAs and assume you are overlapping mmaps and not > merging them. Yea, it looked tricky. I'd looked around and the only solutions I'd found was filtering out the anon mappings (obviously not a real solution) or preventing the merging (not a real solution either). > Again the problem is not specific to jit, merging of VMA can happen > anytime with any app. Sorry if I hinted in the wrong direction - I didn't see any other bad consequences. I guess in most other cases with merged VMAs its relatively harmless, since it'll presumably mostly be memory allocations and such, where this wont matter. Greetings, Andres Freund ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 23:09 ` Andres Freund @ 2017-01-26 23:16 ` Stephane Eranian 2017-01-27 13:07 ` Peter Zijlstra 0 siblings, 1 reply; 29+ messages in thread From: Stephane Eranian @ 2017-01-26 23:16 UTC (permalink / raw) To: Andres Freund Cc: Peter Zijlstra, eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim On Thu, Jan 26, 2017 at 3:09 PM, Andres Freund <andres@anarazel.de> wrote: > Hi Stephane, > > > On 2017-01-26 14:51:02 -0800, Stephane Eranian wrote: >> Ok, I think I see the problem now (sorry was slow....): > > No worries ;) > > >> the timeline is as follows as seen from the user in your case: >> >> T0: mmap(0x1000) for func1() >> T1 mmap(0x2000) for func1(); >> T3: jit emits info func1() [0x1000-0x1fff] >> T4: mmap(0x3000) for func2() >> T5: mmap(0x4000) for funcs2() >> T6: jit emits info for func2() [0x2000-0x3fff] >> >> But the problem is that each mmap covers existing mmaps and thus >> supersedes the others as per the time stamp. > > Yes, I think that's whats happening. Not that I actually know what I'm > talking about here :) > > >> The problem is not specific to jit, it just reveals itself in your case. >> >> The logic in perf is that a more recent mmap supersedes an older one, >> so you have: >> T3: 0x1000-0x2000 owned by func1 >> T4: 0x1000-0x3000 owned by anon >> T5: 0x1000-0x4000 owned by anon >> T6: 0x1000-0x4000 owned partially by func2() >> >> And thus perf cannot symbolize func1() anymore because it has nothing >> mapped in 0x1000-0x1fff but anon. >> >> Did I get the problem right this time? > > Yep. > > >> This is tricky to solve here because the tool does not know about the >> merging of the VMAs and assume you are overlapping mmaps and not >> merging them. > > Yea, it looked tricky. I'd looked around and the only solutions I'd > found was filtering out the anon mappings (obviously not a real > solution) or preventing the merging (not a real solution either). > One solution would be for the kernel to report actual mmaps and not resulting VMA layouts. Is that case you would have your 4 mmaps each reporting 4kb. That means the perf hook in the mmap code would have to be placed somewhere else. I don't know how feasible this is. I will let Peter comment on this. But hopefully by now, I have described the problem clearly enough that we can work out a solution. > >> Again the problem is not specific to jit, merging of VMA can happen >> anytime with any app. > > Sorry if I hinted in the wrong direction - I didn't see any other bad > consequences. I guess in most other cases with merged VMAs its > relatively harmless, since it'll presumably mostly be memory allocations > and such, where this wont matter. > > Greetings, > > Andres Freund ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-26 23:16 ` Stephane Eranian @ 2017-01-27 13:07 ` Peter Zijlstra 2017-01-27 15:43 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 29+ messages in thread From: Peter Zijlstra @ 2017-01-27 13:07 UTC (permalink / raw) To: Stephane Eranian Cc: Andres Freund, eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim On Thu, Jan 26, 2017 at 03:16:08PM -0800, Stephane Eranian wrote: > One solution would be for the kernel to report actual mmaps and not resulting > VMA layouts. Is that case you would have your 4 mmaps each reporting 4kb. > That means the perf hook in the mmap code would have to be placed somewhere > else. I don't know how feasible this is. I will let Peter comment on this. But > hopefully by now, I have described the problem clearly enough that we can work > out a solution. There's a number of problems with that; first and foremost is that it changes existing behaviour. Second is that there is no clear definition of what an actual mmap is; suppose something like: ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); mprotect(ptr, 4096, PROT_EXEC); What should be reported? Like I said yesterday, the existing heuristic in perf exists because we do not track munmap() and therefore cannot tell the difference between: ptr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); munmap(ptr); ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); and: ptr1 = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); ptr2 = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); where these two get merged. Something like the (compile tested only) below might be sufficient to disambiguate things. It would need a corresponding tools/perf patch of course, but I'm not too familiar with that code anymore. --- include/linux/perf_event.h | 2 ++ include/uapi/linux/perf_event.h | 14 ++++++++- kernel/events/core.c | 66 ++++++++++++++++++++++++++++++++++++++++- mm/memory.c | 2 ++ 4 files changed, 82 insertions(+), 2 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 78ed8105e64d..957a5f2d8e56 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1104,6 +1104,8 @@ static inline u64 __perf_event_count(struct perf_event *event) } extern void perf_event_mmap(struct vm_area_struct *vma); +extern void perf_event_munmap(struct vm_area_struct *vma); + extern struct perf_guest_info_callbacks *perf_guest_cbs; extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index c66a485a24ac..7571b7b43951 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -344,7 +344,8 @@ struct perf_event_attr { use_clockid : 1, /* use @clockid for time fields */ context_switch : 1, /* context switch data */ write_backward : 1, /* Write ring buffer from end to beginning */ - __reserved_1 : 36; + munmap : 1, /* include munmap data */ + __reserved_1 : 35; union { __u32 wakeup_events; /* wakeup every n events */ @@ -862,6 +863,17 @@ enum perf_event_type { */ PERF_RECORD_SWITCH_CPU_WIDE = 15, + /* + * struct { + * struct perf_event_header header; + * + * u64 addr; + * u64 len; + * struct sample_id sample_id; + * }; + */ + PERF_RECORD_MUNMAP = 16, + PERF_RECORD_MAX, /* non-ABI */ }; diff --git a/kernel/events/core.c b/kernel/events/core.c index 110b38a58493..256ab94e8baf 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -374,6 +374,7 @@ static DEFINE_PER_CPU(int, perf_sched_cb_usages); static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events); static atomic_t nr_mmap_events __read_mostly; +static atomic_t nr_munmap_events __read_mostly; static atomic_t nr_comm_events __read_mostly; static atomic_t nr_task_events __read_mostly; static atomic_t nr_freq_events __read_mostly; @@ -3850,7 +3851,7 @@ static bool is_sb_event(struct perf_event *event) if (event->attach_state & PERF_ATTACH_TASK) return false; - if (attr->mmap || attr->mmap_data || attr->mmap2 || + if (attr->mmap || attr->mmap_data || attr->mmap2 || attr->munmap || attr->comm || attr->comm_exec || attr->task || attr->context_switch) @@ -3906,6 +3907,8 @@ static void unaccount_event(struct perf_event *event) dec = true; if (event->attr.mmap || event->attr.mmap_data) atomic_dec(&nr_mmap_events); + if (event->attr.munmap) + atomic_dec(&nr_munmap_events); if (event->attr.comm) atomic_dec(&nr_comm_events); if (event->attr.task) @@ -6831,6 +6834,65 @@ void perf_event_mmap(struct vm_area_struct *vma) perf_event_mmap_event(&mmap_event); } +struct perf_munmap_event { + struct perf_event_header header; + u64 start; + u64 len; +}; + +static void perf_event_munmap_output(struct perf_event *event, void *data) +{ + struct perf_munmap_event *munmap_event = data; + struct perf_output_handle handle; + struct perf_sample_data sample; + int size = munmap_event->header.size; + int ret; + + if (!event->attr.munmap) + return; + + if (!event->attr.mmap) + return; /* no munmap without mmap */ + + if (!event->attr.mmap_data && + (munmap_event->header.misc & PERF_RECORD_MISC_MMAP_DATA)) + return; /* no data munmap without mmap_data */ + + perf_event_header__init_id(&munmap_event->header, &sample, event); + ret = perf_output_begin(&handle, event, munmap_event->header.size); + if (ret) + goto out; + + perf_output_put(&handle, *munmap_event); + perf_event__output_id_sample(event, &handle, &sample); + perf_output_end(&handle); +out: + munmap_event->header.size = size; +} + +void perf_event_munmap(struct vm_area_struct *vma) +{ + struct perf_munmap_event munmap_event; + + if (!atomic_read(&nr_munmap_events)) + return; + + munmap_event = (struct perf_munmap_event){ + .header = { + .type = PERF_RECORD_MUNMAP, + .misc = PERF_RECORD_MISC_USER, + .size = sizeof(munmap_event), + }, + .start = vma->vm_start, + .len = vma->vm_end - vma->vm_start, + }; + + if (!(vma->vm_flags & VM_EXEC)) + munmap_event.header.misc |= PERF_RECORD_MISC_MMAP_DATA; + + perf_iterate_sb(perf_event_munmap_output, &munmap_event, NULL); +} + void perf_event_aux_event(struct perf_event *event, unsigned long head, unsigned long size, u64 flags) { @@ -9067,6 +9129,8 @@ static void account_event(struct perf_event *event) inc = true; if (event->attr.mmap || event->attr.mmap_data) atomic_inc(&nr_mmap_events); + if (event->attr.munmap) + atomic_inc(&nr_munmap_events); if (event->attr.comm) atomic_inc(&nr_comm_events); if (event->attr.task) diff --git a/mm/memory.c b/mm/memory.c index 6bf2b471e30c..b89355229608 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -64,6 +64,7 @@ #include <linux/debugfs.h> #include <linux/userfaultfd_k.h> #include <linux/dax.h> +#include <linux/perf_event.h> #include <asm/io.h> #include <asm/mmu_context.h> @@ -1312,6 +1313,7 @@ static void unmap_single_vma(struct mmu_gather *tlb, if (end <= vma->vm_start) return; + perf_event_munmap(vma); if (vma->vm_file) uprobe_munmap(vma, start, end); ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-27 13:07 ` Peter Zijlstra @ 2017-01-27 15:43 ` Arnaldo Carvalho de Melo 2017-01-27 17:35 ` Stephane Eranian 2017-01-27 17:38 ` [PATCH] handle munmap records in tools/perf was: " Arnaldo Carvalho de Melo 0 siblings, 2 replies; 29+ messages in thread From: Arnaldo Carvalho de Melo @ 2017-01-27 15:43 UTC (permalink / raw) To: Peter Zijlstra Cc: Stephane Eranian, Andres Freund, eranian, LKML, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu: > Something like the (compile tested only) below might be sufficient to > disambiguate things. It would need a corresponding tools/perf patch of > course, but I'm not too familiar with that code anymore. I'm working on patch to do feature test, fallback and handling of the event, etc, will post later. - Arnaldo ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-27 15:43 ` Arnaldo Carvalho de Melo @ 2017-01-27 17:35 ` Stephane Eranian 2017-01-27 17:38 ` [PATCH] handle munmap records in tools/perf was: " Arnaldo Carvalho de Melo 1 sibling, 0 replies; 29+ messages in thread From: Stephane Eranian @ 2017-01-27 17:35 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Peter Zijlstra, Andres Freund, eranian, LKML, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim On Fri, Jan 27, 2017 at 7:43 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu: >> Something like the (compile tested only) below might be sufficient to >> disambiguate things. It would need a corresponding tools/perf patch of >> course, but I'm not too familiar with that code anymore. > > I'm working on patch to do feature test, fallback and handling of the > event, etc, will post later. > Going back to Andres' case: T0: mmap(0x1000) anon for func1() T1 mmap(0x2000) anon for func1(); T3: jit emits info func1() [0x0000-0x0fff] T4: mmap(0x3000) for func2() T5: mmap(0x4000) for funcs2() T6: jit emits info for func2() [0x2000-0x2fff] So with Peter's patch, nothing will change in the flow of MMAP2 seen by perf in this particular case. Furthermore, there is no munmap(). But now the perf tool behavior. Now perf detects the overlap but because it did not see an unmap(), it will have to merge or split mmaps(). At T0 (no change from previous behavior): 0x0000-0x0fff anon At T1 (no change = overlap): 0x0000-0x1fff anon At T3 (no change =overlap): 0x0000-0x0fff owned by func1, 0x1000-0x1fff anon. At T4 (change: cannot overlap non-anon range without an munmap, so need to split): 0x0000-0x0fff owned by func1() 0x1000-0x2fff anon At T5, (change: cannot overlap non-anon range without an munmap, so need to split): 0x0000-0x0fff owned by func1() 0x1000-0x3fff anon At T6, (no change: overlapping an anon): 0x=0000-0x=0fff owned by func1() 0x1000-0x1fff anon 0x2000-0x2fff owned by func2() 0x300-0x3fff anon In summary the key change is that an MMAP2(anon) cannot overlap an MMAP2(file) anymore. It has to be split. Everything else stays the same. Do we agree? ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH] handle munmap records in tools/perf was: Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-27 15:43 ` Arnaldo Carvalho de Melo 2017-01-27 17:35 ` Stephane Eranian @ 2017-01-27 17:38 ` Arnaldo Carvalho de Melo 2017-01-27 17:46 ` Stephane Eranian 1 sibling, 1 reply; 29+ messages in thread From: Arnaldo Carvalho de Melo @ 2017-01-27 17:38 UTC (permalink / raw) To: Peter Zijlstra Cc: Stephane Eranian, Andres Freund, Stephane Eranian, LKML, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim Em Fri, Jan 27, 2017 at 12:43:05PM -0300, Arnaldo Carvalho de Melo escreveu: > Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu: > > Something like the (compile tested only) below might be sufficient to > > disambiguate things. It would need a corresponding tools/perf patch of > > course, but I'm not too familiar with that code anymore. > > I'm working on patch to do feature test, fallback and handling of the > event, etc, will post later. Just compile tested, need to build a kernel with PeterZ's patch to test, feel free to go from there if in a hurry. The place where the map is yanked out of the thread's maps rbtree is at machine__process_munmap_event() The rest is making sure the tool works with older kernels, deals with endianness in the record in a perf.data file for cross platform analysis, hooking it to the various tools where handling this event makes sense. [acme@jouet linux]$ diffstat /tmp/a.patch include/uapi/linux/perf_event.h | 14 +++++++++- perf/builtin-annotate.c | 1 perf/builtin-c2c.c | 1 perf/builtin-diff.c | 1 perf/builtin-inject.c | 17 +++++++++++- perf/builtin-kmem.c | 1 perf/builtin-mem.c | 1 perf/builtin-record.c | 1 perf/builtin-report.c | 1 perf/builtin-script.c | 30 ++++++++++++++++++++++ perf/builtin-trace.c | 1 perf/util/data-convert-bt.c | 1 perf/util/event.c | 18 +++++++++++++ perf/util/event.h | 10 +++++++ perf/util/evsel.c | 12 +++++++- perf/util/machine.c | 54 ++++++++++++++++++++++++++++++++++++++++ perf/util/machine.h | 2 + perf/util/python.c | 1 perf/util/session.c | 19 ++++++++++++++ perf/util/tool.h | 1 20 files changed, 183 insertions(+), 4 deletions(-) [acme@jouet linux]$ diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h index c66a485a24ac..59c5cd5abbbf 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -344,7 +344,8 @@ struct perf_event_attr { use_clockid : 1, /* use @clockid for time fields */ context_switch : 1, /* context switch data */ write_backward : 1, /* Write ring buffer from end to beginning */ - __reserved_1 : 36; + munmap : 1, /* include munmap data */ + __reserved_1 : 35; union { __u32 wakeup_events; /* wakeup every n events */ @@ -862,6 +863,17 @@ enum perf_event_type { */ PERF_RECORD_SWITCH_CPU_WIDE = 15, + /* + * struct { + * struct perf_event_header header; + * + * u64 addr; + * u64 len; + * struct sample_id sample_id; + * }; + */ + PERF_RECORD_MUNMAP = 16, + PERF_RECORD_MAX, /* non-ABI */ }; diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index ebb628332a6e..082b5f100ac5 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -390,6 +390,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused) .sample = process_sample_event, .mmap = perf_event__process_mmap, .mmap2 = perf_event__process_mmap2, + .munmap = perf_event__process_munmap, .comm = perf_event__process_comm, .exit = perf_event__process_exit, .fork = perf_event__process_fork, diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c index e2b21723bbf8..3e1f1eba77e2 100644 --- a/tools/perf/builtin-c2c.c +++ b/tools/perf/builtin-c2c.c @@ -313,6 +313,7 @@ static struct perf_c2c c2c = { .sample = process_sample_event, .mmap = perf_event__process_mmap, .mmap2 = perf_event__process_mmap2, + .munmap = perf_event__process_munmap, .comm = perf_event__process_comm, .exit = perf_event__process_exit, .fork = perf_event__process_fork, diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c index 9ff0db4e2d0c..2c7856848887 100644 --- a/tools/perf/builtin-diff.c +++ b/tools/perf/builtin-diff.c @@ -350,6 +350,7 @@ static struct perf_tool tool = { .sample = diff__process_sample_event, .mmap = perf_event__process_mmap, .mmap2 = perf_event__process_mmap2, + .munmap = perf_event__process_munmap, .comm = perf_event__process_comm, .exit = perf_event__process_exit, .fork = perf_event__process_fork, diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index b9bc7e39833a..4e6103d0c163 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -282,6 +282,19 @@ static int perf_event__repipe_mmap2(struct perf_tool *tool, return err; } +static int perf_event__repipe_munmap(struct perf_tool *tool, + union perf_event *event, + struct perf_sample *sample, + struct machine *machine) +{ + int err; + + err = perf_event__process_munmap(tool, event, sample, machine); + perf_event__repipe(tool, event, sample, machine); + + return err; +} + #ifdef HAVE_JITDUMP static int perf_event__jit_repipe_mmap2(struct perf_tool *tool, union perf_event *event, @@ -569,7 +582,7 @@ static void strip_init(struct perf_inject *inject) static bool has_tracking(struct perf_evsel *evsel) { return evsel->attr.mmap || evsel->attr.mmap2 || evsel->attr.comm || - evsel->attr.task; + evsel->attr.task || evsel->attr.munmap; } #define COMPAT_MASK (PERF_SAMPLE_ID | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \ @@ -632,6 +645,7 @@ static int __cmd_inject(struct perf_inject *inject) inject->itrace_synth_opts.set) { inject->tool.mmap = perf_event__repipe_mmap; inject->tool.mmap2 = perf_event__repipe_mmap2; + inject->tool.munmap = perf_event__repipe_munmap; inject->tool.fork = perf_event__repipe_fork; inject->tool.tracing_data = perf_event__repipe_tracing_data; } @@ -732,6 +746,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused) .sample = perf_event__repipe_sample, .mmap = perf_event__repipe, .mmap2 = perf_event__repipe, + .munmap = perf_event__repipe, .comm = perf_event__repipe, .fork = perf_event__repipe, .exit = perf_event__repipe, diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c index 29f4751a3574..eb59aa7a0f5b 100644 --- a/tools/perf/builtin-kmem.c +++ b/tools/perf/builtin-kmem.c @@ -964,6 +964,7 @@ static struct perf_tool perf_kmem = { .comm = perf_event__process_comm, .mmap = perf_event__process_mmap, .mmap2 = perf_event__process_mmap2, + .munmap = perf_event__process_munmap, .ordered_events = true, }; diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c index cd7bc4d104e2..de026e9ef1be 100644 --- a/tools/perf/builtin-mem.c +++ b/tools/perf/builtin-mem.c @@ -338,6 +338,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused) .sample = process_sample_event, .mmap = perf_event__process_mmap, .mmap2 = perf_event__process_mmap2, + .munmap = perf_event__process_munmap, .comm = perf_event__process_comm, .lost = perf_event__process_lost, .fork = perf_event__process_fork, diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index ffac8ca9fb01..f197fc196cd8 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -1498,6 +1498,7 @@ static struct record record = { .comm = perf_event__process_comm, .mmap = perf_event__process_mmap, .mmap2 = perf_event__process_mmap2, + .munmap = perf_event__process_munmap, .ordered_events = true, }, }; diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index dbd7fa028861..96680fbeb664 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -693,6 +693,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused) .sample = process_sample_event, .mmap = perf_event__process_mmap, .mmap2 = perf_event__process_mmap2, + .munmap = perf_event__process_munmap, .comm = perf_event__process_comm, .exit = perf_event__process_exit, .fork = perf_event__process_fork, diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index c0783b4f7b6c..27fcbcb15782 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -1244,6 +1244,34 @@ static int process_mmap2_event(struct perf_tool *tool, return 0; } +static int process_munmap_event(struct perf_tool *tool, union perf_event *event, + struct perf_sample *sample, struct machine *machine) +{ + struct thread *thread; + struct perf_script *script = container_of(tool, struct perf_script, tool); + struct perf_session *session = script->session; + struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id); + + if (perf_event__process_munmap(tool, event, sample, machine) < 0) + return -1; + + thread = machine__findnew_thread(machine, sample->pid, sample->tid); + if (thread == NULL) { + pr_debug("problem processing MUNMAP event, skipping it.\n"); + return -1; + } + + if (!evsel->attr.sample_id_all) { + pr_debug("MUNMAP event requires attr.sample_id_all, skipping it.\n"); + return -1; + } + + print_sample_start(sample, thread, evsel); + perf_event__fprintf(event, stdout); + thread__put(thread); + return 0; +} + static int process_switch_event(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, @@ -1290,6 +1318,7 @@ static int __cmd_script(struct perf_script *script) if (script->show_mmap_events) { script->tool.mmap = process_mmap_event; script->tool.mmap2 = process_mmap2_event; + script->tool.munmap = process_munmap_event; } if (script->show_switch_events) script->tool.context_switch = process_switch_event; @@ -2096,6 +2125,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused) .sample = process_sample_event, .mmap = perf_event__process_mmap, .mmap2 = perf_event__process_mmap2, + .munmap = perf_event__process_munmap, .comm = perf_event__process_comm, .exit = perf_event__process_exit, .fork = perf_event__process_fork, diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c index 40ef9b293d1b..cfb7e9e2cd7d 100644 --- a/tools/perf/builtin-trace.c +++ b/tools/perf/builtin-trace.c @@ -2411,6 +2411,7 @@ static int trace__replay(struct trace *trace) trace->tool.sample = trace__process_sample; trace->tool.mmap = perf_event__process_mmap; trace->tool.mmap2 = perf_event__process_mmap2; + trace->tool.munmap = perf_event__process_munmap; trace->tool.comm = perf_event__process_comm; trace->tool.exit = perf_event__process_exit; trace->tool.fork = perf_event__process_fork; diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c index 4e6cbc99f08e..8d1feb9fe82d 100644 --- a/tools/perf/util/data-convert-bt.c +++ b/tools/perf/util/data-convert-bt.c @@ -1462,6 +1462,7 @@ int bt_convert__perf2ctf(const char *input, const char *path, .sample = process_sample_event, .mmap = perf_event__process_mmap, .mmap2 = perf_event__process_mmap2, + .munmap = perf_event__process_munmap, .comm = perf_event__process_comm, .exit = perf_event__process_exit, .fork = perf_event__process_fork, diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c index 8ab0d7da956b..ffcba9c07c0d 100644 --- a/tools/perf/util/event.c +++ b/tools/perf/util/event.c @@ -18,6 +18,7 @@ static const char *perf_event__names[] = { [0] = "TOTAL", [PERF_RECORD_MMAP] = "MMAP", [PERF_RECORD_MMAP2] = "MMAP2", + [PERF_RECORD_MUNMAP] = "MUNMAP", [PERF_RECORD_LOST] = "LOST", [PERF_RECORD_COMM] = "COMM", [PERF_RECORD_EXIT] = "EXIT", @@ -1080,6 +1081,12 @@ size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp) event->mmap2.filename); } +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp) +{ + return fprintf(fp, " [%#" PRIx64 ", %" PRIu64 "]\n", + event->munmap.start, event->munmap.len); +} + size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp) { struct thread_map *threads = thread_map__new_event(&event->thread_map); @@ -1128,6 +1135,14 @@ int perf_event__process_mmap2(struct perf_tool *tool __maybe_unused, return machine__process_mmap2_event(machine, event, sample); } +int perf_event__process_munmap(struct perf_tool *tool __maybe_unused, + union perf_event *event, + struct perf_sample *sample, + struct machine *machine) +{ + return machine__process_munmap_event(machine, event, sample); +} + size_t perf_event__fprintf_task(union perf_event *event, FILE *fp) { return fprintf(fp, "(%d:%d):(%d:%d)\n", @@ -1199,6 +1214,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp) case PERF_RECORD_MMAP2: ret += perf_event__fprintf_mmap2(event, fp); break; + case PERF_RECORD_MUNMAP: + ret += perf_event__fprintf_munmap(event, fp); + break; case PERF_RECORD_AUX: ret += perf_event__fprintf_aux(event, fp); break; diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h index c735c53a26f8..00942b6a9d48 100644 --- a/tools/perf/util/event.h +++ b/tools/perf/util/event.h @@ -33,6 +33,12 @@ struct mmap2_event { char filename[PATH_MAX]; }; +struct munmap_event { + struct perf_event_header header; + u64 start; + u64 len; +}; + struct comm_event { struct perf_event_header header; u32 pid, tid; @@ -484,6 +490,7 @@ union perf_event { struct perf_event_header header; struct mmap_event mmap; struct mmap2_event mmap2; + struct munmap_event munmap; struct comm_event comm; struct fork_event fork; struct lost_event lost; @@ -595,6 +602,8 @@ int perf_event__process_mmap2(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, struct machine *machine); +int perf_event__process_munmap(struct perf_tool *tool, union perf_event *event, + struct perf_sample *sample, struct machine *machine); int perf_event__process_fork(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, @@ -647,6 +656,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool, size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp); size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp); size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp); +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp); size_t perf_event__fprintf_task(union perf_event *event, FILE *fp); size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp); size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp); diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 04e536ae4d88..c4b88e4f2422 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -34,6 +34,7 @@ static struct { bool sample_id_all; bool exclude_guest; bool mmap2; + bool munmap; bool cloexec; bool clockid; bool clockid_wrong; @@ -930,6 +931,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts, attr->task = track; attr->mmap = track; attr->mmap2 = track && !perf_missing_features.mmap2; + attr->munmap = track && !perf_missing_features.munmap; attr->comm = track; if (opts->record_switch_events) @@ -1395,6 +1397,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr, PRINT_ATTRf(exclude_callchain_kernel, p_unsigned); PRINT_ATTRf(exclude_callchain_user, p_unsigned); PRINT_ATTRf(mmap2, p_unsigned); + PRINT_ATTRf(munmap, p_unsigned); PRINT_ATTRf(comm_exec, p_unsigned); PRINT_ATTRf(use_clockid, p_unsigned); PRINT_ATTRf(context_switch, p_unsigned); @@ -1474,6 +1477,8 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, } fallback_missing_features: + if (perf_missing_features.munmap) + evsel->attr.munmap = 0; if (perf_missing_features.clockid_wrong) evsel->attr.clockid = CLOCK_MONOTONIC; /* should always work */ if (perf_missing_features.clockid) { @@ -1603,10 +1608,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, goto out_close; /* - * Must probe features in the order they were added to the + * Must probe features in the reverse order they were added to the * perf_event_attr interface. */ - if (!perf_missing_features.write_backward && evsel->attr.write_backward) { + if (!perf_missing_features.munmap && evsel->attr.munmap) { + perf_missing_features.munmap = true; + goto fallback_missing_features; + } else if (!perf_missing_features.write_backward && evsel->attr.write_backward) { perf_missing_features.write_backward = true; goto out_close; } else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) { diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 747a034d1ff3..24f2f309d70f 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -1320,6 +1320,16 @@ static int machine__process_kernel_mmap_event(struct machine *machine, return -1; } +static int machine__process_kernel_munmap_event(struct machine *machine __maybe_unused, + union perf_event *event __maybe_unused) +{ + /* + * XXX: Fill this in as soon as we get munmap event for kernel + * "mmaps", aka module unload + */ + return 0; +} + int machine__process_mmap2_event(struct machine *machine, union perf_event *event, struct perf_sample *sample) @@ -1379,6 +1389,48 @@ int machine__process_mmap2_event(struct machine *machine, return 0; } +int machine__process_munmap_event(struct machine *machine, + union perf_event *event, + struct perf_sample *sample) +{ + struct thread *thread; + struct map *map; + enum map_type type = MAP__FUNCTION; + int ret = 0; + + if (dump_trace) + perf_event__fprintf_munmap(event, stdout); + + if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL || + sample->cpumode == PERF_RECORD_MISC_KERNEL) { + ret = machine__process_kernel_munmap_event(machine, event); + if (ret < 0) + goto out_problem; + return 0; + } + + thread = machine__find_thread(machine, sample->pid, sample->tid); + if (thread == NULL) + goto out_problem; + + if (event->header.misc & PERF_RECORD_MISC_MMAP_DATA) + type = MAP__VARIABLE; + + map = map_groups__find(thread->mg, type, event->munmap.start); + if (map != NULL) + goto out_problem_map; + + map_groups__remove(thread->mg, map); + thread__put(thread); + return 0; + +out_problem_map: + thread__put(thread); +out_problem: + dump_printf("problem processing PERF_RECORD_MUNMAP, skipping event.\n"); + return 0; +} + int machine__process_mmap_event(struct machine *machine, union perf_event *event, struct perf_sample *sample) { @@ -1540,6 +1592,8 @@ int machine__process_event(struct machine *machine, union perf_event *event, ret = machine__process_mmap_event(machine, event, sample); break; case PERF_RECORD_MMAP2: ret = machine__process_mmap2_event(machine, event, sample); break; + case PERF_RECORD_MUNMAP: + ret = machine__process_munmap_event(machine, event, sample); break; case PERF_RECORD_FORK: ret = machine__process_fork_event(machine, event, sample); break; case PERF_RECORD_EXIT: diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h index a28305029711..eabc8b9d8d36 100644 --- a/tools/perf/util/machine.h +++ b/tools/perf/util/machine.h @@ -101,6 +101,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event struct perf_sample *sample); int machine__process_mmap2_event(struct machine *machine, union perf_event *event, struct perf_sample *sample); +int machine__process_munmap_event(struct machine *machine, union perf_event *event, + struct perf_sample *sample); int machine__process_event(struct machine *machine, union perf_event *event, struct perf_sample *sample); diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c index a5fbc012e3df..047e004dc53f 100644 --- a/tools/perf/util/python.c +++ b/tools/perf/util/python.c @@ -1155,6 +1155,7 @@ static struct { PERF_CONST(RECORD_READ), PERF_CONST(RECORD_SAMPLE), PERF_CONST(RECORD_MMAP2), + PERF_CONST(RECORD_MUNMAP), PERF_CONST(RECORD_AUX), PERF_CONST(RECORD_ITRACE_START), PERF_CONST(RECORD_LOST_SAMPLES), diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 349c68144e55..42edd68ce0b8 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -357,6 +357,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool) tool->mmap = process_event_stub; if (tool->mmap2 == NULL) tool->mmap2 = process_event_stub; + if (tool->munmap == NULL) + tool->munmap = process_event_stub; if (tool->comm == NULL) tool->comm = process_event_stub; if (tool->fork == NULL) @@ -480,6 +482,20 @@ static void perf_event__mmap2_swap(union perf_event *event, swap_sample_id_all(event, data); } } + +static void perf_event__munmap_swap(union perf_event *event, bool sample_id_all) +{ + event->munmap.start = bswap_64(event->munmap.start); + event->munmap.len = bswap_64(event->munmap.len); + + if (sample_id_all) { + void *data = &event->munmap.len; + + data += sizeof(event->munmap.len); + swap_sample_id_all(event, data); + } +} + static void perf_event__task_swap(union perf_event *event, bool sample_id_all) { event->fork.pid = bswap_32(event->fork.pid); @@ -773,6 +789,7 @@ typedef void (*perf_event__swap_op)(union perf_event *event, static perf_event__swap_op perf_event__swap_ops[] = { [PERF_RECORD_MMAP] = perf_event__mmap_swap, [PERF_RECORD_MMAP2] = perf_event__mmap2_swap, + [PERF_RECORD_MUNMAP] = perf_event__munmap_swap, [PERF_RECORD_COMM] = perf_event__comm_swap, [PERF_RECORD_FORK] = perf_event__task_swap, [PERF_RECORD_EXIT] = perf_event__task_swap, @@ -1237,6 +1254,8 @@ static int machines__deliver_event(struct machines *machines, if (event->header.misc & PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT) ++evlist->stats.nr_proc_map_timeout; return tool->mmap2(tool, event, sample, machine); + case PERF_RECORD_MUNMAP: + return tool->munmap(tool, event, sample, machine); case PERF_RECORD_COMM: return tool->comm(tool, event, sample, machine); case PERF_RECORD_FORK: diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h index ac2590a3de2d..66291063b8b2 100644 --- a/tools/perf/util/tool.h +++ b/tools/perf/util/tool.h @@ -39,6 +39,7 @@ struct perf_tool { read; event_op mmap, mmap2, + munmap, comm, fork, exit, ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH] handle munmap records in tools/perf was: Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-27 17:38 ` [PATCH] handle munmap records in tools/perf was: " Arnaldo Carvalho de Melo @ 2017-01-27 17:46 ` Stephane Eranian 2017-01-27 18:05 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 29+ messages in thread From: Stephane Eranian @ 2017-01-27 17:46 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Peter Zijlstra, Stephane Eranian, Andres Freund, LKML, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim On Fri, Jan 27, 2017 at 9:38 AM, Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote: > Em Fri, Jan 27, 2017 at 12:43:05PM -0300, Arnaldo Carvalho de Melo escreveu: >> Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu: >> > Something like the (compile tested only) below might be sufficient to >> > disambiguate things. It would need a corresponding tools/perf patch of >> > course, but I'm not too familiar with that code anymore. >> >> I'm working on patch to do feature test, fallback and handling of the >> event, etc, will post later. > > Just compile tested, need to build a kernel with PeterZ's patch to test, > feel free to go from there if in a hurry. > > > The place where the map is yanked out of the thread's maps rbtree is at > > machine__process_munmap_event() > > The rest is making sure the tool works with older kernels, deals with > endianness in the record in a perf.data file for cross platform > analysis, hooking it to the various tools where handling this event > makes sense. > At first glance this patch handles the munmap() well. But it will not solve the case of Andres. Unless you're telling me that the kernel with Peterz's patch will now generate munmap record because of the merging. If not, then the code handling overlaps needs to change as well as I described in my other Email. > [acme@jouet linux]$ diffstat /tmp/a.patch > include/uapi/linux/perf_event.h | 14 +++++++++- > perf/builtin-annotate.c | 1 > perf/builtin-c2c.c | 1 > perf/builtin-diff.c | 1 > perf/builtin-inject.c | 17 +++++++++++- > perf/builtin-kmem.c | 1 > perf/builtin-mem.c | 1 > perf/builtin-record.c | 1 > perf/builtin-report.c | 1 > perf/builtin-script.c | 30 ++++++++++++++++++++++ > perf/builtin-trace.c | 1 > perf/util/data-convert-bt.c | 1 > perf/util/event.c | 18 +++++++++++++ > perf/util/event.h | 10 +++++++ > perf/util/evsel.c | 12 +++++++- > perf/util/machine.c | 54 ++++++++++++++++++++++++++++++++++++++++ > perf/util/machine.h | 2 + > perf/util/python.c | 1 > perf/util/session.c | 19 ++++++++++++++ > perf/util/tool.h | 1 > 20 files changed, 183 insertions(+), 4 deletions(-) > [acme@jouet linux]$ > > diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h > index c66a485a24ac..59c5cd5abbbf 100644 > --- a/tools/include/uapi/linux/perf_event.h > +++ b/tools/include/uapi/linux/perf_event.h > @@ -344,7 +344,8 @@ struct perf_event_attr { > use_clockid : 1, /* use @clockid for time fields */ > context_switch : 1, /* context switch data */ > write_backward : 1, /* Write ring buffer from end to beginning */ > - __reserved_1 : 36; > + munmap : 1, /* include munmap data */ > + __reserved_1 : 35; > > union { > __u32 wakeup_events; /* wakeup every n events */ > @@ -862,6 +863,17 @@ enum perf_event_type { > */ > PERF_RECORD_SWITCH_CPU_WIDE = 15, > > + /* > + * struct { > + * struct perf_event_header header; > + * > + * u64 addr; > + * u64 len; > + * struct sample_id sample_id; > + * }; > + */ > + PERF_RECORD_MUNMAP = 16, > + > PERF_RECORD_MAX, /* non-ABI */ > }; > > diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c > index ebb628332a6e..082b5f100ac5 100644 > --- a/tools/perf/builtin-annotate.c > +++ b/tools/perf/builtin-annotate.c > @@ -390,6 +390,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused) > .sample = process_sample_event, > .mmap = perf_event__process_mmap, > .mmap2 = perf_event__process_mmap2, > + .munmap = perf_event__process_munmap, > .comm = perf_event__process_comm, > .exit = perf_event__process_exit, > .fork = perf_event__process_fork, > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c > index e2b21723bbf8..3e1f1eba77e2 100644 > --- a/tools/perf/builtin-c2c.c > +++ b/tools/perf/builtin-c2c.c > @@ -313,6 +313,7 @@ static struct perf_c2c c2c = { > .sample = process_sample_event, > .mmap = perf_event__process_mmap, > .mmap2 = perf_event__process_mmap2, > + .munmap = perf_event__process_munmap, > .comm = perf_event__process_comm, > .exit = perf_event__process_exit, > .fork = perf_event__process_fork, > diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c > index 9ff0db4e2d0c..2c7856848887 100644 > --- a/tools/perf/builtin-diff.c > +++ b/tools/perf/builtin-diff.c > @@ -350,6 +350,7 @@ static struct perf_tool tool = { > .sample = diff__process_sample_event, > .mmap = perf_event__process_mmap, > .mmap2 = perf_event__process_mmap2, > + .munmap = perf_event__process_munmap, > .comm = perf_event__process_comm, > .exit = perf_event__process_exit, > .fork = perf_event__process_fork, > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c > index b9bc7e39833a..4e6103d0c163 100644 > --- a/tools/perf/builtin-inject.c > +++ b/tools/perf/builtin-inject.c > @@ -282,6 +282,19 @@ static int perf_event__repipe_mmap2(struct perf_tool *tool, > return err; > } > > +static int perf_event__repipe_munmap(struct perf_tool *tool, > + union perf_event *event, > + struct perf_sample *sample, > + struct machine *machine) > +{ > + int err; > + > + err = perf_event__process_munmap(tool, event, sample, machine); > + perf_event__repipe(tool, event, sample, machine); > + > + return err; > +} > + > #ifdef HAVE_JITDUMP > static int perf_event__jit_repipe_mmap2(struct perf_tool *tool, > union perf_event *event, > @@ -569,7 +582,7 @@ static void strip_init(struct perf_inject *inject) > static bool has_tracking(struct perf_evsel *evsel) > { > return evsel->attr.mmap || evsel->attr.mmap2 || evsel->attr.comm || > - evsel->attr.task; > + evsel->attr.task || evsel->attr.munmap; > } > > #define COMPAT_MASK (PERF_SAMPLE_ID | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \ > @@ -632,6 +645,7 @@ static int __cmd_inject(struct perf_inject *inject) > inject->itrace_synth_opts.set) { > inject->tool.mmap = perf_event__repipe_mmap; > inject->tool.mmap2 = perf_event__repipe_mmap2; > + inject->tool.munmap = perf_event__repipe_munmap; > inject->tool.fork = perf_event__repipe_fork; > inject->tool.tracing_data = perf_event__repipe_tracing_data; > } > @@ -732,6 +746,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused) > .sample = perf_event__repipe_sample, > .mmap = perf_event__repipe, > .mmap2 = perf_event__repipe, > + .munmap = perf_event__repipe, > .comm = perf_event__repipe, > .fork = perf_event__repipe, > .exit = perf_event__repipe, > diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c > index 29f4751a3574..eb59aa7a0f5b 100644 > --- a/tools/perf/builtin-kmem.c > +++ b/tools/perf/builtin-kmem.c > @@ -964,6 +964,7 @@ static struct perf_tool perf_kmem = { > .comm = perf_event__process_comm, > .mmap = perf_event__process_mmap, > .mmap2 = perf_event__process_mmap2, > + .munmap = perf_event__process_munmap, > .ordered_events = true, > }; > > diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c > index cd7bc4d104e2..de026e9ef1be 100644 > --- a/tools/perf/builtin-mem.c > +++ b/tools/perf/builtin-mem.c > @@ -338,6 +338,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused) > .sample = process_sample_event, > .mmap = perf_event__process_mmap, > .mmap2 = perf_event__process_mmap2, > + .munmap = perf_event__process_munmap, > .comm = perf_event__process_comm, > .lost = perf_event__process_lost, > .fork = perf_event__process_fork, > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c > index ffac8ca9fb01..f197fc196cd8 100644 > --- a/tools/perf/builtin-record.c > +++ b/tools/perf/builtin-record.c > @@ -1498,6 +1498,7 @@ static struct record record = { > .comm = perf_event__process_comm, > .mmap = perf_event__process_mmap, > .mmap2 = perf_event__process_mmap2, > + .munmap = perf_event__process_munmap, > .ordered_events = true, > }, > }; > diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c > index dbd7fa028861..96680fbeb664 100644 > --- a/tools/perf/builtin-report.c > +++ b/tools/perf/builtin-report.c > @@ -693,6 +693,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused) > .sample = process_sample_event, > .mmap = perf_event__process_mmap, > .mmap2 = perf_event__process_mmap2, > + .munmap = perf_event__process_munmap, > .comm = perf_event__process_comm, > .exit = perf_event__process_exit, > .fork = perf_event__process_fork, > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c > index c0783b4f7b6c..27fcbcb15782 100644 > --- a/tools/perf/builtin-script.c > +++ b/tools/perf/builtin-script.c > @@ -1244,6 +1244,34 @@ static int process_mmap2_event(struct perf_tool *tool, > return 0; > } > > +static int process_munmap_event(struct perf_tool *tool, union perf_event *event, > + struct perf_sample *sample, struct machine *machine) > +{ > + struct thread *thread; > + struct perf_script *script = container_of(tool, struct perf_script, tool); > + struct perf_session *session = script->session; > + struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id); > + > + if (perf_event__process_munmap(tool, event, sample, machine) < 0) > + return -1; > + > + thread = machine__findnew_thread(machine, sample->pid, sample->tid); > + if (thread == NULL) { > + pr_debug("problem processing MUNMAP event, skipping it.\n"); > + return -1; > + } > + > + if (!evsel->attr.sample_id_all) { > + pr_debug("MUNMAP event requires attr.sample_id_all, skipping it.\n"); > + return -1; > + } > + > + print_sample_start(sample, thread, evsel); > + perf_event__fprintf(event, stdout); > + thread__put(thread); > + return 0; > +} > + > static int process_switch_event(struct perf_tool *tool, > union perf_event *event, > struct perf_sample *sample, > @@ -1290,6 +1318,7 @@ static int __cmd_script(struct perf_script *script) > if (script->show_mmap_events) { > script->tool.mmap = process_mmap_event; > script->tool.mmap2 = process_mmap2_event; > + script->tool.munmap = process_munmap_event; > } > if (script->show_switch_events) > script->tool.context_switch = process_switch_event; > @@ -2096,6 +2125,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused) > .sample = process_sample_event, > .mmap = perf_event__process_mmap, > .mmap2 = perf_event__process_mmap2, > + .munmap = perf_event__process_munmap, > .comm = perf_event__process_comm, > .exit = perf_event__process_exit, > .fork = perf_event__process_fork, > diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c > index 40ef9b293d1b..cfb7e9e2cd7d 100644 > --- a/tools/perf/builtin-trace.c > +++ b/tools/perf/builtin-trace.c > @@ -2411,6 +2411,7 @@ static int trace__replay(struct trace *trace) > trace->tool.sample = trace__process_sample; > trace->tool.mmap = perf_event__process_mmap; > trace->tool.mmap2 = perf_event__process_mmap2; > + trace->tool.munmap = perf_event__process_munmap; > trace->tool.comm = perf_event__process_comm; > trace->tool.exit = perf_event__process_exit; > trace->tool.fork = perf_event__process_fork; > diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c > index 4e6cbc99f08e..8d1feb9fe82d 100644 > --- a/tools/perf/util/data-convert-bt.c > +++ b/tools/perf/util/data-convert-bt.c > @@ -1462,6 +1462,7 @@ int bt_convert__perf2ctf(const char *input, const char *path, > .sample = process_sample_event, > .mmap = perf_event__process_mmap, > .mmap2 = perf_event__process_mmap2, > + .munmap = perf_event__process_munmap, > .comm = perf_event__process_comm, > .exit = perf_event__process_exit, > .fork = perf_event__process_fork, > diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c > index 8ab0d7da956b..ffcba9c07c0d 100644 > --- a/tools/perf/util/event.c > +++ b/tools/perf/util/event.c > @@ -18,6 +18,7 @@ static const char *perf_event__names[] = { > [0] = "TOTAL", > [PERF_RECORD_MMAP] = "MMAP", > [PERF_RECORD_MMAP2] = "MMAP2", > + [PERF_RECORD_MUNMAP] = "MUNMAP", > [PERF_RECORD_LOST] = "LOST", > [PERF_RECORD_COMM] = "COMM", > [PERF_RECORD_EXIT] = "EXIT", > @@ -1080,6 +1081,12 @@ size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp) > event->mmap2.filename); > } > > +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp) > +{ > + return fprintf(fp, " [%#" PRIx64 ", %" PRIu64 "]\n", > + event->munmap.start, event->munmap.len); > +} > + > size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp) > { > struct thread_map *threads = thread_map__new_event(&event->thread_map); > @@ -1128,6 +1135,14 @@ int perf_event__process_mmap2(struct perf_tool *tool __maybe_unused, > return machine__process_mmap2_event(machine, event, sample); > } > > +int perf_event__process_munmap(struct perf_tool *tool __maybe_unused, > + union perf_event *event, > + struct perf_sample *sample, > + struct machine *machine) > +{ > + return machine__process_munmap_event(machine, event, sample); > +} > + > size_t perf_event__fprintf_task(union perf_event *event, FILE *fp) > { > return fprintf(fp, "(%d:%d):(%d:%d)\n", > @@ -1199,6 +1214,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp) > case PERF_RECORD_MMAP2: > ret += perf_event__fprintf_mmap2(event, fp); > break; > + case PERF_RECORD_MUNMAP: > + ret += perf_event__fprintf_munmap(event, fp); > + break; > case PERF_RECORD_AUX: > ret += perf_event__fprintf_aux(event, fp); > break; > diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h > index c735c53a26f8..00942b6a9d48 100644 > --- a/tools/perf/util/event.h > +++ b/tools/perf/util/event.h > @@ -33,6 +33,12 @@ struct mmap2_event { > char filename[PATH_MAX]; > }; > > +struct munmap_event { > + struct perf_event_header header; > + u64 start; > + u64 len; > +}; > + > struct comm_event { > struct perf_event_header header; > u32 pid, tid; > @@ -484,6 +490,7 @@ union perf_event { > struct perf_event_header header; > struct mmap_event mmap; > struct mmap2_event mmap2; > + struct munmap_event munmap; > struct comm_event comm; > struct fork_event fork; > struct lost_event lost; > @@ -595,6 +602,8 @@ int perf_event__process_mmap2(struct perf_tool *tool, > union perf_event *event, > struct perf_sample *sample, > struct machine *machine); > +int perf_event__process_munmap(struct perf_tool *tool, union perf_event *event, > + struct perf_sample *sample, struct machine *machine); > int perf_event__process_fork(struct perf_tool *tool, > union perf_event *event, > struct perf_sample *sample, > @@ -647,6 +656,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool, > size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp); > size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp); > size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp); > +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp); > size_t perf_event__fprintf_task(union perf_event *event, FILE *fp); > size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp); > size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp); > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c > index 04e536ae4d88..c4b88e4f2422 100644 > --- a/tools/perf/util/evsel.c > +++ b/tools/perf/util/evsel.c > @@ -34,6 +34,7 @@ static struct { > bool sample_id_all; > bool exclude_guest; > bool mmap2; > + bool munmap; > bool cloexec; > bool clockid; > bool clockid_wrong; > @@ -930,6 +931,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts, > attr->task = track; > attr->mmap = track; > attr->mmap2 = track && !perf_missing_features.mmap2; > + attr->munmap = track && !perf_missing_features.munmap; > attr->comm = track; > > if (opts->record_switch_events) > @@ -1395,6 +1397,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr, > PRINT_ATTRf(exclude_callchain_kernel, p_unsigned); > PRINT_ATTRf(exclude_callchain_user, p_unsigned); > PRINT_ATTRf(mmap2, p_unsigned); > + PRINT_ATTRf(munmap, p_unsigned); > PRINT_ATTRf(comm_exec, p_unsigned); > PRINT_ATTRf(use_clockid, p_unsigned); > PRINT_ATTRf(context_switch, p_unsigned); > @@ -1474,6 +1477,8 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, > } > > fallback_missing_features: > + if (perf_missing_features.munmap) > + evsel->attr.munmap = 0; > if (perf_missing_features.clockid_wrong) > evsel->attr.clockid = CLOCK_MONOTONIC; /* should always work */ > if (perf_missing_features.clockid) { > @@ -1603,10 +1608,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, > goto out_close; > > /* > - * Must probe features in the order they were added to the > + * Must probe features in the reverse order they were added to the > * perf_event_attr interface. > */ > - if (!perf_missing_features.write_backward && evsel->attr.write_backward) { > + if (!perf_missing_features.munmap && evsel->attr.munmap) { > + perf_missing_features.munmap = true; > + goto fallback_missing_features; > + } else if (!perf_missing_features.write_backward && evsel->attr.write_backward) { > perf_missing_features.write_backward = true; > goto out_close; > } else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) { > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c > index 747a034d1ff3..24f2f309d70f 100644 > --- a/tools/perf/util/machine.c > +++ b/tools/perf/util/machine.c > @@ -1320,6 +1320,16 @@ static int machine__process_kernel_mmap_event(struct machine *machine, > return -1; > } > > +static int machine__process_kernel_munmap_event(struct machine *machine __maybe_unused, > + union perf_event *event __maybe_unused) > +{ > + /* > + * XXX: Fill this in as soon as we get munmap event for kernel > + * "mmaps", aka module unload > + */ > + return 0; > +} > + > int machine__process_mmap2_event(struct machine *machine, > union perf_event *event, > struct perf_sample *sample) > @@ -1379,6 +1389,48 @@ int machine__process_mmap2_event(struct machine *machine, > return 0; > } > > +int machine__process_munmap_event(struct machine *machine, > + union perf_event *event, > + struct perf_sample *sample) > +{ > + struct thread *thread; > + struct map *map; > + enum map_type type = MAP__FUNCTION; > + int ret = 0; > + > + if (dump_trace) > + perf_event__fprintf_munmap(event, stdout); > + > + if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL || > + sample->cpumode == PERF_RECORD_MISC_KERNEL) { > + ret = machine__process_kernel_munmap_event(machine, event); > + if (ret < 0) > + goto out_problem; > + return 0; > + } > + > + thread = machine__find_thread(machine, sample->pid, sample->tid); > + if (thread == NULL) > + goto out_problem; > + > + if (event->header.misc & PERF_RECORD_MISC_MMAP_DATA) > + type = MAP__VARIABLE; > + > + map = map_groups__find(thread->mg, type, event->munmap.start); > + if (map != NULL) > + goto out_problem_map; > + > + map_groups__remove(thread->mg, map); > + thread__put(thread); > + return 0; > + > +out_problem_map: > + thread__put(thread); > +out_problem: > + dump_printf("problem processing PERF_RECORD_MUNMAP, skipping event.\n"); > + return 0; > +} > + > int machine__process_mmap_event(struct machine *machine, union perf_event *event, > struct perf_sample *sample) > { > @@ -1540,6 +1592,8 @@ int machine__process_event(struct machine *machine, union perf_event *event, > ret = machine__process_mmap_event(machine, event, sample); break; > case PERF_RECORD_MMAP2: > ret = machine__process_mmap2_event(machine, event, sample); break; > + case PERF_RECORD_MUNMAP: > + ret = machine__process_munmap_event(machine, event, sample); break; > case PERF_RECORD_FORK: > ret = machine__process_fork_event(machine, event, sample); break; > case PERF_RECORD_EXIT: > diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h > index a28305029711..eabc8b9d8d36 100644 > --- a/tools/perf/util/machine.h > +++ b/tools/perf/util/machine.h > @@ -101,6 +101,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event > struct perf_sample *sample); > int machine__process_mmap2_event(struct machine *machine, union perf_event *event, > struct perf_sample *sample); > +int machine__process_munmap_event(struct machine *machine, union perf_event *event, > + struct perf_sample *sample); > int machine__process_event(struct machine *machine, union perf_event *event, > struct perf_sample *sample); > > diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c > index a5fbc012e3df..047e004dc53f 100644 > --- a/tools/perf/util/python.c > +++ b/tools/perf/util/python.c > @@ -1155,6 +1155,7 @@ static struct { > PERF_CONST(RECORD_READ), > PERF_CONST(RECORD_SAMPLE), > PERF_CONST(RECORD_MMAP2), > + PERF_CONST(RECORD_MUNMAP), > PERF_CONST(RECORD_AUX), > PERF_CONST(RECORD_ITRACE_START), > PERF_CONST(RECORD_LOST_SAMPLES), > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c > index 349c68144e55..42edd68ce0b8 100644 > --- a/tools/perf/util/session.c > +++ b/tools/perf/util/session.c > @@ -357,6 +357,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool) > tool->mmap = process_event_stub; > if (tool->mmap2 == NULL) > tool->mmap2 = process_event_stub; > + if (tool->munmap == NULL) > + tool->munmap = process_event_stub; > if (tool->comm == NULL) > tool->comm = process_event_stub; > if (tool->fork == NULL) > @@ -480,6 +482,20 @@ static void perf_event__mmap2_swap(union perf_event *event, > swap_sample_id_all(event, data); > } > } > + > +static void perf_event__munmap_swap(union perf_event *event, bool sample_id_all) > +{ > + event->munmap.start = bswap_64(event->munmap.start); > + event->munmap.len = bswap_64(event->munmap.len); > + > + if (sample_id_all) { > + void *data = &event->munmap.len; > + > + data += sizeof(event->munmap.len); > + swap_sample_id_all(event, data); > + } > +} > + > static void perf_event__task_swap(union perf_event *event, bool sample_id_all) > { > event->fork.pid = bswap_32(event->fork.pid); > @@ -773,6 +789,7 @@ typedef void (*perf_event__swap_op)(union perf_event *event, > static perf_event__swap_op perf_event__swap_ops[] = { > [PERF_RECORD_MMAP] = perf_event__mmap_swap, > [PERF_RECORD_MMAP2] = perf_event__mmap2_swap, > + [PERF_RECORD_MUNMAP] = perf_event__munmap_swap, > [PERF_RECORD_COMM] = perf_event__comm_swap, > [PERF_RECORD_FORK] = perf_event__task_swap, > [PERF_RECORD_EXIT] = perf_event__task_swap, > @@ -1237,6 +1254,8 @@ static int machines__deliver_event(struct machines *machines, > if (event->header.misc & PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT) > ++evlist->stats.nr_proc_map_timeout; > return tool->mmap2(tool, event, sample, machine); > + case PERF_RECORD_MUNMAP: > + return tool->munmap(tool, event, sample, machine); > case PERF_RECORD_COMM: > return tool->comm(tool, event, sample, machine); > case PERF_RECORD_FORK: > diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h > index ac2590a3de2d..66291063b8b2 100644 > --- a/tools/perf/util/tool.h > +++ b/tools/perf/util/tool.h > @@ -39,6 +39,7 @@ struct perf_tool { > read; > event_op mmap, > mmap2, > + munmap, > comm, > fork, > exit, ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] handle munmap records in tools/perf was: Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-27 17:46 ` Stephane Eranian @ 2017-01-27 18:05 ` Arnaldo Carvalho de Melo 2017-01-27 18:10 ` Stephane Eranian 0 siblings, 1 reply; 29+ messages in thread From: Arnaldo Carvalho de Melo @ 2017-01-27 18:05 UTC (permalink / raw) To: eranian Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Stephane Eranian, Andres Freund, LKML, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim Em Fri, Jan 27, 2017 at 09:46:55AM -0800, Stephane Eranian escreveu: > On Fri, Jan 27, 2017 at 9:38 AM, Arnaldo Carvalho de Melo > <arnaldo.melo@gmail.com> wrote: > > Em Fri, Jan 27, 2017 at 12:43:05PM -0300, Arnaldo Carvalho de Melo escreveu: > >> Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu: > >> > Something like the (compile tested only) below might be sufficient to > >> > disambiguate things. It would need a corresponding tools/perf patch of > >> > course, but I'm not too familiar with that code anymore. > >> > >> I'm working on patch to do feature test, fallback and handling of the > >> event, etc, will post later. > > > > Just compile tested, need to build a kernel with PeterZ's patch to test, > > feel free to go from there if in a hurry. > > > > > > The place where the map is yanked out of the thread's maps rbtree is at > > > > machine__process_munmap_event() > > > > The rest is making sure the tool works with older kernels, deals with > > endianness in the record in a perf.data file for cross platform > > analysis, hooking it to the various tools where handling this event > > makes sense. > > > At first glance this patch handles the munmap() well. But it will not solve > the case of Andres. Unless you're telling me that the kernel with Peterz's patch Nah, I just tried to implement support for the facility PeterZ was proposing for the kernel, not trying to solve Andres, silly me ;-) But then it doesn't even does that well, as it needs to take munmap.len into account, to possibly split the map if they aren't a perfect match (start, len). > will now generate munmap record because of the merging. If not, then the code > handling overlaps needs to change as well as I described in my other Email. Will re-read it, found it confusing at first read :-\ - Arnaldo > > [acme@jouet linux]$ diffstat /tmp/a.patch > > include/uapi/linux/perf_event.h | 14 +++++++++- > > perf/builtin-annotate.c | 1 > > perf/builtin-c2c.c | 1 > > perf/builtin-diff.c | 1 > > perf/builtin-inject.c | 17 +++++++++++- > > perf/builtin-kmem.c | 1 > > perf/builtin-mem.c | 1 > > perf/builtin-record.c | 1 > > perf/builtin-report.c | 1 > > perf/builtin-script.c | 30 ++++++++++++++++++++++ > > perf/builtin-trace.c | 1 > > perf/util/data-convert-bt.c | 1 > > perf/util/event.c | 18 +++++++++++++ > > perf/util/event.h | 10 +++++++ > > perf/util/evsel.c | 12 +++++++- > > perf/util/machine.c | 54 ++++++++++++++++++++++++++++++++++++++++ > > perf/util/machine.h | 2 + > > perf/util/python.c | 1 > > perf/util/session.c | 19 ++++++++++++++ > > perf/util/tool.h | 1 > > 20 files changed, 183 insertions(+), 4 deletions(-) > > [acme@jouet linux]$ > > > > diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h > > index c66a485a24ac..59c5cd5abbbf 100644 > > --- a/tools/include/uapi/linux/perf_event.h > > +++ b/tools/include/uapi/linux/perf_event.h > > @@ -344,7 +344,8 @@ struct perf_event_attr { > > use_clockid : 1, /* use @clockid for time fields */ > > context_switch : 1, /* context switch data */ > > write_backward : 1, /* Write ring buffer from end to beginning */ > > - __reserved_1 : 36; > > + munmap : 1, /* include munmap data */ > > + __reserved_1 : 35; > > > > union { > > __u32 wakeup_events; /* wakeup every n events */ > > @@ -862,6 +863,17 @@ enum perf_event_type { > > */ > > PERF_RECORD_SWITCH_CPU_WIDE = 15, > > > > + /* > > + * struct { > > + * struct perf_event_header header; > > + * > > + * u64 addr; > > + * u64 len; > > + * struct sample_id sample_id; > > + * }; > > + */ > > + PERF_RECORD_MUNMAP = 16, > > + > > PERF_RECORD_MAX, /* non-ABI */ > > }; > > > > diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c > > index ebb628332a6e..082b5f100ac5 100644 > > --- a/tools/perf/builtin-annotate.c > > +++ b/tools/perf/builtin-annotate.c > > @@ -390,6 +390,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused) > > .sample = process_sample_event, > > .mmap = perf_event__process_mmap, > > .mmap2 = perf_event__process_mmap2, > > + .munmap = perf_event__process_munmap, > > .comm = perf_event__process_comm, > > .exit = perf_event__process_exit, > > .fork = perf_event__process_fork, > > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c > > index e2b21723bbf8..3e1f1eba77e2 100644 > > --- a/tools/perf/builtin-c2c.c > > +++ b/tools/perf/builtin-c2c.c > > @@ -313,6 +313,7 @@ static struct perf_c2c c2c = { > > .sample = process_sample_event, > > .mmap = perf_event__process_mmap, > > .mmap2 = perf_event__process_mmap2, > > + .munmap = perf_event__process_munmap, > > .comm = perf_event__process_comm, > > .exit = perf_event__process_exit, > > .fork = perf_event__process_fork, > > diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c > > index 9ff0db4e2d0c..2c7856848887 100644 > > --- a/tools/perf/builtin-diff.c > > +++ b/tools/perf/builtin-diff.c > > @@ -350,6 +350,7 @@ static struct perf_tool tool = { > > .sample = diff__process_sample_event, > > .mmap = perf_event__process_mmap, > > .mmap2 = perf_event__process_mmap2, > > + .munmap = perf_event__process_munmap, > > .comm = perf_event__process_comm, > > .exit = perf_event__process_exit, > > .fork = perf_event__process_fork, > > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c > > index b9bc7e39833a..4e6103d0c163 100644 > > --- a/tools/perf/builtin-inject.c > > +++ b/tools/perf/builtin-inject.c > > @@ -282,6 +282,19 @@ static int perf_event__repipe_mmap2(struct perf_tool *tool, > > return err; > > } > > > > +static int perf_event__repipe_munmap(struct perf_tool *tool, > > + union perf_event *event, > > + struct perf_sample *sample, > > + struct machine *machine) > > +{ > > + int err; > > + > > + err = perf_event__process_munmap(tool, event, sample, machine); > > + perf_event__repipe(tool, event, sample, machine); > > + > > + return err; > > +} > > + > > #ifdef HAVE_JITDUMP > > static int perf_event__jit_repipe_mmap2(struct perf_tool *tool, > > union perf_event *event, > > @@ -569,7 +582,7 @@ static void strip_init(struct perf_inject *inject) > > static bool has_tracking(struct perf_evsel *evsel) > > { > > return evsel->attr.mmap || evsel->attr.mmap2 || evsel->attr.comm || > > - evsel->attr.task; > > + evsel->attr.task || evsel->attr.munmap; > > } > > > > #define COMPAT_MASK (PERF_SAMPLE_ID | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \ > > @@ -632,6 +645,7 @@ static int __cmd_inject(struct perf_inject *inject) > > inject->itrace_synth_opts.set) { > > inject->tool.mmap = perf_event__repipe_mmap; > > inject->tool.mmap2 = perf_event__repipe_mmap2; > > + inject->tool.munmap = perf_event__repipe_munmap; > > inject->tool.fork = perf_event__repipe_fork; > > inject->tool.tracing_data = perf_event__repipe_tracing_data; > > } > > @@ -732,6 +746,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused) > > .sample = perf_event__repipe_sample, > > .mmap = perf_event__repipe, > > .mmap2 = perf_event__repipe, > > + .munmap = perf_event__repipe, > > .comm = perf_event__repipe, > > .fork = perf_event__repipe, > > .exit = perf_event__repipe, > > diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c > > index 29f4751a3574..eb59aa7a0f5b 100644 > > --- a/tools/perf/builtin-kmem.c > > +++ b/tools/perf/builtin-kmem.c > > @@ -964,6 +964,7 @@ static struct perf_tool perf_kmem = { > > .comm = perf_event__process_comm, > > .mmap = perf_event__process_mmap, > > .mmap2 = perf_event__process_mmap2, > > + .munmap = perf_event__process_munmap, > > .ordered_events = true, > > }; > > > > diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c > > index cd7bc4d104e2..de026e9ef1be 100644 > > --- a/tools/perf/builtin-mem.c > > +++ b/tools/perf/builtin-mem.c > > @@ -338,6 +338,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused) > > .sample = process_sample_event, > > .mmap = perf_event__process_mmap, > > .mmap2 = perf_event__process_mmap2, > > + .munmap = perf_event__process_munmap, > > .comm = perf_event__process_comm, > > .lost = perf_event__process_lost, > > .fork = perf_event__process_fork, > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c > > index ffac8ca9fb01..f197fc196cd8 100644 > > --- a/tools/perf/builtin-record.c > > +++ b/tools/perf/builtin-record.c > > @@ -1498,6 +1498,7 @@ static struct record record = { > > .comm = perf_event__process_comm, > > .mmap = perf_event__process_mmap, > > .mmap2 = perf_event__process_mmap2, > > + .munmap = perf_event__process_munmap, > > .ordered_events = true, > > }, > > }; > > diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c > > index dbd7fa028861..96680fbeb664 100644 > > --- a/tools/perf/builtin-report.c > > +++ b/tools/perf/builtin-report.c > > @@ -693,6 +693,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused) > > .sample = process_sample_event, > > .mmap = perf_event__process_mmap, > > .mmap2 = perf_event__process_mmap2, > > + .munmap = perf_event__process_munmap, > > .comm = perf_event__process_comm, > > .exit = perf_event__process_exit, > > .fork = perf_event__process_fork, > > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c > > index c0783b4f7b6c..27fcbcb15782 100644 > > --- a/tools/perf/builtin-script.c > > +++ b/tools/perf/builtin-script.c > > @@ -1244,6 +1244,34 @@ static int process_mmap2_event(struct perf_tool *tool, > > return 0; > > } > > > > +static int process_munmap_event(struct perf_tool *tool, union perf_event *event, > > + struct perf_sample *sample, struct machine *machine) > > +{ > > + struct thread *thread; > > + struct perf_script *script = container_of(tool, struct perf_script, tool); > > + struct perf_session *session = script->session; > > + struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id); > > + > > + if (perf_event__process_munmap(tool, event, sample, machine) < 0) > > + return -1; > > + > > + thread = machine__findnew_thread(machine, sample->pid, sample->tid); > > + if (thread == NULL) { > > + pr_debug("problem processing MUNMAP event, skipping it.\n"); > > + return -1; > > + } > > + > > + if (!evsel->attr.sample_id_all) { > > + pr_debug("MUNMAP event requires attr.sample_id_all, skipping it.\n"); > > + return -1; > > + } > > + > > + print_sample_start(sample, thread, evsel); > > + perf_event__fprintf(event, stdout); > > + thread__put(thread); > > + return 0; > > +} > > + > > static int process_switch_event(struct perf_tool *tool, > > union perf_event *event, > > struct perf_sample *sample, > > @@ -1290,6 +1318,7 @@ static int __cmd_script(struct perf_script *script) > > if (script->show_mmap_events) { > > script->tool.mmap = process_mmap_event; > > script->tool.mmap2 = process_mmap2_event; > > + script->tool.munmap = process_munmap_event; > > } > > if (script->show_switch_events) > > script->tool.context_switch = process_switch_event; > > @@ -2096,6 +2125,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused) > > .sample = process_sample_event, > > .mmap = perf_event__process_mmap, > > .mmap2 = perf_event__process_mmap2, > > + .munmap = perf_event__process_munmap, > > .comm = perf_event__process_comm, > > .exit = perf_event__process_exit, > > .fork = perf_event__process_fork, > > diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c > > index 40ef9b293d1b..cfb7e9e2cd7d 100644 > > --- a/tools/perf/builtin-trace.c > > +++ b/tools/perf/builtin-trace.c > > @@ -2411,6 +2411,7 @@ static int trace__replay(struct trace *trace) > > trace->tool.sample = trace__process_sample; > > trace->tool.mmap = perf_event__process_mmap; > > trace->tool.mmap2 = perf_event__process_mmap2; > > + trace->tool.munmap = perf_event__process_munmap; > > trace->tool.comm = perf_event__process_comm; > > trace->tool.exit = perf_event__process_exit; > > trace->tool.fork = perf_event__process_fork; > > diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c > > index 4e6cbc99f08e..8d1feb9fe82d 100644 > > --- a/tools/perf/util/data-convert-bt.c > > +++ b/tools/perf/util/data-convert-bt.c > > @@ -1462,6 +1462,7 @@ int bt_convert__perf2ctf(const char *input, const char *path, > > .sample = process_sample_event, > > .mmap = perf_event__process_mmap, > > .mmap2 = perf_event__process_mmap2, > > + .munmap = perf_event__process_munmap, > > .comm = perf_event__process_comm, > > .exit = perf_event__process_exit, > > .fork = perf_event__process_fork, > > diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c > > index 8ab0d7da956b..ffcba9c07c0d 100644 > > --- a/tools/perf/util/event.c > > +++ b/tools/perf/util/event.c > > @@ -18,6 +18,7 @@ static const char *perf_event__names[] = { > > [0] = "TOTAL", > > [PERF_RECORD_MMAP] = "MMAP", > > [PERF_RECORD_MMAP2] = "MMAP2", > > + [PERF_RECORD_MUNMAP] = "MUNMAP", > > [PERF_RECORD_LOST] = "LOST", > > [PERF_RECORD_COMM] = "COMM", > > [PERF_RECORD_EXIT] = "EXIT", > > @@ -1080,6 +1081,12 @@ size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp) > > event->mmap2.filename); > > } > > > > +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp) > > +{ > > + return fprintf(fp, " [%#" PRIx64 ", %" PRIu64 "]\n", > > + event->munmap.start, event->munmap.len); > > +} > > + > > size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp) > > { > > struct thread_map *threads = thread_map__new_event(&event->thread_map); > > @@ -1128,6 +1135,14 @@ int perf_event__process_mmap2(struct perf_tool *tool __maybe_unused, > > return machine__process_mmap2_event(machine, event, sample); > > } > > > > +int perf_event__process_munmap(struct perf_tool *tool __maybe_unused, > > + union perf_event *event, > > + struct perf_sample *sample, > > + struct machine *machine) > > +{ > > + return machine__process_munmap_event(machine, event, sample); > > +} > > + > > size_t perf_event__fprintf_task(union perf_event *event, FILE *fp) > > { > > return fprintf(fp, "(%d:%d):(%d:%d)\n", > > @@ -1199,6 +1214,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp) > > case PERF_RECORD_MMAP2: > > ret += perf_event__fprintf_mmap2(event, fp); > > break; > > + case PERF_RECORD_MUNMAP: > > + ret += perf_event__fprintf_munmap(event, fp); > > + break; > > case PERF_RECORD_AUX: > > ret += perf_event__fprintf_aux(event, fp); > > break; > > diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h > > index c735c53a26f8..00942b6a9d48 100644 > > --- a/tools/perf/util/event.h > > +++ b/tools/perf/util/event.h > > @@ -33,6 +33,12 @@ struct mmap2_event { > > char filename[PATH_MAX]; > > }; > > > > +struct munmap_event { > > + struct perf_event_header header; > > + u64 start; > > + u64 len; > > +}; > > + > > struct comm_event { > > struct perf_event_header header; > > u32 pid, tid; > > @@ -484,6 +490,7 @@ union perf_event { > > struct perf_event_header header; > > struct mmap_event mmap; > > struct mmap2_event mmap2; > > + struct munmap_event munmap; > > struct comm_event comm; > > struct fork_event fork; > > struct lost_event lost; > > @@ -595,6 +602,8 @@ int perf_event__process_mmap2(struct perf_tool *tool, > > union perf_event *event, > > struct perf_sample *sample, > > struct machine *machine); > > +int perf_event__process_munmap(struct perf_tool *tool, union perf_event *event, > > + struct perf_sample *sample, struct machine *machine); > > int perf_event__process_fork(struct perf_tool *tool, > > union perf_event *event, > > struct perf_sample *sample, > > @@ -647,6 +656,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool, > > size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp); > > size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp); > > size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp); > > +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp); > > size_t perf_event__fprintf_task(union perf_event *event, FILE *fp); > > size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp); > > size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp); > > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c > > index 04e536ae4d88..c4b88e4f2422 100644 > > --- a/tools/perf/util/evsel.c > > +++ b/tools/perf/util/evsel.c > > @@ -34,6 +34,7 @@ static struct { > > bool sample_id_all; > > bool exclude_guest; > > bool mmap2; > > + bool munmap; > > bool cloexec; > > bool clockid; > > bool clockid_wrong; > > @@ -930,6 +931,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts, > > attr->task = track; > > attr->mmap = track; > > attr->mmap2 = track && !perf_missing_features.mmap2; > > + attr->munmap = track && !perf_missing_features.munmap; > > attr->comm = track; > > > > if (opts->record_switch_events) > > @@ -1395,6 +1397,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr, > > PRINT_ATTRf(exclude_callchain_kernel, p_unsigned); > > PRINT_ATTRf(exclude_callchain_user, p_unsigned); > > PRINT_ATTRf(mmap2, p_unsigned); > > + PRINT_ATTRf(munmap, p_unsigned); > > PRINT_ATTRf(comm_exec, p_unsigned); > > PRINT_ATTRf(use_clockid, p_unsigned); > > PRINT_ATTRf(context_switch, p_unsigned); > > @@ -1474,6 +1477,8 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, > > } > > > > fallback_missing_features: > > + if (perf_missing_features.munmap) > > + evsel->attr.munmap = 0; > > if (perf_missing_features.clockid_wrong) > > evsel->attr.clockid = CLOCK_MONOTONIC; /* should always work */ > > if (perf_missing_features.clockid) { > > @@ -1603,10 +1608,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, > > goto out_close; > > > > /* > > - * Must probe features in the order they were added to the > > + * Must probe features in the reverse order they were added to the > > * perf_event_attr interface. > > */ > > - if (!perf_missing_features.write_backward && evsel->attr.write_backward) { > > + if (!perf_missing_features.munmap && evsel->attr.munmap) { > > + perf_missing_features.munmap = true; > > + goto fallback_missing_features; > > + } else if (!perf_missing_features.write_backward && evsel->attr.write_backward) { > > perf_missing_features.write_backward = true; > > goto out_close; > > } else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) { > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c > > index 747a034d1ff3..24f2f309d70f 100644 > > --- a/tools/perf/util/machine.c > > +++ b/tools/perf/util/machine.c > > @@ -1320,6 +1320,16 @@ static int machine__process_kernel_mmap_event(struct machine *machine, > > return -1; > > } > > > > +static int machine__process_kernel_munmap_event(struct machine *machine __maybe_unused, > > + union perf_event *event __maybe_unused) > > +{ > > + /* > > + * XXX: Fill this in as soon as we get munmap event for kernel > > + * "mmaps", aka module unload > > + */ > > + return 0; > > +} > > + > > int machine__process_mmap2_event(struct machine *machine, > > union perf_event *event, > > struct perf_sample *sample) > > @@ -1379,6 +1389,48 @@ int machine__process_mmap2_event(struct machine *machine, > > return 0; > > } > > > > +int machine__process_munmap_event(struct machine *machine, > > + union perf_event *event, > > + struct perf_sample *sample) > > +{ > > + struct thread *thread; > > + struct map *map; > > + enum map_type type = MAP__FUNCTION; > > + int ret = 0; > > + > > + if (dump_trace) > > + perf_event__fprintf_munmap(event, stdout); > > + > > + if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL || > > + sample->cpumode == PERF_RECORD_MISC_KERNEL) { > > + ret = machine__process_kernel_munmap_event(machine, event); > > + if (ret < 0) > > + goto out_problem; > > + return 0; > > + } > > + > > + thread = machine__find_thread(machine, sample->pid, sample->tid); > > + if (thread == NULL) > > + goto out_problem; > > + > > + if (event->header.misc & PERF_RECORD_MISC_MMAP_DATA) > > + type = MAP__VARIABLE; > > + > > + map = map_groups__find(thread->mg, type, event->munmap.start); > > + if (map != NULL) > > + goto out_problem_map; > > + > > + map_groups__remove(thread->mg, map); > > + thread__put(thread); > > + return 0; > > + > > +out_problem_map: > > + thread__put(thread); > > +out_problem: > > + dump_printf("problem processing PERF_RECORD_MUNMAP, skipping event.\n"); > > + return 0; > > +} > > + > > int machine__process_mmap_event(struct machine *machine, union perf_event *event, > > struct perf_sample *sample) > > { > > @@ -1540,6 +1592,8 @@ int machine__process_event(struct machine *machine, union perf_event *event, > > ret = machine__process_mmap_event(machine, event, sample); break; > > case PERF_RECORD_MMAP2: > > ret = machine__process_mmap2_event(machine, event, sample); break; > > + case PERF_RECORD_MUNMAP: > > + ret = machine__process_munmap_event(machine, event, sample); break; > > case PERF_RECORD_FORK: > > ret = machine__process_fork_event(machine, event, sample); break; > > case PERF_RECORD_EXIT: > > diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h > > index a28305029711..eabc8b9d8d36 100644 > > --- a/tools/perf/util/machine.h > > +++ b/tools/perf/util/machine.h > > @@ -101,6 +101,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event > > struct perf_sample *sample); > > int machine__process_mmap2_event(struct machine *machine, union perf_event *event, > > struct perf_sample *sample); > > +int machine__process_munmap_event(struct machine *machine, union perf_event *event, > > + struct perf_sample *sample); > > int machine__process_event(struct machine *machine, union perf_event *event, > > struct perf_sample *sample); > > > > diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c > > index a5fbc012e3df..047e004dc53f 100644 > > --- a/tools/perf/util/python.c > > +++ b/tools/perf/util/python.c > > @@ -1155,6 +1155,7 @@ static struct { > > PERF_CONST(RECORD_READ), > > PERF_CONST(RECORD_SAMPLE), > > PERF_CONST(RECORD_MMAP2), > > + PERF_CONST(RECORD_MUNMAP), > > PERF_CONST(RECORD_AUX), > > PERF_CONST(RECORD_ITRACE_START), > > PERF_CONST(RECORD_LOST_SAMPLES), > > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c > > index 349c68144e55..42edd68ce0b8 100644 > > --- a/tools/perf/util/session.c > > +++ b/tools/perf/util/session.c > > @@ -357,6 +357,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool) > > tool->mmap = process_event_stub; > > if (tool->mmap2 == NULL) > > tool->mmap2 = process_event_stub; > > + if (tool->munmap == NULL) > > + tool->munmap = process_event_stub; > > if (tool->comm == NULL) > > tool->comm = process_event_stub; > > if (tool->fork == NULL) > > @@ -480,6 +482,20 @@ static void perf_event__mmap2_swap(union perf_event *event, > > swap_sample_id_all(event, data); > > } > > } > > + > > +static void perf_event__munmap_swap(union perf_event *event, bool sample_id_all) > > +{ > > + event->munmap.start = bswap_64(event->munmap.start); > > + event->munmap.len = bswap_64(event->munmap.len); > > + > > + if (sample_id_all) { > > + void *data = &event->munmap.len; > > + > > + data += sizeof(event->munmap.len); > > + swap_sample_id_all(event, data); > > + } > > +} > > + > > static void perf_event__task_swap(union perf_event *event, bool sample_id_all) > > { > > event->fork.pid = bswap_32(event->fork.pid); > > @@ -773,6 +789,7 @@ typedef void (*perf_event__swap_op)(union perf_event *event, > > static perf_event__swap_op perf_event__swap_ops[] = { > > [PERF_RECORD_MMAP] = perf_event__mmap_swap, > > [PERF_RECORD_MMAP2] = perf_event__mmap2_swap, > > + [PERF_RECORD_MUNMAP] = perf_event__munmap_swap, > > [PERF_RECORD_COMM] = perf_event__comm_swap, > > [PERF_RECORD_FORK] = perf_event__task_swap, > > [PERF_RECORD_EXIT] = perf_event__task_swap, > > @@ -1237,6 +1254,8 @@ static int machines__deliver_event(struct machines *machines, > > if (event->header.misc & PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT) > > ++evlist->stats.nr_proc_map_timeout; > > return tool->mmap2(tool, event, sample, machine); > > + case PERF_RECORD_MUNMAP: > > + return tool->munmap(tool, event, sample, machine); > > case PERF_RECORD_COMM: > > return tool->comm(tool, event, sample, machine); > > case PERF_RECORD_FORK: > > diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h > > index ac2590a3de2d..66291063b8b2 100644 > > --- a/tools/perf/util/tool.h > > +++ b/tools/perf/util/tool.h > > @@ -39,6 +39,7 @@ struct perf_tool { > > read; > > event_op mmap, > > mmap2, > > + munmap, > > comm, > > fork, > > exit, ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] handle munmap records in tools/perf was: Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-27 18:05 ` Arnaldo Carvalho de Melo @ 2017-01-27 18:10 ` Stephane Eranian 2017-01-27 19:18 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 29+ messages in thread From: Stephane Eranian @ 2017-01-27 18:10 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Peter Zijlstra, Stephane Eranian, Andres Freund, LKML, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim On Fri, Jan 27, 2017 at 10:05 AM, Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote: > Em Fri, Jan 27, 2017 at 09:46:55AM -0800, Stephane Eranian escreveu: >> On Fri, Jan 27, 2017 at 9:38 AM, Arnaldo Carvalho de Melo >> <arnaldo.melo@gmail.com> wrote: >> > Em Fri, Jan 27, 2017 at 12:43:05PM -0300, Arnaldo Carvalho de Melo escreveu: >> >> Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu: >> >> > Something like the (compile tested only) below might be sufficient to >> >> > disambiguate things. It would need a corresponding tools/perf patch of >> >> > course, but I'm not too familiar with that code anymore. >> >> >> >> I'm working on patch to do feature test, fallback and handling of the >> >> event, etc, will post later. >> > >> > Just compile tested, need to build a kernel with PeterZ's patch to test, >> > feel free to go from there if in a hurry. >> > >> > >> > The place where the map is yanked out of the thread's maps rbtree is at >> > >> > machine__process_munmap_event() >> > >> > The rest is making sure the tool works with older kernels, deals with >> > endianness in the record in a perf.data file for cross platform >> > analysis, hooking it to the various tools where handling this event >> > makes sense. >> > >> At first glance this patch handles the munmap() well. But it will not solve >> the case of Andres. Unless you're telling me that the kernel with Peterz's patch > > > Nah, I just tried to implement support for the facility PeterZ was > proposing for the kernel, not trying to solve Andres, silly me ;-) > > But then it doesn't even does that well, as it needs to take munmap.len > into account, to possibly split the map if they aren't a perfect match > (start, len). > Ah, yes, that's correct. You need to consider len and possibly split the maps you have. >> will now generate munmap record because of the merging. If not, then the code >> handling overlaps needs to change as well as I described in my other Email. > > Will re-read it, found it confusing at first read :-\ > Took me some time to understand his test case as well. But the key point is the fact that anon cannot overlap a non-anon map without a prior munmap() anymore. > - Arnaldo > >> > [acme@jouet linux]$ diffstat /tmp/a.patch >> > include/uapi/linux/perf_event.h | 14 +++++++++- >> > perf/builtin-annotate.c | 1 >> > perf/builtin-c2c.c | 1 >> > perf/builtin-diff.c | 1 >> > perf/builtin-inject.c | 17 +++++++++++- >> > perf/builtin-kmem.c | 1 >> > perf/builtin-mem.c | 1 >> > perf/builtin-record.c | 1 >> > perf/builtin-report.c | 1 >> > perf/builtin-script.c | 30 ++++++++++++++++++++++ >> > perf/builtin-trace.c | 1 >> > perf/util/data-convert-bt.c | 1 >> > perf/util/event.c | 18 +++++++++++++ >> > perf/util/event.h | 10 +++++++ >> > perf/util/evsel.c | 12 +++++++- >> > perf/util/machine.c | 54 ++++++++++++++++++++++++++++++++++++++++ >> > perf/util/machine.h | 2 + >> > perf/util/python.c | 1 >> > perf/util/session.c | 19 ++++++++++++++ >> > perf/util/tool.h | 1 >> > 20 files changed, 183 insertions(+), 4 deletions(-) >> > [acme@jouet linux]$ >> > >> > diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h >> > index c66a485a24ac..59c5cd5abbbf 100644 >> > --- a/tools/include/uapi/linux/perf_event.h >> > +++ b/tools/include/uapi/linux/perf_event.h >> > @@ -344,7 +344,8 @@ struct perf_event_attr { >> > use_clockid : 1, /* use @clockid for time fields */ >> > context_switch : 1, /* context switch data */ >> > write_backward : 1, /* Write ring buffer from end to beginning */ >> > - __reserved_1 : 36; >> > + munmap : 1, /* include munmap data */ >> > + __reserved_1 : 35; >> > >> > union { >> > __u32 wakeup_events; /* wakeup every n events */ >> > @@ -862,6 +863,17 @@ enum perf_event_type { >> > */ >> > PERF_RECORD_SWITCH_CPU_WIDE = 15, >> > >> > + /* >> > + * struct { >> > + * struct perf_event_header header; >> > + * >> > + * u64 addr; >> > + * u64 len; >> > + * struct sample_id sample_id; >> > + * }; >> > + */ >> > + PERF_RECORD_MUNMAP = 16, >> > + >> > PERF_RECORD_MAX, /* non-ABI */ >> > }; >> > >> > diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c >> > index ebb628332a6e..082b5f100ac5 100644 >> > --- a/tools/perf/builtin-annotate.c >> > +++ b/tools/perf/builtin-annotate.c >> > @@ -390,6 +390,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused) >> > .sample = process_sample_event, >> > .mmap = perf_event__process_mmap, >> > .mmap2 = perf_event__process_mmap2, >> > + .munmap = perf_event__process_munmap, >> > .comm = perf_event__process_comm, >> > .exit = perf_event__process_exit, >> > .fork = perf_event__process_fork, >> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c >> > index e2b21723bbf8..3e1f1eba77e2 100644 >> > --- a/tools/perf/builtin-c2c.c >> > +++ b/tools/perf/builtin-c2c.c >> > @@ -313,6 +313,7 @@ static struct perf_c2c c2c = { >> > .sample = process_sample_event, >> > .mmap = perf_event__process_mmap, >> > .mmap2 = perf_event__process_mmap2, >> > + .munmap = perf_event__process_munmap, >> > .comm = perf_event__process_comm, >> > .exit = perf_event__process_exit, >> > .fork = perf_event__process_fork, >> > diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c >> > index 9ff0db4e2d0c..2c7856848887 100644 >> > --- a/tools/perf/builtin-diff.c >> > +++ b/tools/perf/builtin-diff.c >> > @@ -350,6 +350,7 @@ static struct perf_tool tool = { >> > .sample = diff__process_sample_event, >> > .mmap = perf_event__process_mmap, >> > .mmap2 = perf_event__process_mmap2, >> > + .munmap = perf_event__process_munmap, >> > .comm = perf_event__process_comm, >> > .exit = perf_event__process_exit, >> > .fork = perf_event__process_fork, >> > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c >> > index b9bc7e39833a..4e6103d0c163 100644 >> > --- a/tools/perf/builtin-inject.c >> > +++ b/tools/perf/builtin-inject.c >> > @@ -282,6 +282,19 @@ static int perf_event__repipe_mmap2(struct perf_tool *tool, >> > return err; >> > } >> > >> > +static int perf_event__repipe_munmap(struct perf_tool *tool, >> > + union perf_event *event, >> > + struct perf_sample *sample, >> > + struct machine *machine) >> > +{ >> > + int err; >> > + >> > + err = perf_event__process_munmap(tool, event, sample, machine); >> > + perf_event__repipe(tool, event, sample, machine); >> > + >> > + return err; >> > +} >> > + >> > #ifdef HAVE_JITDUMP >> > static int perf_event__jit_repipe_mmap2(struct perf_tool *tool, >> > union perf_event *event, >> > @@ -569,7 +582,7 @@ static void strip_init(struct perf_inject *inject) >> > static bool has_tracking(struct perf_evsel *evsel) >> > { >> > return evsel->attr.mmap || evsel->attr.mmap2 || evsel->attr.comm || >> > - evsel->attr.task; >> > + evsel->attr.task || evsel->attr.munmap; >> > } >> > >> > #define COMPAT_MASK (PERF_SAMPLE_ID | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \ >> > @@ -632,6 +645,7 @@ static int __cmd_inject(struct perf_inject *inject) >> > inject->itrace_synth_opts.set) { >> > inject->tool.mmap = perf_event__repipe_mmap; >> > inject->tool.mmap2 = perf_event__repipe_mmap2; >> > + inject->tool.munmap = perf_event__repipe_munmap; >> > inject->tool.fork = perf_event__repipe_fork; >> > inject->tool.tracing_data = perf_event__repipe_tracing_data; >> > } >> > @@ -732,6 +746,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused) >> > .sample = perf_event__repipe_sample, >> > .mmap = perf_event__repipe, >> > .mmap2 = perf_event__repipe, >> > + .munmap = perf_event__repipe, >> > .comm = perf_event__repipe, >> > .fork = perf_event__repipe, >> > .exit = perf_event__repipe, >> > diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c >> > index 29f4751a3574..eb59aa7a0f5b 100644 >> > --- a/tools/perf/builtin-kmem.c >> > +++ b/tools/perf/builtin-kmem.c >> > @@ -964,6 +964,7 @@ static struct perf_tool perf_kmem = { >> > .comm = perf_event__process_comm, >> > .mmap = perf_event__process_mmap, >> > .mmap2 = perf_event__process_mmap2, >> > + .munmap = perf_event__process_munmap, >> > .ordered_events = true, >> > }; >> > >> > diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c >> > index cd7bc4d104e2..de026e9ef1be 100644 >> > --- a/tools/perf/builtin-mem.c >> > +++ b/tools/perf/builtin-mem.c >> > @@ -338,6 +338,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused) >> > .sample = process_sample_event, >> > .mmap = perf_event__process_mmap, >> > .mmap2 = perf_event__process_mmap2, >> > + .munmap = perf_event__process_munmap, >> > .comm = perf_event__process_comm, >> > .lost = perf_event__process_lost, >> > .fork = perf_event__process_fork, >> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c >> > index ffac8ca9fb01..f197fc196cd8 100644 >> > --- a/tools/perf/builtin-record.c >> > +++ b/tools/perf/builtin-record.c >> > @@ -1498,6 +1498,7 @@ static struct record record = { >> > .comm = perf_event__process_comm, >> > .mmap = perf_event__process_mmap, >> > .mmap2 = perf_event__process_mmap2, >> > + .munmap = perf_event__process_munmap, >> > .ordered_events = true, >> > }, >> > }; >> > diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c >> > index dbd7fa028861..96680fbeb664 100644 >> > --- a/tools/perf/builtin-report.c >> > +++ b/tools/perf/builtin-report.c >> > @@ -693,6 +693,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused) >> > .sample = process_sample_event, >> > .mmap = perf_event__process_mmap, >> > .mmap2 = perf_event__process_mmap2, >> > + .munmap = perf_event__process_munmap, >> > .comm = perf_event__process_comm, >> > .exit = perf_event__process_exit, >> > .fork = perf_event__process_fork, >> > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c >> > index c0783b4f7b6c..27fcbcb15782 100644 >> > --- a/tools/perf/builtin-script.c >> > +++ b/tools/perf/builtin-script.c >> > @@ -1244,6 +1244,34 @@ static int process_mmap2_event(struct perf_tool *tool, >> > return 0; >> > } >> > >> > +static int process_munmap_event(struct perf_tool *tool, union perf_event *event, >> > + struct perf_sample *sample, struct machine *machine) >> > +{ >> > + struct thread *thread; >> > + struct perf_script *script = container_of(tool, struct perf_script, tool); >> > + struct perf_session *session = script->session; >> > + struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id); >> > + >> > + if (perf_event__process_munmap(tool, event, sample, machine) < 0) >> > + return -1; >> > + >> > + thread = machine__findnew_thread(machine, sample->pid, sample->tid); >> > + if (thread == NULL) { >> > + pr_debug("problem processing MUNMAP event, skipping it.\n"); >> > + return -1; >> > + } >> > + >> > + if (!evsel->attr.sample_id_all) { >> > + pr_debug("MUNMAP event requires attr.sample_id_all, skipping it.\n"); >> > + return -1; >> > + } >> > + >> > + print_sample_start(sample, thread, evsel); >> > + perf_event__fprintf(event, stdout); >> > + thread__put(thread); >> > + return 0; >> > +} >> > + >> > static int process_switch_event(struct perf_tool *tool, >> > union perf_event *event, >> > struct perf_sample *sample, >> > @@ -1290,6 +1318,7 @@ static int __cmd_script(struct perf_script *script) >> > if (script->show_mmap_events) { >> > script->tool.mmap = process_mmap_event; >> > script->tool.mmap2 = process_mmap2_event; >> > + script->tool.munmap = process_munmap_event; >> > } >> > if (script->show_switch_events) >> > script->tool.context_switch = process_switch_event; >> > @@ -2096,6 +2125,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused) >> > .sample = process_sample_event, >> > .mmap = perf_event__process_mmap, >> > .mmap2 = perf_event__process_mmap2, >> > + .munmap = perf_event__process_munmap, >> > .comm = perf_event__process_comm, >> > .exit = perf_event__process_exit, >> > .fork = perf_event__process_fork, >> > diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c >> > index 40ef9b293d1b..cfb7e9e2cd7d 100644 >> > --- a/tools/perf/builtin-trace.c >> > +++ b/tools/perf/builtin-trace.c >> > @@ -2411,6 +2411,7 @@ static int trace__replay(struct trace *trace) >> > trace->tool.sample = trace__process_sample; >> > trace->tool.mmap = perf_event__process_mmap; >> > trace->tool.mmap2 = perf_event__process_mmap2; >> > + trace->tool.munmap = perf_event__process_munmap; >> > trace->tool.comm = perf_event__process_comm; >> > trace->tool.exit = perf_event__process_exit; >> > trace->tool.fork = perf_event__process_fork; >> > diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c >> > index 4e6cbc99f08e..8d1feb9fe82d 100644 >> > --- a/tools/perf/util/data-convert-bt.c >> > +++ b/tools/perf/util/data-convert-bt.c >> > @@ -1462,6 +1462,7 @@ int bt_convert__perf2ctf(const char *input, const char *path, >> > .sample = process_sample_event, >> > .mmap = perf_event__process_mmap, >> > .mmap2 = perf_event__process_mmap2, >> > + .munmap = perf_event__process_munmap, >> > .comm = perf_event__process_comm, >> > .exit = perf_event__process_exit, >> > .fork = perf_event__process_fork, >> > diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c >> > index 8ab0d7da956b..ffcba9c07c0d 100644 >> > --- a/tools/perf/util/event.c >> > +++ b/tools/perf/util/event.c >> > @@ -18,6 +18,7 @@ static const char *perf_event__names[] = { >> > [0] = "TOTAL", >> > [PERF_RECORD_MMAP] = "MMAP", >> > [PERF_RECORD_MMAP2] = "MMAP2", >> > + [PERF_RECORD_MUNMAP] = "MUNMAP", >> > [PERF_RECORD_LOST] = "LOST", >> > [PERF_RECORD_COMM] = "COMM", >> > [PERF_RECORD_EXIT] = "EXIT", >> > @@ -1080,6 +1081,12 @@ size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp) >> > event->mmap2.filename); >> > } >> > >> > +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp) >> > +{ >> > + return fprintf(fp, " [%#" PRIx64 ", %" PRIu64 "]\n", >> > + event->munmap.start, event->munmap.len); >> > +} >> > + >> > size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp) >> > { >> > struct thread_map *threads = thread_map__new_event(&event->thread_map); >> > @@ -1128,6 +1135,14 @@ int perf_event__process_mmap2(struct perf_tool *tool __maybe_unused, >> > return machine__process_mmap2_event(machine, event, sample); >> > } >> > >> > +int perf_event__process_munmap(struct perf_tool *tool __maybe_unused, >> > + union perf_event *event, >> > + struct perf_sample *sample, >> > + struct machine *machine) >> > +{ >> > + return machine__process_munmap_event(machine, event, sample); >> > +} >> > + >> > size_t perf_event__fprintf_task(union perf_event *event, FILE *fp) >> > { >> > return fprintf(fp, "(%d:%d):(%d:%d)\n", >> > @@ -1199,6 +1214,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp) >> > case PERF_RECORD_MMAP2: >> > ret += perf_event__fprintf_mmap2(event, fp); >> > break; >> > + case PERF_RECORD_MUNMAP: >> > + ret += perf_event__fprintf_munmap(event, fp); >> > + break; >> > case PERF_RECORD_AUX: >> > ret += perf_event__fprintf_aux(event, fp); >> > break; >> > diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h >> > index c735c53a26f8..00942b6a9d48 100644 >> > --- a/tools/perf/util/event.h >> > +++ b/tools/perf/util/event.h >> > @@ -33,6 +33,12 @@ struct mmap2_event { >> > char filename[PATH_MAX]; >> > }; >> > >> > +struct munmap_event { >> > + struct perf_event_header header; >> > + u64 start; >> > + u64 len; >> > +}; >> > + >> > struct comm_event { >> > struct perf_event_header header; >> > u32 pid, tid; >> > @@ -484,6 +490,7 @@ union perf_event { >> > struct perf_event_header header; >> > struct mmap_event mmap; >> > struct mmap2_event mmap2; >> > + struct munmap_event munmap; >> > struct comm_event comm; >> > struct fork_event fork; >> > struct lost_event lost; >> > @@ -595,6 +602,8 @@ int perf_event__process_mmap2(struct perf_tool *tool, >> > union perf_event *event, >> > struct perf_sample *sample, >> > struct machine *machine); >> > +int perf_event__process_munmap(struct perf_tool *tool, union perf_event *event, >> > + struct perf_sample *sample, struct machine *machine); >> > int perf_event__process_fork(struct perf_tool *tool, >> > union perf_event *event, >> > struct perf_sample *sample, >> > @@ -647,6 +656,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool, >> > size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp); >> > size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp); >> > size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp); >> > +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp); >> > size_t perf_event__fprintf_task(union perf_event *event, FILE *fp); >> > size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp); >> > size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp); >> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c >> > index 04e536ae4d88..c4b88e4f2422 100644 >> > --- a/tools/perf/util/evsel.c >> > +++ b/tools/perf/util/evsel.c >> > @@ -34,6 +34,7 @@ static struct { >> > bool sample_id_all; >> > bool exclude_guest; >> > bool mmap2; >> > + bool munmap; >> > bool cloexec; >> > bool clockid; >> > bool clockid_wrong; >> > @@ -930,6 +931,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts, >> > attr->task = track; >> > attr->mmap = track; >> > attr->mmap2 = track && !perf_missing_features.mmap2; >> > + attr->munmap = track && !perf_missing_features.munmap; >> > attr->comm = track; >> > >> > if (opts->record_switch_events) >> > @@ -1395,6 +1397,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr, >> > PRINT_ATTRf(exclude_callchain_kernel, p_unsigned); >> > PRINT_ATTRf(exclude_callchain_user, p_unsigned); >> > PRINT_ATTRf(mmap2, p_unsigned); >> > + PRINT_ATTRf(munmap, p_unsigned); >> > PRINT_ATTRf(comm_exec, p_unsigned); >> > PRINT_ATTRf(use_clockid, p_unsigned); >> > PRINT_ATTRf(context_switch, p_unsigned); >> > @@ -1474,6 +1477,8 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, >> > } >> > >> > fallback_missing_features: >> > + if (perf_missing_features.munmap) >> > + evsel->attr.munmap = 0; >> > if (perf_missing_features.clockid_wrong) >> > evsel->attr.clockid = CLOCK_MONOTONIC; /* should always work */ >> > if (perf_missing_features.clockid) { >> > @@ -1603,10 +1608,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, >> > goto out_close; >> > >> > /* >> > - * Must probe features in the order they were added to the >> > + * Must probe features in the reverse order they were added to the >> > * perf_event_attr interface. >> > */ >> > - if (!perf_missing_features.write_backward && evsel->attr.write_backward) { >> > + if (!perf_missing_features.munmap && evsel->attr.munmap) { >> > + perf_missing_features.munmap = true; >> > + goto fallback_missing_features; >> > + } else if (!perf_missing_features.write_backward && evsel->attr.write_backward) { >> > perf_missing_features.write_backward = true; >> > goto out_close; >> > } else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) { >> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c >> > index 747a034d1ff3..24f2f309d70f 100644 >> > --- a/tools/perf/util/machine.c >> > +++ b/tools/perf/util/machine.c >> > @@ -1320,6 +1320,16 @@ static int machine__process_kernel_mmap_event(struct machine *machine, >> > return -1; >> > } >> > >> > +static int machine__process_kernel_munmap_event(struct machine *machine __maybe_unused, >> > + union perf_event *event __maybe_unused) >> > +{ >> > + /* >> > + * XXX: Fill this in as soon as we get munmap event for kernel >> > + * "mmaps", aka module unload >> > + */ >> > + return 0; >> > +} >> > + >> > int machine__process_mmap2_event(struct machine *machine, >> > union perf_event *event, >> > struct perf_sample *sample) >> > @@ -1379,6 +1389,48 @@ int machine__process_mmap2_event(struct machine *machine, >> > return 0; >> > } >> > >> > +int machine__process_munmap_event(struct machine *machine, >> > + union perf_event *event, >> > + struct perf_sample *sample) >> > +{ >> > + struct thread *thread; >> > + struct map *map; >> > + enum map_type type = MAP__FUNCTION; >> > + int ret = 0; >> > + >> > + if (dump_trace) >> > + perf_event__fprintf_munmap(event, stdout); >> > + >> > + if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL || >> > + sample->cpumode == PERF_RECORD_MISC_KERNEL) { >> > + ret = machine__process_kernel_munmap_event(machine, event); >> > + if (ret < 0) >> > + goto out_problem; >> > + return 0; >> > + } >> > + >> > + thread = machine__find_thread(machine, sample->pid, sample->tid); >> > + if (thread == NULL) >> > + goto out_problem; >> > + >> > + if (event->header.misc & PERF_RECORD_MISC_MMAP_DATA) >> > + type = MAP__VARIABLE; >> > + >> > + map = map_groups__find(thread->mg, type, event->munmap.start); >> > + if (map != NULL) >> > + goto out_problem_map; >> > + >> > + map_groups__remove(thread->mg, map); >> > + thread__put(thread); >> > + return 0; >> > + >> > +out_problem_map: >> > + thread__put(thread); >> > +out_problem: >> > + dump_printf("problem processing PERF_RECORD_MUNMAP, skipping event.\n"); >> > + return 0; >> > +} >> > + >> > int machine__process_mmap_event(struct machine *machine, union perf_event *event, >> > struct perf_sample *sample) >> > { >> > @@ -1540,6 +1592,8 @@ int machine__process_event(struct machine *machine, union perf_event *event, >> > ret = machine__process_mmap_event(machine, event, sample); break; >> > case PERF_RECORD_MMAP2: >> > ret = machine__process_mmap2_event(machine, event, sample); break; >> > + case PERF_RECORD_MUNMAP: >> > + ret = machine__process_munmap_event(machine, event, sample); break; >> > case PERF_RECORD_FORK: >> > ret = machine__process_fork_event(machine, event, sample); break; >> > case PERF_RECORD_EXIT: >> > diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h >> > index a28305029711..eabc8b9d8d36 100644 >> > --- a/tools/perf/util/machine.h >> > +++ b/tools/perf/util/machine.h >> > @@ -101,6 +101,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event >> > struct perf_sample *sample); >> > int machine__process_mmap2_event(struct machine *machine, union perf_event *event, >> > struct perf_sample *sample); >> > +int machine__process_munmap_event(struct machine *machine, union perf_event *event, >> > + struct perf_sample *sample); >> > int machine__process_event(struct machine *machine, union perf_event *event, >> > struct perf_sample *sample); >> > >> > diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c >> > index a5fbc012e3df..047e004dc53f 100644 >> > --- a/tools/perf/util/python.c >> > +++ b/tools/perf/util/python.c >> > @@ -1155,6 +1155,7 @@ static struct { >> > PERF_CONST(RECORD_READ), >> > PERF_CONST(RECORD_SAMPLE), >> > PERF_CONST(RECORD_MMAP2), >> > + PERF_CONST(RECORD_MUNMAP), >> > PERF_CONST(RECORD_AUX), >> > PERF_CONST(RECORD_ITRACE_START), >> > PERF_CONST(RECORD_LOST_SAMPLES), >> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c >> > index 349c68144e55..42edd68ce0b8 100644 >> > --- a/tools/perf/util/session.c >> > +++ b/tools/perf/util/session.c >> > @@ -357,6 +357,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool) >> > tool->mmap = process_event_stub; >> > if (tool->mmap2 == NULL) >> > tool->mmap2 = process_event_stub; >> > + if (tool->munmap == NULL) >> > + tool->munmap = process_event_stub; >> > if (tool->comm == NULL) >> > tool->comm = process_event_stub; >> > if (tool->fork == NULL) >> > @@ -480,6 +482,20 @@ static void perf_event__mmap2_swap(union perf_event *event, >> > swap_sample_id_all(event, data); >> > } >> > } >> > + >> > +static void perf_event__munmap_swap(union perf_event *event, bool sample_id_all) >> > +{ >> > + event->munmap.start = bswap_64(event->munmap.start); >> > + event->munmap.len = bswap_64(event->munmap.len); >> > + >> > + if (sample_id_all) { >> > + void *data = &event->munmap.len; >> > + >> > + data += sizeof(event->munmap.len); >> > + swap_sample_id_all(event, data); >> > + } >> > +} >> > + >> > static void perf_event__task_swap(union perf_event *event, bool sample_id_all) >> > { >> > event->fork.pid = bswap_32(event->fork.pid); >> > @@ -773,6 +789,7 @@ typedef void (*perf_event__swap_op)(union perf_event *event, >> > static perf_event__swap_op perf_event__swap_ops[] = { >> > [PERF_RECORD_MMAP] = perf_event__mmap_swap, >> > [PERF_RECORD_MMAP2] = perf_event__mmap2_swap, >> > + [PERF_RECORD_MUNMAP] = perf_event__munmap_swap, >> > [PERF_RECORD_COMM] = perf_event__comm_swap, >> > [PERF_RECORD_FORK] = perf_event__task_swap, >> > [PERF_RECORD_EXIT] = perf_event__task_swap, >> > @@ -1237,6 +1254,8 @@ static int machines__deliver_event(struct machines *machines, >> > if (event->header.misc & PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT) >> > ++evlist->stats.nr_proc_map_timeout; >> > return tool->mmap2(tool, event, sample, machine); >> > + case PERF_RECORD_MUNMAP: >> > + return tool->munmap(tool, event, sample, machine); >> > case PERF_RECORD_COMM: >> > return tool->comm(tool, event, sample, machine); >> > case PERF_RECORD_FORK: >> > diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h >> > index ac2590a3de2d..66291063b8b2 100644 >> > --- a/tools/perf/util/tool.h >> > +++ b/tools/perf/util/tool.h >> > @@ -39,6 +39,7 @@ struct perf_tool { >> > read; >> > event_op mmap, >> > mmap2, >> > + munmap, >> > comm, >> > fork, >> > exit, ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] handle munmap records in tools/perf was: Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-27 18:10 ` Stephane Eranian @ 2017-01-27 19:18 ` Arnaldo Carvalho de Melo 2017-01-27 19:26 ` Stephane Eranian 0 siblings, 1 reply; 29+ messages in thread From: Arnaldo Carvalho de Melo @ 2017-01-27 19:18 UTC (permalink / raw) To: eranian Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Stephane Eranian, Andres Freund, LKML, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim Em Fri, Jan 27, 2017 at 10:10:09AM -0800, Stephane Eranian escreveu: > On Fri, Jan 27, 2017 at 10:05 AM, Arnaldo Carvalho de Melo > <arnaldo.melo@gmail.com> wrote: > > Em Fri, Jan 27, 2017 at 09:46:55AM -0800, Stephane Eranian escreveu: > >> On Fri, Jan 27, 2017 at 9:38 AM, Arnaldo Carvalho de Melo > >> <arnaldo.melo@gmail.com> wrote: > >> > Em Fri, Jan 27, 2017 at 12:43:05PM -0300, Arnaldo Carvalho de Melo escreveu: > >> >> Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu: > >> >> > Something like the (compile tested only) below might be sufficient to > >> >> > disambiguate things. It would need a corresponding tools/perf patch of > >> >> > course, but I'm not too familiar with that code anymore. > >> >> > >> >> I'm working on patch to do feature test, fallback and handling of the > >> >> event, etc, will post later. > >> > > >> > Just compile tested, need to build a kernel with PeterZ's patch to test, > >> > feel free to go from there if in a hurry. > >> > > >> > > >> > The place where the map is yanked out of the thread's maps rbtree is at > >> > > >> > machine__process_munmap_event() > >> > > >> > The rest is making sure the tool works with older kernels, deals with > >> > endianness in the record in a perf.data file for cross platform > >> > analysis, hooking it to the various tools where handling this event > >> > makes sense. > >> > > >> At first glance this patch handles the munmap() well. But it will not solve > >> the case of Andres. Unless you're telling me that the kernel with Peterz's patch > > > > > > Nah, I just tried to implement support for the facility PeterZ was > > proposing for the kernel, not trying to solve Andres, silly me ;-) > > > > But then it doesn't even does that well, as it needs to take munmap.len > > into account, to possibly split the map if they aren't a perfect match > > (start, len). > > > Ah, yes, that's correct. You need to consider len and possibly split > the maps you have. In fact we need to replicate the kernel's do_munmap() routine, as the munmap range may straddle multiple existing maps, possibly splitting the first and/or the last, ditching whatever is in between. - Arnaldo ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] handle munmap records in tools/perf was: Re: perf/jit doesn't cope well with mprotect() to jit containing pages 2017-01-27 19:18 ` Arnaldo Carvalho de Melo @ 2017-01-27 19:26 ` Stephane Eranian 0 siblings, 0 replies; 29+ messages in thread From: Stephane Eranian @ 2017-01-27 19:26 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Peter Zijlstra, Stephane Eranian, Andres Freund, LKML, Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim On Fri, Jan 27, 2017 at 11:18 AM, Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote: > Em Fri, Jan 27, 2017 at 10:10:09AM -0800, Stephane Eranian escreveu: >> On Fri, Jan 27, 2017 at 10:05 AM, Arnaldo Carvalho de Melo >> <arnaldo.melo@gmail.com> wrote: >> > Em Fri, Jan 27, 2017 at 09:46:55AM -0800, Stephane Eranian escreveu: >> >> On Fri, Jan 27, 2017 at 9:38 AM, Arnaldo Carvalho de Melo >> >> <arnaldo.melo@gmail.com> wrote: >> >> > Em Fri, Jan 27, 2017 at 12:43:05PM -0300, Arnaldo Carvalho de Melo escreveu: >> >> >> Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu: >> >> >> > Something like the (compile tested only) below might be sufficient to >> >> >> > disambiguate things. It would need a corresponding tools/perf patch of >> >> >> > course, but I'm not too familiar with that code anymore. >> >> >> >> >> >> I'm working on patch to do feature test, fallback and handling of the >> >> >> event, etc, will post later. >> >> > >> >> > Just compile tested, need to build a kernel with PeterZ's patch to test, >> >> > feel free to go from there if in a hurry. >> >> > >> >> > >> >> > The place where the map is yanked out of the thread's maps rbtree is at >> >> > >> >> > machine__process_munmap_event() >> >> > >> >> > The rest is making sure the tool works with older kernels, deals with >> >> > endianness in the record in a perf.data file for cross platform >> >> > analysis, hooking it to the various tools where handling this event >> >> > makes sense. >> >> > >> >> At first glance this patch handles the munmap() well. But it will not solve >> >> the case of Andres. Unless you're telling me that the kernel with Peterz's patch >> > >> > >> > Nah, I just tried to implement support for the facility PeterZ was >> > proposing for the kernel, not trying to solve Andres, silly me ;-) >> > >> > But then it doesn't even does that well, as it needs to take munmap.len >> > into account, to possibly split the map if they aren't a perfect match >> > (start, len). >> > >> Ah, yes, that's correct. You need to consider len and possibly split >> the maps you have. > > In fact we need to replicate the kernel's do_munmap() routine, as the > munmap range may straddle multiple existing maps, possibly splitting the > first and/or the last, ditching whatever is in between. > Yes. So this is not a small change in perf unfortunately. ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2017-01-30 11:53 UTC | newest] Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-12-10 5:02 perf/jit doesn't cope well with mprotect() to jit containing pages Andres Freund 2016-12-12 8:49 ` Peter Zijlstra 2016-12-12 9:01 ` Andres Freund 2016-12-12 9:28 ` Peter Zijlstra 2017-01-26 1:25 ` Andres Freund 2017-01-26 22:15 ` Peter Zijlstra 2017-01-26 23:04 ` Andres Freund 2017-01-30 11:52 ` [tip:perf/core] perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory tip-bot for Peter Zijlstra 2017-01-26 20:32 ` perf/jit doesn't cope well with mprotect() to jit containing pages Stephane Eranian 2017-01-26 21:00 ` Andres Freund 2017-01-26 21:17 ` Stephane Eranian 2017-01-26 21:22 ` Andres Freund 2017-01-26 21:34 ` Stephane Eranian 2017-01-26 21:51 ` Andres Freund 2017-01-26 22:19 ` Peter Zijlstra 2017-01-26 22:26 ` Stephane Eranian 2017-01-26 22:38 ` Andres Freund 2017-01-26 22:51 ` Stephane Eranian 2017-01-26 23:09 ` Andres Freund 2017-01-26 23:16 ` Stephane Eranian 2017-01-27 13:07 ` Peter Zijlstra 2017-01-27 15:43 ` Arnaldo Carvalho de Melo 2017-01-27 17:35 ` Stephane Eranian 2017-01-27 17:38 ` [PATCH] handle munmap records in tools/perf was: " Arnaldo Carvalho de Melo 2017-01-27 17:46 ` Stephane Eranian 2017-01-27 18:05 ` Arnaldo Carvalho de Melo 2017-01-27 18:10 ` Stephane Eranian 2017-01-27 19:18 ` Arnaldo Carvalho de Melo 2017-01-27 19:26 ` Stephane Eranian
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).