linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* perf/jit doesn't cope well with mprotect() to jit containing pages
@ 2016-12-10  5:02 Andres Freund
  2016-12-12  8:49 ` Peter Zijlstra
  2017-01-26 20:32 ` perf/jit doesn't cope well with mprotect() to jit containing pages Stephane Eranian
  0 siblings, 2 replies; 29+ messages in thread
From: Andres Freund @ 2016-12-10  5:02 UTC (permalink / raw)
  To: linux-kernel, Stephane Eranian
  Cc: acme, jolsa, peterz, mingo, anton, namhyung, Stephane Eranian

Hi,

While working on optionally jit-compiling parts of postgres using llvm
(MCJIT currently, but Orc would have the same issue afaics), I'm trying
to use perf jit support to make profiling of those JITed parts easier.

Turns out the current jit support in perf doesn't work that well for
LLVM - but it doesn't primarily look like LLVM's fault. Syscall-wise
llvm does (heavily filtered):

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000
mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0
mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0
write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68

mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000
mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000
mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0
mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0
write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68

...

i.e. it mmaps single pages for the each JITed function's sections. Which
makes sense, because the first function is JITed independently from the
second one.

The corresponding MMAP2 records according to perf perf script
--show-mmap-events are:
        postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
        postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
        postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
        postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
Note how the size of the mapping continually increases, so that the each
MMAP2 record covers previous sections.

If one perf inject --jit into that it looks like:
        postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
        postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
        postgres  4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so
        postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
        postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
        postgres  4107 595444.884232: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x45) @ 0x40 fd:02 33434599 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-2.so

Note how the first injected record is also covered by the following
"//anon" event.  This leads to the the curious effect that samples for
the first function (evalexpr0) are associated with the right generated
.so, until the second function is JITed.

I hacked up perf inject to omit such MMAP2 records by adding
	if (event->mmap2.prot == 0)
		return 0;
to perf_event__jit_repipe_mmap2() and suddenly things work.

I presume the increasing MMAP2 size is triggered by the consecutive
pages being represented as a single page-range in the kernel?

If I, to work around such consecutive pages, force another page to be
mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes
away.

Am I doing something wrong, or is there a bug here?

FWIW, this is on linux 4.8.8, with perf from master
(v4.9-rc8-108-g810ac7b7558d).

BTW, it's also a bit weird that those MMAP2 records triggered by
mprotect/mmap, have prot set to 0...

Regards,

Andres

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2016-12-10  5:02 perf/jit doesn't cope well with mprotect() to jit containing pages Andres Freund
@ 2016-12-12  8:49 ` Peter Zijlstra
  2016-12-12  9:01   ` Andres Freund
  2017-01-26 22:15   ` Peter Zijlstra
  2017-01-26 20:32 ` perf/jit doesn't cope well with mprotect() to jit containing pages Stephane Eranian
  1 sibling, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2016-12-12  8:49 UTC (permalink / raw)
  To: Andres Freund
  Cc: linux-kernel, Stephane Eranian, acme, jolsa, mingo, anton,
	namhyung, Stephane Eranian

On Fri, Dec 09, 2016 at 09:02:18PM -0800, Andres Freund wrote:
> 
> I presume the increasing MMAP2 size is triggered by the consecutive
> pages being represented as a single page-range in the kernel?

Yes, we print struct vm_area_struct based information, if vmas get
merged, that shows.

> If I, to work around such consecutive pages, force another page to be
> mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes
> away.

This would indeed inhibit vma merging.

> Am I doing something wrong, or is there a bug here?

Expected behaviour afaict

> BTW, it's also a bit weird that those MMAP2 records triggered by
> mprotect/mmap, have prot set to 0...

Yes, mprotect() does: vma->vm_flags = newflags; before calling
perf_event_mmap(vma); which then looks at VM_{READ,WRITE,EXEC} bits in
that word to generate the prot value.

So that being 0 is a bit weird.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2016-12-12  8:49 ` Peter Zijlstra
@ 2016-12-12  9:01   ` Andres Freund
  2016-12-12  9:28     ` Peter Zijlstra
  2017-01-26 22:15   ` Peter Zijlstra
  1 sibling, 1 reply; 29+ messages in thread
From: Andres Freund @ 2016-12-12  9:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Stephane Eranian, acme, jolsa, mingo, anton,
	namhyung, Stephane Eranian



On December 12, 2016 12:49:03 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
>On Fri, Dec 09, 2016 at 09:02:18PM -0800, Andres Freund wrote:

>> Am I doing something wrong, or is there a bug here?
>
>Expected behaviour afaict

So I need to prevent vma  merging to use perf jit support? That seems a bit weird.  Possibly the inject --jit  pass needs to do something about this? A hack might be to re-emit  the fake mmap2 records after ones potentially overwriting previously emitted records.   Don't know perf internals well enough to suggest something prettier.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2016-12-12  9:01   ` Andres Freund
@ 2016-12-12  9:28     ` Peter Zijlstra
  2017-01-26  1:25       ` Andres Freund
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2016-12-12  9:28 UTC (permalink / raw)
  To: Andres Freund
  Cc: linux-kernel, Stephane Eranian, acme, jolsa, mingo, anton,
	namhyung, Stephane Eranian

On Mon, Dec 12, 2016 at 01:01:48AM -0800, Andres Freund wrote:
> 
> 
> On December 12, 2016 12:49:03 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
> >On Fri, Dec 09, 2016 at 09:02:18PM -0800, Andres Freund wrote:
> 
> >> Am I doing something wrong, or is there a bug here?
> >
> >Expected behaviour afaict
> 
> So I need to prevent vma  merging to use perf jit support? That seems
> a bit weird. 

No idea about that, I don't really have the whole JIT stuff present atm.
I just know how the mmap stuff works.

Stephane?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2016-12-12  9:28     ` Peter Zijlstra
@ 2017-01-26  1:25       ` Andres Freund
  0 siblings, 0 replies; 29+ messages in thread
From: Andres Freund @ 2017-01-26  1:25 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, Peter Zijlstra, acme, jolsa, mingo, anton,
	namhyung, Stephane Eranian

Hi,

On 2016-12-12 10:28:13 +0100, Peter Zijlstra wrote:
> On Mon, Dec 12, 2016 at 01:01:48AM -0800, Andres Freund wrote:
> > 
> > 
> > On December 12, 2016 12:49:03 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
> > >On Fri, Dec 09, 2016 at 09:02:18PM -0800, Andres Freund wrote:
> > 
> > >> Am I doing something wrong, or is there a bug here?
> > >
> > >Expected behaviour afaict
> > 
> > So I need to prevent vma  merging to use perf jit support? That seems
> > a bit weird. 
> 
> No idea about that, I don't really have the whole JIT stuff present atm.
> I just know how the mmap stuff works.
> 
> Stephane?

Ping?

Andres

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2016-12-10  5:02 perf/jit doesn't cope well with mprotect() to jit containing pages Andres Freund
  2016-12-12  8:49 ` Peter Zijlstra
@ 2017-01-26 20:32 ` Stephane Eranian
  2017-01-26 21:00   ` Andres Freund
  1 sibling, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2017-01-26 20:32 UTC (permalink / raw)
  To: Andres Freund
  Cc: LKML, Stephane Eranian, Arnaldo Carvalho de Melo, Jiri Olsa,
	Peter Zijlstra, Ingo Molnar, anton, Namhyung Kim

Hi Andres,

Sorry for the delay.

On Fri, Dec 9, 2016 at 9:02 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> While working on optionally jit-compiling parts of postgres using llvm
> (MCJIT currently, but Orc would have the same issue afaics), I'm trying
> to use perf jit support to make profiling of those JITed parts easier.
>
> Turns out the current jit support in perf doesn't work that well for
> LLVM - but it doesn't primarily look like LLVM's fault. Syscall-wise
> llvm does (heavily filtered):
>
Ok, when you say you are using perf jit with LLVM, I want to understand how?

The perf jit support requires cooperation from the jitted runtime. For
the case of Java
there is a JVMTI agent (shared lib) that generates a jitdump info file
with information about
how the runtime jitted functions and what is mapped where. The kernel
is still recording
the mmap syscalls. But you need more to map an address to a symbol.
You need to know
where and when the JIT compiler mapped functions, when they are
jitted, re-jitted or moved. This is the
role of the agent to encode all of this in the jitdump file which is
then used by perf inject
to rewrite the perf.data file with the new mmap records that point to
the .so files that it generates
for each jitted function.

Did you write such support? Is it already there in LLVM?
If not, then you will not be able to correctly correlate symbols.


> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000
> mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0
> mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0
> write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68
>
> mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000
> mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000
> mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0
> mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0
> write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68
>
> ...
>
> i.e. it mmaps single pages for the each JITed function's sections. Which
> makes sense, because the first function is JITed independently from the
> second one.
>
> The corresponding MMAP2 records according to perf perf script
> --show-mmap-events are:
>         postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
> Note how the size of the mapping continually increases, so that the each
> MMAP2 record covers previous sections.
>
> If one perf inject --jit into that it looks like:
>         postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so
>         postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.884232: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x45) @ 0x40 fd:02 33434599 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-2.so
>
> Note how the first injected record is also covered by the following
> "//anon" event.  This leads to the the curious effect that samples for
> the first function (evalexpr0) are associated with the right generated
> .so, until the second function is JITed.
>
> I hacked up perf inject to omit such MMAP2 records by adding
>         if (event->mmap2.prot == 0)
>                 return 0;
> to perf_event__jit_repipe_mmap2() and suddenly things work.
>
> I presume the increasing MMAP2 size is triggered by the consecutive
> pages being represented as a single page-range in the kernel?
>
> If I, to work around such consecutive pages, force another page to be
> mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes
> away.
>
> Am I doing something wrong, or is there a bug here?
>
> FWIW, this is on linux 4.8.8, with perf from master
> (v4.9-rc8-108-g810ac7b7558d).
>
> BTW, it's also a bit weird that those MMAP2 records triggered by
> mprotect/mmap, have prot set to 0...
>
> Regards,
>
> Andres
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 20:32 ` perf/jit doesn't cope well with mprotect() to jit containing pages Stephane Eranian
@ 2017-01-26 21:00   ` Andres Freund
  2017-01-26 21:17     ` Stephane Eranian
  2017-01-26 22:19     ` Peter Zijlstra
  0 siblings, 2 replies; 29+ messages in thread
From: Andres Freund @ 2017-01-26 21:00 UTC (permalink / raw)
  To: eranian
  Cc: LKML, Stephane Eranian, Arnaldo Carvalho de Melo, Jiri Olsa,
	Peter Zijlstra, Ingo Molnar, anton, Namhyung Kim

Hi Stephane,

On 2017-01-26 12:32:02 -0800, Stephane Eranian wrote:
> On Fri, Dec 9, 2016 at 9:02 PM, Andres Freund <andres@anarazel.de> wrote:
> > Hi,
> >
> > While working on optionally jit-compiling parts of postgres using llvm
> > (MCJIT currently, but Orc would have the same issue afaics), I'm trying
> > to use perf jit support to make profiling of those JITed parts easier.
> >
> > Turns out the current jit support in perf doesn't work that well for
> > LLVM - but it doesn't primarily look like LLVM's fault. Syscall-wise
> > llvm does (heavily filtered):
> >
> Ok, when you say you are using perf jit with LLVM, I want to understand how?

Sure.


> The perf jit support requires cooperation from the jitted runtime. For
> the case of Java
> there is a JVMTI agent (shared lib) that generates a jitdump info file
> with information about
> how the runtime jitted functions and what is mapped where. The kernel
> is still recording
> the mmap syscalls. But you need more to map an address to a symbol.
> You need to know
> where and when the JIT compiler mapped functions, when they are
> jitted, re-jitted or moved. This is the
> role of the agent to encode all of this in the jitdump file which is
> then used by perf inject
> to rewrite the perf.data file with the new mmap records that point to
> the .so files that it generates
> for each jitted function.

Right.


> Did you write such support?

Yes.  LLVM's jit has callback support both for symbol and debugging
information. You can register additional sets of callbacks to be called
(at runtime).

There's basically
+void PerfJITEventListener::NotifyObjectEmitted(
+                                       const ObjectFile &Obj,
+                                       const RuntimeDyld::LoadedObjectInfo &L) {
and
+void PerfJITEventListener::NotifyFreeingObject(const ObjectFile &Obj) {

which is enough to use the perf jit support.


> Is it already there in LLVM?

Nope, not yet.  I didn't want to submit an implementation that has the
ugly hack of mmap()ing /dev/zero pages to prevent VMA merging ;). But
once that's resolved I plan to push it upstream (they indicated
interest).  As long as I somehow prevent VMA merging (or just filter
out PERF_RECORD_MMAP2 records with prot 0), that support works.

The problem is that (quoted below) without that hack the subsequent
mmaps just expand the previous VMAs which leads to perf loosing its
(address,range) -> symbol mappings for previously (in the same expanded
range) known symbols.

Greetings,

Andres

> If not, then you will not be able to correctly correlate symbols.
> 
> 
> > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000
> > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000
> > mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0
> > mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0
> > write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68
> >
> > mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000
> > mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000
> > mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0
> > mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0
> > write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68
> >
> > ...
> >
> > i.e. it mmaps single pages for the each JITed function's sections. Which
> > makes sense, because the first function is JITed independently from the
> > second one.
> >
> > The corresponding MMAP2 records according to perf perf script
> > --show-mmap-events are:
> >         postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
> >         postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
> >         postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
> >         postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
> > Note how the size of the mapping continually increases, so that the each
> > MMAP2 record covers previous sections.
> >
> > If one perf inject --jit into that it looks like:
> >         postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
> >         postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
> >         postgres  4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so
> >         postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
> >         postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
> >         postgres  4107 595444.884232: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x45) @ 0x40 fd:02 33434599 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-2.so
> >
> > Note how the first injected record is also covered by the following
> > "//anon" event.  This leads to the the curious effect that samples for
> > the first function (evalexpr0) are associated with the right generated
> > .so, until the second function is JITed.
> >
> > I hacked up perf inject to omit such MMAP2 records by adding
> >         if (event->mmap2.prot == 0)
> >                 return 0;
> > to perf_event__jit_repipe_mmap2() and suddenly things work.
> >
> > I presume the increasing MMAP2 size is triggered by the consecutive
> > pages being represented as a single page-range in the kernel?
> >
> > If I, to work around such consecutive pages, force another page to be
> > mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes
> > away.
> >
> > Am I doing something wrong, or is there a bug here?
> >
> > FWIW, this is on linux 4.8.8, with perf from master
> > (v4.9-rc8-108-g810ac7b7558d).
> >
> > BTW, it's also a bit weird that those MMAP2 records triggered by
> > mprotect/mmap, have prot set to 0...
> >
> > Regards,
> >
> > Andres
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 21:00   ` Andres Freund
@ 2017-01-26 21:17     ` Stephane Eranian
  2017-01-26 21:22       ` Andres Freund
  2017-01-26 22:19     ` Peter Zijlstra
  1 sibling, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2017-01-26 21:17 UTC (permalink / raw)
  To: Andres Freund
  Cc: eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa,
	Peter Zijlstra, Ingo Molnar, Anton Blanchard, Namhyung Kim

On Thu, Jan 26, 2017 at 1:00 PM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi Stephane,
>
> On 2017-01-26 12:32:02 -0800, Stephane Eranian wrote:
> > On Fri, Dec 9, 2016 at 9:02 PM, Andres Freund <andres@anarazel.de> wrote:
> > > Hi,
> > >
> > > While working on optionally jit-compiling parts of postgres using llvm
> > > (MCJIT currently, but Orc would have the same issue afaics), I'm trying
> > > to use perf jit support to make profiling of those JITed parts easier.
> > >
> > > Turns out the current jit support in perf doesn't work that well for
> > > LLVM - but it doesn't primarily look like LLVM's fault. Syscall-wise
> > > llvm does (heavily filtered):
> > >
> > Ok, when you say you are using perf jit with LLVM, I want to understand how?
>
> Sure.
>
>
> > The perf jit support requires cooperation from the jitted runtime. For
> > the case of Java
> > there is a JVMTI agent (shared lib) that generates a jitdump info file
> > with information about
> > how the runtime jitted functions and what is mapped where. The kernel
> > is still recording
> > the mmap syscalls. But you need more to map an address to a symbol.
> > You need to know
> > where and when the JIT compiler mapped functions, when they are
> > jitted, re-jitted or moved. This is the
> > role of the agent to encode all of this in the jitdump file which is
> > then used by perf inject
> > to rewrite the perf.data file with the new mmap records that point to
> > the .so files that it generates
> > for each jitted function.
>
> Right.
>
>
> > Did you write such support?
>
> Yes.  LLVM's jit has callback support both for symbol and debugging
> information. You can register additional sets of callbacks to be called
> (at runtime).
>
> There's basically
> +void PerfJITEventListener::NotifyObjectEmitted(
> +                                       const ObjectFile &Obj,
> +                                       const RuntimeDyld::LoadedObjectInfo &L) {
> and
> +void PerfJITEventListener::NotifyFreeingObject(const ObjectFile &Obj) {
>
> which is enough to use the perf jit support.
>
>
> > Is it already there in LLVM?
>
> Nope, not yet.  I didn't want to submit an implementation that has the
> ugly hack of mmap()ing /dev/zero pages to prevent VMA merging ;). But
> once that's resolved I plan to push it upstream (they indicated
> interest).  As long as I somehow prevent VMA merging (or just filter
> out PERF_RECORD_MMAP2 records with prot 0), that support works.
>
> The problem is that (quoted below) without that hack the subsequent
> mmaps just expand the previous VMAs which leads to perf loosing its
> (address,range) -> symbol mappings for previously (in the same expanded
> range) known symbols.
>

Which timeclock source are you using to timestamp the jitdump records?
It needs to match whatever you are using with perf record. Otherwise, inject
will not work correctly. If you use CLOCK_MONOTONIC, then you need to
use perf record -k mono as well.


>
> Greetings,
>
> Andres
>
> > If not, then you will not be able to correctly correlate symbols.
> >
> >
> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000
> > > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000
> > > mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0
> > > mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0
> > > write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68
> > >
> > > mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000
> > > mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000
> > > mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0
> > > mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0
> > > write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68
> > >
> > > ...
> > >
> > > i.e. it mmaps single pages for the each JITed function's sections. Which
> > > makes sense, because the first function is JITed independently from the
> > > second one.
> > >
> > > The corresponding MMAP2 records according to perf perf script
> > > --show-mmap-events are:
> > >         postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
> > >         postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
> > >         postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
> > >         postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
> > > Note how the size of the mapping continually increases, so that the each
> > > MMAP2 record covers previous sections.
> > >
> > > If one perf inject --jit into that it looks like:
> > >         postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
> > >         postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
> > >         postgres  4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so
> > >         postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
> > >         postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
> > >         postgres  4107 595444.884232: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x45) @ 0x40 fd:02 33434599 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-2.so
> > >
> > > Note how the first injected record is also covered by the following
> > > "//anon" event.  This leads to the the curious effect that samples for
> > > the first function (evalexpr0) are associated with the right generated
> > > .so, until the second function is JITed.
> > >
> > > I hacked up perf inject to omit such MMAP2 records by adding
> > >         if (event->mmap2.prot == 0)
> > >                 return 0;
> > > to perf_event__jit_repipe_mmap2() and suddenly things work.
> > >
> > > I presume the increasing MMAP2 size is triggered by the consecutive
> > > pages being represented as a single page-range in the kernel?
> > >
> > > If I, to work around such consecutive pages, force another page to be
> > > mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes
> > > away.
> > >
> > > Am I doing something wrong, or is there a bug here?
> > >
> > > FWIW, this is on linux 4.8.8, with perf from master
> > > (v4.9-rc8-108-g810ac7b7558d).
> > >
> > > BTW, it's also a bit weird that those MMAP2 records triggered by
> > > mprotect/mmap, have prot set to 0...
> > >
> > > Regards,
> > >
> > > Andres
> > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 21:17     ` Stephane Eranian
@ 2017-01-26 21:22       ` Andres Freund
  2017-01-26 21:34         ` Stephane Eranian
  0 siblings, 1 reply; 29+ messages in thread
From: Andres Freund @ 2017-01-26 21:22 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa,
	Peter Zijlstra, Ingo Molnar, Anton Blanchard, Namhyung Kim

On 2017-01-26 13:17:56 -0800, Stephane Eranian wrote:
> > Nope, not yet.  I didn't want to submit an implementation that has the
> > ugly hack of mmap()ing /dev/zero pages to prevent VMA merging ;). But
> > once that's resolved I plan to push it upstream (they indicated
> > interest).  As long as I somehow prevent VMA merging (or just filter
> > out PERF_RECORD_MMAP2 records with prot 0), that support works.
> >
> > The problem is that (quoted below) without that hack the subsequent
> > mmaps just expand the previous VMAs which leads to perf loosing its
> > (address,range) -> symbol mappings for previously (in the same expanded
> > range) known symbols.
> >
> 
> Which timeclock source are you using to timestamp the jitdump records?
> It needs to match whatever you are using with perf record. Otherwise, inject
> will not work correctly. If you use CLOCK_MONOTONIC, then you need to
> use perf record -k mono as well.

Neither works without preventing the same VMA from being reused, both
work with.

Look at the following strace output from a demo application creating two
JITed functions:

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000
mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0
mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0
write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68

mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000
mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000
mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0
mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0
write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68

...

i.e. it mmaps single pages for the each JITed function's sections. Which
makes sense, because the first function is JITed independently from the
second one.

The corresponding MMAP2 records according to perf perf script
--show-mmap-events are:
        postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
        postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
        postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
        postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
Note how the size of the mapping continually increases, so that the each
MMAP2 record covers previous sections.

If one perf inject --jit into that it looks like:
        postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
        postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
        postgres  4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so
        postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
        postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
        postgres  4107 595444.884232: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x45) @ 0x40 fd:02 33434599 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-2.so

Note how the first injected record is also covered by the following
"//anon" event.  This leads to the the curious effect that samples for
the first function (evalexpr0) are associated with the right generated
.so, until the second function is JITed.

I hacked up perf inject to omit such MMAP2 records by adding
        if (event->mmap2.prot == 0)
                return 0;
to perf_event__jit_repipe_mmap2() and suddenly things work.

I presume the increasing MMAP2 size is triggered by the consecutive
pages being represented as a single page-range in the kernel?

If I, to work around such consecutive pages, force another page to be
mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes
away.

BTW, it's also a bit weird that those MMAP2 records triggered by
mprotect/mmap, have prot set to 0...

- Andres

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 21:22       ` Andres Freund
@ 2017-01-26 21:34         ` Stephane Eranian
  2017-01-26 21:51           ` Andres Freund
  0 siblings, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2017-01-26 21:34 UTC (permalink / raw)
  To: Andres Freund
  Cc: eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa,
	Peter Zijlstra, Ingo Molnar, Anton Blanchard, Namhyung Kim

On Thu, Jan 26, 2017 at 1:22 PM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2017-01-26 13:17:56 -0800, Stephane Eranian wrote:
> > > Nope, not yet.  I didn't want to submit an implementation that has the
> > > ugly hack of mmap()ing /dev/zero pages to prevent VMA merging ;). But
> > > once that's resolved I plan to push it upstream (they indicated
> > > interest).  As long as I somehow prevent VMA merging (or just filter
> > > out PERF_RECORD_MMAP2 records with prot 0), that support works.
> > >
> > > The problem is that (quoted below) without that hack the subsequent
> > > mmaps just expand the previous VMAs which leads to perf loosing its
> > > (address,range) -> symbol mappings for previously (in the same expanded
> > > range) known symbols.
> > >
> >
> > Which timeclock source are you using to timestamp the jitdump records?
> > It needs to match whatever you are using with perf record. Otherwise, inject
> > will not work correctly. If you use CLOCK_MONOTONIC, then you need to
> > use perf record -k mono as well.
>
> Neither works without preventing the same VMA from being reused, both
> work with.
>
There is something I am not following here unless there is a bug in inject
or perf report.
The timestamps should take care of ordering what is mapped where and when.
 If look at the timestamps:

postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107:
[0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon

postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107:
[0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon

postgres  4107 595444.868140: PERF_RECORD_MMAP2 4107/4107:
[0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs
/home/andres/.debug/jit/llvm-IR-jit-

The MMAP2 injected for the jitted function appears after the 2
mappings. If you have samples timestamped between

595444.867737 and 595444.868140 - 1 coming from that function, they
will be symbolized correctly. Otherwise things should work as perf
report will consider from  595444.868140 that you jitted function
cover the range of addresses. Unless there is something wrong in perf
report or I am not understanding the problem.




  postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107:
[0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
       postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107:
[0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
       postgres  4107 595444.868140: PERF_RECORD_MMAP2 4107/4107:
[0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs
/home/andres/.debug/jit/llvm-IR-jit-
>
> Look at the following strace output from a demo application creating two
> JITed functions:
>
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000
> mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0
> mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0
> write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68
>
> mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000
> mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000
> mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0
> mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0
> write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68
>
> ...
>
> i.e. it mmaps single pages for the each JITed function's sections. Which
> makes sense, because the first function is JITed independently from the
> second one.
>
> The corresponding MMAP2 records according to perf perf script
> --show-mmap-events are:
>         postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
> Note how the size of the mapping continually increases, so that the each
> MMAP2 record covers previous sections.
>
> If one perf inject --jit into that it looks like:
>         postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so
>         postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
>         postgres  4107 595444.884232: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x45) @ 0x40 fd:02 33434599 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-2.so
>
> Note how the first injected record is also covered by the following
> "//anon" event.  This leads to the the curious effect that samples for
> the first function (evalexpr0) are associated with the right generated
> .so, until the second function is JITed.
>
> I hacked up perf inject to omit such MMAP2 records by adding
>         if (event->mmap2.prot == 0)
>                 return 0;
> to perf_event__jit_repipe_mmap2() and suddenly things work.
>
> I presume the increasing MMAP2 size is triggered by the consecutive
> pages being represented as a single page-range in the kernel?
>
> If I, to work around such consecutive pages, force another page to be
> mmap()ed inbetween, and avoid using MAP_ANONYMOUS, the problem also goes
> away.
>
> BTW, it's also a bit weird that those MMAP2 records triggered by
> mprotect/mmap, have prot set to 0...
>
> - Andres

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 21:34         ` Stephane Eranian
@ 2017-01-26 21:51           ` Andres Freund
  0 siblings, 0 replies; 29+ messages in thread
From: Andres Freund @ 2017-01-26 21:51 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: eranian, LKML, Arnaldo Carvalho de Melo, Jiri Olsa,
	Peter Zijlstra, Ingo Molnar, Anton Blanchard, Namhyung Kim

Hi,

On 2017-01-26 13:34:35 -0800, Stephane Eranian wrote:
> On Thu, Jan 26, 2017 at 1:22 PM, Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2017-01-26 13:17:56 -0800, Stephane Eranian wrote:
> > > > Nope, not yet.  I didn't want to submit an implementation that has the
> > > > ugly hack of mmap()ing /dev/zero pages to prevent VMA merging ;). But
> > > > once that's resolved I plan to push it upstream (they indicated
> > > > interest).  As long as I somehow prevent VMA merging (or just filter
> > > > out PERF_RECORD_MMAP2 records with prot 0), that support works.
> > > >
> > > > The problem is that (quoted below) without that hack the subsequent
> > > > mmaps just expand the previous VMAs which leads to perf loosing its
> > > > (address,range) -> symbol mappings for previously (in the same expanded
> > > > range) known symbols.
> > > >
> > >
> > > Which timeclock source are you using to timestamp the jitdump records?
> > > It needs to match whatever you are using with perf record. Otherwise, inject
> > > will not work correctly. If you use CLOCK_MONOTONIC, then you need to
> > > use perf record -k mono as well.
> >
> > Neither works without preventing the same VMA from being reused, both
> > work with.
> >
> There is something I am not following here unless there is a bug in inject
> or perf report.
> The timestamps should take care of ordering what is mapped where and when.
>  If look at the timestamps:
> 
> postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107:
> [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
> 
> postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107:
> [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
> 
> postgres  4107 595444.868140: PERF_RECORD_MMAP2 4107/4107:
> [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs
> /home/andres/.debug/jit/llvm-IR-jit-
> 
> The MMAP2 injected for the jitted function appears after the 2
> mappings. If you have samples timestamped between
> 
> 595444.867737 and 595444.868140 - 1 coming from that function, they
> will be symbolized correctly. Otherwise things should work as perf
> report will consider from  595444.868140 that you jitted function
> cover the range of addresses. Unless there is something wrong in perf
> report or I am not understanding the problem.

If I just have a single JITed function it *is* symbolized correctly. But
if I have multiple ones only the last one is correctly symbolized...

The problem appears to be triggered by a single VMA being used for all
the mappings.  Check how the length of the overall mapping increases
from 0x7efd3866e000(0x1000) increasing to 0x7efd3866b000(0x4000) (that's
the same mapping if you add the length, just increased in size by
another page).  When script shows

>         postgres  4107 595444.868140: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x45) @ 0x40 fd:02 33434534 1]: --xs /home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so
>         postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon

all hits for 0x7efd3866e000 are symbolized correctly if, and only if,
they occur before the second PERF_RECORD_MMAP2 mapping to //anon.  The
next function (JITing which triggered enlarging the VMA) is then
correctly symbolized, until the third function is JITed...

I've confirmed this by ignoring these //anon mappings by adding
         if (event->mmap2.prot == 0)
                 return 0;
to perf_event__jit_repipe_mmap2() - then everything works.

(note that I don't understand why prot is 0 in the first place for
these, but it was a convenient hack to filter the records).

Which seems to suggest to me that perf's tracking of memory regions is
over-eager in overwriting more specific mappings like
0x7efd3866e000(0x45) to
/home/andres/.debug/jit/llvm-IR-jit-20161209.XXfN0K3O/jitted-4107-1.so
with the larger 0x7efd3866c000(0x3000) to //anon mapping.

Similarly, if I prevent the independent mmap()s from sharing a VMA (by
creating a mapping inbetween the individual function mappings which can't
be shared), everything works as well.

- Andres

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2016-12-12  8:49 ` Peter Zijlstra
  2016-12-12  9:01   ` Andres Freund
@ 2017-01-26 22:15   ` Peter Zijlstra
  2017-01-26 23:04     ` Andres Freund
  2017-01-30 11:52     ` [tip:perf/core] perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory tip-bot for Peter Zijlstra
  1 sibling, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2017-01-26 22:15 UTC (permalink / raw)
  To: Andres Freund
  Cc: linux-kernel, Stephane Eranian, acme, jolsa, mingo, anton,
	namhyung, Stephane Eranian

On Mon, Dec 12, 2016 at 09:49:03AM +0100, Peter Zijlstra wrote:

> > BTW, it's also a bit weird that those MMAP2 records triggered by
> > mprotect/mmap, have prot set to 0...
> 
> Yes, mprotect() does: vma->vm_flags = newflags; before calling
> perf_event_mmap(vma); which then looks at VM_{READ,WRITE,EXEC} bits in
> that word to generate the prot value.
> 
> So that being 0 is a bit weird.

OK, that was a silly bug :/

With that fixed, the following proglet:

#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <malloc.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/mman.h>

void main(void)
{
	void *ptr[5];
	int i;

	for (i=0; i<5; i++) {
		ptr[i] = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		*(char *)ptr[i] = '\0';
		mprotect(ptr[i], 4096, PROT_EXEC);
	}
}


Gives:

root@ivb-ep:~# perf record -d ./mprot
...
root@ivb-ep:~# perf report -D | grep MMAP2
...
43805378481 0xc60 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec0862000(0x1000) @ 0x7f6ec0862000 00:00 0 0]: --xp //anon
43805381755 0xcc0 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec0861000(0x1000) @ 0x7f6ec0861000 00:00 0 0]: rw-p //anon
43805392374 0xd20 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec0861000(0x2000) @ 0x7f6ec0861000 00:00 0 0]: --xp //anon
43805395093 0xd80 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec0860000(0x1000) @ 0x7f6ec0860000 00:00 0 0]: rw-p //anon
43805402842 0xde0 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec0860000(0x3000) @ 0x7f6ec0860000 00:00 0 0]: --xp //anon
43805405388 0xe40 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec085f000(0x1000) @ 0x7f6ec085f000 00:00 0 0]: rw-p //anon
43805412605 0xea0 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec085f000(0x4000) @ 0x7f6ec085f000 00:00 0 0]: --xp //anon
43805415134 0xf00 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec085e000(0x1000) @ 0x7f6ec085e000 00:00 0 0]: rw-p //anon
43805422225 0xf60 [0x60]: PERF_RECORD_MMAP2 2073/2073: [0x7f6ec085e000(0x5000) @ 0x7f6ec085e000 00:00 0 0]: --xp //anon

If you omit the mmap_data=1 thing, it looks like:

123087941779 0x500 [0x60]: PERF_RECORD_MMAP2 2084/2084: [0x7fca5a8bd000(0x1000) @ 0x7fca5a8bd000 00:00 0 0]: --xp //anon
123087953825 0x560 [0x60]: PERF_RECORD_MMAP2 2084/2084: [0x7fca5a8bc000(0x2000) @ 0x7fca5a8bc000 00:00 0 0]: --xp //anon
123087962944 0x5c0 [0x60]: PERF_RECORD_MMAP2 2084/2084: [0x7fca5a8bb000(0x3000) @ 0x7fca5a8bb000 00:00 0 0]: --xp //anon
123087971135 0x620 [0x60]: PERF_RECORD_MMAP2 2084/2084: [0x7fca5a8ba000(0x4000) @ 0x7fca5a8ba000 00:00 0 0]: --xp //anon
123087979360 0x680 [0x60]: PERF_RECORD_MMAP2 2084/2084: [0x7fca5a8b9000(0x5000) @ 0x7fca5a8b9000 00:00 0 0]: --xp //anon

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 896aa53..e67c5ba 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6660,6 +6683,27 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
 	char *buf = NULL;
 	char *name;
 
+	if (vma->vm_flags & VM_READ)
+		prot |= PROT_READ;
+	if (vma->vm_flags & VM_WRITE)
+		prot |= PROT_WRITE;
+	if (vma->vm_flags & VM_EXEC)
+		prot |= PROT_EXEC;
+
+	if (vma->vm_flags & VM_MAYSHARE)
+		flags = MAP_SHARED;
+	else
+		flags = MAP_PRIVATE;
+
+	if (vma->vm_flags & VM_DENYWRITE)
+		flags |= MAP_DENYWRITE;
+	if (vma->vm_flags & VM_MAYEXEC)
+		flags |= MAP_EXECUTABLE;
+	if (vma->vm_flags & VM_LOCKED)
+		flags |= MAP_LOCKED;
+	if (vma->vm_flags & VM_HUGETLB)
+		flags |= MAP_HUGETLB;
+
 	if (file) {
 		struct inode *inode;
 		dev_t dev;
@@ -6686,27 +6730,6 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
 		maj = MAJOR(dev);
 		min = MINOR(dev);
 
-		if (vma->vm_flags & VM_READ)
-			prot |= PROT_READ;
-		if (vma->vm_flags & VM_WRITE)
-			prot |= PROT_WRITE;
-		if (vma->vm_flags & VM_EXEC)
-			prot |= PROT_EXEC;
-
-		if (vma->vm_flags & VM_MAYSHARE)
-			flags = MAP_SHARED;
-		else
-			flags = MAP_PRIVATE;
-
-		if (vma->vm_flags & VM_DENYWRITE)
-			flags |= MAP_DENYWRITE;
-		if (vma->vm_flags & VM_MAYEXEC)
-			flags |= MAP_EXECUTABLE;
-		if (vma->vm_flags & VM_LOCKED)
-			flags |= MAP_LOCKED;
-		if (vma->vm_flags & VM_HUGETLB)
-			flags |= MAP_HUGETLB;
-
 		goto got_name;
 	} else {
 		if (vma->vm_ops && vma->vm_ops->name) {

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 21:00   ` Andres Freund
  2017-01-26 21:17     ` Stephane Eranian
@ 2017-01-26 22:19     ` Peter Zijlstra
  2017-01-26 22:26       ` Stephane Eranian
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2017-01-26 22:19 UTC (permalink / raw)
  To: Andres Freund
  Cc: eranian, LKML, Stephane Eranian, Arnaldo Carvalho de Melo,
	Jiri Olsa, Ingo Molnar, anton, Namhyung Kim

On Thu, Jan 26, 2017 at 01:00:52PM -0800, Andres Freund wrote:
> The problem is that (quoted below) without that hack the subsequent
> mmaps just expand the previous VMAs which leads to perf loosing its
> (address,range) -> symbol mappings for previously (in the same expanded
> range) known symbols.

I'm assuming this is the userspace tool doing that, right?

Acme, is that something we can cure? Instead of throwing away eveything
known of the previous map, merge them when they grow?

So I think the problem was that we don't track munmap(), so if you see
two overlapping mappings, we have no clue and assume an munmap() has
happened or something. I can't really remember, its been many many years
since I wrote all that :-(

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 22:19     ` Peter Zijlstra
@ 2017-01-26 22:26       ` Stephane Eranian
  2017-01-26 22:38         ` Andres Freund
  0 siblings, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2017-01-26 22:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andres Freund, eranian, LKML, Arnaldo Carvalho de Melo,
	Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim

On Thu, Jan 26, 2017 at 2:19 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jan 26, 2017 at 01:00:52PM -0800, Andres Freund wrote:
>> The problem is that (quoted below) without that hack the subsequent
>> mmaps just expand the previous VMAs which leads to perf loosing its
>> (address,range) -> symbol mappings for previously (in the same expanded
>> range) known symbols.
>
> I'm assuming this is the userspace tool doing that, right?
>
> Acme, is that something we can cure? Instead of throwing away eveything
> known of the previous map, merge them when they grow?
>
> So I think the problem was that we don't track munmap(), so if you see
> two overlapping mappings, we have no clue and assume an munmap() has
> happened or something. I can't really remember, its been many many years
> since I wrote all that :-(

But perf report takes care of overlapping mmaps normally,
see util/map.c:map_groups__fixup_overlappings

VMA merging does not change anything at the user level.
The kernel will see two mmaps and so will perf record and perf report.
This is what you have. In the samples I quoted from your example, this
is what should happen. If it does not then there is something wrong in
perf report.


In the case of Java, if I recall, there is a single large mmap for the
code cache
and it works fine.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 22:26       ` Stephane Eranian
@ 2017-01-26 22:38         ` Andres Freund
  2017-01-26 22:51           ` Stephane Eranian
  0 siblings, 1 reply; 29+ messages in thread
From: Andres Freund @ 2017-01-26 22:38 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, eranian, LKML, Arnaldo Carvalho de Melo,
	Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim

On 2017-01-26 14:26:12 -0800, Stephane Eranian wrote:
> On Thu, Jan 26, 2017 at 2:19 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Jan 26, 2017 at 01:00:52PM -0800, Andres Freund wrote:
> >> The problem is that (quoted below) without that hack the subsequent
> >> mmaps just expand the previous VMAs which leads to perf loosing its
> >> (address,range) -> symbol mappings for previously (in the same expanded
> >> range) known symbols.
> >
> > I'm assuming this is the userspace tool doing that, right?
> >
> > Acme, is that something we can cure? Instead of throwing away eveything
> > known of the previous map, merge them when they grow?
> >
> > So I think the problem was that we don't track munmap(), so if you see
> > two overlapping mappings, we have no clue and assume an munmap() has
> > happened or something. I can't really remember, its been many many years
> > since I wrote all that :-(
> 
> But perf report takes care of overlapping mmaps normally,
> see util/map.c:map_groups__fixup_overlappings
> 
> VMA merging does not change anything at the user level.
> The kernel will see two mmaps and so will perf record and perf report.

But the kernel *didn't* report two independent mmaps.

Looking at strace shows that the mmaps were done independently in page
sized chunks:

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000
mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0
mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0
write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68

(first function)

mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000
mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000
mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0
mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0
write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68

(second function)


But then looking at the unannotated perf script --show-mmap-events:

postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon

these four mmaps regions are:
0x7efd3866e000(0x1000): 0x7efd3866e000 - 0x7efd3866f000
0x7efd3866d000(0x2000): 0x7efd3866d000 - 0x7efd3866f000
0x7efd3866c000(0x3000): 0x7efd3866c000 - 0x7efd3866f000
0x7efd3866b000(0x4000): 0x7efd3866b000 - 0x7efd3866f000

So there's just a single mapping being reported.


> This is what you have.

I'm a bit confused - isn't the trace showing exactly that the mmaps
*are* reported with a merged VMA?


> In the case of Java, if I recall, there is a single large mmap for the
> code cache and it works fine.

That explains why it's working there...

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 22:38         ` Andres Freund
@ 2017-01-26 22:51           ` Stephane Eranian
  2017-01-26 23:09             ` Andres Freund
  0 siblings, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2017-01-26 22:51 UTC (permalink / raw)
  To: Andres Freund
  Cc: Peter Zijlstra, eranian, LKML, Arnaldo Carvalho de Melo,
	Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim

On Thu, Jan 26, 2017 at 2:38 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-01-26 14:26:12 -0800, Stephane Eranian wrote:
>> On Thu, Jan 26, 2017 at 2:19 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, Jan 26, 2017 at 01:00:52PM -0800, Andres Freund wrote:
>> >> The problem is that (quoted below) without that hack the subsequent
>> >> mmaps just expand the previous VMAs which leads to perf loosing its
>> >> (address,range) -> symbol mappings for previously (in the same expanded
>> >> range) known symbols.
>> >
>> > I'm assuming this is the userspace tool doing that, right?
>> >
>> > Acme, is that something we can cure? Instead of throwing away eveything
>> > known of the previous map, merge them when they grow?
>> >
>> > So I think the problem was that we don't track munmap(), so if you see
>> > two overlapping mappings, we have no clue and assume an munmap() has
>> > happened or something. I can't really remember, its been many many years
>> > since I wrote all that :-(
>>
>> But perf report takes care of overlapping mmaps normally,
>> see util/map.c:map_groups__fixup_overlappings
>>
>> VMA merging does not change anything at the user level.
>> The kernel will see two mmaps and so will perf record and perf report.
>
> But the kernel *didn't* report two independent mmaps.
>
> Looking at strace shows that the mmaps were done independently in page
> sized chunks:
>
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866e000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866d000
> mprotect(0x7efd3866e000, 4096, PROT_READ|PROT_EXEC) = 0
> mprotect(0x7efd3866d000, 4096, PROT_READ|PROT_EXEC) = 0
> write(2, "Function loaded: evalexpr0 at 139626038091776 0x7efd3866e000 len 69", 68) = 68
>
> (first function)
>
> mmap(0x7efd3866f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866c000
> mmap(0x7efd3866e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efd3866b000
> mprotect(0x7efd3866c000, 4096, PROT_READ|PROT_EXEC) = 0
> mprotect(0x7efd3866b000, 4096, PROT_READ|PROT_EXEC) = 0
> write(2, "Function loaded: evalexpr1 at 139626038083584 0x7efd3866c000 len 69", 68) = 68
>
> (second function)
>
>
> But then looking at the unannotated perf script --show-mmap-events:
>
> postgres  4107 595444.867737: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866e000(0x1000) @ 0x7efd3866e000 00:00 0 0]: ---p //anon
> postgres  4107 595444.867825: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866d000(0x2000) @ 0x7efd3866d000 00:00 0 0]: ---p //anon
> postgres  4107 595444.884090: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866c000(0x3000) @ 0x7efd3866c000 00:00 0 0]: ---p //anon
> postgres  4107 595444.884113: PERF_RECORD_MMAP2 4107/4107: [0x7efd3866b000(0x4000) @ 0x7efd3866b000 00:00 0 0]: ---p //anon
>
> these four mmaps regions are:
> 0x7efd3866e000(0x1000): 0x7efd3866e000 - 0x7efd3866f000
> 0x7efd3866d000(0x2000): 0x7efd3866d000 - 0x7efd3866f000
> 0x7efd3866c000(0x3000): 0x7efd3866c000 - 0x7efd3866f000
> 0x7efd3866b000(0x4000): 0x7efd3866b000 - 0x7efd3866f000
>
> So there's just a single mapping being reported.
>
>
>> This is what you have.
>
> I'm a bit confused - isn't the trace showing exactly that the mmaps
> *are* reported with a merged VMA?
>
>
>> In the case of Java, if I recall, there is a single large mmap for the
>> code cache and it works fine.
>
> That explains why it's working there...
>
Ok, I think I see the problem now (sorry was slow....):
the timeline is as follows as seen from the user in your case:

T0: mmap(0x1000) for func1()
T1 mmap(0x2000) for func1();
T3: jit emits info func1() [0x1000-0x1fff]
T4: mmap(0x3000) for func2()
T5: mmap(0x4000)  for funcs2()
T6: jit emits info for func2() [0x2000-0x3fff]

But the problem is that each mmap covers existing mmaps and thus
supersedes the others as per the time stamp.
The problem is not specific to jit, it just reveals itself in your case.

The logic in perf is that a more recent mmap supersedes an older one,
so you have:
 T3: 0x1000-0x2000 owned by func1
 T4: 0x1000-0x3000 owned by anon
 T5: 0x1000-0x4000 owned by anon
 T6: 0x1000-0x4000 owned partially by func2()

And thus perf cannot symbolize func1() anymore because it has nothing
mapped in 0x1000-0x1fff but anon.

Did I get the problem right this time?

This is tricky to solve here because the tool does not know about the
merging of the VMAs and assume you are overlapping mmaps and not
merging them.

Again the problem is not specific to jit, merging of VMA can happen
anytime with any app.





> Greetings,
>
> Andres Freund

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 22:15   ` Peter Zijlstra
@ 2017-01-26 23:04     ` Andres Freund
  2017-01-30 11:52     ` [tip:perf/core] perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 29+ messages in thread
From: Andres Freund @ 2017-01-26 23:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Stephane Eranian, acme, jolsa, mingo, anton,
	namhyung, Stephane Eranian

On 2017-01-26 23:15:08 +0100, Peter Zijlstra wrote:
> On Mon, Dec 12, 2016 at 09:49:03AM +0100, Peter Zijlstra wrote:
> 
> > > BTW, it's also a bit weird that those MMAP2 records triggered by
> > > mprotect/mmap, have prot set to 0...
> > 
> > Yes, mprotect() does: vma->vm_flags = newflags; before calling
> > perf_event_mmap(vma); which then looks at VM_{READ,WRITE,EXEC} bits in
> > that word to generate the prot value.
> > 
> > So that being 0 is a bit weird.
> 
> OK, that was a silly bug :/

Heh, that explains that part.  Thanks for looking into it!


> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 896aa53..e67c5ba 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6660,6 +6683,27 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
>  	char *buf = NULL;
>  	char *name;
>  
> +	if (vma->vm_flags & VM_READ)
> +		prot |= PROT_READ;
> +	if (vma->vm_flags & VM_WRITE)
> +		prot |= PROT_WRITE;
> +	if (vma->vm_flags & VM_EXEC)
> +		prot |= PROT_EXEC;
> +
> +	if (vma->vm_flags & VM_MAYSHARE)
> +		flags = MAP_SHARED;
> +	else
> +		flags = MAP_PRIVATE;
> +
> +	if (vma->vm_flags & VM_DENYWRITE)
> +		flags |= MAP_DENYWRITE;
> +	if (vma->vm_flags & VM_MAYEXEC)
> +		flags |= MAP_EXECUTABLE;
> +	if (vma->vm_flags & VM_LOCKED)
> +		flags |= MAP_LOCKED;
> +	if (vma->vm_flags & VM_HUGETLB)
> +		flags |= MAP_HUGETLB;
> +
>  	if (file) {
>  		struct inode *inode;
>  		dev_t dev;
> @@ -6686,27 +6730,6 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
>  		maj = MAJOR(dev);
>  		min = MINOR(dev);
>  
> -		if (vma->vm_flags & VM_READ)
> -			prot |= PROT_READ;
> -		if (vma->vm_flags & VM_WRITE)
> -			prot |= PROT_WRITE;
> -		if (vma->vm_flags & VM_EXEC)
> -			prot |= PROT_EXEC;
> -
> -		if (vma->vm_flags & VM_MAYSHARE)
> -			flags = MAP_SHARED;
> -		else
> -			flags = MAP_PRIVATE;
> -
> -		if (vma->vm_flags & VM_DENYWRITE)
> -			flags |= MAP_DENYWRITE;
> -		if (vma->vm_flags & VM_MAYEXEC)
> -			flags |= MAP_EXECUTABLE;
> -		if (vma->vm_flags & VM_LOCKED)
> -			flags |= MAP_LOCKED;
> -		if (vma->vm_flags & VM_HUGETLB)
> -			flags |= MAP_HUGETLB;
> -
>  		goto got_name;
>  	} else {
>  		if (vma->vm_ops && vma->vm_ops->name) {

I'm not a mm/perf (or kernel in general) expert. But this looks
obviously buggy to me. And the fix makes it look correct. So if that's
helpful (I'm not actually sure):
Signed-off-by: Andres Freund <andres@anarazel.de>

Not sure if this on its own has enough consequences to be worth being
put into stable?

Not that it's this fix's responsibility, but I did note that that piece
of code doesn't maintain MAP_GROWSDOWN, nor set the bits controlling the
size of the HUGETLB mapping.  Seems harmless enough, but given that I
already looked I thought it'd bring it up.

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 22:51           ` Stephane Eranian
@ 2017-01-26 23:09             ` Andres Freund
  2017-01-26 23:16               ` Stephane Eranian
  0 siblings, 1 reply; 29+ messages in thread
From: Andres Freund @ 2017-01-26 23:09 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, eranian, LKML, Arnaldo Carvalho de Melo,
	Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim

Hi Stephane,


On 2017-01-26 14:51:02 -0800, Stephane Eranian wrote:
> Ok, I think I see the problem now (sorry was slow....):

No worries ;)


> the timeline is as follows as seen from the user in your case:
> 
> T0: mmap(0x1000) for func1()
> T1 mmap(0x2000) for func1();
> T3: jit emits info func1() [0x1000-0x1fff]
> T4: mmap(0x3000) for func2()
> T5: mmap(0x4000)  for funcs2()
> T6: jit emits info for func2() [0x2000-0x3fff]
> 
> But the problem is that each mmap covers existing mmaps and thus
> supersedes the others as per the time stamp.

Yes, I think that's whats happening.  Not that I actually know what I'm
talking about here :)


> The problem is not specific to jit, it just reveals itself in your case.
> 
> The logic in perf is that a more recent mmap supersedes an older one,
> so you have:
>  T3: 0x1000-0x2000 owned by func1
>  T4: 0x1000-0x3000 owned by anon
>  T5: 0x1000-0x4000 owned by anon
>  T6: 0x1000-0x4000 owned partially by func2()
> 
> And thus perf cannot symbolize func1() anymore because it has nothing
> mapped in 0x1000-0x1fff but anon.
> 
> Did I get the problem right this time?

Yep.


> This is tricky to solve here because the tool does not know about the
> merging of the VMAs and assume you are overlapping mmaps and not
> merging them.

Yea, it looked tricky. I'd looked around and the only solutions I'd
found was filtering out the anon mappings (obviously not a real
solution) or preventing the merging (not a real solution either).


> Again the problem is not specific to jit, merging of VMA can happen
> anytime with any app.

Sorry if I hinted in the wrong direction - I didn't see any other bad
consequences. I guess in most other cases with merged VMAs its
relatively harmless, since it'll presumably mostly be memory allocations
and such, where this wont matter.

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 23:09             ` Andres Freund
@ 2017-01-26 23:16               ` Stephane Eranian
  2017-01-27 13:07                 ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2017-01-26 23:16 UTC (permalink / raw)
  To: Andres Freund
  Cc: Peter Zijlstra, eranian, LKML, Arnaldo Carvalho de Melo,
	Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim

On Thu, Jan 26, 2017 at 3:09 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi Stephane,
>
>
> On 2017-01-26 14:51:02 -0800, Stephane Eranian wrote:
>> Ok, I think I see the problem now (sorry was slow....):
>
> No worries ;)
>
>
>> the timeline is as follows as seen from the user in your case:
>>
>> T0: mmap(0x1000) for func1()
>> T1 mmap(0x2000) for func1();
>> T3: jit emits info func1() [0x1000-0x1fff]
>> T4: mmap(0x3000) for func2()
>> T5: mmap(0x4000)  for funcs2()
>> T6: jit emits info for func2() [0x2000-0x3fff]
>>
>> But the problem is that each mmap covers existing mmaps and thus
>> supersedes the others as per the time stamp.
>
> Yes, I think that's whats happening.  Not that I actually know what I'm
> talking about here :)
>
>
>> The problem is not specific to jit, it just reveals itself in your case.
>>
>> The logic in perf is that a more recent mmap supersedes an older one,
>> so you have:
>>  T3: 0x1000-0x2000 owned by func1
>>  T4: 0x1000-0x3000 owned by anon
>>  T5: 0x1000-0x4000 owned by anon
>>  T6: 0x1000-0x4000 owned partially by func2()
>>
>> And thus perf cannot symbolize func1() anymore because it has nothing
>> mapped in 0x1000-0x1fff but anon.
>>
>> Did I get the problem right this time?
>
> Yep.
>
>
>> This is tricky to solve here because the tool does not know about the
>> merging of the VMAs and assume you are overlapping mmaps and not
>> merging them.
>
> Yea, it looked tricky. I'd looked around and the only solutions I'd
> found was filtering out the anon mappings (obviously not a real
> solution) or preventing the merging (not a real solution either).
>
One solution would be for the kernel to report actual mmaps and not resulting
VMA layouts. Is that case you would have your 4 mmaps each reporting 4kb.
That means the perf hook in the mmap code would have to be placed somewhere
else. I don't know how feasible this is. I will let Peter comment on this. But
hopefully by now, I have described the problem clearly enough that we can work
out a solution.



>
>> Again the problem is not specific to jit, merging of VMA can happen
>> anytime with any app.
>
> Sorry if I hinted in the wrong direction - I didn't see any other bad
> consequences. I guess in most other cases with merged VMAs its
> relatively harmless, since it'll presumably mostly be memory allocations
> and such, where this wont matter.
>
> Greetings,
>
> Andres Freund

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-26 23:16               ` Stephane Eranian
@ 2017-01-27 13:07                 ` Peter Zijlstra
  2017-01-27 15:43                   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2017-01-27 13:07 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Andres Freund, eranian, LKML, Arnaldo Carvalho de Melo,
	Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim

On Thu, Jan 26, 2017 at 03:16:08PM -0800, Stephane Eranian wrote:
> One solution would be for the kernel to report actual mmaps and not resulting
> VMA layouts. Is that case you would have your 4 mmaps each reporting 4kb.
> That means the perf hook in the mmap code would have to be placed somewhere
> else. I don't know how feasible this is. I will let Peter comment on this. But
> hopefully by now, I have described the problem clearly enough that we can work
> out a solution.

There's a number of problems with that; first and foremost is that it
changes existing behaviour.

Second is that there is no clear definition of what an actual mmap is;
suppose something like:

	ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	mprotect(ptr, 4096, PROT_EXEC);

What should be reported?


Like I said yesterday, the existing heuristic in perf exists because we
do not track munmap() and therefore cannot tell the difference between:

	ptr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	munmap(ptr);
	ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

and:
	ptr1 = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	ptr2 = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

where these two get merged.

Something like the (compile tested only) below might be sufficient to
disambiguate things. It would need a corresponding tools/perf patch of
course, but I'm not too familiar with that code anymore.

---
 include/linux/perf_event.h      |  2 ++
 include/uapi/linux/perf_event.h | 14 ++++++++-
 kernel/events/core.c            | 66 ++++++++++++++++++++++++++++++++++++++++-
 mm/memory.c                     |  2 ++
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 78ed8105e64d..957a5f2d8e56 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1104,6 +1104,8 @@ static inline u64 __perf_event_count(struct perf_event *event)
 }
 
 extern void perf_event_mmap(struct vm_area_struct *vma);
+extern void perf_event_munmap(struct vm_area_struct *vma);
+
 extern struct perf_guest_info_callbacks *perf_guest_cbs;
 extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
 extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485a24ac..7571b7b43951 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -344,7 +344,8 @@ struct perf_event_attr {
 				use_clockid    :  1, /* use @clockid for time fields */
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
-				__reserved_1   : 36;
+				munmap         :  1, /* include munmap data */
+				__reserved_1   : 35;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -862,6 +863,17 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *
+	 *	u64				addr;
+	 *	u64				len;
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_MUNMAP			= 16,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 110b38a58493..256ab94e8baf 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -374,6 +374,7 @@ static DEFINE_PER_CPU(int, perf_sched_cb_usages);
 static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events);
 
 static atomic_t nr_mmap_events __read_mostly;
+static atomic_t nr_munmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
 static atomic_t nr_task_events __read_mostly;
 static atomic_t nr_freq_events __read_mostly;
@@ -3850,7 +3851,7 @@ static bool is_sb_event(struct perf_event *event)
 	if (event->attach_state & PERF_ATTACH_TASK)
 		return false;
 
-	if (attr->mmap || attr->mmap_data || attr->mmap2 ||
+	if (attr->mmap || attr->mmap_data || attr->mmap2 || attr->munmap ||
 	    attr->comm || attr->comm_exec ||
 	    attr->task ||
 	    attr->context_switch)
@@ -3906,6 +3907,8 @@ static void unaccount_event(struct perf_event *event)
 		dec = true;
 	if (event->attr.mmap || event->attr.mmap_data)
 		atomic_dec(&nr_mmap_events);
+	if (event->attr.munmap)
+		atomic_dec(&nr_munmap_events);
 	if (event->attr.comm)
 		atomic_dec(&nr_comm_events);
 	if (event->attr.task)
@@ -6831,6 +6834,65 @@ void perf_event_mmap(struct vm_area_struct *vma)
 	perf_event_mmap_event(&mmap_event);
 }
 
+struct perf_munmap_event {
+	struct perf_event_header	header;
+	u64				start;
+	u64				len;
+};
+
+static void perf_event_munmap_output(struct perf_event *event, void *data)
+{
+	struct perf_munmap_event *munmap_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	int size = munmap_event->header.size;
+	int ret;
+
+	if (!event->attr.munmap)
+		return;
+
+	if (!event->attr.mmap)
+		return; /* no munmap without mmap */
+
+	if (!event->attr.mmap_data &&
+	    (munmap_event->header.misc & PERF_RECORD_MISC_MMAP_DATA))
+		return; /* no data munmap without mmap_data */
+
+	perf_event_header__init_id(&munmap_event->header, &sample, event);
+	ret = perf_output_begin(&handle, event, munmap_event->header.size);
+	if (ret)
+		goto out;
+
+	perf_output_put(&handle, *munmap_event);
+	perf_event__output_id_sample(event, &handle, &sample);
+	perf_output_end(&handle);
+out:
+	munmap_event->header.size = size;
+}
+
+void perf_event_munmap(struct vm_area_struct *vma)
+{
+	struct perf_munmap_event munmap_event;
+
+	if (!atomic_read(&nr_munmap_events))
+		return;
+
+	munmap_event = (struct perf_munmap_event){
+		.header = {
+			.type = PERF_RECORD_MUNMAP,
+			.misc = PERF_RECORD_MISC_USER,
+			.size = sizeof(munmap_event),
+		},
+		.start = vma->vm_start,
+		.len   = vma->vm_end - vma->vm_start,
+	};
+
+	if (!(vma->vm_flags & VM_EXEC))
+		munmap_event.header.misc |= PERF_RECORD_MISC_MMAP_DATA;
+
+	perf_iterate_sb(perf_event_munmap_output, &munmap_event, NULL);
+}
+
 void perf_event_aux_event(struct perf_event *event, unsigned long head,
 			  unsigned long size, u64 flags)
 {
@@ -9067,6 +9129,8 @@ static void account_event(struct perf_event *event)
 		inc = true;
 	if (event->attr.mmap || event->attr.mmap_data)
 		atomic_inc(&nr_mmap_events);
+	if (event->attr.munmap)
+		atomic_inc(&nr_munmap_events);
 	if (event->attr.comm)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.task)
diff --git a/mm/memory.c b/mm/memory.c
index 6bf2b471e30c..b89355229608 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -64,6 +64,7 @@
 #include <linux/debugfs.h>
 #include <linux/userfaultfd_k.h>
 #include <linux/dax.h>
+#include <linux/perf_event.h>
 
 #include <asm/io.h>
 #include <asm/mmu_context.h>
@@ -1312,6 +1313,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 	if (end <= vma->vm_start)
 		return;
 
+	perf_event_munmap(vma);
 	if (vma->vm_file)
 		uprobe_munmap(vma, start, end);
 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-27 13:07                 ` Peter Zijlstra
@ 2017-01-27 15:43                   ` Arnaldo Carvalho de Melo
  2017-01-27 17:35                     ` Stephane Eranian
  2017-01-27 17:38                     ` [PATCH] handle munmap records in tools/perf was: " Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-01-27 15:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, Andres Freund, eranian, LKML, Jiri Olsa,
	Ingo Molnar, Anton Blanchard, Namhyung Kim

Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu:
> Something like the (compile tested only) below might be sufficient to
> disambiguate things. It would need a corresponding tools/perf patch of
> course, but I'm not too familiar with that code anymore.

I'm working on patch to do feature test, fallback and handling of the
event, etc, will post later.

- Arnaldo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-27 15:43                   ` Arnaldo Carvalho de Melo
@ 2017-01-27 17:35                     ` Stephane Eranian
  2017-01-27 17:38                     ` [PATCH] handle munmap records in tools/perf was: " Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 29+ messages in thread
From: Stephane Eranian @ 2017-01-27 17:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Andres Freund, eranian, LKML, Jiri Olsa,
	Ingo Molnar, Anton Blanchard, Namhyung Kim

On Fri, Jan 27, 2017 at 7:43 AM, Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
> Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu:
>> Something like the (compile tested only) below might be sufficient to
>> disambiguate things. It would need a corresponding tools/perf patch of
>> course, but I'm not too familiar with that code anymore.
>
> I'm working on patch to do feature test, fallback and handling of the
> event, etc, will post later.
>
Going back to Andres' case:

T0: mmap(0x1000) anon for func1()
T1 mmap(0x2000) anon for func1();
T3: jit emits info func1() [0x0000-0x0fff]
T4: mmap(0x3000) for func2()
T5: mmap(0x4000)  for funcs2()
T6: jit emits info for func2() [0x2000-0x2fff]

So with Peter's patch, nothing will change in the flow of MMAP2 seen by perf in
this particular case. Furthermore, there is no munmap(). But now the perf tool
behavior.

Now perf detects the overlap but because it did not see an unmap(), it
will have to merge or split mmaps().

At T0 (no change from previous behavior):
0x0000-0x0fff anon

At T1 (no change = overlap):
0x0000-0x1fff anon

At T3 (no change =overlap):
0x0000-0x0fff owned by func1,
0x1000-0x1fff anon.

At T4 (change: cannot overlap non-anon range without an munmap, so
need to split):
0x0000-0x0fff owned by func1()
0x1000-0x2fff anon

At T5, (change:  cannot overlap non-anon range without an munmap, so
need to split):
0x0000-0x0fff owned by func1()
0x1000-0x3fff anon

At T6, (no change: overlapping an anon):
0x=0000-0x=0fff owned by func1()
0x1000-0x1fff anon
0x2000-0x2fff owned by func2()
0x300-0x3fff anon

In summary the key change is that an MMAP2(anon) cannot overlap an
MMAP2(file) anymore. It has to be split.
Everything else stays the same.

Do we agree?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] handle munmap records in tools/perf was: Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-27 15:43                   ` Arnaldo Carvalho de Melo
  2017-01-27 17:35                     ` Stephane Eranian
@ 2017-01-27 17:38                     ` Arnaldo Carvalho de Melo
  2017-01-27 17:46                       ` Stephane Eranian
  1 sibling, 1 reply; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-01-27 17:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, Andres Freund, Stephane Eranian, LKML,
	Jiri Olsa, Ingo Molnar, Anton Blanchard, Namhyung Kim

Em Fri, Jan 27, 2017 at 12:43:05PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu:
> > Something like the (compile tested only) below might be sufficient to
> > disambiguate things. It would need a corresponding tools/perf patch of
> > course, but I'm not too familiar with that code anymore.
> 
> I'm working on patch to do feature test, fallback and handling of the
> event, etc, will post later.

Just compile tested, need to build a kernel with PeterZ's patch to test,
feel free to go from there if in a hurry.


The place where the map is yanked out of the thread's maps rbtree is at

machine__process_munmap_event()

The rest is making sure the tool works with older kernels, deals with
endianness in the record in a perf.data file for cross platform
analysis, hooking it to the various tools where handling this event
makes sense.

[acme@jouet linux]$ diffstat /tmp/a.patch 
 include/uapi/linux/perf_event.h |   14 +++++++++-
 perf/builtin-annotate.c         |    1 
 perf/builtin-c2c.c              |    1 
 perf/builtin-diff.c             |    1 
 perf/builtin-inject.c           |   17 +++++++++++-
 perf/builtin-kmem.c             |    1 
 perf/builtin-mem.c              |    1 
 perf/builtin-record.c           |    1 
 perf/builtin-report.c           |    1 
 perf/builtin-script.c           |   30 ++++++++++++++++++++++
 perf/builtin-trace.c            |    1 
 perf/util/data-convert-bt.c     |    1 
 perf/util/event.c               |   18 +++++++++++++
 perf/util/event.h               |   10 +++++++
 perf/util/evsel.c               |   12 +++++++-
 perf/util/machine.c             |   54 ++++++++++++++++++++++++++++++++++++++++
 perf/util/machine.h             |    2 +
 perf/util/python.c              |    1 
 perf/util/session.c             |   19 ++++++++++++++
 perf/util/tool.h                |    1 
 20 files changed, 183 insertions(+), 4 deletions(-)
[acme@jouet linux]$

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index c66a485a24ac..59c5cd5abbbf 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -344,7 +344,8 @@ struct perf_event_attr {
 				use_clockid    :  1, /* use @clockid for time fields */
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
-				__reserved_1   : 36;
+				munmap	       :  1, /* include munmap data */
+				__reserved_1   : 35;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -862,6 +863,17 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *
+	 *	u64				addr;
+	 *	u64				len;
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_MUNMAP			= 16,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index ebb628332a6e..082b5f100ac5 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -390,6 +390,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
 			.sample	= process_sample_event,
 			.mmap	= perf_event__process_mmap,
 			.mmap2	= perf_event__process_mmap2,
+			.munmap	= perf_event__process_munmap,
 			.comm	= perf_event__process_comm,
 			.exit	= perf_event__process_exit,
 			.fork	= perf_event__process_fork,
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index e2b21723bbf8..3e1f1eba77e2 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -313,6 +313,7 @@ static struct perf_c2c c2c = {
 		.sample		= process_sample_event,
 		.mmap		= perf_event__process_mmap,
 		.mmap2		= perf_event__process_mmap2,
+		.munmap		= perf_event__process_munmap,
 		.comm		= perf_event__process_comm,
 		.exit		= perf_event__process_exit,
 		.fork		= perf_event__process_fork,
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 9ff0db4e2d0c..2c7856848887 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -350,6 +350,7 @@ static struct perf_tool tool = {
 	.sample	= diff__process_sample_event,
 	.mmap	= perf_event__process_mmap,
 	.mmap2	= perf_event__process_mmap2,
+	.munmap	= perf_event__process_munmap,
 	.comm	= perf_event__process_comm,
 	.exit	= perf_event__process_exit,
 	.fork	= perf_event__process_fork,
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index b9bc7e39833a..4e6103d0c163 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -282,6 +282,19 @@ static int perf_event__repipe_mmap2(struct perf_tool *tool,
 	return err;
 }
 
+static int perf_event__repipe_munmap(struct perf_tool *tool,
+				     union perf_event *event,
+				     struct perf_sample *sample,
+				     struct machine *machine)
+{
+	int err;
+
+	err = perf_event__process_munmap(tool, event, sample, machine);
+	perf_event__repipe(tool, event, sample, machine);
+
+	return err;
+}
+
 #ifdef HAVE_JITDUMP
 static int perf_event__jit_repipe_mmap2(struct perf_tool *tool,
 					union perf_event *event,
@@ -569,7 +582,7 @@ static void strip_init(struct perf_inject *inject)
 static bool has_tracking(struct perf_evsel *evsel)
 {
 	return evsel->attr.mmap || evsel->attr.mmap2 || evsel->attr.comm ||
-	       evsel->attr.task;
+	       evsel->attr.task || evsel->attr.munmap;
 }
 
 #define COMPAT_MASK (PERF_SAMPLE_ID | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
@@ -632,6 +645,7 @@ static int __cmd_inject(struct perf_inject *inject)
 	    inject->itrace_synth_opts.set) {
 		inject->tool.mmap	  = perf_event__repipe_mmap;
 		inject->tool.mmap2	  = perf_event__repipe_mmap2;
+		inject->tool.munmap	  = perf_event__repipe_munmap;
 		inject->tool.fork	  = perf_event__repipe_fork;
 		inject->tool.tracing_data = perf_event__repipe_tracing_data;
 	}
@@ -732,6 +746,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 			.sample		= perf_event__repipe_sample,
 			.mmap		= perf_event__repipe,
 			.mmap2		= perf_event__repipe,
+			.munmap		= perf_event__repipe,
 			.comm		= perf_event__repipe,
 			.fork		= perf_event__repipe,
 			.exit		= perf_event__repipe,
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 29f4751a3574..eb59aa7a0f5b 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -964,6 +964,7 @@ static struct perf_tool perf_kmem = {
 	.comm		 = perf_event__process_comm,
 	.mmap		 = perf_event__process_mmap,
 	.mmap2		 = perf_event__process_mmap2,
+	.munmap		 = perf_event__process_munmap,
 	.ordered_events	 = true,
 };
 
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index cd7bc4d104e2..de026e9ef1be 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -338,6 +338,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
 			.sample		= process_sample_event,
 			.mmap		= perf_event__process_mmap,
 			.mmap2		= perf_event__process_mmap2,
+			.munmap		= perf_event__process_munmap,
 			.comm		= perf_event__process_comm,
 			.lost		= perf_event__process_lost,
 			.fork		= perf_event__process_fork,
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index ffac8ca9fb01..f197fc196cd8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1498,6 +1498,7 @@ static struct record record = {
 		.comm		= perf_event__process_comm,
 		.mmap		= perf_event__process_mmap,
 		.mmap2		= perf_event__process_mmap2,
+		.munmap		= perf_event__process_munmap,
 		.ordered_events	= true,
 	},
 };
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index dbd7fa028861..96680fbeb664 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -693,6 +693,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 			.sample		 = process_sample_event,
 			.mmap		 = perf_event__process_mmap,
 			.mmap2		 = perf_event__process_mmap2,
+			.munmap		 = perf_event__process_munmap,
 			.comm		 = perf_event__process_comm,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c0783b4f7b6c..27fcbcb15782 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1244,6 +1244,34 @@ static int process_mmap2_event(struct perf_tool *tool,
 	return 0;
 }
 
+static int process_munmap_event(struct perf_tool *tool, union perf_event *event,
+				struct perf_sample *sample, struct machine *machine)
+{
+	struct thread *thread;
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	struct perf_session *session = script->session;
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
+
+	if (perf_event__process_munmap(tool, event, sample, machine) < 0)
+		return -1;
+
+	thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+	if (thread == NULL) {
+		pr_debug("problem processing MUNMAP event, skipping it.\n");
+		return -1;
+	}
+
+	if (!evsel->attr.sample_id_all) {
+		pr_debug("MUNMAP event requires attr.sample_id_all, skipping it.\n");
+		return -1;
+	}
+
+	print_sample_start(sample, thread, evsel);
+	perf_event__fprintf(event, stdout);
+	thread__put(thread);
+	return 0;
+}
+
 static int process_switch_event(struct perf_tool *tool,
 				union perf_event *event,
 				struct perf_sample *sample,
@@ -1290,6 +1318,7 @@ static int __cmd_script(struct perf_script *script)
 	if (script->show_mmap_events) {
 		script->tool.mmap = process_mmap_event;
 		script->tool.mmap2 = process_mmap2_event;
+		script->tool.munmap = process_munmap_event;
 	}
 	if (script->show_switch_events)
 		script->tool.context_switch = process_switch_event;
@@ -2096,6 +2125,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 			.sample		 = process_sample_event,
 			.mmap		 = perf_event__process_mmap,
 			.mmap2		 = perf_event__process_mmap2,
+			.munmap		 = perf_event__process_munmap,
 			.comm		 = perf_event__process_comm,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 40ef9b293d1b..cfb7e9e2cd7d 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2411,6 +2411,7 @@ static int trace__replay(struct trace *trace)
 	trace->tool.sample	  = trace__process_sample;
 	trace->tool.mmap	  = perf_event__process_mmap;
 	trace->tool.mmap2	  = perf_event__process_mmap2;
+	trace->tool.munmap	  = perf_event__process_munmap;
 	trace->tool.comm	  = perf_event__process_comm;
 	trace->tool.exit	  = perf_event__process_exit;
 	trace->tool.fork	  = perf_event__process_fork;
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 4e6cbc99f08e..8d1feb9fe82d 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1462,6 +1462,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
 			.sample          = process_sample_event,
 			.mmap            = perf_event__process_mmap,
 			.mmap2           = perf_event__process_mmap2,
+			.munmap          = perf_event__process_munmap,
 			.comm            = perf_event__process_comm,
 			.exit            = perf_event__process_exit,
 			.fork            = perf_event__process_fork,
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 8ab0d7da956b..ffcba9c07c0d 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -18,6 +18,7 @@ static const char *perf_event__names[] = {
 	[0]					= "TOTAL",
 	[PERF_RECORD_MMAP]			= "MMAP",
 	[PERF_RECORD_MMAP2]			= "MMAP2",
+	[PERF_RECORD_MUNMAP]			= "MUNMAP",
 	[PERF_RECORD_LOST]			= "LOST",
 	[PERF_RECORD_COMM]			= "COMM",
 	[PERF_RECORD_EXIT]			= "EXIT",
@@ -1080,6 +1081,12 @@ size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp)
 		       event->mmap2.filename);
 }
 
+size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp)
+{
+	return fprintf(fp, " [%#" PRIx64 ", %" PRIu64 "]\n",
+		       event->munmap.start, event->munmap.len);
+}
+
 size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp)
 {
 	struct thread_map *threads = thread_map__new_event(&event->thread_map);
@@ -1128,6 +1135,14 @@ int perf_event__process_mmap2(struct perf_tool *tool __maybe_unused,
 	return machine__process_mmap2_event(machine, event, sample);
 }
 
+int perf_event__process_munmap(struct perf_tool *tool __maybe_unused,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine)
+{
+	return machine__process_munmap_event(machine, event, sample);
+}
+
 size_t perf_event__fprintf_task(union perf_event *event, FILE *fp)
 {
 	return fprintf(fp, "(%d:%d):(%d:%d)\n",
@@ -1199,6 +1214,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_MMAP2:
 		ret += perf_event__fprintf_mmap2(event, fp);
 		break;
+	case PERF_RECORD_MUNMAP:
+		ret += perf_event__fprintf_munmap(event, fp);
+		break;
 	case PERF_RECORD_AUX:
 		ret += perf_event__fprintf_aux(event, fp);
 		break;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index c735c53a26f8..00942b6a9d48 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -33,6 +33,12 @@ struct mmap2_event {
 	char filename[PATH_MAX];
 };
 
+struct munmap_event {
+	struct perf_event_header header;
+	u64			 start;
+	u64			 len;
+};
+
 struct comm_event {
 	struct perf_event_header header;
 	u32 pid, tid;
@@ -484,6 +490,7 @@ union perf_event {
 	struct perf_event_header	header;
 	struct mmap_event		mmap;
 	struct mmap2_event		mmap2;
+	struct munmap_event		munmap;
 	struct comm_event		comm;
 	struct fork_event		fork;
 	struct lost_event		lost;
@@ -595,6 +602,8 @@ int perf_event__process_mmap2(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
 			     struct machine *machine);
+int perf_event__process_munmap(struct perf_tool *tool, union perf_event *event,
+			       struct perf_sample *sample, struct machine *machine);
 int perf_event__process_fork(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -647,6 +656,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_task(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 04e536ae4d88..c4b88e4f2422 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -34,6 +34,7 @@ static struct {
 	bool sample_id_all;
 	bool exclude_guest;
 	bool mmap2;
+	bool munmap;
 	bool cloexec;
 	bool clockid;
 	bool clockid_wrong;
@@ -930,6 +931,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
 	attr->task  = track;
 	attr->mmap  = track;
 	attr->mmap2 = track && !perf_missing_features.mmap2;
+	attr->munmap = track && !perf_missing_features.munmap;
 	attr->comm  = track;
 
 	if (opts->record_switch_events)
@@ -1395,6 +1397,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(exclude_callchain_kernel, p_unsigned);
 	PRINT_ATTRf(exclude_callchain_user, p_unsigned);
 	PRINT_ATTRf(mmap2, p_unsigned);
+	PRINT_ATTRf(munmap, p_unsigned);
 	PRINT_ATTRf(comm_exec, p_unsigned);
 	PRINT_ATTRf(use_clockid, p_unsigned);
 	PRINT_ATTRf(context_switch, p_unsigned);
@@ -1474,6 +1477,8 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 	}
 
 fallback_missing_features:
+	if (perf_missing_features.munmap)
+		evsel->attr.munmap = 0;
 	if (perf_missing_features.clockid_wrong)
 		evsel->attr.clockid = CLOCK_MONOTONIC; /* should always work */
 	if (perf_missing_features.clockid) {
@@ -1603,10 +1608,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 		goto out_close;
 
 	/*
-	 * Must probe features in the order they were added to the
+	 * Must probe features in the reverse order they were added to the
 	 * perf_event_attr interface.
 	 */
-	if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
+	if (!perf_missing_features.munmap && evsel->attr.munmap) {
+		perf_missing_features.munmap = true;
+		goto fallback_missing_features;
+	} else if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
 		perf_missing_features.write_backward = true;
 		goto out_close;
 	} else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 747a034d1ff3..24f2f309d70f 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1320,6 +1320,16 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
 	return -1;
 }
 
+static int machine__process_kernel_munmap_event(struct machine *machine __maybe_unused,
+						union perf_event *event __maybe_unused)
+{
+	/*
+ 	 * XXX: Fill this in as soon as we get munmap event for kernel
+ 	 * 	"mmaps", aka module unload
+ 	 */
+	return 0;
+}
+
 int machine__process_mmap2_event(struct machine *machine,
 				 union perf_event *event,
 				 struct perf_sample *sample)
@@ -1379,6 +1389,48 @@ int machine__process_mmap2_event(struct machine *machine,
 	return 0;
 }
 
+int machine__process_munmap_event(struct machine *machine,
+				  union perf_event *event,
+				  struct perf_sample *sample)
+{
+	struct thread *thread;
+	struct map *map;
+	enum map_type type = MAP__FUNCTION;
+	int ret = 0;
+
+	if (dump_trace)
+		perf_event__fprintf_munmap(event, stdout);
+
+	if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
+	    sample->cpumode == PERF_RECORD_MISC_KERNEL) {
+		ret = machine__process_kernel_munmap_event(machine, event);
+		if (ret < 0)
+			goto out_problem;
+		return 0;
+	}
+
+	thread = machine__find_thread(machine, sample->pid, sample->tid);
+	if (thread == NULL)
+		goto out_problem;
+
+	if (event->header.misc & PERF_RECORD_MISC_MMAP_DATA)
+		type = MAP__VARIABLE;
+
+	map = map_groups__find(thread->mg, type, event->munmap.start);
+	if (map != NULL)
+		goto out_problem_map;
+
+	map_groups__remove(thread->mg, map);
+	thread__put(thread);
+	return 0;
+
+out_problem_map:
+	thread__put(thread);
+out_problem:
+	dump_printf("problem processing PERF_RECORD_MUNMAP, skipping event.\n");
+	return 0;
+}
+
 int machine__process_mmap_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample)
 {
@@ -1540,6 +1592,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_mmap_event(machine, event, sample); break;
 	case PERF_RECORD_MMAP2:
 		ret = machine__process_mmap2_event(machine, event, sample); break;
+	case PERF_RECORD_MUNMAP:
+		ret = machine__process_munmap_event(machine, event, sample); break;
 	case PERF_RECORD_FORK:
 		ret = machine__process_fork_event(machine, event, sample); break;
 	case PERF_RECORD_EXIT:
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index a28305029711..eabc8b9d8d36 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -101,6 +101,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
 				 struct perf_sample *sample);
+int machine__process_munmap_event(struct machine *machine, union perf_event *event,
+				  struct perf_sample *sample);
 int machine__process_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index a5fbc012e3df..047e004dc53f 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -1155,6 +1155,7 @@ static struct {
 	PERF_CONST(RECORD_READ),
 	PERF_CONST(RECORD_SAMPLE),
 	PERF_CONST(RECORD_MMAP2),
+	PERF_CONST(RECORD_MUNMAP),
 	PERF_CONST(RECORD_AUX),
 	PERF_CONST(RECORD_ITRACE_START),
 	PERF_CONST(RECORD_LOST_SAMPLES),
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 349c68144e55..42edd68ce0b8 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -357,6 +357,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->mmap = process_event_stub;
 	if (tool->mmap2 == NULL)
 		tool->mmap2 = process_event_stub;
+	if (tool->munmap == NULL)
+		tool->munmap = process_event_stub;
 	if (tool->comm == NULL)
 		tool->comm = process_event_stub;
 	if (tool->fork == NULL)
@@ -480,6 +482,20 @@ static void perf_event__mmap2_swap(union perf_event *event,
 		swap_sample_id_all(event, data);
 	}
 }
+
+static void perf_event__munmap_swap(union perf_event *event, bool sample_id_all)
+{
+	event->munmap.start = bswap_64(event->munmap.start);
+	event->munmap.len   = bswap_64(event->munmap.len);
+
+	if (sample_id_all) {
+		void *data = &event->munmap.len;
+
+		data += sizeof(event->munmap.len);
+		swap_sample_id_all(event, data);
+	}
+}
+
 static void perf_event__task_swap(union perf_event *event, bool sample_id_all)
 {
 	event->fork.pid	 = bswap_32(event->fork.pid);
@@ -773,6 +789,7 @@ typedef void (*perf_event__swap_op)(union perf_event *event,
 static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_MMAP]		  = perf_event__mmap_swap,
 	[PERF_RECORD_MMAP2]		  = perf_event__mmap2_swap,
+	[PERF_RECORD_MUNMAP]		  = perf_event__munmap_swap,
 	[PERF_RECORD_COMM]		  = perf_event__comm_swap,
 	[PERF_RECORD_FORK]		  = perf_event__task_swap,
 	[PERF_RECORD_EXIT]		  = perf_event__task_swap,
@@ -1237,6 +1254,8 @@ static int machines__deliver_event(struct machines *machines,
 		if (event->header.misc & PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT)
 			++evlist->stats.nr_proc_map_timeout;
 		return tool->mmap2(tool, event, sample, machine);
+	case PERF_RECORD_MUNMAP:
+		return tool->munmap(tool, event, sample, machine);
 	case PERF_RECORD_COMM:
 		return tool->comm(tool, event, sample, machine);
 	case PERF_RECORD_FORK:
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index ac2590a3de2d..66291063b8b2 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -39,6 +39,7 @@ struct perf_tool {
 			read;
 	event_op	mmap,
 			mmap2,
+			munmap,
 			comm,
 			fork,
 			exit,

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] handle munmap records in tools/perf was: Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-27 17:38                     ` [PATCH] handle munmap records in tools/perf was: " Arnaldo Carvalho de Melo
@ 2017-01-27 17:46                       ` Stephane Eranian
  2017-01-27 18:05                         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2017-01-27 17:46 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Stephane Eranian, Andres Freund, LKML, Jiri Olsa,
	Ingo Molnar, Anton Blanchard, Namhyung Kim

On Fri, Jan 27, 2017 at 9:38 AM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
> Em Fri, Jan 27, 2017 at 12:43:05PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu:
>> > Something like the (compile tested only) below might be sufficient to
>> > disambiguate things. It would need a corresponding tools/perf patch of
>> > course, but I'm not too familiar with that code anymore.
>>
>> I'm working on patch to do feature test, fallback and handling of the
>> event, etc, will post later.
>
> Just compile tested, need to build a kernel with PeterZ's patch to test,
> feel free to go from there if in a hurry.
>
>
> The place where the map is yanked out of the thread's maps rbtree is at
>
> machine__process_munmap_event()
>
> The rest is making sure the tool works with older kernels, deals with
> endianness in the record in a perf.data file for cross platform
> analysis, hooking it to the various tools where handling this event
> makes sense.
>
At first glance this patch handles the munmap() well. But it will not solve
the case of Andres. Unless you're telling me that the kernel with Peterz's patch
will now generate munmap record because of the merging. If not, then the code
handling overlaps needs to change as well as I described in my other Email.

> [acme@jouet linux]$ diffstat /tmp/a.patch
>  include/uapi/linux/perf_event.h |   14 +++++++++-
>  perf/builtin-annotate.c         |    1
>  perf/builtin-c2c.c              |    1
>  perf/builtin-diff.c             |    1
>  perf/builtin-inject.c           |   17 +++++++++++-
>  perf/builtin-kmem.c             |    1
>  perf/builtin-mem.c              |    1
>  perf/builtin-record.c           |    1
>  perf/builtin-report.c           |    1
>  perf/builtin-script.c           |   30 ++++++++++++++++++++++
>  perf/builtin-trace.c            |    1
>  perf/util/data-convert-bt.c     |    1
>  perf/util/event.c               |   18 +++++++++++++
>  perf/util/event.h               |   10 +++++++
>  perf/util/evsel.c               |   12 +++++++-
>  perf/util/machine.c             |   54 ++++++++++++++++++++++++++++++++++++++++
>  perf/util/machine.h             |    2 +
>  perf/util/python.c              |    1
>  perf/util/session.c             |   19 ++++++++++++++
>  perf/util/tool.h                |    1
>  20 files changed, 183 insertions(+), 4 deletions(-)
> [acme@jouet linux]$
>
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index c66a485a24ac..59c5cd5abbbf 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -344,7 +344,8 @@ struct perf_event_attr {
>                                 use_clockid    :  1, /* use @clockid for time fields */
>                                 context_switch :  1, /* context switch data */
>                                 write_backward :  1, /* Write ring buffer from end to beginning */
> -                               __reserved_1   : 36;
> +                               munmap         :  1, /* include munmap data */
> +                               __reserved_1   : 35;
>
>         union {
>                 __u32           wakeup_events;    /* wakeup every n events */
> @@ -862,6 +863,17 @@ enum perf_event_type {
>          */
>         PERF_RECORD_SWITCH_CPU_WIDE             = 15,
>
> +       /*
> +        * struct {
> +        *      struct perf_event_header        header;
> +        *
> +        *      u64                             addr;
> +        *      u64                             len;
> +        *      struct sample_id                sample_id;
> +        * };
> +        */
> +       PERF_RECORD_MUNMAP                      = 16,
> +
>         PERF_RECORD_MAX,                        /* non-ABI */
>  };
>
> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index ebb628332a6e..082b5f100ac5 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -390,6 +390,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
>                         .sample = process_sample_event,
>                         .mmap   = perf_event__process_mmap,
>                         .mmap2  = perf_event__process_mmap2,
> +                       .munmap = perf_event__process_munmap,
>                         .comm   = perf_event__process_comm,
>                         .exit   = perf_event__process_exit,
>                         .fork   = perf_event__process_fork,
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index e2b21723bbf8..3e1f1eba77e2 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -313,6 +313,7 @@ static struct perf_c2c c2c = {
>                 .sample         = process_sample_event,
>                 .mmap           = perf_event__process_mmap,
>                 .mmap2          = perf_event__process_mmap2,
> +               .munmap         = perf_event__process_munmap,
>                 .comm           = perf_event__process_comm,
>                 .exit           = perf_event__process_exit,
>                 .fork           = perf_event__process_fork,
> diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
> index 9ff0db4e2d0c..2c7856848887 100644
> --- a/tools/perf/builtin-diff.c
> +++ b/tools/perf/builtin-diff.c
> @@ -350,6 +350,7 @@ static struct perf_tool tool = {
>         .sample = diff__process_sample_event,
>         .mmap   = perf_event__process_mmap,
>         .mmap2  = perf_event__process_mmap2,
> +       .munmap = perf_event__process_munmap,
>         .comm   = perf_event__process_comm,
>         .exit   = perf_event__process_exit,
>         .fork   = perf_event__process_fork,
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index b9bc7e39833a..4e6103d0c163 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -282,6 +282,19 @@ static int perf_event__repipe_mmap2(struct perf_tool *tool,
>         return err;
>  }
>
> +static int perf_event__repipe_munmap(struct perf_tool *tool,
> +                                    union perf_event *event,
> +                                    struct perf_sample *sample,
> +                                    struct machine *machine)
> +{
> +       int err;
> +
> +       err = perf_event__process_munmap(tool, event, sample, machine);
> +       perf_event__repipe(tool, event, sample, machine);
> +
> +       return err;
> +}
> +
>  #ifdef HAVE_JITDUMP
>  static int perf_event__jit_repipe_mmap2(struct perf_tool *tool,
>                                         union perf_event *event,
> @@ -569,7 +582,7 @@ static void strip_init(struct perf_inject *inject)
>  static bool has_tracking(struct perf_evsel *evsel)
>  {
>         return evsel->attr.mmap || evsel->attr.mmap2 || evsel->attr.comm ||
> -              evsel->attr.task;
> +              evsel->attr.task || evsel->attr.munmap;
>  }
>
>  #define COMPAT_MASK (PERF_SAMPLE_ID | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
> @@ -632,6 +645,7 @@ static int __cmd_inject(struct perf_inject *inject)
>             inject->itrace_synth_opts.set) {
>                 inject->tool.mmap         = perf_event__repipe_mmap;
>                 inject->tool.mmap2        = perf_event__repipe_mmap2;
> +               inject->tool.munmap       = perf_event__repipe_munmap;
>                 inject->tool.fork         = perf_event__repipe_fork;
>                 inject->tool.tracing_data = perf_event__repipe_tracing_data;
>         }
> @@ -732,6 +746,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
>                         .sample         = perf_event__repipe_sample,
>                         .mmap           = perf_event__repipe,
>                         .mmap2          = perf_event__repipe,
> +                       .munmap         = perf_event__repipe,
>                         .comm           = perf_event__repipe,
>                         .fork           = perf_event__repipe,
>                         .exit           = perf_event__repipe,
> diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
> index 29f4751a3574..eb59aa7a0f5b 100644
> --- a/tools/perf/builtin-kmem.c
> +++ b/tools/perf/builtin-kmem.c
> @@ -964,6 +964,7 @@ static struct perf_tool perf_kmem = {
>         .comm            = perf_event__process_comm,
>         .mmap            = perf_event__process_mmap,
>         .mmap2           = perf_event__process_mmap2,
> +       .munmap          = perf_event__process_munmap,
>         .ordered_events  = true,
>  };
>
> diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
> index cd7bc4d104e2..de026e9ef1be 100644
> --- a/tools/perf/builtin-mem.c
> +++ b/tools/perf/builtin-mem.c
> @@ -338,6 +338,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
>                         .sample         = process_sample_event,
>                         .mmap           = perf_event__process_mmap,
>                         .mmap2          = perf_event__process_mmap2,
> +                       .munmap         = perf_event__process_munmap,
>                         .comm           = perf_event__process_comm,
>                         .lost           = perf_event__process_lost,
>                         .fork           = perf_event__process_fork,
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index ffac8ca9fb01..f197fc196cd8 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1498,6 +1498,7 @@ static struct record record = {
>                 .comm           = perf_event__process_comm,
>                 .mmap           = perf_event__process_mmap,
>                 .mmap2          = perf_event__process_mmap2,
> +               .munmap         = perf_event__process_munmap,
>                 .ordered_events = true,
>         },
>  };
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index dbd7fa028861..96680fbeb664 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -693,6 +693,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
>                         .sample          = process_sample_event,
>                         .mmap            = perf_event__process_mmap,
>                         .mmap2           = perf_event__process_mmap2,
> +                       .munmap          = perf_event__process_munmap,
>                         .comm            = perf_event__process_comm,
>                         .exit            = perf_event__process_exit,
>                         .fork            = perf_event__process_fork,
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index c0783b4f7b6c..27fcbcb15782 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -1244,6 +1244,34 @@ static int process_mmap2_event(struct perf_tool *tool,
>         return 0;
>  }
>
> +static int process_munmap_event(struct perf_tool *tool, union perf_event *event,
> +                               struct perf_sample *sample, struct machine *machine)
> +{
> +       struct thread *thread;
> +       struct perf_script *script = container_of(tool, struct perf_script, tool);
> +       struct perf_session *session = script->session;
> +       struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
> +
> +       if (perf_event__process_munmap(tool, event, sample, machine) < 0)
> +               return -1;
> +
> +       thread = machine__findnew_thread(machine, sample->pid, sample->tid);
> +       if (thread == NULL) {
> +               pr_debug("problem processing MUNMAP event, skipping it.\n");
> +               return -1;
> +       }
> +
> +       if (!evsel->attr.sample_id_all) {
> +               pr_debug("MUNMAP event requires attr.sample_id_all, skipping it.\n");
> +               return -1;
> +       }
> +
> +       print_sample_start(sample, thread, evsel);
> +       perf_event__fprintf(event, stdout);
> +       thread__put(thread);
> +       return 0;
> +}
> +
>  static int process_switch_event(struct perf_tool *tool,
>                                 union perf_event *event,
>                                 struct perf_sample *sample,
> @@ -1290,6 +1318,7 @@ static int __cmd_script(struct perf_script *script)
>         if (script->show_mmap_events) {
>                 script->tool.mmap = process_mmap_event;
>                 script->tool.mmap2 = process_mmap2_event;
> +               script->tool.munmap = process_munmap_event;
>         }
>         if (script->show_switch_events)
>                 script->tool.context_switch = process_switch_event;
> @@ -2096,6 +2125,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
>                         .sample          = process_sample_event,
>                         .mmap            = perf_event__process_mmap,
>                         .mmap2           = perf_event__process_mmap2,
> +                       .munmap          = perf_event__process_munmap,
>                         .comm            = perf_event__process_comm,
>                         .exit            = perf_event__process_exit,
>                         .fork            = perf_event__process_fork,
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index 40ef9b293d1b..cfb7e9e2cd7d 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -2411,6 +2411,7 @@ static int trace__replay(struct trace *trace)
>         trace->tool.sample        = trace__process_sample;
>         trace->tool.mmap          = perf_event__process_mmap;
>         trace->tool.mmap2         = perf_event__process_mmap2;
> +       trace->tool.munmap        = perf_event__process_munmap;
>         trace->tool.comm          = perf_event__process_comm;
>         trace->tool.exit          = perf_event__process_exit;
>         trace->tool.fork          = perf_event__process_fork;
> diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
> index 4e6cbc99f08e..8d1feb9fe82d 100644
> --- a/tools/perf/util/data-convert-bt.c
> +++ b/tools/perf/util/data-convert-bt.c
> @@ -1462,6 +1462,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
>                         .sample          = process_sample_event,
>                         .mmap            = perf_event__process_mmap,
>                         .mmap2           = perf_event__process_mmap2,
> +                       .munmap          = perf_event__process_munmap,
>                         .comm            = perf_event__process_comm,
>                         .exit            = perf_event__process_exit,
>                         .fork            = perf_event__process_fork,
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 8ab0d7da956b..ffcba9c07c0d 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -18,6 +18,7 @@ static const char *perf_event__names[] = {
>         [0]                                     = "TOTAL",
>         [PERF_RECORD_MMAP]                      = "MMAP",
>         [PERF_RECORD_MMAP2]                     = "MMAP2",
> +       [PERF_RECORD_MUNMAP]                    = "MUNMAP",
>         [PERF_RECORD_LOST]                      = "LOST",
>         [PERF_RECORD_COMM]                      = "COMM",
>         [PERF_RECORD_EXIT]                      = "EXIT",
> @@ -1080,6 +1081,12 @@ size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp)
>                        event->mmap2.filename);
>  }
>
> +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp)
> +{
> +       return fprintf(fp, " [%#" PRIx64 ", %" PRIu64 "]\n",
> +                      event->munmap.start, event->munmap.len);
> +}
> +
>  size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp)
>  {
>         struct thread_map *threads = thread_map__new_event(&event->thread_map);
> @@ -1128,6 +1135,14 @@ int perf_event__process_mmap2(struct perf_tool *tool __maybe_unused,
>         return machine__process_mmap2_event(machine, event, sample);
>  }
>
> +int perf_event__process_munmap(struct perf_tool *tool __maybe_unused,
> +                              union perf_event *event,
> +                              struct perf_sample *sample,
> +                              struct machine *machine)
> +{
> +       return machine__process_munmap_event(machine, event, sample);
> +}
> +
>  size_t perf_event__fprintf_task(union perf_event *event, FILE *fp)
>  {
>         return fprintf(fp, "(%d:%d):(%d:%d)\n",
> @@ -1199,6 +1214,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
>         case PERF_RECORD_MMAP2:
>                 ret += perf_event__fprintf_mmap2(event, fp);
>                 break;
> +       case PERF_RECORD_MUNMAP:
> +               ret += perf_event__fprintf_munmap(event, fp);
> +               break;
>         case PERF_RECORD_AUX:
>                 ret += perf_event__fprintf_aux(event, fp);
>                 break;
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index c735c53a26f8..00942b6a9d48 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -33,6 +33,12 @@ struct mmap2_event {
>         char filename[PATH_MAX];
>  };
>
> +struct munmap_event {
> +       struct perf_event_header header;
> +       u64                      start;
> +       u64                      len;
> +};
> +
>  struct comm_event {
>         struct perf_event_header header;
>         u32 pid, tid;
> @@ -484,6 +490,7 @@ union perf_event {
>         struct perf_event_header        header;
>         struct mmap_event               mmap;
>         struct mmap2_event              mmap2;
> +       struct munmap_event             munmap;
>         struct comm_event               comm;
>         struct fork_event               fork;
>         struct lost_event               lost;
> @@ -595,6 +602,8 @@ int perf_event__process_mmap2(struct perf_tool *tool,
>                              union perf_event *event,
>                              struct perf_sample *sample,
>                              struct machine *machine);
> +int perf_event__process_munmap(struct perf_tool *tool, union perf_event *event,
> +                              struct perf_sample *sample, struct machine *machine);
>  int perf_event__process_fork(struct perf_tool *tool,
>                              union perf_event *event,
>                              struct perf_sample *sample,
> @@ -647,6 +656,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
>  size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp);
> +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_task(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp);
>  size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 04e536ae4d88..c4b88e4f2422 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -34,6 +34,7 @@ static struct {
>         bool sample_id_all;
>         bool exclude_guest;
>         bool mmap2;
> +       bool munmap;
>         bool cloexec;
>         bool clockid;
>         bool clockid_wrong;
> @@ -930,6 +931,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
>         attr->task  = track;
>         attr->mmap  = track;
>         attr->mmap2 = track && !perf_missing_features.mmap2;
> +       attr->munmap = track && !perf_missing_features.munmap;
>         attr->comm  = track;
>
>         if (opts->record_switch_events)
> @@ -1395,6 +1397,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
>         PRINT_ATTRf(exclude_callchain_kernel, p_unsigned);
>         PRINT_ATTRf(exclude_callchain_user, p_unsigned);
>         PRINT_ATTRf(mmap2, p_unsigned);
> +       PRINT_ATTRf(munmap, p_unsigned);
>         PRINT_ATTRf(comm_exec, p_unsigned);
>         PRINT_ATTRf(use_clockid, p_unsigned);
>         PRINT_ATTRf(context_switch, p_unsigned);
> @@ -1474,6 +1477,8 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
>         }
>
>  fallback_missing_features:
> +       if (perf_missing_features.munmap)
> +               evsel->attr.munmap = 0;
>         if (perf_missing_features.clockid_wrong)
>                 evsel->attr.clockid = CLOCK_MONOTONIC; /* should always work */
>         if (perf_missing_features.clockid) {
> @@ -1603,10 +1608,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
>                 goto out_close;
>
>         /*
> -        * Must probe features in the order they were added to the
> +        * Must probe features in the reverse order they were added to the
>          * perf_event_attr interface.
>          */
> -       if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
> +       if (!perf_missing_features.munmap && evsel->attr.munmap) {
> +               perf_missing_features.munmap = true;
> +               goto fallback_missing_features;
> +       } else if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
>                 perf_missing_features.write_backward = true;
>                 goto out_close;
>         } else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 747a034d1ff3..24f2f309d70f 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1320,6 +1320,16 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
>         return -1;
>  }
>
> +static int machine__process_kernel_munmap_event(struct machine *machine __maybe_unused,
> +                                               union perf_event *event __maybe_unused)
> +{
> +       /*
> +        * XXX: Fill this in as soon as we get munmap event for kernel
> +        *      "mmaps", aka module unload
> +        */
> +       return 0;
> +}
> +
>  int machine__process_mmap2_event(struct machine *machine,
>                                  union perf_event *event,
>                                  struct perf_sample *sample)
> @@ -1379,6 +1389,48 @@ int machine__process_mmap2_event(struct machine *machine,
>         return 0;
>  }
>
> +int machine__process_munmap_event(struct machine *machine,
> +                                 union perf_event *event,
> +                                 struct perf_sample *sample)
> +{
> +       struct thread *thread;
> +       struct map *map;
> +       enum map_type type = MAP__FUNCTION;
> +       int ret = 0;
> +
> +       if (dump_trace)
> +               perf_event__fprintf_munmap(event, stdout);
> +
> +       if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
> +           sample->cpumode == PERF_RECORD_MISC_KERNEL) {
> +               ret = machine__process_kernel_munmap_event(machine, event);
> +               if (ret < 0)
> +                       goto out_problem;
> +               return 0;
> +       }
> +
> +       thread = machine__find_thread(machine, sample->pid, sample->tid);
> +       if (thread == NULL)
> +               goto out_problem;
> +
> +       if (event->header.misc & PERF_RECORD_MISC_MMAP_DATA)
> +               type = MAP__VARIABLE;
> +
> +       map = map_groups__find(thread->mg, type, event->munmap.start);
> +       if (map != NULL)
> +               goto out_problem_map;
> +
> +       map_groups__remove(thread->mg, map);
> +       thread__put(thread);
> +       return 0;
> +
> +out_problem_map:
> +       thread__put(thread);
> +out_problem:
> +       dump_printf("problem processing PERF_RECORD_MUNMAP, skipping event.\n");
> +       return 0;
> +}
> +
>  int machine__process_mmap_event(struct machine *machine, union perf_event *event,
>                                 struct perf_sample *sample)
>  {
> @@ -1540,6 +1592,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
>                 ret = machine__process_mmap_event(machine, event, sample); break;
>         case PERF_RECORD_MMAP2:
>                 ret = machine__process_mmap2_event(machine, event, sample); break;
> +       case PERF_RECORD_MUNMAP:
> +               ret = machine__process_munmap_event(machine, event, sample); break;
>         case PERF_RECORD_FORK:
>                 ret = machine__process_fork_event(machine, event, sample); break;
>         case PERF_RECORD_EXIT:
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index a28305029711..eabc8b9d8d36 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -101,6 +101,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
>                                 struct perf_sample *sample);
>  int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
>                                  struct perf_sample *sample);
> +int machine__process_munmap_event(struct machine *machine, union perf_event *event,
> +                                 struct perf_sample *sample);
>  int machine__process_event(struct machine *machine, union perf_event *event,
>                                 struct perf_sample *sample);
>
> diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
> index a5fbc012e3df..047e004dc53f 100644
> --- a/tools/perf/util/python.c
> +++ b/tools/perf/util/python.c
> @@ -1155,6 +1155,7 @@ static struct {
>         PERF_CONST(RECORD_READ),
>         PERF_CONST(RECORD_SAMPLE),
>         PERF_CONST(RECORD_MMAP2),
> +       PERF_CONST(RECORD_MUNMAP),
>         PERF_CONST(RECORD_AUX),
>         PERF_CONST(RECORD_ITRACE_START),
>         PERF_CONST(RECORD_LOST_SAMPLES),
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 349c68144e55..42edd68ce0b8 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -357,6 +357,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
>                 tool->mmap = process_event_stub;
>         if (tool->mmap2 == NULL)
>                 tool->mmap2 = process_event_stub;
> +       if (tool->munmap == NULL)
> +               tool->munmap = process_event_stub;
>         if (tool->comm == NULL)
>                 tool->comm = process_event_stub;
>         if (tool->fork == NULL)
> @@ -480,6 +482,20 @@ static void perf_event__mmap2_swap(union perf_event *event,
>                 swap_sample_id_all(event, data);
>         }
>  }
> +
> +static void perf_event__munmap_swap(union perf_event *event, bool sample_id_all)
> +{
> +       event->munmap.start = bswap_64(event->munmap.start);
> +       event->munmap.len   = bswap_64(event->munmap.len);
> +
> +       if (sample_id_all) {
> +               void *data = &event->munmap.len;
> +
> +               data += sizeof(event->munmap.len);
> +               swap_sample_id_all(event, data);
> +       }
> +}
> +
>  static void perf_event__task_swap(union perf_event *event, bool sample_id_all)
>  {
>         event->fork.pid  = bswap_32(event->fork.pid);
> @@ -773,6 +789,7 @@ typedef void (*perf_event__swap_op)(union perf_event *event,
>  static perf_event__swap_op perf_event__swap_ops[] = {
>         [PERF_RECORD_MMAP]                = perf_event__mmap_swap,
>         [PERF_RECORD_MMAP2]               = perf_event__mmap2_swap,
> +       [PERF_RECORD_MUNMAP]              = perf_event__munmap_swap,
>         [PERF_RECORD_COMM]                = perf_event__comm_swap,
>         [PERF_RECORD_FORK]                = perf_event__task_swap,
>         [PERF_RECORD_EXIT]                = perf_event__task_swap,
> @@ -1237,6 +1254,8 @@ static int machines__deliver_event(struct machines *machines,
>                 if (event->header.misc & PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT)
>                         ++evlist->stats.nr_proc_map_timeout;
>                 return tool->mmap2(tool, event, sample, machine);
> +       case PERF_RECORD_MUNMAP:
> +               return tool->munmap(tool, event, sample, machine);
>         case PERF_RECORD_COMM:
>                 return tool->comm(tool, event, sample, machine);
>         case PERF_RECORD_FORK:
> diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> index ac2590a3de2d..66291063b8b2 100644
> --- a/tools/perf/util/tool.h
> +++ b/tools/perf/util/tool.h
> @@ -39,6 +39,7 @@ struct perf_tool {
>                         read;
>         event_op        mmap,
>                         mmap2,
> +                       munmap,
>                         comm,
>                         fork,
>                         exit,

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] handle munmap records in tools/perf was: Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-27 17:46                       ` Stephane Eranian
@ 2017-01-27 18:05                         ` Arnaldo Carvalho de Melo
  2017-01-27 18:10                           ` Stephane Eranian
  0 siblings, 1 reply; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-01-27 18:05 UTC (permalink / raw)
  To: eranian
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Stephane Eranian,
	Andres Freund, LKML, Jiri Olsa, Ingo Molnar, Anton Blanchard,
	Namhyung Kim

Em Fri, Jan 27, 2017 at 09:46:55AM -0800, Stephane Eranian escreveu:
> On Fri, Jan 27, 2017 at 9:38 AM, Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> > Em Fri, Jan 27, 2017 at 12:43:05PM -0300, Arnaldo Carvalho de Melo escreveu:
> >> Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu:
> >> > Something like the (compile tested only) below might be sufficient to
> >> > disambiguate things. It would need a corresponding tools/perf patch of
> >> > course, but I'm not too familiar with that code anymore.
> >>
> >> I'm working on patch to do feature test, fallback and handling of the
> >> event, etc, will post later.
> >
> > Just compile tested, need to build a kernel with PeterZ's patch to test,
> > feel free to go from there if in a hurry.
> >
> >
> > The place where the map is yanked out of the thread's maps rbtree is at
> >
> > machine__process_munmap_event()
> >
> > The rest is making sure the tool works with older kernels, deals with
> > endianness in the record in a perf.data file for cross platform
> > analysis, hooking it to the various tools where handling this event
> > makes sense.
> >
> At first glance this patch handles the munmap() well. But it will not solve
> the case of Andres. Unless you're telling me that the kernel with Peterz's patch


Nah, I just tried to implement support for the facility PeterZ was
proposing for the kernel, not trying to solve Andres, silly me ;-)

But then it doesn't even does that well, as it needs to take munmap.len
into account, to possibly split the map if they aren't a perfect match
(start, len).

> will now generate munmap record because of the merging. If not, then the code
> handling overlaps needs to change as well as I described in my other Email.

Will re-read it, found it confusing at first read :-\

- Arnaldo
 
> > [acme@jouet linux]$ diffstat /tmp/a.patch
> >  include/uapi/linux/perf_event.h |   14 +++++++++-
> >  perf/builtin-annotate.c         |    1
> >  perf/builtin-c2c.c              |    1
> >  perf/builtin-diff.c             |    1
> >  perf/builtin-inject.c           |   17 +++++++++++-
> >  perf/builtin-kmem.c             |    1
> >  perf/builtin-mem.c              |    1
> >  perf/builtin-record.c           |    1
> >  perf/builtin-report.c           |    1
> >  perf/builtin-script.c           |   30 ++++++++++++++++++++++
> >  perf/builtin-trace.c            |    1
> >  perf/util/data-convert-bt.c     |    1
> >  perf/util/event.c               |   18 +++++++++++++
> >  perf/util/event.h               |   10 +++++++
> >  perf/util/evsel.c               |   12 +++++++-
> >  perf/util/machine.c             |   54 ++++++++++++++++++++++++++++++++++++++++
> >  perf/util/machine.h             |    2 +
> >  perf/util/python.c              |    1
> >  perf/util/session.c             |   19 ++++++++++++++
> >  perf/util/tool.h                |    1
> >  20 files changed, 183 insertions(+), 4 deletions(-)
> > [acme@jouet linux]$
> >
> > diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> > index c66a485a24ac..59c5cd5abbbf 100644
> > --- a/tools/include/uapi/linux/perf_event.h
> > +++ b/tools/include/uapi/linux/perf_event.h
> > @@ -344,7 +344,8 @@ struct perf_event_attr {
> >                                 use_clockid    :  1, /* use @clockid for time fields */
> >                                 context_switch :  1, /* context switch data */
> >                                 write_backward :  1, /* Write ring buffer from end to beginning */
> > -                               __reserved_1   : 36;
> > +                               munmap         :  1, /* include munmap data */
> > +                               __reserved_1   : 35;
> >
> >         union {
> >                 __u32           wakeup_events;    /* wakeup every n events */
> > @@ -862,6 +863,17 @@ enum perf_event_type {
> >          */
> >         PERF_RECORD_SWITCH_CPU_WIDE             = 15,
> >
> > +       /*
> > +        * struct {
> > +        *      struct perf_event_header        header;
> > +        *
> > +        *      u64                             addr;
> > +        *      u64                             len;
> > +        *      struct sample_id                sample_id;
> > +        * };
> > +        */
> > +       PERF_RECORD_MUNMAP                      = 16,
> > +
> >         PERF_RECORD_MAX,                        /* non-ABI */
> >  };
> >
> > diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> > index ebb628332a6e..082b5f100ac5 100644
> > --- a/tools/perf/builtin-annotate.c
> > +++ b/tools/perf/builtin-annotate.c
> > @@ -390,6 +390,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
> >                         .sample = process_sample_event,
> >                         .mmap   = perf_event__process_mmap,
> >                         .mmap2  = perf_event__process_mmap2,
> > +                       .munmap = perf_event__process_munmap,
> >                         .comm   = perf_event__process_comm,
> >                         .exit   = perf_event__process_exit,
> >                         .fork   = perf_event__process_fork,
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index e2b21723bbf8..3e1f1eba77e2 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -313,6 +313,7 @@ static struct perf_c2c c2c = {
> >                 .sample         = process_sample_event,
> >                 .mmap           = perf_event__process_mmap,
> >                 .mmap2          = perf_event__process_mmap2,
> > +               .munmap         = perf_event__process_munmap,
> >                 .comm           = perf_event__process_comm,
> >                 .exit           = perf_event__process_exit,
> >                 .fork           = perf_event__process_fork,
> > diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
> > index 9ff0db4e2d0c..2c7856848887 100644
> > --- a/tools/perf/builtin-diff.c
> > +++ b/tools/perf/builtin-diff.c
> > @@ -350,6 +350,7 @@ static struct perf_tool tool = {
> >         .sample = diff__process_sample_event,
> >         .mmap   = perf_event__process_mmap,
> >         .mmap2  = perf_event__process_mmap2,
> > +       .munmap = perf_event__process_munmap,
> >         .comm   = perf_event__process_comm,
> >         .exit   = perf_event__process_exit,
> >         .fork   = perf_event__process_fork,
> > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> > index b9bc7e39833a..4e6103d0c163 100644
> > --- a/tools/perf/builtin-inject.c
> > +++ b/tools/perf/builtin-inject.c
> > @@ -282,6 +282,19 @@ static int perf_event__repipe_mmap2(struct perf_tool *tool,
> >         return err;
> >  }
> >
> > +static int perf_event__repipe_munmap(struct perf_tool *tool,
> > +                                    union perf_event *event,
> > +                                    struct perf_sample *sample,
> > +                                    struct machine *machine)
> > +{
> > +       int err;
> > +
> > +       err = perf_event__process_munmap(tool, event, sample, machine);
> > +       perf_event__repipe(tool, event, sample, machine);
> > +
> > +       return err;
> > +}
> > +
> >  #ifdef HAVE_JITDUMP
> >  static int perf_event__jit_repipe_mmap2(struct perf_tool *tool,
> >                                         union perf_event *event,
> > @@ -569,7 +582,7 @@ static void strip_init(struct perf_inject *inject)
> >  static bool has_tracking(struct perf_evsel *evsel)
> >  {
> >         return evsel->attr.mmap || evsel->attr.mmap2 || evsel->attr.comm ||
> > -              evsel->attr.task;
> > +              evsel->attr.task || evsel->attr.munmap;
> >  }
> >
> >  #define COMPAT_MASK (PERF_SAMPLE_ID | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
> > @@ -632,6 +645,7 @@ static int __cmd_inject(struct perf_inject *inject)
> >             inject->itrace_synth_opts.set) {
> >                 inject->tool.mmap         = perf_event__repipe_mmap;
> >                 inject->tool.mmap2        = perf_event__repipe_mmap2;
> > +               inject->tool.munmap       = perf_event__repipe_munmap;
> >                 inject->tool.fork         = perf_event__repipe_fork;
> >                 inject->tool.tracing_data = perf_event__repipe_tracing_data;
> >         }
> > @@ -732,6 +746,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
> >                         .sample         = perf_event__repipe_sample,
> >                         .mmap           = perf_event__repipe,
> >                         .mmap2          = perf_event__repipe,
> > +                       .munmap         = perf_event__repipe,
> >                         .comm           = perf_event__repipe,
> >                         .fork           = perf_event__repipe,
> >                         .exit           = perf_event__repipe,
> > diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
> > index 29f4751a3574..eb59aa7a0f5b 100644
> > --- a/tools/perf/builtin-kmem.c
> > +++ b/tools/perf/builtin-kmem.c
> > @@ -964,6 +964,7 @@ static struct perf_tool perf_kmem = {
> >         .comm            = perf_event__process_comm,
> >         .mmap            = perf_event__process_mmap,
> >         .mmap2           = perf_event__process_mmap2,
> > +       .munmap          = perf_event__process_munmap,
> >         .ordered_events  = true,
> >  };
> >
> > diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
> > index cd7bc4d104e2..de026e9ef1be 100644
> > --- a/tools/perf/builtin-mem.c
> > +++ b/tools/perf/builtin-mem.c
> > @@ -338,6 +338,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
> >                         .sample         = process_sample_event,
> >                         .mmap           = perf_event__process_mmap,
> >                         .mmap2          = perf_event__process_mmap2,
> > +                       .munmap         = perf_event__process_munmap,
> >                         .comm           = perf_event__process_comm,
> >                         .lost           = perf_event__process_lost,
> >                         .fork           = perf_event__process_fork,
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index ffac8ca9fb01..f197fc196cd8 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -1498,6 +1498,7 @@ static struct record record = {
> >                 .comm           = perf_event__process_comm,
> >                 .mmap           = perf_event__process_mmap,
> >                 .mmap2          = perf_event__process_mmap2,
> > +               .munmap         = perf_event__process_munmap,
> >                 .ordered_events = true,
> >         },
> >  };
> > diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> > index dbd7fa028861..96680fbeb664 100644
> > --- a/tools/perf/builtin-report.c
> > +++ b/tools/perf/builtin-report.c
> > @@ -693,6 +693,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
> >                         .sample          = process_sample_event,
> >                         .mmap            = perf_event__process_mmap,
> >                         .mmap2           = perf_event__process_mmap2,
> > +                       .munmap          = perf_event__process_munmap,
> >                         .comm            = perf_event__process_comm,
> >                         .exit            = perf_event__process_exit,
> >                         .fork            = perf_event__process_fork,
> > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> > index c0783b4f7b6c..27fcbcb15782 100644
> > --- a/tools/perf/builtin-script.c
> > +++ b/tools/perf/builtin-script.c
> > @@ -1244,6 +1244,34 @@ static int process_mmap2_event(struct perf_tool *tool,
> >         return 0;
> >  }
> >
> > +static int process_munmap_event(struct perf_tool *tool, union perf_event *event,
> > +                               struct perf_sample *sample, struct machine *machine)
> > +{
> > +       struct thread *thread;
> > +       struct perf_script *script = container_of(tool, struct perf_script, tool);
> > +       struct perf_session *session = script->session;
> > +       struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
> > +
> > +       if (perf_event__process_munmap(tool, event, sample, machine) < 0)
> > +               return -1;
> > +
> > +       thread = machine__findnew_thread(machine, sample->pid, sample->tid);
> > +       if (thread == NULL) {
> > +               pr_debug("problem processing MUNMAP event, skipping it.\n");
> > +               return -1;
> > +       }
> > +
> > +       if (!evsel->attr.sample_id_all) {
> > +               pr_debug("MUNMAP event requires attr.sample_id_all, skipping it.\n");
> > +               return -1;
> > +       }
> > +
> > +       print_sample_start(sample, thread, evsel);
> > +       perf_event__fprintf(event, stdout);
> > +       thread__put(thread);
> > +       return 0;
> > +}
> > +
> >  static int process_switch_event(struct perf_tool *tool,
> >                                 union perf_event *event,
> >                                 struct perf_sample *sample,
> > @@ -1290,6 +1318,7 @@ static int __cmd_script(struct perf_script *script)
> >         if (script->show_mmap_events) {
> >                 script->tool.mmap = process_mmap_event;
> >                 script->tool.mmap2 = process_mmap2_event;
> > +               script->tool.munmap = process_munmap_event;
> >         }
> >         if (script->show_switch_events)
> >                 script->tool.context_switch = process_switch_event;
> > @@ -2096,6 +2125,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
> >                         .sample          = process_sample_event,
> >                         .mmap            = perf_event__process_mmap,
> >                         .mmap2           = perf_event__process_mmap2,
> > +                       .munmap          = perf_event__process_munmap,
> >                         .comm            = perf_event__process_comm,
> >                         .exit            = perf_event__process_exit,
> >                         .fork            = perf_event__process_fork,
> > diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> > index 40ef9b293d1b..cfb7e9e2cd7d 100644
> > --- a/tools/perf/builtin-trace.c
> > +++ b/tools/perf/builtin-trace.c
> > @@ -2411,6 +2411,7 @@ static int trace__replay(struct trace *trace)
> >         trace->tool.sample        = trace__process_sample;
> >         trace->tool.mmap          = perf_event__process_mmap;
> >         trace->tool.mmap2         = perf_event__process_mmap2;
> > +       trace->tool.munmap        = perf_event__process_munmap;
> >         trace->tool.comm          = perf_event__process_comm;
> >         trace->tool.exit          = perf_event__process_exit;
> >         trace->tool.fork          = perf_event__process_fork;
> > diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
> > index 4e6cbc99f08e..8d1feb9fe82d 100644
> > --- a/tools/perf/util/data-convert-bt.c
> > +++ b/tools/perf/util/data-convert-bt.c
> > @@ -1462,6 +1462,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
> >                         .sample          = process_sample_event,
> >                         .mmap            = perf_event__process_mmap,
> >                         .mmap2           = perf_event__process_mmap2,
> > +                       .munmap          = perf_event__process_munmap,
> >                         .comm            = perf_event__process_comm,
> >                         .exit            = perf_event__process_exit,
> >                         .fork            = perf_event__process_fork,
> > diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> > index 8ab0d7da956b..ffcba9c07c0d 100644
> > --- a/tools/perf/util/event.c
> > +++ b/tools/perf/util/event.c
> > @@ -18,6 +18,7 @@ static const char *perf_event__names[] = {
> >         [0]                                     = "TOTAL",
> >         [PERF_RECORD_MMAP]                      = "MMAP",
> >         [PERF_RECORD_MMAP2]                     = "MMAP2",
> > +       [PERF_RECORD_MUNMAP]                    = "MUNMAP",
> >         [PERF_RECORD_LOST]                      = "LOST",
> >         [PERF_RECORD_COMM]                      = "COMM",
> >         [PERF_RECORD_EXIT]                      = "EXIT",
> > @@ -1080,6 +1081,12 @@ size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp)
> >                        event->mmap2.filename);
> >  }
> >
> > +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp)
> > +{
> > +       return fprintf(fp, " [%#" PRIx64 ", %" PRIu64 "]\n",
> > +                      event->munmap.start, event->munmap.len);
> > +}
> > +
> >  size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp)
> >  {
> >         struct thread_map *threads = thread_map__new_event(&event->thread_map);
> > @@ -1128,6 +1135,14 @@ int perf_event__process_mmap2(struct perf_tool *tool __maybe_unused,
> >         return machine__process_mmap2_event(machine, event, sample);
> >  }
> >
> > +int perf_event__process_munmap(struct perf_tool *tool __maybe_unused,
> > +                              union perf_event *event,
> > +                              struct perf_sample *sample,
> > +                              struct machine *machine)
> > +{
> > +       return machine__process_munmap_event(machine, event, sample);
> > +}
> > +
> >  size_t perf_event__fprintf_task(union perf_event *event, FILE *fp)
> >  {
> >         return fprintf(fp, "(%d:%d):(%d:%d)\n",
> > @@ -1199,6 +1214,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
> >         case PERF_RECORD_MMAP2:
> >                 ret += perf_event__fprintf_mmap2(event, fp);
> >                 break;
> > +       case PERF_RECORD_MUNMAP:
> > +               ret += perf_event__fprintf_munmap(event, fp);
> > +               break;
> >         case PERF_RECORD_AUX:
> >                 ret += perf_event__fprintf_aux(event, fp);
> >                 break;
> > diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> > index c735c53a26f8..00942b6a9d48 100644
> > --- a/tools/perf/util/event.h
> > +++ b/tools/perf/util/event.h
> > @@ -33,6 +33,12 @@ struct mmap2_event {
> >         char filename[PATH_MAX];
> >  };
> >
> > +struct munmap_event {
> > +       struct perf_event_header header;
> > +       u64                      start;
> > +       u64                      len;
> > +};
> > +
> >  struct comm_event {
> >         struct perf_event_header header;
> >         u32 pid, tid;
> > @@ -484,6 +490,7 @@ union perf_event {
> >         struct perf_event_header        header;
> >         struct mmap_event               mmap;
> >         struct mmap2_event              mmap2;
> > +       struct munmap_event             munmap;
> >         struct comm_event               comm;
> >         struct fork_event               fork;
> >         struct lost_event               lost;
> > @@ -595,6 +602,8 @@ int perf_event__process_mmap2(struct perf_tool *tool,
> >                              union perf_event *event,
> >                              struct perf_sample *sample,
> >                              struct machine *machine);
> > +int perf_event__process_munmap(struct perf_tool *tool, union perf_event *event,
> > +                              struct perf_sample *sample, struct machine *machine);
> >  int perf_event__process_fork(struct perf_tool *tool,
> >                              union perf_event *event,
> >                              struct perf_sample *sample,
> > @@ -647,6 +656,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
> >  size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp);
> >  size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp);
> >  size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp);
> > +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp);
> >  size_t perf_event__fprintf_task(union perf_event *event, FILE *fp);
> >  size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp);
> >  size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index 04e536ae4d88..c4b88e4f2422 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -34,6 +34,7 @@ static struct {
> >         bool sample_id_all;
> >         bool exclude_guest;
> >         bool mmap2;
> > +       bool munmap;
> >         bool cloexec;
> >         bool clockid;
> >         bool clockid_wrong;
> > @@ -930,6 +931,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
> >         attr->task  = track;
> >         attr->mmap  = track;
> >         attr->mmap2 = track && !perf_missing_features.mmap2;
> > +       attr->munmap = track && !perf_missing_features.munmap;
> >         attr->comm  = track;
> >
> >         if (opts->record_switch_events)
> > @@ -1395,6 +1397,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
> >         PRINT_ATTRf(exclude_callchain_kernel, p_unsigned);
> >         PRINT_ATTRf(exclude_callchain_user, p_unsigned);
> >         PRINT_ATTRf(mmap2, p_unsigned);
> > +       PRINT_ATTRf(munmap, p_unsigned);
> >         PRINT_ATTRf(comm_exec, p_unsigned);
> >         PRINT_ATTRf(use_clockid, p_unsigned);
> >         PRINT_ATTRf(context_switch, p_unsigned);
> > @@ -1474,6 +1477,8 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
> >         }
> >
> >  fallback_missing_features:
> > +       if (perf_missing_features.munmap)
> > +               evsel->attr.munmap = 0;
> >         if (perf_missing_features.clockid_wrong)
> >                 evsel->attr.clockid = CLOCK_MONOTONIC; /* should always work */
> >         if (perf_missing_features.clockid) {
> > @@ -1603,10 +1608,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
> >                 goto out_close;
> >
> >         /*
> > -        * Must probe features in the order they were added to the
> > +        * Must probe features in the reverse order they were added to the
> >          * perf_event_attr interface.
> >          */
> > -       if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
> > +       if (!perf_missing_features.munmap && evsel->attr.munmap) {
> > +               perf_missing_features.munmap = true;
> > +               goto fallback_missing_features;
> > +       } else if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
> >                 perf_missing_features.write_backward = true;
> >                 goto out_close;
> >         } else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 747a034d1ff3..24f2f309d70f 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -1320,6 +1320,16 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
> >         return -1;
> >  }
> >
> > +static int machine__process_kernel_munmap_event(struct machine *machine __maybe_unused,
> > +                                               union perf_event *event __maybe_unused)
> > +{
> > +       /*
> > +        * XXX: Fill this in as soon as we get munmap event for kernel
> > +        *      "mmaps", aka module unload
> > +        */
> > +       return 0;
> > +}
> > +
> >  int machine__process_mmap2_event(struct machine *machine,
> >                                  union perf_event *event,
> >                                  struct perf_sample *sample)
> > @@ -1379,6 +1389,48 @@ int machine__process_mmap2_event(struct machine *machine,
> >         return 0;
> >  }
> >
> > +int machine__process_munmap_event(struct machine *machine,
> > +                                 union perf_event *event,
> > +                                 struct perf_sample *sample)
> > +{
> > +       struct thread *thread;
> > +       struct map *map;
> > +       enum map_type type = MAP__FUNCTION;
> > +       int ret = 0;
> > +
> > +       if (dump_trace)
> > +               perf_event__fprintf_munmap(event, stdout);
> > +
> > +       if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
> > +           sample->cpumode == PERF_RECORD_MISC_KERNEL) {
> > +               ret = machine__process_kernel_munmap_event(machine, event);
> > +               if (ret < 0)
> > +                       goto out_problem;
> > +               return 0;
> > +       }
> > +
> > +       thread = machine__find_thread(machine, sample->pid, sample->tid);
> > +       if (thread == NULL)
> > +               goto out_problem;
> > +
> > +       if (event->header.misc & PERF_RECORD_MISC_MMAP_DATA)
> > +               type = MAP__VARIABLE;
> > +
> > +       map = map_groups__find(thread->mg, type, event->munmap.start);
> > +       if (map != NULL)
> > +               goto out_problem_map;
> > +
> > +       map_groups__remove(thread->mg, map);
> > +       thread__put(thread);
> > +       return 0;
> > +
> > +out_problem_map:
> > +       thread__put(thread);
> > +out_problem:
> > +       dump_printf("problem processing PERF_RECORD_MUNMAP, skipping event.\n");
> > +       return 0;
> > +}
> > +
> >  int machine__process_mmap_event(struct machine *machine, union perf_event *event,
> >                                 struct perf_sample *sample)
> >  {
> > @@ -1540,6 +1592,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
> >                 ret = machine__process_mmap_event(machine, event, sample); break;
> >         case PERF_RECORD_MMAP2:
> >                 ret = machine__process_mmap2_event(machine, event, sample); break;
> > +       case PERF_RECORD_MUNMAP:
> > +               ret = machine__process_munmap_event(machine, event, sample); break;
> >         case PERF_RECORD_FORK:
> >                 ret = machine__process_fork_event(machine, event, sample); break;
> >         case PERF_RECORD_EXIT:
> > diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> > index a28305029711..eabc8b9d8d36 100644
> > --- a/tools/perf/util/machine.h
> > +++ b/tools/perf/util/machine.h
> > @@ -101,6 +101,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
> >                                 struct perf_sample *sample);
> >  int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
> >                                  struct perf_sample *sample);
> > +int machine__process_munmap_event(struct machine *machine, union perf_event *event,
> > +                                 struct perf_sample *sample);
> >  int machine__process_event(struct machine *machine, union perf_event *event,
> >                                 struct perf_sample *sample);
> >
> > diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
> > index a5fbc012e3df..047e004dc53f 100644
> > --- a/tools/perf/util/python.c
> > +++ b/tools/perf/util/python.c
> > @@ -1155,6 +1155,7 @@ static struct {
> >         PERF_CONST(RECORD_READ),
> >         PERF_CONST(RECORD_SAMPLE),
> >         PERF_CONST(RECORD_MMAP2),
> > +       PERF_CONST(RECORD_MUNMAP),
> >         PERF_CONST(RECORD_AUX),
> >         PERF_CONST(RECORD_ITRACE_START),
> >         PERF_CONST(RECORD_LOST_SAMPLES),
> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> > index 349c68144e55..42edd68ce0b8 100644
> > --- a/tools/perf/util/session.c
> > +++ b/tools/perf/util/session.c
> > @@ -357,6 +357,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
> >                 tool->mmap = process_event_stub;
> >         if (tool->mmap2 == NULL)
> >                 tool->mmap2 = process_event_stub;
> > +       if (tool->munmap == NULL)
> > +               tool->munmap = process_event_stub;
> >         if (tool->comm == NULL)
> >                 tool->comm = process_event_stub;
> >         if (tool->fork == NULL)
> > @@ -480,6 +482,20 @@ static void perf_event__mmap2_swap(union perf_event *event,
> >                 swap_sample_id_all(event, data);
> >         }
> >  }
> > +
> > +static void perf_event__munmap_swap(union perf_event *event, bool sample_id_all)
> > +{
> > +       event->munmap.start = bswap_64(event->munmap.start);
> > +       event->munmap.len   = bswap_64(event->munmap.len);
> > +
> > +       if (sample_id_all) {
> > +               void *data = &event->munmap.len;
> > +
> > +               data += sizeof(event->munmap.len);
> > +               swap_sample_id_all(event, data);
> > +       }
> > +}
> > +
> >  static void perf_event__task_swap(union perf_event *event, bool sample_id_all)
> >  {
> >         event->fork.pid  = bswap_32(event->fork.pid);
> > @@ -773,6 +789,7 @@ typedef void (*perf_event__swap_op)(union perf_event *event,
> >  static perf_event__swap_op perf_event__swap_ops[] = {
> >         [PERF_RECORD_MMAP]                = perf_event__mmap_swap,
> >         [PERF_RECORD_MMAP2]               = perf_event__mmap2_swap,
> > +       [PERF_RECORD_MUNMAP]              = perf_event__munmap_swap,
> >         [PERF_RECORD_COMM]                = perf_event__comm_swap,
> >         [PERF_RECORD_FORK]                = perf_event__task_swap,
> >         [PERF_RECORD_EXIT]                = perf_event__task_swap,
> > @@ -1237,6 +1254,8 @@ static int machines__deliver_event(struct machines *machines,
> >                 if (event->header.misc & PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT)
> >                         ++evlist->stats.nr_proc_map_timeout;
> >                 return tool->mmap2(tool, event, sample, machine);
> > +       case PERF_RECORD_MUNMAP:
> > +               return tool->munmap(tool, event, sample, machine);
> >         case PERF_RECORD_COMM:
> >                 return tool->comm(tool, event, sample, machine);
> >         case PERF_RECORD_FORK:
> > diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> > index ac2590a3de2d..66291063b8b2 100644
> > --- a/tools/perf/util/tool.h
> > +++ b/tools/perf/util/tool.h
> > @@ -39,6 +39,7 @@ struct perf_tool {
> >                         read;
> >         event_op        mmap,
> >                         mmap2,
> > +                       munmap,
> >                         comm,
> >                         fork,
> >                         exit,

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] handle munmap records in tools/perf was: Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-27 18:05                         ` Arnaldo Carvalho de Melo
@ 2017-01-27 18:10                           ` Stephane Eranian
  2017-01-27 19:18                             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 29+ messages in thread
From: Stephane Eranian @ 2017-01-27 18:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Stephane Eranian, Andres Freund, LKML, Jiri Olsa,
	Ingo Molnar, Anton Blanchard, Namhyung Kim

On Fri, Jan 27, 2017 at 10:05 AM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
> Em Fri, Jan 27, 2017 at 09:46:55AM -0800, Stephane Eranian escreveu:
>> On Fri, Jan 27, 2017 at 9:38 AM, Arnaldo Carvalho de Melo
>> <arnaldo.melo@gmail.com> wrote:
>> > Em Fri, Jan 27, 2017 at 12:43:05PM -0300, Arnaldo Carvalho de Melo escreveu:
>> >> Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu:
>> >> > Something like the (compile tested only) below might be sufficient to
>> >> > disambiguate things. It would need a corresponding tools/perf patch of
>> >> > course, but I'm not too familiar with that code anymore.
>> >>
>> >> I'm working on patch to do feature test, fallback and handling of the
>> >> event, etc, will post later.
>> >
>> > Just compile tested, need to build a kernel with PeterZ's patch to test,
>> > feel free to go from there if in a hurry.
>> >
>> >
>> > The place where the map is yanked out of the thread's maps rbtree is at
>> >
>> > machine__process_munmap_event()
>> >
>> > The rest is making sure the tool works with older kernels, deals with
>> > endianness in the record in a perf.data file for cross platform
>> > analysis, hooking it to the various tools where handling this event
>> > makes sense.
>> >
>> At first glance this patch handles the munmap() well. But it will not solve
>> the case of Andres. Unless you're telling me that the kernel with Peterz's patch
>
>
> Nah, I just tried to implement support for the facility PeterZ was
> proposing for the kernel, not trying to solve Andres, silly me ;-)
>
> But then it doesn't even does that well, as it needs to take munmap.len
> into account, to possibly split the map if they aren't a perfect match
> (start, len).
>
Ah, yes, that's correct. You need to consider len and possibly split
the maps you have.

>> will now generate munmap record because of the merging. If not, then the code
>> handling overlaps needs to change as well as I described in my other Email.
>
> Will re-read it, found it confusing at first read :-\
>
Took me some time to understand his test case as well.
But the key point is the fact that anon cannot overlap a non-anon map
without a prior munmap() anymore.

> - Arnaldo
>
>> > [acme@jouet linux]$ diffstat /tmp/a.patch
>> >  include/uapi/linux/perf_event.h |   14 +++++++++-
>> >  perf/builtin-annotate.c         |    1
>> >  perf/builtin-c2c.c              |    1
>> >  perf/builtin-diff.c             |    1
>> >  perf/builtin-inject.c           |   17 +++++++++++-
>> >  perf/builtin-kmem.c             |    1
>> >  perf/builtin-mem.c              |    1
>> >  perf/builtin-record.c           |    1
>> >  perf/builtin-report.c           |    1
>> >  perf/builtin-script.c           |   30 ++++++++++++++++++++++
>> >  perf/builtin-trace.c            |    1
>> >  perf/util/data-convert-bt.c     |    1
>> >  perf/util/event.c               |   18 +++++++++++++
>> >  perf/util/event.h               |   10 +++++++
>> >  perf/util/evsel.c               |   12 +++++++-
>> >  perf/util/machine.c             |   54 ++++++++++++++++++++++++++++++++++++++++
>> >  perf/util/machine.h             |    2 +
>> >  perf/util/python.c              |    1
>> >  perf/util/session.c             |   19 ++++++++++++++
>> >  perf/util/tool.h                |    1
>> >  20 files changed, 183 insertions(+), 4 deletions(-)
>> > [acme@jouet linux]$
>> >
>> > diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
>> > index c66a485a24ac..59c5cd5abbbf 100644
>> > --- a/tools/include/uapi/linux/perf_event.h
>> > +++ b/tools/include/uapi/linux/perf_event.h
>> > @@ -344,7 +344,8 @@ struct perf_event_attr {
>> >                                 use_clockid    :  1, /* use @clockid for time fields */
>> >                                 context_switch :  1, /* context switch data */
>> >                                 write_backward :  1, /* Write ring buffer from end to beginning */
>> > -                               __reserved_1   : 36;
>> > +                               munmap         :  1, /* include munmap data */
>> > +                               __reserved_1   : 35;
>> >
>> >         union {
>> >                 __u32           wakeup_events;    /* wakeup every n events */
>> > @@ -862,6 +863,17 @@ enum perf_event_type {
>> >          */
>> >         PERF_RECORD_SWITCH_CPU_WIDE             = 15,
>> >
>> > +       /*
>> > +        * struct {
>> > +        *      struct perf_event_header        header;
>> > +        *
>> > +        *      u64                             addr;
>> > +        *      u64                             len;
>> > +        *      struct sample_id                sample_id;
>> > +        * };
>> > +        */
>> > +       PERF_RECORD_MUNMAP                      = 16,
>> > +
>> >         PERF_RECORD_MAX,                        /* non-ABI */
>> >  };
>> >
>> > diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
>> > index ebb628332a6e..082b5f100ac5 100644
>> > --- a/tools/perf/builtin-annotate.c
>> > +++ b/tools/perf/builtin-annotate.c
>> > @@ -390,6 +390,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
>> >                         .sample = process_sample_event,
>> >                         .mmap   = perf_event__process_mmap,
>> >                         .mmap2  = perf_event__process_mmap2,
>> > +                       .munmap = perf_event__process_munmap,
>> >                         .comm   = perf_event__process_comm,
>> >                         .exit   = perf_event__process_exit,
>> >                         .fork   = perf_event__process_fork,
>> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
>> > index e2b21723bbf8..3e1f1eba77e2 100644
>> > --- a/tools/perf/builtin-c2c.c
>> > +++ b/tools/perf/builtin-c2c.c
>> > @@ -313,6 +313,7 @@ static struct perf_c2c c2c = {
>> >                 .sample         = process_sample_event,
>> >                 .mmap           = perf_event__process_mmap,
>> >                 .mmap2          = perf_event__process_mmap2,
>> > +               .munmap         = perf_event__process_munmap,
>> >                 .comm           = perf_event__process_comm,
>> >                 .exit           = perf_event__process_exit,
>> >                 .fork           = perf_event__process_fork,
>> > diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
>> > index 9ff0db4e2d0c..2c7856848887 100644
>> > --- a/tools/perf/builtin-diff.c
>> > +++ b/tools/perf/builtin-diff.c
>> > @@ -350,6 +350,7 @@ static struct perf_tool tool = {
>> >         .sample = diff__process_sample_event,
>> >         .mmap   = perf_event__process_mmap,
>> >         .mmap2  = perf_event__process_mmap2,
>> > +       .munmap = perf_event__process_munmap,
>> >         .comm   = perf_event__process_comm,
>> >         .exit   = perf_event__process_exit,
>> >         .fork   = perf_event__process_fork,
>> > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
>> > index b9bc7e39833a..4e6103d0c163 100644
>> > --- a/tools/perf/builtin-inject.c
>> > +++ b/tools/perf/builtin-inject.c
>> > @@ -282,6 +282,19 @@ static int perf_event__repipe_mmap2(struct perf_tool *tool,
>> >         return err;
>> >  }
>> >
>> > +static int perf_event__repipe_munmap(struct perf_tool *tool,
>> > +                                    union perf_event *event,
>> > +                                    struct perf_sample *sample,
>> > +                                    struct machine *machine)
>> > +{
>> > +       int err;
>> > +
>> > +       err = perf_event__process_munmap(tool, event, sample, machine);
>> > +       perf_event__repipe(tool, event, sample, machine);
>> > +
>> > +       return err;
>> > +}
>> > +
>> >  #ifdef HAVE_JITDUMP
>> >  static int perf_event__jit_repipe_mmap2(struct perf_tool *tool,
>> >                                         union perf_event *event,
>> > @@ -569,7 +582,7 @@ static void strip_init(struct perf_inject *inject)
>> >  static bool has_tracking(struct perf_evsel *evsel)
>> >  {
>> >         return evsel->attr.mmap || evsel->attr.mmap2 || evsel->attr.comm ||
>> > -              evsel->attr.task;
>> > +              evsel->attr.task || evsel->attr.munmap;
>> >  }
>> >
>> >  #define COMPAT_MASK (PERF_SAMPLE_ID | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
>> > @@ -632,6 +645,7 @@ static int __cmd_inject(struct perf_inject *inject)
>> >             inject->itrace_synth_opts.set) {
>> >                 inject->tool.mmap         = perf_event__repipe_mmap;
>> >                 inject->tool.mmap2        = perf_event__repipe_mmap2;
>> > +               inject->tool.munmap       = perf_event__repipe_munmap;
>> >                 inject->tool.fork         = perf_event__repipe_fork;
>> >                 inject->tool.tracing_data = perf_event__repipe_tracing_data;
>> >         }
>> > @@ -732,6 +746,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
>> >                         .sample         = perf_event__repipe_sample,
>> >                         .mmap           = perf_event__repipe,
>> >                         .mmap2          = perf_event__repipe,
>> > +                       .munmap         = perf_event__repipe,
>> >                         .comm           = perf_event__repipe,
>> >                         .fork           = perf_event__repipe,
>> >                         .exit           = perf_event__repipe,
>> > diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
>> > index 29f4751a3574..eb59aa7a0f5b 100644
>> > --- a/tools/perf/builtin-kmem.c
>> > +++ b/tools/perf/builtin-kmem.c
>> > @@ -964,6 +964,7 @@ static struct perf_tool perf_kmem = {
>> >         .comm            = perf_event__process_comm,
>> >         .mmap            = perf_event__process_mmap,
>> >         .mmap2           = perf_event__process_mmap2,
>> > +       .munmap          = perf_event__process_munmap,
>> >         .ordered_events  = true,
>> >  };
>> >
>> > diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
>> > index cd7bc4d104e2..de026e9ef1be 100644
>> > --- a/tools/perf/builtin-mem.c
>> > +++ b/tools/perf/builtin-mem.c
>> > @@ -338,6 +338,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
>> >                         .sample         = process_sample_event,
>> >                         .mmap           = perf_event__process_mmap,
>> >                         .mmap2          = perf_event__process_mmap2,
>> > +                       .munmap         = perf_event__process_munmap,
>> >                         .comm           = perf_event__process_comm,
>> >                         .lost           = perf_event__process_lost,
>> >                         .fork           = perf_event__process_fork,
>> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> > index ffac8ca9fb01..f197fc196cd8 100644
>> > --- a/tools/perf/builtin-record.c
>> > +++ b/tools/perf/builtin-record.c
>> > @@ -1498,6 +1498,7 @@ static struct record record = {
>> >                 .comm           = perf_event__process_comm,
>> >                 .mmap           = perf_event__process_mmap,
>> >                 .mmap2          = perf_event__process_mmap2,
>> > +               .munmap         = perf_event__process_munmap,
>> >                 .ordered_events = true,
>> >         },
>> >  };
>> > diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>> > index dbd7fa028861..96680fbeb664 100644
>> > --- a/tools/perf/builtin-report.c
>> > +++ b/tools/perf/builtin-report.c
>> > @@ -693,6 +693,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
>> >                         .sample          = process_sample_event,
>> >                         .mmap            = perf_event__process_mmap,
>> >                         .mmap2           = perf_event__process_mmap2,
>> > +                       .munmap          = perf_event__process_munmap,
>> >                         .comm            = perf_event__process_comm,
>> >                         .exit            = perf_event__process_exit,
>> >                         .fork            = perf_event__process_fork,
>> > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
>> > index c0783b4f7b6c..27fcbcb15782 100644
>> > --- a/tools/perf/builtin-script.c
>> > +++ b/tools/perf/builtin-script.c
>> > @@ -1244,6 +1244,34 @@ static int process_mmap2_event(struct perf_tool *tool,
>> >         return 0;
>> >  }
>> >
>> > +static int process_munmap_event(struct perf_tool *tool, union perf_event *event,
>> > +                               struct perf_sample *sample, struct machine *machine)
>> > +{
>> > +       struct thread *thread;
>> > +       struct perf_script *script = container_of(tool, struct perf_script, tool);
>> > +       struct perf_session *session = script->session;
>> > +       struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
>> > +
>> > +       if (perf_event__process_munmap(tool, event, sample, machine) < 0)
>> > +               return -1;
>> > +
>> > +       thread = machine__findnew_thread(machine, sample->pid, sample->tid);
>> > +       if (thread == NULL) {
>> > +               pr_debug("problem processing MUNMAP event, skipping it.\n");
>> > +               return -1;
>> > +       }
>> > +
>> > +       if (!evsel->attr.sample_id_all) {
>> > +               pr_debug("MUNMAP event requires attr.sample_id_all, skipping it.\n");
>> > +               return -1;
>> > +       }
>> > +
>> > +       print_sample_start(sample, thread, evsel);
>> > +       perf_event__fprintf(event, stdout);
>> > +       thread__put(thread);
>> > +       return 0;
>> > +}
>> > +
>> >  static int process_switch_event(struct perf_tool *tool,
>> >                                 union perf_event *event,
>> >                                 struct perf_sample *sample,
>> > @@ -1290,6 +1318,7 @@ static int __cmd_script(struct perf_script *script)
>> >         if (script->show_mmap_events) {
>> >                 script->tool.mmap = process_mmap_event;
>> >                 script->tool.mmap2 = process_mmap2_event;
>> > +               script->tool.munmap = process_munmap_event;
>> >         }
>> >         if (script->show_switch_events)
>> >                 script->tool.context_switch = process_switch_event;
>> > @@ -2096,6 +2125,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
>> >                         .sample          = process_sample_event,
>> >                         .mmap            = perf_event__process_mmap,
>> >                         .mmap2           = perf_event__process_mmap2,
>> > +                       .munmap          = perf_event__process_munmap,
>> >                         .comm            = perf_event__process_comm,
>> >                         .exit            = perf_event__process_exit,
>> >                         .fork            = perf_event__process_fork,
>> > diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
>> > index 40ef9b293d1b..cfb7e9e2cd7d 100644
>> > --- a/tools/perf/builtin-trace.c
>> > +++ b/tools/perf/builtin-trace.c
>> > @@ -2411,6 +2411,7 @@ static int trace__replay(struct trace *trace)
>> >         trace->tool.sample        = trace__process_sample;
>> >         trace->tool.mmap          = perf_event__process_mmap;
>> >         trace->tool.mmap2         = perf_event__process_mmap2;
>> > +       trace->tool.munmap        = perf_event__process_munmap;
>> >         trace->tool.comm          = perf_event__process_comm;
>> >         trace->tool.exit          = perf_event__process_exit;
>> >         trace->tool.fork          = perf_event__process_fork;
>> > diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
>> > index 4e6cbc99f08e..8d1feb9fe82d 100644
>> > --- a/tools/perf/util/data-convert-bt.c
>> > +++ b/tools/perf/util/data-convert-bt.c
>> > @@ -1462,6 +1462,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
>> >                         .sample          = process_sample_event,
>> >                         .mmap            = perf_event__process_mmap,
>> >                         .mmap2           = perf_event__process_mmap2,
>> > +                       .munmap          = perf_event__process_munmap,
>> >                         .comm            = perf_event__process_comm,
>> >                         .exit            = perf_event__process_exit,
>> >                         .fork            = perf_event__process_fork,
>> > diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
>> > index 8ab0d7da956b..ffcba9c07c0d 100644
>> > --- a/tools/perf/util/event.c
>> > +++ b/tools/perf/util/event.c
>> > @@ -18,6 +18,7 @@ static const char *perf_event__names[] = {
>> >         [0]                                     = "TOTAL",
>> >         [PERF_RECORD_MMAP]                      = "MMAP",
>> >         [PERF_RECORD_MMAP2]                     = "MMAP2",
>> > +       [PERF_RECORD_MUNMAP]                    = "MUNMAP",
>> >         [PERF_RECORD_LOST]                      = "LOST",
>> >         [PERF_RECORD_COMM]                      = "COMM",
>> >         [PERF_RECORD_EXIT]                      = "EXIT",
>> > @@ -1080,6 +1081,12 @@ size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp)
>> >                        event->mmap2.filename);
>> >  }
>> >
>> > +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp)
>> > +{
>> > +       return fprintf(fp, " [%#" PRIx64 ", %" PRIu64 "]\n",
>> > +                      event->munmap.start, event->munmap.len);
>> > +}
>> > +
>> >  size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp)
>> >  {
>> >         struct thread_map *threads = thread_map__new_event(&event->thread_map);
>> > @@ -1128,6 +1135,14 @@ int perf_event__process_mmap2(struct perf_tool *tool __maybe_unused,
>> >         return machine__process_mmap2_event(machine, event, sample);
>> >  }
>> >
>> > +int perf_event__process_munmap(struct perf_tool *tool __maybe_unused,
>> > +                              union perf_event *event,
>> > +                              struct perf_sample *sample,
>> > +                              struct machine *machine)
>> > +{
>> > +       return machine__process_munmap_event(machine, event, sample);
>> > +}
>> > +
>> >  size_t perf_event__fprintf_task(union perf_event *event, FILE *fp)
>> >  {
>> >         return fprintf(fp, "(%d:%d):(%d:%d)\n",
>> > @@ -1199,6 +1214,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
>> >         case PERF_RECORD_MMAP2:
>> >                 ret += perf_event__fprintf_mmap2(event, fp);
>> >                 break;
>> > +       case PERF_RECORD_MUNMAP:
>> > +               ret += perf_event__fprintf_munmap(event, fp);
>> > +               break;
>> >         case PERF_RECORD_AUX:
>> >                 ret += perf_event__fprintf_aux(event, fp);
>> >                 break;
>> > diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
>> > index c735c53a26f8..00942b6a9d48 100644
>> > --- a/tools/perf/util/event.h
>> > +++ b/tools/perf/util/event.h
>> > @@ -33,6 +33,12 @@ struct mmap2_event {
>> >         char filename[PATH_MAX];
>> >  };
>> >
>> > +struct munmap_event {
>> > +       struct perf_event_header header;
>> > +       u64                      start;
>> > +       u64                      len;
>> > +};
>> > +
>> >  struct comm_event {
>> >         struct perf_event_header header;
>> >         u32 pid, tid;
>> > @@ -484,6 +490,7 @@ union perf_event {
>> >         struct perf_event_header        header;
>> >         struct mmap_event               mmap;
>> >         struct mmap2_event              mmap2;
>> > +       struct munmap_event             munmap;
>> >         struct comm_event               comm;
>> >         struct fork_event               fork;
>> >         struct lost_event               lost;
>> > @@ -595,6 +602,8 @@ int perf_event__process_mmap2(struct perf_tool *tool,
>> >                              union perf_event *event,
>> >                              struct perf_sample *sample,
>> >                              struct machine *machine);
>> > +int perf_event__process_munmap(struct perf_tool *tool, union perf_event *event,
>> > +                              struct perf_sample *sample, struct machine *machine);
>> >  int perf_event__process_fork(struct perf_tool *tool,
>> >                              union perf_event *event,
>> >                              struct perf_sample *sample,
>> > @@ -647,6 +656,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
>> >  size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp);
>> >  size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp);
>> >  size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp);
>> > +size_t perf_event__fprintf_munmap(union perf_event *event, FILE *fp);
>> >  size_t perf_event__fprintf_task(union perf_event *event, FILE *fp);
>> >  size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp);
>> >  size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
>> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> > index 04e536ae4d88..c4b88e4f2422 100644
>> > --- a/tools/perf/util/evsel.c
>> > +++ b/tools/perf/util/evsel.c
>> > @@ -34,6 +34,7 @@ static struct {
>> >         bool sample_id_all;
>> >         bool exclude_guest;
>> >         bool mmap2;
>> > +       bool munmap;
>> >         bool cloexec;
>> >         bool clockid;
>> >         bool clockid_wrong;
>> > @@ -930,6 +931,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
>> >         attr->task  = track;
>> >         attr->mmap  = track;
>> >         attr->mmap2 = track && !perf_missing_features.mmap2;
>> > +       attr->munmap = track && !perf_missing_features.munmap;
>> >         attr->comm  = track;
>> >
>> >         if (opts->record_switch_events)
>> > @@ -1395,6 +1397,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
>> >         PRINT_ATTRf(exclude_callchain_kernel, p_unsigned);
>> >         PRINT_ATTRf(exclude_callchain_user, p_unsigned);
>> >         PRINT_ATTRf(mmap2, p_unsigned);
>> > +       PRINT_ATTRf(munmap, p_unsigned);
>> >         PRINT_ATTRf(comm_exec, p_unsigned);
>> >         PRINT_ATTRf(use_clockid, p_unsigned);
>> >         PRINT_ATTRf(context_switch, p_unsigned);
>> > @@ -1474,6 +1477,8 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
>> >         }
>> >
>> >  fallback_missing_features:
>> > +       if (perf_missing_features.munmap)
>> > +               evsel->attr.munmap = 0;
>> >         if (perf_missing_features.clockid_wrong)
>> >                 evsel->attr.clockid = CLOCK_MONOTONIC; /* should always work */
>> >         if (perf_missing_features.clockid) {
>> > @@ -1603,10 +1608,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
>> >                 goto out_close;
>> >
>> >         /*
>> > -        * Must probe features in the order they were added to the
>> > +        * Must probe features in the reverse order they were added to the
>> >          * perf_event_attr interface.
>> >          */
>> > -       if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
>> > +       if (!perf_missing_features.munmap && evsel->attr.munmap) {
>> > +               perf_missing_features.munmap = true;
>> > +               goto fallback_missing_features;
>> > +       } else if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
>> >                 perf_missing_features.write_backward = true;
>> >                 goto out_close;
>> >         } else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
>> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>> > index 747a034d1ff3..24f2f309d70f 100644
>> > --- a/tools/perf/util/machine.c
>> > +++ b/tools/perf/util/machine.c
>> > @@ -1320,6 +1320,16 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
>> >         return -1;
>> >  }
>> >
>> > +static int machine__process_kernel_munmap_event(struct machine *machine __maybe_unused,
>> > +                                               union perf_event *event __maybe_unused)
>> > +{
>> > +       /*
>> > +        * XXX: Fill this in as soon as we get munmap event for kernel
>> > +        *      "mmaps", aka module unload
>> > +        */
>> > +       return 0;
>> > +}
>> > +
>> >  int machine__process_mmap2_event(struct machine *machine,
>> >                                  union perf_event *event,
>> >                                  struct perf_sample *sample)
>> > @@ -1379,6 +1389,48 @@ int machine__process_mmap2_event(struct machine *machine,
>> >         return 0;
>> >  }
>> >
>> > +int machine__process_munmap_event(struct machine *machine,
>> > +                                 union perf_event *event,
>> > +                                 struct perf_sample *sample)
>> > +{
>> > +       struct thread *thread;
>> > +       struct map *map;
>> > +       enum map_type type = MAP__FUNCTION;
>> > +       int ret = 0;
>> > +
>> > +       if (dump_trace)
>> > +               perf_event__fprintf_munmap(event, stdout);
>> > +
>> > +       if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
>> > +           sample->cpumode == PERF_RECORD_MISC_KERNEL) {
>> > +               ret = machine__process_kernel_munmap_event(machine, event);
>> > +               if (ret < 0)
>> > +                       goto out_problem;
>> > +               return 0;
>> > +       }
>> > +
>> > +       thread = machine__find_thread(machine, sample->pid, sample->tid);
>> > +       if (thread == NULL)
>> > +               goto out_problem;
>> > +
>> > +       if (event->header.misc & PERF_RECORD_MISC_MMAP_DATA)
>> > +               type = MAP__VARIABLE;
>> > +
>> > +       map = map_groups__find(thread->mg, type, event->munmap.start);
>> > +       if (map != NULL)
>> > +               goto out_problem_map;
>> > +
>> > +       map_groups__remove(thread->mg, map);
>> > +       thread__put(thread);
>> > +       return 0;
>> > +
>> > +out_problem_map:
>> > +       thread__put(thread);
>> > +out_problem:
>> > +       dump_printf("problem processing PERF_RECORD_MUNMAP, skipping event.\n");
>> > +       return 0;
>> > +}
>> > +
>> >  int machine__process_mmap_event(struct machine *machine, union perf_event *event,
>> >                                 struct perf_sample *sample)
>> >  {
>> > @@ -1540,6 +1592,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
>> >                 ret = machine__process_mmap_event(machine, event, sample); break;
>> >         case PERF_RECORD_MMAP2:
>> >                 ret = machine__process_mmap2_event(machine, event, sample); break;
>> > +       case PERF_RECORD_MUNMAP:
>> > +               ret = machine__process_munmap_event(machine, event, sample); break;
>> >         case PERF_RECORD_FORK:
>> >                 ret = machine__process_fork_event(machine, event, sample); break;
>> >         case PERF_RECORD_EXIT:
>> > diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
>> > index a28305029711..eabc8b9d8d36 100644
>> > --- a/tools/perf/util/machine.h
>> > +++ b/tools/perf/util/machine.h
>> > @@ -101,6 +101,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
>> >                                 struct perf_sample *sample);
>> >  int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
>> >                                  struct perf_sample *sample);
>> > +int machine__process_munmap_event(struct machine *machine, union perf_event *event,
>> > +                                 struct perf_sample *sample);
>> >  int machine__process_event(struct machine *machine, union perf_event *event,
>> >                                 struct perf_sample *sample);
>> >
>> > diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
>> > index a5fbc012e3df..047e004dc53f 100644
>> > --- a/tools/perf/util/python.c
>> > +++ b/tools/perf/util/python.c
>> > @@ -1155,6 +1155,7 @@ static struct {
>> >         PERF_CONST(RECORD_READ),
>> >         PERF_CONST(RECORD_SAMPLE),
>> >         PERF_CONST(RECORD_MMAP2),
>> > +       PERF_CONST(RECORD_MUNMAP),
>> >         PERF_CONST(RECORD_AUX),
>> >         PERF_CONST(RECORD_ITRACE_START),
>> >         PERF_CONST(RECORD_LOST_SAMPLES),
>> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> > index 349c68144e55..42edd68ce0b8 100644
>> > --- a/tools/perf/util/session.c
>> > +++ b/tools/perf/util/session.c
>> > @@ -357,6 +357,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
>> >                 tool->mmap = process_event_stub;
>> >         if (tool->mmap2 == NULL)
>> >                 tool->mmap2 = process_event_stub;
>> > +       if (tool->munmap == NULL)
>> > +               tool->munmap = process_event_stub;
>> >         if (tool->comm == NULL)
>> >                 tool->comm = process_event_stub;
>> >         if (tool->fork == NULL)
>> > @@ -480,6 +482,20 @@ static void perf_event__mmap2_swap(union perf_event *event,
>> >                 swap_sample_id_all(event, data);
>> >         }
>> >  }
>> > +
>> > +static void perf_event__munmap_swap(union perf_event *event, bool sample_id_all)
>> > +{
>> > +       event->munmap.start = bswap_64(event->munmap.start);
>> > +       event->munmap.len   = bswap_64(event->munmap.len);
>> > +
>> > +       if (sample_id_all) {
>> > +               void *data = &event->munmap.len;
>> > +
>> > +               data += sizeof(event->munmap.len);
>> > +               swap_sample_id_all(event, data);
>> > +       }
>> > +}
>> > +
>> >  static void perf_event__task_swap(union perf_event *event, bool sample_id_all)
>> >  {
>> >         event->fork.pid  = bswap_32(event->fork.pid);
>> > @@ -773,6 +789,7 @@ typedef void (*perf_event__swap_op)(union perf_event *event,
>> >  static perf_event__swap_op perf_event__swap_ops[] = {
>> >         [PERF_RECORD_MMAP]                = perf_event__mmap_swap,
>> >         [PERF_RECORD_MMAP2]               = perf_event__mmap2_swap,
>> > +       [PERF_RECORD_MUNMAP]              = perf_event__munmap_swap,
>> >         [PERF_RECORD_COMM]                = perf_event__comm_swap,
>> >         [PERF_RECORD_FORK]                = perf_event__task_swap,
>> >         [PERF_RECORD_EXIT]                = perf_event__task_swap,
>> > @@ -1237,6 +1254,8 @@ static int machines__deliver_event(struct machines *machines,
>> >                 if (event->header.misc & PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT)
>> >                         ++evlist->stats.nr_proc_map_timeout;
>> >                 return tool->mmap2(tool, event, sample, machine);
>> > +       case PERF_RECORD_MUNMAP:
>> > +               return tool->munmap(tool, event, sample, machine);
>> >         case PERF_RECORD_COMM:
>> >                 return tool->comm(tool, event, sample, machine);
>> >         case PERF_RECORD_FORK:
>> > diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
>> > index ac2590a3de2d..66291063b8b2 100644
>> > --- a/tools/perf/util/tool.h
>> > +++ b/tools/perf/util/tool.h
>> > @@ -39,6 +39,7 @@ struct perf_tool {
>> >                         read;
>> >         event_op        mmap,
>> >                         mmap2,
>> > +                       munmap,
>> >                         comm,
>> >                         fork,
>> >                         exit,

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] handle munmap records in tools/perf was: Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-27 18:10                           ` Stephane Eranian
@ 2017-01-27 19:18                             ` Arnaldo Carvalho de Melo
  2017-01-27 19:26                               ` Stephane Eranian
  0 siblings, 1 reply; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-01-27 19:18 UTC (permalink / raw)
  To: eranian
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Stephane Eranian,
	Andres Freund, LKML, Jiri Olsa, Ingo Molnar, Anton Blanchard,
	Namhyung Kim

Em Fri, Jan 27, 2017 at 10:10:09AM -0800, Stephane Eranian escreveu:
> On Fri, Jan 27, 2017 at 10:05 AM, Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> > Em Fri, Jan 27, 2017 at 09:46:55AM -0800, Stephane Eranian escreveu:
> >> On Fri, Jan 27, 2017 at 9:38 AM, Arnaldo Carvalho de Melo
> >> <arnaldo.melo@gmail.com> wrote:
> >> > Em Fri, Jan 27, 2017 at 12:43:05PM -0300, Arnaldo Carvalho de Melo escreveu:
> >> >> Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu:
> >> >> > Something like the (compile tested only) below might be sufficient to
> >> >> > disambiguate things. It would need a corresponding tools/perf patch of
> >> >> > course, but I'm not too familiar with that code anymore.
> >> >>
> >> >> I'm working on patch to do feature test, fallback and handling of the
> >> >> event, etc, will post later.
> >> >
> >> > Just compile tested, need to build a kernel with PeterZ's patch to test,
> >> > feel free to go from there if in a hurry.
> >> >
> >> >
> >> > The place where the map is yanked out of the thread's maps rbtree is at
> >> >
> >> > machine__process_munmap_event()
> >> >
> >> > The rest is making sure the tool works with older kernels, deals with
> >> > endianness in the record in a perf.data file for cross platform
> >> > analysis, hooking it to the various tools where handling this event
> >> > makes sense.
> >> >
> >> At first glance this patch handles the munmap() well. But it will not solve
> >> the case of Andres. Unless you're telling me that the kernel with Peterz's patch
> >
> >
> > Nah, I just tried to implement support for the facility PeterZ was
> > proposing for the kernel, not trying to solve Andres, silly me ;-)
> >
> > But then it doesn't even does that well, as it needs to take munmap.len
> > into account, to possibly split the map if they aren't a perfect match
> > (start, len).
> >
> Ah, yes, that's correct. You need to consider len and possibly split
> the maps you have.

In fact we need to replicate the kernel's do_munmap() routine, as the
munmap range may straddle multiple existing maps, possibly splitting the
first and/or the last, ditching whatever is in between.

- Arnaldo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] handle munmap records in tools/perf was: Re: perf/jit doesn't cope well with mprotect() to jit containing pages
  2017-01-27 19:18                             ` Arnaldo Carvalho de Melo
@ 2017-01-27 19:26                               ` Stephane Eranian
  0 siblings, 0 replies; 29+ messages in thread
From: Stephane Eranian @ 2017-01-27 19:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Stephane Eranian, Andres Freund, LKML, Jiri Olsa,
	Ingo Molnar, Anton Blanchard, Namhyung Kim

On Fri, Jan 27, 2017 at 11:18 AM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
> Em Fri, Jan 27, 2017 at 10:10:09AM -0800, Stephane Eranian escreveu:
>> On Fri, Jan 27, 2017 at 10:05 AM, Arnaldo Carvalho de Melo
>> <arnaldo.melo@gmail.com> wrote:
>> > Em Fri, Jan 27, 2017 at 09:46:55AM -0800, Stephane Eranian escreveu:
>> >> On Fri, Jan 27, 2017 at 9:38 AM, Arnaldo Carvalho de Melo
>> >> <arnaldo.melo@gmail.com> wrote:
>> >> > Em Fri, Jan 27, 2017 at 12:43:05PM -0300, Arnaldo Carvalho de Melo escreveu:
>> >> >> Em Fri, Jan 27, 2017 at 02:07:02PM +0100, Peter Zijlstra escreveu:
>> >> >> > Something like the (compile tested only) below might be sufficient to
>> >> >> > disambiguate things. It would need a corresponding tools/perf patch of
>> >> >> > course, but I'm not too familiar with that code anymore.
>> >> >>
>> >> >> I'm working on patch to do feature test, fallback and handling of the
>> >> >> event, etc, will post later.
>> >> >
>> >> > Just compile tested, need to build a kernel with PeterZ's patch to test,
>> >> > feel free to go from there if in a hurry.
>> >> >
>> >> >
>> >> > The place where the map is yanked out of the thread's maps rbtree is at
>> >> >
>> >> > machine__process_munmap_event()
>> >> >
>> >> > The rest is making sure the tool works with older kernels, deals with
>> >> > endianness in the record in a perf.data file for cross platform
>> >> > analysis, hooking it to the various tools where handling this event
>> >> > makes sense.
>> >> >
>> >> At first glance this patch handles the munmap() well. But it will not solve
>> >> the case of Andres. Unless you're telling me that the kernel with Peterz's patch
>> >
>> >
>> > Nah, I just tried to implement support for the facility PeterZ was
>> > proposing for the kernel, not trying to solve Andres, silly me ;-)
>> >
>> > But then it doesn't even does that well, as it needs to take munmap.len
>> > into account, to possibly split the map if they aren't a perfect match
>> > (start, len).
>> >
>> Ah, yes, that's correct. You need to consider len and possibly split
>> the maps you have.
>
> In fact we need to replicate the kernel's do_munmap() routine, as the
> munmap range may straddle multiple existing maps, possibly splitting the
> first and/or the last, ditching whatever is in between.
>
Yes. So this is not a small change in perf unfortunately.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [tip:perf/core] perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory
  2017-01-26 22:15   ` Peter Zijlstra
  2017-01-26 23:04     ` Andres Freund
@ 2017-01-30 11:52     ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 29+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-01-30 11:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, torvalds, mingo, jolsa, andres, acme, tglx, peterz,
	alexander.shishkin, eranian, hpa, eranian

Commit-ID:  0b3589be9b98994ce3d5aeca52445d1f5627c4ba
Gitweb:     http://git.kernel.org/tip/0b3589be9b98994ce3d5aeca52445d1f5627c4ba
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 26 Jan 2017 23:15:08 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 30 Jan 2017 11:41:26 +0100

perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory

Andres reported that MMAP2 records for anonymous memory always have
their protection field 0.

Turns out, someone daft put the prot/flags generation code in the file
branch, leaving them unset for anonymous memory.

Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Don Zickus <dzickus@redhat.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: anton@ozlabs.org
Cc: namhyung@kernel.org
Cc: stable@vger.kernel.org # v3.16+
Fixes: f972eb63b100 ("perf: Pass protection and flags bits through mmap2 interface")
Link: http://lkml.kernel.org/r/20170126221508.GF6536@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4e1f4c0..e5aaa80 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6632,6 +6632,27 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
 	char *buf = NULL;
 	char *name;
 
+	if (vma->vm_flags & VM_READ)
+		prot |= PROT_READ;
+	if (vma->vm_flags & VM_WRITE)
+		prot |= PROT_WRITE;
+	if (vma->vm_flags & VM_EXEC)
+		prot |= PROT_EXEC;
+
+	if (vma->vm_flags & VM_MAYSHARE)
+		flags = MAP_SHARED;
+	else
+		flags = MAP_PRIVATE;
+
+	if (vma->vm_flags & VM_DENYWRITE)
+		flags |= MAP_DENYWRITE;
+	if (vma->vm_flags & VM_MAYEXEC)
+		flags |= MAP_EXECUTABLE;
+	if (vma->vm_flags & VM_LOCKED)
+		flags |= MAP_LOCKED;
+	if (vma->vm_flags & VM_HUGETLB)
+		flags |= MAP_HUGETLB;
+
 	if (file) {
 		struct inode *inode;
 		dev_t dev;
@@ -6658,27 +6679,6 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
 		maj = MAJOR(dev);
 		min = MINOR(dev);
 
-		if (vma->vm_flags & VM_READ)
-			prot |= PROT_READ;
-		if (vma->vm_flags & VM_WRITE)
-			prot |= PROT_WRITE;
-		if (vma->vm_flags & VM_EXEC)
-			prot |= PROT_EXEC;
-
-		if (vma->vm_flags & VM_MAYSHARE)
-			flags = MAP_SHARED;
-		else
-			flags = MAP_PRIVATE;
-
-		if (vma->vm_flags & VM_DENYWRITE)
-			flags |= MAP_DENYWRITE;
-		if (vma->vm_flags & VM_MAYEXEC)
-			flags |= MAP_EXECUTABLE;
-		if (vma->vm_flags & VM_LOCKED)
-			flags |= MAP_LOCKED;
-		if (vma->vm_flags & VM_HUGETLB)
-			flags |= MAP_HUGETLB;
-
 		goto got_name;
 	} else {
 		if (vma->vm_ops && vma->vm_ops->name) {

^ permalink raw reply related	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-01-30 11:53 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-10  5:02 perf/jit doesn't cope well with mprotect() to jit containing pages Andres Freund
2016-12-12  8:49 ` Peter Zijlstra
2016-12-12  9:01   ` Andres Freund
2016-12-12  9:28     ` Peter Zijlstra
2017-01-26  1:25       ` Andres Freund
2017-01-26 22:15   ` Peter Zijlstra
2017-01-26 23:04     ` Andres Freund
2017-01-30 11:52     ` [tip:perf/core] perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory tip-bot for Peter Zijlstra
2017-01-26 20:32 ` perf/jit doesn't cope well with mprotect() to jit containing pages Stephane Eranian
2017-01-26 21:00   ` Andres Freund
2017-01-26 21:17     ` Stephane Eranian
2017-01-26 21:22       ` Andres Freund
2017-01-26 21:34         ` Stephane Eranian
2017-01-26 21:51           ` Andres Freund
2017-01-26 22:19     ` Peter Zijlstra
2017-01-26 22:26       ` Stephane Eranian
2017-01-26 22:38         ` Andres Freund
2017-01-26 22:51           ` Stephane Eranian
2017-01-26 23:09             ` Andres Freund
2017-01-26 23:16               ` Stephane Eranian
2017-01-27 13:07                 ` Peter Zijlstra
2017-01-27 15:43                   ` Arnaldo Carvalho de Melo
2017-01-27 17:35                     ` Stephane Eranian
2017-01-27 17:38                     ` [PATCH] handle munmap records in tools/perf was: " Arnaldo Carvalho de Melo
2017-01-27 17:46                       ` Stephane Eranian
2017-01-27 18:05                         ` Arnaldo Carvalho de Melo
2017-01-27 18:10                           ` Stephane Eranian
2017-01-27 19:18                             ` Arnaldo Carvalho de Melo
2017-01-27 19:26                               ` Stephane Eranian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).