linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] perf inject --jit: Remove //anon mmap events
@ 2020-05-27  1:51 Steve MacLean
  2020-05-27 19:00 ` Ian Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Steve MacLean @ 2020-05-27  1:51 UTC (permalink / raw)
  Cc: Steve MacLean, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, linux-kernel

From: Steve MacLean <Steve.MacLean@Microsoft.com>

**perf-<pid>.map and jit-<pid>.dump designs:

When a JIT generates code to be executed, it must allocate memory and
mark it executable using an mmap call.

*** perf-<pid>.map design

The perf-<pid>.map assumes that any sample recorded in an anonymous
memory page is JIT code. It then tries to resolve the symbol name by
looking at the process' perf-<pid>.map.

*** jit-<pid>.dump design

The jit-<pid>.dump mechanism takes a different approach. It requires a JIT
to write a `<path>/jit-<pid>.dump` file. This file must also be mmapped
so that perf inject -jit can find the file. The JIT must also add
JIT_CODE_LOAD records for any functions it generates. The records are
timestamped using a clock which can be correlated to the perf record
clock.

After perf record,  the `perf inject -jit` pass parses the recording
looking for a `<path>/jit-<pid>.dump` file. When it finds the file, it
parses it and for each JIT_CODE_LOAD record:
* creates an elf file `<path>/jitted-<pid>-<code_index>.so
* injects a new mmap record mapping the new elf file into the process.

*** Coexistence design

The kernel and perf support both of these mechanisms. We need to make
sure perf works on an app supporting either or both of these mechanisms.
Both designs rely on mmap records to determine how to resolve an ip
address.

The mmap records of both techniques by definition overlap. When the JIT
compiles a method, it must:
* allocate memory (mmap)
* add execution privilege (mprotect or mmap. either will
generate an mmap event form the kernel to perf)
* compile code into memory
* add a function record to perf-<pid>.map and/or jit-<pid>.dump

Because the jit-<pid>.dump mechanism supports greater capabilities, perf
prefers the symbols from jit-<pid>.dump. It implements this based on
timestamp ordering of events. There is an implicit ASSUMPTION that the
JIT_CODE_LOAD record timestamp will be after the // anon mmap event that
was generated during memory allocation or adding the execution privilege setting.

*** Problems with the ASSUMPTION

The ASSUMPTION made in the Coexistence design section above is violated
in the following scenario.

*** Scenario

While a JIT is jitting code it will eventually need to commit more
pages and change these pages to executable permissions. Typically the
JIT will want these collocated to minimize branch displacements.

The kernel will coalesce these anonymous mapping with identical
permissions before sending an MMAP event for the new pages. The address
range of the new mmap will not be just the most recently mmap pages.
It will include the entire coalesced mmap region.

See mm/mmap.c

unsigned long mmap_region(struct file *file, unsigned long addr,
                unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
                struct list_head *uf)
{
...
        /*
         * Can we just expand an old mapping?
         */
...
        perf_event_mmap(vma);
...
}

*** Symptoms

The coalesced // anon mmap event will be timestamped after the
JIT_CODE_LOAD records. This means it will be used as the most recent
mapping for that entire address range. For remaining events it will look at the
inferior perf-<pid>.map for symbols.

If both mechanisms are supported, the symbol will appear twice with
different module names. This causes weird behavior in reporting.

If only jit-<pid>.dump is supported, the symbol will no longer be resolved.

** Implemented solution

This patch solves the issue by removing // anon mmap events for any
process which has a valid jit-<pid>.dump file.

It tracks on a per process basis to handle the case where some running
apps support jit-<pid>.dump, but some only support perf-<pid>.map.

It adds new assumptions:
* // anon mmap events are only required for perf-<pid>.map support.
* An app that uses jit-<pid>.dump, no longer needs
perf-<pid>.map support. It assumes that any perf-<pid>.map info is
inferior.

*** Details

Use thread->priv to store whether a jitdump file has been processed

During "perf inject --jit", discard "//anon*" mmap events for any pid which
has sucessfully processed a jitdump file.

** Committer testing:

// jitdump case
perf record <app with jitdump>
perf inject --jit --input perf.data --output perfjit.data
// verify mmap "//anon" events present initially
perf script --input perf.data --show-mmap-events | grep '//anon'
// verify mmap "//anon" events removed
perf script --input perfjit.data --show-mmap-events | grep '//anon'
// no jitdump case
perf record <app without jitdump>
perf inject --jit --input perf.data --output perfjit.data
// verify mmap "//anon" events present initially
perf script --input perf.data --show-mmap-events | grep '//anon'
// verify mmap "//anon" events not removed
perf script --input perfjit.data --show-mmap-events | grep '//anon'

** Repro:

This issue was discovered while testing the initial CoreCLR jitdump
implementation. https://github.com/dotnet/coreclr/pull/26897.

** Alternate solutions considered

These were also briefly considered
* Change kernel to not coalesce mmap regions.
* Change kernel reporting of coalesced mmap regions to perf. Only
include newly mapped memory.
* Only strip parts of // anon mmap events overlapping existing
jitted-<pid>-<code_index>.so mmap events.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steve MacLean <Steve.MacLean@Microsoft.com>
---
 tools/perf/builtin-inject.c |  4 ++--
 tools/perf/util/jitdump.c   | 31 ++++++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 372ecb3..0f38862 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -263,7 +263,7 @@ static int perf_event__jit_repipe_mmap(struct perf_tool *tool,
 	 * if jit marker, then inject jit mmaps and generate ELF images
 	 */
 	ret = jit_process(inject->session, &inject->output, machine,
-			  event->mmap.filename, sample->pid, &n);
+			  event->mmap.filename, event->mmap.pid, &n);
 	if (ret < 0)
 		return ret;
 	if (ret) {
@@ -301,7 +301,7 @@ static int perf_event__jit_repipe_mmap2(struct perf_tool *tool,
 	 * if jit marker, then inject jit mmaps and generate ELF images
 	 */
 	ret = jit_process(inject->session, &inject->output, machine,
-			  event->mmap2.filename, sample->pid, &n);
+			  event->mmap2.filename, event->mmap2.pid, &n);
 	if (ret < 0)
 		return ret;
 	if (ret) {
diff --git a/tools/perf/util/jitdump.c b/tools/perf/util/jitdump.c
index e3ccb0c..d18596e 100644
--- a/tools/perf/util/jitdump.c
+++ b/tools/perf/util/jitdump.c
@@ -26,6 +26,7 @@
 #include "jit.h"
 #include "jitdump.h"
 #include "genelf.h"
+#include "thread.h"
 
 #include <linux/ctype.h>
 #include <linux/zalloc.h>
@@ -749,6 +750,28 @@ static int jit_repipe_debug_info(struct jit_buf_desc *jd, union jr_entry *jr)
 	return 0;
 }
 
+static void jit_add_pid(struct machine *machine, pid_t pid)
+{
+	struct thread *thread = machine__findnew_thread(machine, pid, pid);
+
+	if (!thread) {
+		pr_err("%s: thread %d not found or created\n", __func__, pid);
+		return;
+	}
+
+	thread->priv = (void *)1;
+}
+
+static bool jit_has_pid(struct machine *machine, pid_t pid)
+{
+	struct thread *thread = machine__find_thread(machine, pid, pid);
+
+	if (!thread)
+		return 0;
+
+	return (bool)thread->priv;
+}
+
 int
 jit_process(struct perf_session *session,
 	    struct perf_data *output,
@@ -764,8 +787,13 @@ static int jit_repipe_debug_info(struct jit_buf_desc *jd, union jr_entry *jr)
 	/*
 	 * first, detect marker mmap (i.e., the jitdump mmap)
 	 */
-	if (jit_detect(filename, pid))
+	if (jit_detect(filename, pid)) {
+		// Strip //anon* mmaps if we processed a jitdump for this pid
+		if (jit_has_pid(machine, pid) && (strncmp(filename, "//anon", 6) == 0))
+			return 1;
+
 		return 0;
+	}
 
 	memset(&jd, 0, sizeof(jd));
 
@@ -784,6 +812,7 @@ static int jit_repipe_debug_info(struct jit_buf_desc *jd, union jr_entry *jr)
 
 	ret = jit_inject(&jd, filename);
 	if (!ret) {
+		jit_add_pid(machine, pid);
 		*nbytes = jd.bytes_written;
 		ret = 1;
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] perf inject --jit: Remove //anon mmap events
  2020-05-27  1:51 [PATCH v4] perf inject --jit: Remove //anon mmap events Steve MacLean
@ 2020-05-27 19:00 ` Ian Rogers
  2020-05-27 19:27   ` [EXTERNAL] " Steve MacLean
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2020-05-27 19:00 UTC (permalink / raw)
  To: Steve MacLean
  Cc: Steve MacLean, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

On Tue, May 26, 2020 at 6:52 PM Steve MacLean
<steve.maclean@linux.microsoft.com> wrote:
>
> From: Steve MacLean <Steve.MacLean@Microsoft.com>
>
> **perf-<pid>.map and jit-<pid>.dump designs:
>
> When a JIT generates code to be executed, it must allocate memory and
> mark it executable using an mmap call.
>
> *** perf-<pid>.map design
>
> The perf-<pid>.map assumes that any sample recorded in an anonymous
> memory page is JIT code. It then tries to resolve the symbol name by
> looking at the process' perf-<pid>.map.
>
> *** jit-<pid>.dump design
>
> The jit-<pid>.dump mechanism takes a different approach. It requires a JIT
> to write a `<path>/jit-<pid>.dump` file. This file must also be mmapped
> so that perf inject -jit can find the file. The JIT must also add
> JIT_CODE_LOAD records for any functions it generates. The records are
> timestamped using a clock which can be correlated to the perf record
> clock.
>
> After perf record,  the `perf inject -jit` pass parses the recording
> looking for a `<path>/jit-<pid>.dump` file. When it finds the file, it
> parses it and for each JIT_CODE_LOAD record:
> * creates an elf file `<path>/jitted-<pid>-<code_index>.so
> * injects a new mmap record mapping the new elf file into the process.
>
> *** Coexistence design
>
> The kernel and perf support both of these mechanisms. We need to make
> sure perf works on an app supporting either or both of these mechanisms.
> Both designs rely on mmap records to determine how to resolve an ip
> address.
>
> The mmap records of both techniques by definition overlap. When the JIT
> compiles a method, it must:
> * allocate memory (mmap)
> * add execution privilege (mprotect or mmap. either will
> generate an mmap event form the kernel to perf)
> * compile code into memory
> * add a function record to perf-<pid>.map and/or jit-<pid>.dump
>
> Because the jit-<pid>.dump mechanism supports greater capabilities, perf
> prefers the symbols from jit-<pid>.dump. It implements this based on
> timestamp ordering of events. There is an implicit ASSUMPTION that the
> JIT_CODE_LOAD record timestamp will be after the // anon mmap event that
> was generated during memory allocation or adding the execution privilege setting.
>
> *** Problems with the ASSUMPTION
>
> The ASSUMPTION made in the Coexistence design section above is violated
> in the following scenario.
>
> *** Scenario
>
> While a JIT is jitting code it will eventually need to commit more
> pages and change these pages to executable permissions. Typically the
> JIT will want these collocated to minimize branch displacements.
>
> The kernel will coalesce these anonymous mapping with identical
> permissions before sending an MMAP event for the new pages. The address
> range of the new mmap will not be just the most recently mmap pages.
> It will include the entire coalesced mmap region.
>
> See mm/mmap.c
>
> unsigned long mmap_region(struct file *file, unsigned long addr,
>                 unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
>                 struct list_head *uf)
> {
> ...
>         /*
>          * Can we just expand an old mapping?
>          */
> ...
>         perf_event_mmap(vma);
> ...
> }
>
> *** Symptoms
>
> The coalesced // anon mmap event will be timestamped after the
> JIT_CODE_LOAD records. This means it will be used as the most recent
> mapping for that entire address range. For remaining events it will look at the
> inferior perf-<pid>.map for symbols.
>
> If both mechanisms are supported, the symbol will appear twice with
> different module names. This causes weird behavior in reporting.
>
> If only jit-<pid>.dump is supported, the symbol will no longer be resolved.
>
> ** Implemented solution
>
> This patch solves the issue by removing // anon mmap events for any
> process which has a valid jit-<pid>.dump file.
>
> It tracks on a per process basis to handle the case where some running
> apps support jit-<pid>.dump, but some only support perf-<pid>.map.
>
> It adds new assumptions:
> * // anon mmap events are only required for perf-<pid>.map support.
> * An app that uses jit-<pid>.dump, no longer needs
> perf-<pid>.map support. It assumes that any perf-<pid>.map info is
> inferior.

Thanks Steve this is an important fix! As //anon could be for malloc
or other uses, should the stripping behavior be behind a flag?

Ian

> *** Details
>
> Use thread->priv to store whether a jitdump file has been processed
>
> During "perf inject --jit", discard "//anon*" mmap events for any pid which
> has sucessfully processed a jitdump file.
>
> ** Committer testing:
>
> // jitdump case
> perf record <app with jitdump>
> perf inject --jit --input perf.data --output perfjit.data
> // verify mmap "//anon" events present initially
> perf script --input perf.data --show-mmap-events | grep '//anon'
> // verify mmap "//anon" events removed
> perf script --input perfjit.data --show-mmap-events | grep '//anon'
> // no jitdump case
> perf record <app without jitdump>
> perf inject --jit --input perf.data --output perfjit.data
> // verify mmap "//anon" events present initially
> perf script --input perf.data --show-mmap-events | grep '//anon'
> // verify mmap "//anon" events not removed
> perf script --input perfjit.data --show-mmap-events | grep '//anon'
>
> ** Repro:
>
> This issue was discovered while testing the initial CoreCLR jitdump
> implementation. https://github.com/dotnet/coreclr/pull/26897.
>
> ** Alternate solutions considered
>
> These were also briefly considered
> * Change kernel to not coalesce mmap regions.
> * Change kernel reporting of coalesced mmap regions to perf. Only
> include newly mapped memory.
> * Only strip parts of // anon mmap events overlapping existing
> jitted-<pid>-<code_index>.so mmap events.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Stephane Eranian <eranian@google.com>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Steve MacLean <Steve.MacLean@Microsoft.com>
> ---
>  tools/perf/builtin-inject.c |  4 ++--
>  tools/perf/util/jitdump.c   | 31 ++++++++++++++++++++++++++++++-
>  2 files changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index 372ecb3..0f38862 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -263,7 +263,7 @@ static int perf_event__jit_repipe_mmap(struct perf_tool *tool,
>          * if jit marker, then inject jit mmaps and generate ELF images
>          */
>         ret = jit_process(inject->session, &inject->output, machine,
> -                         event->mmap.filename, sample->pid, &n);
> +                         event->mmap.filename, event->mmap.pid, &n);
>         if (ret < 0)
>                 return ret;
>         if (ret) {
> @@ -301,7 +301,7 @@ static int perf_event__jit_repipe_mmap2(struct perf_tool *tool,
>          * if jit marker, then inject jit mmaps and generate ELF images
>          */
>         ret = jit_process(inject->session, &inject->output, machine,
> -                         event->mmap2.filename, sample->pid, &n);
> +                         event->mmap2.filename, event->mmap2.pid, &n);
>         if (ret < 0)
>                 return ret;
>         if (ret) {
> diff --git a/tools/perf/util/jitdump.c b/tools/perf/util/jitdump.c
> index e3ccb0c..d18596e 100644
> --- a/tools/perf/util/jitdump.c
> +++ b/tools/perf/util/jitdump.c
> @@ -26,6 +26,7 @@
>  #include "jit.h"
>  #include "jitdump.h"
>  #include "genelf.h"
> +#include "thread.h"
>
>  #include <linux/ctype.h>
>  #include <linux/zalloc.h>
> @@ -749,6 +750,28 @@ static int jit_repipe_debug_info(struct jit_buf_desc *jd, union jr_entry *jr)
>         return 0;
>  }
>
> +static void jit_add_pid(struct machine *machine, pid_t pid)
> +{
> +       struct thread *thread = machine__findnew_thread(machine, pid, pid);
> +
> +       if (!thread) {
> +               pr_err("%s: thread %d not found or created\n", __func__, pid);
> +               return;
> +       }
> +
> +       thread->priv = (void *)1;
> +}
> +
> +static bool jit_has_pid(struct machine *machine, pid_t pid)
> +{
> +       struct thread *thread = machine__find_thread(machine, pid, pid);
> +
> +       if (!thread)
> +               return 0;
> +
> +       return (bool)thread->priv;
> +}
> +
>  int
>  jit_process(struct perf_session *session,
>             struct perf_data *output,
> @@ -764,8 +787,13 @@ static int jit_repipe_debug_info(struct jit_buf_desc *jd, union jr_entry *jr)
>         /*
>          * first, detect marker mmap (i.e., the jitdump mmap)
>          */
> -       if (jit_detect(filename, pid))
> +       if (jit_detect(filename, pid)) {
> +               // Strip //anon* mmaps if we processed a jitdump for this pid
> +               if (jit_has_pid(machine, pid) && (strncmp(filename, "//anon", 6) == 0))
> +                       return 1;
> +
>                 return 0;
> +       }
>
>         memset(&jd, 0, sizeof(jd));
>
> @@ -784,6 +812,7 @@ static int jit_repipe_debug_info(struct jit_buf_desc *jd, union jr_entry *jr)
>
>         ret = jit_inject(&jd, filename);
>         if (!ret) {
> +               jit_add_pid(machine, pid);
>                 *nbytes = jd.bytes_written;
>                 ret = 1;
>         }
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events
  2020-05-27 19:00 ` Ian Rogers
@ 2020-05-27 19:27   ` Steve MacLean
  2020-05-27 20:59     ` Ian Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Steve MacLean @ 2020-05-27 19:27 UTC (permalink / raw)
  To: Ian Rogers, Steve MacLean
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Stephane Eranian, LKML

>> ** Implemented solution
>>
>> This patch solves the issue by removing // anon mmap events for any 
>> process which has a valid jit-<pid>.dump file.
>>
>> It tracks on a per process basis to handle the case where some running 
>> apps support jit-<pid>.dump, but some only support perf-<pid>.map.
>>
>> It adds new assumptions:
>> * // anon mmap events are only required for perf-<pid>.map support.
>> * An app that uses jit-<pid>.dump, no longer needs perf-<pid>.map 
>> support. It assumes that any perf-<pid>.map info is inferior.
>>
>> *** Details
>>
>> Use thread->priv to store whether a jitdump file has been processed
>>
>> During "perf inject --jit", discard "//anon*" mmap events for any pid 
>> which has sucessfully processed a jitdump file.
>
>
> Thanks Steve this is an important fix! As //anon could be for malloc or other uses, should the stripping behavior be behind a flag? 
>
> Ian

I hadn't anticipated a need to preserve the //anon mmap events when profiling JIT generated code.

As far as I know mmap events are captured by perf only for mapping code to symbols.  File mappings are kept
by the change.  Only // anon mappings are stripped.  (Only for processes which emitted jitdump files.)
And these are stripped only during the `perf inject --jit` step. I believe the // Anon mapping are only 
generally useful for mapping JIT code.

I suppose if someone was trying to count mmap events it might be confusing, but `perf inject --jit` creates 
synthetic mmap file events which would also make this scenario confusing.

I personally don't see a good reason to add a flag.  I also don't see a simple way either.  Not running `perf inject --jit`
would preserve existing behavior w/o jitdump support.  Without stripping the anon events jitdump support is painfully
broken....

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events
  2020-05-27 19:27   ` [EXTERNAL] " Steve MacLean
@ 2020-05-27 20:59     ` Ian Rogers
  2020-05-28  9:32       ` Ian Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2020-05-27 20:59 UTC (permalink / raw)
  To: Steve MacLean
  Cc: Steve MacLean, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

On Wed, May 27, 2020 at 12:27 PM Steve MacLean
<Steve.MacLean@microsoft.com> wrote:
>
> >> ** Implemented solution
> >>
> >> This patch solves the issue by removing // anon mmap events for any
> >> process which has a valid jit-<pid>.dump file.
> >>
> >> It tracks on a per process basis to handle the case where some running
> >> apps support jit-<pid>.dump, but some only support perf-<pid>.map.
> >>
> >> It adds new assumptions:
> >> * // anon mmap events are only required for perf-<pid>.map support.
> >> * An app that uses jit-<pid>.dump, no longer needs perf-<pid>.map
> >> support. It assumes that any perf-<pid>.map info is inferior.
> >>
> >> *** Details
> >>
> >> Use thread->priv to store whether a jitdump file has been processed
> >>
> >> During "perf inject --jit", discard "//anon*" mmap events for any pid
> >> which has sucessfully processed a jitdump file.
> >
> >
> > Thanks Steve this is an important fix! As //anon could be for malloc or other uses, should the stripping behavior be behind a flag?
> >
> > Ian
>
> I hadn't anticipated a need to preserve the //anon mmap events when profiling JIT generated code.
>
> As far as I know mmap events are captured by perf only for mapping code to symbols.  File mappings are kept
> by the change.  Only // anon mappings are stripped.  (Only for processes which emitted jitdump files.)
> And these are stripped only during the `perf inject --jit` step. I believe the // Anon mapping are only
> generally useful for mapping JIT code.
>
> I suppose if someone was trying to count mmap events it might be confusing, but `perf inject --jit` creates
> synthetic mmap file events which would also make this scenario confusing.
>
> I personally don't see a good reason to add a flag.  I also don't see a simple way either.  Not running `perf inject --jit`
> would preserve existing behavior w/o jitdump support.  Without stripping the anon events jitdump support is painfully
> broken....

Agreed that things are broken. In general only executable mappings are
held onto by perf, so it could be I'm over worrying about //anon
stripping breaking around memory allocations. We have some other use
cases for //anon at Google but they aren't impacted by jitdump. We
have also been trying to migrate jit caches to using memfd_create,
which has the same problem that this patch fixes for //anon. Fixing
memfd_create is a separate issue to //anon. I'll try to get a repro
for Java that demonstrates the problem and then add a Tested-by.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events
  2020-05-27 20:59     ` Ian Rogers
@ 2020-05-28  9:32       ` Ian Rogers
  2020-05-29 18:48         ` Steve MacLean
  2020-06-01  6:17         ` Nick Gasson
  0 siblings, 2 replies; 12+ messages in thread
From: Ian Rogers @ 2020-05-28  9:32 UTC (permalink / raw)
  To: Steve MacLean
  Cc: Steve MacLean, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

On Wed, May 27, 2020 at 1:59 PM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, May 27, 2020 at 12:27 PM Steve MacLean
> <Steve.MacLean@microsoft.com> wrote:
> >
> > >> ** Implemented solution
> > >>
> > >> This patch solves the issue by removing // anon mmap events for any
> > >> process which has a valid jit-<pid>.dump file.
> > >>
> > >> It tracks on a per process basis to handle the case where some running
> > >> apps support jit-<pid>.dump, but some only support perf-<pid>.map.
> > >>
> > >> It adds new assumptions:
> > >> * // anon mmap events are only required for perf-<pid>.map support.
> > >> * An app that uses jit-<pid>.dump, no longer needs perf-<pid>.map
> > >> support. It assumes that any perf-<pid>.map info is inferior.
> > >>
> > >> *** Details
> > >>
> > >> Use thread->priv to store whether a jitdump file has been processed
> > >>
> > >> During "perf inject --jit", discard "//anon*" mmap events for any pid
> > >> which has sucessfully processed a jitdump file.
> > >
> > >
> > > Thanks Steve this is an important fix! As //anon could be for malloc or other uses, should the stripping behavior be behind a flag?
> > >
> > > Ian
> >
> > I hadn't anticipated a need to preserve the //anon mmap events when profiling JIT generated code.
> >
> > As far as I know mmap events are captured by perf only for mapping code to symbols.  File mappings are kept
> > by the change.  Only // anon mappings are stripped.  (Only for processes which emitted jitdump files.)
> > And these are stripped only during the `perf inject --jit` step. I believe the // Anon mapping are only
> > generally useful for mapping JIT code.
> >
> > I suppose if someone was trying to count mmap events it might be confusing, but `perf inject --jit` creates
> > synthetic mmap file events which would also make this scenario confusing.
> >
> > I personally don't see a good reason to add a flag.  I also don't see a simple way either.  Not running `perf inject --jit`
> > would preserve existing behavior w/o jitdump support.  Without stripping the anon events jitdump support is painfully
> > broken....
>
> Agreed that things are broken. In general only executable mappings are
> held onto by perf, so it could be I'm over worrying about //anon
> stripping breaking around memory allocations. We have some other use
> cases for //anon at Google but they aren't impacted by jitdump. We
> have also been trying to migrate jit caches to using memfd_create,
> which has the same problem that this patch fixes for //anon. Fixing
> memfd_create is a separate issue to //anon. I'll try to get a repro
> for Java that demonstrates the problem and then add a Tested-by.
>
> Thanks,
> Ian

So on tip/perf/core with:
1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries

I've been trying variants of:

Before:
/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
-XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
dacapo-9.12-bach.jar jython
/tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
/tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
578
/tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
6

After:
/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
-XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
dacapo-9.12-bach.jar jython
/tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
/tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
589
/tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
60

So maybe the jit cache isn't behaving the way that the patch cures,
the uptick in unknowns appears consistent in my testing though. I
expect user error, is there an obvious explanation I'm missing?

Thanks,
Ian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events
  2020-05-28  9:32       ` Ian Rogers
@ 2020-05-29 18:48         ` Steve MacLean
  2020-05-29 21:44           ` Steve MacLean
  2020-06-01  6:17         ` Nick Gasson
  1 sibling, 1 reply; 12+ messages in thread
From: Steve MacLean @ 2020-05-29 18:48 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Steve MacLean, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

> I've been trying variants of:
>
> Before:
>/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar dacapo-9.12-bach.jar jython /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> 578
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> 6
> 
> After:
> /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar dacapo-9.12-bach.jar jython /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> 589
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> 60
>
> So maybe the jit cache isn't behaving the way that the patch cures, the uptick in unknowns appears consistent in my testing though. I expect user error, is there an obvious explanation I'm missing?

That is certainly disturbing. I don't see an obvious user error, but I haven't played with java jitdump

Couple of ideas come to mind.

+ When I was drafting .NET jitdump support, I looked briefly at the java agent.  My impression was the java agent support wasn't thread safe.  The jitdump file format requires that certain records are collocated to simplify the reader.  I wasn't sure how that was being accomplished by the writer if JAVA's JIT was multithreaded (like CoreCLR JIT is).

+ The way I implemented jitdump on CoreCLR was to hook into the existing system to write perf-map files.  So I am pretty confident there is a 1:1 correspondence of perf-map records with jitdump records.  Is it possible that Java choose to only emit interesting jitdump records?  Perhaps skipping thunks/stubs/trampolines?  

+ There are still some `unknown` records in CoreCLR, but I don't believe there is an increase.  I am pretty sure some of our stubs are not being generated.  They were present in our before case too.

+ Your methodology would be more repeatable if you record once, and analyze multiple times.  record, report, inject / report, inject / report

+ I was focusing on eliminating duplicate symbols for the same address.  So having the module name in the report allowed me to see if the symbol was from jitdump or from perf-map.  In the before case I could see duplicate symbols for the same address.  This was how the problem was first observed.

+ I would be more interested in looking at the diff of the reports.  Possibly sorted by address.  Probably with zero context.

If I were to guess, I would think Java choose to optimize jitdump and only record interesting code.

So if we need to support that scenario the approach of this patch wouldn't work.

We would need to use a different approach.  Ideas...
+ Keep anon mappings from overwriting jitdump mappings.  Introduce a weak add map method, that would only fill empty/anon.
+ Move the anon mapping timestamp to the beginning of time, so they are processed first
+ Fix the kernel perf map events
+ Keep the anon mappings in a separate fall back map

I kind of like the add weak map approach.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events
  2020-05-29 18:48         ` Steve MacLean
@ 2020-05-29 21:44           ` Steve MacLean
  0 siblings, 0 replies; 12+ messages in thread
From: Steve MacLean @ 2020-05-29 21:44 UTC (permalink / raw)
  To: Steve MacLean, Ian Rogers
  Cc: Steve MacLean, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

>/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java 
>-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer 
>-XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar 
>dacapo-9.12-bach.jar jython 

Its also possible the `InitialCodeCacheSize=20M` argument is masking the problem.  Something like 4K would make the problem much easier to see.

I see the problem every time .NET grows the cache across a 64K page boundary.  20M may mean you are allocating huge pages or something which might mask the problem. We may be allocating code pages 64K at a time.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events
  2020-05-28  9:32       ` Ian Rogers
  2020-05-29 18:48         ` Steve MacLean
@ 2020-06-01  6:17         ` Nick Gasson
  2020-06-01  8:53           ` Ian Rogers
  1 sibling, 1 reply; 12+ messages in thread
From: Nick Gasson @ 2020-06-01  6:17 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Steve MacLean, Steve MacLean, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, linux-kernel

On 05/28/20 17:32 PM, Ian Rogers wrote:
>
> So on tip/perf/core with:
> 1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
> 3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries
>
> I've been trying variants of:
>
> Before:
> /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
> -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
> -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
> dacapo-9.12-bach.jar jython
> /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> 578
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> 6
>
> After:
> /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
> -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
> -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
> dacapo-9.12-bach.jar jython
> /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> 589
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> 60
>
> So maybe the jit cache isn't behaving the way that the patch cures,
> the uptick in unknowns appears consistent in my testing though. I
> expect user error, is there an obvious explanation I'm missing?
>

Hi Ian,

I tried this as well with latest perf/core. The difference is that
unresolved addresses currently look like:

     0.00%  java             [JIT] tid 221782       [.] 0x0000ffff451499a4
     0.00%  java             [JIT] tid 221782       [.] 0x0000ffff4514f3e8
     0.00%  java             [JIT] tid 221782       [.] 0x0000ffff45149394

But after Steve's patch this becomes:

     0.00%  java             [unknown]              [.] 0x0000ffff58557d14
     0.00%  java             [unknown]              [.] 0x0000ffff785c03b4
     0.00%  java             [unknown]              [.] 0x0000ffff58386520

I couldn't see any events that were symbolised before but are no longer
symbolised after this patch.

I think most of these unknown events are caused by the asynchronous
nature of the JVMTI event handling. After an nmethod is compiled the
JVMTI event is posted to the Service Thread (*). So there can be a delay
between the time the compiled code starts executing and the time the
plugin receives the compiled code load event.

Here's an edited down example:

            java 215881 750014.947873:    1289634 cycles:u:      ffff7856ad10 [unknown] ([unknown])
  Service Thread 215895 750014.947971: PERF_RECORD_MMAP2 215879/215895: [0xffff785694c0(0x640) @ 0x40 fd:01 121010091 1]:
            java 215881 750014.948665:    1295994 cycles:u:      ffff7856ad10 org.python.core.__builtin__.range(org.python

The plugin receives the event ~100us after the first sample inside that
method. Ideally we would use the timestamp when the method was actually
compiled, but I can't see any way to extract this information.

However I also saw a few recurring [unknown] addresses that never have a
corresponding code load event. I'm not sure where these come from.

(*) http://hg.openjdk.java.net/jdk/jdk/file/50fe8727ed79/src/hotspot/share/code/nmethod.cpp#l1591

--
Nick


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events
  2020-06-01  6:17         ` Nick Gasson
@ 2020-06-01  8:53           ` Ian Rogers
  2020-06-01 10:02             ` Nick Gasson
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2020-06-01  8:53 UTC (permalink / raw)
  To: Nick Gasson
  Cc: Steve MacLean, Steve MacLean, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

On Sun, May 31, 2020 at 11:17 PM Nick Gasson <nick.gasson@arm.com> wrote:
>
> On 05/28/20 17:32 PM, Ian Rogers wrote:
> >
> > So on tip/perf/core with:
> > 1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
> > 3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries
> >
> > I've been trying variants of:
> >
> > Before:
> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
> > dacapo-9.12-bach.jar jython
> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> > 578
> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> > 6
> >
> > After:
> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
> > dacapo-9.12-bach.jar jython
> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> > 589
> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> > 60
> >
> > So maybe the jit cache isn't behaving the way that the patch cures,
> > the uptick in unknowns appears consistent in my testing though. I
> > expect user error, is there an obvious explanation I'm missing?
> >
>
> Hi Ian,
>
> I tried this as well with latest perf/core. The difference is that
> unresolved addresses currently look like:
>
>      0.00%  java             [JIT] tid 221782       [.] 0x0000ffff451499a4
>      0.00%  java             [JIT] tid 221782       [.] 0x0000ffff4514f3e8
>      0.00%  java             [JIT] tid 221782       [.] 0x0000ffff45149394
>
> But after Steve's patch this becomes:
>
>      0.00%  java             [unknown]              [.] 0x0000ffff58557d14
>      0.00%  java             [unknown]              [.] 0x0000ffff785c03b4
>      0.00%  java             [unknown]              [.] 0x0000ffff58386520
>
> I couldn't see any events that were symbolised before but are no longer
> symbolised after this patch.

I see this, thanks for digging into the explanation! Were you able to
get a test case where the unknowns went down? For example, by forcing
the code cache size to be small? This is the result I'd expect to see.

> I think most of these unknown events are caused by the asynchronous
> nature of the JVMTI event handling. After an nmethod is compiled the
> JVMTI event is posted to the Service Thread (*). So there can be a delay
> between the time the compiled code starts executing and the time the
> plugin receives the compiled code load event.
>
> Here's an edited down example:
>
>             java 215881 750014.947873:    1289634 cycles:u:      ffff7856ad10 [unknown] ([unknown])
>   Service Thread 215895 750014.947971: PERF_RECORD_MMAP2 215879/215895: [0xffff785694c0(0x640) @ 0x40 fd:01 121010091 1]:
>             java 215881 750014.948665:    1295994 cycles:u:      ffff7856ad10 org.python.core.__builtin__.range(org.python
>
> The plugin receives the event ~100us after the first sample inside that
> method. Ideally we would use the timestamp when the method was actually
> compiled, but I can't see any way to extract this information.

Hmm.. this is a bit weird as the compile_info at one point was a
resource allocation and so would be cleared out when the resource mark
was passed on the compiler's return. Not something you'd want to do
asynchronously. Presumably this has changed to improve performance,
but doing the jvmti on a separate thread defeats jitdump - if we see
samples in code ahead of the code being loaded. Perhaps a profiler
like async-profiler
(https://github.com/jvm-profiling-tools/async-profiler) has a
workaround for the lag.

> However I also saw a few recurring [unknown] addresses that never have a
> corresponding code load event. I'm not sure where these come from.

C2 is pretty bad for having assembly stub routine helpers (aka stub
routines) that aren't "documented". C1 is less like this, but it is
increasingly hard to force C1 as the compiler. If you are seeing these
things as leaf routines then its likely some stub routine clean up
could fix the issue.

Thanks,
Ian

> (*) http://hg.openjdk.java.net/jdk/jdk/file/50fe8727ed79/src/hotspot/share/code/nmethod.cpp#l1591
>
> --
> Nick
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events
  2020-06-01  8:53           ` Ian Rogers
@ 2020-06-01 10:02             ` Nick Gasson
  2020-06-12 19:00               ` Steve MacLean
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Gasson @ 2020-06-01 10:02 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Steve MacLean, Steve MacLean, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

On 06/01/20 16:53 PM, Ian Rogers wrote:
> On Sun, May 31, 2020 at 11:17 PM Nick Gasson <nick.gasson@arm.com> wrote:
>>
>> On 05/28/20 17:32 PM, Ian Rogers wrote:
>> >
>> > So on tip/perf/core with:
>> > 1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
>> > 3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries
>> >
>> > I've been trying variants of:
>> >
>> > Before:
>> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
>> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
>> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
>> > dacapo-9.12-bach.jar jython
>> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
>> > 578
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
>> > 6
>> >
>> > After:
>> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
>> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
>> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
>> > dacapo-9.12-bach.jar jython
>> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
>> > 589
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
>> > 60
>> >
>> > So maybe the jit cache isn't behaving the way that the patch cures,
>> > the uptick in unknowns appears consistent in my testing though. I
>> > expect user error, is there an obvious explanation I'm missing?
>> >
>>
>> Hi Ian,
>>
>> I tried this as well with latest perf/core. The difference is that
>> unresolved addresses currently look like:
>>
>>      0.00%  java             [JIT] tid 221782       [.] 0x0000ffff451499a4
>>      0.00%  java             [JIT] tid 221782       [.] 0x0000ffff4514f3e8
>>      0.00%  java             [JIT] tid 221782       [.] 0x0000ffff45149394
>>
>> But after Steve's patch this becomes:
>>
>>      0.00%  java             [unknown]              [.] 0x0000ffff58557d14
>>      0.00%  java             [unknown]              [.] 0x0000ffff785c03b4
>>      0.00%  java             [unknown]              [.] 0x0000ffff58386520
>>
>> I couldn't see any events that were symbolised before but are no longer
>> symbolised after this patch.
>
> I see this, thanks for digging into the explanation! Were you able to
> get a test case where the unknowns went down? For example, by forcing
> the code cache size to be small? This is the result I'd expect to see.

I tried the same Dacapo benchmark as you with different values of
InitialCodeCacheSize and grepped for -e '\[unknown\]' -e '\[JIT\]'.

       Base   Patched
 100M  338    373
 50M   333    315
 25M   323    368
 15M   1238   309
 10M   2600   333
 1M    6035   337

This looks fairly convincing to me: the cliff at 15M is where the code
cache starts needing to be enlarged.

>
>> I think most of these unknown events are caused by the asynchronous
>> nature of the JVMTI event handling. After an nmethod is compiled the
>> JVMTI event is posted to the Service Thread (*). So there can be a delay
>> between the time the compiled code starts executing and the time the
>> plugin receives the compiled code load event.
>>
>> Here's an edited down example:
>>
>>             java 215881 750014.947873:    1289634 cycles:u:      ffff7856ad10 [unknown] ([unknown])
>>   Service Thread 215895 750014.947971: PERF_RECORD_MMAP2 215879/215895: [0xffff785694c0(0x640) @ 0x40 fd:01 121010091 1]:
>>             java 215881 750014.948665:    1295994 cycles:u:      ffff7856ad10 org.python.core.__builtin__.range(org.python
>>
>> The plugin receives the event ~100us after the first sample inside that
>> method. Ideally we would use the timestamp when the method was actually
>> compiled, but I can't see any way to extract this information.
>
> Hmm.. this is a bit weird as the compile_info at one point was a
> resource allocation and so would be cleared out when the resource mark
> was passed on the compiler's return. Not something you'd want to do
> asynchronously. Presumably this has changed to improve performance,
> but doing the jvmti on a separate thread defeats jitdump - if we see
> samples in code ahead of the code being loaded. Perhaps a profiler
> like async-profiler
> (https://github.com/jvm-profiling-tools/async-profiler) has a
> workaround for the lag.
>

I had a brief look at the async-profiler JVMTI code but I can't see
anything it's doing differently. Their profiler.sh attaches to a running
JVM - I suspect this problem will be less obvious once the JIT has
warmed up.

The compile_info structure is still resource-allocated in
JvmtiExport::post_compiled_method_load() but that function is called
from the service thread not the compiler thread. The compiler thread
just pushes a nmethod* on a queue and the inlining data is recovered
from the nmethod object.

It seems we could extend JVMTI here to pass the code installation
timestamp as another CMLR record type.

--
Nick

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events
  2020-06-01 10:02             ` Nick Gasson
@ 2020-06-12 19:00               ` Steve MacLean
  2020-06-12 19:33                 ` Ian Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Steve MacLean @ 2020-06-12 19:00 UTC (permalink / raw)
  To: Nick Gasson, Ian Rogers
  Cc: Steve MacLean, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, LKML

>>> Hi Ian,
>>>
>>>> I tried this as well with latest perf/core. The difference is that 
>>> unresolved addresses currently look like:
>>>
>>>      0.00%  java             [JIT] tid 221782       [.] 0x0000ffff451499a4
>>>      0.00%  java             [JIT] tid 221782       [.] 0x0000ffff4514f3e8
>>>      0.00%  java             [JIT] tid 221782       [.] 0x0000ffff45149394
>>>
>>> But after Steve's patch this becomes:
>>>
>>>      0.00%  java             [unknown]              [.] 0x0000ffff58557d14
>>>      0.00%  java             [unknown]              [.] 0x0000ffff785c03b4
>>>      0.00%  java             [unknown]              [.] 0x0000ffff58386520
>>>
>>> I couldn't see any events that were symbolised before but are no 
>>> longer symbolised after this patch.
>>
>> I see this, thanks for digging into the explanation! Were you able to 
>> get a test case where the unknowns went down? For example, by forcing 
>> the code cache size to be small? This is the result I'd expect to see.
>
>I tried the same Dacapo benchmark as you with different values of InitialCodeCacheSize and grepped for -e '\[unknown\]' -e '\[JIT\]'.
>
>       Base   Patched
> 100M  338    373
> 50M   333    315
> 25M   323    368
> 15M   1238   309
> 10M   2600   333
> 1M    6035   337
>
>This looks fairly convincing to me: the cliff at 15M is where the code cache starts needing to be enlarged.
>

Removing the anonymous mappings causes a small regression.  Specifically,
the reporting of the module name goes from "[JIT] tid <tid>" to "[unknown]".
This occurs when the JIT fails to report memory used in jitdump before it 
is used.

However there is also confirmation that JAVA does see the reported issue 
when using a small code cache.  The current patch resolves the issue in
this case.

I see two options:

+ Accept the regression. Since this is a regression for a jit dump 
reporting synchronization error, this may be a reasonable option.

+ Design a more complicated patch. Either
    + Only strip parts of // anon mmap events overlapping existing 
      jitted-<pid>-<code_index>.so mmap events.
    + Only strip parts of // anon mmap events overlapping prior
      // anon mmap events

Any opinions?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events
  2020-06-12 19:00               ` Steve MacLean
@ 2020-06-12 19:33                 ` Ian Rogers
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2020-06-12 19:33 UTC (permalink / raw)
  To: Steve MacLean, Arnaldo Carvalho de Melo
  Cc: Nick Gasson, Steve MacLean, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Stephane Eranian, LKML

On Fri, Jun 12, 2020 at 12:00 PM Steve MacLean
<Steve.MacLean@microsoft.com> wrote:
>
> >>> Hi Ian,
> >>>
> >>>> I tried this as well with latest perf/core. The difference is that
> >>> unresolved addresses currently look like:
> >>>
> >>>      0.00%  java             [JIT] tid 221782       [.] 0x0000ffff451499a4
> >>>      0.00%  java             [JIT] tid 221782       [.] 0x0000ffff4514f3e8
> >>>      0.00%  java             [JIT] tid 221782       [.] 0x0000ffff45149394
> >>>
> >>> But after Steve's patch this becomes:
> >>>
> >>>      0.00%  java             [unknown]              [.] 0x0000ffff58557d14
> >>>      0.00%  java             [unknown]              [.] 0x0000ffff785c03b4
> >>>      0.00%  java             [unknown]              [.] 0x0000ffff58386520
> >>>
> >>> I couldn't see any events that were symbolised before but are no
> >>> longer symbolised after this patch.
> >>
> >> I see this, thanks for digging into the explanation! Were you able to
> >> get a test case where the unknowns went down? For example, by forcing
> >> the code cache size to be small? This is the result I'd expect to see.
> >
> >I tried the same Dacapo benchmark as you with different values of InitialCodeCacheSize and grepped for -e '\[unknown\]' -e '\[JIT\]'.
> >
> >       Base   Patched
> > 100M  338    373
> > 50M   333    315
> > 25M   323    368
> > 15M   1238   309
> > 10M   2600   333
> > 1M    6035   337
> >
> >This looks fairly convincing to me: the cliff at 15M is where the code cache starts needing to be enlarged.
> >
>
> Removing the anonymous mappings causes a small regression.  Specifically,
> the reporting of the module name goes from "[JIT] tid <tid>" to "[unknown]".
> This occurs when the JIT fails to report memory used in jitdump before it
> is used.
>
> However there is also confirmation that JAVA does see the reported issue
> when using a small code cache.  The current patch resolves the issue in
> this case.
>
> I see two options:
>
> + Accept the regression. Since this is a regression for a jit dump
> reporting synchronization error, this may be a reasonable option.
>
> + Design a more complicated patch. Either
>     + Only strip parts of // anon mmap events overlapping existing
>       jitted-<pid>-<code_index>.so mmap events.
>     + Only strip parts of // anon mmap events overlapping prior
>       // anon mmap events
>
> Any opinions?

Hi Steve,

I think we should merge this change. I wanted to get a test case
together showing the benefit. Based on Nick Gasson's feedback I was
trying with command lines like:
/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data
/usr/lib/jvm/java-14-openjdk-amd64/bin/java
-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
-XX:InitialCodeCacheSize=10M -jar dacapo-9.12-bach.jar jython

I wanted to be able to demonstrate a non-zero effect in getting
samples. I spent limited time on the test and didn't get a result that
demonstrated a benefit, things weren't worse. I think the lack of
benefit is another bug where HotSpot's JVMTI code runs in parallel to
the new code being available in the JIT cache. We get samples ahead of
when the jitdump thinks the code is there and so it symbolizes them as
unknown. We should get HotSpot to provide the earliest timestamp to
avoid this problem.

I think a good way forward would be to merge this code as other issues
can be fixed as a follow up. Arnaldo, does that work for you? If so,
Acked-by: Ian Rogers <irogers@google.com>.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-06-12 19:33 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-27  1:51 [PATCH v4] perf inject --jit: Remove //anon mmap events Steve MacLean
2020-05-27 19:00 ` Ian Rogers
2020-05-27 19:27   ` [EXTERNAL] " Steve MacLean
2020-05-27 20:59     ` Ian Rogers
2020-05-28  9:32       ` Ian Rogers
2020-05-29 18:48         ` Steve MacLean
2020-05-29 21:44           ` Steve MacLean
2020-06-01  6:17         ` Nick Gasson
2020-06-01  8:53           ` Ian Rogers
2020-06-01 10:02             ` Nick Gasson
2020-06-12 19:00               ` Steve MacLean
2020-06-12 19:33                 ` Ian Rogers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).