* [PATCHSET 0/9] perf kmem: Implement page allocation analysis (v6)
From: Namhyung Kim @ 2015-04-06  5:36 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Hello,

Currently the perf kmem command only analyzes SLAB memory allocation,
and I'd like to introduce page allocation analysis as well.  Users can
use the --slab and/or --page options to select it.  If neither option
is given, it does slab allocation analysis for backward compatibility.

* changes in v6)
  - add -i option fix  (Jiri)
  - libtraceevent operator priority fix

* changes in v5)
  - print migration type and gfp flags in more compact form  (Arnaldo)
  - add kmem.default config option

* changes in v4)
  - use pfn instead of struct page * in tracepoints  (Joonsoo, Ingo)
  - print gfp flags in human readable string  (Joonsoo, Minchan)

* changes in v3)
  - add live page statistics

* changes in v2)
  - Use thousand grouping for big numbers - i.e. 12345 -> 12,345  (Ingo)
  - Improve output stat readability  (Ingo)
  - Remove alloc size column as it can be calculated from hits and order

Patch 2 converts the tracepoints to save the pfn instead of a struct
page pointer.  Patch 3 implements basic support for page allocation
analysis, patch 4 deals with the callsite and patch 5 implements
sorting.  Patch 6 introduces live page analysis, which focuses on
currently allocated pages only.  Finally, patch 7 prints gfp flags as
human readable strings.

In this patchset, I used two kmem events, kmem:mm_page_alloc and
kmem:mm_page_free, for the analysis, as they can track almost all of
the memory allocation/free paths AFAIK.  However, unlike the slab
tracepoint events, these page allocation events don't provide callsite
info directly.  So I recorded callchains and extracted callsites like
below:

Normal page allocation callchains look like this:

  360a7e __alloc_pages_nodemask
  3a711c alloc_pages_current
  357bc7 __page_cache_alloc   <-- callsite
  357cf6 pagecache_get_page
   48b0a prepare_pages
   494d3 __btrfs_buffered_write
   49cdf btrfs_file_write_iter
  3ceb6e new_sync_write
  3cf447 vfs_write
  3cff99 sys_write
  7556e9 system_call
    f880 __write_nocancel
   33eb9 cmd_record
   4b38e cmd_kmem
   7aa23 run_builtin
   27a9a main
   20800 __libc_start_main

But the first two are internal page allocation functions, so they
should be skipped.  To determine such allocation functions, I used the
following regex:

  ^_?_?(alloc|get_free|get_zeroed)_pages?

This gave me the following list of functions (you can see this with -v):

  alloc func: __get_free_pages
  alloc func: get_zeroed_page
  alloc func: alloc_pages_exact
  alloc func: __alloc_pages_direct_compact
  alloc func: __alloc_pages_nodemask
  alloc func: alloc_page_interleave
  alloc func: alloc_pages_current
  alloc func: alloc_pages_vma
  alloc func: alloc_page_buffers
  alloc func: alloc_pages_exact_nid

After skipping those functions, it got '__page_cache_alloc'.

Other information such as allocation order, migration type and gfp
flags are provided by tracepoint events.

By default the output is sorted by total allocation bytes, but you can
change that with the -s/--sort option.  The following sort keys were
added to support page analysis: page, order, mtype and gfp.  The
existing 'callsite', 'bytes' and 'hit' sort keys can also be used.

An example follows:

  # perf kmem record --page sleep 5
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 1.065 MB perf.data (2949 samples) ]

  # perf kmem stat --page --caller -l 10
  # GFP flags
  # ---------
  # 00000010: GFP_NOIO
  # 000000d0: GFP_KERNEL
  # 00000200: GFP_NOWARN
  # 000052d0: GFP_KERNEL|GFP_NOWARN|GFP_NORETRY|GFP_COMP
  # 000084d0: GFP_KERNEL|GFP_REPEAT|GFP_ZERO
  # 000200d0: GFP_USER
  # 000200d2: GFP_HIGHUSER
  # 000200da: GFP_HIGHUSER_MOVABLE
  # 000280da: GFP_HIGHUSER_MOVABLE|GFP_ZERO
  # 002084d0: GFP_KERNEL|GFP_REPEAT|GFP_ZERO|GFP_NOTRACK
  # 0102005a: GFP_NOFS|GFP_HARDWALL|GFP_MOVABLE
  ---------------------------------------------------------------------------------------------------------
   Total alloc (KB) | Hits      | Order | Migration type | GFP flags | Callsite
  ---------------------------------------------------------------------------------------------------------
                 16 |         1 |     2 |      UNMOVABLE |  000052d0 | alloc_skb_with_frags
                 24 |         3 |     1 |      UNMOVABLE |  000052d0 | alloc_skb_with_frags
              3,876 |       969 |     0 |        MOVABLE |  000200da | shmem_alloc_page
                972 |       243 |     0 |      UNMOVABLE |  000000d0 | __pollwait
                624 |       156 |     0 |        MOVABLE |  0102005a | __page_cache_alloc
                304 |        76 |     0 |      UNMOVABLE |  000200d0 | dma_generic_alloc_coherent
                108 |        27 |     0 |        MOVABLE |  000280da | handle_mm_fault
                 56 |        14 |     0 |      UNMOVABLE |  002084d0 | pte_alloc_one
                 24 |         6 |     0 |        MOVABLE |  000200da | do_wp_page
                 16 |         4 |     0 |      UNMOVABLE |  00000200 | __tlb_remove_page
   ...              | ...       | ...   | ...            | ...       | ...
  ---------------------------------------------------------------------------------------------------------

  SUMMARY (page allocator)
  ========================
  Total allocation requests     :            1,518   [            6,096 KB ]
  Total free requests           :            1,431   [            5,748 KB ]

  Total alloc+freed requests    :            1,330   [            5,344 KB ]
  Total alloc-only requests     :              188   [              752 KB ]
  Total free-only requests      :              101   [              404 KB ]

  Total allocation failures     :                0   [                0 KB ]

  Order     Unmovable   Reclaimable       Movable      Reserved  CMA/Isolated
  -----  ------------  ------------  ------------  ------------  ------------
      0           351             .         1,163             .             .
      1             3             .             .             .             .
      2             1             .             .             .             .
      3             .             .             .             .             .
      4             .             .             .             .             .
      5             .             .             .             .             .
      6             .             .             .             .             .
      7             .             .             .             .             .
      8             .             .             .             .             .
      9             .             .             .             .             .
     10             .             .             .             .             .

I have some ideas on how to improve it, but I'd also like to hear
other ideas, suggestions, feedback and so on.

This is available at perf/kmem-page-v6 branch on my tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Jiri Olsa (1):
  perf kmem: Respect -i option

Namhyung Kim (8):
  tracing, mm: Record pfn instead of pointer to struct page
  perf kmem: Analyze page allocator events also
  perf kmem: Implement stat --page --caller
  perf kmem: Support sort keys on page analysis
  perf kmem: Add --live option for current allocation stat
  perf kmem: Print gfp flags in human readable string
  perf kmem: Add kmem.default config option
  tools lib traceevent: Honor operator priority

 include/trace/events/filemap.h         |    8 +-
 include/trace/events/kmem.h            |   42 +-
 include/trace/events/vmscan.h          |    8 +-
 tools/lib/traceevent/event-parse.c     |   17 +-
 tools/perf/Documentation/perf-kmem.txt |   19 +-
 tools/perf/builtin-kmem.c              | 1302 ++++++++++++++++++++++++++++++--
 6 files changed, 1307 insertions(+), 89 deletions(-)

-- 
2.3.2


* [PATCH 1/9] perf kmem: Respect -i option
From: Namhyung Kim @ 2015-04-06  5:36 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm, Jiri Olsa

From: Jiri Olsa <jolsa@kernel.org>

Currently 'perf kmem' does not respect the -i option.  Initialize
file.path after the options are parsed, so that it picks up the value
given by -i.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-kmem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index ac303ef9f2f0..4ebf65c79434 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -663,7 +663,6 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 {
 	const char * const default_sort_order = "frag,hit,bytes";
 	struct perf_data_file file = {
-		.path = input_name,
 		.mode = PERF_DATA_MODE_READ,
 	};
 	const struct option kmem_options[] = {
@@ -701,6 +700,8 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 		return __cmd_record(argc, argv);
 	}
 
+	file.path = input_name;
+
 	session = perf_session__new(&file, false, &perf_kmem);
 	if (session == NULL)
 		return -1;
-- 
2.3.2


* [PATCH 2/9] tracing, mm: Record pfn instead of pointer to struct page
From: Namhyung Kim @ 2015-04-06  5:36 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

The struct page is opaque to userspace tools, so it's better to save
the pfn in order to identify page frames.

The textual output of the $debugfs/tracing/trace file remains
unchanged; only the raw (binary) data format is changed.  But thanks
to libtraceevent, userspace tools which deal with the raw data (like
perf and trace-cmd) can parse the new format easily, so the impact on
userspace will be minimal.

Based-on-patch-by: Joonsoo Kim <js1304@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-mm@kvack.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/trace/events/filemap.h |  8 ++++----
 include/trace/events/kmem.h    | 42 +++++++++++++++++++++---------------------
 include/trace/events/vmscan.h  |  8 ++++----
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index 0421f49a20f7..42febb6bc1d5 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -18,14 +18,14 @@ DECLARE_EVENT_CLASS(mm_filemap_op_page_cache,
 	TP_ARGS(page),
 
 	TP_STRUCT__entry(
-		__field(struct page *, page)
+		__field(unsigned long, pfn)
 		__field(unsigned long, i_ino)
 		__field(unsigned long, index)
 		__field(dev_t, s_dev)
 	),
 
 	TP_fast_assign(
-		__entry->page = page;
+		__entry->pfn = page_to_pfn(page);
 		__entry->i_ino = page->mapping->host->i_ino;
 		__entry->index = page->index;
 		if (page->mapping->host->i_sb)
@@ -37,8 +37,8 @@ DECLARE_EVENT_CLASS(mm_filemap_op_page_cache,
 	TP_printk("dev %d:%d ino %lx page=%p pfn=%lu ofs=%lu",
 		MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
 		__entry->i_ino,
-		__entry->page,
-		page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn),
+		__entry->pfn,
 		__entry->index << PAGE_SHIFT)
 );
 
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 4ad10baecd4d..81ea59812117 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -154,18 +154,18 @@ TRACE_EVENT(mm_page_free,
 	TP_ARGS(page, order),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page_to_pfn(page);
 		__entry->order		= order;
 	),
 
 	TP_printk("page=%p pfn=%lu order=%d",
-			__entry->page,
-			page_to_pfn(__entry->page),
+			pfn_to_page(__entry->pfn),
+			__entry->pfn,
 			__entry->order)
 );
 
@@ -176,18 +176,18 @@ TRACE_EVENT(mm_page_free_batched,
 	TP_ARGS(page, cold),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	int,		cold		)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page_to_pfn(page);
 		__entry->cold		= cold;
 	),
 
 	TP_printk("page=%p pfn=%lu order=0 cold=%d",
-			__entry->page,
-			page_to_pfn(__entry->page),
+			pfn_to_page(__entry->pfn),
+			__entry->pfn,
 			__entry->cold)
 );
 
@@ -199,22 +199,22 @@ TRACE_EVENT(mm_page_alloc,
 	TP_ARGS(page, order, gfp_flags, migratetype),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 		__field(	gfp_t,		gfp_flags	)
 		__field(	int,		migratetype	)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page ? page_to_pfn(page) : -1UL;
 		__entry->order		= order;
 		__entry->gfp_flags	= gfp_flags;
 		__entry->migratetype	= migratetype;
 	),
 
 	TP_printk("page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s",
-		__entry->page,
-		__entry->page ? page_to_pfn(__entry->page) : 0,
+		__entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL,
+		__entry->pfn != -1UL ? __entry->pfn : 0,
 		__entry->order,
 		__entry->migratetype,
 		show_gfp_flags(__entry->gfp_flags))
@@ -227,20 +227,20 @@ DECLARE_EVENT_CLASS(mm_page,
 	TP_ARGS(page, order, migratetype),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 		__field(	int,		migratetype	)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page ? page_to_pfn(page) : -1UL;
 		__entry->order		= order;
 		__entry->migratetype	= migratetype;
 	),
 
 	TP_printk("page=%p pfn=%lu order=%u migratetype=%d percpu_refill=%d",
-		__entry->page,
-		__entry->page ? page_to_pfn(__entry->page) : 0,
+		__entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL,
+		__entry->pfn != -1UL ? __entry->pfn : 0,
 		__entry->order,
 		__entry->migratetype,
 		__entry->order == 0)
@@ -260,7 +260,7 @@ DEFINE_EVENT_PRINT(mm_page, mm_page_pcpu_drain,
 	TP_ARGS(page, order, migratetype),
 
 	TP_printk("page=%p pfn=%lu order=%d migratetype=%d",
-		__entry->page, page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn), __entry->pfn,
 		__entry->order, __entry->migratetype)
 );
 
@@ -275,7 +275,7 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 		alloc_migratetype, fallback_migratetype),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page			)
+		__field(	unsigned long,	pfn			)
 		__field(	int,		alloc_order		)
 		__field(	int,		fallback_order		)
 		__field(	int,		alloc_migratetype	)
@@ -284,7 +284,7 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 	),
 
 	TP_fast_assign(
-		__entry->page			= page;
+		__entry->pfn			= page_to_pfn(page);
 		__entry->alloc_order		= alloc_order;
 		__entry->fallback_order		= fallback_order;
 		__entry->alloc_migratetype	= alloc_migratetype;
@@ -294,8 +294,8 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 	),
 
 	TP_printk("page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d",
-		__entry->page,
-		page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn),
+		__entry->pfn,
 		__entry->alloc_order,
 		__entry->fallback_order,
 		pageblock_order,
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 69590b6ffc09..f66476b96264 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -336,18 +336,18 @@ TRACE_EVENT(mm_vmscan_writepage,
 	TP_ARGS(page, reclaim_flags),
 
 	TP_STRUCT__entry(
-		__field(struct page *, page)
+		__field(unsigned long, pfn)
 		__field(int, reclaim_flags)
 	),
 
 	TP_fast_assign(
-		__entry->page = page;
+		__entry->pfn = page_to_pfn(page);
 		__entry->reclaim_flags = reclaim_flags;
 	),
 
 	TP_printk("page=%p pfn=%lu flags=%s",
-		__entry->page,
-		page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn),
+		__entry->pfn,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
 
-- 
2.3.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 2/9] tracing, mm: Record pfn instead of pointer to struct page
@ 2015-04-06  5:36   ` Namhyung Kim
  0 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-06  5:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

The struct page is opaque for userspace tools, so it'd be better to save
pfn in order to identify page frames.

The textual output of $debugfs/tracing/trace file remains unchanged and
only raw (binary) data format is changed - but thanks to libtraceevent,
userspace tools which deal with the raw data (like perf and trace-cmd)
can parse the format easily.  So impact on the userspace will also be
minimal.

Based-on-patch-by: Joonsoo Kim <js1304@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-mm@kvack.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/trace/events/filemap.h |  8 ++++----
 include/trace/events/kmem.h    | 42 +++++++++++++++++++++---------------------
 include/trace/events/vmscan.h  |  8 ++++----
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index 0421f49a20f7..42febb6bc1d5 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -18,14 +18,14 @@ DECLARE_EVENT_CLASS(mm_filemap_op_page_cache,
 	TP_ARGS(page),
 
 	TP_STRUCT__entry(
-		__field(struct page *, page)
+		__field(unsigned long, pfn)
 		__field(unsigned long, i_ino)
 		__field(unsigned long, index)
 		__field(dev_t, s_dev)
 	),
 
 	TP_fast_assign(
-		__entry->page = page;
+		__entry->pfn = page_to_pfn(page);
 		__entry->i_ino = page->mapping->host->i_ino;
 		__entry->index = page->index;
 		if (page->mapping->host->i_sb)
@@ -37,8 +37,8 @@ DECLARE_EVENT_CLASS(mm_filemap_op_page_cache,
 	TP_printk("dev %d:%d ino %lx page=%p pfn=%lu ofs=%lu",
 		MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
 		__entry->i_ino,
-		__entry->page,
-		page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn),
+		__entry->pfn,
 		__entry->index << PAGE_SHIFT)
 );
 
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 4ad10baecd4d..81ea59812117 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -154,18 +154,18 @@ TRACE_EVENT(mm_page_free,
 	TP_ARGS(page, order),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page_to_pfn(page);
 		__entry->order		= order;
 	),
 
 	TP_printk("page=%p pfn=%lu order=%d",
-			__entry->page,
-			page_to_pfn(__entry->page),
+			pfn_to_page(__entry->pfn),
+			__entry->pfn,
 			__entry->order)
 );
 
@@ -176,18 +176,18 @@ TRACE_EVENT(mm_page_free_batched,
 	TP_ARGS(page, cold),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	int,		cold		)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page_to_pfn(page);
 		__entry->cold		= cold;
 	),
 
 	TP_printk("page=%p pfn=%lu order=0 cold=%d",
-			__entry->page,
-			page_to_pfn(__entry->page),
+			pfn_to_page(__entry->pfn),
+			__entry->pfn,
 			__entry->cold)
 );
 
@@ -199,22 +199,22 @@ TRACE_EVENT(mm_page_alloc,
 	TP_ARGS(page, order, gfp_flags, migratetype),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 		__field(	gfp_t,		gfp_flags	)
 		__field(	int,		migratetype	)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page ? page_to_pfn(page) : -1UL;
 		__entry->order		= order;
 		__entry->gfp_flags	= gfp_flags;
 		__entry->migratetype	= migratetype;
 	),
 
 	TP_printk("page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s",
-		__entry->page,
-		__entry->page ? page_to_pfn(__entry->page) : 0,
+		__entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL,
+		__entry->pfn != -1UL ? __entry->pfn : 0,
 		__entry->order,
 		__entry->migratetype,
 		show_gfp_flags(__entry->gfp_flags))
@@ -227,20 +227,20 @@ DECLARE_EVENT_CLASS(mm_page,
 	TP_ARGS(page, order, migratetype),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 		__field(	int,		migratetype	)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page ? page_to_pfn(page) : -1UL;
 		__entry->order		= order;
 		__entry->migratetype	= migratetype;
 	),
 
 	TP_printk("page=%p pfn=%lu order=%u migratetype=%d percpu_refill=%d",
-		__entry->page,
-		__entry->page ? page_to_pfn(__entry->page) : 0,
+		__entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL,
+		__entry->pfn != -1UL ? __entry->pfn : 0,
 		__entry->order,
 		__entry->migratetype,
 		__entry->order == 0)
@@ -260,7 +260,7 @@ DEFINE_EVENT_PRINT(mm_page, mm_page_pcpu_drain,
 	TP_ARGS(page, order, migratetype),
 
 	TP_printk("page=%p pfn=%lu order=%d migratetype=%d",
-		__entry->page, page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn), __entry->pfn,
 		__entry->order, __entry->migratetype)
 );
 
@@ -275,7 +275,7 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 		alloc_migratetype, fallback_migratetype),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page			)
+		__field(	unsigned long,	pfn			)
 		__field(	int,		alloc_order		)
 		__field(	int,		fallback_order		)
 		__field(	int,		alloc_migratetype	)
@@ -284,7 +284,7 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 	),
 
 	TP_fast_assign(
-		__entry->page			= page;
+		__entry->pfn			= page_to_pfn(page);
 		__entry->alloc_order		= alloc_order;
 		__entry->fallback_order		= fallback_order;
 		__entry->alloc_migratetype	= alloc_migratetype;
@@ -294,8 +294,8 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 	),
 
 	TP_printk("page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d",
-		__entry->page,
-		page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn),
+		__entry->pfn,
 		__entry->alloc_order,
 		__entry->fallback_order,
 		pageblock_order,
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 69590b6ffc09..f66476b96264 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -336,18 +336,18 @@ TRACE_EVENT(mm_vmscan_writepage,
 	TP_ARGS(page, reclaim_flags),
 
 	TP_STRUCT__entry(
-		__field(struct page *, page)
+		__field(unsigned long, pfn)
 		__field(int, reclaim_flags)
 	),
 
 	TP_fast_assign(
-		__entry->page = page;
+		__entry->pfn = page_to_pfn(page);
 		__entry->reclaim_flags = reclaim_flags;
 	),
 
 	TP_printk("page=%p pfn=%lu flags=%s",
-		__entry->page,
-		page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn),
+		__entry->pfn,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
 
-- 
2.3.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 3/9] perf kmem: Analyze page allocator events also
  2015-04-06  5:36 ` Namhyung Kim
@ 2015-04-06  5:36   ` Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-06  5:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

The perf kmem command records and analyzes kernel memory allocation
only for SLAB objects.  This patch implements a simple page allocator
analyzer using the kmem:mm_page_alloc and kmem:mm_page_free events.

It adds two new options: --slab and --page.  The --slab option selects
SLAB allocator analysis, which is what perf kmem currently does.

The new --page option enables page allocator events and analyzes kernel
memory usage at page granularity.  Currently only the 'stat --alloc'
subcommand is implemented.

If neither --slab nor --page is specified, --slab is implied.

  # perf kmem stat --page --alloc --line 10

  -------------------------------------------------------------------------------
   PFN              | Total alloc (KB) | Hits     | Order | Mig.type | GFP flags
  -------------------------------------------------------------------------------
            4045014 |               16 |        1 |     2 |  RECLAIM |  00285250
            4143980 |               16 |        1 |     2 |  RECLAIM |  00285250
            3938658 |               16 |        1 |     2 |  RECLAIM |  00285250
            4045400 |               16 |        1 |     2 |  RECLAIM |  00285250
            3568708 |               16 |        1 |     2 |  RECLAIM |  00285250
            3729824 |               16 |        1 |     2 |  RECLAIM |  00285250
            3657210 |               16 |        1 |     2 |  RECLAIM |  00285250
            4120750 |               16 |        1 |     2 |  RECLAIM |  00285250
            3678850 |               16 |        1 |     2 |  RECLAIM |  00285250
            3693874 |               16 |        1 |     2 |  RECLAIM |  00285250
   ...              | ...              | ...      | ...   | ...      | ...
  -------------------------------------------------------------------------------

  SUMMARY (page allocator)
  ========================
  Total allocation requests     :           44,260   [          177,256 KB ]
  Total free requests           :              117   [              468 KB ]

  Total alloc+freed requests    :               49   [              196 KB ]
  Total alloc-only requests     :           44,211   [          177,060 KB ]
  Total free-only requests      :               68   [              272 KB ]

  Total allocation failures     :                0   [                0 KB ]

  Order     Unmovable   Reclaimable       Movable      Reserved  CMA/Isolated
  -----  ------------  ------------  ------------  ------------  ------------
      0            32             .        44,210             .             .
      1             .             .             .             .             .
      2             .            18             .             .             .
      3             .             .             .             .             .
      4             .             .             .             .             .
      5             .             .             .             .             .
      6             .             .             .             .             .
      7             .             .             .             .             .
      8             .             .             .             .             .
      9             .             .             .             .             .
     10             .             .             .             .             .

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-kmem.txt |   8 +-
 tools/perf/builtin-kmem.c              | 500 +++++++++++++++++++++++++++++++--
 2 files changed, 491 insertions(+), 17 deletions(-)

diff --git a/tools/perf/Documentation/perf-kmem.txt b/tools/perf/Documentation/perf-kmem.txt
index 150253cc3c97..23219c65c16f 100644
--- a/tools/perf/Documentation/perf-kmem.txt
+++ b/tools/perf/Documentation/perf-kmem.txt
@@ -3,7 +3,7 @@ perf-kmem(1)
 
 NAME
 ----
-perf-kmem - Tool to trace/measure kernel memory(slab) properties
+perf-kmem - Tool to trace/measure kernel memory properties
 
 SYNOPSIS
 --------
@@ -46,6 +46,12 @@ OPTIONS
 --raw-ip::
 	Print raw ip instead of symbol
 
+--slab::
+	Analyze SLAB allocator events.
+
+--page::
+	Analyze page allocator events
+
 SEE ALSO
 --------
 linkperf:perf-record[1]
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 4ebf65c79434..63ea01349b6e 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -22,6 +22,11 @@
 #include <linux/string.h>
 #include <locale.h>
 
+static int	kmem_slab;
+static int	kmem_page;
+
+static long	kmem_page_size;
+
 struct alloc_stat;
 typedef int (*sort_fn_t)(struct alloc_stat *, struct alloc_stat *);
 
@@ -226,6 +231,244 @@ static int perf_evsel__process_free_event(struct perf_evsel *evsel,
 	return 0;
 }
 
+static u64 total_page_alloc_bytes;
+static u64 total_page_free_bytes;
+static u64 total_page_nomatch_bytes;
+static u64 total_page_fail_bytes;
+static unsigned long nr_page_allocs;
+static unsigned long nr_page_frees;
+static unsigned long nr_page_fails;
+static unsigned long nr_page_nomatch;
+
+static bool use_pfn;
+
+#define MAX_MIGRATE_TYPES  6
+#define MAX_PAGE_ORDER     11
+
+static int order_stats[MAX_PAGE_ORDER][MAX_MIGRATE_TYPES];
+
+struct page_stat {
+	struct rb_node 	node;
+	u64 		page;
+	int 		order;
+	unsigned 	gfp_flags;
+	unsigned 	migrate_type;
+	u64		alloc_bytes;
+	u64 		free_bytes;
+	int 		nr_alloc;
+	int 		nr_free;
+};
+
+static struct rb_root page_tree;
+static struct rb_root page_alloc_tree;
+static struct rb_root page_alloc_sorted;
+
+static struct page_stat *search_page(unsigned long page, bool create)
+{
+	struct rb_node **node = &page_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct page_stat *data;
+
+	while (*node) {
+		s64 cmp;
+
+		parent = *node;
+		data = rb_entry(*node, struct page_stat, node);
+
+		cmp = data->page - page;
+		if (cmp < 0)
+			node = &parent->rb_left;
+		else if (cmp > 0)
+			node = &parent->rb_right;
+		else
+			return data;
+	}
+
+	if (!create)
+		return NULL;
+
+	data = zalloc(sizeof(*data));
+	if (data != NULL) {
+		data->page = page;
+
+		rb_link_node(&data->node, parent, node);
+		rb_insert_color(&data->node, &page_tree);
+	}
+
+	return data;
+}
+
+static int page_stat_cmp(struct page_stat *a, struct page_stat *b)
+{
+	if (a->page > b->page)
+		return -1;
+	if (a->page < b->page)
+		return 1;
+	if (a->order > b->order)
+		return -1;
+	if (a->order < b->order)
+		return 1;
+	if (a->migrate_type > b->migrate_type)
+		return -1;
+	if (a->migrate_type < b->migrate_type)
+		return 1;
+	if (a->gfp_flags > b->gfp_flags)
+		return -1;
+	if (a->gfp_flags < b->gfp_flags)
+		return 1;
+	return 0;
+}
+
+static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool create)
+{
+	struct rb_node **node = &page_alloc_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct page_stat *data;
+
+	while (*node) {
+		s64 cmp;
+
+		parent = *node;
+		data = rb_entry(*node, struct page_stat, node);
+
+		cmp = page_stat_cmp(data, stat);
+		if (cmp < 0)
+			node = &parent->rb_left;
+		else if (cmp > 0)
+			node = &parent->rb_right;
+		else
+			return data;
+	}
+
+	if (!create)
+		return NULL;
+
+	data = zalloc(sizeof(*data));
+	if (data != NULL) {
+		data->page = stat->page;
+		data->order = stat->order;
+		data->gfp_flags = stat->gfp_flags;
+		data->migrate_type = stat->migrate_type;
+
+		rb_link_node(&data->node, parent, node);
+		rb_insert_color(&data->node, &page_alloc_tree);
+	}
+
+	return data;
+}
+
+static bool valid_page(u64 pfn_or_page)
+{
+	if (use_pfn && pfn_or_page == -1UL)
+		return false;
+	if (!use_pfn && pfn_or_page == 0)
+		return false;
+	return true;
+}
+
+static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
+						struct perf_sample *sample)
+{
+	u64 page;
+	unsigned int order = perf_evsel__intval(evsel, sample, "order");
+	unsigned int gfp_flags = perf_evsel__intval(evsel, sample, "gfp_flags");
+	unsigned int migrate_type = perf_evsel__intval(evsel, sample,
+						       "migratetype");
+	u64 bytes = kmem_page_size << order;
+	struct page_stat *stat;
+	struct page_stat this = {
+		.order = order,
+		.gfp_flags = gfp_flags,
+		.migrate_type = migrate_type,
+	};
+
+	if (use_pfn)
+		page = perf_evsel__intval(evsel, sample, "pfn");
+	else
+		page = perf_evsel__intval(evsel, sample, "page");
+
+	nr_page_allocs++;
+	total_page_alloc_bytes += bytes;
+
+	if (!valid_page(page)) {
+		nr_page_fails++;
+		total_page_fail_bytes += bytes;
+
+		return 0;
+	}
+
+	/*
+	 * This is to find the current page (with correct gfp flags and
+	 * migrate type) at free event.
+	 */
+	stat = search_page(page, true);
+	if (stat == NULL)
+		return -ENOMEM;
+
+	stat->order = order;
+	stat->gfp_flags = gfp_flags;
+	stat->migrate_type = migrate_type;
+
+	this.page = page;
+	stat = search_page_alloc_stat(&this, true);
+	if (stat == NULL)
+		return -ENOMEM;
+
+	stat->nr_alloc++;
+	stat->alloc_bytes += bytes;
+
+	order_stats[order][migrate_type]++;
+
+	return 0;
+}
+
+static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
+						struct perf_sample *sample)
+{
+	u64 page;
+	unsigned int order = perf_evsel__intval(evsel, sample, "order");
+	u64 bytes = kmem_page_size << order;
+	struct page_stat *stat;
+	struct page_stat this = {
+		.order = order,
+	};
+
+	if (use_pfn)
+		page = perf_evsel__intval(evsel, sample, "pfn");
+	else
+		page = perf_evsel__intval(evsel, sample, "page");
+
+	nr_page_frees++;
+	total_page_free_bytes += bytes;
+
+	stat = search_page(page, false);
+	if (stat == NULL) {
+		pr_debug2("missing free at page %"PRIx64" (order: %d)\n",
+			  page, order);
+
+		nr_page_nomatch++;
+		total_page_nomatch_bytes += bytes;
+
+		return 0;
+	}
+
+	this.page = page;
+	this.gfp_flags = stat->gfp_flags;
+	this.migrate_type = stat->migrate_type;
+
+	rb_erase(&stat->node, &page_tree);
+	free(stat);
+
+	stat = search_page_alloc_stat(&this, false);
+	if (stat == NULL)
+		return -ENOENT;
+
+	stat->nr_free++;
+	stat->free_bytes += bytes;
+
+	return 0;
+}
+
 typedef int (*tracepoint_handler)(struct perf_evsel *evsel,
 				  struct perf_sample *sample);
 
@@ -270,8 +513,9 @@ static double fragmentation(unsigned long n_req, unsigned long n_alloc)
 		return 100.0 - (100.0 * n_req / n_alloc);
 }
 
-static void __print_result(struct rb_root *root, struct perf_session *session,
-			   int n_lines, int is_caller)
+static void __print_slab_result(struct rb_root *root,
+				struct perf_session *session,
+				int n_lines, int is_caller)
 {
 	struct rb_node *next;
 	struct machine *machine = &session->machines.host;
@@ -323,9 +567,56 @@ static void __print_result(struct rb_root *root, struct perf_session *session,
 	printf("%.105s\n", graph_dotted_line);
 }
 
-static void print_summary(void)
+static const char * const migrate_type_str[] = {
+	"UNMOVABL",
+	"RECLAIM",
+	"MOVABLE",
+	"RESERVED",
+	"CMA/ISLT",
+	"UNKNOWN",
+};
+
+static void __print_page_result(struct rb_root *root,
+				struct perf_session *session __maybe_unused,
+				int n_lines)
+{
+	struct rb_node *next = rb_first(root);
+	const char *format;
+
+	printf("\n%.80s\n", graph_dotted_line);
+	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags\n",
+	       use_pfn ? "PFN" : "Page");
+	printf("%.80s\n", graph_dotted_line);
+
+	if (use_pfn)
+		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx\n";
+	else
+		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx\n";
+
+	while (next && n_lines--) {
+		struct page_stat *data;
+
+		data = rb_entry(next, struct page_stat, node);
+
+		printf(format, (unsigned long long)data->page,
+		       (unsigned long long)data->alloc_bytes / 1024,
+		       data->nr_alloc, data->order,
+		       migrate_type_str[data->migrate_type],
+		       (unsigned long)data->gfp_flags);
+
+		next = rb_next(next);
+	}
+
+	if (n_lines == -1)
+		printf(" ...              | ...              | ...       | ...   | ...      | ...     \n");
+
+	printf("%.80s\n", graph_dotted_line);
+}
+
+static void print_slab_summary(void)
 {
-	printf("\nSUMMARY\n=======\n");
+	printf("\nSUMMARY (SLAB allocator)");
+	printf("\n========================\n");
 	printf("Total bytes requested: %'lu\n", total_requested);
 	printf("Total bytes allocated: %'lu\n", total_allocated);
 	printf("Total bytes wasted on internal fragmentation: %'lu\n",
@@ -335,13 +626,73 @@ static void print_summary(void)
 	printf("Cross CPU allocations: %'lu/%'lu\n", nr_cross_allocs, nr_allocs);
 }
 
-static void print_result(struct perf_session *session)
+static void print_page_summary(void)
+{
+	int o, m;
+	u64 nr_alloc_freed = nr_page_frees - nr_page_nomatch;
+	u64 total_alloc_freed_bytes = total_page_free_bytes - total_page_nomatch_bytes;
+
+	printf("\nSUMMARY (page allocator)");
+	printf("\n========================\n");
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total allocation requests",
+	       nr_page_allocs, total_page_alloc_bytes / 1024);
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total free requests",
+	       nr_page_frees, total_page_free_bytes / 1024);
+	printf("\n");
+
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total alloc+freed requests",
+	       nr_alloc_freed, (total_alloc_freed_bytes) / 1024);
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total alloc-only requests",
+	       nr_page_allocs - nr_alloc_freed,
+	       (total_page_alloc_bytes - total_alloc_freed_bytes) / 1024);
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total free-only requests",
+	       nr_page_nomatch, total_page_nomatch_bytes / 1024);
+	printf("\n");
+
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total allocation failures",
+	       nr_page_fails, total_page_fail_bytes / 1024);
+	printf("\n");
+
+	printf("%5s  %12s  %12s  %12s  %12s  %12s\n", "Order",  "Unmovable",
+	       "Reclaimable", "Movable", "Reserved", "CMA/Isolated");
+	printf("%.5s  %.12s  %.12s  %.12s  %.12s  %.12s\n", graph_dotted_line,
+	       graph_dotted_line, graph_dotted_line, graph_dotted_line,
+	       graph_dotted_line, graph_dotted_line);
+
+	for (o = 0; o < MAX_PAGE_ORDER; o++) {
+		printf("%5d", o);
+		for (m = 0; m < MAX_MIGRATE_TYPES - 1; m++) {
+			if (order_stats[o][m])
+				printf("  %'12d", order_stats[o][m]);
+			else
+				printf("  %12c", '.');
+		}
+		printf("\n");
+	}
+}
+
+static void print_slab_result(struct perf_session *session)
 {
 	if (caller_flag)
-		__print_result(&root_caller_sorted, session, caller_lines, 1);
+		__print_slab_result(&root_caller_sorted, session, caller_lines, 1);
+	if (alloc_flag)
+		__print_slab_result(&root_alloc_sorted, session, alloc_lines, 0);
+	print_slab_summary();
+}
+
+static void print_page_result(struct perf_session *session)
+{
 	if (alloc_flag)
-		__print_result(&root_alloc_sorted, session, alloc_lines, 0);
-	print_summary();
+		__print_page_result(&page_alloc_sorted, session, alloc_lines);
+	print_page_summary();
+}
+
+static void print_result(struct perf_session *session)
+{
+	if (kmem_slab)
+		print_slab_result(session);
+	if (kmem_page)
+		print_page_result(session);
 }
 
 struct sort_dimension {
@@ -353,8 +704,8 @@ struct sort_dimension {
 static LIST_HEAD(caller_sort);
 static LIST_HEAD(alloc_sort);
 
-static void sort_insert(struct rb_root *root, struct alloc_stat *data,
-			struct list_head *sort_list)
+static void sort_slab_insert(struct rb_root *root, struct alloc_stat *data,
+			     struct list_head *sort_list)
 {
 	struct rb_node **new = &(root->rb_node);
 	struct rb_node *parent = NULL;
@@ -383,8 +734,8 @@ static void sort_insert(struct rb_root *root, struct alloc_stat *data,
 	rb_insert_color(&data->node, root);
 }
 
-static void __sort_result(struct rb_root *root, struct rb_root *root_sorted,
-			  struct list_head *sort_list)
+static void __sort_slab_result(struct rb_root *root, struct rb_root *root_sorted,
+			       struct list_head *sort_list)
 {
 	struct rb_node *node;
 	struct alloc_stat *data;
@@ -396,26 +747,79 @@ static void __sort_result(struct rb_root *root, struct rb_root *root_sorted,
 
 		rb_erase(node, root);
 		data = rb_entry(node, struct alloc_stat, node);
-		sort_insert(root_sorted, data, sort_list);
+		sort_slab_insert(root_sorted, data, sort_list);
+	}
+}
+
+static void sort_page_insert(struct rb_root *root, struct page_stat *data)
+{
+	struct rb_node **new = &root->rb_node;
+	struct rb_node *parent = NULL;
+
+	while (*new) {
+		struct page_stat *this;
+		int cmp = 0;
+
+		this = rb_entry(*new, struct page_stat, node);
+		parent = *new;
+
+		/* TODO: support more sort key */
+		cmp = data->alloc_bytes - this->alloc_bytes;
+
+		if (cmp > 0)
+			new = &parent->rb_left;
+		else
+			new = &parent->rb_right;
+	}
+
+	rb_link_node(&data->node, parent, new);
+	rb_insert_color(&data->node, root);
+}
+
+static void __sort_page_result(struct rb_root *root, struct rb_root *root_sorted)
+{
+	struct rb_node *node;
+	struct page_stat *data;
+
+	for (;;) {
+		node = rb_first(root);
+		if (!node)
+			break;
+
+		rb_erase(node, root);
+		data = rb_entry(node, struct page_stat, node);
+		sort_page_insert(root_sorted, data);
 	}
 }
 
 static void sort_result(void)
 {
-	__sort_result(&root_alloc_stat, &root_alloc_sorted, &alloc_sort);
-	__sort_result(&root_caller_stat, &root_caller_sorted, &caller_sort);
+	if (kmem_slab) {
+		__sort_slab_result(&root_alloc_stat, &root_alloc_sorted,
+				   &alloc_sort);
+		__sort_slab_result(&root_caller_stat, &root_caller_sorted,
+				   &caller_sort);
+	}
+	if (kmem_page) {
+		__sort_page_result(&page_alloc_tree, &page_alloc_sorted);
+	}
 }
 
 static int __cmd_kmem(struct perf_session *session)
 {
 	int err = -EINVAL;
+	struct perf_evsel *evsel;
 	const struct perf_evsel_str_handler kmem_tracepoints[] = {
+		/* slab allocator */
 		{ "kmem:kmalloc",		perf_evsel__process_alloc_event, },
     		{ "kmem:kmem_cache_alloc",	perf_evsel__process_alloc_event, },
 		{ "kmem:kmalloc_node",		perf_evsel__process_alloc_node_event, },
     		{ "kmem:kmem_cache_alloc_node", perf_evsel__process_alloc_node_event, },
 		{ "kmem:kfree",			perf_evsel__process_free_event, },
     		{ "kmem:kmem_cache_free",	perf_evsel__process_free_event, },
+		/* page allocator */
+		{ "kmem:mm_page_alloc",		perf_evsel__process_page_alloc_event, },
+		{ "kmem:mm_page_free",		perf_evsel__process_page_free_event, },
 	};
 
 	if (!perf_session__has_traces(session, "kmem record"))
@@ -426,10 +830,20 @@ static int __cmd_kmem(struct perf_session *session)
 		goto out;
 	}
 
+	evlist__for_each(session->evlist, evsel) {
+		if (!strcmp(perf_evsel__name(evsel), "kmem:mm_page_alloc") &&
+		    perf_evsel__field(evsel, "pfn")) {
+			use_pfn = true;
+			break;
+		}
+	}
+
 	setup_pager();
 	err = perf_session__process_events(session);
-	if (err != 0)
+	if (err != 0) {
+		pr_err("error during process events: %d\n", err);
 		goto out;
+	}
 	sort_result();
 	print_result(session);
 out:
@@ -612,6 +1026,22 @@ static int parse_alloc_opt(const struct option *opt __maybe_unused,
 	return 0;
 }
 
+static int parse_slab_opt(const struct option *opt __maybe_unused,
+			  const char *arg __maybe_unused,
+			  int unset __maybe_unused)
+{
+	kmem_slab = (kmem_page + 1);
+	return 0;
+}
+
+static int parse_page_opt(const struct option *opt __maybe_unused,
+			  const char *arg __maybe_unused,
+			  int unset __maybe_unused)
+{
+	kmem_page = (kmem_slab + 1);
+	return 0;
+}
+
 static int parse_line_opt(const struct option *opt __maybe_unused,
 			  const char *arg, int unset __maybe_unused)
 {
@@ -634,6 +1064,8 @@ static int __cmd_record(int argc, const char **argv)
 {
 	const char * const record_args[] = {
 	"record", "-a", "-R", "-c", "1",
+	};
+	const char * const slab_events[] = {
 	"-e", "kmem:kmalloc",
 	"-e", "kmem:kmalloc_node",
 	"-e", "kmem:kfree",
@@ -641,10 +1073,19 @@ static int __cmd_record(int argc, const char **argv)
 	"-e", "kmem:kmem_cache_alloc_node",
 	"-e", "kmem:kmem_cache_free",
 	};
+	const char * const page_events[] = {
+	"-e", "kmem:mm_page_alloc",
+	"-e", "kmem:mm_page_free",
+	};
 	unsigned int rec_argc, i, j;
 	const char **rec_argv;
 
 	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+	if (kmem_slab)
+		rec_argc += ARRAY_SIZE(slab_events);
+	if (kmem_page)
+		rec_argc += ARRAY_SIZE(page_events);
+
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
 	if (rec_argv == NULL)
@@ -653,6 +1094,15 @@ static int __cmd_record(int argc, const char **argv)
 	for (i = 0; i < ARRAY_SIZE(record_args); i++)
 		rec_argv[i] = strdup(record_args[i]);
 
+	if (kmem_slab) {
+		for (j = 0; j < ARRAY_SIZE(slab_events); j++, i++)
+			rec_argv[i] = strdup(slab_events[j]);
+	}
+	if (kmem_page) {
+		for (j = 0; j < ARRAY_SIZE(page_events); j++, i++)
+			rec_argv[i] = strdup(page_events[j]);
+	}
+
 	for (j = 1; j < (unsigned int)argc; j++, i++)
 		rec_argv[i] = argv[j];
 
@@ -679,6 +1129,10 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_CALLBACK('l', "line", NULL, "num", "show n lines", parse_line_opt),
 	OPT_BOOLEAN(0, "raw-ip", &raw_ip, "show raw ip instead of symbol"),
 	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
+	OPT_CALLBACK_NOOPT(0, "slab", NULL, NULL, "Analyze slab allocator",
+			   parse_slab_opt),
+	OPT_CALLBACK_NOOPT(0, "page", NULL, NULL, "Analyze page allocator",
+			   parse_page_opt),
 	OPT_END()
 	};
 	const char *const kmem_subcommands[] = { "record", "stat", NULL };
@@ -695,6 +1149,9 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (!argc)
 		usage_with_options(kmem_usage, kmem_options);
 
+	if (kmem_slab == 0 && kmem_page == 0)
+		kmem_slab = 1;  /* for backward compatibility */
+
 	if (!strncmp(argv[0], "rec", 3)) {
 		symbol__init(NULL);
 		return __cmd_record(argc, argv);
@@ -706,6 +1163,17 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (session == NULL)
 		return -1;
 
+	if (kmem_page) {
+		struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+
+		if (evsel == NULL || evsel->tp_format == NULL) {
+			pr_err("invalid event found.. aborting\n");
+			return -1;
+		}
+
+		kmem_page_size = pevent_get_page_size(evsel->tp_format->pevent);
+	}
+
 	symbol__init(&session->header.env);
 
 	if (!strcmp(argv[0], "stat")) {
-- 
2.3.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

+	u64 bytes = kmem_page_size << order;
+	struct page_stat *stat;
+	struct page_stat this = {
+		.order = order,
+	};
+
+	if (use_pfn)
+		page = perf_evsel__intval(evsel, sample, "pfn");
+	else
+		page = perf_evsel__intval(evsel, sample, "page");
+
+	nr_page_frees++;
+	total_page_free_bytes += bytes;
+
+	stat = search_page(page, false);
+	if (stat == NULL) {
+		pr_debug2("missing free at page %"PRIx64" (order: %d)\n",
+			  page, order);
+
+		nr_page_nomatch++;
+		total_page_nomatch_bytes += bytes;
+
+		return 0;
+	}
+
+	this.page = page;
+	this.gfp_flags = stat->gfp_flags;
+	this.migrate_type = stat->migrate_type;
+
+	rb_erase(&stat->node, &page_tree);
+	free(stat);
+
+	stat = search_page_alloc_stat(&this, false);
+	if (stat == NULL)
+		return -ENOENT;
+
+	stat->nr_free++;
+	stat->free_bytes += bytes;
+
+	return 0;
+}
+
 typedef int (*tracepoint_handler)(struct perf_evsel *evsel,
 				  struct perf_sample *sample);
 
@@ -270,8 +513,9 @@ static double fragmentation(unsigned long n_req, unsigned long n_alloc)
 		return 100.0 - (100.0 * n_req / n_alloc);
 }
 
-static void __print_result(struct rb_root *root, struct perf_session *session,
-			   int n_lines, int is_caller)
+static void __print_slab_result(struct rb_root *root,
+				struct perf_session *session,
+				int n_lines, int is_caller)
 {
 	struct rb_node *next;
 	struct machine *machine = &session->machines.host;
@@ -323,9 +567,56 @@ static void __print_result(struct rb_root *root, struct perf_session *session,
 	printf("%.105s\n", graph_dotted_line);
 }
 
-static void print_summary(void)
+static const char * const migrate_type_str[] = {
+	"UNMOVABL",
+	"RECLAIM",
+	"MOVABLE",
+	"RESERVED",
+	"CMA/ISLT",
+	"UNKNOWN",
+};
+
+static void __print_page_result(struct rb_root *root,
+				struct perf_session *session __maybe_unused,
+				int n_lines)
+{
+	struct rb_node *next = rb_first(root);
+	const char *format;
+
+	printf("\n%.80s\n", graph_dotted_line);
+	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags\n",
+	       use_pfn ? "PFN" : "Page");
+	printf("%.80s\n", graph_dotted_line);
+
+	if (use_pfn)
+		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx\n";
+	else
+		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx\n";
+
+	while (next && n_lines--) {
+		struct page_stat *data;
+
+		data = rb_entry(next, struct page_stat, node);
+
+		printf(format, (unsigned long long)data->page,
+		       (unsigned long long)data->alloc_bytes / 1024,
+		       data->nr_alloc, data->order,
+		       migrate_type_str[data->migrate_type],
+		       (unsigned long)data->gfp_flags);
+
+		next = rb_next(next);
+	}
+
+	if (n_lines == -1)
+		printf(" ...              | ...              | ...       | ...   | ...      | ...     \n");
+
+	printf("%.80s\n", graph_dotted_line);
+}
+
+static void print_slab_summary(void)
 {
-	printf("\nSUMMARY\n=======\n");
+	printf("\nSUMMARY (SLAB allocator)");
+	printf("\n========================\n");
 	printf("Total bytes requested: %'lu\n", total_requested);
 	printf("Total bytes allocated: %'lu\n", total_allocated);
 	printf("Total bytes wasted on internal fragmentation: %'lu\n",
@@ -335,13 +626,73 @@ static void print_summary(void)
 	printf("Cross CPU allocations: %'lu/%'lu\n", nr_cross_allocs, nr_allocs);
 }
 
-static void print_result(struct perf_session *session)
+static void print_page_summary(void)
+{
+	int o, m;
+	u64 nr_alloc_freed = nr_page_frees - nr_page_nomatch;
+	u64 total_alloc_freed_bytes = total_page_free_bytes - total_page_nomatch_bytes;
+
+	printf("\nSUMMARY (page allocator)");
+	printf("\n========================\n");
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total allocation requests",
+	       nr_page_allocs, total_page_alloc_bytes / 1024);
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total free requests",
+	       nr_page_frees, total_page_free_bytes / 1024);
+	printf("\n");
+
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total alloc+freed requests",
+	       nr_alloc_freed, (total_alloc_freed_bytes) / 1024);
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total alloc-only requests",
+	       nr_page_allocs - nr_alloc_freed,
+	       (total_page_alloc_bytes - total_alloc_freed_bytes) / 1024);
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total free-only requests",
+	       nr_page_nomatch, total_page_nomatch_bytes / 1024);
+	printf("\n");
+
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total allocation failures",
+	       nr_page_fails, total_page_fail_bytes / 1024);
+	printf("\n");
+
+	printf("%5s  %12s  %12s  %12s  %12s  %12s\n", "Order",  "Unmovable",
+	       "Reclaimable", "Movable", "Reserved", "CMA/Isolated");
+	printf("%.5s  %.12s  %.12s  %.12s  %.12s  %.12s\n", graph_dotted_line,
+	       graph_dotted_line, graph_dotted_line, graph_dotted_line,
+	       graph_dotted_line, graph_dotted_line);
+
+	for (o = 0; o < MAX_PAGE_ORDER; o++) {
+		printf("%5d", o);
+		for (m = 0; m < MAX_MIGRATE_TYPES - 1; m++) {
+			if (order_stats[o][m])
+				printf("  %'12d", order_stats[o][m]);
+			else
+				printf("  %12c", '.');
+		}
+		printf("\n");
+	}
+}
+
+static void print_slab_result(struct perf_session *session)
 {
 	if (caller_flag)
-		__print_result(&root_caller_sorted, session, caller_lines, 1);
+		__print_slab_result(&root_caller_sorted, session, caller_lines, 1);
+	if (alloc_flag)
+		__print_slab_result(&root_alloc_sorted, session, alloc_lines, 0);
+	print_slab_summary();
+}
+
+static void print_page_result(struct perf_session *session)
+{
 	if (alloc_flag)
-		__print_result(&root_alloc_sorted, session, alloc_lines, 0);
-	print_summary();
+		__print_page_result(&page_alloc_sorted, session, alloc_lines);
+	print_page_summary();
+}
+
+static void print_result(struct perf_session *session)
+{
+	if (kmem_slab)
+		print_slab_result(session);
+	if (kmem_page)
+		print_page_result(session);
 }
 
 struct sort_dimension {
@@ -353,8 +704,8 @@ struct sort_dimension {
 static LIST_HEAD(caller_sort);
 static LIST_HEAD(alloc_sort);
 
-static void sort_insert(struct rb_root *root, struct alloc_stat *data,
-			struct list_head *sort_list)
+static void sort_slab_insert(struct rb_root *root, struct alloc_stat *data,
+			     struct list_head *sort_list)
 {
 	struct rb_node **new = &(root->rb_node);
 	struct rb_node *parent = NULL;
@@ -383,8 +734,8 @@ static void sort_insert(struct rb_root *root, struct alloc_stat *data,
 	rb_insert_color(&data->node, root);
 }
 
-static void __sort_result(struct rb_root *root, struct rb_root *root_sorted,
-			  struct list_head *sort_list)
+static void __sort_slab_result(struct rb_root *root, struct rb_root *root_sorted,
+			       struct list_head *sort_list)
 {
 	struct rb_node *node;
 	struct alloc_stat *data;
@@ -396,26 +747,79 @@ static void __sort_result(struct rb_root *root, struct rb_root *root_sorted,
 
 		rb_erase(node, root);
 		data = rb_entry(node, struct alloc_stat, node);
-		sort_insert(root_sorted, data, sort_list);
+		sort_slab_insert(root_sorted, data, sort_list);
+	}
+}
+
+static void sort_page_insert(struct rb_root *root, struct page_stat *data)
+{
+	struct rb_node **new = &root->rb_node;
+	struct rb_node *parent = NULL;
+
+	while (*new) {
+		struct page_stat *this;
+		int cmp = 0;
+
+		this = rb_entry(*new, struct page_stat, node);
+		parent = *new;
+
+		/* TODO: support more sort key */
+		cmp = data->alloc_bytes - this->alloc_bytes;
+
+		if (cmp > 0)
+			new = &parent->rb_left;
+		else
+			new = &parent->rb_right;
+	}
+
+	rb_link_node(&data->node, parent, new);
+	rb_insert_color(&data->node, root);
+}
+
+static void __sort_page_result(struct rb_root *root, struct rb_root *root_sorted)
+{
+	struct rb_node *node;
+	struct page_stat *data;
+
+	for (;;) {
+		node = rb_first(root);
+		if (!node)
+			break;
+
+		rb_erase(node, root);
+		data = rb_entry(node, struct page_stat, node);
+		sort_page_insert(root_sorted, data);
 	}
 }
 
 static void sort_result(void)
 {
-	__sort_result(&root_alloc_stat, &root_alloc_sorted, &alloc_sort);
-	__sort_result(&root_caller_stat, &root_caller_sorted, &caller_sort);
+	if (kmem_slab) {
+		__sort_slab_result(&root_alloc_stat, &root_alloc_sorted,
+				   &alloc_sort);
+		__sort_slab_result(&root_caller_stat, &root_caller_sorted,
+				   &caller_sort);
+	}
+	if (kmem_page) {
+		__sort_page_result(&page_alloc_tree, &page_alloc_sorted);
+	}
 }
 
 static int __cmd_kmem(struct perf_session *session)
 {
 	int err = -EINVAL;
+	struct perf_evsel *evsel;
 	const struct perf_evsel_str_handler kmem_tracepoints[] = {
+		/* slab allocator */
 		{ "kmem:kmalloc",		perf_evsel__process_alloc_event, },
 		{ "kmem:kmem_cache_alloc",	perf_evsel__process_alloc_event, },
 		{ "kmem:kmalloc_node",		perf_evsel__process_alloc_node_event, },
 		{ "kmem:kmem_cache_alloc_node", perf_evsel__process_alloc_node_event, },
 		{ "kmem:kfree",			perf_evsel__process_free_event, },
 		{ "kmem:kmem_cache_free",	perf_evsel__process_free_event, },
+		/* page allocator */
+		{ "kmem:mm_page_alloc",		perf_evsel__process_page_alloc_event, },
+		{ "kmem:mm_page_free",		perf_evsel__process_page_free_event, },
 	};
 
 	if (!perf_session__has_traces(session, "kmem record"))
@@ -426,10 +830,20 @@ static int __cmd_kmem(struct perf_session *session)
 		goto out;
 	}
 
+	evlist__for_each(session->evlist, evsel) {
+		if (!strcmp(perf_evsel__name(evsel), "kmem:mm_page_alloc") &&
+		    perf_evsel__field(evsel, "pfn")) {
+			use_pfn = true;
+			break;
+		}
+	}
+
 	setup_pager();
 	err = perf_session__process_events(session);
-	if (err != 0)
+	if (err != 0) {
+		pr_err("error during process events: %d\n", err);
 		goto out;
+	}
 	sort_result();
 	print_result(session);
 out:
@@ -612,6 +1026,22 @@ static int parse_alloc_opt(const struct option *opt __maybe_unused,
 	return 0;
 }
 
+static int parse_slab_opt(const struct option *opt __maybe_unused,
+			  const char *arg __maybe_unused,
+			  int unset __maybe_unused)
+{
+	kmem_slab = (kmem_page + 1);
+	return 0;
+}
+
+static int parse_page_opt(const struct option *opt __maybe_unused,
+			  const char *arg __maybe_unused,
+			  int unset __maybe_unused)
+{
+	kmem_page = (kmem_slab + 1);
+	return 0;
+}
+
 static int parse_line_opt(const struct option *opt __maybe_unused,
 			  const char *arg, int unset __maybe_unused)
 {
@@ -634,6 +1064,8 @@ static int __cmd_record(int argc, const char **argv)
 {
 	const char * const record_args[] = {
 	"record", "-a", "-R", "-c", "1",
+	};
+	const char * const slab_events[] = {
 	"-e", "kmem:kmalloc",
 	"-e", "kmem:kmalloc_node",
 	"-e", "kmem:kfree",
@@ -641,10 +1073,19 @@ static int __cmd_record(int argc, const char **argv)
 	"-e", "kmem:kmem_cache_alloc_node",
 	"-e", "kmem:kmem_cache_free",
 	};
+	const char * const page_events[] = {
+	"-e", "kmem:mm_page_alloc",
+	"-e", "kmem:mm_page_free",
+	};
 	unsigned int rec_argc, i, j;
 	const char **rec_argv;
 
 	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+	if (kmem_slab)
+		rec_argc += ARRAY_SIZE(slab_events);
+	if (kmem_page)
+		rec_argc += ARRAY_SIZE(page_events);
+
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
 	if (rec_argv == NULL)
@@ -653,6 +1094,15 @@ static int __cmd_record(int argc, const char **argv)
 	for (i = 0; i < ARRAY_SIZE(record_args); i++)
 		rec_argv[i] = strdup(record_args[i]);
 
+	if (kmem_slab) {
+		for (j = 0; j < ARRAY_SIZE(slab_events); j++, i++)
+			rec_argv[i] = strdup(slab_events[j]);
+	}
+	if (kmem_page) {
+		for (j = 0; j < ARRAY_SIZE(page_events); j++, i++)
+			rec_argv[i] = strdup(page_events[j]);
+	}
+
 	for (j = 1; j < (unsigned int)argc; j++, i++)
 		rec_argv[i] = argv[j];
 
@@ -679,6 +1129,10 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_CALLBACK('l', "line", NULL, "num", "show n lines", parse_line_opt),
 	OPT_BOOLEAN(0, "raw-ip", &raw_ip, "show raw ip instead of symbol"),
 	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
+	OPT_CALLBACK_NOOPT(0, "slab", NULL, NULL, "Analyze slab allocator",
+			   parse_slab_opt),
+	OPT_CALLBACK_NOOPT(0, "page", NULL, NULL, "Analyze page allocator",
+			   parse_page_opt),
 	OPT_END()
 	};
 	const char *const kmem_subcommands[] = { "record", "stat", NULL };
@@ -695,6 +1149,9 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (!argc)
 		usage_with_options(kmem_usage, kmem_options);
 
+	if (kmem_slab == 0 && kmem_page == 0)
+		kmem_slab = 1;  /* for backward compatibility */
+
 	if (!strncmp(argv[0], "rec", 3)) {
 		symbol__init(NULL);
 		return __cmd_record(argc, argv);
@@ -706,6 +1163,17 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (session == NULL)
 		return -1;
 
+	if (kmem_page) {
+		struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+
+		if (evsel == NULL || evsel->tp_format == NULL) {
+			pr_err("invalid event found.. aborting\n");
+			return -1;
+		}
+
+		kmem_page_size = pevent_get_page_size(evsel->tp_format->pevent);
+	}
+
 	symbol__init(&session->header.env);
 
 	if (!strcmp(argv[0], "stat")) {
-- 
2.3.2



* [PATCH 4/9] perf kmem: Implement stat --page --caller
  2015-04-06  5:36 ` Namhyung Kim
@ 2015-04-06  5:36   ` Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-06  5:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

This makes perf kmem support caller statistics for page allocations.
Unlike the slab case, the tracepoints in the page allocator don't
provide callsite info directly.  So it records samples with callchains
and extracts the callsite info from them.

Note that the callchain contains several memory allocation functions
which have no meaning for users.  So those functions are skipped to get
the proper callsites.  I used the following regex pattern to skip the
allocator functions:

  ^_?_?(alloc|get_free|get_zeroed)_pages?

This gave me the following list of functions:

  # perf kmem record --page sleep 3
  # perf kmem stat --page -v
  ...
  alloc func: __get_free_pages
  alloc func: get_zeroed_page
  alloc func: alloc_pages_exact
  alloc func: __alloc_pages_direct_compact
  alloc func: __alloc_pages_nodemask
  alloc func: alloc_page_interleave
  alloc func: alloc_pages_current
  alloc func: alloc_pages_vma
  alloc func: alloc_page_buffers
  alloc func: alloc_pages_exact_nid
  ...

The output looks mostly the same as the --alloc output (I also added a
callsite column to that) but groups entries by callsite.  Currently,
the order, migrate type and GFP flag info come from the last allocation
and are not guaranteed to be the same for all allocations from that
callsite.

  ---------------------------------------------------------------------------------------------
   Total_alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite
  ---------------------------------------------------------------------------------------------
              1,064 |       266 |     0 | UNMOVABL |  000000d0 | __pollwait
                 52 |        13 |     0 | UNMOVABL |  002084d0 | pte_alloc_one
                 44 |        11 |     0 |  MOVABLE |  000280da | handle_mm_fault
                 20 |         5 |     0 |  MOVABLE |  000200da | do_cow_fault
                 20 |         5 |     0 |  MOVABLE |  000200da | do_wp_page
                 16 |         4 |     0 | UNMOVABL |  000084d0 | __pmd_alloc
                 16 |         4 |     0 | UNMOVABL |  00000200 | __tlb_remove_page
                 12 |         3 |     0 | UNMOVABL |  000084d0 | __pud_alloc
                  8 |         2 |     0 | UNMOVABL |  00000010 | bio_copy_user_iov
                  4 |         1 |     0 | UNMOVABL |  000200d2 | pipe_write
                  4 |         1 |     0 |  MOVABLE |  000280da | do_wp_page
                  4 |         1 |     0 | UNMOVABL |  002084d0 | pgd_alloc
  ---------------------------------------------------------------------------------------------

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-kmem.c | 279 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 263 insertions(+), 16 deletions(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 63ea01349b6e..5b3ed17c293a 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -10,6 +10,7 @@
 #include "util/header.h"
 #include "util/session.h"
 #include "util/tool.h"
+#include "util/callchain.h"
 
 #include "util/parse-options.h"
 #include "util/trace-event.h"
@@ -21,6 +22,7 @@
 #include <linux/rbtree.h>
 #include <linux/string.h>
 #include <locale.h>
+#include <regex.h>
 
 static int	kmem_slab;
 static int	kmem_page;
@@ -241,6 +243,7 @@ static unsigned long nr_page_fails;
 static unsigned long nr_page_nomatch;
 
 static bool use_pfn;
+static struct perf_session *kmem_session;
 
 #define MAX_MIGRATE_TYPES  6
 #define MAX_PAGE_ORDER     11
@@ -250,6 +253,7 @@ static int order_stats[MAX_PAGE_ORDER][MAX_MIGRATE_TYPES];
 struct page_stat {
 	struct rb_node 	node;
 	u64 		page;
+	u64 		callsite;
 	int 		order;
 	unsigned 	gfp_flags;
 	unsigned 	migrate_type;
@@ -262,8 +266,138 @@ struct page_stat {
 static struct rb_root page_tree;
 static struct rb_root page_alloc_tree;
 static struct rb_root page_alloc_sorted;
+static struct rb_root page_caller_tree;
+static struct rb_root page_caller_sorted;
 
-static struct page_stat *search_page(unsigned long page, bool create)
+struct alloc_func {
+	u64 start;
+	u64 end;
+	char *name;
+};
+
+static int nr_alloc_funcs;
+static struct alloc_func *alloc_func_list;
+
+static int funcmp(const void *a, const void *b)
+{
+	const struct alloc_func *fa = a;
+	const struct alloc_func *fb = b;
+
+	if (fa->start > fb->start)
+		return 1;
+	else
+		return -1;
+}
+
+static int callcmp(const void *a, const void *b)
+{
+	const struct alloc_func *fa = a;
+	const struct alloc_func *fb = b;
+
+	if (fb->start <= fa->start && fa->end < fb->end)
+		return 0;
+
+	if (fa->start > fb->start)
+		return 1;
+	else
+		return -1;
+}
+
+static int build_alloc_func_list(void)
+{
+	int ret;
+	struct map *kernel_map;
+	struct symbol *sym;
+	struct rb_node *node;
+	struct alloc_func *func;
+	struct machine *machine = &kmem_session->machines.host;
+
+	regex_t alloc_func_regex;
+	const char pattern[] = "^_?_?(alloc|get_free|get_zeroed)_pages?";
+
+	ret = regcomp(&alloc_func_regex, pattern, REG_EXTENDED);
+	if (ret) {
+		char err[BUFSIZ];
+
+		regerror(ret, &alloc_func_regex, err, sizeof(err));
+		pr_err("Invalid regex: %s\n%s", pattern, err);
+		return -EINVAL;
+	}
+
+	kernel_map = machine->vmlinux_maps[MAP__FUNCTION];
+	map__load(kernel_map, NULL);
+
+	map__for_each_symbol(kernel_map, sym, node) {
+		if (regexec(&alloc_func_regex, sym->name, 0, NULL, 0))
+			continue;
+
+		func = realloc(alloc_func_list,
+			       (nr_alloc_funcs + 1) * sizeof(*func));
+		if (func == NULL)
+			return -ENOMEM;
+
+		pr_debug("alloc func: %s\n", sym->name);
+		func[nr_alloc_funcs].start = sym->start;
+		func[nr_alloc_funcs].end   = sym->end;
+		func[nr_alloc_funcs].name  = sym->name;
+
+		alloc_func_list = func;
+		nr_alloc_funcs++;
+	}
+
+	qsort(alloc_func_list, nr_alloc_funcs, sizeof(*func), funcmp);
+
+	regfree(&alloc_func_regex);
+	return 0;
+}
+
+/*
+ * Find first non-memory allocation function from callchain.
+ * The allocation functions are in the 'alloc_func_list'.
+ */
+static u64 find_callsite(struct perf_evsel *evsel, struct perf_sample *sample)
+{
+	struct addr_location al;
+	struct machine *machine = &kmem_session->machines.host;
+	struct callchain_cursor_node *node;
+
+	if (alloc_func_list == NULL)
+		build_alloc_func_list();
+
+	al.thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+	sample__resolve_callchain(sample, NULL, evsel, &al, 16);
+
+	callchain_cursor_commit(&callchain_cursor);
+	while (true) {
+		struct alloc_func key, *caller;
+		u64 addr;
+
+		node = callchain_cursor_current(&callchain_cursor);
+		if (node == NULL)
+			break;
+
+		key.start = key.end = node->ip;
+		caller = bsearch(&key, alloc_func_list, nr_alloc_funcs,
+				 sizeof(key), callcmp);
+		if (!caller) {
+			/* found */
+			if (node->map)
+				addr = map__unmap_ip(node->map, node->ip);
+			else
+				addr = node->ip;
+
+			return addr;
+		} else
+			pr_debug3("skipping alloc function: %s\n", caller->name);
+
+		callchain_cursor_advance(&callchain_cursor);
+	}
+
+	pr_debug2("unknown callsite: %"PRIx64 "\n", sample->ip);
+	return sample->ip;
+}
+
+static struct page_stat *search_page(u64 page, bool create)
 {
 	struct rb_node **node = &page_tree.rb_node;
 	struct rb_node *parent = NULL;
@@ -357,6 +491,41 @@ static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool cre
 	return data;
 }
 
+static struct page_stat *search_page_caller_stat(u64 callsite, bool create)
+{
+	struct rb_node **node = &page_caller_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct page_stat *data;
+
+	while (*node) {
+		s64 cmp;
+
+		parent = *node;
+		data = rb_entry(*node, struct page_stat, node);
+
+		cmp = data->callsite - callsite;
+		if (cmp < 0)
+			node = &parent->rb_left;
+		else if (cmp > 0)
+			node = &parent->rb_right;
+		else
+			return data;
+	}
+
+	if (!create)
+		return NULL;
+
+	data = zalloc(sizeof(*data));
+	if (data != NULL) {
+		data->callsite = callsite;
+
+		rb_link_node(&data->node, parent, node);
+		rb_insert_color(&data->node, &page_caller_tree);
+	}
+
+	return data;
+}
+
 static bool valid_page(u64 pfn_or_page)
 {
 	if (use_pfn && pfn_or_page == -1UL)
@@ -375,6 +544,7 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 	unsigned int migrate_type = perf_evsel__intval(evsel, sample,
 						       "migratetype");
 	u64 bytes = kmem_page_size << order;
+	u64 callsite;
 	struct page_stat *stat;
 	struct page_stat this = {
 		.order = order,
@@ -397,6 +567,8 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 		return 0;
 	}
 
+	callsite = find_callsite(evsel, sample);
+
 	/*
 	 * This is to find the current page (with correct gfp flags and
 	 * migrate type) at free event.
@@ -408,6 +580,7 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 	stat->order = order;
 	stat->gfp_flags = gfp_flags;
 	stat->migrate_type = migrate_type;
+	stat->callsite = callsite;
 
 	this.page = page;
 	stat = search_page_alloc_stat(&this, true);
@@ -416,6 +589,18 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 
 	stat->nr_alloc++;
 	stat->alloc_bytes += bytes;
+	stat->callsite = callsite;
+
+	stat = search_page_caller_stat(callsite, true);
+	if (stat == NULL)
+		return -ENOMEM;
+
+	stat->order = order;
+	stat->gfp_flags = gfp_flags;
+	stat->migrate_type = migrate_type;
+
+	stat->nr_alloc++;
+	stat->alloc_bytes += bytes;
 
 	order_stats[order][migrate_type]++;
 
@@ -455,6 +640,7 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
 	this.page = page;
 	this.gfp_flags = stat->gfp_flags;
 	this.migrate_type = stat->migrate_type;
+	this.callsite = stat->callsite;
 
 	rb_erase(&stat->node, &page_tree);
 	free(stat);
@@ -466,6 +652,13 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
 	stat->nr_free++;
 	stat->free_bytes += bytes;
 
+	stat = search_page_caller_stat(this.callsite, false);
+	if (stat == NULL)
+		return -ENOENT;
+
+	stat->nr_free++;
+	stat->free_bytes += bytes;
+
 	return 0;
 }
 
@@ -576,41 +769,89 @@ static const char * const migrate_type_str[] = {
 	"UNKNOWN",
 };
 
-static void __print_page_result(struct rb_root *root,
-				struct perf_session *session __maybe_unused,
-				int n_lines)
+static void __print_page_alloc_result(struct perf_session *session, int n_lines)
 {
-	struct rb_node *next = rb_first(root);
+	struct rb_node *next = rb_first(&page_alloc_sorted);
+	struct machine *machine = &session->machines.host;
 	const char *format;
 
-	printf("\n%.80s\n", graph_dotted_line);
-	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags\n",
+	printf("\n%.105s\n", graph_dotted_line);
+	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n",
 	       use_pfn ? "PFN" : "Page");
-	printf("%.80s\n", graph_dotted_line);
+	printf("%.105s\n", graph_dotted_line);
 
 	if (use_pfn)
-		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx\n";
+		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx | %s\n";
 	else
-		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx\n";
+		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx | %s\n";
 
 	while (next && n_lines--) {
 		struct page_stat *data;
+		struct symbol *sym;
+		struct map *map;
+		char buf[32];
+		char *caller = buf;
 
 		data = rb_entry(next, struct page_stat, node);
+		sym = machine__find_kernel_function(machine, data->callsite,
+						    &map, NULL);
+		if (sym && sym->name)
+			caller = sym->name;
+		else
+			scnprintf(buf, sizeof(buf), "%"PRIx64, data->callsite);
 
 		printf(format, (unsigned long long)data->page,
 		       (unsigned long long)data->alloc_bytes / 1024,
 		       data->nr_alloc, data->order,
 		       migrate_type_str[data->migrate_type],
-		       (unsigned long)data->gfp_flags);
+		       (unsigned long)data->gfp_flags, caller);
+
+		next = rb_next(next);
+	}
+
+	if (n_lines == -1)
+		printf(" ...              | ...              | ...       | ...   | ...      | ...       | ...\n");
+
+	printf("%.105s\n", graph_dotted_line);
+}
+
+static void __print_page_caller_result(struct perf_session *session, int n_lines)
+{
+	struct rb_node *next = rb_first(&page_caller_sorted);
+	struct machine *machine = &session->machines.host;
+
+	printf("\n%.105s\n", graph_dotted_line);
+	printf(" Total alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n");
+	printf("%.105s\n", graph_dotted_line);
+
+	while (next && n_lines--) {
+		struct page_stat *data;
+		struct symbol *sym;
+		struct map *map;
+		char buf[32];
+		char *caller = buf;
+
+		data = rb_entry(next, struct page_stat, node);
+		sym = machine__find_kernel_function(machine, data->callsite,
+						    &map, NULL);
+		if (sym && sym->name)
+			caller = sym->name;
+		else
+			scnprintf(buf, sizeof(buf), "%"PRIx64, data->callsite);
+
+		printf(" %'16llu | %'9d | %5d | %8s |  %08lx | %s\n",
+		       (unsigned long long)data->alloc_bytes / 1024,
+		       data->nr_alloc, data->order,
+		       migrate_type_str[data->migrate_type],
+		       (unsigned long)data->gfp_flags, caller);
 
 		next = rb_next(next);
 	}
 
 	if (n_lines == -1)
-		printf(" ...              | ...              | ...       | ...   | ...      | ...     \n");
+		printf(" ...              | ...       | ...   | ...      | ...       | ...\n");
 
-	printf("%.80s\n", graph_dotted_line);
+	printf("%.105s\n", graph_dotted_line);
 }
 
 static void print_slab_summary(void)
@@ -682,8 +923,10 @@ static void print_slab_result(struct perf_session *session)
 
 static void print_page_result(struct perf_session *session)
 {
+	if (caller_flag)
+		__print_page_caller_result(session, caller_lines);
 	if (alloc_flag)
-		__print_page_result(&page_alloc_sorted, session, alloc_lines);
+		__print_page_alloc_result(session, alloc_lines);
 	print_page_summary();
 }
 
@@ -802,6 +1045,7 @@ static void sort_result(void)
 	}
 	if (kmem_page) {
 		__sort_page_result(&page_alloc_tree, &page_alloc_sorted);
+		__sort_page_result(&page_caller_tree, &page_caller_sorted);
 	}
 }
 
@@ -1084,7 +1328,7 @@ static int __cmd_record(int argc, const char **argv)
 	if (kmem_slab)
 		rec_argc += ARRAY_SIZE(slab_events);
 	if (kmem_page)
-		rec_argc += ARRAY_SIZE(page_events);
+		rec_argc += ARRAY_SIZE(page_events) + 1; /* for -g */
 
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
@@ -1099,6 +1343,8 @@ static int __cmd_record(int argc, const char **argv)
 			rec_argv[i] = strdup(slab_events[j]);
 	}
 	if (kmem_page) {
+		rec_argv[i++] = strdup("-g");
+
 		for (j = 0; j < ARRAY_SIZE(page_events); j++, i++)
 			rec_argv[i] = strdup(page_events[j]);
 	}
@@ -1159,7 +1405,7 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 
 	file.path = input_name;
 
-	session = perf_session__new(&file, false, &perf_kmem);
+	kmem_session = session = perf_session__new(&file, false, &perf_kmem);
 	if (session == NULL)
 		return -1;
 
@@ -1172,6 +1418,7 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 		}
 
 		kmem_page_size = pevent_get_page_size(evsel->tp_format->pevent);
+		symbol_conf.use_callchain = true;
 	}
 
 	symbol__init(&session->header.env);
-- 
2.3.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 4/9] perf kmem: Implement stat --page --caller
@ 2015-04-06  5:36   ` Namhyung Kim
  0 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-06  5:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

It perf kmem support caller statistics for page.  Unlike slab case,
the tracepoints in page allocator don't provide callsite info.  So
it records with callchain and extracts callsite info.

Note that the callchain contains several memory allocation functions
which has no meaning for users.  So skip those functions to get proper
callsites.  I used following regex pattern to skip the allocator
functions:

  ^_?_?(alloc|get_free|get_zeroed)_pages?

This gave me a following list of functions:

  # perf kmem record --page sleep 3
  # perf kmem stat --page -v
  ...
  alloc func: __get_free_pages
  alloc func: get_zeroed_page
  alloc func: alloc_pages_exact
  alloc func: __alloc_pages_direct_compact
  alloc func: __alloc_pages_nodemask
  alloc func: alloc_page_interleave
  alloc func: alloc_pages_current
  alloc func: alloc_pages_vma
  alloc func: alloc_page_buffers
  alloc func: alloc_pages_exact_nid
  ...

The output looks mostly the same as --alloc (I also added a callsite
column to that) but groups entries by callsite.  Currently, the order,
migrate type and GFP flag info is taken from the last allocation and
is not guaranteed to be the same for all allocations from that
callsite.

  ---------------------------------------------------------------------------------------------
   Total_alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite
  ---------------------------------------------------------------------------------------------
              1,064 |       266 |     0 | UNMOVABL |  000000d0 | __pollwait
                 52 |        13 |     0 | UNMOVABL |  002084d0 | pte_alloc_one
                 44 |        11 |     0 |  MOVABLE |  000280da | handle_mm_fault
                 20 |         5 |     0 |  MOVABLE |  000200da | do_cow_fault
                 20 |         5 |     0 |  MOVABLE |  000200da | do_wp_page
                 16 |         4 |     0 | UNMOVABL |  000084d0 | __pmd_alloc
                 16 |         4 |     0 | UNMOVABL |  00000200 | __tlb_remove_page
                 12 |         3 |     0 | UNMOVABL |  000084d0 | __pud_alloc
                  8 |         2 |     0 | UNMOVABL |  00000010 | bio_copy_user_iov
                  4 |         1 |     0 | UNMOVABL |  000200d2 | pipe_write
                  4 |         1 |     0 |  MOVABLE |  000280da | do_wp_page
                  4 |         1 |     0 | UNMOVABL |  002084d0 | pgd_alloc
  ---------------------------------------------------------------------------------------------

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-kmem.c | 279 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 263 insertions(+), 16 deletions(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 63ea01349b6e..5b3ed17c293a 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -10,6 +10,7 @@
 #include "util/header.h"
 #include "util/session.h"
 #include "util/tool.h"
+#include "util/callchain.h"
 
 #include "util/parse-options.h"
 #include "util/trace-event.h"
@@ -21,6 +22,7 @@
 #include <linux/rbtree.h>
 #include <linux/string.h>
 #include <locale.h>
+#include <regex.h>
 
 static int	kmem_slab;
 static int	kmem_page;
@@ -241,6 +243,7 @@ static unsigned long nr_page_fails;
 static unsigned long nr_page_nomatch;
 
 static bool use_pfn;
+static struct perf_session *kmem_session;
 
 #define MAX_MIGRATE_TYPES  6
 #define MAX_PAGE_ORDER     11
@@ -250,6 +253,7 @@ static int order_stats[MAX_PAGE_ORDER][MAX_MIGRATE_TYPES];
 struct page_stat {
 	struct rb_node 	node;
 	u64 		page;
+	u64 		callsite;
 	int 		order;
 	unsigned 	gfp_flags;
 	unsigned 	migrate_type;
@@ -262,8 +266,138 @@ struct page_stat {
 static struct rb_root page_tree;
 static struct rb_root page_alloc_tree;
 static struct rb_root page_alloc_sorted;
+static struct rb_root page_caller_tree;
+static struct rb_root page_caller_sorted;
 
-static struct page_stat *search_page(unsigned long page, bool create)
+struct alloc_func {
+	u64 start;
+	u64 end;
+	char *name;
+};
+
+static int nr_alloc_funcs;
+static struct alloc_func *alloc_func_list;
+
+static int funcmp(const void *a, const void *b)
+{
+	const struct alloc_func *fa = a;
+	const struct alloc_func *fb = b;
+
+	if (fa->start > fb->start)
+		return 1;
+	else
+		return -1;
+}
+
+static int callcmp(const void *a, const void *b)
+{
+	const struct alloc_func *fa = a;
+	const struct alloc_func *fb = b;
+
+	if (fb->start <= fa->start && fa->end < fb->end)
+		return 0;
+
+	if (fa->start > fb->start)
+		return 1;
+	else
+		return -1;
+}
+
+static int build_alloc_func_list(void)
+{
+	int ret;
+	struct map *kernel_map;
+	struct symbol *sym;
+	struct rb_node *node;
+	struct alloc_func *func;
+	struct machine *machine = &kmem_session->machines.host;
+
+	regex_t alloc_func_regex;
+	const char pattern[] = "^_?_?(alloc|get_free|get_zeroed)_pages?";
+
+	ret = regcomp(&alloc_func_regex, pattern, REG_EXTENDED);
+	if (ret) {
+		char err[BUFSIZ];
+
+		regerror(ret, &alloc_func_regex, err, sizeof(err));
+		pr_err("Invalid regex: %s\n%s", pattern, err);
+		return -EINVAL;
+	}
+
+	kernel_map = machine->vmlinux_maps[MAP__FUNCTION];
+	map__load(kernel_map, NULL);
+
+	map__for_each_symbol(kernel_map, sym, node) {
+		if (regexec(&alloc_func_regex, sym->name, 0, NULL, 0))
+			continue;
+
+		func = realloc(alloc_func_list,
+			       (nr_alloc_funcs + 1) * sizeof(*func));
+		if (func == NULL)
+			return -ENOMEM;
+
+		pr_debug("alloc func: %s\n", sym->name);
+		func[nr_alloc_funcs].start = sym->start;
+		func[nr_alloc_funcs].end   = sym->end;
+		func[nr_alloc_funcs].name  = sym->name;
+
+		alloc_func_list = func;
+		nr_alloc_funcs++;
+	}
+
+	qsort(alloc_func_list, nr_alloc_funcs, sizeof(*func), funcmp);
+
+	regfree(&alloc_func_regex);
+	return 0;
+}
+
+/*
+ * Find first non-memory allocation function from callchain.
+ * The allocation functions are in the 'alloc_func_list'.
+ */
+static u64 find_callsite(struct perf_evsel *evsel, struct perf_sample *sample)
+{
+	struct addr_location al;
+	struct machine *machine = &kmem_session->machines.host;
+	struct callchain_cursor_node *node;
+
+	if (alloc_func_list == NULL)
+		build_alloc_func_list();
+
+	al.thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+	sample__resolve_callchain(sample, NULL, evsel, &al, 16);
+
+	callchain_cursor_commit(&callchain_cursor);
+	while (true) {
+		struct alloc_func key, *caller;
+		u64 addr;
+
+		node = callchain_cursor_current(&callchain_cursor);
+		if (node == NULL)
+			break;
+
+		key.start = key.end = node->ip;
+		caller = bsearch(&key, alloc_func_list, nr_alloc_funcs,
+				 sizeof(key), callcmp);
+		if (!caller) {
+			/* found */
+			if (node->map)
+				addr = map__unmap_ip(node->map, node->ip);
+			else
+				addr = node->ip;
+
+			return addr;
+		} else
+			pr_debug3("skipping alloc function: %s\n", caller->name);
+
+		callchain_cursor_advance(&callchain_cursor);
+	}
+
+	pr_debug2("unknown callsite: %"PRIx64 "\n", sample->ip);
+	return sample->ip;
+}
+
+static struct page_stat *search_page(u64 page, bool create)
 {
 	struct rb_node **node = &page_tree.rb_node;
 	struct rb_node *parent = NULL;
@@ -357,6 +491,41 @@ static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool cre
 	return data;
 }
 
+static struct page_stat *search_page_caller_stat(u64 callsite, bool create)
+{
+	struct rb_node **node = &page_caller_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct page_stat *data;
+
+	while (*node) {
+		s64 cmp;
+
+		parent = *node;
+		data = rb_entry(*node, struct page_stat, node);
+
+		cmp = data->callsite - callsite;
+		if (cmp < 0)
+			node = &parent->rb_left;
+		else if (cmp > 0)
+			node = &parent->rb_right;
+		else
+			return data;
+	}
+
+	if (!create)
+		return NULL;
+
+	data = zalloc(sizeof(*data));
+	if (data != NULL) {
+		data->callsite = callsite;
+
+		rb_link_node(&data->node, parent, node);
+		rb_insert_color(&data->node, &page_caller_tree);
+	}
+
+	return data;
+}
+
 static bool valid_page(u64 pfn_or_page)
 {
 	if (use_pfn && pfn_or_page == -1UL)
@@ -375,6 +544,7 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 	unsigned int migrate_type = perf_evsel__intval(evsel, sample,
 						       "migratetype");
 	u64 bytes = kmem_page_size << order;
+	u64 callsite;
 	struct page_stat *stat;
 	struct page_stat this = {
 		.order = order,
@@ -397,6 +567,8 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 		return 0;
 	}
 
+	callsite = find_callsite(evsel, sample);
+
 	/*
 	 * This is to find the current page (with correct gfp flags and
 	 * migrate type) at free event.
@@ -408,6 +580,7 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 	stat->order = order;
 	stat->gfp_flags = gfp_flags;
 	stat->migrate_type = migrate_type;
+	stat->callsite = callsite;
 
 	this.page = page;
 	stat = search_page_alloc_stat(&this, true);
@@ -416,6 +589,18 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 
 	stat->nr_alloc++;
 	stat->alloc_bytes += bytes;
+	stat->callsite = callsite;
+
+	stat = search_page_caller_stat(callsite, true);
+	if (stat == NULL)
+		return -ENOMEM;
+
+	stat->order = order;
+	stat->gfp_flags = gfp_flags;
+	stat->migrate_type = migrate_type;
+
+	stat->nr_alloc++;
+	stat->alloc_bytes += bytes;
 
 	order_stats[order][migrate_type]++;
 
@@ -455,6 +640,7 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
 	this.page = page;
 	this.gfp_flags = stat->gfp_flags;
 	this.migrate_type = stat->migrate_type;
+	this.callsite = stat->callsite;
 
 	rb_erase(&stat->node, &page_tree);
 	free(stat);
@@ -466,6 +652,13 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
 	stat->nr_free++;
 	stat->free_bytes += bytes;
 
+	stat = search_page_caller_stat(this.callsite, false);
+	if (stat == NULL)
+		return -ENOENT;
+
+	stat->nr_free++;
+	stat->free_bytes += bytes;
+
 	return 0;
 }
 
@@ -576,41 +769,89 @@ static const char * const migrate_type_str[] = {
 	"UNKNOWN",
 };
 
-static void __print_page_result(struct rb_root *root,
-				struct perf_session *session __maybe_unused,
-				int n_lines)
+static void __print_page_alloc_result(struct perf_session *session, int n_lines)
 {
-	struct rb_node *next = rb_first(root);
+	struct rb_node *next = rb_first(&page_alloc_sorted);
+	struct machine *machine = &session->machines.host;
 	const char *format;
 
-	printf("\n%.80s\n", graph_dotted_line);
-	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags\n",
+	printf("\n%.105s\n", graph_dotted_line);
+	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n",
 	       use_pfn ? "PFN" : "Page");
-	printf("%.80s\n", graph_dotted_line);
+	printf("%.105s\n", graph_dotted_line);
 
 	if (use_pfn)
-		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx\n";
+		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx | %s\n";
 	else
-		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx\n";
+		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx | %s\n";
 
 	while (next && n_lines--) {
 		struct page_stat *data;
+		struct symbol *sym;
+		struct map *map;
+		char buf[32];
+		char *caller = buf;
 
 		data = rb_entry(next, struct page_stat, node);
+		sym = machine__find_kernel_function(machine, data->callsite,
+						    &map, NULL);
+		if (sym && sym->name)
+			caller = sym->name;
+		else
+			scnprintf(buf, sizeof(buf), "%"PRIx64, data->callsite);
 
 		printf(format, (unsigned long long)data->page,
 		       (unsigned long long)data->alloc_bytes / 1024,
 		       data->nr_alloc, data->order,
 		       migrate_type_str[data->migrate_type],
-		       (unsigned long)data->gfp_flags);
+		       (unsigned long)data->gfp_flags, caller);
+
+		next = rb_next(next);
+	}
+
+	if (n_lines == -1)
+		printf(" ...              | ...              | ...       | ...   | ...      | ...       | ...\n");
+
+	printf("%.105s\n", graph_dotted_line);
+}
+
+static void __print_page_caller_result(struct perf_session *session, int n_lines)
+{
+	struct rb_node *next = rb_first(&page_caller_sorted);
+	struct machine *machine = &session->machines.host;
+
+	printf("\n%.105s\n", graph_dotted_line);
+	printf(" Total alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n");
+	printf("%.105s\n", graph_dotted_line);
+
+	while (next && n_lines--) {
+		struct page_stat *data;
+		struct symbol *sym;
+		struct map *map;
+		char buf[32];
+		char *caller = buf;
+
+		data = rb_entry(next, struct page_stat, node);
+		sym = machine__find_kernel_function(machine, data->callsite,
+						    &map, NULL);
+		if (sym && sym->name)
+			caller = sym->name;
+		else
+			scnprintf(buf, sizeof(buf), "%"PRIx64, data->callsite);
+
+		printf(" %'16llu | %'9d | %5d | %8s |  %08lx | %s\n",
+		       (unsigned long long)data->alloc_bytes / 1024,
+		       data->nr_alloc, data->order,
+		       migrate_type_str[data->migrate_type],
+		       (unsigned long)data->gfp_flags, caller);
 
 		next = rb_next(next);
 	}
 
 	if (n_lines == -1)
-		printf(" ...              | ...              | ...       | ...   | ...      | ...     \n");
+		printf(" ...              | ...       | ...   | ...      | ...       | ...\n");
 
-	printf("%.80s\n", graph_dotted_line);
+	printf("%.105s\n", graph_dotted_line);
 }
 
 static void print_slab_summary(void)
@@ -682,8 +923,10 @@ static void print_slab_result(struct perf_session *session)
 
 static void print_page_result(struct perf_session *session)
 {
+	if (caller_flag)
+		__print_page_caller_result(session, caller_lines);
 	if (alloc_flag)
-		__print_page_result(&page_alloc_sorted, session, alloc_lines);
+		__print_page_alloc_result(session, alloc_lines);
 	print_page_summary();
 }
 
@@ -802,6 +1045,7 @@ static void sort_result(void)
 	}
 	if (kmem_page) {
 		__sort_page_result(&page_alloc_tree, &page_alloc_sorted);
+		__sort_page_result(&page_caller_tree, &page_caller_sorted);
 	}
 }
 
@@ -1084,7 +1328,7 @@ static int __cmd_record(int argc, const char **argv)
 	if (kmem_slab)
 		rec_argc += ARRAY_SIZE(slab_events);
 	if (kmem_page)
-		rec_argc += ARRAY_SIZE(page_events);
+		rec_argc += ARRAY_SIZE(page_events) + 1; /* for -g */
 
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
@@ -1099,6 +1343,8 @@ static int __cmd_record(int argc, const char **argv)
 			rec_argv[i] = strdup(slab_events[j]);
 	}
 	if (kmem_page) {
+		rec_argv[i++] = strdup("-g");
+
 		for (j = 0; j < ARRAY_SIZE(page_events); j++, i++)
 			rec_argv[i] = strdup(page_events[j]);
 	}
@@ -1159,7 +1405,7 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 
 	file.path = input_name;
 
-	session = perf_session__new(&file, false, &perf_kmem);
+	kmem_session = session = perf_session__new(&file, false, &perf_kmem);
 	if (session == NULL)
 		return -1;
 
@@ -1172,6 +1418,7 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 		}
 
 		kmem_page_size = pevent_get_page_size(evsel->tp_format->pevent);
+		symbol_conf.use_callchain = true;
 	}
 
 	symbol__init(&session->header.env);
-- 
2.3.2



* [PATCH 5/9] perf kmem: Support sort keys on page analysis
  2015-04-06  5:36 ` Namhyung Kim
@ 2015-04-06  5:36   ` Namhyung Kim
From: Namhyung Kim @ 2015-04-06  5:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Add new sort keys for page: page, order, migtype, gfp - the existing
'bytes', 'hit' and 'callsite' sort keys also work for page.  Note that
the -s/--sort option should be preceded by either the --slab or --page
option to determine which set of sort keys applies.

Now it properly groups and sorts allocation stats - the same
page/caller with a different order/migtype/gfp will be printed on a
separate line.

  # perf kmem stat --page --caller -l 10 -s order,hit

  --------------------------------------------------------------------------------------------
   Total alloc (KB) |  Hits     | Order | Mig.type | GFP flags | Callsite
  --------------------------------------------------------------------------------------------
                 64 |         4 |     2 |  RECLAIM |  00285250 | new_slab
             50,144 |    12,536 |     0 |  MOVABLE |  0102005a | __page_cache_alloc
                 52 |        13 |     0 | UNMOVABL |  002084d0 | pte_alloc_one
                 40 |        10 |     0 |  MOVABLE |  000280da | handle_mm_fault
                 28 |         7 |     0 | UNMOVABL |  000000d0 | __pollwait
                 20 |         5 |     0 |  MOVABLE |  000200da | do_wp_page
                 20 |         5 |     0 |  MOVABLE |  000200da | do_cow_fault
                 16 |         4 |     0 | UNMOVABL |  00000200 | __tlb_remove_page
                 16 |         4 |     0 | UNMOVABL |  000084d0 | __pmd_alloc
                  8 |         2 |     0 | UNMOVABL |  000084d0 | __pud_alloc
   ...              | ...       | ...   | ...      | ...       | ...
  --------------------------------------------------------------------------------------------

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-kmem.txt |   6 +-
 tools/perf/builtin-kmem.c              | 398 ++++++++++++++++++++++++++-------
 2 files changed, 317 insertions(+), 87 deletions(-)

diff --git a/tools/perf/Documentation/perf-kmem.txt b/tools/perf/Documentation/perf-kmem.txt
index 23219c65c16f..69e181272c51 100644
--- a/tools/perf/Documentation/perf-kmem.txt
+++ b/tools/perf/Documentation/perf-kmem.txt
@@ -37,7 +37,11 @@ OPTIONS
 
 -s <key[,key2...]>::
 --sort=<key[,key2...]>::
-	Sort the output (default: frag,hit,bytes)
+	Sort the output (default: 'frag,hit,bytes' for slab and 'bytes,hit'
+	for page).  Available sort keys are 'ptr, callsite, bytes, hit,
+	pingpong, frag' for slab and 'page, callsite, bytes, hit, order,
+	migtype, gfp' for page.  This option should be preceded by one of the
+	mode selection options - i.e. --slab, --page, --alloc and/or --caller.
 
 -l <num>::
 --line=<num>::
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 5b3ed17c293a..719aaf782116 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -30,7 +30,7 @@ static int	kmem_page;
 static long	kmem_page_size;
 
 struct alloc_stat;
-typedef int (*sort_fn_t)(struct alloc_stat *, struct alloc_stat *);
+typedef int (*sort_fn_t)(void *, void *);
 
 static int			alloc_flag;
 static int			caller_flag;
@@ -181,8 +181,8 @@ static int perf_evsel__process_alloc_node_event(struct perf_evsel *evsel,
 	return ret;
 }
 
-static int ptr_cmp(struct alloc_stat *, struct alloc_stat *);
-static int callsite_cmp(struct alloc_stat *, struct alloc_stat *);
+static int ptr_cmp(void *, void *);
+static int slab_callsite_cmp(void *, void *);
 
 static struct alloc_stat *search_alloc_stat(unsigned long ptr,
 					    unsigned long call_site,
@@ -223,7 +223,8 @@ static int perf_evsel__process_free_event(struct perf_evsel *evsel,
 		s_alloc->pingpong++;
 
 		s_caller = search_alloc_stat(0, s_alloc->call_site,
-					     &root_caller_stat, callsite_cmp);
+					     &root_caller_stat,
+					     slab_callsite_cmp);
 		if (!s_caller)
 			return -1;
 		s_caller->pingpong++;
@@ -397,6 +398,7 @@ static u64 find_callsite(struct perf_evsel *evsel, struct perf_sample *sample)
 	return sample->ip;
 }
 
+
 static struct page_stat *search_page(u64 page, bool create)
 {
 	struct rb_node **node = &page_tree.rb_node;
@@ -432,40 +434,35 @@ static struct page_stat *search_page(u64 page, bool create)
 	return data;
 }
 
-static int page_stat_cmp(struct page_stat *a, struct page_stat *b)
-{
-	if (a->page > b->page)
-		return -1;
-	if (a->page < b->page)
-		return 1;
-	if (a->order > b->order)
-		return -1;
-	if (a->order < b->order)
-		return 1;
-	if (a->migrate_type > b->migrate_type)
-		return -1;
-	if (a->migrate_type < b->migrate_type)
-		return 1;
-	if (a->gfp_flags > b->gfp_flags)
-		return -1;
-	if (a->gfp_flags < b->gfp_flags)
-		return 1;
-	return 0;
-}
+struct sort_dimension {
+	const char		name[20];
+	sort_fn_t		cmp;
+	struct list_head	list;
+};
 
-static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool create)
+static LIST_HEAD(page_alloc_sort_input);
+static LIST_HEAD(page_caller_sort_input);
+
+static struct page_stat *search_page_alloc_stat(struct page_stat *this,
+						bool create)
 {
 	struct rb_node **node = &page_alloc_tree.rb_node;
 	struct rb_node *parent = NULL;
 	struct page_stat *data;
+	struct sort_dimension *sort;
 
 	while (*node) {
-		s64 cmp;
+		int cmp = 0;
 
 		parent = *node;
 		data = rb_entry(*node, struct page_stat, node);
 
-		cmp = page_stat_cmp(data, stat);
+		list_for_each_entry(sort, &page_alloc_sort_input, list) {
+			cmp = sort->cmp(this, data);
+			if (cmp)
+				break;
+		}
+
 		if (cmp < 0)
 			node = &parent->rb_left;
 		else if (cmp > 0)
@@ -479,10 +476,10 @@ static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool cre
 
 	data = zalloc(sizeof(*data));
 	if (data != NULL) {
-		data->page = stat->page;
-		data->order = stat->order;
-		data->gfp_flags = stat->gfp_flags;
-		data->migrate_type = stat->migrate_type;
+		data->page = this->page;
+		data->order = this->order;
+		data->migrate_type = this->migrate_type;
+		data->gfp_flags = this->gfp_flags;
 
 		rb_link_node(&data->node, parent, node);
 		rb_insert_color(&data->node, &page_alloc_tree);
@@ -491,19 +488,26 @@ static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool cre
 	return data;
 }
 
-static struct page_stat *search_page_caller_stat(u64 callsite, bool create)
+static struct page_stat *search_page_caller_stat(struct page_stat *this,
+						 bool create)
 {
 	struct rb_node **node = &page_caller_tree.rb_node;
 	struct rb_node *parent = NULL;
 	struct page_stat *data;
+	struct sort_dimension *sort;
 
 	while (*node) {
-		s64 cmp;
+		int cmp = 0;
 
 		parent = *node;
 		data = rb_entry(*node, struct page_stat, node);
 
-		cmp = data->callsite - callsite;
+		list_for_each_entry(sort, &page_caller_sort_input, list) {
+			cmp = sort->cmp(this, data);
+			if (cmp)
+				break;
+		}
+
 		if (cmp < 0)
 			node = &parent->rb_left;
 		else if (cmp > 0)
@@ -517,7 +521,10 @@ static struct page_stat *search_page_caller_stat(u64 callsite, bool create)
 
 	data = zalloc(sizeof(*data));
 	if (data != NULL) {
-		data->callsite = callsite;
+		data->callsite = this->callsite;
+		data->order = this->order;
+		data->migrate_type = this->migrate_type;
+		data->gfp_flags = this->gfp_flags;
 
 		rb_link_node(&data->node, parent, node);
 		rb_insert_color(&data->node, &page_caller_tree);
@@ -591,14 +598,11 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 	stat->alloc_bytes += bytes;
 	stat->callsite = callsite;
 
-	stat = search_page_caller_stat(callsite, true);
+	this.callsite = callsite;
+	stat = search_page_caller_stat(&this, true);
 	if (stat == NULL)
 		return -ENOMEM;
 
-	stat->order = order;
-	stat->gfp_flags = gfp_flags;
-	stat->migrate_type = migrate_type;
-
 	stat->nr_alloc++;
 	stat->alloc_bytes += bytes;
 
@@ -652,7 +656,7 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
 	stat->nr_free++;
 	stat->free_bytes += bytes;
 
-	stat = search_page_caller_stat(this.callsite, false);
+	stat = search_page_caller_stat(&this, false);
 	if (stat == NULL)
 		return -ENOENT;
 
@@ -938,14 +942,10 @@ static void print_result(struct perf_session *session)
 		print_page_result(session);
 }
 
-struct sort_dimension {
-	const char		name[20];
-	sort_fn_t		cmp;
-	struct list_head	list;
-};
-
-static LIST_HEAD(caller_sort);
-static LIST_HEAD(alloc_sort);
+static LIST_HEAD(slab_caller_sort);
+static LIST_HEAD(slab_alloc_sort);
+static LIST_HEAD(page_caller_sort);
+static LIST_HEAD(page_alloc_sort);
 
 static void sort_slab_insert(struct rb_root *root, struct alloc_stat *data,
 			     struct list_head *sort_list)
@@ -994,10 +994,12 @@ static void __sort_slab_result(struct rb_root *root, struct rb_root *root_sorted
 	}
 }
 
-static void sort_page_insert(struct rb_root *root, struct page_stat *data)
+static void sort_page_insert(struct rb_root *root, struct page_stat *data,
+			     struct list_head *sort_list)
 {
 	struct rb_node **new = &root->rb_node;
 	struct rb_node *parent = NULL;
+	struct sort_dimension *sort;
 
 	while (*new) {
 		struct page_stat *this;
@@ -1006,8 +1008,11 @@ static void sort_page_insert(struct rb_root *root, struct page_stat *data)
 		this = rb_entry(*new, struct page_stat, node);
 		parent = *new;
 
-		/* TODO: support more sort key */
-		cmp = data->alloc_bytes - this->alloc_bytes;
+		list_for_each_entry(sort, sort_list, list) {
+			cmp = sort->cmp(data, this);
+			if (cmp)
+				break;
+		}
 
 		if (cmp > 0)
 			new = &parent->rb_left;
@@ -1019,7 +1024,8 @@ static void sort_page_insert(struct rb_root *root, struct page_stat *data)
 	rb_insert_color(&data->node, root);
 }
 
-static void __sort_page_result(struct rb_root *root, struct rb_root *root_sorted)
+static void __sort_page_result(struct rb_root *root, struct rb_root *root_sorted,
+			       struct list_head *sort_list)
 {
 	struct rb_node *node;
 	struct page_stat *data;
@@ -1031,7 +1037,7 @@ static void __sort_page_result(struct rb_root *root, struct rb_root *root_sorted
 
 		rb_erase(node, root);
 		data = rb_entry(node, struct page_stat, node);
-		sort_page_insert(root_sorted, data);
+		sort_page_insert(root_sorted, data, sort_list);
 	}
 }
 
@@ -1039,13 +1045,15 @@ static void sort_result(void)
 {
 	if (kmem_slab) {
 		__sort_slab_result(&root_alloc_stat, &root_alloc_sorted,
-				   &alloc_sort);
+				   &slab_alloc_sort);
 		__sort_slab_result(&root_caller_stat, &root_caller_sorted,
-				   &caller_sort);
+				   &slab_caller_sort);
 	}
 	if (kmem_page) {
-		__sort_page_result(&page_alloc_tree, &page_alloc_sorted);
-		__sort_page_result(&page_caller_tree, &page_caller_sorted);
+		__sort_page_result(&page_alloc_tree, &page_alloc_sorted,
+				   &page_alloc_sort);
+		__sort_page_result(&page_caller_tree, &page_caller_sorted,
+				   &page_caller_sort);
 	}
 }
 
@@ -1094,8 +1102,12 @@ out:
 	return err;
 }
 
-static int ptr_cmp(struct alloc_stat *l, struct alloc_stat *r)
+/* slab sort keys */
+static int ptr_cmp(void *a, void *b)
 {
+	struct alloc_stat *l = a;
+	struct alloc_stat *r = b;
+
 	if (l->ptr < r->ptr)
 		return -1;
 	else if (l->ptr > r->ptr)
@@ -1108,8 +1120,11 @@ static struct sort_dimension ptr_sort_dimension = {
 	.cmp	= ptr_cmp,
 };
 
-static int callsite_cmp(struct alloc_stat *l, struct alloc_stat *r)
+static int slab_callsite_cmp(void *a, void *b)
 {
+	struct alloc_stat *l = a;
+	struct alloc_stat *r = b;
+
 	if (l->call_site < r->call_site)
 		return -1;
 	else if (l->call_site > r->call_site)
@@ -1119,11 +1134,14 @@ static int callsite_cmp(struct alloc_stat *l, struct alloc_stat *r)
 
 static struct sort_dimension callsite_sort_dimension = {
 	.name	= "callsite",
-	.cmp	= callsite_cmp,
+	.cmp	= slab_callsite_cmp,
 };
 
-static int hit_cmp(struct alloc_stat *l, struct alloc_stat *r)
+static int hit_cmp(void *a, void *b)
 {
+	struct alloc_stat *l = a;
+	struct alloc_stat *r = b;
+
 	if (l->hit < r->hit)
 		return -1;
 	else if (l->hit > r->hit)
@@ -1136,8 +1154,11 @@ static struct sort_dimension hit_sort_dimension = {
 	.cmp	= hit_cmp,
 };
 
-static int bytes_cmp(struct alloc_stat *l, struct alloc_stat *r)
+static int bytes_cmp(void *a, void *b)
 {
+	struct alloc_stat *l = a;
+	struct alloc_stat *r = b;
+
 	if (l->bytes_alloc < r->bytes_alloc)
 		return -1;
 	else if (l->bytes_alloc > r->bytes_alloc)
@@ -1150,9 +1171,11 @@ static struct sort_dimension bytes_sort_dimension = {
 	.cmp	= bytes_cmp,
 };
 
-static int frag_cmp(struct alloc_stat *l, struct alloc_stat *r)
+static int frag_cmp(void *a, void *b)
 {
 	double x, y;
+	struct alloc_stat *l = a;
+	struct alloc_stat *r = b;
 
 	x = fragmentation(l->bytes_req, l->bytes_alloc);
 	y = fragmentation(r->bytes_req, r->bytes_alloc);
@@ -1169,8 +1192,11 @@ static struct sort_dimension frag_sort_dimension = {
 	.cmp	= frag_cmp,
 };
 
-static int pingpong_cmp(struct alloc_stat *l, struct alloc_stat *r)
+static int pingpong_cmp(void *a, void *b)
 {
+	struct alloc_stat *l = a;
+	struct alloc_stat *r = b;
+
 	if (l->pingpong < r->pingpong)
 		return -1;
 	else if (l->pingpong > r->pingpong)
@@ -1183,7 +1209,135 @@ static struct sort_dimension pingpong_sort_dimension = {
 	.cmp	= pingpong_cmp,
 };
 
-static struct sort_dimension *avail_sorts[] = {
+/* page sort keys */
+static int page_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	if (l->page < r->page)
+		return -1;
+	else if (l->page > r->page)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension page_sort_dimension = {
+	.name	= "page",
+	.cmp	= page_cmp,
+};
+
+static int page_callsite_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	if (l->callsite < r->callsite)
+		return -1;
+	else if (l->callsite > r->callsite)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension page_callsite_sort_dimension = {
+	.name	= "callsite",
+	.cmp	= page_callsite_cmp,
+};
+
+static int page_hit_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	if (l->nr_alloc < r->nr_alloc)
+		return -1;
+	else if (l->nr_alloc > r->nr_alloc)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension page_hit_sort_dimension = {
+	.name	= "hit",
+	.cmp	= page_hit_cmp,
+};
+
+static int page_bytes_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	if (l->alloc_bytes < r->alloc_bytes)
+		return -1;
+	else if (l->alloc_bytes > r->alloc_bytes)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension page_bytes_sort_dimension = {
+	.name	= "bytes",
+	.cmp	= page_bytes_cmp,
+};
+
+static int page_order_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	if (l->order < r->order)
+		return -1;
+	else if (l->order > r->order)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension page_order_sort_dimension = {
+	.name	= "order",
+	.cmp	= page_order_cmp,
+};
+
+static int migrate_type_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	/* for internal use to find free'd page */
+	if (l->migrate_type == -1U)
+		return 0;
+
+	if (l->migrate_type < r->migrate_type)
+		return -1;
+	else if (l->migrate_type > r->migrate_type)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension migrate_type_sort_dimension = {
+	.name	= "migtype",
+	.cmp	= migrate_type_cmp,
+};
+
+static int gfp_flags_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	/* for internal use to find free'd page */
+	if (l->gfp_flags == -1U)
+		return 0;
+
+	if (l->gfp_flags < r->gfp_flags)
+		return -1;
+	else if (l->gfp_flags > r->gfp_flags)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension gfp_flags_sort_dimension = {
+	.name	= "gfp",
+	.cmp	= gfp_flags_cmp,
+};
+
+static struct sort_dimension *slab_sorts[] = {
 	&ptr_sort_dimension,
 	&callsite_sort_dimension,
 	&hit_sort_dimension,
@@ -1192,16 +1346,44 @@ static struct sort_dimension *avail_sorts[] = {
 	&pingpong_sort_dimension,
 };
 
-#define NUM_AVAIL_SORTS	((int)ARRAY_SIZE(avail_sorts))
+static struct sort_dimension *page_sorts[] = {
+	&page_sort_dimension,
+	&page_callsite_sort_dimension,
+	&page_hit_sort_dimension,
+	&page_bytes_sort_dimension,
+	&page_order_sort_dimension,
+	&migrate_type_sort_dimension,
+	&gfp_flags_sort_dimension,
+};
+
+static int slab_sort_dimension__add(const char *tok, struct list_head *list)
+{
+	struct sort_dimension *sort;
+	int i;
+
+	for (i = 0; i < (int)ARRAY_SIZE(slab_sorts); i++) {
+		if (!strcmp(slab_sorts[i]->name, tok)) {
+			sort = memdup(slab_sorts[i], sizeof(*slab_sorts[i]));
+			if (!sort) {
+				pr_err("%s: memdup failed\n", __func__);
+				return -1;
+			}
+			list_add_tail(&sort->list, list);
+			return 0;
+		}
+	}
+
+	return -1;
+}
 
-static int sort_dimension__add(const char *tok, struct list_head *list)
+static int page_sort_dimension__add(const char *tok, struct list_head *list)
 {
 	struct sort_dimension *sort;
 	int i;
 
-	for (i = 0; i < NUM_AVAIL_SORTS; i++) {
-		if (!strcmp(avail_sorts[i]->name, tok)) {
-			sort = memdup(avail_sorts[i], sizeof(*avail_sorts[i]));
+	for (i = 0; i < (int)ARRAY_SIZE(page_sorts); i++) {
+		if (!strcmp(page_sorts[i]->name, tok)) {
+			sort = memdup(page_sorts[i], sizeof(*page_sorts[i]));
 			if (!sort) {
 				pr_err("%s: memdup failed\n", __func__);
 				return -1;
@@ -1214,7 +1396,7 @@ static int sort_dimension__add(const char *tok, struct list_head *list)
 	return -1;
 }
 
-static int setup_sorting(struct list_head *sort_list, const char *arg)
+static int setup_slab_sorting(struct list_head *sort_list, const char *arg)
 {
 	char *tok;
 	char *str = strdup(arg);
@@ -1229,8 +1411,34 @@ static int setup_sorting(struct list_head *sort_list, const char *arg)
 		tok = strsep(&pos, ",");
 		if (!tok)
 			break;
-		if (sort_dimension__add(tok, sort_list) < 0) {
-			error("Unknown --sort key: '%s'", tok);
+		if (slab_sort_dimension__add(tok, sort_list) < 0) {
+			error("Unknown slab --sort key: '%s'", tok);
+			free(str);
+			return -1;
+		}
+	}
+
+	free(str);
+	return 0;
+}
+
+static int setup_page_sorting(struct list_head *sort_list, const char *arg)
+{
+	char *tok;
+	char *str = strdup(arg);
+	char *pos = str;
+
+	if (!str) {
+		pr_err("%s: strdup failed\n", __func__);
+		return -1;
+	}
+
+	while (true) {
+		tok = strsep(&pos, ",");
+		if (!tok)
+			break;
+		if (page_sort_dimension__add(tok, sort_list) < 0) {
+			error("Unknown page --sort key: '%s'", tok);
 			free(str);
 			return -1;
 		}
@@ -1246,10 +1454,17 @@ static int parse_sort_opt(const struct option *opt __maybe_unused,
 	if (!arg)
 		return -1;
 
-	if (caller_flag > alloc_flag)
-		return setup_sorting(&caller_sort, arg);
-	else
-		return setup_sorting(&alloc_sort, arg);
+	if (kmem_page > kmem_slab) {
+		if (caller_flag > alloc_flag)
+			return setup_page_sorting(&page_caller_sort, arg);
+		else
+			return setup_page_sorting(&page_alloc_sort, arg);
+	} else {
+		if (caller_flag > alloc_flag)
+			return setup_slab_sorting(&slab_caller_sort, arg);
+		else
+			return setup_slab_sorting(&slab_alloc_sort, arg);
+	}
 
 	return 0;
 }
@@ -1357,10 +1572,11 @@ static int __cmd_record(int argc, const char **argv)
 
 int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 {
-	const char * const default_sort_order = "frag,hit,bytes";
 	struct perf_data_file file = {
 		.mode = PERF_DATA_MODE_READ,
 	};
+	const char * const default_slab_sort = "frag,hit,bytes";
+	const char * const default_page_sort = "bytes,hit";
 	const struct option kmem_options[] = {
 	OPT_STRING('i', "input", &input_name, "file", "input file name"),
 	OPT_INCR('v', "verbose", &verbose,
@@ -1370,8 +1586,8 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_CALLBACK_NOOPT(0, "alloc", NULL, NULL,
 			   "show per-allocation statistics", parse_alloc_opt),
 	OPT_CALLBACK('s', "sort", NULL, "key[,key2...]",
-		     "sort by keys: ptr, call_site, bytes, hit, pingpong, frag",
-		     parse_sort_opt),
+		     "sort by keys: ptr, callsite, bytes, hit, pingpong, frag, "
+		     "page, order, migtype, gfp", parse_sort_opt),
 	OPT_CALLBACK('l', "line", NULL, "num", "show n lines", parse_line_opt),
 	OPT_BOOLEAN(0, "raw-ip", &raw_ip, "show raw ip instead of symbol"),
 	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
@@ -1429,11 +1645,21 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 		if (cpu__setup_cpunode_map())
 			goto out_delete;
 
-		if (list_empty(&caller_sort))
-			setup_sorting(&caller_sort, default_sort_order);
-		if (list_empty(&alloc_sort))
-			setup_sorting(&alloc_sort, default_sort_order);
-
+		if (list_empty(&slab_caller_sort))
+			setup_slab_sorting(&slab_caller_sort, default_slab_sort);
+		if (list_empty(&slab_alloc_sort))
+			setup_slab_sorting(&slab_alloc_sort, default_slab_sort);
+		if (list_empty(&page_caller_sort))
+			setup_page_sorting(&page_caller_sort, default_page_sort);
+		if (list_empty(&page_alloc_sort))
+			setup_page_sorting(&page_alloc_sort, default_page_sort);
+
+		if (kmem_page) {
+			setup_page_sorting(&page_alloc_sort_input,
+					   "page,order,migtype,gfp");
+			setup_page_sorting(&page_caller_sort_input,
+					   "callsite,order,migtype,gfp");
+		}
 		ret = __cmd_kmem(session);
 	} else
 		usage_with_options(kmem_usage, kmem_options);
-- 
2.3.2



* [PATCH 5/9] perf kmem: Support sort keys on page analysis
@ 2015-04-06  5:36   ` Namhyung Kim
  0 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-06  5:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Add new sort keys for page: page, order, migtype, gfp - the existing
'bytes', 'hit' and 'callsite' sort keys also work for page.  Note that
the -s/--sort option should be preceded by either the --slab or --page
option to determine which mode the sort keys apply to.

Now it properly groups and sorts allocation stats, so the same
page/caller with a different order/migtype/gfp will be printed on a
separate line.

  # perf kmem stat --page --caller -l 10 -s order,hit

  --------------------------------------------------------------------------------------------
   Total alloc (KB) |  Hits     | Order | Mig.type | GFP flags | Callsite
  --------------------------------------------------------------------------------------------
                 64 |         4 |     2 |  RECLAIM |  00285250 | new_slab
             50,144 |    12,536 |     0 |  MOVABLE |  0102005a | __page_cache_alloc
                 52 |        13 |     0 | UNMOVABL |  002084d0 | pte_alloc_one
                 40 |        10 |     0 |  MOVABLE |  000280da | handle_mm_fault
                 28 |         7 |     0 | UNMOVABL |  000000d0 | __pollwait
                 20 |         5 |     0 |  MOVABLE |  000200da | do_wp_page
                 20 |         5 |     0 |  MOVABLE |  000200da | do_cow_fault
                 16 |         4 |     0 | UNMOVABL |  00000200 | __tlb_remove_page
                 16 |         4 |     0 | UNMOVABL |  000084d0 | __pmd_alloc
                  8 |         2 |     0 | UNMOVABL |  000084d0 | __pud_alloc
   ...              | ...       | ...   | ...      | ...       | ...
  --------------------------------------------------------------------------------------------

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-kmem.txt |   6 +-
 tools/perf/builtin-kmem.c              | 398 ++++++++++++++++++++++++++-------
 2 files changed, 317 insertions(+), 87 deletions(-)

diff --git a/tools/perf/Documentation/perf-kmem.txt b/tools/perf/Documentation/perf-kmem.txt
index 23219c65c16f..69e181272c51 100644
--- a/tools/perf/Documentation/perf-kmem.txt
+++ b/tools/perf/Documentation/perf-kmem.txt
@@ -37,7 +37,11 @@ OPTIONS
 
 -s <key[,key2...]>::
 --sort=<key[,key2...]>::
-	Sort the output (default: frag,hit,bytes)
+	Sort the output (default: 'frag,hit,bytes' for slab and 'bytes,hit'
+	for page).  Available sort keys are 'ptr, callsite, bytes, hit,
+	pingpong, frag' for slab and 'page, callsite, bytes, hit, order,
+	migtype, gfp' for page.  This option should be preceded by one of the
+	mode selection options - i.e. --slab, --page, --alloc and/or --caller.
 
 -l <num>::
 --line=<num>::
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 5b3ed17c293a..719aaf782116 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -30,7 +30,7 @@ static int	kmem_page;
 static long	kmem_page_size;
 
 struct alloc_stat;
-typedef int (*sort_fn_t)(struct alloc_stat *, struct alloc_stat *);
+typedef int (*sort_fn_t)(void *, void *);
 
 static int			alloc_flag;
 static int			caller_flag;
@@ -181,8 +181,8 @@ static int perf_evsel__process_alloc_node_event(struct perf_evsel *evsel,
 	return ret;
 }
 
-static int ptr_cmp(struct alloc_stat *, struct alloc_stat *);
-static int callsite_cmp(struct alloc_stat *, struct alloc_stat *);
+static int ptr_cmp(void *, void *);
+static int slab_callsite_cmp(void *, void *);
 
 static struct alloc_stat *search_alloc_stat(unsigned long ptr,
 					    unsigned long call_site,
@@ -223,7 +223,8 @@ static int perf_evsel__process_free_event(struct perf_evsel *evsel,
 		s_alloc->pingpong++;
 
 		s_caller = search_alloc_stat(0, s_alloc->call_site,
-					     &root_caller_stat, callsite_cmp);
+					     &root_caller_stat,
+					     slab_callsite_cmp);
 		if (!s_caller)
 			return -1;
 		s_caller->pingpong++;
@@ -397,6 +398,7 @@ static u64 find_callsite(struct perf_evsel *evsel, struct perf_sample *sample)
 	return sample->ip;
 }
 
+
 static struct page_stat *search_page(u64 page, bool create)
 {
 	struct rb_node **node = &page_tree.rb_node;
@@ -432,40 +434,35 @@ static struct page_stat *search_page(u64 page, bool create)
 	return data;
 }
 
-static int page_stat_cmp(struct page_stat *a, struct page_stat *b)
-{
-	if (a->page > b->page)
-		return -1;
-	if (a->page < b->page)
-		return 1;
-	if (a->order > b->order)
-		return -1;
-	if (a->order < b->order)
-		return 1;
-	if (a->migrate_type > b->migrate_type)
-		return -1;
-	if (a->migrate_type < b->migrate_type)
-		return 1;
-	if (a->gfp_flags > b->gfp_flags)
-		return -1;
-	if (a->gfp_flags < b->gfp_flags)
-		return 1;
-	return 0;
-}
+struct sort_dimension {
+	const char		name[20];
+	sort_fn_t		cmp;
+	struct list_head	list;
+};
 
-static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool create)
+static LIST_HEAD(page_alloc_sort_input);
+static LIST_HEAD(page_caller_sort_input);
+
+static struct page_stat *search_page_alloc_stat(struct page_stat *this,
+						bool create)
 {
 	struct rb_node **node = &page_alloc_tree.rb_node;
 	struct rb_node *parent = NULL;
 	struct page_stat *data;
+	struct sort_dimension *sort;
 
 	while (*node) {
-		s64 cmp;
+		int cmp = 0;
 
 		parent = *node;
 		data = rb_entry(*node, struct page_stat, node);
 
-		cmp = page_stat_cmp(data, stat);
+		list_for_each_entry(sort, &page_alloc_sort_input, list) {
+			cmp = sort->cmp(this, data);
+			if (cmp)
+				break;
+		}
+
 		if (cmp < 0)
 			node = &parent->rb_left;
 		else if (cmp > 0)
@@ -479,10 +476,10 @@ static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool cre
 
 	data = zalloc(sizeof(*data));
 	if (data != NULL) {
-		data->page = stat->page;
-		data->order = stat->order;
-		data->gfp_flags = stat->gfp_flags;
-		data->migrate_type = stat->migrate_type;
+		data->page = this->page;
+		data->order = this->order;
+		data->migrate_type = this->migrate_type;
+		data->gfp_flags = this->gfp_flags;
 
 		rb_link_node(&data->node, parent, node);
 		rb_insert_color(&data->node, &page_alloc_tree);
@@ -491,19 +488,26 @@ static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool cre
 	return data;
 }
 
-static struct page_stat *search_page_caller_stat(u64 callsite, bool create)
+static struct page_stat *search_page_caller_stat(struct page_stat *this,
+						 bool create)
 {
 	struct rb_node **node = &page_caller_tree.rb_node;
 	struct rb_node *parent = NULL;
 	struct page_stat *data;
+	struct sort_dimension *sort;
 
 	while (*node) {
-		s64 cmp;
+		int cmp = 0;
 
 		parent = *node;
 		data = rb_entry(*node, struct page_stat, node);
 
-		cmp = data->callsite - callsite;
+		list_for_each_entry(sort, &page_caller_sort_input, list) {
+			cmp = sort->cmp(this, data);
+			if (cmp)
+				break;
+		}
+
 		if (cmp < 0)
 			node = &parent->rb_left;
 		else if (cmp > 0)
@@ -517,7 +521,10 @@ static struct page_stat *search_page_caller_stat(u64 callsite, bool create)
 
 	data = zalloc(sizeof(*data));
 	if (data != NULL) {
-		data->callsite = callsite;
+		data->callsite = this->callsite;
+		data->order = this->order;
+		data->migrate_type = this->migrate_type;
+		data->gfp_flags = this->gfp_flags;
 
 		rb_link_node(&data->node, parent, node);
 		rb_insert_color(&data->node, &page_caller_tree);
@@ -591,14 +598,11 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 	stat->alloc_bytes += bytes;
 	stat->callsite = callsite;
 
-	stat = search_page_caller_stat(callsite, true);
+	this.callsite = callsite;
+	stat = search_page_caller_stat(&this, true);
 	if (stat == NULL)
 		return -ENOMEM;
 
-	stat->order = order;
-	stat->gfp_flags = gfp_flags;
-	stat->migrate_type = migrate_type;
-
 	stat->nr_alloc++;
 	stat->alloc_bytes += bytes;
 
@@ -652,7 +656,7 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
 	stat->nr_free++;
 	stat->free_bytes += bytes;
 
-	stat = search_page_caller_stat(this.callsite, false);
+	stat = search_page_caller_stat(&this, false);
 	if (stat == NULL)
 		return -ENOENT;
 
@@ -938,14 +942,10 @@ static void print_result(struct perf_session *session)
 		print_page_result(session);
 }
 
-struct sort_dimension {
-	const char		name[20];
-	sort_fn_t		cmp;
-	struct list_head	list;
-};
-
-static LIST_HEAD(caller_sort);
-static LIST_HEAD(alloc_sort);
+static LIST_HEAD(slab_caller_sort);
+static LIST_HEAD(slab_alloc_sort);
+static LIST_HEAD(page_caller_sort);
+static LIST_HEAD(page_alloc_sort);
 
 static void sort_slab_insert(struct rb_root *root, struct alloc_stat *data,
 			     struct list_head *sort_list)
@@ -994,10 +994,12 @@ static void __sort_slab_result(struct rb_root *root, struct rb_root *root_sorted
 	}
 }
 
-static void sort_page_insert(struct rb_root *root, struct page_stat *data)
+static void sort_page_insert(struct rb_root *root, struct page_stat *data,
+			     struct list_head *sort_list)
 {
 	struct rb_node **new = &root->rb_node;
 	struct rb_node *parent = NULL;
+	struct sort_dimension *sort;
 
 	while (*new) {
 		struct page_stat *this;
@@ -1006,8 +1008,11 @@ static void sort_page_insert(struct rb_root *root, struct page_stat *data)
 		this = rb_entry(*new, struct page_stat, node);
 		parent = *new;
 
-		/* TODO: support more sort key */
-		cmp = data->alloc_bytes - this->alloc_bytes;
+		list_for_each_entry(sort, sort_list, list) {
+			cmp = sort->cmp(data, this);
+			if (cmp)
+				break;
+		}
 
 		if (cmp > 0)
 			new = &parent->rb_left;
@@ -1019,7 +1024,8 @@ static void sort_page_insert(struct rb_root *root, struct page_stat *data)
 	rb_insert_color(&data->node, root);
 }
 
-static void __sort_page_result(struct rb_root *root, struct rb_root *root_sorted)
+static void __sort_page_result(struct rb_root *root, struct rb_root *root_sorted,
+			       struct list_head *sort_list)
 {
 	struct rb_node *node;
 	struct page_stat *data;
@@ -1031,7 +1037,7 @@ static void __sort_page_result(struct rb_root *root, struct rb_root *root_sorted
 
 		rb_erase(node, root);
 		data = rb_entry(node, struct page_stat, node);
-		sort_page_insert(root_sorted, data);
+		sort_page_insert(root_sorted, data, sort_list);
 	}
 }
 
@@ -1039,13 +1045,15 @@ static void sort_result(void)
 {
 	if (kmem_slab) {
 		__sort_slab_result(&root_alloc_stat, &root_alloc_sorted,
-				   &alloc_sort);
+				   &slab_alloc_sort);
 		__sort_slab_result(&root_caller_stat, &root_caller_sorted,
-				   &caller_sort);
+				   &slab_caller_sort);
 	}
 	if (kmem_page) {
-		__sort_page_result(&page_alloc_tree, &page_alloc_sorted);
-		__sort_page_result(&page_caller_tree, &page_caller_sorted);
+		__sort_page_result(&page_alloc_tree, &page_alloc_sorted,
+				   &page_alloc_sort);
+		__sort_page_result(&page_caller_tree, &page_caller_sorted,
+				   &page_caller_sort);
 	}
 }
 
@@ -1094,8 +1102,12 @@ out:
 	return err;
 }
 
-static int ptr_cmp(struct alloc_stat *l, struct alloc_stat *r)
+/* slab sort keys */
+static int ptr_cmp(void *a, void *b)
 {
+	struct alloc_stat *l = a;
+	struct alloc_stat *r = b;
+
 	if (l->ptr < r->ptr)
 		return -1;
 	else if (l->ptr > r->ptr)
@@ -1108,8 +1120,11 @@ static struct sort_dimension ptr_sort_dimension = {
 	.cmp	= ptr_cmp,
 };
 
-static int callsite_cmp(struct alloc_stat *l, struct alloc_stat *r)
+static int slab_callsite_cmp(void *a, void *b)
 {
+	struct alloc_stat *l = a;
+	struct alloc_stat *r = b;
+
 	if (l->call_site < r->call_site)
 		return -1;
 	else if (l->call_site > r->call_site)
@@ -1119,11 +1134,14 @@ static int callsite_cmp(struct alloc_stat *l, struct alloc_stat *r)
 
 static struct sort_dimension callsite_sort_dimension = {
 	.name	= "callsite",
-	.cmp	= callsite_cmp,
+	.cmp	= slab_callsite_cmp,
 };
 
-static int hit_cmp(struct alloc_stat *l, struct alloc_stat *r)
+static int hit_cmp(void *a, void *b)
 {
+	struct alloc_stat *l = a;
+	struct alloc_stat *r = b;
+
 	if (l->hit < r->hit)
 		return -1;
 	else if (l->hit > r->hit)
@@ -1136,8 +1154,11 @@ static struct sort_dimension hit_sort_dimension = {
 	.cmp	= hit_cmp,
 };
 
-static int bytes_cmp(struct alloc_stat *l, struct alloc_stat *r)
+static int bytes_cmp(void *a, void *b)
 {
+	struct alloc_stat *l = a;
+	struct alloc_stat *r = b;
+
 	if (l->bytes_alloc < r->bytes_alloc)
 		return -1;
 	else if (l->bytes_alloc > r->bytes_alloc)
@@ -1150,9 +1171,11 @@ static struct sort_dimension bytes_sort_dimension = {
 	.cmp	= bytes_cmp,
 };
 
-static int frag_cmp(struct alloc_stat *l, struct alloc_stat *r)
+static int frag_cmp(void *a, void *b)
 {
 	double x, y;
+	struct alloc_stat *l = a;
+	struct alloc_stat *r = b;
 
 	x = fragmentation(l->bytes_req, l->bytes_alloc);
 	y = fragmentation(r->bytes_req, r->bytes_alloc);
@@ -1169,8 +1192,11 @@ static struct sort_dimension frag_sort_dimension = {
 	.cmp	= frag_cmp,
 };
 
-static int pingpong_cmp(struct alloc_stat *l, struct alloc_stat *r)
+static int pingpong_cmp(void *a, void *b)
 {
+	struct alloc_stat *l = a;
+	struct alloc_stat *r = b;
+
 	if (l->pingpong < r->pingpong)
 		return -1;
 	else if (l->pingpong > r->pingpong)
@@ -1183,7 +1209,135 @@ static struct sort_dimension pingpong_sort_dimension = {
 	.cmp	= pingpong_cmp,
 };
 
-static struct sort_dimension *avail_sorts[] = {
+/* page sort keys */
+static int page_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	if (l->page < r->page)
+		return -1;
+	else if (l->page > r->page)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension page_sort_dimension = {
+	.name	= "page",
+	.cmp	= page_cmp,
+};
+
+static int page_callsite_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	if (l->callsite < r->callsite)
+		return -1;
+	else if (l->callsite > r->callsite)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension page_callsite_sort_dimension = {
+	.name	= "callsite",
+	.cmp	= page_callsite_cmp,
+};
+
+static int page_hit_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	if (l->nr_alloc < r->nr_alloc)
+		return -1;
+	else if (l->nr_alloc > r->nr_alloc)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension page_hit_sort_dimension = {
+	.name	= "hit",
+	.cmp	= page_hit_cmp,
+};
+
+static int page_bytes_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	if (l->alloc_bytes < r->alloc_bytes)
+		return -1;
+	else if (l->alloc_bytes > r->alloc_bytes)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension page_bytes_sort_dimension = {
+	.name	= "bytes",
+	.cmp	= page_bytes_cmp,
+};
+
+static int page_order_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	if (l->order < r->order)
+		return -1;
+	else if (l->order > r->order)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension page_order_sort_dimension = {
+	.name	= "order",
+	.cmp	= page_order_cmp,
+};
+
+static int migrate_type_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	/* for internal use to find free'd page */
+	if (l->migrate_type == -1U)
+		return 0;
+
+	if (l->migrate_type < r->migrate_type)
+		return -1;
+	else if (l->migrate_type > r->migrate_type)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension migrate_type_sort_dimension = {
+	.name	= "migtype",
+	.cmp	= migrate_type_cmp,
+};
+
+static int gfp_flags_cmp(void *a, void *b)
+{
+	struct page_stat *l = a;
+	struct page_stat *r = b;
+
+	/* for internal use to find free'd page */
+	if (l->gfp_flags == -1U)
+		return 0;
+
+	if (l->gfp_flags < r->gfp_flags)
+		return -1;
+	else if (l->gfp_flags > r->gfp_flags)
+		return 1;
+	return 0;
+}
+
+static struct sort_dimension gfp_flags_sort_dimension = {
+	.name	= "gfp",
+	.cmp	= gfp_flags_cmp,
+};
+
+static struct sort_dimension *slab_sorts[] = {
 	&ptr_sort_dimension,
 	&callsite_sort_dimension,
 	&hit_sort_dimension,
@@ -1192,16 +1346,44 @@ static struct sort_dimension *avail_sorts[] = {
 	&pingpong_sort_dimension,
 };
 
-#define NUM_AVAIL_SORTS	((int)ARRAY_SIZE(avail_sorts))
+static struct sort_dimension *page_sorts[] = {
+	&page_sort_dimension,
+	&page_callsite_sort_dimension,
+	&page_hit_sort_dimension,
+	&page_bytes_sort_dimension,
+	&page_order_sort_dimension,
+	&migrate_type_sort_dimension,
+	&gfp_flags_sort_dimension,
+};
+
+static int slab_sort_dimension__add(const char *tok, struct list_head *list)
+{
+	struct sort_dimension *sort;
+	int i;
+
+	for (i = 0; i < (int)ARRAY_SIZE(slab_sorts); i++) {
+		if (!strcmp(slab_sorts[i]->name, tok)) {
+			sort = memdup(slab_sorts[i], sizeof(*slab_sorts[i]));
+			if (!sort) {
+				pr_err("%s: memdup failed\n", __func__);
+				return -1;
+			}
+			list_add_tail(&sort->list, list);
+			return 0;
+		}
+	}
+
+	return -1;
+}
 
-static int sort_dimension__add(const char *tok, struct list_head *list)
+static int page_sort_dimension__add(const char *tok, struct list_head *list)
 {
 	struct sort_dimension *sort;
 	int i;
 
-	for (i = 0; i < NUM_AVAIL_SORTS; i++) {
-		if (!strcmp(avail_sorts[i]->name, tok)) {
-			sort = memdup(avail_sorts[i], sizeof(*avail_sorts[i]));
+	for (i = 0; i < (int)ARRAY_SIZE(page_sorts); i++) {
+		if (!strcmp(page_sorts[i]->name, tok)) {
+			sort = memdup(page_sorts[i], sizeof(*page_sorts[i]));
 			if (!sort) {
 				pr_err("%s: memdup failed\n", __func__);
 				return -1;
@@ -1214,7 +1396,7 @@ static int sort_dimension__add(const char *tok, struct list_head *list)
 	return -1;
 }
 
-static int setup_sorting(struct list_head *sort_list, const char *arg)
+static int setup_slab_sorting(struct list_head *sort_list, const char *arg)
 {
 	char *tok;
 	char *str = strdup(arg);
@@ -1229,8 +1411,34 @@ static int setup_sorting(struct list_head *sort_list, const char *arg)
 		tok = strsep(&pos, ",");
 		if (!tok)
 			break;
-		if (sort_dimension__add(tok, sort_list) < 0) {
-			error("Unknown --sort key: '%s'", tok);
+		if (slab_sort_dimension__add(tok, sort_list) < 0) {
+			error("Unknown slab --sort key: '%s'", tok);
+			free(str);
+			return -1;
+		}
+	}
+
+	free(str);
+	return 0;
+}
+
+static int setup_page_sorting(struct list_head *sort_list, const char *arg)
+{
+	char *tok;
+	char *str = strdup(arg);
+	char *pos = str;
+
+	if (!str) {
+		pr_err("%s: strdup failed\n", __func__);
+		return -1;
+	}
+
+	while (true) {
+		tok = strsep(&pos, ",");
+		if (!tok)
+			break;
+		if (page_sort_dimension__add(tok, sort_list) < 0) {
+			error("Unknown page --sort key: '%s'", tok);
 			free(str);
 			return -1;
 		}
@@ -1246,10 +1454,17 @@ static int parse_sort_opt(const struct option *opt __maybe_unused,
 	if (!arg)
 		return -1;
 
-	if (caller_flag > alloc_flag)
-		return setup_sorting(&caller_sort, arg);
-	else
-		return setup_sorting(&alloc_sort, arg);
+	if (kmem_page > kmem_slab) {
+		if (caller_flag > alloc_flag)
+			return setup_page_sorting(&page_caller_sort, arg);
+		else
+			return setup_page_sorting(&page_alloc_sort, arg);
+	} else {
+		if (caller_flag > alloc_flag)
+			return setup_slab_sorting(&slab_caller_sort, arg);
+		else
+			return setup_slab_sorting(&slab_alloc_sort, arg);
+	}
 
 	return 0;
 }
@@ -1357,10 +1572,11 @@ static int __cmd_record(int argc, const char **argv)
 
 int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 {
-	const char * const default_sort_order = "frag,hit,bytes";
 	struct perf_data_file file = {
 		.mode = PERF_DATA_MODE_READ,
 	};
+	const char * const default_slab_sort = "frag,hit,bytes";
+	const char * const default_page_sort = "bytes,hit";
 	const struct option kmem_options[] = {
 	OPT_STRING('i', "input", &input_name, "file", "input file name"),
 	OPT_INCR('v', "verbose", &verbose,
@@ -1370,8 +1586,8 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_CALLBACK_NOOPT(0, "alloc", NULL, NULL,
 			   "show per-allocation statistics", parse_alloc_opt),
 	OPT_CALLBACK('s', "sort", NULL, "key[,key2...]",
-		     "sort by keys: ptr, call_site, bytes, hit, pingpong, frag",
-		     parse_sort_opt),
+		     "sort by keys: ptr, callsite, bytes, hit, pingpong, frag, "
+		     "page, order, migtype, gfp", parse_sort_opt),
 	OPT_CALLBACK('l', "line", NULL, "num", "show n lines", parse_line_opt),
 	OPT_BOOLEAN(0, "raw-ip", &raw_ip, "show raw ip instead of symbol"),
 	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
@@ -1429,11 +1645,21 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 		if (cpu__setup_cpunode_map())
 			goto out_delete;
 
-		if (list_empty(&caller_sort))
-			setup_sorting(&caller_sort, default_sort_order);
-		if (list_empty(&alloc_sort))
-			setup_sorting(&alloc_sort, default_sort_order);
-
+		if (list_empty(&slab_caller_sort))
+			setup_slab_sorting(&slab_caller_sort, default_slab_sort);
+		if (list_empty(&slab_alloc_sort))
+			setup_slab_sorting(&slab_alloc_sort, default_slab_sort);
+		if (list_empty(&page_caller_sort))
+			setup_page_sorting(&page_caller_sort, default_page_sort);
+		if (list_empty(&page_alloc_sort))
+			setup_page_sorting(&page_alloc_sort, default_page_sort);
+
+		if (kmem_page) {
+			setup_page_sorting(&page_alloc_sort_input,
+					   "page,order,migtype,gfp");
+			setup_page_sorting(&page_caller_sort_input,
+					   "callsite,order,migtype,gfp");
+		}
 		ret = __cmd_kmem(session);
 	} else
 		usage_with_options(kmem_usage, kmem_options);
-- 
2.3.2



* [PATCH 6/9] perf kmem: Add --live option for current allocation stat
  2015-04-06  5:36 ` Namhyung Kim
@ 2015-04-06  5:36   ` Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-06  5:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Currently perf kmem shows the total (page) allocation stat by default,
but sometimes one might want to see only live (currently allocated)
requests/pages.  The new --live option does this by subtracting freed
allocations from the stat.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-kmem.txt |   5 ++
 tools/perf/builtin-kmem.c              | 103 ++++++++++++++++++++-------------
 2 files changed, 69 insertions(+), 39 deletions(-)

diff --git a/tools/perf/Documentation/perf-kmem.txt b/tools/perf/Documentation/perf-kmem.txt
index 69e181272c51..ff0f433b3fce 100644
--- a/tools/perf/Documentation/perf-kmem.txt
+++ b/tools/perf/Documentation/perf-kmem.txt
@@ -56,6 +56,11 @@ OPTIONS
 --page::
 	Analyze page allocator events
 
+--live::
+	Show live page stat.  The perf kmem shows total allocation stat by
+	default, but this option shows live (currently allocated) pages
+	instead.  (This option works with --page option only)
+
 SEE ALSO
 --------
 linkperf:perf-record[1]
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 719aaf782116..3311ebdd4fb8 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -244,6 +244,7 @@ static unsigned long nr_page_fails;
 static unsigned long nr_page_nomatch;
 
 static bool use_pfn;
+static bool live_page;
 static struct perf_session *kmem_session;
 
 #define MAX_MIGRATE_TYPES  6
@@ -264,7 +265,7 @@ struct page_stat {
 	int 		nr_free;
 };
 
-static struct rb_root page_tree;
+static struct rb_root page_live_tree;
 static struct rb_root page_alloc_tree;
 static struct rb_root page_alloc_sorted;
 static struct rb_root page_caller_tree;
@@ -398,10 +399,19 @@ static u64 find_callsite(struct perf_evsel *evsel, struct perf_sample *sample)
 	return sample->ip;
 }
 
+struct sort_dimension {
+	const char		name[20];
+	sort_fn_t		cmp;
+	struct list_head	list;
+};
+
+static LIST_HEAD(page_alloc_sort_input);
+static LIST_HEAD(page_caller_sort_input);
 
-static struct page_stat *search_page(u64 page, bool create)
+static struct page_stat *search_page_live_stat(struct page_stat *this,
+					       bool create)
 {
-	struct rb_node **node = &page_tree.rb_node;
+	struct rb_node **node = &page_live_tree.rb_node;
 	struct rb_node *parent = NULL;
 	struct page_stat *data;
 
@@ -411,7 +421,7 @@ static struct page_stat *search_page(u64 page, bool create)
 		parent = *node;
 		data = rb_entry(*node, struct page_stat, node);
 
-		cmp = data->page - page;
+		cmp = data->page - this->page;
 		if (cmp < 0)
 			node = &parent->rb_left;
 		else if (cmp > 0)
@@ -425,24 +435,17 @@ static struct page_stat *search_page(u64 page, bool create)
 
 	data = zalloc(sizeof(*data));
 	if (data != NULL) {
-		data->page = page;
+		data->page = this->page;
+		data->order = this->order;
+		data->migrate_type = this->migrate_type;
+		data->gfp_flags = this->gfp_flags;
 
 		rb_link_node(&data->node, parent, node);
-		rb_insert_color(&data->node, &page_tree);
+		rb_insert_color(&data->node, &page_live_tree);
 	}
 
 	return data;
 }
-
-struct sort_dimension {
-	const char		name[20];
-	sort_fn_t		cmp;
-	struct list_head	list;
-};
-
-static LIST_HEAD(page_alloc_sort_input);
-static LIST_HEAD(page_caller_sort_input);
-
 static struct page_stat *search_page_alloc_stat(struct page_stat *this,
 						bool create)
 {
@@ -580,17 +583,8 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 	 * This is to find the current page (with correct gfp flags and
 	 * migrate type) at free event.
 	 */
-	stat = search_page(page, true);
-	if (stat == NULL)
-		return -ENOMEM;
-
-	stat->order = order;
-	stat->gfp_flags = gfp_flags;
-	stat->migrate_type = migrate_type;
-	stat->callsite = callsite;
-
 	this.page = page;
-	stat = search_page_alloc_stat(&this, true);
+	stat = search_page_live_stat(&this, true);
 	if (stat == NULL)
 		return -ENOMEM;
 
@@ -598,6 +592,16 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 	stat->alloc_bytes += bytes;
 	stat->callsite = callsite;
 
+	if (!live_page) {
+		stat = search_page_alloc_stat(&this, true);
+		if (stat == NULL)
+			return -ENOMEM;
+
+		stat->nr_alloc++;
+		stat->alloc_bytes += bytes;
+		stat->callsite = callsite;
+	}
+
 	this.callsite = callsite;
 	stat = search_page_caller_stat(&this, true);
 	if (stat == NULL)
@@ -630,7 +634,8 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
 	nr_page_frees++;
 	total_page_free_bytes += bytes;
 
-	stat = search_page(page, false);
+	this.page = page;
+	stat = search_page_live_stat(&this, false);
 	if (stat == NULL) {
 		pr_debug2("missing free at page %"PRIx64" (order: %d)\n",
 			  page, order);
@@ -641,20 +646,23 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
 		return 0;
 	}
 
-	this.page = page;
 	this.gfp_flags = stat->gfp_flags;
 	this.migrate_type = stat->migrate_type;
 	this.callsite = stat->callsite;
 
-	rb_erase(&stat->node, &page_tree);
+	rb_erase(&stat->node, &page_live_tree);
 	free(stat);
 
-	stat = search_page_alloc_stat(&this, false);
-	if (stat == NULL)
-		return -ENOENT;
+	if (live_page) {
+		order_stats[this.order][this.migrate_type]--;
+	} else {
+		stat = search_page_alloc_stat(&this, false);
+		if (stat == NULL)
+			return -ENOMEM;
 
-	stat->nr_free++;
-	stat->free_bytes += bytes;
+		stat->nr_free++;
+		stat->free_bytes += bytes;
+	}
 
 	stat = search_page_caller_stat(&this, false);
 	if (stat == NULL)
@@ -663,6 +671,16 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
 	stat->nr_free++;
 	stat->free_bytes += bytes;
 
+	if (live_page) {
+		stat->nr_alloc--;
+		stat->alloc_bytes -= bytes;
+
+		if (stat->nr_alloc == 0) {
+			rb_erase(&stat->node, &page_caller_tree);
+			free(stat);
+		}
+	}
+
 	return 0;
 }
 
@@ -780,8 +798,8 @@ static void __print_page_alloc_result(struct perf_session *session, int n_lines)
 	const char *format;
 
 	printf("\n%.105s\n", graph_dotted_line);
-	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n",
-	       use_pfn ? "PFN" : "Page");
+	printf(" %-16s | %5s alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n",
+	       use_pfn ? "PFN" : "Page", live_page ? "Live" : "Total");
 	printf("%.105s\n", graph_dotted_line);
 
 	if (use_pfn)
@@ -825,7 +843,8 @@ static void __print_page_caller_result(struct perf_session *session, int n_lines
 	struct machine *machine = &session->machines.host;
 
 	printf("\n%.105s\n", graph_dotted_line);
-	printf(" Total alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n");
+	printf(" %5s alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n",
+	       live_page ? "Live" : "Total");
 	printf("%.105s\n", graph_dotted_line);
 
 	while (next && n_lines--) {
@@ -1050,8 +1069,13 @@ static void sort_result(void)
 				   &slab_caller_sort);
 	}
 	if (kmem_page) {
-		__sort_page_result(&page_alloc_tree, &page_alloc_sorted,
-				   &page_alloc_sort);
+		if (live_page)
+			__sort_page_result(&page_live_tree, &page_alloc_sorted,
+					   &page_alloc_sort);
+		else
+			__sort_page_result(&page_alloc_tree, &page_alloc_sorted,
+					   &page_alloc_sort);
+
 		__sort_page_result(&page_caller_tree, &page_caller_sorted,
 				   &page_caller_sort);
 	}
@@ -1595,6 +1619,7 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 			   parse_slab_opt),
 	OPT_CALLBACK_NOOPT(0, "page", NULL, NULL, "Analyze page allocator",
 			   parse_page_opt),
+	OPT_BOOLEAN(0, "live", &live_page, "Show live page stat"),
 	OPT_END()
 	};
 	const char *const kmem_subcommands[] = { "record", "stat", NULL };
-- 
2.3.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7/9] perf kmem: Print gfp flags in human readable string
  2015-04-06  5:36 ` Namhyung Kim
@ 2015-04-06  5:36   ` Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-06  5:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Save the gfp_flags string from the libtraceevent output and print it in a header above the results.

  # perf kmem stat --page --caller
  # GFP flags
  # ---------
  # 00000010:       NI: GFP_NOIO
  # 000000d0:        K: GFP_KERNEL
  # 00000200:      NWR: GFP_NOWARN
  # 000084d0:    K|R|Z: GFP_KERNEL|GFP_REPEAT|GFP_ZERO
  # 000200d2:       HU: GFP_HIGHUSER
  # 000200da:      HUM: GFP_HIGHUSER_MOVABLE
  # 000280da:    HUM|Z: GFP_HIGHUSER_MOVABLE|GFP_ZERO
  # 002084d0: K|R|Z|NT: GFP_KERNEL|GFP_REPEAT|GFP_ZERO|GFP_NOTRACK
  # 0102005a:  NF|HW|M: GFP_NOFS|GFP_HARDWALL|GFP_MOVABLE

  ---------------------------------------------------------------------------------------------------------
   Total alloc (KB) | Hits      | Order | Migration type | GFP flags | Callsite
  ---------------------------------------------------------------------------------------------------------
                 60 |        15 |     0 |      UNMOVABLE | K|R|Z|NT  | pte_alloc_one
                 40 |        10 |     0 |        MOVABLE | HUM|Z     | handle_mm_fault
                 24 |         6 |     0 |        MOVABLE | HUM       | do_wp_page
                 24 |         6 |     0 |      UNMOVABLE | K         | __pollwait
   ...

Requested-by: Joonsoo Kim <js1304@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-kmem.c | 221 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 208 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 3311ebdd4fb8..bf0f8bf56375 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -545,6 +545,176 @@ static bool valid_page(u64 pfn_or_page)
 	return true;
 }
 
+struct gfp_flag {
+	unsigned int flags;
+	char *compact_str;
+	char *human_readable;
+};
+
+static struct gfp_flag *gfps;
+static int nr_gfps;
+
+static int gfpcmp(const void *a, const void *b)
+{
+	const struct gfp_flag *fa = a;
+	const struct gfp_flag *fb = b;
+
+	return fa->flags - fb->flags;
+}
+
+/* see include/trace/events/gfpflags.h */
+static const struct {
+	const char *original;
+	const char *compact;
+} gfp_compact_table[] = {
+	{ "GFP_TRANSHUGE",		"THP" },
+	{ "GFP_HIGHUSER_MOVABLE",	"HUM" },
+	{ "GFP_HIGHUSER",		"HU" },
+	{ "GFP_USER",			"U" },
+	{ "GFP_TEMPORARY",		"TMP" },
+	{ "GFP_KERNEL",			"K" },
+	{ "GFP_NOFS",			"NF" },
+	{ "GFP_ATOMIC",			"A" },
+	{ "GFP_NOIO",			"NI" },
+	{ "GFP_HIGH",			"H" },
+	{ "GFP_WAIT",			"W" },
+	{ "GFP_IO",			"I" },
+	{ "GFP_COLD",			"CO" },
+	{ "GFP_NOWARN",			"NWR" },
+	{ "GFP_REPEAT",			"R" },
+	{ "GFP_NOFAIL",			"NF" },
+	{ "GFP_NORETRY",		"NR" },
+	{ "GFP_COMP",			"C" },
+	{ "GFP_ZERO",			"Z" },
+	{ "GFP_NOMEMALLOC",		"NMA" },
+	{ "GFP_MEMALLOC",		"MA" },
+	{ "GFP_HARDWALL",		"HW" },
+	{ "GFP_THISNODE",		"TN" },
+	{ "GFP_RECLAIMABLE",		"RC" },
+	{ "GFP_MOVABLE",		"M" },
+	{ "GFP_NOTRACK",		"NT" },
+	{ "GFP_NO_KSWAPD",		"NK" },
+	{ "GFP_OTHER_NODE",		"ON" },
+	{ "GFP_NOWAIT",			"NW" },
+};
+
+static size_t max_gfp_len;
+
+static char *compact_gfp_flags(char *gfp_flags)
+{
+	char *orig_flags = strdup(gfp_flags);
+	char *new_flags = NULL;
+	char *str, *pos;
+	size_t len = 0;
+
+	if (orig_flags == NULL)
+		return NULL;
+
+	str = strtok_r(orig_flags, "|", &pos);
+	while (str) {
+		size_t i;
+		char *new;
+		const char *cpt;
+
+		for (i = 0; i < ARRAY_SIZE(gfp_compact_table); i++) {
+			if (strcmp(gfp_compact_table[i].original, str))
+				continue;
+
+			cpt = gfp_compact_table[i].compact;
+			new = realloc(new_flags, len + strlen(cpt) + 2);
+			if (new == NULL) {
+				free(new_flags);
+				return NULL;
+			}
+
+			new_flags = new;
+
+			if (!len) {
+				strcpy(new_flags, cpt);
+			} else {
+				strcat(new_flags, "|");
+				strcat(new_flags, cpt);
+				len++;
+			}
+
+			len += strlen(cpt);
+		}
+
+		str = strtok_r(NULL, "|", &pos);
+	}
+
+	if (max_gfp_len < len)
+		max_gfp_len = len;
+
+	free(orig_flags);
+	return new_flags;
+}
+
+static char *compact_gfp_string(unsigned long gfp_flags)
+{
+	struct gfp_flag key = {
+		.flags = gfp_flags,
+	};
+	struct gfp_flag *gfp;
+
+	gfp = bsearch(&key, gfps, nr_gfps, sizeof(*gfps), gfpcmp);
+	if (gfp)
+		return gfp->compact_str;
+
+	return NULL;
+}
+
+static int parse_gfp_flags(struct perf_evsel *evsel, struct perf_sample *sample,
+			   unsigned int gfp_flags)
+{
+	struct pevent_record record = {
+		.cpu = sample->cpu,
+		.data = sample->raw_data,
+		.size = sample->raw_size,
+	};
+	struct trace_seq seq;
+	char *str, *pos;
+
+	if (nr_gfps) {
+		struct gfp_flag key = {
+			.flags = gfp_flags,
+		};
+
+		if (bsearch(&key, gfps, nr_gfps, sizeof(*gfps), gfpcmp))
+			return 0;
+	}
+
+	trace_seq_init(&seq);
+	pevent_event_info(&seq, evsel->tp_format, &record);
+
+	str = strtok_r(seq.buffer, " ", &pos);
+	while (str) {
+		if (!strncmp(str, "gfp_flags=", 10)) {
+			struct gfp_flag *new;
+
+			new = realloc(gfps, (nr_gfps + 1) * sizeof(*gfps));
+			if (new == NULL)
+				return -ENOMEM;
+
+			gfps = new;
+			new += nr_gfps++;
+
+			new->flags = gfp_flags;
+			new->human_readable = strdup(str + 10);
+			new->compact_str = compact_gfp_flags(str + 10);
+			if (!new->human_readable || !new->compact_str)
+				return -ENOMEM;
+
+			qsort(gfps, nr_gfps, sizeof(*gfps), gfpcmp);
+		}
+
+		str = strtok_r(NULL, " ", &pos);
+	}
+
+	trace_seq_destroy(&seq);
+	return 0;
+}
+
 static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 						struct perf_sample *sample)
 {
@@ -577,6 +747,9 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
 		return 0;
 	}
 
+	if (parse_gfp_flags(evsel, sample, gfp_flags) < 0)
+		return -1;
+
 	callsite = find_callsite(evsel, sample);
 
 	/*
@@ -796,16 +969,18 @@ static void __print_page_alloc_result(struct perf_session *session, int n_lines)
 	struct rb_node *next = rb_first(&page_alloc_sorted);
 	struct machine *machine = &session->machines.host;
 	const char *format;
+	int gfp_len = max(strlen("GFP flags"), max_gfp_len);
 
 	printf("\n%.105s\n", graph_dotted_line);
-	printf(" %-16s | %5s alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n",
-	       use_pfn ? "PFN" : "Page", live_page ? "Live" : "Total");
+	printf(" %-16s | %5s alloc (KB) | Hits      | Order | Mig.type | %-*s | Callsite\n",
+	       use_pfn ? "PFN" : "Page", live_page ? "Live" : "Total",
+	       gfp_len, "GFP flags");
 	printf("%.105s\n", graph_dotted_line);
 
 	if (use_pfn)
-		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx | %s\n";
+		format = " %16llu | %'16llu | %'9d | %5d | %8s | %-*s | %s\n";
 	else
-		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx | %s\n";
+		format = " %016llx | %'16llu | %'9d | %5d | %8s | %-*s | %s\n";
 
 	while (next && n_lines--) {
 		struct page_stat *data;
@@ -826,13 +1001,15 @@ static void __print_page_alloc_result(struct perf_session *session, int n_lines)
 		       (unsigned long long)data->alloc_bytes / 1024,
 		       data->nr_alloc, data->order,
 		       migrate_type_str[data->migrate_type],
-		       (unsigned long)data->gfp_flags, caller);
+		       gfp_len, compact_gfp_string(data->gfp_flags), caller);
 
 		next = rb_next(next);
 	}
 
-	if (n_lines == -1)
-		printf(" ...              | ...              | ...       | ...   | ...      | ...       | ...\n");
+	if (n_lines == -1) {
+		printf(" ...              | ...              | ...       | ...   | ...      | %-*s | ...\n",
+		       gfp_len, "...");
+	}
 
 	printf("%.105s\n", graph_dotted_line);
 }
@@ -841,10 +1018,11 @@ static void __print_page_caller_result(struct perf_session *session, int n_lines
 {
 	struct rb_node *next = rb_first(&page_caller_sorted);
 	struct machine *machine = &session->machines.host;
+	int gfp_len = max(strlen("GFP flags"), max_gfp_len);
 
 	printf("\n%.105s\n", graph_dotted_line);
-	printf(" %5s alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n",
-	       live_page ? "Live" : "Total");
+	printf(" %5s alloc (KB) | Hits      | Order | Mig.type | %-*s | Callsite\n",
+	       live_page ? "Live" : "Total", gfp_len, "GFP flags");
 	printf("%.105s\n", graph_dotted_line);
 
 	while (next && n_lines--) {
@@ -862,21 +1040,36 @@ static void __print_page_caller_result(struct perf_session *session, int n_lines
 		else
 			scnprintf(buf, sizeof(buf), "%"PRIx64, data->callsite);
 
-		printf(" %'16llu | %'9d | %5d | %8s |  %08lx | %s\n",
+		printf(" %'16llu | %'9d | %5d | %8s | %-*s | %s\n",
 		       (unsigned long long)data->alloc_bytes / 1024,
 		       data->nr_alloc, data->order,
 		       migrate_type_str[data->migrate_type],
-		       (unsigned long)data->gfp_flags, caller);
+		       gfp_len, compact_gfp_string(data->gfp_flags), caller);
 
 		next = rb_next(next);
 	}
 
-	if (n_lines == -1)
-		printf(" ...              | ...       | ...   | ...      | ...       | ...\n");
+	if (n_lines == -1) {
+		printf(" ...              | ...       | ...   | ...      | %-*s | ...\n",
+		       gfp_len, "...");
+	}
 
 	printf("%.105s\n", graph_dotted_line);
 }
 
+static void print_gfp_flags(void)
+{
+	int i;
+
+	printf("# GFP flags\n");
+	printf("# ---------\n");
+	for (i = 0; i < nr_gfps; i++) {
+		printf("# %08x: %*s: %s\n", gfps[i].flags,
+		       (int) max_gfp_len, gfps[i].compact_str,
+		       gfps[i].human_readable);
+	}
+}
+
 static void print_slab_summary(void)
 {
 	printf("\nSUMMARY (SLAB allocator)");
@@ -946,6 +1139,8 @@ static void print_slab_result(struct perf_session *session)
 
 static void print_page_result(struct perf_session *session)
 {
+	if (caller_flag || alloc_flag)
+		print_gfp_flags();
 	if (caller_flag)
 		__print_page_caller_result(session, caller_lines);
 	if (alloc_flag)
-- 
2.3.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 8/9] perf kmem: Add kmem.default config option
  2015-04-06  5:36 ` Namhyung Kim
@ 2015-04-06  5:36   ` Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-06  5:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm, Taeung Song

Currently the perf kmem command falls back to --slab for backward
compatibility if neither --slab nor --page is given.  Add a kmem.default
config option to select the default analysis mode ('page' or 'slab').

  # cat ~/.perfconfig
  [kmem]
  	default = page

  # perf kmem stat

  SUMMARY (page allocator)
  ========================
  Total allocation requests     :            1,518   [            6,096 KB ]
  Total free requests           :            1,431   [            5,748 KB ]

  Total alloc+freed requests    :            1,330   [            5,344 KB ]
  Total alloc-only requests     :              188   [              752 KB ]
  Total free-only requests      :              101   [              404 KB ]

  Total allocation failures     :                0   [                0 KB ]
  ...

Cc: Taeung Song <treeze.taeung@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-kmem.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index bf0f8bf56375..d2dfcabdf684 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -28,6 +28,10 @@ static int	kmem_slab;
 static int	kmem_page;
 
 static long	kmem_page_size;
+static enum {
+	KMEM_SLAB,
+	KMEM_PAGE,
+} kmem_default = KMEM_SLAB;  /* for backward compatibility */
 
 struct alloc_stat;
 typedef int (*sort_fn_t)(void *, void *);
@@ -1673,7 +1677,8 @@ static int parse_sort_opt(const struct option *opt __maybe_unused,
 	if (!arg)
 		return -1;
 
-	if (kmem_page > kmem_slab) {
+	if (kmem_page > kmem_slab ||
+	    (kmem_page == 0 && kmem_slab == 0 && kmem_default == KMEM_PAGE)) {
 		if (caller_flag > alloc_flag)
 			return setup_page_sorting(&page_caller_sort, arg);
 		else
@@ -1789,6 +1794,22 @@ static int __cmd_record(int argc, const char **argv)
 	return cmd_record(i, rec_argv, NULL);
 }
 
+static int kmem_config(const char *var, const char *value, void *cb)
+{
+	if (!strcmp(var, "kmem.default")) {
+		if (!strcmp(value, "slab"))
+			kmem_default = KMEM_SLAB;
+		else if (!strcmp(value, "page"))
+			kmem_default = KMEM_PAGE;
+		else
+			pr_err("invalid default value ('slab' or 'page' required): %s\n",
+			       value);
+		return 0;
+	}
+
+	return perf_default_config(var, value, cb);
+}
+
 int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 {
 	struct perf_data_file file = {
@@ -1825,14 +1846,19 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	struct perf_session *session;
 	int ret = -1;
 
+	perf_config(kmem_config, NULL);
 	argc = parse_options_subcommand(argc, argv, kmem_options,
 					kmem_subcommands, kmem_usage, 0);
 
 	if (!argc)
 		usage_with_options(kmem_usage, kmem_options);
 
-	if (kmem_slab == 0 && kmem_page == 0)
-		kmem_slab = 1;  /* for backward compatibility */
+	if (kmem_slab == 0 && kmem_page == 0) {
+		if (kmem_default == KMEM_SLAB)
+			kmem_slab = 1;
+		else
+			kmem_page = 1;
+	}
 
 	if (!strncmp(argv[0], "rec", 3)) {
 		symbol__init(NULL);
-- 
2.3.2


^ permalink raw reply	[flat|nested] 50+ messages in thread


* [PATCH 9/9] tools lib traceevent: Honor operator priority
  2015-04-06  5:36 ` Namhyung Kim
@ 2015-04-06  5:36   ` Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-06  5:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm, Steven Rostedt

Currently process_op() ignores operator priority and just sets the
processed arg as the right operand.  But this can result in priority
inversion when the right operand is also an operator arg with lower
priority.

For example, the following print format comes from the new kmem events.

  "page=%p", REC->pfn != -1UL ? (((struct page *)(0xffffea0000000000UL)) + (REC->pfn)) : ((void *)0)

But this was treated as below:

  REC->pfn != ((null - 1UL) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)

In this case, the right arg was the '?' operator, which has lower
priority.  But it just set the whole arg as the right operand, making
the output confusing: page was always 0 or 1 since that's the result
of the logical comparison.

With this patch, it is handled properly, as follows:

  ((REC->pfn != (null - 1UL)) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)

Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/lib/traceevent/event-parse.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index 6d31b6419d37..604bea5c3fb0 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -1939,7 +1939,22 @@ process_op(struct event_format *event, struct print_arg *arg, char **tok)
 			goto out_warn_free;
 
 		type = process_arg_token(event, right, tok, type);
-		arg->op.right = right;
+
+		if (right->type == PRINT_OP &&
+		    get_op_prio(arg->op.op) < get_op_prio(right->op.op)) {
+			struct print_arg tmp;
+
+			/* swap ops according to the priority */
+			arg->op.right = right->op.left;
+
+			tmp = *arg;
+			*arg = *right;
+			*right = tmp;
+
+			arg->op.left = right;
+		} else {
+			arg->op.right = right;
+		}
 
 	} else if (strcmp(token, "[") == 0) {
 
-- 
2.3.2


^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [PATCH 9/9] tools lib traceevent: Honor operator priority
  2015-04-06  5:36   ` Namhyung Kim
@ 2015-04-06 14:45     ` Steven Rostedt
  -1 siblings, 0 replies; 50+ messages in thread
From: Steven Rostedt @ 2015-04-06 14:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, David Ahern, Minchan Kim, Joonsoo Kim, linux-mm

On Mon,  6 Apr 2015 14:36:16 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> Currently it ignores operator priority and just sets processed args as a
> right operand.  But it could result in priority inversion in case that
> the right operand is also a operator arg and its priority is lower.
> 
> For example, following print format is from new kmem events.
> 
>   "page=%p", REC->pfn != -1UL ? (((struct page *)(0xffffea0000000000UL)) + (REC->pfn)) : ((void *)0)
> 
> But this was treated as below:
> 
>   REC->pfn != ((null - 1UL) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)
> 
> In this case, the right arg was '?' operator which has lower priority.
> But it just sets the whole arg so making the output confusing - page was
> always 0 or 1 since that's the result of logical operation.
> 
> With this patch, it can handle it properly like following:
> 
>   ((REC->pfn != (null - 1UL)) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)

Nice catch. One nit.

> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/lib/traceevent/event-parse.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
> index 6d31b6419d37..604bea5c3fb0 100644
> --- a/tools/lib/traceevent/event-parse.c
> +++ b/tools/lib/traceevent/event-parse.c
> @@ -1939,7 +1939,22 @@ process_op(struct event_format *event, struct print_arg *arg, char **tok)
>  			goto out_warn_free;
>  
>  		type = process_arg_token(event, right, tok, type);
> -		arg->op.right = right;
> +
> +		if (right->type == PRINT_OP &&
> +		    get_op_prio(arg->op.op) < get_op_prio(right->op.op)) {
> +			struct print_arg tmp;
> +
> +			/* swap ops according to the priority */

This isn't really a swap. Better term to use is "rotate".

But other than that,

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

> +			arg->op.right = right->op.left;
> +
> +			tmp = *arg;
> +			*arg = *right;
> +			*right = tmp;
> +
> +			arg->op.left = right;
> +		} else {
> +			arg->op.right = right;
> +		}
>  
>  	} else if (strcmp(token, "[") == 0) {
>  


^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [PATCH 9/9] tools lib traceevent: Honor operator priority
  2015-04-06 14:45     ` Steven Rostedt
@ 2015-04-07  7:52       ` Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-07  7:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, David Ahern, Minchan Kim, Joonsoo Kim, linux-mm

Hi Steve,

On Mon, Apr 06, 2015 at 10:45:04AM -0400, Steven Rostedt wrote:
> On Mon,  6 Apr 2015 14:36:16 +0900
> Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > Currently it ignores operator priority and just sets processed args as a
> > right operand.  But it could result in priority inversion in case that
> > the right operand is also a operator arg and its priority is lower.
> > 
> > For example, following print format is from new kmem events.
> > 
> >   "page=%p", REC->pfn != -1UL ? (((struct page *)(0xffffea0000000000UL)) + (REC->pfn)) : ((void *)0)
> > 
> > But this was treated as below:
> > 
> >   REC->pfn != ((null - 1UL) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)
> > 
> > In this case, the right arg was '?' operator which has lower priority.
> > But it just sets the whole arg so making the output confusing - page was
> > always 0 or 1 since that's the result of logical operation.
> > 
> > With this patch, it can handle it properly like following:
> > 
> >   ((REC->pfn != (null - 1UL)) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)
> 
> Nice catch. One nit.
> 
> > 
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/lib/traceevent/event-parse.c | 17 ++++++++++++++++-
> >  1 file changed, 16 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
> > index 6d31b6419d37..604bea5c3fb0 100644
> > --- a/tools/lib/traceevent/event-parse.c
> > +++ b/tools/lib/traceevent/event-parse.c
> > @@ -1939,7 +1939,22 @@ process_op(struct event_format *event, struct print_arg *arg, char **tok)
> >  			goto out_warn_free;
> >  
> >  		type = process_arg_token(event, right, tok, type);
> > -		arg->op.right = right;
> > +
> > +		if (right->type == PRINT_OP &&
> > +		    get_op_prio(arg->op.op) < get_op_prio(right->op.op)) {
> > +			struct print_arg tmp;
> > +
> > +			/* swap ops according to the priority */
> 
> This isn't really a swap. Better term to use is "rotate".

You're right!

> 
> But other than that,
> 
> Acked-by: Steven Rostedt <rostedt@goodmis.org>

Thanks for the review
Namhyung


> 
> > +			arg->op.right = right->op.left;
> > +
> > +			tmp = *arg;
> > +			*arg = *right;
> > +			*right = tmp;
> > +
> > +			arg->op.left = right;
> > +		} else {
> > +			arg->op.right = right;
> > +		}
> >  
> >  	} else if (strcmp(token, "[") == 0) {
> >  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [PATCH 9/9] tools lib traceevent: Honor operator priority
  2015-04-07  7:52       ` Namhyung Kim
@ 2015-04-07 13:02         ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-07 13:02 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Minchan Kim, Joonsoo Kim, linux-mm

Em Tue, Apr 07, 2015 at 04:52:26PM +0900, Namhyung Kim escreveu:
> On Mon, Apr 06, 2015 at 10:45:04AM -0400, Steven Rostedt wrote:
> > >  		type = process_arg_token(event, right, tok, type);
> > > -		arg->op.right = right;
> > > +
> > > +		if (right->type == PRINT_OP &&
> > > +		    get_op_prio(arg->op.op) < get_op_prio(right->op.op)) {
> > > +			struct print_arg tmp;
> > > +
> > > +			/* swap ops according to the priority */

> > This isn't really a swap. Better term to use is "rotate".

> You're right!

> > But other than that,

> > Acked-by: Steven Rostedt <rostedt@goodmis.org>
> 
> Thanks for the review

Ok, so just doing that s/swap/rotate/g, sticking Rostedt's ack and
applying, ok?

- Arnaldo

^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [PATCH 9/9] tools lib traceevent: Honor operator priority
  2015-04-07 13:02         ` Arnaldo Carvalho de Melo
@ 2015-04-07 13:57           ` Steven Rostedt
  -1 siblings, 0 replies; 50+ messages in thread
From: Steven Rostedt @ 2015-04-07 13:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Minchan Kim, Joonsoo Kim, linux-mm

On Tue, 7 Apr 2015 10:02:08 -0300
Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Ok, so just doing that s/swap/rotate/g, sticking Rostedt's ack and
> applying, ok?

I'm fine with that.

-- Steve


^ permalink raw reply	[flat|nested] 50+ messages in thread


* Re: [PATCH 9/9] tools lib traceevent: Honor operator priority
  2015-04-07 13:02         ` Arnaldo Carvalho de Melo
@ 2015-04-07 14:10           ` Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-07 14:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Steven Rostedt, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Minchan Kim, Joonsoo Kim, linux-mm

On Tue, Apr 07, 2015 at 10:02:08AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Apr 07, 2015 at 04:52:26PM +0900, Namhyung Kim escreveu:
> > On Mon, Apr 06, 2015 at 10:45:04AM -0400, Steven Rostedt wrote:
> > > >  		type = process_arg_token(event, right, tok, type);
> > > > -		arg->op.right = right;
> > > > +
> > > > +		if (right->type == PRINT_OP &&
> > > > +		    get_op_prio(arg->op.op) < get_op_prio(right->op.op)) {
> > > > +			struct print_arg tmp;
> > > > +
> > > > +			/* swap ops according to the priority */
> 
> > > This isn't really a swap. Better term to use is "rotate".
> 
> > You're right!
> 
> > > But other than that,
> 
> > > Acked-by: Steven Rostedt <rostedt@goodmis.org>
> > 
> > Thanks for the review
> 
> Ok, so just doing that s/swap/rotate/g, sticking Rostedt's ack and
> applying, ok?

Sure thing!

Thanks for your work,
Namhyung

^ permalink raw reply	[flat|nested] 50+ messages in thread


* [tip:perf/core] tools lib traceevent: Honor operator priority
  2015-04-06  5:36   ` Namhyung Kim
  (?)
  (?)
@ 2015-04-08 15:11   ` tip-bot for Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: tip-bot for Namhyung Kim @ 2015-04-08 15:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: a.p.zijlstra, rostedt, mingo, namhyung, dsahern, acme, js1304,
	hpa, jolsa, linux-kernel, minchan, tglx

Commit-ID:  3201f0dc42f7fad9387afc4692cea3d0c730cba2
Gitweb:     http://git.kernel.org/tip/3201f0dc42f7fad9387afc4692cea3d0c730cba2
Author:     Namhyung Kim <namhyung@kernel.org>
AuthorDate: Mon, 6 Apr 2015 14:36:16 +0900
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 8 Apr 2015 09:07:09 -0300

tools lib traceevent: Honor operator priority

Currently process_op() ignores operator priority and just sets the
processed arg as the right operand.  But this can result in priority
inversion when the right operand is also an operator arg with lower
priority.

For example, the following print format comes from the new kmem events.

  "page=%p", REC->pfn != -1UL ? (((struct page *)(0xffffea0000000000UL)) + (REC->pfn)) : ((void *)0)

But this was treated as below:

  REC->pfn != ((null - 1UL) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)

In this case, the right arg was the '?' operator, which has lower
priority.  But it just set the whole arg as the right operand, making
the output confusing: page was always 0 or 1 since that's the result
of the logical comparison.

With this patch, it is handled properly, as follows:

  ((REC->pfn != (null - 1UL)) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1428298576-9785-10-git-send-email-namhyung@kernel.org
[ Replaced 'swap' with 'rotate' in a comment as requested by Steve and agreed by Namhyung ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/traceevent/event-parse.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index 6d31b64..12a7e2a 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -1939,7 +1939,22 @@ process_op(struct event_format *event, struct print_arg *arg, char **tok)
 			goto out_warn_free;
 
 		type = process_arg_token(event, right, tok, type);
-		arg->op.right = right;
+
+		if (right->type == PRINT_OP &&
+		    get_op_prio(arg->op.op) < get_op_prio(right->op.op)) {
+			struct print_arg tmp;
+
+			/* rotate ops according to the priority */
+			arg->op.right = right->op.left;
+
+			tmp = *arg;
+			*arg = *right;
+			*right = tmp;
+
+			arg->op.left = right;
+		} else {
+			arg->op.right = right;
+		}
 
 	} else if (strcmp(token, "[") == 0) {
 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [tip:perf/core] perf kmem: Respect -i option
  2015-04-06  5:36   ` Namhyung Kim
  (?)
@ 2015-04-08 15:11   ` tip-bot for Jiri Olsa
  -1 siblings, 0 replies; 50+ messages in thread
From: tip-bot for Jiri Olsa @ 2015-04-08 15:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: a.p.zijlstra, tglx, linux-kernel, jolsa, acme, js1304, jolsa,
	minchan, hpa, namhyung, mingo, dsahern

Commit-ID:  28939e1a1f6d198239d86b1d77fa9fd55773189a
Gitweb:     http://git.kernel.org/tip/28939e1a1f6d198239d86b1d77fa9fd55773189a
Author:     Jiri Olsa <jolsa@kernel.org>
AuthorDate: Mon, 6 Apr 2015 14:36:08 +0900
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 8 Apr 2015 09:07:14 -0300

perf kmem: Respect -i option

Currently 'perf kmem' does not respect the -i option.

Initialize file.path properly after the options are parsed.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1428298576-9785-2-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-kmem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index ac303ef..4ebf65c 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -663,7 +663,6 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 {
 	const char * const default_sort_order = "frag,hit,bytes";
 	struct perf_data_file file = {
-		.path = input_name,
 		.mode = PERF_DATA_MODE_READ,
 	};
 	const struct option kmem_options[] = {
@@ -701,6 +700,8 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 		return __cmd_record(argc, argv);
 	}
 
+	file.path = input_name;
+
 	session = perf_session__new(&file, false, &perf_kmem);
 	if (session == NULL)
 		return -1;

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/9] perf kmem: Analyze page allocator events also
  2015-04-06  5:36   ` Namhyung Kim
@ 2015-04-10 21:06     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-10 21:06 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Em Mon, Apr 06, 2015 at 02:36:10PM +0900, Namhyung Kim escreveu:
> The perf kmem command records and analyzes kernel memory allocation
> only for SLAB objects.  This patch implements a simple page allocator
> analyzer using the kmem:mm_page_alloc and kmem:mm_page_free events.
> 
> It adds two new options: --slab and --page.  The --slab option is
> for analyzing the SLAB allocator, which is what perf kmem currently does.
> 
> The new --page option enables page allocator events and analyzes kernel
> memory usage at page granularity.  Currently, only the 'stat --alloc'
> subcommand is implemented.
> 
> If neither --slab nor --page is specified, --slab is implied.
> 
>   # perf kmem stat --page --alloc --line 10

Hi, applied the first patch, the kernel one, and rebooted with that kernel:

[root@ssdandy ~]# cat /t/events/kmem/mm_page_free/format 
name: mm_page_free
ID: 367
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long pfn;	offset:8;	size:8;	signed:0;
	field:unsigned int order;	offset:16;	size:4;	signed:0;

print fmt: "page=%p pfn=%lu order=%d", (((struct page *)(0xffffea0000000000UL)) + (REC->pfn)), REC->pfn, REC->order
[root@ssdandy ~]#

Then did:

[root@ssdandy ~]# perf kmem record


^C[ perf record: Woken up 0 times to write data ]
per[ perf record: Captured and wrote 624.101 MB perf.data (7022790 samples) ]

[root@ssdandy ~]# perf evlist
kmem:kmalloc
kmem:kmalloc_node
kmem:kfree
kmem:kmem_cache_alloc
kmem:kmem_cache_alloc_node
kmem:kmem_cache_free
[root@ssdandy ~]# ls -la perf.data
-rw-------. 1 root root 659632943 Apr 10 18:05 perf.data
[root@ssdandy ~]#

But I only get:

[root@ssdandy ~]# perf kmem stat --page --alloc
Warning:
188 out of order events recorded.

--------------------------------------------------------------------------------
 Page             | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags
--------------------------------------------------------------------------------
 ...              | ...              | ...       | ...   | ...      | ...     
--------------------------------------------------------------------------------

SUMMARY (page allocator)
========================
Total allocation requests     :                0   [                0 KB ]
Total free requests           :                0   [                0 KB ]

Total alloc+freed requests    :                0   [                0 KB ]
Total alloc-only requests     :                0   [                0 KB ]
Total free-only requests      :                0   [                0 KB ]

Total allocation failures     :                0   [                0 KB ]

Order     Unmovable   Reclaimable       Movable      Reserved  CMA/Isolated
-----  ------------  ------------  ------------  ------------  ------------
    0             .             .             .             .             .
    1             .             .             .             .             .
    2             .             .             .             .             .
    3             .             .             .             .             .
    4             .             .             .             .             .
    5             .             .             .             .             .
    6             .             .             .             .             .
    7             .             .             .             .             .
    8             .             .             .             .             .
    9             .             .             .             .             .
   10             .             .             .             .             .
[root@ssdandy ~]#

What am I missing?

- Arnaldo
 
>   -------------------------------------------------------------------------------
>    PFN              | Total alloc (KB) | Hits     | Order | Mig.type | GFP flags
>   -------------------------------------------------------------------------------
>             4045014 |               16 |        1 |     2 |  RECLAIM |  00285250
>             4143980 |               16 |        1 |     2 |  RECLAIM |  00285250
>             3938658 |               16 |        1 |     2 |  RECLAIM |  00285250
>             4045400 |               16 |        1 |     2 |  RECLAIM |  00285250
>             3568708 |               16 |        1 |     2 |  RECLAIM |  00285250
>             3729824 |               16 |        1 |     2 |  RECLAIM |  00285250
>             3657210 |               16 |        1 |     2 |  RECLAIM |  00285250
>             4120750 |               16 |        1 |     2 |  RECLAIM |  00285250
>             3678850 |               16 |        1 |     2 |  RECLAIM |  00285250
>             3693874 |               16 |        1 |     2 |  RECLAIM |  00285250
>    ...              | ...              | ...      | ...   | ...      | ...
>   -------------------------------------------------------------------------------
> 
>   SUMMARY (page allocator)
>   ========================
>   Total allocation requests     :           44,260   [          177,256 KB ]
>   Total free requests           :              117   [              468 KB ]
> 
>   Total alloc+freed requests    :               49   [              196 KB ]
>   Total alloc-only requests     :           44,211   [          177,060 KB ]
>   Total free-only requests      :               68   [              272 KB ]
> 
>   Total allocation failures     :                0   [                0 KB ]
> 
>   Order     Unmovable   Reclaimable       Movable      Reserved  CMA/Isolated
>   -----  ------------  ------------  ------------  ------------  ------------
>       0            32             .        44,210             .             .
>       1             .             .             .             .             .
>       2             .            18             .             .             .
>       3             .             .             .             .             .
>       4             .             .             .             .             .
>       5             .             .             .             .             .
>       6             .             .             .             .             .
>       7             .             .             .             .             .
>       8             .             .             .             .             .
>       9             .             .             .             .             .
>      10             .             .             .             .             .
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-kmem.txt |   8 +-
>  tools/perf/builtin-kmem.c              | 500 +++++++++++++++++++++++++++++++--
>  2 files changed, 491 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-kmem.txt b/tools/perf/Documentation/perf-kmem.txt
> index 150253cc3c97..23219c65c16f 100644
> --- a/tools/perf/Documentation/perf-kmem.txt
> +++ b/tools/perf/Documentation/perf-kmem.txt
> @@ -3,7 +3,7 @@ perf-kmem(1)
>  
>  NAME
>  ----
> -perf-kmem - Tool to trace/measure kernel memory(slab) properties
> +perf-kmem - Tool to trace/measure kernel memory properties
>  
>  SYNOPSIS
>  --------
> @@ -46,6 +46,12 @@ OPTIONS
>  --raw-ip::
>  	Print raw ip instead of symbol
>  
> +--slab::
> +	Analyze SLAB allocator events.
> +
> +--page::
> +	Analyze page allocator events
> +
>  SEE ALSO
>  --------
>  linkperf:perf-record[1]
> diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
> index 4ebf65c79434..63ea01349b6e 100644
> --- a/tools/perf/builtin-kmem.c
> +++ b/tools/perf/builtin-kmem.c
> @@ -22,6 +22,11 @@
>  #include <linux/string.h>
>  #include <locale.h>
>  
> +static int	kmem_slab;
> +static int	kmem_page;
> +
> +static long	kmem_page_size;
> +
>  struct alloc_stat;
>  typedef int (*sort_fn_t)(struct alloc_stat *, struct alloc_stat *);
>  
> @@ -226,6 +231,244 @@ static int perf_evsel__process_free_event(struct perf_evsel *evsel,
>  	return 0;
>  }
>  
> +static u64 total_page_alloc_bytes;
> +static u64 total_page_free_bytes;
> +static u64 total_page_nomatch_bytes;
> +static u64 total_page_fail_bytes;
> +static unsigned long nr_page_allocs;
> +static unsigned long nr_page_frees;
> +static unsigned long nr_page_fails;
> +static unsigned long nr_page_nomatch;
> +
> +static bool use_pfn;
> +
> +#define MAX_MIGRATE_TYPES  6
> +#define MAX_PAGE_ORDER     11
> +
> +static int order_stats[MAX_PAGE_ORDER][MAX_MIGRATE_TYPES];
> +
> +struct page_stat {
> +	struct rb_node 	node;
> +	u64 		page;
> +	int 		order;
> +	unsigned 	gfp_flags;
> +	unsigned 	migrate_type;
> +	u64		alloc_bytes;
> +	u64 		free_bytes;
> +	int 		nr_alloc;
> +	int 		nr_free;
> +};
> +
> +static struct rb_root page_tree;
> +static struct rb_root page_alloc_tree;
> +static struct rb_root page_alloc_sorted;
> +
> +static struct page_stat *search_page(unsigned long page, bool create)
> +{
> +	struct rb_node **node = &page_tree.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct page_stat *data;
> +
> +	while (*node) {
> +		s64 cmp;
> +
> +		parent = *node;
> +		data = rb_entry(*node, struct page_stat, node);
> +
> +		cmp = data->page - page;
> +		if (cmp < 0)
> +			node = &parent->rb_left;
> +		else if (cmp > 0)
> +			node = &parent->rb_right;
> +		else
> +			return data;
> +	}
> +
> +	if (!create)
> +		return NULL;
> +
> +	data = zalloc(sizeof(*data));
> +	if (data != NULL) {
> +		data->page = page;
> +
> +		rb_link_node(&data->node, parent, node);
> +		rb_insert_color(&data->node, &page_tree);
> +	}
> +
> +	return data;
> +}
> +
> +static int page_stat_cmp(struct page_stat *a, struct page_stat *b)
> +{
> +	if (a->page > b->page)
> +		return -1;
> +	if (a->page < b->page)
> +		return 1;
> +	if (a->order > b->order)
> +		return -1;
> +	if (a->order < b->order)
> +		return 1;
> +	if (a->migrate_type > b->migrate_type)
> +		return -1;
> +	if (a->migrate_type < b->migrate_type)
> +		return 1;
> +	if (a->gfp_flags > b->gfp_flags)
> +		return -1;
> +	if (a->gfp_flags < b->gfp_flags)
> +		return 1;
> +	return 0;
> +}
> +
> +static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool create)
> +{
> +	struct rb_node **node = &page_alloc_tree.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct page_stat *data;
> +
> +	while (*node) {
> +		s64 cmp;
> +
> +		parent = *node;
> +		data = rb_entry(*node, struct page_stat, node);
> +
> +		cmp = page_stat_cmp(data, stat);
> +		if (cmp < 0)
> +			node = &parent->rb_left;
> +		else if (cmp > 0)
> +			node = &parent->rb_right;
> +		else
> +			return data;
> +	}
> +
> +	if (!create)
> +		return NULL;
> +
> +	data = zalloc(sizeof(*data));
> +	if (data != NULL) {
> +		data->page = stat->page;
> +		data->order = stat->order;
> +		data->gfp_flags = stat->gfp_flags;
> +		data->migrate_type = stat->migrate_type;
> +
> +		rb_link_node(&data->node, parent, node);
> +		rb_insert_color(&data->node, &page_alloc_tree);
> +	}
> +
> +	return data;
> +}
> +
> +static bool valid_page(u64 pfn_or_page)
> +{
> +	if (use_pfn && pfn_or_page == -1UL)
> +		return false;
> +	if (!use_pfn && pfn_or_page == 0)
> +		return false;
> +	return true;
> +}
> +
> +static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
> +						struct perf_sample *sample)
> +{
> +	u64 page;
> +	unsigned int order = perf_evsel__intval(evsel, sample, "order");
> +	unsigned int gfp_flags = perf_evsel__intval(evsel, sample, "gfp_flags");
> +	unsigned int migrate_type = perf_evsel__intval(evsel, sample,
> +						       "migratetype");
> +	u64 bytes = kmem_page_size << order;
> +	struct page_stat *stat;
> +	struct page_stat this = {
> +		.order = order,
> +		.gfp_flags = gfp_flags,
> +		.migrate_type = migrate_type,
> +	};
> +
> +	if (use_pfn)
> +		page = perf_evsel__intval(evsel, sample, "pfn");
> +	else
> +		page = perf_evsel__intval(evsel, sample, "page");
> +
> +	nr_page_allocs++;
> +	total_page_alloc_bytes += bytes;
> +
> +	if (!valid_page(page)) {
> +		nr_page_fails++;
> +		total_page_fail_bytes += bytes;
> +
> +		return 0;
> +	}
> +
> +	/*
> +	 * This is to find the current page (with correct gfp flags and
> +	 * migrate type) at free event.
> +	 */
> +	stat = search_page(page, true);
> +	if (stat == NULL)
> +		return -ENOMEM;
> +
> +	stat->order = order;
> +	stat->gfp_flags = gfp_flags;
> +	stat->migrate_type = migrate_type;
> +
> +	this.page = page;
> +	stat = search_page_alloc_stat(&this, true);
> +	if (stat == NULL)
> +		return -ENOMEM;
> +
> +	stat->nr_alloc++;
> +	stat->alloc_bytes += bytes;
> +
> +	order_stats[order][migrate_type]++;
> +
> +	return 0;
> +}
> +
> +static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
> +						struct perf_sample *sample)
> +{
> +	u64 page;
> +	unsigned int order = perf_evsel__intval(evsel, sample, "order");
> +	u64 bytes = kmem_page_size << order;
> +	struct page_stat *stat;
> +	struct page_stat this = {
> +		.order = order,
> +	};
> +
> +	if (use_pfn)
> +		page = perf_evsel__intval(evsel, sample, "pfn");
> +	else
> +		page = perf_evsel__intval(evsel, sample, "page");
> +
> +	nr_page_frees++;
> +	total_page_free_bytes += bytes;
> +
> +	stat = search_page(page, false);
> +	if (stat == NULL) {
> +		pr_debug2("missing free at page %"PRIx64" (order: %d)\n",
> +			  page, order);
> +
> +		nr_page_nomatch++;
> +		total_page_nomatch_bytes += bytes;
> +
> +		return 0;
> +	}
> +
> +	this.page = page;
> +	this.gfp_flags = stat->gfp_flags;
> +	this.migrate_type = stat->migrate_type;
> +
> +	rb_erase(&stat->node, &page_tree);
> +	free(stat);
> +
> +	stat = search_page_alloc_stat(&this, false);
> +	if (stat == NULL)
> +		return -ENOENT;
> +
> +	stat->nr_free++;
> +	stat->free_bytes += bytes;
> +
> +	return 0;
> +}
> +
>  typedef int (*tracepoint_handler)(struct perf_evsel *evsel,
>  				  struct perf_sample *sample);
>  
> @@ -270,8 +513,9 @@ static double fragmentation(unsigned long n_req, unsigned long n_alloc)
>  		return 100.0 - (100.0 * n_req / n_alloc);
>  }
>  
> -static void __print_result(struct rb_root *root, struct perf_session *session,
> -			   int n_lines, int is_caller)
> +static void __print_slab_result(struct rb_root *root,
> +				struct perf_session *session,
> +				int n_lines, int is_caller)
>  {
>  	struct rb_node *next;
>  	struct machine *machine = &session->machines.host;
> @@ -323,9 +567,56 @@ static void __print_result(struct rb_root *root, struct perf_session *session,
>  	printf("%.105s\n", graph_dotted_line);
>  }
>  
> -static void print_summary(void)
> +static const char * const migrate_type_str[] = {
> +	"UNMOVABL",
> +	"RECLAIM",
> +	"MOVABLE",
> +	"RESERVED",
> +	"CMA/ISLT",
> +	"UNKNOWN",
> +};
> +
> +static void __print_page_result(struct rb_root *root,
> +				struct perf_session *session __maybe_unused,
> +				int n_lines)
> +{
> +	struct rb_node *next = rb_first(root);
> +	const char *format;
> +
> +	printf("\n%.80s\n", graph_dotted_line);
> +	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags\n",
> +	       use_pfn ? "PFN" : "Page");
> +	printf("%.80s\n", graph_dotted_line);
> +
> +	if (use_pfn)
> +		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx\n";
> +	else
> +		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx\n";
> +
> +	while (next && n_lines--) {
> +		struct page_stat *data;
> +
> +		data = rb_entry(next, struct page_stat, node);
> +
> +		printf(format, (unsigned long long)data->page,
> +		       (unsigned long long)data->alloc_bytes / 1024,
> +		       data->nr_alloc, data->order,
> +		       migrate_type_str[data->migrate_type],
> +		       (unsigned long)data->gfp_flags);
> +
> +		next = rb_next(next);
> +	}
> +
> +	if (n_lines == -1)
> +		printf(" ...              | ...              | ...       | ...   | ...      | ...     \n");
> +
> +	printf("%.80s\n", graph_dotted_line);
> +}
> +
> +static void print_slab_summary(void)
>  {
> -	printf("\nSUMMARY\n=======\n");
> +	printf("\nSUMMARY (SLAB allocator)");
> +	printf("\n========================\n");
>  	printf("Total bytes requested: %'lu\n", total_requested);
>  	printf("Total bytes allocated: %'lu\n", total_allocated);
>  	printf("Total bytes wasted on internal fragmentation: %'lu\n",
> @@ -335,13 +626,73 @@ static void print_summary(void)
>  	printf("Cross CPU allocations: %'lu/%'lu\n", nr_cross_allocs, nr_allocs);
>  }
>  
> -static void print_result(struct perf_session *session)
> +static void print_page_summary(void)
> +{
> +	int o, m;
> +	u64 nr_alloc_freed = nr_page_frees - nr_page_nomatch;
> +	u64 total_alloc_freed_bytes = total_page_free_bytes - total_page_nomatch_bytes;
> +
> +	printf("\nSUMMARY (page allocator)");
> +	printf("\n========================\n");
> +	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total allocation requests",
> +	       nr_page_allocs, total_page_alloc_bytes / 1024);
> +	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total free requests",
> +	       nr_page_frees, total_page_free_bytes / 1024);
> +	printf("\n");
> +
> +	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total alloc+freed requests",
> +	       nr_alloc_freed, (total_alloc_freed_bytes) / 1024);
> +	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total alloc-only requests",
> +	       nr_page_allocs - nr_alloc_freed,
> +	       (total_page_alloc_bytes - total_alloc_freed_bytes) / 1024);
> +	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total free-only requests",
> +	       nr_page_nomatch, total_page_nomatch_bytes / 1024);
> +	printf("\n");
> +
> +	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total allocation failures",
> +	       nr_page_fails, total_page_fail_bytes / 1024);
> +	printf("\n");
> +
> +	printf("%5s  %12s  %12s  %12s  %12s  %12s\n", "Order",  "Unmovable",
> +	       "Reclaimable", "Movable", "Reserved", "CMA/Isolated");
> +	printf("%.5s  %.12s  %.12s  %.12s  %.12s  %.12s\n", graph_dotted_line,
> +	       graph_dotted_line, graph_dotted_line, graph_dotted_line,
> +	       graph_dotted_line, graph_dotted_line);
> +
> +	for (o = 0; o < MAX_PAGE_ORDER; o++) {
> +		printf("%5d", o);
> +		for (m = 0; m < MAX_MIGRATE_TYPES - 1; m++) {
> +			if (order_stats[o][m])
> +				printf("  %'12d", order_stats[o][m]);
> +			else
> +				printf("  %12c", '.');
> +		}
> +		printf("\n");
> +	}
> +}
> +
> +static void print_slab_result(struct perf_session *session)
>  {
>  	if (caller_flag)
> -		__print_result(&root_caller_sorted, session, caller_lines, 1);
> +		__print_slab_result(&root_caller_sorted, session, caller_lines, 1);
> +	if (alloc_flag)
> +		__print_slab_result(&root_alloc_sorted, session, alloc_lines, 0);
> +	print_slab_summary();
> +}
> +
> +static void print_page_result(struct perf_session *session)
> +{
>  	if (alloc_flag)
> -		__print_result(&root_alloc_sorted, session, alloc_lines, 0);
> -	print_summary();
> +		__print_page_result(&page_alloc_sorted, session, alloc_lines);
> +	print_page_summary();
> +}
> +
> +static void print_result(struct perf_session *session)
> +{
> +	if (kmem_slab)
> +		print_slab_result(session);
> +	if (kmem_page)
> +		print_page_result(session);
>  }
>  
>  struct sort_dimension {
> @@ -353,8 +704,8 @@ struct sort_dimension {
>  static LIST_HEAD(caller_sort);
>  static LIST_HEAD(alloc_sort);
>  
> -static void sort_insert(struct rb_root *root, struct alloc_stat *data,
> -			struct list_head *sort_list)
> +static void sort_slab_insert(struct rb_root *root, struct alloc_stat *data,
> +			     struct list_head *sort_list)
>  {
>  	struct rb_node **new = &(root->rb_node);
>  	struct rb_node *parent = NULL;
> @@ -383,8 +734,8 @@ static void sort_insert(struct rb_root *root, struct alloc_stat *data,
>  	rb_insert_color(&data->node, root);
>  }
>  
> -static void __sort_result(struct rb_root *root, struct rb_root *root_sorted,
> -			  struct list_head *sort_list)
> +static void __sort_slab_result(struct rb_root *root, struct rb_root *root_sorted,
> +			       struct list_head *sort_list)
>  {
>  	struct rb_node *node;
>  	struct alloc_stat *data;
> @@ -396,26 +747,79 @@ static void __sort_result(struct rb_root *root, struct rb_root *root_sorted,
>  
>  		rb_erase(node, root);
>  		data = rb_entry(node, struct alloc_stat, node);
> -		sort_insert(root_sorted, data, sort_list);
> +		sort_slab_insert(root_sorted, data, sort_list);
> +	}
> +}
> +
> +static void sort_page_insert(struct rb_root *root, struct page_stat *data)
> +{
> +	struct rb_node **new = &root->rb_node;
> +	struct rb_node *parent = NULL;
> +
> +	while (*new) {
> +		struct page_stat *this;
> +		int cmp = 0;
> +
> +		this = rb_entry(*new, struct page_stat, node);
> +		parent = *new;
> +
> +		/* TODO: support more sort key */
> +		cmp = data->alloc_bytes - this->alloc_bytes;
> +
> +		if (cmp > 0)
> +			new = &parent->rb_left;
> +		else
> +			new = &parent->rb_right;
> +	}
> +
> +	rb_link_node(&data->node, parent, new);
> +	rb_insert_color(&data->node, root);
> +}
> +
> +static void __sort_page_result(struct rb_root *root, struct rb_root *root_sorted)
> +{
> +	struct rb_node *node;
> +	struct page_stat *data;
> +
> +	for (;;) {
> +		node = rb_first(root);
> +		if (!node)
> +			break;
> +
> +		rb_erase(node, root);
> +		data = rb_entry(node, struct page_stat, node);
> +		sort_page_insert(root_sorted, data);
>  	}
>  }
>  
>  static void sort_result(void)
>  {
> -	__sort_result(&root_alloc_stat, &root_alloc_sorted, &alloc_sort);
> -	__sort_result(&root_caller_stat, &root_caller_sorted, &caller_sort);
> +	if (kmem_slab) {
> +		__sort_slab_result(&root_alloc_stat, &root_alloc_sorted,
> +				   &alloc_sort);
> +		__sort_slab_result(&root_caller_stat, &root_caller_sorted,
> +				   &caller_sort);
> +	}
> +	if (kmem_page) {
> +		__sort_page_result(&page_alloc_tree, &page_alloc_sorted);
> +	}
>  }
>  
>  static int __cmd_kmem(struct perf_session *session)
>  {
>  	int err = -EINVAL;
> +	struct perf_evsel *evsel;
>  	const struct perf_evsel_str_handler kmem_tracepoints[] = {
> +		/* slab allocator */
>  		{ "kmem:kmalloc",		perf_evsel__process_alloc_event, },
>      		{ "kmem:kmem_cache_alloc",	perf_evsel__process_alloc_event, },
>  		{ "kmem:kmalloc_node",		perf_evsel__process_alloc_node_event, },
>      		{ "kmem:kmem_cache_alloc_node", perf_evsel__process_alloc_node_event, },
>  		{ "kmem:kfree",			perf_evsel__process_free_event, },
>      		{ "kmem:kmem_cache_free",	perf_evsel__process_free_event, },
> +		/* page allocator */
> +		{ "kmem:mm_page_alloc",		perf_evsel__process_page_alloc_event, },
> +		{ "kmem:mm_page_free",		perf_evsel__process_page_free_event, },
>  	};
>  
>  	if (!perf_session__has_traces(session, "kmem record"))
> @@ -426,10 +830,20 @@ static int __cmd_kmem(struct perf_session *session)
>  		goto out;
>  	}
>  
> +	evlist__for_each(session->evlist, evsel) {
> +		if (!strcmp(perf_evsel__name(evsel), "kmem:mm_page_alloc") &&
> +		    perf_evsel__field(evsel, "pfn")) {
> +			use_pfn = true;
> +			break;
> +		}
> +	}
> +
>  	setup_pager();
>  	err = perf_session__process_events(session);
> -	if (err != 0)
> +	if (err != 0) {
> +		pr_err("error during process events: %d\n", err);
>  		goto out;
> +	}
>  	sort_result();
>  	print_result(session);
>  out:
> @@ -612,6 +1026,22 @@ static int parse_alloc_opt(const struct option *opt __maybe_unused,
>  	return 0;
>  }
>  
> +static int parse_slab_opt(const struct option *opt __maybe_unused,
> +			  const char *arg __maybe_unused,
> +			  int unset __maybe_unused)
> +{
> +	kmem_slab = (kmem_page + 1);
> +	return 0;
> +}
> +
> +static int parse_page_opt(const struct option *opt __maybe_unused,
> +			  const char *arg __maybe_unused,
> +			  int unset __maybe_unused)
> +{
> +	kmem_page = (kmem_slab + 1);
> +	return 0;
> +}
> +
>  static int parse_line_opt(const struct option *opt __maybe_unused,
>  			  const char *arg, int unset __maybe_unused)
>  {
> @@ -634,6 +1064,8 @@ static int __cmd_record(int argc, const char **argv)
>  {
>  	const char * const record_args[] = {
>  	"record", "-a", "-R", "-c", "1",
> +	};
> +	const char * const slab_events[] = {
>  	"-e", "kmem:kmalloc",
>  	"-e", "kmem:kmalloc_node",
>  	"-e", "kmem:kfree",
> @@ -641,10 +1073,19 @@ static int __cmd_record(int argc, const char **argv)
>  	"-e", "kmem:kmem_cache_alloc_node",
>  	"-e", "kmem:kmem_cache_free",
>  	};
> +	const char * const page_events[] = {
> +	"-e", "kmem:mm_page_alloc",
> +	"-e", "kmem:mm_page_free",
> +	};
>  	unsigned int rec_argc, i, j;
>  	const char **rec_argv;
>  
>  	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
> +	if (kmem_slab)
> +		rec_argc += ARRAY_SIZE(slab_events);
> +	if (kmem_page)
> +		rec_argc += ARRAY_SIZE(page_events);
> +
>  	rec_argv = calloc(rec_argc + 1, sizeof(char *));
>  
>  	if (rec_argv == NULL)
> @@ -653,6 +1094,15 @@ static int __cmd_record(int argc, const char **argv)
>  	for (i = 0; i < ARRAY_SIZE(record_args); i++)
>  		rec_argv[i] = strdup(record_args[i]);
>  
> +	if (kmem_slab) {
> +		for (j = 0; j < ARRAY_SIZE(slab_events); j++, i++)
> +			rec_argv[i] = strdup(slab_events[j]);
> +	}
> +	if (kmem_page) {
> +		for (j = 0; j < ARRAY_SIZE(page_events); j++, i++)
> +			rec_argv[i] = strdup(page_events[j]);
> +	}
> +
>  	for (j = 1; j < (unsigned int)argc; j++, i++)
>  		rec_argv[i] = argv[j];
>  
> @@ -679,6 +1129,10 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
>  	OPT_CALLBACK('l', "line", NULL, "num", "show n lines", parse_line_opt),
>  	OPT_BOOLEAN(0, "raw-ip", &raw_ip, "show raw ip instead of symbol"),
>  	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
> +	OPT_CALLBACK_NOOPT(0, "slab", NULL, NULL, "Analyze slab allocator",
> +			   parse_slab_opt),
> +	OPT_CALLBACK_NOOPT(0, "page", NULL, NULL, "Analyze page allocator",
> +			   parse_page_opt),
>  	OPT_END()
>  	};
>  	const char *const kmem_subcommands[] = { "record", "stat", NULL };
> @@ -695,6 +1149,9 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
>  	if (!argc)
>  		usage_with_options(kmem_usage, kmem_options);
>  
> +	if (kmem_slab == 0 && kmem_page == 0)
> +		kmem_slab = 1;  /* for backward compatibility */
> +
>  	if (!strncmp(argv[0], "rec", 3)) {
>  		symbol__init(NULL);
>  		return __cmd_record(argc, argv);
> @@ -706,6 +1163,17 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
>  	if (session == NULL)
>  		return -1;
>  
> +	if (kmem_page) {
> +		struct perf_evsel *evsel = perf_evlist__first(session->evlist);
> +
> +		if (evsel == NULL || evsel->tp_format == NULL) {
> +			pr_err("invalid event found.. aborting\n");
> +			return -1;
> +		}
> +
> +		kmem_page_size = pevent_get_page_size(evsel->tp_format->pevent);
> +	}
> +
>  	symbol__init(&session->header.env);
>  
>  	if (!strcmp(argv[0], "stat")) {
> -- 
> 2.3.2

^ permalink raw reply	[flat|nested] 50+ messages in thread

[root@ssdandy ~]#

What am I missing?

- Arnaldo
 
>   -------------------------------------------------------------------------------
>    PFN              | Total alloc (KB) | Hits     | Order | Mig.type | GFP flags
>   -------------------------------------------------------------------------------
>             4045014 |               16 |        1 |     2 |  RECLAIM |  00285250
>             4143980 |               16 |        1 |     2 |  RECLAIM |  00285250
>             3938658 |               16 |        1 |     2 |  RECLAIM |  00285250
>             4045400 |               16 |        1 |     2 |  RECLAIM |  00285250
>             3568708 |               16 |        1 |     2 |  RECLAIM |  00285250
>             3729824 |               16 |        1 |     2 |  RECLAIM |  00285250
>             3657210 |               16 |        1 |     2 |  RECLAIM |  00285250
>             4120750 |               16 |        1 |     2 |  RECLAIM |  00285250
>             3678850 |               16 |        1 |     2 |  RECLAIM |  00285250
>             3693874 |               16 |        1 |     2 |  RECLAIM |  00285250
>    ...              | ...              | ...      | ...   | ...      | ...
>   -------------------------------------------------------------------------------
> 
>   SUMMARY (page allocator)
>   ========================
>   Total allocation requests     :           44,260   [          177,256 KB ]
>   Total free requests           :              117   [              468 KB ]
> 
>   Total alloc+freed requests    :               49   [              196 KB ]
>   Total alloc-only requests     :           44,211   [          177,060 KB ]
>   Total free-only requests      :               68   [              272 KB ]
> 
>   Total allocation failures     :                0   [                0 KB ]
> 
>   Order     Unmovable   Reclaimable       Movable      Reserved  CMA/Isolated
>   -----  ------------  ------------  ------------  ------------  ------------
>       0            32             .        44,210             .             .
>       1             .             .             .             .             .
>       2             .            18             .             .             .
>       3             .             .             .             .             .
>       4             .             .             .             .             .
>       5             .             .             .             .             .
>       6             .             .             .             .             .
>       7             .             .             .             .             .
>       8             .             .             .             .             .
>       9             .             .             .             .             .
>      10             .             .             .             .             .
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-kmem.txt |   8 +-
>  tools/perf/builtin-kmem.c              | 500 +++++++++++++++++++++++++++++++--
>  2 files changed, 491 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-kmem.txt b/tools/perf/Documentation/perf-kmem.txt
> index 150253cc3c97..23219c65c16f 100644
> --- a/tools/perf/Documentation/perf-kmem.txt
> +++ b/tools/perf/Documentation/perf-kmem.txt
> @@ -3,7 +3,7 @@ perf-kmem(1)
>  
>  NAME
>  ----
> -perf-kmem - Tool to trace/measure kernel memory(slab) properties
> +perf-kmem - Tool to trace/measure kernel memory properties
>  
>  SYNOPSIS
>  --------
> @@ -46,6 +46,12 @@ OPTIONS
>  --raw-ip::
>  	Print raw ip instead of symbol
>  
> +--slab::
> +	Analyze SLAB allocator events.
> +
> +--page::
> +	Analyze page allocator events
> +
>  SEE ALSO
>  --------
>  linkperf:perf-record[1]
> diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
> index 4ebf65c79434..63ea01349b6e 100644
> --- a/tools/perf/builtin-kmem.c
> +++ b/tools/perf/builtin-kmem.c
> @@ -22,6 +22,11 @@
>  #include <linux/string.h>
>  #include <locale.h>
>  
> +static int	kmem_slab;
> +static int	kmem_page;
> +
> +static long	kmem_page_size;
> +
>  struct alloc_stat;
>  typedef int (*sort_fn_t)(struct alloc_stat *, struct alloc_stat *);
>  
> @@ -226,6 +231,244 @@ static int perf_evsel__process_free_event(struct perf_evsel *evsel,
>  	return 0;
>  }
>  
> +static u64 total_page_alloc_bytes;
> +static u64 total_page_free_bytes;
> +static u64 total_page_nomatch_bytes;
> +static u64 total_page_fail_bytes;
> +static unsigned long nr_page_allocs;
> +static unsigned long nr_page_frees;
> +static unsigned long nr_page_fails;
> +static unsigned long nr_page_nomatch;
> +
> +static bool use_pfn;
> +
> +#define MAX_MIGRATE_TYPES  6
> +#define MAX_PAGE_ORDER     11
> +
> +static int order_stats[MAX_PAGE_ORDER][MAX_MIGRATE_TYPES];
> +
> +struct page_stat {
> +	struct rb_node 	node;
> +	u64 		page;
> +	int 		order;
> +	unsigned 	gfp_flags;
> +	unsigned 	migrate_type;
> +	u64		alloc_bytes;
> +	u64 		free_bytes;
> +	int 		nr_alloc;
> +	int 		nr_free;
> +};
> +
> +static struct rb_root page_tree;
> +static struct rb_root page_alloc_tree;
> +static struct rb_root page_alloc_sorted;
> +
> +static struct page_stat *search_page(unsigned long page, bool create)
> +{
> +	struct rb_node **node = &page_tree.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct page_stat *data;
> +
> +	while (*node) {
> +		s64 cmp;
> +
> +		parent = *node;
> +		data = rb_entry(*node, struct page_stat, node);
> +
> +		cmp = data->page - page;
> +		if (cmp < 0)
> +			node = &parent->rb_left;
> +		else if (cmp > 0)
> +			node = &parent->rb_right;
> +		else
> +			return data;
> +	}
> +
> +	if (!create)
> +		return NULL;
> +
> +	data = zalloc(sizeof(*data));
> +	if (data != NULL) {
> +		data->page = page;
> +
> +		rb_link_node(&data->node, parent, node);
> +		rb_insert_color(&data->node, &page_tree);
> +	}
> +
> +	return data;
> +}
> +
> +static int page_stat_cmp(struct page_stat *a, struct page_stat *b)
> +{
> +	if (a->page > b->page)
> +		return -1;
> +	if (a->page < b->page)
> +		return 1;
> +	if (a->order > b->order)
> +		return -1;
> +	if (a->order < b->order)
> +		return 1;
> +	if (a->migrate_type > b->migrate_type)
> +		return -1;
> +	if (a->migrate_type < b->migrate_type)
> +		return 1;
> +	if (a->gfp_flags > b->gfp_flags)
> +		return -1;
> +	if (a->gfp_flags < b->gfp_flags)
> +		return 1;
> +	return 0;
> +}
> +
> +static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool create)
> +{
> +	struct rb_node **node = &page_alloc_tree.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct page_stat *data;
> +
> +	while (*node) {
> +		s64 cmp;
> +
> +		parent = *node;
> +		data = rb_entry(*node, struct page_stat, node);
> +
> +		cmp = page_stat_cmp(data, stat);
> +		if (cmp < 0)
> +			node = &parent->rb_left;
> +		else if (cmp > 0)
> +			node = &parent->rb_right;
> +		else
> +			return data;
> +	}
> +
> +	if (!create)
> +		return NULL;
> +
> +	data = zalloc(sizeof(*data));
> +	if (data != NULL) {
> +		data->page = stat->page;
> +		data->order = stat->order;
> +		data->gfp_flags = stat->gfp_flags;
> +		data->migrate_type = stat->migrate_type;
> +
> +		rb_link_node(&data->node, parent, node);
> +		rb_insert_color(&data->node, &page_alloc_tree);
> +	}
> +
> +	return data;
> +}
> +
> +static bool valid_page(u64 pfn_or_page)
> +{
> +	if (use_pfn && pfn_or_page == -1UL)
> +		return false;
> +	if (!use_pfn && pfn_or_page == 0)
> +		return false;
> +	return true;
> +}
> +
> +static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
> +						struct perf_sample *sample)
> +{
> +	u64 page;
> +	unsigned int order = perf_evsel__intval(evsel, sample, "order");
> +	unsigned int gfp_flags = perf_evsel__intval(evsel, sample, "gfp_flags");
> +	unsigned int migrate_type = perf_evsel__intval(evsel, sample,
> +						       "migratetype");
> +	u64 bytes = kmem_page_size << order;
> +	struct page_stat *stat;
> +	struct page_stat this = {
> +		.order = order,
> +		.gfp_flags = gfp_flags,
> +		.migrate_type = migrate_type,
> +	};
> +
> +	if (use_pfn)
> +		page = perf_evsel__intval(evsel, sample, "pfn");
> +	else
> +		page = perf_evsel__intval(evsel, sample, "page");
> +
> +	nr_page_allocs++;
> +	total_page_alloc_bytes += bytes;
> +
> +	if (!valid_page(page)) {
> +		nr_page_fails++;
> +		total_page_fail_bytes += bytes;
> +
> +		return 0;
> +	}
> +
> +	/*
> +	 * This is to find the current page (with correct gfp flags and
> +	 * migrate type) at free event.
> +	 */
> +	stat = search_page(page, true);
> +	if (stat == NULL)
> +		return -ENOMEM;
> +
> +	stat->order = order;
> +	stat->gfp_flags = gfp_flags;
> +	stat->migrate_type = migrate_type;
> +
> +	this.page = page;
> +	stat = search_page_alloc_stat(&this, true);
> +	if (stat == NULL)
> +		return -ENOMEM;
> +
> +	stat->nr_alloc++;
> +	stat->alloc_bytes += bytes;
> +
> +	order_stats[order][migrate_type]++;
> +
> +	return 0;
> +}
> +
> +static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
> +						struct perf_sample *sample)
> +{
> +	u64 page;
> +	unsigned int order = perf_evsel__intval(evsel, sample, "order");
> +	u64 bytes = kmem_page_size << order;
> +	struct page_stat *stat;
> +	struct page_stat this = {
> +		.order = order,
> +	};
> +
> +	if (use_pfn)
> +		page = perf_evsel__intval(evsel, sample, "pfn");
> +	else
> +		page = perf_evsel__intval(evsel, sample, "page");
> +
> +	nr_page_frees++;
> +	total_page_free_bytes += bytes;
> +
> +	stat = search_page(page, false);
> +	if (stat == NULL) {
> +		pr_debug2("missing free at page %"PRIx64" (order: %d)\n",
> +			  page, order);
> +
> +		nr_page_nomatch++;
> +		total_page_nomatch_bytes += bytes;
> +
> +		return 0;
> +	}
> +
> +	this.page = page;
> +	this.gfp_flags = stat->gfp_flags;
> +	this.migrate_type = stat->migrate_type;
> +
> +	rb_erase(&stat->node, &page_tree);
> +	free(stat);
> +
> +	stat = search_page_alloc_stat(&this, false);
> +	if (stat == NULL)
> +		return -ENOENT;
> +
> +	stat->nr_free++;
> +	stat->free_bytes += bytes;
> +
> +	return 0;
> +}
> +
>  typedef int (*tracepoint_handler)(struct perf_evsel *evsel,
>  				  struct perf_sample *sample);
>  
> @@ -270,8 +513,9 @@ static double fragmentation(unsigned long n_req, unsigned long n_alloc)
>  		return 100.0 - (100.0 * n_req / n_alloc);
>  }
>  
> -static void __print_result(struct rb_root *root, struct perf_session *session,
> -			   int n_lines, int is_caller)
> +static void __print_slab_result(struct rb_root *root,
> +				struct perf_session *session,
> +				int n_lines, int is_caller)
>  {
>  	struct rb_node *next;
>  	struct machine *machine = &session->machines.host;
> @@ -323,9 +567,56 @@ static void __print_result(struct rb_root *root, struct perf_session *session,
>  	printf("%.105s\n", graph_dotted_line);
>  }
>  
> -static void print_summary(void)
> +static const char * const migrate_type_str[] = {
> +	"UNMOVABL",
> +	"RECLAIM",
> +	"MOVABLE",
> +	"RESERVED",
> +	"CMA/ISLT",
> +	"UNKNOWN",
> +};
> +
> +static void __print_page_result(struct rb_root *root,
> +				struct perf_session *session __maybe_unused,
> +				int n_lines)
> +{
> +	struct rb_node *next = rb_first(root);
> +	const char *format;
> +
> +	printf("\n%.80s\n", graph_dotted_line);
> +	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags\n",
> +	       use_pfn ? "PFN" : "Page");
> +	printf("%.80s\n", graph_dotted_line);
> +
> +	if (use_pfn)
> +		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx\n";
> +	else
> +		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx\n";
> +
> +	while (next && n_lines--) {
> +		struct page_stat *data;
> +
> +		data = rb_entry(next, struct page_stat, node);
> +
> +		printf(format, (unsigned long long)data->page,
> +		       (unsigned long long)data->alloc_bytes / 1024,
> +		       data->nr_alloc, data->order,
> +		       migrate_type_str[data->migrate_type],
> +		       (unsigned long)data->gfp_flags);
> +
> +		next = rb_next(next);
> +	}
> +
> +	if (n_lines == -1)
> +		printf(" ...              | ...              | ...       | ...   | ...      | ...     \n");
> +
> +	printf("%.80s\n", graph_dotted_line);
> +}
> +
> +static void print_slab_summary(void)
>  {
> -	printf("\nSUMMARY\n=======\n");
> +	printf("\nSUMMARY (SLAB allocator)");
> +	printf("\n========================\n");
>  	printf("Total bytes requested: %'lu\n", total_requested);
>  	printf("Total bytes allocated: %'lu\n", total_allocated);
>  	printf("Total bytes wasted on internal fragmentation: %'lu\n",
> @@ -335,13 +626,73 @@ static void print_summary(void)
>  	printf("Cross CPU allocations: %'lu/%'lu\n", nr_cross_allocs, nr_allocs);
>  }
>  
> -static void print_result(struct perf_session *session)
> +static void print_page_summary(void)
> +{
> +	int o, m;
> +	u64 nr_alloc_freed = nr_page_frees - nr_page_nomatch;
> +	u64 total_alloc_freed_bytes = total_page_free_bytes - total_page_nomatch_bytes;
> +
> +	printf("\nSUMMARY (page allocator)");
> +	printf("\n========================\n");
> +	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total allocation requests",
> +	       nr_page_allocs, total_page_alloc_bytes / 1024);
> +	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total free requests",
> +	       nr_page_frees, total_page_free_bytes / 1024);
> +	printf("\n");
> +
> +	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total alloc+freed requests",
> +	       nr_alloc_freed, (total_alloc_freed_bytes) / 1024);
> +	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total alloc-only requests",
> +	       nr_page_allocs - nr_alloc_freed,
> +	       (total_page_alloc_bytes - total_alloc_freed_bytes) / 1024);
> +	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total free-only requests",
> +	       nr_page_nomatch, total_page_nomatch_bytes / 1024);
> +	printf("\n");
> +
> +	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total allocation failures",
> +	       nr_page_fails, total_page_fail_bytes / 1024);
> +	printf("\n");
> +
> +	printf("%5s  %12s  %12s  %12s  %12s  %12s\n", "Order",  "Unmovable",
> +	       "Reclaimable", "Movable", "Reserved", "CMA/Isolated");
> +	printf("%.5s  %.12s  %.12s  %.12s  %.12s  %.12s\n", graph_dotted_line,
> +	       graph_dotted_line, graph_dotted_line, graph_dotted_line,
> +	       graph_dotted_line, graph_dotted_line);
> +
> +	for (o = 0; o < MAX_PAGE_ORDER; o++) {
> +		printf("%5d", o);
> +		for (m = 0; m < MAX_MIGRATE_TYPES - 1; m++) {
> +			if (order_stats[o][m])
> +				printf("  %'12d", order_stats[o][m]);
> +			else
> +				printf("  %12c", '.');
> +		}
> +		printf("\n");
> +	}
> +}
> +
> +static void print_slab_result(struct perf_session *session)
>  {
>  	if (caller_flag)
> -		__print_result(&root_caller_sorted, session, caller_lines, 1);
> +		__print_slab_result(&root_caller_sorted, session, caller_lines, 1);
> +	if (alloc_flag)
> +		__print_slab_result(&root_alloc_sorted, session, alloc_lines, 0);
> +	print_slab_summary();
> +}
> +
> +static void print_page_result(struct perf_session *session)
> +{
>  	if (alloc_flag)
> -		__print_result(&root_alloc_sorted, session, alloc_lines, 0);
> -	print_summary();
> +		__print_page_result(&page_alloc_sorted, session, alloc_lines);
> +	print_page_summary();
> +}
> +
> +static void print_result(struct perf_session *session)
> +{
> +	if (kmem_slab)
> +		print_slab_result(session);
> +	if (kmem_page)
> +		print_page_result(session);
>  }
>  
>  struct sort_dimension {
> @@ -353,8 +704,8 @@ struct sort_dimension {
>  static LIST_HEAD(caller_sort);
>  static LIST_HEAD(alloc_sort);
>  
> -static void sort_insert(struct rb_root *root, struct alloc_stat *data,
> -			struct list_head *sort_list)
> +static void sort_slab_insert(struct rb_root *root, struct alloc_stat *data,
> +			     struct list_head *sort_list)
>  {
>  	struct rb_node **new = &(root->rb_node);
>  	struct rb_node *parent = NULL;
> @@ -383,8 +734,8 @@ static void sort_insert(struct rb_root *root, struct alloc_stat *data,
>  	rb_insert_color(&data->node, root);
>  }
>  
> -static void __sort_result(struct rb_root *root, struct rb_root *root_sorted,
> -			  struct list_head *sort_list)
> +static void __sort_slab_result(struct rb_root *root, struct rb_root *root_sorted,
> +			       struct list_head *sort_list)
>  {
>  	struct rb_node *node;
>  	struct alloc_stat *data;
> @@ -396,26 +747,79 @@ static void __sort_result(struct rb_root *root, struct rb_root *root_sorted,
>  
>  		rb_erase(node, root);
>  		data = rb_entry(node, struct alloc_stat, node);
> -		sort_insert(root_sorted, data, sort_list);
> +		sort_slab_insert(root_sorted, data, sort_list);
> +	}
> +}
> +
> +static void sort_page_insert(struct rb_root *root, struct page_stat *data)
> +{
> +	struct rb_node **new = &root->rb_node;
> +	struct rb_node *parent = NULL;
> +
> +	while (*new) {
> +		struct page_stat *this;
> +		int cmp = 0;
> +
> +		this = rb_entry(*new, struct page_stat, node);
> +		parent = *new;
> +
> +		/* TODO: support more sort key */
> +		cmp = data->alloc_bytes - this->alloc_bytes;
> +
> +		if (cmp > 0)
> +			new = &parent->rb_left;
> +		else
> +			new = &parent->rb_right;
> +	}
> +
> +	rb_link_node(&data->node, parent, new);
> +	rb_insert_color(&data->node, root);
> +}
> +
> +static void __sort_page_result(struct rb_root *root, struct rb_root *root_sorted)
> +{
> +	struct rb_node *node;
> +	struct page_stat *data;
> +
> +	for (;;) {
> +		node = rb_first(root);
> +		if (!node)
> +			break;
> +
> +		rb_erase(node, root);
> +		data = rb_entry(node, struct page_stat, node);
> +		sort_page_insert(root_sorted, data);
>  	}
>  }
>  
>  static void sort_result(void)
>  {
> -	__sort_result(&root_alloc_stat, &root_alloc_sorted, &alloc_sort);
> -	__sort_result(&root_caller_stat, &root_caller_sorted, &caller_sort);
> +	if (kmem_slab) {
> +		__sort_slab_result(&root_alloc_stat, &root_alloc_sorted,
> +				   &alloc_sort);
> +		__sort_slab_result(&root_caller_stat, &root_caller_sorted,
> +				   &caller_sort);
> +	}
> +	if (kmem_page) {
> +		__sort_page_result(&page_alloc_tree, &page_alloc_sorted);
> +	}
>  }
>  
>  static int __cmd_kmem(struct perf_session *session)
>  {
>  	int err = -EINVAL;
> +	struct perf_evsel *evsel;
>  	const struct perf_evsel_str_handler kmem_tracepoints[] = {
> +		/* slab allocator */
>  		{ "kmem:kmalloc",		perf_evsel__process_alloc_event, },
>      		{ "kmem:kmem_cache_alloc",	perf_evsel__process_alloc_event, },
>  		{ "kmem:kmalloc_node",		perf_evsel__process_alloc_node_event, },
>      		{ "kmem:kmem_cache_alloc_node", perf_evsel__process_alloc_node_event, },
>  		{ "kmem:kfree",			perf_evsel__process_free_event, },
>      		{ "kmem:kmem_cache_free",	perf_evsel__process_free_event, },
> +		/* page allocator */
> +		{ "kmem:mm_page_alloc",		perf_evsel__process_page_alloc_event, },
> +		{ "kmem:mm_page_free",		perf_evsel__process_page_free_event, },
>  	};
>  
>  	if (!perf_session__has_traces(session, "kmem record"))
> @@ -426,10 +830,20 @@ static int __cmd_kmem(struct perf_session *session)
>  		goto out;
>  	}
>  
> +	evlist__for_each(session->evlist, evsel) {
> +		if (!strcmp(perf_evsel__name(evsel), "kmem:mm_page_alloc") &&
> +		    perf_evsel__field(evsel, "pfn")) {
> +			use_pfn = true;
> +			break;
> +		}
> +	}
> +
>  	setup_pager();
>  	err = perf_session__process_events(session);
> -	if (err != 0)
> +	if (err != 0) {
> +		pr_err("error during process events: %d\n", err);
>  		goto out;
> +	}
>  	sort_result();
>  	print_result(session);
>  out:
> @@ -612,6 +1026,22 @@ static int parse_alloc_opt(const struct option *opt __maybe_unused,
>  	return 0;
>  }
>  
> +static int parse_slab_opt(const struct option *opt __maybe_unused,
> +			  const char *arg __maybe_unused,
> +			  int unset __maybe_unused)
> +{
> +	kmem_slab = (kmem_page + 1);
> +	return 0;
> +}
> +
> +static int parse_page_opt(const struct option *opt __maybe_unused,
> +			  const char *arg __maybe_unused,
> +			  int unset __maybe_unused)
> +{
> +	kmem_page = (kmem_slab + 1);
> +	return 0;
> +}
> +
>  static int parse_line_opt(const struct option *opt __maybe_unused,
>  			  const char *arg, int unset __maybe_unused)
>  {
> @@ -634,6 +1064,8 @@ static int __cmd_record(int argc, const char **argv)
>  {
>  	const char * const record_args[] = {
>  	"record", "-a", "-R", "-c", "1",
> +	};
> +	const char * const slab_events[] = {
>  	"-e", "kmem:kmalloc",
>  	"-e", "kmem:kmalloc_node",
>  	"-e", "kmem:kfree",
> @@ -641,10 +1073,19 @@ static int __cmd_record(int argc, const char **argv)
>  	"-e", "kmem:kmem_cache_alloc_node",
>  	"-e", "kmem:kmem_cache_free",
>  	};
> +	const char * const page_events[] = {
> +	"-e", "kmem:mm_page_alloc",
> +	"-e", "kmem:mm_page_free",
> +	};
>  	unsigned int rec_argc, i, j;
>  	const char **rec_argv;
>  
>  	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
> +	if (kmem_slab)
> +		rec_argc += ARRAY_SIZE(slab_events);
> +	if (kmem_page)
> +		rec_argc += ARRAY_SIZE(page_events);
> +
>  	rec_argv = calloc(rec_argc + 1, sizeof(char *));
>  
>  	if (rec_argv == NULL)
> @@ -653,6 +1094,15 @@ static int __cmd_record(int argc, const char **argv)
>  	for (i = 0; i < ARRAY_SIZE(record_args); i++)
>  		rec_argv[i] = strdup(record_args[i]);
>  
> +	if (kmem_slab) {
> +		for (j = 0; j < ARRAY_SIZE(slab_events); j++, i++)
> +			rec_argv[i] = strdup(slab_events[j]);
> +	}
> +	if (kmem_page) {
> +		for (j = 0; j < ARRAY_SIZE(page_events); j++, i++)
> +			rec_argv[i] = strdup(page_events[j]);
> +	}
> +
>  	for (j = 1; j < (unsigned int)argc; j++, i++)
>  		rec_argv[i] = argv[j];
>  
> @@ -679,6 +1129,10 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
>  	OPT_CALLBACK('l', "line", NULL, "num", "show n lines", parse_line_opt),
>  	OPT_BOOLEAN(0, "raw-ip", &raw_ip, "show raw ip instead of symbol"),
>  	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
> +	OPT_CALLBACK_NOOPT(0, "slab", NULL, NULL, "Analyze slab allocator",
> +			   parse_slab_opt),
> +	OPT_CALLBACK_NOOPT(0, "page", NULL, NULL, "Analyze page allocator",
> +			   parse_page_opt),
>  	OPT_END()
>  	};
>  	const char *const kmem_subcommands[] = { "record", "stat", NULL };
> @@ -695,6 +1149,9 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
>  	if (!argc)
>  		usage_with_options(kmem_usage, kmem_options);
>  
> +	if (kmem_slab == 0 && kmem_page == 0)
> +		kmem_slab = 1;  /* for backward compatibility */
> +
>  	if (!strncmp(argv[0], "rec", 3)) {
>  		symbol__init(NULL);
>  		return __cmd_record(argc, argv);
> @@ -706,6 +1163,17 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
>  	if (session == NULL)
>  		return -1;
>  
> +	if (kmem_page) {
> +		struct perf_evsel *evsel = perf_evlist__first(session->evlist);
> +
> +		if (evsel == NULL || evsel->tp_format == NULL) {
> +			pr_err("invalid event found.. aborting\n");
> +			return -1;
> +		}
> +
> +		kmem_page_size = pevent_get_page_size(evsel->tp_format->pevent);
> +	}
> +
>  	symbol__init(&session->header.env);
>  
>  	if (!strcmp(argv[0], "stat")) {
> -- 
> 2.3.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/9] perf kmem: Analyze page allocator events also
  2015-04-10 21:06     ` Arnaldo Carvalho de Melo
@ 2015-04-10 21:10       ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-10 21:10 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Em Fri, Apr 10, 2015 at 06:06:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Apr 06, 2015 at 02:36:10PM +0900, Namhyung Kim escreveu:
> > The perf kmem command records and analyzes kernel memory allocation
> > only for SLAB objects.  This patch implements a simple page allocator
> > analyzer using kmem:mm_page_alloc and kmem:mm_page_free events.
> > 
> > It adds two new options: --slab and --page.  The --slab option is
> > for analyzing the SLAB allocator and covers what perf kmem currently does.
> > 
> > The new --page option enables page allocator events and analyzes kernel
> > memory usage in page units.  Currently, only the 'stat --alloc'
> > subcommand is implemented.
> > 
> > If neither --slab nor --page is specified, --slab is implied.
> > 
> >   # perf kmem stat --page --alloc --line 10
> 
> Hi, I applied the first patch, the kernel one, and rebooted with that kernel:

<SNIP>

> [root@ssdandy ~]#
> 
> What am I missing?

Argh, I was expecting to read just what is in that cset and be able to
reproduce the results; I had to go back to the [PATCH 0/0] cover letter
to figure out that I need to run:

perf kmem record --page sleep 5

With that I get:

[root@ssdandy ~]# perf kmem stat --page --alloc --line 20

--------------------------------------------------------------------------------
 PFN              | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags
--------------------------------------------------------------------------------
          3487838 |               12 |         3 |     0 | UNMOVABL |  00020010
          3493414 |                8 |         2 |     0 | UNMOVABL |  000284d0
          3487761 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3487764 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3487982 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3487991 |                4 |         1 |     0 | UNMOVABL |  000284d0
          3488046 |                4 |         1 |     0 | UNMOVABL |  002284d0
          3488057 |                4 |         1 |     0 | UNMOVABL |  000200d0
          3488191 |                4 |         1 |     0 | UNMOVABL |  002284d0
          3488203 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3488206 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3488210 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3488211 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3488213 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3488215 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3488298 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3488325 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3488326 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3488327 |                4 |         1 |     0 | UNMOVABL |  000202d0
          3488329 |                4 |         1 |     0 | UNMOVABL |  000202d0
 ...              | ...              | ...       | ...   | ...      | ...     
--------------------------------------------------------------------------------

SUMMARY (page allocator)
========================
Total allocation requests     :              166   [              664 KB ]
Total free requests           :              239   [              956 KB ]

Total alloc+freed requests    :               49   [              196 KB ]
Total alloc-only requests     :              117   [              468 KB ]
Total free-only requests      :              190   [              760 KB ]

Total allocation failures     :                0   [                0 KB ]

Order     Unmovable   Reclaimable       Movable      Reserved  CMA/Isolated
-----  ------------  ------------  ------------  ------------  ------------
    0           143             .            23             .             .
    1             .             .             .             .             .
    2             .             .             .             .             .
    3             .             .             .             .             .
    4             .             .             .             .             .
    5             .             .             .             .             .
    6             .             .             .             .             .
    7             .             .             .             .             .
    8             .             .             .             .             .
    9             .             .             .             .             .
   10             .             .             .             .             .
[root@ssdandy ~]#

- Arnaldo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/9] perf kmem: Analyze page allocator events also
  2015-04-10 21:10       ` Arnaldo Carvalho de Melo
@ 2015-04-13  6:59         ` Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-13  6:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Hi Arnaldo,

On Fri, Apr 10, 2015 at 06:10:49PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, Apr 10, 2015 at 06:06:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Mon, Apr 06, 2015 at 02:36:10PM +0900, Namhyung Kim escreveu:
> > > The perf kmem command records and analyzes kernel memory allocation
> > > only for SLAB objects.  This patch implements a simple page allocator
> > > analyzer using kmem:mm_page_alloc and kmem:mm_page_free events.
> > > 
> > > It adds two new options, --slab and --page.  The --slab option is
> > > for analyzing the SLAB allocator and that's what perf kmem currently does.
> > > 
> > > The new --page option enables page allocator events and analyzes kernel
> > > memory usage at page granularity.  Currently, only the 'stat --alloc'
> > > subcommand is implemented.
> > > 
> > > If neither --slab nor --page is specified, --slab is implied.
> > > 
> > >   # perf kmem stat --page --alloc --line 10
> > 
> > Hi, applied the first patch, the kernel one, reboot with that kernel:
> 
> <SNIP>
> 
> > [root@ssdandy ~]#
> > 
> > What am I missing?
> 
> Argh, I was expecting to read just what is in that cset and be able to
> reproduce the results, had to go back to the [PATCH 0/0] cover letter to
> figure out that I need to run:
> 
> perf kmem record --page sleep 5

Right.  Maybe I should change it to print a warning if no events are
found with the given option.


> 
> With that I get:
> 
> [root@ssdandy ~]# perf kmem stat --page --alloc --line 20
> 
> --------------------------------------------------------------------------------
>  PFN              | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags
> --------------------------------------------------------------------------------
>           3487838 |               12 |         3 |     0 | UNMOVABL |  00020010
>           3493414 |                8 |         2 |     0 | UNMOVABL |  000284d0
>           3487761 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3487764 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3487982 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3487991 |                4 |         1 |     0 | UNMOVABL |  000284d0
>           3488046 |                4 |         1 |     0 | UNMOVABL |  002284d0
>           3488057 |                4 |         1 |     0 | UNMOVABL |  000200d0
>           3488191 |                4 |         1 |     0 | UNMOVABL |  002284d0
>           3488203 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3488206 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3488210 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3488211 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3488213 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3488215 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3488298 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3488325 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3488326 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3488327 |                4 |         1 |     0 | UNMOVABL |  000202d0
>           3488329 |                4 |         1 |     0 | UNMOVABL |  000202d0
>  ...              | ...              | ...       | ...   | ...      | ...     
> --------------------------------------------------------------------------------

Hmm.. looks like you ran some old version.  Please check v6! :)

Thanks,
Namhyung


> 
> SUMMARY (page allocator)
> ========================
> Total allocation requests     :              166   [              664 KB ]
> Total free requests           :              239   [              956 KB ]
> 
> Total alloc+freed requests    :               49   [              196 KB ]
> Total alloc-only requests     :              117   [              468 KB ]
> Total free-only requests      :              190   [              760 KB ]
> 
> Total allocation failures     :                0   [                0 KB ]
> 
> Order     Unmovable   Reclaimable       Movable      Reserved  CMA/Isolated
> -----  ------------  ------------  ------------  ------------  ------------
>     0           143             .            23             .             .
>     1             .             .             .             .             .
>     2             .             .             .             .             .
>     3             .             .             .             .             .
>     4             .             .             .             .             .
>     5             .             .             .             .             .
>     6             .             .             .             .             .
>     7             .             .             .             .             .
>     8             .             .             .             .             .
>     9             .             .             .             .             .
>    10             .             .             .             .             .
> [root@ssdandy ~]#
> 
> - Arnaldo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/9] perf kmem: Analyze page allocator events also
  2015-04-13  6:59         ` Namhyung Kim
@ 2015-04-13 13:21           ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-13 13:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Em Mon, Apr 13, 2015 at 03:59:24PM +0900, Namhyung Kim escreveu:
> On Fri, Apr 10, 2015 at 06:10:49PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Apr 10, 2015 at 06:06:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Mon, Apr 06, 2015 at 02:36:10PM +0900, Namhyung Kim escreveu:
> > > > If neither --slab nor --page is specified, --slab is implied.

> > > >   # perf kmem stat --page --alloc --line 10

> > > Hi, applied the first patch, the kernel one, reboot with that kernel:

> > <SNIP>

> > > [root@ssdandy ~]#

> > > What am I missing?

> > Argh, I was expecting to read just what is in that cset and be able to
> > reproduce the results, had to go back to the [PATCH 0/0] cover letter to
> > figure out that I need to run:

> > perf kmem record --page sleep 5

> Right.  Maybe I should change it to print a warning if no events are
> found with the given option.

Ok!

> Hmm.. looks like you ran some old version.  Please check v6! :)

Thanks, will do,

- Arnaldo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/9] perf kmem: Implement stat --page --caller
  2015-04-06  5:36   ` Namhyung Kim
@ 2015-04-13 13:40     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-13 13:40 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Em Mon, Apr 06, 2015 at 02:36:11PM +0900, Namhyung Kim escreveu:
> This makes perf kmem support caller statistics for pages.  Unlike the
> slab case, the tracepoints in the page allocator don't provide callsite
> info directly.  So it records callchains and extracts callsite info.
> 
> Note that the callchain contains several memory allocation functions
> which have no meaning for users.  So skip those functions to get proper
> callsites.  I used the following regex pattern to skip the allocator
> functions:
> 
>   ^_?_?(alloc|get_free|get_zeroed)_pages?
> 
> This gave me the following list of functions:
> 
>   # perf kmem record --page sleep 3
>   # perf kmem stat --page -v
>   ...
>   alloc func: __get_free_pages
>   alloc func: get_zeroed_page
>   alloc func: alloc_pages_exact
>   alloc func: __alloc_pages_direct_compact
>   alloc func: __alloc_pages_nodemask
>   alloc func: alloc_page_interleave
>   alloc func: alloc_pages_current
>   alloc func: alloc_pages_vma
>   alloc func: alloc_page_buffers
>   alloc func: alloc_pages_exact_nid
>   ...
> 
> The output looks mostly the same as --alloc (I also added a callsite
> column to that) but groups entries by callsite.  Currently, the order,
> migrate type and GFP flag info are for the last allocation and not
> guaranteed to be the same for all allocations from the callsite.
> 
>   ---------------------------------------------------------------------------------------------
>    Total_alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite
>   ---------------------------------------------------------------------------------------------
>               1,064 |       266 |     0 | UNMOVABL |  000000d0 | __pollwait
>                  52 |        13 |     0 | UNMOVABL |  002084d0 | pte_alloc_one
>                  44 |        11 |     0 |  MOVABLE |  000280da | handle_mm_fault
>                  20 |         5 |     0 |  MOVABLE |  000200da | do_cow_fault
>                  20 |         5 |     0 |  MOVABLE |  000200da | do_wp_page
>                  16 |         4 |     0 | UNMOVABL |  000084d0 | __pmd_alloc
>                  16 |         4 |     0 | UNMOVABL |  00000200 | __tlb_remove_page
>                  12 |         3 |     0 | UNMOVABL |  000084d0 | __pud_alloc
>                   8 |         2 |     0 | UNMOVABL |  00000010 | bio_copy_user_iov
>                   4 |         1 |     0 | UNMOVABL |  000200d2 | pipe_write
>                   4 |         1 |     0 |  MOVABLE |  000280da | do_wp_page
>                   4 |         1 |     0 | UNMOVABL |  002084d0 | pgd_alloc
>   ---------------------------------------------------------------------------------------------
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-kmem.c | 279 +++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 263 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
> index 63ea01349b6e..5b3ed17c293a 100644
> --- a/tools/perf/builtin-kmem.c
> +++ b/tools/perf/builtin-kmem.c
> @@ -10,6 +10,7 @@
>  #include "util/header.h"
>  #include "util/session.h"
>  #include "util/tool.h"
> +#include "util/callchain.h"
>  
>  #include "util/parse-options.h"
>  #include "util/trace-event.h"
> @@ -21,6 +22,7 @@
>  #include <linux/rbtree.h>
>  #include <linux/string.h>
>  #include <locale.h>
> +#include <regex.h>
>  
>  static int	kmem_slab;
>  static int	kmem_page;
> @@ -241,6 +243,7 @@ static unsigned long nr_page_fails;
>  static unsigned long nr_page_nomatch;
>  
>  static bool use_pfn;
> +static struct perf_session *kmem_session;
>  
>  #define MAX_MIGRATE_TYPES  6
>  #define MAX_PAGE_ORDER     11
> @@ -250,6 +253,7 @@ static int order_stats[MAX_PAGE_ORDER][MAX_MIGRATE_TYPES];
>  struct page_stat {
>  	struct rb_node 	node;
>  	u64 		page;
> +	u64 		callsite;
>  	int 		order;
>  	unsigned 	gfp_flags;
>  	unsigned 	migrate_type;
> @@ -262,8 +266,138 @@ struct page_stat {
>  static struct rb_root page_tree;
>  static struct rb_root page_alloc_tree;
>  static struct rb_root page_alloc_sorted;
> +static struct rb_root page_caller_tree;
> +static struct rb_root page_caller_sorted;
>  
> -static struct page_stat *search_page(unsigned long page, bool create)
> +struct alloc_func {
> +	u64 start;
> +	u64 end;
> +	char *name;
> +};
> +
> +static int nr_alloc_funcs;
> +static struct alloc_func *alloc_func_list;
> +
> +static int funcmp(const void *a, const void *b)
> +{
> +	const struct alloc_func *fa = a;
> +	const struct alloc_func *fb = b;
> +
> +	if (fa->start > fb->start)
> +		return 1;
> +	else
> +		return -1;
> +}
> +
> +static int callcmp(const void *a, const void *b)
> +{
> +	const struct alloc_func *fa = a;
> +	const struct alloc_func *fb = b;
> +
> +	if (fb->start <= fa->start && fa->end < fb->end)
> +		return 0;
> +
> +	if (fa->start > fb->start)
> +		return 1;
> +	else
> +		return -1;
> +}
> +
> +static int build_alloc_func_list(void)
> +{
> +	int ret;
> +	struct map *kernel_map;
> +	struct symbol *sym;
> +	struct rb_node *node;
> +	struct alloc_func *func;
> +	struct machine *machine = &kmem_session->machines.host;
> +

Why have a blank line here?

> +	regex_t alloc_func_regex;
> +	const char pattern[] = "^_?_?(alloc|get_free|get_zeroed)_pages?";
> +
> +	ret = regcomp(&alloc_func_regex, pattern, REG_EXTENDED);
> +	if (ret) {
> +		char err[BUFSIZ];
> +
> +		regerror(ret, &alloc_func_regex, err, sizeof(err));
> +		pr_err("Invalid regex: %s\n%s", pattern, err);
> +		return -EINVAL;
> +	}
> +
> +	kernel_map = machine->vmlinux_maps[MAP__FUNCTION];
> +	map__load(kernel_map, NULL);

What if the map can't be loaded?

> +
> +	map__for_each_symbol(kernel_map, sym, node) {
> +		if (regexec(&alloc_func_regex, sym->name, 0, NULL, 0))
> +			continue;
> +
> +		func = realloc(alloc_func_list,
> +			       (nr_alloc_funcs + 1) * sizeof(*func));
> +		if (func == NULL)
> +			return -ENOMEM;
> +
> +		pr_debug("alloc func: %s\n", sym->name);
> +		func[nr_alloc_funcs].start = sym->start;
> +		func[nr_alloc_funcs].end   = sym->end;
> +		func[nr_alloc_funcs].name  = sym->name;
> +
> +		alloc_func_list = func;
> +		nr_alloc_funcs++;
> +	}
> +
> +	qsort(alloc_func_list, nr_alloc_funcs, sizeof(*func), funcmp);
> +
> +	regfree(&alloc_func_regex);
> +	return 0;
> +}
> +
> +/*
> + * Find first non-memory allocation function from callchain.
> + * The allocation functions are in the 'alloc_func_list'.
> + */
> +static u64 find_callsite(struct perf_evsel *evsel, struct perf_sample *sample)
> +{
> +	struct addr_location al;
> +	struct machine *machine = &kmem_session->machines.host;
> +	struct callchain_cursor_node *node;
> +
> +	if (alloc_func_list == NULL)
> +		build_alloc_func_list();
> +
> +	al.thread = machine__findnew_thread(machine, sample->pid, sample->tid);
> +	sample__resolve_callchain(sample, NULL, evsel, &al, 16);
> +
> +	callchain_cursor_commit(&callchain_cursor);
> +	while (true) {
> +		struct alloc_func key, *caller;
> +		u64 addr;
> +
> +		node = callchain_cursor_current(&callchain_cursor);
> +		if (node == NULL)
> +			break;
> +
> +		key.start = key.end = node->ip;
> +		caller = bsearch(&key, alloc_func_list, nr_alloc_funcs,
> +				 sizeof(key), callcmp);
> +		if (!caller) {
> +			/* found */
> +			if (node->map)
> +				addr = map__unmap_ip(node->map, node->ip);
> +			else
> +				addr = node->ip;
> +
> +			return addr;
> +		} else
> +			pr_debug3("skipping alloc function: %s\n", caller->name);
> +
> +		callchain_cursor_advance(&callchain_cursor);
> +	}
> +
> +	pr_debug2("unknown callsite: %"PRIx64 "\n", sample->ip);
> +	return sample->ip;
> +}
> +
> +static struct page_stat *search_page(u64 page, bool create)
>  {
>  	struct rb_node **node = &page_tree.rb_node;
>  	struct rb_node *parent = NULL;
> @@ -357,6 +491,41 @@ static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool cre
>  	return data;
>  }
>  
> +static struct page_stat *search_page_caller_stat(u64 callsite, bool create)
> +{
> +	struct rb_node **node = &page_caller_tree.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct page_stat *data;

Please use the "findnew" idiom to name this function; looking at only
its name one thinks it searches a tree, a read-only operation, but it
may insert elements too, a modify operation.

Since we use the findnew idiom elsewhere for operations that do that,
i.e. optimize the "new" part of "findnew" by using the "find" part,
please use it here as well.

> +	while (*node) {
> +		s64 cmp;
> +
> +		parent = *node;
> +		data = rb_entry(*node, struct page_stat, node);
> +
> +		cmp = data->callsite - callsite;
> +		if (cmp < 0)
> +			node = &parent->rb_left;
> +		else if (cmp > 0)
> +			node = &parent->rb_right;
> +		else
> +			return data;
> +	}
> +
> +	if (!create)
> +		return NULL;
> +
> +	data = zalloc(sizeof(*data));
> +	if (data != NULL) {
> +		data->callsite = callsite;
> +
> +		rb_link_node(&data->node, parent, node);
> +		rb_insert_color(&data->node, &page_caller_tree);
> +	}
> +
> +	return data;
> +}
> +
>  static bool valid_page(u64 pfn_or_page)
>  {
>  	if (use_pfn && pfn_or_page == -1UL)
> @@ -375,6 +544,7 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
>  	unsigned int migrate_type = perf_evsel__intval(evsel, sample,
>  						       "migratetype");
>  	u64 bytes = kmem_page_size << order;
> +	u64 callsite;
>  	struct page_stat *stat;
>  	struct page_stat this = {
>  		.order = order,
> @@ -397,6 +567,8 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
>  		return 0;
>  	}
>  
> +	callsite = find_callsite(evsel, sample);
> +
>  	/*
>  	 * This is to find the current page (with correct gfp flags and
>  	 * migrate type) at free event.
> @@ -408,6 +580,7 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
>  	stat->order = order;
>  	stat->gfp_flags = gfp_flags;
>  	stat->migrate_type = migrate_type;
> +	stat->callsite = callsite;
>  
>  	this.page = page;
>  	stat = search_page_alloc_stat(&this, true);
> @@ -416,6 +589,18 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
>  
>  	stat->nr_alloc++;
>  	stat->alloc_bytes += bytes;
> +	stat->callsite = callsite;
> +
> +	stat = search_page_caller_stat(callsite, true);
> +	if (stat == NULL)
> +		return -ENOMEM;
> +
> +	stat->order = order;
> +	stat->gfp_flags = gfp_flags;
> +	stat->migrate_type = migrate_type;
> +
> +	stat->nr_alloc++;
> +	stat->alloc_bytes += bytes;
>  
>  	order_stats[order][migrate_type]++;
>  
> @@ -455,6 +640,7 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
>  	this.page = page;
>  	this.gfp_flags = stat->gfp_flags;
>  	this.migrate_type = stat->migrate_type;
> +	this.callsite = stat->callsite;
>  
>  	rb_erase(&stat->node, &page_tree);
>  	free(stat);
> @@ -466,6 +652,13 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
>  	stat->nr_free++;
>  	stat->free_bytes += bytes;
>  
> +	stat = search_page_caller_stat(this.callsite, false);
> +	if (stat == NULL)
> +		return -ENOENT;
> +
> +	stat->nr_free++;
> +	stat->free_bytes += bytes;
> +
>  	return 0;
>  }
>  
> @@ -576,41 +769,89 @@ static const char * const migrate_type_str[] = {
>  	"UNKNOWN",
>  };
>  
> -static void __print_page_result(struct rb_root *root,
> -				struct perf_session *session __maybe_unused,
> -				int n_lines)
> +static void __print_page_alloc_result(struct perf_session *session, int n_lines)
>  {
> -	struct rb_node *next = rb_first(root);
> +	struct rb_node *next = rb_first(&page_alloc_sorted);
> +	struct machine *machine = &session->machines.host;
>  	const char *format;
>  
> -	printf("\n%.80s\n", graph_dotted_line);
> -	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags\n",
> +	printf("\n%.105s\n", graph_dotted_line);
> +	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n",
>  	       use_pfn ? "PFN" : "Page");
> -	printf("%.80s\n", graph_dotted_line);
> +	printf("%.105s\n", graph_dotted_line);
>  
>  	if (use_pfn)
> -		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx\n";
> +		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx | %s\n";
>  	else
> -		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx\n";
> +		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx | %s\n";
>  
>  	while (next && n_lines--) {
>  		struct page_stat *data;
> +		struct symbol *sym;
> +		struct map *map;
> +		char buf[32];
> +		char *caller = buf;
>  
>  		data = rb_entry(next, struct page_stat, node);
> +		sym = machine__find_kernel_function(machine, data->callsite,
> +						    &map, NULL);
> +		if (sym && sym->name)
> +			caller = sym->name;
> +		else
> +			scnprintf(buf, sizeof(buf), "%"PRIx64, data->callsite);
>  
>  		printf(format, (unsigned long long)data->page,
>  		       (unsigned long long)data->alloc_bytes / 1024,
>  		       data->nr_alloc, data->order,
>  		       migrate_type_str[data->migrate_type],
> -		       (unsigned long)data->gfp_flags);
> +		       (unsigned long)data->gfp_flags, caller);
> +
> +		next = rb_next(next);
> +	}
> +
> +	if (n_lines == -1)
> +		printf(" ...              | ...              | ...       | ...   | ...      | ...       | ...\n");
> +
> +	printf("%.105s\n", graph_dotted_line);
> +}
> +
> +static void __print_page_caller_result(struct perf_session *session, int n_lines)
> +{
> +	struct rb_node *next = rb_first(&page_caller_sorted);
> +	struct machine *machine = &session->machines.host;
> +
> +	printf("\n%.105s\n", graph_dotted_line);
> +	printf(" Total alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n");
> +	printf("%.105s\n", graph_dotted_line);
> +
> +	while (next && n_lines--) {
> +		struct page_stat *data;
> +		struct symbol *sym;
> +		struct map *map;
> +		char buf[32];
> +		char *caller = buf;
> +
> +		data = rb_entry(next, struct page_stat, node);
> +		sym = machine__find_kernel_function(machine, data->callsite,
> +						    &map, NULL);
> +		if (sym && sym->name)
> +			caller = sym->name;
> +		else
> +			scnprintf(buf, sizeof(buf), "%"PRIx64, data->callsite);
> +
> +		printf(" %'16llu | %'9d | %5d | %8s |  %08lx | %s\n",
> +		       (unsigned long long)data->alloc_bytes / 1024,
> +		       data->nr_alloc, data->order,
> +		       migrate_type_str[data->migrate_type],
> +		       (unsigned long)data->gfp_flags, caller);
>  
>  		next = rb_next(next);
>  	}
>  
>  	if (n_lines == -1)
> -		printf(" ...              | ...              | ...       | ...   | ...      | ...     \n");
> +		printf(" ...              | ...       | ...   | ...      | ...       | ...\n");
>  
> -	printf("%.80s\n", graph_dotted_line);
> +	printf("%.105s\n", graph_dotted_line);
>  }
>  
>  static void print_slab_summary(void)
> @@ -682,8 +923,10 @@ static void print_slab_result(struct perf_session *session)
>  
>  static void print_page_result(struct perf_session *session)
>  {
> +	if (caller_flag)
> +		__print_page_caller_result(session, caller_lines);
>  	if (alloc_flag)
> -		__print_page_result(&page_alloc_sorted, session, alloc_lines);
> +		__print_page_alloc_result(session, alloc_lines);
>  	print_page_summary();
>  }
>  
> @@ -802,6 +1045,7 @@ static void sort_result(void)
>  	}
>  	if (kmem_page) {
>  		__sort_page_result(&page_alloc_tree, &page_alloc_sorted);
> +		__sort_page_result(&page_caller_tree, &page_caller_sorted);
>  	}
>  }
>  
> @@ -1084,7 +1328,7 @@ static int __cmd_record(int argc, const char **argv)
>  	if (kmem_slab)
>  		rec_argc += ARRAY_SIZE(slab_events);
>  	if (kmem_page)
> -		rec_argc += ARRAY_SIZE(page_events);
> +		rec_argc += ARRAY_SIZE(page_events) + 1; /* for -g */
>  
>  	rec_argv = calloc(rec_argc + 1, sizeof(char *));
>  
> @@ -1099,6 +1343,8 @@ static int __cmd_record(int argc, const char **argv)
>  			rec_argv[i] = strdup(slab_events[j]);
>  	}
>  	if (kmem_page) {
> +		rec_argv[i++] = strdup("-g");
> +
>  		for (j = 0; j < ARRAY_SIZE(page_events); j++, i++)
>  			rec_argv[i] = strdup(page_events[j]);
>  	}
> @@ -1159,7 +1405,7 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
>  
>  	file.path = input_name;
>  
> -	session = perf_session__new(&file, false, &perf_kmem);
> +	kmem_session = session = perf_session__new(&file, false, &perf_kmem);
>  	if (session == NULL)
>  		return -1;
>  
> @@ -1172,6 +1418,7 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
>  		}
>  
>  		kmem_page_size = pevent_get_page_size(evsel->tp_format->pevent);
> +		symbol_conf.use_callchain = true;
>  	}
>  
>  	symbol__init(&session->header.env);
> -- 
> 2.3.2

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/9] perf kmem: Implement stat --page --caller
@ 2015-04-13 13:40     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-13 13:40 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Em Mon, Apr 06, 2015 at 02:36:11PM +0900, Namhyung Kim escreveu:
> This makes perf kmem support caller statistics for pages.  Unlike the
> slab case, the tracepoints in the page allocator don't provide callsite
> info directly, so it records callchains and extracts the callsite info
> from them.
> 
> Note that the callchain contains several memory allocation functions
> which have no meaning for users, so those functions are skipped to get
> proper callsites.  I used the following regex pattern to match the allocator
> functions:
> 
>   ^_?_?(alloc|get_free|get_zeroed)_pages?
> 
> This gave me the following list of functions:
> 
>   # perf kmem record --page sleep 3
>   # perf kmem stat --page -v
>   ...
>   alloc func: __get_free_pages
>   alloc func: get_zeroed_page
>   alloc func: alloc_pages_exact
>   alloc func: __alloc_pages_direct_compact
>   alloc func: __alloc_pages_nodemask
>   alloc func: alloc_page_interleave
>   alloc func: alloc_pages_current
>   alloc func: alloc_pages_vma
>   alloc func: alloc_page_buffers
>   alloc func: alloc_pages_exact_nid
>   ...
> 
> The output looks mostly the same as --alloc (a callsite column was also
> added there) but groups entries by callsite.  Currently, the order,
> migrate type and GFP flag info comes from the last allocation and is
> not guaranteed to be the same for all allocations from the callsite.
> 
>   ---------------------------------------------------------------------------------------------
>    Total_alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite
>   ---------------------------------------------------------------------------------------------
>               1,064 |       266 |     0 | UNMOVABL |  000000d0 | __pollwait
>                  52 |        13 |     0 | UNMOVABL |  002084d0 | pte_alloc_one
>                  44 |        11 |     0 |  MOVABLE |  000280da | handle_mm_fault
>                  20 |         5 |     0 |  MOVABLE |  000200da | do_cow_fault
>                  20 |         5 |     0 |  MOVABLE |  000200da | do_wp_page
>                  16 |         4 |     0 | UNMOVABL |  000084d0 | __pmd_alloc
>                  16 |         4 |     0 | UNMOVABL |  00000200 | __tlb_remove_page
>                  12 |         3 |     0 | UNMOVABL |  000084d0 | __pud_alloc
>                   8 |         2 |     0 | UNMOVABL |  00000010 | bio_copy_user_iov
>                   4 |         1 |     0 | UNMOVABL |  000200d2 | pipe_write
>                   4 |         1 |     0 |  MOVABLE |  000280da | do_wp_page
>                   4 |         1 |     0 | UNMOVABL |  002084d0 | pgd_alloc
>   ---------------------------------------------------------------------------------------------
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-kmem.c | 279 +++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 263 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
> index 63ea01349b6e..5b3ed17c293a 100644
> --- a/tools/perf/builtin-kmem.c
> +++ b/tools/perf/builtin-kmem.c
> @@ -10,6 +10,7 @@
>  #include "util/header.h"
>  #include "util/session.h"
>  #include "util/tool.h"
> +#include "util/callchain.h"
>  
>  #include "util/parse-options.h"
>  #include "util/trace-event.h"
> @@ -21,6 +22,7 @@
>  #include <linux/rbtree.h>
>  #include <linux/string.h>
>  #include <locale.h>
> +#include <regex.h>
>  
>  static int	kmem_slab;
>  static int	kmem_page;
> @@ -241,6 +243,7 @@ static unsigned long nr_page_fails;
>  static unsigned long nr_page_nomatch;
>  
>  static bool use_pfn;
> +static struct perf_session *kmem_session;
>  
>  #define MAX_MIGRATE_TYPES  6
>  #define MAX_PAGE_ORDER     11
> @@ -250,6 +253,7 @@ static int order_stats[MAX_PAGE_ORDER][MAX_MIGRATE_TYPES];
>  struct page_stat {
>  	struct rb_node 	node;
>  	u64 		page;
> +	u64 		callsite;
>  	int 		order;
>  	unsigned 	gfp_flags;
>  	unsigned 	migrate_type;
> @@ -262,8 +266,138 @@ struct page_stat {
>  static struct rb_root page_tree;
>  static struct rb_root page_alloc_tree;
>  static struct rb_root page_alloc_sorted;
> +static struct rb_root page_caller_tree;
> +static struct rb_root page_caller_sorted;
>  
> -static struct page_stat *search_page(unsigned long page, bool create)
> +struct alloc_func {
> +	u64 start;
> +	u64 end;
> +	char *name;
> +};
> +
> +static int nr_alloc_funcs;
> +static struct alloc_func *alloc_func_list;
> +
> +static int funcmp(const void *a, const void *b)
> +{
> +	const struct alloc_func *fa = a;
> +	const struct alloc_func *fb = b;
> +
> +	if (fa->start > fb->start)
> +		return 1;
> +	else
> +		return -1;
> +}
> +
> +static int callcmp(const void *a, const void *b)
> +{
> +	const struct alloc_func *fa = a;
> +	const struct alloc_func *fb = b;
> +
> +	if (fb->start <= fa->start && fa->end < fb->end)
> +		return 0;
> +
> +	if (fa->start > fb->start)
> +		return 1;
> +	else
> +		return -1;
> +}
> +
> +static int build_alloc_func_list(void)
> +{
> +	int ret;
> +	struct map *kernel_map;
> +	struct symbol *sym;
> +	struct rb_node *node;
> +	struct alloc_func *func;
> +	struct machine *machine = &kmem_session->machines.host;
> +

Why having a blank line here?

> +	regex_t alloc_func_regex;
> +	const char pattern[] = "^_?_?(alloc|get_free|get_zeroed)_pages?";
> +
> +	ret = regcomp(&alloc_func_regex, pattern, REG_EXTENDED);
> +	if (ret) {
> +		char err[BUFSIZ];
> +
> +		regerror(ret, &alloc_func_regex, err, sizeof(err));
> +		pr_err("Invalid regex: %s\n%s", pattern, err);
> +		return -EINVAL;
> +	}
> +
> +	kernel_map = machine->vmlinux_maps[MAP__FUNCTION];
> +	map__load(kernel_map, NULL);

What if the map can't be loaded?

> +
> +	map__for_each_symbol(kernel_map, sym, node) {
> +		if (regexec(&alloc_func_regex, sym->name, 0, NULL, 0))
> +			continue;
> +
> +		func = realloc(alloc_func_list,
> +			       (nr_alloc_funcs + 1) * sizeof(*func));
> +		if (func == NULL)
> +			return -ENOMEM;
> +
> +		pr_debug("alloc func: %s\n", sym->name);
> +		func[nr_alloc_funcs].start = sym->start;
> +		func[nr_alloc_funcs].end   = sym->end;
> +		func[nr_alloc_funcs].name  = sym->name;
> +
> +		alloc_func_list = func;
> +		nr_alloc_funcs++;
> +	}
> +
> +	qsort(alloc_func_list, nr_alloc_funcs, sizeof(*func), funcmp);
> +
> +	regfree(&alloc_func_regex);
> +	return 0;
> +}
> +
> +/*
> + * Find first non-memory allocation function from callchain.
> + * The allocation functions are in the 'alloc_func_list'.
> + */
> +static u64 find_callsite(struct perf_evsel *evsel, struct perf_sample *sample)
> +{
> +	struct addr_location al;
> +	struct machine *machine = &kmem_session->machines.host;
> +	struct callchain_cursor_node *node;
> +
> +	if (alloc_func_list == NULL)
> +		build_alloc_func_list();
> +
> +	al.thread = machine__findnew_thread(machine, sample->pid, sample->tid);
> +	sample__resolve_callchain(sample, NULL, evsel, &al, 16);
> +
> +	callchain_cursor_commit(&callchain_cursor);
> +	while (true) {
> +		struct alloc_func key, *caller;
> +		u64 addr;
> +
> +		node = callchain_cursor_current(&callchain_cursor);
> +		if (node == NULL)
> +			break;
> +
> +		key.start = key.end = node->ip;
> +		caller = bsearch(&key, alloc_func_list, nr_alloc_funcs,
> +				 sizeof(key), callcmp);
> +		if (!caller) {
> +			/* found */
> +			if (node->map)
> +				addr = map__unmap_ip(node->map, node->ip);
> +			else
> +				addr = node->ip;
> +
> +			return addr;
> +		} else
> +			pr_debug3("skipping alloc function: %s\n", caller->name);
> +
> +		callchain_cursor_advance(&callchain_cursor);
> +	}
> +
> +	pr_debug2("unknown callsite: %"PRIx64 "\n", sample->ip);
> +	return sample->ip;
> +}
> +
> +static struct page_stat *search_page(u64 page, bool create)
>  {
>  	struct rb_node **node = &page_tree.rb_node;
>  	struct rb_node *parent = NULL;
> @@ -357,6 +491,41 @@ static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool cre
>  	return data;
>  }
>  
> +static struct page_stat *search_page_caller_stat(u64 callsite, bool create)
> +{
> +	struct rb_node **node = &page_caller_tree.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct page_stat *data;

Please use the "findnew" idiom to name this function: looking at only
its name one thinks it searches a tree, a read-only operation, but it
may insert elements too, a modify operation.

Since we use the findnew idiom elsewhere for operations that do that,
i.e. optimize the "new" part of "findnew" by using the "find" part,
please use it here as well.

> +	while (*node) {
> +		s64 cmp;
> +
> +		parent = *node;
> +		data = rb_entry(*node, struct page_stat, node);
> +
> +		cmp = data->callsite - callsite;
> +		if (cmp < 0)
> +			node = &parent->rb_left;
> +		else if (cmp > 0)
> +			node = &parent->rb_right;
> +		else
> +			return data;
> +	}
> +
> +	if (!create)
> +		return NULL;
> +
> +	data = zalloc(sizeof(*data));
> +	if (data != NULL) {
> +		data->callsite = callsite;
> +
> +		rb_link_node(&data->node, parent, node);
> +		rb_insert_color(&data->node, &page_caller_tree);
> +	}
> +
> +	return data;
> +}
> +
>  static bool valid_page(u64 pfn_or_page)
>  {
>  	if (use_pfn && pfn_or_page == -1UL)
> @@ -375,6 +544,7 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
>  	unsigned int migrate_type = perf_evsel__intval(evsel, sample,
>  						       "migratetype");
>  	u64 bytes = kmem_page_size << order;
> +	u64 callsite;
>  	struct page_stat *stat;
>  	struct page_stat this = {
>  		.order = order,
> @@ -397,6 +567,8 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
>  		return 0;
>  	}
>  
> +	callsite = find_callsite(evsel, sample);
> +
>  	/*
>  	 * This is to find the current page (with correct gfp flags and
>  	 * migrate type) at free event.
> @@ -408,6 +580,7 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
>  	stat->order = order;
>  	stat->gfp_flags = gfp_flags;
>  	stat->migrate_type = migrate_type;
> +	stat->callsite = callsite;
>  
>  	this.page = page;
>  	stat = search_page_alloc_stat(&this, true);
> @@ -416,6 +589,18 @@ static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
>  
>  	stat->nr_alloc++;
>  	stat->alloc_bytes += bytes;
> +	stat->callsite = callsite;
> +
> +	stat = search_page_caller_stat(callsite, true);
> +	if (stat == NULL)
> +		return -ENOMEM;
> +
> +	stat->order = order;
> +	stat->gfp_flags = gfp_flags;
> +	stat->migrate_type = migrate_type;
> +
> +	stat->nr_alloc++;
> +	stat->alloc_bytes += bytes;
>  
>  	order_stats[order][migrate_type]++;
>  
> @@ -455,6 +640,7 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
>  	this.page = page;
>  	this.gfp_flags = stat->gfp_flags;
>  	this.migrate_type = stat->migrate_type;
> +	this.callsite = stat->callsite;
>  
>  	rb_erase(&stat->node, &page_tree);
>  	free(stat);
> @@ -466,6 +652,13 @@ static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
>  	stat->nr_free++;
>  	stat->free_bytes += bytes;
>  
> +	stat = search_page_caller_stat(this.callsite, false);
> +	if (stat == NULL)
> +		return -ENOENT;
> +
> +	stat->nr_free++;
> +	stat->free_bytes += bytes;
> +
>  	return 0;
>  }
>  
> @@ -576,41 +769,89 @@ static const char * const migrate_type_str[] = {
>  	"UNKNOWN",
>  };
>  
> -static void __print_page_result(struct rb_root *root,
> -				struct perf_session *session __maybe_unused,
> -				int n_lines)
> +static void __print_page_alloc_result(struct perf_session *session, int n_lines)
>  {
> -	struct rb_node *next = rb_first(root);
> +	struct rb_node *next = rb_first(&page_alloc_sorted);
> +	struct machine *machine = &session->machines.host;
>  	const char *format;
>  
> -	printf("\n%.80s\n", graph_dotted_line);
> -	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags\n",
> +	printf("\n%.105s\n", graph_dotted_line);
> +	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n",
>  	       use_pfn ? "PFN" : "Page");
> -	printf("%.80s\n", graph_dotted_line);
> +	printf("%.105s\n", graph_dotted_line);
>  
>  	if (use_pfn)
> -		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx\n";
> +		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx | %s\n";
>  	else
> -		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx\n";
> +		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx | %s\n";
>  
>  	while (next && n_lines--) {
>  		struct page_stat *data;
> +		struct symbol *sym;
> +		struct map *map;
> +		char buf[32];
> +		char *caller = buf;
>  
>  		data = rb_entry(next, struct page_stat, node);
> +		sym = machine__find_kernel_function(machine, data->callsite,
> +						    &map, NULL);
> +		if (sym && sym->name)
> +			caller = sym->name;
> +		else
> +			scnprintf(buf, sizeof(buf), "%"PRIx64, data->callsite);
>  
>  		printf(format, (unsigned long long)data->page,
>  		       (unsigned long long)data->alloc_bytes / 1024,
>  		       data->nr_alloc, data->order,
>  		       migrate_type_str[data->migrate_type],
> -		       (unsigned long)data->gfp_flags);
> +		       (unsigned long)data->gfp_flags, caller);
> +
> +		next = rb_next(next);
> +	}
> +
> +	if (n_lines == -1)
> +		printf(" ...              | ...              | ...       | ...   | ...      | ...       | ...\n");
> +
> +	printf("%.105s\n", graph_dotted_line);
> +}
> +
> +static void __print_page_caller_result(struct perf_session *session, int n_lines)
> +{
> +	struct rb_node *next = rb_first(&page_caller_sorted);
> +	struct machine *machine = &session->machines.host;
> +
> +	printf("\n%.105s\n", graph_dotted_line);
> +	printf(" Total alloc (KB) | Hits      | Order | Mig.type | GFP flags | Callsite\n");
> +	printf("%.105s\n", graph_dotted_line);
> +
> +	while (next && n_lines--) {
> +		struct page_stat *data;
> +		struct symbol *sym;
> +		struct map *map;
> +		char buf[32];
> +		char *caller = buf;
> +
> +		data = rb_entry(next, struct page_stat, node);
> +		sym = machine__find_kernel_function(machine, data->callsite,
> +						    &map, NULL);
> +		if (sym && sym->name)
> +			caller = sym->name;
> +		else
> +			scnprintf(buf, sizeof(buf), "%"PRIx64, data->callsite);
> +
> +		printf(" %'16llu | %'9d | %5d | %8s |  %08lx | %s\n",
> +		       (unsigned long long)data->alloc_bytes / 1024,
> +		       data->nr_alloc, data->order,
> +		       migrate_type_str[data->migrate_type],
> +		       (unsigned long)data->gfp_flags, caller);
>  
>  		next = rb_next(next);
>  	}
>  
>  	if (n_lines == -1)
> -		printf(" ...              | ...              | ...       | ...   | ...      | ...     \n");
> +		printf(" ...              | ...       | ...   | ...      | ...       | ...\n");
>  
> -	printf("%.80s\n", graph_dotted_line);
> +	printf("%.105s\n", graph_dotted_line);
>  }
>  
>  static void print_slab_summary(void)
> @@ -682,8 +923,10 @@ static void print_slab_result(struct perf_session *session)
>  
>  static void print_page_result(struct perf_session *session)
>  {
> +	if (caller_flag)
> +		__print_page_caller_result(session, caller_lines);
>  	if (alloc_flag)
> -		__print_page_result(&page_alloc_sorted, session, alloc_lines);
> +		__print_page_alloc_result(session, alloc_lines);
>  	print_page_summary();
>  }
>  
> @@ -802,6 +1045,7 @@ static void sort_result(void)
>  	}
>  	if (kmem_page) {
>  		__sort_page_result(&page_alloc_tree, &page_alloc_sorted);
> +		__sort_page_result(&page_caller_tree, &page_caller_sorted);
>  	}
>  }
>  
> @@ -1084,7 +1328,7 @@ static int __cmd_record(int argc, const char **argv)
>  	if (kmem_slab)
>  		rec_argc += ARRAY_SIZE(slab_events);
>  	if (kmem_page)
> -		rec_argc += ARRAY_SIZE(page_events);
> +		rec_argc += ARRAY_SIZE(page_events) + 1; /* for -g */
>  
>  	rec_argv = calloc(rec_argc + 1, sizeof(char *));
>  
> @@ -1099,6 +1343,8 @@ static int __cmd_record(int argc, const char **argv)
>  			rec_argv[i] = strdup(slab_events[j]);
>  	}
>  	if (kmem_page) {
> +		rec_argv[i++] = strdup("-g");
> +
>  		for (j = 0; j < ARRAY_SIZE(page_events); j++, i++)
>  			rec_argv[i] = strdup(page_events[j]);
>  	}
> @@ -1159,7 +1405,7 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
>  
>  	file.path = input_name;
>  
> -	session = perf_session__new(&file, false, &perf_kmem);
> +	kmem_session = session = perf_session__new(&file, false, &perf_kmem);
>  	if (session == NULL)
>  		return -1;
>  
> @@ -1172,6 +1418,7 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
>  		}
>  
>  		kmem_page_size = pevent_get_page_size(evsel->tp_format->pevent);
> +		symbol_conf.use_callchain = true;
>  	}
>  
>  	symbol__init(&session->header.env);
> -- 
> 2.3.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH 3/9] perf kmem: Analyze page allocator events also
  2015-04-06  5:36   ` Namhyung Kim
@ 2015-04-13 13:40     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-13 13:40 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Em Mon, Apr 06, 2015 at 02:36:10PM +0900, Namhyung Kim escreveu:
> The perf kmem command records and analyzes kernel memory allocation
> only for SLAB objects.  This patch implements a simple page allocator
> analyzer using the kmem:mm_page_alloc and kmem:mm_page_free events.
> 
> It adds two new options: --slab and --page.  The --slab option is
> for analyzing the SLAB allocator, which is what perf kmem currently does.
> 
> The new --page option enables page allocator events and analyzes kernel
> memory usage in page units.  Currently, only the 'stat --alloc'
> subcommand is implemented.
> 
> If neither --slab nor --page is specified, --slab is implied.
> 
>   # perf kmem stat --page --alloc --line 10

Applied this and the kernel part, tested, thanks,

- Arnaldo

* Re: [PATCH 9/9] tools lib traceevent: Honor operator priority
  2015-04-06  5:36   ` Namhyung Kim
@ 2015-04-13 13:41     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 50+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-13 13:41 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm, Steven Rostedt

Em Mon, Apr 06, 2015 at 02:36:16PM +0900, Namhyung Kim escreveu:
> Currently it ignores operator priority and just sets the processed arg
> as the right operand.  But this could result in priority inversion in
> case the right operand is also an operator arg and its priority is lower.
> 
> For example, the following print format is from the new kmem events.
> 
>   "page=%p", REC->pfn != -1UL ? (((struct page *)(0xffffea0000000000UL)) + (REC->pfn)) : ((void *)0)
> 
> But this was treated as below:
> 
>   REC->pfn != ((null - 1UL) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)
> 
> In this case, the right arg was the '?' operator, which has lower
> priority.  But it just set the whole arg as the right operand, making
> the output confusing - page was always 0 or 1 since that's the result
> of the logical operation.
> 
> With this patch, it is handled properly, like the following:
> 
>   ((REC->pfn != (null - 1UL)) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)

And this one already went upstream.

- Arnaldo

* Re: [PATCH 4/9] perf kmem: Implement stat --page --caller
  2015-04-13 13:40     ` Arnaldo Carvalho de Melo
@ 2015-04-14  2:17       ` Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-14  2:17 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Hi Arnaldo,

On Mon, Apr 13, 2015 at 10:40:24AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Apr 06, 2015 at 02:36:11PM +0900, Namhyung Kim escreveu:
> > +static int build_alloc_func_list(void)
> > +{
> > +	int ret;
> > +	struct map *kernel_map;
> > +	struct symbol *sym;
> > +	struct rb_node *node;
> > +	struct alloc_func *func;
> > +	struct machine *machine = &kmem_session->machines.host;
> > +
> 
> Why having a blank line here?

Will remove.

> 
> > +	regex_t alloc_func_regex;
> > +	const char pattern[] = "^_?_?(alloc|get_free|get_zeroed)_pages?";
> > +
> > +	ret = regcomp(&alloc_func_regex, pattern, REG_EXTENDED);
> > +	if (ret) {
> > +		char err[BUFSIZ];
> > +
> > +		regerror(ret, &alloc_func_regex, err, sizeof(err));
> > +		pr_err("Invalid regex: %s\n%s", pattern, err);
> > +		return -EINVAL;
> > +	}
> > +
> > +	kernel_map = machine->vmlinux_maps[MAP__FUNCTION];
> > +	map__load(kernel_map, NULL);
> 
> What if the map can't be loaded?

Hmm.. yes, I need to check the return code.


> 
> > +
> > +	map__for_each_symbol(kernel_map, sym, node) {
> > +		if (regexec(&alloc_func_regex, sym->name, 0, NULL, 0))
> > +			continue;
> > +
> > +		func = realloc(alloc_func_list,
> > +			       (nr_alloc_funcs + 1) * sizeof(*func));
> > +		if (func == NULL)
> > +			return -ENOMEM;
> > +
> > +		pr_debug("alloc func: %s\n", sym->name);
> > +		func[nr_alloc_funcs].start = sym->start;
> > +		func[nr_alloc_funcs].end   = sym->end;
> > +		func[nr_alloc_funcs].name  = sym->name;
> > +
> > +		alloc_func_list = func;
> > +		nr_alloc_funcs++;
> > +	}
> > +
> > +	qsort(alloc_func_list, nr_alloc_funcs, sizeof(*func), funcmp);
> > +
> > +	regfree(&alloc_func_regex);
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Find first non-memory allocation function from callchain.
> > + * The allocation functions are in the 'alloc_func_list'.
> > + */
> > +static u64 find_callsite(struct perf_evsel *evsel, struct perf_sample *sample)
> > +{
> > +	struct addr_location al;
> > +	struct machine *machine = &kmem_session->machines.host;
> > +	struct callchain_cursor_node *node;
> > +
> > +	if (alloc_func_list == NULL)
> > +		build_alloc_func_list();

And here too..


> > +
> > +	al.thread = machine__findnew_thread(machine, sample->pid, sample->tid);
> > +	sample__resolve_callchain(sample, NULL, evsel, &al, 16);
> > +
> > +	callchain_cursor_commit(&callchain_cursor);
> > +	while (true) {
> > +		struct alloc_func key, *caller;
> > +		u64 addr;
> > +
> > +		node = callchain_cursor_current(&callchain_cursor);
> > +		if (node == NULL)
> > +			break;
> > +
> > +		key.start = key.end = node->ip;
> > +		caller = bsearch(&key, alloc_func_list, nr_alloc_funcs,
> > +				 sizeof(key), callcmp);
> > +		if (!caller) {
> > +			/* found */
> > +			if (node->map)
> > +				addr = map__unmap_ip(node->map, node->ip);
> > +			else
> > +				addr = node->ip;
> > +
> > +			return addr;
> > +		} else
> > +			pr_debug3("skipping alloc function: %s\n", caller->name);
> > +
> > +		callchain_cursor_advance(&callchain_cursor);
> > +	}
> > +
> > +	pr_debug2("unknown callsite: %"PRIx64 "\n", sample->ip);
> > +	return sample->ip;
> > +}
> > +
> > +static struct page_stat *search_page(u64 page, bool create)
> >  {
> >  	struct rb_node **node = &page_tree.rb_node;
> >  	struct rb_node *parent = NULL;
> > @@ -357,6 +491,41 @@ static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool cre
> >  	return data;
> >  }
> >  
> > +static struct page_stat *search_page_caller_stat(u64 callsite, bool create)
> > +{
> > +	struct rb_node **node = &page_caller_tree.rb_node;
> > +	struct rb_node *parent = NULL;
> > +	struct page_stat *data;
> 
> Please use the "findnew" idiom to name this function: looking at only
> its name one thinks it searches a tree, a read-only operation, but it
> may insert elements too, a modify operation.
> 
> Since we use the findnew idiom elsewhere for operations that do that,
> i.e. optimize the "new" part of "findnew" by using the "find" part,
> please use it here as well.

OK.  Will change and resend v7 soon.

Thanks for your review!
Namhyung

* Re: [PATCH 4/9] perf kmem: Implement stat --page --caller
@ 2015-04-14  2:17       ` Namhyung Kim
  0 siblings, 0 replies; 50+ messages in thread
From: Namhyung Kim @ 2015-04-14  2:17 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim, linux-mm

Hi Arnaldo,

On Mon, Apr 13, 2015 at 10:40:24AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Apr 06, 2015 at 02:36:11PM +0900, Namhyung Kim escreveu:
> > +static int build_alloc_func_list(void)
> > +{
> > +	int ret;
> > +	struct map *kernel_map;
> > +	struct symbol *sym;
> > +	struct rb_node *node;
> > +	struct alloc_func *func;
> > +	struct machine *machine = &kmem_session->machines.host;
> > +
> 
> Why have a blank line here?

Will remove.

> 
> > +	regex_t alloc_func_regex;
> > +	const char pattern[] = "^_?_?(alloc|get_free|get_zeroed)_pages?";
> > +
> > +	ret = regcomp(&alloc_func_regex, pattern, REG_EXTENDED);
> > +	if (ret) {
> > +		char err[BUFSIZ];
> > +
> > +		regerror(ret, &alloc_func_regex, err, sizeof(err));
> > +		pr_err("Invalid regex: %s\n%s", pattern, err);
> > +		return -EINVAL;
> > +	}
> > +
> > +	kernel_map = machine->vmlinux_maps[MAP__FUNCTION];
> > +	map__load(kernel_map, NULL);
> 
> What if the map can't be loaded?

Hmm.. yes, I need to check the return code.


> 
> > +
> > +	map__for_each_symbol(kernel_map, sym, node) {
> > +		if (regexec(&alloc_func_regex, sym->name, 0, NULL, 0))
> > +			continue;
> > +
> > +		func = realloc(alloc_func_list,
> > +			       (nr_alloc_funcs + 1) * sizeof(*func));
> > +		if (func == NULL)
> > +			return -ENOMEM;
> > +
> > +		pr_debug("alloc func: %s\n", sym->name);
> > +		func[nr_alloc_funcs].start = sym->start;
> > +		func[nr_alloc_funcs].end   = sym->end;
> > +		func[nr_alloc_funcs].name  = sym->name;
> > +
> > +		alloc_func_list = func;
> > +		nr_alloc_funcs++;
> > +	}
> > +
> > +	qsort(alloc_func_list, nr_alloc_funcs, sizeof(*func), funcmp);
> > +
> > +	regfree(&alloc_func_regex);
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Find first non-memory allocation function from callchain.
> > + * The allocation functions are in the 'alloc_func_list'.
> > + */
> > +static u64 find_callsite(struct perf_evsel *evsel, struct perf_sample *sample)
> > +{
> > +	struct addr_location al;
> > +	struct machine *machine = &kmem_session->machines.host;
> > +	struct callchain_cursor_node *node;
> > +
> > +	if (alloc_func_list == NULL)
> > +		build_alloc_func_list();

And here too..


> > +
> > +	al.thread = machine__findnew_thread(machine, sample->pid, sample->tid);
> > +	sample__resolve_callchain(sample, NULL, evsel, &al, 16);
> > +
> > +	callchain_cursor_commit(&callchain_cursor);
> > +	while (true) {
> > +		struct alloc_func key, *caller;
> > +		u64 addr;
> > +
> > +		node = callchain_cursor_current(&callchain_cursor);
> > +		if (node == NULL)
> > +			break;
> > +
> > +		key.start = key.end = node->ip;
> > +		caller = bsearch(&key, alloc_func_list, nr_alloc_funcs,
> > +				 sizeof(key), callcmp);
> > +		if (!caller) {
> > +			/* found */
> > +			if (node->map)
> > +				addr = map__unmap_ip(node->map, node->ip);
> > +			else
> > +				addr = node->ip;
> > +
> > +			return addr;
> > +		} else
> > +			pr_debug3("skipping alloc function: %s\n", caller->name);
> > +
> > +		callchain_cursor_advance(&callchain_cursor);
> > +	}
> > +
> > +	pr_debug2("unknown callsite: %"PRIx64 "\n", sample->ip);
> > +	return sample->ip;
> > +}
> > +
> > +static struct page_stat *search_page(u64 page, bool create)
> >  {
> >  	struct rb_node **node = &page_tree.rb_node;
> >  	struct rb_node *parent = NULL;
> > @@ -357,6 +491,41 @@ static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool cre
> >  	return data;
> >  }
> >  
> > +static struct page_stat *search_page_caller_stat(u64 callsite, bool create)
> > +{
> > +	struct rb_node **node = &page_caller_tree.rb_node;
> > +	struct rb_node *parent = NULL;
> > +	struct page_stat *data;
> 
> Please use the "findnew" idiom to name this function; looking at only
> its name one thinks it searches a tree, a read-only operation, but it
> may insert elements too, a modify operation.
> 
> Since we use the findnew idiom elsewhere for operations that do that,
> i.e. optimize the "new" part of "findnew" by using the "find" part,
> please use it here as well.

OK.  Will change and resend v7 soon.

Thanks for your review!
Namhyung


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [tip:perf/urgent] tracing, mm: Record pfn instead of pointer to struct page
  2015-04-06  5:36   ` Namhyung Kim
  (?)
@ 2015-04-14 12:16   ` tip-bot for Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: tip-bot for Namhyung Kim @ 2015-04-14 12:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: rostedt, linux-kernel, namhyung, a.p.zijlstra, js1304, jolsa,
	dsahern, mingo, hpa, acme, tglx, minchan

Commit-ID:  9fdd8a875c6f3b02af48d5fa426206ca009b2b06
Gitweb:     http://git.kernel.org/tip/9fdd8a875c6f3b02af48d5fa426206ca009b2b06
Author:     Namhyung Kim <namhyung@kernel.org>
AuthorDate: Mon, 6 Apr 2015 14:36:09 +0900
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 13 Apr 2015 11:44:52 -0300

tracing, mm: Record pfn instead of pointer to struct page

The struct page is opaque for userspace tools, so it'd be better to save
the pfn in order to identify page frames.

The textual output of the $debugfs/tracing/trace file remains unchanged and
only the raw (binary) data format is changed - but thanks to libtraceevent,
userspace tools which deal with the raw data (like perf and trace-cmd)
can parse the format easily.  So the impact on userspace will also be
minimal.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Based-on-patch-by: Joonsoo Kim <js1304@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1428298576-9785-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 include/trace/events/filemap.h |  8 ++++----
 include/trace/events/kmem.h    | 42 +++++++++++++++++++++---------------------
 include/trace/events/vmscan.h  |  8 ++++----
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index 0421f49..42febb6 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -18,14 +18,14 @@ DECLARE_EVENT_CLASS(mm_filemap_op_page_cache,
 	TP_ARGS(page),
 
 	TP_STRUCT__entry(
-		__field(struct page *, page)
+		__field(unsigned long, pfn)
 		__field(unsigned long, i_ino)
 		__field(unsigned long, index)
 		__field(dev_t, s_dev)
 	),
 
 	TP_fast_assign(
-		__entry->page = page;
+		__entry->pfn = page_to_pfn(page);
 		__entry->i_ino = page->mapping->host->i_ino;
 		__entry->index = page->index;
 		if (page->mapping->host->i_sb)
@@ -37,8 +37,8 @@ DECLARE_EVENT_CLASS(mm_filemap_op_page_cache,
 	TP_printk("dev %d:%d ino %lx page=%p pfn=%lu ofs=%lu",
 		MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
 		__entry->i_ino,
-		__entry->page,
-		page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn),
+		__entry->pfn,
 		__entry->index << PAGE_SHIFT)
 );
 
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 4ad10ba..81ea598 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -154,18 +154,18 @@ TRACE_EVENT(mm_page_free,
 	TP_ARGS(page, order),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page_to_pfn(page);
 		__entry->order		= order;
 	),
 
 	TP_printk("page=%p pfn=%lu order=%d",
-			__entry->page,
-			page_to_pfn(__entry->page),
+			pfn_to_page(__entry->pfn),
+			__entry->pfn,
 			__entry->order)
 );
 
@@ -176,18 +176,18 @@ TRACE_EVENT(mm_page_free_batched,
 	TP_ARGS(page, cold),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	int,		cold		)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page_to_pfn(page);
 		__entry->cold		= cold;
 	),
 
 	TP_printk("page=%p pfn=%lu order=0 cold=%d",
-			__entry->page,
-			page_to_pfn(__entry->page),
+			pfn_to_page(__entry->pfn),
+			__entry->pfn,
 			__entry->cold)
 );
 
@@ -199,22 +199,22 @@ TRACE_EVENT(mm_page_alloc,
 	TP_ARGS(page, order, gfp_flags, migratetype),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 		__field(	gfp_t,		gfp_flags	)
 		__field(	int,		migratetype	)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page ? page_to_pfn(page) : -1UL;
 		__entry->order		= order;
 		__entry->gfp_flags	= gfp_flags;
 		__entry->migratetype	= migratetype;
 	),
 
 	TP_printk("page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s",
-		__entry->page,
-		__entry->page ? page_to_pfn(__entry->page) : 0,
+		__entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL,
+		__entry->pfn != -1UL ? __entry->pfn : 0,
 		__entry->order,
 		__entry->migratetype,
 		show_gfp_flags(__entry->gfp_flags))
@@ -227,20 +227,20 @@ DECLARE_EVENT_CLASS(mm_page,
 	TP_ARGS(page, order, migratetype),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 		__field(	int,		migratetype	)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page ? page_to_pfn(page) : -1UL;
 		__entry->order		= order;
 		__entry->migratetype	= migratetype;
 	),
 
 	TP_printk("page=%p pfn=%lu order=%u migratetype=%d percpu_refill=%d",
-		__entry->page,
-		__entry->page ? page_to_pfn(__entry->page) : 0,
+		__entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL,
+		__entry->pfn != -1UL ? __entry->pfn : 0,
 		__entry->order,
 		__entry->migratetype,
 		__entry->order == 0)
@@ -260,7 +260,7 @@ DEFINE_EVENT_PRINT(mm_page, mm_page_pcpu_drain,
 	TP_ARGS(page, order, migratetype),
 
 	TP_printk("page=%p pfn=%lu order=%d migratetype=%d",
-		__entry->page, page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn), __entry->pfn,
 		__entry->order, __entry->migratetype)
 );
 
@@ -275,7 +275,7 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 		alloc_migratetype, fallback_migratetype),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page			)
+		__field(	unsigned long,	pfn			)
 		__field(	int,		alloc_order		)
 		__field(	int,		fallback_order		)
 		__field(	int,		alloc_migratetype	)
@@ -284,7 +284,7 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 	),
 
 	TP_fast_assign(
-		__entry->page			= page;
+		__entry->pfn			= page_to_pfn(page);
 		__entry->alloc_order		= alloc_order;
 		__entry->fallback_order		= fallback_order;
 		__entry->alloc_migratetype	= alloc_migratetype;
@@ -294,8 +294,8 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 	),
 
 	TP_printk("page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d",
-		__entry->page,
-		page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn),
+		__entry->pfn,
 		__entry->alloc_order,
 		__entry->fallback_order,
 		pageblock_order,
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 69590b6..f66476b 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -336,18 +336,18 @@ TRACE_EVENT(mm_vmscan_writepage,
 	TP_ARGS(page, reclaim_flags),
 
 	TP_STRUCT__entry(
-		__field(struct page *, page)
+		__field(unsigned long, pfn)
 		__field(int, reclaim_flags)
 	),
 
 	TP_fast_assign(
-		__entry->page = page;
+		__entry->pfn = page_to_pfn(page);
 		__entry->reclaim_flags = reclaim_flags;
 	),
 
 	TP_printk("page=%p pfn=%lu flags=%s",
-		__entry->page,
-		page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn),
+		__entry->pfn,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [tip:perf/urgent] perf kmem: Analyze page allocator events also
  2015-04-06  5:36   ` Namhyung Kim
                     ` (2 preceding siblings ...)
  (?)
@ 2015-04-14 12:17   ` tip-bot for Namhyung Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: tip-bot for Namhyung Kim @ 2015-04-14 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: a.p.zijlstra, jolsa, namhyung, dsahern, minchan, linux-kernel,
	acme, tglx, mingo, hpa, js1304

Commit-ID:  0d68bc92c48167130b61b449f08be27dc862dba2
Gitweb:     http://git.kernel.org/tip/0d68bc92c48167130b61b449f08be27dc862dba2
Author:     Namhyung Kim <namhyung@kernel.org>
AuthorDate: Mon, 6 Apr 2015 14:36:10 +0900
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 13 Apr 2015 11:44:52 -0300

perf kmem: Analyze page allocator events also

The perf kmem command records and analyzes kernel memory allocation only
for SLAB objects.  This patch implements a simple page allocator analyzer
using kmem:mm_page_alloc and kmem:mm_page_free events.

It adds two new options: --slab and --page.  The --slab option is for
analyzing the SLAB allocator and is what perf kmem currently does.

The new --page option enables page allocator events and analyzes kernel
memory usage at page granularity.  Currently, only the 'stat --alloc'
subcommand is implemented.

If neither --slab nor --page is specified, --slab is implied.

First run 'perf kmem record' to generate a suitable perf.data file:

  # perf kmem record --page sleep 5

Then run 'perf kmem stat' to postprocess the perf.data file:

  # perf kmem stat --page --alloc --line 10

  -------------------------------------------------------------------------------
   PFN              | Total alloc (KB) | Hits     | Order | Mig.type | GFP flags
  -------------------------------------------------------------------------------
            4045014 |               16 |        1 |     2 |  RECLAIM |  00285250
            4143980 |               16 |        1 |     2 |  RECLAIM |  00285250
            3938658 |               16 |        1 |     2 |  RECLAIM |  00285250
            4045400 |               16 |        1 |     2 |  RECLAIM |  00285250
            3568708 |               16 |        1 |     2 |  RECLAIM |  00285250
            3729824 |               16 |        1 |     2 |  RECLAIM |  00285250
            3657210 |               16 |        1 |     2 |  RECLAIM |  00285250
            4120750 |               16 |        1 |     2 |  RECLAIM |  00285250
            3678850 |               16 |        1 |     2 |  RECLAIM |  00285250
            3693874 |               16 |        1 |     2 |  RECLAIM |  00285250
   ...              | ...              | ...      | ...   | ...      | ...
  -------------------------------------------------------------------------------

  SUMMARY (page allocator)
  ========================
  Total allocation requests     :           44,260   [          177,256 KB ]
  Total free requests           :              117   [              468 KB ]

  Total alloc+freed requests    :               49   [              196 KB ]
  Total alloc-only requests     :           44,211   [          177,060 KB ]
  Total free-only requests      :               68   [              272 KB ]

  Total allocation failures     :                0   [                0 KB ]

  Order     Unmovable   Reclaimable       Movable      Reserved  CMA/Isolated
  -----  ------------  ------------  ------------  ------------  ------------
      0            32             .        44,210             .             .
      1             .             .             .             .             .
      2             .            18             .             .             .
      3             .             .             .             .             .
      4             .             .             .             .             .
      5             .             .             .             .             .
      6             .             .             .             .             .
      7             .             .             .             .             .
      8             .             .             .             .             .
      9             .             .             .             .             .
     10             .             .             .             .             .

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1428298576-9785-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-kmem.txt |   8 +-
 tools/perf/builtin-kmem.c              | 500 +++++++++++++++++++++++++++++++--
 2 files changed, 491 insertions(+), 17 deletions(-)

diff --git a/tools/perf/Documentation/perf-kmem.txt b/tools/perf/Documentation/perf-kmem.txt
index 150253c..23219c6 100644
--- a/tools/perf/Documentation/perf-kmem.txt
+++ b/tools/perf/Documentation/perf-kmem.txt
@@ -3,7 +3,7 @@ perf-kmem(1)
 
 NAME
 ----
-perf-kmem - Tool to trace/measure kernel memory(slab) properties
+perf-kmem - Tool to trace/measure kernel memory properties
 
 SYNOPSIS
 --------
@@ -46,6 +46,12 @@ OPTIONS
 --raw-ip::
 	Print raw ip instead of symbol
 
+--slab::
+	Analyze SLAB allocator events.
+
+--page::
+	Analyze page allocator events.
+
 SEE ALSO
 --------
 linkperf:perf-record[1]
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 4ebf65c..63ea013 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -22,6 +22,11 @@
 #include <linux/string.h>
 #include <locale.h>
 
+static int	kmem_slab;
+static int	kmem_page;
+
+static long	kmem_page_size;
+
 struct alloc_stat;
 typedef int (*sort_fn_t)(struct alloc_stat *, struct alloc_stat *);
 
@@ -226,6 +231,244 @@ static int perf_evsel__process_free_event(struct perf_evsel *evsel,
 	return 0;
 }
 
+static u64 total_page_alloc_bytes;
+static u64 total_page_free_bytes;
+static u64 total_page_nomatch_bytes;
+static u64 total_page_fail_bytes;
+static unsigned long nr_page_allocs;
+static unsigned long nr_page_frees;
+static unsigned long nr_page_fails;
+static unsigned long nr_page_nomatch;
+
+static bool use_pfn;
+
+#define MAX_MIGRATE_TYPES  6
+#define MAX_PAGE_ORDER     11
+
+static int order_stats[MAX_PAGE_ORDER][MAX_MIGRATE_TYPES];
+
+struct page_stat {
+	struct rb_node	node;
+	u64		page;
+	int		order;
+	unsigned	gfp_flags;
+	unsigned	migrate_type;
+	u64		alloc_bytes;
+	u64		free_bytes;
+	int		nr_alloc;
+	int		nr_free;
+};
+
+static struct rb_root page_tree;
+static struct rb_root page_alloc_tree;
+static struct rb_root page_alloc_sorted;
+
+static struct page_stat *search_page(unsigned long page, bool create)
+{
+	struct rb_node **node = &page_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct page_stat *data;
+
+	while (*node) {
+		s64 cmp;
+
+		parent = *node;
+		data = rb_entry(*node, struct page_stat, node);
+
+		cmp = data->page - page;
+		if (cmp < 0)
+			node = &parent->rb_left;
+		else if (cmp > 0)
+			node = &parent->rb_right;
+		else
+			return data;
+	}
+
+	if (!create)
+		return NULL;
+
+	data = zalloc(sizeof(*data));
+	if (data != NULL) {
+		data->page = page;
+
+		rb_link_node(&data->node, parent, node);
+		rb_insert_color(&data->node, &page_tree);
+	}
+
+	return data;
+}
+
+static int page_stat_cmp(struct page_stat *a, struct page_stat *b)
+{
+	if (a->page > b->page)
+		return -1;
+	if (a->page < b->page)
+		return 1;
+	if (a->order > b->order)
+		return -1;
+	if (a->order < b->order)
+		return 1;
+	if (a->migrate_type > b->migrate_type)
+		return -1;
+	if (a->migrate_type < b->migrate_type)
+		return 1;
+	if (a->gfp_flags > b->gfp_flags)
+		return -1;
+	if (a->gfp_flags < b->gfp_flags)
+		return 1;
+	return 0;
+}
+
+static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool create)
+{
+	struct rb_node **node = &page_alloc_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct page_stat *data;
+
+	while (*node) {
+		s64 cmp;
+
+		parent = *node;
+		data = rb_entry(*node, struct page_stat, node);
+
+		cmp = page_stat_cmp(data, stat);
+		if (cmp < 0)
+			node = &parent->rb_left;
+		else if (cmp > 0)
+			node = &parent->rb_right;
+		else
+			return data;
+	}
+
+	if (!create)
+		return NULL;
+
+	data = zalloc(sizeof(*data));
+	if (data != NULL) {
+		data->page = stat->page;
+		data->order = stat->order;
+		data->gfp_flags = stat->gfp_flags;
+		data->migrate_type = stat->migrate_type;
+
+		rb_link_node(&data->node, parent, node);
+		rb_insert_color(&data->node, &page_alloc_tree);
+	}
+
+	return data;
+}
+
+static bool valid_page(u64 pfn_or_page)
+{
+	if (use_pfn && pfn_or_page == -1UL)
+		return false;
+	if (!use_pfn && pfn_or_page == 0)
+		return false;
+	return true;
+}
+
+static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
+						struct perf_sample *sample)
+{
+	u64 page;
+	unsigned int order = perf_evsel__intval(evsel, sample, "order");
+	unsigned int gfp_flags = perf_evsel__intval(evsel, sample, "gfp_flags");
+	unsigned int migrate_type = perf_evsel__intval(evsel, sample,
+						       "migratetype");
+	u64 bytes = kmem_page_size << order;
+	struct page_stat *stat;
+	struct page_stat this = {
+		.order = order,
+		.gfp_flags = gfp_flags,
+		.migrate_type = migrate_type,
+	};
+
+	if (use_pfn)
+		page = perf_evsel__intval(evsel, sample, "pfn");
+	else
+		page = perf_evsel__intval(evsel, sample, "page");
+
+	nr_page_allocs++;
+	total_page_alloc_bytes += bytes;
+
+	if (!valid_page(page)) {
+		nr_page_fails++;
+		total_page_fail_bytes += bytes;
+
+		return 0;
+	}
+
+	/*
+	 * This is to find the current page (with correct gfp flags and
+	 * migrate type) at free event.
+	 */
+	stat = search_page(page, true);
+	if (stat == NULL)
+		return -ENOMEM;
+
+	stat->order = order;
+	stat->gfp_flags = gfp_flags;
+	stat->migrate_type = migrate_type;
+
+	this.page = page;
+	stat = search_page_alloc_stat(&this, true);
+	if (stat == NULL)
+		return -ENOMEM;
+
+	stat->nr_alloc++;
+	stat->alloc_bytes += bytes;
+
+	order_stats[order][migrate_type]++;
+
+	return 0;
+}
+
+static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
+						struct perf_sample *sample)
+{
+	u64 page;
+	unsigned int order = perf_evsel__intval(evsel, sample, "order");
+	u64 bytes = kmem_page_size << order;
+	struct page_stat *stat;
+	struct page_stat this = {
+		.order = order,
+	};
+
+	if (use_pfn)
+		page = perf_evsel__intval(evsel, sample, "pfn");
+	else
+		page = perf_evsel__intval(evsel, sample, "page");
+
+	nr_page_frees++;
+	total_page_free_bytes += bytes;
+
+	stat = search_page(page, false);
+	if (stat == NULL) {
+		pr_debug2("missing free at page %"PRIx64" (order: %d)\n",
+			  page, order);
+
+		nr_page_nomatch++;
+		total_page_nomatch_bytes += bytes;
+
+		return 0;
+	}
+
+	this.page = page;
+	this.gfp_flags = stat->gfp_flags;
+	this.migrate_type = stat->migrate_type;
+
+	rb_erase(&stat->node, &page_tree);
+	free(stat);
+
+	stat = search_page_alloc_stat(&this, false);
+	if (stat == NULL)
+		return -ENOENT;
+
+	stat->nr_free++;
+	stat->free_bytes += bytes;
+
+	return 0;
+}
+
 typedef int (*tracepoint_handler)(struct perf_evsel *evsel,
 				  struct perf_sample *sample);
 
@@ -270,8 +513,9 @@ static double fragmentation(unsigned long n_req, unsigned long n_alloc)
 		return 100.0 - (100.0 * n_req / n_alloc);
 }
 
-static void __print_result(struct rb_root *root, struct perf_session *session,
-			   int n_lines, int is_caller)
+static void __print_slab_result(struct rb_root *root,
+				struct perf_session *session,
+				int n_lines, int is_caller)
 {
 	struct rb_node *next;
 	struct machine *machine = &session->machines.host;
@@ -323,9 +567,56 @@ static void __print_result(struct rb_root *root, struct perf_session *session,
 	printf("%.105s\n", graph_dotted_line);
 }
 
-static void print_summary(void)
+static const char * const migrate_type_str[] = {
+	"UNMOVABL",
+	"RECLAIM",
+	"MOVABLE",
+	"RESERVED",
+	"CMA/ISLT",
+	"UNKNOWN",
+};
+
+static void __print_page_result(struct rb_root *root,
+				struct perf_session *session __maybe_unused,
+				int n_lines)
+{
+	struct rb_node *next = rb_first(root);
+	const char *format;
+
+	printf("\n%.80s\n", graph_dotted_line);
+	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags\n",
+	       use_pfn ? "PFN" : "Page");
+	printf("%.80s\n", graph_dotted_line);
+
+	if (use_pfn)
+		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx\n";
+	else
+		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx\n";
+
+	while (next && n_lines--) {
+		struct page_stat *data;
+
+		data = rb_entry(next, struct page_stat, node);
+
+		printf(format, (unsigned long long)data->page,
+		       (unsigned long long)data->alloc_bytes / 1024,
+		       data->nr_alloc, data->order,
+		       migrate_type_str[data->migrate_type],
+		       (unsigned long)data->gfp_flags);
+
+		next = rb_next(next);
+	}
+
+	if (n_lines == -1)
+		printf(" ...              | ...              | ...       | ...   | ...      | ...     \n");
+
+	printf("%.80s\n", graph_dotted_line);
+}
+
+static void print_slab_summary(void)
 {
-	printf("\nSUMMARY\n=======\n");
+	printf("\nSUMMARY (SLAB allocator)");
+	printf("\n========================\n");
 	printf("Total bytes requested: %'lu\n", total_requested);
 	printf("Total bytes allocated: %'lu\n", total_allocated);
 	printf("Total bytes wasted on internal fragmentation: %'lu\n",
@@ -335,13 +626,73 @@ static void print_summary(void)
 	printf("Cross CPU allocations: %'lu/%'lu\n", nr_cross_allocs, nr_allocs);
 }
 
-static void print_result(struct perf_session *session)
+static void print_page_summary(void)
+{
+	int o, m;
+	u64 nr_alloc_freed = nr_page_frees - nr_page_nomatch;
+	u64 total_alloc_freed_bytes = total_page_free_bytes - total_page_nomatch_bytes;
+
+	printf("\nSUMMARY (page allocator)");
+	printf("\n========================\n");
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total allocation requests",
+	       nr_page_allocs, total_page_alloc_bytes / 1024);
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total free requests",
+	       nr_page_frees, total_page_free_bytes / 1024);
+	printf("\n");
+
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total alloc+freed requests",
+	       nr_alloc_freed, (total_alloc_freed_bytes) / 1024);
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total alloc-only requests",
+	       nr_page_allocs - nr_alloc_freed,
+	       (total_page_alloc_bytes - total_alloc_freed_bytes) / 1024);
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total free-only requests",
+	       nr_page_nomatch, total_page_nomatch_bytes / 1024);
+	printf("\n");
+
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total allocation failures",
+	       nr_page_fails, total_page_fail_bytes / 1024);
+	printf("\n");
+
+	printf("%5s  %12s  %12s  %12s  %12s  %12s\n", "Order",  "Unmovable",
+	       "Reclaimable", "Movable", "Reserved", "CMA/Isolated");
+	printf("%.5s  %.12s  %.12s  %.12s  %.12s  %.12s\n", graph_dotted_line,
+	       graph_dotted_line, graph_dotted_line, graph_dotted_line,
+	       graph_dotted_line, graph_dotted_line);
+
+	for (o = 0; o < MAX_PAGE_ORDER; o++) {
+		printf("%5d", o);
+		for (m = 0; m < MAX_MIGRATE_TYPES - 1; m++) {
+			if (order_stats[o][m])
+				printf("  %'12d", order_stats[o][m]);
+			else
+				printf("  %12c", '.');
+		}
+		printf("\n");
+	}
+}
+
+static void print_slab_result(struct perf_session *session)
 {
 	if (caller_flag)
-		__print_result(&root_caller_sorted, session, caller_lines, 1);
+		__print_slab_result(&root_caller_sorted, session, caller_lines, 1);
+	if (alloc_flag)
+		__print_slab_result(&root_alloc_sorted, session, alloc_lines, 0);
+	print_slab_summary();
+}
+
+static void print_page_result(struct perf_session *session)
+{
 	if (alloc_flag)
-		__print_result(&root_alloc_sorted, session, alloc_lines, 0);
-	print_summary();
+		__print_page_result(&page_alloc_sorted, session, alloc_lines);
+	print_page_summary();
+}
+
+static void print_result(struct perf_session *session)
+{
+	if (kmem_slab)
+		print_slab_result(session);
+	if (kmem_page)
+		print_page_result(session);
 }
 
 struct sort_dimension {
@@ -353,8 +704,8 @@ struct sort_dimension {
 static LIST_HEAD(caller_sort);
 static LIST_HEAD(alloc_sort);
 
-static void sort_insert(struct rb_root *root, struct alloc_stat *data,
-			struct list_head *sort_list)
+static void sort_slab_insert(struct rb_root *root, struct alloc_stat *data,
+			     struct list_head *sort_list)
 {
 	struct rb_node **new = &(root->rb_node);
 	struct rb_node *parent = NULL;
@@ -383,8 +734,8 @@ static void sort_insert(struct rb_root *root, struct alloc_stat *data,
 	rb_insert_color(&data->node, root);
 }
 
-static void __sort_result(struct rb_root *root, struct rb_root *root_sorted,
-			  struct list_head *sort_list)
+static void __sort_slab_result(struct rb_root *root, struct rb_root *root_sorted,
+			       struct list_head *sort_list)
 {
 	struct rb_node *node;
 	struct alloc_stat *data;
@@ -396,26 +747,79 @@ static void __sort_result(struct rb_root *root, struct rb_root *root_sorted,
 
 		rb_erase(node, root);
 		data = rb_entry(node, struct alloc_stat, node);
-		sort_insert(root_sorted, data, sort_list);
+		sort_slab_insert(root_sorted, data, sort_list);
+	}
+}
+
+static void sort_page_insert(struct rb_root *root, struct page_stat *data)
+{
+	struct rb_node **new = &root->rb_node;
+	struct rb_node *parent = NULL;
+
+	while (*new) {
+		struct page_stat *this;
+		int cmp = 0;
+
+		this = rb_entry(*new, struct page_stat, node);
+		parent = *new;
+
+		/* TODO: support more sort keys */
+		cmp = data->alloc_bytes - this->alloc_bytes;
+
+		if (cmp > 0)
+			new = &parent->rb_left;
+		else
+			new = &parent->rb_right;
+	}
+
+	rb_link_node(&data->node, parent, new);
+	rb_insert_color(&data->node, root);
+}
+
+static void __sort_page_result(struct rb_root *root, struct rb_root *root_sorted)
+{
+	struct rb_node *node;
+	struct page_stat *data;
+
+	for (;;) {
+		node = rb_first(root);
+		if (!node)
+			break;
+
+		rb_erase(node, root);
+		data = rb_entry(node, struct page_stat, node);
+		sort_page_insert(root_sorted, data);
 	}
 }
 
 static void sort_result(void)
 {
-	__sort_result(&root_alloc_stat, &root_alloc_sorted, &alloc_sort);
-	__sort_result(&root_caller_stat, &root_caller_sorted, &caller_sort);
+	if (kmem_slab) {
+		__sort_slab_result(&root_alloc_stat, &root_alloc_sorted,
+				   &alloc_sort);
+		__sort_slab_result(&root_caller_stat, &root_caller_sorted,
+				   &caller_sort);
+	}
+	if (kmem_page) {
+		__sort_page_result(&page_alloc_tree, &page_alloc_sorted);
+	}
 }
 
 static int __cmd_kmem(struct perf_session *session)
 {
 	int err = -EINVAL;
+	struct perf_evsel *evsel;
 	const struct perf_evsel_str_handler kmem_tracepoints[] = {
+		/* slab allocator */
 		{ "kmem:kmalloc",		perf_evsel__process_alloc_event, },
     		{ "kmem:kmem_cache_alloc",	perf_evsel__process_alloc_event, },
 		{ "kmem:kmalloc_node",		perf_evsel__process_alloc_node_event, },
     		{ "kmem:kmem_cache_alloc_node", perf_evsel__process_alloc_node_event, },
 		{ "kmem:kfree",			perf_evsel__process_free_event, },
     		{ "kmem:kmem_cache_free",	perf_evsel__process_free_event, },
+		/* page allocator */
+		{ "kmem:mm_page_alloc",		perf_evsel__process_page_alloc_event, },
+		{ "kmem:mm_page_free",		perf_evsel__process_page_free_event, },
 	};
 
 	if (!perf_session__has_traces(session, "kmem record"))
@@ -426,10 +830,20 @@ static int __cmd_kmem(struct perf_session *session)
 		goto out;
 	}
 
+	evlist__for_each(session->evlist, evsel) {
+		if (!strcmp(perf_evsel__name(evsel), "kmem:mm_page_alloc") &&
+		    perf_evsel__field(evsel, "pfn")) {
+			use_pfn = true;
+			break;
+		}
+	}
+
 	setup_pager();
 	err = perf_session__process_events(session);
-	if (err != 0)
+	if (err != 0) {
+		pr_err("error during process events: %d\n", err);
 		goto out;
+	}
 	sort_result();
 	print_result(session);
 out:
@@ -612,6 +1026,22 @@ static int parse_alloc_opt(const struct option *opt __maybe_unused,
 	return 0;
 }
 
+static int parse_slab_opt(const struct option *opt __maybe_unused,
+			  const char *arg __maybe_unused,
+			  int unset __maybe_unused)
+{
+	kmem_slab = (kmem_page + 1);
+	return 0;
+}
+
+static int parse_page_opt(const struct option *opt __maybe_unused,
+			  const char *arg __maybe_unused,
+			  int unset __maybe_unused)
+{
+	kmem_page = (kmem_slab + 1);
+	return 0;
+}
+
 static int parse_line_opt(const struct option *opt __maybe_unused,
 			  const char *arg, int unset __maybe_unused)
 {
@@ -634,6 +1064,8 @@ static int __cmd_record(int argc, const char **argv)
 {
 	const char * const record_args[] = {
 	"record", "-a", "-R", "-c", "1",
+	};
+	const char * const slab_events[] = {
 	"-e", "kmem:kmalloc",
 	"-e", "kmem:kmalloc_node",
 	"-e", "kmem:kfree",
@@ -641,10 +1073,19 @@ static int __cmd_record(int argc, const char **argv)
 	"-e", "kmem:kmem_cache_alloc_node",
 	"-e", "kmem:kmem_cache_free",
 	};
+	const char * const page_events[] = {
+	"-e", "kmem:mm_page_alloc",
+	"-e", "kmem:mm_page_free",
+	};
 	unsigned int rec_argc, i, j;
 	const char **rec_argv;
 
 	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+	if (kmem_slab)
+		rec_argc += ARRAY_SIZE(slab_events);
+	if (kmem_page)
+		rec_argc += ARRAY_SIZE(page_events);
+
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
 	if (rec_argv == NULL)
@@ -653,6 +1094,15 @@ static int __cmd_record(int argc, const char **argv)
 	for (i = 0; i < ARRAY_SIZE(record_args); i++)
 		rec_argv[i] = strdup(record_args[i]);
 
+	if (kmem_slab) {
+		for (j = 0; j < ARRAY_SIZE(slab_events); j++, i++)
+			rec_argv[i] = strdup(slab_events[j]);
+	}
+	if (kmem_page) {
+		for (j = 0; j < ARRAY_SIZE(page_events); j++, i++)
+			rec_argv[i] = strdup(page_events[j]);
+	}
+
 	for (j = 1; j < (unsigned int)argc; j++, i++)
 		rec_argv[i] = argv[j];
 
@@ -679,6 +1129,10 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_CALLBACK('l', "line", NULL, "num", "show n lines", parse_line_opt),
 	OPT_BOOLEAN(0, "raw-ip", &raw_ip, "show raw ip instead of symbol"),
 	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
+	OPT_CALLBACK_NOOPT(0, "slab", NULL, NULL, "Analyze slab allocator",
+			   parse_slab_opt),
+	OPT_CALLBACK_NOOPT(0, "page", NULL, NULL, "Analyze page allocator",
+			   parse_page_opt),
 	OPT_END()
 	};
 	const char *const kmem_subcommands[] = { "record", "stat", NULL };
@@ -695,6 +1149,9 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (!argc)
 		usage_with_options(kmem_usage, kmem_options);
 
+	if (kmem_slab == 0 && kmem_page == 0)
+		kmem_slab = 1;  /* for backward compatibility */
+
 	if (!strncmp(argv[0], "rec", 3)) {
 		symbol__init(NULL);
 		return __cmd_record(argc, argv);
@@ -706,6 +1163,17 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (session == NULL)
 		return -1;
 
+	if (kmem_page) {
+		struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+
+		if (evsel == NULL || evsel->tp_format == NULL) {
+			pr_err("invalid event found.. aborting\n");
+			return -1;
+		}
+
+		kmem_page_size = pevent_get_page_size(evsel->tp_format->pevent);
+	}
+
 	symbol__init(&session->header.env);
 
 	if (!strcmp(argv[0], "stat")) {


Thread overview: 50+ messages
2015-04-06  5:36 [PATCHSET 0/9] perf kmem: Implement page allocation analysis (v6) Namhyung Kim
2015-04-06  5:36 ` [PATCH 1/9] perf kmem: Respect -i option Namhyung Kim
2015-04-08 15:11   ` [tip:perf/core] " tip-bot for Jiri Olsa
2015-04-06  5:36 ` [PATCH 2/9] tracing, mm: Record pfn instead of pointer to struct page Namhyung Kim
2015-04-14 12:16   ` [tip:perf/urgent] " tip-bot for Namhyung Kim
2015-04-06  5:36 ` [PATCH 3/9] perf kmem: Analyze page allocator events also Namhyung Kim
2015-04-10 21:06   ` Arnaldo Carvalho de Melo
2015-04-10 21:10     ` Arnaldo Carvalho de Melo
2015-04-13  6:59       ` Namhyung Kim
2015-04-13 13:21         ` Arnaldo Carvalho de Melo
2015-04-13 13:40   ` Arnaldo Carvalho de Melo
2015-04-14 12:17   ` [tip:perf/urgent] " tip-bot for Namhyung Kim
2015-04-06  5:36 ` [PATCH 4/9] perf kmem: Implement stat --page --caller Namhyung Kim
2015-04-13 13:40   ` Arnaldo Carvalho de Melo
2015-04-14  2:17     ` Namhyung Kim
2015-04-06  5:36 ` [PATCH 5/9] perf kmem: Support sort keys on page analysis Namhyung Kim
2015-04-06  5:36 ` [PATCH 6/9] perf kmem: Add --live option for current allocation stat Namhyung Kim
2015-04-06  5:36 ` [PATCH 7/9] perf kmem: Print gfp flags in human readable string Namhyung Kim
2015-04-06  5:36 ` [PATCH 8/9] perf kmem: Add kmem.default config option Namhyung Kim
2015-04-06  5:36 ` [PATCH 9/9] tools lib traceevent: Honor operator priority Namhyung Kim
2015-04-06 14:45   ` Steven Rostedt
2015-04-07  7:52     ` Namhyung Kim
2015-04-07 13:02       ` Arnaldo Carvalho de Melo
2015-04-07 13:57         ` Steven Rostedt
2015-04-07 14:10         ` Namhyung Kim
2015-04-08 15:11   ` [tip:perf/core] " tip-bot for Namhyung Kim
2015-04-13 13:41   ` [PATCH 9/9] " Arnaldo Carvalho de Melo