All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH -tip 0/2 v2] pagecache tracepoints proposal
@ 2010-01-23  0:04 ` Keiichi KII
  0 siblings, 0 replies; 22+ messages in thread
From: Keiichi KII @ 2010-01-23  0:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: lwoodman, linux-mm, mingo, tzanussi, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

Hello,

This is v2 of a patchset to add some tracepoints for pagecache.

I would propose several tracepoints for tracing pagecache behaviors and
a script for these.
By using both the tracepoints and the script, we can monitor pagecache usage
with high resolution. Example output of the script looks like:

[file list]
                            cached
        device      inode    pages
  --------------------------------
         253:0    1051413      130
         253:0    1051399        2
         253:0    1051414       44
         253:0    1051417      154

o postmaster-2330
                            cached    added  removed      indirect
        device      inode    pages    pages    pages removed pages
  ----------------------------------------------------------------
         253:0    1051399        0        2        0             0
         253:0    1051417      154        0        0             0
         253:0    1051413      130        0        0             0
         253:0    1051414       44        0        0             0
  ----------------------------------------------------------------
  total:                       337        2        0             0

We can now know system-wide pagecache usage by /proc/meminfo.
But we have no method to get higher resolution information like per file or
per process usage than system-wide one.
A process may share some pagecache or add a pagecache to the memory or
remove a pagecache from the memory.
If a pagecache miss hit ratio rises, maybe it leads to extra I/O and
affects system performance.

So, by using the tracepoints we can get the following information.
 1. how many pagecaches each process has per each file
 2. how many pages are cached per each file
 3. how many pagecaches each process shares
 4. how often each process adds/removes pagecache
 5. how long a pagecache stays in the memory
 6. pagecache hit rate per file

Especially, the monitoring pagecache usage per each file would help us tune
some applications like database.

Changelog since v1
o Add a script based on "perf trace stream scripting support".

Any comments are welcome.
--
Keiichi Kii <k-keiichi@bx.jp.nec.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH -tip 0/2 v2] pagecache tracepoints proposal
@ 2010-01-23  0:04 ` Keiichi KII
  0 siblings, 0 replies; 22+ messages in thread
From: Keiichi KII @ 2010-01-23  0:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: lwoodman, linux-mm, mingo, tzanussi, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

Hello,

This is v2 of a patchset to add some tracepoints for pagecache.

I would propose several tracepoints for tracing pagecache behaviors and
a script for these.
By using both the tracepoints and the script, we can monitor pagecache usage
with high resolution. Example output of the script looks like:

[file list]
                            cached
        device      inode    pages
  --------------------------------
         253:0    1051413      130
         253:0    1051399        2
         253:0    1051414       44
         253:0    1051417      154

o postmaster-2330
                            cached    added  removed      indirect
        device      inode    pages    pages    pages removed pages
  ----------------------------------------------------------------
         253:0    1051399        0        2        0             0
         253:0    1051417      154        0        0             0
         253:0    1051413      130        0        0             0
         253:0    1051414       44        0        0             0
  ----------------------------------------------------------------
  total:                       337        2        0             0

We can now know system-wide pagecache usage by /proc/meminfo.
But we have no method to get higher resolution information like per file or
per process usage than system-wide one.
A process may share some pagecache or add a pagecache to the memory or
remove a pagecache from the memory.
If a pagecache miss hit ratio rises, maybe it leads to extra I/O and
affects system performance.

So, by using the tracepoints we can get the following information.
 1. how many pagecaches each process has per each file
 2. how many pages are cached per each file
 3. how many pagecaches each process shares
 4. how often each process adds/removes pagecache
 5. how long a pagecache stays in the memory
 6. pagecache hit rate per file

Especially, the monitoring pagecache usage per each file would help us tune
some applications like database.

Changelog since v1
o Add a script based on "perf trace stream scripting support".

Any comments are welcome.
--
Keiichi Kii <k-keiichi@bx.jp.nec.com>


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH -tip 1/2 v2] add tracepoints for pagecache
  2010-01-23  0:04 ` Keiichi KII
@ 2010-01-23  0:07   ` Keiichi KII
  -1 siblings, 0 replies; 22+ messages in thread
From: Keiichi KII @ 2010-01-23  0:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: lwoodman, linux-mm, mingo, tzanussi, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

This patch adds several tracepoints to track pagecach behavior.
These trecepoints would help us monitor pagecache usage with high resolution.

Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
Cc: Atsushi Tsuji <a-tsuji@bk.jp.nec.com> 
---
 include/trace/events/filemap.h |   83 +++++++++++++++++++++++++++++++++++++++++
 mm/filemap.c                   |    5 ++
 mm/truncate.c                  |    2 
 mm/vmscan.c                    |    3 +
 4 files changed, 93 insertions(+)

Index: linux-2.6-tip/include/trace/events/filemap.h
===================================================================
--- /dev/null
+++ linux-2.6-tip/include/trace/events/filemap.h
@@ -0,0 +1,83 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM filemap
+
+#if !defined(_TRACE_FILEMAP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FILEMAP_H
+
+#include <linux/fs.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(find_get_page,
+
+	TP_PROTO(struct address_space *mapping, pgoff_t offset,
+		struct page *page),
+
+	TP_ARGS(mapping, offset, page),
+
+	TP_STRUCT__entry(
+		__field(dev_t, s_dev)
+		__field(ino_t, i_ino)
+		__field(pgoff_t, offset)
+		__field(struct page *, page)
+		),
+
+	TP_fast_assign(
+		__entry->s_dev = mapping->host ? mapping->host->i_sb->s_dev : 0;
+		__entry->i_ino = mapping->host ? mapping->host->i_ino : 0;
+		__entry->offset = offset;
+		__entry->page = page;
+		),
+
+	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu %s", MAJOR(__entry->s_dev),
+		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset,
+		__entry->page == NULL ? "page_not_found" : "page_found")
+);
+
+TRACE_EVENT(add_to_page_cache,
+
+	TP_PROTO(struct address_space *mapping, pgoff_t offset),
+
+	TP_ARGS(mapping, offset),
+
+	TP_STRUCT__entry(
+		__field(dev_t, s_dev)
+		__field(ino_t, i_ino)
+		__field(pgoff_t, offset)
+		),
+
+	TP_fast_assign(
+		__entry->s_dev = mapping->host->i_sb->s_dev;
+		__entry->i_ino = mapping->host->i_ino;
+		__entry->offset = offset;
+		),
+
+	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu", MAJOR(__entry->s_dev),
+		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset)
+);
+
+TRACE_EVENT(remove_from_page_cache,
+
+	TP_PROTO(struct address_space *mapping, pgoff_t offset),
+
+	TP_ARGS(mapping, offset),
+
+	TP_STRUCT__entry(
+		__field(dev_t, s_dev)
+		__field(ino_t, i_ino)
+		__field(pgoff_t, offset)
+		),
+
+	TP_fast_assign(
+		__entry->s_dev = mapping->host->i_sb->s_dev;
+		__entry->i_ino = mapping->host->i_ino;
+		__entry->offset = offset;
+		),
+
+	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu", MAJOR(__entry->s_dev),
+		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset)
+);
+
+#endif /* _TRACE_FILEMAP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
Index: linux-2.6-tip/mm/filemap.c
===================================================================
--- linux-2.6-tip.orig/mm/filemap.c
+++ linux-2.6-tip/mm/filemap.c
@@ -34,6 +34,8 @@
 #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
 #include <linux/memcontrol.h>
 #include <linux/mm_inline.h> /* for page_is_file_cache() */
+#define CREATE_TRACE_POINTS
+#include <trace/events/filemap.h>
 #include "internal.h"
 
 /*
@@ -149,6 +151,7 @@ void remove_from_page_cache(struct page 
 	spin_lock_irq(&mapping->tree_lock);
 	__remove_from_page_cache(page);
 	spin_unlock_irq(&mapping->tree_lock);
+	trace_remove_from_page_cache(mapping, page->index);
 	mem_cgroup_uncharge_cache_page(page);
 }
 
@@ -419,6 +422,7 @@ int add_to_page_cache_locked(struct page
 			if (PageSwapBacked(page))
 				__inc_zone_page_state(page, NR_SHMEM);
 			spin_unlock_irq(&mapping->tree_lock);
+			trace_add_to_page_cache(mapping, offset);
 		} else {
 			page->mapping = NULL;
 			spin_unlock_irq(&mapping->tree_lock);
@@ -642,6 +646,7 @@ repeat:
 	}
 	rcu_read_unlock();
 
+	trace_find_get_page(mapping, offset, page);
 	return page;
 }
 EXPORT_SYMBOL(find_get_page);
Index: linux-2.6-tip/mm/truncate.c
===================================================================
--- linux-2.6-tip.orig/mm/truncate.c
+++ linux-2.6-tip/mm/truncate.c
@@ -20,6 +20,7 @@
 				   do_invalidatepage */
 #include "internal.h"
 
+#include <trace/events/filemap.h>
 
 /**
  * do_invalidatepage - invalidate part or all of a page
@@ -388,6 +389,7 @@ invalidate_complete_page2(struct address
 	BUG_ON(page_has_private(page));
 	__remove_from_page_cache(page);
 	spin_unlock_irq(&mapping->tree_lock);
+	trace_remove_from_page_cache(mapping, page->index);
 	mem_cgroup_uncharge_cache_page(page);
 	page_cache_release(page);	/* pagecache ref */
 	return 1;
Index: linux-2.6-tip/mm/vmscan.c
===================================================================
--- linux-2.6-tip.orig/mm/vmscan.c
+++ linux-2.6-tip/mm/vmscan.c
@@ -48,6 +48,8 @@
 
 #include "internal.h"
 
+#include <trace/events/filemap.h>
+
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
 	unsigned long nr_scanned;
@@ -477,6 +479,7 @@ static int __remove_mapping(struct addre
 	} else {
 		__remove_from_page_cache(page);
 		spin_unlock_irq(&mapping->tree_lock);
+		trace_remove_from_page_cache(mapping, page->index);
 		mem_cgroup_uncharge_cache_page(page);
 	}
 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH -tip 1/2 v2] add tracepoints for pagecache
@ 2010-01-23  0:07   ` Keiichi KII
  0 siblings, 0 replies; 22+ messages in thread
From: Keiichi KII @ 2010-01-23  0:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: lwoodman, linux-mm, mingo, tzanussi, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

This patch adds several tracepoints to track pagecach behavior.
These trecepoints would help us monitor pagecache usage with high resolution.

Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
Cc: Atsushi Tsuji <a-tsuji@bk.jp.nec.com> 
---
 include/trace/events/filemap.h |   83 +++++++++++++++++++++++++++++++++++++++++
 mm/filemap.c                   |    5 ++
 mm/truncate.c                  |    2 
 mm/vmscan.c                    |    3 +
 4 files changed, 93 insertions(+)

Index: linux-2.6-tip/include/trace/events/filemap.h
===================================================================
--- /dev/null
+++ linux-2.6-tip/include/trace/events/filemap.h
@@ -0,0 +1,83 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM filemap
+
+#if !defined(_TRACE_FILEMAP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FILEMAP_H
+
+#include <linux/fs.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(find_get_page,
+
+	TP_PROTO(struct address_space *mapping, pgoff_t offset,
+		struct page *page),
+
+	TP_ARGS(mapping, offset, page),
+
+	TP_STRUCT__entry(
+		__field(dev_t, s_dev)
+		__field(ino_t, i_ino)
+		__field(pgoff_t, offset)
+		__field(struct page *, page)
+		),
+
+	TP_fast_assign(
+		__entry->s_dev = mapping->host ? mapping->host->i_sb->s_dev : 0;
+		__entry->i_ino = mapping->host ? mapping->host->i_ino : 0;
+		__entry->offset = offset;
+		__entry->page = page;
+		),
+
+	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu %s", MAJOR(__entry->s_dev),
+		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset,
+		__entry->page == NULL ? "page_not_found" : "page_found")
+);
+
+TRACE_EVENT(add_to_page_cache,
+
+	TP_PROTO(struct address_space *mapping, pgoff_t offset),
+
+	TP_ARGS(mapping, offset),
+
+	TP_STRUCT__entry(
+		__field(dev_t, s_dev)
+		__field(ino_t, i_ino)
+		__field(pgoff_t, offset)
+		),
+
+	TP_fast_assign(
+		__entry->s_dev = mapping->host->i_sb->s_dev;
+		__entry->i_ino = mapping->host->i_ino;
+		__entry->offset = offset;
+		),
+
+	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu", MAJOR(__entry->s_dev),
+		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset)
+);
+
+TRACE_EVENT(remove_from_page_cache,
+
+	TP_PROTO(struct address_space *mapping, pgoff_t offset),
+
+	TP_ARGS(mapping, offset),
+
+	TP_STRUCT__entry(
+		__field(dev_t, s_dev)
+		__field(ino_t, i_ino)
+		__field(pgoff_t, offset)
+		),
+
+	TP_fast_assign(
+		__entry->s_dev = mapping->host->i_sb->s_dev;
+		__entry->i_ino = mapping->host->i_ino;
+		__entry->offset = offset;
+		),
+
+	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu", MAJOR(__entry->s_dev),
+		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset)
+);
+
+#endif /* _TRACE_FILEMAP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
Index: linux-2.6-tip/mm/filemap.c
===================================================================
--- linux-2.6-tip.orig/mm/filemap.c
+++ linux-2.6-tip/mm/filemap.c
@@ -34,6 +34,8 @@
 #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
 #include <linux/memcontrol.h>
 #include <linux/mm_inline.h> /* for page_is_file_cache() */
+#define CREATE_TRACE_POINTS
+#include <trace/events/filemap.h>
 #include "internal.h"
 
 /*
@@ -149,6 +151,7 @@ void remove_from_page_cache(struct page 
 	spin_lock_irq(&mapping->tree_lock);
 	__remove_from_page_cache(page);
 	spin_unlock_irq(&mapping->tree_lock);
+	trace_remove_from_page_cache(mapping, page->index);
 	mem_cgroup_uncharge_cache_page(page);
 }
 
@@ -419,6 +422,7 @@ int add_to_page_cache_locked(struct page
 			if (PageSwapBacked(page))
 				__inc_zone_page_state(page, NR_SHMEM);
 			spin_unlock_irq(&mapping->tree_lock);
+			trace_add_to_page_cache(mapping, offset);
 		} else {
 			page->mapping = NULL;
 			spin_unlock_irq(&mapping->tree_lock);
@@ -642,6 +646,7 @@ repeat:
 	}
 	rcu_read_unlock();
 
+	trace_find_get_page(mapping, offset, page);
 	return page;
 }
 EXPORT_SYMBOL(find_get_page);
Index: linux-2.6-tip/mm/truncate.c
===================================================================
--- linux-2.6-tip.orig/mm/truncate.c
+++ linux-2.6-tip/mm/truncate.c
@@ -20,6 +20,7 @@
 				   do_invalidatepage */
 #include "internal.h"
 
+#include <trace/events/filemap.h>
 
 /**
  * do_invalidatepage - invalidate part or all of a page
@@ -388,6 +389,7 @@ invalidate_complete_page2(struct address
 	BUG_ON(page_has_private(page));
 	__remove_from_page_cache(page);
 	spin_unlock_irq(&mapping->tree_lock);
+	trace_remove_from_page_cache(mapping, page->index);
 	mem_cgroup_uncharge_cache_page(page);
 	page_cache_release(page);	/* pagecache ref */
 	return 1;
Index: linux-2.6-tip/mm/vmscan.c
===================================================================
--- linux-2.6-tip.orig/mm/vmscan.c
+++ linux-2.6-tip/mm/vmscan.c
@@ -48,6 +48,8 @@
 
 #include "internal.h"
 
+#include <trace/events/filemap.h>
+
 struct scan_control {
 	/* Incremented by the number of inactive pages that were scanned */
 	unsigned long nr_scanned;
@@ -477,6 +479,7 @@ static int __remove_mapping(struct addre
 	} else {
 		__remove_from_page_cache(page);
 		spin_unlock_irq(&mapping->tree_lock);
+		trace_remove_from_page_cache(mapping, page->index);
 		mem_cgroup_uncharge_cache_page(page);
 	}
 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
  2010-01-23  0:04 ` Keiichi KII
@ 2010-01-23  0:08   ` Keiichi KII
  -1 siblings, 0 replies; 22+ messages in thread
From: Keiichi KII @ 2010-01-23  0:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: lwoodman, linux-mm, mingo, tzanussi, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

The scripts are implemented based on the trace stream scripting support.
And the scripts implement the following.
 - how many pagecaches each process has per each file
 - how many pages are cached per each file
 - how many pagecaches each process shares

To monitor pagecache usage per a process, run "pagecache-usage-record" to
record perf data for "pagecache-usage.pl" and run "pagecache-usage-report"
to display.

The below outputs show execution sample.

[file list]
        device      inode   caches
  --------------------------------
         253:0    1051413      130
         253:0    1051399        2
         253:0    1051414       44
         253:0    1051417      154

[process list]
o postmaster-2330
                            cached    added  removed      indirect
        device      inode    pages    pages    pages removed pages
  ----------------------------------------------------------------
         253:0    1051399        0        2        0             0
         253:0    1051417      154        0        0             0
         253:0    1051413      130        0        0             0
         253:0    1051414       44        0        0             0
  ----------------------------------------------------------------
  total:                       337        2        0             0

>From the output, we can know some information like:

- if "added pages" > "cached pages" on process list then
    It means repeating add/remove pagecache many times.
  => Bad case for pagecache usage

- if "added pages" <= "cached pages" on process list then
    It means no unnecessary I/O operations.
  => Good case for pagecache usage.

- if "caches" on file list > 
         sum "cached pages" per each file on process list then
    It means there are unneccessary pagecaches in the memory. 
  => Bad case for pagecache usage

Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
Cc: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
---
 tools/perf/scripts/perl/bin/pagecache-usage-record |    7 
 tools/perf/scripts/perl/bin/pagecache-usage-report |    6 
 tools/perf/scripts/perl/pagecache-usage.pl         |  160 +++++++++++++++++++++
 3 files changed, 173 insertions(+)

Index: linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-record
===================================================================
--- /dev/null
+++ linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-record
@@ -0,0 +1,7 @@
+#!/bin/bash
+perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache
+
+
+
+
+
Index: linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-report
===================================================================
--- /dev/null
+++ linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-report
@@ -0,0 +1,6 @@
+#!/bin/bash
+# description: pagecache usage per process
+perf trace -s ~/libexec/perf-core/scripts/perl/pagecache-usage.pl
+
+
+
Index: linux-2.6-tip/tools/perf/scripts/perl/pagecache-usage.pl
===================================================================
--- /dev/null
+++ linux-2.6-tip/tools/perf/scripts/perl/pagecache-usage.pl
@@ -0,0 +1,160 @@
+# perf trace event handlers, generated by perf trace -g perl
+# Licensed under the terms of the GNU GPL License version 2
+
+# The common_* event handler fields are the most useful fields common to
+# all events.  They don't necessarily correspond to the 'common_*' fields
+# in the format files.  Those fields not available as handler params can
+# be retrieved using Perl functions of the form common_*($context).
+# See Context.pm for the list of available functions.
+
+use lib "$ENV{'PERF_EXEC_PATH'}/scripts/perl/Perf-Trace-Util/lib";
+use lib "./Perf-Trace-Util/lib";
+use Perf::Trace::Core;
+use Perf::Trace::Context;
+use Perf::Trace::Util;
+use List::Util qw/sum/;
+my %files;
+my %processes;
+my %records;
+
+sub trace_end
+{
+	print_pagecache_usage_per_file();
+	print "\n";
+	print_pagecache_usage_per_process();
+	print_unhandled();
+}
+
+sub filemap::remove_from_page_cache
+{
+	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+	    $common_pid, $common_comm,
+	    $s_dev, $i_ino, $offset) = @_;
+	my $f = \%{$files{$s_dev}{$i_ino}};
+	my $r = \%{$records{$common_comm."-".$common_pid}{$f}};
+
+	delete $$f{$offset};
+	$$r{inode} = $i_ino;
+	$$r{dev} = $s_dev;
+	if (exists $$r{added}{$offset}) {
+	    $$r{removed}++;
+	} else {
+	    $$r{indirect_removed}++;
+	}
+}
+
+sub filemap::add_to_page_cache
+{
+	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+	    $common_pid, $common_comm,
+	    $s_dev, $i_ino, $offset) = @_;
+	my $f = \%{$files{$s_dev}{$i_ino}};
+	my $r = \%{$records{$common_comm."-".$common_pid}{$f}};
+
+	$$f{$offset}++;
+	$$r{added}{$offset}++;
+	$$r{inode} = $i_ino;
+	$$r{dev} = $s_dev;
+}
+
+sub filemap::find_get_page
+{
+	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+	    $common_pid, $common_comm,
+	    $s_dev, $i_ino, $offset, $page) = @_;
+	my $f = \%{$files{$s_dev}{$i_ino}};
+	my $r = \%{$records{$common_comm."-".$common_pid}{$f}};
+
+	if ($page != 0) {
+	    $$f{$offset}++;
+	    $$r{cached}++;
+	    $$r{inode} = $i_ino;
+	    $$r{dev} = $s_dev;
+	}
+}
+
+my %unhandled;
+
+sub trace_unhandled
+{
+	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+	    $common_pid, $common_comm) = @_;
+
+	$unhandled{$event_name}++;
+}
+
+sub print_unhandled
+{
+	if ((scalar keys %unhandled) == 0) {
+	    print "unhandled events nothing\n";
+	    return;
+	}
+
+	print "\nunhandled events:\n\n";
+
+	printf("%-40s  %10s\n", "event", "count");
+	printf("%-40s  %10s\n", "----------------------------------------",
+	       "-----------");
+
+	foreach my $event_name (keys %unhandled) {
+	    printf("%-40s  %10d\n", $event_name, $unhandled{$event_name});
+	}
+}
+
+sub minor
+{
+	my $dev = shift;
+	return $dev & ((1 << 20) - 1);
+}
+
+sub major
+{
+	my $dev = shift;
+	return $dev >> 20;
+}
+
+sub print_pagecache_usage_per_file
+{
+	print "[file list]\n";
+	printf("  %12s %10s %8s\n", "", "", "cached");
+	printf("  %12s %10s %8s\n", "device", "inode", "pages");
+	printf("  %s\n", '-' x 32);
+	while(my($dev, $file) = each(%files)) {
+	    while(my($inode, $r) = each(%$file)) {
+		my $count = values %$r;
+		next if $count == 0;
+		printf("  %12s %10d %8d\n",
+		       major($dev).":".minor($dev), $inode, $count);
+	    }
+	}
+}
+
+sub print_pagecache_usage_per_process
+{
+	print "[process list]\n";
+	while(my ($pid, $v) = each(%records)) {
+	    my ($sum_cached, $sum_added, $sum_removed, $sum_indirect_removed);
+
+	    print "o $pid\n";
+	    printf("  %12s %10s %8s %8s %8s %13s\n", "", "",
+		   "cached", "added", "removed", "indirect");
+	    printf("  %12s %10s %8s %8s %8s %13s\n", "device", "inode",
+		   "pages", "pages", "pages", "removed pages");
+	    printf("  %s\n", '-' x 64);
+	    while(my ($file, $r) = each(%$v)) {
+		my $added_num = List::Util::sum(values %{$$r{added}});
+		$sum_cached += $$r{cached};
+		$sum_added += $added_num;
+		$sum_removed += $$r{removed};
+		$sum_indirect_removed += $$r{indirect_removed};
+		printf("  %12s %10d %8d %8d %8d %13d\n",
+		       major($$r{dev}).":".minor($$r{dev}), $$r{inode},
+		       $$r{cached}, $added_num, $$r{removed},
+		       $$r{indirect_removed});
+	    }
+	    printf("  %s\n", '-' x 64);
+	    printf("  total: %5s %10s %8d %8d %8d %13d\n", "", "", $sum_cached,
+		   $sum_added, $sum_removed, $sum_indirect_removed);
+	    print "\n";
+	}
+}




^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
@ 2010-01-23  0:08   ` Keiichi KII
  0 siblings, 0 replies; 22+ messages in thread
From: Keiichi KII @ 2010-01-23  0:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: lwoodman, linux-mm, mingo, tzanussi, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

The scripts are implemented based on the trace stream scripting support.
And the scripts implement the following.
 - how many pagecaches each process has per each file
 - how many pages are cached per each file
 - how many pagecaches each process shares

To monitor pagecache usage per a process, run "pagecache-usage-record" to
record perf data for "pagecache-usage.pl" and run "pagecache-usage-report"
to display.

The below outputs show execution sample.

[file list]
        device      inode   caches
  --------------------------------
         253:0    1051413      130
         253:0    1051399        2
         253:0    1051414       44
         253:0    1051417      154

[process list]
o postmaster-2330
                            cached    added  removed      indirect
        device      inode    pages    pages    pages removed pages
  ----------------------------------------------------------------
         253:0    1051399        0        2        0             0
         253:0    1051417      154        0        0             0
         253:0    1051413      130        0        0             0
         253:0    1051414       44        0        0             0
  ----------------------------------------------------------------
  total:                       337        2        0             0

>From the output, we can know some information like:

- if "added pages" > "cached pages" on process list then
    It means repeating add/remove pagecache many times.
  => Bad case for pagecache usage

- if "added pages" <= "cached pages" on process list then
    It means no unnecessary I/O operations.
  => Good case for pagecache usage.

- if "caches" on file list > 
         sum "cached pages" per each file on process list then
    It means there are unneccessary pagecaches in the memory. 
  => Bad case for pagecache usage

Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
Cc: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
---
 tools/perf/scripts/perl/bin/pagecache-usage-record |    7 
 tools/perf/scripts/perl/bin/pagecache-usage-report |    6 
 tools/perf/scripts/perl/pagecache-usage.pl         |  160 +++++++++++++++++++++
 3 files changed, 173 insertions(+)

Index: linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-record
===================================================================
--- /dev/null
+++ linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-record
@@ -0,0 +1,7 @@
+#!/bin/bash
+perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache
+
+
+
+
+
Index: linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-report
===================================================================
--- /dev/null
+++ linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-report
@@ -0,0 +1,6 @@
+#!/bin/bash
+# description: pagecache usage per process
+perf trace -s ~/libexec/perf-core/scripts/perl/pagecache-usage.pl
+
+
+
Index: linux-2.6-tip/tools/perf/scripts/perl/pagecache-usage.pl
===================================================================
--- /dev/null
+++ linux-2.6-tip/tools/perf/scripts/perl/pagecache-usage.pl
@@ -0,0 +1,160 @@
+# perf trace event handlers, generated by perf trace -g perl
+# Licensed under the terms of the GNU GPL License version 2
+
+# The common_* event handler fields are the most useful fields common to
+# all events.  They don't necessarily correspond to the 'common_*' fields
+# in the format files.  Those fields not available as handler params can
+# be retrieved using Perl functions of the form common_*($context).
+# See Context.pm for the list of available functions.
+
+use lib "$ENV{'PERF_EXEC_PATH'}/scripts/perl/Perf-Trace-Util/lib";
+use lib "./Perf-Trace-Util/lib";
+use Perf::Trace::Core;
+use Perf::Trace::Context;
+use Perf::Trace::Util;
+use List::Util qw/sum/;
+my %files;
+my %processes;
+my %records;
+
+sub trace_end
+{
+	print_pagecache_usage_per_file();
+	print "\n";
+	print_pagecache_usage_per_process();
+	print_unhandled();
+}
+
+sub filemap::remove_from_page_cache
+{
+	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+	    $common_pid, $common_comm,
+	    $s_dev, $i_ino, $offset) = @_;
+	my $f = \%{$files{$s_dev}{$i_ino}};
+	my $r = \%{$records{$common_comm."-".$common_pid}{$f}};
+
+	delete $$f{$offset};
+	$$r{inode} = $i_ino;
+	$$r{dev} = $s_dev;
+	if (exists $$r{added}{$offset}) {
+	    $$r{removed}++;
+	} else {
+	    $$r{indirect_removed}++;
+	}
+}
+
+sub filemap::add_to_page_cache
+{
+	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+	    $common_pid, $common_comm,
+	    $s_dev, $i_ino, $offset) = @_;
+	my $f = \%{$files{$s_dev}{$i_ino}};
+	my $r = \%{$records{$common_comm."-".$common_pid}{$f}};
+
+	$$f{$offset}++;
+	$$r{added}{$offset}++;
+	$$r{inode} = $i_ino;
+	$$r{dev} = $s_dev;
+}
+
+sub filemap::find_get_page
+{
+	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+	    $common_pid, $common_comm,
+	    $s_dev, $i_ino, $offset, $page) = @_;
+	my $f = \%{$files{$s_dev}{$i_ino}};
+	my $r = \%{$records{$common_comm."-".$common_pid}{$f}};
+
+	if ($page != 0) {
+	    $$f{$offset}++;
+	    $$r{cached}++;
+	    $$r{inode} = $i_ino;
+	    $$r{dev} = $s_dev;
+	}
+}
+
+my %unhandled;
+
+sub trace_unhandled
+{
+	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+	    $common_pid, $common_comm) = @_;
+
+	$unhandled{$event_name}++;
+}
+
+sub print_unhandled
+{
+	if ((scalar keys %unhandled) == 0) {
+	    print "unhandled events nothing\n";
+	    return;
+	}
+
+	print "\nunhandled events:\n\n";
+
+	printf("%-40s  %10s\n", "event", "count");
+	printf("%-40s  %10s\n", "----------------------------------------",
+	       "-----------");
+
+	foreach my $event_name (keys %unhandled) {
+	    printf("%-40s  %10d\n", $event_name, $unhandled{$event_name});
+	}
+}
+
+sub minor
+{
+	my $dev = shift;
+	return $dev & ((1 << 20) - 1);
+}
+
+sub major
+{
+	my $dev = shift;
+	return $dev >> 20;
+}
+
+sub print_pagecache_usage_per_file
+{
+	print "[file list]\n";
+	printf("  %12s %10s %8s\n", "", "", "cached");
+	printf("  %12s %10s %8s\n", "device", "inode", "pages");
+	printf("  %s\n", '-' x 32);
+	while(my($dev, $file) = each(%files)) {
+	    while(my($inode, $r) = each(%$file)) {
+		my $count = values %$r;
+		next if $count == 0;
+		printf("  %12s %10d %8d\n",
+		       major($dev).":".minor($dev), $inode, $count);
+	    }
+	}
+}
+
+sub print_pagecache_usage_per_process
+{
+	print "[process list]\n";
+	while(my ($pid, $v) = each(%records)) {
+	    my ($sum_cached, $sum_added, $sum_removed, $sum_indirect_removed);
+
+	    print "o $pid\n";
+	    printf("  %12s %10s %8s %8s %8s %13s\n", "", "",
+		   "cached", "added", "removed", "indirect");
+	    printf("  %12s %10s %8s %8s %8s %13s\n", "device", "inode",
+		   "pages", "pages", "pages", "removed pages");
+	    printf("  %s\n", '-' x 64);
+	    while(my ($file, $r) = each(%$v)) {
+		my $added_num = List::Util::sum(values %{$$r{added}});
+		$sum_cached += $$r{cached};
+		$sum_added += $added_num;
+		$sum_removed += $$r{removed};
+		$sum_indirect_removed += $$r{indirect_removed};
+		printf("  %12s %10d %8d %8d %8d %13d\n",
+		       major($$r{dev}).":".minor($$r{dev}), $$r{inode},
+		       $$r{cached}, $added_num, $$r{removed},
+		       $$r{indirect_removed});
+	    }
+	    printf("  %s\n", '-' x 64);
+	    printf("  total: %5s %10s %8d %8d %8d %13d\n", "", "", $sum_cached,
+		   $sum_added, $sum_removed, $sum_indirect_removed);
+	    print "\n";
+	}
+}



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 1/2 v2] add tracepoints for pagecache
  2010-01-23  0:07   ` Keiichi KII
@ 2010-01-23  2:28     ` Steven Rostedt
  -1 siblings, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2010-01-23  2:28 UTC (permalink / raw)
  To: Keiichi KII
  Cc: linux-kernel, lwoodman, linux-mm, mingo, tzanussi, riel, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

On Fri, 2010-01-22 at 19:07 -0500, Keiichi KII wrote:
> This patch adds several tracepoints to track pagecach behavior.
> These trecepoints would help us monitor pagecache usage with high resolution.
> 
> Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
> Cc: Atsushi Tsuji <a-tsuji@bk.jp.nec.com> 
> ---
>  include/trace/events/filemap.h |   83 +++++++++++++++++++++++++++++++++++++++++
>  mm/filemap.c                   |    5 ++
>  mm/truncate.c                  |    2 
>  mm/vmscan.c                    |    3 +
>  4 files changed, 93 insertions(+)
> 
> Index: linux-2.6-tip/include/trace/events/filemap.h
> ===================================================================
> --- /dev/null
> +++ linux-2.6-tip/include/trace/events/filemap.h
> @@ -0,0 +1,83 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM filemap
> +
> +#if !defined(_TRACE_FILEMAP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_FILEMAP_H
> +
> +#include <linux/fs.h>
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(find_get_page,
> +
> +	TP_PROTO(struct address_space *mapping, pgoff_t offset,
> +		struct page *page),
> +
> +	TP_ARGS(mapping, offset, page),
> +
> +	TP_STRUCT__entry(
> +		__field(dev_t, s_dev)
> +		__field(ino_t, i_ino)
> +		__field(pgoff_t, offset)
> +		__field(struct page *, page)
> +		),
> +
> +	TP_fast_assign(
> +		__entry->s_dev = mapping->host ? mapping->host->i_sb->s_dev : 0;
> +		__entry->i_ino = mapping->host ? mapping->host->i_ino : 0;
> +		__entry->offset = offset;
> +		__entry->page = page;
> +		),
> +
> +	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu %s", MAJOR(__entry->s_dev),
> +		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset,
> +		__entry->page == NULL ? "page_not_found" : "page_found")
> +);
> +
> +TRACE_EVENT(add_to_page_cache,
> +
> +	TP_PROTO(struct address_space *mapping, pgoff_t offset),
> +
> +	TP_ARGS(mapping, offset),
> +
> +	TP_STRUCT__entry(
> +		__field(dev_t, s_dev)
> +		__field(ino_t, i_ino)
> +		__field(pgoff_t, offset)
> +		),
> +
> +	TP_fast_assign(
> +		__entry->s_dev = mapping->host->i_sb->s_dev;
> +		__entry->i_ino = mapping->host->i_ino;
> +		__entry->offset = offset;
> +		),
> +
> +	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu", MAJOR(__entry->s_dev),
> +		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset)
> +);
> +
> +TRACE_EVENT(remove_from_page_cache,
> +
> +	TP_PROTO(struct address_space *mapping, pgoff_t offset),
> +
> +	TP_ARGS(mapping, offset),
> +
> +	TP_STRUCT__entry(
> +		__field(dev_t, s_dev)
> +		__field(ino_t, i_ino)
> +		__field(pgoff_t, offset)
> +		),
> +
> +	TP_fast_assign(
> +		__entry->s_dev = mapping->host->i_sb->s_dev;
> +		__entry->i_ino = mapping->host->i_ino;
> +		__entry->offset = offset;
> +		),
> +
> +	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu", MAJOR(__entry->s_dev),
> +		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset)
> +);
> +

The above qualify in converting to templates or DECLACE_TRACE_CLASS, and
DEFINE_EVENT, That is, rename the above TRACE_EVENT into
DECLARE_TRACE_CLASS, and then have the other one be a DEFINE_EVENT().
See the trace/event/sched.h for examples.

The TRACE_EVENT can add a bit of code, so use DECLARE_TRACE_CLASS when
possible and it will save on the size overhead.

-- Steve

> +#endif /* _TRACE_FILEMAP_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> Index: linux-2.6-tip/mm/filemap.c
> ===================================================================
> --- linux-2.6-tip.orig/mm/filemap.c
> +++ linux-2.6-tip/mm/filemap.c
> @@ -34,6 +34,8 @@
>  #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
>  #include <linux/memcontrol.h>
>  #include <linux/mm_inline.h> /* for page_is_file_cache() */
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/filemap.h>
>  #include "internal.h"
>  
>  /*
> @@ -149,6 +151,7 @@ void remove_from_page_cache(struct page 
>  	spin_lock_irq(&mapping->tree_lock);
>  	__remove_from_page_cache(page);
>  	spin_unlock_irq(&mapping->tree_lock);
> +	trace_remove_from_page_cache(mapping, page->index);
>  	mem_cgroup_uncharge_cache_page(page);
>  }
>  
> @@ -419,6 +422,7 @@ int add_to_page_cache_locked(struct page
>  			if (PageSwapBacked(page))
>  				__inc_zone_page_state(page, NR_SHMEM);
>  			spin_unlock_irq(&mapping->tree_lock);
> +			trace_add_to_page_cache(mapping, offset);
>  		} else {
>  			page->mapping = NULL;
>  			spin_unlock_irq(&mapping->tree_lock);
> @@ -642,6 +646,7 @@ repeat:
>  	}
>  	rcu_read_unlock();
>  
> +	trace_find_get_page(mapping, offset, page);
>  	return page;
>  }
>  EXPORT_SYMBOL(find_get_page);
> Index: linux-2.6-tip/mm/truncate.c
> ===================================================================
> --- linux-2.6-tip.orig/mm/truncate.c
> +++ linux-2.6-tip/mm/truncate.c
> @@ -20,6 +20,7 @@
>  				   do_invalidatepage */
>  #include "internal.h"
>  
> +#include <trace/events/filemap.h>
>  
>  /**
>   * do_invalidatepage - invalidate part or all of a page
> @@ -388,6 +389,7 @@ invalidate_complete_page2(struct address
>  	BUG_ON(page_has_private(page));
>  	__remove_from_page_cache(page);
>  	spin_unlock_irq(&mapping->tree_lock);
> +	trace_remove_from_page_cache(mapping, page->index);
>  	mem_cgroup_uncharge_cache_page(page);
>  	page_cache_release(page);	/* pagecache ref */
>  	return 1;
> Index: linux-2.6-tip/mm/vmscan.c
> ===================================================================
> --- linux-2.6-tip.orig/mm/vmscan.c
> +++ linux-2.6-tip/mm/vmscan.c
> @@ -48,6 +48,8 @@
>  
>  #include "internal.h"
>  
> +#include <trace/events/filemap.h>
> +
>  struct scan_control {
>  	/* Incremented by the number of inactive pages that were scanned */
>  	unsigned long nr_scanned;
> @@ -477,6 +479,7 @@ static int __remove_mapping(struct addre
>  	} else {
>  		__remove_from_page_cache(page);
>  		spin_unlock_irq(&mapping->tree_lock);
> +		trace_remove_from_page_cache(mapping, page->index);
>  		mem_cgroup_uncharge_cache_page(page);
>  	}
>  
> 
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 1/2 v2] add tracepoints for pagecache
@ 2010-01-23  2:28     ` Steven Rostedt
  0 siblings, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2010-01-23  2:28 UTC (permalink / raw)
  To: Keiichi KII
  Cc: linux-kernel, lwoodman, linux-mm, mingo, tzanussi, riel, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

On Fri, 2010-01-22 at 19:07 -0500, Keiichi KII wrote:
> This patch adds several tracepoints to track pagecach behavior.
> These trecepoints would help us monitor pagecache usage with high resolution.
> 
> Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
> Cc: Atsushi Tsuji <a-tsuji@bk.jp.nec.com> 
> ---
>  include/trace/events/filemap.h |   83 +++++++++++++++++++++++++++++++++++++++++
>  mm/filemap.c                   |    5 ++
>  mm/truncate.c                  |    2 
>  mm/vmscan.c                    |    3 +
>  4 files changed, 93 insertions(+)
> 
> Index: linux-2.6-tip/include/trace/events/filemap.h
> ===================================================================
> --- /dev/null
> +++ linux-2.6-tip/include/trace/events/filemap.h
> @@ -0,0 +1,83 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM filemap
> +
> +#if !defined(_TRACE_FILEMAP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_FILEMAP_H
> +
> +#include <linux/fs.h>
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(find_get_page,
> +
> +	TP_PROTO(struct address_space *mapping, pgoff_t offset,
> +		struct page *page),
> +
> +	TP_ARGS(mapping, offset, page),
> +
> +	TP_STRUCT__entry(
> +		__field(dev_t, s_dev)
> +		__field(ino_t, i_ino)
> +		__field(pgoff_t, offset)
> +		__field(struct page *, page)
> +		),
> +
> +	TP_fast_assign(
> +		__entry->s_dev = mapping->host ? mapping->host->i_sb->s_dev : 0;
> +		__entry->i_ino = mapping->host ? mapping->host->i_ino : 0;
> +		__entry->offset = offset;
> +		__entry->page = page;
> +		),
> +
> +	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu %s", MAJOR(__entry->s_dev),
> +		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset,
> +		__entry->page == NULL ? "page_not_found" : "page_found")
> +);
> +
> +TRACE_EVENT(add_to_page_cache,
> +
> +	TP_PROTO(struct address_space *mapping, pgoff_t offset),
> +
> +	TP_ARGS(mapping, offset),
> +
> +	TP_STRUCT__entry(
> +		__field(dev_t, s_dev)
> +		__field(ino_t, i_ino)
> +		__field(pgoff_t, offset)
> +		),
> +
> +	TP_fast_assign(
> +		__entry->s_dev = mapping->host->i_sb->s_dev;
> +		__entry->i_ino = mapping->host->i_ino;
> +		__entry->offset = offset;
> +		),
> +
> +	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu", MAJOR(__entry->s_dev),
> +		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset)
> +);
> +
> +TRACE_EVENT(remove_from_page_cache,
> +
> +	TP_PROTO(struct address_space *mapping, pgoff_t offset),
> +
> +	TP_ARGS(mapping, offset),
> +
> +	TP_STRUCT__entry(
> +		__field(dev_t, s_dev)
> +		__field(ino_t, i_ino)
> +		__field(pgoff_t, offset)
> +		),
> +
> +	TP_fast_assign(
> +		__entry->s_dev = mapping->host->i_sb->s_dev;
> +		__entry->i_ino = mapping->host->i_ino;
> +		__entry->offset = offset;
> +		),
> +
> +	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu", MAJOR(__entry->s_dev),
> +		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset)
> +);
> +

The above qualify in converting to templates or DECLACE_TRACE_CLASS, and
DEFINE_EVENT, That is, rename the above TRACE_EVENT into
DECLARE_TRACE_CLASS, and then have the other one be a DEFINE_EVENT().
See the trace/event/sched.h for examples.

The TRACE_EVENT can add a bit of code, so use DECLARE_TRACE_CLASS when
possible and it will save on the size overhead.

-- Steve

> +#endif /* _TRACE_FILEMAP_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> Index: linux-2.6-tip/mm/filemap.c
> ===================================================================
> --- linux-2.6-tip.orig/mm/filemap.c
> +++ linux-2.6-tip/mm/filemap.c
> @@ -34,6 +34,8 @@
>  #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
>  #include <linux/memcontrol.h>
>  #include <linux/mm_inline.h> /* for page_is_file_cache() */
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/filemap.h>
>  #include "internal.h"
>  
>  /*
> @@ -149,6 +151,7 @@ void remove_from_page_cache(struct page 
>  	spin_lock_irq(&mapping->tree_lock);
>  	__remove_from_page_cache(page);
>  	spin_unlock_irq(&mapping->tree_lock);
> +	trace_remove_from_page_cache(mapping, page->index);
>  	mem_cgroup_uncharge_cache_page(page);
>  }
>  
> @@ -419,6 +422,7 @@ int add_to_page_cache_locked(struct page
>  			if (PageSwapBacked(page))
>  				__inc_zone_page_state(page, NR_SHMEM);
>  			spin_unlock_irq(&mapping->tree_lock);
> +			trace_add_to_page_cache(mapping, offset);
>  		} else {
>  			page->mapping = NULL;
>  			spin_unlock_irq(&mapping->tree_lock);
> @@ -642,6 +646,7 @@ repeat:
>  	}
>  	rcu_read_unlock();
>  
> +	trace_find_get_page(mapping, offset, page);
>  	return page;
>  }
>  EXPORT_SYMBOL(find_get_page);
> Index: linux-2.6-tip/mm/truncate.c
> ===================================================================
> --- linux-2.6-tip.orig/mm/truncate.c
> +++ linux-2.6-tip/mm/truncate.c
> @@ -20,6 +20,7 @@
>  				   do_invalidatepage */
>  #include "internal.h"
>  
> +#include <trace/events/filemap.h>
>  
>  /**
>   * do_invalidatepage - invalidate part or all of a page
> @@ -388,6 +389,7 @@ invalidate_complete_page2(struct address
>  	BUG_ON(page_has_private(page));
>  	__remove_from_page_cache(page);
>  	spin_unlock_irq(&mapping->tree_lock);
> +	trace_remove_from_page_cache(mapping, page->index);
>  	mem_cgroup_uncharge_cache_page(page);
>  	page_cache_release(page);	/* pagecache ref */
>  	return 1;
> Index: linux-2.6-tip/mm/vmscan.c
> ===================================================================
> --- linux-2.6-tip.orig/mm/vmscan.c
> +++ linux-2.6-tip/mm/vmscan.c
> @@ -48,6 +48,8 @@
>  
>  #include "internal.h"
>  
> +#include <trace/events/filemap.h>
> +
>  struct scan_control {
>  	/* Incremented by the number of inactive pages that were scanned */
>  	unsigned long nr_scanned;
> @@ -477,6 +479,7 @@ static int __remove_mapping(struct addre
>  	} else {
>  		__remove_from_page_cache(page);
>  		spin_unlock_irq(&mapping->tree_lock);
> +		trace_remove_from_page_cache(mapping, page->index);
>  		mem_cgroup_uncharge_cache_page(page);
>  	}
>  
> 
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
  2010-01-23  0:08   ` Keiichi KII
@ 2010-01-23  8:21     ` Tom Zanussi
  -1 siblings, 0 replies; 22+ messages in thread
From: Tom Zanussi @ 2010-01-23  8:21 UTC (permalink / raw)
  To: Keiichi KII
  Cc: linux-kernel, lwoodman, linux-mm, mingo, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

Hi,

On Fri, 2010-01-22 at 19:08 -0500, Keiichi KII wrote:
> The scripts are implemented based on the trace stream scripting support.
> And the scripts implement the following.
>  - how many pagecaches each process has per each file
>  - how many pages are cached per each file
>  - how many pagecaches each process shares
> 

Nice, it looks like a very useful script - I gave it a quick try and it
seems to work well...

The only problem I see, nothing to do with your script and nothing you
can do anything about at the moment, is that the record step generates a
huge amount of data, which of course makes the event processing take
awhile.  A lot of it appears to be due to perf itself - being able to
filter out the perf-generated events in the kernel would make a big
difference, I think; you normally don't want to see those anyway...

BTW, I see that you did your first version in Python - not that you'd
want to redo it again, but just FYI I now have working Python support
that I'll be posting soon - I still have some small details to hammer
out, but if you have any other scripts in the pipeline, in a couple days
you'll be able to use Python instead if you want.

A few more small comments below...

> To monitor pagecache usage per a process, run "pagecache-usage-record" to
> record perf data for "pagecache-usage.pl" and run "pagecache-usage-report"
> to display.

Another way of course would be to use 'perf trace record/report' and the
script name as shown by perf trace -l:

$ perf trace record pagecache-usage
$ perf trace report pagecache-usage

> 
> The below outputs show execution sample.
> 
> [file list]
>         device      inode   caches
>   --------------------------------
>          253:0    1051413      130
>          253:0    1051399        2
>          253:0    1051414       44
>          253:0    1051417      154
> 
> [process list]
> o postmaster-2330
>                             cached    added  removed      indirect
>         device      inode    pages    pages    pages removed pages
>   ----------------------------------------------------------------
>          253:0    1051399        0        2        0             0
>          253:0    1051417      154        0        0             0
>          253:0    1051413      130        0        0             0
>          253:0    1051414       44        0        0             0
>   ----------------------------------------------------------------
>   total:                       337        2        0             0
> 
> >From the output, we can know some information like:
> 
> - if "added pages" > "cached pages" on process list then
>     It means repeating add/remove pagecache many times.
>   => Bad case for pagecache usage
> 
> - if "added pages" <= "cached pages" on process list then
>     It means no unnecessary I/O operations.
>   => Good case for pagecache usage.
> 
> - if "caches" on file list > 
>          sum "cached pages" per each file on process list then
>     It means there are unneccessary pagecaches in the memory. 
>   => Bad case for pagecache usage
> 
> Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
> Cc: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
> ---
>  tools/perf/scripts/perl/bin/pagecache-usage-record |    7 
>  tools/perf/scripts/perl/bin/pagecache-usage-report |    6 
>  tools/perf/scripts/perl/pagecache-usage.pl         |  160 +++++++++++++++++++++
>  3 files changed, 173 insertions(+)
> 
> Index: linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-record
> ===================================================================
> --- /dev/null
> +++ linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-record
> @@ -0,0 +1,7 @@
> +#!/bin/bash
> +perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache
> +
> +
> +
> +
> +
> Index: linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-report
> ===================================================================
> --- /dev/null
> +++ linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-report
> @@ -0,0 +1,6 @@
> +#!/bin/bash
> +# description: pagecache usage per process
> +perf trace -s ~/libexec/perf-core/scripts/perl/pagecache-usage.pl
> +
> +
> +
> Index: linux-2.6-tip/tools/perf/scripts/perl/pagecache-usage.pl
> ===================================================================
> --- /dev/null
> +++ linux-2.6-tip/tools/perf/scripts/perl/pagecache-usage.pl
> @@ -0,0 +1,160 @@
> +# perf trace event handlers, generated by perf trace -g perl

You might want to get rid of this and add a short description and your
name, if you want to take credit for it. ;-)

> +# Licensed under the terms of the GNU GPL License version 2
> +

> +# The common_* event handler fields are the most useful fields common to
> +# all events.  They don't necessarily correspond to the 'common_*' fields
> +# in the format files.  Those fields not available as handler params can
> +# be retrieved using Perl functions of the form common_*($context).
> +# See Context.pm for the list of available functions.
> +

You can get rid of this part too - it's just meant to be helpful
information generated when starting a script.

> +use lib "$ENV{'PERF_EXEC_PATH'}/scripts/perl/Perf-Trace-Util/lib";
> +use lib "./Perf-Trace-Util/lib";
> +use Perf::Trace::Core;
> +use Perf::Trace::Context;
> +use Perf::Trace::Util;
> +use List::Util qw/sum/;
> +my %files;
> +my %processes;
> +my %records;
> +
> +sub trace_end
> +{
> +	print_pagecache_usage_per_file();
> +	print "\n";
> +	print_pagecache_usage_per_process();
> +	print_unhandled();
> +}
> +
> +sub filemap::remove_from_page_cache
> +{
> +	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
> +	    $common_pid, $common_comm,
> +	    $s_dev, $i_ino, $offset) = @_;
> +	my $f = \%{$files{$s_dev}{$i_ino}};
> +	my $r = \%{$records{$common_comm."-".$common_pid}{$f}};
> +
> +	delete $$f{$offset};
> +	$$r{inode} = $i_ino;
> +	$$r{dev} = $s_dev;
> +	if (exists $$r{added}{$offset}) {
> +	    $$r{removed}++;
> +	} else {
> +	    $$r{indirect_removed}++;
> +	}
> +}
> +
> +sub filemap::add_to_page_cache
> +{
> +	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
> +	    $common_pid, $common_comm,
> +	    $s_dev, $i_ino, $offset) = @_;
> +	my $f = \%{$files{$s_dev}{$i_ino}};
> +	my $r = \%{$records{$common_comm."-".$common_pid}{$f}};
> +
> +	$$f{$offset}++;
> +	$$r{added}{$offset}++;
> +	$$r{inode} = $i_ino;
> +	$$r{dev} = $s_dev;
> +}
> +
> +sub filemap::find_get_page
> +{
> +	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
> +	    $common_pid, $common_comm,
> +	    $s_dev, $i_ino, $offset, $page) = @_;
> +	my $f = \%{$files{$s_dev}{$i_ino}};
> +	my $r = \%{$records{$common_comm."-".$common_pid}{$f}};
> +
> +	if ($page != 0) {
> +	    $$f{$offset}++;
> +	    $$r{cached}++;
> +	    $$r{inode} = $i_ino;
> +	    $$r{dev} = $s_dev;
> +	}
> +}
> +
> +my %unhandled;
> +
> +sub trace_unhandled
> +{
> +	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
> +	    $common_pid, $common_comm) = @_;
> +
> +	$unhandled{$event_name}++;
> +}
> +
> +sub print_unhandled
> +{
> +	if ((scalar keys %unhandled) == 0) {
> +	    print "unhandled events nothing\n";

This is kind of distracting - it's not too useful to know that you don't
have unhandled events, but if you do have some, it is useful to print
those as you do below - it points out that some event type are being
unnecessarily recorded or the script is being run on the wrong trace
data.

Thanks,

Tom

> +	    return;
> +	}
> +
> +	print "\nunhandled events:\n\n";
> +
> +	printf("%-40s  %10s\n", "event", "count");
> +	printf("%-40s  %10s\n", "----------------------------------------",
> +	       "-----------");
> +
> +	foreach my $event_name (keys %unhandled) {
> +	    printf("%-40s  %10d\n", $event_name, $unhandled{$event_name});
> +	}
> +}
> +
> +sub minor
> +{
> +	my $dev = shift;
> +	return $dev & ((1 << 20) - 1);
> +}
> +
> +sub major
> +{
> +	my $dev = shift;
> +	return $dev >> 20;
> +}
> +
> +sub print_pagecache_usage_per_file
> +{
> +	print "[file list]\n";
> +	printf("  %12s %10s %8s\n", "", "", "cached");
> +	printf("  %12s %10s %8s\n", "device", "inode", "pages");
> +	printf("  %s\n", '-' x 32);
> +	while(my($dev, $file) = each(%files)) {
> +	    while(my($inode, $r) = each(%$file)) {
> +		my $count = values %$r;
> +		next if $count == 0;
> +		printf("  %12s %10d %8d\n",
> +		       major($dev).":".minor($dev), $inode, $count);
> +	    }
> +	}
> +}
> +
> +sub print_pagecache_usage_per_process
> +{
> +	print "[process list]\n";
> +	while(my ($pid, $v) = each(%records)) {
> +	    my ($sum_cached, $sum_added, $sum_removed, $sum_indirect_removed);
> +
> +	    print "o $pid\n";
> +	    printf("  %12s %10s %8s %8s %8s %13s\n", "", "",
> +		   "cached", "added", "removed", "indirect");
> +	    printf("  %12s %10s %8s %8s %8s %13s\n", "device", "inode",
> +		   "pages", "pages", "pages", "removed pages");
> +	    printf("  %s\n", '-' x 64);
> +	    while(my ($file, $r) = each(%$v)) {
> +		my $added_num = List::Util::sum(values %{$$r{added}});
> +		$sum_cached += $$r{cached};
> +		$sum_added += $added_num;
> +		$sum_removed += $$r{removed};
> +		$sum_indirect_removed += $$r{indirect_removed};
> +		printf("  %12s %10d %8d %8d %8d %13d\n",
> +		       major($$r{dev}).":".minor($$r{dev}), $$r{inode},
> +		       $$r{cached}, $added_num, $$r{removed},
> +		       $$r{indirect_removed});
> +	    }
> +	    printf("  %s\n", '-' x 64);
> +	    printf("  total: %5s %10s %8d %8d %8d %13d\n", "", "", $sum_cached,
> +		   $sum_added, $sum_removed, $sum_indirect_removed);
> +	    print "\n";
> +	}
> +}
> 
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
@ 2010-01-23  8:21     ` Tom Zanussi
  0 siblings, 0 replies; 22+ messages in thread
From: Tom Zanussi @ 2010-01-23  8:21 UTC (permalink / raw)
  To: Keiichi KII
  Cc: linux-kernel, lwoodman, linux-mm, mingo, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

Hi,

On Fri, 2010-01-22 at 19:08 -0500, Keiichi KII wrote:
> The scripts are implemented based on the trace stream scripting support.
> And the scripts implement the following.
>  - how many pagecaches each process has per each file
>  - how many pages are cached per each file
>  - how many pagecaches each process shares
> 

Nice, it looks like a very useful script - I gave it a quick try and it
seems to work well...

The only problem I see, nothing to do with your script and nothing you
can do anything about at the moment, is that the record step generates a
huge amount of data, which of course makes the event processing take
awhile.  A lot of it appears to be due to perf itself - being able to
filter out the perf-generated events in the kernel would make a big
difference, I think; you normally don't want to see those anyway...

BTW, I see that you did your first version in Python - not that you'd
want to redo it again, but just FYI I now have working Python support
that I'll be posting soon - I still have some small details to hammer
out, but if you have any other scripts in the pipeline, in a couple days
you'll be able to use Python instead if you want.

A few more small comments below...

> To monitor pagecache usage per a process, run "pagecache-usage-record" to
> record perf data for "pagecache-usage.pl" and run "pagecache-usage-report"
> to display.

Another way of course would be to use 'perf trace record/report' and the
script name as shown by perf trace -l:

$ perf trace record pagecache-usage
$ perf trace report pagecache-usage

> 
> The below outputs show execution sample.
> 
> [file list]
>         device      inode   caches
>   --------------------------------
>          253:0    1051413      130
>          253:0    1051399        2
>          253:0    1051414       44
>          253:0    1051417      154
> 
> [process list]
> o postmaster-2330
>                             cached    added  removed      indirect
>         device      inode    pages    pages    pages removed pages
>   ----------------------------------------------------------------
>          253:0    1051399        0        2        0             0
>          253:0    1051417      154        0        0             0
>          253:0    1051413      130        0        0             0
>          253:0    1051414       44        0        0             0
>   ----------------------------------------------------------------
>   total:                       337        2        0             0
> 
> >From the output, we can know some information like:
> 
> - if "added pages" > "cached pages" on process list then
>     It means repeating add/remove pagecache many times.
>   => Bad case for pagecache usage
> 
> - if "added pages" <= "cached pages" on process list then
>     It means no unnecessary I/O operations.
>   => Good case for pagecache usage.
> 
> - if "caches" on file list > 
>          sum "cached pages" per each file on process list then
>     It means there are unneccessary pagecaches in the memory. 
>   => Bad case for pagecache usage
> 
> Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
> Cc: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
> ---
>  tools/perf/scripts/perl/bin/pagecache-usage-record |    7 
>  tools/perf/scripts/perl/bin/pagecache-usage-report |    6 
>  tools/perf/scripts/perl/pagecache-usage.pl         |  160 +++++++++++++++++++++
>  3 files changed, 173 insertions(+)
> 
> Index: linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-record
> ===================================================================
> --- /dev/null
> +++ linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-record
> @@ -0,0 +1,7 @@
> +#!/bin/bash
> +perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache
> +
> +
> +
> +
> +
> Index: linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-report
> ===================================================================
> --- /dev/null
> +++ linux-2.6-tip/tools/perf/scripts/perl/bin/pagecache-usage-report
> @@ -0,0 +1,6 @@
> +#!/bin/bash
> +# description: pagecache usage per process
> +perf trace -s ~/libexec/perf-core/scripts/perl/pagecache-usage.pl
> +
> +
> +
> Index: linux-2.6-tip/tools/perf/scripts/perl/pagecache-usage.pl
> ===================================================================
> --- /dev/null
> +++ linux-2.6-tip/tools/perf/scripts/perl/pagecache-usage.pl
> @@ -0,0 +1,160 @@
> +# perf trace event handlers, generated by perf trace -g perl

You might want to get rid of this and add a short description and your
name, if you want to take credit for it. ;-)

> +# Licensed under the terms of the GNU GPL License version 2
> +

> +# The common_* event handler fields are the most useful fields common to
> +# all events.  They don't necessarily correspond to the 'common_*' fields
> +# in the format files.  Those fields not available as handler params can
> +# be retrieved using Perl functions of the form common_*($context).
> +# See Context.pm for the list of available functions.
> +

You can get rid of this part too - it's just meant to be helpful
information generated when starting a script.

> +use lib "$ENV{'PERF_EXEC_PATH'}/scripts/perl/Perf-Trace-Util/lib";
> +use lib "./Perf-Trace-Util/lib";
> +use Perf::Trace::Core;
> +use Perf::Trace::Context;
> +use Perf::Trace::Util;
> +use List::Util qw/sum/;
> +my %files;
> +my %processes;
> +my %records;
> +
> +sub trace_end
> +{
> +	print_pagecache_usage_per_file();
> +	print "\n";
> +	print_pagecache_usage_per_process();
> +	print_unhandled();
> +}
> +
> +sub filemap::remove_from_page_cache
> +{
> +	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
> +	    $common_pid, $common_comm,
> +	    $s_dev, $i_ino, $offset) = @_;
> +	my $f = \%{$files{$s_dev}{$i_ino}};
> +	my $r = \%{$records{$common_comm."-".$common_pid}{$f}};
> +
> +	delete $$f{$offset};
> +	$$r{inode} = $i_ino;
> +	$$r{dev} = $s_dev;
> +	if (exists $$r{added}{$offset}) {
> +	    $$r{removed}++;
> +	} else {
> +	    $$r{indirect_removed}++;
> +	}
> +}
> +
> +sub filemap::add_to_page_cache
> +{
> +	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
> +	    $common_pid, $common_comm,
> +	    $s_dev, $i_ino, $offset) = @_;
> +	my $f = \%{$files{$s_dev}{$i_ino}};
> +	my $r = \%{$records{$common_comm."-".$common_pid}{$f}};
> +
> +	$$f{$offset}++;
> +	$$r{added}{$offset}++;
> +	$$r{inode} = $i_ino;
> +	$$r{dev} = $s_dev;
> +}
> +
> +sub filemap::find_get_page
> +{
> +	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
> +	    $common_pid, $common_comm,
> +	    $s_dev, $i_ino, $offset, $page) = @_;
> +	my $f = \%{$files{$s_dev}{$i_ino}};
> +	my $r = \%{$records{$common_comm."-".$common_pid}{$f}};
> +
> +	if ($page != 0) {
> +	    $$f{$offset}++;
> +	    $$r{cached}++;
> +	    $$r{inode} = $i_ino;
> +	    $$r{dev} = $s_dev;
> +	}
> +}
> +
> +my %unhandled;
> +
> +sub trace_unhandled
> +{
> +	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
> +	    $common_pid, $common_comm) = @_;
> +
> +	$unhandled{$event_name}++;
> +}
> +
> +sub print_unhandled
> +{
> +	if ((scalar keys %unhandled) == 0) {
> +	    print "unhandled events nothing\n";

This is kind of distracting - it's not too useful to know that you don't
have unhandled events, but if you do have some, it is useful to print
those as you do below - it points out that some event type are being
unnecessarily recorded or the script is being run on the wrong trace
data.

Thanks,

Tom

> +	    return;
> +	}
> +
> +	print "\nunhandled events:\n\n";
> +
> +	printf("%-40s  %10s\n", "event", "count");
> +	printf("%-40s  %10s\n", "----------------------------------------",
> +	       "-----------");
> +
> +	foreach my $event_name (keys %unhandled) {
> +	    printf("%-40s  %10d\n", $event_name, $unhandled{$event_name});
> +	}
> +}
> +
> +sub minor
> +{
> +	my $dev = shift;
> +	return $dev & ((1 << 20) - 1);
> +}
> +
> +sub major
> +{
> +	my $dev = shift;
> +	return $dev >> 20;
> +}
> +
> +sub print_pagecache_usage_per_file
> +{
> +	print "[file list]\n";
> +	printf("  %12s %10s %8s\n", "", "", "cached");
> +	printf("  %12s %10s %8s\n", "device", "inode", "pages");
> +	printf("  %s\n", '-' x 32);
> +	while(my($dev, $file) = each(%files)) {
> +	    while(my($inode, $r) = each(%$file)) {
> +		my $count = values %$r;
> +		next if $count == 0;
> +		printf("  %12s %10d %8d\n",
> +		       major($dev).":".minor($dev), $inode, $count);
> +	    }
> +	}
> +}
> +
> +sub print_pagecache_usage_per_process
> +{
> +	print "[process list]\n";
> +	while(my ($pid, $v) = each(%records)) {
> +	    my ($sum_cached, $sum_added, $sum_removed, $sum_indirect_removed);
> +
> +	    print "o $pid\n";
> +	    printf("  %12s %10s %8s %8s %8s %13s\n", "", "",
> +		   "cached", "added", "removed", "indirect");
> +	    printf("  %12s %10s %8s %8s %8s %13s\n", "device", "inode",
> +		   "pages", "pages", "pages", "removed pages");
> +	    printf("  %s\n", '-' x 64);
> +	    while(my ($file, $r) = each(%$v)) {
> +		my $added_num = List::Util::sum(values %{$$r{added}});
> +		$sum_cached += $$r{cached};
> +		$sum_added += $added_num;
> +		$sum_removed += $$r{removed};
> +		$sum_indirect_removed += $$r{indirect_removed};
> +		printf("  %12s %10d %8d %8d %8d %13d\n",
> +		       major($$r{dev}).":".minor($$r{dev}), $$r{inode},
> +		       $$r{cached}, $added_num, $$r{removed},
> +		       $$r{indirect_removed});
> +	    }
> +	    printf("  %s\n", '-' x 64);
> +	    printf("  total: %5s %10s %8d %8d %8d %13d\n", "", "", $sum_cached,
> +		   $sum_added, $sum_removed, $sum_indirect_removed);
> +	    print "\n";
> +	}
> +}
> 
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
  2010-01-23  8:21     ` Tom Zanussi
@ 2010-01-25 22:16       ` Keiichi KII
  -1 siblings, 0 replies; 22+ messages in thread
From: Keiichi KII @ 2010-01-25 22:16 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: linux-kernel, lwoodman, linux-mm, mingo, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

(2010年01月23日 03:21), Tom Zanussi wrote:
> Hi,
> 
> On Fri, 2010-01-22 at 19:08 -0500, Keiichi KII wrote:
>> The scripts are implemented based on the trace stream scripting support.
>> And the scripts implement the following.
>>  - how many pagecaches each process has per each file
>>  - how many pages are cached per each file
>>  - how many pagecaches each process shares
>>
> 
> Nice, it looks like a very useful script - I gave it a quick try and it
> seems to work well...
> 
> The only problem I see, nothing to do with your script and nothing you
> can do anything about at the moment, is that the record step generates a
> huge amount of data, which of course makes the event processing take
> awhile.  A lot of it appears to be due to perf itself - being able to
> filter out the perf-generated events in the kernel would make a big
> difference, I think; you normally don't want to see those anyway...

Yes, right. I don't want to process the data of perf itself.
I will try to find any way to solve this problem. 

> BTW, I see that you did your first version in Python - not that you'd
> want to redo it again, but just FYI I now have working Python support
> that I'll be posting soon - I still have some small details to hammer
> out, but if you have any other scripts in the pipeline, in a couple days
> you'll be able to use Python instead if you want.

It will help me create some scripts. I will use Python support if it is posted.

>> To monitor pagecache usage per a process, run "pagecache-usage-record" to
>> record perf data for "pagecache-usage.pl" and run "pagecache-usage-report"
>> to display.
> 
> Another way of course would be to use 'perf trace record/report' and the
> script name as shown by perf trace -l:
> 
> $ perf trace record pagecache-usage
> $ perf trace report pagecache-usage

Thank you for your information.
I will use this way.

>> Index: linux-2.6-tip/tools/perf/scripts/perl/pagecache-usage.pl
>> ===================================================================
>> --- /dev/null
>> +++ linux-2.6-tip/tools/perf/scripts/perl/pagecache-usage.pl
>> @@ -0,0 +1,160 @@
>> +# perf trace event handlers, generated by perf trace -g perl
> 
> You might want to get rid of this and add a short description and your
> name, if you want to take credit for it. ;-)
> 
>> +# Licensed under the terms of the GNU GPL License version 2
>> +
> 
>> +# The common_* event handler fields are the most useful fields common to
>> +# all events.  They don't necessarily correspond to the 'common_*' fields
>> +# in the format files.  Those fields not available as handler params can
>> +# be retrieved using Perl functions of the form common_*($context).
>> +# See Context.pm for the list of available functions.
>> +
> 
> You can get rid of this part too - it's just meant to be helpful
> information generated when starting a script.
> 

I'll remove unnecessary comments and add a short description next time.

>> +my %unhandled;
>> +
>> +sub trace_unhandled
>> +{
>> +	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
>> +	    $common_pid, $common_comm) = @_;
>> +
>> +	$unhandled{$event_name}++;
>> +}
>> +
>> +sub print_unhandled
>> +{
>> +	if ((scalar keys %unhandled) == 0) {
>> +	    print "unhandled events nothing\n";
> 
> This is kind of distracting - it's not too useful to know that you don't
> have unhandled events, but if you do have some, it is useful to print
> those as you do below - it points out that some event type are being
> unnecessarily recorded or the script is being run on the wrong trace
> data.

I don't have unhandled events to monitor pagecache usage as you say.
So I will remove these code.

Thanks,
Keiichi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
@ 2010-01-25 22:16       ` Keiichi KII
  0 siblings, 0 replies; 22+ messages in thread
From: Keiichi KII @ 2010-01-25 22:16 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: linux-kernel, lwoodman, linux-mm, mingo, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

(2010a1'01ae??23ae?JPY 03:21), Tom Zanussi wrote:
> Hi,
> 
> On Fri, 2010-01-22 at 19:08 -0500, Keiichi KII wrote:
>> The scripts are implemented based on the trace stream scripting support.
>> And the scripts implement the following.
>>  - how many pagecaches each process has per each file
>>  - how many pages are cached per each file
>>  - how many pagecaches each process shares
>>
> 
> Nice, it looks like a very useful script - I gave it a quick try and it
> seems to work well...
> 
> The only problem I see, nothing to do with your script and nothing you
> can do anything about at the moment, is that the record step generates a
> huge amount of data, which of course makes the event processing take
> awhile.  A lot of it appears to be due to perf itself - being able to
> filter out the perf-generated events in the kernel would make a big
> difference, I think; you normally don't want to see those anyway...

Yes, right. I don't want to process the data of perf itself.
I will try to find any way to solve this problem. 

> BTW, I see that you did your first version in Python - not that you'd
> want to redo it again, but just FYI I now have working Python support
> that I'll be posting soon - I still have some small details to hammer
> out, but if you have any other scripts in the pipeline, in a couple days
> you'll be able to use Python instead if you want.

It will help me create some scripts. I will use Python support if it is posted.

>> To monitor pagecache usage per a process, run "pagecache-usage-record" to
>> record perf data for "pagecache-usage.pl" and run "pagecache-usage-report"
>> to display.
> 
> Another way of course would be to use 'perf trace record/report' and the
> script name as shown by perf trace -l:
> 
> $ perf trace record pagecache-usage
> $ perf trace report pagecache-usage

Thank you for your information.
I will use this way.

>> Index: linux-2.6-tip/tools/perf/scripts/perl/pagecache-usage.pl
>> ===================================================================
>> --- /dev/null
>> +++ linux-2.6-tip/tools/perf/scripts/perl/pagecache-usage.pl
>> @@ -0,0 +1,160 @@
>> +# perf trace event handlers, generated by perf trace -g perl
> 
> You might want to get rid of this and add a short description and your
> name, if you want to take credit for it. ;-)
> 
>> +# Licensed under the terms of the GNU GPL License version 2
>> +
> 
>> +# The common_* event handler fields are the most useful fields common to
>> +# all events.  They don't necessarily correspond to the 'common_*' fields
>> +# in the format files.  Those fields not available as handler params can
>> +# be retrieved using Perl functions of the form common_*($context).
>> +# See Context.pm for the list of available functions.
>> +
> 
> You can get rid of this part too - it's just meant to be helpful
> information generated when starting a script.
> 

I'll remove unnecessary comments and add a short description next time.

>> +my %unhandled;
>> +
>> +sub trace_unhandled
>> +{
>> +	my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
>> +	    $common_pid, $common_comm) = @_;
>> +
>> +	$unhandled{$event_name}++;
>> +}
>> +
>> +sub print_unhandled
>> +{
>> +	if ((scalar keys %unhandled) == 0) {
>> +	    print "unhandled events nothing\n";
> 
> This is kind of distracting - it's not too useful to know that you don't
> have unhandled events, but if you do have some, it is useful to print
> those as you do below - it points out that some event type are being
> unnecessarily recorded or the script is being run on the wrong trace
> data.

I don't have unhandled events to monitor pagecache usage as you say.
So I will remove these code.

Thanks,
Keiichi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 1/2 v2] add tracepoints for pagecache
  2010-01-23  2:28     ` Steven Rostedt
@ 2010-01-25 22:17       ` Keiichi KII
  -1 siblings, 0 replies; 22+ messages in thread
From: Keiichi KII @ 2010-01-25 22:17 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, lwoodman, linux-mm, mingo, tzanussi, riel, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

(2010年01月22日 21:28), Steven Rostedt wrote:
>> +TRACE_EVENT(remove_from_page_cache,
>> +
>> +	TP_PROTO(struct address_space *mapping, pgoff_t offset),
>> +
>> +	TP_ARGS(mapping, offset),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(dev_t, s_dev)
>> +		__field(ino_t, i_ino)
>> +		__field(pgoff_t, offset)
>> +		),
>> +
>> +	TP_fast_assign(
>> +		__entry->s_dev = mapping->host->i_sb->s_dev;
>> +		__entry->i_ino = mapping->host->i_ino;
>> +		__entry->offset = offset;
>> +		),
>> +
>> +	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu", MAJOR(__entry->s_dev),
>> +		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset)
>> +);
>> +
> 
> The above qualify in converting to templates or DECLACE_TRACE_CLASS, and
> DEFINE_EVENT, That is, rename the above TRACE_EVENT into
> DECLARE_TRACE_CLASS, and then have the other one be a DEFINE_EVENT().
> See the trace/event/sched.h for examples.
> 
> The TRACE_EVENT can add a bit of code, so use DECLARE_TRACE_CLASS when
> possible and it will save on the size overhead.

Thank you for your information. I'll fix it next time.

Thanks,
Keiichi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 1/2 v2] add tracepoints for pagecache
@ 2010-01-25 22:17       ` Keiichi KII
  0 siblings, 0 replies; 22+ messages in thread
From: Keiichi KII @ 2010-01-25 22:17 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, lwoodman, linux-mm, mingo, tzanussi, riel, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

(2010a1'01ae??22ae?JPY 21:28), Steven Rostedt wrote:
>> +TRACE_EVENT(remove_from_page_cache,
>> +
>> +	TP_PROTO(struct address_space *mapping, pgoff_t offset),
>> +
>> +	TP_ARGS(mapping, offset),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(dev_t, s_dev)
>> +		__field(ino_t, i_ino)
>> +		__field(pgoff_t, offset)
>> +		),
>> +
>> +	TP_fast_assign(
>> +		__entry->s_dev = mapping->host->i_sb->s_dev;
>> +		__entry->i_ino = mapping->host->i_ino;
>> +		__entry->offset = offset;
>> +		),
>> +
>> +	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu", MAJOR(__entry->s_dev),
>> +		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset)
>> +);
>> +
> 
> The above qualify in converting to templates or DECLACE_TRACE_CLASS, and
> DEFINE_EVENT, That is, rename the above TRACE_EVENT into
> DECLARE_TRACE_CLASS, and then have the other one be a DEFINE_EVENT().
> See the trace/event/sched.h for examples.
> 
> The TRACE_EVENT can add a bit of code, so use DECLARE_TRACE_CLASS when
> possible and it will save on the size overhead.

Thank you for your information. I'll fix it next time.

Thanks,
Keiichi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
  2010-01-25 22:16       ` Keiichi KII
@ 2010-02-01  8:17         ` Tom Zanussi
  -1 siblings, 0 replies; 22+ messages in thread
From: Tom Zanussi @ 2010-02-01  8:17 UTC (permalink / raw)
  To: Keiichi KII
  Cc: linux-kernel, lwoodman, linux-mm, mingo, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

Hi,

On Mon, 2010-01-25 at 17:16 -0500, Keiichi KII wrote:
> (2010年01月23日 03:21), Tom Zanussi wrote:
> > Hi,
> > 
> > On Fri, 2010-01-22 at 19:08 -0500, Keiichi KII wrote:
> >> The scripts are implemented based on the trace stream scripting support.
> >> And the scripts implement the following.
> >>  - how many pagecaches each process has per each file
> >>  - how many pages are cached per each file
> >>  - how many pagecaches each process shares
> >>
> > 
> > Nice, it looks like a very useful script - I gave it a quick try and it
> > seems to work well...
> > 
> > The only problem I see, nothing to do with your script and nothing you
> > can do anything about at the moment, is that the record step generates a
> > huge amount of data, which of course makes the event processing take
> > awhile.  A lot of it appears to be due to perf itself - being able to
> > filter out the perf-generated events in the kernel would make a big
> > difference, I think; you normally don't want to see those anyway...
> 
> Yes, right. I don't want to process the data of perf itself.
> I will try to find any way to solve this problem. 
> 

Here's one way, using the tracepoint filters - it does make a big
difference in this case.

Before (using the new -P option, which includes perf in the trace
data):  

root@tropicana:~# perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache -P sleep 5
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 71.201 MB perf.data (~3110815 samples) ]

After (filters out events generated by perf):

root@tropicana:~# perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.309 MB perf.data (~13479 samples) ]

Tom

[PATCH] perf record: filter out perf process tracepoint events

The perf process itself can generate a lot of trace data, which most
of the time isn't of any interest.  This patch adds a predicate to the
kernel tracepoint filter of each recorded event type which effectively
screens out any event generated by perf.

Assuming the common case would be to ignore perf, this makes it the
default; the old behavior can be selected by using 'perf record -P'.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
---
 tools/perf/builtin-record.c    |   44 ++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/parse-events.h |    3 +-
 2 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index eea5691..5fa113a 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -49,6 +49,8 @@ static int			no_samples			=      0;
 static int			sample_address			=      0;
 static int			multiplex			=      0;
 static int			multiplex_fd			=     -1;
+static int			include_perf			=      0;
+static char			exclude_perf_pred[MAX_FILTER_STR_VAL];
 
 static long			samples				=      0;
 static struct timeval		last_read;
@@ -236,6 +238,37 @@ static struct perf_header_attr *get_header_attr(struct perf_event_attr *a, int n
 	return h_attr;
 }
 
+static char *add_exclude_perf_pred(char *old_filter)
+{
+	int len = strlen(exclude_perf_pred);
+	char *filter;
+
+	if (old_filter != NULL)
+		len = len + strlen(" && ()") + strlen(old_filter);
+
+	if (len >= MAX_FILTER_STR_VAL) {
+		fprintf(stderr, "excluding perf exceeds max filter length,"
+			" use -P instead\n");
+		exit(-1);
+	}
+
+	filter = malloc(len + 1);
+	if (!filter) {
+		fprintf(stderr, "not enough memory to hold filter string\n");
+		exit(-1);
+	}
+
+	strcpy(filter, exclude_perf_pred);
+
+	if (old_filter) {
+		strcat(filter, " && (");
+		strcat(filter, old_filter);
+		strcat(filter, ")");
+	}
+
+	return filter;
+}
+
 static void create_counter(int counter, int cpu, pid_t pid)
 {
 	char *filter = filters[counter];
@@ -375,6 +408,12 @@ try_again:
 		}
 	}
 
+	if (include_perf == 0 && cpu == 0) {
+		filters[counter] = add_exclude_perf_pred(filter);
+		free(filter);
+		filter = filters[counter];
+	}
+
 	if (filter != NULL) {
 		ret = ioctl(fd[nr_cpu][counter],
 			    PERF_EVENT_IOC_SET_FILTER, filter);
@@ -677,6 +716,8 @@ static const struct option options[] = {
 		    "don't sample"),
 	OPT_BOOLEAN('M', "multiplex", &multiplex,
 		    "multiplex counter output in a single channel"),
+	OPT_BOOLEAN('P', "include-perf", &include_perf,
+		    "include perf in trace data (normally excluded)"),
 	OPT_END()
 };
 
@@ -716,5 +757,8 @@ int cmd_record(int argc, const char **argv, const char *prefix __used)
 		attrs[counter].sample_period = default_interval;
 	}
 
+	if (include_perf == 0)
+		sprintf(exclude_perf_pred, "common_pid != %d", getpid());
+
 	return __cmd_record(argc, argv);
 }
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index b8c1f64..d43d9b0 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -25,7 +25,8 @@ extern const char *__event_name(int type, u64 config);
 extern int parse_events(const struct option *opt, const char *str, int unset);
 extern int parse_filter(const struct option *opt, const char *str, int unset);
 
-#define EVENTS_HELP_MAX (128*1024)
+#define EVENTS_HELP_MAX		(128*1024)
+#define MAX_FILTER_STR_VAL	256
 
 extern void print_events(void);
 
-- 
1.6.4.GIT




^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
@ 2010-02-01  8:17         ` Tom Zanussi
  0 siblings, 0 replies; 22+ messages in thread
From: Tom Zanussi @ 2010-02-01  8:17 UTC (permalink / raw)
  To: Keiichi KII
  Cc: linux-kernel, lwoodman, linux-mm, mingo, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

Hi,

On Mon, 2010-01-25 at 17:16 -0500, Keiichi KII wrote:
> (2010a1'01ae??23ae?JPY 03:21), Tom Zanussi wrote:
> > Hi,
> > 
> > On Fri, 2010-01-22 at 19:08 -0500, Keiichi KII wrote:
> >> The scripts are implemented based on the trace stream scripting support.
> >> And the scripts implement the following.
> >>  - how many pagecaches each process has per each file
> >>  - how many pages are cached per each file
> >>  - how many pagecaches each process shares
> >>
> > 
> > Nice, it looks like a very useful script - I gave it a quick try and it
> > seems to work well...
> > 
> > The only problem I see, nothing to do with your script and nothing you
> > can do anything about at the moment, is that the record step generates a
> > huge amount of data, which of course makes the event processing take
> > awhile.  A lot of it appears to be due to perf itself - being able to
> > filter out the perf-generated events in the kernel would make a big
> > difference, I think; you normally don't want to see those anyway...
> 
> Yes, right. I don't want to process the data of perf itself.
> I will try to find any way to solve this problem. 
> 

Here's one way, using the tracepoint filters - it does make a big
difference in this case.

Before (using the new -P option, which includes perf in the trace
data):  

root@tropicana:~# perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache -P sleep 5
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 71.201 MB perf.data (~3110815 samples) ]

After (filters out events generated by perf):

root@tropicana:~# perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.309 MB perf.data (~13479 samples) ]

Tom

[PATCH] perf record: filter out perf process tracepoint events

The perf process itself can generate a lot of trace data, which most
of the time isn't of any interest.  This patch adds a predicate to the
kernel tracepoint filter of each recorded event type which effectively
screens out any event generated by perf.

Assuming the common case would be to ignore perf, this makes it the
default; the old behavior can be selected by using 'perf record -P'.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
---
 tools/perf/builtin-record.c    |   44 ++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/parse-events.h |    3 +-
 2 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index eea5691..5fa113a 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -49,6 +49,8 @@ static int			no_samples			=      0;
 static int			sample_address			=      0;
 static int			multiplex			=      0;
 static int			multiplex_fd			=     -1;
+static int			include_perf			=      0;
+static char			exclude_perf_pred[MAX_FILTER_STR_VAL];
 
 static long			samples				=      0;
 static struct timeval		last_read;
@@ -236,6 +238,37 @@ static struct perf_header_attr *get_header_attr(struct perf_event_attr *a, int n
 	return h_attr;
 }
 
+static char *add_exclude_perf_pred(char *old_filter)
+{
+	int len = strlen(exclude_perf_pred);
+	char *filter;
+
+	if (old_filter != NULL)
+		len = len + strlen(" && ()") + strlen(old_filter);
+
+	if (len >= MAX_FILTER_STR_VAL) {
+		fprintf(stderr, "excluding perf exceeds max filter length,"
+			" use -P instead\n");
+		exit(-1);
+	}
+
+	filter = malloc(len + 1);
+	if (!filter) {
+		fprintf(stderr, "not enough memory to hold filter string\n");
+		exit(-1);
+	}
+
+	strcpy(filter, exclude_perf_pred);
+
+	if (old_filter) {
+		strcat(filter, " && (");
+		strcat(filter, old_filter);
+		strcat(filter, ")");
+	}
+
+	return filter;
+}
+
 static void create_counter(int counter, int cpu, pid_t pid)
 {
 	char *filter = filters[counter];
@@ -375,6 +408,12 @@ try_again:
 		}
 	}
 
+	if (include_perf == 0 && cpu == 0) {
+		filters[counter] = add_exclude_perf_pred(filter);
+		free(filter);
+		filter = filters[counter];
+	}
+
 	if (filter != NULL) {
 		ret = ioctl(fd[nr_cpu][counter],
 			    PERF_EVENT_IOC_SET_FILTER, filter);
@@ -677,6 +716,8 @@ static const struct option options[] = {
 		    "don't sample"),
 	OPT_BOOLEAN('M', "multiplex", &multiplex,
 		    "multiplex counter output in a single channel"),
+	OPT_BOOLEAN('P', "include-perf", &include_perf,
+		    "include perf in trace data (normally excluded)"),
 	OPT_END()
 };
 
@@ -716,5 +757,8 @@ int cmd_record(int argc, const char **argv, const char *prefix __used)
 		attrs[counter].sample_period = default_interval;
 	}
 
+	if (include_perf == 0)
+		sprintf(exclude_perf_pred, "common_pid != %d", getpid());
+
 	return __cmd_record(argc, argv);
 }
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index b8c1f64..d43d9b0 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -25,7 +25,8 @@ extern const char *__event_name(int type, u64 config);
 extern int parse_events(const struct option *opt, const char *str, int unset);
 extern int parse_filter(const struct option *opt, const char *str, int unset);
 
-#define EVENTS_HELP_MAX (128*1024)
+#define EVENTS_HELP_MAX		(128*1024)
+#define MAX_FILTER_STR_VAL	256
 
 extern void print_events(void);
 
-- 
1.6.4.GIT



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
  2010-02-01  8:17         ` Tom Zanussi
@ 2010-02-01 21:20           ` Keiichi KII
  -1 siblings, 0 replies; 22+ messages in thread
From: Keiichi KII @ 2010-02-01 21:20 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: linux-kernel, lwoodman, linux-mm, mingo, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

Hello,

(02/01/10 03:17), Tom Zanussi wrote:
> Here's one way, using the tracepoint filters - it does make a big
> difference in this case.
> 
> Before (using the new -P option, which includes perf in the trace
> data):  
> 
> root@tropicana:~# perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache -P sleep 5
> [ perf record: Woken up 0 times to write data ]
> [ perf record: Captured and wrote 71.201 MB perf.data (~3110815 samples) ]
> 
> After (filters out events generated by perf):
> 
> root@tropicana:~# perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.309 MB perf.data (~13479 samples) ]

I tried your patch.
It works well for me. Thank you.

Keiichi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
@ 2010-02-01 21:20           ` Keiichi KII
  0 siblings, 0 replies; 22+ messages in thread
From: Keiichi KII @ 2010-02-01 21:20 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: linux-kernel, lwoodman, linux-mm, mingo, riel, rostedt, akpm,
	fweisbec, Munehiro Ikeda, Atsushi Tsuji

Hello,

(02/01/10 03:17), Tom Zanussi wrote:
> Here's one way, using the tracepoint filters - it does make a big
> difference in this case.
> 
> Before (using the new -P option, which includes perf in the trace
> data):  
> 
> root@tropicana:~# perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache -P sleep 5
> [ perf record: Woken up 0 times to write data ]
> [ perf record: Captured and wrote 71.201 MB perf.data (~3110815 samples) ]
> 
> After (filters out events generated by perf):
> 
> root@tropicana:~# perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.309 MB perf.data (~13479 samples) ]

I tried your patch.
It works well for me. Thank you.

Keiichi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
  2010-02-01  8:17         ` Tom Zanussi
@ 2010-02-23 17:54           ` Frederic Weisbecker
  -1 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2010-02-23 17:54 UTC (permalink / raw)
  To: Tom Zanussi, Ingo Molnar, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo
  Cc: Keiichi KII, linux-kernel, lwoodman, linux-mm, mingo, riel,
	rostedt, akpm, Munehiro Ikeda, Atsushi Tsuji

On Mon, Feb 01, 2010 at 02:17:35AM -0600, Tom Zanussi wrote:
> Here's one way, using the tracepoint filters - it does make a big
> difference in this case.
> 
> Before (using the new -P option, which includes perf in the trace
> data):  
> 
> root@tropicana:~# perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache -P sleep 5
> [ perf record: Woken up 0 times to write data ]
> [ perf record: Captured and wrote 71.201 MB perf.data (~3110815 samples) ]
> 
> After (filters out events generated by perf):
> 
> root@tropicana:~# perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.309 MB perf.data (~13479 samples) ]
> 
> Tom
> 
> [PATCH] perf record: filter out perf process tracepoint events
> 
> The perf process itself can generate a lot of trace data, which most
> of the time isn't of any interest.  This patch adds a predicate to the
> kernel tracepoint filter of each recorded event type which effectively
> screens out any event generated by perf.
> 
> Assuming the common case would be to ignore perf, this makes it the
> default; the old behavior can be selected by using 'perf record -P'.


I think filtering out perf from the instrumentation is a very
desirable features.

But I see two drawbacks with this patch.
First of all, we want to keep perf as a part of the instrumentation
as a default behaviour I think, as it is a true part of the system
wide load. So I would rather suggest to keep it as a default and
have an exclude_perf option instead of include_perf.

The other downside is that this filtering only applies to ftrace events
and not to other perf events. I would expect an exclude_perf option
to apply to every events, not just a family of them.

This is not that easy though. It's trivial for a process bound
instrumentation as we only need to use enable_on_exec for that
(assuming we create the targeted process from perf).

Otherwise we need the cpu events to filter out a given context, which
needs to be done from the kernel, on events scheduling time.
It's just an idea, I'm adding more interested parties in Cc.

Thanks.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
@ 2010-02-23 17:54           ` Frederic Weisbecker
  0 siblings, 0 replies; 22+ messages in thread
From: Frederic Weisbecker @ 2010-02-23 17:54 UTC (permalink / raw)
  To: Tom Zanussi, Ingo Molnar, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo
  Cc: Keiichi KII, linux-kernel, lwoodman, linux-mm, riel, rostedt,
	akpm, Munehiro Ikeda, Atsushi Tsuji

On Mon, Feb 01, 2010 at 02:17:35AM -0600, Tom Zanussi wrote:
> Here's one way, using the tracepoint filters - it does make a big
> difference in this case.
> 
> Before (using the new -P option, which includes perf in the trace
> data):  
> 
> root@tropicana:~# perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache -P sleep 5
> [ perf record: Woken up 0 times to write data ]
> [ perf record: Captured and wrote 71.201 MB perf.data (~3110815 samples) ]
> 
> After (filters out events generated by perf):
> 
> root@tropicana:~# perf record -c 1 -f -a -M -R -e filemap:add_to_page_cache -e filemap:find_get_page -e filemap:remove_from_page_cache sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.309 MB perf.data (~13479 samples) ]
> 
> Tom
> 
> [PATCH] perf record: filter out perf process tracepoint events
> 
> The perf process itself can generate a lot of trace data, which most
> of the time isn't of any interest.  This patch adds a predicate to the
> kernel tracepoint filter of each recorded event type which effectively
> screens out any event generated by perf.
> 
> Assuming the common case would be to ignore perf, this makes it the
> default; the old behavior can be selected by using 'perf record -P'.


I think filtering out perf from the instrumentation is a very
desirable features.

But I see two drawbacks with this patch.
First of all, we want to keep perf as a part of the instrumentation
as a default behaviour I think, as it is a true part of the system
wide load. So I would rather suggest to keep it as a default and
have an exclude_perf option instead of include_perf.

The other downside is that this filtering only applies to ftrace events
and not to other perf events. I would expect an exclude_perf option
to apply to every events, not just a family of them.

This is not that easy though. It's trivial for a process bound
instrumentation as we only need to use enable_on_exec for that
(assuming we create the targeted process from perf).

Otherwise we need the cpu events to filter out a given context, which
needs to be done from the kernel, on events scheduling time.
It's just an idea, I'm adding more interested parties in Cc.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
  2010-02-23 17:54           ` Frederic Weisbecker
@ 2010-02-23 18:13             ` Peter Zijlstra
  -1 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2010-02-23 18:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Tom Zanussi, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Keiichi KII, linux-kernel, lwoodman,
	linux-mm, riel, rostedt, akpm, Munehiro Ikeda, Atsushi Tsuji

On Tue, 2010-02-23 at 18:54 +0100, Frederic Weisbecker wrote:

> > [PATCH] perf record: filter out perf process tracepoint events
> > 
> > The perf process itself can generate a lot of trace data, which most
> > of the time isn't of any interest.  This patch adds a predicate to the
> > kernel tracepoint filter of each recorded event type which effectively
> > screens out any event generated by perf.
> > 
> > Assuming the common case would be to ignore perf, this makes it the
> > default; the old behavior can be selected by using 'perf record -P'.
> 
> 
> I think filtering out perf from the instrumentation is a very
> desirable features.
> 
> But I see two drawbacks with this patch.
> First of all, we want to keep perf as a part of the instrumentation
> as a default behaviour I think, as it is a true part of the system
> wide load. So I would rather suggest to keep it as a default and
> have an exclude_perf option instead of include_perf.

Agreed, excluding by default doesn't seem to be a sane idea.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process
@ 2010-02-23 18:13             ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2010-02-23 18:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Tom Zanussi, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Keiichi KII, linux-kernel, lwoodman,
	linux-mm, riel, rostedt, akpm, Munehiro Ikeda, Atsushi Tsuji

On Tue, 2010-02-23 at 18:54 +0100, Frederic Weisbecker wrote:

> > [PATCH] perf record: filter out perf process tracepoint events
> > 
> > The perf process itself can generate a lot of trace data, which most
> > of the time isn't of any interest.  This patch adds a predicate to the
> > kernel tracepoint filter of each recorded event type which effectively
> > screens out any event generated by perf.
> > 
> > Assuming the common case would be to ignore perf, this makes it the
> > default; the old behavior can be selected by using 'perf record -P'.
> 
> 
> I think filtering out perf from the instrumentation is a very
> desirable features.
> 
> But I see two drawbacks with this patch.
> First of all, we want to keep perf as a part of the instrumentation
> as a default behaviour I think, as it is a true part of the system
> wide load. So I would rather suggest to keep it as a default and
> have an exclude_perf option instead of include_perf.

Agreed, excluding by default doesn't seem to be a sane idea.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2010-02-23 18:13 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-23  0:04 [RFC PATCH -tip 0/2 v2] pagecache tracepoints proposal Keiichi KII
2010-01-23  0:04 ` Keiichi KII
2010-01-23  0:07 ` [RFC PATCH -tip 1/2 v2] add tracepoints for pagecache Keiichi KII
2010-01-23  0:07   ` Keiichi KII
2010-01-23  2:28   ` Steven Rostedt
2010-01-23  2:28     ` Steven Rostedt
2010-01-25 22:17     ` Keiichi KII
2010-01-25 22:17       ` Keiichi KII
2010-01-23  0:08 ` [RFC PATCH -tip 2/2 v2] add a scripts for pagecache usage per process Keiichi KII
2010-01-23  0:08   ` Keiichi KII
2010-01-23  8:21   ` Tom Zanussi
2010-01-23  8:21     ` Tom Zanussi
2010-01-25 22:16     ` Keiichi KII
2010-01-25 22:16       ` Keiichi KII
2010-02-01  8:17       ` Tom Zanussi
2010-02-01  8:17         ` Tom Zanussi
2010-02-01 21:20         ` Keiichi KII
2010-02-01 21:20           ` Keiichi KII
2010-02-23 17:54         ` Frederic Weisbecker
2010-02-23 17:54           ` Frederic Weisbecker
2010-02-23 18:13           ` Peter Zijlstra
2010-02-23 18:13             ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.