linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH -tip 0/5] perf tools: pagecache monitoring
@ 2011-07-18 21:31 Keiichi KII
  2011-07-18 21:33 ` [RFC PATCH -tip 1/5] perf tools: handle '-' and '*' in trace parsing Keiichi KII
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Keiichi KII @ 2011-07-18 21:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, linux-mm, Tom Zanussi, riel, Steven Rostedt,
	Fr??d??ric Weisbecker, Keiichi KII, Wu Fengguang, BA, Moussa

Hello,

I would propose pagecache monitoring tools using perf tools.
The purpose of the tools is to clarify pagecache behavior in a system.

We can now know system-wide pagecache usage by "/proc/meminfo".
But we don't have any way to get higher resolution information like
per file or per process usage than system-wide one.
If pagecache miss hit ratio rises due to unnecessary adding/removing
pagecaches, maybe it leads to extra I/O and affects system performance.
But it's difficult to find out what is going on in the system.

So, the tools I propose provide 2 functions:

1. pagecache snapshooting(perf script pagecache-snapshoot)

This function clarifies pagecache usage per each file in the system.
This function is based mainly on "pagecache object collections" that is
developed by Wu Fengguang (http://lkml.org/lkml/2010/2/9/156).
The following is sample output of this function.

pagecache snapshooting (time: 14131, path: /home)
                             file name cache(B)  file(B)  ratio  +/-(B)    age
-------------------------------------- -------- -------- ------ ------- ------
/home/foo/git/linux-2.6-tip/.git/objec    71.0M   436.6M    16%       0   9012
/home/foo/git/linux-2.6-tip/.git/objec    49.6M    57.7M    86%       0   9012
/home/foo/.thunderbird/xso5zn7g.defaul    19.8M    19.8M   100%       0   7223
/home/foo/.thunderbird/xso5zn7g.defaul     5.7M     5.7M   100%       0   6621
/home/foo/git/linux-2.6-tip/.git/index     3.5M     3.5M   100%       0   4306
/home/foo/.thunderbird/xso5zn7g.defaul     2.2M     2.2M   100%       0   7524
/home/foo/.thunderbird/xso5zn7g.defaul     2.2M     2.2M   100%       0   7526
/home/foo/.thunderbird/xso5zn7g.defaul     1.7M     1.7M   100%       0   6921
...

2. continuous pagecache monitoring(perf script pagecachetop)

This function clarifies pagecache behavior like pagecache hit ratio and
added/removed pagecache amount on the basis of file/process.
This functions is based on pagecache tracepoints I propose.
While the pagecache snapshooting can take a pagecache snapshoot at a point,
the continuous pagecache monitoring can measure dynamic change between
2 snapshoots.
The following is sample output of this function.

pagecache behavior per file (time:15826, interval:10)

                         find        hit    cache      add   remove  proc
                file    count      ratio pages(B) pages(B) pages(B) count
-------------------- -------- ---------- -------- -------- -------- -----
        libc-2.13.so      620    100.00%     1.2M        0        0     7
                bash      283    100.00%   888.0K        0        0     6
          ld-2.13.so      136    100.00%   148.0K        0        0     6
                gawk      130    100.00%   376.0K        0        0     2
         ld.so.cache       60    100.00%   116.0K        0        0     4
...

pagecache behavior per process (time:16294, interval:10)

                         find        hit      add   remove  file
             process    count      ratio pages(B) pages(B) count
-------------------- -------- ---------- -------- -------- -----
            zsh-7761     2968     99.93%     4.0K        0   246
           perf-7758      369    100.00%        0        0    17
           xmms-7634       52    100.00%        0        0     1
           perf-7759       11    100.00%        0        0     2
            zsh-2815        6     83.33%     4.0K     4.0K     2
       gconfd-2-4849        3      0.00%    12.0K    12.0K     4
       rsyslogd-7194        1    100.00%        0        0     1


By these 2 functions, we can find out whether pagecaches are used
efficiently or not.
And also these tools would help us tune some applications like database.
It will also help us tune the kernel parameters like "vm.dirty_*".

My patches are based on the latest "linux-tip.git" tree and
also the following 3 commits in "tip:tracing/mm" and a "pagecache
object collections" patch. 

  - dcac8cd: tracing/mm: add page frame snapshot trace
  - 1487a7a: tracing/mm: fix mapcount trace record field
  - eb46710: tracing/mm: rename 'trigger' file to 'dump_range'
  - http://lkml.org/lkml/2010/2/9/156

Any comments are welcome.
--
Keiichi Kii <k-keiichi@bx.jp.nec.com>



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH -tip 1/5] perf tools: handle '-' and '*' in trace parsing
  2011-07-18 21:31 [RFC PATCH -tip 0/5] perf tools: pagecache monitoring Keiichi KII
@ 2011-07-18 21:33 ` Keiichi KII
  2011-07-18 21:34 ` [RFC PATCH -tip 2/5] tracing/mm: add header event for object collections Keiichi KII
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Keiichi KII @ 2011-07-18 21:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Keiichi KII, Ingo Molnar, linux-mm, Tom Zanussi, riel,
	Steven Rostedt, Fr??d??ric Weisbecker, Wu Fengguang, BA, Moussa

From: Keiichi Kii <k-keiichi@bx.jp.nec.com>

add '-' and '*' to parsing in trace parsing.

Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
---

 tools/perf/util/trace-event-parse.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)


diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
index 0a7ed5b..447f7c0 100644
--- a/tools/perf/util/trace-event-parse.c
+++ b/tools/perf/util/trace-event-parse.c
@@ -2162,6 +2162,12 @@ static unsigned long long eval_num_arg(void *data, int size,
 		case '+':
 			val = left + right;
 			break;
+		case '*':
+			val = left * right;
+			break;
+		case '/':
+			val = left / right;
+			break;
 		default:
 			die("unknown op '%s'", arg->op.op);
 		}



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH -tip 2/5] tracing/mm: add header event for object collections
  2011-07-18 21:31 [RFC PATCH -tip 0/5] perf tools: pagecache monitoring Keiichi KII
  2011-07-18 21:33 ` [RFC PATCH -tip 1/5] perf tools: handle '-' and '*' in trace parsing Keiichi KII
@ 2011-07-18 21:34 ` Keiichi KII
  2011-07-29  9:55   ` Mel Gorman
  2011-07-18 21:36 ` [RFC PATCH -tip 3/5] perf tools: scripts for pagecache snapshooting Keiichi KII
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Keiichi KII @ 2011-07-18 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Keiichi KII, Ingo Molnar, linux-mm, Tom Zanussi, riel,
	Steven Rostedt, Fr??d??ric Weisbecker, Wu Fengguang, BA, Moussa

From: Keiichi Kii <k-keiichi@bx.jp.nec.com>

We can use this "dump_header" event to separate trace data
for the object collections.

Usage and Sample output:

zsh  2815 [001]  8819.880776: dump_header: object=mm/pages/walk-fs input=/
zsh  2815 [001]  8819.880786: dump_inode: ino=139161 size=507416 cached=507904 age=29 dirty=7 dev=254:0 file=strchr
zsh  2815 [001]  8819.880790: dump_pagecache_range: index=0 len=1 flags=4000000000000878 count=2 mapcount=0
zsh  2815 [001]  8819.880793: dump_pagecache_range: index=1 len=18 flags=400000000000087c count=2 mapcount=0
zsh  2815 [001]  8819.880795: dump_pagecache_range: index=19 len=1 flags=400000000000083c count=2 mapcount=0
zsh  2815 [001]  8819.880796: dump_pagecache_range: index=20 len=2 flags=400000000000087c count=2 mapcount=0
...
zsh  2816 [001]  8820.XXXXXX: dump_header: object=mm/pages/walk-fs input=/
...

Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
---

 include/trace/events/mm.h |   19 +++++++++++++++++++
 kernel/trace/trace_mm.c   |    9 +++++++++
 2 files changed, 28 insertions(+), 0 deletions(-)


diff --git a/include/trace/events/mm.h b/include/trace/events/mm.h
index e625b49..05bd35a 100644
--- a/include/trace/events/mm.h
+++ b/include/trace/events/mm.h
@@ -111,6 +111,25 @@ TRACE_EVENT(dump_inode,
 		  strchr(__get_str(file), '\n') ? "?" : __get_str(file))
 );
 
+TRACE_EVENT(dump_header,
+
+	TP_PROTO(char *object_name, char *input_data),
+
+	TP_ARGS(object_name, input_data),
+
+	TP_STRUCT__entry(
+		__string(	object_name,	object_name	)
+		__string(	input_data,	input_data	)
+	),
+
+	TP_fast_assign(
+		__assign_str(object_name, object_name);
+		__assign_str(input_data, input_data);
+	),
+
+	TP_printk("object=%s input=%s",
+		__get_str(object_name), __get_str(input_data))
+);
 
 #endif /*  _TRACE_MM_H */
 
diff --git a/kernel/trace/trace_mm.c b/kernel/trace/trace_mm.c
index 0d77dfd..fa9ab7c 100644
--- a/kernel/trace/trace_mm.c
+++ b/kernel/trace/trace_mm.c
@@ -72,6 +72,8 @@ trace_mm_dump_range_write(struct file *filp, const char __user *ubuf, size_t cnt
 	if (tracing_update_buffers() < 0)
 		return -ENOMEM;
 
+	if (trace_set_clr_event("mm", "dump_header", 1))
+		return -EINVAL;
 	if (trace_set_clr_event("mm", "dump_pages", 1))
 		return -EINVAL;
 
@@ -87,6 +89,7 @@ trace_mm_dump_range_write(struct file *filp, const char __user *ubuf, size_t cnt
 	else
 		end = start + val;
 
+	trace_dump_header("mm/pages/dump_range", buf);
 	trace_read_page_frames(start, end, trace_do_dump_pages);
 
 	*ppos += cnt;
@@ -270,6 +273,10 @@ trace_pagecache_write(struct file *filp, const char __user *ubuf, size_t count,
 		err = -ENOMEM;
 		goto out;
 	}
+	if (trace_set_clr_event("mm", "dump_header", 1)) {
+		err = -EINVAL;
+		goto out;
+	}
 	if (trace_set_clr_event("mm", "dump_pagecache_range", 1)) {
 		err = -EINVAL;
 		goto out;
@@ -280,8 +287,10 @@ trace_pagecache_write(struct file *filp, const char __user *ubuf, size_t count,
 	}
 
 	if (filp->f_path.dentry->d_inode->i_private) {
+		trace_dump_header("mm/pages/walk-fs", name);
 		dump_fs_pagecache(file->f_path.dentry->d_sb, file->f_path.mnt);
 	} else {
+		trace_dump_header("mm/pages/walk-file", name);
 		dump_pagecache(file->f_mapping);
 	}



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH -tip 3/5] perf tools: scripts for pagecache snapshooting
  2011-07-18 21:31 [RFC PATCH -tip 0/5] perf tools: pagecache monitoring Keiichi KII
  2011-07-18 21:33 ` [RFC PATCH -tip 1/5] perf tools: handle '-' and '*' in trace parsing Keiichi KII
  2011-07-18 21:34 ` [RFC PATCH -tip 2/5] tracing/mm: add header event for object collections Keiichi KII
@ 2011-07-18 21:36 ` Keiichi KII
  2011-07-18 21:38 ` [RFC PATCH -tip 4/5] tracepoints: add tracepoints for pagecache Keiichi KII
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Keiichi KII @ 2011-07-18 21:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Keiichi KII, Ingo Molnar, linux-mm, Tom Zanussi, riel,
	Steven Rostedt, Fr??d??ric Weisbecker, Wu Fengguang, BA, Moussa

From: Keiichi Kii <k-keiichi@bx.jp.nec.com>

The "pagecache snapshooting" is implemented based on "pagecache object
collections" and the trace stream scripting support in perf tools.

To take pagecache snapshoots,
we can run "perf script pagecache-snapshoot <path of fs> <interval>
ex) perf script pagecache-snapshoot / 600
    => keep taking a snapshoot for "/" filesystem every 10 minutes.

The output consists of 2 parts. The following shows sample outputs.

o One part is to show pagecache usage in a system at a point.

pangecache snapshooting (time: 18501, path: /)
                             file name cache(B)  file(B)  ratio  ±(B)    age
-------------------------------------- -------- -------- ------ ------- ------
/var/lib/rpm/Packages                     68.5M    70.0M    97%   60.1M     12
/usr/lib64/thunderbird-3.1/thunderbird     8.8M    19.4M    45%       0  11899
/usr/bin/emacs-23.2                        4.3M    10.3M    41%       0  17192
/usr/share/anthy/anthy.dic                 3.0M    20.0M    14%       0  18402
/usr/lib64/libgtk-x11-2.0.so.0.2200.0      2.8M     4.5M    63%       0  18485
/usr/lib/locale/locale-archive             1.8M    94.6M     1%       0  18496
/usr/lib64/libpython2.7.so.1.0             1.5M     1.8M    87%       0  18417
/usr/bin/Xorg                              1.3M     1.9M    71%       0  18471
/usr/share/fonts/sazanami/mincho/sazan     1.3M    10.1M    12%       0   8634
/var/lib/rpm/__db.003                      1.3M     1.3M   100%       0     12
/usr/lib64/perl5/CORE/libperl.so           1.2M     1.5M    82%       0  13415
/lib64/libc-2.13.so                        1.2M     1.9M    64%       0  18497
/lib64/libdb-4.8.so                        1.2M     1.5M    79%       0  18474
/usr/bin/nautilus                       1008.0K     1.9M    53%       0  18418
/usr/lib64/libnss3.so                    992.0K     1.2M    78%       0  18491
/lib64/libcrypto.so.1.0.0d               948.0K     1.6M    58%       0  18477
/var/cache/hald/fdi-cache                880.0K   997.5K    88%       0  18479

o the other is to show statistics for pagecache changes between snapshoots
  after finishing taking snapshoots.

[stat]
o pagecache max
  1:    69.5MB /var/lib/rpm/Packages
  2:    37.6MB /var/cache/yum/x86_64/14/fedora/19091941
  3:    28.1MB /var/cache/yum/x86_64/14/updates/2d3f230
  4:     8.8MB /usr/lib64/thunderbird-3.1/thunderbird-b
  5:     7.1MB /var/cache/yum/x86_64/14/fedora-debuginf
  6:     6.8MB /var/cache/yum/x86_64/14/updates/updatei
  7:     4.3MB /usr/bin/emacs-23.2
  8:     3.0MB /usr/share/anthy/anthy.dic
  9:     2.8MB /usr/lib64/libgtk-x11-2.0.so.0.2200.0
 10:     2.6MB /var/cache/yum/x86_64/14/updates-debugin

o pagecache average
  1:    24.4MB/   69.5MB(  35%) /var/lib/rpm/Packages
  2:     8.8MB/    8.8MB( 100%) /usr/lib64/thunderbird-3.1/thunderbird-b
  3:     6.3MB/   37.6MB(  16%) /var/cache/yum/x86_64/14/fedora/19091941
  4:     4.7MB/   28.1MB(  16%) /var/cache/yum/x86_64/14/updates/2d3f230
  5:     4.3MB/    4.3MB( 100%) /usr/bin/emacs-23.2
  6:     3.0MB/    3.0MB( 100%) /usr/share/anthy/anthy.dic
  7:     2.8MB/    2.8MB( 100%) /usr/lib64/libgtk-x11-2.0.so.0.2200.0
  8:     1.8MB/    1.8MB( 100%) /usr/lib/locale/locale-archive
  9:     1.4MB/    1.5MB(  87%) /usr/lib64/libpython2.7.so.1.0
 10:     1.3MB/    1.3MB( 100%) /usr/bin/Xorg

o increased pagecache total
  1:    69.5MB/   69.5MB( 100%) /var/lib/rpm/Packages
  2:    37.6MB/   37.6MB( 100%) /var/cache/yum/x86_64/14/fedora/19091941
  3:    28.1MB/   28.1MB( 100%) /var/cache/yum/x86_64/14/updates/2d3f230
  4:     7.1MB/    7.1MB( 100%) /var/cache/yum/x86_64/14/fedora-debuginf
  5:     6.8MB/    6.8MB( 100%) /var/cache/yum/x86_64/14/updates/updatei
  6:     2.6MB/    2.6MB( 100%) /var/cache/yum/x86_64/14/updates-debugin
  7:     1.5MB/    1.5MB( 100%) /var/cache/yum/x86_64/14/rpmfusion-free/
  8:     1.4MB/    1.4MB( 100%) /var/cache/yum/x86_64/14/rpmfusion-free-
  9:     1.3MB/    1.3MB( 100%) /var/lib/rpm/__db.003
 10:   808.0KB/    1.2MB(  66%) /lib64/libdb-4.8.so

o decreased pagecache total
  1:    69.5MB/   69.5MB( 100%) /var/lib/rpm/Packages
  2:    37.6MB/   37.6MB( 100%) /var/cache/yum/x86_64/14/fedora/19091941
  3:    28.1MB/   28.1MB( 100%) /var/cache/yum/x86_64/14/updates/2d3f230
  4:     7.1MB/    7.1MB( 100%) /var/cache/yum/x86_64/14/fedora-debuginf
  5:     6.8MB/    6.8MB( 100%) /var/cache/yum/x86_64/14/updates/updatei
  6:     2.6MB/    2.6MB( 100%) /var/cache/yum/x86_64/14/updates-debugin
  7:     1.5MB/    1.5MB( 100%) /var/cache/yum/x86_64/14/rpmfusion-free/
  8:     1.4MB/    1.4MB( 100%) /var/cache/yum/x86_64/14/rpmfusion-free-
  9:     1.3MB/    1.3MB( 100%) /var/lib/rpm/__db.003
 10:   808.0KB/    1.2MB(  66%) /lib64/libdb-4.8.so

Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
---

 .../scripts/perl/bin/pagecache-snapshoot-record    |    3 
 .../scripts/perl/bin/pagecache-snapshoot-report    |   34 ++++
 tools/perf/scripts/perl/pagecache-snapshoot.pl     |  188 ++++++++++++++++++++
 3 files changed, 225 insertions(+), 0 deletions(-)
 create mode 100644 tools/perf/scripts/perl/bin/pagecache-snapshoot-record
 create mode 100644 tools/perf/scripts/perl/bin/pagecache-snapshoot-report
 create mode 100644 tools/perf/scripts/perl/pagecache-snapshoot.pl


diff --git a/tools/perf/scripts/perl/bin/pagecache-snapshoot-record b/tools/perf/scripts/perl/bin/pagecache-snapshoot-record
new file mode 100644
index 0000000..7ece9c9
--- /dev/null
+++ b/tools/perf/scripts/perl/bin/pagecache-snapshoot-record
@@ -0,0 +1,3 @@
+#!/bin/bash
+
+perf record -D -e mm:dump_inode --filter "nrpages>20" -e mm:dump_header $@
diff --git a/tools/perf/scripts/perl/bin/pagecache-snapshoot-report b/tools/perf/scripts/perl/bin/pagecache-snapshoot-report
new file mode 100644
index 0000000..228c9d2
--- /dev/null
+++ b/tools/perf/scripts/perl/bin/pagecache-snapshoot-report
@@ -0,0 +1,34 @@
+#!/bin/bash
+# description: snapshoot pagecache usage
+# args: <path> <interval>
+
+function find_debugfs_path {
+  echo `cat /proc/mounts | grep debugfs | awk 'NR == 1 {print $2}'`
+}
+
+for i in "$@"
+do
+    if expr match "$i" "-" > /dev/null; then
+        break
+    fi
+    n_args=$(( $n_args + 1 ))
+done
+
+if [ "$n_args" -eq 2 ]; then
+    path=$1
+    interval=$2
+    shift 2
+else
+    echo "usage: pagecache-snapshoot <path> <interval>"
+    exit
+fi
+
+CUR_PID=$$
+while true;
+do
+  test `ps -e | grep $CUR_PID | wc -l` = 0 && break;
+  echo $path > `find_debugfs_path`/tracing/objects/mm/pages/walk-fs
+  sleep $interval
+done&
+
+perf script $@ -s "$PERF_EXEC_PATH"/scripts/perl/pagecache-snapshoot.pl
diff --git a/tools/perf/scripts/perl/pagecache-snapshoot.pl b/tools/perf/scripts/perl/pagecache-snapshoot.pl
new file mode 100644
index 0000000..f87f3fc
--- /dev/null
+++ b/tools/perf/scripts/perl/pagecache-snapshoot.pl
@@ -0,0 +1,188 @@
+#!/usr/bin/perl -w
+# (C) 2011, Keiichi Kii <k-keiichi@bx.jp.nec.com>
+# Licensed under the terms of the GNU GPL License version 2
+
+# Take pagecache snapshoot in system.
+
+use lib "$ENV{'PERF_EXEC_PATH'}/scripts/perl/Perf-Trace-Util/lib";
+use lib "./Perf-Trace-Util/lib";
+use Perf::Trace::Core;
+use Perf::Trace::Context;
+use Perf::Trace::Util;
+use File::Basename qw/basename/;
+use threads;
+use Getopt::Std;
+
+my $interval;
+my $cur_mode = $ARGV[0];
+my $cur_record;
+my $limit = 17;
+my @records;
+my @stats;
+
+sub trace_end {
+    clear_term();
+    analyze();
+    print_stats();
+}
+
+sub mm::dump_header
+{
+    my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+        $common_pid, $common_comm,
+        $object_name, $input_data) = @_;
+
+    clear_term();
+    print_files($cur_record);
+    switch_record($input_data, $common_secs);
+}
+
+sub switch_record() {
+    my ($path, $time) = @_;
+    my %new_record = ();
+
+    $$cur_record{complete} = 1;
+    $new_record{path} = $path;
+    $new_record{time} = $time;
+    $new_record{files} = ();
+    $new_record{complete} = 0;
+
+    if (@records > 0) {
+        foreach my $f (keys %{$$cur_record{files}}) {
+            if ($$cur_record{files}{$f}{pages} == 0) {
+                $new_record{files}{$f} = {cached => 0, pages => 0};
+            } else {
+                %{$new_record{files}{$f}} = %{$$cur_record{files}{$f}};
+                $new_record{files}{$f}{cached} = 1;
+                $new_record{files}{$f}{pages} = 0;
+                $new_record{files}{$f}{change} =
+                    -$$cur_record{files}{$f}{pages};
+            }
+        }
+    }
+    $cur_record = \%new_record;
+    push(@records, \%new_record);
+}
+
+sub mm::dump_inode
+{
+    my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+        $common_pid, $common_comm,
+        $ino, $size, $nrpages, $age, 
+        $state, $dev, $file) = @_;
+
+    if ($file =~ /$$cur_record{path}/ and $nrpages != 0) {
+        if (defined($$cur_record{files}{$dev.":".$ino})) {
+            $$cur_record{files}{$dev.":".$ino}{path} = $file;
+            $$cur_record{files}{$dev.":".$ino}{size} = $size;
+            $$cur_record{files}{$dev.":".$ino}{pages} = $nrpages;
+            $$cur_record{files}{$dev.":".$ino}{change} += $nrpages;
+            $$cur_record{files}{$dev.":".$ino}{cached} = 1;
+            $$cur_record{files}{$dev.":".$ino}{age} = $age;
+        } else {
+            $$cur_record{files}{$dev.":".$ino} =
+                {cached => 1, path => $file, size => $size, pages => $nrpages,
+                 change => (@records == 1) ? 0 : $nrpages, age => $age};
+        }
+    }
+}
+
+sub print_files() {
+    my $record = shift;
+    my $i = 0;
+    my $file_name ="";
+
+    printf("pagecache snapshooting (time: %d, path: %s)\n",
+        $$record{time}, $$record{path});
+    printf("%38s %8s %8s %6s %7s %6s\n",
+           "file name", "cache(B)", "file(B)", "ratio", "+/-(B)", "age");
+    printf("%38s %8s %8s %6s %7s %6s\n",
+           '-'x38, '-'x8, '-'x8, '-'x6, '-'x7, '-'x6);
+    foreach my $f (sort {$$b{pages} <=> $$a{pages}} values %{$$record{files}}) {
+        next if $$f{cached} == 0;
+        $i++;
+        printf("%-38s %8s %8s %5d%% %7s %6d\n", get_file_name($$f{path}, 38),
+               convert_unit($$f{pages} * 4096), convert_unit($$f{size}),
+               ($$f{size} == 0) ? 0 : ($$f{pages} * 4096) / $$f{size} * 100,
+               convert_unit($$f{change} * 4096),
+               $$f{age} / 1000);
+        last if $i >= $limit;
+    }
+}
+
+sub analyze() {
+    foreach my $key (keys %{$records[$#records]{files}}) {
+        my (@result, $total);
+	my %stat = (max => 0, increase => 0, decrease => 0, average => 0);
+
+        @result = grep $$_{files}{$key}{cached} == 1 &&
+            $$_{complete} == 1, @records;
+        $total = grep $$_{complete} == 1, @records;
+	$stat{max} = 0;
+        foreach my $r (@result) {
+            $stat{path} = $$r{files}{$key}{path};
+            if ($stat{max} < $$r{files}{$key}{pages}) {
+                $stat{max} = $$r{files}{$key}{pages};
+            }
+            $stat{average} += $$r{files}{$key}{pages} / $total;
+            if ($$r{files}{$key}{change} > 0) {
+                $stat{increase} += $$r{files}{$key}{change};
+            } else {
+                $stat{decrease} += -$$r{files}{$key}{change};
+            }
+        }
+        $stats{files}{$key} = \%stat;
+    }
+}
+
+sub print_stats() {
+    print "[stat]\n";
+    print "o pagecache max\n";
+    print_stats_elem("max", 10);
+    print "o pagecache average\n";
+    print_stats_elem("average", 10);
+    print "o increased pagecache total\n";
+    print_stats_elem("increase", 10);
+    print "o decreased pagecache total\n";
+    print_stats_elem("decrease", 10);
+}
+
+sub print_stats_elem() {
+    my ($type, $num) = @_;
+    my $i = 1;
+
+    foreach my $s (sort {$$b{$type} <=> $$a{$type}} values %{$stats{files}}) {
+        next if $$s{$type} == 0;
+        if ($type eq "max") {
+            printf("%3d: %8sB %-40s\n", $i, convert_unit($$s{$type} * 4096),
+                   get_file_name($$s{path}, 40));
+        } else {
+            printf("%3d: %8sB/%8sB(%4d%%) %-40s\n", $i,
+                   convert_unit($$s{$type} * 4096),
+                   convert_unit($$s{max} * 4096),
+                   $$s{$type} / $$s{max} * 100, get_file_name($$s{path}, 40));
+        }
+        $i++;
+        last if $i > $num;
+    }
+    print "\n";
+}
+
+@unit = ("K", "M", "G", "T");
+sub convert_unit() {
+    my $size = shift;
+
+    for (my $i=$#unit; $i >= 0; $i--) {
+        if (abs($size) >= 1024 ** ($i+1)) {
+            return sprintf("%.1f%s", $size/1024 ** ($i+1) , $unit[$i]);
+        }
+    }
+    return $size
+}
+
+
+sub get_file_name() {
+    my ($f, $length) = @_;
+
+    return substr($f, 0, $length);
+}



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH -tip 4/5] tracepoints: add tracepoints for pagecache
  2011-07-18 21:31 [RFC PATCH -tip 0/5] perf tools: pagecache monitoring Keiichi KII
                   ` (2 preceding siblings ...)
  2011-07-18 21:36 ` [RFC PATCH -tip 3/5] perf tools: scripts for pagecache snapshooting Keiichi KII
@ 2011-07-18 21:38 ` Keiichi KII
  2011-07-29 10:06   ` Mel Gorman
  2011-07-18 21:40 ` [RFC PATCH -tip 5/5] perf tools: scripts for continuous pagecache monitoring Keiichi KII
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Keiichi KII @ 2011-07-18 21:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Keiichi KII, Ingo Molnar, linux-mm, Tom Zanussi, riel,
	Steven Rostedt, Fr??d??ric Weisbecker, Wu Fengguang, BA, Moussa

From: Keiichi Kii <k-keiichi@bx.jp.nec.com>

This patch adds several tracepoints to track pagecach behavior.
These trecepoints would help us monitor pagecache usage with high resolution.

Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
Cc: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
---

 include/trace/events/filemap.h |   75 ++++++++++++++++++++++++++++++++++++++++
 mm/filemap.c                   |    4 ++
 mm/truncate.c                  |    2 +
 mm/vmscan.c                    |    2 +
 4 files changed, 83 insertions(+), 0 deletions(-)
 create mode 100644 include/trace/events/filemap.h


diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
new file mode 100644
index 0000000..0f83992
--- /dev/null
+++ b/include/trace/events/filemap.h
@@ -0,0 +1,75 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM filemap
+
+#if !defined(_TRACE_FILEMAP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FILEMAP_H
+
+#include <linux/fs.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(find_get_page,
+
+	TP_PROTO(struct address_space *mapping, pgoff_t offset,
+		struct page *page),
+
+	TP_ARGS(mapping, offset, page),
+
+	TP_STRUCT__entry(
+		__field(dev_t, s_dev)
+		__field(ino_t, i_ino)
+		__field(pgoff_t, offset)
+		__field(struct page *, page)
+		),
+
+	TP_fast_assign(
+		__entry->s_dev = mapping->host ? mapping->host->i_sb->s_dev : 0;
+		__entry->i_ino = mapping->host ? mapping->host->i_ino : 0;
+		__entry->offset = offset;
+		__entry->page = page;
+		),
+
+	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu %s", MAJOR(__entry->s_dev),
+		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset,
+		__entry->page == NULL ? "page_not_found" : "page_found")
+);
+
+DECLARE_EVENT_CLASS(page_cache_template,
+
+	TP_PROTO(struct address_space *mapping, pgoff_t offset),
+
+	TP_ARGS(mapping, offset),
+
+	TP_STRUCT__entry(
+		__field(dev_t, s_dev)
+		__field(ino_t, i_ino)
+		__field(pgoff_t, offset)
+		),
+
+	TP_fast_assign(
+		__entry->s_dev = mapping->host->i_sb->s_dev;
+		__entry->i_ino = mapping->host->i_ino;
+		__entry->offset = offset;
+		),
+
+	TP_printk("s_dev=%u:%u i_ino=%lu offset=%lu", MAJOR(__entry->s_dev),
+		MINOR(__entry->s_dev), __entry->i_ino, __entry->offset)
+);
+
+DEFINE_EVENT(page_cache_template, add_to_page_cache,
+
+	TP_PROTO(struct address_space *mapping, pgoff_t offset),
+
+	TP_ARGS(mapping, offset)
+);
+
+DEFINE_EVENT(page_cache_template, remove_from_page_cache,
+
+	TP_PROTO(struct address_space *mapping, pgoff_t offset),
+
+	TP_ARGS(mapping, offset)
+);
+
+#endif /* _TRACE_FILEMAP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/filemap.c b/mm/filemap.c
index a8251a8..9382785 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -35,6 +35,7 @@
 #include <linux/memcontrol.h>
 #include <linux/mm_inline.h> /* for page_is_file_cache() */
 #include <linux/cleancache.h>
+#include <trace/events/filemap.h>
 #include "internal.h"
 
 /*
@@ -169,6 +170,7 @@ void delete_from_page_cache(struct page *page)
 	spin_lock_irq(&mapping->tree_lock);
 	__delete_from_page_cache(page);
 	spin_unlock_irq(&mapping->tree_lock);
+	trace_remove_from_page_cache(mapping, page->index);
 	mem_cgroup_uncharge_cache_page(page);
 
 	if (freepage)
@@ -484,6 +486,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 			if (PageSwapBacked(page))
 				__inc_zone_page_state(page, NR_SHMEM);
 			spin_unlock_irq(&mapping->tree_lock);
+			trace_add_to_page_cache(mapping, offset);
 		} else {
 			page->mapping = NULL;
 			spin_unlock_irq(&mapping->tree_lock);
@@ -734,6 +737,7 @@ repeat:
 out:
 	rcu_read_unlock();
 
+	trace_find_get_page(mapping, offset, page);
 	return page;
 }
 EXPORT_SYMBOL(find_get_page);
diff --git a/mm/truncate.c b/mm/truncate.c
index e13f22e..5b10356 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -22,6 +22,7 @@
 #include <linux/cleancache.h>
 #include "internal.h"
 
+#include <trace/events/filemap.h>
 
 /**
  * do_invalidatepage - invalidate part or all of a page
@@ -406,6 +407,7 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page)
 	BUG_ON(page_has_private(page));
 	__delete_from_page_cache(page);
 	spin_unlock_irq(&mapping->tree_lock);
+	trace_remove_from_page_cache(mapping, page->index);
 	mem_cgroup_uncharge_cache_page(page);
 
 	if (mapping->a_ops->freepage)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5ed24b9..b7aea3a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -53,6 +53,7 @@
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
+#include <trace/events/filemap.h>
 
 /*
  * reclaim_mode determines how the inactive list is shrunk
@@ -532,6 +533,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page)
 
 		__delete_from_page_cache(page);
 		spin_unlock_irq(&mapping->tree_lock);
+		trace_remove_from_page_cache(mapping, page->index);
 		mem_cgroup_uncharge_cache_page(page);
 
 		if (freepage != NULL)



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH -tip 5/5] perf tools: scripts for continuous pagecache monitoring
  2011-07-18 21:31 [RFC PATCH -tip 0/5] perf tools: pagecache monitoring Keiichi KII
                   ` (3 preceding siblings ...)
  2011-07-18 21:38 ` [RFC PATCH -tip 4/5] tracepoints: add tracepoints for pagecache Keiichi KII
@ 2011-07-18 21:40 ` Keiichi KII
  2011-07-21  7:01 ` [RFC PATCH -tip 0/5] perf tools: " Ingo Molnar
  2011-07-29  9:14 ` Mel Gorman
  6 siblings, 0 replies; 11+ messages in thread
From: Keiichi KII @ 2011-07-18 21:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: Keiichi KII, Ingo Molnar, linux-mm, Tom Zanussi, riel,
	Steven Rostedt, Fr??d??ric Weisbecker, Wu Fengguang, BA, Moussa

From: Keiichi Kii <k-keiichi@bx.jp.nec.com>

The "continuous pagecache monitoring" is implemented based on
"pagecache tracepoints" and the trace stream scripting support
in perf tools.

To monitor dynamic changes for pagecaches,
we can run "perf script pagecachetop {file|process}".
ex) perf script pagecachetop file
    => monitor pagecache behavior on the basis of file

This tool shows two types of the output.

o One is to show pagecache behavior on the basis of "file"

pagecache behavior per file (time:20207, interval:10)

                         find        hit    cache      add   remove  proc
                file    count      ratio pages(B) pages(B) pages(B) count
-------------------- -------- ---------- -------- -------- -------- -----
            Packages    32813    100.00%    69.5M        0        0     1
190919419ab3582cb090    30677    100.00%    37.6M        0        0     1
2d3f2307106003b599d2    10715    100.00%    17.0M        0        0     1
29fe4f91d89bab54d355     5545    100.00%     7.1M        0        0     1
c5ee54fd83797583e6c2     1823    100.00%     2.6M        0        0     1
        libc-2.13.so      830    100.00%     1.2M        0        0     9
            __db.003      540    100.00%     1.3M        0        0     1
8faff879329920b2638a      439    100.00%     1.4M        0        0     1
1b695937ce00a8c305ee      352    100.00%     1.5M        0        0     1
 libpython2.7.so.1.0      330    100.00%     1.5M        0        0     1
                bash      283    100.00%   828.0K        0        0     6
         ld.so.cache      227    100.00%   116.0K        0        0     5
        .zsh_history      196    100.00%   772.0K        0        0     1
fdc15d6feaec65abbfae      196    100.00%   464.0K        0        0     1
3b316befdc0469fa84b7      192    100.00%   324.0K        0        0     1

o The other is to show pagecache behaviors on the basis of "process"

pagecache behavior per process (time:20160, interval:10)

                         find        hit      add   remove  file
             process    count      ratio pages(B) pages(B) count
-------------------- -------- ---------- -------- -------- -----
            yum-9006    97210     99.25%    99.9M        0    40
           xmms-7768       43    100.00%   128.0K        0     1
          crond-1307        8    100.00%        0        0     1
       rsyslogd-7194        1    100.00%        0        0     1

Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
---

 tools/perf/scripts/perl/bin/pagecachetop-record |    3 
 tools/perf/scripts/perl/bin/pagecachetop-report |   21 ++
 tools/perf/scripts/perl/pagecachetop.pl         |  292 +++++++++++++++++++++++
 3 files changed, 316 insertions(+), 0 deletions(-)
 create mode 100644 tools/perf/scripts/perl/bin/pagecachetop-record
 create mode 100644 tools/perf/scripts/perl/bin/pagecachetop-report
 create mode 100644 tools/perf/scripts/perl/pagecachetop.pl


diff --git a/tools/perf/scripts/perl/bin/pagecachetop-record b/tools/perf/scripts/perl/bin/pagecachetop-record
new file mode 100644
index 0000000..2c05539
--- /dev/null
+++ b/tools/perf/scripts/perl/bin/pagecachetop-record
@@ -0,0 +1,3 @@
+#!/bin/bash
+
+perf record -D -e filemap:find_get_page -e filemap:add_to_page_cache -e filemap:remove_from_page_cache -e mm:dump_inode --filter "nrpages>10" $@
diff --git a/tools/perf/scripts/perl/bin/pagecachetop-report b/tools/perf/scripts/perl/bin/pagecachetop-report
new file mode 100644
index 0000000..62c54bb
--- /dev/null
+++ b/tools/perf/scripts/perl/bin/pagecachetop-report
@@ -0,0 +1,21 @@
+#!/bin/bash
+# description: continuous pagecache monitoring per file
+
+for i in "$@"
+do
+    if expr match "$i" "-" > /dev/null; then
+        break
+    fi
+    n_args=$(( $n_args + 1 ))
+done
+
+if [ "$n_args" -eq 1 ] ; then
+    mode=$1
+    shift
+else
+    echo "usage: pagecachetop {file|process}"
+    echo $@
+    exit
+fi
+
+perf script $@ -s "$PERF_EXEC_PATH"/scripts/perl/pagecachetop.pl $mode
diff --git a/tools/perf/scripts/perl/pagecachetop.pl b/tools/perf/scripts/perl/pagecachetop.pl
new file mode 100644
index 0000000..ec77f89
--- /dev/null
+++ b/tools/perf/scripts/perl/pagecachetop.pl
@@ -0,0 +1,292 @@
+#!/usr/bin/perl -w
+# (C) 2011, Keiichi Kii <k-keiichi@bx.jp.nec.com>
+# Licensed under the terms of the GNU GPL License version 2
+
+# pagecache top
+#
+# Periodically display system-wide pagecache activity focusing on
+# process or file. If "process" arg is specified, it displays
+# pagecache behavior per each process. If "file" arg is specified,
+# it displays pagecache behavior per each file.
+
+use 5.010000;
+use strict;
+use warnings;
+
+use lib "$ENV{'PERF_EXEC_PATH'}/scripts/perl/Perf-Trace-Util/lib";
+use lib "./Perf-Trace-Util/lib";
+use Perf::Trace::Core;
+use Perf::Trace::Context;
+use Perf::Trace::Util;
+use File::Basename qw/basename/;
+
+my %files;
+my %processes;
+my $interval = 10;
+my $pre_print_time = 0;
+my $print_limit = 20;
+my $debugfs_mountpoint;
+my $mode = shift;
+
+sub trace_begin {
+    $debugfs_mountpoint = find_debugfs_mntpt();
+}
+
+sub find_debugfs_mntpt() {
+    my $path = "";
+    open my $fh, "<", "/proc/mounts"
+        or die "Can't open /proc/mounts: $!";
+    while (my $l = <$fh>) {
+        if ($l =~ /debugfs/) {
+            $path = (split(/\s/, $l))[1];
+        }
+    }
+    close($fh);
+    return $path;
+}
+
+sub mm::dump_inode {
+    my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+        $common_pid, $common_comm,
+        $ino, $size, $nrpages, $age, 
+        $state, $dev, $file) = @_;
+
+    my $f = get_file($dev, $ino);
+    return if !$f;
+
+    $$f{path} = $file;
+    $$f{cache} = $nrpages;
+}
+
+sub filemap::remove_from_page_cache {
+    my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+        $common_pid, $common_comm,
+        $s_dev, $i_ino, $offset) = @_;
+
+    my $s = get_stat($common_pid, $common_comm, $s_dev, $i_ino);
+    return if !$s;
+
+    $$s{remove}++;
+    print_check($common_secs);
+}
+
+sub filemap::add_to_page_cache {
+    my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+        $common_pid, $common_comm,
+        $s_dev, $i_ino, $offset) = @_;
+
+    my $s = get_stat($common_pid, $common_comm, $s_dev, $i_ino);
+    return if !$s;
+
+    $$s{add}++;
+    print_check($common_secs);
+}
+
+sub filemap::find_get_page {
+    my ($event_name, $context, $common_cpu, $common_secs, $common_nsecs,
+        $common_pid, $common_comm,
+        $s_dev, $i_ino, $offset, $page) = @_;
+
+    my $s = get_stat($common_pid, $common_comm, $s_dev, $i_ino);
+    return if !$s;
+
+    $$s{find}++;
+    if ($page != 0) {
+        $$s{hit}++;
+    }
+
+    print_check($common_secs);
+}
+
+sub get_file {
+    my ($dev, $inode) = @_;
+
+    return if major($dev) == 0;
+
+    if (!defined($files{$dev.":".$inode})) {
+        $files{$dev.":".$inode} = init_file($dev, $inode);
+    }
+
+    return $files{$dev.":".$inode};
+}
+
+sub get_stat {
+    my ($pid, $cmd, $dev, $inode) = @_;
+    my %stat = (find => 0, hit => 0, add => 0, remove => 0);
+
+    return if major($dev) == 0;
+
+    if (!defined($processes{$pid})) {
+        $processes{$pid} = init_process($pid, $cmd);
+    }
+    if (!defined($files{$dev.":".$inode})) {
+        $files{$dev.":".$inode} = init_file($dev, $inode);
+    }
+    if (!defined($processes{$pid}{file}{$dev.":".$inode})) {
+        $files{$dev.":".$inode}{process}{$pid} = \%stat;
+        $processes{$pid}{file}{$dev.":".$inode} = \%stat;
+        return \%stat;
+    }
+    return $files{$dev.":".$inode}{process}{$pid};
+}
+
+sub init_file() {
+    my ($dev, $inode) = @_;
+    my %f;
+
+    $f{path} = major($dev).":".minor($dev).",".$inode;
+    $f{cache} = 0;
+    $f{stat} = {find => 0, hit => 0, add => 0, remove => 0};
+    $f{process} = {};
+
+    return \%f;
+}
+
+sub init_process() {
+    my ($pid, $cmd) = @_;
+    my %p;
+
+    $p{name} = $cmd."-".$pid;
+    $p{stat} = {find => 0, hit => 0, add => 0, remove => 0};
+    $p{file} = {};
+
+    return \%p;
+}
+
+sub print_check() {
+    my $cur_sec = shift;
+
+    if ($pre_print_time == 0) {
+        $pre_print_time = $cur_sec;
+        return
+    }
+    if ($cur_sec - $pre_print_time > $interval) {
+        dump_fs_pagecache("/");
+        clear_term();
+        if ($mode eq "file") {
+            print_files($cur_sec);
+        } elsif ($mode eq "process") {
+            print_processes($cur_sec);
+        }
+        clear_stats();
+        $pre_print_time = $cur_sec;
+    }
+}
+
+sub clear_stats {
+    foreach my $f (values %files) {
+        $$f{stat} = {find => 0, hit => 0, add => 0, remove => 0};
+        $$f{process} = ();
+    }
+    %processes = ();
+}
+
+sub minor {
+    my $dev = shift;
+    return $dev & ((1 << 20) - 1);
+}
+
+sub major {
+    my $dev = shift;
+    return $dev >> 20;
+}
+
+sub print_files {
+    my $cur_sec = shift;
+    my $i = 0;
+
+    foreach my $f (values %files) {
+        foreach my $s (values %{$$f{process}}) {
+            add_stat($$f{stat}, $s);
+        }
+    }
+
+    printf "pagecache behavior per file (time:%d, interval:%d)\n\n"
+        ,$cur_sec, $interval;
+    printf("%20s %8s %10s %8s %8s %8s %5s\n",
+           "", "find", "hit", "cache", "add", "remove", "proc");
+    printf("%20s %8s %10s %8s %8s %8s %5s\n", "file", "count", "ratio",
+           "pages(B)", "pages(B)", "pages(B)", "count");
+    printf("%20s %8s %10s %8s %8s %8s %5s\n",
+           '-'x20, '-'x8, '-'x10, '-'x8, '-'x8, '-'x8, '-'x5);
+    foreach my $f (sort {$$b{stat}{find} <=> $$a{stat}{find}} values %files) {
+        $i++;
+        my $pcount = scalar(keys(%{$$f{process}}));
+        if ($pcount != 0) {
+            printf("%20s %8s %9.2f%% %8s %8s %8s %5d\n",
+                   substr(basename($$f{path}), 0, 20),
+                   $$f{stat}{find},
+                   ($$f{stat}{find} == 0) ?
+                   0 : $$f{stat}{hit} / $$f{stat}{find} * 100,
+                   ($$f{cache} != 0) ? convert_unit($$f{cache} * 4096): "N/A",
+                   convert_unit($$f{stat}{add} * 4096),
+                   convert_unit($$f{stat}{remove} * 4096),
+                   $pcount);
+        }
+        last if $i >= $print_limit;
+    }
+}
+
+sub print_processes {
+    my $cur_sec = shift;
+    my $i = 0;
+
+    foreach my $p (values %processes) {
+        foreach my $s (values %{$$p{file}}) {
+            add_stat($$p{stat}, $s);
+        }
+    }
+
+    printf "pagecache behavior per process (time:%d, interval:%d)\n\n"
+        ,$cur_sec, $interval;
+    printf("%20s %8s %10s %8s %8s %5s\n",
+           "", "find", "hit", "add", "remove", "file");
+    printf("%20s %8s %10s %8s %8s %5s\n", "process", "count", "ratio",
+           "pages(B)", "pages(B)", "count");
+    printf("%20s %8s %10s %8s %8s %5s\n",
+           '-'x20, '-'x8, '-'x10, '-'x8, '-'x8, '-'x5);
+    foreach my $p (sort {$$b{stat}{find} <=> $$a{stat}{find}} values %processes) {
+        $i++;
+        my $fcount = scalar(keys(%{$$p{file}}));
+        if ($fcount != 0) {
+            printf("%20s %8s %9.2f%% %8s %8s %5d\n",
+                   substr(basename($$p{name}), 0, 20),
+                   $$p{stat}{find},
+                   ($$p{stat}{find} == 0) ?
+                   0 : $$p{stat}{hit} / $$p{stat}{find} * 100,
+                   convert_unit($$p{stat}{add} * 4096),
+                   convert_unit($$p{stat}{remove} * 4096),
+                   $fcount);
+        }
+        last if $i >= $print_limit;
+    }
+}
+
+my @unit = ("K", "M", "G", "T");
+sub convert_unit() {
+    my $size = shift;
+
+     for (my $i=$#unit; $i >= 0; $i--) {
+        if (abs($size) >= 1024 ** ($i+1)) {
+            return sprintf("%.1f%s", $size/1024 ** ($i+1) , $unit[$i]);
+        }
+    }
+    return $size
+}
+
+sub dump_fs_pagecache() {
+    my $path = shift;
+    open my $fh, ">", "$debugfs_mountpoint/tracing/objects/mm/pages/walk-fs"
+        or die "Can't open tracing/objects/mm/pages/walk-fs: $!";
+    print $fh "$path\n";
+    close($fh);
+}
+
+sub add_stat {
+    my ($s1, $s2) = @_;
+
+    $$s1{find} += $$s2{find};
+    $$s1{hit} += $$s2{hit};
+    $$s1{add} += $$s2{add};
+    $$s1{remove} += $$s2{remove};
+}



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH -tip 0/5] perf tools: pagecache monitoring
  2011-07-18 21:31 [RFC PATCH -tip 0/5] perf tools: pagecache monitoring Keiichi KII
                   ` (4 preceding siblings ...)
  2011-07-18 21:40 ` [RFC PATCH -tip 5/5] perf tools: scripts for continuous pagecache monitoring Keiichi KII
@ 2011-07-21  7:01 ` Ingo Molnar
  2011-07-29  0:28   ` Keiichi KII
  2011-07-29  9:14 ` Mel Gorman
  6 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2011-07-21  7:01 UTC (permalink / raw)
  To: Keiichi KII, Wu Fengguang, Mel Gorman
  Cc: linux-kernel, linux-mm, Tom Zanussi, riel, Steven Rostedt,
	Fr??d??ric Weisbecker, Wu Fengguang, BA, Moussa


* Keiichi KII <k-keiichi@bx.jp.nec.com> wrote:

> Hello,
> 
> I would propose pagecache monitoring tools using perf tools.
> The purpose of the tools is to clarify pagecache behavior in a system.
> 
> We can now know system-wide pagecache usage by "/proc/meminfo".
> But we don't have any way to get higher resolution information like
> per file or per process usage than system-wide one.
> If pagecache miss hit ratio rises due to unnecessary adding/removing
> pagecaches, maybe it leads to extra I/O and affects system performance.
> But it's difficult to find out what is going on in the system.
> 
> So, the tools I propose provide 2 functions:
> 
> 1. pagecache snapshooting(perf script pagecache-snapshoot)
> 
> This function clarifies pagecache usage per each file in the system.
> This function is based mainly on "pagecache object collections" that is
> developed by Wu Fengguang (http://lkml.org/lkml/2010/2/9/156).
> The following is sample output of this function.
> 
> pagecache snapshooting (time: 14131, path: /home)
>                              file name cache(B)  file(B)  ratio  +/-(B)    age
> -------------------------------------- -------- -------- ------ ------- ------
> /home/foo/git/linux-2.6-tip/.git/objec    71.0M   436.6M    16%       0   9012
> /home/foo/git/linux-2.6-tip/.git/objec    49.6M    57.7M    86%       0   9012
> /home/foo/.thunderbird/xso5zn7g.defaul    19.8M    19.8M   100%       0   7223
> /home/foo/.thunderbird/xso5zn7g.defaul     5.7M     5.7M   100%       0   6621
> /home/foo/git/linux-2.6-tip/.git/index     3.5M     3.5M   100%       0   4306
> /home/foo/.thunderbird/xso5zn7g.defaul     2.2M     2.2M   100%       0   7524
> /home/foo/.thunderbird/xso5zn7g.defaul     2.2M     2.2M   100%       0   7526
> /home/foo/.thunderbird/xso5zn7g.defaul     1.7M     1.7M   100%       0   6921
> ...
> 
> 2. continuous pagecache monitoring(perf script pagecachetop)
> 
> This function clarifies pagecache behavior like pagecache hit ratio and
> added/removed pagecache amount on the basis of file/process.
> This functions is based on pagecache tracepoints I propose.
> While the pagecache snapshooting can take a pagecache snapshoot at a point,
> the continuous pagecache monitoring can measure dynamic change between
> 2 snapshoots.
> The following is sample output of this function.
> 
> pagecache behavior per file (time:15826, interval:10)
> 
>                          find        hit    cache      add   remove  proc
>                 file    count      ratio pages(B) pages(B) pages(B) count
> -------------------- -------- ---------- -------- -------- -------- -----
>         libc-2.13.so      620    100.00%     1.2M        0        0     7
>                 bash      283    100.00%   888.0K        0        0     6
>           ld-2.13.so      136    100.00%   148.0K        0        0     6
>                 gawk      130    100.00%   376.0K        0        0     2
>          ld.so.cache       60    100.00%   116.0K        0        0     4
> ...
> 
> pagecache behavior per process (time:16294, interval:10)
> 
>                          find        hit      add   remove  file
>              process    count      ratio pages(B) pages(B) count
> -------------------- -------- ---------- -------- -------- -----
>             zsh-7761     2968     99.93%     4.0K        0   246
>            perf-7758      369    100.00%        0        0    17
>            xmms-7634       52    100.00%        0        0     1
>            perf-7759       11    100.00%        0        0     2
>             zsh-2815        6     83.33%     4.0K     4.0K     2
>        gconfd-2-4849        3      0.00%    12.0K    12.0K     4
>        rsyslogd-7194        1    100.00%        0        0     1
> 
> 
> By these 2 functions, we can find out whether pagecaches are used
> efficiently or not.
> And also these tools would help us tune some applications like database.
> It will also help us tune the kernel parameters like "vm.dirty_*".
> 
> My patches are based on the latest "linux-tip.git" tree and
> also the following 3 commits in "tip:tracing/mm" and a "pagecache
> object collections" patch. 
> 
>   - dcac8cd: tracing/mm: add page frame snapshot trace
>   - 1487a7a: tracing/mm: fix mapcount trace record field
>   - eb46710: tracing/mm: rename 'trigger' file to 'dump_range'
>   - http://lkml.org/lkml/2010/2/9/156
> 
> Any comments are welcome.

I totally like the approach you have taken here.

Note that tracepoints need a detailed, careful review from interested 
mm folks.

The set of tracepoints does not have to be complete but the 
tracepoints have to be well thought out and near-perfect in this 
context they are instrumenting, with an eye on future extensions with 
the goal of making them painless.

the pagecache tracepoints you have added are:

 include/trace/events/filemap.h |   75 ++++++++++++++++++++++++++++++++++++++++
 mm/filemap.c                   |    4 ++
 mm/truncate.c                  |    2 +
 mm/vmscan.c                    |    2 +
 4 files changed, 83 insertions(+), 0 deletions(-)

So once such kind of review has been iterated through and Andrew et 
al is happy with it i'd be more than happy to dust off the tracing/mm 
bits (which have been done two years ago) and get it all to Linus.

Andrew, Mel, Fengguang?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH -tip 0/5] perf tools: pagecache monitoring
  2011-07-21  7:01 ` [RFC PATCH -tip 0/5] perf tools: " Ingo Molnar
@ 2011-07-29  0:28   ` Keiichi KII
  0 siblings, 0 replies; 11+ messages in thread
From: Keiichi KII @ 2011-07-29  0:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Wu Fengguang, Mel Gorman, linux-kernel, linux-mm, Tom Zanussi,
	riel, Steven Rostedt, Fr??d??ric Weisbecker, BA, Moussa

>> My patches are based on the latest "linux-tip.git" tree and
>> also the following 3 commits in "tip:tracing/mm" and a "pagecache
>> object collections" patch. 
>>
>>   - dcac8cd: tracing/mm: add page frame snapshot trace
>>   - 1487a7a: tracing/mm: fix mapcount trace record field
>>   - eb46710: tracing/mm: rename 'trigger' file to 'dump_range'
>>   - http://lkml.org/lkml/2010/2/9/156
>>
>> Any comments are welcome.
> 
> I totally like the approach you have taken here.
> 
> Note that tracepoints need a detailed, careful review from interested 
> mm folks.
> 
> The set of tracepoints does not have to be complete but the 
> tracepoints have to be well thought out and near-perfect in this 
> context they are instrumenting, with an eye on future extensions with 
> the goal of making them painless.
> 
> the pagecache tracepoints you have added are:
> 
>  include/trace/events/filemap.h |   75 ++++++++++++++++++++++++++++++++++++++++
>  mm/filemap.c                   |    4 ++
>  mm/truncate.c                  |    2 +
>  mm/vmscan.c                    |    2 +
>  4 files changed, 83 insertions(+), 0 deletions(-)
> 
> So once such kind of review has been iterated through and Andrew et 
> al is happy with it i'd be more than happy to dust off the tracing/mm 
> bits (which have been done two years ago) and get it all to Linus.
> 
> Andrew, Mel, Fengguang?

Thank you for your comments. 
And I agree the tracepoints need reviews from mm folks.
I think the patches in tracing/mm are useful for monitoring mm behavior.
Is there any comments for these patches(especially the tracepoints)?
Any comments are welcome.

Thanks,
Keiichi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH -tip 0/5] perf tools: pagecache monitoring
  2011-07-18 21:31 [RFC PATCH -tip 0/5] perf tools: pagecache monitoring Keiichi KII
                   ` (5 preceding siblings ...)
  2011-07-21  7:01 ` [RFC PATCH -tip 0/5] perf tools: " Ingo Molnar
@ 2011-07-29  9:14 ` Mel Gorman
  6 siblings, 0 replies; 11+ messages in thread
From: Mel Gorman @ 2011-07-29  9:14 UTC (permalink / raw)
  To: Keiichi KII
  Cc: linux-kernel, Ingo Molnar, linux-mm, Tom Zanussi, riel,
	Steven Rostedt, Fr??d??ric Weisbecker, Wu Fengguang, BA, Moussa

On Mon, Jul 18, 2011 at 05:31:09PM -0400, Keiichi KII wrote:
> Hello,
> 
> I would propose pagecache monitoring tools using perf tools.
> The purpose of the tools is to clarify pagecache behavior in a system.
> 
> We can now know system-wide pagecache usage by "/proc/meminfo".
> But we don't have any way to get higher resolution information like
> per file or per process usage than system-wide one.

It's convulated and slow but it can be determined what processes are
mapping pages from a file by scanning /proc/PID/maps to identify what
files are mapped and then using /proc/PID/pageflags to see what pages
are mapped on a per-process basis.

> If pagecache miss hit ratio rises due to unnecessary adding/removing
> pagecaches, maybe it leads to extra I/O and affects system performance.
> But it's difficult to find out what is going on in the system.
> 

What is "unnecessary adding/removing pagecaches"? If it's added, it
means either someone has faulted the page or it is due to readahead.
It's necessary to something.

> So, the tools I propose provide 2 functions:
> 
> 1. pagecache snapshooting(perf script pagecache-snapshoot)
> 

snapshot, not snapshoot :)

> This function clarifies pagecache usage per each file in the system.
> This function is based mainly on "pagecache object collections" that is
> developed by Wu Fengguang (http://lkml.org/lkml/2010/2/9/156).
> The following is sample output of this function.
> 
> pagecache snapshooting (time: 14131, path: /home)
>                              file name cache(B)  file(B)  ratio  +/-(B)    age
> -------------------------------------- -------- -------- ------ ------- ------
> /home/foo/git/linux-2.6-tip/.git/objec    71.0M   436.6M    16%       0   9012
> /home/foo/git/linux-2.6-tip/.git/objec    49.6M    57.7M    86%       0   9012
> /home/foo/.thunderbird/xso5zn7g.defaul    19.8M    19.8M   100%       0   7223
> /home/foo/.thunderbird/xso5zn7g.defaul     5.7M     5.7M   100%       0   6621
> /home/foo/git/linux-2.6-tip/.git/index     3.5M     3.5M   100%       0   4306
> /home/foo/.thunderbird/xso5zn7g.defaul     2.2M     2.2M   100%       0   7524
> /home/foo/.thunderbird/xso5zn7g.defaul     2.2M     2.2M   100%       0   7526
> /home/foo/.thunderbird/xso5zn7g.defaul     1.7M     1.7M   100%       0   6921
> ...
> 
> 2. continuous pagecache monitoring(perf script pagecachetop)
> 
> This function clarifies pagecache behavior like pagecache hit ratio and
> added/removed pagecache amount on the basis of file/process.
> This functions is based on pagecache tracepoints I propose.
> While the pagecache snapshooting can take a pagecache snapshoot at a point,
> the continuous pagecache monitoring can measure dynamic change between
> 2 snapshoots.
> The following is sample output of this function.
> 
> pagecache behavior per file (time:15826, interval:10)
> 
>                          find        hit    cache      add   remove  proc
>                 file    count      ratio pages(B) pages(B) pages(B) count
> -------------------- -------- ---------- -------- -------- -------- -----
>         libc-2.13.so      620    100.00%     1.2M        0        0     7
>                 bash      283    100.00%   888.0K        0        0     6
>           ld-2.13.so      136    100.00%   148.0K        0        0     6
>                 gawk      130    100.00%   376.0K        0        0     2
>          ld.so.cache       60    100.00%   116.0K        0        0     4
> ...
> 
> pagecache behavior per process (time:16294, interval:10)
> 
>                          find        hit      add   remove  file
>              process    count      ratio pages(B) pages(B) count
> -------------------- -------- ---------- -------- -------- -----
>             zsh-7761     2968     99.93%     4.0K        0   246
>            perf-7758      369    100.00%        0        0    17
>            xmms-7634       52    100.00%        0        0     1
>            perf-7759       11    100.00%        0        0     2
>             zsh-2815        6     83.33%     4.0K     4.0K     2
>        gconfd-2-4849        3      0.00%    12.0K    12.0K     4
>        rsyslogd-7194        1    100.00%        0        0     1
> 

Ok, I can see how this could identify hot files in the system and
processes that are currently active but I'm not seeing what problem
could be solved with this information.

For example, a process that is suffering a poor hit ratio can usually
be identified with a combination of vmstat showing a large amount
of activity with page in/out or swap in/out and a profile showing
a process that is spending a high percentage of its time servicing
faults. If a profile is unsuitable, picking active processes from
top and then monitoring /proc/PID/stat and the fault counters can
identify which process in trouble.

Can you give an example of a problem that using this tool would
identify and help resolve?

> By these 2 functions, we can find out whether pagecaches are used
> efficiently or not.
> And also these tools would help us tune some applications like database.

If the database is using direct IO, will it even show up in this tool?

> It will also help us tune the kernel parameters like "vm.dirty_*".
> 

Does it though? A dirty_ratio that is too low can cause stalls as
processes get throttled cleaning pages but the pages are still present
in the page cache and the hit ratio would not be affected.

I'm not saying this is a bad idea but the changelog needs to be improved
to explain why this information is helpful and why the existing
mechanisms are either unsuitable or just too awkward to use.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH -tip 2/5] tracing/mm: add header event for object collections
  2011-07-18 21:34 ` [RFC PATCH -tip 2/5] tracing/mm: add header event for object collections Keiichi KII
@ 2011-07-29  9:55   ` Mel Gorman
  0 siblings, 0 replies; 11+ messages in thread
From: Mel Gorman @ 2011-07-29  9:55 UTC (permalink / raw)
  To: Keiichi KII
  Cc: linux-kernel, Ingo Molnar, linux-mm, Tom Zanussi, riel,
	Steven Rostedt, Fr??d??ric Weisbecker, Wu Fengguang, BA, Moussa

On Mon, Jul 18, 2011 at 05:34:45PM -0400, Keiichi KII wrote:
> From: Keiichi Kii <k-keiichi@bx.jp.nec.com>
> 
> We can use this "dump_header" event to separate trace data
> for the object collections.
> 

dump_header is a very generic name. A "header" could apply to almost
anything. Network packets have headers but is unrelated to this.

> Usage and Sample output:
> 
> zsh  2815 [001]  8819.880776: dump_header: object=mm/pages/walk-fs input=/
> zsh  2815 [001]  8819.880786: dump_inode: ino=139161 size=507416 cached=507904 age=29 dirty=7 dev=254:0 file=strchr
> zsh  2815 [001]  8819.880790: dump_pagecache_range: index=0 len=1 flags=4000000000000878 count=2 mapcount=0
> zsh  2815 [001]  8819.880793: dump_pagecache_range: index=1 len=18 flags=400000000000087c count=2 mapcount=0
> zsh  2815 [001]  8819.880795: dump_pagecache_range: index=19 len=1 flags=400000000000083c count=2 mapcount=0
> zsh  2815 [001]  8819.880796: dump_pagecache_range: index=20 len=2 flags=400000000000087c count=2 mapcount=0
> ...
> zsh  2816 [001]  8820.XXXXXX: dump_header: object=mm/pages/walk-fs input=/

Is it possible for other trace information to appear in the middle of
this? In particular, is it possible for a new "dump_header" to appear in
the middle of an existing dump?

> Signed-off-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
> ---
> 
>  include/trace/events/mm.h |   19 +++++++++++++++++++
>  kernel/trace/trace_mm.c   |    9 +++++++++
>  2 files changed, 28 insertions(+), 0 deletions(-)
> 

Where are these files? Your leader makes reference to latest linux-tip
but there are a few trees called linux-tip. Even then, which latest
branch? I dug through Ingo's linux-tip tree but couldn't find where the
dump_inode tracepoint was to look at it so I couldn't review the
changes. Sorry if I missed something obvious :(

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH -tip 4/5] tracepoints: add tracepoints for pagecache
  2011-07-18 21:38 ` [RFC PATCH -tip 4/5] tracepoints: add tracepoints for pagecache Keiichi KII
@ 2011-07-29 10:06   ` Mel Gorman
  0 siblings, 0 replies; 11+ messages in thread
From: Mel Gorman @ 2011-07-29 10:06 UTC (permalink / raw)
  To: Keiichi KII
  Cc: linux-kernel, Ingo Molnar, linux-mm, Tom Zanussi, riel,
	Steven Rostedt, Fr??d??ric Weisbecker, Wu Fengguang, BA, Moussa

On Mon, Jul 18, 2011 at 05:38:03PM -0400, Keiichi KII wrote:
> From: Keiichi Kii <k-keiichi@bx.jp.nec.com>
> 
> This patch adds several tracepoints to track pagecach behavior.
> These trecepoints would help us monitor pagecache usage with high resolution.
> 

There are a few spelling mistakes there but what harm. This is an RFC.

Again, it would be really nice if the changelog explained why it was
useful to monitor pagecache usage at this resolution. For example,
I could identify files with high and low hit ratios and conceivably
identify system activity that resulted in page cache being trashed.
However, even in that case, I don't necessarily care which files got
chucked out and that sort of problem can also be seen via fault rates.

Another scenario that may be useful is it could potentially identify an
application bug that was invalidating a portion of a file that was in
fact hot and in use by other processes. I'm sure you have much better
examples that motivated the development of this series :)

The tracepoints themselves look fine.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2011-07-29 10:06 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-18 21:31 [RFC PATCH -tip 0/5] perf tools: pagecache monitoring Keiichi KII
2011-07-18 21:33 ` [RFC PATCH -tip 1/5] perf tools: handle '-' and '*' in trace parsing Keiichi KII
2011-07-18 21:34 ` [RFC PATCH -tip 2/5] tracing/mm: add header event for object collections Keiichi KII
2011-07-29  9:55   ` Mel Gorman
2011-07-18 21:36 ` [RFC PATCH -tip 3/5] perf tools: scripts for pagecache snapshooting Keiichi KII
2011-07-18 21:38 ` [RFC PATCH -tip 4/5] tracepoints: add tracepoints for pagecache Keiichi KII
2011-07-29 10:06   ` Mel Gorman
2011-07-18 21:40 ` [RFC PATCH -tip 5/5] perf tools: scripts for continuous pagecache monitoring Keiichi KII
2011-07-21  7:01 ` [RFC PATCH -tip 0/5] perf tools: " Ingo Molnar
2011-07-29  0:28   ` Keiichi KII
2011-07-29  9:14 ` Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).