linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Add tracepoints to track pagecache transition
@ 2009-02-17  9:00 Atsushi Tsuji
  2009-02-17  9:33 ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Atsushi Tsuji @ 2009-02-17  9:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jason Baron, Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler,
	Kazuto Miyoshi, rostedt

[-- Attachment #1: Type: text/plain, Size: 2564 bytes --]

Hi,

The below patch adds instrumentation for pagecache.

I thought it would be useful to trace pagecache behavior for problem
analysis (performance bottlenecks, behavior differences between stable
time and trouble time).

By using those tracepoints, we can describe and visualize pagecache
transition (file-by-file basis) in kernel and  pagecache
consumes most of the memory in running system and pagecache hit rate
and writeback behavior will influence system load and performance.

I attached an example which is visualization of pagecache status using
SystemTap. That graph describes pagecache transition of File A and File B
on a file-by-file basis with the situation where regular I/O to File A
is delayed because of other I/O to File B. We visually understand
pagecache for File A is narrowed down due to I/O pressure from File B.


Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
---
diff --git a/include/trace/filemap.h b/include/trace/filemap.h
new file mode 100644
index 0000000..196955e
--- /dev/null
+++ b/include/trace/filemap.h
@@ -0,0 +1,13 @@
+#ifndef _TRACE_FILEMAP_H
+#define _TRACE_FILEMAP_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_TRACE(filemap_add_to_page_cache,
+	TPPROTO(struct address_space *mapping, pgoff_t offset),
+	TPARGS(mapping, offset));
+DECLARE_TRACE(filemap_remove_from_page_cache,
+	TPPROTO(struct address_space *mapping),
+	TPARGS(mapping));
+
+#endif
diff --git a/mm/filemap.c b/mm/filemap.c
index 23acefe..76a6887 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -34,6 +34,7 @@
 #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
 #include <linux/memcontrol.h>
 #include <linux/mm_inline.h> /* for page_is_file_cache() */
+#include <trace/filemap.h>
 #include "internal.h"
 
 /*
@@ -43,6 +44,8 @@
 
 #include <asm/mman.h>
 
+DEFINE_TRACE(filemap_add_to_page_cache);
+DEFINE_TRACE(filemap_remove_from_page_cache);
 
 /*
  * Shared mappings implemented 30.11.1994. It's not fully working yet,
@@ -120,6 +123,7 @@ void __remove_from_page_cache(struct page *page)
 	page->mapping = NULL;
 	mapping->nrpages--;
 	__dec_zone_page_state(page, NR_FILE_PAGES);
+	trace_filemap_remove_from_page_cache(mapping);
 	BUG_ON(page_mapped(page));
 	mem_cgroup_uncharge_cache_page(page);
 
@@ -475,6 +479,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 		if (likely(!error)) {
 			mapping->nrpages++;
 			__inc_zone_page_state(page, NR_FILE_PAGES);
+			trace_filemap_add_to_page_cache(mapping, offset);
 		} else {
 			page->mapping = NULL;
 			mem_cgroup_uncharge_cache_page(page);



[-- Attachment #2: pgcache.jpg --]
[-- Type: image/jpeg, Size: 38334 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17  9:00 [PATCH] Add tracepoints to track pagecache transition Atsushi Tsuji
@ 2009-02-17  9:33 ` Peter Zijlstra
  2009-02-17 11:04   ` Atsushi Tsuji
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2009-02-17  9:33 UTC (permalink / raw)
  To: Atsushi Tsuji
  Cc: linux-kernel, Jason Baron, Ingo Molnar, Mathieu Desnoyers,
	Frank Ch. Eigler, Kazuto Miyoshi, rostedt, linux-mm,
	Andrew Morton, Nick Piggin, Hugh Dickins

On Tue, 2009-02-17 at 18:00 +0900, Atsushi Tsuji wrote:

> The below patch adds instrumentation for pagecache.

And somehow you forgot to CC any of the mm people.. ;-)

> I thought it would be useful to trace pagecache behavior for problem
> analysis (performance bottlenecks, behavior differences between stable
> time and trouble time).
> 
> By using those tracepoints, we can describe and visualize pagecache
> transition (file-by-file basis) in kernel and  pagecache
> consumes most of the memory in running system and pagecache hit rate
> and writeback behavior will influence system load and performance.

> Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
> ---
> diff --git a/include/trace/filemap.h b/include/trace/filemap.h
> new file mode 100644
> index 0000000..196955e
> --- /dev/null
> +++ b/include/trace/filemap.h
> @@ -0,0 +1,13 @@
> +#ifndef _TRACE_FILEMAP_H
> +#define _TRACE_FILEMAP_H
> +
> +#include <linux/tracepoint.h>
> +
> +DECLARE_TRACE(filemap_add_to_page_cache,
> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> +	TPARGS(mapping, offset));
> +DECLARE_TRACE(filemap_remove_from_page_cache,
> +	TPPROTO(struct address_space *mapping),
> +	TPARGS(mapping));

This is rather asymmetric, why don't we care about the offset for the
removed page?

> +#endif
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 23acefe..76a6887 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -34,6 +34,7 @@
>  #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
>  #include <linux/memcontrol.h>
>  #include <linux/mm_inline.h> /* for page_is_file_cache() */
> +#include <trace/filemap.h>
>  #include "internal.h"
>  
>  /*
> @@ -43,6 +44,8 @@
>  
>  #include <asm/mman.h>
>  
> +DEFINE_TRACE(filemap_add_to_page_cache);
> +DEFINE_TRACE(filemap_remove_from_page_cache);
>  
>  /*
>   * Shared mappings implemented 30.11.1994. It's not fully working yet,
> @@ -120,6 +123,7 @@ void __remove_from_page_cache(struct page *page)
>  	page->mapping = NULL;
>  	mapping->nrpages--;
>  	__dec_zone_page_state(page, NR_FILE_PAGES);
> +	trace_filemap_remove_from_page_cache(mapping);
>  	BUG_ON(page_mapped(page));
>  	mem_cgroup_uncharge_cache_page(page);
>  
> @@ -475,6 +479,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  		if (likely(!error)) {
>  			mapping->nrpages++;
>  			__inc_zone_page_state(page, NR_FILE_PAGES);
> +			trace_filemap_add_to_page_cache(mapping, offset);
>  		} else {
>  			page->mapping = NULL;
>  			mem_cgroup_uncharge_cache_page(page);
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17  9:33 ` Peter Zijlstra
@ 2009-02-17 11:04   ` Atsushi Tsuji
  2009-02-17 11:38     ` KOSAKI Motohiro
  2009-02-17 15:24     ` Steven Rostedt
  0 siblings, 2 replies; 14+ messages in thread
From: Atsushi Tsuji @ 2009-02-17 11:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Jason Baron, Ingo Molnar, Mathieu Desnoyers,
	Frank Ch. Eigler, Kazuto Miyoshi, rostedt, linux-mm,
	Andrew Morton, Nick Piggin, Hugh Dickins

Peter Zijlstra wrote:
> On Tue, 2009-02-17 at 18:00 +0900, Atsushi Tsuji wrote:
> 
>> The below patch adds instrumentation for pagecache.
> 
> And somehow you forgot to CC any of the mm people.. ;-)

Hi Peter,

Ah, sorry.
Thank you for adding to CC list.

>> +DECLARE_TRACE(filemap_add_to_page_cache,
>> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
>> +	TPARGS(mapping, offset));
>> +DECLARE_TRACE(filemap_remove_from_page_cache,
>> +	TPPROTO(struct address_space *mapping),
>> +	TPARGS(mapping));
> 
> This is rather asymmetric, why don't we care about the offset for the
> removed page?
> 

Indeed.
I added the offset to the argument for the removed page and resend fixed patch.

Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
---
diff --git a/include/trace/filemap.h b/include/trace/filemap.h
new file mode 100644
index 0000000..a17dc92
--- /dev/null
+++ b/include/trace/filemap.h
@@ -0,0 +1,13 @@
+#ifndef _TRACE_FILEMAP_H
+#define _TRACE_FILEMAP_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_TRACE(filemap_add_to_page_cache,
+	TPPROTO(struct address_space *mapping, pgoff_t offset),
+	TPARGS(mapping, offset));
+DECLARE_TRACE(filemap_remove_from_page_cache,
+	TPPROTO(struct address_space *mapping, pgoff_t offset),
+	TPARGS(mapping, offset));
+
+#endif
diff --git a/mm/filemap.c b/mm/filemap.c
index 23acefe..23f75d2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -34,6 +34,7 @@
 #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
 #include <linux/memcontrol.h>
 #include <linux/mm_inline.h> /* for page_is_file_cache() */
+#include <trace/filemap.h>
 #include "internal.h"
 
 /*
@@ -43,6 +44,8 @@
 
 #include <asm/mman.h>
 
+DEFINE_TRACE(filemap_add_to_page_cache);
+DEFINE_TRACE(filemap_remove_from_page_cache);
 
 /*
  * Shared mappings implemented 30.11.1994. It's not fully working yet,
@@ -120,6 +123,7 @@ void __remove_from_page_cache(struct page *page)
 	page->mapping = NULL;
 	mapping->nrpages--;
 	__dec_zone_page_state(page, NR_FILE_PAGES);
+	trace_filemap_remove_from_page_cache(mapping, page->index);
 	BUG_ON(page_mapped(page));
 	mem_cgroup_uncharge_cache_page(page);
 
@@ -475,6 +479,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 		if (likely(!error)) {
 			mapping->nrpages++;
 			__inc_zone_page_state(page, NR_FILE_PAGES);
+			trace_filemap_add_to_page_cache(mapping, offset);
 		} else {
 			page->mapping = NULL;
 			mem_cgroup_uncharge_cache_page(page);


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 11:04   ` Atsushi Tsuji
@ 2009-02-17 11:38     ` KOSAKI Motohiro
  2009-02-17 11:54       ` Peter Zijlstra
                         ` (2 more replies)
  2009-02-17 15:24     ` Steven Rostedt
  1 sibling, 3 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-02-17 11:38 UTC (permalink / raw)
  To: Atsushi Tsuji
  Cc: kosaki.motohiro, Peter Zijlstra, linux-kernel, Jason Baron,
	Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi,
	rostedt, linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins

Hi


In my 1st impression, this patch description is a bit strange.

> The below patch adds instrumentation for pagecache.
> 
> I thought it would be useful to trace pagecache behavior for problem
> analysis (performance bottlenecks, behavior differences between stable
> time and trouble time).
> 
> By using those tracepoints, we can describe and visualize pagecache
> transition (file-by-file basis) in kernel and  pagecache
> consumes most of the memory in running system and pagecache hit rate
> and writeback behavior will influence system load and performance.

Why do you think this tracepoint describe pagecache hit rate?
and, why describe writeback behavior?

> 
> I attached an example which is visualization of pagecache status using
> SystemTap. 

it seems no attached. and SystemTap isn't used kernel developer at all.
I don't think it's enough explanation.
Can you make seekwatcher liked completed comsumer program?
(if you don't know seekwatcher, see http://oss.oracle.com/~mason/seekwatcher/)

> That graph describes pagecache transition of File A and File B
> on a file-by-file basis with the situation where regular I/O to File A
> is delayed because of other I/O to File B. 

If you want to see I/O activity, you need to add tracepoint into block layer.

> We visually understand
> pagecache for File A is narrowed down due to I/O pressure from File B.

confused. Can we assume the number of anon pages/files pages ratio don't chage?


> Peter Zijlstra wrote:
> > On Tue, 2009-02-17 at 18:00 +0900, Atsushi Tsuji wrote:
> > 
> >> The below patch adds instrumentation for pagecache.
> > 
> > And somehow you forgot to CC any of the mm people.. ;-)
> 
> Hi Peter,
> 
> Ah, sorry.
> Thank you for adding to CC list.
> 
> >> +DECLARE_TRACE(filemap_add_to_page_cache,
> >> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> >> +	TPARGS(mapping, offset));
> >> +DECLARE_TRACE(filemap_remove_from_page_cache,
> >> +	TPPROTO(struct address_space *mapping),
> >> +	TPARGS(mapping));
> > 
> > This is rather asymmetric, why don't we care about the offset for the
> > removed page?
> > 
> 
> Indeed.
> I added the offset to the argument for the removed page and resend fixed patch.
> 
> Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
> ---
> diff --git a/include/trace/filemap.h b/include/trace/filemap.h

please add diffstat.


> new file mode 100644
> index 0000000..a17dc92
> --- /dev/null
> +++ b/include/trace/filemap.h
> @@ -0,0 +1,13 @@
> +#ifndef _TRACE_FILEMAP_H
> +#define _TRACE_FILEMAP_H
> +
> +#include <linux/tracepoint.h>
> +
> +DECLARE_TRACE(filemap_add_to_page_cache,
> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> +	TPARGS(mapping, offset));
> +DECLARE_TRACE(filemap_remove_from_page_cache,
> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> +	TPARGS(mapping, offset));
> +
> +#endif
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 23acefe..23f75d2 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -34,6 +34,7 @@
>  #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
>  #include <linux/memcontrol.h>
>  #include <linux/mm_inline.h> /* for page_is_file_cache() */
> +#include <trace/filemap.h>
>  #include "internal.h"
>  
>  /*
> @@ -43,6 +44,8 @@
>  
>  #include <asm/mman.h>
>  
> +DEFINE_TRACE(filemap_add_to_page_cache);
> +DEFINE_TRACE(filemap_remove_from_page_cache);
>  
>  /*
>   * Shared mappings implemented 30.11.1994. It's not fully working yet,
> @@ -120,6 +123,7 @@ void __remove_from_page_cache(struct page *page)
>  	page->mapping = NULL;
>  	mapping->nrpages--;
>  	__dec_zone_page_state(page, NR_FILE_PAGES);
> +	trace_filemap_remove_from_page_cache(mapping, page->index);

__remove_from_page_cache() is passed struct page.
Why don't you use struct page

and, this mean
  - the page have been removed from mapping.
  - vmstate have been decremented.
  - but, the page haven't been uncharged from memcg.

Why?


>  	BUG_ON(page_mapped(page));
>  	mem_cgroup_uncharge_cache_page(page);
>  
> @@ -475,6 +479,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  		if (likely(!error)) {
>  			mapping->nrpages++;
>  			__inc_zone_page_state(page, NR_FILE_PAGES);
> +			trace_filemap_add_to_page_cache(mapping, offset);

Why do you select this line?
In general, trace point calling under spin lock grabbing is a bit problematic.


>  		} else {
>  			page->mapping = NULL;
>  			mem_cgroup_uncharge_cache_page(page);
> 

And, both function is freqentlly called one.
I worry about performance issue. can you prove no degression?



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 11:38     ` KOSAKI Motohiro
@ 2009-02-17 11:54       ` Peter Zijlstra
  2009-02-17 12:33         ` KOSAKI Motohiro
  2009-02-17 14:33       ` Frederic Weisbecker
  2009-02-19  4:41       ` Atsushi Tsuji
  2 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2009-02-17 11:54 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Atsushi Tsuji, linux-kernel, Jason Baron, Ingo Molnar,
	Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi, rostedt,
	linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins,
	Arnaldo Carvalho de Melo

On Tue, 2009-02-17 at 20:38 +0900, KOSAKI Motohiro wrote:
> If you want to see I/O activity, you need to add tracepoint into block
> layer.

We already have that, Arnaldo converted blktrace to use tracepoints and
wrote an ftrace tracer for it.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 11:54       ` Peter Zijlstra
@ 2009-02-17 12:33         ` KOSAKI Motohiro
  0 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-02-17 12:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Atsushi Tsuji, linux-kernel, Jason Baron, Ingo Molnar,
	Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi, rostedt,
	linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins,
	Arnaldo Carvalho de Melo

>> If you want to see I/O activity, you need to add tracepoint into block
>> layer.
>
> We already have that, Arnaldo converted blktrace to use tracepoints and
> wrote an ftrace tracer for it.

Yup. I agree I selected wrong word.
My point is, if he want to know I/O delaying reason, the number of the
cache pages seems unrelated thing.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 11:38     ` KOSAKI Motohiro
  2009-02-17 11:54       ` Peter Zijlstra
@ 2009-02-17 14:33       ` Frederic Weisbecker
  2009-02-18  0:29         ` KOSAKI Motohiro
  2009-02-19  4:41       ` Atsushi Tsuji
  2 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2009-02-17 14:33 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Atsushi Tsuji, Peter Zijlstra, linux-kernel, Jason Baron,
	Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi,
	rostedt, linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins

On Tue, Feb 17, 2009 at 08:38:20PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> 
> In my 1st impression, this patch description is a bit strange.
> 
> > The below patch adds instrumentation for pagecache.
> > 
> > I thought it would be useful to trace pagecache behavior for problem
> > analysis (performance bottlenecks, behavior differences between stable
> > time and trouble time).
> > 
> > By using those tracepoints, we can describe and visualize pagecache
> > transition (file-by-file basis) in kernel and  pagecache
> > consumes most of the memory in running system and pagecache hit rate
> > and writeback behavior will influence system load and performance.
> 
> Why do you think this tracepoint describe pagecache hit rate?
> and, why describe writeback behavior?
> 
> > 
> > I attached an example which is visualization of pagecache status using
> > SystemTap. 
> 
> it seems no attached. and SystemTap isn't used kernel developer at all.
> I don't think it's enough explanation.
> Can you make seekwatcher liked completed comsumer program?
> (if you don't know seekwatcher, see http://oss.oracle.com/~mason/seekwatcher/)
> 
> > That graph describes pagecache transition of File A and File B
> > on a file-by-file basis with the situation where regular I/O to File A
> > is delayed because of other I/O to File B. 
> 
> If you want to see I/O activity, you need to add tracepoint into block layer.
> 
> > We visually understand
> > pagecache for File A is narrowed down due to I/O pressure from File B.
> 
> confused. Can we assume the number of anon pages/files pages ratio don't chage?
> 
> 
> > Peter Zijlstra wrote:
> > > On Tue, 2009-02-17 at 18:00 +0900, Atsushi Tsuji wrote:
> > > 
> > >> The below patch adds instrumentation for pagecache.
> > > 
> > > And somehow you forgot to CC any of the mm people.. ;-)
> > 
> > Hi Peter,
> > 
> > Ah, sorry.
> > Thank you for adding to CC list.
> > 
> > >> +DECLARE_TRACE(filemap_add_to_page_cache,
> > >> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> > >> +	TPARGS(mapping, offset));
> > >> +DECLARE_TRACE(filemap_remove_from_page_cache,
> > >> +	TPPROTO(struct address_space *mapping),
> > >> +	TPARGS(mapping));
> > > 
> > > This is rather asymmetric, why don't we care about the offset for the
> > > removed page?
> > > 
> > 
> > Indeed.
> > I added the offset to the argument for the removed page and resend fixed patch.
> > 
> > Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
> > ---
> > diff --git a/include/trace/filemap.h b/include/trace/filemap.h
> 
> please add diffstat.
> 
> 
> > new file mode 100644
> > index 0000000..a17dc92
> > --- /dev/null
> > +++ b/include/trace/filemap.h
> > @@ -0,0 +1,13 @@
> > +#ifndef _TRACE_FILEMAP_H
> > +#define _TRACE_FILEMAP_H
> > +
> > +#include <linux/tracepoint.h>
> > +
> > +DECLARE_TRACE(filemap_add_to_page_cache,
> > +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> > +	TPARGS(mapping, offset));
> > +DECLARE_TRACE(filemap_remove_from_page_cache,
> > +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> > +	TPARGS(mapping, offset));
> > +
> > +#endif
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 23acefe..23f75d2 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -34,6 +34,7 @@
> >  #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
> >  #include <linux/memcontrol.h>
> >  #include <linux/mm_inline.h> /* for page_is_file_cache() */
> > +#include <trace/filemap.h>
> >  #include "internal.h"
> >  
> >  /*
> > @@ -43,6 +44,8 @@
> >  
> >  #include <asm/mman.h>
> >  
> > +DEFINE_TRACE(filemap_add_to_page_cache);
> > +DEFINE_TRACE(filemap_remove_from_page_cache);
> >  
> >  /*
> >   * Shared mappings implemented 30.11.1994. It's not fully working yet,
> > @@ -120,6 +123,7 @@ void __remove_from_page_cache(struct page *page)
> >  	page->mapping = NULL;
> >  	mapping->nrpages--;
> >  	__dec_zone_page_state(page, NR_FILE_PAGES);
> > +	trace_filemap_remove_from_page_cache(mapping, page->index);
> 
> __remove_from_page_cache() is passed struct page.
> Why don't you use struct page
> 
> and, this mean
>   - the page have been removed from mapping.
>   - vmstate have been decremented.
>   - but, the page haven't been uncharged from memcg.
> 
> Why?
> 
> 
> >  	BUG_ON(page_mapped(page));
> >  	mem_cgroup_uncharge_cache_page(page);
> >  
> > @@ -475,6 +479,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
> >  		if (likely(!error)) {
> >  			mapping->nrpages++;
> >  			__inc_zone_page_state(page, NR_FILE_PAGES);
> > +			trace_filemap_add_to_page_cache(mapping, offset);
> 
> Why do you select this line?
> In general, trace point calling under spin lock grabbing is a bit problematic.
> 
> 
> >  		} else {
> >  			page->mapping = NULL;
> >  			mem_cgroup_uncharge_cache_page(page);
> > 
> 
> And, both function is freqentlly called one.
> I worry about performance issue. can you prove no degression?


It would be very hard to prove. Tracepoints are very cheap in that they only
add the overhead of a single branch check while off.

But are there some plans about writing a tracer or so for pagecache?

 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 11:04   ` Atsushi Tsuji
  2009-02-17 11:38     ` KOSAKI Motohiro
@ 2009-02-17 15:24     ` Steven Rostedt
  2009-02-19 18:51       ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2009-02-17 15:24 UTC (permalink / raw)
  To: Atsushi Tsuji
  Cc: Peter Zijlstra, linux-kernel, Jason Baron, Ingo Molnar,
	Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi, linux-mm,
	Andrew Morton, Nick Piggin, Hugh Dickins


On Tue, 17 Feb 2009, Atsushi Tsuji wrote:
> > 
> > This is rather asymmetric, why don't we care about the offset for the
> > removed page?
> > 
> 
> Indeed.
> I added the offset to the argument for the removed page and resend fixed patch.
> 
> Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>

Could you package it up in one patch again and resend with [PATCH v2].
Also make sure to Cc the memory folks, and ask for an Acked-by from them.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 14:33       ` Frederic Weisbecker
@ 2009-02-18  0:29         ` KOSAKI Motohiro
  0 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-02-18  0:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: kosaki.motohiro, Atsushi Tsuji, Peter Zijlstra, linux-kernel,
	Jason Baron, Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler,
	Kazuto Miyoshi, rostedt, linux-mm, Andrew Morton, Nick Piggin,
	Hugh Dickins

Hi Frederic,

> > And, both function is freqentlly called one.
> > I worry about performance issue. can you prove no degression?
> 
> It would be very hard to prove. Tracepoints are very cheap in that they only
> add the overhead of a single branch check while off.

this is typical reviewing comment.

Memory folks adage says,
	Don't believe theory, you believe benchmark result.

I don't oppose your theorical background opinion :)


> But are there some plans about writing a tracer or so for pagecache?




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 11:38     ` KOSAKI Motohiro
  2009-02-17 11:54       ` Peter Zijlstra
  2009-02-17 14:33       ` Frederic Weisbecker
@ 2009-02-19  4:41       ` Atsushi Tsuji
  2009-02-19 13:12         ` KOSAKI Motohiro
  2 siblings, 1 reply; 14+ messages in thread
From: Atsushi Tsuji @ 2009-02-19  4:41 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Peter Zijlstra, linux-kernel, Jason Baron, Ingo Molnar,
	Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi, rostedt,
	linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins

Hi Kosaki-san,

Thank you for your comment.

KOSAKI Motohiro wrote:
> Hi
> 
> 
> In my 1st impression, this patch description is a bit strange.
> 
>> The below patch adds instrumentation for pagecache.
>>
>> I thought it would be useful to trace pagecache behavior for problem
>> analysis (performance bottlenecks, behavior differences between stable
>> time and trouble time).
>>
>> By using those tracepoints, we can describe and visualize pagecache
>> transition (file-by-file basis) in kernel and  pagecache
>> consumes most of the memory in running system and pagecache hit rate
>> and writeback behavior will influence system load and performance.
> 
> Why do you think this tracepoint describe pagecache hit rate?
> and, why describe writeback behavior?

I mean, we can describe file-by-file basis pagecache usage by using 
these tracepoints and it is important for analyzing process I/O behavior. 
Currently, we can understand the amount of pagecache from "Cached"
in /proc/meminfo. So I'd like to understand which files are using pagecache. 

> 
>> I attached an example which is visualization of pagecache status using
>> SystemTap. 
> 
> it seems no attached. and SystemTap isn't used kernel developer at all.
> I don't think it's enough explanation.
> Can you make seekwatcher liked completed comsumer program?
> (if you don't know seekwatcher, see http://oss.oracle.com/~mason/seekwatcher/)

I understand a tracer using these tracepoints need to be implemented.
What I want to do is counting pagecache per file. We can retrieve inode
from mapping and count pagecache per inode in these tracepoints.

>> That graph describes pagecache transition of File A and File B
>> on a file-by-file basis with the situation where regular I/O to File A
>> is delayed because of other I/O to File B. 
> 
> If you want to see I/O activity, you need to add tracepoint into block layer.

I think tracking pagecache is useful for understanding process I/O activity,
because whether process I/O completes by accessing memory or HDD is determined by
accessed files on pagecache or not.

> 
>> We visually understand
>> pagecache for File A is narrowed down due to I/O pressure from File B.
> 
> confused. Can we assume the number of anon pages/files pages ratio don't chage?
> 
> 
>> Peter Zijlstra wrote:
>>> On Tue, 2009-02-17 at 18:00 +0900, Atsushi Tsuji wrote:
>>>
>>>> The below patch adds instrumentation for pagecache.
>>> And somehow you forgot to CC any of the mm people.. ;-)
>> Hi Peter,
>>
>> Ah, sorry.
>> Thank you for adding to CC list.
>>
>>>> +DECLARE_TRACE(filemap_add_to_page_cache,
>>>> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
>>>> +	TPARGS(mapping, offset));
>>>> +DECLARE_TRACE(filemap_remove_from_page_cache,
>>>> +	TPPROTO(struct address_space *mapping),
>>>> +	TPARGS(mapping));
>>> This is rather asymmetric, why don't we care about the offset for the
>>> removed page?
>>>
>> Indeed.
>> I added the offset to the argument for the removed page and resend fixed patch.
>>
>> Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
>> ---
>> diff --git a/include/trace/filemap.h b/include/trace/filemap.h
> 
> please add diffstat.
> 
> 
>> new file mode 100644
>> index 0000000..a17dc92
>> --- /dev/null
>> +++ b/include/trace/filemap.h
>> @@ -0,0 +1,13 @@
>> +#ifndef _TRACE_FILEMAP_H
>> +#define _TRACE_FILEMAP_H
>> +
>> +#include <linux/tracepoint.h>
>> +
>> +DECLARE_TRACE(filemap_add_to_page_cache,
>> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
>> +	TPARGS(mapping, offset));
>> +DECLARE_TRACE(filemap_remove_from_page_cache,
>> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
>> +	TPARGS(mapping, offset));
>> +
>> +#endif
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 23acefe..23f75d2 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -34,6 +34,7 @@
>>  #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
>>  #include <linux/memcontrol.h>
>>  #include <linux/mm_inline.h> /* for page_is_file_cache() */
>> +#include <trace/filemap.h>
>>  #include "internal.h"
>>  
>>  /*
>> @@ -43,6 +44,8 @@
>>  
>>  #include <asm/mman.h>
>>  
>> +DEFINE_TRACE(filemap_add_to_page_cache);
>> +DEFINE_TRACE(filemap_remove_from_page_cache);
>>  
>>  /*
>>   * Shared mappings implemented 30.11.1994. It's not fully working yet,
>> @@ -120,6 +123,7 @@ void __remove_from_page_cache(struct page *page)
>>  	page->mapping = NULL;
>>  	mapping->nrpages--;
>>  	__dec_zone_page_state(page, NR_FILE_PAGES);
>> +	trace_filemap_remove_from_page_cache(mapping, page->index);
> 
> __remove_from_page_cache() is passed struct page.
> Why don't you use struct page
> 
> and, this mean
>   - the page have been removed from mapping.
>   - vmstate have been decremented.
>   - but, the page haven't been uncharged from memcg.
> 
> Why?
> 
> 
>>  	BUG_ON(page_mapped(page));
>>  	mem_cgroup_uncharge_cache_page(page);
>>  
>> @@ -475,6 +479,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>>  		if (likely(!error)) {
>>  			mapping->nrpages++;
>>  			__inc_zone_page_state(page, NR_FILE_PAGES);
>> +			trace_filemap_add_to_page_cache(mapping, offset);
> 
> Why do you select this line?
> In general, trace point calling under spin lock grabbing is a bit problematic.
 
I understand my patch is wrong. I will fix and send it later.

> 
>>  		} else {
>>  			page->mapping = NULL;
>>  			mem_cgroup_uncharge_cache_page(page);
>>
> 
> And, both function is freqentlly called one.
> I worry about performance issue. can you prove no degression?

I will try to probe that.

Thanks,
-Atsushi


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-19  4:41       ` Atsushi Tsuji
@ 2009-02-19 13:12         ` KOSAKI Motohiro
  2009-02-19 14:21           ` Lee Schermerhorn
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-02-19 13:12 UTC (permalink / raw)
  To: Atsushi Tsuji
  Cc: Peter Zijlstra, linux-kernel, Jason Baron, Ingo Molnar,
	Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi, rostedt,
	linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins

> Hi Kosaki-san,
>
> Thank you for your comment.
>
> KOSAKI Motohiro wrote:
>> Hi
>>
>>
>> In my 1st impression, this patch description is a bit strange.
>>
>>> The below patch adds instrumentation for pagecache.
>>>
>>> I thought it would be useful to trace pagecache behavior for problem
>>> analysis (performance bottlenecks, behavior differences between stable
>>> time and trouble time).
>>>
>>> By using those tracepoints, we can describe and visualize pagecache
>>> transition (file-by-file basis) in kernel and  pagecache
>>> consumes most of the memory in running system and pagecache hit rate
>>> and writeback behavior will influence system load and performance.
>>
>> Why do you think this tracepoint describe pagecache hit rate?
>> and, why describe writeback behavior?
>
> I mean, we can describe file-by-file basis pagecache usage by using
> these tracepoints and it is important for analyzing process I/O behavior.

More confusing.
Your page cache tracepoint don't have any per-process information.


> Currently, we can understand the amount of pagecache from "Cached"
> in /proc/meminfo. So I'd like to understand which files are using pagecache.

There is one meta question, Why do you think file-by-file pagecache
infomartion is valueable?


>>> I attached an example which is visualization of pagecache status using
>>> SystemTap.
>>
>> it seems no attached. and SystemTap isn't used kernel developer at all.
>> I don't think it's enough explanation.
>> Can you make seekwatcher liked completed comsumer program?
>> (if you don't know seekwatcher, see http://oss.oracle.com/~mason/seekwatcher/)
>
> I understand a tracer using these tracepoints need to be implemented.
> What I want to do is counting pagecache per file. We can retrieve inode
> from mapping and count pagecache per inode in these tracepoints.
>
>
>>> That graph describes pagecache transition of File A and File B
>>> on a file-by-file basis with the situation where regular I/O to File A
>>> is delayed because of other I/O to File B.
>>
>> If you want to see I/O activity, you need to add tracepoint into block layer.
>
> I think tracking pagecache is useful for understanding process I/O activity,
> because whether process I/O completes by accessing memory or HDD is determined by
> accessed files on pagecache or not.

I don't know your opinion is right or not.
However, your opinion don't get consensus yet.

Perhaps, you need to make demonstrate programs, I think.


>> And, both function is freqentlly called one.
>> I worry about performance issue. can you prove no degression?
>
> I will try to probe that.

Perhaps, this is not needed yet.
Generally, worthless patch is never merged although it's no performance penalty.
So, I think you should explain the patch worth at first :)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-19 13:12         ` KOSAKI Motohiro
@ 2009-02-19 14:21           ` Lee Schermerhorn
  2009-02-20  1:13             ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Lee Schermerhorn @ 2009-02-19 14:21 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Atsushi Tsuji, Peter Zijlstra, linux-kernel, Jason Baron,
	Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi,
	rostedt, linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins

On Thu, 2009-02-19 at 22:12 +0900, KOSAKI Motohiro wrote:
> > Hi Kosaki-san,
> >
> > Thank you for your comment.
> >
> > KOSAKI Motohiro wrote:
> >> Hi
> >>
> >>
> >> In my 1st impression, this patch description is a bit strange.
> >>
> >>> The below patch adds instrumentation for pagecache.
> >>>
> >>> I thought it would be useful to trace pagecache behavior for problem
> >>> analysis (performance bottlenecks, behavior differences between stable
> >>> time and trouble time).
> >>>
> >>> By using those tracepoints, we can describe and visualize pagecache
> >>> transition (file-by-file basis) in kernel and  pagecache
> >>> consumes most of the memory in running system and pagecache hit rate
> >>> and writeback behavior will influence system load and performance.
> >>
> >> Why do you think this tracepoint describe pagecache hit rate?
> >> and, why describe writeback behavior?
> >
> > I mean, we can describe file-by-file basis pagecache usage by using
> > these tracepoints and it is important for analyzing process I/O behavior.
> 
> More confusing.
> Your page cache tracepoint don't have any per-process information.
> 
> 
> > Currently, we can understand the amount of pagecache from "Cached"
> > in /proc/meminfo. So I'd like to understand which files are using pagecache.
> 
> There is one meta question, Why do you think file-by-file pagecache
> infomartion is valueable?
> 

One might take a look at Marcello Tosatti's old 'vmtrace' patch.  It
contains it's own data store/transport via relayfs, but the trace points
could be ported to the current kernel tracing infrastructure.

Here's a starting point:   http://linux-mm.org/VmTrace

Quoting from that page:

>From the previous email to linux-mm:
>"The sequence of pages which a given process or workload accesses
>during its lifetime, a.k.a. "reference trace", is very important
>information. It has been used in the past for comparison of page
>replacement algorithms and other optimizations..."

Lee


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 15:24     ` Steven Rostedt
@ 2009-02-19 18:51       ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2009-02-19 18:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Atsushi Tsuji, Peter Zijlstra, linux-kernel, Jason Baron,
	Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi,
	linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins

On Tue, Feb 17, 2009 at 10:24:46AM -0500, Steven Rostedt wrote:
> 
> On Tue, 17 Feb 2009, Atsushi Tsuji wrote:
> > > 
> > > This is rather asymmetric, why don't we care about the offset for the
> > > removed page?
> > > 
> > 
> > Indeed.
> > I added the offset to the argument for the removed page and resend fixed patch.
> > 
> > Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
> 
> Could you package it up in one patch again and resend with [PATCH v2].
> Also make sure to Cc the memory folks, and ask for an Acked-by from them.

Well, until we actually get a consumer of those tracepoints, e.g. a
ftrace pluging into the tree strong NACK for me.

(p.s. I really don't get it why people keep trying to push dead code
 into the tree)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-19 14:21           ` Lee Schermerhorn
@ 2009-02-20  1:13             ` KOSAKI Motohiro
  0 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-02-20  1:13 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: kosaki.motohiro, Atsushi Tsuji, Peter Zijlstra, linux-kernel,
	Jason Baron, Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler,
	Kazuto Miyoshi, rostedt, linux-mm, Andrew Morton, Nick Piggin,
	Hugh Dickins

Hi

> > > Currently, we can understand the amount of pagecache from "Cached"
> > > in /proc/meminfo. So I'd like to understand which files are using pagecache.
> > 
> > There is one meta question, Why do you think file-by-file pagecache
> > infomartion is valueable?
> 
> One might take a look at Marcello Tosatti's old 'vmtrace' patch.  It
> contains it's own data store/transport via relayfs, but the trace points
> could be ported to the current kernel tracing infrastructure.
> 
> Here's a starting point:   http://linux-mm.org/VmTrace
> 
> Quoting from that page:
> 
> >From the previous email to linux-mm:
> >"The sequence of pages which a given process or workload accesses
> >during its lifetime, a.k.a. "reference trace", is very important
> >information. It has been used in the past for comparison of page
> >replacement algorithms and other optimizations..."

Sure.
but strong difference exist.

vmtrace
  - can run standalone
  - reviewer can confirm its output result is useful or not.

Christoph also explained reason more kindly.
I think we need useful consumer. 



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-02-20  1:13 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-17  9:00 [PATCH] Add tracepoints to track pagecache transition Atsushi Tsuji
2009-02-17  9:33 ` Peter Zijlstra
2009-02-17 11:04   ` Atsushi Tsuji
2009-02-17 11:38     ` KOSAKI Motohiro
2009-02-17 11:54       ` Peter Zijlstra
2009-02-17 12:33         ` KOSAKI Motohiro
2009-02-17 14:33       ` Frederic Weisbecker
2009-02-18  0:29         ` KOSAKI Motohiro
2009-02-19  4:41       ` Atsushi Tsuji
2009-02-19 13:12         ` KOSAKI Motohiro
2009-02-19 14:21           ` Lee Schermerhorn
2009-02-20  1:13             ` KOSAKI Motohiro
2009-02-17 15:24     ` Steven Rostedt
2009-02-19 18:51       ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).