* [PATCH 0/5] writeback: kernel visibility
@ 2010-09-12 20:30 ` Michael Rubin
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Rubin @ 2010-09-12 20:30 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, kosaki.motohiro, npiggin,
	hch, axboe, Michael Rubin

Patch #1 exports the account_page_dirtied helper and fixes a bug in
ceph.

Patch #2 adds the account_page_writeback helper.

Patch #3 adds writeback visibility in /proc/vmstat

To help developers and applications gain visibility into writeback
behaviour this patch adds two counters to /proc/vmstat.

  # grep nr_dirtied /proc/vmstat
  nr_dirtied 3747
  # grep nr_written /proc/vmstat
  nr_written 3618

These entries allow user apps to understand writeback behaviour over
time and learn how it is impacting their performance. Currently there
is no way to inspect dirty and writeback speed over time. The existing
nr_dirty and nr_writeback entries cannot provide this, since they report
the current number of pages in each state rather than a running total.

These entries are necessary to give visibility into writeback
behaviour. We have /proc/diskstats, which lets us understand the IO in
the block layer. We have blktrace for more in-depth understanding. We have
e2fsprogs and debugfs to give insight into the file system's behaviour,
but we don't offer our users the ability to understand what writeback is
doing. There is no way to know how active it is over the whole system,
whether it's falling behind, or to quantify its efforts. With these values
exported, users can easily see how much data applications are sending
through writeback and at what rate writeback is processing this
data. Comparing the rates of change of the two allows developers
to see when writeback is not able to keep up with incoming traffic and
the rate of dirty memory being sent to the IO back end. This allows
folks to understand their IO workloads and track kernel issues. Non-kernel
engineers at Google often use these counters to solve puzzling
performance problems.
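
As a rough illustration of the comparison described above (not part of
this series), a user-space tool could take two snapshots of /proc/vmstat
and compare how far nr_dirtied and nr_written advanced. The helper names
vmstat_lookup() and writeback_backlog_delta() are made up for this
sketch; only the "name value" line format and the two counter names come
from the patches.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up `name` in /proc/vmstat-style text ("name value\n" per line).
 * Returns 0 and stores the value in *out on success, -1 if not found.
 * Hypothetical helper, for illustration only.
 */
int vmstat_lookup(const char *text, const char *name,
		  unsigned long long *out)
{
	size_t len;
	const char *line;

	assert(text != NULL && name != NULL && out != NULL);
	len = strlen(name);

	for (line = text; *line != '\0'; ) {
		const char *eol = strchr(line, '\n');

		/* Match the whole key: "nr_dirty" must not match "nr_dirtied". */
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			*out = strtoull(line + len + 1, NULL, 10);
			return 0;
		}
		if (eol == NULL)
			break;
		line = eol + 1;
	}
	return -1;
}

/*
 * Pages dirtied minus pages written between two snapshots. A positive
 * result means dirtying outpaced writeback over the interval.
 * Returns 0 on success, -1 if a counter is missing from either snapshot.
 */
int writeback_backlog_delta(const char *before, const char *after,
			    long long *delta)
{
	unsigned long long d0, d1, w0, w1;

	assert(delta != NULL);
	if (vmstat_lookup(before, "nr_dirtied", &d0) ||
	    vmstat_lookup(after, "nr_dirtied", &d1) ||
	    vmstat_lookup(before, "nr_written", &w0) ||
	    vmstat_lookup(after, "nr_written", &w1))
		return -1;

	*delta = (long long)(d1 - d0) - (long long)(w1 - w0);
	return 0;
}
```

A caller would read /proc/vmstat into a buffer, wait for the sampling
interval, read it again, and pass both buffers in. Dividing the result
by the interval gives a rate in pages per second.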

Patch #4 adds a per-node vmstat file with nr_dirtied and nr_written

Patch #5 adds writeback thresholds to /proc/vmstat

Currently these values are in debugfs. But they should be promoted to
/proc since they are useful for developers who are writing databases
and file servers and are not debugging the kernel.

The output is as below:

 # grep threshold /proc/vmstat
 nr_dirty_threshold 409111
 nr_dirty_background_threshold 818223

Michael Rubin (5):
  mm: exporting account_page_dirty
  mm: account_page_writeback added
  writeback: nr_dirtied and nr_written in /proc/vmstat
  writeback: Adding /sys/devices/system/node/<node>/vmstat
  writeback: Reporting dirty thresholds in /proc/vmstat

 drivers/base/node.c    |   14 ++++++++++++++
 fs/ceph/addr.c         |    8 +-------
 fs/nilfs2/segment.c    |    2 +-
 include/linux/mm.h     |    1 +
 include/linux/mmzone.h |    4 ++++
 mm/page-writeback.c    |   16 +++++++++++++++-
 mm/vmstat.c            |    7 +++++++
 7 files changed, 43 insertions(+), 9 deletions(-)


* [PATCH 1/5] mm: exporting account_page_dirty
  2010-09-12 20:30 ` Michael Rubin
@ 2010-09-12 20:30   ` Michael Rubin
  -1 siblings, 0 replies; 28+ messages in thread
From: Michael Rubin @ 2010-09-12 20:30 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, kosaki.motohiro, npiggin,
	hch, axboe, Michael Rubin

This allows code outside of the mm core to safely manipulate page state
and not worry about the other accounting. Not using these routines means
that some code will lose track of the accounting and we get bugs. This
has happened once already.

Modified ceph to use the interface.

Signed-off-by: Michael Rubin <mrubin@google.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/ceph/addr.c      |    8 +-------
 mm/page-writeback.c |    1 +
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 5598a0d..420d469 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -105,13 +105,7 @@ static int ceph_set_page_dirty(struct page *page)
 	spin_lock_irq(&mapping->tree_lock);
 	if (page->mapping) {	/* Race with truncate? */
 		WARN_ON_ONCE(!PageUptodate(page));
-
-		if (mapping_cap_account_dirty(mapping)) {
-			__inc_zone_page_state(page, NR_FILE_DIRTY);
-			__inc_bdi_stat(mapping->backing_dev_info,
-					BDI_RECLAIMABLE);
-			task_io_account_write(PAGE_CACHE_SIZE);
-		}
+		account_page_dirtied(page, page->mapping);
 		radix_tree_tag_set(&mapping->page_tree,
 				page_index(page), PAGECACHE_TAG_DIRTY);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7262aac..9d07a8d 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1131,6 +1131,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 		task_io_account_write(PAGE_CACHE_SIZE);
 	}
 }
+EXPORT_SYMBOL(account_page_dirtied);
 
 /*
  * For address_spaces which do not use buffers.  Just tag the page as dirty in
-- 
1.7.1


* [PATCH 2/5] mm: account_page_writeback added
  2010-09-12 20:30 ` Michael Rubin
@ 2010-09-12 20:30   ` Michael Rubin
  -1 siblings, 0 replies; 28+ messages in thread
From: Michael Rubin @ 2010-09-12 20:30 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, kosaki.motohiro, npiggin,
	hch, axboe, Michael Rubin

This allows code outside of the mm core to safely manipulate page
writeback state and not worry about the other accounting. Not using
these routines means that some code will lose track of the accounting
and we get bugs.

Modified nilfs2 to use the interface.

Signed-off-by: Michael Rubin <mrubin@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/nilfs2/segment.c |    2 +-
 include/linux/mm.h  |    1 +
 mm/page-writeback.c |   13 ++++++++++++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 9fd051a..5617f16 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1599,7 +1599,7 @@ nilfs_copy_replace_page_buffers(struct page *page, struct list_head *out)
 	kunmap_atomic(kaddr, KM_USER0);
 
 	if (!TestSetPageWriteback(clone_page))
-		inc_zone_page_state(clone_page, NR_WRITEBACK);
+		account_page_writeback(clone_page);
 	unlock_page(clone_page);
 
 	return 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 709f672..4b2f38b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -856,6 +856,7 @@ int __set_page_dirty_no_writeback(struct page *page);
 int redirty_page_for_writepage(struct writeback_control *wbc,
 				struct page *page);
 void account_page_dirtied(struct page *page, struct address_space *mapping);
+void account_page_writeback(struct page *page);
 int set_page_dirty(struct page *page);
 int set_page_dirty_lock(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 9d07a8d..ae5f5d5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1134,6 +1134,17 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 EXPORT_SYMBOL(account_page_dirtied);
 
 /*
+ * Helper function for set_page_writeback family.
+ * NOTE: Unlike account_page_dirtied this does not rely on being atomic
+ * wrt interrupts.
+ */
+void account_page_writeback(struct page *page)
+{
+	inc_zone_page_state(page, NR_WRITEBACK);
+}
+EXPORT_SYMBOL(account_page_writeback);
+
+/*
  * For address_spaces which do not use buffers.  Just tag the page as dirty in
  * its radix tree.
  *
@@ -1371,7 +1382,7 @@ int test_set_page_writeback(struct page *page)
 		ret = TestSetPageWriteback(page);
 	}
 	if (!ret)
-		inc_zone_page_state(page, NR_WRITEBACK);
+		account_page_writeback(page);
 	return ret;
 
 }
-- 
1.7.1


* [PATCH 3/5] writeback: nr_dirtied and nr_written in /proc/vmstat
  2010-09-12 20:30 ` Michael Rubin
@ 2010-09-12 20:30   ` Michael Rubin
  -1 siblings, 0 replies; 28+ messages in thread
From: Michael Rubin @ 2010-09-12 20:30 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, kosaki.motohiro, npiggin,
	hch, axboe, Michael Rubin

To help developers and applications gain visibility into writeback
behaviour, this patch adds two entries to the zone_stat_item enum and
to /proc/vmstat. This will allow us to track the "written" and "dirtied"
counts.

   # grep nr_dirtied /proc/vmstat
   nr_dirtied 3747
   # grep nr_written /proc/vmstat
   nr_written 3618

Signed-off-by: Michael Rubin <mrubin@google.com>
---
 include/linux/mmzone.h |    2 ++
 mm/page-writeback.c    |    2 ++
 mm/vmstat.c            |    3 +++
 3 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6e6e626..d0d7454 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -104,6 +104,8 @@ enum zone_stat_item {
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
 	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
+	NR_FILE_DIRTIED,	/* accumulated dirty pages */
+	NR_WRITTEN,		/* accumulated written pages */
 #ifdef CONFIG_NUMA
 	NUMA_HIT,		/* allocated in intended node */
 	NUMA_MISS,		/* allocated in non intended node */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ae5f5d5..4d6ef9c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1126,6 +1126,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 {
 	if (mapping_cap_account_dirty(mapping)) {
 		__inc_zone_page_state(page, NR_FILE_DIRTY);
+		__inc_zone_page_state(page, NR_FILE_DIRTIED);
 		__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
 		task_dirty_inc(current);
 		task_io_account_write(PAGE_CACHE_SIZE);
@@ -1141,6 +1142,7 @@ EXPORT_SYMBOL(account_page_dirtied);
 void account_page_writeback(struct page *page)
 {
 	inc_zone_page_state(page, NR_WRITEBACK);
+	inc_zone_page_state(page, NR_WRITTEN);
 }
 EXPORT_SYMBOL(account_page_writeback);
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f389168..d448ef4 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -732,6 +732,9 @@ static const char * const vmstat_text[] = {
 	"nr_isolated_anon",
 	"nr_isolated_file",
 	"nr_shmem",
+	"nr_dirtied",
+	"nr_written",
+
 #ifdef CONFIG_NUMA
 	"numa_hit",
 	"numa_miss",
-- 
1.7.1


* [PATCH 4/5] writeback: Adding /sys/devices/system/node/<node>/vmstat
  2010-09-12 20:30 ` Michael Rubin
@ 2010-09-12 20:30   ` Michael Rubin
  -1 siblings, 0 replies; 28+ messages in thread
From: Michael Rubin @ 2010-09-12 20:30 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, kosaki.motohiro, npiggin,
	hch, axboe, Michael Rubin

For NUMA systems it is important to have per-node visibility into memory
characteristics. Two of the /proc/vmstat values, "nr_written" and
"nr_dirtied", are added here.

	# cat /sys/devices/system/node/node20/vmstat
	nr_written 0
	nr_dirtied 0

Signed-off-by: Michael Rubin <mrubin@google.com>
---
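A note for readers, not for the changelog: a user-space consumer of the
new file might look roughly like the sketch below. node_vmstat_path()
and read_counter() are hypothetical names. The path layout and the
"name value" line format are taken from the example output above, and
everything else is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the per-node vmstat path for NUMA node `nid` into `buf`.
 * Returns 0 on success, -1 if `nid` is negative or `buf` is too small.
 */
int node_vmstat_path(int nid, char *buf, size_t size)
{
	int n;

	assert(buf != NULL);
	if (nid < 0)
		return -1;
	n = snprintf(buf, size, "/sys/devices/system/node/node%d/vmstat", nid);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Scan an open stream of "name value" lines for `name`.
 * Returns 0 and stores the value in *out on success, -1 if not found.
 */
int read_counter(FILE *fp, const char *name, unsigned long *out)
{
	char key[64];
	unsigned long val;

	assert(fp != NULL && name != NULL && out != NULL);
	while (fscanf(fp, "%63s %lu", key, &val) == 2) {
		if (strcmp(key, name) == 0) {
			*out = val;
			return 0;
		}
	}
	return -1;
}
```

A caller would build the path for each node of interest, fopen() it, and
call read_counter() for "nr_dirtied" or "nr_written".
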
 drivers/base/node.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 2872e86..6aaccd9 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -160,6 +160,18 @@ static ssize_t node_read_numastat(struct sys_device * dev,
 }
 static SYSDEV_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
 
+static ssize_t node_read_vmstat(struct sys_device *dev,
+				struct sysdev_attribute *attr, char *buf)
+{
+	int nid = dev->id;
+	return sprintf(buf,
+		"nr_written %lu\n"
+		"nr_dirtied %lu\n",
+		node_page_state(nid, NR_WRITTEN),
+		node_page_state(nid, NR_FILE_DIRTIED));
+}
+static SYSDEV_ATTR(vmstat, S_IRUGO, node_read_vmstat, NULL);
+
 static ssize_t node_read_distance(struct sys_device * dev,
 			struct sysdev_attribute *attr, char * buf)
 {
@@ -243,6 +255,7 @@ int register_node(struct node *node, int num, struct node *parent)
 		sysdev_create_file(&node->sysdev, &attr_meminfo);
 		sysdev_create_file(&node->sysdev, &attr_numastat);
 		sysdev_create_file(&node->sysdev, &attr_distance);
+		sysdev_create_file(&node->sysdev, &attr_vmstat);
 
 		scan_unevictable_register_node(node);
 
@@ -267,6 +280,7 @@ void unregister_node(struct node *node)
 	sysdev_remove_file(&node->sysdev, &attr_meminfo);
 	sysdev_remove_file(&node->sysdev, &attr_numastat);
 	sysdev_remove_file(&node->sysdev, &attr_distance);
+	sysdev_remove_file(&node->sysdev, &attr_vmstat);
 
 	scan_unevictable_unregister_node(node);
 	hugetlb_unregister_node(node);		/* no-op, if memoryless node */
-- 
1.7.1


* [PATCH 5/5] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-09-12 20:30 ` Michael Rubin
@ 2010-09-12 20:30   ` Michael Rubin
  -1 siblings, 0 replies; 28+ messages in thread
From: Michael Rubin @ 2010-09-12 20:30 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, kosaki.motohiro, npiggin,
	hch, axboe, Michael Rubin

The kernel already exposes the user-desired thresholds in /proc/sys/vm
with dirty_background_ratio and dirty_ratio. But the kernel may
alter the number requested without giving the user any indication that
this is the case.

Knowing the actual thresholds the kernel is honoring can help app
developers understand how their buffered IO will be sent to the disk.

        $ grep threshold /proc/vmstat
        nr_dirty_threshold 409111
        nr_dirty_background_threshold 818223

Signed-off-by: Michael Rubin <mrubin@google.com>
---
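A note for readers, not for the changelog: assuming the two threshold
entries are page counts like the other nr_ entries in /proc/vmstat, an
application would probably want them in bytes. The helpers below are a
hypothetical sketch of that conversion and are not part of this patch.

```c
#include <assert.h>
#include <unistd.h>

/* Convert a page count from /proc/vmstat to bytes for a given page size. */
unsigned long long pages_to_bytes(unsigned long long pages, long page_size)
{
	assert(page_size > 0);
	return pages * (unsigned long long)page_size;
}

/* Same conversion using the running system's page size. */
unsigned long long pages_to_bytes_native(unsigned long long pages)
{
	return pages_to_bytes(pages, sysconf(_SC_PAGESIZE));
}
```

With 4096-byte pages, the 409111 in the sample output above works out to
1675718656 bytes, or roughly 1.56 GiB.
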
 include/linux/mmzone.h |    2 ++
 mm/vmstat.c            |    4 ++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d0d7454..1e87936 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -106,6 +106,8 @@ enum zone_stat_item {
 	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
 	NR_FILE_DIRTIED,	/* accumulated dirty pages */
 	NR_WRITTEN,		/* accumulated written pages */
+	NR_DIRTY_THRESHOLD,	/* writeback threshold */
+	NR_DIRTY_BG_THRESHOLD,	/* bg writeback threshold */
 #ifdef CONFIG_NUMA
 	NUMA_HIT,		/* allocated in intended node */
 	NUMA_MISS,		/* allocated in non intended node */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index d448ef4..0c1ddca 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -17,6 +17,7 @@
 #include <linux/vmstat.h>
 #include <linux/sched.h>
 #include <linux/math64.h>
+#include <linux/writeback.h>
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
 DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
@@ -734,6 +735,8 @@ static const char * const vmstat_text[] = {
 	"nr_shmem",
 	"nr_dirtied",
 	"nr_written",
+	"nr_dirty_threshold",
+	"nr_dirty_background_threshold",
 
 #ifdef CONFIG_NUMA
 	"numa_hit",
@@ -917,6 +920,7 @@ static void *vmstat_start(struct seq_file *m, loff_t *pos)
 		return ERR_PTR(-ENOMEM);
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 		v[i] = global_page_state(i);
+	global_dirty_limits(v + NR_DIRTY_BG_THRESHOLD, v + NR_DIRTY_THRESHOLD);
 #ifdef CONFIG_VM_EVENT_COUNTERS
 	e = v + NR_VM_ZONE_STAT_ITEMS;
 	all_vm_events(e);
-- 
1.7.1


* Re: [PATCH 2/5] mm: account_page_writeback added
  2010-09-12 20:30   ` Michael Rubin
@ 2010-09-13  2:50     ` Wu Fengguang
  -1 siblings, 0 replies; 28+ messages in thread
From: Wu Fengguang @ 2010-09-13  2:50 UTC (permalink / raw)
  To: Michael Rubin
  Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david,
	kosaki.motohiro, npiggin, hch, axboe

On Mon, Sep 13, 2010 at 04:30:37AM +0800, Michael Rubin wrote:
> This allows code outside of the mm core to safely manipulate page
> writeback state and not worry about the other accounting. Not using
> these routines means that some code will lose track of the accounting
> and we get bugs.
> 
> Modified nilfs2 to use interface.
> 
> Signed-off-by: Michael Rubin <mrubin@google.com>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5] writeback: nr_dirtied and nr_written in /proc/vmstat
  2010-09-12 20:30   ` Michael Rubin
@ 2010-09-13  2:58     ` Wu Fengguang
  -1 siblings, 0 replies; 28+ messages in thread
From: Wu Fengguang @ 2010-09-13  2:58 UTC (permalink / raw)
  To: Michael Rubin
  Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david,
	kosaki.motohiro, npiggin, hch, axboe

On Mon, Sep 13, 2010 at 04:30:38AM +0800, Michael Rubin wrote:
> To help developers and applications gain visibility into writeback
> behaviour adding two entries to vm_stat_items and /proc/vmstat. This
> will allow us to track the "written" and "dirtied" counts.
> 
>    # grep nr_dirtied /proc/vmstat
>    nr_dirtied 3747
>    # grep nr_written /proc/vmstat
>    nr_cleaned 3618

s/nr_cleaned/nr_written/

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
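
Since both counters are cumulative, they are most informative as deltas
between two reads. A minimal sketch, assuming the two samples were parsed
from /proc/vmstat by the caller; struct wb_sample and wb_backlog_delta are
hypothetical names, not part of the patch:

```c
#include <assert.h>

struct wb_sample {
	unsigned long nr_dirtied;	/* cumulative pages dirtied */
	unsigned long nr_written;	/* cumulative pages written */
};

/*
 * Pages dirtied minus pages written between two samples.
 * A positive result means pages were dirtied faster than they were
 * written back during the interval.
 */
static long wb_backlog_delta(const struct wb_sample *prev,
			     const struct wb_sample *cur)
{
	unsigned long dirtied, written;

	assert(prev && cur);
	/* Unsigned subtraction stays correct across counter wraparound. */
	dirtied = cur->nr_dirtied - prev->nr_dirtied;
	written = cur->nr_written - prev->nr_written;
	return (long)(dirtied - written);
}
```

Dividing the result by the sampling interval gives a rate. A positive
value is only a hint: pages that are dirtied and then truncated before
writeback never reach nr_written.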

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] writeback: Adding /sys/devices/system/node/<node>/vmstat
  2010-09-12 20:30   ` Michael Rubin
@ 2010-09-13  3:02     ` Wu Fengguang
  -1 siblings, 0 replies; 28+ messages in thread
From: Wu Fengguang @ 2010-09-13  3:02 UTC (permalink / raw)
  To: Michael Rubin
  Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david,
	kosaki.motohiro, npiggin, hch, axboe

On Mon, Sep 13, 2010 at 04:30:39AM +0800, Michael Rubin wrote:
> For NUMA node systems it is important to have visibility in memory
> characteristics. Two of the /proc/vmstat values "nr_cleaned" and

s/nr_cleaned/nr_written/

> "nr_dirtied" are added here.
> 
> 	# cat /sys/devices/system/node/node20/vmstat
> 	nr_cleaned 0

ditto

> 	nr_dirtied 0
> 
> Signed-off-by: Michael Rubin <mrubin@google.com>

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>

> +static ssize_t node_read_vmstat(struct sys_device *dev,
> +				struct sysdev_attribute *attr, char *buf)
> +{
> +	int nid = dev->id;
> +	return sprintf(buf,
> +		"nr_written %lu\n"
> +		"nr_dirtied %lu\n",
> +		node_page_state(nid, NR_WRITTEN),
> +		node_page_state(nid, NR_FILE_DIRTIED));
> +}

Do you have plans to port more vmstat_text[] items? :)

Thanks,
Fengguang
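
One way the per-node file could grow beyond two hard-coded lines is a
name table indexed by the stat enum, in the style of vmstat_text[]. The
following is a user-space sketch only: the demo_* names are hypothetical
stand-ins, and a real sysfs handler would read node_page_state() rather
than a caller-supplied array.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for a subset of the kernel's zone_stat_item. */
enum demo_stat_item { DEMO_NR_FILE_DIRTIED, DEMO_NR_WRITTEN, DEMO_NR_ITEMS };

static const char * const demo_stat_text[DEMO_NR_ITEMS] = {
	[DEMO_NR_FILE_DIRTIED]	= "nr_dirtied",
	[DEMO_NR_WRITTEN]	= "nr_written",
};

/*
 * Format every counter as "name value\n" into buf.
 * Returns the number of bytes written (excluding the NUL),
 * or -1 if buf is too small.
 */
static int demo_format_stats(char *buf, size_t size,
			     const unsigned long *vals)
{
	size_t off = 0;
	int i;

	assert(buf && vals);
	for (i = 0; i < DEMO_NR_ITEMS; i++) {
		int n = snprintf(buf + off, size - off, "%s %lu\n",
				 demo_stat_text[i], vals[i]);

		/* snprintf reports the length it needed; detect truncation. */
		if (n < 0 || (size_t)n >= size - off)
			return -1;
		off += (size_t)n;
	}
	return (int)off;
}
```

With this shape, exporting another counter means adding one enum entry
and one string, and the formatting loop stays unchanged.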

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-09-12 20:30   ` Michael Rubin
@ 2010-09-13  3:15     ` Wu Fengguang
  -1 siblings, 0 replies; 28+ messages in thread
From: Wu Fengguang @ 2010-09-13  3:15 UTC (permalink / raw)
  To: Michael Rubin
  Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david,
	kosaki.motohiro, npiggin, hch, axboe

On Mon, Sep 13, 2010 at 04:30:40AM +0800, Michael Rubin wrote:
> The kernel already exposes the user desired thresholds in /proc/sys/vm
> with dirty_background_ratio and background_ratio. But the kernel may
> alter the number requested without giving the user any indication that
> is the case.
> 
> Knowing the actual ratios the kernel is honoring can help app developers
> understand how their buffered IO will be sent to the disk.
> 
>         $ grep threshold /proc/vmstat
>         nr_dirty_threshold 409111
>         nr_dirty_background_threshold 818223
> 
> Signed-off-by: Michael Rubin <mrubin@google.com>
> ---
>  include/linux/mmzone.h |    2 ++
>  mm/vmstat.c            |    4 ++++
>  2 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index d0d7454..1e87936 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -106,6 +106,8 @@ enum zone_stat_item {
>  	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
>  	NR_FILE_DIRTIED,	/* accumulated dirty pages */
>  	NR_WRITTEN,		/* accumulated written pages */
> +	NR_DIRTY_THRESHOLD,	/* writeback threshold */

s/writeback/dirty throttling/

> +	NR_DIRTY_BG_THRESHOLD,	/* bg writeback threshold */

I have no idea about this interface change. No ACK or NAK.

But technically, the two enum items above would be better removed,
to avoid possibly eating one more cache line. The two values can be
printed by explicit code instead.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-09-13  3:15     ` Wu Fengguang
@ 2010-09-13  5:45       ` Michael Rubin
  -1 siblings, 0 replies; 28+ messages in thread
From: Michael Rubin @ 2010-09-13  5:45 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david,
	kosaki.motohiro, npiggin, hch, axboe

On Sun, Sep 12, 2010 at 8:15 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:

> But technical wise, the above two enum items should better be removed
> to avoid possibly eating one more cache line. The two items can be
> printed by explicit code.

Done. Patch coming.

mrubin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] writeback: Adding /sys/devices/system/node/<node>/vmstat
  2010-09-13  3:02     ` Wu Fengguang
@ 2010-09-15  6:18       ` Michael Rubin
  -1 siblings, 0 replies; 28+ messages in thread
From: Michael Rubin @ 2010-09-15  6:18 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david,
	kosaki.motohiro, npiggin, hch, axboe

On Sun, Sep 12, 2010 at 8:02 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> Do you have plan to port more vmstat_text[] items? :)

Yes. I feel bound to do it after all your help. :-)
I may need a few weeks to resolve some other issues but I can get back to this.

mrubin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-09-13  5:58   ` Michael Rubin
@ 2010-09-13 21:24     ` Andrew Morton
  -1 siblings, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2010-09-13 21:24 UTC (permalink / raw)
  To: Michael Rubin
  Cc: linux-kernel, linux-fsdevel, linux-mm, fengguang.wu, jack, riel,
	david, kosaki.motohiro, npiggin, hch, axboe

On Sun, 12 Sep 2010 22:58:13 -0700
Michael Rubin <mrubin@google.com> wrote:

> The kernel already exposes the user desired thresholds in /proc/sys/vm
> with dirty_background_ratio and background_ratio. But the kernel may
> alter the number requested without giving the user any indication that
> is the case.
> 
> Knowing the actual ratios the kernel is honoring can help app developers
> understand how their buffered IO will be sent to the disk.
> 
>         $ grep threshold /proc/vmstat
>         nr_dirty_threshold 409111
>         nr_dirty_background_threshold 818223
> 

Yes, I think /proc/vmstat is a decent place to put these.  The needed
infrastructural support is minimal and although these numbers are
closely tied to the implementation-of-the-day, people should expect
individual fields in /proc/vmstat to appear and disappear at random as
kernel versions change.
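
Given that fields may come and go, a consumer is safer looking values up
by name and treating absence as a normal outcome. A small sketch, assuming
the file contents have already been read into a NUL-terminated buffer;
vmstat_lookup is a hypothetical helper, not an existing interface:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one "name value" line in a buffer of /proc/vmstat-style text.
 * Returns 1 and stores the value in *out if the field is present,
 * 0 if the buffer does not contain it.
 */
static int vmstat_lookup(const char *text, const char *name,
			 unsigned long *out)
{
	const char *line = text;
	size_t len;

	assert(text && name && out);
	len = strlen(name);

	while (*line) {
		const char *eol = strchr(line, '\n');

		/* Match the whole field name, followed by a space. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			*out = strtoul(line + len + 1, NULL, 10);
			return 1;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return 0;
}
```

Requiring the space after the name keeps a lookup for a shorter name from
matching a longer field that shares its prefix.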


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 5/5] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-09-13  5:58 [PATCH 0/5] writeback: kernel visibility Michael Rubin
@ 2010-09-13  5:58   ` Michael Rubin
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Rubin @ 2010-09-13  5:58 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, kosaki.motohiro, npiggin,
	hch, axboe, Michael Rubin

The kernel already exposes the user desired thresholds in /proc/sys/vm
with dirty_background_ratio and dirty_ratio. But the kernel may
alter the number requested without giving the user any indication that
is the case.

Knowing the actual ratios the kernel is honoring can help app developers
understand how their buffered IO will be sent to the disk.

        $ grep threshold /proc/vmstat
        nr_dirty_threshold 409111
        nr_dirty_background_threshold 818223

Signed-off-by: Michael Rubin <mrubin@google.com>
---
 mm/vmstat.c |   39 +++++++++++++++++++++++++--------------
 1 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index d448ef4..76c37cd 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -17,6 +17,7 @@
 #include <linux/vmstat.h>
 #include <linux/sched.h>
 #include <linux/math64.h>
+#include <linux/writeback.h>
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
 DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
@@ -734,6 +735,8 @@ static const char * const vmstat_text[] = {
 	"nr_shmem",
 	"nr_dirtied",
 	"nr_written",
+	"nr_dirty_threshold",
+	"nr_dirty_background_threshold",
 
 #ifdef CONFIG_NUMA
 	"numa_hit",
@@ -894,36 +897,44 @@ static const struct file_operations proc_zoneinfo_file_operations = {
 	.release	= seq_release,
 };
 
+enum writeback_stat_item {
+	NR_DIRTY_THRESHOLD,
+	NR_DIRTY_BG_THRESHOLD,
+	NR_VM_WRITEBACK_STAT_ITEMS,
+};
+
 static void *vmstat_start(struct seq_file *m, loff_t *pos)
 {
 	unsigned long *v;
-#ifdef CONFIG_VM_EVENT_COUNTERS
-	unsigned long *e;
-#endif
-	int i;
+	int i, stat_items_size;
 
 	if (*pos >= ARRAY_SIZE(vmstat_text))
 		return NULL;
+	stat_items_size = NR_VM_ZONE_STAT_ITEMS * sizeof(unsigned long) +
+			  NR_VM_WRITEBACK_STAT_ITEMS * sizeof(unsigned long);
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
-	v = kmalloc(NR_VM_ZONE_STAT_ITEMS * sizeof(unsigned long)
-			+ sizeof(struct vm_event_state), GFP_KERNEL);
-#else
-	v = kmalloc(NR_VM_ZONE_STAT_ITEMS * sizeof(unsigned long),
-			GFP_KERNEL);
+	stat_items_size += sizeof(struct vm_event_state);
 #endif
+
+	v = kmalloc(stat_items_size, GFP_KERNEL);
 	m->private = v;
 	if (!v)
 		return ERR_PTR(-ENOMEM);
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 		v[i] = global_page_state(i);
+	v += NR_VM_ZONE_STAT_ITEMS;
+
+	global_dirty_limits(v + NR_DIRTY_BG_THRESHOLD,
+			    v + NR_DIRTY_THRESHOLD);
+	v += NR_VM_WRITEBACK_STAT_ITEMS;
+
 #ifdef CONFIG_VM_EVENT_COUNTERS
-	e = v + NR_VM_ZONE_STAT_ITEMS;
-	all_vm_events(e);
-	e[PGPGIN] /= 2;		/* sectors -> kbytes */
-	e[PGPGOUT] /= 2;
+	all_vm_events(v);
+	v[PGPGIN] /= 2;		/* sectors -> kbytes */
+	v[PGPGOUT] /= 2;
 #endif
-	return v + *pos;
+	return (unsigned long *)m->private + *pos;
 }
 
 static void *vmstat_next(struct seq_file *m, void *arg, loff_t *pos)
-- 
1.7.1
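
The buffer layout built by vmstat_start() can be pictured with a small
user-space model. This is an illustration under assumed sizes, not the
kernel code: the demo_* names and item counts are made up. It keeps a
separate cursor so the base pointer is never lost, and it casts before
indexing because arithmetic on a void * (a GNU C extension) advances by
bytes rather than by elements.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sizes standing in for the kernel's enums. */
enum { DEMO_ZONE_ITEMS = 3, DEMO_WB_ITEMS = 2, DEMO_EVENT_ITEMS = 2 };
enum { DEMO_TOTAL = DEMO_ZONE_ITEMS + DEMO_WB_ITEMS + DEMO_EVENT_ITEMS };

/*
 * Build one flat array: zone counters, then the two thresholds,
 * then event counters. The caller owns and frees the result.
 */
static unsigned long *demo_build(const unsigned long *zone,
				 unsigned long dirty_thresh,
				 unsigned long bg_thresh,
				 const unsigned long *events)
{
	unsigned long *base, *v;
	int i;

	assert(zone && events);
	base = malloc(DEMO_TOTAL * sizeof(*base));
	if (!base)
		return NULL;

	v = base;
	for (i = 0; i < DEMO_ZONE_ITEMS; i++)
		v[i] = zone[i];
	v += DEMO_ZONE_ITEMS;		/* advance the cursor, not the base */

	v[0] = dirty_thresh;
	v[1] = bg_thresh;
	v += DEMO_WB_ITEMS;

	for (i = 0; i < DEMO_EVENT_ITEMS; i++)
		v[i] = events[i];
	return base;
}

/*
 * Return a pointer to element pos. The cast matters when the base is
 * stored as void *: the offset must count elements, not bytes.
 */
static unsigned long *demo_at(void *priv, long pos)
{
	return (unsigned long *)priv + pos;
}
```

The order of the sections has to match the order of the strings in the
name table, since a single index is used for both.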


^ permalink raw reply related	[flat|nested] 28+ messages in thread

Thread overview: 28+ messages
2010-09-12 20:30 [PATCH 0/5] writeback: kernel visibility Michael Rubin
2010-09-12 20:30 ` Michael Rubin
2010-09-12 20:30 ` [PATCH 1/5] mm: exporting account_page_dirty Michael Rubin
2010-09-12 20:30   ` Michael Rubin
2010-09-12 20:30 ` [PATCH 2/5] mm: account_page_writeback added Michael Rubin
2010-09-12 20:30   ` Michael Rubin
2010-09-13  2:50   ` Wu Fengguang
2010-09-13  2:50     ` Wu Fengguang
2010-09-12 20:30 ` [PATCH 3/5] writeback: nr_dirtied and nr_written in /proc/vmstat Michael Rubin
2010-09-12 20:30   ` Michael Rubin
2010-09-13  2:58   ` Wu Fengguang
2010-09-13  2:58     ` Wu Fengguang
2010-09-12 20:30 ` [PATCH 4/5] writeback: Adding /sys/devices/system/node/<node>/vmstat Michael Rubin
2010-09-12 20:30   ` Michael Rubin
2010-09-13  3:02   ` Wu Fengguang
2010-09-13  3:02     ` Wu Fengguang
2010-09-15  6:18     ` Michael Rubin
2010-09-15  6:18       ` Michael Rubin
2010-09-12 20:30 ` [PATCH 5/5] writeback: Reporting dirty thresholds in /proc/vmstat Michael Rubin
2010-09-12 20:30   ` Michael Rubin
2010-09-13  3:15   ` Wu Fengguang
2010-09-13  3:15     ` Wu Fengguang
2010-09-13  5:45     ` Michael Rubin
2010-09-13  5:45       ` Michael Rubin
2010-09-13  5:58 [PATCH 0/5] writeback: kernel visibility Michael Rubin
2010-09-13  5:58 ` [PATCH 5/5] writeback: Reporting dirty thresholds in /proc/vmstat Michael Rubin
2010-09-13  5:58   ` Michael Rubin
2010-09-13 21:24   ` Andrew Morton
2010-09-13 21:24     ` Andrew Morton
