* [PATCH 0/4] writeback: kernel visibility
From: Michael Rubin @ 2010-08-20 9:31 UTC
To: linux-kernel, linux-fsdevel, linux-mm
Cc: fengguang.wu, jack, riel, akpm, david, npiggin, hch, axboe, Michael Rubin

Patch #1 sets up some helper functions for account_page_dirty
Patch #2 sets up some helper functions for account_page_writeback
Patch #3 adds writeback visibility in /proc/vmstat
Patch #4 adds writeback thresholds to /proc/vmstat

To help developers and applications gain visibility into writeback
behaviour, this patch adds two counters to /proc/vmstat.

  # grep nr_dirtied /proc/vmstat
  nr_dirtied 3747
  # grep nr_entered_writeback /proc/vmstat
  nr_entered_writeback 3618

These entries allow user apps to understand writeback behaviour over
time and learn how it is impacting their performance. Currently there
is no way to inspect dirty and writeback speed over time. That is not
possible with nr_dirty/nr_writeback, which only report the current
number of pages in each state.

These entries are necessary to give visibility into writeback
behaviour. We have /proc/diskstats, which lets us understand the IO in
the block layer. We have blktrace for more in-depth understanding. We
have e2fsprogs and debugfs to give insight into the file system's
behaviour, but we don't offer our users the ability to understand what
writeback is doing. There is no way to know how active it is over the
whole system, whether it's falling behind, or to quantify its efforts.
With these values exported, users can easily see how much data
applications are sending through writeback and at what rate writeback
is processing this data. Comparing the rates of change between the two
allows developers to see when writeback is not able to keep up with
incoming traffic, and the rate of dirty memory being sent to the IO
back end. This allows folks to understand their IO workloads and track
kernel issues. Non-kernel engineers at Google often use these counters
to solve puzzling performance problems.

Patch #4 adds dirty thresholds to /proc/vmstat.

  # grep threshold /proc/vmstat
  nr_pages_dirty_threshold 409111
  nr_pages_dirty_background_threshold 818223

The files that report the dirty thresholds belong in /proc/vmstat. They
are meant for application writers, so they should not be in debugfs.
But since they are more related to the internals of writeback, albeit
internals that are fundamental to how it works, /proc/sys/vm is not
appropriate.

Michael Rubin (4):
  mm: exporting account_page_dirty
  mm: account_page_writeback added
  writeback: nr_dirtied and nr_entered_writeback in /proc/vmstat
  writeback: Reporting dirty thresholds in /proc/vmstat

 drivers/base/node.c    |   14 ++++++++++++++
 fs/ceph/addr.c         |    7 +------
 fs/nilfs2/segment.c    |    2 +-
 include/linux/mm.h     |    1 +
 include/linux/mmzone.h |    5 +++++
 mm/page-writeback.c    |   16 +++++++++++++++-
 mm/vmstat.c            |    8 ++++++++
 7 files changed, 45 insertions(+), 8 deletions(-)
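For illustration only, the comparison of the two counters described in the cover letter could be done from userspace roughly as follows. This is a minimal sketch, not part of the series: the helper names (vmstat_lookup, wb_sample_parse, wb_backlog_growth) and struct wb_sample are hypothetical, and it assumes the "name value" per-line layout shown in the grep output above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find one "name value" line in /proc/vmstat-style
 * text. Returns 0 and stores the value on success, -1 if the name is absent.
 */
int vmstat_lookup(const char *text, const char *name, unsigned long *out)
{
	size_t len;
	const char *p = text;

	assert(text && name && out);
	len = strlen(name);

	while (p && *p) {
		/* Match only at the start of a line, followed by a space. */
		if (strncmp(p, name, len) == 0 && p[len] == ' ') {
			*out = strtoul(p + len + 1, NULL, 10);
			return 0;
		}
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}

/* One snapshot of the two cumulative counters (hypothetical type). */
struct wb_sample {
	unsigned long dirtied;
	unsigned long entered_writeback;
};

/* Fill a snapshot from vmstat text; -1 if either counter is missing. */
int wb_sample_parse(const char *text, struct wb_sample *s)
{
	if (vmstat_lookup(text, "nr_dirtied", &s->dirtied))
		return -1;
	return vmstat_lookup(text, "nr_entered_writeback",
			     &s->entered_writeback);
}

/*
 * Pages dirtied minus pages that entered writeback between two snapshots.
 * A value that stays positive across intervals suggests writeback is not
 * keeping up with incoming dirty pages.
 */
long wb_backlog_growth(const struct wb_sample *before,
		       const struct wb_sample *after)
{
	long dirtied = (long)(after->dirtied - before->dirtied);
	long entered = (long)(after->entered_writeback -
			      before->entered_writeback);

	return dirtied - entered;
}
```

A caller would read /proc/vmstat into a buffer twice, some seconds apart, parse each buffer with wb_sample_parse(), and divide the deltas by the elapsed time to get rates.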
* [PATCH 1/4] mm: exporting account_page_dirty
From: Michael Rubin @ 2010-08-20 9:31 UTC
To: linux-kernel, linux-fsdevel, linux-mm
Cc: fengguang.wu, jack, riel, akpm, david, npiggin, hch, axboe, Michael Rubin

This allows code outside of the mm core to safely manipulate page state
and not worry about the other accounting. Not using these routines means
that some code will lose track of the accounting and we get bugs. This
has happened once already.

Signed-off-by: Michael Rubin <mrubin@google.com>
---
 fs/ceph/addr.c      |    8 +-------
 mm/page-writeback.c |    1 +
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 5598a0d..420d469 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -105,13 +105,7 @@ static int ceph_set_page_dirty(struct page *page)
 	spin_lock_irq(&mapping->tree_lock);
 	if (page->mapping) {	/* Race with truncate? */
 		WARN_ON_ONCE(!PageUptodate(page));
-
-		if (mapping_cap_account_dirty(mapping)) {
-			__inc_zone_page_state(page, NR_FILE_DIRTY);
-			__inc_bdi_stat(mapping->backing_dev_info,
-					BDI_RECLAIMABLE);
-			task_io_account_write(PAGE_CACHE_SIZE);
-		}
+		account_page_dirtied(page, page->mapping);
 		radix_tree_tag_set(&mapping->page_tree,
 				page_index(page), PAGECACHE_TAG_DIRTY);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7262aac..9d07a8d 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1131,6 +1131,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 		task_io_account_write(PAGE_CACHE_SIZE);
 	}
 }
+EXPORT_SYMBOL(account_page_dirtied);
 
 /*
  * For address_spaces which do not use buffers.  Just tag the page as dirty in
--
1.7.1
* Re: [PATCH 1/4] mm: exporting account_page_dirty
From: Wu Fengguang @ 2010-08-20 9:39 UTC
To: Michael Rubin, Sage Weil
Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

Sage,

This is actually a bug fix for ceph, which missed the task_dirty_inc()
call in account_page_dirtied().

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>

On Fri, Aug 20, 2010 at 05:31:26PM +0800, Michael Rubin wrote:
> This allows code outside of the mm core to safely manipulate page state
> and not worry about the other accounting. Not using these routines means
> that some code will lose track of the accounting and we get bugs. This
> has happened once already.
>
> Signed-off-by: Michael Rubin <mrubin@google.com>
> ---
>  fs/ceph/addr.c      |    8 +-------
>  mm/page-writeback.c |    1 +
>  2 files changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 5598a0d..420d469 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -105,13 +105,7 @@ static int ceph_set_page_dirty(struct page *page)
>  	spin_lock_irq(&mapping->tree_lock);
>  	if (page->mapping) {	/* Race with truncate? */
>  		WARN_ON_ONCE(!PageUptodate(page));
> -
> -		if (mapping_cap_account_dirty(mapping)) {
> -			__inc_zone_page_state(page, NR_FILE_DIRTY);
> -			__inc_bdi_stat(mapping->backing_dev_info,
> -					BDI_RECLAIMABLE);
> -			task_io_account_write(PAGE_CACHE_SIZE);
> -		}
> +		account_page_dirtied(page, page->mapping);
>  		radix_tree_tag_set(&mapping->page_tree,
>  				page_index(page), PAGECACHE_TAG_DIRTY);
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 7262aac..9d07a8d 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1131,6 +1131,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
>  		task_io_account_write(PAGE_CACHE_SIZE);
>  	}
>  }
> +EXPORT_SYMBOL(account_page_dirtied);
>
>  /*
>   * For address_spaces which do not use buffers.  Just tag the page as dirty in
> --
> 1.7.1
* Re: [PATCH 1/4] mm: exporting account_page_dirty
From: Sage Weil @ 2010-08-20 15:37 UTC
To: Wu Fengguang
Cc: Michael Rubin, linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

On Fri, 20 Aug 2010, Wu Fengguang wrote:
> Sage,
>
> This is actually a bug fix for ceph, which missed the task_dirty_inc()
> call in account_page_dirtied().
>
> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>

Thanks, I missed this one earlier.  I'll queue it up.

sage

>
> On Fri, Aug 20, 2010 at 05:31:26PM +0800, Michael Rubin wrote:
> > This allows code outside of the mm core to safely manipulate page state
> > and not worry about the other accounting. Not using these routines means
> > that some code will lose track of the accounting and we get bugs. This
> > has happened once already.
> >
> > Signed-off-by: Michael Rubin <mrubin@google.com>
> > ---
> >  fs/ceph/addr.c      |    8 +-------
> >  mm/page-writeback.c |    1 +
> >  2 files changed, 2 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 5598a0d..420d469 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -105,13 +105,7 @@ static int ceph_set_page_dirty(struct page *page)
> >  	spin_lock_irq(&mapping->tree_lock);
> >  	if (page->mapping) {	/* Race with truncate? */
> >  		WARN_ON_ONCE(!PageUptodate(page));
> > -
> > -		if (mapping_cap_account_dirty(mapping)) {
> > -			__inc_zone_page_state(page, NR_FILE_DIRTY);
> > -			__inc_bdi_stat(mapping->backing_dev_info,
> > -					BDI_RECLAIMABLE);
> > -			task_io_account_write(PAGE_CACHE_SIZE);
> > -		}
> > +		account_page_dirtied(page, page->mapping);
> >  		radix_tree_tag_set(&mapping->page_tree,
> >  				page_index(page), PAGECACHE_TAG_DIRTY);
> >
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 7262aac..9d07a8d 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -1131,6 +1131,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
> >  		task_io_account_write(PAGE_CACHE_SIZE);
> >  	}
> >  }
> > +EXPORT_SYMBOL(account_page_dirtied);
> >
> >  /*
> >   * For address_spaces which do not use buffers.  Just tag the page as dirty in
> > --
> > 1.7.1
* [PATCH 2/4] mm: account_page_writeback added
From: Michael Rubin @ 2010-08-20 9:31 UTC
To: linux-kernel, linux-fsdevel, linux-mm
Cc: fengguang.wu, jack, riel, akpm, david, npiggin, hch, axboe, Michael Rubin

This allows code outside of the mm core to safely manipulate page
writeback state and not worry about the other accounting. Not using
these routines means that some code will lose track of the accounting
and we get bugs.

Signed-off-by: Michael Rubin <mrubin@google.com>
---
 fs/nilfs2/segment.c |    2 +-
 include/linux/mm.h  |    1 +
 mm/page-writeback.c |   13 ++++++++++++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 9fd051a..5617f16 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1599,7 +1599,7 @@ nilfs_copy_replace_page_buffers(struct page *page, struct list_head *out)
 	kunmap_atomic(kaddr, KM_USER0);
 
 	if (!TestSetPageWriteback(clone_page))
-		inc_zone_page_state(clone_page, NR_WRITEBACK);
+		account_page_writeback(clone_page);
 	unlock_page(clone_page);
 
 	return 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 709f672..4b2f38b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -856,6 +856,7 @@ int __set_page_dirty_no_writeback(struct page *page);
 int redirty_page_for_writepage(struct writeback_control *wbc,
 				struct page *page);
 void account_page_dirtied(struct page *page, struct address_space *mapping);
+void account_page_writeback(struct page *page);
 int set_page_dirty(struct page *page);
 int set_page_dirty_lock(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 9d07a8d..ae5f5d5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1134,6 +1134,17 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 EXPORT_SYMBOL(account_page_dirtied);
 
 /*
+ * Helper function for set_page_writeback family.
+ * NOTE: Unlike account_page_dirtied this does not rely on being atomic
+ * wrt interrupts.
+ */
+void account_page_writeback(struct page *page)
+{
+	inc_zone_page_state(page, NR_WRITEBACK);
+}
+EXPORT_SYMBOL(account_page_writeback);
+
+/*
  * For address_spaces which do not use buffers.  Just tag the page as dirty in
  * its radix tree.
  *
@@ -1371,7 +1382,7 @@ int test_set_page_writeback(struct page *page)
 		ret = TestSetPageWriteback(page);
 	}
 	if (!ret)
-		inc_zone_page_state(page, NR_WRITEBACK);
+		account_page_writeback(page);
 	return ret;
 
 }
--
1.7.1
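As a rough userspace model of the pattern in this patch (account only when the writeback flag goes from clear to set, and do the accounting through one helper), something like the following could be used. It is a sketch under stated assumptions, not kernel code: struct mock_page, the mock_* functions and the nr_writeback counter are all hypothetical stand-ins, with C11 atomics taking the place of TestSetPageWriteback() and inc_zone_page_state().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the writeback flag. */
struct mock_page {
	atomic_flag writeback;
};

/* Hypothetical stand-in for the NR_WRITEBACK zone counter. */
static atomic_ulong nr_writeback;

/* Single place where writeback accounting happens (models the helper). */
void mock_account_page_writeback(struct mock_page *page)
{
	(void)page;	/* the real helper picks the zone from the page */
	atomic_fetch_add(&nr_writeback, 1);
}

/*
 * Models test_set_page_writeback(): returns the previous flag value and
 * accounts the page only on the clear -> set transition, so a page that
 * is already under writeback is never counted twice.
 */
bool mock_test_set_page_writeback(struct mock_page *page)
{
	bool was_set;

	assert(page);
	was_set = atomic_flag_test_and_set(&page->writeback);
	if (!was_set)
		mock_account_page_writeback(page);
	return was_set;
}

/* Read the current counter value. */
unsigned long mock_nr_writeback(void)
{
	return atomic_load(&nr_writeback);
}
```

Routing every caller through the one helper means a counter added later only needs to be added in that helper, rather than at each call site.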
* Re: [PATCH 2/4] mm: account_page_writeback added
From: Wu Fengguang @ 2010-08-20 9:45 UTC
To: Michael Rubin
Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

I'm not sure whether this should be an inline function; just a reminder.
That holds even with one more inc_zone_page_state() in the next patch.

> +void account_page_writeback(struct page *page)
> +{
> +	inc_zone_page_state(page, NR_WRITEBACK);
> +}
> +EXPORT_SYMBOL(account_page_writeback);
* Re: [PATCH 2/4] mm: account_page_writeback added
From: KOSAKI Motohiro @ 2010-08-20 10:08 UTC
To: Wu Fengguang
Cc: kosaki.motohiro, Michael Rubin, linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

> I'm not sure this should be an inline function, just a reminder.
> Even with one more inc_zone_page_state() in next patch.
>
> > +void account_page_writeback(struct page *page)
> > +{
> > +	inc_zone_page_state(page, NR_WRITEBACK);
> > +}
> > +EXPORT_SYMBOL(account_page_writeback);

Personally, I like inline, but it's no big matter.

Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
* [PATCH 3/4] writeback: nr_dirtied and nr_entered_writeback in /proc/vmstat
  2010-08-20  9:31 ` Michael Rubin
@ 2010-08-20  9:31   ` Michael Rubin
  -1 siblings, 0 replies; 59+ messages in thread
From: Michael Rubin @ 2010-08-20  9:31 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: fengguang.wu, jack, riel, akpm, david, npiggin, hch, axboe, Michael Rubin

To help developers and applications gain visibility into writeback
behaviour adding two entries to /proc/vmstat.

   # grep nr_dirtied /proc/vmstat
   nr_dirtied 3747
   # grep nr_entered_writeback /proc/vmstat
   nr_entered_writeback 3618

In order to track the "cleaned" and "dirtied" counts we added two
vm_stat_items.  Per memory node stats have been added also. So we can
see per node granularity:

   # cat /sys/devices/system/node/node20/writebackstat
   Node 20 pages_writeback: 0 times
   Node 20 pages_dirtied: 0 times

Signed-off-by: Michael Rubin <mrubin@google.com>
---
 drivers/base/node.c    |   14 ++++++++++++++
 include/linux/mmzone.h |    2 ++
 mm/page-writeback.c    |    2 ++
 mm/vmstat.c            |    3 +++
 4 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 2872e86..2d05421 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -160,6 +160,18 @@ static ssize_t node_read_numastat(struct sys_device * dev,
 }
 static SYSDEV_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
 
+static ssize_t node_read_writebackstat(struct sys_device *dev,
+			       struct sysdev_attribute *attr, char *buf)
+{
+	int nid = dev->id;
+	return sprintf(buf,
+		"Node %d pages_writeback: %lu times\n"
+		"Node %d pages_dirtied: %lu times\n",
+		nid, node_page_state(nid, NR_PAGES_ENTERED_WRITEBACK),
+		nid, node_page_state(nid, NR_FILE_PAGES_DIRTIED));
+}
+static SYSDEV_ATTR(writebackstat, S_IRUGO, node_read_writebackstat, NULL);
+
 static ssize_t node_read_distance(struct sys_device * dev,
 			struct sysdev_attribute *attr, char * buf)
 {
@@ -243,6 +255,7 @@ int register_node(struct node *node, int num, struct node *parent)
 		sysdev_create_file(&node->sysdev, &attr_meminfo);
 		sysdev_create_file(&node->sysdev, &attr_numastat);
 		sysdev_create_file(&node->sysdev, &attr_distance);
+		sysdev_create_file(&node->sysdev, &attr_writebackstat);
 
 		scan_unevictable_register_node(node);
 
@@ -267,6 +280,7 @@ void unregister_node(struct node *node)
 	sysdev_remove_file(&node->sysdev, &attr_meminfo);
 	sysdev_remove_file(&node->sysdev, &attr_numastat);
 	sysdev_remove_file(&node->sysdev, &attr_distance);
+	sysdev_remove_file(&node->sysdev, &attr_writebackstat);
 
 	scan_unevictable_unregister_node(node);
 	hugetlb_unregister_node(node);		/* no-op, if memoryless node */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6e6e626..fe4e6dd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -104,6 +104,8 @@ enum zone_stat_item {
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
 	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
+	NR_FILE_PAGES_DIRTIED,	/* number of times pages get dirtied */
+	NR_PAGES_ENTERED_WRITEBACK, /* number of times pages enter writeback */
 #ifdef CONFIG_NUMA
 	NUMA_HIT,		/* allocated in intended node */
 	NUMA_MISS,		/* allocated in non intended node */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ae5f5d5..1b1763c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1126,6 +1126,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 {
 	if (mapping_cap_account_dirty(mapping)) {
 		__inc_zone_page_state(page, NR_FILE_DIRTY);
+		__inc_zone_page_state(page, NR_FILE_PAGES_DIRTIED);
 		__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
 		task_dirty_inc(current);
 		task_io_account_write(PAGE_CACHE_SIZE);
@@ -1141,6 +1142,7 @@ EXPORT_SYMBOL(account_page_dirtied);
 void account_page_writeback(struct page *page)
 {
 	inc_zone_page_state(page, NR_WRITEBACK);
+	inc_zone_page_state(page, NR_PAGES_ENTERED_WRITEBACK);
 }
 EXPORT_SYMBOL(account_page_writeback);
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f389168..073a496 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -732,6 +732,9 @@ static const char * const vmstat_text[] = {
 	"nr_isolated_anon",
 	"nr_isolated_file",
 	"nr_shmem",
+	"nr_dirtied",
+	"nr_entered_writeback",
+
 #ifdef CONFIG_NUMA
 	"numa_hit",
 	"numa_miss",
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread
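For readers who want to consume these counters from userspace, here is a minimal sketch in C. It assumes the counter names proposed in this patch (nr_dirtied, nr_entered_writeback), which the thread is still debating, and the usual one "name value" pair per line layout of /proc/vmstat. The helper names vmstat_lookup() and writeback_lag() are hypothetical and not part of the patch. The second helper compares two samples and reports how many more pages were dirtied than entered writeback in between, which is the "is writeback keeping up" comparison the cover letter describes.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find one "name value" line in /proc/vmstat-style
 * text. Returns 0 and stores the value on success, -1 if the name is
 * absent or its value cannot be parsed.
 */
static int vmstat_lookup(const char *text, const char *name,
			 unsigned long long *out)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Require a space after the name so "nr_dirty" != "nr_dirtied". */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len, "%llu", out) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Hypothetical helper: given two snapshots of the vmstat text, compute
 * (pages dirtied) - (pages that entered writeback) over the interval.
 * A persistently positive result suggests writeback is falling behind.
 */
static int writeback_lag(const char *before, const char *after,
			 long long *lag)
{
	unsigned long long d0, d1, w0, w1;

	if (vmstat_lookup(before, "nr_dirtied", &d0) ||
	    vmstat_lookup(after, "nr_dirtied", &d1) ||
	    vmstat_lookup(before, "nr_entered_writeback", &w0) ||
	    vmstat_lookup(after, "nr_entered_writeback", &w1))
		return -1;

	*lag = (long long)(d1 - d0) - (long long)(w1 - w0);
	return 0;
}
```

A caller would read /proc/vmstat into a buffer twice, some seconds apart, and pass both buffers to writeback_lag(); dividing each delta by the interval gives the dirtying and writeback rates separately.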
* Re: [PATCH 3/4] writeback: nr_dirtied and nr_entered_writeback in /proc/vmstat
  2010-08-20  9:31   ` Michael Rubin
@ 2010-08-20 10:05     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 59+ messages in thread
From: KOSAKI Motohiro @ 2010-08-20 10:05 UTC (permalink / raw)
  To: Michael Rubin
  Cc: kosaki.motohiro, linux-kernel, linux-fsdevel, linux-mm,
	fengguang.wu, jack, riel, akpm, david, npiggin, hch, axboe

> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index f389168..073a496 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -732,6 +732,9 @@ static const char * const vmstat_text[] = {
>  	"nr_isolated_anon",
>  	"nr_isolated_file",
>  	"nr_shmem",
> +	"nr_dirtied",
> +	"nr_entered_writeback",
> +
>  #ifdef CONFIG_NUMA
>  	"numa_hit",
>  	"numa_miss",

'nr_entered_writeback' seems OK, but 'nr_dirtied' is a bit too easily
confused with 'nr_dirty'. Could you please choose a clearer, more
meaningful name?

Otherwise looks good to me.
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread
* Re: [PATCH 3/4] writeback: nr_dirtied and nr_entered_writeback in /proc/vmstat
  2010-08-20  9:31   ` Michael Rubin
@ 2010-08-20 10:08     ` Wu Fengguang
  -1 siblings, 0 replies; 59+ messages in thread
From: Wu Fengguang @ 2010-08-20 10:08 UTC (permalink / raw)
  To: Michael Rubin
  Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david,
	npiggin, hch, axboe

On Fri, Aug 20, 2010 at 05:31:28PM +0800, Michael Rubin wrote:
> To help developers and applications gain visibility into writeback
> behaviour adding two entries to /proc/vmstat.
>
>    # grep nr_dirtied /proc/vmstat
>    nr_dirtied 3747
>    # grep nr_entered_writeback /proc/vmstat
>    nr_entered_writeback 3618

How about the names nr_dirty_accumulated and nr_writeback_accumulated?
It seems more consistent, for both the interface and code (see below).
I'm not really sure though.

> In order to track the "cleaned" and "dirtied" counts we added two
> vm_stat_items.  Per memory node stats have been added also. So we can
> see per node granularity:
>
>    # cat /sys/devices/system/node/node20/writebackstat
>    Node 20 pages_writeback: 0 times
>    Node 20 pages_dirtied: 0 times

I'd prefer the name "vmstat" over "writebackstat", and propose to
migrate items from /proc/zoneinfo over time. zoneinfo is a terrible
interface for scripting.

Also, are there meaningful uses of per-node writeback stats?
The numbers are naturally per-bdi ones instead. But if we plan to
expose them for each bdi, this patch will need to be implemented
vastly differently.

> Signed-off-by: Michael Rubin <mrubin@google.com>
> ---
>  drivers/base/node.c    |   14 ++++++++++++++
>  include/linux/mmzone.h |    2 ++
>  mm/page-writeback.c    |    2 ++
>  mm/vmstat.c            |    3 +++
>  4 files changed, 21 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 2872e86..2d05421 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -160,6 +160,18 @@ static ssize_t node_read_numastat(struct sys_device * dev,
>  }
>  static SYSDEV_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
>
> +static ssize_t node_read_writebackstat(struct sys_device *dev,
> +			       struct sysdev_attribute *attr, char *buf)
> +{
> +	int nid = dev->id;
> +	return sprintf(buf,
> +		"Node %d pages_writeback: %lu times\n"
> +		"Node %d pages_dirtied: %lu times\n",
> +		nid, node_page_state(nid, NR_PAGES_ENTERED_WRITEBACK),
> +		nid, node_page_state(nid, NR_FILE_PAGES_DIRTIED));

		nid, node_page_state(nid, NR_WRITEBACK_ACCUMULATED),
		nid, node_page_state(nid, NR_FILE_DIRTY_ACCUMULATED));

> +}
> +static SYSDEV_ATTR(writebackstat, S_IRUGO, node_read_writebackstat, NULL);
> +

s/writebackstat/vmstat/

>  static ssize_t node_read_distance(struct sys_device * dev,
>  			struct sysdev_attribute *attr, char * buf)
>  {
> @@ -243,6 +255,7 @@ int register_node(struct node *node, int num, struct node *parent)
>  		sysdev_create_file(&node->sysdev, &attr_meminfo);
>  		sysdev_create_file(&node->sysdev, &attr_numastat);
>  		sysdev_create_file(&node->sysdev, &attr_distance);
> +		sysdev_create_file(&node->sysdev, &attr_writebackstat);

ditto s/writebackstat/vmstat/

>  		scan_unevictable_register_node(node);
>
> @@ -267,6 +280,7 @@ void unregister_node(struct node *node)
>  	sysdev_remove_file(&node->sysdev, &attr_meminfo);
>  	sysdev_remove_file(&node->sysdev, &attr_numastat);
>  	sysdev_remove_file(&node->sysdev, &attr_distance);
> +	sysdev_remove_file(&node->sysdev, &attr_writebackstat);

ditto s/writebackstat/vmstat/

>  	scan_unevictable_unregister_node(node);
>  	hugetlb_unregister_node(node);		/* no-op, if memoryless node */
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 6e6e626..fe4e6dd 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -104,6 +104,8 @@ enum zone_stat_item {
>  	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
>  	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
>  	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
> +	NR_FILE_PAGES_DIRTIED,	/* number of times pages get dirtied */
> +	NR_PAGES_ENTERED_WRITEBACK, /* number of times pages enter writeback */

	NR_FILE_DIRTY_ACCUMULATED,	/* number of times pages get dirtied */
	NR_WRITEBACK_ACCUMULATED,	/* number of times pages enter writeback */

>  #ifdef CONFIG_NUMA
>  	NUMA_HIT,		/* allocated in intended node */
>  	NUMA_MISS,		/* allocated in non intended node */
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index ae5f5d5..1b1763c 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1126,6 +1126,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
>  {
>  	if (mapping_cap_account_dirty(mapping)) {
>  		__inc_zone_page_state(page, NR_FILE_DIRTY);
> +		__inc_zone_page_state(page, NR_FILE_PAGES_DIRTIED);

		__inc_zone_page_state(page, NR_FILE_DIRTY_ACCUMULATED);

>  		__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
>  		task_dirty_inc(current);
>  		task_io_account_write(PAGE_CACHE_SIZE);
> @@ -1141,6 +1142,7 @@ EXPORT_SYMBOL(account_page_dirtied);
>  void account_page_writeback(struct page *page)
>  {
>  	inc_zone_page_state(page, NR_WRITEBACK);
> +	inc_zone_page_state(page, NR_PAGES_ENTERED_WRITEBACK);

	inc_zone_page_state(page, NR_WRITEBACK_ACCUMULATED);

>  }
>  EXPORT_SYMBOL(account_page_writeback);
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index f389168..073a496 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -732,6 +732,9 @@ static const char * const vmstat_text[] = {
>  	"nr_isolated_anon",
>  	"nr_isolated_file",
>  	"nr_shmem",
> +	"nr_dirtied",
> +	"nr_entered_writeback",

	"nr_dirty_accumulated",
	"nr_writeback_accumulated",

>  #ifdef CONFIG_NUMA
>  	"numa_hit",
>  	"numa_miss",
> --
> 1.7.1

^ permalink raw reply	[flat|nested] 59+ messages in thread
* Re: [PATCH 3/4] writeback: nr_dirtied and nr_entered_writeback in /proc/vmstat
  2010-08-20 10:08     ` Wu Fengguang
@ 2010-08-20 23:51       ` Michael Rubin
  -1 siblings, 0 replies; 59+ messages in thread
From: Michael Rubin @ 2010-08-20 23:51 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david,
	npiggin, hch, axboe

On Fri, Aug 20, 2010 at 3:08 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> How about the names nr_dirty_accumulated and nr_writeback_accumulated?
> It seems more consistent, for both the interface and code (see below).
> I'm not really sure though.

Those names don't seem right to me.
I admit I like "nr_dirtied" and "nr_cleaned"; those seem the most
readily understood. These numbers also get very big pretty fast, so I
don't think it's hard to infer.

>> In order to track the "cleaned" and "dirtied" counts we added two
>> vm_stat_items.  Per memory node stats have been added also. So we can
>> see per node granularity:
>>
>>    # cat /sys/devices/system/node/node20/writebackstat
>>    Node 20 pages_writeback: 0 times
>>    Node 20 pages_dirtied: 0 times
>
> I'd prefer the name "vmstat" over "writebackstat", and propose to
> migrate items from /proc/zoneinfo over time. zoneinfo is a terrible
> interface for scripting.

I like vmstat also. I can do that.

> Also, are there meaningful uses of per-node writeback stats?

For us yes. We use fake numa nodes to implement cgroup memory isolation.
This allows us to see what the writeback behaviour is like per cgroup.

> The numbers are naturally per-bdi ones instead. But if we plan to
> expose them for each bdi, this patch will need to be implemented
> vastly differently.

Currently I have no plans to do that.

mrubin

^ permalink raw reply	[flat|nested] 59+ messages in thread
* Re: [PATCH 3/4] writeback: nr_dirtied and nr_entered_writeback in /proc/vmstat
  2010-08-20 23:51       ` Michael Rubin
  (?)
@ 2010-08-21  0:48       ` Wu Fengguang
  -1 siblings, 0 replies; 59+ messages in thread
From: Wu Fengguang @ 2010-08-21  0:48 UTC (permalink / raw)
  To: Michael Rubin
  Cc: Peter Zijlstra, KOSAKI Motohiro, linux-kernel, linux-fsdevel,
	linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

On Sat, Aug 21, 2010 at 07:51:38AM +0800, Michael Rubin wrote:
> On Fri, Aug 20, 2010 at 3:08 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > How about the names nr_dirty_accumulated and nr_writeback_accumulated?
> > It seems more consistent, for both the interface and code (see below).
> > I'm not really sure though.
>
> Those names don't seem right to me.
> I admit I like "nr_dirtied" and "nr_cleaned"; those seem the most
> readily understood. These numbers also get very big pretty fast, so I
> don't think it's hard to infer.

That's fine. I like "nr_cleaned".

> >> In order to track the "cleaned" and "dirtied" counts we added two
> >> vm_stat_items.  Per memory node stats have been added also. So we can
> >> see per node granularity:
> >>
> >>    # cat /sys/devices/system/node/node20/writebackstat
> >>    Node 20 pages_writeback: 0 times
> >>    Node 20 pages_dirtied: 0 times
> >
> > I'd prefer the name "vmstat" over "writebackstat", and propose to
> > migrate items from /proc/zoneinfo over time. zoneinfo is a terrible
> > interface for scripting.
>
> I like vmstat also. I can do that.

Thank you.

> > Also, are there meaningful uses of per-node writeback stats?
>
> For us yes. We use fake numa nodes to implement cgroup memory isolation.
> This allows us to see what the writeback behaviour is like per cgroup.

That's sure convenient for you, for now. But it's a special use case.

I wonder if you'll still stick to the fake NUMA scenario two years
later -- when memcg grows powerful enough. What do we do then? "Hey
let's rip these counters, their major consumer has dumped them.."

For per-job nr_dirtied, I suspect the per-process write_bytes and
cancelled_write_bytes in /proc/self/io will serve you well.

For per-job nr_cleaned, I suspect the per-zone nr_writeback will be
sufficient for debug purposes (despite being a bit different).

> > The numbers are naturally per-bdi ones instead. But if we plan to
> > expose them for each bdi, this patch will need to be implemented
> > vastly differently.
>
> Currently I have no plans to do that.

Peter? :)

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 59+ messages in thread
* Re: [PATCH 3/4] writeback: nr_dirtied and nr_entered_writeback in /proc/vmstat
  2010-08-21  0:48       ` Wu Fengguang
@ 2010-08-23 17:45         ` Michael Rubin
  -1 siblings, 0 replies; 59+ messages in thread
From: Michael Rubin @ 2010-08-23 17:45 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Peter Zijlstra, KOSAKI Motohiro, linux-kernel, linux-fsdevel,
	linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

On Fri, Aug 20, 2010 at 5:48 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> I wonder if you'll still stick to the fake NUMA scenario two years
> later -- when memcg grows powerful enough. What do we do then? "Hey
> let's rip these counters, their major consumer has dumped them.."

I think the counters will still be useful for NUMA also. Is there a
performance hit here I am missing to having the per node counters?
Just want to make sure we are only wondering about whether or not we
are polluting the interface? Also since we plan to change the name to
vmstat instead doesn't that make it more generic in the future?

mrubin

^ permalink raw reply	[flat|nested] 59+ messages in thread
* Re: [PATCH 3/4] writeback: nr_dirtied and nr_entered_writeback in /proc/vmstat
  2010-08-23 17:45 ` Michael Rubin
@ 2010-08-24  2:30 ` Wu Fengguang
From: Wu Fengguang @ 2010-08-24 2:30 UTC
To: Michael Rubin
Cc: Peter Zijlstra, KOSAKI Motohiro, linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

On Tue, Aug 24, 2010 at 01:45:41AM +0800, Michael Rubin wrote:
> On Fri, Aug 20, 2010 at 5:48 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > I wonder if you'll still stick to the fake NUMA scenario two years
> > later -- when memcg grows powerful enough. What do we do then? "Hey
> > let's rip these counters, their major consumer has dumped them.."
>
> I think the counters will still be useful for NUMA also. Is there a
> performance hit here I am missing to having the per node counters?
> Just want to make sure we are only wondering about whether or not we
> are polluting the interface? Also since we plan to change the name to
> vmstat instead doesn't that make it more generic in the future?

It's about the interface. I don't mind you adding the per-node vmstat
entries, which may be convenient for you and mostly harmless to others.

My concern is: what do you think about the existing
/proc/<pid>/io:write_bytes interface, and is it good enough for you?
You'll have to iterate through tasks to collect numbers for one job or
for the whole system, however that should be easy and more flexible?

Thanks,
Fengguang
* Re: [PATCH 3/4] writeback: nr_dirtied and nr_entered_writeback in /proc/vmstat
  2010-08-24  2:30 ` Wu Fengguang
@ 2010-08-24  3:02 ` Michael Rubin
From: Michael Rubin @ 2010-08-24 3:02 UTC
To: Wu Fengguang
Cc: Peter Zijlstra, KOSAKI Motohiro, linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

On Mon, Aug 23, 2010 at 7:30 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> It's about the interface. I don't mind you adding the per-node vmstat
> entries, which may be convenient for you and mostly harmless to others.
>
> My concern is: what do you think about the existing
> /proc/<pid>/io:write_bytes interface, and is it good enough for you?
> You'll have to iterate through tasks to collect numbers for one job or
> for the whole system, however that should be easy and more flexible?

Is this meant as an alternative to the vmstat counters or to the per
node vmstat counters?

In either case I am not sure /proc/pid will be sufficient. What if 20
processes are created, write a lot of data, then quit? How do we know
about those events later? What about jobs that wake up, write data
then quit quickly?

If we are running many many tasks I would think the amount of time it
can take to cycle through all of them might mean we might not capture
all the data.

The goal is to have a complete view of the dirtying and cleaning of
the pages over time. Using /proc/pid feels like it won't achieve that.

mrubin
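To make the argument above concrete: system-wide counters can be sampled
at any interval without tracking individual tasks, so short-lived
writers are still accounted for. Below is a minimal user-space sketch in
C, assuming the proposed nr_dirtied and nr_entered_writeback entries are
present in /proc/vmstat. The helper names (read_text_file, vmstat_lookup,
counter_rate) are made up for illustration and are not part of the patch
series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read up to size-1 bytes of `path` into buf and NUL-terminate it. */
int read_text_file(const char *path, char *buf, size_t size)
{
	FILE *f;
	size_t n;

	assert(size > 0);	/* need room for the terminating NUL */
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}

/*
 * Find "name value" in a /proc/vmstat-style buffer (one "key value"
 * pair per line). Returns 0 and stores the value, or -1 if not found.
 */
int vmstat_lookup(const char *text, const char *name, unsigned long long *out)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Match the whole key: it must be followed by a space. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len, "%llu", out) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline */
	}
	return -1;
}

/* Pages per second between two counter samples taken `secs` apart. */
double counter_rate(unsigned long long before, unsigned long long after,
		    double secs)
{
	return secs > 0 ? (double)(after - before) / secs : 0.0;
}
```

A monitoring loop would call read_text_file("/proc/vmstat", ...) twice,
some seconds apart, look up both counters in each sample, and compare
the two rates. If the nr_dirtied rate stays above the
nr_entered_writeback rate for a sustained period, writeback is
presumably falling behind the incoming dirty traffic.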
* Re: [PATCH 3/4] writeback: nr_dirtied and nr_entered_writeback in /proc/vmstat
  2010-08-24  3:02 ` Michael Rubin
@ 2010-08-24  3:25 ` Wu Fengguang
From: Wu Fengguang @ 2010-08-24 3:25 UTC
To: Michael Rubin
Cc: Shailabh Nagar, Peter Zijlstra, KOSAKI Motohiro, linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

On Tue, Aug 24, 2010 at 11:02:42AM +0800, Michael Rubin wrote:
> On Mon, Aug 23, 2010 at 7:30 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > It's about the interface. I don't mind you adding the per-node vmstat
> > entries, which may be convenient for you and mostly harmless to others.
> >
> > My concern is: what do you think about the existing
> > /proc/<pid>/io:write_bytes interface, and is it good enough for you?
> > You'll have to iterate through tasks to collect numbers for one job or
> > for the whole system, however that should be easy and more flexible?
>
> Is this meant as an alternative to the vmstat counters or to the per
> node vmstat counters?
>
> In either case I am not sure /proc/pid will be sufficient. What if 20
> processes are created, write a lot of data, then quit? How do we know
> about those events later? What about jobs that wake up, write data
> then quit quickly?

According to Documentation/accounting/taskstats.txt, it's possible to
register a task for collecting its stat numbers when it quits. However
I'm not sure if there are reliable ways to do this if tasks come and
go very quickly. It may help to somehow automatically add the numbers
to the parent process at process quit time.

> If we are running many many tasks I would think the amount of time it
> can take to cycle through all of them might mean we might not capture
> all the data.

Yes, 10k tasks may be too much to track.

> The goal is to have a complete view of the dirtying and cleaning of
> the pages over time. Using /proc/pid feels like it won't achieve that.

Looks so.

Thanks,
Fengguang
* [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-08-20  9:31 ` Michael Rubin
@ 2010-08-20  9:31 ` Michael Rubin
From: Michael Rubin @ 2010-08-20 9:31 UTC
To: linux-kernel, linux-fsdevel, linux-mm
Cc: fengguang.wu, jack, riel, akpm, david, npiggin, hch, axboe, Michael Rubin

The kernel already exposes the user desired thresholds in /proc/sys/vm
with dirty_background_ratio and dirty_ratio. But the kernel may
alter the number requested without giving the user any indication that
is the case.

Knowing the actual ratios the kernel is honoring can help app developers
understand how their buffered IO will be sent to the disk.

	$ grep threshold /proc/vmstat
	nr_dirty_threshold 409111
	nr_dirty_background_threshold 818223

Signed-off-by: Michael Rubin <mrubin@google.com>
---
 include/linux/mmzone.h |    3 +++
 mm/vmstat.c            |    5 +++++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fe4e6dd..c2243d0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -106,6 +106,9 @@ enum zone_stat_item {
 	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
 	NR_FILE_PAGES_DIRTIED,	/* number of times pages get dirtied */
 	NR_PAGES_ENTERED_WRITEBACK, /* number of times pages enter writeback */
+	NR_DIRTY_THRESHOLD,	/* writeback threshold */
+	NR_DIRTY_BG_THRESHOLD,	/* bg writeback threshold */
+
 #ifdef CONFIG_NUMA
 	NUMA_HIT,		/* allocated in intended node */
 	NUMA_MISS,		/* allocated in non intended node */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 073a496..c755d71 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -17,6 +17,7 @@
 #include <linux/vmstat.h>
 #include <linux/sched.h>
 #include <linux/math64.h>
+#include <linux/writeback.h>
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
 DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
@@ -734,6 +735,8 @@ static const char * const vmstat_text[] = {
 	"nr_shmem",
 	"nr_dirtied",
 	"nr_entered_writeback",
+	"nr_dirty_threshold",
+	"nr_dirty_background_threshold",
 
 #ifdef CONFIG_NUMA
 	"numa_hit",
@@ -917,6 +920,8 @@ static void *vmstat_start(struct seq_file *m, loff_t *pos)
 		return ERR_PTR(-ENOMEM);
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 		v[i] = global_page_state(i);
+
+	global_dirty_limits(v + NR_DIRTY_THRESHOLD, v + NR_DIRTY_BG_THRESHOLD);
 #ifdef CONFIG_VM_EVENT_COUNTERS
 	e = v + NR_VM_ZONE_STAT_ITEMS;
 	all_vm_events(e);
-- 
1.7.1
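As an illustration of why the effective thresholds can differ from the
ratios a user writes to /proc/sys/vm, here is a simplified user-space
model in C. It is a sketch under stated assumptions, not the kernel's
code: dirtyable_pages is a hypothetical input standing in for whatever
memory the kernel treats as dirtyable, the struct and function names are
invented for this example, and the only adjustment modelled is the one
where a background limit at or above the dirty limit is, as far as I
understand the writeback code of this era, pulled down to half of the
dirty limit.

```c
#include <assert.h>

struct dirty_limits {
	unsigned long background;	/* pages: background writeback kicks in */
	unsigned long dirty;		/* pages: writers start being throttled */
};

/*
 * Simplified model of ratio-based dirty limits (illustrative only).
 * Ratios are percentages of dirtyable_pages, as with the vm sysctls.
 */
struct dirty_limits effective_limits(unsigned long dirtyable_pages,
				     unsigned int dirty_ratio,
				     unsigned int background_ratio)
{
	struct dirty_limits l;

	assert(dirty_ratio <= 100 && background_ratio <= 100);

	/* Widen before multiplying so large page counts cannot overflow. */
	l.dirty = (unsigned long)
		((unsigned long long)dirty_ratio * dirtyable_pages / 100);
	l.background = (unsigned long)
		((unsigned long long)background_ratio * dirtyable_pages / 100);

	/*
	 * Assumed adjustment: a background limit that is not below the
	 * dirty limit is silently replaced by half of the dirty limit.
	 */
	if (l.background >= l.dirty)
		l.background = l.dirty / 2;

	return l;
}
```

With 1,000,000 dirtyable pages, ratios of 20 and 10 give limits of
200000 and 100000 pages as requested, whereas a misconfigured pair of 10
and 20 yields 100000 and 50000. The user asked for a 20% background
ratio and never sees that it was overridden, which is the kind of
mismatch the proposed vmstat entries are meant to expose.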
* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-08-20  9:31 ` Michael Rubin
@ 2010-08-20 10:06 ` KOSAKI Motohiro
From: KOSAKI Motohiro @ 2010-08-20 10:06 UTC
To: Michael Rubin
Cc: kosaki.motohiro, linux-kernel, linux-fsdevel, linux-mm, fengguang.wu, jack, riel, akpm, david, npiggin, hch, axboe

> The kernel already exposes the user desired thresholds in /proc/sys/vm
> with dirty_background_ratio and dirty_ratio. But the kernel may
> alter the number requested without giving the user any indication that
> is the case.
>
> Knowing the actual ratios the kernel is honoring can help app developers
> understand how their buffered IO will be sent to the disk.
>
> 	$ grep threshold /proc/vmstat
> 	nr_dirty_threshold 409111
> 	nr_dirty_background_threshold 818223
>
> Signed-off-by: Michael Rubin <mrubin@google.com>

Looks good to me.
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-08-20 10:06 ` KOSAKI Motohiro
@ 2010-08-22 10:27 ` KOSAKI Motohiro
From: KOSAKI Motohiro @ 2010-08-22 10:27 UTC
To: KOSAKI Motohiro
Cc: kosaki.motohiro, Michael Rubin, linux-kernel, linux-fsdevel, linux-mm, fengguang.wu, jack, riel, akpm, david, npiggin, hch, axboe

> > The kernel already exposes the user desired thresholds in /proc/sys/vm
> > with dirty_background_ratio and dirty_ratio. But the kernel may
> > alter the number requested without giving the user any indication that
> > is the case.
> >
> > Knowing the actual ratios the kernel is honoring can help app developers
> > understand how their buffered IO will be sent to the disk.
> >
> > 	$ grep threshold /proc/vmstat
> > 	nr_dirty_threshold 409111
> > 	nr_dirty_background_threshold 818223
> >
> > Signed-off-by: Michael Rubin <mrubin@google.com>
>
> Looks good to me.
> 	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Sorry, this was a mistake. Wu pointed out that this patch is unnecessary.

Wu wrote:
> I realized that the dirty thresholds have already been exported here:
>
> $ grep Thresh /debug/bdi/8:0/stats
> BdiDirtyThresh:     381000 kB
> DirtyThresh:       1719076 kB
> BackgroundThresh:   859536 kB
>
> So why not use that interface directly?
* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-08-20  9:31 ` Michael Rubin
@ 2010-08-20 10:12 ` Wu Fengguang
From: Wu Fengguang @ 2010-08-20 10:12 UTC
To: Michael Rubin
Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

On Fri, Aug 20, 2010 at 05:31:29PM +0800, Michael Rubin wrote:
> The kernel already exposes the user desired thresholds in /proc/sys/vm
> with dirty_background_ratio and dirty_ratio. But the kernel may
> alter the number requested without giving the user any indication that
> is the case.
>
> Knowing the actual ratios the kernel is honoring can help app developers
> understand how their buffered IO will be sent to the disk.
>
> 	$ grep threshold /proc/vmstat
> 	nr_dirty_threshold 409111
> 	nr_dirty_background_threshold 818223
>
> Signed-off-by: Michael Rubin <mrubin@google.com>
> ---
>  include/linux/mmzone.h |    3 +++
>  mm/vmstat.c            |    5 +++++
>  2 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fe4e6dd..c2243d0 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -106,6 +106,9 @@ enum zone_stat_item {
>  	NR_SHMEM,		/* shmem pages (included tmpfs/GEM pages) */
>  	NR_FILE_PAGES_DIRTIED,	/* number of times pages get dirtied */
>  	NR_PAGES_ENTERED_WRITEBACK, /* number of times pages enter writeback */
> +	NR_DIRTY_THRESHOLD,	/* writeback threshold */
> +	NR_DIRTY_BG_THRESHOLD,	/* bg writeback threshold */

This may cost a cacheline.

Thanks,
Fengguang
* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-08-20  9:31 ` Michael Rubin
@ 2010-08-21  5:48 ` Wu Fengguang
From: Wu Fengguang @ 2010-08-21 5:48 UTC
To: Michael Rubin
Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

On Fri, Aug 20, 2010 at 05:31:29PM +0800, Michael Rubin wrote:
> The kernel already exposes the user desired thresholds in /proc/sys/vm
> with dirty_background_ratio and dirty_ratio. But the kernel may
> alter the number requested without giving the user any indication that
> is the case.
>
> Knowing the actual ratios the kernel is honoring can help app developers
> understand how their buffered IO will be sent to the disk.
>
> 	$ grep threshold /proc/vmstat
> 	nr_dirty_threshold 409111
> 	nr_dirty_background_threshold 818223

I realized that the dirty thresholds have already been exported here:

$ grep Thresh /debug/bdi/8:0/stats
BdiDirtyThresh:     381000 kB
DirtyThresh:       1719076 kB
BackgroundThresh:   859536 kB

So why not use that interface directly?

Thanks,
Fengguang
* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
  2010-08-21  5:48 ` Wu Fengguang
@ 2010-08-23 17:52 ` Michael Rubin
From: Michael Rubin @ 2010-08-23 17:52 UTC
To: Wu Fengguang
Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

On Fri, Aug 20, 2010 at 10:48 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Fri, Aug 20, 2010 at 05:31:29PM +0800, Michael Rubin wrote:
>> The kernel already exposes the user desired thresholds in /proc/sys/vm
>> with dirty_background_ratio and dirty_ratio. But the kernel may
>> alter the number requested without giving the user any indication that
>> is the case.
>>
>> Knowing the actual ratios the kernel is honoring can help app developers
>> understand how their buffered IO will be sent to the disk.
>>
>>        $ grep threshold /proc/vmstat
>>        nr_dirty_threshold 409111
>>        nr_dirty_background_threshold 818223
>
> I realized that the dirty thresholds have already been exported here:
>
> $ grep Thresh /debug/bdi/8:0/stats
> BdiDirtyThresh:     381000 kB
> DirtyThresh:       1719076 kB
> BackgroundThresh:   859536 kB
>
> So why not use that interface directly?

LOL. I know about these counters. This goes back and forth a lot.
There are several reasons we don't want to use this interface.

1) It's exporting the implementation of writeback. We are doing bdi
today but one day we may not.
2) We need a non debugfs version since there are many situations where
debugfs requires root to mount and non root users may want this data.
Mounting debugfs all the time is not always an option.
3) Full system counters make it easier to handle removable storage,
where the per-device numbers appear and disappear dynamically.

The goal is to get a full view of the system writeback behaviour, not a
"kinda got it-oops maybe not" view.

mrubin
* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
From: KOSAKI Motohiro @ 2010-08-24 1:20 UTC
To: Michael Rubin
Cc: kosaki.motohiro, Wu Fengguang, Peter Zijlstra, linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

> On Fri, Aug 20, 2010 at 10:48 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > On Fri, Aug 20, 2010 at 05:31:29PM +0800, Michael Rubin wrote:
> >> The kernel already exposes the user desired thresholds in /proc/sys/vm
> >> with dirty_background_ratio and dirty_ratio. But the kernel may
> >> alter the number requested without giving the user any indication that
> >> is the case.
> >>
> >> Knowing the actual ratios the kernel is honoring can help app developers
> >> understand how their buffered IO will be sent to the disk.
> >>
> >> $ grep threshold /proc/vmstat
> >> nr_dirty_threshold 409111
> >> nr_dirty_background_threshold 818223
> >
> > I realized that the dirty thresholds have already been exported here:
> >
> > $ grep Thresh /debug/bdi/8:0/stats
> > BdiDirtyThresh: 381000 kB
> > DirtyThresh: 1719076 kB
> > BackgroundThresh: 859536 kB
> >
> > So why not use that interface directly?
>
> LOL. I know about these counters. This goes back and forth a lot.
> The reason we don't want to use this interface is several fold.

Please don't use LOL if you want to get a good discussion. AFAICT, Wu has
deep knowledge in this area. However, not every kernel developer knows
every kernel knob.

> 1) It's exporting the implementation of writeback. We are doing bdi
> today but one day we may not.
> 2) We need a non debugfs version since there are many situations where
> debugfs requires root to mount and non root users may want this data.
> Mounting debugfs all the time is not always an option.

Nowadays, many distros mount debugfs at boot time. So can you please
elaborate on the risk you are worried about, even though we have namespaces?

> 3) Full system counters are easier to handle the juggling of removable
> storage where these numbers will appear and disappear due to being
> dynamic.
>
> The goal is to get a full view of the system writeback behaviour not a
> "kinda got it-oops maybe not" view.

I bet nobody opposes this point :)
* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
From: Michael Rubin @ 2010-08-24 1:41 UTC
To: KOSAKI Motohiro
Cc: Wu Fengguang, Peter Zijlstra, linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

On Mon, Aug 23, 2010 at 6:20 PM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>> On Fri, Aug 20, 2010 at 10:48 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> LOL. I know about these counters. This goes back and forth a lot.
>> The reason we don't want to use this interface is several fold.
>
> Please don't use LOL if you want to get a good discussion. AFAICT, Wu has
> deep knowledge in this area. However, not every kernel developer knows
> every kernel knob.

Apologies. No offense was intended. I was laughing at the situation
and how I too once thought the per bdi counters were enough. Feng has
been very helpful and patient. The discussion has done nothing but
help the code so far so it is appreciated.

> Nowadays, many distros mount debugfs at boot time. So can you please
> elaborate on the risk you are worried about, even though we have namespaces?

Right now we don't mount all of debugfs at boot time. We have not done
the work to verify it's safe in our environment. It's mostly a nit.

Also I was under the impression that debugfs was intended more for
kernel devs while /proc and /sys were intended for application
developers.

>> 3) Full system counters are easier to handle the juggling of removable
>> storage where these numbers will appear and disappear due to being
>> dynamic.

This is the biggie to me. The idea is to get a complete view of the
system's writeback behaviour over time. On systems with hot plug
devices, or many many drives, collecting that view gets difficult.

>> The goal is to get a full view of the system writeback behaviour not a
>> "kinda got it-oops maybe not" view.
>
> I bet nobody opposes this point :)

Yup.

mrubin
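[Editorial sketch, not part of the original message.] The "full view of
the system" argument above can be illustrated from user space. The
following is a minimal sketch that assumes the two field names proposed
in this patch series (nr_dirty_threshold and
nr_dirty_background_threshold) are present in /proc/vmstat; the helper
names parse_vmstat and read_dirty_thresholds are hypothetical. Unlike a
per-bdi debugfs file, the path does not depend on which devices are
currently plugged in.

```python
def parse_vmstat(text):
    """Parse '/proc/vmstat'-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed 'name <non-negative integer>' lines.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_dirty_thresholds(path="/proc/vmstat"):
    """Return (dirty, background) thresholds in pages, or None if absent.

    The field names are the ones proposed in this thread; a kernel
    without the patch will not report them, hence the None case.
    """
    with open(path) as f:
        counters = parse_vmstat(f.read())
    dirty = counters.get("nr_dirty_threshold")
    background = counters.get("nr_dirty_background_threshold")
    if dirty is None or background is None:
        return None
    return dirty, background


# Sample text copied from the patch description quoted in this thread.
SAMPLE = """nr_dirty_threshold 409111
nr_dirty_background_threshold 818223
"""

if __name__ == "__main__":
    print(parse_vmstat(SAMPLE))
```

A monitoring tool could call read_dirty_thresholds() periodically
without root and without mounting debugfs, which is the access pattern
argued for in points 2) and 3).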
* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
From: Wu Fengguang @ 2010-08-24 2:11 UTC
To: Michael Rubin
Cc: KOSAKI Motohiro, Peter Zijlstra, linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

> Right now we don't mount all of debugfs at boot time. We have not done
> the work to verify it's safe in our environment. It's mostly a nit.

You work discreetly, that's a good thing. Note that most
sub-directories under debugfs can be turned off in kconfig.

> Also I was under the impression that debugfs was intended more for
> kernel devs while /proc and /sys were intended for application
> developers.

I guess the keyword here is "debugging/diagnosing". Think about
/debug/tracing. DirtyThresh seems like the same stuff.

> >> 3) Full system counters are easier to handle the juggling of removable
> >> storage where these numbers will appear and disappear due to being
> >> dynamic.
>
> This is the biggie to me. The idea is to get a complete view of the
> system's writeback behaviour over time. On systems with hot plug
> devices, or many many drives, collecting that view gets difficult.

Sorry for giving a wrong example. Hope this one is better:

$ cat /debug/bdi/default/stats
[...]
DirtyThresh: 1838904 kB
BackgroundThresh: 919452 kB
[...]

It's a trick to avoid messing with real devices :)

Thanks,
Fengguang
* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
From: Michael Rubin @ 2010-08-24 2:42 UTC
To: Wu Fengguang
Cc: KOSAKI Motohiro, Peter Zijlstra, linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

On Mon, Aug 23, 2010 at 7:11 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> Sorry for giving a wrong example. Hope this one is better:
>
> $ cat /debug/bdi/default/stats
> [...]
> DirtyThresh: 1838904 kB
> BackgroundThresh: 919452 kB
> [...]
>
> It's a trick to avoid messing with real devices :)

That's cool. And it's the exact code path :-)

mrubin
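[Editorial sketch, not part of the original message.] For comparison,
the debugfs output quoted above uses a "Name: value kB" layout rather
than the page counts of /proc/vmstat. A minimal sketch of reading it
follows; the helper names are hypothetical, and the 4 kB page size used
for the conversion is an assumption that does not hold on every
architecture.

```python
def parse_bdi_stats(text):
    """Parse 'Name: <number> kB' lines into {name: kilobytes}."""
    stats = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip anything that is not exactly '<digits> kB' after the colon,
        # such as the '[...]' elision markers in the quoted output.
        if sep and len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            stats[name.strip()] = int(fields[0])
    return stats


def kb_to_pages(kb, page_size_kb=4):
    """Convert kilobytes to pages; 4 kB pages are an assumption."""
    return kb // page_size_kb


# Sample text copied from the /debug/bdi/default/stats output quoted above.
SAMPLE = """[...]
DirtyThresh: 1838904 kB
BackgroundThresh: 919452 kB
[...]
"""

if __name__ == "__main__":
    for name, kb in parse_bdi_stats(SAMPLE).items():
        print(name, kb, "kB =", kb_to_pages(kb), "pages")
```

Reading the real file would still require debugfs to be mounted at the
path shown in the message, which is the constraint raised earlier in
the thread.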
* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
From: Wu Fengguang @ 2010-08-24 2:01 UTC
To: Michael Rubin
Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

> +	global_dirty_limits(v + NR_DIRTY_THRESHOLD, v + NR_DIRTY_BG_THRESHOLD);

Sorry I messed it up. The parameters should be swapped.
* Re: [PATCH 4/4] writeback: Reporting dirty thresholds in /proc/vmstat
From: Michael Rubin @ 2010-08-24 2:04 UTC
To: Wu Fengguang
Cc: linux-kernel, linux-fsdevel, linux-mm, jack, riel, akpm, david, npiggin, hch, axboe

On Mon, Aug 23, 2010 at 7:01 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> +	global_dirty_limits(v + NR_DIRTY_THRESHOLD, v + NR_DIRTY_BG_THRESHOLD);
>
> Sorry I messed it up. The parameters should be swapped.

Got it. Thanks.

mrubin
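[Editorial sketch, not part of the original message.] The swapped
arguments are visible from user space: background writeback starts
before the dirty limit is reached, so the background threshold should
not exceed the dirty threshold. The sample output in the patch
description (nr_dirty_threshold 409111, nr_dirty_background_threshold
818223) has them the other way round, which is consistent with the bug
acknowledged above. A minimal sanity check, with a hypothetical helper
name:

```python
def thresholds_look_swapped(dirty_threshold, background_threshold):
    """Return True if the background threshold exceeds the dirty threshold.

    Both values must be in the same unit (pages or kB). Equal values are
    treated as plausible, since the check only flags a clear inversion.
    """
    return background_threshold > dirty_threshold


if __name__ == "__main__":
    # Values from the patch description's /proc/vmstat sample (pages).
    print(thresholds_look_swapped(409111, 818223))
    # Values from the /debug/bdi/8:0/stats sample quoted in the thread (kB).
    print(thresholds_look_swapped(1719076, 859536))
```

The check only compares magnitudes; it says nothing about whether the
reported values match what the kernel is actually enforcing.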