From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Greg Thelen <gthelen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	containers@lists.osdl.org, linux-fsdevel@vger.kernel.org,
	Balbir Singh <bsingharora@gmail.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Dave Chinner <david@fromorbit.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Andrea Righi <andrea@betterlinux.com>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>
Subject: Re: [PATCH v9 11/13] writeback: make background writeback cgroup aware
Date: Thu, 18 Aug 2011 10:23:44 +0900
Message-ID: <20110818102344.110829ce.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <1313597705-6093-12-git-send-email-gthelen@google.com>

On Wed, 17 Aug 2011 09:15:03 -0700
Greg Thelen <gthelen@google.com> wrote:

> When the system is under background dirty memory threshold but some
> cgroups are over their background dirty memory thresholds, then only
> write back inodes associated with the over-limit cgroups.
> 
> In addition to checking if the system dirty memory usage is over the
> system background threshold, over_bground_thresh() now checks if any
> cgroups are over their respective background dirty memory thresholds.
> 
> If over-limit cgroups are found, then the new
> wb_writeback_work.for_cgroup field is set to distinguish between system
> and memcg overages.  The new wb_writeback_work.shared_inodes field is
> also set.  Inodes written by multiple cgroups are marked owned by
> I_MEMCG_SHARED rather than a particular cgroup.  Such shared inodes
> cannot easily be attributed to a cgroup, so per-cgroup writeback
> (future versions of wakeup_flusher_threads and balance_dirty_pages)
> performs suboptimally in the presence of shared inodes.  Therefore,
> write shared inodes when performing cgroup background writeback.
> 
> If performing cgroup writeback, move_expired_inodes() skips inodes that
> do not contribute dirty pages to the cgroup being written back.
> 
> After writing some pages, wb_writeback() will call
> mem_cgroup_writeback_done() to update the set of over-bg-limits memcg.
> 
> This change also makes wakeup_flusher_threads() memcg aware so that
> per-cgroup try_to_free_pages() is able to operate more efficiently
> without having to write pages of foreign containers.  This change adds a
> mem_cgroup parameter to wakeup_flusher_threads() to allow callers,
> especially try_to_free_pages() and foreground writeback from
> balance_dirty_pages(), to specify a particular cgroup to write inodes
> from.
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>
> ---
> Changelog since v8:
> 
> - Added optional memcg parameter to __bdi_start_writeback(),
>   bdi_start_writeback(), wakeup_flusher_threads(), writeback_inodes_wb().
> 
> - move_expired_inodes() now uses the passed-in struct wb_writeback_work instead
>   of struct writeback_control.
> 
> - Added comments to over_bground_thresh().
> 
>  fs/buffer.c               |    2 +-
>  fs/fs-writeback.c         |   96 +++++++++++++++++++++++++++++++++-----------
>  fs/sync.c                 |    2 +-
>  include/linux/writeback.h |    6 ++-
>  mm/backing-dev.c          |    3 +-
>  mm/page-writeback.c       |    3 +-
>  mm/vmscan.c               |    3 +-
>  7 files changed, 84 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index dd0220b..da1fb23 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -293,7 +293,7 @@ static void free_more_memory(void)
>  	struct zone *zone;
>  	int nid;
>  
> -	wakeup_flusher_threads(1024);
> +	wakeup_flusher_threads(1024, NULL);
>  	yield();
>  
>  	for_each_online_node(nid) {
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index e91fb82..ba55336 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -38,10 +38,14 @@ struct wb_writeback_work {
>  	struct super_block *sb;
>  	unsigned long *older_than_this;
>  	enum writeback_sync_modes sync_mode;
> +	unsigned short memcg_id;	/* If non-zero, then writeback specified
> +					 * cgroup. */
>  	unsigned int tagged_writepages:1;
>  	unsigned int for_kupdate:1;
>  	unsigned int range_cyclic:1;
>  	unsigned int for_background:1;
> +	unsigned int for_cgroup:1;	/* cgroup writeback */
> +	unsigned int shared_inodes:1;	/* write inodes spanning cgroups */
>  
>  	struct list_head list;		/* pending work list */
>  	struct completion *done;	/* set if the caller waits */
> @@ -114,9 +118,12 @@ static void bdi_queue_work(struct backing_dev_info *bdi,
>  	spin_unlock_bh(&bdi->wb_lock);
>  }
>  
> +/*
> + * @memcg is optional.  If set, then limit writeback to the specified cgroup.
> + */
>  static void
>  __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
> -		      bool range_cyclic)
> +		      bool range_cyclic, struct mem_cgroup *memcg)
>  {
>  	struct wb_writeback_work *work;
>  
> @@ -136,6 +143,8 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
>  	work->sync_mode	= WB_SYNC_NONE;
>  	work->nr_pages	= nr_pages;
>  	work->range_cyclic = range_cyclic;
> +	work->memcg_id = memcg ? css_id(mem_cgroup_css(memcg)) : 0;
> +	work->for_cgroup = memcg != NULL;
>  


I couldn't find a patch for mem_cgroup_css(NULL). Is it in patches 1-10?
The other parts seem OK to me.


Thanks,
-Kame
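
As a reading aid for the question above: in the quoted hunk, mem_cgroup_css() sits in the true branch of the conditional, so as written it is only evaluated when memcg is non-NULL. A minimal sketch of that guard follows. The struct layouts, the accessor bodies and the work_set_memcg() helper are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types: hypothetical simplifications, not the kernel's definitions. */
struct cgroup_subsys_state { unsigned short id; };
struct mem_cgroup { struct cgroup_subsys_state css; };

/* Only the fields touched by the quoted hunk. */
struct wb_writeback_work {
	unsigned short memcg_id;	/* 0 means "no specific cgroup" */
	unsigned int for_cgroup:1;	/* cgroup writeback */
};

/* Stand-in accessor; traps on NULL so an unguarded call would be obvious. */
static struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg)
{
	assert(memcg != NULL);
	return &memcg->css;
}

/* Stand-in for css_id(): returns the id stored in the stub above. */
static unsigned short css_id(struct cgroup_subsys_state *css)
{
	return css->id;
}

/* Hypothetical helper wrapping the two assignments from the hunk. */
static void work_set_memcg(struct wb_writeback_work *work,
			   struct mem_cgroup *memcg)
{
	/* The accessor is reached only in the non-NULL branch. */
	work->memcg_id = memcg ? css_id(mem_cgroup_css(memcg)) : 0;
	work->for_cgroup = memcg != NULL;
}
```

In this sketch a NULL memcg leaves memcg_id at 0 and for_cgroup clear without touching the accessor. Whether a NULL-tolerant or config-disabled stub of mem_cgroup_css() exists elsewhere in the series cannot be seen from this hunk.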

