From: Greg Thelen <gthelen@google.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	containers@lists.osdl.org, linux-fsdevel@vger.kernel.org,
	Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH v8 11/12] writeback: make background writeback cgroup aware
Date: Tue, 07 Jun 2011 13:43:08 -0700
Message-ID: <xr93lixdv0df.fsf@gthelen.mtv.corp.google.com>
In-Reply-To: <20110607193835.GD26965@redhat.com> (Vivek Goyal's message of "Tue, 7 Jun 2011 15:38:35 -0400")

Vivek Goyal <vgoyal@redhat.com> writes:

> On Fri, Jun 03, 2011 at 09:12:17AM -0700, Greg Thelen wrote:
>> When the system is under background dirty memory threshold but a cgroup
>> is over its background dirty memory threshold, then only writeback
>> inodes associated with the over-limit cgroup(s).
>> 
>
> [..]
>> -static inline bool over_bground_thresh(void)
>> +static inline bool over_bground_thresh(struct bdi_writeback *wb,
>> +				       struct writeback_control *wbc)
>>  {
>>  	unsigned long background_thresh, dirty_thresh;
>>  
>>  	global_dirty_limits(&background_thresh, &dirty_thresh);
>>  
>> -	return (global_page_state(NR_FILE_DIRTY) +
>> -		global_page_state(NR_UNSTABLE_NFS) > background_thresh);
>> +	if (global_page_state(NR_FILE_DIRTY) +
>> +	    global_page_state(NR_UNSTABLE_NFS) > background_thresh) {
>> +		wbc->for_cgroup = 0;
>> +		return true;
>> +	}
>> +
>> +	wbc->for_cgroup = 1;
>> +	wbc->shared_inodes = 1;
>> +	return mem_cgroups_over_bground_dirty_thresh();
>>  }
>
> Hi Greg,
>
> So all the logic of writeout from mem cgroup works only if system is
> below background limit. The moment we cross background limit, looks
> like we will fall back to existing way of writing inodes?

Correct.  If the system is over its background limit then the previous
cgroup-unaware background writeback occurs.  I think of the system
limits as those of the root cgroup.  If the system is over the global
limit then all cgroups are eligible for writeback.  In this situation
the current code does not distinguish between cgroups over or under
their dirty background limit.

Vivek Goyal <vgoyal@redhat.com> writes:
> If yes, then from design point of view it is a little odd that as long
> as we are below background limit, we share the bdi between different
> cgroups. The moment we are above background limit, we fall back to
> algorithm of sharing the disk among individual inodes and forget
> about memory cgroups. Kind of awkward.
>
> This kind of cgroup writeback I think will at least not solve the problem
> for CFQ IO controller, as we fall back to old ways of writing back inodes
> the moment we cross dirty ratio.

It might make more sense to reverse the order of the checks in the
proposed over_bground_thresh(): the new version would first check
whether any memcg is over its limit; only if none is would it check the
global limits.  If the system is over its background limit and some
cgroups are also over their limits, then the over-limit cgroups would
be written first, possibly getting the system below its limit.  Does
this address your concern?
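
To make the reversed ordering concrete, here is a rough user-space
sketch.  It is not the patch itself and has not been built against a
kernel tree.  struct dirty_state and the name
over_bground_thresh_reversed() are hypothetical stand-ins for the values
the kernel would get from global_dirty_limits(), global_page_state() and
mem_cgroups_over_bground_dirty_thresh(), and struct writeback_control
models only the two fields touched by the diff above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct writeback_control;
 * only the two fields used by the quoted diff are modeled. */
struct writeback_control {
	unsigned int for_cgroup:1;
	unsigned int shared_inodes:1;
};

/* Hypothetical snapshot of the state the kernel would read via
 * global_page_state(), global_dirty_limits() and
 * mem_cgroups_over_bground_dirty_thresh(). */
struct dirty_state {
	unsigned long nr_file_dirty;
	unsigned long nr_unstable_nfs;
	unsigned long background_thresh;
	bool memcg_over_bground;	/* some memcg is over its limit */
};

/* Reversed order: per-cgroup limits first, global limits second. */
static bool over_bground_thresh_reversed(const struct dirty_state *st,
					 struct writeback_control *wbc)
{
	/* An over-limit memcg gets cgroup-targeted writeback even when
	 * the system as a whole is also over its background limit. */
	if (st->memcg_over_bground) {
		wbc->for_cgroup = 1;
		wbc->shared_inodes = 1;
		return true;
	}

	/* No memcg is over its limit: fall back to the global,
	 * cgroup-unaware check. */
	wbc->for_cgroup = 0;
	return st->nr_file_dirty + st->nr_unstable_nfs >
	       st->background_thresh;
}
```

With this ordering, cgroup-unaware writeback only happens when the
system is over its background limit and no memcg is over its own.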

Note: mem_cgroup_balance_dirty_pages() (patch 10/12) will perform
foreground writeback when a memcg is above its dirty limit.  This would
offer CFQ multiple tasks issuing IO.

> Also have you done any benchmarking regarding what's the overhead of
> going through say thousands of inodes to find the inode which is eligible
> for writeback from a cgroup? I think Dave Chinner had raised this concern
> in the past.
>
> Thanks
> Vivek

I will collect some performance data measuring the cost of scanning.

