From: Greg Thelen <gthelen@google.com>
To: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	containers@lists.osdl.org, linux-fsdevel@vger.kernel.org,
	Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Minchan Kim <minchan.kim@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ciju Rajan K <ciju@linux.vnet.ibm.com>,
	David Rientjes <rientjes@google.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH v8 00/12] memcg: per cgroup dirty page accounting
Date: Fri, 3 Jun 2011 15:50:28 -0700
Message-ID: <BANLkTik50YQtoLVHFg7BP6KLz7GtvU1KzEjdCwx620oYWBH0qQ@mail.gmail.com>
In-Reply-To: <BANLkTikureiwJ=hSfnwo2y0wWoW3hGge9Q@mail.gmail.com>

On Fri, Jun 3, 2011 at 3:46 PM, Hiroyuki Kamezawa
<kamezawa.hiroyuki@gmail.com> wrote:
> 2011/6/4 Greg Thelen <gthelen@google.com>:
>> This patch series provides the ability for each cgroup to have independent dirty
>> page usage limits.  Limiting dirty memory fixes the max amount of dirty (hard to
>> reclaim) page cache used by a cgroup.  This allows for better per cgroup memory
>> isolation and fewer ooms within a single cgroup.
>>
>> Having per cgroup dirty memory limits is not very interesting unless writeback
>> is cgroup aware.  There is not much isolation if cgroups have to writeback data
>> from other cgroups to get below their dirty memory threshold.
>>
>> Per-memcg dirty limits are provided to support isolation and thus cross cgroup
>> inode sharing is not a priority.  This allows the code to be simpler.
>>
>> To add cgroup awareness to writeback, this series adds a memcg field to the
>> inode to allow writeback to isolate inodes for a particular cgroup.  When an
>> inode is marked dirty, i_memcg is set to the current cgroup.  When inode pages
>> are marked dirty the i_memcg field is compared against the page's cgroup.  If they
>> differ, then the inode is marked as shared by setting i_memcg to a special
>> shared value (zero).
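
A minimal sketch of the tagging rule described above, written as standalone C rather than kernel code. The struct, the function names, and the 16-bit id type are hypothetical stand-ins and are not taken from the patches; only i_memcg and the zero "shared" value come from the description. It also assumes i_memcg is assigned only on the clean-to-dirty transition.

```c
#include <assert.h>
#include <stdint.h>

/* The cover letter's "special shared value (zero)". */
#define I_MEMCG_SHARED 0u

/* Hypothetical stand-in for the relevant part of struct inode. */
struct sketch_inode {
	uint16_t i_memcg;	/* id of the owning cgroup, or I_MEMCG_SHARED */
	int dirty;		/* non-zero once the inode has been marked dirty */
};

/* Inode goes dirty: remember the cgroup of the task that dirtied it. */
void sketch_mark_inode_dirty(struct sketch_inode *inode, uint16_t current_memcg)
{
	/* Id zero is reserved for "shared" in this sketch. */
	assert(current_memcg != I_MEMCG_SHARED);

	if (!inode->dirty) {
		inode->dirty = 1;
		inode->i_memcg = current_memcg;
	}
}

/* A page of the inode goes dirty: demote the inode to shared on a mismatch. */
void sketch_mark_page_dirty(struct sketch_inode *inode, uint16_t page_memcg)
{
	if (inode->i_memcg != I_MEMCG_SHARED && inode->i_memcg != page_memcg)
		inode->i_memcg = I_MEMCG_SHARED;
}
```

Once an inode is demoted it stays shared in this sketch; the transition back to non-shared after the inode is cleaned is not modelled here.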
>>
>> Previous discussions suggested that a per-bdi per-memcg b_dirty list was a good
>> way to associate inodes with a cgroup without having to add a field to struct
>> inode.  I prototyped this approach but found that it involved more complex
>> writeback changes and had at least one major shortcoming: detection of when an
>> inode becomes shared by multiple cgroups.  While such sharing is not expected to
>> be common, the system should gracefully handle it.
>>
>> balance_dirty_pages() calls mem_cgroup_balance_dirty_pages(), which checks the
>> dirty usage vs dirty thresholds for the current cgroup and its parents.  If any
>> over-limit cgroups are found, they are marked in a global over-limit bitmap
>> (indexed by cgroup id) and the bdi flusher is woken.
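
The hierarchy walk and the bitmap marking can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the code in the series: the struct layout, the 64-id cap, the helper names, and the "usage strictly greater than threshold" comparison are all hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_MAX_CGROUPS 64	/* hypothetical cap on cgroup ids */
#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the dirty-accounting part of a memcg. */
struct sketch_memcg {
	unsigned int id;
	unsigned long nr_dirty;		/* current dirty usage */
	unsigned long dirty_thresh;	/* dirty limit */
	struct sketch_memcg *parent;	/* NULL at the root */
};

/* Global over-limit bitmap, indexed by cgroup id. */
static unsigned long over_limit_bitmap[(SKETCH_MAX_CGROUPS + SKETCH_WORD_BITS - 1) /
				       SKETCH_WORD_BITS];

void sketch_set_over_limit(unsigned int id)
{
	over_limit_bitmap[id / SKETCH_WORD_BITS] |= 1UL << (id % SKETCH_WORD_BITS);
}

int sketch_is_over_limit(unsigned int id)
{
	return (over_limit_bitmap[id / SKETCH_WORD_BITS] >> (id % SKETCH_WORD_BITS)) & 1UL;
}

/*
 * Check the given cgroup and each of its ancestors.  Every cgroup found
 * over its threshold is marked in the bitmap.  Returns the number of
 * over-limit cgroups seen; a non-zero result is where the caller would
 * wake the flusher.
 */
int sketch_balance_dirty_pages(struct sketch_memcg *memcg)
{
	int over = 0;

	for (; memcg != NULL; memcg = memcg->parent) {
		assert(memcg->id < SKETCH_MAX_CGROUPS);
		if (memcg->nr_dirty > memcg->dirty_thresh) {
			sketch_set_over_limit(memcg->id);
			over++;
		}
	}
	return over;
}
```

A bitmap keyed by id keeps the check on the flusher side to a single bit test per inode, at the cost of a fixed upper bound on ids.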
>>
>> The bdi flusher uses wb_check_background_flush() to check for any memcg over
>> their dirty limit.  When performing per-memcg background writeback,
>> move_expired_inodes() walks the per-bdi b_dirty list using each inode's i_memcg and
>> the global over-limit memcg bitmap to determine if the inode should be written.
>>
>> If mem_cgroup_balance_dirty_pages() is unable to get below the dirty page
>> threshold by writing per-memcg inodes, then it downshifts to also writing shared
>> inodes (i_memcg=0).
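
The per-inode decision described in the last two paragraphs might look roughly like the following. This is a hedged sketch: the function name and the single 64-bit word used as the over-limit bitmap are hypothetical simplifications, while i_memcg, the zero shared value, and the shared_inodes switch are the names used in this cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define I_MEMCG_SHARED 0u	/* shared inodes carry i_memcg == 0 */

/*
 * Decide whether per-memcg writeback should pick up an inode.
 *
 * over_limit:    bit N set means cgroup id N is over its dirty limit
 *                (ids are limited to 0..63 in this sketch).
 * shared_inodes: true once writeback has "downshifted" and is also
 *                allowed to write inodes shared between cgroups.
 */
bool sketch_should_write_inode(uint16_t i_memcg, uint64_t over_limit,
			       bool shared_inodes)
{
	if (i_memcg == I_MEMCG_SHARED)
		return shared_inodes;

	assert(i_memcg < 64);
	return (over_limit >> i_memcg) & 1u;
}
```

With only cgroup 3 over its limit, an inode tagged 3 is selected, an inode tagged 4 is skipped, and a shared inode is selected only when shared_inodes is set.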
>>
>> I know that there are some significant writeback changes associated with the
>> IO-less balance_dirty_pages() effort.  I am not trying to derail that, so this
>> patch series is merely an RFC to get feedback on the design.  There are probably
>> some subtle races in these patches.  I have done moderate functional testing of
>> the newly proposed features.
>>
>
> Thank you...hmm, is this set really "merely an RFC"? I'd like to merge this
> function before other new big hammer works because this makes the behavior of
> memcg much better.

Oops.  I meant to remove the above RFC paragraph.   This -v8 patch
series is intended for merging into mmotm.

> I'd like to review and test this set (but maybe I can't do much in the
> weekend...)

Thank you.

> Anyway, thank you.
> -Kame



>> Here is an example of the memcg-oom that is avoided with this patch series:
>>        # mkdir /dev/cgroup/memory/x
>>        # echo 100M > /dev/cgroup/memory/x/memory.limit_in_bytes
>>        # echo $$ > /dev/cgroup/memory/x/tasks
>>        # dd if=/dev/zero of=/data/f1 bs=1k count=1M &
>>        # dd if=/dev/zero of=/data/f2 bs=1k count=1M &
>>        # wait
>>        [1]-  Killed                  dd if=/dev/zero of=/data/f1 bs=1k count=1M
>>        [2]+  Killed                  dd if=/dev/zero of=/data/f2 bs=1k count=1M
>>
>> Known limitations:
>>        If a dirty limit is lowered a cgroup may be over its limit.
>>
>> Changes since -v7:
>> - Merged -v7 09/14 'cgroup: move CSS_ID_MAX to cgroup.h' into
>>  -v8 09/12 'memcg: create support routines for writeback'
>>
>> - Merged -v7 08/14 'writeback: add memcg fields to writeback_control'
>>  into -v8 09/12 'memcg: create support routines for writeback' and
>>  -v8 10/12 'memcg: create support routines for page-writeback'.  This
>>  moves the declaration of new fields with the first usage of the
>>  respective fields.
>>
>> - mem_cgroup_writeback_done() now clears the corresponding bit for a cgroup that
>>  cannot be referenced.  Such a bit would represent a cgroup previously over its
>>  dirty limit, but that has been deleted before writeback cleaned all pages.  By
>>  clearing the bit, writeback will not continually try to write back the deleted
>>  cgroup.
>>
>> - Previously mem_cgroup_writeback_done() would only finish writeback when the
>>  cgroup's dirty memory usage dropped below the dirty limit.  This was the wrong
>>  limit to check.  This now correctly checks usage against the background dirty
>>  limit.
>>
>> - over_bground_thresh() now sets shared_inodes=1.  In -v7 per memcg
>>  background writeback did not, so it did not write pages of shared
>>  inodes in background writeback.  In the (potentially common) case
>>  where the system dirty memory usage is below the system background
>>  dirty threshold but at least one cgroup is over its background dirty
>>  limit, then per memcg background writeback is queued for any
>>  over-background-threshold cgroups.  Background writeback should be
>>  allowed to writeback shared inodes.  The hope is that writing such
>>  inodes has a good chance of cleaning the inodes so they can transition
>>  from shared to non-shared.  Such a transition is good because then the
>>  inode will remain unshared until it is written by multiple cgroups.
>>  Non-shared inodes offer better isolation.
>>
>> Single patch that can be applied to mmotm-2011-05-12-15-52:
>>  http://www.kernel.org/pub/linux/kernel/people/gthelen/memcg/memcg-dirty-limits-v8-on-mmotm-2011-05-12-15-52.patch
>>
>> Patches are based on mmotm-2011-05-12-15-52.
>>
>> Greg Thelen (12):
>>  memcg: document cgroup dirty memory interfaces
>>  memcg: add page_cgroup flags for dirty page tracking
>>  memcg: add mem_cgroup_mark_inode_dirty()
>>  memcg: add dirty page accounting infrastructure
>>  memcg: add kernel calls for memcg dirty page stats
>>  memcg: add dirty limits to mem_cgroup
>>  memcg: add cgroupfs interface to memcg dirty limits
>>  memcg: dirty page accounting support routines
>>  memcg: create support routines for writeback
>>  memcg: create support routines for page-writeback
>>  writeback: make background writeback cgroup aware
>>  memcg: check memcg dirty limits in page writeback
>>
>>  Documentation/cgroups/memory.txt  |   70 ++++
>>  fs/fs-writeback.c                 |   34 ++-
>>  fs/inode.c                        |    3 +
>>  fs/nfs/write.c                    |    4 +
>>  include/linux/cgroup.h            |    1 +
>>  include/linux/fs.h                |    9 +
>>  include/linux/memcontrol.h        |   63 ++++-
>>  include/linux/page_cgroup.h       |   23 ++
>>  include/linux/writeback.h         |    5 +-
>>  include/trace/events/memcontrol.h |  198 +++++++++++
>>  kernel/cgroup.c                   |    1 -
>>  mm/filemap.c                      |    1 +
>>  mm/memcontrol.c                   |  708 ++++++++++++++++++++++++++++++++++++-
>>  mm/page-writeback.c               |   42 ++-
>>  mm/truncate.c                     |    1 +
>>  mm/vmscan.c                       |    2 +-
>>  16 files changed, 1138 insertions(+), 27 deletions(-)
>>  create mode 100644 include/trace/events/memcontrol.h
>>
>> --
>> 1.7.3.1
>>
