Linux-Block Archive on lore.kernel.org
* [PATCHSET v2] writeback, memcg: Implement foreign inode flushing
@ 2019-08-15 19:56 Tejun Heo
  2019-08-15 19:57 ` [PATCH 1/5] writeback: Generalize and expose wb_completion Tejun Heo
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Tejun Heo @ 2019-08-15 19:56 UTC
  To: axboe, jack, hannes, mhocko, vdavydov.dev
  Cc: cgroups, linux-mm, linux-block, linux-kernel, kernel-team, guro, akpm

Hello,

Changes from v1[1]:

* More comments explaining the parameters.

* 0003-writeback-Separate-out-wb_get_lookup-from-wb_get_create.patch
  added to avoid spuriously creating missing wbs for foreign
  flushing.

There's an inherent mismatch between memcg and writeback.  The former
tracks ownership per-page while the latter tracks it per-inode.  This
was a deliberate design decision because honoring per-page ownership
in the writeback path is complicated, may lead to higher CPU and IO
overheads and was deemed unnecessary given that write-sharing an inode
across different cgroups isn't a common use-case.

Combined with inode majority-writer ownership switching, this works
well enough in most cases but there are some pathological cases.  For
example, let's say there are two cgroups A and B which keep writing to
different but confined parts of the same inode.  B owns the inode and
A's memory is limited far below B's.  A's dirty ratio can rise enough
to trigger balance_dirty_pages() sleeps but B's can be low enough to
avoid triggering background writeback.  A will be slowed down without
a way to make writeback of the dirty pages happen.
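
To make the scenario above concrete, here is a minimal user-space sketch
of it.  This is a toy model, not kernel code: the toy_* names, the
struct layout and the 10%/20% thresholds are all assumptions picked for
illustration, and the real balance_dirty_pages() logic is considerably
more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one cgroup's dirty state; not a kernel structure. */
struct toy_cgroup {
	unsigned long avail_pages;	/* memory the cgroup may dirty */
	unsigned long dirty_pages;	/* pages it currently has dirty */
};

/* Illustrative thresholds in percent of avail_pages (assumed values). */
enum { TOY_BG_PCT = 10, TOY_THROTTLE_PCT = 20 };

/* True if the cgroup's dirty ratio exceeds @pct percent. */
static bool toy_over(const struct toy_cgroup *cg, unsigned int pct)
{
	assert(cg->avail_pages > 0);
	return cg->dirty_pages * 100 > cg->avail_pages * pct;
}

/* Would the writer be put to sleep, in this simplified model? */
static bool toy_writer_throttled(const struct toy_cgroup *cg)
{
	return toy_over(cg, TOY_THROTTLE_PCT);
}

/* Would background writeback start for inodes owned by @owner? */
static bool toy_bg_writeback_starts(const struct toy_cgroup *owner)
{
	return toy_over(owner, TOY_BG_PCT);
}

/*
 * A is stuck when it is throttled on pages sitting in an inode owned by
 * B while B sees no reason to start writeback on that inode.
 */
static bool toy_a_is_stuck(const struct toy_cgroup *a,
			   const struct toy_cgroup *b_owner)
{
	return toy_writer_throttled(a) && !toy_bg_writeback_starts(b_owner);
}
```

With A at 300 dirty pages out of 1000 and B at 2000 out of 100000, A is
over the throttle threshold while B stays under the background one, so
toy_a_is_stuck() reports the pathological state described above.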

This patchset implements foreign dirty recording and a foreign
flushing mechanism so that when a memcg encounters a condition as
above it can trigger flushes on bdi_writebacks which can clean its
pages.  Please see the last patch for more details.
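
As a rough illustration of the record-then-flush idea, the sketch below
keeps a small per-memcg table of (bdi ID, memcg ID) pairs and later
asks each recorded target to flush.  It is a user-space toy under
stated assumptions and not the code in the last patch: every toy_*
name, the four-slot table, the oldest-slot replacement policy and the
callback signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FRN_SLOTS 4	/* assumed table size, chosen for illustration */

/* One remembered foreign writeback target; hypothetical layout. */
struct toy_frn {
	uint64_t bdi_id;	/* which backing device */
	int memcg_id;		/* cgroup owning the inode's wb; 0 = empty */
	unsigned long at;	/* caller-supplied time of last dirtying */
};

struct toy_memcg {
	int id;
	struct toy_frn frn[TOY_FRN_SLOTS];
};

/* Hypothetical flush hook standing in for "flush by bdi and memcg IDs". */
typedef void (*toy_flush_fn)(uint64_t bdi_id, int memcg_id, void *arg);

/*
 * Remember that @cg dirtied a page in an inode whose writeback is owned
 * by @owner_memcg_id on @bdi_id.  A matching slot is refreshed,
 * otherwise the oldest slot (empty ones first) is overwritten.
 */
static void toy_record_foreign(struct toy_memcg *cg, uint64_t bdi_id,
			       int owner_memcg_id, unsigned long now)
{
	size_t i, slot = 0;

	assert(owner_memcg_id != 0);
	if (owner_memcg_id == cg->id)
		return;		/* not foreign, nothing to record */

	for (i = 0; i < TOY_FRN_SLOTS; i++) {
		const struct toy_frn *f = &cg->frn[i];

		if (f->memcg_id == owner_memcg_id && f->bdi_id == bdi_id) {
			slot = i;	/* already tracked, refresh it */
			break;
		}
		if (f->at < cg->frn[slot].at)
			slot = i;	/* oldest slot seen so far */
	}
	cg->frn[slot].bdi_id = bdi_id;
	cg->frn[slot].memcg_id = owner_memcg_id;
	cg->frn[slot].at = now;
}

/* Ask every recorded target to flush, clear it, and return the count. */
static size_t toy_flush_foreign(struct toy_memcg *cg, toy_flush_fn flush,
				void *arg)
{
	size_t i, n = 0;

	for (i = 0; i < TOY_FRN_SLOTS; i++) {
		struct toy_frn *f = &cg->frn[i];

		if (!f->memcg_id)
			continue;
		flush(f->bdi_id, f->memcg_id, arg);
		f->memcg_id = 0;
		f->at = 0;
		n++;
	}
	return n;
}

/* Example hook: counts requests instead of issuing real writeback. */
static void toy_count_flush(uint64_t bdi_id, int memcg_id, void *arg)
{
	(void)bdi_id;
	(void)memcg_id;
	++*(size_t *)arg;
}
```

A caller would invoke toy_record_foreign() whenever the memcg dirties a
page in someone else's inode, and toy_flush_foreign() once the memcg
finds itself throttled.  The fixed-size table keeps the bookkeeping
bounded, at the cost of forgetting older targets.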

This patchset contains the following five patches.

 0001-writeback-Generalize-and-expose-wb_completion.patch
 0002-bdi-Add-bdi-id.patch
 0003-writeback-Separate-out-wb_get_lookup-from-wb_get_create.patch
 0004-writeback-memcg-Implement-cgroup_writeback_by_id.patch
 0005-writeback-memcg-Implement-foreign-dirty-flushing.patch

0001-0004 are prep patches which expose wb_completion and implement
bdi->id and flushing by bdi and memcg IDs.

0005 implements foreign inode flushing.

Thanks.  diffstat follows.

 fs/fs-writeback.c                |  114 +++++++++++++++++++++++----------
 include/linux/backing-dev-defs.h |   23 ++++++
 include/linux/backing-dev.h      |    5 +
 include/linux/memcontrol.h       |   39 +++++++++++
 include/linux/writeback.h        |    2 
 mm/backing-dev.c                 |  120 +++++++++++++++++++++++++++++------
 mm/memcontrol.c                  |  132 +++++++++++++++++++++++++++++++++++++++
 mm/page-writeback.c              |    4 +
 8 files changed, 386 insertions(+), 53 deletions(-)

--
tejun

[1] http://lkml.kernel.org/r/20190803140155.181190-1-tj@kernel.org

* [PATCHSET v3] writeback, memcg: Implement foreign inode flushing
@ 2019-08-26 16:06 Tejun Heo
  2019-08-26 16:06 ` [PATCH 5/5] writeback, memcg: Implement foreign dirty flushing Tejun Heo
  0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2019-08-26 16:06 UTC
  To: axboe, jack, hannes, mhocko, vdavydov.dev
  Cc: cgroups, linux-mm, linux-block, linux-kernel, kernel-team, guro, akpm

Hello,

Changes from v1[1]:

* More comments explaining the parameters.

* 0003-writeback-Separate-out-wb_get_lookup-from-wb_get_create.patch
  added to avoid spuriously creating missing wbs for foreign
  flushing.

Changes from v2[2]:

* Added livelock avoidance and applied other smaller changes suggested
  by Jan.

There's an inherent mismatch between memcg and writeback.  The former
tracks ownership per-page while the latter tracks it per-inode.  This
was a deliberate design decision because honoring per-page ownership
in the writeback path is complicated, may lead to higher CPU and IO
overheads and was deemed unnecessary given that write-sharing an inode
across different cgroups isn't a common use-case.

Combined with inode majority-writer ownership switching, this works
well enough in most cases but there are some pathological cases.  For
example, let's say there are two cgroups A and B which keep writing to
different but confined parts of the same inode.  B owns the inode and
A's memory is limited far below B's.  A's dirty ratio can rise enough
to trigger balance_dirty_pages() sleeps but B's can be low enough to
avoid triggering background writeback.  A will be slowed down without
a way to make writeback of the dirty pages happen.

This patchset implements foreign dirty recording and a foreign
flushing mechanism so that when a memcg encounters a condition as
above it can trigger flushes on bdi_writebacks which can clean its
pages.  Please see the last patch for more details.

This patchset contains the following five patches.

 0001-writeback-Generalize-and-expose-wb_completion.patch
 0002-bdi-Add-bdi-id.patch
 0003-writeback-Separate-out-wb_get_lookup-from-wb_get_create.patch
 0004-writeback-memcg-Implement-cgroup_writeback_by_id.patch
 0005-writeback-memcg-Implement-foreign-dirty-flushing.patch

0001-0004 are prep patches which expose wb_completion and implement
bdi->id and flushing by bdi and memcg IDs.

0005 implements foreign inode flushing.

Thanks.  diffstat follows.

 fs/fs-writeback.c                |  130 ++++++++++++++++++++++++++++---------
 include/linux/backing-dev-defs.h |   23 ++++++
 include/linux/backing-dev.h      |    5 +
 include/linux/memcontrol.h       |   39 +++++++++++
 include/linux/writeback.h        |    2 
 mm/backing-dev.c                 |  120 +++++++++++++++++++++++++++++-----
 mm/memcontrol.c                  |  134 +++++++++++++++++++++++++++++++++++++++
 mm/page-writeback.c              |    4 +
 8 files changed, 404 insertions(+), 53 deletions(-)

--
tejun

[1] http://lkml.kernel.org/r/20190803140155.181190-1-tj@kernel.org
[2] http://lkml.kernel.org/r/20190815195619.GA2263813@devbig004.ftw2.facebook.com

Thread overview: 18+ messages
2019-08-15 19:56 [PATCHSET v2] writeback, memcg: Implement foreign inode flushing Tejun Heo
2019-08-15 19:57 ` [PATCH 1/5] writeback: Generalize and expose wb_completion Tejun Heo
2019-08-15 19:57 ` [PATCH 2/5] bdi: Add bdi->id Tejun Heo
2019-08-15 19:58 ` [PATCH 3/5] writeback: Separate out wb_get_lookup() from wb_get_create() Tejun Heo
2019-08-16 15:45   ` Jan Kara
2019-08-15 19:59 ` [PATCH 4/5] writeback, memcg: Implement cgroup_writeback_by_id() Tejun Heo
2019-08-16 15:47   ` Jan Kara
2019-08-21 21:02   ` [PATCH v3 " Tejun Heo
2019-08-26 13:49     ` Jan Kara
2019-08-15 19:59 ` [PATCH 5/5] writeback, memcg: Implement foreign dirty flushing Tejun Heo
2019-08-16 16:02   ` Jan Kara
2019-08-21 16:00     ` Tejun Heo
2019-08-21 16:04       ` Tejun Heo
2019-08-21 21:02   ` [PATCH v3 " Tejun Heo
2019-08-26 13:54     ` Jan Kara
2019-08-26 15:58       ` Tejun Heo
2019-08-26 16:06 [PATCHSET v3] writeback, memcg: Implement foreign inode flushing Tejun Heo
2019-08-26 16:06 ` [PATCH 5/5] writeback, memcg: Implement foreign dirty flushing Tejun Heo
2019-08-27 14:47   ` Jan Kara