From: Tejun Heo <tj@kernel.org>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: axboe@kernel.dk, linux-kernel@vger.kernel.org, jack@suse.cz,
	hch@infradead.org, hannes@cmpxchg.org,
	linux-fsdevel@vger.kernel.org, lizefan@huawei.com,
	cgroups@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.cz,
	clm@fb.com, fengguang.wu@intel.com, david@fromorbit.com,
	gthelen@google.com
Subject: Re: [PATCHSET 1/3 v2 block/for-4.1/core] writeback: cgroup writeback support
Date: Wed, 25 Mar 2015 12:01:15 -0400	[thread overview]
Message-ID: <20150325160115.GR3880@htj.duckdns.org> (raw)
In-Reply-To: <20150325154022.GC29728@redhat.com>

Hello, Vivek.

On Wed, Mar 25, 2015 at 11:40:22AM -0400, Vivek Goyal wrote:
> I have 32G of RAM on my system and I setup a write bandwidth of 1MB/s
> on the disk and allowed a dd to run. That dd quickly consumed 5G of
> page cache before it reached to a steady state. Sounds like too much
> of cache consumption which will be drained at a speed of 1MB/s. Not
> sure if this is expected or bdi back-pressure is not being applied soon
> enough.

Ooh, the system will happily dirty a certain amount of memory
regardless of the writeback speed.  The defaults are bg_thresh 10% and
thresh 20%, which puts the target ratio at 15%.  On a 32G system that's
~4.8G, so this sounds about right.  This is intentional, as otherwise
we may end up thrashing workloads which fit perfectly in memory just
because the backing device is slow.  e.g. a workload with a 4G dirty
footprint would work perfectly fine in the above setup regardless of
the speed of the backing device.  If we capped dirty memory at, say,
120s of write bandwidth, which is 120MB in this case, that workload
would suffer horribly for no good reason.
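
To put numbers on it, a rough sketch of the arithmetic (not the exact
freerun/setpoint math balance_dirty_pages() does, and glossing over
the fact that the ratios apply to dirtyable memory rather than total
RAM):

  # defaults: vm.dirty_background_ratio = 10, vm.dirty_ratio = 20
  mem_gib = 32
  bg_thresh, thresh = 0.10, 0.20
  target = (bg_thresh + thresh) / 2   # throttling aims roughly midway
  print(target * mem_gib)             # -> 4.8 (GiB of dirty page cache)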

The proportional distribution of dirty pages is really just
proportional.  If you don't have a higher-bandwidth backing device
active on the system, whatever is active, however slow it may be, gets
to consume the entirety of the allowable dirty memory.  This doesn't
necessarily make sense for things like USB sticks, so we have a per-bdi
max_ratio which can be set from userland for devices which aren't
supposed to host those sorts of workloads (as you aren't gonna run a DB
workload on your thumbdrive).
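
e.g. to cap how much of the allowable dirty memory a single bdi may
take, you can poke its max_ratio knob in sysfs.  A minimal sketch,
assuming the slow device's bdi happens to be 8:16 (substitute your
own):

  # limit this bdi to 5% of the globally allowable dirty memory
  with open("/sys/class/bdi/8:16/max_ratio", "w") as f:  # 8:16 is made up
      f.write("5")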

So, that's where that 5G amount came from, but while you're exercising
the cgroup writeback path, it isn't really doing anything differently
from before if you don't configure memcg limits.  This is the same
behavior which would happen in the global case.  Try configuring
different cgroups w/ different memory limits and writing to devices
with differing write speeds.  They all will converge to ~15% of the
allowable memory in each cgroup, and the dirty pages in each cgroup
will be distributed according to each device's writeback speed in that
cgroup.
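
Something along these lines is what I mean; just a sketch with made-up
cgroup names, limits and target files, using the cgroup v1 memory
controller paths (needs root):

  import os, subprocess

  # hypothetical cgroups: a small one writing to a slow disk and a
  # bigger one writing to a fast disk
  cases = [("wb-slow", "1G", "/mnt/slow/testfile"),
           ("wb-fast", "4G", "/mnt/fast/testfile")]

  for name, limit, path in cases:
      cg = "/sys/fs/cgroup/memory/" + name
      os.makedirs(cg, exist_ok=True)              # mkdir creates the cgroup
      with open(cg + "/memory.limit_in_bytes", "w") as f:
          f.write(limit)
      # move a shell into the cgroup, then let it run dd so the dirty
      # pages get charged to that cgroup
      subprocess.Popen("echo $$ > %s/tasks && "
                       "dd if=/dev/zero of=%s bs=1M count=8192"
                       % (cg, path), shell=True)

The dirty pages charged to each cgroup should then settle at ~15% of
that cgroup's limit, throttled at the speed of its own device.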

Thanks.

-- 
tejun
