* [PATCHSET] block: implement per-blkg request allocation
From: Tejun Heo @ 2012-04-26 21:59 UTC (permalink / raw)
  To: axboe-tSWWG44O7X1aa/9Udqfwiw
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hughd-hpIqsD4AKlfQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	fengguang.wu-ral2JQCrhuEAvxtiuMwx3w,
	vgoyal-H+wXaHxf7aLQT0dZR+AlfA

Hello,

Currently the block layer shares a single request_list (@q->rq) for all
IOs regardless of their blkcg associations.  This means that once the
shared pool is exhausted, blkcg limits don't mean much.  Whoever grabs
a freed request first gets the next IO slot.

This priority inversion can be easily demonstrated by creating a blkio
cgroup w/ very low weight, putting a program which can issue a lot of
random direct IOs there and running a sequential IO from a different
cgroup.  As soon as the request pool is used up, the sequential IO
bandwidth crashes.

This patchset implements per-blkg request allocation so that each
blkcg-request_queue pair has its own request pool to allocate from.
This isolates different blkcgs in terms of request allocation.
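
To make the isolation concrete, here is a minimal userspace sketch of
the two allocation models.  It is a toy model and not the patch code:
every name in it (toy_request_list, toy_get_request(), TOY_POOL_SIZE
and so on) is hypothetical, and a request_list is reduced to a plain
counter with a limit.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_GROUPS 2   /* hypothetical: two blkcgs on one queue */
#define TOY_POOL_SIZE 4   /* hypothetical per-pool request limit */

/* Toy stand-in for a request_list: a counter and a limit. */
struct toy_request_list {
	int in_flight;
	int limit;
};

/* Take one request slot; false means the caller would have to wait. */
static bool toy_get_request(struct toy_request_list *rl)
{
	if (rl->in_flight >= rl->limit)
		return false;
	rl->in_flight++;
	return true;
}

/* Shared model: the hog and the victim allocate from the same list. */
static bool toy_shared_victim_ok(int hog_attempts)
{
	struct toy_request_list shared = { .in_flight = 0, .limit = TOY_POOL_SIZE };
	int i;

	assert(hog_attempts >= 0);
	for (i = 0; i < hog_attempts; i++)
		toy_get_request(&shared);	/* hog takes whatever is free */
	return toy_get_request(&shared);	/* can the victim still allocate? */
}

/* Per-group model: each group has its own list on the same queue. */
static bool toy_per_group_victim_ok(int hog_attempts)
{
	struct toy_request_list rl[TOY_NR_GROUPS];
	int i;

	assert(hog_attempts >= 0);
	for (i = 0; i < TOY_NR_GROUPS; i++) {
		rl[i].in_flight = 0;
		rl[i].limit = TOY_POOL_SIZE;
	}
	for (i = 0; i < hog_attempts; i++)
		toy_get_request(&rl[0]);	/* hog can only fill its own list */
	return toy_get_request(&rl[1]);		/* victim's list is untouched */
}
```

With 100 hog attempts, toy_shared_victim_ok() returns false while
toy_per_group_victim_ok() returns true.  The sketch ignores waiting,
freeing and congestion thresholds entirely.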

Most changes are straightforward; unfortunately, bdi isn't
blkcg-aware yet so it currently just propagates the congestion state
from root blkcg.  As writeback currently is always on the root blkcg,
this kinda works for write congestion but readahead may behave
non-optimally under congestion for now.  This needs to be improved but
the situation is still way better than blkcg completely collapsing.

 0001-blkcg-fix-blkg_alloc-failure-path.patch
 0002-blkcg-__blkg_lookup_create-doesn-t-have-to-fail-on-r.patch
 0003-blkcg-make-root-blkcg-allocation-use-GFP_KERNEL.patch
 0004-mempool-add-gfp_mask-to-mempool_create_node.patch
 0005-block-drop-custom-queue-draining-used-by-scsi_transp.patch
 0006-block-refactor-get_request-_wait.patch
 0007-block-allocate-io_context-upfront.patch
 0008-blkcg-inline-bio_blkcg-and-friends.patch
 0009-block-add-q-nr_rqs-and-move-q-rq.elvpriv-to-q-nr_rqs.patch
 0010-block-prepare-for-multiple-request_lists.patch
 0011-blkcg-implement-per-blkg-request-allocation.patch

0001-0003 are assorted fixes / improvements which can be separated
from this patchset.  Just sending as part of this series for
convenience.

0004 adds @gfp_mask to mempool_create_node().  This is necessary
because blkg allocation is on the IO path and blkg now contains a
mempool for its request_list.  Note that blkg allocation failure
doesn't lead to catastrophic failure.  It just hinders blkcg
enforcement.
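
One way to read "just hinders blkcg enforcement" is sketched below as a
userspace toy.  It assumes that an IO whose blkg could not be allocated
is served from the root request list instead; that fallback is an
assumption made for illustration, and all names (toy_queue, toy_blkg,
toy_pick_rl()) are hypothetical rather than taken from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a request_list; only the limit matters here. */
struct toy_request_list {
	int limit;
};

/* Hypothetical per-(blkcg, queue) object carrying its own list. */
struct toy_blkg {
	struct toy_request_list rl;
};

/* Hypothetical queue with a root list that always exists. */
struct toy_queue {
	struct toy_request_list root_rl;
};

/*
 * Pick the list to allocate from.  @blkg is NULL when blkg allocation
 * failed; the IO still proceeds, it is just accounted against the root
 * list (assumed fallback), so isolation is lost for that IO.
 */
static struct toy_request_list *toy_pick_rl(struct toy_queue *q,
					    struct toy_blkg *blkg)
{
	assert(q != NULL);
	if (blkg == NULL)
		return &q->root_rl;
	return &blkg->rl;
}
```

The point of the sketch is only that the failure path degrades to the
old shared behavior rather than failing the IO.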

0005 drops custom queue draining, which I don't think is necessary and
which gets in the way of further updates.

0006-0010 are prep patches and 0011 implements per-blkg request
allocation.

This patchset is on top of,

  block/for-3.5/core bd1a68b59c "vmsplice: relax alignement requireme..."
+ [1] blkcg: tg_stats_alloc_lock is an irq lock

and is also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blkcg-rl

Jens, I still can't reproduce the boot failure you were seeing on
block/for-3.5/core, so I'm just basing this series on top of it.  Once
we figure that one out, we can resequence the patches.

Thanks.

 block/blk-cgroup.c                  |  147 ++++++++++++++++----------
 block/blk-cgroup.h                  |  121 +++++++++++++++++++++
 block/blk-core.c                    |  200 ++++++++++++++++++------------------
 block/blk-sysfs.c                   |   34 +++---
 block/blk-throttle.c                |    3 
 block/blk.h                         |    3 
 block/bsg-lib.c                     |   53 ---------
 drivers/scsi/scsi_transport_fc.c    |   38 ------
 drivers/scsi/scsi_transport_iscsi.c |    2 
 include/linux/blkdev.h              |   53 +++++----
 include/linux/bsg-lib.h             |    1 
 include/linux/mempool.h             |    3 
 mm/mempool.c                        |   12 +-
 13 files changed, 379 insertions(+), 291 deletions(-)

--
tejun

[1] http://article.gmane.org/gmane.linux.kernel/1288400

