* [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs
@ 2015-06-08  8:59 Tejun Heo
  2015-06-08  8:59 ` [PATCH 1/8] cfq-iosched: simplify control flow in cfq_get_queue() Tejun Heo
                   ` (9 more replies)
  0 siblings, 10 replies; 31+ messages in thread
From: Tejun Heo @ 2015-06-08  8:59 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, cgroups, vgoyal, avanzini.arianna

Hello,

cfq has always charged all async IOs to the root cgroup.  It didn't
have much choice as writeback didn't know about cgroups and there was
no way to tell who to blame for a given writeback IO.  writeback
finally grew support for cgroups and now tags each writeback IO with
the appropriate cgroup to charge it against.

This patchset updates cfq so that it follows the blkcg each bio is
tagged with.  Async cfq_queues are now scoped to each cfq_group, which
is per-cgroup, instead of to the per-request_queue cfq_data.  This
makes all IOs follow the weight based IO resource distribution
implemented by cfq.
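
In a nutshell, the mechanical core of the change is moving the async
queue pointers from the per-device cfq_data into the per-cgroup
cfq_group (abbreviated from patch 0008 below):

	struct cfq_group {
		/* ... scheduling state, stats ... */

		/* async queue for each priority case; these two fields
		 * used to live in cfq_data and were therefore shared by
		 * every cgroup issuing IO to the device */
		struct cfq_queue *async_cfqq[2][IOPRIO_BE_NR];
		struct cfq_queue *async_idle_cfqq;
	};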

This patchset contains the following 8 patches.

 0001-cfq-iosched-simplify-control-flow-in-cfq_get_queue.patch
 0002-cfq-iosched-fix-async-oom-queue-handling.patch
 0003-cfq-iosched-fix-oom-cfq_queue-ref-leak-in-cfq_set_re.patch
 0004-cfq-iosched-minor-cleanups.patch
 0005-cfq-iosched-remove-gfp_mask-from-cfq_find_alloc_queu.patch
 0006-cfq-iosched-move-cfq_group-determination-from-cfq_fi.patch
 0007-cfq-iosched-fold-cfq_find_alloc_queue-into-cfq_get_q.patch
 0008-cfq-iosched-charge-async-IOs-to-the-appropriate-blkc.patch

0001-0003 are one prep patch and two fix patches on top of it.  The
bugs fixed by these patches are very unlikely to cause problems in
actual usage, so the patches aren't tagged w/ -stable.

0004-0007 are prep patches.

0008 makes cfq cgroup-writeback ready.

This patchset is on top of block/for-4.2/writeback 5857cd637bc0 ("bdi:
fix wrong error return value in cgwb_create()").

Thanks, diffstat follows.

 block/cfq-iosched.c |  220 ++++++++++++++++++++--------------------------------
 1 file changed, 85 insertions(+), 135 deletions(-)

--
tejun


* [PATCH 1/8] cfq-iosched: simplify control flow in cfq_get_queue()
  2015-06-08  8:59 [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Tejun Heo
@ 2015-06-08  8:59 ` Tejun Heo
  2015-06-08 18:36   ` Jeff Moyer
  2015-06-08  8:59 ` [PATCH 2/8] cfq-iosched: fix async oom queue handling Tejun Heo
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2015-06-08  8:59 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, cgroups, vgoyal, avanzini.arianna, Tejun Heo

cfq_get_queue()'s control flow looks like the following.

	async_cfqq = NULL;
	cfqq = NULL;

	if (!is_sync) {
		...
		async_cfqq = ...;
		cfqq = *async_cfqq;
	}

	if (!cfqq)
		cfqq = ...;

	if (!is_sync && !(*async_cfqq))
		...;

The only thing the local variable init, the second if, and the
async_cfqq test in the third if achieve is to skip cfqq creation and
installation if *async_cfqq was already non-NULL.  This is needlessly
complicated with different tests examining the same condition.
Simplify it to the following.

	if (!is_sync) {
		...
		async_cfqq = ...;
		cfqq = *async_cfqq;
		if (cfqq)
			goto out;
	}

	cfqq = ...;

	if (!is_sync)
		...;
 out:

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/cfq-iosched.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index bc8f429..2814bb7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3663,8 +3663,8 @@ cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 {
 	int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
 	int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
-	struct cfq_queue **async_cfqq = NULL;
-	struct cfq_queue *cfqq = NULL;
+	struct cfq_queue **async_cfqq;
+	struct cfq_queue *cfqq;
 
 	if (!is_sync) {
 		if (!ioprio_valid(cic->ioprio)) {
@@ -3674,19 +3674,20 @@ cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 		}
 		async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
 		cfqq = *async_cfqq;
+		if (cfqq)
+			goto out;
 	}
 
-	if (!cfqq)
-		cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);
+	cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);
 
 	/*
 	 * pin the queue now that it's allocated, scheduler exit will prune it
 	 */
-	if (!is_sync && !(*async_cfqq)) {
+	if (!is_sync) {
 		cfqq->ref++;
 		*async_cfqq = cfqq;
 	}
-
+out:
 	cfqq->ref++;
 	return cfqq;
 }
-- 
2.4.2



* [PATCH 2/8] cfq-iosched: fix async oom queue handling
  2015-06-08  8:59 [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Tejun Heo
  2015-06-08  8:59 ` [PATCH 1/8] cfq-iosched: simplify control flow in cfq_get_queue() Tejun Heo
@ 2015-06-08  8:59 ` Tejun Heo
  2015-06-08 18:42   ` Jeff Moyer
  2015-06-08  8:59 ` [PATCH 3/8] cfq-iosched: fix oom cfq_queue ref leak in cfq_set_request() Tejun Heo
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2015-06-08  8:59 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, cgroups, vgoyal, avanzini.arianna, Tejun Heo

Async cfqq's (cfq_queue's) are shared across cfq_data.  When
cfq_get_queue() obtains a new queue from cfq_find_alloc_queue(), it
stashes the pointer in cfq_data and reuses it from then on; however,
the function doesn't consider that cfq_find_alloc_queue() may return
the oom_cfqq under memory pressure and installs the returned queue
unconditionally.

If the oom_cfqq is installed as an async cfqq, cfq_set_request() will
continue calling cfq_get_queue() hoping to replace it with a proper
queue; however, cfq_get_queue() will keep returning the cached queue
for the slot - the oom_cfqq.

Fix it by skipping caching if the queue is the oom one.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/cfq-iosched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 2814bb7..c7b33aa 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3683,7 +3683,7 @@ cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 	/*
 	 * pin the queue now that it's allocated, scheduler exit will prune it
 	 */
-	if (!is_sync) {
+	if (!is_sync && cfqq != &cfqd->oom_cfqq) {
 		cfqq->ref++;
 		*async_cfqq = cfqq;
 	}
-- 
2.4.2



* [PATCH 3/8] cfq-iosched: fix oom cfq_queue ref leak in cfq_set_request()
  2015-06-08  8:59 [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Tejun Heo
  2015-06-08  8:59 ` [PATCH 1/8] cfq-iosched: simplify control flow in cfq_get_queue() Tejun Heo
  2015-06-08  8:59 ` [PATCH 2/8] cfq-iosched: fix async oom queue handling Tejun Heo
@ 2015-06-08  8:59 ` Tejun Heo
  2015-06-08 18:51   ` Jeff Moyer
  2015-06-08  8:59 ` [PATCH 4/8] cfq-iosched: minor cleanups Tejun Heo
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2015-06-08  8:59 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, cgroups, vgoyal, avanzini.arianna, Tejun Heo

If the cfq_queue cached in cfq_io_cq is the oom one, cfq_set_request()
replaces it by invoking cfq_get_queue() again without putting the oom
queue, leaking the reference it was holding.  While oom queues are not
released through reference counting, they're still reference counted
and this can theoretically lead to the reference count overflowing and
incorrectly invoking the usual release path on it.

Fix it by making cfq_set_request() put the ref it was holding.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/cfq-iosched.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index c7b33aa..e0a34ba 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -4231,6 +4231,8 @@ cfq_set_request(struct request_queue *q, struct request *rq, struct bio *bio,
 new_queue:
 	cfqq = cic_to_cfqq(cic, is_sync);
 	if (!cfqq || cfqq == &cfqd->oom_cfqq) {
+		if (cfqq)
+			cfq_put_queue(cfqq);
 		cfqq = cfq_get_queue(cfqd, is_sync, cic, bio, gfp_mask);
 		cic_set_cfqq(cic, cfqq, is_sync);
 	} else {
-- 
2.4.2



* [PATCH 4/8] cfq-iosched: minor cleanups
  2015-06-08  8:59 [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Tejun Heo
                   ` (2 preceding siblings ...)
  2015-06-08  8:59 ` [PATCH 3/8] cfq-iosched: fix oom cfq_queue ref leak in cfq_set_request() Tejun Heo
@ 2015-06-08  8:59 ` Tejun Heo
  2015-06-08 18:59   ` Jeff Moyer
  2015-06-08  8:59 ` [PATCH 5/8] cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue() Tejun Heo
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2015-06-08  8:59 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, cgroups, vgoyal, avanzini.arianna, Tejun Heo

* Some code paths were accessing cic->cfqq[] directly.  Always use
  cic_to_cfqq() and cic_set_cfqq() instead.

* check_ioprio_changed() doesn't need to verify cfq_get_queue()'s
  return for NULL.  It's always non-NULL.  Simplify accordingly.

This patch doesn't cause any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/cfq-iosched.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index e0a34ba..90d5a87 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3438,14 +3438,14 @@ static void cfq_exit_icq(struct io_cq *icq)
 	struct cfq_io_cq *cic = icq_to_cic(icq);
 	struct cfq_data *cfqd = cic_to_cfqd(cic);
 
-	if (cic->cfqq[BLK_RW_ASYNC]) {
-		cfq_exit_cfqq(cfqd, cic->cfqq[BLK_RW_ASYNC]);
-		cic->cfqq[BLK_RW_ASYNC] = NULL;
+	if (cic_to_cfqq(cic, false)) {
+		cfq_exit_cfqq(cfqd, cic_to_cfqq(cic, false));
+		cic_set_cfqq(cic, NULL, false);
 	}
 
-	if (cic->cfqq[BLK_RW_SYNC]) {
-		cfq_exit_cfqq(cfqd, cic->cfqq[BLK_RW_SYNC]);
-		cic->cfqq[BLK_RW_SYNC] = NULL;
+	if (cic_to_cfqq(cic, true)) {
+		cfq_exit_cfqq(cfqd, cic_to_cfqq(cic, true));
+		cic_set_cfqq(cic, NULL, true);
 	}
 }
 
@@ -3504,18 +3504,14 @@ static void check_ioprio_changed(struct cfq_io_cq *cic, struct bio *bio)
 	if (unlikely(!cfqd) || likely(cic->ioprio == ioprio))
 		return;
 
-	cfqq = cic->cfqq[BLK_RW_ASYNC];
+	cfqq = cic_to_cfqq(cic, false);
 	if (cfqq) {
-		struct cfq_queue *new_cfqq;
-		new_cfqq = cfq_get_queue(cfqd, BLK_RW_ASYNC, cic, bio,
-					 GFP_ATOMIC);
-		if (new_cfqq) {
-			cic->cfqq[BLK_RW_ASYNC] = new_cfqq;
-			cfq_put_queue(cfqq);
-		}
+		cfq_put_queue(cfqq);
+		cfqq = cfq_get_queue(cfqd, BLK_RW_ASYNC, cic, bio, GFP_ATOMIC);
+		cic_set_cfqq(cic, cfqq, false);
 	}
 
-	cfqq = cic->cfqq[BLK_RW_SYNC];
+	cfqq = cic_to_cfqq(cic, true);
 	if (cfqq)
 		cfq_mark_cfqq_prio_changed(cfqq);
 
-- 
2.4.2



* [PATCH 5/8] cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue()
  2015-06-08  8:59 [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Tejun Heo
                   ` (3 preceding siblings ...)
  2015-06-08  8:59 ` [PATCH 4/8] cfq-iosched: minor cleanups Tejun Heo
@ 2015-06-08  8:59 ` Tejun Heo
  2015-06-08 19:24   ` Jeff Moyer
  2015-06-08  8:59 ` [PATCH 6/8] cfq-iosched: move cfq_group determination from cfq_find_alloc_queue() to cfq_get_queue() Tejun Heo
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2015-06-08  8:59 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, cgroups, vgoyal, avanzini.arianna, Tejun Heo

Even when allocations fail, cfq_find_alloc_queue() always returns a
valid cfq_queue by falling back to the oom cfq_queue.  As such, there
isn't much point in taking @gfp_mask and trying "harder" if __GFP_WAIT
is set.  GFP_ATOMIC allocations don't fail often and even when they do
the degraded behavior is acceptable and temporary.

After all, the only reason get_request(), which ultimately determines
the gfp_mask, cares about __GFP_WAIT is to guarantee request
allocation, assuming IO forward progress, for callers which are
willing to wait.  There's no reason for cfq_find_alloc_queue() to
behave differently on __GFP_WAIT when it already has a fallback
mechanism.

Remove @gfp_mask from cfq_find_alloc_queue() and propagate the changes
to its callers.  This simplifies the function quite a bit and will
help making async queues per-cfq_group.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/cfq-iosched.c | 45 ++++++++++-----------------------------------
 1 file changed, 10 insertions(+), 35 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 90d5a87..b8e83cd 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -858,8 +858,7 @@ static inline int cfqg_busy_async_queues(struct cfq_data *cfqd,
 
 static void cfq_dispatch_insert(struct request_queue *, struct request *);
 static struct cfq_queue *cfq_get_queue(struct cfq_data *cfqd, bool is_sync,
-				       struct cfq_io_cq *cic, struct bio *bio,
-				       gfp_t gfp_mask);
+				       struct cfq_io_cq *cic, struct bio *bio);
 
 static inline struct cfq_io_cq *icq_to_cic(struct io_cq *icq)
 {
@@ -3507,7 +3506,7 @@ static void check_ioprio_changed(struct cfq_io_cq *cic, struct bio *bio)
 	cfqq = cic_to_cfqq(cic, false);
 	if (cfqq) {
 		cfq_put_queue(cfqq);
-		cfqq = cfq_get_queue(cfqd, BLK_RW_ASYNC, cic, bio, GFP_ATOMIC);
+		cfqq = cfq_get_queue(cfqd, BLK_RW_ASYNC, cic, bio);
 		cic_set_cfqq(cic, cfqq, false);
 	}
 
@@ -3575,13 +3574,12 @@ static inline void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio) {
 
 static struct cfq_queue *
 cfq_find_alloc_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
-		     struct bio *bio, gfp_t gfp_mask)
+		     struct bio *bio)
 {
 	struct blkcg *blkcg;
-	struct cfq_queue *cfqq, *new_cfqq = NULL;
+	struct cfq_queue *cfqq;
 	struct cfq_group *cfqg;
 
-retry:
 	rcu_read_lock();
 
 	blkcg = bio_blkcg(bio);
@@ -3598,27 +3596,9 @@ cfq_find_alloc_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 	 * originally, since it should just be a temporary situation.
 	 */
 	if (!cfqq || cfqq == &cfqd->oom_cfqq) {
-		cfqq = NULL;
-		if (new_cfqq) {
-			cfqq = new_cfqq;
-			new_cfqq = NULL;
-		} else if (gfp_mask & __GFP_WAIT) {
-			rcu_read_unlock();
-			spin_unlock_irq(cfqd->queue->queue_lock);
-			new_cfqq = kmem_cache_alloc_node(cfq_pool,
-					gfp_mask | __GFP_ZERO,
-					cfqd->queue->node);
-			spin_lock_irq(cfqd->queue->queue_lock);
-			if (new_cfqq)
-				goto retry;
-			else
-				return &cfqd->oom_cfqq;
-		} else {
-			cfqq = kmem_cache_alloc_node(cfq_pool,
-					gfp_mask | __GFP_ZERO,
-					cfqd->queue->node);
-		}
-
+		cfqq = kmem_cache_alloc_node(cfq_pool,
+					     GFP_ATOMIC | __GFP_ZERO,
+					     cfqd->queue->node);
 		if (cfqq) {
 			cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
 			cfq_init_prio_data(cfqq, cic);
@@ -3628,9 +3608,6 @@ cfq_find_alloc_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 			cfqq = &cfqd->oom_cfqq;
 	}
 out:
-	if (new_cfqq)
-		kmem_cache_free(cfq_pool, new_cfqq);
-
 	rcu_read_unlock();
 	return cfqq;
 }
@@ -3655,7 +3632,7 @@ cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
 
 static struct cfq_queue *
 cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
-	      struct bio *bio, gfp_t gfp_mask)
+	      struct bio *bio)
 {
 	int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
 	int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
@@ -3674,7 +3651,7 @@ cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 			goto out;
 	}
 
-	cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);
+	cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio);
 
 	/*
 	 * pin the queue now that it's allocated, scheduler exit will prune it
@@ -4218,8 +4195,6 @@ cfq_set_request(struct request_queue *q, struct request *rq, struct bio *bio,
 	const bool is_sync = rq_is_sync(rq);
 	struct cfq_queue *cfqq;
 
-	might_sleep_if(gfp_mask & __GFP_WAIT);
-
 	spin_lock_irq(q->queue_lock);
 
 	check_ioprio_changed(cic, bio);
@@ -4229,7 +4204,7 @@ cfq_set_request(struct request_queue *q, struct request *rq, struct bio *bio,
 	if (!cfqq || cfqq == &cfqd->oom_cfqq) {
 		if (cfqq)
 			cfq_put_queue(cfqq);
-		cfqq = cfq_get_queue(cfqd, is_sync, cic, bio, gfp_mask);
+		cfqq = cfq_get_queue(cfqd, is_sync, cic, bio);
 		cic_set_cfqq(cic, cfqq, is_sync);
 	} else {
 		/*
-- 
2.4.2



* [PATCH 6/8] cfq-iosched: move cfq_group determination from cfq_find_alloc_queue() to cfq_get_queue()
  2015-06-08  8:59 [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Tejun Heo
                   ` (4 preceding siblings ...)
  2015-06-08  8:59 ` [PATCH 5/8] cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue() Tejun Heo
@ 2015-06-08  8:59 ` Tejun Heo
  2015-06-09 14:32   ` Jeff Moyer
  2015-06-08  8:59 ` [PATCH 7/8] cfq-iosched: fold cfq_find_alloc_queue() into cfq_get_queue() Tejun Heo
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2015-06-08  8:59 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, cgroups, vgoyal, avanzini.arianna, Tejun Heo

This is necessary for making async cfq_queues per-cfq_group instead
of per-cfq_data.  While this change makes cfq_get_queue() perform RCU
locking and look up the cfq_group even when it reuses an async queue,
the extra overhead is extremely unlikely to be noticeable given that
this is already sitting behind the cic->cfqq[] cache and the overall
cost of cfq operation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/cfq-iosched.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b8e83cd..a775128 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3573,21 +3573,10 @@ static inline void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio) {
 #endif  /* CONFIG_CFQ_GROUP_IOSCHED */
 
 static struct cfq_queue *
-cfq_find_alloc_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
-		     struct bio *bio)
+cfq_find_alloc_queue(struct cfq_data *cfqd, struct cfq_group *cfqg, bool is_sync,
+		     struct cfq_io_cq *cic, struct bio *bio)
 {
-	struct blkcg *blkcg;
 	struct cfq_queue *cfqq;
-	struct cfq_group *cfqg;
-
-	rcu_read_lock();
-
-	blkcg = bio_blkcg(bio);
-	cfqg = cfq_lookup_create_cfqg(cfqd, blkcg);
-	if (!cfqg) {
-		cfqq = &cfqd->oom_cfqq;
-		goto out;
-	}
 
 	cfqq = cic_to_cfqq(cic, is_sync);
 
@@ -3607,8 +3596,6 @@ cfq_find_alloc_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 		} else
 			cfqq = &cfqd->oom_cfqq;
 	}
-out:
-	rcu_read_unlock();
 	return cfqq;
 }
 
@@ -3638,6 +3625,14 @@ cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 	int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
 	struct cfq_queue **async_cfqq;
 	struct cfq_queue *cfqq;
+	struct cfq_group *cfqg;
+
+	rcu_read_lock();
+	cfqg = cfq_lookup_create_cfqg(cfqd, bio_blkcg(bio));
+	if (!cfqg) {
+		cfqq = &cfqd->oom_cfqq;
+		goto out;
+	}
 
 	if (!is_sync) {
 		if (!ioprio_valid(cic->ioprio)) {
@@ -3651,7 +3646,7 @@ cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 			goto out;
 	}
 
-	cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio);
+	cfqq = cfq_find_alloc_queue(cfqd, cfqg, is_sync, cic, bio);
 
 	/*
 	 * pin the queue now that it's allocated, scheduler exit will prune it
@@ -3662,6 +3657,7 @@ cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 	}
 out:
 	cfqq->ref++;
+	rcu_read_unlock();
 	return cfqq;
 }
 
-- 
2.4.2



* [PATCH 7/8] cfq-iosched: fold cfq_find_alloc_queue() into cfq_get_queue()
  2015-06-08  8:59 [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Tejun Heo
                   ` (5 preceding siblings ...)
  2015-06-08  8:59 ` [PATCH 6/8] cfq-iosched: move cfq_group determination from cfq_find_alloc_queue() to cfq_get_queue() Tejun Heo
@ 2015-06-08  8:59 ` Tejun Heo
  2015-06-09 14:40   ` Jeff Moyer
  2015-06-08  8:59 ` [PATCH 8/8] cfq-iosched: charge async IOs to the appropriate blkcg's instead of the root Tejun Heo
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2015-06-08  8:59 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, cgroups, vgoyal, avanzini.arianna, Tejun Heo

cfq_find_alloc_queue() checks whether a queue actually needs to be
allocated, which is unnecessary as its sole caller, cfq_get_queue(),
only calls it if so.  Also, the oom queue fallback logic is scattered
between cfq_get_queue() and cfq_find_alloc_queue().  There really
isn't much going on in the latter and things can be made simpler by
folding it into cfq_get_queue().

This patch collapses cfq_find_alloc_queue() into cfq_get_queue().  The
change is fairly straightforward with one exception - async_cfqq is
now initialized to NULL and the "!is_sync" test in the last if
conditional is replaced with an "async_cfqq" test.  This is because gcc
(5.1.1) gets confused for some reason and warns that async_cfqq may be
used uninitialized otherwise.  Oh well, the code isn't necessarily
worse this way.

This patch doesn't cause any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/cfq-iosched.c | 49 +++++++++++++++----------------------------------
 1 file changed, 15 insertions(+), 34 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index a775128..393befb 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3572,33 +3572,6 @@ static void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
 static inline void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio) { }
 #endif  /* CONFIG_CFQ_GROUP_IOSCHED */
 
-static struct cfq_queue *
-cfq_find_alloc_queue(struct cfq_data *cfqd, struct cfq_group *cfqg, bool is_sync,
-		     struct cfq_io_cq *cic, struct bio *bio)
-{
-	struct cfq_queue *cfqq;
-
-	cfqq = cic_to_cfqq(cic, is_sync);
-
-	/*
-	 * Always try a new alloc if we fell back to the OOM cfqq
-	 * originally, since it should just be a temporary situation.
-	 */
-	if (!cfqq || cfqq == &cfqd->oom_cfqq) {
-		cfqq = kmem_cache_alloc_node(cfq_pool,
-					     GFP_ATOMIC | __GFP_ZERO,
-					     cfqd->queue->node);
-		if (cfqq) {
-			cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
-			cfq_init_prio_data(cfqq, cic);
-			cfq_link_cfqq_cfqg(cfqq, cfqg);
-			cfq_log_cfqq(cfqd, cfqq, "alloced");
-		} else
-			cfqq = &cfqd->oom_cfqq;
-	}
-	return cfqq;
-}
-
 static struct cfq_queue **
 cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
 {
@@ -3623,7 +3596,7 @@ cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 {
 	int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
 	int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
-	struct cfq_queue **async_cfqq;
+	struct cfq_queue **async_cfqq = NULL;
 	struct cfq_queue *cfqq;
 	struct cfq_group *cfqg;
 
@@ -3646,12 +3619,20 @@ cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 			goto out;
 	}
 
-	cfqq = cfq_find_alloc_queue(cfqd, cfqg, is_sync, cic, bio);
+	cfqq = kmem_cache_alloc_node(cfq_pool, GFP_ATOMIC | __GFP_ZERO,
+				     cfqd->queue->node);
+	if (!cfqq) {
+		cfqq = &cfqd->oom_cfqq;
+		goto out;
+	}
+
+	cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
+	cfq_init_prio_data(cfqq, cic);
+	cfq_link_cfqq_cfqg(cfqq, cfqg);
+	cfq_log_cfqq(cfqd, cfqq, "alloced");
 
-	/*
-	 * pin the queue now that it's allocated, scheduler exit will prune it
-	 */
-	if (!is_sync && cfqq != &cfqd->oom_cfqq) {
+	if (async_cfqq) {
+		/* a new async queue is created, pin and remember */
 		cfqq->ref++;
 		*async_cfqq = cfqq;
 	}
@@ -4401,7 +4382,7 @@ static int cfq_init_queue(struct request_queue *q, struct elevator_type *e)
 		cfqd->prio_trees[i] = RB_ROOT;
 
 	/*
-	 * Our fallback cfqq if cfq_find_alloc_queue() runs into OOM issues.
+	 * Our fallback cfqq if cfq_get_queue() runs into OOM issues.
 	 * Grab a permanent reference to it, so that the normal code flow
 	 * will not attempt to free it.  oom_cfqq is linked to root_group
 	 * but shouldn't hold a reference as it'll never be unlinked.  Lose
-- 
2.4.2



* [PATCH 8/8] cfq-iosched: charge async IOs to the appropriate blkcg's instead of the root
  2015-06-08  8:59 [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Tejun Heo
                   ` (6 preceding siblings ...)
  2015-06-08  8:59 ` [PATCH 7/8] cfq-iosched: fold cfq_find_alloc_queue() into cfq_get_queue() Tejun Heo
@ 2015-06-08  8:59 ` Tejun Heo
  2015-06-08 22:29   ` Vivek Goyal
  2015-06-09 15:03   ` Jeff Moyer
  2015-06-08 19:49 ` [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Jeff Moyer
  2015-06-09  4:21 ` [PATCH 4.5/8] blkcg, cfq-iosched: use GFP_NOWAIT instead of GFP_ATOMIC for non-critical allocations Tejun Heo
  9 siblings, 2 replies; 31+ messages in thread
From: Tejun Heo @ 2015-06-08  8:59 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, cgroups, vgoyal, avanzini.arianna, Tejun Heo

Up until now, all async IOs were queued to async queues which are
shared across the whole request_queue, which means that blkcg resource
control is completely void on async IOs including all writeback IOs.
It was done this way because writeback didn't support cgroups and
there was no way of telling which writeback IO belonged to which
cgroup; however, writeback recently became cgroup aware and writeback
bio's are sent down properly tagged with the blkcg's to charge them
against.

This patch makes async cfq_queues per-cfq_group instead of
per-cfq_data so that each async IO is charged to the blkcg that it was
tagged for instead of unconditionally attributing it to root.

* cfq_data->async_cfqq and ->async_idle_cfqq are moved to cfq_group
  and alloc / destroy paths are updated accordingly.

* cfq_link_cfqq_cfqg() no longer overrides @cfqg to root for async
  queues.

* check_blkcg_changed() now also invalidates async queues as they no
  longer stay the same across cgroups.

After this patch, cfq's proportional IO control through blkio.weight
works correctly when cgroup writeback is in use.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/cfq-iosched.c | 85 ++++++++++++++++++++++++++---------------------------
 1 file changed, 42 insertions(+), 43 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 393befb..fded8d7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -291,6 +291,11 @@ struct cfq_group {
 	struct cfq_ttime ttime;
 	struct cfqg_stats stats;	/* stats for this cfqg */
 	struct cfqg_stats dead_stats;	/* stats pushed from dead children */
+
+	/* async queue for each priority case */
+	struct cfq_queue *async_cfqq[2][IOPRIO_BE_NR];
+	struct cfq_queue *async_idle_cfqq;
+
 };
 
 struct cfq_io_cq {
@@ -356,12 +361,6 @@ struct cfq_data {
 	struct cfq_queue *active_queue;
 	struct cfq_io_cq *active_cic;
 
-	/*
-	 * async queue for each priority case
-	 */
-	struct cfq_queue *async_cfqq[2][IOPRIO_BE_NR];
-	struct cfq_queue *async_idle_cfqq;
-
 	sector_t last_position;
 
 	/*
@@ -387,6 +386,7 @@ struct cfq_data {
 };
 
 static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
+static void cfq_put_queue(struct cfq_queue *cfqq);
 
 static struct cfq_rb_root *st_for(struct cfq_group *cfqg,
 					    enum wl_class_t class,
@@ -1556,13 +1556,26 @@ static void cfq_pd_init(struct blkcg_gq *blkg)
 
 static void cfq_pd_offline(struct blkcg_gq *blkg)
 {
+	struct cfq_group *cfqg = blkg_to_cfqg(blkg);
+	int i;
+
+	for (i = 0; i < IOPRIO_BE_NR; i++) {
+		if (cfqg->async_cfqq[0][i])
+			cfq_put_queue(cfqg->async_cfqq[0][i]);
+		if (cfqg->async_cfqq[1][i])
+			cfq_put_queue(cfqg->async_cfqq[1][i]);
+	}
+
+	if (cfqg->async_idle_cfqq)
+		cfq_put_queue(cfqg->async_idle_cfqq);
+
 	/*
 	 * @blkg is going offline and will be ignored by
 	 * blkg_[rw]stat_recursive_sum().  Transfer stats to the parent so
 	 * that they don't get lost.  If IOs complete after this point, the
 	 * stats for them will be lost.  Oh well...
 	 */
-	cfqg_stats_xfer_dead(blkg_to_cfqg(blkg));
+	cfqg_stats_xfer_dead(cfqg);
 }
 
 /* offset delta from cfqg->stats to cfqg->dead_stats */
@@ -1625,10 +1638,6 @@ static struct cfq_group *cfq_lookup_create_cfqg(struct cfq_data *cfqd,
 
 static void cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg)
 {
-	/* Currently, all async queues are mapped to root group */
-	if (!cfq_cfqq_sync(cfqq))
-		cfqg = cfqq->cfqd->root_group;
-
 	cfqq->cfqg = cfqg;
 	/* cfqq reference on cfqg */
 	cfqg_get(cfqg);
@@ -3541,7 +3550,7 @@ static void cfq_init_cfqq(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 static void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
 {
 	struct cfq_data *cfqd = cic_to_cfqd(cic);
-	struct cfq_queue *sync_cfqq;
+	struct cfq_queue *cfqq;
 	uint64_t serial_nr;
 
 	rcu_read_lock();
@@ -3555,15 +3564,22 @@ static void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
 	if (unlikely(!cfqd) || likely(cic->blkcg_serial_nr == serial_nr))
 		return;
 
-	sync_cfqq = cic_to_cfqq(cic, 1);
-	if (sync_cfqq) {
-		/*
-		 * Drop reference to sync queue. A new sync queue will be
-		 * assigned in new group upon arrival of a fresh request.
-		 */
-		cfq_log_cfqq(cfqd, sync_cfqq, "changed cgroup");
-		cic_set_cfqq(cic, NULL, 1);
-		cfq_put_queue(sync_cfqq);
+	/*
+	 * Drop reference to queues.  New queues will be assigned in new
+	 * group upon arrival of fresh requests.
+	 */
+	cfqq = cic_to_cfqq(cic, false);
+	if (cfqq) {
+		cfq_log_cfqq(cfqd, cfqq, "changed cgroup");
+		cic_set_cfqq(cic, NULL, false);
+		cfq_put_queue(cfqq);
+	}
+
+	cfqq = cic_to_cfqq(cic, true);
+	if (cfqq) {
+		cfq_log_cfqq(cfqd, cfqq, "changed cgroup");
+		cic_set_cfqq(cic, NULL, true);
+		cfq_put_queue(cfqq);
 	}
 
 	cic->blkcg_serial_nr = serial_nr;
@@ -3573,18 +3589,18 @@ static inline void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio) {
 #endif  /* CONFIG_CFQ_GROUP_IOSCHED */
 
 static struct cfq_queue **
-cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
+cfq_async_queue_prio(struct cfq_group *cfqg, int ioprio_class, int ioprio)
 {
 	switch (ioprio_class) {
 	case IOPRIO_CLASS_RT:
-		return &cfqd->async_cfqq[0][ioprio];
+		return &cfqg->async_cfqq[0][ioprio];
 	case IOPRIO_CLASS_NONE:
 		ioprio = IOPRIO_NORM;
 		/* fall through */
 	case IOPRIO_CLASS_BE:
-		return &cfqd->async_cfqq[1][ioprio];
+		return &cfqg->async_cfqq[1][ioprio];
 	case IOPRIO_CLASS_IDLE:
-		return &cfqd->async_idle_cfqq;
+		return &cfqg->async_idle_cfqq;
 	default:
 		BUG();
 	}
@@ -3613,7 +3629,7 @@ cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
 			ioprio = task_nice_ioprio(tsk);
 			ioprio_class = task_nice_ioclass(tsk);
 		}
-		async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
+		async_cfqq = cfq_async_queue_prio(cfqg, ioprio_class, ioprio);
 		cfqq = *async_cfqq;
 		if (cfqq)
 			goto out;
@@ -4287,21 +4303,6 @@ static void cfq_shutdown_timer_wq(struct cfq_data *cfqd)
 	cancel_work_sync(&cfqd->unplug_work);
 }
 
-static void cfq_put_async_queues(struct cfq_data *cfqd)
-{
-	int i;
-
-	for (i = 0; i < IOPRIO_BE_NR; i++) {
-		if (cfqd->async_cfqq[0][i])
-			cfq_put_queue(cfqd->async_cfqq[0][i]);
-		if (cfqd->async_cfqq[1][i])
-			cfq_put_queue(cfqd->async_cfqq[1][i]);
-	}
-
-	if (cfqd->async_idle_cfqq)
-		cfq_put_queue(cfqd->async_idle_cfqq);
-}
-
 static void cfq_exit_queue(struct elevator_queue *e)
 {
 	struct cfq_data *cfqd = e->elevator_data;
@@ -4314,8 +4315,6 @@ static void cfq_exit_queue(struct elevator_queue *e)
 	if (cfqd->active_queue)
 		__cfq_slice_expired(cfqd, cfqd->active_queue, 0);
 
-	cfq_put_async_queues(cfqd);
-
 	spin_unlock_irq(q->queue_lock);
 
 	cfq_shutdown_timer_wq(cfqd);
-- 
2.4.2



* Re: [PATCH 1/8] cfq-iosched: simplify control flow in cfq_get_queue()
  2015-06-08  8:59 ` [PATCH 1/8] cfq-iosched: simplify control flow in cfq_get_queue() Tejun Heo
@ 2015-06-08 18:36   ` Jeff Moyer
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Moyer @ 2015-06-08 18:36 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Tejun Heo <tj@kernel.org> writes:

> cfq_get_queue()'s control flow looks like the following.
>
> 	async_cfqq = NULL;
> 	cfqq = NULL;
>
> 	if (!is_sync) {
> 		...
> 		async_cfqq = ...;
> 		cfqq = *async_cfqq;
> 	}
>
> 	if (!cfqq)
> 		cfqq = ...;
>
> 	if (!is_sync && !(*async_cfqq))
> 		...;
>
> The only thing the local variable init, the second if, and the
> async_cfqq test in the third if achieve is to skip cfqq creation and
> installation if *async_cfqq was already non-NULL.  This is needlessly
> complicated with different tests examining the same condition.
> Simplify it to the following.
>
> 	if (!is_sync) {
> 		...
> 		async_cfqq = ...;
> 		cfqq = *async_cfqq;
> 		if (cfqq)
> 			goto out;
> 	}
>
> 	cfqq = ...;
>
> 	if (!is_sync)
> 		...;
>  out:
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Arianna Avanzini <avanzini.arianna@gmail.com>

Acked-by: Jeff Moyer <jmoyer@redhat.com>


* Re: [PATCH 2/8] cfq-iosched: fix async oom queue handling
  2015-06-08  8:59 ` [PATCH 2/8] cfq-iosched: fix async oom queue handling Tejun Heo
@ 2015-06-08 18:42   ` Jeff Moyer
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Moyer @ 2015-06-08 18:42 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Tejun Heo <tj@kernel.org> writes:

> Async cfqq's (cfq_queue's) are shared across cfq_data.  When
> cfq_get_queue() obtains a new queue from cfq_find_alloc_queue(), it
> stashes the pointer in cfq_data and reuses it from then on; however,
> the function doesn't consider that cfq_find_alloc_queue() may return
> the oom_cfqq under memory pressure and installs the returned queue
> unconditionally.
>
> If the oom_cfqq is installed as an async cfqq, cfq_set_request() will
> continue calling cfq_get_queue() hoping to replace it with a proper
> queue; however, cfq_get_queue() will keep returning the cached queue
> for the slot - the oom_cfqq.
>
> Fix it by skipping caching if the queue is the oom one.

Good catch.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>


* Re: [PATCH 3/8] cfq-iosched: fix oom cfq_queue ref leak in cfq_set_request()
  2015-06-08  8:59 ` [PATCH 3/8] cfq-iosched: fix oom cfq_queue ref leak in cfq_set_request() Tejun Heo
@ 2015-06-08 18:51   ` Jeff Moyer
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Moyer @ 2015-06-08 18:51 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Tejun Heo <tj@kernel.org> writes:

> If the cfq_queue cached in cfq_io_cq is the oom one, cfq_set_request()
> replaces it by invoking cfq_get_queue() again without putting the oom
> queue leaking the reference it was holding.  While oom queues are not
> released through reference counting, they're still reference counted
> and this can theoretically lead to the reference count overflowing and
> incorrectly invoke the usual release path on it.
>
> Fix it by making cfq_set_request() put the ref it was holding.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Arianna Avanzini <avanzini.arianna@gmail.com>

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>


* Re: [PATCH 4/8] cfq-iosched: minor cleanups
  2015-06-08  8:59 ` [PATCH 4/8] cfq-iosched: minor cleanups Tejun Heo
@ 2015-06-08 18:59   ` Jeff Moyer
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Moyer @ 2015-06-08 18:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Tejun Heo <tj@kernel.org> writes:

> * Some were accessing cic->cfqq[] directly.  Always use cic_to_cfqq()
>   and cic_set_cfqq().
>
> * check_ioprio_changed() doesn't need to verify cfq_get_queue()'s
>   return for NULL.  It's always non-NULL.  Simplify accordingly.
>
> This patch doesn't cause any functional changes.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Arianna Avanzini <avanzini.arianna@gmail.com>

Looks good.  I don't much like the "bool is_sync" function parameters.
Calls to cic_to_cfqq, cic_set_cfqq, etc would be much easier to read if
they took BLK_RW_SYNC and BLK_RW_ASYNC.  Certainly not a problem with
this patch, though, just a general observation.
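
For example (illustrative only; neither line is from this patch):

	cfqq = cic_to_cfqq(cic, false);		/* sync or async? */
	cfqq = cic_to_cfqq(cic, BLK_RW_ASYNC);	/* self-documenting */

Both index the same cic->cfqq[] slot given BLK_RW_ASYNC == 0.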

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>


* Re: [PATCH 5/8] cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue()
  2015-06-08  8:59 ` [PATCH 5/8] cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue() Tejun Heo
@ 2015-06-08 19:24   ` Jeff Moyer
  2015-06-08 20:27     ` Jeff Moyer
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Moyer @ 2015-06-08 19:24 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Tejun Heo <tj@kernel.org> writes:

> Even when allocations fail, cfq_find_alloc_queue() always returns a
> valid cfq_queue by falling back to the oom cfq_queue.  As such, there
> isn't much point in taking @gfp_mask and trying "harder" if __GFP_WAIT
> is set.  GFP_ATOMIC allocations don't fail often and even when they do
> the degraded behavior is acceptable and temporary.
>
> After all, the only reason get_request(), which ultimately determines
> the gfp_mask, cares about __GFP_WAIT is to guarantee request
> allocation, assuming IO forward progress, for callers which are
> willing to wait.  There's no reason for cfq_find_alloc_queue() to
> behave differently on __GFP_WAIT when it already has a fallback
> mechanism.
>
> Remove @gfp_mask from cfq_find_alloc_queue() and propagate the changes
> to its callers.  This simplifies the function quite a bit and will
> help making async queues per-cfq_group.

Sorry, I disagree with this patch.  You've changed it so that all cfqq
allocations are GFP_ATOMIC, and most, if not all of them simply don't
need to be.

I'll take it one step further and suggest that we should fix
cfq_set_request() to pass its gfp_mask down to check_ioprio_changed().
That also shouldn't be using GFP_ATOMIC unconditionally.
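
Something along these lines (a sketch of the direction only, not a
tested patch):

	static void check_ioprio_changed(struct cfq_io_cq *cic, struct bio *bio,
					 gfp_t gfp_mask)
	{
		...
		/* honor the caller's mask instead of hardcoded GFP_ATOMIC */
		cfqq = cfq_get_queue(cfqd, BLK_RW_ASYNC, cic, bio, gfp_mask);
		...
	}

with cfq_set_request() handing down the gfp_mask it was given.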

NAK

-Jeff


* Re: [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs
  2015-06-08  8:59 [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Tejun Heo
                   ` (7 preceding siblings ...)
  2015-06-08  8:59 ` [PATCH 8/8] cfq-iosched: charge async IOs to the appropriate blkcg's instead of the root Tejun Heo
@ 2015-06-08 19:49 ` Jeff Moyer
  2015-06-09  3:03   ` Tejun Heo
  2015-06-09  4:21 ` [PATCH 4.5/8] blkcg, cfq-iosched: use GFP_NOWAIT instead of GFP_ATOMIC for non-critical allocations Tejun Heo
  9 siblings, 1 reply; 31+ messages in thread
From: Jeff Moyer @ 2015-06-08 19:49 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Tejun Heo <tj@kernel.org> writes:

> Hello,
>
> cfq has always charged all async IOs to the root cgroup.  It didn't
> have much choice as writeback didn't know about cgroups and there was
> no way to tell who to blame for a given writeback IO.  writeback
> finally grew support for cgroups and now tags each writeback IO with
> the appropriate cgroup to charge it against.
>
> This patchset updates cfq so that it follows the blkcg each bio is
> tagged with.  Async cfq_queues are now shared across cfq_group, which
> is per-cgroup, instead of per-request_queue cfq_data.  This makes all
> IOs follow the weight based IO resource distribution implemented by
> cfq.
>
> This patchset contains the following 8 patches.
>
>  0001-cfq-iosched-simplify-control-flow-in-cfq_get_queue.patch
>  0002-cfq-iosched-fix-async-oom-queue-handling.patch
>  0003-cfq-iosched-fix-oom-cfq_queue-ref-leak-in-cfq_set_re.patch
>  0004-cfq-iosched-minor-cleanups.patch
>  0005-cfq-iosched-remove-gfp_mask-from-cfq_find_alloc_queu.patch
>  0006-cfq-iosched-move-cfq_group-determination-from-cfq_fi.patch
>  0007-cfq-iosched-fold-cfq_find_alloc_queue-into-cfq_get_q.patch
>  0008-cfq-iosched-charge-async-IOs-to-the-appropriate-blkc.patch

Hi, Tejun,

Assuming you're ok with dropping patch 5, I'll review patches 6-8 once
they've been reworked to account for that.  I took a look at them, and
they look OK to me.  But, if they are going to change, I'd rather wait
to ack the final versions.

Cheers,
Jeff


* Re: [PATCH 5/8] cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue()
  2015-06-08 19:24   ` Jeff Moyer
@ 2015-06-08 20:27     ` Jeff Moyer
  2015-06-08 21:19       ` Vivek Goyal
  2015-06-09  3:00       ` Tejun Heo
  0 siblings, 2 replies; 31+ messages in thread
From: Jeff Moyer @ 2015-06-08 20:27 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Jeff Moyer <jmoyer@redhat.com> writes:

> Tejun Heo <tj@kernel.org> writes:
>
>> Even when allocations fail, cfq_find_alloc_queue() always returns a
>> valid cfq_queue by falling back to the oom cfq_queue.  As such, there
>> isn't much point in taking @gfp_mask and trying "harder" if __GFP_WAIT
>> is set.  GFP_ATOMIC allocations don't fail often and even when they do
>> the degraded behavior is acceptable and temporary.
>>
>> After all, the only reason get_request(), which ultimately determines
>> the gfp_mask, cares about __GFP_WAIT is to guarantee request
>> allocation, assuming IO forward progress, for callers which are
>> willing to wait.  There's no reason for cfq_find_alloc_queue() to
>> behave differently on __GFP_WAIT when it already has a fallback
>> mechanism.
>>
>> Remove @gfp_mask from cfq_find_alloc_queue() and propagate the changes
>> to its callers.  This simplifies the function quite a bit and will
>> help making async queues per-cfq_group.
>
> Sorry, I disagree with this patch.  You've changed it so that all cfqq
> allocations are GFP_ATOMIC, and most, if not all of them simply don't
> need to be.

It occurs to me that replacing GFP_ATOMIC with GFP_NOWAIT in your patch
would address my concerns, and patches 6-8 would apply almost as-is.
What do you think about that?

-Jeff


* Re: [PATCH 5/8] cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue()
  2015-06-08 20:27     ` Jeff Moyer
@ 2015-06-08 21:19       ` Vivek Goyal
  2015-06-09  3:01         ` Tejun Heo
  2015-06-09  3:00       ` Tejun Heo
  1 sibling, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2015-06-08 21:19 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Tejun Heo, axboe, linux-kernel, cgroups, avanzini.arianna

On Mon, Jun 08, 2015 at 04:27:10PM -0400, Jeff Moyer wrote:
> Jeff Moyer <jmoyer@redhat.com> writes:
> 
> > Tejun Heo <tj@kernel.org> writes:
> >
> >> Even when allocations fail, cfq_find_alloc_queue() always returns a
> >> valid cfq_queue by falling back to the oom cfq_queue.  As such, there
> >> isn't much point in taking @gfp_mask and trying "harder" if __GFP_WAIT
> >> is set.  GFP_ATOMIC allocations don't fail often and even when they do
> >> the degraded behavior is acceptable and temporary.
> >>
> >> After all, the only reason get_request(), which ultimately determines
> >> the gfp_mask, cares about __GFP_WAIT is to guarantee request
> >> allocation, assuming IO forward progress, for callers which are
> >> willing to wait.  There's no reason for cfq_find_alloc_queue() to
> >> behave differently on __GFP_WAIT when it already has a fallback
> >> mechanism.
> >>
> >> Remove @gfp_mask from cfq_find_alloc_queue() and propagate the changes
> >> to its callers.  This simplifies the function quite a bit and will
> >> help making async queues per-cfq_group.
> >
> > Sorry, I disagree with this patch.  You've changed it so that all cfqq
> > allocations are GFP_ATOMIC, and most, if not all of them simply don't
> > need to be.
> 
> It occurs to me that replacing GFP_ATOMIC with GFP_NOWAIT in your patch
> would address my concerns, and patches 6-8 would apply almost as-is.
> What do you think about that?

Whatever we end up using, maybe it is a good idea to use the same
policy for block group allocation too.  Right now we use GFP_ATOMIC
for blkcg allocation.

So this will be equivalent to that: when memory is low, we don't
provide service differentiation.

Thanks
Vivek


* Re: [PATCH 8/8] cfq-iosched: charge async IOs to the appropriate blkcg's instead of the root
  2015-06-08  8:59 ` [PATCH 8/8] cfq-iosched: charge async IOs to the appropriate blkcg's instead of the root Tejun Heo
@ 2015-06-08 22:29   ` Vivek Goyal
  2015-06-09  3:11     ` Tejun Heo
  2015-06-09 15:03   ` Jeff Moyer
  1 sibling, 1 reply; 31+ messages in thread
From: Vivek Goyal @ 2015-06-08 22:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: axboe, linux-kernel, cgroups, avanzini.arianna,
	device-mapper development

On Mon, Jun 08, 2015 at 05:59:33PM +0900, Tejun Heo wrote:
> Up until now, all async IOs were queued to async queues which are
> shared across the whole request_queue, which means that blkcg resource
> control is completely void on async IOs including all writeback IOs.
> It was done this way because writeback didn't support cgroups and
> there was no way of telling which writeback IO belonged to which
> cgroup; however, writeback recently became cgroup aware and writeback
> bio's are sent down properly tagged with the blkcg's to charge them
> against.
> 
> This patch makes async cfq_queues per-cfq_group instead of
> per-cfq_data so that each async IO is charged to the blkcg that it was
> tagged for instead of unconditionally attributing it to root.
> 
> * cfq_data->async_cfqq and ->async_idle_cfqq are moved to cfq_group
>   and alloc / destroy paths are updated accordingly.
> 
> * cfq_link_cfqq_cfqg() no longer overrides @cfqg to root for async
>   queues.
> 
> * check_blkcg_changed() now also invalidates async queues as they no
>   longer stay the same across cgroups.
> 
> After this patch, cfq's proportional IO control through blkio.weight
> works correctly when cgroup writeback is in use.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
> ---
>  block/cfq-iosched.c | 85 ++++++++++++++++++++++++++---------------------------
>  1 file changed, 42 insertions(+), 43 deletions(-)
> 
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 393befb..fded8d7 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -291,6 +291,11 @@ struct cfq_group {
>  	struct cfq_ttime ttime;
>  	struct cfqg_stats stats;	/* stats for this cfqg */
>  	struct cfqg_stats dead_stats;	/* stats pushed from dead children */
> +
> +	/* async queue for each priority case */
> +	struct cfq_queue *async_cfqq[2][IOPRIO_BE_NR];
> +	struct cfq_queue *async_idle_cfqq;
> +
>  };
>  
>  struct cfq_io_cq {
> @@ -356,12 +361,6 @@ struct cfq_data {
>  	struct cfq_queue *active_queue;
>  	struct cfq_io_cq *active_cic;
>  
> -	/*
> -	 * async queue for each priority case
> -	 */
> -	struct cfq_queue *async_cfqq[2][IOPRIO_BE_NR];
> -	struct cfq_queue *async_idle_cfqq;
> -
>  	sector_t last_position;
>  
>  	/*
> @@ -387,6 +386,7 @@ struct cfq_data {
>  };
>  
>  static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
> +static void cfq_put_queue(struct cfq_queue *cfqq);
>  
>  static struct cfq_rb_root *st_for(struct cfq_group *cfqg,
>  					    enum wl_class_t class,
> @@ -1556,13 +1556,26 @@ static void cfq_pd_init(struct blkcg_gq *blkg)
>  
>  static void cfq_pd_offline(struct blkcg_gq *blkg)
>  {
> +	struct cfq_group *cfqg = blkg_to_cfqg(blkg);
> +	int i;
> +
> +	for (i = 0; i < IOPRIO_BE_NR; i++) {
> +		if (cfqg->async_cfqq[0][i])
> +			cfq_put_queue(cfqg->async_cfqq[0][i]);
> +		if (cfqg->async_cfqq[1][i])
> +			cfq_put_queue(cfqg->async_cfqq[1][i]);
> +	}
> +
> +	if (cfqg->async_idle_cfqq)
> +		cfq_put_queue(cfqg->async_idle_cfqq);
> +
>  	/*
>  	 * @blkg is going offline and will be ignored by
>  	 * blkg_[rw]stat_recursive_sum().  Transfer stats to the parent so
>  	 * that they don't get lost.  If IOs complete after this point, the
>  	 * stats for them will be lost.  Oh well...
>  	 */
> -	cfqg_stats_xfer_dead(blkg_to_cfqg(blkg));
> +	cfqg_stats_xfer_dead(cfqg);
>  }
>  
>  /* offset delta from cfqg->stats to cfqg->dead_stats */
> @@ -1625,10 +1638,6 @@ static struct cfq_group *cfq_lookup_create_cfqg(struct cfq_data *cfqd,
>  
>  static void cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg)
>  {
> -	/* Currently, all async queues are mapped to root group */
> -	if (!cfq_cfqq_sync(cfqq))
> -		cfqg = cfqq->cfqd->root_group;
> -
>  	cfqq->cfqg = cfqg;
>  	/* cfqq reference on cfqg */
>  	cfqg_get(cfqg);
> @@ -3541,7 +3550,7 @@ static void cfq_init_cfqq(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>  static void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
>  {
>  	struct cfq_data *cfqd = cic_to_cfqd(cic);
> -	struct cfq_queue *sync_cfqq;
> +	struct cfq_queue *cfqq;
>  	uint64_t serial_nr;
>  
>  	rcu_read_lock();

> @@ -3555,15 +3564,22 @@ static void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
>  	if (unlikely(!cfqd) || likely(cic->blkcg_serial_nr == serial_nr))
>  		return;
>  
> -	sync_cfqq = cic_to_cfqq(cic, 1);
> -	if (sync_cfqq) {
> -		/*
> -		 * Drop reference to sync queue. A new sync queue will be
> -		 * assigned in new group upon arrival of a fresh request.
> -		 */
> -		cfq_log_cfqq(cfqd, sync_cfqq, "changed cgroup");
> -		cic_set_cfqq(cic, NULL, 1);
> -		cfq_put_queue(sync_cfqq);
> +	/*
> +	 * Drop reference to queues.  New queues will be assigned in new
> +	 * group upon arrival of fresh requests.
> +	 */
> +	cfqq = cic_to_cfqq(cic, false);
> +	if (cfqq) {
> +		cfq_log_cfqq(cfqd, cfqq, "changed cgroup");
> +		cic_set_cfqq(cic, NULL, false);
> +		cfq_put_queue(cfqq);
> +	}
> +
> +	cfqq = cic_to_cfqq(cic, true);
> +	if (cfqq) {
> +		cfq_log_cfqq(cfqd, cfqq, "changed cgroup");
> +		cic_set_cfqq(cic, NULL, true);
> +		cfq_put_queue(cfqq);
>  	}

Hi Tejun,

I am getting confused by the cgroup and iocontext interaction, hence
some basic questions.

So a bio can carry either or both iocontext and cgroup information.
If iocontext or cgroup information is present, it is used during rq
and cfqq allocation; otherwise the submitter's iocontext and cgroup
are used.

bio_associate_current() will associate the iocontext as well as the
cgroup of the submitter with the bio.  The bio can then be offloaded
to helper threads for submission and still be accounted to the right
cgroup and the right iocontext.  As of now only the blk throttling
layer makes use of it.
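
I.e. roughly (a sketch of the existing interface, nothing new):

	/* submitter tags the bio with its own io_context and blkcg ... */
	bio_associate_current(bio);
	...
	/* ... so the IO is charged correctly even when a helper thread
	 * does the actual submission later */
	submit_bio(rw, bio);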

But the above is not true for buffered writes: we will not associate
an io context.  Instead only cgroup information will be sent down and
the io context of the submitter will be used.

So any thread which is forced to submit a buffered write for some
other cgroup will have its sync queue reset as well (because CFQ will
think that the cgroup of the submitter has changed).  Not sure how
often this will happen, but if it happens frequently, it might show up
in profiles.  I had mentioned this in the past and IIUC you said we
will have to carry writeback information all the way into the lower
layers.  Maybe that's an optimization for later.  So nothing new here,
just trying to understand the current situation.

Also I am wondering if this cgroup and io context information is
carried through the dm layer or not.  I guess that might be a separate
discussion.  It has come up for discussion internally in the past, so
this might be a good time to get the attention of dm developers on
these upcoming changes (CC dm-devel).

Thanks
Vivek


* Re: [PATCH 5/8] cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue()
  2015-06-08 20:27     ` Jeff Moyer
  2015-06-08 21:19       ` Vivek Goyal
@ 2015-06-09  3:00       ` Tejun Heo
  2015-06-09 14:29         ` Jeff Moyer
  1 sibling, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2015-06-09  3:00 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Hey, Jeff.

On Mon, Jun 08, 2015 at 04:27:10PM -0400, Jeff Moyer wrote:
> >> Remove @gfp_mask from cfq_find_alloc_queue() and propagate the changes
> >> to its callers.  This simplifies the function quite a bit and will
> >> help making async queues per-cfq_group.
> >
> > Sorry, I disagree with this patch.  You've changed it so that all cfqq
> > allocations are GFP_ATOMIC, and most, if not all of them simply don't
> > need to be.
> 
> It occurs to me that replacing GFP_ATOMIC with GFP_NOWAIT in your patch
> would address my concerns, and patches 6-8 would apply almost as-is.
> What do you think about that?

Oh yeah, it's okay to fail these allocations under memory pressure.
GFP_NOWAIT is the better pick here.  It's GFP_ATOMIC mostly due to
historic copy&paste anyway.  I'll change them to GFP_NOWAIT.

Thanks.

-- 
tejun


* Re: [PATCH 5/8] cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue()
  2015-06-08 21:19       ` Vivek Goyal
@ 2015-06-09  3:01         ` Tejun Heo
  0 siblings, 0 replies; 31+ messages in thread
From: Tejun Heo @ 2015-06-09  3:01 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jeff Moyer, axboe, linux-kernel, cgroups, avanzini.arianna

On Mon, Jun 08, 2015 at 05:19:30PM -0400, Vivek Goyal wrote:
> Whatever we end up using, may be it is a good idea to use same policy
> for block group allocation too. Right now we use GFP_ATOMIC for blkcg
> allocation.
> 
> So this will be equivalent of that when memory is low, we don't provide
> service differentiation.

Yeap, I'll add a patch to convert all optional allocations to
GFP_NOWAIT.

Thanks.

-- 
tejun


* Re: [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs
  2015-06-08 19:49 ` [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Jeff Moyer
@ 2015-06-09  3:03   ` Tejun Heo
  2015-06-09 15:05     ` Jeff Moyer
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2015-06-09  3:03 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Hey, Jeff.

On Mon, Jun 08, 2015 at 03:49:56PM -0400, Jeff Moyer wrote:
> Assuming you're ok with dropping patch 5, I'll review patches 6-8 once

I'll update it to use GFP_NOWAIT.  Can't drop it easily as that would
make group lookup more complicated w/ pinning and all.

> they've been reworked to account for that.  I took a look at them, and
> they look OK to me.  But, if they are going to change, I'd rather wait
> to ack the final versions.

Hmm... the changes will be a simple s/GFP_ATOMIC/GFP_NOWAIT/.
Shouldn't affect reviewing, I think.

Thanks.

-- 
tejun


* Re: [PATCH 8/8] cfq-iosched: charge async IOs to the appropriate blkcg's instead of the root
  2015-06-08 22:29   ` Vivek Goyal
@ 2015-06-09  3:11     ` Tejun Heo
  0 siblings, 0 replies; 31+ messages in thread
From: Tejun Heo @ 2015-06-09  3:11 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: axboe, linux-kernel, cgroups, avanzini.arianna,
	device-mapper development

Hey, Vivek.

On Mon, Jun 08, 2015 at 06:29:04PM -0400, Vivek Goyal wrote:
...
> But the above is not true for buffered writes; we will not associate an io
> context. Instead, only cgroup information will be sent down and the io
> context of the submitter will be used.
> 
> So any thread which is forced to submit a buffered write for some other
> cgroup will have its sync queue reset as well (because CFQ will think

Yeah, it'll usually be the writeback work items.

> that the cgroup of the submitter has changed). Not sure how often it will happen,
> but if it happens frequently, this might show up in profiles. I had

They're unlikely to have sync queues associated with them to begin
with, and even when the context gets reset, each iteration should be
big enough to drown out this sort of overhead.  Writeback operates in
fairly sizable chunks.

> mentioned this in the past and IIUC you said we will have to carry
> writeback information all the way into the lower layers. Maybe that's
> an optimization for later. So nothing new here, just trying to understand
> the current situation.

I'm actually kinda doubtful about caching async queues on the cic at all.
We already perform most of the operations necessary to look up the async
queue for check_blkcg_changed().  We might as well just look up and
pin each time.
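
IOW, something along these lines on each async IO instead of going
through cic->cfqq[] (hypothetical sketch using the per-cfqg helpers
from this series, not an actual patch):

	rcu_read_lock();
	cfqg = cfq_lookup_create_cfqg(cfqd, bio_blkcg(bio));
	async_cfqq = cfq_async_queue_prio(cfqg, ioprio_class, ioprio);
	cfqq = *async_cfqq;
	if (cfqq)
		cfqq->ref++;	/* pin for this request, no cic caching */
	rcu_read_unlock();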

> Also I am wondering if this cgroup and io context information is carried
> through the dm layer or not. I guess it might be a separate discussion. It
> has come up for discussion internally in the past. So this might be a good
> time to get the attention of dm developers on these upcoming changes. (CC dm-devel).

Hmmm... it depends on what dm does with the bio.  It can surely
propagate the cgroup information through sub bios.
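
e.g. when cloning, something like the following in the dm submission
path (sketch; bio_associate_blkcg() is from the cgroup writeback
series, and whether/where dm should call it is the open question):

	/* tag the sub bio with the blkcg of the original bio */
	if (bio->bi_css)
		bio_associate_blkcg(clone, bio->bi_css);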

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 4.5/8] blkcg, cfq-iosched: use GFP_NOWAIT instead of GFP_ATOMIC for non-critical allocations
  2015-06-08  8:59 [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Tejun Heo
                   ` (8 preceding siblings ...)
  2015-06-08 19:49 ` [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Jeff Moyer
@ 2015-06-09  4:21 ` Tejun Heo
  2015-06-09 14:27   ` Jeff Moyer
  9 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2015-06-09  4:21 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, cgroups, vgoyal, avanzini.arianna

From b848495d2c987e960d1f7794966d82c05fcefc6d Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Tue, 9 Jun 2015 13:19:40 +0900

blkcg performs several allocations to track IOs per cgroup and enforce
resource control.  Most of these allocations are performed lazily on
demand in the IO path and thus can't involve the reclaim path.  Currently,
these allocations use GFP_ATOMIC; however, blkcg can gracefully deal
with occasional failures of these allocations by punting IOs to the
root cgroup and there's no reason to reach into the emergency reserve.

This patch replaces GFP_ATOMIC with GFP_NOWAIT for the following
allocations.

* bdi_writeback_congested and blkcg_gq allocations in blkg_create().

* radix tree node allocations for blkcg->blkg_tree.

* cfq_queue allocation on ioprio changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Jeff Moyer <jmoyer@redhat.com>
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
---
Hello,

git branch updated accordingly.  A couple of later patches need to be
updated but the changes are trivial.  I'll repost the patchset once
review is done.

Thanks.

 block/blk-cgroup.c  | 8 ++++----
 block/cfq-iosched.c | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 31610ae..1fddbbd 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -175,7 +175,7 @@ EXPORT_SYMBOL_GPL(blkg_lookup);
 
 /*
  * If @new_blkg is %NULL, this function tries to allocate a new one as
- * necessary using %GFP_ATOMIC.  @new_blkg is always consumed on return.
+ * necessary using %GFP_NOWAIT.  @new_blkg is always consumed on return.
  */
 static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 				    struct request_queue *q,
@@ -195,7 +195,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 	}
 
 	wb_congested = wb_congested_get_create(&q->backing_dev_info,
-					       blkcg->css.id, GFP_ATOMIC);
+					       blkcg->css.id, GFP_NOWAIT);
 	if (!wb_congested) {
 		ret = -ENOMEM;
 		goto err_put_css;
@@ -203,7 +203,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 
 	/* allocate */
 	if (!new_blkg) {
-		new_blkg = blkg_alloc(blkcg, q, GFP_ATOMIC);
+		new_blkg = blkg_alloc(blkcg, q, GFP_NOWAIT);
 		if (unlikely(!new_blkg)) {
 			ret = -ENOMEM;
 			goto err_put_congested;
@@ -841,7 +841,7 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 	blkcg->cfq_leaf_weight = CFQ_WEIGHT_DEFAULT;
 done:
 	spin_lock_init(&blkcg->lock);
-	INIT_RADIX_TREE(&blkcg->blkg_tree, GFP_ATOMIC);
+	INIT_RADIX_TREE(&blkcg->blkg_tree, GFP_NOWAIT);
 	INIT_HLIST_HEAD(&blkcg->blkg_list);
 #ifdef CONFIG_CGROUP_WRITEBACK
 	INIT_LIST_HEAD(&blkcg->cgwb_list);
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 90d5a87..97863ce 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3507,7 +3507,7 @@ static void check_ioprio_changed(struct cfq_io_cq *cic, struct bio *bio)
 	cfqq = cic_to_cfqq(cic, false);
 	if (cfqq) {
 		cfq_put_queue(cfqq);
-		cfqq = cfq_get_queue(cfqd, BLK_RW_ASYNC, cic, bio, GFP_ATOMIC);
+		cfqq = cfq_get_queue(cfqd, BLK_RW_ASYNC, cic, bio, GFP_NOWAIT);
 		cic_set_cfqq(cic, cfqq, false);
 	}
 
-- 
2.4.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 4.5/8] blkcg, cfq-iosched: use GFP_NOWAIT instead of GFP_ATOMIC for non-critical allocations
  2015-06-09  4:21 ` [PATCH 4.5/8] blkcg, cfq-iosched: use GFP_NOWAIT instead of GFP_ATOMIC for non-critical allocations Tejun Heo
@ 2015-06-09 14:27   ` Jeff Moyer
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Moyer @ 2015-06-09 14:27 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Tejun Heo <tj@kernel.org> writes:

> From b848495d2c987e960d1f7794966d82c05fcefc6d Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@kernel.org>
> Date: Tue, 9 Jun 2015 13:19:40 +0900
>
> blkcg performs several allocations to track IOs per cgroup and enforce
> resource control.  Most of these allocations are performed lazily on
> demand in the IO path and thus can't involve the reclaim path.  Currently,
> these allocations use GFP_ATOMIC; however, blkcg can gracefully deal
> with occasional failures of these allocations by punting IOs to the
> root cgroup and there's no reason to reach into the emergency reserve.
>
> This patch replaces GFP_ATOMIC with GFP_NOWAIT for the following
> allocations.
>
> * bdi_writeback_congested and blkcg_gq allocations in blkg_create().
>
> * radix tree node allocations for blkcg->blkg_tree.
>
> * cfq_queue allocation on ioprio changes.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Suggested-by: Jeff Moyer <jmoyer@redhat.com>
> Suggested-by: Vivek Goyal <vgoyal@redhat.com>

Thanks, Tejun!

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/8] cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue()
  2015-06-09  3:00       ` Tejun Heo
@ 2015-06-09 14:29         ` Jeff Moyer
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Moyer @ 2015-06-09 14:29 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Tejun Heo <tj@kernel.org> writes:

> Hey, Jeff.
>
> On Mon, Jun 08, 2015 at 04:27:10PM -0400, Jeff Moyer wrote:
>> >> Remove @gfp_mask from cfq_find_alloc_queue() and propagate the changes
>> >> to its callers.  This simplifies the function quite a bit and will
>> >> help making async queues per-cfq_group.
>> >
>> > Sorry, I disagree with this patch.  You've changed it so that all cfqq
>> > allocations are GFP_ATOMIC, and most, if not all of them simply don't
>> > need to be.
>> 
>> It occurs to me that replacing GFP_ATOMIC with GFP_NOWAIT in your patch
>> would address my concerns, and patches 6-8 would apply almost as-is.
>> What do you think about that?
>
> Oh yeah, it's okay to fail these allocations under memory pressure.
> GFP_NOWAIT is the better pick here.  It's GFP_ATOMIC mostly due to
> historic copy&paste anyway.  I'll change them to GFP_NOWAIT.

OK.  Assuming that's the only change, you can go ahead and add my
Reviewed-by: Jeff Moyer <jmoyer@redhat.com> to the next iteration.

Thanks!
Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 6/8] cfq-iosched: move cfq_group determination from cfq_find_alloc_queue() to cfq_get_queue()
  2015-06-08  8:59 ` [PATCH 6/8] cfq-iosched: move cfq_group determination from cfq_find_alloc_queue() to cfq_get_queue() Tejun Heo
@ 2015-06-09 14:32   ` Jeff Moyer
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Moyer @ 2015-06-09 14:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Tejun Heo <tj@kernel.org> writes:

> This is necessary for making async cfq_queues per-cfq_group instead
> of per-cfq_data.  While this change makes cfq_get_queue() perform RCU
> locking and look up the cfq_group even when it reuses an async queue,
> the extra overhead is extremely unlikely to be noticeable given that
> this path already sits behind the cic->cfqq[] cache and the overall
> cost of cfq operation.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Arianna Avanzini <avanzini.arianna@gmail.com>

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/8] cfq-iosched: fold cfq_find_alloc_queue() into cfq_get_queue()
  2015-06-08  8:59 ` [PATCH 7/8] cfq-iosched: fold cfq_find_alloc_queue() into cfq_get_queue() Tejun Heo
@ 2015-06-09 14:40   ` Jeff Moyer
  2015-06-10  2:47     ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Moyer @ 2015-06-09 14:40 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Tejun Heo <tj@kernel.org> writes:

> cfq_find_alloc_queue() checks whether a queue actually needs to be
> allocated, which is unnecessary as its sole caller, cfq_get_queue(),
> only calls it if so.  Also, the oom queue fallback logic is scattered
> between cfq_get_queue() and cfq_find_alloc_queue().  There really
> isn't much going on in the latter and things can be made simpler by
> folding it into cfq_get_queue().
>
> This patch collapses cfq_find_alloc_queue() into cfq_get_queue().  The
> change is fairly straight-forward with one exception - async_cfqq is
> now initialized to NULL and the "!is_sync" test in the last if
> conditional is replaced with "async_cfqq" test.  This is because gcc
> (5.1.1) gets confused for some reason and warns that async_cfqq may be
> used uninitialized otherwise.  Oh well, the code isn't necessarily
> worse this way.
>
> This patch doesn't cause any functional difference.

The resulting code (introduced by the last patch, I know) is not ideal:

        rcu_read_lock();
        cfqg = cfq_lookup_create_cfqg(cfqd, bio_blkcg(bio));
        if (!cfqg) {
                cfqq = &cfqd->oom_cfqq;
                goto out;
        }

        if (!is_sync) {
                if (!ioprio_valid(cic->ioprio)) {
                        struct task_struct *tsk = current;
                        ioprio = task_nice_ioprio(tsk);
                        ioprio_class = task_nice_ioclass(tsk);
                }
        async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class,
                                          ioprio);
                cfqq = *async_cfqq;
                if (cfqq)
                        goto out;
        }

As you mentioned, we don't need to look up the cfqg for the async queue.
What's more, we could fall back to the oom_cfqq even if we had an
existing async cfqq.  I'm guessing you structured the code this way to
make the error path cleaner.  I don't think it's a big deal, as it
should be a rare occurrence, so...

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 8/8] cfq-iosched: charge async IOs to the appropriate blkcg's instead of the root
  2015-06-08  8:59 ` [PATCH 8/8] cfq-iosched: charge async IOs to the appropriate blkcg's instead of the root Tejun Heo
  2015-06-08 22:29   ` Vivek Goyal
@ 2015-06-09 15:03   ` Jeff Moyer
  1 sibling, 0 replies; 31+ messages in thread
From: Jeff Moyer @ 2015-06-09 15:03 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Tejun Heo <tj@kernel.org> writes:

> Up until now, all async IOs were queued to async queues which are
> shared across the whole request_queue, which means that blkcg resource
> control is completely void on async IOs including all writeback IOs.
> It was done this way because writeback didn't support cgroups and
> there was no way of telling which writeback IO belonged to which
> cgroup; however, writeback recently became cgroup aware and writeback
> bio's are sent down properly tagged with the blkcg's to charge them
> against.
>
> This patch makes async cfq_queues per-cfq_group instead of
> per-cfq_data so that each async IO is charged to the blkcg that it was
> tagged for instead of unconditionally attributing it to root.
>
> * cfq_data->async_cfqq and ->async_idle_cfqq are moved to cfq_group
>   and alloc / destroy paths are updated accordingly.
>
> * cfq_link_cfqq_cfqg() no longer overrides @cfqg to root for async
>   queues.
>
> * check_blkcg_changed() now also invalidates async queues as they no
>   longer stay the same across cgroups.
>
> After this patch, cfq's proportional IO control through blkio.weight
> works correctly when cgroup writeback is in use.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
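
FWIW, the check_blkcg_changed() change in the last bullet boils down to
the same drop-and-forget pattern used for ioprio changes (sketch,
assuming it mirrors what check_ioprio_changed() already does):

	cfqq = cic_to_cfqq(cic, false);
	if (cfqq) {
		cfq_log_cfqq(cfqd, cfqq, "changed cgroup");
		cic_set_cfqq(cic, NULL, false);
		cfq_put_queue(cfqq);	/* next IO allocates from the new cfqg */
	}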

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs
  2015-06-09  3:03   ` Tejun Heo
@ 2015-06-09 15:05     ` Jeff Moyer
  2015-06-10  2:49       ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Moyer @ 2015-06-09 15:05 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Tejun Heo <tj@kernel.org> writes:

> Hey, Jeff.
>
> On Mon, Jun 08, 2015 at 03:49:56PM -0400, Jeff Moyer wrote:
>> Assuming you're ok with dropping patch 5, I'll review patches 6-8 once
>
> I'll update it to use GFP_NOWAIT.  Can't drop it easily as that would
> make group lookup more complicated w/ pinning and all.
>
>> they've been reworked to account for that.  I took a look at them, and
>> they look OK to me.  But, if they are going to change, I'd rather wait
>> to ack the final versions.
>
> Hmm... the changes will be a simple s/GFP_ATOMIC/GFP_NOWAIT.  Shouldn't
> affect reviewing, I think.

Yeah, I sent this email before realizing that GFP_NOWAIT would work out.
I've reviewed the rest of the series.  Looks good to me!  You did a nice
job splitting things up into easily reviewable pieces, so thanks for
that.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/8] cfq-iosched: fold cfq_find_alloc_queue() into cfq_get_queue()
  2015-06-09 14:40   ` Jeff Moyer
@ 2015-06-10  2:47     ` Tejun Heo
  0 siblings, 0 replies; 31+ messages in thread
From: Tejun Heo @ 2015-06-10  2:47 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Hey, Jeff.

On Tue, Jun 09, 2015 at 10:40:02AM -0400, Jeff Moyer wrote:
> The resulting code (introduced by the last patch, I know) is not ideal:
> 
>         rcu_read_lock();
>         cfqg = cfq_lookup_create_cfqg(cfqd, bio_blkcg(bio));
>         if (!cfqg) {
>                 cfqq = &cfqd->oom_cfqq;
>                 goto out;
>         }
> 
>         if (!is_sync) {
>                 if (!ioprio_valid(cic->ioprio)) {
>                         struct task_struct *tsk = current;
>                         ioprio = task_nice_ioprio(tsk);
>                         ioprio_class = task_nice_ioclass(tsk);
>                 }
>         async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class,
>                                           ioprio);
>                 cfqq = *async_cfqq;
>                 if (cfqq)
>                         goto out;
>         }
> 
> As you mentioned, we don't need to lookup the cfqg for the async queue.
> What's more is we could fallback to the oom_cfqq even if we had an
> existing async cfqq.  I'm guessing you structured the code this way to
> make the error path cleaner.  I don't think it's a big deal, as it
> should be a rare occurrence, so...

In this patch the lookup seems unnecessary for the async case, but the
change is required by the following patches: async queues move from
cfq_data to cfq_group, so we can't determine the async queue w/o
looking up the cfqg first, and we have to fall back to oom_cfqq if the
cfqg lookup fails (cuz there are no system-wide async queues anymore).
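
IOW, the end state looks roughly like the following (sketch):

	cfqg = cfq_lookup_create_cfqg(cfqd, bio_blkcg(bio));
	if (!cfqg) {
		cfqq = &cfqd->oom_cfqq;	/* no cfqg, no per-group asyncs */
		goto out;
	}

	if (!is_sync) {
		async_cfqq = cfq_async_queue_prio(cfqg, ioprio_class,
						  ioprio);
		cfqq = *async_cfqq;
		if (cfqq)
			goto out;
	}

so the cfqg lookup has to come first either way.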

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs
  2015-06-09 15:05     ` Jeff Moyer
@ 2015-06-10  2:49       ` Tejun Heo
  0 siblings, 0 replies; 31+ messages in thread
From: Tejun Heo @ 2015-06-10  2:49 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, linux-kernel, cgroups, vgoyal, avanzini.arianna

Hello,

On Tue, Jun 09, 2015 at 11:05:46AM -0400, Jeff Moyer wrote:
> Yeah, I sent this email before realizing that GFP_NOWAIT would work out.
> I've reviewed the rest of the series.  Looks good to me!  You did a nice
> job splitting things up into easily reviewable pieces, so thanks for
> that.

Thanks a lot for reviewing the patches. :)

I think it's already too late for the v4.2 merge window.  I'll re-post
the patches w/ your reviewed-by's added once v4.2-rc1 drops.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-06-10  2:49 UTC | newest]

Thread overview: 31+ messages
2015-06-08  8:59 [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Tejun Heo
2015-06-08  8:59 ` [PATCH 1/8] cfq-iosched: simplify control flow in cfq_get_queue() Tejun Heo
2015-06-08 18:36   ` Jeff Moyer
2015-06-08  8:59 ` [PATCH 2/8] cfq-iosched: fix async oom queue handling Tejun Heo
2015-06-08 18:42   ` Jeff Moyer
2015-06-08  8:59 ` [PATCH 3/8] cfq-iosched: fix oom cfq_queue ref leak in cfq_set_request() Tejun Heo
2015-06-08 18:51   ` Jeff Moyer
2015-06-08  8:59 ` [PATCH 4/8] cfq-iosched: minor cleanups Tejun Heo
2015-06-08 18:59   ` Jeff Moyer
2015-06-08  8:59 ` [PATCH 5/8] cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue() Tejun Heo
2015-06-08 19:24   ` Jeff Moyer
2015-06-08 20:27     ` Jeff Moyer
2015-06-08 21:19       ` Vivek Goyal
2015-06-09  3:01         ` Tejun Heo
2015-06-09  3:00       ` Tejun Heo
2015-06-09 14:29         ` Jeff Moyer
2015-06-08  8:59 ` [PATCH 6/8] cfq-iosched: move cfq_group determination from cfq_find_alloc_queue() to cfq_get_queue() Tejun Heo
2015-06-09 14:32   ` Jeff Moyer
2015-06-08  8:59 ` [PATCH 7/8] cfq-iosched: fold cfq_find_alloc_queue() into cfq_get_queue() Tejun Heo
2015-06-09 14:40   ` Jeff Moyer
2015-06-10  2:47     ` Tejun Heo
2015-06-08  8:59 ` [PATCH 8/8] cfq-iosched: charge async IOs to the appropriate blkcg's instead of the root Tejun Heo
2015-06-08 22:29   ` Vivek Goyal
2015-06-09  3:11     ` Tejun Heo
2015-06-09 15:03   ` Jeff Moyer
2015-06-08 19:49 ` [PATCHSET block/for-4.2/writeback] block, cgroup: make cfq charge async IOs to the appropriate blkcgs Jeff Moyer
2015-06-09  3:03   ` Tejun Heo
2015-06-09 15:05     ` Jeff Moyer
2015-06-10  2:49       ` Tejun Heo
2015-06-09  4:21 ` [PATCH 4.5/8] blkcg, cfq-iosched: use GFP_NOWAIT instead of GFP_ATOMIC for non-critical allocations Tejun Heo
2015-06-09 14:27   ` Jeff Moyer
