All of lore.kernel.org
* [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups
@ 2022-03-30 12:42 Jan Kara
  2022-03-30 12:42 ` [PATCH 1/9] bfq: Avoid false marking of bic as stably merged Jan Kara
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: Jan Kara @ 2022-03-30 12:42 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jens Axboe, yukuai (C), Jan Kara

Hello,

With a big delay (I'm sorry for that), here is the sixth version of my patches
to fix use-after-free issues in BFQ when processes with merged queues get moved
to different cgroups. The patches have survived some beating in my test VM, but
so far I have failed to reproduce the original KASAN reports, so testing from
people who can reproduce them is most welcome. Kuai, can you please give these
patches a run in your setup? Thanks a lot for your help with fixing this!

Changes since v5:
* Added handling of the situation when a bio is submitted for a cgroup that
  has already gone through bfq_pd_offline()
* Convert bfq to avoid using the deprecated __bio_blkcg() and thus fix possible
  races where the returned cgroup can change while bfq is working with a request

Changes since v4:
* Even more aggressive splitting of merged bfq queues to avoid problems with
  long merge chains.

Changes since v3:
* Changed handling of bfq group move to handle the case when target of the
  merge has moved.

Changes since v2:
* Improved handling of bfq queue splitting on move between cgroups
* Removed broken change to bfq_put_cooperator()

Changes since v1:
* Added fix for bfq_put_cooperator()
* Added fix to handle move between cgroups in bfq_merge_bio()

								Honza
Previous versions:
Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1
Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@suse.cz # v2
Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@suse.cz # v3
Link: http://lore.kernel.org/r/20220114164215.28972-1-jack@suse.cz # v4
Link: http://lore.kernel.org/r/20220121105503.14069-1-jack@suse.cz # v5

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/9] bfq: Avoid false marking of bic as stably merged
  2022-03-30 12:42 [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups Jan Kara
@ 2022-03-30 12:42 ` Jan Kara
  2022-03-30 12:42 ` [PATCH 2/9] bfq: Avoid merging queues with different parents Jan Kara
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-03-30 12:42 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jens Axboe, yukuai (C), Jan Kara, stable

bfq_setup_cooperator() can mark bic as stably merged even though it
decides to not merge its bfqqs (when bfq_setup_merge() returns NULL).
Make sure to mark bic as stably merged only if we are really going to
merge bfqqs.
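
The control-flow change can be illustrated with a minimal userspace sketch;
the struct layouts and the setup_stable_merge() helper below are simplified,
hypothetical stand-ins, not the real BFQ API:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct bfq_io_cq and struct bfq_queue. */
struct bfq_io_cq { int stably_merged; };
struct bfq_queue { struct bfq_io_cq *bic; };

/*
 * Models the fixed logic: mark bics as stably merged only when the merge
 * target actually exists, i.e. when the (modelled) bfq_setup_merge() did
 * not return NULL.
 */
static struct bfq_queue *setup_stable_merge(struct bfq_io_cq *bic,
					    struct bfq_queue *new_bfqq)
{
	if (new_bfqq) {
		bic->stably_merged = 1;
		if (new_bfqq->bic)
			new_bfqq->bic->stably_merged = 1;
	}
	return new_bfqq;
}
```

With the old code, the first branch set bic->stably_merged even on the
NULL path; the sketch keeps the flag untouched in that case.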

CC: stable@vger.kernel.org
Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 2e0dd68a3cbe..6d122c28086e 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2895,9 +2895,12 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 				struct bfq_queue *new_bfqq =
 					bfq_setup_merge(bfqq, stable_merge_bfqq);
 
-				bic->stably_merged = true;
-				if (new_bfqq && new_bfqq->bic)
-					new_bfqq->bic->stably_merged = true;
+				if (new_bfqq) {
+					bic->stably_merged = true;
+					if (new_bfqq->bic)
+						new_bfqq->bic->stably_merged =
+									true;
+				}
 				return new_bfqq;
 			} else
 				return NULL;
-- 
2.34.1


* [PATCH 2/9] bfq: Avoid merging queues with different parents
  2022-03-30 12:42 [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups Jan Kara
  2022-03-30 12:42 ` [PATCH 1/9] bfq: Avoid false marking of bic as stably merged Jan Kara
@ 2022-03-30 12:42 ` Jan Kara
  2022-03-30 12:42 ` [PATCH 3/9] bfq: Split shared queues on move between cgroups Jan Kara
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-03-30 12:42 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jens Axboe, yukuai (C), Jan Kara, stable

It can happen that the parent of a bfqq changes between the moment we
decide two queues are worth merging (and set bic->stable_merge_bfqq)
and the moment bfq_setup_merge() is called. This can happen e.g. because
the process submitted IO for a different cgroup and thus the bfqq got
reparented. It can even happen that the bfqq we are merging with has a
parent cgroup that is already offline and about to be destroyed, in
which case the merge can lead to use-after-free issues such as:

BUG: KASAN: use-after-free in __bfq_deactivate_entity+0x9cb/0xa50
Read of size 8 at addr ffff88800693c0c0 by task runc:[2:INIT]/10544

CPU: 0 PID: 10544 Comm: runc:[2:INIT] Tainted: G            E     5.15.2-0.g5fb85fd-default #1 openSUSE Tumbleweed (unreleased) f1f3b891c72369aebecd2e43e4641a6358867c70
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Call Trace:
 <IRQ>
 dump_stack_lvl+0x46/0x5a
 print_address_description.constprop.0+0x1f/0x140
 ? __bfq_deactivate_entity+0x9cb/0xa50
 kasan_report.cold+0x7f/0x11b
 ? __bfq_deactivate_entity+0x9cb/0xa50
 __bfq_deactivate_entity+0x9cb/0xa50
 ? update_curr+0x32f/0x5d0
 bfq_deactivate_entity+0xa0/0x1d0
 bfq_del_bfqq_busy+0x28a/0x420
 ? resched_curr+0x116/0x1d0
 ? bfq_requeue_bfqq+0x70/0x70
 ? check_preempt_wakeup+0x52b/0xbc0
 __bfq_bfqq_expire+0x1a2/0x270
 bfq_bfqq_expire+0xd16/0x2160
 ? try_to_wake_up+0x4ee/0x1260
 ? bfq_end_wr_async_queues+0xe0/0xe0
 ? _raw_write_unlock_bh+0x60/0x60
 ? _raw_spin_lock_irq+0x81/0xe0
 bfq_idle_slice_timer+0x109/0x280
 ? bfq_dispatch_request+0x4870/0x4870
 __hrtimer_run_queues+0x37d/0x700
 ? enqueue_hrtimer+0x1b0/0x1b0
 ? kvm_clock_get_cycles+0xd/0x10
 ? ktime_get_update_offsets_now+0x6f/0x280
 hrtimer_interrupt+0x2c8/0x740

Fix the problem by checking that the parent of the two bfqqs we are
merging in bfq_setup_merge() is the same.
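
The guard can be sketched in a few lines of self-contained userspace C; the
types and the merge_parents_match() name are illustrative stand-ins for the
kernel structures, under the assumption that parent identity is what decides
merge validity:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct bfq_entity and struct bfq_queue. */
struct bfq_entity { struct bfq_entity *parent; };
struct bfq_queue { struct bfq_entity entity; };

/*
 * Model of the check added to bfq_setup_merge(): allow the merge only if
 * both queues still share the same parent entity, since parents may have
 * changed since the queues were judged suitable for merging.
 */
static int merge_parents_match(const struct bfq_queue *bfqq,
			       const struct bfq_queue *new_bfqq)
{
	return new_bfqq->entity.parent == bfqq->entity.parent;
}
```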

Link: https://lore.kernel.org/linux-block/20211125172809.GC19572@quack2.suse.cz/
CC: stable@vger.kernel.org
Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 6d122c28086e..7d00b21ebe5d 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2758,6 +2758,14 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
 	if (process_refs == 0 || new_process_refs == 0)
 		return NULL;
 
+	/*
+	 * Make sure merged queues belong to the same parent. Parents could
+	 * have changed since the time we decided the two queues are suitable
+	 * for merging.
+	 */
+	if (new_bfqq->entity.parent != bfqq->entity.parent)
+		return NULL;
+
 	bfq_log_bfqq(bfqq->bfqd, bfqq, "scheduling merge with queue %d",
 		new_bfqq->pid);
 
-- 
2.34.1


* [PATCH 3/9] bfq: Split shared queues on move between cgroups
  2022-03-30 12:42 [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups Jan Kara
  2022-03-30 12:42 ` [PATCH 1/9] bfq: Avoid false marking of bic as stably merged Jan Kara
  2022-03-30 12:42 ` [PATCH 2/9] bfq: Avoid merging queues with different parents Jan Kara
@ 2022-03-30 12:42 ` Jan Kara
  2022-12-08  3:52   ` Yu Kuai
  2022-03-30 12:42 ` [PATCH 4/9] bfq: Update cgroup information before merging bio Jan Kara
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2022-03-30 12:42 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jens Axboe, yukuai (C), Jan Kara, stable

When a bfqq is shared by multiple processes, it can happen that one of
the processes gets moved to a different cgroup (or just starts
submitting IO for a different cgroup). In case that happens we need to
split the merged bfqq, as otherwise we will have IO for multiple cgroups
in one bfqq and will just account IO time to the wrong entities, etc.

Similarly, if the bfqq is scheduled to merge with another bfqq but the
merge didn't happen yet, cancel the merge as it need not be valid
anymore.
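
The chain walk that detects a stale merge can be modelled with a short
userspace sketch; the types and the merge_chain_moved() helper are
hypothetical simplifications of the kernel structures:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct bfq_sched_data and struct bfq_queue. */
struct bfq_sched_data { int dummy; };
struct bfq_queue {
	struct bfq_sched_data *sched_data;
	struct bfq_queue *new_bfqq;	/* next queue in the merge chain */
};

/*
 * Model of the loop added to __bfq_bic_change_cgroup(): returns 1 if any
 * queue in the merge chain starting at sync_bfqq has been reparented away
 * from the expected scheduler data, meaning the merge is no longer valid
 * and the shared queue must be split off.
 */
static int merge_chain_moved(struct bfq_queue *sync_bfqq,
			     const struct bfq_sched_data *sd)
{
	struct bfq_queue *bfqq;

	for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
		if (bfqq->sched_data != sd)
			return 1;
	return 0;
}
```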

CC: stable@vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-cgroup.c  | 36 +++++++++++++++++++++++++++++++++---
 block/bfq-iosched.c |  2 +-
 block/bfq-iosched.h |  1 +
 3 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 420eda2589c0..9352f3cc2377 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -743,9 +743,39 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
 	}
 
 	if (sync_bfqq) {
-		entity = &sync_bfqq->entity;
-		if (entity->sched_data != &bfqg->sched_data)
-			bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
+		if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
+			/* We are the only user of this bfqq, just move it */
+			if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
+				bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
+		} else {
+			struct bfq_queue *bfqq;
+
+			/*
+			 * The queue was merged to a different queue. Check
+			 * that the merge chain still belongs to the same
+			 * cgroup.
+			 */
+			for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
+				if (bfqq->entity.sched_data !=
+				    &bfqg->sched_data)
+					break;
+			if (bfqq) {
+				/*
+				 * Some queue changed cgroup so the merge is
+				 * not valid anymore. We cannot easily just
+				 * cancel the merge (by clearing new_bfqq) as
+				 * there may be other processes using this
+				 * queue and holding refs to all queues below
+				 * sync_bfqq->new_bfqq. Similarly if the merge
+				 * already happened, we need to detach from
+				 * bfqq now so that we cannot merge bio to a
+				 * request from the old cgroup.
+				 */
+				bfq_put_cooperator(sync_bfqq);
+				bfq_release_process_ref(bfqd, sync_bfqq);
+				bic_set_bfqq(bic, NULL, 1);
+			}
+		}
 	}
 
 	return bfqg;
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 7d00b21ebe5d..89fe3f85eb3c 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5315,7 +5315,7 @@ static void bfq_put_stable_ref(struct bfq_queue *bfqq)
 	bfq_put_queue(bfqq);
 }
 
-static void bfq_put_cooperator(struct bfq_queue *bfqq)
+void bfq_put_cooperator(struct bfq_queue *bfqq)
 {
 	struct bfq_queue *__bfqq, *next;
 
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 3b83e3d1c2e5..a56763045d19 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -979,6 +979,7 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd,
 void bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 		     bool compensate, enum bfqq_expiration reason);
 void bfq_put_queue(struct bfq_queue *bfqq);
+void bfq_put_cooperator(struct bfq_queue *bfqq);
 void bfq_end_wr_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
 void bfq_release_process_ref(struct bfq_data *bfqd, struct bfq_queue *bfqq);
 void bfq_schedule_dispatch(struct bfq_data *bfqd);
-- 
2.34.1


* [PATCH 4/9] bfq: Update cgroup information before merging bio
  2022-03-30 12:42 [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups Jan Kara
                   ` (2 preceding siblings ...)
  2022-03-30 12:42 ` [PATCH 3/9] bfq: Split shared queues on move between cgroups Jan Kara
@ 2022-03-30 12:42 ` Jan Kara
  2022-03-30 12:42 ` [PATCH 5/9] bfq: Drop pointless unlock-lock pair Jan Kara
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-03-30 12:42 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jens Axboe, yukuai (C), Jan Kara, stable

When a process is migrated to a different cgroup (or, in the case of
writeback, just starts submitting bios associated with a different
cgroup), bfq_merge_bio() can operate with stale cgroup information in
the bic. Thus the bio can be merged to a request from a different
cgroup, or it can result in merging of bfqqs for different cgroups or
bfqqs of already dead cgroups, causing possible use-after-free issues.
Fix the problem by updating cgroup information in bfq_merge_bio().

CC: stable@vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 89fe3f85eb3c..1fc4d4628fba 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2457,10 +2457,17 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
 
 	spin_lock_irq(&bfqd->lock);
 
-	if (bic)
+	if (bic) {
+		/*
+		 * Make sure cgroup info is uptodate for current process before
+		 * considering the merge.
+		 */
+		bfq_bic_update_cgroup(bic, bio);
+
 		bfqd->bio_bfqq = bic_to_bfqq(bic, op_is_sync(bio->bi_opf));
-	else
+	} else {
 		bfqd->bio_bfqq = NULL;
+	}
 	bfqd->bio_bic = bic;
 
 	ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
-- 
2.34.1


* [PATCH 5/9] bfq: Drop pointless unlock-lock pair
  2022-03-30 12:42 [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups Jan Kara
                   ` (3 preceding siblings ...)
  2022-03-30 12:42 ` [PATCH 4/9] bfq: Update cgroup information before merging bio Jan Kara
@ 2022-03-30 12:42 ` Jan Kara
  2022-03-30 12:42 ` [PATCH 6/9] bfq: Remove pointless bfq_init_rq() calls Jan Kara
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-03-30 12:42 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jens Axboe, yukuai (C), Jan Kara

In bfq_insert_request() we unlock bfqd->lock only to call
trace_block_rq_insert() and then lock bfqd->lock again. This is
pointless, since tracing is disabled if we really care about
performance, and even if the tracepoint is enabled, it is a quick call.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 1fc4d4628fba..19082e14f3c1 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6150,11 +6150,8 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		return;
 	}
 
-	spin_unlock_irq(&bfqd->lock);
-
 	trace_block_rq_insert(rq);
 
-	spin_lock_irq(&bfqd->lock);
 	bfqq = bfq_init_rq(rq);
 	if (!bfqq || at_head) {
 		if (at_head)
-- 
2.34.1


* [PATCH 6/9] bfq: Remove pointless bfq_init_rq() calls
  2022-03-30 12:42 [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups Jan Kara
                   ` (4 preceding siblings ...)
  2022-03-30 12:42 ` [PATCH 5/9] bfq: Drop pointless unlock-lock pair Jan Kara
@ 2022-03-30 12:42 ` Jan Kara
  2022-03-30 12:42 ` [PATCH 7/9] bfq: Track whether bfq_group is still online Jan Kara
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-03-30 12:42 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jens Axboe, yukuai (C), Jan Kara

We call bfq_init_rq() from request merging functions, where the requests
we get should have already gone through bfq_init_rq() during insertion,
and in any case we want to do anything only if the request is already
tracked by BFQ. So replace the calls to bfq_init_rq() with RQ_BFQQ() to
simply skip requests untracked by BFQ. We move the bfq_init_rq() call in
bfq_insert_request() a bit earlier to cover request merging and thus be
able to transfer the FIFO position in case of a merge.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 19082e14f3c1..d7cf930b47bb 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2497,8 +2497,6 @@ static int bfq_request_merge(struct request_queue *q, struct request **req,
 	return ELEVATOR_NO_MERGE;
 }
 
-static struct bfq_queue *bfq_init_rq(struct request *rq);
-
 static void bfq_request_merged(struct request_queue *q, struct request *req,
 			       enum elv_merge type)
 {
@@ -2507,7 +2505,7 @@ static void bfq_request_merged(struct request_queue *q, struct request *req,
 	    blk_rq_pos(req) <
 	    blk_rq_pos(container_of(rb_prev(&req->rb_node),
 				    struct request, rb_node))) {
-		struct bfq_queue *bfqq = bfq_init_rq(req);
+		struct bfq_queue *bfqq = RQ_BFQQ(req);
 		struct bfq_data *bfqd;
 		struct request *prev, *next_rq;
 
@@ -2559,8 +2557,8 @@ static void bfq_request_merged(struct request_queue *q, struct request *req,
 static void bfq_requests_merged(struct request_queue *q, struct request *rq,
 				struct request *next)
 {
-	struct bfq_queue *bfqq = bfq_init_rq(rq),
-		*next_bfqq = bfq_init_rq(next);
+	struct bfq_queue *bfqq = RQ_BFQQ(rq),
+		*next_bfqq = RQ_BFQQ(next);
 
 	if (!bfqq)
 		goto remove;
@@ -6129,6 +6127,8 @@ static inline void bfq_update_insert_stats(struct request_queue *q,
 					   unsigned int cmd_flags) {}
 #endif /* CONFIG_BFQ_CGROUP_DEBUG */
 
+static struct bfq_queue *bfq_init_rq(struct request *rq);
+
 static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 			       bool at_head)
 {
@@ -6144,6 +6144,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		bfqg_stats_update_legacy_io(q, rq);
 #endif
 	spin_lock_irq(&bfqd->lock);
+	bfqq = bfq_init_rq(rq);
 	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
 		spin_unlock_irq(&bfqd->lock);
 		blk_mq_free_requests(&free);
@@ -6152,7 +6153,6 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	trace_block_rq_insert(rq);
 
-	bfqq = bfq_init_rq(rq);
 	if (!bfqq || at_head) {
 		if (at_head)
 			list_add(&rq->queuelist, &bfqd->dispatch);
-- 
2.34.1


* [PATCH 7/9] bfq: Track whether bfq_group is still online
  2022-03-30 12:42 [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups Jan Kara
                   ` (5 preceding siblings ...)
  2022-03-30 12:42 ` [PATCH 6/9] bfq: Remove pointless bfq_init_rq() calls Jan Kara
@ 2022-03-30 12:42 ` Jan Kara
  2022-03-30 12:42 ` [PATCH 8/9] bfq: Get rid of __bio_blkcg() usage Jan Kara
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-03-30 12:42 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jens Axboe, yukuai (C), Jan Kara

Track whether a bfq_group is still online. We cannot rely on
blkcg_gq->online, because that gets cleared only after all policies are
offlined, and we need something that is already updated under bfqd->lock
when we are cleaning up our bfq_group, so that we can guarantee that
when we see an online bfq_group, it will stay online while we are
holding bfqd->lock.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-cgroup.c  | 3 ++-
 block/bfq-iosched.h | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 9352f3cc2377..879380c2bc7e 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -557,6 +557,7 @@ static void bfq_pd_init(struct blkg_policy_data *pd)
 				   */
 	bfqg->bfqd = bfqd;
 	bfqg->active_entities = 0;
+	bfqg->online = true;
 	bfqg->rq_pos_tree = RB_ROOT;
 }
 
@@ -603,7 +604,6 @@ struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd,
 	struct bfq_entity *entity;
 
 	bfqg = bfq_lookup_bfqg(bfqd, blkcg);
-
 	if (unlikely(!bfqg))
 		return NULL;
 
@@ -979,6 +979,7 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
 
 put_async_queues:
 	bfq_put_async_queues(bfqd, bfqg);
+	bfqg->online = false;
 
 	spin_unlock_irqrestore(&bfqd->lock, flags);
 	/*
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index a56763045d19..4664e2f3e828 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -928,6 +928,8 @@ struct bfq_group {
 
 	/* reference counter (see comments in bfq_bic_update_cgroup) */
 	int ref;
+	/* Is bfq_group still online? */
+	bool online;
 
 	struct bfq_entity entity;
 	struct bfq_sched_data sched_data;
-- 
2.34.1


* [PATCH 8/9] bfq: Get rid of __bio_blkcg() usage
  2022-03-30 12:42 [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups Jan Kara
                   ` (6 preceding siblings ...)
  2022-03-30 12:42 ` [PATCH 7/9] bfq: Track whether bfq_group is still online Jan Kara
@ 2022-03-30 12:42 ` Jan Kara
  2022-03-30 14:12   ` Christoph Hellwig
  2022-03-30 12:42 ` [PATCH 9/9] bfq: Make sure bfqg for which we are queueing requests is online Jan Kara
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2022-03-30 12:42 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jens Axboe, yukuai (C), Jan Kara

BFQ's usage of __bio_blkcg() is a relic from the past. Furthermore, if a
bio is not associated with any blkcg, the usage of __bio_blkcg() in BFQ
is prone to races with the task being migrated between cgroups, as
__bio_blkcg() calls at different places could return different blkcgs.

Convert BFQ to the new situation where bio->bi_blkg is initialized in
bio_set_dev() and thus practically always valid. This allows us to save
a blkcg_gq lookup and noticeably simplify the code.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-cgroup.c  | 63 +++++++++++++++++----------------------------
 block/bfq-iosched.c | 11 +-------
 block/bfq-iosched.h |  3 +--
 3 files changed, 25 insertions(+), 52 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 879380c2bc7e..32d2c2a47480 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -586,27 +586,11 @@ static void bfq_group_set_parent(struct bfq_group *bfqg,
 	entity->sched_data = &parent->sched_data;
 }
 
-static struct bfq_group *bfq_lookup_bfqg(struct bfq_data *bfqd,
-					 struct blkcg *blkcg)
+static void bfq_link_bfqg(struct bfq_data *bfqd, struct bfq_group *bfqg)
 {
-	struct blkcg_gq *blkg;
-
-	blkg = blkg_lookup(blkcg, bfqd->queue);
-	if (likely(blkg))
-		return blkg_to_bfqg(blkg);
-	return NULL;
-}
-
-struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd,
-				     struct blkcg *blkcg)
-{
-	struct bfq_group *bfqg, *parent;
+	struct bfq_group *parent;
 	struct bfq_entity *entity;
 
-	bfqg = bfq_lookup_bfqg(bfqd, blkcg);
-	if (unlikely(!bfqg))
-		return NULL;
-
 	/*
 	 * Update chain of bfq_groups as we might be handling a leaf group
 	 * which, along with some of its relatives, has not been hooked yet
@@ -623,8 +607,15 @@ struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd,
 			bfq_group_set_parent(curr_bfqg, parent);
 		}
 	}
+}
 
-	return bfqg;
+struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
+{
+	struct blkcg_gq *blkg = bio->bi_blkg;
+
+	if (!blkg)
+		return bfqd->root_group;
+	return blkg_to_bfqg(blkg);
 }
 
 /**
@@ -714,25 +705,15 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
  * Move bic to blkcg, assuming that bfqd->lock is held; which makes
  * sure that the reference to cgroup is valid across the call (see
  * comments in bfq_bic_update_cgroup on this issue)
- *
- * NOTE: an alternative approach might have been to store the current
- * cgroup in bfqq and getting a reference to it, reducing the lookup
- * time here, at the price of slightly more complex code.
  */
-static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
-						struct bfq_io_cq *bic,
-						struct blkcg *blkcg)
+static void *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
+				     struct bfq_io_cq *bic,
+				     struct bfq_group *bfqg)
 {
 	struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0);
 	struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1);
-	struct bfq_group *bfqg;
 	struct bfq_entity *entity;
 
-	bfqg = bfq_find_set_group(bfqd, blkcg);
-
-	if (unlikely(!bfqg))
-		bfqg = bfqd->root_group;
-
 	if (async_bfqq) {
 		entity = &async_bfqq->entity;
 
@@ -784,20 +765,24 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
 void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
 {
 	struct bfq_data *bfqd = bic_to_bfqd(bic);
-	struct bfq_group *bfqg = NULL;
+	struct bfq_group *bfqg = bfq_bio_bfqg(bfqd, bio);
 	uint64_t serial_nr;
 
-	rcu_read_lock();
-	serial_nr = __bio_blkcg(bio)->css.serial_nr;
+	serial_nr = bfqg_to_blkg(bfqg)->blkcg->css.serial_nr;
 
 	/*
 	 * Check whether blkcg has changed.  The condition may trigger
 	 * spuriously on a newly created cic but there's no harm.
 	 */
 	if (unlikely(!bfqd) || likely(bic->blkcg_serial_nr == serial_nr))
-		goto out;
+		return;
 
-	bfqg = __bfq_bic_change_cgroup(bfqd, bic, __bio_blkcg(bio));
+	/*
+	 * New cgroup for this process. Make sure it is linked to bfq internal
+	 * cgroup hierarchy.
+	 */
+	bfq_link_bfqg(bfqd, bfqg);
+	__bfq_bic_change_cgroup(bfqd, bic, bfqg);
 	/*
 	 * Update blkg_path for bfq_log_* functions. We cache this
 	 * path, and update it here, for the following
@@ -850,8 +835,6 @@ void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
 	 */
 	blkg_path(bfqg_to_blkg(bfqg), bfqg->blkg_path, sizeof(bfqg->blkg_path));
 	bic->blkcg_serial_nr = serial_nr;
-out:
-	rcu_read_unlock();
 }
 
 /**
@@ -1469,7 +1452,7 @@ void bfq_end_wr_async(struct bfq_data *bfqd)
 	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
 }
 
-struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd, struct blkcg *blkcg)
+struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
 {
 	return bfqd->root_group;
 }
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index d7cf930b47bb..e47c75f1fa0f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5726,14 +5726,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
 	struct bfq_queue *bfqq;
 	struct bfq_group *bfqg;
 
-	rcu_read_lock();
-
-	bfqg = bfq_find_set_group(bfqd, __bio_blkcg(bio));
-	if (!bfqg) {
-		bfqq = &bfqd->oom_bfqq;
-		goto out;
-	}
-
+	bfqg = bfq_bio_bfqg(bfqd, bio);
 	if (!is_sync) {
 		async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
 						  ioprio);
@@ -5779,8 +5772,6 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
 
 	if (bfqq != &bfqd->oom_bfqq && is_sync && !respawn)
 		bfqq = bfq_do_or_sched_stable_merge(bfqd, bfqq, bic);
-
-	rcu_read_unlock();
 	return bfqq;
 }
 
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 4664e2f3e828..978ef5d6fe6a 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -1009,8 +1009,7 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 void bfq_init_entity(struct bfq_entity *entity, struct bfq_group *bfqg);
 void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio);
 void bfq_end_wr_async(struct bfq_data *bfqd);
-struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd,
-				     struct blkcg *blkcg);
+struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio);
 struct blkcg_gq *bfqg_to_blkg(struct bfq_group *bfqg);
 struct bfq_group *bfqq_group(struct bfq_queue *bfqq);
 struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node);
-- 
2.34.1


* [PATCH 9/9] bfq: Make sure bfqg for which we are queueing requests is online
  2022-03-30 12:42 [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups Jan Kara
                   ` (7 preceding siblings ...)
  2022-03-30 12:42 ` [PATCH 8/9] bfq: Get rid of __bio_blkcg() usage Jan Kara
@ 2022-03-30 12:42 ` Jan Kara
  2022-03-31  9:31 ` [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups yukuai (C)
  2022-04-01  3:40 ` yukuai (C)
  10 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-03-30 12:42 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jens Axboe, yukuai (C), Jan Kara, stable

Bios queued into the BFQ IO scheduler can be associated with a cgroup
that was already offlined. This may then cause insertion of this
bfq_group into a service tree. But this bfq_group will get freed as soon
as the last bio associated with it is completed, leading to
use-after-free issues for service tree users. Fix the problem by making
sure we always operate on an online bfq_group. If the bfq_group
associated with the bio is not online, we pick the first online parent.
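
The parent walk can be modelled as a short self-contained sketch; the
blkcg_gq layout and the first_online_group() name are simplified,
hypothetical stand-ins for the kernel structures:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the blkcg_gq hierarchy; online models
 * bfqg->online for the group attached to each blkg. */
struct blkcg_gq {
	int online;
	struct blkcg_gq *parent;
};

/*
 * Model of the fallback added to bfq_bio_bfqg(): climb the parent chain
 * of the bio's blkg until an online group is found; if none is, fall
 * back to the root group, which is always online.
 */
static struct blkcg_gq *first_online_group(struct blkcg_gq *blkg,
					   struct blkcg_gq *root)
{
	while (blkg) {
		if (blkg->online)
			return blkg;
		blkg = blkg->parent;
	}
	return root;
}
```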

CC: stable@vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-cgroup.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 32d2c2a47480..09574af83566 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -612,10 +612,19 @@ static void bfq_link_bfqg(struct bfq_data *bfqd, struct bfq_group *bfqg)
 struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
 {
 	struct blkcg_gq *blkg = bio->bi_blkg;
+	struct bfq_group *bfqg;
 
-	if (!blkg)
-		return bfqd->root_group;
-	return blkg_to_bfqg(blkg);
+	while (blkg) {
+		bfqg = blkg_to_bfqg(blkg);
+		if (bfqg->online) {
+			bio_associate_blkg_from_css(bio, &blkg->blkcg->css);
+			return bfqg;
+		}
+		blkg = blkg->parent;
+	}
+	bio_associate_blkg_from_css(bio,
+				&bfqg_to_blkg(bfqd->root_group)->blkcg->css);
+	return bfqd->root_group;
 }
 
 /**
-- 
2.34.1


* Re: [PATCH 8/9] bfq: Get rid of __bio_blkcg() usage
  2022-03-30 12:42 ` [PATCH 8/9] bfq: Get rid of __bio_blkcg() usage Jan Kara
@ 2022-03-30 14:12   ` Christoph Hellwig
  2022-03-30 15:02     ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2022-03-30 14:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-block, Paolo Valente, Jens Axboe, yukuai (C)

On Wed, Mar 30, 2022 at 02:42:51PM +0200, Jan Kara wrote:
> Convert BFQ to the new situation where bio->bi_blkg is initialized in
> bio_set_dev() and thus practically always valid. This allows us to save
> blkcg_gq lookup and noticeably simplify the code.

Is there any case at all where it is not valid these days?
I can't think of any, even though we have plenty of boilerplate code
trying to handle that case.

* Re: [PATCH 8/9] bfq: Get rid of __bio_blkcg() usage
  2022-03-30 14:12   ` Christoph Hellwig
@ 2022-03-30 15:02     ` Jan Kara
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-03-30 15:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-block, Paolo Valente, Jens Axboe, yukuai (C)

On Wed 30-03-22 07:12:58, Christoph Hellwig wrote:
> On Wed, Mar 30, 2022 at 02:42:51PM +0200, Jan Kara wrote:
> > Convert BFQ to the new situation where bio->bi_blkg is initialized in
> > bio_set_dev() and thus practically always valid. This allows us to save
> > blkcg_gq lookup and noticeably simplify the code.
> 
> Is there any case at all where it is not valid these days?
> I can't think of any even if we have plenty of boilerplate code trying
> to handle that case.

So I didn't find any either, but I was not 100% sure, since
blkg_tryget_closest() seems like it *could* return NULL in some rare
error cases. If we either modify blkg_tryget_closest() or handle that
case in bio_associate_blkg_from_css() by using root_blkg, I assume we
can remove all the boilerplate code.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups
  2022-03-30 12:42 [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups Jan Kara
                   ` (8 preceding siblings ...)
  2022-03-30 12:42 ` [PATCH 9/9] bfq: Make sure bfqg for which we are queueing requests is online Jan Kara
@ 2022-03-31  9:31 ` yukuai (C)
  2022-04-01  3:40 ` yukuai (C)
  10 siblings, 0 replies; 20+ messages in thread
From: yukuai (C) @ 2022-03-31  9:31 UTC (permalink / raw)
  To: Jan Kara, linux-block; +Cc: Paolo Valente, Jens Axboe

On 2022/03/30 20:42, Jan Kara wrote:
> Hello,
> 
> with a big delay (I'm sorry for that) here is the sixth version of my patches
> to fix use-after-free issues in BFQ when processes with merged queues get moved
> to different cgroups. The patches have survived some beating in my test VM, but
> so far I fail to reproduce the original KASAN reports so testing from people
> who can reproduce them is most welcome. Kuai, can you please give these patches
> a run in your setup? Thanks a lot for your help with fixing this!

Thanks for the patch. I'll run the reproducer and get back to you soon.

Kuai
> 
> Changes since v5:
> * Added handling of situation when bio is submitted for a cgroup that has
>    already went through bfq_pd_offline()
> * Convert bfq to avoid using deprecated __bio_blkcg() and thus fix possible
>    races when returned cgroup can change while bfq is working with a request
> 
> Changes since v4:
> * Even more aggressive splitting of merged bfq queues to avoid problems with
>    long merge chains.
> 
> Changes since v3:
> * Changed handling of bfq group move to handle the case when target of the
>    merge has moved.
> 
> Changes since v2:
> * Improved handling of bfq queue splitting on move between cgroups
> * Removed broken change to bfq_put_cooperator()
> 
> Changes since v1:
> * Added fix for bfq_put_cooperator()
> * Added fix to handle move between cgroups in bfq_merge_bio()
> 
> 								Honza
> Previous versions:
> Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1
> Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@suse.cz # v2
> Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@suse.cz # v3
> Link: http://lore.kernel.org/r/20220114164215.28972-1-jack@suse.cz # v4
> Link: http://lore.kernel.org/r/20220121105503.14069-1-jack@suse.cz # v5
> .
> 


* Re: [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups
  2022-03-30 12:42 [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups Jan Kara
                   ` (9 preceding siblings ...)
  2022-03-31  9:31 ` [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups yukuai (C)
@ 2022-04-01  3:40 ` yukuai (C)
  2022-04-01  9:26   ` Jan Kara
  10 siblings, 1 reply; 20+ messages in thread
From: yukuai (C) @ 2022-04-01  3:40 UTC (permalink / raw)
  To: Jan Kara, linux-block; +Cc: Paolo Valente, Jens Axboe

On 2022/03/30 20:42, Jan Kara wrote:
> Hello,
> 
> with a big delay (I'm sorry for that) here is the sixth version of my patches
> to fix use-after-free issues in BFQ when processes with merged queues get moved
> to different cgroups. The patches have survived some beating in my test VM, but
> so far I fail to reproduce the original KASAN reports so testing from people
> who can reproduce them is most welcome. Kuai, can you please give these patches
> a run in your setup? Thanks a lot for your help with fixing this!
> 
Hi, Jan

I ran the reproducer for more than 12 hours already, and the UAF is not
reproduced anymore. Before this patchset the problem could be reproduced
within an hour.

Thanks,
Kuai
> Changes since v5:
> * Added handling of situation when bio is submitted for a cgroup that has
>    already went through bfq_pd_offline()
> * Convert bfq to avoid using deprecated __bio_blkcg() and thus fix possible
>    races when returned cgroup can change while bfq is working with a request
> 
> Changes since v4:
> * Even more aggressive splitting of merged bfq queues to avoid problems with
>    long merge chains.
> 
> Changes since v3:
> * Changed handling of bfq group move to handle the case when target of the
>    merge has moved.
> 
> Changes since v2:
> * Improved handling of bfq queue splitting on move between cgroups
> * Removed broken change to bfq_put_cooperator()
> 
> Changes since v1:
> * Added fix for bfq_put_cooperator()
> * Added fix to handle move between cgroups in bfq_merge_bio()
> 
> 								Honza
> Previous versions:
> Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1
> Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@suse.cz # v2
> Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@suse.cz # v3
> Link: http://lore.kernel.org/r/20220114164215.28972-1-jack@suse.cz # v4
> Link: http://lore.kernel.org/r/20220121105503.14069-1-jack@suse.cz # v5
> .
> 


* Re: [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups
  2022-04-01  3:40 ` yukuai (C)
@ 2022-04-01  9:26   ` Jan Kara
  2022-04-01  9:40     ` yukuai (C)
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2022-04-01  9:26 UTC (permalink / raw)
  To: yukuai (C); +Cc: Jan Kara, linux-block, Paolo Valente, Jens Axboe

On Fri 01-04-22 11:40:39, yukuai (C) wrote:
> On 2022/03/30 20:42, Jan Kara wrote:
> > Hello,
> > 
> > with a big delay (I'm sorry for that) here is the sixth version of my patches
> > to fix use-after-free issues in BFQ when processes with merged queues get moved
> > to different cgroups. The patches have survived some beating in my test VM, but
> > so far I fail to reproduce the original KASAN reports so testing from people
> > who can reproduce them is most welcome. Kuai, can you please give these patches
> > a run in your setup? Thanks a lot for your help with fixing this!
> > 
> Hi, Jan
> 
> I ran the reproducer for more than 12 hours already, and the UAF is not
> reproduced anymore. Before this patchset the problem could be reproduced
> within an hour.

Great to hear that! Thanks a lot for testing and help with analysis! Can I
add your Tested-by tag?

									Honza

> > Changes since v5:
> > * Added handling of situation when bio is submitted for a cgroup that has
> >    already went through bfq_pd_offline()
> > * Convert bfq to avoid using deprecated __bio_blkcg() and thus fix possible
> >    races when returned cgroup can change while bfq is working with a request
> > 
> > Changes since v4:
> > * Even more aggressive splitting of merged bfq queues to avoid problems with
> >    long merge chains.
> > 
> > Changes since v3:
> > * Changed handling of bfq group move to handle the case when target of the
> >    merge has moved.
> > 
> > Changes since v2:
> > * Improved handling of bfq queue splitting on move between cgroups
> > * Removed broken change to bfq_put_cooperator()
> > 
> > Changes since v1:
> > * Added fix for bfq_put_cooperator()
> > * Added fix to handle move between cgroups in bfq_merge_bio()
> > 
> > 								Honza
> > Previous versions:
> > Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1
> > Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@suse.cz # v2
> > Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@suse.cz # v3
> > Link: http://lore.kernel.org/r/20220114164215.28972-1-jack@suse.cz # v4
> > Link: http://lore.kernel.org/r/20220121105503.14069-1-jack@suse.cz # v5
> > .
> > 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 0/9 v6] bfq: Avoid use-after-free when moving processes between cgroups
  2022-04-01  9:26   ` Jan Kara
@ 2022-04-01  9:40     ` yukuai (C)
  0 siblings, 0 replies; 20+ messages in thread
From: yukuai (C) @ 2022-04-01  9:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-block, Paolo Valente, Jens Axboe

On 2022/04/01 17:26, Jan Kara wrote:
> On Fri 01-04-22 11:40:39, yukuai (C) wrote:
>> On 2022/03/30 20:42, Jan Kara wrote:
>>> Hello,
>>>
>>> with a big delay (I'm sorry for that) here is the sixth version of my patches
>>> to fix use-after-free issues in BFQ when processes with merged queues get moved
>>> to different cgroups. The patches have survived some beating in my test VM, but
>>> so far I fail to reproduce the original KASAN reports so testing from people
>>> who can reproduce them is most welcome. Kuai, can you please give these patches
>>> a run in your setup? Thanks a lot for your help with fixing this!
>>>
>> Hi, Jan
>>
>> I ran the reproducer for more than 12 hours already, and the UAF is not
>> reproduced anymore. Before this patchset the problem could be reproduced
>> within an hour.
> 
> Great to hear that! Thanks a lot for testing and help with analysis! Can I
> add your Tested-by tag?

Of course.

Cheers for addressing this problem
Kuai
> 
> 									Honza
> 
>>> Changes since v5:
>>> * Added handling of situation when bio is submitted for a cgroup that has
>>>     already went through bfq_pd_offline()
>>> * Convert bfq to avoid using deprecated __bio_blkcg() and thus fix possible
>>>     races when returned cgroup can change while bfq is working with a request
>>>
>>> Changes since v4:
>>> * Even more aggressive splitting of merged bfq queues to avoid problems with
>>>     long merge chains.
>>>
>>> Changes since v3:
>>> * Changed handling of bfq group move to handle the case when target of the
>>>     merge has moved.
>>>
>>> Changes since v2:
>>> * Improved handling of bfq queue splitting on move between cgroups
>>> * Removed broken change to bfq_put_cooperator()
>>>
>>> Changes since v1:
>>> * Added fix for bfq_put_cooperator()
>>> * Added fix to handle move between cgroups in bfq_merge_bio()
>>>
>>> 								Honza
>>> Previous versions:
>>> Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1
>>> Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@suse.cz # v2
>>> Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@suse.cz # v3
>>> Link: http://lore.kernel.org/r/20220114164215.28972-1-jack@suse.cz # v4
>>> Link: http://lore.kernel.org/r/20220121105503.14069-1-jack@suse.cz # v5
>>> .
>>>


* Re: [PATCH 3/9] bfq: Split shared queues on move between cgroups
  2022-03-30 12:42 ` [PATCH 3/9] bfq: Split shared queues on move between cgroups Jan Kara
@ 2022-12-08  3:52   ` Yu Kuai
  2022-12-08  9:37     ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Yu Kuai @ 2022-12-08  3:52 UTC (permalink / raw)
  To: Jan Kara, linux-block; +Cc: Paolo Valente, Jens Axboe, stable, yukuai (C)

Hi, Jan!

On 2022/03/30 20:42, Jan Kara wrote:
> When bfqq is shared by multiple processes it can happen that one of the
> processes gets moved to a different cgroup (or just starts submitting IO
> for different cgroup). In case that happens we need to split the merged
> bfqq as otherwise we will have IO for multiple cgroups in one bfqq and
> we will just account IO time to wrong entities etc.
> 
> Similarly if the bfqq is scheduled to merge with another bfqq but the
> merge didn't happen yet, cancel the merge as it need not be valid
> anymore.
> 
> CC: stable@vger.kernel.org
> Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>   block/bfq-cgroup.c  | 36 +++++++++++++++++++++++++++++++++---
>   block/bfq-iosched.c |  2 +-
>   block/bfq-iosched.h |  1 +
>   3 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index 420eda2589c0..9352f3cc2377 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -743,9 +743,39 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
>   	}
>   
>   	if (sync_bfqq) {
> -		entity = &sync_bfqq->entity;
> -		if (entity->sched_data != &bfqg->sched_data)
> -			bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
> +		if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
> +			/* We are the only user of this bfqq, just move it */
> +			if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
> +				bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
> +		} else {
> +			struct bfq_queue *bfqq;
> +
> +			/*
> +			 * The queue was merged to a different queue. Check
> +			 * that the merge chain still belongs to the same
> +			 * cgroup.
> +			 */
> +			for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
> +				if (bfqq->entity.sched_data !=
> +				    &bfqg->sched_data)
> +					break;
> +			if (bfqq) {
> +				/*
> +				 * Some queue changed cgroup so the merge is
> +				 * not valid anymore. We cannot easily just
> +				 * cancel the merge (by clearing new_bfqq) as
> +				 * there may be other processes using this
> +				 * queue and holding refs to all queues below
> +				 * sync_bfqq->new_bfqq. Similarly if the merge
> +				 * already happened, we need to detach from
> +				 * bfqq now so that we cannot merge bio to a
> +				 * request from the old cgroup.
> +				 */
> +				bfq_put_cooperator(sync_bfqq);
> +				bfq_release_process_ref(bfqd, sync_bfqq);
> +				bic_set_bfqq(bic, NULL, 1);
> +			}
> +		}
>   	}
Our test found a use-after-free while accessing bfqq->bic->bfqq[] ([1]),
and I really suspect the above change.

1) init state, 2 thread, 2 bic, and 2 bfqq

bic1->bfqq = bfqq1
bfqq1->bic = bic1
bic2->bfqq = bfqq2
bfqq2->bic = bic2

2) bfqq1 and bfqq2 is merged
bic1->bfqq = bfqq2
bfqq1->bic = NULL
bic2->bfqq = bfqq2
bfqq2->bic = NULL

3) bfqq1 issue a io, before such io reaches bfq, move t1 to a new
cgroup, and issues a new io. If the new io reaches bfq first:
bic1->bfqq = NULL
bfqq1->bic = NULL
bic2->bfqq = bfqq2
bfqq2->bic = NULL

4) while handling the new io, a new bfqq is created
bic1->bfqq = bfqq3
bfqq3->bic = bic1
bic2->bfqq = bfqq2
bfqq2->bic = NULL

5) the old io reaches bfq, corresponding bic is got from rq:
bic1->bfqq = bfqq3
bfqq3->bic = bic1
bic2->bfqq = bfqq2
bfqq2->bic = bic1

Here comes the problematic state, where two different bfqq point to the
same bic. Furthermore, if t1 exits and all the io from t1 is done,
bic1 can be freed while bfqq2->bic still points to bic1.

I'm not quite sure how the bic works, but I think it makes sense to
make sure that bfqq->bic never points to a bic whose corresponding
thread doesn't match. Something like below:

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a72304c728fc..22bbc9d45ddf 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6757,7 +6757,8 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
          * addition, if the queue has also just been split, we have to
          * resume its state.
          */
-       if (likely(bfqq != &bfqd->oom_bfqq) && bfqq_process_refs(bfqq) == 1) {
+       if (likely(bfqq != &bfqd->oom_bfqq) && bfqq_process_refs(bfqq) == 1 &&
+           bfqq->org_bic == bic) {
                 bfqq->bic = bic;

[1]
[605854.098478] 
==================================================================
[605854.099367] BUG: KASAN: use-after-free in bfq_select_queue+0x378/0xa30
[605854.100133] Read of size 8 at addr ffff88810efb42d8 by task fsstress/2318352

[605854.101213] CPU: 6 PID: 2318352 Comm: fsstress Kdump: loaded Not tainted 5.10.0-60.18.0.50.h602.kasan.eulerosv2r11.x86_64 #1
[605854.101218] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20220320_160524-szxrtosci10000 04/01/2014
[605854.101221] Call Trace:
[605854.101231]  dump_stack+0x9c/0xd3
[605854.101240]  print_address_description.constprop.0+0x19/0x170
[605854.101247]  ? bfq_select_queue+0x378/0xa30
[605854.101254]  __kasan_report.cold+0x6c/0x84
[605854.101261]  ? bfq_select_queue+0x378/0xa30
[605854.101267]  kasan_report+0x3a/0x50
[605854.101273]  bfq_select_queue+0x378/0xa30
[605854.101279]  ? bfq_bfqq_expire+0x6c0/0x6c0
[605854.101286]  ? bfq_mark_bfqq_busy+0x1f/0x30
[605854.101293]  ? _raw_spin_lock_irq+0x7b/0xd0
[605854.101299]  __bfq_dispatch_request+0x1c4/0x220
[605854.101306]  bfq_dispatch_request+0xe8/0x130
[605854.101313]  __blk_mq_do_dispatch_sched+0x3f4/0x560
[605854.101320]  ? blk_mq_sched_mark_restart_hctx+0x50/0x50
[605854.101326]  ? bfq_init_rq+0x128/0x940
[605854.101333]  ? pvclock_clocksource_read+0xf6/0x1d0
[605854.101339]  blk_mq_do_dispatch_sched+0x62/0xb0
[605854.101345]  __blk_mq_sched_dispatch_requests+0x215/0x2a0
[605854.101351]  ? blk_mq_do_dispatch_ctx+0x3a0/0x3a0
[605854.101358]  ? bfq_insert_request+0x193/0x3f0
[605854.101364]  blk_mq_sched_dispatch_requests+0x8f/0xd0
[605854.101370]  __blk_mq_run_hw_queue+0x98/0x180
[605854.101377]  __blk_mq_delay_run_hw_queue+0x22b/0x240
[605854.101383]  ? bfq_asymmetric_scenario+0x160/0x160
[605854.101389]  blk_mq_run_hw_queue+0xe3/0x190
[605854.101396]  ? bfq_insert_request+0x3f0/0x3f0
[605854.101401]  blk_mq_sched_insert_requests+0x107/0x200
[605854.101408]  blk_mq_flush_plug_list+0x26e/0x3c0
[605854.101415]  ? blk_mq_insert_requests+0x250/0x250
[605854.101422]  ? blk_check_plugged+0x190/0x190
[605854.101429]  blk_finish_plug+0x63/0x90
[605854.101436]  __iomap_dio_rw+0x7b5/0x910
[605854.101443]  ? iomap_dio_actor+0x150/0x150
[605854.101450]  ? userns_put+0x70/0x70
[605854.101456]  ? userns_put+0x70/0x70
[605854.101463]  ? avc_has_perm_noaudit+0x1d0/0x1d0
[605854.101468]  ? down_read+0xd5/0x1a0
[605854.101474]  ? down_read_killable+0x1b0/0x1b0
[605854.101479]  ? from_kgid+0xa0/0xa0
[605854.101485]  iomap_dio_rw+0x36/0x80
[605854.101544]  ext4_dio_read_iter+0x146/0x190 [ext4]
[605854.101602]  ext4_file_read_iter+0x1e2/0x230 [ext4]
[605854.101609]  new_sync_read+0x29f/0x400
[605854.101615]  ? default_llseek+0x160/0x160
[605854.101621]  ? find_isec_ns+0x8d/0x2e0
[605854.101627]  ? avc_policy_seqno+0x27/0x40
[605854.101633]  ? selinux_file_permission+0x34/0x180
[605854.101641]  ? security_file_permission+0x135/0x2b0
[605854.101648]  vfs_read+0x24e/0x2d0
[605854.101654]  ksys_read+0xd5/0x1b0
[605854.101661]  ? __ia32_sys_pread64+0x160/0x160
[605854.101667]  ? __audit_syscall_entry+0x1cc/0x220
[605854.101675]  do_syscall_64+0x33/0x40
[605854.101681]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[605854.101686] RIP: 0033:0x7ff05f96fe62
[605854.101705] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 12 04 0c 00 e8 b5 fe 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[605854.101709] RSP: 002b:00007fffd30c0ff8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[605854.101717] RAX: ffffffffffffffda RBX: 00000000000000a5 RCX: 00007ff05f96fe62
[605854.101721] RDX: 000000000001d000 RSI: 0000000001ffc000 RDI: 0000000000000003
[605854.101724] RBP: 0000000000000003 R08: 0000000002019000 R09: 0000000000000000
[605854.101729] R10: 00007ff05fa65290 R11: 0000000000000246 R12: 0000000000131800
[605854.101732] R13: 000000000001d000 R14: 0000000000000000 R15: 0000000001ffc000

[605854.102003] Allocated by task 2318348:
[605854.102489]  kasan_save_stack+0x1b/0x40
[605854.102495]  __kasan_kmalloc.constprop.0+0xb5/0xe0
[605854.102501]  kmem_cache_alloc_node+0x15d/0x480
[605854.102505]  ioc_create_icq+0x68/0x2e0
[605854.102510]  blk_mq_sched_assign_ioc+0xbc/0xd0
[605854.102516]  blk_mq_rq_ctx_init+0x4b0/0x600
[605854.102521]  __blk_mq_alloc_request+0x21f/0x2e0
[605854.102526]  blk_mq_submit_bio+0x27a/0xd60
[605854.102532]  __submit_bio_noacct_mq+0x10b/0x270
[605854.102538]  submit_bio_noacct+0x13d/0x150
[605854.102543]  submit_bio+0xbf/0x280
[605854.102548]  iomap_dio_submit_bio+0x155/0x180
[605854.102553]  iomap_dio_bio_actor+0x2f0/0x770
[605854.102557]  iomap_dio_actor+0xd9/0x150
[605854.102562]  iomap_apply+0x1d2/0x4f0
[605854.102567]  __iomap_dio_rw+0x43a/0x910
[605854.102572]  iomap_dio_rw+0x36/0x80
[605854.102628]  ext4_dio_write_iter+0x46f/0x730 [ext4]
[605854.102684]  ext4_file_write_iter+0xd8/0x100 [ext4]
[605854.102689]  new_sync_write+0x2ac/0x3a0
[605854.102701]  vfs_write+0x365/0x430
[605854.102707]  ksys_write+0xd5/0x1b0
[605854.102712]  do_syscall_64+0x33/0x40
[605854.102718]  entry_SYSCALL_64_after_hwframe+0x61/0xc6

[605854.102984] Freed by task 2320929:
[605854.103434]  kasan_save_stack+0x1b/0x40
[605854.103439]  kasan_set_track+0x1c/0x30
[605854.103444]  kasan_set_free_info+0x20/0x40
[605854.103449]  __kasan_slab_free+0x151/0x180
[605854.103454]  kmem_cache_free+0x9e/0x540
[605854.103461]  rcu_do_batch+0x292/0x700
[605854.103465]  rcu_core+0x270/0x2d0
[605854.103471]  __do_softirq+0xfd/0x402

[605854.103741] Last call_rcu():
[605854.104141]  kasan_save_stack+0x1b/0x40
[605854.104146]  kasan_record_aux_stack+0xa8/0xf0
[605854.104150]  __call_rcu+0xa4/0x3a0
[605854.104155]  ioc_release_fn+0x45/0x120
[605854.104161]  process_one_work+0x3c5/0x730
[605854.104167]  worker_thread+0x93/0x650
[605854.104172]  kthread+0x1ba/0x210
[605854.104178]  ret_from_fork+0x22/0x30

[605854.104440] Second to last call_rcu():
[605854.104930]  kasan_save_stack+0x1b/0x40
[605854.104935]  kasan_record_aux_stack+0xa8/0xf0
[605854.104939]  __call_rcu+0xa4/0x3a0
[605854.104944]  ioc_release_fn+0x45/0x120
[605854.104949]  process_one_work+0x3c5/0x730
[605854.104955]  worker_thread+0x93/0x650
[605854.104960]  kthread+0x1ba/0x210
[605854.104965]  ret_from_fork+0x22/0x30

[605854.105229] The buggy address belongs to the object at ffff88810efb42a0
                  which belongs to the cache bfq_io_cq of size 160
[605854.106659] The buggy address is located 56 bytes inside of
                  160-byte region [ffff88810efb42a0, ffff88810efb4340)
[605854.108021] The buggy address belongs to the page:
[605854.108608] page:00000000a519c14c refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88810efb4000 pfn:0x10efb4
[605854.108612] head:00000000a519c14c order:1 compound_mapcount:0
[605854.108620] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[605854.108628] raw: 0017ffffc0010200 0000000000000000 dead000000000122 ffff8881407c8600
[605854.108635] raw: ffff88810efb4000 000000008024001a 00000001ffffffff 0000000000000000
[605854.108639] page dumped because: kasan: bad access detected

[605854.108909] Memory state around the buggy address:
[605854.109494]  ffff88810efb4180: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[605854.110323]  ffff88810efb4200: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[605854.111151] >ffff88810efb4280: fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb fb
[605854.111978]                                                     ^
[605854.112692]  ffff88810efb4300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[605854.113522]  ffff88810efb4380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[605854.114350] 
==================================================================



* Re: [PATCH 3/9] bfq: Split shared queues on move between cgroups
  2022-12-08  3:52   ` Yu Kuai
@ 2022-12-08  9:37     ` Jan Kara
  2022-12-08 12:59       ` Yu Kuai
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2022-12-08  9:37 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jan Kara, linux-block, Paolo Valente, Jens Axboe, stable, yukuai (C)

On Thu 08-12-22 11:52:38, Yu Kuai wrote:
> Hi, Jan!
> 
> On 2022/03/30 20:42, Jan Kara wrote:
> > When bfqq is shared by multiple processes it can happen that one of the
> > processes gets moved to a different cgroup (or just starts submitting IO
> > for different cgroup). In case that happens we need to split the merged
> > bfqq as otherwise we will have IO for multiple cgroups in one bfqq and
> > we will just account IO time to wrong entities etc.
> > 
> > Similarly if the bfqq is scheduled to merge with another bfqq but the
> > merge didn't happen yet, cancel the merge as it need not be valid
> > anymore.
> > 
> > CC: stable@vger.kernel.org
> > Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >   block/bfq-cgroup.c  | 36 +++++++++++++++++++++++++++++++++---
> >   block/bfq-iosched.c |  2 +-
> >   block/bfq-iosched.h |  1 +
> >   3 files changed, 35 insertions(+), 4 deletions(-)
> > 
> > diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> > index 420eda2589c0..9352f3cc2377 100644
> > --- a/block/bfq-cgroup.c
> > +++ b/block/bfq-cgroup.c
> > @@ -743,9 +743,39 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
> >   	}
> >   	if (sync_bfqq) {
> > -		entity = &sync_bfqq->entity;
> > -		if (entity->sched_data != &bfqg->sched_data)
> > -			bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
> > +		if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
> > +			/* We are the only user of this bfqq, just move it */
> > +			if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
> > +				bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
> > +		} else {
> > +			struct bfq_queue *bfqq;
> > +
> > +			/*
> > +			 * The queue was merged to a different queue. Check
> > +			 * that the merge chain still belongs to the same
> > +			 * cgroup.
> > +			 */
> > +			for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
> > +				if (bfqq->entity.sched_data !=
> > +				    &bfqg->sched_data)
> > +					break;
> > +			if (bfqq) {
> > +				/*
> > +				 * Some queue changed cgroup so the merge is
> > +				 * not valid anymore. We cannot easily just
> > +				 * cancel the merge (by clearing new_bfqq) as
> > +				 * there may be other processes using this
> > +				 * queue and holding refs to all queues below
> > +				 * sync_bfqq->new_bfqq. Similarly if the merge
> > +				 * already happened, we need to detach from
> > +				 * bfqq now so that we cannot merge bio to a
> > +				 * request from the old cgroup.
> > +				 */
> > +				bfq_put_cooperator(sync_bfqq);
> > +				bfq_release_process_ref(bfqd, sync_bfqq);
> > +				bic_set_bfqq(bic, NULL, 1);
> > +			}
> > +		}
> >   	}
> Our test found a use-after-free while accessing bfqq->bic->bfqq[] ([1]),
> and I really suspect the above change.

OK, so bfqq points to a bfq_io_cq that is already freed. Nasty.

> 1) init state, 2 thread, 2 bic, and 2 bfqq
> 
> bic1->bfqq = bfqq1
> bfqq1->bic = bic1
> bic2->bfqq = bfqq2
> bfqq2->bic = bic2
> 
> 2) bfqq1 and bfqq2 is merged
> bic1->bfqq = bfqq2
> bfqq1->bic = NULL
> bic2->bfqq = bfqq2
> bfqq2->bic = NULL
> 
> 3) bfqq1 issue a io, before such io reaches bfq, move t1 to a new
> cgroup, and issues a new io. If the new io reaches bfq first:

What do you mean by 'bfqq1 issues IO'? Do you mean t1?

> bic1->bfqq = NULL
> bfqq1->bic = NULL
> bic2->bfqq = bfqq2
> bfqq2->bic = NULL
> 
> 4) while handling the new io, a new bfqq is created
> bic1->bfqq = bfqq3
> bfqq3->bic = bic1
> bic2->bfqq = bfqq2
> bfqq2->bic = NULL
> 
> 5) the old io reaches bfq, corresponding bic is got from rq:
> bic1->bfqq = bfqq3
> bfqq3->bic = bic1
> bic2->bfqq = bfqq2
> bfqq2->bic = bic1

So if this state happens, it would indeed be a problem. But I don't see how
it could happen. bics are associated with the process. So t1 will always
use bic1, t2 will always use bic2. In bfq_init_rq() we get bfqq either from
the bic (so it would return bfqq3 for bic1) or we allocate a new queue (that
would be some bfqq4). So I see no way bfqq2 could start pointing to
bic1...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 3/9] bfq: Split shared queues on move between cgroups
  2022-12-08  9:37     ` Jan Kara
@ 2022-12-08 12:59       ` Yu Kuai
  0 siblings, 0 replies; 20+ messages in thread
From: Yu Kuai @ 2022-12-08 12:59 UTC (permalink / raw)
  To: Jan Kara, Yu Kuai
  Cc: linux-block, Paolo Valente, Jens Axboe, stable, yukuai (C)

Hi

On 2022/12/08 17:37, Jan Kara wrote:
> 
> So if this state happens, it would be indeed a problem. But I don't see how
> it could happen. bics are associated with the process. So t1 will always
> use bic1, t2 will always use bic2. In bfq_init_rq() we get bfqq either from
> bic (so it would return bfqq3 for bic1) or we allocate a new queue (that
> would be some bfqq4). So I see no way how bfqq2 could start pointing to
> bic1...


> 
> 								Honza
>

Following are the possible scenarios that we derived:

1) Initial state, two process with io.

Process 1       Process 2
  (BIC1)          (BIC2)
   |  Λ           |  Λ
   |  |            |  |
   V  |            V  |
   bfqq1           bfqq2

2) bfqq1 is merged into bfqq2; now bfqq2 has two process refs, and
    bfqq2->bic will not be set.

Process 1       Process 2
  (BIC1)          (BIC2)
   |               |
    \-------------\|
                   V
   bfqq1           bfqq2(coop)

3) Process 1 exits, then Process 2 issues io (denoted IOA).

Process 2
  (BIC2)
   |  Λ
   |  |
   V  |
bfqq2(coop)

4) Before IOA completes, move Process 2 to another cgroup and issue
    io.

Process 2
  (BIC2)
    Λ
    |\--------------\
    |                V
bfqq2(coop)      bfqq3

Now BIC2 points to bfqq3, while bfqq2 and bfqq3 both point to BIC2.

5) If all the io is completed and Process 2 exits, BIC2 will be freed,
    while bfqq2 still points to BIC2.

It's easy to construct such a scenario; however, I'm not able to trigger
the UAF yet. It seems hard to make bfqq2 become the in_service_queue again
and access bfqq2->bic in bfq_select_queue.

While organizing the above procedure, I realized that my former
solution is wrong. Now I think the right thing to do is to also
clear bfqq->bic when the process ref drops to 0.

Thanks,
Kuai



* [PATCH 8/9] bfq: Get rid of __bio_blkcg() usage
  2022-04-01 10:27 [PATCH 0/9 v7] " Jan Kara
@ 2022-04-01 10:27 ` Jan Kara
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2022-04-01 10:27 UTC (permalink / raw)
  To: Paolo Valente; +Cc: linux-block, Jens Axboe, yukuai (C), Jan Kara, stable

BFQ's usage of __bio_blkcg() is a relic from the past. Furthermore, if a bio
is not associated with any blkcg, the usage of __bio_blkcg() in BFQ is
prone to races with the task being migrated between cgroups, as
__bio_blkcg() calls at different places could return different blkcgs.

Convert BFQ to the new situation where bio->bi_blkg is initialized in
bio_set_dev() and thus practically always valid. This allows us to save
blkcg_gq lookup and noticeably simplify the code.

CC: stable@vger.kernel.org
Fixes: 0fe061b9f03c ("blkcg: fix ref count issue with bio_blkcg() using task_css")
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-cgroup.c  | 63 +++++++++++++++++----------------------------
 block/bfq-iosched.c | 11 +-------
 block/bfq-iosched.h |  3 +--
 3 files changed, 25 insertions(+), 52 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 879380c2bc7e..32d2c2a47480 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -586,27 +586,11 @@ static void bfq_group_set_parent(struct bfq_group *bfqg,
 	entity->sched_data = &parent->sched_data;
 }
 
-static struct bfq_group *bfq_lookup_bfqg(struct bfq_data *bfqd,
-					 struct blkcg *blkcg)
+static void bfq_link_bfqg(struct bfq_data *bfqd, struct bfq_group *bfqg)
 {
-	struct blkcg_gq *blkg;
-
-	blkg = blkg_lookup(blkcg, bfqd->queue);
-	if (likely(blkg))
-		return blkg_to_bfqg(blkg);
-	return NULL;
-}
-
-struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd,
-				     struct blkcg *blkcg)
-{
-	struct bfq_group *bfqg, *parent;
+	struct bfq_group *parent;
 	struct bfq_entity *entity;
 
-	bfqg = bfq_lookup_bfqg(bfqd, blkcg);
-	if (unlikely(!bfqg))
-		return NULL;
-
 	/*
 	 * Update chain of bfq_groups as we might be handling a leaf group
 	 * which, along with some of its relatives, has not been hooked yet
@@ -623,8 +607,15 @@ struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd,
 			bfq_group_set_parent(curr_bfqg, parent);
 		}
 	}
+}
 
-	return bfqg;
+struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
+{
+	struct blkcg_gq *blkg = bio->bi_blkg;
+
+	if (!blkg)
+		return bfqd->root_group;
+	return blkg_to_bfqg(blkg);
 }
 
 /**
@@ -714,25 +705,15 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
  * Move bic to blkcg, assuming that bfqd->lock is held; which makes
  * sure that the reference to cgroup is valid across the call (see
  * comments in bfq_bic_update_cgroup on this issue)
- *
- * NOTE: an alternative approach might have been to store the current
- * cgroup in bfqq and getting a reference to it, reducing the lookup
- * time here, at the price of slightly more complex code.
  */
-static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
-						struct bfq_io_cq *bic,
-						struct blkcg *blkcg)
+static void __bfq_bic_change_cgroup(struct bfq_data *bfqd,
+				    struct bfq_io_cq *bic,
+				    struct bfq_group *bfqg)
 {
 	struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0);
 	struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1);
-	struct bfq_group *bfqg;
 	struct bfq_entity *entity;
 
-	bfqg = bfq_find_set_group(bfqd, blkcg);
-
-	if (unlikely(!bfqg))
-		bfqg = bfqd->root_group;
-
 	if (async_bfqq) {
 		entity = &async_bfqq->entity;
 
@@ -784,20 +765,24 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
 void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
 {
 	struct bfq_data *bfqd = bic_to_bfqd(bic);
-	struct bfq_group *bfqg = NULL;
+	struct bfq_group *bfqg = bfq_bio_bfqg(bfqd, bio);
 	uint64_t serial_nr;
 
-	rcu_read_lock();
-	serial_nr = __bio_blkcg(bio)->css.serial_nr;
+	serial_nr = bfqg_to_blkg(bfqg)->blkcg->css.serial_nr;
 
 	/*
 	 * Check whether blkcg has changed.  The condition may trigger
 	 * spuriously on a newly created cic but there's no harm.
 	 */
 	if (unlikely(!bfqd) || likely(bic->blkcg_serial_nr == serial_nr))
-		goto out;
+		return;
 
-	bfqg = __bfq_bic_change_cgroup(bfqd, bic, __bio_blkcg(bio));
+	/*
+	 * New cgroup for this process. Make sure it is linked to bfq internal
+	 * cgroup hierarchy.
+	 */
+	bfq_link_bfqg(bfqd, bfqg);
+	__bfq_bic_change_cgroup(bfqd, bic, bfqg);
 	/*
 	 * Update blkg_path for bfq_log_* functions. We cache this
 	 * path, and update it here, for the following
@@ -850,8 +835,6 @@ void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
 	 */
 	blkg_path(bfqg_to_blkg(bfqg), bfqg->blkg_path, sizeof(bfqg->blkg_path));
 	bic->blkcg_serial_nr = serial_nr;
-out:
-	rcu_read_unlock();
 }
 
 /**
@@ -1469,7 +1452,7 @@ void bfq_end_wr_async(struct bfq_data *bfqd)
 	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
 }
 
-struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd, struct blkcg *blkcg)
+struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
 {
 	return bfqd->root_group;
 }
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index d7cf930b47bb..e47c75f1fa0f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5726,14 +5726,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
 	struct bfq_queue *bfqq;
 	struct bfq_group *bfqg;
 
-	rcu_read_lock();
-
-	bfqg = bfq_find_set_group(bfqd, __bio_blkcg(bio));
-	if (!bfqg) {
-		bfqq = &bfqd->oom_bfqq;
-		goto out;
-	}
-
+	bfqg = bfq_bio_bfqg(bfqd, bio);
 	if (!is_sync) {
 		async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
 						  ioprio);
@@ -5779,8 +5772,6 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
 
 	if (bfqq != &bfqd->oom_bfqq && is_sync && !respawn)
 		bfqq = bfq_do_or_sched_stable_merge(bfqd, bfqq, bic);
-
-	rcu_read_unlock();
 	return bfqq;
 }
 
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 4664e2f3e828..978ef5d6fe6a 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -1009,8 +1009,7 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 void bfq_init_entity(struct bfq_entity *entity, struct bfq_group *bfqg);
 void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio);
 void bfq_end_wr_async(struct bfq_data *bfqd);
-struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd,
-				     struct blkcg *blkcg);
+struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio);
 struct blkcg_gq *bfqg_to_blkg(struct bfq_group *bfqg);
 struct bfq_group *bfqq_group(struct bfq_queue *bfqq);
 struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node);
-- 
2.34.1

