linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "yukuai (C)" <yukuai3@huawei.com>
To: "Jan Kara" <jack@suse.cz>, "Michal Koutný" <mkoutny@suse.com>
Cc: Paolo Valente <paolo.valente@linaro.org>,
	<linux-block@vger.kernel.org>, <fvogt@suse.de>,
	<cgroups@vger.kernel.org>
Subject: Re: Use after free with BFQ and cgroups
Date: Thu, 9 Dec 2021 10:23:33 +0800	[thread overview]
Message-ID: <f03b2b1c-808a-c657-327d-03165b988e7d@huawei.com> (raw)
In-Reply-To: <20211129171115.GC29512@quack2.suse.cz>

在 2021/11/30 1:11, Jan Kara 写道:
> On Fri 26-11-21 15:47:24, Michal Koutný wrote:
>> Hello.
>>
>> On Thu, Nov 25, 2021 at 06:28:09PM +0100, Jan Kara <jack@suse.cz> wrote:
>> [...]
>> +Cc cgroups ML
>> https://lore.kernel.org/linux-block/20211125172809.GC19572@quack2.suse.cz/
>>
>>
>> I understand there are more objects than blkcgs but I assume it can
>> eventually boil down to blkcg references, so I suggest another
>> alternative. (But I may easily miss the relations between BFQ objects,
>> so consider this only high-level opinion.)
>>
>>> After some poking, looking into crashdumps, and applying some debug patches
>>> the following seems to be happening: We have a process P in blkcg G. Now
>>> G is taken offline so bfq_group is cleaned up in bfq_pd_offline() but P
>>> still holds reference to G from its bfq_queue. Then P submits IO, G gets
>>> inserted into service tree despite being already offline.
>>
>> (If G is offline, P can only be zombie, just saying. (I guess it can
>> still be Q's IO on behalf of G.))
>>
>> IIUC, the reference to G is only held by P. If the G reference is copied
>> into another structure (the service tree) it should get another
>> reference. My naïve proposal would be css_get(). (1)
> 
> So I was looking into this puzzle. The answer is following:
> 
> The process P (podman, pid 2571) is currently attached to the root cgroup
> but it has io_context with BFQ queue that points to the already-offline G
> as a parent. The bio is thus associated with the root cgroup (via
> bio->bi_blkg) but BFQ uses io_context to lookup the BFQ queue where IO
> should be queued and then uses its parent to determine blkg which it should
> be charged and thus gets to the dying cgroup.

Hi, Jan

After some code review, we found that the root cause of the problem
semms to be different.

If the process is moved from group G to root group, and a new io is
issued from the process, then bfq should detect this and changing
bfq_queue's parent to root bfq_group:

bfq_insert_request
  bfq_init_rq
   bfq_bic_update_cgroup
    serial_nr = __bio_blkcg(bio)->css.serial_nr; -> from root group
    bic->blkcg_serial_nr == serial_nr -> this do not pass,because 
bic->blkcg_serial_nr is still from group G
    __bfq_bic_change_cgroup -> bfq_queue parent will be changed to root 
group

And we think the following path is possible to trigger the problem:

1) process P1 and P2 is currently in cgroup C1, corresponding to
bfq_queue q1, q2 and bfq_group g1. And q1 and q2 are merged:
q1->next_bfqq = q2.

2) move P1 from C1 to root_cgroup, q1->next_bfqq is still q2
and flag BFQQF_split_coop is not set yet.

3) P2 exit, q2 won't exit because it's still referenced through
queue merge.

4) delete C1, g1 is offlined

5) issue a new io in q1, q1's parent entity will change to root,
however the io will end up in q1->next_bfqq = q2, and thus the
offlined g1 is inserted to service tree through q2.

6) P1 exit, q2 exit, and finially g1 is freed, while g1 is still
in service tree of it's parent.

We confirmed this by our reproducer through a simple patch:
stop merging bfq_queues if their parents are different.

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 1ce1a99a7160..14c1d1c3811e 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2626,6 +2626,11 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct 
bfq_queue *new_bfqq)
         while ((__bfqq = new_bfqq->new_bfqq)) {
                 if (__bfqq == bfqq)
                         return NULL;
+               if (__bfqq->entity.parent != bfqq->entity.parent) {
+                       if (bfq_bfqq_coop(__bfqq))
+                               bfq_mark_bfqq_split_coop(__bfqq);
+                       return NULL;
+               }
                 new_bfqq = __bfqq;
         }

@@ -2825,8 +2830,16 @@ bfq_setup_cooperator(struct bfq_data *bfqd, 
struct bfq_queue *bfqq,
         if (bfq_too_late_for_merging(bfqq))
                 return NULL;

-       if (bfqq->new_bfqq)
-               return bfqq->new_bfqq;
+       if (bfqq->new_bfqq) {
+               struct bfq_queue *new_bfqq = bfqq->new_bfqq;
+
+               if(bfqq->entity.parent == new_bfqq->entity.parent)
+                       return new_bfqq;
+
+               if(bfq_bfqq_coop(new_bfqq))
+                       bfq_mark_bfqq_split_coop(new_bfqq);
+               return NULL;
+       }

Do you think this analysis is correct?

Thanks,
Kuai
> 
> Apparently P got recently moved from G to the root cgroup and there was
> reference left in the BFQ queue structure to G.
> 
>>> IO completes, P exits, bfq_queue pointing to G gets destroyed, the
>>> last reference to G is dropped, G gets freed although is it still
>>> inserted in the service tree.  Eventually someone trips over the freed
>>> memory.
>>
>> Isn't it the bfq_queue.bfq_entity that's inserted in the service tree
>> (not blkcg G)?
> 
> Yes, it is. But the entity is part of bfq_group structure which is the pd
> for the blkcg.
> 
>> You write bfq_queue is destroyed, shouldn't that remove it from the
>> service tree? (2)
> 
> Yes, BFQ queue is removed from the service trees on destruction. But its
> parent - bfq_group - is not removed from its service tree. And that's where
> we hit the problem.
> 
>>> Now I was looking into how to best fix this. There are several
>>> possibilities and I'm not sure which one to pick so that's why I'm writing
>>> to you. bfq_pd_offline() is walking all entities in service trees and
>>> trying to get rid of references to bfq_group (by reparenting entities).
>>> Is this guaranteed to see all entities that point to G? From the scenario
>>> I'm observing it seems this can miss entities pointing to G - e.g. if they
>>> are in idle tree, we will just remove them from the idle tree but we won't
>>> change entity->parent so they still point to G. This can be seen as one
>>> culprit of the bug.
>>
>> There can be two types of references to blkcg (transitively via
>> bfq_group):
>> a) "plain" (just a pointer stored somewhere),
>> b) "pinned" (marked by css_get() of the respective blkcg).
>>
>> The bfq_pd_offline() callback should erase all plain references (e.g. by
>> reparenting) or poke the holders of pinned references to release (unpin)
>> them eventually (so that blkcg goes away).
>>
>> I reckon it's not possible to traverse all references in the
>> bfq_pd_offline().
> 
> So bfq_pd_offline() does erase all plain references AFAICT. But later it
> can create new plain references (service tree) from the existing "pinned"
> ones and once pinned references go away, those created plain references
> cause trouble. And the more I'm looking into this the more I'm convinced
> bfq_pd_offline() should be more careful and remove also the pinned
> references from bfq queues. It actually does it for most queues but it can
> currently miss some... I'll look into that.
> 
> Thanks for your very good questions and hints!
> 
> 								Honza
> 

  reply	other threads:[~2021-12-09  2:23 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-25 17:28 Use after free with BFQ and cgroups Jan Kara
2021-11-26 14:47 ` Michal Koutný
2021-11-29 17:11   ` Jan Kara
2021-12-09  2:23     ` yukuai (C) [this message]
2021-12-09 15:33       ` Paolo Valente
2021-12-13 17:33       ` Jan Kara
2021-12-14  1:24         ` yukuai (C)
2021-12-20 18:38           ` Jan Kara
2021-12-22 15:21       ` Jan Kara
2021-12-23  1:02         ` yukuai (C)
2021-12-23 17:13           ` Jan Kara
2021-11-29 17:12   ` Tejun Heo
2021-11-30 11:50     ` Jan Kara
2021-11-30 16:22       ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f03b2b1c-808a-c657-327d-03165b988e7d@huawei.com \
    --to=yukuai3@huawei.com \
    --cc=cgroups@vger.kernel.org \
    --cc=fvogt@suse.de \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=mkoutny@suse.com \
    --cc=paolo.valente@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).