linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michal Koutný" <mkoutny@suse.com>
To: Jan Kara <jack@suse.cz>
Cc: Paolo Valente <paolo.valente@linaro.org>,
	linux-block@vger.kernel.org, fvogdt@suse.de,
	cgroups@vger.kernel.org
Subject: Re: Use after free with BFQ and cgroups
Date: Fri, 26 Nov 2021 15:47:24 +0100	[thread overview]
Message-ID: <20211126144724.GA31093@blackbody.suse.cz> (raw)
In-Reply-To: <20211125172809.GC19572@quack2.suse.cz>

Hello.

On Thu, Nov 25, 2021 at 06:28:09PM +0100, Jan Kara <jack@suse.cz> wrote:
[...]
+Cc cgroups ML
https://lore.kernel.org/linux-block/20211125172809.GC19572@quack2.suse.cz/


I understand there are more objects than blkcgs but I assume it can
eventually boil down to blkcg references, so I suggest another
alternative. (But I may easily miss the relations between BFQ objects,
so consider this only high-level opinion.)

> After some poking, looking into crashdumps, and applying some debug patches
> the following seems to be happening: We have a process P in blkcg G. Now
> G is taken offline so bfq_group is cleaned up in bfq_pd_offline() but P
> still holds reference to G from its bfq_queue. Then P submits IO, G gets
> inserted into service tree despite being already offline.

(If G is offline, P can only be zombie, just saying. (I guess it can
still be Q's IO on behalf of G.))

IIUC, the reference to G is only held by P. If the G reference is copied
into another structure (the service tree) it should get another
reference. My naïve proposal would be css_get(). (1)

> IO completes, P exits, bfq_queue pointing to G gets destroyed, the
> last reference to G is dropped, G gets freed although is it still
> inserted in the service tree.  Eventually someone trips over the freed
> memory.

Isn't it the bfq_queue.bfq_entity that's inserted in the service tree
(not blkcg G)?
You write bfq_queue is destroyed, shouldn't that remove it from the
service tree? (2)

> Now I was looking into how to best fix this. There are several
> possibilities and I'm not sure which one to pick so that's why I'm writing
> to you. bfq_pd_offline() is walking all entities in service trees and
> trying to get rid of references to bfq_group (by reparenting entities).
> Is this guaranteed to see all entities that point to G? From the scenario
> I'm observing it seems this can miss entities pointing to G - e.g. if they
> are in idle tree, we will just remove them from the idle tree but we won't
> change entity->parent so they still point to G. This can be seen as one
> culprit of the bug.

There can be two types of references to blkcg (transitively via
bfq_group):
a) "plain" (just a pointer stored somewhere),
b) "pinned" (marked by css_get() of the respective blkcg).

The bfq_pd_offline() callback should erase all plain references (e.g. by
reparenting) or poke the holders of pinned references to release (unpin)
them eventually (so that blkcg goes away).

I reckon it's not possible to traverse all references in the
bfq_pd_offline().

> Or alternatively, should we e.g. add __bfq_deactivate_entity() to
> bfq_put_queue() when that function is dropping last queue in a bfq_group?

I guess this is what I wondered about in (2). (But I'm not sure this
really is proof against subsequent re-insertions into the tree.)

> Or should we just reparent bfq queues that have already dead parent on
> activation?

If (1) used css_tryget_online(), the parent (or ancestor if it happened
to be offlined too) could be the fallback.

> What's your opinion?

The question here is how long would stay the offlined blkcgs around if
they were directly pinned upon the IO submission. If it's unbound, then
reparenting makes more sense.


Michal

  reply	other threads:[~2021-11-26 14:49 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-25 17:28 Use after free with BFQ and cgroups Jan Kara
2021-11-26 14:47 ` Michal Koutný [this message]
2021-11-29 17:11   ` Jan Kara
2021-12-09  2:23     ` yukuai (C)
2021-12-09 15:33       ` Paolo Valente
2021-12-13 17:33       ` Jan Kara
2021-12-14  1:24         ` yukuai (C)
2021-12-20 18:38           ` Jan Kara
2021-12-22 15:21       ` Jan Kara
2021-12-23  1:02         ` yukuai (C)
2021-12-23 17:13           ` Jan Kara
2021-11-29 17:12   ` Tejun Heo
2021-11-30 11:50     ` Jan Kara
2021-11-30 16:22       ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211126144724.GA31093@blackbody.suse.cz \
    --to=mkoutny@suse.com \
    --cc=cgroups@vger.kernel.org \
    --cc=fvogdt@suse.de \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=paolo.valente@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).