From: Luis Chamberlain <mcgrof@kernel.org>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org,
rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz,
ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org,
mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Omar Sandoval <osandov@fb.com>,
Hannes Reinecke <hare@suse.com>, Michal Hocko <mhocko@kernel.org>,
syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com
Subject: Re: [PATCH v2 03/10] blktrace: fix debugfs use after free
Date: Wed, 22 Apr 2020 07:28:59 +0000 [thread overview]
Message-ID: <20200422072859.GQ11244@42.do-not-panic.com> (raw)
In-Reply-To: <20200421070048.GD347130@kroah.com>
On Tue, Apr 21, 2020 at 09:00:48AM +0200, Greg KH wrote:
> On Mon, Apr 20, 2020 at 08:41:56PM +0000, Luis Chamberlain wrote:
> > On Mon, Apr 20, 2020 at 10:16:15PM +0200, Greg KH wrote:
> > > On Sun, Apr 19, 2020 at 07:45:22PM +0000, Luis Chamberlain wrote:
> > >
> > > This patch triggered gmail's spam detection, your changelog text is
> > > whack...
> >
> > Oh? What do you think triggered it?
>
> No idea.
Alright, well I'm going to move most of the analysis to the bug report
and be as concise as possible on the commit log.
> > > > diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c
> > > > index 19091e1effc0..d84038bce0a5 100644
> > > > --- a/block/blk-debugfs.c
> > > > +++ b/block/blk-debugfs.c
> > > > @@ -3,6 +3,9 @@
> > > > /*
> > > > * Shared request-based / make_request-based functionality
> > > > */
> > > > +
> > > > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > > > +
> > > > #include <linux/kernel.h>
> > > > #include <linux/blkdev.h>
> > > > #include <linux/debugfs.h>
> > > > @@ -13,3 +16,30 @@ void blk_debugfs_register(void)
> > > > {
> > > > blk_debugfs_root = debugfs_create_dir("block", NULL);
> > > > }
> > > > +
> > > > +int __must_check blk_queue_debugfs_register(struct request_queue *q)
> > > > +{
> > > > + struct dentry *dir = NULL;
> > > > +
> > > > + /* This can happen if we have a bug in the lower layers */
> > > > + dir = debugfs_lookup(kobject_name(q->kobj.parent), blk_debugfs_root);
> > > > + if (dir) {
> > > > + pr_warn("%s: registering request_queue debugfs directory twice is not allowed\n",
> > > > + kobject_name(q->kobj.parent));
> > > > + dput(dir);
> > > > + return -EALREADY;
> > > > + }
> > > > +
> > > > + q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
> > > > + blk_debugfs_root);
> > > > + if (!q->debugfs_dir)
> > > > + return -ENOMEM;
> > >
> > > Why doesn't the directory just live in the request queue, or somewhere
> > > else, so that you save it when it is created and then that's it. No
> > > need to "look it up" anywhere else.
> >
> > Its already there. And yes, after my changes it is technically possible
> > to just re-use it directly. But this is complicated by a few things. One
> > is that at this point in time, asynchronous request_queue removal is
> > still possible, and so a race was exposed where a requeust_queue may be
> > lingering but its old device is gone. That race is fixed by reverting us
> > back to synchronous request_queue removal, therefore ensuring that the
> > debugfs dir exists so long as the device does.
> >
> > I can remove the debugfs_lookup() *after* we revert to synchronous
> > request_queue removal, or we just re-order the patches so that the
> > revert happens first. That should simplify this patch.
> >
> > The code in this patch was designed to help dispute the logic behind
> > the CVE, in particular it shows exactly where debugfs_dir *is* the
> > one found by debugfs_lookup(), and shows the real issue behind the
> > removal.
> >
> > But yeah, now that that is done, I hope its clear to all, and I think
> > this patch can be simplified if we just revert the async requeust_queue
> > removal first.
>
> Don't try to "dispute" crazyness, that's not what kernel code is for.
> Just do the right thing, and simply saving off the pointer to the
> debugfs file when created is the "correct" thing to do, no matter what.
> No race conditions or anything else can happen when you do that.
Nope, races are still possible even if we move revert back to sync
request_queue removal, but I do believe that just reveals a bug
elsewhere, which I'll just fix as I think I know where this is.
> > > Or do you do that in later patches? I only see this one at the moment,
> > > sorry...
> > >
> > > > static struct dentry *blk_trace_debugfs_dir(struct blk_user_trace_setup *buts,
> > > > + struct request_queue *q,
> > > > struct blk_trace *bt)
> > > > {
> > > > struct dentry *dir = NULL;
> > > >
> > > > + /* This can only happen if we have a bug on our lower layers */
> > > > + if (!q->kobj.parent) {
> > > > + pr_warn("%s: request_queue parent is gone\n", buts->name);
> > >
> > > A kobject always has a parent, unless it has not been registered yet, so
> > > I don't know what you are testing could ever happen.
> >
> > Or it has been kobject_del()'d?
>
> If that happened, how in the world are you in this function anyway, as
> the request_queue is an invalid pointer at that point in time???
Nope, the block layer still finishes some work on it.
Drivers are allowed to cleanup a block device in this order, this
example, is from the loop block driver:
static void loop_remove(struct loop_device *lo)
{
del_gendisk(lo->lo_disk);
blk_cleanup_queue(lo->lo_queue);
blk_mq_free_tag_set(&lo->tag_set);
put_disk(lo->lo_disk);
kfree(lo);
}
At this point in time patch-wise we still haven't reverted back to
synchronous request_queue removal. Considering this, a race with the
parent disappearing can happen because the request_queue removal is
deferred, that is, the request_queue's kobject's release() call used
schedule_work() to finish off its removal. We expect the last
blk_put_queue() to be called at the end of blk_cleanup_queue(). Since
that is deferred and device_del() is called also at the end of
del_gendisk(), it means the release of the queue can happen in a
context where the disk is gone.
Although this issue is triggerable easily with the current async
request_queue removal, I can think of other ways to trigger an issue
here and one of them was suggested as possible by Christoph on the last
v1 patch series.
blk_queue_get() is not atomic and so what it returns can be incorrect:
bool blk_get_queue(struct request_queue *q)
{
if (likely(!blk_queue_dying(q))) {
__blk_get_queue(q);
return true;
}
----> we can schedule here easily and move the queue to dying
----> and race with blk_cleanup_queue() which will then allow
----> code paths to incorrectly trigger the release of the queue
----> in places we don't want
return false;
}
EXPORT_SYMBOL(blk_get_queue);
Another area of concern I am seeing through code inspection is that
since the request_queue life depends on the disk, it seemse odd we call
device_del() before we remove the request_queue. If this is indeed
wrong, we could move the device_del() from del_gendisk() to
disk_release() triggered by the last put_disk().
I have a test now which shows that even if we revert back to synchronous
request_queue removal I can trigger a panic on the same use-after-free
case on debugfs on blktrace, and this is because we are overwriting the
same debugfs directory, and I think the above are its root causes.
I can fix this bug, but that just moves the bug to conflicts within
two sysfs objects already existing, and this is because add_disk()
(which calls __device_add_disk() doesn't return errors). This is both
a blk layer bug in the sense we never check for error and a driver bug
for allowing conflicts. All this just needs to be fixed, and although I
thought this could be done later, I think I'm just going to fix all this
now.
I have reproducer now which enables the same race with synchronous
request_queue removal, I'll work my way towards fixing all this...
> > A deferred requeust_queue removal shows this is possible, the parent is
> > taken underneath from us because the refcounting of this kobject is
> > already kobject_del()'d, and its actual removal scheduled for later.
> > The parent is taken underneath from us prior to the scheduled removal
> > completing.
>
> No, a parent's reference is always valid while the child pointer is
> alive.
OK I'll keep thsi in mind then.
> There is an odd race condition right now that we are working on fixing
> if you notice another lkml thread, but that race will soon be fixed.
Got a pointer?
> So
> no need for every single user in the kernel to try to test for something
> like this (hint, this check is still wrong as with this logic, what
> could prevent parent from going away right _after_ you check it???)
>
> Just remove this invalid check please.
Yeah sure, make sense.
> > > > + return NULL;
> > > > + }
> > > > +
> > > > + /*
> > > > + * From a sysfs kobject perspective, the request_queue sits on top of
> > > > + * the gendisk, which has the name of the disk. We always create a
> > > > + * debugfs directory upon init for this gendisk kobject, so we re-use
> > > > + * that if blktrace is going to be done for it.
> > > > + */
> > > > + if (blk_trace_target_disk(buts->name, kobject_name(q->kobj.parent))) {
> > > > + if (!q->debugfs_dir) {
> > > > + pr_warn("%s: expected request_queue debugfs_dir is not set\n",
> > > > + buts->name);
> > >
> > > What is userspace supposed to be able to do if they see this warning?
> >
> > Userspace doesn't parse warnings, but the NULL ensures it won't crash
> > the kernel. The warn informs the kernel of a possible block layer bug.
>
> Again, the code should not care if the pointer is NULL, as only debugfs
> deals with those pointers (and it can handle a NULL pointer just fine.)
Alright.
> > > > + return NULL;
> > > > + }
> > > > + /*
> > > > + * debugfs_lookup() is used to ensure the directory is not
> > > > + * taken from underneath us. We must dput() it later once
> > > > + * done with it within blktrace.
> > > > + */
> > > > + dir = debugfs_lookup(buts->name, blk_debugfs_root);
> > > > + if (!dir) {
> > > > + pr_warn("%s: expected request_queue debugfs_dir dentry is gone\n",
> > > > + buts->name);
> > >
> > > Again, can't we just save the pointer when we create it and not have to
> > > look it up again?
> >
> > Only if we do the revert of the requeust_queue removal first.
>
> Your end goal should be no more debugfs_lookup() calls. Hopefully by
> the end of this patch series, that is the result.
Alrighty.
> > > > + return NULL;
> > > > + }
> > > > + /*
> > > > + * This is a reaffirmation that debugfs_lookup() shall always
> > > > + * return the same dentry if it was already set.
> > > > + */
> > >
> > > I'm all for reaffirmation and the like, but really, is this needed???
> >
> > To those who were still not sure that the issue was not a debugfs issue
> > I hoped this to make it clear. But indeed, if we revert back to
> > synchronous request_queue removal, that should simplify this.
>
> Again, don't pander to crazies :)
At least I didn't say it :)
Luis
next prev parent reply other threads:[~2020-04-22 7:29 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-19 19:45 [PATCH v2 00/10] block: fix blktrace debugfs use after free Luis Chamberlain
2020-04-19 19:45 ` [PATCH v2 01/10] block: move main block debugfs initialization to its own file Luis Chamberlain
2020-04-19 21:06 ` Bart Van Assche
2020-04-19 19:45 ` [PATCH v2 02/10] blktrace: move blktrace debugfs creation to helper function Luis Chamberlain
2020-04-19 21:11 ` Bart Van Assche
2020-04-22 7:12 ` Christoph Hellwig
2020-04-19 19:45 ` [PATCH v2 03/10] blktrace: fix debugfs use after free Luis Chamberlain
2020-04-19 21:55 ` Bart Van Assche
2020-04-20 0:04 ` Luis Chamberlain
2020-04-20 0:38 ` Bart Van Assche
2020-04-20 18:46 ` Luis Chamberlain
2020-04-20 20:16 ` Greg KH
2020-04-20 20:41 ` Luis Chamberlain
2020-04-21 7:00 ` Greg KH
2020-04-22 7:28 ` Luis Chamberlain [this message]
2020-04-22 9:43 ` Ming Lei
2020-04-22 10:31 ` Luis Chamberlain
2020-04-24 23:47 ` Luis Chamberlain
2020-04-22 7:29 ` Christoph Hellwig
2020-04-22 7:34 ` Luis Chamberlain
2020-04-22 7:27 ` Christoph Hellwig
2020-04-22 7:48 ` Luis Chamberlain
2020-04-22 8:10 ` Christoph Hellwig
2020-04-22 8:26 ` Luis Chamberlain
2020-04-19 19:45 ` [PATCH v2 04/10] block: revert back to synchronous request_queue removal Luis Chamberlain
2020-04-19 22:23 ` Bart Van Assche
2020-04-20 18:59 ` Luis Chamberlain
2020-04-20 21:11 ` Bart Van Assche
2020-04-20 21:51 ` Luis Chamberlain
2020-04-19 19:45 ` [PATCH v2 05/10] blktrace: upgrade warns to BUG_ON() on unexpected circmunstances Luis Chamberlain
2020-04-19 22:50 ` Bart Van Assche
2020-04-19 23:07 ` Luis Chamberlain
2020-04-20 23:20 ` Steven Rostedt
2020-04-19 19:45 ` [PATCH v2 06/10] blk-debugfs: upgrade warns to BUG_ON() if directory is already found Luis Chamberlain
2020-04-20 11:36 ` Greg KH
2020-04-25 11:43 ` [blk] 90d38c0e30: kernel_BUG_at_block/blk-debugfs.c kernel test robot
2020-04-19 19:45 ` [PATCH v2 07/10] blktrace: move debugfs file creation to its own function Luis Chamberlain
2020-04-19 22:55 ` Bart Van Assche
2020-04-20 11:37 ` Greg KH
2020-04-19 19:45 ` [PATCH v2 08/10] blktrace: add checks for created debugfs files on setup Luis Chamberlain
2020-04-19 22:57 ` Bart Van Assche
2020-04-19 23:05 ` Luis Chamberlain
2020-04-19 23:17 ` Bart Van Assche
2020-04-20 11:40 ` Greg KH
2020-04-20 18:44 ` Luis Chamberlain
2020-04-20 20:11 ` Greg KH
2020-04-20 20:20 ` Luis Chamberlain
2020-04-21 6:55 ` Greg KH
2020-04-20 11:39 ` Greg KH
2020-04-19 19:45 ` [PATCH v2 09/10] block: panic if block debugfs dir is not created Luis Chamberlain
2020-04-19 23:08 ` Bart Van Assche
2020-04-20 11:38 ` Greg KH
2020-04-19 19:45 ` [PATCH v2 10/10] block: put_device() if device_add() fails Luis Chamberlain
2020-04-19 23:40 ` Bart Van Assche
2020-04-24 22:32 ` Luis Chamberlain
2020-04-25 1:58 ` Bart Van Assche
2020-04-25 2:12 ` Luis Chamberlain
2020-04-19 19:48 ` [PATCH v2 00/10] block: fix blktrace debugfs use after free Luis Chamberlain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200422072859.GQ11244@42.do-not-panic.com \
--to=mcgrof@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=gregkh@linuxfoundation.org \
--cc=hare@suse.com \
--cc=jack@suse.cz \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mhocko@suse.com \
--cc=ming.lei@redhat.com \
--cc=mingo@redhat.com \
--cc=nstange@suse.de \
--cc=osandov@fb.com \
--cc=rostedt@goodmis.org \
--cc=syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com \
--cc=viro@zeniv.linux.org.uk \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).