linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@infradead.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org,
	gregkh@linuxfoundation.org, rostedt@goodmis.org,
	mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com,
	nstange@suse.de, akpm@linux-foundation.org, mhocko@suse.com,
	yukuai3@huawei.com, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Omar Sandoval <osandov@fb.com>,
	Hannes Reinecke <hare@suse.com>, Michal Hocko <mhocko@kernel.org>,
	syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com
Subject: Re: [PATCH v2 03/10] blktrace: fix debugfs use after free
Date: Wed, 22 Apr 2020 00:27:15 -0700	[thread overview]
Message-ID: <20200422072715.GC19116@infradead.org> (raw)
In-Reply-To: <20200419194529.4872-4-mcgrof@kernel.org>

On Sun, Apr 19, 2020 at 07:45:22PM +0000, Luis Chamberlain wrote:
> On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
> merged on v4.12 Omar fixed the original blktrace code for request-based
> drivers (multiqueue). This however left in place a possible crash, if you
> happen to abuse blktrace in a way it was not intended, and even more so
> with our current asynchronous request_queue removal.
> 
> Namely, if you loop adding a device, setup the blktrace with BLKTRACESETUP,
> forget to BLKTRACETEARDOWN, and then just remove the device you end up
> with a panic:

FYI, I find all this backtrace garbage not hepful at all.  It requires
me to scroll for so long that I've forgot what was written above by
the time I'm past it.

> This splat happens to be very similar to the one reported via
> kernel.org korg#205713, only that korg#205713 was for v4.19.83
> and the above now includes the simple_recursive_removal() introduced
> via commit a3d1e7eb5abe ("simple_recursive_removal(): kernel-side rm
> -rf for ramfs-style filesystems") merged on v5.6.
> 
> korg#205713 then was used to create CVE-2019-19770 and claims that
> the bug is in a use-after-free in the debugfs core code. The
> implications of this being a generic UAF on debugfs would be
> much more severe, as it would imply parent dentries can sometimes
> not be positive, which we hold by design is just not possible.
> 
> Below is the splat explained with a bit more details, explaining
> what is happening in userspace, kernel, and a print of the CPU on,
> which the code runs on:
> 
> load loopback module
> [   13.603371] == blk_mq_debugfs_register(12) start
> [   13.604040] == blk_mq_debugfs_register(12) q->debugfs_dir created

Same for this..  I think the real valuable changelog only stars below
this 'trace'.

> The root cause to this issue is that debugfs_lookup() can find a
> previous incarnation's dir of the same name which is about to get
> removed from a not yet schedule work. When that happens, the the files
> are taken underneath the nose of the blktrace, and when it comes time to
> cleanup, these dentries are long gone because of a scheduled removal.
> 
> This issue is happening because of two reasons:
> 
>   1) The request_queue is currently removed asynchronously as of commit
>      dc9edc44de6c ("block: Fix a blk_exit_rl() regression") merged on
>      v4.12, this allows races with userspace which were not possible
>      before unless as removal of a block device used to happen
>      synchronously with its request_queue. One could however still
>      parallelize blksetup calls while one loops on device addition and
>      removal.
> 
>   2) There are no errors checks when we create the debugfs directory,
>      be it on init or for blktrace. The concept of when the directory
>      *should* really exist is further complicated by the fact that
>      we have asynchronous request_queue removal. And, we have no
>      real sanity checks to ensure we don't re-create the queue debugfs
>      directory.
> 
> We can fix the UAF by using a debugfs directory which moving forward
> will always be accessible if debugfs is enabled, this way, its allocated
> and avaialble always for both request-based block drivers or
> make_request drivers (multiqueue) block drivers.
> 
> We also put sanity checks in place to ensure that if the directory is
> found with debugfs_lookup() it is the dentry we expect. When doing a
> blktrace against a parition, we will always be creating a temporary
> debugfs directory, so ensure that only exists once as well to avoid
> issues against concurrent blktrace runs.
> 
> Lastly, since we are now always creating the needed request_queue
> debugfs directory upon init, we can also take the initiative to
> proactively check against errors. We currently do not check for
> errors on add_disk() and friends, but we shouldn't make the issue
> any worse.
> 
> This also simplifies the code considerably, with the only penalty now
> being that we're always creating the request queue debugfs directory for
> the request-based block device drivers.
> 
> The UAF then is not a core debugfs issue, but instead a complex misuse
> of debugfs, and this issue can only be triggered if you are root.
> 
> This issue can be reproduced with break-blktrace [2] using:
> 
>   break-blktrace -c 10 -d -s
> 
> This patch fixes this issue. Note that there is also another
> respective UAF but from the ioctl path [3], this should also fix
> that issue.
> 
> This patch then also disputes the severity of CVE-2019-19770 as
> this issue is only possible by being root and using blktrace.
> 
> It is not a core debugfs issue.
> 
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=205713
> [1] https://nvd.nist.gov/vuln/detail/CVE-2019-19770
> [2] https://github.com/mcgrof/break-blktrace
> [3] https://lore.kernel.org/lkml/000000000000ec635b059f752700@google.com/

> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

This looks like an unrelated change.

> +
>  #include <linux/kernel.h>
>  #include <linux/blkdev.h>
>  #include <linux/debugfs.h>
> @@ -13,3 +16,30 @@ void blk_debugfs_register(void)
>  {
>  	blk_debugfs_root = debugfs_create_dir("block", NULL);
>  }
> +
> +int __must_check blk_queue_debugfs_register(struct request_queue *q)

__must_check for a function with a single caller looks silly.

> +{
> +	struct dentry *dir = NULL;
> +
> +	/* This can happen if we have a bug in the lower layers */
> +	dir = debugfs_lookup(kobject_name(q->kobj.parent), blk_debugfs_root);
> +	if (dir) {
> +		pr_warn("%s: registering request_queue debugfs directory twice is not allowed\n",
> +			kobject_name(q->kobj.parent));
> +		dput(dir);
> +		return -EALREADY;
> +	}

I don't see why we need this check.  If it is valueable enough we
should have a debugfs_create_dir_exclusive or so that retunrns an error
for an exsting directory, instead of reimplementing it in the caller in
a racy way.  But I'm not really sure we need it to start with.

> +
> +	q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
> +					    blk_debugfs_root);
> +	if (!q->debugfs_dir)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +void blk_queue_debugfs_unregister(struct request_queue *q)
> +{
> +	debugfs_remove_recursive(q->debugfs_dir);
> +	q->debugfs_dir = NULL;
> +}

Which to me suggests we can just fold these two into the callers,
with an IS_ENABLED for the creation case given that we check for errors
and the stub will always return an error.

>  	debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs);
>  
>  	/*
> @@ -856,9 +853,7 @@ void blk_mq_debugfs_register(struct request_queue *q)
>  
>  void blk_mq_debugfs_unregister(struct request_queue *q)
>  {
> -	debugfs_remove_recursive(q->debugfs_dir);
>  	q->sched_debugfs_dir = NULL;
> -	q->debugfs_dir = NULL;
>  }

This function is weird - the sched dir gets removed by the
debugfs_remove_recursive, so just leaving a function that clears
a pointer is rather odd.  In fact I don't think we need to clear
either sched_debugfs_dir or debugfs_dir anywhere.

>  
> @@ -975,6 +976,14 @@ int blk_register_queue(struct gendisk *disk)
>  		goto unlock;
>  	}
>  
> +	ret = blk_queue_debugfs_register(q);
> +	if (ret) {
> +		blk_trace_remove_sysfs(dev);
> +		kobject_del(&q->kobj);
> +		kobject_put(&dev->kobj);
> +		goto unlock;
> +	}
> +

Please use a goto label to consolidate the common cleanup code.

Also I think these generic debugfs changes probably should be separate
to the blktrace changes.

> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
>  #include <linux/kernel.h>
>  #include <linux/blkdev.h>
>  #include <linux/blktrace_api.h>
> @@ -311,7 +314,15 @@ static void blk_trace_free(struct blk_trace *bt)
>  	debugfs_remove(bt->msg_file);
>  	debugfs_remove(bt->dropped_file);
>  	relay_close(bt->rchan);
> -	debugfs_remove(bt->dir);
> +	/*
> +	 * backing_dir is set when we use the request_queue debugfs directory.
> +	 * Otherwise we are using a temporary directory created only for the
> +	 * purpose of blktrace.
> +	 */
> +	if (bt->backing_dir)
> +		dput(bt->backing_dir);
> +	else
> +		debugfs_remove(bt->dir);
>  	free_percpu(bt->sequence);
>  	free_percpu(bt->msg_data);
>  	kfree(bt);
> @@ -468,16 +479,89 @@ static void blk_trace_setup_lba(struct blk_trace *bt,
>  	}
>  }
>  
> +static bool blk_trace_target_disk(const char *target, const char *diskname)
> +{
> +	if (strlen(target) != strlen(diskname))
> +		return false;
> +
> +	if (!strncmp(target, diskname,
> +		     min_t(size_t, strlen(target), strlen(diskname))))
> +		return true;
> +
> +	return false;
> +}
> +
>  static struct dentry *blk_trace_debugfs_dir(struct blk_user_trace_setup *buts,
> +					    struct request_queue *q,
>  					    struct blk_trace *bt)
>  {
>  	struct dentry *dir = NULL;
>  
> +	/* This can only happen if we have a bug on our lower layers */
> +	if (!q->kobj.parent) {
> +		pr_warn("%s: request_queue parent is gone\n", buts->name);
> +		return NULL;
> +	}

Why is this not simply a WARN_ON_ONCE()?

> +	/*
> +	 * From a sysfs kobject perspective, the request_queue sits on top of
> +	 * the gendisk, which has the name of the disk. We always create a
> +	 * debugfs directory upon init for this gendisk kobject, so we re-use
> +	 * that if blktrace is going to be done for it.
> +	 */

-EPARSE.

> +	if (blk_trace_target_disk(buts->name, kobject_name(q->kobj.parent))) {
> +		if (!q->debugfs_dir) {
> +			pr_warn("%s: expected request_queue debugfs_dir is not set\n",
> +				buts->name);
> +			return NULL;
> +		}
> +		/*
> +		 * debugfs_lookup() is used to ensure the directory is not
> +		 * taken from underneath us. We must dput() it later once
> +		 * done with it within blktrace.
> +		 */
> +		dir = debugfs_lookup(buts->name, blk_debugfs_root);
> +		if (!dir) {
> +			pr_warn("%s: expected request_queue debugfs_dir dentry is gone\n",
> +				buts->name);
> +			return NULL;
> +		}
> +		 /*
> +		 * This is a reaffirmation that debugfs_lookup() shall always
> +		 * return the same dentry if it was already set.
> +		 */
> +		if (dir != q->debugfs_dir) {
> +			dput(dir);
> +			pr_warn("%s: expected dentry dir != q->debugfs_dir\n",
> +				buts->name);
> +			return NULL;
> +		}
> +		bt->backing_dir = q->debugfs_dir;
> +		return bt->backing_dir;
> +	}

Even with the gigantic commit log I don't get the point of this
code.  It looks rather sketchy and I can't find a rationale for it.

  parent reply	other threads:[~2020-04-22  7:27 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-19 19:45 [PATCH v2 00/10] block: fix blktrace debugfs use after free Luis Chamberlain
2020-04-19 19:45 ` [PATCH v2 01/10] block: move main block debugfs initialization to its own file Luis Chamberlain
2020-04-19 21:06   ` Bart Van Assche
2020-04-19 19:45 ` [PATCH v2 02/10] blktrace: move blktrace debugfs creation to helper function Luis Chamberlain
2020-04-19 21:11   ` Bart Van Assche
2020-04-22  7:12   ` Christoph Hellwig
2020-04-19 19:45 ` [PATCH v2 03/10] blktrace: fix debugfs use after free Luis Chamberlain
2020-04-19 21:55   ` Bart Van Assche
2020-04-20  0:04     ` Luis Chamberlain
2020-04-20  0:38       ` Bart Van Assche
2020-04-20 18:46         ` Luis Chamberlain
2020-04-20 20:16   ` Greg KH
2020-04-20 20:41     ` Luis Chamberlain
2020-04-21  7:00       ` Greg KH
2020-04-22  7:28         ` Luis Chamberlain
2020-04-22  9:43           ` Ming Lei
2020-04-22 10:31             ` Luis Chamberlain
2020-04-24 23:47             ` Luis Chamberlain
2020-04-22  7:29       ` Christoph Hellwig
2020-04-22  7:34         ` Luis Chamberlain
2020-04-22  7:27   ` Christoph Hellwig [this message]
2020-04-22  7:48     ` Luis Chamberlain
2020-04-22  8:10       ` Christoph Hellwig
2020-04-22  8:26         ` Luis Chamberlain
2020-04-19 19:45 ` [PATCH v2 04/10] block: revert back to synchronous request_queue removal Luis Chamberlain
2020-04-19 22:23   ` Bart Van Assche
2020-04-20 18:59     ` Luis Chamberlain
2020-04-20 21:11       ` Bart Van Assche
2020-04-20 21:51         ` Luis Chamberlain
2020-04-19 19:45 ` [PATCH v2 05/10] blktrace: upgrade warns to BUG_ON() on unexpected circmunstances Luis Chamberlain
2020-04-19 22:50   ` Bart Van Assche
2020-04-19 23:07     ` Luis Chamberlain
2020-04-20 23:20       ` Steven Rostedt
2020-04-19 19:45 ` [PATCH v2 06/10] blk-debugfs: upgrade warns to BUG_ON() if directory is already found Luis Chamberlain
2020-04-20 11:36   ` Greg KH
2020-04-25 11:43   ` [blk] 90d38c0e30: kernel_BUG_at_block/blk-debugfs.c kernel test robot
2020-04-19 19:45 ` [PATCH v2 07/10] blktrace: move debugfs file creation to its own function Luis Chamberlain
2020-04-19 22:55   ` Bart Van Assche
2020-04-20 11:37     ` Greg KH
2020-04-19 19:45 ` [PATCH v2 08/10] blktrace: add checks for created debugfs files on setup Luis Chamberlain
2020-04-19 22:57   ` Bart Van Assche
2020-04-19 23:05     ` Luis Chamberlain
2020-04-19 23:17       ` Bart Van Assche
2020-04-20 11:40         ` Greg KH
2020-04-20 18:44           ` Luis Chamberlain
2020-04-20 20:11             ` Greg KH
2020-04-20 20:20               ` Luis Chamberlain
2020-04-21  6:55                 ` Greg KH
2020-04-20 11:39   ` Greg KH
2020-04-19 19:45 ` [PATCH v2 09/10] block: panic if block debugfs dir is not created Luis Chamberlain
2020-04-19 23:08   ` Bart Van Assche
2020-04-20 11:38   ` Greg KH
2020-04-19 19:45 ` [PATCH v2 10/10] block: put_device() if device_add() fails Luis Chamberlain
2020-04-19 23:40   ` Bart Van Assche
2020-04-24 22:32     ` Luis Chamberlain
2020-04-25  1:58       ` Bart Van Assche
2020-04-25  2:12         ` Luis Chamberlain
2020-04-19 19:48 ` [PATCH v2 00/10] block: fix blktrace debugfs use after free Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200422072715.GC19116@infradead.org \
    --to=hch@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hare@suse.com \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcgrof@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=ming.lei@redhat.com \
    --cc=mingo@redhat.com \
    --cc=nstange@suse.de \
    --cc=osandov@fb.com \
    --cc=rostedt@goodmis.org \
    --cc=syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).