From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.5 required=3.0 tests=INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1856C55181 for ; Mon, 20 Apr 2020 20:42:01 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7E8CD207FC for ; Mon, 20 Apr 2020 20:42:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7E8CD207FC Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D45398E0005; Mon, 20 Apr 2020 16:42:00 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CF59F8E0003; Mon, 20 Apr 2020 16:42:00 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BE5748E0005; Mon, 20 Apr 2020 16:42:00 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0231.hostedemail.com [216.40.44.231]) by kanga.kvack.org (Postfix) with ESMTP id A802F8E0003 for ; Mon, 20 Apr 2020 16:42:00 -0400 (EDT) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 5F34E8248068 for ; Mon, 20 Apr 2020 20:42:00 +0000 (UTC) X-FDA: 76729405200.27.power84_2e92d063f0825 X-HE-Tag: power84_2e92d063f0825 X-Filterd-Recvd-Size: 9887 Received: from mail-pl1-f195.google.com (mail-pl1-f195.google.com [209.85.214.195]) by imf13.hostedemail.com (Postfix) with ESMTP for ; Mon, 20 Apr 2020 20:41:59 +0000 (UTC) Received: by mail-pl1-f195.google.com with SMTP id t4so4375583plq.12 for ; Mon, 20 Apr 2020 13:41:59 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=W70HODHvPGy2tybwv11DFoJ/NeBLhWW5ThLnvqLq4jw=; b=thGRak4Z9L4RNhG3YfVpHUeaZrJA5Xx6Yw7j1yjyDS+vErtg0+U3Y+qq2U390c5XKd VloqZy3cqvm30cLNleQacZn22CQydQCyN4TPuZQ9fJwZ3qCRTZzj6pAjSL1i+1ejItAS LGh9jATE4JZtVB0jvwm9Wb3jdg9mwbEHLGb8xzuOyDgA9KS9VkFSWBnjHstKkF4PlSWQ N8ct+v9WBiiFqwf5fwmQLt3LDDD+eGQPRMGMe87V5u/y1Qh6Fyjb+WbsU1ZN+2p5eM/E pIZmBoURrMoJ+oncN8O2WZE4ZsNj76D1IpStMac8XE0KD2oobwvodBnvK2mOsp8QOJBq HOxQ== X-Gm-Message-State: AGi0Pubxcz7guTL60OZKOJK2IiqMVqbGcZRbZqPgPjRBrvAb9E5w2v/T 5sxvOy45sqBSkPKUlwng6dQ= X-Google-Smtp-Source: APiQypJ4sR9R+DlVxqQS0MoSp3x92ejjwbDh/Pdx52htFHQWU9alTiU9j5K9HouA86h3GL82FtnQHQ== X-Received: by 2002:a17:902:8487:: with SMTP id c7mr16757665plo.251.1587415318778; Mon, 20 Apr 2020 13:41:58 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id n16sm369107pfq.61.2020.04.20.13.41.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Apr 2020 13:41:57 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id DCAF84028E; Mon, 20 Apr 2020 20:41:56 +0000 (UTC) Date: Mon, 20 Apr 2020 20:41:56 +0000 From: Luis Chamberlain To: Greg KH Cc: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org, mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Omar Sandoval , Hannes Reinecke , Michal Hocko , syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Subject: Re: [PATCH v2 03/10] blktrace: fix debugfs use after free Message-ID: <20200420204156.GO11244@42.do-not-panic.com> References: <20200419194529.4872-1-mcgrof@kernel.org> <20200419194529.4872-4-mcgrof@kernel.org> <20200420201615.GC302402@kroah.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200420201615.GC302402@kroah.com> User-Agent: Mutt/1.10.1 (2018-07-13) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Apr 20, 2020 at 10:16:15PM +0200, Greg KH wrote: > On Sun, Apr 19, 2020 at 07:45:22PM +0000, Luis Chamberlain wrote: > > This patch triggered gmail's spam detection, your changelog text is > whack... Oh? What do you think triggered it? > > diff --git a/block/blk-debugfs.c b/block/blk-debugfs.c > > index 19091e1effc0..d84038bce0a5 100644 > > --- a/block/blk-debugfs.c > > +++ b/block/blk-debugfs.c > > @@ -3,6 +3,9 @@ > > /* > > * Shared request-based / make_request-based functionality > > */ > > + > > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > + > > #include > > #include > > #include > > @@ -13,3 +16,30 @@ void blk_debugfs_register(void) > > { > > blk_debugfs_root = debugfs_create_dir("block", NULL); > > } > > + > > +int __must_check blk_queue_debugfs_register(struct request_queue *q) > > +{ > > + struct dentry *dir = NULL; > > + > > + /* This can happen if we have a bug in the lower layers */ > > + dir = debugfs_lookup(kobject_name(q->kobj.parent), blk_debugfs_root); > > + if (dir) { > > + pr_warn("%s: registering request_queue debugfs directory twice is not allowed\n", > > + kobject_name(q->kobj.parent)); > > + dput(dir); > > + return -EALREADY; > > + } > > + > > + q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), > > + blk_debugfs_root); > > + if (!q->debugfs_dir) > > + return -ENOMEM; > > Why doesn't the directory just live in the request queue, or somewhere > else, so that you save it when it is created and then that's it. No > need to "look it up" anywhere else. Its already there. And yes, after my changes it is technically possible to just re-use it directly. But this is complicated by a few things. One is that at this point in time, asynchronous request_queue removal is still possible, and so a race was exposed where a requeust_queue may be lingering but its old device is gone. That race is fixed by reverting us back to synchronous request_queue removal, therefore ensuring that the debugfs dir exists so long as the device does. I can remove the debugfs_lookup() *after* we revert to synchronous request_queue removal, or we just re-order the patches so that the revert happens first. That should simplify this patch. The code in this patch was designed to help dispute the logic behind the CVE, in particular it shows exactly where debugfs_dir *is* the one found by debugfs_lookup(), and shows the real issue behind the removal. But yeah, now that that is done, I hope its clear to all, and I think this patch can be simplified if we just revert the async requeust_queue removal first. > Or do you do that in later patches? I only see this one at the moment, > sorry... > > > static struct dentry *blk_trace_debugfs_dir(struct blk_user_trace_setup *buts, > > + struct request_queue *q, > > struct blk_trace *bt) > > { > > struct dentry *dir = NULL; > > > > + /* This can only happen if we have a bug on our lower layers */ > > + if (!q->kobj.parent) { > > + pr_warn("%s: request_queue parent is gone\n", buts->name); > > A kobject always has a parent, unless it has not been registered yet, so > I don't know what you are testing could ever happen. Or it has been kobject_del()'d? A deferred requeust_queue removal shows this is possible, the parent is taken underneath from us because the refcounting of this kobject is already kobject_del()'d, and its actual removal scheduled for later. The parent is taken underneath from us prior to the scheduled removal completing. > > > + return NULL; > > + } > > + > > + /* > > + * From a sysfs kobject perspective, the request_queue sits on top of > > + * the gendisk, which has the name of the disk. We always create a > > + * debugfs directory upon init for this gendisk kobject, so we re-use > > + * that if blktrace is going to be done for it. > > + */ > > + if (blk_trace_target_disk(buts->name, kobject_name(q->kobj.parent))) { > > + if (!q->debugfs_dir) { > > + pr_warn("%s: expected request_queue debugfs_dir is not set\n", > > + buts->name); > > What is userspace supposed to be able to do if they see this warning? Userspace doesn't parse warnings, but the NULL ensures it won't crash the kernel. The warn informs the kernel of a possible block layer bug. > > + return NULL; > > + } > > + /* > > + * debugfs_lookup() is used to ensure the directory is not > > + * taken from underneath us. We must dput() it later once > > + * done with it within blktrace. > > + */ > > + dir = debugfs_lookup(buts->name, blk_debugfs_root); > > + if (!dir) { > > + pr_warn("%s: expected request_queue debugfs_dir dentry is gone\n", > > + buts->name); > > Again, can't we just save the pointer when we create it and not have to > look it up again? Only if we do the revert of the requeust_queue removal first. > > + return NULL; > > + } > > + /* > > + * This is a reaffirmation that debugfs_lookup() shall always > > + * return the same dentry if it was already set. > > + */ > > I'm all for reaffirmation and the like, but really, is this needed??? To those who were still not sure that the issue was not a debugfs issue I hoped this to make it clear. But indeed, if we revert back to synchronous request_queue removal, that should simplify this. > > + if (dir != q->debugfs_dir) { > > + dput(dir); > > + pr_warn("%s: expected dentry dir != q->debugfs_dir\n", > > + buts->name); > > + return NULL; > > Why are you testing to see if debugfs really is working properly? > SHould all users do crazy things like this (hint, rhetorical > question...) No, this can happen with the race I mentioned above. > > + } > > + bt->backing_dir = q->debugfs_dir; > > + return bt->backing_dir; > > + } > > + > > + /* > > + * If not using blktrace on the gendisk, we are going to create a > > + * temporary debugfs directory for it, however this cannot be shared > > + * between two concurrent blktraces since the path is not unique, so > > + * ensure this is only done once. > > + */ > > dir = debugfs_lookup(buts->name, blk_debugfs_root); > > - if (!dir) > > - bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root); > > + if (dir) { > > + pr_warn("%s: temporary blktrace debugfs directory already present\n", > > + buts->name); > > + dput(dir); > > + return NULL; > > + } > > + > > + bt->dir = debugfs_create_dir(buts->name, blk_debugfs_root); > > + if (!bt->dir) { > > + pr_warn("%s: temporary blktrace debugfs directory could not be created\n", > > + buts->name); > > Again, do not test the return value, you do not care. I've been > removing these checks from everywhere. Sure, the question still stands on what *should* the kernel do if the blktrace setup failed to create the debugfs files. Luis