From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.5 required=3.0 tests=INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB60FC433E0 for ; Wed, 27 May 2020 03:12:07 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 878B5206C3 for ; Wed, 27 May 2020 03:12:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 878B5206C3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 165D3800B6; Tue, 26 May 2020 23:12:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 115FC80010; Tue, 26 May 2020 23:12:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 004C0800B6; Tue, 26 May 2020 23:12:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0201.hostedemail.com [216.40.44.201]) by kanga.kvack.org (Postfix) with ESMTP id DA06E80010 for ; Tue, 26 May 2020 23:12:06 -0400 (EDT) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 9D2E74DBC for ; Wed, 27 May 2020 03:12:06 +0000 (UTC) X-FDA: 76861025052.02.tail78_0910c0e26d4e Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin02.hostedemail.com (Postfix) with ESMTP id 8408A4DAB for ; Wed, 27 May 2020 03:12:06 +0000 (UTC) X-HE-Tag: tail78_0910c0e26d4e X-Filterd-Recvd-Size: 11665 Received: from mail-pg1-f196.google.com (mail-pg1-f196.google.com [209.85.215.196]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Wed, 27 May 2020 03:12:05 +0000 (UTC) Received: by mail-pg1-f196.google.com with SMTP id j21so11053709pgb.7 for ; Tue, 26 May 2020 20:12:05 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=jPdSyUNbBu4Tk0UBqM+yLv8ZqjemRwgoOw92D8fo/Ys=; b=jcjaEjH+r2xeBGkpl5wND7Pa+ev/e0MGUFsREavtHIUnrZjYgd+NCDnMTE+p5Ey+uA 0vFiIGzdt+DrARPNjy1jwTmljuMpyr9/pp2WMeMJ+0xPA9CQ0o+JZmmutUJ82YyPWBLO qjJJEB50x8jteegTJysHM8x4QjOE5X6X9FxVx2rP60pyMJ9mmB1zfGw+WJnu8YI1w2cT 5RDpP+cBUx4PbvSIIKyEJgz5ygdMRFyvOr2mtK2pNewIdw5CM9rGaC/PwlfE86MUCMnT 6Evk4JRKPZ5t6s+6EYZTAx+9HcpqBtbEYYN1BsS3EuCaWhUJEEB7bSEyeE/FFJQZBMCW 3L4A== X-Gm-Message-State: AOAM532wE5qwoJIU2CbwFsQekiHDe0AJcII92pjHmoK5bMDctRmxWn1x je/90oII8FOkGEqqsYA0gLo= X-Google-Smtp-Source: ABdhPJyKomyTJQ5gaYgDKB/WKtQZxfmOCKI9VPkAPDTYVfHGsWhziXM0XJANNASrM5gh/ua4aEyXBw== X-Received: by 2002:a63:510c:: with SMTP id f12mr1806776pgb.20.1590549124513; Tue, 26 May 2020 20:12:04 -0700 (PDT) Received: from 42.do-not-panic.com (42.do-not-panic.com. [157.230.128.187]) by smtp.gmail.com with ESMTPSA id l11sm853406pjj.33.2020.05.26.20.12.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 26 May 2020 20:12:03 -0700 (PDT) Received: by 42.do-not-panic.com (Postfix, from userid 1000) id 5391D419C3; Wed, 27 May 2020 03:12:02 +0000 (UTC) Date: Wed, 27 May 2020 03:12:02 +0000 From: Luis Chamberlain To: Christoph Hellwig Cc: axboe@kernel.dk, viro@zeniv.linux.org.uk, bvanassche@acm.org, gregkh@linuxfoundation.org, rostedt@goodmis.org, mingo@redhat.com, jack@suse.cz, ming.lei@redhat.com, nstange@suse.de, akpm@linux-foundation.org, mhocko@suse.com, yukuai3@huawei.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Omar Sandoval , Hannes Reinecke , Michal Hocko , syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Subject: Re: [PATCH v5 5/7] blktrace: fix debugfs use after free Message-ID: <20200527031202.GT11244@42.do-not-panic.com> References: <20200516031956.2605-1-mcgrof@kernel.org> <20200516031956.2605-6-mcgrof@kernel.org> <20200519163713.GA29944@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200519163713.GA29944@infradead.org> User-Agent: Mutt/1.10.1 (2018-07-13) X-Rspamd-Queue-Id: 8408A4DAB X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, May 19, 2020 at 09:37:13AM -0700, Christoph Hellwig wrote: > I don't think we need any of that symlink stuff. Even if we want it > (which I don't), it should not be in a bug fix patch. > > In fact to fix the blktrace race I think we only need something like > this fairly trivial patch (completely untested so far) below. > > (and with that we can also drop the previous patch, as blk-debugfs.c > becomes rather pointless) You forgot to deal with partitions. Putting similar lipstick on the pig, this is what I end up with, let me know if this seems agreeable: diff --git a/block/blk-core.c b/block/blk-core.c index 2096373bd16d..67cd9ddac822 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -51,9 +51,7 @@ #include "blk-pm.h" #include "blk-rq-qos.h" -#ifdef CONFIG_DEBUG_FS struct dentry *blk_debugfs_root; -#endif EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap); EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap); @@ -1877,9 +1875,7 @@ int __init blk_dev_init(void) blk_requestq_cachep = kmem_cache_create("request_queue", sizeof(struct request_queue), 0, SLAB_PANIC, NULL); -#ifdef CONFIG_DEBUG_FS blk_debugfs_root = debugfs_create_dir("block", NULL); -#endif return 0; } diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 96b7a35c898a..08edc3a54114 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -822,9 +822,6 @@ void blk_mq_debugfs_register(struct request_queue *q) struct blk_mq_hw_ctx *hctx; int i; - q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), - blk_debugfs_root); - debugfs_create_files(q->debugfs_dir, q, blk_mq_debugfs_queue_attrs); /* @@ -855,9 +852,7 @@ void blk_mq_debugfs_register(struct request_queue *q) void blk_mq_debugfs_unregister(struct request_queue *q) { - debugfs_remove_recursive(q->debugfs_dir); q->sched_debugfs_dir = NULL; - q->debugfs_dir = NULL; } static void blk_mq_debugfs_register_ctx(struct blk_mq_hw_ctx *hctx, diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 561624d4cc4e..5babb6547f48 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -11,6 +11,7 @@ #include #include #include +#include #include "blk.h" #include "blk-mq.h" @@ -918,6 +919,7 @@ static void blk_release_queue(struct kobject *kobj) blk_trace_shutdown(q); + debugfs_remove_recursive(q->debugfs_dir); if (queue_is_mq(q)) blk_mq_debugfs_unregister(q); @@ -989,6 +991,28 @@ int blk_register_queue(struct gendisk *disk) goto unlock; } + /* + * Blktrace needs a debugsfs name even for queues that don't register + * a gendisk, so it lazily registers the debugfs directory. But that + * can get us into a situation where a SCSI device is found, with no + * driver for it (yet). Then blktrace is used on the device, creating + * the debugfs directory, and only after that a drivers is loaded. In + * that case we might already have a debugfs directory registered here. + * Even worse we could be racing with blktrace to register it. + */ +#ifdef CONFIG_BLK_DEV_IO_TRACE + mutex_lock(&q->blk_trace_mutex); + if (!q->debugfs_dir) { + q->debugfs_dir = + debugfs_create_dir(kobject_name(q->kobj.parent), + blk_debugfs_root); + } + mutex_unlock(&q->blk_trace_mutex); +#else + q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent), + blk_debugfs_root); +#endif + if (queue_is_mq(q)) { __blk_mq_register_dev(dev, q); blk_mq_debugfs_register(q); diff --git a/block/blk.h b/block/blk.h index 5db4ec1e85f7..f11f79295419 100644 --- a/block/blk.h +++ b/block/blk.h @@ -14,9 +14,7 @@ /* Max future timer expiry for timeouts */ #define BLK_MAX_TIMEOUT (5 * HZ) -#ifdef CONFIG_DEBUG_FS extern struct dentry *blk_debugfs_root; -#endif struct blk_flush_queue { unsigned int flush_pending_idx:1; diff --git a/block/partitions/core.c b/block/partitions/core.c index 297004fd2264..95f9019aac83 100644 --- a/block/partitions/core.c +++ b/block/partitions/core.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "check.h" static int (*check_part[])(struct parsed_partitions *) = { @@ -322,6 +323,7 @@ void delete_partition(struct gendisk *disk, struct hd_struct *part) get_device(disk_to_dev(part_to_disk(part))); rcu_assign_pointer(ptbl->part[part->partno], NULL); kobject_put(part->holder_dir); + debugfs_remove_recursive(part->debugfs_dir); device_del(part_to_dev(part)); /* @@ -443,6 +445,7 @@ static struct hd_struct *add_partition(struct gendisk *disk, int partno, if (!p->holder_dir) goto out_del; + p->debugfs_dir = debugfs_create_dir(dev_name(pdev), blk_debugfs_root); dev_set_uevent_suppress(pdev, 0); if (flags & ADDPART_FLAG_WHOLEDISK) { err = device_create_file(pdev, &dev_attr_whole_disk); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 20e378b428b8..737467c29a31 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -574,8 +574,8 @@ struct request_queue { struct list_head tag_set_list; struct bio_set bio_split; -#ifdef CONFIG_BLK_DEBUG_FS struct dentry *debugfs_dir; +#ifdef CONFIG_BLK_DEBUG_FS struct dentry *sched_debugfs_dir; struct dentry *rqos_debugfs_dir; #endif diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h index 3b6ff5902edc..eb6db276e293 100644 --- a/include/linux/blktrace_api.h +++ b/include/linux/blktrace_api.h @@ -22,7 +22,6 @@ struct blk_trace { u64 end_lba; u32 pid; u32 dev; - struct dentry *dir; struct dentry *dropped_file; struct dentry *msg_file; struct list_head running_list; diff --git a/include/linux/genhd.h b/include/linux/genhd.h index a9384449465a..7ff4c4c06140 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -89,6 +89,7 @@ struct hd_struct { int make_it_fail; #endif struct rcu_work rcu_work; + struct dentry *debugfs_dir; }; /** diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index ea47f2084087..8209d41dec18 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -311,7 +311,6 @@ static void blk_trace_free(struct blk_trace *bt) debugfs_remove(bt->msg_file); debugfs_remove(bt->dropped_file); relay_close(bt->rchan); - debugfs_remove(bt->dir); free_percpu(bt->sequence); free_percpu(bt->msg_data); kfree(bt); @@ -482,9 +481,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, if (!buts->buf_size || !buts->buf_nr) return -EINVAL; - if (!blk_debugfs_root) - return -ENOENT; - strncpy(buts->name, name, BLKTRACE_BDEV_SIZE); buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0'; @@ -494,6 +490,38 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, */ strreplace(buts->name, '/', '_'); + /* + * We also have to use a partition directory if a partition is + * being worked on, even though the same request_queue is shared. + */ + if (bdev && bdev != bdev->bd_contains) + dir = bdev->bd_part->debugfs_dir; + else { + /* + * For queues that do not have a gendisk attached to them, the + * debugfs directory will not have been created at setup time. + * Create it here lazily, it will only be removed when the + * queue is torn down. + */ + if (!q->debugfs_dir) { + q->debugfs_dir = + debugfs_create_dir(buts->name, + blk_debugfs_root); + } + dir = q->debugfs_dir; + } + + /* + * As blktrace relies on debugfs for its interface the debugfs directory + * is required, contrary to the usual mantra of not checking for debugfs + * files or directories. + */ + if (IS_ERR_OR_NULL(q->debugfs_dir)) { + pr_warn("debugfs_dir not present for %s so skipping\n", + buts->name); + return -ENOENT; + } + bt = kzalloc(sizeof(*bt), GFP_KERNEL); if (!bt) return -ENOMEM; @@ -507,12 +535,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, if (!bt->msg_data) goto err; - ret = -ENOENT; - - dir = debugfs_lookup(buts->name, blk_debugfs_root); - if (!dir) - bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root); - bt->dev = dev; atomic_set(&bt->dropped, 0); INIT_LIST_HEAD(&bt->running_list); @@ -551,8 +573,6 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, ret = 0; err: - if (dir && !bt->dir) - dput(dir); if (ret) blk_trace_free(bt); return ret; -- 2.26.2