linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Greg KH <gregkh@linuxfoundation.org>, Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Hannes Reinecke <hare@suse.de>,
	Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>,
	Hillf Danton <hdanton@sina.com>,
	Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>,
	linux-block <linux-block@vger.kernel.org>,
	Tyler Hicks <tyhicks@linux.microsoft.com>,
	Pavel Tatashin <pasha.tatashin@soleen.com>
Subject: [PATCH v5] block: genhd: don't call probe function with major_names_lock held
Date: Sat, 21 Aug 2021 15:12:04 +0900	[thread overview]
Message-ID: <b2af8a5b-3c1b-204e-7f56-bea0b15848d6@i-love.sakura.ne.jp> (raw)
In-Reply-To: <YR0nLGdH3zVMSRXm@kroah.com>

syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit a160c6159d4a0cf8 ("block: add an optional probe callback to
major_names") is calling the module's probe function with major_names_lock
held.

When copying content of /proc/devices to another file via sendfile(),

  sb_writers#$N => &p->lock => major_names_lock

dependency is recorded.

When loop_process_work() from WQ context performs a write request,

  (wq_completion)loop$M => (work_completion)&lo->rootcg_work =>
  sb_writers#$N

dependency is recorded.

When flush_workqueue() from drain_workqueue() from destroy_workqueue()
 from __loop_clr_fd() from blkdev_put() from blkdev_close() from __fput()
is called,

  &disk->open_mutex => &lo->lo_mutex => (wq_completion)loop$M

dependency is recorded.

When loop_control_remove() from loop_control_ioctl(LOOP_CTL_REMOVE) is
called,

  loop_ctl_mutex => &lo->lo_mutex

dependency is recorded.

As a result, lockdep thinks that there is

  loop_ctl_mutex => &lo->lo_mutex => (wq_completion)loop$M =>
  (work_completion)&lo->rootcg_work => sb_writers#$N => &p->lock =>
  major_names_lock

dependency chain.

Then, if loop_add() from loop_probe() from blk_request_module() from
blkdev_get_no_open() from blkdev_get_by_dev() from blkdev_open() from
do_dentry_open() from path_openat() from do_filp_open() is called,

  major_names_lock => loop_ctl_mutex

dependency is appended to the dependency chain.

There would be two approaches for breaking this circular dependency.
One is to kill loop_ctl_mutex => &lo->lo_mutex chain. The other is to
kill major_names_lock => loop_ctl_mutex chain. This patch implements
the latter, due to the following reasons.

 (1) sb_writers#$N => &p->lock => major_names_lock chain is unavoidable

 (2) this patch can also fix similar problem in other modules [2] which
     is caused by calling the probe function with major_names_lock held

 (3) I believe that this patch is principally safer than e.g.
     commit bd5c39edad535d9f ("loop: reduce loop_ctl_mutex coverage in
     loop_exit") which waits until the probe function finishes using
     global mutex in order to fix deadlock reproducible by sleep
     injection [3]

This patch adds THIS_MODULE parameter to __register_blkdev() as with
usb_register(), and drops major_names_lock before calling probe function
by holding a reference to that module which contains that probe function.

Link: https://syzkaller.appspot.com/bug?id=7bb10e8b62f83e4d445cdf4c13d69e407e629558 [1]
Link: https://syzkaller.appspot.com/bug?id=7bd106c28e846d1023d4ca915718b1a0905444cb [2]
Link: https://lkml.kernel.org/r/c4edf07f-92e1-a350-2743-f0b0234a2b6c@i-love.sakura.ne.jp [3]
Reported-by: syzbot <syzbot+f61766d5763f9e7a118f@syzkaller.appspotmail.com>
Reported-by: syzbot <syzbot+6a8a0d93c91e8fbf2e80@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+f61766d5763f9e7a118f@syzkaller.appspotmail.com>
Fixes: a160c6159d4a0cf8 ("block: add an optional probe callback to major_names")
---
Changes in v5:
  Repost of v4, with history added.

  We should admit that the locking dependency complication caused by
  a160c6159d4a0cf8 is currently an untamable beast, for we can't allocate
  resources for finding approaches that can reduce the locking complexity
  of all individual modules using probe function.

  Keeping this locking dependency problem unresolved degrades quality of
  future kernels by "patches are merged and kernels are released without
  testing, for it is impossible to test due to this locking dependency".
  I'm not worrying about only stable kernels. Stopping this untamable
  beast immediately is important for "releasing tested kernels".

  As an example which kills loop_ctl_mutex => &lo->lo_mutex chain, I wrote
  "[PATCH] loop: break loop_ctl_mutex into loop_idr_spinlock and
  loop_removal_mutex"
  ( https://lkml.kernel.org/r/2901b9c2-f798-413e-4073-451259718288@i-love.sakura.ne.jp ).
  However, this approach is not considered as reducing the locking
  complexity
  ( https://lkml.kernel.org/r/20210819091941.GB12883@lst.de ).
  Then, what different approach is possible? We are run out of ideas
  which can stop this untamable beast from the loop module side.

  I consider that "making it possible to release tested kernels" (instead
  of leaving this problem by refusing to convert from multi-purposed
  long-held single lock into single-purposed short-held two locks) is
  also "Fix it right". But if we can't add new locks, we have no choice
  but addressing this problem in the common path.

  Note that "[PATCH v5] block: genhd: don't call probe function with
  major_names_lock held" is easily revetrtable because this patch can
  fix the locking dependency without adding new locks and without any
  dependent commits. We can revert this patch after we succeeded in making
  locking dependency of all individual modules using probe function
  simple enough.

Changes in v4:
  Use a macro and an inline function for transparently passing THIS_MODULE
  argument to all __register_blkdev() users, as a response to "avoid
  directly modifying __register_blkdev() users" feedback.

Changes in v3:
  Directly add THIS_MODULE argument to all __register_blkdev() users,
  as a response to "avoid use of ____ naming scheme" feedback.

Changes in v2:
  Changed the strategy for addressing this problem from individual modules
  to common path. Chose ____ naming scheme for transparently passing
  THIS_MODULE argument.
---
 block/genhd.c         | 33 ++++++++++++++++++++++++++-------
 include/linux/genhd.h | 11 +++++++++--
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 298ee78c1bda..a3e2e5666457 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -162,6 +162,7 @@ static struct blk_major_name {
 	int major;
 	char name[16];
 	void (*probe)(dev_t devt);
+	struct module *owner;
 } *major_names[BLKDEV_MAJOR_HASH_SIZE];
 static DEFINE_MUTEX(major_names_lock);
 
@@ -190,7 +191,8 @@ void blkdev_show(struct seq_file *seqf, off_t offset)
  * @major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If
  *         @major = 0, try to allocate any unused major number.
  * @name: the name of the new block device as a zero terminated string
- * @probe: allback that is called on access to any minor number of @major
+ * @probe: callback that is called on access to any minor number of @major
+ * @owner: THIS_MODULE if @probe is not NULL, ignored if @probe is NULL.
  *
  * The @name must be unique within the system.
  *
@@ -207,8 +209,9 @@ void blkdev_show(struct seq_file *seqf, off_t offset)
  *
  * Use register_blkdev instead for any new code.
  */
+#undef __register_blkdev
 int __register_blkdev(unsigned int major, const char *name,
-		void (*probe)(dev_t devt))
+		      void (*probe)(dev_t devt), struct module *owner)
 {
 	struct blk_major_name **n, *p;
 	int index, ret = 0;
@@ -248,6 +251,7 @@ int __register_blkdev(unsigned int major, const char *name,
 
 	p->major = major;
 	p->probe = probe;
+	p->owner = owner;
 	strlcpy(p->name, name, sizeof(p->name));
 	p->next = NULL;
 	index = major_to_index(major);
@@ -653,14 +657,29 @@ void blk_request_module(dev_t devt)
 {
 	unsigned int major = MAJOR(devt);
 	struct blk_major_name **n;
+	void (*probe_fn)(dev_t devt);
 
 	mutex_lock(&major_names_lock);
 	for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
-		if ((*n)->major == major && (*n)->probe) {
-			(*n)->probe(devt);
-			mutex_unlock(&major_names_lock);
-			return;
-		}
+		if ((*n)->major != major || !(*n)->probe)
+			continue;
+		if (!try_module_get((*n)->owner))
+			break;
+		/*
+		 * Calling probe function with major_names_lock held causes
+		 * circular locking dependency problem. Thus, call it after
+		 * releasing major_names_lock.
+		 */
+		probe_fn = (*n)->probe;
+		mutex_unlock(&major_names_lock);
+		/*
+		 * Assuming that unregister_blkdev() is called from module's
+		 * __exit function, a module refcount taken above allows us
+		 * to safely call probe function without major_names_lock held.
+		 */
+		probe_fn(devt);
+		module_put((*n)->owner);
+		return;
 	}
 	mutex_unlock(&major_names_lock);
 
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 13b34177cc85..2ed856616d47 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -303,9 +303,16 @@ struct gendisk *__blk_alloc_disk(int node);
 void blk_cleanup_disk(struct gendisk *disk);
 
 int __register_blkdev(unsigned int major, const char *name,
-		void (*probe)(dev_t devt));
+		      void (*probe)(dev_t devt), struct module *owner);
+static inline int register_blkdev_with_probe(unsigned int major, const char *name,
+					     void (*probe)(dev_t devt), struct module *owner)
+{
+	return __register_blkdev(major, name, probe, owner);
+}
+#define __register_blkdev(major, name, probe) \
+	register_blkdev_with_probe(major, name, probe, THIS_MODULE)
 #define register_blkdev(major, name) \
-	__register_blkdev(major, name, NULL)
+	register_blkdev_with_probe(major, name, NULL, NULL)
 void unregister_blkdev(unsigned int major, const char *name);
 
 bool bdev_check_media_change(struct block_device *bdev);
-- 
2.18.4



  reply	other threads:[~2021-08-21  6:12 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-19  1:05 [PATCH v2] block: genhd: don't call probe function with major_names_lock held Tetsuo Handa
2021-06-19  3:24 ` kernel test robot
2021-06-19  6:14 ` kernel test robot
2021-06-19  6:44 ` Greg KH
2021-06-19  8:47   ` Tetsuo Handa
     [not found]   ` <20210620024403.820-1-hdanton@sina.com>
2021-06-20 13:54     ` Tetsuo Handa
2021-06-21  8:54       ` Greg KH
2021-06-21  6:18 ` Christoph Hellwig
2021-08-15  6:52 ` [PATCH v3] " Tetsuo Handa
2021-08-15  7:06   ` Greg KH
2021-08-15  7:49     ` Tetsuo Handa
2021-08-15  9:19       ` Greg KH
2021-08-18 11:07         ` [PATCH v4] " Tetsuo Handa
2021-08-18 13:27           ` Greg KH
2021-08-18 14:44             ` Tetsuo Handa
2021-08-18 15:28               ` Greg KH
2021-08-21  6:12                 ` Tetsuo Handa [this message]
2021-08-18 13:47           ` Christoph Hellwig
2021-08-18 14:34             ` Tetsuo Handa
2021-08-18 14:41               ` Greg KH
2021-08-18 14:51                 ` Tetsuo Handa
2021-08-19  9:16                   ` Christoph Hellwig
2021-08-19 14:47                     ` Tetsuo Handa
2021-08-19  9:19               ` Christoph Hellwig
2021-08-19 14:23                 ` Tetsuo Handa
2021-08-19 15:10                   ` Greg KH
2021-08-16  7:33   ` [PATCH v3] " Christoph Hellwig
2021-08-16 14:44     ` Tetsuo Handa
     [not found]     ` <20210817081045.3609-1-hdanton@sina.com>
2021-08-17 10:18       ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b2af8a5b-3c1b-204e-7f56-bea0b15848d6@i-love.sakura.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=axboe@kernel.dk \
    --cc=chaitanya.kulkarni@wdc.com \
    --cc=desmondcheongzx@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=hdanton@sina.com \
    --cc=linux-block@vger.kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=torvalds@linux-foundation.org \
    --cc=tyhicks@linux.microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).