linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Hannes Reinecke <hare@suse.de>,
	Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>,
	Hillf Danton <hdanton@sina.com>,
	Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>,
	linux-block <linux-block@vger.kernel.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	Tyler Hicks <tyhicks@linux.microsoft.com>,
	Pavel Tatashin <pasha.tatashin@soleen.com>
Subject: Re: [PATCH v4] block: genhd: don't call probe function with major_names_lock held
Date: Wed, 18 Aug 2021 15:47:52 +0200	[thread overview]
Message-ID: <20210818134752.GA7453@lst.de> (raw)
In-Reply-To: <d7d31bf1-33d3-b817-0ce3-943e6835de33@i-love.sakura.ne.jp>

Hi Tetsuo,

I might sound like a broken record, but we need to reduce the locking
complexity, not make it worse and fall down the locking cliff.  I did
send a suggested series this morning, in which you found a bunch of
bugs.  I'd appreciate it if you could use your superior skills to
actually help explain the issue and arrive at a common solution that
actually simplifies things instead of steaming down the locking cliff
full speed.  Thank you very much.

On Wed, Aug 18, 2021 at 08:07:32PM +0900, Tetsuo Handa wrote:
> syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
> commit a160c6159d4a0cf8 ("block: add an optional probe callback to
> major_names") is calling the module's probe function with major_names_lock
> held.
> 
> When copying content of /proc/devices to another file via sendfile(),
> 
>   sb_writers#$N => &p->lock => major_names_lock
> 
> dependency is recorded.
> 
> When loop_process_work() from WQ context performs a write request,
> 
>   (wq_completion)loop$M => (work_completion)&lo->rootcg_work =>
>   sb_writers#$N
> 
> dependency is recorded.
> 
> When flush_workqueue() from drain_workqueue() from destroy_workqueue()
>  from __loop_clr_fd() from blkdev_put() from blkdev_close() from __fput()
> is called,
> 
>   &disk->open_mutex => &lo->lo_mutex => (wq_completion)loop$M
> 
> dependency is recorded.
> 
> When loop_control_remove() from loop_control_ioctl(LOOP_CTL_REMOVE) is
> called,
> 
>   loop_ctl_mutex => &lo->lo_mutex
> 
> dependency is recorded.
> 
> As a result, lockdep thinks that there is
> 
>   loop_ctl_mutex => &lo->lo_mutex => (wq_completion)loop$M =>
>   (work_completion)&lo->rootcg_work => sb_writers#$N => &p->lock =>
>   major_names_lock
> 
> dependency chain.
> 
> Then, if loop_add() from loop_probe() from blk_request_module() from
> blkdev_get_no_open() from blkdev_get_by_dev() from blkdev_open() from
> do_dentry_open() from path_openat() from do_filp_open() is called,
> 
>   major_names_lock => loop_ctl_mutex
> 
> dependency is appended to the dependency chain.
> 
> There would be two approaches for breaking this circular dependency.
> One is to kill loop_ctl_mutex => &lo->lo_mutex chain. The other is to
> kill major_names_lock => loop_ctl_mutex chain. This patch implements
> the latter, due to the following reasons.
> 
>  (1) sb_writers#$N => &p->lock => major_names_lock chain is unavoidable
> 
>  (2) this patch can also fix similar problem in other modules [2] which
>      is caused by calling the probe function with major_names_lock held
> 
>  (3) I believe that this patch is principally safer than e.g.
>      commit bd5c39edad535d9f ("loop: reduce loop_ctl_mutex coverage in
>      loop_exit") which waits until the probe function finishes using
>      global mutex in order to fix deadlock reproducible by sleep
>      injection [3]
> 
> This patch adds THIS_MODULE parameter to __register_blkdev() as with
> usb_register(), and drops major_names_lock before calling probe function
> by holding a reference to that module which contains that probe function.
> 
> Since cdev uses register_chrdev() and __register_chrdev(), bdev should be
> able to preserve register_blkdev() and __register_blkdev() naming scheme.
> Thus, use register_blkdev_with_probe() as an inline function for
> automagically appending THIS_MODULE parameter.
> 
> Link: https://syzkaller.appspot.com/bug?id=7bb10e8b62f83e4d445cdf4c13d69e407e629558 [1]
> Link: https://syzkaller.appspot.com/bug?id=7bd106c28e846d1023d4ca915718b1a0905444cb [2]
> Link: https://lkml.kernel.org/r/c4edf07f-92e1-a350-2743-f0b0234a2b6c@i-love.sakura.ne.jp [3]
> Reported-by: syzbot <syzbot+f61766d5763f9e7a118f@syzkaller.appspotmail.com>
> Reported-by: syzbot <syzbot+6a8a0d93c91e8fbf2e80@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Tested-by: syzbot <syzbot+f61766d5763f9e7a118f@syzkaller.appspotmail.com>
> Fixes: a160c6159d4a0cf8 ("block: add an optional probe callback to major_names")
> ---
>  block/genhd.c         | 33 ++++++++++++++++++++++++++-------
>  include/linux/genhd.h | 11 +++++++++--
>  2 files changed, 35 insertions(+), 9 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 298ee78c1bda..a3e2e5666457 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -162,6 +162,7 @@ static struct blk_major_name {
>  	int major;
>  	char name[16];
>  	void (*probe)(dev_t devt);
> +	struct module *owner;
>  } *major_names[BLKDEV_MAJOR_HASH_SIZE];
>  static DEFINE_MUTEX(major_names_lock);
>  
> @@ -190,7 +191,8 @@ void blkdev_show(struct seq_file *seqf, off_t offset)
>   * @major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If
>   *         @major = 0, try to allocate any unused major number.
>   * @name: the name of the new block device as a zero terminated string
> - * @probe: allback that is called on access to any minor number of @major
> + * @probe: callback that is called on access to any minor number of @major
> + * @owner: THIS_MODULE if @probe is not NULL, ignored if @probe is NULL.
>   *
>   * The @name must be unique within the system.
>   *
> @@ -207,8 +209,9 @@ void blkdev_show(struct seq_file *seqf, off_t offset)
>   *
>   * Use register_blkdev instead for any new code.
>   */
> +#undef __register_blkdev
>  int __register_blkdev(unsigned int major, const char *name,
> -		void (*probe)(dev_t devt))
> +		      void (*probe)(dev_t devt), struct module *owner)
>  {
>  	struct blk_major_name **n, *p;
>  	int index, ret = 0;
> @@ -248,6 +251,7 @@ int __register_blkdev(unsigned int major, const char *name,
>  
>  	p->major = major;
>  	p->probe = probe;
> +	p->owner = owner;
>  	strlcpy(p->name, name, sizeof(p->name));
>  	p->next = NULL;
>  	index = major_to_index(major);
> @@ -653,14 +657,29 @@ void blk_request_module(dev_t devt)
>  {
>  	unsigned int major = MAJOR(devt);
>  	struct blk_major_name **n;
> +	void (*probe_fn)(dev_t devt);
>  
>  	mutex_lock(&major_names_lock);
>  	for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
> -		if ((*n)->major == major && (*n)->probe) {
> -			(*n)->probe(devt);
> -			mutex_unlock(&major_names_lock);
> -			return;
> -		}
> +		if ((*n)->major != major || !(*n)->probe)
> +			continue;
> +		if (!try_module_get((*n)->owner))
> +			break;
> +		/*
> +		 * Calling probe function with major_names_lock held causes
> +		 * circular locking dependency problem. Thus, call it after
> +		 * releasing major_names_lock.
> +		 */
> +		probe_fn = (*n)->probe;
> +		mutex_unlock(&major_names_lock);
> +		/*
> +		 * Assuming that unregister_blkdev() is called from module's
> +		 * __exit function, a module refcount taken above allows us
> +		 * to safely call probe function without major_names_lock held.
> +		 */
> +		probe_fn(devt);
> +		module_put((*n)->owner);
> +		return;
>  	}
>  	mutex_unlock(&major_names_lock);
>  
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 13b34177cc85..2ed856616d47 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -303,9 +303,16 @@ struct gendisk *__blk_alloc_disk(int node);
>  void blk_cleanup_disk(struct gendisk *disk);
>  
>  int __register_blkdev(unsigned int major, const char *name,
> -		void (*probe)(dev_t devt));
> +		      void (*probe)(dev_t devt), struct module *owner);
> +static inline int register_blkdev_with_probe(unsigned int major, const char *name,
> +					     void (*probe)(dev_t devt), struct module *owner)
> +{
> +	return __register_blkdev(major, name, probe, owner);
> +}
> +#define __register_blkdev(major, name, probe) \
> +	register_blkdev_with_probe(major, name, probe, THIS_MODULE)
>  #define register_blkdev(major, name) \
> -	__register_blkdev(major, name, NULL)
> +	register_blkdev_with_probe(major, name, NULL, NULL)
>  void unregister_blkdev(unsigned int major, const char *name);
>  
>  bool bdev_check_media_change(struct block_device *bdev);
> -- 
> 2.25.1
> 
---end quoted text---

  parent reply	other threads:[~2021-08-18 13:47 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-19  1:05 [PATCH v2] block: genhd: don't call probe function with major_names_lock held Tetsuo Handa
2021-06-19  3:24 ` kernel test robot
2021-06-19  6:14 ` kernel test robot
2021-06-19  6:44 ` Greg KH
2021-06-19  8:47   ` Tetsuo Handa
     [not found]   ` <20210620024403.820-1-hdanton@sina.com>
2021-06-20 13:54     ` Tetsuo Handa
2021-06-21  8:54       ` Greg KH
2021-06-21  6:18 ` Christoph Hellwig
2021-08-15  6:52 ` [PATCH v3] " Tetsuo Handa
2021-08-15  7:06   ` Greg KH
2021-08-15  7:49     ` Tetsuo Handa
2021-08-15  9:19       ` Greg KH
2021-08-18 11:07         ` [PATCH v4] " Tetsuo Handa
2021-08-18 13:27           ` Greg KH
2021-08-18 14:44             ` Tetsuo Handa
2021-08-18 15:28               ` Greg KH
2021-08-21  6:12                 ` [PATCH v5] " Tetsuo Handa
2021-08-18 13:47           ` Christoph Hellwig [this message]
2021-08-18 14:34             ` [PATCH v4] " Tetsuo Handa
2021-08-18 14:41               ` Greg KH
2021-08-18 14:51                 ` Tetsuo Handa
2021-08-19  9:16                   ` Christoph Hellwig
2021-08-19 14:47                     ` Tetsuo Handa
2021-08-19  9:19               ` Christoph Hellwig
2021-08-19 14:23                 ` Tetsuo Handa
2021-08-19 15:10                   ` Greg KH
2021-08-16  7:33   ` [PATCH v3] " Christoph Hellwig
2021-08-16 14:44     ` Tetsuo Handa
     [not found]     ` <20210817081045.3609-1-hdanton@sina.com>
2021-08-17 10:18       ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210818134752.GA7453@lst.de \
    --to=hch@lst.de \
    --cc=axboe@kernel.dk \
    --cc=chaitanya.kulkarni@wdc.com \
    --cc=desmondcheongzx@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hare@suse.de \
    --cc=hdanton@sina.com \
    --cc=linux-block@vger.kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=torvalds@linux-foundation.org \
    --cc=tyhicks@linux.microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).