All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Greg KH <gregkh@linuxfoundation.org>, hch@infradead.org
Cc: axboe@kernel.dk, desmondcheongzx@gmail.com,
	linux-block@vger.kernel.org,
	linux-kernel-mentees@lists.linuxfoundation.org,
	linux-kernel@vger.kernel.org, linux-mtd@lists.infradead.org,
	miquel.raynal@bootlin.com, richard@nod.at,
	Shuah Khan <skhan@linuxfoundation.org>,
	syzbot+6a8a0d93c91e8fbf2e80@syzkaller.appspotmail.com,
	vigneshr@ti.com
Subject: Re: [PATCH v2] block: genhd: don't call probe function with major_names_lock held
Date: Sat, 19 Jun 2021 17:47:30 +0900	[thread overview]
Message-ID: <c4edf07f-92e1-a350-2743-f0b0234a2b6c@i-love.sakura.ne.jp> (raw)
In-Reply-To: <YM2STfTN5AupWlSa@kroah.com>

On 2021/06/19 15:44, Greg KH wrote:
>> Note that regardless of this patch, it is up to probe function to
>> serialize module's __init function and probe function in that module
>> by using e.g. a mutex. This patch simply makes sure that module's __exit
>> function won't be called when probe function is about to be called.
> 
> module init functions ARE serialized.

The __init functions between module foo and module bar are serialized.
But the __init function for module foo and the probe function for module
foo are not serialized.

> The module owner should not matter here.

The __exit functions between module foo and module bar are serialized.
But the __exit function for module bar and the probe function for module
bar are not serialized.

You can observe a deadlock via the following steps.

(1) Build loop.ko with CONFIG_BLK_DEV_LOOP_MIN_COUNT=0 and
    a delay injection patch shown below.

----------
diff --git a/block/genhd.c b/block/genhd.c
index 9f8cb7beaad1..fe0360dc8c5d 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -680,6 +680,11 @@ void blk_request_module(dev_t devt)
 	mutex_lock(&major_names_lock);
 	for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
 		if ((*n)->major == major && (*n)->probe) {
+			if (!strcmp(current->comm, "bash")) {
+				pr_info("sleep injection start\n");
+				schedule_timeout_killable(10 * HZ); // Call "rmmod" here.
+				pr_info("sleep injection end\n");
+			}
 			(*n)->probe(devt);
 			mutex_unlock(&major_names_lock);
 			return;
----------

(2) Run the following commands from bash shell.

# modprobe loop
# mknod /dev/loop0 b 7 0
# exec 100<>/dev/loop0 & sleep 1; rmmod loop

(3) See dmesg output.

----------
[   32.260467][ T2873] loop: module loaded
[   32.289961][ T2880] sleep injection start
[   42.484039][ T2880] sleep injection end
[   42.484218][ T2880]
[   42.484248][ T2880] ======================================================
[   42.484381][ T2880] WARNING: possible circular locking dependency detected
[   42.484455][ T2880] 5.13.0-rc6+ #867 Not tainted
[   42.484508][ T2880] ------------------------------------------------------
[   42.484579][ T2880] bash/2880 is trying to acquire lock:
[   42.484638][ T2880] ffffffffc075b468 (loop_ctl_mutex){+.+.}-{3:3}, at: loop_probe+0x44/0x90 [loop]
[   42.484760][ T2880]
[   42.484760][ T2880] but task is already holding lock:
[   42.484836][ T2880] ffffffff873ffe28 (major_names_lock){+.+.}-{3:3}, at: blk_request_module+0x1f/0x100
[   42.484950][ T2880]
[   42.484950][ T2880] which lock already depends on the new lock.
[   42.484950][ T2880]
[   42.485053][ T2880]
[   42.485053][ T2880] the existing dependency chain (in reverse order) is:
[   42.485144][ T2880]
[   42.485144][ T2880] -> #1 (major_names_lock){+.+.}-{3:3}:
[   42.485230][ T2880]        lock_acquire+0xb3/0x380
[   42.485292][ T2880]        __mutex_lock+0x89/0x8f0
[   42.485350][ T2880]        mutex_lock_nested+0x16/0x20
[   42.485410][ T2880]        unregister_blkdev+0x38/0xb0
[   42.485469][ T2880]        loop_exit+0x44/0xd84 [loop]
[   42.485534][ T2880]        __x64_sys_delete_module+0x135/0x210
[   42.485602][ T2880]        do_syscall_64+0x36/0x70
[   42.485660][ T2880]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   42.485733][ T2880]
[   42.485733][ T2880] -> #0 (loop_ctl_mutex){+.+.}-{3:3}:
[   42.485817][ T2880]        check_prevs_add+0x16a/0x1040
[   42.487091][ T2880]        __lock_acquire+0x118b/0x1580
[   42.488427][ T2880]        lock_acquire+0xb3/0x380
[   42.489701][ T2880]        __mutex_lock+0x89/0x8f0
[   42.490960][ T2880]        mutex_lock_nested+0x16/0x20
[   42.492374][ T2880]        loop_probe+0x44/0x90 [loop]
[   42.493756][ T2880]        blk_request_module+0xa3/0x100
[   42.495207][ T2880]        blkdev_get_no_open+0x93/0xc0
[   42.496516][ T2880]        blkdev_get_by_dev+0x56/0x200
[   42.497735][ T2880]        blkdev_open+0x5d/0xa0
[   42.498919][ T2880]        do_dentry_open+0x163/0x3b0
[   42.500157][ T2880]        vfs_open+0x28/0x30
[   42.501312][ T2880]        path_openat+0x7e6/0xad0
[   42.502443][ T2880]        do_filp_open+0x9f/0x110
[   42.503592][ T2880]        do_sys_openat2+0x245/0x310
[   42.504647][ T2880]        do_sys_open+0x48/0x80
[   42.505689][ T2880]        __x64_sys_open+0x1c/0x20
[   42.506730][ T2880]        do_syscall_64+0x36/0x70
[   42.507795][ T2880]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   42.508890][ T2880]
[   42.508890][ T2880] other info that might help us debug this:
[   42.508890][ T2880]
[   42.511436][ T2880]  Possible unsafe locking scenario:
[   42.511436][ T2880]
[   42.512303][ T2880]        CPU0                    CPU1
[   42.512727][ T2880]        ----                    ----
[   42.513143][ T2880]   lock(major_names_lock);
[   42.513557][ T2880]                                lock(loop_ctl_mutex);
[   42.513987][ T2880]                                lock(major_names_lock);
[   42.514417][ T2880]   lock(loop_ctl_mutex);
[   42.514841][ T2880]
[   42.514841][ T2880]  *** DEADLOCK ***
[   42.514841][ T2880]
[   42.516112][ T2880] 1 lock held by bash/2880:
[   42.516518][ T2880]  #0: ffffffff873ffe28 (major_names_lock){+.+.}-{3:3}, at: blk_request_module+0x1f/0x100
[   42.517383][ T2880]
[   42.517383][ T2880] stack backtrace:
[   42.518228][ T2880] CPU: 3 PID: 2880 Comm: bash Not tainted 5.13.0-rc6+ #867
[   42.518679][ T2880] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[   42.519650][ T2880] Call Trace:
[   42.520128][ T2880]  dump_stack+0x76/0x95
[   42.520608][ T2880]  print_circular_bug.isra.50.cold.71+0x13c/0x141
[   42.521105][ T2880]  check_noncircular+0xfe/0x110
[   42.521607][ T2880]  check_prevs_add+0x16a/0x1040
[   42.522106][ T2880]  __lock_acquire+0x118b/0x1580
[   42.522606][ T2880]  lock_acquire+0xb3/0x380
[   42.523244][ T2880]  ? loop_probe+0x44/0x90 [loop]
[   42.523753][ T2880]  __mutex_lock+0x89/0x8f0
[   42.524250][ T2880]  ? loop_probe+0x44/0x90 [loop]
[   42.524749][ T2880]  ? loop_probe+0x44/0x90 [loop]
[   42.525290][ T2880]  ? blkdev_get_by_dev+0x200/0x200
[   42.525790][ T2880]  ? vprintk_default+0x18/0x20
[   42.526286][ T2880]  ? vprintk+0x56/0x130
[   42.526797][ T2880]  ? blkdev_get_by_dev+0x200/0x200
[   42.527286][ T2880]  mutex_lock_nested+0x16/0x20
[   42.527768][ T2880]  ? mutex_lock_nested+0x16/0x20
[   42.528250][ T2880]  loop_probe+0x44/0x90 [loop]
[   42.528730][ T2880]  blk_request_module+0xa3/0x100
[   42.529209][ T2880]  blkdev_get_no_open+0x93/0xc0
[   42.529691][ T2880]  blkdev_get_by_dev+0x56/0x200
[   42.530176][ T2880]  ? blkdev_get_by_dev+0x200/0x200
[   42.530688][ T2880]  blkdev_open+0x5d/0xa0
[   42.531170][ T2880]  do_dentry_open+0x163/0x3b0
[   42.531656][ T2880]  vfs_open+0x28/0x30
[   42.532132][ T2880]  path_openat+0x7e6/0xad0
[   42.532612][ T2880]  do_filp_open+0x9f/0x110
[   42.533094][ T2880]  ? _raw_spin_unlock+0x1d/0x30
[   42.533582][ T2880]  ? alloc_fd+0x116/0x200
[   42.534069][ T2880]  do_sys_openat2+0x245/0x310
[   42.534566][ T2880]  do_sys_open+0x48/0x80
[   42.535015][ T2880]  __x64_sys_open+0x1c/0x20
[   42.535462][ T2880]  do_syscall_64+0x36/0x70
[   42.535913][ T2880]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   42.536364][ T2880] RIP: 0033:0x7f16f15737f0
[   42.536797][ T2880] Code: 48 8b 15 83 76 2d 00 f7 d8 64 89 02 48 83 c8 ff c3 66 0f 1f 84 00 00 00 00 00 83 3d cd d7 2d 00 00 75 10 b8 02 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e cf 01 00 48 89 04 24
[   42.538148][ T2880] RSP: 002b:00007ffde9e2df58 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[   42.538671][ T2880] RAX: ffffffffffffffda RBX: 00000000021b5020 RCX: 00007f16f15737f0
[   42.539173][ T2880] RDX: 00000000000001b6 RSI: 0000000000000042 RDI: 00000000021bd310
[   42.539681][ T2880] RBP: 00007ffde9e2dfe0 R08: 0000000000000020 R09: 00000000021bd310
[   42.540187][ T2880] R10: 00000000fffffff0 R11: 0000000000000246 R12: 000000000000000b
[   42.540696][ T2880] R13: 0000000000000001 R14: 0000000000000064 R15: 0000000000000000
[  246.772906][   T35] INFO: task bash:2880 blocked for more than 122 seconds.
[  246.774289][   T35]       Not tainted 5.13.0-rc6+ #867
[  246.775478][   T35] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.776734][   T35] task:bash            state:D stack:    0 pid: 2880 ppid:  2856 flags:0x00004000
[  246.779550][   T35] Call Trace:
[  246.781054][   T35]  __schedule+0x271/0x9e0
[  246.782381][   T35]  schedule+0x50/0xc0
[  246.783671][   T35]  schedule_preempt_disabled+0x10/0x20
[  246.784978][   T35]  __mutex_lock+0x467/0x8f0
[  246.786277][   T35]  ? loop_probe+0x44/0x90 [loop]
[  246.787582][   T35]  ? blkdev_get_by_dev+0x200/0x200
[  246.789093][   T35]  ? blkdev_get_by_dev+0x200/0x200
[  246.790409][   T35]  mutex_lock_nested+0x16/0x20
[  246.791700][   T35]  ? mutex_lock_nested+0x16/0x20
[  246.792992][   T35]  loop_probe+0x44/0x90 [loop]
[  246.794286][   T35]  blk_request_module+0xa3/0x100
[  246.795757][   T35]  blkdev_get_no_open+0x93/0xc0
[  246.797066][   T35]  blkdev_get_by_dev+0x56/0x200
[  246.798360][   T35]  ? blkdev_get_by_dev+0x200/0x200
[  246.799661][   T35]  blkdev_open+0x5d/0xa0
[  246.800953][   T35]  do_dentry_open+0x163/0x3b0
[  246.802258][   T35]  vfs_open+0x28/0x30
[  246.803559][   T35]  path_openat+0x7e6/0xad0
[  246.805201][   T35]  do_filp_open+0x9f/0x110
[  246.807128][   T35]  ? _raw_spin_unlock+0x1d/0x30
[  246.808461][   T35]  ? alloc_fd+0x116/0x200
[  246.809763][   T35]  do_sys_openat2+0x245/0x310
[  246.811034][   T35]  do_sys_open+0x48/0x80
[  246.812257][   T35]  __x64_sys_open+0x1c/0x20
[  246.813416][   T35]  do_syscall_64+0x36/0x70
[  246.814522][   T35]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  246.815624][   T35] RIP: 0033:0x7f16f15737f0
[  246.816707][   T35] RSP: 002b:00007ffde9e2df58 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[  246.817863][   T35] RAX: ffffffffffffffda RBX: 00000000021b5020 RCX: 00007f16f15737f0
[  246.819033][   T35] RDX: 00000000000001b6 RSI: 0000000000000042 RDI: 00000000021bd310
[  246.820059][   T35] RBP: 00007ffde9e2dfe0 R08: 0000000000000020 R09: 00000000021bd310
[  246.820687][   T35] R10: 00000000fffffff0 R11: 0000000000000246 R12: 000000000000000b
[  246.821292][   T35] R13: 0000000000000001 R14: 0000000000000064 R15: 0000000000000000
[  246.821882][   T35] INFO: task rmmod:2882 blocked for more than 122 seconds.
[  246.822456][   T35]       Not tainted 5.13.0-rc6+ #867
[  246.823011][   T35] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.823599][   T35] task:rmmod           state:D stack:    0 pid: 2882 ppid:  2856 flags:0x00000000
[  246.824799][   T35] Call Trace:
[  246.825405][   T35]  __schedule+0x271/0x9e0
[  246.826018][   T35]  schedule+0x50/0xc0
[  246.826621][   T35]  schedule_preempt_disabled+0x10/0x20
[  246.827232][   T35]  __mutex_lock+0x467/0x8f0
[  246.827854][   T35]  ? unregister_blkdev+0x38/0xb0
[  246.828608][   T35]  mutex_lock_nested+0x16/0x20
[  246.829237][   T35]  ? mutex_lock_nested+0x16/0x20
[  246.829858][   T35]  unregister_blkdev+0x38/0xb0
[  246.830478][   T35]  loop_exit+0x44/0xd84 [loop]
[  246.831112][   T35]  __x64_sys_delete_module+0x135/0x210
[  246.831749][   T35]  do_syscall_64+0x36/0x70
[  246.832379][   T35]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  246.833019][   T35] RIP: 0033:0x7ff4990fcee7
[  246.833655][   T35] RSP: 002b:00007ffe445ecab8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  246.834326][   T35] RAX: ffffffffffffffda RBX: 0000000000a21220 RCX: 00007ff4990fcee7
[  246.835011][   T35] RDX: 00007ff499171a80 RSI: 0000000000000800 RDI: 0000000000a21288
[  246.835794][   T35] RBP: 0000000000000000 R08: 00007ff4993c6060 R09: 00007ff499171a80
[  246.836501][   T35] R10: 00007ffe445ec520 R11: 0000000000000206 R12: 00007ffe445ee840
[  246.837206][   T35] R13: 0000000000000000 R14: 0000000000a21220 R15: 0000000000a21010
[  246.837922][   T35] INFO: lockdep is turned off.
----------

Christoph and Desmond commented

  > > And now you can all probe after it has been freed and/or the module has
  > > been unloaded. The obviously correct fix is to only hold mtd_table_mutex
  > > for the actually required critical section:
  > > 
  > 
  > Thank you for the correction, Christoph. I hadn't thought of the 
  > scenario where the module is unloaded. I'll be more conscientious in the 
  > future.

at https://lkml.kernel.org/r/YMs3O/ce1567cf-bc94-790c-cfc0-e4e429e1a86a@gmail.com
but the fact that AB-BA deadlock is possible was forgotten. Therefore, I propose
this patch for fixing commit a160c6159d4a0cf8 ("block: add an optional probe callback
to major_names") which started calling the probe function without making sure that
the module will not go away.

>> @@ -676,14 +679,29 @@ void blk_request_module(dev_t devt)
>>  {
>>  	unsigned int major = MAJOR(devt);
>>  	struct blk_major_name **n;
>> +	void (*probe_fn)(dev_t devt);
>>  
>>  	mutex_lock(&major_names_lock);
>>  	for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
>> -		if ((*n)->major == major && (*n)->probe) {
>> -			(*n)->probe(devt);
>> -			mutex_unlock(&major_names_lock);
>> -			return;
>> -		}
>> +		if ((*n)->major != major || !(*n)->probe)
>> +			continue;
>> +		if (!try_module_get((*n)->owner))
>> +			break;
> 
> So you just fail?  Are you sure that is ok?

After "break;", the control reaches request_module() (which waits for module
unloading to complete and then tries to load that module again).

I think failing open() for once due to racing with module unloading is acceptable
(as long as there is no leak/deadlock/oops etc.), but do we want to immediately
retry this loop after returning from request_module() ?

Also, to make sure that the __init function for module foo and the probe
function for module foo are serialized, should we also verify that
(*n)->owner && (*n)->owner->state == MODULE_STATE_LIVE (which indicates that
the __init function for that module has completed) because module_is_live()
 from try_module_get() is rather weak?

----------
/* FIXME: It'd be nice to isolate modules during init, too, so they
   aren't used before they (may) fail.  But presently too much code
   (IDE & SCSI) require entry into the module during init.*/
static inline bool module_is_live(struct module *mod)
{
	return mod->state != MODULE_STATE_GOING;
}
----------

>> +		/*
>> +		 * Assuming that unregister_blkdev() is called from module's
>> +		 * __exit function, a module refcount taken above allows us
>> +		 * to safely call probe function without major_names_lock held.
>> +		 */
>> +		probe_fn(devt);
>> +		module_put((*n)->owner);
> 
> So you are trying to keep the module in memory while the probe is call,
> ok.  But module removal is not an issue, you remove modules at your own
> risk.  As syzbot isn't even testing this, why is this an issue?

That's a crazy comment. A module removal bug (unless forced unloading is used)
is an issue. Why not fix bugs where syzbot cannot test?


WARNING: multiple messages have this Message-ID (diff)
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Greg KH <gregkh@linuxfoundation.org>, hch@infradead.org
Cc: axboe@kernel.dk, desmondcheongzx@gmail.com,
	linux-block@vger.kernel.org,
	linux-kernel-mentees@lists.linuxfoundation.org,
	linux-kernel@vger.kernel.org, linux-mtd@lists.infradead.org,
	miquel.raynal@bootlin.com, richard@nod.at,
	Shuah Khan <skhan@linuxfoundation.org>,
	syzbot+6a8a0d93c91e8fbf2e80@syzkaller.appspotmail.com,
	vigneshr@ti.com
Subject: Re: [PATCH v2] block: genhd: don't call probe function with major_names_lock held
Date: Sat, 19 Jun 2021 17:47:30 +0900	[thread overview]
Message-ID: <c4edf07f-92e1-a350-2743-f0b0234a2b6c@i-love.sakura.ne.jp> (raw)
In-Reply-To: <YM2STfTN5AupWlSa@kroah.com>

On 2021/06/19 15:44, Greg KH wrote:
>> Note that regardless of this patch, it is up to probe function to
>> serialize module's __init function and probe function in that module
>> by using e.g. a mutex. This patch simply makes sure that module's __exit
>> function won't be called when probe function is about to be called.
> 
> module init functions ARE serialized.

The __init functions between module foo and module bar are serialized.
But the __init function for module foo and the probe function for module
foo are not serialized.

> The module owner should not matter here.

The __exit functions between module foo and module bar are serialized.
But the __exit function for module bar and the probe function for module
bar are not serialized.

You can observe a deadlock via the following steps.

(1) Build loop.ko with CONFIG_BLK_DEV_LOOP_MIN_COUNT=0 and
    a delay injection patch shown below.

----------
diff --git a/block/genhd.c b/block/genhd.c
index 9f8cb7beaad1..fe0360dc8c5d 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -680,6 +680,11 @@ void blk_request_module(dev_t devt)
 	mutex_lock(&major_names_lock);
 	for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
 		if ((*n)->major == major && (*n)->probe) {
+			if (!strcmp(current->comm, "bash")) {
+				pr_info("sleep injection start\n");
+				schedule_timeout_killable(10 * HZ); // Call "rmmod" here.
+				pr_info("sleep injection end\n");
+			}
 			(*n)->probe(devt);
 			mutex_unlock(&major_names_lock);
 			return;
----------

(2) Run the following commands from bash shell.

# modprobe loop
# mknod /dev/loop0 b 7 0
# exec 100<>/dev/loop0 & sleep 1; rmmod loop

(3) See dmesg output.

----------
[   32.260467][ T2873] loop: module loaded
[   32.289961][ T2880] sleep injection start
[   42.484039][ T2880] sleep injection end
[   42.484218][ T2880]
[   42.484248][ T2880] ======================================================
[   42.484381][ T2880] WARNING: possible circular locking dependency detected
[   42.484455][ T2880] 5.13.0-rc6+ #867 Not tainted
[   42.484508][ T2880] ------------------------------------------------------
[   42.484579][ T2880] bash/2880 is trying to acquire lock:
[   42.484638][ T2880] ffffffffc075b468 (loop_ctl_mutex){+.+.}-{3:3}, at: loop_probe+0x44/0x90 [loop]
[   42.484760][ T2880]
[   42.484760][ T2880] but task is already holding lock:
[   42.484836][ T2880] ffffffff873ffe28 (major_names_lock){+.+.}-{3:3}, at: blk_request_module+0x1f/0x100
[   42.484950][ T2880]
[   42.484950][ T2880] which lock already depends on the new lock.
[   42.484950][ T2880]
[   42.485053][ T2880]
[   42.485053][ T2880] the existing dependency chain (in reverse order) is:
[   42.485144][ T2880]
[   42.485144][ T2880] -> #1 (major_names_lock){+.+.}-{3:3}:
[   42.485230][ T2880]        lock_acquire+0xb3/0x380
[   42.485292][ T2880]        __mutex_lock+0x89/0x8f0
[   42.485350][ T2880]        mutex_lock_nested+0x16/0x20
[   42.485410][ T2880]        unregister_blkdev+0x38/0xb0
[   42.485469][ T2880]        loop_exit+0x44/0xd84 [loop]
[   42.485534][ T2880]        __x64_sys_delete_module+0x135/0x210
[   42.485602][ T2880]        do_syscall_64+0x36/0x70
[   42.485660][ T2880]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   42.485733][ T2880]
[   42.485733][ T2880] -> #0 (loop_ctl_mutex){+.+.}-{3:3}:
[   42.485817][ T2880]        check_prevs_add+0x16a/0x1040
[   42.487091][ T2880]        __lock_acquire+0x118b/0x1580
[   42.488427][ T2880]        lock_acquire+0xb3/0x380
[   42.489701][ T2880]        __mutex_lock+0x89/0x8f0
[   42.490960][ T2880]        mutex_lock_nested+0x16/0x20
[   42.492374][ T2880]        loop_probe+0x44/0x90 [loop]
[   42.493756][ T2880]        blk_request_module+0xa3/0x100
[   42.495207][ T2880]        blkdev_get_no_open+0x93/0xc0
[   42.496516][ T2880]        blkdev_get_by_dev+0x56/0x200
[   42.497735][ T2880]        blkdev_open+0x5d/0xa0
[   42.498919][ T2880]        do_dentry_open+0x163/0x3b0
[   42.500157][ T2880]        vfs_open+0x28/0x30
[   42.501312][ T2880]        path_openat+0x7e6/0xad0
[   42.502443][ T2880]        do_filp_open+0x9f/0x110
[   42.503592][ T2880]        do_sys_openat2+0x245/0x310
[   42.504647][ T2880]        do_sys_open+0x48/0x80
[   42.505689][ T2880]        __x64_sys_open+0x1c/0x20
[   42.506730][ T2880]        do_syscall_64+0x36/0x70
[   42.507795][ T2880]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   42.508890][ T2880]
[   42.508890][ T2880] other info that might help us debug this:
[   42.508890][ T2880]
[   42.511436][ T2880]  Possible unsafe locking scenario:
[   42.511436][ T2880]
[   42.512303][ T2880]        CPU0                    CPU1
[   42.512727][ T2880]        ----                    ----
[   42.513143][ T2880]   lock(major_names_lock);
[   42.513557][ T2880]                                lock(loop_ctl_mutex);
[   42.513987][ T2880]                                lock(major_names_lock);
[   42.514417][ T2880]   lock(loop_ctl_mutex);
[   42.514841][ T2880]
[   42.514841][ T2880]  *** DEADLOCK ***
[   42.514841][ T2880]
[   42.516112][ T2880] 1 lock held by bash/2880:
[   42.516518][ T2880]  #0: ffffffff873ffe28 (major_names_lock){+.+.}-{3:3}, at: blk_request_module+0x1f/0x100
[   42.517383][ T2880]
[   42.517383][ T2880] stack backtrace:
[   42.518228][ T2880] CPU: 3 PID: 2880 Comm: bash Not tainted 5.13.0-rc6+ #867
[   42.518679][ T2880] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[   42.519650][ T2880] Call Trace:
[   42.520128][ T2880]  dump_stack+0x76/0x95
[   42.520608][ T2880]  print_circular_bug.isra.50.cold.71+0x13c/0x141
[   42.521105][ T2880]  check_noncircular+0xfe/0x110
[   42.521607][ T2880]  check_prevs_add+0x16a/0x1040
[   42.522106][ T2880]  __lock_acquire+0x118b/0x1580
[   42.522606][ T2880]  lock_acquire+0xb3/0x380
[   42.523244][ T2880]  ? loop_probe+0x44/0x90 [loop]
[   42.523753][ T2880]  __mutex_lock+0x89/0x8f0
[   42.524250][ T2880]  ? loop_probe+0x44/0x90 [loop]
[   42.524749][ T2880]  ? loop_probe+0x44/0x90 [loop]
[   42.525290][ T2880]  ? blkdev_get_by_dev+0x200/0x200
[   42.525790][ T2880]  ? vprintk_default+0x18/0x20
[   42.526286][ T2880]  ? vprintk+0x56/0x130
[   42.526797][ T2880]  ? blkdev_get_by_dev+0x200/0x200
[   42.527286][ T2880]  mutex_lock_nested+0x16/0x20
[   42.527768][ T2880]  ? mutex_lock_nested+0x16/0x20
[   42.528250][ T2880]  loop_probe+0x44/0x90 [loop]
[   42.528730][ T2880]  blk_request_module+0xa3/0x100
[   42.529209][ T2880]  blkdev_get_no_open+0x93/0xc0
[   42.529691][ T2880]  blkdev_get_by_dev+0x56/0x200
[   42.530176][ T2880]  ? blkdev_get_by_dev+0x200/0x200
[   42.530688][ T2880]  blkdev_open+0x5d/0xa0
[   42.531170][ T2880]  do_dentry_open+0x163/0x3b0
[   42.531656][ T2880]  vfs_open+0x28/0x30
[   42.532132][ T2880]  path_openat+0x7e6/0xad0
[   42.532612][ T2880]  do_filp_open+0x9f/0x110
[   42.533094][ T2880]  ? _raw_spin_unlock+0x1d/0x30
[   42.533582][ T2880]  ? alloc_fd+0x116/0x200
[   42.534069][ T2880]  do_sys_openat2+0x245/0x310
[   42.534566][ T2880]  do_sys_open+0x48/0x80
[   42.535015][ T2880]  __x64_sys_open+0x1c/0x20
[   42.535462][ T2880]  do_syscall_64+0x36/0x70
[   42.535913][ T2880]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   42.536364][ T2880] RIP: 0033:0x7f16f15737f0
[   42.536797][ T2880] Code: 48 8b 15 83 76 2d 00 f7 d8 64 89 02 48 83 c8 ff c3 66 0f 1f 84 00 00 00 00 00 83 3d cd d7 2d 00 00 75 10 b8 02 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e cf 01 00 48 89 04 24
[   42.538148][ T2880] RSP: 002b:00007ffde9e2df58 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[   42.538671][ T2880] RAX: ffffffffffffffda RBX: 00000000021b5020 RCX: 00007f16f15737f0
[   42.539173][ T2880] RDX: 00000000000001b6 RSI: 0000000000000042 RDI: 00000000021bd310
[   42.539681][ T2880] RBP: 00007ffde9e2dfe0 R08: 0000000000000020 R09: 00000000021bd310
[   42.540187][ T2880] R10: 00000000fffffff0 R11: 0000000000000246 R12: 000000000000000b
[   42.540696][ T2880] R13: 0000000000000001 R14: 0000000000000064 R15: 0000000000000000
[  246.772906][   T35] INFO: task bash:2880 blocked for more than 122 seconds.
[  246.774289][   T35]       Not tainted 5.13.0-rc6+ #867
[  246.775478][   T35] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.776734][   T35] task:bash            state:D stack:    0 pid: 2880 ppid:  2856 flags:0x00004000
[  246.779550][   T35] Call Trace:
[  246.781054][   T35]  __schedule+0x271/0x9e0
[  246.782381][   T35]  schedule+0x50/0xc0
[  246.783671][   T35]  schedule_preempt_disabled+0x10/0x20
[  246.784978][   T35]  __mutex_lock+0x467/0x8f0
[  246.786277][   T35]  ? loop_probe+0x44/0x90 [loop]
[  246.787582][   T35]  ? blkdev_get_by_dev+0x200/0x200
[  246.789093][   T35]  ? blkdev_get_by_dev+0x200/0x200
[  246.790409][   T35]  mutex_lock_nested+0x16/0x20
[  246.791700][   T35]  ? mutex_lock_nested+0x16/0x20
[  246.792992][   T35]  loop_probe+0x44/0x90 [loop]
[  246.794286][   T35]  blk_request_module+0xa3/0x100
[  246.795757][   T35]  blkdev_get_no_open+0x93/0xc0
[  246.797066][   T35]  blkdev_get_by_dev+0x56/0x200
[  246.798360][   T35]  ? blkdev_get_by_dev+0x200/0x200
[  246.799661][   T35]  blkdev_open+0x5d/0xa0
[  246.800953][   T35]  do_dentry_open+0x163/0x3b0
[  246.802258][   T35]  vfs_open+0x28/0x30
[  246.803559][   T35]  path_openat+0x7e6/0xad0
[  246.805201][   T35]  do_filp_open+0x9f/0x110
[  246.807128][   T35]  ? _raw_spin_unlock+0x1d/0x30
[  246.808461][   T35]  ? alloc_fd+0x116/0x200
[  246.809763][   T35]  do_sys_openat2+0x245/0x310
[  246.811034][   T35]  do_sys_open+0x48/0x80
[  246.812257][   T35]  __x64_sys_open+0x1c/0x20
[  246.813416][   T35]  do_syscall_64+0x36/0x70
[  246.814522][   T35]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  246.815624][   T35] RIP: 0033:0x7f16f15737f0
[  246.816707][   T35] RSP: 002b:00007ffde9e2df58 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[  246.817863][   T35] RAX: ffffffffffffffda RBX: 00000000021b5020 RCX: 00007f16f15737f0
[  246.819033][   T35] RDX: 00000000000001b6 RSI: 0000000000000042 RDI: 00000000021bd310
[  246.820059][   T35] RBP: 00007ffde9e2dfe0 R08: 0000000000000020 R09: 00000000021bd310
[  246.820687][   T35] R10: 00000000fffffff0 R11: 0000000000000246 R12: 000000000000000b
[  246.821292][   T35] R13: 0000000000000001 R14: 0000000000000064 R15: 0000000000000000
[  246.821882][   T35] INFO: task rmmod:2882 blocked for more than 122 seconds.
[  246.822456][   T35]       Not tainted 5.13.0-rc6+ #867
[  246.823011][   T35] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.823599][   T35] task:rmmod           state:D stack:    0 pid: 2882 ppid:  2856 flags:0x00000000
[  246.824799][   T35] Call Trace:
[  246.825405][   T35]  __schedule+0x271/0x9e0
[  246.826018][   T35]  schedule+0x50/0xc0
[  246.826621][   T35]  schedule_preempt_disabled+0x10/0x20
[  246.827232][   T35]  __mutex_lock+0x467/0x8f0
[  246.827854][   T35]  ? unregister_blkdev+0x38/0xb0
[  246.828608][   T35]  mutex_lock_nested+0x16/0x20
[  246.829237][   T35]  ? mutex_lock_nested+0x16/0x20
[  246.829858][   T35]  unregister_blkdev+0x38/0xb0
[  246.830478][   T35]  loop_exit+0x44/0xd84 [loop]
[  246.831112][   T35]  __x64_sys_delete_module+0x135/0x210
[  246.831749][   T35]  do_syscall_64+0x36/0x70
[  246.832379][   T35]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  246.833019][   T35] RIP: 0033:0x7ff4990fcee7
[  246.833655][   T35] RSP: 002b:00007ffe445ecab8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  246.834326][   T35] RAX: ffffffffffffffda RBX: 0000000000a21220 RCX: 00007ff4990fcee7
[  246.835011][   T35] RDX: 00007ff499171a80 RSI: 0000000000000800 RDI: 0000000000a21288
[  246.835794][   T35] RBP: 0000000000000000 R08: 00007ff4993c6060 R09: 00007ff499171a80
[  246.836501][   T35] R10: 00007ffe445ec520 R11: 0000000000000206 R12: 00007ffe445ee840
[  246.837206][   T35] R13: 0000000000000000 R14: 0000000000a21220 R15: 0000000000a21010
[  246.837922][   T35] INFO: lockdep is turned off.
----------

Christoph and Desmond commented

  > > And now you can all probe after it has been freed and/or the module has
  > > been unloaded. The obviously correct fix is to only hold mtd_table_mutex
  > > for the actually required critical section:
  > > 
  > 
  > Thank you for the correction, Christoph. I hadn't thought of the 
  > scenario where the module is unloaded. I'll be more conscientious in the 
  > future.

at https://lkml.kernel.org/r/YMs3O/ce1567cf-bc94-790c-cfc0-e4e429e1a86a@gmail.com
but the fact that AB-BA deadlock is possible was forgotten. Therefore, I propose
this patch for fixing commit a160c6159d4a0cf8 ("block: add an optional probe callback
to major_names") which started calling the probe function without making sure that
the module will not go away.

>> @@ -676,14 +679,29 @@ void blk_request_module(dev_t devt)
>>  {
>>  	unsigned int major = MAJOR(devt);
>>  	struct blk_major_name **n;
>> +	void (*probe_fn)(dev_t devt);
>>  
>>  	mutex_lock(&major_names_lock);
>>  	for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
>> -		if ((*n)->major == major && (*n)->probe) {
>> -			(*n)->probe(devt);
>> -			mutex_unlock(&major_names_lock);
>> -			return;
>> -		}
>> +		if ((*n)->major != major || !(*n)->probe)
>> +			continue;
>> +		if (!try_module_get((*n)->owner))
>> +			break;
> 
> So you just fail?  Are you sure that is ok?

After "break;", the control reaches request_module() (which waits for module
unloading to complete and then tries to load that module again).

I think failing open() for once due to racing with module unloading is acceptable
(as long as there is no leak/deadlock/oops etc.), but do we want to immediately
retry this loop after returning from request_module() ?

Also, to make sure that the __init function for module foo and the probe
function for module foo are serialized, should we also verify that
(*n)->owner && (*n)->owner->state == MODULE_STATE_LIVE (which indicates that
the __init function for that module has completed) because module_is_live()
 from try_module_get() is rather weak?

----------
/* FIXME: It'd be nice to isolate modules during init, too, so they
   aren't used before they (may) fail.  But presently too much code
   (IDE & SCSI) require entry into the module during init.*/
static inline bool module_is_live(struct module *mod)
{
	return mod->state != MODULE_STATE_GOING;
}
----------

>> +		/*
>> +		 * Assuming that unregister_blkdev() is called from module's
>> +		 * __exit function, a module refcount taken above allows us
>> +		 * to safely call probe function without major_names_lock held.
>> +		 */
>> +		probe_fn(devt);
>> +		module_put((*n)->owner);
> 
> So you are trying to keep the module in memory while the probe is call,
> ok.  But module removal is not an issue, you remove modules at your own
> risk.  As syzbot isn't even testing this, why is this an issue?

That's a crazy comment. A module removal bug (unless forced unloading is used)
is an issue. Why not fix bugs where syzbot cannot test?


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

WARNING: multiple messages have this Message-ID (diff)
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Greg KH <gregkh@linuxfoundation.org>, hch@infradead.org
Cc: axboe@kernel.dk,
	syzbot+6a8a0d93c91e8fbf2e80@syzkaller.appspotmail.com,
	vigneshr@ti.com, richard@nod.at, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-mtd@lists.infradead.org,
	miquel.raynal@bootlin.com, desmondcheongzx@gmail.com,
	linux-kernel-mentees@lists.linuxfoundation.org
Subject: Re: [PATCH v2] block: genhd: don't call probe function with major_names_lock held
Date: Sat, 19 Jun 2021 17:47:30 +0900	[thread overview]
Message-ID: <c4edf07f-92e1-a350-2743-f0b0234a2b6c@i-love.sakura.ne.jp> (raw)
In-Reply-To: <YM2STfTN5AupWlSa@kroah.com>

On 2021/06/19 15:44, Greg KH wrote:
>> Note that regardless of this patch, it is up to probe function to
>> serialize module's __init function and probe function in that module
>> by using e.g. a mutex. This patch simply makes sure that module's __exit
>> function won't be called when probe function is about to be called.
> 
> module init functions ARE serialized.

The __init functions between module foo and module bar are serialized.
But the __init function for module foo and the probe function for module
foo are not serialized.

> The module owner should not matter here.

The __exit functions between module foo and module bar are serialized.
But the __exit function for module bar and the probe function for module
bar are not serialized.

You can observe a deadlock via the following steps.

(1) Build loop.ko with CONFIG_BLK_DEV_LOOP_MIN_COUNT=0 and
    a delay injection patch shown below.

----------
diff --git a/block/genhd.c b/block/genhd.c
index 9f8cb7beaad1..fe0360dc8c5d 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -680,6 +680,11 @@ void blk_request_module(dev_t devt)
 	mutex_lock(&major_names_lock);
 	for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
 		if ((*n)->major == major && (*n)->probe) {
+			if (!strcmp(current->comm, "bash")) {
+				pr_info("sleep injection start\n");
+				schedule_timeout_killable(10 * HZ); // Call "rmmod" here.
+				pr_info("sleep injection end\n");
+			}
 			(*n)->probe(devt);
 			mutex_unlock(&major_names_lock);
 			return;
----------

(2) Run the following commands from bash shell.

# modprobe loop
# mknod /dev/loop0 b 7 0
# exec 100<>/dev/loop0 & sleep 1; rmmod loop

(3) See dmesg output.

----------
[   32.260467][ T2873] loop: module loaded
[   32.289961][ T2880] sleep injection start
[   42.484039][ T2880] sleep injection end
[   42.484218][ T2880]
[   42.484248][ T2880] ======================================================
[   42.484381][ T2880] WARNING: possible circular locking dependency detected
[   42.484455][ T2880] 5.13.0-rc6+ #867 Not tainted
[   42.484508][ T2880] ------------------------------------------------------
[   42.484579][ T2880] bash/2880 is trying to acquire lock:
[   42.484638][ T2880] ffffffffc075b468 (loop_ctl_mutex){+.+.}-{3:3}, at: loop_probe+0x44/0x90 [loop]
[   42.484760][ T2880]
[   42.484760][ T2880] but task is already holding lock:
[   42.484836][ T2880] ffffffff873ffe28 (major_names_lock){+.+.}-{3:3}, at: blk_request_module+0x1f/0x100
[   42.484950][ T2880]
[   42.484950][ T2880] which lock already depends on the new lock.
[   42.484950][ T2880]
[   42.485053][ T2880]
[   42.485053][ T2880] the existing dependency chain (in reverse order) is:
[   42.485144][ T2880]
[   42.485144][ T2880] -> #1 (major_names_lock){+.+.}-{3:3}:
[   42.485230][ T2880]        lock_acquire+0xb3/0x380
[   42.485292][ T2880]        __mutex_lock+0x89/0x8f0
[   42.485350][ T2880]        mutex_lock_nested+0x16/0x20
[   42.485410][ T2880]        unregister_blkdev+0x38/0xb0
[   42.485469][ T2880]        loop_exit+0x44/0xd84 [loop]
[   42.485534][ T2880]        __x64_sys_delete_module+0x135/0x210
[   42.485602][ T2880]        do_syscall_64+0x36/0x70
[   42.485660][ T2880]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   42.485733][ T2880]
[   42.485733][ T2880] -> #0 (loop_ctl_mutex){+.+.}-{3:3}:
[   42.485817][ T2880]        check_prevs_add+0x16a/0x1040
[   42.487091][ T2880]        __lock_acquire+0x118b/0x1580
[   42.488427][ T2880]        lock_acquire+0xb3/0x380
[   42.489701][ T2880]        __mutex_lock+0x89/0x8f0
[   42.490960][ T2880]        mutex_lock_nested+0x16/0x20
[   42.492374][ T2880]        loop_probe+0x44/0x90 [loop]
[   42.493756][ T2880]        blk_request_module+0xa3/0x100
[   42.495207][ T2880]        blkdev_get_no_open+0x93/0xc0
[   42.496516][ T2880]        blkdev_get_by_dev+0x56/0x200
[   42.497735][ T2880]        blkdev_open+0x5d/0xa0
[   42.498919][ T2880]        do_dentry_open+0x163/0x3b0
[   42.500157][ T2880]        vfs_open+0x28/0x30
[   42.501312][ T2880]        path_openat+0x7e6/0xad0
[   42.502443][ T2880]        do_filp_open+0x9f/0x110
[   42.503592][ T2880]        do_sys_openat2+0x245/0x310
[   42.504647][ T2880]        do_sys_open+0x48/0x80
[   42.505689][ T2880]        __x64_sys_open+0x1c/0x20
[   42.506730][ T2880]        do_syscall_64+0x36/0x70
[   42.507795][ T2880]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   42.508890][ T2880]
[   42.508890][ T2880] other info that might help us debug this:
[   42.508890][ T2880]
[   42.511436][ T2880]  Possible unsafe locking scenario:
[   42.511436][ T2880]
[   42.512303][ T2880]        CPU0                    CPU1
[   42.512727][ T2880]        ----                    ----
[   42.513143][ T2880]   lock(major_names_lock);
[   42.513557][ T2880]                                lock(loop_ctl_mutex);
[   42.513987][ T2880]                                lock(major_names_lock);
[   42.514417][ T2880]   lock(loop_ctl_mutex);
[   42.514841][ T2880]
[   42.514841][ T2880]  *** DEADLOCK ***
[   42.514841][ T2880]
[   42.516112][ T2880] 1 lock held by bash/2880:
[   42.516518][ T2880]  #0: ffffffff873ffe28 (major_names_lock){+.+.}-{3:3}, at: blk_request_module+0x1f/0x100
[   42.517383][ T2880]
[   42.517383][ T2880] stack backtrace:
[   42.518228][ T2880] CPU: 3 PID: 2880 Comm: bash Not tainted 5.13.0-rc6+ #867
[   42.518679][ T2880] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[   42.519650][ T2880] Call Trace:
[   42.520128][ T2880]  dump_stack+0x76/0x95
[   42.520608][ T2880]  print_circular_bug.isra.50.cold.71+0x13c/0x141
[   42.521105][ T2880]  check_noncircular+0xfe/0x110
[   42.521607][ T2880]  check_prevs_add+0x16a/0x1040
[   42.522106][ T2880]  __lock_acquire+0x118b/0x1580
[   42.522606][ T2880]  lock_acquire+0xb3/0x380
[   42.523244][ T2880]  ? loop_probe+0x44/0x90 [loop]
[   42.523753][ T2880]  __mutex_lock+0x89/0x8f0
[   42.524250][ T2880]  ? loop_probe+0x44/0x90 [loop]
[   42.524749][ T2880]  ? loop_probe+0x44/0x90 [loop]
[   42.525290][ T2880]  ? blkdev_get_by_dev+0x200/0x200
[   42.525790][ T2880]  ? vprintk_default+0x18/0x20
[   42.526286][ T2880]  ? vprintk+0x56/0x130
[   42.526797][ T2880]  ? blkdev_get_by_dev+0x200/0x200
[   42.527286][ T2880]  mutex_lock_nested+0x16/0x20
[   42.527768][ T2880]  ? mutex_lock_nested+0x16/0x20
[   42.528250][ T2880]  loop_probe+0x44/0x90 [loop]
[   42.528730][ T2880]  blk_request_module+0xa3/0x100
[   42.529209][ T2880]  blkdev_get_no_open+0x93/0xc0
[   42.529691][ T2880]  blkdev_get_by_dev+0x56/0x200
[   42.530176][ T2880]  ? blkdev_get_by_dev+0x200/0x200
[   42.530688][ T2880]  blkdev_open+0x5d/0xa0
[   42.531170][ T2880]  do_dentry_open+0x163/0x3b0
[   42.531656][ T2880]  vfs_open+0x28/0x30
[   42.532132][ T2880]  path_openat+0x7e6/0xad0
[   42.532612][ T2880]  do_filp_open+0x9f/0x110
[   42.533094][ T2880]  ? _raw_spin_unlock+0x1d/0x30
[   42.533582][ T2880]  ? alloc_fd+0x116/0x200
[   42.534069][ T2880]  do_sys_openat2+0x245/0x310
[   42.534566][ T2880]  do_sys_open+0x48/0x80
[   42.535015][ T2880]  __x64_sys_open+0x1c/0x20
[   42.535462][ T2880]  do_syscall_64+0x36/0x70
[   42.535913][ T2880]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   42.536364][ T2880] RIP: 0033:0x7f16f15737f0
[   42.536797][ T2880] Code: 48 8b 15 83 76 2d 00 f7 d8 64 89 02 48 83 c8 ff c3 66 0f 1f 84 00 00 00 00 00 83 3d cd d7 2d 00 00 75 10 b8 02 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e cf 01 00 48 89 04 24
[   42.538148][ T2880] RSP: 002b:00007ffde9e2df58 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[   42.538671][ T2880] RAX: ffffffffffffffda RBX: 00000000021b5020 RCX: 00007f16f15737f0
[   42.539173][ T2880] RDX: 00000000000001b6 RSI: 0000000000000042 RDI: 00000000021bd310
[   42.539681][ T2880] RBP: 00007ffde9e2dfe0 R08: 0000000000000020 R09: 00000000021bd310
[   42.540187][ T2880] R10: 00000000fffffff0 R11: 0000000000000246 R12: 000000000000000b
[   42.540696][ T2880] R13: 0000000000000001 R14: 0000000000000064 R15: 0000000000000000
[  246.772906][   T35] INFO: task bash:2880 blocked for more than 122 seconds.
[  246.774289][   T35]       Not tainted 5.13.0-rc6+ #867
[  246.775478][   T35] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.776734][   T35] task:bash            state:D stack:    0 pid: 2880 ppid:  2856 flags:0x00004000
[  246.779550][   T35] Call Trace:
[  246.781054][   T35]  __schedule+0x271/0x9e0
[  246.782381][   T35]  schedule+0x50/0xc0
[  246.783671][   T35]  schedule_preempt_disabled+0x10/0x20
[  246.784978][   T35]  __mutex_lock+0x467/0x8f0
[  246.786277][   T35]  ? loop_probe+0x44/0x90 [loop]
[  246.787582][   T35]  ? blkdev_get_by_dev+0x200/0x200
[  246.789093][   T35]  ? blkdev_get_by_dev+0x200/0x200
[  246.790409][   T35]  mutex_lock_nested+0x16/0x20
[  246.791700][   T35]  ? mutex_lock_nested+0x16/0x20
[  246.792992][   T35]  loop_probe+0x44/0x90 [loop]
[  246.794286][   T35]  blk_request_module+0xa3/0x100
[  246.795757][   T35]  blkdev_get_no_open+0x93/0xc0
[  246.797066][   T35]  blkdev_get_by_dev+0x56/0x200
[  246.798360][   T35]  ? blkdev_get_by_dev+0x200/0x200
[  246.799661][   T35]  blkdev_open+0x5d/0xa0
[  246.800953][   T35]  do_dentry_open+0x163/0x3b0
[  246.802258][   T35]  vfs_open+0x28/0x30
[  246.803559][   T35]  path_openat+0x7e6/0xad0
[  246.805201][   T35]  do_filp_open+0x9f/0x110
[  246.807128][   T35]  ? _raw_spin_unlock+0x1d/0x30
[  246.808461][   T35]  ? alloc_fd+0x116/0x200
[  246.809763][   T35]  do_sys_openat2+0x245/0x310
[  246.811034][   T35]  do_sys_open+0x48/0x80
[  246.812257][   T35]  __x64_sys_open+0x1c/0x20
[  246.813416][   T35]  do_syscall_64+0x36/0x70
[  246.814522][   T35]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  246.815624][   T35] RIP: 0033:0x7f16f15737f0
[  246.816707][   T35] RSP: 002b:00007ffde9e2df58 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[  246.817863][   T35] RAX: ffffffffffffffda RBX: 00000000021b5020 RCX: 00007f16f15737f0
[  246.819033][   T35] RDX: 00000000000001b6 RSI: 0000000000000042 RDI: 00000000021bd310
[  246.820059][   T35] RBP: 00007ffde9e2dfe0 R08: 0000000000000020 R09: 00000000021bd310
[  246.820687][   T35] R10: 00000000fffffff0 R11: 0000000000000246 R12: 000000000000000b
[  246.821292][   T35] R13: 0000000000000001 R14: 0000000000000064 R15: 0000000000000000
[  246.821882][   T35] INFO: task rmmod:2882 blocked for more than 122 seconds.
[  246.822456][   T35]       Not tainted 5.13.0-rc6+ #867
[  246.823011][   T35] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.823599][   T35] task:rmmod           state:D stack:    0 pid: 2882 ppid:  2856 flags:0x00000000
[  246.824799][   T35] Call Trace:
[  246.825405][   T35]  __schedule+0x271/0x9e0
[  246.826018][   T35]  schedule+0x50/0xc0
[  246.826621][   T35]  schedule_preempt_disabled+0x10/0x20
[  246.827232][   T35]  __mutex_lock+0x467/0x8f0
[  246.827854][   T35]  ? unregister_blkdev+0x38/0xb0
[  246.828608][   T35]  mutex_lock_nested+0x16/0x20
[  246.829237][   T35]  ? mutex_lock_nested+0x16/0x20
[  246.829858][   T35]  unregister_blkdev+0x38/0xb0
[  246.830478][   T35]  loop_exit+0x44/0xd84 [loop]
[  246.831112][   T35]  __x64_sys_delete_module+0x135/0x210
[  246.831749][   T35]  do_syscall_64+0x36/0x70
[  246.832379][   T35]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  246.833019][   T35] RIP: 0033:0x7ff4990fcee7
[  246.833655][   T35] RSP: 002b:00007ffe445ecab8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  246.834326][   T35] RAX: ffffffffffffffda RBX: 0000000000a21220 RCX: 00007ff4990fcee7
[  246.835011][   T35] RDX: 00007ff499171a80 RSI: 0000000000000800 RDI: 0000000000a21288
[  246.835794][   T35] RBP: 0000000000000000 R08: 00007ff4993c6060 R09: 00007ff499171a80
[  246.836501][   T35] R10: 00007ffe445ec520 R11: 0000000000000206 R12: 00007ffe445ee840
[  246.837206][   T35] R13: 0000000000000000 R14: 0000000000a21220 R15: 0000000000a21010
[  246.837922][   T35] INFO: lockdep is turned off.
----------

Christoph and Desmond commented

  > > And now you can all probe after it has been freed and/or the module has
  > > been unloaded. The obviously correct fix is to only hold mtd_table_mutex
  > > for the actually required critical section:
  > > 
  > 
  > Thank you for the correction, Christoph. I hadn't thought of the 
  > scenario where the module is unloaded. I'll be more conscientious in the 
  > future.

at https://lkml.kernel.org/r/YMs3O/ce1567cf-bc94-790c-cfc0-e4e429e1a86a@gmail.com
but the fact that AB-BA deadlock is possible was forgotten. Therefore, I propose
this patch for fixing commit a160c6159d4a0cf8 ("block: add an optional probe callback
to major_names") which started calling the probe function without making sure that
the module will not go away.

>> @@ -676,14 +679,29 @@ void blk_request_module(dev_t devt)
>>  {
>>  	unsigned int major = MAJOR(devt);
>>  	struct blk_major_name **n;
>> +	void (*probe_fn)(dev_t devt);
>>  
>>  	mutex_lock(&major_names_lock);
>>  	for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
>> -		if ((*n)->major == major && (*n)->probe) {
>> -			(*n)->probe(devt);
>> -			mutex_unlock(&major_names_lock);
>> -			return;
>> -		}
>> +		if ((*n)->major != major || !(*n)->probe)
>> +			continue;
>> +		if (!try_module_get((*n)->owner))
>> +			break;
> 
> So you just fail?  Are you sure that is ok?

After "break;", the control reaches request_module() (which waits for module
unloading to complete and then tries to load that module again).

I think failing open() for once due to racing with module unloading is acceptable
(as long as there is no leak/deadlock/oops etc.), but do we want to immediately
retry this loop after returning from request_module() ?

Also, to make sure that the __init function for module foo and the probe
function for module foo are serialized, should we also verify that
(*n)->owner && (*n)->owner->state == MODULE_STATE_LIVE (which indicates that
the __init function for that module has completed) because module_is_live()
 from try_module_get() is rather weak?

----------
/* FIXME: It'd be nice to isolate modules during init, too, so they
   aren't used before they (may) fail.  But presently too much code
   (IDE & SCSI) require entry into the module during init.*/
static inline bool module_is_live(struct module *mod)
{
	return mod->state != MODULE_STATE_GOING;
}
----------

>> +		/*
>> +		 * Assuming that unregister_blkdev() is called from module's
>> +		 * __exit function, a module refcount taken above allows us
>> +		 * to safely call probe function without major_names_lock held.
>> +		 */
>> +		probe_fn(devt);
>> +		module_put((*n)->owner);
> 
> So you are trying to keep the module in memory while the probe is call,
> ok.  But module removal is not an issue, you remove modules at your own
> risk.  As syzbot isn't even testing this, why is this an issue?

That's a crazy comment. A module removal bug (unless forced unloading is used)
is an issue. Why not fix bugs where syzbot cannot test?

_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

  reply	other threads:[~2021-06-19  8:48 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-19  1:05 [PATCH v2] block: genhd: don't call probe function with major_names_lock held Tetsuo Handa
2021-06-19  1:05 ` Tetsuo Handa
2021-06-19  1:05 ` Tetsuo Handa
2021-06-19  3:24 ` kernel test robot
2021-06-19  3:24   ` kernel test robot
2021-06-19  3:24   ` kernel test robot
2021-06-19  3:24   ` kernel test robot
2021-06-19  6:14 ` kernel test robot
2021-06-19  6:14   ` kernel test robot
2021-06-19  6:14   ` kernel test robot
2021-06-19  6:14   ` kernel test robot
2021-06-19  6:44 ` Greg KH
2021-06-19  6:44   ` Greg KH
2021-06-19  6:44   ` Greg KH
2021-06-19  8:47   ` Tetsuo Handa [this message]
2021-06-19  8:47     ` Tetsuo Handa
2021-06-19  8:47     ` Tetsuo Handa
2021-06-20  2:44     ` Hillf Danton
2021-06-20  2:44       ` Hillf Danton
2021-06-20 13:54       ` Tetsuo Handa
2021-06-20 13:54         ` Tetsuo Handa
2021-06-20 13:54         ` Tetsuo Handa
2021-06-21  8:54         ` Greg KH
2021-06-21  8:54           ` Greg KH
2021-06-21  8:54           ` Greg KH
2021-06-21  6:18 ` Christoph Hellwig
2021-06-21  6:18   ` Christoph Hellwig
2021-06-21  6:18   ` Christoph Hellwig
2021-08-15  6:52 ` [PATCH v3] " Tetsuo Handa
2021-08-15  7:06   ` Greg KH
2021-08-15  7:49     ` Tetsuo Handa
2021-08-15  9:19       ` Greg KH
2021-08-18 11:07         ` [PATCH v4] " Tetsuo Handa
2021-08-18 13:27           ` Greg KH
2021-08-18 14:44             ` Tetsuo Handa
2021-08-18 15:28               ` Greg KH
2021-08-21  6:12                 ` [PATCH v5] " Tetsuo Handa
2021-08-18 13:47           ` [PATCH v4] " Christoph Hellwig
2021-08-18 14:34             ` Tetsuo Handa
2021-08-18 14:41               ` Greg KH
2021-08-18 14:51                 ` Tetsuo Handa
2021-08-19  9:16                   ` Christoph Hellwig
2021-08-19 14:47                     ` Tetsuo Handa
2021-08-19  9:19               ` Christoph Hellwig
2021-08-19 14:23                 ` Tetsuo Handa
2021-08-19 15:10                   ` Greg KH
2021-08-16  7:33   ` [PATCH v3] " Christoph Hellwig
2021-08-16 14:44     ` Tetsuo Handa
     [not found]     ` <20210817081045.3609-1-hdanton@sina.com>
2021-08-17 10:18       ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c4edf07f-92e1-a350-2743-f0b0234a2b6c@i-love.sakura.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=axboe@kernel.dk \
    --cc=desmondcheongzx@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel-mentees@lists.linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=miquel.raynal@bootlin.com \
    --cc=richard@nod.at \
    --cc=skhan@linuxfoundation.org \
    --cc=syzbot+6a8a0d93c91e8fbf2e80@syzkaller.appspotmail.com \
    --cc=vigneshr@ti.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.