All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: tj@kernel.org, gregkh@linuxfoundation.org,
	akpm@linux-foundation.org, minchan@kernel.org, jeyu@kernel.org,
	shuah@kernel.org, bvanassche@acm.org, dan.j.williams@intel.com,
	joe@perches.com, tglx@linutronix.de, keescook@chromium.org,
	rostedt@goodmis.org, linux-spdx@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v8 09/12] sysfs: fix deadlock race with module removal
Date: Wed, 13 Oct 2021 09:07:03 +0800	[thread overview]
Message-ID: <YWYxN875B6rlmAjC@T590> (raw)
In-Reply-To: <YWX7pAn0YMaJeJBA@bombadil.infradead.org>

On Tue, Oct 12, 2021 at 02:18:28PM -0700, Luis Chamberlain wrote:
> On Tue, Oct 12, 2021 at 08:20:46AM +0800, Ming Lei wrote:
> > On Mon, Oct 11, 2021 at 02:25:46PM -0700, Luis Chamberlain wrote:
> > > On Tue, Oct 05, 2021 at 05:24:18PM +0800, Ming Lei wrote:
> > > > On Mon, Sep 27, 2021 at 09:38:02AM -0700, Luis Chamberlain wrote:
> > > > > When driver sysfs attributes use a lock also used on module removal we
> > > > > can race to deadlock. This happens when for instance a sysfs file on
> > > > > a driver is used, then at the same time we have module removal call
> > > > > trigger. The module removal call code holds a lock, and then the
> > > > > driver's sysfs file entry waits for the same lock. While holding the
> > > > > lock the module removal tries to remove the sysfs entries, but these
> > > > > cannot be removed yet as one is waiting for a lock. This won't complete
> > > > > as the lock is already held. Likewise module removal cannot complete,
> > > > > and so we deadlock.
> > > > > 
> > > > > This can now be easily reproducible with our sysfs selftest as follows:
> > > > > 
> > > > > ./tools/testing/selftests/sysfs/sysfs.sh -t 0027
> > > > > 
> > > > > This uses a local driver lock. Test 0028 can also be used, that uses
> > > > > the rtnl_lock():
> > > > > 
> > > > > ./tools/testing/selftests/sysfs/sysfs.sh -t 0028
> > > > > 
> > > > > To fix this we extend the struct kernfs_node with a module reference
> > > > > and use the try_module_get() after kernfs_get_active() is called. As
> > > > > documented in the prior patch, we now know that once kernfs_get_active()
> > > > > is called the module is implicitly guarded to exist and cannot be removed.
> > > > > This is because the module is the one in charge of removing the same
> > > > > sysfs file it created, and removal of sysfs files on module exit will wait
> > > > > until they don't have any active references. By using a try_module_get()
> > > > > after kernfs_get_active() we yield to let module removal trump calls to
> > > > > process a sysfs operation, while also preventing module removal if a sysfs
> > > > > operation is in already progress. This prevents the deadlock.
> > > > > 
> > > > > This deadlock was first reported with the zram driver, however the live
> > > > 
> > > > Looks not see the lock pattern you mentioned in zram driver, can you
> > > > share the related zram code?
> > > 
> > > I recommend to not look at the zram driver, instead look at the
> > > test_sysfs driver as that abstracts the issue more clearly and uses
> > 
> > Looks test_sysfs isn't in linus tree, where can I find it?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20210927-sysfs-generic-deadlock-fix
> 
> > Also please
> > update your commit log about this wrong info if it can't be applied on
> > zram.
> 
> It does apply to zram, it is just that I have other fixes for zram in
> my pipeline which will change the zram driver further, and so what makes
> more sense is to abstract the issue into a selftest driver to
> demonstrate the issue more clearly.
> 
> To reproduce the deadlock revert the patch in this thread and then run
> either of these two tests as root:
> 
> ./tools/testing/selftests/sysfs/sysfs.sh -w 0027
> ./tools/testing/selftests/sysfs/sysfs.sh -w 0028
> 
> You will need to enable the test_sysfs driver.
> 
> > > two different locks as an example. The point is that if on module
> > > removal *any* lock is used which is *also* used on the sysfs file
> > > created by the module, you can deadlock.
> > > 
> > > > > And this can lead to this condition:
> > > > > 
> > > > > CPU A                              CPU B
> > > > >                                    foo_store()
> > > > > foo_exit()
> > > > >   mutex_lock(&foo)
> > > > >                                    mutex_lock(&foo)
> > > > >    del_gendisk(some_struct->disk);
> > > > >      device_del()
> > > > >        device_remove_groups()
> > > > 
> > > > I guess the deadlock exists if foo_exit() is called anywhere. If yes,
> > > > look the issue may not be related with removing module directly, right?
> > > 
> > > No, the reason this can deadlock is that the module exit routine will
> > > patiently wait for the sysfs / kernfs files to be stop being used,
> > 
> > Can you share the code which waits for the sysfs / kernfs files to be
> > stop being used?
> 
> How about a call trace of the two tasks which deadlock, here is one of
> running test 0027:
> 
> kdevops login: [  363.875459] INFO: task sysfs.sh:1271 blocked for more
> than 120 seconds.
> [  363.878341]       Tainted: G            E
> 5.15.0-rc3-next-20210927+ #83
> [  363.881218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  363.882255] task:sysfs.sh        state:D stack:    0 pid: 1271 ppid:
> 1 flags:0x00000004
> [  363.882894] Call Trace:
> [  363.883091]  <TASK>
> [  363.883259]  __schedule+0x2fd/0x990
> [  363.883551]  schedule+0x43/0xe0
> [  363.883800]  schedule_preempt_disabled+0x14/0x20
> [  363.884160]  __mutex_lock.constprop.0+0x249/0x470
> [  363.884524]  test_dev_x_store+0xa5/0xc0 [test_sysfs]
> [  363.884915]  kernfs_fop_write_iter+0x177/0x220
> [  363.885257]  new_sync_write+0x11c/0x1b0
> [  363.885556]  vfs_write+0x20d/0x2a0
> [  363.885821]  ksys_write+0x5f/0xe0
> [  363.886081]  do_syscall_64+0x38/0xc0
> [  363.886359]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  363.886748] RIP: 0033:0x7fee00f8bf33
> [  363.887029] RSP: 002b:00007ffd372c5d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  363.887633] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fee00f8bf33
> [  363.888217] RDX: 0000000000000003 RSI: 000055a4d14a0db0 RDI: 0000000000000001
> [  363.888761] RBP: 000055a4d14a0db0 R08: 000000000000000a R09: 0000000000000002
> [  363.889267] R10: 000055a4d1554ac0 R11: 0000000000000246 R12: 0000000000000003
> [  363.889983] R13: 00007fee0105c6a0 R14: 0000000000000003 R15: 00007fee0105c8a0
> [  363.890513]  </TASK>
> [  363.890709] INFO: task modprobe:1276 blocked for more than 120 seconds.
> [  363.891185]       Tainted: G            E 5.15.0-rc3-next-20210927+ #83
> [  363.891781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  363.892353] task:modprobe        state:D stack:    0 pid: 1276 ppid: 1 flags:0x00004000
> [  363.892955] Call Trace:
> [  363.893141]  <TASK>
> [  363.893457]  __schedule+0x2fd/0x990
> [  363.893865]  schedule+0x43/0xe0
> [  363.894246]  __kernfs_remove.part.0+0x21e/0x2a0
> [  363.894704]  ? do_wait_intr_irq+0xa0/0xa0
> [  363.895142]  kernfs_remove_by_name_ns+0x50/0x90
> [  363.895632]  remove_files+0x2b/0x60
> [  363.896035]  sysfs_remove_group+0x38/0x80
> [  363.896470]  sysfs_remove_groups+0x29/0x40
> [  363.896912]  device_remove_attrs+0x5b/0x90
> [  363.897352]  device_del+0x183/0x400
> [  363.897758]  unregister_test_dev_sysfs+0x5b/0xaa [test_sysfs]
> [  363.898317]  test_sysfs_exit+0x45/0xfb0 [test_sysfs]
> [  363.898833]  __do_sys_delete_module+0x18d/0x2a0
> [  363.899329]  ? fpregs_assert_state_consistent+0x1e/0x40
> [  363.899868]  ? exit_to_user_mode_prepare+0x3a/0x180
> [  363.900390]  do_syscall_64+0x38/0xc0
> [  363.900810]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  363.901330] RIP: 0033:0x7f21915c57d7
> [  363.901747] RSP: 002b:00007ffd90869fe8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> [  363.902442] RAX: ffffffffffffffda RBX: 000055ce676ffc30 RCX: 00007f21915c57d7
> [  363.903104] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055ce676ffc98
> [  363.903782] RBP: 000055ce676ffc30 R08: 0000000000000000 R09: 0000000000000000
> [  363.904462] R10: 00007f2191638ac0 R11: 0000000000000206 R12: 000055ce676ffc98
> [  363.905128] R13: 0000000000000000 R14: 0000000000000000 R15: 000055ce676ffdf0
> [  363.905797]  </TASK>

That doesn't show the deadlock is related with module_exit().

> 
> 
> And gdb:
> 
> (gdb) l *(__kernfs_remove+0x21e)
> 0xffffffff8139288e is in __kernfs_remove (fs/kernfs/dir.c:476).
> 471                     if (atomic_read(&kn->active) != KN_DEACTIVATED_BIAS)
> 472                             lock_contended(&kn->dep_map, _RET_IP_);
> 473             }
> 474
> 475             /* but everyone should wait for draining */
> 476             wait_event(root->deactivate_waitq,
> 477                        atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
> 478
> 479             if (kernfs_lockdep(kn)) {
> 480                     lock_acquired(&kn->dep_map, _RET_IP_);
> 
> (gdb) l *(kernfs_remove_by_name_ns+0x50)
> 0xffffffff813938d0 is in kernfs_remove_by_name_ns (fs/kernfs/dir.c:1534).
> 1529
> 1530            kn = kernfs_find_ns(parent, name, ns);
> 1531            if (kn)
> 1532                    __kernfs_remove(kn);
> 1533
> 1534            up_write(&kernfs_rwsem);
> 1535
> 1536            if (kn)
> 1537                    return 0;
> 1538            else
> 
> The same happens for test 0028 except instead of a mutex
> lock an rtnl_lock() is used.
> 
> Would this be better for the commit log?
> 
> > And why does it make a difference in case of being
> > called from module_exit()?
> 
> Well because that is where we remove the sysfs files. *If*
> a developer happens to use a lock on a sysfs op but it is
> also used on module exit, this deadlock is bound to happen.

It is clearly one AA deadlock, what I meant was that it isn't related with
module exit cause lock & device_del() isn't always done in module exit, so
I doubt your fix with grabbing module refcnt is good or generic enough.

Except for your cooked test_sys module, how many real drivers do suffer the
problem? What are they? Why can't we fix the exact driver?


Thanks,
Ming


  reply	other threads:[~2021-10-13  1:07 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-27 16:37 [PATCH v8 00/12] syfs: generic deadlock fix with module removal Luis Chamberlain
2021-09-27 16:37 ` [PATCH v8 01/12] LICENSES: Add the copyleft-next-0.3.1 license Luis Chamberlain
     [not found]   ` <202110050907.35FBD2A1@keescook>
     [not found]     ` <YWR2ZrtzChamY1y4@bombadil.infradead.org>
2021-10-11 17:57       ` Kees Cook
2021-09-27 16:37 ` [PATCH v8 02/12] testing: use the copyleft-next-0.3.1 SPDX tag Luis Chamberlain
2021-10-05 16:11   ` Kees Cook
2021-09-27 16:37 ` [PATCH v8 03/12] selftests: add tests_sysfs module Luis Chamberlain
2021-10-05 14:16   ` Greg KH
2021-10-05 16:57     ` Tim.Bird
2021-10-11 17:40       ` Luis Chamberlain
2021-10-11 17:38     ` Luis Chamberlain
2021-10-07 14:23   ` Miroslav Benes
2021-10-11 19:11     ` Luis Chamberlain
     [not found]   ` <202110050912.3DF681ED@keescook>
2021-10-11 19:03     ` Luis Chamberlain
2021-09-27 16:37 ` [PATCH v8 04/12] kernfs: add initial failure injection support Luis Chamberlain
2021-10-05 19:47   ` Kees Cook
2021-10-11 20:44     ` Luis Chamberlain
2021-09-27 16:37 ` [PATCH v8 05/12] test_sysfs: add support to use kernfs failure injection Luis Chamberlain
2021-10-05 19:51   ` Kees Cook
2021-10-11 20:56     ` Luis Chamberlain
2021-09-27 16:37 ` [PATCH v8 06/12] kernel/module: add documentation for try_module_get() Luis Chamberlain
2021-10-05 19:58   ` Kees Cook
2021-10-11 21:16     ` Luis Chamberlain
2021-09-27 16:38 ` [PATCH v8 07/12] fs/kernfs/symlink.c: replace S_IRWXUGO with 0777 on kernfs_create_link() Luis Chamberlain
2021-10-05 19:59   ` Kees Cook
2021-09-27 16:38 ` [PATCH v8 08/12] fs/sysfs/dir.c: replace S_IRWXU|S_IRUGO|S_IXUGO with 0755 sysfs_create_dir_ns() Luis Chamberlain
2021-10-05 16:05   ` Kees Cook
2021-09-27 16:38 ` [PATCH v8 09/12] sysfs: fix deadlock race with module removal Luis Chamberlain
2021-10-05  9:24   ` Ming Lei
2021-10-11 21:25     ` Luis Chamberlain
2021-10-12  0:20       ` Ming Lei
2021-10-12 21:18         ` Luis Chamberlain
2021-10-13  1:07           ` Ming Lei [this message]
2021-10-13 12:35             ` Luis Chamberlain
2021-10-13 15:04               ` Ming Lei
2021-10-13 21:16                 ` Luis Chamberlain
2021-10-05 20:50   ` Kees Cook
2021-10-11 22:26     ` Luis Chamberlain
2021-10-13 12:41       ` Luis Chamberlain
2021-09-27 16:38 ` [PATCH v8 10/12] test_sysfs: enable deadlock tests by default Luis Chamberlain
2021-09-27 16:38 ` [PATCH v8 11/12] zram: fix crashes with cpu hotplug multistate Luis Chamberlain
2021-10-05 20:55   ` Kees Cook
2021-10-11 18:27     ` Luis Chamberlain
2021-10-14  1:55   ` Ming Lei
2021-10-14  2:11     ` Ming Lei
2021-10-14 20:24       ` Luis Chamberlain
2021-10-14 23:52         ` Ming Lei
2021-10-15  0:22           ` Luis Chamberlain
2021-10-15  8:36             ` Ming Lei
2021-10-15  8:52               ` Greg KH
2021-10-15 17:31               ` Luis Chamberlain
2021-10-16 11:28                 ` Ming Lei
2021-10-18 19:32                   ` Luis Chamberlain
2021-10-19  2:34                     ` Ming Lei
2021-10-19  6:23                       ` Miroslav Benes
2021-10-19  9:23                         ` Ming Lei
2021-10-20  6:43                           ` Miroslav Benes
2021-10-20  7:49                             ` Ming Lei
2021-10-20  8:19                               ` Miroslav Benes
2021-10-20  8:28                                 ` Greg KH
2021-10-25  9:58                                   ` Miroslav Benes
2021-10-20 10:09                                 ` Ming Lei
2021-10-26  8:48                                   ` Petr Mladek
2021-10-26 15:37                                     ` Ming Lei
2021-10-26 17:01                                       ` Luis Chamberlain
2021-10-27 11:57                                         ` Miroslav Benes
2021-10-27 14:27                                           ` Luis Chamberlain
2021-11-02 15:24                                           ` Petr Mladek
2021-11-02 16:25                                             ` Luis Chamberlain
2021-11-03  0:01                                               ` Ming Lei
2021-11-03 12:44                                                 ` Luis Chamberlain
2021-10-27 11:42                                       ` Miroslav Benes
2021-11-02 14:15                                       ` Petr Mladek
2021-11-02 14:51                                         ` Petr Mladek
2021-11-02 15:17                                           ` Ming Lei
2021-11-02 14:56                                         ` Ming Lei
2021-10-19 15:28                       ` Luis Chamberlain
2021-10-19 16:29                         ` Ming Lei
2021-10-19 19:36                           ` Luis Chamberlain
2021-10-20  1:15                             ` Ming Lei
2021-10-20 15:48                               ` Luis Chamberlain
2021-10-21  0:39                                 ` Ming Lei
2021-10-21 17:18                                   ` Luis Chamberlain
2021-10-22  0:05                                     ` Ming Lei
2021-10-19 15:50                       ` Luis Chamberlain
2021-10-19 16:25                         ` Greg KH
2021-10-19 16:30                           ` Luis Chamberlain
2021-10-19 17:28                             ` Greg KH
2021-10-19 19:46                               ` Luis Chamberlain
2021-10-19 16:39                         ` Ming Lei
2021-10-19 19:38                           ` Luis Chamberlain
2021-10-20  0:55                             ` Ming Lei
2021-09-27 16:38 ` [PATCH v8 12/12] zram: use ATTRIBUTE_GROUPS to fix sysfs deadlock module removal Luis Chamberlain
2021-10-05 20:57   ` Kees Cook
2021-10-11 18:28     ` Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YWYxN875B6rlmAjC@T590 \
    --to=ming.lei@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bvanassche@acm.org \
    --cc=dan.j.williams@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jeyu@kernel.org \
    --cc=joe@perches.com \
    --cc=keescook@chromium.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-spdx@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=minchan@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.