linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	tj@kernel.org, gregkh@linuxfoundation.org,
	akpm@linux-foundation.org, minchan@kernel.org, jeyu@kernel.org,
	shuah@kernel.org, bvanassche@acm.org, dan.j.williams@intel.com,
	joe@perches.com, tglx@linutronix.de, keescook@chromium.org,
	rostedt@goodmis.org, linux-spdx@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, ming.lei@redhat.com
Subject: Re: [PATCH v8 11/12] zram: fix crashes with cpu hotplug multistate
Date: Wed, 20 Oct 2021 08:55:57 +0800	[thread overview]
Message-ID: <YW9pHcJiWYLjI04/@T590> (raw)
In-Reply-To: <YW8ewvAac/T2vxz2@bombadil.infradead.org>

On Tue, Oct 19, 2021 at 12:38:42PM -0700, Luis Chamberlain wrote:
> On Wed, Oct 20, 2021 at 12:39:22AM +0800, Ming Lei wrote:
> > On Tue, Oct 19, 2021 at 08:50:24AM -0700, Luis Chamberlain wrote:
> > > So do you want to take the position:
> > > 
> > > Hey driver authors: you cannot use any shared lock on module removal and
> > > on sysfs ops?
> > 
> > IMO, yes, in your patch of 'zram: fix crashes with cpu hotplug multistate',
> > when you added mutex_lock(zram_index_mutex) to disksize_store() and
> > other attribute show() or store() method. You have added new deadlock
> > between hot_remove_store() and disksize_store() & others, which can't be
> > addressed by your approach of holding module refcnt.
> > 
> > So far not see ltp tests covers hot add/remove interface yet.
> 
> Care to show what commands to use to cause this deadlock with my patches?

Build a kernel with your patch 4,7,8,9,11 and 12(all others are test module or
document change), with lockdep enabled, run the following command, then you
will see the warning, and it is one real deadlock, not false warning.

BTW, your patch 9 can't be applied cleanly against both linus and next
tree, so I edited it manually, but that can't make difference wrt. this issue.


[root@ktest-09 ~]# lsblk | grep zram
zram0   253:0    0    0B  0 disk 
cat /sys/class/zram-control/hot_add
[root@ktest-09 ~]# lsblk | grep zram
zram0   253:0    0    0B  0 disk 
zram1   253:1    0    0B  0 disk 
[root@ktest-09 ~]# echo 256M > /sys/block/zram1/disksize 
[root@ktest-09 ~]# echo 1 >  /sys/class/zram-control/hot_remove 
[root@ktest-09 ~]# dmesg
...
[   75.599882] ======================================================
[   75.601355] WARNING: possible circular locking dependency detected
[   75.602818] 5.15.0-rc3_zram_fix_luis+ #24 Not tainted
[   75.604038] ------------------------------------------------------
[   75.605512] bash/1154 is trying to acquire lock:
[   75.606634] ffff91ce026cd428 (kn->active#237){++++}-{0:0}, at: __kernfs_remove+0x1ab/0x1e0
[   75.608570]
               but task is already holding lock:
[   75.609955] ffffffff839e3ef0 (zram_index_mutex){+.+.}-{3:3}, at: hot_remove_store+0x52/0xf0
[   75.611910]
               which lock already depends on the new lock.

[   75.613896]
               the existing dependency chain (in reverse order) is:
[   75.615830]
               -> #1 (zram_index_mutex){+.+.}-{3:3}:
[   75.617483]        __lock_acquire+0x4d2/0x930
[   75.618650]        lock_acquire+0xbb/0x2d0
[   75.619748]        __mutex_lock+0x8e/0x8a0
[   75.620854]        disksize_store+0x38/0x180
[   75.621996]        kernfs_fop_write_iter+0x134/0x1d0
[   75.623287]        new_sync_write+0x122/0x1b0
[   75.624442]        vfs_write+0x23e/0x350
[   75.625506]        ksys_write+0x68/0xe0
[   75.626550]        do_syscall_64+0x3b/0x90
[   75.627649]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   75.629070]
               -> #0 (kn->active#237){++++}-{0:0}:
[   75.630677]        check_prev_add+0x91/0xc10
[   75.631816]        validate_chain+0x474/0x500
[   75.632972]        __lock_acquire+0x4d2/0x930
[   75.634131]        lock_acquire+0xbb/0x2d0
[   75.635234]        kernfs_drain+0x139/0x190
[   75.636355]        __kernfs_remove+0x1ab/0x1e0
[   75.637532]        kernfs_remove_by_name_ns+0x3f/0x80
[   75.638843]        remove_files+0x2b/0x60
[   75.639926]        sysfs_remove_group+0x38/0x80
[   75.641120]        sysfs_remove_groups+0x29/0x40
[   75.642334]        device_remove_attrs+0x5b/0x90
[   75.643552]        device_del+0x184/0x400
[   75.644635]        zram_remove+0xac/0xc0
[   75.645700]        hot_remove_store+0xa3/0xf0
[   75.646856]        kernfs_fop_write_iter+0x134/0x1d0
[   75.648147]        new_sync_write+0x122/0x1b0
[   75.649311]        vfs_write+0x23e/0x350
[   75.650372]        ksys_write+0x68/0xe0
[   75.651412]        do_syscall_64+0x3b/0x90
[   75.652512]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   75.653929]
               other info that might help us debug this:

[   75.656054]  Possible unsafe locking scenario:

[   75.657637]        CPU0                    CPU1
[   75.658833]        ----                    ----
[   75.660020]   lock(zram_index_mutex);
[   75.661024]                                lock(kn->active#237);
[   75.662549]                                lock(zram_index_mutex);
[   75.664103]   lock(kn->active#237);
[   75.665072]
                *** DEADLOCK ***

[   75.666736] 4 locks held by bash/1154:
[   75.667767]  #0: ffff91ce06983470 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x68/0xe0
[   75.669802]  #1: ffff91ce4123d290 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x100/0x1d0
[   75.672050]  #2: ffff91ce05a7ac40 (kn->active#238){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x108/0x1d0
[   75.674383]  #3: ffffffff839e3ef0 (zram_index_mutex){+.+.}-{3:3}, at: hot_remove_store+0x52/0xf0
[   75.676595]
               stack backtrace:
[   75.677835] CPU: 2 PID: 1154 Comm: bash Not tainted 5.15.0-rc3_zram_fix_luis+ #24
[   75.679768] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
[   75.681927] Call Trace:
[   75.682674]  dump_stack_lvl+0x57/0x7d
[   75.683680]  check_noncircular+0xff/0x110
[   75.684758]  ? stack_trace_save+0x4b/0x70
[   75.685843]  check_prev_add+0x91/0xc10
[   75.686867]  ? add_chain_cache+0x112/0x2d0
[   75.687965]  validate_chain+0x474/0x500
[   75.689005]  __lock_acquire+0x4d2/0x930
[   75.690054]  lock_acquire+0xbb/0x2d0
[   75.691038]  ? __kernfs_remove+0x1ab/0x1e0
[   75.692131]  ? __lock_release+0x179/0x2c0
[   75.693212]  ? kernfs_drain+0x5b/0x190
[   75.694239]  kernfs_drain+0x139/0x190
[   75.695240]  ? __kernfs_remove+0x1ab/0x1e0
[   75.696341]  __kernfs_remove+0x1ab/0x1e0
[   75.697408]  kernfs_remove_by_name_ns+0x3f/0x80
[   75.698607]  remove_files+0x2b/0x60
[   75.699576]  sysfs_remove_group+0x38/0x80
[   75.700661]  sysfs_remove_groups+0x29/0x40
[   75.701770]  device_remove_attrs+0x5b/0x90
[   75.702870]  device_del+0x184/0x400
[   75.703835]  zram_remove+0xac/0xc0
[   75.704785]  hot_remove_store+0xa3/0xf0
[   75.705831]  kernfs_fop_write_iter+0x134/0x1d0
[   75.707004]  new_sync_write+0x122/0x1b0
[   75.708048]  ? __do_fast_syscall_32+0xe0/0xf0
[   75.709214]  vfs_write+0x23e/0x350
[   75.710161]  ksys_write+0x68/0xe0
[   75.711088]  do_syscall_64+0x3b/0x90
[   75.712078]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   75.713389] RIP: 0033:0x7fcc1893f927
[   75.714381] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[   75.718879] RSP: 002b:00007ffcd56d91a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   75.720832] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fcc1893f927
[   75.722592] RDX: 0000000000000002 RSI: 000055d7d33f78c0 RDI: 0000000000000001
[   75.724352] RBP: 000055d7d33f78c0 R08: 0000000000000000 R09: 00007fcc189f44e0
[   75.726123] R10: 00007fcc189f43e0 R11: 0000000000000246 R12: 0000000000000002
[   75.727884] R13: 00007fcc18a395a0 R14: 0000000000000002 R15: 00007fcc18a397a0



Thanks, 
Ming


  reply	other threads:[~2021-10-20  0:56 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-27 16:37 [PATCH v8 00/12] syfs: generic deadlock fix with module removal Luis Chamberlain
2021-09-27 16:37 ` [PATCH v8 01/12] LICENSES: Add the copyleft-next-0.3.1 license Luis Chamberlain
     [not found]   ` <202110050907.35FBD2A1@keescook>
     [not found]     ` <YWR2ZrtzChamY1y4@bombadil.infradead.org>
2021-10-11 17:57       ` Kees Cook
2021-09-27 16:37 ` [PATCH v8 02/12] testing: use the copyleft-next-0.3.1 SPDX tag Luis Chamberlain
2021-10-05 16:11   ` Kees Cook
2021-09-27 16:37 ` [PATCH v8 03/12] selftests: add tests_sysfs module Luis Chamberlain
2021-10-05 14:16   ` Greg KH
2021-10-05 16:57     ` Tim.Bird
2021-10-11 17:40       ` Luis Chamberlain
2021-10-11 17:38     ` Luis Chamberlain
2021-10-07 14:23   ` Miroslav Benes
2021-10-11 19:11     ` Luis Chamberlain
     [not found]   ` <202110050912.3DF681ED@keescook>
2021-10-11 19:03     ` Luis Chamberlain
2021-09-27 16:37 ` [PATCH v8 04/12] kernfs: add initial failure injection support Luis Chamberlain
2021-10-05 19:47   ` Kees Cook
2021-10-11 20:44     ` Luis Chamberlain
2021-09-27 16:37 ` [PATCH v8 05/12] test_sysfs: add support to use kernfs failure injection Luis Chamberlain
2021-10-05 19:51   ` Kees Cook
2021-10-11 20:56     ` Luis Chamberlain
2021-09-27 16:37 ` [PATCH v8 06/12] kernel/module: add documentation for try_module_get() Luis Chamberlain
2021-10-05 19:58   ` Kees Cook
2021-10-11 21:16     ` Luis Chamberlain
2021-09-27 16:38 ` [PATCH v8 07/12] fs/kernfs/symlink.c: replace S_IRWXUGO with 0777 on kernfs_create_link() Luis Chamberlain
2021-10-05 19:59   ` Kees Cook
2021-09-27 16:38 ` [PATCH v8 08/12] fs/sysfs/dir.c: replace S_IRWXU|S_IRUGO|S_IXUGO with 0755 sysfs_create_dir_ns() Luis Chamberlain
2021-10-05 16:05   ` Kees Cook
2021-09-27 16:38 ` [PATCH v8 09/12] sysfs: fix deadlock race with module removal Luis Chamberlain
2021-10-05  9:24   ` Ming Lei
2021-10-11 21:25     ` Luis Chamberlain
2021-10-12  0:20       ` Ming Lei
2021-10-12 21:18         ` Luis Chamberlain
2021-10-13  1:07           ` Ming Lei
2021-10-13 12:35             ` Luis Chamberlain
2021-10-13 15:04               ` Ming Lei
2021-10-13 21:16                 ` Luis Chamberlain
2021-10-05 20:50   ` Kees Cook
2021-10-11 22:26     ` Luis Chamberlain
2021-10-13 12:41       ` Luis Chamberlain
2021-09-27 16:38 ` [PATCH v8 10/12] test_sysfs: enable deadlock tests by default Luis Chamberlain
2021-09-27 16:38 ` [PATCH v8 11/12] zram: fix crashes with cpu hotplug multistate Luis Chamberlain
2021-10-05 20:55   ` Kees Cook
2021-10-11 18:27     ` Luis Chamberlain
2021-10-14  1:55   ` Ming Lei
2021-10-14  2:11     ` Ming Lei
2021-10-14 20:24       ` Luis Chamberlain
2021-10-14 23:52         ` Ming Lei
2021-10-15  0:22           ` Luis Chamberlain
2021-10-15  8:36             ` Ming Lei
2021-10-15  8:52               ` Greg KH
2021-10-15 17:31               ` Luis Chamberlain
2021-10-16 11:28                 ` Ming Lei
2021-10-18 19:32                   ` Luis Chamberlain
2021-10-19  2:34                     ` Ming Lei
2021-10-19  6:23                       ` Miroslav Benes
2021-10-19  9:23                         ` Ming Lei
2021-10-20  6:43                           ` Miroslav Benes
2021-10-20  7:49                             ` Ming Lei
2021-10-20  8:19                               ` Miroslav Benes
2021-10-20  8:28                                 ` Greg KH
2021-10-25  9:58                                   ` Miroslav Benes
2021-10-20 10:09                                 ` Ming Lei
2021-10-26  8:48                                   ` Petr Mladek
2021-10-26 15:37                                     ` Ming Lei
2021-10-26 17:01                                       ` Luis Chamberlain
2021-10-27 11:57                                         ` Miroslav Benes
2021-10-27 14:27                                           ` Luis Chamberlain
2021-11-02 15:24                                           ` Petr Mladek
2021-11-02 16:25                                             ` Luis Chamberlain
2021-11-03  0:01                                               ` Ming Lei
2021-11-03 12:44                                                 ` Luis Chamberlain
2021-10-27 11:42                                       ` Miroslav Benes
2021-11-02 14:15                                       ` Petr Mladek
2021-11-02 14:51                                         ` Petr Mladek
2021-11-02 15:17                                           ` Ming Lei
2021-11-02 14:56                                         ` Ming Lei
2021-10-19 15:28                       ` Luis Chamberlain
2021-10-19 16:29                         ` Ming Lei
2021-10-19 19:36                           ` Luis Chamberlain
2021-10-20  1:15                             ` Ming Lei
2021-10-20 15:48                               ` Luis Chamberlain
2021-10-21  0:39                                 ` Ming Lei
2021-10-21 17:18                                   ` Luis Chamberlain
2021-10-22  0:05                                     ` Ming Lei
2021-10-19 15:50                       ` Luis Chamberlain
2021-10-19 16:25                         ` Greg KH
2021-10-19 16:30                           ` Luis Chamberlain
2021-10-19 17:28                             ` Greg KH
2021-10-19 19:46                               ` Luis Chamberlain
2021-10-19 16:39                         ` Ming Lei
2021-10-19 19:38                           ` Luis Chamberlain
2021-10-20  0:55                             ` Ming Lei [this message]
2021-09-27 16:38 ` [PATCH v8 12/12] zram: use ATTRIBUTE_GROUPS to fix sysfs deadlock module removal Luis Chamberlain
2021-10-05 20:57   ` Kees Cook
2021-10-11 18:28     ` Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YW9pHcJiWYLjI04/@T590 \
    --to=ming.lei@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=bvanassche@acm.org \
    --cc=dan.j.williams@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jeyu@kernel.org \
    --cc=joe@perches.com \
    --cc=keescook@chromium.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-spdx@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=minchan@kernel.org \
    --cc=paulus@samba.org \
    --cc=rostedt@goodmis.org \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).