From: Guoqing Jiang <guoqing.jiang@linux.dev>
To: Donald Buczek <buczek@molgen.mpg.de>,
	Logan Gunthorpe <logang@deltatee.com>, Song Liu <song@kernel.org>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held
Date: Mon, 23 May 2022 09:08:22 +0800	[thread overview]
Message-ID: <5b0584a3-c128-cb53-7c8a-63744c60c667@linux.dev> (raw)
In-Reply-To: <836b2a93-65be-8d6c-8610-18373b88f86d@molgen.mpg.de>



On 5/22/22 2:23 AM, Donald Buczek wrote:
> On 20.05.22 20:27, Logan Gunthorpe wrote:
>>
>> Hi,
>>
>> On 2022-05-10 00:44, Song Liu wrote:
>>> On Mon, May 9, 2022 at 1:09 AM Guoqing Jiang 
>>> <guoqing.jiang@linux.dev> wrote:
>>>> On 5/9/22 2:37 PM, Song Liu wrote:
>>>>> On Fri, May 6, 2022 at 4:37 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>>>>>> From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>>>>>>
>>>>>> Unregistering the sync_thread doesn't need to hold reconfig_mutex since
>>>>>> it doesn't reconfigure the array.
>>>>>>
>>>>>> And it could cause a deadlock for raid5 as follows:
>>>>>>
>>>>>> 1. Process A tried to reap the sync thread with reconfig_mutex held
>>>>>>    after echoing idle to sync_action.
>>>>>> 2. The raid5 sync thread was blocked because there were too many
>>>>>>    active stripes.
>>>>>> 3. SB_CHANGE_PENDING was set (because write IO came from the upper
>>>>>>    layer), which meant the number of active stripes couldn't be
>>>>>>    decreased.
>>>>>> 4. SB_CHANGE_PENDING couldn't be cleared since md_check_recovery was
>>>>>>    not able to hold reconfig_mutex.
>>>>>>
>>>>>> More details in the link:
>>>>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t 
>>>>>>
>>>>>>
>>>>>> Let's unregister the thread between mddev_unlock and mddev_lock_nointr
>>>>>> (thanks to the kernel test robot <lkp@intel.com> for the report) if
>>>>>> reconfig_mutex is held; mddev_is_locked is introduced accordingly.
>>>>> mddev_is_locked() feels really hacky to me. It cannot tell whether
>>>>> mddev is locked by the current thread. So technically, we could unlock
>>>>> reconfig_mutex for another thread by accident, no?
>>>>
>>>> I can switch back to V2 if you think that is the correct way to go,
>>>> though no one commented on the change in dm-raid.
>>>
>>> I guess v2 is the best at the moment. I pushed a slightly modified 
>>> v2 to
>>> md-next.
>>>
>>
>> I noticed a clear regression with mdadm tests with this patch in md-next
>> (7e6ba434cc6080).
>>
>> Before the patch, tests 07reshape5intr and 07revert-grow would fail
>> fairly infrequently (about 1 in 4 runs for the former and 1 in 30 runs
>> for the latter).
>>
>> After this patch, both tests always fail.
>>
>> I don't have time to dig into why this is, but it would be nice if
>> someone could at least fix the regression. It is hard to make progress
>> on these tests if we keep breaking them further.
>
> Hmmm. I wanted to try to help a bit by reproducing and digging into this.
>
> But it seems that more or less ALL tests hang my system one way or 
> another.
>
> I've used a qemu/kvm machine with md-next and mdraid master.
>
> Is this supposed to work?
>
> I can investigate the bugs I see, but perhaps that is a waste of time
> because I'm doing something fundamentally wrong?
>
> This is an example from 00raid0:
>
> [   57.434064] md: md0 stopped.
> [   57.586951] md0: detected capacity change from 0 to 107520
> [   57.618454] BUG: kernel NULL pointer dereference, address: 
> 0000000000000094
> [   57.620830] #PF: supervisor read access in kernel mode
> [   57.622554] #PF: error_code(0x0000) - not-present page
> [   57.624273] PGD 800000010d5ee067 P4D 800000010d5ee067 PUD 10df28067 
> PMD 0
> [   57.626548] Oops: 0000 [#1] PREEMPT SMP PTI
> [   57.627942] CPU: 3 PID: 1064 Comm: mkfs.ext3 Not tainted 
> 5.18.0-rc3.mx64.425-00108-g6ad84d559b8c #77
> [   57.630952] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> [   57.635927] RIP: 0010:bfq_bio_bfqg+0x26/0x80
> [   57.638027] Code: 00 0f 1f 00 0f 1f 44 00 00 55 53 48 89 fd 48 8b 
> 56 48 48 89 f7 48 85 d2 74 32 48 63 05 53 54 1c 01 48 83 c0 16 48 8b 
> 5c c2 08 <80> bb 94 00 00 00 00 70
> [   57.645295] RSP: 0018:ffffc90001c27b38 EFLAGS: 00010006
> [   57.647414] RAX: 0000000000000018 RBX: 0000000000000000 RCX: 
> 0000000000000001
> [   57.650039] RDX: ffff888109297800 RSI: ffff8881032ba180 RDI: 
> ffff8881032ba180
> [   57.652541] RBP: ffff888102177800 R08: ffff88810c9004c8 R09: 
> ffff88810318cb00
> [   57.654852] R10: 0000000000000000 R11: ffff8881032ba180 R12: 
> ffff88810318cae0
> [   57.657128] R13: ffff888102177800 R14: ffffc90001c27ca8 R15: 
> ffffc90001c27c00
> [   57.659316] FS:  00007fdfce47d440(0000) GS:ffff8882b5ac0000(0000) 
> knlGS:0000000000000000
> [   57.661700] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   57.663461] CR2: 0000000000000094 CR3: 000000010d438002 CR4: 
> 0000000000170ee0
> [   57.665453] Call Trace:
> [   57.666479]  <TASK>
> [   57.667382]  bfq_bic_update_cgroup+0x28/0x1b0
> [   57.668724]  bfq_insert_requests+0x233/0x2340
> [   57.670049]  ? ioc_find_get_icq+0x21c/0x2a0
> [   57.671315]  ? bfq_prepare_request+0x11/0x30
> [   57.672565]  blk_mq_sched_insert_requests+0x5c/0x150
> [   57.673891]  blk_mq_flush_plug_list+0xe1/0x2a0
> [   57.675140]  __blk_flush_plug+0xdf/0x120
> [   57.676259]  io_schedule_prepare+0x3d/0x50
> [   57.677373]  io_schedule_timeout+0xf/0x40
> [   57.678465]  wait_for_completion_io+0x78/0x140
> [   57.679578]  submit_bio_wait+0x5b/0x80
> [   57.680575]  blkdev_issue_discard+0x65/0xb0
> [   57.681640]  blkdev_common_ioctl+0x391/0x8f0
> [   57.682712]  blkdev_ioctl+0x216/0x2a0
> [   57.683648]  __x64_sys_ioctl+0x76/0xb0
> [   57.684607]  do_syscall_64+0x42/0x90
> [   57.685527]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   57.686645] RIP: 0033:0x7fdfce56dc17
> [   57.687535] Code: 48 c7 c3 ff ff ff ff 48 89 d8 5b 5d 41 5c c3 0f 
> 1f 40 00 48 89 e8 48 f7 d8 48 39 c3 0f 92 c0 eb 93 66 90 b8 10 00 00 
> 00 0f 05 <48> 3d 01 f0 ff ff 73 08
> [   57.691055] RSP: 002b:00007ffe24319828 EFLAGS: 00000246 ORIG_RAX: 
> 0000000000000010
> [   57.692537] RAX: ffffffffffffffda RBX: 00000000004645a0 RCX: 
> 00007fdfce56dc17
> [   57.693905] RDX: 00007ffe24319830 RSI: 0000000000001277 RDI: 
> 0000000000000003
> [   57.695288] RBP: 0000000000460960 R08: 0000000000000400 R09: 
> 0000000000000000
> [   57.696645] R10: 0000000000000000 R11: 0000000000000246 R12: 
> 0000000000000000
> [   57.697954] R13: 000000000000d200 R14: 0000000000000000 R15: 
> 0000000000000000
> [   57.699281]  </TASK>
> [   57.699901] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs 8021q garp 
> stp mrp llc bochs drm_vram_helper drm_ttm_helper kvm_intel ttm 
> drm_kms_helper kvm drm fb_sys_fops vi4
> [   57.705955] CR2: 0000000000000094
> [   57.706710] ---[ end trace 0000000000000000 ]---
> [   57.707599] RIP: 0010:bfq_bio_bfqg+0x26/0x80
> [   57.708434] Code: 00 0f 1f 00 0f 1f 44 00 00 55 53 48 89 fd 48 8b 
> 56 48 48 89 f7 48 85 d2 74 32 48 63 05 53 54 1c 01 48 83 c0 16 48 8b 
> 5c c2 08 <80> bb 94 00 00 00 00 70
> [   57.711426] RSP: 0018:ffffc90001c27b38 EFLAGS: 00010006
> [   57.712391] RAX: 0000000000000018 RBX: 0000000000000000 RCX: 
> 0000000000000001
> [   57.713605] RDX: ffff888109297800 RSI: ffff8881032ba180 RDI: 
> ffff8881032ba180
> [   57.714811] RBP: ffff888102177800 R08: ffff88810c9004c8 R09: 
> ffff88810318cb00
> [   57.716018] R10: 0000000000000000 R11: ffff8881032ba180 R12: 
> ffff88810318cae0
> [   57.717236] R13: ffff888102177800 R14: ffffc90001c27ca8 R15: 
> ffffc90001c27c00
> [   57.718438] FS:  00007fdfce47d440(0000) GS:ffff8882b5ac0000(0000) 
> knlGS:0000000000000000
> [   57.719778] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   57.720808] CR2: 0000000000000094 CR3: 000000010d438002 CR4: 
> 0000000000170ee0
> [   57.722019] note: mkfs.ext3[1064] exited with preempt_count 1
> [   57.723067] ------------[ cut here ]------------
> [   57.723960] WARNING: CPU: 3 PID: 1064 at kernel/exit.c:741 
> do_exit+0x8cb/0xbc0
> [   57.725196] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs 8021q garp 
> stp mrp llc bochs drm_vram_helper drm_ttm_helper kvm_intel ttm 
> drm_kms_helper kvm drm fb_sys_fops vi4
> [   57.731011] CPU: 3 PID: 1064 Comm: mkfs.ext3 Tainted: G D           
> 5.18.0-rc3.mx64.425-00108-g6ad84d559b8c #77
> [   57.732704] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> [   57.734853] RIP: 0010:do_exit+0x8cb/0xbc0
> [   57.735711] Code: e9 13 ff ff ff 48 8b bb e0 04 00 00 31 f6 e8 4c 
> db ff ff e9 98 fd ff ff 4c 89 e6 bf 05 06 00 00 e8 8a c8 00 00 e9 41 
> f8 ff ff <0f> 0b e9 6b f7 ff ff 4b
> [   57.738851] RSP: 0018:ffffc90001c27ee8 EFLAGS: 00010082
> [   57.739899] RAX: 0000000000000000 RBX: ffff888101e48000 RCX: 
> 0000000000000000
> [   57.741196] RDX: 0000000000000001 RSI: ffffffff8220a969 RDI: 
> 0000000000000009
> [   57.742485] RBP: 0000000000000009 R08: 0000000000000000 R09: 
> c0000000ffffbfff
> [   57.743777] R10: 00007fdfce47d440 R11: ffffc90001c27d60 R12: 
> 0000000000000009
> [   57.745081] R13: 0000000000000046 R14: 0000000000000000 R15: 
> 0000000000000000
> [   57.746388] FS:  00007fdfce47d440(0000) GS:ffff8882b5ac0000(0000) 
> knlGS:0000000000000000
> [   57.747806] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   57.748931] CR2: 0000000000000094 CR3: 000000010d438002 CR4: 
> 0000000000170ee0
> [   57.750225] Call Trace:
> [   57.750894]  <TASK>
> [   57.751535]  make_task_dead+0x41/0xf0
> [   57.752369]  rewind_stack_and_make_dead+0x17/0x17
> [   57.753336] RIP: 0033:0x7fdfce56dc17
> [   57.754155] Code: 48 c7 c3 ff ff ff ff 48 89 d8 5b 5d 41 5c c3 0f 
> 1f 40 00 48 89 e8 48 f7 d8 48 39 c3 0f 92 c0 eb 93 66 90 b8 10 00 00 
> 00 0f 05 <48> 3d 01 f0 ff ff 73 08
> [   57.757318] RSP: 002b:00007ffe24319828 EFLAGS: 00000246 ORIG_RAX: 
> 0000000000000010
> [   57.758669] RAX: ffffffffffffffda RBX: 00000000004645a0 RCX: 
> 00007fdfce56dc17
> [   57.759956] RDX: 00007ffe24319830 RSI: 0000000000001277 RDI: 
> 0000000000000003
> [   57.761256] RBP: 0000000000460960 R08: 0000000000000400 R09: 
> 0000000000000000
> [   57.762531] R10: 0000000000000000 R11: 0000000000000246 R12: 
> 0000000000000000
> [   57.763806] R13: 000000000000d200 R14: 0000000000000000 R15: 
> 0000000000000000
> [   57.765177]  </TASK>
> [   57.765813] ---[ end trace 0000000000000000 ]---
> [   57.790046] md0: detected capacity change from 107520 to 0
> [   57.792834] md: md0 stopped.
> [   78.843853] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [   78.845334] rcu:     10-...0: (0 ticks this GP) 
> idle=07b/1/0x4000000000000000 softirq=1140/1140 fqs=4805
> [   78.847246]     (detected by 13, t=21005 jiffies, g=9013, q=1419)
> [   78.848619] Sending NMI from CPU 13 to CPUs 10:
> [   78.849810] NMI backtrace for cpu 10
> [   78.849813] CPU: 10 PID: 1081 Comm: mdadm Tainted: G      D 
> W         5.18.0-rc3.mx64.425-00108-g6ad84d559b8c #77
> [   78.849816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> [   78.849817] RIP: 0010:queued_spin_lock_slowpath+0x4c/0x1d0
> [   78.849832] Code: 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 
> a9 00 01 ff ff 75 1b 85 c0 75 0f b8 01 00 00 00 66 89 07 5b 5d 41 5c 
> c3 f3 90 <8b> 07 84 c0 75 f8 eb e7
> [   78.849834] RSP: 0018:ffffc90001c9f9e0 EFLAGS: 00000002
> [   78.849837] RAX: 0000000000040101 RBX: ffff88810c914fc8 RCX: 
> 0000000000000000
> [   78.849838] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
> ffff888102177c30
> [   78.849840] RBP: 0000000000000000 R08: ffff88810c914fc8 R09: 
> ffff888106a4ed10
> [   78.849841] R10: ffffc90001c9fae8 R11: ffff888101b048d8 R12: 
> ffff888103833000
> [   78.849842] R13: ffff888102177800 R14: ffffc90001c9fb20 R15: 
> ffffc90001c9fa78
> [   78.849844] FS:  00007fd3d66c4340(0000) GS:ffff8882b5c80000(0000) 
> knlGS:0000000000000000
> [   78.849847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   78.849848] CR2: 00000000004a5b58 CR3: 000000010d438001 CR4: 
> 0000000000170ee0
> [   78.849850] Call Trace:
> [   78.849853]  <TASK>
> [   78.849855]  bfq_insert_requests+0xae/0x2340
> [   78.849862]  ? submit_bio_noacct_nocheck+0x225/0x2b0
> [   78.849868]  blk_mq_sched_insert_requests+0x5c/0x150
> [   78.849872]  blk_mq_flush_plug_list+0xe1/0x2a0
> [   78.849876]  __blk_flush_plug+0xdf/0x120
> [   78.849879]  blk_finish_plug+0x27/0x40
> [   78.849882]  read_pages+0x15b/0x360
> [   78.849891]  page_cache_ra_unbounded+0x120/0x170
> [   78.849894]  filemap_get_pages+0xdd/0x5f0
> [   78.849899]  filemap_read+0xbf/0x350
> [   78.849902]  ? __mod_memcg_lruvec_state+0x72/0xc0
> [   78.849907]  ? __mod_lruvec_page_state+0xb4/0x160
> [   78.849909]  ? folio_add_lru+0x51/0x80
> [   78.849912]  ? _raw_spin_unlock+0x12/0x30
> [   78.849916]  ? __handle_mm_fault+0xdee/0x14d0
> [   78.849921]  blkdev_read_iter+0xa9/0x180
> [   78.849924]  new_sync_read+0x109/0x180
> [   78.849929]  vfs_read+0x187/0x1b0
> [   78.849932]  ksys_read+0xa1/0xe0
> [   78.849935]  do_syscall_64+0x42/0x90
> [   78.849938]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   78.849941] RIP: 0033:0x7fd3d6322f8e
> [   78.849944] Code: c0 e9 c6 fe ff ff 48 8d 3d a7 07 0a 00 48 83 ec 
> 08 e8 b6 e1 01 00 66 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 
> 14 0f 05 <48> 3d 00 f0 ff ff 77 59
> [   78.849945] RSP: 002b:00007ffe92d46ea8 EFLAGS: 00000246 ORIG_RAX: 
> 0000000000000000
> [   78.849948] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 
> 00007fd3d6322f8e
> [   78.849949] RDX: 0000000000001000 RSI: 00000000004a3000 RDI: 
> 0000000000000003
> [   78.849950] RBP: 0000000000000003 R08: 00000000004a3000 R09: 
> 0000000000000003
> [   78.849951] R10: 00007fd3d623d0a8 R11: 0000000000000246 R12: 
> 00000000004a2a60
> [   78.849952] R13: 0000000000000000 R14: 00000000004a3000 R15: 
> 000000000048a4a0
> [   78.849954]  </TASK>

Looks like a bfq or block layer issue; I will try to reproduce it from my side.

Thanks,
Guoqing


Thread overview: 45+ messages
2022-05-05  8:16 [PATCH 0/2] two fixes for md Guoqing Jiang
2022-05-05  8:16 ` [PATCH V3 1/2] md: don't unregister sync_thread with reconfig_mutex held Guoqing Jiang
2022-05-05 14:02   ` kernel test robot
2022-05-05 18:04   ` kernel test robot
2022-05-06  2:34     ` Guoqing Jiang
2022-05-06  2:34       ` Guoqing Jiang
2022-05-05  8:16 ` [PATCH 2/2] md: protect md_unregister_thread from reentrancy Guoqing Jiang
2022-05-09  6:39   ` Song Liu
2022-05-09  8:12     ` Guoqing Jiang
2022-05-06 11:36 ` [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held Guoqing Jiang
2022-05-09  6:37   ` Song Liu
2022-05-09  8:09     ` Guoqing Jiang
2022-05-09  9:32       ` Wols Lists
2022-05-09 10:37         ` Guoqing Jiang
2022-05-09 11:19           ` Wols Lists
2022-05-09 11:26             ` Guoqing Jiang
2022-05-10  6:44       ` Song Liu
2022-05-10 12:01         ` Donald Buczek
2022-05-10 12:09           ` Guoqing Jiang
2022-05-10 12:35             ` Donald Buczek
2022-05-10 18:02               ` Song Liu
2022-05-11  8:10                 ` Guoqing Jiang
2022-05-11 21:45                   ` Song Liu
2022-05-20 18:27         ` Logan Gunthorpe
2022-05-21 18:23           ` Donald Buczek
2022-05-23  1:08             ` Guoqing Jiang [this message]
2022-05-23  5:41               ` Donald Buczek
2022-05-23  9:51                 ` Guoqing Jiang
2022-05-24 16:13                   ` Logan Gunthorpe
2022-05-25  9:04                     ` Guoqing Jiang
2022-05-25 18:22                       ` Logan Gunthorpe
2022-05-26  9:46                         ` Jan Kara
2022-05-26 11:53                         ` Jan Kara
2022-05-31  6:11                           ` Christoph Hellwig
2022-05-31  7:43                             ` Jan Kara
2022-05-30  9:55                   ` Guoqing Jiang
2022-05-30 16:35                     ` Logan Gunthorpe
2022-05-31  8:13                       ` Guoqing Jiang
2022-05-24 15:58                 ` Logan Gunthorpe
2022-05-24 18:16                   ` Song Liu
2022-05-25  9:17                 ` Guoqing Jiang
2022-05-24 15:51             ` Logan Gunthorpe
2022-06-02  8:12           ` Xiao Ni
2022-05-09  8:18   ` Donald Buczek
2022-05-09  8:48     ` Guoqing Jiang
