* [PATCH 3/4] md: use mddev_suspend/resume instead of ->quiesce()
2017-09-12 1:49 [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without NeilBrown
@ 2017-09-12 1:49 ` NeilBrown
2017-09-12 1:49 ` [PATCH 1/4] md: always hold reconfig_mutex when calling mddev_suspend() NeilBrown
` (5 subsequent siblings)
6 siblings, 0 replies; 27+ messages in thread
From: NeilBrown @ 2017-09-12 1:49 UTC (permalink / raw)
To: Xiao Ni; +Cc: linux-raid
mddev_suspend() is a more general interface than
calling ->quiesce() and so is more extensible. A
future patch will make use of this.
Signed-off-by: NeilBrown <neilb@suse.com>
---
drivers/md/md.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d6d491243411..0b167169fef9 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4820,8 +4820,8 @@ suspend_lo_store(struct mddev *mddev, const char *buf, size_t len)
mddev->pers->quiesce(mddev, 2);
else {
/* Expanding suspended region - need to wait */
- mddev->pers->quiesce(mddev, 1);
- mddev->pers->quiesce(mddev, 0);
+ mddev_suspend(mddev);
+ mddev_resume(mddev);
}
err = 0;
unlock:
@@ -4863,8 +4863,8 @@ suspend_hi_store(struct mddev *mddev, const char *buf, size_t len)
mddev->pers->quiesce(mddev, 2);
else {
/* Expanding suspended region - need to wait */
- mddev->pers->quiesce(mddev, 1);
- mddev->pers->quiesce(mddev, 0);
+ mddev_suspend(mddev);
+ mddev_resume(mddev);
}
err = 0;
unlock:
@@ -6595,7 +6595,7 @@ static int set_bitmap_file(struct mddev *mddev, int fd)
struct bitmap *bitmap;
bitmap = bitmap_create(mddev, -1);
- mddev->pers->quiesce(mddev, 1);
+ mddev_suspend(mddev);
if (!IS_ERR(bitmap)) {
mddev->bitmap = bitmap;
err = bitmap_load(mddev);
@@ -6605,11 +6605,11 @@ static int set_bitmap_file(struct mddev *mddev, int fd)
bitmap_destroy(mddev);
fd = -1;
}
- mddev->pers->quiesce(mddev, 0);
+ mddev_resume(mddev);
} else if (fd < 0) {
- mddev->pers->quiesce(mddev, 1);
+ mddev_suspend(mddev);
bitmap_destroy(mddev);
- mddev->pers->quiesce(mddev, 0);
+ mddev_resume(mddev);
}
}
if (fd < 0) {
@@ -6895,7 +6895,7 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
mddev->bitmap_info.space =
mddev->bitmap_info.default_space;
bitmap = bitmap_create(mddev, -1);
- mddev->pers->quiesce(mddev, 1);
+ mddev_suspend(mddev);
if (!IS_ERR(bitmap)) {
mddev->bitmap = bitmap;
rv = bitmap_load(mddev);
@@ -6903,7 +6903,7 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
rv = PTR_ERR(bitmap);
if (rv)
bitmap_destroy(mddev);
- mddev->pers->quiesce(mddev, 0);
+ mddev_resume(mddev);
} else {
/* remove the bitmap */
if (!mddev->bitmap) {
@@ -6926,9 +6926,9 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
mddev->bitmap_info.nodes = 0;
md_cluster_ops->leave(mddev);
}
- mddev->pers->quiesce(mddev, 1);
+ mddev_suspend(mddev);
bitmap_destroy(mddev);
- mddev->pers->quiesce(mddev, 0);
+ mddev_resume(mddev);
mddev->bitmap_info.offset = 0;
}
}
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 1/4] md: always hold reconfig_mutex when calling mddev_suspend()
2017-09-12 1:49 [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without NeilBrown
2017-09-12 1:49 ` [PATCH 3/4] md: use mddev_suspend/resume instead of ->quiesce() NeilBrown
@ 2017-09-12 1:49 ` NeilBrown
2017-09-12 1:49 ` [PATCH 4/4] md: allow metadata update while suspending NeilBrown
` (4 subsequent siblings)
6 siblings, 0 replies; 27+ messages in thread
From: NeilBrown @ 2017-09-12 1:49 UTC (permalink / raw)
To: Xiao Ni; +Cc: linux-raid
Most often mddev_suspend() is called with
reconfig_mutex held. Make this a requirement in
preparation for a subsequent patch.
Signed-off-by: NeilBrown <neilb@suse.com>
---
drivers/md/dm-raid.c | 5 ++++-
drivers/md/md.c | 1 +
drivers/md/raid5-cache.c | 2 ++
3 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 5bfe285ea9d1..f013b03e15c3 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3628,8 +3628,11 @@ static void raid_postsuspend(struct dm_target *ti)
{
struct raid_set *rs = ti->private;
- if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
+ if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
+ mddev_lock_nointr(&rs->md);
mddev_suspend(&rs->md);
+ mddev_unlock(&rs->md);
+ }
rs->md.ro = 1;
}
diff --git a/drivers/md/md.c b/drivers/md/md.c
index b01e458d31e9..675c9e6495e4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -336,6 +336,7 @@ static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
void mddev_suspend(struct mddev *mddev)
{
WARN_ON_ONCE(mddev->thread && current == mddev->thread->tsk);
+ lockdep_assert_held(&mddev->reconfig_mutex);
if (mddev->suspended++)
return;
synchronize_rcu();
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 2dcbafa8e66c..8d09c2fa3d63 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -703,9 +703,11 @@ static void r5c_disable_writeback_async(struct work_struct *work)
wait_event(mddev->sb_wait,
!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
+ mddev_lock_nointr(mddev);
mddev_suspend(mddev);
log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
mddev_resume(mddev);
+ mddev_unlock(mddev);
}
static void r5l_submit_current_io(struct r5l_log *log)
* [PATCH 4/4] md: allow metadata update while suspending.
2017-09-12 1:49 [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without NeilBrown
2017-09-12 1:49 ` [PATCH 3/4] md: use mddev_suspend/resume instead of ->quiesce() NeilBrown
2017-09-12 1:49 ` [PATCH 1/4] md: always hold reconfig_mutex when calling mddev_suspend() NeilBrown
@ 2017-09-12 1:49 ` NeilBrown
2017-09-12 1:49 ` [PATCH 2/4] md: don't call bitmap_create() while array is quiesced NeilBrown
` (3 subsequent siblings)
6 siblings, 0 replies; 27+ messages in thread
From: NeilBrown @ 2017-09-12 1:49 UTC (permalink / raw)
To: Xiao Ni; +Cc: linux-raid
There are various deadlocks that can occur
when a thread holds reconfig_mutex and calls
->quiesce(mddev, 1).
As the md thread cannot update the metadata while the
reconfig mutex is held, the array may not fully
quiesce.
For example, raid5 will not complete stripes
while a metadata update is pending, and that
prevents raid5_quiesce() from completing.
->quiesce() is now usually called from mddev_suspend(),
and it is always called with reconfig_mutex held. So
at this time it is safe for the thread to update metadata
without explicitly taking the lock.
So add 2 new flags, one which says the unlocked update is
allowed, and one which says it is happening. Then allow
it while the quiesce completes, and afterwards wait for it to finish.
Signed-off-by: NeilBrown <neilb@suse.com>
---
drivers/md/md.c | 14 ++++++++++++++
drivers/md/md.h | 6 ++++++
2 files changed, 20 insertions(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0b167169fef9..f63721109d0e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -341,8 +341,12 @@ void mddev_suspend(struct mddev *mddev)
return;
synchronize_rcu();
wake_up(&mddev->sb_wait);
+ set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
+ smp_mb__after_atomic();
wait_event(mddev->sb_wait, atomic_read(&mddev->active_io) == 0);
mddev->pers->quiesce(mddev, 1);
+ clear_bit_unlock(MD_ALLOW_SB_UPDATE, &mddev->flags);
+ wait_event(mddev->sb_wait, !test_bit(MD_UPDATING_SB, &mddev->flags));
del_timer_sync(&mddev->safemode_timer);
}
@@ -8790,6 +8794,16 @@ void md_check_recovery(struct mddev *mddev)
unlock:
wake_up(&mddev->sb_wait);
mddev_unlock(mddev);
+ } else if (test_bit(MD_ALLOW_SB_UPDATE, &mddev->flags) && mddev->sb_flags) {
+ /* Write superblock - thread that called mddev_suspend()
+ * holds reconfig_mutex for us.
+ */
+ set_bit(MD_UPDATING_SB, &mddev->flags);
+ smp_mb__after_atomic();
+ if (test_bit(MD_ALLOW_SB_UPDATE, &mddev->flags))
+ md_update_sb(mddev, 0);
+ clear_bit_unlock(MD_UPDATING_SB, &mddev->flags);
+ wake_up(&mddev->sb_wait);
}
}
EXPORT_SYMBOL(md_check_recovery);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 09db03455801..8ab5e1bcb10f 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -236,6 +236,12 @@ enum mddev_flags {
* never cause the array to become failed.
*/
MD_HAS_PPL, /* The raid array has PPL feature set */
+ MD_ALLOW_SB_UPDATE, /* md_check_recovery is allowed to update
+ * the metadata without taking reconfig_mutex.
+ */
+ MD_UPDATING_SB, /* md_check_recovery is updating the metadata
+ * without explicitly holding reconfig_mutex.
+ */
};
enum mddev_sb_flags {
* [PATCH 2/4] md: don't call bitmap_create() while array is quiesced.
2017-09-12 1:49 [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without NeilBrown
` (2 preceding siblings ...)
2017-09-12 1:49 ` [PATCH 4/4] md: allow metadata update while suspending NeilBrown
@ 2017-09-12 1:49 ` NeilBrown
2017-09-12 2:51 ` [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without Xiao Ni
` (2 subsequent siblings)
6 siblings, 0 replies; 27+ messages in thread
From: NeilBrown @ 2017-09-12 1:49 UTC (permalink / raw)
To: Xiao Ni; +Cc: linux-raid
bitmap_create() allocates memory with GFP_KERNEL and
so can wait for IO.
If called while the array is quiesced, it could wait indefinitely
for writeout to the array, causing a deadlock.
So call bitmap_create() before quiescing the array.
Signed-off-by: NeilBrown <neilb@suse.com>
---
drivers/md/md.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 675c9e6495e4..d6d491243411 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6591,22 +6591,26 @@ static int set_bitmap_file(struct mddev *mddev, int fd)
return -ENOENT; /* cannot remove what isn't there */
err = 0;
if (mddev->pers) {
- mddev->pers->quiesce(mddev, 1);
if (fd >= 0) {
struct bitmap *bitmap;
bitmap = bitmap_create(mddev, -1);
+ mddev->pers->quiesce(mddev, 1);
if (!IS_ERR(bitmap)) {
mddev->bitmap = bitmap;
err = bitmap_load(mddev);
} else
err = PTR_ERR(bitmap);
- }
- if (fd < 0 || err) {
+ if (err) {
+ bitmap_destroy(mddev);
+ fd = -1;
+ }
+ mddev->pers->quiesce(mddev, 0);
+ } else if (fd < 0) {
+ mddev->pers->quiesce(mddev, 1);
bitmap_destroy(mddev);
- fd = -1; /* make sure to put the file */
+ mddev->pers->quiesce(mddev, 0);
}
- mddev->pers->quiesce(mddev, 0);
}
if (fd < 0) {
struct file *f = mddev->bitmap_info.file;
@@ -6890,8 +6894,8 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
mddev->bitmap_info.default_offset;
mddev->bitmap_info.space =
mddev->bitmap_info.default_space;
- mddev->pers->quiesce(mddev, 1);
bitmap = bitmap_create(mddev, -1);
+ mddev->pers->quiesce(mddev, 1);
if (!IS_ERR(bitmap)) {
mddev->bitmap = bitmap;
rv = bitmap_load(mddev);
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-09-12 1:49 [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without NeilBrown
` (3 preceding siblings ...)
2017-09-12 1:49 ` [PATCH 2/4] md: don't call bitmap_create() while array is quiesced NeilBrown
@ 2017-09-12 2:51 ` Xiao Ni
2017-09-13 2:11 ` Xiao Ni
2017-09-30 9:46 ` Xiao Ni
6 siblings, 0 replies; 27+ messages in thread
From: Xiao Ni @ 2017-09-12 2:51 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
----- Original Message -----
> From: "NeilBrown" <neilb@suse.com>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Tuesday, September 12, 2017 9:49:12 AM
> Subject: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>
> Hi,
> I looked again at the previous patch I posted which tried to make
> md_update_sb() safe without taking reconfig_mutex, and realized that
> it had serious problems, particularly around devices being added or
> removed while the update was happening.
>
> So I decided to try a different approach, which is embodied in these
> patches. The md thread is now explicitly allowed to call
> md_update_sb() while some other thread holds the lock and
> waits for mddev_suspend() to complete.
>
> Please test these and confirm that they still address the problem you
> found.
Hi Neil
Thanks for the patches. I'll report the test results.
Regards
Xiao
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-09-12 1:49 [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without NeilBrown
` (4 preceding siblings ...)
2017-09-12 2:51 ` [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without Xiao Ni
@ 2017-09-13 2:11 ` Xiao Ni
2017-09-13 15:09 ` Xiao Ni
2017-09-30 9:46 ` Xiao Ni
6 siblings, 1 reply; 27+ messages in thread
From: Xiao Ni @ 2017-09-13 2:11 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
----- Original Message -----
> From: "NeilBrown" <neilb@suse.com>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Tuesday, September 12, 2017 9:49:12 AM
> Subject: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>
> Hi,
> I looked again at the previous patch I posted which tried to make
> md_update_sb() safe without taking reconfig_mutex, and realized that
> it had serious problems, particularly around devices being added or
> removed while the update was happening.
>
> So I decided to try a different approach, which is embodied in these
> patches. The md thread is now explicitly allowed to call
> md_update_sb() while some other thread holds the lock and
> waits for mddev_suspend() to complete.
>
> Please test these and confirm that they still address the problem you
> found.
Hi Neil
The test has been running for more than 24 hours and the problem hasn't appeared.
The patches fix this bug.
Best Regards
Xiao
>
> Thanks,
> NeilBrown
>
> ---
>
> NeilBrown (4):
> md: always hold reconfig_mutex when calling mddev_suspend()
> md: don't call bitmap_create() while array is quiesced.
> md: use mddev_suspend/resume instead of ->quiesce()
> md: allow metadata update while suspending.
>
>
> drivers/md/dm-raid.c | 5 ++++-
> drivers/md/md.c | 45
> ++++++++++++++++++++++++++++++++-------------
> drivers/md/md.h | 6 ++++++
> drivers/md/raid5-cache.c | 2 ++
> 4 files changed, 44 insertions(+), 14 deletions(-)
>
> --
> Signature
>
>
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-09-13 2:11 ` Xiao Ni
@ 2017-09-13 15:09 ` Xiao Ni
2017-09-13 23:05 ` NeilBrown
0 siblings, 1 reply; 27+ messages in thread
From: Xiao Ni @ 2017-09-13 15:09 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
----- Original Message -----
> From: "Xiao Ni" <xni@redhat.com>
> To: "NeilBrown" <neilb@suse.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Wednesday, September 13, 2017 10:11:50 AM
> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>
>
>
> ----- Original Message -----
> > From: "NeilBrown" <neilb@suse.com>
> > To: "Xiao Ni" <xni@redhat.com>
> > Cc: linux-raid@vger.kernel.org
> > Sent: Tuesday, September 12, 2017 9:49:12 AM
> > Subject: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata
> > without
> >
> > Hi,
> > I looked again at the previous patch I posted which tried to make
> > md_update_sb() safe without taking reconfig_mutex, and realized that
> > it had serious problems, particularly around devices being added or
> > removed while the update was happening.
> >
> > So I decided to try a different approach, which is embodied in these
> > patches. The md thread is now explicitly allowed to call
> > md_update_sb() while some other thread holds the lock and
> > waits for mddev_suspend() to complete.
> >
> > Please test these and confirm that they still address the problem you
> > found.
>
> Hi Neil
>
> The test have been running for more than 24 hours. The problem doesn't
> appear.
> The patches can fix this bug.
>
Hi Neil
Sorry for the bad news. The test is still running and it's stuck again.
Regards
Xiao
> >
> > Thanks,
> > NeilBrown
> >
> > ---
> >
> > NeilBrown (4):
> > md: always hold reconfig_mutex when calling mddev_suspend()
> > md: don't call bitmap_create() while array is quiesced.
> > md: use mddev_suspend/resume instead of ->quiesce()
> > md: allow metadata update while suspending.
> >
> >
> > drivers/md/dm-raid.c | 5 ++++-
> > drivers/md/md.c | 45
> > ++++++++++++++++++++++++++++++++-------------
> > drivers/md/md.h | 6 ++++++
> > drivers/md/raid5-cache.c | 2 ++
> > 4 files changed, 44 insertions(+), 14 deletions(-)
> >
> > --
> > Signature
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-09-13 15:09 ` Xiao Ni
@ 2017-09-13 23:05 ` NeilBrown
2017-09-14 4:55 ` Xiao Ni
0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2017-09-13 23:05 UTC (permalink / raw)
To: Xiao Ni; +Cc: linux-raid
On Wed, Sep 13 2017, Xiao Ni wrote:
>
> Hi Neil
>
> Sorry for the bad news. The test is still running and it's stuck again.
Any details? Anything at all? Just a little hint maybe?
Just saying "it's stuck again" is very nearly useless.
Thanks,
NeilBrown
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-09-13 23:05 ` NeilBrown
@ 2017-09-14 4:55 ` Xiao Ni
2017-09-14 5:32 ` NeilBrown
0 siblings, 1 reply; 27+ messages in thread
From: Xiao Ni @ 2017-09-14 4:55 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
----- Original Message -----
> From: "NeilBrown" <neilb@suse.com>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Thursday, September 14, 2017 7:05:20 AM
> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>
> On Wed, Sep 13 2017, Xiao Ni wrote:
> >
> > Hi Neil
> >
> > Sorry for the bad news. The test is still running and it's stuck again.
>
> Any details? Anything at all? Just a little hint maybe?
>
> Just saying "it's stuck again" is very nearly useless.
>
Hi Neil
It doesn't show any useful information in /var/log/messages.
echo file raid5.c +p > /sys/kernel/debug/dynamic_debug/control
There aren't any messages from that either.
It looks like another problem.
[root@dell-pr1700-02 ~]# ps auxf | grep D
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 8381 0.0 0.0 0 0 ? D Sep13 0:00 \_ [kworker/u8:1]
root 8966 0.0 0.0 0 0 ? D Sep13 0:00 \_ [jbd2/md0-8]
root 824 0.0 0.1 216856 8492 ? Ss Sep03 0:06 /usr/bin/abrt-watch-log -F BUG: WARNING: at WARNING: CPU: INFO: possible recursive locking detected ernel BUG at list_del corruption list_add corruption do_IRQ: stack overflow: ear stack overflow (cur: eneral protection fault nable to handle kernel ouble fault: RTNL: assertion failed eek! page_mapcount(page) went negative! adness at NETDEV WATCHDOG ysctl table check failed : nobody cared IRQ handler type mismatch Machine Check Exception: Machine check events logged divide error: bounds: coprocessor segment overrun: invalid TSS: segment not present: invalid opcode: alignment check: stack segment: fpu exception: simd exception: iret exception: /var/log/messages -- /usr/bin/abrt-dump-oops -xtD
root 836 0.0 0.0 195052 3200 ? Ssl Sep03 0:00 /usr/sbin/gssproxy -D
root 1225 0.0 0.0 106008 7436 ? Ss Sep03 0:00 /usr/sbin/sshd -D
root 12411 0.0 0.0 112672 2264 pts/0 S+ 00:50 0:00 \_ grep --color=auto D
root 8987 0.0 0.0 109000 2728 pts/2 D+ Sep13 0:04 \_ dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=1000
root 8983 0.0 0.0 7116 2080 ? Ds Sep13 0:00 /usr/sbin/mdadm --grow --continue /dev/md0
[root@dell-pr1700-02 ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop6[7] loop4[6] loop5[5](S) loop3[3] loop2[2] loop1[1] loop0[0]
2039808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
[>....................] reshape = 0.0% (1/509952) finish=1059.5min speed=7K/sec
unused devices: <none>
It looks like the reshape hasn't started. This time I didn't add code to check
the state of mddev->suspended and active_stripes; I just applied the patches
to the source. Do you have any suggestions for what else to check?
Best Regards
Xiao
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-09-14 4:55 ` Xiao Ni
@ 2017-09-14 5:32 ` NeilBrown
2017-09-14 7:57 ` Xiao Ni
0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2017-09-14 5:32 UTC (permalink / raw)
To: Xiao Ni; +Cc: linux-raid
On Thu, Sep 14 2017, Xiao Ni wrote:
> ----- Original Message -----
>> From: "NeilBrown" <neilb@suse.com>
>> To: "Xiao Ni" <xni@redhat.com>
>> Cc: linux-raid@vger.kernel.org
>> Sent: Thursday, September 14, 2017 7:05:20 AM
>> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>>
>> On Wed, Sep 13 2017, Xiao Ni wrote:
>> >
>> > Hi Neil
>> >
>> > Sorry for the bad news. The test is still running and it's stuck again.
>>
>> Any details? Anything at all? Just a little hint maybe?
>>
>> Just saying "it's stuck again" is very nearly useless.
>>
> Hi Neil
>
> It doesn't show any useful information in /var/log/messages
>
> echo file raid5.c +p > /sys/kernel/debug/dynamic_debug/control
> There aren't any messages too.
>
> It looks like another problem.
>
> [root@dell-pr1700-02 ~]# ps auxf | grep D
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 8381 0.0 0.0 0 0 ? D Sep13 0:00 \_ [kworker/u8:1]
> root 8966 0.0 0.0 0 0 ? D Sep13 0:00 \_ [jbd2/md0-8]
> root 824 0.0 0.1 216856 8492 ? Ss Sep03 0:06 /usr/bin/abrt-watch-log -F BUG: WARNING: at WARNING: CPU: INFO: possible recursive locking detected ernel BUG at list_del corruption list_add corruption do_IRQ: stack overflow: ear stack overflow (cur: eneral protection fault nable to handle kernel ouble fault: RTNL: assertion failed eek! page_mapcount(page) went negative! adness at NETDEV WATCHDOG ysctl table check failed : nobody cared IRQ handler type mismatch Machine Check Exception: Machine check events logged divide error: bounds: coprocessor segment overrun: invalid TSS: segment not present: invalid opcode: alignment check: stack segment: fpu exception: simd exception: iret exception: /var/log/messages -- /usr/bin/abrt-dump-oops -xtD
> root 836 0.0 0.0 195052 3200 ? Ssl Sep03 0:00 /usr/sbin/gssproxy -D
> root 1225 0.0 0.0 106008 7436 ? Ss Sep03 0:00 /usr/sbin/sshd -D
> root 12411 0.0 0.0 112672 2264 pts/0 S+ 00:50 0:00 \_ grep --color=auto D
> root 8987 0.0 0.0 109000 2728 pts/2 D+ Sep13 0:04 \_ dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=1000
> root 8983 0.0 0.0 7116 2080 ? Ds Sep13 0:00 /usr/sbin/mdadm --grow --continue /dev/md0
>
> [root@dell-pr1700-02 ~]# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 loop6[7] loop4[6] loop5[5](S) loop3[3] loop2[2] loop1[1] loop0[0]
> 2039808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
> [>....................] reshape = 0.0% (1/509952) finish=1059.5min speed=7K/sec
>
> unused devices: <none>
>
>
> It looks like the reshape doesn't start. This time I didn't add the codes to check
> the information of mddev->suspended and active_stripes. I just added the patches
> to source codes. Do you have other suggestions to check more things?
>
> Best Regards
> Xiao
What do
cat /proc/8987/stack
cat /proc/8983/stack
cat /proc/8966/stack
cat /proc/8381/stack
show??
NeilBrown
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-09-14 5:32 ` NeilBrown
@ 2017-09-14 7:57 ` Xiao Ni
2017-09-16 13:15 ` Xiao Ni
2017-10-05 5:17 ` NeilBrown
0 siblings, 2 replies; 27+ messages in thread
From: Xiao Ni @ 2017-09-14 7:57 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
----- Original Message -----
> From: "NeilBrown" <neilb@suse.com>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Thursday, September 14, 2017 1:32:02 PM
> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>
> On Thu, Sep 14 2017, Xiao Ni wrote:
>
> > ----- Original Message -----
> >> From: "NeilBrown" <neilb@suse.com>
> >> To: "Xiao Ni" <xni@redhat.com>
> >> Cc: linux-raid@vger.kernel.org
> >> Sent: Thursday, September 14, 2017 7:05:20 AM
> >> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata
> >> without
> >>
> >> On Wed, Sep 13 2017, Xiao Ni wrote:
> >> >
> >> > Hi Neil
> >> >
> >> > Sorry for the bad news. The test is still running and it's stuck again.
> >>
> >> Any details? Anything at all? Just a little hint maybe?
> >>
> >> Just saying "it's stuck again" is very nearly useless.
> >>
> > Hi Neil
> >
> > It doesn't show any useful information in /var/log/messages
> >
> > echo file raid5.c +p > /sys/kernel/debug/dynamic_debug/control
> > There aren't any messages too.
> >
> > It looks like another problem.
> >
> > [root@dell-pr1700-02 ~]# ps auxf | grep D
> > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> > root 8381 0.0 0.0 0 0 ? D Sep13 0:00 \_
> > [kworker/u8:1]
> > root 8966 0.0 0.0 0 0 ? D Sep13 0:00 \_
> > [jbd2/md0-8]
> > root 824 0.0 0.1 216856 8492 ? Ss Sep03 0:06
> > /usr/bin/abrt-watch-log -F BUG: WARNING: at WARNING: CPU: INFO: possible
> > recursive locking detected ernel BUG at list_del corruption list_add
> > corruption do_IRQ: stack overflow: ear stack overflow (cur: eneral
> > protection fault nable to handle kernel ouble fault: RTNL: assertion
> > failed eek! page_mapcount(page) went negative! adness at NETDEV WATCHDOG
> > ysctl table check failed : nobody cared IRQ handler type mismatch Machine
> > Check Exception: Machine check events logged divide error: bounds:
> > coprocessor segment overrun: invalid TSS: segment not present: invalid
> > opcode: alignment check: stack segment: fpu exception: simd exception:
> > iret exception: /var/log/messages -- /usr/bin/abrt-dump-oops -xtD
> > root 836 0.0 0.0 195052 3200 ? Ssl Sep03 0:00
> > /usr/sbin/gssproxy -D
> > root 1225 0.0 0.0 106008 7436 ? Ss Sep03 0:00
> > /usr/sbin/sshd -D
> > root 12411 0.0 0.0 112672 2264 pts/0 S+ 00:50 0:00
> > \_ grep --color=auto D
> > root 8987 0.0 0.0 109000 2728 pts/2 D+ Sep13 0:04
> > \_ dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=1000
> > root 8983 0.0 0.0 7116 2080 ? Ds Sep13 0:00
> > /usr/sbin/mdadm --grow --continue /dev/md0
> >
> > [root@dell-pr1700-02 ~]# cat /proc/mdstat
> > Personalities : [raid6] [raid5] [raid4]
> > md0 : active raid5 loop6[7] loop4[6] loop5[5](S) loop3[3] loop2[2] loop1[1]
> > loop0[0]
> > 2039808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6]
> > [UUUUUU]
> > [>....................] reshape = 0.0% (1/509952) finish=1059.5min
> > speed=7K/sec
> >
> > unused devices: <none>
> >
> >
> > It looks like the reshape doesn't start. This time I didn't add the codes
> > to check
> > the information of mddev->suspended and active_stripes. I just added the
> > patches
> > to source codes. Do you have other suggestions to check more things?
> >
> > Best Regards
> > Xiao
>
> What do
> cat /proc/8987/stack
> cat /proc/8983/stack
> cat /proc/8966/stack
> cat /proc/8381/stack
>
> show??
dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=1000
[root@dell-pr1700-02 ~]# cat /proc/8987/stack
[<ffffffff810d4ea6>] io_schedule+0x16/0x40
[<ffffffff811c66ae>] __lock_page+0x10e/0x160
[<ffffffffa09b4ef0>] mpage_prepare_extent_to_map+0x290/0x310 [ext4]
[<ffffffffa09ba007>] ext4_writepages+0x467/0xe80 [ext4]
[<ffffffff811d6bec>] do_writepages+0x1c/0x70
[<ffffffff811c7c66>] __filemap_fdatawrite_range+0xc6/0x100
[<ffffffff811c7d6c>] filemap_flush+0x1c/0x20
[<ffffffffa09b757c>] ext4_alloc_da_blocks+0x2c/0x70 [ext4]
[<ffffffffa09a89a9>] ext4_release_file+0x79/0xc0 [ext4]
[<ffffffff81263d67>] __fput+0xe7/0x210
[<ffffffff81263ece>] ____fput+0xe/0x10
[<ffffffff810c59c3>] task_work_run+0x83/0xb0
[<ffffffff81003d64>] exit_to_usermode_loop+0x6c/0xa8
[<ffffffff8100389a>] do_syscall_64+0x13a/0x150
[<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff
/usr/sbin/mdadm --grow --continue /dev/md0. Is this the reason for adding lockdep_assert_held(&mddev->reconfig_mutex)?
[root@dell-pr1700-02 ~]# cat /proc/8983/stack
[<ffffffffa0a3464c>] mddev_suspend+0x12c/0x160 [md_mod]
[<ffffffffa0a379ec>] suspend_lo_store+0x7c/0xe0 [md_mod]
[<ffffffffa0a3b7d0>] md_attr_store+0x80/0xc0 [md_mod]
[<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
[<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
[<ffffffff81260457>] __vfs_write+0x37/0x170
[<ffffffff812619e2>] vfs_write+0xb2/0x1b0
[<ffffffff81263015>] SyS_write+0x55/0xc0
[<ffffffff810037c7>] do_syscall_64+0x67/0x150
[<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff
[jbd2/md0-8]
[root@dell-pr1700-02 ~]# cat /proc/8966/stack
[<ffffffffa0a39b20>] md_write_start+0xf0/0x220 [md_mod]
[<ffffffffa0972b49>] raid5_make_request+0x89/0x8b0 [raid456]
[<ffffffffa0a34175>] md_make_request+0xf5/0x260 [md_mod]
[<ffffffff81376427>] generic_make_request+0x117/0x2f0
[<ffffffff81376675>] submit_bio+0x75/0x150
[<ffffffff8129e0b0>] submit_bh_wbc+0x140/0x170
[<ffffffff8129e683>] submit_bh+0x13/0x20
[<ffffffffa0957e29>] jbd2_write_superblock+0x109/0x230 [jbd2]
[<ffffffffa0957f8b>] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
[<ffffffffa09517ff>] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
[<ffffffffa0955d02>] kjournald2+0xd2/0x260 [jbd2]
[<ffffffff810c73f9>] kthread+0x109/0x140
[<ffffffff817776c5>] ret_from_fork+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
[kworker/u8:1]
[root@dell-pr1700-02 ~]# cat /proc/8381/stack
[<ffffffffa0a34131>] md_make_request+0xb1/0x260 [md_mod]
[<ffffffff81376427>] generic_make_request+0x117/0x2f0
[<ffffffff81376675>] submit_bio+0x75/0x150
[<ffffffffa09d421c>] ext4_io_submit+0x4c/0x60 [ext4]
[<ffffffffa09d43f4>] ext4_bio_write_page+0x1a4/0x3b0 [ext4]
[<ffffffffa09b44f7>] mpage_submit_page+0x57/0x70 [ext4]
[<ffffffffa09b4778>] mpage_map_and_submit_buffers+0x168/0x290 [ext4]
[<ffffffffa09ba3f2>] ext4_writepages+0x852/0xe80 [ext4]
[<ffffffff811d6bec>] do_writepages+0x1c/0x70
[<ffffffff81293895>] __writeback_single_inode+0x45/0x320
[<ffffffff812940c0>] writeback_sb_inodes+0x280/0x570
[<ffffffff8129443c>] __writeback_inodes_wb+0x8c/0xc0
[<ffffffff812946e6>] wb_writeback+0x276/0x310
[<ffffffff81294f9c>] wb_workfn+0x19c/0x3b0
[<ffffffff810c0ff9>] process_one_work+0x149/0x360
[<ffffffff810c177d>] worker_thread+0x4d/0x3c0
[<ffffffff810c73f9>] kthread+0x109/0x140
[<ffffffff817776c5>] ret_from_fork+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
If these don't give useful hints, I can add more debug output and run the test again.
Best Regards
Xiao
>
> NeilBrown
>
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-09-14 7:57 ` Xiao Ni
@ 2017-09-16 13:15 ` Xiao Ni
2017-10-05 5:17 ` NeilBrown
1 sibling, 0 replies; 27+ messages in thread
From: Xiao Ni @ 2017-09-16 13:15 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
----- Original Message -----
> From: "Xiao Ni" <xni@redhat.com>
> To: "NeilBrown" <neilb@suse.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Thursday, September 14, 2017 3:57:21 PM
> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>
>
>
> ----- Original Message -----
> > From: "NeilBrown" <neilb@suse.com>
> > To: "Xiao Ni" <xni@redhat.com>
> > Cc: linux-raid@vger.kernel.org
> > Sent: Thursday, September 14, 2017 1:32:02 PM
> > Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata
> > without
> >
> > On Thu, Sep 14 2017, Xiao Ni wrote:
> >
> > > ----- Original Message -----
> > >> From: "NeilBrown" <neilb@suse.com>
> > >> To: "Xiao Ni" <xni@redhat.com>
> > >> Cc: linux-raid@vger.kernel.org
> > >> Sent: Thursday, September 14, 2017 7:05:20 AM
> > >> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with
> > >> metadata
> > >> without
> > >>
> > >> On Wed, Sep 13 2017, Xiao Ni wrote:
> > >> >
> > >> > Hi Neil
> > >> >
> > >> > Sorry for the bad news. The test is still running and it's stuck
> > >> > again.
> > >>
> > >> Any details? Anything at all? Just a little hint maybe?
> > >>
> > >> Just saying "it's stuck again" is very nearly useless.
> > >>
> > > Hi Neil
> > >
> > > It doesn't show any useful information in /var/log/messages
> > >
> > > echo file raid5.c +p > /sys/kernel/debug/dynamic_debug/control
> > > There aren't any messages either.
> > >
> > > It looks like another problem.
> > >
> > > [root@dell-pr1700-02 ~]# ps auxf | grep D
> > > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> > > root 8381 0.0 0.0 0 0 ? D Sep13 0:00 \_
> > > [kworker/u8:1]
> > > root 8966 0.0 0.0 0 0 ? D Sep13 0:00 \_
> > > [jbd2/md0-8]
> > > root 824 0.0 0.1 216856 8492 ? Ss Sep03 0:06
> > > /usr/bin/abrt-watch-log -F BUG: WARNING: at WARNING: CPU: INFO: possible
> > > recursive locking detected ernel BUG at list_del corruption list_add
> > > corruption do_IRQ: stack overflow: ear stack overflow (cur: eneral
> > > protection fault nable to handle kernel ouble fault: RTNL: assertion
> > > failed eek! page_mapcount(page) went negative! adness at NETDEV WATCHDOG
> > > ysctl table check failed : nobody cared IRQ handler type mismatch Machine
> > > Check Exception: Machine check events logged divide error: bounds:
> > > coprocessor segment overrun: invalid TSS: segment not present: invalid
> > > opcode: alignment check: stack segment: fpu exception: simd exception:
> > > iret exception: /var/log/messages -- /usr/bin/abrt-dump-oops -xtD
> > > root 836 0.0 0.0 195052 3200 ? Ssl Sep03 0:00
> > > /usr/sbin/gssproxy -D
> > > root 1225 0.0 0.0 106008 7436 ? Ss Sep03 0:00
> > > /usr/sbin/sshd -D
> > > root 12411 0.0 0.0 112672 2264 pts/0 S+ 00:50 0:00
> > > \_ grep --color=auto D
> > > root 8987 0.0 0.0 109000 2728 pts/2 D+ Sep13 0:04
> > > \_ dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=1000
> > > root 8983 0.0 0.0 7116 2080 ? Ds Sep13 0:00
> > > /usr/sbin/mdadm --grow --continue /dev/md0
> > >
> > > [root@dell-pr1700-02 ~]# cat /proc/mdstat
> > > Personalities : [raid6] [raid5] [raid4]
> > > md0 : active raid5 loop6[7] loop4[6] loop5[5](S) loop3[3] loop2[2]
> > > loop1[1]
> > > loop0[0]
> > > 2039808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6]
> > > [UUUUUU]
> > > [>....................] reshape = 0.0% (1/509952)
> > > finish=1059.5min
> > > speed=7K/sec
> > >
> > > unused devices: <none>
> > >
> > >
> > > It looks like the reshape doesn't start. This time I didn't add the code
> > > to check the information of mddev->suspended and active_stripes. I just
> > > applied the patches to the source. Do you have other suggestions for what
> > > to check?
> > >
> > > Best Regards
> > > Xiao
> >
> > What do
> > cat /proc/8987/stack
> > cat /proc/8983/stack
> > cat /proc/8966/stack
> > cat /proc/8381/stack
> >
> > show??
>
> dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=1000
>
> [root@dell-pr1700-02 ~]# cat /proc/8987/stack
> [<ffffffff810d4ea6>] io_schedule+0x16/0x40
> [<ffffffff811c66ae>] __lock_page+0x10e/0x160
> [<ffffffffa09b4ef0>] mpage_prepare_extent_to_map+0x290/0x310 [ext4]
> [<ffffffffa09ba007>] ext4_writepages+0x467/0xe80 [ext4]
> [<ffffffff811d6bec>] do_writepages+0x1c/0x70
> [<ffffffff811c7c66>] __filemap_fdatawrite_range+0xc6/0x100
> [<ffffffff811c7d6c>] filemap_flush+0x1c/0x20
> [<ffffffffa09b757c>] ext4_alloc_da_blocks+0x2c/0x70 [ext4]
> [<ffffffffa09a89a9>] ext4_release_file+0x79/0xc0 [ext4]
> [<ffffffff81263d67>] __fput+0xe7/0x210
> [<ffffffff81263ece>] ____fput+0xe/0x10
> [<ffffffff810c59c3>] task_work_run+0x83/0xb0
> [<ffffffff81003d64>] exit_to_usermode_loop+0x6c/0xa8
> [<ffffffff8100389a>] do_syscall_64+0x13a/0x150
> [<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> /usr/sbin/mdadm --grow --continue /dev/md0. Is this the reason to add
> lockdep_assert_held(&mddev->reconfig_mutex)?
> [root@dell-pr1700-02 ~]# cat /proc/8983/stack
> [<ffffffffa0a3464c>] mddev_suspend+0x12c/0x160 [md_mod]
> [<ffffffffa0a379ec>] suspend_lo_store+0x7c/0xe0 [md_mod]
> [<ffffffffa0a3b7d0>] md_attr_store+0x80/0xc0 [md_mod]
> [<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
> [<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
> [<ffffffff81260457>] __vfs_write+0x37/0x170
> [<ffffffff812619e2>] vfs_write+0xb2/0x1b0
> [<ffffffff81263015>] SyS_write+0x55/0xc0
> [<ffffffff810037c7>] do_syscall_64+0x67/0x150
> [<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> [jbd2/md0-8]
> [root@dell-pr1700-02 ~]# cat /proc/8966/stack
> [<ffffffffa0a39b20>] md_write_start+0xf0/0x220 [md_mod]
> [<ffffffffa0972b49>] raid5_make_request+0x89/0x8b0 [raid456]
> [<ffffffffa0a34175>] md_make_request+0xf5/0x260 [md_mod]
> [<ffffffff81376427>] generic_make_request+0x117/0x2f0
> [<ffffffff81376675>] submit_bio+0x75/0x150
> [<ffffffff8129e0b0>] submit_bh_wbc+0x140/0x170
> [<ffffffff8129e683>] submit_bh+0x13/0x20
> [<ffffffffa0957e29>] jbd2_write_superblock+0x109/0x230 [jbd2]
> [<ffffffffa0957f8b>] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
> [<ffffffffa09517ff>] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
> [<ffffffffa0955d02>] kjournald2+0xd2/0x260 [jbd2]
> [<ffffffff810c73f9>] kthread+0x109/0x140
> [<ffffffff817776c5>] ret_from_fork+0x25/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> [kworker/u8:1]
> [root@dell-pr1700-02 ~]# cat /proc/8381/stack
> [<ffffffffa0a34131>] md_make_request+0xb1/0x260 [md_mod]
> [<ffffffff81376427>] generic_make_request+0x117/0x2f0
> [<ffffffff81376675>] submit_bio+0x75/0x150
> [<ffffffffa09d421c>] ext4_io_submit+0x4c/0x60 [ext4]
> [<ffffffffa09d43f4>] ext4_bio_write_page+0x1a4/0x3b0 [ext4]
> [<ffffffffa09b44f7>] mpage_submit_page+0x57/0x70 [ext4]
> [<ffffffffa09b4778>] mpage_map_and_submit_buffers+0x168/0x290 [ext4]
> [<ffffffffa09ba3f2>] ext4_writepages+0x852/0xe80 [ext4]
> [<ffffffff811d6bec>] do_writepages+0x1c/0x70
> [<ffffffff81293895>] __writeback_single_inode+0x45/0x320
> [<ffffffff812940c0>] writeback_sb_inodes+0x280/0x570
> [<ffffffff8129443c>] __writeback_inodes_wb+0x8c/0xc0
> [<ffffffff812946e6>] wb_writeback+0x276/0x310
> [<ffffffff81294f9c>] wb_workfn+0x19c/0x3b0
> [<ffffffff810c0ff9>] process_one_work+0x149/0x360
> [<ffffffff810c177d>] worker_thread+0x4d/0x3c0
> [<ffffffff810c73f9>] kthread+0x109/0x140
> [<ffffffff817776c5>] ret_from_fork+0x25/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> If these don't give useful hints, I can try to print more information and run
> the test again.
Hi Neil
I added some code to print some information.
[13404.528231] mddev->suspended : 1
[13404.531170] mddev->active_io : 1
[13404.533774] conf->quiesce 0
MD_SB_CHANGE_PENDING of mddev->sb_flags is not set
MD_UPDATING_SB of mddev->flags is not set
It's stuck in mddev_suspend() at
    wait_event(mddev->sb_wait, atomic_read(&mddev->active_io) == 0);
and in md_write_start() at
    wait_event(mddev->sb_wait,
               !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) && !mddev->suspended);
Best Regards
Xiao
>
> Best Regards
> Xiao
> >
> > NeilBrown
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-09-14 7:57 ` Xiao Ni
2017-09-16 13:15 ` Xiao Ni
@ 2017-10-05 5:17 ` NeilBrown
2017-10-06 3:53 ` Xiao Ni
1 sibling, 1 reply; 27+ messages in thread
From: NeilBrown @ 2017-10-05 5:17 UTC (permalink / raw)
To: Xiao Ni; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 2418 bytes --]
On Thu, Sep 14 2017, Xiao Ni wrote:
>>
>> What do
>> cat /proc/8987/stack
>> cat /proc/8983/stack
>> cat /proc/8966/stack
>> cat /proc/8381/stack
>>
>> show??
>
...
>
> /usr/sbin/mdadm --grow --continue /dev/md0. Is it the reason to add lockdep_assert_held(&mddev->reconfig_mutex)?
> [root@dell-pr1700-02 ~]# cat /proc/8983/stack
> [<ffffffffa0a3464c>] mddev_suspend+0x12c/0x160 [md_mod]
> [<ffffffffa0a379ec>] suspend_lo_store+0x7c/0xe0 [md_mod]
> [<ffffffffa0a3b7d0>] md_attr_store+0x80/0xc0 [md_mod]
> [<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
> [<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
> [<ffffffff81260457>] __vfs_write+0x37/0x170
> [<ffffffff812619e2>] vfs_write+0xb2/0x1b0
> [<ffffffff81263015>] SyS_write+0x55/0xc0
> [<ffffffff810037c7>] do_syscall_64+0x67/0x150
> [<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> [jbd2/md0-8]
> [root@dell-pr1700-02 ~]# cat /proc/8966/stack
> [<ffffffffa0a39b20>] md_write_start+0xf0/0x220 [md_mod]
> [<ffffffffa0972b49>] raid5_make_request+0x89/0x8b0 [raid456]
> [<ffffffffa0a34175>] md_make_request+0xf5/0x260 [md_mod]
> [<ffffffff81376427>] generic_make_request+0x117/0x2f0
> [<ffffffff81376675>] submit_bio+0x75/0x150
> [<ffffffff8129e0b0>] submit_bh_wbc+0x140/0x170
> [<ffffffff8129e683>] submit_bh+0x13/0x20
> [<ffffffffa0957e29>] jbd2_write_superblock+0x109/0x230 [jbd2]
> [<ffffffffa0957f8b>] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
> [<ffffffffa09517ff>] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
> [<ffffffffa0955d02>] kjournald2+0xd2/0x260 [jbd2]
> [<ffffffff810c73f9>] kthread+0x109/0x140
> [<ffffffff817776c5>] ret_from_fork+0x25/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
Thanks for this (and sorry it took so long to get to it).
It looks like
Commit: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and md_write_start()")
is badly broken. I wonder how it ever passed testing.
In md_write_start() it changed the wait_event() call to
wait_event(mddev->sb_wait,
!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) && !mddev->suspended);
That should be
wait_event(mddev->sb_wait,
!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) || mddev->suspended);
i.e. it was (!A && !B), it should be (!A || B) !!!!!
Could you please make that change and try again.
Thanks,
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-10-05 5:17 ` NeilBrown
@ 2017-10-06 3:53 ` Xiao Ni
2017-10-06 4:32 ` NeilBrown
0 siblings, 1 reply; 27+ messages in thread
From: Xiao Ni @ 2017-10-06 3:53 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On 10/05/2017 01:17 PM, NeilBrown wrote:
> On Thu, Sep 14 2017, Xiao Ni wrote:
>
>>> What do
>>> cat /proc/8987/stack
>>> cat /proc/8983/stack
>>> cat /proc/8966/stack
>>> cat /proc/8381/stack
>>>
>>> show??
> ...
>
>> /usr/sbin/mdadm --grow --continue /dev/md0. Is it the reason to add lockdep_assert_held(&mddev->reconfig_mutex)?
>> [root@dell-pr1700-02 ~]# cat /proc/8983/stack
>> [<ffffffffa0a3464c>] mddev_suspend+0x12c/0x160 [md_mod]
>> [<ffffffffa0a379ec>] suspend_lo_store+0x7c/0xe0 [md_mod]
>> [<ffffffffa0a3b7d0>] md_attr_store+0x80/0xc0 [md_mod]
>> [<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
>> [<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
>> [<ffffffff81260457>] __vfs_write+0x37/0x170
>> [<ffffffff812619e2>] vfs_write+0xb2/0x1b0
>> [<ffffffff81263015>] SyS_write+0x55/0xc0
>> [<ffffffff810037c7>] do_syscall_64+0x67/0x150
>> [<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> [jbd2/md0-8]
>> [root@dell-pr1700-02 ~]# cat /proc/8966/stack
>> [<ffffffffa0a39b20>] md_write_start+0xf0/0x220 [md_mod]
>> [<ffffffffa0972b49>] raid5_make_request+0x89/0x8b0 [raid456]
>> [<ffffffffa0a34175>] md_make_request+0xf5/0x260 [md_mod]
>> [<ffffffff81376427>] generic_make_request+0x117/0x2f0
>> [<ffffffff81376675>] submit_bio+0x75/0x150
>> [<ffffffff8129e0b0>] submit_bh_wbc+0x140/0x170
>> [<ffffffff8129e683>] submit_bh+0x13/0x20
>> [<ffffffffa0957e29>] jbd2_write_superblock+0x109/0x230 [jbd2]
>> [<ffffffffa0957f8b>] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
>> [<ffffffffa09517ff>] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
>> [<ffffffffa0955d02>] kjournald2+0xd2/0x260 [jbd2]
>> [<ffffffff810c73f9>] kthread+0x109/0x140
>> [<ffffffff817776c5>] ret_from_fork+0x25/0x30
>> [<ffffffffffffffff>] 0xffffffffffffffff
> Thanks for this (and sorry it took so long to get to it).
> It looks like
>
> Commit: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and md_write_start()")
>
> is badly broken. I wonder how it ever passed testing.
>
> In md_write_start() it changed the wait_event() call to
>
> wait_event(mddev->sb_wait,
> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) && !mddev->suspended);
>
>
> That should be
>
> wait_event(mddev->sb_wait,
> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) || mddev->suspended);
Hi Neil
Do we want write bios to be handled when mddev->suspended is 1? After
this change, write bios can be handled while mddev->suspended is 1.
When the hang happens, mddev->suspended is 0 and MD_SB_CHANGE_PENDING
is set, so the patch can't fix this problem. I tried the patch and the
problem still exists.
[ 7710.589274] mddev suspend : 0
[ 7710.592228] mddev ro : 0
[ 7710.594746] mddev insync : 0
[ 7710.597620] mddev SB CHANGE PENDING is set
[ 7710.601698] mddev SB CHANGE CLEAN is set
[ 7710.605601] mddev->persistent : 1
[ 7710.608905] mddev->external : 0
[ 7710.612030] conf quiesce : 2
raid5 is still spinning.
Hmm, I have a question. Why can't we call md_check_recovery() when
MD_SB_CHANGE_PENDING is set in raid5d?
if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
spin_unlock_irq(&conf->device_lock);
md_check_recovery(mddev);
spin_lock_irq(&conf->device_lock);
}
Best Regards
Xiao
>
> i.e. it was (!A && !B), it should be (!A || B) !!!!!
>
> Could you please make that change and try again.
Hi Neil
I tried the patch and it doesn't work.
>
> Thanks,
> NeilBrown
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-10-06 3:53 ` Xiao Ni
@ 2017-10-06 4:32 ` NeilBrown
2017-10-09 1:21 ` Xiao Ni
0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2017-10-06 4:32 UTC (permalink / raw)
To: Xiao Ni; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 4261 bytes --]
On Fri, Oct 06 2017, Xiao Ni wrote:
> On 10/05/2017 01:17 PM, NeilBrown wrote:
>> On Thu, Sep 14 2017, Xiao Ni wrote:
>>
>>>> What do
>>>> cat /proc/8987/stack
>>>> cat /proc/8983/stack
>>>> cat /proc/8966/stack
>>>> cat /proc/8381/stack
>>>>
>>>> show??
>> ...
>>
>>> /usr/sbin/mdadm --grow --continue /dev/md0. Is it the reason to add lockdep_assert_held(&mddev->reconfig_mutex)?
>>> [root@dell-pr1700-02 ~]# cat /proc/8983/stack
>>> [<ffffffffa0a3464c>] mddev_suspend+0x12c/0x160 [md_mod]
>>> [<ffffffffa0a379ec>] suspend_lo_store+0x7c/0xe0 [md_mod]
>>> [<ffffffffa0a3b7d0>] md_attr_store+0x80/0xc0 [md_mod]
>>> [<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
>>> [<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
>>> [<ffffffff81260457>] __vfs_write+0x37/0x170
>>> [<ffffffff812619e2>] vfs_write+0xb2/0x1b0
>>> [<ffffffff81263015>] SyS_write+0x55/0xc0
>>> [<ffffffff810037c7>] do_syscall_64+0x67/0x150
>>> [<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> [jbd2/md0-8]
>>> [root@dell-pr1700-02 ~]# cat /proc/8966/stack
>>> [<ffffffffa0a39b20>] md_write_start+0xf0/0x220 [md_mod]
>>> [<ffffffffa0972b49>] raid5_make_request+0x89/0x8b0 [raid456]
>>> [<ffffffffa0a34175>] md_make_request+0xf5/0x260 [md_mod]
>>> [<ffffffff81376427>] generic_make_request+0x117/0x2f0
>>> [<ffffffff81376675>] submit_bio+0x75/0x150
>>> [<ffffffff8129e0b0>] submit_bh_wbc+0x140/0x170
>>> [<ffffffff8129e683>] submit_bh+0x13/0x20
>>> [<ffffffffa0957e29>] jbd2_write_superblock+0x109/0x230 [jbd2]
>>> [<ffffffffa0957f8b>] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
>>> [<ffffffffa09517ff>] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
>>> [<ffffffffa0955d02>] kjournald2+0xd2/0x260 [jbd2]
>>> [<ffffffff810c73f9>] kthread+0x109/0x140
>>> [<ffffffff817776c5>] ret_from_fork+0x25/0x30
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>> Thanks for this (and sorry it took so long to get to it).
>> It looks like
>>
>> Commit: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and md_write_start()")
>>
>> is badly broken. I wonder how it ever passed testing.
>>
>> In md_write_start() it changed the wait_event() call to
>>
>> wait_event(mddev->sb_wait,
>> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) && !mddev->suspended);
>>
>>
>> That should be
>>
>> wait_event(mddev->sb_wait,
>> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) || mddev->suspended);
> Hi Neil
>
> Do we want write bio can be handled when mddev->suspended is 1? After
> changing to this,
> write bio can be handled when mddev->suspended is 1.
This is OK.
New write bios will not get past md_handle_request().
A write bio that did get past md_handle_request() is still allowed
through md_write_start(). The mddev_suspend() call won't complete until
that write bio has finished.
>
> When the stuck happens, mddev->suspended is 0 and MD_SB_CHANGE_PENDING
> is set. So
> the patch can't fix this problem. I tried the patch, the problem still
> exists.
>
I need to see all the stack traces.
> [ 7710.589274] mddev suspend : 0
> [ 7710.592228] mddev ro : 0
> [ 7710.594746] mddev insync : 0
> [ 7710.597620] mddev SB CHANGE PENDING is set
> [ 7710.601698] mddev SB CHANGE CLEAN is set
> [ 7710.605601] mddev->persistent : 1
> [ 7710.608905] mddev->external : 0
> [ 7710.612030] conf quiesce : 2
>
> raid5 is still spinning.
>
> Hmm, I have a question. Why can't call md_check_recovery when
> MD_SB_CHANGE_PENDING
> is set in raid5d?
When MD_SB_CHANGE_PENDING is not set, there is no need to call
md_check_recovery(). It wouldn't hurt except that it would be a waste of
time.
NeilBrown
>
> if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
> spin_unlock_irq(&conf->device_lock);
> md_check_recovery(mddev);
> spin_lock_irq(&conf->device_lock);
> }
>
> Best Regards
> Xiao
>
>
>>
>> i.e. it was (!A && !B), it should be (!A || B) !!!!!
>>
>> Could you please make that change and try again.
> Hi Neil
>
> I tried the patch and it can't work.
>>
>> Thanks,
>> NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-10-06 4:32 ` NeilBrown
@ 2017-10-09 1:21 ` Xiao Ni
2017-10-09 4:57 ` NeilBrown
0 siblings, 1 reply; 27+ messages in thread
From: Xiao Ni @ 2017-10-09 1:21 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 5911 bytes --]
----- Original Message -----
> From: "NeilBrown" <neilb@suse.com>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Friday, October 6, 2017 12:32:19 PM
> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>
> On Fri, Oct 06 2017, Xiao Ni wrote:
>
> > On 10/05/2017 01:17 PM, NeilBrown wrote:
> >> On Thu, Sep 14 2017, Xiao Ni wrote:
> >>
> >>>> What do
> >>>> cat /proc/8987/stack
> >>>> cat /proc/8983/stack
> >>>> cat /proc/8966/stack
> >>>> cat /proc/8381/stack
> >>>>
> >>>> show??
> >> ...
> >>
> >>> /usr/sbin/mdadm --grow --continue /dev/md0. Is it the reason to add
> >>> lockdep_assert_held(&mddev->reconfig_mutex)?
> >>> [root@dell-pr1700-02 ~]# cat /proc/8983/stack
> >>> [<ffffffffa0a3464c>] mddev_suspend+0x12c/0x160 [md_mod]
> >>> [<ffffffffa0a379ec>] suspend_lo_store+0x7c/0xe0 [md_mod]
> >>> [<ffffffffa0a3b7d0>] md_attr_store+0x80/0xc0 [md_mod]
> >>> [<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
> >>> [<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
> >>> [<ffffffff81260457>] __vfs_write+0x37/0x170
> >>> [<ffffffff812619e2>] vfs_write+0xb2/0x1b0
> >>> [<ffffffff81263015>] SyS_write+0x55/0xc0
> >>> [<ffffffff810037c7>] do_syscall_64+0x67/0x150
> >>> [<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
> >>> [<ffffffffffffffff>] 0xffffffffffffffff
> >>>
> >>> [jbd2/md0-8]
> >>> [root@dell-pr1700-02 ~]# cat /proc/8966/stack
> >>> [<ffffffffa0a39b20>] md_write_start+0xf0/0x220 [md_mod]
> >>> [<ffffffffa0972b49>] raid5_make_request+0x89/0x8b0 [raid456]
> >>> [<ffffffffa0a34175>] md_make_request+0xf5/0x260 [md_mod]
> >>> [<ffffffff81376427>] generic_make_request+0x117/0x2f0
> >>> [<ffffffff81376675>] submit_bio+0x75/0x150
> >>> [<ffffffff8129e0b0>] submit_bh_wbc+0x140/0x170
> >>> [<ffffffff8129e683>] submit_bh+0x13/0x20
> >>> [<ffffffffa0957e29>] jbd2_write_superblock+0x109/0x230 [jbd2]
> >>> [<ffffffffa0957f8b>] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
> >>> [<ffffffffa09517ff>] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
> >>> [<ffffffffa0955d02>] kjournald2+0xd2/0x260 [jbd2]
> >>> [<ffffffff810c73f9>] kthread+0x109/0x140
> >>> [<ffffffff817776c5>] ret_from_fork+0x25/0x30
> >>> [<ffffffffffffffff>] 0xffffffffffffffff
> >> Thanks for this (and sorry it took so long to get to it).
> >> It looks like
> >>
> >> Commit: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and
> >> md_write_start()")
> >>
> >> is badly broken. I wonder how it ever passed testing.
> >>
> >> In md_write_start() it changed the wait_event() call to
> >>
> >> wait_event(mddev->sb_wait,
> >> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
> >> !mddev->suspended);
> >>
> >>
> >> That should be
> >>
> >> wait_event(mddev->sb_wait,
> >> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) ||
> >> mddev->suspended);
> > Hi Neil
> >
> > Do we want write bio can be handled when mddev->suspended is 1? After
> > changing to this,
> > write bio can be handled when mddev->suspended is 1.
>
> This is OK.
> New write bios will not get past md_handle_request().
> A write bios that did get past md_handle_request() is still allowed
> through md_write_start(). The mddev_suspend() call won't complete until
> that write bio has finished.
Hi Neil
Thanks for the explanation. I took some time to read the emails about
commit cc27b0c78 which introduced this. It's similar to the problem I
encountered, but there mddev_suspend() is called from level_store().
So adding the check of mddev->suspended in md_write_start() could fix the
problem "reshape raid5 -> raid6 atop bcache deadlocks at start on
md_attr_store / raid5_make_request".
In suspend_lo_store(), however, mddev_suspend() is not called under
mddev->reconfig_mutex, so there is still a race possibility, as you said
in your first analysis.
>
> >
> > When the stuck happens, mddev->suspended is 0 and MD_SB_CHANGE_PENDING
> > is set. So
> > the patch can't fix this problem. I tried the patch, the problem still
> > exists.
> >
>
>
> I need to see all the stack traces.
I've added the calltrace as an attachment.
>
>
> > [ 7710.589274] mddev suspend : 0
> > [ 7710.592228] mddev ro : 0
> > [ 7710.594746] mddev insync : 0
> > [ 7710.597620] mddev SB CHANGE PENDING is set
> > [ 7710.601698] mddev SB CHANGE CLEAN is set
> > [ 7710.605601] mddev->persistent : 1
> > [ 7710.608905] mddev->external : 0
> > [ 7710.612030] conf quiesce : 2
> >
> > raid5 is still spinning.
> >
> > Hmm, I have a question. Why can't call md_check_recovery when
> > MD_SB_CHANGE_PENDING
> > is set in raid5d?
>
> When MD_SB_CHANGE_PENDING is not set, there is no need to call
> md_check_recovery(). I wouldn't hurt except that it would be a waste of
> time.
I'm confused. If we want to call md_check_recovery() when MD_SB_CHANGE_PENDING
is set, it should be:
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6299,7 +6299,7 @@ static void raid5d(struct md_thread *thread)
 			break;
 		handled += batch_size;
-		if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
+		if (mddev->sb_flags & (1 << MD_SB_CHANGE_PENDING)) {
 			spin_unlock_irq(&conf->device_lock);
 			md_check_recovery(mddev);
 			spin_lock_irq(&conf->device_lock);
Right?
Regards
Xiao
>
> NeilBrown
>
>
> >
> > if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
> > spin_unlock_irq(&conf->device_lock);
> > md_check_recovery(mddev);
> > spin_lock_irq(&conf->device_lock);
> > }
> >
> > Best Regards
> > Xiao
> >
> >
> >>
> >> i.e. it was (!A && !B), it should be (!A || B) !!!!!
> >>
> >> Could you please make that change and try again.
> > Hi Neil
> >
> > I tried the patch and it can't work.
> >>
> >> Thanks,
> >> NeilBrown
>
[-- Attachment #2: calltrace --]
[-- Type: application/octet-stream, Size: 20356 bytes --]
Oct 5 22:46:18 localhost kernel: INFO: task kworker/u8:3:2453 blocked for more than 120 seconds.
Oct 5 22:46:18 localhost kernel: Tainted: G OE 4.13.0-rc5 #1
Oct 5 22:46:18 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 5 22:46:18 localhost kernel: kworker/u8:3 D 0 2453 2 0x00000080
Oct 5 22:46:18 localhost kernel: Workqueue: writeback wb_workfn (flush-9:0)
Oct 5 22:46:18 localhost kernel: Call Trace:
Oct 5 22:46:18 localhost kernel: __schedule+0x28d/0x890
Oct 5 22:46:18 localhost kernel: schedule+0x36/0x80
Oct 5 22:46:18 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct 5 22:46:18 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:46:18 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct 5 22:46:18 localhost kernel: ? bio_split+0x5d/0x90
Oct 5 22:46:18 localhost kernel: ? blk_queue_split+0xd2/0x630
Oct 5 22:46:18 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:46:18 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct 5 22:46:18 localhost kernel: generic_make_request+0x117/0x2f0
Oct 5 22:46:18 localhost kernel: submit_bio+0x75/0x150
Oct 5 22:46:18 localhost kernel: ? __test_set_page_writeback+0xc6/0x320
Oct 5 22:46:18 localhost kernel: ext4_io_submit+0x4c/0x60 [ext4]
Oct 5 22:46:18 localhost kernel: ext4_bio_write_page+0x1a4/0x3b0 [ext4]
Oct 5 22:46:18 localhost kernel: mpage_submit_page+0x57/0x70 [ext4]
Oct 5 22:46:18 localhost kernel: mpage_map_and_submit_buffers+0x168/0x290 [ext4]
Oct 5 22:46:18 localhost kernel: ext4_writepages+0x852/0xe80 [ext4]
Oct 5 22:46:18 localhost kernel: ? account_entity_enqueue+0xd8/0x100
Oct 5 22:46:18 localhost kernel: do_writepages+0x1c/0x70
Oct 5 22:46:18 localhost kernel: __writeback_single_inode+0x45/0x320
Oct 5 22:46:18 localhost kernel: writeback_sb_inodes+0x280/0x570
Oct 5 22:46:18 localhost kernel: __writeback_inodes_wb+0x8c/0xc0
Oct 5 22:46:18 localhost kernel: wb_writeback+0x276/0x310
Oct 5 22:46:18 localhost kernel: wb_workfn+0x19c/0x3b0
Oct 5 22:46:18 localhost kernel: process_one_work+0x149/0x360
Oct 5 22:46:18 localhost kernel: worker_thread+0x4d/0x3c0
Oct 5 22:46:18 localhost kernel: kthread+0x109/0x140
Oct 5 22:46:18 localhost kernel: ? rescuer_thread+0x380/0x380
Oct 5 22:46:18 localhost kernel: ? kthread_park+0x60/0x60
Oct 5 22:46:18 localhost kernel: ? do_syscall_64+0x67/0x150
Oct 5 22:46:18 localhost kernel: ret_from_fork+0x25/0x30
Oct 5 22:46:18 localhost kernel: INFO: task jbd2/md0-8:3740 blocked for more than 120 seconds.
Oct 5 22:46:18 localhost kernel: Tainted: G OE 4.13.0-rc5 #1
Oct 5 22:46:18 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 5 22:46:18 localhost kernel: jbd2/md0-8 D 0 3740 2 0x00000080
Oct 5 22:46:18 localhost kernel: Call Trace:
Oct 5 22:46:18 localhost kernel: __schedule+0x28d/0x890
Oct 5 22:46:18 localhost kernel: ? __enqueue_entity+0x6c/0x70
Oct 5 22:46:18 localhost kernel: schedule+0x36/0x80
Oct 5 22:46:18 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct 5 22:46:18 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:46:18 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct 5 22:46:19 localhost kernel: ? find_next_bit+0xb/0x10
Oct 5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:46:19 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct 5 22:46:19 localhost kernel: generic_make_request+0x117/0x2f0
Oct 5 22:46:19 localhost kernel: submit_bio+0x75/0x150
Oct 5 22:46:19 localhost kernel: ? select_idle_sibling+0x2e0/0x3b0
Oct 5 22:46:19 localhost kernel: submit_bh_wbc+0x140/0x170
Oct 5 22:46:19 localhost kernel: submit_bh+0x13/0x20
Oct 5 22:46:19 localhost kernel: jbd2_write_superblock+0x109/0x230 [jbd2]
Oct 5 22:46:19 localhost kernel: ? __enqueue_entity+0x6c/0x70
Oct 5 22:46:19 localhost kernel: ? enqueue_entity+0x1f3/0x720
Oct 5 22:46:19 localhost kernel: jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
Oct 5 22:46:19 localhost kernel: ? mutex_lock_io+0x25/0x30
Oct 5 22:46:19 localhost kernel: jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
Oct 5 22:46:19 localhost kernel: ? account_entity_dequeue+0xaa/0xe0
Oct 5 22:46:19 localhost kernel: ? dequeue_entity+0xed/0x460
Oct 5 22:46:19 localhost kernel: ? ttwu_do_activate+0x7a/0x90
Oct 5 22:46:19 localhost kernel: ? dequeue_task_fair+0x565/0x820
Oct 5 22:46:19 localhost kernel: ? __switch_to+0x229/0x440
Oct 5 22:46:19 localhost kernel: ? lock_timer_base+0x7d/0xa0
Oct 5 22:46:19 localhost kernel: ? try_to_del_timer_sync+0x53/0x80
Oct 5 22:46:19 localhost kernel: kjournald2+0xd2/0x260 [jbd2]
Oct 5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:46:19 localhost kernel: kthread+0x109/0x140
Oct 5 22:46:19 localhost kernel: ? commit_timeout+0x10/0x10 [jbd2]
Oct 5 22:46:19 localhost kernel: ? kthread_park+0x60/0x60
Oct 5 22:46:19 localhost kernel: ? do_syscall_64+0x67/0x150
Oct 5 22:46:19 localhost kernel: ret_from_fork+0x25/0x30
Oct 5 22:46:19 localhost kernel: INFO: task ext4lazyinit:3743 blocked for more than 120 seconds.
Oct 5 22:46:19 localhost kernel: Tainted: G OE 4.13.0-rc5 #1
Oct 5 22:46:19 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 5 22:46:19 localhost kernel: ext4lazyinit D 0 3743 2 0x00000080
Oct 5 22:46:19 localhost kernel: Call Trace:
Oct 5 22:46:19 localhost kernel: __schedule+0x28d/0x890
Oct 5 22:46:19 localhost kernel: ? mempool_alloc+0x6e/0x170
Oct 5 22:46:19 localhost kernel: schedule+0x36/0x80
Oct 5 22:46:19 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct 5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:46:19 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct 5 22:46:19 localhost kernel: ? bio_split+0x5d/0x90
Oct 5 22:46:19 localhost kernel: ? blk_queue_split+0xd2/0x630
Oct 5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:46:19 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct 5 22:46:19 localhost kernel: ? mempool_alloc_slab+0x15/0x20
Oct 5 22:46:19 localhost kernel: ? mempool_alloc+0x6e/0x170
Oct 5 22:46:19 localhost kernel: generic_make_request+0x117/0x2f0
Oct 5 22:46:19 localhost kernel: submit_bio+0x75/0x150
Oct 5 22:46:19 localhost kernel: next_bio+0x38/0x40
Oct 5 22:46:19 localhost kernel: __blkdev_issue_zeroout+0x164/0x210
Oct 5 22:46:19 localhost kernel: blkdev_issue_zeroout+0x62/0xc0
Oct 5 22:46:19 localhost kernel: ext4_init_inode_table+0x166/0x380 [ext4]
Oct 5 22:46:19 localhost kernel: ext4_lazyinit_thread+0x2d3/0x350 [ext4]
Oct 5 22:46:19 localhost kernel: kthread+0x109/0x140
Oct 5 22:46:19 localhost kernel: ? ext4_clear_request_list+0x70/0x70 [ext4]
Oct 5 22:46:19 localhost kernel: ? kthread_park+0x60/0x60
Oct 5 22:46:19 localhost kernel: ret_from_fork+0x25/0x30
Oct 5 22:46:19 localhost kernel: INFO: task md0_reshape:3751 blocked for more than 120 seconds.
Oct 5 22:46:19 localhost kernel: Tainted: G OE 4.13.0-rc5 #1
Oct 5 22:46:19 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 5 22:46:19 localhost kernel: md0_reshape D 0 3751 2 0x00000080
Oct 5 22:46:19 localhost kernel: Call Trace:
Oct 5 22:46:19 localhost kernel: __schedule+0x28d/0x890
Oct 5 22:46:19 localhost kernel: schedule+0x36/0x80
Oct 5 22:46:19 localhost kernel: raid5_sync_request+0x2cf/0x370 [raid456]
Oct 5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:46:19 localhost kernel: md_do_sync+0xafe/0xee0 [md_mod]
Oct 5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:46:19 localhost kernel: md_thread+0x132/0x180 [md_mod]
Oct 5 22:46:19 localhost kernel: kthread+0x109/0x140
Oct 5 22:46:19 localhost kernel: ? find_pers+0x70/0x70 [md_mod]
Oct 5 22:46:19 localhost kernel: ? kthread_park+0x60/0x60
Oct 5 22:46:19 localhost kernel: ret_from_fork+0x25/0x30
Oct 5 22:46:19 localhost kernel: INFO: task mdadm:3758 blocked for more than 120 seconds.
Oct 5 22:46:19 localhost kernel: Tainted: G OE 4.13.0-rc5 #1
Oct 5 22:46:19 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 5 22:46:19 localhost kernel: mdadm D 0 3758 1 0x00000080
Oct 5 22:46:19 localhost kernel: Call Trace:
Oct 5 22:46:19 localhost kernel: __schedule+0x28d/0x890
Oct 5 22:46:19 localhost kernel: schedule+0x36/0x80
Oct 5 22:46:19 localhost kernel: raid5_quiesce+0x274/0x2b0 [raid456]
Oct 5 22:46:19 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:46:19 localhost kernel: suspend_lo_store+0x82/0xe0 [md_mod]
Oct 5 22:46:19 localhost kernel: md_attr_store+0x80/0xc0 [md_mod]
Oct 5 22:46:19 localhost kernel: sysfs_kf_write+0x3a/0x50
Oct 5 22:46:19 localhost kernel: kernfs_fop_write+0xff/0x180
Oct 5 22:46:19 localhost kernel: __vfs_write+0x37/0x170
Oct 5 22:46:19 localhost kernel: ? selinux_file_permission+0xe5/0x120
Oct 5 22:46:19 localhost kernel: ? security_file_permission+0x3b/0xc0
Oct 5 22:46:19 localhost kernel: vfs_write+0xb2/0x1b0
Oct 5 22:46:19 localhost kernel: ? syscall_trace_enter+0x1d0/0x2b0
Oct 5 22:46:19 localhost kernel: SyS_write+0x55/0xc0
Oct 5 22:46:19 localhost kernel: do_syscall_64+0x67/0x150
Oct 5 22:46:19 localhost kernel: entry_SYSCALL64_slow_path+0x25/0x25
Oct 5 22:46:19 localhost kernel: RIP: 0033:0x7ff6fa57c840
Oct 5 22:46:19 localhost kernel: RSP: 002b:00007ffe2fe145b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Oct 5 22:46:19 localhost kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff6fa57c840
Oct 5 22:46:19 localhost kernel: RDX: 0000000000000001 RSI: 00007ffe2fe14660 RDI: 0000000000000003
Oct 5 22:46:19 localhost kernel: RBP: 00007ffe2fe14660 R08: 00007ffe2fe14660 R09: 000000000000001d
Oct 5 22:46:19 localhost kernel: R10: 000000000000000a R11: 0000000000000246 R12: 00000000004699a6
Oct 5 22:46:19 localhost kernel: R13: 0000000000000000 R14: 0000000001a1bd00 R15: 0000000001a1bd00
Oct 5 22:46:19 localhost kernel: INFO: task dd:3761 blocked for more than 120 seconds.
Oct 5 22:46:19 localhost kernel: Tainted: G OE 4.13.0-rc5 #1
Oct 5 22:46:19 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 5 22:46:19 localhost kernel: dd D 0 3761 2291 0x00000080
Oct 5 22:46:19 localhost kernel: Call Trace:
Oct 5 22:46:19 localhost kernel: __schedule+0x28d/0x890
Oct 5 22:46:19 localhost kernel: schedule+0x36/0x80
Oct 5 22:46:19 localhost kernel: io_schedule+0x16/0x40
Oct 5 22:46:19 localhost kernel: __lock_page+0x10e/0x160
Oct 5 22:46:19 localhost kernel: ? page_cache_tree_insert+0xf0/0xf0
Oct 5 22:46:19 localhost kernel: mpage_prepare_extent_to_map+0x290/0x310 [ext4]
Oct 5 22:46:19 localhost kernel: ext4_writepages+0x467/0xe80 [ext4]
Oct 5 22:46:19 localhost kernel: do_writepages+0x1c/0x70
Oct 5 22:46:19 localhost kernel: __filemap_fdatawrite_range+0xc6/0x100
Oct 5 22:46:19 localhost kernel: filemap_flush+0x1c/0x20
Oct 5 22:46:19 localhost kernel: ext4_alloc_da_blocks+0x2c/0x70 [ext4]
Oct 5 22:46:19 localhost kernel: ext4_release_file+0x79/0xc0 [ext4]
Oct 5 22:46:19 localhost kernel: __fput+0xe7/0x210
Oct 5 22:46:19 localhost kernel: ____fput+0xe/0x10
Oct 5 22:46:19 localhost kernel: task_work_run+0x83/0xb0
Oct 5 22:46:19 localhost kernel: exit_to_usermode_loop+0x6c/0xa8
Oct 5 22:46:19 localhost kernel: do_syscall_64+0x13a/0x150
Oct 5 22:46:19 localhost kernel: entry_SYSCALL64_slow_path+0x25/0x25
Oct 5 22:46:19 localhost kernel: RIP: 0033:0x7f09c0d15e90
Oct 5 22:46:19 localhost kernel: RSP: 002b:00007ffd681b4f58 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
Oct 5 22:46:19 localhost kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f09c0d15e90
Oct 5 22:46:19 localhost kernel: RDX: 0000000000100000 RSI: 0000000000000000 RDI: 0000000000000001
Oct 5 22:46:19 localhost kernel: RBP: 00000000000003e8 R08: ffffffffffffffff R09: 0000000000102003
Oct 5 22:46:19 localhost kernel: R10: 00007ffd681b4c60 R11: 0000000000000246 R12: 00000000000003e8
Oct 5 22:46:19 localhost kernel: R13: 0000000000000000 R14: 00007ffd681b634b R15: 00007ffd681b51d0
Oct 5 22:48:21 localhost kernel: INFO: task kworker/u8:3:2453 blocked for more than 120 seconds.
Oct 5 22:48:21 localhost kernel: Tainted: G OE 4.13.0-rc5 #1
Oct 5 22:48:21 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 5 22:48:21 localhost kernel: kworker/u8:3 D 0 2453 2 0x00000080
Oct 5 22:48:21 localhost kernel: Workqueue: writeback wb_workfn (flush-9:0)
Oct 5 22:48:21 localhost kernel: Call Trace:
Oct 5 22:48:21 localhost kernel: __schedule+0x28d/0x890
Oct 5 22:48:21 localhost kernel: schedule+0x36/0x80
Oct 5 22:48:21 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct 5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:48:21 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct 5 22:48:21 localhost kernel: ? bio_split+0x5d/0x90
Oct 5 22:48:21 localhost kernel: ? blk_queue_split+0xd2/0x630
Oct 5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:48:21 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct 5 22:48:21 localhost kernel: generic_make_request+0x117/0x2f0
Oct 5 22:48:21 localhost kernel: submit_bio+0x75/0x150
Oct 5 22:48:21 localhost kernel: ? __test_set_page_writeback+0xc6/0x320
Oct 5 22:48:21 localhost kernel: ext4_io_submit+0x4c/0x60 [ext4]
Oct 5 22:48:21 localhost kernel: ext4_bio_write_page+0x1a4/0x3b0 [ext4]
Oct 5 22:48:21 localhost kernel: mpage_submit_page+0x57/0x70 [ext4]
Oct 5 22:48:21 localhost kernel: mpage_map_and_submit_buffers+0x168/0x290 [ext4]
Oct 5 22:48:21 localhost kernel: ext4_writepages+0x852/0xe80 [ext4]
Oct 5 22:48:21 localhost kernel: ? account_entity_enqueue+0xd8/0x100
Oct 5 22:48:21 localhost kernel: do_writepages+0x1c/0x70
Oct 5 22:48:21 localhost kernel: __writeback_single_inode+0x45/0x320
Oct 5 22:48:21 localhost kernel: writeback_sb_inodes+0x280/0x570
Oct 5 22:48:21 localhost kernel: __writeback_inodes_wb+0x8c/0xc0
Oct 5 22:48:21 localhost kernel: wb_writeback+0x276/0x310
Oct 5 22:48:21 localhost kernel: wb_workfn+0x19c/0x3b0
Oct 5 22:48:21 localhost kernel: process_one_work+0x149/0x360
Oct 5 22:48:21 localhost kernel: worker_thread+0x4d/0x3c0
Oct 5 22:48:21 localhost kernel: kthread+0x109/0x140
Oct 5 22:48:21 localhost kernel: ? rescuer_thread+0x380/0x380
Oct 5 22:48:21 localhost kernel: ? kthread_park+0x60/0x60
Oct 5 22:48:21 localhost kernel: ? do_syscall_64+0x67/0x150
Oct 5 22:48:21 localhost kernel: ret_from_fork+0x25/0x30
Oct 5 22:48:21 localhost kernel: INFO: task jbd2/md0-8:3740 blocked for more than 120 seconds.
Oct 5 22:48:21 localhost kernel: Tainted: G OE 4.13.0-rc5 #1
Oct 5 22:48:21 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 5 22:48:21 localhost kernel: jbd2/md0-8 D 0 3740 2 0x00000080
Oct 5 22:48:21 localhost kernel: Call Trace:
Oct 5 22:48:21 localhost kernel: __schedule+0x28d/0x890
Oct 5 22:48:21 localhost kernel: ? __enqueue_entity+0x6c/0x70
Oct 5 22:48:21 localhost kernel: schedule+0x36/0x80
Oct 5 22:48:21 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct 5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:48:21 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct 5 22:48:21 localhost kernel: ? find_next_bit+0xb/0x10
Oct 5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:48:21 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct 5 22:48:21 localhost kernel: generic_make_request+0x117/0x2f0
Oct 5 22:48:21 localhost kernel: submit_bio+0x75/0x150
Oct 5 22:48:21 localhost kernel: ? select_idle_sibling+0x2e0/0x3b0
Oct 5 22:48:21 localhost kernel: submit_bh_wbc+0x140/0x170
Oct 5 22:48:21 localhost kernel: submit_bh+0x13/0x20
Oct 5 22:48:21 localhost kernel: jbd2_write_superblock+0x109/0x230 [jbd2]
Oct 5 22:48:21 localhost kernel: ? __enqueue_entity+0x6c/0x70
Oct 5 22:48:21 localhost kernel: ? enqueue_entity+0x1f3/0x720
Oct 5 22:48:21 localhost kernel: jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
Oct 5 22:48:21 localhost kernel: ? mutex_lock_io+0x25/0x30
Oct 5 22:48:21 localhost kernel: jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
Oct 5 22:48:21 localhost kernel: ? account_entity_dequeue+0xaa/0xe0
Oct 5 22:48:21 localhost kernel: ? dequeue_entity+0xed/0x460
Oct 5 22:48:21 localhost kernel: ? ttwu_do_activate+0x7a/0x90
Oct 5 22:48:21 localhost kernel: ? dequeue_task_fair+0x565/0x820
Oct 5 22:48:21 localhost kernel: ? __switch_to+0x229/0x440
Oct 5 22:48:21 localhost kernel: ? lock_timer_base+0x7d/0xa0
Oct 5 22:48:21 localhost kernel: ? try_to_del_timer_sync+0x53/0x80
Oct 5 22:48:21 localhost kernel: kjournald2+0xd2/0x260 [jbd2]
Oct 5 22:48:21 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:48:21 localhost kernel: kthread+0x109/0x140
Oct 5 22:48:21 localhost kernel: ? commit_timeout+0x10/0x10 [jbd2]
Oct 5 22:48:21 localhost kernel: ? kthread_park+0x60/0x60
Oct 5 22:48:21 localhost kernel: ? do_syscall_64+0x67/0x150
Oct 5 22:48:21 localhost kernel: ret_from_fork+0x25/0x30
Oct 5 22:48:21 localhost kernel: INFO: task ext4lazyinit:3743 blocked for more than 120 seconds.
Oct 5 22:48:22 localhost kernel: Tainted: G OE 4.13.0-rc5 #1
Oct 5 22:48:22 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 5 22:48:22 localhost kernel: ext4lazyinit D 0 3743 2 0x00000080
Oct 5 22:48:22 localhost kernel: Call Trace:
Oct 5 22:48:22 localhost kernel: __schedule+0x28d/0x890
Oct 5 22:48:22 localhost kernel: ? mempool_alloc+0x6e/0x170
Oct 5 22:48:22 localhost kernel: schedule+0x36/0x80
Oct 5 22:48:22 localhost kernel: md_write_start+0x195/0x230 [md_mod]
Oct 5 22:48:22 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:48:22 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456]
Oct 5 22:48:22 localhost kernel: ? bio_split+0x5d/0x90
Oct 5 22:48:22 localhost kernel: ? blk_queue_split+0xd2/0x630
Oct 5 22:48:22 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:48:22 localhost kernel: md_make_request+0xf5/0x260 [md_mod]
Oct 5 22:48:22 localhost kernel: ? mempool_alloc_slab+0x15/0x20
Oct 5 22:48:22 localhost kernel: ? mempool_alloc+0x6e/0x170
Oct 5 22:48:22 localhost kernel: generic_make_request+0x117/0x2f0
Oct 5 22:48:22 localhost kernel: submit_bio+0x75/0x150
Oct 5 22:48:22 localhost kernel: next_bio+0x38/0x40
Oct 5 22:48:22 localhost kernel: __blkdev_issue_zeroout+0x164/0x210
Oct 5 22:48:22 localhost kernel: blkdev_issue_zeroout+0x62/0xc0
Oct 5 22:48:22 localhost kernel: ext4_init_inode_table+0x166/0x380 [ext4]
Oct 5 22:48:22 localhost kernel: ext4_lazyinit_thread+0x2d3/0x350 [ext4]
Oct 5 22:48:22 localhost kernel: kthread+0x109/0x140
Oct 5 22:48:22 localhost kernel: ? ext4_clear_request_list+0x70/0x70 [ext4]
Oct 5 22:48:22 localhost kernel: ? kthread_park+0x60/0x60
Oct 5 22:48:22 localhost kernel: ret_from_fork+0x25/0x30
Oct 5 22:48:22 localhost kernel: INFO: task md0_reshape:3751 blocked for more than 120 seconds.
Oct 5 22:48:22 localhost kernel: Tainted: G OE 4.13.0-rc5 #1
Oct 5 22:48:22 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 5 22:48:22 localhost kernel: md0_reshape D 0 3751 2 0x00000080
Oct 5 22:48:22 localhost kernel: Call Trace:
Oct 5 22:48:22 localhost kernel: __schedule+0x28d/0x890
Oct 5 22:48:22 localhost kernel: schedule+0x36/0x80
Oct 5 22:48:22 localhost kernel: raid5_sync_request+0x2cf/0x370 [raid456]
Oct 5 22:48:22 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:48:22 localhost kernel: md_do_sync+0xafe/0xee0 [md_mod]
Oct 5 22:48:22 localhost kernel: ? remove_wait_queue+0x60/0x60
Oct 5 22:48:22 localhost kernel: md_thread+0x132/0x180 [md_mod]
Oct 5 22:48:22 localhost kernel: kthread+0x109/0x140
Oct 5 22:48:22 localhost kernel: ? find_pers+0x70/0x70 [md_mod]
Oct 5 22:48:22 localhost kernel: ? kthread_park+0x60/0x60
Oct 5 22:48:22 localhost kernel: ret_from_fork+0x25/0x30
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-10-09 1:21 ` Xiao Ni
@ 2017-10-09 4:57 ` NeilBrown
2017-10-09 5:32 ` Xiao Ni
0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2017-10-09 4:57 UTC (permalink / raw)
To: Xiao Ni; +Cc: linux-raid
On Sun, Oct 08 2017, Xiao Ni wrote:
> ----- Original Message -----
>> From: "NeilBrown" <neilb@suse.com>
>> To: "Xiao Ni" <xni@redhat.com>
>> Cc: linux-raid@vger.kernel.org
>> Sent: Friday, October 6, 2017 12:32:19 PM
>> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>>
>> On Fri, Oct 06 2017, Xiao Ni wrote:
>>
>> > On 10/05/2017 01:17 PM, NeilBrown wrote:
>> >> On Thu, Sep 14 2017, Xiao Ni wrote:
>> >>
>> >>>> What do
>> >>>> cat /proc/8987/stack
>> >>>> cat /proc/8983/stack
>> >>>> cat /proc/8966/stack
>> >>>> cat /proc/8381/stack
>> >>>>
>> >>>> show??
>> >> ...
>> >>
>> >>> /usr/sbin/mdadm --grow --continue /dev/md0. Is it the reason to add
>> >>> lockdep_assert_held(&mddev->reconfig_mutex)?
>> >>> [root@dell-pr1700-02 ~]# cat /proc/8983/stack
>> >>> [<ffffffffa0a3464c>] mddev_suspend+0x12c/0x160 [md_mod]
>> >>> [<ffffffffa0a379ec>] suspend_lo_store+0x7c/0xe0 [md_mod]
>> >>> [<ffffffffa0a3b7d0>] md_attr_store+0x80/0xc0 [md_mod]
>> >>> [<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
>> >>> [<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
>> >>> [<ffffffff81260457>] __vfs_write+0x37/0x170
>> >>> [<ffffffff812619e2>] vfs_write+0xb2/0x1b0
>> >>> [<ffffffff81263015>] SyS_write+0x55/0xc0
>> >>> [<ffffffff810037c7>] do_syscall_64+0x67/0x150
>> >>> [<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
>> >>> [<ffffffffffffffff>] 0xffffffffffffffff
>> >>>
>> >>> [jbd2/md0-8]
>> >>> [root@dell-pr1700-02 ~]# cat /proc/8966/stack
>> >>> [<ffffffffa0a39b20>] md_write_start+0xf0/0x220 [md_mod]
>> >>> [<ffffffffa0972b49>] raid5_make_request+0x89/0x8b0 [raid456]
>> >>> [<ffffffffa0a34175>] md_make_request+0xf5/0x260 [md_mod]
>> >>> [<ffffffff81376427>] generic_make_request+0x117/0x2f0
>> >>> [<ffffffff81376675>] submit_bio+0x75/0x150
>> >>> [<ffffffff8129e0b0>] submit_bh_wbc+0x140/0x170
>> >>> [<ffffffff8129e683>] submit_bh+0x13/0x20
>> >>> [<ffffffffa0957e29>] jbd2_write_superblock+0x109/0x230 [jbd2]
>> >>> [<ffffffffa0957f8b>] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
>> >>> [<ffffffffa09517ff>] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
>> >>> [<ffffffffa0955d02>] kjournald2+0xd2/0x260 [jbd2]
>> >>> [<ffffffff810c73f9>] kthread+0x109/0x140
>> >>> [<ffffffff817776c5>] ret_from_fork+0x25/0x30
>> >>> [<ffffffffffffffff>] 0xffffffffffffffff
>> >> Thanks for this (and sorry it took so long to get to it).
>> >> It looks like
>> >>
>> >> Commit: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and
>> >> md_write_start()")
>> >>
>> >> is badly broken. I wonder how it ever passed testing.
>> >>
>> >> In md_write_start() it changed the wait_event() call to
>> >>
>> >> wait_event(mddev->sb_wait,
>> >> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
>> >> !mddev->suspended);
>> >>
>> >>
>> >> That should be
>> >>
>> >> wait_event(mddev->sb_wait,
>> >> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) ||
>> >> mddev->suspended);
>> > Hi Neil
>> >
>> > Do we want write bios to be handled when mddev->suspended is 1? After
>> > this change,
>> > write bios can be handled when mddev->suspended is 1.
>>
>> This is OK.
>> New write bios will not get past md_handle_request().
>> A write bio that did get past md_handle_request() is still allowed
>> through md_write_start(). The mddev_suspend() call won't complete until
>> that write bio has finished.
>
> Hi Neil
>
> Thanks for the explanation. I took some time to read the emails about the
> patch cc27b0c78 which introduced this. It's similar to the problem I
> encountered. But there is a call of mddev_suspend() in level_store.
> So adding the check of mddev->suspended in md_write_start() can fix the problem
> "reshape raid5 -> raid6 atop bcache deadlocks at start on md_attr_store /
> raid5_make_request".
>
> In function suspend_lo_store it doesn't call mddev_suspend under mddev->reconfig_mutex.
It would if you had applied
[PATCH 3/4] md: use mddev_suspend/resume instead of ->quiesce()
Did you apply all 4 patches?
> So there is still a race possibility as you said at first analysis.
>>
>> >
>> > When the stuck happens, mddev->suspended is 0 and MD_SB_CHANGE_PENDING
>> > is set. So
>> > the patch can't fix this problem. I tried the patch, the problem still
>> > exists.
>> >
>>
>>
>> I need to see all the stack traces.
>
> I've added the call trace as an attachment.
Thanks. It looks like suspend_lo_store() is calling raid5_quiesce() directly
as you say - so a patch is missing.
>
>>
>>
>> > [ 7710.589274] mddev suspend : 0
>> > [ 7710.592228] mddev ro : 0
>> > [ 7710.594746] mddev insync : 0
>> > [ 7710.597620] mddev SB CHANGE PENDING is set
>> > [ 7710.601698] mddev SB CHANGE CLEAN is set
>> > [ 7710.605601] mddev->persistent : 1
>> > [ 7710.608905] mddev->external : 0
>> > [ 7710.612030] conf quiesce : 2
>> >
>> > raid5 is still spinning.
>> >
>> > Hmm, I have a question. Why can't we call md_check_recovery() when
>> > MD_SB_CHANGE_PENDING
>> > is set in raid5d?
>>
>> When MD_SB_CHANGE_PENDING is not set, there is no need to call
>> md_check_recovery(). It wouldn't hurt, except that it would be a waste of
>> time.
>
> I'm confused. If we want to call md_check_recovery when MD_SB_CHANGE_PENDING
> is set, it should be
Sorry, I described the condition wrongly.
If any bit is set in ->sb_flags (except MD_SB_CHANGE_PENDING), then
we need to call md_check_recovery(). If none of those other bits
are set, there is no need.
NeilBrown
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-10-09 4:57 ` NeilBrown
@ 2017-10-09 5:32 ` Xiao Ni
2017-10-09 5:52 ` NeilBrown
0 siblings, 1 reply; 27+ messages in thread
From: Xiao Ni @ 2017-10-09 5:32 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On 10/09/2017 12:57 PM, NeilBrown wrote:
> On Sun, Oct 08 2017, Xiao Ni wrote:
>
>> ----- Original Message -----
>>> From: "NeilBrown" <neilb@suse.com>
>>> To: "Xiao Ni" <xni@redhat.com>
>>> Cc: linux-raid@vger.kernel.org
>>> Sent: Friday, October 6, 2017 12:32:19 PM
>>> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>>>
>>> On Fri, Oct 06 2017, Xiao Ni wrote:
>>>
>>>> On 10/05/2017 01:17 PM, NeilBrown wrote:
>>>>> On Thu, Sep 14 2017, Xiao Ni wrote:
>>>>>
>>>>>>> What do
>>>>>>> cat /proc/8987/stack
>>>>>>> cat /proc/8983/stack
>>>>>>> cat /proc/8966/stack
>>>>>>> cat /proc/8381/stack
>>>>>>>
>>>>>>> show??
>>>>> ...
>>>>>
>>>>>> /usr/sbin/mdadm --grow --continue /dev/md0. Is it the reason to add
>>>>>> lockdep_assert_held(&mddev->reconfig_mutex)?
>>>>>> [root@dell-pr1700-02 ~]# cat /proc/8983/stack
>>>>>> [<ffffffffa0a3464c>] mddev_suspend+0x12c/0x160 [md_mod]
>>>>>> [<ffffffffa0a379ec>] suspend_lo_store+0x7c/0xe0 [md_mod]
>>>>>> [<ffffffffa0a3b7d0>] md_attr_store+0x80/0xc0 [md_mod]
>>>>>> [<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
>>>>>> [<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
>>>>>> [<ffffffff81260457>] __vfs_write+0x37/0x170
>>>>>> [<ffffffff812619e2>] vfs_write+0xb2/0x1b0
>>>>>> [<ffffffff81263015>] SyS_write+0x55/0xc0
>>>>>> [<ffffffff810037c7>] do_syscall_64+0x67/0x150
>>>>>> [<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>>
>>>>>> [jbd2/md0-8]
>>>>>> [root@dell-pr1700-02 ~]# cat /proc/8966/stack
>>>>>> [<ffffffffa0a39b20>] md_write_start+0xf0/0x220 [md_mod]
>>>>>> [<ffffffffa0972b49>] raid5_make_request+0x89/0x8b0 [raid456]
>>>>>> [<ffffffffa0a34175>] md_make_request+0xf5/0x260 [md_mod]
>>>>>> [<ffffffff81376427>] generic_make_request+0x117/0x2f0
>>>>>> [<ffffffff81376675>] submit_bio+0x75/0x150
>>>>>> [<ffffffff8129e0b0>] submit_bh_wbc+0x140/0x170
>>>>>> [<ffffffff8129e683>] submit_bh+0x13/0x20
>>>>>> [<ffffffffa0957e29>] jbd2_write_superblock+0x109/0x230 [jbd2]
>>>>>> [<ffffffffa0957f8b>] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
>>>>>> [<ffffffffa09517ff>] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
>>>>>> [<ffffffffa0955d02>] kjournald2+0xd2/0x260 [jbd2]
>>>>>> [<ffffffff810c73f9>] kthread+0x109/0x140
>>>>>> [<ffffffff817776c5>] ret_from_fork+0x25/0x30
>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>> Thanks for this (and sorry it took so long to get to it).
>>>>> It looks like
>>>>>
>>>>> Commit: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and
>>>>> md_write_start()")
>>>>>
>>>>> is badly broken. I wonder how it ever passed testing.
>>>>>
>>>>> In md_write_start() it changed the wait_event() call to
>>>>>
>>>>> wait_event(mddev->sb_wait,
>>>>> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
>>>>> !mddev->suspended);
>>>>>
>>>>>
>>>>> That should be
>>>>>
>>>>> wait_event(mddev->sb_wait,
>>>>> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) ||
>>>>> mddev->suspended);
>>>> Hi Neil
>>>>
>>>> Do we want write bios to be handled when mddev->suspended is 1? After
>>>> this change,
>>>> write bios can be handled when mddev->suspended is 1.
>>> This is OK.
>>> New write bios will not get past md_handle_request().
>>> A write bio that did get past md_handle_request() is still allowed
>>> through md_write_start(). The mddev_suspend() call won't complete until
>>> that write bio has finished.
>> Hi Neil
>>
>> Thanks for the explanation. I took some time to read the emails about the
>> patch cc27b0c78 which introduced this. It's similar to the problem I
>> encountered. But there is a call of mddev_suspend() in level_store.
>> So adding the check of mddev->suspended in md_write_start() can fix the problem
>> "reshape raid5 -> raid6 atop bcache deadlocks at start on md_attr_store /
>> raid5_make_request".
>>
>> In function suspend_lo_store it doesn't call mddev_suspend under mddev->reconfig_mutex.
> It would if you had applied
> [PATCH 3/4] md: use mddev_suspend/resume instead of ->quiesce()
>
> Did you apply all 4 patches?
Sorry, it's my mistake. I insmodded the wrong module. I'll apply the four
patches
and test again.
> Thanks. It looks like suspend_lo_store() is calling raid5_quiesce() directly
> as you say - so a patch is missing.
Yes, thanks for pointing this out.
>>>> Hmm, I have a question. Why can't we call md_check_recovery() when
>>>> MD_SB_CHANGE_PENDING
>>>> is set in raid5d?
>>> When MD_SB_CHANGE_PENDING is not set, there is no need to call
>>> md_check_recovery(). It wouldn't hurt, except that it would be a waste of
>>> time.
>> I'm confused. If we want to call md_check_recovery when MD_SB_CHANGE_PENDING
>> is set, it should be
> Sorry, I described the condition wrongly.
> If any bit is set in ->sb_flags (except MD_SB_CHANGE_PENDING), then
> we need to call md_check_recovery(). If none of those other bits
> are set, there is no need.
Hmm, so back to my first question: why can't we call md_check_recovery() when
MD_SB_CHANGE_PENDING is set? The superblock still needs to be updated when
MD_SB_CHANGE_PENDING is set. I can't understand this part.
Can it be:
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6299,7 +6299,7 @@ static void raid5d(struct md_thread *thread)
break;
handled += batch_size;
- if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
+ if (mddev->sb_flags) {
Best Regards
Xiao
>
> NeilBrown
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-10-09 5:32 ` Xiao Ni
@ 2017-10-09 5:52 ` NeilBrown
2017-10-10 6:05 ` Xiao Ni
0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2017-10-09 5:52 UTC (permalink / raw)
To: Xiao Ni; +Cc: linux-raid
On Mon, Oct 09 2017, Xiao Ni wrote:
> On 10/09/2017 12:57 PM, NeilBrown wrote:
>> On Sun, Oct 08 2017, Xiao Ni wrote:
>>
>>> ----- Original Message -----
>>>> From: "NeilBrown" <neilb@suse.com>
>>>> To: "Xiao Ni" <xni@redhat.com>
>>>> Cc: linux-raid@vger.kernel.org
>>>> Sent: Friday, October 6, 2017 12:32:19 PM
>>>> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>>>>
>>>> On Fri, Oct 06 2017, Xiao Ni wrote:
>>>>
>>>>> On 10/05/2017 01:17 PM, NeilBrown wrote:
>>>>>> On Thu, Sep 14 2017, Xiao Ni wrote:
>>>>>>
>>>>>>>> What do
>>>>>>>> cat /proc/8987/stack
>>>>>>>> cat /proc/8983/stack
>>>>>>>> cat /proc/8966/stack
>>>>>>>> cat /proc/8381/stack
>>>>>>>>
>>>>>>>> show??
>>>>>> ...
>>>>>>
>>>>>>> /usr/sbin/mdadm --grow --continue /dev/md0. Is it the reason to add
>>>>>>> lockdep_assert_held(&mddev->reconfig_mutex)?
>>>>>>> [root@dell-pr1700-02 ~]# cat /proc/8983/stack
>>>>>>> [<ffffffffa0a3464c>] mddev_suspend+0x12c/0x160 [md_mod]
>>>>>>> [<ffffffffa0a379ec>] suspend_lo_store+0x7c/0xe0 [md_mod]
>>>>>>> [<ffffffffa0a3b7d0>] md_attr_store+0x80/0xc0 [md_mod]
>>>>>>> [<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
>>>>>>> [<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
>>>>>>> [<ffffffff81260457>] __vfs_write+0x37/0x170
>>>>>>> [<ffffffff812619e2>] vfs_write+0xb2/0x1b0
>>>>>>> [<ffffffff81263015>] SyS_write+0x55/0xc0
>>>>>>> [<ffffffff810037c7>] do_syscall_64+0x67/0x150
>>>>>>> [<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
>>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>>>
>>>>>>> [jbd2/md0-8]
>>>>>>> [root@dell-pr1700-02 ~]# cat /proc/8966/stack
>>>>>>> [<ffffffffa0a39b20>] md_write_start+0xf0/0x220 [md_mod]
>>>>>>> [<ffffffffa0972b49>] raid5_make_request+0x89/0x8b0 [raid456]
>>>>>>> [<ffffffffa0a34175>] md_make_request+0xf5/0x260 [md_mod]
>>>>>>> [<ffffffff81376427>] generic_make_request+0x117/0x2f0
>>>>>>> [<ffffffff81376675>] submit_bio+0x75/0x150
>>>>>>> [<ffffffff8129e0b0>] submit_bh_wbc+0x140/0x170
>>>>>>> [<ffffffff8129e683>] submit_bh+0x13/0x20
>>>>>>> [<ffffffffa0957e29>] jbd2_write_superblock+0x109/0x230 [jbd2]
>>>>>>> [<ffffffffa0957f8b>] jbd2_journal_update_sb_log_tail+0x3b/0x80 [jbd2]
>>>>>>> [<ffffffffa09517ff>] jbd2_journal_commit_transaction+0x16ef/0x19e0 [jbd2]
>>>>>>> [<ffffffffa0955d02>] kjournald2+0xd2/0x260 [jbd2]
>>>>>>> [<ffffffff810c73f9>] kthread+0x109/0x140
>>>>>>> [<ffffffff817776c5>] ret_from_fork+0x25/0x30
>>>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>> Thanks for this (and sorry it took so long to get to it).
>>>>>> It looks like
>>>>>>
>>>>>> Commit: cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and
>>>>>> md_write_start()")
>>>>>>
>>>>>> is badly broken. I wonder how it ever passed testing.
>>>>>>
>>>>>> In md_write_start() it changed the wait_event() call to
>>>>>>
>>>>>> wait_event(mddev->sb_wait,
>>>>>> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
>>>>>> !mddev->suspended);
>>>>>>
>>>>>>
>>>>>> That should be
>>>>>>
>>>>>> wait_event(mddev->sb_wait,
>>>>>> !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) ||
>>>>>> mddev->suspended);
>>>>> Hi Neil
>>>>>
>>>>> Do we want write bios to be handled when mddev->suspended is 1? After
>>>>> this change,
>>>>> write bios can be handled when mddev->suspended is 1.
>>>> This is OK.
>>>> New write bios will not get past md_handle_request().
>>>> A write bio that did get past md_handle_request() is still allowed
>>>> through md_write_start(). The mddev_suspend() call won't complete until
>>>> that write bio has finished.
>>> Hi Neil
>>>
>>> Thanks for the explanation. I took some time to read the emails about the
>>> patch cc27b0c78 which introduced this. It's similar to the problem I
>>> encountered. But there is a call of mddev_suspend() in level_store.
>>> So adding the check of mddev->suspended in md_write_start() can fix the problem
>>> "reshape raid5 -> raid6 atop bcache deadlocks at start on md_attr_store /
>>> raid5_make_request".
>>>
>>> In function suspend_lo_store it doesn't call mddev_suspend under mddev->reconfig_mutex.
>> It would if you had applied
>> [PATCH 3/4] md: use mddev_suspend/resume instead of ->quiesce()
>>
>> Did you apply all 4 patches?
>
> Sorry, it's my mistake. I insmodded the wrong module. I'll apply the four
> patches
> and test again.
>> Thanks. It looks like suspend_lo_store() is calling raid5_quiesce() directly
>> as you say - so a patch is missing.
>
> Yes, thanks for pointing this out.
>
>>>>> Hmm, I have a question. Why can't we call md_check_recovery() when
>>>>> MD_SB_CHANGE_PENDING
>>>>> is set in raid5d?
>>>> When MD_SB_CHANGE_PENDING is not set, there is no need to call
>>>> md_check_recovery(). It wouldn't hurt, except that it would be a waste of
>>>> time.
>>> I'm confused. If we want to call md_check_recovery when MD_SB_CHANGE_PENDING
>>> is set, it should be
>> Sorry, I described the condition wrongly.
>> If any bit is set in ->sb_flags (except MD_SB_CHANGE_PENDING), then
>> we need to call md_check_recovery(). If none of those other bits
>> are set, there is no need.
>
> Hmm, so back to my first question: why can't we call md_check_recovery() when
> MD_SB_CHANGE_PENDING is set? The superblock still needs to be updated when
> MD_SB_CHANGE_PENDING is set. I can't understand this part.
>
> Can it be:
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6299,7 +6299,7 @@ static void raid5d(struct md_thread *thread)
> break;
> handled += batch_size;
>
> - if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
> + if (mddev->sb_flags) {
>
Maybe it could, but there is a test in md_check_recovery()
if ( ! (
(mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
and it makes sense to match that. There is no point dropping the
spinlock and reclaiming it if md_check_recovery() isn't going to do
anything useful.
NeilBrown
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-10-09 5:52 ` NeilBrown
@ 2017-10-10 6:05 ` Xiao Ni
2017-10-10 21:20 ` NeilBrown
0 siblings, 1 reply; 27+ messages in thread
From: Xiao Ni @ 2017-10-10 6:05 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On 10/09/2017 01:52 PM, NeilBrown wrote:
> On Mon, Oct 09 2017, Xiao Ni wrote:
>
>> On 10/09/2017 12:57 PM, NeilBrown wrote:
>>> It would if you had applied
>>> [PATCH 3/4] md: use mddev_suspend/resume instead of ->quiesce()
>>>
>>> Did you apply all 4 patches?
>> Sorry, it's my mistake. I insmodded the wrong module. I'll apply the four
>> patches
>> and test again.
>>> Thanks. It looks like suspend_lo_store() is calling raid5_quiesce() directly
>>> as you say - so a patch is missing.
>> Yes, thanks for pointing this out.
Hi Neil
I applied the four patches and one patch "md: fix deadlock error in
recent patch."
There is a new hang. It's stuck at suspend_hi_store this time. I've added
the call trace
as an attachment.
I added some printk to print some information.
[12695.993329] mddev suspend : 1
[12695.996270] mddev ro : 0
[12695.998790] mddev insync : 0
[12696.001641] mddev active io: 1
mddev->flags doesn't have MD_SB_CHANGE_PENDING.
root 8653 0.0 0.0 9688 4752 ? DLs 00:52 0:00
/usr/sbin/mdadm --grow --continue /dev/md0
[root@dell-pr1700-02 md]# cat /proc/8653/stack
[<ffffffffa080d64c>] mddev_suspend+0x12c/0x160 [md_mod]
[<ffffffffa081090c>] suspend_hi_store+0x7c/0xe0 [md_mod]
[<ffffffffa0814450>] md_attr_store+0x80/0xc0 [md_mod]
[<ffffffff812ec8da>] sysfs_kf_write+0x3a/0x50
[<ffffffff812ec39f>] kernfs_fop_write+0xff/0x180
[<ffffffff81260457>] __vfs_write+0x37/0x170
[<ffffffff812619e2>] vfs_write+0xb2/0x1b0
[<ffffffff81263015>] SyS_write+0x55/0xc0
[<ffffffff810037c7>] do_syscall_64+0x67/0x150
[<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff
root 1234 0.0 0.0 106008 7280 ? Ss Oct09 0:00
/usr/sbin/sshd -D
root 8655 0.1 0.0 108996 2752 pts/0 D+ 00:52 0:04
| \_ dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=1000
[root@dell-pr1700-02 md]# cat /proc/8655/stack
[<ffffffffa097b09a>] wait_transaction_locked+0x8a/0xd0 [jbd2]
[<ffffffffa097b2d4>] add_transaction_credits+0x1c4/0x2a0 [jbd2]
[<ffffffffa097b587>] start_this_handle+0x197/0x400 [jbd2]
[<ffffffffa097ba3b>] jbd2__journal_start+0xeb/0x1f0 [jbd2]
[<ffffffffa0a26dfd>] __ext4_journal_start_sb+0x6d/0xf0 [ext4]
[<ffffffffa0a45a10>] ext4_da_write_begin+0x140/0x410 [ext4]
[<ffffffff811c4dee>] generic_perform_write+0xbe/0x1b0
[<ffffffff811c812b>] __generic_file_write_iter+0x19b/0x1e0
[<ffffffffa0a32c7f>] ext4_file_write_iter+0x28f/0x3f0 [ext4]
[<ffffffff81260513>] __vfs_write+0xf3/0x170
[<ffffffff812619e2>] vfs_write+0xb2/0x1b0
[<ffffffff81263015>] SyS_write+0x55/0xc0
[<ffffffff810037c7>] do_syscall_64+0x67/0x150
[<ffffffff81777527>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff
root 8143 0.0 0.0 0 0 ? D 00:40 0:00 \_
[kworker/u8:4]
[root@dell-pr1700-02 md]# cat /proc/8143/stack
[<ffffffffa080d131>] md_make_request+0xb1/0x260 [md_mod]
[<ffffffff81376427>] generic_make_request+0x117/0x2f0
[<ffffffff81376675>] submit_bio+0x75/0x150
[<ffffffffa0a5e21c>] ext4_io_submit+0x4c/0x60 [ext4]
[<ffffffffa0a5e3f4>] ext4_bio_write_page+0x1a4/0x3b0 [ext4]
[<ffffffffa0a3e4f7>] mpage_submit_page+0x57/0x70 [ext4]
[<ffffffffa0a3e778>] mpage_map_and_submit_buffers+0x168/0x290 [ext4]
[<ffffffffa0a443f2>] ext4_writepages+0x852/0xe80 [ext4]
[<ffffffff811d6bec>] do_writepages+0x1c/0x70
[<ffffffff81293895>] __writeback_single_inode+0x45/0x320
[<ffffffff812940c0>] writeback_sb_inodes+0x280/0x570
[<ffffffff8129443c>] __writeback_inodes_wb+0x8c/0xc0
[<ffffffff812946e6>] wb_writeback+0x276/0x310
[<ffffffff81294f9c>] wb_workfn+0x19c/0x3b0
[<ffffffff810c0ff9>] process_one_work+0x149/0x360
[<ffffffff810c177d>] worker_thread+0x4d/0x3c0
[<ffffffff810c73f9>] kthread+0x109/0x140
[<ffffffff817776c5>] ret_from_fork+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
>> Hmm, so it's the first question. Why can't md_check_recovery be called
>> when MD_SB_CHANGE_PENDING is set? It needs to update the superblock
>> too when MD_SB_CHANGE_PENDING is set. I can't understand this part.
>>
>> Can it be:
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -6299,7 +6299,7 @@ static void raid5d(struct md_thread *thread)
>> break;
>> handled += batch_size;
>>
>> - if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
>> + if (mddev->sb_flags) {
>>
> Maybe it could, but there is a test in md_check_recovery()
>
> if ( ! (
> (mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
>
> and it makes sense to match that. There is no point dropping the
> spinlock and reclaiming it if md_check_recovery() isn't going to do
> anything useful.
It was introduced by:
commit 126925c090155f13e90b9e7e8c4010e96027c00a
Author: NeilBrown <neilb@suse.de>
Date: Tue Sep 7 17:02:47 2010 +1000
md: call md_update_sb even for 'external' metadata arrays.
For external metadata we don't want to take the mddev lock if only
MD_SB_CHANGE_PENDING is set. But it also reduces the opportunity
to update the superblock for internal metadata when only
MD_SB_CHANGE_PENDING is set.
Can it be:
diff --git a/drivers/md/md.c b/drivers/md/md.c
index b6b7a28..55e9280 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7777,7 +7777,7 @@ void md_check_recovery(struct mddev *mddev)
if (mddev->ro && !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
return;
if ( ! (
- (mddev->flags & ~ (1<<MD_CHANGE_PENDING)) ||
+ (mddev->flags & (mddev->external == 1 && ~(1<<MD_CHANGE_PENDING))) ||
test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
(mddev->external == 0 && mddev->safemode == 1) ||
Best Regards
Xiao
[-- Attachment #2: calltrace --]
[-- Type: text/plain, Size: 15020 bytes --]
[11302.691461] INFO: task kworker/u8:4:8143 blocked for more than 120 seconds.
[11302.698364] Tainted: G OE 4.13.0-rc5 #1
[11302.703719] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11302.711482] kworker/u8:4 D 0 8143 2 0x00000080
[11302.711488] Workqueue: writeback wb_workfn (flush-9:0)
[11302.711489] Call Trace:
[11302.713914] __schedule+0x28d/0x890
[11302.717370] schedule+0x36/0x80
[11302.720485] md_make_request+0xb1/0x260 [md_mod]
[11302.725064] ? remove_wait_queue+0x60/0x60
[11302.729123] generic_make_request+0x117/0x2f0
[11302.733442] submit_bio+0x75/0x150
[11302.736812] ? __test_set_page_writeback+0xc6/0x320
[11302.741660] ext4_io_submit+0x4c/0x60 [ext4]
[11302.745896] ext4_bio_write_page+0x1a4/0x3b0 [ext4]
[11302.750733] mpage_submit_page+0x57/0x70 [ext4]
[11302.755231] mpage_map_and_submit_buffers+0x168/0x290 [ext4]
[11302.760845] ext4_writepages+0x852/0xe80 [ext4]
[11302.765340] ? account_entity_enqueue+0xd8/0x100
[11302.769916] do_writepages+0x1c/0x70
[11302.773462] __writeback_single_inode+0x45/0x320
[11302.778040] writeback_sb_inodes+0x280/0x570
[11302.782273] __writeback_inodes_wb+0x8c/0xc0
[11302.786508] wb_writeback+0x276/0x310
[11302.790137] wb_workfn+0x19c/0x3b0
[11302.793509] process_one_work+0x149/0x360
[11302.797483] worker_thread+0x4d/0x3c0
[11302.801109] kthread+0x109/0x140
[11302.804307] ? rescuer_thread+0x380/0x380
[11302.808280] ? kthread_park+0x60/0x60
[11302.811913] ? do_syscall_64+0x67/0x150
[11302.815715] ret_from_fork+0x25/0x30
[11302.819257] INFO: task jbd2/md0-8:8635 blocked for more than 120 seconds.
[11302.825990] Tainted: G OE 4.13.0-rc5 #1
[11302.831338] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11302.839100] jbd2/md0-8 D 0 8635 2 0x00000080
[11302.839101] Call Trace:
[11302.841527] __schedule+0x28d/0x890
[11302.844985] schedule+0x36/0x80
[11302.848097] jbd2_journal_commit_transaction+0x275/0x19e0 [jbd2]
[11302.854052] ? account_entity_dequeue+0xaa/0xe0
[11302.858541] ? dequeue_entity+0xed/0x460
[11302.862428] ? ttwu_do_activate+0x7a/0x90
[11302.866400] ? dequeue_task_fair+0x565/0x820
[11302.870632] ? __switch_to+0x229/0x440
[11302.874350] ? remove_wait_queue+0x60/0x60
[11302.878409] ? lock_timer_base+0x7d/0xa0
[11302.882299] ? try_to_del_timer_sync+0x53/0x80
[11302.886704] kjournald2+0xd2/0x260 [jbd2]
[11302.890676] ? remove_wait_queue+0x60/0x60
[11302.894740] kthread+0x109/0x140
[11302.897940] ? commit_timeout+0x10/0x10 [jbd2]
[11302.902346] ? kthread_park+0x60/0x60
[11302.905973] ret_from_fork+0x25/0x30
[11302.909514] INFO: task mdadm:8653 blocked for more than 120 seconds.
[11302.915812] Tainted: G OE 4.13.0-rc5 #1
[11302.921160] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11302.928923] mdadm D 0 8653 1 0x00000080
[11302.928924] Call Trace:
[11302.931347] __schedule+0x28d/0x890
[11302.934807] schedule+0x36/0x80
[11302.937923] mddev_suspend+0x12c/0x160 [md_mod]
[11302.942416] ? remove_wait_queue+0x60/0x60
[11302.946476] suspend_hi_store+0x7c/0xe0 [md_mod]
[11302.951052] md_attr_store+0x80/0xc0 [md_mod]
[11302.955375] sysfs_kf_write+0x3a/0x50
[11302.959002] kernfs_fop_write+0xff/0x180
[11302.962894] __vfs_write+0x37/0x170
[11302.966351] ? selinux_file_permission+0xe5/0x120
[11302.971013] ? security_file_permission+0x3b/0xc0
[11302.975679] vfs_write+0xb2/0x1b0
[11302.978961] ? syscall_trace_enter+0x1d0/0x2b0
[11302.983366] SyS_write+0x55/0xc0
[11302.986564] do_syscall_64+0x67/0x150
[11302.990193] entry_SYSCALL64_slow_path+0x25/0x25
[11302.994769] RIP: 0033:0x7f016632a840
[11302.998308] RSP: 002b:00007ffdcb5f1d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[11303.005811] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f016632a840
[11303.012883] RDX: 0000000000000005 RSI: 00007ffdcb5f1e30 RDI: 0000000000000003
[11303.019953] RBP: 00007ffdcb5f1e30 R08: 00007ffdcb5f1e30 R09: 000000000000001d
[11303.027027] R10: 000000000000000a R11: 0000000000000246 R12: 00000000004699b1
[11303.034101] R13: 0000000000000000 R14: 00000000004dd000 R15: 0000000000000001
[11303.041173] INFO: task dd:8655 blocked for more than 120 seconds.
[11303.047213] Tainted: G OE 4.13.0-rc5 #1
[11303.052564] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11303.060325] dd D 0 8655 2687 0x00000080
[11303.060326] Call Trace:
[11303.062752] __schedule+0x28d/0x890
[11303.066206] schedule+0x36/0x80
[11303.069317] wait_transaction_locked+0x8a/0xd0 [jbd2]
[11303.074326] ? remove_wait_queue+0x60/0x60
[11303.078385] add_transaction_credits+0x1c4/0x2a0 [jbd2]
[11303.083567] start_this_handle+0x197/0x400 [jbd2]
[11303.088229] ? __add_to_page_cache_locked+0x11c/0x1f0
[11303.093236] ? kmem_cache_alloc+0x194/0x1a0
[11303.097384] jbd2__journal_start+0xeb/0x1f0 [jbd2]
[11303.102142] ? ext4_da_write_begin+0x140/0x410 [ext4]
[11303.107151] __ext4_journal_start_sb+0x6d/0xf0 [ext4]
[11303.112164] ext4_da_write_begin+0x140/0x410 [ext4]
[11303.116997] generic_perform_write+0xbe/0x1b0
[11303.121316] ? file_update_time+0x5e/0x110
[11303.125380] __generic_file_write_iter+0x19b/0x1e0
[11303.130130] ? _crng_backtrack_protect+0x63/0x80
[11303.134712] ext4_file_write_iter+0x28f/0x3f0 [ext4]
[11303.139630] __vfs_write+0xf3/0x170
[11303.143087] vfs_write+0xb2/0x1b0
[11303.146369] ? syscall_trace_enter+0x1d0/0x2b0
[11303.150772] SyS_write+0x55/0xc0
[11303.153971] do_syscall_64+0x67/0x150
[11303.157598] entry_SYSCALL64_slow_path+0x25/0x25
[11303.162177] RIP: 0033:0x7f951dfa0840
[11303.165720] RSP: 002b:00007fff82316978 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[11303.173226] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f951dfa0840
[11303.180296] RDX: 0000000000100000 RSI: 00007f951e38a000 RDI: 0000000000000001
[11303.187369] RBP: 0000000000100000 R08: ffffffffffffffff R09: 0000000000102003
[11303.194445] R10: 00007fff82316690 R11: 0000000000000246 R12: 0000000000100000
[11303.201520] R13: 00007f951e38a000 R14: 00007f951e48a000 R15: 0000000000000000
[11425.569412] INFO: task kworker/u8:4:8143 blocked for more than 120 seconds.
[11425.576313] Tainted: G OE 4.13.0-rc5 #1
[11425.581666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11425.589428] kworker/u8:4 D 0 8143 2 0x00000080
[11425.589434] Workqueue: writeback wb_workfn (flush-9:0)
[11425.589435] Call Trace:
[11425.591862] __schedule+0x28d/0x890
[11425.595317] schedule+0x36/0x80
[11425.598433] md_make_request+0xb1/0x260 [md_mod]
[11425.603015] ? remove_wait_queue+0x60/0x60
[11425.607074] generic_make_request+0x117/0x2f0
[11425.611392] submit_bio+0x75/0x150
[11425.614763] ? __test_set_page_writeback+0xc6/0x320
[11425.619607] ext4_io_submit+0x4c/0x60 [ext4]
[11425.623845] ext4_bio_write_page+0x1a4/0x3b0 [ext4]
[11425.628682] mpage_submit_page+0x57/0x70 [ext4]
[11425.633177] mpage_map_and_submit_buffers+0x168/0x290 [ext4]
[11425.638791] ext4_writepages+0x852/0xe80 [ext4]
[11425.643283] ? account_entity_enqueue+0xd8/0x100
[11425.647856] do_writepages+0x1c/0x70
[11425.651400] __writeback_single_inode+0x45/0x320
[11425.655972] writeback_sb_inodes+0x280/0x570
[11425.660206] __writeback_inodes_wb+0x8c/0xc0
[11425.664436] wb_writeback+0x276/0x310
[11425.668065] wb_workfn+0x19c/0x3b0
[11425.671439] process_one_work+0x149/0x360
[11425.675410] worker_thread+0x4d/0x3c0
[11425.679036] kthread+0x109/0x140
[11425.682236] ? rescuer_thread+0x380/0x380
[11425.686207] ? kthread_park+0x60/0x60
[11425.689837] ? do_syscall_64+0x67/0x150
[11425.693637] ret_from_fork+0x25/0x30
[11425.697180] INFO: task jbd2/md0-8:8635 blocked for more than 120 seconds.
[11425.703907] Tainted: G OE 4.13.0-rc5 #1
[11425.709253] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11425.717011] jbd2/md0-8 D 0 8635 2 0x00000080
[11425.717012] Call Trace:
[11425.719436] __schedule+0x28d/0x890
[11425.722891] schedule+0x36/0x80
[11425.726006] jbd2_journal_commit_transaction+0x275/0x19e0 [jbd2]
[11425.731960] ? account_entity_dequeue+0xaa/0xe0
[11425.736448] ? dequeue_entity+0xed/0x460
[11425.740337] ? ttwu_do_activate+0x7a/0x90
[11425.744309] ? dequeue_task_fair+0x565/0x820
[11425.748538] ? __switch_to+0x229/0x440
[11425.752253] ? remove_wait_queue+0x60/0x60
[11425.756313] ? lock_timer_base+0x7d/0xa0
[11425.760201] ? try_to_del_timer_sync+0x53/0x80
[11425.764605] kjournald2+0xd2/0x260 [jbd2]
[11425.768575] ? remove_wait_queue+0x60/0x60
[11425.772636] kthread+0x109/0x140
[11425.775832] ? commit_timeout+0x10/0x10 [jbd2]
[11425.780236] ? kthread_park+0x60/0x60
[11425.783863] ret_from_fork+0x25/0x30
[11425.787406] INFO: task mdadm:8653 blocked for more than 120 seconds.
[11425.793704] Tainted: G OE 4.13.0-rc5 #1
[11425.799053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11425.806815] mdadm D 0 8653 1 0x00000080
[11425.806816] Call Trace:
[11425.809237] __schedule+0x28d/0x890
[11425.812694] schedule+0x36/0x80
[11425.815807] mddev_suspend+0x12c/0x160 [md_mod]
[11425.820297] ? remove_wait_queue+0x60/0x60
[11425.824357] suspend_hi_store+0x7c/0xe0 [md_mod]
[11425.828933] md_attr_store+0x80/0xc0 [md_mod]
[11425.833253] sysfs_kf_write+0x3a/0x50
[11425.836881] kernfs_fop_write+0xff/0x180
[11425.840771] __vfs_write+0x37/0x170
[11425.844228] ? selinux_file_permission+0xe5/0x120
[11425.848888] ? security_file_permission+0x3b/0xc0
[11425.853552] vfs_write+0xb2/0x1b0
[11425.856836] ? syscall_trace_enter+0x1d0/0x2b0
[11425.861242] SyS_write+0x55/0xc0
[11425.864438] do_syscall_64+0x67/0x150
[11425.868063] entry_SYSCALL64_slow_path+0x25/0x25
[11425.872639] RIP: 0033:0x7f016632a840
[11425.876177] RSP: 002b:00007ffdcb5f1d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[11425.883680] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f016632a840
[11425.890752] RDX: 0000000000000005 RSI: 00007ffdcb5f1e30 RDI: 0000000000000003
[11425.897824] RBP: 00007ffdcb5f1e30 R08: 00007ffdcb5f1e30 R09: 000000000000001d
[11425.904895] R10: 000000000000000a R11: 0000000000000246 R12: 00000000004699b1
[11425.911966] R13: 0000000000000000 R14: 00000000004dd000 R15: 0000000000000001
[11425.919038] INFO: task dd:8655 blocked for more than 120 seconds.
[11425.925076] Tainted: G OE 4.13.0-rc5 #1
[11425.930424] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11425.938181] dd D 0 8655 2687 0x00000080
[11425.938182] Call Trace:
[11425.940604] __schedule+0x28d/0x890
[11425.944059] schedule+0x36/0x80
[11425.947170] wait_transaction_locked+0x8a/0xd0 [jbd2]
[11425.952178] ? remove_wait_queue+0x60/0x60
[11425.956238] add_transaction_credits+0x1c4/0x2a0 [jbd2]
[11425.961418] start_this_handle+0x197/0x400 [jbd2]
[11425.966079] ? __add_to_page_cache_locked+0x11c/0x1f0
[11425.971086] ? kmem_cache_alloc+0x194/0x1a0
[11425.975232] jbd2__journal_start+0xeb/0x1f0 [jbd2]
[11425.979987] ? ext4_da_write_begin+0x140/0x410 [ext4]
[11425.984997] __ext4_journal_start_sb+0x6d/0xf0 [ext4]
[11425.990009] ext4_da_write_begin+0x140/0x410 [ext4]
[11425.994840] generic_perform_write+0xbe/0x1b0
[11425.999156] ? file_update_time+0x5e/0x110
[11426.003217] __generic_file_write_iter+0x19b/0x1e0
[11426.007963] ? _crng_backtrack_protect+0x63/0x80
[11426.012546] ext4_file_write_iter+0x28f/0x3f0 [ext4]
[11426.017464] __vfs_write+0xf3/0x170
[11426.020921] vfs_write+0xb2/0x1b0
[11426.024204] ? syscall_trace_enter+0x1d0/0x2b0
[11426.028606] SyS_write+0x55/0xc0
[11426.031806] do_syscall_64+0x67/0x150
[11426.035432] entry_SYSCALL64_slow_path+0x25/0x25
[11426.040010] RIP: 0033:0x7f951dfa0840
[11426.043550] RSP: 002b:00007fff82316978 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[11426.051054] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f951dfa0840
[11426.058126] RDX: 0000000000100000 RSI: 00007f951e38a000 RDI: 0000000000000001
[11426.065200] RBP: 0000000000100000 R08: ffffffffffffffff R09: 0000000000102003
[11426.072273] R10: 00007fff82316690 R11: 0000000000000246 R12: 0000000000100000
[11426.079345] R13: 00007f951e38a000 R14: 00007f951e48a000 R15: 0000000000000000
[11548.447362] INFO: task kworker/u8:4:8143 blocked for more than 120 seconds.
[11548.454261] Tainted: G OE 4.13.0-rc5 #1
[11548.459616] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11548.467378] kworker/u8:4 D 0 8143 2 0x00000080
[11548.467383] Workqueue: writeback wb_workfn (flush-9:0)
[11548.467384] Call Trace:
[11548.469811] __schedule+0x28d/0x890
[11548.473266] schedule+0x36/0x80
[11548.476379] md_make_request+0xb1/0x260 [md_mod]
[11548.480956] ? remove_wait_queue+0x60/0x60
[11548.485014] generic_make_request+0x117/0x2f0
[11548.489331] submit_bio+0x75/0x150
[11548.492702] ? __test_set_page_writeback+0xc6/0x320
[11548.497550] ext4_io_submit+0x4c/0x60 [ext4]
[11548.501786] ext4_bio_write_page+0x1a4/0x3b0 [ext4]
[11548.506624] mpage_submit_page+0x57/0x70 [ext4]
[11548.511119] mpage_map_and_submit_buffers+0x168/0x290 [ext4]
[11548.516731] ext4_writepages+0x852/0xe80 [ext4]
[11548.521222] ? account_entity_enqueue+0xd8/0x100
[11548.525796] do_writepages+0x1c/0x70
[11548.529340] __writeback_single_inode+0x45/0x320
[11548.533913] writeback_sb_inodes+0x280/0x570
[11548.538146] __writeback_inodes_wb+0x8c/0xc0
[11548.542375] wb_writeback+0x276/0x310
[11548.546001] wb_workfn+0x19c/0x3b0
[11548.549373] process_one_work+0x149/0x360
[11548.553344] worker_thread+0x4d/0x3c0
[11548.556972] kthread+0x109/0x140
[11548.560172] ? rescuer_thread+0x380/0x380
[11548.564142] ? kthread_park+0x60/0x60
[11548.567770] ? do_syscall_64+0x67/0x150
[11548.571571] ret_from_fork+0x25/0x30
[11548.575111] INFO: task jbd2/md0-8:8635 blocked for more than 120 seconds.
[11548.581839] Tainted: G OE 4.13.0-rc5 #1
[11548.587186] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11548.594948] jbd2/md0-8 D 0 8635 2 0x00000080
[11548.594949] Call Trace:
[11548.597372] __schedule+0x28d/0x890
[11548.600825] schedule+0x36/0x80
[11548.603937] jbd2_journal_commit_transaction+0x275/0x19e0 [jbd2]
[11548.609892] ? account_entity_dequeue+0xaa/0xe0
[11548.614379] ? dequeue_entity+0xed/0x460
[11548.618268] ? ttwu_do_activate+0x7a/0x90
[11548.622240] ? dequeue_task_fair+0x565/0x820
[11548.626469] ? __switch_to+0x229/0x440
[11548.630187] ? remove_wait_queue+0x60/0x60
[11548.634245] ? lock_timer_base+0x7d/0xa0
[11548.638133] ? try_to_del_timer_sync+0x53/0x80
[11548.642536] kjournald2+0xd2/0x260 [jbd2]
[11548.646506] ? remove_wait_queue+0x60/0x60
[11548.650564] kthread+0x109/0x140
[11548.653763] ? commit_timeout+0x10/0x10 [jbd2]
[11548.658168] ? kthread_park+0x60/0x60
[11548.661794] ret_from_fork+0x25/0x30
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-10-10 6:05 ` Xiao Ni
@ 2017-10-10 21:20 ` NeilBrown
[not found] ` <960568852.19225619.1507689864371.JavaMail.zimbra@redhat.com>
0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2017-10-10 21:20 UTC (permalink / raw)
To: Xiao Ni; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 2470 bytes --]
On Tue, Oct 10 2017, Xiao Ni wrote:
> On 10/09/2017 01:52 PM, NeilBrown wrote:
>> On Mon, Oct 09 2017, Xiao Ni wrote:
>>
>>> On 10/09/2017 12:57 PM, NeilBrown wrote:
>>>> It would if you had applied
>>>> [PATCH 3/4] md: use mddev_suspend/resume instead of ->quiesce()
>>>>
>>>> Did you apply all 4 patches?
>>> Sorry, it's my mistake. I insmodded the wrong module. I'll apply the
>>> four patches and test again.
>>>> Thanks. It looks like suspend_lo_store() is calling raid5_quiesce()
>>>> directly, as you say - so a patch is missing.
>>> Yes, thanks for pointing this out.
>
> Hi Neil
>
> I applied the four patches and the patch "md: fix deadlock error in
> recent patch".
> There is a new hang. It's stuck in suspend_hi_store this time. I've added
> the calltrace as an attachment.
>
> I added some printk to print some information.
>
> [12695.993329] mddev suspend : 1
> [12695.996270] mddev ro : 0
> [12695.998790] mddev insync : 0
> [12696.001641] mddev active io: 1
You didn't tell me where (in the code) you printed this information.
That makes it hard to interpret.
If mddev->active_io is 1, then some thread must be in this range
of code
atomic_inc(&mddev->active_io);
rcu_read_unlock();
if (!mddev->pers->make_request(mddev, bio)) {
atomic_dec(&mddev->active_io);
wake_up(&mddev->sb_wait);
goto check_suspended;
}
if (atomic_dec_and_test(&mddev->active_io) && mddev->suspended)
wake_up(&mddev->sb_wait);
If that thread is blocked (which appears to be the case) it must be in
->make_request() because nothing else there blocks.
None of the threads you showed are in that code.
But you didn't report all the threads - only those which had printed
warnings.
echo t > /proc/sysrq-trigger
will produce the stack traces of *all* threads. That would be more
useful.
>
> Can it be:
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index b6b7a28..55e9280 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7777,7 +7777,7 @@ void md_check_recovery(struct mddev *mddev)
> if (mddev->ro && !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
> return;
> if ( ! (
> - (mddev->flags & ~ (1<<MD_CHANGE_PENDING)) ||
> + (mddev->flags & (mddev->external == 1 && ~(1<<MD_CHANGE_PENDING))) ||
Please read that code again and see how it doesn't make any sense at
all.
Thanks,
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-09-12 1:49 [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without NeilBrown
` (5 preceding siblings ...)
2017-09-13 2:11 ` Xiao Ni
@ 2017-09-30 9:46 ` Xiao Ni
2017-10-05 5:03 ` NeilBrown
6 siblings, 1 reply; 27+ messages in thread
From: Xiao Ni @ 2017-09-30 9:46 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On 09/12/2017 09:49 AM, NeilBrown wrote:
> Hi,
> I looked again at the previous patch I posted which tried to make
> md_update_sb() safe without taking reconfig_mutex, and realized that
> it had serious problems, particularly around devices being added or
> removed while the update was happening.
Could you explain this in detail? What's the serious problems?
Regards
Xiao
>
> So I decided to try a different approach, which is embodied in these
> patches. The md thread is now explicitly allowed to call
> md_update_sb() while some other thread holds the lock and
> waits for mddev_suspend() to complete.
>
> Please test these and confirm that they still address the problem you
> found.
>
> Thanks,
> NeilBrown
>
> ---
>
> NeilBrown (4):
> md: always hold reconfig_mutex when calling mddev_suspend()
> md: don't call bitmap_create() while array is quiesced.
> md: use mddev_suspend/resume instead of ->quiesce()
> md: allow metadata update while suspending.
>
>
> drivers/md/dm-raid.c | 5 ++++-
> drivers/md/md.c | 45 ++++++++++++++++++++++++++++++++-------------
> drivers/md/md.h | 6 ++++++
> drivers/md/raid5-cache.c | 2 ++
> 4 files changed, 44 insertions(+), 14 deletions(-)
>
> --
> Signature
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-09-30 9:46 ` Xiao Ni
@ 2017-10-05 5:03 ` NeilBrown
2017-10-06 3:40 ` Xiao Ni
0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2017-10-05 5:03 UTC (permalink / raw)
To: Xiao Ni; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 1908 bytes --]
On Sat, Sep 30 2017, Xiao Ni wrote:
> On 09/12/2017 09:49 AM, NeilBrown wrote:
>> Hi,
>> I looked again at the previous patch I posted which tried to make
>> md_update_sb() safe without taking reconfig_mutex, and realized that
>> it had serious problems, particularly around devices being added or
>> removed while the update was happening.
> Could you explain this in detail? What's the serious problems?
My patch allowed md_update_sb() to run without holding ->reconfig_mutex.
md_update_sb() uses rdev_for_each() in several places, so either that
list needs to be kept stable, or md_update_sb() needs to be very
careful while walking the list.
I couldn't just use a spinlock to protect the lists because one of the
rdev_for_each loops calls md_super_write() on some of the devices, which
is not allowed while holding a spinlock.
I decided it was too hard to make md_update_sb() walk the list in a
fully safe manner.
NeilBrown
>
> Regards
> Xiao
>>
>> So I decided to try a different approach, which is embodied in these
>> patches. The md thread is now explicitly allowed to call
>> md_update_sb() while some other thread holds the lock and
>> waits for mddev_suspend() to complete.
>>
>> Please test these and confirm that they still address the problem you
>> found.
>>
>> Thanks,
>> NeilBrown
>>
>> ---
>>
>> NeilBrown (4):
>> md: always hold reconfig_mutex when calling mddev_suspend()
>> md: don't call bitmap_create() while array is quiesced.
>> md: use mddev_suspend/resume instead of ->quiesce()
>> md: allow metadata update while suspending.
>>
>>
>> drivers/md/dm-raid.c | 5 ++++-
>> drivers/md/md.c | 45 ++++++++++++++++++++++++++++++++-------------
>> drivers/md/md.h | 6 ++++++
>> drivers/md/raid5-cache.c | 2 ++
>> 4 files changed, 44 insertions(+), 14 deletions(-)
>>
>> --
>> Signature
>>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
2017-10-05 5:03 ` NeilBrown
@ 2017-10-06 3:40 ` Xiao Ni
0 siblings, 0 replies; 27+ messages in thread
From: Xiao Ni @ 2017-10-06 3:40 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On 10/05/2017 01:03 PM, NeilBrown wrote:
> On Sat, Sep 30 2017, Xiao Ni wrote:
>
>> On 09/12/2017 09:49 AM, NeilBrown wrote:
>>> Hi,
>>> I looked again at the previous patch I posted which tried to make
>>> md_update_sb() safe without taking reconfig_mutex, and realized that
>>> it had serious problems, particularly around devices being added or
>>> removed while the update was happening.
>> Could you explain this in detail? What's the serious problems?
> My patch allowed md_update_sb() to run without holding ->reconfig_mutex.
> md_update_sb() uses rdev_for_each() in several places, so either that
> list needs to be kept stable, or md_update_sb() needs to be very
> careful while walking the list.
>
> I couldn't just use a spinlock to protect the lists because one of the
> rdev_for_each loops calls md_super_write() on some of the devices, which
> is not allowed while holding a spinlock.
>
> I decided it was too hard to make md_update_sb() walk the list in a
> fully safe manner.
>
> NeilBrown
Hi Neil
Thanks for this explanation. Now I know the reason.
Regards
Xiao
>
>
>> Regards
>> Xiao
>>> So I decided to try a different approach, which is embodied in these
>>> patches. The md thread is now explicitly allowed to call
>>> md_update_sb() while some other thread holds the lock and
>>> waits for mddev_suspend() to complete.
>>>
>>> Please test these and confirm that they still address the problem you
>>> found.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>> ---
>>>
>>> NeilBrown (4):
>>> md: always hold reconfig_mutex when calling mddev_suspend()
>>> md: don't call bitmap_create() while array is quiesced.
>>> md: use mddev_suspend/resume instead of ->quiesce()
>>> md: allow metadata update while suspending.
>>>
>>>
>>> drivers/md/dm-raid.c | 5 ++++-
>>> drivers/md/md.c | 45 ++++++++++++++++++++++++++++++++-------------
>>> drivers/md/md.h | 6 ++++++
>>> drivers/md/raid5-cache.c | 2 ++
>>> 4 files changed, 44 insertions(+), 14 deletions(-)
>>>
>>> --
>>> Signature
>>>
^ permalink raw reply [flat|nested] 27+ messages in thread