* [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed
@ 2016-07-28  6:16 Guoqing Jiang
  2016-07-28  6:16 ` [PATCH 2/8] md-cluster: use FORCEUNLOCK in lockres_free Guoqing Jiang
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Guoqing Jiang @ 2016-07-28  6:16 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, Guoqing Jiang

The new_disk_ack could return failure if WAITING_FOR_NEWDISK
is not set, so we need to kick the rdev from the array when
that failure happens.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2ed547f..743cd21 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6092,9 +6092,11 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
 			export_rdev(rdev);
 
 		if (mddev_is_clustered(mddev)) {
-			if (info->state & (1 << MD_DISK_CANDIDATE))
-				md_cluster_ops->new_disk_ack(mddev, (err == 0));
-			else {
+			if (info->state & (1 << MD_DISK_CANDIDATE)) {
+				err = md_cluster_ops->new_disk_ack(mddev, (err == 0));
+				if (err)
+					md_kick_rdev_from_array(rdev);
+			} else {
 				if (err)
 					md_cluster_ops->add_new_disk_cancel(mddev);
 				else
-- 
2.6.2



* [PATCH 2/8] md-cluster: use FORCEUNLOCK in lockres_free
  2016-07-28  6:16 [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
@ 2016-07-28  6:16 ` Guoqing Jiang
  2016-07-28  6:16 ` [PATCH 3/8] md-cluster: remove some unnecessary dlm_unlock_sync Guoqing Jiang
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2016-07-28  6:16 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, Guoqing Jiang

For dlm_unlock, we need to pass the flag to dlm_unlock as the
third parameter instead of setting res->flags.

Also, DLM_LKF_FORCEUNLOCK is more suitable for dlm_unlock
since it works even when the lock is on the waiting or convert queue.
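
For reference, dlm_unlock takes the flags as its third argument (the
prototype in include/linux/dlm.h is roughly as below), so a flag that
is only stashed in res->flags is never seen by the unlock path:

	int dlm_unlock(dlm_lockspace_t *lockspace, uint32_t lkid,
		       uint32_t flags, struct dlm_lksb *lksb, void *astarg);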

Acked-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md-cluster.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 41573f1..0a0605f 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -194,25 +194,18 @@ out_err:
 
 static void lockres_free(struct dlm_lock_resource *res)
 {
-	int ret;
+	int ret = 0;
 
 	if (!res)
 		return;
 
-	/* cancel a lock request or a conversion request that is blocked */
-	res->flags |= DLM_LKF_CANCEL;
-retry:
-	ret = dlm_unlock(res->ls, res->lksb.sb_lkid, 0, &res->lksb, res);
-	if (unlikely(ret != 0)) {
-		pr_info("%s: failed to unlock %s return %d\n", __func__, res->name, ret);
-
-		/* if a lock conversion is cancelled, then the lock is put
-		 * back to grant queue, need to ensure it is unlocked */
-		if (ret == -DLM_ECANCEL)
-			goto retry;
-	}
-	res->flags &= ~DLM_LKF_CANCEL;
-	wait_for_completion(&res->completion);
+	/* use the FORCEUNLOCK flag, so we can unlock even if the lock is
+	 * on the waiting or convert queue */
+	ret = dlm_unlock(res->ls, res->lksb.sb_lkid, DLM_LKF_FORCEUNLOCK, &res->lksb, res);
+	if (unlikely(ret != 0))
+		pr_err("failed to unlock %s return %d\n", res->name, ret);
+	else
+		wait_for_completion(&res->completion);
 
 	kfree(res->name);
 	kfree(res->lksb.sb_lvbptr);
-- 
2.6.2



* [PATCH 3/8] md-cluster: remove some unnecessary dlm_unlock_sync
  2016-07-28  6:16 [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
  2016-07-28  6:16 ` [PATCH 2/8] md-cluster: use FORCEUNLOCK in lockres_free Guoqing Jiang
@ 2016-07-28  6:16 ` Guoqing Jiang
  2016-07-28  6:16 ` [PATCH 4/8] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang Guoqing Jiang
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2016-07-28  6:16 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, Guoqing Jiang

Since DLM_LKF_FORCEUNLOCK is used in lockres_free,
we don't need to call dlm_unlock_sync before freeing
the lock resource.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md-cluster.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 0a0605f..ea2699e 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -281,7 +281,7 @@ static void recover_bitmaps(struct md_thread *thread)
 		ret = bitmap_copy_from_slot(mddev, slot, &lo, &hi, true);
 		if (ret) {
 			pr_err("md-cluster: Could not copy data from bitmap %d\n", slot);
-			goto dlm_unlock;
+			goto clear_bit;
 		}
 		if (hi > 0) {
 			if (lo < mddev->recovery_cp)
@@ -293,8 +293,6 @@ static void recover_bitmaps(struct md_thread *thread)
 			    md_wakeup_thread(mddev->thread);
 			}
 		}
-dlm_unlock:
-		dlm_unlock_sync(bm_lockres);
 clear_bit:
 		lockres_free(bm_lockres);
 		clear_bit(slot, &cinfo->recovery_map);
@@ -763,7 +761,6 @@ static int gather_all_resync_info(struct mddev *mddev, int total_slots)
 			md_check_recovery(mddev);
 		}
 
-		dlm_unlock_sync(bm_lockres);
 		lockres_free(bm_lockres);
 	}
 out:
@@ -1173,7 +1170,6 @@ static void unlock_all_bitmaps(struct mddev *mddev)
 	if (cinfo->other_bitmap_lockres) {
 		for (i = 0; i < mddev->bitmap_info.nodes - 1; i++) {
 			if (cinfo->other_bitmap_lockres[i]) {
-				dlm_unlock_sync(cinfo->other_bitmap_lockres[i]);
 				lockres_free(cinfo->other_bitmap_lockres[i]);
 			}
 		}
-- 
2.6.2



* [PATCH 4/8] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
  2016-07-28  6:16 [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
  2016-07-28  6:16 ` [PATCH 2/8] md-cluster: use FORCEUNLOCK in lockres_free Guoqing Jiang
  2016-07-28  6:16 ` [PATCH 3/8] md-cluster: remove some unnecessary dlm_unlock_sync Guoqing Jiang
@ 2016-07-28  6:16 ` Guoqing Jiang
  2016-08-01 22:20   ` Shaohua Li
  2016-07-28  6:16 ` [PATCH 5/8] md: changes for MD_STILL_CLOSED flag Guoqing Jiang
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Guoqing Jiang @ 2016-07-28  6:16 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, Guoqing Jiang

When a node leaves the cluster, its bitmap needs to be
synced by another node, so the "md*_recover" thread is
triggered for that purpose. However, with the steps below,
we can find a task hang happening on either B or C.

1. Node A creates a resyncing cluster raid1 and assembles
   it on the other two nodes (B and C).
2. Stop the array on B and C.
3. Stop the array on A.

linux44:~ # ps aux|grep md|grep D
root	5938	0.0  0.1  19852  1964 pts/0    D+   14:52   0:00 mdadm -S md0
root	5939	0.0  0.0      0     0 ?        D    14:52   0:00 [md0_recover]

linux44:~ # cat /proc/5939/stack
[<ffffffffa04cf321>] dlm_lock_sync+0x71/0x90 [md_cluster]
[<ffffffffa04d0705>] recover_bitmaps+0x125/0x220 [md_cluster]
[<ffffffffa052105d>] md_thread+0x16d/0x180 [md_mod]
[<ffffffff8107ad94>] kthread+0xb4/0xc0
[<ffffffff8152a518>] ret_from_fork+0x58/0x90

linux44:~ # cat /proc/5938/stack
[<ffffffff8107afde>] kthread_stop+0x6e/0x120
[<ffffffffa0519da0>] md_unregister_thread+0x40/0x80 [md_mod]
[<ffffffffa04cfd20>] leave+0x70/0x120 [md_cluster]
[<ffffffffa0525e24>] md_cluster_stop+0x14/0x30 [md_mod]
[<ffffffffa05269ab>] bitmap_free+0x14b/0x150 [md_mod]
[<ffffffffa0523f3b>] do_md_stop+0x35b/0x5a0 [md_mod]
[<ffffffffa0524e83>] md_ioctl+0x873/0x1590 [md_mod]
[<ffffffff81288464>] blkdev_ioctl+0x214/0x7d0
[<ffffffff811dd3dd>] block_ioctl+0x3d/0x40
[<ffffffff811b92d4>] do_vfs_ioctl+0x2d4/0x4b0
[<ffffffff811b9538>] SyS_ioctl+0x88/0xa0
[<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b

The problem is that recover_bitmaps can't reliably abort
when the thread is unregistered. So dlm_lock_sync_interruptible
is introduced to detect the thread's situation and fix the problem.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md-cluster.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index ea2699e..f3d584e 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -10,6 +10,8 @@
 
 
 #include <linux/module.h>
+#include <linux/completion.h>
+#include <linux/kthread.h>
 #include <linux/dlm.h>
 #include <linux/sched.h>
 #include <linux/raid/md_p.h>
@@ -141,6 +143,40 @@ static int dlm_unlock_sync(struct dlm_lock_resource *res)
 	return dlm_lock_sync(res, DLM_LOCK_NL);
 }
 
+/* A variation of dlm_lock_sync, which makes the lock request
+ * interruptible */
+static int dlm_lock_sync_interruptible(struct dlm_lock_resource *res, int mode,
+				       struct mddev *mddev)
+{
+	int ret = 0;
+
+	ret = dlm_lock(res->ls, mode, &res->lksb,
+			res->flags, res->name, strlen(res->name),
+			0, sync_ast, res, res->bast);
+	if (ret)
+		return ret;
+
+	wait_event(res->completion.wait,
+		   res->completion.done || kthread_should_stop());
+	if (!res->completion.done) {
+		/*
+		 * the convert queue contains the lock request when request is
+		 * interrupted, and sync_ast could still be run, so need to
+		 * cancel the request and reset completion
+		 */
+		ret = dlm_unlock(res->ls, res->lksb.sb_lkid, DLM_LKF_CANCEL, &res->lksb, res);
+		reinit_completion(&res->completion);
+		if (unlikely(ret != 0))
+			pr_info("failed to cancel previous lock request "
+				 "%s return %d\n", res->name, ret);
+		return -EPERM;
+	}
+	wait_for_completion(&res->completion);
+	if (res->lksb.sb_status == 0)
+		res->mode = mode;
+	return res->lksb.sb_status;
+}
+
 static struct dlm_lock_resource *lockres_init(struct mddev *mddev,
 		char *name, void (*bastfn)(void *arg, int mode), int with_lvb)
 {
@@ -272,7 +308,7 @@ static void recover_bitmaps(struct md_thread *thread)
 			goto clear_bit;
 		}
 
-		ret = dlm_lock_sync(bm_lockres, DLM_LOCK_PW);
+		ret = dlm_lock_sync_interruptible(bm_lockres, DLM_LOCK_PW, mddev);
 		if (ret) {
 			pr_err("md-cluster: Could not DLM lock %s: %d\n",
 					str, ret);
-- 
2.6.2



* [PATCH 5/8] md: changes for MD_STILL_CLOSED flag
  2016-07-28  6:16 [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
                   ` (2 preceding siblings ...)
  2016-07-28  6:16 ` [PATCH 4/8] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang Guoqing Jiang
@ 2016-07-28  6:16 ` Guoqing Jiang
  2016-07-28  6:16 ` [PATCH 6/8] md-cluster: make resync lock also could be interrupted Guoqing Jiang
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2016-07-28  6:16 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, Guoqing Jiang

When stopping a clustered raid while it is pending on resync, the
MD_STILL_CLOSED flag could be cleared, since a udev rule is triggered
to open the mddev. So obviously the array can't be stopped soon and
returns EBUSY.

	mdadm -Ss                    md-raid-arrays.rules
	set MD_STILL_CLOSED          md_open()
	... ... ...                  clear MD_STILL_CLOSED
	do_md_stop

We make the below changes to resolve this issue:

1. rename MD_STILL_CLOSED to MD_CLOSING, since it is set
   when stopping the array and it means we are stopping
   the array.
2. let md_open return early if CLOSING is set, so no
   other thread will open the array while one thread is
   trying to close it.
3. no need to clear the CLOSING bit in md_open, because
   change 1 has ensured the bit is cleared; then we also
   don't need to test the CLOSING bit in do_md_stop.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md.c | 14 ++++++++------
 drivers/md/md.h |  5 ++---
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 743cd21..0f046ed 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5567,8 +5567,7 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
 	mutex_lock(&mddev->open_mutex);
 	if ((mddev->pers && atomic_read(&mddev->openers) > !!bdev) ||
 	    mddev->sync_thread ||
-	    test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
-	    (bdev && !test_bit(MD_STILL_CLOSED, &mddev->flags))) {
+	    test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
 		printk("md: %s still in use.\n",mdname(mddev));
 		if (did_freeze) {
 			clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
@@ -5630,8 +5629,7 @@ static int do_md_stop(struct mddev *mddev, int mode,
 	if ((mddev->pers && atomic_read(&mddev->openers) > !!bdev) ||
 	    mddev->sysfs_active ||
 	    mddev->sync_thread ||
-	    test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
-	    (bdev && !test_bit(MD_STILL_CLOSED, &mddev->flags))) {
+	    test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
 		printk("md: %s still in use.\n",mdname(mddev));
 		mutex_unlock(&mddev->open_mutex);
 		if (did_freeze) {
@@ -6814,7 +6812,7 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
 			err = -EBUSY;
 			goto out;
 		}
-		set_bit(MD_STILL_CLOSED, &mddev->flags);
+		set_bit(MD_CLOSING, &mddev->flags);
 		mutex_unlock(&mddev->open_mutex);
 		sync_blockdev(bdev);
 	}
@@ -7063,9 +7061,13 @@ static int md_open(struct block_device *bdev, fmode_t mode)
 	if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
 		goto out;
 
+	if (test_bit(MD_CLOSING, &mddev->flags)) {
+		mutex_unlock(&mddev->open_mutex);
+		return -ENODEV;
+	}
+
 	err = 0;
 	atomic_inc(&mddev->openers);
-	clear_bit(MD_STILL_CLOSED, &mddev->flags);
 	mutex_unlock(&mddev->open_mutex);
 
 	check_disk_change(bdev);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index dc65ca6..0481f9a 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -201,9 +201,8 @@ struct mddev {
 #define MD_CHANGE_PENDING 2	/* switch from 'clean' to 'active' in progress */
 #define MD_UPDATE_SB_FLAGS (1 | 2 | 4)	/* If these are set, md_update_sb needed */
 #define MD_ARRAY_FIRST_USE 3    /* First use of array, needs initialization */
-#define MD_STILL_CLOSED	4	/* If set, then array has not been opened since
-				 * md_ioctl checked on it.
-				 */
+#define MD_CLOSING	4	/* If set, we are closing the array, so do
+				 * not open it */
 #define MD_JOURNAL_CLEAN 5	/* A raid with journal is already clean */
 #define MD_HAS_JOURNAL	6	/* The raid array has journal feature set */
 #define MD_RELOAD_SB	7	/* Reload the superblock because another node
-- 
2.6.2



* [PATCH 6/8] md-cluster: make resync lock also could be interrupted
  2016-07-28  6:16 [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
                   ` (3 preceding siblings ...)
  2016-07-28  6:16 ` [PATCH 5/8] md: changes for MD_STILL_CLOSED flag Guoqing Jiang
@ 2016-07-28  6:16 ` Guoqing Jiang
  2016-08-01 22:29   ` Shaohua Li
  2016-07-28  6:16 ` [PATCH 7/8] md-cluster: clean related infos of cluster Guoqing Jiang
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Guoqing Jiang @ 2016-07-28  6:16 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, Guoqing Jiang

When one node is performing resync or recovery, other nodes
can't get the resync lock and could block for a while before
holding the lock, so we can't stop the array immediately in
this scenario.

To make the array stoppable quickly, we check MD_CLOSING
in dlm_lock_sync_interruptible so that we can interrupt
the lock request.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md-cluster.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index f3d584e..0984771 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -157,7 +157,8 @@ static int dlm_lock_sync_interruptible(struct dlm_lock_resource *res, int mode,
 		return ret;
 
 	wait_event(res->completion.wait,
-		   res->completion.done || kthread_should_stop());
+		   res->completion.done || kthread_should_stop()
+					|| test_bit(MD_CLOSING, &mddev->flags));
 	if (!res->completion.done) {
 		/*
 		 * the convert queue contains the lock request when request is
@@ -1026,7 +1027,7 @@ static void metadata_update_cancel(struct mddev *mddev)
 static int resync_start(struct mddev *mddev)
 {
 	struct md_cluster_info *cinfo = mddev->cluster_info;
-	return dlm_lock_sync(cinfo->resync_lockres, DLM_LOCK_EX);
+	return dlm_lock_sync_interruptible(cinfo->resync_lockres, DLM_LOCK_EX, mddev);
 }
 
 static int resync_info_update(struct mddev *mddev, sector_t lo, sector_t hi)
-- 
2.6.2



* [PATCH 7/8] md-cluster: clean related infos of cluster
  2016-07-28  6:16 [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
                   ` (4 preceding siblings ...)
  2016-07-28  6:16 ` [PATCH 6/8] md-cluster: make resync lock also could be interrupted Guoqing Jiang
@ 2016-07-28  6:16 ` Guoqing Jiang
  2016-07-28  6:16 ` [PATCH 8/8] md-cluster: remove EXPERIMENTAL info Guoqing Jiang
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2016-07-28  6:16 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, Guoqing Jiang

cluster_info and bitmap_info.nodes also need to be
cleared when the array is stopped.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0f046ed..ec46f4e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5448,12 +5448,14 @@ static void md_clean(struct mddev *mddev)
 	mddev->degraded = 0;
 	mddev->safemode = 0;
 	mddev->private = NULL;
+	mddev->cluster_info = NULL;
 	mddev->bitmap_info.offset = 0;
 	mddev->bitmap_info.default_offset = 0;
 	mddev->bitmap_info.default_space = 0;
 	mddev->bitmap_info.chunksize = 0;
 	mddev->bitmap_info.daemon_sleep = 0;
 	mddev->bitmap_info.max_write_behind = 0;
+	mddev->bitmap_info.nodes = 0;
 }
 
 static void __md_stop_writes(struct mddev *mddev)
-- 
2.6.2



* [PATCH 8/8] md-cluster: remove EXPERIMENTAL info
  2016-07-28  6:16 [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
                   ` (5 preceding siblings ...)
  2016-07-28  6:16 ` [PATCH 7/8] md-cluster: clean related infos of cluster Guoqing Jiang
@ 2016-07-28  6:16 ` Guoqing Jiang
  2016-08-01 21:58 ` [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed Shaohua Li
  2016-08-03  2:26 ` [PATCH V2 " Guoqing Jiang
  8 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2016-07-28  6:16 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, Guoqing Jiang

We can remove the "EXPERIMENTAL" tag from md-cluster after
several rounds of development and bug fixes.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/Kconfig      | 2 +-
 drivers/md/md-cluster.c | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 02a5345..74bc3ca 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -178,7 +178,7 @@ config MD_FAULTY
 
 
 config MD_CLUSTER
-	tristate "Cluster Support for MD (EXPERIMENTAL)"
+	tristate "Cluster Support for MD"
 	depends on BLK_DEV_MD
 	depends on DLM
 	default n
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 0984771..678f0a6 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -1266,7 +1266,6 @@ static struct md_cluster_operations cluster_ops = {
 
 static int __init cluster_init(void)
 {
-	pr_warn("md-cluster: EXPERIMENTAL. Use with caution\n");
 	pr_info("Registering Cluster MD functions\n");
 	register_md_cluster_operations(&cluster_ops, THIS_MODULE);
 	return 0;
-- 
2.6.2



* Re: [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed
  2016-07-28  6:16 [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
                   ` (6 preceding siblings ...)
  2016-07-28  6:16 ` [PATCH 8/8] md-cluster: remove EXPERIMENTAL info Guoqing Jiang
@ 2016-08-01 21:58 ` Shaohua Li
  2016-08-02  3:21   ` Guoqing Jiang
       [not found]   ` <579FF771.4060007@suse.com>
  2016-08-03  2:26 ` [PATCH V2 " Guoqing Jiang
  8 siblings, 2 replies; 20+ messages in thread
From: Shaohua Li @ 2016-08-01 21:58 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Thu, Jul 28, 2016 at 02:16:45AM -0400, Guoqing Jiang wrote:
> The new_disk_ack could return failure if WAITING_FOR_NEWDISK
> is not set, so we need to kick the rdev from the array when
> that failure happens.
> 
> Reviewed-by: NeilBrown <neilb@suse.com>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>  drivers/md/md.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 2ed547f..743cd21 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6092,9 +6092,11 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
>  			export_rdev(rdev);
>  
>  		if (mddev_is_clustered(mddev)) {
> -			if (info->state & (1 << MD_DISK_CANDIDATE))
> -				md_cluster_ops->new_disk_ack(mddev, (err == 0));
> -			else {
> +			if (info->state & (1 << MD_DISK_CANDIDATE)) {

if err != 0, we already do export_rdev, do we need to do md_kick_rdev_from_array in that case?


> +				err = md_cluster_ops->new_disk_ack(mddev, (err == 0));
> +				if (err)
> +					md_kick_rdev_from_array(rdev);
> +			} else {
>  				if (err)
>  					md_cluster_ops->add_new_disk_cancel(mddev);
>  				else
> -- 
> 2.6.2
> 


* Re: [PATCH 4/8] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
  2016-07-28  6:16 ` [PATCH 4/8] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang Guoqing Jiang
@ 2016-08-01 22:20   ` Shaohua Li
  2016-08-02  3:24     ` Guoqing Jiang
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2016-08-01 22:20 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Thu, Jul 28, 2016 at 02:16:48AM -0400, Guoqing Jiang wrote:
> When a node leaves the cluster, its bitmap needs to be
> synced by another node, so the "md*_recover" thread is
> triggered for that purpose. However, with the steps below,
> we can find a task hang happening on either B or C.
> 
> 1. Node A creates a resyncing cluster raid1 and assembles
>    it on the other two nodes (B and C).
> 2. Stop the array on B and C.
> 3. Stop the array on A.
> 
> linux44:~ # ps aux|grep md|grep D
> root	5938	0.0  0.1  19852  1964 pts/0    D+   14:52   0:00 mdadm -S md0
> root	5939	0.0  0.0      0     0 ?        D    14:52   0:00 [md0_recover]
> 
> linux44:~ # cat /proc/5939/stack
> [<ffffffffa04cf321>] dlm_lock_sync+0x71/0x90 [md_cluster]
> [<ffffffffa04d0705>] recover_bitmaps+0x125/0x220 [md_cluster]
> [<ffffffffa052105d>] md_thread+0x16d/0x180 [md_mod]
> [<ffffffff8107ad94>] kthread+0xb4/0xc0
> [<ffffffff8152a518>] ret_from_fork+0x58/0x90
> 
> linux44:~ # cat /proc/5938/stack
> [<ffffffff8107afde>] kthread_stop+0x6e/0x120
> [<ffffffffa0519da0>] md_unregister_thread+0x40/0x80 [md_mod]
> [<ffffffffa04cfd20>] leave+0x70/0x120 [md_cluster]
> [<ffffffffa0525e24>] md_cluster_stop+0x14/0x30 [md_mod]
> [<ffffffffa05269ab>] bitmap_free+0x14b/0x150 [md_mod]
> [<ffffffffa0523f3b>] do_md_stop+0x35b/0x5a0 [md_mod]
> [<ffffffffa0524e83>] md_ioctl+0x873/0x1590 [md_mod]
> [<ffffffff81288464>] blkdev_ioctl+0x214/0x7d0
> [<ffffffff811dd3dd>] block_ioctl+0x3d/0x40
> [<ffffffff811b92d4>] do_vfs_ioctl+0x2d4/0x4b0
> [<ffffffff811b9538>] SyS_ioctl+0x88/0xa0
> [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
> 
> The problem is that recover_bitmaps can't reliably abort
> when the thread is unregistered. So dlm_lock_sync_interruptible
> is introduced to detect the thread's situation and fix the problem.
> 
> Reviewed-by: NeilBrown <neilb@suse.com>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>  drivers/md/md-cluster.c | 38 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
> index ea2699e..f3d584e 100644
> --- a/drivers/md/md-cluster.c
> +++ b/drivers/md/md-cluster.c
> @@ -10,6 +10,8 @@
>  
>  
>  #include <linux/module.h>
> +#include <linux/completion.h>
> +#include <linux/kthread.h>
>  #include <linux/dlm.h>
>  #include <linux/sched.h>
>  #include <linux/raid/md_p.h>
> @@ -141,6 +143,40 @@ static int dlm_unlock_sync(struct dlm_lock_resource *res)
>  	return dlm_lock_sync(res, DLM_LOCK_NL);
>  }
>  
> +/* A variation of dlm_lock_sync, which makes the lock request
> + * interruptible */
> +static int dlm_lock_sync_interruptible(struct dlm_lock_resource *res, int mode,
> +				       struct mddev *mddev)
> +{
> +	int ret = 0;
> +
> +	ret = dlm_lock(res->ls, mode, &res->lksb,
> +			res->flags, res->name, strlen(res->name),
> +			0, sync_ast, res, res->bast);
> +	if (ret)
> +		return ret;
> +
> +	wait_event(res->completion.wait,
> +		   res->completion.done || kthread_should_stop());

can you convert it to a waitq? Directly using the .wait/.done of completion is
really intrusive.

> +	if (!res->completion.done) {
> +		/*
> +		 * the convert queue contains the lock request when request is
> +		 * interrupted, and sync_ast could still be run, so need to
> +		 * cancel the request and reset completion
> +		 */
> +		ret = dlm_unlock(res->ls, res->lksb.sb_lkid, DLM_LKF_CANCEL, &res->lksb, res);
> +		reinit_completion(&res->completion);
> +		if (unlikely(ret != 0))
> +			pr_info("failed to cancel previous lock request "
> +				 "%s return %d\n", res->name, ret);
> +		return -EPERM;
> +	}
> +	wait_for_completion(&res->completion);
> +	if (res->lksb.sb_status == 0)
> +		res->mode = mode;
> +	return res->lksb.sb_status;
> +}
> +
>  static struct dlm_lock_resource *lockres_init(struct mddev *mddev,
>  		char *name, void (*bastfn)(void *arg, int mode), int with_lvb)
>  {
> @@ -272,7 +308,7 @@ static void recover_bitmaps(struct md_thread *thread)
>  			goto clear_bit;
>  		}
>  
> -		ret = dlm_lock_sync(bm_lockres, DLM_LOCK_PW);
> +		ret = dlm_lock_sync_interruptible(bm_lockres, DLM_LOCK_PW, mddev);
>  		if (ret) {
>  			pr_err("md-cluster: Could not DLM lock %s: %d\n",
>  					str, ret);
> -- 
> 2.6.2
> 


* Re: [PATCH 6/8] md-cluster: make resync lock also could be interrupted
  2016-07-28  6:16 ` [PATCH 6/8] md-cluster: make resync lock also could be interrupted Guoqing Jiang
@ 2016-08-01 22:29   ` Shaohua Li
  2016-08-02  1:38     ` Guoqing Jiang
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2016-08-01 22:29 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Thu, Jul 28, 2016 at 02:16:50AM -0400, Guoqing Jiang wrote:
> When one node is performing resync or recovery, other nodes
> can't get the resync lock and could block for a while before
> holding the lock, so we can't stop the array immediately in
> this scenario.
> 
> To make the array stoppable quickly, we check MD_CLOSING
> in dlm_lock_sync_interruptible so that we can interrupt
> the lock request.

if the thread is waiting for the resync lock and you set MD_CLOSING, where do
you wake up the thread?

 
> Reviewed-by: NeilBrown <neilb@suse.com>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>  drivers/md/md-cluster.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
> index f3d584e..0984771 100644
> --- a/drivers/md/md-cluster.c
> +++ b/drivers/md/md-cluster.c
> @@ -157,7 +157,8 @@ static int dlm_lock_sync_interruptible(struct dlm_lock_resource *res, int mode,
>  		return ret;
>  
>  	wait_event(res->completion.wait,
> -		   res->completion.done || kthread_should_stop());
> +		   res->completion.done || kthread_should_stop()
> +					|| test_bit(MD_CLOSING, &mddev->flags));
>  	if (!res->completion.done) {
>  		/*
>  		 * the convert queue contains the lock request when request is
> @@ -1026,7 +1027,7 @@ static void metadata_update_cancel(struct mddev *mddev)
>  static int resync_start(struct mddev *mddev)
>  {
>  	struct md_cluster_info *cinfo = mddev->cluster_info;
> -	return dlm_lock_sync(cinfo->resync_lockres, DLM_LOCK_EX);
> +	return dlm_lock_sync_interruptible(cinfo->resync_lockres, DLM_LOCK_EX, mddev);
>  }
>  
>  static int resync_info_update(struct mddev *mddev, sector_t lo, sector_t hi)
> -- 
> 2.6.2
> 


* Re: [PATCH 6/8] md-cluster: make resync lock also could be interrupted
  2016-08-01 22:29   ` Shaohua Li
@ 2016-08-02  1:38     ` Guoqing Jiang
  0 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2016-08-02  1:38 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid



On 08/01/2016 06:29 PM, Shaohua Li wrote:
> On Thu, Jul 28, 2016 at 02:16:50AM -0400, Guoqing Jiang wrote:
>> When one node is performing resync or recovery, other nodes
>> can't get the resync lock and could block for a while before
>> holding the lock, so we can't stop the array immediately in
>> this scenario.
>>
>> To make the array stoppable quickly, we check MD_CLOSING
>> in dlm_lock_sync_interruptible so that we can interrupt
>> the lock request.
> if the thread is waiting for the resync lock and you set MD_CLOSING, where do
> you wake up the thread?

If the array needs to do a resync, it would call md_cluster_ops->resync_start
within md_do_sync. And since MD_CLOSING is set when stopping md, we
can see that do_md_stop would wake up the sync_thread.

Thanks,
Guoqing



* Re: [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed
  2016-08-01 21:58 ` [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed Shaohua Li
@ 2016-08-02  3:21   ` Guoqing Jiang
       [not found]   ` <579FF771.4060007@suse.com>
  1 sibling, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2016-08-02  3:21 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid



On 08/01/2016 05:58 PM, Shaohua Li wrote:
> On Thu, Jul 28, 2016 at 02:16:45AM -0400, Guoqing Jiang wrote:
>> The new_disk_ack could return failure if WAITING_FOR_NEWDISK
>> is not set, so we need to kick the rdev from the array when
>> that failure happens.
>>
>> Reviewed-by: NeilBrown <neilb@suse.com>
>> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
>> ---
>>   drivers/md/md.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 2ed547f..743cd21 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -6092,9 +6092,11 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
>>   			export_rdev(rdev);
>>   
>>   		if (mddev_is_clustered(mddev)) {
>> -			if (info->state & (1 << MD_DISK_CANDIDATE))
>> -				md_cluster_ops->new_disk_ack(mddev, (err == 0));
>> -			else {
>> +			if (info->state & (1 << MD_DISK_CANDIDATE)) {
> if err != 0, we already do export_rdev, do we need to do md_kick_rdev_from_array in that case?

I suppose you mean the export_rdev before new_disk_ack; we don't need
to call md_kick_rdev_from_array there since we didn't bind the rdev to
the array successfully.

		err = bind_rdev_to_array(rdev, mddev);

		if (err)
			export_rdev(rdev);

		if (mddev_is_clustered(mddev)) {
			if (info->state & (1 << MD_DISK_CANDIDATE))
				md_cluster_ops->new_disk_ack(mddev, (err == 0));

But it is possible that new_disk_ack returns -EINVAL while the rdev is
bound to the array, so we should call md_kick_rdev_from_array in that
case.

Thanks,
Guoqing


* Re: [PATCH 4/8] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
  2016-08-01 22:20   ` Shaohua Li
@ 2016-08-02  3:24     ` Guoqing Jiang
  2016-08-02 22:36       ` Shaohua Li
  0 siblings, 1 reply; 20+ messages in thread
From: Guoqing Jiang @ 2016-08-02  3:24 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid



On 08/01/2016 06:20 PM, Shaohua Li wrote:
> On Thu, Jul 28, 2016 at 02:16:48AM -0400, Guoqing Jiang wrote:
>> When a node leaves the cluster, its bitmap needs to be
>> synced by another node, so the "md*_recover" thread is
>> triggered for that purpose. However, with the steps below,
>> we can find a task hang happening on either B or C.
>>
>> 1. Node A creates a resyncing cluster raid1 and assembles
>>    it on the other two nodes (B and C).
>> 2. Stop the array on B and C.
>> 3. Stop the array on A.
>>
>> linux44:~ # ps aux|grep md|grep D
>> root	5938	0.0  0.1  19852  1964 pts/0    D+   14:52   0:00 mdadm -S md0
>> root	5939	0.0  0.0      0     0 ?        D    14:52   0:00 [md0_recover]
>>
>> linux44:~ # cat /proc/5939/stack
>> [<ffffffffa04cf321>] dlm_lock_sync+0x71/0x90 [md_cluster]
>> [<ffffffffa04d0705>] recover_bitmaps+0x125/0x220 [md_cluster]
>> [<ffffffffa052105d>] md_thread+0x16d/0x180 [md_mod]
>> [<ffffffff8107ad94>] kthread+0xb4/0xc0
>> [<ffffffff8152a518>] ret_from_fork+0x58/0x90
>>
>> linux44:~ # cat /proc/5938/stack
>> [<ffffffff8107afde>] kthread_stop+0x6e/0x120
>> [<ffffffffa0519da0>] md_unregister_thread+0x40/0x80 [md_mod]
>> [<ffffffffa04cfd20>] leave+0x70/0x120 [md_cluster]
>> [<ffffffffa0525e24>] md_cluster_stop+0x14/0x30 [md_mod]
>> [<ffffffffa05269ab>] bitmap_free+0x14b/0x150 [md_mod]
>> [<ffffffffa0523f3b>] do_md_stop+0x35b/0x5a0 [md_mod]
>> [<ffffffffa0524e83>] md_ioctl+0x873/0x1590 [md_mod]
>> [<ffffffff81288464>] blkdev_ioctl+0x214/0x7d0
>> [<ffffffff811dd3dd>] block_ioctl+0x3d/0x40
>> [<ffffffff811b92d4>] do_vfs_ioctl+0x2d4/0x4b0
>> [<ffffffff811b9538>] SyS_ioctl+0x88/0xa0
>> [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
>>
>> The problem is that recover_bitmaps can't reliably abort
>> when the thread is unregistered. So dlm_lock_sync_interruptible
>> is introduced to detect the thread's situation and fix the problem.
>>
>> Reviewed-by: NeilBrown <neilb@suse.com>
>> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
>> ---
>>   drivers/md/md-cluster.c | 38 +++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 37 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
>> index ea2699e..f3d584e 100644
>> --- a/drivers/md/md-cluster.c
>> +++ b/drivers/md/md-cluster.c
>> @@ -10,6 +10,8 @@
>>   
>>   
>>   #include <linux/module.h>
>> +#include <linux/completion.h>
>> +#include <linux/kthread.h>
>>   #include <linux/dlm.h>
>>   #include <linux/sched.h>
>>   #include <linux/raid/md_p.h>
>> @@ -141,6 +143,40 @@ static int dlm_unlock_sync(struct dlm_lock_resource *res)
>>   	return dlm_lock_sync(res, DLM_LOCK_NL);
>>   }
>>   
>> +/* A variation of dlm_lock_sync, which makes the lock request
>> + * interruptible */
>> +static int dlm_lock_sync_interruptible(struct dlm_lock_resource *res, int mode,
>> +				       struct mddev *mddev)
>> +{
>> +	int ret = 0;
>> +
>> +	ret = dlm_lock(res->ls, mode, &res->lksb,
>> +			res->flags, res->name, strlen(res->name),
>> +			0, sync_ast, res, res->bast);
>> +	if (ret)
>> +		return ret;
>> +
>> +	wait_event(res->completion.wait,
>> +		   res->completion.done || kthread_should_stop());
> can you convert it to a waitq? Directly using the .wait/.done of completion is
> really intrusive.

Maybe, but we still need a completion for dlm_lock_resource; otherwise
there would be different types of dlm_lock_resource. We also need to
keep aligned with sync_ast, as dlm_lock_sync does.
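
For reference, sync_ast is only the completion trigger; assuming the
md-cluster.c of this series, it is roughly:

	static void sync_ast(void *arg)
	{
		struct dlm_lock_resource *res = arg;

		complete(&res->completion);
	}

so any replacement has to keep this ast/waiter pairing intact.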

Regards,
Guoqing

>> +	if (!res->completion.done) {
>> +		/*
>> +		 * the convert queue contains the lock request when request is
>> +		 * interrupted, and sync_ast could still be run, so need to
>> +		 * cancel the request and reset completion
>> +		 */
>> +		ret = dlm_unlock(res->ls, res->lksb.sb_lkid, DLM_LKF_CANCEL, &res->lksb, res);
>> +		reinit_completion(&res->completion);
>> +		if (unlikely(ret != 0))
>> +			pr_info("failed to cancel previous lock request "
>> +				 "%s return %d\n", res->name, ret);
>> +		return -EPERM;
>> +	}
>> +	wait_for_completion(&res->completion);
>> +	if (res->lksb.sb_status == 0)
>> +		res->mode = mode;
>> +	return res->lksb.sb_status;
>> +}
>> +
>>   static struct dlm_lock_resource *lockres_init(struct mddev *mddev,
>>   		char *name, void (*bastfn)(void *arg, int mode), int with_lvb)
>>   {
>> @@ -272,7 +308,7 @@ static void recover_bitmaps(struct md_thread *thread)
>>   			goto clear_bit;
>>   		}
>>   
>> -		ret = dlm_lock_sync(bm_lockres, DLM_LOCK_PW);
>> +		ret = dlm_lock_sync_interruptible(bm_lockres, DLM_LOCK_PW, mddev);
>>   		if (ret) {
>>   			pr_err("md-cluster: Could not DLM lock %s: %d\n",
>>   					str, ret);
>> -- 
>> 2.6.2
>>
>



* Re: [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed
       [not found]   ` <579FF771.4060007@suse.com>
@ 2016-08-02 22:17     ` Shaohua Li
  2016-08-03  2:15       ` Guoqing Jiang
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2016-08-02 22:17 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Mon, Aug 01, 2016 at 09:29:21PM -0400, Guoqing Jiang wrote:
> 
> 
> On 08/01/2016 05:58 PM, Shaohua Li wrote:
> >On Thu, Jul 28, 2016 at 02:16:45AM -0400, Guoqing Jiang wrote:
> >>The new_disk_ack could return failure if WAITING_FOR_NEWDISK
> >>is not set, so we need to kick the rdev from the array when
> >>that failure happens.
> >>
> >>Reviewed-by: NeilBrown <neilb@suse.com>
> >>Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> >>---
> >>  drivers/md/md.c | 8 +++++---
> >>  1 file changed, 5 insertions(+), 3 deletions(-)
> >>
> >>diff --git a/drivers/md/md.c b/drivers/md/md.c
> >>index 2ed547f..743cd21 100644
> >>--- a/drivers/md/md.c
> >>+++ b/drivers/md/md.c
> >>@@ -6092,9 +6092,11 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
> >>  			export_rdev(rdev);
> >>  		if (mddev_is_clustered(mddev)) {
> >>-			if (info->state & (1 << MD_DISK_CANDIDATE))
> >>-				md_cluster_ops->new_disk_ack(mddev, (err == 0));
> >>-			else {
> >>+			if (info->state & (1 << MD_DISK_CANDIDATE)) {
> >if err != 0, we already do export_rdev, do we need to do md_kick_rdev_from_array in that case?
> 
> I suppose you mean the export_rdev before new_disk_ack; we don't need
> to call md_kick_rdev_from_array there since we didn't bind the rdev to
> the array successfully.
> 
> 		err = bind_rdev_to_array(rdev, mddev);
> 
> 		if (err)
> 			export_rdev(rdev);

Let's assume bind_rdev_to_array fails, err != 0, and we export_rdev.

> 		if (mddev_is_clustered(mddev)) {
> 			if (info->state & (1 << MD_DISK_CANDIDATE))
> 				md_cluster_ops->new_disk_ack(mddev, (err == 0));

We don't check err here, so we could still call ->new_disk_ack; then if
new_disk_ack() fails, we will call md_kick_rdev_from_array(rdev) with
your patch. We would be kicking out an rdev which isn't bound yet in
this case.

Thanks,
Shaohua


* Re: [PATCH 4/8] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
  2016-08-02  3:24     ` Guoqing Jiang
@ 2016-08-02 22:36       ` Shaohua Li
  2016-08-03  2:39         ` Guoqing Jiang
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2016-08-02 22:36 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Mon, Aug 01, 2016 at 11:24:34PM -0400, Guoqing Jiang wrote:
> 
> 
> On 08/01/2016 06:20 PM, Shaohua Li wrote:
> >On Thu, Jul 28, 2016 at 02:16:48AM -0400, Guoqing Jiang wrote:
> >>When a node leaves the cluster, its bitmap needs to be
> >>synced by another node, so the "md*_recover" thread is
> >>triggered for that purpose. However, with the steps below,
> >>we can find a task hang happening on either B or C.
> >>
> >>1. Node A creates a resyncing cluster raid1 and assembles
> >>   it on the other two nodes (B and C).
> >>2. Stop the array on B and C.
> >>3. Stop the array on A.
> >>
> >>linux44:~ # ps aux|grep md|grep D
> >>root	5938	0.0  0.1  19852  1964 pts/0    D+   14:52   0:00 mdadm -S md0
> >>root	5939	0.0  0.0      0     0 ?        D    14:52   0:00 [md0_recover]
> >>
> >>linux44:~ # cat /proc/5939/stack
> >>[<ffffffffa04cf321>] dlm_lock_sync+0x71/0x90 [md_cluster]
> >>[<ffffffffa04d0705>] recover_bitmaps+0x125/0x220 [md_cluster]
> >>[<ffffffffa052105d>] md_thread+0x16d/0x180 [md_mod]
> >>[<ffffffff8107ad94>] kthread+0xb4/0xc0
> >>[<ffffffff8152a518>] ret_from_fork+0x58/0x90
> >>
> >>linux44:~ # cat /proc/5938/stack
> >>[<ffffffff8107afde>] kthread_stop+0x6e/0x120
> >>[<ffffffffa0519da0>] md_unregister_thread+0x40/0x80 [md_mod]
> >>[<ffffffffa04cfd20>] leave+0x70/0x120 [md_cluster]
> >>[<ffffffffa0525e24>] md_cluster_stop+0x14/0x30 [md_mod]
> >>[<ffffffffa05269ab>] bitmap_free+0x14b/0x150 [md_mod]
> >>[<ffffffffa0523f3b>] do_md_stop+0x35b/0x5a0 [md_mod]
> >>[<ffffffffa0524e83>] md_ioctl+0x873/0x1590 [md_mod]
> >>[<ffffffff81288464>] blkdev_ioctl+0x214/0x7d0
> >>[<ffffffff811dd3dd>] block_ioctl+0x3d/0x40
> >>[<ffffffff811b92d4>] do_vfs_ioctl+0x2d4/0x4b0
> >>[<ffffffff811b9538>] SyS_ioctl+0x88/0xa0
> >>[<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
> >>
> >>The problem is that recover_bitmaps can't reliably abort
> >>when the thread is unregistered. So dlm_lock_sync_interruptible
> >>is introduced to detect the thread's situation and fix the problem.
> >>
> >>Reviewed-by: NeilBrown <neilb@suse.com>
> >>Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> >>---
> >>  drivers/md/md-cluster.c | 38 +++++++++++++++++++++++++++++++++++++-
> >>  1 file changed, 37 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
> >>index ea2699e..f3d584e 100644
> >>--- a/drivers/md/md-cluster.c
> >>+++ b/drivers/md/md-cluster.c
> >>@@ -10,6 +10,8 @@
> >>  #include <linux/module.h>
> >>+#include <linux/completion.h>
> >>+#include <linux/kthread.h>
> >>  #include <linux/dlm.h>
> >>  #include <linux/sched.h>
> >>  #include <linux/raid/md_p.h>
> >>@@ -141,6 +143,40 @@ static int dlm_unlock_sync(struct dlm_lock_resource *res)
> >>  	return dlm_lock_sync(res, DLM_LOCK_NL);
> >>  }
> >>+/* A variation of dlm_lock_sync, which makes the lock request
> >>+ * interruptible */
> >>+static int dlm_lock_sync_interruptible(struct dlm_lock_resource *res, int mode,
> >>+				       struct mddev *mddev)
> >>+{
> >>+	int ret = 0;
> >>+
> >>+	ret = dlm_lock(res->ls, mode, &res->lksb,
> >>+			res->flags, res->name, strlen(res->name),
> >>+			0, sync_ast, res, res->bast);
> >>+	if (ret)
> >>+		return ret;
> >>+
> >>+	wait_event(res->completion.wait,
> >>+		   res->completion.done || kthread_should_stop());
> >can you convert it to a waitq? Directly using the .wait/.done of completion is
> >really intrusive.
> 
> Maybe, but we still need a completion for dlm_lock_resource; otherwise
> there would be different types of dlm_lock_resource. We also need to
> keep aligned with sync_ast, as dlm_lock_sync does.

Yes, we need a waitq and a variable like completion.done to indicate the
event is done, and convert the completion API to the waitq API in other
places like sync_ast. The point is not to use the opaque data structure
of 'struct completion'. Diving into the implementation details of an
unrelated data structure (completion here) is really intrusive.

Thanks,
Shaohua


* Re: [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed
  2016-08-02 22:17     ` Shaohua Li
@ 2016-08-03  2:15       ` Guoqing Jiang
  0 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2016-08-03  2:15 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid



On 08/03/2016 06:17 AM, Shaohua Li wrote:
> On Mon, Aug 01, 2016 at 09:29:21PM -0400, Guoqing Jiang wrote:
>>
>> On 08/01/2016 05:58 PM, Shaohua Li wrote:
>>> On Thu, Jul 28, 2016 at 02:16:45AM -0400, Guoqing Jiang wrote:
>>>> The new_disk_ack could return failure if WAITING_FOR_NEWDISK
>>>> is not set, so we need to kick the rdev from the array when
>>>> that failure happens.
>>>>
>>>> Reviewed-by: NeilBrown <neilb@suse.com>
>>>> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
>>>> ---
>>>>   drivers/md/md.c | 8 +++++---
>>>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>>> index 2ed547f..743cd21 100644
>>>> --- a/drivers/md/md.c
>>>> +++ b/drivers/md/md.c
>>>> @@ -6092,9 +6092,11 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
>>>>   			export_rdev(rdev);
>>>>   		if (mddev_is_clustered(mddev)) {
>>>> -			if (info->state & (1 << MD_DISK_CANDIDATE))
>>>> -				md_cluster_ops->new_disk_ack(mddev, (err == 0));
>>>> -			else {
>>>> +			if (info->state & (1 << MD_DISK_CANDIDATE)) {
>>> if err != 0, we already do export_rdev, do we need to do md_kick_rdev_from_array in that case?
>> I suppose you mean the export_rdev before new_disk_ack; we don't need
>> to call md_kick_rdev_from_array there since we didn't bind the rdev to
>> the array successfully.
>>
>> 		err = bind_rdev_to_array(rdev, mddev);
>>
>> 		if (err)
>> 			export_rdev(rdev);
> Let's assume bind_rdev_to_array fails, err != 0, and we export_rdev.
>> 		if (mddev_is_clustered(mddev)) {
>> 			if (info->state & (1 << MD_DISK_CANDIDATE))
>> 				md_cluster_ops->new_disk_ack(mddev, (err == 0));
>
> We don't check err here, so we could still call ->new_disk_ack; then if
> new_disk_ack() fails, we will call md_kick_rdev_from_array(rdev) with
> your patch. We would be kicking out an rdev which isn't bound yet in
> this case.

Thanks, we should also check err before calling new_disk_ack in this case.
I will send a new patch for it.

Regards,
Guoqing



* [PATCH V2 1/8] md-cluster: call md_kick_rdev_from_array once ack failed
  2016-07-28  6:16 [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
                   ` (7 preceding siblings ...)
  2016-08-01 21:58 ` [PATCH 1/8] md-cluster: call md_kick_rdev_from_array once ack failed Shaohua Li
@ 2016-08-03  2:26 ` Guoqing Jiang
  8 siblings, 0 replies; 20+ messages in thread
From: Guoqing Jiang @ 2016-08-03  2:26 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, Guoqing Jiang

The new_disk_ack could return failure if WAITING_FOR_NEWDISK
is not set, so we need to kick the rdev from the array when
that failure happens.

And we missed checking err before calling new_disk_ack, otherwise
we could kick an rdev which isn't in the array; thanks for the
reminder from Shaohua.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
Changes:
1. add the check for "err" before calling new_disk_ack.

 drivers/md/md.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2c3ab6f..e6eb66b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6102,7 +6102,11 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
 
 		if (mddev_is_clustered(mddev)) {
-			if (info->state & (1 << MD_DISK_CANDIDATE))
-				md_cluster_ops->new_disk_ack(mddev, (err == 0));
-			else {
+			if (info->state & (1 << MD_DISK_CANDIDATE)) {
+				if (!err) {
+					err = md_cluster_ops->new_disk_ack(mddev, (err == 0));
+					if (err)
+						md_kick_rdev_from_array(rdev);
+				}
+			} else {
 				if (err)
 					md_cluster_ops->add_new_disk_cancel(mddev);
-- 
2.6.2



* Re: [PATCH 4/8] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
  2016-08-02 22:36       ` Shaohua Li
@ 2016-08-03  2:39         ` Guoqing Jiang
  2016-08-06  3:59           ` Shaohua Li
  0 siblings, 1 reply; 20+ messages in thread
From: Guoqing Jiang @ 2016-08-03  2:39 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid



On 08/03/2016 06:36 AM, Shaohua Li wrote:
> On Mon, Aug 01, 2016 at 11:24:34PM -0400, Guoqing Jiang wrote:
>>
>> On 08/01/2016 06:20 PM, Shaohua Li wrote:
>>> On Thu, Jul 28, 2016 at 02:16:48AM -0400, Guoqing Jiang wrote:
>>>> When a node leaves the cluster, its bitmap needs to be
>>>> synced by another node, so the "md*_recover" thread is
>>>> triggered for that purpose. However, with the steps below,
>>>> we can find a task hang happening on either B or C.
>>>>
>>>> 1. Node A creates a resyncing cluster raid1 and assembles
>>>>    it on the other two nodes (B and C).
>>>> 2. Stop the array on B and C.
>>>> 3. Stop the array on A.
>>>>
>>>> linux44:~ # ps aux|grep md|grep D
>>>> root	5938	0.0  0.1  19852  1964 pts/0    D+   14:52   0:00 mdadm -S md0
>>>> root	5939	0.0  0.0      0     0 ?        D    14:52   0:00 [md0_recover]
>>>>
>>>> linux44:~ # cat /proc/5939/stack
>>>> [<ffffffffa04cf321>] dlm_lock_sync+0x71/0x90 [md_cluster]
>>>> [<ffffffffa04d0705>] recover_bitmaps+0x125/0x220 [md_cluster]
>>>> [<ffffffffa052105d>] md_thread+0x16d/0x180 [md_mod]
>>>> [<ffffffff8107ad94>] kthread+0xb4/0xc0
>>>> [<ffffffff8152a518>] ret_from_fork+0x58/0x90
>>>>
>>>> linux44:~ # cat /proc/5938/stack
>>>> [<ffffffff8107afde>] kthread_stop+0x6e/0x120
>>>> [<ffffffffa0519da0>] md_unregister_thread+0x40/0x80 [md_mod]
>>>> [<ffffffffa04cfd20>] leave+0x70/0x120 [md_cluster]
>>>> [<ffffffffa0525e24>] md_cluster_stop+0x14/0x30 [md_mod]
>>>> [<ffffffffa05269ab>] bitmap_free+0x14b/0x150 [md_mod]
>>>> [<ffffffffa0523f3b>] do_md_stop+0x35b/0x5a0 [md_mod]
>>>> [<ffffffffa0524e83>] md_ioctl+0x873/0x1590 [md_mod]
>>>> [<ffffffff81288464>] blkdev_ioctl+0x214/0x7d0
>>>> [<ffffffff811dd3dd>] block_ioctl+0x3d/0x40
>>>> [<ffffffff811b92d4>] do_vfs_ioctl+0x2d4/0x4b0
>>>> [<ffffffff811b9538>] SyS_ioctl+0x88/0xa0
>>>> [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
>>>>
>>>> The problem is that recover_bitmaps can't reliably abort
>>>> when the thread is unregistered. So dlm_lock_sync_interruptible
>>>> is introduced to detect the thread's situation and fix the problem.
>>>>
>>>> Reviewed-by: NeilBrown <neilb@suse.com>
>>>> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
>>>> ---
>>>>   drivers/md/md-cluster.c | 38 +++++++++++++++++++++++++++++++++++++-
>>>>   1 file changed, 37 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
>>>> index ea2699e..f3d584e 100644
>>>> --- a/drivers/md/md-cluster.c
>>>> +++ b/drivers/md/md-cluster.c
>>>> @@ -10,6 +10,8 @@
>>>>   #include <linux/module.h>
>>>> +#include <linux/completion.h>
>>>> +#include <linux/kthread.h>
>>>>   #include <linux/dlm.h>
>>>>   #include <linux/sched.h>
>>>>   #include <linux/raid/md_p.h>
>>>> @@ -141,6 +143,40 @@ static int dlm_unlock_sync(struct dlm_lock_resource *res)
>>>>   	return dlm_lock_sync(res, DLM_LOCK_NL);
>>>>   }
>>>> +/* A variation of dlm_lock_sync, which makes the lock request
>>>> + * interruptible */
>>>> +static int dlm_lock_sync_interruptible(struct dlm_lock_resource *res, int mode,
>>>> +				       struct mddev *mddev)
>>>> +{
>>>> +	int ret = 0;
>>>> +
>>>> +	ret = dlm_lock(res->ls, mode, &res->lksb,
>>>> +			res->flags, res->name, strlen(res->name),
>>>> +			0, sync_ast, res, res->bast);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	wait_event(res->completion.wait,
>>>> +		   res->completion.done || kthread_should_stop());
>>> can you convert it to a waitq? Directly using the .wait/.done of completion is
>>> really intrusive.
>> Maybe, but we still need a completion for dlm_lock_resource; otherwise
>> there would be different types of dlm_lock_resource. We also need to
>> keep aligned with sync_ast, as dlm_lock_sync does.
> Yes, we need a waitq and a variable like completion.done to indicate the
> event is done, and convert the completion API to the waitq API in other
> places like sync_ast. The point is not to use the opaque data structure
> of 'struct completion'. Diving into the implementation details of an
> unrelated data structure (completion here) is really intrusive.

I don't want to argue about the intrusiveness, but we would have to
create something that could be achieved by the existing thing.

OTOH, converting the completion to a waitqueue should be done in a new
patch; it is a big change, though achievable, and we need to do a full
test for it.

Thanks,
Guoqing


* Re: [PATCH 4/8] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
  2016-08-03  2:39         ` Guoqing Jiang
@ 2016-08-06  3:59           ` Shaohua Li
  0 siblings, 0 replies; 20+ messages in thread
From: Shaohua Li @ 2016-08-06  3:59 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Wed, Aug 03, 2016 at 10:39:51AM +0800, Guoqing Jiang wrote:
> 
> 
> On 08/03/2016 06:36 AM, Shaohua Li wrote:
> >On Mon, Aug 01, 2016 at 11:24:34PM -0400, Guoqing Jiang wrote:
> >>
> >>On 08/01/2016 06:20 PM, Shaohua Li wrote:
> >>>On Thu, Jul 28, 2016 at 02:16:48AM -0400, Guoqing Jiang wrote:
> >>>>When a node leaves the cluster, its bitmap needs to be
> >>>>synced by another node, so the "md*_recover" thread is
> >>>>triggered for that purpose. However, with the steps below,
> >>>>we can find a task hang happening on either B or C.
> >>>>
> >>>>1. Node A creates a resyncing cluster raid1 and assembles
> >>>>   it on the other two nodes (B and C).
> >>>>2. Stop the array on B and C.
> >>>>3. Stop the array on A.
> >>>>
> >>>>linux44:~ # ps aux|grep md|grep D
> >>>>root	5938	0.0  0.1  19852  1964 pts/0    D+   14:52   0:00 mdadm -S md0
> >>>>root	5939	0.0  0.0      0     0 ?        D    14:52   0:00 [md0_recover]
> >>>>
> >>>>linux44:~ # cat /proc/5939/stack
> >>>>[<ffffffffa04cf321>] dlm_lock_sync+0x71/0x90 [md_cluster]
> >>>>[<ffffffffa04d0705>] recover_bitmaps+0x125/0x220 [md_cluster]
> >>>>[<ffffffffa052105d>] md_thread+0x16d/0x180 [md_mod]
> >>>>[<ffffffff8107ad94>] kthread+0xb4/0xc0
> >>>>[<ffffffff8152a518>] ret_from_fork+0x58/0x90
> >>>>
> >>>>linux44:~ # cat /proc/5938/stack
> >>>>[<ffffffff8107afde>] kthread_stop+0x6e/0x120
> >>>>[<ffffffffa0519da0>] md_unregister_thread+0x40/0x80 [md_mod]
> >>>>[<ffffffffa04cfd20>] leave+0x70/0x120 [md_cluster]
> >>>>[<ffffffffa0525e24>] md_cluster_stop+0x14/0x30 [md_mod]
> >>>>[<ffffffffa05269ab>] bitmap_free+0x14b/0x150 [md_mod]
> >>>>[<ffffffffa0523f3b>] do_md_stop+0x35b/0x5a0 [md_mod]
> >>>>[<ffffffffa0524e83>] md_ioctl+0x873/0x1590 [md_mod]
> >>>>[<ffffffff81288464>] blkdev_ioctl+0x214/0x7d0
> >>>>[<ffffffff811dd3dd>] block_ioctl+0x3d/0x40
> >>>>[<ffffffff811b92d4>] do_vfs_ioctl+0x2d4/0x4b0
> >>>>[<ffffffff811b9538>] SyS_ioctl+0x88/0xa0
> >>>>[<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
> >>>>
> >>>>The problem is that recover_bitmaps can't reliably abort
> >>>>when the thread is unregistered. So dlm_lock_sync_interruptible
> >>>>is introduced to detect the thread's situation and fix the problem.
> >>>>
> >>>>Reviewed-by: NeilBrown <neilb@suse.com>
> >>>>Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> >>>>---
> >>>>  drivers/md/md-cluster.c | 38 +++++++++++++++++++++++++++++++++++++-
> >>>>  1 file changed, 37 insertions(+), 1 deletion(-)
> >>>>
> >>>>diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
> >>>>index ea2699e..f3d584e 100644
> >>>>--- a/drivers/md/md-cluster.c
> >>>>+++ b/drivers/md/md-cluster.c
> >>>>@@ -10,6 +10,8 @@
> >>>>  #include <linux/module.h>
> >>>>+#include <linux/completion.h>
> >>>>+#include <linux/kthread.h>
> >>>>  #include <linux/dlm.h>
> >>>>  #include <linux/sched.h>
> >>>>  #include <linux/raid/md_p.h>
> >>>>@@ -141,6 +143,40 @@ static int dlm_unlock_sync(struct dlm_lock_resource *res)
> >>>>  	return dlm_lock_sync(res, DLM_LOCK_NL);
> >>>>  }
> >>>>+/* A variation of dlm_lock_sync, which makes the lock request
> >>>>+ * interruptible */
> >>>>+static int dlm_lock_sync_interruptible(struct dlm_lock_resource *res, int mode,
> >>>>+				       struct mddev *mddev)
> >>>>+{
> >>>>+	int ret = 0;
> >>>>+
> >>>>+	ret = dlm_lock(res->ls, mode, &res->lksb,
> >>>>+			res->flags, res->name, strlen(res->name),
> >>>>+			0, sync_ast, res, res->bast);
> >>>>+	if (ret)
> >>>>+		return ret;
> >>>>+
> >>>>+	wait_event(res->completion.wait,
> >>>>+		   res->completion.done || kthread_should_stop());
> >>>can you convert it to a waitq? Directly using the .wait/.done of completion is
> >>>really intrusive.
> >>Maybe, but we still need a completion for dlm_lock_resource; otherwise
> >>there would be different types of dlm_lock_resource. We also need to
> >>keep aligned with sync_ast, as dlm_lock_sync does.
> >Yes, we need a waitq and a variable like completion.done to indicate the
> >event is done, and convert the completion API to the waitq API in other
> >places like sync_ast. The point is not to use the opaque data structure
> >of 'struct completion'. Diving into the implementation details of an
> >unrelated data structure (completion here) is really intrusive.
> 
> I don't want to argue about the intrusiveness, but we would have to
> create something that could be achieved by the existing thing.
> 
> OTOH, converting the completion to a waitqueue should be done in a new
> patch; it is a big change, though achievable, and we need to do a full
> test for it.

I can't see why this is a big change. Simply add a waitq and a variable,
replace wait_for_completion with a wait_event that checks the condition,
and replace complete with setting the variable and calling wake_up.
That's pretty straightforward and carries no risk. I'd rather see the
first patch convert the completion to a waitq and this patch add the
kthread_should_stop() check.
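
Something like the below, as an untested sketch (the field names here
are made up):

	/* in struct dlm_lock_resource, replacing struct completion: */
	wait_queue_head_t sync_locking;	/* wait queue for the lock ast */
	bool sync_locking_done;		/* set once the ast has fired */

	static void sync_ast(void *arg)
	{
		struct dlm_lock_resource *res = arg;

		res->sync_locking_done = true;
		wake_up(&res->sync_locking);
	}

	/* and a plain synchronous waiter then becomes: */
	wait_event(res->sync_locking, res->sync_locking_done);
	res->sync_locking_done = false;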

Thanks,
Shaohua

