All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 00/10] The latest changes for md-cluster
@ 2016-08-12  5:42 Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 01/10] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Guoqing Jiang @ 2016-08-12  5:42 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Guoqing Jiang

This version addresses comments from Shaohua, and the 7th
patch is added as well.

Thanks,
Guoqing

Guoqing Jiang (10):
  md-cluster: call md_kick_rdev_from_array once ack failed
  md-cluster: use FORCEUNLOCK in lockres_free
  md-cluster: remove some unnecessary dlm_unlock_sync
  md: changes for MD_STILL_CLOSED flag
  md-cluster: clean related infos of cluster
  md-cluster: protect md_find_rdev_nr_rcu with rcu lock
  md: remove obsolete ret in md_start_sync
  md-cluster: convert the completion to wait queue
  md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
  md-cluster: make resync lock also could be interruptted

 drivers/md/md-cluster.c | 93 +++++++++++++++++++++++++++++++++----------------
 drivers/md/md.c         | 34 ++++++++++--------
 drivers/md/md.h         |  5 ++-
 3 files changed, 85 insertions(+), 47 deletions(-)

-- 
2.6.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 01/10] md-cluster: call md_kick_rdev_from_array once ack failed
  2016-08-12  5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
@ 2016-08-12  5:42 ` Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 02/10] md-cluster: use FORCEUNLOCK in lockres_free Guoqing Jiang
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Guoqing Jiang @ 2016-08-12  5:42 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Guoqing Jiang

The new_disk_ack could return failure if WAITING_FOR_NEWDISK
is not set, so we need to kick the dev from array in case
failure happened.

And we missed to check err before call new_disk_ack othwise
we could kick a rdev which isn't in array, thanks for the
reminder from Shaohua.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2c3ab6f..d6ae9bc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6101,9 +6101,13 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
 			export_rdev(rdev);
 
 		if (mddev_is_clustered(mddev)) {
-			if (info->state & (1 << MD_DISK_CANDIDATE))
-				md_cluster_ops->new_disk_ack(mddev, (err == 0));
-			else {
+			if (info->state & (1 << MD_DISK_CANDIDATE)) {
+				if (!err) {
+					err = md_cluster_ops->new_disk_ack(mddev, (err == 0));
+					if (err)
+						md_kick_rdev_from_array(rdev);
+				}
+			} else {
 				if (err)
 					md_cluster_ops->add_new_disk_cancel(mddev);
 				else
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 02/10] md-cluster: use FORCEUNLOCK in lockres_free
  2016-08-12  5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 01/10] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
@ 2016-08-12  5:42 ` Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 03/10] md-cluster: remove some unnecessary dlm_unlock_sync Guoqing Jiang
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Guoqing Jiang @ 2016-08-12  5:42 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Guoqing Jiang

For dlm_unlock, we need to pass flag to dlm_unlock as the
third parameter instead of set res->flags.

Also, DLM_LKF_FORCEUNLOCK is more suitable for dlm_unlock
since it works even the lock is on waiting or convert queue.

Acked-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md-cluster.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 41573f1..0a0605f 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -194,25 +194,18 @@ out_err:
 
 static void lockres_free(struct dlm_lock_resource *res)
 {
-	int ret;
+	int ret = 0;
 
 	if (!res)
 		return;
 
-	/* cancel a lock request or a conversion request that is blocked */
-	res->flags |= DLM_LKF_CANCEL;
-retry:
-	ret = dlm_unlock(res->ls, res->lksb.sb_lkid, 0, &res->lksb, res);
-	if (unlikely(ret != 0)) {
-		pr_info("%s: failed to unlock %s return %d\n", __func__, res->name, ret);
-
-		/* if a lock conversion is cancelled, then the lock is put
-		 * back to grant queue, need to ensure it is unlocked */
-		if (ret == -DLM_ECANCEL)
-			goto retry;
-	}
-	res->flags &= ~DLM_LKF_CANCEL;
-	wait_for_completion(&res->completion);
+	/* use FORCEUNLOCK flag, so we can unlock even the lock is on the
+	 * waiting or convert queue */
+	ret = dlm_unlock(res->ls, res->lksb.sb_lkid, DLM_LKF_FORCEUNLOCK, &res->lksb, res);
+	if (unlikely(ret != 0))
+		pr_err("failed to unlock %s return %d\n", res->name, ret);
+	else
+		wait_for_completion(&res->completion);
 
 	kfree(res->name);
 	kfree(res->lksb.sb_lvbptr);
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 03/10] md-cluster: remove some unnecessary dlm_unlock_sync
  2016-08-12  5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 01/10] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 02/10] md-cluster: use FORCEUNLOCK in lockres_free Guoqing Jiang
@ 2016-08-12  5:42 ` Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 04/10] md: changes for MD_STILL_CLOSED flag Guoqing Jiang
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Guoqing Jiang @ 2016-08-12  5:42 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Guoqing Jiang

Since DLM_LKF_FORCEUNLOCK is used in lockres_free,
we don't need to call dlm_unlock_sync before free
lock resource.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md-cluster.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 0a0605f..ea2699e 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -281,7 +281,7 @@ static void recover_bitmaps(struct md_thread *thread)
 		ret = bitmap_copy_from_slot(mddev, slot, &lo, &hi, true);
 		if (ret) {
 			pr_err("md-cluster: Could not copy data from bitmap %d\n", slot);
-			goto dlm_unlock;
+			goto clear_bit;
 		}
 		if (hi > 0) {
 			if (lo < mddev->recovery_cp)
@@ -293,8 +293,6 @@ static void recover_bitmaps(struct md_thread *thread)
 			    md_wakeup_thread(mddev->thread);
 			}
 		}
-dlm_unlock:
-		dlm_unlock_sync(bm_lockres);
 clear_bit:
 		lockres_free(bm_lockres);
 		clear_bit(slot, &cinfo->recovery_map);
@@ -763,7 +761,6 @@ static int gather_all_resync_info(struct mddev *mddev, int total_slots)
 			md_check_recovery(mddev);
 		}
 
-		dlm_unlock_sync(bm_lockres);
 		lockres_free(bm_lockres);
 	}
 out:
@@ -1173,7 +1170,6 @@ static void unlock_all_bitmaps(struct mddev *mddev)
 	if (cinfo->other_bitmap_lockres) {
 		for (i = 0; i < mddev->bitmap_info.nodes - 1; i++) {
 			if (cinfo->other_bitmap_lockres[i]) {
-				dlm_unlock_sync(cinfo->other_bitmap_lockres[i]);
 				lockres_free(cinfo->other_bitmap_lockres[i]);
 			}
 		}
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 04/10] md: changes for MD_STILL_CLOSED flag
  2016-08-12  5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
                   ` (2 preceding siblings ...)
  2016-08-12  5:42 ` [PATCH V2 03/10] md-cluster: remove some unnecessary dlm_unlock_sync Guoqing Jiang
@ 2016-08-12  5:42 ` Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 05/10] md-cluster: clean related infos of cluster Guoqing Jiang
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Guoqing Jiang @ 2016-08-12  5:42 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Guoqing Jiang

When stop clustered raid while it is pending on resync,
MD_STILL_CLOSED flag could be cleared since udev rule
is triggered to open the mddev. So obviously array can't
be stopped soon and returns EBUSY.

	mdadm -Ss          md-raid-arrays.rules
  set MD_STILL_CLOSED          md_open()
	... ... ...          clear MD_STILL_CLOSED
	do_md_stop

We make below changes to resolve this issue:

1. rename MD_STILL_CLOSED to MD_CLOSING since it is set
   when stop array and it means we are stopping array.
2. let md_open returns early if CLOSING is set, so no
   other threads will open array if one thread is trying
   to close it.
3. no need to clear CLOSING bit in md_open because 1 has
   ensure the bit is cleared, then we also don't need to
   test CLOSING bit in do_md_stop.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md.c | 14 ++++++++------
 drivers/md/md.h |  5 ++---
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d6ae9bc..da5741c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5576,8 +5576,7 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
 	mutex_lock(&mddev->open_mutex);
 	if ((mddev->pers && atomic_read(&mddev->openers) > !!bdev) ||
 	    mddev->sync_thread ||
-	    test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
-	    (bdev && !test_bit(MD_STILL_CLOSED, &mddev->flags))) {
+	    test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
 		printk("md: %s still in use.\n",mdname(mddev));
 		if (did_freeze) {
 			clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
@@ -5639,8 +5638,7 @@ static int do_md_stop(struct mddev *mddev, int mode,
 	if ((mddev->pers && atomic_read(&mddev->openers) > !!bdev) ||
 	    mddev->sysfs_active ||
 	    mddev->sync_thread ||
-	    test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
-	    (bdev && !test_bit(MD_STILL_CLOSED, &mddev->flags))) {
+	    test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
 		printk("md: %s still in use.\n",mdname(mddev));
 		mutex_unlock(&mddev->open_mutex);
 		if (did_freeze) {
@@ -6825,7 +6823,7 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
 			err = -EBUSY;
 			goto out;
 		}
-		set_bit(MD_STILL_CLOSED, &mddev->flags);
+		set_bit(MD_CLOSING, &mddev->flags);
 		mutex_unlock(&mddev->open_mutex);
 		sync_blockdev(bdev);
 	}
@@ -7074,9 +7072,13 @@ static int md_open(struct block_device *bdev, fmode_t mode)
 	if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
 		goto out;
 
+	if (test_bit(MD_CLOSING, &mddev->flags)) {
+		mutex_unlock(&mddev->open_mutex);
+		return -ENODEV;
+	}
+
 	err = 0;
 	atomic_inc(&mddev->openers);
-	clear_bit(MD_STILL_CLOSED, &mddev->flags);
 	mutex_unlock(&mddev->open_mutex);
 
 	check_disk_change(bdev);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 20c6675..2b20417 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -201,9 +201,8 @@ struct mddev {
 #define MD_CHANGE_PENDING 2	/* switch from 'clean' to 'active' in progress */
 #define MD_UPDATE_SB_FLAGS (1 | 2 | 4)	/* If these are set, md_update_sb needed */
 #define MD_ARRAY_FIRST_USE 3    /* First use of array, needs initialization */
-#define MD_STILL_CLOSED	4	/* If set, then array has not been opened since
-				 * md_ioctl checked on it.
-				 */
+#define MD_CLOSING	4	/* If set, we are closing the array, do not open
+				 * it then */
 #define MD_JOURNAL_CLEAN 5	/* A raid with journal is already clean */
 #define MD_HAS_JOURNAL	6	/* The raid array has journal feature set */
 #define MD_RELOAD_SB	7	/* Reload the superblock because another node
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 05/10] md-cluster: clean related infos of cluster
  2016-08-12  5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
                   ` (3 preceding siblings ...)
  2016-08-12  5:42 ` [PATCH V2 04/10] md: changes for MD_STILL_CLOSED flag Guoqing Jiang
@ 2016-08-12  5:42 ` Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 06/10] md-cluster: protect md_find_rdev_nr_rcu with rcu lock Guoqing Jiang
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Guoqing Jiang @ 2016-08-12  5:42 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Guoqing Jiang

cluster_info and bitmap_info.nodes also need to be
cleared when array is stopped.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index da5741c..f95e5ed 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5457,12 +5457,14 @@ static void md_clean(struct mddev *mddev)
 	mddev->degraded = 0;
 	mddev->safemode = 0;
 	mddev->private = NULL;
+	mddev->cluster_info = NULL;
 	mddev->bitmap_info.offset = 0;
 	mddev->bitmap_info.default_offset = 0;
 	mddev->bitmap_info.default_space = 0;
 	mddev->bitmap_info.chunksize = 0;
 	mddev->bitmap_info.daemon_sleep = 0;
 	mddev->bitmap_info.max_write_behind = 0;
+	mddev->bitmap_info.nodes = 0;
 }
 
 static void __md_stop_writes(struct mddev *mddev)
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 06/10] md-cluster: protect md_find_rdev_nr_rcu with rcu lock
  2016-08-12  5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
                   ` (4 preceding siblings ...)
  2016-08-12  5:42 ` [PATCH V2 05/10] md-cluster: clean related infos of cluster Guoqing Jiang
@ 2016-08-12  5:42 ` Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 07/10] md: remove obsolete ret in md_start_sync Guoqing Jiang
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Guoqing Jiang @ 2016-08-12  5:42 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Guoqing Jiang

We need to use rcu_read_lock/unlock to avoid potential
race.

Reported-by: Shaohua Li <shli@fb.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md-cluster.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index ea2699e..8972413 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -486,9 +486,10 @@ static void process_metadata_update(struct mddev *mddev, struct cluster_msg *msg
 
 static void process_remove_disk(struct mddev *mddev, struct cluster_msg *msg)
 {
-	struct md_rdev *rdev = md_find_rdev_nr_rcu(mddev,
-						   le32_to_cpu(msg->raid_slot));
+	struct md_rdev *rdev;
 
+	rcu_read_lock();
+	rdev = md_find_rdev_nr_rcu(mddev, le32_to_cpu(msg->raid_slot));
 	if (rdev) {
 		set_bit(ClusterRemove, &rdev->flags);
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
@@ -497,18 +498,21 @@ static void process_remove_disk(struct mddev *mddev, struct cluster_msg *msg)
 	else
 		pr_warn("%s: %d Could not find disk(%d) to REMOVE\n",
 			__func__, __LINE__, le32_to_cpu(msg->raid_slot));
+	rcu_read_unlock();
 }
 
 static void process_readd_disk(struct mddev *mddev, struct cluster_msg *msg)
 {
-	struct md_rdev *rdev = md_find_rdev_nr_rcu(mddev,
-						   le32_to_cpu(msg->raid_slot));
+	struct md_rdev *rdev;
 
+	rcu_read_lock();
+	rdev = md_find_rdev_nr_rcu(mddev, le32_to_cpu(msg->raid_slot));
 	if (rdev && test_bit(Faulty, &rdev->flags))
 		clear_bit(Faulty, &rdev->flags);
 	else
 		pr_warn("%s: %d Could not find disk(%d) which is faulty",
 			__func__, __LINE__, le32_to_cpu(msg->raid_slot));
+	rcu_read_unlock();
 }
 
 static int process_recvd_msg(struct mddev *mddev, struct cluster_msg *msg)
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 07/10] md: remove obsolete ret in md_start_sync
  2016-08-12  5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
                   ` (5 preceding siblings ...)
  2016-08-12  5:42 ` [PATCH V2 06/10] md-cluster: protect md_find_rdev_nr_rcu with rcu lock Guoqing Jiang
@ 2016-08-12  5:42 ` Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 08/10] md-cluster: convert the completion to wait queue Guoqing Jiang
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Guoqing Jiang @ 2016-08-12  5:42 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Guoqing Jiang

The ret is not needed anymore since we have already
move resync_start into md_do_sync in commit 41a9a0d.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f95e5ed..cceaaf4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8283,16 +8283,14 @@ no_add:
 static void md_start_sync(struct work_struct *ws)
 {
 	struct mddev *mddev = container_of(ws, struct mddev, del_work);
-	int ret = 0;
 
 	mddev->sync_thread = md_register_thread(md_do_sync,
 						mddev,
 						"resync");
 	if (!mddev->sync_thread) {
-		if (!(mddev_is_clustered(mddev) && ret == -EAGAIN))
-			printk(KERN_ERR "%s: could not start resync"
-			       " thread...\n",
-			       mdname(mddev));
+		printk(KERN_ERR "%s: could not start resync"
+		       " thread...\n",
+		       mdname(mddev));
 		/* leave the spares where they are, it shouldn't hurt */
 		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 		clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 08/10] md-cluster: convert the completion to wait queue
  2016-08-12  5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
                   ` (6 preceding siblings ...)
  2016-08-12  5:42 ` [PATCH V2 07/10] md: remove obsolete ret in md_start_sync Guoqing Jiang
@ 2016-08-12  5:42 ` Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 09/10] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang Guoqing Jiang
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Guoqing Jiang @ 2016-08-12  5:42 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Guoqing Jiang

Previously, we used completion to sync between require dlm lock
and sync_ast, however we will have to expose completion.wait
and completion.done in dlm_lock_sync_interruptible (introduced
later), it is not a common usage for completion, so convert
related things to wait queue.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md-cluster.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 8972413..03a51e7 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -25,7 +25,8 @@ struct dlm_lock_resource {
 	struct dlm_lksb lksb;
 	char *name; /* lock name. */
 	uint32_t flags; /* flags to pass to dlm_lock() */
-	struct completion completion; /* completion for synchronized locking */
+	wait_queue_head_t sync_locking; /* wait queue for synchronized locking */
+	bool sync_locking_done;
 	void (*bast)(void *arg, int mode); /* blocking AST function pointer*/
 	struct mddev *mddev; /* pointing back to mddev. */
 	int mode;
@@ -118,7 +119,8 @@ static void sync_ast(void *arg)
 	struct dlm_lock_resource *res;
 
 	res = arg;
-	complete(&res->completion);
+	res->sync_locking_done = true;
+	wake_up(&res->sync_locking);
 }
 
 static int dlm_lock_sync(struct dlm_lock_resource *res, int mode)
@@ -130,7 +132,8 @@ static int dlm_lock_sync(struct dlm_lock_resource *res, int mode)
 			0, sync_ast, res, res->bast);
 	if (ret)
 		return ret;
-	wait_for_completion(&res->completion);
+	wait_event(res->sync_locking, res->sync_locking_done);
+	res->sync_locking_done = false;
 	if (res->lksb.sb_status == 0)
 		res->mode = mode;
 	return res->lksb.sb_status;
@@ -151,7 +154,8 @@ static struct dlm_lock_resource *lockres_init(struct mddev *mddev,
 	res = kzalloc(sizeof(struct dlm_lock_resource), GFP_KERNEL);
 	if (!res)
 		return NULL;
-	init_completion(&res->completion);
+	init_waitqueue_head(&res->sync_locking);
+	res->sync_locking_done = false;
 	res->ls = cinfo->lockspace;
 	res->mddev = mddev;
 	res->mode = DLM_LOCK_IV;
@@ -205,7 +209,7 @@ static void lockres_free(struct dlm_lock_resource *res)
 	if (unlikely(ret != 0))
 		pr_err("failed to unlock %s return %d\n", res->name, ret);
 	else
-		wait_for_completion(&res->completion);
+		wait_event(res->sync_locking, res->sync_locking_done);
 
 	kfree(res->name);
 	kfree(res->lksb.sb_lvbptr);
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 09/10] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
  2016-08-12  5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
                   ` (7 preceding siblings ...)
  2016-08-12  5:42 ` [PATCH V2 08/10] md-cluster: convert the completion to wait queue Guoqing Jiang
@ 2016-08-12  5:42 ` Guoqing Jiang
  2016-08-12  5:42 ` [PATCH V2 10/10] md-cluster: make resync lock also could be interruptted Guoqing Jiang
  2016-08-17  1:49 ` [PATCH V2 00/10] The latest changes for md-cluster Shaohua Li
  10 siblings, 0 replies; 12+ messages in thread
From: Guoqing Jiang @ 2016-08-12  5:42 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Guoqing Jiang

When some node leaves cluster, then it's bitmap need to be
synced by another node, so "md*_recover" thread is triggered
for the purpose. However, with below steps. we can find tasks
hang happened either in B or C.

1. Node A create a resyncing cluster raid1, assemble it in
   other two nodes (B and C).
2. stop array in B and C.
3. stop array in A.

linux44:~ # ps aux|grep md|grep D
root	5938	0.0  0.1  19852  1964 pts/0    D+   14:52   0:00 mdadm -S md0
root	5939	0.0  0.0      0     0 ?        D    14:52   0:00 [md0_recover]

linux44:~ # cat /proc/5939/stack
[<ffffffffa04cf321>] dlm_lock_sync+0x71/0x90 [md_cluster]
[<ffffffffa04d0705>] recover_bitmaps+0x125/0x220 [md_cluster]
[<ffffffffa052105d>] md_thread+0x16d/0x180 [md_mod]
[<ffffffff8107ad94>] kthread+0xb4/0xc0
[<ffffffff8152a518>] ret_from_fork+0x58/0x90

linux44:~ # cat /proc/5938/stack
[<ffffffff8107afde>] kthread_stop+0x6e/0x120
[<ffffffffa0519da0>] md_unregister_thread+0x40/0x80 [md_mod]
[<ffffffffa04cfd20>] leave+0x70/0x120 [md_cluster]
[<ffffffffa0525e24>] md_cluster_stop+0x14/0x30 [md_mod]
[<ffffffffa05269ab>] bitmap_free+0x14b/0x150 [md_mod]
[<ffffffffa0523f3b>] do_md_stop+0x35b/0x5a0 [md_mod]
[<ffffffffa0524e83>] md_ioctl+0x873/0x1590 [md_mod]
[<ffffffff81288464>] blkdev_ioctl+0x214/0x7d0
[<ffffffff811dd3dd>] block_ioctl+0x3d/0x40
[<ffffffff811b92d4>] do_vfs_ioctl+0x2d4/0x4b0
[<ffffffff811b9538>] SyS_ioctl+0x88/0xa0
[<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b

The problem is caused by recover_bitmaps can't reliably abort
when the thread is unregistered. So dlm_lock_sync_interruptible
is introduced to detect the thread's situation to fix the problem.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md-cluster.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 03a51e7..149d19a 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -10,6 +10,7 @@
 
 
 #include <linux/module.h>
+#include <linux/kthread.h>
 #include <linux/dlm.h>
 #include <linux/sched.h>
 #include <linux/raid/md_p.h>
@@ -144,6 +145,40 @@ static int dlm_unlock_sync(struct dlm_lock_resource *res)
 	return dlm_lock_sync(res, DLM_LOCK_NL);
 }
 
+/* An variation of dlm_lock_sync, which make lock request could
+ * be interrupted */
+static int dlm_lock_sync_interruptible(struct dlm_lock_resource *res, int mode,
+				       struct mddev *mddev)
+{
+	int ret = 0;
+
+	ret = dlm_lock(res->ls, mode, &res->lksb,
+			res->flags, res->name, strlen(res->name),
+			0, sync_ast, res, res->bast);
+	if (ret)
+		return ret;
+
+	wait_event(res->sync_locking, res->sync_locking_done
+				      || kthread_should_stop());
+	if (!res->sync_locking_done) {
+		/*
+		 * the convert queue contains the lock request when request is
+		 * interrupted, and sync_ast could still be run, so need to
+		 * cancel the request and reset completion
+		 */
+		ret = dlm_unlock(res->ls, res->lksb.sb_lkid, DLM_LKF_CANCEL, &res->lksb, res);
+		res->sync_locking_done = false;
+		if (unlikely(ret != 0))
+			pr_info("failed to cancel previous lock request "
+				 "%s return %d\n", res->name, ret);
+		return -EPERM;
+	} else
+		res->sync_locking_done = false;
+	if (res->lksb.sb_status == 0)
+		res->mode = mode;
+	return res->lksb.sb_status;
+}
+
 static struct dlm_lock_resource *lockres_init(struct mddev *mddev,
 		char *name, void (*bastfn)(void *arg, int mode), int with_lvb)
 {
@@ -276,7 +311,7 @@ static void recover_bitmaps(struct md_thread *thread)
 			goto clear_bit;
 		}
 
-		ret = dlm_lock_sync(bm_lockres, DLM_LOCK_PW);
+		ret = dlm_lock_sync_interruptible(bm_lockres, DLM_LOCK_PW, mddev);
 		if (ret) {
 			pr_err("md-cluster: Could not DLM lock %s: %d\n",
 					str, ret);
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 10/10] md-cluster: make resync lock also could be interruptted
  2016-08-12  5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
                   ` (8 preceding siblings ...)
  2016-08-12  5:42 ` [PATCH V2 09/10] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang Guoqing Jiang
@ 2016-08-12  5:42 ` Guoqing Jiang
  2016-08-17  1:49 ` [PATCH V2 00/10] The latest changes for md-cluster Shaohua Li
  10 siblings, 0 replies; 12+ messages in thread
From: Guoqing Jiang @ 2016-08-12  5:42 UTC (permalink / raw)
  To: linux-raid; +Cc: shli, Guoqing Jiang

When one node is perform resync or recovery, other nodes
can't get resync lock and could block for a while before
it holds the lock, so we can't stop array immediately for
this scenario.

To make array could be stop quickly, we check MD_CLOSING
in dlm_lock_sync_interruptible to make us can interrupt
the lock request.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md-cluster.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 149d19a..16d892d 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -159,7 +159,8 @@ static int dlm_lock_sync_interruptible(struct dlm_lock_resource *res, int mode,
 		return ret;
 
 	wait_event(res->sync_locking, res->sync_locking_done
-				      || kthread_should_stop());
+				      || kthread_should_stop()
+				      || test_bit(MD_CLOSING, &mddev->flags));
 	if (!res->sync_locking_done) {
 		/*
 		 * the convert queue contains the lock request when request is
@@ -1033,7 +1034,7 @@ static void metadata_update_cancel(struct mddev *mddev)
 static int resync_start(struct mddev *mddev)
 {
 	struct md_cluster_info *cinfo = mddev->cluster_info;
-	return dlm_lock_sync(cinfo->resync_lockres, DLM_LOCK_EX);
+	return dlm_lock_sync_interruptible(cinfo->resync_lockres, DLM_LOCK_EX, mddev);
 }
 
 static int resync_info_update(struct mddev *mddev, sector_t lo, sector_t hi)
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 00/10] The latest changes for md-cluster
  2016-08-12  5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
                   ` (9 preceding siblings ...)
  2016-08-12  5:42 ` [PATCH V2 10/10] md-cluster: make resync lock also could be interruptted Guoqing Jiang
@ 2016-08-17  1:49 ` Shaohua Li
  10 siblings, 0 replies; 12+ messages in thread
From: Shaohua Li @ 2016-08-17  1:49 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid, shli

On Fri, Aug 12, 2016 at 01:42:33PM +0800, Guoqing Jiang wrote:
> This version addresses comments from Shaohua, and the 7th
> patch is added as well.

Applied this series, thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-08-17  1:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-12  5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
2016-08-12  5:42 ` [PATCH V2 01/10] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
2016-08-12  5:42 ` [PATCH V2 02/10] md-cluster: use FORCEUNLOCK in lockres_free Guoqing Jiang
2016-08-12  5:42 ` [PATCH V2 03/10] md-cluster: remove some unnecessary dlm_unlock_sync Guoqing Jiang
2016-08-12  5:42 ` [PATCH V2 04/10] md: changes for MD_STILL_CLOSED flag Guoqing Jiang
2016-08-12  5:42 ` [PATCH V2 05/10] md-cluster: clean related infos of cluster Guoqing Jiang
2016-08-12  5:42 ` [PATCH V2 06/10] md-cluster: protect md_find_rdev_nr_rcu with rcu lock Guoqing Jiang
2016-08-12  5:42 ` [PATCH V2 07/10] md: remove obsolete ret in md_start_sync Guoqing Jiang
2016-08-12  5:42 ` [PATCH V2 08/10] md-cluster: convert the completion to wait queue Guoqing Jiang
2016-08-12  5:42 ` [PATCH V2 09/10] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang Guoqing Jiang
2016-08-12  5:42 ` [PATCH V2 10/10] md-cluster: make resync lock also could be interruptted Guoqing Jiang
2016-08-17  1:49 ` [PATCH V2 00/10] The latest changes for md-cluster Shaohua Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.