* [PATCH 0/9] barrier support for request-based dm
@ 2009-10-16  4:52 Kiyoshi Ueda
  2009-10-16  4:55 ` [PATCH 1/9] dm core: clean up in-flight checking Kiyoshi Ueda
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: Kiyoshi Ueda @ 2009-10-16  4:52 UTC (permalink / raw)
  To: Alasdair Kergon; +Cc: device-mapper development

Hi Alasdair,

This patch-set adds barrier support for request-based dm-multipath.
It is made on top of 2.6.32-rc4 + the patches below:
  o http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=commit;h=a9327cac440be4d8333bba975cbbf76045096275
    (Seperate read and write statistics of in_flight requests)

  o The patches below in Alasdair's editing tree:
      dm-add-missing-del_gendisk-to-alloc_dev-error-path.patch
      dm-dec_pending-needs-locking-to-save-error-value.patch
      dm-ioctl-prefer-strlcpy-over-strncpy.patch
      dm-trace-request-based-remapping.patch

Most patches are clean-ups and refactorings in preparation for
the last patch.

Please review and apply if there are no problems.

Summary of the patch-set:
  1/9: dm core: clean up in-flight checking
  2/9: rqdm core: map_request() takes clone instead of orig
  3/9: rqdm core: alloc_rq_tio() takes gfp_mask
  4/9: rqdm core: clean up request cloning
  5/9: rqdm core: simplify suspend code
  6/9: rqdm core: use md->pending for in_flight I/O counting
  7/9: rqdm core: refactor completion functions
  8/9: rqdm core: move dm_end_request()
  9/9: rqdm core: add barrier support

 drivers/md/dm.c |  582 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 323 insertions(+), 259 deletions(-)

Thanks,
Kiyoshi Ueda

* [PATCH 1/9] dm core: clean up in-flight checking
  2009-10-16  4:52 [PATCH 0/9] barrier support for request-based dm Kiyoshi Ueda
@ 2009-10-16  4:55 ` Kiyoshi Ueda
  2009-10-19 21:16   ` Alasdair G Kergon
  2009-10-16  4:57 ` [PATCH 2/9] rqdm core: map_request() takes clone instead of orig Kiyoshi Ueda
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Kiyoshi Ueda @ 2009-10-16  4:55 UTC (permalink / raw)
  To: Alasdair Kergon; +Cc: device-mapper development

This patch adds md_in_flight() to get the number of in-flight I/Os.
No functional change.

This patch is a preparation for PATCH 6, which changes the in-flight
I/O counter from q->in_flight to md->pending in request-based dm.

This patch depends on:
  http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=commit;h=a9327cac440be4d8333bba975cbbf76045096275
  (Seperate read and write statistics of in_flight requests)

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: 2.6.32-rc4/drivers/md/dm.c
===================================================================
--- 2.6.32-rc4.orig/drivers/md/dm.c
+++ 2.6.32-rc4/drivers/md/dm.c
@@ -450,6 +450,12 @@ static void free_bio_info(struct dm_rq_c
 	mempool_free(info, info->tio->md->io_pool);
 }
 
+static int md_in_flight(struct mapped_device *md)
+{
+	return atomic_read(&md->pending[READ]) +
+	       atomic_read(&md->pending[WRITE]);
+}
+
 static void start_io_acct(struct dm_io *io)
 {
 	struct mapped_device *md = io->md;
@@ -2100,8 +2106,7 @@ static int dm_wait_for_completion(struct
 				break;
 			}
 			spin_unlock_irqrestore(q->queue_lock, flags);
-		} else if (!atomic_read(&md->pending[0]) &&
-					!atomic_read(&md->pending[1]))
+		} else if (!md_in_flight(md))
 			break;
 
 		if (interruptible == TASK_INTERRUPTIBLE &&

* [PATCH 2/9] rqdm core: map_request() takes clone instead of orig
  2009-10-16  4:52 [PATCH 0/9] barrier support for request-based dm Kiyoshi Ueda
  2009-10-16  4:55 ` [PATCH 1/9] dm core: clean up in-flight checking Kiyoshi Ueda
@ 2009-10-16  4:57 ` Kiyoshi Ueda
  2009-10-16  4:58 ` [PATCH 3/9] rqdm core: alloc_rq_tio() takes gfp_mask Kiyoshi Ueda
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Kiyoshi Ueda @ 2009-10-16  4:57 UTC (permalink / raw)
  To: Alasdair Kergon; +Cc: device-mapper development

This patch changes the argument of map_request() from the original
request to the clone request.  No functional change.

This patch is a preparation for PATCH 9, which needs to use
map_request() for clones sharing an original barrier request.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Index: 2.6.32-rc4/drivers/md/dm.c
===================================================================
--- 2.6.32-rc4.orig/drivers/md/dm.c
+++ 2.6.32-rc4/drivers/md/dm.c
@@ -1493,11 +1493,10 @@ static int dm_prep_fn(struct request_que
 	return BLKPREP_OK;
 }
 
-static void map_request(struct dm_target *ti, struct request *rq,
+static void map_request(struct dm_target *ti, struct request *clone,
 			struct mapped_device *md)
 {
 	int r;
-	struct request *clone = rq->special;
 	struct dm_rq_target_io *tio = clone->end_io_data;
 
 	/*
@@ -1576,7 +1575,7 @@ static void dm_request_fn(struct request
 
 		blk_start_request(rq);
 		spin_unlock(q->queue_lock);
-		map_request(ti, rq, md);
+		map_request(ti, rq->special, md);
 		spin_lock_irq(q->queue_lock);
 	}
 

* [PATCH 3/9] rqdm core: alloc_rq_tio() takes gfp_mask
  2009-10-16  4:52 [PATCH 0/9] barrier support for request-based dm Kiyoshi Ueda
  2009-10-16  4:55 ` [PATCH 1/9] dm core: clean up in-flight checking Kiyoshi Ueda
  2009-10-16  4:57 ` [PATCH 2/9] rqdm core: map_request() takes clone instead of orig Kiyoshi Ueda
@ 2009-10-16  4:58 ` Kiyoshi Ueda
  2009-10-16  4:59 ` [PATCH 4/9] rqdm core: clean up request cloning Kiyoshi Ueda
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Kiyoshi Ueda @ 2009-10-16  4:58 UTC (permalink / raw)
  To: Alasdair Kergon; +Cc: device-mapper development

This patch adds the gfp_mask argument to alloc_rq_tio().
No functional change.

This patch is a preparation for PATCH 9, which needs to allocate the
tio for barrier I/O with a different allocation flag (GFP_NOIO) from
the one used in the normal I/O code path.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: 2.6.32-rc4/drivers/md/dm.c
===================================================================
--- 2.6.32-rc4.orig/drivers/md/dm.c
+++ 2.6.32-rc4/drivers/md/dm.c
@@ -430,9 +430,10 @@ static void free_tio(struct mapped_devic
 	mempool_free(tio, md->tio_pool);
 }
 
-static struct dm_rq_target_io *alloc_rq_tio(struct mapped_device *md)
+static struct dm_rq_target_io *alloc_rq_tio(struct mapped_device *md,
+					    gfp_t gfp_mask)
 {
-	return mempool_alloc(md->tio_pool, GFP_ATOMIC);
+	return mempool_alloc(md->tio_pool, gfp_mask);
 }
 
 static void free_rq_tio(struct dm_rq_target_io *tio)
@@ -1469,7 +1470,7 @@ static int dm_prep_fn(struct request_que
 		return BLKPREP_KILL;
 	}
 
-	tio = alloc_rq_tio(md); /* Only one for each original request */
+	tio = alloc_rq_tio(md, GFP_ATOMIC);
 	if (!tio)
 		/* -ENOMEM */
 		return BLKPREP_DEFER;

* [PATCH 4/9] rqdm core: clean up request cloning
  2009-10-16  4:52 [PATCH 0/9] barrier support for request-based dm Kiyoshi Ueda
                   ` (2 preceding siblings ...)
  2009-10-16  4:58 ` [PATCH 3/9] rqdm core: alloc_rq_tio() takes gfp_mask Kiyoshi Ueda
@ 2009-10-16  4:59 ` Kiyoshi Ueda
  2009-10-16  5:01 ` [PATCH 5/9] rqdm core: simplify suspend code Kiyoshi Ueda
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Kiyoshi Ueda @ 2009-10-16  4:59 UTC (permalink / raw)
  To: Alasdair Kergon; +Cc: device-mapper development

This patch factors out the request cloning code in dm_prep_fn()
as clone_rq().  No functional change.

This patch is a preparation for PATCH 9, which needs to make clones
from an original barrier request.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm.c |   45 ++++++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 17 deletions(-)

Index: 2.6.32-rc4/drivers/md/dm.c
===================================================================
--- 2.6.32-rc4.orig/drivers/md/dm.c
+++ 2.6.32-rc4/drivers/md/dm.c
@@ -1443,6 +1443,32 @@ static int setup_clone(struct request *c
 	return 0;
 }
 
+static struct request *clone_rq(struct request *rq, struct mapped_device *md,
+				gfp_t gfp_mask)
+{
+	struct request *clone;
+	struct dm_rq_target_io *tio;
+
+	tio = alloc_rq_tio(md, gfp_mask);
+	if (!tio)
+		return NULL;
+
+	tio->md = md;
+	tio->ti = NULL;
+	tio->orig = rq;
+	tio->error = 0;
+	memset(&tio->info, 0, sizeof(tio->info));
+
+	clone = &tio->clone;
+	if (setup_clone(clone, rq, tio)) {
+		/* -ENOMEM */
+		free_rq_tio(tio);
+		return NULL;
+	}
+
+	return clone;
+}
+
 static int dm_rq_flush_suspending(struct mapped_device *md)
 {
 	return !md->suspend_rq.special;
@@ -1454,7 +1480,6 @@ static int dm_rq_flush_suspending(struct
 static int dm_prep_fn(struct request_queue *q, struct request *rq)
 {
 	struct mapped_device *md = q->queuedata;
-	struct dm_rq_target_io *tio;
 	struct request *clone;
 
 	if (unlikely(rq == &md->suspend_rq)) {
@@ -1470,24 +1495,10 @@ static int dm_prep_fn(struct request_que
 		return BLKPREP_KILL;
 	}
 
-	tio = alloc_rq_tio(md, GFP_ATOMIC);
-	if (!tio)
-		/* -ENOMEM */
+	clone = clone_rq(rq, md, GFP_ATOMIC);
+	if (!clone)
 		return BLKPREP_DEFER;
 
-	tio->md = md;
-	tio->ti = NULL;
-	tio->orig = rq;
-	tio->error = 0;
-	memset(&tio->info, 0, sizeof(tio->info));
-
-	clone = &tio->clone;
-	if (setup_clone(clone, rq, tio)) {
-		/* -ENOMEM */
-		free_rq_tio(tio);
-		return BLKPREP_DEFER;
-	}
-
 	rq->special = clone;
 	rq->cmd_flags |= REQ_DONTPREP;
 

* [PATCH 5/9] rqdm core: simplify suspend code
  2009-10-16  4:52 [PATCH 0/9] barrier support for request-based dm Kiyoshi Ueda
                   ` (3 preceding siblings ...)
  2009-10-16  4:59 ` [PATCH 4/9] rqdm core: clean up request cloning Kiyoshi Ueda
@ 2009-10-16  5:01 ` Kiyoshi Ueda
  2009-10-28 16:21   ` Alasdair G Kergon
  2009-10-28 17:39   ` Alasdair G Kergon
  2009-10-16  5:02 ` [PATCH 6/9] rqdm core: use md->pending for in_flight I/O counting Kiyoshi Ueda
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 15+ messages in thread
From: Kiyoshi Ueda @ 2009-10-16  5:01 UTC (permalink / raw)
  To: Alasdair Kergon; +Cc: device-mapper development

This patch simplifies the suspend code of request-based dm.
(No change since the last post: http://patchwork.kernel.org/patch/40809/)

This patch is a preparation for PATCH 9, which waits for all in-flight
I/Os to complete without stopping the request_queue and wants to use
dm_wait_for_completion() for that.


In the case of suspend with "--nolockfs" but without "--noflush",
the semantics for bio-based dm changed in 2.6.30 as below:
  before 2.6.30:   I/Os submitted before the suspend invocation
                   are flushed
  2.6.30 or later: I/Os submitted even before the suspend invocation
                   may not be flushed

  (*) We had no idea whether the semantics change would hurt anyone.
      (For details, see http://marc.info/?t=123994433400003&r=1&w=2)
      But it seems harmless, since there have been no complaints
      against 2.6.30 or later.

Since the semantics of request-based dm committed in 2.6.31-rc1 are
still the old ones, this patch changes them to the new ones.

The semantics change makes the suspend code simple:
  o In request-based dm, suspend is implemented by stopping the
    request_queue, and all I/Os are kept queued in the request_queue
    even after suspend is invoked.
  o To keep the old semantics, we had to distinguish which I/Os were
    queued before/after the suspend invocation, so a special
    barrier-like request called the 'suspend marker' was introduced.
  o With the new semantics, we don't need to flush any I/O, so we can
    remove the marker and the code related to marker handling and
    I/O flushing.

After removing that code, the suspend sequence is now as below
(a condensed sketch follows the list):
  1. Flush all I/Os by lock_fs() if needed.
  2. Stop dispatching any I/O by stopping the request_queue.
  3. Wait for all in-flight I/Os to be completed or requeued.
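
For illustration only, the request-based half of dm_suspend() now
reduces to roughly the sketch below.  This is a condensed paraphrase
using the helpers named in this series (lock_fs, stop_queue,
dm_wait_for_completion, start_queue); the helper name and the exact
error handling here are made up for illustration, and the bio-based
path is omitted, so see the real dm_suspend() in the diff.

  /* Illustrative sketch only, not part of the diff below. */
  static int rq_based_suspend_sketch(struct mapped_device *md,
                                     unsigned suspend_flags)
  {
          int noflush = suspend_flags & DM_SUSPEND_NOFLUSH_FLAG;
          int do_lockfs = suspend_flags & DM_SUSPEND_LOCKFS_FLAG;
          int r = 0;

          /* 1. Flush all I/Os by lock_fs() if needed. */
          if (!noflush && do_lockfs) {
                  r = lock_fs(md);
                  if (r)
                          return r;
          }

          /* 2. Stop dispatching any I/O by stopping the request_queue. */
          stop_queue(md->queue);

          /* 3. Wait for all in-flight I/Os to be completed or requeued. */
          r = dm_wait_for_completion(md, TASK_INTERRUPTIBLE);

          if (r < 0) {
                  /* To abort suspend, just restart the request_queue. */
                  start_queue(md->queue);
                  unlock_fs(md);
          }

          return r;
  }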

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm.c |  158 ++++----------------------------------------------------
 1 file changed, 14 insertions(+), 144 deletions(-)

Index: 2.6.32-rc4/drivers/md/dm.c
===================================================================
--- 2.6.32-rc4.orig/drivers/md/dm.c
+++ 2.6.32-rc4/drivers/md/dm.c
@@ -178,9 +178,6 @@ struct mapped_device {
 	/* forced geometry settings */
 	struct hd_geometry geometry;
 
-	/* marker of flush suspend for request-based dm */
-	struct request suspend_rq;
-
 	/* For saving the address of __make_request for request based dm */
 	make_request_fn *saved_make_request_fn;
 
@@ -1469,11 +1466,6 @@ static struct request *clone_rq(struct r
 	return clone;
 }
 
-static int dm_rq_flush_suspending(struct mapped_device *md)
-{
-	return !md->suspend_rq.special;
-}
-
 /*
  * Called with the queue lock held.
  */
@@ -1482,14 +1474,6 @@ static int dm_prep_fn(struct request_que
 	struct mapped_device *md = q->queuedata;
 	struct request *clone;
 
-	if (unlikely(rq == &md->suspend_rq)) {
-		if (dm_rq_flush_suspending(md))
-			return BLKPREP_OK;
-		else
-			/* The flush suspend was interrupted */
-			return BLKPREP_KILL;
-	}
-
 	if (unlikely(rq->special)) {
 		DMWARN("Already has something in rq->special.");
 		return BLKPREP_KILL;
@@ -1560,27 +1544,15 @@ static void dm_request_fn(struct request
 	struct request *rq;
 
 	/*
-	 * For noflush suspend, check blk_queue_stopped() to immediately
-	 * quit I/O dispatching.
+	 * For suspend, check blk_queue_stopped() not to increment
+	 * the number of in-flight I/Os after the queue is stopped
+	 * in dm_suspend().
 	 */
 	while (!blk_queue_plugged(q) && !blk_queue_stopped(q)) {
 		rq = blk_peek_request(q);
 		if (!rq)
 			goto plug_and_out;
 
-		if (unlikely(rq == &md->suspend_rq)) { /* Flush suspend maker */
-			if (queue_in_flight(q))
-				/* Not quiet yet.  Wait more */
-				goto plug_and_out;
-
-			/* This device should be quiet now */
-			__stop_queue(q);
-			blk_start_request(rq);
-			__blk_end_request_all(rq, 0);
-			wake_up(&md->wait);
-			goto out;
-		}
-
 		ti = dm_table_find_target(map, blk_rq_pos(rq));
 		if (ti->type->busy && ti->type->busy(ti))
 			goto plug_and_out;
@@ -2112,7 +2084,7 @@ static int dm_wait_for_completion(struct
 		smp_mb();
 		if (dm_request_based(md)) {
 			spin_lock_irqsave(q->queue_lock, flags);
-			if (!queue_in_flight(q) && blk_queue_stopped(q)) {
+			if (!queue_in_flight(q)) {
 				spin_unlock_irqrestore(q->queue_lock, flags);
 				break;
 			}
@@ -2245,67 +2217,6 @@ out:
 	return r;
 }
 
-static void dm_rq_invalidate_suspend_marker(struct mapped_device *md)
-{
-	md->suspend_rq.special = (void *)0x1;
-}
-
-static void dm_rq_abort_suspend(struct mapped_device *md, int noflush)
-{
-	struct request_queue *q = md->queue;
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	if (!noflush)
-		dm_rq_invalidate_suspend_marker(md);
-	__start_queue(q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
-}
-
-static void dm_rq_start_suspend(struct mapped_device *md, int noflush)
-{
-	struct request *rq = &md->suspend_rq;
-	struct request_queue *q = md->queue;
-
-	if (noflush)
-		stop_queue(q);
-	else {
-		blk_rq_init(q, rq);
-		blk_insert_request(q, rq, 0, NULL);
-	}
-}
-
-static int dm_rq_suspend_available(struct mapped_device *md, int noflush)
-{
-	int r = 1;
-	struct request *rq = &md->suspend_rq;
-	struct request_queue *q = md->queue;
-	unsigned long flags;
-
-	if (noflush)
-		return r;
-
-	/* The marker must be protected by queue lock if it is in use */
-	spin_lock_irqsave(q->queue_lock, flags);
-	if (unlikely(rq->ref_count)) {
-		/*
-		 * This can happen, when the previous flush suspend was
-		 * interrupted, the marker is still in the queue and
-		 * this flush suspend has been invoked, because we don't
-		 * remove the marker at the time of suspend interruption.
-		 * We have only one marker per mapped_device, so we can't
-		 * start another flush suspend while it is in use.
-		 */
-		BUG_ON(!rq->special); /* The marker should be invalidated */
-		DMWARN("Invalidating the previous flush suspend is still in"
-		       " progress.  Please retry later.");
-		r = 0;
-	}
-	spin_unlock_irqrestore(q->queue_lock, flags);
-
-	return r;
-}
-
 /*
  * Functions to lock and unlock any filesystem running on the
  * device.
@@ -2348,49 +2259,11 @@ static void unlock_fs(struct mapped_devi
 /*
  * Suspend mechanism in request-based dm.
  *
- * After the suspend starts, further incoming requests are kept in
- * the request_queue and deferred.
- * Remaining requests in the request_queue at the start of suspend are flushed
- * if it is flush suspend.
- * The suspend completes when the following conditions have been satisfied,
- * so wait for it:
- *    1. q->in_flight is 0 (which means no in_flight request)
- *    2. queue has been stopped (which means no request dispatching)
- *
- *
- * Noflush suspend
- * ---------------
- * Noflush suspend doesn't need to dispatch remaining requests.
- * So stop the queue immediately.  Then, wait for all in_flight requests
- * to be completed or requeued.
- *
- * To abort noflush suspend, start the queue.
+ * 1. Flush all I/Os by lock_fs() if needed.
+ * 2. Stop dispatching any I/O by stopping the request_queue.
+ * 3. Wait for all in-flight I/Os to be completed or requeued.
  *
- *
- * Flush suspend
- * -------------
- * Flush suspend needs to dispatch remaining requests.  So stop the queue
- * after the remaining requests are completed. (Requeued request must be also
- * re-dispatched and completed.  Until then, we can't stop the queue.)
- *
- * During flushing the remaining requests, further incoming requests are also
- * inserted to the same queue.  To distinguish which requests are to be
- * flushed, we insert a marker request to the queue at the time of starting
- * flush suspend, like a barrier.
- * The dispatching is blocked when the marker is found on the top of the queue.
- * And the queue is stopped when all in_flight requests are completed, since
- * that means the remaining requests are completely flushed.
- * Then, the marker is removed from the queue.
- *
- * To abort flush suspend, we also need to take care of the marker, not only
- * starting the queue.
- * We don't remove the marker forcibly from the queue since it's against
- * the block-layer manner.  Instead, we put a invalidated mark on the marker.
- * When the invalidated marker is found on the top of the queue, it is
- * immediately removed from the queue, so it doesn't block dispatching.
- * Because we have only one marker per mapped_device, we can't start another
- * flush suspend until the invalidated marker is removed from the queue.
- * So fail and return with -EBUSY in such a case.
+ * To abort suspend, start the request_queue.
  */
 int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
 {
@@ -2406,11 +2279,6 @@ int dm_suspend(struct mapped_device *md,
 		goto out_unlock;
 	}
 
-	if (dm_request_based(md) && !dm_rq_suspend_available(md, noflush)) {
-		r = -EBUSY;
-		goto out_unlock;
-	}
-
 	map = dm_get_table(md);
 
 	/*
@@ -2424,8 +2292,10 @@ int dm_suspend(struct mapped_device *md,
 	dm_table_presuspend_targets(map);
 
 	/*
-	 * Flush I/O to the device. noflush supersedes do_lockfs,
-	 * because lock_fs() needs to flush I/Os.
+	 * Flush I/O to the device.
+	 * Any I/O submitted after lock_fs() may not be flushed.
+	 * noflush supersedes do_lockfs, because lock_fs() needs to flush I/Os
+	 * and wait for the flushed I/Os to complete.
 	 */
 	if (!noflush && do_lockfs) {
 		r = lock_fs(md);
@@ -2457,7 +2327,7 @@ int dm_suspend(struct mapped_device *md,
 	flush_workqueue(md->wq);
 
 	if (dm_request_based(md))
-		dm_rq_start_suspend(md, noflush);
+		stop_queue(md->queue);
 
 	/*
 	 * At this point no more requests are entering target request routines.
@@ -2476,7 +2346,7 @@ int dm_suspend(struct mapped_device *md,
 		dm_queue_flush(md);
 
 		if (dm_request_based(md))
-			dm_rq_abort_suspend(md, noflush);
+			start_queue(md->queue);
 
 		unlock_fs(md);
 		goto out; /* pushback list is already flushed, so skip flush */

* [PATCH 6/9] rqdm core: use md->pending for in_flight I/O counting
  2009-10-16  4:52 [PATCH 0/9] barrier support for request-based dm Kiyoshi Ueda
                   ` (4 preceding siblings ...)
  2009-10-16  5:01 ` [PATCH 5/9] rqdm core: simplify suspend code Kiyoshi Ueda
@ 2009-10-16  5:02 ` Kiyoshi Ueda
  2009-10-16  5:03 ` [PATCH 7/9] rqdm core: refactor completion functions Kiyoshi Ueda
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Kiyoshi Ueda @ 2009-10-16  5:02 UTC (permalink / raw)
  To: Alasdair Kergon; +Cc: device-mapper development

This patch changes the counter for the number of in-flight I/Os
from q->in_flight to md->pending.  No functional change.

This patch is a preparation for PATCH 9.
Request-based dm used q->in_flight to count the number of in-flight
clones, assuming the counter is always incremented for an in-flight
original request and the original:clone relationship is 1:1.
However, that is no longer true for barrier requests, so use
md->pending to count the number of in-flight clones instead.
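
For reference, the two sides of the new accounting pair up as below.
This is condensed from the hunks in this patch, not a literal copy:
the clone is counted when it leaves the queue and uncounted when it
completes or is requeued.

  /* dispatch side, inside dm_request_fn() under q->queue_lock */
  blk_start_request(rq);
  clone = rq->special;
  atomic_inc(&md->pending[rq_data_dir(clone)]);

  /* completion side, in rq_completed() */
  atomic_dec(&md->pending[rw]);
  if (!md_in_flight(md))
          wake_up(&md->wait);  /* nudge anyone waiting in dm_suspend() */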

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm.c |   45 +++++++++++++++++----------------------------
 1 file changed, 17 insertions(+), 28 deletions(-)

Index: 2.6.32-rc4/drivers/md/dm.c
===================================================================
--- 2.6.32-rc4.orig/drivers/md/dm.c
+++ 2.6.32-rc4/drivers/md/dm.c
@@ -725,23 +725,16 @@ static void end_clone_bio(struct bio *cl
  * the md may be freed in dm_put() at the end of this function.
  * Or do dm_get() before calling this function and dm_put() later.
  */
-static void rq_completed(struct mapped_device *md, int run_queue)
+static void rq_completed(struct mapped_device *md, int rw, int run_queue)
 {
-	int wakeup_waiters = 0;
-	struct request_queue *q = md->queue;
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	if (!queue_in_flight(q))
-		wakeup_waiters = 1;
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	atomic_dec(&md->pending[rw]);
 
 	/* nudge anyone waiting on suspend queue */
-	if (wakeup_waiters)
+	if (!md_in_flight(md))
 		wake_up(&md->wait);
 
 	if (run_queue)
-		blk_run_queue(q);
+		blk_run_queue(md->queue);
 
 	/*
 	 * dm_put() must be at the end of this function. See the comment above
@@ -772,6 +765,7 @@ static void dm_unprep_request(struct req
  */
 void dm_requeue_unmapped_request(struct request *clone)
 {
+	int rw = rq_data_dir(clone);
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct mapped_device *md = tio->md;
 	struct request *rq = tio->orig;
@@ -786,7 +780,7 @@ void dm_requeue_unmapped_request(struct 
 	blk_requeue_request(q, rq);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
-	rq_completed(md, 0);
+	rq_completed(md, rw, 0);
 }
 EXPORT_SYMBOL_GPL(dm_requeue_unmapped_request);
 
@@ -825,6 +819,7 @@ static void start_queue(struct request_q
  */
 static void dm_end_request(struct request *clone, int error)
 {
+	int rw = rq_data_dir(clone);
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct mapped_device *md = tio->md;
 	struct request *rq = tio->orig;
@@ -846,7 +841,7 @@ static void dm_end_request(struct reques
 
 	blk_end_request_all(rq, error);
 
-	rq_completed(md, 1);
+	rq_completed(md, rw, 1);
 }
 
 /*
@@ -1541,12 +1536,12 @@ static void dm_request_fn(struct request
 	struct mapped_device *md = q->queuedata;
 	struct dm_table *map = dm_get_table(md);
 	struct dm_target *ti;
-	struct request *rq;
+	struct request *rq, *clone;
 
 	/*
-	 * For suspend, check blk_queue_stopped() not to increment
-	 * the number of in-flight I/Os after the queue is stopped
-	 * in dm_suspend().
+	 * For suspend, check blk_queue_stopped() and increment ->pending
+	 * within a single queue_lock not to increment the number of
+	 * in-flight I/Os after the queue is stopped in dm_suspend().
 	 */
 	while (!blk_queue_plugged(q) && !blk_queue_stopped(q)) {
 		rq = blk_peek_request(q);
@@ -1558,8 +1553,11 @@ static void dm_request_fn(struct request
 			goto plug_and_out;
 
 		blk_start_request(rq);
+		clone = rq->special;
+		atomic_inc(&md->pending[rq_data_dir(clone)]);
+
 		spin_unlock(q->queue_lock);
-		map_request(ti, rq->special, md);
+		map_request(ti, clone, md);
 		spin_lock_irq(q->queue_lock);
 	}
 
@@ -2071,8 +2069,6 @@ static int dm_wait_for_completion(struct
 {
 	int r = 0;
 	DECLARE_WAITQUEUE(wait, current);
-	struct request_queue *q = md->queue;
-	unsigned long flags;
 
 	dm_unplug_all(md->queue);
 
@@ -2082,14 +2078,7 @@ static int dm_wait_for_completion(struct
 		set_current_state(interruptible);
 
 		smp_mb();
-		if (dm_request_based(md)) {
-			spin_lock_irqsave(q->queue_lock, flags);
-			if (!queue_in_flight(q)) {
-				spin_unlock_irqrestore(q->queue_lock, flags);
-				break;
-			}
-			spin_unlock_irqrestore(q->queue_lock, flags);
-		} else if (!md_in_flight(md))
+		if (!md_in_flight(md))
 			break;
 
 		if (interruptible == TASK_INTERRUPTIBLE &&

* [PATCH 7/9] rqdm core: refactor completion functions
  2009-10-16  4:52 [PATCH 0/9] barrier support for request-based dm Kiyoshi Ueda
                   ` (5 preceding siblings ...)
  2009-10-16  5:02 ` [PATCH 6/9] rqdm core: use md->pending for in_flight I/O counting Kiyoshi Ueda
@ 2009-10-16  5:03 ` Kiyoshi Ueda
  2009-10-16  5:05 ` [PATCH 8/9] rqdm core: move dm_end_request() Kiyoshi Ueda
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Kiyoshi Ueda @ 2009-10-16  5:03 UTC (permalink / raw)
  To: Alasdair Kergon; +Cc: device-mapper development

This patch factors out the clone completion code, dm_done(),
from dm_softirq_done().  No functional change.

This patch is a preparation for PATCH 9.
dm_done() will be used in barrier completion, which can't use and
doesn't need softirq.
The softirq_done callback needs to get the clone from a given original
request, but it can't in the barrier case, where one original request
is shared by multiple clones.
On the other hand, the completion of barrier clones doesn't involve
re-submitting requests, which was the primary reason for needing
softirq in the first place.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm.c |   37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

Index: 2.6.32-rc4/drivers/md/dm.c
===================================================================
--- 2.6.32-rc4.orig/drivers/md/dm.c
+++ 2.6.32-rc4/drivers/md/dm.c
@@ -844,35 +844,46 @@ static void dm_end_request(struct reques
 	rq_completed(md, rw, 1);
 }
 
-/*
- * Request completion handler for request-based dm
- */
-static void dm_softirq_done(struct request *rq)
+static void dm_done(struct request *clone, int error, bool mapped)
 {
-	struct request *clone = rq->completion_data;
+	int r = error;
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	dm_request_endio_fn rq_end_io = tio->ti->type->rq_end_io;
-	int error = tio->error;
 
-	if (!(rq->cmd_flags & REQ_FAILED) && rq_end_io)
-		error = rq_end_io(tio->ti, clone, error, &tio->info);
+	if (mapped && rq_end_io)
+		r = rq_end_io(tio->ti, clone, error, &tio->info);
 
-	if (error <= 0)
+	if (r <= 0)
 		/* The target wants to complete the I/O */
-		dm_end_request(clone, error);
-	else if (error == DM_ENDIO_INCOMPLETE)
+		dm_end_request(clone, r);
+	else if (r == DM_ENDIO_INCOMPLETE)
 		/* The target will handle the I/O */
 		return;
-	else if (error == DM_ENDIO_REQUEUE)
+	else if (r == DM_ENDIO_REQUEUE)
 		/* The target wants to requeue the I/O */
 		dm_requeue_unmapped_request(clone);
 	else {
-		DMWARN("unimplemented target endio return value: %d", error);
+		DMWARN("unimplemented target endio return value: %d", r);
 		BUG();
 	}
 }
 
 /*
+ * Request completion handler for request-based dm
+ */
+static void dm_softirq_done(struct request *rq)
+{
+	bool mapped = true;
+	struct request *clone = rq->completion_data;
+	struct dm_rq_target_io *tio = clone->end_io_data;
+
+	if (rq->cmd_flags & REQ_FAILED)
+		mapped = false;
+
+	dm_done(clone, tio->error, mapped);
+}
+
+/*
  * Complete the clone and the original request with the error status
  * through softirq context.
  */

* [PATCH 8/9] rqdm core: move dm_end_request()
  2009-10-16  4:52 [PATCH 0/9] barrier support for request-based dm Kiyoshi Ueda
                   ` (6 preceding siblings ...)
  2009-10-16  5:03 ` [PATCH 7/9] rqdm core: refactor completion functions Kiyoshi Ueda
@ 2009-10-16  5:05 ` Kiyoshi Ueda
  2009-10-16  5:06 ` [PATCH 9/9] rqdm core: add barrier support Kiyoshi Ueda
  2009-10-16  5:17 ` Kiyoshi Ueda
  9 siblings, 0 replies; 15+ messages in thread
From: Kiyoshi Ueda @ 2009-10-16  5:05 UTC (permalink / raw)
  To: Alasdair Kergon; +Cc: device-mapper development

This patch moves dm_end_request().  No functional change.

This patch is a preparation for PATCH 9.
In PATCH 9, dm_requeue_unmapped_request() calls dm_end_request().
So this patch moves it before dm_requeue_unmapped_request() to reduce
the patch size of PATCH 9.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm.c |   62 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

Index: 2.6.32-rc4/drivers/md/dm.c
===================================================================
--- 2.6.32-rc4.orig/drivers/md/dm.c
+++ 2.6.32-rc4/drivers/md/dm.c
@@ -750,6 +750,37 @@ static void free_rq_clone(struct request
 	free_rq_tio(tio);
 }
 
+/*
+ * Complete the clone and the original request.
+ * Must be called without queue lock.
+ */
+static void dm_end_request(struct request *clone, int error)
+{
+	int rw = rq_data_dir(clone);
+	struct dm_rq_target_io *tio = clone->end_io_data;
+	struct mapped_device *md = tio->md;
+	struct request *rq = tio->orig;
+
+	if (blk_pc_request(rq)) {
+		rq->errors = clone->errors;
+		rq->resid_len = clone->resid_len;
+
+		if (rq->sense)
+			/*
+			 * We are using the sense buffer of the original
+			 * request.
+			 * So setting the length of the sense data is enough.
+			 */
+			rq->sense_len = clone->sense_len;
+	}
+
+	free_rq_clone(clone);
+
+	blk_end_request_all(rq, error);
+
+	rq_completed(md, rw, 1);
+}
+
 static void dm_unprep_request(struct request *rq)
 {
 	struct request *clone = rq->special;
@@ -813,37 +844,6 @@ static void start_queue(struct request_q
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 
-/*
- * Complete the clone and the original request.
- * Must be called without queue lock.
- */
-static void dm_end_request(struct request *clone, int error)
-{
-	int rw = rq_data_dir(clone);
-	struct dm_rq_target_io *tio = clone->end_io_data;
-	struct mapped_device *md = tio->md;
-	struct request *rq = tio->orig;
-
-	if (blk_pc_request(rq)) {
-		rq->errors = clone->errors;
-		rq->resid_len = clone->resid_len;
-
-		if (rq->sense)
-			/*
-			 * We are using the sense buffer of the original
-			 * request.
-			 * So setting the length of the sense data is enough.
-			 */
-			rq->sense_len = clone->sense_len;
-	}
-
-	free_rq_clone(clone);
-
-	blk_end_request_all(rq, error);
-
-	rq_completed(md, rw, 1);
-}
-
 static void dm_done(struct request *clone, int error, bool mapped)
 {
 	int r = error;

* [PATCH 9/9] rqdm core: add barrier support
  2009-10-16  4:52 [PATCH 0/9] barrier support for request-based dm Kiyoshi Ueda
                   ` (7 preceding siblings ...)
  2009-10-16  5:05 ` [PATCH 8/9] rqdm core: move dm_end_request() Kiyoshi Ueda
@ 2009-10-16  5:06 ` Kiyoshi Ueda
  2009-10-16  5:17 ` Kiyoshi Ueda
  9 siblings, 0 replies; 15+ messages in thread
From: Kiyoshi Ueda @ 2009-10-16  5:06 UTC (permalink / raw)
  To: Alasdair Kergon; +Cc: device-mapper development

This patch adds barrier support for request-based dm.

CORE DESIGN
===========
The design is basically the same as bio-based dm, which emulates a
barrier by mapping empty barrier bios before/after a barrier I/O.
But request-based dm uses struct request_queue for I/O queueing,
so the block-layer's barrier mechanism can be used instead.

o Summary of the block-layer's behavior (which dm-core depends on)
  Request-based dm uses the QUEUE_ORDERED_DRAIN_FLUSH ordered mode for
  I/O barriers.  It means that when an I/O requiring a barrier is found
  in the request_queue, the block-layer issues a pre-flush request and
  a post-flush request just before and just after that I/O, respectively.

  After the ordered sequence starts, the block-layer waits for all
  in-flight I/Os to complete, then gives drivers the pre-flush request,
  the barrier I/O and the post-flush request one by one.
  It means that the request_queue is stopped automatically by the
  block-layer until drivers complete each step of the sequence.

o dm-core
  For the barrier I/O itself, dm-core treats it as a normal I/O, so
  no additional code is needed.

  For the pre/post-flush requests, dm-core flushes caches as follows:
    1. Make as many empty barrier clones as required by each target's
       num_flush_requests, and map them (dm_rq_barrier()).
    2. Wait for the mapped barriers to complete (dm_rq_barrier()).
       If an error has occurred, save the error value in
       md->barrier_error (dm_end_request()).
       (*) Basically, the first reported error is taken, but
           -EOPNOTSUPP supersedes any error, and DM_ENDIO_REQUEUE
           comes next in precedence.
    3. Requeue the pre/post-flush request if the error value is
       DM_ENDIO_REQUEUE.  Otherwise, complete it with the error value
       (dm_rq_barrier_work()).
  The pre/post-flush work above is done in the kernel thread (kdmflush)
  context, since memory allocation that might sleep is needed in
  dm_rq_barrier(), but sleeping is not allowed in dm_request_fn(),
  which is an irq-disabled context.
  Also, clones of the pre/post-flush request share one original, so
  such clones can't be completed in the softirq context.
  Instead, complete them in the context of the underlying device
  drivers.  This should be safe since there is no I/O dispatching
  during the completion of such clones.

  For suspend, the workqueue of kdmflush needs to be flushed after
  the request_queue has been stopped.  Otherwise, the next flush work
  can be kicked even after the suspend completes.

TARGET INTERFACE
================
No new interface is added.
Just use the existing num_flush_requests in struct target_type,
the same as bio-based dm does.
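
For illustration, a hypothetical request-based target could use this
interface as sketched below.  The target name, the DMDEBUG message and
the routing comment are made up; num_flush_requests, the ctr/map_rq
hooks and map_info->flush_request (filled in by dm_rq_set_flush_nr()
in this patch) are the existing interfaces being relied on.

  #include <linux/device-mapper.h>
  #include <linux/blkdev.h>

  /* Hypothetical target-side sketch, not part of this patch. */
  static int example_ctr(struct dm_target *ti, unsigned argc, char **argv)
  {
          /* Ask dm-core for two empty barrier clones per pre/post-flush. */
          ti->num_flush_requests = 2;
          return 0;
  }

  static int example_map_rq(struct dm_target *ti, struct request *clone,
                            union map_info *map_context)
  {
          if (blk_barrier_rq(clone)) {
                  /* 0 .. num_flush_requests - 1, set by dm_rq_set_flush_nr() */
                  unsigned flush_nr = map_context->flush_request;

                  DMDEBUG("flush clone %u of %u",
                          flush_nr, ti->num_flush_requests);
                  /*
                   * Route this empty barrier clone to the flush_nr-th
                   * underlying device here.
                   */
          }

          /* ... normal remapping of the clone elided ... */
          return DM_MAPIO_REMAPPED;
  }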

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm.c |  214 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 196 insertions(+), 18 deletions(-)

Index: 2.6.32-rc4/drivers/md/dm.c
===================================================================
--- 2.6.32-rc4.orig/drivers/md/dm.c
+++ 2.6.32-rc4/drivers/md/dm.c
@@ -143,9 +143,19 @@ struct mapped_device {
 	int barrier_error;
 
 	/*
+	 * Protect barrier_error from concurrent endio processing
+	 * in request-based dm.
+	 */
+	spinlock_t barrier_error_lock;
+
+	/*
 	 * Processing queue (flush/barriers)
 	 */
 	struct workqueue_struct *wq;
+	struct work_struct barrier_work;
+
+	/* A pointer to the currently processing pre/post flush request */
+	struct request *flush_request;
 
 	/*
 	 * The current mapping.
@@ -720,6 +730,23 @@ static void end_clone_bio(struct bio *cl
 	blk_update_request(tio->orig, 0, nr_bytes);
 }
 
+static void store_barrier_error(struct mapped_device *md, int error)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&md->barrier_error_lock, flags);
+	/*
+	 * Basically, the first error is taken, but:
+	 *   -EOPNOTSUPP supersedes any I/O error.
+	 *   Requeue request supersedes any I/O error but -EOPNOTSUPP.
+	 */
+	if (!md->barrier_error || error == -EOPNOTSUPP ||
+	    (md->barrier_error != -EOPNOTSUPP &&
+	     error == DM_ENDIO_REQUEUE))
+		md->barrier_error = error;
+	spin_unlock_irqrestore(&md->barrier_error_lock, flags);
+}
+
 /*
  * Don't touch any member of the md after calling this function because
  * the md may be freed in dm_put() at the end of this function.
@@ -757,11 +784,13 @@ static void free_rq_clone(struct request
 static void dm_end_request(struct request *clone, int error)
 {
 	int rw = rq_data_dir(clone);
+	int run_queue = 1;
+	bool is_barrier = blk_barrier_rq(clone);
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct mapped_device *md = tio->md;
 	struct request *rq = tio->orig;
 
-	if (blk_pc_request(rq)) {
+	if (blk_pc_request(rq) && !is_barrier) {
 		rq->errors = clone->errors;
 		rq->resid_len = clone->resid_len;
 
@@ -776,9 +805,14 @@ static void dm_end_request(struct reques
 
 	free_rq_clone(clone);
 
-	blk_end_request_all(rq, error);
+	if (unlikely(is_barrier)) {
+		if (unlikely(error))
+			store_barrier_error(md, error);
+		run_queue = 0;
+	} else
+		blk_end_request_all(rq, error);
 
-	rq_completed(md, rw, 1);
+	rq_completed(md, rw, run_queue);
 }
 
 static void dm_unprep_request(struct request *rq)
@@ -803,6 +837,16 @@ void dm_requeue_unmapped_request(struct 
 	struct request_queue *q = rq->q;
 	unsigned long flags;
 
+	if (unlikely(blk_barrier_rq(clone))) {
+		/*
+		 * Barrier clones share an original request.
+		 * Leave it to dm_end_request(), which handles this special
+		 * case.
+		 */
+		dm_end_request(clone, DM_ENDIO_REQUEUE);
+		return;
+	}
+
 	dm_unprep_request(rq);
 
 	spin_lock_irqsave(q->queue_lock, flags);
@@ -892,6 +936,19 @@ static void dm_complete_request(struct r
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct request *rq = tio->orig;
 
+	if (unlikely(blk_barrier_rq(clone))) {
+		/*
+		 * Barrier clones share an original request.  So can't use
+		 * softirq_done with the original.
+		 * Pass the clone to dm_done() directly in this special case.
+		 * It is safe (even if clone->q->queue_lock is held here)
+		 * because there is no I/O dispatching during the completion
+		 * of barrier clone.
+		 */
+		dm_done(clone, error, true);
+		return;
+	}
+
 	tio->error = error;
 	rq->completion_data = clone;
 	blk_complete_request(rq);
@@ -908,6 +965,17 @@ void dm_kill_unmapped_request(struct req
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct request *rq = tio->orig;
 
+	if (unlikely(blk_barrier_rq(clone))) {
+		/*
+		 * Barrier clones share an original request.
+		 * Leave it to dm_end_request(), which handles this special
+		 * case.
+		 */
+		BUG_ON(error > 0);
+		dm_end_request(clone, error);
+		return;
+	}
+
 	rq->cmd_flags |= REQ_FAILED;
 	dm_complete_request(clone, error);
 }
@@ -1362,11 +1430,6 @@ static int dm_make_request(struct reques
 {
 	struct mapped_device *md = q->queuedata;
 
-	if (unlikely(bio_rw_flagged(bio, BIO_RW_BARRIER))) {
-		bio_endio(bio, -EOPNOTSUPP);
-		return 0;
-	}
-
 	return md->saved_make_request_fn(q, bio); /* call __make_request() */
 }
 
@@ -1385,6 +1448,25 @@ static int dm_request(struct request_que
 	return _dm_request(q, bio);
 }
 
+/*
+ * Mark this request as flush request, so that dm_request_fn() can
+ * recognize.
+ */
+static void dm_rq_prepare_flush(struct request_queue *q, struct request *rq)
+{
+	rq->cmd_type = REQ_TYPE_LINUX_BLOCK;
+	rq->cmd[0] = REQ_LB_OP_FLUSH;
+}
+
+static bool dm_rq_is_flush_request(struct request *rq)
+{
+	if (rq->cmd_type == REQ_TYPE_LINUX_BLOCK &&
+	    rq->cmd[0] == REQ_LB_OP_FLUSH)
+		return true;
+	else
+		return false;
+}
+
 void dm_dispatch_request(struct request *rq)
 {
 	int r;
@@ -1430,16 +1512,24 @@ static int dm_rq_bio_constructor(struct 
 static int setup_clone(struct request *clone, struct request *rq,
 		       struct dm_rq_target_io *tio)
 {
-	int r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
-				  dm_rq_bio_constructor, tio);
+	int r;
 
-	if (r)
-		return r;
+	if (dm_rq_is_flush_request(rq)) {
+		blk_rq_init(NULL, clone);
+		clone->cmd_type = REQ_TYPE_FS;
+		clone->cmd_flags |= (REQ_HARDBARRIER | WRITE);
+	} else {
+		r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
+				      dm_rq_bio_constructor, tio);
+		if (r)
+			return r;
+
+		clone->cmd = rq->cmd;
+		clone->cmd_len = rq->cmd_len;
+		clone->sense = rq->sense;
+		clone->buffer = rq->buffer;
+	}
 
-	clone->cmd = rq->cmd;
-	clone->cmd_len = rq->cmd_len;
-	clone->sense = rq->sense;
-	clone->buffer = rq->buffer;
 	clone->end_io = end_clone_request;
 	clone->end_io_data = tio;
 
@@ -1480,6 +1570,9 @@ static int dm_prep_fn(struct request_que
 	struct mapped_device *md = q->queuedata;
 	struct request *clone;
 
+	if (unlikely(dm_rq_is_flush_request(rq)))
+		return BLKPREP_OK;
+
 	if (unlikely(rq->special)) {
 		DMWARN("Already has something in rq->special.");
 		return BLKPREP_KILL;
@@ -1559,6 +1652,14 @@ static void dm_request_fn(struct request
 		if (!rq)
 			goto plug_and_out;
 
+		if (unlikely(dm_rq_is_flush_request(rq))) {
+			BUG_ON(md->flush_request);
+			md->flush_request = rq;
+			blk_start_request(rq);
+			queue_work(md->wq, &md->barrier_work);
+			goto out;
+		}
+
 		ti = dm_table_find_target(map, blk_rq_pos(rq));
 		if (ti->type->busy && ti->type->busy(ti))
 			goto plug_and_out;
@@ -1725,6 +1826,7 @@ out:
 static const struct block_device_operations dm_blk_dops;
 
 static void dm_wq_work(struct work_struct *work);
+static void dm_rq_barrier_work(struct work_struct *work);
 
 /*
  * Allocate and initialise a blank device with a given minor.
@@ -1754,6 +1856,7 @@ static struct mapped_device *alloc_dev(i
 	init_rwsem(&md->io_lock);
 	mutex_init(&md->suspend_lock);
 	spin_lock_init(&md->deferred_lock);
+	spin_lock_init(&md->barrier_error_lock);
 	rwlock_init(&md->map_lock);
 	atomic_set(&md->holders, 1);
 	atomic_set(&md->open_count, 0);
@@ -1788,6 +1891,8 @@ static struct mapped_device *alloc_dev(i
 	blk_queue_softirq_done(md->queue, dm_softirq_done);
 	blk_queue_prep_rq(md->queue, dm_prep_fn);
 	blk_queue_lld_busy(md->queue, dm_lld_busy);
+	blk_queue_ordered(md->queue, QUEUE_ORDERED_DRAIN_FLUSH,
+			  dm_rq_prepare_flush);
 
 	md->disk = alloc_disk(1);
 	if (!md->disk)
@@ -1797,6 +1902,7 @@ static struct mapped_device *alloc_dev(i
 	atomic_set(&md->pending[1], 0);
 	init_waitqueue_head(&md->wait);
 	INIT_WORK(&md->work, dm_wq_work);
+	INIT_WORK(&md->barrier_work, dm_rq_barrier_work);
 	init_waitqueue_head(&md->eventq);
 
 	md->disk->major = _major;
@@ -2184,6 +2290,73 @@ static void dm_queue_flush(struct mapped
 	queue_work(md->wq, &md->work);
 }
 
+static void dm_rq_set_flush_nr(struct request *clone, unsigned flush_nr)
+{
+	struct dm_rq_target_io *tio = clone->end_io_data;
+
+	tio->info.flush_request = flush_nr;
+}
+
+/* Issue barrier requests to targets and wait for their completion. */
+static int dm_rq_barrier(struct mapped_device *md)
+{
+	int i, j;
+	struct dm_table *map = dm_get_table(md);
+	unsigned num_targets = dm_table_get_num_targets(map);
+	struct dm_target *ti;
+	struct request *clone;
+
+	md->barrier_error = 0;
+
+	for (i = 0; i < num_targets; i++) {
+		ti = dm_table_get_target(map, i);
+		for (j = 0; j < ti->num_flush_requests; j++) {
+			clone = clone_rq(md->flush_request, md, GFP_NOIO);
+			dm_rq_set_flush_nr(clone, j);
+			atomic_inc(&md->pending[rq_data_dir(clone)]);
+			map_request(ti, clone, md);
+		}
+	}
+
+	dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
+	dm_table_put(map);
+
+	return md->barrier_error;
+}
+
+static void dm_rq_barrier_work(struct work_struct *work)
+{
+	int error;
+	struct mapped_device *md = container_of(work, struct mapped_device,
+						barrier_work);
+	struct request_queue *q = md->queue;
+	struct request *rq;
+	unsigned long flags;
+
+	/*
+	 * Hold the md reference here and leave it at the last part so that
+	 * the md can't be deleted by device opener when the barrier request
+	 * completes.
+	 */
+	dm_get(md);
+
+	error = dm_rq_barrier(md);
+
+	rq = md->flush_request;
+	md->flush_request = NULL;
+
+	if (error == DM_ENDIO_REQUEUE) {
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_requeue_request(q, rq);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	} else
+		blk_end_request_all(rq, error);
+
+	blk_run_queue(q);
+
+	dm_put(md);
+}
+
 /*
  * Swap in a new table (destroying old one).
  */
@@ -2324,11 +2497,16 @@ int dm_suspend(struct mapped_device *md,
 	set_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags);
 	up_write(&md->io_lock);
 
-	flush_workqueue(md->wq);
-
+	/*
+	 * Request-based dm uses md->wq for barrier (dm_rq_barrier_work) which
+	 * can be kicked until md->queue is stopped.  So stop md->queue before
+	 * flushing md->wq.
+	 */
 	if (dm_request_based(md))
 		stop_queue(md->queue);
 
+	flush_workqueue(md->wq);
+
 	/*
 	 * At this point no more requests are entering target request routines.
 	 * We call dm_wait_for_completion to wait for all existing requests

* [PATCH 9/9] rqdm core: add barrier support
  2009-10-16  4:52 [PATCH 0/9] barrier support for request-based dm Kiyoshi Ueda
                   ` (8 preceding siblings ...)
  2009-10-16  5:06 ` [PATCH 9/9] rqdm core: add barrier support Kiyoshi Ueda
@ 2009-10-16  5:17 ` Kiyoshi Ueda
  9 siblings, 0 replies; 15+ messages in thread
From: Kiyoshi Ueda @ 2009-10-16  5:17 UTC (permalink / raw)
  To: Alasdair Kergon; +Cc: device-mapper development

(Resending since patchwork did not pick up the original posting correctly.)

This patch adds barrier support for request-based dm.


CORE DESIGN

The design is basically the same as bio-based dm, which emulates a
barrier by mapping empty barrier bios before/after a barrier I/O.
But request-based dm uses struct request_queue for I/O queueing,
so the block-layer's barrier mechanism can be used instead.

o Summary of the block-layer's behavior (which dm-core depends on)
  Request-based dm uses the QUEUE_ORDERED_DRAIN_FLUSH ordered mode for
  I/O barriers.  It means that when an I/O requiring a barrier is found
  in the request_queue, the block-layer issues a pre-flush request and
  a post-flush request just before and just after that I/O, respectively.

  After the ordered sequence starts, the block-layer waits for all
  in-flight I/Os to complete, then gives drivers the pre-flush request,
  the barrier I/O and the post-flush request one by one.
  It means that the request_queue is stopped automatically by the
  block-layer until drivers complete each step of the sequence.

o dm-core
  For the barrier I/O itself, dm-core treats it as a normal I/O, so
  no additional code is needed.

  For the pre/post-flush requests, dm-core flushes caches as follows:
    1. Make as many empty barrier clones as required by each target's
       num_flush_requests, and map them (dm_rq_barrier()).
    2. Wait for the mapped barriers to complete (dm_rq_barrier()).
       If an error has occurred, save the error value in
       md->barrier_error (dm_end_request()).
       (*) Basically, the first reported error is taken, but
           -EOPNOTSUPP supersedes any error, and DM_ENDIO_REQUEUE
           comes next in precedence.
    3. Requeue the pre/post-flush request if the error value is
       DM_ENDIO_REQUEUE.  Otherwise, complete it with the error value
       (dm_rq_barrier_work()).
  The pre/post-flush work above is done in the kernel thread (kdmflush)
  context, since memory allocation that might sleep is needed in
  dm_rq_barrier(), but sleeping is not allowed in dm_request_fn(),
  which is an irq-disabled context.
  Also, clones of the pre/post-flush request share one original, so
  such clones can't be completed in the softirq context.
  Instead, complete them in the context of the underlying device
  drivers.  This should be safe since there is no I/O dispatching
  during the completion of such clones.

  For suspend, the workqueue of kdmflush needs to be flushed after
  the request_queue has been stopped.  Otherwise, the next flush work
  can be kicked even after the suspend completes.


TARGET INTERFACE

No new interface is added.
Just use the existing num_flush_requests in struct target_type,
the same as bio-based dm does.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm.c |  214 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 196 insertions(+), 18 deletions(-)

Index: 2.6.32-rc4/drivers/md/dm.c
===================================================================
--- 2.6.32-rc4.orig/drivers/md/dm.c
+++ 2.6.32-rc4/drivers/md/dm.c
@@ -143,9 +143,19 @@ struct mapped_device {
 	int barrier_error;
 
 	/*
+	 * Protect barrier_error from concurrent endio processing
+	 * in request-based dm.
+	 */
+	spinlock_t barrier_error_lock;
+
+	/*
 	 * Processing queue (flush/barriers)
 	 */
 	struct workqueue_struct *wq;
+	struct work_struct barrier_work;
+
+	/* A pointer to the currently processing pre/post flush request */
+	struct request *flush_request;
 
 	/*
 	 * The current mapping.
@@ -720,6 +730,23 @@ static void end_clone_bio(struct bio *cl
 	blk_update_request(tio->orig, 0, nr_bytes);
 }
 
+static void store_barrier_error(struct mapped_device *md, int error)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&md->barrier_error_lock, flags);
+	/*
+	 * Basically, the first error is taken, but:
+	 *   -EOPNOTSUPP supersedes any I/O error.
+	 *   Requeue request supersedes any I/O error but -EOPNOTSUPP.
+	 */
+	if (!md->barrier_error || error == -EOPNOTSUPP ||
+	    (md->barrier_error != -EOPNOTSUPP &&
+	     error == DM_ENDIO_REQUEUE))
+		md->barrier_error = error;
+	spin_unlock_irqrestore(&md->barrier_error_lock, flags);
+}
+
 /*
  * Don't touch any member of the md after calling this function because
  * the md may be freed in dm_put() at the end of this function.
@@ -757,11 +784,13 @@ static void free_rq_clone(struct request
 static void dm_end_request(struct request *clone, int error)
 {
 	int rw = rq_data_dir(clone);
+	int run_queue = 1;
+	bool is_barrier = blk_barrier_rq(clone);
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct mapped_device *md = tio->md;
 	struct request *rq = tio->orig;
 
-	if (blk_pc_request(rq)) {
+	if (blk_pc_request(rq) && !is_barrier) {
 		rq->errors = clone->errors;
 		rq->resid_len = clone->resid_len;
 
@@ -776,9 +805,14 @@ static void dm_end_request(struct reques
 
 	free_rq_clone(clone);
 
-	blk_end_request_all(rq, error);
+	if (unlikely(is_barrier)) {
+		if (unlikely(error))
+			store_barrier_error(md, error);
+		run_queue = 0;
+	} else
+		blk_end_request_all(rq, error);
 
-	rq_completed(md, rw, 1);
+	rq_completed(md, rw, run_queue);
 }
 
 static void dm_unprep_request(struct request *rq)
@@ -803,6 +837,16 @@ void dm_requeue_unmapped_request(struct 
 	struct request_queue *q = rq->q;
 	unsigned long flags;
 
+	if (unlikely(blk_barrier_rq(clone))) {
+		/*
+		 * Barrier clones share an original request.
+		 * Leave it to dm_end_request(), which handles this special
+		 * case.
+		 */
+		dm_end_request(clone, DM_ENDIO_REQUEUE);
+		return;
+	}
+
 	dm_unprep_request(rq);
 
 	spin_lock_irqsave(q->queue_lock, flags);
@@ -892,6 +936,19 @@ static void dm_complete_request(struct r
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct request *rq = tio->orig;
 
+	if (unlikely(blk_barrier_rq(clone))) {
+		/*
+		 * Barrier clones share an original request.  So can't use
+		 * softirq_done with the original.
+		 * Pass the clone to dm_done() directly in this special case.
+		 * It is safe (even if clone->q->queue_lock is held here)
+		 * because there is no I/O dispatching during the completion
+		 * of barrier clone.
+		 */
+		dm_done(clone, error, true);
+		return;
+	}
+
 	tio->error = error;
 	rq->completion_data = clone;
 	blk_complete_request(rq);
@@ -908,6 +965,17 @@ void dm_kill_unmapped_request(struct req
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct request *rq = tio->orig;
 
+	if (unlikely(blk_barrier_rq(clone))) {
+		/*
+		 * Barrier clones share an original request.
+		 * Leave it to dm_end_request(), which handles this special
+		 * case.
+		 */
+		BUG_ON(error > 0);
+		dm_end_request(clone, error);
+		return;
+	}
+
 	rq->cmd_flags |= REQ_FAILED;
 	dm_complete_request(clone, error);
 }
@@ -1362,11 +1430,6 @@ static int dm_make_request(struct reques
 {
 	struct mapped_device *md = q->queuedata;
 
-	if (unlikely(bio_rw_flagged(bio, BIO_RW_BARRIER))) {
-		bio_endio(bio, -EOPNOTSUPP);
-		return 0;
-	}
-
 	return md->saved_make_request_fn(q, bio); /* call __make_request() */
 }
 
@@ -1385,6 +1448,25 @@ static int dm_request(struct request_que
 	return _dm_request(q, bio);
 }
 
+/*
+ * Mark this request as flush request, so that dm_request_fn() can
+ * recognize.
+ */
+static void dm_rq_prepare_flush(struct request_queue *q, struct request *rq)
+{
+	rq->cmd_type = REQ_TYPE_LINUX_BLOCK;
+	rq->cmd[0] = REQ_LB_OP_FLUSH;
+}
+
+static bool dm_rq_is_flush_request(struct request *rq)
+{
+	if (rq->cmd_type == REQ_TYPE_LINUX_BLOCK &&
+	    rq->cmd[0] == REQ_LB_OP_FLUSH)
+		return true;
+	else
+		return false;
+}
+
 void dm_dispatch_request(struct request *rq)
 {
 	int r;
@@ -1430,16 +1512,24 @@ static int dm_rq_bio_constructor(struct 
 static int setup_clone(struct request *clone, struct request *rq,
 		       struct dm_rq_target_io *tio)
 {
-	int r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
-				  dm_rq_bio_constructor, tio);
+	int r;
 
-	if (r)
-		return r;
+	if (dm_rq_is_flush_request(rq)) {
+		blk_rq_init(NULL, clone);
+		clone->cmd_type = REQ_TYPE_FS;
+		clone->cmd_flags |= (REQ_HARDBARRIER | WRITE);
+	} else {
+		r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
+				      dm_rq_bio_constructor, tio);
+		if (r)
+			return r;
+
+		clone->cmd = rq->cmd;
+		clone->cmd_len = rq->cmd_len;
+		clone->sense = rq->sense;
+		clone->buffer = rq->buffer;
+	}
 
-	clone->cmd = rq->cmd;
-	clone->cmd_len = rq->cmd_len;
-	clone->sense = rq->sense;
-	clone->buffer = rq->buffer;
 	clone->end_io = end_clone_request;
 	clone->end_io_data = tio;
 
@@ -1480,6 +1570,9 @@ static int dm_prep_fn(struct request_que
 	struct mapped_device *md = q->queuedata;
 	struct request *clone;
 
+	if (unlikely(dm_rq_is_flush_request(rq)))
+		return BLKPREP_OK;
+
 	if (unlikely(rq->special)) {
 		DMWARN("Already has something in rq->special.");
 		return BLKPREP_KILL;
@@ -1559,6 +1652,14 @@ static void dm_request_fn(struct request
 		if (!rq)
 			goto plug_and_out;
 
+		if (unlikely(dm_rq_is_flush_request(rq))) {
+			BUG_ON(md->flush_request);
+			md->flush_request = rq;
+			blk_start_request(rq);
+			queue_work(md->wq, &md->barrier_work);
+			goto out;
+		}
+
 		ti = dm_table_find_target(map, blk_rq_pos(rq));
 		if (ti->type->busy && ti->type->busy(ti))
 			goto plug_and_out;
@@ -1725,6 +1826,7 @@ out:
 static const struct block_device_operations dm_blk_dops;
 
 static void dm_wq_work(struct work_struct *work);
+static void dm_rq_barrier_work(struct work_struct *work);
 
 /*
  * Allocate and initialise a blank device with a given minor.
@@ -1754,6 +1856,7 @@ static struct mapped_device *alloc_dev(i
 	init_rwsem(&md->io_lock);
 	mutex_init(&md->suspend_lock);
 	spin_lock_init(&md->deferred_lock);
+	spin_lock_init(&md->barrier_error_lock);
 	rwlock_init(&md->map_lock);
 	atomic_set(&md->holders, 1);
 	atomic_set(&md->open_count, 0);
@@ -1788,6 +1891,8 @@ static struct mapped_device *alloc_dev(i
 	blk_queue_softirq_done(md->queue, dm_softirq_done);
 	blk_queue_prep_rq(md->queue, dm_prep_fn);
 	blk_queue_lld_busy(md->queue, dm_lld_busy);
+	blk_queue_ordered(md->queue, QUEUE_ORDERED_DRAIN_FLUSH,
+			  dm_rq_prepare_flush);
 
 	md->disk = alloc_disk(1);
 	if (!md->disk)
@@ -1797,6 +1902,7 @@ static struct mapped_device *alloc_dev(i
 	atomic_set(&md->pending[1], 0);
 	init_waitqueue_head(&md->wait);
 	INIT_WORK(&md->work, dm_wq_work);
+	INIT_WORK(&md->barrier_work, dm_rq_barrier_work);
 	init_waitqueue_head(&md->eventq);
 
 	md->disk->major = _major;
@@ -2184,6 +2290,73 @@ static void dm_queue_flush(struct mapped
 	queue_work(md->wq, &md->work);
 }
 
+static void dm_rq_set_flush_nr(struct request *clone, unsigned flush_nr)
+{
+	struct dm_rq_target_io *tio = clone->end_io_data;
+
+	tio->info.flush_request = flush_nr;
+}
+
+/* Issue barrier requests to targets and wait for their completion. */
+static int dm_rq_barrier(struct mapped_device *md)
+{
+	int i, j;
+	struct dm_table *map = dm_get_table(md);
+	unsigned num_targets = dm_table_get_num_targets(map);
+	struct dm_target *ti;
+	struct request *clone;
+
+	md->barrier_error = 0;
+
+	for (i = 0; i < num_targets; i++) {
+		ti = dm_table_get_target(map, i);
+		for (j = 0; j < ti->num_flush_requests; j++) {
+			clone = clone_rq(md->flush_request, md, GFP_NOIO);
+			dm_rq_set_flush_nr(clone, j);
+			atomic_inc(&md->pending[rq_data_dir(clone)]);
+			map_request(ti, clone, md);
+		}
+	}
+
+	dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
+	dm_table_put(map);
+
+	return md->barrier_error;
+}
+
+static void dm_rq_barrier_work(struct work_struct *work)
+{
+	int error;
+	struct mapped_device *md = container_of(work, struct mapped_device,
+						barrier_work);
+	struct request_queue *q = md->queue;
+	struct request *rq;
+	unsigned long flags;
+
+	/*
+	 * Hold the md reference here and release it at the end of this
+	 * function so that the md can't be deleted by a device opener
+	 * when the barrier request completes.
+	 */
+	dm_get(md);
+
+	error = dm_rq_barrier(md);
+
+	rq = md->flush_request;
+	md->flush_request = NULL;
+
+	if (error == DM_ENDIO_REQUEUE) {
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_requeue_request(q, rq);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	} else
+		blk_end_request_all(rq, error);
+
+	blk_run_queue(q);
+
+	dm_put(md);
+}
+
 /*
  * Swap in a new table (destroying old one).
  */
@@ -2324,11 +2497,16 @@ int dm_suspend(struct mapped_device *md,
 	set_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags);
 	up_write(&md->io_lock);
 
-	flush_workqueue(md->wq);
-
+	/*
+	 * Request-based dm uses md->wq for barrier (dm_rq_barrier_work) which
+	 * can be kicked until md->queue is stopped.  So stop md->queue before
+	 * flushing md->wq.
+	 */
 	if (dm_request_based(md))
 		stop_queue(md->queue);
 
+	flush_workqueue(md->wq);
+
 	/*
 	 * At this point no more requests are entering target request routines.
 	 * We call dm_wait_for_completion to wait for all existing requests

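For reference, the block-layer mechanism the above relies on can be seen in
isolation in the following minimal sketch.  It is not part of the patch; the
"sketchdrv" names are made up, and it assumes the 2.6.32-era
blk_queue_ordered()/blk_fetch_request() API.  A request-based driver registers
a prepare_flush callback, the block layer inserts an empty barrier request and
calls that callback, and the driver's request_fn recognizes the mark - the same
scheme dm_rq_prepare_flush() and dm_rq_is_flush_request() implement above:

/*
 * Minimal sketch of the ordered-flush hooks used above.  All "sketchdrv"
 * names are made up; the block-layer calls are the 2.6.32-era API.
 */
#include <linux/blkdev.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(sketchdrv_lock);

/* Called by the block layer for the empty barrier it inserts. */
static void sketchdrv_prepare_flush(struct request_queue *q, struct request *rq)
{
	/* Same marking scheme as dm_rq_prepare_flush() above. */
	rq->cmd_type = REQ_TYPE_LINUX_BLOCK;
	rq->cmd[0] = REQ_LB_OP_FLUSH;
}

static void sketchdrv_request_fn(struct request_queue *q)
{
	struct request *rq;

	while ((rq = blk_fetch_request(q)) != NULL) {
		if (rq->cmd_type == REQ_TYPE_LINUX_BLOCK &&
		    rq->cmd[0] == REQ_LB_OP_FLUSH) {
			/* Flush the device's write-back cache here. */
			__blk_end_request_all(rq, 0);
			continue;
		}
		/* Normal I/O would be processed here; just complete it. */
		__blk_end_request_all(rq, 0);
	}
}

static struct request_queue *sketchdrv_init_queue(void)
{
	struct request_queue *q = blk_init_queue(sketchdrv_request_fn,
						 &sketchdrv_lock);

	if (q)
		blk_queue_ordered(q, QUEUE_ORDERED_DRAIN_FLUSH,
				  sketchdrv_prepare_flush);
	return q;
}
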
^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/9] dm core: clean up in-flight checking
  2009-10-16  4:55 ` [PATCH 1/9] dm core: clean up in-flight checking Kiyoshi Ueda
@ 2009-10-19 21:16   ` Alasdair G Kergon
  2009-10-20  7:47     ` Kiyoshi Ueda
  0 siblings, 1 reply; 15+ messages in thread
From: Alasdair G Kergon @ 2009-10-19 21:16 UTC (permalink / raw)
  To: Kiyoshi Ueda; +Cc: device-mapper development

On Fri, Oct 16, 2009 at 01:55:29PM +0900, Kiyoshi Ueda wrote:
> This patch depends on:
>   http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=commit;h=a9327cac440be4d8333bba975cbbf76045096275
>   (Seperate read and write statistics of in_flight requests)
 
Already upstream I think!

Alasdair

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/9] dm core: clean up in-flight checking
  2009-10-19 21:16   ` Alasdair G Kergon
@ 2009-10-20  7:47     ` Kiyoshi Ueda
  0 siblings, 0 replies; 15+ messages in thread
From: Kiyoshi Ueda @ 2009-10-20  7:47 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: device-mapper development

Hi Alasdair,

On 2009/10/20 6:16 +0900, Alasdair G Kergon wrote:
> On Fri, Oct 16, 2009 at 01:55:29PM +0900, Kiyoshi Ueda wrote:
>> This patch depends on:
>>   http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=commit;h=a9327cac440be4d8333bba975cbbf76045096275
>>   (Seperate read and write statistics of in_flight requests)
>  
> Already upstream I think!

It was merged in 2.6.32-rc5, so this clean-up patch can now be
applied on top of it.

I'll update my patch-set (probably on top of 2.6.32-rc6) and re-post
it if you want.

> Alasdair

Thanks,
Kiyoshi Ueda

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/9] rqdm core: simplify suspend code
  2009-10-16  5:01 ` [PATCH 5/9] rqdm core: simplify suspend code Kiyoshi Ueda
@ 2009-10-28 16:21   ` Alasdair G Kergon
  2009-10-28 17:39   ` Alasdair G Kergon
  1 sibling, 0 replies; 15+ messages in thread
From: Alasdair G Kergon @ 2009-10-28 16:21 UTC (permalink / raw)
  To: Kiyoshi Ueda; +Cc: device-mapper development

On Fri, Oct 16, 2009 at 02:01:10PM +0900, Kiyoshi Ueda wrote:
> In case of suspend with "--nolockfs" but without "--noflush",
> the semantics for bio-based dm has been changed in 2.6.30 as below:
>   before 2.6.30:   I/Os submitted before the suspend invocation
>                    are flushed
>   2.6.30 or later: I/Os submitted even before the suspend invocation
>                    may not be flushed
>   (*) We had no idea whether the semantics change hurt someone.
>       (For details, see http://marc.info/?t=123994433400003&r=1&w=2)
>       But it seems no hurt since there is no complaint against 2.6.30
>       or later.

See my new reply on that thread.
 
> Since the request-based dm code merged in 2.6.31-rc1 still has the
> old semantics, this patch changes it to the new semantics.

Do you agree that after applying this patch series here, we could
reinstate 'suspend with flush' by flushing before entering the guts of
the suspend code (i.e. sending a flush to the dm device, which would be
implemented using the barrier) and then suspending (--noflush)?
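
Roughly, something like the following - a rough sketch only, not a real
implementation: it assumes the 2.6.32-era blkdev_issue_flush() signature
and the md->bdev reference kept in dm.c, and the helper name is made up:

/*
 * Sketch only (inside dm.c): flush the dm device first - with this
 * series a request-based md turns the empty barrier into flushes for
 * every target - then do a plain --noflush suspend.
 */
static int dm_suspend_with_flush(struct mapped_device *md)
{
	int r;

	/* Handled by dm_rq_barrier_work() for a request-based md. */
	r = blkdev_issue_flush(md->bdev, NULL);
	if (r)
		return r;

	/* Nothing left to flush inside the suspend code itself. */
	return dm_suspend(md, DM_SUSPEND_NOFLUSH_FLAG);
}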

Alasdair

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/9] rqdm core: simplify suspend code
  2009-10-16  5:01 ` [PATCH 5/9] rqdm core: simplify suspend code Kiyoshi Ueda
  2009-10-28 16:21   ` Alasdair G Kergon
@ 2009-10-28 17:39   ` Alasdair G Kergon
  1 sibling, 0 replies; 15+ messages in thread
From: Alasdair G Kergon @ 2009-10-28 17:39 UTC (permalink / raw)
  To: Kiyoshi Ueda; +Cc: device-mapper development

On Fri, Oct 16, 2009 at 02:01:10PM +0900, Kiyoshi Ueda wrote:
>   o To keep the old semantics, we needed to distinguish which I/Os were
>     queued before/after the suspend invocation.
>     So a special barrier-like request called a 'suspend marker' was
>     introduced.

This is the key to the simplification - when dm devices can support
flushing properly (using this barrier support), there is no need for an
additional implementation of 'flush' to be embedded within any
internal suspend logic.

Alasdair

^ permalink raw reply	[flat|nested] 15+ messages in thread

Thread overview: 15+ messages
2009-10-16  4:52 [PATCH 0/9] barrier support for request-based dm Kiyoshi Ueda
2009-10-16  4:55 ` [PATCH 1/9] dm core: clean up in-flight checking Kiyoshi Ueda
2009-10-19 21:16   ` Alasdair G Kergon
2009-10-20  7:47     ` Kiyoshi Ueda
2009-10-16  4:57 ` [PATCH 2/9] rqdm core: map_request() takes clone instead of orig Kiyoshi Ueda
2009-10-16  4:58 ` [PATCH 3/9] rqdm core: alloc_rq_tio() takes gfp_mask Kiyoshi Ueda
2009-10-16  4:59 ` [PATCH 4/9] rqdm core: clean up request cloning Kiyoshi Ueda
2009-10-16  5:01 ` [PATCH 5/9] rqdm core: simplify suspend code Kiyoshi Ueda
2009-10-28 16:21   ` Alasdair G Kergon
2009-10-28 17:39   ` Alasdair G Kergon
2009-10-16  5:02 ` [PATCH 6/9] rqdm core: use md->pending for in_flight I/O counting Kiyoshi Ueda
2009-10-16  5:03 ` [PATCH 7/9] rqdm core: refactor completion functions Kiyoshi Ueda
2009-10-16  5:05 ` [PATCH 8/9] rqdm core: move dm_end_request() Kiyoshi Ueda
2009-10-16  5:06 ` [PATCH 9/9] rqdm core: add barrier support Kiyoshi Ueda
2009-10-16  5:17 ` Kiyoshi Ueda
