All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH Linux 2.6.12-rc5-mm2 00/09] blk: ordered request reimplementation (take 2, for review)
@ 2005-06-05  5:57 Tejun Heo
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 01/09] blk: add @uptodate to end_that_request_last() and @error to rq_end_io_fn() Tejun Heo
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Tejun Heo @ 2005-06-05  5:57 UTC (permalink / raw)
  To: axboe, James.Bottomley, bzolnier, jgarzik; +Cc: linux-kernel

 Hello.

 This is the second take of blk ordered reimplementation.  The
original plan was to just add FUA support to the first
reimplementation, but I ended up reimplementing the whole thing again.

 In the original implementation and the first reimplementation, blk
layer just passes through barrier requests when ordered tag is
supported, assuming that the device reporting ordered tag capability
will do the right thing.  However, the only thing ordered tag
guarantees is that the request will be processed in order.  It doesn't
specify anything regarding cache.  So, no cache flushing occurs before
or after the ordered request, thus breaking the semantic of barrier
requests.

 Another problem I've found is with the SCSI midlayer.  When ordered
tag is used, HARD_BARRIER requests should be issued in the exact order
it left blk layer queue.  However, SCSI midlayer request issue
function isn't atomic.  So, requests can be reordered after it leaves
blk layer queue.

 Actually, during testing I've seen this problem occurs quite often
with a SCSI host which has queue depth of 1 (only one outstanding
command).  To do barrier request, the new implementation issues three
requests (pre-flush, barrier-write, post-flush) in order.  After pre
is delivered to LLDD, SCSI midlayer fetches the bar request and tries
to issue it, SCSI midlayer finds out that it should be delayed *after*
it dropped the queue lock and acquired the host lock, and it
determines to requeue the request.  In the meantime, the pre-flush
completes, and the interrupt handler tries to issue the next command.
Unfortunately, the interrupt handler wins the race and successfully
issues the post-flush and after that the bar-write is requeued.  So,
in the end, we have (pre-flush, post-flush, barrier-write).

 * The new implementation is quite flexible in how barrier requests
   can be handled.  Any combination of pre/post-flushing and FUA
   barrier write can be used, and request order can be kept either by
   ordered tag or manual draining.

  The following ordered methods are supported.

  QUEUE_ORDERED_*

  NONE		: ordered requests unsupported,
		  barriers fail w/ -EOPNOTSUPP
  DRAIN		: ordering requests by draining is enough  (no flushing,
		  just drains in-flight requests, issue the barrier,
		  and delay other requests till it completes.)
  DRAIN_FLUSH	: ordering requests by draining w/ pre and post flush
  DRAIN_FUA	: ordering requests by draining w/ pre flush and FUA write
  TAG		: ordering requests with ordered tag is enough
  TAG_FLUSH	: ordering requests with ordered tag w/ pre and post flush
  TAG_FUA	: ordering requests with ordered tag w/ pre flush and FUA-w

 * Barrier requests can be retried, thus allowing falling back to more
   basic methods when something fancy (ordered tag/FUA write) fails.

   Whether to use automatic fallback can be specified with
   QUEUE_ORDERED_NOFALLBACK flag.  When not disabled, blk layer will
   automatically select the next method and retry the failing barrier
   until it succeeds or fallsback to QUEUE_ORDERED_NONE.  Individual
   drivers can also switch ordered method as it likes during ordered
   sequence and the barrier request is retried by the blk layer.

 * Implementation is contained inside the blk layer (ll_rw_blk.c and
   elevator.c).  Specific drivers only need to configure which method
   to use and supply prepare_flush_fn.

 * Devices which support command tagging but not ordered tags can use
   flushing safely.  All in-flight requests are drained before
   pre-flush and no other request until whole sequence completes.
   SATA NCQ falls in this category.

 This patchset makes the following behavior changes.

 * Original code used to BUG_ON in __elv_next_request() when
   elevator_next_req_fn() returns a barrier request when ordered is
   QUEUE_ORDERED_NONE.  This is a bug as ordered could have been
   changed dynamically by previous barrier request while a barrier
   request in already on the queue.  New code just completes such
   requests w/ -EOPNOTSUPP.

 * Barrier request is not deactivated during ordered sequence.  It's
   dequeued before pre-flush and completed after whole sequence is
   finished.  The original code ended up calling
   elv_deactivate_request() twice consecutively on a barrier request
   as it's also called by elv_requeue_request().  Currently, all
   elevators cope with the consecutive deactivations, but I don't
   think it's a very good idea.

 * Any error during flush sequence results in error completion of
   whole barrier request.  (IDE used to report partial success, and
   SCSI didn't handle errors during barrier request properly.)

 I've updated SCSI/libata and IDE to use the new implementation.  This
patchset isn't ready for inclusion yet.  I need to update API docs and
do more testing.  Also, Jens, the reimplementation patch includes some
hacks in elevator.c to not confuse elevators with proxy barrier
request.  The hack can go away with generic dispatch queue I'm
currently cooking.  It takes all dispatch queues from individual
elevators and generalizes it, thus simplifying individual elevators
and making forcing semantics on the dispatch queue easier.

[ Start of patch descriptions ]

01_blk_add_uptodate_to_end_that_request_last.patch
	: add @uptodate to end_that_request_last() and @error to rq_end_io_fn()

	Add @uptodate argument to end_that_request_last() and @error
	to rq_end_io_fn().  There's no generic way to pass error code
	to request completion function, making generic error handling
	of non-fs request difficult (rq->errors is driver-specific and
	each driver uses it differently).  This patch adds @uptodate
	to end_that_request_last() and @error to rq_end_io_fn().

	For fs requests, this doesn't really matter, so just using the
	same uptodate argument used in the last call to
	end_that_request_first() should suffice.  IMHO, this can also
	help the generic command-carrying request Jens is working on.

	One thing that bothers me is this change can be user-visible
	in that additional error codes may be seen by some ioctls.
	eg. -EOPNOTSUPP instead of -EIO can be seen with the following
	two patches on direct-command ioctls.

02_blk_scsi_eopnotsupp.patch
	: make scsi use -EOPNOTSUPP instead of -EIO on ILLEGAL_REQUEST

	Use -EOPNOTSUPP instead of -EIO on ILLEGAL_REQUEST.

03_blk_ide_eopnotsupp.patch
	: make ide use -EOPNOTSUPP instead of -EIO on ABRT_ERR

	Use -EOPNOTSUPP instead of -EIO on ABRT_ERR.

04_blk_add_init_request_from_bio.patch
	: separate out bio init part from __make_request

	Separate out bio initialization part from __make_request.  It
	will be used by the following blk_ordered_reimpl.

05_blk_ordered_reimpl.patch
	: reimplement handling of barrier request

	Reimplement handling of barrier requests.

	* Flexible handling to deal with various capabilities of
          target devices.
	* Retry support for falling back.
	* Tagged queues which don't support ordered tag can do ordered.

06_blk_update_scsi_to_use_new_ordered.patch
	: update SCSI to use the new blk_ordered

	All ordered request related stuff delegated to HLD.  Midlayer
	now doens't deal with ordered setting or prepare_flush
	callback.  sd.c updated to deal with blk_queue_ordered
	setting.  Currently, ordered tag isn't used as SCSI midlayer
	cannot guarantee request ordering.

07_blk_update_libata_to_use_new_ordered.patch
	: update libata to use the new blk_ordered.

	Reflect changes in SCSI midlayer and updated to use the new
	ordered request implementation.

08_blk_update_ide_to_use_new_ordered.patch
	: update IDE to use the new blk_ordered.

	Update IDE to use the new blk_ordered.

09_blk_ordered_reimpl_debug_msgs.patch
	: debug messages

	Theses are debug message I've been using.  If you wanna see
	what's going on...

[ End of patch descriptions ]

 Thanks.

--
tejun


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 01/09] blk: add @uptodate to end_that_request_last() and @error to rq_end_io_fn()
  2005-06-05  5:57 [PATCH Linux 2.6.12-rc5-mm2 00/09] blk: ordered request reimplementation (take 2, for review) Tejun Heo
@ 2005-06-05  5:57 ` Tejun Heo
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 02/09] blk: make scsi use -EOPNOTSUPP instead of -EIO on ILLEGAL_REQUEST Tejun Heo
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2005-06-05  5:57 UTC (permalink / raw)
  To: axboe, James.Bottomley, bzolnier, jgarzik; +Cc: linux-kernel

01_blk_add_uptodate_to_end_that_request_last.patch

	Add @uptodate argument to end_that_request_last() and @error
	to rq_end_io_fn().  There's no generic way to pass error code
	to request completion function, making generic error handling
	of non-fs request difficult (rq->errors is driver-specific and
	each driver uses it differently).  This patch adds @uptodate
	to end_that_request_last() and @error to rq_end_io_fn().

	For fs requests, this doesn't really matter, so just using the
	same uptodate argument used in the last call to
	end_that_request_first() should suffice.  IMHO, this can also
	help the generic command-carrying request Jens is working on.

	One thing that bothers me is this change can be user-visible
	in that additional error codes may be seen by some ioctls.
	eg. -EOPNOTSUPP instead of -EIO can be seen with the following
	two patches on direct-command ioctls.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 drivers/block/DAC960.c          |    2 +-
 drivers/block/cciss.c           |    2 +-
 drivers/block/cpqarray.c        |    2 +-
 drivers/block/elevator.c        |    2 +-
 drivers/block/floppy.c          |    2 +-
 drivers/block/ll_rw_blk.c       |   20 ++++++++++++++------
 drivers/block/nbd.c             |    2 +-
 drivers/block/sx8.c             |    2 +-
 drivers/block/ub.c              |    2 +-
 drivers/block/viodasd.c         |    2 +-
 drivers/cdrom/cdu31a.c          |    2 +-
 drivers/ide/ide-cd.c            |    4 ++--
 drivers/ide/ide-io.c            |    6 +++---
 drivers/message/i2o/i2o_block.c |    2 +-
 drivers/mmc/mmc_block.c         |    4 ++--
 drivers/s390/block/dasd.c       |    2 +-
 drivers/s390/char/tape_block.c  |    2 +-
 drivers/scsi/ide-scsi.c         |    4 ++--
 drivers/scsi/scsi_lib.c         |    8 ++++----
 drivers/scsi/sd.c               |    2 +-
 include/linux/blkdev.h          |    6 +++---
 21 files changed, 44 insertions(+), 36 deletions(-)

Index: blk-fixes/drivers/block/DAC960.c
===================================================================
--- blk-fixes.orig/drivers/block/DAC960.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/block/DAC960.c	2005-06-05 14:53:32.000000000 +0900
@@ -3480,7 +3480,7 @@ static inline boolean DAC960_ProcessComp
 
 	 if (!end_that_request_first(Request, UpToDate, Command->BlockCount)) {
 
- 	 	end_that_request_last(Request);
+ 	 	end_that_request_last(Request, UpToDate);
 
 		if (Command->Completion) {
 			complete(Command->Completion);
Index: blk-fixes/drivers/block/cciss.c
===================================================================
--- blk-fixes.orig/drivers/block/cciss.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/block/cciss.c	2005-06-05 14:53:32.000000000 +0900
@@ -2072,7 +2072,7 @@ static inline void complete_command( ctl
 	printk("Done with %p\n", cmd->rq);
 #endif /* CCISS_DEBUG */ 
 
-	end_that_request_last(cmd->rq);
+	end_that_request_last(cmd->rq, status ? 1 : -EIO);
 	cmd_free(h,cmd,1);
 }
 
Index: blk-fixes/drivers/block/cpqarray.c
===================================================================
--- blk-fixes.orig/drivers/block/cpqarray.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/block/cpqarray.c	2005-06-05 14:53:32.000000000 +0900
@@ -1036,7 +1036,7 @@ static inline void complete_command(cmdl
 	complete_buffers(cmd->rq->bio, ok);
 
         DBGPX(printk("Done with %p\n", cmd->rq););
-	end_that_request_last(cmd->rq);
+	end_that_request_last(cmd->rq, ok ? 1 : -EIO);
 }
 
 /*
Index: blk-fixes/drivers/block/elevator.c
===================================================================
--- blk-fixes.orig/drivers/block/elevator.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/block/elevator.c	2005-06-05 14:53:32.000000000 +0900
@@ -410,7 +410,7 @@ struct request *elv_next_request(request
 			blkdev_dequeue_request(rq);
 			rq->flags |= REQ_QUIET;
 			end_that_request_chunk(rq, 0, nr_bytes);
-			end_that_request_last(rq);
+			end_that_request_last(rq, 0);
 		} else {
 			printk(KERN_ERR "%s: bad return=%d\n", __FUNCTION__,
 								ret);
Index: blk-fixes/drivers/block/floppy.c
===================================================================
--- blk-fixes.orig/drivers/block/floppy.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/block/floppy.c	2005-06-05 14:53:32.000000000 +0900
@@ -2299,7 +2299,7 @@ static void floppy_end_request(struct re
 	add_disk_randomness(req->rq_disk);
 	floppy_off((long)req->rq_disk->private_data);
 	blkdev_dequeue_request(req);
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 
 	/* We're done with the request */
 	current_req = NULL;
Index: blk-fixes/drivers/block/ll_rw_blk.c
===================================================================
--- blk-fixes.orig/drivers/block/ll_rw_blk.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/block/ll_rw_blk.c	2005-06-05 14:53:32.000000000 +0900
@@ -347,7 +347,7 @@ EXPORT_SYMBOL(blk_queue_issue_flush_fn);
 /*
  * Cache flushing for ordered writes handling
  */
-static void blk_pre_flush_end_io(struct request *flush_rq)
+static void blk_pre_flush_end_io(struct request *flush_rq, int error)
 {
 	struct request *rq = flush_rq->end_io_data;
 	request_queue_t *q = rq->q;
@@ -363,7 +363,7 @@ static void blk_pre_flush_end_io(struct 
 	}
 }
 
-static void blk_post_flush_end_io(struct request *flush_rq)
+static void blk_post_flush_end_io(struct request *flush_rq, int error)
 {
 	struct request *rq = flush_rq->end_io_data;
 	request_queue_t *q = rq->q;
@@ -2386,7 +2386,7 @@ EXPORT_SYMBOL(blk_put_request);
  * blk_end_sync_rq - executes a completion event on a request
  * @rq: request to complete
  */
-void blk_end_sync_rq(struct request *rq)
+void blk_end_sync_rq(struct request *rq, int error)
 {
 	struct completion *waiting = rq->waiting;
 
@@ -3141,9 +3141,17 @@ EXPORT_SYMBOL(end_that_request_chunk);
 /*
  * queue lock must be held
  */
-void end_that_request_last(struct request *req)
+void end_that_request_last(struct request *req, int uptodate)
 {
 	struct gendisk *disk = req->rq_disk;
+	int error;
+
+	/*
+	 * extend uptodate bool to allow < 0 value to be direct io error
+	 */
+	error = 0;
+	if (end_io_error(uptodate))
+		error = !uptodate ? -EIO : uptodate;
 
 	if (unlikely(laptop_mode) && blk_fs_request(req))
 		laptop_io_completion();
@@ -3164,7 +3172,7 @@ void end_that_request_last(struct reques
 		disk->in_flight--;
 	}
 	if (req->end_io)
-		req->end_io(req);
+		req->end_io(req, error);
 	else
 		__blk_put_request(req->q, req);
 }
@@ -3176,7 +3184,7 @@ void end_request(struct request *req, in
 	if (!end_that_request_first(req, uptodate, req->hard_cur_sectors)) {
 		add_disk_randomness(req->rq_disk);
 		blkdev_dequeue_request(req);
-		end_that_request_last(req);
+		end_that_request_last(req, uptodate);
 	}
 }
 
Index: blk-fixes/drivers/block/nbd.c
===================================================================
--- blk-fixes.orig/drivers/block/nbd.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/block/nbd.c	2005-06-05 14:53:32.000000000 +0900
@@ -136,7 +136,7 @@ static void nbd_end_request(struct reque
 
 	spin_lock_irqsave(q->queue_lock, flags);
 	if (!end_that_request_first(req, uptodate, req->nr_sectors)) {
-		end_that_request_last(req);
+		end_that_request_last(req, uptodate);
 	}
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
Index: blk-fixes/drivers/block/sx8.c
===================================================================
--- blk-fixes.orig/drivers/block/sx8.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/block/sx8.c	2005-06-05 14:53:32.000000000 +0900
@@ -745,7 +745,7 @@ static inline void carm_end_request_queu
 	rc = end_that_request_first(req, uptodate, req->hard_nr_sectors);
 	assert(rc == 0);
 
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 
 	rc = carm_put_request(host, crq);
 	assert(rc == 0);
Index: blk-fixes/drivers/block/ub.c
===================================================================
--- blk-fixes.orig/drivers/block/ub.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/block/ub.c	2005-06-05 14:53:32.000000000 +0900
@@ -875,7 +875,7 @@ static void ub_end_rq(struct request *rq
 
 	rc = end_that_request_first(rq, uptodate, rq->hard_nr_sectors);
 	// assert(rc == 0);
-	end_that_request_last(rq);
+	end_that_request_last(rq, uptodate);
 }
 
 /*
Index: blk-fixes/drivers/block/viodasd.c
===================================================================
--- blk-fixes.orig/drivers/block/viodasd.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/block/viodasd.c	2005-06-05 14:53:32.000000000 +0900
@@ -305,7 +305,7 @@ static void viodasd_end_request(struct r
 	if (end_that_request_first(req, uptodate, num_sectors))
 		return;
 	add_disk_randomness(req->rq_disk);
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 }
 
 /*
Index: blk-fixes/drivers/cdrom/cdu31a.c
===================================================================
--- blk-fixes.orig/drivers/cdrom/cdu31a.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/cdrom/cdu31a.c	2005-06-05 14:53:32.000000000 +0900
@@ -1402,7 +1402,7 @@ static void do_cdu31a_request(request_qu
 			if (!end_that_request_first(req, 1, nblock)) {
 				spin_lock_irq(q->queue_lock);
 				blkdev_dequeue_request(req);
-				end_that_request_last(req);
+				end_that_request_last(req, 1);
 				spin_unlock_irq(q->queue_lock);
 			}
 			continue;
Index: blk-fixes/drivers/ide/ide-cd.c
===================================================================
--- blk-fixes.orig/drivers/ide/ide-cd.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/ide/ide-cd.c	2005-06-05 14:53:32.000000000 +0900
@@ -614,7 +614,7 @@ static void cdrom_end_request (ide_drive
 			 */
 			spin_lock_irqsave(&ide_lock, flags);
 			end_that_request_chunk(failed, 0, failed->data_len);
-			end_that_request_last(failed);
+			end_that_request_last(failed, 0);
 			spin_unlock_irqrestore(&ide_lock, flags);
 		}
 
@@ -1739,7 +1739,7 @@ end_request:
 
 	spin_lock_irqsave(&ide_lock, flags);
 	blkdev_dequeue_request(rq);
-	end_that_request_last(rq);
+	end_that_request_last(rq, 1);
 	HWGROUP(drive)->rq = NULL;
 	spin_unlock_irqrestore(&ide_lock, flags);
 	return ide_stopped;
Index: blk-fixes/drivers/ide/ide-io.c
===================================================================
--- blk-fixes.orig/drivers/ide/ide-io.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/ide/ide-io.c	2005-06-05 14:53:32.000000000 +0900
@@ -89,7 +89,7 @@ int __ide_end_request(ide_drive_t *drive
 
 		blkdev_dequeue_request(rq);
 		HWGROUP(drive)->rq = NULL;
-		end_that_request_last(rq);
+		end_that_request_last(rq, uptodate);
 		ret = 0;
 	}
 	return ret;
@@ -247,7 +247,7 @@ static void ide_complete_pm_request (ide
 	}
 	blkdev_dequeue_request(rq);
 	HWGROUP(drive)->rq = NULL;
-	end_that_request_last(rq);
+	end_that_request_last(rq, 1);
 	spin_unlock_irqrestore(&ide_lock, flags);
 }
 
@@ -379,7 +379,7 @@ void ide_end_drive_cmd (ide_drive_t *dri
 	blkdev_dequeue_request(rq);
 	HWGROUP(drive)->rq = NULL;
 	rq->errors = err;
-	end_that_request_last(rq);
+	end_that_request_last(rq, !rq->errors);
 	spin_unlock_irqrestore(&ide_lock, flags);
 }
 
Index: blk-fixes/drivers/message/i2o/i2o_block.c
===================================================================
--- blk-fixes.orig/drivers/message/i2o/i2o_block.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/message/i2o/i2o_block.c	2005-06-05 14:53:32.000000000 +0900
@@ -466,7 +466,7 @@ static void i2o_block_end_request(struct
 
 	spin_lock_irqsave(q->queue_lock, flags);
 
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 
 	if (likely(dev)) {
 		dev->open_queue_depth--;
Index: blk-fixes/drivers/mmc/mmc_block.c
===================================================================
--- blk-fixes.orig/drivers/mmc/mmc_block.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/mmc/mmc_block.c	2005-06-05 14:53:32.000000000 +0900
@@ -254,7 +254,7 @@ static int mmc_blk_issue_rq(struct mmc_q
 			 */
 			add_disk_randomness(req->rq_disk);
 			blkdev_dequeue_request(req);
-			end_that_request_last(req);
+			end_that_request_last(req, 1);
 		}
 		spin_unlock_irq(&md->lock);
 	} while (ret);
@@ -280,7 +280,7 @@ static int mmc_blk_issue_rq(struct mmc_q
 
 	add_disk_randomness(req->rq_disk);
 	blkdev_dequeue_request(req);
-	end_that_request_last(req);
+	end_that_request_last(req, 0);
 	spin_unlock_irq(&md->lock);
 
 	return 0;
Index: blk-fixes/drivers/s390/block/dasd.c
===================================================================
--- blk-fixes.orig/drivers/s390/block/dasd.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/s390/block/dasd.c	2005-06-05 14:53:32.000000000 +0900
@@ -1039,7 +1039,7 @@ dasd_end_request(struct request *req, in
 	if (end_that_request_first(req, uptodate, req->hard_nr_sectors))
 		BUG();
 	add_disk_randomness(req->rq_disk);
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 }
 
 /*
Index: blk-fixes/drivers/s390/char/tape_block.c
===================================================================
--- blk-fixes.orig/drivers/s390/char/tape_block.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/s390/char/tape_block.c	2005-06-05 14:53:32.000000000 +0900
@@ -78,7 +78,7 @@ tapeblock_end_request(struct request *re
 {
 	if (end_that_request_first(req, uptodate, req->hard_nr_sectors))
 		BUG();
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 }
 
 static void
Index: blk-fixes/drivers/scsi/ide-scsi.c
===================================================================
--- blk-fixes.orig/drivers/scsi/ide-scsi.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/scsi/ide-scsi.c	2005-06-05 14:53:32.000000000 +0900
@@ -1039,7 +1039,7 @@ static int idescsi_eh_reset (struct scsi
 
 	/* kill current request */
 	blkdev_dequeue_request(req);
-	end_that_request_last(req);
+	end_that_request_last(req, 0);
 	if (req->flags & REQ_SENSE)
 		kfree(scsi->pc->buffer);
 	kfree(scsi->pc);
@@ -1049,7 +1049,7 @@ static int idescsi_eh_reset (struct scsi
 	/* now nuke the drive queue */
 	while ((req = elv_next_request(drive->queue))) {
 		blkdev_dequeue_request(req);
-		end_that_request_last(req);
+		end_that_request_last(req, 0);
 	}
 
 	HWGROUP(drive)->rq = NULL;
Index: blk-fixes/drivers/scsi/scsi_lib.c
===================================================================
--- blk-fixes.orig/drivers/scsi/scsi_lib.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/scsi/scsi_lib.c	2005-06-05 14:53:32.000000000 +0900
@@ -258,7 +258,7 @@ static void scsi_wait_done(struct scsi_c
 /* This is the end routine we get to if a command was never attached
  * to the request.  Simply complete the request without changing
  * rq_status; this will cause a DRIVER_ERROR. */
-static void scsi_wait_req_end_io(struct request *req)
+static void scsi_wait_req_end_io(struct request *req, int error)
 {
 	BUG_ON(!req->waiting);
 
@@ -574,7 +574,7 @@ static struct scsi_cmnd *scsi_end_reques
 	spin_lock_irqsave(q->queue_lock, flags);
 	if (blk_rq_tagged(req))
 		blk_queue_end_tag(q, req);
-	end_that_request_last(req);
+	end_that_request_last(req, uptodate);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	/*
@@ -1259,7 +1259,7 @@ static void scsi_kill_requests(request_q
 		req->flags |= REQ_QUIET;
 		while (end_that_request_first(req, 0, req->nr_sectors))
 			;
-		end_that_request_last(req);
+		end_that_request_last(req, 0);
 	}
 }
 
@@ -1314,7 +1314,7 @@ static void scsi_request_fn(struct reque
 			req->flags |= REQ_QUIET;
 			while (end_that_request_first(req, 0, req->nr_sectors))
 				;
-			end_that_request_last(req);
+			end_that_request_last(req, 0);
 			continue;
 		}
 
Index: blk-fixes/drivers/scsi/sd.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sd.c	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/drivers/scsi/sd.c	2005-06-05 14:53:32.000000000 +0900
@@ -758,7 +758,7 @@ static void sd_end_flush(request_queue_t
 		 * force journal abort of barriers
 		 */
 		end_that_request_first(rq, -EOPNOTSUPP, rq->hard_nr_sectors);
-		end_that_request_last(rq);
+		end_that_request_last(rq, -EOPNOTSUPP);
 	}
 }
 
Index: blk-fixes/include/linux/blkdev.h
===================================================================
--- blk-fixes.orig/include/linux/blkdev.h	2005-06-05 14:50:12.000000000 +0900
+++ blk-fixes/include/linux/blkdev.h	2005-06-05 14:53:32.000000000 +0900
@@ -102,7 +102,7 @@ void copy_io_context(struct io_context *
 void swap_io_context(struct io_context **ioc1, struct io_context **ioc2);
 
 struct request;
-typedef void (rq_end_io_fn)(struct request *);
+typedef void (rq_end_io_fn)(struct request *, int);
 
 struct request_list {
 	int count[2];
@@ -547,7 +547,7 @@ extern void blk_unregister_queue(struct 
 extern void register_disk(struct gendisk *dev);
 extern void generic_make_request(struct bio *bio);
 extern void blk_put_request(struct request *);
-extern void blk_end_sync_rq(struct request *rq);
+extern void blk_end_sync_rq(struct request *rq, int error);
 extern void blk_attempt_remerge(request_queue_t *, struct request *);
 extern struct request *blk_get_request(request_queue_t *, int, int);
 extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
@@ -595,7 +595,7 @@ static inline void blk_run_address_space
  */
 extern int end_that_request_first(struct request *, int, int);
 extern int end_that_request_chunk(struct request *, int, int);
-extern void end_that_request_last(struct request *);
+extern void end_that_request_last(struct request *, int);
 extern void end_request(struct request *req, int uptodate);
 
 /*


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 02/09] blk: make scsi use -EOPNOTSUPP instead of -EIO on ILLEGAL_REQUEST
  2005-06-05  5:57 [PATCH Linux 2.6.12-rc5-mm2 00/09] blk: ordered request reimplementation (take 2, for review) Tejun Heo
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 01/09] blk: add @uptodate to end_that_request_last() and @error to rq_end_io_fn() Tejun Heo
@ 2005-06-05  5:57 ` Tejun Heo
  2005-06-05  7:10   ` Jeff Garzik
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 03/09] blk: make ide use -EOPNOTSUPP instead of -EIO on ABRT_ERR Tejun Heo
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2005-06-05  5:57 UTC (permalink / raw)
  To: axboe, James.Bottomley, bzolnier, jgarzik; +Cc: linux-kernel

02_blk_scsi_eopnotsupp.patch

	Use -EOPNOTSUPP instead of -EIO on ILLEGAL_REQUEST.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 scsi_lib.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

Index: blk-fixes/drivers/scsi/scsi_lib.c
===================================================================
--- blk-fixes.orig/drivers/scsi/scsi_lib.c	2005-06-05 14:53:32.000000000 +0900
+++ blk-fixes/drivers/scsi/scsi_lib.c	2005-06-05 14:53:33.000000000 +0900
@@ -849,7 +849,8 @@ void scsi_io_completion(struct scsi_cmnd
 				scsi_requeue_command(q, cmd);
 				result = 0;
 			} else {
-				cmd = scsi_end_request(cmd, 0, this_count, 1);
+				cmd = scsi_end_request(cmd, -EOPNOTSUPP,
+						       this_count, 1);
 				return;
 			}
 			break;


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 03/09] blk: make ide use -EOPNOTSUPP instead of -EIO on ABRT_ERR
  2005-06-05  5:57 [PATCH Linux 2.6.12-rc5-mm2 00/09] blk: ordered request reimplementation (take 2, for review) Tejun Heo
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 01/09] blk: add @uptodate to end_that_request_last() and @error to rq_end_io_fn() Tejun Heo
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 02/09] blk: make scsi use -EOPNOTSUPP instead of -EIO on ILLEGAL_REQUEST Tejun Heo
@ 2005-06-05  5:57 ` Tejun Heo
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 04/09] blk: separate out bio init part from __make_request Tejun Heo
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2005-06-05  5:57 UTC (permalink / raw)
  To: axboe, James.Bottomley, bzolnier, jgarzik; +Cc: linux-kernel

03_blk_ide_eopnotsupp.patch

	Use -EOPNOTSUPP instead of -EIO on ABRT_ERR.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 ide-io.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletion(-)

Index: blk-fixes/drivers/ide/ide-io.c
===================================================================
--- blk-fixes.orig/drivers/ide/ide-io.c	2005-06-05 14:53:32.000000000 +0900
+++ blk-fixes/drivers/ide/ide-io.c	2005-06-05 14:53:33.000000000 +0900
@@ -305,6 +305,7 @@ void ide_end_drive_cmd (ide_drive_t *dri
 	ide_hwif_t *hwif = HWIF(drive);
 	unsigned long flags;
 	struct request *rq;
+	int uptodate;
 
 	spin_lock_irqsave(&ide_lock, flags);
 	rq = HWGROUP(drive)->rq;
@@ -379,7 +380,10 @@ void ide_end_drive_cmd (ide_drive_t *dri
 	blkdev_dequeue_request(rq);
 	HWGROUP(drive)->rq = NULL;
 	rq->errors = err;
-	end_that_request_last(rq, !rq->errors);
+	uptodate = 1;
+	if (err)
+		uptodate = err & ABRT_ERR ? -EOPNOTSUPP : -EIO;
+	end_that_request_last(rq, uptodate);
 	spin_unlock_irqrestore(&ide_lock, flags);
 }
 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 04/09] blk: separate out bio init part from __make_request
  2005-06-05  5:57 [PATCH Linux 2.6.12-rc5-mm2 00/09] blk: ordered request reimplementation (take 2, for review) Tejun Heo
                   ` (2 preceding siblings ...)
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 03/09] blk: make ide use -EOPNOTSUPP instead of -EIO on ABRT_ERR Tejun Heo
@ 2005-06-05  5:57 ` Tejun Heo
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 05/09] blk: reimplement handling of barrier request Tejun Heo
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2005-06-05  5:57 UTC (permalink / raw)
  To: axboe, James.Bottomley, bzolnier, jgarzik; +Cc: linux-kernel

04_blk_add_init_request_from_bio.patch

	Separate out bio initialization part from __make_request.  It
	will be used by the following blk_ordered_reimpl.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 ll_rw_blk.c |   61 ++++++++++++++++++++++++++++++++----------------------------
 1 files changed, 33 insertions(+), 28 deletions(-)

Index: blk-fixes/drivers/block/ll_rw_blk.c
===================================================================
--- blk-fixes.orig/drivers/block/ll_rw_blk.c	2005-06-05 14:53:32.000000000 +0900
+++ blk-fixes/drivers/block/ll_rw_blk.c	2005-06-05 14:53:34.000000000 +0900
@@ -38,6 +38,8 @@
 static void blk_unplug_work(void *data);
 static void blk_unplug_timeout(unsigned long data);
 static void drive_stat_acct(struct request *rq, int nr_sectors, int new_io);
+static void init_request_from_bio(struct request *req, struct bio *bio);
+static int __make_request(request_queue_t *q, struct bio *bio);
 
 /*
  * For the allocated request tables
@@ -1638,8 +1640,6 @@ static int blk_init_free_list(request_qu
 	return 0;
 }
 
-static int __make_request(request_queue_t *, struct bio *);
-
 request_queue_t *blk_alloc_queue(int gfp_mask)
 {
 	return blk_alloc_queue_node(gfp_mask, -1);
@@ -2524,6 +2524,36 @@ void blk_attempt_remerge(request_queue_t
 
 EXPORT_SYMBOL(blk_attempt_remerge);
 
+static void init_request_from_bio(struct request *req, struct bio *bio)
+{
+	req->flags |= REQ_CMD;
+
+	/*
+	 * inherit FAILFAST from bio (for read-ahead, and explicit FAILFAST)
+	 */
+	if (bio_rw_ahead(bio) || bio_failfast(bio))
+		req->flags |= REQ_FAILFAST;
+
+	/*
+	 * REQ_BARRIER implies no merging, but lets make it explicit
+	 */
+	if (unlikely(bio_barrier(bio)))
+		req->flags |= (REQ_HARDBARRIER | REQ_NOMERGE);
+
+	req->errors = 0;
+	req->hard_sector = req->sector = bio->bi_sector;
+	req->hard_nr_sectors = req->nr_sectors = bio_sectors(bio);
+	req->current_nr_sectors = req->hard_cur_sectors = bio_cur_sectors(bio);
+	req->nr_phys_segments = bio_phys_segments(req->q, bio);
+	req->nr_hw_segments = bio_hw_segments(req->q, bio);
+	req->buffer = bio_data(bio);	/* see ->buffer comment above */
+	req->waiting = NULL;
+	req->bio = req->biotail = bio;
+	req->ioprio = bio_prio(bio);
+	req->rq_disk = bio->bi_bdev->bd_disk;
+	req->start_time = jiffies;
+}
+
 static int __make_request(request_queue_t *q, struct bio *bio)
 {
 	struct request *req;
@@ -2620,32 +2650,7 @@ get_rq:
 	 * often, and the elevators are able to handle it.
 	 */
 
-	req->flags |= REQ_CMD;
-
-	/*
-	 * inherit FAILFAST from bio (for read-ahead, and explicit FAILFAST)
-	 */
-	if (bio_rw_ahead(bio) || bio_failfast(bio))
-		req->flags |= REQ_FAILFAST;
-
-	/*
-	 * REQ_BARRIER implies no merging, but lets make it explicit
-	 */
-	if (unlikely(barrier))
-		req->flags |= (REQ_HARDBARRIER | REQ_NOMERGE);
-
-	req->errors = 0;
-	req->hard_sector = req->sector = sector;
-	req->hard_nr_sectors = req->nr_sectors = nr_sectors;
-	req->current_nr_sectors = req->hard_cur_sectors = cur_nr_sectors;
-	req->nr_phys_segments = bio_phys_segments(q, bio);
-	req->nr_hw_segments = bio_hw_segments(q, bio);
-	req->buffer = bio_data(bio);	/* see ->buffer comment above */
-	req->waiting = NULL;
-	req->bio = req->biotail = bio;
-	req->ioprio = prio;
-	req->rq_disk = bio->bi_bdev->bd_disk;
-	req->start_time = jiffies;
+	init_request_from_bio(req, bio);
 
 	spin_lock_irq(q->queue_lock);
 	if (elv_queue_empty(q))


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 05/09] blk: reimplement handling of barrier request
  2005-06-05  5:57 [PATCH Linux 2.6.12-rc5-mm2 00/09] blk: ordered request reimplementation (take 2, for review) Tejun Heo
                   ` (3 preceding siblings ...)
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 04/09] blk: separate out bio init part from __make_request Tejun Heo
@ 2005-06-05  5:57 ` Tejun Heo
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 06/09] blk: update SCSI to use the new blk_ordered Tejun Heo
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2005-06-05  5:57 UTC (permalink / raw)
  To: axboe, James.Bottomley, bzolnier, jgarzik; +Cc: linux-kernel

05_blk_ordered_reimpl.patch

	Reimplement handling of barrier requests.

	* Flexible handling to deal with various capabilities of
          target devices.
	* Retry support for falling back.
	* Tagged queues which don't support ordered tag can do ordered.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 drivers/block/elevator.c  |   79 ++++---
 drivers/block/ll_rw_blk.c |  491 ++++++++++++++++++++++++++++++++--------------
 include/linux/blkdev.h    |   93 ++++++--
 3 files changed, 469 insertions(+), 194 deletions(-)

Index: blk-fixes/drivers/block/elevator.c
===================================================================
--- blk-fixes.orig/drivers/block/elevator.c	2005-06-05 14:53:32.000000000 +0900
+++ blk-fixes/drivers/block/elevator.c	2005-06-05 14:53:34.000000000 +0900
@@ -37,9 +37,38 @@
 
 #include <asm/uaccess.h>
 
+/*
+ * XXX HACK XXX Before entering elevator callbacks, we temporailiy
+ * turn off REQ_CMD of proxy barrier request so that elevators don't
+ * try to account it as a normal one.  This ugliness can go away once
+ * generic dispatch queue is implemented. - tj
+ */
+#define bar_rq_hack_start(q)	((q)->bar_rq && ((q)->bar_rq->flags &= ~REQ_CMD))
+#define bar_rq_hack_end(q)	((q)->bar_rq && ((q)->bar_rq->flags |= REQ_CMD))
+
 static DEFINE_SPINLOCK(elv_list_lock);
 static LIST_HEAD(elv_list);
 
+static inline void elv_inc_inflight(request_queue_t *q)
+{
+	q->in_flight++;
+}
+
+static inline void elv_dec_inflight(request_queue_t *q)
+{
+	q->in_flight--;
+
+	/*
+	 * If the queue is waiting for all accounted requests to be
+	 * drained to start flush sequence and we're the last one,
+	 * kick the queue in the ass to start the flush sequence.
+	 */
+	if (q->in_flight == 0 && blk_ordered_cur_seq(q) == QUEUE_ORDSEQ_DRAIN) {
+		blk_ordered_complete_seq(q, QUEUE_ORDSEQ_DRAIN, 0);
+		q->request_fn(q);
+	}
+}
+
 /*
  * can we safely merge with this request?
  */
@@ -270,12 +299,15 @@ void elv_deactivate_request(request_queu
 	 * in_flight count again
 	 */
 	if (blk_account_rq(rq))
-		q->in_flight--;
+		elv_dec_inflight(q);
 
 	rq->flags &= ~REQ_STARTED;
 
-	if (e->ops->elevator_deactivate_req_fn)
+	if (e->ops->elevator_deactivate_req_fn) {
+		bar_rq_hack_start(q);
 		e->ops->elevator_deactivate_req_fn(q, rq);
+		bar_rq_hack_end(q);
+	}
 }
 
 void elv_requeue_request(request_queue_t *q, struct request *rq)
@@ -283,14 +315,6 @@ void elv_requeue_request(request_queue_t
 	elv_deactivate_request(q, rq);
 
 	/*
-	 * if this is the flush, requeue the original instead and drop the flush
-	 */
-	if (rq->flags & REQ_BAR_FLUSH) {
-		clear_bit(QUEUE_FLAG_FLUSH, &q->queue_flags);
-		rq = rq->end_io_data;
-	}
-
-	/*
 	 * the request is prepped and may have some resources allocated.
 	 * allowing unprepped requests to pass this one may cause resource
 	 * deadlock.  turn on softbarrier.
@@ -301,9 +325,11 @@ void elv_requeue_request(request_queue_t
 	 * if iosched has an explicit requeue hook, then use that. otherwise
 	 * just put the request at the front of the queue
 	 */
-	if (q->elevator->ops->elevator_requeue_req_fn)
+	if (q->elevator->ops->elevator_requeue_req_fn) {
+		bar_rq_hack_start(q);
 		q->elevator->ops->elevator_requeue_req_fn(q, rq);
-	else
+		bar_rq_hack_end(q);
+	} else
 		__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
 }
 
@@ -323,7 +349,9 @@ void __elv_add_request(request_queue_t *
 	rq->q = q;
 
 	if (!test_bit(QUEUE_FLAG_DRAIN, &q->queue_flags)) {
+		bar_rq_hack_start(q);
 		q->elevator->ops->elevator_add_req_fn(q, rq, where);
+		bar_rq_hack_end(q);
 
 		if (blk_queue_plugged(q)) {
 			int nrq = q->rq.count[READ] + q->rq.count[WRITE]
@@ -353,20 +381,10 @@ void elv_add_request(request_queue_t *q,
 
 static inline struct request *__elv_next_request(request_queue_t *q)
 {
-	struct request *rq = q->elevator->ops->elevator_next_req_fn(q);
-
-	/*
-	 * if this is a barrier write and the device has to issue a
-	 * flush sequence to support it, check how far we are
-	 */
-	if (rq && blk_fs_request(rq) && blk_barrier_rq(rq)) {
-		BUG_ON(q->ordered == QUEUE_ORDERED_NONE);
-
-		if (q->ordered == QUEUE_ORDERED_FLUSH &&
-		    !blk_barrier_preflush(rq))
-			rq = blk_start_pre_flush(q, rq);
-	}
-
+	struct request *rq;
+	while ((rq = q->elevator->ops->elevator_next_req_fn(q)))
+		if (blk_do_ordered(q, &rq))
+			break;
 	return rq;
 }
 
@@ -433,7 +451,7 @@ void elv_remove_request(request_queue_t 
 	 * for request-request merges
 	 */
 	if (blk_account_rq(rq))
-		q->in_flight++;
+		elv_inc_inflight(q);
 
 	/*
 	 * the main clearing point for q->last_merge is on retrieval of
@@ -445,8 +463,11 @@ void elv_remove_request(request_queue_t 
 	if (rq == q->last_merge)
 		q->last_merge = NULL;
 
-	if (e->ops->elevator_remove_req_fn)
+	if (e->ops->elevator_remove_req_fn) {
+		bar_rq_hack_start(q);
 		e->ops->elevator_remove_req_fn(q, rq);
+		bar_rq_hack_end(q);
+	}
 }
 
 int elv_queue_empty(request_queue_t *q)
@@ -529,7 +550,7 @@ void elv_completed_request(request_queue
 	 * request is released from the driver, io must be done
 	 */
 	if (blk_account_rq(rq))
-		q->in_flight--;
+		elv_dec_inflight(q);
 
 	if (e->ops->elevator_completed_req_fn)
 		e->ops->elevator_completed_req_fn(q, rq);
Index: blk-fixes/drivers/block/ll_rw_blk.c
===================================================================
--- blk-fixes.orig/drivers/block/ll_rw_blk.c	2005-06-05 14:53:34.000000000 +0900
+++ blk-fixes/drivers/block/ll_rw_blk.c	2005-06-05 14:53:34.000000000 +0900
@@ -291,10 +291,112 @@ static inline void rq_init(request_queue
 	rq->end_io_data = NULL;
 }
 
+static int __blk_queue_ordered(request_queue_t *q, int ordered,
+			       prepare_flush_fn *prepare_flush_fn,
+			       unsigned gfp_mask, int locked)
+{
+	void *rq0 = NULL, *rq1 = NULL, *rq2 = NULL;
+	unsigned long flags = 0; /* shut up, gcc */
+	unsigned ordered_flags;
+	int ret = 0;
+
+	might_sleep_if(gfp_mask & __GFP_WAIT);
+
+	ordered_flags = ordered & QUEUE_ORDERED_FLAGS;
+	ordered &= ~QUEUE_ORDERED_FLAGS;
+
+	if (ordered & (QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_POSTFLUSH) &&
+	    prepare_flush_fn == NULL) {
+		printk(KERN_ERR "blk_queue_ordered: prepare_flush_fn required\n");
+		ret = -EINVAL;
+		goto out_unlocked;
+	}
+
+	if (!locked)
+		spin_lock_irqsave(q->queue_lock, flags);
+
+	if (q->ordseq) {
+		/* Queue flushing in progress, defer changing q->ordered. */
+		q->next_ordered = ordered;
+		q->next_prepare_flush_fn = prepare_flush_fn;
+		ret = -EINPROGRESS;
+		goto out;
+	}
+
+	switch (ordered) {
+	case QUEUE_ORDERED_NONE:
+		ordered_flags = 0;	/* for '== QUEUE_ORDERED_NONE' tests */
+		if (!q->pre_flush_rq) {
+			rq0 = q->pre_flush_rq;
+			rq1 = q->bar_rq;
+			rq2 = q->post_flush_rq;
+			q->pre_flush_rq = NULL;
+			q->bar_rq = NULL;
+			q->post_flush_rq = NULL;
+		}
+		break;
+
+	case QUEUE_ORDERED_DRAIN:
+	case QUEUE_ORDERED_DRAIN_FLUSH:
+	case QUEUE_ORDERED_DRAIN_FUA:
+	case QUEUE_ORDERED_TAG:
+	case QUEUE_ORDERED_TAG_FLUSH:
+	case QUEUE_ORDERED_TAG_FUA:
+		if (q->pre_flush_rq)
+			break;
+
+		if (!locked)
+			spin_unlock_irqrestore(q->queue_lock, flags);
+		if (!(rq0 = kmem_cache_alloc(request_cachep, gfp_mask)) ||
+		    !(rq1 = kmem_cache_alloc(request_cachep, gfp_mask)) ||
+		    !(rq2 = kmem_cache_alloc(request_cachep, gfp_mask))) {
+			printk(KERN_ERR "blk_queue_ordered: failed to "
+			       "switch to 0x%x (out of memory)\n", ordered);
+			ret = -ENOMEM;
+			goto out_unlocked;
+		}
+		if (!locked)
+			spin_lock_irqsave(q->queue_lock, flags);
+
+		if (!q->pre_flush_rq) {
+			q->pre_flush_rq = rq0;
+			q->bar_rq = rq1;
+			q->post_flush_rq = rq2;
+			rq0 = NULL;
+			rq1 = NULL;
+			rq2 = NULL;
+		}
+		break;
+
+	default:
+		printk(KERN_ERR "blk_queue_ordered: bad value %d\n", ordered);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	q->ordered = ordered | ordered_flags;
+	q->next_ordered = q->ordered;
+	q->prepare_flush_fn = prepare_flush_fn;
+
+ out:
+	if (!locked)
+		spin_unlock_irqrestore(q->queue_lock, flags);
+ out_unlocked:
+	if (rq0)
+		kmem_cache_free(request_cachep, rq0);
+	if (rq1)
+		kmem_cache_free(request_cachep, rq1);
+	if (rq2)
+		kmem_cache_free(request_cachep, rq2);
+
+	return ret;
+}
+
 /**
  * blk_queue_ordered - does this queue support ordered writes
- * @q:     the request queue
- * @flag:  see below
+ * @q:        the request queue
+ * @ordered:  one of QUEUE_ORDERED_*
+ * @gfp_mask: allocation mask
  *
  * Description:
  *   For journalled file systems, doing ordered writes on a commit
@@ -303,32 +405,24 @@ static inline void rq_init(request_queue
  *   feature should call this function and indicate so.
  *
  **/
-void blk_queue_ordered(request_queue_t *q, int flag)
+extern int blk_queue_ordered(request_queue_t *q, unsigned ordered,
+			     prepare_flush_fn *prepare_flush_fn,
+			     unsigned gfp_mask)
 {
-	switch (flag) {
-		case QUEUE_ORDERED_NONE:
-			if (q->flush_rq)
-				kmem_cache_free(request_cachep, q->flush_rq);
-			q->flush_rq = NULL;
-			q->ordered = flag;
-			break;
-		case QUEUE_ORDERED_TAG:
-			q->ordered = flag;
-			break;
-		case QUEUE_ORDERED_FLUSH:
-			q->ordered = flag;
-			if (!q->flush_rq)
-				q->flush_rq = kmem_cache_alloc(request_cachep,
-								GFP_KERNEL);
-			break;
-		default:
-			printk("blk_queue_ordered: bad value %d\n", flag);
-			break;
-	}
+	return __blk_queue_ordered(q, ordered, prepare_flush_fn, gfp_mask, 0);
 }
 
 EXPORT_SYMBOL(blk_queue_ordered);
 
+extern int blk_queue_ordered_locked(request_queue_t *q, unsigned ordered,
+				    prepare_flush_fn *prepare_flush_fn,
+				    unsigned gfp_mask)
+{
+	return __blk_queue_ordered(q, ordered, prepare_flush_fn, gfp_mask, 1);
+}
+
+EXPORT_SYMBOL(blk_queue_ordered_locked);
+
 /**
  * blk_queue_issue_flush_fn - set function for issuing a flush
  * @q:     the request queue
@@ -349,165 +443,281 @@ EXPORT_SYMBOL(blk_queue_issue_flush_fn);
 /*
  * Cache flushing for ordered writes handling
  */
-static void blk_pre_flush_end_io(struct request *flush_rq, int error)
+inline unsigned blk_ordered_cur_seq(request_queue_t *q)
 {
-	struct request *rq = flush_rq->end_io_data;
-	request_queue_t *q = rq->q;
-
-	rq->flags |= REQ_BAR_PREFLUSH;
-
-	if (!flush_rq->errors)
-		elv_requeue_request(q, rq);
-	else {
-		q->end_flush_fn(q, flush_rq);
-		clear_bit(QUEUE_FLAG_FLUSH, &q->queue_flags);
-		q->request_fn(q);
-	}
+	if (!q->ordseq)
+		return 0;
+	return 1 << ffz(q->ordseq);
 }
 
-static void blk_post_flush_end_io(struct request *flush_rq, int error)
+static void ordered_set_error(request_queue_t *q, unsigned seq, int error)
 {
-	struct request *rq = flush_rq->end_io_data;
-	request_queue_t *q = rq->q;
+	unsigned ordered;
 
-	rq->flags |= REQ_BAR_POSTFLUSH;
+	q->orderr = error;
 
-	q->end_flush_fn(q, flush_rq);
-	clear_bit(QUEUE_FLAG_FLUSH, &q->queue_flags);
-	q->request_fn(q);
+	if (q->ordered & QUEUE_ORDERED_NOFALLBACK ||
+	    q->ordered != q->next_ordered)
+		return;
+
+	switch (seq) {
+	case QUEUE_ORDSEQ_PREFLUSH:
+		if (q->ordered & QUEUE_ORDERED_TAG)
+			ordered = (q->ordered & ~QUEUE_ORDERED_TAG) |
+				QUEUE_ORDERED_DRAIN;
+		else
+			ordered = QUEUE_ORDERED_NONE;
+		break;
+	case QUEUE_ORDSEQ_BAR:
+		ordered = (q->ordered & ~QUEUE_ORDERED_FUA) |
+			QUEUE_ORDERED_POSTFLUSH;
+		break;
+	default:
+		return;
+	}
+
+	blk_queue_ordered_locked(q, ordered, q->prepare_flush_fn, GFP_ATOMIC);
 }
 
-struct request *blk_start_pre_flush(request_queue_t *q, struct request *rq)
+void blk_ordered_complete_seq(request_queue_t *q, unsigned seq, int error)
 {
-	struct request *flush_rq = q->flush_rq;
+	struct request *rq;
+	int uptodate, changed = 0;
 
-	BUG_ON(!blk_barrier_rq(rq));
+	if (error && !q->orderr)
+		ordered_set_error(q, seq, error);
 
-	if (test_and_set_bit(QUEUE_FLAG_FLUSH, &q->queue_flags))
-		return NULL;
+	BUG_ON(q->ordseq & seq);
+	q->ordseq |= seq;
 
-	rq_init(q, flush_rq);
-	flush_rq->elevator_private = NULL;
-	flush_rq->flags = REQ_BAR_FLUSH;
-	flush_rq->rq_disk = rq->rq_disk;
-	flush_rq->rl = NULL;
+	if (blk_ordered_cur_seq(q) != QUEUE_ORDSEQ_DONE)
+		return;
 
 	/*
-	 * prepare_flush returns 0 if no flush is needed, just mark both
-	 * pre and post flush as done in that case
-	 */
-	if (!q->prepare_flush_fn(q, flush_rq)) {
-		rq->flags |= REQ_BAR_PREFLUSH | REQ_BAR_POSTFLUSH;
-		clear_bit(QUEUE_FLAG_FLUSH, &q->queue_flags);
-		return rq;
-	}
+	 * Okay, sequence complete.
+	 */
+	rq = q->orig_bar_rq;
+	uptodate = q->orderr ? q->orderr : 1;
+
+	q->ordseq = 0;
 
 	/*
-	 * some drivers dequeue requests right away, some only after io
-	 * completion. make sure the request is dequeued.
+	 * We give a failed barrier request second chance if ordered
+	 * has changed since it has started.
 	 */
-	if (!list_empty(&rq->queuelist))
-		blkdev_dequeue_request(rq);
+	if (q->next_ordered != q->ordered)
+		changed = !blk_queue_ordered_locked(q, q->next_ordered,
+						    q->next_prepare_flush_fn,
+						    GFP_ATOMIC);
 
-	elv_deactivate_request(q, rq);
+	if (!q->orderr || !changed || q->ordered == QUEUE_ORDERED_NONE) {
+		end_that_request_first(rq, uptodate, rq->hard_nr_sectors);
+		end_that_request_last(rq, uptodate);
+	} else
+		blk_requeue_request(q, rq);
+}
 
-	flush_rq->end_io_data = rq;
-	flush_rq->end_io = blk_pre_flush_end_io;
+static void pre_flush_end_io(struct request *rq, int error)
+{
+	blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_PREFLUSH, error);
+}
 
-	__elv_add_request(q, flush_rq, ELEVATOR_INSERT_FRONT, 0);
-	return flush_rq;
+static void bar_end_io(struct request *rq, int error)
+{
+	blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_BAR, error);
 }
 
-static void blk_start_post_flush(request_queue_t *q, struct request *rq)
+static void post_flush_end_io(struct request *rq, int error)
 {
-	struct request *flush_rq = q->flush_rq;
+	blk_ordered_complete_seq(rq->q, QUEUE_ORDSEQ_POSTFLUSH, error);
+}
 
-	BUG_ON(!blk_barrier_rq(rq));
+static void queue_flush(request_queue_t *q, unsigned which)
+{
+	struct request *rq;
+	rq_end_io_fn *end_io;
 
-	rq_init(q, flush_rq);
-	flush_rq->elevator_private = NULL;
-	flush_rq->flags = REQ_BAR_FLUSH;
-	flush_rq->rq_disk = rq->rq_disk;
-	flush_rq->rl = NULL;
+	if (which == QUEUE_ORDERED_PREFLUSH) {
+		rq = q->pre_flush_rq;
+		end_io = pre_flush_end_io;
+	} else {
+		rq = q->post_flush_rq;
+		end_io = post_flush_end_io;
+	}
 
-	if (q->prepare_flush_fn(q, flush_rq)) {
-		flush_rq->end_io_data = rq;
-		flush_rq->end_io = blk_post_flush_end_io;
+	rq_init(q, rq);
+	rq->flags = REQ_HARDBARRIER;
+	rq->elevator_private = NULL;
+	rq->rq_disk = q->bar_rq->rq_disk;
+	rq->rl = NULL;
+	rq->end_io = end_io;
+	q->prepare_flush_fn(q, rq);
 
-		__elv_add_request(q, flush_rq, ELEVATOR_INSERT_FRONT, 0);
-		q->request_fn(q);
-	}
+	__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
 }
 
-static inline int blk_check_end_barrier(request_queue_t *q, struct request *rq,
-					int sectors)
+static inline struct request *start_ordered(request_queue_t *q,
+					    struct request *rq)
 {
-	if (sectors > rq->nr_sectors)
-		sectors = rq->nr_sectors;
+	q->bi_size = 0;
+	q->orderr = 0;
+	q->ordseq |= QUEUE_ORDSEQ_STARTED;
 
-	rq->nr_sectors -= sectors;
-	return rq->nr_sectors;
-}
+	/*
+	 * Prep proxy barrier request.
+	 */
+	blkdev_dequeue_request(rq);
+	q->orig_bar_rq = rq;
+	rq = q->bar_rq;
+	rq_init(q, rq);
+	rq->flags = bio_data_dir(q->orig_bar_rq->bio);
+	rq->flags |= q->ordered & QUEUE_ORDERED_FUA ? REQ_FUA : 0;
+	rq->elevator_private = NULL;
+	rq->rl = NULL;
+	init_request_from_bio(rq, q->orig_bar_rq->bio);
+	rq->end_io = bar_end_io;
 
-static int __blk_complete_barrier_rq(request_queue_t *q, struct request *rq,
-				     int sectors, int queue_locked)
-{
-	if (q->ordered != QUEUE_ORDERED_FLUSH)
-		return 0;
-	if (!blk_fs_request(rq) || !blk_barrier_rq(rq))
-		return 0;
-	if (blk_barrier_postflush(rq))
-		return 0;
+	/*
+	 * Queue ordered sequence.  As we stack them at the head, we
+	 * need to queue in reverse order.  Note that we rely on that
+	 * no fs request uses ELEVATOR_INSERT_FRONT and thus no fs
+	 * request gets inbetween ordered sequence.
+	 */
+	if (q->ordered & QUEUE_ORDERED_POSTFLUSH)
+		queue_flush(q, QUEUE_ORDERED_POSTFLUSH);
+	else
+		q->ordseq |= QUEUE_ORDSEQ_POSTFLUSH;
 
-	if (!blk_check_end_barrier(q, rq, sectors)) {
-		unsigned long flags = 0;
+	__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
 
-		if (!queue_locked)
-			spin_lock_irqsave(q->queue_lock, flags);
+	if (q->ordered & QUEUE_ORDERED_PREFLUSH) {
+		queue_flush(q, QUEUE_ORDERED_PREFLUSH);
+		rq = q->pre_flush_rq;
+	} else
+		q->ordseq |= QUEUE_ORDSEQ_PREFLUSH;
 
-		blk_start_post_flush(q, rq);
+	if ((q->ordered & QUEUE_ORDERED_TAG) || q->in_flight == 0)
+		q->ordseq |= QUEUE_ORDSEQ_DRAIN;
+	else
+		rq = NULL;
 
-		if (!queue_locked)
-			spin_unlock_irqrestore(q->queue_lock, flags);
+	return rq;
+}
+
+int blk_do_ordered(request_queue_t *q, struct request **rqp)
+{
+	struct request *rq = *rqp, *allowed_rq;
+	int is_barrier = blk_fs_request(rq) && blk_barrier_rq(rq);
+
+	if (!q->ordseq) {
+		if (!is_barrier)
+			return 1;
+
+		if (q->ordered != QUEUE_ORDERED_NONE) {
+			*rqp = start_ordered(q, rq);
+			return 1;
+		} else {
+			/*
+			 * This can happen when the queue switches to
+			 * ORDERED_NONE while this request is on it.
+			 */
+			blkdev_dequeue_request(rq);
+			end_that_request_first(rq, -EOPNOTSUPP,
+					       rq->hard_nr_sectors);
+			end_that_request_last(rq, -EOPNOTSUPP);
+			*rqp = NULL;
+			return 0;
+		}
+	}
+
+	if (q->ordered & QUEUE_ORDERED_TAG) {
+		if (blk_fs_request(rq) && rq != q->bar_rq)
+			*rqp = NULL;
+		return 1;
 	}
 
+	switch (blk_ordered_cur_seq(q)) {
+	case QUEUE_ORDSEQ_PREFLUSH:
+		allowed_rq = q->pre_flush_rq;
+		break;
+	case QUEUE_ORDSEQ_BAR:
+		allowed_rq = q->bar_rq;
+		break;
+	case QUEUE_ORDSEQ_POSTFLUSH:
+		allowed_rq = q->post_flush_rq;
+		break;
+	default:
+		allowed_rq = NULL;
+		break;
+	}
+
+	if (rq != allowed_rq && (blk_fs_request(rq) || rq == q->pre_flush_rq ||
+				 rq == q->post_flush_rq))
+		*rqp = NULL;
+
 	return 1;
 }
 
-/**
- * blk_complete_barrier_rq - complete possible barrier request
- * @q:  the request queue for the device
- * @rq:  the request
- * @sectors:  number of sectors to complete
- *
- * Description:
- *   Used in driver end_io handling to determine whether to postpone
- *   completion of a barrier request until a post flush has been done. This
- *   is the unlocked variant, used if the caller doesn't already hold the
- *   queue lock.
- **/
-int blk_complete_barrier_rq(request_queue_t *q, struct request *rq, int sectors)
+static int flush_dry_bio_endio(struct bio *bio, unsigned int bytes, int error)
 {
-	return __blk_complete_barrier_rq(q, rq, sectors, 0);
+	request_queue_t *q = bio->bi_private;
+	struct bio_vec *bvec;
+	int i;
+
+	/*
+	 * This is dry run, restore bio_sector and size.  We'll finish
+	 * this request again with the original bi_end_io after an
+	 * error occurs or post flush is complete.
+	 */
+	q->bi_size += bytes;
+
+	if (bio->bi_size)
+		return 1;
+
+	/* Rewind bvec's */
+	bio->bi_idx = 0;
+	bio_for_each_segment(bvec, bio, i) {
+		bvec->bv_len += bvec->bv_offset;
+		bvec->bv_offset = 0;
+	}
+
+	/* Reset bio */
+	set_bit(BIO_UPTODATE, &bio->bi_flags);
+	bio->bi_size = q->bi_size;
+	bio->bi_sector -= (q->bi_size >> 9);
+	q->bi_size = 0;
+
+	return 0;
 }
-EXPORT_SYMBOL(blk_complete_barrier_rq);
 
-/**
- * blk_complete_barrier_rq_locked - complete possible barrier request
- * @q:  the request queue for the device
- * @rq:  the request
- * @sectors:  number of sectors to complete
- *
- * Description:
- *   See blk_complete_barrier_rq(). This variant must be used if the caller
- *   holds the queue lock.
- **/
-int blk_complete_barrier_rq_locked(request_queue_t *q, struct request *rq,
-				   int sectors)
+static inline int ordered_bio_endio(struct request *rq, struct bio *bio,
+				    unsigned int nbytes, int error)
 {
-	return __blk_complete_barrier_rq(q, rq, sectors, 1);
+	request_queue_t *q = rq->q;
+	bio_end_io_t *endio;
+	void *private;
+
+	if (q->bar_rq != rq)
+		return 0;
+
+	/*
+	 * Okay, this is the barrier request in progress, dry finish it.
+	 */
+	if (error && !q->orderr)
+		ordered_set_error(q, QUEUE_ORDSEQ_BAR, error);
+
+	endio = bio->bi_end_io;
+	private = bio->bi_private;
+	bio->bi_end_io = flush_dry_bio_endio;
+	bio->bi_private = q;
+
+	bio_endio(bio, nbytes, error);
+
+	bio->bi_end_io = endio;
+	bio->bi_private = private;
+
+	return 1;
 }
-EXPORT_SYMBOL(blk_complete_barrier_rq_locked);
 
 /**
  * blk_queue_bounce_limit - set bounce buffer limit for queue
@@ -1031,6 +1241,7 @@ static char *rq_flags[] = {
 	"REQ_FAILFAST",
 	"REQ_SOFTBARRIER",
 	"REQ_HARDBARRIER",
+	"REQ_FUA",
 	"REQ_CMD",
 	"REQ_NOMERGE",
 	"REQ_STARTED",
@@ -1614,7 +1825,7 @@ void blk_cleanup_queue(request_queue_t *
 	if (q->queue_tags)
 		__blk_queue_free_tags(q);
 
-	blk_queue_ordered(q, QUEUE_ORDERED_NONE);
+	blk_queue_ordered(q, QUEUE_ORDERED_NONE, NULL, 0);
 
 	kmem_cache_free(requestq_cachep, q);
 }
@@ -3034,7 +3245,8 @@ static int __end_that_request_first(stru
 		if (nr_bytes >= bio->bi_size) {
 			req->bio = bio->bi_next;
 			nbytes = bio->bi_size;
-			bio_endio(bio, nbytes, error);
+			if (!ordered_bio_endio(req, bio, nbytes, error))
+				bio_endio(bio, nbytes, error);
 			next_idx = 0;
 			bio_nbytes = 0;
 		} else {
@@ -3089,7 +3301,8 @@ static int __end_that_request_first(stru
 	 * if the request wasn't completed, update state
 	 */
 	if (bio_nbytes) {
-		bio_endio(bio, bio_nbytes, error);
+		if (!ordered_bio_endio(req, bio, bio_nbytes, error))
+			bio_endio(bio, bio_nbytes, error);
 		bio->bi_idx += next_idx;
 		bio_iovec(bio)->bv_offset += nr_bytes;
 		bio_iovec(bio)->bv_len -= nr_bytes;
Index: blk-fixes/include/linux/blkdev.h
===================================================================
--- blk-fixes.orig/include/linux/blkdev.h	2005-06-05 14:53:32.000000000 +0900
+++ blk-fixes/include/linux/blkdev.h	2005-06-05 14:53:34.000000000 +0900
@@ -205,6 +205,7 @@ enum rq_flag_bits {
 	__REQ_FAILFAST,		/* no low level driver retries */
 	__REQ_SOFTBARRIER,	/* may not be passed by ioscheduler */
 	__REQ_HARDBARRIER,	/* may not be passed by drive either */
+	__REQ_FUA,		/* forced unit access */
 	__REQ_CMD,		/* is a regular fs rw request */
 	__REQ_NOMERGE,		/* don't touch this for merging */
 	__REQ_STARTED,		/* drive already may have started this one */
@@ -227,9 +228,6 @@ enum rq_flag_bits {
 	__REQ_PM_SUSPEND,	/* suspend request */
 	__REQ_PM_RESUME,	/* resume request */
 	__REQ_PM_SHUTDOWN,	/* shutdown request */
-	__REQ_BAR_PREFLUSH,	/* barrier pre-flush done */
-	__REQ_BAR_POSTFLUSH,	/* barrier post-flush */
-	__REQ_BAR_FLUSH,	/* rq is the flush request */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -237,6 +235,7 @@ enum rq_flag_bits {
 #define REQ_FAILFAST	(1 << __REQ_FAILFAST)
 #define REQ_SOFTBARRIER	(1 << __REQ_SOFTBARRIER)
 #define REQ_HARDBARRIER	(1 << __REQ_HARDBARRIER)
+#define REQ_FUA		(1 << __REQ_FUA)
 #define REQ_CMD		(1 << __REQ_CMD)
 #define REQ_NOMERGE	(1 << __REQ_NOMERGE)
 #define REQ_STARTED	(1 << __REQ_STARTED)
@@ -255,9 +254,6 @@ enum rq_flag_bits {
 #define REQ_PM_SUSPEND	(1 << __REQ_PM_SUSPEND)
 #define REQ_PM_RESUME	(1 << __REQ_PM_RESUME)
 #define REQ_PM_SHUTDOWN	(1 << __REQ_PM_SHUTDOWN)
-#define REQ_BAR_PREFLUSH	(1 << __REQ_BAR_PREFLUSH)
-#define REQ_BAR_POSTFLUSH	(1 << __REQ_BAR_POSTFLUSH)
-#define REQ_BAR_FLUSH	(1 << __REQ_BAR_FLUSH)
 
 /*
  * State information carried for REQ_PM_SUSPEND and REQ_PM_RESUME
@@ -287,8 +283,7 @@ struct bio_vec;
 typedef int (merge_bvec_fn) (request_queue_t *, struct bio *, struct bio_vec *);
 typedef void (activity_fn) (void *data, int rw);
 typedef int (issue_flush_fn) (request_queue_t *, struct gendisk *, sector_t *);
-typedef int (prepare_flush_fn) (request_queue_t *, struct request *);
-typedef void (end_flush_fn) (request_queue_t *, struct request *);
+typedef void (prepare_flush_fn) (request_queue_t *, struct request *);
 
 enum blk_queue_state {
 	Queue_down,
@@ -329,7 +324,6 @@ struct request_queue
 	activity_fn		*activity_fn;
 	issue_flush_fn		*issue_flush_fn;
 	prepare_flush_fn	*prepare_flush_fn;
-	end_flush_fn		*end_flush_fn;
 
 	/*
 	 * Auto-unplugging state
@@ -409,14 +403,13 @@ struct request_queue
 	/*
 	 * reserved for flush operations
 	 */
-	struct request		*flush_rq;
-	unsigned char		ordered;
-};
-
-enum {
-	QUEUE_ORDERED_NONE,
-	QUEUE_ORDERED_TAG,
-	QUEUE_ORDERED_FLUSH,
+	unsigned int		ordseq;
+	int			orderr;
+	struct request		*pre_flush_rq, *bar_rq, *post_flush_rq;
+	struct request		*orig_bar_rq;
+	unsigned int		bi_size;
+	unsigned int		ordered, next_ordered;
+	prepare_flush_fn	*next_prepare_flush_fn;
 };
 
 #define RQ_INACTIVE		(-1)
@@ -434,19 +427,67 @@ enum {
 #define QUEUE_FLAG_REENTER	6	/* Re-entrancy avoidance */
 #define QUEUE_FLAG_PLUGGED	7	/* queue is plugged */
 #define QUEUE_FLAG_DRAIN	8	/* draining queue for sched switch */
-#define QUEUE_FLAG_FLUSH	9	/* doing barrier flush sequence */
+
+enum {
+	/*
+	 * Hardbarrier is supported with one of the following methods.
+	 *
+	 * NONE		: hardbarrier unsupported
+	 * DRAIN	: ordering by draining is enough
+	 * DRAIN_FLUSH	: ordering by draining w/ pre and post flushes
+	 * DRAIN_FUA	: ordering by draining w/ pre flush and FUA write
+	 * TAG		: ordering by tag is enough
+	 * TAG_FLUSH	: ordering by tag w/ pre and post flushes
+	 * TAG_FUA	: ordering by tag w/ pre flush and FUA write
+	 *
+	 * The following flags can be or'd.
+	 *
+	 * NOFALLBACK	: disable auto-fallback
+	 */
+	QUEUE_ORDERED_NONE	= 0x00,
+	QUEUE_ORDERED_DRAIN	= 0x01,
+	QUEUE_ORDERED_TAG	= 0x02,
+
+	QUEUE_ORDERED_PREFLUSH	= 0x10,
+	QUEUE_ORDERED_POSTFLUSH	= 0x20,
+	QUEUE_ORDERED_FUA	= 0x40,
+
+	QUEUE_ORDERED_NOFALLBACK = 0x80,
+
+	QUEUE_ORDERED_FLAGS	= QUEUE_ORDERED_NOFALLBACK,
+
+	QUEUE_ORDERED_DRAIN_FLUSH = QUEUE_ORDERED_DRAIN |
+			QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_POSTFLUSH,
+	QUEUE_ORDERED_DRAIN_FUA	= QUEUE_ORDERED_DRAIN |
+			QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_FUA,
+	QUEUE_ORDERED_TAG_FLUSH	= QUEUE_ORDERED_TAG |
+			QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_POSTFLUSH,
+	QUEUE_ORDERED_TAG_FUA	= QUEUE_ORDERED_TAG |
+			QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_FUA,
+
+	/*
+	 * Ordered operation sequence
+	 */
+	QUEUE_ORDSEQ_STARTED	= 0x01,	/* flushing in progress */
+	QUEUE_ORDSEQ_DRAIN	= 0x02,	/* waiting for the queue to be drained */
+	QUEUE_ORDSEQ_PREFLUSH	= 0x04,	/* pre-flushing in progress */
+	QUEUE_ORDSEQ_BAR	= 0x08,	/* original barrier req in progress */
+	QUEUE_ORDSEQ_POSTFLUSH	= 0x10,	/* post-flushing in progress */
+	QUEUE_ORDSEQ_DONE	= 0x20,
+};
 
 #define blk_queue_plugged(q)	test_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags)
 #define blk_queue_tagged(q)	test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags)
 #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
-#define blk_queue_flushing(q)	test_bit(QUEUE_FLAG_FLUSH, &(q)->queue_flags)
+#define blk_queue_flushing(q)	((q)->ordseq)
 
 #define blk_fs_request(rq)	((rq)->flags & REQ_CMD)
 #define blk_pc_request(rq)	((rq)->flags & REQ_BLOCK_PC)
 #define blk_noretry_request(rq)	((rq)->flags & REQ_FAILFAST)
 #define blk_rq_started(rq)	((rq)->flags & REQ_STARTED)
 
-#define blk_account_rq(rq)	(blk_rq_started(rq) && blk_fs_request(rq))
+#define blk_account_rq(rq)	\
+	(blk_rq_started(rq) && blk_fs_request(rq) && rq != rq->q->bar_rq)
 
 #define blk_pm_suspend_request(rq)	((rq)->flags & REQ_PM_SUSPEND)
 #define blk_pm_resume_request(rq)	((rq)->flags & REQ_PM_RESUME)
@@ -454,8 +495,7 @@ enum {
 	((rq)->flags & (REQ_PM_SUSPEND | REQ_PM_RESUME))
 
 #define blk_barrier_rq(rq)	((rq)->flags & REQ_HARDBARRIER)
-#define blk_barrier_preflush(rq)	((rq)->flags & REQ_BAR_PREFLUSH)
-#define blk_barrier_postflush(rq)	((rq)->flags & REQ_BAR_POSTFLUSH)
+#define blk_fua_rq(rq)		((rq)->flags & REQ_FUA)
 
 #define list_entry_rq(ptr)	list_entry((ptr), struct request, queuelist)
 
@@ -636,11 +676,12 @@ extern void blk_queue_prep_rq(request_qu
 extern void blk_queue_merge_bvec(request_queue_t *, merge_bvec_fn *);
 extern void blk_queue_dma_alignment(request_queue_t *, int);
 extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev);
-extern void blk_queue_ordered(request_queue_t *, int);
+extern int blk_queue_ordered(request_queue_t *, unsigned, prepare_flush_fn *, unsigned);
+extern int blk_queue_ordered_locked(request_queue_t *, unsigned, prepare_flush_fn *, unsigned);
 extern void blk_queue_issue_flush_fn(request_queue_t *, issue_flush_fn *);
-extern struct request *blk_start_pre_flush(request_queue_t *,struct request *);
-extern int blk_complete_barrier_rq(request_queue_t *, struct request *, int);
-extern int blk_complete_barrier_rq_locked(request_queue_t *, struct request *, int);
+extern int blk_do_ordered(request_queue_t *, struct request **);
+extern unsigned blk_ordered_cur_seq(request_queue_t *);
+extern void blk_ordered_complete_seq(request_queue_t *, unsigned, int);
 
 extern int blk_rq_map_sg(request_queue_t *, struct request *, struct scatterlist *);
 extern void blk_dump_rq_flags(struct request *, char *);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 06/09] blk: update SCSI to use the new blk_ordered
  2005-06-05  5:57 [PATCH Linux 2.6.12-rc5-mm2 00/09] blk: ordered request reimplementation (take 2, for review) Tejun Heo
                   ` (4 preceding siblings ...)
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 05/09] blk: reimplement handling of barrier request Tejun Heo
@ 2005-06-05  5:57 ` Tejun Heo
  2005-06-05  7:08   ` Jeff Garzik
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 07/09] blk: update libata " Tejun Heo
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2005-06-05  5:57 UTC (permalink / raw)
  To: axboe, James.Bottomley, bzolnier, jgarzik; +Cc: linux-kernel

06_blk_update_scsi_to_use_new_ordered.patch

	All ordered request related stuff delegated to HLD.  Midlayer
	now doens't deal with ordered setting or prepare_flush
	callback.  sd.c updated to deal with blk_queue_ordered
	setting.  Currently, ordered tag isn't used as SCSI midlayer
	cannot guarantee request ordering.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 drivers/scsi/hosts.c       |    9 -----
 drivers/scsi/scsi_lib.c    |   46 --------------------------
 drivers/scsi/sd.c          |   79 +++++++++++++++++++--------------------------
 include/scsi/scsi_driver.h |    1 
 include/scsi/scsi_host.h   |    1 
 5 files changed, 34 insertions(+), 102 deletions(-)

Index: blk-fixes/drivers/scsi/sd.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sd.c	2005-06-05 14:53:32.000000000 +0900
+++ blk-fixes/drivers/scsi/sd.c	2005-06-05 14:53:35.000000000 +0900
@@ -103,6 +103,7 @@ struct scsi_disk {
 	u8		write_prot;
 	unsigned	WCE : 1;	/* state of disk WCE bit */
 	unsigned	RCD : 1;	/* state of disk RCD bit, unused */
+	unsigned	DPOFUA : 1;	/* state of disk DPOFUA bit */
 };
 
 static DEFINE_IDR(sd_index_idr);
@@ -122,8 +123,7 @@ static void sd_shutdown(struct device *d
 static void sd_rescan(struct device *);
 static int sd_init_command(struct scsi_cmnd *);
 static int sd_issue_flush(struct device *, sector_t *);
-static void sd_end_flush(request_queue_t *, struct request *);
-static int sd_prepare_flush(request_queue_t *, struct request *);
+static void sd_prepare_flush(request_queue_t *, struct request *);
 static void sd_read_capacity(struct scsi_disk *sdkp, char *diskname,
 		 struct scsi_request *SRpnt, unsigned char *buffer);
 
@@ -138,8 +138,6 @@ static struct scsi_driver sd_template = 
 	.rescan			= sd_rescan,
 	.init_command		= sd_init_command,
 	.issue_flush		= sd_issue_flush,
-	.prepare_flush		= sd_prepare_flush,
-	.end_flush		= sd_end_flush,
 };
 
 /*
@@ -346,6 +344,7 @@ static int sd_init_command(struct scsi_c
 	
 	if (block > 0xffffffff) {
 		SCpnt->cmnd[0] += READ_16 - READ_6;
+		SCpnt->cmnd[1] |= blk_fua_rq(rq) ? 0x8 : 0;
 		SCpnt->cmnd[2] = sizeof(block) > 4 ? (unsigned char) (block >> 56) & 0xff : 0;
 		SCpnt->cmnd[3] = sizeof(block) > 4 ? (unsigned char) (block >> 48) & 0xff : 0;
 		SCpnt->cmnd[4] = sizeof(block) > 4 ? (unsigned char) (block >> 40) & 0xff : 0;
@@ -360,11 +359,12 @@ static int sd_init_command(struct scsi_c
 		SCpnt->cmnd[13] = (unsigned char) this_count & 0xff;
 		SCpnt->cmnd[14] = SCpnt->cmnd[15] = 0;
 	} else if ((this_count > 0xff) || (block > 0x1fffff) ||
-		   SCpnt->device->use_10_for_rw) {
+		   SCpnt->device->use_10_for_rw || blk_fua_rq(rq)) {
 		if (this_count > 0xffff)
 			this_count = 0xffff;
 
 		SCpnt->cmnd[0] += READ_10 - READ_6;
+		SCpnt->cmnd[1] |= blk_fua_rq(rq) ? 0x8 : 0;
 		SCpnt->cmnd[2] = (unsigned char) (block >> 24) & 0xff;
 		SCpnt->cmnd[3] = (unsigned char) (block >> 16) & 0xff;
 		SCpnt->cmnd[4] = (unsigned char) (block >> 8) & 0xff;
@@ -739,43 +739,12 @@ static int sd_issue_flush(struct device 
 	return sd_sync_cache(sdp);
 }
 
-static void sd_end_flush(request_queue_t *q, struct request *flush_rq)
+static void sd_prepare_flush(request_queue_t *q, struct request *rq)
 {
-	struct request *rq = flush_rq->end_io_data;
-	struct scsi_cmnd *cmd = rq->special;
-	unsigned int bytes = rq->hard_nr_sectors << 9;
-
-	if (!flush_rq->errors) {
-		spin_unlock(q->queue_lock);
-		scsi_io_completion(cmd, bytes, 0);
-		spin_lock(q->queue_lock);
-	} else if (blk_barrier_postflush(rq)) {
-		spin_unlock(q->queue_lock);
-		scsi_io_completion(cmd, 0, bytes);
-		spin_lock(q->queue_lock);
-	} else {
-		/*
-		 * force journal abort of barriers
-		 */
-		end_that_request_first(rq, -EOPNOTSUPP, rq->hard_nr_sectors);
-		end_that_request_last(rq, -EOPNOTSUPP);
-	}
-}
-
-static int sd_prepare_flush(request_queue_t *q, struct request *rq)
-{
-	struct scsi_device *sdev = q->queuedata;
-	struct scsi_disk *sdkp = dev_get_drvdata(&sdev->sdev_gendev);
-
-	if (sdkp->WCE) {
-		memset(rq->cmd, 0, sizeof(rq->cmd));
-		rq->flags |= REQ_BLOCK_PC | REQ_SOFTBARRIER;
-		rq->timeout = SD_TIMEOUT;
-		rq->cmd[0] = SYNCHRONIZE_CACHE;
-		return 1;
-	}
-
-	return 0;
+	memset(rq->cmd, 0, sizeof(rq->cmd));
+	rq->flags |= REQ_BLOCK_PC | REQ_SOFTBARRIER;
+	rq->timeout = SD_TIMEOUT;
+	rq->cmd[0] = SYNCHRONIZE_CACHE;
 }
 
 static void sd_rescan(struct device *dev)
@@ -1433,10 +1402,13 @@ sd_read_cache_type(struct scsi_disk *sdk
 			sdkp->RCD = 0;
 		}
 
+		sdkp->DPOFUA = (data.device_specific & 0x10) != 0;
+
 		ct =  sdkp->RCD + 2*sdkp->WCE;
 
-		printk(KERN_NOTICE "SCSI device %s: drive cache: %s\n",
-		       diskname, types[ct]);
+		printk(KERN_NOTICE "SCSI device %s: drive cache: %s%s\n",
+		       diskname, types[ct],
+		       sdkp->DPOFUA ? " with forced unit access" : "");
 
 		return;
 	}
@@ -1469,6 +1441,7 @@ static int sd_revalidate_disk(struct gen
 	struct scsi_device *sdp = sdkp->device;
 	struct scsi_request *sreq;
 	unsigned char *buffer;
+	unsigned ordered;
 
 	SCSI_LOG_HLQUEUE(3, printk("sd_revalidate_disk: disk=%s\n", disk->disk_name));
 
@@ -1514,7 +1487,22 @@ static int sd_revalidate_disk(struct gen
 					sreq, buffer);
 		sd_read_cache_type(sdkp, disk->disk_name, sreq, buffer);
 	}
-		
+
+	/*
+	 * We now have all cache related info, determine how we deal
+	 * with ordered requests.  Note that as the current SCSI
+	 * dispatch function can alter request order, we cannot use
+	 * QUEUE_ORDERED_TAG_* even when ordered tag is supported.
+	 */
+	if (sdkp->WCE)
+		ordered = sdkp->DPOFUA
+			? QUEUE_ORDERED_DRAIN_FUA : QUEUE_ORDERED_DRAIN_FLUSH;
+	else
+		ordered = QUEUE_ORDERED_DRAIN;
+
+	blk_queue_ordered(sdkp->disk->queue, ordered, sd_prepare_flush,
+			  GFP_KERNEL);
+
 	set_capacity(disk, sdkp->capacity);
 	kfree(buffer);
 
@@ -1615,6 +1603,7 @@ static int sd_probe(struct device *dev)
 	strcpy(gd->devfs_name, sdp->devfs_name);
 
 	gd->private_data = &sdkp->driver;
+	gd->queue = sdkp->device->request_queue;
 
 	sd_revalidate_disk(gd);
 
@@ -1622,7 +1611,6 @@ static int sd_probe(struct device *dev)
 	gd->flags = GENHD_FL_DRIVERFS;
 	if (sdp->removable)
 		gd->flags |= GENHD_FL_REMOVABLE;
-	gd->queue = sdkp->device->request_queue;
 
 	dev_set_drvdata(dev, sdkp);
 	add_disk(gd);
@@ -1657,6 +1645,7 @@ static int sd_remove(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
 
+	blk_queue_ordered(sdkp->disk->queue, QUEUE_ORDERED_NONE, NULL, 0);
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
 	down(&sd_ref_sem);
Index: blk-fixes/drivers/scsi/scsi_lib.c
===================================================================
--- blk-fixes.orig/drivers/scsi/scsi_lib.c	2005-06-05 14:53:33.000000000 +0900
+++ blk-fixes/drivers/scsi/scsi_lib.c	2005-06-05 14:53:35.000000000 +0900
@@ -717,9 +717,6 @@ void scsi_io_completion(struct scsi_cmnd
 	int sense_valid = 0;
 	int sense_deferred = 0;
 
-	if (blk_complete_barrier_rq(q, req, good_bytes >> 9))
-		return;
-
 	/*
 	 * Free up any indirection buffers we allocated for DMA purposes. 
 	 * For the case of a READ, we need to copy the data out of the
@@ -984,38 +981,6 @@ static int scsi_init_io(struct scsi_cmnd
 	return BLKPREP_KILL;
 }
 
-static int scsi_prepare_flush_fn(request_queue_t *q, struct request *rq)
-{
-	struct scsi_device *sdev = q->queuedata;
-	struct scsi_driver *drv;
-
-	if (sdev->sdev_state == SDEV_RUNNING) {
-		drv = *(struct scsi_driver **) rq->rq_disk->private_data;
-
-		if (drv->prepare_flush)
-			return drv->prepare_flush(q, rq);
-	}
-
-	return 0;
-}
-
-static void scsi_end_flush_fn(request_queue_t *q, struct request *rq)
-{
-	struct scsi_device *sdev = q->queuedata;
-	struct request *flush_rq = rq->end_io_data;
-	struct scsi_driver *drv;
-
-	if (flush_rq->errors) {
-		printk("scsi: barrier error, disabling flush support\n");
-		blk_queue_ordered(q, QUEUE_ORDERED_NONE);
-	}
-
-	if (sdev->sdev_state == SDEV_RUNNING) {
-		drv = *(struct scsi_driver **) rq->rq_disk->private_data;
-		drv->end_flush(q, rq);
-	}
-}
-
 static int scsi_issue_flush_fn(request_queue_t *q, struct gendisk *disk,
 			       sector_t *error_sector)
 {
@@ -1443,17 +1408,6 @@ struct request_queue *scsi_alloc_queue(s
 	blk_queue_segment_boundary(q, shost->dma_boundary);
 	blk_queue_issue_flush_fn(q, scsi_issue_flush_fn);
 
-	/*
-	 * ordered tags are superior to flush ordering
-	 */
-	if (shost->ordered_tag)
-		blk_queue_ordered(q, QUEUE_ORDERED_TAG);
-	else if (shost->ordered_flush) {
-		blk_queue_ordered(q, QUEUE_ORDERED_FLUSH);
-		q->prepare_flush_fn = scsi_prepare_flush_fn;
-		q->end_flush_fn = scsi_end_flush_fn;
-	}
-
 	if (!shost->use_clustering)
 		clear_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags);
 	return q;
Index: blk-fixes/drivers/scsi/hosts.c
===================================================================
--- blk-fixes.orig/drivers/scsi/hosts.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/hosts.c	2005-06-05 14:53:35.000000000 +0900
@@ -264,17 +264,8 @@ struct Scsi_Host *scsi_host_alloc(struct
 	shost->cmd_per_lun = sht->cmd_per_lun;
 	shost->unchecked_isa_dma = sht->unchecked_isa_dma;
 	shost->use_clustering = sht->use_clustering;
-	shost->ordered_flush = sht->ordered_flush;
 	shost->ordered_tag = sht->ordered_tag;
 
-	/*
-	 * hosts/devices that do queueing must support ordered tags
-	 */
-	if (shost->can_queue > 1 && shost->ordered_flush) {
-		printk(KERN_ERR "scsi: ordered flushes don't support queueing\n");
-		shost->ordered_flush = 0;
-	}
-
 	if (sht->max_host_blocked)
 		shost->max_host_blocked = sht->max_host_blocked;
 	else
Index: blk-fixes/include/scsi/scsi_host.h
===================================================================
--- blk-fixes.orig/include/scsi/scsi_host.h	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/include/scsi/scsi_host.h	2005-06-05 14:53:35.000000000 +0900
@@ -391,7 +391,6 @@ struct scsi_host_template {
 	/*
 	 * ordered write support
 	 */
-	unsigned ordered_flush:1;
 	unsigned ordered_tag:1;
 
 	/*
Index: blk-fixes/include/scsi/scsi_driver.h
===================================================================
--- blk-fixes.orig/include/scsi/scsi_driver.h	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/include/scsi/scsi_driver.h	2005-06-05 14:53:35.000000000 +0900
@@ -15,7 +15,6 @@ struct scsi_driver {
 	void (*rescan)(struct device *);
 	int (*issue_flush)(struct device *, sector_t *);
 	int (*prepare_flush)(struct request_queue *, struct request *);
-	void (*end_flush)(struct request_queue *, struct request *);
 };
 #define to_scsi_driver(drv) \
 	container_of((drv), struct scsi_driver, gendrv)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 07/09] blk: update libata to use the new blk_ordered.
  2005-06-05  5:57 [PATCH Linux 2.6.12-rc5-mm2 00/09] blk: ordered request reimplementation (take 2, for review) Tejun Heo
                   ` (5 preceding siblings ...)
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 06/09] blk: update SCSI to use the new blk_ordered Tejun Heo
@ 2005-06-05  5:57 ` Tejun Heo
  2005-06-05  7:02   ` Jeff Garzik
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 08/09] blk: update IDE " Tejun Heo
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 09/09] blk: debug messages Tejun Heo
  8 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2005-06-05  5:57 UTC (permalink / raw)
  To: axboe, James.Bottomley, bzolnier, jgarzik; +Cc: linux-kernel

07_blk_update_libata_to_use_new_ordered.patch

	Reflect changes in SCSI midlayer and updated to use the new
	ordered request implementation.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 drivers/scsi/ahci.c         |    1 -
 drivers/scsi/ata_piix.c     |    1 -
 drivers/scsi/libata-core.c  |   14 +++++++++-----
 drivers/scsi/libata-scsi.c  |   22 +++++++++++++++++++++-
 drivers/scsi/sata_nv.c      |    1 -
 drivers/scsi/sata_promise.c |    1 -
 drivers/scsi/sata_sil.c     |    1 -
 drivers/scsi/sata_sis.c     |    1 -
 drivers/scsi/sata_svw.c     |    1 -
 drivers/scsi/sata_sx4.c     |    1 -
 drivers/scsi/sata_uli.c     |    1 -
 drivers/scsi/sata_via.c     |    1 -
 drivers/scsi/sata_vsc.c     |    1 -
 include/linux/ata.h         |    4 +++-
 include/linux/libata.h      |    1 +
 15 files changed, 34 insertions(+), 18 deletions(-)

Index: blk-fixes/drivers/scsi/ahci.c
===================================================================
--- blk-fixes.orig/drivers/scsi/ahci.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/ahci.c	2005-06-05 14:53:35.000000000 +0900
@@ -203,7 +203,6 @@ static Scsi_Host_Template ahci_sht = {
 	.dma_boundary		= AHCI_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations ahci_ops = {
Index: blk-fixes/drivers/scsi/ata_piix.c
===================================================================
--- blk-fixes.orig/drivers/scsi/ata_piix.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/ata_piix.c	2005-06-05 14:53:35.000000000 +0900
@@ -123,7 +123,6 @@ static Scsi_Host_Template piix_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations piix_pata_ops = {
Index: blk-fixes/drivers/scsi/sata_nv.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_nv.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_nv.c	2005-06-05 14:53:35.000000000 +0900
@@ -206,7 +206,6 @@ static Scsi_Host_Template nv_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations nv_ops = {
Index: blk-fixes/drivers/scsi/sata_promise.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_promise.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_promise.c	2005-06-05 14:53:35.000000000 +0900
@@ -104,7 +104,6 @@ static Scsi_Host_Template pdc_ata_sht = 
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations pdc_ata_ops = {
Index: blk-fixes/drivers/scsi/sata_sil.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_sil.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_sil.c	2005-06-05 14:53:35.000000000 +0900
@@ -135,7 +135,6 @@ static Scsi_Host_Template sil_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations sil_ops = {
Index: blk-fixes/drivers/scsi/sata_sis.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_sis.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_sis.c	2005-06-05 14:53:35.000000000 +0900
@@ -90,7 +90,6 @@ static Scsi_Host_Template sis_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations sis_ops = {
Index: blk-fixes/drivers/scsi/sata_svw.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_svw.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_svw.c	2005-06-05 14:53:35.000000000 +0900
@@ -288,7 +288,6 @@ static Scsi_Host_Template k2_sata_sht = 
 	.proc_info		= k2_sata_proc_info,
 #endif
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 
Index: blk-fixes/drivers/scsi/sata_sx4.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_sx4.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_sx4.c	2005-06-05 14:53:35.000000000 +0900
@@ -188,7 +188,6 @@ static Scsi_Host_Template pdc_sata_sht =
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations pdc_20621_ops = {
Index: blk-fixes/drivers/scsi/sata_uli.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_uli.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_uli.c	2005-06-05 14:53:35.000000000 +0900
@@ -82,7 +82,6 @@ static Scsi_Host_Template uli_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations uli_ops = {
Index: blk-fixes/drivers/scsi/sata_via.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_via.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_via.c	2005-06-05 14:53:35.000000000 +0900
@@ -102,7 +102,6 @@ static Scsi_Host_Template svia_sht = {
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 static struct ata_port_operations svia_sata_ops = {
Index: blk-fixes/drivers/scsi/sata_vsc.c
===================================================================
--- blk-fixes.orig/drivers/scsi/sata_vsc.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/sata_vsc.c	2005-06-05 14:53:35.000000000 +0900
@@ -206,7 +206,6 @@ static Scsi_Host_Template vsc_sata_sht =
 	.dma_boundary		= ATA_DMA_BOUNDARY,
 	.slave_configure	= ata_scsi_slave_config,
 	.bios_param		= ata_std_bios_param,
-	.ordered_flush		= 1,
 };
 
 
Index: blk-fixes/drivers/scsi/libata-core.c
===================================================================
--- blk-fixes.orig/drivers/scsi/libata-core.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/libata-core.c	2005-06-05 14:53:35.000000000 +0900
@@ -510,19 +510,21 @@ void ata_tf_from_fis(u8 *fis, struct ata
 }
 
 /**
- *	ata_prot_to_cmd - determine which read/write opcodes to use
+ *	ata_prot_to_cmd - determine which read/write/fua-write opcodes to use
  *	@protocol: ATA_PROT_xxx taskfile protocol
  *	@lba48: true is lba48 is present
  *
- *	Given necessary input, determine which read/write commands
- *	to use to transfer data.
+ *	Given necessary input, determine which read/write/fua-write
+ *	commands to use to transfer data.  Note that we only support
+ *	fua-writes on DMA LBA48 protocol.  In other cases, we simply
+ *	return 0 which is NOP.
  *
  *	LOCKING:
  *	None.
  */
 static int ata_prot_to_cmd(int protocol, int lba48)
 {
-	int rcmd = 0, wcmd = 0;
+	int rcmd = 0, wcmd = 0, wfua = 0;
 
 	switch (protocol) {
 	case ATA_PROT_PIO:
@@ -539,6 +541,7 @@ static int ata_prot_to_cmd(int protocol,
 		if (lba48) {
 			rcmd = ATA_CMD_READ_EXT;
 			wcmd = ATA_CMD_WRITE_EXT;
+			wfua = ATA_CMD_WRITE_FUA_EXT;
 		} else {
 			rcmd = ATA_CMD_READ;
 			wcmd = ATA_CMD_WRITE;
@@ -549,7 +552,7 @@ static int ata_prot_to_cmd(int protocol,
 		return -1;
 	}
 
-	return rcmd | (wcmd << 8);
+	return rcmd | (wcmd << 8) | (wfua << 16);
 }
 
 /**
@@ -582,6 +585,7 @@ static void ata_dev_set_protocol(struct 
 
 	dev->read_cmd = cmd & 0xff;
 	dev->write_cmd = (cmd >> 8) & 0xff;
+	dev->write_fua_cmd = (cmd >> 16) & 0xff;
 }
 
 static const char * xfer_mode_str[] = {
Index: blk-fixes/drivers/scsi/libata-scsi.c
===================================================================
--- blk-fixes.orig/drivers/scsi/libata-scsi.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/scsi/libata-scsi.c	2005-06-05 14:53:35.000000000 +0900
@@ -569,6 +569,7 @@ static unsigned int ata_scsi_rw_xlat(str
 	struct ata_device *dev = qc->dev;
 	unsigned int lba   = tf->flags & ATA_TFLAG_LBA;
 	unsigned int lba48 = tf->flags & ATA_TFLAG_LBA48;
+	int fua = scsicmd[1] & 0x8;
 	u64 block = 0;
 	u32 n_block = 0;
 
@@ -577,9 +578,26 @@ static unsigned int ata_scsi_rw_xlat(str
 
 	if (scsicmd[0] == READ_10 || scsicmd[0] == READ_6 ||
 	    scsicmd[0] == READ_16) {
+		if (fua) {
+			printk(KERN_WARNING
+			       "ata%u(%u): WARNING: FUA READ unsupported\n",
+			       qc->ap->id, qc->dev->devno);
+			return 1;
+		}
 		tf->command = qc->dev->read_cmd;
 	} else {
-		tf->command = qc->dev->write_cmd;
+		if (fua) {
+			if (qc->dev->write_fua_cmd == 0 || !lba48) {
+				printk(KERN_WARNING
+				       "ata%u(%u): WARNING: FUA WRITE "
+				       "unsupported with the current "
+				       "protocol/addressing\n",
+				       qc->ap->id, qc->dev->devno);
+				return 1;
+			}
+			tf->command = qc->dev->write_fua_cmd;
+		} else
+			tf->command = qc->dev->write_cmd;
 		tf->flags |= ATA_TFLAG_WRITE;
 	}
 
@@ -1205,10 +1223,12 @@ unsigned int ata_scsiop_mode_sense(struc
 	if (six_byte) {
 		output_len--;
 		rbuf[0] = output_len;
+		rbuf[2] |= ata_id_has_fua(args->id) ? 0x10 : 0;
 	} else {
 		output_len -= 2;
 		rbuf[0] = output_len >> 8;
 		rbuf[1] = output_len;
+		rbuf[3] |= ata_id_has_fua(args->id) ? 0x10 : 0;
 	}
 
 	return 0;
Index: blk-fixes/include/linux/ata.h
===================================================================
--- blk-fixes.orig/include/linux/ata.h	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/include/linux/ata.h	2005-06-05 14:53:35.000000000 +0900
@@ -117,6 +117,7 @@ enum {
 	ATA_CMD_READ_EXT	= 0x25,
 	ATA_CMD_WRITE		= 0xCA,
 	ATA_CMD_WRITE_EXT	= 0x35,
+	ATA_CMD_WRITE_FUA_EXT	= 0x3D,
 	ATA_CMD_PIO_READ	= 0x20,
 	ATA_CMD_PIO_READ_EXT	= 0x24,
 	ATA_CMD_PIO_WRITE	= 0x30,
@@ -229,7 +230,8 @@ struct ata_taskfile {
 #define ata_id_is_sata(id)	((id)[93] == 0)
 #define ata_id_rahead_enabled(id) ((id)[85] & (1 << 6))
 #define ata_id_wcache_enabled(id) ((id)[85] & (1 << 5))
-#define ata_id_has_flush(id) ((id)[83] & (1 << 12))
+#define ata_id_has_fua(id)	((id)[84] & (1 << 6))
+#define ata_id_has_flush(id)	((id)[83] & (1 << 12))
 #define ata_id_has_flush_ext(id) ((id)[83] & (1 << 13))
 #define ata_id_has_lba48(id)	((id)[83] & (1 << 10))
 #define ata_id_has_wcache(id)	((id)[82] & (1 << 5))
Index: blk-fixes/include/linux/libata.h
===================================================================
--- blk-fixes.orig/include/linux/libata.h	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/include/linux/libata.h	2005-06-05 14:53:35.000000000 +0900
@@ -280,6 +280,7 @@ struct ata_device {
 	u8			xfer_protocol;	/* taskfile xfer protocol */
 	u8			read_cmd;	/* opcode to use on read */
 	u8			write_cmd;	/* opcode to use on write */
+	u8			write_fua_cmd;	/* opcode to use on FUA write */
 
 	/* for CHS addressing */
 	u16			cylinders;	/* Number of cylinders */


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 08/09] blk: update IDE to use the new blk_ordered.
  2005-06-05  5:57 [PATCH Linux 2.6.12-rc5-mm2 00/09] blk: ordered request reimplementation (take 2, for review) Tejun Heo
                   ` (6 preceding siblings ...)
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 07/09] blk: update libata " Tejun Heo
@ 2005-06-05  5:57 ` Tejun Heo
  2005-06-05  6:47   ` Jeff Garzik
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 09/09] blk: debug messages Tejun Heo
  8 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2005-06-05  5:57 UTC (permalink / raw)
  To: axboe, James.Bottomley, bzolnier, jgarzik; +Cc: linux-kernel

08_blk_update_ide_to_use_new_ordered.patch

	Update IDE to use the new blk_ordered.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 drivers/ide/ide-disk.c |   92 ++++++++++++++++++++-----------------------------
 drivers/ide/ide-io.c   |    5 --
 include/linux/hdreg.h  |   19 +++++++++-
 include/linux/ide.h    |    3 +
 4 files changed, 59 insertions(+), 60 deletions(-)

Index: blk-fixes/drivers/ide/ide-disk.c
===================================================================
--- blk-fixes.orig/drivers/ide/ide-disk.c	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/drivers/ide/ide-disk.c	2005-06-05 14:53:35.000000000 +0900
@@ -176,6 +176,18 @@ static ide_startstop_t __ide_do_rw_disk(
 			lba48 = 0;
 	}
 
+	if (blk_fua_rq(rq) &&
+	    (!rq_data_dir(rq) || !drive->select.b.lba || !lba48)) {
+		if (!rq_data_dir(rq))
+			printk(KERN_WARNING "%s: FUA READ unsupported\n",
+			       drive->name);
+		else
+			printk(KERN_WARNING "%s: FUA request received but "
+			       "cannot use LBA48\n", drive->name);
+		ide_end_request(drive, 0, 0);
+		return ide_stopped;
+	}
+
 	if (!dma) {
 		ide_init_sg_cmd(drive, rq);
 		ide_map_sg(drive, rq);
@@ -253,9 +265,12 @@ static ide_startstop_t __ide_do_rw_disk(
 	if (dma) {
 		if (!hwif->dma_setup(drive)) {
 			if (rq_data_dir(rq)) {
-				command = lba48 ? WIN_WRITEDMA_EXT : WIN_WRITEDMA;
-				if (drive->vdma)
-					command = lba48 ? WIN_WRITE_EXT: WIN_WRITE;
+				if (!blk_fua_rq(rq)) {
+					command = lba48 ? WIN_WRITEDMA_EXT : WIN_WRITEDMA;
+					if (drive->vdma)
+						command = lba48 ? WIN_WRITE_EXT: WIN_WRITE;
+				} else
+					command = WIN_WRITEDMA_FUA_EXT;
 			} else {
 				command = lba48 ? WIN_READDMA_EXT : WIN_READDMA;
 				if (drive->vdma)
@@ -284,7 +299,10 @@ static ide_startstop_t __ide_do_rw_disk(
 	} else {
 		if (drive->mult_count) {
 			hwif->data_phase = TASKFILE_MULTI_OUT;
-			command = lba48 ? WIN_MULTWRITE_EXT : WIN_MULTWRITE;
+			if (!blk_fua_rq(rq))
+				command = lba48 ? WIN_MULTWRITE_EXT : WIN_MULTWRITE;
+			else
+				command = WIN_MULTWRITE_FUA_EXT;
 		} else {
 			hwif->data_phase = TASKFILE_OUT;
 			command = lba48 ? WIN_WRITE_EXT : WIN_WRITE;
@@ -681,51 +699,10 @@ static ide_proc_entry_t idedisk_proc[] =
 
 #endif	/* CONFIG_PROC_FS */
 
-static void idedisk_end_flush(request_queue_t *q, struct request *flush_rq)
-{
-	ide_drive_t *drive = q->queuedata;
-	struct request *rq = flush_rq->end_io_data;
-	int good_sectors = rq->hard_nr_sectors;
-	int bad_sectors;
-	sector_t sector;
-
-	if (flush_rq->errors & ABRT_ERR) {
-		printk(KERN_ERR "%s: barrier support doesn't work\n", drive->name);
-		blk_queue_ordered(drive->queue, QUEUE_ORDERED_NONE);
-		blk_queue_issue_flush_fn(drive->queue, NULL);
-		good_sectors = 0;
-	} else if (flush_rq->errors) {
-		good_sectors = 0;
-		if (blk_barrier_preflush(rq)) {
-			sector = ide_get_error_location(drive,flush_rq->buffer);
-			if ((sector >= rq->hard_sector) &&
-			    (sector < rq->hard_sector + rq->hard_nr_sectors))
-				good_sectors = sector - rq->hard_sector;
-		}
-	}
-
-	if (flush_rq->errors)
-		printk(KERN_ERR "%s: failed barrier write: "
-				"sector=%Lx(good=%d/bad=%d)\n",
-				drive->name, (unsigned long long)rq->sector,
-				good_sectors,
-				(int) (rq->hard_nr_sectors-good_sectors));
-
-	bad_sectors = rq->hard_nr_sectors - good_sectors;
-
-	if (good_sectors)
-		__ide_end_request(drive, rq, 1, good_sectors);
-	if (bad_sectors)
-		__ide_end_request(drive, rq, 0, bad_sectors);
-}
-
-static int idedisk_prepare_flush(request_queue_t *q, struct request *rq)
+static void idedisk_prepare_flush(request_queue_t *q, struct request *rq)
 {
 	ide_drive_t *drive = q->queuedata;
 
-	if (!drive->wcache)
-		return 0;
-
 	memset(rq->cmd, 0, sizeof(rq->cmd));
 
 	if (ide_id_has_flush_cache_ext(drive->id) &&
@@ -737,7 +714,6 @@ static int idedisk_prepare_flush(request
 
 	rq->flags |= REQ_DRIVE_TASK | REQ_SOFTBARRIER;
 	rq->buffer = rq->cmd;
-	return 1;
 }
 
 static int idedisk_issue_flush(request_queue_t *q, struct gendisk *disk,
@@ -888,7 +864,7 @@ static void idedisk_setup (ide_drive_t *
 {
 	struct hd_driveid *id = drive->id;
 	unsigned long long capacity;
-	int barrier;
+	int barrier, fua;
 
 	idedisk_add_settings(drive);
 
@@ -1011,14 +987,20 @@ static void idedisk_setup (ide_drive_t *
 			barrier = 0;
 	}
 
-	printk(KERN_INFO "%s: cache flushes %ssupported\n",
-		drive->name, barrier ? "" : "not ");
+	fua = barrier && idedisk_supports_lba48(id) && ide_id_has_fua(id);
+
+	printk(KERN_INFO "%s: cache flushes %ssupported%s\n",
+	       drive->name, barrier ? "" : "not ",
+	       fua ? " with forced unit access" : "");
 	if (barrier) {
-		blk_queue_ordered(drive->queue, QUEUE_ORDERED_FLUSH);
-		drive->queue->prepare_flush_fn = idedisk_prepare_flush;
-		drive->queue->end_flush_fn = idedisk_end_flush;
+		unsigned ordered = fua ? QUEUE_ORDERED_DRAIN_FUA
+				       : QUEUE_ORDERED_DRAIN_FLUSH;
+		blk_queue_ordered(drive->queue, ordered,
+				  idedisk_prepare_flush, GFP_KERNEL);
 		blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
-	}
+	} else if (!drive->wcache)
+		blk_queue_ordered(drive->queue, QUEUE_ORDERED_DRAIN,
+				  NULL, GFP_KERNEL);
 }
 
 static void ide_cacheflush_p(ide_drive_t *drive)
@@ -1036,6 +1018,8 @@ static int ide_disk_remove(struct device
 	struct ide_disk_obj *idkp = drive->driver_data;
 	struct gendisk *g = idkp->disk;
 
+	blk_queue_ordered(drive->queue, QUEUE_ORDERED_NONE, NULL, 0);
+
 	ide_cacheflush_p(drive);
 
 	ide_unregister_subdriver(drive, idkp->driver);
Index: blk-fixes/drivers/ide/ide-io.c
===================================================================
--- blk-fixes.orig/drivers/ide/ide-io.c	2005-06-05 14:53:33.000000000 +0900
+++ blk-fixes/drivers/ide/ide-io.c	2005-06-05 14:53:35.000000000 +0900
@@ -119,10 +119,7 @@ int ide_end_request (ide_drive_t *drive,
 	if (!nr_sectors)
 		nr_sectors = rq->hard_cur_sectors;
 
-	if (blk_complete_barrier_rq_locked(drive->queue, rq, nr_sectors))
-		ret = rq->nr_sectors != 0;
-	else
-		ret = __ide_end_request(drive, rq, uptodate, nr_sectors);
+	ret = __ide_end_request(drive, rq, uptodate, nr_sectors);
 
 	spin_unlock_irqrestore(&ide_lock, flags);
 	return ret;
Index: blk-fixes/include/linux/hdreg.h
===================================================================
--- blk-fixes.orig/include/linux/hdreg.h	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/include/linux/hdreg.h	2005-06-05 14:53:35.000000000 +0900
@@ -221,10 +221,13 @@ typedef struct hd_drive_hob_hdr {
 #define WIN_WRITE_LONG_ONCE		0x33 /* 28-Bit without retries */
 #define WIN_WRITE_EXT			0x34 /* 48-Bit */
 #define WIN_WRITEDMA_EXT		0x35 /* 48-Bit */
+#define WIN_WRITEDMA_FUA_EXT		0x3D /* 48-Bit forced unit access */
 #define WIN_WRITEDMA_QUEUED_EXT		0x36 /* 48-Bit */
+#define WIN_WRITEDMA_QUEUED_FUA_EXT	0x3E /* 48-Bit forced unit access */
 #define WIN_SET_MAX_EXT			0x37 /* 48-Bit */
 #define CFA_WRITE_SECT_WO_ERASE		0x38 /* CFA Write Sectors without erase */
 #define WIN_MULTWRITE_EXT		0x39 /* 48-Bit */
+#define WIN_MULTWRITE_FUA_EXT		0xCE /* 48-Bit forced unit access */
 /*
  *	0x3A->0x3B Reserved
  */
@@ -550,7 +553,13 @@ struct hd_driveid {
 					 * cmd set-feature supported extensions
 					 * 15:	Shall be ZERO
 					 * 14:	Shall be ONE
-					 * 13:6	reserved
+					 * 13:  IDLE IMMEDIATE w/ UNLOAD FEATURE
+					 * 12:11 reserved for technical report
+					 * 10:  URG for WRITE STREAM
+					 *  9:  URG for READ STREAM
+					 *  8:  64-bit World wide name
+					 *  7:  WRITE DMA QUEUED FUA EXT
+					 *  6:  WRITE DMA/MULTIPLE FUA EXT
 					 *  5:	General Purpose Logging
 					 *  4:	Streaming Feature Set
 					 *  3:	Media Card Pass Through
@@ -600,7 +609,13 @@ struct hd_driveid {
 					 * command set-feature default
 					 * 15:	Shall be ZERO
 					 * 14:	Shall be ONE
-					 * 13:6	reserved
+					 * 13:  IDLE IMMEDIATE w/ UNLOAD FEATURE
+					 * 12:11 reserved for technical report
+					 * 10:  URG for WRITE STREAM
+					 *  9:  URG for READ STREAM
+					 *  8:  64-bit World wide name
+					 *  7:  WRITE DMA QUEUED FUA EXT
+					 *  6:  WRITE DMA/MULTIPLE FUA EXT
 					 *  5:	General Purpose Logging enabled
 					 *  4:	Valid CONFIGURE STREAM executed
 					 *  3:	Media Card Pass Through enabled
Index: blk-fixes/include/linux/ide.h
===================================================================
--- blk-fixes.orig/include/linux/ide.h	2005-06-05 14:50:11.000000000 +0900
+++ blk-fixes/include/linux/ide.h	2005-06-05 14:53:35.000000000 +0900
@@ -1497,6 +1497,9 @@ extern struct bus_type ide_bus_type;
 /* check if CACHE FLUSH (EXT) command is supported (bits defined in ATA-6) */
 #define ide_id_has_flush_cache(id)	((id)->cfs_enable_2 & 0x3000)
 
+/* check if WRITE DMA FUA EXT command is supported (defined in ATA-8) */
+#define ide_id_has_fua(id)		((id)->cfsse & 0x0040)
+
 /* some Maxtor disks have bit 13 defined incorrectly so check bit 10 too */
 #define ide_id_has_flush_cache_ext(id)	\
 	(((id)->cfs_enable_2 & 0x2400) == 0x2400)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 09/09] blk: debug messages
  2005-06-05  5:57 [PATCH Linux 2.6.12-rc5-mm2 00/09] blk: ordered request reimplementation (take 2, for review) Tejun Heo
                   ` (7 preceding siblings ...)
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 08/09] blk: update IDE " Tejun Heo
@ 2005-06-05  5:57 ` Tejun Heo
  8 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2005-06-05  5:57 UTC (permalink / raw)
  To: axboe, James.Bottomley, bzolnier, jgarzik; +Cc: linux-kernel

09_blk_ordered_reimpl_debug_msgs.patch

	Theses are debug message I've been using.  If you wanna see
	what's going on...

Signed-off-by: Tejun Heo <htejun@gmail.com>

 elevator.c  |    7 +++++++
 ll_rw_blk.c |   31 ++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

Index: blk-fixes/drivers/block/elevator.c
===================================================================
--- blk-fixes.orig/drivers/block/elevator.c	2005-06-05 14:53:34.000000000 +0900
+++ blk-fixes/drivers/block/elevator.c	2005-06-05 14:53:36.000000000 +0900
@@ -37,6 +37,8 @@
 
 #include <asm/uaccess.h>
 
+#define pd(fmt, args...) 	printk("[%-24s]: " fmt, __FUNCTION__ , ##args);
+
 /*
  * XXX HACK XXX Before entering elevator callbacks, we temporailiy
  * turn off REQ_CMD of proxy barrier request so that elevators don't
@@ -436,6 +438,11 @@ struct request *elv_next_request(request
 		}
 	}
 
+	if (rq && (rq == q->pre_flush_rq || rq == q->post_flush_rq ||
+		   rq == q->bar_rq))
+		pd("%p (%s)\n", rq,
+		   rq == q->pre_flush_rq ?
+			"pre" : (rq == q->post_flush_rq ? "post" : "bar"));
 	return rq;
 }
 
Index: blk-fixes/drivers/block/ll_rw_blk.c
===================================================================
--- blk-fixes.orig/drivers/block/ll_rw_blk.c	2005-06-05 14:53:34.000000000 +0900
+++ blk-fixes/drivers/block/ll_rw_blk.c	2005-06-05 14:53:36.000000000 +0900
@@ -30,6 +30,8 @@
 #include <linux/writeback.h>
 #include <linux/blkdev.h>
 
+#define pd(fmt, args...) 	printk("[%-24s]: " fmt, __FUNCTION__ , ##args);
+
 /*
  * for max sense size
  */
@@ -300,6 +302,9 @@ static int __blk_queue_ordered(request_q
 	unsigned ordered_flags;
 	int ret = 0;
 
+	pd("%x->%x, ordseq=%x, next_ordered=%x\n", q->ordered, ordered,
+	   q->ordseq, q->next_ordered);
+
 	might_sleep_if(gfp_mask & __GFP_WAIT);
 
 	ordered_flags = ordered & QUEUE_ORDERED_FLAGS;
@@ -484,6 +489,9 @@ void blk_ordered_complete_seq(request_qu
 	struct request *rq;
 	int uptodate, changed = 0;
 
+	pd("ordseq=%02x seq=%02x orderr=%d error=%d\n",
+	   q->ordseq, seq, q->orderr, error);
+
 	if (error && !q->orderr)
 		ordered_set_error(q, seq, error);
 
@@ -496,6 +504,7 @@ void blk_ordered_complete_seq(request_qu
 	/*
 	 * Okay, sequence complete.
 	 */
+	pd("sequence complete\n");
 	rq = q->orig_bar_rq;
 	uptodate = q->orderr ? q->orderr : 1;
 
@@ -559,6 +568,17 @@ static void queue_flush(request_queue_t 
 static inline struct request *start_ordered(request_queue_t *q,
 					    struct request *rq)
 {
+	pd("%p -> %p,%p,%p infl=%u\n",
+	   rq, q->pre_flush_rq, q->bar_rq, q->post_flush_rq, q->in_flight);
+	pd("%p %d %llu %lu %u %u %u %p\n", rq->bio, rq->errors,
+	   (unsigned long long)rq->hard_sector, rq->hard_nr_sectors,
+	   rq->current_nr_sectors, rq->nr_phys_segments, rq->nr_hw_segments,
+	   rq->buffer);
+	struct bio *bio;
+	for (bio = rq->bio; bio; bio = bio->bi_next)
+		pd("BIO %p %llu %u\n",
+		   bio, (unsigned long long)bio->bi_sector, bio->bi_size);
+
 	q->bi_size = 0;
 	q->orderr = 0;
 	q->ordseq |= QUEUE_ORDSEQ_STARTED;
@@ -596,6 +616,7 @@ static inline struct request *start_orde
 	} else
 		q->ordseq |= QUEUE_ORDSEQ_PREFLUSH;
 
+	pd("ordered=%x in_flight=%u\n", q->ordered, q->in_flight);
 	if ((q->ordered & QUEUE_ORDERED_TAG) || q->in_flight == 0)
 		q->ordseq |= QUEUE_ORDSEQ_DRAIN;
 	else
@@ -615,8 +636,10 @@ int blk_do_ordered(request_queue_t *q, s
 
 		if (q->ordered != QUEUE_ORDERED_NONE) {
 			*rqp = start_ordered(q, rq);
+			pd("start_ordered %p->%p\n", rq, *rqp);
 			return 1;
 		} else {
+			pd("ORDERED_NONE, seen barrier\n");
 			/*
 			 * This can happen when the queue switches to
 			 * ORDERED_NONE while this request is on it.
@@ -633,6 +656,7 @@ int blk_do_ordered(request_queue_t *q, s
 	if (q->ordered & QUEUE_ORDERED_TAG) {
 		if (blk_fs_request(rq) && rq != q->bar_rq)
 			*rqp = NULL;
+		pd("seq=%02x %p->%p\n", blk_ordered_cur_seq(q), rq, *rqp);
 		return 1;
 	}
 
@@ -654,7 +678,7 @@ int blk_do_ordered(request_queue_t *q, s
 	if (rq != allowed_rq && (blk_fs_request(rq) || rq == q->pre_flush_rq ||
 				 rq == q->post_flush_rq))
 		*rqp = NULL;
-
+	pd("seq=%02x %p->%p\n", blk_ordered_cur_seq(q), rq, *rqp);
 	return 1;
 }
 
@@ -687,6 +711,9 @@ static int flush_dry_bio_endio(struct bi
 	bio->bi_sector -= (q->bi_size >> 9);
 	q->bi_size = 0;
 
+	pd("BIO %p %llu %u\n",
+	   bio, (unsigned long long)bio->bi_sector, bio->bi_size);
+
 	return 0;
 }
 
@@ -700,6 +727,7 @@ static inline int ordered_bio_endio(stru
 	if (q->bar_rq != rq)
 		return 0;
 
+	pd("q->orderr=%d error=%d\n", q->orderr, error);
 	/*
 	 * Okay, this is the barrier request in progress, dry finish it.
 	 */
@@ -2791,6 +2819,7 @@ static int __make_request(request_queue_
 
 	barrier = bio_barrier(bio);
 	if (unlikely(barrier) && (q->ordered == QUEUE_ORDERED_NONE)) {
+		pd("ORDERED_NONE, seen barrier\n");
 		err = -EOPNOTSUPP;
 		goto end_io;
 	}


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 08/09] blk: update IDE to use the new blk_ordered.
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 08/09] blk: update IDE " Tejun Heo
@ 2005-06-05  6:47   ` Jeff Garzik
  2005-06-05 14:14     ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Garzik @ 2005-06-05  6:47 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, James.Bottomley, bzolnier, linux-kernel

Tejun Heo wrote:
> @@ -176,6 +176,18 @@ static ide_startstop_t __ide_do_rw_disk(
>  			lba48 = 0;
>  	}
>  
> +	if (blk_fua_rq(rq) &&
> +	    (!rq_data_dir(rq) || !drive->select.b.lba || !lba48)) {
> +		if (!rq_data_dir(rq))
> +			printk(KERN_WARNING "%s: FUA READ unsupported\n",
> +			       drive->name);
> +		else
> +			printk(KERN_WARNING "%s: FUA request received but "
> +			       "cannot use LBA48\n", drive->name);
> +		ide_end_request(drive, 0, 0);
> +		return ide_stopped;
> +	}
> +


Does this string of tests really need to be added to the main fast path?

It would be better to simply guarantee that this check need never exist 
in the IDE driver, by guaranteeing that the block layer will never send 
a FUA-READ command to a driver that does not wish it.

	Jeff



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 07/09] blk: update libata to use the new blk_ordered.
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 07/09] blk: update libata " Tejun Heo
@ 2005-06-05  7:02   ` Jeff Garzik
  2005-06-07  2:11     ` Tejun Heo
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Garzik @ 2005-06-05  7:02 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, James.Bottomley, bzolnier, linux-kernel

Tejun Heo wrote:
> 07_blk_update_libata_to_use_new_ordered.patch
> 
> 	Reflect changes in SCSI midlayer and updated to use the new
> 	ordered request implementation.
> 
> Signed-off-by: Tejun Heo <htejun@gmail.com>

I would prefer separate patches for:

* implement support for FUA bit in libata SCSI simulator

* update libata for your ordered flush changes



> Index: blk-fixes/drivers/scsi/ahci.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/ahci.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/ahci.c	2005-06-05 14:53:35.000000000 +0900
> @@ -203,7 +203,6 @@ static Scsi_Host_Template ahci_sht = {
>  	.dma_boundary		= AHCI_DMA_BOUNDARY,
>  	.slave_configure	= ata_scsi_slave_config,
>  	.bios_param		= ata_std_bios_param,
> -	.ordered_flush		= 1,
>  };
>  
>  static struct ata_port_operations ahci_ops = {
> Index: blk-fixes/drivers/scsi/ata_piix.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/ata_piix.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/ata_piix.c	2005-06-05 14:53:35.000000000 +0900
> @@ -123,7 +123,6 @@ static Scsi_Host_Template piix_sht = {
>  	.dma_boundary		= ATA_DMA_BOUNDARY,
>  	.slave_configure	= ata_scsi_slave_config,
>  	.bios_param		= ata_std_bios_param,
> -	.ordered_flush		= 1,
>  };
>  
>  static struct ata_port_operations piix_pata_ops = {
> Index: blk-fixes/drivers/scsi/sata_nv.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/sata_nv.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/sata_nv.c	2005-06-05 14:53:35.000000000 +0900
> @@ -206,7 +206,6 @@ static Scsi_Host_Template nv_sht = {
>  	.dma_boundary		= ATA_DMA_BOUNDARY,
>  	.slave_configure	= ata_scsi_slave_config,
>  	.bios_param		= ata_std_bios_param,
> -	.ordered_flush		= 1,
>  };
>  
>  static struct ata_port_operations nv_ops = {
> Index: blk-fixes/drivers/scsi/sata_promise.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/sata_promise.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/sata_promise.c	2005-06-05 14:53:35.000000000 +0900
> @@ -104,7 +104,6 @@ static Scsi_Host_Template pdc_ata_sht = 
>  	.dma_boundary		= ATA_DMA_BOUNDARY,
>  	.slave_configure	= ata_scsi_slave_config,
>  	.bios_param		= ata_std_bios_param,
> -	.ordered_flush		= 1,
>  };
>  
>  static struct ata_port_operations pdc_ata_ops = {
> Index: blk-fixes/drivers/scsi/sata_sil.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/sata_sil.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/sata_sil.c	2005-06-05 14:53:35.000000000 +0900
> @@ -135,7 +135,6 @@ static Scsi_Host_Template sil_sht = {
>  	.dma_boundary		= ATA_DMA_BOUNDARY,
>  	.slave_configure	= ata_scsi_slave_config,
>  	.bios_param		= ata_std_bios_param,
> -	.ordered_flush		= 1,
>  };
>  
>  static struct ata_port_operations sil_ops = {
> Index: blk-fixes/drivers/scsi/sata_sis.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/sata_sis.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/sata_sis.c	2005-06-05 14:53:35.000000000 +0900
> @@ -90,7 +90,6 @@ static Scsi_Host_Template sis_sht = {
>  	.dma_boundary		= ATA_DMA_BOUNDARY,
>  	.slave_configure	= ata_scsi_slave_config,
>  	.bios_param		= ata_std_bios_param,
> -	.ordered_flush		= 1,
>  };
>  
>  static struct ata_port_operations sis_ops = {
> Index: blk-fixes/drivers/scsi/sata_svw.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/sata_svw.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/sata_svw.c	2005-06-05 14:53:35.000000000 +0900
> @@ -288,7 +288,6 @@ static Scsi_Host_Template k2_sata_sht = 
>  	.proc_info		= k2_sata_proc_info,
>  #endif
>  	.bios_param		= ata_std_bios_param,
> -	.ordered_flush		= 1,
>  };
>  
>  
> Index: blk-fixes/drivers/scsi/sata_sx4.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/sata_sx4.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/sata_sx4.c	2005-06-05 14:53:35.000000000 +0900
> @@ -188,7 +188,6 @@ static Scsi_Host_Template pdc_sata_sht =
>  	.dma_boundary		= ATA_DMA_BOUNDARY,
>  	.slave_configure	= ata_scsi_slave_config,
>  	.bios_param		= ata_std_bios_param,
> -	.ordered_flush		= 1,
>  };
>  
>  static struct ata_port_operations pdc_20621_ops = {
> Index: blk-fixes/drivers/scsi/sata_uli.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/sata_uli.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/sata_uli.c	2005-06-05 14:53:35.000000000 +0900
> @@ -82,7 +82,6 @@ static Scsi_Host_Template uli_sht = {
>  	.dma_boundary		= ATA_DMA_BOUNDARY,
>  	.slave_configure	= ata_scsi_slave_config,
>  	.bios_param		= ata_std_bios_param,
> -	.ordered_flush		= 1,
>  };
>  
>  static struct ata_port_operations uli_ops = {
> Index: blk-fixes/drivers/scsi/sata_via.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/sata_via.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/sata_via.c	2005-06-05 14:53:35.000000000 +0900
> @@ -102,7 +102,6 @@ static Scsi_Host_Template svia_sht = {
>  	.dma_boundary		= ATA_DMA_BOUNDARY,
>  	.slave_configure	= ata_scsi_slave_config,
>  	.bios_param		= ata_std_bios_param,
> -	.ordered_flush		= 1,
>  };
>  
>  static struct ata_port_operations svia_sata_ops = {
> Index: blk-fixes/drivers/scsi/sata_vsc.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/sata_vsc.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/sata_vsc.c	2005-06-05 14:53:35.000000000 +0900
> @@ -206,7 +206,6 @@ static Scsi_Host_Template vsc_sata_sht =
>  	.dma_boundary		= ATA_DMA_BOUNDARY,
>  	.slave_configure	= ata_scsi_slave_config,
>  	.bios_param		= ata_std_bios_param,
> -	.ordered_flush		= 1,
>  };
>  
>  
> Index: blk-fixes/drivers/scsi/libata-core.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/libata-core.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/libata-core.c	2005-06-05 14:53:35.000000000 +0900
> @@ -510,19 +510,21 @@ void ata_tf_from_fis(u8 *fis, struct ata
>  }
>  
>  /**
> - *	ata_prot_to_cmd - determine which read/write opcodes to use
> + *	ata_prot_to_cmd - determine which read/write/fua-write opcodes to use
>   *	@protocol: ATA_PROT_xxx taskfile protocol
>   *	@lba48: true is lba48 is present
>   *
> - *	Given necessary input, determine which read/write commands
> - *	to use to transfer data.
> + *	Given necessary input, determine which read/write/fua-write
> + *	commands to use to transfer data.  Note that we only support
> + *	fua-writes on DMA LBA48 protocol.  In other cases, we simply
> + *	return 0 which is NOP.
>   *
>   *	LOCKING:
>   *	None.
>   */
>  static int ata_prot_to_cmd(int protocol, int lba48)
>  {
> -	int rcmd = 0, wcmd = 0;
> +	int rcmd = 0, wcmd = 0, wfua = 0;
>  
>  	switch (protocol) {
>  	case ATA_PROT_PIO:
> @@ -539,6 +541,7 @@ static int ata_prot_to_cmd(int protocol,
>  		if (lba48) {
>  			rcmd = ATA_CMD_READ_EXT;
>  			wcmd = ATA_CMD_WRITE_EXT;
> +			wfua = ATA_CMD_WRITE_FUA_EXT;
>  		} else {
>  			rcmd = ATA_CMD_READ;
>  			wcmd = ATA_CMD_WRITE;
> @@ -549,7 +552,7 @@ static int ata_prot_to_cmd(int protocol,
>  		return -1;
>  	}
>  
> -	return rcmd | (wcmd << 8);
> +	return rcmd | (wcmd << 8) | (wfua << 16);
>  }
>  
>  /**
> @@ -582,6 +585,7 @@ static void ata_dev_set_protocol(struct 
>  
>  	dev->read_cmd = cmd & 0xff;
>  	dev->write_cmd = (cmd >> 8) & 0xff;
> +	dev->write_fua_cmd = (cmd >> 16) & 0xff;
>  }
>  
>  static const char * xfer_mode_str[] = {
> Index: blk-fixes/drivers/scsi/libata-scsi.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/libata-scsi.c	2005-06-05 14:50:11.000000000 +0900
> +++ blk-fixes/drivers/scsi/libata-scsi.c	2005-06-05 14:53:35.000000000 +0900
> @@ -569,6 +569,7 @@ static unsigned int ata_scsi_rw_xlat(str
>  	struct ata_device *dev = qc->dev;
>  	unsigned int lba   = tf->flags & ATA_TFLAG_LBA;
>  	unsigned int lba48 = tf->flags & ATA_TFLAG_LBA48;
> +	int fua = scsicmd[1] & 0x8;
>  	u64 block = 0;
>  	u32 n_block = 0;
>  
> @@ -577,9 +578,26 @@ static unsigned int ata_scsi_rw_xlat(str
>  
>  	if (scsicmd[0] == READ_10 || scsicmd[0] == READ_6 ||
>  	    scsicmd[0] == READ_16) {
> +		if (fua) {
> +			printk(KERN_WARNING
> +			       "ata%u(%u): WARNING: FUA READ unsupported\n",
> +			       qc->ap->id, qc->dev->devno);
> +			return 1;
> +		}
>  		tf->command = qc->dev->read_cmd;
>  	} else {
> -		tf->command = qc->dev->write_cmd;
> +		if (fua) {
> +			if (qc->dev->write_fua_cmd == 0 || !lba48) {
> +				printk(KERN_WARNING
> +				       "ata%u(%u): WARNING: FUA WRITE "
> +				       "unsupported with the current "
> +				       "protocol/addressing\n",
> +				       qc->ap->id, qc->dev->devno);
> +				return 1;
> +			}
> +			tf->command = qc->dev->write_fua_cmd;
> +		} else
> +			tf->command = qc->dev->write_cmd;
>  		tf->flags |= ATA_TFLAG_WRITE;
>  	}
>  

this all seems fine.


> @@ -1205,10 +1223,12 @@ unsigned int ata_scsiop_mode_sense(struc
>  	if (six_byte) {
>  		output_len--;
>  		rbuf[0] = output_len;
> +		rbuf[2] |= ata_id_has_fua(args->id) ? 0x10 : 0;
>  	} else {
>  		output_len -= 2;
>  		rbuf[0] = output_len >> 8;
>  		rbuf[1] = output_len;
> +		rbuf[3] |= ata_id_has_fua(args->id) ? 0x10 : 0;
>  	}

I wonder what a SCSI person thinks about this.  Its defined as 'DPO and 
FUA' not just 'FUA'.

Also, a bit of style:  please use "1 << n" for bit constants in libata.

	Jeff



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 06/09] blk: update SCSI to use the new blk_ordered
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 06/09] blk: update SCSI to use the new blk_ordered Tejun Heo
@ 2005-06-05  7:08   ` Jeff Garzik
  2005-06-07  1:58     ` Tejun Heo
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Garzik @ 2005-06-05  7:08 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, James.Bottomley, bzolnier, linux-kernel

Tejun Heo wrote:
> 06_blk_update_scsi_to_use_new_ordered.patch
> 
> 	All ordered request related stuff delegated to HLD.  Midlayer
> 	now doens't deal with ordered setting or prepare_flush
> 	callback.  sd.c updated to deal with blk_queue_ordered
> 	setting.  Currently, ordered tag isn't used as SCSI midlayer
> 	cannot guarantee request ordering.
> 
> Signed-off-by: Tejun Heo <htejun@gmail.com>
> 
>  drivers/scsi/hosts.c       |    9 -----
>  drivers/scsi/scsi_lib.c    |   46 --------------------------
>  drivers/scsi/sd.c          |   79 +++++++++++++++++++--------------------------
>  include/scsi/scsi_driver.h |    1 
>  include/scsi/scsi_host.h   |    1 
>  5 files changed, 34 insertions(+), 102 deletions(-)
> 
> Index: blk-fixes/drivers/scsi/sd.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/sd.c	2005-06-05 14:53:32.000000000 +0900
> +++ blk-fixes/drivers/scsi/sd.c	2005-06-05 14:53:35.000000000 +0900
> @@ -103,6 +103,7 @@ struct scsi_disk {
>  	u8		write_prot;
>  	unsigned	WCE : 1;	/* state of disk WCE bit */
>  	unsigned	RCD : 1;	/* state of disk RCD bit, unused */
> +	unsigned	DPOFUA : 1;	/* state of disk DPOFUA bit */
>  };
>  
>  static DEFINE_IDR(sd_index_idr);
> @@ -122,8 +123,7 @@ static void sd_shutdown(struct device *d
>  static void sd_rescan(struct device *);
>  static int sd_init_command(struct scsi_cmnd *);
>  static int sd_issue_flush(struct device *, sector_t *);
> -static void sd_end_flush(request_queue_t *, struct request *);
> -static int sd_prepare_flush(request_queue_t *, struct request *);
> +static void sd_prepare_flush(request_queue_t *, struct request *);
>  static void sd_read_capacity(struct scsi_disk *sdkp, char *diskname,
>  		 struct scsi_request *SRpnt, unsigned char *buffer);
>  
> @@ -138,8 +138,6 @@ static struct scsi_driver sd_template = 
>  	.rescan			= sd_rescan,
>  	.init_command		= sd_init_command,
>  	.issue_flush		= sd_issue_flush,
> -	.prepare_flush		= sd_prepare_flush,
> -	.end_flush		= sd_end_flush,
>  };
>  
>  /*
> @@ -346,6 +344,7 @@ static int sd_init_command(struct scsi_c
>  	
>  	if (block > 0xffffffff) {
>  		SCpnt->cmnd[0] += READ_16 - READ_6;
> +		SCpnt->cmnd[1] |= blk_fua_rq(rq) ? 0x8 : 0;
>  		SCpnt->cmnd[2] = sizeof(block) > 4 ? (unsigned char) (block >> 56) & 0xff : 0;
>  		SCpnt->cmnd[3] = sizeof(block) > 4 ? (unsigned char) (block >> 48) & 0xff : 0;
>  		SCpnt->cmnd[4] = sizeof(block) > 4 ? (unsigned char) (block >> 40) & 0xff : 0;
> @@ -360,11 +359,12 @@ static int sd_init_command(struct scsi_c
>  		SCpnt->cmnd[13] = (unsigned char) this_count & 0xff;
>  		SCpnt->cmnd[14] = SCpnt->cmnd[15] = 0;
>  	} else if ((this_count > 0xff) || (block > 0x1fffff) ||
> -		   SCpnt->device->use_10_for_rw) {
> +		   SCpnt->device->use_10_for_rw || blk_fua_rq(rq)) {

This seems suspicious, like it would cause unwanted use of READ(10) for 
some devices that prefer READ(6) ?


>  		if (this_count > 0xffff)
>  			this_count = 0xffff;
>  
>  		SCpnt->cmnd[0] += READ_10 - READ_6;
> +		SCpnt->cmnd[1] |= blk_fua_rq(rq) ? 0x8 : 0;
>  		SCpnt->cmnd[2] = (unsigned char) (block >> 24) & 0xff;
>  		SCpnt->cmnd[3] = (unsigned char) (block >> 16) & 0xff;
>  		SCpnt->cmnd[4] = (unsigned char) (block >> 8) & 0xff;
> @@ -739,43 +739,12 @@ static int sd_issue_flush(struct device 
>  	return sd_sync_cache(sdp);
>  }
>  
> -static void sd_end_flush(request_queue_t *q, struct request *flush_rq)
> +static void sd_prepare_flush(request_queue_t *q, struct request *rq)
>  {
> -	struct request *rq = flush_rq->end_io_data;
> -	struct scsi_cmnd *cmd = rq->special;
> -	unsigned int bytes = rq->hard_nr_sectors << 9;
> -
> -	if (!flush_rq->errors) {
> -		spin_unlock(q->queue_lock);
> -		scsi_io_completion(cmd, bytes, 0);
> -		spin_lock(q->queue_lock);
> -	} else if (blk_barrier_postflush(rq)) {
> -		spin_unlock(q->queue_lock);
> -		scsi_io_completion(cmd, 0, bytes);
> -		spin_lock(q->queue_lock);
> -	} else {
> -		/*
> -		 * force journal abort of barriers
> -		 */
> -		end_that_request_first(rq, -EOPNOTSUPP, rq->hard_nr_sectors);
> -		end_that_request_last(rq, -EOPNOTSUPP);
> -	}
> -}
> -
> -static int sd_prepare_flush(request_queue_t *q, struct request *rq)
> -{
> -	struct scsi_device *sdev = q->queuedata;
> -	struct scsi_disk *sdkp = dev_get_drvdata(&sdev->sdev_gendev);
> -
> -	if (sdkp->WCE) {
> -		memset(rq->cmd, 0, sizeof(rq->cmd));
> -		rq->flags |= REQ_BLOCK_PC | REQ_SOFTBARRIER;
> -		rq->timeout = SD_TIMEOUT;
> -		rq->cmd[0] = SYNCHRONIZE_CACHE;
> -		return 1;
> -	}
> -
> -	return 0;
> +	memset(rq->cmd, 0, sizeof(rq->cmd));
> +	rq->flags |= REQ_BLOCK_PC | REQ_SOFTBARRIER;
> +	rq->timeout = SD_TIMEOUT;
> +	rq->cmd[0] = SYNCHRONIZE_CACHE;
>  }
>  
>  static void sd_rescan(struct device *dev)
> @@ -1433,10 +1402,13 @@ sd_read_cache_type(struct scsi_disk *sdk
>  			sdkp->RCD = 0;
>  		}
>  
> +		sdkp->DPOFUA = (data.device_specific & 0x10) != 0;
> +
>  		ct =  sdkp->RCD + 2*sdkp->WCE;
>  
> -		printk(KERN_NOTICE "SCSI device %s: drive cache: %s\n",
> -		       diskname, types[ct]);
> +		printk(KERN_NOTICE "SCSI device %s: drive cache: %s%s\n",
> +		       diskname, types[ct],
> +		       sdkp->DPOFUA ? " with forced unit access" : "");

This is IMO a bit verbose.  Just " w/ FUA" might be better.


>  		return;
>  	}
> @@ -1469,6 +1441,7 @@ static int sd_revalidate_disk(struct gen
>  	struct scsi_device *sdp = sdkp->device;
>  	struct scsi_request *sreq;
>  	unsigned char *buffer;
> +	unsigned ordered;
>  
>  	SCSI_LOG_HLQUEUE(3, printk("sd_revalidate_disk: disk=%s\n", disk->disk_name));
>  
> @@ -1514,7 +1487,22 @@ static int sd_revalidate_disk(struct gen
>  					sreq, buffer);
>  		sd_read_cache_type(sdkp, disk->disk_name, sreq, buffer);
>  	}
> -		
> +
> +	/*
> +	 * We now have all cache related info, determine how we deal
> +	 * with ordered requests.  Note that as the current SCSI
> +	 * dispatch function can alter request order, we cannot use
> +	 * QUEUE_ORDERED_TAG_* even when ordered tag is supported.
> +	 */
> +	if (sdkp->WCE)
> +		ordered = sdkp->DPOFUA
> +			? QUEUE_ORDERED_DRAIN_FUA : QUEUE_ORDERED_DRAIN_FLUSH;

Certainly 'DPO and FUA' implies we have FUA, but I wonder if this test 
is unnecessarily narrow.

	Jeff



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 02/09] blk: make scsi use -EOPNOTSUPP instead of -EIO on ILLEGAL_REQUEST
  2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 02/09] blk: make scsi use -EOPNOTSUPP instead of -EIO on ILLEGAL_REQUEST Tejun Heo
@ 2005-06-05  7:10   ` Jeff Garzik
  2005-06-07  1:34     ` Tejun Heo
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Garzik @ 2005-06-05  7:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, James.Bottomley, bzolnier, linux-kernel

Tejun Heo wrote:
> 02_blk_scsi_eopnotsupp.patch
> 
> 	Use -EOPNOTSUPP instead of -EIO on ILLEGAL_REQUEST.
> 
> Signed-off-by: Tejun Heo <htejun@gmail.com>
> 
>  scsi_lib.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletion(-)
> 
> Index: blk-fixes/drivers/scsi/scsi_lib.c
> ===================================================================
> --- blk-fixes.orig/drivers/scsi/scsi_lib.c	2005-06-05 14:53:32.000000000 +0900
> +++ blk-fixes/drivers/scsi/scsi_lib.c	2005-06-05 14:53:33.000000000 +0900
> @@ -849,7 +849,8 @@ void scsi_io_completion(struct scsi_cmnd
>  				scsi_requeue_command(q, cmd);
>  				result = 0;
>  			} else {
> -				cmd = scsi_end_request(cmd, 0, this_count, 1);
> +				cmd = scsi_end_request(cmd, -EOPNOTSUPP,
> +						       this_count, 1);

This looks like a change from zero to EOPNOTSUPP, but your description 
says its a change from EIO to EOPNOTSUPP.

	Jeff




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 08/09] blk: update IDE to use the new blk_ordered.
  2005-06-05  6:47   ` Jeff Garzik
@ 2005-06-05 14:14     ` Bartlomiej Zolnierkiewicz
  2005-06-07  2:26       ` Tejun Heo
  0 siblings, 1 reply; 19+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2005-06-05 14:14 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Tejun Heo, axboe, James.Bottomley, linux-kernel

On 6/5/05, Jeff Garzik <jgarzik@pobox.com> wrote:
> Tejun Heo wrote:
> > @@ -176,6 +176,18 @@ static ide_startstop_t __ide_do_rw_disk(
> >                       lba48 = 0;
> >       }
> >
> > +     if (blk_fua_rq(rq) &&
> > +         (!rq_data_dir(rq) || !drive->select.b.lba || !lba48)) {
> > +             if (!rq_data_dir(rq))
> > +                     printk(KERN_WARNING "%s: FUA READ unsupported\n",
> > +                            drive->name);
> > +             else
> > +                     printk(KERN_WARNING "%s: FUA request received but "
> > +                            "cannot use LBA48\n", drive->name);
> > +             ide_end_request(drive, 0, 0);
> > +             return ide_stopped;
> > +     }
> > +
> 
> 
> Does this string of tests really need to be added to the main fast path?
> 
> It would be better to simply guarantee that this check need never exist
> in the IDE driver, by guaranteeing that the block layer will never send
> a FUA-READ command to a driver that does not wish it.
> 
>         Jeff

Seconded, plus please use <linux/ata.h> instead of <linux/hdreg.h>
for adding new opcodes.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 02/09] blk: make scsi use -EOPNOTSUPP instead of -EIO on ILLEGAL_REQUEST
  2005-06-05  7:10   ` Jeff Garzik
@ 2005-06-07  1:34     ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2005-06-07  1:34 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: axboe, James.Bottomley, bzolnier, linux-kernel

Jeff Garzik wrote:
> Tejun Heo wrote:
> 
>> 02_blk_scsi_eopnotsupp.patch
>>
>>     Use -EOPNOTSUPP instead of -EIO on ILLEGAL_REQUEST.
>>
>> Signed-off-by: Tejun Heo <htejun@gmail.com>
>>
>>  scsi_lib.c |    3 ++-
>>  1 files changed, 2 insertions(+), 1 deletion(-)
>>
>> Index: blk-fixes/drivers/scsi/scsi_lib.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/scsi_lib.c    2005-06-05 
>> 14:53:32.000000000 +0900
>> +++ blk-fixes/drivers/scsi/scsi_lib.c    2005-06-05 14:53:33.000000000 
>> +0900
>> @@ -849,7 +849,8 @@ void scsi_io_completion(struct scsi_cmnd
>>                  scsi_requeue_command(q, cmd);
>>                  result = 0;
>>              } else {
>> -                cmd = scsi_end_request(cmd, 0, this_count, 1);
>> +                cmd = scsi_end_request(cmd, -EOPNOTSUPP,
>> +                               this_count, 1);
> 
> 
> This looks like a change from zero to EOPNOTSUPP, but your description 
> says its a change from EIO to EOPNOTSUPP.
> 
>     Jeff
> 

  Hello, Jeff.

  I just found it confusing to write changing 0 to -EOPNOTSUPP when 0 
actually means -EIO (uptodate).  I'll write in more detailed way next time.

  Thank you. :-)

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 06/09] blk: update SCSI to use the new blk_ordered
  2005-06-05  7:08   ` Jeff Garzik
@ 2005-06-07  1:58     ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2005-06-07  1:58 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: axboe, James.Bottomley, bzolnier, linux-kernel


  Hi, Jeff.

Jeff Garzik wrote:
> Tejun Heo wrote:
> 
>> 06_blk_update_scsi_to_use_new_ordered.patch
>>
>>     All ordered request related stuff delegated to HLD.  Midlayer
>>     now doens't deal with ordered setting or prepare_flush
>>     callback.  sd.c updated to deal with blk_queue_ordered
>>     setting.  Currently, ordered tag isn't used as SCSI midlayer
>>     cannot guarantee request ordering.
>>
>> Signed-off-by: Tejun Heo <htejun@gmail.com>
>>
>>  drivers/scsi/hosts.c       |    9 -----
>>  drivers/scsi/scsi_lib.c    |   46 --------------------------
>>  drivers/scsi/sd.c          |   79 
>> +++++++++++++++++++--------------------------
>>  include/scsi/scsi_driver.h |    1  include/scsi/scsi_host.h   |    1 
>>  5 files changed, 34 insertions(+), 102 deletions(-)
>>
>> Index: blk-fixes/drivers/scsi/sd.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/sd.c    2005-06-05 14:53:32.000000000 
>> +0900
>> +++ blk-fixes/drivers/scsi/sd.c    2005-06-05 14:53:35.000000000 +0900
>> @@ -103,6 +103,7 @@ struct scsi_disk {
>>      u8        write_prot;
>>      unsigned    WCE : 1;    /* state of disk WCE bit */
>>      unsigned    RCD : 1;    /* state of disk RCD bit, unused */
>> +    unsigned    DPOFUA : 1;    /* state of disk DPOFUA bit */
>>  };
>>  
>>  static DEFINE_IDR(sd_index_idr);
>> @@ -122,8 +123,7 @@ static void sd_shutdown(struct device *d
>>  static void sd_rescan(struct device *);
>>  static int sd_init_command(struct scsi_cmnd *);
>>  static int sd_issue_flush(struct device *, sector_t *);
>> -static void sd_end_flush(request_queue_t *, struct request *);
>> -static int sd_prepare_flush(request_queue_t *, struct request *);
>> +static void sd_prepare_flush(request_queue_t *, struct request *);
>>  static void sd_read_capacity(struct scsi_disk *sdkp, char *diskname,
>>           struct scsi_request *SRpnt, unsigned char *buffer);
>>  
>> @@ -138,8 +138,6 @@ static struct scsi_driver sd_template =      
>> .rescan            = sd_rescan,
>>      .init_command        = sd_init_command,
>>      .issue_flush        = sd_issue_flush,
>> -    .prepare_flush        = sd_prepare_flush,
>> -    .end_flush        = sd_end_flush,
>>  };
>>  
>>  /*
>> @@ -346,6 +344,7 @@ static int sd_init_command(struct scsi_c
>>     
>>      if (block > 0xffffffff) {
>>          SCpnt->cmnd[0] += READ_16 - READ_6;
>> +        SCpnt->cmnd[1] |= blk_fua_rq(rq) ? 0x8 : 0;
>>          SCpnt->cmnd[2] = sizeof(block) > 4 ? (unsigned char) (block 
>> >> 56) & 0xff : 0;
>>          SCpnt->cmnd[3] = sizeof(block) > 4 ? (unsigned char) (block 
>> >> 48) & 0xff : 0;
>>          SCpnt->cmnd[4] = sizeof(block) > 4 ? (unsigned char) (block 
>> >> 40) & 0xff : 0;
>> @@ -360,11 +359,12 @@ static int sd_init_command(struct scsi_c
>>          SCpnt->cmnd[13] = (unsigned char) this_count & 0xff;
>>          SCpnt->cmnd[14] = SCpnt->cmnd[15] = 0;
>>      } else if ((this_count > 0xff) || (block > 0x1fffff) ||
>> -           SCpnt->device->use_10_for_rw) {
>> +           SCpnt->device->use_10_for_rw || blk_fua_rq(rq)) {
> 
> 
> This seems suspicious, like it would cause unwanted use of READ(10) for 
> some devices that prefer READ(6) ?
> 

  As READ/WRITE(6) doesn't have FUA field, we have to use WRITE(10) or 
WRITE(16) on FUA writes.  Above addition affects only FUA writes (no 
effect on READs or normal WRITEs).

  FUA write requests are generated only when a device reports DPOFUA 
capability during sd attach, and only READ/WRITE(>=10) have FUA field, 
so I think it's logical to assume that devices which report DPOFUA 
support READ/WRITE(10).

  Also, currently all SCSI disks default to READ/WRITE(10) and 
use_10_for_rw is turned off only when READ/WRITE(10) fails w/ 
ILLEGAL_REQUEST.  So, the assumption is that devices are at least 
capable of properly terminating READ/WRITE(10)s with ILLEGAL REQUEST 
when it doesn't like'em (thus turning off FUA).

  Hmmm... Do you think I should add more tests before enabling FUA such 
that we know WRITE(10) is supported (capacity test comes to mind).

> 
>>          if (this_count > 0xffff)
>>              this_count = 0xffff;
>>  
>>          SCpnt->cmnd[0] += READ_10 - READ_6;
>> +        SCpnt->cmnd[1] |= blk_fua_rq(rq) ? 0x8 : 0;
>>          SCpnt->cmnd[2] = (unsigned char) (block >> 24) & 0xff;
>>          SCpnt->cmnd[3] = (unsigned char) (block >> 16) & 0xff;
>>          SCpnt->cmnd[4] = (unsigned char) (block >> 8) & 0xff;
>> @@ -739,43 +739,12 @@ static int sd_issue_flush(struct device      
>> return sd_sync_cache(sdp);
>>  }
>>  
>> -static void sd_end_flush(request_queue_t *q, struct request *flush_rq)
>> +static void sd_prepare_flush(request_queue_t *q, struct request *rq)
>>  {
>> -    struct request *rq = flush_rq->end_io_data;
>> -    struct scsi_cmnd *cmd = rq->special;
>> -    unsigned int bytes = rq->hard_nr_sectors << 9;
>> -
>> -    if (!flush_rq->errors) {
>> -        spin_unlock(q->queue_lock);
>> -        scsi_io_completion(cmd, bytes, 0);
>> -        spin_lock(q->queue_lock);
>> -    } else if (blk_barrier_postflush(rq)) {
>> -        spin_unlock(q->queue_lock);
>> -        scsi_io_completion(cmd, 0, bytes);
>> -        spin_lock(q->queue_lock);
>> -    } else {
>> -        /*
>> -         * force journal abort of barriers
>> -         */
>> -        end_that_request_first(rq, -EOPNOTSUPP, rq->hard_nr_sectors);
>> -        end_that_request_last(rq, -EOPNOTSUPP);
>> -    }
>> -}
>> -
>> -static int sd_prepare_flush(request_queue_t *q, struct request *rq)
>> -{
>> -    struct scsi_device *sdev = q->queuedata;
>> -    struct scsi_disk *sdkp = dev_get_drvdata(&sdev->sdev_gendev);
>> -
>> -    if (sdkp->WCE) {
>> -        memset(rq->cmd, 0, sizeof(rq->cmd));
>> -        rq->flags |= REQ_BLOCK_PC | REQ_SOFTBARRIER;
>> -        rq->timeout = SD_TIMEOUT;
>> -        rq->cmd[0] = SYNCHRONIZE_CACHE;
>> -        return 1;
>> -    }
>> -
>> -    return 0;
>> +    memset(rq->cmd, 0, sizeof(rq->cmd));
>> +    rq->flags |= REQ_BLOCK_PC | REQ_SOFTBARRIER;
>> +    rq->timeout = SD_TIMEOUT;
>> +    rq->cmd[0] = SYNCHRONIZE_CACHE;
>>  }
>>  
>>  static void sd_rescan(struct device *dev)
>> @@ -1433,10 +1402,13 @@ sd_read_cache_type(struct scsi_disk *sdk
>>              sdkp->RCD = 0;
>>          }
>>  
>> +        sdkp->DPOFUA = (data.device_specific & 0x10) != 0;
>> +
>>          ct =  sdkp->RCD + 2*sdkp->WCE;
>>  
>> -        printk(KERN_NOTICE "SCSI device %s: drive cache: %s\n",
>> -               diskname, types[ct]);
>> +        printk(KERN_NOTICE "SCSI device %s: drive cache: %s%s\n",
>> +               diskname, types[ct],
>> +               sdkp->DPOFUA ? " with forced unit access" : "");
> 
> 
> This is IMO a bit verbose.  Just " w/ FUA" might be better.
> 

  Sure.

> 
>>          return;
>>      }
>> @@ -1469,6 +1441,7 @@ static int sd_revalidate_disk(struct gen
>>      struct scsi_device *sdp = sdkp->device;
>>      struct scsi_request *sreq;
>>      unsigned char *buffer;
>> +    unsigned ordered;
>>  
>>      SCSI_LOG_HLQUEUE(3, printk("sd_revalidate_disk: disk=%s\n", 
>> disk->disk_name));
>>  
>> @@ -1514,7 +1487,22 @@ static int sd_revalidate_disk(struct gen
>>                      sreq, buffer);
>>          sd_read_cache_type(sdkp, disk->disk_name, sreq, buffer);
>>      }
>> -       
>> +
>> +    /*
>> +     * We now have all cache related info, determine how we deal
>> +     * with ordered requests.  Note that as the current SCSI
>> +     * dispatch function can alter request order, we cannot use
>> +     * QUEUE_ORDERED_TAG_* even when ordered tag is supported.
>> +     */
>> +    if (sdkp->WCE)
>> +        ordered = sdkp->DPOFUA
>> +            ? QUEUE_ORDERED_DRAIN_FUA : QUEUE_ORDERED_DRAIN_FLUSH;
> 
> 
> Certainly 'DPO and FUA' implies we have FUA, but I wonder if this test 
> is unnecessarily narrow.

  Meaning...?

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 07/09] blk: update libata to use the new blk_ordered.
  2005-06-05  7:02   ` Jeff Garzik
@ 2005-06-07  2:11     ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2005-06-07  2:11 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: axboe, James.Bottomley, bzolnier, linux-kernel

Jeff Garzik wrote:
> Tejun Heo wrote:
> 
>> 07_blk_update_libata_to_use_new_ordered.patch
>>
>>     Reflect changes in SCSI midlayer and updated to use the new
>>     ordered request implementation.
>>
>> Signed-off-by: Tejun Heo <htejun@gmail.com>
> 
> 
> I would prefer separate patches for:
> 
> * implement support for FUA bit in libata SCSI simulator
> 
> * update libata for your ordered flush changes
> 

  Sure.

> 
> 
>> Index: blk-fixes/drivers/scsi/ahci.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/ahci.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/ahci.c    2005-06-05 14:53:35.000000000 +0900
>> @@ -203,7 +203,6 @@ static Scsi_Host_Template ahci_sht = {
>>      .dma_boundary        = AHCI_DMA_BOUNDARY,
>>      .slave_configure    = ata_scsi_slave_config,
>>      .bios_param        = ata_std_bios_param,
>> -    .ordered_flush        = 1,
>>  };
>>  
>>  static struct ata_port_operations ahci_ops = {
>> Index: blk-fixes/drivers/scsi/ata_piix.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/ata_piix.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/ata_piix.c    2005-06-05 14:53:35.000000000 
>> +0900
>> @@ -123,7 +123,6 @@ static Scsi_Host_Template piix_sht = {
>>      .dma_boundary        = ATA_DMA_BOUNDARY,
>>      .slave_configure    = ata_scsi_slave_config,
>>      .bios_param        = ata_std_bios_param,
>> -    .ordered_flush        = 1,
>>  };
>>  
>>  static struct ata_port_operations piix_pata_ops = {
>> Index: blk-fixes/drivers/scsi/sata_nv.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/sata_nv.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/sata_nv.c    2005-06-05 14:53:35.000000000 
>> +0900
>> @@ -206,7 +206,6 @@ static Scsi_Host_Template nv_sht = {
>>      .dma_boundary        = ATA_DMA_BOUNDARY,
>>      .slave_configure    = ata_scsi_slave_config,
>>      .bios_param        = ata_std_bios_param,
>> -    .ordered_flush        = 1,
>>  };
>>  
>>  static struct ata_port_operations nv_ops = {
>> Index: blk-fixes/drivers/scsi/sata_promise.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/sata_promise.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/sata_promise.c    2005-06-05 
>> 14:53:35.000000000 +0900
>> @@ -104,7 +104,6 @@ static Scsi_Host_Template pdc_ata_sht =      
>> .dma_boundary        = ATA_DMA_BOUNDARY,
>>      .slave_configure    = ata_scsi_slave_config,
>>      .bios_param        = ata_std_bios_param,
>> -    .ordered_flush        = 1,
>>  };
>>  
>>  static struct ata_port_operations pdc_ata_ops = {
>> Index: blk-fixes/drivers/scsi/sata_sil.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/sata_sil.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/sata_sil.c    2005-06-05 14:53:35.000000000 
>> +0900
>> @@ -135,7 +135,6 @@ static Scsi_Host_Template sil_sht = {
>>      .dma_boundary        = ATA_DMA_BOUNDARY,
>>      .slave_configure    = ata_scsi_slave_config,
>>      .bios_param        = ata_std_bios_param,
>> -    .ordered_flush        = 1,
>>  };
>>  
>>  static struct ata_port_operations sil_ops = {
>> Index: blk-fixes/drivers/scsi/sata_sis.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/sata_sis.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/sata_sis.c    2005-06-05 14:53:35.000000000 
>> +0900
>> @@ -90,7 +90,6 @@ static Scsi_Host_Template sis_sht = {
>>      .dma_boundary        = ATA_DMA_BOUNDARY,
>>      .slave_configure    = ata_scsi_slave_config,
>>      .bios_param        = ata_std_bios_param,
>> -    .ordered_flush        = 1,
>>  };
>>  
>>  static struct ata_port_operations sis_ops = {
>> Index: blk-fixes/drivers/scsi/sata_svw.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/sata_svw.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/sata_svw.c    2005-06-05 14:53:35.000000000 
>> +0900
>> @@ -288,7 +288,6 @@ static Scsi_Host_Template k2_sata_sht =      
>> .proc_info        = k2_sata_proc_info,
>>  #endif
>>      .bios_param        = ata_std_bios_param,
>> -    .ordered_flush        = 1,
>>  };
>>  
>>  
>> Index: blk-fixes/drivers/scsi/sata_sx4.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/sata_sx4.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/sata_sx4.c    2005-06-05 14:53:35.000000000 
>> +0900
>> @@ -188,7 +188,6 @@ static Scsi_Host_Template pdc_sata_sht =
>>      .dma_boundary        = ATA_DMA_BOUNDARY,
>>      .slave_configure    = ata_scsi_slave_config,
>>      .bios_param        = ata_std_bios_param,
>> -    .ordered_flush        = 1,
>>  };
>>  
>>  static struct ata_port_operations pdc_20621_ops = {
>> Index: blk-fixes/drivers/scsi/sata_uli.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/sata_uli.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/sata_uli.c    2005-06-05 14:53:35.000000000 
>> +0900
>> @@ -82,7 +82,6 @@ static Scsi_Host_Template uli_sht = {
>>      .dma_boundary        = ATA_DMA_BOUNDARY,
>>      .slave_configure    = ata_scsi_slave_config,
>>      .bios_param        = ata_std_bios_param,
>> -    .ordered_flush        = 1,
>>  };
>>  
>>  static struct ata_port_operations uli_ops = {
>> Index: blk-fixes/drivers/scsi/sata_via.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/sata_via.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/sata_via.c    2005-06-05 14:53:35.000000000 
>> +0900
>> @@ -102,7 +102,6 @@ static Scsi_Host_Template svia_sht = {
>>      .dma_boundary        = ATA_DMA_BOUNDARY,
>>      .slave_configure    = ata_scsi_slave_config,
>>      .bios_param        = ata_std_bios_param,
>> -    .ordered_flush        = 1,
>>  };
>>  
>>  static struct ata_port_operations svia_sata_ops = {
>> Index: blk-fixes/drivers/scsi/sata_vsc.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/sata_vsc.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/sata_vsc.c    2005-06-05 14:53:35.000000000 
>> +0900
>> @@ -206,7 +206,6 @@ static Scsi_Host_Template vsc_sata_sht =
>>      .dma_boundary        = ATA_DMA_BOUNDARY,
>>      .slave_configure    = ata_scsi_slave_config,
>>      .bios_param        = ata_std_bios_param,
>> -    .ordered_flush        = 1,
>>  };
>>  
>>  
>> Index: blk-fixes/drivers/scsi/libata-core.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/libata-core.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/libata-core.c    2005-06-05 
>> 14:53:35.000000000 +0900
>> @@ -510,19 +510,21 @@ void ata_tf_from_fis(u8 *fis, struct ata
>>  }
>>  
>>  /**
>> - *    ata_prot_to_cmd - determine which read/write opcodes to use
>> + *    ata_prot_to_cmd - determine which read/write/fua-write opcodes 
>> to use
>>   *    @protocol: ATA_PROT_xxx taskfile protocol
>>   *    @lba48: true is lba48 is present
>>   *
>> - *    Given necessary input, determine which read/write commands
>> - *    to use to transfer data.
>> + *    Given necessary input, determine which read/write/fua-write
>> + *    commands to use to transfer data.  Note that we only support
>> + *    fua-writes on DMA LBA48 protocol.  In other cases, we simply
>> + *    return 0 which is NOP.
>>   *
>>   *    LOCKING:
>>   *    None.
>>   */
>>  static int ata_prot_to_cmd(int protocol, int lba48)
>>  {
>> -    int rcmd = 0, wcmd = 0;
>> +    int rcmd = 0, wcmd = 0, wfua = 0;
>>  
>>      switch (protocol) {
>>      case ATA_PROT_PIO:
>> @@ -539,6 +541,7 @@ static int ata_prot_to_cmd(int protocol,
>>          if (lba48) {
>>              rcmd = ATA_CMD_READ_EXT;
>>              wcmd = ATA_CMD_WRITE_EXT;
>> +            wfua = ATA_CMD_WRITE_FUA_EXT;
>>          } else {
>>              rcmd = ATA_CMD_READ;
>>              wcmd = ATA_CMD_WRITE;
>> @@ -549,7 +552,7 @@ static int ata_prot_to_cmd(int protocol,
>>          return -1;
>>      }
>>  
>> -    return rcmd | (wcmd << 8);
>> +    return rcmd | (wcmd << 8) | (wfua << 16);
>>  }
>>  
>>  /**
>> @@ -582,6 +585,7 @@ static void ata_dev_set_protocol(struct  
>>      dev->read_cmd = cmd & 0xff;
>>      dev->write_cmd = (cmd >> 8) & 0xff;
>> +    dev->write_fua_cmd = (cmd >> 16) & 0xff;
>>  }
>>  
>>  static const char * xfer_mode_str[] = {
>> Index: blk-fixes/drivers/scsi/libata-scsi.c
>> ===================================================================
>> --- blk-fixes.orig/drivers/scsi/libata-scsi.c    2005-06-05 
>> 14:50:11.000000000 +0900
>> +++ blk-fixes/drivers/scsi/libata-scsi.c    2005-06-05 
>> 14:53:35.000000000 +0900
>> @@ -569,6 +569,7 @@ static unsigned int ata_scsi_rw_xlat(str
>>      struct ata_device *dev = qc->dev;
>>      unsigned int lba   = tf->flags & ATA_TFLAG_LBA;
>>      unsigned int lba48 = tf->flags & ATA_TFLAG_LBA48;
>> +    int fua = scsicmd[1] & 0x8;
>>      u64 block = 0;
>>      u32 n_block = 0;
>>  
>> @@ -577,9 +578,26 @@ static unsigned int ata_scsi_rw_xlat(str
>>  
>>      if (scsicmd[0] == READ_10 || scsicmd[0] == READ_6 ||
>>          scsicmd[0] == READ_16) {
>> +        if (fua) {
>> +            printk(KERN_WARNING
>> +                   "ata%u(%u): WARNING: FUA READ unsupported\n",
>> +                   qc->ap->id, qc->dev->devno);
>> +            return 1;
>> +        }
>>          tf->command = qc->dev->read_cmd;
>>      } else {
>> -        tf->command = qc->dev->write_cmd;
>> +        if (fua) {
>> +            if (qc->dev->write_fua_cmd == 0 || !lba48) {
>> +                printk(KERN_WARNING
>> +                       "ata%u(%u): WARNING: FUA WRITE "
>> +                       "unsupported with the current "
>> +                       "protocol/addressing\n",
>> +                       qc->ap->id, qc->dev->devno);
>> +                return 1;
>> +            }
>> +            tf->command = qc->dev->write_fua_cmd;
>> +        } else
>> +            tf->command = qc->dev->write_cmd;
>>          tf->flags |= ATA_TFLAG_WRITE;
>>      }
>>  
> 
> 
> this all seems fine.
> 
> 
>> @@ -1205,10 +1223,12 @@ unsigned int ata_scsiop_mode_sense(struc
>>      if (six_byte) {
>>          output_len--;
>>          rbuf[0] = output_len;
>> +        rbuf[2] |= ata_id_has_fua(args->id) ? 0x10 : 0;
>>      } else {
>>          output_len -= 2;
>>          rbuf[0] = output_len >> 8;
>>          rbuf[1] = output_len;
>> +        rbuf[3] |= ata_id_has_fua(args->id) ? 0x10 : 0;
>>      }
> 
> 
> I wonder what a SCSI person thinks about this.  Its defined as 'DPO and 
> FUA' not just 'FUA'.
> 

  As DPO is sort of optimization flag, it doesn't make user-visible 
differences other than in performance.  I think we can add DPO check 
when translating commands and abort it w/ ILLEGAL_REQUEST (thus we'll be 
lying about DPO part of the flag but not be lying that DPO operation 
succeeds.)  But it doesn't make any user-visible difference and we're 
not using DPO in anywhere right now, so I think it's an overkill.

  Any better ideas?  Maybe adding another flag to scsi_device structure 
like ->fua_supported which can be adjusted by slave_config?

> Also, a bit of style:  please use "1 << n" for bit constants in libata.
> 
>     Jeff
> 

  Sure.

  Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH Linux 2.6.12-rc5-mm2 08/09] blk: update IDE to use the new blk_ordered.
  2005-06-05 14:14     ` Bartlomiej Zolnierkiewicz
@ 2005-06-07  2:26       ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2005-06-07  2:26 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Jeff Garzik, axboe, James.Bottomley, linux-kernel


  Hello, Jeff.
  Hello, Bartlomiej.

Bartlomiej Zolnierkiewicz wrote:
> On 6/5/05, Jeff Garzik <jgarzik@pobox.com> wrote:
> 
>>Tejun Heo wrote:
>>
>>>@@ -176,6 +176,18 @@ static ide_startstop_t __ide_do_rw_disk(
>>>                      lba48 = 0;
>>>      }
>>>
>>>+     if (blk_fua_rq(rq) &&
>>>+         (!rq_data_dir(rq) || !drive->select.b.lba || !lba48)) {
>>>+             if (!rq_data_dir(rq))
>>>+                     printk(KERN_WARNING "%s: FUA READ unsupported\n",
>>>+                            drive->name);
>>>+             else
>>>+                     printk(KERN_WARNING "%s: FUA request received but "
>>>+                            "cannot use LBA48\n", drive->name);
>>>+             ide_end_request(drive, 0, 0);
>>>+             return ide_stopped;
>>>+     }
>>>+
>>
>>
>>Does this string of tests really need to be added to the main fast path?
>>
>>It would be better to simply guarantee that this check need never exist
>>in the IDE driver, by guaranteeing that the block layer will never send
>>a FUA-READ command to a driver that does not wish it.
>>
>>        Jeff

  I am not particulary proud of the way I've modified the IDE driver 
but, to defend my ass a bit, the structure of __ide_do_rw_disk() was a 
little bit awkward to add FUA support as it first loads all address 
registers and then looks what to execute, and I didn't want to load 
taskfile registers only to abort the command.

  Currently none issues FUA READs, so the rq_data_dir() test can go away 
but I think it's just nice to have it there for completeness.  As the 
whole test is invoked only when blk_fua_rq() is true, I don't think the 
overhead will be anything accountable (or reducible).  And for the 
following select.b.lba and lba48 tests, we actually only need the lba48 
test but I wasn't really sure whether lba48 implies select.b.lba.  It 
seems it currently does but I couldn't find anything that guarantees it 
by design.  Bartlomiej, is it safe to skip select.b.lba test?

  I'll try to make it look better.  If you have any ideas, please let me 
know.

> 
> 
> Seconded, plus please use <linux/ata.h> instead of <linux/hdreg.h>
> for adding new opcodes.

  will do.

  Thank you a lot.

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2005-06-07  2:26 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-06-05  5:57 [PATCH Linux 2.6.12-rc5-mm2 00/09] blk: ordered request reimplementation (take 2, for review) Tejun Heo
2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 01/09] blk: add @uptodate to end_that_request_last() and @error to rq_end_io_fn() Tejun Heo
2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 02/09] blk: make scsi use -EOPNOTSUPP instead of -EIO on ILLEGAL_REQUEST Tejun Heo
2005-06-05  7:10   ` Jeff Garzik
2005-06-07  1:34     ` Tejun Heo
2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 03/09] blk: make ide use -EOPNOTSUPP instead of -EIO on ABRT_ERR Tejun Heo
2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 04/09] blk: separate out bio init part from __make_request Tejun Heo
2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 05/09] blk: reimplement handling of barrier request Tejun Heo
2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 06/09] blk: update SCSI to use the new blk_ordered Tejun Heo
2005-06-05  7:08   ` Jeff Garzik
2005-06-07  1:58     ` Tejun Heo
2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 07/09] blk: update libata " Tejun Heo
2005-06-05  7:02   ` Jeff Garzik
2005-06-07  2:11     ` Tejun Heo
2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 08/09] blk: update IDE " Tejun Heo
2005-06-05  6:47   ` Jeff Garzik
2005-06-05 14:14     ` Bartlomiej Zolnierkiewicz
2005-06-07  2:26       ` Tejun Heo
2005-06-05  5:57 ` [PATCH Linux 2.6.12-rc5-mm2 09/09] blk: debug messages Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.