* scsi-mq
@ 2014-06-12 13:48 Christoph Hellwig
  2014-06-12 13:48 ` [PATCH 01/14] sd: don't use rq->cmd_len before setting it up Christoph Hellwig
                   ` (15 more replies)
  0 siblings, 16 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

With all the required blk-mq work and the previous set of scsi midlayer
updates in Linus' tree, this is the time for the first formal scsi-mq
submission.

At this point the code is ready for merging and use by developers and early
adopters.  The core blk-mq code isn't that suitable for slow devices
yet, mostly due to the lack of an I/O scheduler, but Jens is working on it.
Similarly there is no dm-multipath support for drivers using blk-mq yet,
but I'm working on it.  It should also be noted that the code doesn't
actually support multiple hardware queues or fine-grained tuning of the
blk-mq parameters yet.  All these could be added fairly easily as soon
as low-level drivers want to make use of them.

The amount of changes to the existing code is fairly small, mostly
speedups or cleanups that apply to the old path as well.  Because
of this I also haven't bothered to put it under a config option, just
like the blk-mq core.

The usage of blk-mq dramatically decreases CPU usage under all workloads,
going down from the 100% CPU usage that the old setup can easily hit to
usually less than 20% for maxing out storage subsystems with 512-byte reads
and writes, and it allows us to easily achieve millions of IOPS.  Bart and
Robert have helped with some very detailed measurements that they might be
able to send in reply to this, although these usually involve significantly
reworked low-level drivers to avoid other bottlenecks.

One major objection to previous iterations of this code was the simple
replacement of the host_lock with atomic counters for the host and busy
counters.  The host_lock avoidance on its own already improves performance,
and with the patch to avoid maintaining the per-target busy counter unless
needed we now replace a lock round trip on the host_lock with just a single
atomic increment in the submission path, and a single atomic decrement in
the completion path, which should provide benefits even for the oddest RISC
architecture.  Longer term I'd still love to get rid of these entirely
and use the counters in blk-mq, but due to the difference in how they
are maintained this doesn't seem feasible as long as we still need to
support the legacy request code path.
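
To illustrate the hot path (a minimal sketch only; example_host_queue_ready
and example_host_unbusy are made-up names, not functions from this series):

	/* submission: one atomic increment instead of a host_lock
	 * lock/unlock round trip */
	static int example_host_queue_ready(struct Scsi_Host *shost)
	{
		unsigned int busy = atomic_inc_return(&shost->host_busy) - 1;

		if (shost->can_queue > 0 && busy >= shost->can_queue) {
			/* over the queue limit: undo the optimistic increment */
			atomic_dec(&shost->host_busy);
			return 0;
		}
		return 1;
	}

	/* completion: a single atomic decrement */
	static void example_host_unbusy(struct Scsi_Host *shost)
	{
		atomic_dec(&shost->host_busy);
	}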

In addition to the patches in this thread there is also a git tree available at:

	git://git.infradead.org/users/hch/scsi.git scsi-mq

This work was sponsored by the ION division of Fusion IO.



* [PATCH 01/14] sd: don't use rq->cmd_len before setting it up
  2014-06-12 13:48 scsi-mq Christoph Hellwig
@ 2014-06-12 13:48 ` Christoph Hellwig
  2014-06-12 13:48 ` [PATCH 02/14] scsi: split __scsi_queue_insert Christoph Hellwig
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

Unlike the old request code, blk-mq doesn't initialize cmd_len with a
default value, so don't rely on it being set in sd_setup_write_same_cmnd.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
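Illustration (not part of the patch): when a request comes from blk-mq,
rq->cmd_len starts out as zero, so the old ordering degenerated into a
no-op memset and left stale bytes in the CDB:

	rq->cmd_len = 0;			/* state on entry under blk-mq */
	memset(rq->cmd, 0, rq->cmd_len);	/* clears zero bytes */
	rq->cmd_len = 16;
	rq->cmd[0] = WRITE_SAME_16;		/* rq->cmd[1..15] may be stale */
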
 drivers/scsi/sd.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e9689d5..dbd0c51 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -810,15 +810,16 @@ static int sd_setup_write_same_cmnd(struct scsi_device *sdp, struct request *rq)
 
 	rq->__data_len = sdp->sector_size;
 	rq->timeout = SD_WRITE_SAME_TIMEOUT;
-	memset(rq->cmd, 0, rq->cmd_len);
 
 	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff) {
 		rq->cmd_len = 16;
+		memset(rq->cmd, 0, rq->cmd_len);
 		rq->cmd[0] = WRITE_SAME_16;
 		put_unaligned_be64(sector, &rq->cmd[2]);
 		put_unaligned_be32(nr_sectors, &rq->cmd[10]);
 	} else {
 		rq->cmd_len = 10;
+		memset(rq->cmd, 0, rq->cmd_len);
 		rq->cmd[0] = WRITE_SAME;
 		put_unaligned_be32(sector, &rq->cmd[2]);
 		put_unaligned_be16(nr_sectors, &rq->cmd[7]);
-- 
1.7.10.4



* [PATCH 02/14] scsi: split __scsi_queue_insert
  2014-06-12 13:48 scsi-mq Christoph Hellwig
  2014-06-12 13:48 ` [PATCH 01/14] sd: don't use rq->cmd_len before setting it up Christoph Hellwig
@ 2014-06-12 13:48 ` Christoph Hellwig
  2014-06-12 13:48 ` [PATCH 03/14] scsi: centralize command re-queueing in scsi_dispatch_fn Christoph Hellwig
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

Factor out a helper to set the _blocked values, which we'll reuse for the
blk-mq code path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_lib.c |   44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f7e3163..7662168 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -75,28 +75,12 @@ struct kmem_cache *scsi_sdb_cache;
  */
 #define SCSI_QUEUE_DELAY	3
 
-/**
- * __scsi_queue_insert - private queue insertion
- * @cmd: The SCSI command being requeued
- * @reason:  The reason for the requeue
- * @unbusy: Whether the queue should be unbusied
- *
- * This is a private queue insertion.  The public interface
- * scsi_queue_insert() always assumes the queue should be unbusied
- * because it's always called before the completion.  This function is
- * for a requeue after completion, which should only occur in this
- * file.
- */
-static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
+static void
+scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
 {
 	struct Scsi_Host *host = cmd->device->host;
 	struct scsi_device *device = cmd->device;
 	struct scsi_target *starget = scsi_target(device);
-	struct request_queue *q = device->request_queue;
-	unsigned long flags;
-
-	SCSI_LOG_MLQUEUE(1,
-		 printk("Inserting command %p into mlqueue\n", cmd));
 
 	/*
 	 * Set the appropriate busy bit for the device/host.
@@ -123,6 +107,30 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
 		starget->target_blocked = starget->max_target_blocked;
 		break;
 	}
+}
+
+/**
+ * __scsi_queue_insert - private queue insertion
+ * @cmd: The SCSI command being requeued
+ * @reason:  The reason for the requeue
+ * @unbusy: Whether the queue should be unbusied
+ *
+ * This is a private queue insertion.  The public interface
+ * scsi_queue_insert() always assumes the queue should be unbusied
+ * because it's always called before the completion.  This function is
+ * for a requeue after completion, which should only occur in this
+ * file.
+ */
+static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
+{
+	struct scsi_device *device = cmd->device;
+	struct request_queue *q = device->request_queue;
+	unsigned long flags;
+
+	SCSI_LOG_MLQUEUE(1,
+		 printk("Inserting command %p into mlqueue\n", cmd));
+
+	scsi_set_blocked(cmd, reason);
 
 	/*
 	 * Decrement the counters, since these commands are no longer
-- 
1.7.10.4



* [PATCH 03/14] scsi: centralize command re-queueing in scsi_dispatch_fn
  2014-06-12 13:48 scsi-mq Christoph Hellwig
  2014-06-12 13:48 ` [PATCH 01/14] sd: don't use rq->cmd_len before setting it up Christoph Hellwig
  2014-06-12 13:48 ` [PATCH 02/14] scsi: split __scsi_queue_insert Christoph Hellwig
@ 2014-06-12 13:48 ` Christoph Hellwig
  2014-06-12 13:48 ` [PATCH 04/14] scsi: set ->scsi_done before calling scsi_dispatch_cmd Christoph Hellwig
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

Make sure we only have the logic for requeueing commands in one place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi.c     |   36 +++++++++++++-----------------------
 drivers/scsi/scsi_lib.c |    9 ++++++---
 2 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 88d46fe..b91abab 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -648,9 +648,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 		 * returns an immediate error upwards, and signals
 		 * that the device is no longer present */
 		cmd->result = DID_NO_CONNECT << 16;
-		scsi_done(cmd);
-		/* return 0 (because the command has been processed) */
-		goto out;
+		goto done;
 	}
 
 	/* Check to see if the scsi lld made this device blocked. */
@@ -662,16 +660,8 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 		 * occur until the device transitions out of the
 		 * suspend state.
 		 */
-
-		scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
-
 		SCSI_LOG_MLQUEUE(3, printk("queuecommand : device blocked \n"));
-
-		/*
-		 * NOTE: rtn is still zero here because we don't need the
-		 * queue to be plugged on return (it's already stopped)
-		 */
-		goto out;
+		return SCSI_MLQUEUE_DEVICE_BUSY;
 	}
 
 	/* 
@@ -695,35 +685,35 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 			       "cdb_size=%d host->max_cmd_len=%d\n",
 			       cmd->cmd_len, cmd->device->host->max_cmd_len));
 		cmd->result = (DID_ABORT << 16);
-
-		scsi_done(cmd);
-		goto out;
+		goto done;
 	}
 
 	if (unlikely(host->shost_state == SHOST_DEL)) {
 		cmd->result = (DID_NO_CONNECT << 16);
-		scsi_done(cmd);
-	} else {
-		trace_scsi_dispatch_cmd_start(cmd);
-		cmd->scsi_done = scsi_done;
-		rtn = host->hostt->queuecommand(host, cmd);
+		goto done;
+
 	}
 
+	trace_scsi_dispatch_cmd_start(cmd);
+
+	cmd->scsi_done = scsi_done;
+	rtn = host->hostt->queuecommand(host, cmd);
 	if (rtn) {
 		trace_scsi_dispatch_cmd_error(cmd, rtn);
 		if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
 		    rtn != SCSI_MLQUEUE_TARGET_BUSY)
 			rtn = SCSI_MLQUEUE_HOST_BUSY;
 
-		scsi_queue_insert(cmd, rtn);
-
 		SCSI_LOG_MLQUEUE(3,
 		    printk("queuecommand : request rejected\n"));
 	}
 
- out:
 	SCSI_LOG_MLQUEUE(3, printk("leaving scsi_dispatch_cmnd()\n"));
 	return rtn;
+ done:
+	SCSI_LOG_MLQUEUE(3, printk("scsi_dispatch_cmnd() failed\n"));
+	scsi_done(cmd);
+	return 0;
 }
 
 /**
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7662168..98eb358 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1583,9 +1583,12 @@ static void scsi_request_fn(struct request_queue *q)
 		 * Dispatch the command to the low-level driver.
 		 */
 		rtn = scsi_dispatch_cmd(cmd);
-		spin_lock_irq(q->queue_lock);
-		if (rtn)
+		if (rtn) {
+			scsi_queue_insert(cmd, rtn);
+			spin_lock_irq(q->queue_lock);
 			goto out_delay;
+		}
+		spin_lock_irq(q->queue_lock);
 	}
 
 	return;
@@ -1605,7 +1608,7 @@ static void scsi_request_fn(struct request_queue *q)
 	blk_requeue_request(q, req);
 	sdev->device_busy--;
 out_delay:
-	if (sdev->device_busy == 0)
+	if (sdev->device_busy == 0 && !scsi_device_blocked(sdev))
 		blk_delay_queue(q, SCSI_QUEUE_DELAY);
 }
 
-- 
1.7.10.4



* [PATCH 04/14] scsi: set ->scsi_done before calling scsi_dispatch_cmd
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (2 preceding siblings ...)
  2014-06-12 13:48 ` [PATCH 03/14] scsi: centralize command re-queueing in scsi_dispatch_fn Christoph Hellwig
@ 2014-06-12 13:48 ` Christoph Hellwig
  2014-06-12 13:48 ` [PATCH 05/14] scsi: push host_lock down into scsi_{host,target}_queue_ready Christoph Hellwig
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

The blk-mq code path will set this to a different function, so make the
code simpler by setting it up in a legacy-request-specific place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi.c     |   23 +----------------------
 drivers/scsi/scsi_lib.c |   20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index b91abab..8ca9ed2 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -72,8 +72,6 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/scsi.h>
 
-static void scsi_done(struct scsi_cmnd *cmd);
-
 /*
  * Definitions and constants.
  */
@@ -695,8 +693,6 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 	}
 
 	trace_scsi_dispatch_cmd_start(cmd);
-
-	cmd->scsi_done = scsi_done;
 	rtn = host->hostt->queuecommand(host, cmd);
 	if (rtn) {
 		trace_scsi_dispatch_cmd_error(cmd, rtn);
@@ -712,28 +708,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 	return rtn;
  done:
 	SCSI_LOG_MLQUEUE(3, printk("scsi_dispatch_cmnd() failed\n"));
-	scsi_done(cmd);
+	cmd->scsi_done(cmd);
 	return 0;
 }
 
 /**
- * scsi_done - Invoke completion on finished SCSI command.
- * @cmd: The SCSI Command for which a low-level device driver (LLDD) gives
- * ownership back to SCSI Core -- i.e. the LLDD has finished with it.
- *
- * Description: This function is the mid-level's (SCSI Core) interrupt routine,
- * which regains ownership of the SCSI command (de facto) from a LLDD, and
- * calls blk_complete_request() for further processing.
- *
- * This function is interrupt context safe.
- */
-static void scsi_done(struct scsi_cmnd *cmd)
-{
-	trace_scsi_dispatch_cmd_done(cmd);
-	blk_complete_request(cmd->request);
-}
-
-/**
  * scsi_finish_command - cleanup and pass command back to upper layer
  * @cmd: the command
  *
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 98eb358..f0ec249 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -29,6 +29,8 @@
 #include <scsi/scsi_eh.h>
 #include <scsi/scsi_host.h>
 
+#include <trace/events/scsi.h>
+
 #include "scsi_priv.h"
 #include "scsi_logging.h"
 
@@ -1480,6 +1482,23 @@ static void scsi_softirq_done(struct request *rq)
 	}
 }
 
+/**
+ * scsi_done - Invoke completion on finished SCSI command.
+ * @cmd: The SCSI Command for which a low-level device driver (LLDD) gives
+ * ownership back to SCSI Core -- i.e. the LLDD has finished with it.
+ *
+ * Description: This function is the mid-level's (SCSI Core) interrupt routine,
+ * which regains ownership of the SCSI command (de facto) from a LLDD, and
+ * calls blk_complete_request() for further processing.
+ *
+ * This function is interrupt context safe.
+ */
+static void scsi_done(struct scsi_cmnd *cmd)
+{
+	trace_scsi_dispatch_cmd_done(cmd);
+	blk_complete_request(cmd->request);
+}
+
 /*
  * Function:    scsi_request_fn()
  *
@@ -1582,6 +1601,7 @@ static void scsi_request_fn(struct request_queue *q)
 		/*
 		 * Dispatch the command to the low-level driver.
 		 */
+		cmd->scsi_done = scsi_done;
 		rtn = scsi_dispatch_cmd(cmd);
 		if (rtn) {
 			scsi_queue_insert(cmd, rtn);
-- 
1.7.10.4



* [PATCH 05/14] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (3 preceding siblings ...)
  2014-06-12 13:48 ` [PATCH 04/14] scsi: set ->scsi_done before calling scsi_dispatch_cmd Christoph Hellwig
@ 2014-06-12 13:48 ` Christoph Hellwig
  2014-06-12 13:48 ` [PATCH 06/14] scsi: convert target_busy to an atomic_t Christoph Hellwig
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

Prepare for not taking a host-wide lock in the dispatch path by pushing
the lock down into the places that actually need it.  Note that this
patch is just a preparation step, as it will actually increase lock
roundtrips and thus decrease performance on its own.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
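Schematically the change looks like this (illustrative pseudo-code, not
the actual hunks):

	/* before: the caller wraps all the checks in one host_lock section */
	spin_lock_irq(shost->host_lock);
	if (!scsi_target_queue_ready(shost, sdev) ||
	    !scsi_host_queue_ready(q, shost, sdev)) {
		spin_unlock_irq(shost->host_lock);
		goto not_ready;
	}
	spin_unlock_irq(shost->host_lock);

	/* after: each helper takes and drops the host_lock internally,
	 * which lets later patches replace it with atomic counters */
	if (!scsi_target_queue_ready(shost, sdev))
		goto not_ready;
	if (!scsi_host_queue_ready(q, shost, sdev))
		goto host_not_ready;
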
 drivers/scsi/scsi_lib.c |   75 ++++++++++++++++++++++++-----------------------
 1 file changed, 39 insertions(+), 36 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f0ec249..3d90340 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1300,18 +1300,18 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
 /*
  * scsi_target_queue_ready: checks if there we can send commands to target
  * @sdev: scsi device on starget to check.
- *
- * Called with the host lock held.
  */
 static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 					   struct scsi_device *sdev)
 {
 	struct scsi_target *starget = scsi_target(sdev);
+	int ret = 0;
 
+	spin_lock_irq(shost->host_lock);
 	if (starget->single_lun) {
 		if (starget->starget_sdev_user &&
 		    starget->starget_sdev_user != sdev)
-			return 0;
+			goto out;
 		starget->starget_sdev_user = sdev;
 	}
 
@@ -1319,57 +1319,66 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 		/*
 		 * unblock after target_blocked iterates to zero
 		 */
-		if (--starget->target_blocked == 0) {
-			SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
-					 "unblocking target at zero depth\n"));
-		} else
-			return 0;
+		if (--starget->target_blocked != 0)
+			goto out;
+
+		SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
+				 "unblocking target at zero depth\n"));
 	}
 
 	if (scsi_target_is_busy(starget)) {
 		list_move_tail(&sdev->starved_entry, &shost->starved_list);
-		return 0;
+		goto out;
 	}
 
-	return 1;
+	scsi_target(sdev)->target_busy++;
+	ret = 1;
+out:
+	spin_unlock_irq(shost->host_lock);
+	return ret;
 }
 
 /*
  * scsi_host_queue_ready: if we can send requests to shost, return 1 else
  * return 0. We must end up running the queue again whenever 0 is
  * returned, else IO can hang.
- *
- * Called with host_lock held.
  */
 static inline int scsi_host_queue_ready(struct request_queue *q,
 				   struct Scsi_Host *shost,
 				   struct scsi_device *sdev)
 {
+	int ret = 0;
+
+	spin_lock_irq(shost->host_lock);
+
 	if (scsi_host_in_recovery(shost))
-		return 0;
+		goto out;
 	if (shost->host_busy == 0 && shost->host_blocked) {
 		/*
 		 * unblock after host_blocked iterates to zero
 		 */
-		if (--shost->host_blocked == 0) {
-			SCSI_LOG_MLQUEUE(3,
-				printk("scsi%d unblocking host at zero depth\n",
-					shost->host_no));
-		} else {
-			return 0;
-		}
+		if (--shost->host_blocked != 0)
+			goto out;
+
+		SCSI_LOG_MLQUEUE(3,
+			printk("scsi%d unblocking host at zero depth\n",
+				shost->host_no));
 	}
 	if (scsi_host_is_busy(shost)) {
 		if (list_empty(&sdev->starved_entry))
 			list_add_tail(&sdev->starved_entry, &shost->starved_list);
-		return 0;
+		goto out;
 	}
 
 	/* We're OK to process the command, so we can't be starved */
 	if (!list_empty(&sdev->starved_entry))
 		list_del_init(&sdev->starved_entry);
 
-	return 1;
+	shost->host_busy++;
+	ret = 1;
+out:
+	spin_unlock_irq(shost->host_lock);
+	return ret;
 }
 
 /*
@@ -1550,7 +1559,7 @@ static void scsi_request_fn(struct request_queue *q)
 			blk_start_request(req);
 		sdev->device_busy++;
 
-		spin_unlock(q->queue_lock);
+		spin_unlock_irq(q->queue_lock);
 		cmd = req->special;
 		if (unlikely(cmd == NULL)) {
 			printk(KERN_CRIT "impossible request in %s.\n"
@@ -1560,7 +1569,6 @@ static void scsi_request_fn(struct request_queue *q)
 			blk_dump_rq_flags(req, "foo");
 			BUG();
 		}
-		spin_lock(shost->host_lock);
 
 		/*
 		 * We hit this when the driver is using a host wide
@@ -1571,9 +1579,11 @@ static void scsi_request_fn(struct request_queue *q)
 		 * a run when a tag is freed.
 		 */
 		if (blk_queue_tagged(q) && !blk_rq_tagged(req)) {
+			spin_lock_irq(shost->host_lock);
 			if (list_empty(&sdev->starved_entry))
 				list_add_tail(&sdev->starved_entry,
 					      &shost->starved_list);
+			spin_unlock_irq(shost->host_lock);
 			goto not_ready;
 		}
 
@@ -1581,16 +1591,7 @@ static void scsi_request_fn(struct request_queue *q)
 			goto not_ready;
 
 		if (!scsi_host_queue_ready(q, shost, sdev))
-			goto not_ready;
-
-		scsi_target(sdev)->target_busy++;
-		shost->host_busy++;
-
-		/*
-		 * XXX(hch): This is rather suboptimal, scsi_dispatch_cmd will
-		 *		take the lock again.
-		 */
-		spin_unlock_irq(shost->host_lock);
+			goto host_not_ready;
 
 		/*
 		 * Finally, initialize any error handling parameters, and set up
@@ -1613,9 +1614,11 @@ static void scsi_request_fn(struct request_queue *q)
 
 	return;
 
- not_ready:
+ host_not_ready:
+	spin_lock_irq(shost->host_lock);
+	scsi_target(sdev)->target_busy--;
 	spin_unlock_irq(shost->host_lock);
-
+ not_ready:
 	/*
 	 * lock q, handle tag, requeue req, and decrement device_busy. We
 	 * must return with queue_lock held.
-- 
1.7.10.4



* [PATCH 06/14] scsi: convert target_busy to an atomic_t
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (4 preceding siblings ...)
  2014-06-12 13:48 ` [PATCH 05/14] scsi: push host_lock down into scsi_{host,target}_queue_ready Christoph Hellwig
@ 2014-06-12 13:48 ` Christoph Hellwig
  2014-06-12 13:48 ` [PATCH 07/14] scsi: convert host_busy to atomic_t Christoph Hellwig
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

Avoid taking the host-wide host_lock to check the per-target queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
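The key detail is that atomic_inc_return() hands back the post-increment
value, so subtracting one yields the number of commands that were already
in flight before ours (annotated excerpt, not an additional hunk):

	busy = atomic_inc_return(&starget->target_busy) - 1;
	/* busy == 0: no other command in flight, so this is the
	 * point where target_blocked may be counted down */
	if (starget->can_queue > 0 && busy >= starget->can_queue)
		goto starved;	/* the slot is handed back via atomic_dec() */
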
 drivers/scsi/scsi_lib.c    |   52 ++++++++++++++++++++++++++------------------
 include/scsi/scsi_device.h |    4 ++--
 2 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3d90340..9e288e6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -294,7 +294,7 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	shost->host_busy--;
-	starget->target_busy--;
+	atomic_dec(&starget->target_busy);
 	if (unlikely(scsi_host_in_recovery(shost) &&
 		     (shost->host_failed || shost->host_eh_scheduled)))
 		scsi_eh_wakeup(shost);
@@ -361,7 +361,7 @@ static inline int scsi_device_is_busy(struct scsi_device *sdev)
 static inline int scsi_target_is_busy(struct scsi_target *starget)
 {
 	return ((starget->can_queue > 0 &&
-		 starget->target_busy >= starget->can_queue) ||
+		 atomic_read(&starget->target_busy) >= starget->can_queue) ||
 		 starget->target_blocked);
 }
 
@@ -1305,37 +1305,49 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 					   struct scsi_device *sdev)
 {
 	struct scsi_target *starget = scsi_target(sdev);
-	int ret = 0;
+	unsigned int busy;
 
-	spin_lock_irq(shost->host_lock);
 	if (starget->single_lun) {
+		spin_lock_irq(shost->host_lock);
 		if (starget->starget_sdev_user &&
-		    starget->starget_sdev_user != sdev)
-			goto out;
+		    starget->starget_sdev_user != sdev) {
+			spin_unlock_irq(shost->host_lock);
+			return 0;
+		}
 		starget->starget_sdev_user = sdev;
+		spin_unlock_irq(shost->host_lock);
 	}
 
-	if (starget->target_busy == 0 && starget->target_blocked) {
+	busy = atomic_inc_return(&starget->target_busy) - 1;
+	if (busy == 0 && starget->target_blocked) {
 		/*
 		 * unblock after target_blocked iterates to zero
 		 */
-		if (--starget->target_blocked != 0)
-			goto out;
+		spin_lock_irq(shost->host_lock);
+		if (--starget->target_blocked != 0) {
+			spin_unlock_irq(shost->host_lock);
+			goto out_dec;
+		}
+		spin_unlock_irq(shost->host_lock);
 
 		SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
 				 "unblocking target at zero depth\n"));
 	}
 
-	if (scsi_target_is_busy(starget)) {
-		list_move_tail(&sdev->starved_entry, &shost->starved_list);
-		goto out;
-	}
+	if (starget->can_queue > 0 && busy >= starget->can_queue)
+		goto starved;
+	if (starget->target_blocked)
+		goto starved;
 
-	scsi_target(sdev)->target_busy++;
-	ret = 1;
-out:
+	return 1;
+
+starved:
+	spin_lock_irq(shost->host_lock);
+	list_move_tail(&sdev->starved_entry, &shost->starved_list);
 	spin_unlock_irq(shost->host_lock);
-	return ret;
+out_dec:
+	atomic_dec(&starget->target_busy);
+	return 0;
 }
 
 /*
@@ -1445,7 +1457,7 @@ static void scsi_kill_request(struct request *req, struct request_queue *q)
 	spin_unlock(sdev->request_queue->queue_lock);
 	spin_lock(shost->host_lock);
 	shost->host_busy++;
-	starget->target_busy++;
+	atomic_inc(&starget->target_busy);
 	spin_unlock(shost->host_lock);
 	spin_lock(sdev->request_queue->queue_lock);
 
@@ -1615,9 +1627,7 @@ static void scsi_request_fn(struct request_queue *q)
 	return;
 
  host_not_ready:
-	spin_lock_irq(shost->host_lock);
-	scsi_target(sdev)->target_busy--;
-	spin_unlock_irq(shost->host_lock);
+	atomic_dec(&scsi_target(sdev)->target_busy);
  not_ready:
 	/*
 	 * lock q, handle tag, requeue req, and decrement device_busy. We
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 5853c91..560847b 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -290,8 +290,8 @@ struct scsi_target {
 	unsigned int		expecting_lun_change:1;	/* A device has reported
 						 * a 3F/0E UA, other devices on
 						 * the same target will also. */
-	/* commands actually active on LLD. protected by host lock. */
-	unsigned int		target_busy;
+	/* commands actually active on LLD. */
+	atomic_t		target_busy;
 	/*
 	 * LLDs should set this in the slave_alloc host template callout.
 	 * If set to zero then there is not limit.
-- 
1.7.10.4



* [PATCH 07/14] scsi: convert host_busy to atomic_t
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (5 preceding siblings ...)
  2014-06-12 13:48 ` [PATCH 06/14] scsi: convert target_busy to an atomic_t Christoph Hellwig
@ 2014-06-12 13:48 ` Christoph Hellwig
  2014-06-12 13:49 ` [PATCH 08/14] scsi: convert device_busy " Christoph Hellwig
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

Avoid taking the host-wide host_lock to check the per-host queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/advansys.c             |    4 +-
 drivers/scsi/libiscsi.c             |    4 +-
 drivers/scsi/libsas/sas_scsi_host.c |    5 ++-
 drivers/scsi/qlogicpti.c            |    2 +-
 drivers/scsi/scsi.c                 |    2 +-
 drivers/scsi/scsi_error.c           |    6 +--
 drivers/scsi/scsi_lib.c             |   71 +++++++++++++++++++++--------------
 drivers/scsi/scsi_sysfs.c           |    9 ++++-
 include/scsi/scsi_host.h            |   10 ++---
 9 files changed, 65 insertions(+), 48 deletions(-)

diff --git a/drivers/scsi/advansys.c b/drivers/scsi/advansys.c
index d814588..0a6ecbd 100644
--- a/drivers/scsi/advansys.c
+++ b/drivers/scsi/advansys.c
@@ -2512,7 +2512,7 @@ static void asc_prt_scsi_host(struct Scsi_Host *s)
 
 	printk("Scsi_Host at addr 0x%p, device %s\n", s, dev_name(boardp->dev));
 	printk(" host_busy %u, host_no %d,\n",
-	       s->host_busy, s->host_no);
+	       atomic_read(&s->host_busy), s->host_no);
 
 	printk(" base 0x%lx, io_port 0x%lx, irq %d,\n",
 	       (ulong)s->base, (ulong)s->io_port, boardp->irq);
@@ -3346,7 +3346,7 @@ static void asc_prt_driver_conf(struct seq_file *m, struct Scsi_Host *shost)
 
 	seq_printf(m,
 		   " host_busy %u, max_id %u, max_lun %u, max_channel %u\n",
-		   shost->host_busy, shost->max_id,
+		   atomic_read(&shost->host_busy), shost->max_id,
 		   shost->max_lun, shost->max_channel);
 
 	seq_printf(m,
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index ecd7bd3..f4e9215 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -2971,7 +2971,7 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn)
 	 */
 	for (;;) {
 		spin_lock_irqsave(session->host->host_lock, flags);
-		if (!session->host->host_busy) { /* OK for ERL == 0 */
+		if (!atomic_read(&session->host->host_busy)) { /* OK for ERL == 0 */
 			spin_unlock_irqrestore(session->host->host_lock, flags);
 			break;
 		}
@@ -2979,7 +2979,7 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn)
 		msleep_interruptible(500);
 		iscsi_conn_printk(KERN_INFO, conn, "iscsi conn_destroy(): "
 				  "host_busy %d host_failed %d\n",
-				  session->host->host_busy,
+				  atomic_read(&session->host->host_busy),
 				  session->host->host_failed);
 		/*
 		 * force eh_abort() to unblock
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 25d0f12..eec31b0 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -812,7 +812,7 @@ retry:
 	spin_unlock_irq(shost->host_lock);
 
 	SAS_DPRINTK("Enter %s busy: %d failed: %d\n",
-		    __func__, shost->host_busy, shost->host_failed);
+		    __func__, atomic_read(&shost->host_busy), shost->host_failed);
 	/*
 	 * Deal with commands that still have SAS tasks (i.e. they didn't
 	 * complete via the normal sas_task completion mechanism),
@@ -857,7 +857,8 @@ out:
 		goto retry;
 
 	SAS_DPRINTK("--- Exit %s: busy: %d failed: %d tries: %d\n",
-		    __func__, shost->host_busy, shost->host_failed, tries);
+		    __func__, atomic_read(&shost->host_busy),
+		    shost->host_failed, tries);
 }
 
 enum blk_eh_timer_return sas_scsi_timed_out(struct scsi_cmnd *cmd)
diff --git a/drivers/scsi/qlogicpti.c b/drivers/scsi/qlogicpti.c
index 6d48d30..740ae49 100644
--- a/drivers/scsi/qlogicpti.c
+++ b/drivers/scsi/qlogicpti.c
@@ -959,7 +959,7 @@ static inline void update_can_queue(struct Scsi_Host *host, u_int in_ptr, u_int
 	/* Temporary workaround until bug is found and fixed (one bug has been found
 	   already, but fixing it makes things even worse) -jj */
 	int num_free = QLOGICPTI_REQ_QUEUE_LEN - REQ_QUEUE_DEPTH(in_ptr, out_ptr) - 64;
-	host->can_queue = host->host_busy + num_free;
+	host->can_queue = atomic_read(&host->host_busy) + num_free;
 	host->sg_tablesize = QLOGICPTI_MAX_SG(num_free);
 }
 
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 8ca9ed2..091329a 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -603,7 +603,7 @@ void scsi_log_completion(struct scsi_cmnd *cmd, int disposition)
 			if (level > 3)
 				scmd_printk(KERN_INFO, cmd,
 					    "scsi host busy %d failed %d\n",
-					    cmd->device->host->host_busy,
+					    atomic_read(&cmd->device->host->host_busy),
 					    cmd->device->host->host_failed);
 		}
 	}
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index cbe38e5..3dae321 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -59,7 +59,7 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *,
 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
 {
-	if (shost->host_busy == shost->host_failed) {
+	if (atomic_read(&shost->host_busy) == shost->host_failed) {
 		trace_scsi_eh_wakeup(shost);
 		wake_up_process(shost->ehandler);
 		SCSI_LOG_ERROR_RECOVERY(5,
@@ -2155,7 +2155,7 @@ int scsi_error_handler(void *data)
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
-		    shost->host_failed != shost->host_busy) {
+		    shost->host_failed != atomic_read(&shost->host_busy)) {
 			SCSI_LOG_ERROR_RECOVERY(1,
 				printk("scsi_eh_%d: sleeping\n",
 					shost->host_no));
@@ -2167,7 +2167,7 @@ int scsi_error_handler(void *data)
 		SCSI_LOG_ERROR_RECOVERY(1,
 			printk("scsi_eh_%d: waking up %d/%d/%d\n",
 			       shost->host_no, shost->host_eh_scheduled,
-			       shost->host_failed, shost->host_busy));
+			       shost->host_failed, atomic_read(&shost->host_busy)));
 
 		/*
 		 * We have a host that is failing for some reason.  Figure out
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9e288e6..3f51bb8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -292,14 +292,17 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 	struct scsi_target *starget = scsi_target(sdev);
 	unsigned long flags;
 
-	spin_lock_irqsave(shost->host_lock, flags);
-	shost->host_busy--;
+	atomic_dec(&shost->host_busy);
 	atomic_dec(&starget->target_busy);
+
 	if (unlikely(scsi_host_in_recovery(shost) &&
-		     (shost->host_failed || shost->host_eh_scheduled)))
+		     (shost->host_failed || shost->host_eh_scheduled))) {
+		spin_lock_irqsave(shost->host_lock, flags);
 		scsi_eh_wakeup(shost);
-	spin_unlock(shost->host_lock);
-	spin_lock(sdev->request_queue->queue_lock);
+		spin_unlock_irqrestore(shost->host_lock, flags);
+	}
+
+	spin_lock_irqsave(sdev->request_queue->queue_lock, flags);
 	sdev->device_busy--;
 	spin_unlock_irqrestore(sdev->request_queue->queue_lock, flags);
 }
@@ -367,7 +370,8 @@ static inline int scsi_target_is_busy(struct scsi_target *starget)
 
 static inline int scsi_host_is_busy(struct Scsi_Host *shost)
 {
-	if ((shost->can_queue > 0 && shost->host_busy >= shost->can_queue) ||
+	if ((shost->can_queue > 0 &&
+	     atomic_read(&shost->host_busy) >= shost->can_queue) ||
 	    shost->host_blocked || shost->host_self_blocked)
 		return 1;
 
@@ -1359,38 +1363,51 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 				   struct Scsi_Host *shost,
 				   struct scsi_device *sdev)
 {
-	int ret = 0;
-
-	spin_lock_irq(shost->host_lock);
+	unsigned int busy;
 
 	if (scsi_host_in_recovery(shost))
-		goto out;
-	if (shost->host_busy == 0 && shost->host_blocked) {
+		return 0;
+
+	busy = atomic_inc_return(&shost->host_busy) - 1;
+	if (busy == 0 && shost->host_blocked) {
 		/*
 		 * unblock after host_blocked iterates to zero
 		 */
-		if (--shost->host_blocked != 0)
-			goto out;
+		spin_lock_irq(shost->host_lock);
+		if (--shost->host_blocked != 0) {
+			spin_unlock_irq(shost->host_lock);
+			goto out_dec;
+		}
+		spin_unlock_irq(shost->host_lock);
 
 		SCSI_LOG_MLQUEUE(3,
 			printk("scsi%d unblocking host at zero depth\n",
 				shost->host_no));
 	}
-	if (scsi_host_is_busy(shost)) {
-		if (list_empty(&sdev->starved_entry))
-			list_add_tail(&sdev->starved_entry, &shost->starved_list);
-		goto out;
-	}
+
+	if (shost->can_queue > 0 && busy >= shost->can_queue)
+		goto starved;
+	if (shost->host_blocked || shost->host_self_blocked)
+		goto starved;
 
 	/* We're OK to process the command, so we can't be starved */
-	if (!list_empty(&sdev->starved_entry))
-		list_del_init(&sdev->starved_entry);
+	if (!list_empty(&sdev->starved_entry)) {
+		spin_lock_irq(shost->host_lock);
+		if (!list_empty(&sdev->starved_entry))
+			list_del_init(&sdev->starved_entry);
+		spin_unlock_irq(shost->host_lock);
+	}
 
-	shost->host_busy++;
-	ret = 1;
-out:
+	return 1;
+
+starved:
+	spin_lock_irq(shost->host_lock);
+	if (list_empty(&sdev->starved_entry))
+		list_add_tail(&sdev->starved_entry, &shost->starved_list);
 	spin_unlock_irq(shost->host_lock);
-	return ret;
+out_dec:
+	atomic_dec(&shost->host_busy);
+	return 0;
 }
 
 /*
@@ -1454,12 +1471,8 @@ static void scsi_kill_request(struct request *req, struct request_queue *q)
 	 * with the locks as normal issue path does.
 	 */
 	sdev->device_busy++;
-	spin_unlock(sdev->request_queue->queue_lock);
-	spin_lock(shost->host_lock);
-	shost->host_busy++;
+	atomic_inc(&shost->host_busy);
 	atomic_inc(&starget->target_busy);
-	spin_unlock(shost->host_lock);
-	spin_lock(sdev->request_queue->queue_lock);
 
 	blk_complete_request(req);
 }
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 074e8cc..f489934 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -334,7 +334,6 @@ store_shost_eh_deadline(struct device *dev, struct device_attribute *attr,
 static DEVICE_ATTR(eh_deadline, S_IRUGO | S_IWUSR, show_shost_eh_deadline, store_shost_eh_deadline);
 
 shost_rd_attr(unique_id, "%u\n");
-shost_rd_attr(host_busy, "%hu\n");
 shost_rd_attr(cmd_per_lun, "%hd\n");
 shost_rd_attr(can_queue, "%hd\n");
 shost_rd_attr(sg_tablesize, "%hu\n");
@@ -344,6 +343,14 @@ shost_rd_attr(prot_capabilities, "%u\n");
 shost_rd_attr(prot_guard_type, "%hd\n");
 shost_rd_attr2(proc_name, hostt->proc_name, "%s\n");
 
+static ssize_t
+show_host_busy(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+	return snprintf(buf, 20, "%hu\n", atomic_read(&shost->host_busy));
+}
+static DEVICE_ATTR(host_busy, S_IRUGO, show_host_busy, NULL);
+
 static struct attribute *scsi_sysfs_shost_attrs[] = {
 	&dev_attr_unique_id.attr,
 	&dev_attr_host_busy.attr,
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 94844fc..590e1af 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -603,13 +603,9 @@ struct Scsi_Host {
 	 */
 	struct blk_queue_tag	*bqt;
 
-	/*
-	 * The following two fields are protected with host_lock;
-	 * however, eh routines can safely access during eh processing
-	 * without acquiring the lock.
-	 */
-	unsigned int host_busy;		   /* commands actually active on low-level */
-	unsigned int host_failed;	   /* commands that failed. */
+	atomic_t host_busy;		   /* commands actually active on low-level */
+	unsigned int host_failed;	   /* commands that failed.
+					      protected by host_lock */
 	unsigned int host_eh_scheduled;    /* EH scheduled without command */
     
 	unsigned int host_no;  /* Used for IOCTL_GET_IDLUN, /proc/scsi et al. */
-- 
1.7.10.4



* [PATCH 08/14] scsi: convert device_busy to atomic_t
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (6 preceding siblings ...)
  2014-06-12 13:48 ` [PATCH 07/14] scsi: convert host_busy to atomic_t Christoph Hellwig
@ 2014-06-12 13:49 ` Christoph Hellwig
  2014-06-12 13:49 ` [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess Christoph Hellwig
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:49 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

Avoid taking the queue_lock to check the per-device queue limit.  Instead
we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.

Unlike the host and target busy counters this doesn't allow us to avoid the
queue_lock in the request_fn due to the way the interface works, but it'll
allow us to prepare for using the blk-mq code, which doesn't use the
queue_lock at all, and it at least avoids a queue_lock roundtrip in
scsi_device_unbusy, which is still important given how busy the queue_lock
is.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/message/fusion/mptsas.c |    2 +-
 drivers/scsi/scsi_lib.c         |   50 ++++++++++++++++++++++-----------------
 drivers/scsi/scsi_sysfs.c       |   10 +++++++-
 drivers/scsi/sg.c               |    2 +-
 include/scsi/scsi_device.h      |    4 +---
 5 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index 711fcb5..d636dbe 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -3763,7 +3763,7 @@ mptsas_send_link_status_event(struct fw_event_work *fw_event)
 						printk(MYIOC_s_DEBUG_FMT
 						"SDEV OUTSTANDING CMDS"
 						"%d\n", ioc->name,
-						sdev->device_busy));
+						atomic_read(&sdev->device_busy)));
 				}
 
 			}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3f51bb8..c36c313 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -302,9 +302,7 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 		spin_unlock_irqrestore(shost->host_lock, flags);
 	}
 
-	spin_lock_irqsave(sdev->request_queue->queue_lock, flags);
-	sdev->device_busy--;
-	spin_unlock_irqrestore(sdev->request_queue->queue_lock, flags);
+	atomic_dec(&sdev->device_busy);
 }
 
 /*
@@ -355,9 +353,10 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 
 static inline int scsi_device_is_busy(struct scsi_device *sdev)
 {
-	if (sdev->device_busy >= sdev->queue_depth || sdev->device_blocked)
+	if (atomic_read(&sdev->device_busy) >= sdev->queue_depth)
+		return 1;
+	if (sdev->device_blocked)
 		return 1;
-
 	return 0;
 }
 
@@ -1224,7 +1223,7 @@ scsi_prep_return(struct request_queue *q, struct request *req, int ret)
 		 * queue must be restarted, so we schedule a callback to happen
 		 * shortly.
 		 */
-		if (sdev->device_busy == 0)
+		if (atomic_read(&sdev->device_busy) == 0)
 			blk_delay_queue(q, SCSI_QUEUE_DELAY);
 		break;
 	default:
@@ -1281,26 +1280,32 @@ static void scsi_unprep_fn(struct request_queue *q, struct request *req)
 static inline int scsi_dev_queue_ready(struct request_queue *q,
 				  struct scsi_device *sdev)
 {
-	if (sdev->device_busy == 0 && sdev->device_blocked) {
+	unsigned int busy;
+
+	busy = atomic_inc_return(&sdev->device_busy) - 1;
+	if (busy == 0 && sdev->device_blocked) {
 		/*
 		 * unblock after device_blocked iterates to zero
 		 */
-		if (--sdev->device_blocked == 0) {
-			SCSI_LOG_MLQUEUE(3,
-				   sdev_printk(KERN_INFO, sdev,
-				   "unblocking device at zero depth\n"));
-		} else {
+		if (--sdev->device_blocked != 0) {
 			blk_delay_queue(q, SCSI_QUEUE_DELAY);
-			return 0;
+			goto out_dec;
 		}
+		SCSI_LOG_MLQUEUE(3, sdev_printk(KERN_INFO, sdev,
+				   "unblocking device at zero depth\n"));
 	}
-	if (scsi_device_is_busy(sdev))
-		return 0;
+
+	if (busy >= sdev->queue_depth)
+		goto out_dec;
+	if (sdev->device_blocked)
+		goto out_dec;
 
 	return 1;
+out_dec:
+	atomic_dec(&sdev->device_busy);
+	return 0;
 }
 
-
 /*
  * scsi_target_queue_ready: checks if there we can send commands to target
  * @sdev: scsi device on starget to check.
@@ -1470,7 +1475,7 @@ static void scsi_kill_request(struct request *req, struct request_queue *q)
 	 * bump busy counts.  To bump the counters, we need to dance
 	 * with the locks as normal issue path does.
 	 */
-	sdev->device_busy++;
+	atomic_inc(&sdev->device_busy);
 	atomic_inc(&shost->host_busy);
 	atomic_inc(&starget->target_busy);
 
@@ -1566,7 +1571,7 @@ static void scsi_request_fn(struct request_queue *q)
 		 * accept it.
 		 */
 		req = blk_peek_request(q);
-		if (!req || !scsi_dev_queue_ready(q, sdev))
+		if (!req)
 			break;
 
 		if (unlikely(!scsi_device_online(sdev))) {
@@ -1576,13 +1581,14 @@ static void scsi_request_fn(struct request_queue *q)
 			continue;
 		}
 
+		if (!scsi_dev_queue_ready(q, sdev))
+			break;
 
 		/*
 		 * Remove the request from the request list.
 		 */
 		if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
 			blk_start_request(req);
-		sdev->device_busy++;
 
 		spin_unlock_irq(q->queue_lock);
 		cmd = req->special;
@@ -1652,9 +1658,9 @@ static void scsi_request_fn(struct request_queue *q)
 	 */
 	spin_lock_irq(q->queue_lock);
 	blk_requeue_request(q, req);
-	sdev->device_busy--;
+	atomic_dec(&sdev->device_busy);
 out_delay:
-	if (sdev->device_busy == 0 && !scsi_device_blocked(sdev))
+	if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
 		blk_delay_queue(q, SCSI_QUEUE_DELAY);
 }
 
@@ -2394,7 +2400,7 @@ scsi_device_quiesce(struct scsi_device *sdev)
 		return err;
 
 	scsi_run_queue(sdev->request_queue);
-	while (sdev->device_busy) {
+	while (atomic_read(&sdev->device_busy)) {
 		msleep_interruptible(200);
 		scsi_run_queue(sdev->request_queue);
 	}
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index f489934..edc0bd4 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -585,13 +585,21 @@ static int scsi_sdev_check_buf_bit(const char *buf)
  * Create the actual show/store functions and data structures.
  */
 sdev_rd_attr (device_blocked, "%d\n");
-sdev_rd_attr (device_busy, "%d\n");
 sdev_rd_attr (type, "%d\n");
 sdev_rd_attr (scsi_level, "%d\n");
 sdev_rd_attr (vendor, "%.8s\n");
 sdev_rd_attr (model, "%.16s\n");
 sdev_rd_attr (rev, "%.4s\n");
 
+static ssize_t
+sdev_show_device_busy(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(dev);
+	return snprintf(buf, 20, "%d\n", atomic_read(&sdev->device_busy));
+}
+static DEVICE_ATTR(device_busy, S_IRUGO, sdev_show_device_busy, NULL);
+
 /*
  * TODO: can we make these symlinks to the block layer ones?
  */
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 53268aa..e75f71e 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -2488,7 +2488,7 @@ static int sg_proc_seq_show_dev(struct seq_file *s, void *v)
 			      scsidp->id, scsidp->lun, (int) scsidp->type,
 			      1,
 			      (int) scsidp->queue_depth,
-			      (int) scsidp->device_busy,
+			      (int) atomic_read(&scsidp->device_busy),
 			      (int) scsi_device_online(scsidp));
 	else
 		seq_printf(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n");
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 560847b..73f9683 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -81,9 +81,7 @@ struct scsi_device {
 	struct list_head    siblings;   /* list of all devices on this host */
 	struct list_head    same_target_siblings; /* just the devices sharing same target id */
 
-	/* this is now protected by the request_queue->queue_lock */
-	unsigned int device_busy;	/* commands actually active on
-					 * low-level. protected by queue_lock. */
+	atomic_t device_busy;		/* commands actually active on LLDD */
 	spinlock_t list_lock;
 	struct list_head cmd_list;	/* queue of in use SCSI Command structures */
 	struct list_head starved_entry;
-- 
1.7.10.4



* [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (7 preceding siblings ...)
  2014-06-12 13:49 ` [PATCH 08/14] scsi: convert device_busy " Christoph Hellwig
@ 2014-06-12 13:49 ` Christoph Hellwig
  2014-06-12 13:49 ` [PATCH 10/14] scsi: only maintain target_blocked if the driver has a target queue limit Christoph Hellwig
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:49 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

Seems like these counters are missing any sort of synchronization for
updates, as an over 10 year old comment from me noted.  Fix this by
using atomic counters, and while we're at it also make sure they are
in the same cacheline as the _busy counters and not needlessly stored
to in every I/O completion.

With the new model the _blocked counters can temporarily go negative,
so all the readers are updated to check for > 0 values.  Longer
term every successful I/O completion will reset the counters to zero,
so the temporarily negative values will not cause any harm.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
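Sketch of the race the last paragraph refers to (annotated excerpts from
this patch, not additional code):

	/* two CPUs can both observe device_blocked == 1 here and both
	 * decrement, leaving the counter at -1 ... */
	if (busy == 0 && atomic_read(&sdev->device_blocked) > 0) {
		if (atomic_dec_return(&sdev->device_blocked) > 0) {
			blk_delay_queue(q, SCSI_QUEUE_DELAY);
			goto out_dec;
		}
	}

	/* ... which is harmless because every reader now tests for > 0 */
	if (atomic_read(&sdev->device_blocked) > 0)
		goto out_dec;

	/* and scsi_finish_command resets the counter to zero on the
	 * next successful completion */
	if (atomic_read(&sdev->device_blocked))
		atomic_set(&sdev->device_blocked, 0);
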
 drivers/scsi/scsi.c        |   21 ++++++------
 drivers/scsi/scsi_lib.c    |   82 +++++++++++++++++++++-----------------------
 drivers/scsi/scsi_sysfs.c  |   10 +++++-
 include/scsi/scsi_device.h |    7 ++--
 include/scsi/scsi_host.h   |    7 ++--
 5 files changed, 64 insertions(+), 63 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 091329a..e30509a 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -730,17 +730,16 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
 
 	scsi_device_unbusy(sdev);
 
-        /*
-         * Clear the flags which say that the device/host is no longer
-         * capable of accepting new commands.  These are set in scsi_queue.c
-         * for both the queue full condition on a device, and for a
-         * host full condition on the host.
-	 *
-	 * XXX(hch): What about locking?
-         */
-        shost->host_blocked = 0;
-	starget->target_blocked = 0;
-        sdev->device_blocked = 0;
+	/*
+	 * Clear the flags which say that the device/target/host is no longer
+	 * capable of accepting new commands.
+	 */
+	if (atomic_read(&shost->host_blocked))
+		atomic_set(&shost->host_blocked, 0);
+	if (atomic_read(&starget->target_blocked))
+		atomic_set(&starget->target_blocked, 0);
+	if (atomic_read(&sdev->device_blocked))
+		atomic_set(&sdev->device_blocked, 0);
 
 	/*
 	 * If we have valid sense information, then some kind of recovery
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c36c313..0e33dee 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -99,14 +99,16 @@ scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
 	 */
 	switch (reason) {
 	case SCSI_MLQUEUE_HOST_BUSY:
-		host->host_blocked = host->max_host_blocked;
+		atomic_set(&host->host_blocked, host->max_host_blocked);
 		break;
 	case SCSI_MLQUEUE_DEVICE_BUSY:
 	case SCSI_MLQUEUE_EH_RETRY:
-		device->device_blocked = device->max_device_blocked;
+		atomic_set(&device->device_blocked,
+			   device->max_device_blocked);
 		break;
 	case SCSI_MLQUEUE_TARGET_BUSY:
-		starget->target_blocked = starget->max_target_blocked;
+		atomic_set(&starget->target_blocked,
+			   starget->max_target_blocked);
 		break;
 	}
 }
@@ -351,30 +353,38 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
-static inline int scsi_device_is_busy(struct scsi_device *sdev)
+static inline bool scsi_device_is_busy(struct scsi_device *sdev)
 {
 	if (atomic_read(&sdev->device_busy) >= sdev->queue_depth)
-		return 1;
-	if (sdev->device_blocked)
-		return 1;
+		return true;
+	if (atomic_read(&sdev->device_blocked) > 0)
+		return true;
-	return 0;
+	return false;
 }
 
-static inline int scsi_target_is_busy(struct scsi_target *starget)
+static inline bool scsi_target_is_busy(struct scsi_target *starget)
 {
-	return ((starget->can_queue > 0 &&
-		 atomic_read(&starget->target_busy) >= starget->can_queue) ||
-		 starget->target_blocked);
+	if (starget->can_queue > 0) {
+		if (atomic_read(&starget->target_busy) >= starget->can_queue)
+			return true;
+		if (atomic_read(&starget->target_blocked) > 0)
+			return true;
+	}
+
+	return false;
 }
 
-static inline int scsi_host_is_busy(struct Scsi_Host *shost)
+static inline bool scsi_host_is_busy(struct Scsi_Host *shost)
 {
-	if ((shost->can_queue > 0 &&
-	     atomic_read(&shost->host_busy) >= shost->can_queue) ||
-	    shost->host_blocked || shost->host_self_blocked)
-		return 1;
+	if (shost->can_queue > 0 &&
+	    atomic_read(&shost->host_busy) >= shost->can_queue)
+		return true;
+	if (atomic_read(&shost->host_blocked) > 0)
+		return true;
+	if (shost->host_self_blocked)
+		return true;
 
-	return 0;
+	return false;
 }
 
 static void scsi_starved_list_run(struct Scsi_Host *shost)
@@ -1283,11 +1294,8 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
 	unsigned int busy;
 
 	busy = atomic_inc_return(&sdev->device_busy) - 1;
-	if (busy == 0 && sdev->device_blocked) {
-		/*
-		 * unblock after device_blocked iterates to zero
-		 */
-		if (--sdev->device_blocked != 0) {
+	if (busy == 0 && atomic_read(&sdev->device_blocked) > 0) {
+		if (atomic_dec_return(&sdev->device_blocked) > 0) {
 			blk_delay_queue(q, SCSI_QUEUE_DELAY);
 			goto out_dec;
 		}
@@ -1297,7 +1305,7 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
 
 	if (busy >= sdev->queue_depth)
 		goto out_dec;
-	if (sdev->device_blocked)
+	if (atomic_read(&sdev->device_blocked) > 0)
 		goto out_dec;
 
 	return 1;
@@ -1328,16 +1336,9 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 	}
 
 	busy = atomic_inc_return(&starget->target_busy) - 1;
-	if (busy == 0 && starget->target_blocked) {
-		/*
-		 * unblock after target_blocked iterates to zero
-		 */
-		spin_lock_irq(shost->host_lock);
-		if (--starget->target_blocked != 0) {
-			spin_unlock_irq(shost->host_lock);
+	if (busy == 0 && atomic_read(&starget->target_blocked) > 0) {
+		if (atomic_dec_return(&starget->target_blocked) > 0)
 			goto out_dec;
-		}
-		spin_unlock_irq(shost->host_lock);
 
 		SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
 				 "unblocking target at zero depth\n"));
@@ -1345,7 +1346,7 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 
 	if (starget->can_queue > 0 && busy >= starget->can_queue)
 		goto starved;
-	if (starget->target_blocked)
+	if (atomic_read(&starget->target_blocked) > 0)
 		goto starved;
 
 	return 1;
@@ -1374,16 +1375,9 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 		return 0;
 
 	busy = atomic_inc_return(&shost->host_busy) - 1;
-	if (busy == 0 && shost->host_blocked) {
-		/*
-		 * unblock after host_blocked iterates to zero
-		 */
-		spin_lock_irq(shost->host_lock);
-		if (--shost->host_blocked != 0) {
-			spin_unlock_irq(shost->host_lock);
+	if (busy == 0 && atomic_read(&shost->host_blocked) > 0) {
+		if (atomic_dec_return(&shost->host_blocked) > 0)
 			goto out_dec;
-		}
-		spin_unlock_irq(shost->host_lock);
 
 		SCSI_LOG_MLQUEUE(3,
 			printk("scsi%d unblocking host at zero depth\n",
@@ -1392,7 +1386,9 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 
 	if (shost->can_queue > 0 && busy >= shost->can_queue)
 		goto starved;
-	if (shost->host_blocked || shost->host_self_blocked)
+	if (atomic_read(&shost->host_blocked) > 0)
+		goto starved;
+	if (shost->host_self_blocked)
 		goto starved;
 
 	/* We're OK to process the command, so we can't be starved */
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index edc0bd4..9efa2b8 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -584,7 +584,6 @@ static int scsi_sdev_check_buf_bit(const char *buf)
 /*
  * Create the actual show/store functions and data structures.
  */
-sdev_rd_attr (device_blocked, "%d\n");
 sdev_rd_attr (type, "%d\n");
 sdev_rd_attr (scsi_level, "%d\n");
 sdev_rd_attr (vendor, "%.8s\n");
@@ -600,6 +599,15 @@ sdev_show_device_busy(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR(device_busy, S_IRUGO, sdev_show_device_busy, NULL);
 
+static ssize_t
+sdev_show_device_blocked(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(dev);
+	return snprintf(buf, 20, "%d\n", atomic_read(&sdev->device_blocked));
+}
+static DEVICE_ATTR(device_blocked, S_IRUGO, sdev_show_device_blocked, NULL);
+
 /*
  * TODO: can we make these symlinks to the block layer ones?
  */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 73f9683..3ba2c24 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -82,6 +82,8 @@ struct scsi_device {
 	struct list_head    same_target_siblings; /* just the devices sharing same target id */
 
 	atomic_t device_busy;		/* commands actually active on LLDD */
+	atomic_t device_blocked;	/* Device returned QUEUE_FULL. */
+
 	spinlock_t list_lock;
 	struct list_head cmd_list;	/* queue of in use SCSI Command structures */
 	struct list_head starved_entry;
@@ -179,8 +181,6 @@ struct scsi_device {
 	struct list_head event_list;	/* asserted events */
 	struct work_struct event_work;
 
-	unsigned int device_blocked;	/* Device returned QUEUE_FULL. */
-
 	unsigned int max_device_blocked; /* what device_blocked counts down from  */
 #define SCSI_DEFAULT_DEVICE_BLOCKED	3
 
@@ -290,12 +290,13 @@ struct scsi_target {
 						 * the same target will also. */
 	/* commands actually active on LLD. */
 	atomic_t		target_busy;
+	atomic_t		target_blocked;
+
 	/*
 	 * LLDs should set this in the slave_alloc host template callout.
 	 * If set to zero then there is not limit.
 	 */
 	unsigned int		can_queue;
-	unsigned int		target_blocked;
 	unsigned int		max_target_blocked;
 #define SCSI_DEFAULT_TARGET_BLOCKED	3
 
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 590e1af..c4e4875 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -604,6 +604,8 @@ struct Scsi_Host {
 	struct blk_queue_tag	*bqt;
 
 	atomic_t host_busy;		   /* commands actually active on low-level */
+	atomic_t host_blocked;
+
 	unsigned int host_failed;	   /* commands that failed.
 					      protected by host_lock */
 	unsigned int host_eh_scheduled;    /* EH scheduled without command */
@@ -703,11 +705,6 @@ struct Scsi_Host {
 	struct workqueue_struct *tmf_work_q;
 
 	/*
-	 * Host has rejected a command because it was busy.
-	 */
-	unsigned int host_blocked;
-
-	/*
 	 * Value host_blocked counts down from
 	 */
 	unsigned int max_host_blocked;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/14] scsi: only maintain target_blocked if the driver has a target queue limit
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (8 preceding siblings ...)
  2014-06-12 13:49 ` [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess Christoph Hellwig
@ 2014-06-12 13:49 ` Christoph Hellwig
  2014-06-21 22:10   ` Elliott, Robert (Server Storage)
  2014-06-12 13:49 ` [PATCH 11/14] scsi: unwind blk_end_request_all and blk_end_request_err calls Christoph Hellwig
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:49 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

This saves us an atomic operation for each I/O submission and completion
for the usual case where the driver doesn't set a per-target can_queue
value.  Only a few iscsi hardware offload drivers set the per-target
can_queue value at the moment.
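
To illustrate, here is a minimal user-space model of the pattern: the
counter is only touched when a per-target limit exists.  All names below
are stand-ins invented for the sketch, not the midlayer API.

#include <stdatomic.h>

struct target {
	int can_queue;			/* <= 0 means no per-target limit */
	atomic_int target_busy;		/* commands in flight on this target */
};

/* Touch the atomic only when the driver actually set a limit. */
static int target_get_budget(struct target *t)
{
	if (t->can_queue <= 0)
		return 1;		/* unlimited: skip the atomic entirely */
	if (atomic_fetch_add(&t->target_busy, 1) >= t->can_queue) {
		atomic_fetch_sub(&t->target_busy, 1);
		return 0;		/* target saturated, try again later */
	}
	return 1;
}

static void target_put_budget(struct target *t)
{
	if (t->can_queue > 0)
		atomic_fetch_sub(&t->target_busy, 1);
}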

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_lib.c |   17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0e33dee..763b3c9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -295,7 +295,8 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 	unsigned long flags;
 
 	atomic_dec(&shost->host_busy);
-	atomic_dec(&starget->target_busy);
+	if (starget->can_queue > 0)
+		atomic_dec(&starget->target_busy);
 
 	if (unlikely(scsi_host_in_recovery(shost) &&
 		     (shost->host_failed || shost->host_eh_scheduled))) {
@@ -1335,6 +1336,9 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 		spin_unlock_irq(shost->host_lock);
 	}
 
+	if (starget->can_queue <= 0)
+		return 1;
+
 	busy = atomic_inc_return(&starget->target_busy) - 1;
 	if (busy == 0 && atomic_read(&starget->target_blocked) > 0) {
 		if (atomic_dec_return(&starget->target_blocked) > 0)
@@ -1344,7 +1348,7 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 				 "unblocking target at zero depth\n"));
 	}
 
-	if (starget->can_queue > 0 && busy >= starget->can_queue)
+	if (busy >= starget->can_queue)
 		goto starved;
 	if (atomic_read(&starget->target_blocked) > 0)
 		goto starved;
@@ -1356,7 +1360,8 @@ starved:
 	list_move_tail(&sdev->starved_entry, &shost->starved_list);
 	spin_unlock_irq(shost->host_lock);
 out_dec:
-	atomic_dec(&starget->target_busy);
+	if (starget->can_queue > 0)
+		atomic_dec(&starget->target_busy);
 	return 0;
 }
 
@@ -1473,7 +1478,8 @@ static void scsi_kill_request(struct request *req, struct request_queue *q)
 	 */
 	atomic_inc(&sdev->device_busy);
 	atomic_inc(&shost->host_busy);
-	atomic_inc(&starget->target_busy);
+	if (starget->can_queue > 0)
+		atomic_inc(&starget->target_busy);
 
 	blk_complete_request(req);
 }
@@ -1642,7 +1648,8 @@ static void scsi_request_fn(struct request_queue *q)
 	return;
 
  host_not_ready:
-	atomic_dec(&scsi_target(sdev)->target_busy);
+	if (scsi_target(sdev)->can_queue > 0)
+		atomic_dec(&scsi_target(sdev)->target_busy);
  not_ready:
 	/*
 	 * lock q, handle tag, requeue req, and decrement device_busy. We
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/14] scsi: unwind blk_end_request_all and blk_end_request_err calls
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (9 preceding siblings ...)
  2014-06-12 13:49 ` [PATCH 10/14] scsi: only maintain target_blocked if the driver has a target queue limit Christoph Hellwig
@ 2014-06-12 13:49 ` Christoph Hellwig
  2014-06-12 13:49 ` [PATCH 12/14] scatterlist: allow chaining to preallocated chunks Christoph Hellwig
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:49 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

Replace the calls to the various blk_end_request variants with open-coded
equivalents.  Blk-mq uses a model that gives the driver control
between the bio updates and the actual completion, and making the old
code follow that same model allows us to keep the code more similar for
both paths.
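
As a rough user-space model of the shape this gives the completion path:
update first, finish only when nothing remains.  update_request() below
stands in for blk_update_request(); none of this is the real block layer
API.

#include <stdbool.h>

struct request { unsigned int remaining; };

/* Advance the request by `bytes'; return true while work remains,
 * mirroring the blk_update_request() calling convention. */
static bool update_request(struct request *req, unsigned int bytes)
{
	if (bytes > req->remaining)
		bytes = req->remaining;
	req->remaining -= bytes;
	return req->remaining != 0;
}

static bool end_request(struct request *req, unsigned int bytes)
{
	if (update_request(req, bytes))
		return true;		/* partial: caller requeues or retries */

	/* all bytes done: release buffers and kick off the next command */
	return false;
}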

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_lib.c |   61 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 42 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 763b3c9..e438726 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -625,6 +625,37 @@ static void scsi_release_bidi_buffers(struct scsi_cmnd *cmd)
 	cmd->request->next_rq->special = NULL;
 }
 
+static bool scsi_end_request(struct request *req, int error,
+		unsigned int bytes, unsigned int bidi_bytes)
+{
+	struct scsi_cmnd *cmd = req->special;
+	struct scsi_device *sdev = cmd->device;
+	struct request_queue *q = sdev->request_queue;
+	unsigned long flags;
+
+
+	if (blk_update_request(req, error, bytes))
+		return true;
+
+	/* Bidi request must be completed as a whole */
+	if (unlikely(bidi_bytes) &&
+	    blk_update_request(req->next_rq, error, bidi_bytes))
+		return true;
+
+	if (blk_queue_add_random(q))
+		add_disk_randomness(req->rq_disk);
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	blk_finish_request(req, error);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	if (bidi_bytes)
+		scsi_release_bidi_buffers(cmd);
+	scsi_release_buffers(cmd);
+	scsi_next_command(cmd);
+	return false;
+}
+
 /**
  * __scsi_error_from_host_byte - translate SCSI error code into errno
  * @cmd:	SCSI command (unused)
@@ -697,7 +728,7 @@ static int __scsi_error_from_host_byte(struct scsi_cmnd *cmd, int result)
  *		   be put back on the queue and retried using the same
  *		   command as before, possibly after a delay.
  *
- *		c) We can call blk_end_request() with -EIO to fail
+ *		c) We can call scsi_end_request() with -EIO to fail
  *		   the remainder of the request.
  */
 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
@@ -749,13 +780,9 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 			 * both sides at once.
 			 */
 			req->next_rq->resid_len = scsi_in(cmd)->resid;
-
-			scsi_release_buffers(cmd);
-			scsi_release_bidi_buffers(cmd);
-
-			blk_end_request_all(req, 0);
-
-			scsi_next_command(cmd);
+			if (scsi_end_request(req, 0, blk_rq_bytes(req),
+					blk_rq_bytes(req->next_rq)))
+				BUG();
 			return;
 		}
 	}
@@ -794,15 +821,16 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	/*
 	 * If we finished all bytes in the request we are done now.
 	 */
-	if (!blk_end_request(req, error, good_bytes))
-		goto next_command;
+	if (!scsi_end_request(req, error, good_bytes, 0))
+		return;
 
 	/*
 	 * Kill remainder if no retrys.
 	 */
 	if (error && scsi_noretry_cmd(cmd)) {
-		blk_end_request_all(req, error);
-		goto next_command;
+		if (scsi_end_request(req, error, blk_rq_bytes(req), 0))
+			BUG();
+		return;
 	}
 
 	/*
@@ -947,8 +975,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 				scsi_print_sense("", cmd);
 			scsi_print_command(cmd);
 		}
-		if (!blk_end_request_err(req, error))
-			goto next_command;
+		if (!scsi_end_request(req, error, blk_rq_err_bytes(req), 0))
+			return;
 		/*FALLTHRU*/
 	case ACTION_REPREP:
 	requeue:
@@ -967,11 +995,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 		__scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY, 0);
 		break;
 	}
-	return;
-
-next_command:
-	scsi_release_buffers(cmd);
-	scsi_next_command(cmd);
 }
 
 static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb,
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 12/14] scatterlist: allow chaining to preallocated chunks
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (10 preceding siblings ...)
  2014-06-12 13:49 ` [PATCH 11/14] scsi: unwind blk_end_request_all and blk_end_request_err calls Christoph Hellwig
@ 2014-06-12 13:49 ` Christoph Hellwig
  2014-06-12 13:49 ` [PATCH 13/14] scsi: add support for a blk-mq based I/O path Christoph Hellwig
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:49 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

Blk-mq drivers usually preallocate their S/G list as part of the request,
but if we want to support the very large S/G lists currently supported by
the SCSI code that would tie up a lot of memory in the preallocated request
pool.  Add support to the scatterlist code so that it can initialize a
S/G list that uses a preallocated first chunk and dynamically allocated
additional chunks.  That way the scsi-mq code can preallocate a first
page worth of S/G entries as part of the request, and dynamically extend
the S/G list when needed.
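
As a user-space sketch of the first-chunk idea: the first run of entries
lives inline in the owning object, and only overflow is heap-allocated.
For brevity the sketch collapses chaining into an all-or-nothing choice;
sizes and names are invented, not the scatterlist API.

#include <stdlib.h>

#define INLINE_ENTS 8			/* example inline capacity */

struct sg_entry { void *page; unsigned int len; };

struct cmd {
	struct sg_entry inline_sg[INLINE_ENTS];	/* preallocated first chunk */
};

static struct sg_entry *cmd_sg_alloc(struct cmd *c, unsigned int nents)
{
	if (nents <= INLINE_ENTS)
		return c->inline_sg;	/* fast path: no allocation at all */
	return calloc(nents, sizeof(struct sg_entry));	/* dynamic overflow */
}

static void cmd_sg_free(struct cmd *c, struct sg_entry *sg)
{
	if (sg != c->inline_sg)		/* never free the inline chunk */
		free(sg);
}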

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_lib.c     |   16 +++++++---------
 include/linux/scatterlist.h |    6 +++---
 lib/scatterlist.c           |   24 ++++++++++++++++--------
 3 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e438726..32fbae4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -567,6 +567,11 @@ static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask)
 	return mempool_alloc(sgp->pool, gfp_mask);
 }
 
+static void scsi_free_sgtable(struct scsi_data_buffer *sdb)
+{
+	__sg_free_table(&sdb->table, SCSI_MAX_SG_SEGMENTS, false, scsi_sg_free);
+}
+
 static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents,
 			      gfp_t gfp_mask)
 {
@@ -575,19 +580,12 @@ static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents,
 	BUG_ON(!nents);
 
 	ret = __sg_alloc_table(&sdb->table, nents, SCSI_MAX_SG_SEGMENTS,
-			       gfp_mask, scsi_sg_alloc);
+			       NULL, gfp_mask, scsi_sg_alloc);
 	if (unlikely(ret))
-		__sg_free_table(&sdb->table, SCSI_MAX_SG_SEGMENTS,
-				scsi_sg_free);
-
+		scsi_free_sgtable(sdb);
 	return ret;
 }
 
-static void scsi_free_sgtable(struct scsi_data_buffer *sdb)
-{
-	__sg_free_table(&sdb->table, SCSI_MAX_SG_SEGMENTS, scsi_sg_free);
-}
-
 /*
  * Function:    scsi_release_buffers()
  *
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index a964f72..f4ec8bb 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -229,10 +229,10 @@ void sg_init_one(struct scatterlist *, const void *, unsigned int);
 typedef struct scatterlist *(sg_alloc_fn)(unsigned int, gfp_t);
 typedef void (sg_free_fn)(struct scatterlist *, unsigned int);
 
-void __sg_free_table(struct sg_table *, unsigned int, sg_free_fn *);
+void __sg_free_table(struct sg_table *, unsigned int, bool, sg_free_fn *);
 void sg_free_table(struct sg_table *);
-int __sg_alloc_table(struct sg_table *, unsigned int, unsigned int, gfp_t,
-		     sg_alloc_fn *);
+int __sg_alloc_table(struct sg_table *, unsigned int, unsigned int,
+		     struct scatterlist *, gfp_t, sg_alloc_fn *);
 int sg_alloc_table(struct sg_table *, unsigned int, gfp_t);
 int sg_alloc_table_from_pages(struct sg_table *sgt,
 	struct page **pages, unsigned int n_pages,
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 3a8e8e8..48c15d2 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -165,6 +165,7 @@ static void sg_kfree(struct scatterlist *sg, unsigned int nents)
  * __sg_free_table - Free a previously mapped sg table
  * @table:	The sg table header to use
  * @max_ents:	The maximum number of entries per single scatterlist
+ * @skip_first_chunk: don't free the (preallocated) first scatterlist chunk
  * @free_fn:	Free function
  *
  *  Description:
@@ -174,7 +175,7 @@ static void sg_kfree(struct scatterlist *sg, unsigned int nents)
  *
  **/
 void __sg_free_table(struct sg_table *table, unsigned int max_ents,
-		     sg_free_fn *free_fn)
+		     bool skip_first_chunk, sg_free_fn *free_fn)
 {
 	struct scatterlist *sgl, *next;
 
@@ -202,7 +203,9 @@ void __sg_free_table(struct sg_table *table, unsigned int max_ents,
 		}
 
 		table->orig_nents -= sg_size;
-		free_fn(sgl, alloc_size);
+		if (!skip_first_chunk)
+			free_fn(sgl, alloc_size);
+		skip_first_chunk = false;
 		sgl = next;
 	}
 
@@ -217,7 +220,7 @@ EXPORT_SYMBOL(__sg_free_table);
  **/
 void sg_free_table(struct sg_table *table)
 {
-	__sg_free_table(table, SG_MAX_SINGLE_ALLOC, sg_kfree);
+	__sg_free_table(table, SG_MAX_SINGLE_ALLOC, false, sg_kfree);
 }
 EXPORT_SYMBOL(sg_free_table);
 
@@ -241,8 +244,8 @@ EXPORT_SYMBOL(sg_free_table);
  *
  **/
 int __sg_alloc_table(struct sg_table *table, unsigned int nents,
-		     unsigned int max_ents, gfp_t gfp_mask,
-		     sg_alloc_fn *alloc_fn)
+		     unsigned int max_ents, struct scatterlist *first_chunk,
+		     gfp_t gfp_mask, sg_alloc_fn *alloc_fn)
 {
 	struct scatterlist *sg, *prv;
 	unsigned int left;
@@ -269,7 +272,12 @@ int __sg_alloc_table(struct sg_table *table, unsigned int nents,
 
 		left -= sg_size;
 
-		sg = alloc_fn(alloc_size, gfp_mask);
+		if (first_chunk) {
+			sg = first_chunk;
+			first_chunk = NULL;
+		} else {
+			sg = alloc_fn(alloc_size, gfp_mask);
+		}
 		if (unlikely(!sg)) {
 			/*
 			 * Adjust entry count to reflect that the last
@@ -324,9 +332,9 @@ int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask)
 	int ret;
 
 	ret = __sg_alloc_table(table, nents, SG_MAX_SINGLE_ALLOC,
-			       gfp_mask, sg_kmalloc);
+			       NULL, gfp_mask, sg_kmalloc);
 	if (unlikely(ret))
-		__sg_free_table(table, SG_MAX_SINGLE_ALLOC, sg_kfree);
+		__sg_free_table(table, SG_MAX_SINGLE_ALLOC, false, sg_kfree);
 
 	return ret;
 }
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 13/14] scsi: add support for a blk-mq based I/O path.
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (11 preceding siblings ...)
  2014-06-12 13:49 ` [PATCH 12/14] scatterlist: allow chaining to preallocated chunks Christoph Hellwig
@ 2014-06-12 13:49 ` Christoph Hellwig
  2014-06-12 13:49 ` [PATCH 14/14] fnic: reject device resets without assigned tags for the blk-mq case Christoph Hellwig
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:49 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

This patch adds support for an alternate I/O path in the scsi midlayer
which uses the blk-mq infrastructure instead of the legacy request code.

Use of blk-mq is fully transparent to drivers, although for now a host
template field is provided to opt out of blk-mq usage in case any unforeseen
incompatibilities arise.

In general replacing the legacy request code with blk-mq is a simple and
mostly mechanical transformation.  The biggest exception is the new code
that deals with the fact that I/O submissions in blk-mq must happen from
process context, which slightly complicates the I/O completion handler.
The second biggest difference is that blk-mq is built around the concept
of preallocated requests that also include driver specific data, which
in SCSI context means the scsi_cmnd structure.  This completely avoids
dynamic memory allocations for the fast path through I/O submission.
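
For illustration, the per-request payload can be pictured with the
arithmetic below.  The sizes are invented for the sketch; only the
ordering (scsi_cmnd, then driver-private data, then inline S/G entries)
follows the description above.

#include <stdio.h>

struct sg_stub { void *page; unsigned int len, off; };	/* stand-in */

int main(void)
{
	size_t scsi_cmnd_size = 328;	/* illustrative, not the real size */
	size_t lld_cmd_size = 64;	/* driver-private area (example) */
	size_t sg_tablesize = 128;	/* host S/G limit (example) */
	size_t sgl = sg_tablesize * sizeof(struct sg_stub);

	/* one contiguous block is carved out per preallocated request */
	printf("per-request payload: %zu bytes\n",
	       scsi_cmnd_size + lld_cmd_size + sgl);
	return 0;
}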

Due to the preallocated requests the MQ code path exclusively uses the
host-wide shared tag allocator instead of a per-LUN one.  This only
affects drivers actually using the block layer provided tag allocator
instead of their own.  Unlike the old path blk-mq always provides a tag,
although drivers don't have to use it.

For now the blk-mq path is disabled by default and must be enabled using
the "use_blk_mq" module parameter.  Once the remaining work in the block
layer to make blk-mq more suitable for slow devices is complete I hope
to make it the default and eventually even remove the old code path.
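
The gating itself is a boolean computed once per host and read-only
afterwards.  A trivial model of that policy, with stand-in types rather
than the SCSI headers:

#include <stdbool.h>

static bool scsi_use_blk_mq;		/* module parameter, default off */

struct host {
	bool hostt_disable_blk_mq;	/* per-template opt-out */
	bool use_blk_mq;		/* decided once, then read-only */
};

static void host_alloc(struct host *h)
{
	h->use_blk_mq = scsi_use_blk_mq && !h->hostt_disable_blk_mq;
}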

Based on the earlier scsi-mq prototype by Nicholas Bellinger.

Thanks to Bart Van Assche and Robert Elliot for testing, benchmarking and
various suggestions and code contributions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/hosts.c      |   30 ++-
 drivers/scsi/scsi.c       |    5 +-
 drivers/scsi/scsi_lib.c   |  460 +++++++++++++++++++++++++++++++++++++++------
 drivers/scsi/scsi_priv.h  |    3 +
 drivers/scsi/scsi_scan.c  |    5 +-
 drivers/scsi/scsi_sysfs.c |    2 +
 include/scsi/scsi_host.h  |   18 +-
 include/scsi/scsi_tcq.h   |   28 ++-
 8 files changed, 481 insertions(+), 70 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 3cbb57a..0dd6874 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -213,9 +213,24 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 		goto fail;
 	}
 
+	if (shost_use_blk_mq(shost)) {
+		error = scsi_mq_setup_tags(shost);
+		if (error)
+			goto fail;
+	}
+
+	/*
+	 * Note that we allocate the freelist even for the MQ case for now,
+	 * as we need a command set aside for scsi_reset_provider.  Having
+	 * the full host freelist and one command available for that is a
+	 * little heavy-handed, but avoids introducing a special allocator
+	 * just for this.  Eventually the structure of scsi_reset_provider
+	 * will need a major overhaul.
+	 */
 	error = scsi_setup_command_freelist(shost);
 	if (error)
-		goto fail;
+		goto out_destroy_tags;
+
 
 	if (!shost->shost_gendev.parent)
 		shost->shost_gendev.parent = dev ? dev : &platform_bus;
@@ -226,7 +241,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 
 	error = device_add(&shost->shost_gendev);
 	if (error)
-		goto out;
+		goto out_destroy_freelist;
 
 	pm_runtime_set_active(&shost->shost_gendev);
 	pm_runtime_enable(&shost->shost_gendev);
@@ -279,8 +294,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 	device_del(&shost->shost_dev);
  out_del_gendev:
 	device_del(&shost->shost_gendev);
- out:
+ out_destroy_freelist:
 	scsi_destroy_command_freelist(shost);
+ out_destroy_tags:
+	if (shost_use_blk_mq(shost))
+		scsi_mq_destroy_tags(shost);
  fail:
 	return error;
 }
@@ -309,7 +327,9 @@ static void scsi_host_dev_release(struct device *dev)
 	}
 
 	scsi_destroy_command_freelist(shost);
-	if (shost->bqt)
+	if (shost_use_blk_mq(shost) && shost->tag_set.tags)
+		scsi_mq_destroy_tags(shost);
+	else if (shost->bqt)
 		blk_free_tags(shost->bqt);
 
 	kfree(shost->shost_data);
@@ -436,6 +456,8 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	else
 		shost->dma_boundary = 0xffffffff;
 
+	shost->use_blk_mq = scsi_use_blk_mq && !shost->hostt->disable_blk_mq;
+
 	device_initialize(&shost->shost_gendev);
 	dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
 	shost->shost_gendev.bus = &scsi_bus_type;
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index e30509a..cc55b74 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -810,7 +810,7 @@ void scsi_adjust_queue_depth(struct scsi_device *sdev, int tagged, int tags)
 	 * is more IO than the LLD's can_queue (so there are not enough
 	 * tags) request_fn's host queue ready check will handle it.
 	 */
-	if (!sdev->host->bqt) {
+	if (!shost_use_blk_mq(sdev->host) && !sdev->host->bqt) {
 		if (blk_queue_tagged(sdev->request_queue) &&
 		    blk_queue_resize_tags(sdev->request_queue, tags) != 0)
 			goto out;
@@ -1364,6 +1364,9 @@ MODULE_LICENSE("GPL");
 module_param(scsi_logging_level, int, S_IRUGO|S_IWUSR);
 MODULE_PARM_DESC(scsi_logging_level, "a bit mask of logging levels");
 
+bool scsi_use_blk_mq = false;
+module_param_named(use_blk_mq, scsi_use_blk_mq, bool, S_IWUSR | S_IRUGO);
+
 static int __init init_scsi(void)
 {
 	int error;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 32fbae4..aecc12e 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -20,6 +20,7 @@
 #include <linux/delay.h>
 #include <linux/hardirq.h>
 #include <linux/scatterlist.h>
+#include <linux/blk-mq.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -113,6 +114,16 @@ scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
 	}
 }
 
+static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd)
+{
+	struct scsi_device *sdev = cmd->device;
+	struct request_queue *q = cmd->request->q;
+
+	blk_mq_requeue_request(cmd->request);
+	blk_mq_kick_requeue_list(q);
+	put_device(&sdev->sdev_gendev);
+}
+
 /**
  * __scsi_queue_insert - private queue insertion
  * @cmd: The SCSI command being requeued
@@ -150,6 +161,10 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
 	 * before blk_cleanup_queue() finishes.
 	 */
 	cmd->result = 0;
+	if (q->mq_ops) {
+		scsi_mq_requeue_cmd(cmd);
+		return;
+	}
 	spin_lock_irqsave(q->queue_lock, flags);
 	blk_requeue_request(q, cmd->request);
 	kblockd_schedule_work(&device->requeue_work);
@@ -308,6 +323,14 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 	atomic_dec(&sdev->device_busy);
 }
 
+static void scsi_kick_queue(struct request_queue *q)
+{
+	if (q->mq_ops)
+		blk_mq_start_hw_queues(q);
+	else
+		blk_run_queue(q);
+}
+
 /*
  * Called for single_lun devices on IO completion. Clear starget_sdev_user,
  * and call blk_run_queue for all the scsi_devices on the target -
@@ -332,7 +355,7 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 	 * but in most cases, we will be first. Ideally, each LU on the
 	 * target would get some limited time or requests on the target.
 	 */
-	blk_run_queue(current_sdev->request_queue);
+	scsi_kick_queue(current_sdev->request_queue);
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	if (starget->starget_sdev_user)
@@ -345,7 +368,7 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 			continue;
 
 		spin_unlock_irqrestore(shost->host_lock, flags);
-		blk_run_queue(sdev->request_queue);
+		scsi_kick_queue(sdev->request_queue);
 		spin_lock_irqsave(shost->host_lock, flags);
 	
 		scsi_device_put(sdev);
@@ -438,7 +461,7 @@ static void scsi_starved_list_run(struct Scsi_Host *shost)
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 
-		blk_run_queue(slq);
+		scsi_kick_queue(slq);
 		blk_put_queue(slq);
 
 		spin_lock_irqsave(shost->host_lock, flags);
@@ -469,7 +492,10 @@ static void scsi_run_queue(struct request_queue *q)
 	if (!list_empty(&sdev->host->starved_list))
 		scsi_starved_list_run(sdev->host);
 
-	blk_run_queue(q);
+	if (q->mq_ops)
+		blk_mq_start_stopped_hw_queues(q, false);
+	else
+		blk_run_queue(q);
 }
 
 void scsi_requeue_run_queue(struct work_struct *work)
@@ -567,25 +593,57 @@ static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask)
 	return mempool_alloc(sgp->pool, gfp_mask);
 }
 
-static void scsi_free_sgtable(struct scsi_data_buffer *sdb)
+static void scsi_free_sgtable(struct scsi_data_buffer *sdb, bool mq)
 {
-	__sg_free_table(&sdb->table, SCSI_MAX_SG_SEGMENTS, false, scsi_sg_free);
+	if (mq && sdb->table.nents <= SCSI_MAX_SG_SEGMENTS)
+		return;
+	__sg_free_table(&sdb->table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
 }
 
 static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents,
-			      gfp_t gfp_mask)
+			      gfp_t gfp_mask, bool mq)
 {
+	struct scatterlist *first_chunk = NULL;
 	int ret;
 
 	BUG_ON(!nents);
 
+	if (mq) {
+		if (nents <= SCSI_MAX_SG_SEGMENTS) {
+			sdb->table.nents = nents;
+			sg_init_table(sdb->table.sgl, sdb->table.nents);
+			return 0;
+		}
+		first_chunk = sdb->table.sgl;
+	}
+
 	ret = __sg_alloc_table(&sdb->table, nents, SCSI_MAX_SG_SEGMENTS,
-			       NULL, gfp_mask, scsi_sg_alloc);
+			       first_chunk, gfp_mask, scsi_sg_alloc);
 	if (unlikely(ret))
-		scsi_free_sgtable(sdb);
+		scsi_free_sgtable(sdb, mq);
 	return ret;
 }
 
+static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
+{
+	if (cmd->sdb.table.nents)
+		scsi_free_sgtable(&cmd->sdb, true);
+	if (cmd->request->next_rq && cmd->request->next_rq->special)
+		scsi_free_sgtable(cmd->request->next_rq->special, true);
+	if (scsi_prot_sg_count(cmd))
+		scsi_free_sgtable(cmd->prot_sdb, true);
+}
+
+static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
+{
+	if (cmd->request->cmd_type == REQ_TYPE_FS) {
+		struct scsi_driver *drv = scsi_cmd_to_driver(cmd);
+
+		if (drv->uninit_command)
+			drv->uninit_command(cmd);
+	}
+}
+
 /*
  * Function:    scsi_release_buffers()
  *
@@ -605,12 +663,12 @@ static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents,
 void scsi_release_buffers(struct scsi_cmnd *cmd)
 {
 	if (cmd->sdb.table.nents)
-		scsi_free_sgtable(&cmd->sdb);
+		scsi_free_sgtable(&cmd->sdb, false);
 
 	memset(&cmd->sdb, 0, sizeof(cmd->sdb));
 
 	if (scsi_prot_sg_count(cmd))
-		scsi_free_sgtable(cmd->prot_sdb);
+		scsi_free_sgtable(cmd->prot_sdb, false);
 }
 EXPORT_SYMBOL(scsi_release_buffers);
 
@@ -618,7 +676,7 @@ static void scsi_release_bidi_buffers(struct scsi_cmnd *cmd)
 {
 	struct scsi_data_buffer *bidi_sdb = cmd->request->next_rq->special;
 
-	scsi_free_sgtable(bidi_sdb);
+	scsi_free_sgtable(bidi_sdb, false);
 	kmem_cache_free(scsi_sdb_cache, bidi_sdb);
 	cmd->request->next_rq->special = NULL;
 }
@@ -631,7 +689,6 @@ static bool scsi_end_request(struct request *req, int error,
 	struct request_queue *q = sdev->request_queue;
 	unsigned long flags;
 
-
 	if (blk_update_request(req, error, bytes))
 		return true;
 
@@ -643,14 +700,38 @@ static bool scsi_end_request(struct request *req, int error,
 	if (blk_queue_add_random(q))
 		add_disk_randomness(req->rq_disk);
 
-	spin_lock_irqsave(q->queue_lock, flags);
-	blk_finish_request(req, error);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	if (req->mq_ctx) {
+		/*
+		 * In the MQ case the command gets freed by __blk_mq_end_io,
+		 * so we have to do all cleanup that depends on it earlier.
+		 *
+		 * We also can't kick the queues from irq context, so we
+		 * will have to defer it to a workqueue.
+		 */
+		cancel_delayed_work(&cmd->abort_work);
+		scsi_mq_free_sgtables(cmd);
+		scsi_uninit_cmd(cmd);
+
+		spin_lock_irqsave(&sdev->list_lock, flags);
+		BUG_ON(list_empty(&cmd->list));
+		list_del_init(&cmd->list);
+		spin_unlock_irqrestore(&sdev->list_lock, flags);
+
+		__blk_mq_end_io(req, error);
+
+		kblockd_schedule_work(&sdev->requeue_work);
+		put_device(&sdev->sdev_gendev);
+	} else {
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_finish_request(req, error);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+
+		if (bidi_bytes)
+			scsi_release_bidi_buffers(cmd);
+		scsi_release_buffers(cmd);
+		scsi_next_command(cmd);
+	}
 
-	if (bidi_bytes)
-		scsi_release_bidi_buffers(cmd);
-	scsi_release_buffers(cmd);
-	scsi_next_command(cmd);
 	return false;
 }
 
@@ -981,8 +1062,16 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 		/* Unprep the request and put it back at the head of the queue.
 		 * A new command will be prepared and issued.
 		 */
-		scsi_release_buffers(cmd);
-		scsi_requeue_command(q, cmd);
+		if (q->mq_ops) {
+			cancel_delayed_work(&cmd->abort_work);
+			cmd->request->cmd_flags &= ~REQ_DONTPREP;
+			scsi_mq_free_sgtables(cmd);
+			scsi_uninit_cmd(cmd);
+			scsi_mq_requeue_cmd(cmd);
+		} else {
+			scsi_release_buffers(cmd);
+			scsi_requeue_command(q, cmd);
+		}
 		break;
 	case ACTION_RETRY:
 		/* Retry the same command immediately */
@@ -1004,9 +1093,8 @@ static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb,
 	 * If sg table allocation fails, requeue request later.
 	 */
 	if (unlikely(scsi_alloc_sgtable(sdb, req->nr_phys_segments,
-					gfp_mask))) {
+					gfp_mask, req->mq_ctx != NULL)))
 		return BLKPREP_DEFER;
-	}
 
 	/* 
 	 * Next, walk the list, and fill in the addresses and sizes of
@@ -1034,21 +1122,27 @@ int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 {
 	struct scsi_device *sdev = cmd->device;
 	struct request *rq = cmd->request;
+	bool is_mq = (rq->mq_ctx != NULL);
+	int error;
 
-	int error = scsi_init_sgtable(rq, &cmd->sdb, gfp_mask);
+	error = scsi_init_sgtable(rq, &cmd->sdb, gfp_mask);
 	if (error)
 		goto err_exit;
 
 	if (blk_bidi_rq(rq)) {
-		struct scsi_data_buffer *bidi_sdb = kmem_cache_zalloc(
-			scsi_sdb_cache, GFP_ATOMIC);
-		if (!bidi_sdb) {
-			error = BLKPREP_DEFER;
-			goto err_exit;
+		if (!rq->q->mq_ops) {
+			struct scsi_data_buffer *bidi_sdb =
+				kmem_cache_zalloc(scsi_sdb_cache, GFP_ATOMIC);
+			if (!bidi_sdb) {
+				error = BLKPREP_DEFER;
+				goto err_exit;
+			}
+
+			rq->next_rq->special = bidi_sdb;
 		}
 
-		rq->next_rq->special = bidi_sdb;
-		error = scsi_init_sgtable(rq->next_rq, bidi_sdb, GFP_ATOMIC);
+		error = scsi_init_sgtable(rq->next_rq, rq->next_rq->special,
+					  GFP_ATOMIC);
 		if (error)
 			goto err_exit;
 	}
@@ -1060,7 +1154,7 @@ int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 		BUG_ON(prot_sdb == NULL);
 		ivecs = blk_rq_count_integrity_sg(rq->q, rq->bio);
 
-		if (scsi_alloc_sgtable(prot_sdb, ivecs, gfp_mask)) {
+		if (scsi_alloc_sgtable(prot_sdb, ivecs, gfp_mask, is_mq)) {
 			error = BLKPREP_DEFER;
 			goto err_exit;
 		}
@@ -1074,13 +1168,16 @@ int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 		cmd->prot_sdb->table.nents = count;
 	}
 
-	return BLKPREP_OK ;
-
+	return BLKPREP_OK;
 err_exit:
-	scsi_release_buffers(cmd);
-	cmd->request->special = NULL;
-	scsi_put_command(cmd);
-	put_device(&sdev->sdev_gendev);
+	if (is_mq) {
+		scsi_mq_free_sgtables(cmd);
+	} else {
+		scsi_release_buffers(cmd);
+		cmd->request->special = NULL;
+		scsi_put_command(cmd);
+		put_device(&sdev->sdev_gendev);
+	}
 	return error;
 }
 EXPORT_SYMBOL(scsi_init_io);
@@ -1295,13 +1392,7 @@ out:
 
 static void scsi_unprep_fn(struct request_queue *q, struct request *req)
 {
-	if (req->cmd_type == REQ_TYPE_FS) {
-		struct scsi_cmnd *cmd = req->special;
-		struct scsi_driver *drv = scsi_cmd_to_driver(cmd);
-
-		if (drv->uninit_command)
-			drv->uninit_command(cmd);
-	}
+	scsi_uninit_cmd(req->special);
 }
 
 /*
@@ -1318,7 +1409,11 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
 	busy = atomic_inc_return(&sdev->device_busy) - 1;
 	if (busy == 0 && atomic_read(&sdev->device_blocked) > 0) {
 		if (atomic_dec_return(&sdev->device_blocked) > 0) {
-			blk_delay_queue(q, SCSI_QUEUE_DELAY);
+			/*
+			 * For the MQ case we take care of this in the caller.
+			 */
+			if (!q->mq_ops)
+				blk_delay_queue(q, SCSI_QUEUE_DELAY);
 			goto out_dec;
 		}
 		SCSI_LOG_MLQUEUE(3, sdev_printk(KERN_INFO, sdev,
@@ -1688,6 +1783,190 @@ out_delay:
 		blk_delay_queue(q, SCSI_QUEUE_DELAY);
 }
 
+static inline int prep_to_mq(int ret)
+{
+	switch (ret) {
+	case BLKPREP_OK:
+		return 0;
+	case BLKPREP_DEFER:
+		return BLK_MQ_RQ_QUEUE_BUSY;
+	default:
+		return BLK_MQ_RQ_QUEUE_ERROR;
+	}
+}
+
+static int scsi_mq_prep_fn(struct request *req)
+{
+	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
+	struct scsi_device *sdev = req->q->queuedata;
+	struct Scsi_Host *shost = sdev->host;
+	unsigned char *sense_buf = cmd->sense_buffer;
+	struct scatterlist *sg;
+
+	memset(cmd, 0, sizeof(struct scsi_cmnd));
+
+	req->special = cmd;
+
+	cmd->request = req;
+	cmd->device = sdev;
+	cmd->sense_buffer = sense_buf;
+
+	cmd->tag = req->tag;
+
+	req->cmd = req->__cmd;
+	cmd->cmnd = req->cmd;
+	cmd->prot_op = SCSI_PROT_NORMAL;
+
+	INIT_LIST_HEAD(&cmd->list);
+	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
+	cmd->jiffies_at_alloc = jiffies;
+
+	/*
+	 * XXX: cmd_list lookups are only used by two drivers, try to get
+	 * rid of this list in common code.
+	 */
+	spin_lock_irq(&sdev->list_lock);
+	list_add_tail(&cmd->list, &sdev->cmd_list);
+	spin_unlock_irq(&sdev->list_lock);
+
+	sg = (void *)cmd + sizeof(struct scsi_cmnd) + shost->hostt->cmd_size;
+	cmd->sdb.table.sgl = sg;
+
+	if (scsi_host_get_prot(shost)) {
+		cmd->prot_sdb = (void *)sg +
+			shost->sg_tablesize * sizeof(struct scatterlist);
+		memset(cmd->prot_sdb, 0, sizeof(struct scsi_data_buffer));
+
+		cmd->prot_sdb->table.sgl =
+			(struct scatterlist *)(cmd->prot_sdb + 1);
+	}
+
+	if (blk_bidi_rq(req)) {
+		struct request *next_rq = req->next_rq;
+		struct scsi_data_buffer *bidi_sdb = blk_mq_rq_to_pdu(next_rq);
+
+		memset(bidi_sdb, 0, sizeof(struct scsi_data_buffer));
+		bidi_sdb->table.sgl =
+			(struct scatterlist *)(bidi_sdb + 1);
+
+		next_rq->special = bidi_sdb;
+	}
+
+	switch (req->cmd_type) {
+	case REQ_TYPE_FS:
+		return scsi_cmd_to_driver(cmd)->init_command(cmd);
+	case REQ_TYPE_BLOCK_PC:
+		return scsi_setup_blk_pc_cmnd(cmd->device, req);
+	default:
+		return BLKPREP_KILL;
+	}
+}
+
+static void scsi_mq_done(struct scsi_cmnd *cmd)
+{
+	trace_scsi_dispatch_cmd_done(cmd);
+	blk_mq_complete_request(cmd->request);
+}
+
+static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
+{
+	struct request_queue *q = req->q;
+	struct scsi_device *sdev = q->queuedata;
+	struct Scsi_Host *shost = sdev->host;
+	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
+	int ret;
+	int reason;
+
+	ret = prep_to_mq(scsi_prep_state_check(sdev, req));
+	if (ret)
+		goto out;
+
+	ret = BLK_MQ_RQ_QUEUE_BUSY;
+	if (!get_device(&sdev->sdev_gendev))
+		goto out;
+
+	if (!scsi_dev_queue_ready(q, sdev))
+		goto out_put_device;
+	if (!scsi_target_queue_ready(shost, sdev))
+		goto out_dec_device_busy;
+	if (!scsi_host_queue_ready(q, shost, sdev))
+		goto out_dec_target_busy;
+
+	if (!(req->cmd_flags & REQ_DONTPREP)) {
+		ret = prep_to_mq(scsi_mq_prep_fn(req));
+		if (ret)
+			goto out_dec_host_busy;
+		req->cmd_flags |= REQ_DONTPREP;
+	}
+
+	scsi_init_cmd_errh(cmd);
+	cmd->scsi_done = scsi_mq_done;
+
+	reason = scsi_dispatch_cmd(cmd);
+	if (reason) {
+		scsi_set_blocked(cmd, reason);
+		ret = BLK_MQ_RQ_QUEUE_BUSY;
+		goto out_dec_host_busy;
+	}
+
+	return BLK_MQ_RQ_QUEUE_OK;
+
+out_dec_host_busy:
+	cancel_delayed_work(&cmd->abort_work);
+	atomic_dec(&shost->host_busy);
+out_dec_target_busy:
+	if (scsi_target(sdev)->can_queue > 0)
+		atomic_dec(&scsi_target(sdev)->target_busy);
+out_dec_device_busy:
+	atomic_dec(&sdev->device_busy);
+out_put_device:
+	put_device(&sdev->sdev_gendev);
+out:
+	switch (ret) {
+	case BLK_MQ_RQ_QUEUE_BUSY:
+		blk_mq_stop_hw_queue(hctx);
+		if (atomic_read(&sdev->device_busy) == 0 &&
+		    !scsi_device_blocked(sdev))
+			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
+		break;
+	case BLK_MQ_RQ_QUEUE_ERROR:
+		/*
+		 * Make sure to release all allocated resources when
+		 * we hit an error, as we will never see this command
+		 * again.
+		 */
+		if (req->cmd_flags & REQ_DONTPREP) {
+			scsi_mq_free_sgtables(cmd);
+			scsi_uninit_cmd(cmd);
+		}
+		break;
+	default:
+		break;
+	}
+	return ret;
+}
+
+static int scsi_init_request(void *data, struct request *rq,
+		unsigned int hctx_idx, unsigned int request_idx,
+		unsigned int numa_node)
+{
+	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
+
+	cmd->sense_buffer = kzalloc_node(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL,
+			numa_node);
+	if (!cmd->sense_buffer)
+		return -ENOMEM;
+	return 0;
+}
+
+static void scsi_exit_request(void *data, struct request *rq,
+		unsigned int hctx_idx, unsigned int request_idx)
+{
+	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
+
+	kfree(cmd->sense_buffer);
+}
+
 u64 scsi_calculate_bounce_limit(struct Scsi_Host *shost)
 {
 	struct device *host_dev;
@@ -1710,16 +1989,10 @@ u64 scsi_calculate_bounce_limit(struct Scsi_Host *shost)
 }
 EXPORT_SYMBOL(scsi_calculate_bounce_limit);
 
-struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
-					 request_fn_proc *request_fn)
+static void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 {
-	struct request_queue *q;
 	struct device *dev = shost->dma_dev;
 
-	q = blk_init_queue(request_fn, NULL);
-	if (!q)
-		return NULL;
-
 	/*
 	 * this limit is imposed by hardware restrictions
 	 */
@@ -1750,7 +2023,17 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
 	 * blk_queue_update_dma_alignment() later.
 	 */
 	blk_queue_dma_alignment(q, 0x03);
+}
+
+struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
+					 request_fn_proc *request_fn)
+{
+	struct request_queue *q;
 
+	q = blk_init_queue(request_fn, NULL);
+	if (!q)
+		return NULL;
+	__scsi_init_queue(shost, q);
 	return q;
 }
 EXPORT_SYMBOL(__scsi_alloc_queue);
@@ -1771,6 +2054,55 @@ struct request_queue *scsi_alloc_queue(struct scsi_device *sdev)
 	return q;
 }
 
+static struct blk_mq_ops scsi_mq_ops = {
+	.map_queue	= blk_mq_map_queue,
+	.queue_rq	= scsi_queue_rq,
+	.complete	= scsi_softirq_done,
+	.timeout	= scsi_times_out,
+	.init_request	= scsi_init_request,
+	.exit_request	= scsi_exit_request,
+};
+
+struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
+{
+	sdev->request_queue = blk_mq_init_queue(&sdev->host->tag_set);
+	if (IS_ERR(sdev->request_queue))
+		return NULL;
+
+	sdev->request_queue->queuedata = sdev;
+	__scsi_init_queue(sdev->host, sdev->request_queue);
+	return sdev->request_queue;
+}
+
+int scsi_mq_setup_tags(struct Scsi_Host *shost)
+{
+	unsigned int cmd_size, sgl_size, tbl_size;
+
+	tbl_size = shost->sg_tablesize;
+	if (tbl_size > SCSI_MAX_SG_SEGMENTS)
+		tbl_size = SCSI_MAX_SG_SEGMENTS;
+	sgl_size = tbl_size * sizeof(struct scatterlist);
+	cmd_size = sizeof(struct scsi_cmnd) + shost->hostt->cmd_size + sgl_size;
+	if (scsi_host_get_prot(shost))
+		cmd_size += sizeof(struct scsi_data_buffer) + sgl_size;
+
+	memset(&shost->tag_set, 0, sizeof(shost->tag_set));
+	shost->tag_set.ops = &scsi_mq_ops;
+	shost->tag_set.nr_hw_queues = 1;
+	shost->tag_set.queue_depth = shost->can_queue;
+	shost->tag_set.cmd_size = cmd_size;
+	shost->tag_set.numa_node = NUMA_NO_NODE;
+	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+	shost->tag_set.driver_data = shost;
+
+	return blk_mq_alloc_tag_set(&shost->tag_set);
+}
+
+void scsi_mq_destroy_tags(struct Scsi_Host *shost)
+{
+	blk_mq_free_tag_set(&shost->tag_set);
+}
+
 /*
  * Function:    scsi_block_requests()
  *
@@ -2516,9 +2848,13 @@ scsi_internal_device_block(struct scsi_device *sdev)
 	 * block layer from calling the midlayer with this device's
 	 * request queue. 
 	 */
-	spin_lock_irqsave(q->queue_lock, flags);
-	blk_stop_queue(q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	if (q->mq_ops) {
+		blk_mq_stop_hw_queues(q);
+	} else {
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_stop_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
 
 	return 0;
 }
@@ -2564,9 +2900,13 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
 		 sdev->sdev_state != SDEV_OFFLINE)
 		return -EINVAL;
 
-	spin_lock_irqsave(q->queue_lock, flags);
-	blk_start_queue(q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	if (q->mq_ops) {
+		blk_mq_start_stopped_hw_queues(q, false);
+	} else {
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_start_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
 
 	return 0;
 }
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 48e5b65..5d8353f 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -88,6 +88,9 @@ extern void scsi_next_command(struct scsi_cmnd *cmd);
 extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
 extern void scsi_run_host_queues(struct Scsi_Host *shost);
 extern struct request_queue *scsi_alloc_queue(struct scsi_device *sdev);
+extern struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev);
+extern int scsi_mq_setup_tags(struct Scsi_Host *shost);
+extern void scsi_mq_destroy_tags(struct Scsi_Host *shost);
 extern int scsi_init_queue(void);
 extern void scsi_exit_queue(void);
 struct request_queue;
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index e02b3aa..e6ce3a1 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -277,7 +277,10 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
 	 */
 	sdev->borken = 1;
 
-	sdev->request_queue = scsi_alloc_queue(sdev);
+	if (shost_use_blk_mq(shost))
+		sdev->request_queue = scsi_mq_alloc_queue(sdev);
+	else
+		sdev->request_queue = scsi_alloc_queue(sdev);
 	if (!sdev->request_queue) {
 		/* release fn is set up in scsi_sysfs_device_initialise, so
 		 * have to free and put manually here */
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 9efa2b8..81f50b1 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -333,6 +333,7 @@ store_shost_eh_deadline(struct device *dev, struct device_attribute *attr,
 
 static DEVICE_ATTR(eh_deadline, S_IRUGO | S_IWUSR, show_shost_eh_deadline, store_shost_eh_deadline);
 
+shost_rd_attr(use_blk_mq, "%d\n");
 shost_rd_attr(unique_id, "%u\n");
 shost_rd_attr(cmd_per_lun, "%hd\n");
 shost_rd_attr(can_queue, "%hd\n");
@@ -352,6 +353,7 @@ show_host_busy(struct device *dev, struct device_attribute *attr, char *buf)
 static DEVICE_ATTR(host_busy, S_IRUGO, show_host_busy, NULL);
 
 static struct attribute *scsi_sysfs_shost_attrs[] = {
+	&dev_attr_use_blk_mq.attr,
 	&dev_attr_unique_id.attr,
 	&dev_attr_host_busy.attr,
 	&dev_attr_cmd_per_lun.attr,
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index c4e4875..f48f9ce 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -7,6 +7,7 @@
 #include <linux/workqueue.h>
 #include <linux/mutex.h>
 #include <linux/seq_file.h>
+#include <linux/blk-mq.h>
 #include <scsi/scsi.h>
 
 struct request_queue;
@@ -531,6 +532,9 @@ struct scsi_host_template {
 	 */
 	unsigned int cmd_size;
 	struct scsi_host_cmd_pool *cmd_pool;
+
+	/* temporary flag to disable blk-mq I/O path */
+	bool disable_blk_mq;
 };
 
 /*
@@ -601,7 +605,10 @@ struct Scsi_Host {
 	 * Area to keep a shared tag map (if needed, will be
 	 * NULL if not).
 	 */
-	struct blk_queue_tag	*bqt;
+	union {
+		struct blk_queue_tag	*bqt;
+		struct blk_mq_tag_set	tag_set;
+	};
 
 	atomic_t host_busy;		   /* commands actually active on low-level */
 	atomic_t host_blocked;
@@ -693,6 +700,8 @@ struct Scsi_Host {
 	/* The controller does not support WRITE SAME */
 	unsigned no_write_same:1;
 
+	unsigned use_blk_mq:1;
+
 	/*
 	 * Optional work queue to be utilized by the transport
 	 */
@@ -793,6 +802,13 @@ static inline int scsi_host_in_recovery(struct Scsi_Host *shost)
 		shost->tmf_in_progress;
 }
 
+extern bool scsi_use_blk_mq;
+
+static inline bool shost_use_blk_mq(struct Scsi_Host *shost)
+{
+	return shost->use_blk_mq;
+}
+
 extern int scsi_queue_work(struct Scsi_Host *, struct work_struct *);
 extern void scsi_flush_work(struct Scsi_Host *);
 
diff --git a/include/scsi/scsi_tcq.h b/include/scsi/scsi_tcq.h
index 81dd12e..cdcc90b 100644
--- a/include/scsi/scsi_tcq.h
+++ b/include/scsi/scsi_tcq.h
@@ -67,7 +67,8 @@ static inline void scsi_activate_tcq(struct scsi_device *sdev, int depth)
 	if (!sdev->tagged_supported)
 		return;
 
-	if (!blk_queue_tagged(sdev->request_queue))
+	if (!shost_use_blk_mq(sdev->host) &&
+	    !blk_queue_tagged(sdev->request_queue))
 		blk_queue_init_tags(sdev->request_queue, depth,
 				    sdev->host->bqt);
 
@@ -80,7 +81,8 @@ static inline void scsi_activate_tcq(struct scsi_device *sdev, int depth)
  **/
 static inline void scsi_deactivate_tcq(struct scsi_device *sdev, int depth)
 {
-	if (blk_queue_tagged(sdev->request_queue))
+	if (!shost_use_blk_mq(sdev->host) &&
+	    blk_queue_tagged(sdev->request_queue))
 		blk_queue_free_tags(sdev->request_queue);
 	scsi_adjust_queue_depth(sdev, 0, depth);
 }
@@ -108,6 +110,15 @@ static inline int scsi_populate_tag_msg(struct scsi_cmnd *cmd, char *msg)
 	return 0;
 }
 
+static inline struct scsi_cmnd *scsi_mq_find_tag(struct Scsi_Host *shost,
+		unsigned int hw_ctx, int tag)
+{
+	struct request *req;
+
+	req = blk_mq_tag_to_rq(shost->tag_set.tags[hw_ctx], tag);
+	return req ? (struct scsi_cmnd *)req->special : NULL;
+}
+
 /**
  * scsi_find_tag - find a tagged command by device
  * @SDpnt:	pointer to the ScSI device
@@ -118,10 +129,12 @@ static inline int scsi_populate_tag_msg(struct scsi_cmnd *cmd, char *msg)
  **/
 static inline struct scsi_cmnd *scsi_find_tag(struct scsi_device *sdev, int tag)
 {
-
         struct request *req;
 
         if (tag != SCSI_NO_TAG) {
+		if (shost_use_blk_mq(sdev->host))
+			return scsi_mq_find_tag(sdev->host, 0, tag);
+
         	req = blk_queue_find_tag(sdev->request_queue, tag);
 	        return req ? (struct scsi_cmnd *)req->special : NULL;
 	}
@@ -130,6 +143,7 @@ static inline struct scsi_cmnd *scsi_find_tag(struct scsi_device *sdev, int tag)
 	return sdev->current_cmnd;
 }
 
+
 /**
  * scsi_init_shared_tag_map - create a shared tag map
  * @shost:	the host to share the tag map among all devices
@@ -138,6 +152,12 @@ static inline struct scsi_cmnd *scsi_find_tag(struct scsi_device *sdev, int tag)
 static inline int scsi_init_shared_tag_map(struct Scsi_Host *shost, int depth)
 {
 	/*
+	 * We always have a shared tag map around when using blk-mq.
+	 */
+	if (shost_use_blk_mq(shost))
+		return 0;
+
+	/*
 	 * If the shared tag map isn't already initialized, do it now.
 	 * This saves callers from having to check ->bqt when setting up
 	 * devices on the shared host (for libata)
@@ -165,6 +185,8 @@ static inline struct scsi_cmnd *scsi_host_find_tag(struct Scsi_Host *shost,
 	struct request *req;
 
 	if (tag != SCSI_NO_TAG) {
+		if (shost_use_blk_mq(shost))
+			return scsi_mq_find_tag(shost, 0, tag);
 		req = blk_map_queue_find_tag(shost->bqt, tag);
 		return req ? (struct scsi_cmnd *)req->special : NULL;
 	}
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 14/14] fnic: reject device resets without assigned tags for the blk-mq case
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (12 preceding siblings ...)
  2014-06-12 13:49 ` [PATCH 13/14] scsi: add support for a blk-mq based I/O path Christoph Hellwig
@ 2014-06-12 13:49 ` Christoph Hellwig
  2014-06-13  6:42 ` scsi-mq Bart Van Assche
  2014-06-17 14:27 ` scsi-mq Bart Van Assche
  15 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:49 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi,
	linux-kernel, Hiral Patel, Suma Ramars, Brian Uchino

Currently the midlayer fakes up a struct request for the explicit reset
ioctls, and those don't have a tag allocated to them.  The fnic driver pokes
into midlayer structures to paper over this design issue, but that won't
work for the blk-mq case.

Either someone who can actually test the hardware will have to come up with
a similar hack for the blk-mq case, or we'll have to bite the bullet and fix
the way the EH ioctls work for real, but until that happens we fail these
explicit requests here.
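
A compact model of the resulting guard, with user-space stand-ins rather
than the fnic structures:

#include <stdbool.h>

struct scmd { int tag; bool host_uses_blk_mq; };

/* Faked-up reset requests carry no tag; the legacy path could borrow
 * one, but on blk-mq the reset is failed early instead. */
static bool device_reset_allowed(const struct scmd *sc)
{
	if (sc->tag >= 0)
		return true;		/* normal case: a tag was assigned */
	return !sc->host_uses_blk_mq;	/* legacy hack may still fake a tag */
}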

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Hiral Patel <hiralpat@cisco.com>
Cc: Suma Ramars <sramars@cisco.com>
Cc: Brian Uchino <buchino@cisco.com>
---
 drivers/scsi/fnic/fnic_scsi.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index ea28b5c..f5dd7e0 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -2224,6 +2224,22 @@ int fnic_device_reset(struct scsi_cmnd *sc)
 
 	tag = sc->request->tag;
 	if (unlikely(tag < 0)) {
+		/*
+		 * XXX(hch): currently the midlayer fakes up a struct
+		 * request for the explicit reset ioctls, and those
+		 * don't have a tag allocated to them.  The below
+		 * code pokes into midlayer structures to paper over
+		 * this design issue, but that won't work for blk-mq.
+		 *
+		 * Either someone who can actually test the hardware
+		 * will have to come up with a similar hack for the
+		 * blk-mq case, or we'll have to bite the bullet and
+		 * fix the way the EH ioctls work for real, but until
+		 * that happens we fail these explicit requests here.
+		 */
+		if (shost_use_blk_mq(sc->device->host))
+			goto fnic_device_reset_end;
+
 		tag = fnic_scsi_host_start_tag(fnic, sc);
 		if (unlikely(tag == SCSI_NO_TAG))
 			goto fnic_device_reset_end;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: scsi-mq
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (13 preceding siblings ...)
  2014-06-12 13:49 ` [PATCH 14/14] fnic: reject device resets without assigned tags for the blk-mq case Christoph Hellwig
@ 2014-06-13  6:42 ` Bart Van Assche
  2014-06-17 14:27 ` scsi-mq Bart Van Assche
  15 siblings, 0 replies; 29+ messages in thread
From: Bart Van Assche @ 2014-06-13  6:42 UTC (permalink / raw)
  To: Christoph Hellwig, James Bottomley
  Cc: Jens Axboe, Robert Elliot, linux-scsi, linux-kernel

On 06/12/14 15:48, Christoph Hellwig wrote:
> The usage of blk-mq dramatically decreases CPU usage under all workloads going
> down from 100% CPU usage that the old setup can hit easily to usually less
> than 20% for maxing out storage subsystems with 512 byte reads and writes,
> and it allows to easily achieve millions of IOPS. Bart and Robert have
> helped with some very detailed measurements that they might be able to send
> in reply to this, although these usually involve significantly reworked low
> level drivers to avoid other bottle necks.

Thanks Christoph for writing up this excellent summary. In case anyone
is wondering which LLD changes Christoph is referring to: the tests I
ran myself were with an SRP initiator driver modified to use multiple
RDMA channels between initiator and target instead of a single channel.
I will post these changes for review on the linux-rdma mailing list as
soon as I have the time.

Bart.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: scsi-mq
  2014-06-12 13:48 scsi-mq Christoph Hellwig
                   ` (14 preceding siblings ...)
  2014-06-13  6:42 ` scsi-mq Bart Van Assche
@ 2014-06-17 14:27 ` Bart Van Assche
  2014-06-18  3:44   ` scsi-mq Jens Axboe
  15 siblings, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2014-06-17 14:27 UTC (permalink / raw)
  To: Christoph Hellwig, James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

On 06/12/14 15:48, Christoph Hellwig wrote:
> Bart and Robert have helped with some very detailed measurements that they
> might be able to send in reply to this, although these usually involve
> significantly reworked low level drivers to avoid other bottle necks.

In case someone would like to see the results of the measurements I ran,
these results can be found here:
https://docs.google.com/file/d/0B1YQOreL3_FxUXFMSjhmNDBNNTg.

Two important conclusions from the data in that PDF document are as follows:
- A small but significant performance improvement for the traditional
  SCSI mid-layer (use_blk_mq=N).
- A very significant performance improvement for multithreaded
  workloads with use_blk_mq=Y. As an example, the number of I/O
  operations per second reported for the random write test increased
  by 170%. That means 2.7 times the performance
  of use_blk_mq=N.

I think this means the scsi-mq patches are ready for wider use.

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: scsi-mq
  2014-06-17 14:27 ` scsi-mq Bart Van Assche
@ 2014-06-18  3:44   ` Jens Axboe
  2014-06-18  7:09     ` scsi-mq Bart Van Assche
  2014-06-19  0:58     ` scsi-mq Elliott, Robert (Server Storage)
  0 siblings, 2 replies; 29+ messages in thread
From: Jens Axboe @ 2014-06-18  3:44 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig, James Bottomley
  Cc: Bart Van Assche, Robert Elliot, linux-scsi, linux-kernel

On 2014-06-17 07:27, Bart Van Assche wrote:
> On 06/12/14 15:48, Christoph Hellwig wrote:
>> Bart and Robert have helped with some very detailed measurements that they
>> might be able to send in reply to this, although these usually involve
>> significantly reworked low level drivers to avoid other bottle necks.
>
> In case someone would like to see the results of the measurements I ran,
> these results can be found here:
> https://docs.google.com/file/d/0B1YQOreL3_FxUXFMSjhmNDBNNTg.
>
> Two important conclusions from the data in that PDF document are as follows:
> - A small but significant performance improvement for the traditional
>    SCSI mid-layer (use_blk_mq=N).
> - A very significant performance improvement for multithreaded
>    workloads with use_blk_mq=Y. As an example, the number of I/O
>    operations per second reported for the random write test increased
>    with 170%. That means 2.7 times the performance
>    of use_blk_mq=N.

Thanks for posting these numbers, Bart. The CPU utilization and IOPS 
speak a very clear message. The only mystery is why the single threaded
performance is down. That we need to sort out, but it's not a show
stopper for inclusion.

If you run the single threaded tests and watch for queue depths, is 
there a difference between blk-mq=y/scsi-mq and the stock kernel?

> I think this means the scsi-mq patches are ready for wider use.

I would agree. James, I haven't seen any comments from you on this yet. 
I've run various bits of scsi-mq testing as well, and no ill effects 
seen. On top of that, Christoph's patches are nicely separated and have
general benefits even for the non-blk-mq cases. Time to shove them into 
the queue for the next merge window?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: scsi-mq
  2014-06-18  3:44   ` scsi-mq Jens Axboe
@ 2014-06-18  7:09     ` Bart Van Assche
  2014-06-21  0:52       ` scsi-mq Elliott, Robert (Server Storage)
  2014-06-19  0:58     ` scsi-mq Elliott, Robert (Server Storage)
  1 sibling, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2014-06-18  7:09 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, James Bottomley
  Cc: Robert Elliot, linux-scsi, linux-kernel

On 06/18/14 05:44, Jens Axboe wrote:
> Thanks for posting these numbers, Bart. The CPU utilization and IOPS
> speak a very clear message. The only mystery is why the singe threaded
> performance is down. That we need to get sort, but it's not a show
> stopper for inclusion.
> 
> If you run the single threaded tests and watch for queue depths, is
> there a difference between blk-mq=y/scsi-mq and the stock kernel?

Hello Jens,

Fio reports the same queue depth for use_blk_mq=Y (mq below) and 
use_blk_mq=N (sq below), namely ">=64". However, the number of context
switches differs significantly for the random read-write tests. From
the fio output for these tests:

$ grep ctx= {sq,mq}/randrw*
sq/randrw-1.txt:  cpu          : usr=10.25%, sys=89.42%, ctx=2210, majf=0, minf=1
sq/randrw-2.txt:  cpu          : usr=10.23%, sys=89.34%, ctx=5003, majf=0, minf=2
sq/randrw-3.txt:  cpu          : usr=9.36%, sys=90.21%, ctx=7947, majf=0, minf=3
sq/randrw-4.txt:  cpu          : usr=8.96%, sys=90.10%, ctx=19308, majf=0, minf=4
sq/randrw-5.txt:  cpu          : usr=8.97%, sys=89.70%, ctx=31494, majf=0, minf=5
sq/randrw-6.txt:  cpu          : usr=8.39%, sys=90.08%, ctx=47826, majf=0, minf=6
sq/randrw-7.txt:  cpu          : usr=7.65%, sys=89.65%, ctx=130563, majf=0, minf=7
sq/randrw-8.txt:  cpu          : usr=6.47%, sys=84.08%, ctx=753140, majf=0, minf=8
mq/randrw-1.txt:  cpu          : usr=1.43%, sys=14.43%, ctx=500998, majf=0, minf=1
mq/randrw-2.txt:  cpu          : usr=1.37%, sys=14.13%, ctx=979842, majf=0, minf=2
mq/randrw-3.txt:  cpu          : usr=1.47%, sys=14.81%, ctx=1547996, majf=0, minf=3
mq/randrw-4.txt:  cpu          : usr=1.79%, sys=16.51%, ctx=2321154, majf=0, minf=4
mq/randrw-5.txt:  cpu          : usr=2.49%, sys=22.09%, ctx=4145747, majf=0, minf=5
mq/randrw-6.txt:  cpu          : usr=2.98%, sys=27.07%, ctx=6356183, majf=0, minf=6
mq/randrw-7.txt:  cpu          : usr=3.39%, sys=30.48%, ctx=8675960, majf=0, minf=7
mq/randrw-8.txt:  cpu          : usr=3.37%, sys=31.46%, ctx=10462001, majf=0, minf=8

It seems that with the traditional SCSI mid-layer and block core (sq)
the number of context switches does not depend too much on the
number of I/O operations, but that for the multi-queue SCSI core there
are a little more than two context switches per I/O in the
particular test I ran. The "randrw" script I used for this test takes
SCSI LUNs as arguments (/dev/sdX) and starts the fio tool as follows:

for d in "$@"; do
    if [ ! -e "$d" ]; then
	echo "Error: device $d not found."
	exit 1
    fi
    bdev="/sys/class/block/$(basename $d)"
    if [ -e $bdev/queue ]; then
	# no entropy contribution, non-rotational, route completions to
	# the submitting CPU (rq_affinity=2), and no I/O scheduler
	echo 0    >$bdev/queue/add_random
	echo 0    >$bdev/queue/rotational
	echo 2    >$bdev/queue/rq_affinity
	echo noop >$bdev/queue/scheduler
    fi
done

"$(dirname $0)"/disable-frequency-scaling

fio --bs=512 --ioengine=libaio --rw=randrw --iodepth=128          \
    --iodepth_batch=64 --iodepth_batch_complete=64                \
    --buffered=0 --norandommap --thread --loops=$((2**31))        \
    --runtime=60 --group_reporting --gtod_reduce=1 --invalidate=1 \
    $(for d in "$@"; do echo --name=$d --filename=$d; done)

"$(dirname $0)"/restore-frequency-scaling

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: scsi-mq
  2014-06-18  3:44   ` scsi-mq Jens Axboe
  2014-06-18  7:09     ` scsi-mq Bart Van Assche
@ 2014-06-19  0:58     ` Elliott, Robert (Server Storage)
  1 sibling, 0 replies; 29+ messages in thread
From: Elliott, Robert (Server Storage) @ 2014-06-19  0:58 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, Christoph Hellwig, James Bottomley,
	scameron
  Cc: Bart Van Assche, linux-scsi, linux-kernel



> -----Original Message-----
> From: Jens Axboe [mailto:axboe@kernel.dk]
> Sent: Tuesday, 17 June, 2014 10:45 PM
> To: Bart Van Assche; Christoph Hellwig; James Bottomley
> Cc: Bart Van Assche; Elliott, Robert (Server Storage); linux-
> scsi@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: scsi-mq
> 
> On 2014-06-17 07:27, Bart Van Assche wrote:
> > On 06/12/14 15:48, Christoph Hellwig wrote:
> >> Bart and Robert have helped with some very detailed measurements that they
> >> might be able to send in reply to this, although these usually involve
> >> significantly reworked low level drivers to avoid other bottle necks.
> >
> > In case someone would like to see the results of the measurements I ran,
> > these results can be found here:
> > https://docs.google.com/file/d/0B1YQOreL3_FxUXFMSjhmNDBNNTg.
> >
> > Two important conclusions from the data in that PDF document are as
> follows:
> > - A small but significant performance improvement for the traditional
> >    SCSI mid-layer (use_blk_mq=N).
> > - A very significant performance improvement for multithreaded
> >    workloads with use_blk_mq=Y. As an example, the number of I/O
> >    operations per second reported for the random write test increased
> >    by 170%. That means 2.7 times the performance
> >    of use_blk_mq=N.
> 
> Thanks for posting these numbers, Bart. The CPU utilization and IOPS
> speak a very clear message. The only mystery is why the single
> threaded performance is down. That we need to get sorted, but it's
> not a show stopper for inclusion.
> 
> If you run the single threaded tests and watch for queue depths, is
> there a difference between blk-mq=y/scsi-mq and the stock kernel?
> 
> > I think this means the scsi-mq patches are ready for wider use.
> 
> I would agree. James, I haven't seen any comments from you on this yet.
> I've run various bits of scsi-mq testing as well, and no ill effects
> seen. On top of that, Christoph's patches are nicely separated and have
> general benefits even for the non-blk-mq cases. Time to shove them into
> the queue for the next merge window?
> 
> --
> Jens Axboe

We've been testing the hpsa driver extensively with the scsi-mq-wip trees.
I don't have numbers with the latest scsi-mq tree yet, but here are some
performance numbers from scsi-mq-wip.5 through 7.  

scsi-mq slightly underperformed non-scsi-mq when using multiple devices:
* normal		975K IOPS (16 devices each made from 1 drive)
* scsi-mq-wip.5	905K IOPS (16 devices each made from 1 drive)
* scsi-mq-wip.6+	969K IOPS (16 devices... 3 threads per device)

but was much better when using a single device:
* normal		166K IOPS (1 device made from 8 drives, 1 thread)          
* normal		266K IOPS (1 device made from 8 drives, 12 threads)
* scsi-mq-wip.5	880K IOPS (1 device made from 8 drives, 12 threads)

* normal		266K IOPS (1 device made from 16 drives, 12 threads)
* scsi-mq-wip.5	973K IOPS (1 device made from 16 drives, 12 threads)
* scsi-mq-wip.6+	979K IOPS (1 device made from 16 drives, 12 threads)


The headline improvement is that one device can reach the same performance 
as multiple devices - no more bottleneck in per-device queue locks limiting 
performance to around 266K IOPS per device.  Even the scsi_debug driver in
fake_rw mode hits that limit.

hpsa is limited to one submission queue, so submissions from multiple CPUs 
still meet inside the driver - SCSI Express will keep them isolated all 
the way.  hpsa supports one completion queue per CPU, so completions are 
already isolated.
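
For illustration, here is roughly how a driver that registers directly
with blk-mq declares its hardware submission queue count (the SCSI
midlayer doesn't expose this to LLDDs yet).  All names below are
hypothetical, not actual hpsa code:

#include <linux/blk-mq.h>
#include <linux/string.h>

/* Hypothetical per-command and per-HBA structures for the sketch. */
struct myhba_cmd { unsigned char cdb[16]; };

struct myhba {
	struct blk_mq_tag_set tag_set;
};

extern struct blk_mq_ops myhba_mq_ops;	/* .queue_rq, .map_queue, ... */

static int myhba_setup_tags(struct myhba *h)
{
	memset(&h->tag_set, 0, sizeof(h->tag_set));
	h->tag_set.ops = &myhba_mq_ops;
	/*
	 * With nr_hw_queues == 1 submissions from all CPUs funnel into
	 * a single hardware queue, which is the situation described
	 * above; hardware with per-CPU submission queues would raise
	 * this to keep submissions isolated all the way down.
	 */
	h->tag_set.nr_hw_queues = 1;
	h->tag_set.queue_depth = 1024;
	h->tag_set.cmd_size = sizeof(struct myhba_cmd);
	h->tag_set.numa_node = NUMA_NO_NODE;

	return blk_mq_alloc_tag_set(&h->tag_set);
}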

The blk-mq bitmap tag allocator is working much better than its 
predecessor, but some combinations of active CPUs and devices still 
result in low queue depths for some devices.

We haven't fully tested cases where the hardware interrupt is handled
on a different CPU than the one on which the block layer wants to run
its completion processing (per rq_affinity).  That completion work was
previously scheduled as a softirq, but is now handled directly in
hardirq context with IPIs.  This changes the CPU utilization %soft
and %hard metrics:
* normal 	5% hard, 25% soft
* scsi-mq	30% hard, 0% soft
(with something like 5% usr, 55% sys, 8% iowait, 2% idle)
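
For reference, the steering logic itself is short; this is a
simplified paraphrase of __blk_mq_complete_request() in
block/blk-mq.c, not a literal copy (the real code also honors
QUEUE_FLAG_SAME_COMP and shared caches):

static void example_complete_remote(void *data)
{
	struct request *rq = data;

	/* Runs in hardirq context on the submitting CPU. */
	rq->q->softirq_done_fn(rq);
}

static void example_complete(struct request *rq)
{
	struct blk_mq_ctx *ctx = rq->mq_ctx;
	int cpu = get_cpu();

	if (cpu != ctx->cpu && cpu_online(ctx->cpu)) {
		/* Bounce the completion over with an IPI. */
		rq->csd.func = example_complete_remote;
		rq->csd.info = rq;
		rq->csd.flags = 0;
		smp_call_function_single_async(ctx->cpu, &rq->csd);
	} else {
		rq->q->softirq_done_fn(rq);	/* complete locally */
	}
	put_cpu();
}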


Configuration:
* HP ProLiant DL380p Gen8 with 6 CPU hyperthreading cores (12 logical cores)
* lockless hpsa driver (forthcoming patches with performance 
  improvements such as eliminating locks, plus improved error handling)
* Smart Array P431 RAID controller
* 16 12 Gb/s SAS SSDs
* fio: 4 KiB random reads with options:
  direct=1, ioengine=libaio, norandommap, randrepeat=0,
  iodepth=96 or 1024, numjobs=1 or 12, thread, 
  cpus_allowed=0-11, cpus_allowed_policy=split,
  iodepth_batch=4, iodepth_batch_complete=4, userspace_reap,
  bs=4096, rw=randread
  time_based, group_reporting, gtod_reduce
* block layer queue parameters:
  nr_requests=1011, add_random=0
  nomerges=2, rq_affinity=2, max_sectors_kb=max_hw_sectors_kb
* old version of irqbalance-1.0.4, which still honors 
  /proc/irq/NN/affinity_hint (the new version defaults to
  ignoring that)

---
Rob Elliott    HP Server Storage




^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: scsi-mq
  2014-06-18  7:09     ` scsi-mq Bart Van Assche
@ 2014-06-21  0:52       ` Elliott, Robert (Server Storage)
  2014-06-23  7:09         ` scsi-mq Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Elliott, Robert (Server Storage) @ 2014-06-21  0:52 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe, Christoph Hellwig, James Bottomley
  Cc: linux-scsi, linux-kernel



> -----Original Message-----
> From: Bart Van Assche [mailto:bvanassche@acm.org]
> Sent: Wednesday, 18 June, 2014 2:09 AM
> To: Jens Axboe; Christoph Hellwig; James Bottomley
> Cc: Elliott, Robert (Server Storage); linux-scsi@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: Re: scsi-mq
> 
...
> Hello Jens,
> 
> Fio reports the same queue depth for use_blk_mq=Y (mq below) and
> use_blk_mq=N (sq below), namely ">=64". However, the number of context
> switches differs significantly for the random read-write tests.
> 
...
> It seems that with the traditional SCSI mid-layer and block core (sq)
> the number of context switches does not depend much on the number of
> I/O operations, while with the multi-queue SCSI core there are
> slightly more than two context switches per I/O in the particular
> test I ran. The "randrw" script I used for this test takes SCSI LUNs
> as arguments (/dev/sdX) and starts the fio tool as follows:

Some of those context switches might be from scsi_end_request(), 
which always schedules the scsi_requeue_run_queue() function via the
requeue_work workqueue for scsi-mq.  That causes lots of context 
switches from a busy application thread (e.g., fio) to a 
kworker thread.

As shown by ftrace:

             fio-19340 [005] dNh. 12067.908444: scsi_io_completion <-scsi_finish_command
             fio-19340 [005] dNh. 12067.908444: scsi_end_request <-scsi_io_completion
             fio-19340 [005] dNh. 12067.908444: blk_update_request <-scsi_end_request
             fio-19340 [005] dNh. 12067.908445: blk_account_io_completion <-blk_update_request
             fio-19340 [005] dNh. 12067.908445: scsi_mq_free_sgtables <-scsi_end_request
             fio-19340 [005] dNh. 12067.908445: scsi_free_sgtable <-scsi_mq_free_sgtables
             fio-19340 [005] dNh. 12067.908445: blk_account_io_done <-__blk_mq_end_io
             fio-19340 [005] dNh. 12067.908445: blk_mq_free_request <-__blk_mq_end_io
             fio-19340 [005] dNh. 12067.908446: blk_mq_map_queue <-blk_mq_free_request
             fio-19340 [005] dNh. 12067.908446: blk_mq_put_tag <-__blk_mq_free_request
             fio-19340 [005] .N.. 12067.908446: blkdev_direct_IO <-generic_file_direct_write
    kworker/5:1H-3207  [005] .... 12067.908448: scsi_requeue_run_queue <-process_one_work
    kworker/5:1H-3207  [005] .... 12067.908448: scsi_run_queue <-scsi_requeue_run_queue
    kworker/5:1H-3207  [005] .... 12067.908448: blk_mq_start_stopped_hw_queues <-scsi_run_queue
             fio-19340 [005] .... 12067.908449: blk_start_plug <-do_blockdev_direct_IO
             fio-19340 [005] .... 12067.908449: blkdev_get_block <-do_direct_IO
             fio-19340 [005] .... 12067.908450: blk_throtl_bio <-generic_make_request_checks
             fio-19340 [005] .... 12067.908450: blk_sq_make_request <-generic_make_request
             fio-19340 [005] .... 12067.908450: blk_queue_bounce <-blk_sq_make_request
             fio-19340 [005] .... 12067.908450: blk_mq_map_request <-blk_sq_make_request
             fio-19340 [005] .... 12067.908451: blk_mq_queue_enter <-blk_mq_map_request
             fio-19340 [005] .... 12067.908451: blk_mq_map_queue <-blk_mq_map_request
             fio-19340 [005] .... 12067.908451: blk_mq_get_tag <-__blk_mq_alloc_request
             fio-19340 [005] .... 12067.908451: blk_mq_bio_to_request <-blk_sq_make_request
             fio-19340 [005] .... 12067.908451: blk_rq_bio_prep <-init_request_from_bio
             fio-19340 [005] .... 12067.908451: blk_recount_segments <-bio_phys_segments
             fio-19340 [005] .... 12067.908452: blk_account_io_start <-blk_mq_bio_to_request
             fio-19340 [005] .... 12067.908452: blk_mq_hctx_mark_pending <-__blk_mq_insert_request
             fio-19340 [005] .... 12067.908452: blk_mq_run_hw_queue <-blk_sq_make_request
             fio-19340 [005] .... 12067.908452: blk_mq_start_request <-__blk_mq_run_hw_queue

In one snapshot just tracing scsi_end_request() and
scsi_requeue_run_queue(), 30K scsi_end_request() calls yielded
20K scsi_requeue_run_queue() calls.

In this case, blk_mq_start_stopped_hw_queues() doesn't end up
doing anything since there aren't any stopped queues to restart 
(blk_mq_run_hw_queue() gets called a bit later during routine 
fio work); the context switch turned out to be a waste of time.  
If it did find a stopped queue, then it would call 
blk_mq_run_hw_queue() itself.

---
Rob Elliott    HP Server Storage


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 10/14] scsi: only maintain target_blocked if the driver has a target queue limit
  2014-06-12 13:49 ` [PATCH 10/14] scsi: only maintain target_blocked if the driver has a target queue limit Christoph Hellwig
@ 2014-06-21 22:10   ` Elliott, Robert (Server Storage)
  2014-06-23  7:09     ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Elliott, Robert (Server Storage) @ 2014-06-21 22:10 UTC (permalink / raw)
  To: Christoph Hellwig, James Bottomley
  Cc: Jens Axboe, Bart Van Assche, linux-scsi, linux-kernel



> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@lst.de]
> Sent: Thursday, 12 June, 2014 8:49 AM
> To: James Bottomley
> Cc: Jens Axboe; Bart Van Assche; Elliott, Robert (Server Storage); linux-
> scsi@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH 10/14] scsi: only maintain target_blocked if the driver has a
> target queue limit
> 
> This saves us an atomic operation for each I/O submission and completion
> for the usual case where the driver doesn't set a per-target can_queue
> value.  Only a few iscsi hardware offload drivers set the per-target
> can_queue value at the moment.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/scsi/scsi_lib.c |   17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0e33dee..763b3c9 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
...

> @@ -1642,7 +1648,8 @@ static void scsi_request_fn(struct request_queue *q)
>  	return;
> 
>   host_not_ready:
> -	atomic_dec(&scsi_target(sdev)->target_busy);
> +	if (&scsi_target(sdev)->can_queue > 0)
> +		atomic_dec(&scsi_target(sdev)->target_busy);
>   not_ready:
>  	/*
>  	 * lock q, handle tag, requeue req, and decrement device_busy. We

There's an extra & in that if statement.
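
The stray '&' takes the address of the can_queue member, which is
never zero, so the condition is always true.  Presumably the intended
check is:

	if (scsi_target(sdev)->can_queue > 0)
		atomic_dec(&scsi_target(sdev)->target_busy);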

---
Rob Elliott    HP Server Storage




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 10/14] scsi: only maintain target_blocked if the driver has a target queue limit
  2014-06-21 22:10   ` Elliott, Robert (Server Storage)
@ 2014-06-23  7:09     ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-23  7:09 UTC (permalink / raw)
  To: Elliott, Robert (Server Storage)
  Cc: Christoph Hellwig, James Bottomley, Jens Axboe, Bart Van Assche,
	linux-scsi, linux-kernel

On Sat, Jun 21, 2014 at 10:10:14PM +0000, Elliott, Robert (Server Storage) wrote:
> >   not_ready:
> >  	/*
> >  	 * lock q, handle tag, requeue req, and decrement device_busy. We
> 
> There's an extra & in that if statement.

Indeed, this crept in during a rebase and a later patch fixes it.  I'll make
sure this one works on its own.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: scsi-mq
  2014-06-21  0:52       ` scsi-mq Elliott, Robert (Server Storage)
@ 2014-06-23  7:09         ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-23  7:09 UTC (permalink / raw)
  To: Elliott, Robert (Server Storage)
  Cc: Bart Van Assche, Jens Axboe, Christoph Hellwig, James Bottomley,
	linux-scsi, linux-kernel

On Sat, Jun 21, 2014 at 12:52:22AM +0000, Elliott, Robert (Server Storage) wrote:
> Some of those context switches might be from scsi_end_request(), 
> which always schedules the scsi_requeue_run_queue() function via the
> requeue_work workqueue for scsi-mq.  That causes lots of context 
> switches from a busy application thread (e.g., fio) to a 
> kworker thread.

Jens has been prodding me to fix this up, I'll send a patch soon that avoids
the workqueue unless we need to kick other queues and thus walk the list
of LUNs.
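
A sketch of that approach in the scsi_end_request() completion path
(an illustration of the idea described above, not the actual patch):

	/*
	 * Only bounce to the requeue_work workqueue when other queues
	 * may need kicking - single-LUN targets, or LUNs sitting on
	 * the host's starved list.  Otherwise restart this queue
	 * directly and skip the context switch to a kworker.
	 */
	if (scsi_target(sdev)->single_lun ||
	    !list_empty(&sdev->host->starved_list))
		kblockd_schedule_work(&sdev->requeue_work);
	else
		blk_mq_start_stopped_hw_queues(q, true);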


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess
  2014-07-18 10:13 ` [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess Christoph Hellwig
@ 2014-07-25 19:08   ` Martin K. Petersen
  0 siblings, 0 replies; 29+ messages in thread
From: Martin K. Petersen @ 2014-07-25 19:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, linux-scsi, Jens Axboe, Bart Van Assche,
	Mike Christie, Martin K. Petersen, Robert Elliott, Webb Scales,
	linux-kernel

>>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:

Christoph> Seems like these counters are missing any sort of
Christoph> synchronization for updates, as an over 10 year old comment
Christoph> from me noted.  Fix this by using atomic counters, and while
Christoph> we're at it also make sure they are in the same cacheline as
Christoph> the _busy counters and not needlessly stored to in every I/O
Christoph> completion.

Christoph> With the new model the _blocked counters can temporarily go
Christoph> negative, so all the readers are updated to check for > 0
Christoph> values.  Longer term every successful I/O completion will
Christoph> reset the counters to zero, so the temporarily negative
Christoph> values will not cause any harm.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess
  2014-07-18 10:12 scsi-mq V4 Christoph Hellwig
@ 2014-07-18 10:13 ` Christoph Hellwig
  2014-07-25 19:08   ` Martin K. Petersen
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2014-07-18 10:13 UTC (permalink / raw)
  To: James Bottomley, linux-scsi
  Cc: Jens Axboe, Bart Van Assche, Mike Christie, Martin K. Petersen,
	Robert Elliott, Webb Scales, linux-kernel

Seems like these counters are missing any sort of synchronization for
updates, as an over 10 year old comment from me noted.  Fix this by
using atomic counters, and while we're at it also make sure they are
in the same cacheline as the _busy counters and not needlessly stored
to in every I/O completion.

With the new model the _blocked counters can temporarily go negative,
so all the readers are updated to check for > 0 values.  Longer
term every successful I/O completion will reset the counters to zero,
so the temporarily negative values will not cause any harm.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
---
 drivers/scsi/scsi.c        | 21 +++++++--------
 drivers/scsi/scsi_lib.c    | 66 +++++++++++++++++++++++-----------------------
 drivers/scsi/scsi_sysfs.c  | 10 ++++++-
 include/scsi/scsi_device.h |  7 ++---
 include/scsi/scsi_host.h   |  7 ++---
 5 files changed, 58 insertions(+), 53 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 21fb97b..3dde8a3 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -726,17 +726,16 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
 
 	scsi_device_unbusy(sdev);
 
-        /*
-         * Clear the flags which say that the device/host is no longer
-         * capable of accepting new commands.  These are set in scsi_queue.c
-         * for both the queue full condition on a device, and for a
-         * host full condition on the host.
-	 *
-	 * XXX(hch): What about locking?
-         */
-        shost->host_blocked = 0;
-	starget->target_blocked = 0;
-        sdev->device_blocked = 0;
+	/*
+	 * Clear the flags that say that the device/target/host is no longer
+	 * capable of accepting new commands.
+	 */
+	if (atomic_read(&shost->host_blocked))
+		atomic_set(&shost->host_blocked, 0);
+	if (atomic_read(&starget->target_blocked))
+		atomic_set(&starget->target_blocked, 0);
+	if (atomic_read(&sdev->device_blocked))
+		atomic_set(&sdev->device_blocked, 0);
 
 	/*
 	 * If we have valid sense information, then some kind of recovery
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1ddf0fb..69da4cb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -99,14 +99,16 @@ scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
 	 */
 	switch (reason) {
 	case SCSI_MLQUEUE_HOST_BUSY:
-		host->host_blocked = host->max_host_blocked;
+		atomic_set(&host->host_blocked, host->max_host_blocked);
 		break;
 	case SCSI_MLQUEUE_DEVICE_BUSY:
 	case SCSI_MLQUEUE_EH_RETRY:
-		device->device_blocked = device->max_device_blocked;
+		atomic_set(&device->device_blocked,
+			   device->max_device_blocked);
 		break;
 	case SCSI_MLQUEUE_TARGET_BUSY:
-		starget->target_blocked = starget->max_target_blocked;
+		atomic_set(&starget->target_blocked,
+			   starget->max_target_blocked);
 		break;
 	}
 }
@@ -351,29 +353,35 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
-static inline int scsi_device_is_busy(struct scsi_device *sdev)
+static inline bool scsi_device_is_busy(struct scsi_device *sdev)
 {
-	if (atomic_read(&sdev->device_busy) >= sdev->queue_depth ||
-	    sdev->device_blocked)
-		return 1;
-	return 0;
+	if (atomic_read(&sdev->device_busy) >= sdev->queue_depth)
+		return true;
+	if (atomic_read(&sdev->device_blocked) > 0)
+		return true;
+	return false;
 }
 
-static inline int scsi_target_is_busy(struct scsi_target *starget)
+static inline bool scsi_target_is_busy(struct scsi_target *starget)
 {
-	return ((starget->can_queue > 0 &&
-		 atomic_read(&starget->target_busy) >= starget->can_queue) ||
-		 starget->target_blocked);
+	if (starget->can_queue > 0 &&
+	    atomic_read(&starget->target_busy) >= starget->can_queue)
+		return true;
+	if (atomic_read(&starget->target_blocked) > 0)
+		return true;
+	return false;
 }
 
-static inline int scsi_host_is_busy(struct Scsi_Host *shost)
+static inline bool scsi_host_is_busy(struct Scsi_Host *shost)
 {
-	if ((shost->can_queue > 0 &&
-	     atomic_read(&shost->host_busy) >= shost->can_queue) ||
-	    shost->host_blocked || shost->host_self_blocked)
-		return 1;
-
-	return 0;
+	if (shost->can_queue > 0 &&
+	    atomic_read(&shost->host_busy) >= shost->can_queue)
+		return true;
+	if (atomic_read(&shost->host_blocked) > 0)
+		return true;
+	if (shost->host_self_blocked)
+		return true;
+	return false;
 }
 
 static void scsi_starved_list_run(struct Scsi_Host *shost)
@@ -1256,14 +1264,14 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
 	unsigned int busy;
 
 	busy = atomic_inc_return(&sdev->device_busy) - 1;
-	if (sdev->device_blocked) {
+	if (atomic_read(&sdev->device_blocked)) {
 		if (busy)
 			goto out_dec;
 
 		/*
 		 * unblock after device_blocked iterates to zero
 		 */
-		if (--sdev->device_blocked != 0) {
+		if (atomic_dec_return(&sdev->device_blocked) > 0) {
 			blk_delay_queue(q, SCSI_QUEUE_DELAY);
 			goto out_dec;
 		}
@@ -1302,19 +1310,15 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 	}
 
 	busy = atomic_inc_return(&starget->target_busy) - 1;
-	if (starget->target_blocked) {
+	if (atomic_read(&starget->target_blocked) > 0) {
 		if (busy)
 			goto starved;
 
 		/*
 		 * unblock after target_blocked iterates to zero
 		 */
-		spin_lock_irq(shost->host_lock);
-		if (--starget->target_blocked != 0) {
-			spin_unlock_irq(shost->host_lock);
+		if (atomic_dec_return(&starget->target_blocked) > 0)
 			goto out_dec;
-		}
-		spin_unlock_irq(shost->host_lock);
 
 		SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
 				 "unblocking target at zero depth\n"));
@@ -1349,19 +1353,15 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 		return 0;
 
 	busy = atomic_inc_return(&shost->host_busy) - 1;
-	if (shost->host_blocked) {
+	if (atomic_read(&shost->host_blocked) > 0) {
 		if (busy)
 			goto starved;
 
 		/*
 		 * unblock after host_blocked iterates to zero
 		 */
-		spin_lock_irq(shost->host_lock);
-		if (--shost->host_blocked != 0) {
-			spin_unlock_irq(shost->host_lock);
+		if (atomic_dec_return(&shost->host_blocked) > 0)
 			goto out_dec;
-		}
-		spin_unlock_irq(shost->host_lock);
 
 		SCSI_LOG_MLQUEUE(3,
 			shost_printk(KERN_INFO, shost,
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 54e3dac..deef063 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -584,7 +584,6 @@ static int scsi_sdev_check_buf_bit(const char *buf)
 /*
  * Create the actual show/store functions and data structures.
  */
-sdev_rd_attr (device_blocked, "%d\n");
 sdev_rd_attr (type, "%d\n");
 sdev_rd_attr (scsi_level, "%d\n");
 sdev_rd_attr (vendor, "%.8s\n");
@@ -600,6 +599,15 @@ sdev_show_device_busy(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR(device_busy, S_IRUGO, sdev_show_device_busy, NULL);
 
+static ssize_t
+sdev_show_device_blocked(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(dev);
+	return snprintf(buf, 20, "%d\n", atomic_read(&sdev->device_blocked));
+}
+static DEVICE_ATTR(device_blocked, S_IRUGO, sdev_show_device_blocked, NULL);
+
 /*
  * TODO: can we make these symlinks to the block layer ones?
  */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 3329901..0f853f2 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -82,6 +82,8 @@ struct scsi_device {
 	struct list_head    same_target_siblings; /* just the devices sharing same target id */
 
 	atomic_t device_busy;		/* commands actually active on LLDD */
+	atomic_t device_blocked;	/* Device returned QUEUE_FULL. */
+
 	spinlock_t list_lock;
 	struct list_head cmd_list;	/* queue of in use SCSI Command structures */
 	struct list_head starved_entry;
@@ -180,8 +182,6 @@ struct scsi_device {
 	struct list_head event_list;	/* asserted events */
 	struct work_struct event_work;
 
-	unsigned int device_blocked;	/* Device returned QUEUE_FULL. */
-
 	unsigned int max_device_blocked; /* what device_blocked counts down from  */
 #define SCSI_DEFAULT_DEVICE_BLOCKED	3
 
@@ -291,12 +291,13 @@ struct scsi_target {
 						 * the same target will also. */
 	/* commands actually active on LLD. */
 	atomic_t		target_busy;
+	atomic_t		target_blocked;
+
 	/*
 	 * LLDs should set this in the slave_alloc host template callout.
 	 * If set to zero then there is not limit.
 	 */
 	unsigned int		can_queue;
-	unsigned int		target_blocked;
 	unsigned int		max_target_blocked;
 #define SCSI_DEFAULT_TARGET_BLOCKED	3
 
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 51f7911..5e8ebc1 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -583,6 +583,8 @@ struct Scsi_Host {
 	struct blk_queue_tag	*bqt;
 
 	atomic_t host_busy;		   /* commands actually active on low-level */
+	atomic_t host_blocked;
+
 	unsigned int host_failed;	   /* commands that failed.
 					      protected by host_lock */
 	unsigned int host_eh_scheduled;    /* EH scheduled without command */
@@ -682,11 +684,6 @@ struct Scsi_Host {
 	struct workqueue_struct *tmf_work_q;
 
 	/*
-	 * Host has rejected a command because it was busy.
-	 */
-	unsigned int host_blocked;
-
-	/*
 	 * Value host_blocked counts down from
 	 */
 	unsigned int max_host_blocked;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess
  2014-07-09 11:12   ` Hannes Reinecke
@ 2014-07-10  6:06     ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2014-07-10  6:06 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, James Bottomley, Jens Axboe, Bart Van Assche,
	Robert Elliott, linux-scsi, linux-kernel

On Wed, Jul 09, 2014 at 01:12:17PM +0200, Hannes Reinecke wrote:
> Hmm. I guess there is a race window between
> atomic_read() and atomic_set().
> Doesn't this cause issues when someone calls atomic_set() just before the 
> call to atomic_read?

There is a race window just _after_ the atomic_read, but it's harmless.
The whole _blocked scheme is a backoff to avoid resubmitting I/O all
the time when the HBA or target returned a busy status.  If we race
and incorrectly reset it, we will submit I/O and just get a busy
indicator again.

On the other hand doing the atomic_set all the time introduces three
atomic operations in the I/O completion path that are entirely
pointless most of the time.

I guess I should add something like this as a comment to the code.

Note that the old code didn't use any sort of synchronization either.
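
Something along these lines, perhaps (a sketch of the comment, not a
committed patch):

	/*
	 * Clear the _blocked counters.  Checking before setting avoids
	 * dirtying three cachelines per I/O completion in the common
	 * case where nothing is blocked.  The read/set pair is racy,
	 * but harmlessly so: _blocked only provides a backoff, and if
	 * we lose the race and reset a freshly set counter we simply
	 * resubmit and get another busy indication.
	 */
	if (atomic_read(&shost->host_blocked))
		atomic_set(&shost->host_blocked, 0);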


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess
  2014-06-25 16:51 ` [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess Christoph Hellwig
@ 2014-07-09 11:12   ` Hannes Reinecke
  2014-07-10  6:06     ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Hannes Reinecke @ 2014-07-09 11:12 UTC (permalink / raw)
  To: Christoph Hellwig, James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliott, linux-scsi, linux-kernel

On 06/25/2014 06:51 PM, Christoph Hellwig wrote:
> Seems like these counters are missing any sort of synchronization for
> updates, as an over 10 year old comment from me noted.  Fix this by
> using atomic counters, and while we're at it also make sure they are
> in the same cacheline as the _busy counters and not needlessly stored
> to in every I/O completion.
>
> With the new model the _blocked counters can temporarily go negative,
> so all the readers are updated to check for > 0 values.  Longer
> term every successful I/O completion will reset the counters to zero,
> so the temporarily negative values will not cause any harm.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/scsi/scsi.c        |   21 ++++++------
>   drivers/scsi/scsi_lib.c    |   82 +++++++++++++++++++++-----------------------
>   drivers/scsi/scsi_sysfs.c  |   10 +++++-
>   include/scsi/scsi_device.h |    7 ++--
>   include/scsi/scsi_host.h   |    7 ++--
>   5 files changed, 64 insertions(+), 63 deletions(-)
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 35a23e2..b362058 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -729,17 +729,16 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
>
>   	scsi_device_unbusy(sdev);
>
> -        /*
> -         * Clear the flags which say that the device/host is no longer
> -         * capable of accepting new commands.  These are set in scsi_queue.c
> -         * for both the queue full condition on a device, and for a
> -         * host full condition on the host.
> -	 *
> -	 * XXX(hch): What about locking?
> -         */
> -        shost->host_blocked = 0;
> -	starget->target_blocked = 0;
> -        sdev->device_blocked = 0;
> +	/*
> +	 * Clear the flags which say that the device/target/host is no longer
> +	 * capable of accepting new commands.
> +	 */
> +	if (atomic_read(&shost->host_blocked))
> +		atomic_set(&shost->host_blocked, 0);
> +	if (atomic_read(&starget->target_blocked))
> +		atomic_set(&starget->target_blocked, 0);
> +	if (atomic_read(&sdev->device_blocked))
> +		atomic_set(&sdev->device_blocked, 0);
>
>   	/*
>   	 * If we have valid sense information, then some kind of recovery
Hmm. I guess there is a race window between
atomic_read() and atomic_set().
Doesn't this cause issues when someone calls atomic_set() just 
before the call to atomic_read?

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index e23fef5..a39d5ba 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -99,14 +99,16 @@ scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
>   	 */
>   	switch (reason) {
>   	case SCSI_MLQUEUE_HOST_BUSY:
> -		host->host_blocked = host->max_host_blocked;
> +		atomic_set(&host->host_blocked, host->max_host_blocked);
>   		break;
>   	case SCSI_MLQUEUE_DEVICE_BUSY:
>   	case SCSI_MLQUEUE_EH_RETRY:
> -		device->device_blocked = device->max_device_blocked;
> +		atomic_set(&device->device_blocked,
> +			   device->max_device_blocked);
>   		break;
>   	case SCSI_MLQUEUE_TARGET_BUSY:
> -		starget->target_blocked = starget->max_target_blocked;
> +		atomic_set(&starget->target_blocked,
> +			   starget->max_target_blocked);
>   		break;
>   	}
>   }
> @@ -351,30 +353,39 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
>   	spin_unlock_irqrestore(shost->host_lock, flags);
>   }
>
> -static inline int scsi_device_is_busy(struct scsi_device *sdev)
> +static inline bool scsi_device_is_busy(struct scsi_device *sdev)
>   {
>   	if (atomic_read(&sdev->device_busy) >= sdev->queue_depth)
> -		return 1;
> -	if (sdev->device_blocked)
> -		return 1;
> +		return true;
> +	if (atomic_read(&sdev->device_blocked) > 0)
> +		return true;
>   	return 0;
>   }
>
> -static inline int scsi_target_is_busy(struct scsi_target *starget)
> +static inline bool scsi_target_is_busy(struct scsi_target *starget)
>   {
> -	return ((starget->can_queue > 0 &&
> -		 atomic_read(&starget->target_busy) >= starget->can_queue) ||
> -		 starget->target_blocked);
> +	if (starget->can_queue > 0) {
> +		if (atomic_read(&starget->target_busy) >= starget->can_queue)
> +			return true;
> +		if (atomic_read(&starget->target_blocked) > 0)
> +			return true;
> +	}
> +
> +	return false;
>   }
>
> -static inline int scsi_host_is_busy(struct Scsi_Host *shost)
> +static inline bool scsi_host_is_busy(struct Scsi_Host *shost)
>   {
> -	if ((shost->can_queue > 0 &&
> -	     atomic_read(&shost->host_busy) >= shost->can_queue) ||
> -	    shost->host_blocked || shost->host_self_blocked)
> -		return 1;
> +	if (shost->can_queue > 0) {
> +		if (atomic_read(&shost->host_busy) >= shost->can_queue)
> +			return true;
> +		if (atomic_read(&shost->host_blocked) > 0)
> +			return true;
> +		if (shost->host_self_blocked)
> +			return true;
> +	}
>
> -	return 0;
> +	return false;
>   }
>
>   static void scsi_starved_list_run(struct Scsi_Host *shost)
> @@ -1283,11 +1294,8 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
>   	unsigned int busy;
>
>   	busy = atomic_inc_return(&sdev->device_busy) - 1;
> -	if (busy == 0 && sdev->device_blocked) {
> -		/*
> -		 * unblock after device_blocked iterates to zero
> -		 */
> -		if (--sdev->device_blocked != 0) {
> +	if (busy == 0 && atomic_read(&sdev->device_blocked) > 0) {
> +		if (atomic_dec_return(&sdev->device_blocked) > 0) {
>   			blk_delay_queue(q, SCSI_QUEUE_DELAY);
>   			goto out_dec;
>   		}
> @@ -1297,7 +1305,7 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
>
>   	if (busy >= sdev->queue_depth)
>   		goto out_dec;
> -	if (sdev->device_blocked)
> +	if (atomic_read(&sdev->device_blocked) > 0)
>   		goto out_dec;
>
>   	return 1;
> @@ -1328,16 +1336,9 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
>   	}
>
>   	busy = atomic_inc_return(&starget->target_busy) - 1;
> -	if (busy == 0 && starget->target_blocked) {
> -		/*
> -		 * unblock after target_blocked iterates to zero
> -		 */
> -		spin_lock_irq(shost->host_lock);
> -		if (--starget->target_blocked != 0) {
> -			spin_unlock_irq(shost->host_lock);
> +	if (busy == 0 && atomic_read(&starget->target_blocked) > 0) {
> +		if (atomic_dec_return(&starget->target_blocked) > 0)
>   			goto out_dec;
> -		}
> -		spin_unlock_irq(shost->host_lock);
>
>   		SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
>   				 "unblocking target at zero depth\n"));
> @@ -1345,7 +1346,7 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
>
>   	if (starget->can_queue > 0 && busy >= starget->can_queue)
>   		goto starved;
> -	if (starget->target_blocked)
> +	if (atomic_read(&starget->target_blocked) > 0)
>   		goto starved;
>
>   	return 1;
> @@ -1374,16 +1375,9 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>   		return 0;
>
>   	busy = atomic_inc_return(&shost->host_busy) - 1;
> -	if (busy == 0 && shost->host_blocked) {
> -		/*
> -		 * unblock after host_blocked iterates to zero
> -		 */
> -		spin_lock_irq(shost->host_lock);
> -		if (--shost->host_blocked != 0) {
> -			spin_unlock_irq(shost->host_lock);
> +	if (busy == 0 && atomic_read(&shost->host_blocked) > 0) {
> +		if (atomic_dec_return(&shost->host_blocked) > 0)
>   			goto out_dec;
> -		}
> -		spin_unlock_irq(shost->host_lock);
>
>   		SCSI_LOG_MLQUEUE(3,
>   			shost_printk(KERN_INFO, shost,
Same with this one.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess
  2014-06-25 16:51 scsi-mq V2 Christoph Hellwig
@ 2014-06-25 16:51 ` Christoph Hellwig
  2014-07-09 11:12   ` Hannes Reinecke
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2014-06-25 16:51 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliott, linux-scsi, linux-kernel

Seems like these counters are missing any sort of synchronization for
updates, as an over 10 year old comment from me noted.  Fix this by
using atomic counters, and while we're at it also make sure they are
in the same cacheline as the _busy counters and not needlessly stored
to in every I/O completion.

With the new model the _blocked counters can temporarily go negative,
so all the readers are updated to check for > 0 values.  Longer
term every successful I/O completion will reset the counters to zero,
so the temporarily negative values will not cause any harm.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi.c        |   21 ++++++------
 drivers/scsi/scsi_lib.c    |   82 +++++++++++++++++++++-----------------------
 drivers/scsi/scsi_sysfs.c  |   10 +++++-
 include/scsi/scsi_device.h |    7 ++--
 include/scsi/scsi_host.h   |    7 ++--
 5 files changed, 64 insertions(+), 63 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 35a23e2..b362058 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -729,17 +729,16 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
 
 	scsi_device_unbusy(sdev);
 
-        /*
-         * Clear the flags which say that the device/host is no longer
-         * capable of accepting new commands.  These are set in scsi_queue.c
-         * for both the queue full condition on a device, and for a
-         * host full condition on the host.
-	 *
-	 * XXX(hch): What about locking?
-         */
-        shost->host_blocked = 0;
-	starget->target_blocked = 0;
-        sdev->device_blocked = 0;
+	/*
+	 * Clear the flags which say that the device/target/host is no longer
+	 * capable of accepting new commands.
+	 */
+	if (atomic_read(&shost->host_blocked))
+		atomic_set(&shost->host_blocked, 0);
+	if (atomic_read(&starget->target_blocked))
+		atomic_set(&starget->target_blocked, 0);
+	if (atomic_read(&sdev->device_blocked))
+		atomic_set(&sdev->device_blocked, 0);
 
 	/*
 	 * If we have valid sense information, then some kind of recovery
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e23fef5..a39d5ba 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -99,14 +99,16 @@ scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
 	 */
 	switch (reason) {
 	case SCSI_MLQUEUE_HOST_BUSY:
-		host->host_blocked = host->max_host_blocked;
+		atomic_set(&host->host_blocked, host->max_host_blocked);
 		break;
 	case SCSI_MLQUEUE_DEVICE_BUSY:
 	case SCSI_MLQUEUE_EH_RETRY:
-		device->device_blocked = device->max_device_blocked;
+		atomic_set(&device->device_blocked,
+			   device->max_device_blocked);
 		break;
 	case SCSI_MLQUEUE_TARGET_BUSY:
-		starget->target_blocked = starget->max_target_blocked;
+		atomic_set(&starget->target_blocked,
+			   starget->max_target_blocked);
 		break;
 	}
 }
@@ -351,30 +353,39 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
-static inline int scsi_device_is_busy(struct scsi_device *sdev)
+static inline bool scsi_device_is_busy(struct scsi_device *sdev)
 {
 	if (atomic_read(&sdev->device_busy) >= sdev->queue_depth)
-		return 1;
-	if (sdev->device_blocked)
-		return 1;
+		return true;
+	if (atomic_read(&sdev->device_blocked) > 0)
+		return true;
 	return 0;
 }
 
-static inline int scsi_target_is_busy(struct scsi_target *starget)
+static inline bool scsi_target_is_busy(struct scsi_target *starget)
 {
-	return ((starget->can_queue > 0 &&
-		 atomic_read(&starget->target_busy) >= starget->can_queue) ||
-		 starget->target_blocked);
+	if (starget->can_queue > 0) {
+		if (atomic_read(&starget->target_busy) >= starget->can_queue)
+			return true;
+		if (atomic_read(&starget->target_blocked) > 0)
+			return true;
+	}
+
+	return false;
 }
 
-static inline int scsi_host_is_busy(struct Scsi_Host *shost)
+static inline bool scsi_host_is_busy(struct Scsi_Host *shost)
 {
-	if ((shost->can_queue > 0 &&
-	     atomic_read(&shost->host_busy) >= shost->can_queue) ||
-	    shost->host_blocked || shost->host_self_blocked)
-		return 1;
+	if (shost->can_queue > 0) {
+		if (atomic_read(&shost->host_busy) >= shost->can_queue)
+			return true;
+		if (atomic_read(&shost->host_blocked) > 0)
+			return true;
+		if (shost->host_self_blocked)
+			return true;
+	}
 
-	return 0;
+	return false;
 }
 
 static void scsi_starved_list_run(struct Scsi_Host *shost)
@@ -1283,11 +1294,8 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
 	unsigned int busy;
 
 	busy = atomic_inc_return(&sdev->device_busy) - 1;
-	if (busy == 0 && sdev->device_blocked) {
-		/*
-		 * unblock after device_blocked iterates to zero
-		 */
-		if (--sdev->device_blocked != 0) {
+	if (busy == 0 && atomic_read(&sdev->device_blocked) > 0) {
+		if (atomic_dec_return(&sdev->device_blocked) > 0) {
 			blk_delay_queue(q, SCSI_QUEUE_DELAY);
 			goto out_dec;
 		}
@@ -1297,7 +1305,7 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
 
 	if (busy >= sdev->queue_depth)
 		goto out_dec;
-	if (sdev->device_blocked)
+	if (atomic_read(&sdev->device_blocked) > 0)
 		goto out_dec;
 
 	return 1;
@@ -1328,16 +1336,9 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 	}
 
 	busy = atomic_inc_return(&starget->target_busy) - 1;
-	if (busy == 0 && starget->target_blocked) {
-		/*
-		 * unblock after target_blocked iterates to zero
-		 */
-		spin_lock_irq(shost->host_lock);
-		if (--starget->target_blocked != 0) {
-			spin_unlock_irq(shost->host_lock);
+	if (busy == 0 && atomic_read(&starget->target_blocked) > 0) {
+		if (atomic_dec_return(&starget->target_blocked) > 0)
 			goto out_dec;
-		}
-		spin_unlock_irq(shost->host_lock);
 
 		SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
 				 "unblocking target at zero depth\n"));
@@ -1345,7 +1346,7 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 
 	if (starget->can_queue > 0 && busy >= starget->can_queue)
 		goto starved;
-	if (starget->target_blocked)
+	if (atomic_read(&starget->target_blocked) > 0)
 		goto starved;
 
 	return 1;
@@ -1374,16 +1375,9 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 		return 0;
 
 	busy = atomic_inc_return(&shost->host_busy) - 1;
-	if (busy == 0 && shost->host_blocked) {
-		/*
-		 * unblock after host_blocked iterates to zero
-		 */
-		spin_lock_irq(shost->host_lock);
-		if (--shost->host_blocked != 0) {
-			spin_unlock_irq(shost->host_lock);
+	if (busy == 0 && atomic_read(&shost->host_blocked) > 0) {
+		if (atomic_dec_return(&shost->host_blocked) > 0)
 			goto out_dec;
-		}
-		spin_unlock_irq(shost->host_lock);
 
 		SCSI_LOG_MLQUEUE(3,
 			shost_printk(KERN_INFO, shost,
@@ -1392,7 +1386,9 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 
 	if (shost->can_queue > 0 && busy >= shost->can_queue)
 		goto starved;
-	if (shost->host_blocked || shost->host_self_blocked)
+	if (atomic_read(&shost->host_blocked) > 0)
+		goto starved;
+	if (shost->host_self_blocked)
 		goto starved;
 
 	/* We're OK to process the command, so we can't be starved */
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 54e3dac..deef063 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -584,7 +584,6 @@ static int scsi_sdev_check_buf_bit(const char *buf)
 /*
  * Create the actual show/store functions and data structures.
  */
-sdev_rd_attr (device_blocked, "%d\n");
 sdev_rd_attr (type, "%d\n");
 sdev_rd_attr (scsi_level, "%d\n");
 sdev_rd_attr (vendor, "%.8s\n");
@@ -600,6 +599,15 @@ sdev_show_device_busy(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR(device_busy, S_IRUGO, sdev_show_device_busy, NULL);
 
+static ssize_t
+sdev_show_device_blocked(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(dev);
+	return snprintf(buf, 20, "%d\n", atomic_read(&sdev->device_blocked));
+}
+static DEVICE_ATTR(device_blocked, S_IRUGO, sdev_show_device_blocked, NULL);
+
 /*
  * TODO: can we make these symlinks to the block layer ones?
  */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 5ff3d24..a8a8981 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -82,6 +82,8 @@ struct scsi_device {
 	struct list_head    same_target_siblings; /* just the devices sharing same target id */
 
 	atomic_t device_busy;		/* commands actually active on LLDD */
+	atomic_t device_blocked;	/* Device returned QUEUE_FULL. */
+
 	spinlock_t list_lock;
 	struct list_head cmd_list;	/* queue of in use SCSI Command structures */
 	struct list_head starved_entry;
@@ -179,8 +181,6 @@ struct scsi_device {
 	struct list_head event_list;	/* asserted events */
 	struct work_struct event_work;
 
-	unsigned int device_blocked;	/* Device returned QUEUE_FULL. */
-
 	unsigned int max_device_blocked; /* what device_blocked counts down from  */
 #define SCSI_DEFAULT_DEVICE_BLOCKED	3
 
@@ -290,12 +290,13 @@ struct scsi_target {
 						 * the same target will also. */
 	/* commands actually active on LLD. */
 	atomic_t		target_busy;
+	atomic_t		target_blocked;
+
 	/*
 	 * LLDs should set this in the slave_alloc host template callout.
 	 * If set to zero then there is not limit.
 	 */
 	unsigned int		can_queue;
-	unsigned int		target_blocked;
 	unsigned int		max_target_blocked;
 #define SCSI_DEFAULT_TARGET_BLOCKED	3
 
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 3d124f7..7f9bbda 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -604,6 +604,8 @@ struct Scsi_Host {
 	struct blk_queue_tag	*bqt;
 
 	atomic_t host_busy;		   /* commands actually active on low-level */
+	atomic_t host_blocked;
+
 	unsigned int host_failed;	   /* commands that failed.
 					      protected by host_lock */
 	unsigned int host_eh_scheduled;    /* EH scheduled without command */
@@ -703,11 +705,6 @@ struct Scsi_Host {
 	struct workqueue_struct *tmf_work_q;
 
 	/*
-	 * Host has rejected a command because it was busy.
-	 */
-	unsigned int host_blocked;
-
-	/*
 	 * Value host_blocked counts down from
 	 */
 	unsigned int max_host_blocked;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2014-07-25 19:09 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-12 13:48 scsi-mq Christoph Hellwig
2014-06-12 13:48 ` [PATCH 01/14] sd: don't use rq->cmd_len before setting it up Christoph Hellwig
2014-06-12 13:48 ` [PATCH 02/14] scsi: split __scsi_queue_insert Christoph Hellwig
2014-06-12 13:48 ` [PATCH 03/14] scsi: centralize command re-queueing in scsi_dispatch_fn Christoph Hellwig
2014-06-12 13:48 ` [PATCH 04/14] scsi: set ->scsi_done before calling scsi_dispatch_cmd Christoph Hellwig
2014-06-12 13:48 ` [PATCH 05/14] scsi: push host_lock down into scsi_{host,target}_queue_ready Christoph Hellwig
2014-06-12 13:48 ` [PATCH 06/14] scsi: convert target_busy to an atomic_t Christoph Hellwig
2014-06-12 13:48 ` [PATCH 07/14] scsi: convert host_busy to atomic_t Christoph Hellwig
2014-06-12 13:49 ` [PATCH 08/14] scsi: convert device_busy " Christoph Hellwig
2014-06-12 13:49 ` [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess Christoph Hellwig
2014-06-12 13:49 ` [PATCH 10/14] scsi: only maintain target_blocked if the driver has a target queue limit Christoph Hellwig
2014-06-21 22:10   ` Elliott, Robert (Server Storage)
2014-06-23  7:09     ` Christoph Hellwig
2014-06-12 13:49 ` [PATCH 11/14] scsi: unwind blk_end_request_all and blk_end_request_err calls Christoph Hellwig
2014-06-12 13:49 ` [PATCH 12/14] scatterlist: allow chaining to preallocated chunks Christoph Hellwig
2014-06-12 13:49 ` [PATCH 13/14] scsi: add support for a blk-mq based I/O path Christoph Hellwig
2014-06-12 13:49 ` [PATCH 14/14] fnic: reject device resets without assigned tags for the blk-mq case Christoph Hellwig
2014-06-13  6:42 ` scsi-mq Bart Van Assche
2014-06-17 14:27 ` scsi-mq Bart Van Assche
2014-06-18  3:44   ` scsi-mq Jens Axboe
2014-06-18  7:09     ` scsi-mq Bart Van Assche
2014-06-21  0:52       ` scsi-mq Elliott, Robert (Server Storage)
2014-06-23  7:09         ` scsi-mq Christoph Hellwig
2014-06-19  0:58     ` scsi-mq Elliott, Robert (Server Storage)
2014-06-25 16:51 scsi-mq V2 Christoph Hellwig
2014-06-25 16:51 ` [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess Christoph Hellwig
2014-07-09 11:12   ` Hannes Reinecke
2014-07-10  6:06     ` Christoph Hellwig
2014-07-18 10:12 scsi-mq V4 Christoph Hellwig
2014-07-18 10:13 ` [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess Christoph Hellwig
2014-07-25 19:08   ` Martin K. Petersen
