* [PATCH 00/17] SCSI data path micro-optimizations
@ 2014-02-05 12:39 Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 01/17] scsi: handle command allocation failure in scsi_reset_provider Christoph Hellwig
                   ` (17 more replies)
  0 siblings, 18 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

This series contains various optimizations for the SCSI data I/O path.
They increase the number of IOPS seen with iSCSI or SRP by 2% to 3.5% in
workloads that previously hit the host_lock hard.  While this isn't a
lot, it fully shifts the contention to the queue_lock, which will get
out of the way later when the SCSI midlayer is converted to use the
blk-mq infrastructure.

Testing has been limited to virtio_scsi, libata, iscsi and srp; more
feedback is, as always, welcome.

This work was sponsored by the ION division of Fusion IO.


* [PATCH 01/17] scsi: handle command allocation failure in scsi_reset_provider
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 02/17] megaraid: simplify internal command handling Christoph Hellwig
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0001-scsi-handle-command-allocation-failure-in-scsi_reset.patch --]
[-- Type: text/plain, Size: 770 bytes --]

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_error.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 78b004d..f8b54c1 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2289,6 +2289,11 @@ scsi_reset_provider(struct scsi_device *dev, int flag)
 		return FAILED;
 
 	scmd = scsi_get_command(dev, GFP_KERNEL);
+	if (!scmd) {
+		rtn = FAILED;
+		goto out_put_autopm_host;
+	}
+
 	blk_rq_init(NULL, &req);
 	scmd->request = &req;
 
@@ -2345,6 +2350,7 @@ scsi_reset_provider(struct scsi_device *dev, int flag)
 	scsi_run_host_queues(shost);
 
 	scsi_next_command(scmd);
+out_put_autopm_host:
 	scsi_autopm_put_host(shost);
 	return rtn;
 }
-- 
1.7.10.4




* [PATCH 02/17] megaraid: simplify internal command handling
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 01/17] scsi: handle command allocation failure in scsi_reset_provider Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-06 16:40   ` Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 03/17] scsi: remove scsi_allocate_command/scsi_free_command Christoph Hellwig
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0002-megaraid-simplify-internal-command-handling.patch --]
[-- Type: text/plain, Size: 6632 bytes --]

We don't use the passed-in scsi command for anything, so just add an
adapter-wide internal status to go along with the internal scb that is
used under int_mtx to pass back the return value, and get rid of all
the complexity and abuse of the scsi_cmnd structure.
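
For reference, the resulting flow for an internal command is roughly
the following sketch (using the names from this driver; the details are
in the diff below):

  mutex_lock(&adapter->int_mtx);             /* internal cmds are serialized */
  /* set up adapter->int_scb and queue it on adapter->pending_list */
  wait_for_completion(&adapter->int_waitq);  /* completed by mega_cmd_done() */
  rval = adapter->int_status;                /* status saved by the ISR path */
  mutex_unlock(&adapter->int_mtx);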

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/megaraid.c |  120 ++++++++++++-----------------------------------
 drivers/scsi/megaraid.h |    3 +-
 2 files changed, 32 insertions(+), 91 deletions(-)

diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
index 816db12..8bca30f 100644
--- a/drivers/scsi/megaraid.c
+++ b/drivers/scsi/megaraid.c
@@ -531,13 +531,6 @@ mega_build_cmd(adapter_t *adapter, Scsi_Cmnd *cmd, int *busy)
 	int	target = 0;
 	int	ldrv_num = 0;   /* logical drive number */
 
-
-	/*
-	 * filter the internal and ioctl commands
-	 */
-	if((cmd->cmnd[0] == MEGA_INTERNAL_CMD))
-		return (scb_t *)cmd->host_scribble;
-
 	/*
 	 * We know what channels our logical drives are on - mega_find_card()
 	 */
@@ -1439,19 +1432,22 @@ mega_cmd_done(adapter_t *adapter, u8 completed[], int nstatus, int status)
 
 		cmdid = completed[i];
 
-		if( cmdid == CMDID_INT_CMDS ) { /* internal command */
+		/*
+		 * Only free SCBs for the commands coming down from the
+		 * mid-layer, not for which were issued internally
+		 *
+		 * For internal command, restore the status returned by the
+		 * firmware so that user can interpret it.
+		 */
+		if (cmdid == CMDID_INT_CMDS) {
 			scb = &adapter->int_scb;
-			cmd = scb->cmd;
-			mbox = (mbox_t *)scb->raw_mbox;
 
-			/*
-			 * Internal command interface do not fire the extended
-			 * passthru or 64-bit passthru
-			 */
-			pthru = scb->pthru;
+			list_del_init(&scb->list);
+			scb->state = SCB_FREE;
 
-		}
-		else {
+			adapter->int_status = status;
+			complete(&adapter->int_waitq);
+		} else {
 			scb = &adapter->scb_list[cmdid];
 
 			/*
@@ -1640,25 +1636,7 @@ mega_cmd_done(adapter_t *adapter, u8 completed[], int nstatus, int status)
 				cmd->result |= (DID_BAD_TARGET << 16)|status;
 		}
 
-		/*
-		 * Only free SCBs for the commands coming down from the
-		 * mid-layer, not for which were issued internally
-		 *
-		 * For internal command, restore the status returned by the
-		 * firmware so that user can interpret it.
-		 */
-		if( cmdid == CMDID_INT_CMDS ) { /* internal command */
-			cmd->result = status;
-
-			/*
-			 * Remove the internal command from the pending list
-			 */
-			list_del_init(&scb->list);
-			scb->state = SCB_FREE;
-		}
-		else {
-			mega_free_scb(adapter, scb);
-		}
+		mega_free_scb(adapter, scb);
 
 		/* Add Scsi_Command to end of completed queue */
 		list_add_tail(SCSI_LIST(cmd), &adapter->completed_list);
@@ -4133,23 +4111,15 @@ mega_internal_dev_inquiry(adapter_t *adapter, u8 ch, u8 tgt,
  * The last argument is the address of the passthru structure if the command
  * to be fired is a passthru command
  *
- * lockscope specifies whether the caller has already acquired the lock. Of
- * course, the caller must know which lock we are talking about.
- *
  * Note: parameter 'pthru' is null for non-passthru commands.
  */
 static int
 mega_internal_command(adapter_t *adapter, megacmd_t *mc, mega_passthru *pthru)
 {
-	Scsi_Cmnd	*scmd;
-	struct	scsi_device *sdev;
+	unsigned long flags;
 	scb_t	*scb;
 	int	rval;
 
-	scmd = scsi_allocate_command(GFP_KERNEL);
-	if (!scmd)
-		return -ENOMEM;
-
 	/*
 	 * The internal commands share one command id and hence are
 	 * serialized. This is so because we want to reserve maximum number of
@@ -4160,73 +4130,45 @@ mega_internal_command(adapter_t *adapter, megacmd_t *mc, mega_passthru *pthru)
 	scb = &adapter->int_scb;
 	memset(scb, 0, sizeof(scb_t));
 
-	sdev = kzalloc(sizeof(struct scsi_device), GFP_KERNEL);
-	scmd->device = sdev;
-
-	memset(adapter->int_cdb, 0, sizeof(adapter->int_cdb));
-	scmd->cmnd = adapter->int_cdb;
-	scmd->device->host = adapter->host;
-	scmd->host_scribble = (void *)scb;
-	scmd->cmnd[0] = MEGA_INTERNAL_CMD;
-
-	scb->state |= SCB_ACTIVE;
-	scb->cmd = scmd;
+	scb->idx = CMDID_INT_CMDS;
+	scb->state |= SCB_ACTIVE | SCB_PENDQ;
 
 	memcpy(scb->raw_mbox, mc, sizeof(megacmd_t));
 
 	/*
 	 * Is it a passthru command
 	 */
-	if( mc->cmd == MEGA_MBOXCMD_PASSTHRU ) {
-
+	if (mc->cmd == MEGA_MBOXCMD_PASSTHRU)
 		scb->pthru = pthru;
-	}
-
-	scb->idx = CMDID_INT_CMDS;
 
-	megaraid_queue_lck(scmd, mega_internal_done);
+	spin_lock_irqsave(&adapter->lock, flags);
+	list_add_tail(&scb->list, &adapter->pending_list);
+	/*
+	 * Check if the HBA is in quiescent state, e.g., during a
+	 * delete logical drive opertion. If it is, don't run
+	 * the pending_list.
+	 */
+	if (atomic_read(&adapter->quiescent) == 0)
+		mega_runpendq(adapter);
+	spin_unlock_irqrestore(&adapter->lock, flags);
 
 	wait_for_completion(&adapter->int_waitq);
 
-	rval = scmd->result;
-	mc->status = scmd->result;
-	kfree(sdev);
+	mc->status = rval = adapter->int_status;
 
 	/*
 	 * Print a debug message for all failed commands. Applications can use
 	 * this information.
 	 */
-	if( scmd->result && trace_level ) {
+	if( rval && trace_level ) {
 		printk("megaraid: cmd [%x, %x, %x] status:[%x]\n",
-			mc->cmd, mc->opcode, mc->subopcode, scmd->result);
+			mc->cmd, mc->opcode, mc->subopcode, rval);
 	}
 
 	mutex_unlock(&adapter->int_mtx);
-
-	scsi_free_command(GFP_KERNEL, scmd);
-
 	return rval;
 }
 
-
-/**
- * mega_internal_done()
- * @scmd - internal scsi command
- *
- * Callback routine for internal commands.
- */
-static void
-mega_internal_done(Scsi_Cmnd *scmd)
-{
-	adapter_t	*adapter;
-
-	adapter = (adapter_t *)scmd->device->host->hostdata;
-
-	complete(&adapter->int_waitq);
-
-}
-
-
 static struct scsi_host_template megaraid_template = {
 	.module				= THIS_MODULE,
 	.name				= "MegaRAID",
diff --git a/drivers/scsi/megaraid.h b/drivers/scsi/megaraid.h
index 4d0ce4e..8f2e026 100644
--- a/drivers/scsi/megaraid.h
+++ b/drivers/scsi/megaraid.h
@@ -853,10 +853,10 @@ typedef struct {
 
 	u8	sglen;	/* f/w supported scatter-gather list length */
 
-	unsigned char int_cdb[MAX_COMMAND_SIZE];
 	scb_t			int_scb;
 	struct mutex		int_mtx;	/* To synchronize the internal
 						commands */
+	int			int_status; 	/* status of internal cmd */
 	struct completion	int_waitq;	/* wait queue for internal
 						 cmds */
 
@@ -1004,7 +1004,6 @@ static int mega_del_logdrv(adapter_t *, int);
 static int mega_do_del_logdrv(adapter_t *, int);
 static void mega_get_max_sgl(adapter_t *);
 static int mega_internal_command(adapter_t *, megacmd_t *, mega_passthru *);
-static void mega_internal_done(Scsi_Cmnd *);
 static int mega_support_cluster(adapter_t *);
 #endif
 
-- 
1.7.10.4




* [PATCH 03/17] scsi: remove scsi_allocate_command/scsi_free_command
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 01/17] scsi: handle command allocation failure in scsi_reset_provider Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 02/17] megaraid: simplify internal command handling Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 04/17] scsi: avoid useless free_list lock roundtrips Christoph Hellwig
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0003-scsi-remove-scsi_allocate_command-scsi_free_command.patch --]
[-- Type: text/plain, Size: 3101 bytes --]

All users are gone, so we can get rid of these.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi.c      |   56 ----------------------------------------------
 include/scsi/scsi_cmnd.h |    3 ---
 2 files changed, 59 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d8afec8..ebcea6c 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -415,62 +415,6 @@ static void scsi_put_host_cmd_pool(gfp_t gfp_mask)
 }
 
 /**
- * scsi_allocate_command - get a fully allocated SCSI command
- * @gfp_mask:	allocation mask
- *
- * This function is for use outside of the normal host based pools.
- * It allocates the relevant command and takes an additional reference
- * on the pool it used.  This function *must* be paired with
- * scsi_free_command which also has the identical mask, otherwise the
- * free pool counts will eventually go wrong and you'll trigger a bug.
- *
- * This function should *only* be used by drivers that need a static
- * command allocation at start of day for internal functions.
- */
-struct scsi_cmnd *scsi_allocate_command(gfp_t gfp_mask)
-{
-	struct scsi_host_cmd_pool *pool = scsi_get_host_cmd_pool(gfp_mask);
-
-	if (!pool)
-		return NULL;
-
-	return scsi_pool_alloc_command(pool, gfp_mask);
-}
-EXPORT_SYMBOL(scsi_allocate_command);
-
-/**
- * scsi_free_command - free a command allocated by scsi_allocate_command
- * @gfp_mask:	mask used in the original allocation
- * @cmd:	command to free
- *
- * Note: using the original allocation mask is vital because that's
- * what determines which command pool we use to free the command.  Any
- * mismatch will cause the system to BUG eventually.
- */
-void scsi_free_command(gfp_t gfp_mask, struct scsi_cmnd *cmd)
-{
-	struct scsi_host_cmd_pool *pool = scsi_get_host_cmd_pool(gfp_mask);
-
-	/*
-	 * this could trigger if the mask to scsi_allocate_command
-	 * doesn't match this mask.  Otherwise we're guaranteed that this
-	 * succeeds because scsi_allocate_command must have taken a reference
-	 * on the pool
-	 */
-	BUG_ON(!pool);
-
-	scsi_pool_free_command(pool, cmd);
-	/*
-	 * scsi_put_host_cmd_pool is called twice; once to release the
-	 * reference we took above, and once to release the reference
-	 * originally taken by scsi_allocate_command
-	 */
-	scsi_put_host_cmd_pool(gfp_mask);
-	scsi_put_host_cmd_pool(gfp_mask);
-}
-EXPORT_SYMBOL(scsi_free_command);
-
-/**
  * scsi_setup_command_freelist - Setup the command freelist for a scsi host.
  * @shost: host to allocate the freelist for.
  *
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 91558a1..dff105a 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -156,9 +156,6 @@ extern void scsi_release_buffers(struct scsi_cmnd *cmd);
 extern int scsi_dma_map(struct scsi_cmnd *cmd);
 extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
 
-struct scsi_cmnd *scsi_allocate_command(gfp_t gfp_mask);
-void scsi_free_command(gfp_t gfp_mask, struct scsi_cmnd *cmd);
-
 static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
 {
 	return cmd->sdb.table.nents;
-- 
1.7.10.4




* [PATCH 04/17] scsi: avoid useless free_list lock roundtrips
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (2 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 03/17] scsi: remove scsi_allocate_command/scsi_free_command Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 23:44   ` James Bottomley
  2014-02-07  9:05   ` Paolo Bonzini
  2014-02-05 12:39 ` [PATCH 05/17] scsi: simplify command allocation and freeing a bit Christoph Hellwig
                   ` (13 subsequent siblings)
  17 siblings, 2 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0004-scsi-avoid-useless-free_list-lock-roundtrips.patch --]
[-- Type: text/plain, Size: 1129 bytes --]

Avoid hitting the host-wide free_list lock unless we need to put a command
back onto the freelist.
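
The unlocked list_empty() check is only a hint; the list is re-checked
under free_list_lock before anything is added, so the common (non-empty)
case takes no lock at all.  A sketch of the pattern, matching the diff
below:

  if (unlikely(list_empty(&shost->free_list))) {    /* unlocked hint */
          spin_lock_irqsave(&shost->free_list_lock, flags);
          if (list_empty(&shost->free_list)) {      /* re-check under the lock */
                  list_add(&cmd->list, &shost->free_list);
                  cmd = NULL;                       /* parked on the free list */
          }
          spin_unlock_irqrestore(&shost->free_list_lock, flags);
  }
  if (cmd)                                          /* common case */
          scsi_pool_free_command(shost->cmd_pool, cmd);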

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index ebcea6c..4139486 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -320,15 +320,16 @@ void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd,
 {
 	unsigned long flags;
 
-	/* changing locks here, don't need to restore the irq state */
-	spin_lock_irqsave(&shost->free_list_lock, flags);
 	if (unlikely(list_empty(&shost->free_list))) {
-		list_add(&cmd->list, &shost->free_list);
-		cmd = NULL;
+		spin_lock_irqsave(&shost->free_list_lock, flags);
+		if (list_empty(&shost->free_list)) {
+			list_add(&cmd->list, &shost->free_list);
+			cmd = NULL;
+		}
+		spin_unlock_irqrestore(&shost->free_list_lock, flags);
 	}
-	spin_unlock_irqrestore(&shost->free_list_lock, flags);
 
-	if (likely(cmd != NULL))
+	if (cmd)
 		scsi_pool_free_command(shost->cmd_pool, cmd);
 
 	put_device(dev);
-- 
1.7.10.4




* [PATCH 05/17] scsi: simplify command allocation and freeing a bit
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (3 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 04/17] scsi: avoid useless free_list lock roundtrips Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 23:51   ` James Bottomley
  2014-02-05 12:39 ` [PATCH 06/17] scsi: add support for per-host cmd pools Christoph Hellwig
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0005-scsi-simplify-command-allocation-and-freeing-a-bit.patch --]
[-- Type: text/plain, Size: 3818 bytes --]

Have just one level of alloc/free functions that take a host, instead
of two levels for the allocation and different calling conventions for
the free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi.c |   77 +++++++++++++++------------------------------------
 1 file changed, 22 insertions(+), 55 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 4139486..5347f7d 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -160,79 +160,46 @@ static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
 
 static DEFINE_MUTEX(host_cmd_pool_mutex);
 
-/**
- * scsi_pool_alloc_command - internal function to get a fully allocated command
- * @pool:	slab pool to allocate the command from
- * @gfp_mask:	mask for the allocation
- *
- * Returns a fully allocated command (with the allied sense buffer) or
- * NULL on failure
- */
-static struct scsi_cmnd *
-scsi_pool_alloc_command(struct scsi_host_cmd_pool *pool, gfp_t gfp_mask)
-{
-	struct scsi_cmnd *cmd;
-
-	cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask | pool->gfp_mask);
-	if (!cmd)
-		return NULL;
-
-	cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab,
-					     gfp_mask | pool->gfp_mask);
-	if (!cmd->sense_buffer) {
-		kmem_cache_free(pool->cmd_slab, cmd);
-		return NULL;
-	}
-
-	return cmd;
-}
-
-/**
- * scsi_pool_free_command - internal function to release a command
- * @pool:	slab pool to allocate the command from
- * @cmd:	command to release
- *
- * the command must previously have been allocated by
- * scsi_pool_alloc_command.
- */
 static void
-scsi_pool_free_command(struct scsi_host_cmd_pool *pool,
-			 struct scsi_cmnd *cmd)
+scsi_host_free_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
 {
+	struct scsi_host_cmd_pool *pool = shost->cmd_pool;
+
 	if (cmd->prot_sdb)
 		kmem_cache_free(scsi_sdb_cache, cmd->prot_sdb);
-
 	kmem_cache_free(pool->sense_slab, cmd->sense_buffer);
 	kmem_cache_free(pool->cmd_slab, cmd);
 }
 
-/**
- * scsi_host_alloc_command - internal function to allocate command
- * @shost:	SCSI host whose pool to allocate from
- * @gfp_mask:	mask for the allocation
- *
- * Returns a fully allocated command with sense buffer and protection
- * data buffer (where applicable) or NULL on failure
- */
 static struct scsi_cmnd *
 scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t gfp_mask)
 {
+	struct scsi_host_cmd_pool *pool = shost->cmd_pool;
 	struct scsi_cmnd *cmd;
 
-	cmd = scsi_pool_alloc_command(shost->cmd_pool, gfp_mask);
+	cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask | pool->gfp_mask);
 	if (!cmd)
-		return NULL;
+		goto fail;
+
+	cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab,
+					     gfp_mask | pool->gfp_mask);
+	if (!cmd->sense_buffer)
+		goto fail_free_cmd;
 
 	if (scsi_host_get_prot(shost) >= SHOST_DIX_TYPE0_PROTECTION) {
 		cmd->prot_sdb = kmem_cache_zalloc(scsi_sdb_cache, gfp_mask);
-
-		if (!cmd->prot_sdb) {
-			scsi_pool_free_command(shost->cmd_pool, cmd);
-			return NULL;
-		}
+		if (!cmd->prot_sdb)
+			goto fail_free_sense;
 	}
 
 	return cmd;
+
+fail_free_sense:
+	kmem_cache_free(pool->sense_slab, cmd->sense_buffer);
+fail_free_cmd:
+	kmem_cache_free(pool->cmd_slab, cmd);
+fail:
+	return NULL;
 }
 
 /**
@@ -330,7 +297,7 @@ void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd,
 	}
 
 	if (cmd)
-		scsi_pool_free_command(shost->cmd_pool, cmd);
+		scsi_host_free_command(shost, cmd);
 
 	put_device(dev);
 }
@@ -469,7 +436,7 @@ void scsi_destroy_command_freelist(struct Scsi_Host *shost)
 
 		cmd = list_entry(shost->free_list.next, struct scsi_cmnd, list);
 		list_del_init(&cmd->list);
-		scsi_pool_free_command(shost->cmd_pool, cmd);
+		scsi_host_free_command(shost, cmd);
 	}
 	shost->cmd_pool = NULL;
 	scsi_put_host_cmd_pool(shost->unchecked_isa_dma ? GFP_DMA : GFP_KERNEL);
-- 
1.7.10.4




* [PATCH 06/17] scsi: add support for per-host cmd pools
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (4 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 05/17] scsi: simplify command allocation and freeing a bit Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-07  9:13   ` Paolo Bonzini
  2014-02-07  9:35   ` Mike Christie
  2014-02-05 12:39 ` [PATCH 07/17] virtio_scsi: use cmd_size Christoph Hellwig
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0006-scsi-add-support-for-per-host-cmd-pools.patch --]
[-- Type: text/plain, Size: 5787 bytes --]

This allows drivers to specify the size of their per-command private
data in the host template and then get extra memory allocated for
each command instead of needing another allocation in ->queuecommand.

With the current SCSI code, which already does multiple allocations for
each command, this probably doesn't make a big performance impact, but
it allows us to clean up the drivers and prepares them for the blk-mq
infrastructure, where the common allocation will make a difference.
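
For illustration, a driver using this would look roughly like the
sketch below ("foo" is a hypothetical LLD; the per-command data lives
directly behind the scsi_cmnd, as in the virtio_scsi conversion later
in this series):

  struct foo_cmd {                        /* hypothetical per-command data */
          u32     tag;
          void    *hw_desc;
  };

  static int foo_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *sc)
  {
          struct foo_cmd *fcmd = (struct foo_cmd *)(sc + 1);

          memset(fcmd, 0, sizeof(*fcmd));
          /* ... fill in fcmd and submit the hardware command ... */
          return 0;
  }

  static struct scsi_host_template foo_template = {
          .name           = "foo",
          .queuecommand   = foo_queuecommand,
          .cmd_size       = sizeof(struct foo_cmd),  /* extra bytes per command */
  };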

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi.c      |   92 ++++++++++++++++++++++++++++++++++++----------
 include/scsi/scsi_host.h |    7 ++++
 2 files changed, 80 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 5347f7d..9291eb9 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -328,46 +328,99 @@ void scsi_put_command(struct scsi_cmnd *cmd)
 }
 EXPORT_SYMBOL(scsi_put_command);
 
-static struct scsi_host_cmd_pool *scsi_get_host_cmd_pool(gfp_t gfp_mask)
+static struct scsi_host_cmd_pool *
+scsi_find_host_cmd_pool(struct Scsi_Host *shost)
 {
+	if (shost->hostt->cmd_size)
+		return shost->hostt->cmd_pool;
+	if (shost->unchecked_isa_dma)
+		return &scsi_cmd_dma_pool;
+	return &scsi_cmd_pool;
+}
+
+static void
+scsi_free_host_cmd_pool(struct scsi_host_cmd_pool *pool)
+{
+	kfree(pool->sense_name);
+	kfree(pool->cmd_name);
+	kfree(pool);
+}
+
+static struct scsi_host_cmd_pool *
+scsi_alloc_host_cmd_pool(struct Scsi_Host *shost)
+{
+	struct scsi_host_template *hostt = shost->hostt;
+	struct scsi_host_cmd_pool *pool;
+
+	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+	if (!pool)
+		return NULL;
+
+	pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->name);
+	pool->sense_name = kasprintf(GFP_KERNEL, "%s_sense", hostt->name);
+	if (!pool->cmd_name || !pool->sense_name) {
+		scsi_free_host_cmd_pool(pool);
+		return NULL;
+	}
+
+	pool->slab_flags = SLAB_HWCACHE_ALIGN;
+	return pool;
+}
+
+static struct scsi_host_cmd_pool *
+scsi_get_host_cmd_pool(struct Scsi_Host *shost)
+{
+	struct scsi_host_template *hostt = shost->hostt;
 	struct scsi_host_cmd_pool *retval = NULL, *pool;
+	size_t cmd_size = sizeof(struct scsi_cmnd) + hostt->cmd_size;
+
 	/*
 	 * Select a command slab for this host and create it if not
 	 * yet existent.
 	 */
 	mutex_lock(&host_cmd_pool_mutex);
-	pool = (gfp_mask & __GFP_DMA) ? &scsi_cmd_dma_pool :
-		&scsi_cmd_pool;
+	pool = scsi_find_host_cmd_pool(shost);
+	if (!pool) {
+		pool = scsi_alloc_host_cmd_pool(shost);
+		if (!pool)
+			goto out;
+	}
+
 	if (!pool->users) {
-		pool->cmd_slab = kmem_cache_create(pool->cmd_name,
-						   sizeof(struct scsi_cmnd), 0,
+		pool->cmd_slab = kmem_cache_create(pool->cmd_name, cmd_size, 0,
 						   pool->slab_flags, NULL);
 		if (!pool->cmd_slab)
-			goto fail;
+			goto out_free_pool;
 
 		pool->sense_slab = kmem_cache_create(pool->sense_name,
 						     SCSI_SENSE_BUFFERSIZE, 0,
 						     pool->slab_flags, NULL);
-		if (!pool->sense_slab) {
-			kmem_cache_destroy(pool->cmd_slab);
-			goto fail;
-		}
+		if (!pool->sense_slab)
+			goto out_free_slab;
 	}
 
 	pool->users++;
 	retval = pool;
- fail:
+out:
 	mutex_unlock(&host_cmd_pool_mutex);
 	return retval;
+
+out_free_slab:
+	kmem_cache_destroy(pool->cmd_slab);
+out_free_pool:
+	if (hostt->cmd_size)
+		scsi_free_host_cmd_pool(pool);
+	goto out;
 }
 
-static void scsi_put_host_cmd_pool(gfp_t gfp_mask)
+static void scsi_put_host_cmd_pool(struct Scsi_Host *shost)
 {
+	struct scsi_host_template *hostt = shost->hostt;
 	struct scsi_host_cmd_pool *pool;
 
 	mutex_lock(&host_cmd_pool_mutex);
-	pool = (gfp_mask & __GFP_DMA) ? &scsi_cmd_dma_pool :
-		&scsi_cmd_pool;
+	pool = scsi_find_host_cmd_pool(shost);
+
 	/*
 	 * This may happen if a driver has a mismatched get and put
 	 * of the command pool; the driver should be implicated in
@@ -378,6 +431,8 @@ static void scsi_put_host_cmd_pool(gfp_t gfp_mask)
 	if (!--pool->users) {
 		kmem_cache_destroy(pool->cmd_slab);
 		kmem_cache_destroy(pool->sense_slab);
+		if (hostt->cmd_size)
+			scsi_free_host_cmd_pool(pool);
 	}
 	mutex_unlock(&host_cmd_pool_mutex);
 }
@@ -394,14 +449,13 @@ static void scsi_put_host_cmd_pool(gfp_t gfp_mask)
  */
 int scsi_setup_command_freelist(struct Scsi_Host *shost)
 {
-	struct scsi_cmnd *cmd;
 	const gfp_t gfp_mask = shost->unchecked_isa_dma ? GFP_DMA : GFP_KERNEL;
+	struct scsi_cmnd *cmd;
 
 	spin_lock_init(&shost->free_list_lock);
 	INIT_LIST_HEAD(&shost->free_list);
 
-	shost->cmd_pool = scsi_get_host_cmd_pool(gfp_mask);
-
+	shost->cmd_pool = scsi_get_host_cmd_pool(shost);
 	if (!shost->cmd_pool)
 		return -ENOMEM;
 
@@ -410,7 +464,7 @@ int scsi_setup_command_freelist(struct Scsi_Host *shost)
 	 */
 	cmd = scsi_host_alloc_command(shost, gfp_mask);
 	if (!cmd) {
-		scsi_put_host_cmd_pool(gfp_mask);
+		scsi_put_host_cmd_pool(shost);
 		shost->cmd_pool = NULL;
 		return -ENOMEM;
 	}
@@ -439,7 +493,7 @@ void scsi_destroy_command_freelist(struct Scsi_Host *shost)
 		scsi_host_free_command(shost, cmd);
 	}
 	shost->cmd_pool = NULL;
-	scsi_put_host_cmd_pool(shost->unchecked_isa_dma ? GFP_DMA : GFP_KERNEL);
+	scsi_put_host_cmd_pool(shost);
 }
 
 #ifdef CONFIG_SCSI_LOGGING
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 53075e5..94844fc 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -15,6 +15,7 @@ struct completion;
 struct module;
 struct scsi_cmnd;
 struct scsi_device;
+struct scsi_host_cmd_pool;
 struct scsi_target;
 struct Scsi_Host;
 struct scsi_host_cmd_pool;
@@ -524,6 +525,12 @@ struct scsi_host_template {
 	 *   scsi_netlink.h
 	 */
 	u64 vendor_id;
+
+	/*
+	 * Additional per-command data allocated for the driver.
+	 */
+	unsigned int cmd_size;
+	struct scsi_host_cmd_pool *cmd_pool;
 };
 
 /*
-- 
1.7.10.4




* [PATCH 07/17] virtio_scsi: use cmd_size
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (5 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 06/17] scsi: add support for per-host cmd pools Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-07  9:13   ` Paolo Bonzini
  2014-02-05 12:39 ` [PATCH 08/17] scsi: do not manipulate device reference counts in scsi_get/put_command Christoph Hellwig
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0007-virtio_scsi-use-cmd_size.patch --]
[-- Type: text/plain, Size: 2805 bytes --]

Taken almost entirely from Nicholas Bellinger's scsi-mq conversion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/virtio_scsi.c |   25 +++++++------------------
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 16bfd50..d9a6074 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -204,7 +204,6 @@ static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
 			set_driver_byte(sc, DRIVER_SENSE);
 	}
 
-	mempool_free(cmd, virtscsi_cmd_pool);
 	sc->scsi_done(sc);
 
 	atomic_dec(&tgt->reqs);
@@ -279,8 +278,6 @@ static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf)
 
 	if (cmd->comp)
 		complete_all(cmd->comp);
-	else
-		mempool_free(cmd, virtscsi_cmd_pool);
 }
 
 static void virtscsi_ctrl_done(struct virtqueue *vq)
@@ -496,10 +493,9 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 				 struct virtio_scsi_vq *req_vq,
 				 struct scsi_cmnd *sc)
 {
-	struct virtio_scsi_cmd *cmd;
-	int ret;
-
 	struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
+	struct virtio_scsi_cmd *cmd = (struct virtio_scsi_cmd *)(sc + 1);
+
 	BUG_ON(scsi_sg_count(sc) > shost->sg_tablesize);
 
 	/* TODO: check feature bit and fail if unsupported?  */
@@ -508,11 +504,6 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 	dev_dbg(&sc->device->sdev_gendev,
 		"cmd %p CDB: %#02x\n", sc, sc->cmnd[0]);
 
-	ret = SCSI_MLQUEUE_HOST_BUSY;
-	cmd = mempool_alloc(virtscsi_cmd_pool, GFP_ATOMIC);
-	if (!cmd)
-		goto out;
-
 	memset(cmd, 0, sizeof(*cmd));
 	cmd->sc = sc;
 	cmd->req.cmd = (struct virtio_scsi_cmd_req){
@@ -531,13 +522,9 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 
 	if (virtscsi_kick_cmd(req_vq, cmd,
 			      sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
-			      GFP_ATOMIC) == 0)
-		ret = 0;
-	else
-		mempool_free(cmd, virtscsi_cmd_pool);
-
-out:
-	return ret;
+			      GFP_ATOMIC) != 0)
+		return SCSI_MLQUEUE_HOST_BUSY;
+	return 0;
 }
 
 static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
@@ -683,6 +670,7 @@ static struct scsi_host_template virtscsi_host_template_single = {
 	.name = "Virtio SCSI HBA",
 	.proc_name = "virtio_scsi",
 	.this_id = -1,
+	.cmd_size = sizeof(struct virtio_scsi_cmd),
 	.queuecommand = virtscsi_queuecommand_single,
 	.eh_abort_handler = virtscsi_abort,
 	.eh_device_reset_handler = virtscsi_device_reset,
@@ -699,6 +687,7 @@ static struct scsi_host_template virtscsi_host_template_multi = {
 	.name = "Virtio SCSI HBA",
 	.proc_name = "virtio_scsi",
 	.this_id = -1,
+	.cmd_size = sizeof(struct virtio_scsi_cmd),
 	.queuecommand = virtscsi_queuecommand_multi,
 	.eh_abort_handler = virtscsi_abort,
 	.eh_device_reset_handler = virtscsi_device_reset,
-- 
1.7.10.4




* [PATCH 08/17] scsi: do not manipulate device reference counts in scsi_get/put_command
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (6 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 07/17] virtio_scsi: use cmd_size Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 09/17] scsi: micro-optimize scsi_request_fn() Christoph Hellwig
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0008-scsi-do-not-manipulate-device-reference-counts-in-sc.patch --]
[-- Type: text/plain, Size: 6265 bytes --]

Many callers won't need this and we can optimize them away.  In
addition, the handling in the __-prefixed variants was inconsistent to
start with.

Based on an earlier patch from Bart Van Assche.
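
A sketch of the new calling convention for a caller that does need the
device reference (this is what the scsi_lib.c changes below do):

  if (!get_device(&sdev->sdev_gendev))    /* device is being torn down */
          return NULL;
  cmd = scsi_get_command(sdev, GFP_ATOMIC);
  if (!cmd) {
          put_device(&sdev->sdev_gendev);
          return NULL;
  }
  /* ... use the command ... */
  scsi_put_command(cmd);
  put_device(&sdev->sdev_gendev);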

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi.c         |   36 ++++++++++++------------------------
 drivers/scsi/scsi_error.c   |    7 +++++++
 drivers/scsi/scsi_lib.c     |   12 +++++++++++-
 drivers/scsi/scsi_tgt_lib.c |    3 ++-
 include/scsi/scsi_cmnd.h    |    3 +--
 5 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 9291eb9..ab54154 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -251,27 +251,19 @@ EXPORT_SYMBOL_GPL(__scsi_get_command);
  */
 struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, gfp_t gfp_mask)
 {
-	struct scsi_cmnd *cmd;
+	struct scsi_cmnd *cmd = __scsi_get_command(dev->host, gfp_mask);
+	unsigned long flags;
 
-	/* Bail if we can't get a reference to the device */
-	if (!get_device(&dev->sdev_gendev))
+	if (unlikely(cmd == NULL))
 		return NULL;
 
-	cmd = __scsi_get_command(dev->host, gfp_mask);
-
-	if (likely(cmd != NULL)) {
-		unsigned long flags;
-
-		cmd->device = dev;
-		INIT_LIST_HEAD(&cmd->list);
-		INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
-		spin_lock_irqsave(&dev->list_lock, flags);
-		list_add_tail(&cmd->list, &dev->cmd_list);
-		spin_unlock_irqrestore(&dev->list_lock, flags);
-		cmd->jiffies_at_alloc = jiffies;
-	} else
-		put_device(&dev->sdev_gendev);
-
+	cmd->device = dev;
+	INIT_LIST_HEAD(&cmd->list);
+	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
+	spin_lock_irqsave(&dev->list_lock, flags);
+	list_add_tail(&cmd->list, &dev->cmd_list);
+	spin_unlock_irqrestore(&dev->list_lock, flags);
+	cmd->jiffies_at_alloc = jiffies;
 	return cmd;
 }
 EXPORT_SYMBOL(scsi_get_command);
@@ -282,8 +274,7 @@ EXPORT_SYMBOL(scsi_get_command);
  * @cmd: Command to free
  * @dev: parent scsi device
  */
-void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd,
-			struct device *dev)
+void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
 {
 	unsigned long flags;
 
@@ -298,8 +289,6 @@ void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd,
 
 	if (cmd)
 		scsi_host_free_command(shost, cmd);
-
-	put_device(dev);
 }
 EXPORT_SYMBOL(__scsi_put_command);
 
@@ -313,7 +302,6 @@ EXPORT_SYMBOL(__scsi_put_command);
  */
 void scsi_put_command(struct scsi_cmnd *cmd)
 {
-	struct scsi_device *sdev = cmd->device;
 	unsigned long flags;
 
 	/* serious error if the command hasn't come from a device list */
@@ -324,7 +312,7 @@ void scsi_put_command(struct scsi_cmnd *cmd)
 
 	cancel_delayed_work(&cmd->abort_work);
 
-	__scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
+	__scsi_put_command(cmd->device->host, cmd);
 }
 EXPORT_SYMBOL(scsi_put_command);
 
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f8b54c1..9379ff7 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2288,8 +2288,15 @@ scsi_reset_provider(struct scsi_device *dev, int flag)
 	if (scsi_autopm_get_host(shost) < 0)
 		return FAILED;
 
+	/* Bail if we can't get a reference to the device */
+	if (!get_device(&dev->sdev_gendev)) {
+		rtn = FAILED;
+		goto out_put_autopm_host;
+	}
+
 	scmd = scsi_get_command(dev, GFP_KERNEL);
 	if (!scmd) {
+		put_device(&dev->sdev_gendev);
 		rtn = FAILED;
 		goto out_put_autopm_host;
 	}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7bd7f0d..48ca1df 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -95,6 +95,7 @@ static void scsi_unprep_request(struct request *req)
 	req->special = NULL;
 
 	scsi_put_command(cmd);
+	put_device(&cmd->device->sdev_gendev);
 }
 
 /**
@@ -524,6 +525,7 @@ void scsi_next_command(struct scsi_cmnd *cmd)
 	get_device(&sdev->sdev_gendev);
 
 	scsi_put_command(cmd);
+	put_device(&sdev->sdev_gendev);
 	scsi_run_queue(q);
 
 	/* ok to remove device now */
@@ -1111,6 +1113,7 @@ err_exit:
 	scsi_release_buffers(cmd);
 	cmd->request->special = NULL;
 	scsi_put_command(cmd);
+	put_device(&cmd->device->sdev_gendev);
 	return error;
 }
 EXPORT_SYMBOL(scsi_init_io);
@@ -1121,9 +1124,15 @@ static struct scsi_cmnd *scsi_get_cmd_from_req(struct scsi_device *sdev,
 	struct scsi_cmnd *cmd;
 
 	if (!req->special) {
+		/* Bail if we can't get a reference to the device */
+		if (!get_device(&sdev->sdev_gendev))
+			return NULL;
+
 		cmd = scsi_get_command(sdev, GFP_ATOMIC);
-		if (unlikely(!cmd))
+		if (unlikely(!cmd)) {
+			put_device(&sdev->sdev_gendev);
 			return NULL;
+		}
 		req->special = cmd;
 	} else {
 		cmd = req->special;
@@ -1286,6 +1295,7 @@ int scsi_prep_return(struct request_queue *q, struct request *req, int ret)
 			struct scsi_cmnd *cmd = req->special;
 			scsi_release_buffers(cmd);
 			scsi_put_command(cmd);
+			put_device(&cmd->device->sdev_gendev);
 			req->special = NULL;
 		}
 		break;
diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
index 84a1fdf..e51add0 100644
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -155,7 +155,8 @@ void scsi_host_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
 	__blk_put_request(q, rq);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
-	__scsi_put_command(shost, cmd, &shost->shost_gendev);
+	__scsi_put_command(shost, cmd);
+	put_device(&shost->shost_gendev);
 }
 EXPORT_SYMBOL_GPL(scsi_host_put_command);
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index dff105a..dd7c998 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -142,8 +142,7 @@ static inline struct scsi_driver *scsi_cmd_to_driver(struct scsi_cmnd *cmd)
 extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t);
 extern struct scsi_cmnd *__scsi_get_command(struct Scsi_Host *, gfp_t);
 extern void scsi_put_command(struct scsi_cmnd *);
-extern void __scsi_put_command(struct Scsi_Host *, struct scsi_cmnd *,
-			       struct device *);
+extern void __scsi_put_command(struct Scsi_Host *, struct scsi_cmnd *);
 extern void scsi_finish_command(struct scsi_cmnd *cmd);
 
 extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
-- 
1.7.10.4




* [PATCH 09/17] scsi: micro-optimize scsi_request_fn()
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (7 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 08/17] scsi: do not manipulate device reference counts in scsi_get/put_command Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 10/17] scsi: micro-optimize scsi_next_command() Christoph Hellwig
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger
  Cc: linux-scsi, Bart Van Assche, Tejun Heo, Mike Christie

[-- Attachment #1: 0009-scsi-micro-optimize-scsi_request_fn.patch --]
[-- Type: text/plain, Size: 2040 bytes --]

SCSI devices may only be removed by calling scsi_remove_device().
That function must invoke blk_cleanup_queue() before the final put
of sdev->sdev_gendev. Since blk_cleanup_queue() waits for the
block queue to drain and then tears it down, scsi_request_fn cannot
be active anymore after blk_cleanup_queue() has returned and hence
the get_device()/put_device() pair in scsi_request_fn is unnecessary.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/scsi_lib.c |   14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 48ca1df..568ac24 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1553,16 +1553,14 @@ static void scsi_softirq_done(struct request *rq)
  * Lock status: IO request lock assumed to be held when called.
  */
 static void scsi_request_fn(struct request_queue *q)
+	__releases(q->queue_lock)
+	__acquires(q->queue_lock)
 {
 	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost;
 	struct scsi_cmnd *cmd;
 	struct request *req;
 
-	if(!get_device(&sdev->sdev_gendev))
-		/* We must be tearing the block queue down already */
-		return;
-
 	/*
 	 * To start with, we keep looping until the queue is empty, or until
 	 * the host is no longer able to accept any more requests.
@@ -1651,7 +1649,7 @@ static void scsi_request_fn(struct request_queue *q)
 			goto out_delay;
 	}
 
-	goto out;
+	return;
 
  not_ready:
 	spin_unlock_irq(shost->host_lock);
@@ -1670,12 +1668,6 @@ static void scsi_request_fn(struct request_queue *q)
 out_delay:
 	if (sdev->device_busy == 0)
 		blk_delay_queue(q, SCSI_QUEUE_DELAY);
-out:
-	/* must be careful here...if we trigger the ->remove() function
-	 * we cannot be holding the q lock */
-	spin_unlock_irq(q->queue_lock);
-	put_device(&sdev->sdev_gendev);
-	spin_lock_irq(q->queue_lock);
 }
 
 u64 scsi_calculate_bounce_limit(struct Scsi_Host *shost)
-- 
1.7.10.4




* [PATCH 10/17] scsi: micro-optimize scsi_next_command()
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (8 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 09/17] scsi: micro-optimize scsi_request_fn() Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 11/17] scsi: micro-optimize scsi_requeue_command() Christoph Hellwig
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger
  Cc: linux-scsi, Bart Van Assche

[-- Attachment #1: 0010-scsi-micro-optimize-scsi_next_command.patch --]
[-- Type: text/plain, Size: 964 bytes --]

Eliminate a get_device()/put_device() pair from scsi_next_command().
Both are atomic operations, so removing them slightly improves
performance.

[hch: slight changes due to different context]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_lib.c |    5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 568ac24..84d8211 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -521,14 +521,9 @@ void scsi_next_command(struct scsi_cmnd *cmd)
 	struct scsi_device *sdev = cmd->device;
 	struct request_queue *q = sdev->request_queue;
 
-	/* need to hold a reference on the device before we let go of the cmd */
-	get_device(&sdev->sdev_gendev);
-
 	scsi_put_command(cmd);
-	put_device(&sdev->sdev_gendev);
 	scsi_run_queue(q);
 
-	/* ok to remove device now */
 	put_device(&sdev->sdev_gendev);
 }
 
-- 
1.7.10.4




* [PATCH 11/17] scsi: micro-optimize scsi_requeue_command()
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (9 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 10/17] scsi: micro-optimize scsi_next_command() Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 12/17] scsi: avoid taking host_lock in scsi_run_queue unless necessary Christoph Hellwig
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0011-scsi-micro-optimize-scsi_requeue_command.patch --]
[-- Type: text/plain, Size: 1875 bytes --]

Avoid a spurious device get/put cycle by using scsi_put_command and folding
scsi_unprep_request into scsi_requeue_command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_lib.c |   35 +++--------------------------------
 1 file changed, 3 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 84d8211..216be71 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -75,29 +75,6 @@ struct kmem_cache *scsi_sdb_cache;
  */
 #define SCSI_QUEUE_DELAY	3
 
-/*
- * Function:	scsi_unprep_request()
- *
- * Purpose:	Remove all preparation done for a request, including its
- *		associated scsi_cmnd, so that it can be requeued.
- *
- * Arguments:	req	- request to unprepare
- *
- * Lock status:	Assumed that no locks are held upon entry.
- *
- * Returns:	Nothing.
- */
-static void scsi_unprep_request(struct request *req)
-{
-	struct scsi_cmnd *cmd = req->special;
-
-	blk_unprep_request(req);
-	req->special = NULL;
-
-	scsi_put_command(cmd);
-	put_device(&cmd->device->sdev_gendev);
-}
-
 /**
  * __scsi_queue_insert - private queue insertion
  * @cmd: The SCSI command being requeued
@@ -498,16 +475,10 @@ static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 	struct request *req = cmd->request;
 	unsigned long flags;
 
-	/*
-	 * We need to hold a reference on the device to avoid the queue being
-	 * killed after the unlock and before scsi_run_queue is invoked which
-	 * may happen because scsi_unprep_request() puts the command which
-	 * releases its reference on the device.
-	 */
-	get_device(&sdev->sdev_gendev);
-
 	spin_lock_irqsave(q->queue_lock, flags);
-	scsi_unprep_request(req);
+	blk_unprep_request(req);
+	req->special = NULL;
+	scsi_put_command(cmd);
 	blk_requeue_request(q, req);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
-- 
1.7.10.4




* [PATCH 12/17] scsi: avoid taking host_lock in scsi_run_queue unless necessary
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (10 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 11/17] scsi: micro-optimize scsi_requeue_command() Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 23:54   ` James Bottomley
  2014-02-05 12:39 ` [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready Christoph Hellwig
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0012-scsi-avoid-taking-host_lock-in-scsi_run_queue-unless.patch --]
[-- Type: text/plain, Size: 1802 bytes --]

If we don't have starved devices we don't need to take the host lock
to iterate over them.  Also split the function up to make it clearer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_lib.c |   31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 216be71..44740c0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -363,29 +363,12 @@ static inline int scsi_host_is_busy(struct Scsi_Host *shost)
 	return 0;
 }
 
-/*
- * Function:	scsi_run_queue()
- *
- * Purpose:	Select a proper request queue to serve next
- *
- * Arguments:	q	- last request's queue
- *
- * Returns:     Nothing
- *
- * Notes:	The previous command was completely finished, start
- *		a new one if possible.
- */
-static void scsi_run_queue(struct request_queue *q)
+static void scsi_starved_list_run(struct Scsi_Host *shost)
 {
-	struct scsi_device *sdev = q->queuedata;
-	struct Scsi_Host *shost;
 	LIST_HEAD(starved_list);
+	struct scsi_device *sdev;
 	unsigned long flags;
 
-	shost = sdev->host;
-	if (scsi_target(sdev)->single_lun)
-		scsi_single_lun_run(sdev);
-
 	spin_lock_irqsave(shost->host_lock, flags);
 	list_splice_init(&shost->starved_list, &starved_list);
 
@@ -437,6 +420,16 @@ static void scsi_run_queue(struct request_queue *q)
 	/* put any unprocessed entries back */
 	list_splice(&starved_list, &shost->starved_list);
 	spin_unlock_irqrestore(shost->host_lock, flags);
+}
+
+static void scsi_run_queue(struct request_queue *q)
+{
+	struct scsi_device *sdev = q->queuedata;
+
+	if (scsi_target(sdev)->single_lun)
+		scsi_single_lun_run(sdev);
+	if (!list_empty(&sdev->host->starved_list))
+		scsi_starved_list_run(sdev->host);
 
 	blk_run_queue(q);
 }
-- 
1.7.10.4




* [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (11 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 12/17] scsi: avoid taking host_lock in scsi_run_queue unless necessary Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-06 16:56   ` James Bottomley
  2014-02-05 12:39 ` [PATCH 14/17] scsi: convert target_busy to an atomic_t Christoph Hellwig
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0013-scsi-push-host_lock-down-into-scsi_-host-target-_que.patch --]
[-- Type: text/plain, Size: 5087 bytes --]

Prepare for not taking a host-wide lock in the dispatch path by pushing
the lock down into the places that actually need it.  Note that this
patch is just a preparation step, as it will actually increase lock
roundtrips and thus decrease performance on its own.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_lib.c |   75 ++++++++++++++++++++++++-----------------------
 1 file changed, 39 insertions(+), 36 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 44740c0..295e038 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1318,18 +1318,18 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
 /*
  * scsi_target_queue_ready: checks if there we can send commands to target
  * @sdev: scsi device on starget to check.
- *
- * Called with the host lock held.
  */
 static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 					   struct scsi_device *sdev)
 {
 	struct scsi_target *starget = scsi_target(sdev);
+	int ret = 0;
 
+	spin_lock_irq(shost->host_lock);
 	if (starget->single_lun) {
 		if (starget->starget_sdev_user &&
 		    starget->starget_sdev_user != sdev)
-			return 0;
+			goto out;
 		starget->starget_sdev_user = sdev;
 	}
 
@@ -1337,57 +1337,66 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 		/*
 		 * unblock after target_blocked iterates to zero
 		 */
-		if (--starget->target_blocked == 0) {
-			SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
-					 "unblocking target at zero depth\n"));
-		} else
-			return 0;
+		if (--starget->target_blocked != 0)
+			goto out;
+
+		SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
+				 "unblocking target at zero depth\n"));
 	}
 
 	if (scsi_target_is_busy(starget)) {
 		list_move_tail(&sdev->starved_entry, &shost->starved_list);
-		return 0;
+		goto out;
 	}
 
-	return 1;
+	scsi_target(sdev)->target_busy++;
+	ret = 1;
+out:
+	spin_unlock_irq(shost->host_lock);
+	return ret;
 }
 
 /*
  * scsi_host_queue_ready: if we can send requests to shost, return 1 else
  * return 0. We must end up running the queue again whenever 0 is
  * returned, else IO can hang.
- *
- * Called with host_lock held.
  */
 static inline int scsi_host_queue_ready(struct request_queue *q,
 				   struct Scsi_Host *shost,
 				   struct scsi_device *sdev)
 {
+	int ret = 0;
+
+	spin_lock_irq(shost->host_lock);
+
 	if (scsi_host_in_recovery(shost))
-		return 0;
+		goto out;
 	if (shost->host_busy == 0 && shost->host_blocked) {
 		/*
 		 * unblock after host_blocked iterates to zero
 		 */
-		if (--shost->host_blocked == 0) {
-			SCSI_LOG_MLQUEUE(3,
-				printk("scsi%d unblocking host at zero depth\n",
-					shost->host_no));
-		} else {
-			return 0;
-		}
+		if (--shost->host_blocked != 0)
+			goto out;
+
+		SCSI_LOG_MLQUEUE(3,
+			printk("scsi%d unblocking host at zero depth\n",
+				shost->host_no));
 	}
 	if (scsi_host_is_busy(shost)) {
 		if (list_empty(&sdev->starved_entry))
 			list_add_tail(&sdev->starved_entry, &shost->starved_list);
-		return 0;
+		goto out;
 	}
 
 	/* We're OK to process the command, so we can't be starved */
 	if (!list_empty(&sdev->starved_entry))
 		list_del_init(&sdev->starved_entry);
 
-	return 1;
+	shost->host_busy++;
+	ret = 1;
+out:
+	spin_unlock_irq(shost->host_lock);
+	return ret;
 }
 
 /*
@@ -1551,7 +1560,7 @@ static void scsi_request_fn(struct request_queue *q)
 			blk_start_request(req);
 		sdev->device_busy++;
 
-		spin_unlock(q->queue_lock);
+		spin_unlock_irq(q->queue_lock);
 		cmd = req->special;
 		if (unlikely(cmd == NULL)) {
 			printk(KERN_CRIT "impossible request in %s.\n"
@@ -1561,7 +1570,6 @@ static void scsi_request_fn(struct request_queue *q)
 			blk_dump_rq_flags(req, "foo");
 			BUG();
 		}
-		spin_lock(shost->host_lock);
 
 		/*
 		 * We hit this when the driver is using a host wide
@@ -1572,9 +1580,11 @@ static void scsi_request_fn(struct request_queue *q)
 		 * a run when a tag is freed.
 		 */
 		if (blk_queue_tagged(q) && !blk_rq_tagged(req)) {
+			spin_lock_irq(shost->host_lock);
 			if (list_empty(&sdev->starved_entry))
 				list_add_tail(&sdev->starved_entry,
 					      &shost->starved_list);
+			spin_unlock_irq(shost->host_lock);
 			goto not_ready;
 		}
 
@@ -1582,16 +1592,7 @@ static void scsi_request_fn(struct request_queue *q)
 			goto not_ready;
 
 		if (!scsi_host_queue_ready(q, shost, sdev))
-			goto not_ready;
-
-		scsi_target(sdev)->target_busy++;
-		shost->host_busy++;
-
-		/*
-		 * XXX(hch): This is rather suboptimal, scsi_dispatch_cmd will
-		 *		take the lock again.
-		 */
-		spin_unlock_irq(shost->host_lock);
+			goto host_not_ready;
 
 		/*
 		 * Finally, initialize any error handling parameters, and set up
@@ -1610,9 +1611,11 @@ static void scsi_request_fn(struct request_queue *q)
 
 	return;
 
- not_ready:
+ host_not_ready:
+	spin_lock_irq(shost->host_lock);
+	scsi_target(sdev)->target_busy--;
 	spin_unlock_irq(shost->host_lock);
-
+ not_ready:
 	/*
 	 * lock q, handle tag, requeue req, and decrement device_busy. We
 	 * must return with queue_lock held.
-- 
1.7.10.4




* [PATCH 14/17] scsi: convert target_busy to an atomic_t
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (12 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 15/17] scsi: convert host_busy to atomic_t Christoph Hellwig
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0014-scsi-convert-target_busy-to-an-atomic_t.patch --]
[-- Type: text/plain, Size: 4298 bytes --]

Avoid taking the host-wide host_lock to check the per-target queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.
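
A sketch of the pattern (omitting the target_blocked and starved list
handling shown in the diff below):

  busy = atomic_inc_return(&starget->target_busy) - 1;  /* reserve a slot */
  if (starget->can_queue > 0 && busy >= starget->can_queue) {
          atomic_dec(&starget->target_busy);  /* over the limit, give it back */
          return 0;
  }
  return 1;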

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_lib.c    |   52 ++++++++++++++++++++++++++------------------
 include/scsi/scsi_device.h |    4 ++--
 2 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 295e038..cc0bd7d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -283,7 +283,7 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	shost->host_busy--;
-	starget->target_busy--;
+	atomic_dec(&starget->target_busy);
 	if (unlikely(scsi_host_in_recovery(shost) &&
 		     (shost->host_failed || shost->host_eh_scheduled)))
 		scsi_eh_wakeup(shost);
@@ -350,7 +350,7 @@ static inline int scsi_device_is_busy(struct scsi_device *sdev)
 static inline int scsi_target_is_busy(struct scsi_target *starget)
 {
 	return ((starget->can_queue > 0 &&
-		 starget->target_busy >= starget->can_queue) ||
+		 atomic_read(&starget->target_busy) >= starget->can_queue) ||
 		 starget->target_blocked);
 }
 
@@ -1323,37 +1323,49 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 					   struct scsi_device *sdev)
 {
 	struct scsi_target *starget = scsi_target(sdev);
-	int ret = 0;
+	unsigned int busy;
 
-	spin_lock_irq(shost->host_lock);
 	if (starget->single_lun) {
+		spin_lock_irq(shost->host_lock);
 		if (starget->starget_sdev_user &&
-		    starget->starget_sdev_user != sdev)
-			goto out;
+		    starget->starget_sdev_user != sdev) {
+			spin_unlock_irq(shost->host_lock);
+			return 0;
+		}
 		starget->starget_sdev_user = sdev;
+		spin_unlock_irq(shost->host_lock);
 	}
 
-	if (starget->target_busy == 0 && starget->target_blocked) {
+	busy = atomic_inc_return(&starget->target_busy) - 1;
+	if (busy == 0 && starget->target_blocked) {
 		/*
 		 * unblock after target_blocked iterates to zero
 		 */
-		if (--starget->target_blocked != 0)
-			goto out;
+		spin_lock_irq(shost->host_lock);
+		if (--starget->target_blocked != 0) {
+			spin_unlock_irq(shost->host_lock);
+			goto out_dec;
+		}
+		spin_unlock_irq(shost->host_lock);
 
 		SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
 				 "unblocking target at zero depth\n"));
 	}
 
-	if (scsi_target_is_busy(starget)) {
-		list_move_tail(&sdev->starved_entry, &shost->starved_list);
-		goto out;
-	}
+	if (starget->can_queue > 0 && busy >= starget->can_queue)
+		goto starved;
+	if (starget->target_blocked)
+		goto starved;
 
-	scsi_target(sdev)->target_busy++;
-	ret = 1;
-out:
+	return 1;
+
+starved:
+	spin_lock_irq(shost->host_lock);
+	list_move_tail(&sdev->starved_entry, &shost->starved_list);
 	spin_unlock_irq(shost->host_lock);
-	return ret;
+out_dec:
+	atomic_dec(&starget->target_busy);
+	return 0;
 }
 
 /*
@@ -1463,7 +1475,7 @@ static void scsi_kill_request(struct request *req, struct request_queue *q)
 	spin_unlock(sdev->request_queue->queue_lock);
 	spin_lock(shost->host_lock);
 	shost->host_busy++;
-	starget->target_busy++;
+	atomic_inc(&starget->target_busy);
 	spin_unlock(shost->host_lock);
 	spin_lock(sdev->request_queue->queue_lock);
 
@@ -1612,9 +1624,7 @@ static void scsi_request_fn(struct request_queue *q)
 	return;
 
  host_not_ready:
-	spin_lock_irq(shost->host_lock);
-	scsi_target(sdev)->target_busy--;
-	spin_unlock_irq(shost->host_lock);
+	atomic_dec(&scsi_target(sdev)->target_busy);
  not_ready:
 	/*
 	 * lock q, handle tag, requeue req, and decrement device_busy. We
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index d65fbec..f85b1a2 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -272,8 +272,8 @@ struct scsi_target {
 	unsigned int		expecting_lun_change:1;	/* A device has reported
 						 * a 3F/0E UA, other devices on
 						 * the same target will also. */
-	/* commands actually active on LLD. protected by host lock. */
-	unsigned int		target_busy;
+	/* commands actually active on LLD. */
+	atomic_t		target_busy;
 	/*
 	 * LLDs should set this in the slave_alloc host template callout.
 	 * If set to zero then there is not limit.
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 15/17] scsi: convert host_busy to atomic_t
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (13 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 14/17] scsi: convert target_busy to an atomic_t Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 16/17] scsi: convert device_busy " Christoph Hellwig
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0015-scsi-convert-host_busy-to-atomic_t.patch --]
[-- Type: text/plain, Size: 11452 bytes --]

Avoid taking the host-wide host_lock to check the per-host queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.
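
The conversion also keeps the host_lock out of the starved-list
bookkeeping on the fast path; a sketch of that double-check, distilled
from the scsi_host_queue_ready() hunk below (the wrapper name is
hypothetical):

#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

/*
 * Skip the host_lock entirely in the common case, and re-check under
 * the lock because another CPU may remove the entry in the meantime.
 */
static void clear_starved_entry(struct Scsi_Host *shost,
				struct scsi_device *sdev)
{
	if (list_empty(&sdev->starved_entry))
		return;				/* fast path, no lock taken */

	spin_lock_irq(shost->host_lock);
	if (!list_empty(&sdev->starved_entry))
		list_del_init(&sdev->starved_entry);
	spin_unlock_irq(shost->host_lock);
}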

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/advansys.c             |    4 +-
 drivers/scsi/libiscsi.c             |    4 +-
 drivers/scsi/libsas/sas_scsi_host.c |    5 ++-
 drivers/scsi/qlogicpti.c            |    2 +-
 drivers/scsi/scsi.c                 |    2 +-
 drivers/scsi/scsi_error.c           |    6 +--
 drivers/scsi/scsi_lib.c             |   71 +++++++++++++++++++++--------------
 drivers/scsi/scsi_sysfs.c           |    9 ++++-
 include/scsi/scsi_host.h            |   10 ++---
 9 files changed, 65 insertions(+), 48 deletions(-)

diff --git a/drivers/scsi/advansys.c b/drivers/scsi/advansys.c
index d814588..0a6ecbd 100644
--- a/drivers/scsi/advansys.c
+++ b/drivers/scsi/advansys.c
@@ -2512,7 +2512,7 @@ static void asc_prt_scsi_host(struct Scsi_Host *s)
 
 	printk("Scsi_Host at addr 0x%p, device %s\n", s, dev_name(boardp->dev));
 	printk(" host_busy %u, host_no %d,\n",
-	       s->host_busy, s->host_no);
+	       atomic_read(&s->host_busy), s->host_no);
 
 	printk(" base 0x%lx, io_port 0x%lx, irq %d,\n",
 	       (ulong)s->base, (ulong)s->io_port, boardp->irq);
@@ -3346,7 +3346,7 @@ static void asc_prt_driver_conf(struct seq_file *m, struct Scsi_Host *shost)
 
 	seq_printf(m,
 		   " host_busy %u, max_id %u, max_lun %u, max_channel %u\n",
-		   shost->host_busy, shost->max_id,
+		   atomic_read(&shost->host_busy), shost->max_id,
 		   shost->max_lun, shost->max_channel);
 
 	seq_printf(m,
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 4046241..718ca88 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -2922,7 +2922,7 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn)
 	 */
 	for (;;) {
 		spin_lock_irqsave(session->host->host_lock, flags);
-		if (!session->host->host_busy) { /* OK for ERL == 0 */
+		if (!atomic_read(&session->host->host_busy)) { /* OK for ERL == 0 */
 			spin_unlock_irqrestore(session->host->host_lock, flags);
 			break;
 		}
@@ -2930,7 +2930,7 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn)
 		msleep_interruptible(500);
 		iscsi_conn_printk(KERN_INFO, conn, "iscsi conn_destroy(): "
 				  "host_busy %d host_failed %d\n",
-				  session->host->host_busy,
+				  atomic_read(&session->host->host_busy),
 				  session->host->host_failed);
 		/*
 		 * force eh_abort() to unblock
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index da3aee1..9423009 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -812,7 +812,7 @@ retry:
 	spin_unlock_irq(shost->host_lock);
 
 	SAS_DPRINTK("Enter %s busy: %d failed: %d\n",
-		    __func__, shost->host_busy, shost->host_failed);
+		    __func__, atomic_read(&shost->host_busy), shost->host_failed);
 	/*
 	 * Deal with commands that still have SAS tasks (i.e. they didn't
 	 * complete via the normal sas_task completion mechanism),
@@ -857,7 +857,8 @@ out:
 		goto retry;
 
 	SAS_DPRINTK("--- Exit %s: busy: %d failed: %d tries: %d\n",
-		    __func__, shost->host_busy, shost->host_failed, tries);
+		    __func__, atomic_read(&shost->host_busy),
+		    shost->host_failed, tries);
 }
 
 enum blk_eh_timer_return sas_scsi_timed_out(struct scsi_cmnd *cmd)
diff --git a/drivers/scsi/qlogicpti.c b/drivers/scsi/qlogicpti.c
index 6d48d30..740ae49 100644
--- a/drivers/scsi/qlogicpti.c
+++ b/drivers/scsi/qlogicpti.c
@@ -959,7 +959,7 @@ static inline void update_can_queue(struct Scsi_Host *host, u_int in_ptr, u_int
 	/* Temporary workaround until bug is found and fixed (one bug has been found
 	   already, but fixing it makes things even worse) -jj */
 	int num_free = QLOGICPTI_REQ_QUEUE_LEN - REQ_QUEUE_DEPTH(in_ptr, out_ptr) - 64;
-	host->can_queue = host->host_busy + num_free;
+	host->can_queue = atomic_read(&host->host_busy) + num_free;
 	host->sg_tablesize = QLOGICPTI_MAX_SG(num_free);
 }
 
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index ab54154..52d09ce 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -577,7 +577,7 @@ void scsi_log_completion(struct scsi_cmnd *cmd, int disposition)
 			if (level > 3)
 				scmd_printk(KERN_INFO, cmd,
 					    "scsi host busy %d failed %d\n",
-					    cmd->device->host->host_busy,
+					    atomic_read(&cmd->device->host->host_busy),
 					    cmd->device->host->host_failed);
 		}
 	}
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 9379ff7..f96b484 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -59,7 +59,7 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *,
 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
 {
-	if (shost->host_busy == shost->host_failed) {
+	if (atomic_read(&shost->host_busy) == shost->host_failed) {
 		trace_scsi_eh_wakeup(shost);
 		wake_up_process(shost->ehandler);
 		SCSI_LOG_ERROR_RECOVERY(5,
@@ -2141,7 +2141,7 @@ int scsi_error_handler(void *data)
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
-		    shost->host_failed != shost->host_busy) {
+		    shost->host_failed != atomic_read(&shost->host_busy)) {
 			SCSI_LOG_ERROR_RECOVERY(1,
 				printk("scsi_eh_%d: sleeping\n",
 					shost->host_no));
@@ -2153,7 +2153,7 @@ int scsi_error_handler(void *data)
 		SCSI_LOG_ERROR_RECOVERY(1,
 			printk("scsi_eh_%d: waking up %d/%d/%d\n",
 			       shost->host_no, shost->host_eh_scheduled,
-			       shost->host_failed, shost->host_busy));
+			       shost->host_failed, atomic_read(&shost->host_busy)));
 
 		/*
 		 * We have a host that is failing for some reason.  Figure out
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index cc0bd7d..8929345 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -281,14 +281,17 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 	struct scsi_target *starget = scsi_target(sdev);
 	unsigned long flags;
 
-	spin_lock_irqsave(shost->host_lock, flags);
-	shost->host_busy--;
+	atomic_dec(&shost->host_busy);
 	atomic_dec(&starget->target_busy);
+
 	if (unlikely(scsi_host_in_recovery(shost) &&
-		     (shost->host_failed || shost->host_eh_scheduled)))
+		     (shost->host_failed || shost->host_eh_scheduled))) {
+		spin_lock_irqsave(shost->host_lock, flags);
 		scsi_eh_wakeup(shost);
-	spin_unlock(shost->host_lock);
-	spin_lock(sdev->request_queue->queue_lock);
+		spin_unlock_irqrestore(shost->host_lock, flags);
+	}
+
+	spin_lock_irqsave(sdev->request_queue->queue_lock, flags);
 	sdev->device_busy--;
 	spin_unlock_irqrestore(sdev->request_queue->queue_lock, flags);
 }
@@ -356,7 +359,8 @@ static inline int scsi_target_is_busy(struct scsi_target *starget)
 
 static inline int scsi_host_is_busy(struct Scsi_Host *shost)
 {
-	if ((shost->can_queue > 0 && shost->host_busy >= shost->can_queue) ||
+	if ((shost->can_queue > 0 &&
+	     atomic_read(&shost->host_busy) >= shost->can_queue) ||
 	    shost->host_blocked || shost->host_self_blocked)
 		return 1;
 
@@ -1377,38 +1381,51 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 				   struct Scsi_Host *shost,
 				   struct scsi_device *sdev)
 {
-	int ret = 0;
-
-	spin_lock_irq(shost->host_lock);
+	unsigned int busy;
 
 	if (scsi_host_in_recovery(shost))
-		goto out;
-	if (shost->host_busy == 0 && shost->host_blocked) {
+		return 0;
+
+	busy = atomic_inc_return(&shost->host_busy) - 1;
+	if (busy == 0 && shost->host_blocked) {
 		/*
 		 * unblock after host_blocked iterates to zero
 		 */
-		if (--shost->host_blocked != 0)
-			goto out;
+		spin_lock_irq(shost->host_lock);
+		if (--shost->host_blocked != 0) {
+			spin_unlock_irq(shost->host_lock);
+			goto out_dec;
+		}
+		spin_unlock_irq(shost->host_lock);
 
 		SCSI_LOG_MLQUEUE(3,
 			printk("scsi%d unblocking host at zero depth\n",
 				shost->host_no));
 	}
-	if (scsi_host_is_busy(shost)) {
-		if (list_empty(&sdev->starved_entry))
-			list_add_tail(&sdev->starved_entry, &shost->starved_list);
-		goto out;
-	}
+
+	if (shost->can_queue > 0 && busy >= shost->can_queue)
+		goto starved;
+	if (shost->host_blocked || shost->host_self_blocked)
+		goto starved;
 
 	/* We're OK to process the command, so we can't be starved */
-	if (!list_empty(&sdev->starved_entry))
-		list_del_init(&sdev->starved_entry);
+	if (!list_empty(&sdev->starved_entry)) {
+		spin_lock_irq(shost->host_lock);
+		if (!list_empty(&sdev->starved_entry))
+			list_del_init(&sdev->starved_entry);
+		spin_unlock_irq(shost->host_lock);
+	}
+
+	return 1;
 
-	shost->host_busy++;
-	ret = 1;
-out:
+starved:
+	spin_lock_irq(shost->host_lock);
+	if (list_empty(&sdev->starved_entry))
+		list_add_tail(&sdev->starved_entry, &shost->starved_list);
 	spin_unlock_irq(shost->host_lock);
-	return ret;
+out_dec:
+	atomic_dec(&shost->host_busy);
+	return 0;
 }
 
 /*
@@ -1472,12 +1489,8 @@ static void scsi_kill_request(struct request *req, struct request_queue *q)
 	 * with the locks as normal issue path does.
 	 */
 	sdev->device_busy++;
-	spin_unlock(sdev->request_queue->queue_lock);
-	spin_lock(shost->host_lock);
-	shost->host_busy++;
+	atomic_inc(&shost->host_busy);
 	atomic_inc(&starget->target_busy);
-	spin_unlock(shost->host_lock);
-	spin_lock(sdev->request_queue->queue_lock);
 
 	blk_complete_request(req);
 }
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 9117d0b..658ee82 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -332,7 +332,6 @@ store_shost_eh_deadline(struct device *dev, struct device_attribute *attr,
 static DEVICE_ATTR(eh_deadline, S_IRUGO | S_IWUSR, show_shost_eh_deadline, store_shost_eh_deadline);
 
 shost_rd_attr(unique_id, "%u\n");
-shost_rd_attr(host_busy, "%hu\n");
 shost_rd_attr(cmd_per_lun, "%hd\n");
 shost_rd_attr(can_queue, "%hd\n");
 shost_rd_attr(sg_tablesize, "%hu\n");
@@ -342,6 +341,14 @@ shost_rd_attr(prot_capabilities, "%u\n");
 shost_rd_attr(prot_guard_type, "%hd\n");
 shost_rd_attr2(proc_name, hostt->proc_name, "%s\n");
 
+static ssize_t
+show_host_busy(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+	return snprintf(buf, 20, "%hu\n", atomic_read(&shost->host_busy));
+}
+static DEVICE_ATTR(host_busy, S_IRUGO, show_host_busy, NULL);
+
 static struct attribute *scsi_sysfs_shost_attrs[] = {
 	&dev_attr_unique_id.attr,
 	&dev_attr_host_busy.attr,
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 94844fc..590e1af 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -603,13 +603,9 @@ struct Scsi_Host {
 	 */
 	struct blk_queue_tag	*bqt;
 
-	/*
-	 * The following two fields are protected with host_lock;
-	 * however, eh routines can safely access during eh processing
-	 * without acquiring the lock.
-	 */
-	unsigned int host_busy;		   /* commands actually active on low-level */
-	unsigned int host_failed;	   /* commands that failed. */
+	atomic_t host_busy;		   /* commands actually active on low-level */
+	unsigned int host_failed;	   /* commands that failed.
+					      protected by host_lock */
 	unsigned int host_eh_scheduled;    /* EH scheduled without command */
     
 	unsigned int host_no;  /* Used for IOCTL_GET_IDLUN, /proc/scsi et al. */
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 16/17] scsi: convert device_busy to atomic_t
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (14 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 15/17] scsi: convert host_busy to atomic_t Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 12:39 ` [PATCH 17/17] scsi: fix the {host,target,device}_blocked counter mess Christoph Hellwig
  2014-02-05 23:41 ` [PATCH 00/17] SCSI data path micro-optimizations James Bottomley
  17 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0016-scsi-convert-device_busy-to-atomic_t.patch --]
[-- Type: text/plain, Size: 7391 bytes --]

Avoid taking the queue_lock to check the per-device queue limit.  Instead
we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.

Unlike the host and target busy counters this doesn't allow us to avoid
the queue_lock in the request_fn, due to the way the interface works.  It
does prepare us for using the blk-mq code, which doesn't use the
queue_lock at all, and it at least avoids a queue_lock roundtrip in
scsi_device_unbusy, which is still important given how contended the
queue_lock is.
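
The completion-side win is simply one less lock round trip per I/O;
roughly (hypothetical helper, field name as in the patch):

#include <scsi/scsi_device.h>

/*
 * The old code bounced request_queue->queue_lock on every completion
 * just to do sdev->device_busy--; with the atomic counter this
 * becomes a single atomic op.
 */
static void sdev_completion_dec(struct scsi_device *sdev)
{
	atomic_dec(&sdev->device_busy);
}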

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/message/fusion/mptsas.c |    2 +-
 drivers/scsi/scsi_lib.c         |   50 ++++++++++++++++++++++-----------------
 drivers/scsi/scsi_sysfs.c       |   10 +++++++-
 drivers/scsi/sg.c               |    2 +-
 include/scsi/scsi_device.h      |    4 +---
 5 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index 00d339c..ea3f253 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -3765,7 +3765,7 @@ mptsas_send_link_status_event(struct fw_event_work *fw_event)
 						printk(MYIOC_s_DEBUG_FMT
 						"SDEV OUTSTANDING CMDS"
 						"%d\n", ioc->name,
-						sdev->device_busy));
+						atomic_read(&sdev->device_busy)));
 				}
 
 			}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8929345..20dfbb5 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -291,9 +291,7 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 		spin_unlock_irqrestore(shost->host_lock, flags);
 	}
 
-	spin_lock_irqsave(sdev->request_queue->queue_lock, flags);
-	sdev->device_busy--;
-	spin_unlock_irqrestore(sdev->request_queue->queue_lock, flags);
+	atomic_dec(&sdev->device_busy);
 }
 
 /*
@@ -344,9 +342,10 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 
 static inline int scsi_device_is_busy(struct scsi_device *sdev)
 {
-	if (sdev->device_busy >= sdev->queue_depth || sdev->device_blocked)
+	if (atomic_read(&sdev->device_busy) >= sdev->queue_depth)
+		return 1;
+	if (sdev->device_blocked)
 		return 1;
-
 	return 0;
 }
 
@@ -1268,7 +1267,7 @@ int scsi_prep_return(struct request_queue *q, struct request *req, int ret)
 		 * queue must be restarted, so we schedule a callback to happen
 		 * shortly.
 		 */
-		if (sdev->device_busy == 0)
+		if (atomic_read(&sdev->device_busy) == 0)
 			blk_delay_queue(q, SCSI_QUEUE_DELAY);
 		break;
 	default:
@@ -1299,26 +1298,32 @@ EXPORT_SYMBOL(scsi_prep_fn);
 static inline int scsi_dev_queue_ready(struct request_queue *q,
 				  struct scsi_device *sdev)
 {
-	if (sdev->device_busy == 0 && sdev->device_blocked) {
+	unsigned int busy;
+
+	busy = atomic_inc_return(&sdev->device_busy) - 1;
+	if (busy == 0 && sdev->device_blocked) {
 		/*
 		 * unblock after device_blocked iterates to zero
 		 */
-		if (--sdev->device_blocked == 0) {
-			SCSI_LOG_MLQUEUE(3,
-				   sdev_printk(KERN_INFO, sdev,
-				   "unblocking device at zero depth\n"));
-		} else {
+		if (--sdev->device_blocked != 0) {
 			blk_delay_queue(q, SCSI_QUEUE_DELAY);
-			return 0;
+			goto out_dec;
 		}
+		SCSI_LOG_MLQUEUE(3, sdev_printk(KERN_INFO, sdev,
+				   "unblocking device at zero depth\n"));
 	}
-	if (scsi_device_is_busy(sdev))
-		return 0;
+
+	if (busy >= sdev->queue_depth)
+		goto out_dec;
+	if (sdev->device_blocked)
+		goto out_dec;
 
 	return 1;
+out_dec:
+	atomic_dec(&sdev->device_busy);
+	return 0;
 }
 
-
 /*
  * scsi_target_queue_ready: checks if there we can send commands to target
  * @sdev: scsi device on starget to check.
@@ -1488,7 +1493,7 @@ static void scsi_kill_request(struct request *req, struct request_queue *q)
 	 * bump busy counts.  To bump the counters, we need to dance
 	 * with the locks as normal issue path does.
 	 */
-	sdev->device_busy++;
+	atomic_inc(&sdev->device_busy);
 	atomic_inc(&shost->host_busy);
 	atomic_inc(&starget->target_busy);
 
@@ -1567,7 +1572,7 @@ static void scsi_request_fn(struct request_queue *q)
 		 * accept it.
 		 */
 		req = blk_peek_request(q);
-		if (!req || !scsi_dev_queue_ready(q, sdev))
+		if (!req)
 			break;
 
 		if (unlikely(!scsi_device_online(sdev))) {
@@ -1577,13 +1582,14 @@ static void scsi_request_fn(struct request_queue *q)
 			continue;
 		}
 
+		if (!scsi_dev_queue_ready(q, sdev))
+			break;
 
 		/*
 		 * Remove the request from the request list.
 		 */
 		if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
 			blk_start_request(req);
-		sdev->device_busy++;
 
 		spin_unlock_irq(q->queue_lock);
 		cmd = req->special;
@@ -1649,9 +1655,9 @@ static void scsi_request_fn(struct request_queue *q)
 	 */
 	spin_lock_irq(q->queue_lock);
 	blk_requeue_request(q, req);
-	sdev->device_busy--;
+	atomic_dec(&sdev->device_busy);
 out_delay:
-	if (sdev->device_busy == 0)
+	if (atomic_read(&sdev->device_busy) == 0)
 		blk_delay_queue(q, SCSI_QUEUE_DELAY);
 }
 
@@ -2390,7 +2396,7 @@ scsi_device_quiesce(struct scsi_device *sdev)
 		return err;
 
 	scsi_run_queue(sdev->request_queue);
-	while (sdev->device_busy) {
+	while (atomic_read(&sdev->device_busy)) {
 		msleep_interruptible(200);
 		scsi_run_queue(sdev->request_queue);
 	}
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 658ee82..c59e146 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -587,13 +587,21 @@ static int scsi_sdev_check_buf_bit(const char *buf)
  */
 sdev_rd_attr (device_blocked, "%d\n");
 sdev_rd_attr (queue_depth, "%d\n");
-sdev_rd_attr (device_busy, "%d\n");
 sdev_rd_attr (type, "%d\n");
 sdev_rd_attr (scsi_level, "%d\n");
 sdev_rd_attr (vendor, "%.8s\n");
 sdev_rd_attr (model, "%.16s\n");
 sdev_rd_attr (rev, "%.4s\n");
 
+static ssize_t
+sdev_show_device_busy(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(dev);
+	return snprintf(buf, 20, "%d\n", atomic_read(&sdev->device_busy));
+}
+static DEVICE_ATTR(device_busy, S_IRUGO, sdev_show_device_busy, NULL);
+
 /*
  * TODO: can we make these symlinks to the block layer ones?
  */
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index df5e961..999ce04 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -2489,7 +2489,7 @@ static int sg_proc_seq_show_dev(struct seq_file *s, void *v)
 			      scsidp->id, scsidp->lun, (int) scsidp->type,
 			      1,
 			      (int) scsidp->queue_depth,
-			      (int) scsidp->device_busy,
+			      (int) atomic_read(&scsidp->device_busy),
 			      (int) scsi_device_online(scsidp));
 	else
 		seq_printf(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n");
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index f85b1a2..7bb39e4 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -81,9 +81,7 @@ struct scsi_device {
 	struct list_head    siblings;   /* list of all devices on this host */
 	struct list_head    same_target_siblings; /* just the devices sharing same target id */
 
-	/* this is now protected by the request_queue->queue_lock */
-	unsigned int device_busy;	/* commands actually active on
-					 * low-level. protected by queue_lock. */
+	atomic_t device_busy;		/* commands actually active on LLDD */
 	spinlock_t list_lock;
 	struct list_head cmd_list;	/* queue of in use SCSI Command structures */
 	struct list_head starved_entry;
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 17/17] scsi: fix the {host,target,device}_blocked counter mess
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (15 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 16/17] scsi: convert device_busy " Christoph Hellwig
@ 2014-02-05 12:39 ` Christoph Hellwig
  2014-02-05 23:41 ` [PATCH 00/17] SCSI data path micro-optimizations James Bottomley
  17 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-05 12:39 UTC (permalink / raw)
  To: Jens Axboe, James Bottomley, Nicholas Bellinger; +Cc: linux-scsi

[-- Attachment #1: 0017-scsi-fix-the-host-target-device-_blocked-counter-mes.patch --]
[-- Type: text/plain, Size: 10205 bytes --]

Seems like these counters are missing any sort of synchronization for
updates, as an over 10-year-old comment from me noted.  Fix this by
using atomic counters, and while we're at it also make sure they are
in the same cacheline as the _busy counters and not needlessly stored
to in every I/O completion.

With the new model the _blocked counters can temporarily go negative,
so all the readers are updated to check for > 0 values.  Longer
term every successful I/O completion will reset the counters to zero,
so the temporarily negative values will not cause any harm.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi.c        |   21 ++++++------
 drivers/scsi/scsi_lib.c    |   82 +++++++++++++++++++++-----------------------
 drivers/scsi/scsi_sysfs.c  |   10 +++++-
 include/scsi/scsi_device.h |    7 ++--
 include/scsi/scsi_host.h   |    7 ++--
 5 files changed, 64 insertions(+), 63 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 52d09ce..5cb935a 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -733,17 +733,16 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
 
 	scsi_device_unbusy(sdev);
 
-        /*
-         * Clear the flags which say that the device/host is no longer
-         * capable of accepting new commands.  These are set in scsi_queue.c
-         * for both the queue full condition on a device, and for a
-         * host full condition on the host.
-	 *
-	 * XXX(hch): What about locking?
-         */
-        shost->host_blocked = 0;
-	starget->target_blocked = 0;
-        sdev->device_blocked = 0;
+	/*
+	 * Clear the flags which say that the device/target/host is no longer
+	 * capable of accepting new commands.
+	 */
+	if (atomic_read(&shost->host_blocked))
+		atomic_set(&shost->host_blocked, 0);
+	if (atomic_read(&starget->target_blocked))
+		atomic_set(&starget->target_blocked, 0);
+	if (atomic_read(&sdev->device_blocked))
+		atomic_set(&sdev->device_blocked, 0);
 
 	/*
 	 * If we have valid sense information, then some kind of recovery
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 20dfbb5..83c7e37 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -113,14 +113,16 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
 	 */
 	switch (reason) {
 	case SCSI_MLQUEUE_HOST_BUSY:
-		host->host_blocked = host->max_host_blocked;
+		atomic_set(&host->host_blocked, host->max_host_blocked);
 		break;
 	case SCSI_MLQUEUE_DEVICE_BUSY:
 	case SCSI_MLQUEUE_EH_RETRY:
-		device->device_blocked = device->max_device_blocked;
+		atomic_set(&device->device_blocked,
+			   device->max_device_blocked);
 		break;
 	case SCSI_MLQUEUE_TARGET_BUSY:
-		starget->target_blocked = starget->max_target_blocked;
+		atomic_set(&starget->target_blocked,
+			   starget->max_target_blocked);
 		break;
 	}
 
@@ -340,30 +342,39 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
-static inline int scsi_device_is_busy(struct scsi_device *sdev)
+static inline bool scsi_device_is_busy(struct scsi_device *sdev)
 {
 	if (atomic_read(&sdev->device_busy) >= sdev->queue_depth)
-		return 1;
-	if (sdev->device_blocked)
-		return 1;
+		return true;
+	if (atomic_read(&sdev->device_blocked) > 0)
+		return true;
 	return 0;
 }
 
-static inline int scsi_target_is_busy(struct scsi_target *starget)
+static inline bool scsi_target_is_busy(struct scsi_target *starget)
 {
-	return ((starget->can_queue > 0 &&
-		 atomic_read(&starget->target_busy) >= starget->can_queue) ||
-		 starget->target_blocked);
+	if (starget->can_queue > 0) {
+		if (atomic_read(&starget->target_busy) >= starget->can_queue)
+			return true;
+		if (atomic_read(&starget->target_blocked) > 0)
+			return true;
+	}
+
+	return false;
 }
 
-static inline int scsi_host_is_busy(struct Scsi_Host *shost)
+static inline bool scsi_host_is_busy(struct Scsi_Host *shost)
 {
-	if ((shost->can_queue > 0 &&
-	     atomic_read(&shost->host_busy) >= shost->can_queue) ||
-	    shost->host_blocked || shost->host_self_blocked)
-		return 1;
+	if (shost->can_queue > 0) {
+		if (atomic_read(&shost->host_busy) >= shost->can_queue)
+			return true;
+		if (atomic_read(&shost->host_blocked) > 0)
+			return true;
+		if (shost->host_self_blocked)
+			return true;
+	}
 
-	return 0;
+	return false;
 }
 
 static void scsi_starved_list_run(struct Scsi_Host *shost)
@@ -1301,11 +1312,8 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
 	unsigned int busy;
 
 	busy = atomic_inc_return(&sdev->device_busy) - 1;
-	if (busy == 0 && sdev->device_blocked) {
-		/*
-		 * unblock after device_blocked iterates to zero
-		 */
-		if (--sdev->device_blocked != 0) {
+	if (busy == 0 && atomic_read(&sdev->device_blocked) > 0) {
+		if (atomic_dec_return(&sdev->device_blocked) > 0) {
 			blk_delay_queue(q, SCSI_QUEUE_DELAY);
 			goto out_dec;
 		}
@@ -1315,7 +1323,7 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
 
 	if (busy >= sdev->queue_depth)
 		goto out_dec;
-	if (sdev->device_blocked)
+	if (atomic_read(&sdev->device_blocked) > 0)
 		goto out_dec;
 
 	return 1;
@@ -1346,16 +1354,9 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 	}
 
 	busy = atomic_inc_return(&starget->target_busy) - 1;
-	if (busy == 0 && starget->target_blocked) {
-		/*
-		 * unblock after target_blocked iterates to zero
-		 */
-		spin_lock_irq(shost->host_lock);
-		if (--starget->target_blocked != 0) {
-			spin_unlock_irq(shost->host_lock);
+	if (busy == 0 && atomic_read(&starget->target_blocked) > 0) {
+		if (atomic_dec_return(&starget->target_blocked) > 0)
 			goto out_dec;
-		}
-		spin_unlock_irq(shost->host_lock);
 
 		SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
 				 "unblocking target at zero depth\n"));
@@ -1363,7 +1364,7 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 
 	if (starget->can_queue > 0 && busy >= starget->can_queue)
 		goto starved;
-	if (starget->target_blocked)
+	if (atomic_read(&starget->target_blocked) > 0)
 		goto starved;
 
 	return 1;
@@ -1392,16 +1393,9 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 		return 0;
 
 	busy = atomic_inc_return(&shost->host_busy) - 1;
-	if (busy == 0 && shost->host_blocked) {
-		/*
-		 * unblock after host_blocked iterates to zero
-		 */
-		spin_lock_irq(shost->host_lock);
-		if (--shost->host_blocked != 0) {
-			spin_unlock_irq(shost->host_lock);
+	if (busy == 0 && atomic_read(&shost->host_blocked) > 0) {
+		if (atomic_dec_return(&shost->host_blocked) > 0)
 			goto out_dec;
-		}
-		spin_unlock_irq(shost->host_lock);
 
 		SCSI_LOG_MLQUEUE(3,
 			printk("scsi%d unblocking host at zero depth\n",
@@ -1410,7 +1404,9 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 
 	if (shost->can_queue > 0 && busy >= shost->can_queue)
 		goto starved;
-	if (shost->host_blocked || shost->host_self_blocked)
+	if (atomic_read(&shost->host_blocked) > 0)
+		goto starved;
+	if (shost->host_self_blocked)
 		goto starved;
 
 	/* We're OK to process the command, so we can't be starved */
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index c59e146..707a10c 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -585,7 +585,6 @@ static int scsi_sdev_check_buf_bit(const char *buf)
 /*
  * Create the actual show/store functions and data structures.
  */
-sdev_rd_attr (device_blocked, "%d\n");
 sdev_rd_attr (queue_depth, "%d\n");
 sdev_rd_attr (type, "%d\n");
 sdev_rd_attr (scsi_level, "%d\n");
@@ -602,6 +601,15 @@ sdev_show_device_busy(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR(device_busy, S_IRUGO, sdev_show_device_busy, NULL);
 
+static ssize_t
+sdev_show_device_blocked(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(dev);
+	return snprintf(buf, 20, "%d\n", atomic_read(&sdev->device_blocked));
+}
+static DEVICE_ATTR(device_blocked, S_IRUGO, sdev_show_device_blocked, NULL);
+
 /*
  * TODO: can we make these symlinks to the block layer ones?
  */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 7bb39e4..d72c842 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -82,6 +82,8 @@ struct scsi_device {
 	struct list_head    same_target_siblings; /* just the devices sharing same target id */
 
 	atomic_t device_busy;		/* commands actually active on LLDD */
+	atomic_t device_blocked;	/* Device returned QUEUE_FULL. */
+
 	spinlock_t list_lock;
 	struct list_head cmd_list;	/* queue of in use SCSI Command structures */
 	struct list_head starved_entry;
@@ -173,8 +175,6 @@ struct scsi_device {
 	struct list_head event_list;	/* asserted events */
 	struct work_struct event_work;
 
-	unsigned int device_blocked;	/* Device returned QUEUE_FULL. */
-
 	unsigned int max_device_blocked; /* what device_blocked counts down from  */
 #define SCSI_DEFAULT_DEVICE_BLOCKED	3
 
@@ -272,12 +272,13 @@ struct scsi_target {
 						 * the same target will also. */
 	/* commands actually active on LLD. */
 	atomic_t		target_busy;
+	atomic_t		target_blocked;
+
 	/*
 	 * LLDs should set this in the slave_alloc host template callout.
 	 * If set to zero then there is not limit.
 	 */
 	unsigned int		can_queue;
-	unsigned int		target_blocked;
 	unsigned int		max_target_blocked;
 #define SCSI_DEFAULT_TARGET_BLOCKED	3
 
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 590e1af..c4e4875 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -604,6 +604,8 @@ struct Scsi_Host {
 	struct blk_queue_tag	*bqt;
 
 	atomic_t host_busy;		   /* commands actually active on low-level */
+	atomic_t host_blocked;
+
 	unsigned int host_failed;	   /* commands that failed.
 					      protected by host_lock */
 	unsigned int host_eh_scheduled;    /* EH scheduled without command */
@@ -703,11 +705,6 @@ struct Scsi_Host {
 	struct workqueue_struct *tmf_work_q;
 
 	/*
-	 * Host has rejected a command because it was busy.
-	 */
-	unsigned int host_blocked;
-
-	/*
 	 * Value host_blocked counts down from
 	 */
 	unsigned int max_host_blocked;
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 00/17] SCSI data path micro-optimizations
  2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
                   ` (16 preceding siblings ...)
  2014-02-05 12:39 ` [PATCH 17/17] scsi: fix the {host,target,device}_blocked counter mess Christoph Hellwig
@ 2014-02-05 23:41 ` James Bottomley
  2014-02-06 16:29   ` Christoph Hellwig
  17 siblings, 1 reply; 52+ messages in thread
From: James Bottomley @ 2014-02-05 23:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Nicholas Bellinger, linux-scsi

On Wed, 2014-02-05 at 04:39 -0800, Christoph Hellwig wrote:
> This series contains various optimizations for the SCSI data I/O path.
> They increase the number of IOPS seen with iSCSI or SRP between 2%
> and 3.5% in workloads that previously hit the host_lock hard.  While this
> isn't a lot it now fully shifts the contention to the queue_lock, which
> will get out of the way later when the SCSI midlayer is converted to
> use the blk-mq infrastructure.

I will say that you do make the series hard to review by including
things that do code churn for no gain: like making all the error
handling goto based instead of the current in-place if () clause
based ... this doesn't do anything for optimisation, so why do it?  (I'm
not going to take sides on which is better, since we do both).

There's also a lot of extraneous stuff in here.  I'm assuming most of
the performance stuff is get/put removal and locking efficiency, things
like this:

1 looks like a lost pm error path; it should go (separately from the
series) to the pm people (Alan Stern) for review.

2,3,5 are allocation simplification.  In general it doesn't look so bad,
but it doesn't seem to be part of the series.  It needs an ACK from the
megaraid people since they're the only consumer of the interface you're
trying to eliminate.

6,7 add a new interface for command allocation and a virtio consumer.
OK as far as it goes, but also separable ... and preferably, if, as you
say, it's important for mq, it should come with some more users; it also
doesn't need to be part of the series.

Some other procedural issues:

9,10 look like they aren't your patches, so they need an Author From:
field for git (9 needs your signoff, if you're sending it).

9,10,11 have uninformative titles.  Say what you're doing, which is
removing a get/put_device.

James



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 04/17] scsi: avoid useless free_list lock roundtrips
  2014-02-05 12:39 ` [PATCH 04/17] scsi: avoid useless free_list lock roundtrips Christoph Hellwig
@ 2014-02-05 23:44   ` James Bottomley
  2014-02-06 16:22     ` Christoph Hellwig
  2014-02-07  9:05   ` Paolo Bonzini
  1 sibling, 1 reply; 52+ messages in thread
From: James Bottomley @ 2014-02-05 23:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Nicholas Bellinger, linux-scsi

On Wed, 2014-02-05 at 04:39 -0800, Christoph Hellwig wrote:
> Avoid hitting the host-wide free_list lock unless we need to put a command
> back onto the freelist.

This one looks like it can go independently on its own.

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/scsi/scsi.c |   13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index ebcea6c..4139486 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -320,15 +320,16 @@ void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd,
>  {
>  	unsigned long flags;
>  
> -	/* changing locks here, don't need to restore the irq state */
> -	spin_lock_irqsave(&shost->free_list_lock, flags);
>  	if (unlikely(list_empty(&shost->free_list))) {
> -		list_add(&cmd->list, &shost->free_list);
> -		cmd = NULL;
> +		spin_lock_irqsave(&shost->free_list_lock, flags);
> +		if (list_empty(&shost->free_list)) {
> +			list_add(&cmd->list, &shost->free_list);
> +			cmd = NULL;
> +		}
> +		spin_unlock_irqrestore(&shost->free_list_lock, flags);
>  	}
> -	spin_unlock_irqrestore(&shost->free_list_lock, flags);
>  
> -	if (likely(cmd != NULL))
> +	if (cmd)

Why do this?  cmd is still likely to be not NULL, which helps the
compiler optimise.

James

>  		scsi_pool_free_command(shost->cmd_pool, cmd);
>  
>  	put_device(dev);
> -- 
> 1.7.10.4



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 05/17] scsi: simplify command allocation and freeing a bit
  2014-02-05 12:39 ` [PATCH 05/17] scsi: simplify command allocation and freeing a bit Christoph Hellwig
@ 2014-02-05 23:51   ` James Bottomley
  2014-02-06 16:21     ` Christoph Hellwig
  0 siblings, 1 reply; 52+ messages in thread
From: James Bottomley @ 2014-02-05 23:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Nicholas Bellinger, linux-scsi

On Wed, 2014-02-05 at 04:39 -0800, Christoph Hellwig wrote:

> Just have one level of alloc/free functions that take a host instead
> of two levels for the allocation and different calling conventions
> for the free.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/scsi/scsi.c |   77 +++++++++++++++------------------------------------
>  1 file changed, 22 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 4139486..5347f7d 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -160,79 +160,46 @@ static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
>  
>  static DEFINE_MUTEX(host_cmd_pool_mutex);
>  
> -/**
> - * scsi_pool_alloc_command - internal function to get a fully allocated command
> - * @pool:	slab pool to allocate the command from
> - * @gfp_mask:	mask for the allocation
> - *
> - * Returns a fully allocated command (with the allied sense buffer) or
> - * NULL on failure
> - */
> -static struct scsi_cmnd *
> -scsi_pool_alloc_command(struct scsi_host_cmd_pool *pool, gfp_t gfp_mask)
> -{
> -	struct scsi_cmnd *cmd;
> -
> -	cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask | pool->gfp_mask);
> -	if (!cmd)
> -		return NULL;
> -
> -	cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab,
> -					     gfp_mask | pool->gfp_mask);
> -	if (!cmd->sense_buffer) {
> -		kmem_cache_free(pool->cmd_slab, cmd);
> -		return NULL;
> -	}
> -
> -	return cmd;
> -}
> -
> -/**
> - * scsi_pool_free_command - internal function to release a command
> - * @pool:	slab pool to allocate the command from
> - * @cmd:	command to release
> - *
> - * the command must previously have been allocated by
> - * scsi_pool_alloc_command.
> - */
>  static void
> -scsi_pool_free_command(struct scsi_host_cmd_pool *pool,
> -			 struct scsi_cmnd *cmd)
> +scsi_host_free_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd)

You lost the docbook function description here when you changed the name.

>  {
> +	struct scsi_host_cmd_pool *pool = shost->cmd_pool;
> +
>  	if (cmd->prot_sdb)
>  		kmem_cache_free(scsi_sdb_cache, cmd->prot_sdb);
> -
>  	kmem_cache_free(pool->sense_slab, cmd->sense_buffer);
>  	kmem_cache_free(pool->cmd_slab, cmd);
>  }
>  
> -/**
> - * scsi_host_alloc_command - internal function to allocate command
> - * @shost:	SCSI host whose pool to allocate from
> - * @gfp_mask:	mask for the allocation
> - *
> - * Returns a fully allocated command with sense buffer and protection
> - * data buffer (where applicable) or NULL on failure
> - */

This docbook elimination looks spurious; why do it?

James

>  static struct scsi_cmnd *
>  scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t gfp_mask)
>  {
> +	struct scsi_host_cmd_pool *pool = shost->cmd_pool;
>  	struct scsi_cmnd *cmd;
>  
> -	cmd = scsi_pool_alloc_command(shost->cmd_pool, gfp_mask);
> +	cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask | pool->gfp_mask);
>  	if (!cmd)
> -		return NULL;
> +		goto fail;
> +
> +	cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab,
> +					     gfp_mask | pool->gfp_mask);
> +	if (!cmd->sense_buffer)
> +		goto fail_free_cmd;
>  
>  	if (scsi_host_get_prot(shost) >= SHOST_DIX_TYPE0_PROTECTION) {
>  		cmd->prot_sdb = kmem_cache_zalloc(scsi_sdb_cache, gfp_mask);
> -
> -		if (!cmd->prot_sdb) {
> -			scsi_pool_free_command(shost->cmd_pool, cmd);
> -			return NULL;
> -		}
> +		if (!cmd->prot_sdb)
> +			goto fail_free_sense;
>  	}
>  
>  	return cmd;
> +
> +fail_free_sense:
> +	kmem_cache_free(pool->sense_slab, cmd->sense_buffer);
> +fail_free_cmd:
> +	kmem_cache_free(pool->cmd_slab, cmd);
> +fail:
> +	return NULL;
>  }
>  
>  /**
> @@ -330,7 +297,7 @@ void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd,
>  	}
>  
>  	if (cmd)
> -		scsi_pool_free_command(shost->cmd_pool, cmd);
> +		scsi_host_free_command(shost, cmd);
>  
>  	put_device(dev);
>  }
> @@ -469,7 +436,7 @@ void scsi_destroy_command_freelist(struct Scsi_Host *shost)
>  
>  		cmd = list_entry(shost->free_list.next, struct scsi_cmnd, list);
>  		list_del_init(&cmd->list);
> -		scsi_pool_free_command(shost->cmd_pool, cmd);
> +		scsi_host_free_command(shost, cmd);
>  	}
>  	shost->cmd_pool = NULL;
>  	scsi_put_host_cmd_pool(shost->unchecked_isa_dma ? GFP_DMA : GFP_KERNEL);
> -- 
> 1.7.10.4



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 12/17] scsi: avoid taking host_lock in scsi_run_queue unless nessecary
  2014-02-05 12:39 ` [PATCH 12/17] scsi: avoid taking host_lock in scsi_run_queue unless nessecary Christoph Hellwig
@ 2014-02-05 23:54   ` James Bottomley
  2014-02-06 16:19     ` Christoph Hellwig
  0 siblings, 1 reply; 52+ messages in thread
From: James Bottomley @ 2014-02-05 23:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Nicholas Bellinger, linux-scsi

On Wed, 2014-02-05 at 04:39 -0800, Christoph Hellwig wrote:

> If we don't have starved devices we don't need to take the host lock
> to iterate over them.  Also split the function up to be more clear.

This looks reasonable and can potentially go separately too.

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/scsi/scsi_lib.c |   31 ++++++++++++-------------------
>  1 file changed, 12 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 216be71..44740c0 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -363,29 +363,12 @@ static inline int scsi_host_is_busy(struct Scsi_Host *shost)
>  	return 0;
>  }
>  
> -/*
> - * Function:	scsi_run_queue()
> - *
> - * Purpose:	Select a proper request queue to serve next
> - *
> - * Arguments:	q	- last request's queue
> - *
> - * Returns:     Nothing
> - *
> - * Notes:	The previous command was completely finished, start
> - *		a new one if possible.
> - */

Instead of dumping the description, how about converting it to docbook
and making it match the new function?

> -static void scsi_run_queue(struct request_queue *q)
> +static void scsi_starved_list_run(struct Scsi_Host *shost)
>  {
> -	struct scsi_device *sdev = q->queuedata;
> -	struct Scsi_Host *shost;
>  	LIST_HEAD(starved_list);
> +	struct scsi_device *sdev;
>  	unsigned long flags;
>  
> -	shost = sdev->host;
> -	if (scsi_target(sdev)->single_lun)
> -		scsi_single_lun_run(sdev);
> -
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	list_splice_init(&shost->starved_list, &starved_list);
>  
> @@ -437,6 +420,16 @@ static void scsi_run_queue(struct request_queue *q)
>  	/* put any unprocessed entries back */
>  	list_splice(&starved_list, &shost->starved_list);
>  	spin_unlock_irqrestore(shost->host_lock, flags);
> +}

And reinsert either the original or a docbook formatted version of the description.

Thanks,

James

> +static void scsi_run_queue(struct request_queue *q)
> +{
> +	struct scsi_device *sdev = q->queuedata;
> +
> +	if (scsi_target(sdev)->single_lun)
> +		scsi_single_lun_run(sdev);
> +	if (!list_empty(&sdev->host->starved_list))
> +		scsi_starved_list_run(sdev->host);
>  
>  	blk_run_queue(q);
>  }
> -- 
> 1.7.10.4
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 12/17] scsi: avoid taking host_lock in scsi_run_queue unless nessecary
  2014-02-05 23:54   ` James Bottomley
@ 2014-02-06 16:19     ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-06 16:19 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Jens Axboe, Nicholas Bellinger, linux-scsi

On Wed, Feb 05, 2014 at 03:54:17PM -0800, James Bottomley wrote:
> > -/*
> > - * Function:	scsi_run_queue()
> > - *
> > - * Purpose:	Select a proper request queue to serve next
> > - *
> > - * Arguments:	q	- last request's queue
> > - *
> > - * Returns:     Nothing
> > - *
> > - * Notes:	The previous command was completely finished, start
> > - *		a new one if possible.
> > - */
> 
> Instead of dumping the description, how about converting it to docbook
> and making it match the new function?

I dropped it because there's nothing that isn't trivially obvious from
the code.  The point of comments should be to explain why things are
done, not to have a redundant writeup of what is done.

Docbook comments make sense for exported functions but not so much for
a static function with a handful of callers.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 05/17] scsi: simplify command allocation and freeing a bit
  2014-02-05 23:51   ` James Bottomley
@ 2014-02-06 16:21     ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-06 16:21 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Jens Axboe, Nicholas Bellinger, linux-scsi

On Wed, Feb 05, 2014 at 03:51:35PM -0800, James Bottomley wrote:
> On Wed, 2014-02-05 at 04:39 -0800, Christoph Hellwig wrote:
> 
> > Just have one level of alloc/free functions that take a host instead
> > of two levels for the allocation and different calling conventions
> > for the free.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  drivers/scsi/scsi.c |   77 +++++++++++++++------------------------------------
> >  1 file changed, 22 insertions(+), 55 deletions(-)
> > 
> > diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> > index 4139486..5347f7d 100644
> > --- a/drivers/scsi/scsi.c
> > +++ b/drivers/scsi/scsi.c
> > @@ -160,79 +160,46 @@ static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
> >  
> >  static DEFINE_MUTEX(host_cmd_pool_mutex);
> >  
> > -/**
> > - * scsi_pool_alloc_command - internal function to get a fully allocated command
> > - * @pool:	slab pool to allocate the command from
> > - * @gfp_mask:	mask for the allocation
> > - *
> > - * Returns a fully allocated command (with the allied sense buffer) or
> > - * NULL on failure
> > - */
> > -static struct scsi_cmnd *
> > -scsi_pool_alloc_command(struct scsi_host_cmd_pool *pool, gfp_t gfp_mask)
> > -{
> > -	struct scsi_cmnd *cmd;
> > -
> > -	cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask | pool->gfp_mask);
> > -	if (!cmd)
> > -		return NULL;
> > -
> > -	cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab,
> > -					     gfp_mask | pool->gfp_mask);
> > -	if (!cmd->sense_buffer) {
> > -		kmem_cache_free(pool->cmd_slab, cmd);
> > -		return NULL;
> > -	}
> > -
> > -	return cmd;
> > -}
> > -
> > -/**
> > - * scsi_pool_free_command - internal function to release a command
> > - * @pool:	slab pool to allocate the command from
> > - * @cmd:	command to release
> > - *
> > - * the command must previously have been allocated by
> > - * scsi_pool_alloc_command.
> > - */
> >  static void
> > -scsi_pool_free_command(struct scsi_host_cmd_pool *pool,
> > -			 struct scsi_cmnd *cmd)
> > +scsi_host_free_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
> 
> You lost the docbook function description here when you changed the name.

It's a static function with two callers, and what it does is more
obvious from the function itself than from the comment.

I'm probably one of the people with the highest comment-to-code ratios
in the kernel, but I'd rather explain in long comments when something
is non-obvious than put useless boilerplate in.  If you insist on the
comment I'll put it back, but it seems utterly useless.

> This docbook elimination looks spurious; why do it?

Same as above.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 04/17] scsi: avoid useless free_list lock roundtrips
  2014-02-05 23:44   ` James Bottomley
@ 2014-02-06 16:22     ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-06 16:22 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Jens Axboe, Nicholas Bellinger, linux-scsi

On Wed, Feb 05, 2014 at 03:44:04PM -0800, James Bottomley wrote:
> Why do this?  cmd is still likely to be not NULL, which helps the
> compiler optimise.

Because testing for non-NULL pointers gets that hint implicitly from
gcc.
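
A small sketch of the point against the __scsi_put_command() hunk being
discussed (the wrapper is hypothetical; the prediction behaviour is
gcc's pointer heuristic mentioned above):

/*
 * With gcc's static heuristics the plain pointer test below is already
 * predicted as likely-true, so spelling it likely(cmd != NULL) would
 * not change the generated hint.
 */
static void put_cmd_sketch(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
{
	if (cmd)	/* implicitly predicted non-NULL by gcc */
		scsi_pool_free_command(shost->cmd_pool, cmd);
}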


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 00/17] SCSI data path micro-optimizations
  2014-02-05 23:41 ` [PATCH 00/17] SCSI data path micro-optimizations James Bottomley
@ 2014-02-06 16:29   ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-06 16:29 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jens Axboe, Nicholas Bellinger, linux-scsi

On Wed, Feb 05, 2014 at 03:41:04PM -0800, James Bottomley wrote:
> I will say that you do make the series hard to review by including
> things that do code churn for no gain: like making all the error
> handling goto based instead of the current in-place if () clause
> based ... this doesn't do anything for optmisation, so why do it?  (I'm
> not going to take sides on which is better, since we do both).

You mean the command allocation bits?  When I inline one function into
another I might as well add coherent error handling.  And for more than
one level of unwinding, gotos are always better because a human can
actually grasp them.
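
As a generic illustration of that shape (hypothetical alloc_a/alloc_b
helpers, not code from the series; compare the fail_free_* labels in
the patch 05 hunk quoted earlier in the thread):

struct ctx { struct res_a *a; struct res_b *b; };

static int setup(struct ctx *c)
{
	c->a = alloc_a();
	if (!c->a)
		goto fail;
	c->b = alloc_b();
	if (!c->b)
		goto fail_free_a;
	return 0;

	/* one unwind label per resource, released in reverse order */
fail_free_a:
	free_a(c->a);
fail:
	return -ENOMEM;
}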

> There's also a lot of extraneous stuff in here.  I'm assuming most of
> the performance stuff is get/put removal and locking efficiency, things
> like this

This was already cut down.  If it makes your life easier we can do the
cmd allocation part separately from the rest, but there's really no
"churn" left.

> 
> 2,3,5 are allocation simplification.  In general it doesn't look so bad,
> but it doesn't seem to be part of the series.  It needs an ACK from the
> megaraid people since they're the only consumer of the interface you're
> trying to eliminate

It's needed to keep the cmd_size addition from becoming a complete mess.

> 
> 6,7 is a new interface for command allocation and a virtio consumer.
> OK, as it goes, but also separable ... and preferably, if, as you say,
> it's important for mq, some more users, also doesn't need to be part of
> the series.

Feel free to consider 1-7 and the rest separate series already; I'll
resend them that way next time.

> 9,10 look like they aren't your patches, so they need an Author From:
> field for git (9 needs your signoff, if you're sending it).

They are, and they are properly attributed in the git tree.
Unfortunately quilt still hasn't learned to properly deal with From:
headers.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 02/17] megaraid: simplify internal command handling
  2014-02-05 12:39 ` [PATCH 02/17] megaraid: simplify internal command handling Christoph Hellwig
@ 2014-02-06 16:40   ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-06 16:40 UTC (permalink / raw)
  To: Neela Syam Kolli
  Cc: Jens Axboe, James Bottomley, Nicholas Bellinger, linux-scsi

Hi Neela,

can you look over this from the megaraid perspective?

On Wed, Feb 05, 2014 at 04:39:32AM -0800, Christoph Hellwig wrote:
> We don't use the passed in scsi command for anything, so just add a
> adapter-wide internal status to go along with the internal scb that
> is used unter int_mtx to pass back the return value and get rid of
> all the complexities and abuse of the scsi_cmnd structure.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/scsi/megaraid.c |  120 ++++++++++++-----------------------------------
>  drivers/scsi/megaraid.h |    3 +-
>  2 files changed, 32 insertions(+), 91 deletions(-)
> 
> diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
> index 816db12..8bca30f 100644
> --- a/drivers/scsi/megaraid.c
> +++ b/drivers/scsi/megaraid.c
> @@ -531,13 +531,6 @@ mega_build_cmd(adapter_t *adapter, Scsi_Cmnd *cmd, int *busy)
>  	int	target = 0;
>  	int	ldrv_num = 0;   /* logical drive number */
>  
> -
> -	/*
> -	 * filter the internal and ioctl commands
> -	 */
> -	if((cmd->cmnd[0] == MEGA_INTERNAL_CMD))
> -		return (scb_t *)cmd->host_scribble;
> -
>  	/*
>  	 * We know what channels our logical drives are on - mega_find_card()
>  	 */
> @@ -1439,19 +1432,22 @@ mega_cmd_done(adapter_t *adapter, u8 completed[], int nstatus, int status)
>  
>  		cmdid = completed[i];
>  
> -		if( cmdid == CMDID_INT_CMDS ) { /* internal command */
> +		/*
> +		 * Only free SCBs for the commands coming down from the
> +		 * mid-layer, not for which were issued internally
> +		 *
> +		 * For internal command, restore the status returned by the
> +		 * firmware so that user can interpret it.
> +		 */
> +		if (cmdid == CMDID_INT_CMDS) {
>  			scb = &adapter->int_scb;
> -			cmd = scb->cmd;
> -			mbox = (mbox_t *)scb->raw_mbox;
>  
> -			/*
> -			 * Internal command interface do not fire the extended
> -			 * passthru or 64-bit passthru
> -			 */
> -			pthru = scb->pthru;
> +			list_del_init(&scb->list);
> +			scb->state = SCB_FREE;
>  
> -		}
> -		else {
> +			adapter->int_status = status;
> +			complete(&adapter->int_waitq);
> +		} else {
>  			scb = &adapter->scb_list[cmdid];
>  
>  			/*
> @@ -1640,25 +1636,7 @@ mega_cmd_done(adapter_t *adapter, u8 completed[], int nstatus, int status)
>  				cmd->result |= (DID_BAD_TARGET << 16)|status;
>  		}
>  
> -		/*
> -		 * Only free SCBs for the commands coming down from the
> -		 * mid-layer, not for which were issued internally
> -		 *
> -		 * For internal command, restore the status returned by the
> -		 * firmware so that user can interpret it.
> -		 */
> -		if( cmdid == CMDID_INT_CMDS ) { /* internal command */
> -			cmd->result = status;
> -
> -			/*
> -			 * Remove the internal command from the pending list
> -			 */
> -			list_del_init(&scb->list);
> -			scb->state = SCB_FREE;
> -		}
> -		else {
> -			mega_free_scb(adapter, scb);
> -		}
> +		mega_free_scb(adapter, scb);
>  
>  		/* Add Scsi_Command to end of completed queue */
>  		list_add_tail(SCSI_LIST(cmd), &adapter->completed_list);
> @@ -4133,23 +4111,15 @@ mega_internal_dev_inquiry(adapter_t *adapter, u8 ch, u8 tgt,
>   * The last argument is the address of the passthru structure if the command
>   * to be fired is a passthru command
>   *
> - * lockscope specifies whether the caller has already acquired the lock. Of
> - * course, the caller must know which lock we are talking about.
> - *
>   * Note: parameter 'pthru' is null for non-passthru commands.
>   */
>  static int
>  mega_internal_command(adapter_t *adapter, megacmd_t *mc, mega_passthru *pthru)
>  {
> -	Scsi_Cmnd	*scmd;
> -	struct	scsi_device *sdev;
> +	unsigned long flags;
>  	scb_t	*scb;
>  	int	rval;
>  
> -	scmd = scsi_allocate_command(GFP_KERNEL);
> -	if (!scmd)
> -		return -ENOMEM;
> -
>  	/*
>  	 * The internal commands share one command id and hence are
>  	 * serialized. This is so because we want to reserve maximum number of
> @@ -4160,73 +4130,45 @@ mega_internal_command(adapter_t *adapter, megacmd_t *mc, mega_passthru *pthru)
>  	scb = &adapter->int_scb;
>  	memset(scb, 0, sizeof(scb_t));
>  
> -	sdev = kzalloc(sizeof(struct scsi_device), GFP_KERNEL);
> -	scmd->device = sdev;
> -
> -	memset(adapter->int_cdb, 0, sizeof(adapter->int_cdb));
> -	scmd->cmnd = adapter->int_cdb;
> -	scmd->device->host = adapter->host;
> -	scmd->host_scribble = (void *)scb;
> -	scmd->cmnd[0] = MEGA_INTERNAL_CMD;
> -
> -	scb->state |= SCB_ACTIVE;
> -	scb->cmd = scmd;
> +	scb->idx = CMDID_INT_CMDS;
> +	scb->state |= SCB_ACTIVE | SCB_PENDQ;
>  
>  	memcpy(scb->raw_mbox, mc, sizeof(megacmd_t));
>  
>  	/*
>  	 * Is it a passthru command
>  	 */
> -	if( mc->cmd == MEGA_MBOXCMD_PASSTHRU ) {
> -
> +	if (mc->cmd == MEGA_MBOXCMD_PASSTHRU)
>  		scb->pthru = pthru;
> -	}
> -
> -	scb->idx = CMDID_INT_CMDS;
>  
> -	megaraid_queue_lck(scmd, mega_internal_done);
> +	spin_lock_irqsave(&adapter->lock, flags);
> +	list_add_tail(&scb->list, &adapter->pending_list);
> +	/*
> +	 * Check if the HBA is in quiescent state, e.g., during a
> +	 * delete logical drive opertion. If it is, don't run
> +	 * the pending_list.
> +	 */
> +	if (atomic_read(&adapter->quiescent) == 0)
> +		mega_runpendq(adapter);
> +	spin_unlock_irqrestore(&adapter->lock, flags);
>  
>  	wait_for_completion(&adapter->int_waitq);
>  
> -	rval = scmd->result;
> -	mc->status = scmd->result;
> -	kfree(sdev);
> +	mc->status = rval = adapter->int_status;
>  
>  	/*
>  	 * Print a debug message for all failed commands. Applications can use
>  	 * this information.
>  	 */
> -	if( scmd->result && trace_level ) {
> +	if( rval && trace_level ) {
>  		printk("megaraid: cmd [%x, %x, %x] status:[%x]\n",
> -			mc->cmd, mc->opcode, mc->subopcode, scmd->result);
> +			mc->cmd, mc->opcode, mc->subopcode, rval);
>  	}
>  
>  	mutex_unlock(&adapter->int_mtx);
> -
> -	scsi_free_command(GFP_KERNEL, scmd);
> -
>  	return rval;
>  }
>  
> -
> -/**
> - * mega_internal_done()
> - * @scmd - internal scsi command
> - *
> - * Callback routine for internal commands.
> - */
> -static void
> -mega_internal_done(Scsi_Cmnd *scmd)
> -{
> -	adapter_t	*adapter;
> -
> -	adapter = (adapter_t *)scmd->device->host->hostdata;
> -
> -	complete(&adapter->int_waitq);
> -
> -}
> -
> -
>  static struct scsi_host_template megaraid_template = {
>  	.module				= THIS_MODULE,
>  	.name				= "MegaRAID",
> diff --git a/drivers/scsi/megaraid.h b/drivers/scsi/megaraid.h
> index 4d0ce4e..8f2e026 100644
> --- a/drivers/scsi/megaraid.h
> +++ b/drivers/scsi/megaraid.h
> @@ -853,10 +853,10 @@ typedef struct {
>  
>  	u8	sglen;	/* f/w supported scatter-gather list length */
>  
> -	unsigned char int_cdb[MAX_COMMAND_SIZE];
>  	scb_t			int_scb;
>  	struct mutex		int_mtx;	/* To synchronize the internal
>  						commands */
> +	int			int_status; 	/* status of internal cmd */
>  	struct completion	int_waitq;	/* wait queue for internal
>  						 cmds */
>  
> @@ -1004,7 +1004,6 @@ static int mega_del_logdrv(adapter_t *, int);
>  static int mega_do_del_logdrv(adapter_t *, int);
>  static void mega_get_max_sgl(adapter_t *);
>  static int mega_internal_command(adapter_t *, megacmd_t *, mega_passthru *);
> -static void mega_internal_done(Scsi_Cmnd *);
>  static int mega_support_cluster(adapter_t *);
>  #endif
>  
> -- 
> 1.7.10.4
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
---end quoted text---

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-05 12:39 ` [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready Christoph Hellwig
@ 2014-02-06 16:56   ` James Bottomley
  2014-02-06 17:10     ` Bart Van Assche
  2014-02-10 11:39     ` Christoph Hellwig
  0 siblings, 2 replies; 52+ messages in thread
From: James Bottomley @ 2014-02-06 16:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Nicholas Bellinger, linux-scsi


On Wed, 2014-02-05 at 04:39 -0800, Christoph Hellwig wrote:
> Prepare for not taking a host-wide lock in the dispatch path by pushing
> the lock down into the places that actually need it.  Note that this
> patch is just a preparation step, as it will actually increase lock
> roundtrips and thus decrease performance on its own.

I'm not really convinced by the rest of the series.  I think patches
4,8,9,10,11,12 (lock elimination from fast path and get/put reduction)
are where the improvements are at and they mostly look ready to apply
modulo some author and signoff fixes.

I'm dubious about replacing a locked set of checks and increments with
atomics for the simple reason that atomics are pretty expensive on
non-x86, so you've likely slowed the critical path down for them.  Even
on x86, atomics can be very expensive because of the global bus lock.  I
think about three of them in a row is where you might as well stick with
the lock.
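
To make the shape of the change concrete, the pattern being debated is
roughly the following (an illustrative sketch only, not code from the
series; it reuses the familiar host_busy/can_queue names):

	/*
	 * Before: the check and the increment happen together inside one
	 * host_lock critical section.
	 */
	spin_lock_irqsave(shost->host_lock, flags);
	if (shost->host_busy >= shost->can_queue) {
		spin_unlock_irqrestore(shost->host_lock, flags);
		return 0;		/* starved, try again later */
	}
	shost->host_busy++;
	spin_unlock_irqrestore(shost->host_lock, flags);

	/*
	 * After: a single atomic read-modify-write, backed out if the
	 * limit was exceeded.  Each atomic op is a full hardware-level
	 * read-modify-write, which is the per-operation cost being
	 * questioned here.
	 */
	if (atomic_inc_return(&shost->host_busy) > shost->can_queue) {
		atomic_dec(&shost->host_busy);
		return 0;		/* starved, try again later */
	}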

Could you benchmark this lot and show what the actual improvement is
just for this series, if any?

I also think we should be getting more utility out of threading
guarantees.  So, if there's only one thread active per device we don't
need any device counters to be atomic.  Likewise, u32 read/write is an
atomic operation, so we might be able to use sloppy counters for the
target and host stuff (one per CPU that are incremented/decremented on
that CPU ... this will only work using CPU locality ... completion on
same CPU but that seems to be an element of a lot of stuff nowadays).
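
As a rough illustration of the sloppy counter idea (invented names, not a
concrete proposal; it assumes the increment and decrement for a command
happen on the same CPU, per the locality argument above):

	/* One slot per CPU; each CPU only ever touches its own slot. */
	static DEFINE_PER_CPU(unsigned int, host_busy_pcpu);

	static inline void host_busy_get(void)
	{
		this_cpu_inc(host_busy_pcpu);	/* no lock, no atomic RMW */
	}

	static inline void host_busy_put(void)
	{
		this_cpu_dec(host_busy_pcpu);
	}

	/* Approximate total, good enough for "over the limit" checks. */
	static unsigned int host_busy_read(void)
	{
		unsigned int sum = 0;
		int cpu;

		for_each_possible_cpu(cpu)
			sum += per_cpu(host_busy_pcpu, cpu);
		return sum;
	}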

I'm not saying this will all pan out, but I am saying I don't think just
making all the counters atomic to reduce locking will buy us a huge
amount, so I'd appreciate careful benchmarking to confirm or invalidate
this hypothesis first.

James



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-06 16:56   ` James Bottomley
@ 2014-02-06 17:10     ` Bart Van Assche
  2014-02-06 18:41       ` James Bottomley
  2014-02-06 21:58       ` Nicholas A. Bellinger
  2014-02-10 11:39     ` Christoph Hellwig
  1 sibling, 2 replies; 52+ messages in thread
From: Bart Van Assche @ 2014-02-06 17:10 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig; +Cc: Jens Axboe, linux-scsi

On 02/06/14 17:56, James Bottomley wrote:
> Could you benchmark this lot and show what the actual improvement is
> just for this series, if any?

I see a performance improvement of 12% with the SRP protocol for the
SCSI core optimizations alone (I am still busy measuring the impact of
the blk-mq conversion but I can already see that it is really
significant). Please note that the performance impact depends a lot on
the workload (number of LUNs per SCSI host e.g.) so maybe the workload I
chose is not doing justice to Christoph's work. And it's also important
to mention that with the workload I ran I was saturating the target
system CPU (a quad core Intel i5). In other words, results might be
better with a more powerful target system.

Bart.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-06 17:10     ` Bart Van Assche
@ 2014-02-06 18:41       ` James Bottomley
  2014-02-07 10:42         ` Bart Van Assche
  2014-02-06 21:58       ` Nicholas A. Bellinger
  1 sibling, 1 reply; 52+ messages in thread
From: James Bottomley @ 2014-02-06 18:41 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Christoph Hellwig, Jens Axboe, linux-scsi

On Thu, 2014-02-06 at 18:10 +0100, Bart Van Assche wrote:
> On 02/06/14 17:56, James Bottomley wrote:
> > Could you benchmark this lot and show what the actual improvement is
> > just for this series, if any?
> 
> I see a performance improvement of 12% with the SRP protocol for the
> SCSI core optimizations alone (I am still busy measuring the impact of
> the blk-mq conversion but I can already see that it is really
> significant). Please note that the performance impact depends a lot on
> the workload (number of LUNs per SCSI host e.g.) so maybe the workload I
> chose is not doing justice to Christoph's work. And it's also important
> to mention that with the workload I ran I was saturating the target
> system CPU (a quad core Intel i5). In other words, results might be
> better with a more powerful target system.

On what?  Just the patches I indicated or the whole series?  My specific
concern is that swapping a critical section for atomics may not buy us
anything even on x86 and may slow down non-x86.  That's the bit I'd like
benchmarks to explore.

James



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-06 17:10     ` Bart Van Assche
  2014-02-06 18:41       ` James Bottomley
@ 2014-02-06 21:58       ` Nicholas A. Bellinger
  2014-02-07 10:32         ` Bart Van Assche
  1 sibling, 1 reply; 52+ messages in thread
From: Nicholas A. Bellinger @ 2014-02-06 21:58 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, Christoph Hellwig, Jens Axboe, linux-scsi

On Thu, 2014-02-06 at 18:10 +0100, Bart Van Assche wrote:
> On 02/06/14 17:56, James Bottomley wrote:
> > Could you benchmark this lot and show what the actual improvement is
> > just for this series, if any?
> 
> I see a performance improvement of 12% with the SRP protocol for the
> SCSI core optimizations alone (I am still busy measuring the impact of
> the blk-mq conversion but I can already see that it is really
> significant). Please note that the performance impact depends a lot on
> the workload (number of LUNs per SCSI host e.g.) so maybe the workload I
> chose is not doing justice to Christoph's work. And it's also important
> to mention that with the workload I ran I was saturating the target
> system CPU (a quad core Intel i5). In other words, results might be
> better with a more powerful target system.
> 

Starting with a baseline using scsi_debug that NOPs REQ_TYPE_FS commands
to measure improvements would be a better baseline than the existing
scsi_request_fn() code that you're benchmarking above.

That way at least it's easier to measure specific scsi-core overhead
down to the LLD's queuecommand without everything else in the way.

--nab


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 04/17] scsi: avoid useless free_list lock roundtrips
  2014-02-05 12:39 ` [PATCH 04/17] scsi: avoid useless free_list lock roundtrips Christoph Hellwig
  2014-02-05 23:44   ` James Bottomley
@ 2014-02-07  9:05   ` Paolo Bonzini
  1 sibling, 0 replies; 52+ messages in thread
From: Paolo Bonzini @ 2014-02-07  9:05 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, James Bottomley, Nicholas Bellinger
  Cc: linux-scsi

Il 05/02/2014 13:39, Christoph Hellwig ha scritto:
> Avoid hitting the host-wide free_list lock unless we need to put a command
> back onto the freelist.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/scsi/scsi.c |   13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index ebcea6c..4139486 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -320,15 +320,16 @@ void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd,
>  {
>  	unsigned long flags;
>
> -	/* changing locks here, don't need to restore the irq state */
> -	spin_lock_irqsave(&shost->free_list_lock, flags);
>  	if (unlikely(list_empty(&shost->free_list))) {
> -		list_add(&cmd->list, &shost->free_list);
> -		cmd = NULL;
> +		spin_lock_irqsave(&shost->free_list_lock, flags);
> +		if (list_empty(&shost->free_list)) {
> +			list_add(&cmd->list, &shost->free_list);
> +			cmd = NULL;
> +		}
> +		spin_unlock_irqrestore(&shost->free_list_lock, flags);
>  	}
> -	spin_unlock_irqrestore(&shost->free_list_lock, flags);
>
> -	if (likely(cmd != NULL))
> +	if (cmd)
>  		scsi_pool_free_command(shost->cmd_pool, cmd);
>
>  	put_device(dev);

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/17] scsi: add support for per-host cmd pools
  2014-02-05 12:39 ` [PATCH 06/17] scsi: add support for per-host cmd pools Christoph Hellwig
@ 2014-02-07  9:13   ` Paolo Bonzini
  2014-02-07 12:44     ` Christoph Hellwig
  2014-02-07  9:35   ` Mike Christie
  1 sibling, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2014-02-07  9:13 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, James Bottomley, Nicholas Bellinger
  Cc: linux-scsi

Il 05/02/2014 13:39, Christoph Hellwig ha scritto:
> +	pool = scsi_find_host_cmd_pool(shost);

Should you have a WARN_ON somewhere if shost->hostt->cmd_size && 
shost->unchecked_isa_dma?

Apart from this,

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> +	if (!pool) {
> +		pool = scsi_alloc_host_cmd_pool(shost);
> +		if (!pool)
> +			goto out;
> +	}
> +


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 07/17] virtio_scsi: use cmd_size
  2014-02-05 12:39 ` [PATCH 07/17] virtio_scsi: use cmd_size Christoph Hellwig
@ 2014-02-07  9:13   ` Paolo Bonzini
  0 siblings, 0 replies; 52+ messages in thread
From: Paolo Bonzini @ 2014-02-07  9:13 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, James Bottomley, Nicholas Bellinger
  Cc: linux-scsi

Il 05/02/2014 13:39, Christoph Hellwig ha scritto:
> Taken almost entirely from Nicholas Bellinger's scsi-mq conversion.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/scsi/virtio_scsi.c |   25 +++++++------------------
>  1 file changed, 7 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 16bfd50..d9a6074 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -204,7 +204,6 @@ static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
>  			set_driver_byte(sc, DRIVER_SENSE);
>  	}
>
> -	mempool_free(cmd, virtscsi_cmd_pool);
>  	sc->scsi_done(sc);
>
>  	atomic_dec(&tgt->reqs);
> @@ -279,8 +278,6 @@ static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf)
>
>  	if (cmd->comp)
>  		complete_all(cmd->comp);
> -	else
> -		mempool_free(cmd, virtscsi_cmd_pool);
>  }
>
>  static void virtscsi_ctrl_done(struct virtqueue *vq)
> @@ -496,10 +493,9 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
>  				 struct virtio_scsi_vq *req_vq,
>  				 struct scsi_cmnd *sc)
>  {
> -	struct virtio_scsi_cmd *cmd;
> -	int ret;
> -
>  	struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
> +	struct virtio_scsi_cmd *cmd = (struct virtio_scsi_cmd *)(sc + 1);
> +
>  	BUG_ON(scsi_sg_count(sc) > shost->sg_tablesize);
>
>  	/* TODO: check feature bit and fail if unsupported?  */
> @@ -508,11 +504,6 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
>  	dev_dbg(&sc->device->sdev_gendev,
>  		"cmd %p CDB: %#02x\n", sc, sc->cmnd[0]);
>
> -	ret = SCSI_MLQUEUE_HOST_BUSY;
> -	cmd = mempool_alloc(virtscsi_cmd_pool, GFP_ATOMIC);
> -	if (!cmd)
> -		goto out;
> -
>  	memset(cmd, 0, sizeof(*cmd));
>  	cmd->sc = sc;
>  	cmd->req.cmd = (struct virtio_scsi_cmd_req){
> @@ -531,13 +522,9 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
>
>  	if (virtscsi_kick_cmd(req_vq, cmd,
>  			      sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
> -			      GFP_ATOMIC) == 0)
> -		ret = 0;
> -	else
> -		mempool_free(cmd, virtscsi_cmd_pool);
> -
> -out:
> -	return ret;
> +			      GFP_ATOMIC) != 0)
> +		return SCSI_MLQUEUE_HOST_BUSY;
> +	return 0;
>  }
>
>  static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
> @@ -683,6 +670,7 @@ static struct scsi_host_template virtscsi_host_template_single = {
>  	.name = "Virtio SCSI HBA",
>  	.proc_name = "virtio_scsi",
>  	.this_id = -1,
> +	.cmd_size = sizeof(struct virtio_scsi_cmd),
>  	.queuecommand = virtscsi_queuecommand_single,
>  	.eh_abort_handler = virtscsi_abort,
>  	.eh_device_reset_handler = virtscsi_device_reset,
> @@ -699,6 +687,7 @@ static struct scsi_host_template virtscsi_host_template_multi = {
>  	.name = "Virtio SCSI HBA",
>  	.proc_name = "virtio_scsi",
>  	.this_id = -1,
> +	.cmd_size = sizeof(struct virtio_scsi_cmd),
>  	.queuecommand = virtscsi_queuecommand_multi,
>  	.eh_abort_handler = virtscsi_abort,
>  	.eh_device_reset_handler = virtscsi_device_reset,
> -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe
> linux-scsi" in the body of a message to majordomo@vger.kernel.org More
> majordomo info at http://vger.kernel.org/majordomo-info.html
>

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/17] scsi: add support for per-host cmd pools
  2014-02-05 12:39 ` [PATCH 06/17] scsi: add support for per-host cmd pools Christoph Hellwig
  2014-02-07  9:13   ` Paolo Bonzini
@ 2014-02-07  9:35   ` Mike Christie
  2014-02-07 12:46     ` Christoph Hellwig
  1 sibling, 1 reply; 52+ messages in thread
From: Mike Christie @ 2014-02-07  9:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, James Bottomley, Nicholas Bellinger, linux-scsi

On 02/05/2014 06:39 AM, Christoph Hellwig wrote:
> -static struct scsi_host_cmd_pool *scsi_get_host_cmd_pool(gfp_t gfp_mask)
> +static struct scsi_host_cmd_pool *
> +scsi_find_host_cmd_pool(struct Scsi_Host *shost)
>  {
> +	if (shost->hostt->cmd_size)
> +		return shost->hostt->cmd_pool;
> +	if (shost->unchecked_isa_dma)
> +		return &scsi_cmd_dma_pool;
> +	return &scsi_cmd_pool;
> +}
> +

It seems there is an issue with using the cmd_size to indicate the driver
has its own cmd pool and also using that for scsi mq enabled drivers to
indicate that we want the LLD's struct allocated by blk/scsi mq.

If a driver sets cmd_size for only the scsi/blk mq purpose, this patch
wants the driver to also set up a cmd pool, which I do not think is used
when doing scsi/blk mq.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-06 21:58       ` Nicholas A. Bellinger
@ 2014-02-07 10:32         ` Bart Van Assche
  2014-02-07 19:30           ` Nicholas A. Bellinger
  0 siblings, 1 reply; 52+ messages in thread
From: Bart Van Assche @ 2014-02-07 10:32 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: James Bottomley, Christoph Hellwig, Jens Axboe, linux-scsi

On 02/06/14 22:58, Nicholas A. Bellinger wrote:
> Starting with a baseline using scsi_debug that NOPs REQ_TYPE_FS commands
> to measure improvements would be a better baseline than the existing
> scsi_request_fn() code that you're benchmarking above.
> 
> That way at least it's easier to measure specific scsi-core overhead
> down to the LLD's queuecommand without everything else in the way.

Running initiator and target workload on the same system makes initiator
and target code share CPUs and hence makes it hard to analyze which
performance change is caused by initiator optimizations and which
performance change is caused by interactions between initiator and
target workloads.

Bart.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-06 18:41       ` James Bottomley
@ 2014-02-07 10:42         ` Bart Van Assche
  0 siblings, 0 replies; 52+ messages in thread
From: Bart Van Assche @ 2014-02-07 10:42 UTC (permalink / raw)
  To: James Bottomley; +Cc: Christoph Hellwig, Jens Axboe, linux-scsi

On 02/06/14 19:41, James Bottomley wrote:
> On Thu, 2014-02-06 at 18:10 +0100, Bart Van Assche wrote:
>> On 02/06/14 17:56, James Bottomley wrote:
>>> Could you benchmark this lot and show what the actual improvement is
>>> just for this series, if any?
>>
>> I see a performance improvement of 12% with the SRP protocol for the
>> SCSI core optimizations alone (I am still busy measuring the impact of
>> the blk-mq conversion but I can already see that it is really
>> significant). Please note that the performance impact depends a lot on
>> the workload (number of LUNs per SCSI host e.g.) so maybe the workload I
>> chose is not doing justice to Christoph's work. And it's also important
>> to mention that with the workload I ran I was saturating the target
>> system CPU (a quad core Intel i5). In other words, results might be
>> better with a more powerful target system.
> 
> On what?  Just the patches I indicated or the whole series?  My specific
> concern is that swapping a critical section for atomics may not buy us
> anything even on x86 and may slow down non-x86.  That's the bit I'd like
> benchmarks to explore.

The numbers I mentioned in my previous e-mail referred to the "SCSI data
path micro-optimizations" patch series and the "A different approach for
using blk-mq in the SCSI layer" series as a whole. I have run a new test
in which I compared the performance between a kernel with these two
patch series applied versus a kernel in which the four patches that
convert host_busy, target_busy and device_busy into atomics have been
reverted. For a workload with a single SCSI host, a single LUN, a block
size of 512 bytes, the SRP protocol and a single CPU thread submitting
I/O requests I see a performance improvement of 0.5% when using atomics.
For a workload with a single SCSI host, eight LUNs and eight CPU threads
submitting I/O I see a performance improvement of 3.8% when using
atomics. Please note that these measurements have been run on a single
socket system, that cache line misses are more expensive on NUMA systems
and hence that the performance impact of these patches on a NUMA system
will be more substantial.

Bart.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/17] scsi: add support for per-host cmd pools
  2014-02-07  9:13   ` Paolo Bonzini
@ 2014-02-07 12:44     ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-07 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Christoph Hellwig, Jens Axboe, James Bottomley,
	Nicholas Bellinger, linux-scsi

On Fri, Feb 07, 2014 at 10:13:25AM +0100, Paolo Bonzini wrote:
> Il 05/02/2014 13:39, Christoph Hellwig ha scritto:
> >+	pool = scsi_find_host_cmd_pool(shost);
> 
> Should you have a WARN_ON somewhere if shost->hostt->cmd_size &&
> shost->unchecked_isa_dma?

Seems like we could support passing SLAB_CACHE_DMA and __GFP_DMA
easily even for the per-template pool, I'll look into that.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/17] scsi: add support for per-host cmd pools
  2014-02-07  9:35   ` Mike Christie
@ 2014-02-07 12:46     ` Christoph Hellwig
  2014-02-07 21:43       ` Mike Christie
  0 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-07 12:46 UTC (permalink / raw)
  To: Mike Christie
  Cc: Christoph Hellwig, Jens Axboe, James Bottomley,
	Nicholas Bellinger, linux-scsi

On Fri, Feb 07, 2014 at 03:35:00AM -0600, Mike Christie wrote:
> It seems there is an issue with using the cmd_size to indicate the driver
> has its own cmd pool and also using that for scsi mq enabled drivers to
> indicate that we want the LLD's struct allocated by blk/scsi mq.
> 
> If a driver sets cmd_size for only the scsi/blk mq purpose, this patch
> wants the driver to also set up a cmd pool, which I do not think is used
> when doing scsi/blk mq.

I don't quite understand what you mean.  cmd_size means that each
scsi_cmnd passed to the driver will have additional per-driver data.

For the old path it's implemented by creating a slab cache and storing
it in the host template to easily find it; for blk-mq it is used to
increase the allocation in the block core as scsi doesn't do its own
allocations in this case.

Very different implementation underneath, but same effect for the
driver.
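
Seen from the driver's point of view it looks something like the following
(an illustrative fragment with made-up "mydrv" names; the virtio_scsi
patch earlier in the thread uses the same sc + 1 layout):

	/* Per-command driver data lives directly behind the scsi_cmnd. */
	struct mydrv_cmd {
		int tag;		/* whatever the LLD needs per command */
	};

	static int mydrv_queuecommand(struct Scsi_Host *shost,
				      struct scsi_cmnd *sc)
	{
		struct mydrv_cmd *cmd = (struct mydrv_cmd *)(sc + 1);

		cmd->tag = 0;
		/* ... build and issue the command to the hardware ... */
		sc->scsi_done(sc);
		return 0;
	}

	static struct scsi_host_template mydrv_template = {
		.name		= "mydrv",
		.queuecommand	= mydrv_queuecommand,
		.cmd_size	= sizeof(struct mydrv_cmd),
		/* ... */
	};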

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-07 10:32         ` Bart Van Assche
@ 2014-02-07 19:30           ` Nicholas A. Bellinger
  2014-02-08 11:00             ` Bart Van Assche
  0 siblings, 1 reply; 52+ messages in thread
From: Nicholas A. Bellinger @ 2014-02-07 19:30 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, Christoph Hellwig, Jens Axboe, linux-scsi

On Fri, 2014-02-07 at 11:32 +0100, Bart Van Assche wrote:
> On 02/06/14 22:58, Nicholas A. Bellinger wrote:
> > Starting with a baseline using scsi_debug that NOPs REQ_TYPE_FS commands
> > to measure improvements would be a better baseline than the existing
> > scsi_request_fn() code that you're benchmarking above.
> > 
> > That way at least it's easier to measure specific scsi-core overhead
> > down to the LLD's queuecommand without everything else in the way.
> 
> Running initiator and target workload on the same system makes initiator
> and target code share CPUs and hence makes it hard to analyze which
> performance change is caused by initiator optimizations and which
> performance change is caused by interactions between initiator and
> target workloads.
> 

Huh..?  There is no 'target workload' involved.

All that scsi_debug with NOP'ed REQ_TYPE_FS commands is doing is calling
scsi_cmd->done() as soon as the descriptor has been dispatched into LLD
->queuecommand() code.

It's useful for determining an absolute performance ceiling between
scsi_mq vs. scsi_request_fn() based codepaths without involving any
other LLD specific overheads.

--nab


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/17] scsi: add support for per-host cmd pools
  2014-02-07 12:46     ` Christoph Hellwig
@ 2014-02-07 21:43       ` Mike Christie
  2014-02-10 12:20         ` Christoph Hellwig
  0 siblings, 1 reply; 52+ messages in thread
From: Mike Christie @ 2014-02-07 21:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, James Bottomley, Nicholas Bellinger, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 2183 bytes --]

On 02/07/2014 06:46 AM, Christoph Hellwig wrote:
> On Fri, Feb 07, 2014 at 03:35:00AM -0600, Mike Christie wrote:
>> It seems there is an issue with using the cmd_size to indicate the driver
>> has its own cmd pool and also using that for scsi mq enabled drivers to
>> indicate that we want the LLD's struct allocated by blk/scsi mq.
>>
>> If a driver sets cmd_size for only the scsi/blk mq purpose, this patch
>> wants the driver to also set up a cmd pool, which I do not think is used
>> when doing scsi/blk mq.
> 
> I don't quite understand what you mean.  cmd_size means that each
> scsi_cmnd passed to the driver will have additional per-driver data.
> 
> For the old path it's implemented by creating a slab cache and storing
> it in the host template to easily find it; for blk-mq it is used to
> increase the allocation in the block core as scsi doesn't do its own
> allocations in this case.
> 

Ah ok. There is a bug then. I thought you were doing something crazy and
trying to do both at the same time for the same host, and that is why in
the other thread I was asking for a way to figure out where per-driver
data is.

The problem is that if a driver is using scsi-mq and sets cmd_size,
scsi-ml is trying to also set up a shost->cmd_pool. We do:

scsi_add_host_with_dma->scsi_setup_command_freelist->scsi_get_host_cmd_pool->
scsi_find_host_cmd_pool

and that function uses cmd_size to determine if we are using the driver
allocated hostt->cmd_pool or the scsi-ml caches. In the case where we
are setting cmd_size because we want the mq code to set up the per-driver
data, we do not want any cmd_pool and have not set one up. The problem is
that scsi_find_host_cmd_pool then returns NULL and
scsi_get_host_cmd_pool does scsi_alloc_host_cmd_pool.

Later when scsi_destroy_command_freelist calls scsi_put_host_cmd_pool,
scsi_find_host_cmd_pool returns NULL again and we oops when we access
the pool in there.

We need something like the attached patch, which just prevents scsi-ml
from creating a host pool when mq is used. Note that when
scsi_destroy_command_freelist is called, shost->cmd_pool will be NULL, so
it will return immediately and no extra check is needed there.

[-- Attachment #2: dont-setup-host-list.patch --]
[-- Type: text/plain, Size: 529 bytes --]

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index f28ea07..2924252 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -213,9 +213,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 		goto fail;
 	}
 
-	error = scsi_setup_command_freelist(shost);
-	if (error)
-		goto fail;
+	if (!sht->use_blk_mq) {
+		error = scsi_setup_command_freelist(shost);
+		if (error)
+			goto fail;
+	}
 
 	if (!shost->shost_gendev.parent)
 		shost->shost_gendev.parent = dev ? dev : &platform_bus;

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-07 19:30           ` Nicholas A. Bellinger
@ 2014-02-08 11:00             ` Bart Van Assche
  2014-02-09  8:26               ` Nicholas A. Bellinger
  0 siblings, 1 reply; 52+ messages in thread
From: Bart Van Assche @ 2014-02-08 11:00 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: James Bottomley, Christoph Hellwig, Jens Axboe, linux-scsi

On 02/07/14 20:30, Nicholas A. Bellinger wrote:
> All that scsi_debug with NOP'ed REQ_TYPE_FS commands is doing is calling
> scsi_cmd->done() as soon as the descriptor has been dispatched into LLD
> ->queuecommand() code.
> 
> It's useful for determining an absolute performance ceiling between
> scsi_mq vs. scsi_request_fn() based codepaths without involving any
> other LLD specific overheads.

Sorry but I'm not convinced that the scsi_debug driver is well suited
for testing scsi-mq. In source file drivers/scsi/scsi_debug.c of kernel
3.14-rc1 I found the following:

static DEF_SCSI_QCMD(scsi_debug_queuecommand)

From include/scsi/scsi_host.h:

/*
 * Temporary #define for host lock push down. Can be removed when all
 * drivers have been updated to take advantage of unlocked
 * queuecommand.
 *
 */
#define DEF_SCSI_QCMD(func_name) \
        int func_name(struct Scsi_Host *shost, struct scsi_cmnd *cmd)  \
        {                                                              \
                unsigned long irq_flags;                               \
                int rc;                                                \
                spin_lock_irqsave(shost->host_lock, irq_flags);        \
                scsi_cmd_get_serial(shost, cmd);                       \
                rc = func_name##_lck (cmd, cmd->scsi_done);            \
                spin_unlock_irqrestore(shost->host_lock, irq_flags);   \
                return rc;                                             \
        }

In other words, all scsi_debug_queuecommand() are serialized. I think
for testing the scsi-mq code properly a SCSI LLD driver is needed that
allows concurrent queuecommand() calls.

Bart.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-08 11:00             ` Bart Van Assche
@ 2014-02-09  8:26               ` Nicholas A. Bellinger
  2014-02-10 12:09                 ` Christoph Hellwig
  0 siblings, 1 reply; 52+ messages in thread
From: Nicholas A. Bellinger @ 2014-02-09  8:26 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, Christoph Hellwig, Jens Axboe, linux-scsi

On Sat, 2014-02-08 at 12:00 +0100, Bart Van Assche wrote:
> On 02/07/14 20:30, Nicholas A. Bellinger wrote:
> > All that scsi_debug with NOP'ed REQ_TYPE_FS commands is doing is calling
> > scsi_cmd->done() as soon as the descriptor has been dispatched into LLD
> > ->queuecommand() code.
> > 
> > It's useful for determining an absolute performance ceiling between
> > scsi_mq vs. scsi_request_fn() based codepaths without involving any
> > other LLD specific overheads.
> 
> Sorry but I'm not convinced that the scsi_debug driver is well suited
> for testing scsi-mq. In source file drivers/scsi/scsi_debug.c of kernel
> 3.14-rc1 I found the following:
> 
> static DEF_SCSI_QCMD(scsi_debug_queuecommand)
> 
> From include/scsi/scsi_host.h:
> 
> /*
>  * Temporary #define for host lock push down. Can be removed when all
>  * drivers have been updated to take advantage of unlocked
>  * queuecommand.
>  *
>  */
> #define DEF_SCSI_QCMD(func_name) \
>         int func_name(struct Scsi_Host *shost, struct scsi_cmnd *cmd)  \
>         {                                                              \
>                 unsigned long irq_flags;                               \
>                 int rc;                                                \
>                 spin_lock_irqsave(shost->host_lock, irq_flags);        \
>                 scsi_cmd_get_serial(shost, cmd);                       \
>                 rc = func_name##_lck (cmd, cmd->scsi_done);            \
>                 spin_unlock_irqrestore(shost->host_lock, irq_flags);   \
>                 return rc;                                             \
>         }
> 
> In other words, all scsi_debug_queuecommand() are serialized. I think
> for testing the scsi-mq code properly a SCSI LLD driver is needed that
> allows concurrent queuecommand() calls.
> 

Again, try NOP'ing all REQ_TYPE_FS type commands immediately in
->queuecommand() in order to determine a baseline without any other LLD
overhead involved.

Here's what's been used so far in target-pending/scsi-mq:

From 0f312a951eedc87adc4c00adfec8ab317727efdd Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab@linux-iscsi.org>
Date: Fri, 24 May 2013 05:11:38 +0000
Subject: scsi-debug: Enable scsi-mq operation

v4 changes:
  - Bump can_queue to 64 for performance testing
  - Enable scsi-mq DIF support
  - Add nop_fs_io module parameter for performance testing

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 80b8b10..612d36d 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -119,6 +119,7 @@ static const char * scsi_debug_version_date = "20100324";
 #define DEF_VIRTUAL_GB   0
 #define DEF_VPD_USE_HOSTNO 1
 #define DEF_WRITESAME_LENGTH 0xFFFF
+#define DEF_NOP_FS_IO 0
 
 /* bit mask values for scsi_debug_opts */
 #define SCSI_DEBUG_OPT_NOISE   1
@@ -195,6 +196,7 @@ static unsigned int scsi_debug_unmap_max_blocks = DEF_UNMAP_MAX_BLOCKS;
 static unsigned int scsi_debug_unmap_max_desc = DEF_UNMAP_MAX_DESC;
 static unsigned int scsi_debug_write_same_length = DEF_WRITESAME_LENGTH;
 static bool scsi_debug_removable = DEF_REMOVABLE;
+static unsigned int scsi_debug_nop_fs_io = DEF_NOP_FS_IO;
 
 static int scsi_debug_cmnd_count = 0;
 
@@ -2779,6 +2781,8 @@ module_param_named(vpd_use_hostno, scsi_debug_vpd_use_hostno, int,
 		   S_IRUGO | S_IWUSR);
 module_param_named(write_same_length, scsi_debug_write_same_length, int,
 		   S_IRUGO | S_IWUSR);
+module_param_named(nop_fs_io, scsi_debug_nop_fs_io, int,
+		   S_IRUGO | S_IWUSR);
 
 MODULE_AUTHOR("Eric Youngdale + Douglas Gilbert");
 MODULE_DESCRIPTION("SCSI debug adapter driver");
@@ -2820,6 +2824,7 @@ MODULE_PARM_DESC(unmap_max_desc, "max # of ranges that can be unmapped in one cm
 MODULE_PARM_DESC(virtual_gb, "virtual gigabyte size (def=0 -> use dev_size_mb)");
 MODULE_PARM_DESC(vpd_use_hostno, "0 -> dev ids ignore hostno (def=1 -> unique dev ids)");
 MODULE_PARM_DESC(write_same_length, "Maximum blocks per WRITE SAME cmd (def=0xffff)");
+MODULE_PARM_DESC(nop_fs_io, "Turn REQ_TYPE_FS I/O into NOPs");
 
 static char sdebug_info[256];
 
@@ -3954,6 +3959,20 @@ write:
 
 static DEF_SCSI_QCMD(scsi_debug_queuecommand)
 
+static int scsi_debug_queuecommand_mq(struct Scsi_Host *host, struct scsi_cmnd *sc)
+{
+	struct request *rq = sc->request;
+
+	if (scsi_debug_nop_fs_io && rq->cmd_type == REQ_TYPE_FS) {
+		set_host_byte(sc, DID_OK);
+		sc->result |= SAM_STAT_GOOD;
+		sc->scsi_done(sc);
+		return 0;
+	}
+
+	return scsi_debug_queuecommand_lck(sc, sc->scsi_done);
+}
+
 static struct scsi_host_template sdebug_driver_template = {
 	.show_info =		scsi_debug_show_info,
 	.write_info =		scsi_debug_write_info,
@@ -3965,6 +3984,8 @@ static struct scsi_host_template sdebug_driver_template = {
 	.slave_destroy =	scsi_debug_slave_destroy,
 	.ioctl =		scsi_debug_ioctl,
 	.queuecommand =		scsi_debug_queuecommand,
+	.queuecommand_mq =	scsi_debug_queuecommand_mq,
+	.scsi_mq =		true,
 	.eh_abort_handler =	scsi_debug_abort,
 	.eh_bus_reset_handler = scsi_debug_bus_reset,
 	.eh_device_reset_handler = scsi_debug_device_reset,
@@ -3973,7 +3994,7 @@ static struct scsi_host_template sdebug_driver_template = {
 	.can_queue =		SCSI_DEBUG_CANQUEUE,
 	.this_id =		7,
 	.sg_tablesize =		256,
-	.cmd_per_lun =		16,
+	.cmd_per_lun =		64,
 	.max_sectors =		0xffff,
 	.use_clustering = 	DISABLE_CLUSTERING,
 	.module =		THIS_MODULE,


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-06 16:56   ` James Bottomley
  2014-02-06 17:10     ` Bart Van Assche
@ 2014-02-10 11:39     ` Christoph Hellwig
  2014-02-10 20:09       ` Jens Axboe
  2014-02-10 21:10       ` James Bottomley
  1 sibling, 2 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-10 11:39 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Jens Axboe, Nicholas Bellinger, linux-scsi

On Thu, Feb 06, 2014 at 08:56:59AM -0800, James Bottomley wrote:
> I'm dubious about replacing a locked set of checks and increments with
> atomics for the simple reason that atomics are pretty expensive on
> non-x86, so you've likely slowed the critical path down for them.  Even
> on x86, atomics can be very expensive because of the global bus lock.  I
> think about three of them in a row is where you might as well stick with
> the lock.

The three of them replace at least two locks when using blk-mq.  Until
we use blk-mq, where they also avoid the queue_lock, we could keep the
per-device counters as-is.

As Bart's numbers have shown, this is definitely a major improvement
on x86; for other architectures we'd need someone to run benchmarks
on useful hardware.  Maybe some of the IBM people on the list could
help out on PPC and S/390?

> I also think we should be getting more utility out of threading
> guarantees.  So, if there's only one thread active per device we don't
> need any device counters to be atomic.  Likewise, u32 read/write is an
> atomic operation, so we might be able to use sloppy counters for the
> target and host stuff (one per CPU that are incremented/decremented on
> that CPU ... this will only work using CPU locality ... completion on
> same CPU but that seems to be an element of a lot of stuff nowadays).

The blk-mq code is aiming for CPU locality, but there are no hard
guarantees.  I'm also not sure always bouncing around the I/O submission
is a win, but it might be something to play around with at the block
layer.

Jens, did you try something like this earlier?


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-09  8:26               ` Nicholas A. Bellinger
@ 2014-02-10 12:09                 ` Christoph Hellwig
  2014-02-10 19:53                   ` Nicholas A. Bellinger
  0 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-10 12:09 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Bart Van Assche, James Bottomley, Christoph Hellwig, Jens Axboe,
	linux-scsi, Douglas Gilbert

On Sun, Feb 09, 2014 at 12:26:48AM -0800, Nicholas A. Bellinger wrote:
> Again, try NOP'ing all REQ_TYPE_FS type commands immediately in
> ->queuecommand() in order to determine a baseline without any other LLD
> overhead involved.

Seems like this duplicates the fake_rw parameter.  Removing the needless
host_lock in queuecommand and maybe a few more optimizations would
certainly be more useful than duplicating the fake_rw parameter.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/17] scsi: add support for per-host cmd pools
  2014-02-07 21:43       ` Mike Christie
@ 2014-02-10 12:20         ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-10 12:20 UTC (permalink / raw)
  To: Mike Christie; +Cc: Jens Axboe, James Bottomley, Nicholas Bellinger, linux-scsi

On Fri, Feb 07, 2014 at 03:43:55PM -0600, Mike Christie wrote:
> We need something like the attached patch which just prevents scsi-ml
> from creating a host pool when mq is used. Note that when
> scsi_destroy_command_freelist is called shost->cmd_pool will be NULL so
> it will return immediately so no extra check is needed in there.

Oops, missed your attached patch and pushed a similar of mine instead,
but the effect should be the same.

Btw, if you can split out the cmd_size bits of the iscsi patch so that
we can send it to James with the command part of the first series I'd
appreciate that.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-10 12:09                 ` Christoph Hellwig
@ 2014-02-10 19:53                   ` Nicholas A. Bellinger
  0 siblings, 0 replies; 52+ messages in thread
From: Nicholas A. Bellinger @ 2014-02-10 19:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bart Van Assche, James Bottomley, Jens Axboe, linux-scsi,
	Douglas Gilbert

On Mon, 2014-02-10 at 04:09 -0800, Christoph Hellwig wrote:
> On Sun, Feb 09, 2014 at 12:26:48AM -0800, Nicholas A. Bellinger wrote:
> > Again, try NOP'ing all REQ_TYPE_FS type commands immediately in
> > ->queuecommand() in order to determine a baseline without any other LLD
> > overhead involved.
> 
> Seems like this duplicates the fake_rw parameter.  Removing the needless
> host_lock in queuecommand and maybe a few more optimizations would
> certainly be more useful than duplicating the fake_rw parameter.
> 

Fine by me.

--nab

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 80b8b10..631f5d2 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -3603,7 +3603,7 @@ static void sdebug_remove_adapter(void)
 }
 
 static
-int scsi_debug_queuecommand_lck(struct scsi_cmnd *SCpnt, done_funct_t done)
+int __scsi_debug_queuecommand_lck(struct scsi_cmnd *SCpnt, done_funct_t done)
 {
 	unsigned char *cmd = (unsigned char *) SCpnt->cmnd;
 	int len, k;
@@ -3775,8 +3775,6 @@ read:
 		errsts = check_readiness(SCpnt, 0, devip);
 		if (errsts)
 			break;
-		if (scsi_debug_fake_rw)
-			break;
 		get_data_transfer_info(cmd, &lba, &num, &ei_lba);
 		errsts = resp_read(SCpnt, lba, num, devip, ei_lba);
 		if (inj_recovered && (0 == errsts)) {
@@ -3825,8 +3823,6 @@ write:
 		errsts = check_readiness(SCpnt, 0, devip);
 		if (errsts)
 			break;
-		if (scsi_debug_fake_rw)
-			break;
 		get_data_transfer_info(cmd, &lba, &num, &ei_lba);
 		errsts = resp_write(SCpnt, lba, num, devip, ei_lba);
 		if (inj_recovered && (0 == errsts)) {
@@ -3903,8 +3899,6 @@ write:
 		errsts = check_readiness(SCpnt, 0, devip);
 		if (errsts)
 			break;
-		if (scsi_debug_fake_rw)
-			break;
 		get_data_transfer_info(cmd, &lba, &num, &ei_lba);
 		errsts = resp_read(SCpnt, lba, num, devip, ei_lba);
 		if (errsts)
@@ -3952,7 +3946,21 @@ write:
 			     (delay_override ? 0 : scsi_debug_delay));
 }
 
-static DEF_SCSI_QCMD(scsi_debug_queuecommand)
+static DEF_SCSI_QCMD(__scsi_debug_queuecommand)
+
+static int scsi_debug_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
+{
+	struct request *rq = sc->request;
+
+	if (scsi_debug_fake_rw && rq->cmd_type == REQ_TYPE_FS) {
+		set_host_byte(sc, DID_OK);
+		sc->result |= SAM_STAT_GOOD;
+		sc->scsi_done(sc);
+		return 0;
+	}
+
+	return __scsi_debug_queuecommand(host, sc);
+}
 
 static struct scsi_host_template sdebug_driver_template = {
 	.show_info =		scsi_debug_show_info,
@@ -3973,7 +3981,7 @@ static struct scsi_host_template sdebug_driver_template = {
 	.can_queue =		SCSI_DEBUG_CANQUEUE,
 	.this_id =		7,
 	.sg_tablesize =		256,
-	.cmd_per_lun =		16,
+	.cmd_per_lun =		64,
 	.max_sectors =		0xffff,
 	.use_clustering = 	DISABLE_CLUSTERING,
 	.module =		THIS_MODULE,
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-10 11:39     ` Christoph Hellwig
@ 2014-02-10 20:09       ` Jens Axboe
  2014-02-17  9:36         ` Bart Van Assche
  2014-02-10 21:10       ` James Bottomley
  1 sibling, 1 reply; 52+ messages in thread
From: Jens Axboe @ 2014-02-10 20:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: James Bottomley, Nicholas Bellinger, linux-scsi

On Mon, Feb 10 2014, Christoph Hellwig wrote:
> > I also think we should be getting more utility out of threading
> > guarantees.  So, if there's only one thread active per device we don't
> > need any device counters to be atomic.  Likewise, u32 read/write is an
> > atomic operation, so we might be able to use sloppy counters for the
> > target and host stuff (one per CPU that are incremented/decremented on
> > that CPU ... this will only work using CPU locality ... completion on
> > same CPU but that seems to be an element of a lot of stuff nowadays).
> 
> The blk-mq code is aiming for CPU locality, but there are no hard
> guarantees.  I'm also not sure always bouncing around the I/O submission
> is a win, but it might be something to play around with at the block
> layer.
> 
> Jens, did you try something like this earlier?

Nope, I've always thought that if you needed to bounce submission
around, you would already have lost. Hopefully we're moving to a model
where you at least have X completion queues and can tell the hardware
where you want the completion. You'd be a lot better off just placing
the tasks differently, for the cases where you are not on the right
node.

If we're talking about shoving to a dedicated thread to avoid all the
locking, that's going to hurt you on the sync workloads as well. And
depending on your device and peak load, it'll kill you on the peak
performance as well. That's why blk-mq was designed to handle parallel
activity more efficiently.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-10 11:39     ` Christoph Hellwig
  2014-02-10 20:09       ` Jens Axboe
@ 2014-02-10 21:10       ` James Bottomley
  1 sibling, 0 replies; 52+ messages in thread
From: James Bottomley @ 2014-02-10 21:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Nicholas Bellinger, linux-scsi


On Mon, 2014-02-10 at 03:39 -0800, Christoph Hellwig wrote:
> On Thu, Feb 06, 2014 at 08:56:59AM -0800, James Bottomley wrote:
> > I'm dubious about replacing a locked set of checks and increments with
> > atomics for the simple reason that atomics are pretty expensive on
> > non-x86, so you've likely slowed the critical path down for them.  Even
> > on x86, atomics can be very expensive because of the global bus lock.  I
> > think about three of them in a row is where you might as well stick with
> > the lock.
> 
> The three of them replace at least two locks when using blk-mq.  Until
> we use blk-mq, where they also avoid the queue_lock, we could keep the
> per-device counters as-is.

I'm not saying never, I'm just really dubious about this bit and the
potential cost on other platforms ... at the very least, surely the
device busy can still be a counter because of the current threading
guarantees?

> As Bart's numbers have shown, this is definitely a major improvement
> on x86; for other architectures we'd need someone to run benchmarks
> on useful hardware.  Maybe some of the IBM people on the list could
> help out on PPC and S/390?

Well, no, they haven't.  The numbers were an assertion, not a benchmark,
and 3.8% can be within the error margin of anybody's tests.  Even if the
actual benchmark were published, I can't see it being convincing because
there's too much potential variation (other architectures, different
ways of testing) for such a small result.

I'm happy to take shortening critical sections and removing atomics in
get/put because they're obvious wins without needing benchmark
justification.  I don't really want to take the dubious stuff (like spin
lock replacement with atomic) until we see the shape of what we can do
with the block mq stuff and what's necessary and what isn't.

James



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-10 20:09       ` Jens Axboe
@ 2014-02-17  9:36         ` Bart Van Assche
  2014-02-17 22:00           ` Christoph Hellwig
  0 siblings, 1 reply; 52+ messages in thread
From: Bart Van Assche @ 2014-02-17  9:36 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: James Bottomley, Nicholas Bellinger, linux-scsi

On 02/10/14 21:09, Jens Axboe wrote:
> On Mon, Feb 10 2014, Christoph Hellwig wrote:
>>> I also think we should be getting more utility out of threading
>>> guarantees.  So, if there's only one thread active per device we don't
>>> need any device counters to be atomic.  Likewise, u32 read/write is an
>>> atomic operation, so we might be able to use sloppy counters for the
>>> target and host stuff (one per CPU that are incremented/decremented on
>>> that CPU ... this will only work using CPU locality ... completion on
>>> same CPU but that seems to be an element of a lot of stuff nowadays).
>>
>> The blk-mq code is aiming for CPU locality, but there are no hard
>> guarantees.  I'm also not sure always bouncing around the I/O submission
>> is a win, but it might be something to play around with at the block
>> layer.
>>
>> Jens, did you try something like this earlier?
> 
> Nope, I've always thought that if you needed to bounce submission
> around, you would already have lost. [ ... ]

This comment makes a lot of sense to me. The approach that has been
taken in the scsi-mq patches that have been posted on February 5 is to
associate one blk-mq device with each LUN. That blk-mq device has one
hctx with queue depth shost->cmd_per_lun. So if there are multiple LUNs
per SCSI host the combined hctx queue depth of its LUNs can exceed
shost->can_queue. I'm not sure whether it's possible to prevent this
without modifying the block layer. How about modifying the block layer
such that a single hctx can be shared by multiple block devices and
adding cmd_per_lun support in the block layer?  I think that would avoid
having to bounce submission in the SCSI mid-layer.
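
(As an illustrative back-of-the-envelope example with made-up numbers: with
the cmd_per_lun = 64 used in the scsi_debug snippet earlier in the thread,
sixteen LUNs on a single host could have up to 16 * 64 = 1024 commands
admitted by their per-LUN hctxs, which can easily exceed the host's
can_queue.)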

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-17  9:36         ` Bart Van Assche
@ 2014-02-17 22:00           ` Christoph Hellwig
  2014-02-26 15:39             ` Bart Van Assche
  0 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2014-02-17 22:00 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Nicholas Bellinger, linux-scsi

On Mon, Feb 17, 2014 at 10:36:20AM +0100, Bart Van Assche wrote:
> This comment makes a lot of sense to me. The approach that has been
> taken in the scsi-mq patches that have been posted on February 5 is to
> associate one blk-mq device with each LUN. That blk-mq device has one
> hctx with queue depth shost->cmd_per_lun. So if there are multiple LUNs
> per SCSI host the combined hctx queue depth of its LUNs can exceed
> shost->can_queue. I'm not sure whether it's possible to prevent this
> without modifying the block layer. How about modifying the block layer
> such that a single hctx can be shared by multiple block devices and
> adding cmd_per_lun support in the block layer?  I think that would avoid
> having to bounce submission in the SCSI mid-layer.

Most of the scsi multiqueue work so far has been about modifying the
block layer, so I'm definitely not shy about doing that where needed.
And I think we will eventually need to be able to have an n:m queue to hctx
mapping instead of the current 1:n one.

I think the biggest issue will be to sort out the tag allocation.  I'm
going to look into this whole cluster of issues once the basic
functionality is fully working.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready
  2014-02-17 22:00           ` Christoph Hellwig
@ 2014-02-26 15:39             ` Bart Van Assche
  0 siblings, 0 replies; 52+ messages in thread
From: Bart Van Assche @ 2014-02-26 15:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, James Bottomley, Nicholas Bellinger, linux-scsi

On 02/17/14 23:00, Christoph Hellwig wrote:
> Most of the scsi multiqueue work so far has been about modifying the
> block layer, so I'm definitely not shy about doing that where needed.
> And I think we will eventually need to be able to have an n:m queue to hctx
> mapping instead of the current 1:n one.

I think it would be great if the blk-mq core would support an n:m queue
to hctx mapping. One of the advantages of that approach would be that
the blk-mq layer already keeps track of how many commands are queued per
hctx, and hence that would allow eliminating the host_busy counter from
the SCSI mid-layer. However, this involves a change in semantics. I hope
it's fine to change the semantics of the shost->can_queue parameter from
"maximum number of commands queued per SCSI host" to "maximum number
of commands queued per hctx"?

Bart.

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2014-02-26 15:39 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-05 12:39 [PATCH 00/17] SCSI data path micro-optimizations Christoph Hellwig
2014-02-05 12:39 ` [PATCH 01/17] scsi: handle command allocation failure in scsi_reset_provider Christoph Hellwig
2014-02-05 12:39 ` [PATCH 02/17] megaraid: simplify internal command handling Christoph Hellwig
2014-02-06 16:40   ` Christoph Hellwig
2014-02-05 12:39 ` [PATCH 03/17] scsi: remove scsi_allocate_command/scsi_free_command Christoph Hellwig
2014-02-05 12:39 ` [PATCH 04/17] scsi: avoid useless free_list lock roundtrips Christoph Hellwig
2014-02-05 23:44   ` James Bottomley
2014-02-06 16:22     ` Christoph Hellwig
2014-02-07  9:05   ` Paolo Bonzini
2014-02-05 12:39 ` [PATCH 05/17] scsi: simplify command allocation and freeing a bit Christoph Hellwig
2014-02-05 23:51   ` James Bottomley
2014-02-06 16:21     ` Christoph Hellwig
2014-02-05 12:39 ` [PATCH 06/17] scsi: add support for per-host cmd pools Christoph Hellwig
2014-02-07  9:13   ` Paolo Bonzini
2014-02-07 12:44     ` Christoph Hellwig
2014-02-07  9:35   ` Mike Christie
2014-02-07 12:46     ` Christoph Hellwig
2014-02-07 21:43       ` Mike Christie
2014-02-10 12:20         ` Christoph Hellwig
2014-02-05 12:39 ` [PATCH 07/17] virtio_scsi: use cmd_size Christoph Hellwig
2014-02-07  9:13   ` Paolo Bonzini
2014-02-05 12:39 ` [PATCH 08/17] scsi: do not manipulate device reference counts in scsi_get/put_command Christoph Hellwig
2014-02-05 12:39 ` [PATCH 09/17] scsi: micro-optimize scsi_request_fn() Christoph Hellwig
2014-02-05 12:39 ` [PATCH 10/17] scsi: micro-optimize scsi_next_command() Christoph Hellwig
2014-02-05 12:39 ` [PATCH 11/17] scsi: micro-optimize scsi_requeue_command() Christoph Hellwig
2014-02-05 12:39 ` [PATCH 12/17] scsi: avoid taking host_lock in scsi_run_queue unless nessecary Christoph Hellwig
2014-02-05 23:54   ` James Bottomley
2014-02-06 16:19     ` Christoph Hellwig
2014-02-05 12:39 ` [PATCH 13/17] scsi: push host_lock down into scsi_{host,target}_queue_ready Christoph Hellwig
2014-02-06 16:56   ` James Bottomley
2014-02-06 17:10     ` Bart Van Assche
2014-02-06 18:41       ` James Bottomley
2014-02-07 10:42         ` Bart Van Assche
2014-02-06 21:58       ` Nicholas A. Bellinger
2014-02-07 10:32         ` Bart Van Assche
2014-02-07 19:30           ` Nicholas A. Bellinger
2014-02-08 11:00             ` Bart Van Assche
2014-02-09  8:26               ` Nicholas A. Bellinger
2014-02-10 12:09                 ` Christoph Hellwig
2014-02-10 19:53                   ` Nicholas A. Bellinger
2014-02-10 11:39     ` Christoph Hellwig
2014-02-10 20:09       ` Jens Axboe
2014-02-17  9:36         ` Bart Van Assche
2014-02-17 22:00           ` Christoph Hellwig
2014-02-26 15:39             ` Bart Van Assche
2014-02-10 21:10       ` James Bottomley
2014-02-05 12:39 ` [PATCH 14/17] scsi: convert target_busy to an atomic_t Christoph Hellwig
2014-02-05 12:39 ` [PATCH 15/17] scsi: convert host_busy to atomic_t Christoph Hellwig
2014-02-05 12:39 ` [PATCH 16/17] scsi: convert device_busy " Christoph Hellwig
2014-02-05 12:39 ` [PATCH 17/17] scsi: fix the {host,target,device}_blocked counter mess Christoph Hellwig
2014-02-05 23:41 ` [PATCH 00/17] SCSI data path micro-optimizations James Bottomley
2014-02-06 16:29   ` Christoph Hellwig
