All of lore.kernel.org
 help / color / mirror / Atom feed
* hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation
@ 2011-01-29 22:38 Nicholas A. Bellinger
  2011-01-30  3:26 ` Jeff Garzik
  2011-01-31 14:24 ` scameron
  0 siblings, 2 replies; 7+ messages in thread
From: Nicholas A. Bellinger @ 2011-01-29 22:38 UTC (permalink / raw)
  To: Stephen M. Cameron; +Cc: James Bottomley, J.H., linux-scsi

Hi Stephen and Co,

Attached is a quick patch to enable modern SCSI host_lock' less
operation for the HPSA SCSI LLD driver on >= .37 kernels.  After an
initial LLD conversion and review, I would like to verify with you that
the following calls are currently protected by HPDA LLD internal locks
that disable interrupts from the newly renamed hpsa_scsi_queue_command()
dispatcher below..

*) For cmd_alloc(), struct ctlr_info *h->lock + spin_lock_irqsave() is
obtained and released.

*) For enqueue_cmd_and_start_io(), struct ctlr_info *h->lock +
spin_lock_irqsave() is obtained for addQ() + start_io() and released.

So the one new piece wrt to host_lock less mode that is *not* protected
in hpsa_scsi_queue_command() is the call to hpsa_scatter_gather().  This
should be OK to get called without h->lock held w/ spin_lock_irqsave(),
yes..?

So far this patch has been compile tested only.  Please have a look and
comment.

Thanks!

--nab

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 12deffc..e205f33 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1906,9 +1906,11 @@ sglist_finished:
        return 0;
 }
 
-
-static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
-       void (*done)(struct scsi_cmnd *))
+/*
+ * Running in struct Scsi_Host->host_lock less mode using LLD internal
+ * struct ctlr_info *h->lock w/ spin_lock_irqsave() protection.
+ */
+static int hpsa_scsi_queue_command(struct Scsi_Host *sh, struct scsi_cmnd *cmd)
 {
        struct ctlr_info *h;
        struct hpsa_scsi_dev_t *dev;
@@ -1921,7 +1923,7 @@ static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
        dev = cmd->device->hostdata;
        if (!dev) {
                cmd->result = DID_NO_CONNECT << 16;
-               done(cmd);
+               cmd->scsi_done(cmd);
                return 0;
        }
        memcpy(scsi3addr, dev->scsi3addr, sizeof(scsi3addr));
@@ -1936,9 +1938,6 @@ static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
        }
 
        /* Fill in the command list header */
-
-       cmd->scsi_done = done;    /* save this for use by completion code */
-
        /* save c in case we have to abort it  */
        cmd->host_scribble = (unsigned char *) c;
 
@@ -2001,8 +2000,6 @@ static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
        return 0;
 }
 
-static DEF_SCSI_QCMD(hpsa_scsi_queue_command)
-
 static void hpsa_scan_start(struct Scsi_Host *sh)
 {
        struct ctlr_info *h = shost_to_hba(sh);



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation
  2011-01-29 22:38 hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation Nicholas A. Bellinger
@ 2011-01-30  3:26 ` Jeff Garzik
  2011-01-30 21:18   ` Nicholas A. Bellinger
  2011-01-31 14:24 ` scameron
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Garzik @ 2011-01-30  3:26 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Stephen M. Cameron, James Bottomley, J.H., linux-scsi

On 01/29/2011 05:38 PM, Nicholas A. Bellinger wrote:
> Hi Stephen and Co,
>
> Attached is a quick patch to enable modern SCSI host_lock' less
> operation for the HPSA SCSI LLD driver on>= .37 kernels.  After an
> initial LLD conversion and review, I would like to verify with you that
> the following calls are currently protected by HPDA LLD internal locks
> that disable interrupts from the newly renamed hpsa_scsi_queue_command()
> dispatcher below..
>
> *) For cmd_alloc(), struct ctlr_info *h->lock + spin_lock_irqsave() is
> obtained and released.
>
> *) For enqueue_cmd_and_start_io(), struct ctlr_info *h->lock +
> spin_lock_irqsave() is obtained for addQ() + start_io() and released.
>
> So the one new piece wrt to host_lock less mode that is *not* protected
> in hpsa_scsi_queue_command() is the call to hpsa_scatter_gather().  This
> should be OK to get called without h->lock held w/ spin_lock_irqsave(),
> yes..?
>
> So far this patch has been compile tested only.  Please have a look and
> comment.
>
> Thanks!
>
> --nab
>
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 12deffc..e205f33 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1906,9 +1906,11 @@ sglist_finished:
>          return 0;
>   }
>
> -
> -static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
> -       void (*done)(struct scsi_cmnd *))
> +/*
> + * Running in struct Scsi_Host->host_lock less mode using LLD internal
> + * struct ctlr_info *h->lock w/ spin_lock_irqsave() protection.
> + */

The way I read this comment, it initially misled me into thinking that 
this queuecommand was somehow wrapped entirely within this protection 
you described.

Only after an in-depth review did I figure out that cmd_alloc() and 
enqueue_cmd_and_start_io() did all the locking necessary.

Therefore, here are some observations and recommendations:

* the code changes are correct.  Reviewed-by: Jeff Garzik 
<jgarzik@redhat.com>

* delete the comment.  lack of "_lck" alone tells you its lockless.

* I question whether the following hpsa.c logic
	lock_irqsave
	cmd_alloc()
	unlock_irqrestore

	init cmd

	lock_irqsave
	enqueue
	unlock_irqrestore
   isn't more expensive than simply
	lock_irqsave
	cmd_alloc()
	init cmd
	enqueue
	unlock_irqrestore

It seems like the common case would cycle the spinlock and interrupts 
twice for an uncontended lock, which the initialization portion of the 
cmd, which may indeed occur in parallel, is so cheap you'll spend more 
time on the double-lock than anywhere else.

	Jeff



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation
  2011-01-30  3:26 ` Jeff Garzik
@ 2011-01-30 21:18   ` Nicholas A. Bellinger
  2011-01-31 14:42     ` scameron
  0 siblings, 1 reply; 7+ messages in thread
From: Nicholas A. Bellinger @ 2011-01-30 21:18 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Stephen M. Cameron, James Bottomley, J.H., linux-scsi

On Sat, 2011-01-29 at 22:26 -0500, Jeff Garzik wrote:
> On 01/29/2011 05:38 PM, Nicholas A. Bellinger wrote:
> > Hi Stephen and Co,
> >
> > Attached is a quick patch to enable modern SCSI host_lock' less
> > operation for the HPSA SCSI LLD driver on>= .37 kernels.  After an
> > initial LLD conversion and review, I would like to verify with you that
> > the following calls are currently protected by HPDA LLD internal locks
> > that disable interrupts from the newly renamed hpsa_scsi_queue_command()
> > dispatcher below..
> >
> > *) For cmd_alloc(), struct ctlr_info *h->lock + spin_lock_irqsave() is
> > obtained and released.
> >
> > *) For enqueue_cmd_and_start_io(), struct ctlr_info *h->lock +
> > spin_lock_irqsave() is obtained for addQ() + start_io() and released.
> >
> > So the one new piece wrt to host_lock less mode that is *not* protected
> > in hpsa_scsi_queue_command() is the call to hpsa_scatter_gather().  This
> > should be OK to get called without h->lock held w/ spin_lock_irqsave(),
> > yes..?
> >
> > So far this patch has been compile tested only.  Please have a look and
> > comment.
> >
> > Thanks!
> >
> > --nab
> >
> > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > index 12deffc..e205f33 100644
> > --- a/drivers/scsi/hpsa.c
> > +++ b/drivers/scsi/hpsa.c
> > @@ -1906,9 +1906,11 @@ sglist_finished:
> >          return 0;
> >   }
> >
> > -
> > -static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
> > -       void (*done)(struct scsi_cmnd *))
> > +/*
> > + * Running in struct Scsi_Host->host_lock less mode using LLD internal
> > + * struct ctlr_info *h->lock w/ spin_lock_irqsave() protection.
> > + */
> 
> The way I read this comment, it initially misled me into thinking that 
> this queuecommand was somehow wrapped entirely within this protection 
> you described.
> 
> Only after an in-depth review did I figure out that cmd_alloc() and 
> enqueue_cmd_and_start_io() did all the locking necessary.
> 
> Therefore, here are some observations and recommendations:
> 
> * the code changes are correct.  Reviewed-by: Jeff Garzik 
> <jgarzik@redhat.com>
> 
> * delete the comment.  lack of "_lck" alone tells you its lockless.
> 
> * I question whether the following hpsa.c logic
> 	lock_irqsave
> 	cmd_alloc()
> 	unlock_irqrestore
> 
> 	init cmd
> 
> 	lock_irqsave
> 	enqueue
> 	unlock_irqrestore
>    isn't more expensive than simply
> 	lock_irqsave
> 	cmd_alloc()
> 	init cmd
> 	enqueue
> 	unlock_irqrestore
> 
> It seems like the common case would cycle the spinlock and interrupts 
> twice for an uncontended lock, which the initialization portion of the 
> cmd, which may indeed occur in parallel, is so cheap you'll spend more 
> time on the double-lock than anywhere else.
> 

Hi Jeff,

I was wondering about the overhead of double h->lock cycle for parallel
struct scsi_cmnd -> struct CommandList initialization vs. single h->
lock cycle..  Thanks for your input here!

Attached is a v2 to address your comments.

--nab

------------------------------------------------------------------------

[PATCH] hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation

This patch first converts HPSA to run in struct Scsi_Host->host_lock'
less operation by removing DEF_SCSI_QCMD() and '_lck' suffix from the
hpsa_scsi_queue_command() I/O dispatcher.

Secondly in hpsa_scsi_queue_command() the struct ctlr_info *h->lock is
now held a single lock obtain/release cycle while struct CommandList
initialization is performed, and enqueued into HW.  This enqueuing
is done using a new h->lock unlocked __enqueue_cmd_and_start_io(),
wrapper and conversion of the the original enqueue_cmd_and_start_io()
to use this new code.

Reviewed-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
---
 drivers/scsi/hpsa.c |   34 +++++++++++++++++++---------------
 1 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 12deffc..fff7cd4 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -329,16 +329,25 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c)
                c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
 }

-static void enqueue_cmd_and_start_io(struct ctlr_info *h,
+/*
+ * Must be called with struct ctlr_info *h->lock held w/ interrupts disabled
+ */
+static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
        struct CommandList *c)
 {
-       unsigned long flags;
-
        set_performant_mode(h, c);
-       spin_lock_irqsave(&h->lock, flags);
        addQ(&h->reqQ, c);
        h->Qdepth++;
        start_io(h);
+}
+
+static void enqueue_cmd_and_start_io(struct ctlr_info *h,
+       struct CommandList *c)
+{
+       unsigned long flags;
+
+       spin_lock_irqsave(&h->lock, flags);
+       __enqueue_cmd_and_start_io(h, c);
        spin_unlock_irqrestore(&h->lock, flags);
 }

@@ -1906,9 +1915,7 @@ sglist_finished:
        return 0;
 }

-
-static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
-       void (*done)(struct scsi_cmnd *))
+static int hpsa_scsi_queue_command(struct Scsi_Host *sh, struct scsi_cmnd *cmd)
 {
        struct ctlr_info *h;
        struct hpsa_scsi_dev_t *dev;
@@ -1921,7 +1928,7 @@ static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
        dev = cmd->device->hostdata;
        if (!dev) {
                cmd->result = DID_NO_CONNECT << 16;
-               done(cmd);
+               cmd->scsi_done(cmd);
                return 0;
        }
        memcpy(scsi3addr, dev->scsi3addr, sizeof(scsi3addr));
@@ -1929,16 +1936,13 @@ static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
        /* Need a lock as this is being allocated from the pool */
        spin_lock_irqsave(&h->lock, flags);
        c = cmd_alloc(h);
-       spin_unlock_irqrestore(&h->lock, flags);
        if (c == NULL) {                        /* trouble... */
                dev_err(&h->pdev->dev, "cmd_alloc returned NULL!\n");
+               spin_unlock_irqrestore(&h->lock, flags);
                return SCSI_MLQUEUE_HOST_BUSY;
        }

        /* Fill in the command list header */
-
-       cmd->scsi_done = done;    /* save this for use by completion code */
-
        /* save c in case we have to abort it  */
        cmd->host_scribble = (unsigned char *) c;

@@ -1994,15 +1998,15 @@ static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,

        if (hpsa_scatter_gather(h, c, cmd) < 0) { /* Fill SG list */
                cmd_free(h, c);
+               spin_unlock_irqrestore(&h->lock, flags);
                return SCSI_MLQUEUE_HOST_BUSY;
        }
-       enqueue_cmd_and_start_io(h, c);
+       __enqueue_cmd_and_start_io(h, c);
+       spin_unlock_irqrestore(&h->lock, flags);
        /* the cmd'll come back via intr handler in complete_scsi_command()  */
        return 0;
 }

-static DEF_SCSI_QCMD(hpsa_scsi_queue_command)
-
 static void hpsa_scan_start(struct Scsi_Host *sh)
 {
        struct ctlr_info *h = shost_to_hba(sh);
-- 
1.7.3.5






^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation
  2011-01-29 22:38 hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation Nicholas A. Bellinger
  2011-01-30  3:26 ` Jeff Garzik
@ 2011-01-31 14:24 ` scameron
  1 sibling, 0 replies; 7+ messages in thread
From: scameron @ 2011-01-31 14:24 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: James Bottomley, J.H., linux-scsi

On Sat, Jan 29, 2011 at 02:38:22PM -0800, Nicholas A. Bellinger wrote:
> Hi Stephen and Co,
> 
> Attached is a quick patch to enable modern SCSI host_lock' less
> operation for the HPSA SCSI LLD driver on >= .37 kernels.  After an
> initial LLD conversion and review, I would like to verify with you that
> the following calls are currently protected by HPDA LLD internal locks
> that disable interrupts from the newly renamed hpsa_scsi_queue_command()
> dispatcher below..
> 
> *) For cmd_alloc(), struct ctlr_info *h->lock + spin_lock_irqsave() is
> obtained and released.

Yes (but not inside cmd_alloc()), the lock is taken to protect
the bitmap that's used to track allocated commands, h->cmd_pool_bits.

> 
> *) For enqueue_cmd_and_start_io(), struct ctlr_info *h->lock +
> spin_lock_irqsave() is obtained for addQ() + start_io() and released.

Yes.

> 
> So the one new piece wrt to host_lock less mode that is *not* protected
> in hpsa_scsi_queue_command() is the call to hpsa_scatter_gather().  This
> should be OK to get called without h->lock held w/ spin_lock_irqsave(),
> yes..?

Yes, hpsa_scatter_gather() doesn't require the lock to be held.

> 
> So far this patch has been compile tested only.  Please have a look and
> comment.

Nothing obviously wrong that I see.

-- steve

> 
> Thanks!
> 
> --nab
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 12deffc..e205f33 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1906,9 +1906,11 @@ sglist_finished:
>         return 0;
>  }
>  
> -
> -static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
> -       void (*done)(struct scsi_cmnd *))
> +/*
> + * Running in struct Scsi_Host->host_lock less mode using LLD internal
> + * struct ctlr_info *h->lock w/ spin_lock_irqsave() protection.
> + */
> +static int hpsa_scsi_queue_command(struct Scsi_Host *sh, struct scsi_cmnd *cmd)
>  {
>         struct ctlr_info *h;
>         struct hpsa_scsi_dev_t *dev;
> @@ -1921,7 +1923,7 @@ static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
>         dev = cmd->device->hostdata;
>         if (!dev) {
>                 cmd->result = DID_NO_CONNECT << 16;
> -               done(cmd);
> +               cmd->scsi_done(cmd);
>                 return 0;
>         }
>         memcpy(scsi3addr, dev->scsi3addr, sizeof(scsi3addr));
> @@ -1936,9 +1938,6 @@ static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
>         }
>  
>         /* Fill in the command list header */
> -
> -       cmd->scsi_done = done;    /* save this for use by completion code */
> -
>         /* save c in case we have to abort it  */
>         cmd->host_scribble = (unsigned char *) c;
>  
> @@ -2001,8 +2000,6 @@ static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
>         return 0;
>  }
>  
> -static DEF_SCSI_QCMD(hpsa_scsi_queue_command)
> -
>  static void hpsa_scan_start(struct Scsi_Host *sh)
>  {
>         struct ctlr_info *h = shost_to_hba(sh);
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation
  2011-01-30 21:18   ` Nicholas A. Bellinger
@ 2011-01-31 14:42     ` scameron
  2011-01-31 20:26       ` Nicholas A. Bellinger
  0 siblings, 1 reply; 7+ messages in thread
From: scameron @ 2011-01-31 14:42 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Jeff Garzik, James Bottomley, J.H., linux-scsi, scameron

On Sun, Jan 30, 2011 at 01:18:19PM -0800, Nicholas A. Bellinger wrote:
> On Sat, 2011-01-29 at 22:26 -0500, Jeff Garzik wrote:
> > On 01/29/2011 05:38 PM, Nicholas A. Bellinger wrote:
> > > Hi Stephen and Co,
> > >
> > > Attached is a quick patch to enable modern SCSI host_lock' less
> > > operation for the HPSA SCSI LLD driver on>= .37 kernels.  After an
> > > initial LLD conversion and review, I would like to verify with you that
> > > the following calls are currently protected by HPDA LLD internal locks
> > > that disable interrupts from the newly renamed hpsa_scsi_queue_command()
> > > dispatcher below..
> > >
> > > *) For cmd_alloc(), struct ctlr_info *h->lock + spin_lock_irqsave() is
> > > obtained and released.
> > >
> > > *) For enqueue_cmd_and_start_io(), struct ctlr_info *h->lock +
> > > spin_lock_irqsave() is obtained for addQ() + start_io() and released.
> > >
> > > So the one new piece wrt to host_lock less mode that is *not* protected
> > > in hpsa_scsi_queue_command() is the call to hpsa_scatter_gather().  This
> > > should be OK to get called without h->lock held w/ spin_lock_irqsave(),
> > > yes..?
> > >
> > > So far this patch has been compile tested only.  Please have a look and
> > > comment.
> > >
> > > Thanks!
> > >
> > > --nab
> > >
> > > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > > index 12deffc..e205f33 100644
> > > --- a/drivers/scsi/hpsa.c
> > > +++ b/drivers/scsi/hpsa.c
> > > @@ -1906,9 +1906,11 @@ sglist_finished:
> > >          return 0;
> > >   }
> > >
> > > -
> > > -static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
> > > -       void (*done)(struct scsi_cmnd *))
> > > +/*
> > > + * Running in struct Scsi_Host->host_lock less mode using LLD internal
> > > + * struct ctlr_info *h->lock w/ spin_lock_irqsave() protection.
> > > + */
> > 
> > The way I read this comment, it initially misled me into thinking that 
> > this queuecommand was somehow wrapped entirely within this protection 
> > you described.
> > 
> > Only after an in-depth review did I figure out that cmd_alloc() and 
> > enqueue_cmd_and_start_io() did all the locking necessary.
> > 
> > Therefore, here are some observations and recommendations:
> > 
> > * the code changes are correct.  Reviewed-by: Jeff Garzik 
> > <jgarzik@redhat.com>
> > 
> > * delete the comment.  lack of "_lck" alone tells you its lockless.
> > 
> > * I question whether the following hpsa.c logic
> > 	lock_irqsave
> > 	cmd_alloc()
> > 	unlock_irqrestore
> > 
> > 	init cmd
> > 
> > 	lock_irqsave
> > 	enqueue
> > 	unlock_irqrestore
> >    isn't more expensive than simply
> > 	lock_irqsave
> > 	cmd_alloc()
> > 	init cmd
> > 	enqueue
> > 	unlock_irqrestore
> > 
> > It seems like the common case would cycle the spinlock and interrupts 
> > twice for an uncontended lock, which the initialization portion of the 
> > cmd, which may indeed occur in parallel, is so cheap you'll spend more 
> > time on the double-lock than anywhere else.
> > 
> 
> Hi Jeff,
> 
> I was wondering about the overhead of double h->lock cycle for parallel
> struct scsi_cmnd -> struct CommandList initialization vs. single h->
> lock cycle..  Thanks for your input here!
> 

I'll defer to Jeff's expertise on this, though some measurements 
might be nice.  Might be hard to measure though, as I suspect if
you were able to make this part of the code infinitely fast, you'd
still have a hard time telling the difference, as commands spend 
such a small proportion of their lifetimes in there... though if
it eliminates or mitigates some sort of bad, weird interlocking
behavior between this code and the interrupt handler or something
like that, (not that I'm aware of any such bad behavior) I suppose
it might make a noticeable difference.
 
> Attached is a v2 to address your comments.
> 
> --nab
> 
> ------------------------------------------------------------------------
> 
> [PATCH] hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation
> 
> This patch first converts HPSA to run in struct Scsi_Host->host_lock'
> less operation by removing DEF_SCSI_QCMD() and '_lck' suffix from the
> hpsa_scsi_queue_command() I/O dispatcher.
> 
> Secondly in hpsa_scsi_queue_command() the struct ctlr_info *h->lock is
> now held a single lock obtain/release cycle while struct CommandList
> initialization is performed, and enqueued into HW.  This enqueuing
> is done using a new h->lock unlocked __enqueue_cmd_and_start_io(),
> wrapper and conversion of the the original enqueue_cmd_and_start_io()
> to use this new code.
> 
> Reviewed-by: Jeff Garzik <jgarzik@redhat.com>
> Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
> ---
>  drivers/scsi/hpsa.c |   34 +++++++++++++++++++---------------
>  1 files changed, 19 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 12deffc..fff7cd4 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -329,16 +329,25 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c)
>                 c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
>  }
> 
> -static void enqueue_cmd_and_start_io(struct ctlr_info *h,
> +/*
> + * Must be called with struct ctlr_info *h->lock held w/ interrupts disabled
> + */
> +static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
>         struct CommandList *c)
>  {
> -       unsigned long flags;
> -
>         set_performant_mode(h, c);
> -       spin_lock_irqsave(&h->lock, flags);
>         addQ(&h->reqQ, c);
>         h->Qdepth++;
>         start_io(h);
> +}
> +
> +static void enqueue_cmd_and_start_io(struct ctlr_info *h,
> +       struct CommandList *c)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&h->lock, flags);
> +       __enqueue_cmd_and_start_io(h, c);
>         spin_unlock_irqrestore(&h->lock, flags);
>  }

Should that be "static inline"?  Or maybe the compiler's smart enough
to decide whether to inline that on its own these days?

> 
> @@ -1906,9 +1915,7 @@ sglist_finished:
>         return 0;
>  }
> 
> -
> -static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
> -       void (*done)(struct scsi_cmnd *))
> +static int hpsa_scsi_queue_command(struct Scsi_Host *sh, struct scsi_cmnd *cmd)
>  {
>         struct ctlr_info *h;
>         struct hpsa_scsi_dev_t *dev;
> @@ -1921,7 +1928,7 @@ static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
>         dev = cmd->device->hostdata;
>         if (!dev) {
>                 cmd->result = DID_NO_CONNECT << 16;
> -               done(cmd);
> +               cmd->scsi_done(cmd);
>                 return 0;
>         }
>         memcpy(scsi3addr, dev->scsi3addr, sizeof(scsi3addr));
> @@ -1929,16 +1936,13 @@ static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
>         /* Need a lock as this is being allocated from the pool */
>         spin_lock_irqsave(&h->lock, flags);
>         c = cmd_alloc(h);
> -       spin_unlock_irqrestore(&h->lock, flags);
>         if (c == NULL) {                        /* trouble... */
>                 dev_err(&h->pdev->dev, "cmd_alloc returned NULL!\n");
> +               spin_unlock_irqrestore(&h->lock, flags);
>                 return SCSI_MLQUEUE_HOST_BUSY;
>         }
> 
>         /* Fill in the command list header */
> -
> -       cmd->scsi_done = done;    /* save this for use by completion code */
> -
>         /* save c in case we have to abort it  */
>         cmd->host_scribble = (unsigned char *) c;
> 
> @@ -1994,15 +1998,15 @@ static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
> 
>         if (hpsa_scatter_gather(h, c, cmd) < 0) { /* Fill SG list */
>                 cmd_free(h, c);
> +               spin_unlock_irqrestore(&h->lock, flags);
>                 return SCSI_MLQUEUE_HOST_BUSY;
>         }
> -       enqueue_cmd_and_start_io(h, c);
> +       __enqueue_cmd_and_start_io(h, c);
> +       spin_unlock_irqrestore(&h->lock, flags);
>         /* the cmd'll come back via intr handler in complete_scsi_command()  */
>         return 0;
>  }
> 
> -static DEF_SCSI_QCMD(hpsa_scsi_queue_command)
> -
>  static void hpsa_scan_start(struct Scsi_Host *sh)
>  {
>         struct ctlr_info *h = shost_to_hba(sh);
> -- 
> 1.7.3.5
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation
  2011-01-31 14:42     ` scameron
@ 2011-01-31 20:26       ` Nicholas A. Bellinger
  2011-02-01  9:13         ` Boaz Harrosh
  0 siblings, 1 reply; 7+ messages in thread
From: Nicholas A. Bellinger @ 2011-01-31 20:26 UTC (permalink / raw)
  To: scameron; +Cc: Jeff Garzik, James Bottomley, J.H., linux-scsi

On Mon, 2011-01-31 at 08:42 -0600, scameron@beardog.cce.hp.com wrote:
> On Sun, Jan 30, 2011 at 01:18:19PM -0800, Nicholas A. Bellinger wrote:
> > On Sat, 2011-01-29 at 22:26 -0500, Jeff Garzik wrote:
> > > On 01/29/2011 05:38 PM, Nicholas A. Bellinger wrote:

<SNIP>

> > > > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > > > index 12deffc..e205f33 100644
> > > > --- a/drivers/scsi/hpsa.c
> > > > +++ b/drivers/scsi/hpsa.c
> > > > @@ -1906,9 +1906,11 @@ sglist_finished:
> > > >          return 0;
> > > >   }
> > > >
> > > > -
> > > > -static int hpsa_scsi_queue_command_lck(struct scsi_cmnd *cmd,
> > > > -       void (*done)(struct scsi_cmnd *))
> > > > +/*
> > > > + * Running in struct Scsi_Host->host_lock less mode using LLD internal
> > > > + * struct ctlr_info *h->lock w/ spin_lock_irqsave() protection.
> > > > + */
> > > 
> > > The way I read this comment, it initially misled me into thinking that 
> > > this queuecommand was somehow wrapped entirely within this protection 
> > > you described.
> > > 
> > > Only after an in-depth review did I figure out that cmd_alloc() and 
> > > enqueue_cmd_and_start_io() did all the locking necessary.
> > > 
> > > Therefore, here are some observations and recommendations:
> > > 
> > > * the code changes are correct.  Reviewed-by: Jeff Garzik 
> > > <jgarzik@redhat.com>
> > > 
> > > * delete the comment.  lack of "_lck" alone tells you its lockless.
> > > 
> > > * I question whether the following hpsa.c logic
> > > 	lock_irqsave
> > > 	cmd_alloc()
> > > 	unlock_irqrestore
> > > 
> > > 	init cmd
> > > 
> > > 	lock_irqsave
> > > 	enqueue
> > > 	unlock_irqrestore
> > >    isn't more expensive than simply
> > > 	lock_irqsave
> > > 	cmd_alloc()
> > > 	init cmd
> > > 	enqueue
> > > 	unlock_irqrestore
> > > 
> > > It seems like the common case would cycle the spinlock and interrupts 
> > > twice for an uncontended lock, which the initialization portion of the 
> > > cmd, which may indeed occur in parallel, is so cheap you'll spend more 
> > > time on the double-lock than anywhere else.
> > > 
> > 
> > Hi Jeff,
> > 
> > I was wondering about the overhead of double h->lock cycle for parallel
> > struct scsi_cmnd -> struct CommandList initialization vs. single h->
> > lock cycle..  Thanks for your input here!
> > 
> 
> I'll defer to Jeff's expertise on this, though some measurements 
> might be nice.  Might be hard to measure though, as I suspect if
> you were able to make this part of the code infinitely fast, you'd
> still have a hard time telling the difference, as commands spend 
> such a small proportion of their lifetimes in there...

Unfortuately I do not have HPSA hardware to test this one in-house, but
this patch is actually intended for John + kernel.org mirrors, whom have
been moving from CCISS raw struct block_device -> HPSA LLD struct
scsi_device recently.

He has been observing some unusually high spin_lock contention on one
particular machine using HPSA LUNs w/ 38-rc2 code, and while it's pretty
certain that the HPSA LLD is not the root culprit, I figured that
getting this code properly converted to run w/o host_lock could not hurt
at this point.

> though if
> it eliminates or mitigates some sort of bad, weird interlocking
> behavior between this code and the interrupt handler or something
> like that, (not that I'm aware of any such bad behavior) I suppose
> it might make a noticeable difference.
>  

Understood, and thanks for the clarification here.  Running with
DEF_SCSI_QCMD() host_lock'ed code w/ the original double h->lock cycle
would have likely prevented any bad behaviour here, so holding h->lock
from before cmd_alloc() until after __enque_cmd_and_start_io() for a
single lock cycle per struct scsi_cmnd would seem to be provide similar
protection.


> > Attached is a v2 to address your comments.
> > 
> > --nab
> > 
> > ------------------------------------------------------------------------
> > 
> > [PATCH] hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation
> > 
> > This patch first converts HPSA to run in struct Scsi_Host->host_lock'
> > less operation by removing DEF_SCSI_QCMD() and '_lck' suffix from the
> > hpsa_scsi_queue_command() I/O dispatcher.
> > 
> > Secondly in hpsa_scsi_queue_command() the struct ctlr_info *h->lock is
> > now held a single lock obtain/release cycle while struct CommandList
> > initialization is performed, and enqueued into HW.  This enqueuing
> > is done using a new h->lock unlocked __enqueue_cmd_and_start_io(),
> > wrapper and conversion of the the original enqueue_cmd_and_start_io()
> > to use this new code.
> > 
> > Reviewed-by: Jeff Garzik <jgarzik@redhat.com>
> > Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
> > ---
> >  drivers/scsi/hpsa.c |   34 +++++++++++++++++++---------------
> >  1 files changed, 19 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > index 12deffc..fff7cd4 100644
> > --- a/drivers/scsi/hpsa.c
> > +++ b/drivers/scsi/hpsa.c
> > @@ -329,16 +329,25 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c)
> >                 c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
> >  }
> > 
> > -static void enqueue_cmd_and_start_io(struct ctlr_info *h,
> > +/*
> > + * Must be called with struct ctlr_info *h->lock held w/ interrupts disabled
> > + */
> > +static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
> >         struct CommandList *c)
> >  {
> > -       unsigned long flags;
> > -
> >         set_performant_mode(h, c);
> > -       spin_lock_irqsave(&h->lock, flags);
> >         addQ(&h->reqQ, c);
> >         h->Qdepth++;
> >         start_io(h);
> > +}
> > +
> > +static void enqueue_cmd_and_start_io(struct ctlr_info *h,
> > +       struct CommandList *c)
> > +{
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&h->lock, flags);
> > +       __enqueue_cmd_and_start_io(h, c);
> >         spin_unlock_irqrestore(&h->lock, flags);
> >  }
> 
> Should that be "static inline"?  Or maybe the compiler's smart enough
> to decide whether to inline that on its own these days?
> 

Inlining both of these makes sense to me.  I will add this change and
send out an updated [PATCH] with your Reviewed-by shortly.

Thanks for your comments!

--nab


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation
  2011-01-31 20:26       ` Nicholas A. Bellinger
@ 2011-02-01  9:13         ` Boaz Harrosh
  0 siblings, 0 replies; 7+ messages in thread
From: Boaz Harrosh @ 2011-02-01  9:13 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: scameron, Jeff Garzik, James Bottomley, J.H., linux-scsi

On 01/31/2011 10:26 PM, Nicholas A. Bellinger wrote:
>>> +static void enqueue_cmd_and_start_io(struct ctlr_info *h,
>>> +       struct CommandList *c)
>>> +{
>>> +       unsigned long flags;
>>> +
>>> +       spin_lock_irqsave(&h->lock, flags);
>>> +       __enqueue_cmd_and_start_io(h, c);
>>>         spin_unlock_irqrestore(&h->lock, flags);
>>>  }
>>
>> Should that be "static inline"?  Or maybe the compiler's smart enough
>> to decide whether to inline that on its own these days?
>>
> 
> Inlining both of these makes sense to me.  I will add this change and
> send out an updated [PATCH] with your Reviewed-by shortly.
> 

Don't do that. Yes in an header but in a .c file never do.

If there is only a single user the compiler will inline it.
if there are lots of users then the compiler will choose by
the compile-for-size or not and the function size. In any way
the compiler will make a better choice then you regarding size
and speed gained.

I never put inline in .c files. Just make sure that functions
are not forward declared. (that will kill the inline) and
the gcc will always makes the better choice.

Thanks
Boaz

> Thanks for your comments!
> 
> --nab

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-02-01  9:13 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-29 22:38 hpsa: Convert SCSI LLD ->queuecommand() for host_lock less operation Nicholas A. Bellinger
2011-01-30  3:26 ` Jeff Garzik
2011-01-30 21:18   ` Nicholas A. Bellinger
2011-01-31 14:42     ` scameron
2011-01-31 20:26       ` Nicholas A. Bellinger
2011-02-01  9:13         ` Boaz Harrosh
2011-01-31 14:24 ` scameron

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.