* scsi-mq performance check
@ 2015-12-18 14:58 John Garry
  2015-12-18 15:08 ` Hannes Reinecke
  0 siblings, 1 reply; 6+ messages in thread
From: John Garry @ 2015-12-18 14:58 UTC (permalink / raw)
  To: bart.vanassche, hch, hare, linux-scsi

Hi,

I have started to enable scsi-mq on the HiSilicon SAS driver.

Are there hints/checks I should use to make sure it is configured 
correctly/optimally? In my initial testing I have seen some performance 
improvements, but none like what I have seen in presentations.

Cheers,
John




* Re: scsi-mq performance check
  2015-12-18 14:58 scsi-mq performance check John Garry
@ 2015-12-18 15:08 ` Hannes Reinecke
  2015-12-18 15:19   ` Bart Van Assche
  0 siblings, 1 reply; 6+ messages in thread
From: Hannes Reinecke @ 2015-12-18 15:08 UTC (permalink / raw)
  To: John Garry, bart.vanassche, hch, linux-scsi

On 12/18/2015 03:58 PM, John Garry wrote:
> Hi,
>
> I have started to enable scsi-mq on the HiSilicon SAS driver.
>
> Are there hints/checks I should use to make sure it is configured
> correctly/optimally? In my initial testing I have seen some
> performance improvements, but none like what I have seen in
> presentations.
>
The whole thing is built around having symmetric submit and receive 
queues, so that we can tie a send/receive queue pair to the same 
CPU. With that we can ensure that we don't have any cache 
invalidation, as the request is already in the cache for that CPU 
when the completion is received. _And_ we can get rid of most 
spinlocks as other CPUs cannot access our request.

So make sure the submit and receive queues are set up properly, 
and ensure you don't have any global resources within your driver 
that need to be locked. Or move access to those resources out of 
the fast path.
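
To make the pairing idea concrete, here is a minimal user-space sketch in C.
It is not blk-mq code: `struct queue_pair`, `queue_for_cpu()` and the 64-byte
cache-line size are assumptions made purely for illustration, and in a real
driver the block layer, not the driver, chooses which hardware queue a CPU
submits to. The sketch only shows one submit/complete pair per CPU, with each
pair's state kept on its own cache line so that no other CPU touches it.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed cache-line size */

/* Hypothetical per-pair state: one submit ring and its completion ring. */
struct queue_pair {
    _Alignas(CACHE_LINE) uint32_t submit_head; /* next free submit slot */
    uint32_t complete_tail;                    /* next completion to reap */
    uint64_t inflight;                         /* requests not yet completed */
};

/* Map a CPU to the queue pair it both submits on and completes on. */
static inline unsigned int queue_for_cpu(unsigned int cpu,
                                         unsigned int nr_queues)
{
    assert(nr_queues > 0);
    return cpu % nr_queues;
}
```

If there are more CPUs than queue pairs, several CPUs end up sharing a pair
and some form of locking per pair is needed again.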

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: scsi-mq performance check
  2015-12-18 15:08 ` Hannes Reinecke
@ 2015-12-18 15:19   ` Bart Van Assche
  2015-12-18 15:36     ` John Garry
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2015-12-18 15:19 UTC (permalink / raw)
  To: Hannes Reinecke, John Garry, bart.vanassche, hch, linux-scsi

On 12/18/2015 04:08 PM, Hannes Reinecke wrote:
> The whole thing is built around having symmetric submit and receive
> queues, so that we can tie a send/receive queue pair to the same CPU.
> With that we can ensure that we don't have any cache invalidation, as
> the request is already in the cache for that CPU when the completion is
> received. _And_ we can get rid of most spinlocks as other CPUs cannot
> access our request.
>
> So make sure the submit and receive queues are set up properly, and
> ensure you don't have any global resources within your driver that
> need to be locked. Or move access to those resources out of the fast path.

Hello John,

It's great news that you started looking into scsi-mq support :-) As 
Hannes wrote, if the performance improvement is not as big as you 
expected, this could be caused by, e.g., lock contention. Are you familiar 
with the perf tool? It can be a great help in verifying whether lock 
contention occurs and which lock(s) cause it.

Bart.



* Re: scsi-mq performance check
  2015-12-18 15:19   ` Bart Van Assche
@ 2015-12-18 15:36     ` John Garry
  2015-12-18 16:05       ` Hannes Reinecke
  0 siblings, 1 reply; 6+ messages in thread
From: John Garry @ 2015-12-18 15:36 UTC (permalink / raw)
  To: Bart Van Assche, Hannes Reinecke, hch, linux-scsi; +Cc: zhangfei.gao

On 18/12/2015 15:19, Bart Van Assche wrote:
> Hello John,
>
> It's great news that you started looking into scsi-mq support :-) As
> Hannes wrote, if the performance improvement is not as big as you
> expected, this could be caused by, e.g., lock contention. Are you familiar
> with the perf tool? It can be a great help in verifying whether lock
> contention occurs and which lock(s) cause it.
>
> Bart.


Thanks for the replies.

One of my main concerns is how we use a spinlock in our task exec 
function to prepare and deliver a frame to the hardware:
hisi_sas_task_exec()
{
     ...

     /* protect task_prep and start_delivery sequence */
     spin_lock_irqsave(&hisi_hba->lock, flags);
     rc = hisi_sas_task_prep(task, hisi_hba, is_tmf, tmf, &pass);
     ...
     hisi_hba->hw->start_delivery(hisi_hba);
     spin_unlock_irqrestore(&hisi_hba->lock, flags);

     ...
}

We have to lock due to how we reserve a slot in the delivery queue. We 
are looking to optimise this, but it's not straightforward.
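
One possible direction, sketched below in user-space C, is to give every
delivery queue its own lock and write pointer so that submitters on different
queues never contend. Nothing here is taken from the driver: `struct dq`,
`dq_reserve_slot()`, `QUEUE_DEPTH` and the `atomic_flag` spin loop (standing in
for a kernel spinlock) are hypothetical names and choices, and whether the
hardware allows slots to be reserved per queue like this is an open question.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_QUEUES   32  /* delivery queue count of the hardware in this thread */
#define QUEUE_DEPTH 64  /* hypothetical ring size */

/* Hypothetical per-queue state; the atomic_flag stands in for a spinlock. */
struct dq {
    atomic_flag lock;
    uint32_t wr_ptr;  /* next slot to hand out */
    uint32_t used;    /* slots currently reserved */
};

static struct dq queues[NR_QUEUES];

static void dq_init_all(void)
{
    for (unsigned int i = 0; i < NR_QUEUES; i++) {
        atomic_flag_clear(&queues[i].lock);
        queues[i].wr_ptr = 0;
        queues[i].used = 0;
    }
}

/* Reserve one slot on queue q. Returns the slot index, or -1 if q is full. */
static int dq_reserve_slot(unsigned int q)
{
    struct dq *dq;
    int slot = -1;

    assert(q < NR_QUEUES);
    dq = &queues[q];

    /* Spin only on this queue's lock; other queues are unaffected. */
    while (atomic_flag_test_and_set_explicit(&dq->lock, memory_order_acquire))
        ;
    if (dq->used < QUEUE_DEPTH) {
        slot = (int)dq->wr_ptr;
        dq->wr_ptr = (dq->wr_ptr + 1) % QUEUE_DEPTH;
        dq->used++;
    }
    atomic_flag_clear_explicit(&dq->lock, memory_order_release);
    return slot;
}
```

The sketch never frees slots; a completion path would decrement `used` under
the same per-queue lock.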

Perf is a good strategy, but, to be honest, I have not spent a lot of 
time looking at this, so I'm looking for low-hanging fruit initially.

FYI, our hardware does have the same number of delivery and completion 
queues (32), and 16 cores. One thing to note is that a command which was 
sent on queue x is not guaranteed to complete on completion queue x.

cheers,




* Re: scsi-mq performance check
  2015-12-18 15:36     ` John Garry
@ 2015-12-18 16:05       ` Hannes Reinecke
  2015-12-18 16:50         ` John Garry
  0 siblings, 1 reply; 6+ messages in thread
From: Hannes Reinecke @ 2015-12-18 16:05 UTC (permalink / raw)
  To: John Garry, Bart Van Assche, hch, linux-scsi; +Cc: zhangfei.gao

On 12/18/2015 04:36 PM, John Garry wrote:
> Thanks for the replies.
>
> One of my main concerns is how we use a spinlock in our task exec
> function to prepare and deliver a frame to the hardware:
> hisi_sas_task_exec()
> {
>      ...
>
>      /* protect task_prep and start_delivery sequence */
>      spin_lock_irqsave(&hisi_hba->lock, flags);
>      rc = hisi_sas_task_prep(task, hisi_hba, is_tmf, tmf, &pass);
>      ...
>      hisi_hba->hw->start_delivery(hisi_hba);
>      spin_unlock_irqrestore(&hisi_hba->lock, flags);
>
>      ...
> }
>
> We have to lock due to how we reserve a slot in the delivery queue.
> We are looking to optimise this, but it's not straightforward.
>
> Perf is a good strategy, but, to be honest, I have not spent a lot
> of time looking at this, so I'm looking for low-hanging fruit initially.
>
> FYI, our hardware does have the same number of delivery and
> completion queues (32), and 16 cores. One thing to note is that a
> command which was sent on queue x is not guaranteed to complete on
> completion queue x.
>
... then don't bother looking at scsi-mq. That is the very thing it 
relies on ...

Time to change the firmware?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

* Re: scsi-mq performance check
  2015-12-18 16:05       ` Hannes Reinecke
@ 2015-12-18 16:50         ` John Garry
  0 siblings, 0 replies; 6+ messages in thread
From: John Garry @ 2015-12-18 16:50 UTC (permalink / raw)
  To: Hannes Reinecke, Bart Van Assche, hch, linux-scsi; +Cc: zhangfei.gao


>> We have to lock due to how we reserve a slot in the delivery queue.
>> We are looking to optimise this, but it's not straightforward.
>>
>> Perf is a good strategy, but, to be honest, I have not spent a lot
>> of time looking at this, so I'm looking for low-hanging fruit initially.
>>
>> FYI, our hardware does have the same number of delivery and
>> completion queues (32), and 16 cores. One thing to note is that a
>> command which was sent on queue x is not guaranteed to complete on
>> completion queue x.
>>
> ... then don't bother looking at scsi-mq. That is the very thing it
> relies on ...
>
> Time to change the firmware?
>
> Cheers,
>
> Hannes

Hi,

Even though a slot delivered on queue x is not guaranteed to complete on 
completion queue x, it nearly always does (I just quickly tested on our 
new chip, and 100% of the time it is the same - I need to check with our 
hardware guys if this was only v1 of the IP).
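
For reference, a check of this kind could be as simple as the following
sketch. It is plain C with hypothetical names (`struct cq_match_stats`,
`cq_match_record()`, `cq_match_percent()`) and is not the code that was
actually run: it just counts, for each completion, whether the completion
queue index equals the delivery queue index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters comparing delivery and completion queue indexes. */
struct cq_match_stats {
    uint64_t same;   /* completed on the queue index it was delivered on */
    uint64_t other;  /* completed on a different queue index */
};

/* Record one completion. Not thread-safe; keep one instance per queue. */
static void cq_match_record(struct cq_match_stats *s,
                            unsigned int dq_id, unsigned int cq_id)
{
    assert(s);
    if (dq_id == cq_id)
        s->same++;
    else
        s->other++;
}

/* Percentage of completions that matched; 0 if nothing was recorded. */
static unsigned int cq_match_percent(const struct cq_match_stats *s)
{
    uint64_t total = s->same + s->other;

    return total ? (unsigned int)(s->same * 100 / total) : 0;
}
```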

As for firmware, our controller does not have any.

cheers,

