* RE: Device or HBA level QD throttling creates randomness in sequential workload
@ 2016-10-21 12:13 Kashyap Desai
  2016-10-21 21:31 ` Omar Sandoval
  0 siblings, 1 reply; 17+ messages in thread
From: Kashyap Desai @ 2016-10-21 12:13 UTC (permalink / raw)
  To: linux-scsi, linux-kernel, linux-block
  Cc: axboe, Christoph Hellwig, paolo.valente, osandov

Hi -

I found the conversation below, and it is along the same lines as the input
I was looking for from the mailing list.

http://marc.info/?l=linux-kernel&m=147569860526197&w=2

I can do testing on any WIP item, as Omar mentioned in the discussion above.
https://github.com/osandov/linux/tree/blk-mq-iosched

Is there any workaround/alternative in the latest upstream kernel if a user
wants to see a limited penalty for a sequential workload on HDDs?

` Kashyap

> -----Original Message-----
> From: Kashyap Desai [mailto:kashyap.desai@broadcom.com]
> Sent: Thursday, October 20, 2016 3:39 PM
> To: linux-scsi@vger.kernel.org
> Subject: Device or HBA level QD throttling creates randomness in
> sequential workload
>
> [ Apologies if you find more than one instance of my email.
> My web-based email client has some issues, so I am now trying git send-email. ]
>
> Hi,
>
> I am doing some performance tuning in the MR driver to understand how sdev
> queue depth and HBA queue depth play a role in IO submission from the layer
> above. I have 24 JBODs connected to an MR 12Gb/s controller, and I can see
> the performance for a 4K sequential workload as below.
>
> The HBA QD for the MR controller is 4065, and the per-device QD is set to 32.
>
> queue depth from <fio> 256 reports 300K IOPS
> queue depth from <fio> 128 reports 330K IOPS
> queue depth from <fio> 64 reports 360K IOPS
> queue depth from <fio> 32 reports 510K IOPS
>
> In the MR driver I added a debug print and confirmed that more IO comes to
> the driver as random IO whenever the <fio> queue depth is more than 32.
>
> I have debugged using the scsi logging level and blktrace as well. Below is
> a snippet of the logs using the scsi logging level. In summary, if the SML
> does flow control of IO due to the device QD or HBA QD, the IO coming to the
> LLD follows a more random pattern.
>
> I see that the IO coming to the driver is not sequential.
>
> [79546.912041] sd 18:2:21:0: [sdy] tag#854 CDB: Write(10) 2a 00 00 03 c0 3b 00 00 01 00
> [79546.912049] sd 18:2:21:0: [sdy] tag#855 CDB: Write(10) 2a 00 00 03 c0 3c 00 00 01 00
> [79546.912053] sd 18:2:21:0: [sdy] tag#886 CDB: Write(10) 2a 00 00 03 c0 5b 00 00 01 00
>
> <KD> After LBA "00 03 c0 3c", the next command is for LBA "00 03 c0 5b".
> Two sequential streams overlap due to sdev QD throttling.
>
> [79546.912056] sd 18:2:21:0: [sdy] tag#887 CDB: Write(10) 2a 00 00 03 c0 5c 00 00 01 00
> [79546.912250] sd 18:2:21:0: [sdy] tag#856 CDB: Write(10) 2a 00 00 03 c0 3d 00 00 01 00
> [79546.912257] sd 18:2:21:0: [sdy] tag#888 CDB: Write(10) 2a 00 00 03 c0 5d 00 00 01 00
> [79546.912259] sd 18:2:21:0: [sdy] tag#857 CDB: Write(10) 2a 00 00 03 c0 3e 00 00 01 00
> [79546.912268] sd 18:2:21:0: [sdy] tag#858 CDB: Write(10) 2a 00 00 03 c0 3f 00 00 01 00
>
> If scsi_request_fn() breaks out due to the unavailability of the device
> queue (because of the check below), will there be any side effect such as
> the one I observe?
>                 if (!scsi_dev_queue_ready(q, sdev))
>                              break;
>
> If I reduce the HBA QD and make sure IO from the layer above is throttled
> due to the HBA QD, there is the same impact.
> The MR driver uses a host-wide shared tag map.
>
> Can someone tell me whether this can be made tunable in the LLD by providing
> additional settings, or whether it is expected behavior? The problem I am
> facing is that I am not able to figure out the optimal device queue depth
> for different configurations and workloads.
>
> Thanks, Kashyap

* RE: Device or HBA level QD throttling creates randomness in sequential workload
@ 2016-10-24 18:54 Kashyap Desai
  2016-10-26 20:56 ` Omar Sandoval
  2016-10-31 17:24 ` Jens Axboe
  0 siblings, 2 replies; 17+ messages in thread
From: Kashyap Desai @ 2016-10-24 18:54 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-scsi, linux-kernel, linux-block, axboe, Christoph Hellwig,
	paolo.valente

> -----Original Message-----
> From: Omar Sandoval [mailto:osandov@osandov.com]
> Sent: Monday, October 24, 2016 9:11 PM
> To: Kashyap Desai
> Cc: linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> block@vger.kernel.org; axboe@kernel.dk; Christoph Hellwig;
> paolo.valente@linaro.org
> Subject: Re: Device or HBA level QD throttling creates randomness in
> sequential workload
>
> On Mon, Oct 24, 2016 at 06:35:01PM +0530, Kashyap Desai wrote:
> > >
> > > On Fri, Oct 21, 2016 at 05:43:35PM +0530, Kashyap Desai wrote:
> > > > Hi -
> > > >
> > > > I found the conversation below, and it is along the same lines as the
> > > > input I was looking for from the mailing list.
> > > >
> > > > http://marc.info/?l=linux-kernel&m=147569860526197&w=2
> > > >
> > > > I can do testing on any WIP item, as Omar mentioned in the discussion
> > > > above.
> > > > https://github.com/osandov/linux/tree/blk-mq-iosched
> >
> > I tried building a kernel using this repo, but it looks like it fails to
> > boot due to some changes in the <block> layer.
>
> Did you build the most up-to-date version of that branch? I've been
> force-pushing to it, so the commit id that you built would be useful.
> What boot failure are you seeing?

Below is the latest commit in the repo.
commit b077a9a5149f17ccdaa86bc6346fa256e3c1feda
Author: Omar Sandoval <osandov@fb.com>
Date:   Tue Sep 20 11:20:03 2016 -0700

    [WIP] blk-mq: limit bio queue depth

I have the latest 4.9/scsi-next repo maintained by Martin, which boots fine.
The only delta is that "CONFIG_SBITMAP" is enabled in the WIP blk-mq-iosched
branch. I could not see any meaningful data on the boot hang, so I am going
to try one more time tomorrow.


>
> > >
> > > Are you using blk-mq for this disk? If not, then the work there won't
> > > affect you.
> >
> > YES. I am using blk-mq for my test. I also confirm that if use_blk_mq is
> > disabled, the sequential workload issue is not seen and <cfq> scheduling
> > works well.
>
> Ah, okay, perfect. Can you send the fio job file you're using? Hard to tell
> exactly what's going on without the details. A sequential workload with just
> one submitter is about as easy as it gets, so this _should_ be behaving
> nicely.

<FIO script>

; setup numa policy for each thread
; 'numactl --show' to determine the maximum numa nodes
[global]
ioengine=libaio
buffered=0
rw=write
bssplit=4K/100
iodepth=256
numjobs=1
direct=1
runtime=60s
allow_mounted_write=0

[job1]
filename=/dev/sdd
..
[job24]
filename=/dev/sdaa
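For reference, a hedged sketch of how the queue-depth sweep reported earlier could be driven from this job file (the file name seq-write.fio is an assumption; the sed edit only changes iodepth between runs):

    # Run the job file as posted (iodepth=256)
    fio seq-write.fio

    # Re-run at a lower submission depth, e.g. QD=32, to reproduce the sweep
    sed -i 's/^iodepth=.*/iodepth=32/' seq-write.fio
    fio seq-write.fio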

When I set /sys/module/scsi_mod/parameters/use_blk_mq = 1, below is the
io scheduler detail. (It is in blk-mq mode.)
/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/host10/target10:2:13/10:2:13:0/block/sdq/queue/scheduler:none

When I have set /sys/module/scsi_mod/parameters/use_blk_mq = 0, the
io scheduler picked by the SML is <cfq>.
/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/host10/target10:2:13/10:2:13:0/block/sdq/queue/scheduler:noop deadline [cfq]

I see that blk-mq performance is very low for the sequential write workload,
and I confirm that blk-mq converts the sequential workload into a random
stream due to the io-scheduler difference between blk-mq and the legacy
block layer.
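For reference, a minimal sketch of how these settings can be inspected, and how the scheduler can be switched on the legacy path (sdq is taken from the sysfs paths above; on this kernel blk-mq exposes no I/O scheduler, hence "none"):

    # Y means scsi-mq/blk-mq is in use, N means the legacy request path
    cat /sys/module/scsi_mod/parameters/use_blk_mq
    # The mode is normally selected at boot, e.g. scsi_mod.use_blk_mq=Y on the
    # kernel command line; changing it later does not affect already-probed hosts.

    # Per-device scheduler: read the current one, and (legacy path only) change it
    cat /sys/block/sdq/queue/scheduler
    echo cfq > /sys/block/sdq/queue/scheduler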

>
> > >
> > > > Is there any workaround/alternative in the latest upstream kernel, if
> > > > a user wants to see a limited penalty for a sequential workload on
> > > > HDDs?
> > > >
> > > > ` Kashyap
> > > >
>
> P.S., your emails are being marked as spam by Gmail. Actually, Gmail seems
> to mark just about everything I get from Broadcom as spam due to failed
> DMARC.
>
> --
> Omar

* Device or HBA level QD throttling creates randomness in sequential workload
@ 2016-10-20 10:08 Kashyap Desai
  0 siblings, 0 replies; 17+ messages in thread
From: Kashyap Desai @ 2016-10-20 10:08 UTC (permalink / raw)
  To: linux-scsi

[ Apologies if you find more than one instance of my email.
My web-based email client has some issues, so I am now trying git send-email. ]

Hi,

I am doing some performance tuning in the MR driver to understand how sdev queue depth and HBA queue depth play a role in IO submission from the layer above.
I have 24 JBODs connected to an MR 12Gb/s controller, and I can see the performance for a 4K sequential workload as below.

The HBA QD for the MR controller is 4065, and the per-device QD is set to 32.

queue depth from <fio> 256 reports 300K IOPS 
queue depth from <fio> 128 reports 330K IOPS
queue depth from <fio> 64 reports 360K IOPS 
queue depth from <fio> 32 reports 510K IOPS
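For reference, both of these queue depths can be inspected (and the per-device one adjusted) at runtime through sysfs; a minimal sketch, with the device and host numbers taken from the logs below rather than verified on this setup:

    # Per-device (sdev) queue depth; writable at runtime
    cat /sys/block/sdy/device/queue_depth
    echo 32 > /sys/block/sdy/device/queue_depth

    # Queue depth the HBA driver advertises for the whole host; read-only
    cat /sys/class/scsi_host/host18/can_queue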

In the MR driver I added a debug print and confirmed that more IO comes to the driver as random IO whenever the <fio> queue depth is more than 32.

I have debugged using the scsi logging level and blktrace as well. Below is a snippet of the logs using the scsi logging level. In summary, if the SML does flow control of IO due to the device QD or HBA QD, the IO coming to the LLD follows a more random pattern.

I see that the IO coming to the driver is not sequential.

[79546.912041] sd 18:2:21:0: [sdy] tag#854 CDB: Write(10) 2a 00 00 03 c0 3b 00 00 01 00
[79546.912049] sd 18:2:21:0: [sdy] tag#855 CDB: Write(10) 2a 00 00 03 c0 3c 00 00 01 00
[79546.912053] sd 18:2:21:0: [sdy] tag#886 CDB: Write(10) 2a 00 00 03 c0 5b 00 00 01 00 

<KD> After LBA "00 03 c0 3c", the next command is for LBA "00 03 c0 5b".
Two sequential streams overlap due to sdev QD throttling.

[79546.912056] sd 18:2:21:0: [sdy] tag#887 CDB: Write(10) 2a 00 00 03 c0 5c 00 00 01 00
[79546.912250] sd 18:2:21:0: [sdy] tag#856 CDB: Write(10) 2a 00 00 03 c0 3d 00 00 01 00
[79546.912257] sd 18:2:21:0: [sdy] tag#888 CDB: Write(10) 2a 00 00 03 c0 5d 00 00 01 00
[79546.912259] sd 18:2:21:0: [sdy] tag#857 CDB: Write(10) 2a 00 00 03 c0 3e 00 00 01 00
[79546.912268] sd 18:2:21:0: [sdy] tag#858 CDB: Write(10) 2a 00 00 03 c0 3f 00 00 01 00
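The trace above comes from the scsi logging level; a hedged sketch of how a comparable capture could be taken for one of these drives (sdy is taken from the log lines; the logging bitmask value is omitted because its encoding is defined by the kernel's SCSI logging support):

    # Current SCSI midlayer logging bitmask
    cat /proc/sys/dev/scsi/logging_level

    # Record the block-layer view of submitted requests for 30 seconds
    blktrace -d /dev/sdy -w 30 -o sdy-trace
    blkparse -i sdy-trace | less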

If scsi_request_fn() breaks out due to the unavailability of the device queue (because of the check below), will there be any side effect such as the one I observe?
                if (!scsi_dev_queue_ready(q, sdev))
                             break;

If I reduce the HBA QD and make sure IO from the layer above is throttled due to the HBA QD, there is the same impact.
The MR driver uses a host-wide shared tag map.

Can someone tell me whether this can be made tunable in the LLD by providing additional settings, or whether it is expected behavior? The problem I am facing is that I am not able to figure out the optimal device queue depth for different configurations and workloads.

Thanks, Kashyap




Thread overview: 17+ messages
2016-10-21 12:13 Device or HBA level QD throttling creates randomness in sequential workload Kashyap Desai
2016-10-21 21:31 ` Omar Sandoval
2016-10-22 15:04   ` Kashyap Desai
2016-10-24 13:05   ` Kashyap Desai
2016-10-24 15:41     ` Omar Sandoval
  -- strict thread matches above, loose matches on Subject: below --
2016-10-24 18:54 Kashyap Desai
2016-10-26 20:56 ` Omar Sandoval
2016-10-31 17:24 ` Jens Axboe
2016-11-01  5:40   ` Kashyap Desai
2017-01-30 13:52   ` Kashyap Desai
2017-01-30 16:30     ` Bart Van Assche
2017-01-30 16:30       ` Bart Van Assche
2017-01-30 16:32       ` Jens Axboe
2017-01-30 18:28         ` Kashyap Desai
2017-01-30 18:29           ` Jens Axboe
2016-10-20 10:08 Kashyap Desai
2016-10-20  9:58 Kashyap Desai
