* Read request exceeding max_hw_sectors_kb
@ 2018-06-13 10:41 ` Jitendra Bhivare
  0 siblings, 0 replies; 18+ messages in thread
From: Jitendra Bhivare @ 2018-06-13 10:41 UTC (permalink / raw)


Hi Christoph, Daniel,

On my NVMf setup using SPDK, MDTS gets configured to 4 for the target
subsystem.
This sets max_hw_sectors_kb to 64KB for the remotely attached NS block
device on the initiator.
The NS is exported from an Intel NVMe 750 SSD connected to the target, a
drive the kernel flags with the NVME_QUIRK_STRIPE_SIZE quirk.
So SPDK fills in noiob as 256, per the vendor-specific controller data
(vs[3] = 5). The corresponding NS on the initiator gets chunk_sectors
configured to 256 sectors (128KB) even though max_hw_sectors_kb is 64KB.
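
For reference, a minimal sketch of the unit conversions at play here
(assuming 4KB controller pages and 512-byte LBAs as on this setup; the
shifts mirror what nvme core does, and the values are from this report):

	u32 mdts = 4;		/* Identify Controller MDTS: power of two, in ctrl pages */
	u32 page_shift = 12;	/* 4KB controller page size */
	u32 max_hw_sectors = 1 << (mdts + page_shift - 9);	/* 128 sectors = 64KB */

	u32 noiob = 256;	/* Identify Namespace NOIOB, in LBAs */
	u32 lba_shift = 9;	/* 512-byte LBAs */
	u32 chunk_sectors = noiob << (lba_shift - 9);		/* 256 sectors = 128KB */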

This causes an issue where the block layer submits a readahead request
exceeding the max transfer size (the trace below shows a read of data len
0x11000, i.e. 68KB, against the 64KB cap), which SPDK fails, triggering
nvme-rdma controller recovery. The call trace from when the request gets
submitted is pasted below.

The path which allows the request to go through is
blk_mq_make_request -> blk_queue_split -> blk_bio_segment_split ->
get_max_io_size.

Though the target is sending a chunk size greater than MDTS, shouldn't
nvme-core cap chunk_sectors appropriately?
In this case the block layer, too, did not seem to honor
max_hw_sectors_kb.
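
For context, the v4.14-era blk_max_size_offset() that get_max_io_size()
relies on returns a purely boundary-based limit whenever chunk_sectors is
set (from include/linux/blkdev.h of that era):

	static inline unsigned int blk_max_size_offset(struct request_queue *q,
						       sector_t offset)
	{
		if (!q->limits.chunk_sectors)
			return q->limits.max_sectors;

		return q->limits.chunk_sectors -
				(offset & (q->limits.chunk_sectors - 1));
	}

so max_sectors is consulted only when no chunk boundary is configured.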

So something like this resolves the issue:
static void nvme_set_chunk_size(struct nvme_ns *ns)
{
	u32 chunk_size = (((u32)ns->noiob) << (ns->lba_shift - 9));

	chunk_size = rounddown_pow_of_two(chunk_size);
	chunk_size = min(ns->ctrl->max_hw_sectors, chunk_size);
	blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(chunk_size));
}

Please do let me know if this is the right approach, or whether SPDK
should set noiob appropriately.

Thanks,

JB

[ 1798.507808] nvme nvme2: JB: ctrl ffff880220f942c0 max_hw_sectors 128 max_segments 17 page_size 4096
[ 1798.512752] CPU: 8 PID: 5749 Comm: systemd-udevd Tainted: G O 4.14.44 #2
[ 1798.512753] Hardware name: Dell Inc. PowerEdge T620/0658N7, BIOS 2.5.4 01/22/2016
[ 1798.512754] Call Trace:
[ 1798.512761] dump_stack+0x63/0x87
[ 1798.512766] nvme_rdma_queue_rq+0x5ee/0x670 [nvme_rdma]
[ 1798.512769] __blk_mq_try_issue_directly+0xde/0x140
[ 1798.512771] blk_mq_try_issue_directly+0x6f/0x80
[ 1798.512773] ? blk_account_io_start+0xf4/0x190
[ 1798.512774] blk_mq_make_request+0x32a/0x5f0
[ 1798.512776] generic_make_request+0x122/0x2f0
[ 1798.512777] submit_bio+0x73/0x150
[ 1798.512778] ? submit_bio+0x73/0x150
[ 1798.512781] ? guard_bio_eod+0x2c/0x100
[ 1798.512783] mpage_readpages+0x1aa/0x1f0
[ 1798.512784] ? I_BDEV+0x20/0x20
[ 1798.512787] ? alloc_pages_current+0x6a/0xe0
[ 1798.512788] blkdev_readpages+0x1d/0x20
[ 1798.512791] __do_page_cache_readahead+0x1be/0x2c0
[ 1798.512793] force_page_cache_readahead+0xb8/0x110
[ 1798.512794] ? force_page_cache_readahead+0xb8/0x110
[ 1798.512795] page_cache_sync_readahead+0x3f/0x50
[ 1798.512798] generic_file_read_iter+0x7eb/0xbb0
[ 1798.512800] ? page_cache_tree_insert+0xb0/0xb0
[ 1798.512801] blkdev_read_iter+0x35/0x40
[ 1798.512804] __vfs_read+0xf9/0x170
[ 1798.512806] vfs_read+0x93/0x130
[ 1798.512807] SyS_read+0x55/0xc0
[ 1798.512810] do_syscall_64+0x73/0x130
[ 1798.512813] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1798.512814] RIP: 0033:0x7f59a6260500
[ 1798.512815] RSP: 002b:00007ffe79114e58 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1798.512817] RAX: ffffffffffffffda RBX: 000056152f7e88c0 RCX: 00007f59a6260500
[ 1798.512818] RDX: 0000000000040000 RSI: 000056152f7e88e8 RDI: 000000000000000f
[ 1798.512819] RBP: 000056152f7966b0 R08: 000056152f7e88c0 R09: 0000000000000000
[ 1798.512819] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000040000
[ 1798.512820] R13: 0000000000000000 R14: 000056152f796700 R15: 000056152f7e88d8
[ 1798.512824] nvme nvme2: JB: nvme_rdma_queue_rq: rq ffff88021da00000 op 0 data len 0x11000


* Read request exceeding max_hw_sectors_kb
@ 2018-06-13 11:47   ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2018-06-13 11:47 UTC (permalink / raw)


On Wed, Jun 13, 2018 at 04:11:56PM +0530, Jitendra Bhivare wrote:
> So something like this resolves the issue:
> static void nvme_set_chunk_size(struct nvme_ns *ns)
> {
> 	u32 chunk_size = (((u32)ns->noiob) << (ns->lba_shift - 9));
> 
> 	chunk_size = rounddown_pow_of_two(chunk_size);
> 	chunk_size = min(ns->ctrl->max_hw_sectors, chunk_size);
> 	blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(chunk_size));
> }
> 
> Please do let me know if this is the right approach, or whether SPDK
> should set noiob appropriately.

I think the answer is both:  The kernel code should be more robust
in handling bogus noiob sizes, so please submit a patch per your above
analysis.  SPDK should be fixed to not expose a bogus noiob size as well.


* Read request exceeding max_hw_sectors_kb
@ 2018-06-13 17:00     ` Daniel Verkamp
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Verkamp @ 2018-06-13 17:00 UTC (permalink / raw)


On 06/13/2018 04:47 AM, Christoph Hellwig wrote:
> On Wed, Jun 13, 2018 at 04:11:56PM +0530, Jitendra Bhivare wrote:
>> So something like this resolves the issue:
>> static void nvme_set_chunk_size(struct nvme_ns *ns)
>> {
>> 	u32 chunk_size = (((u32)ns->noiob) << (ns->lba_shift - 9));
>>
>> 	chunk_size = rounddown_pow_of_two(chunk_size);
>> 	chunk_size = min(ns->ctrl->max_hw_sectors, chunk_size);
>> 	blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(chunk_size));
>> }
>>
>> Please do let me know if this is the right approach, or whether SPDK
>> should set noiob appropriately.
> 
> I think the answer is both:  The kernel code should be more robust
> in handling bogus noiob sizes, so please submit a patch per your above
> analysis.  SPDK should be fixed to not expose a bogus noiob size as well.

Hi Christoph,

By my understanding, this is what the spec is trying to say about NOIOB
(though unfortunately not very clearly): for best performance, read and
write I/Os should not cross an LBA that is a multiple of NOIOB.  For
example, if NOIOB is 64KB, an I/O of LBA=48KB size=32KB should be split
into two smaller I/Os, LBA=48KB size=16KB and LBA=64KB size=16KB, since
it crosses a 64KB LBA boundary.
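
A minimal illustration of that boundary rule (hypothetical helper, all
values in 512-byte sectors, NOIOB assumed to be a power of two):

	/* Length of the first piece of an I/O spanning [start, start + len),
	 * cut at the next multiple of noiob. Illustrative only. */
	static unsigned int noiob_first_piece(sector_t start, unsigned int len,
					      unsigned int noiob)
	{
		unsigned int to_boundary = noiob - (start & (noiob - 1));

		return min(len, to_boundary);
	}

With NOIOB = 64KB = 128 sectors, start = 96 (48KB) and len = 64 (32KB)
give to_boundary = 32 sectors, so the first piece is 16KB and the second
piece starts exactly at the 64KB boundary, matching the example above.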

Based on the current wording in the spec, I don't believe there is any
requirement that NOIOB be less than or equal to MDTS; it's purely about
avoiding I/O overlapping any boundary where LBA = NOIOB * n.

That said, we could consider some kind of workaround on the target side
if this kernel host code is in wide use already, like clamping NOIOB to
MDTS (accounting for unit conversion) or reporting NOIOB = 0 if NOIOB
would be greater than MDTS.  I'd like to avoid this route if possible,
though.
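
A sketch of that clamp, for what it's worth (hypothetical target-side
logic, not actual SPDK code; mdts here is the power-of-two exponent in
controller pages, noiob is in LBAs):

	/* Hypothetical: suppress the advertised NOIOB when it exceeds MDTS.
	 * Returns 0, i.e. no boundary reported, when noiob would be larger. */
	static uint32_t clamp_noiob(uint32_t noiob, uint32_t mdts,
				    uint32_t page_size, uint32_t lba_size)
	{
		uint64_t mdts_lbas = ((uint64_t)page_size << mdts) / lba_size;

		return noiob > mdts_lbas ? 0 : noiob;
	}

With the values from this report (mdts = 4, 4KB pages, 512-byte LBAs),
mdts_lbas = 128, so a NOIOB of 256 would be reported as 0.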

Based on a quick glance at the kernel history, it looks like the host
code has been using NOIOB since Linux 4.12 (commit 6b8190d61a622), so
I'm surprised we haven't had more reports about this failing.  Did
something change in the kernel block stack more recently than that to
cause chunks larger than the maximum data transfer size, or has this
always been an issue since the introduction of the NOIOB host code?

Thanks,
-- Daniel


* Read request exceeding max_hw_sectors_kb
@ 2018-06-14 10:16       ` Jitendra Bhivare
  0 siblings, 0 replies; 18+ messages in thread
From: Jitendra Bhivare @ 2018-06-14 10:16 UTC (permalink / raw)


> -----Original Message-----
> From: Daniel Verkamp [mailto:daniel.verkamp@intel.com]
> Sent: Wednesday, June 13, 2018 10:30 PM
> To: Christoph Hellwig <hch@lst.de>; Jitendra Bhivare
> <jitendra.bhivare@broadcom.com>
> Cc: linux-nvme@lists.infradead.org; Storage Performance Development Kit
> <spdk@lists.01.org>
> Subject: Re: Read request exceeding max_hw_sectors_kb
>
> On 06/13/2018 04:47 AM, Christoph Hellwig wrote:
> > On Wed, Jun 13, 2018 at 04:11:56PM +0530, Jitendra Bhivare wrote:
> >> So something like this resolves the issue:
> >> static void nvme_set_chunk_size(struct nvme_ns *ns)
> >> {
> >> 	u32 chunk_size = (((u32)ns->noiob) << (ns->lba_shift - 9));
> >>
> >> 	chunk_size = rounddown_pow_of_two(chunk_size);
> >> 	chunk_size = min(ns->ctrl->max_hw_sectors, chunk_size);
> >> 	blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(chunk_size));
> >> }
> >>
> >> Please do let me know if this is the right approach, or whether SPDK
> >> should set noiob appropriately.
> >
> > I think the answer is both:  The kernel code should be more robust in
> > handling bogus noiob sizes, so please submit a patch per your above
> > analysis.  SPDK should be fixed to not expose a bogus noiob size as well.
>
> Hi Christoph,
>
> By my understanding, this is what the spec is trying to say about NOIOB
> (though unfortunately not very clearly): for best performance, read and
> write I/Os should not cross an LBA that is a multiple of NOIOB.  For
> example, if NOIOB is 64KB, an I/O of LBA=48KB size=32KB should be split
> into two smaller I/Os, LBA=48KB size=16KB and LBA=64KB size=16KB, since
> it crosses a 64KB LBA boundary.
>
> Based on the current wording in the spec, I don't believe there is any
> requirement that NOIOB be less than or equal to MDTS; it's purely about
> avoiding I/O overlapping any boundary where LBA = NOIOB * n.
>
> That said, we could consider some kind of workaround on the target side
> if this kernel host code is in wide use already, like clamping NOIOB to
> MDTS (accounting for unit conversion) or reporting NOIOB = 0 if NOIOB
> would be greater than MDTS.  I'd like to avoid this route if possible,
> though.
>
> Based on a quick glance at the kernel history, it looks like the host
> code has been using NOIOB since Linux 4.12 (commit 6b8190d61a622), so
> I'm surprised we haven't had more reports about this failing.  Did
> something change in the kernel block stack more recently than that to
> cause chunks larger than the maximum data transfer size, or has this
> always been an issue since the introduction of the NOIOB host code?
>
> Thanks,
> -- Daniel

Thanks Daniel, Christoph.

I agree with Daniel: MDTS, being a controller property, might not be
related to NOIOB. This needs to be fixed in the initiator kernel.

This is happening only in the readahead path issued by systemd-udevd, and
only when SPDK is configured with that MaxIOSize.

Regards,

JB


* Read request exceeding max_hw_sectors_kb
@ 2018-06-14 14:54   ` Keith Busch
  0 siblings, 0 replies; 18+ messages in thread
From: Keith Busch @ 2018-06-14 14:54 UTC (permalink / raw)


On Wed, Jun 13, 2018 at 04:11:56PM +0530, Jitendra Bhivare wrote:
> So something like this resolves the issue:
> static void nvme_set_chunk_size(struct nvme_ns *ns)
> {
> 	u32 chunk_size = (((u32)ns->noiob) << (ns->lba_shift - 9));
> 
> 	chunk_size = rounddown_pow_of_two(chunk_size);
> 	chunk_size = min(ns->ctrl->max_hw_sectors, chunk_size);
> 	blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(chunk_size));
> }

That doesn't look right. The I/O boundary may have nothing to do with the
max transfer size, so throttling the chunk size may cause lots of
unnecessary splits.

Isn't the reason it's failing that the block layer allowed a command to
form up to the chunk size boundary instead of capping it at the max
transfer limit? How about this patch instead?

---
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bca3a92eb55f..9c57b49a4132 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1118,8 +1118,8 @@ static inline unsigned int blk_max_size_offset(struct request_queue *q,
 	if (!q->limits.chunk_sectors)
 		return q->limits.max_sectors;
 
-	return q->limits.chunk_sectors -
-			(offset & (q->limits.chunk_sectors - 1));
+	return min(q->limits.max_sectors, (unsigned int)(q->limits.chunk_sectors -
+			(offset & (q->limits.chunk_sectors - 1))));
 }
 
 static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
--
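
Plugging in the values from this report as a sanity check (chunk_sectors =
256, and taking max_sectors = max_hw_sectors = 128 for illustration):

	/* offset = 0 within a chunk:
	 *   before: 256 - (0 & 255)           = 256 sectors (128KB) -> exceeds MDTS
	 *   after:  min(128, 256 - (0 & 255)) = 128 sectors (64KB)  -> fits
	 * The noiob boundary is still honored; the transfer cap now simply
	 * applies on top of it. */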


* Read request exceeding max_hw_sectors_kb
@ 2018-06-15 13:58     ` Jitendra Bhivare
  0 siblings, 0 replies; 18+ messages in thread
From: Jitendra Bhivare @ 2018-06-15 13:58 UTC (permalink / raw)


> ---
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index bca3a92eb55f..9c57b49a4132 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1118,8 +1118,8 @@ static inline unsigned int blk_max_size_offset(struct request_queue *q,
>         if (!q->limits.chunk_sectors)
>                 return q->limits.max_sectors;
>
> -       return q->limits.chunk_sectors -
> -                       (offset & (q->limits.chunk_sectors - 1));
> +       return min(q->limits.max_sectors, (unsigned int)(q->limits.chunk_sectors -
> +                       (offset & (q->limits.chunk_sectors - 1))));
>  }
>
>  static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
> --

OK, this should work too, but I am running into some other issues with
it on my setup; I will verify and update.


* Read request exceeding max_hw_sectors_kb
@ 2018-06-26 10:47     ` Jitendra Bhivare
  0 siblings, 0 replies; 18+ messages in thread
From: Jitendra Bhivare @ 2018-06-26 10:47 UTC (permalink / raw)


> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index bca3a92eb55f..9c57b49a4132 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1118,8 +1118,8 @@ static inline unsigned int blk_max_size_offset(struct request_queue *q,
>  	if (!q->limits.chunk_sectors)
>  		return q->limits.max_sectors;
>
> -	return q->limits.chunk_sectors -
> -			(offset & (q->limits.chunk_sectors - 1));
> +	return min(q->limits.max_sectors, (unsigned int)(q->limits.chunk_sectors -
> +			(offset & (q->limits.chunk_sectors - 1))));
>  }
>
>  static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
> --

Sorry for the delayed response. This patch works fine too. Please submit
it.

Regards,

JB


* Read request exceeding max_hw_sectors_kb
@ 2018-06-26 15:07       ` Keith Busch
  0 siblings, 0 replies; 18+ messages in thread
From: Keith Busch @ 2018-06-26 15:07 UTC (permalink / raw)


On Tue, Jun 26, 2018 at 04:17:35PM +0530, Jitendra Bhivare wrote:
> Sorry for the delayed response. This patch works fine too. Please submit
> it.
 
Will do, thanks for the confirmation. Okay if I add your Tested-by?


* Read request exceeding max_hw_sectors_kb
@ 2018-06-27 10:28         ` Jitendra Bhivare
  0 siblings, 0 replies; 18+ messages in thread
From: Jitendra Bhivare @ 2018-06-27 10:28 UTC (permalink / raw)


> Will do, thanks for the confirmation. Okay if I add your Tested-by?

Sure. I ran an fio stress test for an hour.


Thread overview: 9 messages
2018-06-13 10:41 Read request exceeding max_hw_sectors_kb Jitendra Bhivare
2018-06-13 11:47 ` Christoph Hellwig
2018-06-13 17:00   ` Daniel Verkamp
2018-06-14 10:16     ` Jitendra Bhivare
2018-06-14 14:54 ` Keith Busch
2018-06-15 13:58   ` Jitendra Bhivare
2018-06-26 10:47   ` Jitendra Bhivare
2018-06-26 15:07     ` Keith Busch
2018-06-27 10:28       ` Jitendra Bhivare