* Drives with MDTS set to zero
@ 2015-11-17 20:51 Paul Grabinar
  2015-11-18 22:58 ` Keith Busch
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Grabinar @ 2015-11-17 20:51 UTC (permalink / raw)


Hi,

I have a drive that sets MDTS to zero. According to the NVMe
specification, that is valid and means that the drive does not have a
limit on the transfer size.

In the current 4.4 kernel driver, we detect this and set
dev->max_hw_sectors to UINT_MAX.

Later, when a namespace is allocated, we set the block-level
max_hw_sectors and max_segments based on dev->max_hw_sectors. These
calculations shift the value, but since it is already UINT_MAX, we end
up with strange results.
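
As a rough user-space sketch of the wrap-around (assuming a 4096-byte
device page size and 32-bit unsigned arithmetic; values illustrative
only):

#include <stdio.h>
#include <limits.h>

int main(void)
{
	unsigned int max_hw_sectors = UINT_MAX;	/* MDTS == 0 case */
	unsigned int page_size = 4096;		/* assumed device page size */

	/* The driver's formula: the left shift wraps in 32 bits, so the
	 * computed segment count bears no relation to the intended
	 * "unlimited" transfer size. */
	unsigned int wrapped = ((max_hw_sectors << 9) / page_size) + 1;

	/* Widening to 64 bits first shows what the value would have been. */
	unsigned long long intended =
		((unsigned long long)max_hw_sectors << 9) / page_size + 1;

	printf("wrapped=%u intended=%llu\n", wrapped, intended);
	return 0;
}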

I'm not entirely sure what is supposed to happen. The following patch
sets max_hw_sectors to BLK_DEF_MAX_SECTORS and max_segments to
BLK_MAX_SEGMENTS, but this still does not seem right, as drives with a
large MDTS will set max_segments higher than this.


diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8187df2..be35401 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2265,11 +2265,12 @@ static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
 	list_add_tail(&ns->list, &dev->namespaces);
 
 	blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
-	if (dev->max_hw_sectors) {
-		blk_queue_max_hw_sectors(ns->queue, dev->max_hw_sectors);
-		blk_queue_max_segments(ns->queue,
-			((dev->max_hw_sectors << 9) / dev->page_size) + 1);
-	}
+	blk_queue_max_hw_sectors(ns->queue, dev->max_hw_sectors ?
+		dev->max_hw_sectors : BLK_DEF_MAX_SECTORS);
+	blk_queue_max_segments(ns->queue,
+		dev->max_hw_sectors ?
+		((dev->max_hw_sectors << 9) / dev->page_size) + 1 :
+		BLK_MAX_SEGMENTS);
 	if (dev->stripe_size)
 		blk_queue_chunk_sectors(ns->queue, dev->stripe_size >> 9);
 	if (dev->vwc & NVME_CTRL_VWC_PRESENT)
@@ -2622,8 +2623,6 @@ static int nvme_dev_add(struct nvme_dev *dev)
 	memcpy(dev->firmware_rev, ctrl->fr, sizeof(ctrl->fr));
 	if (ctrl->mdts)
 		dev->max_hw_sectors = 1 << (ctrl->mdts + shift - 9);
-	else
-		dev->max_hw_sectors = UINT_MAX;
 	if ((pdev->vendor == PCI_VENDOR_ID_INTEL) &&
 			(pdev->device == 0x0953) && ctrl->vs[3]) {
 		unsigned int max_hw_sectors;


* Drives with MDTS set to zero
  2015-11-17 20:51 Drives with MDTS set to zero Paul Grabinar
@ 2015-11-18 22:58 ` Keith Busch
  2015-11-18 23:02   ` Keith Busch
  0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2015-11-18 22:58 UTC (permalink / raw)


On Tue, Nov 17, 2015 at 08:51:50PM +0000, Paul Grabinar wrote:
> I have a drive that sets MDTS to zero. According to the NVMe
> specification, that is valid and means that the drive does not have a
> limit on the transfer size.
> 
> In the current 4.4 kernel driver, we detect this and set
> dev->max_hw_sectors to UINT_MAX.
> 
> Later, when a namespace is allocated, we set the block-level
> max_hw_sectors and max_segments based on dev->max_hw_sectors. These
> calculations shift the value, but since it is already UINT_MAX, we end
> up with strange results.
> 
> I'm not entirely sure what is supposed to happen. The following patch
> sets max_hw_sectors to BLK_DEF_MAX_SECTORS and max_segments to
> BLK_MAX_SEGMENTS, but this still does not seem right, as drives with a
> large MDTS will set max_segments higher than this.

Nice find. The original code was written before the segment count
setting was added, so the combination was probably never tested (my
controllers define MDTS, so I've never tested this case at all).

We can fix this by reordering the math instead of artificially reducing
the transfer size.
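
As a quick stand-alone sanity check (illustrative arithmetic only,
assuming a 4096-byte device page size), dividing by the page size in
sectors keeps everything within 32 bits:

#include <stdio.h>
#include <limits.h>

int main(void)
{
	unsigned int max_hw_sectors = UINT_MAX;	/* MDTS == 0 */
	unsigned int page_size = 4096;		/* assumed device page size */

	/* page_size >> 9 == 8, so the division cannot overflow and gives
	 * the segment count the shift-first version was aiming for. */
	printf("max_segments = %u\n", max_hw_sectors / (page_size >> 9) + 1);
	return 0;
}
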
---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5aca81c..f17e3d3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2266,7 +2266,7 @@ static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
 	if (dev->max_hw_sectors) {
 		blk_queue_max_hw_sectors(ns->queue, dev->max_hw_sectors);
 		blk_queue_max_segments(ns->queue,
-			((dev->max_hw_sectors << 9) / dev->page_size) + 1);
+			((dev->max_hw_sectors / (dev->page_size >> 9) + 1);
 	}
 	if (dev->stripe_size)
 		blk_queue_chunk_sectors(ns->queue, dev->stripe_size >> 9);
--


* Drives with MDTS set to zero
  2015-11-18 22:58 ` Keith Busch
@ 2015-11-18 23:02   ` Keith Busch
  2015-11-18 23:08     ` Busch, Keith
  0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2015-11-18 23:02 UTC (permalink / raw)


On Wed, Nov 18, 2015 at 10:58:20PM +0000, Keith Busch wrote:
> We can fix this by reordering the math instead of artificially reducing
> the transfer size.

Resend with an actually compilable patch.

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5aca81c..f17e3d3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2266,7 +2266,7 @@ static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
 	if (dev->max_hw_sectors) {
 		blk_queue_max_hw_sectors(ns->queue, dev->max_hw_sectors);
 		blk_queue_max_segments(ns->queue,
-			((dev->max_hw_sectors << 9) / dev->page_size) + 1);
+			(dev->max_hw_sectors / (dev->page_size >> 9) + 1);
 	}
 	if (dev->stripe_size)
 		blk_queue_chunk_sectors(ns->queue, dev->stripe_size >> 9);
--


* Drives with MDTS set to zero
  2015-11-18 23:02   ` Keith Busch
@ 2015-11-18 23:08     ` Busch, Keith
  2015-11-19  8:34       ` Paul Grabinar
  0 siblings, 1 reply; 5+ messages in thread
From: Busch, Keith @ 2015-11-18 23:08 UTC (permalink / raw)


Ugh, broken again, sorry, having a distracted day...

I'll resend as a proper patch that really works.

> On Wed, Nov 18, 2015 at 10:58:20PM +0000, Keith Busch wrote:
> > We can fix this by reordering the math instead of artificially reducing
> > the transfer size.
> 
> Resend with an actually compilable patch.
> 
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 5aca81c..f17e3d3 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2266,7 +2266,7 @@ static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
>  	if (dev->max_hw_sectors) {
>  		blk_queue_max_hw_sectors(ns->queue, dev->max_hw_sectors);
>  		blk_queue_max_segments(ns->queue,
> -			((dev->max_hw_sectors << 9) / dev->page_size) + 1);
> +			(dev->max_hw_sectors / (dev->page_size >> 9) + 1);
>  	}
>  	if (dev->stripe_size)
>  		blk_queue_chunk_sectors(ns->queue, dev->stripe_size >> 9);
> --


* Drives with MDTS set to zero
  2015-11-18 23:08     ` Busch, Keith
@ 2015-11-19  8:34       ` Paul Grabinar
  0 siblings, 0 replies; 5+ messages in thread
From: Paul Grabinar @ 2015-11-19  8:34 UTC (permalink / raw)


Thanks.

That should take care of max_segments.

There is still the issue that blk_queue_max_hw_sectors calls
blk_limits_max_hw_sectors, which performs:

(max_hw_sectors << 9)

and so ends up with a strange-looking value. I'm not sure if this is a
problem with the driver calling it with UINT_MAX, or a problem with the
way blk_limits_max_hw_sectors processes max_hw_sectors. This is a minor
issue, since the device will still operate.
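
For illustration only (stand-alone arithmetic, not the block layer's
actual code), the shift wraps like this in 32-bit arithmetic:

#include <stdio.h>
#include <limits.h>

int main(void)
{
	/* UINT_MAX << 9 wraps to 0xfffffe00, which is where the
	 * strange-looking limit comes from. */
	printf("0x%x\n", UINT_MAX << 9);
	return 0;
}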


On 11/18/15 23:08, Busch, Keith wrote:
> Ugh, broken again, sorry, having a distracted day...
>
> I'll resend as a proper patch that really works.
>
>> On Wed, Nov 18, 2015 at 10:58:20PM +0000, Keith Busch wrote:
>>> We can fix this by reordering the math instead of artificially reducing
>>> the transfer size.
>> Resend with an actually compilable patch.
>>
>> ---
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 5aca81c..f17e3d3 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -2266,7 +2266,7 @@ static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
>>  	if (dev->max_hw_sectors) {
>>  		blk_queue_max_hw_sectors(ns->queue, dev->max_hw_sectors);
>>  		blk_queue_max_segments(ns->queue,
>> -			((dev->max_hw_sectors << 9) / dev->page_size) + 1);
>> +			(dev->max_hw_sectors / (dev->page_size >> 9) + 1);
>>  	}
>>  	if (dev->stripe_size)
>>  		blk_queue_chunk_sectors(ns->queue, dev->stripe_size >> 9);
>> --
>

