* NVMe SGL Data Length Error
From: Andrew Maier @ 2018-07-10 17:28 UTC


Hi all,

I've recently run into an issue with NVMe SGLs on our controller when the driver sends multiple SGL segments in a single command (i.e., more than 256 SGL entries in a single nvme read/write command): the SGL does not contain enough Data Block descriptors for the transfer. The first segment correctly links to, in my case, a Last Segment descriptor, but the second segment ends without enough Data Block descriptors to cover the full transfer (it is usually short by 4096 or 8192 bytes), which I've verified manually with a PCIe analyzer. This causes our NVMe controller to fail the command with the Data SGL Length Invalid (0xF) status code.
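To make the failure mode concrete, here is a minimal C sketch of how a controller might total the Data Block descriptors reachable from a command's SGL1 and compare the sum with the transfer size. The 16-byte descriptor layout and the type codes (Data Block 0h, Segment 2h, Last Segment 3h) follow the NVMe specification; the helper names (sgl_data, sgl_seg, check_data_sgl_length) are hypothetical, host pointers stand in for DMA addresses, and the sketch treats any mismatch as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-byte SGL descriptor as laid out by the NVMe specification. */
struct sgl_desc {
    uint64_t addr;    /* data buffer, or the next segment of descriptors */
    uint32_t length;  /* data bytes, or size in bytes of the next segment */
    uint8_t rsvd[3];
    uint8_t type;     /* descriptor type in bits 7:4 */
};
_Static_assert(sizeof(struct sgl_desc) == 16, "SGL descriptors are 16 bytes");

enum { SGL_DATA_BLOCK = 0x0, SGL_SEGMENT = 0x2, SGL_LAST_SEGMENT = 0x3 };
enum { STATUS_SUCCESS = 0x00, STATUS_DATA_SGL_LENGTH_INVALID = 0x0F };

static struct sgl_desc sgl_data(uint64_t addr, uint32_t len)
{
    struct sgl_desc d = {0};
    d.addr = addr;
    d.length = len;
    d.type = SGL_DATA_BLOCK << 4;
    return d;
}

/* Hypothetical helper: a Segment/Last Segment descriptor covering n entries.
 * In this simulation the segment "address" is simply a host pointer. */
static struct sgl_desc sgl_seg(const struct sgl_desc *seg, uint32_t n, uint8_t type)
{
    assert(n > 0);
    struct sgl_desc d = {0};
    d.addr = (uint64_t)(uintptr_t)seg;
    d.length = n * (uint32_t)sizeof(struct sgl_desc);
    d.type = (uint8_t)(type << 4);
    return d;
}

/* Hypothetical controller-side walk: follow the chain from SGL1, sum the
 * Data Block lengths it can actually see, and compare with the transfer. */
static int check_data_sgl_length(struct sgl_desc sgl1, uint64_t transfer_bytes)
{
    uint64_t described = 0;
    struct sgl_desc cur = sgl1;

    for (;;) {
        uint8_t t = cur.type >> 4;
        if (t == SGL_DATA_BLOCK) {            /* single-buffer command */
            described += cur.length;
            break;
        }
        /* Only length / 16 entries of the segment are ever read. */
        const struct sgl_desc *seg = (const struct sgl_desc *)(uintptr_t)cur.addr;
        size_t n = cur.length / sizeof(struct sgl_desc);
        int chained = 0;
        for (size_t i = 0; i < n; i++) {
            if ((seg[i].type >> 4) == SGL_DATA_BLOCK) {
                described += seg[i].length;
            } else if (i == n - 1) {          /* only the last slot may link on */
                cur = seg[i];
                chained = 1;
            }
        }
        if (t == SGL_LAST_SEGMENT || !chained)
            break;
    }
    return described == transfer_bytes ? STATUS_SUCCESS
                                       : STATUS_DATA_SGL_LENGTH_INVALID;
}
```

Building a two-segment chain for a 16 KiB transfer passes the check; shrinking the Last Segment descriptor's length by one entry hides 4096 bytes from the controller and yields status 0xF, the same shape of failure seen on the analyzer.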

Repro Steps:
1. Set the nvme driver's sgl_threshold parameter to 4096.
2. Run a 4 MiB nvme read (e.g., nvme read <sgl_nvme_dev> -s 0 -c 8191 -z 4194304 -t).
3. Repeat step 2 until the buffer is split across multiple SGL segments, or try a larger transfer.
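The repro numbers can be checked with a little arithmetic. The -c block count is zero-based, so -c 8191 asks for 8192 blocks, which with a 512-byte LBA format (an assumption, but the one that makes -z 4194304 match) is exactly 4 MiB. If the buffer lands as 1024 separate 4 KiB pages, the average segment is 4096 bytes, so lowering sgl_threshold to 4096 steers the I/O onto SGLs, and 1024 descriptors no longer fit in a single 4 KiB list of 256. This minimal C sketch models both steps; use_sgls and sgl_list_pages are hypothetical names, and the threshold test only approximates the driver's decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SGL_DESC_BYTES  16u
#define LIST_PAGE_BYTES 4096u
#define DESCS_PER_PAGE  (LIST_PAGE_BYTES / SGL_DESC_BYTES) /* 256 */

/* Hypothetical model of the threshold test: SGLs are used only when the
 * average physical segment is at least sgl_threshold bytes; 0 disables SGLs. */
static bool use_sgls(uint64_t payload_bytes, unsigned nseg, unsigned sgl_threshold)
{
    if (nseg == 0 || sgl_threshold == 0)
        return false;
    uint64_t avg = (payload_bytes + nseg - 1) / nseg; /* round up */
    return avg >= sgl_threshold;
}

/* Descriptor-list pages needed for `entries` Data Block descriptors when
 * every page except the last spends its final slot on a link to the next. */
static unsigned sgl_list_pages(unsigned entries)
{
    assert(entries > 0);
    if (entries <= DESCS_PER_PAGE)
        return 1;
    unsigned spill = entries - DESCS_PER_PAGE;
    unsigned per_chained_page = DESCS_PER_PAGE - 1;
    return 1 + (spill + per_chained_page - 1) / per_chained_page;
}
```

For the 4 MiB read split into 1024 pages, use_sgls(8192ull * 512, 1024, 4096) is true while a 32 KiB threshold would keep PRPs, and sgl_list_pages(1024) comes to 5 chained list pages, whereas 127 entries or fewer always need just one.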

Does anyone know of a patch for this issue?

Cheers,
Andrew Maier
Software Engineer
Eideticom.com

From: Keith Busch @ 2018-07-10 20:08 UTC


On Tue, Jul 10, 2018 at 05:28:25PM +0000, Andrew Maier wrote:
> Hi all,
> 
> I've recently run into an issue with NVMe SGLs on our controller when the driver sends multiple SGL segments in a single command (i.e., more than 256 SGL entries in a single nvme read/write command): the SGL does not contain enough Data Block descriptors for the transfer. The first segment correctly links to, in my case, a Last Segment descriptor, but the second segment ends without enough Data Block descriptors to cover the full transfer (it is usually short by 4096 or 8192 bytes), which I've verified manually with a PCIe analyzer. This causes our NVMe controller to fail the command with the Data SGL Length Invalid (0xF) status code.
> 
> Repro Steps:
> 1. Set the nvme driver's sgl_threshold parameter to 4096.
> 2. Run a 4 MiB nvme read (e.g., nvme read <sgl_nvme_dev> -s 0 -c 8191 -z 4194304 -t).
> 3. Repeat step 2 until the buffer is split across multiple SGL segments, or try a larger transfer.
> 
> Does anyone know of a patch for this issue?

Probably not the fix you were hoping for, but the following commit
limits the number of SGL entries to 127 for PCI devices, so a command
will only ever have one segment descriptor.

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=943e942e6266f22babee5efeb00f8f672fbff5bd
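Why 127 entries avoids chaining can be seen from the host side. A 4 KiB list page holds 256 16-byte descriptors, so 127 entries always fit in one page and SGL1 is a single Last Segment descriptor; only when the count passes 256 does a full page have to give up its last slot to a link. This minimal C sketch, using assumed names (build_sgl_chain, sgl_link) and host pointers in place of DMA addresses, builds such a chain and keeps each pointer's length in step with the entries on the page it points to. It is an illustration, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct sgl_desc {
    uint64_t addr;
    uint32_t length;
    uint8_t rsvd[3];
    uint8_t type;     /* descriptor type in bits 7:4 */
};

enum { SGL_DATA_BLOCK = 0x0, SGL_SEGMENT = 0x2, SGL_LAST_SEGMENT = 0x3 };
#define DESCS_PER_PAGE (4096u / sizeof(struct sgl_desc)) /* 256 */

/* Pointer descriptor covering n entries of `page` (host pointer as address). */
static struct sgl_desc sgl_link(const struct sgl_desc *page, uint32_t n, int type)
{
    struct sgl_desc d = {0};
    d.addr = (uint64_t)(uintptr_t)page;
    d.length = n * (uint32_t)sizeof(struct sgl_desc);
    d.type = (uint8_t)(type << 4);
    return d;
}

/* Hypothetical host-side builder: spread n Data Block descriptors over list
 * pages, chaining through each full page's last slot, and store the SGL1
 * descriptor in *sgl1. Returns false if max_pages is too small. */
static bool build_sgl_chain(struct sgl_desc pages[][DESCS_PER_PAGE], unsigned max_pages,
                            const struct sgl_desc *data, unsigned n, struct sgl_desc *sgl1)
{
    assert(n > 0);
    struct sgl_desc *link = sgl1;  /* slot that must point at the next page */
    unsigned used = 0;

    for (unsigned p = 0; p < max_pages; p++) {
        unsigned left = n - used;
        bool last = left <= DESCS_PER_PAGE;
        unsigned ndata = last ? left : (unsigned)DESCS_PER_PAGE - 1;
        memcpy(pages[p], data + used, ndata * sizeof *data);
        used += ndata;
        /* The pointer must cover every entry on the page, including the
         * chaining slot; one entry short hides the page's final descriptor. */
        *link = sgl_link(pages[p], last ? ndata : (uint32_t)DESCS_PER_PAGE,
                         last ? SGL_LAST_SEGMENT : SGL_SEGMENT);
        if (last)
            return true;
        link = &pages[p][DESCS_PER_PAGE - 1];
    }
    return false;
}
```

With 127 entries, SGL1 comes back as a Last Segment descriptor of 127 * 16 bytes. With 300 entries, the first page carries 255 Data Blocks plus a Last Segment link whose length covers the 45 entries on the second page.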

From: Andrew Maier @ 2018-07-10 23:55 UTC


Hey Keith,

Thanks for the commit. I hadn't been testing on 4.18-rc4, which it looks like already includes it.

That does appear to fix the bug. However, with large IOCTL commands I now get "submit-io: Invalid argument" instead of the two-segment transfer failure, which I believe is expected.
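A minimal sketch of why that message is plausible, assuming the passthrough path rejects a transfer larger than the device's maximum transfer size with -EINVAL: "submit-io: Invalid argument" has the shape perror() produces, and strerror(EINVAL) reads "Invalid argument" on common C libraries. check_passthru_len and report are hypothetical names, and max_bytes is an illustrative parameter, not the kernel's actual check.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a kernel-side size limit on passthrough I/O. */
static int check_passthru_len(uint64_t data_len, uint64_t max_bytes)
{
    return data_len > max_bytes ? -EINVAL : 0;
}

/* Format a negative errno the way perror(what) would print it. */
static void report(char *buf, size_t len, const char *what, int err)
{
    assert(err < 0);
    snprintf(buf, len, "%s: %s", what, strerror(-err));
}
```

With a hypothetical 4 MiB limit, a 4 MiB request passes while an 8 MiB one fails, and report() turns the -EINVAL into "submit-io: Invalid argument".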

Andrew
