Linux-NVME Archive on lore.kernel.org
* NVMe PCI driver ignores SQHD from completion entries
@ 2019-10-04 18:27 Vaibhav Nagarnaik
  2019-10-05 14:27 ` Keith Busch
  0 siblings, 1 reply; 5+ messages in thread
From: Vaibhav Nagarnaik @ 2019-10-04 18:27 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: linux-nvme, Bart Van Assche, Eric Gouriou, Abbas Companywala, Mihai R.


Hello

According to NVMe spec:
A Submission Queue entry has been consumed by the controller when a
Completion Queue entry is posted that indicates that the Submission
Queue Head Pointer has moved past the slot in which that Submission
Queue entry was placed.

This means the driver needs to verify the SQ Head Pointer value reported
in completion entries before considering a particular SQ entry
reusable; otherwise the behavior is undefined.
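
To illustrate the accounting described above (a hypothetical Python model of
the spec's rule, not the driver's actual code): the host tracks the last SQHD
seen in a completion and treats a slot as reusable only once that head value
has moved past it.

```python
# Hypothetical model of the SQ slot accounting the spec describes;
# not actual driver code. A slot becomes reusable only after a
# completion entry's SQHD field has moved past it.

class SubmissionQueue:
    def __init__(self, depth):
        self.depth = depth   # number of slots in the ring
        self.tail = 0        # next slot the host will write
        self.head = 0        # most recent SQHD reported in a CQE

    def free_slots(self):
        # One slot is left empty so that full and empty are distinguishable.
        return (self.head - self.tail - 1) % self.depth

    def submit(self):
        if self.free_slots() == 0:
            raise BufferError("SQ full: reuse would overwrite an unconsumed entry")
        slot = self.tail
        self.tail = (self.tail + 1) % self.depth
        return slot

    def on_completion(self, sqhd):
        # Only the SQHD value in a completion authorizes slot reuse.
        self.head = sqhd
```

A host that skips the `on_completion()` update eventually hits the
`BufferError` case; that is exactly the point where the current driver
instead overwrites the slot.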

However, the NVMe PCI driver (up to and including v5.3.2) does not
respect this value. Abbas (cc'd here) verified that entries that are
pending on the controller will be overwritten by the driver if there
are enough block requests.

Practically speaking, this is probably not an issue for most
controllers, since they start processing an NVMe command as soon as
possible and then never look at the entry again.

Is this considered a spec violation? And if so, is it worth providing a
fix for it?

Keep in mind that a particularly long-running NVMe command will prevent
some controllers from moving the SQ Head value, even though subsequent
NVMe commands have received completions earlier. This will prevent new
block requests from being submitted until the long-running command
completes.


Vaibhav


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: NVMe PCI driver ignores SQHD from completion entries
  2019-10-04 18:27 NVMe PCI driver ignores SQHD from completion entries Vaibhav Nagarnaik
@ 2019-10-05 14:27 ` Keith Busch
  2019-10-08  0:32   ` Vaibhav Nagarnaik
  0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2019-10-05 14:27 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Abbas Companywala, Bart Van Assche, linux-nvme, Jens Axboe,
	Eric Gouriou, Christoph Hellwig, Mihai R.,
	Sagi Grimberg

On Fri, Oct 04, 2019 at 11:27:30AM -0700, Vaibhav Nagarnaik wrote:
> According to NVMe spec:
> A Submission Queue entry has been consumed by the controller when a
> Completion Queue entry is posted that indicates that the Submission
> Queue Head Pointer has moved past the slot in which that Submission
> Queue entry was placed.
> 
> Which means, the driver needs to verify SQ Head Pointer value reported
> in the completion entries before considering a particular SQ entry
> reusable. Otherwise it's undefined behavior.

The spec allows the controller to process and complete commands out of
order, but the controller must fetch those commands in order. It's in the
"Theory of Operation" section 1.4.

Checking SQ head is required only if the host might submit more commands
than there are entries. The Linux nvme driver allocates enough tags
for the depth of the queue, leaving one entry empty, so having a tag
available means the next sq entry must be available.
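
As a sketch of that invariant (hypothetical Python, not the driver's actual
tag allocator): with only depth - 1 tags, in-flight commands can never reach
the full queue depth, so the tail can never wrap onto an unconsumed slot.

```python
# Hypothetical sketch, not the Linux tag allocator: a queue of depth N is
# given only N - 1 tags, so commands in flight never reach N and the host
# never needs to read SQHD to know the next SQ slot is free.

def peak_in_flight(depth, ops):
    """ops: sequence of 'submit'/'complete'. Returns the peak number of
    commands simultaneously outstanding (occupied SQ slots)."""
    tags_free = depth - 1          # one fewer tag than SQ slots
    in_flight = peak = 0
    for op in ops:
        if op == "submit" and tags_free > 0:
            tags_free -= 1         # a tag must be held before submitting
            in_flight += 1
            peak = max(peak, in_flight)
        elif op == "complete" and in_flight > 0:
            tags_free += 1         # completing a command returns its tag
            in_flight -= 1
    return peak
```

Even an unbounded flood of submissions tops out at depth - 1 outstanding
commands, which matches the one-entry-empty rule above.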


* Re: NVMe PCI driver ignores SQHD from completion entries
  2019-10-05 14:27 ` Keith Busch
@ 2019-10-08  0:32   ` Vaibhav Nagarnaik
  2019-10-08 15:59     ` Keith Busch
  0 siblings, 1 reply; 5+ messages in thread
From: Vaibhav Nagarnaik @ 2019-10-08  0:32 UTC (permalink / raw)
  To: Keith Busch
  Cc: Abbas Companywala, Bart Van Assche, linux-nvme, Jens Axboe,
	Eric Gouriou, Christoph Hellwig, Mihai R.,
	Sagi Grimberg

On Sat, Oct 5, 2019 at 7:27 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Fri, Oct 04, 2019 at 11:27:30AM -0700, Vaibhav Nagarnaik wrote:
> > According to NVMe spec:
> > A Submission Queue entry has been consumed by the controller when a
> > Completion Queue entry is posted that indicates that the Submission
> > Queue Head Pointer has moved past the slot in which that Submission
> > Queue entry was placed.
> >
> > Which means, the driver needs to verify SQ Head Pointer value reported
> > in the completion entries before considering a particular SQ entry
> > reusable. Otherwise it's undefined behavior.
>
> The spec allows the controller to process and complete commands out of
> order, but the controller must fetch those commands in order. It's in the
> "Theory of Operation" section 1.4.

Thanks for clarifying. That makes sense.

It would be nice to have the spec clarify the other sections that
suggest the host must check the SQ Head Pointer value.

> Checking SQ head is required only if the host might submit more commands
> than there are entries. The Linux nvme driver allocates enough tags
> for the depth of the queue, leaving one entry empty, so having a tag
> available means the next sq entry must be available.

But the driver does overwrite submission queue entries that are still
being processed (already fetched by the controller). Are there
guarantees from controllers out there that, once fetched, SQ entries
will not be fetched again for any reason? The spec doesn't prohibit that.



Vaibhav



* Re: NVMe PCI driver ignores SQHD from completion entries
  2019-10-08  0:32   ` Vaibhav Nagarnaik
@ 2019-10-08 15:59     ` Keith Busch
  2019-10-08 17:45       ` James Smart
  0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2019-10-08 15:59 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Abbas Companywala, Bart Van Assche, linux-nvme, Jens Axboe,
	Eric Gouriou, Christoph Hellwig, Mihai R.,
	Sagi Grimberg

On Mon, Oct 07, 2019 at 05:32:55PM -0700, Vaibhav Nagarnaik wrote:
> On Sat, Oct 5, 2019 at 7:27 AM Keith Busch <kbusch@kernel.org> wrote:
> > Checking SQ head is required only if the host might submit more commands
> > than there are entries. The Linux nvme driver allocates enough tags
> > for the depth of the queue, leaving one entry empty, so having a tag
> > available means the next sq entry must be available.
> 
> But the driver does overwrite submission queue entries under process
> (which are already fetched by the controller). Are there guarantees
> from controllers out there that once fetched, the SQ entries will not
> be fetched again for any reason? The spec doesn't prohibit that.

Every controller I've encountered fetches commands exactly once. I have to
admit I don't find this spelled out in great detail in the spec, though
out-of-order fetching doesn't sound like it would satisfy the properties of a
queue.

Closest thing I can find is in section 4.1:

  The consumer of entries on a queue uses the current Head entry pointer to
  identify the slot containing the next entry to be consumed.

And later says:

  Once a Submission Queue or Completion Queue entry has been consumed, the slot
  in which it was placed is free and available for reuse.

If you have to re-fetch an entry, then the controller didn't "consume" it on
the original fetch. If the controller completes a later entry, it has consumed
an entry that wasn't the "next" entry in violation of the consumer requirement.


* Re: NVMe PCI driver ignores SQHD from completion entries
  2019-10-08 15:59     ` Keith Busch
@ 2019-10-08 17:45       ` James Smart
  0 siblings, 0 replies; 5+ messages in thread
From: James Smart @ 2019-10-08 17:45 UTC (permalink / raw)
  To: Keith Busch, Vaibhav Nagarnaik
  Cc: Abbas Companywala, Bart Van Assche, linux-nvme, Jens Axboe,
	Mihai R.,
	Christoph Hellwig, Eric Gouriou, Sagi Grimberg

On 10/8/2019 8:59 AM, Keith Busch wrote:
> On Mon, Oct 07, 2019 at 05:32:55PM -0700, Vaibhav Nagarnaik wrote:
>> On Sat, Oct 5, 2019 at 7:27 AM Keith Busch <kbusch@kernel.org> wrote:
>>> Checking SQ head is required only if the host might submit more commands
>>> than there are entries. The Linux nvme driver allocates enough tags
>>> for the depth of the queue, leaving one entry empty, so having a tag
>>> available means the next sq entry must be available.
>> But the driver does overwrite submission queue entries under process
>> (which are already fetched by the controller). Are there guarantees
>> from controllers out there that once fetched, the SQ entries will not
>> be fetched again for any reason? The spec doesn't prohibit that.
> Every controller I've encountered fetches commands exactly once. I have to
> admit I don't find this spelled out in great detail in the spec, though
> out-of-order fetching doesn't sound like it would satisfy the properties of a
> queue.
>
> Closest thing I can find is in section 4.1:
>
>    The consumer of entries on a queue uses the current Head entry pointer to
>    identify the slot containing the next entry to be consumed.
>
> And later says:
>
>    Once a Submission Queue or Completion Queue entry has been consumed, the slot
>    in which it was placed is free and available for reuse.
>
> If you have to re-fetch an entry, then the controller didn't "consume" it on
> the original fetch. If the controller completes a later entry, it has consumed
> an entry that wasn't the "next" entry in violation of the consumer requirement.
>

Agree - nearly everything so far works in a fetch-once, in-order manner, 
but there's also nothing quoted that carries great conviction. There are 
a lot of implications, especially around the word "next", but no hard 
requirement spelled out.

What I read as more conclusive is:

(4.1) "A Submission Queue entry has been consumed by the controller when 
a Completion Queue entry is posted that indicates that the Submission 
Queue Head Pointer has moved past the slot in which that Submission 
Queue entry was placed."
and
(4.1) "The controller uses the SQ Head Pointer (SQHD) field in 
Completion Queue entries to communicate new values of the Submission 
Queue Head Pointer to the host. A new SQHD value indicates that 
Submission Queue entries have been consumed, but does not indicate 
either execution or completion of any command."
and
(CQE SQHD description) "A Submission Queue entry has been consumed by 
the controller when a Completion Queue entry is posted that indicates 
that the Submission Queue Head Pointer has moved past the slot in which 
that Submission Queue entry was placed."

All of which say an SQE slot isn't consumed until the controller says so 
via a CQE whose SQHD value has moved past that slot. This becomes 
absolutely necessary with a fabric and out-of-order SQE delivery.
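
As a small sketch of that reading (hypothetical Python, not from any real
stack): whether a slot may be reused is decided purely by whether the
reported SQHD has moved past it, not by which command a given CQE completes.

```python
# Hypothetical sketch of the SQHD-based reuse rule; not real host code.
# A slot is reusable iff it lies outside the in-flight window
# [sqhd, tail) on the ring.

def slot_reusable(slot, sqhd, tail):
    if sqhd <= tail:
        in_flight = sqhd <= slot < tail
    else:                          # the in-flight window wraps the ring
        in_flight = slot >= sqhd or slot < tail
    return not in_flight
```

If the command in slot 0 runs long and the controller holds SQHD at 0, a CQE
completing the command from slot 1 still reports SQHD = 0, so both slots
remain the controller's property; that is exactly the case where the host
must track SQHD rather than individual completions.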

Given there's some deviation in interpretation, it's likely a good thing 
to take up in the NVM Express group.

-- james


