All of lore.kernel.org
* [PATCH] nvme-pci: slimmer CQ head update
@ 2020-02-28 18:45 Alexey Dobriyan
  2020-02-29  5:53 ` Keith Busch
  0 siblings, 1 reply; 31+ messages in thread
From: Alexey Dobriyan @ 2020-02-28 18:45 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi; +Cc: linux-nvme

Update CQ head with pre-increment operator. This saves subtraction of 1
and a few registers.

Also update phase with "^= 1". This generates only one RMW instruction.

	ffffffff815ba150 <nvme_update_cq_head>:
	ffffffff815ba150:       0f b7 47 70             movzx  eax,WORD PTR [rdi+0x70]
	ffffffff815ba154:       83 c0 01                add    eax,0x1
	ffffffff815ba157:       66 89 47 70             mov    WORD PTR [rdi+0x70],ax
	ffffffff815ba15b:       66 3b 47 68             cmp    ax,WORD PTR [rdi+0x68]
	ffffffff815ba15f:       74 01                   je     ffffffff815ba162 <nvme_update_cq_head+0x12>
	ffffffff815ba161:       c3                      ret
	ffffffff815ba162:       31 c0                   xor    eax,eax
	ffffffff815ba164:       80 77 74 01      ===>   xor    BYTE PTR [rdi+0x74],0x1
	ffffffff815ba168:       66 89 47 70             mov    WORD PTR [rdi+0x70],ax
	ffffffff815ba16c:       c3                      ret


	add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-119 (-119)
	Function                                     old     new   delta
	nvme_poll                                    690     678     -12
	nvme_dev_disable                            1230    1177     -53
	nvme_irq                                     613     559     -54

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 drivers/nvme/host/pci.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -982,11 +982,9 @@ static void nvme_complete_cqes(struct nvme_queue *nvmeq, u16 start, u16 end)
 
 static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
 {
-	if (nvmeq->cq_head == nvmeq->q_depth - 1) {
+	if (++nvmeq->cq_head == nvmeq->q_depth) {
 		nvmeq->cq_head = 0;
-		nvmeq->cq_phase = !nvmeq->cq_phase;
-	} else {
-		nvmeq->cq_head++;
+		nvmeq->cq_phase ^= 1;
 	}
 }
 

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-02-28 18:45 [PATCH] nvme-pci: slimmer CQ head update Alexey Dobriyan
@ 2020-02-29  5:53 ` Keith Busch
  2020-05-06 11:03   ` John Garry
  0 siblings, 1 reply; 31+ messages in thread
From: Keith Busch @ 2020-02-29  5:53 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: axboe, hch, linux-nvme, sagi

On Fri, Feb 28, 2020 at 09:45:19PM +0300, Alexey Dobriyan wrote:
> @@ -982,11 +982,9 @@ static void nvme_complete_cqes(struct nvme_queue *nvmeq, u16 start, u16 end)
>  
>  static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
>  {
> -	if (nvmeq->cq_head == nvmeq->q_depth - 1) {
> +	if (++nvmeq->cq_head == nvmeq->q_depth) {
>  		nvmeq->cq_head = 0;
> -		nvmeq->cq_phase = !nvmeq->cq_phase;
> -	} else {
> -		nvmeq->cq_head++;
> +		nvmeq->cq_phase ^= 1;
>  	}
>  }
>  

Looks good to me.

Reviewed-by: Keith Busch <kbusch@kernel.org>


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-02-29  5:53 ` Keith Busch
@ 2020-05-06 11:03   ` John Garry
  2020-05-06 12:47     ` Keith Busch
  0 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2020-05-06 11:03 UTC (permalink / raw)
  To: Keith Busch, Alexey Dobriyan; +Cc: axboe, hch, linux-nvme, sagi

On 29/02/2020 05:53, Keith Busch wrote:
> On Fri, Feb 28, 2020 at 09:45:19PM +0300, Alexey Dobriyan wrote:
>> @@ -982,11 +982,9 @@ static void nvme_complete_cqes(struct nvme_queue *nvmeq, u16 start, u16 end)
>>   

I think that this patch may be broken when use_threaded_interrupts=1, as
I see this crash on my arm64 system:

[   29.366700] Unable to handle kernel paging request at virtual address ffff80001178500e
[   29.375316] Mem abort info:
[   29.378096]   ESR = 0x96000007
[   29.381135]   EC = 0x25: DABT (current EL), IL = 32 bits
[   29.386422]   SET = 0, FnV = 0
[   29.389461]   EA = 0, S1PTW = 0
[   29.392587] Data abort info:
[   29.395456]   ISV = 0, ISS = 0x00000007
[   29.399272]   CM = 0, WnR = 0
[   29.402226] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000020252be08000
[   29.408896] [ffff80001178500e] pgd=00000027ff835003, pud=00000027ff836003, pmd=00000027ffa53003, pte=0000000000000000
[   29.419457] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[   29.425003] Modules linked in:
[   29.425003] CPU: 17 PID: 1023 Comm: irq/137-nvme2q2 Not tainted 5.7.0-rc4-ga3c4a5a-dirty #117
[   29.436528] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[   29.445356] pstate: 40400089 (nZcv daIf +PAN -UAO)
[   29.450133] pc : nvme_irq_check+0x10/0x28
[   29.454130] lr : __handle_irq_event_percpu+0x5c/0x144
[   29.459158] sp : ffff80001008be90
[   29.462458] x29: ffff80001008be90 x28: ffff002fd7b144c0
[   29.467746] x27: ffffb111ce83e440 x26: ffffb111ce83e468
[   29.473032] x25: ffffb111cf14426f x24: ffff0027d6b67800
[   29.478319] x23: 0000000000000089 x22: ffff80001008bf1c
[   29.483606] x21: 0000000000000000 x20: ffff0027d6b67800
[   29.488893] x19: ffff2027ca8e6880 x18: 0000000000000000
[   29.494180] x17: 000007ffffffffff x16: 00000000000001de
[   29.499466] x15: 000000000001ffff x14: 0000000000000001
[   29.504754] x13: 000000000000002b x12: 000000000000002b
[   29.510041] x11: 000000000000ffff x10: ffffb111cef71cb8
[   29.515328] x9 : 0000000000000040 x8 : ffff2027d10fbe08
[   29.520616] x7 : 0000000000000000 x6 : 0000000000000000
[   29.525903] x5 : 0000000000000000 x4 : 0000000000000001
[   29.531190] x3 : 0000000000000400 x2 : ffff800011781000
[   29.536477] x1 : ffff800011785000 x0 : 0000000000000001
[   29.541764] Call trace:
[   29.544201]  nvme_irq_check+0x10/0x28
[   29.547847]  handle_irq_event_percpu+0x1c/0x54
[   29.552269]  handle_irq_event+0x44/0x74
[   29.556088]  handle_fasteoi_irq+0xa8/0x160
[   29.560167]  generic_handle_irq+0x2c/0x40
[   29.564158]  __handle_domain_irq+0x64/0xb0
[   29.568238]  gic_handle_irq+0x94/0x194
[   29.571970]  el1_irq+0xb8/0x180
[   29.575097]  nvme_irq+0xe4/0x1f4
[   29.578310]  irq_thread_fn+0x28/0x6c
[   29.581868]  irq_thread+0x158/0x1e8
[   29.585343]  kthread+0x124/0x150
[   29.588556]  ret_from_fork+0x10/0x18
[   29.592116] Code: 7940e023 f9402422 3941d020 8b031041 (79401c21)
[   29.598249] ---[ end trace 33259050d5109e6f ]---
[   29.607114] Kernel panic - not syncing: Fatal exception in interrupt
[   29.613459] SMP: stopping secondary CPUs
[   29.617404] Kernel Offset: 0x3111bd6d0000 from 0xffff800010000000
[   29.623468] PHYS_OFFSET: 0xffffe13f80000000
[   29.627633] CPU features: 0x080002,22808a18
[   29.631797] Memory Limit: none
[   29.639056] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Reverting this patch makes the crash go away.

>>   static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
>>   {
>> -	if (nvmeq->cq_head == nvmeq->q_depth - 1) {
>> +	if (++nvmeq->cq_head == nvmeq->q_depth) {

I figure momentarily nvmeq->cq_head may transition to equal 
nvmeq->q_depth and then to 0, which causes an out-of-bounds access here:

static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq)
{
	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
			nvmeq->cq_phase;
}

>>   		nvmeq->cq_head = 0;
>> -		nvmeq->cq_phase = !nvmeq->cq_phase;
>> -	} else {
>> -		nvmeq->cq_head++;
>> +		nvmeq->cq_phase ^= 1;
>>   	}
>>   }
>>   

Any disagreement to sending a revert patch?

Thanks,
John


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-06 11:03   ` John Garry
@ 2020-05-06 12:47     ` Keith Busch
  2020-05-06 13:24       ` Alexey Dobriyan
  0 siblings, 1 reply; 31+ messages in thread
From: Keith Busch @ 2020-05-06 12:47 UTC (permalink / raw)
  To: John Garry; +Cc: axboe, sagi, Alexey Dobriyan, linux-nvme, hch

On Wed, May 06, 2020 at 12:03:35PM +0100, John Garry wrote:
> On 29/02/2020 05:53, Keith Busch wrote:
> > On Fri, Feb 28, 2020 at 09:45:19PM +0300, Alexey Dobriyan wrote:
> > >   static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
> > >   {
> > > -	if (nvmeq->cq_head == nvmeq->q_depth - 1) {
> > > +	if (++nvmeq->cq_head == nvmeq->q_depth) {
> 
> I figure momentarily nvmeq->cq_head may transition to equal nvmeq->q_depth
> and then to 0, which causes an out-of-bounds access here:
> 
> static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq)
> {
> 	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
> 			nvmeq->cq_phase;
> }

Thanks for the notice, your analysis sounds correct to me.

Ideally we wouldn't let the irq check happen while the threaded
handler is running, but that is a bit risky to introduce at this
point. I'm okay with reverting to fix this issue.


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-06 12:47     ` Keith Busch
@ 2020-05-06 13:24       ` Alexey Dobriyan
  2020-05-06 13:44         ` John Garry
  2020-05-06 14:44         ` Keith Busch
  0 siblings, 2 replies; 31+ messages in thread
From: Alexey Dobriyan @ 2020-05-06 13:24 UTC (permalink / raw)
  To: Keith Busch; +Cc: axboe, John Garry, hch, linux-nvme, sagi

On Wed, May 06, 2020 at 06:47:01AM -0600, Keith Busch wrote:
> On Wed, May 06, 2020 at 12:03:35PM +0100, John Garry wrote:
> > On 29/02/2020 05:53, Keith Busch wrote:
> > > On Fri, Feb 28, 2020 at 09:45:19PM +0300, Alexey Dobriyan wrote:
> > > >   static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
> > > >   {
> > > > -	if (nvmeq->cq_head == nvmeq->q_depth - 1) {
> > > > +	if (++nvmeq->cq_head == nvmeq->q_depth) {
> > 
> > I figure momentarily nvmeq->cq_head may transition to equal nvmeq->q_depth
> > and then to 0, which causes an out-of-bounds access here:
> > 
> > static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq)
> > {
> > 	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
> > 			nvmeq->cq_phase;
> > }
> 
> Thanks for the notice, your analysis sounds correct to me.
> 
> Ideally we wouldn't let the irq check happen while the threaded
> handler is running, but that is a bit risky to introduce at this
> point. I'm okay with reverting to fix this issue.

Pre-increment is still beneficial; it should just be done in a register.
Please test.


Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 drivers/nvme/host/pci.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -973,9 +973,16 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
 
 static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
 {
-	if (++nvmeq->cq_head == nvmeq->q_depth) {
+	/*
+	 * Can't pre-increment ->cq_head directly.
+	 * It must be in [0, ->q_depth) range at all times.
+	 */
+	u16 tmp = READ_ONCE(nvmeq->cq_head);
+	if (++tmp == nvmeq->q_depth) {
 		nvmeq->cq_head = 0;
 		nvmeq->cq_phase ^= 1;
+	} else {
+		nvmeq->cq_head = tmp;
 	}
 }
 


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-06 13:24       ` Alexey Dobriyan
@ 2020-05-06 13:44         ` John Garry
  2020-05-06 14:01           ` Alexey Dobriyan
  2020-05-06 14:35           ` Christoph Hellwig
  2020-05-06 14:44         ` Keith Busch
  1 sibling, 2 replies; 31+ messages in thread
From: John Garry @ 2020-05-06 13:44 UTC (permalink / raw)
  To: Alexey Dobriyan, Keith Busch; +Cc: axboe, hch, linux-nvme, sagi

On 06/05/2020 14:24, Alexey Dobriyan wrote:
> On Wed, May 06, 2020 at 06:47:01AM -0600, Keith Busch wrote:
>> On Wed, May 06, 2020 at 12:03:35PM +0100, John Garry wrote:
>>> On 29/02/2020 05:53, Keith Busch wrote:
>>>> On Fri, Feb 28, 2020 at 09:45:19PM +0300, Alexey Dobriyan wrote:
>>>>>    static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
>>>>>    {
>>>>> -	if (nvmeq->cq_head == nvmeq->q_depth - 1) {
>>>>> +	if (++nvmeq->cq_head == nvmeq->q_depth) {
>>>
>>> I figure momentarily nvmeq->cq_head may transition to equal nvmeq->q_depth
>>> and then to 0, which causes an out-of-bounds access here:
>>>
>>> static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq)
>>> {
>>> 	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
>>> 			nvmeq->cq_phase;
>>> }
>>
>> Thanks for the notice, your analysis sounds correct to me.
>>
>> Ideally we wouldn't let the irq check happen while the threaded
>> handler is running, but that is a bit risky to introduce at this
>> point. I'm okay with reverting to fix this issue.
> 
> Pre-increment is still beneficial, should be done in register.
> Please, test.

I'd rather hear the maintainer’s opinion before bothering testing this...

And you have not given any significant justification for your fix over a 
revert.

Thanks,
John

> 
> 
> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
> ---
> 
>   drivers/nvme/host/pci.c |    9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -973,9 +973,16 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
>   
>   static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
>   {
> -	if (++nvmeq->cq_head == nvmeq->q_depth) {
> +	/*
> +	 * Can't pre-increment ->cq_head directly.
> +	 * It must be in [0, ->q_depth) range at all times.
> +	 */
> +	u16 tmp = READ_ONCE(nvmeq->cq_head);
> +	if (++tmp == nvmeq->q_depth) {
>   		nvmeq->cq_head = 0;
>   		nvmeq->cq_phase ^= 1;
> +	} else {
> +		nvmeq->cq_head = tmp;
>   	}
>   }
>   
> .
> 



* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-06 13:44         ` John Garry
@ 2020-05-06 14:01           ` Alexey Dobriyan
  2020-05-06 14:35           ` Christoph Hellwig
  1 sibling, 0 replies; 31+ messages in thread
From: Alexey Dobriyan @ 2020-05-06 14:01 UTC (permalink / raw)
  To: John Garry; +Cc: Keith Busch, axboe, hch, linux-nvme, sagi

On Wed, May 06, 2020 at 02:44:50PM +0100, John Garry wrote:
> On 06/05/2020 14:24, Alexey Dobriyan wrote:
> > On Wed, May 06, 2020 at 06:47:01AM -0600, Keith Busch wrote:
> >> On Wed, May 06, 2020 at 12:03:35PM +0100, John Garry wrote:
> >>> On 29/02/2020 05:53, Keith Busch wrote:
> >>>> On Fri, Feb 28, 2020 at 09:45:19PM +0300, Alexey Dobriyan wrote:
> >>>>>    static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
> >>>>>    {
> >>>>> -	if (nvmeq->cq_head == nvmeq->q_depth - 1) {
> >>>>> +	if (++nvmeq->cq_head == nvmeq->q_depth) {
> >>>
> >>> I figure momentarily nvmeq->cq_head may transition to equal nvmeq->q_depth
> >>> and then to 0, which causes an out-of-bounds access here:
> >>>
> >>> static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq)
> >>> {
> >>> 	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
> >>> 			nvmeq->cq_phase;
> >>> }
> >>
> >> Thanks for the notice, your analysis sounds correct to me.
> >>
> >> Ideally we wouldn't let the irq check happen while the threaded
> >> handler is running, but that is a bit risky to introduce at this
> >> point. I'm okay with reverting to fix this issue.
> > 
> > Pre-increment is still beneficial, should be done in register.
> > Please, test.
> 
> I'd rather hear the maintainer’s opinion before bothering testing this...
> 
> And you have not given any significant justification for your fix over a 
> revert.

It is in the comment: whatever gets written to ->cq_head must be
in [0, ->q_depth) range.

> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -973,9 +973,16 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
> >   
> >   static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
> >   {
> > -	if (++nvmeq->cq_head == nvmeq->q_depth) {
> > +	/*
> > +	 * Can't pre-increment ->cq_head directly.
> > +	 * It must be in [0, ->q_depth) range at all times.
> > +	 */
> > +	u16 tmp = READ_ONCE(nvmeq->cq_head);
> > +	if (++tmp == nvmeq->q_depth) {
> >   		nvmeq->cq_head = 0;
> >   		nvmeq->cq_phase ^= 1;
> > +	} else {
> > +		nvmeq->cq_head = tmp;
> >   	}
> >   }


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-06 13:44         ` John Garry
  2020-05-06 14:01           ` Alexey Dobriyan
@ 2020-05-06 14:35           ` Christoph Hellwig
  2020-05-06 16:26             ` John Garry
  1 sibling, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2020-05-06 14:35 UTC (permalink / raw)
  To: John Garry; +Cc: sagi, linux-nvme, hch, axboe, Keith Busch, Alexey Dobriyan

On Wed, May 06, 2020 at 02:44:50PM +0100, John Garry wrote:
> I'd rather hear the maintainer’s opinion before bothering testing this...

As the other maintainer - please give it a spin.  The explanation from
Alexey in the reply to your mail makes complete sense to me.


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-06 13:24       ` Alexey Dobriyan
  2020-05-06 13:44         ` John Garry
@ 2020-05-06 14:44         ` Keith Busch
  2020-05-07 15:58           ` Keith Busch
  1 sibling, 1 reply; 31+ messages in thread
From: Keith Busch @ 2020-05-06 14:44 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: axboe, John Garry, hch, linux-nvme, sagi

On Wed, May 06, 2020 at 04:24:29PM +0300, Alexey Dobriyan wrote:
>  static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
>  {
> -	if (++nvmeq->cq_head == nvmeq->q_depth) {
> +	/*
> +	 * Can't pre-increment ->cq_head directly.
> +	 * It must be in [0, ->q_depth) range at all times.
> +	 */
> +	u16 tmp = READ_ONCE(nvmeq->cq_head);
> +	if (++tmp == nvmeq->q_depth) {


This looks correct too, but there's no need for the READ_ONCE access,
or the pre-increment:

	u16 tmp = nvmeq->cq_head + 1;
 	if (tmp == nvmeq->q_depth) {


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-06 14:35           ` Christoph Hellwig
@ 2020-05-06 16:26             ` John Garry
  2020-05-06 16:31               ` Will Deacon
  0 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2020-05-06 16:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: sagi, Will Deacon, linux-nvme, axboe, Keith Busch, Robin Murphy,
	Alexey Dobriyan

+ arm64 guys (Please note WARN below, generated when testing NVMe)

On 06/05/2020 15:35, Christoph Hellwig wrote:
> On Wed, May 06, 2020 at 02:44:50PM +0100, John Garry wrote:
>> I'd rather hear the maintainer’s opinion before bothering testing this...
> 
> As the other maintainer - please give it a spin. 

ok, so I have tested with the modification from Keith (to avoid the 
READ_ONCE()), and it's ok for use_threaded_interrupts=1.

However, for use_threaded_interrupts=0, I see a new issue:

[  122.524290] WARNING: CPU: 86 PID: 1157 at drivers/iommu/io-pgtable-arm.c:304 __arm_lpae_map+0x1d0/0x2bc
[  122.533640] Modules linked in:
[  122.538062] CPU: 86 PID: 1157 Comm: fio Tainted: G        W         5.7.0-rc4-ga3c4a5a-dirty #140
[  122.546892] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  122.555722] pstate: 60400009 (nZCv daif +PAN -UAO)
[  122.561870] pc : __arm_lpae_map+0x1d0/0x2bc
[  122.566036] lr : __arm_lpae_map+0xf0/0x2bc
[  122.570114] sp : ffff800024b5b4b0
[  122.574794] x29: ffff800024b5b4b0 x28: ffffb62a9e503d4c
[  122.580083] x27: 0000000000001000 x26: 0000000000000001
[  122.586750] x25: ffff2027c6e10980 x24: 0000000000000f44
[  122.592038] x23: 00000027d10f9000 x22: 00000000ef130000
[  122.598706] x21: 0000000000001000 x20: 0000000000000980
[  122.603994] x19: ffff0027dba3e200 x18: 0000000000000000
[  122.609970] x17: 0000000000000000 x16: 0000000000000000
[  122.615257] x15: 0000000000000000 x14: 0000000000000000
[  122.620544] x13: 0000000000000000 x12: 0000000000000000
[  122.625831] x11: 0000000000000002 x10: 0000000000001000
[  122.631119] x9 : 0000000000001000 x8 : 0000000000000000
[  122.636406] x7 : 0000000000000009 x6 : ffff2027c6e10000
[  122.641693] x5 : 0000000000000003 x4 : 0000000000000f44
[  122.646980] x3 : 00000000000ef130 x2 : 0000000000000002
[  122.652266] x1 : 0000000000000001 x0 : 0000000000000003
[  122.657554] Call trace:
[  122.659989]  __arm_lpae_map+0x1d0/0x2bc
[  122.663807]  __arm_lpae_map+0xf0/0x2bc
[  122.667537]  __arm_lpae_map+0xf0/0x2bc
[  122.671270]  __arm_lpae_map+0xf0/0x2bc
[  122.675003]  arm_lpae_map+0xdc/0x164
[  122.678563]  arm_smmu_map+0x18/0x28
[  122.682035]  __iommu_map+0xdc/0x17c
[  122.685508]  iommu_map_atomic+0x10/0x18
[  122.689325]  __iommu_dma_map+0xcc/0xe4
[  122.693058]  iommu_dma_map_page+0x80/0xc4
[  122.697050]  nvme_queue_rq+0x7dc/0x7fc
[  122.700781]  __blk_mq_try_issue_directly+0x108/0x1c0
[  122.705722]  blk_mq_request_issue_directly+0x40/0x64
[  122.710663]  blk_mq_try_issue_list_directly+0x5c/0xf0
[  122.715692]  blk_mq_sched_insert_requests+0x170/0x1d0
[  122.720721]  blk_mq_flush_plug_list+0x10c/0x158
[  122.725231]  blk_flush_plug_list+0xc4/0xd4
[  122.729308]  blk_finish_plug+0x30/0x40
[  122.733040]  blkdev_direct_IO+0x3d4/0x444
[  122.737034]  generic_file_read_iter+0x90/0xaf8
[  122.741458]  blkdev_read_iter+0x3c/0x54
[  122.745276]  aio_read+0xdc/0x138
[  122.748490]  io_submit_one+0x4ac/0xbf0
[  122.752221]  __arm64_sys_io_submit+0x16c/0x1f8
[  122.756645]  el0_svc_common.constprop.3+0xb8/0x170
[  122.761415]  do_el0_svc+0x70/0x88
[  122.764716]  el0_sync_handler+0xf0/0x12c
[  122.768620]  el0_sync+0x140/0x180
[  122.771918] ---[ end trace 2e5c7ee849d0ea30 ]---
[  122.776604] ------------[ cut here ]------------

and many more spewed out. And this is after the RCU stall report from a 
CPU locking up handling the NVMe hard interrupt (which has been seen on 
previous kernels).

> The explanation from
> Alexey in the reply to your mail makes complete sense to me.
> .

I meant that the original patch commit log had disassembly etc. to prove
its worth, but now there is not much to show that the patch+fix is still
better.

Thanks,
John


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-06 16:26             ` John Garry
@ 2020-05-06 16:31               ` Will Deacon
  2020-05-06 16:52                 ` Robin Murphy
  0 siblings, 1 reply; 31+ messages in thread
From: Will Deacon @ 2020-05-06 16:31 UTC (permalink / raw)
  To: John Garry
  Cc: sagi, linux-nvme, Alexey Dobriyan, axboe, Keith Busch,
	Robin Murphy, Christoph Hellwig

On Wed, May 06, 2020 at 05:26:35PM +0100, John Garry wrote:
> + arm64 guys (Please note WARN below, generated when testing NVMe)
> 
> On 06/05/2020 15:35, Christoph Hellwig wrote:> On Wed, May 06, 2020 at
> 02:44:50PM +0100, John Garry wrote:
> > > I'd rather hear the maintainer’s opinion before bothering testing this...
> > 
> > As the other maintainer - please give it a spin.
> 
> ok, so I have tested with the modification from Keith (to avoid the
> READ_ONCE()), and it's ok for use_threaded_interrupts=1.
> 
> However, for use_threaded_interrupts=0, I see a new issue:
> 
> [  122.524290] WARNING: CPU: 86 PID: 1157 at
> drivers/iommu/io-pgtable-arm.c:304

That means you're trying to map something that is already mapped.

Will


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-06 16:31               ` Will Deacon
@ 2020-05-06 16:52                 ` Robin Murphy
  2020-05-06 17:02                   ` John Garry
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Murphy @ 2020-05-06 16:52 UTC (permalink / raw)
  To: Will Deacon, John Garry
  Cc: sagi, linux-nvme, Alexey Dobriyan, axboe, Keith Busch, Christoph Hellwig

On 2020-05-06 5:31 pm, Will Deacon wrote:
> On Wed, May 06, 2020 at 05:26:35PM +0100, John Garry wrote:
>> + arm64 guys (Please note WARN below, generated when testing NVMe)
>>
>> On 06/05/2020 15:35, Christoph Hellwig wrote:> On Wed, May 06, 2020 at
>> 02:44:50PM +0100, John Garry wrote:
>>>> I'd rather hear the maintainer’s opinion before bothering testing this...
>>>
>>> As the other maintainer - please give it a spin.
>>
>> ok, so I have tested with the modification from Keith (to avoid the
>> READ_ONCE()), and it's ok for use_threaded_interrupts=1.
>>
>> However, for use_threaded_interrupts=0, I see a new issue:
>>
>> [  122.524290] WARNING: CPU: 86 PID: 1157 at
>> drivers/iommu/io-pgtable-arm.c:304
> 
> That means you're trying to map something that is already mapped.

...which is a bit of an achievement with the DMA API, but most likely 
implies that a prior dma_unmap was called with the wrong size, such that 
we only unmapped part of the IOVA region that has now been reused.

John, if you can reproduce this semi-reliably, try throwing 
DMA_API_DEBUG at it in the first instance.

Robin.


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-06 16:52                 ` Robin Murphy
@ 2020-05-06 17:02                   ` John Garry
  2020-05-07  8:18                     ` John Garry
  0 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2020-05-06 17:02 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon
  Cc: sagi, linux-nvme, Alexey Dobriyan, axboe, Keith Busch, Christoph Hellwig

On 06/05/2020 17:52, Robin Murphy wrote:
> On 2020-05-06 5:31 pm, Will Deacon wrote:
>> On Wed, May 06, 2020 at 05:26:35PM +0100, John Garry wrote:
>>> + arm64 guys (Please note WARN below, generated when testing NVMe)
>>>
>>> On 06/05/2020 15:35, Christoph Hellwig wrote:> On Wed, May 06, 2020 at
>>> 02:44:50PM +0100, John Garry wrote:
>>>>> I'd rather hear the maintainer’s opinion before bothering testing 
>>>>> this...
>>>>
>>>> As the other maintainer - please give it a spin.
>>>
>>> ok, so I have tested with the modification from Keith (to avoid the
>>> READ_ONCE()), and it's ok for use_threaded_interrupts=1.
>>>
>>> However, for use_threaded_interrupts=0, I see a new issue:
>>>
>>> [  122.524290] WARNING: CPU: 86 PID: 1157 at
>>> drivers/iommu/io-pgtable-arm.c:304
>>
>> That means you're trying to map something that is already mapped.

Thanks

So I just confirmed iommu.strict=0 has the same issue, as expected.

> 
> ...which is a bit of an achievement with the DMA API, but most likely 
> implies that a prior dma_unmap was called with the wrong size, such that 
> we only unmapped part of the IOVA region that has now been reused.
> 
> John, if you can reproduce this semi-reliably,

Quite reliably.

> try throwing 
> DMA_API_DEBUG at it in the first instance.
> 

ok, let me have a look at that. BTW, could I request changing that WARN 
to the ratelimited variant :)

Cheers,
John


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-06 17:02                   ` John Garry
@ 2020-05-07  8:18                     ` John Garry
  2020-05-07 11:04                       ` Robin Murphy
  0 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2020-05-07  8:18 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon
  Cc: sagi, linux-nvme, Alexey Dobriyan, axboe, Keith Busch, Christoph Hellwig

On 06/05/2020 18:02, John Garry wrote:
> 
>>
>> ...which is a bit of an achievement with the DMA API, but most likely 
>> implies that a prior dma_unmap was called with the wrong size, such 
>> that we only unmapped part of the IOVA region that has now been reused.


So I quickly captured this, and I can look further when I get access to 
the HW again later today:

oard/Mouse KVM 1.1.0] on usb-0000:7a:01.0-1.1/input1
[   19.715413] hns3 0000:bd:00.0 eth0: link up
[  130.359047] random: crng init done
[  150.778579] rcu: INFO: rcu_preempt self-detected stall on CPU
[  150.784307] rcu:     32-....: (462 ticks this GP) idle=7ce/1/0x4000000000000004 softirq=125/125 fqs=2609
[  150.793582]  (t=5254 jiffies g=333 q=7212724)
[  150.797921] Task dump for CPU 32:
[  150.801221] fio             R  running task        0  1084   1054 0x00000002
[  150.808238] Call trace:
[  150.810683]  dump_backtrace+0x0/0x1a8
[  150.814331]  show_stack+0x14/0x1c
[  150.817633]  sched_show_task+0x16c/0x180
[  150.821540]  dump_cpu_task+0x40/0x4c
[  150.825102]  rcu_dump_cpu_stacks+0xb8/0xf8
[  150.829179]  rcu_sched_clock_irq+0x838/0xbe0
[  150.833431]  update_process_times+0x2c/0x50
[  150.837597]  tick_sched_handle.isra.14+0x34/0x54
[  150.842193]  tick_sched_timer+0x48/0x88
[  150.846012]  __hrtimer_run_queues+0x104/0x154
[  150.850349]  hrtimer_interrupt+0xe0/0x240
[  150.854346]  arch_timer_handler_phys+0x2c/0x38
[  150.858770]  handle_percpu_devid_irq+0x78/0x124
[  150.863281]  generic_handle_irq+0x2c/0x40
[  150.867272]  __handle_domain_irq+0x64/0xb0
[  150.871353]  gic_handle_irq+0x94/0x194
[  150.875086]  el1_irq+0xb8/0x180
[  150.878213]  efi_header_end+0xb4/0x230
[  150.881948]  irq_exit+0xc8/0xd0
[  150.885076]  __handle_domain_irq+0x68/0xb0
[  150.889153]  gic_handle_irq+0x94/0x194
[  150.892885]  el1_irq+0xb8/0x180
[  150.896016]  _raw_spin_unlock_irqrestore+0x14/0x3c
[  150.900787]  debug_dma_assert_idle+0x10c/0x1f8
[  150.905215]  wp_page_copy+0x94/0x8c0
[  150.908774]  do_wp_page+0xa4/0x54c
[  150.912162]  __handle_mm_fault+0x6dc/0x1008
[  150.916326]  handle_mm_fault+0xf0/0x1b0
[  150.920148]  do_page_fault+0x250/0x3dc
[  150.923881]  do_mem_abort+0x3c/0x9c
[  150.927354]  el0_sync_handler+0xb0/0x12c
[  150.931259]  el0_sync+0x140/0x180
[  150.934559] Task dump for CPU 39:
[  150.937859] fio             R  running task        0  1059   1054 0x00000002
[  150.944875] Call trace:
[  150.947313]  __switch_to+0xf8/0x158
[  150.950787]  0xffff800024863c58
[  150.953914] Task dump for CPU 52:
[  150.957214] fio             R  running task        0  1072   1054 0x00000002
[  150.964230] Call trace:
[  150.966667]  __switch_to+0xf8/0x158
[  150.970139]  0xffff8000248cbc58
[  150.973266] Task dump for CPU 58:
[  150.976566] fio             R  running task        0  1078   1054 0x00000002
[  150.983581] Call trace:
[  150.986020]  __switch_to+0xf8/0x158
[  150.989492]  0xffff8000248fbc58
[  176.682259] ------------[ cut here ]------------
[  176.686874] WARNING: CPU: 0 PID: 1114 at drivers/iommu/io-pgtable-arm.c:603 __arm_lpae_unmap+0x4a0/0x4c4
[  176.696310] Modules linked in:
[  176.699355] CPU: 0 PID: 1114 Comm: fio Not tainted 5.7.0-rc4-g6ef3717-dirty #146
[  176.706716] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  176.715548] pstate: 80400089 (Nzcv daIf +PAN -UAO)
[  176.720319] pc : __arm_lpae_unmap+0x4a0/0x4c4
[  176.724656] lr : __arm_lpae_unmap+0xf4/0x4c4
[  176.728907] sp : ffff8000100039d0
[  176.732208] x29: ffff8000100039d0 x28: ffff800010003d28
[  176.737496] x27: ffffa6c8a06deec8 x26: 0000000000001000
[  176.742784] x25: ffff800010003d28 x24: 0000000000001000
[  176.748071] x23: 00000000ef371000 x22: 0000000000000000
[  176.753359] x21: ffff002fdce3bd88 x20: 0000000000000000
[  176.758646] x19: ffff002fdce3bc00 x18: 0000000000000000
[  176.763934] x17: 0000001fffffffff x16: 000000000000021e
[  176.769221] x15: 000000000001ffff x14: 0000000000000001
[  176.774509] x13: 0000000000000025 x12: 0000000000000025
[  176.779796] x11: 000000000000ffff x10: ffffa6c8a0e151e0
[  176.785084] x9 : ffff2027c72a7b88 x8 : 000000000000000c
[  176.790371] x7 : 0000000000000b88 x6 : 0000000000000009
[  176.795658] x5 : ffff2027c72a7000 x4 : 0000000000000003
[  176.800945] x3 : 00000000000ef371 x2 : 0000000000000009
[  176.806232] x1 : ffff800010003d28 x0 : 0000000000000000
[  176.811521] Call trace:
[  176.813956]  __arm_lpae_unmap+0x4a0/0x4c4
[  176.817947]  __arm_lpae_unmap+0xf4/0x4c4
[  176.821844]  __arm_lpae_unmap+0xf4/0x4c4
[  176.825748]  __arm_lpae_unmap+0xf4/0x4c4
[  176.829654]  arm_lpae_unmap+0x68/0x7c
[  176.833301]  arm_smmu_unmap+0x18/0x20
[  176.836947]  __iommu_unmap+0xa8/0xfc
[  176.840507]  iommu_unmap_fast+0xc/0x14
[  176.844239]  __iommu_dma_unmap+0x70/0xd8
[  176.848143]  iommu_dma_unmap_page+0x38/0x88
[  176.852310]  nvme_unmap_data+0x238/0x23c
[  176.856216]  nvme_pci_complete_rq+0x28/0x58
[  176.860384]  blk_mq_complete_request+0x114/0x150
[  176.864979]  nvme_irq+0xbc/0x204
[  176.868196]  __handle_irq_event_percpu+0x5c/0x144
[  176.872879]  handle_irq_event_percpu+0x1c/0x54
[  176.877304]  handle_irq_event+0x44/0x74
[  176.881121]  handle_fasteoi_irq+0xa8/0x160
[  176.885198]  generic_handle_irq+0x2c/0x40
[  176.889189]  __handle_domain_irq+0x64/0xb0
[  176.893267]  gic_handle_irq+0x94/0x194
[  176.897001]  el1_irq+0xb8/0x180
[  176.900129]  el0_svc_common.constprop.3+0x34/0x170
[  176.904898]  do_el0_svc+0x70/0x88
[  176.908199]  el0_sync_handler+0xf0/0x12c
[  176.912104]  el0_sync+0x140/0x180
[  176.915404] ---[ end trace c76e980b98b6531c ]---
[  176.920011] ------------[ cut here ]------------
[  176.924608] WARNING: CPU: 0 PID: 1114 at drivers/iommu/dma-iommu.c:471 __iommu_dma_unmap+0xd0/0xd8
[  176.933524] Modules linked in:
[  176.936566] CPU: 0 PID: 1114 Comm: fio Tainted: G        W         5.7.0-rc4-g6ef3717-dirty #146
[  176.945308] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  176.954138] pstate: 80400089 (Nzcv daIf +PAN -UAO)
[  176.958907] pc : __iommu_dma_unmap+0xd0/0xd8
[  176.963157] lr : __iommu_dma_unmap+0x70/0xd8
[  176.967407] sp : ffff800010003cd0
[  176.970707] x29: ffff800010003cd0 x28: ffff002fc25fd280
[  176.975995] x27: ffffa6c8a06deec8 x26: 0000000000001000
[  176.981283] x25: 0000000000000012 x24: ffff2027cbe74000
[  176.986570] x23: ffff002fdce3bd88 x22: 00000000ef371000
[  176.991858] x21: ffff800010003d28 x20: ffff002fd782a000
[  176.997146] x19: 0000000000001000 x18: 0000000000000000
[  177.002433] x17: 0000001fffffffff x16: 000000000000021e
[  177.007720] x15: 000000000001ffff x14: 0000000000000001
[  177.013008] x13: 0000000000000025 x12: 0000000000000025
[  177.018296] x11: 000000000000ffff x10: ffffa6c8a0e151e0
[  177.023583] x9 : ffff2027c72a7b88 x8 : 000000000000000c
[  177.028870] x7 : 0000000000000b88 x6 : 0000000000000009
[  177.034158] x5 : ffff2027c72a7000 x4 : 0000000000000000
[  177.039445] x3 : 00000000000ef371 x2 : 0000000000000009
[  177.044732] x1 : ffff800010003d28 x0 : 0000000000000000
[  177.050020] Call trace:
[  177.052454]  __iommu_dma_unmap+0xd0/0xd8
[  177.056359]  iommu_dma_unmap_page+0x38/0x88
[  177.060525]  nvme_unmap_data+0x238/0x23c
[  177.064431]  nvme_pci_complete_rq+0x28/0x58
[  177.068596]  blk_mq_complete_request+0x114/0x150
[  177.073191]  nvme_irq+0xbc/0x204
[  177.076405]  __handle_irq_event_percpu+0x5c/0x144
[  177.081088]  handle_irq_event_percpu+0x1c/0x54
[  177.085512]  handle_irq_event+0x44/0x74
[  177.089329]  handle_fasteoi_irq+0xa8/0x160
[  177.093409]  generic_handle_irq+0x2c/0x40
[  177.097399]  __handle_domain_irq+0x64/0xb0
[  177.101476]  gic_handle_irq+0x94/0x194
[  177.105209]  el1_irq+0xb8/0x180
[  177.108337]  el0_svc_common.constprop.3+0x34/0x170
[  177.113106]  do_el0_svc+0x70/0x88
[  177.116407]  el0_sync_handler+0xf0/0x12c
[  177.120312]  el0_sync+0x140/0x180
[  177.123612] ---[ end trace c76e980b98b6531d ]---
[  177.128213] ------------[ cut here ]------------
[  177.132810] DMA-API: nvme 0000:85:00.0: device driver tries to free DMA memory it has not allocated [device address=0x00000000ef371000] [size=4096 bytes]
[  177.146484] WARNING: CPU: 0 PID: 1114 at kernel/dma/debug.c:1017 check_unmap+0x698/0x86c
[  177.154536] Modules linked in:
[  177.157578] CPU: 0 PID: 1114 Comm: fio Tainted: G        W         5.7.0-rc4-g6ef3717-dirty #146
[  177.166322] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  177.175152] pstate: 60400089 (nZCv daIf +PAN -UAO)
[  177.179921] pc : check_unmap+0x698/0x86c
[  177.183827] lr : check_unmap+0x698/0x86c
[  177.187732] sp : ffff800010003c40
[  177.191032] x29: ffff800010003c40 x28: ffff002fc25fd280
[  177.196321] x27: ffffa6c8a06deec8 x26: 0000000000001000
[  177.201608] x25: 0000000000000080 x24: ffff2027cbe74000
[  177.206895] x23: ffffa6c8a1084530 x22: ffffa6c8a0dfa000
[  177.212182] x21: 00000000ef371000 x20: ffff800010003cc0
[  177.217461] x19: 00000000ef371000 x18: 0000000000000000
[  177.222740] x17: 000000000000000f x16: 00000000000001a4
[  177.228028] x15: 000000000001ffff x14: 30343d657a69735b
[  177.233316] x13: 205d303030313733 x12: 6665303030303030
[  177.238603] x11: 303078303d737365 x10: 7264646120656369
[  177.243891] x9 : ffffa6c8a0e11c80 x8 : 61636f6c6c612074
[  177.249178] x7 : 6f6e207361682074 x6 : ffff0027ffd1d1f0
[  177.254464] x5 : 0000000000000000 x4 : 0000000000000000
[  177.259752] x3 : 00000000ffffffff x2 : ffffa6c8a0e11d00
[  177.265039] x1 : 0000000100010001 x0 : 000000000000008d
[  177.270326] Call trace:
[  177.272761]  check_unmap+0x698/0x86c
[  177.276322]  debug_dma_unmap_page+0x6c/0x78
[  177.280487]  nvme_unmap_data+0x7c/0x23c
[  177.284305]  nvme_pci_complete_rq+0x28/0x58
estuary:/$

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-07  8:18                     ` John Garry
@ 2020-05-07 11:04                       ` Robin Murphy
  2020-05-07 13:55                         ` John Garry
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Murphy @ 2020-05-07 11:04 UTC (permalink / raw)
  To: John Garry, Will Deacon
  Cc: sagi, linux-nvme, Alexey Dobriyan, axboe, Keith Busch, Christoph Hellwig

On 2020-05-07 9:18 am, John Garry wrote:
> On 06/05/2020 18:02, John Garry wrote:
>>
>>>
>>> ...which is a bit of an achievement with the DMA API, but most likely 
>>> implies that a prior dma_unmap was called with the wrong size, such 
>>> that we only unmapped part of the IOVA region that has now been reused.
> 
> 
> So I quickly captured this, and I can look further when I get access to 
> the HW again later today:
[...]
> [  177.132810] DMA-API: nvme 0000:85:00.0: device driver tries to free DMA memory it has not allocated [device address=0x00000000ef371000] [size=4096 bytes]
[...]
> [  177.276322]  debug_dma_unmap_page+0x6c/0x78
> [  177.280487]  nvme_unmap_data+0x7c/0x23c
> [  177.284305]  nvme_pci_complete_rq+0x28/0x58

OK, so there's clearly something amiss there. I would have suggested 
next sticking the SMMU in passthrough to help focus on the DMA API 
debugging, but since that "DMA address" looks suspiciously like a 
physical address rather than an IOVA, I suspect that things might 
suddenly appear to be working fine if you do...

Robin.


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-07 11:04                       ` Robin Murphy
@ 2020-05-07 13:55                         ` John Garry
  2020-05-07 14:23                           ` Keith Busch
  0 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2020-05-07 13:55 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon
  Cc: sagi, linux-nvme, Alexey Dobriyan, axboe, Keith Busch, Christoph Hellwig

On 07/05/2020 12:04, Robin Murphy wrote:
>> [  177.132810] DMA-API: nvme 0000:85:00.0: device driver tries to free DMA memory it has not allocated [device address=0x00000000ef371000] [size=4096 bytes]
> [...]
>> [  177.276322]  debug_dma_unmap_page+0x6c/0x78
>> [  177.280487]  nvme_unmap_data+0x7c/0x23c
>> [  177.284305]  nvme_pci_complete_rq+0x28/0x58
> 
> OK, so there's clearly something amiss there. I would have suggested 
> next sticking the SMMU in passthrough to help focus on the DMA API 
> debugging, but since that "DMA address" looks suspiciously like a 
> physical address rather than an IOVA, I suspect that things might 
> suddenly appear to be working fine if you do...

OK, seems sensible. However it looks like this guy triggers the issue:

324b494c2862 nvme-pci: Remove two-pass completions

With carrying the revert of $subject, it's a quick bisect to that patch. 
I always get the RCU CPU stall warning, which obscures things a bit.

Cheers,
John


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-07 13:55                         ` John Garry
@ 2020-05-07 14:23                           ` Keith Busch
  2020-05-07 15:11                             ` John Garry
  0 siblings, 1 reply; 31+ messages in thread
From: Keith Busch @ 2020-05-07 14:23 UTC (permalink / raw)
  To: John Garry
  Cc: sagi, Robin Murphy, linux-nvme, Christoph Hellwig, axboe,
	Will Deacon, Alexey Dobriyan

On Thu, May 07, 2020 at 02:55:37PM +0100, John Garry wrote:
> On 07/05/2020 12:04, Robin Murphy wrote:
> > > [  177.132810] DMA-API: nvme 0000:85:00.0: device driver tries to free DMA memory it has not allocated [device address=0x00000000ef371000] [size=4096 bytes]
> > [...]
> > > [  177.276322]  debug_dma_unmap_page+0x6c/0x78
> > > [  177.280487]  nvme_unmap_data+0x7c/0x23c
> > > [  177.284305]  nvme_pci_complete_rq+0x28/0x58
> > 
> > OK, so there's clearly something amiss there. I would have suggested
> > next sticking the SMMU in passthrough to help focus on the DMA API
> > debugging, but since that "DMA address" looks suspiciously like a
> > physical address rather than an IOVA, I suspect that things might
> > suddenly appear to be working fine if you do...
> 
> OK, seems sensible. However it looks like this guy triggers the issue:
> 
> 324b494c2862 nvme-pci: Remove two-pass completions
> 
> With carrying the revert of $subject, it's a quick bisect to that patch.

That's weird. Do you see this with different nvme controllers? Does your
controller write the phase bit before writing the command id in the cqe?
Asking because this looks like we're seeing an older command id in the
cqe, and the only thing that patch you've bisected should do is remove a
delay between observing the new phase and reading the command id.


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-07 14:23                           ` Keith Busch
@ 2020-05-07 15:11                             ` John Garry
  2020-05-07 15:35                               ` Keith Busch
  0 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2020-05-07 15:11 UTC (permalink / raw)
  To: Keith Busch
  Cc: sagi, Robin Murphy, linux-nvme, Christoph Hellwig, axboe,
	Will Deacon, Alexey Dobriyan

On 07/05/2020 15:23, Keith Busch wrote:
> On Thu, May 07, 2020 at 02:55:37PM +0100, John Garry wrote:
>> On 07/05/2020 12:04, Robin Murphy wrote:
>>>> [  177.132810] DMA-API: nvme 0000:85:00.0: device driver tries to free DMA memory it has not allocated [device address=0x00000000ef371000] [size=4096 bytes]
>>> [...]
>>>> [  177.276322]  debug_dma_unmap_page+0x6c/0x78
>>>> [  177.280487]  nvme_unmap_data+0x7c/0x23c
>>>> [  177.284305]  nvme_pci_complete_rq+0x28/0x58
>>>
>>> OK, so there's clearly something amiss there. I would have suggested
>>> next sticking the SMMU in passthrough to help focus on the DMA API
>>> debugging, but since that "DMA address" looks suspiciously like a
>>> physical address rather than an IOVA, I suspect that things might
>>> suddenly appear to be working fine if you do...
>>
>> OK, seems sensible. However it looks like this guy triggers the issue:
>>
>> 324b494c2862 nvme-pci: Remove two-pass completions
>>
>> With carrying the revert of $subject, it's a quick bisect to that patch.
> 
> That's weird.

Or maybe exacerbating some other fault?

>  Do you see this with different nvme controllers?

I only have 3x, and they are all ES3000 V3 NVMe PCIe SSD

> Does your
> controller write the phase bit before writing the command id in the cqe?

I don't know. Is that sort of info available from nvme-cli?

> Asking because this looks like we're seeing an older command id in the
> cqe, and the only thing that patch you've bisected should do is remove a
> delay between observing the new phase and reading the command id.
> .

Another log, below, with SMMU off.

John


fio-2.1.10
Starting 60 processes
Jobs: 60 (f=60): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR]
[  885.335343] ------------[ cut here ]------------
[  885.335999] ------------[ cut here ]------------
[  885.339967] DMA-API: nvme 0000:82:00.0: device driver tries to free DMA memory it has not allocated [device address=0x0000002fd5870000] [size=4096 bytes]
[  885.344575] WARNING: CPU: 41 PID: 4565 at block/blk-mq.c:665 blk_mq_start_request+0xc4/0xcc
[  885.344577] Modules linked in:
[  885.358287] WARNING: CPU: 39 PID: 1074 at kernel/dma/debug.c:1014 check_unmap+0x698/0x86c
[  885.366601] CPU: 41 PID: 4565 Comm: fio Not tainted 5.6.0-rc4-gd64d242-dirty #155
[  885.369645] Modules linked in:
[  885.377799] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  885.385262] CPU: 39 PID: 1074 Comm: irq/230-nvme1q2 Not tainted 5.6.0-rc4-gd64d242-dirty #155
[  885.388308] pstate: 60400009 (nZCv daif +PAN -UAO)
[  885.397155] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  885.397157] pstate: 60c00009 (nZCv daif +PAN +UAO)
[  885.405656] pc : blk_mq_start_request+0xc4/0xcc
[  885.405662] lr : nvme_queue_rq+0x134/0x7cc
[  885.410437] pc : check_unmap+0x698/0x86c
[  885.419281] sp : ffff800025ccb770
[  885.419283] x29: ffff800025ccb770 x28: ffff002fdc16d200
[  885.424061] lr : check_unmap+0x698/0x86c
[  885.428577] x27: ffff002fdc16d318 x26: fffffe00bf3621c0
[  885.432663] sp : ffff8000217dbb40
[  885.436574] x25: 0000000000001000 x24: 0000000000000000
[  885.439881] x29: ffff8000217dbb40 x28: ffff002fdc6c6cd4
[  885.445177] x23: 0000000000001000 x22: ffff2027c7540000
[  885.449088] x27: ffffa99cc3c7f000 x26: 0000000000001000
[  885.454387] x21: ffff800025ccb8b0 x20: ffff2027c6ae0000
[  885.457694] x25: 0000000000000000 x24: ffff2027c7540000
[  885.462990] x19: ffff002fdc16d200 x18: 0000000000000000
[  885.468288] x23: ffffa99cc55630d0 x22: ffffa99cc530a000
[  885.473584] x17: 0000000000000000 x16: 0000000000000000
[  885.478882] x21: 0000002fd5870000 x20: ffff8000217dbbc0
[  885.484178] x15: 0000000000000000 x14: 0000000000000000
[  885.489477] x19: 0000002fd5870000 x18: 0000000000000000
[  885.494773] x13: 0000000066641000 x12: ffff2027a15cf9a0
[  885.500071] x17: 0000000000000000 x16: 0000000000000000
[  885.505367] x11: 0000000026641fff x10: 0000000000000002
[  885.510665] x15: 0000000000000000 x14: 7a69735b205d3030
[  885.515964] x9 : 0000000000a80000 x8 : ffff2027d3cb9ac8
[  885.521264] x13: 3030373835646632 x12: 3030303030307830
[  885.526559] x7 : ffffa99cc4f34000 x6 : 0000002fd5887000
[  885.531856] x11: 3d73736572646461 x10: 206563697665645b
[  885.537152] x5 : 0000000000000000 x4 : 0000000000000000
[  885.542449] x9 : ffffa99cc5321bc8 x8 : 6c6120746f6e2073
[  885.547745] x3 : ffff2027d039c0b0 x2 : 0000000000000000
[  885.553042] x7 : 6168207469207972 x6 : ffff002fffdbe1b8
[  885.558338] x1 : 0000000100000000 x0 : 0000000000000002
[  885.563634] x5 : 0000000000000000 x4 : 0000000000000000
[  885.568932] Call trace:
[  885.574228] x3 : 0000000000000000 x2 : ffff002fffdc5088
[  885.579531]  blk_mq_start_request+0xc4/0xcc
[  885.584821] x1 : 0000000100000001 x0 : 000000000000008d
[  885.590120]  nvme_queue_rq+0x134/0x7cc
[  885.595414] Call trace:
[  885.597858]  __blk_mq_try_issue_directly+0x108/0x1bc
[  885.603158]  check_unmap+0x698/0x86c
[  885.607324]  blk_mq_request_issue_directly+0x40/0x64
[  885.612620]  debug_dma_unmap_page+0x6c/0x78
[  885.616359]  blk_mq_try_issue_list_directly+0x50/0xc8
[  885.618800]  nvme_unmap_data+0x7c/0x23c
[  885.623752]  blk_mq_sched_insert_requests+0x170/0x1d0
[  885.623753]  blk_mq_flush_plug_list+0x10c/0x158
[  885.627318]  nvme_pci_complete_rq+0x3c/0x10c
[  885.632271]  blk_flush_plug_list+0xc4/0xd4
[  885.632273]  blk_finish_plug+0x30/0x40
[  885.636444]  blk_mq_complete_request+0x114/0x150
[  885.641484]  blkdev_direct_IO+0x3d4/0x444
[  885.645306]  nvme_irq+0xbc/0x204
[  885.650346]  generic_file_read_iter+0x90/0xaec
[  885.654863]  irq_thread_fn+0x28/0x6c
[  885.659118]  blkdev_read_iter+0x3c/0x54
[  885.663203]  irq_thread+0x158/0x1e8
[  885.666943]  aio_read+0xdc/0x138
[  885.671548]  kthread+0xf4/0x120
[  885.675544]  io_submit_one+0x4ac/0xbf0
[  885.675546]  __arm64_sys_io_submit+0x16c/0x1f8
[  885.678766]  ret_from_fork+0x10/0x18
[  885.683199]  el0_svc_common.constprop.3+0xb8/0x170
[  885.686765] ---[ end trace fc66a57b25e362aa ]---
[  885.690593]  do_el0_svc+0x70/0x88
[  885.724844]  el0_sync_handler+0xf4/0x130
[  885.728758]  el0_sync+0x140/0x180
[  885.732065] ---[ end trace fc66a57b25e362ab ]---
[  885.736768] ------------[ cut here ]------------
[  885.741379] refcount_t: underflow; use-after-free.
[  885.746184] WARNING: CPU: 39 PID: 1074 at lib/refcount.c:28 refcount_warn_saturate+0x6c/0x13c
[  885.754687] Modules linked in:
[  885.757736] CPU: 39 PID: 1074 Comm: irq/230-nvme1q2 Tainted: G        W     5.6.0-rc4-gd64d242-dirty #155
[  885.767623] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  885.776471] pstate: 60c00009 (nZCv daif +PAN +UAO)
[  885.781250] pc : refcount_warn_saturate+0x6c/0x13c
[  885.786028] lr : refcount_warn_saturate+0x6c/0x13c
[  885.790805] sp : ffff8000217dbc40
[  885.794112] x29: ffff8000217dbc40 x28: ffff002fdc6c6cd4
[  885.799411] x27: ffffa99cc3c7f000 x26: ffffa99cc3c7f948
[  885.804710] x25: 0000000000000001 x24: ffffa99cc4cd3710
[  885.810007] x23: fffffffffffffff8 x22: ffffde07ebde5680
[  885.815305] x21: 0000000000000000 x20: ffff2027c6ae0000
[  885.820603] x19: ffff002fdc16d200 x18: 0000000000000000
[  885.825901] x17: 0000000000000000 x16: 0000000000000000
[  885.831199] x15: 0000000000000000 x14: ffff002fdd922948
[  885.836498] x13: ffff002fdd922150 x12: 0000000000000000
[  885.841796] x11: 00000000000008a4 x10: 000000000000000f
[  885.847094] x9 : ffffa99cc5321bc8 x8 : 72657466612d6573
[  885.852391] x7 : 75203b776f6c6672 x6 : ffff002fffdbe1b8
[  885.857689] x5 : 0000000000000000 x4 : 0000000000000000
[  885.862986] x3 : 0000000000000000 x2 : ffff002fffdc5088
[  885.868284] x1 : 0000000100000001 x0 : 0000000000000026
[  885.873582] Call trace:
[  885.876028]  refcount_warn_saturate+0x6c/0x13c
[  885.880462]  blk_mq_free_request+0x12c/0x14c
[  885.884723]  blk_mq_end_request+0x114/0x134
[  885.888898]  nvme_complete_rq+0x50/0x128
[  885.892811]  nvme_pci_complete_rq+0x44/0x10c
[  885.897070]  blk_mq_complete_request+0x114/0x150
[  885.901674]  nvme_irq+0xbc/0x204
[  885.904898]  irq_thread_fn+0x28/0x6c
[  885.908464]  irq_thread+0x158/0x1e8
[  885.911945]  kthread+0xf4/0x120
[  885.915080]  ret_from_fork+0x10/0x18
[  885.918646] ---[ end trace fc66a57b25e362ac ]---



* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-07 15:11                             ` John Garry
@ 2020-05-07 15:35                               ` Keith Busch
  2020-05-07 15:41                                 ` John Garry
  2020-05-07 16:26                                 ` Robin Murphy
  0 siblings, 2 replies; 31+ messages in thread
From: Keith Busch @ 2020-05-07 15:35 UTC (permalink / raw)
  To: John Garry
  Cc: sagi, Robin Murphy, linux-nvme, Christoph Hellwig, axboe,
	Will Deacon, Alexey Dobriyan

On Thu, May 07, 2020 at 04:11:23PM +0100, John Garry wrote:
> On 07/05/2020 15:23, Keith Busch wrote:
> > On Thu, May 07, 2020 at 02:55:37PM +0100, John Garry wrote:
> > > On 07/05/2020 12:04, Robin Murphy wrote:
> > > > > [  177.132810] DMA-API: nvme 0000:85:00.0: device driver tries to free DMA memory it has not allocated [device address=0x00000000ef371000] [size=4096 bytes]
> > > > [...]
> > > > > [  177.276322]  debug_dma_unmap_page+0x6c/0x78
> > > > > [  177.280487]  nvme_unmap_data+0x7c/0x23c
> > > > > [  177.284305]  nvme_pci_complete_rq+0x28/0x58
> > > > 
> > > > OK, so there's clearly something amiss there. I would have suggested
> > > > next sticking the SMMU in passthrough to help focus on the DMA API
> > > > debugging, but since that "DMA address" looks suspiciously like a
> > > > physical address rather than an IOVA, I suspect that things might
> > > > suddenly appear to be working fine if you do...
> > > 
> > > OK, seems sensible. However it looks like this guy triggers the issue:
> > > 
> > > 324b494c2862 nvme-pci: Remove two-pass completions
> > > 
> > > With carrying the revert of $subject, it's a quick bisect to that patch.
> > 
> > That's weird.
> 
> Or maybe exacerbating some other fault?
>
>  Do you see this with different nvme controllers?
> 
> I only have 3x, and they are all ES3000 V3 NVMe PCIe SSD
>
> > Does your
> > controller write the phase bit before writing the command id in the cqe?
> 
> I don't know. Is that sort of info available from nvme-cli?

No, the only way to 100% confirm is with bus protocol analyzers. It's
a protocol violation if a controller was behaving that way. We've seen
devices broken like that before, though it's been a while since I've
seen such behavior.

> [  885.344575] WARNING: CPU: 41 PID: 4565 at block/blk-mq.c:665  blk_mq_start_request+0xc4/0xcc

This warning appears to support my suspicion: the completion side is
observing a new phase with a stale command id, and that command id was
reallocated as a new request that we're still constructing at the time
the double-completion occurred.

Host software is supposed to be guaranteed the entire CQE is written
once we see an updated phase, per spec: "If a Completion Queue
Entry is constructed via multiple writes, the Phase Tag bit shall be
updated in the last write of that Completion Queue Entry."
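
The convention Keith quotes can be modelled in a few lines of plain C. This is an illustrative sketch of phase-tag polling only; the struct and field names here are hypothetical, not the kernel's actual definitions:

```c
#include <stdint.h>

/* Toy model of NVMe completion-queue phase-tag polling. The controller
 * must write the phase bit last, so the host treats an entry as valid
 * only when its phase tag matches the expected value. */
struct cqe {
    uint16_t command_id;
    uint16_t phase;      /* phase tag in bit 0 */
};

struct cq {
    struct cqe *entries;
    uint16_t head;
    uint16_t depth;
    uint16_t phase;      /* phase value expected for new entries */
};

/* An entry is new only if its phase tag matches the expected phase. */
static int cqe_pending(const struct cq *q)
{
    return (q->entries[q->head].phase & 1) == q->phase;
}

/* Consume one entry: advance head, flipping the expected phase on wrap. */
static void cq_advance(struct cq *q)
{
    if (++q->head == q->depth) {
        q->head = 0;
        q->phase ^= 1;
    }
}
```

If the controller violated the quoted rule and published the phase before the command id, `cqe_pending()` would return true while `command_id` still held a stale value, which is exactly the failure mode suspected here.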


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-07 15:35                               ` Keith Busch
@ 2020-05-07 15:41                                 ` John Garry
  2020-05-08 16:16                                   ` Keith Busch
  2020-05-07 16:26                                 ` Robin Murphy
  1 sibling, 1 reply; 31+ messages in thread
From: John Garry @ 2020-05-07 15:41 UTC (permalink / raw)
  To: Keith Busch
  Cc: sagi, Robin Murphy, linux-nvme, Alexey Dobriyan, axboe,
	Will Deacon, Christoph Hellwig

>>
>> Or maybe exacerbating some other fault?
>>
>>   Do you see this with different nvme controllers?
>>
>> I only have 3x, and they are all ES3000 V3 NVMe PCIe SSD
>>
>>> Does your
>>> controller write the phase bit before writing the command id in the cqe?
>>
>> I don't know. Is that sort of info available from nvme-cli?
> 
> No, the only way to 100% confirm is with bus protocol analyzers. It's
> a protocol violation if a controller was behaving that way. We've seen
> devices broken like that before, though it's been a while since I've
> seen such behavior.

I have an FAE contact, who I can ask. That may take a few days.

> 
>> [  885.344575] WARNING: CPU: 41 PID: 4565 at block/blk-mq.c:665  blk_mq_start_request+0xc4/0xcc
> 
> This warning appears to support my suspicion: the completion side is
> observing a new phase with a stale command id, and that command id was
> reallocated as a new request that we're still constructing at the time
> the double-completion occurred.
> 
> Host software is supposed to be guaranteed the entire CQE is written
> once we see an updated phase, per spec: "If a Completion Queue
> Entry is constructed via multiple writes, the Phase Tag bit shall be
> updated in the last write of that Completion Queue Entry."

Thanks,
John


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-06 14:44         ` Keith Busch
@ 2020-05-07 15:58           ` Keith Busch
  2020-05-07 20:07             ` [PATCH] nvme-pci: fix "slimmer CQ head update" Alexey Dobriyan
  0 siblings, 1 reply; 31+ messages in thread
From: Keith Busch @ 2020-05-07 15:58 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: axboe, John Garry, hch, linux-nvme, sagi

On Wed, May 06, 2020 at 08:44:34AM -0600, Keith Busch wrote:
> On Wed, May 06, 2020 at 04:24:29PM +0300, Alexey Dobriyan wrote:
> >  static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
> >  {
> > -	if (++nvmeq->cq_head == nvmeq->q_depth) {
> > +	/*
> > +	 * Can't pre-increment ->cq_head directly.
> > +	 * It must be in [0, ->q_depth) range at all times.
> > +	 */
> > +	u16 tmp = READ_ONCE(nvmeq->cq_head);
> > +	if (++tmp == nvmeq->q_depth) {
> 
> 
> This looks correct too, but there's no need for the READ_ONCE access,
> or the pre-increment:
> 
> 	u16 tmp = nvmeq->cq_head + 1;
>  	if (tmp == nvmeq->q_depth) {

Alexey,

Regardless of the unrelated issues discussing on this thread, would you
like to send a formal patch to fix the out-of-bounds access?

Thanks,
Keith
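
For reference, the variant suggested above can be sketched as a standalone function. The names mirror the quoted discussion, but this is a model of the proposed logic, not the actual driver code; the point is that the stored `cq_head` stays within `[0, q_depth)` at all times:

```c
#include <stdint.h>

/* Simplified stand-in for struct nvme_queue (illustrative only). */
struct nvmeq_model {
    uint16_t cq_head;
    uint16_t q_depth;
    uint16_t cq_phase;
};

/* Compute the next head in a temporary so cq_head is never stored
 * out of bounds, even transiently. On wrap, flip the phase. */
static void update_cq_head(struct nvmeq_model *nvmeq)
{
    uint16_t tmp = nvmeq->cq_head + 1;

    if (tmp == nvmeq->q_depth) {
        nvmeq->cq_head = 0;
        nvmeq->cq_phase ^= 1;
    } else {
        nvmeq->cq_head = tmp;
    }
}
```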


* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-07 15:35                               ` Keith Busch
  2020-05-07 15:41                                 ` John Garry
@ 2020-05-07 16:26                                 ` Robin Murphy
  2020-05-07 17:35                                   ` Keith Busch
  1 sibling, 1 reply; 31+ messages in thread
From: Robin Murphy @ 2020-05-07 16:26 UTC (permalink / raw)
  To: Keith Busch, John Garry
  Cc: sagi, linux-nvme, Alexey Dobriyan, axboe, Will Deacon, Christoph Hellwig

On 2020-05-07 4:35 pm, Keith Busch wrote:
> On Thu, May 07, 2020 at 04:11:23PM +0100, John Garry wrote:
>> On 07/05/2020 15:23, Keith Busch wrote:
>>> On Thu, May 07, 2020 at 02:55:37PM +0100, John Garry wrote:
>>>> On 07/05/2020 12:04, Robin Murphy wrote:
>>>>>> [  177.132810] DMA-API: nvme 0000:85:00.0: device driver tries to free DMA memory it has not allocated [device address=0x00000000ef371000] [size=4096 bytes]
>>>>> [...]
>>>>>> [  177.276322]  debug_dma_unmap_page+0x6c/0x78
>>>>>> [  177.280487]  nvme_unmap_data+0x7c/0x23c
>>>>>> [  177.284305]  nvme_pci_complete_rq+0x28/0x58
>>>>>
>>>>> OK, so there's clearly something amiss there. I would have suggested
>>>>> next sticking the SMMU in passthrough to help focus on the DMA API
>>>>> debugging, but since that "DMA address" looks suspiciously like a
>>>>> physical address rather than an IOVA, I suspect that things might
>>>>> suddenly appear to be working fine if you do...
>>>>
>>>> OK, seems sensible. However it looks like this guy triggers the issue:
>>>>
>>>> 324b494c2862 nvme-pci: Remove two-pass completions
>>>>
>>>> With carrying the revert of $subject, it's a quick bisect to that patch.
>>>
>>> That's weird.
>>
>> Or maybe exacerbating some other fault?
>>
>>   Do you see this with different nvme controllers?
>>
>> I only have 3x, and they are all ES3000 V3 NVMe PCIe SSD
>>
>>> Does your
>>> controller write the phase bit before writing the command id in the cqe?
>>
>> I don't know. Is that sort of info available from nvme-cli?
> 
> No, the only way to 100% confirm is with bus protocol analyzers. It's
> a protocol violation if a controller was behaving that way. We've seen
> devices broken like that before, though it's been a while since I've
> seen such behavior.
> 
>> [  885.344575] WARNING: CPU: 41 PID: 4565 at block/blk-mq.c:665  blk_mq_start_request+0xc4/0xcc
> 
> This warning appears to support my suspicion: the completion side is
> observing a new phase with a stale command id, and that command id was
> reallocated as a new request that we're still constructing at the time
> the double-completion occurred.
> 
> Host software is supposed to be guaranteed the entire CQE is written
> once we see an updated phase, per spec: "If a Completion Queue
> Entry is constructed via multiple writes, the Phase Tag bit shall be
> updated in the last write of that Completion Queue Entry."

Hmm, that makes me wonder if there might be some interaction with the 
Arm memory model here - if there are strict ordering requirements for 
things in memory being observed, could we be missing appropriate 
barriers between reads/writes of the various fields?

Robin.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-07 16:26                                 ` Robin Murphy
@ 2020-05-07 17:35                                   ` Keith Busch
  2020-05-07 17:44                                     ` Will Deacon
  0 siblings, 1 reply; 31+ messages in thread
From: Keith Busch @ 2020-05-07 17:35 UTC (permalink / raw)
  To: Robin Murphy
  Cc: sagi, John Garry, linux-nvme, Christoph Hellwig, axboe,
	Will Deacon, Alexey Dobriyan

On Thu, May 07, 2020 at 05:26:58PM +0100, Robin Murphy wrote:
> On 2020-05-07 4:35 pm, Keith Busch wrote:
> > On Thu, May 07, 2020 at 04:11:23PM +0100, John Garry wrote:
> > 
> > > [  885.344575] WARNING: CPU: 41 PID: 4565 at block/blk-mq.c:665  blk_mq_start_request+0xc4/0xcc
> > 
> > This warning appears to support my suspicion: the completion side is
> > observing a new phase with a stale command id, and that command id was
> > reallocated as a new request that we're still constructing at the time
> > the double-completion occurred.
> > 
> > Host software is supposed to be guaranteed the entire CQE is written
> > once we see an updated phase, per spec: "If a Completion Queue
> > Entry is constructed via multiple writes, the Phase Tag bit shall be
> > updated in the last write of that Completion Queue Entry."
> 
> Hmm, that makes me wonder if there might be some interaction with the Arm
> memory model here - if there are strict ordering requirements for things in
> memory being observed, could we be missing appropriate barriers between
> reads/writes of the various fields?

Yeah, that might be something to consider. If that's the case, then it
should be reproducible with a different vendor's nvme controller.

If the behavior is unique to this particular nvme model, then it would
sound like the controller is creating a CQE in multiple memory writes
either high-to-low, or with Relaxed Ordering enabled in the TLPs.

But if some other interaction with the arm memory model is suspect, I'm
not sure how to confirm or debug.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-07 17:35                                   ` Keith Busch
@ 2020-05-07 17:44                                     ` Will Deacon
  2020-05-07 18:06                                       ` Keith Busch
  0 siblings, 1 reply; 31+ messages in thread
From: Will Deacon @ 2020-05-07 17:44 UTC (permalink / raw)
  To: Keith Busch
  Cc: sagi, John Garry, linux-nvme, Christoph Hellwig, axboe,
	Robin Murphy, Alexey Dobriyan

On Thu, May 07, 2020 at 10:35:45AM -0700, Keith Busch wrote:
> On Thu, May 07, 2020 at 05:26:58PM +0100, Robin Murphy wrote:
> > On 2020-05-07 4:35 pm, Keith Busch wrote:
> > > On Thu, May 07, 2020 at 04:11:23PM +0100, John Garry wrote:
> > > 
> > > > [  885.344575] WARNING: CPU: 41 PID: 4565 at block/blk-mq.c:665  blk_mq_start_request+0xc4/0xcc
> > > 
> > > This warning appears to support my suspicion: the completion side is
> > > observing a new phase with a stale command id, and that command id was
> > > reallocated as a new request that we're still constructing at the time
> > > the double-completion occurred.
> > > 
> > > Host software is supposed to be guaranteed the entire CQE is written
> > > once we see an updated phase, per spec: "If a Completion Queue
> > > Entry is constructed via multiple writes, the Phase Tag bit shall be
> > > updated in the last write of that Completion Queue Entry."
> > 
> > Hmm, that makes me wonder if there might be some interaction with the Arm
> > memory model here - if there are strict ordering requirements for things in
> > memory being observed, could we be missing appropriate barriers between
> > reads/writes of the various fields?
> 
> Yeah, that might be something to consider. If that's the case, then it
> should be reproducible with a different vendor's nvme controller.
> 
> If the behavior is unique to this particular nvme model, then it would
> sound like the controller is creating a CQE in multiple memory writes
> either high-to-low, or with Relaxed Ordering enabled in the TLPs.

[disclaimer: I don't know anything about nvme, so this might be way off! I
know a bit about the Arm memory model though, so I can help with that side.]

Are you sure there's an ordering requirement? My reading of the quote
from the spec is that "last write" just means the final write, not
necessarily the word at the highest address.

> But if some other interaction with the arm memory model is suspect, I'm
> not sure how to confirm or debug.

From what you described, it sounds like the flow should go something like:

  <Read the word of the CQE containing the updated Phase Tag Bit>
  dma_rmb();
  <Read the other words of the CQE>

How easy is it to check if that sort of thing is being followed?

Will


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-07 17:44                                     ` Will Deacon
@ 2020-05-07 18:06                                       ` Keith Busch
  2020-05-08 11:40                                         ` Will Deacon
  0 siblings, 1 reply; 31+ messages in thread
From: Keith Busch @ 2020-05-07 18:06 UTC (permalink / raw)
  To: Will Deacon
  Cc: sagi, John Garry, linux-nvme, Christoph Hellwig, axboe,
	Robin Murphy, Alexey Dobriyan

On Thu, May 07, 2020 at 06:44:29PM +0100, Will Deacon wrote:
> On Thu, May 07, 2020 at 10:35:45AM -0700, Keith Busch wrote:
> > 
> > Yeah, that might be something to consider. If that's the case, then it
> > should be reproducible with a different vendor's nvme controller.
> > 
> > If the behavior is unique to this particular nvme model, then it would
> > sound like the controller is creating a CQE in multiple memory writes
> > either high-to-low, or with Relaxed Ordering enabled in the TLPs.
> 
> [disclaimer: I don't know anything about nvme, so this might be way off! I
> know a bit about the Arm memory model though, so I can help with that side.]
> 
> Are you sure there's an ordering requirement? My reading of the quote
> from the spec is that "last write" just means the final write, not
> necessarily the word at the highest address.

Oh, NVMe's phase bit is located in the last word of a completion entry,
so the final write of a completion entry must be to the highest-addressed
word of that entry.
 
> > But if some other interaction with the arm memory model is suspect, I'm
> > not sure how to confirm or debug.
> 
> From what you described, it sounds like the flow should go something like:
> 
>   <Read the word of the CQE containing the updated Phase Tag Bit>
>   dma_rmb();
>   <Read the other words of the CQE>
> 
> How easy is it to check if that sort of thing is being followed?

We don't have such a barrier in the driver. The phase must be written
last, so having it visible to the CPU was supposed to guarantee previous
DMA writes are also visible. Is this not the case?

FWIW, most NVMe controllers create the entire 16-byte CQE with a single
DMA write.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH] nvme-pci: fix "slimmer CQ head update"
  2020-05-07 15:58           ` Keith Busch
@ 2020-05-07 20:07             ` Alexey Dobriyan
  0 siblings, 0 replies; 31+ messages in thread
From: Alexey Dobriyan @ 2020-05-07 20:07 UTC (permalink / raw)
  To: Keith Busch; +Cc: axboe, John Garry, hch, linux-nvme, sagi

Pre-incrementing ->cq_head can't be done in memory because an out-of-bounds
value can be observed by another context.

This devalues the space savings compared to the original code :-\

	$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux
	add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32)
	Function                                     old     new   delta
	nvme_poll_irqdisable                         464     456      -8
	nvme_poll                                    455     447      -8
	nvme_irq                                     388     380      -8
	nvme_dev_disable                             955     947      -8

But the code is minimal now: one read for the head, one read for q_depth,
one increment, one comparison, a single-instruction phase bit update and
one write for the new head.
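As a userspace illustration only (struct and function names below are invented for the sketch, not taken from the driver), the fixed logic keeps the increment in a local temporary, so the out-of-range value cq_head == q_depth is never stored to memory where another context could observe it:

```c
#include <stdint.h>

/* Invented userspace model of the fixed nvme_update_cq_head(): the
 * increment happens in a local temporary, so cq_head == q_depth is
 * never stored to memory where another context could observe it. */
struct cq_state {
	uint16_t cq_head;
	uint16_t q_depth;
	uint8_t cq_phase;
};

void update_cq_head(struct cq_state *q)
{
	uint16_t tmp = q->cq_head + 1;

	if (tmp == q->q_depth) {
		q->cq_head = 0;
		q->cq_phase ^= 1;	/* wrapping flips the phase */
	} else {
		q->cq_head = tmp;
	}
}
```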

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: John Garry <john.garry@huawei.com>
Tested-by: John Garry <john.garry@huawei.com>
Fixes: e2a366a4b0feaeb ("nvme-pci: slimmer CQ head update")
---

 drivers/nvme/host/pci.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -973,9 +973,13 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
 
 static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
 {
-	if (++nvmeq->cq_head == nvmeq->q_depth) {
+	u16 tmp = nvmeq->cq_head + 1;
+
+	if (tmp == nvmeq->q_depth) {
 		nvmeq->cq_head = 0;
 		nvmeq->cq_phase ^= 1;
+	} else {
+		nvmeq->cq_head = tmp;
 	}
 }
 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-07 18:06                                       ` Keith Busch
@ 2020-05-08 11:40                                         ` Will Deacon
  2020-05-08 14:07                                           ` Keith Busch
  0 siblings, 1 reply; 31+ messages in thread
From: Will Deacon @ 2020-05-08 11:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: sagi, John Garry, linux-nvme, Christoph Hellwig, axboe,
	Robin Murphy, Alexey Dobriyan

On Thu, May 07, 2020 at 11:06:12AM -0700, Keith Busch wrote:
> On Thu, May 07, 2020 at 06:44:29PM +0100, Will Deacon wrote:
> > On Thu, May 07, 2020 at 10:35:45AM -0700, Keith Busch wrote:
> > > 
> > > Yeah, that might be something to consider. If that's the case, then it
> > > should be reproducible with a different vendor's nvme controller.
> > > 
> > > If the behavior is unique to this particular nvme model, then it would
> > > sound like the controller is creating a CQE in multiple memory writes
> > > either high-to-low, or with Relaxed Ordering enabled in the TLPs.
> > 
> > [disclaimer: I don't know anything about nvme, so this might be way off! I
> > know a bit about the Arm memory model though, so I can help with that side.]
> > 
> > Are you sure there's an ordering requirement? My reading of the quote
> > from the spec is that "last write" just means the final write, not
> > necessarily the word at the highest address.
> 
> Oh, NVMe's phase bit is located in the last word of a completion entry,
> so the final write of a completion entry must be to the highest-addressed
> word of that entry.

Thanks. It sounds like it would still be valid (but weird?) for the device
to do: <write CQE[n-2] ... CQE[0]>; <write CQE[n-1]> though.

> > > But if some other interaction with the arm memory model is suspect, I'm
> > > not sure how to confirm or debug.
> > 
> > From what you described, it sounds like the flow should go something like:
> > 
> >   <Read the word of the CQE containing the updated Phase Tag Bit>
> >   dma_rmb();
> >   <Read the other words of the CQE>
> > 
> > How easy is it to check if that sort of thing is being followed?
> 
> We don't have such a barrier in the driver. The phase must be written
> last, so having it visible to the CPU was supposed to guarantee previous
> DMA writes are also visible. Is this not the case?

Without a barrier on the read side, something like the following might be
able to happen:

	CPU				NVME
	Read stale data
					Write data
					Write phase bit
	Read phase bit

So you need a dma_rmb() on the CPU side to prevent that reordering, and
ensure that the phase bit is read first. Anyway, the only reason I mentioned
it is because it should be relatively straightforward to add the barrier
(famous last words...) and see if it helps with the problem. If not, we're
looking in the wrong place.
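For illustration only, that release/acquire ordering can be sketched in userspace C11 atomics, with an acquire fence standing in for dma_rmb() and a release store standing in for the device's final phase-bit write (all names here are invented for the sketch):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace model: the "device" thread writes the CQE payload, then
 * publishes the phase bit last; the "host" polls the phase bit, then
 * issues an acquire fence (the dma_rmb() analogue) before reading the
 * payload. */
static _Atomic uint16_t cqe_command_id;
static _Atomic uint8_t cqe_phase;

static void *device_writes_cqe(void *arg)
{
	(void)arg;
	/* Payload first... */
	atomic_store_explicit(&cqe_command_id, 0x42, memory_order_relaxed);
	/* ...phase bit last ("the last write of that Completion Queue
	 * Entry"). */
	atomic_store_explicit(&cqe_phase, 1, memory_order_release);
	return NULL;
}

uint16_t run_phase_demo(void)
{
	pthread_t dev;
	uint16_t id;

	pthread_create(&dev, NULL, device_writes_cqe, NULL);
	/* Poll the phase bit... */
	while (atomic_load_explicit(&cqe_phase, memory_order_relaxed) != 1)
		;
	/* ...then order the payload read after it, like dma_rmb(). */
	atomic_thread_fence(memory_order_acquire);
	id = atomic_load_explicit(&cqe_command_id, memory_order_relaxed);
	pthread_join(dev, NULL);
	return id;
}
```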

> FWIW, most NVMe controllers create the entire 16-byte CQE with a single
> DMA write.

Yes, that definitely sounds like the sane approach. Perhaps John will be
able to confirm what his device does.

Will


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-08 11:40                                         ` Will Deacon
@ 2020-05-08 14:07                                           ` Keith Busch
  2020-05-08 15:34                                             ` Keith Busch
  0 siblings, 1 reply; 31+ messages in thread
From: Keith Busch @ 2020-05-08 14:07 UTC (permalink / raw)
  To: Will Deacon
  Cc: sagi, John Garry, linux-nvme, Christoph Hellwig, axboe,
	Robin Murphy, Alexey Dobriyan

On Fri, May 08, 2020 at 12:40:20PM +0100, Will Deacon wrote:
> On Thu, May 07, 2020 at 11:06:12AM -0700, Keith Busch wrote:
> > >   <Read the word of the CQE containing the updated Phase Tag Bit>
> > >   dma_rmb();
> > >   <Read the other words of the CQE>
> > > 
> > > How easy is it to check if that sort of thing is being followed?
> > 
> > We don't have such a barrier in the driver. The phase must be written
> > last, so having it visible to the CPU was supposed to guarantee previous
> > DMA writes are also visible. Is this not the case?
> 
> Without a barrier on the read side, something like the following might be
> able to happen:
> 
> 	CPU				NVME
> 	Read stale data
> 					Write data
> 					Write phase bit
> 	Read phase bit
> 
> So you need a dma_rmb() on the CPU side to prevent that reordering, and
> ensure that the phase bit is read first.

I hadn't mentioned that the driver's CQE reads use 'volatile' access,
reading the phase first, then the command id some time after.

> Anyway, the only reason I mentioned
> it is because it should  be relatively straightforward to add the barrier
> (famous last words...) and see if it helps with the problem. If not, we're
> looking in the wrong place.

I'll look into this as well.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-08 14:07                                           ` Keith Busch
@ 2020-05-08 15:34                                             ` Keith Busch
  0 siblings, 0 replies; 31+ messages in thread
From: Keith Busch @ 2020-05-08 15:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: sagi, John Garry, linux-nvme, Alexey Dobriyan, axboe,
	Robin Murphy, Christoph Hellwig

On Fri, May 08, 2020 at 07:07:33AM -0700, Keith Busch wrote:
> On Fri, May 08, 2020 at 12:40:20PM +0100, Will Deacon wrote:
> > Without a barrier on the read side, something like the following might be
> > able to happen:
> > 
> > 	CPU				NVME
> > 	Read stale data
> > 					Write data
> > 					Write phase bit
> > 	Read phase bit
> > 
> > So you need a dma_rmb() on the CPU side to prevent that reordering, and
> > ensure that the phase bit is read first.
> 
> I hadn't mentioned that the driver's CQE reads use 'volatile' access,
> reading the phase first, then the command id some time after.

Well, maybe that doesn't mean anything. We're using conditional control
dependencies, like this pseudo-code:

	if (READ_ONCE(cqe->phase) is new phase) {
		...
		command_id = READ_ONCE(cqe->command_id)
		...
	}

But according to Documentation/memory-barriers.txt:

  (*) Control dependencies can order prior loads against later stores.
      However, they do -not- guarantee any other sort of ordering:
      Not prior loads against later loads

So a read barrier would be necessary if that's how it is.
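A minimal sketch of that read side with an explicit barrier, using C11 atomics as a userspace stand-in for READ_ONCE() and dma_rmb() (struct and function names are invented for illustration):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Invented userspace stand-in for the driver's CQE read path: the phase
 * check alone is a load->load control dependency, which does not order
 * the later command_id load, so an explicit acquire fence (the dma_rmb()
 * analogue) sits between the two reads. */
struct fake_cqe {
	_Atomic uint16_t command_id;
	_Atomic uint16_t status_phase;	/* bit 0 = phase tag */
};

int read_cqe_command_id(struct fake_cqe *cqe, uint16_t expected_phase)
{
	uint16_t sp = atomic_load_explicit(&cqe->status_phase,
					   memory_order_relaxed);

	if ((sp & 1) != expected_phase)
		return -1;	/* no new completion visible */
	atomic_thread_fence(memory_order_acquire);	/* like dma_rmb() */
	return atomic_load_explicit(&cqe->command_id, memory_order_relaxed);
}
```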


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-07 15:41                                 ` John Garry
@ 2020-05-08 16:16                                   ` Keith Busch
  2020-05-08 17:04                                     ` John Garry
  0 siblings, 1 reply; 31+ messages in thread
From: Keith Busch @ 2020-05-08 16:16 UTC (permalink / raw)
  To: John Garry
  Cc: sagi, Robin Murphy, linux-nvme, Alexey Dobriyan, axboe,
	Will Deacon, Christoph Hellwig

On Thu, May 07, 2020 at 04:41:01PM +0100, John Garry wrote:
> I have an FAE contact, who I can ask. That may take a few days.

John,

In the meantime could you see if the following resolves your
observation?

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e13c370de830..8e96ec6eed61 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -989,6 +989,7 @@ static inline int nvme_process_cq(struct nvme_queue *nvmeq)
 
 	while (nvme_cqe_pending(nvmeq)) {
 		found++;
+		dma_rmb();
 		nvme_handle_cqe(nvmeq, nvmeq->cq_head);
 		nvme_update_cq_head(nvmeq);
 	}
--


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] nvme-pci: slimmer CQ head update
  2020-05-08 16:16                                   ` Keith Busch
@ 2020-05-08 17:04                                     ` John Garry
  0 siblings, 0 replies; 31+ messages in thread
From: John Garry @ 2020-05-08 17:04 UTC (permalink / raw)
  To: Keith Busch
  Cc: sagi, Robin Murphy, luojiaxing, linux-nvme, Alexey Dobriyan,
	axboe, Will Deacon, Christoph Hellwig

On 08/05/2020 17:16, Keith Busch wrote:
> On Thu, May 07, 2020 at 04:41:01PM +0100, John Garry wrote:
>> I have an FAE contact, who I can ask. That may take a few days.
> 
> John,
> 
> In the meantime could you see if the following resolves your
> observation?
> 
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index e13c370de830..8e96ec6eed61 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -989,6 +989,7 @@ static inline int nvme_process_cq(struct nvme_queue *nvmeq)
>   
>   	while (nvme_cqe_pending(nvmeq)) {
>   		found++;
> +		dma_rmb();
>   		nvme_handle_cqe(nvmeq, nvmeq->cq_head);
>   		nvme_update_cq_head(nvmeq);
>   	}
> --

Yeah, it looks much better now: 5 minutes running without issue, as
opposed to crashing 5 seconds in.

I'll test more when I return to work on Monday.


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2020-05-08 17:06 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-28 18:45 [PATCH] nvme-pci: slimmer CQ head update Alexey Dobriyan
2020-02-29  5:53 ` Keith Busch
2020-05-06 11:03   ` John Garry
2020-05-06 12:47     ` Keith Busch
2020-05-06 13:24       ` Alexey Dobriyan
2020-05-06 13:44         ` John Garry
2020-05-06 14:01           ` Alexey Dobriyan
2020-05-06 14:35           ` Christoph Hellwig
2020-05-06 16:26             ` John Garry
2020-05-06 16:31               ` Will Deacon
2020-05-06 16:52                 ` Robin Murphy
2020-05-06 17:02                   ` John Garry
2020-05-07  8:18                     ` John Garry
2020-05-07 11:04                       ` Robin Murphy
2020-05-07 13:55                         ` John Garry
2020-05-07 14:23                           ` Keith Busch
2020-05-07 15:11                             ` John Garry
2020-05-07 15:35                               ` Keith Busch
2020-05-07 15:41                                 ` John Garry
2020-05-08 16:16                                   ` Keith Busch
2020-05-08 17:04                                     ` John Garry
2020-05-07 16:26                                 ` Robin Murphy
2020-05-07 17:35                                   ` Keith Busch
2020-05-07 17:44                                     ` Will Deacon
2020-05-07 18:06                                       ` Keith Busch
2020-05-08 11:40                                         ` Will Deacon
2020-05-08 14:07                                           ` Keith Busch
2020-05-08 15:34                                             ` Keith Busch
2020-05-06 14:44         ` Keith Busch
2020-05-07 15:58           ` Keith Busch
2020-05-07 20:07             ` [PATCH] nvme-pci: fix "slimmer CQ head update" Alexey Dobriyan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.