From: John Garry <john.garry@huawei.com>
To: Keith Busch <kbusch@kernel.org>, Alexey Dobriyan <adobriyan@gmail.com>
Cc: axboe@fb.com, hch@lst.de, linux-nvme@lists.infradead.org,
	sagi@grimberg.me
Subject: Re: [PATCH] nvme-pci: slimmer CQ head update
Date: Wed, 6 May 2020 12:03:35 +0100	[thread overview]
Message-ID: <defb25c5-5ae5-5ff9-66db-efb129bd7743@huawei.com> (raw)
In-Reply-To: <20200229055351.GA542920@dhcp-10-100-145-180.wdl.wdc.com>

On 29/02/2020 05:53, Keith Busch wrote:
> On Fri, Feb 28, 2020 at 09:45:19PM +0300, Alexey Dobriyan wrote:
>> @@ -982,11 +982,9 @@ static void nvme_complete_cqes(struct nvme_queue *nvmeq, u16 start, u16 end)
>>   

I think that this patch may be broken when use_threaded_interrupts=1, as 
I see this crash on my arm64 system:

[   29.366700] Unable to handle kernel paging request at virtual address ffff80001178500e
[   29.375316] Mem abort info:
[   29.378096]   ESR = 0x96000007
[   29.381135]   EC = 0x25: DABT (current EL), IL = 32 bits
[   29.386422]   SET = 0, FnV = 0
[   29.389461]   EA = 0, S1PTW = 0
[   29.392587] Data abort info:
[   29.395456]   ISV = 0, ISS = 0x00000007
[   29.399272]   CM = 0, WnR = 0
[   29.402226] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000020252be08000
[   29.408896] [ffff80001178500e] pgd=00000027ff835003, pud=00000027ff836003, pmd=00000027ffa53003, pte=0000000000000000
[   29.419457] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[   29.425003] Modules linked in:
[   29.428046] CPU: 17 PID: 1023 Comm: irq/137-nvme2q2 Not tainted 5.7.0-rc4-ga3c4a5a-dirty #117
[   29.436528] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[   29.445356] pstate: 40400089 (nZcv daIf +PAN -UAO)
[   29.450133] pc : nvme_irq_check+0x10/0x28
[   29.454130] lr : __handle_irq_event_percpu+0x5c/0x144
[   29.459158] sp : ffff80001008be90
[   29.462458] x29: ffff80001008be90 x28: ffff002fd7b144c0
[   29.467746] x27: ffffb111ce83e440 x26: ffffb111ce83e468
[   29.473032] x25: ffffb111cf14426f x24: ffff0027d6b67800
[   29.478319] x23: 0000000000000089 x22: ffff80001008bf1c
[   29.483606] x21: 0000000000000000 x20: ffff0027d6b67800
[   29.488893] x19: ffff2027ca8e6880 x18: 0000000000000000
[   29.494180] x17: 000007ffffffffff x16: 00000000000001de
[   29.499466] x15: 000000000001ffff x14: 0000000000000001
[   29.504754] x13: 000000000000002b x12: 000000000000002b
[   29.510041] x11: 000000000000ffff x10: ffffb111cef71cb8
[   29.515328] x9 : 0000000000000040 x8 : ffff2027d10fbe08
[   29.520616] x7 : 0000000000000000 x6 : 0000000000000000
[   29.525903] x5 : 0000000000000000 x4 : 0000000000000001
[   29.531190] x3 : 0000000000000400 x2 : ffff800011781000
[   29.536477] x1 : ffff800011785000 x0 : 0000000000000001
[   29.541764] Call trace:
[   29.544201]  nvme_irq_check+0x10/0x28
[   29.547847]  handle_irq_event_percpu+0x1c/0x54
[   29.552269]  handle_irq_event+0x44/0x74
[   29.556088]  handle_fasteoi_irq+0xa8/0x160
[   29.560167]  generic_handle_irq+0x2c/0x40
[   29.564158]  __handle_domain_irq+0x64/0xb0
[   29.568238]  gic_handle_irq+0x94/0x194
[   29.571970]  el1_irq+0xb8/0x180
[   29.575097]  nvme_irq+0xe4/0x1f4
[   29.578310]  irq_thread_fn+0x28/0x6c
[   29.581868]  irq_thread+0x158/0x1e8
[   29.585343]  kthread+0x124/0x150
[   29.588556]  ret_from_fork+0x10/0x18
[   29.592116] Code: 7940e023 f9402422 3941d020 8b031041 (79401c21)
[   29.598249] ---[ end trace 33259050d5109e6f ]---
[   29.607114] Kernel panic - not syncing: Fatal exception in interrupt
[   29.613459] SMP: stopping secondary CPUs
[   29.617404] Kernel Offset: 0x3111bd6d0000 from 0xffff800010000000
[   29.623468] PHYS_OFFSET: 0xffffe13f80000000
[   29.627633] CPU features: 0x080002,22808a18
[   29.631797] Memory Limit: none
[   29.639056] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Reverting this patch makes the crash go away.

>>   static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
>>   {
>> -	if (nvmeq->cq_head == nvmeq->q_depth - 1) {
>> +	if (++nvmeq->cq_head == nvmeq->q_depth) {

I figure that nvmeq->cq_head may momentarily transition to equal 
nvmeq->q_depth before being reset to 0. With threaded interrupts, the 
hard IRQ handler (nvme_irq_check, per the call trace above) runs 
concurrently with the threaded handler and can sample cq_head in that 
window, which causes an out-of-bounds access here:

static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq)
{
	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
			nvmeq->cq_phase;
}
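
To make the window concrete, here is a minimal userspace sketch (my 
illustration, not driver code) of the same pattern: one thread advances 
cq_head exactly as the patched nvme_update_cq_head() does, while another 
samples it the way nvme_cqe_pending() does. Built with gcc -O2 -pthread, 
the reader soon observes the transient out-of-range value:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define Q_DEPTH 1024

static volatile unsigned short cq_head;

static void *updater(void *arg)
{
	(void)arg;
	for (;;) {
		/* Patched sequence: the incremented value is stored to
		 * cq_head before the wrap check can reset it to 0. */
		if (++cq_head == Q_DEPTH)
			cq_head = 0;
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, updater, NULL);
	for (;;) {
		/* Reader side, as in nvme_cqe_pending(). */
		unsigned short head = cq_head;

		if (head >= Q_DEPTH) {
			/* The real driver would dereference cqes[head]
			 * here, one entry past the end of the CQ. */
			printf("sampled out-of-range cq_head = %u\n", head);
			exit(0);
		}
	}
}

With the pre-patch two-branch code, only in-range values are ever stored 
to cq_head, so a concurrent reader can never sample q_depth.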

>>   		nvmeq->cq_head = 0;
>> -		nvmeq->cq_phase = !nvmeq->cq_phase;
>> -	} else {
>> -		nvmeq->cq_head++;
>> +		nvmeq->cq_phase ^= 1;
>>   	}
>>   }
>>   
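
If we want to keep the slimmer shape rather than revert outright, one 
idea (an untested sketch, not a submitted patch) would be to compute the 
successor in a local variable so that only in-range values are ever 
stored to nvmeq->cq_head:

static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
{
	u16 tmp = nvmeq->cq_head + 1;	/* transient value stays local */

	if (tmp == nvmeq->q_depth) {
		nvmeq->cq_head = 0;
		nvmeq->cq_phase ^= 1;
	} else {
		nvmeq->cq_head = tmp;	/* single, always in-range store */
	}
}

That said, a straight revert is the safer immediate action.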

Any objection to sending a revert patch?

Thanks,
John


Thread overview: 31+ messages
2020-02-28 18:45 [PATCH] nvme-pci: slimmer CQ head update Alexey Dobriyan
2020-02-29  5:53 ` Keith Busch
2020-05-06 11:03   ` John Garry [this message]
2020-05-06 12:47     ` Keith Busch
2020-05-06 13:24       ` Alexey Dobriyan
2020-05-06 13:44         ` John Garry
2020-05-06 14:01           ` Alexey Dobriyan
2020-05-06 14:35           ` Christoph Hellwig
2020-05-06 16:26             ` John Garry
2020-05-06 16:31               ` Will Deacon
2020-05-06 16:52                 ` Robin Murphy
2020-05-06 17:02                   ` John Garry
2020-05-07  8:18                     ` John Garry
2020-05-07 11:04                       ` Robin Murphy
2020-05-07 13:55                         ` John Garry
2020-05-07 14:23                           ` Keith Busch
2020-05-07 15:11                             ` John Garry
2020-05-07 15:35                               ` Keith Busch
2020-05-07 15:41                                 ` John Garry
2020-05-08 16:16                                   ` Keith Busch
2020-05-08 17:04                                     ` John Garry
2020-05-07 16:26                                 ` Robin Murphy
2020-05-07 17:35                                   ` Keith Busch
2020-05-07 17:44                                     ` Will Deacon
2020-05-07 18:06                                       ` Keith Busch
2020-05-08 11:40                                         ` Will Deacon
2020-05-08 14:07                                           ` Keith Busch
2020-05-08 15:34                                             ` Keith Busch
2020-05-06 14:44         ` Keith Busch
2020-05-07 15:58           ` Keith Busch
2020-05-07 20:07             ` [PATCH] nvme-pci: fix "slimmer CQ head update" Alexey Dobriyan
