From: John Garry <john.garry@huawei.com>
To: Keith Busch <kbusch@kernel.org>
Cc: sagi@grimberg.me, Robin Murphy <robin.murphy@arm.com>,
	linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	axboe@fb.com, Will Deacon <will@kernel.org>,
	Alexey Dobriyan <adobriyan@gmail.com>
Subject: Re: [PATCH] nvme-pci: slimmer CQ head update
Date: Thu, 7 May 2020 16:11:23 +0100
Message-ID: <8b297620-c72b-2184-36cb-032f5cfda05c@huawei.com>
In-Reply-To: <20200507142352.GA2621422@dhcp-10-100-145-180.wdl.wdc.com>

On 07/05/2020 15:23, Keith Busch wrote:
> On Thu, May 07, 2020 at 02:55:37PM +0100, John Garry wrote:
>> On 07/05/2020 12:04, Robin Murphy wrote:
>>>> [  177.132810] DMA-API: nvme 0000:85:00.0: device driver tries to free DMA memory it has not allocated [device address=0x00000000ef371000] [size=4096 bytes]
>>> [...]
>>>> [  177.276322]  debug_dma_unmap_page+0x6c/0x78
>>>> [  177.280487]  nvme_unmap_data+0x7c/0x23c
>>>> [  177.284305]  nvme_pci_complete_rq+0x28/0x58
>>>
>>> OK, so there's clearly something amiss there. I would have suggested
>>> next sticking the SMMU in passthrough to help focus on the DMA API
>>> debugging, but since that "DMA address" looks suspiciously like a
>>> physical address rather than an IOVA, I suspect that things might
>>> suddenly appear to be working fine if you do...
>>
>> OK, seems sensible. However it looks like this guy triggers the issue:
>>
>> 324b494c2862 nvme-pci: Remove two-pass completions
>>
>> With carrying the revert of $subject, it's a quick bisect to that patch.
> 
> That's weird.

Or maybe exacerbating some other fault?

> Do you see this with different nvme controllers?

I only have three, and they are all ES3000 V3 NVMe PCIe SSDs.

> Does your
> controller write the phase bit before writing the command id in the cqe?

I don't know. Is that sort of info available from nvme-cli?

> Asking because this looks like we're seeing an older command id in the
> cqe, and the only thing that patch you've bisected should do is remove a
> delay between observing the new phase and reading the command id.
> .

Another log, below, with SMMU off.

John


fio-2.1.10
Starting 60 processes
Jobs: 60 (f=60): 
[RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR]
[  885.335343] ------------[ cut here ]------------
[  885.335999] ------------[ cut here ]------------
[  885.339967] DMA-API: nvme 0000:82:00.0: device driver tries to free DMA memory it has not allocated [device address=0x0000002fd5870000] [size=4096 bytes]
[  885.344575] WARNING: CPU: 41 PID: 4565 at block/blk-mq.c:665 blk_mq_start_request+0xc4/0xcc
[  885.344577] Modules linked in:
[  885.358287] WARNING: CPU: 39 PID: 1074 at kernel/dma/debug.c:1014 check_unmap+0x698/0x86c
[  885.366601] CPU: 41 PID: 4565 Comm: fio Not tainted 5.6.0-rc4-gd64d242-dirty #155
[  885.369645] Modules linked in:
[  885.377799] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  885.385262] CPU: 39 PID: 1074 Comm: irq/230-nvme1q2 Not tainted 5.6.0-rc4-gd64d242-dirty #155
[  885.388308] pstate: 60400009 (nZCv daif +PAN -UAO)
[  885.397155] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  885.397157] pstate: 60c00009 (nZCv daif +PAN +UAO)
[  885.405656] pc : blk_mq_start_request+0xc4/0xcc
[  885.405662] lr : nvme_queue_rq+0x134/0x7cc
[  885.410437] pc : check_unmap+0x698/0x86c
[  885.419281] sp : ffff800025ccb770
[  885.419283] x29: ffff800025ccb770 x28: ffff002fdc16d200
[  885.424061] lr : check_unmap+0x698/0x86c
[  885.428577] x27: ffff002fdc16d318 x26: fffffe00bf3621c0
[  885.432663] sp : ffff8000217dbb40
[  885.436574] x25: 0000000000001000 x24: 0000000000000000
[  885.439881] x29: ffff8000217dbb40 x28: ffff002fdc6c6cd4
[  885.445177] x23: 0000000000001000 x22: ffff2027c7540000
[  885.449088] x27: ffffa99cc3c7f000 x26: 0000000000001000
[  885.454387] x21: ffff800025ccb8b0 x20: ffff2027c6ae0000
[  885.457694] x25: 0000000000000000 x24: ffff2027c7540000
[  885.462990] x19: ffff002fdc16d200 x18: 0000000000000000
[  885.468288] x23: ffffa99cc55630d0 x22: ffffa99cc530a000
[  885.473584] x17: 0000000000000000 x16: 0000000000000000
[  885.478882] x21: 0000002fd5870000 x20: ffff8000217dbbc0
[  885.484178] x15: 0000000000000000 x14: 0000000000000000
[  885.489477] x19: 0000002fd5870000 x18: 0000000000000000
[  885.494773] x13: 0000000066641000 x12: ffff2027a15cf9a0
[  885.500071] x17: 0000000000000000 x16: 0000000000000000
[  885.505367] x11: 0000000026641fff x10: 0000000000000002
[  885.510665] x15: 0000000000000000 x14: 7a69735b205d3030
[  885.515964] x9 : 0000000000a80000 x8 : ffff2027d3cb9ac8
[  885.521264] x13: 3030373835646632 x12: 3030303030307830
[  885.526559] x7 : ffffa99cc4f34000 x6 : 0000002fd5887000
[  885.531856] x11: 3d73736572646461 x10: 206563697665645b
[  885.537152] x5 : 0000000000000000 x4 : 0000000000000000
[  885.542449] x9 : ffffa99cc5321bc8 x8 : 6c6120746f6e2073
[  885.547745] x3 : ffff2027d039c0b0 x2 : 0000000000000000
[  885.553042] x7 : 6168207469207972 x6 : ffff002fffdbe1b8
[  885.558338] x1 : 0000000100000000 x0 : 0000000000000002
[  885.563634] x5 : 0000000000000000 x4 : 0000000000000000
[  885.568932] Call trace:
[  885.574228] x3 : 0000000000000000 x2 : ffff002fffdc5088
[  885.579531]  blk_mq_start_request+0xc4/0xcc
[  885.584821] x1 : 0000000100000001 x0 : 000000000000008d
[  885.590120]  nvme_queue_rq+0x134/0x7cc
[  885.595414] Call trace:
[  885.597858]  __blk_mq_try_issue_directly+0x108/0x1bc
[  885.603158]  check_unmap+0x698/0x86c
[  885.607324]  blk_mq_request_issue_directly+0x40/0x64
[  885.612620]  debug_dma_unmap_page+0x6c/0x78
[  885.616359]  blk_mq_try_issue_list_directly+0x50/0xc8
[  885.618800]  nvme_unmap_data+0x7c/0x23c
[  885.623752]  blk_mq_sched_insert_requests+0x170/0x1d0
[  885.623753]  blk_mq_flush_plug_list+0x10c/0x158
[  885.627318]  nvme_pci_complete_rq+0x3c/0x10c
[  885.632271]  blk_flush_plug_list+0xc4/0xd4
[  885.632273]  blk_finish_plug+0x30/0x40
[  885.636444]  blk_mq_complete_request+0x114/0x150
[  885.641484]  blkdev_direct_IO+0x3d4/0x444
[  885.645306]  nvme_irq+0xbc/0x204
[  885.650346]  generic_file_read_iter+0x90/0xaec
[  885.654863]  irq_thread_fn+0x28/0x6c
[  885.659118]  blkdev_read_iter+0x3c/0x54
[  885.663203]  irq_thread+0x158/0x1e8
[  885.666943]  aio_read+0xdc/0x138
[  885.671548]  kthread+0xf4/0x120
[  885.675544]  io_submit_one+0x4ac/0xbf0
[  885.675546]  __arm64_sys_io_submit+0x16c/0x1f8
[  885.678766]  ret_from_fork+0x10/0x18
[  885.683199]  el0_svc_common.constprop.3+0xb8/0x170
[  885.686765] ---[ end trace fc66a57b25e362aa ]---
[  885.690593]  do_el0_svc+0x70/0x88
[  885.724844]  el0_sync_handler+0xf4/0x130
[  885.728758]  el0_sync+0x140/0x180
[  885.732065] ---[ end trace fc66a57b25e362ab ]---
[  885.736768] ------------[ cut here ]------------
[  885.741379] refcount_t: underflow; use-after-free.
[  885.746184] WARNING: CPU: 39 PID: 1074 at lib/refcount.c:28 refcount_warn_saturate+0x6c/0x13c
[  885.754687] Modules linked in:
[  885.757736] CPU: 39 PID: 1074 Comm: irq/230-nvme1q2 Tainted: G        W    5.6.0-rc4-gd64d242-dirty #155
[  885.767623] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  885.776471] pstate: 60c00009 (nZCv daif +PAN +UAO)
[  885.781250] pc : refcount_warn_saturate+0x6c/0x13c
[  885.786028] lr : refcount_warn_saturate+0x6c/0x13c
[  885.790805] sp : ffff8000217dbc40
[  885.794112] x29: ffff8000217dbc40 x28: ffff002fdc6c6cd4
[  885.799411] x27: ffffa99cc3c7f000 x26: ffffa99cc3c7f948
[  885.804710] x25: 0000000000000001 x24: ffffa99cc4cd3710
[  885.810007] x23: fffffffffffffff8 x22: ffffde07ebde5680
[  885.815305] x21: 0000000000000000 x20: ffff2027c6ae0000
[  885.820603] x19: ffff002fdc16d200 x18: 0000000000000000
[  885.825901] x17: 0000000000000000 x16: 0000000000000000
[  885.831199] x15: 0000000000000000 x14: ffff002fdd922948
[  885.836498] x13: ffff002fdd922150 x12: 0000000000000000
[  885.841796] x11: 00000000000008a4 x10: 000000000000000f
[  885.847094] x9 : ffffa99cc5321bc8 x8 : 72657466612d6573
[  885.852391] x7 : 75203b776f6c6672 x6 : ffff002fffdbe1b8
[  885.857689] x5 : 0000000000000000 x4 : 0000000000000000
[  885.862986] x3 : 0000000000000000 x2 : ffff002fffdc5088
[  885.868284] x1 : 0000000100000001 x0 : 0000000000000026
[  885.873582] Call trace:
[  885.876028]  refcount_warn_saturate+0x6c/0x13c
[  885.880462]  blk_mq_free_request+0x12c/0x14c
[  885.884723]  blk_mq_end_request+0x114/0x134
[  885.888898]  nvme_complete_rq+0x50/0x128
[  885.892811]  nvme_pci_complete_rq+0x44/0x10c
[  885.897070]  blk_mq_complete_request+0x114/0x150
[  885.901674]  nvme_irq+0xbc/0x204
[  885.904898]  irq_thread_fn+0x28/0x6c
[  885.908464]  irq_thread+0x158/0x1e8
[  885.911945]  kthread+0xf4/0x120
[  885.915080]  ret_from_fork+0x10/0x18
[  885.918646] ---[ end trace fc66a57b25e362ac ]---


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
