* Re: nvme crash - Re: linux-next: Tree for Aug 13
       [not found] <454c65b1-872a-a48c-662d-690044662772@huawei.com>
@ 2020-08-13 15:50 ` Christoph Hellwig
  2020-08-14 12:00 ` John Garry
  0 siblings, 1 reply; 5+ messages in thread

From: Christoph Hellwig @ 2020-08-13 15:50 UTC (permalink / raw)
To: John Garry
Cc: Stephen Rothwell, Linux Kernel Mailing List, linux-nvme, iommu,
	Linux Next Mailing List, Robin Murphy

On Thu, Aug 13, 2020 at 12:00:19PM +0100, John Garry wrote:
> On 13/08/2020 07:58, Stephen Rothwell wrote:
> > Hi all,
>
> Hi guys,
>
> I have experienced this crash below on linux-next for the last few days
> on my arm64 system. Linus' master branch today also has it.

Adding Robin and the iommu list as this seems to be in the dma-iommu
code.

> root@ubuntu:/home/john# insmod nvme.ko
> [148.254564] nvme 0000:81:00.0: Adding to iommu group 21
> [148.260973] nvme nvme0: pci function 0000:81:00.0
> root@ubuntu:/home/john# [148.272996] Unable to handle kernel NULL pointer
> dereference at virtual address 0000000000000010
> [148.281784] Mem abort info:
> [148.284584]   ESR = 0x96000004
> [148.287641]   EC = 0x25: DABT (current EL), IL = 32 bits
> [148.292950]   SET = 0, FnV = 0
> [148.295998]   EA = 0, S1PTW = 0
> [148.299126] Data abort info:
> [148.302003]   ISV = 0, ISS = 0x00000004
> [148.305832]   CM = 0, WnR = 0
> [148.308794] user pgtable: 4k pages, 48-bit VAs, pgdp=00000a27bf3c9000
> [148.315229] [0000000000000010] pgd=0000000000000000, p4d=0000000000000000
> [148.322016] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [148.327577] Modules linked in: nvme nvme_core
> [148.331927] CPU: 56 PID: 256 Comm: kworker/u195:0 Not tainted
> 5.8.0-next-20200812 #27
> [148.339744] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 -
> V1.16.01 03/15/2019
> [148.348260] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
> [148.353822] pstate: 80c00009 (Nzcv daif +PAN +UAO BTYPE=--)
> [148.359390] pc : __sg_alloc_table_from_pages+0xec/0x238
> [148.364604] lr : __sg_alloc_table_from_pages+0xc8/0x238
> [148.369815] sp : ffff800013ccbad0
> [148.373116] x29: ffff800013ccbad0 x28: ffff0a27b3d380a8
> [148.378417] x27: 0000000000000000 x26: 0000000000002dc2
> [148.383718] x25: 0000000000000dc0 x24: 0000000000000000
> [148.389019] x23: 0000000000000000 x22: ffff800013ccbbe8
> [148.394320] x21: 0000000000000010 x20: 0000000000000000
> [148.399621] x19: 00000000fffff000 x18: ffffffffffffffff
> [148.404922] x17: 00000000000000c0 x16: fffffe289eaf6380
> [148.410223] x15: ffff800011b59948 x14: ffff002bc8fe98f8
> [148.415523] x13: ff00000000000000 x12: ffff8000114ca000
> [148.420824] x11: 0000000000000000 x10: ffffffffffffffff
> [148.426124] x9 : ffffffffffffffc0 x8 : ffff0a27b5f9b6a0
> [148.431425] x7 : 0000000000000000 x6 : 0000000000000001
> [148.436726] x5 : ffff0a27b5f9b680 x4 : 0000000000000000
> [148.442027] x3 : ffff0a27b5f9b680 x2 : 0000000000000000
> [148.447328] x1 : 0000000000000001 x0 : 0000000000000000
> [148.452629] Call trace:
> [148.455065]  __sg_alloc_table_from_pages+0xec/0x238
> [148.459931]  sg_alloc_table_from_pages+0x18/0x28
> [148.464541]  iommu_dma_alloc+0x474/0x678
> [148.468455]  dma_alloc_attrs+0xd8/0xf0
> [148.472193]  nvme_alloc_queue+0x114/0x160 [nvme]
> [148.476798]  nvme_reset_work+0xb34/0x14b4 [nvme]
> [148.481407]  process_one_work+0x1e8/0x360
> [148.485405]  worker_thread+0x44/0x478
> [148.489055]  kthread+0x150/0x158
> [148.492273]  ret_from_fork+0x10/0x34
> [148.495838] Code: f94002c3 6b01017f 540007c2 11000486 (f8645aa5)
> [148.501921] ---[ end trace 89bb2b72d59bf925 ]---
>
> Anything to worry about? I guess not since we're in the merge window, but
> mentioning just in case ...
>
> Thanks,
> john
>
> > News: The merge window has opened, so please do not add any v5.10
> > related material to your linux-next included branches until after the
> > merge window closes again.
> >
> > Changes since 20200812:
> >
> > My fixes tree contains:
> >
> >   73c7adb54169 ("device_cgroup: Fix RCU list debugging warning")
> >
> > Linus' tree produces a WARNING in my qemu testing (see
> > https://lore.kernel.org/lkml/20200813164654.061dbbd3@canb.auug.org.au/).
> >
> > Non-merge commits (relative to Linus' tree): 946
> >  1083 files changed, 28405 insertions(+), 9953 deletions(-)
> >
> > ----------------------------------------------------------------------------
> >
> > I have created today's linux-next tree at
> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> > (patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
> > are tracking the linux-next tree using git, you should not use "git pull"
> > to do so as that will try to merge the new linux-next release with the
> > old one.  You should use "git fetch" and checkout or reset to the new
> > master.
> >
> > You can see which trees have been included by looking in the Next/Trees
> > file in the source.  There are also quilt-import.log and merge.log
> > files in the Next directory.  Between each merge, the tree was built
> > with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
> > multi_v7_defconfig for arm and a native build of tools/perf.  After
> > the final fixups (if any), I do an x86_64 modules_install followed by
> > builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
> > ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
> > and sparc64 defconfig and htmldocs.  And finally, a simple boot test
> > of the powerpc pseries_le_defconfig kernel in qemu (with and without
> > kvm enabled).
> >
> > Below is a summary of the state of the merge.
> >
> > I am currently merging 327 trees (counting Linus' and 85 trees of bug
> > fix patches pending for the current merge release).
> >
> > Stats about the size of the tree over time can be seen at
> > http://neuling.org/linux-next-size.html .
> >
> > Status of my local build tests will be at
> > http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
> > advice about cross compilers/configs that work, we are always open to add
> > more builds.
> >
> > Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
> > Gortmaker for triage and bug fixes.
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

---end quoted text---

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
* Re: nvme crash - Re: linux-next: Tree for Aug 13
  2020-08-13 15:50 ` nvme crash - Re: linux-next: Tree for Aug 13 Christoph Hellwig
@ 2020-08-14 12:00 ` John Garry
  2020-08-14 12:08 ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread

From: John Garry @ 2020-08-14 12:00 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Stephen Rothwell, chaitanya.kulkarni, Linux Kernel Mailing List,
	linux-nvme, iommu, Linux Next Mailing List, Robin Murphy

>> I have experienced this crash below on linux-next for the last few days
>> on my arm64 system. Linus' master branch today also has it.
>
> Adding Robin and the iommu list as this seems to be in the dma-iommu
> code.
>
>> root@ubuntu:/home/john# insmod nvme.ko
>> [148.254564] nvme 0000:81:00.0: Adding to iommu group 21
>> [148.260973] nvme nvme0: pci function 0000:81:00.0
>> [ ... crash trace snipped; quoted in full in the first message ... ]
>> [148.495838] Code: f94002c3 6b01017f 540007c2 11000486 (f8645aa5)
>> [148.501921] ---[ end trace 89bb2b72d59bf925 ]---
>>
>> Anything to worry about? I guess not since we're in the merge window, but
>> mentioning just in case ...

I bisected, and this patch looks to fix it (note the comments below the
'---'):

From 263891a760edc24b901085bf6e5fe2480808f86d Mon Sep 17 00:00:00 2001
From: John Garry <john.garry@huawei.com>
Date: Fri, 14 Aug 2020 12:45:18 +0100
Subject: [PATCH] nvme-pci: Use u32 for nvme_dev.q_depth

Recently nvme_dev.q_depth was changed from int to u16 type.

This falls over for the queue depth calculation in nvme_pci_enable(),
where NVME_CAP_MQES(dev->ctrl.cap) + 1 may overflow, as NVME_CAP_MQES()
also gives a 16-bit number. That happens for me, and this is the result:

[ ... crash trace as quoted above ... ]

Fix by making it a u32.

Fixes: 61f3b8963097 ("nvme-pci: use unsigned for io queue depth")
Signed-off-by: John Garry <john.garry@huawei.com>
---
unsigned int may be better, and io_queue_depth() needs fixing also

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ba725ae47305..72c1402abfc3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -120,7 +120,7 @@ struct nvme_dev {
 	unsigned max_qid;
 	unsigned io_queues[HCTX_MAX_TYPES];
 	unsigned int num_vecs;
-	u16 q_depth;
+	u32 q_depth;
 	int io_sqes;
 	u32 db_stride;
 	void __iomem *bar;
@@ -2320,7 +2320,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)

 	dev->ctrl.cap = lo_hi_readq(dev->bar + NVME_REG_CAP);

-	dev->q_depth = min_t(u16, NVME_CAP_MQES(dev->ctrl.cap) + 1,
+	dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
 				io_queue_depth);
 	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
 	dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
* Re: nvme crash - Re: linux-next: Tree for Aug 13
  2020-08-14 12:00 ` John Garry
@ 2020-08-14 12:08 ` Christoph Hellwig
  2020-08-14 13:07 ` John Garry
  0 siblings, 1 reply; 5+ messages in thread

From: Christoph Hellwig @ 2020-08-14 12:08 UTC (permalink / raw)
To: John Garry
Cc: Stephen Rothwell, chaitanya.kulkarni, Linux Kernel Mailing List,
	linux-nvme, Christoph Hellwig, iommu, Linux Next Mailing List,
	Robin Murphy

On Fri, Aug 14, 2020 at 01:00:30PM +0100, John Garry wrote:
> > > I have experienced this crash below on linux-next for the last few days
> > > on my arm64 system. Linus' master branch today also has it.
> > Adding Robin and the iommu list as this seems to be in the dma-iommu
> > code.
> > > [ ... crash trace snipped; quoted in full in the first message ... ]
> > > Anything to worry about? I guess not since we're in the merge window, but
> > > mentioning just in case ...
>
> I bisected, and this patch looks to fix it (note the comments below the
> '---'):
>
> From 263891a760edc24b901085bf6e5fe2480808f86d Mon Sep 17 00:00:00 2001
> From: John Garry <john.garry@huawei.com>
> Date: Fri, 14 Aug 2020 12:45:18 +0100
> Subject: [PATCH] nvme-pci: Use u32 for nvme_dev.q_depth
>
> Recently nvme_dev.q_depth was changed from int to u16 type.
>
> This falls over for the queue depth calculation in nvme_pci_enable(),
> where NVME_CAP_MQES(dev->ctrl.cap) + 1 may overflow, as NVME_CAP_MQES()
> also gives a 16-bit number. That happens for me, and this is the result:

Oh, interesting.  Please also switch the module option parsing to
use kstrtou32 and param_set_uint and send this as a formal patch.

Thanks!
* Re: nvme crash - Re: linux-next: Tree for Aug 13
  2020-08-14 12:08 ` Christoph Hellwig
@ 2020-08-14 13:07 ` John Garry
  2020-08-14 13:37 ` John Garry
  0 siblings, 1 reply; 5+ messages in thread

From: John Garry @ 2020-08-14 13:07 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Stephen Rothwell, chaitanya.kulkarni, Linux Kernel Mailing List,
	linux-nvme, iommu, Linux Next Mailing List, Robin Murphy

On 14/08/2020 13:08, Christoph Hellwig wrote:
>>>> [ ... call trace snipped; quoted in full in the first message ... ]
>>>>
>>>> Anything to worry about? I guess not since we're in the merge window, but
>>>> mentioning just in case ...
>> I bisected, and this patch looks to fix it (note the comments below the
>> '---'):
>>
>> From 263891a760edc24b901085bf6e5fe2480808f86d Mon Sep 17 00:00:00 2001
>> From: John Garry <john.garry@huawei.com>
>> Date: Fri, 14 Aug 2020 12:45:18 +0100
>> Subject: [PATCH] nvme-pci: Use u32 for nvme_dev.q_depth
>>
>> Recently nvme_dev.q_depth was changed from int to u16 type.
>>
>> This falls over for the queue depth calculation in nvme_pci_enable(),
>> where NVME_CAP_MQES(dev->ctrl.cap) + 1 may overflow, as NVME_CAP_MQES()
>> also gives a 16-bit number. That happens for me, and this is the result:
> Oh, interesting.  Please also switch the module option parsing to
> use kstrtou32 and param_set_uint and send this as a formal patch.

I'm doing it now.

BTW, as for the DMA/sg scatterlist code, it so happens in this case that
we try the dma alloc for size=0 in nvme_alloc_queue() - I know an
allocation for size=0 makes no sense, but couldn't we be a bit more
robust?

Cheers,
John
* Re: nvme crash - Re: linux-next: Tree for Aug 13
  2020-08-14 13:07 ` John Garry
@ 2020-08-14 13:37 ` John Garry
  0 siblings, 0 replies; 5+ messages in thread

From: John Garry @ 2020-08-14 13:37 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Stephen Rothwell, chaitanya.kulkarni, Linux Kernel Mailing List,
	linux-nvme, iommu, Linux Next Mailing List, Robin Murphy

On 14/08/2020 14:07, John Garry wrote:
>
> BTW, as for the DMA/sg scatterlist code, it so happens in this case that
> we try the dma alloc for size=0 in nvme_alloc_queue() - I know an
> allocation for size=0 makes no sense, but couldn't we be a bit more
> robust?

It's giving ZERO_SIZE_PTR, which we dereference, so ignore me...