From: Trond Myklebust <trond.myklebust@hammerspace.com>
Subject: [PATCH] RDMA: null pointer in __ib_umem_release causes kernel panic
Date: 2022-01-05 14:18
To: Jason Gunthorpe
Cc: linux-rdma, linux-nfs

When doing RPC/RDMA, we're seeing a kernel panic when __ib_umem_release()
iterates over the scatter-gather list and hits NULL pages.

It turns out that commit 79fbd3e1241c ended up changing the iteration
from being over only the mapped entries to being over the original list
size.

Fixes: 79fbd3e1241c ("RDMA: Use the sg_table directly and remove the opencoded version from umem")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 drivers/infiniband/core/umem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 86d479772fbc..59304bae13ca 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -55,7 +55,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 		ib_dma_unmap_sgtable_attrs(dev, &umem->sgt_append.sgt,
 					   DMA_BIDIRECTIONAL, 0);
 
-	for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
+	for_each_sgtable_dma_sg(&umem->sgt_append.sgt, sg, i)
 		unpin_user_page_range_dirty_lock(sg_page(sg),
 			DIV_ROUND_UP(sg->length, PAGE_SIZE), make_dirty);
-- 
2.33.1
From: Jason Gunthorpe
Subject: Re: [PATCH] RDMA: null pointer in __ib_umem_release causes kernel panic
Date: 2022-01-05 14:37
To: trondmy
Cc: linux-rdma, linux-nfs

On Wed, Jan 05, 2022 at 09:18:41AM -0500, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> When doing RPC/RDMA, we're seeing a kernel panic when __ib_umem_release()
> iterates over the scatter-gather list and hits NULL pages.
>
> It turns out that commit 79fbd3e1241c ended up changing the iteration
> from being over only the mapped entries to being over the original list
> size.

You mean this?

-	for_each_sg(umem->sg_head.sgl, sg, umem->sg_nents, i)
+	for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)

I don't see what changed there? The invariant should be that

	umem->sg_nents == sgt->orig_nents

> @@ -55,7 +55,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
> 		ib_dma_unmap_sgtable_attrs(dev, &umem->sgt_append.sgt,
> 					   DMA_BIDIRECTIONAL, 0);
>
> -	for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
> +	for_each_sgtable_dma_sg(&umem->sgt_append.sgt, sg, i)
> 		unpin_user_page_range_dirty_lock(sg_page(sg),

Calling sg_page() from under a dma_sg iterator is unconditionally
wrong.

More likely, something has gone wrong when the sgtable was created and
it has the wrong value in orig_nents.

Jason
From: Trond Myklebust
Subject: Re: [PATCH] RDMA: null pointer in __ib_umem_release causes kernel panic
Date: 2022-01-05 15:02
To: jgg, trondmy
Cc: linux-nfs, linux-rdma

On Wed, 2022-01-05 at 10:37 -0400, Jason Gunthorpe wrote:
> On Wed, Jan 05, 2022 at 09:18:41AM -0500, trondmy@kernel.org wrote:
> [...]
>
> Calling sg_page() from under a dma_sg iterator is unconditionally
> wrong.
>
> More likely, something has gone wrong when the sgtable was created and
> it has the wrong value in orig_nents.

Can you define "wrong value" in this case? Chuck's RPC/RDMA code
appears to call ib_alloc_mr() with an 'expected maximum number of
entries' (depth) in net/sunrpc/xprtrdma/frwr_ops.c:frwr_mr_init().

It then fills that table with a set of n <= depth pages in
net/sunrpc/xprtrdma/frwr_ops.c:frwr_map() and calls ib_dma_map_sg() to
map them, and then adjusts the sgtable with a call to ib_map_mr_sg().

What part of that sequence of operations is incorrect?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
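For reference, the sequence Trond describes can be sketched in rough pseudocode. The call names follow net/sunrpc/xprtrdma/frwr_ops.c and the kernel verbs API, but the argument lists are simplified and partly assumed:

```
/* frwr_mr_init(): allocate an FRWR MR sized for the worst case */
mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, depth);

/* frwr_map(): register n <= depth pages for one RPC */
for i in 0 .. n-1:
    sg_set_page(&mr_sg[i], page[i], len[i], offset[i]);
dma_nents = ib_dma_map_sg(device, mr_sg, n, DMA_BIDIRECTIONAL);
ib_map_mr_sg(mr, mr_sg, dma_nents, NULL, PAGE_SIZE);

/* on transport teardown, frwr_mr_release() eventually
   reaches ib_dereg_mr_user() (see the stack trace below) */
```

None of this allocates a umem, which is what makes the appearance of __ib_umem_release() in the crash surprising.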
From: Jason Gunthorpe
Subject: Re: [PATCH] RDMA: null pointer in __ib_umem_release causes kernel panic
Date: 2022-01-05 16:09
To: Trond Myklebust
Cc: trondmy, linux-nfs, linux-rdma

On Wed, Jan 05, 2022 at 03:02:34PM +0000, Trond Myklebust wrote:
> [...]
>
> Can you define "wrong value" in this case? Chuck's RPC/RDMA code
> appears to call ib_alloc_mr() with an 'expected maximum number of
> entries' (depth) in net/sunrpc/xprtrdma/frwr_ops.c:frwr_mr_init().
>
> It then fills that table with a set of n <= depth pages in
> net/sunrpc/xprtrdma/frwr_ops.c:frwr_map() and calls ib_dma_map_sg() to
> map them, and then adjusts the sgtable with a call to ib_map_mr_sg().

I'm confused, RPC/RDMA should never touch a umem at all.

Is this really the other bug where user and kernel MRs are getting
confused?

Jason
From: Trond Myklebust
Subject: Re: [PATCH] RDMA: null pointer in __ib_umem_release causes kernel panic
Date: 2022-01-05 17:16
To: jgg
Cc: linux-nfs, linux-rdma, trondmy

On Wed, 2022-01-05 at 12:09 -0400, Jason Gunthorpe wrote:
> [...]
>
> I'm confused, RPC/RDMA should never touch a umem at all.
>
> Is this really the other bug where user and kernel MRs are getting
> confused?

As far as I know, RPC/RDMA is just using the RDMA API to register and
unregister chunks of memory, so it is definitely not directly touching
the umem.

Let me just copy/paste the stack dump that we got. This was a 5.15.12
kernel running on Azure with an InfiniBand setup. The dump is a little
garbled (sorry) because other console output was interleaved with it,
but the stack trace itself is fairly clear that this is happening on
transport teardown and that we're iterating through a NULL page.
2022-01-04 17:53:24 kernel (interleaved "nfs: server ... not responding,
still trying" console messages elided for readability):

[ 2922.656486] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2922.660304] #PF: supervisor read access in kernel mode
[ 2922.662320] #PF: error_code(0x0000) - not-present page
[ 2922.664606] PGD 1d6c381067 P4D 1d6c381067 PUD 1d0c93e067 PMD 0
[ 2922.667089] Oops: 0000 [#1] SMP NOPTI
[ 2922.668823] CPU: 27 PID: 11384 Comm: kworker/u241:4 Kdump: loaded Tainted: G W 5.15.12-200.pd.17715.el7.x86_64 #1
[ 2922.673695] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[ 2922.677741] Workqueue: xprtiod xprt_autoclose [sunrpc]
[ 2922.679875] RIP: 0010:__ib_umem_release+0x7a/0xa0 [ib_uverbs]
[ 2922.682325] Code: f6 cb 48 89 ef e8 16 90 2e cc 48 89 c5 41 39 5c 24 5c 77 cd 5b 49 8d 7c 24 50 5d 41 5c 41 5d e9 ec 96 2e cc 41 bd 01 00 00 00 <48> 8b 3f 48 85 ff 74 9f 41 8b 54 24 5c 49 8b 74 24 50 45 31 c0 31
[ 2922.690363] RSP: 0018:ffffb53dee517d28 EFLAGS: 00010246
[ 2922.692787] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9f222edfa180
[ 2922.695675] RDX: 0000000000000001 RSI: ffff9f31b0dc4000 RDI: 0000000000000000
[ 2922.698735] RBP: ffff9f31b0dc4000 R08: 0000000000000000 R09: 0000000000000000
[ 2922.701658] R10: ffff9f3ddfb69680 R11: ffffb53dee517d50 R12: ffff9f31b0dc4000
[ 2922.704764] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9f23f2cc09b0
[ 2922.708268] FS:  0000000000000000(0000) GS:ffff9f3d0fcc0000(0000) knlGS:0000000000000000
[ 2922.713056] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2922.715925] CR2: 0000000000000000 CR3: 0000001dafdb6005 CR4: 0000000000370ee0
[ 2922.720003] Call Trace:
[ 2922.721202]  <TASK>
[ 2922.722467]  ib_umem_release+0x2a/0x90 [ib_uverbs]
[ 2922.724829]  mlx5_ib_dereg_mr+0x1fe/0x3d0 [mlx5_ib]
[ 2922.727447]  ib_dereg_mr_user+0x40/0xa0 [ib_core]
[ 2922.729658]  frwr_mr_release+0x20/0x70 [rpcrdma]
[ 2922.732133]  rpcrdma_xprt_disconnect+0x2b4/0x3a0 [rpcrdma]
[ 2922.734753]  xprt_rdma_close+0xe/0x30 [rpcrdma]
[ 2922.737096]  xprt_autoclose+0x52/0xf0 [sunrpc]
[ 2922.739248]  process_one_work+0x1f1/0x390
[ 2922.741247]  worker_thread+0x53/0x3e0
[ 2922.743081]  ? process_one_work+0x390/0x390
[ 2922.745341]  kthread+0x127/0x150
[ 2922.747185]  ? set_kthread_struct+0x40/0x40
[ 2922.749685]  ret_from_fork+0x22/0x30
[ 2922.751889]  </TASK>
[ 2922.753068] Modules linked in: nfsv3 bpf_preload veth nfs_layout_flexfiles auth_name rpcsec_gss_krb5 nfsv4 dns_resolver nfsidmap nfs fscache netfs xt_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype br_netfilter bridge stp llc nfsd auth_rpcgss overlay nfs_acl lockd grace xt_owner nf_nat_ftp nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp xt_CT xt_sctp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ip6table_filter ip6_tables iptable_filter bonding ipmi_msghandler rpcrdma sunrpc rdma_ucm ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib raid0 ib_uverbs ib_core vfat fat mlx5_core nvme mlxfw psample nvme_core tls intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pci_hyperv hyperv_drm hv_balloon
[ 2922.753108]  hv_utils hyperv_keyboard hid_hyperv pci_hyperv_intf drm_kms_helper cec drm i2c_piix4 hyperv_fb dm_multipath ip_tables xfs hv_storvsc hv_netvsc scsi_transport_fc crc32c_intel ata_generic hv_vmbus pata_acpi
[ 2922.803987] CR2: 0000000000000000
[ 2922.805713] ---[ end trace 450d08bd4a337f29 ]---
[ 2922.808718] RIP: 0010:__ib_umem_release+0x7a/0xa0 [ib_uverbs]
[register and code dump repeated by the panic handler elided; identical
to the values above]
[ 2922.852791] Kernel panic - not syncing: Fatal exception
[ 2922.858167] Kernel Offset: 0xc000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
From: Jason Gunthorpe
Subject: Re: [PATCH] RDMA: null pointer in __ib_umem_release causes kernel panic
Date: 2022-01-05 17:43
To: Trond Myklebust
Cc: linux-nfs, linux-rdma, trondmy

On Wed, Jan 05, 2022 at 05:16:06PM +0000, Trond Myklebust wrote:
> > I'm confused, RPC/RDMA should never touch a umem at all.
> >
> > Is this really the other bug where user and kernel MRs are getting
> > confused?
>
> As far as I know, RPC/RDMA is just using the RDMA API to register and
> unregister chunks of memory, so it is definitely not directly touching
> the umem.

I mean that RPC/RDMA doesn't have a umem at all, so seeing one in any
stack trace says something is corrupted.

I suppose it is this:

https://lore.kernel.org/r/20211222101312.1358616-1-maorg@nvidia.com

Jason
From: Trond Myklebust
Subject: Re: [PATCH] RDMA: null pointer in __ib_umem_release causes kernel panic
Date: 2022-01-05 17:49
To: jgg
Cc: linux-nfs, linux-rdma, trondmy

On Wed, 2022-01-05 at 13:43 -0400, Jason Gunthorpe wrote:
> [...]
>
> I suppose it is this:
>
> https://lore.kernel.org/r/20211222101312.1358616-1-maorg@nvidia.com

Ah... Thanks! I'll try that out.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com