* Kernel warning at drivers/infiniband/core/rw.c:349
From: Bart Van Assche @ 2021-10-13 0:07 UTC
To: Logan Gunthorpe; +Cc: linux-rdma, Jason Gunthorpe
Hi,

If I run the SRP tests against the for-next branch of the RDMA git tree
(commit 2a152512a155 ("RDMA/efa: CQ notifications")), the following
warning appears:
------------[ cut here ]------------
WARNING: CPU: 69 PID: 838 at drivers/infiniband/core/rw.c:349 rdma_rw_ctx_init+0x63b/0x690 [ib_core]
CPU: 69 PID: 838 Comm: kworker/69:1H Tainted: G E 5.15.0-rc4-dbg+ #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
RIP: 0010:rdma_rw_ctx_init+0x63b/0x690 [ib_core]
Code: 8b 45 10 49 8d 7e 48 49 89 46 40 e8 cf 32 ca e0 8b 45 18 49 8d 7e 04 41 89 46 48 e8 df 30 ca e0 41 c6 46 04 00 e9 61 fe ff ff <0f> 0b 41 bc fb ff ff ff e9 3e fe ff ff 48 8b 9d 70 ff ff ff 48 8d
RSP: 0018:ffff88810b867968 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000024 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffff888169ee9a40 RDI: ffff888169ee9a58
RBP: ffff88810b867a20 R08: ffffffffa081b01b R09: 0000000000000000
R10: ffffed1085d2e3f1 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: ffff888169ee9a58 R15: ffff888169ee9a40
FS: 0000000000000000(0000) GS:ffff88842e940000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4720169e88 CR3: 00000001895d9006 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
srpt_alloc_rw_ctxs+0x2f2/0x560 [ib_srpt]
srpt_get_desc_tbl.constprop.0+0x289/0x2e0 [ib_srpt]
srpt_handle_cmd+0x17f/0x2b0 [ib_srpt]
srpt_handle_new_iu+0x27e/0x520 [ib_srpt]
srpt_recv_done+0x9b/0xd0 [ib_srpt]
__ib_process_cq+0x121/0x3d0 [ib_core]
ib_cq_poll_work+0x37/0xb0 [ib_core]
process_one_work+0x585/0xae0
worker_thread+0x2e7/0x700
kthread+0x1f6/0x220
ret_from_fork+0x1f/0x30
irq event stamp: 1255
hardirqs last enabled at (1263): [<ffffffff811ab2c8>] __up_console_sem+0x58/0x60
hardirqs last disabled at (1270): [<ffffffff811ab2ad>] __up_console_sem+0x3d/0x60
softirqs last enabled at (1290): [<ffffffff82200473>] __do_softirq+0x473/0x6ed
softirqs last disabled at (1279): [<ffffffff810e2152>] __irq_exit_rcu+0xf2/0x140
---[ end trace 81a8636fba7e1a77 ]---
Does this perhaps indicate a regression in the RDMA rw code?
Thanks,
Bart.

* Re: Kernel warning at drivers/infiniband/core/rw.c:349
From: Logan Gunthorpe @ 2021-10-13 0:30 UTC
To: Bart Van Assche; +Cc: linux-rdma, Jason Gunthorpe
On 2021-10-12 6:07 p.m., Bart Van Assche wrote:
> Hi,
>
> If I run the SRP tests against the for-next branch of the RDMA git tree
> then the following warning appears (commit 2a152512a155 ("RDMA/efa: CQ
> notifications")):
>
> [ ... warning trace snipped ... ]
>
> Does this perhaps indicate a regression in the RDMA rw code?

Hmm, yes, that looks like a regression caused by my recent patch.

Best I can see from the code is that someone is passing an sg_cnt of
zero. Previously that would have returned -ENOMEM, but now it might be
ignored, in which case it would hit that WARNING and return -EIO.

We can try a patch such as the one below to confirm.

Logan
--
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 5a3bd41b331c..4eb9781ccfaf 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -331,6 +331,10 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u3>
 		return ret;
 	sg_cnt = sgt.nents;
 
+	ret = -EIO;
+	if (!sg_cnt)
+		goto out_unmap_sg;
+
 	/*
 	 * Skip to the S/G entry that sg_offset falls into:
 	 */

* Re: Kernel warning at drivers/infiniband/core/rw.c:349
From: Bart Van Assche @ 2021-10-13 5:34 UTC
To: Logan Gunthorpe; +Cc: linux-rdma, Jason Gunthorpe
On 10/12/21 17:30, Logan Gunthorpe wrote:
> Best I can see from the code is that someone is passing an sg_cnt of
> zero. Previously that would have returned -ENOMEM, but now it might be
> ignored, in which case it would hit that WARNING and return -EIO.

That is not what is happening. The debug patch shown below taught me
the following:
* The sg_cnt argument of rdma_rw_ctx_init() is not zero.
* After the rdma_rw_map_sgtable() call, sgt.nents is zero.

The debug patch that I used is as follows:

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 5a3bd41b331c..a6dabea37958 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -326,11 +326,15 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num,
 	};
 	int ret;
 
+	WARN_ON_ONCE(!sg_cnt);
+
 	ret = rdma_rw_map_sgtable(dev, &sgt, dir);
 	if (ret)
 		return ret;
 	sg_cnt = sgt.nents;
 
+	WARN_ON_ONCE(!sg_cnt);
+
 	/*
 	 * Skip to the S/G entry that sg_offset falls into:
 	 */
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 3cadf1295417..d9e3d52eb952 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -911,11 +911,16 @@ static int srpt_alloc_rw_ctxs(struct srpt_send_ioctx *ioctx,
 		u32 size = be32_to_cpu(db->len);
 		u32 rkey = be32_to_cpu(db->key);
 
+		WARN_ON_ONCE(!size);
+
 		ret = target_alloc_sgl(&ctx->sg, &ctx->nents, size,
 				false, i < nbufs - 1);
 		if (ret)
 			goto unwind;
 
+		WARN_ONCE(ctx->nents <= 0, "%u bytes -> %d entries\n",
+			  size, ctx->nents);
+
 		ret = rdma_rw_ctx_init(&ctx->rw, ch->qp, ch->sport->port,
 				ctx->sg, ctx->nents, 0, remote_addr, rkey, dir);
 		if (ret < 0) {

* Re: Kernel warning at drivers/infiniband/core/rw.c:349
From: Logan Gunthorpe @ 2021-10-13 16:15 UTC
To: Bart Van Assche; +Cc: linux-rdma, Jason Gunthorpe
On 2021-10-12 11:34 p.m., Bart Van Assche wrote:
> On 10/12/21 17:30, Logan Gunthorpe wrote:
>> Best I can see from the code is that someone is passing an sg_cnt of
>> zero. Previously that would have returned -ENOMEM, but now it might be
>> ignored, in which case it would hit that WARNING and return -EIO.
>
> That is not what is happening. The debug patch shown below taught me
> the following:
> * The sg_cnt argument of rdma_rw_ctx_init() is not zero.
> * After the rdma_rw_map_sgtable() call, sgt.nents is zero.
>
> The debug patch that I used is as follows:

Ah, hmm. Perhaps it's this... The virt path in
ib_dma_map_sgtable_attrs() doesn't set sgt.nents...

Maybe try something like the patch below.

Thanks,
Logan
--
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 4b50d9a3018a..4ba642fc8a19 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4097,8 +4097,13 @@ static inline int ib_dma_map_sgtable_attrs(struct ib_dev>
 					   enum dma_data_direction direction,
 					   unsigned long dma_attrs)
 {
+	int nents;
+
 	if (ib_uses_virt_dma(dev)) {
-		ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
+		nents = ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
+		if (!nents)
+			return -EIO;
+		sgt->nents = nents;
 		return 0;
 	}
 	return dma_map_sgtable(dev->dma_device, sgt, direction, dma_attrs);

* Re: Kernel warning at drivers/infiniband/core/rw.c:349
From: Jason Gunthorpe @ 2021-10-13 16:20 UTC
To: Logan Gunthorpe; +Cc: Bart Van Assche, linux-rdma
On Wed, Oct 13, 2021 at 10:15:59AM -0600, Logan Gunthorpe wrote:
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 4b50d9a3018a..4ba642fc8a19 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -4097,8 +4097,13 @@ static inline int ib_dma_map_sgtable_attrs(struct ib_dev>
>  					   enum dma_data_direction direction,
>  					   unsigned long dma_attrs)
>  {
> +	int nents;
> +
>  	if (ib_uses_virt_dma(dev)) {
> -		ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
> +		nents = ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
> +		if (!nents)
> +			return -EIO;
> +		sgt->nents = nents;
>  		return 0;
>  	}
Oh yes, that definitely looks needed.
Jason

* Re: Kernel warning at drivers/infiniband/core/rw.c:349
From: Bart Van Assche @ 2021-10-13 16:38 UTC
To: Logan Gunthorpe; +Cc: linux-rdma, Jason Gunthorpe
On 10/13/21 9:15 AM, Logan Gunthorpe wrote:
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 4b50d9a3018a..4ba642fc8a19 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -4097,8 +4097,13 @@ static inline int ib_dma_map_sgtable_attrs(struct ib_dev>
>  					   enum dma_data_direction direction,
>  					   unsigned long dma_attrs)
>  {
> +	int nents;
> +
>  	if (ib_uses_virt_dma(dev)) {
> -		ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
> +		nents = ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
> +		if (!nents)
> +			return -EIO;
> +		sgt->nents = nents;
>  		return 0;
>  	}
>  	return dma_map_sgtable(dev->dma_device, sgt, direction, dma_attrs);

Thanks!
Tested-by: Bart Van Assche <bvanassche@acm.org>