* Kernel warning at drivers/infiniband/core/rw.c:349
@ 2021-10-13  0:07 Bart Van Assche
  2021-10-13  0:30 ` Logan Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2021-10-13  0:07 UTC
  To: Logan Gunthorpe; +Cc: linux-rdma, Jason Gunthorpe

Hi,

When I run the SRP tests against the for-next branch of the RDMA git tree
(at commit 2a152512a155 ("RDMA/efa: CQ notifications")), the following
warning appears:

------------[ cut here ]------------
WARNING: CPU: 69 PID: 838 at drivers/infiniband/core/rw.c:349 rdma_rw_ctx_init+0x63b/0x690 [ib_core]
CPU: 69 PID: 838 Comm: kworker/69:1H Tainted: G    E   5.15.0-rc4-dbg+ #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
RIP: 0010:rdma_rw_ctx_init+0x63b/0x690 [ib_core]
Code: 8b 45 10 49 8d 7e 48 49 89 46 40 e8 cf 32 ca e0 8b 45 18 49 8d 7e 04 41 89 46 48 e8 df 30 ca e0 41 c6 46 04 00 e9 61 fe ff ff <0f> 0b 41 bc fb ff ff ff e9 3e fe ff ff 48 8b 9d 70 ff ff ff 48 8d
RSP: 0018:ffff88810b867968 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000024 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffff888169ee9a40 RDI: ffff888169ee9a58
RBP: ffff88810b867a20 R08: ffffffffa081b01b R09: 0000000000000000
R10: ffffed1085d2e3f1 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: ffff888169ee9a58 R15: ffff888169ee9a40
FS:  0000000000000000(0000) GS:ffff88842e940000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4720169e88 CR3: 00000001895d9006 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
  srpt_alloc_rw_ctxs+0x2f2/0x560 [ib_srpt]
  srpt_get_desc_tbl.constprop.0+0x289/0x2e0 [ib_srpt]
  srpt_handle_cmd+0x17f/0x2b0 [ib_srpt]
  srpt_handle_new_iu+0x27e/0x520 [ib_srpt]
  srpt_recv_done+0x9b/0xd0 [ib_srpt]
  __ib_process_cq+0x121/0x3d0 [ib_core]
  ib_cq_poll_work+0x37/0xb0 [ib_core]
  process_one_work+0x585/0xae0
  worker_thread+0x2e7/0x700
  kthread+0x1f6/0x220
  ret_from_fork+0x1f/0x30
irq event stamp: 1255
hardirqs last  enabled at (1263): [<ffffffff811ab2c8>] __up_console_sem+0x58/0x60
hardirqs last disabled at (1270): [<ffffffff811ab2ad>] __up_console_sem+0x3d/0x60
softirqs last  enabled at (1290): [<ffffffff82200473>] __do_softirq+0x473/0x6ed
softirqs last disabled at (1279): [<ffffffff810e2152>] __irq_exit_rcu+0xf2/0x140
---[ end trace 81a8636fba7e1a77 ]---

Does this perhaps indicate a regression in the RDMA rw code?

Thanks,

Bart.


* Re: Kernel warning at drivers/infiniband/core/rw.c:349
  2021-10-13  0:07 Kernel warning at drivers/infiniband/core/rw.c:349 Bart Van Assche
@ 2021-10-13  0:30 ` Logan Gunthorpe
  2021-10-13  5:34   ` Bart Van Assche
  0 siblings, 1 reply; 6+ messages in thread
From: Logan Gunthorpe @ 2021-10-13  0:30 UTC
  To: Bart Van Assche; +Cc: linux-rdma, Jason Gunthorpe



On 2021-10-12 6:07 p.m., Bart Van Assche wrote:
> Hi,
> 
> If I run the SRP tests against the for-next branch of the RDMA git tree
> then the following warning appears (commit 2a152512a155 ("RDMA/efa: CQ 
> notifications")):
> 
> [... warning and call trace snipped ...]
> 
> Does this perhaps indicate a regression in the RDMA rw code?

Hmm, yes, this looks like a regression caused by my recent patch.

As best I can tell from the code, someone is passing an sg_cnt of
zero. Previously that would have returned -ENOMEM, but now the zero
count may slip through, in which case the code would hit that WARNING
and return -EIO.

We can try a patch such as the one below to confirm.

Logan

--

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 5a3bd41b331c..4eb9781ccfaf 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -331,6 +331,10 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u3>
                return ret;
        sg_cnt = sgt.nents;

+       ret = -EIO;
+       if (!sg_cnt)
+               goto out_unmap_sg;
+
        /*
         * Skip to the S/G entry that sg_offset falls into:
         */
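To make the reasoning above concrete, here is a toy userspace model of how a zero entry count flows through the old and the new code. It is not the real rdma_rw_ctx_init(): toy_init_old(), toy_init_new() and toy_warn_hits are hypothetical names, and the model assumes, as described above, that the old mapping helper returned the mapped count (0 meaning failure) while the new one returns a status and the count is read back from the table afterwards.

```c
#include <errno.h>
#include <stdio.h>

/* Toy model only: hypothetical helpers, much simpler than the real
 * rdma_rw_ctx_init(); they just trace how a zero entry count flows. */

static int toy_warn_hits; /* how often the late "WARNING" path fires */

/* Old convention (assumed): the mapping helper returns the mapped count,
 * with 0 meaning failure, so a zero count is caught right away. */
static int toy_init_old(int mapped_count)
{
    if (!mapped_count)
        return -ENOMEM;
    return mapped_count;
}

/* New convention (assumed): the mapping helper returns a status
 * (0 = success) and the count is read back from the table afterwards. */
static int toy_init_new(int map_status, int table_nents, int early_guard)
{
    int sg_cnt;

    if (map_status)
        return map_status;
    sg_cnt = table_nents;

    /* Models the early check proposed in the patch above. */
    if (early_guard && !sg_cnt)
        return -EIO;

    /* ... skip to the entry that sg_offset falls into, build WRs ... */

    /* Stands in for the late WARN that was reported; toy only. */
    if (!sg_cnt) {
        toy_warn_hits++;
        fprintf(stderr, "WARNING: sg_cnt == 0\n");
        return -EIO;
    }
    return sg_cnt;
}
```

With early_guard set, a zero count still fails with -EIO, just without the warning splat; without it, the status-returning convention lets a successful-looking map with zero entries reach the warning.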



* Re: Kernel warning at drivers/infiniband/core/rw.c:349
  2021-10-13  0:30 ` Logan Gunthorpe
@ 2021-10-13  5:34   ` Bart Van Assche
  2021-10-13 16:15     ` Logan Gunthorpe
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2021-10-13  5:34 UTC
  To: Logan Gunthorpe; +Cc: linux-rdma, Jason Gunthorpe

On 10/12/21 17:30, Logan Gunthorpe wrote:
> As best I can tell from the code, someone is passing an sg_cnt of
> zero. Previously that would have returned -ENOMEM, but now the zero
> count may slip through, in which case the code would hit that WARNING
> and return -EIO.

That is not what is happening. The debug patch shown below taught me
the following:
* The sg_cnt argument of rdma_rw_ctx_init() is not zero.
* After the rdma_rw_map_sgtable() call, sgt.nents is zero.

The debug patch that I used is as follows:

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 5a3bd41b331c..a6dabea37958 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -326,11 +326,15 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num,
  	};
  	int ret;

+	WARN_ON_ONCE(!sg_cnt);
+
  	ret = rdma_rw_map_sgtable(dev, &sgt, dir);
  	if (ret)
  		return ret;
  	sg_cnt = sgt.nents;

+	WARN_ON_ONCE(!sg_cnt);
+
  	/*
  	 * Skip to the S/G entry that sg_offset falls into:
  	 */
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 3cadf1295417..d9e3d52eb952 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -911,11 +911,16 @@ static int srpt_alloc_rw_ctxs(struct srpt_send_ioctx *ioctx,
  		u32 size = be32_to_cpu(db->len);
  		u32 rkey = be32_to_cpu(db->key);

+		WARN_ON_ONCE(!size);
+
  		ret = target_alloc_sgl(&ctx->sg, &ctx->nents, size,
  				 false, i < nbufs - 1);
  		if (ret)
  			goto unwind;

+		WARN_ONCE(ctx->nents <= 0, "%u bytes -> %d entries\n",
+			  size, ctx->nents);
+
  		ret = rdma_rw_ctx_init(&ctx->rw, ch->qp,
                                 ch->sport->port,
  				ctx->sg, ctx->nents, 0, remote_addr,
                                 rkey, dir);
  		if (ret < 0) {
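The before/after checks in this debug patch can be mimicked in userspace to show why two once-only warning sites localize the problem. WARN_ON_ONCE_SKETCH, warn_count and check_counts() are hypothetical; unlike the kernel's WARN_ON_ONCE(), the macro below prints no backtrace and is a statement rather than an expression that yields the condition.

```c
#include <stdio.h>

static int warn_count; /* total warnings emitted across all sites */

/* Hypothetical userspace stand-in for WARN_ON_ONCE(): each expansion gets
 * its own static flag, so a site reports only the first time cond holds. */
#define WARN_ON_ONCE_SKETCH(cond)                                       \
    do {                                                                \
        static int warned_;                                             \
        if ((cond) && !warned_) {                                       \
            warned_ = 1;                                                \
            warn_count++;                                               \
            fprintf(stderr, "WARNING: %s:%d: %s\n", __FILE__, __LINE__, \
                    #cond);                                             \
        }                                                               \
    } while (0)

/* Hypothetical instrumented step: sg_cnt is the count passed in, and
 * mapped_nents is what a (simulated) mapping call left in the table. */
static void check_counts(int sg_cnt, int mapped_nents)
{
    WARN_ON_ONCE_SKETCH(!sg_cnt);       /* site 1: input count */
    WARN_ON_ONCE_SKETCH(!mapped_nents); /* site 2: count after mapping */
}
```

Calling check_counts(3, 0) fires only the second site, which matches the pattern reported above: a nonzero sg_cnt going in and zero entries after mapping. Repeating the call stays silent because that site has already warned.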


* Re: Kernel warning at drivers/infiniband/core/rw.c:349
  2021-10-13  5:34   ` Bart Van Assche
@ 2021-10-13 16:15     ` Logan Gunthorpe
  2021-10-13 16:20       ` Jason Gunthorpe
  2021-10-13 16:38       ` Bart Van Assche
  0 siblings, 2 replies; 6+ messages in thread
From: Logan Gunthorpe @ 2021-10-13 16:15 UTC
  To: Bart Van Assche; +Cc: linux-rdma, Jason Gunthorpe



On 2021-10-12 11:34 p.m., Bart Van Assche wrote:
> On 10/12/21 17:30, Logan Gunthorpe wrote:
>> As best I can tell from the code, someone is passing an sg_cnt of
>> zero. Previously that would have returned -ENOMEM, but now the zero
>> count may slip through, in which case the code would hit that WARNING
>> and return -EIO.
> 
> That is not what is happening. The debug patch shown below taught me
> the following:
> * The sg_cnt argument of rdma_rw_ctx_init() is not zero.
> * After the rdma_rw_map_sgtable() call, sgt.nents is zero.
> 
> The debug patch that I used is as follows:

Ah, hmm. Perhaps it's this: the virt path in
ib_dma_map_sgtable_attrs() doesn't set sgt->nents.

Maybe try something like the patch below.

Thanks,

Logan

--


diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 4b50d9a3018a..4ba642fc8a19 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4097,8 +4097,13 @@ static inline int ib_dma_map_sgtable_attrs(struct ib_dev>
                                           enum dma_data_direction direction,
                                           unsigned long dma_attrs)
 {
+       int nents;
+
        if (ib_uses_virt_dma(dev)) {
-               ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
+               nents = ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
+               if (!nents)
+                       return -EIO;
+               sgt->nents = nents;
                return 0;
        }
        return dma_map_sgtable(dev->dma_device, sgt, direction, dma_attrs);
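For context on why the missing assignment matters: struct sg_table keeps orig_nents (the entries the caller built) separate from nents (the DMA-mapped entries), and dma_map_sgtable() fills in nents, whereas the virt path above discarded the count. The toy below starts from a table with only orig_nents set, so nents begins at zero. toy_sg_table, toy_virt_map_sg() and the two toy_map_sgtable_*() functions are hypothetical, and the 1:1 software "mapping" is an assumption of the sketch.

```c
#include <errno.h>

/* Hypothetical stand-in for struct sg_table: only the counters matter. */
struct toy_sg_table {
    unsigned int nents;      /* entries after DMA mapping */
    unsigned int orig_nents; /* entries as built by the caller */
};

/* Stand-in for the virt-DMA map helper; assumed to map 1:1 and report
 * orig_nents mapped entries. */
static int toy_virt_map_sg(unsigned int orig_nents)
{
    return (int)orig_nents;
}

/* Before the fix: the returned count is dropped, so nents stays 0. */
static int toy_map_sgtable_buggy(struct toy_sg_table *sgt)
{
    toy_virt_map_sg(sgt->orig_nents);
    return 0;
}

/* After the fix: propagate the count, and fail if nothing was mapped. */
static int toy_map_sgtable_fixed(struct toy_sg_table *sgt)
{
    int nents = toy_virt_map_sg(sgt->orig_nents);

    if (!nents)
        return -EIO;
    sgt->nents = nents;
    return 0;
}
```

In the buggy variant the call "succeeds" but a consumer reading nents sees zero entries, which is the state Bart's debug patch caught; the fixed variant leaves nents equal to orig_nents.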


* Re: Kernel warning at drivers/infiniband/core/rw.c:349
  2021-10-13 16:15     ` Logan Gunthorpe
@ 2021-10-13 16:20       ` Jason Gunthorpe
  2021-10-13 16:38       ` Bart Van Assche
  1 sibling, 0 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2021-10-13 16:20 UTC
  To: Logan Gunthorpe; +Cc: Bart Van Assche, linux-rdma

On Wed, Oct 13, 2021 at 10:15:59AM -0600, Logan Gunthorpe wrote:
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 4b50d9a3018a..4ba642fc8a19 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -4097,8 +4097,13 @@ static inline int ib_dma_map_sgtable_attrs(struct ib_dev>
>                                            enum dma_data_direction direction,
>                                            unsigned long dma_attrs)
>  {
> +       int nents;
> +
>         if (ib_uses_virt_dma(dev)) {
> -               ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
> +               nents = ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
> +               if (!nents)
> +                       return -EIO;
> +               sgt->nents = nents;
>                 return 0;
>         }

Oh yes, that definitely looks needed.

Jason


* Re: Kernel warning at drivers/infiniband/core/rw.c:349
  2021-10-13 16:15     ` Logan Gunthorpe
  2021-10-13 16:20       ` Jason Gunthorpe
@ 2021-10-13 16:38       ` Bart Van Assche
  1 sibling, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2021-10-13 16:38 UTC
  To: Logan Gunthorpe; +Cc: linux-rdma, Jason Gunthorpe

On 10/13/21 9:15 AM, Logan Gunthorpe wrote:
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 4b50d9a3018a..4ba642fc8a19 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -4097,8 +4097,13 @@ static inline int ib_dma_map_sgtable_attrs(struct ib_dev>
>                                             enum dma_data_direction direction,
>                                             unsigned long dma_attrs)
>   {
> +       int nents;
> +
>          if (ib_uses_virt_dma(dev)) {
> -               ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
> +               nents = ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
> +               if (!nents)
> +                       return -EIO;
> +               sgt->nents = nents;
>                  return 0;
>          }
>          return dma_map_sgtable(dev->dma_device, sgt, direction, dma_attrs);

Thanks!

Tested-by: Bart Van Assche <bvanassche@acm.org>


Thread overview: 6+ messages
2021-10-13  0:07 Kernel warning at drivers/infiniband/core/rw.c:349 Bart Van Assche
2021-10-13  0:30 ` Logan Gunthorpe
2021-10-13  5:34   ` Bart Van Assche
2021-10-13 16:15     ` Logan Gunthorpe
2021-10-13 16:20       ` Jason Gunthorpe
2021-10-13 16:38       ` Bart Van Assche
