All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Revert "RDMA/rxe: Remove unnecessary mr testing"
@ 2022-12-02 11:01 Daisuke Matsuda
  2022-12-02 11:45 ` Zhu Yanjun
  2022-12-07 23:43 ` Jason Gunthorpe
  0 siblings, 2 replies; 8+ messages in thread
From: Daisuke Matsuda @ 2022-12-02 11:01 UTC (permalink / raw)
  To: linux-rdma, leonro, jgg, zyjzyj2000
  Cc: lizhijian, rpearsonhpe, Daisuke Matsuda

The commit 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing") causes
a kernel crash. If responder get a zero-byte RDMA Read request, qp->resp.mr
is not set in check_rkey(). The mr is NULL in this case, and a NULL pointer
dereference occurs as shown below.

[  139.607580] BUG: kernel NULL pointer dereference, address: 0000000000000010
[  139.609169] #PF: supervisor write access in kernel mode
[  139.610314] #PF: error_code(0x0002) - not-present page
[  139.611434] PGD 0 P4D 0
[  139.612031] Oops: 0002 [#1] PREEMPT SMP PTI
[  139.612975] CPU: 2 PID: 3622 Comm: python3 Kdump: loaded Not tainted 6.1.0-rc3+ #34
[  139.614465] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[  139.616142] RIP: 0010:__rxe_put+0xc/0x60 [rdma_rxe]
[  139.617065] Code: cc cc cc 31 f6 e8 64 36 1b d3 41 b8 01 00 00 00 44 89 c0 c3 cc cc cc cc 41 89 c0 eb c1 90 0f 1f 44 00 00 41 54 b8 ff ff ff ff <f0> 0f c1 47 10 83 f8 01 74 11 45 31 e4 85 c0 7e 20 44 89 e0 41 5c
[  139.620451] RSP: 0018:ffffb27bc012ce78 EFLAGS: 00010246
[  139.621413] RAX: 00000000ffffffff RBX: ffff9790857b0580 RCX: 0000000000000000
[  139.622718] RDX: ffff979080fe145a RSI: 000055560e3e0000 RDI: 0000000000000000
[  139.624025] RBP: ffff97909c7dd800 R08: 0000000000000001 R09: e7ce43d97f7bed0f
[  139.625328] R10: ffff97908b29c300 R11: 0000000000000000 R12: 0000000000000000
[  139.626632] R13: 0000000000000000 R14: ffff97908b29c300 R15: 0000000000000000
[  139.627941] FS:  00007f276f7bd740(0000) GS:ffff9792b5c80000(0000) knlGS:0000000000000000
[  139.629418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  139.630480] CR2: 0000000000000010 CR3: 0000000114230002 CR4: 0000000000060ee0
[  139.631805] Call Trace:
[  139.632288]  <IRQ>
[  139.632688]  read_reply+0xda/0x310 [rdma_rxe]
[  139.633515]  rxe_responder+0x82d/0xe50 [rdma_rxe]
[  139.634398]  do_task+0x84/0x170 [rdma_rxe]
[  139.635187]  tasklet_action_common.constprop.0+0xa7/0x120
[  139.636189]  __do_softirq+0xcb/0x2ac
[  139.636877]  do_softirq+0x63/0x90
[  139.637505]  </IRQ>

Link: https://lore.kernel.org/lkml/1666582315-2-1-git-send-email-lizhijian@fujitsu.com/
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
NOTE:
 I think the commit 686d348476ee is not yet merged to Torvalds' tree.
 Perhaps we may just remove the patch from the for-next tree.
 I leave that to the maintainers as I am not familiar with patch reversion.

 drivers/infiniband/sw/rxe/rxe_resp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 6761bcd1d4d8..5d3a4c6f81a3 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -832,7 +832,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 
 	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
 			  payload, RXE_FROM_MR_OBJ);
-	rxe_put(mr);
+	if (mr)
+		rxe_put(mr);
 	if (err) {
 		kfree_skb(skb);
 		return RESPST_ERR_RKEY_VIOLATION;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] Revert "RDMA/rxe: Remove unnecessary mr testing"
  2022-12-02 11:01 [PATCH] Revert "RDMA/rxe: Remove unnecessary mr testing" Daisuke Matsuda
@ 2022-12-02 11:45 ` Zhu Yanjun
  2022-12-02 14:35   ` lizhijian
  2022-12-07 23:43 ` Jason Gunthorpe
  1 sibling, 1 reply; 8+ messages in thread
From: Zhu Yanjun @ 2022-12-02 11:45 UTC (permalink / raw)
  To: Daisuke Matsuda; +Cc: linux-rdma, leonro, jgg, lizhijian, rpearsonhpe

On Fri, Dec 2, 2022 at 7:02 PM Daisuke Matsuda
<matsuda-daisuke@fujitsu.com> wrote:
>
> The commit 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing") causes
> a kernel crash. If responder get a zero-byte RDMA Read request, qp->resp.mr
> is not set in check_rkey(). The mr is NULL in this case, and a NULL pointer
> dereference occurs as shown below.
>
> [  139.607580] BUG: kernel NULL pointer dereference, address: 0000000000000010
> [  139.609169] #PF: supervisor write access in kernel mode
> [  139.610314] #PF: error_code(0x0002) - not-present page
> [  139.611434] PGD 0 P4D 0
> [  139.612031] Oops: 0002 [#1] PREEMPT SMP PTI
> [  139.612975] CPU: 2 PID: 3622 Comm: python3 Kdump: loaded Not tainted 6.1.0-rc3+ #34
> [  139.614465] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> [  139.616142] RIP: 0010:__rxe_put+0xc/0x60 [rdma_rxe]
> [  139.617065] Code: cc cc cc 31 f6 e8 64 36 1b d3 41 b8 01 00 00 00 44 89 c0 c3 cc cc cc cc 41 89 c0 eb c1 90 0f 1f 44 00 00 41 54 b8 ff ff ff ff <f0> 0f c1 47 10 83 f8 01 74 11 45 31 e4 85 c0 7e 20 44 89 e0 41 5c
> [  139.620451] RSP: 0018:ffffb27bc012ce78 EFLAGS: 00010246
> [  139.621413] RAX: 00000000ffffffff RBX: ffff9790857b0580 RCX: 0000000000000000
> [  139.622718] RDX: ffff979080fe145a RSI: 000055560e3e0000 RDI: 0000000000000000
> [  139.624025] RBP: ffff97909c7dd800 R08: 0000000000000001 R09: e7ce43d97f7bed0f
> [  139.625328] R10: ffff97908b29c300 R11: 0000000000000000 R12: 0000000000000000
> [  139.626632] R13: 0000000000000000 R14: ffff97908b29c300 R15: 0000000000000000
> [  139.627941] FS:  00007f276f7bd740(0000) GS:ffff9792b5c80000(0000) knlGS:0000000000000000
> [  139.629418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  139.630480] CR2: 0000000000000010 CR3: 0000000114230002 CR4: 0000000000060ee0
> [  139.631805] Call Trace:
> [  139.632288]  <IRQ>
> [  139.632688]  read_reply+0xda/0x310 [rdma_rxe]
> [  139.633515]  rxe_responder+0x82d/0xe50 [rdma_rxe]
> [  139.634398]  do_task+0x84/0x170 [rdma_rxe]
> [  139.635187]  tasklet_action_common.constprop.0+0xa7/0x120
> [  139.636189]  __do_softirq+0xcb/0x2ac
> [  139.636877]  do_softirq+0x63/0x90
> [  139.637505]  </IRQ>
>
> Link: https://lore.kernel.org/lkml/1666582315-2-1-git-send-email-lizhijian@fujitsu.com/
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
> NOTE:
>  I think the commit 686d348476ee is not yet merged to Torvalds' tree.
>  Perhaps we may just remove the patch from the for-next tree.
>  I leave that to the maintainers as I am not familiar with patch reversion.

Sure. If this is for for-next, it had better add "[for-netx PATCH]
Revert "RDMA/rxe: Remove unnecessary mr testing""

Thanks.
Zhu Yanjun

>
>  drivers/infiniband/sw/rxe/rxe_resp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index 6761bcd1d4d8..5d3a4c6f81a3 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -832,7 +832,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>
>         err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>                           payload, RXE_FROM_MR_OBJ);
> -       rxe_put(mr);
> +       if (mr)
> +               rxe_put(mr);
>         if (err) {
>                 kfree_skb(skb);
>                 return RESPST_ERR_RKEY_VIOLATION;
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Revert "RDMA/rxe: Remove unnecessary mr testing"
  2022-12-02 11:45 ` Zhu Yanjun
@ 2022-12-02 14:35   ` lizhijian
  2022-12-02 14:43     ` Jason Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: lizhijian @ 2022-12-02 14:35 UTC (permalink / raw)
  To: Zhu Yanjun, Daisuke Matsuda (Fujitsu)
  Cc: linux-rdma, leonro, jgg, rpearsonhpe



on 12/2/2022 7:45 PM, Zhu Yanjun wrote:
> On Fri, Dec 2, 2022 at 7:02 PM Daisuke Matsuda
> <matsuda-daisuke@fujitsu.com> wrote:
>>
>> The commit 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing") causes
>> a kernel crash. If responder get a zero-byte RDMA Read request, qp->resp.mr
>> is not set in check_rkey(). The mr is NULL in this case, and a NULL pointer
>> dereference occurs as shown below.
>>
>> [  139.607580] BUG: kernel NULL pointer dereference, address: 0000000000000010
>> [  139.609169] #PF: supervisor write access in kernel mode
>> [  139.610314] #PF: error_code(0x0002) - not-present page
>> [  139.611434] PGD 0 P4D 0
>> [  139.612031] Oops: 0002 [#1] PREEMPT SMP PTI
>> [  139.612975] CPU: 2 PID: 3622 Comm: python3 Kdump: loaded Not tainted 6.1.0-rc3+ #34
>> [  139.614465] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
>> [  139.616142] RIP: 0010:__rxe_put+0xc/0x60 [rdma_rxe]
>> [  139.617065] Code: cc cc cc 31 f6 e8 64 36 1b d3 41 b8 01 00 00 00 44 89 c0 c3 cc cc cc cc 41 89 c0 eb c1 90 0f 1f 44 00 00 41 54 b8 ff ff ff ff <f0> 0f c1 47 10 83 f8 01 74 11 45 31 e4 85 c0 7e 20 44 89 e0 41 5c
>> [  139.620451] RSP: 0018:ffffb27bc012ce78 EFLAGS: 00010246
>> [  139.621413] RAX: 00000000ffffffff RBX: ffff9790857b0580 RCX: 0000000000000000
>> [  139.622718] RDX: ffff979080fe145a RSI: 000055560e3e0000 RDI: 0000000000000000
>> [  139.624025] RBP: ffff97909c7dd800 R08: 0000000000000001 R09: e7ce43d97f7bed0f
>> [  139.625328] R10: ffff97908b29c300 R11: 0000000000000000 R12: 0000000000000000
>> [  139.626632] R13: 0000000000000000 R14: ffff97908b29c300 R15: 0000000000000000
>> [  139.627941] FS:  00007f276f7bd740(0000) GS:ffff9792b5c80000(0000) knlGS:0000000000000000
>> [  139.629418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  139.630480] CR2: 0000000000000010 CR3: 0000000114230002 CR4: 0000000000060ee0
>> [  139.631805] Call Trace:
>> [  139.632288]  <IRQ>
>> [  139.632688]  read_reply+0xda/0x310 [rdma_rxe]
>> [  139.633515]  rxe_responder+0x82d/0xe50 [rdma_rxe]
>> [  139.634398]  do_task+0x84/0x170 [rdma_rxe]
>> [  139.635187]  tasklet_action_common.constprop.0+0xa7/0x120
>> [  139.636189]  __do_softirq+0xcb/0x2ac
>> [  139.636877]  do_softirq+0x63/0x90
>> [  139.637505]  </IRQ>
>>
>> Link: https://lore.kernel.org/lkml/1666582315-2-1-git-send-email-lizhijian@fujitsu.com/
>> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>

Good catch, want to know what workload you are running.
I have never got it in pyverbs tests.

Add a TODOs: add pyverbs test to cover this scenario.

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>



>> ---
>> NOTE:
>>   I think the commit 686d348476ee is not yet merged to Torvalds' tree.
>>   Perhaps we may just remove the patch from the for-next tree.
>>   I leave that to the maintainers as I am not familiar with patch reversion.
> 
> Sure. If this is for for-next, it had better add "[for-netx PATCH]
> Revert "RDMA/rxe: Remove unnecessary mr testing""
> 
> Thanks.
> Zhu Yanjun
> 
>>
>>   drivers/infiniband/sw/rxe/rxe_resp.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>> index 6761bcd1d4d8..5d3a4c6f81a3 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>> @@ -832,7 +832,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>
>>          err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>>                            payload, RXE_FROM_MR_OBJ);
>> -       rxe_put(mr);
>> +       if (mr)
>> +               rxe_put(mr);
>>          if (err) {
>>                  kfree_skb(skb);
>>                  return RESPST_ERR_RKEY_VIOLATION;
>> --
>> 2.31.1
>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Revert "RDMA/rxe: Remove unnecessary mr testing"
  2022-12-02 14:35   ` lizhijian
@ 2022-12-02 14:43     ` Jason Gunthorpe
  2022-12-05  5:19       ` Daisuke Matsuda (Fujitsu)
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2022-12-02 14:43 UTC (permalink / raw)
  To: lizhijian
  Cc: Zhu Yanjun, Daisuke Matsuda (Fujitsu), linux-rdma, leonro, rpearsonhpe

On Fri, Dec 02, 2022 at 02:35:01PM +0000, lizhijian@fujitsu.com wrote:
> 
> 
> on 12/2/2022 7:45 PM, Zhu Yanjun wrote:
> > On Fri, Dec 2, 2022 at 7:02 PM Daisuke Matsuda
> > <matsuda-daisuke@fujitsu.com> wrote:
> >>
> >> The commit 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing") causes
> >> a kernel crash. If responder get a zero-byte RDMA Read request, qp->resp.mr
> >> is not set in check_rkey(). The mr is NULL in this case, and a NULL pointer
> >> dereference occurs as shown below.
> >>
> >> [  139.607580] BUG: kernel NULL pointer dereference, address: 0000000000000010
> >> [  139.609169] #PF: supervisor write access in kernel mode
> >> [  139.610314] #PF: error_code(0x0002) - not-present page
> >> [  139.611434] PGD 0 P4D 0
> >> [  139.612031] Oops: 0002 [#1] PREEMPT SMP PTI
> >> [  139.612975] CPU: 2 PID: 3622 Comm: python3 Kdump: loaded Not tainted 6.1.0-rc3+ #34
> >> [  139.614465] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> >> [  139.616142] RIP: 0010:__rxe_put+0xc/0x60 [rdma_rxe]
> >> [  139.617065] Code: cc cc cc 31 f6 e8 64 36 1b d3 41 b8 01 00 00 00 44 89 c0 c3 cc cc cc cc 41 89 c0 eb c1 90 0f 1f 44 00 00 41 54 b8 ff ff ff ff <f0> 0f c1 47 10 83 f8 01 74 11 45 31 e4 85 c0 7e 20 44 89 e0 41 5c
> >> [  139.620451] RSP: 0018:ffffb27bc012ce78 EFLAGS: 00010246
> >> [  139.621413] RAX: 00000000ffffffff RBX: ffff9790857b0580 RCX: 0000000000000000
> >> [  139.622718] RDX: ffff979080fe145a RSI: 000055560e3e0000 RDI: 0000000000000000
> >> [  139.624025] RBP: ffff97909c7dd800 R08: 0000000000000001 R09: e7ce43d97f7bed0f
> >> [  139.625328] R10: ffff97908b29c300 R11: 0000000000000000 R12: 0000000000000000
> >> [  139.626632] R13: 0000000000000000 R14: ffff97908b29c300 R15: 0000000000000000
> >> [  139.627941] FS:  00007f276f7bd740(0000) GS:ffff9792b5c80000(0000) knlGS:0000000000000000
> >> [  139.629418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [  139.630480] CR2: 0000000000000010 CR3: 0000000114230002 CR4: 0000000000060ee0
> >> [  139.631805] Call Trace:
> >> [  139.632288]  <IRQ>
> >> [  139.632688]  read_reply+0xda/0x310 [rdma_rxe]
> >> [  139.633515]  rxe_responder+0x82d/0xe50 [rdma_rxe]
> >> [  139.634398]  do_task+0x84/0x170 [rdma_rxe]
> >> [  139.635187]  tasklet_action_common.constprop.0+0xa7/0x120
> >> [  139.636189]  __do_softirq+0xcb/0x2ac
> >> [  139.636877]  do_softirq+0x63/0x90
> >> [  139.637505]  </IRQ>
> >>
> >> Link: https://lore.kernel.org/lkml/1666582315-2-1-git-send-email-lizhijian@fujitsu.com/
> >> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> 
> Good catch, want to know what workload you are running.
> I have never got it in pyverbs tests.
> 
> Add a TODOs: add pyverbs test to cover this scenario.

Yes please

Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] Revert "RDMA/rxe: Remove unnecessary mr testing"
  2022-12-02 14:43     ` Jason Gunthorpe
@ 2022-12-05  5:19       ` Daisuke Matsuda (Fujitsu)
  0 siblings, 0 replies; 8+ messages in thread
From: Daisuke Matsuda (Fujitsu) @ 2022-12-05  5:19 UTC (permalink / raw)
  To: 'Jason Gunthorpe', lizhijian
  Cc: Zhu Yanjun, linux-rdma, leonro, rpearsonhpe

On Fri, Dec 2, 2022 11:43 PM Jason Gunthorpe wrote:
> On Fri, Dec 02, 2022 at 02:35:01PM +0000, lizhijian@fujitsu.com wrote:
> >
> >
> > on 12/2/2022 7:45 PM, Zhu Yanjun wrote:
> > > On Fri, Dec 2, 2022 at 7:02 PM Daisuke Matsuda
> > > <matsuda-daisuke@fujitsu.com> wrote:
> > >>
> > >> The commit 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing") causes
> > >> a kernel crash. If responder get a zero-byte RDMA Read request, qp->resp.mr
> > >> is not set in check_rkey(). The mr is NULL in this case, and a NULL pointer
> > >> dereference occurs as shown below.
> > >>
> > >> [  139.607580] BUG: kernel NULL pointer dereference, address: 0000000000000010
> > >> [  139.609169] #PF: supervisor write access in kernel mode
> > >> [  139.610314] #PF: error_code(0x0002) - not-present page
> > >> [  139.611434] PGD 0 P4D 0
> > >> [  139.612031] Oops: 0002 [#1] PREEMPT SMP PTI
> > >> [  139.612975] CPU: 2 PID: 3622 Comm: python3 Kdump: loaded Not tainted 6.1.0-rc3+ #34
> > >> [  139.614465] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> > >> [  139.616142] RIP: 0010:__rxe_put+0xc/0x60 [rdma_rxe]
> > >> [  139.617065] Code: cc cc cc 31 f6 e8 64 36 1b d3 41 b8 01 00 00 00 44 89 c0 c3 cc cc cc cc 41 89 c0 eb c1 90 0f 1f
> 44 00 00 41 54 b8 ff ff ff ff <f0> 0f c1 47 10 83 f8 01 74 11 45 31 e4 85 c0 7e 20 44 89 e0 41 5c
> > >> [  139.620451] RSP: 0018:ffffb27bc012ce78 EFLAGS: 00010246
> > >> [  139.621413] RAX: 00000000ffffffff RBX: ffff9790857b0580 RCX: 0000000000000000
> > >> [  139.622718] RDX: ffff979080fe145a RSI: 000055560e3e0000 RDI: 0000000000000000
> > >> [  139.624025] RBP: ffff97909c7dd800 R08: 0000000000000001 R09: e7ce43d97f7bed0f
> > >> [  139.625328] R10: ffff97908b29c300 R11: 0000000000000000 R12: 0000000000000000
> > >> [  139.626632] R13: 0000000000000000 R14: ffff97908b29c300 R15: 0000000000000000
> > >> [  139.627941] FS:  00007f276f7bd740(0000) GS:ffff9792b5c80000(0000) knlGS:0000000000000000
> > >> [  139.629418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [  139.630480] CR2: 0000000000000010 CR3: 0000000114230002 CR4: 0000000000060ee0
> > >> [  139.631805] Call Trace:
> > >> [  139.632288]  <IRQ>
> > >> [  139.632688]  read_reply+0xda/0x310 [rdma_rxe]
> > >> [  139.633515]  rxe_responder+0x82d/0xe50 [rdma_rxe]
> > >> [  139.634398]  do_task+0x84/0x170 [rdma_rxe]
> > >> [  139.635187]  tasklet_action_common.constprop.0+0xa7/0x120
> > >> [  139.636189]  __do_softirq+0xcb/0x2ac
> > >> [  139.636877]  do_softirq+0x63/0x90
> > >> [  139.637505]  </IRQ>
> > >>
> > >> Link: https://lore.kernel.org/lkml/1666582315-2-1-git-send-email-lizhijian@fujitsu.com/
> > >> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> >
> > Good catch, want to know what workload you are running.
> > I have never got it in pyverbs tests.

I found the issue when running my personal testcase for test_odp.py.

> >
> > Add a TODOs: add pyverbs test to cover this scenario.

Zhijian thankfully did it two days ago, but we should also have the RDMA Write counterpart.
Future changes may trigger the similar problem in write_data_in(), so I posted it.
cf. https://github.com/linux-rdma/rdma-core/pull/1269

Daisuke

> 
> Yes please
> 
> Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Revert "RDMA/rxe: Remove unnecessary mr testing"
  2022-12-02 11:01 [PATCH] Revert "RDMA/rxe: Remove unnecessary mr testing" Daisuke Matsuda
  2022-12-02 11:45 ` Zhu Yanjun
@ 2022-12-07 23:43 ` Jason Gunthorpe
  2022-12-08  6:08   ` Daisuke Matsuda (Fujitsu)
  1 sibling, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2022-12-07 23:43 UTC (permalink / raw)
  To: Daisuke Matsuda; +Cc: linux-rdma, leonro, zyjzyj2000, lizhijian, rpearsonhpe

On Fri, Dec 02, 2022 at 08:01:57PM +0900, Daisuke Matsuda wrote:
> The commit 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing") causes
> a kernel crash. If responder get a zero-byte RDMA Read request, qp->resp.mr
> is not set in check_rkey(). The mr is NULL in this case, and a NULL pointer
> dereference occurs as shown below.

I don't think this is right.

What justification is there for not validating the rkey in check_rkey
just because the length is 0?

IBA 9.3.3.2 says:

  A responder that supports RDMA and / or ATOMIC Operations shall verify
  the R_Key, the associated access rights, and the specified virtual ad-
  dress. The responder must also perform bounds checking (i.e. verify that
  the length of the data being referenced does not cross the associated
  memory start and end addresses). Any violation must result in the packet
  being discarded and for reliable services, the generation of a NAK.

Which I do not think allows this behavior.

If check_rkey validates the rkey then this function can assume it is
not NULL in all cases, like I think it is supposed to.

Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] Revert "RDMA/rxe: Remove unnecessary mr testing"
  2022-12-07 23:43 ` Jason Gunthorpe
@ 2022-12-08  6:08   ` Daisuke Matsuda (Fujitsu)
  2022-12-08 12:23     ` Jason Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Daisuke Matsuda (Fujitsu) @ 2022-12-08  6:08 UTC (permalink / raw)
  To: 'Jason Gunthorpe'
  Cc: linux-rdma, leonro, zyjzyj2000, lizhijian, rpearsonhpe

On Thu, Dec 8, 2022 8:44 AM Jason Gunthorpe wrote:
> 
> On Fri, Dec 02, 2022 at 08:01:57PM +0900, Daisuke Matsuda wrote:
> > The commit 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing") causes
> > a kernel crash. If responder get a zero-byte RDMA Read request, qp->resp.mr
> > is not set in check_rkey(). The mr is NULL in this case, and a NULL pointer
> > dereference occurs as shown below.
> 
> I don't think this is right.
> 
> What justification is there for not validating the rkey in check_rkey
> just because the length is 0?

I referred to IB Specification Vol 1-Release-1.5-2021-08-06b.
The behaviour of responder on receiving a packet is described in "9.7.4.1".
The current implementation of check_rkey() is justified by "9.7.4.1.5 C9-88".

> 
> IBA 9.3.3.2 says:
> 
>  <...>

The document is proprietary. I think it is safer not to quote the contents,
so I do not show what "9.7.4.1.5 C9-88" says here.
Sorry for bothering you, but please check the description by yourself.

Thanks,
Daisuke

> 
> Which I do not think allows this behavior.
> 
> If check_rkey validates the rkey then this function can assume it is
> not NULL in all cases, like I think it is supposed to.
> 
> Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Revert "RDMA/rxe: Remove unnecessary mr testing"
  2022-12-08  6:08   ` Daisuke Matsuda (Fujitsu)
@ 2022-12-08 12:23     ` Jason Gunthorpe
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2022-12-08 12:23 UTC (permalink / raw)
  To: Daisuke Matsuda (Fujitsu)
  Cc: linux-rdma, leonro, zyjzyj2000, lizhijian, rpearsonhpe

On Thu, Dec 08, 2022 at 06:08:30AM +0000, Daisuke Matsuda (Fujitsu) wrote:
> On Thu, Dec 8, 2022 8:44 AM Jason Gunthorpe wrote:
> > 
> > On Fri, Dec 02, 2022 at 08:01:57PM +0900, Daisuke Matsuda wrote:
> > > The commit 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing") causes
> > > a kernel crash. If responder get a zero-byte RDMA Read request, qp->resp.mr
> > > is not set in check_rkey(). The mr is NULL in this case, and a NULL pointer
> > > dereference occurs as shown below.
> > 
> > I don't think this is right.
> > 
> > What justification is there for not validating the rkey in check_rkey
> > just because the length is 0?
> 
> I referred to IB Specification Vol 1-Release-1.5-2021-08-06b.
> The behaviour of responder on receiving a packet is described in "9.7.4.1".
> The current implementation of check_rkey() is justified by "9.7.4.1.5 C9-88".
> 
> > 
> > IBA 9.3.3.2 says:
> > 
> >  <...>
> 
> The document is proprietary. I think it is safer not to quote the contents,
> so I do not show what "9.7.4.1.5 C9-88" says here.
> Sorry for bothering you, but please check the description by
> yourself.

Well, that seems clear enough. Let's reference C9-88 in this patch as
well

Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-12-08 12:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-02 11:01 [PATCH] Revert "RDMA/rxe: Remove unnecessary mr testing" Daisuke Matsuda
2022-12-02 11:45 ` Zhu Yanjun
2022-12-02 14:35   ` lizhijian
2022-12-02 14:43     ` Jason Gunthorpe
2022-12-05  5:19       ` Daisuke Matsuda (Fujitsu)
2022-12-07 23:43 ` Jason Gunthorpe
2022-12-08  6:08   ` Daisuke Matsuda (Fujitsu)
2022-12-08 12:23     ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.