* [PATCH for-next] RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event
@ 2022-07-14 1:30 Cheng Xu
2022-07-18 11:21 ` Leon Romanovsky
From: Cheng Xu @ 2022-07-14 1:30 UTC
To: jgg, leon, BMT; +Cc: linux-rdma, chengyou
If siw_recv_mpa_rr() returns -EAGAIN, the MPA reply has not been
received completely yet, and IW_CM_EVENT_CONNECT_REPLY must not be
reported in this case. Reporting it anyway leads to a duplicate event
once the rest of the reply arrives, which may trigger a kernel BUG in
iw_cm. A simple way to trigger this:
server: ib_send_lat
client: ib_send_lat -R <server_ip>
The call trace looks like this:
kernel BUG at drivers/infiniband/core/iwcm.c:894!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<...>
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
<TASK>
cm_work_handler+0x1dd/0x370 [iw_cm]
process_one_work+0x1e2/0x3b0
worker_thread+0x49/0x2e0
? rescuer_thread+0x370/0x370
kthread+0xe5/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
---
drivers/infiniband/sw/siw/siw_cm.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
index 17f34d584cd9..f88d2971c2c6 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -725,11 +725,11 @@ static int siw_proc_mpareply(struct siw_cep *cep)
 	enum mpa_v2_ctrl mpa_p2p_mode = MPA_V2_RDMA_NO_RTR;
 
 	rv = siw_recv_mpa_rr(cep);
-	if (rv != -EAGAIN)
-		siw_cancel_mpatimer(cep);
 	if (rv)
 		goto out_err;
 
+	siw_cancel_mpatimer(cep);
+
 	rep = &cep->mpa.hdr;
 
 	if (__mpa_rr_revision(rep->params.bits) > MPA_REVISION_2) {
@@ -895,7 +895,8 @@ static int siw_proc_mpareply(struct siw_cep *cep)
 	}
 
 out_err:
-	siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
+	if (rv != -EAGAIN)
+		siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
 
 	return rv;
 }
--
2.37.0
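The failure described in the commit message can be reproduced with a small user-space model. This is a sketch under stated assumptions, not siw or iw_cm code: `model_iwcm`, `model_recv_mpa_rr()`, `model_proc_old()`, `model_proc_new()` and `model_two_segments()` are hypothetical names. The model assumes, as the commit message describes, that -EAGAIN keeps the endpoint alive so the reply handler runs again once the rest of the MPA reply arrives, and it replaces the kernel BUG in iw_cm with a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the iw_cm side: after a connect request it
 * waits in a "connect sent" state and expects exactly one
 * CONNECT_REPLY event. */
struct model_iwcm {
	bool conn_sent;	/* still waiting for CONNECT_REPLY */
	int replies;	/* CONNECT_REPLY events delivered */
	int bug_hits;	/* events in the wrong state; stands in for the BUG in the trace */
};

static void model_connect_reply(struct model_iwcm *cm)
{
	cm->replies++;
	if (!cm->conn_sent)
		cm->bug_hits++;
	cm->conn_sent = false;
}

/* Hypothetical stand-in for siw_recv_mpa_rr(): accumulates received
 * bytes and returns -EAGAIN until the whole reply (need bytes) is there. */
static int model_recv_mpa_rr(int *have, int chunk, int need)
{
	*have += chunk;
	return *have < need ? -EAGAIN : 0;
}

/* Error path before the patch: CONNECT_REPLY is reported even for -EAGAIN. */
static int model_proc_old(struct model_iwcm *cm, int *have, int chunk, int need)
{
	int rv = model_recv_mpa_rr(have, chunk, need);

	if (rv)
		goto out_err;
	model_connect_reply(cm);	/* success, status 0 */
	return 0;
out_err:
	model_connect_reply(cm);	/* failure, status -EINVAL */
	return rv;
}

/* Error path after the patch: -EAGAIN only means "wait for more data". */
static int model_proc_new(struct model_iwcm *cm, int *have, int chunk, int need)
{
	int rv = model_recv_mpa_rr(have, chunk, need);

	if (rv)
		goto out_err;
	model_connect_reply(cm);
	return 0;
out_err:
	if (rv != -EAGAIN)
		model_connect_reply(cm);
	return rv;
}

typedef int (*reply_proc_fn)(struct model_iwcm *, int *, int, int);

/* Deliver a need-byte reply as two segments; on -EAGAIN the handler
 * runs again when the remaining bytes arrive. */
static struct model_iwcm model_two_segments(reply_proc_fn proc, int first, int need)
{
	struct model_iwcm cm = { .conn_sent = true };
	int have = 0;

	if (proc(&cm, &have, first, need) == -EAGAIN)
		proc(&cm, &have, need - first, need);
	return cm;
}
```

Feeding a 20-byte reply as 8 and then 12 bytes (both sizes arbitrary) yields two CONNECT_REPLY events and one wrong-state hit with the old error path, and exactly one event with the patched one. A reply that arrives in one piece is handled the same either way.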
* Re: [PATCH for-next] RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event
2022-07-14 1:30 [PATCH for-next] RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event Cheng Xu
@ 2022-07-18 11:21 ` Leon Romanovsky
From: Leon Romanovsky @ 2022-07-18 11:21 UTC
To: Cheng Xu; +Cc: jgg, BMT, linux-rdma
On Thu, Jul 14, 2022 at 09:30:47AM +0800, Cheng Xu wrote:
> If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't
> been received completely, and should not report IW_CM_EVENT_CONNECT_REPLY
> in this case. This may trigger a call trace in iw_cm. A simple way to
> trigger this:
> server: ib_send_lat
> client: ib_send_lat -R <server_ip>
>
> The call trace looks like this:
>
> kernel BUG at drivers/infiniband/core/iwcm.c:894!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> <...>
> Workqueue: iw_cm_wq cm_work_handler [iw_cm]
> Call Trace:
> <TASK>
> cm_work_handler+0x1dd/0x370 [iw_cm]
> process_one_work+0x1e2/0x3b0
> worker_thread+0x49/0x2e0
> ? rescuer_thread+0x370/0x370
> kthread+0xe5/0x110
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x1f/0x30
> </TASK>
>
> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
> ---
> drivers/infiniband/sw/siw/siw_cm.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
Thanks, applied.
* RE: [PATCH for-next] RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event
2022-07-14 13:20 ` Cheng Xu
@ 2022-07-14 13:58 ` Bernard Metzler
From: Bernard Metzler @ 2022-07-14 13:58 UTC
To: Cheng Xu, jgg, leon; +Cc: linux-rdma
> -----Original Message-----
> From: Cheng Xu <chengyou@linux.alibaba.com>
> Sent: Thursday, 14 July 2022 15:20
> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org
> Cc: linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] Re: [PATCH for-next] RDMA/siw: Fix duplicated reported
> IW_CM_EVENT_CONNECT_REPLY event
>
>
>
> On 7/14/22 8:59 PM, Bernard Metzler wrote:
> >> -----Original Message-----
> >> From: Cheng Xu <chengyou@linux.alibaba.com>
> >> Sent: Thursday, 14 July 2022 03:31
> >> To: jgg@ziepe.ca; leon@kernel.org; Bernard Metzler <BMT@zurich.ibm.com>
> >> Cc: linux-rdma@vger.kernel.org; chengyou@linux.alibaba.com
> >> Subject: [EXTERNAL] [PATCH for-next] RDMA/siw: Fix duplicated reported
> >> IW_CM_EVENT_CONNECT_REPLY event
> >>
> >> If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't
> >> been received completely, and should not report
> IW_CM_EVENT_CONNECT_REPLY
> >> in this case. This may trigger a call trace in iw_cm. A simple way to
> >> trigger this:
> >
> > Great, thanks! I obviously did never hit an incomplete
> > MPA hdr. Please make another change to fix it correctly,
> > as suggested below.
> >
> >
> > case of an incomplete
> >> server: ib_send_lat
> >> client: ib_send_lat -R <server_ip>
> >>
> >> The call trace looks like this:
> >>
> >> kernel BUG at drivers/infiniband/core/iwcm.c:894!
> >> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> >> <...>
> >> Workqueue: iw_cm_wq cm_work_handler [iw_cm]
> >> Call Trace:
> >> <TASK>
> >> cm_work_handler+0x1dd/0x370 [iw_cm]
> >> process_one_work+0x1e2/0x3b0
> >> worker_thread+0x49/0x2e0
> >> ? rescuer_thread+0x370/0x370
> >> kthread+0xe5/0x110
> >> ? kthread_complete_and_exit+0x20/0x20
> >> ret_from_fork+0x1f/0x30
> >> </TASK>
> >>
> >> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
> >> ---
> >> drivers/infiniband/sw/siw/siw_cm.c | 7 ++++---
> >> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/sw/siw/siw_cm.c
> >> b/drivers/infiniband/sw/siw/siw_cm.c
> >> index 17f34d584cd9..f88d2971c2c6 100644
> >> --- a/drivers/infiniband/sw/siw/siw_cm.c
> >> +++ b/drivers/infiniband/sw/siw/siw_cm.c
> >> @@ -725,11 +725,11 @@ static int siw_proc_mpareply(struct siw_cep *cep)
> >> enum mpa_v2_ctrl mpa_p2p_mode = MPA_V2_RDMA_NO_RTR;
> >>
> >> rv = siw_recv_mpa_rr(cep);
> >> - if (rv != -EAGAIN)
> >> - siw_cancel_mpatimer(cep);
> >> if (rv)
> >> goto out_err;
> >>
> >> + siw_cancel_mpatimer(cep);
> >> +
> >
> > Cancel the MPA timer only if we have a
> > real error. -EAGAIN translates to just
> > further waiting. So best to add the timer
> > cancellation to the error bailout section.
> >
> >> rep = &cep->mpa.hdr;
> >>
> >> if (__mpa_rr_revision(rep->params.bits) > MPA_REVISION_2) {
> >> @@ -895,7 +895,8 @@ static int siw_proc_mpareply(struct siw_cep *cep)
> >> }
> >>
> >> out_err:
> >> - siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
> >> + if (rv != -EAGAIN)
> > {
> > cancel MPA timer here.
>
> Indeed we do not need it here, because when siw_proc_mpareply returns error
> but not -EAGAIN, the release_cep will be set in the caller
> (siw_cm_work_handler),
> and siw_cancel_mpatimer will be called in the error handle flow.
>
> I think this is better, because the error handle is more unified.
Yes, sorry, your original patch is correct.
>
> How do you think?
>
> Thanks,
> Cheng Xu
>
>
> > siw_cancel_mpatimer(cep);
> >> + siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
> > }
> >>
> >> return rv;
> >> }
> >> --
> >> 2.37.0
> >
Thank you!
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
* Re: [PATCH for-next] RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event
2022-07-14 12:59 Bernard Metzler
@ 2022-07-14 13:20 ` Cheng Xu
2022-07-14 13:58 ` Bernard Metzler
From: Cheng Xu @ 2022-07-14 13:20 UTC
To: Bernard Metzler, jgg, leon; +Cc: linux-rdma
On 7/14/22 8:59 PM, Bernard Metzler wrote:
>> -----Original Message-----
>> From: Cheng Xu <chengyou@linux.alibaba.com>
>> Sent: Thursday, 14 July 2022 03:31
>> To: jgg@ziepe.ca; leon@kernel.org; Bernard Metzler <BMT@zurich.ibm.com>
>> Cc: linux-rdma@vger.kernel.org; chengyou@linux.alibaba.com
>> Subject: [EXTERNAL] [PATCH for-next] RDMA/siw: Fix duplicated reported
>> IW_CM_EVENT_CONNECT_REPLY event
>>
>> If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't
>> been received completely, and should not report IW_CM_EVENT_CONNECT_REPLY
>> in this case. This may trigger a call trace in iw_cm. A simple way to
>> trigger this:
>
> Great, thanks! I obviously did never hit an incomplete
> MPA hdr. Please make another change to fix it correctly,
> as suggested below.
>
>
> case of an incomplete
>> server: ib_send_lat
>> client: ib_send_lat -R <server_ip>
>>
>> The call trace looks like this:
>>
>> kernel BUG at drivers/infiniband/core/iwcm.c:894!
>> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>> <...>
>> Workqueue: iw_cm_wq cm_work_handler [iw_cm]
>> Call Trace:
>> <TASK>
>> cm_work_handler+0x1dd/0x370 [iw_cm]
>> process_one_work+0x1e2/0x3b0
>> worker_thread+0x49/0x2e0
>> ? rescuer_thread+0x370/0x370
>> kthread+0xe5/0x110
>> ? kthread_complete_and_exit+0x20/0x20
>> ret_from_fork+0x1f/0x30
>> </TASK>
>>
>> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
>> ---
>> drivers/infiniband/sw/siw/siw_cm.c | 7 ++++---
>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c
>> b/drivers/infiniband/sw/siw/siw_cm.c
>> index 17f34d584cd9..f88d2971c2c6 100644
>> --- a/drivers/infiniband/sw/siw/siw_cm.c
>> +++ b/drivers/infiniband/sw/siw/siw_cm.c
>> @@ -725,11 +725,11 @@ static int siw_proc_mpareply(struct siw_cep *cep)
>> enum mpa_v2_ctrl mpa_p2p_mode = MPA_V2_RDMA_NO_RTR;
>>
>> rv = siw_recv_mpa_rr(cep);
>> - if (rv != -EAGAIN)
>> - siw_cancel_mpatimer(cep);
>> if (rv)
>> goto out_err;
>>
>> + siw_cancel_mpatimer(cep);
>> +
>
> Cancel the MPA timer only if we have a
> real error. -EAGAIN translates to just
> further waiting. So best to add the timer
> cancellation to the error bailout section.
>
>> rep = &cep->mpa.hdr;
>>
>> if (__mpa_rr_revision(rep->params.bits) > MPA_REVISION_2) {
>> @@ -895,7 +895,8 @@ static int siw_proc_mpareply(struct siw_cep *cep)
>> }
>>
>> out_err:
>> - siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
>> + if (rv != -EAGAIN)
> {
> cancel MPA timer here.
Indeed, we do not need it here: when siw_proc_mpareply() returns an error
other than -EAGAIN, the caller (siw_cm_work_handler) sets release_cep,
and siw_cancel_mpatimer() is then called in its error handling path.
I think this is better, because the error handling stays in one place.
What do you think?
Thanks,
Cheng Xu
> siw_cancel_mpatimer(cep);
>> + siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
> }
>>
>> return rv;
>> }
>> --
>> 2.37.0
>
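Cheng Xu's argument about where the MPA timer should be cancelled can be checked with another small user-space model. It is a sketch under assumptions, not siw code: `model_cep`, `model_proc_applied()`, `model_proc_suggested()` and `model_work_handler()` are hypothetical, only the outcome of siw_recv_mpa_rr() is modelled (passed in as `recv_rv`), and the caller follows the description in the reply: an error other than -EAGAIN sets release_cep, and the release path cancels the timer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical endpoint model: only the MPA timer state is tracked. */
struct model_cep {
	bool mpa_timer_armed;
	int timer_cancels;	/* number of cancel calls, redundant ones included */
	bool released;
};

static void model_cancel_mpatimer(struct model_cep *cep)
{
	cep->mpa_timer_armed = false;
	cep->timer_cancels++;
}

/* Shape of the applied patch: the reply handler cancels the timer only
 * after the receive step succeeded; out_err does no timer handling. */
static int model_proc_applied(struct model_cep *cep, int recv_rv)
{
	if (recv_rv)
		return recv_rv;
	model_cancel_mpatimer(cep);
	return 0;	/* negotiation and CONNECT_REPLY upcall not modelled */
}

/* Variant first suggested in review: additionally cancel the timer in
 * out_err for real errors (anything other than -EAGAIN). */
static int model_proc_suggested(struct model_cep *cep, int recv_rv)
{
	if (recv_rv) {
		if (recv_rv != -EAGAIN)
			model_cancel_mpatimer(cep);
		return recv_rv;
	}
	model_cancel_mpatimer(cep);
	return 0;
}

typedef int (*model_proc_fn)(struct model_cep *, int);

/* Caller side as described for siw_cm_work_handler(): an error other
 * than -EAGAIN sets release_cep, and the release path cancels the timer. */
static void model_work_handler(struct model_cep *cep, model_proc_fn proc,
			       int recv_rv)
{
	int rv = proc(cep, recv_rv);
	bool release_cep = rv != 0 && rv != -EAGAIN;

	if (release_cep) {
		model_cancel_mpatimer(cep);
		cep->released = true;
	}
}
```

With the applied shape, -EAGAIN leaves the timer armed, a complete reply cancels it once in the reply handler, and a real error from the receive step cancels it once in the caller's release path. Adding the cancellation to out_err as well makes that error path cancel the timer twice, which is the redundancy pointed out above.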
* RE: [PATCH for-next] RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event
@ 2022-07-14 12:59 Bernard Metzler
2022-07-14 13:20 ` Cheng Xu
From: Bernard Metzler @ 2022-07-14 12:59 UTC
To: Cheng Xu, jgg, leon; +Cc: linux-rdma
> -----Original Message-----
> From: Cheng Xu <chengyou@linux.alibaba.com>
> Sent: Thursday, 14 July 2022 03:31
> To: jgg@ziepe.ca; leon@kernel.org; Bernard Metzler <BMT@zurich.ibm.com>
> Cc: linux-rdma@vger.kernel.org; chengyou@linux.alibaba.com
> Subject: [EXTERNAL] [PATCH for-next] RDMA/siw: Fix duplicated reported
> IW_CM_EVENT_CONNECT_REPLY event
>
> If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't
> been received completely, and should not report IW_CM_EVENT_CONNECT_REPLY
> in this case. This may trigger a call trace in iw_cm. A simple way to
> trigger this:
Great, thanks! I obviously never hit an incomplete
MPA hdr. Please make one more change to fix it correctly,
as suggested below.
> server: ib_send_lat
> client: ib_send_lat -R <server_ip>
>
> The call trace looks like this:
>
> kernel BUG at drivers/infiniband/core/iwcm.c:894!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> <...>
> Workqueue: iw_cm_wq cm_work_handler [iw_cm]
> Call Trace:
> <TASK>
> cm_work_handler+0x1dd/0x370 [iw_cm]
> process_one_work+0x1e2/0x3b0
> worker_thread+0x49/0x2e0
> ? rescuer_thread+0x370/0x370
> kthread+0xe5/0x110
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x1f/0x30
> </TASK>
>
> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
> ---
> drivers/infiniband/sw/siw/siw_cm.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/sw/siw/siw_cm.c
> b/drivers/infiniband/sw/siw/siw_cm.c
> index 17f34d584cd9..f88d2971c2c6 100644
> --- a/drivers/infiniband/sw/siw/siw_cm.c
> +++ b/drivers/infiniband/sw/siw/siw_cm.c
> @@ -725,11 +725,11 @@ static int siw_proc_mpareply(struct siw_cep *cep)
> enum mpa_v2_ctrl mpa_p2p_mode = MPA_V2_RDMA_NO_RTR;
>
> rv = siw_recv_mpa_rr(cep);
> - if (rv != -EAGAIN)
> - siw_cancel_mpatimer(cep);
> if (rv)
> goto out_err;
>
> + siw_cancel_mpatimer(cep);
> +
Cancel the MPA timer only if we have a
real error; -EAGAIN just means we keep
waiting. So it is best to add the timer
cancellation to the error bailout section.
> rep = &cep->mpa.hdr;
>
> if (__mpa_rr_revision(rep->params.bits) > MPA_REVISION_2) {
> @@ -895,7 +895,8 @@ static int siw_proc_mpareply(struct siw_cep *cep)
> }
>
> out_err:
> - siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
> + if (rv != -EAGAIN)
{
cancel MPA timer here.
siw_cancel_mpatimer(cep);
> + siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
}
>
> return rv;
> }
> --
> 2.37.0