All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] RDMA/rxe: Ratelimit error messages of read_reply()
@ 2022-08-25 11:02 Daisuke Matsuda
  2022-08-25 15:55 ` Bob Pearson
  2022-08-26 12:28 ` Jason Gunthorpe
  0 siblings, 2 replies; 13+ messages in thread
From: Daisuke Matsuda @ 2022-08-25 11:02 UTC (permalink / raw)
  To: leonro, jgg, zyjzyj2000; +Cc: linux-rdma, Daisuke Matsuda

When responder cannot copy data from a user MR, error messages overflow.
This is because an incoming RDMA Read request can results in multiple Read
responses. If the target MR is somehow unavailable, then the error message
is generated for every Read response.

For the same reason, the error message for packet transmission should also
be ratelimited.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index b36ec5c4d5e0..f9e9679b5e32 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -812,7 +812,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
 			  payload, RXE_FROM_MR_OBJ);
 	if (err)
-		pr_err("Failed copying memory\n");
+		pr_err_ratelimited("Failed copying memory\n");
 	if (mr)
 		rxe_put(mr);
 
@@ -824,7 +824,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 
 	err = rxe_xmit_packet(qp, &ack_pkt, skb);
 	if (err) {
-		pr_err("Failed sending RDMA reply.\n");
+		pr_err_ratelimited("Failed sending RDMA reply.\n");
 		return RESPST_ERR_RNR;
 	}
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] RDMA/rxe: Ratelimit error messages of read_reply()
  2022-08-25 11:02 [PATCH] RDMA/rxe: Ratelimit error messages of read_reply() Daisuke Matsuda
@ 2022-08-25 15:55 ` Bob Pearson
  2022-08-26 12:28 ` Jason Gunthorpe
  1 sibling, 0 replies; 13+ messages in thread
From: Bob Pearson @ 2022-08-25 15:55 UTC (permalink / raw)
  To: Daisuke Matsuda, leonro, jgg, zyjzyj2000; +Cc: linux-rdma

On 8/25/22 06:02, Daisuke Matsuda wrote:
> When responder cannot copy data from a user MR, error messages overflow.
> This is because an incoming RDMA Read request can results in multiple Read
> responses. If the target MR is somehow unavailable, then the error message
> is generated for every Read response.
> 
> For the same reason, the error message for packet transmission should also
> be ratelimited.
> 
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_resp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index b36ec5c4d5e0..f9e9679b5e32 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -812,7 +812,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>  	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>  			  payload, RXE_FROM_MR_OBJ);
>  	if (err)
> -		pr_err("Failed copying memory\n");
> +		pr_err_ratelimited("Failed copying memory\n");
>  	if (mr)
>  		rxe_put(mr);
>  
> @@ -824,7 +824,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>  
>  	err = rxe_xmit_packet(qp, &ack_pkt, skb);
>  	if (err) {
> -		pr_err("Failed sending RDMA reply.\n");
> +		pr_err_ratelimited("Failed sending RDMA reply.\n");
>  		return RESPST_ERR_RNR;
>  	}
>  

Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RDMA/rxe: Ratelimit error messages of read_reply()
  2022-08-25 11:02 [PATCH] RDMA/rxe: Ratelimit error messages of read_reply() Daisuke Matsuda
  2022-08-25 15:55 ` Bob Pearson
@ 2022-08-26 12:28 ` Jason Gunthorpe
  2022-08-29  5:16   ` matsuda-daisuke
  2022-08-29  5:44   ` [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests Daisuke Matsuda
  1 sibling, 2 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-08-26 12:28 UTC (permalink / raw)
  To: Daisuke Matsuda; +Cc: leonro, zyjzyj2000, linux-rdma

On Thu, Aug 25, 2022 at 08:02:55PM +0900, Daisuke Matsuda wrote:
> When responder cannot copy data from a user MR, error messages overflow.
> This is because an incoming RDMA Read request can results in multiple Read
> responses. If the target MR is somehow unavailable, then the error message
> is generated for every Read response.
> 
> For the same reason, the error message for packet transmission should also
> be ratelimited.
> 
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_resp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

These lines should be deleted, network packts should never trigger
printing.

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RDMA/rxe: Ratelimit error messages of read_reply()
  2022-08-26 12:28 ` Jason Gunthorpe
@ 2022-08-29  5:16   ` matsuda-daisuke
  2022-08-29  5:44   ` [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests Daisuke Matsuda
  1 sibling, 0 replies; 13+ messages in thread
From: matsuda-daisuke @ 2022-08-29  5:16 UTC (permalink / raw)
  To: 'Jason Gunthorpe'; +Cc: zyjzyj2000, linux-rdma

On Friday, August 26, 2022 9:28 PM, Jason Gunthorpe wrote:
> On Thu, Aug 25, 2022 at 08:02:55PM +0900, Daisuke Matsuda wrote:
> > When responder cannot copy data from a user MR, error messages overflow.
> > This is because an incoming RDMA Read request can results in multiple
> Read
> > responses. If the target MR is somehow unavailable, then the error message
> > is generated for every Read response.
> >
> > For the same reason, the error message for packet transmission should also
> > be ratelimited.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > ---
> >  drivers/infiniband/sw/rxe/rxe_resp.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> These lines should be deleted, network packts should never trigger
> printing.
> 
> Jason

Okay. I will post another patch to do that.

I wonder if we should also delete some messages in rxe_rcv() and its callees.
It seems some of them can be triggered by packets from an arbitrary client
even when there is no established connection between the requesting and
responding nodes.
As far as I know, the message below can cause a message overflow.
=====
static int hdr_check(struct rxe_pkt_info *pkt)
{
~~~~~
        if (qpn != IB_MULTICAST_QPN) {
                index = (qpn == 1) ? port->qp_gsi_index : qpn;

                qp = rxe_pool_get_index(&rxe->qp_pool, index);
                if (unlikely(!qp)) {
                        pr_warn_ratelimited("no qp matches qpn 0x%x\n", qpn);
                        goto err1;
                }
=====

Daisuke Matsuda

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
  2022-08-26 12:28 ` Jason Gunthorpe
  2022-08-29  5:16   ` matsuda-daisuke
@ 2022-08-29  5:44   ` Daisuke Matsuda
  2022-08-29  5:51     ` Cheng Xu
                       ` (2 more replies)
  1 sibling, 3 replies; 13+ messages in thread
From: Daisuke Matsuda @ 2022-08-29  5:44 UTC (permalink / raw)
  To: jgg; +Cc: linux-rdma, matsuda-daisuke, zyjzyj2000

An incoming Read request causes multiple Read responses. If a user MR to
copy data from is unavailable or responder cannot send a reply, then the
error messages can be printed for each response attempt, resulting in
message overflow.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index b36ec5c4d5e0..4b3e8aec2fb8 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 
 	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
 			  payload, RXE_FROM_MR_OBJ);
-	if (err)
-		pr_err("Failed copying memory\n");
 	if (mr)
 		rxe_put(mr);
 
@@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	}
 
 	err = rxe_xmit_packet(qp, &ack_pkt, skb);
-	if (err) {
-		pr_err("Failed sending RDMA reply.\n");
+	if (err)
 		return RESPST_ERR_RNR;
-	}
 
 	res->read.va += payload;
 	res->read.resid -= payload;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
  2022-08-29  5:44   ` [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests Daisuke Matsuda
@ 2022-08-29  5:51     ` Cheng Xu
  2022-08-29  6:31     ` Zhu Yanjun
  2022-08-29  7:36     ` Li Zhijian
  2 siblings, 0 replies; 13+ messages in thread
From: Cheng Xu @ 2022-08-29  5:51 UTC (permalink / raw)
  To: Daisuke Matsuda, jgg; +Cc: linux-rdma, zyjzyj2000



On 8/29/22 1:44 PM, Daisuke Matsuda wrote:
> An incoming Read request causes multiple Read responses. If a user MR to
> copy data from is unavailable or responder cannot send a reply, then the
> error messages can be printed for each response attempt, resulting in
> message overflow.
> 
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index b36ec5c4d5e0..4b3e8aec2fb8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>  
>  	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>  			  payload, RXE_FROM_MR_OBJ);
> -	if (err)
> -		pr_err("Failed copying memory\n");

The err is set but not used. Below is better:

-  	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
-  			  payload, RXE_FROM_MR_OBJ);
+	rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
+		    payload, RXE_FROM_MR_OBJ);

Thanks,
Cheng Xu

>  	if (mr)
>  		rxe_put(mr);
>  
> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>  	}
>  
>  	err = rxe_xmit_packet(qp, &ack_pkt, skb);
> -	if (err) {
> -		pr_err("Failed sending RDMA reply.\n");
> +	if (err)
>  		return RESPST_ERR_RNR;
> -	}
>  
>  	res->read.va += payload;
>  	res->read.resid -= payload;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
  2022-08-29  5:44   ` [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests Daisuke Matsuda
  2022-08-29  5:51     ` Cheng Xu
@ 2022-08-29  6:31     ` Zhu Yanjun
  2022-08-29  7:02       ` Leon Romanovsky
  2022-08-29  7:36     ` Li Zhijian
  2 siblings, 1 reply; 13+ messages in thread
From: Zhu Yanjun @ 2022-08-29  6:31 UTC (permalink / raw)
  To: Daisuke Matsuda; +Cc: Jason Gunthorpe, RDMA mailing list

On Mon, Aug 29, 2022 at 1:45 PM Daisuke Matsuda
<matsuda-daisuke@fujitsu.com> wrote:
>
> An incoming Read request causes multiple Read responses. If a user MR to
> copy data from is unavailable or responder cannot send a reply, then the
> error messages can be printed for each response attempt, resulting in
> message overflow.
>
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index b36ec5c4d5e0..4b3e8aec2fb8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>
>         err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>                           payload, RXE_FROM_MR_OBJ);
> -       if (err)
> -               pr_err("Failed copying memory\n");

pr_err_once is better?

>         if (mr)
>                 rxe_put(mr);
>
> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>         }
>
>         err = rxe_xmit_packet(qp, &ack_pkt, skb);
> -       if (err) {
> -               pr_err("Failed sending RDMA reply.\n");

The same with the above

Zhu Yanjun

> +       if (err)
>                 return RESPST_ERR_RNR;
> -       }
>
>         res->read.va += payload;
>         res->read.resid -= payload;
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
  2022-08-29  6:31     ` Zhu Yanjun
@ 2022-08-29  7:02       ` Leon Romanovsky
  2022-08-29  7:13         ` Zhu Yanjun
  0 siblings, 1 reply; 13+ messages in thread
From: Leon Romanovsky @ 2022-08-29  7:02 UTC (permalink / raw)
  To: Zhu Yanjun; +Cc: Daisuke Matsuda, Jason Gunthorpe, RDMA mailing list

On Mon, Aug 29, 2022 at 02:31:00PM +0800, Zhu Yanjun wrote:
> On Mon, Aug 29, 2022 at 1:45 PM Daisuke Matsuda
> <matsuda-daisuke@fujitsu.com> wrote:
> >
> > An incoming Read request causes multiple Read responses. If a user MR to
> > copy data from is unavailable or responder cannot send a reply, then the
> > error messages can be printed for each response attempt, resulting in
> > message overflow.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > ---
> >  drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> > index b36ec5c4d5e0..4b3e8aec2fb8 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> >
> >         err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> >                           payload, RXE_FROM_MR_OBJ);
> > -       if (err)
> > -               pr_err("Failed copying memory\n");
> 
> pr_err_once is better?

Like Jason said, there shouldn't any prints in network triggered flows.

Thanks

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
  2022-08-29  7:02       ` Leon Romanovsky
@ 2022-08-29  7:13         ` Zhu Yanjun
  0 siblings, 0 replies; 13+ messages in thread
From: Zhu Yanjun @ 2022-08-29  7:13 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Daisuke Matsuda, Jason Gunthorpe, RDMA mailing list

On Mon, Aug 29, 2022 at 3:02 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Aug 29, 2022 at 02:31:00PM +0800, Zhu Yanjun wrote:
> > On Mon, Aug 29, 2022 at 1:45 PM Daisuke Matsuda
> > <matsuda-daisuke@fujitsu.com> wrote:
> > >
> > > An incoming Read request causes multiple Read responses. If a user MR to
> > > copy data from is unavailable or responder cannot send a reply, then the
> > > error messages can be printed for each response attempt, resulting in
> > > message overflow.
> > >
> > > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > > ---
> > >  drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
> > >  1 file changed, 1 insertion(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> > > index b36ec5c4d5e0..4b3e8aec2fb8 100644
> > > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> > >
> > >         err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> > >                           payload, RXE_FROM_MR_OBJ);
> > > -       if (err)
> > > -               pr_err("Failed copying memory\n");
> >
> > pr_err_once is better?
>
> Like Jason said, there shouldn't any prints in network triggered flows.
>

Ok, thanks.

Zhu Yanjun
> Thanks

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
  2022-08-29  5:44   ` [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests Daisuke Matsuda
  2022-08-29  5:51     ` Cheng Xu
  2022-08-29  6:31     ` Zhu Yanjun
@ 2022-08-29  7:36     ` Li Zhijian
  2022-08-29 10:21       ` matsuda-daisuke
  2 siblings, 1 reply; 13+ messages in thread
From: Li Zhijian @ 2022-08-29  7:36 UTC (permalink / raw)
  To: Daisuke Matsuda, jgg; +Cc: linux-rdma, zyjzyj2000



On 29/08/2022 13:44, Daisuke Matsuda wrote:
> An incoming Read request causes multiple Read responses. If a user MR to
> copy data from is unavailable or responder cannot send a reply, then the
> error messages can be printed for each response attempt, resulting in
> message overflow.
>
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
>   1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index b36ec5c4d5e0..4b3e8aec2fb8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>   
>   	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>   			  payload, RXE_FROM_MR_OBJ);
> -	if (err)
> -		pr_err("Failed copying memory\n");
Not relate to this patch.
I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
IMO, when err happens, responder shall notify the request anyhow.

Thanks
Zhijian

>   	if (mr)
>   		rxe_put(mr);
>   
> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>   	}
>   
>   	err = rxe_xmit_packet(qp, &ack_pkt, skb);
> -	if (err) {
> -		pr_err("Failed sending RDMA reply.\n");
> +	if (err)
>   		return RESPST_ERR_RNR;
> -	}
>   
>   	res->read.va += payload;
>   	res->read.resid -= payload;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
  2022-08-29  7:36     ` Li Zhijian
@ 2022-08-29 10:21       ` matsuda-daisuke
  2022-08-30  9:44         ` Li Zhijian
  0 siblings, 1 reply; 13+ messages in thread
From: matsuda-daisuke @ 2022-08-29 10:21 UTC (permalink / raw)
  To: lizhijian, jgg; +Cc: linux-rdma, zyjzyj2000

On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote:
> On 29/08/2022 13:44, Daisuke Matsuda wrote:
> > An incoming Read request causes multiple Read responses. If a user MR to
> > copy data from is unavailable or responder cannot send a reply, then the
> > error messages can be printed for each response attempt, resulting in
> > message overflow.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > ---
> >   drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
> >   1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> > index b36ec5c4d5e0..4b3e8aec2fb8 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> >
> >   	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> >   			  payload, RXE_FROM_MR_OBJ);
> > -	if (err)
> > -		pr_err("Failed copying memory\n");
> Not relate to this patch.
> I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
> IMO, when err happens, responder shall notify the request anyhow.

Practically, I have never seen rxe_mr_copy() failed before,
but I agree the implementation may be incorrect as you mentioned.

As far as I tested, responder replied with the requested amount of payloads
even when rxe_mr_copy() is modified to fail. In this case, 
requester may mistakenly believe that they get data correctly.

For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334).

Daisuke Matsuda

> 
> Thanks
> Zhijian
> 
> >   	if (mr)
> >   		rxe_put(mr);
> >
> > @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> >   	}
> >
> >   	err = rxe_xmit_packet(qp, &ack_pkt, skb);
> > -	if (err) {
> > -		pr_err("Failed sending RDMA reply.\n");
> > +	if (err)
> >   		return RESPST_ERR_RNR;
> > -	}
> >
> >   	res->read.va += payload;
> >   	res->read.resid -= payload;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
  2022-08-29 10:21       ` matsuda-daisuke
@ 2022-08-30  9:44         ` Li Zhijian
  2022-08-30  9:54           ` Li Zhijian
  0 siblings, 1 reply; 13+ messages in thread
From: Li Zhijian @ 2022-08-30  9:44 UTC (permalink / raw)
  To: Matsuda, Daisuke/松田 大輔, jgg
  Cc: linux-rdma, zyjzyj2000



On 29/08/2022 18:21, Matsuda, Daisuke/松田 大輔 wrote:
> On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote:
>> On 29/08/2022 13:44, Daisuke Matsuda wrote:
>>> An incoming Read request causes multiple Read responses. If a user MR to
>>> copy data from is unavailable or responder cannot send a reply, then the
>>> error messages can be printed for each response attempt, resulting in
>>> message overflow.
>>>
>>> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
>>> ---
>>>    drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
>>>    1 file changed, 1 insertion(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>>> index b36ec5c4d5e0..4b3e8aec2fb8 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>>> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>
>>>    	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>>>    			  payload, RXE_FROM_MR_OBJ);
>>> -	if (err)
>>> -		pr_err("Failed copying memory\n");
>> Not relate to this patch.
>> I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
>> IMO, when err happens, responder shall notify the request anyhow.
> Practically, I have never seen rxe_mr_copy() failed before,
> but I agree the implementation may be incorrect as you mentioned.
>
> As far as I tested, responder replied with the requested amount of payloads
> even when rxe_mr_copy() is modified to fail. In this case,
> requester may mistakenly believe that they get data correctly.
>
> For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334).

it seems it's suitable to reply NAK code "REMOTE ACCESS ERROR" to the requester side
by returning RESPST_ERR_RKEY_VIOLATION here.

see "9.7.5.2.4 REMOTE ACCESS ERROR" and "9.7.4.1.5 RESPONDER R_KEY VALIDATION"



>
> Daisuke Matsuda
>
>> Thanks
>> Zhijian
>>
>>>    	if (mr)
>>>    		rxe_put(mr);
>>>
>>> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>    	}
>>>
>>>    	err = rxe_xmit_packet(qp, &ack_pkt, skb);
>>> -	if (err) {
>>> -		pr_err("Failed sending RDMA reply.\n");
>>> +	if (err)
>>>    		return RESPST_ERR_RNR;
>>> -	}
>>>
>>>    	res->read.va += payload;
>>>    	res->read.resid -= payload;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests
  2022-08-30  9:44         ` Li Zhijian
@ 2022-08-30  9:54           ` Li Zhijian
  0 siblings, 0 replies; 13+ messages in thread
From: Li Zhijian @ 2022-08-30  9:54 UTC (permalink / raw)
  To: Matsuda, Daisuke/松田 大輔, jgg
  Cc: linux-rdma, zyjzyj2000



On 30/08/2022 17:44, Li Zhijian wrote:
>
>
> On 29/08/2022 18:21, Matsuda, Daisuke/松田 大輔 wrote:
>> On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote:
>>> On 29/08/2022 13:44, Daisuke Matsuda wrote:
>>>> An incoming Read request causes multiple Read responses. If a user MR to
>>>> copy data from is unavailable or responder cannot send a reply, then the
>>>> error messages can be printed for each response attempt, resulting in
>>>> message overflow.
>>>>
>>>> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
>>>> ---
>>>>    drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
>>>>    1 file changed, 1 insertion(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>>>> index b36ec5c4d5e0..4b3e8aec2fb8 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>>>> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>>
>>>>        err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>>>>                  payload, RXE_FROM_MR_OBJ);

Looks i was missing the the faulting point inside rxe_mr_copy()

mr_check_range() is the only point to return an error inside rxe_mr_copy()
and mr_check_range() would never fail in this moment, since it is always tested by RESPST_CHK_RKEY
before calling read_reply()

so it's safe to remove this print, and add some comments ?

Thanks
Zhijian



>>>> -    if (err)
>>>> -        pr_err("Failed copying memory\n");
>>> Not relate to this patch.
>>> I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
>>> IMO, when err happens, responder shall notify the request anyhow.
>> Practically, I have never seen rxe_mr_copy() failed before,
>> but I agree the implementation may be incorrect as you mentioned.
>>
>> As far as I tested, responder replied with the requested amount of payloads
>> even when rxe_mr_copy() is modified to fail. In this case,
>> requester may mistakenly believe that they get data correctly.
>>
>> For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334).
>
> it seems it's suitable to reply NAK code "REMOTE ACCESS ERROR" to the requester side
> by returning RESPST_ERR_RKEY_VIOLATION here.
>
> see "9.7.5.2.4 REMOTE ACCESS ERROR" and "9.7.4.1.5 RESPONDER R_KEY VALIDATION"
>
>
>
>>
>> Daisuke Matsuda
>>
>>> Thanks
>>> Zhijian
>>>
>>>>        if (mr)
>>>>            rxe_put(mr);
>>>>
>>>> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>>        }
>>>>
>>>>        err = rxe_xmit_packet(qp, &ack_pkt, skb);
>>>> -    if (err) {
>>>> -        pr_err("Failed sending RDMA reply.\n");
>>>> +    if (err)
>>>>            return RESPST_ERR_RNR;
>>>> -    }
>>>>
>>>>        res->read.va += payload;
>>>>        res->read.resid -= payload;
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-08-30  9:55 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-25 11:02 [PATCH] RDMA/rxe: Ratelimit error messages of read_reply() Daisuke Matsuda
2022-08-25 15:55 ` Bob Pearson
2022-08-26 12:28 ` Jason Gunthorpe
2022-08-29  5:16   ` matsuda-daisuke
2022-08-29  5:44   ` [PATCH] RDMA/rxe: Delete error messages triggered by incoming Read requests Daisuke Matsuda
2022-08-29  5:51     ` Cheng Xu
2022-08-29  6:31     ` Zhu Yanjun
2022-08-29  7:02       ` Leon Romanovsky
2022-08-29  7:13         ` Zhu Yanjun
2022-08-29  7:36     ` Li Zhijian
2022-08-29 10:21       ` matsuda-daisuke
2022-08-30  9:44         ` Li Zhijian
2022-08-30  9:54           ` Li Zhijian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.