linux-kernel.vger.kernel.org archive mirror
* [PATCH  1/1] net/mlx5: add dynamic logging for mlx5_dump_err_cqe
@ 2022-10-12 23:52 Aru Kolappan
  2022-10-13 10:43 ` Leon Romanovsky
  0 siblings, 1 reply; 6+ messages in thread
From: Aru Kolappan @ 2022-10-12 23:52 UTC (permalink / raw)
  To: leon, jgg, saeedm, linux-rdma, linux-kernel, netdev
  Cc: manjunath.b.patil, rama.nichanamatlu, aru.kolappan

From: Arumugam Kolappan <aru.kolappan@oracle.com>

Presently, the mlx5 driver dumps the error CQE by default for a few
syndromes. Some syndromes are expected due to application behavior (e.g.
REMOTE_ACCESS_ERR when an rkey is revoked before the RDMA operation
completes). There is no option to disable the log if the application
wants to do so. This patch converts the log into a dynamic debug print,
which is disabled by default. Users can enable or disable this logging
at runtime if needed.

Suggested-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Signed-off-by: Arumugam Kolappan <aru.kolappan@oracle.com>
---
 drivers/infiniband/hw/mlx5/cq.c | 2 +-
 include/linux/mlx5/cq.h         | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index be189e0..890cdc3 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -269,7 +269,7 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
 
 static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)
 {
-	mlx5_ib_warn(dev, "dump error cqe\n");
+	mlx5_ib_dbg(dev, "dump error cqe\n");
 	mlx5_dump_err_cqe(dev->mdev, cqe);
 }
 
diff --git a/include/linux/mlx5/cq.h b/include/linux/mlx5/cq.h
index cb15308..2eae88a 100644
--- a/include/linux/mlx5/cq.h
+++ b/include/linux/mlx5/cq.h
@@ -198,8 +198,8 @@ int mlx5_core_modify_cq_moderation(struct mlx5_core_dev *dev,
 static inline void mlx5_dump_err_cqe(struct mlx5_core_dev *dev,
 				     struct mlx5_err_cqe *err_cqe)
 {
-	print_hex_dump(KERN_WARNING, "", DUMP_PREFIX_OFFSET, 16, 1, err_cqe,
-		       sizeof(*err_cqe), false);
+	print_hex_dump_debug("", DUMP_PREFIX_OFFSET, 16, 1, err_cqe,
+			     sizeof(*err_cqe), false);
 }
 int mlx5_debug_cq_add(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq);
 void mlx5_debug_cq_remove(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq);
-- 
1.8.3.1
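
A minimal user-space sketch of how the converted print could be toggled at
runtime, assuming the kernel is built with CONFIG_DYNAMIC_DEBUG and debugfs
is mounted at /sys/kernel/debug. The helper names build_ddebug_query() and
set_ddebug() are hypothetical and are not part of the patch; matching the
call site by the function name mlx5_dump_err_cqe is likewise an assumption.

```c
#include <stdio.h>

/* Dynamic debug control file, assuming debugfs is mounted at
 * /sys/kernel/debug (needs CONFIG_DYNAMIC_DEBUG and root access). */
#define DDEBUG_CONTROL "/sys/kernel/debug/dynamic_debug/control"

/*
 * Hypothetical helper: format a dynamic debug query that switches the
 * print sites inside function `func` on ("+p") or off ("-p").
 * Returns the snprintf() result (query length, or negative on error).
 */
int build_ddebug_query(char *buf, size_t len, const char *func, int enable)
{
	return snprintf(buf, len, "func %s %cp", func, enable ? '+' : '-');
}

/*
 * Hypothetical helper: write the query to the control file.
 * Returns 0 on success, -1 if the query does not fit or the file
 * cannot be opened or written.
 */
int set_ddebug(const char *func, int enable)
{
	char query[128];
	FILE *f;
	int n, rc;

	n = build_ddebug_query(query, sizeof(query), func, enable);
	if (n < 0 || (size_t)n >= sizeof(query))
		return -1;

	f = fopen(DDEBUG_CONTROL, "w");
	if (!f)
		return -1;

	rc = (fputs(query, f) == EOF) ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

Under these assumptions, set_ddebug("mlx5_dump_err_cqe", 1) run as root would
enable the hex dump and set_ddebug("mlx5_dump_err_cqe", 0) would disable it
again. The mlx5_ib_dbg() line in dump_cqe() is a separate call site and would
presumably need its own query.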



* Re: [PATCH  1/1] net/mlx5: add dynamic logging for mlx5_dump_err_cqe
  2022-10-12 23:52 [PATCH 1/1] net/mlx5: add dynamic logging for mlx5_dump_err_cqe Aru Kolappan
@ 2022-10-13 10:43 ` Leon Romanovsky
  2022-10-14 19:12   ` Aru
  0 siblings, 1 reply; 6+ messages in thread
From: Leon Romanovsky @ 2022-10-13 10:43 UTC (permalink / raw)
  To: Aru Kolappan
  Cc: jgg, saeedm, linux-rdma, linux-kernel, netdev, manjunath.b.patil,
	rama.nichanamatlu

On Wed, Oct 12, 2022 at 04:52:52PM -0700, Aru Kolappan wrote:
> From: Arumugam Kolappan <aru.kolappan@oracle.com>
> 
> Presently, the mlx5 driver dumps the error CQE by default for a few
> syndromes. Some syndromes are expected due to application behavior (e.g.
> REMOTE_ACCESS_ERR when an rkey is revoked before the RDMA operation
> completes). There is no option to disable the log if the application
> wants to do so. This patch converts the log into a dynamic debug print,
> which is disabled by default. Users can enable or disable this logging
> at runtime if needed.
> 
> Suggested-by: Manjunath Patil <manjunath.b.patil@oracle.com>
> Signed-off-by: Arumugam Kolappan <aru.kolappan@oracle.com>
> ---
>  drivers/infiniband/hw/mlx5/cq.c | 2 +-
>  include/linux/mlx5/cq.h         | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
> index be189e0..890cdc3 100644
> --- a/drivers/infiniband/hw/mlx5/cq.c
> +++ b/drivers/infiniband/hw/mlx5/cq.c
> @@ -269,7 +269,7 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
>  
>  static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)
>  {
> -	mlx5_ib_warn(dev, "dump error cqe\n");
> +	mlx5_ib_dbg(dev, "dump error cqe\n");

This should be handled in the switch/case of mlx5_handle_error_cqe()
by skipping dump_cqe() for MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR.

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index be189e0525de..2d75c3071a1e 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -306,6 +306,7 @@ static void mlx5_handle_error_cqe(struct mlx5_ib_dev *dev,
                wc->status = IB_WC_REM_INV_REQ_ERR;
                break;
        case MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR:
+               dump = 0;
                wc->status = IB_WC_REM_ACCESS_ERR;
                break;
        case MLX5_CQE_SYNDROME_REMOTE_OP_ERR:

Thanks


* Re: [PATCH 1/1] net/mlx5: add dynamic logging for mlx5_dump_err_cqe
  2022-10-13 10:43 ` Leon Romanovsky
@ 2022-10-14 19:12   ` Aru
  2022-10-18  7:47     ` Leon Romanovsky
  0 siblings, 1 reply; 6+ messages in thread
From: Aru @ 2022-10-14 19:12 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: jgg, saeedm, linux-rdma, linux-kernel, netdev, manjunath.b.patil,
	rama.nichanamatlu

Hi Leon,

Thank you for reviewing the patch.

The method you mentioned disables the dump permanently in the kernel.
We thought the vendor might have enabled it for their own use when needed.
Hence we made it dynamic, so that it can be enabled or disabled at runtime.

In a production environment especially, having the option to turn this
log on or off at runtime will be helpful.

Feel free to share your thoughts.

Thanks,
Aru


* Re: [PATCH 1/1] net/mlx5: add dynamic logging for mlx5_dump_err_cqe
  2022-10-14 19:12   ` Aru
@ 2022-10-18  7:47     ` Leon Romanovsky
  2022-10-20  8:24       ` Aru
  0 siblings, 1 reply; 6+ messages in thread
From: Leon Romanovsky @ 2022-10-18  7:47 UTC (permalink / raw)
  To: Aru
  Cc: jgg, saeedm, linux-rdma, linux-kernel, netdev, manjunath.b.patil,
	rama.nichanamatlu

On Fri, Oct 14, 2022 at 12:12:36PM -0700, Aru wrote:
> Hi Leon,
> 
> Thank you for reviewing the patch.
> 
> The method you mentioned disables the dump permanently in the kernel.
> We thought the vendor might have enabled it for their own use when needed.
> Hence we made it dynamic, so that it can be enabled or disabled at runtime.
> 
> In a production environment especially, having the option to turn this
> log on or off at runtime will be helpful.

While you are interested in turning this specific warning on and off,
your change will hide the dump for all syndromes, as it is unlikely that
anyone runs in production with debug prints enabled.

 -   mlx5_ib_warn(dev, "dump error cqe\n");
 +   mlx5_ib_dbg(dev, "dump error cqe\n");

Something like this will do the trick without affecting the others.

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 457f57b088c6..966206085eb3 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -267,10 +267,29 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
 	wc->wc_flags |= IB_WC_WITH_NETWORK_HDR_TYPE;
 }
 
-static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)
+static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe,
+		     struct ib_wc *wc, int dump)
 {
-	mlx5_ib_warn(dev, "dump error cqe\n");
-	mlx5_dump_err_cqe(dev->mdev, cqe);
+	const char *level;
+
+	if (!dump)
+		return;
+
+	mlx5_ib_warn(dev, "WC error: %d, Message: %s\n", wc->status,
+		     ib_wc_status_msg(wc->status));
+
+	if (dump == 1) {
+		mlx5_ib_warn(dev, "dump error cqe\n");
+		level = KERN_WARNING;
+	}
+
+	if (dump == 2) {
+		mlx5_ib_dbg(dev, "dump error cqe\n");
+		level = KERN_DEBUG;
+	}
+
+	print_hex_dump(level, "", DUMP_PREFIX_OFFSET, 16, 1, cqe, sizeof(*cqe),
+		       false);
 }
 
 static void mlx5_handle_error_cqe(struct mlx5_ib_dev *dev,
@@ -300,6 +319,7 @@ static void mlx5_handle_error_cqe(struct mlx5_ib_dev *dev,
 		wc->status = IB_WC_BAD_RESP_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_LOCAL_ACCESS_ERR:
+		dump = 2;
 		wc->status = IB_WC_LOC_ACCESS_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR:
@@ -328,11 +348,7 @@ static void mlx5_handle_error_cqe(struct mlx5_ib_dev *dev,
 	}
 
 	wc->vendor_err = cqe->vendor_err_synd;
-	if (dump) {
-		mlx5_ib_warn(dev, "WC error: %d, Message: %s\n", wc->status,
-			     ib_wc_status_msg(wc->status));
-		dump_cqe(dev, cqe);
-	}
+	dump_cqe(dev, cqe, wc, dump);
 }
 
 static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64,

> 
> Feel free to share your thoughts.

And please don't top-post.

Thanks
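
The level selection in the diff above can be sketched in user space as
follows. This is only an illustration under stated assumptions: enum
dump_mode, dump_level() and the LVL_* strings are hypothetical stand-ins for
the int dump argument and the kernel's KERN_WARNING and KERN_DEBUG prefixes.
A switch with a default case keeps the result defined for every input,
whereas in the diff `level` is assigned only when dump is 1 or 2.

```c
#include <stddef.h>

/* Hypothetical names for the three values of `dump` in the proposal. */
enum dump_mode {
	DUMP_NONE = 0,	/* skip the dump entirely */
	DUMP_WARN = 1,	/* dump at warning level */
	DUMP_DEBUG = 2,	/* dump at debug level only */
};

/* User-space stand-ins for the kernel's KERN_WARNING and KERN_DEBUG. */
#define LVL_WARNING "warning"
#define LVL_DEBUG "debug"

/*
 * Map a dump mode to the log level used for the hex dump.
 * Returns NULL when nothing should be printed, including for any
 * value outside the enum.
 */
const char *dump_level(enum dump_mode mode)
{
	switch (mode) {
	case DUMP_WARN:
		return LVL_WARNING;
	case DUMP_DEBUG:
		return LVL_DEBUG;
	case DUMP_NONE:
	default:
		return NULL;
	}
}
```

A caller would check for NULL before printing, which folds the early
return for dump == 0 and the level choice into a single lookup.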

* Re: [PATCH 1/1] net/mlx5: add dynamic logging for mlx5_dump_err_cqe
  2022-10-18  7:47     ` Leon Romanovsky
@ 2022-10-20  8:24       ` Aru
  2022-10-20 11:54         ` Leon Romanovsky
  0 siblings, 1 reply; 6+ messages in thread
From: Aru @ 2022-10-20  8:24 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: jgg, saeedm, linux-rdma, linux-kernel, netdev, manjunath.b.patil,
	rama.nichanamatlu

On 10/18/22 12:47 AM, Leon Romanovsky wrote:
> On Fri, Oct 14, 2022 at 12:12:36PM -0700, Aru wrote:
>> Hi Leon,
>>
>> Thank you for reviewing the patch.
>>
>> The method you mentioned disables the dump permanently in the kernel.
>> We thought the vendor might have enabled it for their own use when needed.
>> Hence we made it dynamic, so that it can be enabled or disabled at runtime.
>>
>> In a production environment especially, having the option to turn this
>> log on or off at runtime will be helpful.
> While you are interested in turning this specific warning on and off,
> your change will hide the dump for all syndromes, as it is unlikely that
> anyone runs in production with debug prints enabled.
>
>   -   mlx5_ib_warn(dev, "dump error cqe\n");
>   +   mlx5_ib_dbg(dev, "dump error cqe\n");
>
> Something like this will do the trick without affecting the others.
>
> diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
> index 457f57b088c6..966206085eb3 100644
> --- a/drivers/infiniband/hw/mlx5/cq.c
> +++ b/drivers/infiniband/hw/mlx5/cq.c
> @@ -267,10 +267,29 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
>   	wc->wc_flags |= IB_WC_WITH_NETWORK_HDR_TYPE;
>   }
>   
> -static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)
> +static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe,
> +		     struct ib_wc *wc, int dump)
>   {
> -	mlx5_ib_warn(dev, "dump error cqe\n");
> -	mlx5_dump_err_cqe(dev->mdev, cqe);
> +	const char *level;
> +
> +	if (!dump)
> +		return;
> +
> +	mlx5_ib_warn(dev, "WC error: %d, Message: %s\n", wc->status,
> +		     ib_wc_status_msg(wc->status));
> +
> +	if (dump == 1) {
> +		mlx5_ib_warn(dev, "dump error cqe\n");
> +		level = KERN_WARNING;
> +	}
> +
> +	if (dump == 2) {
> +		mlx5_ib_dbg(dev, "dump error cqe\n");
> +		level = KERN_DEBUG;
> +	}
> +
> +	print_hex_dump(level, "", DUMP_PREFIX_OFFSET, 16, 1, cqe, sizeof(*cqe),
> +		       false);
>   }

Hi Leon,

Thank you for the reply and for the suggested method to handle this debug
logging.

We set 'dump = 2' for the syndromes applicable to our scenario:
MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR, MLX5_CQE_SYNDROME_REMOTE_OP_ERR and
MLX5_CQE_SYNDROME_LOCAL_PROT_ERR. We verified this code change: by default,
the CQE dump is not printed to syslog until the log level is changed to
KERN_DEBUG. This works as expected.

I will send out another email with the patch using your method.

Is it fine with you if I add your name in the 'Suggested-by' field of the
new patch?

Thanks
Aru

>   
>   static void mlx5_handle_error_cqe(struct mlx5_ib_dev *dev,
> @@ -300,6 +319,7 @@ static void mlx5_handle_error_cqe(struct mlx5_ib_dev *dev,
>   		wc->status = IB_WC_BAD_RESP_ERR;
>   		break;
>   	case MLX5_CQE_SYNDROME_LOCAL_ACCESS_ERR:
> +		dump = 2;
>   		wc->status = IB_WC_LOC_ACCESS_ERR;
>   		break;
>   	case MLX5_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR:
> @@ -328,11 +348,7 @@ static void mlx5_handle_error_cqe(struct mlx5_ib_dev *dev,
>   	}
>   
>   	wc->vendor_err = cqe->vendor_err_synd;
> -	if (dump) {
> -		mlx5_ib_warn(dev, "WC error: %d, Message: %s\n", wc->status,
> -			     ib_wc_status_msg(wc->status));
> -		dump_cqe(dev, cqe);
> -	}
> +	dump_cqe(dev, cqe, wc, dump);
>   }
>   
>   static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64,
>
>> Feel free to share your thoughts.
> And please don't top-post.
>
> Thanks
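
The per-syndrome choice described in the reply above can be sketched as a
small mapping function. This is a hedged illustration, not the follow-up
patch: enum syndrome, enum dump_mode and dump_mode_for() are hypothetical
names, the SYND_* values are arbitrary stand-ins for the MLX5_CQE_SYNDROME_*
constants, and a warning-level default is assumed for every other syndrome.

```c
/* Hypothetical stand-ins for a few MLX5_CQE_SYNDROME_* constants; the
 * numeric values are arbitrary and are not the hardware encodings. */
enum syndrome {
	SYND_LOCAL_LENGTH_ERR,
	SYND_LOCAL_PROT_ERR,
	SYND_REMOTE_ACCESS_ERR,
	SYND_REMOTE_OP_ERR,
};

/* Hypothetical names for the three values of `dump`. */
enum dump_mode { DUMP_NONE = 0, DUMP_WARN = 1, DUMP_DEBUG = 2 };

/*
 * Pick the dump mode for a syndrome: the three syndromes named in the
 * reply are demoted to debug level, everything else keeps the assumed
 * warning-level default.
 */
enum dump_mode dump_mode_for(enum syndrome synd)
{
	enum dump_mode dump = DUMP_WARN;

	switch (synd) {
	case SYND_REMOTE_ACCESS_ERR:
	case SYND_REMOTE_OP_ERR:
	case SYND_LOCAL_PROT_ERR:
		dump = DUMP_DEBUG;
		break;
	default:
		break;
	}
	return dump;
}
```

With this shape, moving another syndrome to debug level is a one-line
change in the switch, and syndromes that are not listed keep their
existing behavior.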

* Re: [PATCH 1/1] net/mlx5: add dynamic logging for mlx5_dump_err_cqe
  2022-10-20  8:24       ` Aru
@ 2022-10-20 11:54         ` Leon Romanovsky
  0 siblings, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2022-10-20 11:54 UTC (permalink / raw)
  To: Aru
  Cc: jgg, saeedm, linux-rdma, linux-kernel, netdev, manjunath.b.patil,
	rama.nichanamatlu

On Thu, Oct 20, 2022 at 01:24:54AM -0700, Aru wrote:
> On 10/18/22 12:47 AM, Leon Romanovsky wrote:
> > On Fri, Oct 14, 2022 at 12:12:36PM -0700, Aru wrote:
> > > Hi Leon,
> > > 
> > > Thank you for reviewing the patch.
> > > 
> > > The method you mentioned disables the dump permanently in the kernel.
> > > We thought the vendor might have enabled it for their own use when needed.
> > > Hence we made it dynamic, so that it can be enabled or disabled at runtime.
> > > 
> > > In a production environment especially, having the option to turn this
> > > log on or off at runtime will be helpful.
> > While you are interested in turning this specific warning on and off,
> > your change will hide the dump for all syndromes, as it is unlikely that
> > anyone runs in production with debug prints enabled.
> > 
> >   -   mlx5_ib_warn(dev, "dump error cqe\n");
> >   +   mlx5_ib_dbg(dev, "dump error cqe\n");
> > 
> > Something like this will do the trick without affecting the others.
> > 
> > diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
> > index 457f57b088c6..966206085eb3 100644
> > --- a/drivers/infiniband/hw/mlx5/cq.c
> > +++ b/drivers/infiniband/hw/mlx5/cq.c
> > @@ -267,10 +267,29 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
> >   	wc->wc_flags |= IB_WC_WITH_NETWORK_HDR_TYPE;
> >   }
> > -static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)
> > +static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe,
> > +		     struct ib_wc *wc, int dump)
> >   {
> > -	mlx5_ib_warn(dev, "dump error cqe\n");
> > -	mlx5_dump_err_cqe(dev->mdev, cqe);
> > +	const char *level;
> > +
> > +	if (!dump)
> > +		return;
> > +
> > +	mlx5_ib_warn(dev, "WC error: %d, Message: %s\n", wc->status,
> > +		     ib_wc_status_msg(wc->status));
> > +
> > +	if (dump == 1) {
> > +		mlx5_ib_warn(dev, "dump error cqe\n");
> > +		level = KERN_WARNING;
> > +	}
> > +
> > +	if (dump == 2) {
> > +		mlx5_ib_dbg(dev, "dump error cqe\n");
> > +		level = KERN_DEBUG;
> > +	}
> > +
> > +	print_hex_dump(level, "", DUMP_PREFIX_OFFSET, 16, 1, cqe, sizeof(*cqe),
> > +		       false);
> >   }
> Hi Leon,
> 
> Thank you for the reply and for the suggested method to handle this debug
> logging.
> 
> We set 'dump = 2' for the syndromes applicable to our scenario:
> MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR, MLX5_CQE_SYNDROME_REMOTE_OP_ERR and
> MLX5_CQE_SYNDROME_LOCAL_PROT_ERR. We verified this code change: by default,
> the CQE dump is not printed to syslog until the log level is changed to
> KERN_DEBUG. This works as expected.
> 
> I will send out another email with the patch using your method.
> 
> Is it fine with you if I add your name in the 'Suggested-by' field of the
> new patch?

Whatever works for you.

Thanks
