From: Slava Ovsiienko <viacheslavo@nvidia.com>
To: Ruifeng Wang <ruifeng.wang@arm.com>,
	Raslan Darawsheh <rasland@nvidia.com>,
	Matan Azrad <matan@nvidia.com>,
	Shahaf Shuler <shahafs@nvidia.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"jerinj@marvell.com" <jerinj@marvell.com>,
	 "nd@arm.com" <nd@arm.com>,
	"honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>,
	"stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 1/2] net/mlx5: remove redundant operations
Date: Fri, 2 Jul 2021 08:12:38 +0000
Message-ID: <DM6PR12MB37530E9FCE3A71269870B756DF1F9@DM6PR12MB3753.namprd12.prod.outlook.com>
In-Reply-To: <20210601083055.97261-2-ruifeng.wang@arm.com>

Hi, Ruifeng

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Tuesday, June 1, 2021 11:31
> To: Raslan Darawsheh <rasland@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; nd@arm.com;
> honnappa.nagarahalli@arm.com; Ruifeng Wang <ruifeng.wang@arm.com>;
> stable@dpdk.org
> Subject: [PATCH 1/2] net/mlx5: remove redundant operations
> 
> Some operations on mask are redundant and can be removed.
> The change yielded 1.6% performance gain on N1SDP.
> On ThunderX2, slight performance uplift was also observed.
> 
> Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> index 2234fbe6b2..98a75b09c6 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> @@ -768,18 +768,11 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
>  					  comp_mask), 0)) /
>  					  (sizeof(uint16_t) * 8);
>  		/* D.6 mask out entries after the compressed CQE. */
> -		mask = vcreate_u16(comp_idx < MLX5_VPMD_DESCS_PER_LOOP ?
> -				   -1UL >> (comp_idx * sizeof(uint16_t) * 8) :
> -				   0);
> -		invalid_mask = vorr_u16(invalid_mask, mask);
> +		invalid_mask = vorr_u16(invalid_mask, comp_mask);

Hmm... I'm not sure we can drop the step that masks out the compressed CQE and the entries following it.
Consider this completion scenario (a series of 4 CQEs, each 64B long):

0: normal uncompressed CQE, ownership OK, format uncompressed, opcode OK, no error
1: compressed CQE, ownership OK, format compressed, opcode OK, no error
2: miniCQE array, contents can be anything, so it may be misread as ownership OK, format uncompressed, opcode OK, no error
3: miniCQE array, contents can be anything, so it may be misread as ownership OK, format uncompressed, opcode OK, no error

Obviously, we should unconditionally mask out entries 2 and 3, regardless of what their format/opcode/error fields appear to contain.
I think we can take the hunk above and skip the hunk below:

>  		/* D.7 count non-compressed valid CQEs. */
>  		n = __builtin_clzl(vget_lane_u64(vreinterpret_u64_u16(
>  				   invalid_mask), 0)) / (sizeof(uint16_t) * 8);
>  		nocmp_n += n;
> -		/* D.2 get the final invalid mask. */
> -		mask = vcreate_u16(n < MLX5_VPMD_DESCS_PER_LOOP ?
> -				   -1UL >> (n * sizeof(uint16_t) * 8) : 0);
> -		invalid_mask = vorr_u16(invalid_mask, mask);

That way we still get the correct final invalid_mask: the first compressed or invalid CQE and all entries following it will be masked out.
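
To illustrate the reasoning, here is a rough scalar model of steps D.6, D.7 and D.2, not the driver code itself. It assumes the four 16-bit per-CQE flags are packed into one 64-bit word with entry 0 in the most significant lane, so that counting leading zeros yields the number of leading valid entries. The names lane_flag(), leading_clear(), count_valid(), final_invalid_mask() and the keep_d2 switch are made up for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

#define DESCS_PER_LOOP 4   /* stands in for MLX5_VPMD_DESCS_PER_LOOP */
#define LANE_BITS 16       /* sizeof(uint16_t) * 8 */

/* Hypothetical helper: all-ones flag for entry idx.
 * Assumption: entry 0 occupies the most significant 16-bit lane. */
static uint64_t lane_flag(unsigned int idx)
{
	assert(idx < DESCS_PER_LOOP);
	return 0xffffULL << ((DESCS_PER_LOOP - 1 - idx) * LANE_BITS);
}

/* Number of leading lanes that are clear; DESCS_PER_LOOP for an empty mask.
 * The zero case is handled explicitly because clz(0) is undefined. */
static unsigned int leading_clear(uint64_t mask)
{
	if (mask == 0)
		return DESCS_PER_LOOP;
	return (unsigned int)__builtin_clzll(mask) / LANE_BITS;
}

/* D.6 (as in the first hunk) followed by D.7: count the leading entries
 * that are neither invalid nor compressed. */
static unsigned int count_valid(uint64_t invalid_mask, uint64_t comp_mask)
{
	return leading_clear(invalid_mask | comp_mask);
}

/* Final invalid mask. keep_d2 != 0 models retaining step D.2,
 * keep_d2 == 0 models dropping it as in the second hunk. */
static uint64_t final_invalid_mask(uint64_t invalid_mask, uint64_t comp_mask,
				   int keep_d2)
{
	unsigned int n = count_valid(invalid_mask, comp_mask);

	invalid_mask |= comp_mask;                          /* D.6 */
	if (keep_d2 && n < DESCS_PER_LOOP)                  /* D.2 */
		invalid_mask |= UINT64_MAX >> (n * LANE_BITS);
	return invalid_mask;
}
```

For the scenario above (entry 1 compressed, entries 2 and 3 looking valid), final_invalid_mask(0, lane_flag(1), 1) gives 0x0000ffffffffffff, so entries 1 to 3 are all masked out. With keep_d2 set to 0 the result is 0x0000ffff00000000, which leaves the two miniCQE lanes unmasked.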

With best regards,
Slava

