RE: [PATCH v2] mempool: micro-optimize put function

From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
	"olivier.matz@6wind.com" <olivier.matz@6wind.com>,
	"andrew.rybchenko@oktetlabs.ru" <andrew.rybchenko@oktetlabs.ru>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "bruce.richardson@intel.com" <bruce.richardson@intel.com>,
	"konstantin.ananyev@huawei.com" <konstantin.ananyev@huawei.com>,
	nd <nd@arm.com>, nd <nd@arm.com>
Subject: RE: [PATCH v2] mempool: micro-optimize put function
Date: Wed, 16 Nov 2022 15:51:24 +0000	[thread overview]
Message-ID: <DBAPR08MB5814E10D7B0559D79757D8F098079@DBAPR08MB5814.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <20221116121416.94990-1-mb@smartsharesystems.com>

<snip>
> 
> Micro-optimization:
> Reduced the most likely code path in the generic put function by moving an
> unlikely check out of the most likely code path and further down.
> 
> Also updated the comments in the function.
> 
> v2 (feedback from Andrew Rybchenko):
> * Modified comparison to prevent overflow if n is really huge and len is
>   non-zero.
> * Added assertion about the invariant preventing overflow in the
>   comparison.
> * Crossing the threshold is not extremely unlikely, so removed likely()
>   from that comparison.
>   The compiler will generate code with optimal static branch prediction
>   here anyway.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  lib/mempool/rte_mempool.h | 36 ++++++++++++++++++++----------------
>  1 file changed, 20 insertions(+), 16 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 9f530db24b..dd1a3177d6 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -1364,32 +1364,36 @@ rte_mempool_do_generic_put(struct
> rte_mempool *mp, void * const *obj_table,  {
>  	void **cache_objs;
> 
> -	/* No cache provided */
> +	/* No cache provided? */
>  	if (unlikely(cache == NULL))
>  		goto driver_enqueue;
> 
> -	/* increment stat now, adding in mempool always success */
> +	/* Increment stats now, adding in mempool always succeeds. */
>  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
>  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> 
> -	/* The request itself is too big for the cache */
> -	if (unlikely(n > cache->flushthresh))
> -		goto driver_enqueue_stats_incremented;
> -
> -	/*
> -	 * The cache follows the following algorithm:
> -	 *   1. If the objects cannot be added to the cache without crossing
> -	 *      the flush threshold, flush the cache to the backend.
> -	 *   2. Add the objects to the cache.
> -	 */
> +	/* Assert the invariant preventing overflow in the comparison below.
> */
> +	RTE_ASSERT(cache->len <= cache->flushthresh);
> 
> -	if (cache->len + n <= cache->flushthresh) {
> +	if (n <= cache->flushthresh - cache->len) {
> +		/*
> +		 * The objects can be added to the cache without crossing the
> +		 * flush threshold.
> +		 */
>  		cache_objs = &cache->objs[cache->len];
>  		cache->len += n;
> -	} else {
> +	} else if (likely(n <= cache->flushthresh)) {
IMO, this is a misconfiguration on the application part. In the PMDs I have looked at, max value of 'n' is controlled by compile time constants. Application could do a compile time check on the cache threshold or we could have another RTE_ASSERT on this.

> +		/*
> +		 * The request itself fits into the cache.
> +		 * But first, the cache must be flushed to the backend, so
> +		 * adding the objects does not cross the flush threshold.
> +		 */
>  		cache_objs = &cache->objs[0];
>  		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> >len);
>  		cache->len = n;
> +	} else {
> +		/* The request itself is too big for the cache. */
> +		goto driver_enqueue_stats_incremented;
>  	}
> 
>  	/* Add the objects to the cache. */
> @@ -1399,13 +1403,13 @@ rte_mempool_do_generic_put(struct
> rte_mempool *mp, void * const *obj_table,
> 
>  driver_enqueue:
> 
> -	/* increment stat now, adding in mempool always success */
> +	/* Increment stats now, adding in mempool always succeeds. */
>  	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
>  	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> 
>  driver_enqueue_stats_incremented:
> 
> -	/* push objects to the backend */
> +	/* Push the objects to the backend. */
>  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);  }
> 
> --
> 2.17.1