From: "Medvedkin, Vladimir"
To: Ruifeng Wang, bruce.richardson@intel.com
Cc: dev@dpdk.org, honnappa.nagarahalli@arm.com, gavin.hu@arm.com, nd@arm.com
Date: Wed, 5 Jun 2019 11:50:16 +0100
Subject: Re: [dpdk-dev] [PATCH v1 1/2] lib/lpm: memory orderings to avoid race conditions for v1604
In-Reply-To: <20190605055451.30473-1-ruifeng.wang@arm.com>

Hi Wang,

On 05/06/2019 06:54, Ruifeng Wang wrote:
> When a tbl8 group is getting attached to a tbl24 entry, a lookup
> might fail even though the entry is configured in the table.
>
> For example, consider an LPM table configured with 10.10.10.1/24.
> When a new entry 10.10.10.32/28 is being added, a new tbl8 group is
> allocated and the tbl24 entry is changed to point to the tbl8 group.
> If the tbl24 entry is written before the tbl8 group entries are
> updated, a lookup on 10.10.10.9 will return failure.
>
> Correct memory orderings are required to ensure that the store to
> tbl24 does not happen before the stores to the tbl8 group entries
> complete.
>
> The orderings have an impact on the LPM performance tests.
> On the Arm A72 platform, the delete operation shows 2.7% degradation,
> while add/lookup shows no notable performance change.
> On the x86 E5 platform, the add operation shows 4.3% degradation, the
> delete operation shows 2.2% - 10.2% degradation, and lookup shows no
> performance change.

I think it is possible to avoid the add/del performance degradation:

1. Explicitly mark struct rte_lpm_tbl_entry as 4-byte aligned.
2. Cast the value to uint32_t (uint16_t for the 2.0 version) on the
   memory write.
3. Use rte_wmb() after the memory write.
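Something like this (just an untested sketch: the entry layout below is an
abbreviated copy of the little-endian definition in rte_lpm.h, and the
helper name is made up for illustration):

  #include <stdint.h>
  #include <string.h>

  #include <rte_atomic.h>   /* rte_wmb() */
  #include <rte_common.h>   /* __rte_aligned() */

  /* 1. Entry explicitly marked 4-byte aligned. */
  struct lpm_tbl_entry_sketch {
          uint32_t next_hop    :24;
          uint32_t valid       :1;
          uint32_t valid_group :1;
          uint32_t depth       :6;
  } __rte_aligned(4);

  /* 2. Publish an entry as a single 32-bit store, so the compiler cannot
   * split it into separate per-bit-field writes.
   */
  static inline void
  tbl_entry_store_sketch(struct lpm_tbl_entry_sketch *dst,
                  struct lpm_tbl_entry_sketch entry)
  {
          uint32_t raw;

          memcpy(&raw, &entry, sizeof(raw)); /* stay within aliasing rules */
          *(volatile uint32_t *)dst = raw;
  }

  /* 3. Ordering with rte_wmb(), e.g. in the add path:
   *
   *         ... write all tbl8 group entries ...
   *         rte_wmb();
   *         tbl_entry_store_sketch(&lpm->tbl24[tbl24_index], new_tbl24_entry);
   *
   * In the delete path the barrier would go after the tbl24 write and
   * before tbl8_free(), so the free cannot be hoisted above it.
   */

This would keep the add/delete fast paths as plain stores plus a barrier,
though I have not benchmarked it.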
>
> Signed-off-by: Honnappa Nagarahalli
> Signed-off-by: Ruifeng Wang
> ---
>  lib/librte_lpm/rte_lpm.c | 32 +++++++++++++++++++++++++-------
>  lib/librte_lpm/rte_lpm.h |  4 ++++
>  2 files changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 6b7b28a2e..6ec450a08 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -806,7 +806,8 @@ add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>              /* Setting tbl24 entry in one go to avoid race
>               * conditions
>               */
> -            lpm->tbl24[i] = new_tbl24_entry;
> +            __atomic_store(&lpm->tbl24[i], &new_tbl24_entry,
> +                    __ATOMIC_RELEASE);
>
>              continue;
>          }
> @@ -1017,7 +1018,11 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>              .depth = 0,
>          };
>
> -        lpm->tbl24[tbl24_index] = new_tbl24_entry;
> +        /* The tbl24 entry must be written only after the
> +         * tbl8 entries are written.
> +         */
> +        __atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> +                __ATOMIC_RELEASE);
>
>      } /* If valid entry but not extended calculate the index into Table8. */
>      else if (lpm->tbl24[tbl24_index].valid_group == 0) {
> @@ -1063,7 +1068,11 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>              .depth = 0,
>          };
>
> -        lpm->tbl24[tbl24_index] = new_tbl24_entry;
> +        /* The tbl24 entry must be written only after the
> +         * tbl8 entries are written.
> +         */
> +        __atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> +                __ATOMIC_RELEASE);
>
>      } else { /*
>          * If it is valid, extended entry calculate the index into tbl8.
> @@ -1391,6 +1400,7 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>      /* Calculate the range and index into Table24. */
>      tbl24_range = depth_to_range(depth);
>      tbl24_index = (ip_masked >> 8);
> +    struct rte_lpm_tbl_entry zero_tbl24_entry = {0};
>
>      /*
>       * Firstly check the sub_rule_index. A -1 indicates no replacement rule
> @@ -1405,7 +1415,8 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>
>          if (lpm->tbl24[i].valid_group == 0 &&
>                  lpm->tbl24[i].depth <= depth) {
> -            lpm->tbl24[i].valid = INVALID;
> +            __atomic_store(&lpm->tbl24[i],
> +                &zero_tbl24_entry, __ATOMIC_RELEASE);
>          } else if (lpm->tbl24[i].valid_group == 1) {
>              /*
>               * If TBL24 entry is extended, then there has
> @@ -1450,7 +1461,8 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>
>          if (lpm->tbl24[i].valid_group == 0 &&
>                  lpm->tbl24[i].depth <= depth) {
> -            lpm->tbl24[i] = new_tbl24_entry;
> +            __atomic_store(&lpm->tbl24[i], &new_tbl24_entry,
> +                    __ATOMIC_RELEASE);
>          } else if (lpm->tbl24[i].valid_group == 1) {
>              /*
>               * If TBL24 entry is extended, then there has
> @@ -1713,8 +1725,11 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>      tbl8_recycle_index = tbl8_recycle_check_v1604(lpm->tbl8, tbl8_group_start);
>
>      if (tbl8_recycle_index == -EINVAL) {
> -        /* Set tbl24 before freeing tbl8 to avoid race condition. */
> +        /* Set tbl24 before freeing tbl8 to avoid race condition.
> +         * Prevent the free of the tbl8 group from hoisting.
> +         */
>          lpm->tbl24[tbl24_index].valid = 0;
> +        __atomic_thread_fence(__ATOMIC_RELEASE);
>          tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
>      } else if (tbl8_recycle_index > -1) {
>          /* Update tbl24 entry. */
> @@ -1725,8 +1740,11 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>              .depth = lpm->tbl8[tbl8_recycle_index].depth,
>          };
>
> -        /* Set tbl24 before freeing tbl8 to avoid race condition. */
> +        /* Set tbl24 before freeing tbl8 to avoid race condition.
> +         * Prevent the free of the tbl8 group from hoisting.
> +         */
>          lpm->tbl24[tbl24_index] = new_tbl24_entry;
> +        __atomic_thread_fence(__ATOMIC_RELEASE);
>          tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
>      }
>  #undef group_idx
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index b886f54b4..6f5704c5c 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -354,6 +354,10 @@ rte_lpm_lookup(struct rte_lpm *lpm, uint32_t ip, uint32_t *next_hop)
>      ptbl = (const uint32_t *)(&lpm->tbl24[tbl24_index]);
>      tbl_entry = *ptbl;
>
> +    /* Memory ordering is not required in lookup. Because dataflow
> +     * dependency exists, compiler or HW won't be able to re-order
> +     * the operations.
> +     */
>      /* Copy tbl8 entry (only if needed) */
>      if (unlikely((tbl_entry & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
>              RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {

-- 
Regards,
Vladimir