From: Honnappa Nagarahalli
Subject: Re: [PATCH 2/4] hash: add memory ordering to avoid race conditions
Date: Sun, 30 Sep 2018 22:20:59 +0000
To: "Wang, Yipeng1", "Richardson, Bruce", "De Lara Guarch, Pablo"
Cc: "dev@dpdk.org", "Gavin Hu (Arm Technology China)", Steve Capper, Ola Liljedahl, nd

>
> Some general comments for the various __atomic_store/load added,
>
> 1. Although it passes the compiler check, I just want to confirm whether we
> should use the GCC/clang builtins, or whether there are higher-level APIs in
> DPDK to do atomic operations.
>
I have used the gcc builtins (just like rte_ring does).

> 2. We believe the compiler will translate the atomic_store/load to regular MOV
> instructions on a Total Store Order architecture (e.g. x86_64). But we ran the
> perf test on x86 and here is the relative slowdown on lookup compared to
> master head. I am not sure if the performance drop comes from the atomic
> builtins.
>
C11 atomics also block compiler reordering. Other than this, the retry loop is
an addition to lookup.
The patch also has the alignment corrected. I am not sure how that is affecting
the perf numbers.

> Keysize | single lookup | bulk lookup
> 4       | 0.93          | 0.95
> 8       | 0.95          | 0.96
> 16      | 0.97          | 0.96
> 32      | 0.97          | 1.00
> 48      | 1.03          | 0.99
> 64      | 1.04          | 0.98
> 9       | 0.91          | 0.96
> 13      | 0.97          | 0.98
> 37      | 1.04          | 1.03
> 40      | 1.02          | 0.98
>
I assume this is the data from the test cases in the test_hash_perf.c file. I
tried to reproduce this data, but my data is worse. Can you specify the actual
test from test_hash_perf.c you are using (with locks/pre-computed hash/with
data/elements in primary)?
IMO, the differences you have provided are not high.

> Other comments inline.
>
> >-----Original Message-----
> >The only race condition that can occur is using the key store element
> >before the key write is completed. Hence, the release memory order is
> >used while inserting the element. Any other race condition is caught
> >by the key comparison. Memory orderings are added only where needed.
> >For example, reads in the writer's context do not need memory ordering
> >as there is a single writer.
> >
> [Wang, Yipeng] I assume this commit works together with the next one to
> enable read-write concurrency on relaxed memory ordering machines. This
> commit by itself does not do that, right?
> If this is the case, should we merge the two commits, or maybe just explain it
> a little bit more in the commit message?
This commit works only with the next commit. I kept it separate to make it easy
to review. I will merge it in the final patch.

>
> >key_idx in the bucket entry and pdata in the key store element are used
> >for synchronisation. key_idx is used to release an inserted entry in
> >the bucket to the reader. Use of pdata for synchronisation is required
> >due to the update of an existing entry, wherein only the pdata is
> >updated without updating key_idx.
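To make the scheme described above concrete, the writer/reader pairing is
roughly the following. This is a minimal standalone sketch using the gcc
builtins; the fixed-size key, the struct layout and the function names are
simplified placeholders, not the actual rte_cuckoo_hash.c code:

#include <stdint.h>
#include <string.h>

#define EMPTY_SLOT 0

struct key_entry {
	char key[16];	/* simplified fixed-size key */
	void *pdata;
};

/* Writer: complete the key store entry first, then publish key_idx with
 * a release store. A reader that observes the new key_idx is guaranteed
 * to also observe the completed key write.
 */
static void
publish_entry(uint32_t *bkt_key_idx, struct key_entry *k,
		const void *key, void *data, uint32_t new_idx)
{
	memcpy(k->key, key, sizeof(k->key));
	__atomic_store_n(&k->pdata, data, __ATOMIC_RELEASE);
	__atomic_store_n(bkt_key_idx, new_idx, __ATOMIC_RELEASE);
}

/* Reader: the acquire load of key_idx pairs with the writer's release
 * store, so the key comparison that follows sees the fully written key.
 */
static void *
lookup_entry(const uint32_t *bkt_key_idx, const struct key_entry *keys,
		const void *key)
{
	uint32_t idx = __atomic_load_n(bkt_key_idx, __ATOMIC_ACQUIRE);

	if (idx == EMPTY_SLOT)
		return NULL;
	if (memcmp(keys[idx].key, key, sizeof(keys[idx].key)) != 0)
		return NULL;
	return __atomic_load_n(&keys[idx].pdata, __ATOMIC_ACQUIRE);
}

For an update of an existing entry, only the release store to pdata runs and
key_idx is untouched, which is why pdata has to carry the synchronisation in
that case.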
> >
> >-/* Search a key from bucket and update its data */
> >+/* Search a key from bucket and update its data.
> >+ * Writer holds the lock before calling this.
> >+ */
> > static inline int32_t
> > search_and_update(const struct rte_hash *h, void *data, const void *key,
> > 		struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
> >@@ -499,8 +501,13 @@ search_and_update(const struct rte_hash *h, void *data, const void *key,
> > 		k = (struct rte_hash_key *) ((char *)keys +
> > 				bkt->key_idx[i] * h->key_entry_size);
> > 		if (rte_hash_cmp_eq(key, k->key, h) == 0) {
> >-			/* Update data */
> >-			k->pdata = data;
> >+			/* 'pdata' acts as the synchronization point
> >+			 * when an existing hash entry is updated.
> >+			 * Key is not updated in this case.
> >+			 */
> [Wang, Yipeng] I did not quite understand why we need synchronization
> for the hash data update.
> Since the pdata write is already atomic, the lookup will either read out the
> stale data or the new data, which should be fine without synchronization.
> Is it to ensure the order of multiple reads in lookup threads?
Yes, it is to ensure the order of reads in the lookup threads. There might be
reads in the application and we want to make sure they do not get reordered
with this.

>
> >@@ -1243,11 +1289,19 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> > 		while (sec_hitmask[i]) {
> > 			uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
> >
> >-			uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
> >+			uint32_t key_idx =
> >+				__atomic_load_n(
> >+					&secondary_bkt[i]->key_idx[hit_index],
> >+					__ATOMIC_ACQUIRE);
> > 			const struct rte_hash_key *key_slot =
> > 				(const struct rte_hash_key *)(
> > 				(const char *)h->key_store +
> > 				key_idx * h->key_entry_size);
> >+
> >+			if (key_idx != EMPTY_SLOT)
> [Wang, Yipeng] I think even for the current code, we need to check EMPTY_SLOT.
> Could you export this as a bug fix commit?
>
In the existing code, there is a check 'if (!!key_idx & !rte_hash....)'. Are
you referring to '!!key_idx'? I think this should be changed to
'(key_idx != EMPTY_SLOT)'.
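To be explicit, the shape I would propose for the bulk-lookup check is below.
This is a sketch against the hunk quoted above, not a complete patch; the
positions[] bookkeeping and the hits bitmap handling are elided:

			uint32_t key_idx =
				__atomic_load_n(
					&secondary_bkt[i]->key_idx[hit_index],
					__ATOMIC_ACQUIRE);

			/* Skip the key store access entirely for an empty
			 * slot, instead of relying on '!!key_idx' to mask a
			 * comparison that has already dereferenced the slot.
			 */
			if (key_idx != EMPTY_SLOT) {
				const struct rte_hash_key *key_slot =
					(const struct rte_hash_key *)(
					(const char *)h->key_store +
					key_idx * h->key_entry_size);
				if (rte_hash_cmp_eq(keys[i], key_slot->key,
							h) == 0) {
					/* record the hit for keys[i] */
				}
			}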