* [PATCH v1 0/5] hash: add extendable bucket and partial-key hashing
@ 2018-09-06 17:09 Yipeng Wang
From: Yipeng Wang @ 2018-09-06 17:09 UTC (permalink / raw)
  To: pablo.de.lara.guarch, bruce.richardson
  Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

This patch set makes two major improvements to the current rte_hash library.

First, it adds the extendable bucket table feature: a new structure that can
accommodate keys that fail to get inserted into the main hash table due to
the unlikely event of excessive hash collisions. The hash table buckets are
extended with a linked list to host these keys. This new design guarantees
insertion of 100% of the keys for a given hash table size, with minimal
overhead. A new flag value is added for the user to indicate whether the
extendable bucket feature should be enabled. The linked-list buckets are
similar in concept to the extendable bucket hash table in the packet
framework. In detail, on insertion the linked buckets are used to store keys
that fail to fit into either the primary or the secondary bucket and for
which the cuckoo path search could not find an empty location within the
maximum path length (a small-probability event). On lookup, the key is
checked first in the primary bucket, then in the secondary bucket, and, if
the secondary bucket is extended, the linked list is traversed for a
possible match.
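
As a rough illustration of the lookup order described above, here is a
minimal, hedged sketch (placeholder types and helper, not the rte_hash
implementation itself):

#include <errno.h>
#include <string.h>

#define SKETCH_BUCKET_ENTRIES 8

/* Placeholder bucket; the real rte_hash bucket also stores signatures. */
struct sketch_bucket {
	const void *keys[SKETCH_BUCKET_ENTRIES];
	struct sketch_bucket *next;	/* chain of extendable buckets */
};

static int
match_in_bucket(const struct sketch_bucket *b, const void *key, size_t len)
{
	int i;

	for (i = 0; i < SKETCH_BUCKET_ENTRIES; i++)
		if (b->keys[i] != NULL && memcmp(b->keys[i], key, len) == 0)
			return i;
	return -1;
}

/* Lookup order: primary bucket, then secondary bucket and its chain of
 * linked extendable buckets.
 */
static int
lookup_sketch(const struct sketch_bucket *prim,
	      const struct sketch_bucket *sec,
	      const void *key, size_t len)
{
	const struct sketch_bucket *b;
	int pos = match_in_bucket(prim, key, len);

	if (pos >= 0)
		return pos;
	for (b = sec; b != NULL; b = b->next) {
		pos = match_in_bucket(b, key, len);
		if (pos >= 0)
			return pos;
	}
	return -ENOENT;
}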

Second, the patch set changes the current hashing algorithm to "partial-key
hashing". Partial-key hashing is a concept from the paper by Bin Fan et al.,
"MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
Hashing". Instead of storing both the 32-bit signature and the alternative
signature in the bucket, we store only a small 16-bit signature and calculate
the alternative bucket index by XORing the signature with the current bucket
index. This doubles the memory efficiency of the hash table, since one bucket
now occupies one cache line instead of two as in the original design.
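
For illustration, a hedged sketch of the index/signature split described
above (the mask and shift follow this description; it is not necessarily
byte-for-byte the library code):

#include <stdint.h>

/* bucket_bitmask is assumed to be num_buckets - 1, with num_buckets a
 * power of two.
 */
static inline void
split_hash_sketch(uint32_t hash, uint32_t bucket_bitmask,
		  uint32_t *prim_bkt, uint32_t *sec_bkt, uint16_t *sig)
{
	*sig = (uint16_t)(hash >> 16);		/* 16-bit signature stored in the bucket */
	*prim_bkt = hash & bucket_bitmask;	/* primary bucket index */
	/* XOR of index and signature gives the alternative bucket; applying
	 * the same XOR again returns to the primary bucket.
	 */
	*sec_bkt = (*prim_bkt ^ *sig) & bucket_bitmask;
}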

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>

Yipeng Wang (5):
  test: fix bucket size in hash table perf test
  test: more accurate hash table perf test output
  hash: add extendable bucket feature
  test: implement extendable bucket hash test
  hash: use partial-key hashing

 lib/librte_hash/rte_cuckoo_hash.c | 518 +++++++++++++++++++++++++++-----------
 lib/librte_hash/rte_cuckoo_hash.h |  11 +-
 lib/librte_hash/rte_hash.h        |   3 +
 test/test/test_hash.c             | 145 ++++++++++-
 test/test/test_hash_perf.c        | 126 +++++++---
 5 files changed, 618 insertions(+), 185 deletions(-)

-- 
2.7.4


* [PATCH v1 1/5] test: fix bucket size in hash table perf test
From: Yipeng Wang @ 2018-09-06 17:09 UTC (permalink / raw)
  To: pablo.de.lara.guarch, bruce.richardson
  Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

The bucket size was changed from 4 to 8, but the corresponding
perf test was not updated accordingly.

Fixes: 58017c98ed53 ("hash: add vectorized comparison")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 test/test/test_hash_perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 33dcb9f..9ed7125 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -20,7 +20,7 @@
 #define MAX_ENTRIES (1 << 19)
 #define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
 #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
-#define BUCKET_SIZE 4
+#define BUCKET_SIZE 8
 #define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)
 #define MAX_KEYSIZE 64
 #define NUM_KEYSIZES 10
-- 
2.7.4


* [PATCH v1 2/5] test: more accurate hash table perf test output
From: Yipeng Wang @ 2018-09-06 17:09 UTC (permalink / raw)
  To: pablo.de.lara.guarch, bruce.richardson
  Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

Edit the printf messages emitted when an error happens so that
they are more accurate and informative.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 test/test/test_hash_perf.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 9ed7125..4d00c20 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -248,7 +248,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 						(const void *) keys[i],
 						signatures[i], data);
 			if (ret < 0) {
-				printf("Failed to add key number %u\n", ret);
+				printf("H+D: Failed to add key number %u\n", i);
 				return -1;
 			}
 		} else if (with_hash && !with_data) {
@@ -258,7 +258,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 			if (ret >= 0)
 				positions[i] = ret;
 			else {
-				printf("Failed to add key number %u\n", ret);
+				printf("H: Failed to add key number %u\n", i);
 				return -1;
 			}
 		} else if (!with_hash && with_data) {
@@ -266,7 +266,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 						(const void *) keys[i],
 						data);
 			if (ret < 0) {
-				printf("Failed to add key number %u\n", ret);
+				printf("D: Failed to add key number %u\n", i);
 				return -1;
 			}
 		} else {
@@ -274,7 +274,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 			if (ret >= 0)
 				positions[i] = ret;
 			else {
-				printf("Failed to add key number %u\n", ret);
+				printf("Failed to add key number %u\n", i);
 				return -1;
 			}
 		}
@@ -442,7 +442,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
 		if (ret >= 0)
 			positions[i] = ret;
 		else {
-			printf("Failed to add key number %u\n", ret);
+			printf("Failed to delete key number %u\n", i);
 			return -1;
 		}
 	}
-- 
2.7.4


* [PATCH v1 3/5] hash: add extendable bucket feature
From: Yipeng Wang @ 2018-09-06 17:09 UTC (permalink / raw)
  To: pablo.de.lara.guarch, bruce.richardson
  Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

In use cases where the hash table capacity needs to be guaranteed,
the extendable bucket feature can be used to store extra keys in
linked lists when conflicts happen. This is a similar concept to
the extendable bucket hash table in the packet framework.

This commit adds the extendable bucket feature. The user can turn
it on or off through the extra flag field at table creation time.

The extendable bucket table is composed of buckets that can be
linked, as a list, to the buckets of the main table. When the
extendable bucket feature is enabled, table utilization can always
achieve 100%. Although keys that end up in the extendable buckets
may have a longer lookup time, such keys should be rare thanks to
the cuckoo algorithm.
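
As a hedged usage sketch (parameter values here are arbitrary examples),
enabling the feature only requires setting the new flag at creation time:

#include <rte_hash.h>
#include <rte_jhash.h>

static struct rte_hash *
create_ext_table_example(void)
{
	struct rte_hash_parameters params = {
		.name = "ext_example",
		.entries = 1024,
		.key_len = 16,
		.hash_func = rte_jhash,
		.hash_func_init_val = 0,
		.socket_id = 0,
		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
	};

	/* With the flag set, key insertion is expected to succeed for up to
	 * 'entries' keys even under heavy hash collisions.
	 */
	return rte_hash_create(&params);
}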

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 331 +++++++++++++++++++++++++++++++++-----
 lib/librte_hash/rte_cuckoo_hash.h |   5 +
 lib/librte_hash/rte_hash.h        |   3 +
 3 files changed, 298 insertions(+), 41 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8..ff380bb 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -31,6 +31,10 @@
 #include "rte_hash.h"
 #include "rte_cuckoo_hash.h"
 
+#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)                            \
+	for (CURRENT_BKT = START_BUCKET;                                      \
+		CURRENT_BKT != NULL;                                          \
+		CURRENT_BKT = CURRENT_BKT->next)
 
 TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
 
@@ -63,6 +67,16 @@ rte_hash_find_existing(const char *name)
 	return h;
 }
 
+static inline struct rte_hash_bucket *
+rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
+{
+	while (1) {
+		if (lst_bkt->next == NULL)
+			return lst_bkt;
+		lst_bkt = lst_bkt->next;
+	}
+}
+
 void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
 {
 	h->cmp_jump_table_idx = KEY_CUSTOM;
@@ -85,13 +99,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	struct rte_tailq_entry *te = NULL;
 	struct rte_hash_list *hash_list;
 	struct rte_ring *r = NULL;
+	struct rte_ring *r_ext = NULL;
 	char hash_name[RTE_HASH_NAMESIZE];
 	void *k = NULL;
 	void *buckets = NULL;
+	void *buckets_ext = NULL;
 	char ring_name[RTE_RING_NAMESIZE];
+	char ext_ring_name[RTE_RING_NAMESIZE];
 	unsigned num_key_slots;
 	unsigned i;
 	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
+	unsigned int ext_table_support = 0;
 	unsigned int readwrite_concur_support = 0;
 
 	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
@@ -124,6 +142,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		multi_writer_support = 1;
 	}
 
+	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
+		ext_table_support = 1;
+
 	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
 	if (multi_writer_support)
 		/*
@@ -145,6 +166,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err;
 	}
 
+	const uint32_t num_buckets = rte_align32pow2(params->entries) /
+						RTE_HASH_BUCKET_ENTRIES;
+
+	snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
+								params->name);
+	/* Create ring for extendable buckets. */
+	if (ext_table_support) {
+		r_ext = rte_ring_create(ext_ring_name,
+				rte_align32pow2(num_buckets + 1),
+				params->socket_id, 0);
+
+		if (r_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+								"failed\n");
+			goto err;
+		}
+	}
+
 	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
 
 	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
@@ -177,18 +216,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err_unlock;
 	}
 
-	const uint32_t num_buckets = rte_align32pow2(params->entries)
-					/ RTE_HASH_BUCKET_ENTRIES;
-
 	buckets = rte_zmalloc_socket(NULL,
 				num_buckets * sizeof(struct rte_hash_bucket),
 				RTE_CACHE_LINE_SIZE, params->socket_id);
 
 	if (buckets == NULL) {
-		RTE_LOG(ERR, HASH, "memory allocation failed\n");
+		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
 		goto err_unlock;
 	}
 
+	/* Allocate same number of extendable buckets */
+	if (ext_table_support) {
+		buckets_ext = rte_zmalloc_socket(NULL,
+				num_buckets * sizeof(struct rte_hash_bucket),
+				RTE_CACHE_LINE_SIZE, params->socket_id);
+		if (buckets_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+							"failed\n");
+			goto err_unlock;
+		}
+		/* Populate ext bkt ring. We reserve 0 similar to the
+		 * key-data slot, just in case in future we want to
+		 * use bucket index for the linked list and 0 means NULL
+		 * for next bucket
+		 */
+		for (i = 1; i <= num_buckets; i++)
+			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+	}
+
 	const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params->key_len;
 	const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;
 
@@ -262,6 +317,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->num_buckets = num_buckets;
 	h->bucket_bitmask = h->num_buckets - 1;
 	h->buckets = buckets;
+	h->buckets_ext = buckets_ext;
+	h->free_ext_bkts = r_ext;
 	h->hash_func = (params->hash_func == NULL) ?
 		default_hash_func : params->hash_func;
 	h->key_store = k;
@@ -269,6 +326,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->hw_trans_mem_support = hw_trans_mem_support;
 	h->multi_writer_support = multi_writer_support;
 	h->readwrite_concur_support = readwrite_concur_support;
+	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
@@ -304,9 +362,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
 err:
 	rte_ring_free(r);
+	rte_ring_free(r_ext);
 	rte_free(te);
 	rte_free(h);
 	rte_free(buckets);
+	rte_free(buckets_ext);
 	rte_free(k);
 	return NULL;
 }
@@ -344,6 +404,7 @@ rte_hash_free(struct rte_hash *h)
 		rte_free(h->readwrite_lock);
 	}
 	rte_ring_free(h->free_slots);
+	rte_ring_free(h->free_ext_bkts);
 	rte_free(h->key_store);
 	rte_free(h->buckets);
 	rte_free(h);
@@ -448,6 +509,14 @@ rte_hash_reset(struct rte_hash *h)
 	while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
 		rte_pause();
 
+	/* clear free extendable bucket ring and memory */
+	if (h->ext_table_support) {
+		memset(h->buckets_ext, 0, h->num_buckets *
+						sizeof(struct rte_hash_bucket));
+		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
+			rte_pause();
+	}
+
 	/* Repopulate the free slots ring. Entry zero is reserved for key misses */
 	if (h->multi_writer_support)
 		tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) *
@@ -458,6 +527,12 @@ rte_hash_reset(struct rte_hash *h)
 	for (i = 1; i < tot_ring_cnt + 1; i++)
 		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
 
+	/* Repopulate the free ext bkt ring. */
+	if (h->ext_table_support)
+		for (i = 1; i < h->num_buckets + 1; i++)
+			rte_ring_sp_enqueue(h->free_ext_bkts,
+						(void *)((uintptr_t) i));
+
 	if (h->multi_writer_support) {
 		/* Reset local caches per lcore */
 		for (i = 0; i < RTE_MAX_LCORE; i++)
@@ -524,24 +599,27 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		int32_t *ret_val)
 {
 	unsigned int i;
-	struct rte_hash_bucket *cur_bkt = prim_bkt;
+	struct rte_hash_bucket *cur_bkt;
 	int32_t ret;
 
 	__hash_rw_writer_lock(h);
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	/* Insert new entry if there is room in the primary
@@ -580,7 +658,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
-	struct rte_hash_bucket *cur_bkt = bkt;
+	struct rte_hash_bucket *cur_bkt;
 	struct queue_node *prev_node, *curr_node = leaf;
 	struct rte_hash_bucket *prev_bkt, *curr_bkt = leaf->bkt;
 	uint32_t prev_slot, curr_slot = leaf_slot;
@@ -597,18 +675,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
 
-	ret = search_and_update(h, data, key, alt_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	while (likely(curr_node->prev != NULL)) {
@@ -711,15 +791,18 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	hash_sig_t alt_hash;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
-	struct rte_hash_bucket *prim_bkt, *sec_bkt;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
 	void *slot_id = NULL;
-	uint32_t new_idx;
+	void *ext_bkt_id = NULL;
+	uint32_t new_idx, bkt_id;
 	int ret;
 	unsigned n_slots;
 	unsigned lcore_id;
+	unsigned int i;
 	struct lcore_cache *cached_free_slots = NULL;
 	int32_t ret_val;
+	struct rte_hash_bucket *last;
 
 	prim_bucket_idx = sig & h->bucket_bitmask;
 	prim_bkt = &h->buckets[prim_bucket_idx];
@@ -739,10 +822,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Check if key is already inserted in secondary location */
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_writer_unlock(h);
 
@@ -808,10 +893,72 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
-	} else {
+	}
+
+	/* if ext table not enabled, we failed the insertion */
+	if (!h->ext_table_support) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret;
 	}
+
+	/* Now we need to go through the extendable table. Protection is needed
+	 * to protect all extendable table processes.
+	 */
+	__hash_rw_writer_lock(h);
+	/* We check for duplicates again since the key could be added before the lock */
+	/* Check if key is already inserted in primary location */
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	if (ret != -1) {
+		enqueue_slot_back(h, cached_free_slots, slot_id);
+		goto failure;
+	}
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			enqueue_slot_back(h, cached_free_slots, slot_id);
+			goto failure;
+		}
+	}
+
+	/* search extendable table to find an empty entry */
+	struct rte_hash_bucket *next_bkt = sec_bkt->next;
+	FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+			/* Check if slot is available */
+			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
+				cur_bkt->sig_current[i] = alt_hash;
+				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->key_idx[i] = new_idx;
+				__hash_rw_writer_unlock(h);
+				return new_idx - 1;
+			}
+		}
+	}
+
+	/* failed to get an empty entry from extendable table. Link a new
+	 * extendable bucket. We first get a free bucket from the ring.
+	 */
+	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+		ret = -ENOSPC;
+		goto failure;
+	}
+
+	bkt_id = (uint32_t)((uintptr_t) ext_bkt_id) - 1;
+	/* Use the first location of the new bucket */
+	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
+	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
+	/* Link the new bucket to sec bucket linked list */
+	last = rte_hash_get_last_bkt(sec_bkt);
+	last->next = &h->buckets_ext[bkt_id];
+	__hash_rw_writer_unlock(h);
+	return new_idx - 1;
+
+failure:
+	__hash_rw_writer_unlock(h);
+	return ret;
+
 }
 
 int32_t
@@ -890,7 +1037,7 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
+	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
 
 	bucket_idx = sig & h->bucket_bitmask;
@@ -904,16 +1051,19 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 		__hash_rw_reader_unlock(h);
 		return ret;
 	}
+
 	/* Calculate secondary hash */
 	alt_hash = rte_hash_secondary_hash(sig);
 	bucket_idx = alt_hash & h->bucket_bitmask;
 	bkt = &h->buckets[bucket_idx];
 
 	/* Check if key is in secondary location */
-	ret = search_one_bucket(h, key, alt_hash, data, bkt);
-	if (ret != -1) {
-		__hash_rw_reader_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, bkt) {
+		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		if (ret != -1) {
+			__hash_rw_reader_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_reader_unlock(h);
 	return -ENOENT;
@@ -1015,15 +1165,17 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
-	int32_t ret;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt;
+	struct rte_hash_bucket *cur_bkt, *prev_bkt, *next_bkt;
+	int32_t ret, i;
+	struct rte_hash_bucket *tobe_removed_bkt = NULL;
 
 	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	prim_bkt = &h->buckets[bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, bkt, sig);
+	ret = search_and_remove(h, key, prim_bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
@@ -1032,17 +1184,53 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	/* Calculate secondary hash */
 	alt_hash = rte_hash_secondary_hash(sig);
 	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[bucket_idx];
 
 	/* look for key in secondary bucket */
-	ret = search_and_remove(h, key, bkt, alt_hash);
+	ret = search_and_remove(h, key, sec_bkt, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
 	}
 
+	/* Not in main table, we need to search ext table */
+	if (h->ext_table_support) {
+		next_bkt = sec_bkt->next;
+		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+			ret = search_and_remove(h, key, cur_bkt, alt_hash);
+			if (ret != -1)
+				goto return_bkt;
+		}
+	}
+
 	__hash_rw_writer_unlock(h);
 	return -ENOENT;
+
+/* Search extendable buckets to see if any empty bucket needs to be recycled */
+return_bkt:
+	prev_bkt = sec_bkt;
+
+	for (cur_bkt = sec_bkt->next; cur_bkt != NULL;
+			prev_bkt = cur_bkt, cur_bkt = cur_bkt->next) {
+		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+			if (cur_bkt->key_idx[i] != EMPTY_SLOT)
+				break;
+		}
+		if (i == RTE_HASH_BUCKET_ENTRIES) {
+			prev_bkt->next = cur_bkt->next;
+			cur_bkt->next = NULL;
+			tobe_removed_bkt = cur_bkt;
+			break;
+		}
+	}
+
+	__hash_rw_writer_unlock(h);
+
+	if (tobe_removed_bkt) {
+		uint32_t index = tobe_removed_bkt - h->buckets_ext + 1;
+		rte_ring_mp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+	}
+	return ret;
 }
 
 int32_t
@@ -1143,6 +1331,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 {
 	uint64_t hits = 0;
 	int32_t i;
+	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
@@ -1266,6 +1455,35 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		continue;
 	}
 
+	/* all found, do not need to go through ext bkt */
+	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
+		if (hit_mask != NULL)
+			*hit_mask = hits;
+		__hash_rw_reader_unlock(h);
+		return;
+	}
+
+	/* need to check ext buckets for match */
+	for (i = 0; i < num_keys; i++) {
+		if ((hits & (1ULL << i)) != 0)
+			continue;
+		struct rte_hash_bucket *cur_bkt;
+		struct rte_hash_bucket *next_bkt = secondary_bkt[i]->next;
+		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+			if (data != NULL)
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], &data[i], cur_bkt);
+			else
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], NULL, cur_bkt);
+			if (ret != -1) {
+				positions[i] = ret;
+				hits |= 1ULL << i;
+				break;
+			}
+		}
+	}
+
 	__hash_rw_reader_unlock(h);
 
 	if (hit_mask != NULL)
@@ -1308,10 +1526,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 
 	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
 
-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
+	const uint32_t total_entries_main = h->num_buckets *
+							RTE_HASH_BUCKET_ENTRIES;
+	const uint32_t total_entries = total_entries_main << 1;
+
 	/* Out of bounds */
-	if (*next >= total_entries)
-		return -ENOENT;
+	if (*next >= total_entries_main)
+		goto extend_table;
 
 	/* Calculate bucket and index of current iterator */
 	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
@@ -1321,8 +1542,8 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
-		if (*next == total_entries)
-			return -ENOENT;
+		if (*next == total_entries_main)
+			goto extend_table;
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
@@ -1341,4 +1562,32 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	(*next)++;
 
 	return position - 1;
+
+extend_table:
+	/* Out of bounds */
+	if (*next >= total_entries || !h->ext_table_support)
+		return -ENOENT;
+
+	bucket_idx = (*next - total_entries_main) / RTE_HASH_BUCKET_ENTRIES;
+	idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+
+	while (h->buckets_ext[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+		(*next)++;
+		if (*next == total_entries)
+			return -ENOENT;
+		bucket_idx = (*next - total_entries_main) /
+						RTE_HASH_BUCKET_ENTRIES;
+		idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+	}
+	/* Get position of entry in key table */
+	position = h->buckets_ext[bucket_idx].key_idx[idx];
+	next_key = (struct rte_hash_key *) ((char *)h->key_store +
+				position * h->key_entry_size);
+	/* Return key and data */
+	*key = next_key->key;
+	*data = next_key->pdata;
+
+	/* Increment iterator */
+	(*next)++;
+	return position - 1;
 }
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index b43f467..f190b04 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -144,6 +144,8 @@ struct rte_hash_bucket {
 	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
 
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
+
+	void *next;
 } __rte_cache_aligned;
 
 /** A hash table structure. */
@@ -185,7 +187,10 @@ struct rte_hash {
 	/**< Table with buckets storing all the	hash values and key indexes
 	 * to the key table.
 	 */
+	uint8_t ext_table_support;     /**< Enable ext table */
 	rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
+	struct rte_hash_bucket *buckets_ext; /**< extra bucket array */
+	struct rte_ring *free_ext_bkts; /**< ring of indexes of free buckets */
 } __rte_cache_aligned;
 
 struct queue_node {
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d931..2747522 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -37,6 +37,9 @@ extern "C" {
 /** Flag to support reader writer concurrency */
 #define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
 
+/** Flag to indicate the extended table should be used */
+#define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
+
 /** Signature of key that is stored internally. */
 typedef uint32_t hash_sig_t;
 
-- 
2.7.4


* [PATCH v1 4/5] test: implement extendable bucket hash test
From: Yipeng Wang @ 2018-09-06 17:09 UTC (permalink / raw)
  To: pablo.de.lara.guarch, bruce.richardson
  Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

This commit extends the current rte_hash unit tests to cover the
extendable bucket table feature and its performance.
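
For context, a minimal hedged sketch of the idea behind the new unit test
(illustrative only, not the test code itself): force every key into the
same bucket with a constant hash function and check that, with
RTE_HASH_EXTRA_FLAGS_EXT_TABLE set, all insertions still succeed.

#include <stdint.h>
#include <rte_hash.h>

/* All keys collide on purpose, so anything beyond one bucket must go into
 * the extendable buckets.
 */
static uint32_t
collide_hash(const void *key, uint32_t key_len, uint32_t init_val)
{
	(void)key;
	(void)key_len;
	(void)init_val;
	return 0;
}

static int
ext_bucket_smoke_test(void)
{
	struct rte_hash_parameters params = {
		.name = "ext_smoke",
		.entries = 64,
		.key_len = sizeof(uint32_t),
		.hash_func = collide_hash,
		.socket_id = 0,
		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
	};
	struct rte_hash *h = rte_hash_create(&params);
	uint32_t i;

	if (h == NULL)
		return -1;
	for (i = 0; i < 64; i++)
		if (rte_hash_add_key(h, &i) < 0)
			return -1;	/* should not fail with the ext table */
	rte_hash_free(h);
	return 0;
}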

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 test/test/test_hash.c      | 145 +++++++++++++++++++++++++++++++++++++++++++--
 test/test/test_hash_perf.c | 114 +++++++++++++++++++++++++----------
 2 files changed, 225 insertions(+), 34 deletions(-)

diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd..ca58755 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -660,6 +660,117 @@ static int test_full_bucket(void)
 	return 0;
 }
 
+/*
+ * Similar to the test above (full bucket test), but for extendable buckets.
+ */
+static int test_extendable_bucket(void)
+{
+	struct rte_hash_parameters params_pseudo_hash = {
+		.name = "test5",
+		.entries = 64,
+		.key_len = sizeof(struct flow_key), /* 13 */
+		.hash_func = pseudo_hash,
+		.hash_func_init_val = 0,
+		.socket_id = 0,
+		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE
+	};
+	struct rte_hash *handle;
+	int pos[64];
+	int expected_pos[64];
+	unsigned int i;
+	struct flow_key rand_keys[64];
+
+	for (i = 0; i < 64; i++) {
+		rand_keys[i].port_dst = i;
+		rand_keys[i].port_src = i+1;
+	}
+
+	handle = rte_hash_create(&params_pseudo_hash);
+	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
+
+	/* Fill bucket */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add - update */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Delete 1 key, check other keys are still found */
+	pos[35] = rte_hash_del_key(handle, &rand_keys[35]);
+	print_key_info("Del", &rand_keys[35], pos[35]);
+	RETURN_IF_ERROR(pos[35] != expected_pos[35],
+			"failed to delete key (pos[1]=%d)", pos[35]);
+	pos[20] = rte_hash_lookup(handle, &rand_keys[20]);
+	print_key_info("Lkp", &rand_keys[20], pos[20]);
+	RETURN_IF_ERROR(pos[20] != expected_pos[20],
+			"failed lookup after deleting key from same bucket "
+			"(pos[20]=%d)", pos[20]);
+
+	/* Go back to previous state */
+	pos[35] = rte_hash_add_key(handle, &rand_keys[35]);
+	print_key_info("Add", &rand_keys[35], pos[35]);
+	expected_pos[35] = pos[35];
+	RETURN_IF_ERROR(pos[35] < 0, "failed to add key (pos[1]=%d)", pos[35]);
+
+	/* Delete */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_del_key(handle, &rand_keys[i]);
+		print_key_info("Del", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to delete key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != -ENOENT,
+			"fail: found non-existent key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add again */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	rte_hash_free(handle);
+
+	/* Cover the NULL case. */
+	rte_hash_free(0);
+	return 0;
+}
+
 /******************************************************************************/
 static int
 fbk_hash_unit_test(void)
@@ -1096,7 +1207,7 @@ test_hash_creation_with_good_parameters(void)
  * Test to see the average table utilization (entries added/max entries)
  * before hitting a random entry that cannot be added
  */
-static int test_average_table_utilization(void)
+static int test_average_table_utilization(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	uint8_t simple_key[MAX_KEYSIZE];
@@ -1107,12 +1218,23 @@ static int test_average_table_utilization(void)
 
 	printf("\n# Running test to determine average utilization"
 	       "\n  before adding elements begins to fail\n");
+	if (ext_table)
+		printf("ext table is enabled\n");
+	else
+		printf("ext table is disabled\n");
+
 	printf("Measuring performance, please wait");
 	fflush(stdout);
 	ut_params.entries = 1 << 16;
 	ut_params.name = "test_average_utilization";
 	ut_params.hash_func = rte_jhash;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
+
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
 	for (j = 0; j < ITERATIONS; j++) {
@@ -1161,7 +1283,7 @@ static int test_average_table_utilization(void)
 }
 
 #define NUM_ENTRIES 256
-static int test_hash_iteration(void)
+static int test_hash_iteration(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	unsigned i;
@@ -1177,6 +1299,11 @@ static int test_hash_iteration(void)
 	ut_params.name = "test_hash_iteration";
 	ut_params.hash_func = rte_jhash;
 	ut_params.key_len = 16;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
@@ -1474,6 +1601,8 @@ test_hash(void)
 		return -1;
 	if (test_full_bucket() < 0)
 		return -1;
+	if (test_extendable_bucket() < 0)
+		return -1;
 
 	if (test_fbk_hash_find_existing() < 0)
 		return -1;
@@ -1483,9 +1612,17 @@ test_hash(void)
 		return -1;
 	if (test_hash_creation_with_good_parameters() < 0)
 		return -1;
-	if (test_average_table_utilization() < 0)
+
+	/* ext table disabled */
+	if (test_average_table_utilization(0) < 0)
+		return -1;
+	if (test_hash_iteration(0) < 0)
+		return -1;
+
+	/* ext table enabled */
+	if (test_average_table_utilization(1) < 0)
 		return -1;
-	if (test_hash_iteration() < 0)
+	if (test_hash_iteration(1) < 0)
 		return -1;
 
 	run_hash_func_tests();
diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 4d00c20..d169cd0 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -18,7 +18,8 @@
 #include "test.h"
 
 #define MAX_ENTRIES (1 << 19)
-#define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
+#define KEYS_TO_ADD (MAX_ENTRIES)
+#define ADD_PERCENT 0.75 /* 75% table utilization */
 #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
 #define BUCKET_SIZE 8
 #define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)
@@ -77,7 +78,7 @@ static struct rte_hash_parameters ut_params = {
 
 static int
 create_table(unsigned int with_data, unsigned int table_index,
-		unsigned int with_locks)
+		unsigned int with_locks, unsigned int ext)
 {
 	char name[RTE_HASH_NAMESIZE];
 
@@ -95,6 +96,9 @@ create_table(unsigned int with_data, unsigned int table_index,
 	else
 		ut_params.extra_flag = 0;
 
+	if (ext)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	ut_params.name = name;
 	ut_params.key_len = hashtest_key_lens[table_index];
 	ut_params.socket_id = rte_socket_id();
@@ -116,15 +120,21 @@ create_table(unsigned int with_data, unsigned int table_index,
 
 /* Shuffle the keys that have been added, so lookups will be totally random */
 static void
-shuffle_input_keys(unsigned table_index)
+shuffle_input_keys(unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	uint32_t swap_idx;
 	uint8_t temp_key[MAX_KEYSIZE];
 	hash_sig_t temp_signature;
 	int32_t temp_position;
+	unsigned int keys_to_add;
+
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = KEYS_TO_ADD - 1; i > 0; i--) {
+	for (i = keys_to_add - 1; i > 0; i--) {
 		swap_idx = rte_rand() % i;
 
 		memcpy(temp_key, keys[i], hashtest_key_lens[table_index]);
@@ -146,14 +156,20 @@ shuffle_input_keys(unsigned table_index)
  * ALL can fit in hash table (no errors)
  */
 static int
-get_input_keys(unsigned with_pushes, unsigned table_index)
+get_input_keys(unsigned int with_pushes, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j;
 	unsigned bucket_idx, incr, success = 1;
 	uint8_t k = 0;
 	int32_t ret;
 	const uint32_t bucket_bitmask = NUM_BUCKETS - 1;
+	unsigned int keys_to_add;
 
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 	/* Reset all arrays */
 	for (i = 0; i < MAX_ENTRIES; i++)
 		slot_taken[i] = 0;
@@ -170,7 +186,7 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 	 * Regardless a key has been added correctly or not (success),
 	 * the next one to try will be increased by 1.
 	 */
-	for (i = 0; i < KEYS_TO_ADD;) {
+	for (i = 0; i < keys_to_add;) {
 		incr = 0;
 		if (i != 0) {
 			keys[i][0] = ++k;
@@ -234,14 +250,20 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 }
 
 static int
-timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_adds(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *data;
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		data = (void *) ((uintptr_t) signatures[i]);
 		if (with_hash && with_data) {
 			ret = rte_hash_add_key_with_hash_data(h[table_index],
@@ -283,22 +305,31 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][ADD][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][ADD][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
 
 static int
-timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_lookups(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i, j;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *ret_data;
 	void *expected_data;
 	int32_t ret;
-
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD; j++) {
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
+	for (i = 0; i < num_lookups / keys_to_add; i++) {
+		for (j = 0; j < keys_to_add; j++) {
 			if (with_hash && with_data) {
 				ret = rte_hash_lookup_with_hash_data(h[table_index],
 							(const void *) keys[j],
@@ -351,13 +382,14 @@ timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_lookups_multi(unsigned with_data, unsigned table_index)
+timed_lookups_multi(unsigned int with_data, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j, k;
 	int32_t positions_burst[BURST_SIZE];
@@ -366,11 +398,20 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	void *ret_data[BURST_SIZE];
 	uint64_t hit_mask;
 	int ret;
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
 
 	const uint64_t start_tsc = rte_rdtsc();
 
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD/BURST_SIZE; j++) {
+	for (i = 0; i < num_lookups/keys_to_add; i++) {
+		for (j = 0; j < keys_to_add/BURST_SIZE; j++) {
 			for (k = 0; k < BURST_SIZE; k++)
 				keys_burst[k] = keys[j * BURST_SIZE + k];
 			if (with_data) {
@@ -418,19 +459,25 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_deletes(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		/* There are no delete functions with data, so just call two functions */
 		if (with_hash)
 			ret = rte_hash_del_key_with_hash(h[table_index],
@@ -450,7 +497,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][DELETE][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][DELETE][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
@@ -468,7 +515,8 @@ reset_table(unsigned table_index)
 }
 
 static int
-run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
+run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks,
+						unsigned int ext)
 {
 	unsigned i, j, with_data, with_hash;
 
@@ -477,25 +525,25 @@ run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
 
 	for (with_data = 0; with_data <= 1; with_data++) {
 		for (i = 0; i < NUM_KEYSIZES; i++) {
-			if (create_table(with_data, i, with_locks) < 0)
+			if (create_table(with_data, i, with_locks, ext) < 0)
 				return -1;
 
-			if (get_input_keys(with_pushes, i) < 0)
+			if (get_input_keys(with_pushes, i, ext) < 0)
 				return -1;
 			for (with_hash = 0; with_hash <= 1; with_hash++) {
-				if (timed_adds(with_hash, with_data, i) < 0)
+				if (timed_adds(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				for (j = 0; j < NUM_SHUFFLES; j++)
-					shuffle_input_keys(i);
+					shuffle_input_keys(i, ext);
 
-				if (timed_lookups(with_hash, with_data, i) < 0)
+				if (timed_lookups(with_hash, with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_lookups_multi(with_data, i) < 0)
+				if (timed_lookups_multi(with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_deletes(with_hash, with_data, i) < 0)
+				if (timed_deletes(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				/* Print a dot to show progress on operations */
@@ -631,10 +679,16 @@ test_hash_perf(void)
 				printf("\nALL ELEMENTS IN PRIMARY LOCATION\n");
 			else
 				printf("\nELEMENTS IN PRIMARY OR SECONDARY LOCATION\n");
-			if (run_all_tbl_perf_tests(with_pushes, with_locks) < 0)
+			if (run_all_tbl_perf_tests(with_pushes, with_locks, 0) < 0)
 				return -1;
 		}
 	}
+
+	printf("\n EXTENDABLE BUCKETS PERFORMANCE\n");
+
+	if (run_all_tbl_perf_tests(1, 0, 1) < 0)
+		return -1;
+
 	if (fbk_hash_perf_test() < 0)
 		return -1;
 
-- 
2.7.4


* [PATCH v1 5/5] hash: use partial-key hashing
From: Yipeng Wang @ 2018-09-06 17:09 UTC (permalink / raw)
  To: pablo.de.lara.guarch, bruce.richardson
  Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

This commit changes the hashing mechanism to "partial-key
hashing" to calculate the bucket index and the signature of a key.

This was proposed in the paper by Bin Fan et al.,
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". Basically, the idea is to use "xor" to
derive the alternative bucket from the current bucket index and
the signature.

With "partial-key hashing", the bucket memory requirement is
reduced from two cache lines to one cache line, which improves
memory efficiency and thus lookup speed.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 225 ++++++++++++++++++--------------------
 lib/librte_hash/rte_cuckoo_hash.h |   6 +-
 2 files changed, 108 insertions(+), 123 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index ff380bb..ace47ad 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -92,6 +92,26 @@ rte_hash_cmp_eq(const void *key1, const void *key2, const struct rte_hash *h)
 		return cmp_jump_table[h->cmp_jump_table_idx](key1, key2, h->key_len);
 }
 
+static inline void
+get_buckets_index(const struct rte_hash *h, const hash_sig_t hash,
+		uint32_t *prim_bkt, uint32_t *sec_bkt, uint16_t *sig)
+{
+	/*
+	 * We use the higher 16 bits of the hash as the signature value stored
+	 * in the table. We use the lower bits for the primary bucket
+	 * location. Then we xor the primary bucket location and the signature
+	 * to get the secondary bucket location. This is the same as
+	 * proposed in B. Fan, et al's paper
+	 * "Cuckoo Filter: Practically Better Than Bloom". The benefit of using
+	 * xor is that one can derive the alternative bucket location
+	 * using only the current bucket location and the signature.
+	 */
+	*sig = hash >> 16;
+
+	*prim_bkt = hash & h->bucket_bitmask;
+	*sec_bkt =  (*prim_bkt ^ *sig) & h->bucket_bitmask;
+}
+
 struct rte_hash *
 rte_hash_create(const struct rte_hash_parameters *params)
 {
@@ -329,9 +349,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
-		h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
-	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
 		h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
 	else
 #endif
@@ -418,18 +436,6 @@ rte_hash_hash(const struct rte_hash *h, const void *key)
 	return h->hash_func(key, h->key_len, h->hash_func_init_val);
 }
 
-/* Calc the secondary hash value from the primary hash value of a given key */
-static inline hash_sig_t
-rte_hash_secondary_hash(const hash_sig_t primary_hash)
-{
-	static const unsigned all_bits_shift = 12;
-	static const unsigned alt_bits_xor = 0x5bd1e995;
-
-	uint32_t tag = primary_hash >> all_bits_shift;
-
-	return primary_hash ^ ((tag + 1) * alt_bits_xor);
-}
-
 int32_t
 rte_hash_count(const struct rte_hash *h)
 {
@@ -561,14 +567,13 @@ enqueue_slot_back(const struct rte_hash *h,
 /* Search a key from bucket and update its data */
 static inline int32_t
 search_and_update(const struct rte_hash *h, void *data, const void *key,
-	struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
+	struct rte_hash_bucket *bkt, uint16_t sig)
 {
 	int i;
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		if (bkt->sig_current[i] == sig &&
-				bkt->sig_alt[i] == alt_hash) {
+		if (bkt->sig_current[i] == sig) {
 			k = (struct rte_hash_key *) ((char *)keys +
 					bkt->key_idx[i] * h->key_entry_size);
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
@@ -595,7 +600,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		struct rte_hash_bucket *prim_bkt,
 		struct rte_hash_bucket *sec_bkt,
 		const struct rte_hash_key *key, void *data,
-		hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+		uint16_t sig, uint32_t new_idx,
 		int32_t *ret_val)
 {
 	unsigned int i;
@@ -606,7 +611,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -614,7 +619,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -629,7 +634,6 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		/* Check if slot is available */
 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
 			prim_bkt->sig_current[i] = sig;
-			prim_bkt->sig_alt[i] = alt_hash;
 			prim_bkt->key_idx[i] = new_idx;
 			break;
 		}
@@ -654,7 +658,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *alt_bkt,
 			const struct rte_hash_key *key, void *data,
 			struct queue_node *leaf, uint32_t leaf_slot,
-			hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+			uint16_t sig, uint32_t new_idx,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
@@ -675,7 +679,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -683,7 +687,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -696,8 +700,9 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		prev_bkt = prev_node->bkt;
 		prev_slot = curr_node->prev_slot;
 
-		prev_alt_bkt_idx =
-			prev_bkt->sig_alt[prev_slot] & h->bucket_bitmask;
+		prev_alt_bkt_idx = (prev_node->cur_bkt_idx ^
+				prev_bkt->sig_current[prev_slot]) &
+				h->bucket_bitmask;
 
 		if (unlikely(&h->buckets[prev_alt_bkt_idx]
 				!= curr_bkt)) {
@@ -711,10 +716,8 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		 * Cuckoo insert to move elements back to its
 		 * primary bucket if available
 		 */
-		curr_bkt->sig_alt[curr_slot] =
-			 prev_bkt->sig_current[prev_slot];
 		curr_bkt->sig_current[curr_slot] =
-			prev_bkt->sig_alt[prev_slot];
+			prev_bkt->sig_current[prev_slot];
 		curr_bkt->key_idx[curr_slot] =
 			prev_bkt->key_idx[prev_slot];
 
@@ -724,7 +727,6 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	curr_bkt->sig_current[curr_slot] = sig;
-	curr_bkt->sig_alt[curr_slot] = alt_hash;
 	curr_bkt->key_idx[curr_slot] = new_idx;
 
 	__hash_rw_writer_unlock(h);
@@ -742,39 +744,44 @@ rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *bkt,
 			struct rte_hash_bucket *sec_bkt,
 			const struct rte_hash_key *key, void *data,
-			hash_sig_t sig, hash_sig_t alt_hash,
+			uint16_t sig, uint32_t bucket_idx,
 			uint32_t new_idx, int32_t *ret_val)
 {
 	unsigned int i;
 	struct queue_node queue[RTE_HASH_BFS_QUEUE_MAX_LEN];
 	struct queue_node *tail, *head;
 	struct rte_hash_bucket *curr_bkt, *alt_bkt;
+	uint32_t cur_idx, alt_idx;
 
 	tail = queue;
 	head = queue + 1;
 	tail->bkt = bkt;
 	tail->prev = NULL;
 	tail->prev_slot = -1;
+	tail->cur_bkt_idx = bucket_idx;
 
 	/* Cuckoo bfs Search */
 	while (likely(tail != head && head <
 					queue + RTE_HASH_BFS_QUEUE_MAX_LEN -
 					RTE_HASH_BUCKET_ENTRIES)) {
 		curr_bkt = tail->bkt;
+		cur_idx = tail->cur_bkt_idx;
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			if (curr_bkt->key_idx[i] == EMPTY_SLOT) {
 				int32_t ret = rte_hash_cuckoo_move_insert_mw(h,
 						bkt, sec_bkt, key, data,
-						tail, i, sig, alt_hash,
+						tail, i, sig,
 						new_idx, ret_val);
 				if (likely(ret != -1))
 					return ret;
 			}
 
 			/* Enqueue new node and keep prev node info */
-			alt_bkt = &(h->buckets[curr_bkt->sig_alt[i]
-						    & h->bucket_bitmask]);
+			alt_idx = (curr_bkt->sig_current[i] ^ cur_idx) &
+							h->bucket_bitmask;
+			alt_bkt = &(h->buckets[alt_idx]);
 			head->bkt = alt_bkt;
+			head->cur_bkt_idx = alt_idx;
 			head->prev = tail;
 			head->prev_slot = i;
 			head++;
@@ -789,7 +796,7 @@ static inline int32_t
 __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig, void *data)
 {
-	hash_sig_t alt_hash;
+	uint16_t short_sig;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
@@ -804,18 +811,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	int32_t ret_val;
 	struct rte_hash_bucket *last;
 
-	prim_bucket_idx = sig & h->bucket_bitmask;
+	get_buckets_index(h, sig, &prim_bucket_idx, &sec_bucket_idx, &short_sig);
 	prim_bkt = &h->buckets[prim_bucket_idx];
-	rte_prefetch0(prim_bkt);
-
-	alt_hash = rte_hash_secondary_hash(sig);
-	sec_bucket_idx = alt_hash & h->bucket_bitmask;
 	sec_bkt = &h->buckets[sec_bucket_idx];
+	rte_prefetch0(prim_bkt);
 	rte_prefetch0(sec_bkt);
 
 	/* Check if key is already inserted in primary location */
 	__hash_rw_writer_lock(h);
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
@@ -823,12 +827,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Check if key is already inserted in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			return ret;
 		}
 	}
+
 	__hash_rw_writer_unlock(h);
 
 	/* Did not find a match, so get a new slot for storing the new key */
@@ -866,7 +871,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+					short_sig, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -876,7 +881,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Primary bucket full, need to make space for new entry */
 	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+				short_sig, prim_bucket_idx, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -886,7 +891,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Also search secondary bucket to get better occupancy */
 	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
-					alt_hash, sig, new_idx, &ret_val);
+				short_sig, sec_bucket_idx, new_idx, &ret_val);
 
 	if (ret == 0)
 		return new_idx - 1;
@@ -907,14 +912,14 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	__hash_rw_writer_lock(h);
 	/* We check for duplicates again since the key could be added before the lock */
 	/* Check if key is already inserted in primary location */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		goto failure;
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			enqueue_slot_back(h, cached_free_slots, slot_id);
 			goto failure;
@@ -927,8 +932,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
-				cur_bkt->sig_current[i] = alt_hash;
-				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->sig_current[i] = short_sig;
 				cur_bkt->key_idx[i] = new_idx;
 				__hash_rw_writer_unlock(h);
 				return new_idx - 1;
@@ -946,8 +950,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	bkt_id = (uint32_t)((uintptr_t) ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
-	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
-	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
 	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
@@ -1006,7 +1009,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
 
 /* Search one bucket to find the match key */
 static inline int32_t
-search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
+search_one_bucket(const struct rte_hash *h, const void *key, uint16_t sig,
 			void **data, const struct rte_hash_bucket *bkt)
 {
 	int i;
@@ -1035,31 +1038,29 @@ static inline int32_t
 __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 					hash_sig_t sig, void **data)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	get_buckets_index(h, sig, &prim_bucket_idx, &sec_bucket_idx, &short_sig);
+	bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_reader_lock(h);
 
 	/* Check if key is in primary location */
-	ret = search_one_bucket(h, key, sig, data, bkt);
+	ret = search_one_bucket(h, key, short_sig, data, bkt);
 	if (ret != -1) {
 		__hash_rw_reader_unlock(h);
 		return ret;
 	}
 
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	bkt = &h->buckets[sec_bucket_idx];
 
 	/* Check if key is in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, bkt) {
-		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
 		if (ret != -1) {
 			__hash_rw_reader_unlock(h);
 			return ret;
@@ -1106,7 +1107,6 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	struct lcore_cache *cached_free_slots;
 
 	bkt->sig_current[i] = NULL_SIGNATURE;
-	bkt->sig_alt[i] = NULL_SIGNATURE;
 	if (h->multi_writer_support) {
 		lcore_id = rte_lcore_id();
 		cached_free_slots = &h->local_free_slots[lcore_id];
@@ -1131,7 +1131,7 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig)
+			struct rte_hash_bucket *bkt, uint16_t sig)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
@@ -1163,31 +1163,29 @@ static inline int32_t
 __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt;
 	struct rte_hash_bucket *cur_bkt, *prev_bkt, *next_bkt;
 	int32_t ret, i;
 	struct rte_hash_bucket *tobe_removed_bkt = NULL;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	prim_bkt = &h->buckets[bucket_idx];
+	get_buckets_index(h, sig, &prim_bucket_idx, &sec_bucket_idx, &short_sig);
+	prim_bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, prim_bkt, sig);
+	ret = search_and_remove(h, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
 	}
 
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	sec_bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[sec_bucket_idx];
 
 	/* look for key in secondary bucket */
-	ret = search_and_remove(h, key, sec_bkt, alt_hash);
+	ret = search_and_remove(h, key, sec_bkt, short_sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
@@ -1197,7 +1195,7 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	if (h->ext_table_support) {
 		next_bkt = sec_bkt->next;
 		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
-			ret = search_and_remove(h, key, cur_bkt, alt_hash);
+			ret = search_and_remove(h, key, cur_bkt, short_sig);
 			if (ret != -1)
 				goto return_bkt;
 		}
@@ -1272,52 +1270,32 @@ static inline void
 compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 			const struct rte_hash_bucket *prim_bkt,
 			const struct rte_hash_bucket *sec_bkt,
-			hash_sig_t prim_hash, hash_sig_t sec_hash,
+			uint16_t sig,
 			enum rte_hash_sig_compare_function sig_cmp_fn)
 {
 	unsigned int i;
 
 	switch (sig_cmp_fn) {
-#ifdef RTE_MACHINE_CPUFLAG_AVX2
-	case RTE_HASH_COMPARE_AVX2:
-		*prim_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)prim_bkt->sig_current),
-				_mm256_set1_epi32(prim_hash)));
-		*sec_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)sec_bkt->sig_current),
-				_mm256_set1_epi32(sec_hash)));
-		break;
-#endif
 #ifdef RTE_MACHINE_CPUFLAG_SSE2
 	case RTE_HASH_COMPARE_SSE:
-		/* Compare the first 4 signatures in the bucket */
-		*prim_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+		/* Compare all signatures in the bucket */
+		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)prim_bkt->sig_current),
-				_mm_set1_epi32(prim_hash)));
-		*prim_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&prim_bkt->sig_current[4]),
-				_mm_set1_epi32(prim_hash)))) << 4;
-		/* Compare the first 4 signatures in the bucket */
-		*sec_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+				_mm_set1_epi16(sig)));
+		/* Compare all signatures in the bucket */
+		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)sec_bkt->sig_current),
-				_mm_set1_epi32(sec_hash)));
-		*sec_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&sec_bkt->sig_current[4]),
-				_mm_set1_epi32(sec_hash)))) << 4;
+				_mm_set1_epi16(sig)));
 		break;
 #endif
 	default:
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			*prim_hash_matches |=
-				((prim_hash == prim_bkt->sig_current[i]) << i);
+				((sig == prim_bkt->sig_current[i]) << (i << 1));
 			*sec_hash_matches |=
-				((sec_hash == sec_bkt->sig_current[i]) << i);
+				((sig == sec_bkt->sig_current[i]) << (i << 1));
 		}
 	}
 
@@ -1333,7 +1311,9 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	int32_t i;
 	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
-	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
@@ -1351,10 +1331,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		rte_prefetch0(keys[i + PREFETCH_OFFSET]);
 
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
+		get_buckets_index(h, prim_hash[i],
+				&prim_index[i], &sec_index[i], &sig[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] =  &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1363,10 +1344,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	/* Calculate and prefetch rest of the buckets */
 	for (; i < num_keys; i++) {
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		get_buckets_index(h, prim_hash[i],
+				&prim_index[i], &sec_index[i], &sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] =  &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1377,10 +1360,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
 				primary_bkt[i], secondary_bkt[i],
-				prim_hash[i], sec_hash[i], h->sig_cmp_fn);
+				sig[i], h->sig_cmp_fn);
 
 		if (prim_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 			uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1391,7 +1375,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		}
 
 		if (sec_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 			uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1405,7 +1390,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		positions[i] = -ENOENT;
 		while (prim_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 
 			uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1424,11 +1410,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			prim_hitmask[i] &= ~(1 << (hit_index));
+			prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 		while (sec_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 
 			uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1448,7 +1435,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			sec_hitmask[i] &= ~(1 << (hit_index));
+			sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 next_key:
@@ -1472,10 +1459,10 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
 			if (data != NULL)
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], &data[i], cur_bkt);
+						sig[i], &data[i], cur_bkt);
 			else
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], NULL, cur_bkt);
+						sig[i], NULL, cur_bkt);
 			if (ret != -1) {
 				positions[i] = ret;
 				hits |= 1ULL << i;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index f190b04..775b93f 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -131,18 +131,15 @@ struct rte_hash_key {
 enum rte_hash_sig_compare_function {
 	RTE_HASH_COMPARE_SCALAR = 0,
 	RTE_HASH_COMPARE_SSE,
-	RTE_HASH_COMPARE_AVX2,
 	RTE_HASH_COMPARE_NUM
 };
 
 /** Bucket structure */
 struct rte_hash_bucket {
-	hash_sig_t sig_current[RTE_HASH_BUCKET_ENTRIES];
+	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
 
 	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
 
-	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
-
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
 
 	void *next;
@@ -195,6 +192,7 @@ struct rte_hash {
 
 struct queue_node {
 	struct rte_hash_bucket *bkt; /* Current bucket on the bfs search */
+	uint32_t cur_bkt_idx;
 
 	struct queue_node *prev;     /* Parent(bucket) in search path */
 	int prev_slot;               /* Parent(slot) in search path */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v2 0/7] hash: add extendable bucket and partial key hashing
  2018-09-06 17:09 [PATCH v1 0/5] hash: add extendable bucket and partial-key hashing Yipeng Wang
                   ` (4 preceding siblings ...)
  2018-09-06 17:09 ` [PATCH v1 5/5] hash: use partial-key hashing Yipeng Wang
@ 2018-09-21 17:17 ` Yipeng Wang
  2018-09-21 17:17   ` [PATCH v2 1/7] test/hash: fix bucket size in hash perf test Yipeng Wang
                     ` (8 more replies)
  2018-09-26 12:54 ` [PATCH v3 0/5] hash: fix multiple issues Yipeng Wang
  2018-09-26 20:26 ` [PATCH v3 0/3] hash: add extendable bucket and partial key hashing Yipeng Wang
  7 siblings, 9 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-21 17:17 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

The first four commits of the patch set fix small issues in the
existing code.

The other commits make two major optimizations over the current rte_hash
library.

First, it adds the Extendable Bucket Table feature: a new structure that can
accommodate keys that failed to get inserted into the main hash table due to
the unlikely event of excessive hash collisions. The hash table buckets will
get extended using a linked list to host these keys. This new design
guarantees insertion of 100% of the keys for a given hash table size with
minimal overhead. A new flag value is added for the user to indicate whether
the extendable bucket feature should be enabled or not. The linked list
buckets are a similar concept to the extendable bucket hash table in the
packet framework. In detail, for insertion, the linked buckets are used to
store keys that fail to fit into the primary and the secondary bucket and for
which the cuckoo path could not find an empty location within the maximum
path length (a small probability). For lookup, the key is checked first in
the primary bucket, then in the secondary bucket, and then, if the secondary
bucket is extended, the linked list is traversed for a possible match (see
the sketch below).
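
For illustration, a minimal sketch of that lookup order (it borrows the
bucket layout and the search_one_bucket() helper from the patches below;
locking and the hash computation are omitted, so this is not the actual
library code):

static inline int32_t
lookup_order_sketch(const struct rte_hash *h, const void *key,
		uint16_t short_sig, struct rte_hash_bucket *prim_bkt,
		struct rte_hash_bucket *sec_bkt, void **data)
{
	struct rte_hash_bucket *cur_bkt;
	int32_t ret;

	/* primary bucket first */
	ret = search_one_bucket(h, key, short_sig, data, prim_bkt);
	if (ret != -1)
		return ret;

	/* then the secondary bucket and any linked extendable buckets */
	for (cur_bkt = sec_bkt; cur_bkt != NULL; cur_bkt = cur_bkt->next) {
		ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
		if (ret != -1)
			return ret;
	}
	return -ENOENT;
}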

Second, the patch set changes the current hashing algorithm to be "partial-key
hashing". Partial-key hashing is the concept from Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
Hashing". Instead of storing both 32-bit signature and alternative signature
in the bucket, we only store a small 16-bit signature and calculate the
alternative bucket index by XORing the signature with the current bucket index.
This doubles the hash table memory efficiency since now one bucket
only occupies one cache line instead of two in the original design.
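
A rough sketch of the resulting index arithmetic (how the 16-bit signature
is carved out of the 32-bit hash is an assumption made only for this
illustration; the helper introduced by the patch is the authoritative
version):

static inline void
partial_key_index_sketch(uint32_t hash, uint32_t bucket_bitmask,
		uint32_t *prim_idx, uint32_t *sec_idx, uint16_t *short_sig)
{
	*short_sig = hash >> 16;		/* only 16 bits are stored */
	*prim_idx = hash & bucket_bitmask;
	*sec_idx = (*prim_idx ^ *short_sig) & bucket_bitmask;
}

Because XOR is its own inverse, XORing either bucket index with the stored
signature yields the other index, so a displaced entry's alternative bucket
can be recomputed during the cuckoo path without keeping a second signature.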

V1->V2:
1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
2. hash: Reorder the rte_hash struct for better cache line alignment.
3. test: Minor changes in the auto test to add a key insertion failure check
during the iteration test.
4. test: Add a new commit to fix the read-write test issue with
non-consecutive cores.
5. hash: Add a new commit to remove unnecessary code introduced by previous
patches.
6. hash: Comment and coding style improvements in multiple places.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>

Yipeng Wang (7):
  test/hash: fix bucket size in hash perf test
  test/hash: more accurate hash perf test output
  test/hash: fix rw test with non-consecutive cores
  hash: fix unnecessary code
  hash: add extendable bucket feature
  test/hash: implement extendable bucket hash test
  hash: use partial-key hashing

 lib/librte_hash/rte_cuckoo_hash.c | 516 +++++++++++++++++++++++++++-----------
 lib/librte_hash/rte_cuckoo_hash.h |  13 +-
 lib/librte_hash/rte_hash.h        |   8 +-
 test/test/test_hash.c             | 151 ++++++++++-
 test/test/test_hash_perf.c        | 126 +++++++---
 test/test/test_hash_readwrite.c   |  78 +++---
 6 files changed, 672 insertions(+), 220 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v2 1/7] test/hash: fix bucket size in hash perf test
  2018-09-21 17:17 ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Yipeng Wang
@ 2018-09-21 17:17   ` Yipeng Wang
  2018-09-26 10:04     ` Bruce Richardson
  2018-09-27  4:23     ` Honnappa Nagarahalli
  2018-09-21 17:17   ` [PATCH v2 2/7] test/hash: more accurate hash perf test output Yipeng Wang
                     ` (7 subsequent siblings)
  8 siblings, 2 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-21 17:17 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

The bucket size was changed from 4 to 8 but the corresponding
perf test was not changed accordingly.

Fixes: 58017c98ed53 ("hash: add vectorized comparison")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 test/test/test_hash_perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 33dcb9f..9ed7125 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -20,7 +20,7 @@
 #define MAX_ENTRIES (1 << 19)
 #define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
 #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
-#define BUCKET_SIZE 4
+#define BUCKET_SIZE 8
 #define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)
 #define MAX_KEYSIZE 64
 #define NUM_KEYSIZES 10
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v2 2/7] test/hash: more accurate hash perf test output
  2018-09-21 17:17 ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Yipeng Wang
  2018-09-21 17:17   ` [PATCH v2 1/7] test/hash: fix bucket size in hash perf test Yipeng Wang
@ 2018-09-21 17:17   ` Yipeng Wang
  2018-09-26 10:07     ` Bruce Richardson
  2018-09-21 17:17   ` [PATCH v2 3/7] test/hash: fix rw test with non-consecutive cores Yipeng Wang
                     ` (6 subsequent siblings)
  8 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-09-21 17:17 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

Edit the printf output that is printed when an error happens so
that it is more accurate and informative.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 test/test/test_hash_perf.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 9ed7125..4d00c20 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -248,7 +248,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 						(const void *) keys[i],
 						signatures[i], data);
 			if (ret < 0) {
-				printf("Failed to add key number %u\n", ret);
+				printf("H+D: Failed to add key number %u\n", i);
 				return -1;
 			}
 		} else if (with_hash && !with_data) {
@@ -258,7 +258,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 			if (ret >= 0)
 				positions[i] = ret;
 			else {
-				printf("Failed to add key number %u\n", ret);
+				printf("H: Failed to add key number %u\n", i);
 				return -1;
 			}
 		} else if (!with_hash && with_data) {
@@ -266,7 +266,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 						(const void *) keys[i],
 						data);
 			if (ret < 0) {
-				printf("Failed to add key number %u\n", ret);
+				printf("D: Failed to add key number %u\n", i);
 				return -1;
 			}
 		} else {
@@ -274,7 +274,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 			if (ret >= 0)
 				positions[i] = ret;
 			else {
-				printf("Failed to add key number %u\n", ret);
+				printf("Failed to add key number %u\n", i);
 				return -1;
 			}
 		}
@@ -442,7 +442,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
 		if (ret >= 0)
 			positions[i] = ret;
 		else {
-			printf("Failed to add key number %u\n", ret);
+			printf("Failed to delete key number %u\n", i);
 			return -1;
 		}
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v2 3/7] test/hash: fix rw test with non-consecutive cores
  2018-09-21 17:17 ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Yipeng Wang
  2018-09-21 17:17   ` [PATCH v2 1/7] test/hash: fix bucket size in hash perf test Yipeng Wang
  2018-09-21 17:17   ` [PATCH v2 2/7] test/hash: more accurate hash perf test output Yipeng Wang
@ 2018-09-21 17:17   ` Yipeng Wang
  2018-09-26 11:02     ` Bruce Richardson
  2018-09-26 11:13     ` Bruce Richardson
  2018-09-21 17:17   ` [PATCH v2 4/7] hash: fix unnecessary code Yipeng Wang
                     ` (5 subsequent siblings)
  8 siblings, 2 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-21 17:17 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

The multi-reader and multi-writer rte_hash unit test does not
work correctly with non-consecutive core ids. This commit
fixes the issue.
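
A minimal sketch of the approach taken in the patch below (illustrative
only): record the slave lcore ids in a dense array first, then derive each
worker's key offset from its index in that array instead of from the raw
lcore id, which may have gaps.

#include <rte_lcore.h>

static unsigned int slave_core_ids[RTE_MAX_LCORE];
static unsigned int num_slaves;

/* master lcore: collect slave lcore ids into a dense 0..N-1 mapping */
static void
collect_slave_cores(void)
{
	unsigned int core_id;

	RTE_LCORE_FOREACH_SLAVE(core_id)
		slave_core_ids[num_slaves++] = core_id;
}

/* worker lcore: dense index of the calling core, safe to use for offsets */
static unsigned int
my_dense_index(void)
{
	unsigned int i, lcore_id = rte_lcore_id();

	for (i = 0; i < num_slaves; i++)
		if (slave_core_ids[i] == lcore_id)
			break;
	return i;
}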

Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 test/test/test_hash_readwrite.c | 78 ++++++++++++++++++++++++++---------------
 1 file changed, 49 insertions(+), 29 deletions(-)

diff --git a/test/test/test_hash_readwrite.c b/test/test/test_hash_readwrite.c
index 55ae33d..2a4f7b9 100644
--- a/test/test/test_hash_readwrite.c
+++ b/test/test/test_hash_readwrite.c
@@ -24,6 +24,7 @@
 #define NUM_TEST 3
 unsigned int core_cnt[NUM_TEST] = {2, 4, 8};
 
+unsigned int slave_core_ids[RTE_MAX_LCORE];
 struct perf {
 	uint32_t single_read;
 	uint32_t single_write;
@@ -60,12 +61,15 @@ test_hash_readwrite_worker(__attribute__((unused)) void *arg)
 	uint64_t begin, cycles;
 	int ret;
 
-	offset = (lcore_id - rte_get_master_lcore())
-			* tbl_rw_test_param.num_insert;
+	for (i = 0; i < rte_lcore_count(); i++) {
+		if (slave_core_ids[i] == lcore_id)
+			break;
+	}
+	offset = tbl_rw_test_param.num_insert * i;
 
 	printf("Core #%d inserting and reading %d: %'"PRId64" - %'"PRId64"\n",
 	       lcore_id, tbl_rw_test_param.num_insert,
-	       offset, offset + tbl_rw_test_param.num_insert);
+	       offset, offset + tbl_rw_test_param.num_insert - 1);
 
 	begin = rte_rdtsc_precise();
 
@@ -171,6 +175,7 @@ test_hash_readwrite_functional(int use_htm)
 	uint32_t duplicated_keys = 0;
 	uint32_t lost_keys = 0;
 	int use_jhash = 1;
+	int slave_cnt = rte_lcore_count() - 1;
 
 	rte_atomic64_init(&gcycles);
 	rte_atomic64_clear(&gcycles);
@@ -182,17 +187,17 @@ test_hash_readwrite_functional(int use_htm)
 		goto err;
 
 	tbl_rw_test_param.num_insert =
-		TOTAL_INSERT / rte_lcore_count();
+		TOTAL_INSERT / slave_cnt;
 
 	tbl_rw_test_param.rounded_tot_insert =
 		tbl_rw_test_param.num_insert
-		* rte_lcore_count();
+		* slave_cnt;
 
 	printf("++++++++Start function tests:+++++++++\n");
 
 	/* Fire all threads. */
 	rte_eal_mp_remote_launch(test_hash_readwrite_worker,
-				 NULL, CALL_MASTER);
+				 NULL, SKIP_MASTER);
 	rte_eal_mp_wait_lcore();
 
 	while (rte_hash_iterate(tbl_rw_test_param.h, &next_key,
@@ -249,7 +254,7 @@ test_hash_readwrite_functional(int use_htm)
 }
 
 static int
-test_rw_reader(__attribute__((unused)) void *arg)
+test_rw_reader(void *arg)
 {
 	uint64_t i;
 	uint64_t begin, cycles;
@@ -276,7 +281,7 @@ test_rw_reader(__attribute__((unused)) void *arg)
 }
 
 static int
-test_rw_writer(__attribute__((unused)) void *arg)
+test_rw_writer(void *arg)
 {
 	uint64_t i;
 	uint32_t lcore_id = rte_lcore_id();
@@ -285,8 +290,13 @@ test_rw_writer(__attribute__((unused)) void *arg)
 	uint64_t start_coreid = (uint64_t)(uintptr_t)arg;
 	uint64_t offset;
 
-	offset = TOTAL_INSERT / 2 + (lcore_id - start_coreid)
-					* tbl_rw_test_param.num_insert;
+	for (i = 0; i < rte_lcore_count(); i++) {
+		if (slave_core_ids[i] == lcore_id)
+			break;
+	}
+
+	offset = TOTAL_INSERT / 2 + (i - (start_coreid)) *
+				tbl_rw_test_param.num_insert;
 	begin = rte_rdtsc_precise();
 	for (i = offset; i < offset + tbl_rw_test_param.num_insert; i++) {
 		ret = rte_hash_add_key_data(tbl_rw_test_param.h,
@@ -384,8 +394,8 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 	perf_results->single_read = end / i;
 
 	for (n = 0; n < NUM_TEST; n++) {
-		unsigned int tot_lcore = rte_lcore_count();
-		if (tot_lcore < core_cnt[n] * 2 + 1)
+		unsigned int tot_slave_lcore = rte_lcore_count() - 1;
+		if (tot_slave_lcore < core_cnt[n] * 2)
 			goto finish;
 
 		rte_atomic64_clear(&greads);
@@ -415,17 +425,19 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 		 */
 
 		/* Test only reader cases */
-		for (i = 1; i <= core_cnt[n]; i++)
+		for (i = 0; i < core_cnt[n]; i++)
 			rte_eal_remote_launch(test_rw_reader,
-					(void *)(uintptr_t)read_cnt, i);
+					(void *)(uintptr_t)read_cnt,
+					slave_core_ids[i]);
 
 		rte_eal_mp_wait_lcore();
 
 		start_coreid = i;
 		/* Test only writer cases */
-		for (; i <= core_cnt[n] * 2; i++)
+		for (; i < core_cnt[n] * 2; i++)
 			rte_eal_remote_launch(test_rw_writer,
-					(void *)((uintptr_t)start_coreid), i);
+					(void *)((uintptr_t)start_coreid),
+					slave_core_ids[i]);
 
 		rte_eal_mp_wait_lcore();
 
@@ -464,22 +476,26 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 			}
 		}
 
-		start_coreid = core_cnt[n] + 1;
+		start_coreid = core_cnt[n];
 
 		if (reader_faster) {
-			for (i = core_cnt[n] + 1; i <= core_cnt[n] * 2; i++)
+			for (i = core_cnt[n]; i < core_cnt[n] * 2; i++)
 				rte_eal_remote_launch(test_rw_writer,
-					(void *)((uintptr_t)start_coreid), i);
-			for (i = 1; i <= core_cnt[n]; i++)
+					(void *)((uintptr_t)start_coreid),
+					slave_core_ids[i]);
+			for (i = 0; i < core_cnt[n]; i++)
 				rte_eal_remote_launch(test_rw_reader,
-					(void *)(uintptr_t)read_cnt, i);
+					(void *)(uintptr_t)read_cnt,
+					slave_core_ids[i]);
 		} else {
-			for (i = 1; i <= core_cnt[n]; i++)
+			for (i = 0; i < core_cnt[n]; i++)
 				rte_eal_remote_launch(test_rw_reader,
-					(void *)(uintptr_t)read_cnt, i);
-			for (; i <= core_cnt[n] * 2; i++)
+					(void *)(uintptr_t)read_cnt,
+					slave_core_ids[i]);
+			for (; i < core_cnt[n] * 2; i++)
 				rte_eal_remote_launch(test_rw_writer,
-					(void *)((uintptr_t)start_coreid), i);
+					(void *)((uintptr_t)start_coreid),
+					slave_core_ids[i]);
 		}
 
 		rte_eal_mp_wait_lcore();
@@ -562,13 +578,19 @@ test_hash_readwrite_main(void)
 	 * writer threads for performance numbers.
 	 */
 	int use_htm, reader_faster;
+	unsigned int i = 0, core_id = 0;
 
-	if (rte_lcore_count() == 1) {
-		printf("More than one lcore is required "
+	if (rte_lcore_count() <= 2) {
+		printf("More than two lcores are required "
 			"to do read write test\n");
 		return 0;
 	}
 
+	RTE_LCORE_FOREACH_SLAVE(core_id) {
+		slave_core_ids[i] = core_id;
+		i++;
+	}
+
 	setlocale(LC_NUMERIC, "");
 
 	if (rte_tm_supported()) {
@@ -610,8 +632,6 @@ test_hash_readwrite_main(void)
 
 	printf("Results summary:\n");
 
-	int i;
-
 	printf("single read: %u\n", htm_results.single_read);
 	printf("single write: %u\n", htm_results.single_write);
 	for (i = 0; i < NUM_TEST; i++) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v2 4/7] hash: fix unnecessary code
  2018-09-21 17:17 ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Yipeng Wang
                     ` (2 preceding siblings ...)
  2018-09-21 17:17   ` [PATCH v2 3/7] test/hash: fix rw test with non-consecutive cores Yipeng Wang
@ 2018-09-21 17:17   ` Yipeng Wang
  2018-09-26 12:55     ` Bruce Richardson
  2018-09-21 17:17   ` [PATCH v2 5/7] hash: add extendable bucket feature Yipeng Wang
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-09-21 17:17 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

Since the depth-first search of the cuckoo path has been removed, the
macro that specifies the depth of the cuckoo search is no longer
needed.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index b43f467..fc0e5c2 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -104,8 +104,6 @@ const rte_hash_cmp_eq_t cmp_jump_table[NUM_KEY_CMP_CASES] = {
 
 #define LCORE_CACHE_SIZE		64
 
-#define RTE_HASH_MAX_PUSHES             100
-
 #define RTE_HASH_BFS_QUEUE_MAX_LEN       1000
 
 #define RTE_XABORT_CUCKOO_PATH_INVALIDED 0x4
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v2 5/7] hash: add extendable bucket feature
  2018-09-21 17:17 ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Yipeng Wang
                     ` (3 preceding siblings ...)
  2018-09-21 17:17   ` [PATCH v2 4/7] hash: fix unnecessary code Yipeng Wang
@ 2018-09-21 17:17   ` Yipeng Wang
  2018-09-27  4:23     ` Honnappa Nagarahalli
  2018-09-21 17:17   ` [PATCH v2 6/7] test/hash: implement extendable bucket hash test Yipeng Wang
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-09-21 17:17 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

In use cases where the hash table capacity needs to be guaranteed,
the extendable bucket feature can be used to hold extra keys in
linked lists when conflicts happen. This is a similar concept to
the extendable bucket hash table in the packet framework.

This commit adds the extendable bucket feature. The user can turn
it on or off through the extra flag field at table creation time
(see the example below).

The extendable bucket table is composed of buckets that can be
linked as a list to the current main table. When the extendable
bucket feature is enabled, the table utilization can always achieve
100%. Although keys ending up in the extendable buckets may have a
longer lookup time, they should be rare thanks to the cuckoo
algorithm.
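
For example, a table with the feature enabled can be created as follows
(all parameter values other than the new flag are illustrative):

#include <rte_hash.h>
#include <rte_jhash.h>

static struct rte_hash *
create_ext_table(void)
{
	struct rte_hash_parameters params = {
		.name = "ext_table_example",
		.entries = 1024,
		.key_len = 16,
		.hash_func = rte_jhash,
		.hash_func_init_val = 0,
		.socket_id = 0,
		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
	};

	/* with the flag set, inserting up to 'entries' keys is guaranteed
	 * to succeed; extra keys spill into linked extendable buckets
	 */
	return rte_hash_create(&params);
}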

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 326 +++++++++++++++++++++++++++++++++-----
 lib/librte_hash/rte_cuckoo_hash.h |   5 +
 lib/librte_hash/rte_hash.h        |   3 +
 3 files changed, 292 insertions(+), 42 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8..616900b 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -31,6 +31,10 @@
 #include "rte_hash.h"
 #include "rte_cuckoo_hash.h"
 
+#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)                            \
+	for (CURRENT_BKT = START_BUCKET;                                      \
+		CURRENT_BKT != NULL;                                          \
+		CURRENT_BKT = CURRENT_BKT->next)
 
 TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
 
@@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
 	return h;
 }
 
+static inline struct rte_hash_bucket *
+rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
+{
+	while (lst_bkt->next != NULL)
+		lst_bkt = lst_bkt->next;
+	return lst_bkt;
+}
+
 void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
 {
 	h->cmp_jump_table_idx = KEY_CUSTOM;
@@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	struct rte_tailq_entry *te = NULL;
 	struct rte_hash_list *hash_list;
 	struct rte_ring *r = NULL;
+	struct rte_ring *r_ext = NULL;
 	char hash_name[RTE_HASH_NAMESIZE];
 	void *k = NULL;
 	void *buckets = NULL;
+	void *buckets_ext = NULL;
 	char ring_name[RTE_RING_NAMESIZE];
+	char ext_ring_name[RTE_RING_NAMESIZE];
 	unsigned num_key_slots;
 	unsigned i;
 	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
+	unsigned int ext_table_support = 0;
 	unsigned int readwrite_concur_support = 0;
 
 	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
@@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		multi_writer_support = 1;
 	}
 
+	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
+		ext_table_support = 1;
+
 	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
 	if (multi_writer_support)
 		/*
@@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err;
 	}
 
+	const uint32_t num_buckets = rte_align32pow2(params->entries) /
+						RTE_HASH_BUCKET_ENTRIES;
+
+	snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
+								params->name);
+	/* Create ring for extendable buckets. */
+	if (ext_table_support) {
+		r_ext = rte_ring_create(ext_ring_name,
+				rte_align32pow2(num_buckets + 1),
+				params->socket_id, 0);
+
+		if (r_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+								"failed\n");
+			goto err;
+		}
+	}
+
 	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
 
 	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
@@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err_unlock;
 	}
 
-	const uint32_t num_buckets = rte_align32pow2(params->entries)
-					/ RTE_HASH_BUCKET_ENTRIES;
-
 	buckets = rte_zmalloc_socket(NULL,
 				num_buckets * sizeof(struct rte_hash_bucket),
 				RTE_CACHE_LINE_SIZE, params->socket_id);
 
 	if (buckets == NULL) {
-		RTE_LOG(ERR, HASH, "memory allocation failed\n");
+		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
 		goto err_unlock;
 	}
 
+	/* Allocate same number of extendable buckets */
+	if (ext_table_support) {
+		buckets_ext = rte_zmalloc_socket(NULL,
+				num_buckets * sizeof(struct rte_hash_bucket),
+				RTE_CACHE_LINE_SIZE, params->socket_id);
+		if (buckets_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+							"failed\n");
+			goto err_unlock;
+		}
+		/* Populate ext bkt ring. We reserve 0 similar to the
+		 * key-data slot, just in case in future we want to
+		 * use bucket index for the linked list and 0 means NULL
+		 * for next bucket
+		 */
+		for (i = 1; i <= num_buckets; i++)
+			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+	}
+
 	const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params->key_len;
 	const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;
 
@@ -262,6 +315,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->num_buckets = num_buckets;
 	h->bucket_bitmask = h->num_buckets - 1;
 	h->buckets = buckets;
+	h->buckets_ext = buckets_ext;
+	h->free_ext_bkts = r_ext;
 	h->hash_func = (params->hash_func == NULL) ?
 		default_hash_func : params->hash_func;
 	h->key_store = k;
@@ -269,6 +324,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->hw_trans_mem_support = hw_trans_mem_support;
 	h->multi_writer_support = multi_writer_support;
 	h->readwrite_concur_support = readwrite_concur_support;
+	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
@@ -304,9 +360,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
 err:
 	rte_ring_free(r);
+	rte_ring_free(r_ext);
 	rte_free(te);
 	rte_free(h);
 	rte_free(buckets);
+	rte_free(buckets_ext);
 	rte_free(k);
 	return NULL;
 }
@@ -344,6 +402,7 @@ rte_hash_free(struct rte_hash *h)
 		rte_free(h->readwrite_lock);
 	}
 	rte_ring_free(h->free_slots);
+	rte_ring_free(h->free_ext_bkts);
 	rte_free(h->key_store);
 	rte_free(h->buckets);
 	rte_free(h);
@@ -403,7 +462,6 @@ __hash_rw_writer_lock(const struct rte_hash *h)
 		rte_rwlock_write_lock(h->readwrite_lock);
 }
 
-
 static inline void
 __hash_rw_reader_lock(const struct rte_hash *h)
 {
@@ -448,6 +506,14 @@ rte_hash_reset(struct rte_hash *h)
 	while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
 		rte_pause();
 
+	/* clear free extendable bucket ring and memory */
+	if (h->ext_table_support) {
+		memset(h->buckets_ext, 0, h->num_buckets *
+						sizeof(struct rte_hash_bucket));
+		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
+			rte_pause();
+	}
+
 	/* Repopulate the free slots ring. Entry zero is reserved for key misses */
 	if (h->multi_writer_support)
 		tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) *
@@ -458,6 +524,12 @@ rte_hash_reset(struct rte_hash *h)
 	for (i = 1; i < tot_ring_cnt + 1; i++)
 		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
 
+	/* Repopulate the free ext bkt ring. */
+	if (h->ext_table_support)
+		for (i = 1; i < h->num_buckets + 1; i++)
+			rte_ring_sp_enqueue(h->free_ext_bkts,
+						(void *)((uintptr_t) i));
+
 	if (h->multi_writer_support) {
 		/* Reset local caches per lcore */
 		for (i = 0; i < RTE_MAX_LCORE; i++)
@@ -524,24 +596,27 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		int32_t *ret_val)
 {
 	unsigned int i;
-	struct rte_hash_bucket *cur_bkt = prim_bkt;
+	struct rte_hash_bucket *cur_bkt;
 	int32_t ret;
 
 	__hash_rw_writer_lock(h);
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	/* Insert new entry if there is room in the primary
@@ -580,7 +655,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
-	struct rte_hash_bucket *cur_bkt = bkt;
+	struct rte_hash_bucket *cur_bkt;
 	struct queue_node *prev_node, *curr_node = leaf;
 	struct rte_hash_bucket *prev_bkt, *curr_bkt = leaf->bkt;
 	uint32_t prev_slot, curr_slot = leaf_slot;
@@ -597,18 +672,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
 
-	ret = search_and_update(h, data, key, alt_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	while (likely(curr_node->prev != NULL)) {
@@ -711,15 +788,18 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	hash_sig_t alt_hash;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
-	struct rte_hash_bucket *prim_bkt, *sec_bkt;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
 	void *slot_id = NULL;
-	uint32_t new_idx;
+	void *ext_bkt_id = NULL;
+	uint32_t new_idx, bkt_id;
 	int ret;
 	unsigned n_slots;
 	unsigned lcore_id;
+	unsigned int i;
 	struct lcore_cache *cached_free_slots = NULL;
 	int32_t ret_val;
+	struct rte_hash_bucket *last;
 
 	prim_bucket_idx = sig & h->bucket_bitmask;
 	prim_bkt = &h->buckets[prim_bucket_idx];
@@ -739,10 +819,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Check if key is already inserted in secondary location */
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_writer_unlock(h);
 
@@ -808,10 +890,71 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
-	} else {
+	}
+
+	/* if ext table not enabled, we failed the insertion */
+	if (!h->ext_table_support) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret;
 	}
+
+	/* Now we need to go through the extendable bucket. Protection is needed
+	 * to protect all extendable bucket processes.
+	 */
+	__hash_rw_writer_lock(h);
+	/* We check for duplicates again since could be inserted before the lock */
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	if (ret != -1) {
+		enqueue_slot_back(h, cached_free_slots, slot_id);
+		goto failure;
+	}
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			enqueue_slot_back(h, cached_free_slots, slot_id);
+			goto failure;
+		}
+	}
+
+	/* Search extendable buckets to find an empty entry to insert. */
+	struct rte_hash_bucket *next_bkt = sec_bkt->next;
+	FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+			/* Check if slot is available */
+			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
+				cur_bkt->sig_current[i] = alt_hash;
+				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->key_idx[i] = new_idx;
+				__hash_rw_writer_unlock(h);
+				return new_idx - 1;
+			}
+		}
+	}
+
+	/* Failed to get an empty entry from extendable buckets. Link a new
+	 * extendable bucket. We first get a free bucket from ring.
+	 */
+	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+		ret = -ENOSPC;
+		goto failure;
+	}
+
+	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
+	/* Use the first location of the new bucket */
+	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
+	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
+	/* Link the new bucket to sec bucket linked list */
+	last = rte_hash_get_last_bkt(sec_bkt);
+	last->next = &h->buckets_ext[bkt_id];
+	__hash_rw_writer_unlock(h);
+	return new_idx - 1;
+
+failure:
+	__hash_rw_writer_unlock(h);
+	return ret;
+
 }
 
 int32_t
@@ -890,7 +1033,7 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
+	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
 
 	bucket_idx = sig & h->bucket_bitmask;
@@ -910,10 +1053,12 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 	bkt = &h->buckets[bucket_idx];
 
 	/* Check if key is in secondary location */
-	ret = search_one_bucket(h, key, alt_hash, data, bkt);
-	if (ret != -1) {
-		__hash_rw_reader_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, bkt) {
+		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		if (ret != -1) {
+			__hash_rw_reader_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_reader_unlock(h);
 	return -ENOENT;
@@ -1015,15 +1160,17 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
-	int32_t ret;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt;
+	struct rte_hash_bucket *cur_bkt, *prev_bkt, *next_bkt;
+	int32_t ret, i;
+	struct rte_hash_bucket *tobe_removed_bkt = NULL;
 
 	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	prim_bkt = &h->buckets[bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, bkt, sig);
+	ret = search_and_remove(h, key, prim_bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
@@ -1032,17 +1179,51 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	/* Calculate secondary hash */
 	alt_hash = rte_hash_secondary_hash(sig);
 	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[bucket_idx];
 
 	/* look for key in secondary bucket */
-	ret = search_and_remove(h, key, bkt, alt_hash);
+	ret = search_and_remove(h, key, sec_bkt, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
 	}
 
+	/* Not in main table, we need to search ext buckets */
+	if (h->ext_table_support) {
+		next_bkt = sec_bkt->next;
+		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+			ret = search_and_remove(h, key, cur_bkt, alt_hash);
+			if (ret != -1)
+				goto return_bkt;
+		}
+	}
+
 	__hash_rw_writer_unlock(h);
 	return -ENOENT;
+
+/* Search extendable buckets to see if any empty bucket needs to be recycled */
+return_bkt:
+	for (cur_bkt = sec_bkt->next, prev_bkt = sec_bkt; cur_bkt != NULL;
+			prev_bkt = cur_bkt, cur_bkt = cur_bkt->next) {
+		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+			if (cur_bkt->key_idx[i] != EMPTY_SLOT)
+				break;
+		}
+		if (i == RTE_HASH_BUCKET_ENTRIES) {
+			prev_bkt->next = cur_bkt->next;
+			cur_bkt->next = NULL;
+			tobe_removed_bkt = cur_bkt;
+			break;
+		}
+	}
+
+	__hash_rw_writer_unlock(h);
+
+	if (tobe_removed_bkt) {
+		uint32_t index = tobe_removed_bkt - h->buckets_ext + 1;
+		rte_ring_mp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+	}
+	return ret;
 }
 
 int32_t
@@ -1143,12 +1324,14 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 {
 	uint64_t hits = 0;
 	int32_t i;
+	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
+	struct rte_hash_bucket *cur_bkt, *next_bkt;
 
 	/* Prefetch first keys */
 	for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++)
@@ -1266,6 +1449,34 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		continue;
 	}
 
+	/* all found, do not need to go through ext bkt */
+	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
+		if (hit_mask != NULL)
+			*hit_mask = hits;
+		__hash_rw_reader_unlock(h);
+		return;
+	}
+
+	/* need to check ext buckets for match */
+	for (i = 0; i < num_keys; i++) {
+		if ((hits & (1ULL << i)) != 0)
+			continue;
+		next_bkt = secondary_bkt[i]->next;
+		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+			if (data != NULL)
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], &data[i], cur_bkt);
+			else
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], NULL, cur_bkt);
+			if (ret != -1) {
+				positions[i] = ret;
+				hits |= 1ULL << i;
+				break;
+			}
+		}
+	}
+
 	__hash_rw_reader_unlock(h);
 
 	if (hit_mask != NULL)
@@ -1308,10 +1519,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 
 	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
 
-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
+	const uint32_t total_entries_main = h->num_buckets *
+							RTE_HASH_BUCKET_ENTRIES;
+	const uint32_t total_entries = total_entries_main << 1;
+
 	/* Out of bounds */
-	if (*next >= total_entries)
-		return -ENOENT;
+	if (*next >= total_entries_main)
+		goto extend_table;
 
 	/* Calculate bucket and index of current iterator */
 	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
@@ -1321,8 +1535,8 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
-		if (*next == total_entries)
-			return -ENOENT;
+		if (*next == total_entries_main)
+			goto extend_table;
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
@@ -1341,4 +1555,32 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	(*next)++;
 
 	return position - 1;
+
+extend_table:
+	/* Out of bounds */
+	if (*next >= total_entries || !h->ext_table_support)
+		return -ENOENT;
+
+	bucket_idx = (*next - total_entries_main) / RTE_HASH_BUCKET_ENTRIES;
+	idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+
+	while (h->buckets_ext[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+		(*next)++;
+		if (*next == total_entries)
+			return -ENOENT;
+		bucket_idx = (*next - total_entries_main) /
+						RTE_HASH_BUCKET_ENTRIES;
+		idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+	}
+	/* Get position of entry in key table */
+	position = h->buckets_ext[bucket_idx].key_idx[idx];
+	next_key = (struct rte_hash_key *) ((char *)h->key_store +
+				position * h->key_entry_size);
+	/* Return key and data */
+	*key = next_key->key;
+	*data = next_key->pdata;
+
+	/* Increment iterator */
+	(*next)++;
+	return position - 1;
 }
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fc0e5c2..e601520 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -142,6 +142,8 @@ struct rte_hash_bucket {
 	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
 
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
+
+	void *next;
 } __rte_cache_aligned;
 
 /** A hash table structure. */
@@ -166,6 +168,7 @@ struct rte_hash {
 	/**< If multi-writer support is enabled. */
 	uint8_t readwrite_concur_support;
 	/**< If read-write concurrency support is enabled */
+	uint8_t ext_table_support;     /**< Enable extendable bucket table */
 	rte_hash_function hash_func;    /**< Function used to calculate hash. */
 	uint32_t hash_func_init_val;    /**< Init value used by hash_func. */
 	rte_hash_cmp_eq_t rte_hash_custom_cmp_eq;
@@ -184,6 +187,8 @@ struct rte_hash {
 	 * to the key table.
 	 */
 	rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
+	struct rte_hash_bucket *buckets_ext; /**< Extra buckets array */
+	struct rte_ring *free_ext_bkts; /**< Ring of indexes of free buckets */
 } __rte_cache_aligned;
 
 struct queue_node {
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d931..11d8e28 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -37,6 +37,9 @@ extern "C" {
 /** Flag to support reader writer concurrency */
 #define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
 
+/** Flag to indicate the extendable bucket table feature should be used */
+#define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
+
 /** Signature of key that is stored internally. */
 typedef uint32_t hash_sig_t;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v2 6/7] test/hash: implement extendable bucket hash test
  2018-09-21 17:17 ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Yipeng Wang
                     ` (4 preceding siblings ...)
  2018-09-21 17:17   ` [PATCH v2 5/7] hash: add extendable bucket feature Yipeng Wang
@ 2018-09-21 17:17   ` Yipeng Wang
  2018-09-27  4:24     ` Honnappa Nagarahalli
  2018-09-21 17:17   ` [PATCH v2 7/7] hash: use partial-key hashing Yipeng Wang
                     ` (2 subsequent siblings)
  8 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-09-21 17:17 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

This commit changes the current rte_hash unit test to
test the extendable table feature and performance.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 test/test/test_hash.c      | 151 +++++++++++++++++++++++++++++++++++++++++++--
 test/test/test_hash_perf.c | 114 +++++++++++++++++++++++++---------
 2 files changed, 230 insertions(+), 35 deletions(-)

diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd..c97095f 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -660,6 +660,116 @@ static int test_full_bucket(void)
 	return 0;
 }
 
+/*
+ * Similar to the test above (full bucket test), but for extendable buckets.
+ */
+static int test_extendable_bucket(void)
+{
+	struct rte_hash_parameters params_pseudo_hash = {
+		.name = "test5",
+		.entries = 64,
+		.key_len = sizeof(struct flow_key), /* 13 */
+		.hash_func = pseudo_hash,
+		.hash_func_init_val = 0,
+		.socket_id = 0,
+		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE
+	};
+	struct rte_hash *handle;
+	int pos[64];
+	int expected_pos[64];
+	unsigned int i;
+	struct flow_key rand_keys[64];
+
+	for (i = 0; i < 64; i++) {
+		rand_keys[i].port_dst = i;
+		rand_keys[i].port_src = i+1;
+	}
+
+	handle = rte_hash_create(&params_pseudo_hash);
+	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
+
+	/* Fill bucket */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add - update */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Delete 1 key, check other keys are still found */
+	pos[35] = rte_hash_del_key(handle, &rand_keys[35]);
+	print_key_info("Del", &rand_keys[35], pos[35]);
+	RETURN_IF_ERROR(pos[35] != expected_pos[35],
+			"failed to delete key (pos[1]=%d)", pos[35]);
+	pos[20] = rte_hash_lookup(handle, &rand_keys[20]);
+	print_key_info("Lkp", &rand_keys[20], pos[20]);
+	RETURN_IF_ERROR(pos[20] != expected_pos[20],
+			"failed lookup after deleting key from same bucket "
+			"(pos[20]=%d)", pos[20]);
+
+	/* Go back to previous state */
+	pos[35] = rte_hash_add_key(handle, &rand_keys[35]);
+	print_key_info("Add", &rand_keys[35], pos[35]);
+	expected_pos[35] = pos[35];
+	RETURN_IF_ERROR(pos[35] < 0, "failed to add key (pos[1]=%d)", pos[35]);
+
+	/* Delete */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_del_key(handle, &rand_keys[i]);
+		print_key_info("Del", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to delete key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != -ENOENT,
+			"fail: found non-existent key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add again */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	rte_hash_free(handle);
+
+	/* Cover the NULL case. */
+	rte_hash_free(0);
+	return 0;
+}
+
 /******************************************************************************/
 static int
 fbk_hash_unit_test(void)
@@ -1096,7 +1206,7 @@ test_hash_creation_with_good_parameters(void)
  * Test to see the average table utilization (entries added/max entries)
  * before hitting a random entry that cannot be added
  */
-static int test_average_table_utilization(void)
+static int test_average_table_utilization(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	uint8_t simple_key[MAX_KEYSIZE];
@@ -1107,12 +1217,23 @@ static int test_average_table_utilization(void)
 
 	printf("\n# Running test to determine average utilization"
 	       "\n  before adding elements begins to fail\n");
+	if (ext_table)
+		printf("ext table is enabled\n");
+	else
+		printf("ext table is disabled\n");
+
 	printf("Measuring performance, please wait");
 	fflush(stdout);
 	ut_params.entries = 1 << 16;
 	ut_params.name = "test_average_utilization";
 	ut_params.hash_func = rte_jhash;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
+
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
 	for (j = 0; j < ITERATIONS; j++) {
@@ -1161,7 +1282,7 @@ static int test_average_table_utilization(void)
 }
 
 #define NUM_ENTRIES 256
-static int test_hash_iteration(void)
+static int test_hash_iteration(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	unsigned i;
@@ -1177,6 +1298,11 @@ static int test_hash_iteration(void)
 	ut_params.name = "test_hash_iteration";
 	ut_params.hash_func = rte_jhash;
 	ut_params.key_len = 16;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
@@ -1186,8 +1312,13 @@ static int test_hash_iteration(void)
 		for (i = 0; i < ut_params.key_len; i++)
 			keys[added_keys][i] = rte_rand() % 255;
 		ret = rte_hash_add_key_data(handle, keys[added_keys], data[added_keys]);
-		if (ret < 0)
+		if (ret < 0) {
+			if (ext_table) {
+				printf("Insertion failed for ext table\n");
+				goto err;
+			}
 			break;
+		}
 	}
 
 	/* Iterate through the hash table */
@@ -1474,6 +1605,8 @@ test_hash(void)
 		return -1;
 	if (test_full_bucket() < 0)
 		return -1;
+	if (test_extendable_bucket() < 0)
+		return -1;
 
 	if (test_fbk_hash_find_existing() < 0)
 		return -1;
@@ -1483,9 +1616,17 @@ test_hash(void)
 		return -1;
 	if (test_hash_creation_with_good_parameters() < 0)
 		return -1;
-	if (test_average_table_utilization() < 0)
+
+	/* ext table disabled */
+	if (test_average_table_utilization(0) < 0)
+		return -1;
+	if (test_hash_iteration(0) < 0)
+		return -1;
+
+	/* ext table enabled */
+	if (test_average_table_utilization(1) < 0)
 		return -1;
-	if (test_hash_iteration() < 0)
+	if (test_hash_iteration(1) < 0)
 		return -1;
 
 	run_hash_func_tests();
diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 4d00c20..d169cd0 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -18,7 +18,8 @@
 #include "test.h"
 
 #define MAX_ENTRIES (1 << 19)
-#define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
+#define KEYS_TO_ADD (MAX_ENTRIES)
+#define ADD_PERCENT 0.75 /* 75% table utilization */
 #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
 #define BUCKET_SIZE 8
 #define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)
@@ -77,7 +78,7 @@ static struct rte_hash_parameters ut_params = {
 
 static int
 create_table(unsigned int with_data, unsigned int table_index,
-		unsigned int with_locks)
+		unsigned int with_locks, unsigned int ext)
 {
 	char name[RTE_HASH_NAMESIZE];
 
@@ -95,6 +96,9 @@ create_table(unsigned int with_data, unsigned int table_index,
 	else
 		ut_params.extra_flag = 0;
 
+	if (ext)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	ut_params.name = name;
 	ut_params.key_len = hashtest_key_lens[table_index];
 	ut_params.socket_id = rte_socket_id();
@@ -116,15 +120,21 @@ create_table(unsigned int with_data, unsigned int table_index,
 
 /* Shuffle the keys that have been added, so lookups will be totally random */
 static void
-shuffle_input_keys(unsigned table_index)
+shuffle_input_keys(unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	uint32_t swap_idx;
 	uint8_t temp_key[MAX_KEYSIZE];
 	hash_sig_t temp_signature;
 	int32_t temp_position;
+	unsigned int keys_to_add;
+
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = KEYS_TO_ADD - 1; i > 0; i--) {
+	for (i = keys_to_add - 1; i > 0; i--) {
 		swap_idx = rte_rand() % i;
 
 		memcpy(temp_key, keys[i], hashtest_key_lens[table_index]);
@@ -146,14 +156,20 @@ shuffle_input_keys(unsigned table_index)
  * ALL can fit in hash table (no errors)
  */
 static int
-get_input_keys(unsigned with_pushes, unsigned table_index)
+get_input_keys(unsigned int with_pushes, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j;
 	unsigned bucket_idx, incr, success = 1;
 	uint8_t k = 0;
 	int32_t ret;
 	const uint32_t bucket_bitmask = NUM_BUCKETS - 1;
+	unsigned int keys_to_add;
 
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 	/* Reset all arrays */
 	for (i = 0; i < MAX_ENTRIES; i++)
 		slot_taken[i] = 0;
@@ -170,7 +186,7 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 	 * Regardless a key has been added correctly or not (success),
 	 * the next one to try will be increased by 1.
 	 */
-	for (i = 0; i < KEYS_TO_ADD;) {
+	for (i = 0; i < keys_to_add;) {
 		incr = 0;
 		if (i != 0) {
 			keys[i][0] = ++k;
@@ -234,14 +250,20 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 }
 
 static int
-timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_adds(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *data;
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		data = (void *) ((uintptr_t) signatures[i]);
 		if (with_hash && with_data) {
 			ret = rte_hash_add_key_with_hash_data(h[table_index],
@@ -283,22 +305,31 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][ADD][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][ADD][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
 
 static int
-timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_lookups(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i, j;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *ret_data;
 	void *expected_data;
 	int32_t ret;
-
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD; j++) {
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
+	for (i = 0; i < num_lookups / keys_to_add; i++) {
+		for (j = 0; j < keys_to_add; j++) {
 			if (with_hash && with_data) {
 				ret = rte_hash_lookup_with_hash_data(h[table_index],
 							(const void *) keys[j],
@@ -351,13 +382,14 @@ timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_lookups_multi(unsigned with_data, unsigned table_index)
+timed_lookups_multi(unsigned int with_data, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j, k;
 	int32_t positions_burst[BURST_SIZE];
@@ -366,11 +398,20 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	void *ret_data[BURST_SIZE];
 	uint64_t hit_mask;
 	int ret;
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
 
 	const uint64_t start_tsc = rte_rdtsc();
 
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD/BURST_SIZE; j++) {
+	for (i = 0; i < num_lookups/keys_to_add; i++) {
+		for (j = 0; j < keys_to_add/BURST_SIZE; j++) {
 			for (k = 0; k < BURST_SIZE; k++)
 				keys_burst[k] = keys[j * BURST_SIZE + k];
 			if (with_data) {
@@ -418,19 +459,25 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_deletes(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		/* There are no delete functions with data, so just call two functions */
 		if (with_hash)
 			ret = rte_hash_del_key_with_hash(h[table_index],
@@ -450,7 +497,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][DELETE][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][DELETE][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
@@ -468,7 +515,8 @@ reset_table(unsigned table_index)
 }
 
 static int
-run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
+run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks,
+						unsigned int ext)
 {
 	unsigned i, j, with_data, with_hash;
 
@@ -477,25 +525,25 @@ run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
 
 	for (with_data = 0; with_data <= 1; with_data++) {
 		for (i = 0; i < NUM_KEYSIZES; i++) {
-			if (create_table(with_data, i, with_locks) < 0)
+			if (create_table(with_data, i, with_locks, ext) < 0)
 				return -1;
 
-			if (get_input_keys(with_pushes, i) < 0)
+			if (get_input_keys(with_pushes, i, ext) < 0)
 				return -1;
 			for (with_hash = 0; with_hash <= 1; with_hash++) {
-				if (timed_adds(with_hash, with_data, i) < 0)
+				if (timed_adds(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				for (j = 0; j < NUM_SHUFFLES; j++)
-					shuffle_input_keys(i);
+					shuffle_input_keys(i, ext);
 
-				if (timed_lookups(with_hash, with_data, i) < 0)
+				if (timed_lookups(with_hash, with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_lookups_multi(with_data, i) < 0)
+				if (timed_lookups_multi(with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_deletes(with_hash, with_data, i) < 0)
+				if (timed_deletes(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				/* Print a dot to show progress on operations */
@@ -631,10 +679,16 @@ test_hash_perf(void)
 				printf("\nALL ELEMENTS IN PRIMARY LOCATION\n");
 			else
 				printf("\nELEMENTS IN PRIMARY OR SECONDARY LOCATION\n");
-			if (run_all_tbl_perf_tests(with_pushes, with_locks) < 0)
+			if (run_all_tbl_perf_tests(with_pushes, with_locks, 0) < 0)
 				return -1;
 		}
 	}
+
+	printf("\n EXTENDABLE BUCKETS PERFORMANCE\n");
+
+	if (run_all_tbl_perf_tests(1, 0, 1) < 0)
+		return -1;
+
 	if (fbk_hash_perf_test() < 0)
 		return -1;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v2 7/7] hash: use partial-key hashing
  2018-09-21 17:17 ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Yipeng Wang
                     ` (5 preceding siblings ...)
  2018-09-21 17:17   ` [PATCH v2 6/7] test/hash: implement extendable bucket hash test Yipeng Wang
@ 2018-09-21 17:17   ` Yipeng Wang
  2018-09-27  4:24     ` Honnappa Nagarahalli
  2018-09-26 12:57   ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Bruce Richardson
  2018-09-27  4:23   ` Honnappa Nagarahalli
  8 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-09-21 17:17 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, michel, honnappa.nagarahalli

This commit changes the hashing mechanism to "partial-key
hashing" to calculate the bucket index and the signature of a key.

This is proposed in Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". Basically the idea is to use "xor" to
derive the alternative bucket from the current bucket index and
the signature.

With "partial-key hashing", the bucket memory requirement is
reduced from two cache lines to one cache line, which improves
memory efficiency and thus lookup speed.
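
To illustrate the scheme, here is a minimal standalone sketch of the
index derivation (not the library code; get_indexes() and num_buckets
are hypothetical names, and num_buckets is assumed to be a power of
two):

#include <stdint.h>

static void
get_indexes(uint32_t hash, uint32_t num_buckets,
	    uint32_t *prim, uint32_t *sec, uint16_t *sig)
{
	uint32_t mask = num_buckets - 1;

	*sig  = hash >> 16;            /* 16-bit signature stored in table */
	*prim = hash & mask;           /* primary bucket from the low bits */
	*sec  = (*prim ^ *sig) & mask; /* alternative bucket via XOR */

	/*
	 * The same XOR applied to the secondary index recovers the
	 * primary one, ((*sec ^ *sig) & mask) == *prim, so a displaced
	 * entry only needs its current bucket index and its stored
	 * signature to find its other possible location.
	 */
}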

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 228 ++++++++++++++++++--------------------
 lib/librte_hash/rte_cuckoo_hash.h |   6 +-
 lib/librte_hash/rte_hash.h        |   5 +-
 3 files changed, 114 insertions(+), 125 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 616900b..5108ff0 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -90,6 +90,27 @@ rte_hash_cmp_eq(const void *key1, const void *key2, const struct rte_hash *h)
 		return cmp_jump_table[h->cmp_jump_table_idx](key1, key2, h->key_len);
 }
 
+static inline void
+get_buckets_index(const struct rte_hash *h, const hash_sig_t hash,
+		uint32_t *prim_bkt, uint32_t *sec_bkt, uint16_t *sig)
+{
+	/*
+	 * We use higher 16 bits of hash as the signature value stored in table.
+	 * We use the lower bits for the primary bucket
+	 * location. Then we XOR primary bucket location and the signature
+	 * to get the secondary bucket location. This is same as
+	 * proposed in Bin Fan, et al's paper
+	 * "MemC3: Compact and Concurrent MemCache with Dumber Caching and
+	 * Smarter Hashing". The benefit to use
+	 * XOR is that one could derive the alternative bucket location
+	 * by only using the current bucket location and the signature.
+	 */
+	*sig = hash >> 16;
+
+	*prim_bkt = hash & h->bucket_bitmask;
+	*sec_bkt =  (*prim_bkt ^ *sig) & h->bucket_bitmask;
+}
+
 struct rte_hash *
 rte_hash_create(const struct rte_hash_parameters *params)
 {
@@ -327,9 +348,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
-		h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
-	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
 		h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
 	else
 #endif
@@ -416,18 +435,6 @@ rte_hash_hash(const struct rte_hash *h, const void *key)
 	return h->hash_func(key, h->key_len, h->hash_func_init_val);
 }
 
-/* Calc the secondary hash value from the primary hash value of a given key */
-static inline hash_sig_t
-rte_hash_secondary_hash(const hash_sig_t primary_hash)
-{
-	static const unsigned all_bits_shift = 12;
-	static const unsigned alt_bits_xor = 0x5bd1e995;
-
-	uint32_t tag = primary_hash >> all_bits_shift;
-
-	return primary_hash ^ ((tag + 1) * alt_bits_xor);
-}
-
 int32_t
 rte_hash_count(const struct rte_hash *h)
 {
@@ -558,14 +565,13 @@ enqueue_slot_back(const struct rte_hash *h,
 /* Search a key from bucket and update its data */
 static inline int32_t
 search_and_update(const struct rte_hash *h, void *data, const void *key,
-	struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
+	struct rte_hash_bucket *bkt, uint16_t sig)
 {
 	int i;
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		if (bkt->sig_current[i] == sig &&
-				bkt->sig_alt[i] == alt_hash) {
+		if (bkt->sig_current[i] == sig) {
 			k = (struct rte_hash_key *) ((char *)keys +
 					bkt->key_idx[i] * h->key_entry_size);
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
@@ -592,7 +598,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		struct rte_hash_bucket *prim_bkt,
 		struct rte_hash_bucket *sec_bkt,
 		const struct rte_hash_key *key, void *data,
-		hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+		uint16_t sig, uint32_t new_idx,
 		int32_t *ret_val)
 {
 	unsigned int i;
@@ -603,7 +609,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -611,7 +617,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -626,7 +632,6 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		/* Check if slot is available */
 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
 			prim_bkt->sig_current[i] = sig;
-			prim_bkt->sig_alt[i] = alt_hash;
 			prim_bkt->key_idx[i] = new_idx;
 			break;
 		}
@@ -651,7 +656,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *alt_bkt,
 			const struct rte_hash_key *key, void *data,
 			struct queue_node *leaf, uint32_t leaf_slot,
-			hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+			uint16_t sig, uint32_t new_idx,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
@@ -672,7 +677,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -680,7 +685,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -693,8 +698,9 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		prev_bkt = prev_node->bkt;
 		prev_slot = curr_node->prev_slot;
 
-		prev_alt_bkt_idx =
-			prev_bkt->sig_alt[prev_slot] & h->bucket_bitmask;
+		prev_alt_bkt_idx = (prev_node->cur_bkt_idx ^
+				prev_bkt->sig_current[prev_slot]) &
+				h->bucket_bitmask;
 
 		if (unlikely(&h->buckets[prev_alt_bkt_idx]
 				!= curr_bkt)) {
@@ -708,10 +714,8 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		 * Cuckoo insert to move elements back to its
 		 * primary bucket if available
 		 */
-		curr_bkt->sig_alt[curr_slot] =
-			 prev_bkt->sig_current[prev_slot];
 		curr_bkt->sig_current[curr_slot] =
-			prev_bkt->sig_alt[prev_slot];
+			prev_bkt->sig_current[prev_slot];
 		curr_bkt->key_idx[curr_slot] =
 			prev_bkt->key_idx[prev_slot];
 
@@ -721,7 +725,6 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	curr_bkt->sig_current[curr_slot] = sig;
-	curr_bkt->sig_alt[curr_slot] = alt_hash;
 	curr_bkt->key_idx[curr_slot] = new_idx;
 
 	__hash_rw_writer_unlock(h);
@@ -739,39 +742,44 @@ rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *bkt,
 			struct rte_hash_bucket *sec_bkt,
 			const struct rte_hash_key *key, void *data,
-			hash_sig_t sig, hash_sig_t alt_hash,
+			uint16_t sig, uint32_t bucket_idx,
 			uint32_t new_idx, int32_t *ret_val)
 {
 	unsigned int i;
 	struct queue_node queue[RTE_HASH_BFS_QUEUE_MAX_LEN];
 	struct queue_node *tail, *head;
 	struct rte_hash_bucket *curr_bkt, *alt_bkt;
+	uint32_t cur_idx, alt_idx;
 
 	tail = queue;
 	head = queue + 1;
 	tail->bkt = bkt;
 	tail->prev = NULL;
 	tail->prev_slot = -1;
+	tail->cur_bkt_idx = bucket_idx;
 
 	/* Cuckoo bfs Search */
 	while (likely(tail != head && head <
 					queue + RTE_HASH_BFS_QUEUE_MAX_LEN -
 					RTE_HASH_BUCKET_ENTRIES)) {
 		curr_bkt = tail->bkt;
+		cur_idx = tail->cur_bkt_idx;
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			if (curr_bkt->key_idx[i] == EMPTY_SLOT) {
 				int32_t ret = rte_hash_cuckoo_move_insert_mw(h,
 						bkt, sec_bkt, key, data,
-						tail, i, sig, alt_hash,
+						tail, i, sig,
 						new_idx, ret_val);
 				if (likely(ret != -1))
 					return ret;
 			}
 
 			/* Enqueue new node and keep prev node info */
-			alt_bkt = &(h->buckets[curr_bkt->sig_alt[i]
-						    & h->bucket_bitmask]);
+			alt_idx = (curr_bkt->sig_current[i] ^ cur_idx) &
+							h->bucket_bitmask;
+			alt_bkt = &(h->buckets[alt_idx]);
 			head->bkt = alt_bkt;
+			head->cur_bkt_idx = alt_idx;
 			head->prev = tail;
 			head->prev_slot = i;
 			head++;
@@ -786,7 +794,7 @@ static inline int32_t
 __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig, void *data)
 {
-	hash_sig_t alt_hash;
+	uint16_t short_sig;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
@@ -801,18 +809,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	int32_t ret_val;
 	struct rte_hash_bucket *last;
 
-	prim_bucket_idx = sig & h->bucket_bitmask;
+	get_buckets_index(h, sig, &prim_bucket_idx, &sec_bucket_idx, &short_sig);
 	prim_bkt = &h->buckets[prim_bucket_idx];
-	rte_prefetch0(prim_bkt);
-
-	alt_hash = rte_hash_secondary_hash(sig);
-	sec_bucket_idx = alt_hash & h->bucket_bitmask;
 	sec_bkt = &h->buckets[sec_bucket_idx];
+	rte_prefetch0(prim_bkt);
 	rte_prefetch0(sec_bkt);
 
 	/* Check if key is already inserted in primary location */
 	__hash_rw_writer_lock(h);
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
@@ -820,12 +825,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Check if key is already inserted in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			return ret;
 		}
 	}
+
 	__hash_rw_writer_unlock(h);
 
 	/* Did not find a match, so get a new slot for storing the new key */
@@ -863,7 +869,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+					short_sig, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -873,7 +879,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Primary bucket full, need to make space for new entry */
 	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+				short_sig, prim_bucket_idx, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -883,7 +889,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Also search secondary bucket to get better occupancy */
 	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
-					alt_hash, sig, new_idx, &ret_val);
+				short_sig, sec_bucket_idx, new_idx, &ret_val);
 
 	if (ret == 0)
 		return new_idx - 1;
@@ -903,14 +909,14 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	 */
 	__hash_rw_writer_lock(h);
 	/* We check for duplicates again since could be inserted before the lock */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		goto failure;
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			enqueue_slot_back(h, cached_free_slots, slot_id);
 			goto failure;
@@ -923,8 +929,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
-				cur_bkt->sig_current[i] = alt_hash;
-				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->sig_current[i] = short_sig;
 				cur_bkt->key_idx[i] = new_idx;
 				__hash_rw_writer_unlock(h);
 				return new_idx - 1;
@@ -942,8 +947,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
-	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
-	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
 	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
@@ -1002,7 +1006,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
 
 /* Search one bucket to find the match key */
 static inline int32_t
-search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
+search_one_bucket(const struct rte_hash *h, const void *key, uint16_t sig,
 			void **data, const struct rte_hash_bucket *bkt)
 {
 	int i;
@@ -1031,30 +1035,28 @@ static inline int32_t
 __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 					hash_sig_t sig, void **data)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	get_buckets_index(h, sig, &prim_bucket_idx, &sec_bucket_idx, &short_sig);
+	bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_reader_lock(h);
 
 	/* Check if key is in primary location */
-	ret = search_one_bucket(h, key, sig, data, bkt);
+	ret = search_one_bucket(h, key, short_sig, data, bkt);
 	if (ret != -1) {
 		__hash_rw_reader_unlock(h);
 		return ret;
 	}
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	bkt = &h->buckets[sec_bucket_idx];
 
 	/* Check if key is in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, bkt) {
-		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
 		if (ret != -1) {
 			__hash_rw_reader_unlock(h);
 			return ret;
@@ -1101,7 +1103,6 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	struct lcore_cache *cached_free_slots;
 
 	bkt->sig_current[i] = NULL_SIGNATURE;
-	bkt->sig_alt[i] = NULL_SIGNATURE;
 	if (h->multi_writer_support) {
 		lcore_id = rte_lcore_id();
 		cached_free_slots = &h->local_free_slots[lcore_id];
@@ -1126,7 +1127,7 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig)
+			struct rte_hash_bucket *bkt, uint16_t sig)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
@@ -1158,31 +1159,29 @@ static inline int32_t
 __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt;
 	struct rte_hash_bucket *cur_bkt, *prev_bkt, *next_bkt;
 	int32_t ret, i;
 	struct rte_hash_bucket *tobe_removed_bkt = NULL;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	prim_bkt = &h->buckets[bucket_idx];
+	get_buckets_index(h, sig, &prim_bucket_idx, &sec_bucket_idx, &short_sig);
+	prim_bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, prim_bkt, sig);
+	ret = search_and_remove(h, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
 	}
 
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	sec_bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[sec_bucket_idx];
 
 	/* look for key in secondary bucket */
-	ret = search_and_remove(h, key, sec_bkt, alt_hash);
+	ret = search_and_remove(h, key, sec_bkt, short_sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
@@ -1192,7 +1191,7 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	if (h->ext_table_support) {
 		next_bkt = sec_bkt->next;
 		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
-			ret = search_and_remove(h, key, cur_bkt, alt_hash);
+			ret = search_and_remove(h, key, cur_bkt, short_sig);
 			if (ret != -1)
 				goto return_bkt;
 		}
@@ -1265,55 +1264,35 @@ static inline void
 compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 			const struct rte_hash_bucket *prim_bkt,
 			const struct rte_hash_bucket *sec_bkt,
-			hash_sig_t prim_hash, hash_sig_t sec_hash,
+			uint16_t sig,
 			enum rte_hash_sig_compare_function sig_cmp_fn)
 {
 	unsigned int i;
 
+	/* For match mask the first bit of every two bits indicates the match */
 	switch (sig_cmp_fn) {
-#ifdef RTE_MACHINE_CPUFLAG_AVX2
-	case RTE_HASH_COMPARE_AVX2:
-		*prim_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)prim_bkt->sig_current),
-				_mm256_set1_epi32(prim_hash)));
-		*sec_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)sec_bkt->sig_current),
-				_mm256_set1_epi32(sec_hash)));
-		break;
-#endif
 #ifdef RTE_MACHINE_CPUFLAG_SSE2
 	case RTE_HASH_COMPARE_SSE:
-		/* Compare the first 4 signatures in the bucket */
-		*prim_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+		/* Compare all signatures in the bucket */
+		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)prim_bkt->sig_current),
-				_mm_set1_epi32(prim_hash)));
-		*prim_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&prim_bkt->sig_current[4]),
-				_mm_set1_epi32(prim_hash)))) << 4;
-		/* Compare the first 4 signatures in the bucket */
-		*sec_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+				_mm_set1_epi16(sig)));
+		/* Compare all signatures in the bucket */
+		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)sec_bkt->sig_current),
-				_mm_set1_epi32(sec_hash)));
-		*sec_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&sec_bkt->sig_current[4]),
-				_mm_set1_epi32(sec_hash)))) << 4;
+				_mm_set1_epi16(sig)));
 		break;
 #endif
 	default:
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			*prim_hash_matches |=
-				((prim_hash == prim_bkt->sig_current[i]) << i);
+				((sig == prim_bkt->sig_current[i]) << (i << 1));
 			*sec_hash_matches |=
-				((sec_hash == sec_bkt->sig_current[i]) << i);
+				((sig == sec_bkt->sig_current[i]) << (i << 1));
 		}
 	}
-
 }
 
 #define PREFETCH_OFFSET 4
@@ -1326,7 +1305,9 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	int32_t i;
 	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
-	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
@@ -1345,10 +1326,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		rte_prefetch0(keys[i + PREFETCH_OFFSET]);
 
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
+		get_buckets_index(h, prim_hash[i],
+				&prim_index[i], &sec_index[i], &sig[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1357,10 +1339,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	/* Calculate and prefetch rest of the buckets */
 	for (; i < num_keys; i++) {
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		get_buckets_index(h, prim_hash[i],
+				&prim_index[i], &sec_index[i], &sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1371,10 +1355,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
 				primary_bkt[i], secondary_bkt[i],
-				prim_hash[i], sec_hash[i], h->sig_cmp_fn);
+				sig[i], h->sig_cmp_fn);
 
 		if (prim_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 			uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1385,7 +1370,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		}
 
 		if (sec_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 			uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1399,7 +1385,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		positions[i] = -ENOENT;
 		while (prim_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 
 			uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1418,11 +1405,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			prim_hitmask[i] &= ~(1 << (hit_index));
+			prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 		while (sec_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 
 			uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1442,7 +1430,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			sec_hitmask[i] &= ~(1 << (hit_index));
+			sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 next_key:
@@ -1465,10 +1453,10 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
 			if (data != NULL)
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], &data[i], cur_bkt);
+						sig[i], &data[i], cur_bkt);
 			else
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], NULL, cur_bkt);
+						sig[i], NULL, cur_bkt);
 			if (ret != -1) {
 				positions[i] = ret;
 				hits |= 1ULL << i;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index e601520..7753cd8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -129,18 +129,15 @@ struct rte_hash_key {
 enum rte_hash_sig_compare_function {
 	RTE_HASH_COMPARE_SCALAR = 0,
 	RTE_HASH_COMPARE_SSE,
-	RTE_HASH_COMPARE_AVX2,
 	RTE_HASH_COMPARE_NUM
 };
 
 /** Bucket structure */
 struct rte_hash_bucket {
-	hash_sig_t sig_current[RTE_HASH_BUCKET_ENTRIES];
+	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
 
 	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
 
-	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
-
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
 
 	void *next;
@@ -193,6 +190,7 @@ struct rte_hash {
 
 struct queue_node {
 	struct rte_hash_bucket *bkt; /* Current bucket on the bfs search */
+	uint32_t cur_bkt_idx;
 
 	struct queue_node *prev;     /* Parent(bucket) in search path */
 	int prev_slot;               /* Parent(slot) in search path */
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 11d8e28..0bd7696 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -40,7 +40,10 @@ extern "C" {
 /** Flag to indicate the extendabe bucket table feature should be used */
 #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
 
-/** Signature of key that is stored internally. */
+/**
+ * A hash value that is used to generate signature stored in table and the
+ * location the signature is stored.
+ */
 typedef uint32_t hash_sig_t;
 
 /** Type of function that can be used for calculating the hash value. */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 1/7] test/hash: fix bucket size in hash perf test
  2018-09-21 17:17   ` [PATCH v2 1/7] test/hash: fix bucket size in hash perf test Yipeng Wang
@ 2018-09-26 10:04     ` Bruce Richardson
  2018-09-27  3:39       ` Wang, Yipeng1
  2018-09-27  4:23     ` Honnappa Nagarahalli
  1 sibling, 1 reply; 107+ messages in thread
From: Bruce Richardson @ 2018-09-26 10:04 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, michel, honnappa.nagarahalli

On Fri, Sep 21, 2018 at 10:17:29AM -0700, Yipeng Wang wrote:
> The bucket size was changed from 4 to 8 but the corresponding
> perf test was not changed accordingly.
> 

Can you perhaps give a little detail on what actual problems this caused?
Did it just mean that we used up too much memory in the test because we
thought there were more buckets than there were, or something else?

> Fixes: 58017c98ed53 ("hash: add vectorized comparison")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
>  test/test/test_hash_perf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
> index 33dcb9f..9ed7125 100644
> --- a/test/test/test_hash_perf.c
> +++ b/test/test/test_hash_perf.c
> @@ -20,7 +20,7 @@
>  #define MAX_ENTRIES (1 << 19)
>  #define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
>  #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
> -#define BUCKET_SIZE 4
> +#define BUCKET_SIZE 8
>  #define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)
>  #define MAX_KEYSIZE 64
>  #define NUM_KEYSIZES 10
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 2/7] test/hash: more accurate hash perf test output
  2018-09-21 17:17   ` [PATCH v2 2/7] test/hash: more accurate hash perf test output Yipeng Wang
@ 2018-09-26 10:07     ` Bruce Richardson
  0 siblings, 0 replies; 107+ messages in thread
From: Bruce Richardson @ 2018-09-26 10:07 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, michel, honnappa.nagarahalli

On Fri, Sep 21, 2018 at 10:17:30AM -0700, Yipeng Wang wrote:
> Edit the printf information when error happens to be more
> accurate and informative.
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 3/7] test/hash: fix rw test with non-consecutive cores
  2018-09-21 17:17   ` [PATCH v2 3/7] test/hash: fix rw test with non-consecutive cores Yipeng Wang
@ 2018-09-26 11:02     ` Bruce Richardson
  2018-09-27  3:40       ` Wang, Yipeng1
  2018-09-26 11:13     ` Bruce Richardson
  1 sibling, 1 reply; 107+ messages in thread
From: Bruce Richardson @ 2018-09-26 11:02 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, michel, honnappa.nagarahalli

On Fri, Sep 21, 2018 at 10:17:31AM -0700, Yipeng Wang wrote:
> the multi-reader and multi-writer rte_hash unit test does not
> work correctly with non-consicutive core ids. This commit
> fixes the issue.
> 
> Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---

When testing this patch, I see that the read-write autotests are not
currently in the meson.build file for the test binary. I think this
patchset should include this fix too, as a separate patch.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 3/7] test/hash: fix rw test with non-consecutive cores
  2018-09-21 17:17   ` [PATCH v2 3/7] test/hash: fix rw test with non-consecutive cores Yipeng Wang
  2018-09-26 11:02     ` Bruce Richardson
@ 2018-09-26 11:13     ` Bruce Richardson
  1 sibling, 0 replies; 107+ messages in thread
From: Bruce Richardson @ 2018-09-26 11:13 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, michel, honnappa.nagarahalli

On Fri, Sep 21, 2018 at 10:17:31AM -0700, Yipeng Wang wrote:
> the multi-reader and multi-writer rte_hash unit test does not
> work correctly with non-consicutive core ids. This commit
> fixes the issue.
> 
> Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
>  test/test/test_hash_readwrite.c | 78 ++++++++++++++++++++++++++---------------
>  1 file changed, 49 insertions(+), 29 deletions(-)
> 

With existing code, testing with "-l 0,2,4,6,40,42,44,48" gives error:

++++++++Start function tests:+++++++++
Core #2 inserting and reading 1966080: 3,932,160 - 5,898,240
Core #4 inserting and reading 1966080: 7,864,320 - 9,830,400
Core #6 inserting and reading 1966080: 11,796,480 - 13,762,560
Core #40 inserting and reading 1966080: 78,643,200 - 80,609,280
Core #42 inserting and reading 1966080: 82,575,360 - 84,541,440
Core #44 inserting and reading 1966080: 86,507,520 - 88,473,600
Core #48 inserting and reading 1966080: 94,371,840 - 96,337,920
Core #0 inserting and reading 1966080: 0 - 1,966,080
key 1966080 is lost
1 key lost
Test Failed


With this patch applied, the test runs as expected.

Tested-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v3 0/5] hash: fix multiple issues
  2018-09-06 17:09 [PATCH v1 0/5] hash: add extendable bucket and partial-key hashing Yipeng Wang
                   ` (5 preceding siblings ...)
  2018-09-21 17:17 ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Yipeng Wang
@ 2018-09-26 12:54 ` Yipeng Wang
  2018-09-26 12:54   ` [PATCH v3 1/5] test/hash: fix bucket size in hash perf test Yipeng Wang
                     ` (5 more replies)
  2018-09-26 20:26 ` [PATCH v3 0/3] hash: add extendable bucket and partial key hashing Yipeng Wang
  7 siblings, 6 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-26 12:54 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

This patch set was part of extendable hash table patch
set V2. According to Bruce's comment, this patch set
is now separated from the original patch set for easier
review and merge.
https://mails.dpdk.org/archives/dev/2018-September/112555.html

This patch set fixes multiple issues/bugs in rte_hash and the hash
unit tests.

Yipeng Wang (5):
  test/hash: fix bucket size in hash perf test
  test/hash: more accurate hash perf test output
  test/hash: fix rw test with non-consecutive cores
  test/hash: fix missing file in meson build file
  hash: fix unused define

 lib/librte_hash/rte_cuckoo_hash.h |  2 -
 test/test/meson.build             |  1 +
 test/test/test_hash_perf.c        | 12 +++---
 test/test/test_hash_readwrite.c   | 78 ++++++++++++++++++++++++---------------
 4 files changed, 56 insertions(+), 37 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v3 1/5] test/hash: fix bucket size in hash perf test
  2018-09-26 12:54 ` [PATCH v3 0/5] hash: fix multiple issues Yipeng Wang
@ 2018-09-26 12:54   ` Yipeng Wang
  2018-09-27 11:17     ` Bruce Richardson
  2018-09-26 12:54   ` [PATCH v3 2/5] test/hash: more accurate hash perf test output Yipeng Wang
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-09-26 12:54 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

The bucket size was changed from 4 to 8 but the corresponding
perf test was not changed accordingly.

In the test, the bucket size and the number of buckets are used
to map to the underlying rte_hash structure. They are used
to test the performance of two conditions: keys in primary
buckets only, and keys in both primary and secondary buckets.

Although there is no functional issue with the bucket size set
to 4, it mismatches the underlying rte_hash structure,
which may affect code readability and future extension.
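
For reference, the mapping the test relies on can be sketched as
below (the constants mirror test_hash_perf.c, but primary_bucket()
is an illustrative helper rather than the actual test code):

#include <stdint.h>

#define MAX_ENTRIES (1 << 19)
#define BUCKET_SIZE 8	/* must match the library's bucket size */
#define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)

static uint32_t
primary_bucket(uint32_t sig)
{
	const uint32_t bucket_bitmask = NUM_BUCKETS - 1;

	/*
	 * Keys whose signatures share the low bits map to the same
	 * bucket. Counting keys per bucket lets the test either keep
	 * every key in its primary location or deliberately overflow
	 * buckets so that keys spill into secondary locations.
	 */
	return sig & bucket_bitmask;
}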

Fixes: 58017c98ed53 ("hash: add vectorized comparison")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 test/test/test_hash_perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 33dcb9f..9ed7125 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -20,7 +20,7 @@
 #define MAX_ENTRIES (1 << 19)
 #define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
 #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
-#define BUCKET_SIZE 4
+#define BUCKET_SIZE 8
 #define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)
 #define MAX_KEYSIZE 64
 #define NUM_KEYSIZES 10
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v3 2/5] test/hash: more accurate hash perf test output
  2018-09-26 12:54 ` [PATCH v3 0/5] hash: fix multiple issues Yipeng Wang
  2018-09-26 12:54   ` [PATCH v3 1/5] test/hash: fix bucket size in hash perf test Yipeng Wang
@ 2018-09-26 12:54   ` Yipeng Wang
  2018-09-26 12:54   ` [PATCH v3 3/5] test/hash: fix rw test with non-consecutive cores Yipeng Wang
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-26 12:54 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

Edit the printf output when an error happens to be more
accurate and informative.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 test/test/test_hash_perf.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 9ed7125..4d00c20 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -248,7 +248,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 						(const void *) keys[i],
 						signatures[i], data);
 			if (ret < 0) {
-				printf("Failed to add key number %u\n", ret);
+				printf("H+D: Failed to add key number %u\n", i);
 				return -1;
 			}
 		} else if (with_hash && !with_data) {
@@ -258,7 +258,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 			if (ret >= 0)
 				positions[i] = ret;
 			else {
-				printf("Failed to add key number %u\n", ret);
+				printf("H: Failed to add key number %u\n", i);
 				return -1;
 			}
 		} else if (!with_hash && with_data) {
@@ -266,7 +266,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 						(const void *) keys[i],
 						data);
 			if (ret < 0) {
-				printf("Failed to add key number %u\n", ret);
+				printf("D: Failed to add key number %u\n", i);
 				return -1;
 			}
 		} else {
@@ -274,7 +274,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 			if (ret >= 0)
 				positions[i] = ret;
 			else {
-				printf("Failed to add key number %u\n", ret);
+				printf("Failed to add key number %u\n", i);
 				return -1;
 			}
 		}
@@ -442,7 +442,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
 		if (ret >= 0)
 			positions[i] = ret;
 		else {
-			printf("Failed to add key number %u\n", ret);
+			printf("Failed to delete key number %u\n", i);
 			return -1;
 		}
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v3 3/5] test/hash: fix rw test with non-consecutive cores
  2018-09-26 12:54 ` [PATCH v3 0/5] hash: fix multiple issues Yipeng Wang
  2018-09-26 12:54   ` [PATCH v3 1/5] test/hash: fix bucket size in hash perf test Yipeng Wang
  2018-09-26 12:54   ` [PATCH v3 2/5] test/hash: more accurate hash perf test output Yipeng Wang
@ 2018-09-26 12:54   ` Yipeng Wang
  2018-09-27 11:18     ` Bruce Richardson
  2018-09-26 12:54   ` [PATCH v3 4/5] test/hash: fix missing file in meson build file Yipeng Wang
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-09-26 12:54 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

The multi-reader and multi-writer rte_hash unit test does not
work correctly with non-consecutive core ids. This commit
fixes the issue.
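
The problem and the fix can be sketched roughly as below
(illustrative only; NB_SLAVES and worker_offset() are hypothetical,
while slave_core_ids mirrors the array this patch introduces):

#include <stdint.h>

#define NB_SLAVES 4
static unsigned int slave_core_ids[NB_SLAVES] = { 2, 4, 6, 8 };

static uint64_t
worker_offset(unsigned int lcore_id, unsigned int num_insert)
{
	unsigned int i;

	/*
	 * The old scheme used (lcore_id - master_lcore) * num_insert;
	 * with lcores {0, 2, 4, 6, 8} that yields offsets of 2, 4, 6
	 * and 8 times num_insert instead of 0, 1, 2 and 3 times
	 * num_insert, leaving gaps and overrunning the key range.
	 */

	/* Map the lcore id to its dense index first, then compute. */
	for (i = 0; i < NB_SLAVES; i++)
		if (slave_core_ids[i] == lcore_id)
			break;
	return (uint64_t)i * num_insert;
}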

Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 test/test/test_hash_readwrite.c | 78 ++++++++++++++++++++++++++---------------
 1 file changed, 49 insertions(+), 29 deletions(-)

diff --git a/test/test/test_hash_readwrite.c b/test/test/test_hash_readwrite.c
index 55ae33d..2a4f7b9 100644
--- a/test/test/test_hash_readwrite.c
+++ b/test/test/test_hash_readwrite.c
@@ -24,6 +24,7 @@
 #define NUM_TEST 3
 unsigned int core_cnt[NUM_TEST] = {2, 4, 8};
 
+unsigned int slave_core_ids[RTE_MAX_LCORE];
 struct perf {
 	uint32_t single_read;
 	uint32_t single_write;
@@ -60,12 +61,15 @@ test_hash_readwrite_worker(__attribute__((unused)) void *arg)
 	uint64_t begin, cycles;
 	int ret;
 
-	offset = (lcore_id - rte_get_master_lcore())
-			* tbl_rw_test_param.num_insert;
+	for (i = 0; i < rte_lcore_count(); i++) {
+		if (slave_core_ids[i] == lcore_id)
+			break;
+	}
+	offset = tbl_rw_test_param.num_insert * i;
 
 	printf("Core #%d inserting and reading %d: %'"PRId64" - %'"PRId64"\n",
 	       lcore_id, tbl_rw_test_param.num_insert,
-	       offset, offset + tbl_rw_test_param.num_insert);
+	       offset, offset + tbl_rw_test_param.num_insert - 1);
 
 	begin = rte_rdtsc_precise();
 
@@ -171,6 +175,7 @@ test_hash_readwrite_functional(int use_htm)
 	uint32_t duplicated_keys = 0;
 	uint32_t lost_keys = 0;
 	int use_jhash = 1;
+	int slave_cnt = rte_lcore_count() - 1;
 
 	rte_atomic64_init(&gcycles);
 	rte_atomic64_clear(&gcycles);
@@ -182,17 +187,17 @@ test_hash_readwrite_functional(int use_htm)
 		goto err;
 
 	tbl_rw_test_param.num_insert =
-		TOTAL_INSERT / rte_lcore_count();
+		TOTAL_INSERT / slave_cnt;
 
 	tbl_rw_test_param.rounded_tot_insert =
 		tbl_rw_test_param.num_insert
-		* rte_lcore_count();
+		* slave_cnt;
 
 	printf("++++++++Start function tests:+++++++++\n");
 
 	/* Fire all threads. */
 	rte_eal_mp_remote_launch(test_hash_readwrite_worker,
-				 NULL, CALL_MASTER);
+				 NULL, SKIP_MASTER);
 	rte_eal_mp_wait_lcore();
 
 	while (rte_hash_iterate(tbl_rw_test_param.h, &next_key,
@@ -249,7 +254,7 @@ test_hash_readwrite_functional(int use_htm)
 }
 
 static int
-test_rw_reader(__attribute__((unused)) void *arg)
+test_rw_reader(void *arg)
 {
 	uint64_t i;
 	uint64_t begin, cycles;
@@ -276,7 +281,7 @@ test_rw_reader(__attribute__((unused)) void *arg)
 }
 
 static int
-test_rw_writer(__attribute__((unused)) void *arg)
+test_rw_writer(void *arg)
 {
 	uint64_t i;
 	uint32_t lcore_id = rte_lcore_id();
@@ -285,8 +290,13 @@ test_rw_writer(__attribute__((unused)) void *arg)
 	uint64_t start_coreid = (uint64_t)(uintptr_t)arg;
 	uint64_t offset;
 
-	offset = TOTAL_INSERT / 2 + (lcore_id - start_coreid)
-					* tbl_rw_test_param.num_insert;
+	for (i = 0; i < rte_lcore_count(); i++) {
+		if (slave_core_ids[i] == lcore_id)
+			break;
+	}
+
+	offset = TOTAL_INSERT / 2 + (i - (start_coreid)) *
+				tbl_rw_test_param.num_insert;
 	begin = rte_rdtsc_precise();
 	for (i = offset; i < offset + tbl_rw_test_param.num_insert; i++) {
 		ret = rte_hash_add_key_data(tbl_rw_test_param.h,
@@ -384,8 +394,8 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 	perf_results->single_read = end / i;
 
 	for (n = 0; n < NUM_TEST; n++) {
-		unsigned int tot_lcore = rte_lcore_count();
-		if (tot_lcore < core_cnt[n] * 2 + 1)
+		unsigned int tot_slave_lcore = rte_lcore_count() - 1;
+		if (tot_slave_lcore < core_cnt[n] * 2)
 			goto finish;
 
 		rte_atomic64_clear(&greads);
@@ -415,17 +425,19 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 		 */
 
 		/* Test only reader cases */
-		for (i = 1; i <= core_cnt[n]; i++)
+		for (i = 0; i < core_cnt[n]; i++)
 			rte_eal_remote_launch(test_rw_reader,
-					(void *)(uintptr_t)read_cnt, i);
+					(void *)(uintptr_t)read_cnt,
+					slave_core_ids[i]);
 
 		rte_eal_mp_wait_lcore();
 
 		start_coreid = i;
 		/* Test only writer cases */
-		for (; i <= core_cnt[n] * 2; i++)
+		for (; i < core_cnt[n] * 2; i++)
 			rte_eal_remote_launch(test_rw_writer,
-					(void *)((uintptr_t)start_coreid), i);
+					(void *)((uintptr_t)start_coreid),
+					slave_core_ids[i]);
 
 		rte_eal_mp_wait_lcore();
 
@@ -464,22 +476,26 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 			}
 		}
 
-		start_coreid = core_cnt[n] + 1;
+		start_coreid = core_cnt[n];
 
 		if (reader_faster) {
-			for (i = core_cnt[n] + 1; i <= core_cnt[n] * 2; i++)
+			for (i = core_cnt[n]; i < core_cnt[n] * 2; i++)
 				rte_eal_remote_launch(test_rw_writer,
-					(void *)((uintptr_t)start_coreid), i);
-			for (i = 1; i <= core_cnt[n]; i++)
+					(void *)((uintptr_t)start_coreid),
+					slave_core_ids[i]);
+			for (i = 0; i < core_cnt[n]; i++)
 				rte_eal_remote_launch(test_rw_reader,
-					(void *)(uintptr_t)read_cnt, i);
+					(void *)(uintptr_t)read_cnt,
+					slave_core_ids[i]);
 		} else {
-			for (i = 1; i <= core_cnt[n]; i++)
+			for (i = 0; i < core_cnt[n]; i++)
 				rte_eal_remote_launch(test_rw_reader,
-					(void *)(uintptr_t)read_cnt, i);
-			for (; i <= core_cnt[n] * 2; i++)
+					(void *)(uintptr_t)read_cnt,
+					slave_core_ids[i]);
+			for (; i < core_cnt[n] * 2; i++)
 				rte_eal_remote_launch(test_rw_writer,
-					(void *)((uintptr_t)start_coreid), i);
+					(void *)((uintptr_t)start_coreid),
+					slave_core_ids[i]);
 		}
 
 		rte_eal_mp_wait_lcore();
@@ -562,13 +578,19 @@ test_hash_readwrite_main(void)
 	 * writer threads for performance numbers.
 	 */
 	int use_htm, reader_faster;
+	unsigned int i = 0, core_id = 0;
 
-	if (rte_lcore_count() == 1) {
-		printf("More than one lcore is required "
+	if (rte_lcore_count() <= 2) {
+		printf("More than two lcores are required "
 			"to do read write test\n");
 		return 0;
 	}
 
+	RTE_LCORE_FOREACH_SLAVE(core_id) {
+		slave_core_ids[i] = core_id;
+		i++;
+	}
+
 	setlocale(LC_NUMERIC, "");
 
 	if (rte_tm_supported()) {
@@ -610,8 +632,6 @@ test_hash_readwrite_main(void)
 
 	printf("Results summary:\n");
 
-	int i;
-
 	printf("single read: %u\n", htm_results.single_read);
 	printf("single write: %u\n", htm_results.single_write);
 	for (i = 0; i < NUM_TEST; i++) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v3 4/5] test/hash: fix missing file in meson build file
  2018-09-26 12:54 ` [PATCH v3 0/5] hash: fix multiple issues Yipeng Wang
                     ` (2 preceding siblings ...)
  2018-09-26 12:54   ` [PATCH v3 3/5] test/hash: fix rw test with non-consecutive cores Yipeng Wang
@ 2018-09-26 12:54   ` Yipeng Wang
  2018-09-27 11:22     ` Bruce Richardson
  2018-09-26 12:54   ` [PATCH v3 5/5] hash: fix unused define Yipeng Wang
  2018-09-28 14:11   ` [PATCH v4 0/5] hash: fix multiple issues Yipeng Wang
  5 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-09-26 12:54 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

The test_hash_readwrite.c file was not listed in the meson.build file.
This commit adds the missing test to the build file.

Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 test/test/meson.build | 1 +
 1 file changed, 1 insertion(+)

diff --git a/test/test/meson.build b/test/test/meson.build
index b1dd6ec..1826bab 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -40,6 +40,7 @@ test_sources = files('commands.c',
 	'test_hash.c',
 	'test_hash_functions.c',
 	'test_hash_multiwriter.c',
+	'test_hash_readwrite.c',
 	'test_hash_perf.c',
 	'test_hash_scaling.c',
 	'test_interrupts.c',
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v3 5/5] hash: fix unused define
  2018-09-26 12:54 ` [PATCH v3 0/5] hash: fix multiple issues Yipeng Wang
                     ` (3 preceding siblings ...)
  2018-09-26 12:54   ` [PATCH v3 4/5] test/hash: fix missing file in meson build file Yipeng Wang
@ 2018-09-26 12:54   ` Yipeng Wang
  2018-09-28 14:11   ` [PATCH v4 0/5] hash: fix multiple issues Yipeng Wang
  5 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-26 12:54 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

Since the depth-first search of the cuckoo path has been removed, we
no longer need the macro that specifies the depth of the cuckoo
search.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index b43f467..fc0e5c2 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -104,8 +104,6 @@ const rte_hash_cmp_eq_t cmp_jump_table[NUM_KEY_CMP_CASES] = {
 
 #define LCORE_CACHE_SIZE		64
 
-#define RTE_HASH_MAX_PUSHES             100
-
 #define RTE_HASH_BFS_QUEUE_MAX_LEN       1000
 
 #define RTE_XABORT_CUCKOO_PATH_INVALIDED 0x4
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 4/7] hash: fix unnecessary code
  2018-09-21 17:17   ` [PATCH v2 4/7] hash: fix unnecessary code Yipeng Wang
@ 2018-09-26 12:55     ` Bruce Richardson
  0 siblings, 0 replies; 107+ messages in thread
From: Bruce Richardson @ 2018-09-26 12:55 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, michel, honnappa.nagarahalli

On Fri, Sep 21, 2018 at 10:17:32AM -0700, Yipeng Wang wrote:
> Since the depth-first search of cuckoo path is removed, we do not
> need the macro anymore which specifies the depth of the cuckoo
> search.
> 
> Fixes: f2e3001b53ec ("hash: support read/write concurrency")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
I'd suggest rewording the title to "fix unused define" to be a bit more
specific. Otherwise all ok.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 0/7] hash: add extendable bucket and partial key hashing
  2018-09-21 17:17 ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Yipeng Wang
                     ` (6 preceding siblings ...)
  2018-09-21 17:17   ` [PATCH v2 7/7] hash: use partial-key hashing Yipeng Wang
@ 2018-09-26 12:57   ` Bruce Richardson
  2018-09-27  3:41     ` Wang, Yipeng1
  2018-09-27  4:23   ` Honnappa Nagarahalli
  8 siblings, 1 reply; 107+ messages in thread
From: Bruce Richardson @ 2018-09-26 12:57 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, michel, honnappa.nagarahalli

On Fri, Sep 21, 2018 at 10:17:28AM -0700, Yipeng Wang wrote:
> The first four commits of the patch set try to fix small issues of
> previous code.
> 
> The other commits make two major optimizations over the current rte_hash
> library.
> 
I'd suggest splitting this set into two. The first 4 patches are easy to
review and should be quickly merged (I hope :-)), allowing us to focus more on
the bigger patches adding the key new feature support.

/Bruce

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v3 0/3] hash: add extendable bucket and partial key hashing
  2018-09-06 17:09 [PATCH v1 0/5] hash: add extendable bucket and partial-key hashing Yipeng Wang
                   ` (6 preceding siblings ...)
  2018-09-26 12:54 ` [PATCH v3 0/5] hash: fix multiple issues Yipeng Wang
@ 2018-09-26 20:26 ` Yipeng Wang
  2018-09-26 20:26   ` [PATCH v3 1/3] hash: add extendable bucket feature Yipeng Wang
                     ` (5 more replies)
  7 siblings, 6 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-26 20:26 UTC (permalink / raw)
  To: bruce.richardson
  Cc: dev, yipeng1.wang, honnappa.nagarahalli, michel, sameh.gobriel

This patch set makes two major optimizations to the current rte_hash
library.

First, it adds Extendable Bucket Table feature: a new structure that can
accommodate keys that failed to get inserted into the main hash table due to
the unlikely event of excessive hash collisions. The hash table buckets will
get extended using a linked list to host these keys. This new design will
guarantee insertion of 100% of the keys for a given hash table size with
minimal overhead. A new flag value is added for the user to indicate whether
the extendable bucket feature should be enabled. The linked list buckets are
a similar concept to the extendable bucket hash table in the packet framework.
In detail, for insertion, the linked buckets are used to store keys that fail
to fit into the primary and secondary buckets and for which the cuckoo path
could not find an empty location within the maximum path length (a small
probability). For lookup, the key is checked first in the primary bucket, then
in the secondary bucket, and, if the secondary bucket is extended, the linked
list is traversed for a possible match.
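
For readers who want a concrete picture of that order, below is a rough,
self-contained sketch in C. The toy_bucket structure and the helper names are
made up for illustration and do not match the real rte_hash internals; only
the traversal order (primary bucket, then secondary bucket, then the linked
ext buckets) mirrors the design described above.

#include <stddef.h>

#define TOY_BUCKET_ENTRIES 8

struct toy_bucket {
	unsigned short sig[TOY_BUCKET_ENTRIES];	/* stored signatures */
	int key_idx[TOY_BUCKET_ENTRIES];	/* 0 means empty slot */
	struct toy_bucket *next;		/* ext bucket chain, NULL if none */
};

/* Return key_idx on a signature match, -1 otherwise. The full key
 * comparison that follows a signature hit is omitted for brevity.
 */
static int
search_toy_bucket(const struct toy_bucket *b, unsigned short sig)
{
	int i;

	for (i = 0; i < TOY_BUCKET_ENTRIES; i++)
		if (b->key_idx[i] != 0 && b->sig[i] == sig)
			return b->key_idx[i];
	return -1;
}

static int
toy_lookup(const struct toy_bucket *prim, const struct toy_bucket *sec,
		unsigned short sig)
{
	const struct toy_bucket *b;
	int ret;

	ret = search_toy_bucket(prim, sig);		/* 1. primary bucket */
	if (ret != -1)
		return ret;
	for (b = sec; b != NULL; b = b->next) {		/* 2. secondary bucket, */
		ret = search_toy_bucket(b, sig);	/* 3. then linked ext buckets */
		if (ret != -1)
			return ret;
	}
	return -1;					/* not found */
}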

Second, the patch set changes the current hashing algorithm to be "partial-key
hashing". Partial-key hashing is the concept from Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
Hashing". Instead of storing both 32-bit signature and alternative signature
in the bucket, we only store a small 16-bit signature and calculate the
alternative bucket index by XORing the signature with the current bucket index.
This doubles the hash table memory efficiency since now one bucket
only occupies one cache line instead of two in the original design.
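
To make the arithmetic concrete, here is a tiny, self-contained example
program. The hash value, the 1024-bucket table size and the variable names
are placeholders chosen for illustration; only the shift and XOR steps mirror
the scheme described above. Note that XORing the secondary index with the
same signature yields the primary index back, which is what allows the
alternative signature to be dropped from the bucket.

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint32_t hash = 0xdeadbeef;                /* 32-bit hash of some key */
	uint32_t bucket_bitmask = (1u << 10) - 1;  /* 1024-bucket table */

	uint16_t sig  = hash >> 16;                     /* stored in the bucket */
	uint32_t prim = hash & bucket_bitmask;          /* primary bucket index */
	uint32_t sec  = (prim ^ sig) & bucket_bitmask;  /* secondary bucket index */

	/* The same XOR recovers the primary index from the secondary one. */
	uint32_t back = (sec ^ sig) & bucket_bitmask;

	printf("prim=%u sec=%u back=%u\n",
		(unsigned)prim, (unsigned)sec, (unsigned)back);  /* back == prim */
	return 0;
}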

v2->v3:
The first four commits were separated from this patch set as another
independent patch set:
https://mails.dpdk.org/archives/dev/2018-September/113118.html
1. hash: move snprintf for ext_ring name under the ext_table condition.
2. hash: fix memory leak by freeing ext_buckets in rte_hash_free.
3. hash: after the cuckoo path fails, search not only the ext buckets but also
the secondary bucket first, in case an empty location has become available.
4. hash: rewrote the key deletion logic. With the ext table enabled, if the
deleted key was not in the last bucket of the linked list, the last entry in
the linked list is moved into the slot vacated by the deleted key. The purpose
is to compact the entries of the linked list closer to the main table, so that
extendable buckets are not left holding only one or two entries after some
time of running, which also benefits lookup speed.
5. Other minor coding style/comments improvements.

V1->V2:
1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
2. hash: Reorder the rte_hash struct to align cache line better.
3. test: Minor changes in auto test to add key insertion failure check during
iteration test.
4. test: Add a new commit to fix the read-write test non-consecutive core
issue.
5. hash: Add a new commit to remove unnecessary code introduced by previous
patches.
6. hash: Comment and coding style improvements in multiple places.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>

Yipeng Wang (3):
  hash: add extendable bucket feature
  test/hash: implement extendable bucket hash test
  hash: use partial-key hashing

 lib/librte_hash/rte_cuckoo_hash.c | 553 +++++++++++++++++++++++++++-----------
 lib/librte_hash/rte_cuckoo_hash.h |  11 +-
 lib/librte_hash/rte_hash.h        |   8 +-
 test/test/test_hash.c             | 151 ++++++++++-
 test/test/test_hash_perf.c        | 114 +++++---
 5 files changed, 646 insertions(+), 191 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v3 1/3] hash: add extendable bucket feature
  2018-09-26 20:26 ` [PATCH v3 0/3] hash: add extendable bucket and partial key hashing Yipeng Wang
@ 2018-09-26 20:26   ` Yipeng Wang
  2018-09-26 20:26   ` [PATCH v3 2/3] test/hash: implement extendable bucket hash test Yipeng Wang
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-26 20:26 UTC (permalink / raw)
  To: bruce.richardson
  Cc: dev, yipeng1.wang, honnappa.nagarahalli, michel, sameh.gobriel

In use cases where hash table capacity needs to be guaranteed,
the extendable bucket feature can be used to store extra
keys in linked lists when conflicts happen. This is a similar
concept to the extendable bucket hash table in the packet
framework.

This commit adds the extendable bucket feature. The user can
turn it on or off through the extra flag field at table
creation time.

The extendable bucket table is composed of buckets that can be
linked, as a list, to the buckets of the main table. When the
extendable bucket feature is enabled, the table utilization can
always reach 100%. Although keys ending up in the ext buckets may
have a longer lookup time, they should be rare thanks to the
cuckoo algorithm.
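
As a minimal usage sketch (not part of the patch), a table with guaranteed
capacity could be created as below; the table name, entry count, key length
and socket id are placeholder values, and only the extra_flag setting is the
new behaviour added by this commit:

#include <rte_hash.h>
#include <rte_jhash.h>

static struct rte_hash *
create_guaranteed_capacity_table(void)
{
	struct rte_hash_parameters params = {
		.name = "flow_table",
		.entries = 1 << 16,	/* insertions up to this count will succeed */
		.key_len = 16,
		.hash_func = rte_jhash,
		.hash_func_init_val = 0,
		.socket_id = 0,		/* or rte_socket_id() */
		/* enable the extendable bucket table added by this commit */
		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
	};

	return rte_hash_create(&params);
}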

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 367 ++++++++++++++++++++++++++++++++------
 lib/librte_hash/rte_cuckoo_hash.h |   5 +
 lib/librte_hash/rte_hash.h        |   3 +
 3 files changed, 324 insertions(+), 51 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8..1c66a0d 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -31,6 +31,10 @@
 #include "rte_hash.h"
 #include "rte_cuckoo_hash.h"
 
+#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)                            \
+	for (CURRENT_BKT = START_BUCKET;                                      \
+		CURRENT_BKT != NULL;                                          \
+		CURRENT_BKT = CURRENT_BKT->next)
 
 TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
 
@@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
 	return h;
 }
 
+static inline struct rte_hash_bucket *
+rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
+{
+	while (lst_bkt->next != NULL)
+		lst_bkt = lst_bkt->next;
+	return lst_bkt;
+}
+
 void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
 {
 	h->cmp_jump_table_idx = KEY_CUSTOM;
@@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	struct rte_tailq_entry *te = NULL;
 	struct rte_hash_list *hash_list;
 	struct rte_ring *r = NULL;
+	struct rte_ring *r_ext = NULL;
 	char hash_name[RTE_HASH_NAMESIZE];
 	void *k = NULL;
 	void *buckets = NULL;
+	void *buckets_ext = NULL;
 	char ring_name[RTE_RING_NAMESIZE];
+	char ext_ring_name[RTE_RING_NAMESIZE];
 	unsigned num_key_slots;
 	unsigned i;
 	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
+	unsigned int ext_table_support = 0;
 	unsigned int readwrite_concur_support = 0;
 
 	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
@@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		multi_writer_support = 1;
 	}
 
+	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
+		ext_table_support = 1;
+
 	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
 	if (multi_writer_support)
 		/*
@@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err;
 	}
 
+	const uint32_t num_buckets = rte_align32pow2(params->entries) /
+						RTE_HASH_BUCKET_ENTRIES;
+
+	/* Create ring for extendable buckets. */
+	if (ext_table_support) {
+		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
+								params->name);
+		r_ext = rte_ring_create(ext_ring_name,
+				rte_align32pow2(num_buckets + 1),
+				params->socket_id, 0);
+
+		if (r_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+								"failed\n");
+			goto err;
+		}
+	}
+
 	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
 
 	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
@@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err_unlock;
 	}
 
-	const uint32_t num_buckets = rte_align32pow2(params->entries)
-					/ RTE_HASH_BUCKET_ENTRIES;
-
 	buckets = rte_zmalloc_socket(NULL,
 				num_buckets * sizeof(struct rte_hash_bucket),
 				RTE_CACHE_LINE_SIZE, params->socket_id);
 
 	if (buckets == NULL) {
-		RTE_LOG(ERR, HASH, "memory allocation failed\n");
+		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
 		goto err_unlock;
 	}
 
+	/* Allocate same number of extendable buckets */
+	if (ext_table_support) {
+		buckets_ext = rte_zmalloc_socket(NULL,
+				num_buckets * sizeof(struct rte_hash_bucket),
+				RTE_CACHE_LINE_SIZE, params->socket_id);
+		if (buckets_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+							"failed\n");
+			goto err_unlock;
+		}
+		/* Populate ext bkt ring. We reserve 0 similar to the
+		 * key-data slot, just in case in future we want to
+		 * use bucket index for the linked list and 0 means NULL
+		 * for next bucket
+		 */
+		for (i = 1; i <= num_buckets; i++)
+			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+	}
+
 	const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params->key_len;
 	const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;
 
@@ -262,6 +315,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->num_buckets = num_buckets;
 	h->bucket_bitmask = h->num_buckets - 1;
 	h->buckets = buckets;
+	h->buckets_ext = buckets_ext;
+	h->free_ext_bkts = r_ext;
 	h->hash_func = (params->hash_func == NULL) ?
 		default_hash_func : params->hash_func;
 	h->key_store = k;
@@ -269,6 +324,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->hw_trans_mem_support = hw_trans_mem_support;
 	h->multi_writer_support = multi_writer_support;
 	h->readwrite_concur_support = readwrite_concur_support;
+	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
@@ -304,9 +360,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
 err:
 	rte_ring_free(r);
+	rte_ring_free(r_ext);
 	rte_free(te);
 	rte_free(h);
 	rte_free(buckets);
+	rte_free(buckets_ext);
 	rte_free(k);
 	return NULL;
 }
@@ -344,8 +402,10 @@ rte_hash_free(struct rte_hash *h)
 		rte_free(h->readwrite_lock);
 	}
 	rte_ring_free(h->free_slots);
+	rte_ring_free(h->free_ext_bkts);
 	rte_free(h->key_store);
 	rte_free(h->buckets);
+	rte_free(h->buckets_ext);
 	rte_free(h);
 	rte_free(te);
 }
@@ -403,7 +463,6 @@ __hash_rw_writer_lock(const struct rte_hash *h)
 		rte_rwlock_write_lock(h->readwrite_lock);
 }
 
-
 static inline void
 __hash_rw_reader_lock(const struct rte_hash *h)
 {
@@ -448,6 +507,14 @@ rte_hash_reset(struct rte_hash *h)
 	while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
 		rte_pause();
 
+	/* clear free extendable bucket ring and memory */
+	if (h->ext_table_support) {
+		memset(h->buckets_ext, 0, h->num_buckets *
+						sizeof(struct rte_hash_bucket));
+		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
+			rte_pause();
+	}
+
 	/* Repopulate the free slots ring. Entry zero is reserved for key misses */
 	if (h->multi_writer_support)
 		tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) *
@@ -458,6 +525,13 @@ rte_hash_reset(struct rte_hash *h)
 	for (i = 1; i < tot_ring_cnt + 1; i++)
 		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
 
+	/* Repopulate the free ext bkt ring. */
+	if (h->ext_table_support) {
+		for (i = 1; i < h->num_buckets + 1; i++)
+			rte_ring_sp_enqueue(h->free_ext_bkts,
+						(void *)((uintptr_t) i));
+	}
+
 	if (h->multi_writer_support) {
 		/* Reset local caches per lcore */
 		for (i = 0; i < RTE_MAX_LCORE; i++)
@@ -524,24 +598,27 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		int32_t *ret_val)
 {
 	unsigned int i;
-	struct rte_hash_bucket *cur_bkt = prim_bkt;
+	struct rte_hash_bucket *cur_bkt;
 	int32_t ret;
 
 	__hash_rw_writer_lock(h);
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	/* Insert new entry if there is room in the primary
@@ -580,7 +657,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
-	struct rte_hash_bucket *cur_bkt = bkt;
+	struct rte_hash_bucket *cur_bkt;
 	struct queue_node *prev_node, *curr_node = leaf;
 	struct rte_hash_bucket *prev_bkt, *curr_bkt = leaf->bkt;
 	uint32_t prev_slot, curr_slot = leaf_slot;
@@ -597,18 +674,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
 
-	ret = search_and_update(h, data, key, alt_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	while (likely(curr_node->prev != NULL)) {
@@ -711,15 +790,18 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	hash_sig_t alt_hash;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
-	struct rte_hash_bucket *prim_bkt, *sec_bkt;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
 	void *slot_id = NULL;
-	uint32_t new_idx;
+	void *ext_bkt_id = NULL;
+	uint32_t new_idx, bkt_id;
 	int ret;
 	unsigned n_slots;
 	unsigned lcore_id;
+	unsigned int i;
 	struct lcore_cache *cached_free_slots = NULL;
 	int32_t ret_val;
+	struct rte_hash_bucket *last;
 
 	prim_bucket_idx = sig & h->bucket_bitmask;
 	prim_bkt = &h->buckets[prim_bucket_idx];
@@ -739,10 +821,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Check if key is already inserted in secondary location */
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_writer_unlock(h);
 
@@ -808,10 +892,70 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
-	} else {
+	}
+
+	/* if ext table not enabled, we failed the insertion */
+	if (!h->ext_table_support) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret;
 	}
+
+	/* Now we need to go through the extendable bucket. Protection is needed
+	 * to protect all extendable bucket processes.
+	 */
+	__hash_rw_writer_lock(h);
+	/* We check for duplicates again since could be inserted before the lock */
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	if (ret != -1) {
+		enqueue_slot_back(h, cached_free_slots, slot_id);
+		goto failure;
+	}
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			enqueue_slot_back(h, cached_free_slots, slot_id);
+			goto failure;
+		}
+	}
+
+	/* Search sec and ext buckets to find an empty entry to insert. */
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+			/* Check if slot is available */
+			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
+				cur_bkt->sig_current[i] = alt_hash;
+				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->key_idx[i] = new_idx;
+				__hash_rw_writer_unlock(h);
+				return new_idx - 1;
+			}
+		}
+	}
+
+	/* Failed to get an empty entry from extendable buckets. Link a new
+	 * extendable bucket. We first get a free bucket from ring.
+	 */
+	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+		ret = -ENOSPC;
+		goto failure;
+	}
+
+	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
+	/* Use the first location of the new bucket */
+	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
+	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
+	/* Link the new bucket to sec bucket linked list */
+	last = rte_hash_get_last_bkt(sec_bkt);
+	last->next = &h->buckets_ext[bkt_id];
+	__hash_rw_writer_unlock(h);
+	return new_idx - 1;
+
+failure:
+	__hash_rw_writer_unlock(h);
+	return ret;
+
 }
 
 int32_t
@@ -890,7 +1034,7 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
+	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
 
 	bucket_idx = sig & h->bucket_bitmask;
@@ -910,10 +1054,12 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 	bkt = &h->buckets[bucket_idx];
 
 	/* Check if key is in secondary location */
-	ret = search_one_bucket(h, key, alt_hash, data, bkt);
-	if (ret != -1) {
-		__hash_rw_reader_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, bkt) {
+		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		if (ret != -1) {
+			__hash_rw_reader_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_reader_unlock(h);
 	return -ENOENT;
@@ -978,16 +1124,42 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	}
 }
 
+/* Compact the linked list by moving key from last entry in linked list to the
+ * empty slot.
+ */
+static inline void
+__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
+	int i;
+	struct rte_hash_bucket *last_bkt;
+
+	if (!cur_bkt->next)
+		return;
+
+	last_bkt = rte_hash_get_last_bkt(cur_bkt);
+
+	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
+		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
+			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
+			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
+			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
+			last_bkt->sig_current[i] = NULL_SIGNATURE;
+			last_bkt->sig_alt[i] = NULL_SIGNATURE;
+			last_bkt->key_idx[i] = EMPTY_SLOT;
+			return;
+		}
+	}
+}
+
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig)
+			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
 	int32_t ret;
 
-	/* Check if key is in primary location */
+	/* Check if key is in bucket */
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 		if (bkt->sig_current[i] == sig &&
 				bkt->key_idx[i] != EMPTY_SLOT) {
@@ -996,12 +1168,12 @@ search_and_remove(const struct rte_hash *h, const void *key,
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
 				remove_entry(h, bkt, i);
 
-				/*
-				 * Return index where key is stored,
+				/* Return index where key is stored,
 				 * subtracting the first dummy index
 				 */
 				ret = bkt->key_idx[i] - 1;
 				bkt->key_idx[i] = EMPTY_SLOT;
+				*pos = i;
 				return ret;
 			}
 		}
@@ -1015,34 +1187,66 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
-	int32_t ret;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
+	struct rte_hash_bucket *cur_bkt;
+	int pos;
+	int32_t ret, i;
 
 	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	prim_bkt = &h->buckets[bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, bkt, sig);
+	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
 	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+		__rte_hash_compact_ll(prim_bkt, pos);
+		last_bkt = prim_bkt->next;
+		prev_bkt = prim_bkt;
+		goto return_bkt;
 	}
 
 	/* Calculate secondary hash */
 	alt_hash = rte_hash_secondary_hash(sig);
 	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[bucket_idx];
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
+		if (ret != -1) {
+			__rte_hash_compact_ll(cur_bkt, pos);
+			last_bkt = sec_bkt->next;
+			prev_bkt = sec_bkt;
+			goto return_bkt;
+		}
+	}
 
-	/* look for key in secondary bucket */
-	ret = search_and_remove(h, key, bkt, alt_hash);
-	if (ret != -1) {
+	__hash_rw_writer_unlock(h);
+	return -ENOENT;
+
+/* Search last bucket to see if empty to be recycled */
+return_bkt:
+	if (!last_bkt) {
 		__hash_rw_writer_unlock(h);
 		return ret;
 	}
+	while (last_bkt->next) {
+		prev_bkt = last_bkt;
+		last_bkt = last_bkt->next;
+	}
+
+	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+		if (last_bkt->key_idx[i] != EMPTY_SLOT)
+			break;
+	}
+	/* found empty bucket and recycle */
+	if (i == RTE_HASH_BUCKET_ENTRIES) {
+		prev_bkt->next = last_bkt->next = NULL;
+		uint32_t index = last_bkt - h->buckets_ext + 1;
+		rte_ring_mp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+	}
 
 	__hash_rw_writer_unlock(h);
-	return -ENOENT;
+	return ret;
 }
 
 int32_t
@@ -1143,12 +1347,14 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 {
 	uint64_t hits = 0;
 	int32_t i;
+	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
+	struct rte_hash_bucket *cur_bkt, *next_bkt;
 
 	/* Prefetch first keys */
 	for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++)
@@ -1266,6 +1472,34 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		continue;
 	}
 
+	/* all found, do not need to go through ext bkt */
+	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
+		if (hit_mask != NULL)
+			*hit_mask = hits;
+		__hash_rw_reader_unlock(h);
+		return;
+	}
+
+	/* need to check ext buckets for match */
+	for (i = 0; i < num_keys; i++) {
+		if ((hits & (1ULL << i)) != 0)
+			continue;
+		next_bkt = secondary_bkt[i]->next;
+		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+			if (data != NULL)
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], &data[i], cur_bkt);
+			else
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], NULL, cur_bkt);
+			if (ret != -1) {
+				positions[i] = ret;
+				hits |= 1ULL << i;
+				break;
+			}
+		}
+	}
+
 	__hash_rw_reader_unlock(h);
 
 	if (hit_mask != NULL)
@@ -1308,10 +1542,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 
 	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
 
-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
+	const uint32_t total_entries_main = h->num_buckets *
+							RTE_HASH_BUCKET_ENTRIES;
+	const uint32_t total_entries = total_entries_main << 1;
+
 	/* Out of bounds */
-	if (*next >= total_entries)
-		return -ENOENT;
+	if (*next >= total_entries_main)
+		goto extend_table;
 
 	/* Calculate bucket and index of current iterator */
 	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
@@ -1321,8 +1558,8 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
-		if (*next == total_entries)
-			return -ENOENT;
+		if (*next == total_entries_main)
+			goto extend_table;
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
@@ -1341,4 +1578,32 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	(*next)++;
 
 	return position - 1;
+
+extend_table:
+	/* Out of bounds */
+	if (*next >= total_entries || !h->ext_table_support)
+		return -ENOENT;
+
+	bucket_idx = (*next - total_entries_main) / RTE_HASH_BUCKET_ENTRIES;
+	idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+
+	while (h->buckets_ext[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+		(*next)++;
+		if (*next == total_entries)
+			return -ENOENT;
+		bucket_idx = (*next - total_entries_main) /
+						RTE_HASH_BUCKET_ENTRIES;
+		idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+	}
+	/* Get position of entry in key table */
+	position = h->buckets_ext[bucket_idx].key_idx[idx];
+	next_key = (struct rte_hash_key *) ((char *)h->key_store +
+				position * h->key_entry_size);
+	/* Return key and data */
+	*key = next_key->key;
+	*data = next_key->pdata;
+
+	/* Increment iterator */
+	(*next)++;
+	return position - 1;
 }
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fc0e5c2..e601520 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -142,6 +142,8 @@ struct rte_hash_bucket {
 	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
 
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
+
+	void *next;
 } __rte_cache_aligned;
 
 /** A hash table structure. */
@@ -166,6 +168,7 @@ struct rte_hash {
 	/**< If multi-writer support is enabled. */
 	uint8_t readwrite_concur_support;
 	/**< If read-write concurrency support is enabled */
+	uint8_t ext_table_support;     /**< Enable extendable bucket table */
 	rte_hash_function hash_func;    /**< Function used to calculate hash. */
 	uint32_t hash_func_init_val;    /**< Init value used by hash_func. */
 	rte_hash_cmp_eq_t rte_hash_custom_cmp_eq;
@@ -184,6 +187,8 @@ struct rte_hash {
 	 * to the key table.
 	 */
 	rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
+	struct rte_hash_bucket *buckets_ext; /**< Extra buckets array */
+	struct rte_ring *free_ext_bkts; /**< Ring of indexes of free buckets */
 } __rte_cache_aligned;
 
 struct queue_node {
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d931..11d8e28 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -37,6 +37,9 @@ extern "C" {
 /** Flag to support reader writer concurrency */
 #define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
 
+/** Flag to indicate the extendable bucket table feature should be used */
+#define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
+
 /** Signature of key that is stored internally. */
 typedef uint32_t hash_sig_t;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v3 2/3] test/hash: implement extendable bucket hash test
  2018-09-26 20:26 ` [PATCH v3 0/3] hash: add extendable bucket and partial key hashing Yipeng Wang
  2018-09-26 20:26   ` [PATCH v3 1/3] hash: add extendable bucket feature Yipeng Wang
@ 2018-09-26 20:26   ` Yipeng Wang
  2018-09-26 20:26   ` [PATCH v3 3/3] hash: use partial-key hashing Yipeng Wang
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-26 20:26 UTC (permalink / raw)
  To: bruce.richardson
  Cc: dev, yipeng1.wang, honnappa.nagarahalli, michel, sameh.gobriel

This commit changes the current rte_hash unit test to
test the extendable table feature and performance.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 test/test/test_hash.c      | 151 +++++++++++++++++++++++++++++++++++++++++++--
 test/test/test_hash_perf.c | 114 +++++++++++++++++++++++++---------
 2 files changed, 230 insertions(+), 35 deletions(-)

diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd..c97095f 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -660,6 +660,116 @@ static int test_full_bucket(void)
 	return 0;
 }
 
+/*
+ * Similar to the test above (full bucket test), but for extendable buckets.
+ */
+static int test_extendable_bucket(void)
+{
+	struct rte_hash_parameters params_pseudo_hash = {
+		.name = "test5",
+		.entries = 64,
+		.key_len = sizeof(struct flow_key), /* 13 */
+		.hash_func = pseudo_hash,
+		.hash_func_init_val = 0,
+		.socket_id = 0,
+		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE
+	};
+	struct rte_hash *handle;
+	int pos[64];
+	int expected_pos[64];
+	unsigned int i;
+	struct flow_key rand_keys[64];
+
+	for (i = 0; i < 64; i++) {
+		rand_keys[i].port_dst = i;
+		rand_keys[i].port_src = i+1;
+	}
+
+	handle = rte_hash_create(&params_pseudo_hash);
+	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
+
+	/* Fill bucket */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add - update */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Delete 1 key, check other keys are still found */
+	pos[35] = rte_hash_del_key(handle, &rand_keys[35]);
+	print_key_info("Del", &rand_keys[35], pos[35]);
+	RETURN_IF_ERROR(pos[35] != expected_pos[35],
+			"failed to delete key (pos[1]=%d)", pos[35]);
+	pos[20] = rte_hash_lookup(handle, &rand_keys[20]);
+	print_key_info("Lkp", &rand_keys[20], pos[20]);
+	RETURN_IF_ERROR(pos[20] != expected_pos[20],
+			"failed lookup after deleting key from same bucket "
+			"(pos[20]=%d)", pos[20]);
+
+	/* Go back to previous state */
+	pos[35] = rte_hash_add_key(handle, &rand_keys[35]);
+	print_key_info("Add", &rand_keys[35], pos[35]);
+	expected_pos[35] = pos[35];
+	RETURN_IF_ERROR(pos[35] < 0, "failed to add key (pos[1]=%d)", pos[35]);
+
+	/* Delete */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_del_key(handle, &rand_keys[i]);
+		print_key_info("Del", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to delete key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != -ENOENT,
+			"fail: found non-existent key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add again */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	rte_hash_free(handle);
+
+	/* Cover the NULL case. */
+	rte_hash_free(0);
+	return 0;
+}
+
 /******************************************************************************/
 static int
 fbk_hash_unit_test(void)
@@ -1096,7 +1206,7 @@ test_hash_creation_with_good_parameters(void)
  * Test to see the average table utilization (entries added/max entries)
  * before hitting a random entry that cannot be added
  */
-static int test_average_table_utilization(void)
+static int test_average_table_utilization(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	uint8_t simple_key[MAX_KEYSIZE];
@@ -1107,12 +1217,23 @@ static int test_average_table_utilization(void)
 
 	printf("\n# Running test to determine average utilization"
 	       "\n  before adding elements begins to fail\n");
+	if (ext_table)
+		printf("ext table is enabled\n");
+	else
+		printf("ext table is disabled\n");
+
 	printf("Measuring performance, please wait");
 	fflush(stdout);
 	ut_params.entries = 1 << 16;
 	ut_params.name = "test_average_utilization";
 	ut_params.hash_func = rte_jhash;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
+
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
 	for (j = 0; j < ITERATIONS; j++) {
@@ -1161,7 +1282,7 @@ static int test_average_table_utilization(void)
 }
 
 #define NUM_ENTRIES 256
-static int test_hash_iteration(void)
+static int test_hash_iteration(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	unsigned i;
@@ -1177,6 +1298,11 @@ static int test_hash_iteration(void)
 	ut_params.name = "test_hash_iteration";
 	ut_params.hash_func = rte_jhash;
 	ut_params.key_len = 16;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
@@ -1186,8 +1312,13 @@ static int test_hash_iteration(void)
 		for (i = 0; i < ut_params.key_len; i++)
 			keys[added_keys][i] = rte_rand() % 255;
 		ret = rte_hash_add_key_data(handle, keys[added_keys], data[added_keys]);
-		if (ret < 0)
+		if (ret < 0) {
+			if (ext_table) {
+				printf("Insertion failed for ext table\n");
+				goto err;
+			}
 			break;
+		}
 	}
 
 	/* Iterate through the hash table */
@@ -1474,6 +1605,8 @@ test_hash(void)
 		return -1;
 	if (test_full_bucket() < 0)
 		return -1;
+	if (test_extendable_bucket() < 0)
+		return -1;
 
 	if (test_fbk_hash_find_existing() < 0)
 		return -1;
@@ -1483,9 +1616,17 @@ test_hash(void)
 		return -1;
 	if (test_hash_creation_with_good_parameters() < 0)
 		return -1;
-	if (test_average_table_utilization() < 0)
+
+	/* ext table disabled */
+	if (test_average_table_utilization(0) < 0)
+		return -1;
+	if (test_hash_iteration(0) < 0)
+		return -1;
+
+	/* ext table enabled */
+	if (test_average_table_utilization(1) < 0)
 		return -1;
-	if (test_hash_iteration() < 0)
+	if (test_hash_iteration(1) < 0)
 		return -1;
 
 	run_hash_func_tests();
diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 4d00c20..d169cd0 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -18,7 +18,8 @@
 #include "test.h"
 
 #define MAX_ENTRIES (1 << 19)
-#define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
+#define KEYS_TO_ADD (MAX_ENTRIES)
+#define ADD_PERCENT 0.75 /* 75% table utilization */
 #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
 #define BUCKET_SIZE 8
 #define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)
@@ -77,7 +78,7 @@ static struct rte_hash_parameters ut_params = {
 
 static int
 create_table(unsigned int with_data, unsigned int table_index,
-		unsigned int with_locks)
+		unsigned int with_locks, unsigned int ext)
 {
 	char name[RTE_HASH_NAMESIZE];
 
@@ -95,6 +96,9 @@ create_table(unsigned int with_data, unsigned int table_index,
 	else
 		ut_params.extra_flag = 0;
 
+	if (ext)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	ut_params.name = name;
 	ut_params.key_len = hashtest_key_lens[table_index];
 	ut_params.socket_id = rte_socket_id();
@@ -116,15 +120,21 @@ create_table(unsigned int with_data, unsigned int table_index,
 
 /* Shuffle the keys that have been added, so lookups will be totally random */
 static void
-shuffle_input_keys(unsigned table_index)
+shuffle_input_keys(unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	uint32_t swap_idx;
 	uint8_t temp_key[MAX_KEYSIZE];
 	hash_sig_t temp_signature;
 	int32_t temp_position;
+	unsigned int keys_to_add;
+
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = KEYS_TO_ADD - 1; i > 0; i--) {
+	for (i = keys_to_add - 1; i > 0; i--) {
 		swap_idx = rte_rand() % i;
 
 		memcpy(temp_key, keys[i], hashtest_key_lens[table_index]);
@@ -146,14 +156,20 @@ shuffle_input_keys(unsigned table_index)
  * ALL can fit in hash table (no errors)
  */
 static int
-get_input_keys(unsigned with_pushes, unsigned table_index)
+get_input_keys(unsigned int with_pushes, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j;
 	unsigned bucket_idx, incr, success = 1;
 	uint8_t k = 0;
 	int32_t ret;
 	const uint32_t bucket_bitmask = NUM_BUCKETS - 1;
+	unsigned int keys_to_add;
 
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 	/* Reset all arrays */
 	for (i = 0; i < MAX_ENTRIES; i++)
 		slot_taken[i] = 0;
@@ -170,7 +186,7 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 	 * Regardless a key has been added correctly or not (success),
 	 * the next one to try will be increased by 1.
 	 */
-	for (i = 0; i < KEYS_TO_ADD;) {
+	for (i = 0; i < keys_to_add;) {
 		incr = 0;
 		if (i != 0) {
 			keys[i][0] = ++k;
@@ -234,14 +250,20 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 }
 
 static int
-timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_adds(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *data;
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		data = (void *) ((uintptr_t) signatures[i]);
 		if (with_hash && with_data) {
 			ret = rte_hash_add_key_with_hash_data(h[table_index],
@@ -283,22 +305,31 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][ADD][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][ADD][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
 
 static int
-timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_lookups(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i, j;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *ret_data;
 	void *expected_data;
 	int32_t ret;
-
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD; j++) {
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
+	for (i = 0; i < num_lookups / keys_to_add; i++) {
+		for (j = 0; j < keys_to_add; j++) {
 			if (with_hash && with_data) {
 				ret = rte_hash_lookup_with_hash_data(h[table_index],
 							(const void *) keys[j],
@@ -351,13 +382,14 @@ timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_lookups_multi(unsigned with_data, unsigned table_index)
+timed_lookups_multi(unsigned int with_data, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j, k;
 	int32_t positions_burst[BURST_SIZE];
@@ -366,11 +398,20 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	void *ret_data[BURST_SIZE];
 	uint64_t hit_mask;
 	int ret;
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
 
 	const uint64_t start_tsc = rte_rdtsc();
 
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD/BURST_SIZE; j++) {
+	for (i = 0; i < num_lookups/keys_to_add; i++) {
+		for (j = 0; j < keys_to_add/BURST_SIZE; j++) {
 			for (k = 0; k < BURST_SIZE; k++)
 				keys_burst[k] = keys[j * BURST_SIZE + k];
 			if (with_data) {
@@ -418,19 +459,25 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_deletes(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		/* There are no delete functions with data, so just call two functions */
 		if (with_hash)
 			ret = rte_hash_del_key_with_hash(h[table_index],
@@ -450,7 +497,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][DELETE][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][DELETE][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
@@ -468,7 +515,8 @@ reset_table(unsigned table_index)
 }
 
 static int
-run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
+run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks,
+						unsigned int ext)
 {
 	unsigned i, j, with_data, with_hash;
 
@@ -477,25 +525,25 @@ run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
 
 	for (with_data = 0; with_data <= 1; with_data++) {
 		for (i = 0; i < NUM_KEYSIZES; i++) {
-			if (create_table(with_data, i, with_locks) < 0)
+			if (create_table(with_data, i, with_locks, ext) < 0)
 				return -1;
 
-			if (get_input_keys(with_pushes, i) < 0)
+			if (get_input_keys(with_pushes, i, ext) < 0)
 				return -1;
 			for (with_hash = 0; with_hash <= 1; with_hash++) {
-				if (timed_adds(with_hash, with_data, i) < 0)
+				if (timed_adds(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				for (j = 0; j < NUM_SHUFFLES; j++)
-					shuffle_input_keys(i);
+					shuffle_input_keys(i, ext);
 
-				if (timed_lookups(with_hash, with_data, i) < 0)
+				if (timed_lookups(with_hash, with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_lookups_multi(with_data, i) < 0)
+				if (timed_lookups_multi(with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_deletes(with_hash, with_data, i) < 0)
+				if (timed_deletes(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				/* Print a dot to show progress on operations */
@@ -631,10 +679,16 @@ test_hash_perf(void)
 				printf("\nALL ELEMENTS IN PRIMARY LOCATION\n");
 			else
 				printf("\nELEMENTS IN PRIMARY OR SECONDARY LOCATION\n");
-			if (run_all_tbl_perf_tests(with_pushes, with_locks) < 0)
+			if (run_all_tbl_perf_tests(with_pushes, with_locks, 0) < 0)
 				return -1;
 		}
 	}
+
+	printf("\n EXTENDABLE BUCKETS PERFORMANCE\n");
+
+	if (run_all_tbl_perf_tests(1, 0, 1) < 0)
+		return -1;
+
 	if (fbk_hash_perf_test() < 0)
 		return -1;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v3 3/3] hash: use partial-key hashing
  2018-09-26 20:26 ` [PATCH v3 0/3] hash: add extendable bucket and partial key hashing Yipeng Wang
  2018-09-26 20:26   ` [PATCH v3 1/3] hash: add extendable bucket feature Yipeng Wang
  2018-09-26 20:26   ` [PATCH v3 2/3] test/hash: implement extendable bucket hash test Yipeng Wang
@ 2018-09-26 20:26   ` Yipeng Wang
  2018-09-28 17:23   ` [PATCH v4 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-26 20:26 UTC (permalink / raw)
  To: bruce.richardson
  Cc: dev, yipeng1.wang, honnappa.nagarahalli, michel, sameh.gobriel

This commit changes the hashing mechanism to "partial-key
hashing" to calculate the bucket index and the signature of a key.

This is proposed in Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". Basically, the idea is to use "xor" to
derive the alternative bucket from the current bucket index and
the signature.

With "partial-key hashing", the bucket memory requirement is
reduced from two cache lines to one cache line, which
improves the memory efficiency and thus the lookup speed.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 228 ++++++++++++++++++--------------------
 lib/librte_hash/rte_cuckoo_hash.h |   6 +-
 lib/librte_hash/rte_hash.h        |   5 +-
 3 files changed, 113 insertions(+), 126 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 1c66a0d..1b9b5e8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -90,6 +90,27 @@ rte_hash_cmp_eq(const void *key1, const void *key2, const struct rte_hash *h)
 		return cmp_jump_table[h->cmp_jump_table_idx](key1, key2, h->key_len);
 }
 
+static inline void
+get_buckets_index(const struct rte_hash *h, const hash_sig_t hash,
+		uint32_t *prim_bkt, uint32_t *sec_bkt, uint16_t *sig)
+{
+	/*
+	 * We use higher 16 bits of hash as the signature value stored in table.
+	 * We use the lower bits for the primary bucket
+	 * location. Then we XOR primary bucket location and the signature
+	 * to get the secondary bucket location. This is same as
+	 * proposed in Bin Fan, et al's paper
+	 * "MemC3: Compact and Concurrent MemCache with Dumber Caching and
+	 * Smarter Hashing". The benefit to use
+	 * XOR is that one could derive the alternative bucket location
+	 * by only using the current bucket location and the signature.
+	 */
+	*sig = hash >> 16;
+
+	*prim_bkt = hash & h->bucket_bitmask;
+	*sec_bkt =  (*prim_bkt ^ *sig) & h->bucket_bitmask;
+}
+
 struct rte_hash *
 rte_hash_create(const struct rte_hash_parameters *params)
 {
@@ -327,9 +348,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
-		h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
-	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
 		h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
 	else
 #endif
@@ -417,18 +436,6 @@ rte_hash_hash(const struct rte_hash *h, const void *key)
 	return h->hash_func(key, h->key_len, h->hash_func_init_val);
 }
 
-/* Calc the secondary hash value from the primary hash value of a given key */
-static inline hash_sig_t
-rte_hash_secondary_hash(const hash_sig_t primary_hash)
-{
-	static const unsigned all_bits_shift = 12;
-	static const unsigned alt_bits_xor = 0x5bd1e995;
-
-	uint32_t tag = primary_hash >> all_bits_shift;
-
-	return primary_hash ^ ((tag + 1) * alt_bits_xor);
-}
-
 int32_t
 rte_hash_count(const struct rte_hash *h)
 {
@@ -560,14 +567,13 @@ enqueue_slot_back(const struct rte_hash *h,
 /* Search a key from bucket and update its data */
 static inline int32_t
 search_and_update(const struct rte_hash *h, void *data, const void *key,
-	struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
+	struct rte_hash_bucket *bkt, uint16_t sig)
 {
 	int i;
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		if (bkt->sig_current[i] == sig &&
-				bkt->sig_alt[i] == alt_hash) {
+		if (bkt->sig_current[i] == sig) {
 			k = (struct rte_hash_key *) ((char *)keys +
 					bkt->key_idx[i] * h->key_entry_size);
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
@@ -594,7 +600,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		struct rte_hash_bucket *prim_bkt,
 		struct rte_hash_bucket *sec_bkt,
 		const struct rte_hash_key *key, void *data,
-		hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+		uint16_t sig, uint32_t new_idx,
 		int32_t *ret_val)
 {
 	unsigned int i;
@@ -605,7 +611,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -613,7 +619,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -628,7 +634,6 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		/* Check if slot is available */
 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
 			prim_bkt->sig_current[i] = sig;
-			prim_bkt->sig_alt[i] = alt_hash;
 			prim_bkt->key_idx[i] = new_idx;
 			break;
 		}
@@ -653,7 +658,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *alt_bkt,
 			const struct rte_hash_key *key, void *data,
 			struct queue_node *leaf, uint32_t leaf_slot,
-			hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+			uint16_t sig, uint32_t new_idx,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
@@ -674,7 +679,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -682,7 +687,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -695,8 +700,9 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		prev_bkt = prev_node->bkt;
 		prev_slot = curr_node->prev_slot;
 
-		prev_alt_bkt_idx =
-			prev_bkt->sig_alt[prev_slot] & h->bucket_bitmask;
+		prev_alt_bkt_idx = (prev_node->cur_bkt_idx ^
+				prev_bkt->sig_current[prev_slot]) &
+				h->bucket_bitmask;
 
 		if (unlikely(&h->buckets[prev_alt_bkt_idx]
 				!= curr_bkt)) {
@@ -710,10 +716,8 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		 * Cuckoo insert to move elements back to its
 		 * primary bucket if available
 		 */
-		curr_bkt->sig_alt[curr_slot] =
-			 prev_bkt->sig_current[prev_slot];
 		curr_bkt->sig_current[curr_slot] =
-			prev_bkt->sig_alt[prev_slot];
+			prev_bkt->sig_current[prev_slot];
 		curr_bkt->key_idx[curr_slot] =
 			prev_bkt->key_idx[prev_slot];
 
@@ -723,7 +727,6 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	curr_bkt->sig_current[curr_slot] = sig;
-	curr_bkt->sig_alt[curr_slot] = alt_hash;
 	curr_bkt->key_idx[curr_slot] = new_idx;
 
 	__hash_rw_writer_unlock(h);
@@ -741,39 +744,44 @@ rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *bkt,
 			struct rte_hash_bucket *sec_bkt,
 			const struct rte_hash_key *key, void *data,
-			hash_sig_t sig, hash_sig_t alt_hash,
+			uint16_t sig, uint32_t bucket_idx,
 			uint32_t new_idx, int32_t *ret_val)
 {
 	unsigned int i;
 	struct queue_node queue[RTE_HASH_BFS_QUEUE_MAX_LEN];
 	struct queue_node *tail, *head;
 	struct rte_hash_bucket *curr_bkt, *alt_bkt;
+	uint32_t cur_idx, alt_idx;
 
 	tail = queue;
 	head = queue + 1;
 	tail->bkt = bkt;
 	tail->prev = NULL;
 	tail->prev_slot = -1;
+	tail->cur_bkt_idx = bucket_idx;
 
 	/* Cuckoo bfs Search */
 	while (likely(tail != head && head <
 					queue + RTE_HASH_BFS_QUEUE_MAX_LEN -
 					RTE_HASH_BUCKET_ENTRIES)) {
 		curr_bkt = tail->bkt;
+		cur_idx = tail->cur_bkt_idx;
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			if (curr_bkt->key_idx[i] == EMPTY_SLOT) {
 				int32_t ret = rte_hash_cuckoo_move_insert_mw(h,
 						bkt, sec_bkt, key, data,
-						tail, i, sig, alt_hash,
+						tail, i, sig,
 						new_idx, ret_val);
 				if (likely(ret != -1))
 					return ret;
 			}
 
 			/* Enqueue new node and keep prev node info */
-			alt_bkt = &(h->buckets[curr_bkt->sig_alt[i]
-						    & h->bucket_bitmask]);
+			alt_idx = (curr_bkt->sig_current[i] ^ cur_idx) &
+							h->bucket_bitmask;
+			alt_bkt = &(h->buckets[alt_idx]);
 			head->bkt = alt_bkt;
+			head->cur_bkt_idx = alt_idx;
 			head->prev = tail;
 			head->prev_slot = i;
 			head++;
@@ -788,7 +796,7 @@ static inline int32_t
 __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig, void *data)
 {
-	hash_sig_t alt_hash;
+	uint16_t short_sig;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
@@ -803,18 +811,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	int32_t ret_val;
 	struct rte_hash_bucket *last;
 
-	prim_bucket_idx = sig & h->bucket_bitmask;
+	get_buckets_index(h, sig, &prim_bucket_idx, &sec_bucket_idx, &short_sig);
 	prim_bkt = &h->buckets[prim_bucket_idx];
-	rte_prefetch0(prim_bkt);
-
-	alt_hash = rte_hash_secondary_hash(sig);
-	sec_bucket_idx = alt_hash & h->bucket_bitmask;
 	sec_bkt = &h->buckets[sec_bucket_idx];
+	rte_prefetch0(prim_bkt);
 	rte_prefetch0(sec_bkt);
 
 	/* Check if key is already inserted in primary location */
 	__hash_rw_writer_lock(h);
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
@@ -822,12 +827,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Check if key is already inserted in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			return ret;
 		}
 	}
+
 	__hash_rw_writer_unlock(h);
 
 	/* Did not find a match, so get a new slot for storing the new key */
@@ -865,7 +871,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+					short_sig, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -875,7 +881,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Primary bucket full, need to make space for new entry */
 	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+				short_sig, prim_bucket_idx, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -885,7 +891,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Also search secondary bucket to get better occupancy */
 	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
-					alt_hash, sig, new_idx, &ret_val);
+				short_sig, sec_bucket_idx, new_idx, &ret_val);
 
 	if (ret == 0)
 		return new_idx - 1;
@@ -905,14 +911,14 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	 */
 	__hash_rw_writer_lock(h);
 	/* We check for duplicates again since could be inserted before the lock */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		goto failure;
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			enqueue_slot_back(h, cached_free_slots, slot_id);
 			goto failure;
@@ -924,8 +930,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
-				cur_bkt->sig_current[i] = alt_hash;
-				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->sig_current[i] = short_sig;
 				cur_bkt->key_idx[i] = new_idx;
 				__hash_rw_writer_unlock(h);
 				return new_idx - 1;
@@ -943,8 +948,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
-	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
-	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
 	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
@@ -1003,7 +1007,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
 
 /* Search one bucket to find the match key */
 static inline int32_t
-search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
+search_one_bucket(const struct rte_hash *h, const void *key, uint16_t sig,
 			void **data, const struct rte_hash_bucket *bkt)
 {
 	int i;
@@ -1032,30 +1036,28 @@ static inline int32_t
 __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 					hash_sig_t sig, void **data)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	get_buckets_index(h, sig, &prim_bucket_idx, &sec_bucket_idx, &short_sig);
+	bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_reader_lock(h);
 
 	/* Check if key is in primary location */
-	ret = search_one_bucket(h, key, sig, data, bkt);
+	ret = search_one_bucket(h, key, short_sig, data, bkt);
 	if (ret != -1) {
 		__hash_rw_reader_unlock(h);
 		return ret;
 	}
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	bkt = &h->buckets[sec_bucket_idx];
 
 	/* Check if key is in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, bkt) {
-		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
 		if (ret != -1) {
 			__hash_rw_reader_unlock(h);
 			return ret;
@@ -1102,7 +1104,6 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	struct lcore_cache *cached_free_slots;
 
 	bkt->sig_current[i] = NULL_SIGNATURE;
-	bkt->sig_alt[i] = NULL_SIGNATURE;
 	if (h->multi_writer_support) {
 		lcore_id = rte_lcore_id();
 		cached_free_slots = &h->local_free_slots[lcore_id];
@@ -1141,9 +1142,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
 			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
-			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
 			last_bkt->sig_current[i] = NULL_SIGNATURE;
-			last_bkt->sig_alt[i] = NULL_SIGNATURE;
 			last_bkt->key_idx[i] = EMPTY_SLOT;
 			return;
 		}
@@ -1153,7 +1152,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
+			struct rte_hash_bucket *bkt, uint16_t sig, int *pos)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
@@ -1185,19 +1184,19 @@ static inline int32_t
 __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
 	struct rte_hash_bucket *cur_bkt;
 	int pos;
 	int32_t ret, i;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	prim_bkt = &h->buckets[bucket_idx];
+	get_buckets_index(h, sig, &prim_bucket_idx, &sec_bucket_idx, &short_sig);
+	prim_bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
+	ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
 	if (ret != -1) {
 		__rte_hash_compact_ll(prim_bkt, pos);
 		last_bkt = prim_bkt->next;
@@ -1206,12 +1205,10 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	sec_bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[sec_bucket_idx];
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
+		ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
 		if (ret != -1) {
 			__rte_hash_compact_ll(cur_bkt, pos);
 			last_bkt = sec_bkt->next;
@@ -1288,55 +1285,35 @@ static inline void
 compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 			const struct rte_hash_bucket *prim_bkt,
 			const struct rte_hash_bucket *sec_bkt,
-			hash_sig_t prim_hash, hash_sig_t sec_hash,
+			uint16_t sig,
 			enum rte_hash_sig_compare_function sig_cmp_fn)
 {
 	unsigned int i;
 
+	/* For match mask the first bit of every two bits indicates the match */
 	switch (sig_cmp_fn) {
-#ifdef RTE_MACHINE_CPUFLAG_AVX2
-	case RTE_HASH_COMPARE_AVX2:
-		*prim_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)prim_bkt->sig_current),
-				_mm256_set1_epi32(prim_hash)));
-		*sec_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)sec_bkt->sig_current),
-				_mm256_set1_epi32(sec_hash)));
-		break;
-#endif
 #ifdef RTE_MACHINE_CPUFLAG_SSE2
 	case RTE_HASH_COMPARE_SSE:
-		/* Compare the first 4 signatures in the bucket */
-		*prim_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+		/* Compare all signatures in the bucket */
+		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)prim_bkt->sig_current),
-				_mm_set1_epi32(prim_hash)));
-		*prim_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&prim_bkt->sig_current[4]),
-				_mm_set1_epi32(prim_hash)))) << 4;
-		/* Compare the first 4 signatures in the bucket */
-		*sec_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+				_mm_set1_epi16(sig)));
+		/* Compare all signatures in the bucket */
+		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)sec_bkt->sig_current),
-				_mm_set1_epi32(sec_hash)));
-		*sec_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&sec_bkt->sig_current[4]),
-				_mm_set1_epi32(sec_hash)))) << 4;
+				_mm_set1_epi16(sig)));
 		break;
 #endif
 	default:
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			*prim_hash_matches |=
-				((prim_hash == prim_bkt->sig_current[i]) << i);
+				((sig == prim_bkt->sig_current[i]) << (i << 1));
 			*sec_hash_matches |=
-				((sec_hash == sec_bkt->sig_current[i]) << i);
+				((sig == sec_bkt->sig_current[i]) << (i << 1));
 		}
 	}
-
 }
 
 #define PREFETCH_OFFSET 4
@@ -1349,7 +1326,9 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	int32_t i;
 	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
-	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
@@ -1368,10 +1347,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		rte_prefetch0(keys[i + PREFETCH_OFFSET]);
 
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
+		get_buckets_index(h, prim_hash[i],
+				&prim_index[i], &sec_index[i], &sig[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1380,10 +1360,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	/* Calculate and prefetch rest of the buckets */
 	for (; i < num_keys; i++) {
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		get_buckets_index(h, prim_hash[i],
+				&prim_index[i], &sec_index[i], &sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1394,10 +1376,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
 				primary_bkt[i], secondary_bkt[i],
-				prim_hash[i], sec_hash[i], h->sig_cmp_fn);
+				sig[i], h->sig_cmp_fn);
 
 		if (prim_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 			uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1408,7 +1391,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		}
 
 		if (sec_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 			uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1422,7 +1406,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		positions[i] = -ENOENT;
 		while (prim_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 
 			uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1441,11 +1426,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			prim_hitmask[i] &= ~(1 << (hit_index));
+			prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 		while (sec_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 
 			uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1465,7 +1451,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			sec_hitmask[i] &= ~(1 << (hit_index));
+			sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 next_key:
@@ -1488,10 +1474,10 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
 			if (data != NULL)
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], &data[i], cur_bkt);
+						sig[i], &data[i], cur_bkt);
 			else
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], NULL, cur_bkt);
+						sig[i], NULL, cur_bkt);
 			if (ret != -1) {
 				positions[i] = ret;
 				hits |= 1ULL << i;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index e601520..7753cd8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -129,18 +129,15 @@ struct rte_hash_key {
 enum rte_hash_sig_compare_function {
 	RTE_HASH_COMPARE_SCALAR = 0,
 	RTE_HASH_COMPARE_SSE,
-	RTE_HASH_COMPARE_AVX2,
 	RTE_HASH_COMPARE_NUM
 };
 
 /** Bucket structure */
 struct rte_hash_bucket {
-	hash_sig_t sig_current[RTE_HASH_BUCKET_ENTRIES];
+	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
 
 	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
 
-	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
-
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
 
 	void *next;
@@ -193,6 +190,7 @@ struct rte_hash {
 
 struct queue_node {
 	struct rte_hash_bucket *bkt; /* Current bucket on the bfs search */
+	uint32_t cur_bkt_idx;
 
 	struct queue_node *prev;     /* Parent(bucket) in search path */
 	int prev_slot;               /* Parent(slot) in search path */
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 11d8e28..0bd7696 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -40,7 +40,10 @@ extern "C" {
 /** Flag to indicate the extendabe bucket table feature should be used */
 #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
 
-/** Signature of key that is stored internally. */
+/**
+ * A hash value that is used to generate signature stored in table and the
+ * location the signature is stored.
+ */
 typedef uint32_t hash_sig_t;
 
 /** Type of function that can be used for calculating the hash value. */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 1/7] test/hash: fix bucket size in hash perf test
  2018-09-26 10:04     ` Bruce Richardson
@ 2018-09-27  3:39       ` Wang, Yipeng1
  0 siblings, 0 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-09-27  3:39 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev

Hi Bruce,

In the test, the bucket size and number of buckets are used
to map to the underlying rte_hash structure. They are used
to test the performance of two scenarios: keys in primary
buckets only, and keys in both primary and secondary buckets.
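
To illustrate the idea (a schematic sketch, not the actual test code): keep a
per-bucket occupancy counter sized NUM_BUCKETS, and count a generated key as
"primary only" while its primary bucket still has room, otherwise it will have
to live in (or be pushed to) its secondary bucket. This is also why BUCKET_SIZE
has to mirror RTE_HASH_BUCKET_ENTRIES of the underlying table:

static uint8_t bucket_occupancy[NUM_BUCKETS];

/* Returns 1 if this key still fits in its primary bucket,
 * 0 if it will have to end up in the secondary bucket.
 */
static int
key_fits_in_primary(uint32_t hash)
{
	uint32_t bkt = hash & (NUM_BUCKETS - 1);

	if (bucket_occupancy[bkt] < BUCKET_SIZE) {
		bucket_occupancy[bkt]++;
		return 1;
	}
	return 0;
}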

Although there is no functional issue with the bucket size set
to 4, it mismatches the underlying rte_hash structure (i.e. 8),
which may affect code readability and future extension.

I added this description into the commit message.

Thanks
Yipeng

>-----Original Message-----
>
>Can you perhaps give a little detail on what actual problems this caused.
>Did it just mean that we used up too much memory in the test because we
>thought there were more buckets than there were, or something else?
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 3/7] test/hash: fix rw test with non-consecutive cores
  2018-09-26 11:02     ` Bruce Richardson
@ 2018-09-27  3:40       ` Wang, Yipeng1
  0 siblings, 0 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-09-27  3:40 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev, Gobriel, Sameh

I added another commit for this. Please test.

Thanks!

>-----Original Message-----
>
>When testing this patch, I see that the read-write autotests are not
>currently in the meson.build file for the test binary. I think this
>patchset should include this fix too, as a separate patch.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 0/7] hash: add extendable bucket and partial key hashing
  2018-09-26 12:57   ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Bruce Richardson
@ 2018-09-27  3:41     ` Wang, Yipeng1
  0 siblings, 0 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-09-27  3:41 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev, Gobriel, Sameh

Done! Now they are two separate patch sets.

Thanks
Yipeng

>-----Original Message-----
>From: Richardson, Bruce
>I'd suggest splitting this set into two. The first 4 patches are easy to
>review and should be quickly merged (I hope :-)), allowing us to focus more on
>the bigger patches adding the key new feature support.
>
>/Bruce

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 0/7] hash: add extendable bucket and partial key hashing
  2018-09-21 17:17 ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Yipeng Wang
                     ` (7 preceding siblings ...)
  2018-09-26 12:57   ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Bruce Richardson
@ 2018-09-27  4:23   ` Honnappa Nagarahalli
  2018-09-29  0:46     ` Wang, Yipeng1
  8 siblings, 1 reply; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-27  4:23 UTC (permalink / raw)
  To: Yipeng Wang, bruce.richardson; +Cc: dev, michel, nd



> -----Original Message-----
> From: Yipeng Wang <yipeng1.wang@intel.com>
> Sent: Friday, September 21, 2018 12:17 PM
> To: bruce.richardson@intel.com
> Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Subject: [PATCH v2 0/7] hash: add extendable bucket and partial key hashing
> 
> The first four commits of the patch set try to fix small issues of previous code.
> 
> The other commits make two major optimizations over the current rte_hash
> library.
> 
> First, it adds Extendable Bucket Table feature: a new structure that can
> accommodate keys that failed to get inserted into the main hash table due to
> the unlikely event of excessive hash collisions. The hash table buckets will get
> extended using a linked list to host these keys. This new design will guarantee
> insertion of 100% of the keys for a given hash table size with minimal
> overhead. A new flag value is added for user to indicate if the extendable
> bucket feature should be enabled or not. The linked list buckets is similar
> concept to the extendable bucket hash table in packet framework.
> In details, for insertion, the linked buckets will be used to store the keys that
> fail to get in the primary and the secondary bucket and the cuckoo path could
> not find an empty location for the maximum path length (small probability).
> For lookup, the key is checked first in the primary, then the secondary, then if
> the secondary is extended the linked list is traversed for a possible match.
> 
> Second, the patch set changes the current hashing algorithm to be "partial-
> key hashing". Partial-key hashing is the concept from Bin Fan, et al.'s paper
> "MemC3: Compact and Concurrent MemCache with Dumber Caching and
> Smarter Hashing".
I read this paper (but not the papers in its references). My understanding is that the existing algorithm already uses 'partial-key hashing'. This patch set is not adding the 'partial-key hashing' feature; instead it is reducing the size of the signature ('tag' as it is referred to in the paper) from 32b to 16b.
Please let me know if I have not understood this correctly.

> Instead of storing both 32-bit signature and alternative
> signature in the bucket, we only store a small 16-bit signature and calculate
> the alternative bucket index by XORing the signature with the current bucket
> index.
According to the referenced paper, the signature ('tag') reduces the number of accesses to the keys, thus improving performance.
But if we reduce the size of the signature from 32b to 16b, it will result in a higher probability of false matches on the signature. This in turn will increase the number of accesses to the keys. Have you run any performance benchmarks and compared the numbers with the existing code? Is it possible to share the numbers?
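
For rough context (a back-of-the-envelope estimate, assuming uniformly
distributed signatures): a non-matching entry collides with a 16-bit signature
with probability 1/2^16, so a full 8-entry bucket causes on average 8/2^16
(~0.012%) spurious key compares per probed bucket, versus 8/2^32 with the
current 32-bit signatures. Whether that difference shows up end to end is
exactly what benchmark numbers would tell us.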

> This doubles the hash table memory efficiency since now one bucket only
> occupies one cache line instead of two in the original design.
Agree, reduced memory footprint should help increase the performance.
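
For reference, a minimal sketch of what the helper used throughout the
partial-key patch (get_buckets_index(), visible in the v1 diff earlier in the
thread) presumably does, based on the description above. The exact choice of
which 16 bits become the stored tag is an assumption here:

static inline void
get_buckets_index(const struct rte_hash *h, const hash_sig_t hash,
		uint32_t *prim_bkt, uint32_t *sec_bkt, uint16_t *sig)
{
	*prim_bkt = hash & h->bucket_bitmask;
	*sig = hash >> 16;	/* assumed: upper 16 bits as the stored tag */
	*sec_bkt = (*prim_bkt ^ *sig) & h->bucket_bitmask;
}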

> 
> V1->V2:
> 1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
> 2. hash: Reorder the rte_hash struct to align cache line better.
> 3. test: Minor changes in auto test to add key insertion failure check during
> iteration test.
> 4. test: Add new commit to fix read-write test non-consecutive core issue.
> 4. hash: Add a new commit to remove unnecessary code introduced by
> previous patches.
> 5. hash: Comments improvement and coding style improvements over
> multiple places.
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> 
> Yipeng Wang (7):
>   test/hash: fix bucket size in hash perf test
>   test/hash: more accurate hash perf test output
>   test/hash: fix rw test with non-consecutive cores
>   hash: fix unnecessary code
>   hash: add extendable bucket feature
>   test/hash: implement extendable bucket hash test
>   hash: use partial-key hashing
> 
>  lib/librte_hash/rte_cuckoo_hash.c | 516 +++++++++++++++++++++++++++------
> -----
>  lib/librte_hash/rte_cuckoo_hash.h |  13 +-
>  lib/librte_hash/rte_hash.h        |   8 +-
>  test/test/test_hash.c             | 151 ++++++++++-
>  test/test/test_hash_perf.c        | 126 +++++++---
>  test/test/test_hash_readwrite.c   |  78 +++---
>  6 files changed, 672 insertions(+), 220 deletions(-)
> 
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 1/7] test/hash: fix bucket size in hash perf test
  2018-09-21 17:17   ` [PATCH v2 1/7] test/hash: fix bucket size in hash perf test Yipeng Wang
  2018-09-26 10:04     ` Bruce Richardson
@ 2018-09-27  4:23     ` Honnappa Nagarahalli
  2018-09-29  0:31       ` Wang, Yipeng1
  1 sibling, 1 reply; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-27  4:23 UTC (permalink / raw)
  To: Yipeng Wang, bruce.richardson; +Cc: dev, michel, nd



> -----Original Message-----
> From: Yipeng Wang <yipeng1.wang@intel.com>
> Sent: Friday, September 21, 2018 12:17 PM
> To: bruce.richardson@intel.com
> Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Subject: [PATCH v2 1/7] test/hash: fix bucket size in hash perf test
> 
> The bucket size was changed from 4 to 8 but the corresponding perf test was
> not changed accordingly.
> 
> Fixes: 58017c98ed53 ("hash: add vectorized comparison")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
>  test/test/test_hash_perf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c index
> 33dcb9f..9ed7125 100644
> --- a/test/test/test_hash_perf.c
> +++ b/test/test/test_hash_perf.c
> @@ -20,7 +20,7 @@
>  #define MAX_ENTRIES (1 << 19)
>  #define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
> #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added,
> several times */ -#define BUCKET_SIZE 4
> +#define BUCKET_SIZE 8
Maybe we should add a comment to warn that it should be kept the same as 'RTE_HASH_BUCKET_ENTRIES'?
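Something along these lines would make the dependency explicit (illustrative
only):

/* Must be kept in sync with RTE_HASH_BUCKET_ENTRIES in rte_cuckoo_hash.h */
#define BUCKET_SIZE 8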

>  #define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)  #define
> MAX_KEYSIZE 64  #define NUM_KEYSIZES 10
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 5/7] hash: add extendable bucket feature
  2018-09-21 17:17   ` [PATCH v2 5/7] hash: add extendable bucket feature Yipeng Wang
@ 2018-09-27  4:23     ` Honnappa Nagarahalli
  2018-09-27 11:15       ` Bruce Richardson
  2018-09-29  1:10       ` Wang, Yipeng1
  0 siblings, 2 replies; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-27  4:23 UTC (permalink / raw)
  To: Yipeng Wang, bruce.richardson; +Cc: dev, michel



> -----Original Message-----
> From: Yipeng Wang <yipeng1.wang@intel.com>
> Sent: Friday, September 21, 2018 12:18 PM
> To: bruce.richardson@intel.com
> Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Subject: [PATCH v2 5/7] hash: add extendable bucket feature
>
> In use cases where hash table capacity needs to be guaranteed, the extendable
> bucket feature can be used to contain extra keys in linked lists when conflicts
> happen. This is a similar concept to the extendable bucket hash table in the
> packet framework.
>
> This commit adds the extendable bucket feature. The user can turn it on or off
> through the extra flag field at table creation time.
>
> The extendable bucket table is composed of buckets that can be linked, as a
> list, to the current main table. When the extendable bucket feature is
> enabled, the table utilization can always achieve 100%.
IMO, referring to this as 'table utilization' reads like a statement about memory efficiency. Please consider rewording it to say that all of the configured number of entries will be accommodated?

> Although keys ending up in the ext buckets may have longer look up time, they
> should be rare due to the cuckoo algorithm.
>
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
>  lib/librte_hash/rte_cuckoo_hash.c | 326
> +++++++++++++++++++++++++++++++++-----
>  lib/librte_hash/rte_cuckoo_hash.h |   5 +
>  lib/librte_hash/rte_hash.h        |   3 +
>  3 files changed, 292 insertions(+), 42 deletions(-)
>
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> b/lib/librte_hash/rte_cuckoo_hash.c
> index f7b86c8..616900b 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -31,6 +31,10 @@
>  #include "rte_hash.h"
>  #include "rte_cuckoo_hash.h"
>
> +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)
> \
> +for (CURRENT_BKT = START_BUCKET;                                      \
> +CURRENT_BKT != NULL;                                          \
> +CURRENT_BKT = CURRENT_BKT->next)
>
>  TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
>
> @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
>  return h;
>  }
>
> +static inline struct rte_hash_bucket *
> +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt) {
> +while (lst_bkt->next != NULL)
> +lst_bkt = lst_bkt->next;
> +return lst_bkt;
> +}
> +
>  void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)  {
>  h->cmp_jump_table_idx = KEY_CUSTOM;
> @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  struct rte_tailq_entry *te = NULL;
>  struct rte_hash_list *hash_list;
>  struct rte_ring *r = NULL;
> +struct rte_ring *r_ext = NULL;
>  char hash_name[RTE_HASH_NAMESIZE];
>  void *k = NULL;
>  void *buckets = NULL;
> +void *buckets_ext = NULL;
>  char ring_name[RTE_RING_NAMESIZE];
> +char ext_ring_name[RTE_RING_NAMESIZE];
>  unsigned num_key_slots;
>  unsigned i;
>  unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> +unsigned int ext_table_support = 0;
>  unsigned int readwrite_concur_support = 0;
>
>  rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  multi_writer_support = 1;
>  }
>
> +if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> +ext_table_support = 1;
> +
>  /* Store all keys and leave the first entry as a dummy entry for
> lookup_bulk */
>  if (multi_writer_support)
>  /*
> @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  goto err;
>  }
>
> +const uint32_t num_buckets = rte_align32pow2(params->entries) /
> +RTE_HASH_BUCKET_ENTRIES;
> +
> +snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> +params-
> >name);
Can be inside the if statement below.

> +/* Create ring for extendable buckets. */
> +if (ext_table_support) {
> +r_ext = rte_ring_create(ext_ring_name,
> +rte_align32pow2(num_buckets + 1),
> +params->socket_id, 0);
> +
> +if (r_ext == NULL) {
> +RTE_LOG(ERR, HASH, "ext buckets memory allocation
> "
> +"failed\n");
> +goto err;
> +}
> +}
> +
>  snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
>
>  rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  goto err_unlock;
>  }
>
> -const uint32_t num_buckets = rte_align32pow2(params->entries)
> -/ RTE_HASH_BUCKET_ENTRIES;
> -
>  buckets = rte_zmalloc_socket(NULL,
>  num_buckets * sizeof(struct rte_hash_bucket),
>  RTE_CACHE_LINE_SIZE, params->socket_id);
>
>  if (buckets == NULL) {
> -RTE_LOG(ERR, HASH, "memory allocation failed\n");
> +RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
>  goto err_unlock;
>  }
>
> +/* Allocate same number of extendable buckets */
IMO, we are allocating too much memory to support this feature, especially when we claim that keys ending up in the extendable table are a rare occurrence. By doubling the memory we are effectively saying that the main table might have only 50% utilization. It will also significantly increase the cycles required to iterate the complete hash table (in the rte_hash_iterate API) even when we expect that the extendable table contains very few entries.

I am wondering if we can provide options to control the amount of extra memory that gets allocated and make the memory allocation dynamic (or on an on-demand basis). I think this also goes well with the general direction DPDK is taking - allocate resources as needed rather than allocating all the resources during initialization.
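
Purely as an illustration of the on-demand idea (not the patch's
implementation; the helper name is made up): an extendable bucket could be
allocated only when the cuckoo path actually fails, e.g.:

static struct rte_hash_bucket *
alloc_ext_bkt_on_demand(int socket_id)
{
	/* one zeroed, cache-line-aligned bucket on the table's socket */
	return rte_zmalloc_socket(NULL, sizeof(struct rte_hash_bucket),
			RTE_CACHE_LINE_SIZE, socket_id);
}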

> +if (ext_table_support) {
> +buckets_ext = rte_zmalloc_socket(NULL,
> +num_buckets * sizeof(struct rte_hash_bucket),
> +RTE_CACHE_LINE_SIZE, params->socket_id);
> +if (buckets_ext == NULL) {
> +RTE_LOG(ERR, HASH, "ext buckets memory allocation
> "
> +"failed\n");
> +goto err_unlock;
> +}
> +/* Populate ext bkt ring. We reserve 0 similar to the
> + * key-data slot, just in case in future we want to
> + * use bucket index for the linked list and 0 means NULL
> + * for next bucket
> + */
> +for (i = 1; i <= num_buckets; i++)
Since bucket index 0 is reserved, this should be 'i < num_buckets'

> +rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
> +}
> +
>  const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params-
> >key_len;
>  const uint64_t key_tbl_size = (uint64_t) key_entry_size *
> num_key_slots;
>
> @@ -262,6 +315,8 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  h->num_buckets = num_buckets;
>  h->bucket_bitmask = h->num_buckets - 1;
>  h->buckets = buckets;
> +h->buckets_ext = buckets_ext;
> +h->free_ext_bkts = r_ext;
>  h->hash_func = (params->hash_func == NULL) ?
>  default_hash_func : params->hash_func;
>  h->key_store = k;
> @@ -269,6 +324,7 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  h->hw_trans_mem_support = hw_trans_mem_support;
>  h->multi_writer_support = multi_writer_support;
>  h->readwrite_concur_support = readwrite_concur_support;
> +h->ext_table_support = ext_table_support;
>
>  #if defined(RTE_ARCH_X86)
>  if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> @@ -304,9 +360,11 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
>  err:
>  rte_ring_free(r);
> +rte_ring_free(r_ext);
>  rte_free(te);
>  rte_free(h);
>  rte_free(buckets);
> +rte_free(buckets_ext);
>  rte_free(k);
>  return NULL;
>  }
> @@ -344,6 +402,7 @@ rte_hash_free(struct rte_hash *h)
>  rte_free(h->readwrite_lock);
>  }
>  rte_ring_free(h->free_slots);
> +rte_ring_free(h->free_ext_bkts);
>  rte_free(h->key_store);
>  rte_free(h->buckets);
Add rte_free(h->buckets_ext);

>  rte_free(h);
> @@ -403,7 +462,6 @@ __hash_rw_writer_lock(const struct rte_hash *h)
>  rte_rwlock_write_lock(h->readwrite_lock);
>  }
>
> -
>  static inline void
>  __hash_rw_reader_lock(const struct rte_hash *h)  { @@ -448,6 +506,14 @@
> rte_hash_reset(struct rte_hash *h)
>  while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
>  rte_pause();
>
> +/* clear free extendable bucket ring and memory */
> +if (h->ext_table_support) {
> +memset(h->buckets_ext, 0, h->num_buckets *
> +sizeof(struct
> rte_hash_bucket));
> +while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
> +rte_pause();
> +}
> +
>  /* Repopulate the free slots ring. Entry zero is reserved for key misses
> */
>  if (h->multi_writer_support)
>  tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) * @@ -
> 458,6 +524,12 @@ rte_hash_reset(struct rte_hash *h)
>  for (i = 1; i < tot_ring_cnt + 1; i++)
>  rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
>
> +/* Repopulate the free ext bkt ring. */
> +if (h->ext_table_support)
> +for (i = 1; i < h->num_buckets + 1; i++)
Index 0 is reserved as per the comments. Condition should be 'i < h->num_buckets'.

> +rte_ring_sp_enqueue(h->free_ext_bkts,
> +(void *)((uintptr_t) i));
> +
>  if (h->multi_writer_support) {
>  /* Reset local caches per lcore */
>  for (i = 0; i < RTE_MAX_LCORE; i++)
> @@ -524,24 +596,27 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash
> *h,
>  int32_t *ret_val)
>  {
>  unsigned int i;
> -struct rte_hash_bucket *cur_bkt = prim_bkt;
> +struct rte_hash_bucket *cur_bkt;
>  int32_t ret;
>
>  __hash_rw_writer_lock(h);
>  /* Check if key was inserted after last check but before this
>   * protected region in case of inserting duplicated keys.
>   */
> -ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
> +ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
>  if (ret != -1) {
>  __hash_rw_writer_unlock(h);
>  *ret_val = ret;
>  return 1;
>  }
> -ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
> -if (ret != -1) {
> -__hash_rw_writer_unlock(h);
> -*ret_val = ret;
> -return 1;
> +
> +FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +if (ret != -1) {
> +__hash_rw_writer_unlock(h);
> +*ret_val = ret;
> +return 1;
> +}
>  }
>
>  /* Insert new entry if there is room in the primary @@ -580,7 +655,7
> @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>  int32_t *ret_val)
>  {
>  uint32_t prev_alt_bkt_idx;
> -struct rte_hash_bucket *cur_bkt = bkt;
> +struct rte_hash_bucket *cur_bkt;
>  struct queue_node *prev_node, *curr_node = leaf;
>  struct rte_hash_bucket *prev_bkt, *curr_bkt = leaf->bkt;
>  uint32_t prev_slot, curr_slot = leaf_slot; @@ -597,18 +672,20 @@
> rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>  /* Check if key was inserted after last check but before this
>   * protected region.
>   */
> -ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
> +ret = search_and_update(h, data, key, bkt, sig, alt_hash);
>  if (ret != -1) {
>  __hash_rw_writer_unlock(h);
>  *ret_val = ret;
>  return 1;
>  }
>
> -ret = search_and_update(h, data, key, alt_bkt, alt_hash, sig);
> -if (ret != -1) {
> -__hash_rw_writer_unlock(h);
> -*ret_val = ret;
> -return 1;
> +FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
> +ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +if (ret != -1) {
> +__hash_rw_writer_unlock(h);
> +*ret_val = ret;
> +return 1;
> +}
>  }
>
>  while (likely(curr_node->prev != NULL)) { @@ -711,15 +788,18 @@
> __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,  {
>  hash_sig_t alt_hash;
>  uint32_t prim_bucket_idx, sec_bucket_idx;
> -struct rte_hash_bucket *prim_bkt, *sec_bkt;
> +struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
>  struct rte_hash_key *new_k, *keys = h->key_store;
>  void *slot_id = NULL;
> -uint32_t new_idx;
> +void *ext_bkt_id = NULL;
> +uint32_t new_idx, bkt_id;
>  int ret;
>  unsigned n_slots;
>  unsigned lcore_id;
> +unsigned int i;
>  struct lcore_cache *cached_free_slots = NULL;
>  int32_t ret_val;
> +struct rte_hash_bucket *last;
>
>  prim_bucket_idx = sig & h->bucket_bitmask;
>  prim_bkt = &h->buckets[prim_bucket_idx]; @@ -739,10 +819,12 @@
> __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>  }
>
>  /* Check if key is already inserted in secondary location */
> -ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
> -if (ret != -1) {
> -__hash_rw_writer_unlock(h);
> -return ret;
> +FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +if (ret != -1) {
> +__hash_rw_writer_unlock(h);
> +return ret;
> +}
>  }
>  __hash_rw_writer_unlock(h);
>
> @@ -808,10 +890,71 @@ __rte_hash_add_key_with_hash(const struct
> rte_hash *h, const void *key,
>  else if (ret == 1) {
>  enqueue_slot_back(h, cached_free_slots, slot_id);
>  return ret_val;
> -} else {
> +}
> +
> +/* if ext table not enabled, we failed the insertion */
> +if (!h->ext_table_support) {
>  enqueue_slot_back(h, cached_free_slots, slot_id);
>  return ret;
>  }
> +
> +/* Now we need to go through the extendable bucket. Protection is
> needed
> + * to protect all extendable bucket processes.
> + */
> +__hash_rw_writer_lock(h);
> +/* We check for duplicates again since could be inserted before the
> lock */
> +ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +if (ret != -1) {
> +enqueue_slot_back(h, cached_free_slots, slot_id);
> +goto failure;
> +}
> +
> +FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +if (ret != -1) {
> +enqueue_slot_back(h, cached_free_slots, slot_id);
> +goto failure;
> +}
> +}
> +
> +/* Search extendable buckets to find an empty entry to insert. */
> +struct rte_hash_bucket *next_bkt = sec_bkt->next;
> +FOR_EACH_BUCKET(cur_bkt, next_bkt) {
> +for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +/* Check if slot is available */
> +if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
> +cur_bkt->sig_current[i] = alt_hash;
> +cur_bkt->sig_alt[i] = sig;
> +cur_bkt->key_idx[i] = new_idx;
> +__hash_rw_writer_unlock(h);
> +return new_idx - 1;
> +}
> +}
> +}
> +
> +/* Failed to get an empty entry from extendable buckets. Link a new
> + * extendable bucket. We first get a free bucket from ring.
> + */
> +if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
> +ret = -ENOSPC;
> +goto failure;
> +}
> +
> +bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
If index 0 is reserved, -1 is not required.

> +/* Use the first location of the new bucket */
> +(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
> +(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
> +(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
> +/* Link the new bucket to sec bucket linked list */
> +last = rte_hash_get_last_bkt(sec_bkt);
> +last->next = &h->buckets_ext[bkt_id];
> +__hash_rw_writer_unlock(h);
> +return new_idx - 1;
> +
> +failure:
> +__hash_rw_writer_unlock(h);
> +return ret;
> +
>  }
>
>  int32_t
> @@ -890,7 +1033,7 @@ __rte_hash_lookup_with_hash(const struct rte_hash
> *h, const void *key,  {
>  uint32_t bucket_idx;
>  hash_sig_t alt_hash;
> -struct rte_hash_bucket *bkt;
> +struct rte_hash_bucket *bkt, *cur_bkt;
>  int ret;
>
>  bucket_idx = sig & h->bucket_bitmask;
> @@ -910,10 +1053,12 @@ __rte_hash_lookup_with_hash(const struct
> rte_hash *h, const void *key,
>  bkt = &h->buckets[bucket_idx];
>
>  /* Check if key is in secondary location */
> -ret = search_one_bucket(h, key, alt_hash, data, bkt);
> -if (ret != -1) {
> -__hash_rw_reader_unlock(h);
> -return ret;
> +FOR_EACH_BUCKET(cur_bkt, bkt) {
> +ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
> +if (ret != -1) {
> +__hash_rw_reader_unlock(h);
> +return ret;
> +}
>  }
>  __hash_rw_reader_unlock(h);
>  return -ENOENT;
> @@ -1015,15 +1160,17 @@ __rte_hash_del_key_with_hash(const struct
> rte_hash *h, const void *key,  {
>  uint32_t bucket_idx;
>  hash_sig_t alt_hash;
> -struct rte_hash_bucket *bkt;
> -int32_t ret;
> +struct rte_hash_bucket *prim_bkt, *sec_bkt;
> +struct rte_hash_bucket *cur_bkt, *prev_bkt, *next_bkt;
> +int32_t ret, i;
> +struct rte_hash_bucket *tobe_removed_bkt = NULL;
>
>  bucket_idx = sig & h->bucket_bitmask;
> -bkt = &h->buckets[bucket_idx];
> +prim_bkt = &h->buckets[bucket_idx];
>
>  __hash_rw_writer_lock(h);
>  /* look for key in primary bucket */
> -ret = search_and_remove(h, key, bkt, sig);
> +ret = search_and_remove(h, key, prim_bkt, sig);
>  if (ret != -1) {
>  __hash_rw_writer_unlock(h);
>  return ret;
> @@ -1032,17 +1179,51 @@ __rte_hash_del_key_with_hash(const struct
> rte_hash *h, const void *key,
>  /* Calculate secondary hash */
>  alt_hash = rte_hash_secondary_hash(sig);
>  bucket_idx = alt_hash & h->bucket_bitmask;
> -bkt = &h->buckets[bucket_idx];
> +sec_bkt = &h->buckets[bucket_idx];
>
>  /* look for key in secondary bucket */
> -ret = search_and_remove(h, key, bkt, alt_hash);
> +ret = search_and_remove(h, key, sec_bkt, alt_hash);
>  if (ret != -1) {
>  __hash_rw_writer_unlock(h);
>  return ret;
>  }
>
> +/* Not in main table, we need to search ext buckets */
> +if (h->ext_table_support) {
> +next_bkt = sec_bkt->next;
> +FOR_EACH_BUCKET(cur_bkt, next_bkt) {
> +ret = search_and_remove(h, key, cur_bkt, alt_hash);
> +if (ret != -1)
> +goto return_bkt;
> +}
> +}
> +
>  __hash_rw_writer_unlock(h);
>  return -ENOENT;
> +
> +/* Search extendable buckets to see if any empty bucket need to be
> +recycled */
> +return_bkt:
> +for (cur_bkt = sec_bkt->next, prev_bkt = sec_bkt; cur_bkt != NULL;
> +prev_bkt = cur_bkt, cur_bkt = cur_bkt->next) {
> +for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +if (cur_bkt->key_idx[i] != EMPTY_SLOT)
> +break;
> +}
> +if (i == RTE_HASH_BUCKET_ENTRIES) {
> +prev_bkt->next = cur_bkt->next;
> +cur_bkt->next = NULL;
> +tobe_removed_bkt = cur_bkt;
> +break;
> +}
> +}
> +
> +__hash_rw_writer_unlock(h);
> +
> +if (tobe_removed_bkt) {
> +uint32_t index = tobe_removed_bkt - h->buckets_ext + 1;
No need to increase the index by 1 if entry 0 is reserved.

> +rte_ring_mp_enqueue(h->free_ext_bkts, (void
> *)(uintptr_t)index);
> +}
> +return ret;
>  }
>
>  int32_t
> @@ -1143,12 +1324,14 @@ __rte_hash_lookup_bulk(const struct rte_hash
> *h, const void **keys,  {
>  uint64_t hits = 0;
>  int32_t i;
> +int32_t ret;
>  uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
>  uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
>  const struct rte_hash_bucket
> *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>  const struct rte_hash_bucket
> *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>  uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
>  uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
> +struct rte_hash_bucket *cur_bkt, *next_bkt;
>
>  /* Prefetch first keys */
>  for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++) @@ -1266,6
> +1449,34 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void
> **keys,
>  continue;
>  }
>
> +/* all found, do not need to go through ext bkt */
> +if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
> +if (hit_mask != NULL)
> +*hit_mask = hits;
> +__hash_rw_reader_unlock(h);
> +return;
> +}
> +
> +/* need to check ext buckets for match */
> +for (i = 0; i < num_keys; i++) {
> +if ((hits & (1ULL << i)) != 0)
> +continue;
> +next_bkt = secondary_bkt[i]->next;
> +FOR_EACH_BUCKET(cur_bkt, next_bkt) {
> +if (data != NULL)
> +ret = search_one_bucket(h, keys[i],
> +sec_hash[i], &data[i],
> cur_bkt);
> +else
> +ret = search_one_bucket(h, keys[i],
> +sec_hash[i], NULL, cur_bkt);
> +if (ret != -1) {
> +positions[i] = ret;
> +hits |= 1ULL << i;
> +break;
> +}
> +}
> +}
> +
>  __hash_rw_reader_unlock(h);
>
>  if (hit_mask != NULL)
> @@ -1308,10 +1519,13 @@ rte_hash_iterate(const struct rte_hash *h, const
> void **key, void **data, uint32
>
>  RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
>
> -const uint32_t total_entries = h->num_buckets *
> RTE_HASH_BUCKET_ENTRIES;
> +const uint32_t total_entries_main = h->num_buckets *
> +
> RTE_HASH_BUCKET_ENTRIES;
> +const uint32_t total_entries = total_entries_main << 1;
> +
>  /* Out of bounds */
Minor: update the comment to reflect the new code.

> -if (*next >= total_entries)
> -return -ENOENT;
> +if (*next >= total_entries_main)
> +goto extend_table;
>
>  /* Calculate bucket and index of current iterator */
>  bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES; @@ -1321,8
> +1535,8 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void
> **data, uint32
>  while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>  (*next)++;
>  /* End of table */
> -if (*next == total_entries)
> -return -ENOENT;
> +if (*next == total_entries_main)
> +goto extend_table;
>  bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>  idx = *next % RTE_HASH_BUCKET_ENTRIES;
>  }
> @@ -1341,4 +1555,32 @@ rte_hash_iterate(const struct rte_hash *h, const
> void **key, void **data, uint32
>  (*next)++;
>
>  return position - 1;
> +
> +extend_table:
> +/* Out of bounds */
> +if (*next >= total_entries || !h->ext_table_support)
> +return -ENOENT;
> +
> +bucket_idx = (*next - total_entries_main) /
> RTE_HASH_BUCKET_ENTRIES;
> +idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
> +
> +while (h->buckets_ext[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> +(*next)++;
> +if (*next == total_entries)
> +return -ENOENT;
> +bucket_idx = (*next - total_entries_main) /
> +RTE_HASH_BUCKET_ENTRIES;
> +idx = (*next - total_entries_main) %
> RTE_HASH_BUCKET_ENTRIES;
> +}
> +/* Get position of entry in key table */
> +position = h->buckets_ext[bucket_idx].key_idx[idx];
There is a possibility that 'position' is not the same value that was read in the while loop. That presents a problem if 'position' becomes EMPTY_SLOT. 'position' should be read as part of the while loop. Since it is a 32b value, the read should be atomic on most platforms. This issue applies to the existing code as well. (A sketch of this suggestion follows the quoted hunk below.)

__hash_rw_reader_lock(h) required
> +next_key = (struct rte_hash_key *) ((char *)h->key_store +
> +position * h->key_entry_size);
> +/* Return key and data */
> +*key = next_key->key;
> +*data = next_key->pdata;
> +
__hash_rw_reader_unlock(h) required

> +/* Increment iterator */
> +(*next)++;
> +return position - 1;
>  }
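
A sketch of the three points above combined (illustrative only; this is just
the tail of rte_hash_iterate() for the ext-bucket part, after bucket_idx/idx
have been computed): take the reader lock and re-read key_idx as the loop
condition, so 'position' cannot turn into EMPTY_SLOT between the emptiness
check and the key/data read:

	__hash_rw_reader_lock(h);
	while ((position = h->buckets_ext[bucket_idx].key_idx[idx]) ==
							EMPTY_SLOT) {
		(*next)++;
		if (*next == total_entries) {
			__hash_rw_reader_unlock(h);
			return -ENOENT;
		}
		bucket_idx = (*next - total_entries_main) /
						RTE_HASH_BUCKET_ENTRIES;
		idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
	}
	next_key = (struct rte_hash_key *) ((char *)h->key_store +
				position * h->key_entry_size);
	*key = next_key->key;
	*data = next_key->pdata;
	__hash_rw_reader_unlock(h);

	/* Increment iterator */
	(*next)++;
	return position - 1;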
> diff --git a/lib/librte_hash/rte_cuckoo_hash.h
> b/lib/librte_hash/rte_cuckoo_hash.h
> index fc0e5c2..e601520 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.h
> +++ b/lib/librte_hash/rte_cuckoo_hash.h
> @@ -142,6 +142,8 @@ struct rte_hash_bucket {
>  hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
>
>  uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
> +
> +void *next;
>  } __rte_cache_aligned;
>
>  /** A hash table structure. */
> @@ -166,6 +168,7 @@ struct rte_hash {
>  /**< If multi-writer support is enabled. */
>  uint8_t readwrite_concur_support;
>  /**< If read-write concurrency support is enabled */
> +uint8_t ext_table_support;     /**< Enable extendable bucket table */
>  rte_hash_function hash_func;    /**< Function used to calculate hash.
> */
>  uint32_t hash_func_init_val;    /**< Init value used by hash_func. */
>  rte_hash_cmp_eq_t rte_hash_custom_cmp_eq; @@ -184,6 +187,8
> @@ struct rte_hash {
>   * to the key table.
>   */
>  rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
> +struct rte_hash_bucket *buckets_ext; /**< Extra buckets array */
> +struct rte_ring *free_ext_bkts; /**< Ring of indexes of free buckets
> +*/
>  } __rte_cache_aligned;
>
>  struct queue_node {
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h index
> 9e7d931..11d8e28 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -37,6 +37,9 @@ extern "C" {
>  /** Flag to support reader writer concurrency */  #define
> RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
>
> +/** Flag to indicate the extendabe bucket table feature should be used
> +*/ #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
> +
>  /** Signature of key that is stored internally. */  typedef uint32_t hash_sig_t;
>
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 6/7] test/hash: implement extendable bucket hash test
  2018-09-21 17:17   ` [PATCH v2 6/7] test/hash: implement extendable bucket hash test Yipeng Wang
@ 2018-09-27  4:24     ` Honnappa Nagarahalli
  2018-09-29  0:50       ` Wang, Yipeng1
  0 siblings, 1 reply; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-27  4:24 UTC (permalink / raw)
  To: Yipeng Wang, bruce.richardson; +Cc: dev, michel



> -----Original Message-----
> From: Yipeng Wang <yipeng1.wang@intel.com>
> Sent: Friday, September 21, 2018 12:18 PM
> To: bruce.richardson@intel.com
> Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Subject: [PATCH v2 6/7] test/hash: implement extendable bucket hash test
>
> This commit changes the current rte_hash unit test to test the extendable
> table feature and performance.
>
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
>  test/test/test_hash.c      | 151
> +++++++++++++++++++++++++++++++++++++++++++--
>  test/test/test_hash_perf.c | 114 +++++++++++++++++++++++++---------
>  2 files changed, 230 insertions(+), 35 deletions(-)
>
> diff --git a/test/test/test_hash.c b/test/test/test_hash.c index
> b3db9fd..c97095f 100644
> --- a/test/test/test_hash.c
> +++ b/test/test/test_hash.c
> @@ -660,6 +660,116 @@ static int test_full_bucket(void)
>  return 0;
>  }
>
> +/*
> + * Similar to the test above (full bucket test), but for extendable buckets.
> + */
> +static int test_extendable_bucket(void) {
> +struct rte_hash_parameters params_pseudo_hash = {
> +.name = "test5",
> +.entries = 64,
> +.key_len = sizeof(struct flow_key), /* 13 */
> +.hash_func = pseudo_hash,
> +.hash_func_init_val = 0,
> +.socket_id = 0,
> +.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE
> +};
> +struct rte_hash *handle;
> +int pos[64];
> +int expected_pos[64];
> +unsigned int i;
> +struct flow_key rand_keys[64];
> +
> +for (i = 0; i < 64; i++) {
> +rand_keys[i].port_dst = i;
> +rand_keys[i].port_src = i+1;
> +}
> +
> +handle = rte_hash_create(&params_pseudo_hash);
> +RETURN_IF_ERROR(handle == NULL, "hash creation failed");
> +
> +/* Fill bucket */
> +for (i = 0; i < 64; i++) {
> +pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
> +print_key_info("Add", &rand_keys[i], pos[i]);
> +RETURN_IF_ERROR(pos[i] < 0,
> +"failed to add key (pos[%u]=%d)", i, pos[i]);
> +expected_pos[i] = pos[i];
> +}
> +
> +/* Lookup */
> +for (i = 0; i < 64; i++) {
> +pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
> +print_key_info("Lkp", &rand_keys[i], pos[i]);
> +RETURN_IF_ERROR(pos[i] != expected_pos[i],
> +"failed to find key (pos[%u]=%d)", i, pos[i]);
> +}
> +
> +/* Add - update */
> +for (i = 0; i < 64; i++) {
> +pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
> +print_key_info("Add", &rand_keys[i], pos[i]);
> +RETURN_IF_ERROR(pos[i] != expected_pos[i],
> +"failed to add key (pos[%u]=%d)", i, pos[i]);
> +}
> +
> +/* Lookup */
> +for (i = 0; i < 64; i++) {
> +pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
> +print_key_info("Lkp", &rand_keys[i], pos[i]);
> +RETURN_IF_ERROR(pos[i] != expected_pos[i],
> +"failed to find key (pos[%u]=%d)", i, pos[i]);
> +}
> +
> +/* Delete 1 key, check other keys are still found */
> +pos[35] = rte_hash_del_key(handle, &rand_keys[35]);
> +print_key_info("Del", &rand_keys[35], pos[35]);
> +RETURN_IF_ERROR(pos[35] != expected_pos[35],
> +"failed to delete key (pos[1]=%d)", pos[35]);
> +pos[20] = rte_hash_lookup(handle, &rand_keys[20]);
> +print_key_info("Lkp", &rand_keys[20], pos[20]);
> +RETURN_IF_ERROR(pos[20] != expected_pos[20],
> +"failed lookup after deleting key from same bucket "
> +"(pos[20]=%d)", pos[20]);
> +
> +/* Go back to previous state */
> +pos[35] = rte_hash_add_key(handle, &rand_keys[35]);
> +print_key_info("Add", &rand_keys[35], pos[35]);
> +expected_pos[35] = pos[35];
> +RETURN_IF_ERROR(pos[35] < 0, "failed to add key (pos[1]=%d)",
> +pos[35]);
> +
> +/* Delete */
> +for (i = 0; i < 64; i++) {
> +pos[i] = rte_hash_del_key(handle, &rand_keys[i]);
> +print_key_info("Del", &rand_keys[i], pos[i]);
> +RETURN_IF_ERROR(pos[i] != expected_pos[i],
> +"failed to delete key (pos[%u]=%d)", i, pos[i]);
> +}
> +
> +/* Lookup */
> +for (i = 0; i < 64; i++) {
> +pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
> +print_key_info("Lkp", &rand_keys[i], pos[i]);
> +RETURN_IF_ERROR(pos[i] != -ENOENT,
> +"fail: found non-existent key (pos[%u]=%d)", i, pos[i]);
> +}
> +
> +/* Add again */
> +for (i = 0; i < 64; i++) {
> +pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
> +print_key_info("Add", &rand_keys[i], pos[i]);
> +RETURN_IF_ERROR(pos[i] < 0,
> +"failed to add key (pos[%u]=%d)", i, pos[i]);
> +expected_pos[i] = pos[i];
> +}
> +
> +rte_hash_free(handle);
> +
> +/* Cover the NULL case. */
> +rte_hash_free(0);
> +return 0;
> +}
> +
>
> /*****************************************************************
> *************/
>  static int
>  fbk_hash_unit_test(void)
> @@ -1096,7 +1206,7 @@ test_hash_creation_with_good_parameters(void)
>   * Test to see the average table utilization (entries added/max entries)
>   * before hitting a random entry that cannot be added
>   */
> -static int test_average_table_utilization(void)
> +static int test_average_table_utilization(uint32_t ext_table)
>  {
>  struct rte_hash *handle;
>  uint8_t simple_key[MAX_KEYSIZE];
> @@ -1107,12 +1217,23 @@ static int test_average_table_utilization(void)
>
>  printf("\n# Running test to determine average utilization"
>         "\n  before adding elements begins to fail\n");
> +if (ext_table)
> +printf("ext table is enabled\n");
> +else
> +printf("ext table is disabled\n");
> +
>  printf("Measuring performance, please wait");
>  fflush(stdout);
>  ut_params.entries = 1 << 16;
>  ut_params.name = "test_average_utilization";
>  ut_params.hash_func = rte_jhash;
> +if (ext_table)
> +ut_params.extra_flag |=
> RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
> +else
> +ut_params.extra_flag &=
> ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
> +
>  handle = rte_hash_create(&ut_params);
> +
>  RETURN_IF_ERROR(handle == NULL, "hash creation failed");
>
>  for (j = 0; j < ITERATIONS; j++) {
My understanding is that when the extendable table feature is enabled, we will add entries up to the full capacity. Hence, rte_hash_count and rte_hash_reset should also get tested in this test case.
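Something along these lines could be added right after the insertion loop (a rough sketch only; it assumes the function keeps an added_keys counter, and that 'handle' and 'ext_table' are the variables already used in this test):

    /* With the ext table enabled every configured entry should fit, so the
     * count reported by the library must match what was actually added.
     */
    if (ext_table) {
        if (rte_hash_count(handle) != (int32_t)added_keys) {
            printf("rte_hash_count returned wrong value %d, expected %u\n",
                rte_hash_count(handle), added_keys);
            return -1;
        }
        /* rte_hash_reset() should empty the table again */
        rte_hash_reset(handle);
        if (rte_hash_count(handle) != 0) {
            printf("hash count should be 0 after rte_hash_reset\n");
            return -1;
        }
    }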

> @@ -1161,7 +1282,7 @@ static int test_average_table_utilization(void)
>  }
>
>  #define NUM_ENTRIES 256
> -static int test_hash_iteration(void)
> +static int test_hash_iteration(uint32_t ext_table)
>  {
>  struct rte_hash *handle;
>  unsigned i;
> @@ -1177,6 +1298,11 @@ static int test_hash_iteration(void)
>  ut_params.name = "test_hash_iteration";
>  ut_params.hash_func = rte_jhash;
>  ut_params.key_len = 16;
> +if (ext_table)
> +ut_params.extra_flag |=
> RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
> +else
> +ut_params.extra_flag &=
> ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
> +
>  handle = rte_hash_create(&ut_params);
>  RETURN_IF_ERROR(handle == NULL, "hash creation failed");
>
> @@ -1186,8 +1312,13 @@ static int test_hash_iteration(void)
>  for (i = 0; i < ut_params.key_len; i++)
>  keys[added_keys][i] = rte_rand() % 255;
>  ret = rte_hash_add_key_data(handle, keys[added_keys],
> data[added_keys]);
> -if (ret < 0)
> +if (ret < 0) {
> +if (ext_table) {
> +printf("Insertion failed for ext table\n");
> +goto err;
> +}
>  break;
> +}
>  }
>
I suggest we add a call to rte_hash_count() to verify that the configured maximum number of entries are added; this will also be a good corner-case test for rte_hash_count.
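For example (sketch only; 'handle' and the err label are the ones already present in test_hash_iteration, and NUM_ENTRIES is the constant defined above):

    /* With the ext table enabled, all NUM_ENTRIES keys must be in the table */
    if (ext_table && rte_hash_count(handle) != NUM_ENTRIES) {
        printf("rte_hash_count %d does not match NUM_ENTRIES %d\n",
            rte_hash_count(handle), NUM_ENTRIES);
        goto err;
    }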

>  /* Iterate through the hash table */
> @@ -1474,6 +1605,8 @@ test_hash(void)
>  return -1;
>  if (test_full_bucket() < 0)
>  return -1;
> +if (test_extendable_bucket() < 0)
> +return -1;
>
>  if (test_fbk_hash_find_existing() < 0)
>  return -1;
> @@ -1483,9 +1616,17 @@ test_hash(void)
>  return -1;
>  if (test_hash_creation_with_good_parameters() < 0)
>  return -1;
> -if (test_average_table_utilization() < 0)
> +
> +/* ext table disabled */
> +if (test_average_table_utilization(0) < 0)
> +return -1;
> +if (test_hash_iteration(0) < 0)
> +return -1;
> +
> +/* ext table enabled */
> +if (test_average_table_utilization(1) < 0)
>  return -1;
> -if (test_hash_iteration() < 0)
> +if (test_hash_iteration(1) < 0)
>  return -1;
>
>  run_hash_func_tests();
> diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c index
> 4d00c20..d169cd0 100644
> --- a/test/test/test_hash_perf.c
> +++ b/test/test/test_hash_perf.c
> @@ -18,7 +18,8 @@
>  #include "test.h"
>
>  #define MAX_ENTRIES (1 << 19)
> -#define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
> +#define KEYS_TO_ADD (MAX_ENTRIES)
> +#define ADD_PERCENT 0.75 /* 75% table utilization */
>  #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added,
> several times */  #define BUCKET_SIZE 8  #define NUM_BUCKETS
> (MAX_ENTRIES / BUCKET_SIZE) @@ -77,7 +78,7 @@ static struct
> rte_hash_parameters ut_params = {
>
>  static int
>  create_table(unsigned int with_data, unsigned int table_index,
> -unsigned int with_locks)
> +unsigned int with_locks, unsigned int ext)
>  {
>  char name[RTE_HASH_NAMESIZE];
>
> @@ -95,6 +96,9 @@ create_table(unsigned int with_data, unsigned int
> table_index,
>  else
>  ut_params.extra_flag = 0;
>
> +if (ext)
> +ut_params.extra_flag |=
> RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
> +
>  ut_params.name = name;
>  ut_params.key_len = hashtest_key_lens[table_index];
>  ut_params.socket_id = rte_socket_id(); @@ -116,15 +120,21 @@
> create_table(unsigned int with_data, unsigned int table_index,
>
>  /* Shuffle the keys that have been added, so lookups will be totally random */
> static void -shuffle_input_keys(unsigned table_index)
> +shuffle_input_keys(unsigned int table_index, unsigned int ext)
>  {
>  unsigned i;
>  uint32_t swap_idx;
>  uint8_t temp_key[MAX_KEYSIZE];
>  hash_sig_t temp_signature;
>  int32_t temp_position;
> +unsigned int keys_to_add;
> +
> +if (!ext)
> +keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
> +else
> +keys_to_add = KEYS_TO_ADD;
>
> -for (i = KEYS_TO_ADD - 1; i > 0; i--) {
> +for (i = keys_to_add - 1; i > 0; i--) {
>  swap_idx = rte_rand() % i;
>
>  memcpy(temp_key, keys[i], hashtest_key_lens[table_index]);
> @@ -146,14 +156,20 @@ shuffle_input_keys(unsigned table_index)
>   * ALL can fit in hash table (no errors)
>   */
>  static int
> -get_input_keys(unsigned with_pushes, unsigned table_index)
> +get_input_keys(unsigned int with_pushes, unsigned int table_index,
> +unsigned int ext)
>  {
>  unsigned i, j;
>  unsigned bucket_idx, incr, success = 1;
>  uint8_t k = 0;
>  int32_t ret;
>  const uint32_t bucket_bitmask = NUM_BUCKETS - 1;
> +unsigned int keys_to_add;
>
> +if (!ext)
> +keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
> +else
> +keys_to_add = KEYS_TO_ADD;
>  /* Reset all arrays */
>  for (i = 0; i < MAX_ENTRIES; i++)
>  slot_taken[i] = 0;
> @@ -170,7 +186,7 @@ get_input_keys(unsigned with_pushes, unsigned
> table_index)
>   * Regardless a key has been added correctly or not (success),
>   * the next one to try will be increased by 1.
>   */
> -for (i = 0; i < KEYS_TO_ADD;) {
> +for (i = 0; i < keys_to_add;) {
>  incr = 0;
>  if (i != 0) {
>  keys[i][0] = ++k;
> @@ -234,14 +250,20 @@ get_input_keys(unsigned with_pushes, unsigned
> table_index)  }
>
>  static int
> -timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
> +timed_adds(unsigned int with_hash, unsigned int with_data,
> +unsigned int table_index, unsigned int ext)
>  {
>  unsigned i;
>  const uint64_t start_tsc = rte_rdtsc();
>  void *data;
>  int32_t ret;
> +unsigned int keys_to_add;
> +if (!ext)
> +keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
> +else
> +keys_to_add = KEYS_TO_ADD;
>
> -for (i = 0; i < KEYS_TO_ADD; i++) {
> +for (i = 0; i < keys_to_add; i++) {
>  data = (void *) ((uintptr_t) signatures[i]);
>  if (with_hash && with_data) {
>  ret =
> rte_hash_add_key_with_hash_data(h[table_index],
> @@ -283,22 +305,31 @@ timed_adds(unsigned with_hash, unsigned
> with_data, unsigned table_index)
>  const uint64_t end_tsc = rte_rdtsc();
>  const uint64_t time_taken = end_tsc - start_tsc;
>
> -cycles[table_index][ADD][with_hash][with_data] =
> time_taken/KEYS_TO_ADD;
> +cycles[table_index][ADD][with_hash][with_data] =
> +time_taken/keys_to_add;
>
>  return 0;
>  }
>
>  static int
> -timed_lookups(unsigned with_hash, unsigned with_data, unsigned
> table_index)
> +timed_lookups(unsigned int with_hash, unsigned int with_data,
> +unsigned int table_index, unsigned int ext)
>  {
>  unsigned i, j;
>  const uint64_t start_tsc = rte_rdtsc();
>  void *ret_data;
>  void *expected_data;
>  int32_t ret;
> -
> -for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
> -for (j = 0; j < KEYS_TO_ADD; j++) {
> +unsigned int keys_to_add, num_lookups;
> +
> +if (!ext) {
> +keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
> +num_lookups = NUM_LOOKUPS * ADD_PERCENT;
> +} else {
> +keys_to_add = KEYS_TO_ADD;
> +num_lookups = NUM_LOOKUPS;
> +}
> +for (i = 0; i < num_lookups / keys_to_add; i++) {
> +for (j = 0; j < keys_to_add; j++) {
>  if (with_hash && with_data) {
>  ret =
> rte_hash_lookup_with_hash_data(h[table_index],
>  (const void *) keys[j],
> @@ -351,13 +382,14 @@ timed_lookups(unsigned with_hash, unsigned
> with_data, unsigned table_index)
>  const uint64_t end_tsc = rte_rdtsc();
>  const uint64_t time_taken = end_tsc - start_tsc;
>
> -cycles[table_index][LOOKUP][with_hash][with_data] =
> time_taken/NUM_LOOKUPS;
> +cycles[table_index][LOOKUP][with_hash][with_data] =
> +time_taken/num_lookups;
>
>  return 0;
>  }
>
>  static int
> -timed_lookups_multi(unsigned with_data, unsigned table_index)
> +timed_lookups_multi(unsigned int with_data, unsigned int table_index,
> +unsigned int ext)
>  {
>  unsigned i, j, k;
>  int32_t positions_burst[BURST_SIZE];
> @@ -366,11 +398,20 @@ timed_lookups_multi(unsigned with_data,
> unsigned table_index)
>  void *ret_data[BURST_SIZE];
>  uint64_t hit_mask;
>  int ret;
> +unsigned int keys_to_add, num_lookups;
> +
> +if (!ext) {
> +keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
> +num_lookups = NUM_LOOKUPS * ADD_PERCENT;
> +} else {
> +keys_to_add = KEYS_TO_ADD;
> +num_lookups = NUM_LOOKUPS;
> +}
>
>  const uint64_t start_tsc = rte_rdtsc();
>
> -for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
> -for (j = 0; j < KEYS_TO_ADD/BURST_SIZE; j++) {
> +for (i = 0; i < num_lookups/keys_to_add; i++) {
> +for (j = 0; j < keys_to_add/BURST_SIZE; j++) {
>  for (k = 0; k < BURST_SIZE; k++)
>  keys_burst[k] = keys[j * BURST_SIZE + k];
>  if (with_data) {
> @@ -418,19 +459,25 @@ timed_lookups_multi(unsigned with_data,
> unsigned table_index)
>  const uint64_t end_tsc = rte_rdtsc();
>  const uint64_t time_taken = end_tsc - start_tsc;
>
> -cycles[table_index][LOOKUP_MULTI][0][with_data] =
> time_taken/NUM_LOOKUPS;
> +cycles[table_index][LOOKUP_MULTI][0][with_data] =
> +time_taken/num_lookups;
>
>  return 0;
>  }
>
>  static int
> -timed_deletes(unsigned with_hash, unsigned with_data, unsigned
> table_index)
> +timed_deletes(unsigned int with_hash, unsigned int with_data,
> +unsigned int table_index, unsigned int ext)
>  {
>  unsigned i;
>  const uint64_t start_tsc = rte_rdtsc();
>  int32_t ret;
> +unsigned int keys_to_add;
> +if (!ext)
> +keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
> +else
> +keys_to_add = KEYS_TO_ADD;
>
> -for (i = 0; i < KEYS_TO_ADD; i++) {
> +for (i = 0; i < keys_to_add; i++) {
>  /* There are no delete functions with data, so just call two
> functions */
>  if (with_hash)
>  ret = rte_hash_del_key_with_hash(h[table_index],
> @@ -450,7 +497,7 @@ timed_deletes(unsigned with_hash, unsigned
> with_data, unsigned table_index)
>  const uint64_t end_tsc = rte_rdtsc();
>  const uint64_t time_taken = end_tsc - start_tsc;
>
> -cycles[table_index][DELETE][with_hash][with_data] =
> time_taken/KEYS_TO_ADD;
> +cycles[table_index][DELETE][with_hash][with_data] =
> +time_taken/keys_to_add;
>
>  return 0;
>  }
> @@ -468,7 +515,8 @@ reset_table(unsigned table_index)  }
>
>  static int
> -run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
> +run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks,
> +unsigned int ext)
>  {
>  unsigned i, j, with_data, with_hash;
>
> @@ -477,25 +525,25 @@ run_all_tbl_perf_tests(unsigned int with_pushes,
> unsigned int with_locks)
>
>  for (with_data = 0; with_data <= 1; with_data++) {
>  for (i = 0; i < NUM_KEYSIZES; i++) {
> -if (create_table(with_data, i, with_locks) < 0)
> +if (create_table(with_data, i, with_locks, ext) < 0)
>  return -1;
>
> -if (get_input_keys(with_pushes, i) < 0)
> +if (get_input_keys(with_pushes, i, ext) < 0)
>  return -1;
>  for (with_hash = 0; with_hash <= 1; with_hash++) {
> -if (timed_adds(with_hash, with_data, i) < 0)
> +if (timed_adds(with_hash, with_data, i, ext) <
> 0)
>  return -1;
>
>  for (j = 0; j < NUM_SHUFFLES; j++)
> -shuffle_input_keys(i);
> +shuffle_input_keys(i, ext);
>
> -if (timed_lookups(with_hash, with_data, i) < 0)
> +if (timed_lookups(with_hash, with_data, i, ext)
> < 0)
>  return -1;
>
> -if (timed_lookups_multi(with_data, i) < 0)
> +if (timed_lookups_multi(with_data, i, ext) < 0)
>  return -1;
>
> -if (timed_deletes(with_hash, with_data, i) < 0)
> +if (timed_deletes(with_hash, with_data, i, ext)
> < 0)
>  return -1;
>
>  /* Print a dot to show progress on operations
> */ @@ -631,10 +679,16 @@ test_hash_perf(void)
>  printf("\nALL ELEMENTS IN PRIMARY
> LOCATION\n");
>  else
>  printf("\nELEMENTS IN PRIMARY OR
> SECONDARY LOCATION\n");
> -if (run_all_tbl_perf_tests(with_pushes, with_locks) <
> 0)
> +if (run_all_tbl_perf_tests(with_pushes, with_locks, 0)
> < 0)
>  return -1;
>  }
>  }
> +
> +printf("\n EXTENDABLE BUCKETS PERFORMANCE\n");
> +
> +if (run_all_tbl_perf_tests(1, 0, 1) < 0)
> +return -1;
> +
>  if (fbk_hash_perf_test() < 0)
>  return -1;
>
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 7/7] hash: use partial-key hashing
  2018-09-21 17:17   ` [PATCH v2 7/7] hash: use partial-key hashing Yipeng Wang
@ 2018-09-27  4:24     ` Honnappa Nagarahalli
  2018-09-29  0:55       ` Wang, Yipeng1
  0 siblings, 1 reply; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-27  4:24 UTC (permalink / raw)
  To: Yipeng Wang, bruce.richardson; +Cc: dev, michel



> -----Original Message-----
> From: Yipeng Wang <yipeng1.wang@intel.com>
> Sent: Friday, September 21, 2018 12:18 PM
> To: bruce.richardson@intel.com
> Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Subject: [PATCH v2 7/7] hash: use partial-key hashing
>
> This commit changes the hashing mechanism to "partial-key hashing" to
> calculate bucket index and signature of key.
>
> This is proposed in Bin Fan, et al.'s paper
> "MemC3: Compact and Concurrent MemCache with Dumber Caching and
> Smarter Hashing". Basically, the idea is to use "xor" to derive the alternative
> bucket from the current bucket index and the signature.
>
> With "partial-key hashing", it reduces the bucket memory requirement from
> two cache lines to one cache line, which improves the memory efficiency and
> thus the lookup speed.
>
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
>  lib/librte_hash/rte_cuckoo_hash.c | 228 ++++++++++++++++++-------------------
> -
>  lib/librte_hash/rte_cuckoo_hash.h |   6 +-
>  lib/librte_hash/rte_hash.h        |   5 +-
>  3 files changed, 114 insertions(+), 125 deletions(-)
>
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> b/lib/librte_hash/rte_cuckoo_hash.c
> index 616900b..5108ff0 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -90,6 +90,27 @@ rte_hash_cmp_eq(const void *key1, const void *key2,
> const struct rte_hash *h)
>  return cmp_jump_table[h->cmp_jump_table_idx](key1, key2,
> h->key_len);  }
>
> +static inline void
> +get_buckets_index(const struct rte_hash *h, const hash_sig_t hash,
> +uint32_t *prim_bkt, uint32_t *sec_bkt, uint16_t *sig) {
> +/*
> + * We use higher 16 bits of hash as the signature value stored in table.
> + * We use the lower bits for the primary bucket
> + * location. Then we XOR primary bucket location and the signature
> + * to get the secondary bucket location. This is same as
> + * proposed in Bin Fan, et al's paper
> + * "MemC3: Compact and Concurrent MemCache with Dumber
> Caching and
> + * Smarter Hashing". The benefit to use
> + * XOR is that one could derive the alternative bucket location
> + * by only using the current bucket location and the signature.
> + */
> +*sig = hash >> 16;
> +
> +*prim_bkt = hash & h->bucket_bitmask;
> +*sec_bkt =  (*prim_bkt ^ *sig) & h->bucket_bitmask; }
> +
IMO, this function can be split into two: one for the primary bucket index and another for the secondary bucket index. The secondary bucket index calculation function can then be reused in 'rte_hash_cuckoo_move_insert_mw' and 'rte_hash_cuckoo_make_space_mw'.
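A rough sketch of what that split could look like (the helper names are only suggestions, not part of the patch):

static inline void
get_prim_bucket_index(const struct rte_hash *h, const hash_sig_t hash,
        uint32_t *prim_bkt, uint16_t *sig)
{
    *sig = hash >> 16;
    *prim_bkt = hash & h->bucket_bitmask;
}

static inline uint32_t
get_alt_bucket_index(const struct rte_hash *h,
        uint32_t cur_bkt_idx, uint16_t sig)
{
    /* Same XOR trick as above, but usable wherever only the current
     * bucket index and the stored signature are available, e.g. in
     * rte_hash_cuckoo_move_insert_mw() and rte_hash_cuckoo_make_space_mw().
     */
    return (cur_bkt_idx ^ sig) & h->bucket_bitmask;
}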

>  struct rte_hash *
>  rte_hash_create(const struct rte_hash_parameters *params)  { @@ -327,9
> +348,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  h->ext_table_support = ext_table_support;
>
>  #if defined(RTE_ARCH_X86)
> -if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> -h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
> -else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
> +if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
>  h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
>  else
>  #endif
> @@ -416,18 +435,6 @@ rte_hash_hash(const struct rte_hash *h, const void
> *key)
>  return h->hash_func(key, h->key_len, h->hash_func_init_val);  }
>
> -/* Calc the secondary hash value from the primary hash value of a given key
> */ -static inline hash_sig_t -rte_hash_secondary_hash(const hash_sig_t
> primary_hash) -{
> -static const unsigned all_bits_shift = 12;
> -static const unsigned alt_bits_xor = 0x5bd1e995;
> -
> -uint32_t tag = primary_hash >> all_bits_shift;
> -
> -return primary_hash ^ ((tag + 1) * alt_bits_xor);
> -}
> -
>  int32_t
>  rte_hash_count(const struct rte_hash *h)  { @@ -558,14 +565,13 @@
> enqueue_slot_back(const struct rte_hash *h,
>  /* Search a key from bucket and update its data */  static inline int32_t
> search_and_update(const struct rte_hash *h, void *data, const void *key,
> -struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
> +struct rte_hash_bucket *bkt, uint16_t sig)
>  {
>  int i;
>  struct rte_hash_key *k, *keys = h->key_store;
>
>  for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> -if (bkt->sig_current[i] == sig &&
> -bkt->sig_alt[i] == alt_hash) {
> +if (bkt->sig_current[i] == sig) {
>  k = (struct rte_hash_key *) ((char *)keys +
>  bkt->key_idx[i] * h->key_entry_size);
>  if (rte_hash_cmp_eq(key, k->key, h) == 0) { @@ -
> 592,7 +598,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>  struct rte_hash_bucket *prim_bkt,
>  struct rte_hash_bucket *sec_bkt,
>  const struct rte_hash_key *key, void *data,
> -hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
> +uint16_t sig, uint32_t new_idx,
>  int32_t *ret_val)
>  {
>  unsigned int i;
> @@ -603,7 +609,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>  /* Check if key was inserted after last check but before this
>   * protected region in case of inserting duplicated keys.
>   */
> -ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +ret = search_and_update(h, data, key, prim_bkt, sig);
>  if (ret != -1) {
>  __hash_rw_writer_unlock(h);
>  *ret_val = ret;
> @@ -611,7 +617,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>  }
>
>  FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> -ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +ret = search_and_update(h, data, key, cur_bkt, sig);
>  if (ret != -1) {
>  __hash_rw_writer_unlock(h);
>  *ret_val = ret;
> @@ -626,7 +632,6 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>  /* Check if slot is available */
>  if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
>  prim_bkt->sig_current[i] = sig;
> -prim_bkt->sig_alt[i] = alt_hash;
>  prim_bkt->key_idx[i] = new_idx;
>  break;
>  }
> @@ -651,7 +656,7 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
>  struct rte_hash_bucket *alt_bkt,
>  const struct rte_hash_key *key, void *data,
>  struct queue_node *leaf, uint32_t leaf_slot,
> -hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
> +uint16_t sig, uint32_t new_idx,
>  int32_t *ret_val)
>  {
>  uint32_t prev_alt_bkt_idx;
> @@ -672,7 +677,7 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
>  /* Check if key was inserted after last check but before this
>   * protected region.
>   */
> -ret = search_and_update(h, data, key, bkt, sig, alt_hash);
> +ret = search_and_update(h, data, key, bkt, sig);
>  if (ret != -1) {
>  __hash_rw_writer_unlock(h);
>  *ret_val = ret;
> @@ -680,7 +685,7 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
>  }
>
>  FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
> -ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +ret = search_and_update(h, data, key, cur_bkt, sig);
>  if (ret != -1) {
>  __hash_rw_writer_unlock(h);
>  *ret_val = ret;
> @@ -693,8 +698,9 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
>  prev_bkt = prev_node->bkt;
>  prev_slot = curr_node->prev_slot;
>
> -prev_alt_bkt_idx =
> -prev_bkt->sig_alt[prev_slot] & h->bucket_bitmask;
> +prev_alt_bkt_idx = (prev_node->cur_bkt_idx ^
> +prev_bkt->sig_current[prev_slot]) &
> +h->bucket_bitmask;
>
>  if (unlikely(&h->buckets[prev_alt_bkt_idx]
>  != curr_bkt)) {
> @@ -708,10 +714,8 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
>   * Cuckoo insert to move elements back to its
>   * primary bucket if available
>   */
> -curr_bkt->sig_alt[curr_slot] =
> - prev_bkt->sig_current[prev_slot];
>  curr_bkt->sig_current[curr_slot] =
> -prev_bkt->sig_alt[prev_slot];
> +prev_bkt->sig_current[prev_slot];
>  curr_bkt->key_idx[curr_slot] =
>  prev_bkt->key_idx[prev_slot];
>
> @@ -721,7 +725,6 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
>  }
>
>  curr_bkt->sig_current[curr_slot] = sig;
> -curr_bkt->sig_alt[curr_slot] = alt_hash;
>  curr_bkt->key_idx[curr_slot] = new_idx;
>
>  __hash_rw_writer_unlock(h);
> @@ -739,39 +742,44 @@ rte_hash_cuckoo_make_space_mw(const struct
> rte_hash *h,
>  struct rte_hash_bucket *bkt,
>  struct rte_hash_bucket *sec_bkt,
>  const struct rte_hash_key *key, void *data,
> -hash_sig_t sig, hash_sig_t alt_hash,
> +uint16_t sig, uint32_t bucket_idx,
>  uint32_t new_idx, int32_t *ret_val)
>  {
>  unsigned int i;
>  struct queue_node queue[RTE_HASH_BFS_QUEUE_MAX_LEN];
>  struct queue_node *tail, *head;
>  struct rte_hash_bucket *curr_bkt, *alt_bkt;
> +uint32_t cur_idx, alt_idx;
>
>  tail = queue;
>  head = queue + 1;
>  tail->bkt = bkt;
>  tail->prev = NULL;
>  tail->prev_slot = -1;
> +tail->cur_bkt_idx = bucket_idx;
>
>  /* Cuckoo bfs Search */
>  while (likely(tail != head && head <
>  queue +
> RTE_HASH_BFS_QUEUE_MAX_LEN -
>  RTE_HASH_BUCKET_ENTRIES)) {
>  curr_bkt = tail->bkt;
> +cur_idx = tail->cur_bkt_idx;
>  for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>  if (curr_bkt->key_idx[i] == EMPTY_SLOT) {
>  int32_t ret =
> rte_hash_cuckoo_move_insert_mw(h,
>  bkt, sec_bkt, key, data,
> -tail, i, sig, alt_hash,
> +tail, i, sig,
>  new_idx, ret_val);
>  if (likely(ret != -1))
>  return ret;
>  }
>
>  /* Enqueue new node and keep prev node info */
> -alt_bkt = &(h->buckets[curr_bkt->sig_alt[i]
> -    & h->bucket_bitmask]);
> +alt_idx = (curr_bkt->sig_current[i] ^ cur_idx) &
> +h->bucket_bitmask;
> +alt_bkt = &(h->buckets[alt_idx]);
>  head->bkt = alt_bkt;
> +head->cur_bkt_idx = alt_idx;
>  head->prev = tail;
>  head->prev_slot = i;
>  head++;
> @@ -786,7 +794,7 @@ static inline int32_t
> __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>  hash_sig_t sig, void *data)
>  {
> -hash_sig_t alt_hash;
> +uint16_t short_sig;
>  uint32_t prim_bucket_idx, sec_bucket_idx;
>  struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
>  struct rte_hash_key *new_k, *keys = h->key_store; @@ -801,18
> +809,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const
> void *key,
>  int32_t ret_val;
>  struct rte_hash_bucket *last;
>
> -prim_bucket_idx = sig & h->bucket_bitmask;
> +get_buckets_index(h, sig, &prim_bucket_idx, &sec_bucket_idx,
> +&short_sig);
>  prim_bkt = &h->buckets[prim_bucket_idx];
> -rte_prefetch0(prim_bkt);
> -
> -alt_hash = rte_hash_secondary_hash(sig);
> -sec_bucket_idx = alt_hash & h->bucket_bitmask;
>  sec_bkt = &h->buckets[sec_bucket_idx];
> +rte_prefetch0(prim_bkt);
>  rte_prefetch0(sec_bkt);
>
>  /* Check if key is already inserted in primary location */
>  __hash_rw_writer_lock(h);
> -ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +ret = search_and_update(h, data, key, prim_bkt, short_sig);
>  if (ret != -1) {
>  __hash_rw_writer_unlock(h);
>  return ret;
> @@ -820,12 +825,13 @@ __rte_hash_add_key_with_hash(const struct
> rte_hash *h, const void *key,
>
>  /* Check if key is already inserted in secondary location */
>  FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> -ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +ret = search_and_update(h, data, key, cur_bkt, short_sig);
>  if (ret != -1) {
>  __hash_rw_writer_unlock(h);
>  return ret;
>  }
>  }
> +
>  __hash_rw_writer_unlock(h);
>
>  /* Did not find a match, so get a new slot for storing the new key */
> @@ -863,7 +869,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash
> *h, const void *key,
>
>  /* Find an empty slot and insert */
>  ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
> -sig, alt_hash, new_idx, &ret_val);
> +short_sig, new_idx, &ret_val);
>  if (ret == 0)
>  return new_idx - 1;
>  else if (ret == 1) {
> @@ -873,7 +879,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash
> *h, const void *key,
>
>  /* Primary bucket full, need to make space for new entry */
>  ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key,
> data,
> -sig, alt_hash, new_idx, &ret_val);
> +short_sig, prim_bucket_idx, new_idx,
> &ret_val);
>  if (ret == 0)
>  return new_idx - 1;
>  else if (ret == 1) {
> @@ -883,7 +889,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash
> *h, const void *key,
>
>  /* Also search secondary bucket to get better occupancy */
>  ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key,
> data,
> -alt_hash, sig, new_idx, &ret_val);
> +short_sig, sec_bucket_idx, new_idx, &ret_val);
>
>  if (ret == 0)
>  return new_idx - 1;
> @@ -903,14 +909,14 @@ __rte_hash_add_key_with_hash(const struct
> rte_hash *h, const void *key,
>   */
>  __hash_rw_writer_lock(h);
>  /* We check for duplicates again since could be inserted before the
> lock */
> -ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +ret = search_and_update(h, data, key, prim_bkt, short_sig);
>  if (ret != -1) {
>  enqueue_slot_back(h, cached_free_slots, slot_id);
>  goto failure;
>  }
>
>  FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> -ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +ret = search_and_update(h, data, key, cur_bkt, short_sig);
>  if (ret != -1) {
>  enqueue_slot_back(h, cached_free_slots, slot_id);
>  goto failure;
> @@ -923,8 +929,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash
> *h, const void *key,
>  for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>  /* Check if slot is available */
>  if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
> -cur_bkt->sig_current[i] = alt_hash;
> -cur_bkt->sig_alt[i] = sig;
> +cur_bkt->sig_current[i] = short_sig;
>  cur_bkt->key_idx[i] = new_idx;
>  __hash_rw_writer_unlock(h);
>  return new_idx - 1;
> @@ -942,8 +947,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash
> *h, const void *key,
>
>  bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
>  /* Use the first location of the new bucket */
> -(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
> -(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
> +(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
>  (h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
>  /* Link the new bucket to sec bucket linked list */
>  last = rte_hash_get_last_bkt(sec_bkt); @@ -1002,7 +1006,7 @@
> rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
>
>  /* Search one bucket to find the match key */  static inline int32_t -
> search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
> +search_one_bucket(const struct rte_hash *h, const void *key, uint16_t
> +sig,
>  void **data, const struct rte_hash_bucket *bkt)  {
>  int i;
> @@ -1031,30 +1035,28 @@ static inline int32_t
> __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
>  hash_sig_t sig, void **data)
>  {
> -uint32_t bucket_idx;
> -hash_sig_t alt_hash;
> +uint32_t prim_bucket_idx, sec_bucket_idx;
>  struct rte_hash_bucket *bkt, *cur_bkt;
>  int ret;
> +uint16_t short_sig;
>
> -bucket_idx = sig & h->bucket_bitmask;
> -bkt = &h->buckets[bucket_idx];
> +get_buckets_index(h, sig, &prim_bucket_idx, &sec_bucket_idx,
> &short_sig);
> +bkt = &h->buckets[prim_bucket_idx];
>
>  __hash_rw_reader_lock(h);
>
>  /* Check if key is in primary location */
> -ret = search_one_bucket(h, key, sig, data, bkt);
> +ret = search_one_bucket(h, key, short_sig, data, bkt);
>  if (ret != -1) {
>  __hash_rw_reader_unlock(h);
>  return ret;
>  }
>  /* Calculate secondary hash */
> -alt_hash = rte_hash_secondary_hash(sig);
> -bucket_idx = alt_hash & h->bucket_bitmask;
> -bkt = &h->buckets[bucket_idx];
> +bkt = &h->buckets[sec_bucket_idx];
>
>  /* Check if key is in secondary location */
>  FOR_EACH_BUCKET(cur_bkt, bkt) {
> -ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
> +ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
>  if (ret != -1) {
>  __hash_rw_reader_unlock(h);
>  return ret;
> @@ -1101,7 +1103,6 @@ remove_entry(const struct rte_hash *h, struct
> rte_hash_bucket *bkt, unsigned i)
>  struct lcore_cache *cached_free_slots;
>
>  bkt->sig_current[i] = NULL_SIGNATURE;
> -bkt->sig_alt[i] = NULL_SIGNATURE;
>  if (h->multi_writer_support) {
>  lcore_id = rte_lcore_id();
>  cached_free_slots = &h->local_free_slots[lcore_id]; @@ -
> 1126,7 +1127,7 @@ remove_entry(const struct rte_hash *h, struct
> rte_hash_bucket *bkt, unsigned i)
>  /* Search one bucket and remove the matched key */  static inline int32_t
> search_and_remove(const struct rte_hash *h, const void *key,
> -struct rte_hash_bucket *bkt, hash_sig_t sig)
> +struct rte_hash_bucket *bkt, uint16_t sig)
>  {
>  struct rte_hash_key *k, *keys = h->key_store;
>  unsigned int i;
> @@ -1158,31 +1159,29 @@ static inline int32_t
> __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
>  hash_sig_t sig)
>  {
> -uint32_t bucket_idx;
> -hash_sig_t alt_hash;
> +uint32_t prim_bucket_idx, sec_bucket_idx;
>  struct rte_hash_bucket *prim_bkt, *sec_bkt;
>  struct rte_hash_bucket *cur_bkt, *prev_bkt, *next_bkt;
>  int32_t ret, i;
>  struct rte_hash_bucket *tobe_removed_bkt = NULL;
> +uint16_t short_sig;
>
> -bucket_idx = sig & h->bucket_bitmask;
> -prim_bkt = &h->buckets[bucket_idx];
> +get_buckets_index(h, sig, &prim_bucket_idx, &sec_bucket_idx,
> &short_sig);
> +prim_bkt = &h->buckets[prim_bucket_idx];
>
>  __hash_rw_writer_lock(h);
>  /* look for key in primary bucket */
> -ret = search_and_remove(h, key, prim_bkt, sig);
> +ret = search_and_remove(h, key, prim_bkt, short_sig);
>  if (ret != -1) {
>  __hash_rw_writer_unlock(h);
>  return ret;
>  }
>
>  /* Calculate secondary hash */
> -alt_hash = rte_hash_secondary_hash(sig);
> -bucket_idx = alt_hash & h->bucket_bitmask;
> -sec_bkt = &h->buckets[bucket_idx];
> +sec_bkt = &h->buckets[sec_bucket_idx];
>
>  /* look for key in secondary bucket */
> -ret = search_and_remove(h, key, sec_bkt, alt_hash);
> +ret = search_and_remove(h, key, sec_bkt, short_sig);
>  if (ret != -1) {
>  __hash_rw_writer_unlock(h);
>  return ret;
> @@ -1192,7 +1191,7 @@ __rte_hash_del_key_with_hash(const struct
> rte_hash *h, const void *key,
>  if (h->ext_table_support) {
>  next_bkt = sec_bkt->next;
>  FOR_EACH_BUCKET(cur_bkt, next_bkt) {
> -ret = search_and_remove(h, key, cur_bkt, alt_hash);
> +ret = search_and_remove(h, key, cur_bkt, short_sig);
>  if (ret != -1)
>  goto return_bkt;
>  }
> @@ -1265,55 +1264,35 @@ static inline void  compare_signatures(uint32_t
> *prim_hash_matches, uint32_t *sec_hash_matches,
>  const struct rte_hash_bucket *prim_bkt,
>  const struct rte_hash_bucket *sec_bkt,
> -hash_sig_t prim_hash, hash_sig_t sec_hash,
> +uint16_t sig,
>  enum rte_hash_sig_compare_function sig_cmp_fn)  {
>  unsigned int i;
>
> +/* For match mask the first bit of every two bits indicates the match
> +*/
>  switch (sig_cmp_fn) {
> -#ifdef RTE_MACHINE_CPUFLAG_AVX2
> -case RTE_HASH_COMPARE_AVX2:
> -*prim_hash_matches =
> _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
> -_mm256_load_si256(
> -(__m256i const *)prim_bkt-
> >sig_current),
> -_mm256_set1_epi32(prim_hash)));
> -*sec_hash_matches =
> _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
> -_mm256_load_si256(
> -(__m256i const *)sec_bkt-
> >sig_current),
> -_mm256_set1_epi32(sec_hash)));
> -break;
> -#endif
>  #ifdef RTE_MACHINE_CPUFLAG_SSE2
>  case RTE_HASH_COMPARE_SSE:
> -/* Compare the first 4 signatures in the bucket */
> -*prim_hash_matches =
> _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> +/* Compare all signatures in the bucket */
> +*prim_hash_matches =
> _mm_movemask_epi8(_mm_cmpeq_epi16(
>  _mm_load_si128(
>  (__m128i const *)prim_bkt-
> >sig_current),
> -_mm_set1_epi32(prim_hash)));
> -*prim_hash_matches |=
> (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> -_mm_load_si128(
> -(__m128i const *)&prim_bkt-
> >sig_current[4]),
> -_mm_set1_epi32(prim_hash)))) << 4;
> -/* Compare the first 4 signatures in the bucket */
> -*sec_hash_matches =
> _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> +_mm_set1_epi16(sig)));
> +/* Compare all signatures in the bucket */
> +*sec_hash_matches =
> _mm_movemask_epi8(_mm_cmpeq_epi16(
>  _mm_load_si128(
>  (__m128i const *)sec_bkt-
> >sig_current),
> -_mm_set1_epi32(sec_hash)));
> -*sec_hash_matches |=
> (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> -_mm_load_si128(
> -(__m128i const *)&sec_bkt-
> >sig_current[4]),
> -_mm_set1_epi32(sec_hash)))) << 4;
> +_mm_set1_epi16(sig)));
>  break;
>  #endif
>  default:
>  for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>  *prim_hash_matches |=
> -((prim_hash == prim_bkt->sig_current[i]) << i);
> +((sig == prim_bkt->sig_current[i]) << (i << 1));
>  *sec_hash_matches |=
> -((sec_hash == sec_bkt->sig_current[i]) << i);
> +((sig == sec_bkt->sig_current[i]) << (i << 1));
>  }
>  }
> -
>  }
>
>  #define PREFETCH_OFFSET 4
> @@ -1326,7 +1305,9 @@ __rte_hash_lookup_bulk(const struct rte_hash *h,
> const void **keys,
>  int32_t i;
>  int32_t ret;
>  uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
> -uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
> +uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
> +uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
> +uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
>  const struct rte_hash_bucket
> *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>  const struct rte_hash_bucket
> *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>  uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0}; @@ -
> 1345,10 +1326,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h,
> const void **keys,
>  rte_prefetch0(keys[i + PREFETCH_OFFSET]);
>
>  prim_hash[i] = rte_hash_hash(h, keys[i]);
> -sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
> +get_buckets_index(h, prim_hash[i],
> +&prim_index[i], &sec_index[i], &sig[i]);
>
> -primary_bkt[i] = &h->buckets[prim_hash[i] & h-
> >bucket_bitmask];
> -secondary_bkt[i] = &h->buckets[sec_hash[i] & h-
> >bucket_bitmask];
> +primary_bkt[i] = &h->buckets[prim_index[i]];
> +secondary_bkt[i] = &h->buckets[sec_index[i]];
>
>  rte_prefetch0(primary_bkt[i]);
>  rte_prefetch0(secondary_bkt[i]);
> @@ -1357,10 +1339,12 @@ __rte_hash_lookup_bulk(const struct rte_hash
> *h, const void **keys,
>  /* Calculate and prefetch rest of the buckets */
>  for (; i < num_keys; i++) {
>  prim_hash[i] = rte_hash_hash(h, keys[i]);
> -sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
>
> -primary_bkt[i] = &h->buckets[prim_hash[i] & h-
> >bucket_bitmask];
> -secondary_bkt[i] = &h->buckets[sec_hash[i] & h-
> >bucket_bitmask];
> +get_buckets_index(h, prim_hash[i],
> +&prim_index[i], &sec_index[i], &sig[i]);
> +
> +primary_bkt[i] = &h->buckets[prim_index[i]];
> +secondary_bkt[i] = &h->buckets[sec_index[i]];
>
>  rte_prefetch0(primary_bkt[i]);
>  rte_prefetch0(secondary_bkt[i]);
> @@ -1371,10 +1355,11 @@ __rte_hash_lookup_bulk(const struct rte_hash
> *h, const void **keys,
>  for (i = 0; i < num_keys; i++) {
>  compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
>  primary_bkt[i], secondary_bkt[i],
> -prim_hash[i], sec_hash[i], h->sig_cmp_fn);
> +sig[i], h->sig_cmp_fn);
>
>  if (prim_hitmask[i]) {
> -uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
> +uint32_t first_hit =
> +__builtin_ctzl(prim_hitmask[i]) >> 1;
>  uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
>  const struct rte_hash_key *key_slot =
>  (const struct rte_hash_key *)(
> @@ -1385,7 +1370,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h,
> const void **keys,
>  }
>
>  if (sec_hitmask[i]) {
> -uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
> +uint32_t first_hit =
> +__builtin_ctzl(sec_hitmask[i]) >> 1;
>  uint32_t key_idx = secondary_bkt[i]-
> >key_idx[first_hit];
>  const struct rte_hash_key *key_slot =
>  (const struct rte_hash_key *)(
> @@ -1399,7 +1385,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h,
> const void **keys,
>  for (i = 0; i < num_keys; i++) {
>  positions[i] = -ENOENT;
>  while (prim_hitmask[i]) {
> -uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
> +uint32_t hit_index =
> +__builtin_ctzl(prim_hitmask[i]) >> 1;
>
>  uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
>  const struct rte_hash_key *key_slot = @@ -1418,11
> +1405,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void
> **keys,
>  positions[i] = key_idx - 1;
>  goto next_key;
>  }
> -prim_hitmask[i] &= ~(1 << (hit_index));
> +prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
>  }
>
>  while (sec_hitmask[i]) {
> -uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
> +uint32_t hit_index =
> +__builtin_ctzl(sec_hitmask[i]) >> 1;
>
>  uint32_t key_idx = secondary_bkt[i]-
> >key_idx[hit_index];
>  const struct rte_hash_key *key_slot = @@ -1442,7
> +1430,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void
> **keys,
>  positions[i] = key_idx - 1;
>  goto next_key;
>  }
> -sec_hitmask[i] &= ~(1 << (hit_index));
> +sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
>  }
>
>  next_key:
> @@ -1465,10 +1453,10 @@ __rte_hash_lookup_bulk(const struct rte_hash
> *h, const void **keys,
>  FOR_EACH_BUCKET(cur_bkt, next_bkt) {
>  if (data != NULL)
>  ret = search_one_bucket(h, keys[i],
> -sec_hash[i], &data[i],
> cur_bkt);
> +sig[i], &data[i], cur_bkt);
>  else
>  ret = search_one_bucket(h, keys[i],
> -sec_hash[i], NULL, cur_bkt);
> +sig[i], NULL, cur_bkt);
>  if (ret != -1) {
>  positions[i] = ret;
>  hits |= 1ULL << i;
> diff --git a/lib/librte_hash/rte_cuckoo_hash.h
> b/lib/librte_hash/rte_cuckoo_hash.h
> index e601520..7753cd8 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.h
> +++ b/lib/librte_hash/rte_cuckoo_hash.h
> @@ -129,18 +129,15 @@ struct rte_hash_key {  enum
> rte_hash_sig_compare_function {
>  RTE_HASH_COMPARE_SCALAR = 0,
>  RTE_HASH_COMPARE_SSE,
> -RTE_HASH_COMPARE_AVX2,
>  RTE_HASH_COMPARE_NUM
>  };
>
>  /** Bucket structure */
>  struct rte_hash_bucket {
> -hash_sig_t sig_current[RTE_HASH_BUCKET_ENTRIES];
> +uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
>
>  uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
>
> -hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
> -
>  uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
>
>  void *next;
> @@ -193,6 +190,7 @@ struct rte_hash {
>
>  struct queue_node {
>  struct rte_hash_bucket *bkt; /* Current bucket on the bfs search */
> +uint32_t cur_bkt_idx;
>
>  struct queue_node *prev;     /* Parent(bucket) in search path */
>  int prev_slot;               /* Parent(slot) in search path */
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h index
> 11d8e28..0bd7696 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -40,7 +40,10 @@ extern "C" {
>  /** Flag to indicate the extendabe bucket table feature should be used */
> #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
>
> -/** Signature of key that is stored internally. */
> +/**
> + * A hash value that is used to generate signature stored in table and
> +the
> + * location the signature is stored.
> + */
This is an external file. This documentation goes into the API guide. IMO, we should change the comment to help the user. How about changing this to 'hash value of the key'?

>  typedef uint32_t hash_sig_t;
>
>  /** Type of function that can be used for calculating the hash value. */
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 5/7] hash: add extendable bucket feature
  2018-09-27  4:23     ` Honnappa Nagarahalli
@ 2018-09-27 11:15       ` Bruce Richardson
  2018-09-27 11:27         ` Ananyev, Konstantin
  2018-09-27 19:21         ` Honnappa Nagarahalli
  2018-09-29  1:10       ` Wang, Yipeng1
  1 sibling, 2 replies; 107+ messages in thread
From: Bruce Richardson @ 2018-09-27 11:15 UTC (permalink / raw)
  To: Honnappa Nagarahalli; +Cc: Yipeng Wang, dev, michel

On Thu, Sep 27, 2018 at 04:23:48AM +0000, Honnappa Nagarahalli wrote:
> 
> 
> > -----Original Message-----
> > From: Yipeng Wang <yipeng1.wang@intel.com>
> > Sent: Friday, September 21, 2018 12:18 PM
> > To: bruce.richardson@intel.com
> > Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br;
> > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Subject: [PATCH v2 5/7] hash: add extendable bucket feature
> >
> > In use cases that hash table capacity needs to be guaranteed, the extendable
> > bucket feature can be used to contain extra keys in linked lists when conflict
> > happens. This is similar concept to the extendable bucket hash table in packet
> > framework.
> >
> > This commit adds the extendable bucket feature. User can turn it on or off
> > through the extra flag field during table creation time.
> >
> > The extendable bucket table is composed of buckets that can be linked, as a list, to the
> > current main table. When the extendable bucket feature is enabled, the table utilization can
> > always achieve 100%.
> IMO, referring to this as 'table utilization' reads like a statement about memory efficiency. Please consider changing this to indicate that all of the configured number of entries will be accommodated.
> 
> > Although keys ending up in the ext buckets may have longer look up time, they
> > should be rare due to the cuckoo algorithm.
> >
> > Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> > ---
> >  lib/librte_hash/rte_cuckoo_hash.c | 326
> > +++++++++++++++++++++++++++++++++-----
> >  lib/librte_hash/rte_cuckoo_hash.h |   5 +
> >  lib/librte_hash/rte_hash.h        |   3 +
> >  3 files changed, 292 insertions(+), 42 deletions(-)
> >
> > diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> > b/lib/librte_hash/rte_cuckoo_hash.c
> > index f7b86c8..616900b 100644
> > --- a/lib/librte_hash/rte_cuckoo_hash.c
> > +++ b/lib/librte_hash/rte_cuckoo_hash.c
> > @@ -31,6 +31,10 @@
> >  #include "rte_hash.h"
> >  #include "rte_cuckoo_hash.h"
> >
> > +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)
> > \
> > +for (CURRENT_BKT = START_BUCKET;                                      \
> > +CURRENT_BKT != NULL;                                          \
> > +CURRENT_BKT = CURRENT_BKT->next)
> >
> >  TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
> >
> > @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
> >  return h;
> >  }
> >
> > +static inline struct rte_hash_bucket *
> > +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt) {
> > +while (lst_bkt->next != NULL)
> > +lst_bkt = lst_bkt->next;
> > +return lst_bkt;
> > +}
> > +
> >  void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)  {
> >  h->cmp_jump_table_idx = KEY_CUSTOM;
> > @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters
> > *params)
> >  struct rte_tailq_entry *te = NULL;
> >  struct rte_hash_list *hash_list;
> >  struct rte_ring *r = NULL;
> > +struct rte_ring *r_ext = NULL;
> >  char hash_name[RTE_HASH_NAMESIZE];
> >  void *k = NULL;
> >  void *buckets = NULL;
> > +void *buckets_ext = NULL;
> >  char ring_name[RTE_RING_NAMESIZE];
> > +char ext_ring_name[RTE_RING_NAMESIZE];
> >  unsigned num_key_slots;
> >  unsigned i;
> >  unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> > +unsigned int ext_table_support = 0;
> >  unsigned int readwrite_concur_support = 0;
> >
> >  rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> > @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters
> > *params)
> >  multi_writer_support = 1;
> >  }
> >
> > +if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> > +ext_table_support = 1;
> > +
> >  /* Store all keys and leave the first entry as a dummy entry for
> > lookup_bulk */
> >  if (multi_writer_support)
> >  /*
> > @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters
> > *params)
> >  goto err;
> >  }
> >
> > +const uint32_t num_buckets = rte_align32pow2(params->entries) /
> > +RTE_HASH_BUCKET_ENTRIES;
> > +
> > +snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> > +params-
> > >name);
> Can be inside the if statement below.
> 
> > +/* Create ring for extendable buckets. */
> > +if (ext_table_support) {
> > +r_ext = rte_ring_create(ext_ring_name,
> > +rte_align32pow2(num_buckets + 1),
> > +params->socket_id, 0);
> > +
> > +if (r_ext == NULL) {
> > +RTE_LOG(ERR, HASH, "ext buckets memory allocation
> > "
> > +"failed\n");
> > +goto err;
> > +}
> > +}
> > +
> >  snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
> >
> >  rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> > @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters
> > *params)
> >  goto err_unlock;
> >  }
> >
> > -const uint32_t num_buckets = rte_align32pow2(params->entries)
> > -/ RTE_HASH_BUCKET_ENTRIES;
> > -
> >  buckets = rte_zmalloc_socket(NULL,
> >  num_buckets * sizeof(struct rte_hash_bucket),
> >  RTE_CACHE_LINE_SIZE, params->socket_id);
> >
> >  if (buckets == NULL) {
> > -RTE_LOG(ERR, HASH, "memory allocation failed\n");
> > +RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
> >  goto err_unlock;
> >  }
> >
> > +/* Allocate same number of extendable buckets */
> IMO, we are allocating too much memory to support this feature. Especially when we claim that keys ending up in the extendable table are a rare occurrence. By doubling the memory we are effectively saying that the main table might have 50% utilization. It will also significantly increase the cycles required to iterate the complete hash table (in rte_hash_iterate API) even when we expect that the extendable table contains very few entries.
> 
> I am wondering if we can provide options to control the amount of extra memory that gets allocated and make the memory allocation dynamic (or on an on-demand basis). I think this also goes well with the general direction DPDK is taking - allocate resources as needed rather than allocating all the resources during initialization.
> 

Given that adding new entries should not normally be a fast-path function,
how about allowing memory allocation in add itself. Why not initialize with
a fairly small number of extra bucket entries, and then each time they are
all used, double the number of entries. That will give efficient resource
scaling, I think.
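A very rough sketch of that growth step, just to illustrate the idea (the helper, the
ext_bkt_cnt field and the free_ext_bkts ring name are hypothetical here, and the free
ring itself would also need to be enlarged, which is omitted):

static int
grow_ext_buckets(struct rte_hash *h)
{
    /* hypothetical field: current number of extendable buckets */
    uint32_t grow_by = h->ext_bkt_cnt;
    struct rte_hash_bucket *more;
    uint32_t i;

    more = rte_zmalloc_socket(NULL, grow_by * sizeof(*more),
            RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
    if (more == NULL)
        return -ENOMEM;

    /* hand the fresh buckets to the free list used by the add path */
    for (i = 0; i < grow_by; i++)
        rte_ring_sp_enqueue(h->free_ext_bkts, &more[i]);

    h->ext_bkt_cnt += grow_by;
    return 0;
}

The add path would then call something like this when the free ring runs dry and retry
the dequeue, instead of failing the insertion.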

/Bruce

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v3 1/5] test/hash: fix bucket size in hash perf test
  2018-09-26 12:54   ` [PATCH v3 1/5] test/hash: fix bucket size in hash perf test Yipeng Wang
@ 2018-09-27 11:17     ` Bruce Richardson
  0 siblings, 0 replies; 107+ messages in thread
From: Bruce Richardson @ 2018-09-27 11:17 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, honnappa.nagarahalli, sameh.gobriel

On Wed, Sep 26, 2018 at 05:54:21AM -0700, Yipeng Wang wrote:
> The bucket size was changed from 4 to 8 but the corresponding
> perf test was not changed accordingly.
> 
> In the test, the bucket size and number of buckets are used
> to map to the underneath rte_hash structure. They are used
> to test performance of two conditions: keys in primary
> buckets only and keys in both primary and secondary buckets.
> 
> Although there is no functional issue with bucket size set
> to 4, it mismatches the underneath rte_hash structure,
> which may affect code readability and future extension.
> 
> Fixes: 58017c98ed53 ("hash: add vectorized comparison")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
>  test/test/test_hash_perf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
> index 33dcb9f..9ed7125 100644
> --- a/test/test/test_hash_perf.c
> +++ b/test/test/test_hash_perf.c
> @@ -20,7 +20,7 @@
>  #define MAX_ENTRIES (1 << 19)
>  #define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
>  #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
> -#define BUCKET_SIZE 4
> +#define BUCKET_SIZE 8
>  #define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)
>  #define MAX_KEYSIZE 64
>  #define NUM_KEYSIZES 10

Honnappa's suggestion that a comment be added here to indicate that the
value should be kept in sync with the rte_hash one is a good one, and
should be done.
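Something as short as the following would probably do (wording only illustrative):

/* Must stay in sync with RTE_HASH_BUCKET_ENTRIES in rte_cuckoo_hash.h */
#define BUCKET_SIZE 8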

Otherwise:

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v3 3/5] test/hash: fix rw test with non-consecutive cores
  2018-09-26 12:54   ` [PATCH v3 3/5] test/hash: fix rw test with non-consecutive cores Yipeng Wang
@ 2018-09-27 11:18     ` Bruce Richardson
  0 siblings, 0 replies; 107+ messages in thread
From: Bruce Richardson @ 2018-09-27 11:18 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, honnappa.nagarahalli, sameh.gobriel

On Wed, Sep 26, 2018 at 05:54:23AM -0700, Yipeng Wang wrote:
> the multi-reader and multi-writer rte_hash unit test does not
> work correctly with non-consicutive core ids. This commit
                           ^^^^^^
typo, forgot to mention on previous review

> fixes the issue.
> 
> Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> Tested-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v3 4/5] test/hash: fix missing file in meson build file
  2018-09-26 12:54   ` [PATCH v3 4/5] test/hash: fix missing file in meson build file Yipeng Wang
@ 2018-09-27 11:22     ` Bruce Richardson
  0 siblings, 0 replies; 107+ messages in thread
From: Bruce Richardson @ 2018-09-27 11:22 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, honnappa.nagarahalli, sameh.gobriel

On Wed, Sep 26, 2018 at 05:54:24AM -0700, Yipeng Wang wrote:
> The test_hash_readwrite.c was not in the meson.build file. This
> commit adds the missing test into the file.
> 
> Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 5/7] hash: add extendable bucket feature
  2018-09-27 11:15       ` Bruce Richardson
@ 2018-09-27 11:27         ` Ananyev, Konstantin
  2018-09-27 12:27           ` Bruce Richardson
  2018-09-27 19:21         ` Honnappa Nagarahalli
  1 sibling, 1 reply; 107+ messages in thread
From: Ananyev, Konstantin @ 2018-09-27 11:27 UTC (permalink / raw)
  To: Richardson, Bruce, Honnappa Nagarahalli; +Cc: Wang, Yipeng1, dev, michel



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Thursday, September 27, 2018 12:15 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; dev@dpdk.org; michel@digirati.com.br
> Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature
> 
> On Thu, Sep 27, 2018 at 04:23:48AM +0000, Honnappa Nagarahalli wrote:
> >
> >
> > > -----Original Message-----
> > > From: Yipeng Wang <yipeng1.wang@intel.com>
> > > Sent: Friday, September 21, 2018 12:18 PM
> > > To: bruce.richardson@intel.com
> > > Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br;
> > > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > Subject: [PATCH v2 5/7] hash: add extendable bucket feature
> > >
> > > In use cases that hash table capacity needs to be guaranteed, the extendable
> > > bucket feature can be used to contain extra keys in linked lists when conflict
> > > happens. This is similar concept to the extendable bucket hash table in packet
> > > framework.
> > >
> > > This commit adds the extendable bucket feature. User can turn it on or off
> > > through the extra flag field during table creation time.
> > >
> > > The extendable bucket table is composed of buckets that can be linked, as a list, to the
> > > current main table. When the extendable bucket feature is enabled, the table utilization can
> > > always achieve 100%.
> > IMO, referring to this as 'table utilization' reads like a statement about memory efficiency. Please consider changing this to indicate that
> > all of the configured number of entries will be accommodated.
> >
> > > Although keys ending up in the ext buckets may have longer look up time, they
> > > should be rare due to the cuckoo algorithm.
> > >
> > > Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> > > ---
> > >  lib/librte_hash/rte_cuckoo_hash.c | 326
> > > +++++++++++++++++++++++++++++++++-----
> > >  lib/librte_hash/rte_cuckoo_hash.h |   5 +
> > >  lib/librte_hash/rte_hash.h        |   3 +
> > >  3 files changed, 292 insertions(+), 42 deletions(-)
> > >
> > > diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> > > b/lib/librte_hash/rte_cuckoo_hash.c
> > > index f7b86c8..616900b 100644
> > > --- a/lib/librte_hash/rte_cuckoo_hash.c
> > > +++ b/lib/librte_hash/rte_cuckoo_hash.c
> > > @@ -31,6 +31,10 @@
> > >  #include "rte_hash.h"
> > >  #include "rte_cuckoo_hash.h"
> > >
> > > +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)
> > > \
> > > +for (CURRENT_BKT = START_BUCKET;                                      \
> > > +CURRENT_BKT != NULL;                                          \
> > > +CURRENT_BKT = CURRENT_BKT->next)
> > >
> > >  TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
> > >
> > > @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
> > >  return h;
> > >  }
> > >
> > > +static inline struct rte_hash_bucket *
> > > +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt) {
> > > +while (lst_bkt->next != NULL)
> > > +lst_bkt = lst_bkt->next;
> > > +return lst_bkt;
> > > +}
> > > +
> > >  void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)  {
> > >  h->cmp_jump_table_idx = KEY_CUSTOM;
> > > @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters
> > > *params)
> > >  struct rte_tailq_entry *te = NULL;
> > >  struct rte_hash_list *hash_list;
> > >  struct rte_ring *r = NULL;
> > > +struct rte_ring *r_ext = NULL;
> > >  char hash_name[RTE_HASH_NAMESIZE];
> > >  void *k = NULL;
> > >  void *buckets = NULL;
> > > +void *buckets_ext = NULL;
> > >  char ring_name[RTE_RING_NAMESIZE];
> > > +char ext_ring_name[RTE_RING_NAMESIZE];
> > >  unsigned num_key_slots;
> > >  unsigned i;
> > >  unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> > > +unsigned int ext_table_support = 0;
> > >  unsigned int readwrite_concur_support = 0;
> > >
> > >  rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> > > @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters
> > > *params)
> > >  multi_writer_support = 1;
> > >  }
> > >
> > > +if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> > > +ext_table_support = 1;
> > > +
> > >  /* Store all keys and leave the first entry as a dummy entry for
> > > lookup_bulk */
> > >  if (multi_writer_support)
> > >  /*
> > > @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters
> > > *params)
> > >  goto err;
> > >  }
> > >
> > > +const uint32_t num_buckets = rte_align32pow2(params->entries) /
> > > +RTE_HASH_BUCKET_ENTRIES;
> > > +
> > > +snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> > > +params-
> > > >name);
> > Can be inside the if statement below.
> >
> > > +/* Create ring for extendable buckets. */
> > > +if (ext_table_support) {
> > > +r_ext = rte_ring_create(ext_ring_name,
> > > +rte_align32pow2(num_buckets + 1),
> > > +params->socket_id, 0);
> > > +
> > > +if (r_ext == NULL) {
> > > +RTE_LOG(ERR, HASH, "ext buckets memory allocation
> > > "
> > > +"failed\n");
> > > +goto err;
> > > +}
> > > +}
> > > +
> > >  snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
> > >
> > >  rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> > > @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters
> > > *params)
> > >  goto err_unlock;
> > >  }
> > >
> > > -const uint32_t num_buckets = rte_align32pow2(params->entries)
> > > -/ RTE_HASH_BUCKET_ENTRIES;
> > > -
> > >  buckets = rte_zmalloc_socket(NULL,
> > >  num_buckets * sizeof(struct rte_hash_bucket),
> > >  RTE_CACHE_LINE_SIZE, params->socket_id);
> > >
> > >  if (buckets == NULL) {
> > > -RTE_LOG(ERR, HASH, "memory allocation failed\n");
> > > +RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
> > >  goto err_unlock;
> > >  }
> > >
> > > +/* Allocate same number of extendable buckets */
> > IMO, we are allocating too much memory to support this feature. Especially, when we claim that keys ending up in the extendable table is
> a rare occurrence. By doubling the memory we are effectively saying that the main table might have 50% utilization. It will also significantly
> increase the cycles required to iterate the complete hash table (in rte_hash_iterate API) even when we expect that the extendable table
> contains very few entries.
> >
> > I am wondering if we can provide options to control the amount of extra memory that gets allocated and make the memory allocation
> dynamic (or on demand basis). I think this also goes well with the general direction DPDK is taking - allocate resources as needed rather than
> allocating all the resources during initialization.
> >
> 
> Given that adding new entries should not normally be a fast-path function,

Umm, I don't think I agree with that.
There are plenty of cases where add/delete speed is important.
Konstantin

> how about allowing memory allocation in add itself. Why not initialize with
> a fairly small number of extra bucket entries, and then each time they are
> all used, double the number of entries. That will give efficient resource
> scaling, I think.
> 
> /Bruce
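
A minimal sketch of that doubling idea (illustrative only: the helper
name is made up, the sketch stores bucket pointers in the free ring
where the series uses 1-based indices, and a real implementation would
also have to recreate the ring at the larger size and track the extra
chunks for freeing):

#include <errno.h>
#include <rte_malloc.h>
#include <rte_ring.h>
#include "rte_cuckoo_hash.h"	/* struct rte_hash, struct rte_hash_bucket */

/* Called when the free ext-bucket ring runs empty during an add:
 * allocate a second chunk as large as the current pool and feed the
 * new buckets into the free ring, doubling the pool.
 */
static int
ext_bkt_pool_grow(struct rte_hash *h, uint32_t cur_ext_bkts, int socket_id)
{
	struct rte_hash_bucket *chunk;
	uint32_t i;

	chunk = rte_zmalloc_socket(NULL,
			cur_ext_bkts * sizeof(struct rte_hash_bucket),
			RTE_CACHE_LINE_SIZE, socket_id);
	if (chunk == NULL)
		return -ENOMEM;

	for (i = 0; i < cur_ext_bkts; i++)
		rte_ring_sp_enqueue(h->free_ext_bkts, &chunk[i]);

	return 0;
}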

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 5/7] hash: add extendable bucket feature
  2018-09-27 11:27         ` Ananyev, Konstantin
@ 2018-09-27 12:27           ` Bruce Richardson
  2018-09-27 12:33             ` Ananyev, Konstantin
  0 siblings, 1 reply; 107+ messages in thread
From: Bruce Richardson @ 2018-09-27 12:27 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Honnappa Nagarahalli, Wang, Yipeng1, dev, michel

On Thu, Sep 27, 2018 at 12:27:21PM +0100, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Thursday, September 27, 2018 12:15 PM
> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; dev@dpdk.org; michel@digirati.com.br
> > Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature
> > 
> > On Thu, Sep 27, 2018 at 04:23:48AM +0000, Honnappa Nagarahalli wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Yipeng Wang <yipeng1.wang@intel.com>
> > > > Sent: Friday, September 21, 2018 12:18 PM
> > > > To: bruce.richardson@intel.com
> > > > Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br;
> > > > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > > Subject: [PATCH v2 5/7] hash: add extendable bucket feature
> > > >
> > > > In use cases that hash table capacity needs to be guaranteed, the extendable
> > > > bucket feature can be used to contain extra keys in linked lists when conflict
> > > > happens. This is similar concept to the extendable bucket hash table in packet
> > > > framework.
> > > >
> > > > This commit adds the extendable bucket feature. User can turn it on or off
> > > > through the extra flag field during table creation time.
> > > >
> > > > Extendable bucket table composes of buckets that can be linked list to current
> > > > main table. When extendable bucket is enabled, the table utilization can
> > > > always acheive 100%.
> > > IMO, referring to this as 'table utilization' indicates an efficiency about memory utilization. Please consider changing this to indicate that
> > all of the configured number of entries will be accommodated?
> > >
> > > > Although keys ending up in the ext buckets may have longer look up time, they
> > > > should be rare due to the cuckoo algorithm.
> > > >
> > > > Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> > > > ---
> > > >  lib/librte_hash/rte_cuckoo_hash.c | 326
> > > > +++++++++++++++++++++++++++++++++-----
> > > >  lib/librte_hash/rte_cuckoo_hash.h |   5 +
> > > >  lib/librte_hash/rte_hash.h        |   3 +
> > > >  3 files changed, 292 insertions(+), 42 deletions(-)
> > > >
> > > > diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> > > > b/lib/librte_hash/rte_cuckoo_hash.c
> > > > index f7b86c8..616900b 100644
> > > > --- a/lib/librte_hash/rte_cuckoo_hash.c
> > > > +++ b/lib/librte_hash/rte_cuckoo_hash.c
> > > > @@ -31,6 +31,10 @@
> > > >  #include "rte_hash.h"
> > > >  #include "rte_cuckoo_hash.h"
> > > >
> > > > +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)
> > > > \
> > > > +for (CURRENT_BKT = START_BUCKET;                                      \
> > > > +CURRENT_BKT != NULL;                                          \
> > > > +CURRENT_BKT = CURRENT_BKT->next)
> > > >
> > > >  TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
> > > >
> > > > @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
> > > >  return h;
> > > >  }
> > > >
> > > > +static inline struct rte_hash_bucket *
> > > > +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt) {
> > > > +while (lst_bkt->next != NULL)
> > > > +lst_bkt = lst_bkt->next;
> > > > +return lst_bkt;
> > > > +}
> > > > +
> > > >  void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)  {
> > > >  h->cmp_jump_table_idx = KEY_CUSTOM;
> > > > @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters
> > > > *params)
> > > >  struct rte_tailq_entry *te = NULL;
> > > >  struct rte_hash_list *hash_list;
> > > >  struct rte_ring *r = NULL;
> > > > +struct rte_ring *r_ext = NULL;
> > > >  char hash_name[RTE_HASH_NAMESIZE];
> > > >  void *k = NULL;
> > > >  void *buckets = NULL;
> > > > +void *buckets_ext = NULL;
> > > >  char ring_name[RTE_RING_NAMESIZE];
> > > > +char ext_ring_name[RTE_RING_NAMESIZE];
> > > >  unsigned num_key_slots;
> > > >  unsigned i;
> > > >  unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> > > > +unsigned int ext_table_support = 0;
> > > >  unsigned int readwrite_concur_support = 0;
> > > >
> > > >  rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> > > > @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters
> > > > *params)
> > > >  multi_writer_support = 1;
> > > >  }
> > > >
> > > > +if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> > > > +ext_table_support = 1;
> > > > +
> > > >  /* Store all keys and leave the first entry as a dummy entry for
> > > > lookup_bulk */
> > > >  if (multi_writer_support)
> > > >  /*
> > > > @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters
> > > > *params)
> > > >  goto err;
> > > >  }
> > > >
> > > > +const uint32_t num_buckets = rte_align32pow2(params->entries) /
> > > > +RTE_HASH_BUCKET_ENTRIES;
> > > > +
> > > > +snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> > > > +params-
> > > > >name);
> > > Can be inside the if statement below.
> > >
> > > > +/* Create ring for extendable buckets. */
> > > > +if (ext_table_support) {
> > > > +r_ext = rte_ring_create(ext_ring_name,
> > > > +rte_align32pow2(num_buckets + 1),
> > > > +params->socket_id, 0);
> > > > +
> > > > +if (r_ext == NULL) {
> > > > +RTE_LOG(ERR, HASH, "ext buckets memory allocation
> > > > "
> > > > +"failed\n");
> > > > +goto err;
> > > > +}
> > > > +}
> > > > +
> > > >  snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
> > > >
> > > >  rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> > > > @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters
> > > > *params)
> > > >  goto err_unlock;
> > > >  }
> > > >
> > > > -const uint32_t num_buckets = rte_align32pow2(params->entries)
> > > > -/ RTE_HASH_BUCKET_ENTRIES;
> > > > -
> > > >  buckets = rte_zmalloc_socket(NULL,
> > > >  num_buckets * sizeof(struct rte_hash_bucket),
> > > >  RTE_CACHE_LINE_SIZE, params->socket_id);
> > > >
> > > >  if (buckets == NULL) {
> > > > -RTE_LOG(ERR, HASH, "memory allocation failed\n");
> > > > +RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
> > > >  goto err_unlock;
> > > >  }
> > > >
> > > > +/* Allocate same number of extendable buckets */
> > > IMO, we are allocating too much memory to support this feature. Especially, when we claim that keys ending up in the extendable table is
> > a rare occurrence. By doubling the memory we are effectively saying that the main table might have 50% utilization. It will also significantly
> > increase the cycles required to iterate the complete hash table (in rte_hash_iterate API) even when we expect that the extendable table
> > contains very few entries.
> > >
> > > I am wondering if we can provide options to control the amount of extra memory that gets allocated and make the memory allocation
> > dynamic (or on demand basis). I think this also goes well with the general direction DPDK is taking - allocate resources as needed rather than
> > allocating all the resources during initialization.
> > >
> > 
> > Given that adding new entries should not normally be a fast-path function,
> 
> Umm, I don't think I agree with that.
> There are plenty of cases where add/delete speed is important.
> Konstantin

True, I suppose.
Perhaps then the best approach is to give a couple of options to the
developer. Allow specifying an initial amount of memory, and then an option
to allow the memory to grow or not. If the cost of memory allocation is a
problem for a specific app, then they can provide a large default and not
allow growing, while other apps, for whom speed is not that critical, can
provide a small default and allow growing.

Does that seem reasonable?

/Bruce
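
As a rough sketch, the two knobs could look something like the
following at creation time; neither the structure nor the fields below
exist in rte_hash today, they are purely illustrative:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical creation-time options for the extendable bucket pool. */
struct hash_ext_params {
	uint32_t ext_initial_entries;	/* ext entries allocated up front */
	bool ext_allow_grow;		/* allocate more buckets on demand */
};

/* Add/delete latency critical: large static pool, no run-time allocation. */
static const struct hash_ext_params latency_sensitive = {
	.ext_initial_entries = 1 << 20,
	.ext_allow_grow = false,
};

/* Memory constrained: small pool, grow (and pay the allocation) as needed. */
static const struct hash_ext_params memory_sensitive = {
	.ext_initial_entries = 4096,
	.ext_allow_grow = true,
};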

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 5/7] hash: add extendable bucket feature
  2018-09-27 12:27           ` Bruce Richardson
@ 2018-09-27 12:33             ` Ananyev, Konstantin
  0 siblings, 0 replies; 107+ messages in thread
From: Ananyev, Konstantin @ 2018-09-27 12:33 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: Honnappa Nagarahalli, Wang, Yipeng1, dev, michel



> -----Original Message-----
> From: Richardson, Bruce
> Sent: Thursday, September 27, 2018 1:28 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Wang, Yipeng1 <yipeng1.wang@intel.com>; dev@dpdk.org;
> michel@digirati.com.br
> Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature
> 
> On Thu, Sep 27, 2018 at 12:27:21PM +0100, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > > Sent: Thursday, September 27, 2018 12:15 PM
> > > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; dev@dpdk.org; michel@digirati.com.br
> > > Subject: Re: [dpdk-dev] [PATCH v2 5/7] hash: add extendable bucket feature
> > >
> > > On Thu, Sep 27, 2018 at 04:23:48AM +0000, Honnappa Nagarahalli wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Yipeng Wang <yipeng1.wang@intel.com>
> > > > > Sent: Friday, September 21, 2018 12:18 PM
> > > > > To: bruce.richardson@intel.com
> > > > > Cc: dev@dpdk.org; yipeng1.wang@intel.com; michel@digirati.com.br;
> > > > > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > > > Subject: [PATCH v2 5/7] hash: add extendable bucket feature
> > > > >
> > > > > In use cases that hash table capacity needs to be guaranteed, the extendable
> > > > > bucket feature can be used to contain extra keys in linked lists when conflict
> > > > > happens. This is similar concept to the extendable bucket hash table in packet
> > > > > framework.
> > > > >
> > > > > This commit adds the extendable bucket feature. User can turn it on or off
> > > > > through the extra flag field during table creation time.
> > > > >
> > > > > Extendable bucket table composes of buckets that can be linked list to current
> > > > > main table. When extendable bucket is enabled, the table utilization can
> > > > > always acheive 100%.
> > > > IMO, referring to this as 'table utilization' indicates an efficiency about memory utilization. Please consider changing this to indicate
> that
> > > all of the configured number of entries will be accommodated?
> > > >
> > > > > Although keys ending up in the ext buckets may have longer look up time, they
> > > > > should be rare due to the cuckoo algorithm.
> > > > >
> > > > > Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> > > > > ---
> > > > >  lib/librte_hash/rte_cuckoo_hash.c | 326
> > > > > +++++++++++++++++++++++++++++++++-----
> > > > >  lib/librte_hash/rte_cuckoo_hash.h |   5 +
> > > > >  lib/librte_hash/rte_hash.h        |   3 +
> > > > >  3 files changed, 292 insertions(+), 42 deletions(-)
> > > > >
> > > > > diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> > > > > b/lib/librte_hash/rte_cuckoo_hash.c
> > > > > index f7b86c8..616900b 100644
> > > > > --- a/lib/librte_hash/rte_cuckoo_hash.c
> > > > > +++ b/lib/librte_hash/rte_cuckoo_hash.c
> > > > > @@ -31,6 +31,10 @@
> > > > >  #include "rte_hash.h"
> > > > >  #include "rte_cuckoo_hash.h"
> > > > >
> > > > > +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)
> > > > > \
> > > > > +for (CURRENT_BKT = START_BUCKET;                                      \
> > > > > +CURRENT_BKT != NULL;                                          \
> > > > > +CURRENT_BKT = CURRENT_BKT->next)
> > > > >
> > > > >  TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
> > > > >
> > > > > @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
> > > > >  return h;
> > > > >  }
> > > > >
> > > > > +static inline struct rte_hash_bucket *
> > > > > +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt) {
> > > > > +while (lst_bkt->next != NULL)
> > > > > +lst_bkt = lst_bkt->next;
> > > > > +return lst_bkt;
> > > > > +}
> > > > > +
> > > > >  void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)  {
> > > > >  h->cmp_jump_table_idx = KEY_CUSTOM;
> > > > > @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters
> > > > > *params)
> > > > >  struct rte_tailq_entry *te = NULL;
> > > > >  struct rte_hash_list *hash_list;
> > > > >  struct rte_ring *r = NULL;
> > > > > +struct rte_ring *r_ext = NULL;
> > > > >  char hash_name[RTE_HASH_NAMESIZE];
> > > > >  void *k = NULL;
> > > > >  void *buckets = NULL;
> > > > > +void *buckets_ext = NULL;
> > > > >  char ring_name[RTE_RING_NAMESIZE];
> > > > > +char ext_ring_name[RTE_RING_NAMESIZE];
> > > > >  unsigned num_key_slots;
> > > > >  unsigned i;
> > > > >  unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> > > > > +unsigned int ext_table_support = 0;
> > > > >  unsigned int readwrite_concur_support = 0;
> > > > >
> > > > >  rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> > > > > @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters
> > > > > *params)
> > > > >  multi_writer_support = 1;
> > > > >  }
> > > > >
> > > > > +if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> > > > > +ext_table_support = 1;
> > > > > +
> > > > >  /* Store all keys and leave the first entry as a dummy entry for
> > > > > lookup_bulk */
> > > > >  if (multi_writer_support)
> > > > >  /*
> > > > > @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters
> > > > > *params)
> > > > >  goto err;
> > > > >  }
> > > > >
> > > > > +const uint32_t num_buckets = rte_align32pow2(params->entries) /
> > > > > +RTE_HASH_BUCKET_ENTRIES;
> > > > > +
> > > > > +snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> > > > > +params-
> > > > > >name);
> > > > Can be inside the if statement below.
> > > >
> > > > > +/* Create ring for extendable buckets. */
> > > > > +if (ext_table_support) {
> > > > > +r_ext = rte_ring_create(ext_ring_name,
> > > > > +rte_align32pow2(num_buckets + 1),
> > > > > +params->socket_id, 0);
> > > > > +
> > > > > +if (r_ext == NULL) {
> > > > > +RTE_LOG(ERR, HASH, "ext buckets memory allocation
> > > > > "
> > > > > +"failed\n");
> > > > > +goto err;
> > > > > +}
> > > > > +}
> > > > > +
> > > > >  snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
> > > > >
> > > > >  rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> > > > > @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters
> > > > > *params)
> > > > >  goto err_unlock;
> > > > >  }
> > > > >
> > > > > -const uint32_t num_buckets = rte_align32pow2(params->entries)
> > > > > -/ RTE_HASH_BUCKET_ENTRIES;
> > > > > -
> > > > >  buckets = rte_zmalloc_socket(NULL,
> > > > >  num_buckets * sizeof(struct rte_hash_bucket),
> > > > >  RTE_CACHE_LINE_SIZE, params->socket_id);
> > > > >
> > > > >  if (buckets == NULL) {
> > > > > -RTE_LOG(ERR, HASH, "memory allocation failed\n");
> > > > > +RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
> > > > >  goto err_unlock;
> > > > >  }
> > > > >
> > > > > +/* Allocate same number of extendable buckets */
> > > > IMO, we are allocating too much memory to support this feature. Especially, when we claim that keys ending up in the extendable
> table is
> > > a rare occurrence. By doubling the memory we are effectively saying that the main table might have 50% utilization. It will also
> significantly
> > > increase the cycles required to iterate the complete hash table (in rte_hash_iterate API) even when we expect that the extendable table
> > > contains very few entries.
> > > >
> > > > I am wondering if we can provide options to control the amount of extra memory that gets allocated and make the memory allocation
> > > dynamic (or on demand basis). I think this also goes well with the general direction DPDK is taking - allocate resources as needed rather
> than
> > > allocating all the resources during initialization.
> > > >
> > >
> > > Given that adding new entries should not normally be a fast-path function,
> >
> > Umm, I don't think I agree with that.
> > There are plenty of cases where add/delete speed is important.
> > Konstantin
> 
> True, I suppose.
> Perhaps then the best approach is to give a couple of options to the
> developer. Allow specifying an initial amount of memory, and then an option
> to allow the memory to grow or not. If the cost of memory allocation is a
> problem for a specific app, then they can provide a large default and not
> allow growing, while other apps, for whom speed is not that critical, can
> provide a small default and allow growing.
> 
> Does that seem reasonable?

Yes, I think it does.
Konstantin

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 5/7] hash: add extendable bucket feature
  2018-09-27 11:15       ` Bruce Richardson
  2018-09-27 11:27         ` Ananyev, Konstantin
@ 2018-09-27 19:21         ` Honnappa Nagarahalli
  2018-09-28 17:35           ` Wang, Yipeng1
  1 sibling, 1 reply; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-27 19:21 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Yipeng Wang, dev, michel

> > >
> > > Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> > > ---
> > >  lib/librte_hash/rte_cuckoo_hash.c | 326
> > > +++++++++++++++++++++++++++++++++-----
> > >  lib/librte_hash/rte_cuckoo_hash.h |   5 +
> > >  lib/librte_hash/rte_hash.h        |   3 +
> > >  3 files changed, 292 insertions(+), 42 deletions(-)
> > >
> > > diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> > > b/lib/librte_hash/rte_cuckoo_hash.c
> > > index f7b86c8..616900b 100644
> > > --- a/lib/librte_hash/rte_cuckoo_hash.c
> > > +++ b/lib/librte_hash/rte_cuckoo_hash.c
> > > @@ -31,6 +31,10 @@
> > >  #include "rte_hash.h"
> > >  #include "rte_cuckoo_hash.h"
> > >
> > > +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)
> > > \
> > > +for (CURRENT_BKT = START_BUCKET;                                      \
> > > +CURRENT_BKT != NULL;                                          \
> > > +CURRENT_BKT = CURRENT_BKT->next)
> > >
> > >  TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
> > >
> > > @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)  return
> > > h;  }
> > >
> > > +static inline struct rte_hash_bucket * rte_hash_get_last_bkt(struct
> > > +rte_hash_bucket *lst_bkt) { while (lst_bkt->next != NULL) lst_bkt =
> > > +lst_bkt->next; return lst_bkt; }
> > > +
> > >  void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t
> > > func)  {  h->cmp_jump_table_idx = KEY_CUSTOM; @@ -85,13 +97,17 @@
> > > rte_hash_create(const struct rte_hash_parameters
> > > *params)
> > >  struct rte_tailq_entry *te = NULL;
> > >  struct rte_hash_list *hash_list;
> > >  struct rte_ring *r = NULL;
> > > +struct rte_ring *r_ext = NULL;
> > >  char hash_name[RTE_HASH_NAMESIZE];
> > >  void *k = NULL;
> > >  void *buckets = NULL;
> > > +void *buckets_ext = NULL;
> > >  char ring_name[RTE_RING_NAMESIZE];
> > > +char ext_ring_name[RTE_RING_NAMESIZE];
> > >  unsigned num_key_slots;
> > >  unsigned i;
> > >  unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> > > +unsigned int ext_table_support = 0;
> > >  unsigned int readwrite_concur_support = 0;
> > >
> > >  rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> > > @@ -124,6 +140,9 @@ rte_hash_create(const struct
> rte_hash_parameters
> > > *params)
> > >  multi_writer_support = 1;
> > >  }
> > >
> > > +if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> > > +ext_table_support = 1;
> > > +
> > >  /* Store all keys and leave the first entry as a dummy entry for
> > > lookup_bulk */  if (multi_writer_support)
> > >  /*
> > > @@ -145,6 +164,24 @@ rte_hash_create(const struct
> > > rte_hash_parameters
> > > *params)
> > >  goto err;
> > >  }
> > >
> > > +const uint32_t num_buckets = rte_align32pow2(params->entries) /
> > > +RTE_HASH_BUCKET_ENTRIES;
> > > +
> > > +snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> > > +params-
> > > >name);
> > Can be inside the if statement below.
> >
> > > +/* Create ring for extendable buckets. */ if (ext_table_support) {
> > > +r_ext = rte_ring_create(ext_ring_name, rte_align32pow2(num_buckets
> > > ++ 1),
> > > +params->socket_id, 0);
> > > +
> > > +if (r_ext == NULL) {
> > > +RTE_LOG(ERR, HASH, "ext buckets memory allocation
> > > "
> > > +"failed\n");
> > > +goto err;
> > > +}
> > > +}
> > > +
> > >  snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
> > >
> > >  rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> > > @@ -177,18 +214,34 @@ rte_hash_create(const struct
> > > rte_hash_parameters
> > > *params)
> > >  goto err_unlock;
> > >  }
> > >
> > > -const uint32_t num_buckets = rte_align32pow2(params->entries) -/
> > > RTE_HASH_BUCKET_ENTRIES;
> > > -
> > >  buckets = rte_zmalloc_socket(NULL,
> > >  num_buckets * sizeof(struct rte_hash_bucket),  RTE_CACHE_LINE_SIZE,
> > > params->socket_id);
> > >
> > >  if (buckets == NULL) {
> > > -RTE_LOG(ERR, HASH, "memory allocation failed\n");
> > > +RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
> > >  goto err_unlock;
> > >  }
> > >
> > > +/* Allocate same number of extendable buckets */
> > IMO, we are allocating too much memory to support this feature. Especially,
> when we claim that keys ending up in the extendable table is a rare
> occurrence. By doubling the memory we are effectively saying that the main
> table might have 50% utilization. It will also significantly increase the cycles
> required to iterate the complete hash table (in rte_hash_iterate API) even
> when we expect that the extendable table contains very few entries.
> >
> > I am wondering if we can provide options to control the amount of extra
> memory that gets allocated and make the memory allocation dynamic (or on
> demand basis). I think this also goes well with the general direction DPDK is
> taking - allocate resources as needed rather than allocating all the resources
> during initialization.
> >
>
> Given that adding new entries should not normally be a fast-path function,
> how about allowing memory allocation in add itself. Why not initialize with a
> fairly small number of extra bucket entries, and then each time they are all
> used, double the number of entries. That will give efficient resource scaling, I
> think.
>
+1
'small number of extra bucket entries' == 5% of total capacity requested (assuming cuckoo hash will provide 95% efficiency)

> /Bruce
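
For scale, the difference between pre-allocating 100% and ~5% of the
capacity as ext buckets is large; a back-of-the-envelope example
(illustrative numbers only, not taken from the patch):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint32_t entries = 1 << 20;	/* table created for 1M keys */
	const uint32_t bkt_entries = 8;		/* RTE_HASH_BUCKET_ENTRIES */

	/* Current patch: one ext bucket per main-table bucket. */
	uint32_t ext_bkts_full = entries / bkt_entries;		/* 131072 */
	/* 5% suggestion: only enough ext buckets for 5% of the keys. */
	uint32_t ext_bkts_5pct =
		(entries / 20 + bkt_entries - 1) / bkt_entries;	/* 6554 */

	printf("100%%: %u ext buckets, 5%%: %u ext buckets\n",
			ext_bkts_full, ext_bkts_5pct);
	return 0;
}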

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v4 0/5] hash: fix multiple issues
  2018-09-26 12:54 ` [PATCH v3 0/5] hash: fix multiple issues Yipeng Wang
                     ` (4 preceding siblings ...)
  2018-09-26 12:54   ` [PATCH v3 5/5] hash: fix unused define Yipeng Wang
@ 2018-09-28 14:11   ` Yipeng Wang
  2018-09-28 14:11     ` [PATCH v4 1/5] test/hash: fix bucket size in hash perf test Yipeng Wang
                       ` (5 more replies)
  5 siblings, 6 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-28 14:11 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

This patch set was part of the extendable hash table patch
set before V2. Following Bruce's comment, it is now separated
from the original patch set for easier review and merging.
https://mails.dpdk.org/archives/dev/2018-September/112555.html

This patch set fixes multiple issues/bugs in rte_hash and the hash
unit tests.

V3->V4:
In the first commit, per Honnappa's suggestion, added a comment to explain
what value BUCKET_SIZE should be.
In the third commit, fixed a typo: "consecutive"

V2->V3:
As Bruce suggested:
Added a new commit to add the missing file to meson.build for the readwrite test.
Revised the commit message for the last commit.

Yipeng Wang (5):
  test/hash: fix bucket size in hash perf test
  test/hash: more accurate hash perf test output
  test/hash: fix rw test with non-consecutive cores
  test/hash: fix missing file in meson build file
  hash: fix unused define

 lib/librte_hash/rte_cuckoo_hash.h |  2 -
 test/test/meson.build             |  1 +
 test/test/test_hash_perf.c        | 13 ++++---
 test/test/test_hash_readwrite.c   | 78 ++++++++++++++++++++++++---------------
 4 files changed, 57 insertions(+), 37 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v4 1/5] test/hash: fix bucket size in hash perf test
  2018-09-28 14:11   ` [PATCH v4 0/5] hash: fix multiple issues Yipeng Wang
@ 2018-09-28 14:11     ` Yipeng Wang
  2018-10-01 20:28       ` Honnappa Nagarahalli
  2018-09-28 14:11     ` [PATCH v4 2/5] test/hash: more accurate hash perf test output Yipeng Wang
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-09-28 14:11 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

The bucket size was changed from 4 to 8 but the corresponding
perf test was not changed accordingly.

In the test, the bucket size and number of buckets are used
to map to the underlying rte_hash structure. They are used
to test performance under two conditions: keys in primary
buckets only, and keys in both primary and secondary buckets.

Although there is no functional issue with the bucket size set
to 4, it no longer matches the underlying rte_hash structure,
which may affect code readability and future extension.

Fixes: 58017c98ed53 ("hash: add vectorized comparison")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 test/test/test_hash_perf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 33dcb9f..fe11632 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -20,7 +20,8 @@
 #define MAX_ENTRIES (1 << 19)
 #define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
 #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
-#define BUCKET_SIZE 4
+/* BUCKET_SIZE should be same as RTE_HASH_BUCKET_ENTRIES in rte_hash library */
+#define BUCKET_SIZE 8
 #define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)
 #define MAX_KEYSIZE 64
 #define NUM_KEYSIZES 10
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 2/5] test/hash: more accurate hash perf test output
  2018-09-28 14:11   ` [PATCH v4 0/5] hash: fix multiple issues Yipeng Wang
  2018-09-28 14:11     ` [PATCH v4 1/5] test/hash: fix bucket size in hash perf test Yipeng Wang
@ 2018-09-28 14:11     ` Yipeng Wang
  2018-09-28 14:11     ` [PATCH v4 3/5] test/hash: fix rw test with non-consecutive cores Yipeng Wang
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-28 14:11 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

Edit the printf messages emitted on errors to be more
accurate and informative.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 test/test/test_hash_perf.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index fe11632..0d39e10 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -249,7 +249,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 						(const void *) keys[i],
 						signatures[i], data);
 			if (ret < 0) {
-				printf("Failed to add key number %u\n", ret);
+				printf("H+D: Failed to add key number %u\n", i);
 				return -1;
 			}
 		} else if (with_hash && !with_data) {
@@ -259,7 +259,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 			if (ret >= 0)
 				positions[i] = ret;
 			else {
-				printf("Failed to add key number %u\n", ret);
+				printf("H: Failed to add key number %u\n", i);
 				return -1;
 			}
 		} else if (!with_hash && with_data) {
@@ -267,7 +267,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 						(const void *) keys[i],
 						data);
 			if (ret < 0) {
-				printf("Failed to add key number %u\n", ret);
+				printf("D: Failed to add key number %u\n", i);
 				return -1;
 			}
 		} else {
@@ -275,7 +275,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 			if (ret >= 0)
 				positions[i] = ret;
 			else {
-				printf("Failed to add key number %u\n", ret);
+				printf("Failed to add key number %u\n", i);
 				return -1;
 			}
 		}
@@ -443,7 +443,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
 		if (ret >= 0)
 			positions[i] = ret;
 		else {
-			printf("Failed to add key number %u\n", ret);
+			printf("Failed to delete key number %u\n", i);
 			return -1;
 		}
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 3/5] test/hash: fix rw test with non-consecutive cores
  2018-09-28 14:11   ` [PATCH v4 0/5] hash: fix multiple issues Yipeng Wang
  2018-09-28 14:11     ` [PATCH v4 1/5] test/hash: fix bucket size in hash perf test Yipeng Wang
  2018-09-28 14:11     ` [PATCH v4 2/5] test/hash: more accurate hash perf test output Yipeng Wang
@ 2018-09-28 14:11     ` Yipeng Wang
  2018-09-28 14:11     ` [PATCH v4 4/5] test/hash: fix missing file in meson build file Yipeng Wang
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-28 14:11 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

The multi-reader and multi-writer rte_hash unit test does not
work correctly with non-consecutive core IDs. This commit
fixes the issue.

Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 test/test/test_hash_readwrite.c | 78 ++++++++++++++++++++++++++---------------
 1 file changed, 49 insertions(+), 29 deletions(-)

diff --git a/test/test/test_hash_readwrite.c b/test/test/test_hash_readwrite.c
index 55ae33d..2a4f7b9 100644
--- a/test/test/test_hash_readwrite.c
+++ b/test/test/test_hash_readwrite.c
@@ -24,6 +24,7 @@
 #define NUM_TEST 3
 unsigned int core_cnt[NUM_TEST] = {2, 4, 8};
 
+unsigned int slave_core_ids[RTE_MAX_LCORE];
 struct perf {
 	uint32_t single_read;
 	uint32_t single_write;
@@ -60,12 +61,15 @@ test_hash_readwrite_worker(__attribute__((unused)) void *arg)
 	uint64_t begin, cycles;
 	int ret;
 
-	offset = (lcore_id - rte_get_master_lcore())
-			* tbl_rw_test_param.num_insert;
+	for (i = 0; i < rte_lcore_count(); i++) {
+		if (slave_core_ids[i] == lcore_id)
+			break;
+	}
+	offset = tbl_rw_test_param.num_insert * i;
 
 	printf("Core #%d inserting and reading %d: %'"PRId64" - %'"PRId64"\n",
 	       lcore_id, tbl_rw_test_param.num_insert,
-	       offset, offset + tbl_rw_test_param.num_insert);
+	       offset, offset + tbl_rw_test_param.num_insert - 1);
 
 	begin = rte_rdtsc_precise();
 
@@ -171,6 +175,7 @@ test_hash_readwrite_functional(int use_htm)
 	uint32_t duplicated_keys = 0;
 	uint32_t lost_keys = 0;
 	int use_jhash = 1;
+	int slave_cnt = rte_lcore_count() - 1;
 
 	rte_atomic64_init(&gcycles);
 	rte_atomic64_clear(&gcycles);
@@ -182,17 +187,17 @@ test_hash_readwrite_functional(int use_htm)
 		goto err;
 
 	tbl_rw_test_param.num_insert =
-		TOTAL_INSERT / rte_lcore_count();
+		TOTAL_INSERT / slave_cnt;
 
 	tbl_rw_test_param.rounded_tot_insert =
 		tbl_rw_test_param.num_insert
-		* rte_lcore_count();
+		* slave_cnt;
 
 	printf("++++++++Start function tests:+++++++++\n");
 
 	/* Fire all threads. */
 	rte_eal_mp_remote_launch(test_hash_readwrite_worker,
-				 NULL, CALL_MASTER);
+				 NULL, SKIP_MASTER);
 	rte_eal_mp_wait_lcore();
 
 	while (rte_hash_iterate(tbl_rw_test_param.h, &next_key,
@@ -249,7 +254,7 @@ test_hash_readwrite_functional(int use_htm)
 }
 
 static int
-test_rw_reader(__attribute__((unused)) void *arg)
+test_rw_reader(void *arg)
 {
 	uint64_t i;
 	uint64_t begin, cycles;
@@ -276,7 +281,7 @@ test_rw_reader(__attribute__((unused)) void *arg)
 }
 
 static int
-test_rw_writer(__attribute__((unused)) void *arg)
+test_rw_writer(void *arg)
 {
 	uint64_t i;
 	uint32_t lcore_id = rte_lcore_id();
@@ -285,8 +290,13 @@ test_rw_writer(__attribute__((unused)) void *arg)
 	uint64_t start_coreid = (uint64_t)(uintptr_t)arg;
 	uint64_t offset;
 
-	offset = TOTAL_INSERT / 2 + (lcore_id - start_coreid)
-					* tbl_rw_test_param.num_insert;
+	for (i = 0; i < rte_lcore_count(); i++) {
+		if (slave_core_ids[i] == lcore_id)
+			break;
+	}
+
+	offset = TOTAL_INSERT / 2 + (i - (start_coreid)) *
+				tbl_rw_test_param.num_insert;
 	begin = rte_rdtsc_precise();
 	for (i = offset; i < offset + tbl_rw_test_param.num_insert; i++) {
 		ret = rte_hash_add_key_data(tbl_rw_test_param.h,
@@ -384,8 +394,8 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 	perf_results->single_read = end / i;
 
 	for (n = 0; n < NUM_TEST; n++) {
-		unsigned int tot_lcore = rte_lcore_count();
-		if (tot_lcore < core_cnt[n] * 2 + 1)
+		unsigned int tot_slave_lcore = rte_lcore_count() - 1;
+		if (tot_slave_lcore < core_cnt[n] * 2)
 			goto finish;
 
 		rte_atomic64_clear(&greads);
@@ -415,17 +425,19 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 		 */
 
 		/* Test only reader cases */
-		for (i = 1; i <= core_cnt[n]; i++)
+		for (i = 0; i < core_cnt[n]; i++)
 			rte_eal_remote_launch(test_rw_reader,
-					(void *)(uintptr_t)read_cnt, i);
+					(void *)(uintptr_t)read_cnt,
+					slave_core_ids[i]);
 
 		rte_eal_mp_wait_lcore();
 
 		start_coreid = i;
 		/* Test only writer cases */
-		for (; i <= core_cnt[n] * 2; i++)
+		for (; i < core_cnt[n] * 2; i++)
 			rte_eal_remote_launch(test_rw_writer,
-					(void *)((uintptr_t)start_coreid), i);
+					(void *)((uintptr_t)start_coreid),
+					slave_core_ids[i]);
 
 		rte_eal_mp_wait_lcore();
 
@@ -464,22 +476,26 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 			}
 		}
 
-		start_coreid = core_cnt[n] + 1;
+		start_coreid = core_cnt[n];
 
 		if (reader_faster) {
-			for (i = core_cnt[n] + 1; i <= core_cnt[n] * 2; i++)
+			for (i = core_cnt[n]; i < core_cnt[n] * 2; i++)
 				rte_eal_remote_launch(test_rw_writer,
-					(void *)((uintptr_t)start_coreid), i);
-			for (i = 1; i <= core_cnt[n]; i++)
+					(void *)((uintptr_t)start_coreid),
+					slave_core_ids[i]);
+			for (i = 0; i < core_cnt[n]; i++)
 				rte_eal_remote_launch(test_rw_reader,
-					(void *)(uintptr_t)read_cnt, i);
+					(void *)(uintptr_t)read_cnt,
+					slave_core_ids[i]);
 		} else {
-			for (i = 1; i <= core_cnt[n]; i++)
+			for (i = 0; i < core_cnt[n]; i++)
 				rte_eal_remote_launch(test_rw_reader,
-					(void *)(uintptr_t)read_cnt, i);
-			for (; i <= core_cnt[n] * 2; i++)
+					(void *)(uintptr_t)read_cnt,
+					slave_core_ids[i]);
+			for (; i < core_cnt[n] * 2; i++)
 				rte_eal_remote_launch(test_rw_writer,
-					(void *)((uintptr_t)start_coreid), i);
+					(void *)((uintptr_t)start_coreid),
+					slave_core_ids[i]);
 		}
 
 		rte_eal_mp_wait_lcore();
@@ -562,13 +578,19 @@ test_hash_readwrite_main(void)
 	 * writer threads for performance numbers.
 	 */
 	int use_htm, reader_faster;
+	unsigned int i = 0, core_id = 0;
 
-	if (rte_lcore_count() == 1) {
-		printf("More than one lcore is required "
+	if (rte_lcore_count() <= 2) {
+		printf("More than two lcores are required "
 			"to do read write test\n");
 		return 0;
 	}
 
+	RTE_LCORE_FOREACH_SLAVE(core_id) {
+		slave_core_ids[i] = core_id;
+		i++;
+	}
+
 	setlocale(LC_NUMERIC, "");
 
 	if (rte_tm_supported()) {
@@ -610,8 +632,6 @@ test_hash_readwrite_main(void)
 
 	printf("Results summary:\n");
 
-	int i;
-
 	printf("single read: %u\n", htm_results.single_read);
 	printf("single write: %u\n", htm_results.single_write);
 	for (i = 0; i < NUM_TEST; i++) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 4/5] test/hash: fix missing file in meson build file
  2018-09-28 14:11   ` [PATCH v4 0/5] hash: fix multiple issues Yipeng Wang
                       ` (2 preceding siblings ...)
  2018-09-28 14:11     ` [PATCH v4 3/5] test/hash: fix rw test with non-consecutive cores Yipeng Wang
@ 2018-09-28 14:11     ` Yipeng Wang
  2018-09-28 14:11     ` [PATCH v4 5/5] hash: fix unused define Yipeng Wang
  2018-10-25 22:04     ` [PATCH v4 0/5] hash: fix multiple issues Thomas Monjalon
  5 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-28 14:11 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

test_hash_readwrite.c was not listed in the meson.build file. This
commit adds the missing test to the file.

Fixes: 0eb3726ebcf1 ("test/hash: add test for read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 test/test/meson.build | 1 +
 1 file changed, 1 insertion(+)

diff --git a/test/test/meson.build b/test/test/meson.build
index b1dd6ec..1826bab 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -40,6 +40,7 @@ test_sources = files('commands.c',
 	'test_hash.c',
 	'test_hash_functions.c',
 	'test_hash_multiwriter.c',
+	'test_hash_readwrite.c',
 	'test_hash_perf.c',
 	'test_hash_scaling.c',
 	'test_interrupts.c',
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 5/5] hash: fix unused define
  2018-09-28 14:11   ` [PATCH v4 0/5] hash: fix multiple issues Yipeng Wang
                       ` (3 preceding siblings ...)
  2018-09-28 14:11     ` [PATCH v4 4/5] test/hash: fix missing file in meson build file Yipeng Wang
@ 2018-09-28 14:11     ` Yipeng Wang
  2018-10-25 22:04     ` [PATCH v4 0/5] hash: fix multiple issues Thomas Monjalon
  5 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-28 14:11 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

Since the depth-first search of the cuckoo path has been removed,
the macro that specifies the depth of the cuckoo search is no
longer needed.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index b43f467..fc0e5c2 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -104,8 +104,6 @@ const rte_hash_cmp_eq_t cmp_jump_table[NUM_KEY_CMP_CASES] = {
 
 #define LCORE_CACHE_SIZE		64
 
-#define RTE_HASH_MAX_PUSHES             100
-
 #define RTE_HASH_BFS_QUEUE_MAX_LEN       1000
 
 #define RTE_XABORT_CUCKOO_PATH_INVALIDED 0x4
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 0/4] hash: add extendable bucket and partial key hashing
  2018-09-26 20:26 ` [PATCH v3 0/3] hash: add extendable bucket and partial key hashing Yipeng Wang
                     ` (2 preceding siblings ...)
  2018-09-26 20:26   ` [PATCH v3 3/3] hash: use partial-key hashing Yipeng Wang
@ 2018-09-28 17:23   ` Yipeng Wang
  2018-09-28 17:23     ` [PATCH v4 1/4] hash: fix race condition in iterate Yipeng Wang
                       ` (4 more replies)
  2018-10-01 18:34   ` [PATCH v5 " Yipeng Wang
  2018-10-04 16:35   ` [PATCH v6 " Yipeng Wang
  5 siblings, 5 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-28 17:23 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel

This patch set made two major optimizations over the current rte_hash
library.

First, it adds Extendable Bucket Table feature: a new structure that can
accommodate keys that failed to get inserted into the main hash table due to
the unlikely event of excessive hash collisions. The hash table buckets will
get extended using a linked list to host these keys. This new design will
guarantee insertion of 100% of the keys for a given hash table size with
minimal overhead. A new flag value is added for user to indicate if the
extendable bucket feature should be enabled or not. The linked list buckets is
similar concept to the extendable bucket hash table in packet framework.
In details, for insertion, the linked buckets will be used to store the keys
that fail to get in the primary and the secondary bucket and the cuckoo path
could not find an empty location for the maximum path length (small
probability). For lookup, the key is checked first in the primary, then the
secondary, then if the secondary is extended the linked list is traversed
for a possible match.

Second, the patch set changes the current hashing algorithm to be "partial-key
hashing". Partial-key hashing is the concept from Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
Hashing". Instead of storing both 32-bit signature and alternative signature
in the bucket, we only store a small 16-bit signature and calculate the
alternative bucket index by XORing the signature with the current bucket index.
This doubles the hash table memory efficiency since now one bucket
only occupies one cache line instead of two in the original design.
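
A short sketch of the computation described above (the helper names
follow the spirit of the patch but may not match it exactly; assume
num_buckets is a power of two and bucket_bitmask = num_buckets - 1,
as in rte_hash):

#include <stdint.h>

/* Only the 16-bit short signature is stored per entry. */
static inline uint16_t
get_short_sig(uint32_t hash)
{
	return hash >> 16;
}

static inline uint32_t
get_prim_bucket_index(uint32_t hash, uint32_t bucket_bitmask)
{
	return hash & bucket_bitmask;
}

static inline uint32_t
get_alt_bucket_index(uint32_t cur_bkt_idx, uint16_t sig,
			uint32_t bucket_bitmask)
{
	/* XOR is its own inverse: XORing the alternative index with the
	 * same signature gives back the primary index, so either bucket
	 * can locate the other without storing a second hash value.
	 */
	return (cur_bkt_idx ^ sig) & bucket_bitmask;
}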

v3->v4:
1. hash: Revise commit message to be clearer about "utilization" (Honnappa)
2. hash: in the delete key function, change the bucket return path to use
rte_ring_sp_enqueue instead of rte_ring_mp_enqueue, since it is already
protected by locks.
3. hash: update rte_hash_iterate comments (Honnappa)
4. hash: Add a new commit to fix race condition in the rte_hash_iterate (Honnappa)
5. hash/test: during utilization test, double check rte_hash_cnt returns correct
value (Honnappa)
6. hash: for partial-key-hashing commit, break the get_buckets_index function
into three. It may make future extension easier (Honnappa)
7. hash: change the comment for typedef uint32_t hash_sig_t to be more clear
to users (Honnappa)

v2->v3:
The first four commits were separated from this patch set as another
independent patch set:
https://mails.dpdk.org/archives/dev/2018-September/113118.html
1. hash: move snprintf for ext_ring name under the ext_table condition.
2. hash: fix memory leak by freeing ext_buckets in rte_hash_free.
3. hash: after failing cuckoo path, search not only ext buckets, but also the
secondary bucket first to see if there may be an empty location now.
4. hash: totally rewrote the key deletion logic. If the deleted key was
not in the last bucket of the linked list when the ext table is enabled,
the last entry in the linked list is moved into the slot vacated by the
deleted key. The purpose is to compact the entries in the linked list so
they stay close to the main table. This makes sure that not many
extendable buckets are left with only one or two entries after some time
of running, and it also benefits lookup speed.
5. Other minor coding style/comments improvements.

V1->V2:
1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
2. hash: Reorder the rte_hash struct to align cache line better.
3. test: Minor changes in auto test to add key insertion failure check during
iteration test.
4. test: Add new commit to fix read-write test non-consecutive core issue.
5. hash: Add a new commit to remove unnecessary code introduced by previous
patches.
6. hash: Comment and coding style improvements in multiple places.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>

Yipeng Wang (4):
  hash: fix race condition in iterate
  hash: add extendable bucket feature
  test/hash: implement extendable bucket hash test
  hash: use partial-key hashing

 lib/librte_hash/rte_cuckoo_hash.c | 585 ++++++++++++++++++++++++++++----------
 lib/librte_hash/rte_cuckoo_hash.h |  11 +-
 lib/librte_hash/rte_hash.h        |   8 +-
 test/test/test_hash.c             | 159 ++++++++++-
 test/test/test_hash_perf.c        | 114 ++++++--
 5 files changed, 683 insertions(+), 194 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v4 1/4] hash: fix race condition in iterate
  2018-09-28 17:23   ` [PATCH v4 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
@ 2018-09-28 17:23     ` Yipeng Wang
  2018-10-01 20:23       ` Honnappa Nagarahalli
  2018-09-28 17:23     ` [PATCH v4 2/4] hash: add extendable bucket feature Yipeng Wang
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-09-28 17:23 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel

In rte_hash_iterate, the reader lock did not protect the
while loop which skips empty entries. This created a race
condition: an entry could become empty before the lock was
taken, so a wrong key/data value could be read out.

This commit extends the protected region.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8..eba13e9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1317,16 +1317,19 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 	idx = *next % RTE_HASH_BUCKET_ENTRIES;
 
+	__hash_rw_reader_lock(h);
 	/* If current position is empty, go to the next one */
 	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
-		if (*next == total_entries)
+		if (*next == total_entries) {
+			__hash_rw_reader_unlock(h);
 			return -ENOENT;
+		}
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
-	__hash_rw_reader_lock(h);
+
 	/* Get position of entry in key table */
 	position = h->buckets[bucket_idx].key_idx[idx];
 	next_key = (struct rte_hash_key *) ((char *)h->key_store +
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 2/4] hash: add extendable bucket feature
  2018-09-28 17:23   ` [PATCH v4 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
  2018-09-28 17:23     ` [PATCH v4 1/4] hash: fix race condition in iterate Yipeng Wang
@ 2018-09-28 17:23     ` Yipeng Wang
  2018-10-02  3:58       ` Honnappa Nagarahalli
  2018-10-03 15:08       ` Stephen Hemminger
  2018-09-28 17:23     ` [PATCH v4 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
                       ` (2 subsequent siblings)
  4 siblings, 2 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-09-28 17:23 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel

In use cases where hash table capacity needs to be guaranteed,
the extendable bucket feature can be used to hold extra
keys in linked lists when conflicts happen. This is a similar
concept to the extendable bucket hash table in the packet
framework.

This commit adds the extendable bucket feature. The user can
turn it on or off through the extra flag field at table
creation time.

The extendable bucket table is composed of buckets that can be
linked, as a list, to the main table. When extendable buckets
are enabled, the hash table load can always achieve 100%.
In other words, the table can always accommodate the same
number of keys as the specified table size. This provides a
100% table capacity guarantee.
Although keys ending up in the ext buckets may have longer
lookup times, they should be rare thanks to the cuckoo
algorithm.
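
A simplified sketch of the resulting lookup order (search_one_bucket()
below is a stand-in for the per-bucket signature and key comparison and
is not necessarily spelled like this in the patch; the sketch assumes
the structures and the FOR_EACH_BUCKET macro added by this commit):

static inline int32_t
lookup_with_ext_sketch(const struct rte_hash *h, const void *key,
			hash_sig_t sig, hash_sig_t alt_hash,
			struct rte_hash_bucket *prim_bkt,
			struct rte_hash_bucket *sec_bkt, void **data)
{
	struct rte_hash_bucket *cur_bkt;
	int32_t ret;

	/* Primary bucket first. */
	ret = search_one_bucket(h, key, sig, data, prim_bkt);
	if (ret != -1)
		return ret;

	/* Then the secondary bucket and any ext buckets linked to it. */
	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
		if (ret != -1)
			return ret;
	}
	return -ENOENT;
}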

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 376 ++++++++++++++++++++++++++++++++------
 lib/librte_hash/rte_cuckoo_hash.h |   5 +
 lib/librte_hash/rte_hash.h        |   3 +
 3 files changed, 331 insertions(+), 53 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index eba13e9..02650b9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -31,6 +31,10 @@
 #include "rte_hash.h"
 #include "rte_cuckoo_hash.h"
 
+#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)                            \
+	for (CURRENT_BKT = START_BUCKET;                                      \
+		CURRENT_BKT != NULL;                                          \
+		CURRENT_BKT = CURRENT_BKT->next)
 
 TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
 
@@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
 	return h;
 }
 
+static inline struct rte_hash_bucket *
+rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
+{
+	while (lst_bkt->next != NULL)
+		lst_bkt = lst_bkt->next;
+	return lst_bkt;
+}
+
 void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
 {
 	h->cmp_jump_table_idx = KEY_CUSTOM;
@@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	struct rte_tailq_entry *te = NULL;
 	struct rte_hash_list *hash_list;
 	struct rte_ring *r = NULL;
+	struct rte_ring *r_ext = NULL;
 	char hash_name[RTE_HASH_NAMESIZE];
 	void *k = NULL;
 	void *buckets = NULL;
+	void *buckets_ext = NULL;
 	char ring_name[RTE_RING_NAMESIZE];
+	char ext_ring_name[RTE_RING_NAMESIZE];
 	unsigned num_key_slots;
 	unsigned i;
 	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
+	unsigned int ext_table_support = 0;
 	unsigned int readwrite_concur_support = 0;
 
 	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
@@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		multi_writer_support = 1;
 	}
 
+	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
+		ext_table_support = 1;
+
 	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
 	if (multi_writer_support)
 		/*
@@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err;
 	}
 
+	const uint32_t num_buckets = rte_align32pow2(params->entries) /
+						RTE_HASH_BUCKET_ENTRIES;
+
+	/* Create ring for extendable buckets. */
+	if (ext_table_support) {
+		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
+								params->name);
+		r_ext = rte_ring_create(ext_ring_name,
+				rte_align32pow2(num_buckets + 1),
+				params->socket_id, 0);
+
+		if (r_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+								"failed\n");
+			goto err;
+		}
+	}
+
 	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
 
 	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
@@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err_unlock;
 	}
 
-	const uint32_t num_buckets = rte_align32pow2(params->entries)
-					/ RTE_HASH_BUCKET_ENTRIES;
-
 	buckets = rte_zmalloc_socket(NULL,
 				num_buckets * sizeof(struct rte_hash_bucket),
 				RTE_CACHE_LINE_SIZE, params->socket_id);
 
 	if (buckets == NULL) {
-		RTE_LOG(ERR, HASH, "memory allocation failed\n");
+		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
 		goto err_unlock;
 	}
 
+	/* Allocate same number of extendable buckets */
+	if (ext_table_support) {
+		buckets_ext = rte_zmalloc_socket(NULL,
+				num_buckets * sizeof(struct rte_hash_bucket),
+				RTE_CACHE_LINE_SIZE, params->socket_id);
+		if (buckets_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+							"failed\n");
+			goto err_unlock;
+		}
+		/* Populate ext bkt ring. We reserve 0 similar to the
+		 * key-data slot, just in case in future we want to
+		 * use bucket index for the linked list and 0 means NULL
+		 * for next bucket
+		 */
+		for (i = 1; i <= num_buckets; i++)
+			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+	}
+
 	const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params->key_len;
 	const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;
 
@@ -262,6 +315,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->num_buckets = num_buckets;
 	h->bucket_bitmask = h->num_buckets - 1;
 	h->buckets = buckets;
+	h->buckets_ext = buckets_ext;
+	h->free_ext_bkts = r_ext;
 	h->hash_func = (params->hash_func == NULL) ?
 		default_hash_func : params->hash_func;
 	h->key_store = k;
@@ -269,6 +324,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->hw_trans_mem_support = hw_trans_mem_support;
 	h->multi_writer_support = multi_writer_support;
 	h->readwrite_concur_support = readwrite_concur_support;
+	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
@@ -304,9 +360,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
 err:
 	rte_ring_free(r);
+	rte_ring_free(r_ext);
 	rte_free(te);
 	rte_free(h);
 	rte_free(buckets);
+	rte_free(buckets_ext);
 	rte_free(k);
 	return NULL;
 }
@@ -344,8 +402,10 @@ rte_hash_free(struct rte_hash *h)
 		rte_free(h->readwrite_lock);
 	}
 	rte_ring_free(h->free_slots);
+	rte_ring_free(h->free_ext_bkts);
 	rte_free(h->key_store);
 	rte_free(h->buckets);
+	rte_free(h->buckets_ext);
 	rte_free(h);
 	rte_free(te);
 }
@@ -403,7 +463,6 @@ __hash_rw_writer_lock(const struct rte_hash *h)
 		rte_rwlock_write_lock(h->readwrite_lock);
 }
 
-
 static inline void
 __hash_rw_reader_lock(const struct rte_hash *h)
 {
@@ -448,6 +507,14 @@ rte_hash_reset(struct rte_hash *h)
 	while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
 		rte_pause();
 
+	/* clear free extendable bucket ring and memory */
+	if (h->ext_table_support) {
+		memset(h->buckets_ext, 0, h->num_buckets *
+						sizeof(struct rte_hash_bucket));
+		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
+			rte_pause();
+	}
+
 	/* Repopulate the free slots ring. Entry zero is reserved for key misses */
 	if (h->multi_writer_support)
 		tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) *
@@ -458,6 +525,13 @@ rte_hash_reset(struct rte_hash *h)
 	for (i = 1; i < tot_ring_cnt + 1; i++)
 		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
 
+	/* Repopulate the free ext bkt ring. */
+	if (h->ext_table_support) {
+		for (i = 1; i < h->num_buckets + 1; i++)
+			rte_ring_sp_enqueue(h->free_ext_bkts,
+						(void *)((uintptr_t) i));
+	}
+
 	if (h->multi_writer_support) {
 		/* Reset local caches per lcore */
 		for (i = 0; i < RTE_MAX_LCORE; i++)
@@ -524,24 +598,27 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		int32_t *ret_val)
 {
 	unsigned int i;
-	struct rte_hash_bucket *cur_bkt = prim_bkt;
+	struct rte_hash_bucket *cur_bkt;
 	int32_t ret;
 
 	__hash_rw_writer_lock(h);
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	/* Insert new entry if there is room in the primary
@@ -580,7 +657,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
-	struct rte_hash_bucket *cur_bkt = bkt;
+	struct rte_hash_bucket *cur_bkt;
 	struct queue_node *prev_node, *curr_node = leaf;
 	struct rte_hash_bucket *prev_bkt, *curr_bkt = leaf->bkt;
 	uint32_t prev_slot, curr_slot = leaf_slot;
@@ -597,18 +674,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
 
-	ret = search_and_update(h, data, key, alt_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	while (likely(curr_node->prev != NULL)) {
@@ -711,15 +790,18 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	hash_sig_t alt_hash;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
-	struct rte_hash_bucket *prim_bkt, *sec_bkt;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
 	void *slot_id = NULL;
-	uint32_t new_idx;
+	void *ext_bkt_id = NULL;
+	uint32_t new_idx, bkt_id;
 	int ret;
 	unsigned n_slots;
 	unsigned lcore_id;
+	unsigned int i;
 	struct lcore_cache *cached_free_slots = NULL;
 	int32_t ret_val;
+	struct rte_hash_bucket *last;
 
 	prim_bucket_idx = sig & h->bucket_bitmask;
 	prim_bkt = &h->buckets[prim_bucket_idx];
@@ -739,10 +821,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Check if key is already inserted in secondary location */
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_writer_unlock(h);
 
@@ -808,10 +892,70 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
-	} else {
+	}
+
+	/* if ext table not enabled, we failed the insertion */
+	if (!h->ext_table_support) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret;
 	}
+
+	/* Now we need to go through the extendable buckets. The writer
+	 * lock protects all extendable bucket operations.
+	 */
+	__hash_rw_writer_lock(h);
+	/* Check for duplicates again; the key could have been inserted before the lock */
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	if (ret != -1) {
+		enqueue_slot_back(h, cached_free_slots, slot_id);
+		goto failure;
+	}
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			enqueue_slot_back(h, cached_free_slots, slot_id);
+			goto failure;
+		}
+	}
+
+	/* Search sec and ext buckets to find an empty entry to insert. */
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+			/* Check if slot is available */
+			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
+				cur_bkt->sig_current[i] = alt_hash;
+				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->key_idx[i] = new_idx;
+				__hash_rw_writer_unlock(h);
+				return new_idx - 1;
+			}
+		}
+	}
+
+	/* Failed to get an empty entry from the extendable buckets. Link a
+	 * new extendable bucket: first get a free bucket from the ring.
+	 */
+	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+		ret = -ENOSPC;
+		goto failure;
+	}
+
+	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
+	/* Use the first location of the new bucket */
+	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
+	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
+	/* Link the new bucket to sec bucket linked list */
+	last = rte_hash_get_last_bkt(sec_bkt);
+	last->next = &h->buckets_ext[bkt_id];
+	__hash_rw_writer_unlock(h);
+	return new_idx - 1;
+
+failure:
+	__hash_rw_writer_unlock(h);
+	return ret;
+
 }
 
 int32_t
@@ -890,7 +1034,7 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
+	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
 
 	bucket_idx = sig & h->bucket_bitmask;
@@ -910,10 +1054,12 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 	bkt = &h->buckets[bucket_idx];
 
 	/* Check if key is in secondary location */
-	ret = search_one_bucket(h, key, alt_hash, data, bkt);
-	if (ret != -1) {
-		__hash_rw_reader_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, bkt) {
+		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		if (ret != -1) {
+			__hash_rw_reader_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_reader_unlock(h);
 	return -ENOENT;
@@ -978,16 +1124,42 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	}
 }
 
+/* Compact the linked list by moving the key from the last entry in the
+ * linked list into the newly emptied slot.
+ */
+static inline void
+__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
+	int i;
+	struct rte_hash_bucket *last_bkt;
+
+	if (!cur_bkt->next)
+		return;
+
+	last_bkt = rte_hash_get_last_bkt(cur_bkt);
+
+	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
+		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
+			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
+			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
+			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
+			last_bkt->sig_current[i] = NULL_SIGNATURE;
+			last_bkt->sig_alt[i] = NULL_SIGNATURE;
+			last_bkt->key_idx[i] = EMPTY_SLOT;
+			return;
+		}
+	}
+}
+
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig)
+			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
 	int32_t ret;
 
-	/* Check if key is in primary location */
+	/* Check if key is in bucket */
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 		if (bkt->sig_current[i] == sig &&
 				bkt->key_idx[i] != EMPTY_SLOT) {
@@ -996,12 +1168,12 @@ search_and_remove(const struct rte_hash *h, const void *key,
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
 				remove_entry(h, bkt, i);
 
-				/*
-				 * Return index where key is stored,
+				/* Return index where key is stored,
 				 * subtracting the first dummy index
 				 */
 				ret = bkt->key_idx[i] - 1;
 				bkt->key_idx[i] = EMPTY_SLOT;
+				*pos = i;
 				return ret;
 			}
 		}
@@ -1015,34 +1187,66 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
-	int32_t ret;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
+	struct rte_hash_bucket *cur_bkt;
+	int pos;
+	int32_t ret, i;
 
 	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	prim_bkt = &h->buckets[bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, bkt, sig);
+	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
 	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+		__rte_hash_compact_ll(prim_bkt, pos);
+		last_bkt = prim_bkt->next;
+		prev_bkt = prim_bkt;
+		goto return_bkt;
 	}
 
 	/* Calculate secondary hash */
 	alt_hash = rte_hash_secondary_hash(sig);
 	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[bucket_idx];
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
+		if (ret != -1) {
+			__rte_hash_compact_ll(cur_bkt, pos);
+			last_bkt = sec_bkt->next;
+			prev_bkt = sec_bkt;
+			goto return_bkt;
+		}
+	}
 
-	/* look for key in secondary bucket */
-	ret = search_and_remove(h, key, bkt, alt_hash);
-	if (ret != -1) {
+	__hash_rw_writer_unlock(h);
+	return -ENOENT;
+
+/* Search the last bucket to see if it is empty and can be recycled */
+return_bkt:
+	if (!last_bkt) {
 		__hash_rw_writer_unlock(h);
 		return ret;
 	}
+	while (last_bkt->next) {
+		prev_bkt = last_bkt;
+		last_bkt = last_bkt->next;
+	}
+
+	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+		if (last_bkt->key_idx[i] != EMPTY_SLOT)
+			break;
+	}
+	/* last bucket is empty, recycle it */
+	if (i == RTE_HASH_BUCKET_ENTRIES) {
+		prev_bkt->next = last_bkt->next = NULL;
+		uint32_t index = last_bkt - h->buckets_ext + 1;
+		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+	}
 
 	__hash_rw_writer_unlock(h);
-	return -ENOENT;
+	return ret;
 }
 
 int32_t
@@ -1143,12 +1347,14 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 {
 	uint64_t hits = 0;
 	int32_t i;
+	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
+	struct rte_hash_bucket *cur_bkt, *next_bkt;
 
 	/* Prefetch first keys */
 	for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++)
@@ -1266,6 +1472,34 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		continue;
 	}
 
+	/* all found, do not need to go through ext bkt */
+	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
+		if (hit_mask != NULL)
+			*hit_mask = hits;
+		__hash_rw_reader_unlock(h);
+		return;
+	}
+
+	/* need to check ext buckets for match */
+	for (i = 0; i < num_keys; i++) {
+		if ((hits & (1ULL << i)) != 0)
+			continue;
+		next_bkt = secondary_bkt[i]->next;
+		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+			if (data != NULL)
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], &data[i], cur_bkt);
+			else
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], NULL, cur_bkt);
+			if (ret != -1) {
+				positions[i] = ret;
+				hits |= 1ULL << i;
+				break;
+			}
+		}
+	}
+
 	__hash_rw_reader_unlock(h);
 
 	if (hit_mask != NULL)
@@ -1308,10 +1542,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 
 	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
 
-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
-	/* Out of bounds */
-	if (*next >= total_entries)
-		return -ENOENT;
+	const uint32_t total_entries_main = h->num_buckets *
+							RTE_HASH_BUCKET_ENTRIES;
+	const uint32_t total_entries = total_entries_main << 1;
+
+	/* Out of bounds of all buckets (both main table and ext table) */
+	if (*next >= total_entries_main)
+		goto extend_table;
 
 	/* Calculate bucket and index of current iterator */
 	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
@@ -1322,14 +1559,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
-		if (*next == total_entries) {
+		if (*next == total_entries_main) {
 			__hash_rw_reader_unlock(h);
-			return -ENOENT;
+			goto extend_table;
 		}
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
-
 	/* Get position of entry in key table */
 	position = h->buckets[bucket_idx].key_idx[idx];
 	next_key = (struct rte_hash_key *) ((char *)h->key_store +
@@ -1344,4 +1580,38 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	(*next)++;
 
 	return position - 1;
+
+/* Begin to iterate extendable buckets */
+extend_table:
+	/* Out of total bounds or if the ext bucket feature is not enabled */
+	if (*next >= total_entries || !h->ext_table_support)
+		return -ENOENT;
+
+	bucket_idx = (*next - total_entries_main) / RTE_HASH_BUCKET_ENTRIES;
+	idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+
+	__hash_rw_reader_lock(h);
+	while (h->buckets_ext[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+		(*next)++;
+		if (*next == total_entries) {
+			__hash_rw_reader_unlock(h);
+			return -ENOENT;
+		}
+		bucket_idx = (*next - total_entries_main) /
+						RTE_HASH_BUCKET_ENTRIES;
+		idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+	}
+	/* Get position of entry in key table */
+	position = h->buckets_ext[bucket_idx].key_idx[idx];
+	next_key = (struct rte_hash_key *) ((char *)h->key_store +
+				position * h->key_entry_size);
+	/* Return key and data */
+	*key = next_key->key;
+	*data = next_key->pdata;
+
+	__hash_rw_reader_unlock(h);
+
+	/* Increment iterator */
+	(*next)++;
+	return position - 1;
 }
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fc0e5c2..e601520 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -142,6 +142,8 @@ struct rte_hash_bucket {
 	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
 
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
+
+	void *next;
 } __rte_cache_aligned;
 
 /** A hash table structure. */
@@ -166,6 +168,7 @@ struct rte_hash {
 	/**< If multi-writer support is enabled. */
 	uint8_t readwrite_concur_support;
 	/**< If read-write concurrency support is enabled */
+	uint8_t ext_table_support;     /**< Enable extendable bucket table */
 	rte_hash_function hash_func;    /**< Function used to calculate hash. */
 	uint32_t hash_func_init_val;    /**< Init value used by hash_func. */
 	rte_hash_cmp_eq_t rte_hash_custom_cmp_eq;
@@ -184,6 +187,8 @@ struct rte_hash {
 	 * to the key table.
 	 */
 	rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
+	struct rte_hash_bucket *buckets_ext; /**< Extra buckets array */
+	struct rte_ring *free_ext_bkts; /**< Ring of indexes of free buckets */
 } __rte_cache_aligned;
 
 struct queue_node {
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d931..11d8e28 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -37,6 +37,9 @@ extern "C" {
 /** Flag to support reader writer concurrency */
 #define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
 
+/** Flag to indicate the extendable bucket table feature should be used */
+#define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
+
 /** Signature of key that is stored internally. */
 typedef uint32_t hash_sig_t;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 3/4] test/hash: implement extendable bucket hash test
  2018-09-28 17:23   ` [PATCH v4 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
  2018-09-28 17:23     ` [PATCH v4 1/4] hash: fix race condition in iterate Yipeng Wang
  2018-09-28 17:23     ` [PATCH v4 2/4] hash: add extendable bucket feature Yipeng Wang
@ 2018-09-28 17:23     ` Yipeng Wang
  2018-10-01 19:53       ` Honnappa Nagarahalli
  2018-09-28 17:23     ` [PATCH v4 4/4] hash: use partial-key hashing Yipeng Wang
  2018-10-03 19:05     ` [PATCH v4 0/4] hash: add extendable bucket and partial key hashing Dharmik Thakkar
  4 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-09-28 17:23 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel

This commit changes the current rte_hash unit test to exercise
the extendable bucket table feature and measure its performance.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
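For context, a minimal sketch (not part of this patch) of how an
application could opt in to the extendable bucket table via the new
flag; the table name, key length and size below are illustrative only:

#include <rte_hash.h>
#include <rte_jhash.h>
#include <rte_lcore.h>

static struct rte_hash *
create_ext_table(void)
{
	struct rte_hash_parameters p = {
		.name = "example_ext_tbl",     /* illustrative name */
		.entries = 1 << 16,            /* illustrative size */
		.key_len = 16,                 /* illustrative key length */
		.hash_func = rte_jhash,
		.hash_func_init_val = 0,
		.socket_id = rte_socket_id(),
		/* opt in to the extendable bucket table */
		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
	};

	/* With the flag set, keys that would otherwise fail to insert
	 * due to excessive collisions are placed in linked extendable
	 * buckets, so up to 'entries' keys are expected to fit.
	 */
	return rte_hash_create(&p);
}
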
 test/test/test_hash.c      | 159 +++++++++++++++++++++++++++++++++++++++++++--
 test/test/test_hash_perf.c | 114 +++++++++++++++++++++++---------
 2 files changed, 238 insertions(+), 35 deletions(-)

diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd..815c734 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -660,6 +660,116 @@ static int test_full_bucket(void)
 	return 0;
 }
 
+/*
+ * Similar to the test above (full bucket test), but for extendable buckets.
+ */
+static int test_extendable_bucket(void)
+{
+	struct rte_hash_parameters params_pseudo_hash = {
+		.name = "test5",
+		.entries = 64,
+		.key_len = sizeof(struct flow_key), /* 13 */
+		.hash_func = pseudo_hash,
+		.hash_func_init_val = 0,
+		.socket_id = 0,
+		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE
+	};
+	struct rte_hash *handle;
+	int pos[64];
+	int expected_pos[64];
+	unsigned int i;
+	struct flow_key rand_keys[64];
+
+	for (i = 0; i < 64; i++) {
+		rand_keys[i].port_dst = i;
+		rand_keys[i].port_src = i+1;
+	}
+
+	handle = rte_hash_create(&params_pseudo_hash);
+	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
+
+	/* Fill bucket */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add - update */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Delete 1 key, check other keys are still found */
+	pos[35] = rte_hash_del_key(handle, &rand_keys[35]);
+	print_key_info("Del", &rand_keys[35], pos[35]);
+	RETURN_IF_ERROR(pos[35] != expected_pos[35],
+			"failed to delete key (pos[1]=%d)", pos[35]);
+	pos[20] = rte_hash_lookup(handle, &rand_keys[20]);
+	print_key_info("Lkp", &rand_keys[20], pos[20]);
+	RETURN_IF_ERROR(pos[20] != expected_pos[20],
+			"failed lookup after deleting key from same bucket "
+			"(pos[20]=%d)", pos[20]);
+
+	/* Go back to previous state */
+	pos[35] = rte_hash_add_key(handle, &rand_keys[35]);
+	print_key_info("Add", &rand_keys[35], pos[35]);
+	expected_pos[35] = pos[35];
+	RETURN_IF_ERROR(pos[35] < 0, "failed to add key (pos[35]=%d)", pos[35]);
+
+	/* Delete */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_del_key(handle, &rand_keys[i]);
+		print_key_info("Del", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to delete key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != -ENOENT,
+			"fail: found non-existent key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add again */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	rte_hash_free(handle);
+
+	/* Cover the NULL case. */
+	rte_hash_free(0);
+	return 0;
+}
+
 /******************************************************************************/
 static int
 fbk_hash_unit_test(void)
@@ -1096,7 +1206,7 @@ test_hash_creation_with_good_parameters(void)
  * Test to see the average table utilization (entries added/max entries)
  * before hitting a random entry that cannot be added
  */
-static int test_average_table_utilization(void)
+static int test_average_table_utilization(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	uint8_t simple_key[MAX_KEYSIZE];
@@ -1107,12 +1217,23 @@ static int test_average_table_utilization(void)
 
 	printf("\n# Running test to determine average utilization"
 	       "\n  before adding elements begins to fail\n");
+	if (ext_table)
+		printf("ext table is enabled\n");
+	else
+		printf("ext table is disabled\n");
+
 	printf("Measuring performance, please wait");
 	fflush(stdout);
 	ut_params.entries = 1 << 16;
 	ut_params.name = "test_average_utilization";
 	ut_params.hash_func = rte_jhash;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
+
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
 	for (j = 0; j < ITERATIONS; j++) {
@@ -1139,6 +1260,14 @@ static int test_average_table_utilization(void)
 			rte_hash_free(handle);
 			return -1;
 		}
+		if (ext_table) {
+			if (cnt != ut_params.entries) {
+				printf("rte_hash_count returned wrong value "
+					"%u, %u, %u\n", j, added_keys, cnt);
+				rte_hash_free(handle);
+				return -1;
+			}
+		}
 
 		average_keys_added += added_keys;
 
@@ -1161,7 +1290,7 @@ static int test_average_table_utilization(void)
 }
 
 #define NUM_ENTRIES 256
-static int test_hash_iteration(void)
+static int test_hash_iteration(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	unsigned i;
@@ -1177,6 +1306,11 @@ static int test_hash_iteration(void)
 	ut_params.name = "test_hash_iteration";
 	ut_params.hash_func = rte_jhash;
 	ut_params.key_len = 16;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
@@ -1186,8 +1320,13 @@ static int test_hash_iteration(void)
 		for (i = 0; i < ut_params.key_len; i++)
 			keys[added_keys][i] = rte_rand() % 255;
 		ret = rte_hash_add_key_data(handle, keys[added_keys], data[added_keys]);
-		if (ret < 0)
+		if (ret < 0) {
+			if (ext_table) {
+				printf("Insertion failed for ext table\n");
+				goto err;
+			}
 			break;
+		}
 	}
 
 	/* Iterate through the hash table */
@@ -1474,6 +1613,8 @@ test_hash(void)
 		return -1;
 	if (test_full_bucket() < 0)
 		return -1;
+	if (test_extendable_bucket() < 0)
+		return -1;
 
 	if (test_fbk_hash_find_existing() < 0)
 		return -1;
@@ -1483,9 +1624,17 @@ test_hash(void)
 		return -1;
 	if (test_hash_creation_with_good_parameters() < 0)
 		return -1;
-	if (test_average_table_utilization() < 0)
+
+	/* ext table disabled */
+	if (test_average_table_utilization(0) < 0)
+		return -1;
+	if (test_hash_iteration(0) < 0)
+		return -1;
+
+	/* ext table enabled */
+	if (test_average_table_utilization(1) < 0)
 		return -1;
-	if (test_hash_iteration() < 0)
+	if (test_hash_iteration(1) < 0)
 		return -1;
 
 	run_hash_func_tests();
diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 0d39e10..5252111 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -18,7 +18,8 @@
 #include "test.h"
 
 #define MAX_ENTRIES (1 << 19)
-#define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
+#define KEYS_TO_ADD (MAX_ENTRIES)
+#define ADD_PERCENT 0.75 /* 75% table utilization */
 #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
 /* BUCKET_SIZE should be same as RTE_HASH_BUCKET_ENTRIES in rte_hash library */
 #define BUCKET_SIZE 8
@@ -78,7 +79,7 @@ static struct rte_hash_parameters ut_params = {
 
 static int
 create_table(unsigned int with_data, unsigned int table_index,
-		unsigned int with_locks)
+		unsigned int with_locks, unsigned int ext)
 {
 	char name[RTE_HASH_NAMESIZE];
 
@@ -96,6 +97,9 @@ create_table(unsigned int with_data, unsigned int table_index,
 	else
 		ut_params.extra_flag = 0;
 
+	if (ext)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	ut_params.name = name;
 	ut_params.key_len = hashtest_key_lens[table_index];
 	ut_params.socket_id = rte_socket_id();
@@ -117,15 +121,21 @@ create_table(unsigned int with_data, unsigned int table_index,
 
 /* Shuffle the keys that have been added, so lookups will be totally random */
 static void
-shuffle_input_keys(unsigned table_index)
+shuffle_input_keys(unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	uint32_t swap_idx;
 	uint8_t temp_key[MAX_KEYSIZE];
 	hash_sig_t temp_signature;
 	int32_t temp_position;
+	unsigned int keys_to_add;
+
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = KEYS_TO_ADD - 1; i > 0; i--) {
+	for (i = keys_to_add - 1; i > 0; i--) {
 		swap_idx = rte_rand() % i;
 
 		memcpy(temp_key, keys[i], hashtest_key_lens[table_index]);
@@ -147,14 +157,20 @@ shuffle_input_keys(unsigned table_index)
  * ALL can fit in hash table (no errors)
  */
 static int
-get_input_keys(unsigned with_pushes, unsigned table_index)
+get_input_keys(unsigned int with_pushes, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j;
 	unsigned bucket_idx, incr, success = 1;
 	uint8_t k = 0;
 	int32_t ret;
 	const uint32_t bucket_bitmask = NUM_BUCKETS - 1;
+	unsigned int keys_to_add;
 
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 	/* Reset all arrays */
 	for (i = 0; i < MAX_ENTRIES; i++)
 		slot_taken[i] = 0;
@@ -171,7 +187,7 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 	 * Regardless a key has been added correctly or not (success),
 	 * the next one to try will be increased by 1.
 	 */
-	for (i = 0; i < KEYS_TO_ADD;) {
+	for (i = 0; i < keys_to_add;) {
 		incr = 0;
 		if (i != 0) {
 			keys[i][0] = ++k;
@@ -235,14 +251,20 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 }
 
 static int
-timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_adds(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *data;
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		data = (void *) ((uintptr_t) signatures[i]);
 		if (with_hash && with_data) {
 			ret = rte_hash_add_key_with_hash_data(h[table_index],
@@ -284,22 +306,31 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][ADD][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][ADD][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
 
 static int
-timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_lookups(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i, j;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *ret_data;
 	void *expected_data;
 	int32_t ret;
-
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD; j++) {
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
+	for (i = 0; i < num_lookups / keys_to_add; i++) {
+		for (j = 0; j < keys_to_add; j++) {
 			if (with_hash && with_data) {
 				ret = rte_hash_lookup_with_hash_data(h[table_index],
 							(const void *) keys[j],
@@ -352,13 +383,14 @@ timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_lookups_multi(unsigned with_data, unsigned table_index)
+timed_lookups_multi(unsigned int with_data, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j, k;
 	int32_t positions_burst[BURST_SIZE];
@@ -367,11 +399,20 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	void *ret_data[BURST_SIZE];
 	uint64_t hit_mask;
 	int ret;
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
 
 	const uint64_t start_tsc = rte_rdtsc();
 
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD/BURST_SIZE; j++) {
+	for (i = 0; i < num_lookups/keys_to_add; i++) {
+		for (j = 0; j < keys_to_add/BURST_SIZE; j++) {
 			for (k = 0; k < BURST_SIZE; k++)
 				keys_burst[k] = keys[j * BURST_SIZE + k];
 			if (with_data) {
@@ -419,19 +460,25 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_deletes(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		/* There are no delete functions with data, so just call two functions */
 		if (with_hash)
 			ret = rte_hash_del_key_with_hash(h[table_index],
@@ -451,7 +498,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][DELETE][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][DELETE][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
@@ -469,7 +516,8 @@ reset_table(unsigned table_index)
 }
 
 static int
-run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
+run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks,
+						unsigned int ext)
 {
 	unsigned i, j, with_data, with_hash;
 
@@ -478,25 +526,25 @@ run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
 
 	for (with_data = 0; with_data <= 1; with_data++) {
 		for (i = 0; i < NUM_KEYSIZES; i++) {
-			if (create_table(with_data, i, with_locks) < 0)
+			if (create_table(with_data, i, with_locks, ext) < 0)
 				return -1;
 
-			if (get_input_keys(with_pushes, i) < 0)
+			if (get_input_keys(with_pushes, i, ext) < 0)
 				return -1;
 			for (with_hash = 0; with_hash <= 1; with_hash++) {
-				if (timed_adds(with_hash, with_data, i) < 0)
+				if (timed_adds(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				for (j = 0; j < NUM_SHUFFLES; j++)
-					shuffle_input_keys(i);
+					shuffle_input_keys(i, ext);
 
-				if (timed_lookups(with_hash, with_data, i) < 0)
+				if (timed_lookups(with_hash, with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_lookups_multi(with_data, i) < 0)
+				if (timed_lookups_multi(with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_deletes(with_hash, with_data, i) < 0)
+				if (timed_deletes(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				/* Print a dot to show progress on operations */
@@ -632,10 +680,16 @@ test_hash_perf(void)
 				printf("\nALL ELEMENTS IN PRIMARY LOCATION\n");
 			else
 				printf("\nELEMENTS IN PRIMARY OR SECONDARY LOCATION\n");
-			if (run_all_tbl_perf_tests(with_pushes, with_locks) < 0)
+			if (run_all_tbl_perf_tests(with_pushes, with_locks, 0) < 0)
 				return -1;
 		}
 	}
+
+	printf("\n EXTENDABLE BUCKETS PERFORMANCE\n");
+
+	if (run_all_tbl_perf_tests(1, 0, 1) < 0)
+		return -1;
+
 	if (fbk_hash_perf_test() < 0)
 		return -1;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v4 4/4] hash: use partial-key hashing
  2018-09-28 17:23   ` [PATCH v4 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
                       ` (2 preceding siblings ...)
  2018-09-28 17:23     ` [PATCH v4 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
@ 2018-09-28 17:23     ` Yipeng Wang
  2018-10-01 20:09       ` Honnappa Nagarahalli
  2018-10-03 19:05     ` [PATCH v4 0/4] hash: add extendable bucket and partial key hashing Dharmik Thakkar
  4 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-09-28 17:23 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel

This commit changes the hashing mechanism to "partial-key
hashing" to calculate the bucket index and the key signature.

This is proposed in Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". Basically, the idea is to use XOR to
derive the alternative bucket from the current bucket index
and the signature.

With "partial-key hashing", the bucket memory requirement is
reduced from two cache lines to one cache line, which improves
memory efficiency and thus lookup speed.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
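For context, a minimal sketch (not part of this patch) of the index
arithmetic described above and implemented by the helpers added below;
it assumes the bucket count is a power of two (so the mask is
num_buckets - 1) and uses 0xABCD1234 purely as an example hash value:

#include <stdint.h>

static inline uint32_t
alt_index(uint32_t cur_idx, uint16_t sig, uint32_t bucket_bitmask)
{
	/* same relation as get_alt_bucket_index() in the patch */
	return (cur_idx ^ sig) & bucket_bitmask;
}

/* Worked example with 256 buckets (bucket_bitmask = 0xFF) and an
 * example hash of 0xABCD1234:
 *   sig  = hash >> 16                    = 0xABCD
 *   prim = hash & bucket_bitmask         = 0x34
 *   sec  = (prim ^ sig) & bucket_bitmask = 0xF9
 * Since XOR is its own inverse, (sec ^ sig) & bucket_bitmask == prim,
 * so the 16-bit signature stored with an entry is enough to recover
 * its other bucket, and the separate 32-bit sig_alt field can go away.
 */
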
 lib/librte_hash/rte_cuckoo_hash.c | 246 +++++++++++++++++++-------------------
 lib/librte_hash/rte_cuckoo_hash.h |   6 +-
 lib/librte_hash/rte_hash.h        |   5 +-
 3 files changed, 131 insertions(+), 126 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 02650b9..e101708 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -90,6 +90,36 @@ rte_hash_cmp_eq(const void *key1, const void *key2, const struct rte_hash *h)
 		return cmp_jump_table[h->cmp_jump_table_idx](key1, key2, h->key_len);
 }
 
+/*
+ * We use the higher 16 bits of the hash as the signature value stored
+ * in the table. We use the lower bits for the primary bucket
+ * location. Then we XOR the primary bucket location and the signature
+ * to get the secondary bucket location. This is the same as
+ * proposed in Bin Fan, et al.'s paper
+ * "MemC3: Compact and Concurrent MemCache with Dumber Caching and
+ * Smarter Hashing". The benefit of using
+ * XOR is that one can derive the alternative bucket location
+ * from only the current bucket location and the signature.
+ */
+static inline uint16_t
+get_short_sig(const hash_sig_t hash)
+{
+	return hash >> 16;
+}
+
+static inline uint32_t
+get_prim_bucket_index(const struct rte_hash *h, const hash_sig_t hash)
+{
+	return hash & h->bucket_bitmask;
+}
+
+static inline uint32_t
+get_alt_bucket_index(const struct rte_hash *h,
+			uint32_t cur_bkt_idx, uint16_t sig)
+{
+	return (cur_bkt_idx ^ sig) & h->bucket_bitmask;
+}
+
 struct rte_hash *
 rte_hash_create(const struct rte_hash_parameters *params)
 {
@@ -327,9 +357,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
-		h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
-	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
 		h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
 	else
 #endif
@@ -417,18 +445,6 @@ rte_hash_hash(const struct rte_hash *h, const void *key)
 	return h->hash_func(key, h->key_len, h->hash_func_init_val);
 }
 
-/* Calc the secondary hash value from the primary hash value of a given key */
-static inline hash_sig_t
-rte_hash_secondary_hash(const hash_sig_t primary_hash)
-{
-	static const unsigned all_bits_shift = 12;
-	static const unsigned alt_bits_xor = 0x5bd1e995;
-
-	uint32_t tag = primary_hash >> all_bits_shift;
-
-	return primary_hash ^ ((tag + 1) * alt_bits_xor);
-}
-
 int32_t
 rte_hash_count(const struct rte_hash *h)
 {
@@ -560,14 +576,13 @@ enqueue_slot_back(const struct rte_hash *h,
 /* Search a key from bucket and update its data */
 static inline int32_t
 search_and_update(const struct rte_hash *h, void *data, const void *key,
-	struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
+	struct rte_hash_bucket *bkt, uint16_t sig)
 {
 	int i;
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		if (bkt->sig_current[i] == sig &&
-				bkt->sig_alt[i] == alt_hash) {
+		if (bkt->sig_current[i] == sig) {
 			k = (struct rte_hash_key *) ((char *)keys +
 					bkt->key_idx[i] * h->key_entry_size);
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
@@ -594,7 +609,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		struct rte_hash_bucket *prim_bkt,
 		struct rte_hash_bucket *sec_bkt,
 		const struct rte_hash_key *key, void *data,
-		hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+		uint16_t sig, uint32_t new_idx,
 		int32_t *ret_val)
 {
 	unsigned int i;
@@ -605,7 +620,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -613,7 +628,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -628,7 +643,6 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		/* Check if slot is available */
 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
 			prim_bkt->sig_current[i] = sig;
-			prim_bkt->sig_alt[i] = alt_hash;
 			prim_bkt->key_idx[i] = new_idx;
 			break;
 		}
@@ -653,7 +667,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *alt_bkt,
 			const struct rte_hash_key *key, void *data,
 			struct queue_node *leaf, uint32_t leaf_slot,
-			hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+			uint16_t sig, uint32_t new_idx,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
@@ -674,7 +688,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -682,7 +696,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -695,8 +709,9 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		prev_bkt = prev_node->bkt;
 		prev_slot = curr_node->prev_slot;
 
-		prev_alt_bkt_idx =
-			prev_bkt->sig_alt[prev_slot] & h->bucket_bitmask;
+		prev_alt_bkt_idx = get_alt_bucket_index(h,
+					prev_node->cur_bkt_idx,
+					prev_bkt->sig_current[prev_slot]);
 
 		if (unlikely(&h->buckets[prev_alt_bkt_idx]
 				!= curr_bkt)) {
@@ -710,10 +725,8 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		 * Cuckoo insert to move elements back to its
 		 * primary bucket if available
 		 */
-		curr_bkt->sig_alt[curr_slot] =
-			 prev_bkt->sig_current[prev_slot];
 		curr_bkt->sig_current[curr_slot] =
-			prev_bkt->sig_alt[prev_slot];
+			prev_bkt->sig_current[prev_slot];
 		curr_bkt->key_idx[curr_slot] =
 			prev_bkt->key_idx[prev_slot];
 
@@ -723,7 +736,6 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	curr_bkt->sig_current[curr_slot] = sig;
-	curr_bkt->sig_alt[curr_slot] = alt_hash;
 	curr_bkt->key_idx[curr_slot] = new_idx;
 
 	__hash_rw_writer_unlock(h);
@@ -741,39 +753,44 @@ rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *bkt,
 			struct rte_hash_bucket *sec_bkt,
 			const struct rte_hash_key *key, void *data,
-			hash_sig_t sig, hash_sig_t alt_hash,
+			uint16_t sig, uint32_t bucket_idx,
 			uint32_t new_idx, int32_t *ret_val)
 {
 	unsigned int i;
 	struct queue_node queue[RTE_HASH_BFS_QUEUE_MAX_LEN];
 	struct queue_node *tail, *head;
 	struct rte_hash_bucket *curr_bkt, *alt_bkt;
+	uint32_t cur_idx, alt_idx;
 
 	tail = queue;
 	head = queue + 1;
 	tail->bkt = bkt;
 	tail->prev = NULL;
 	tail->prev_slot = -1;
+	tail->cur_bkt_idx = bucket_idx;
 
 	/* Cuckoo bfs Search */
 	while (likely(tail != head && head <
 					queue + RTE_HASH_BFS_QUEUE_MAX_LEN -
 					RTE_HASH_BUCKET_ENTRIES)) {
 		curr_bkt = tail->bkt;
+		cur_idx = tail->cur_bkt_idx;
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			if (curr_bkt->key_idx[i] == EMPTY_SLOT) {
 				int32_t ret = rte_hash_cuckoo_move_insert_mw(h,
 						bkt, sec_bkt, key, data,
-						tail, i, sig, alt_hash,
+						tail, i, sig,
 						new_idx, ret_val);
 				if (likely(ret != -1))
 					return ret;
 			}
 
 			/* Enqueue new node and keep prev node info */
-			alt_bkt = &(h->buckets[curr_bkt->sig_alt[i]
-						    & h->bucket_bitmask]);
+			alt_idx = get_alt_bucket_index(h, cur_idx,
+						curr_bkt->sig_current[i]);
+			alt_bkt = &(h->buckets[alt_idx]);
 			head->bkt = alt_bkt;
+			head->cur_bkt_idx = alt_idx;
 			head->prev = tail;
 			head->prev_slot = i;
 			head++;
@@ -788,7 +805,7 @@ static inline int32_t
 __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig, void *data)
 {
-	hash_sig_t alt_hash;
+	uint16_t short_sig;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
@@ -803,18 +820,17 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	int32_t ret_val;
 	struct rte_hash_bucket *last;
 
-	prim_bucket_idx = sig & h->bucket_bitmask;
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
 	prim_bkt = &h->buckets[prim_bucket_idx];
-	rte_prefetch0(prim_bkt);
-
-	alt_hash = rte_hash_secondary_hash(sig);
-	sec_bucket_idx = alt_hash & h->bucket_bitmask;
 	sec_bkt = &h->buckets[sec_bucket_idx];
+	rte_prefetch0(prim_bkt);
 	rte_prefetch0(sec_bkt);
 
 	/* Check if key is already inserted in primary location */
 	__hash_rw_writer_lock(h);
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
@@ -822,12 +838,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Check if key is already inserted in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			return ret;
 		}
 	}
+
 	__hash_rw_writer_unlock(h);
 
 	/* Did not find a match, so get a new slot for storing the new key */
@@ -865,7 +882,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+					short_sig, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -875,7 +892,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Primary bucket full, need to make space for new entry */
 	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+				short_sig, prim_bucket_idx, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -885,7 +902,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Also search secondary bucket to get better occupancy */
 	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
-					alt_hash, sig, new_idx, &ret_val);
+				short_sig, sec_bucket_idx, new_idx, &ret_val);
 
 	if (ret == 0)
 		return new_idx - 1;
@@ -905,14 +922,14 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	 */
 	__hash_rw_writer_lock(h);
 	/* Check for duplicates again; the key could have been inserted before the lock */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		goto failure;
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			enqueue_slot_back(h, cached_free_slots, slot_id);
 			goto failure;
@@ -924,8 +941,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
-				cur_bkt->sig_current[i] = alt_hash;
-				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->sig_current[i] = short_sig;
 				cur_bkt->key_idx[i] = new_idx;
 				__hash_rw_writer_unlock(h);
 				return new_idx - 1;
@@ -943,8 +959,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
-	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
-	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
 	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
@@ -1003,7 +1018,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
 
 /* Search one bucket to find the match key */
 static inline int32_t
-search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
+search_one_bucket(const struct rte_hash *h, const void *key, uint16_t sig,
 			void **data, const struct rte_hash_bucket *bkt)
 {
 	int i;
@@ -1032,30 +1047,30 @@ static inline int32_t
 __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 					hash_sig_t sig, void **data)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
+	bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_reader_lock(h);
 
 	/* Check if key is in primary location */
-	ret = search_one_bucket(h, key, sig, data, bkt);
+	ret = search_one_bucket(h, key, short_sig, data, bkt);
 	if (ret != -1) {
 		__hash_rw_reader_unlock(h);
 		return ret;
 	}
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	bkt = &h->buckets[sec_bucket_idx];
 
 	/* Check if key is in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, bkt) {
-		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
 		if (ret != -1) {
 			__hash_rw_reader_unlock(h);
 			return ret;
@@ -1102,7 +1117,6 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	struct lcore_cache *cached_free_slots;
 
 	bkt->sig_current[i] = NULL_SIGNATURE;
-	bkt->sig_alt[i] = NULL_SIGNATURE;
 	if (h->multi_writer_support) {
 		lcore_id = rte_lcore_id();
 		cached_free_slots = &h->local_free_slots[lcore_id];
@@ -1141,9 +1155,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
 			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
-			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
 			last_bkt->sig_current[i] = NULL_SIGNATURE;
-			last_bkt->sig_alt[i] = NULL_SIGNATURE;
 			last_bkt->key_idx[i] = EMPTY_SLOT;
 			return;
 		}
@@ -1153,7 +1165,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
+			struct rte_hash_bucket *bkt, uint16_t sig, int *pos)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
@@ -1185,19 +1197,21 @@ static inline int32_t
 __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
 	struct rte_hash_bucket *cur_bkt;
 	int pos;
 	int32_t ret, i;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	prim_bkt = &h->buckets[bucket_idx];
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
+	prim_bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
+	ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
 	if (ret != -1) {
 		__rte_hash_compact_ll(prim_bkt, pos);
 		last_bkt = prim_bkt->next;
@@ -1206,12 +1220,10 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	sec_bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[sec_bucket_idx];
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
+		ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
 		if (ret != -1) {
 			__rte_hash_compact_ll(cur_bkt, pos);
 			last_bkt = sec_bkt->next;
@@ -1288,55 +1300,35 @@ static inline void
 compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 			const struct rte_hash_bucket *prim_bkt,
 			const struct rte_hash_bucket *sec_bkt,
-			hash_sig_t prim_hash, hash_sig_t sec_hash,
+			uint16_t sig,
 			enum rte_hash_sig_compare_function sig_cmp_fn)
 {
 	unsigned int i;
 
+	/* In the match mask, the first bit of every pair of bits indicates a match */
 	switch (sig_cmp_fn) {
-#ifdef RTE_MACHINE_CPUFLAG_AVX2
-	case RTE_HASH_COMPARE_AVX2:
-		*prim_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)prim_bkt->sig_current),
-				_mm256_set1_epi32(prim_hash)));
-		*sec_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)sec_bkt->sig_current),
-				_mm256_set1_epi32(sec_hash)));
-		break;
-#endif
 #ifdef RTE_MACHINE_CPUFLAG_SSE2
 	case RTE_HASH_COMPARE_SSE:
-		/* Compare the first 4 signatures in the bucket */
-		*prim_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+		/* Compare all signatures in the bucket */
+		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)prim_bkt->sig_current),
-				_mm_set1_epi32(prim_hash)));
-		*prim_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&prim_bkt->sig_current[4]),
-				_mm_set1_epi32(prim_hash)))) << 4;
-		/* Compare the first 4 signatures in the bucket */
-		*sec_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+				_mm_set1_epi16(sig)));
+		/* Compare all signatures in the bucket */
+		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)sec_bkt->sig_current),
-				_mm_set1_epi32(sec_hash)));
-		*sec_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&sec_bkt->sig_current[4]),
-				_mm_set1_epi32(sec_hash)))) << 4;
+				_mm_set1_epi16(sig)));
 		break;
 #endif
 	default:
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			*prim_hash_matches |=
-				((prim_hash == prim_bkt->sig_current[i]) << i);
+				((sig == prim_bkt->sig_current[i]) << (i << 1));
 			*sec_hash_matches |=
-				((sec_hash == sec_bkt->sig_current[i]) << i);
+				((sig == sec_bkt->sig_current[i]) << (i << 1));
 		}
 	}
-
 }
 
 #define PREFETCH_OFFSET 4
@@ -1349,7 +1341,9 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	int32_t i;
 	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
-	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
@@ -1368,10 +1362,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		rte_prefetch0(keys[i + PREFETCH_OFFSET]);
 
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		sig[i] = get_short_sig(prim_hash[i]);
+		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
+		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1380,10 +1377,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	/* Calculate and prefetch rest of the buckets */
 	for (; i < num_keys; i++) {
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		sig[i] = get_short_sig(prim_hash[i]);
+		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
+		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1394,10 +1394,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
 				primary_bkt[i], secondary_bkt[i],
-				prim_hash[i], sec_hash[i], h->sig_cmp_fn);
+				sig[i], h->sig_cmp_fn);
 
 		if (prim_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 			uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1408,7 +1409,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		}
 
 		if (sec_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 			uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1422,7 +1424,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		positions[i] = -ENOENT;
 		while (prim_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 
 			uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1441,11 +1444,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			prim_hitmask[i] &= ~(1 << (hit_index));
+			prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 		while (sec_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 
 			uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1465,7 +1469,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			sec_hitmask[i] &= ~(1 << (hit_index));
+			sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 next_key:
@@ -1488,10 +1492,10 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
 			if (data != NULL)
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], &data[i], cur_bkt);
+						sig[i], &data[i], cur_bkt);
 			else
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], NULL, cur_bkt);
+						sig[i], NULL, cur_bkt);
 			if (ret != -1) {
 				positions[i] = ret;
 				hits |= 1ULL << i;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index e601520..7753cd8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -129,18 +129,15 @@ struct rte_hash_key {
 enum rte_hash_sig_compare_function {
 	RTE_HASH_COMPARE_SCALAR = 0,
 	RTE_HASH_COMPARE_SSE,
-	RTE_HASH_COMPARE_AVX2,
 	RTE_HASH_COMPARE_NUM
 };
 
 /** Bucket structure */
 struct rte_hash_bucket {
-	hash_sig_t sig_current[RTE_HASH_BUCKET_ENTRIES];
+	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
 
 	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
 
-	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
-
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
 
 	void *next;
@@ -193,6 +190,7 @@ struct rte_hash {
 
 struct queue_node {
 	struct rte_hash_bucket *bkt; /* Current bucket on the bfs search */
+	uint32_t cur_bkt_idx;
 
 	struct queue_node *prev;     /* Parent(bucket) in search path */
 	int prev_slot;               /* Parent(slot) in search path */
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 11d8e28..6ace64e 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -40,7 +40,10 @@ extern "C" {
 /** Flag to indicate the extendable bucket table feature should be used */
 #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
 
-/** Signature of key that is stored internally. */
+/**
+ * The type of hash value of a key.
+ * It should be a value of at least 32bit with fully random pattern.
+ */
 typedef uint32_t hash_sig_t;
 
 /** Type of function that can be used for calculating the hash value. */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 5/7] hash: add extendable bucket feature
  2018-09-27 19:21         ` Honnappa Nagarahalli
@ 2018-09-28 17:35           ` Wang, Yipeng1
  2018-09-29 21:09             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-09-28 17:35 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce, Ananyev, Konstantin
  Cc: dev, Gobriel, Sameh

> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Thursday, September 27, 2018 12:22 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; dev@dpdk.org;
> michel@digirati.com.br
> Subject: RE: [PATCH v2 5/7] hash: add extendable bucket feature
> 
> > > > +/* Allocate same number of extendable buckets */
> > > IMO, we are allocating too much memory to support this feature.
> > > Especially,
> > when we claim that keys ending up in the extendable table is a rare
> > occurrence. By doubling the memory we are effectively saying that the
> > main table might have 50% utilization. It will also significantly
> > increase the cycles required to iterate the complete hash table (in
> > rte_hash_iterate API) even when we expect that the extendable table
> contains very few entries.
> > >
> > > I am wondering if we can provide options to control the amount of
> > > extra
> > memory that gets allocated and make the memory allocation dynamic (or
> > on demand basis). I think this also goes well with the general
> > direction DPDK is taking - allocate resources as needed rather than
> > allocating all the resources during initialization.
> > >
> >
> > Given that adding new entries should not normally be a fast-path
> > function, how about allowing memory allocation in add itself. Why not
> > initialize with a fairly small number of extra bucket entries, and
> > then each time they are all used, double the number of entries. That
> > will give efficient resource scaling, I think.
> >
> +1
> 'small number of extra bucket entries' == 5% of total capacity requested
> (assuming cuckoo hash will provide 95% efficiency)
> 
> > /Bruce
 [Wang, Yipeng] 
Thanks for the comments.
We allocate the same number of extendable buckets as the table size at creation
because the purpose is to provide a capacity guarantee even in the worst-case
scenario (all keys colliding into the same buckets).
Applications (e.g. Telco workloads) that require a 100% capacity guarantee can
then be sure that insertion always succeeds up to the specified table size.
With dynamic memory allocation or fewer buckets, this guarantee is broken (even
if only rarely), because the dynamic memory allocation could fail.

Users who do not need such behavior can disable this feature.
Given that the cuckoo algorithm already ensures
very high utilization, they usually do not need the extendable buckets.
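
For illustration, here is a minimal sketch of how an application would opt in
or out of the feature at creation time. The parameter values and names below
are made up for the example; only the RTE_HASH_EXTRA_FLAGS_EXT_TABLE bit comes
from this patch set.

#include <rte_hash.h>
#include <rte_jhash.h>

static struct rte_hash *
create_example_table(int need_capacity_guarantee)
{
	struct rte_hash_parameters p = {
		.name = "example_tbl",
		.entries = 1 << 16,     /* capacity the application wants */
		.key_len = 16,
		.hash_func = rte_jhash,
		.socket_id = 0,
		/* Opt in to extendable buckets for the 100% capacity
		 * guarantee; leave the flag out for plain cuckoo behavior.
		 */
		.extra_flag = need_capacity_guarantee ?
				RTE_HASH_EXTRA_FLAGS_EXT_TABLE : 0,
	};
	return rte_hash_create(&p);
}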

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 1/7] test/hash: fix bucket size in hash perf test
  2018-09-27  4:23     ` Honnappa Nagarahalli
@ 2018-09-29  0:31       ` Wang, Yipeng1
  0 siblings, 0 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-09-29  0:31 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce; +Cc: dev, michel, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> several times */ -#define BUCKET_SIZE 4
>> +#define BUCKET_SIZE 8
>May be we should add a comment to warn that it should be same as ' RTE_HASH_BUCKET_ENTRIES'?
>
[Wang, Yipeng] Done in V4, Thanks for the comment!

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 0/7] hash: add extendable bucket and partial key hashing
  2018-09-27  4:23   ` Honnappa Nagarahalli
@ 2018-09-29  0:46     ` Wang, Yipeng1
  0 siblings, 0 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-09-29  0:46 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce; +Cc: dev, michel, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> Second, the patch set changes the current hashing algorithm to be "partial-
>> key hashing". Partial-key hashing is the concept from Bin Fan, et al.'s paper
>> "MemC3: Compact and Concurrent MemCache with Dumber Caching and
>> Smarter Hashing".
>I read this paper (but not the papers in references). My understanding is that the existing algorithm already uses 'partial-key hashing'.
>This patch set is not adding the 'partial-key hashing' feature. Instead it is reducing the size of the signature ('tag' as referred in the
>paper) from 32 to 16b.
>Please let me know if I have not understood this correct.
[Wang, Yipeng] Currently two signature values are stored in the hash table: sig_current and sig_alt.
They are used to derive the indexes of the two alternative buckets.
Partial-key hashing avoids storing two values and stores only one: the alternative
bucket index is derived by XORing this single signature with the current bucket index.
So this commit not only reduces the signature size from 32-bit to 16-bit, it also
reduces the number of signatures stored from two to one.
As a result, one bucket fits in a single 64-byte cache line instead of two; see the sketch below.
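
A small standalone sketch (mine, just to illustrate the property; the constants
are arbitrary) of why a single stored signature is enough to recover either
bucket index:

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint32_t hash = 0xABCD1234;        /* example 32-bit hash of a key */
	uint32_t bucket_bitmask = 0x3FF;   /* 1024 buckets */
	uint16_t sig = hash >> 16;         /* 16-bit signature kept in the bucket */
	uint32_t prim = hash & bucket_bitmask;
	uint32_t alt = (prim ^ sig) & bucket_bitmask;

	/* Either index can be derived from the other plus the signature,
	 * so sig_alt no longer needs to be stored.
	 */
	assert(((alt ^ sig) & bucket_bitmask) == prim);
	assert(((prim ^ sig) & bucket_bitmask) == alt);
	return 0;
}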

>> Instead of storing both 32-bit signature and alternative
>> signature in the bucket, we only store a small 16-bit signature and calculate
>> the alternative bucket index by XORing the signature with the current bucket
>> index.
>According to the referenced paper, the signature ('tag') reduces the number of accesses to the keys, thus improving the performance.
>But, if we reduce the size of the signature from 32b to 16b, it will result in higher probability of false matches on the signature. This in
>turn will increase the number of accesses to keys. Have you run any performance benchmarks and compared the numbers with the
>existing code? Is it possible to share the numbers?
>
[Wang, Yipeng]
From our tests it is very unlikely that two different keys map to the same bucket and also have the same 16-bit signature.
Even after reducing the signature size from 32-bit to 16-bit, false matches should be very rare assuming a good hash function:
an unrelated key that lands in the same bucket matches a given 16-bit signature with a probability of only about 1/65536.
The performance numbers vary depending on the test case. Since the speedup comes from the 2X memory efficiency,
a hash table that is large (e.g. exceeding the last-level cache) will see a much higher speedup. For the existing unit test, I generally see about a 10% speedup.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 6/7] test/hash: implement extendable bucket hash test
  2018-09-27  4:24     ` Honnappa Nagarahalli
@ 2018-09-29  0:50       ` Wang, Yipeng1
  0 siblings, 0 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-09-29  0:50 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce; +Cc: dev, michel, Gobriel, Sameh



>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Wednesday, September 26, 2018 9:24 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>
>Cc: dev@dpdk.org; michel@digirati.com.br
>Subject: RE: [PATCH v2 6/7] test/hash: implement extendable bucket hash test
>
>>  RETURN_IF_ERROR(handle == NULL, "hash creation failed");
>>
>>  for (j = 0; j < ITERATIONS; j++) {
>My understanding is that when extendable table feature is enabled, we will add entries to the full capacity. Hence the
>rte_hash_count and rte_hash_reset should get tested in this test case.
>
[Wang, Yipeng] Currently both functions are already called in the loop, right?
For V4, I've added another condition to double-check that the count == param->entries, roughly as sketched below.
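
Roughly what that extra check could look like (my sketch, not the exact V4
code; "handle" and "params" stand for the handle and parameters used in the
surrounding test):

	/* With the ext table enabled, insertion of all configured entries
	 * is guaranteed, so the count must match the requested table size.
	 */
	if (rte_hash_count(handle) != (int32_t)params->entries) {
		printf("rte_hash_count mismatch after inserting all keys\n");
		goto err;
	}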

>> @@ -1186,8 +1312,13 @@ static int test_hash_iteration(void)
>>  for (i = 0; i < ut_params.key_len; i++)
>>  keys[added_keys][i] = rte_rand() % 255;
>>  ret = rte_hash_add_key_data(handle, keys[added_keys],
>> data[added_keys]);
>> -if (ret < 0)
>> +if (ret < 0) {
>> +if (ext_table) {
>> +printf("Insertion failed for ext table\n");
>> +goto err;
>> +}
>>  break;
>> +}
>>  }
>>
>I suggest we add a call to rte_hash_count() to verify that configured maximum number of entries are added, will be a good corner test
>for rte_hash_count as well.
>
[Wang, Yipeng] Please check if the newly added logic in V4 addresses your concern. 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 7/7] hash: use partial-key hashing
  2018-09-27  4:24     ` Honnappa Nagarahalli
@ 2018-09-29  0:55       ` Wang, Yipeng1
  0 siblings, 0 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-09-29  0:55 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce; +Cc: dev, michel, Gobriel, Sameh

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Wednesday, September 26, 2018 9:24 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>
>Cc: dev@dpdk.org; michel@digirati.com.br
>Subject: RE: [PATCH v2 7/7] hash: use partial-key hashing
>
>
>> +static inline void
>> +get_buckets_index(const struct rte_hash *h, const hash_sig_t hash,
>> +uint32_t *prim_bkt, uint32_t *sec_bkt, uint16_t *sig) {
>> +/*
>> + * We use higher 16 bits of hash as the signature value stored in table.
>> + * We use the lower bits for the primary bucket
>> + * location. Then we XOR primary bucket location and the signature
>> + * to get the secondary bucket location. This is same as
>> + * proposed in Bin Fan, et al's paper
>> + * "MemC3: Compact and Concurrent MemCache with Dumber
>> Caching and
>> + * Smarter Hashing". The benefit to use
>> + * XOR is that one could derive the alternative bucket location
>> + * by only using the current bucket location and the signature.
>> + */
>> +*sig = hash >> 16;
>> +
>> +*prim_bkt = hash & h->bucket_bitmask;
>> +*sec_bkt =  (*prim_bkt ^ *sig) & h->bucket_bitmask; }
>> +
>IMO, this function can be split into 2 - one for primary bucket index and another for secondary bucket index. The secondary bucket
>index calculation function can be used in functions ' rte_hash_cuckoo_move_insert_mw' and ' rte_hash_cuckoo_make_space_mw'.
>
[Wang, Yipeng] I agree that breaking it down and using function calls instead of explicit code will make future
extensions easier, e.g. changing the algorithm.
I split the function into three in V4; please check it out. A sketch of the split is below.
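
Roughly, the split could look like the following (a sketch based on the quoted
code above; the helper names match the ones used later in the series, but the
exact V4 bodies may differ):

static inline uint16_t
get_short_sig(const hash_sig_t hash)
{
	/* Upper 16 bits of the hash become the signature stored in the table */
	return hash >> 16;
}

static inline uint32_t
get_prim_bucket_index(const struct rte_hash *h, const hash_sig_t hash)
{
	/* Lower bits of the hash select the primary bucket */
	return hash & h->bucket_bitmask;
}

static inline uint32_t
get_alt_bucket_index(const struct rte_hash *h, uint32_t cur_bkt_idx,
			uint16_t sig)
{
	/* XOR lets either bucket derive the other from the signature alone */
	return (cur_bkt_idx ^ sig) & h->bucket_bitmask;
}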

>> -/** Signature of key that is stored internally. */
>> +/**
>> + * A hash value that is used to generate signature stored in table and
>> +the
>> + * location the signature is stored.
>> + */
>This is an external file. This documentation goes into the API guide. IMO, we should change the comment to help the user. How about
>changing this to 'hash value of the key'?
>
[Wang, Yipeng] Improved in V4. Please check! Thanks!

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 5/7] hash: add extendable bucket feature
  2018-09-27  4:23     ` Honnappa Nagarahalli
  2018-09-27 11:15       ` Bruce Richardson
@ 2018-09-29  1:10       ` Wang, Yipeng1
  2018-10-01 20:56         ` Honnappa Nagarahalli
  1 sibling, 1 reply; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-09-29  1:10 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce; +Cc: dev, michel, Gobriel, Sameh

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>>
>> Extendable bucket table composes of buckets that can be linked list to current
>> main table. When extendable bucket is enabled, the table utilization can
>> always acheive 100%.
>IMO, referring to this as 'table utilization' indicates an efficiency about memory utilization. Please consider changing this to indicate
>that all of the configured number of entries will be accommodated?
>
[Wang, Yipeng] Improved in V4, please check! Thanks!

>> +snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
>> +params-
>> >name);
>Can be inside the if statement below.
[Wang, Yipeng] Done in V3, Thanks!
>
>> +/* Populate ext bkt ring. We reserve 0 similar to the
>> + * key-data slot, just in case in future we want to
>> + * use bucket index for the linked list and 0 means NULL
>> + * for next bucket
>> + */
>> +for (i = 1; i <= num_buckets; i++)
>Since, the bucket index 0 is reserved, should be 'i < num_buckets'
[Wang, Yipeng]  The bucket array runs from 0 to num_buckets - 1, while the ring
index runs from 1 to num_buckets. Reserving 0 means reserving the index value 0,
not reducing the usable bucket count.
So I believe we still need to enqueue the indexes 1 to num_buckets into the free
bucket ring for use.
>
>>  rte_free(h->key_store);
>>  rte_free(h->buckets);
>Add rte_free(h->buckets_ext);
[Wang, Yipeng] Done in V3, thanks!
>
>> +for (i = 1; i < h->num_buckets + 1; i++)
>Index 0 is reserved as per the comments. Condition should be 'i < h->num_buckets'.
[Wang, Yipeng] Similar to the previous one: we still need all num_buckets indexes
to be inserted into the ring.
>
>> +bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
>If index 0 is reserved, -1 is not required.
>
[Wang, Yipeng] Similar to the previous one: bkt_id is the subscript into the array, so it ranges from 0 to num_buckets - 1,
while the ring index ranges from 1 to num_buckets. So every time we get a bucket index from the ring we need to subtract 1
to get the bucket array subscript; see the sketch below.
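
A small sketch of the mapping being described (illustrative only; it mirrors
the patch logic, with "h" the hash table and "num_buckets" its bucket count):

	void *ext_bkt_id;
	uint32_t i, bkt_id;
	struct rte_hash_bucket *bkt;

	/* The ring holds the values 1..num_buckets; 0 is reserved so that a
	 * stored index of 0 can later mean "no next bucket", just like the
	 * key-data slots.
	 */
	for (i = 1; i <= num_buckets; i++)
		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)i);

	/* A dequeued ring value is converted back to a 0-based array
	 * subscript, hence the "- 1"; the number of usable buckets is not
	 * reduced.
	 */
	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) == 0) {
		bkt_id = (uint32_t)(uintptr_t)ext_bkt_id - 1;
		bkt = &h->buckets_ext[bkt_id];
	}
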
>> +if (tobe_removed_bkt) {
>> +uint32_t index = tobe_removed_bkt - h->buckets_ext + 1;
>No need to increase the index by 1 if entry 0 is reserved.
>
[Wang, Yipeng] Similar to the previous one.
>> @@ -1308,10 +1519,13 @@ rte_hash_iterate(const struct rte_hash *h, const
>> void **key, void **data, uint32
>>
>>  RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
>>
>> -const uint32_t total_entries = h->num_buckets *
>> RTE_HASH_BUCKET_ENTRIES;
>> +const uint32_t total_entries_main = h->num_buckets *
>> +
>> RTE_HASH_BUCKET_ENTRIES;
>> +const uint32_t total_entries = total_entries_main << 1;
>> +
>>  /* Out of bounds */
>Minor: update the comment to reflect the new code.
[Wang, Yipeng] Done in V4, thanks!
>
>> @@ -1341,4 +1555,32 @@ rte_hash_iterate(const struct rte_hash *h, const
>> void **key, void **data, uint32
>>  (*next)++;
>>
>>  return position - 1;
>> +
>> +extend_table:
>> +/* Out of bounds */
>> +if (*next >= total_entries || !h->ext_table_support)
>> +return -ENOENT;
>> +
>> +bucket_idx = (*next - total_entries_main) /
>> RTE_HASH_BUCKET_ENTRIES;
>> +idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
>> +
>> +while (h->buckets_ext[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> +(*next)++;
>> +if (*next == total_entries)
>> +return -ENOENT;
>> +bucket_idx = (*next - total_entries_main) /
>> +RTE_HASH_BUCKET_ENTRIES;
>> +idx = (*next - total_entries_main) %
>> RTE_HASH_BUCKET_ENTRIES;
>> +}
>> +/* Get position of entry in key table */
>> +position = h->buckets_ext[bucket_idx].key_idx[idx];
>There is a possibility that 'position' is not the same value read in the while loop. It presents a problem if 'position' becomes
>EMPTY_SLOT. 'position' should be read as part of the while loop. Since it is 32b value, it should be atomic on most platforms. This issue
>applies to existing code as well.
>
[Wang, Yipeng] I agree. I added a new bug-fix commit for this in V4. Basically I just extended the current critical
region to cover the while loop. Please check if that works. Thanks.

>__hash_rw_reader_lock(h) required
>> +next_key = (struct rte_hash_key *) ((char *)h->key_store +
>> +position * h->key_entry_size);
>> +/* Return key and data */
>> +*key = next_key->key;
>> +*data = next_key->pdata;
>> +
>__hash_rw_reader_unlock(h) required
[Wang, Yipeng] Agree, done in V4.  Thanks!
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 5/7] hash: add extendable bucket feature
  2018-09-28 17:35           ` Wang, Yipeng1
@ 2018-09-29 21:09             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-29 21:09 UTC (permalink / raw)
  To: Wang, Yipeng1, Richardson, Bruce, Ananyev, Konstantin
  Cc: dev, Gobriel, Sameh, nd


> >
> > > > > +/* Allocate same number of extendable buckets */
> > > > IMO, we are allocating too much memory to support this feature.
> > > > Especially,
> > > when we claim that keys ending up in the extendable table is a rare
> > > occurrence. By doubling the memory we are effectively saying that
> > > the main table might have 50% utilization. It will also
> > > significantly increase the cycles required to iterate the complete
> > > hash table (in rte_hash_iterate API) even when we expect that the
> > > extendable table
> > contains very few entries.
> > > >
> > > > I am wondering if we can provide options to control the amount of
> > > > extra
> > > memory that gets allocated and make the memory allocation dynamic
> > > (or on demand basis). I think this also goes well with the general
> > > direction DPDK is taking - allocate resources as needed rather than
> > > allocating all the resources during initialization.
> > > >
> > >
> > > Given that adding new entries should not normally be a fast-path
> > > function, how about allowing memory allocation in add itself. Why
> > > not initialize with a fairly small number of extra bucket entries,
> > > and then each time they are all used, double the number of entries.
> > > That will give efficient resource scaling, I think.
> > >
> > +1
> > 'small number of extra bucket entries' == 5% of total capacity
> > requested (assuming cuckoo hash will provide 95% efficiency)
> >
> > > /Bruce
>  [Wang, Yipeng]
> Thanks for the comments.
> We allocate same as table size for extendable buckets at creation because the
> purpose is to provide capacity guarantee even for the worst scenario (all keys
> collide in same buckets).
> Applications (e.g. Telco workloads) that require 100% capacity guarantee will
> be sure that insertion always succeeds below the specified table size.
> With any dynamic memory allocation or less buckets, this guarantee is broken
> (even if it is very rare). The dynamic memory allocation could fail.
> 
> Users who do not need such behavior can disable this feature.
> Given that the cuckoo algorithm already ensures very high utilization, they
> usually do not need the extendable buckets.
Adding dynamic memory allocation would make the code complicated. It is also possible, with this feature disabled, to create the table with more than the required number of entries. I suggest we document the reason for doubling the memory. If someone sees a concrete requirement for dynamic allocation in the future, the code can be changed.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v5 0/4] hash: add extendable bucket and partial key hashing
  2018-09-26 20:26 ` [PATCH v3 0/3] hash: add extendable bucket and partial key hashing Yipeng Wang
                     ` (3 preceding siblings ...)
  2018-09-28 17:23   ` [PATCH v4 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
@ 2018-10-01 18:34   ` Yipeng Wang
  2018-10-01 18:34     ` [PATCH v5 1/4] hash: fix race condition in iterate Yipeng Wang
                       ` (4 more replies)
  2018-10-04 16:35   ` [PATCH v6 " Yipeng Wang
  5 siblings, 5 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-01 18:34 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel

This patch set makes two major optimizations to the current rte_hash
library.

First, it adds Extendable Bucket Table feature: a new structure that can
accommodate keys that failed to get inserted into the main hash table due to
the unlikely event of excessive hash collisions. The hash table buckets will
get extended using a linked list to host these keys. This new design will
guarantee insertion of 100% of the keys for a given hash table size with
minimal overhead. A new flag value is added for user to indicate if the
extendable bucket feature should be enabled or not. The linked list buckets is
similar concept to the extendable bucket hash table in packet framework.
In details, for insertion, the linked buckets will be used to store the keys
that fail to get in the primary and the secondary bucket and the cuckoo path
could not find an empty location for the maximum path length (small
probability). For lookup, the key is checked first in the primary, then the
secondary, then if the secondary is extended the linked list is traversed
for a possible match.

Second, the patch set changes the current hashing algorithm to be "partial-key
hashing". Partial-key hashing is the concept from Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
Hashing". Instead of storing both 32-bit signature and alternative signature
in the bucket, we only store a small 16-bit signature and calculate the
alternative bucket index by XORing the signature with the current bucket index.
This doubles the hash table memory efficiency since now one bucket
only occupies one cache line instead of two in the original design.

v4->v5:
1. hash: for the first commit, move back the lock and read "position" in the
while condition as Honnappa suggested.
2. hash: minor coding style change (Honnappa) and commit message typo fix.
3. Add Reviewed-by from Honnappa.

v3->v4:
1. hash: Revise commit message to be more clear for "utilization" (Honnappa)
2. hash: in the delete key function, change the recycling of the emptied bucket
to use rte_ring_sp_enqueue instead of rte_ring_mp_enqueue, since it is already
protected inside locks.
3. hash: update rte_hash_iterate comments (Honnappa)
4. hash: Add a new commit to fix race condition in the rte_hash_iterate (Honnappa)
5. hash/test: during utilization test, double check rte_hash_cnt returns correct
value (Honnappa)
6. hash: for partial-key-hashing commit, break the get_buckets_index function
into three. It may make future extension easier (Honnappa)
7. hash: change the comment for typedef uint32_t hash_sig_t to be more clear
to users (Honnappa)

v2->v3:
The first four commits were separated from this patch set as another
independent patch set:
https://mails.dpdk.org/archives/dev/2018-September/113118.html
1. hash: move snprintf for ext_ring name under the ext_table condition.
2. hash: fix memory leak by freeing ext_buckets in rte_hash_free.
3. hash: after the cuckoo path fails, search not only the ext buckets but also
the secondary bucket first, to see whether an empty location has become available.
4. hash: totally rewrote the key deletion logic. If the deleted key was not in
the last bucket of the linked list when the ext table is enabled, the last entry
in the linked list is moved into the slot vacated by the deleted key. The purpose
is to compact the entries in the linked list closer to the main table. This makes
sure that not many extendable buckets are wasted holding only one or two entries
after some time of running, and it also benefits lookup speed.
5. Other minor coding style/comments improvements.

V1->V2:
1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
2. hash: Reorder the rte_hash struct to align cache line better.
3. test: Minor changes in auto test to add key insertion failure check during
iteration test.
4. test: Add new commit to fix read-write test non-consecutive core issue.
5. hash: Add a new commit to remove unnecessary code introduced by previous
patches.
6. hash: Comments improvement and coding style improvements over multiple
places.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>

Yipeng Wang (4):
  hash: fix race condition in iterate
  hash: add extendable bucket feature
  test/hash: implement extendable bucket hash test
  hash: use partial-key hashing

 lib/librte_hash/rte_cuckoo_hash.c | 580 ++++++++++++++++++++++++++++----------
 lib/librte_hash/rte_cuckoo_hash.h |  11 +-
 lib/librte_hash/rte_hash.h        |   8 +-
 test/test/test_hash.c             | 159 ++++++++++-
 test/test/test_hash_perf.c        | 114 ++++++--
 5 files changed, 677 insertions(+), 195 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v5 1/4] hash: fix race condition in iterate
  2018-10-01 18:34   ` [PATCH v5 " Yipeng Wang
@ 2018-10-01 18:34     ` Yipeng Wang
  2018-10-02 17:26       ` Honnappa Nagarahalli
  2018-10-01 18:35     ` [PATCH v5 2/4] hash: add extendable bucket feature Yipeng Wang
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-10-01 18:34 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel

In rte_hash_iterate, the reader lock did not protect the
while loop which checks for an empty entry. This created a
race condition: the entry may become empty after the check but
before the lock is taken, and a wrong key/data value would then
be read out.

This commit reads out the position in the while condition,
which makes sure that the position value used under the lock
is the one that was checked to be non-empty.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8..da8ddf4 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1318,7 +1318,7 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	idx = *next % RTE_HASH_BUCKET_ENTRIES;
 
 	/* If current position is empty, go to the next one */
-	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+	while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
 		if (*next == total_entries)
@@ -1326,9 +1326,8 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
+
 	__hash_rw_reader_lock(h);
-	/* Get position of entry in key table */
-	position = h->buckets[bucket_idx].key_idx[idx];
 	next_key = (struct rte_hash_key *) ((char *)h->key_store +
 				position * h->key_entry_size);
 	/* Return key and data */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v5 2/4] hash: add extendable bucket feature
  2018-10-01 18:34   ` [PATCH v5 " Yipeng Wang
  2018-10-01 18:34     ` [PATCH v5 1/4] hash: fix race condition in iterate Yipeng Wang
@ 2018-10-01 18:35     ` Yipeng Wang
  2018-10-01 18:35     ` [PATCH v5 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-01 18:35 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel

In use cases where hash table capacity needs to be guaranteed,
the extendable bucket feature can be used to hold extra
keys in linked lists when conflicts happen. This is a similar
concept to the extendable bucket hash table in the packet
framework.

This commit adds the extendable bucket feature. The user can
turn it on or off through the extra flag field at table
creation time.

The extendable bucket table is composed of buckets that can be
linked as a list onto the main table. When the extendable
bucket feature is enabled, the hash table load can always
achieve 100%. In other words, the table can always accommodate
the same number of keys as the specified table size; this
provides a 100% table capacity guarantee.
Although keys ending up in the ext buckets may have a longer
lookup time, they should be rare thanks to the cuckoo
algorithm.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 369 ++++++++++++++++++++++++++++++++------
 lib/librte_hash/rte_cuckoo_hash.h |   5 +
 lib/librte_hash/rte_hash.h        |   3 +
 3 files changed, 326 insertions(+), 51 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index da8ddf4..133e181 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -31,6 +31,10 @@
 #include "rte_hash.h"
 #include "rte_cuckoo_hash.h"
 
+#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)                            \
+	for (CURRENT_BKT = START_BUCKET;                                      \
+		CURRENT_BKT != NULL;                                          \
+		CURRENT_BKT = CURRENT_BKT->next)
 
 TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
 
@@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
 	return h;
 }
 
+static inline struct rte_hash_bucket *
+rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
+{
+	while (lst_bkt->next != NULL)
+		lst_bkt = lst_bkt->next;
+	return lst_bkt;
+}
+
 void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
 {
 	h->cmp_jump_table_idx = KEY_CUSTOM;
@@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	struct rte_tailq_entry *te = NULL;
 	struct rte_hash_list *hash_list;
 	struct rte_ring *r = NULL;
+	struct rte_ring *r_ext = NULL;
 	char hash_name[RTE_HASH_NAMESIZE];
 	void *k = NULL;
 	void *buckets = NULL;
+	void *buckets_ext = NULL;
 	char ring_name[RTE_RING_NAMESIZE];
+	char ext_ring_name[RTE_RING_NAMESIZE];
 	unsigned num_key_slots;
 	unsigned i;
 	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
+	unsigned int ext_table_support = 0;
 	unsigned int readwrite_concur_support = 0;
 
 	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
@@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		multi_writer_support = 1;
 	}
 
+	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
+		ext_table_support = 1;
+
 	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
 	if (multi_writer_support)
 		/*
@@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err;
 	}
 
+	const uint32_t num_buckets = rte_align32pow2(params->entries) /
+						RTE_HASH_BUCKET_ENTRIES;
+
+	/* Create ring for extendable buckets. */
+	if (ext_table_support) {
+		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
+								params->name);
+		r_ext = rte_ring_create(ext_ring_name,
+				rte_align32pow2(num_buckets + 1),
+				params->socket_id, 0);
+
+		if (r_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+								"failed\n");
+			goto err;
+		}
+	}
+
 	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
 
 	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
@@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err_unlock;
 	}
 
-	const uint32_t num_buckets = rte_align32pow2(params->entries)
-					/ RTE_HASH_BUCKET_ENTRIES;
-
 	buckets = rte_zmalloc_socket(NULL,
 				num_buckets * sizeof(struct rte_hash_bucket),
 				RTE_CACHE_LINE_SIZE, params->socket_id);
 
 	if (buckets == NULL) {
-		RTE_LOG(ERR, HASH, "memory allocation failed\n");
+		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
 		goto err_unlock;
 	}
 
+	/* Allocate same number of extendable buckets */
+	if (ext_table_support) {
+		buckets_ext = rte_zmalloc_socket(NULL,
+				num_buckets * sizeof(struct rte_hash_bucket),
+				RTE_CACHE_LINE_SIZE, params->socket_id);
+		if (buckets_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+							"failed\n");
+			goto err_unlock;
+		}
+		/* Populate ext bkt ring. We reserve 0 similar to the
+		 * key-data slot, just in case in future we want to
+		 * use bucket index for the linked list and 0 means NULL
+		 * for next bucket
+		 */
+		for (i = 1; i <= num_buckets; i++)
+			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+	}
+
 	const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params->key_len;
 	const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;
 
@@ -262,6 +315,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->num_buckets = num_buckets;
 	h->bucket_bitmask = h->num_buckets - 1;
 	h->buckets = buckets;
+	h->buckets_ext = buckets_ext;
+	h->free_ext_bkts = r_ext;
 	h->hash_func = (params->hash_func == NULL) ?
 		default_hash_func : params->hash_func;
 	h->key_store = k;
@@ -269,6 +324,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->hw_trans_mem_support = hw_trans_mem_support;
 	h->multi_writer_support = multi_writer_support;
 	h->readwrite_concur_support = readwrite_concur_support;
+	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
@@ -304,9 +360,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
 err:
 	rte_ring_free(r);
+	rte_ring_free(r_ext);
 	rte_free(te);
 	rte_free(h);
 	rte_free(buckets);
+	rte_free(buckets_ext);
 	rte_free(k);
 	return NULL;
 }
@@ -344,8 +402,10 @@ rte_hash_free(struct rte_hash *h)
 		rte_free(h->readwrite_lock);
 	}
 	rte_ring_free(h->free_slots);
+	rte_ring_free(h->free_ext_bkts);
 	rte_free(h->key_store);
 	rte_free(h->buckets);
+	rte_free(h->buckets_ext);
 	rte_free(h);
 	rte_free(te);
 }
@@ -403,7 +463,6 @@ __hash_rw_writer_lock(const struct rte_hash *h)
 		rte_rwlock_write_lock(h->readwrite_lock);
 }
 
-
 static inline void
 __hash_rw_reader_lock(const struct rte_hash *h)
 {
@@ -448,6 +507,14 @@ rte_hash_reset(struct rte_hash *h)
 	while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
 		rte_pause();
 
+	/* clear free extendable bucket ring and memory */
+	if (h->ext_table_support) {
+		memset(h->buckets_ext, 0, h->num_buckets *
+						sizeof(struct rte_hash_bucket));
+		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
+			rte_pause();
+	}
+
 	/* Repopulate the free slots ring. Entry zero is reserved for key misses */
 	if (h->multi_writer_support)
 		tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) *
@@ -458,6 +525,13 @@ rte_hash_reset(struct rte_hash *h)
 	for (i = 1; i < tot_ring_cnt + 1; i++)
 		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
 
+	/* Repopulate the free ext bkt ring. */
+	if (h->ext_table_support) {
+		for (i = 1; i <= h->num_buckets; i++)
+			rte_ring_sp_enqueue(h->free_ext_bkts,
+						(void *)((uintptr_t) i));
+	}
+
 	if (h->multi_writer_support) {
 		/* Reset local caches per lcore */
 		for (i = 0; i < RTE_MAX_LCORE; i++)
@@ -524,24 +598,27 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		int32_t *ret_val)
 {
 	unsigned int i;
-	struct rte_hash_bucket *cur_bkt = prim_bkt;
+	struct rte_hash_bucket *cur_bkt;
 	int32_t ret;
 
 	__hash_rw_writer_lock(h);
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	/* Insert new entry if there is room in the primary
@@ -580,7 +657,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
-	struct rte_hash_bucket *cur_bkt = bkt;
+	struct rte_hash_bucket *cur_bkt;
 	struct queue_node *prev_node, *curr_node = leaf;
 	struct rte_hash_bucket *prev_bkt, *curr_bkt = leaf->bkt;
 	uint32_t prev_slot, curr_slot = leaf_slot;
@@ -597,18 +674,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
 
-	ret = search_and_update(h, data, key, alt_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	while (likely(curr_node->prev != NULL)) {
@@ -711,15 +790,18 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	hash_sig_t alt_hash;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
-	struct rte_hash_bucket *prim_bkt, *sec_bkt;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
 	void *slot_id = NULL;
-	uint32_t new_idx;
+	void *ext_bkt_id = NULL;
+	uint32_t new_idx, bkt_id;
 	int ret;
 	unsigned n_slots;
 	unsigned lcore_id;
+	unsigned int i;
 	struct lcore_cache *cached_free_slots = NULL;
 	int32_t ret_val;
+	struct rte_hash_bucket *last;
 
 	prim_bucket_idx = sig & h->bucket_bitmask;
 	prim_bkt = &h->buckets[prim_bucket_idx];
@@ -739,10 +821,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Check if key is already inserted in secondary location */
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_writer_unlock(h);
 
@@ -808,10 +892,70 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
-	} else {
+	}
+
+	/* if ext table not enabled, we failed the insertion */
+	if (!h->ext_table_support) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret;
 	}
+
+	/* Now we need to go through the extendable bucket. Protection is needed
+	 * to protect all extendable bucket processes.
+	 */
+	__hash_rw_writer_lock(h);
+	/* We check for duplicates again since could be inserted before the lock */
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	if (ret != -1) {
+		enqueue_slot_back(h, cached_free_slots, slot_id);
+		goto failure;
+	}
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			enqueue_slot_back(h, cached_free_slots, slot_id);
+			goto failure;
+		}
+	}
+
+	/* Search sec and ext buckets to find an empty entry to insert. */
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+			/* Check if slot is available */
+			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
+				cur_bkt->sig_current[i] = alt_hash;
+				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->key_idx[i] = new_idx;
+				__hash_rw_writer_unlock(h);
+				return new_idx - 1;
+			}
+		}
+	}
+
+	/* Failed to get an empty entry from extendable buckets. Link a new
+	 * extendable bucket. We first get a free bucket from ring.
+	 */
+	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+		ret = -ENOSPC;
+		goto failure;
+	}
+
+	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
+	/* Use the first location of the new bucket */
+	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
+	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
+	/* Link the new bucket to sec bucket linked list */
+	last = rte_hash_get_last_bkt(sec_bkt);
+	last->next = &h->buckets_ext[bkt_id];
+	__hash_rw_writer_unlock(h);
+	return new_idx - 1;
+
+failure:
+	__hash_rw_writer_unlock(h);
+	return ret;
+
 }
 
 int32_t
@@ -890,7 +1034,7 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
+	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
 
 	bucket_idx = sig & h->bucket_bitmask;
@@ -910,10 +1054,12 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 	bkt = &h->buckets[bucket_idx];
 
 	/* Check if key is in secondary location */
-	ret = search_one_bucket(h, key, alt_hash, data, bkt);
-	if (ret != -1) {
-		__hash_rw_reader_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, bkt) {
+		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		if (ret != -1) {
+			__hash_rw_reader_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_reader_unlock(h);
 	return -ENOENT;
@@ -978,16 +1124,42 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	}
 }
 
+/* Compact the linked list by moving key from last entry in linked list to the
+ * empty slot.
+ */
+static inline void
+__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
+	int i;
+	struct rte_hash_bucket *last_bkt;
+
+	if (!cur_bkt->next)
+		return;
+
+	last_bkt = rte_hash_get_last_bkt(cur_bkt);
+
+	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
+		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
+			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
+			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
+			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
+			last_bkt->sig_current[i] = NULL_SIGNATURE;
+			last_bkt->sig_alt[i] = NULL_SIGNATURE;
+			last_bkt->key_idx[i] = EMPTY_SLOT;
+			return;
+		}
+	}
+}
+
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig)
+			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
 	int32_t ret;
 
-	/* Check if key is in primary location */
+	/* Check if key is in bucket */
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 		if (bkt->sig_current[i] == sig &&
 				bkt->key_idx[i] != EMPTY_SLOT) {
@@ -996,12 +1168,12 @@ search_and_remove(const struct rte_hash *h, const void *key,
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
 				remove_entry(h, bkt, i);
 
-				/*
-				 * Return index where key is stored,
+				/* Return index where key is stored,
 				 * subtracting the first dummy index
 				 */
 				ret = bkt->key_idx[i] - 1;
 				bkt->key_idx[i] = EMPTY_SLOT;
+				*pos = i;
 				return ret;
 			}
 		}
@@ -1015,34 +1187,66 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
-	int32_t ret;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
+	struct rte_hash_bucket *cur_bkt;
+	int pos;
+	int32_t ret, i;
 
 	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	prim_bkt = &h->buckets[bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, bkt, sig);
+	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
 	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+		__rte_hash_compact_ll(prim_bkt, pos);
+		last_bkt = prim_bkt->next;
+		prev_bkt = prim_bkt;
+		goto return_bkt;
 	}
 
 	/* Calculate secondary hash */
 	alt_hash = rte_hash_secondary_hash(sig);
 	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[bucket_idx];
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
+		if (ret != -1) {
+			__rte_hash_compact_ll(cur_bkt, pos);
+			last_bkt = sec_bkt->next;
+			prev_bkt = sec_bkt;
+			goto return_bkt;
+		}
+	}
 
-	/* look for key in secondary bucket */
-	ret = search_and_remove(h, key, bkt, alt_hash);
-	if (ret != -1) {
+	__hash_rw_writer_unlock(h);
+	return -ENOENT;
+
+/* Search last bucket to see if empty to be recycled */
+return_bkt:
+	if (!last_bkt) {
 		__hash_rw_writer_unlock(h);
 		return ret;
 	}
+	while (last_bkt->next) {
+		prev_bkt = last_bkt;
+		last_bkt = last_bkt->next;
+	}
+
+	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+		if (last_bkt->key_idx[i] != EMPTY_SLOT)
+			break;
+	}
+	/* found empty bucket and recycle */
+	if (i == RTE_HASH_BUCKET_ENTRIES) {
+		prev_bkt->next = last_bkt->next = NULL;
+		uint32_t index = last_bkt - h->buckets_ext + 1;
+		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+	}
 
 	__hash_rw_writer_unlock(h);
-	return -ENOENT;
+	return ret;
 }
 
 int32_t
@@ -1143,12 +1347,14 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 {
 	uint64_t hits = 0;
 	int32_t i;
+	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
+	struct rte_hash_bucket *cur_bkt, *next_bkt;
 
 	/* Prefetch first keys */
 	for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++)
@@ -1266,6 +1472,34 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		continue;
 	}
 
+	/* all found, do not need to go through ext bkt */
+	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
+		if (hit_mask != NULL)
+			*hit_mask = hits;
+		__hash_rw_reader_unlock(h);
+		return;
+	}
+
+	/* need to check ext buckets for match */
+	for (i = 0; i < num_keys; i++) {
+		if ((hits & (1ULL << i)) != 0)
+			continue;
+		next_bkt = secondary_bkt[i]->next;
+		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+			if (data != NULL)
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], &data[i], cur_bkt);
+			else
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], NULL, cur_bkt);
+			if (ret != -1) {
+				positions[i] = ret;
+				hits |= 1ULL << i;
+				break;
+			}
+		}
+	}
+
 	__hash_rw_reader_unlock(h);
 
 	if (hit_mask != NULL)
@@ -1308,10 +1542,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 
 	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
 
-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
-	/* Out of bounds */
-	if (*next >= total_entries)
-		return -ENOENT;
+	const uint32_t total_entries_main = h->num_buckets *
+							RTE_HASH_BUCKET_ENTRIES;
+	const uint32_t total_entries = total_entries_main << 1;
+
+	/* Out of bounds of all buckets (both main table and ext table) */
+	if (*next >= total_entries_main)
+		goto extend_table;
 
 	/* Calculate bucket and index of current iterator */
 	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
@@ -1322,7 +1559,7 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 		(*next)++;
 		/* End of table */
 		if (*next == total_entries)
-			return -ENOENT;
+			goto extend_table;
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
@@ -1340,4 +1577,34 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	(*next)++;
 
 	return position - 1;
+
+/* Begin to iterate extendable buckets */
+extend_table:
+	/* Out of total bound or if ext bucket feature is not enabled */
+	if (*next >= total_entries || !h->ext_table_support)
+		return -ENOENT;
+
+	bucket_idx = (*next - total_entries_main) / RTE_HASH_BUCKET_ENTRIES;
+	idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+
+	while ((position = h->buckets_ext[bucket_idx].key_idx[idx]) == EMPTY_SLOT) {
+		(*next)++;
+		if (*next == total_entries)
+			return -ENOENT;
+		bucket_idx = (*next - total_entries_main) /
+						RTE_HASH_BUCKET_ENTRIES;
+		idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+	}
+	__hash_rw_reader_lock(h);
+	next_key = (struct rte_hash_key *) ((char *)h->key_store +
+				position * h->key_entry_size);
+	/* Return key and data */
+	*key = next_key->key;
+	*data = next_key->pdata;
+
+	__hash_rw_reader_unlock(h);
+
+	/* Increment iterator */
+	(*next)++;
+	return position - 1;
 }
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fc0e5c2..e601520 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -142,6 +142,8 @@ struct rte_hash_bucket {
 	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
 
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
+
+	void *next;
 } __rte_cache_aligned;
 
 /** A hash table structure. */
@@ -166,6 +168,7 @@ struct rte_hash {
 	/**< If multi-writer support is enabled. */
 	uint8_t readwrite_concur_support;
 	/**< If read-write concurrency support is enabled */
+	uint8_t ext_table_support;     /**< Enable extendable bucket table */
 	rte_hash_function hash_func;    /**< Function used to calculate hash. */
 	uint32_t hash_func_init_val;    /**< Init value used by hash_func. */
 	rte_hash_cmp_eq_t rte_hash_custom_cmp_eq;
@@ -184,6 +187,8 @@ struct rte_hash {
 	 * to the key table.
 	 */
 	rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
+	struct rte_hash_bucket *buckets_ext; /**< Extra buckets array */
+	struct rte_ring *free_ext_bkts; /**< Ring of indexes of free buckets */
 } __rte_cache_aligned;
 
 struct queue_node {
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d931..11d8e28 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -37,6 +37,9 @@ extern "C" {
 /** Flag to support reader writer concurrency */
 #define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
 
+/** Flag to indicate the extendable bucket table feature should be used */
+#define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
+
 /** Signature of key that is stored internally. */
 typedef uint32_t hash_sig_t;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v5 3/4] test/hash: implement extendable bucket hash test
  2018-10-01 18:34   ` [PATCH v5 " Yipeng Wang
  2018-10-01 18:34     ` [PATCH v5 1/4] hash: fix race condition in iterate Yipeng Wang
  2018-10-01 18:35     ` [PATCH v5 2/4] hash: add extendable bucket feature Yipeng Wang
@ 2018-10-01 18:35     ` Yipeng Wang
  2018-10-01 18:35     ` [PATCH v5 4/4] hash: use partial-key hashing Yipeng Wang
  2018-10-03 19:10     ` [PATCH v5 0/4] hash: add extendable bucket and partial key hashing Dharmik Thakkar
  4 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-01 18:35 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel

This commit changes the current rte_hash unit test to
test the extendable table feature and performance.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 test/test/test_hash.c      | 159 +++++++++++++++++++++++++++++++++++++++++++--
 test/test/test_hash_perf.c | 114 +++++++++++++++++++++++---------
 2 files changed, 238 insertions(+), 35 deletions(-)

diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd..815c734 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -660,6 +660,116 @@ static int test_full_bucket(void)
 	return 0;
 }
 
+/*
+ * Similar to the test above (full bucket test), but for extendable buckets.
+ */
+static int test_extendable_bucket(void)
+{
+	struct rte_hash_parameters params_pseudo_hash = {
+		.name = "test5",
+		.entries = 64,
+		.key_len = sizeof(struct flow_key), /* 13 */
+		.hash_func = pseudo_hash,
+		.hash_func_init_val = 0,
+		.socket_id = 0,
+		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE
+	};
+	struct rte_hash *handle;
+	int pos[64];
+	int expected_pos[64];
+	unsigned int i;
+	struct flow_key rand_keys[64];
+
+	for (i = 0; i < 64; i++) {
+		rand_keys[i].port_dst = i;
+		rand_keys[i].port_src = i+1;
+	}
+
+	handle = rte_hash_create(&params_pseudo_hash);
+	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
+
+	/* Fill bucket */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add - update */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Delete 1 key, check other keys are still found */
+	pos[35] = rte_hash_del_key(handle, &rand_keys[35]);
+	print_key_info("Del", &rand_keys[35], pos[35]);
+	RETURN_IF_ERROR(pos[35] != expected_pos[35],
+			"failed to delete key (pos[35]=%d)", pos[35]);
+	pos[20] = rte_hash_lookup(handle, &rand_keys[20]);
+	print_key_info("Lkp", &rand_keys[20], pos[20]);
+	RETURN_IF_ERROR(pos[20] != expected_pos[20],
+			"failed lookup after deleting key from same bucket "
+			"(pos[20]=%d)", pos[20]);
+
+	/* Go back to previous state */
+	pos[35] = rte_hash_add_key(handle, &rand_keys[35]);
+	print_key_info("Add", &rand_keys[35], pos[35]);
+	expected_pos[35] = pos[35];
+	RETURN_IF_ERROR(pos[35] < 0, "failed to add key (pos[35]=%d)", pos[35]);
+
+	/* Delete */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_del_key(handle, &rand_keys[i]);
+		print_key_info("Del", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to delete key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != -ENOENT,
+			"fail: found non-existent key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add again */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	rte_hash_free(handle);
+
+	/* Cover the NULL case. */
+	rte_hash_free(0);
+	return 0;
+}
+
 /******************************************************************************/
 static int
 fbk_hash_unit_test(void)
@@ -1096,7 +1206,7 @@ test_hash_creation_with_good_parameters(void)
  * Test to see the average table utilization (entries added/max entries)
  * before hitting a random entry that cannot be added
  */
-static int test_average_table_utilization(void)
+static int test_average_table_utilization(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	uint8_t simple_key[MAX_KEYSIZE];
@@ -1107,12 +1217,23 @@ static int test_average_table_utilization(void)
 
 	printf("\n# Running test to determine average utilization"
 	       "\n  before adding elements begins to fail\n");
+	if (ext_table)
+		printf("ext table is enabled\n");
+	else
+		printf("ext table is disabled\n");
+
 	printf("Measuring performance, please wait");
 	fflush(stdout);
 	ut_params.entries = 1 << 16;
 	ut_params.name = "test_average_utilization";
 	ut_params.hash_func = rte_jhash;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
+
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
 	for (j = 0; j < ITERATIONS; j++) {
@@ -1139,6 +1260,14 @@ static int test_average_table_utilization(void)
 			rte_hash_free(handle);
 			return -1;
 		}
+		if (ext_table) {
+			if (cnt != ut_params.entries) {
+				printf("rte_hash_count returned wrong value "
+					"%u, %u, %u\n", j, added_keys, cnt);
+				rte_hash_free(handle);
+				return -1;
+			}
+		}
 
 		average_keys_added += added_keys;
 
@@ -1161,7 +1290,7 @@ static int test_average_table_utilization(void)
 }
 
 #define NUM_ENTRIES 256
-static int test_hash_iteration(void)
+static int test_hash_iteration(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	unsigned i;
@@ -1177,6 +1306,11 @@ static int test_hash_iteration(void)
 	ut_params.name = "test_hash_iteration";
 	ut_params.hash_func = rte_jhash;
 	ut_params.key_len = 16;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
@@ -1186,8 +1320,13 @@ static int test_hash_iteration(void)
 		for (i = 0; i < ut_params.key_len; i++)
 			keys[added_keys][i] = rte_rand() % 255;
 		ret = rte_hash_add_key_data(handle, keys[added_keys], data[added_keys]);
-		if (ret < 0)
+		if (ret < 0) {
+			if (ext_table) {
+				printf("Insertion failed for ext table\n");
+				goto err;
+			}
 			break;
+		}
 	}
 
 	/* Iterate through the hash table */
@@ -1474,6 +1613,8 @@ test_hash(void)
 		return -1;
 	if (test_full_bucket() < 0)
 		return -1;
+	if (test_extendable_bucket() < 0)
+		return -1;
 
 	if (test_fbk_hash_find_existing() < 0)
 		return -1;
@@ -1483,9 +1624,17 @@ test_hash(void)
 		return -1;
 	if (test_hash_creation_with_good_parameters() < 0)
 		return -1;
-	if (test_average_table_utilization() < 0)
+
+	/* ext table disabled */
+	if (test_average_table_utilization(0) < 0)
+		return -1;
+	if (test_hash_iteration(0) < 0)
+		return -1;
+
+	/* ext table enabled */
+	if (test_average_table_utilization(1) < 0)
 		return -1;
-	if (test_hash_iteration() < 0)
+	if (test_hash_iteration(1) < 0)
 		return -1;
 
 	run_hash_func_tests();
diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 0d39e10..5252111 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -18,7 +18,8 @@
 #include "test.h"
 
 #define MAX_ENTRIES (1 << 19)
-#define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
+#define KEYS_TO_ADD (MAX_ENTRIES)
+#define ADD_PERCENT 0.75 /* 75% table utilization */
 #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
 /* BUCKET_SIZE should be same as RTE_HASH_BUCKET_ENTRIES in rte_hash library */
 #define BUCKET_SIZE 8
@@ -78,7 +79,7 @@ static struct rte_hash_parameters ut_params = {
 
 static int
 create_table(unsigned int with_data, unsigned int table_index,
-		unsigned int with_locks)
+		unsigned int with_locks, unsigned int ext)
 {
 	char name[RTE_HASH_NAMESIZE];
 
@@ -96,6 +97,9 @@ create_table(unsigned int with_data, unsigned int table_index,
 	else
 		ut_params.extra_flag = 0;
 
+	if (ext)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	ut_params.name = name;
 	ut_params.key_len = hashtest_key_lens[table_index];
 	ut_params.socket_id = rte_socket_id();
@@ -117,15 +121,21 @@ create_table(unsigned int with_data, unsigned int table_index,
 
 /* Shuffle the keys that have been added, so lookups will be totally random */
 static void
-shuffle_input_keys(unsigned table_index)
+shuffle_input_keys(unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	uint32_t swap_idx;
 	uint8_t temp_key[MAX_KEYSIZE];
 	hash_sig_t temp_signature;
 	int32_t temp_position;
+	unsigned int keys_to_add;
+
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = KEYS_TO_ADD - 1; i > 0; i--) {
+	for (i = keys_to_add - 1; i > 0; i--) {
 		swap_idx = rte_rand() % i;
 
 		memcpy(temp_key, keys[i], hashtest_key_lens[table_index]);
@@ -147,14 +157,20 @@ shuffle_input_keys(unsigned table_index)
  * ALL can fit in hash table (no errors)
  */
 static int
-get_input_keys(unsigned with_pushes, unsigned table_index)
+get_input_keys(unsigned int with_pushes, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j;
 	unsigned bucket_idx, incr, success = 1;
 	uint8_t k = 0;
 	int32_t ret;
 	const uint32_t bucket_bitmask = NUM_BUCKETS - 1;
+	unsigned int keys_to_add;
 
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 	/* Reset all arrays */
 	for (i = 0; i < MAX_ENTRIES; i++)
 		slot_taken[i] = 0;
@@ -171,7 +187,7 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 	 * Regardless a key has been added correctly or not (success),
 	 * the next one to try will be increased by 1.
 	 */
-	for (i = 0; i < KEYS_TO_ADD;) {
+	for (i = 0; i < keys_to_add;) {
 		incr = 0;
 		if (i != 0) {
 			keys[i][0] = ++k;
@@ -235,14 +251,20 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 }
 
 static int
-timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_adds(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *data;
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		data = (void *) ((uintptr_t) signatures[i]);
 		if (with_hash && with_data) {
 			ret = rte_hash_add_key_with_hash_data(h[table_index],
@@ -284,22 +306,31 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][ADD][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][ADD][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
 
 static int
-timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_lookups(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i, j;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *ret_data;
 	void *expected_data;
 	int32_t ret;
-
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD; j++) {
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
+	for (i = 0; i < num_lookups / keys_to_add; i++) {
+		for (j = 0; j < keys_to_add; j++) {
 			if (with_hash && with_data) {
 				ret = rte_hash_lookup_with_hash_data(h[table_index],
 							(const void *) keys[j],
@@ -352,13 +383,14 @@ timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_lookups_multi(unsigned with_data, unsigned table_index)
+timed_lookups_multi(unsigned int with_data, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j, k;
 	int32_t positions_burst[BURST_SIZE];
@@ -367,11 +399,20 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	void *ret_data[BURST_SIZE];
 	uint64_t hit_mask;
 	int ret;
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
 
 	const uint64_t start_tsc = rte_rdtsc();
 
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD/BURST_SIZE; j++) {
+	for (i = 0; i < num_lookups/keys_to_add; i++) {
+		for (j = 0; j < keys_to_add/BURST_SIZE; j++) {
 			for (k = 0; k < BURST_SIZE; k++)
 				keys_burst[k] = keys[j * BURST_SIZE + k];
 			if (with_data) {
@@ -419,19 +460,25 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_deletes(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		/* There are no delete functions with data, so just call two functions */
 		if (with_hash)
 			ret = rte_hash_del_key_with_hash(h[table_index],
@@ -451,7 +498,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][DELETE][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][DELETE][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
@@ -469,7 +516,8 @@ reset_table(unsigned table_index)
 }
 
 static int
-run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
+run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks,
+						unsigned int ext)
 {
 	unsigned i, j, with_data, with_hash;
 
@@ -478,25 +526,25 @@ run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
 
 	for (with_data = 0; with_data <= 1; with_data++) {
 		for (i = 0; i < NUM_KEYSIZES; i++) {
-			if (create_table(with_data, i, with_locks) < 0)
+			if (create_table(with_data, i, with_locks, ext) < 0)
 				return -1;
 
-			if (get_input_keys(with_pushes, i) < 0)
+			if (get_input_keys(with_pushes, i, ext) < 0)
 				return -1;
 			for (with_hash = 0; with_hash <= 1; with_hash++) {
-				if (timed_adds(with_hash, with_data, i) < 0)
+				if (timed_adds(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				for (j = 0; j < NUM_SHUFFLES; j++)
-					shuffle_input_keys(i);
+					shuffle_input_keys(i, ext);
 
-				if (timed_lookups(with_hash, with_data, i) < 0)
+				if (timed_lookups(with_hash, with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_lookups_multi(with_data, i) < 0)
+				if (timed_lookups_multi(with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_deletes(with_hash, with_data, i) < 0)
+				if (timed_deletes(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				/* Print a dot to show progress on operations */
@@ -632,10 +680,16 @@ test_hash_perf(void)
 				printf("\nALL ELEMENTS IN PRIMARY LOCATION\n");
 			else
 				printf("\nELEMENTS IN PRIMARY OR SECONDARY LOCATION\n");
-			if (run_all_tbl_perf_tests(with_pushes, with_locks) < 0)
+			if (run_all_tbl_perf_tests(with_pushes, with_locks, 0) < 0)
 				return -1;
 		}
 	}
+
+	printf("\n EXTENDABLE BUCKETS PERFORMANCE\n");
+
+	if (run_all_tbl_perf_tests(1, 0, 1) < 0)
+		return -1;
+
 	if (fbk_hash_perf_test() < 0)
 		return -1;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v5 4/4] hash: use partial-key hashing
  2018-10-01 18:34   ` [PATCH v5 " Yipeng Wang
                       ` (2 preceding siblings ...)
  2018-10-01 18:35     ` [PATCH v5 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
@ 2018-10-01 18:35     ` Yipeng Wang
  2018-10-02 20:52       ` Dharmik Thakkar
  2018-10-03 19:10     ` [PATCH v5 0/4] hash: add extendable bucket and partial key hashing Dharmik Thakkar
  4 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-10-01 18:35 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel

This commit changes the hashing mechanism to "partial-key
hashing" to calculate the bucket index and the signature of a key.

This is proposed in Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". Basically, the idea is to use XOR to
derive the alternative bucket from the current bucket index and
the signature.

With "partial-key hashing", the bucket memory requirement drops
from two cache lines to one cache line, which improves memory
efficiency and thus lookup speed.
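
[Editorial note: to make the index derivation concrete, here is a
standalone sketch in the spirit of the helpers added by this patch
(get_short_sig, get_prim_bucket_index, get_alt_bucket_index). It is
illustrative only; helper names are mine, and the bucket count is
assumed to be a power of two so that the bitmask works, as in rte_hash.]

#include <stdint.h>

/* The upper 16 bits of the hash are kept as the in-bucket signature. */
static inline uint16_t
demo_short_sig(uint32_t hash)
{
	return hash >> 16;
}

/* The lower bits of the hash select the primary bucket. */
static inline uint32_t
demo_prim_bucket(uint32_t hash, uint32_t bucket_mask)
{
	return hash & bucket_mask;
}

/*
 * XOR with the signature gives the other candidate bucket. Applying it
 * twice returns the original index, so either bucket can recover its
 * alternative from the stored 16-bit signature alone.
 */
static inline uint32_t
demo_alt_bucket(uint32_t bkt_idx, uint16_t sig, uint32_t bucket_mask)
{
	return (bkt_idx ^ sig) & bucket_mask;
}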

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 246 +++++++++++++++++++-------------------
 lib/librte_hash/rte_cuckoo_hash.h |   6 +-
 lib/librte_hash/rte_hash.h        |   5 +-
 3 files changed, 131 insertions(+), 126 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 133e181..3c7c9c5 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -90,6 +90,36 @@ rte_hash_cmp_eq(const void *key1, const void *key2, const struct rte_hash *h)
 		return cmp_jump_table[h->cmp_jump_table_idx](key1, key2, h->key_len);
 }
 
+/*
+ * We use the higher 16 bits of the hash as the signature value stored
+ * in the table. We use the lower bits for the primary bucket
+ * location. Then we XOR the primary bucket location and the signature
+ * to get the secondary bucket location. This is the same scheme as
+ * proposed in Bin Fan, et al.'s paper
+ * "MemC3: Compact and Concurrent MemCache with Dumber Caching and
+ * Smarter Hashing". The benefit of using
+ * XOR is that one can derive the alternative bucket location
+ * using only the current bucket location and the signature.
+ */
+static inline uint16_t
+get_short_sig(const hash_sig_t hash)
+{
+	return hash >> 16;
+}
+
+static inline uint32_t
+get_prim_bucket_index(const struct rte_hash *h, const hash_sig_t hash)
+{
+	return hash & h->bucket_bitmask;
+}
+
+static inline uint32_t
+get_alt_bucket_index(const struct rte_hash *h,
+			uint32_t cur_bkt_idx, uint16_t sig)
+{
+	return (cur_bkt_idx ^ sig) & h->bucket_bitmask;
+}
+
 struct rte_hash *
 rte_hash_create(const struct rte_hash_parameters *params)
 {
@@ -327,9 +357,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
-		h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
-	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
 		h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
 	else
 #endif
@@ -417,18 +445,6 @@ rte_hash_hash(const struct rte_hash *h, const void *key)
 	return h->hash_func(key, h->key_len, h->hash_func_init_val);
 }
 
-/* Calc the secondary hash value from the primary hash value of a given key */
-static inline hash_sig_t
-rte_hash_secondary_hash(const hash_sig_t primary_hash)
-{
-	static const unsigned all_bits_shift = 12;
-	static const unsigned alt_bits_xor = 0x5bd1e995;
-
-	uint32_t tag = primary_hash >> all_bits_shift;
-
-	return primary_hash ^ ((tag + 1) * alt_bits_xor);
-}
-
 int32_t
 rte_hash_count(const struct rte_hash *h)
 {
@@ -560,14 +576,13 @@ enqueue_slot_back(const struct rte_hash *h,
 /* Search a key from bucket and update its data */
 static inline int32_t
 search_and_update(const struct rte_hash *h, void *data, const void *key,
-	struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
+	struct rte_hash_bucket *bkt, uint16_t sig)
 {
 	int i;
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		if (bkt->sig_current[i] == sig &&
-				bkt->sig_alt[i] == alt_hash) {
+		if (bkt->sig_current[i] == sig) {
 			k = (struct rte_hash_key *) ((char *)keys +
 					bkt->key_idx[i] * h->key_entry_size);
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
@@ -594,7 +609,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		struct rte_hash_bucket *prim_bkt,
 		struct rte_hash_bucket *sec_bkt,
 		const struct rte_hash_key *key, void *data,
-		hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+		uint16_t sig, uint32_t new_idx,
 		int32_t *ret_val)
 {
 	unsigned int i;
@@ -605,7 +620,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -613,7 +628,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -628,7 +643,6 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		/* Check if slot is available */
 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
 			prim_bkt->sig_current[i] = sig;
-			prim_bkt->sig_alt[i] = alt_hash;
 			prim_bkt->key_idx[i] = new_idx;
 			break;
 		}
@@ -653,7 +667,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *alt_bkt,
 			const struct rte_hash_key *key, void *data,
 			struct queue_node *leaf, uint32_t leaf_slot,
-			hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+			uint16_t sig, uint32_t new_idx,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
@@ -674,7 +688,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -682,7 +696,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -695,8 +709,9 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		prev_bkt = prev_node->bkt;
 		prev_slot = curr_node->prev_slot;
 
-		prev_alt_bkt_idx =
-			prev_bkt->sig_alt[prev_slot] & h->bucket_bitmask;
+		prev_alt_bkt_idx = get_alt_bucket_index(h,
+					prev_node->cur_bkt_idx,
+					prev_bkt->sig_current[prev_slot]);
 
 		if (unlikely(&h->buckets[prev_alt_bkt_idx]
 				!= curr_bkt)) {
@@ -710,10 +725,8 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		 * Cuckoo insert to move elements back to its
 		 * primary bucket if available
 		 */
-		curr_bkt->sig_alt[curr_slot] =
-			 prev_bkt->sig_current[prev_slot];
 		curr_bkt->sig_current[curr_slot] =
-			prev_bkt->sig_alt[prev_slot];
+			prev_bkt->sig_current[prev_slot];
 		curr_bkt->key_idx[curr_slot] =
 			prev_bkt->key_idx[prev_slot];
 
@@ -723,7 +736,6 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	curr_bkt->sig_current[curr_slot] = sig;
-	curr_bkt->sig_alt[curr_slot] = alt_hash;
 	curr_bkt->key_idx[curr_slot] = new_idx;
 
 	__hash_rw_writer_unlock(h);
@@ -741,39 +753,44 @@ rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *bkt,
 			struct rte_hash_bucket *sec_bkt,
 			const struct rte_hash_key *key, void *data,
-			hash_sig_t sig, hash_sig_t alt_hash,
+			uint16_t sig, uint32_t bucket_idx,
 			uint32_t new_idx, int32_t *ret_val)
 {
 	unsigned int i;
 	struct queue_node queue[RTE_HASH_BFS_QUEUE_MAX_LEN];
 	struct queue_node *tail, *head;
 	struct rte_hash_bucket *curr_bkt, *alt_bkt;
+	uint32_t cur_idx, alt_idx;
 
 	tail = queue;
 	head = queue + 1;
 	tail->bkt = bkt;
 	tail->prev = NULL;
 	tail->prev_slot = -1;
+	tail->cur_bkt_idx = bucket_idx;
 
 	/* Cuckoo bfs Search */
 	while (likely(tail != head && head <
 					queue + RTE_HASH_BFS_QUEUE_MAX_LEN -
 					RTE_HASH_BUCKET_ENTRIES)) {
 		curr_bkt = tail->bkt;
+		cur_idx = tail->cur_bkt_idx;
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			if (curr_bkt->key_idx[i] == EMPTY_SLOT) {
 				int32_t ret = rte_hash_cuckoo_move_insert_mw(h,
 						bkt, sec_bkt, key, data,
-						tail, i, sig, alt_hash,
+						tail, i, sig,
 						new_idx, ret_val);
 				if (likely(ret != -1))
 					return ret;
 			}
 
 			/* Enqueue new node and keep prev node info */
-			alt_bkt = &(h->buckets[curr_bkt->sig_alt[i]
-						    & h->bucket_bitmask]);
+			alt_idx = get_alt_bucket_index(h, cur_idx,
+						curr_bkt->sig_current[i]);
+			alt_bkt = &(h->buckets[alt_idx]);
 			head->bkt = alt_bkt;
+			head->cur_bkt_idx = alt_idx;
 			head->prev = tail;
 			head->prev_slot = i;
 			head++;
@@ -788,7 +805,7 @@ static inline int32_t
 __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig, void *data)
 {
-	hash_sig_t alt_hash;
+	uint16_t short_sig;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
@@ -803,18 +820,17 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	int32_t ret_val;
 	struct rte_hash_bucket *last;
 
-	prim_bucket_idx = sig & h->bucket_bitmask;
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
 	prim_bkt = &h->buckets[prim_bucket_idx];
-	rte_prefetch0(prim_bkt);
-
-	alt_hash = rte_hash_secondary_hash(sig);
-	sec_bucket_idx = alt_hash & h->bucket_bitmask;
 	sec_bkt = &h->buckets[sec_bucket_idx];
+	rte_prefetch0(prim_bkt);
 	rte_prefetch0(sec_bkt);
 
 	/* Check if key is already inserted in primary location */
 	__hash_rw_writer_lock(h);
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
@@ -822,12 +838,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Check if key is already inserted in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			return ret;
 		}
 	}
+
 	__hash_rw_writer_unlock(h);
 
 	/* Did not find a match, so get a new slot for storing the new key */
@@ -865,7 +882,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+					short_sig, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -875,7 +892,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Primary bucket full, need to make space for new entry */
 	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+				short_sig, prim_bucket_idx, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -885,7 +902,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Also search secondary bucket to get better occupancy */
 	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
-					alt_hash, sig, new_idx, &ret_val);
+				short_sig, sec_bucket_idx, new_idx, &ret_val);
 
 	if (ret == 0)
 		return new_idx - 1;
@@ -905,14 +922,14 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	 */
 	__hash_rw_writer_lock(h);
 	/* We check for duplicates again since could be inserted before the lock */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		goto failure;
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			enqueue_slot_back(h, cached_free_slots, slot_id);
 			goto failure;
@@ -924,8 +941,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
-				cur_bkt->sig_current[i] = alt_hash;
-				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->sig_current[i] = short_sig;
 				cur_bkt->key_idx[i] = new_idx;
 				__hash_rw_writer_unlock(h);
 				return new_idx - 1;
@@ -943,8 +959,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
-	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
-	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
 	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
@@ -1003,7 +1018,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
 
 /* Search one bucket to find the match key */
 static inline int32_t
-search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
+search_one_bucket(const struct rte_hash *h, const void *key, uint16_t sig,
 			void **data, const struct rte_hash_bucket *bkt)
 {
 	int i;
@@ -1032,30 +1047,30 @@ static inline int32_t
 __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 					hash_sig_t sig, void **data)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
+	bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_reader_lock(h);
 
 	/* Check if key is in primary location */
-	ret = search_one_bucket(h, key, sig, data, bkt);
+	ret = search_one_bucket(h, key, short_sig, data, bkt);
 	if (ret != -1) {
 		__hash_rw_reader_unlock(h);
 		return ret;
 	}
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	bkt = &h->buckets[sec_bucket_idx];
 
 	/* Check if key is in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, bkt) {
-		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
 		if (ret != -1) {
 			__hash_rw_reader_unlock(h);
 			return ret;
@@ -1102,7 +1117,6 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	struct lcore_cache *cached_free_slots;
 
 	bkt->sig_current[i] = NULL_SIGNATURE;
-	bkt->sig_alt[i] = NULL_SIGNATURE;
 	if (h->multi_writer_support) {
 		lcore_id = rte_lcore_id();
 		cached_free_slots = &h->local_free_slots[lcore_id];
@@ -1141,9 +1155,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
 			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
-			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
 			last_bkt->sig_current[i] = NULL_SIGNATURE;
-			last_bkt->sig_alt[i] = NULL_SIGNATURE;
 			last_bkt->key_idx[i] = EMPTY_SLOT;
 			return;
 		}
@@ -1153,7 +1165,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
+			struct rte_hash_bucket *bkt, uint16_t sig, int *pos)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
@@ -1185,19 +1197,21 @@ static inline int32_t
 __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
 	struct rte_hash_bucket *cur_bkt;
 	int pos;
 	int32_t ret, i;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	prim_bkt = &h->buckets[bucket_idx];
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
+	prim_bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
+	ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
 	if (ret != -1) {
 		__rte_hash_compact_ll(prim_bkt, pos);
 		last_bkt = prim_bkt->next;
@@ -1206,12 +1220,10 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	sec_bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[sec_bucket_idx];
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
+		ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
 		if (ret != -1) {
 			__rte_hash_compact_ll(cur_bkt, pos);
 			last_bkt = sec_bkt->next;
@@ -1288,55 +1300,35 @@ static inline void
 compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 			const struct rte_hash_bucket *prim_bkt,
 			const struct rte_hash_bucket *sec_bkt,
-			hash_sig_t prim_hash, hash_sig_t sec_hash,
+			uint16_t sig,
 			enum rte_hash_sig_compare_function sig_cmp_fn)
 {
 	unsigned int i;
 
+	/* In the match mask, the first bit of every two bits indicates a match */
 	switch (sig_cmp_fn) {
-#ifdef RTE_MACHINE_CPUFLAG_AVX2
-	case RTE_HASH_COMPARE_AVX2:
-		*prim_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)prim_bkt->sig_current),
-				_mm256_set1_epi32(prim_hash)));
-		*sec_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)sec_bkt->sig_current),
-				_mm256_set1_epi32(sec_hash)));
-		break;
-#endif
 #ifdef RTE_MACHINE_CPUFLAG_SSE2
 	case RTE_HASH_COMPARE_SSE:
-		/* Compare the first 4 signatures in the bucket */
-		*prim_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+		/* Compare all signatures in the bucket */
+		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)prim_bkt->sig_current),
-				_mm_set1_epi32(prim_hash)));
-		*prim_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&prim_bkt->sig_current[4]),
-				_mm_set1_epi32(prim_hash)))) << 4;
-		/* Compare the first 4 signatures in the bucket */
-		*sec_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+				_mm_set1_epi16(sig)));
+		/* Compare all signatures in the bucket */
+		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)sec_bkt->sig_current),
-				_mm_set1_epi32(sec_hash)));
-		*sec_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&sec_bkt->sig_current[4]),
-				_mm_set1_epi32(sec_hash)))) << 4;
+				_mm_set1_epi16(sig)));
 		break;
 #endif
 	default:
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			*prim_hash_matches |=
-				((prim_hash == prim_bkt->sig_current[i]) << i);
+				((sig == prim_bkt->sig_current[i]) << (i << 1));
 			*sec_hash_matches |=
-				((sec_hash == sec_bkt->sig_current[i]) << i);
+				((sig == sec_bkt->sig_current[i]) << (i << 1));
 		}
 	}
-
 }
 
 #define PREFETCH_OFFSET 4
@@ -1349,7 +1341,9 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	int32_t i;
 	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
-	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
@@ -1368,10 +1362,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		rte_prefetch0(keys[i + PREFETCH_OFFSET]);
 
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		sig[i] = get_short_sig(prim_hash[i]);
+		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
+		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1380,10 +1377,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	/* Calculate and prefetch rest of the buckets */
 	for (; i < num_keys; i++) {
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		sig[i] = get_short_sig(prim_hash[i]);
+		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
+		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1394,10 +1394,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
 				primary_bkt[i], secondary_bkt[i],
-				prim_hash[i], sec_hash[i], h->sig_cmp_fn);
+				sig[i], h->sig_cmp_fn);
 
 		if (prim_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 			uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1408,7 +1409,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		}
 
 		if (sec_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 			uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1422,7 +1424,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		positions[i] = -ENOENT;
 		while (prim_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 
 			uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1441,11 +1444,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			prim_hitmask[i] &= ~(1 << (hit_index));
+			prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 		while (sec_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 
 			uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1465,7 +1469,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			sec_hitmask[i] &= ~(1 << (hit_index));
+			sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 next_key:
@@ -1488,10 +1492,10 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
 			if (data != NULL)
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], &data[i], cur_bkt);
+						sig[i], &data[i], cur_bkt);
 			else
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], NULL, cur_bkt);
+						sig[i], NULL, cur_bkt);
 			if (ret != -1) {
 				positions[i] = ret;
 				hits |= 1ULL << i;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index e601520..7753cd8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -129,18 +129,15 @@ struct rte_hash_key {
 enum rte_hash_sig_compare_function {
 	RTE_HASH_COMPARE_SCALAR = 0,
 	RTE_HASH_COMPARE_SSE,
-	RTE_HASH_COMPARE_AVX2,
 	RTE_HASH_COMPARE_NUM
 };
 
 /** Bucket structure */
 struct rte_hash_bucket {
-	hash_sig_t sig_current[RTE_HASH_BUCKET_ENTRIES];
+	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
 
 	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
 
-	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
-
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
 
 	void *next;
@@ -193,6 +190,7 @@ struct rte_hash {
 
 struct queue_node {
 	struct rte_hash_bucket *bkt; /* Current bucket on the bfs search */
+	uint32_t cur_bkt_idx;
 
 	struct queue_node *prev;     /* Parent(bucket) in search path */
 	int prev_slot;               /* Parent(slot) in search path */
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 11d8e28..6ace64e 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -40,7 +40,10 @@ extern "C" {
 /** Flag to indicate the extendable bucket table feature should be used */
 #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
 
-/** Signature of key that is stored internally. */
+/**
+ * The type of the hash value of a key.
+ * It should be a value of at least 32 bits with a fully random pattern.
+ */
 typedef uint32_t hash_sig_t;
 
 /** Type of function that can be used for calculating the hash value. */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread
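
[Editorial note: the hash_sig_t comment above documents the precomputed
hash value accepted by the *_with_hash() variants exercised in the perf
test earlier in this series. Below is a minimal, hypothetical sketch of
that pattern (compute the hash once, reuse it for add, lookup and
delete); function and variable names are illustrative and error handling
is abbreviated.]

#include <rte_hash.h>

static int
demo_precomputed_hash(struct rte_hash *h, const void *key, void *value)
{
	void *data = NULL;
	/* Compute the 32-bit hash once and reuse it for every operation. */
	const hash_sig_t sig = rte_hash_hash(h, key);

	if (rte_hash_add_key_with_hash_data(h, key, sig, value) < 0)
		return -1;			/* no free slot for the key */
	if (rte_hash_lookup_with_hash_data(h, key, sig, &data) < 0)
		return -1;			/* key unexpectedly not found */
	return rte_hash_del_key_with_hash(h, key, sig) < 0 ? -1 : 0;
}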

* Re: [PATCH v4 3/4] test/hash: implement extendable bucket hash test
  2018-09-28 17:23     ` [PATCH v4 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
@ 2018-10-01 19:53       ` Honnappa Nagarahalli
  0 siblings, 0 replies; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-01 19:53 UTC (permalink / raw)
  To: Yipeng Wang, bruce.richardson
  Cc: konstantin.ananyev, dev, sameh.gobriel, Honnappa Nagarahalli, nd



> -----Original Message-----
> From: Yipeng Wang <yipeng1.wang@intel.com>
> Sent: Friday, September 28, 2018 12:24 PM
> To: bruce.richardson@intel.com
> Cc: konstantin.ananyev@intel.com; dev@dpdk.org; yipeng1.wang@intel.com;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> sameh.gobriel@intel.com
> Subject: [PATCH v4 3/4] test/hash: implement extendable bucket hash test
> 
> This commit changes the current rte_hash unit test to test the extendable
> table feature and performance.
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
>  test/test/test_hash.c      | 159
> +++++++++++++++++++++++++++++++++++++++++++--
>  test/test/test_hash_perf.c | 114 +++++++++++++++++++++++---------
>  2 files changed, 238 insertions(+), 35 deletions(-)
> 
> diff --git a/test/test/test_hash.c b/test/test/test_hash.c index
> b3db9fd..815c734 100644
> --- a/test/test/test_hash.c
> +++ b/test/test/test_hash.c
> @@ -660,6 +660,116 @@ static int test_full_bucket(void)
>  	return 0;
>  }
> 
> +/*
> + * Similar to the test above (full bucket test), but for extendable buckets.
> + */
> +static int test_extendable_bucket(void) {
> +	struct rte_hash_parameters params_pseudo_hash = {
> +		.name = "test5",
> +		.entries = 64,
> +		.key_len = sizeof(struct flow_key), /* 13 */
> +		.hash_func = pseudo_hash,
> +		.hash_func_init_val = 0,
> +		.socket_id = 0,
> +		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE
> +	};
> +	struct rte_hash *handle;
> +	int pos[64];
> +	int expected_pos[64];
> +	unsigned int i;
> +	struct flow_key rand_keys[64];
> +
> +	for (i = 0; i < 64; i++) {
> +		rand_keys[i].port_dst = i;
> +		rand_keys[i].port_src = i+1;
> +	}
> +
> +	handle = rte_hash_create(&params_pseudo_hash);
> +	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
> +
> +	/* Fill bucket */
> +	for (i = 0; i < 64; i++) {
> +		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
> +		print_key_info("Add", &rand_keys[i], pos[i]);
> +		RETURN_IF_ERROR(pos[i] < 0,
> +			"failed to add key (pos[%u]=%d)", i, pos[i]);
> +		expected_pos[i] = pos[i];
> +	}
> +
> +	/* Lookup */
> +	for (i = 0; i < 64; i++) {
> +		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
> +		print_key_info("Lkp", &rand_keys[i], pos[i]);
> +		RETURN_IF_ERROR(pos[i] != expected_pos[i],
> +			"failed to find key (pos[%u]=%d)", i, pos[i]);
> +	}
> +
> +	/* Add - update */
> +	for (i = 0; i < 64; i++) {
> +		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
> +		print_key_info("Add", &rand_keys[i], pos[i]);
> +		RETURN_IF_ERROR(pos[i] != expected_pos[i],
> +			"failed to add key (pos[%u]=%d)", i, pos[i]);
> +	}
> +
> +	/* Lookup */
> +	for (i = 0; i < 64; i++) {
> +		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
> +		print_key_info("Lkp", &rand_keys[i], pos[i]);
> +		RETURN_IF_ERROR(pos[i] != expected_pos[i],
> +			"failed to find key (pos[%u]=%d)", i, pos[i]);
> +	}
> +
> +	/* Delete 1 key, check other keys are still found */
> +	pos[35] = rte_hash_del_key(handle, &rand_keys[35]);
> +	print_key_info("Del", &rand_keys[35], pos[35]);
> +	RETURN_IF_ERROR(pos[35] != expected_pos[35],
> +			"failed to delete key (pos[1]=%d)", pos[35]);
> +	pos[20] = rte_hash_lookup(handle, &rand_keys[20]);
> +	print_key_info("Lkp", &rand_keys[20], pos[20]);
> +	RETURN_IF_ERROR(pos[20] != expected_pos[20],
> +			"failed lookup after deleting key from same bucket "
> +			"(pos[20]=%d)", pos[20]);
> +
> +	/* Go back to previous state */
> +	pos[35] = rte_hash_add_key(handle, &rand_keys[35]);
> +	print_key_info("Add", &rand_keys[35], pos[35]);
> +	expected_pos[35] = pos[35];
> +	RETURN_IF_ERROR(pos[35] < 0, "failed to add key (pos[1]=%d)",
> +pos[35]);
> +
> +	/* Delete */
> +	for (i = 0; i < 64; i++) {
> +		pos[i] = rte_hash_del_key(handle, &rand_keys[i]);
> +		print_key_info("Del", &rand_keys[i], pos[i]);
> +		RETURN_IF_ERROR(pos[i] != expected_pos[i],
> +			"failed to delete key (pos[%u]=%d)", i, pos[i]);
> +	}
> +
> +	/* Lookup */
> +	for (i = 0; i < 64; i++) {
> +		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
> +		print_key_info("Lkp", &rand_keys[i], pos[i]);
> +		RETURN_IF_ERROR(pos[i] != -ENOENT,
> +			"fail: found non-existent key (pos[%u]=%d)", i, pos[i]);
> +	}
> +
> +	/* Add again */
> +	for (i = 0; i < 64; i++) {
> +		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
> +		print_key_info("Add", &rand_keys[i], pos[i]);
> +		RETURN_IF_ERROR(pos[i] < 0,
> +			"failed to add key (pos[%u]=%d)", i, pos[i]);
> +		expected_pos[i] = pos[i];
> +	}
> +
> +	rte_hash_free(handle);
> +
> +	/* Cover the NULL case. */
> +	rte_hash_free(0);
> +	return 0;
> +}
> +
> 
> /*****************************************************************
> *************/
>  static int
>  fbk_hash_unit_test(void)
> @@ -1096,7 +1206,7 @@ test_hash_creation_with_good_parameters(void)
>   * Test to see the average table utilization (entries added/max entries)
>   * before hitting a random entry that cannot be added
>   */
> -static int test_average_table_utilization(void)
> +static int test_average_table_utilization(uint32_t ext_table)
>  {
>  	struct rte_hash *handle;
>  	uint8_t simple_key[MAX_KEYSIZE];
> @@ -1107,12 +1217,23 @@ static int test_average_table_utilization(void)
> 
>  	printf("\n# Running test to determine average utilization"
>  	       "\n  before adding elements begins to fail\n");
> +	if (ext_table)
> +		printf("ext table is enabled\n");
> +	else
> +		printf("ext table is disabled\n");
> +
>  	printf("Measuring performance, please wait");
>  	fflush(stdout);
>  	ut_params.entries = 1 << 16;
>  	ut_params.name = "test_average_utilization";
>  	ut_params.hash_func = rte_jhash;
> +	if (ext_table)
> +		ut_params.extra_flag |=
> RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
> +	else
> +		ut_params.extra_flag &=
> ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
> +
>  	handle = rte_hash_create(&ut_params);
> +
>  	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
> 
>  	for (j = 0; j < ITERATIONS; j++) {
> @@ -1139,6 +1260,14 @@ static int test_average_table_utilization(void)
>  			rte_hash_free(handle);
>  			return -1;
>  		}
> +		if (ext_table) {
> +			if (cnt != ut_params.entries) {
> +				printf("rte_hash_count returned wrong value
> "
> +					"%u, %u, %u\n", j, added_keys, cnt);
> +				rte_hash_free(handle);
> +				return -1;
> +			}
> +		}
> 
>  		average_keys_added += added_keys;
> 
> @@ -1161,7 +1290,7 @@ static int test_average_table_utilization(void)
>  }
> 
>  #define NUM_ENTRIES 256
> -static int test_hash_iteration(void)
> +static int test_hash_iteration(uint32_t ext_table)
>  {
>  	struct rte_hash *handle;
>  	unsigned i;
> @@ -1177,6 +1306,11 @@ static int test_hash_iteration(void)
>  	ut_params.name = "test_hash_iteration";
>  	ut_params.hash_func = rte_jhash;
>  	ut_params.key_len = 16;
> +	if (ext_table)
> +		ut_params.extra_flag |=
> RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
> +	else
> +		ut_params.extra_flag &=
> ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
> +
>  	handle = rte_hash_create(&ut_params);
>  	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
> 
> @@ -1186,8 +1320,13 @@ static int test_hash_iteration(void)
>  		for (i = 0; i < ut_params.key_len; i++)
>  			keys[added_keys][i] = rte_rand() % 255;
>  		ret = rte_hash_add_key_data(handle, keys[added_keys],
> data[added_keys]);
> -		if (ret < 0)
> +		if (ret < 0) {
> +			if (ext_table) {
> +				printf("Insertion failed for ext table\n");
> +				goto err;
> +			}
>  			break;
> +		}
>  	}
> 
>  	/* Iterate through the hash table */
> @@ -1474,6 +1613,8 @@ test_hash(void)
>  		return -1;
>  	if (test_full_bucket() < 0)
>  		return -1;
> +	if (test_extendable_bucket() < 0)
> +		return -1;
> 
>  	if (test_fbk_hash_find_existing() < 0)
>  		return -1;
> @@ -1483,9 +1624,17 @@ test_hash(void)
>  		return -1;
>  	if (test_hash_creation_with_good_parameters() < 0)
>  		return -1;
> -	if (test_average_table_utilization() < 0)
> +
> +	/* ext table disabled */
> +	if (test_average_table_utilization(0) < 0)
> +		return -1;
> +	if (test_hash_iteration(0) < 0)
> +		return -1;
> +
> +	/* ext table enabled */
> +	if (test_average_table_utilization(1) < 0)
>  		return -1;
> -	if (test_hash_iteration() < 0)
> +	if (test_hash_iteration(1) < 0)
>  		return -1;
> 
>  	run_hash_func_tests();
> diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c index
> 0d39e10..5252111 100644
> --- a/test/test/test_hash_perf.c
> +++ b/test/test/test_hash_perf.c
> @@ -18,7 +18,8 @@
>  #include "test.h"
> 
>  #define MAX_ENTRIES (1 << 19)
> -#define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
> +#define KEYS_TO_ADD (MAX_ENTRIES)
> +#define ADD_PERCENT 0.75 /* 75% table utilization */
>  #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added,
> several times */
>  /* BUCKET_SIZE should be same as RTE_HASH_BUCKET_ENTRIES in rte_hash
> library */  #define BUCKET_SIZE 8 @@ -78,7 +79,7 @@ static struct
> rte_hash_parameters ut_params = {
> 
>  static int
>  create_table(unsigned int with_data, unsigned int table_index,
> -		unsigned int with_locks)
> +		unsigned int with_locks, unsigned int ext)
>  {
>  	char name[RTE_HASH_NAMESIZE];
> 
> @@ -96,6 +97,9 @@ create_table(unsigned int with_data, unsigned int
> table_index,
>  	else
>  		ut_params.extra_flag = 0;
> 
> +	if (ext)
> +		ut_params.extra_flag |=
> RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
> +
>  	ut_params.name = name;
>  	ut_params.key_len = hashtest_key_lens[table_index];
>  	ut_params.socket_id = rte_socket_id(); @@ -117,15 +121,21 @@
> create_table(unsigned int with_data, unsigned int table_index,
> 
>  /* Shuffle the keys that have been added, so lookups will be totally random */
> static void -shuffle_input_keys(unsigned table_index)
> +shuffle_input_keys(unsigned int table_index, unsigned int ext)
>  {
>  	unsigned i;
>  	uint32_t swap_idx;
>  	uint8_t temp_key[MAX_KEYSIZE];
>  	hash_sig_t temp_signature;
>  	int32_t temp_position;
> +	unsigned int keys_to_add;
> +
> +	if (!ext)
> +		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
> +	else
> +		keys_to_add = KEYS_TO_ADD;
> 
> -	for (i = KEYS_TO_ADD - 1; i > 0; i--) {
> +	for (i = keys_to_add - 1; i > 0; i--) {
>  		swap_idx = rte_rand() % i;
> 
>  		memcpy(temp_key, keys[i], hashtest_key_lens[table_index]);
> @@ -147,14 +157,20 @@ shuffle_input_keys(unsigned table_index)
>   * ALL can fit in hash table (no errors)
>   */
>  static int
> -get_input_keys(unsigned with_pushes, unsigned table_index)
> +get_input_keys(unsigned int with_pushes, unsigned int table_index,
> +							unsigned int ext)
>  {
>  	unsigned i, j;
>  	unsigned bucket_idx, incr, success = 1;
>  	uint8_t k = 0;
>  	int32_t ret;
>  	const uint32_t bucket_bitmask = NUM_BUCKETS - 1;
> +	unsigned int keys_to_add;
> 
> +	if (!ext)
> +		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
> +	else
> +		keys_to_add = KEYS_TO_ADD;
>  	/* Reset all arrays */
>  	for (i = 0; i < MAX_ENTRIES; i++)
>  		slot_taken[i] = 0;
> @@ -171,7 +187,7 @@ get_input_keys(unsigned with_pushes, unsigned
> table_index)
>  	 * Regardless a key has been added correctly or not (success),
>  	 * the next one to try will be increased by 1.
>  	 */
> -	for (i = 0; i < KEYS_TO_ADD;) {
> +	for (i = 0; i < keys_to_add;) {
>  		incr = 0;
>  		if (i != 0) {
>  			keys[i][0] = ++k;
> @@ -235,14 +251,20 @@ get_input_keys(unsigned with_pushes, unsigned
> table_index)  }
> 
>  static int
> -timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
> +timed_adds(unsigned int with_hash, unsigned int with_data,
> +				unsigned int table_index, unsigned int ext)
>  {
>  	unsigned i;
>  	const uint64_t start_tsc = rte_rdtsc();
>  	void *data;
>  	int32_t ret;
> +	unsigned int keys_to_add;
> +	if (!ext)
> +		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
> +	else
> +		keys_to_add = KEYS_TO_ADD;
> 
> -	for (i = 0; i < KEYS_TO_ADD; i++) {
> +	for (i = 0; i < keys_to_add; i++) {
>  		data = (void *) ((uintptr_t) signatures[i]);
>  		if (with_hash && with_data) {
>  			ret =
> rte_hash_add_key_with_hash_data(h[table_index],
> @@ -284,22 +306,31 @@ timed_adds(unsigned with_hash, unsigned
> with_data, unsigned table_index)
>  	const uint64_t end_tsc = rte_rdtsc();
>  	const uint64_t time_taken = end_tsc - start_tsc;
> 
> -	cycles[table_index][ADD][with_hash][with_data] =
> time_taken/KEYS_TO_ADD;
> +	cycles[table_index][ADD][with_hash][with_data] =
> +time_taken/keys_to_add;
> 
>  	return 0;
>  }
> 
>  static int
> -timed_lookups(unsigned with_hash, unsigned with_data, unsigned
> table_index)
> +timed_lookups(unsigned int with_hash, unsigned int with_data,
> +				unsigned int table_index, unsigned int ext)
>  {
>  	unsigned i, j;
>  	const uint64_t start_tsc = rte_rdtsc();
>  	void *ret_data;
>  	void *expected_data;
>  	int32_t ret;
> -
> -	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
> -		for (j = 0; j < KEYS_TO_ADD; j++) {
> +	unsigned int keys_to_add, num_lookups;
> +
> +	if (!ext) {
> +		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
> +		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
> +	} else {
> +		keys_to_add = KEYS_TO_ADD;
> +		num_lookups = NUM_LOOKUPS;
> +	}
> +	for (i = 0; i < num_lookups / keys_to_add; i++) {
> +		for (j = 0; j < keys_to_add; j++) {
>  			if (with_hash && with_data) {
>  				ret =
> rte_hash_lookup_with_hash_data(h[table_index],
>  							(const void *) keys[j],
> @@ -352,13 +383,14 @@ timed_lookups(unsigned with_hash, unsigned
> with_data, unsigned table_index)
>  	const uint64_t end_tsc = rte_rdtsc();
>  	const uint64_t time_taken = end_tsc - start_tsc;
> 
> -	cycles[table_index][LOOKUP][with_hash][with_data] =
> time_taken/NUM_LOOKUPS;
> +	cycles[table_index][LOOKUP][with_hash][with_data] =
> +time_taken/num_lookups;
> 
>  	return 0;
>  }
> 
>  static int
> -timed_lookups_multi(unsigned with_data, unsigned table_index)
> +timed_lookups_multi(unsigned int with_data, unsigned int table_index,
> +							unsigned int ext)
>  {
>  	unsigned i, j, k;
>  	int32_t positions_burst[BURST_SIZE];
> @@ -367,11 +399,20 @@ timed_lookups_multi(unsigned with_data,
> unsigned table_index)
>  	void *ret_data[BURST_SIZE];
>  	uint64_t hit_mask;
>  	int ret;
> +	unsigned int keys_to_add, num_lookups;
> +
> +	if (!ext) {
> +		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
> +		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
> +	} else {
> +		keys_to_add = KEYS_TO_ADD;
> +		num_lookups = NUM_LOOKUPS;
> +	}
> 
>  	const uint64_t start_tsc = rte_rdtsc();
> 
> -	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
> -		for (j = 0; j < KEYS_TO_ADD/BURST_SIZE; j++) {
> +	for (i = 0; i < num_lookups/keys_to_add; i++) {
> +		for (j = 0; j < keys_to_add/BURST_SIZE; j++) {
>  			for (k = 0; k < BURST_SIZE; k++)
>  				keys_burst[k] = keys[j * BURST_SIZE + k];
>  			if (with_data) {
> @@ -419,19 +460,25 @@ timed_lookups_multi(unsigned with_data,
> unsigned table_index)
>  	const uint64_t end_tsc = rte_rdtsc();
>  	const uint64_t time_taken = end_tsc - start_tsc;
> 
> -	cycles[table_index][LOOKUP_MULTI][0][with_data] =
> time_taken/NUM_LOOKUPS;
> +	cycles[table_index][LOOKUP_MULTI][0][with_data] =
> +time_taken/num_lookups;
> 
>  	return 0;
>  }
> 
>  static int
> -timed_deletes(unsigned with_hash, unsigned with_data, unsigned
> table_index)
> +timed_deletes(unsigned int with_hash, unsigned int with_data,
> +				unsigned int table_index, unsigned int ext)
>  {
>  	unsigned i;
>  	const uint64_t start_tsc = rte_rdtsc();
>  	int32_t ret;
> +	unsigned int keys_to_add;
> +	if (!ext)
> +		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
> +	else
> +		keys_to_add = KEYS_TO_ADD;
> 
> -	for (i = 0; i < KEYS_TO_ADD; i++) {
> +	for (i = 0; i < keys_to_add; i++) {
>  		/* There are no delete functions with data, so just call two
> functions */
>  		if (with_hash)
>  			ret = rte_hash_del_key_with_hash(h[table_index],
> @@ -451,7 +498,7 @@ timed_deletes(unsigned with_hash, unsigned
> with_data, unsigned table_index)
>  	const uint64_t end_tsc = rte_rdtsc();
>  	const uint64_t time_taken = end_tsc - start_tsc;
> 
> -	cycles[table_index][DELETE][with_hash][with_data] =
> time_taken/KEYS_TO_ADD;
> +	cycles[table_index][DELETE][with_hash][with_data] =
> +time_taken/keys_to_add;
> 
>  	return 0;
>  }
> @@ -469,7 +516,8 @@ reset_table(unsigned table_index)  }
> 
>  static int
> -run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
> +run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks,
> +						unsigned int ext)
>  {
>  	unsigned i, j, with_data, with_hash;
> 
> @@ -478,25 +526,25 @@ run_all_tbl_perf_tests(unsigned int with_pushes,
> unsigned int with_locks)
> 
>  	for (with_data = 0; with_data <= 1; with_data++) {
>  		for (i = 0; i < NUM_KEYSIZES; i++) {
> -			if (create_table(with_data, i, with_locks) < 0)
> +			if (create_table(with_data, i, with_locks, ext) < 0)
>  				return -1;
> 
> -			if (get_input_keys(with_pushes, i) < 0)
> +			if (get_input_keys(with_pushes, i, ext) < 0)
>  				return -1;
>  			for (with_hash = 0; with_hash <= 1; with_hash++) {
> -				if (timed_adds(with_hash, with_data, i) < 0)
> +				if (timed_adds(with_hash, with_data, i, ext) <
> 0)
>  					return -1;
> 
>  				for (j = 0; j < NUM_SHUFFLES; j++)
> -					shuffle_input_keys(i);
> +					shuffle_input_keys(i, ext);
> 
> -				if (timed_lookups(with_hash, with_data, i) < 0)
> +				if (timed_lookups(with_hash, with_data, i, ext)
> < 0)
>  					return -1;
> 
> -				if (timed_lookups_multi(with_data, i) < 0)
> +				if (timed_lookups_multi(with_data, i, ext) < 0)
>  					return -1;
> 
> -				if (timed_deletes(with_hash, with_data, i) < 0)
> +				if (timed_deletes(with_hash, with_data, i, ext)
> < 0)
>  					return -1;
> 
>  				/* Print a dot to show progress on operations
> */ @@ -632,10 +680,16 @@ test_hash_perf(void)
>  				printf("\nALL ELEMENTS IN PRIMARY
> LOCATION\n");
>  			else
>  				printf("\nELEMENTS IN PRIMARY OR
> SECONDARY LOCATION\n");
> -			if (run_all_tbl_perf_tests(with_pushes, with_locks) <
> 0)
> +			if (run_all_tbl_perf_tests(with_pushes, with_locks, 0)
> < 0)
>  				return -1;
>  		}
>  	}
> +
> +	printf("\n EXTENDABLE BUCKETS PERFORMANCE\n");
> +
> +	if (run_all_tbl_perf_tests(1, 0, 1) < 0)
> +		return -1;
> +
>  	if (fbk_hash_perf_test() < 0)
>  		return -1;
> 
> --
> 2.7.4

Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 4/4] hash: use partial-key hashing
  2018-09-28 17:23     ` [PATCH v4 4/4] hash: use partial-key hashing Yipeng Wang
@ 2018-10-01 20:09       ` Honnappa Nagarahalli
  0 siblings, 0 replies; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-01 20:09 UTC (permalink / raw)
  To: Yipeng Wang, bruce.richardson
  Cc: konstantin.ananyev, dev, sameh.gobriel, Honnappa Nagarahalli, nd

> 
> This commit changes the hashing mechanism to "partial-key hashing" to
> calculate bucket index and signature of key.
> 
> This is proposed in Bin Fan, et al.'s paper
> "MemC3: Compact and Concurrent MemCache with Dumber Caching and
> Smarter Hashing". Basically, the idea is to use "xor" to derive the alternative
> bucket from the current bucket index and the signature.
> 
> With "partial-key hashing", the bucket memory requirement is reduced from
> two cache lines to one cache line, which improves memory efficiency and
> thus lookup speed.
> 
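As a quick illustration of the index derivation (a standalone sketch, not part of
the patch; it assumes num_buckets is a power of two and bucket_bitmask ==
num_buckets - 1, as in rte_cuckoo_hash):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint32_t hash = 0xdeadbeef;       /* full 32-bit hash of a key */
		uint32_t bucket_bitmask = 0xffff; /* e.g. 65536 buckets */

		uint16_t sig  = hash >> 16;                    /* 16-bit signature stored in the bucket */
		uint32_t prim = hash & bucket_bitmask;         /* primary bucket index */
		uint32_t sec  = (prim ^ sig) & bucket_bitmask; /* alternative bucket index */

		/* XOR is its own inverse, so the primary index can be recovered
		 * from the alternative index and the signature alone.
		 */
		printf("prim=%u sec=%u back=%u\n",
			prim, sec, (sec ^ sig) & bucket_bitmask);
		return 0;
	}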
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
>  lib/librte_hash/rte_cuckoo_hash.c | 246 +++++++++++++++++++------------------
> -
>  lib/librte_hash/rte_cuckoo_hash.h |   6 +-
>  lib/librte_hash/rte_hash.h        |   5 +-
>  3 files changed, 131 insertions(+), 126 deletions(-)
> 
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> b/lib/librte_hash/rte_cuckoo_hash.c
> index 02650b9..e101708 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -90,6 +90,36 @@ rte_hash_cmp_eq(const void *key1, const void *key2,
> const struct rte_hash *h)
>  		return cmp_jump_table[h->cmp_jump_table_idx](key1, key2,
> h->key_len);  }
> 
> +/*
> + * We use higher 16 bits of hash as the signature value stored in table.
> + * We use the lower bits for the primary bucket
> + * location. Then we XOR primary bucket location and the signature
> + * to get the secondary bucket location. This is same as
> + * proposed in Bin Fan, et al's paper
> + * "MemC3: Compact and Concurrent MemCache with Dumber Caching and
> + * Smarter Hashing". The benefit to use
> + * XOR is that one could derive the alternative bucket location
> + * by only using the current bucket location and the signature.
> + */
> +static inline uint16_t
> +get_short_sig(const hash_sig_t hash)
> +{
> +	return hash >> 16;
> +}
> +
> +static inline uint32_t
> +get_prim_bucket_index(const struct rte_hash *h, const hash_sig_t hash)
> +{
> +	return hash & h->bucket_bitmask;
> +}
> +
> +static inline uint32_t
> +get_alt_bucket_index(const struct rte_hash *h,
> +			uint32_t cur_bkt_idx, uint16_t sig)
> +{
> +	return (cur_bkt_idx ^ sig) & h->bucket_bitmask; }
> +
>  struct rte_hash *
>  rte_hash_create(const struct rte_hash_parameters *params)  { @@ -327,9
> +357,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	h->ext_table_support = ext_table_support;
> 
>  #if defined(RTE_ARCH_X86)
> -	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> -		h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
> -	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
>  		h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
>  	else
>  #endif
> @@ -417,18 +445,6 @@ rte_hash_hash(const struct rte_hash *h, const void
> *key)
>  	return h->hash_func(key, h->key_len, h->hash_func_init_val);  }
> 
> -/* Calc the secondary hash value from the primary hash value of a given key
> */ -static inline hash_sig_t -rte_hash_secondary_hash(const hash_sig_t
> primary_hash) -{
> -	static const unsigned all_bits_shift = 12;
> -	static const unsigned alt_bits_xor = 0x5bd1e995;
> -
> -	uint32_t tag = primary_hash >> all_bits_shift;
> -
> -	return primary_hash ^ ((tag + 1) * alt_bits_xor);
> -}
> -
>  int32_t
>  rte_hash_count(const struct rte_hash *h)  { @@ -560,14 +576,13 @@
> enqueue_slot_back(const struct rte_hash *h,
>  /* Search a key from bucket and update its data */  static inline int32_t
> search_and_update(const struct rte_hash *h, void *data, const void *key,
> -	struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
> +	struct rte_hash_bucket *bkt, uint16_t sig)
>  {
>  	int i;
>  	struct rte_hash_key *k, *keys = h->key_store;
> 
>  	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> -		if (bkt->sig_current[i] == sig &&
> -				bkt->sig_alt[i] == alt_hash) {
> +		if (bkt->sig_current[i] == sig) {
>  			k = (struct rte_hash_key *) ((char *)keys +
>  					bkt->key_idx[i] * h->key_entry_size);
>  			if (rte_hash_cmp_eq(key, k->key, h) == 0) { @@ -
> 594,7 +609,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>  		struct rte_hash_bucket *prim_bkt,
>  		struct rte_hash_bucket *sec_bkt,
>  		const struct rte_hash_key *key, void *data,
> -		hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
> +		uint16_t sig, uint32_t new_idx,
>  		int32_t *ret_val)
>  {
>  	unsigned int i;
> @@ -605,7 +620,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>  	/* Check if key was inserted after last check but before this
>  	 * protected region in case of inserting duplicated keys.
>  	 */
> -	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +	ret = search_and_update(h, data, key, prim_bkt, sig);
>  	if (ret != -1) {
>  		__hash_rw_writer_unlock(h);
>  		*ret_val = ret;
> @@ -613,7 +628,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>  	}
> 
>  	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> -		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		ret = search_and_update(h, data, key, cur_bkt, sig);
>  		if (ret != -1) {
>  			__hash_rw_writer_unlock(h);
>  			*ret_val = ret;
> @@ -628,7 +643,6 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>  		/* Check if slot is available */
>  		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
>  			prim_bkt->sig_current[i] = sig;
> -			prim_bkt->sig_alt[i] = alt_hash;
>  			prim_bkt->key_idx[i] = new_idx;
>  			break;
>  		}
> @@ -653,7 +667,7 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
>  			struct rte_hash_bucket *alt_bkt,
>  			const struct rte_hash_key *key, void *data,
>  			struct queue_node *leaf, uint32_t leaf_slot,
> -			hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
> +			uint16_t sig, uint32_t new_idx,
>  			int32_t *ret_val)
>  {
>  	uint32_t prev_alt_bkt_idx;
> @@ -674,7 +688,7 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
>  	/* Check if key was inserted after last check but before this
>  	 * protected region.
>  	 */
> -	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
> +	ret = search_and_update(h, data, key, bkt, sig);
>  	if (ret != -1) {
>  		__hash_rw_writer_unlock(h);
>  		*ret_val = ret;
> @@ -682,7 +696,7 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
>  	}
> 
>  	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
> -		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		ret = search_and_update(h, data, key, cur_bkt, sig);
>  		if (ret != -1) {
>  			__hash_rw_writer_unlock(h);
>  			*ret_val = ret;
> @@ -695,8 +709,9 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
>  		prev_bkt = prev_node->bkt;
>  		prev_slot = curr_node->prev_slot;
> 
> -		prev_alt_bkt_idx =
> -			prev_bkt->sig_alt[prev_slot] & h->bucket_bitmask;
> +		prev_alt_bkt_idx = get_alt_bucket_index(h,
> +					prev_node->cur_bkt_idx,
> +					prev_bkt->sig_current[prev_slot]);
> 
>  		if (unlikely(&h->buckets[prev_alt_bkt_idx]
>  				!= curr_bkt)) {
> @@ -710,10 +725,8 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
>  		 * Cuckoo insert to move elements back to its
>  		 * primary bucket if available
>  		 */
> -		curr_bkt->sig_alt[curr_slot] =
> -			 prev_bkt->sig_current[prev_slot];
>  		curr_bkt->sig_current[curr_slot] =
> -			prev_bkt->sig_alt[prev_slot];
> +			prev_bkt->sig_current[prev_slot];
>  		curr_bkt->key_idx[curr_slot] =
>  			prev_bkt->key_idx[prev_slot];
> 
> @@ -723,7 +736,6 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
>  	}
> 
>  	curr_bkt->sig_current[curr_slot] = sig;
> -	curr_bkt->sig_alt[curr_slot] = alt_hash;
>  	curr_bkt->key_idx[curr_slot] = new_idx;
> 
>  	__hash_rw_writer_unlock(h);
> @@ -741,39 +753,44 @@ rte_hash_cuckoo_make_space_mw(const struct
> rte_hash *h,
>  			struct rte_hash_bucket *bkt,
>  			struct rte_hash_bucket *sec_bkt,
>  			const struct rte_hash_key *key, void *data,
> -			hash_sig_t sig, hash_sig_t alt_hash,
> +			uint16_t sig, uint32_t bucket_idx,
>  			uint32_t new_idx, int32_t *ret_val)
>  {
>  	unsigned int i;
>  	struct queue_node queue[RTE_HASH_BFS_QUEUE_MAX_LEN];
>  	struct queue_node *tail, *head;
>  	struct rte_hash_bucket *curr_bkt, *alt_bkt;
> +	uint32_t cur_idx, alt_idx;
> 
>  	tail = queue;
>  	head = queue + 1;
>  	tail->bkt = bkt;
>  	tail->prev = NULL;
>  	tail->prev_slot = -1;
> +	tail->cur_bkt_idx = bucket_idx;
> 
>  	/* Cuckoo bfs Search */
>  	while (likely(tail != head && head <
>  					queue +
> RTE_HASH_BFS_QUEUE_MAX_LEN -
>  					RTE_HASH_BUCKET_ENTRIES)) {
>  		curr_bkt = tail->bkt;
> +		cur_idx = tail->cur_bkt_idx;
>  		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>  			if (curr_bkt->key_idx[i] == EMPTY_SLOT) {
>  				int32_t ret =
> rte_hash_cuckoo_move_insert_mw(h,
>  						bkt, sec_bkt, key, data,
> -						tail, i, sig, alt_hash,
> +						tail, i, sig,
>  						new_idx, ret_val);
>  				if (likely(ret != -1))
>  					return ret;
>  			}
> 
>  			/* Enqueue new node and keep prev node info */
> -			alt_bkt = &(h->buckets[curr_bkt->sig_alt[i]
> -						    & h->bucket_bitmask]);
> +			alt_idx = get_alt_bucket_index(h, cur_idx,
> +						curr_bkt->sig_current[i]);
> +			alt_bkt = &(h->buckets[alt_idx]);
>  			head->bkt = alt_bkt;
> +			head->cur_bkt_idx = alt_idx;
>  			head->prev = tail;
>  			head->prev_slot = i;
>  			head++;
> @@ -788,7 +805,7 @@ static inline int32_t
> __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>  						hash_sig_t sig, void *data)
>  {
> -	hash_sig_t alt_hash;
> +	uint16_t short_sig;
>  	uint32_t prim_bucket_idx, sec_bucket_idx;
>  	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
>  	struct rte_hash_key *new_k, *keys = h->key_store; @@ -803,18
> +820,17 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const
> void *key,
>  	int32_t ret_val;
>  	struct rte_hash_bucket *last;
> 
> -	prim_bucket_idx = sig & h->bucket_bitmask;
> +	short_sig = get_short_sig(sig);
> +	prim_bucket_idx = get_prim_bucket_index(h, sig);
> +	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
>  	prim_bkt = &h->buckets[prim_bucket_idx];
> -	rte_prefetch0(prim_bkt);
> -
> -	alt_hash = rte_hash_secondary_hash(sig);
> -	sec_bucket_idx = alt_hash & h->bucket_bitmask;
>  	sec_bkt = &h->buckets[sec_bucket_idx];
> +	rte_prefetch0(prim_bkt);
>  	rte_prefetch0(sec_bkt);
> 
>  	/* Check if key is already inserted in primary location */
>  	__hash_rw_writer_lock(h);
> -	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +	ret = search_and_update(h, data, key, prim_bkt, short_sig);
>  	if (ret != -1) {
>  		__hash_rw_writer_unlock(h);
>  		return ret;
> @@ -822,12 +838,13 @@ __rte_hash_add_key_with_hash(const struct
> rte_hash *h, const void *key,
> 
>  	/* Check if key is already inserted in secondary location */
>  	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> -		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		ret = search_and_update(h, data, key, cur_bkt, short_sig);
>  		if (ret != -1) {
>  			__hash_rw_writer_unlock(h);
>  			return ret;
>  		}
>  	}
> +
>  	__hash_rw_writer_unlock(h);
> 
>  	/* Did not find a match, so get a new slot for storing the new key */
> @@ -865,7 +882,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash
> *h, const void *key,
> 
>  	/* Find an empty slot and insert */
>  	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
> -					sig, alt_hash, new_idx, &ret_val);
> +					short_sig, new_idx, &ret_val);
>  	if (ret == 0)
>  		return new_idx - 1;
>  	else if (ret == 1) {
> @@ -875,7 +892,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash
> *h, const void *key,
> 
>  	/* Primary bucket full, need to make space for new entry */
>  	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key,
> data,
> -					sig, alt_hash, new_idx, &ret_val);
> +				short_sig, prim_bucket_idx, new_idx,
> &ret_val);
>  	if (ret == 0)
>  		return new_idx - 1;
>  	else if (ret == 1) {
> @@ -885,7 +902,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash
> *h, const void *key,
> 
>  	/* Also search secondary bucket to get better occupancy */
>  	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key,
> data,
> -					alt_hash, sig, new_idx, &ret_val);
> +				short_sig, sec_bucket_idx, new_idx, &ret_val);
> 
>  	if (ret == 0)
>  		return new_idx - 1;
> @@ -905,14 +922,14 @@ __rte_hash_add_key_with_hash(const struct
> rte_hash *h, const void *key,
>  	 */
>  	__hash_rw_writer_lock(h);
>  	/* We check for duplicates again since could be inserted before the
> lock */
> -	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +	ret = search_and_update(h, data, key, prim_bkt, short_sig);
>  	if (ret != -1) {
>  		enqueue_slot_back(h, cached_free_slots, slot_id);
>  		goto failure;
>  	}
> 
>  	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> -		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		ret = search_and_update(h, data, key, cur_bkt, short_sig);
>  		if (ret != -1) {
>  			enqueue_slot_back(h, cached_free_slots, slot_id);
>  			goto failure;
> @@ -924,8 +941,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash
> *h, const void *key,
>  		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>  			/* Check if slot is available */
>  			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
> -				cur_bkt->sig_current[i] = alt_hash;
> -				cur_bkt->sig_alt[i] = sig;
> +				cur_bkt->sig_current[i] = short_sig;
>  				cur_bkt->key_idx[i] = new_idx;
>  				__hash_rw_writer_unlock(h);
>  				return new_idx - 1;
> @@ -943,8 +959,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash
> *h, const void *key,
> 
>  	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
>  	/* Use the first location of the new bucket */
> -	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
> -	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
> +	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
>  	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
>  	/* Link the new bucket to sec bucket linked list */
>  	last = rte_hash_get_last_bkt(sec_bkt); @@ -1003,7 +1018,7 @@
> rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
> 
>  /* Search one bucket to find the match key */  static inline int32_t -
> search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
> +search_one_bucket(const struct rte_hash *h, const void *key, uint16_t
> +sig,
>  			void **data, const struct rte_hash_bucket *bkt)  {
>  	int i;
> @@ -1032,30 +1047,30 @@ static inline int32_t
> __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
>  					hash_sig_t sig, void **data)
>  {
> -	uint32_t bucket_idx;
> -	hash_sig_t alt_hash;
> +	uint32_t prim_bucket_idx, sec_bucket_idx;
>  	struct rte_hash_bucket *bkt, *cur_bkt;
>  	int ret;
> +	uint16_t short_sig;
> 
> -	bucket_idx = sig & h->bucket_bitmask;
> -	bkt = &h->buckets[bucket_idx];
> +	short_sig = get_short_sig(sig);
> +	prim_bucket_idx = get_prim_bucket_index(h, sig);
> +	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
> +	bkt = &h->buckets[prim_bucket_idx];
> 
>  	__hash_rw_reader_lock(h);
> 
>  	/* Check if key is in primary location */
> -	ret = search_one_bucket(h, key, sig, data, bkt);
> +	ret = search_one_bucket(h, key, short_sig, data, bkt);
>  	if (ret != -1) {
>  		__hash_rw_reader_unlock(h);
>  		return ret;
>  	}
>  	/* Calculate secondary hash */
> -	alt_hash = rte_hash_secondary_hash(sig);
> -	bucket_idx = alt_hash & h->bucket_bitmask;
> -	bkt = &h->buckets[bucket_idx];
> +	bkt = &h->buckets[sec_bucket_idx];
> 
>  	/* Check if key is in secondary location */
>  	FOR_EACH_BUCKET(cur_bkt, bkt) {
> -		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
> +		ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
>  		if (ret != -1) {
>  			__hash_rw_reader_unlock(h);
>  			return ret;
> @@ -1102,7 +1117,6 @@ remove_entry(const struct rte_hash *h, struct
> rte_hash_bucket *bkt, unsigned i)
>  	struct lcore_cache *cached_free_slots;
> 
>  	bkt->sig_current[i] = NULL_SIGNATURE;
> -	bkt->sig_alt[i] = NULL_SIGNATURE;
>  	if (h->multi_writer_support) {
>  		lcore_id = rte_lcore_id();
>  		cached_free_slots = &h->local_free_slots[lcore_id]; @@ -
> 1141,9 +1155,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt,
> int pos) {
>  		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
>  			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
>  			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
> -			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
>  			last_bkt->sig_current[i] = NULL_SIGNATURE;
> -			last_bkt->sig_alt[i] = NULL_SIGNATURE;
>  			last_bkt->key_idx[i] = EMPTY_SLOT;
>  			return;
>  		}
> @@ -1153,7 +1165,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket
> *cur_bkt, int pos) {
>  /* Search one bucket and remove the matched key */  static inline int32_t
> search_and_remove(const struct rte_hash *h, const void *key,
> -			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
> +			struct rte_hash_bucket *bkt, uint16_t sig, int *pos)
>  {
>  	struct rte_hash_key *k, *keys = h->key_store;
>  	unsigned int i;
> @@ -1185,19 +1197,21 @@ static inline int32_t
> __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
>  						hash_sig_t sig)
>  {
> -	uint32_t bucket_idx;
> -	hash_sig_t alt_hash;
> +	uint32_t prim_bucket_idx, sec_bucket_idx;
>  	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
>  	struct rte_hash_bucket *cur_bkt;
>  	int pos;
>  	int32_t ret, i;
> +	uint16_t short_sig;
> 
> -	bucket_idx = sig & h->bucket_bitmask;
> -	prim_bkt = &h->buckets[bucket_idx];
> +	short_sig = get_short_sig(sig);
> +	prim_bucket_idx = get_prim_bucket_index(h, sig);
> +	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
> +	prim_bkt = &h->buckets[prim_bucket_idx];
> 
>  	__hash_rw_writer_lock(h);
>  	/* look for key in primary bucket */
> -	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
> +	ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
>  	if (ret != -1) {
>  		__rte_hash_compact_ll(prim_bkt, pos);
>  		last_bkt = prim_bkt->next;
> @@ -1206,12 +1220,10 @@ __rte_hash_del_key_with_hash(const struct
> rte_hash *h, const void *key,
>  	}
> 
>  	/* Calculate secondary hash */
> -	alt_hash = rte_hash_secondary_hash(sig);
> -	bucket_idx = alt_hash & h->bucket_bitmask;
> -	sec_bkt = &h->buckets[bucket_idx];
> +	sec_bkt = &h->buckets[sec_bucket_idx];
> 
>  	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> -		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
> +		ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
>  		if (ret != -1) {
>  			__rte_hash_compact_ll(cur_bkt, pos);
>  			last_bkt = sec_bkt->next;
> @@ -1288,55 +1300,35 @@ static inline void  compare_signatures(uint32_t
> *prim_hash_matches, uint32_t *sec_hash_matches,
>  			const struct rte_hash_bucket *prim_bkt,
>  			const struct rte_hash_bucket *sec_bkt,
> -			hash_sig_t prim_hash, hash_sig_t sec_hash,
> +			uint16_t sig,
>  			enum rte_hash_sig_compare_function sig_cmp_fn)  {
>  	unsigned int i;
> 
> +	/* For match mask the first bit of every two bits indicates the match
> +*/
>  	switch (sig_cmp_fn) {
> -#ifdef RTE_MACHINE_CPUFLAG_AVX2
> -	case RTE_HASH_COMPARE_AVX2:
> -		*prim_hash_matches =
> _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
> -				_mm256_load_si256(
> -					(__m256i const *)prim_bkt-
> >sig_current),
> -				_mm256_set1_epi32(prim_hash)));
> -		*sec_hash_matches =
> _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
> -				_mm256_load_si256(
> -					(__m256i const *)sec_bkt-
> >sig_current),
> -				_mm256_set1_epi32(sec_hash)));
> -		break;
> -#endif
>  #ifdef RTE_MACHINE_CPUFLAG_SSE2
>  	case RTE_HASH_COMPARE_SSE:
> -		/* Compare the first 4 signatures in the bucket */
> -		*prim_hash_matches =
> _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> +		/* Compare all signatures in the bucket */
> +		*prim_hash_matches =
> _mm_movemask_epi8(_mm_cmpeq_epi16(
>  				_mm_load_si128(
>  					(__m128i const *)prim_bkt-
> >sig_current),
> -				_mm_set1_epi32(prim_hash)));
> -		*prim_hash_matches |=
> (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> -				_mm_load_si128(
> -					(__m128i const *)&prim_bkt-
> >sig_current[4]),
> -				_mm_set1_epi32(prim_hash)))) << 4;
> -		/* Compare the first 4 signatures in the bucket */
> -		*sec_hash_matches =
> _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> +				_mm_set1_epi16(sig)));
> +		/* Compare all signatures in the bucket */
> +		*sec_hash_matches =
> _mm_movemask_epi8(_mm_cmpeq_epi16(
>  				_mm_load_si128(
>  					(__m128i const *)sec_bkt-
> >sig_current),
> -				_mm_set1_epi32(sec_hash)));
> -		*sec_hash_matches |=
> (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> -				_mm_load_si128(
> -					(__m128i const *)&sec_bkt-
> >sig_current[4]),
> -				_mm_set1_epi32(sec_hash)))) << 4;
> +				_mm_set1_epi16(sig)));
>  		break;
>  #endif
>  	default:
>  		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>  			*prim_hash_matches |=
> -				((prim_hash == prim_bkt->sig_current[i]) << i);
> +				((sig == prim_bkt->sig_current[i]) << (i << 1));
>  			*sec_hash_matches |=
> -				((sec_hash == sec_bkt->sig_current[i]) << i);
> +				((sig == sec_bkt->sig_current[i]) << (i << 1));
>  		}
>  	}
> -
>  }
> 
>  #define PREFETCH_OFFSET 4
> @@ -1349,7 +1341,9 @@ __rte_hash_lookup_bulk(const struct rte_hash *h,
> const void **keys,
>  	int32_t i;
>  	int32_t ret;
>  	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
> -	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
> +	uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
> +	uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
> +	uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
>  	const struct rte_hash_bucket
> *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>  	const struct rte_hash_bucket
> *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>  	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0}; @@ -
> 1368,10 +1362,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h,
> const void **keys,
>  		rte_prefetch0(keys[i + PREFETCH_OFFSET]);
> 
>  		prim_hash[i] = rte_hash_hash(h, keys[i]);
> -		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
> 
> -		primary_bkt[i] = &h->buckets[prim_hash[i] & h-
> >bucket_bitmask];
> -		secondary_bkt[i] = &h->buckets[sec_hash[i] & h-
> >bucket_bitmask];
> +		sig[i] = get_short_sig(prim_hash[i]);
> +		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
> +		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
> +
> +		primary_bkt[i] = &h->buckets[prim_index[i]];
> +		secondary_bkt[i] = &h->buckets[sec_index[i]];
> 
>  		rte_prefetch0(primary_bkt[i]);
>  		rte_prefetch0(secondary_bkt[i]);
> @@ -1380,10 +1377,13 @@ __rte_hash_lookup_bulk(const struct rte_hash
> *h, const void **keys,
>  	/* Calculate and prefetch rest of the buckets */
>  	for (; i < num_keys; i++) {
>  		prim_hash[i] = rte_hash_hash(h, keys[i]);
> -		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
> 
> -		primary_bkt[i] = &h->buckets[prim_hash[i] & h-
> >bucket_bitmask];
> -		secondary_bkt[i] = &h->buckets[sec_hash[i] & h-
> >bucket_bitmask];
> +		sig[i] = get_short_sig(prim_hash[i]);
> +		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
> +		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
> +
> +		primary_bkt[i] = &h->buckets[prim_index[i]];
> +		secondary_bkt[i] = &h->buckets[sec_index[i]];
> 
>  		rte_prefetch0(primary_bkt[i]);
>  		rte_prefetch0(secondary_bkt[i]);
> @@ -1394,10 +1394,11 @@ __rte_hash_lookup_bulk(const struct rte_hash
> *h, const void **keys,
>  	for (i = 0; i < num_keys; i++) {
>  		compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
>  				primary_bkt[i], secondary_bkt[i],
> -				prim_hash[i], sec_hash[i], h->sig_cmp_fn);
> +				sig[i], h->sig_cmp_fn);
> 
>  		if (prim_hitmask[i]) {
> -			uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
> +			uint32_t first_hit =
> +					__builtin_ctzl(prim_hitmask[i]) >> 1;
>  			uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
>  			const struct rte_hash_key *key_slot =
>  				(const struct rte_hash_key *)(
> @@ -1408,7 +1409,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h,
> const void **keys,
>  		}
> 
>  		if (sec_hitmask[i]) {
> -			uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
> +			uint32_t first_hit =
> +					__builtin_ctzl(sec_hitmask[i]) >> 1;
>  			uint32_t key_idx = secondary_bkt[i]-
> >key_idx[first_hit];
>  			const struct rte_hash_key *key_slot =
>  				(const struct rte_hash_key *)(
> @@ -1422,7 +1424,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h,
> const void **keys,
>  	for (i = 0; i < num_keys; i++) {
>  		positions[i] = -ENOENT;
>  		while (prim_hitmask[i]) {
> -			uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
> +			uint32_t hit_index =
> +					__builtin_ctzl(prim_hitmask[i]) >> 1;
> 
>  			uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
>  			const struct rte_hash_key *key_slot = @@ -1441,11
> +1444,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void
> **keys,
>  				positions[i] = key_idx - 1;
>  				goto next_key;
>  			}
> -			prim_hitmask[i] &= ~(1 << (hit_index));
> +			prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
>  		}
> 
>  		while (sec_hitmask[i]) {
> -			uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
> +			uint32_t hit_index =
> +					__builtin_ctzl(sec_hitmask[i]) >> 1;
> 
>  			uint32_t key_idx = secondary_bkt[i]-
> >key_idx[hit_index];
>  			const struct rte_hash_key *key_slot = @@ -1465,7
> +1469,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void
> **keys,
>  				positions[i] = key_idx - 1;
>  				goto next_key;
>  			}
> -			sec_hitmask[i] &= ~(1 << (hit_index));
> +			sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
>  		}
> 
>  next_key:
> @@ -1488,10 +1492,10 @@ __rte_hash_lookup_bulk(const struct rte_hash
> *h, const void **keys,
>  		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
>  			if (data != NULL)
>  				ret = search_one_bucket(h, keys[i],
> -						sec_hash[i], &data[i],
> cur_bkt);
> +						sig[i], &data[i], cur_bkt);
>  			else
>  				ret = search_one_bucket(h, keys[i],
> -						sec_hash[i], NULL, cur_bkt);
> +						sig[i], NULL, cur_bkt);
>  			if (ret != -1) {
>  				positions[i] = ret;
>  				hits |= 1ULL << i;
> diff --git a/lib/librte_hash/rte_cuckoo_hash.h
> b/lib/librte_hash/rte_cuckoo_hash.h
> index e601520..7753cd8 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.h
> +++ b/lib/librte_hash/rte_cuckoo_hash.h
> @@ -129,18 +129,15 @@ struct rte_hash_key {  enum
> rte_hash_sig_compare_function {
>  	RTE_HASH_COMPARE_SCALAR = 0,
>  	RTE_HASH_COMPARE_SSE,
> -	RTE_HASH_COMPARE_AVX2,
>  	RTE_HASH_COMPARE_NUM
>  };
> 
>  /** Bucket structure */
>  struct rte_hash_bucket {
> -	hash_sig_t sig_current[RTE_HASH_BUCKET_ENTRIES];
> +	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
> 
>  	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
> 
> -	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
> -
>  	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
> 
>  	void *next;
> @@ -193,6 +190,7 @@ struct rte_hash {
> 
>  struct queue_node {
>  	struct rte_hash_bucket *bkt; /* Current bucket on the bfs search */
> +	uint32_t cur_bkt_idx;
> 
>  	struct queue_node *prev;     /* Parent(bucket) in search path */
>  	int prev_slot;               /* Parent(slot) in search path */
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h index
> 11d8e28..6ace64e 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -40,7 +40,10 @@ extern "C" {
>  /** Flag to indicate the extendable bucket table feature should be used */
> #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
> 
> -/** Signature of key that is stored internally. */
> +/**
> + * The type of hash value of a key.
> + * It should be a value of at least 32bit with fully random pattern.
> + */
>  typedef uint32_t hash_sig_t;
> 
>  /** Type of function that can be used for calculating the hash value. */
> --
> 2.7.4
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 1/4] hash: fix race condition in iterate
  2018-09-28 17:23     ` [PATCH v4 1/4] hash: fix race condition in iterate Yipeng Wang
@ 2018-10-01 20:23       ` Honnappa Nagarahalli
  2018-10-02  0:17         ` Wang, Yipeng1
  0 siblings, 1 reply; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-01 20:23 UTC (permalink / raw)
  To: Yipeng Wang, bruce.richardson; +Cc: konstantin.ananyev, dev, sameh.gobriel, nd

> 
> In rte_hash_iterate, the reader lock did not protect the while loop which
> checks for an empty entry. This created a race condition: the entry may become
> empty between the check and taking the lock, so a wrong key/data value would be read out.
> 
> This commit extends the protected region.
> 
> Fixes: f2e3001b53ec ("hash: support read/write concurrency")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  lib/librte_hash/rte_cuckoo_hash.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> b/lib/librte_hash/rte_cuckoo_hash.c
> index f7b86c8..eba13e9 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -1317,16 +1317,19 @@ rte_hash_iterate(const struct rte_hash *h, const
> void **key, void **data, uint32
>  	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>  	idx = *next % RTE_HASH_BUCKET_ENTRIES;
> 
> +	__hash_rw_reader_lock(h);
This does not work well with the lock-less changes I am making.  We should leave the lock in its original position. Instead change the while loop as follows:

while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT)

>  	/* If current position is empty, go to the next one */
>  	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>  		(*next)++;
>  		/* End of table */
> -		if (*next == total_entries)
> +		if (*next == total_entries) {
> +			__hash_rw_reader_unlock(h);
>  			return -ENOENT;
> +		}
>  		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>  		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>  	}
> -	__hash_rw_reader_lock(h);
> +
>  	/* Get position of entry in key table */
>  	position = h->buckets[bucket_idx].key_idx[idx];
If we change the while loop as I suggested as above, we can remove this line.
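For clarity, the resulting code would look roughly like this (a sketch of the
suggested shape, not the final patch):

	/* Read key_idx once per iteration so 'position' can never be observed
	 * as EMPTY_SLOT after the loop exits.
	 */
	while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT) {
		(*next)++;
		/* End of table */
		if (*next == total_entries)
			return -ENOENT;
		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
		idx = *next % RTE_HASH_BUCKET_ENTRIES;
	}
	/* 'position' already holds the entry index; no separate read of
	 * key_idx is needed afterwards.
	 */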

>  	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 1/5] test/hash: fix bucket size in hash perf test
  2018-09-28 14:11     ` [PATCH v4 1/5] test/hash: fix bucket size in hash perf test Yipeng Wang
@ 2018-10-01 20:28       ` Honnappa Nagarahalli
  0 siblings, 0 replies; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-01 20:28 UTC (permalink / raw)
  To: Yipeng Wang, bruce.richardson
  Cc: dev, sameh.gobriel, nd, Honnappa Nagarahalli

> 
> The bucket size was changed from 4 to 8 but the corresponding perf test was
> not changed accordingly.
> 
> In the test, the bucket size and number of buckets are used to map to the
> underlying rte_hash structure. They are used to test performance under two
> conditions: keys in primary buckets only, and keys in both primary and
> secondary buckets.
> 
> Although there is no functional issue with the bucket size set to 4, it mismatches
> the underlying rte_hash structure, which may affect code readability and
> future extension.
> 
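For reference, with the constants in the diff below, MAX_ENTRIES = 1 << 19 and
BUCKET_SIZE = 8 give NUM_BUCKETS = MAX_ENTRIES / BUCKET_SIZE = 65536, which now
matches a table built with RTE_HASH_BUCKET_ENTRIES = 8.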
> Fixes: 58017c98ed53 ("hash: add vectorized comparison")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  test/test/test_hash_perf.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c index
> 33dcb9f..fe11632 100644
> --- a/test/test/test_hash_perf.c
> +++ b/test/test/test_hash_perf.c
> @@ -20,7 +20,8 @@
>  #define MAX_ENTRIES (1 << 19)
>  #define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
> #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added,
> several times */ -#define BUCKET_SIZE 4
> +/* BUCKET_SIZE should be same as RTE_HASH_BUCKET_ENTRIES in rte_hash
> +library */ #define BUCKET_SIZE 8
>  #define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)  #define
> MAX_KEYSIZE 64  #define NUM_KEYSIZES 10
> --
> 2.7.4
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 5/7] hash: add extendable bucket feature
  2018-09-29  1:10       ` Wang, Yipeng1
@ 2018-10-01 20:56         ` Honnappa Nagarahalli
  2018-10-02  1:56           ` Wang, Yipeng1
  0 siblings, 1 reply; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-01 20:56 UTC (permalink / raw)
  To: Wang, Yipeng1, Richardson, Bruce
  Cc: dev, michel, Gobriel, Sameh, Honnappa Nagarahalli, nd

> >-----Original Message-----
> >From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> >>
> >> The extendable bucket table is composed of buckets that can be linked as a list
> >> to the current main table. When the extendable bucket feature is enabled, the table
> >> utilization can always achieve 100%.
> >IMO, referring to this as 'table utilization' indicates an efficiency
> >about memory utilization. Please consider changing this to indicate that all of
> the configured number of entries will be accommodated?
> >
> [Wang, Yipeng] Improved in V4, please check! Thanks!
> 
> >> +snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> >> +params-
> >> >name);
> >Can be inside the if statement below.
> [Wang, Yipeng] Done in V3, Thanks!
> >
> >> +/* Populate ext bkt ring. We reserve 0 similar to the
> >> + * key-data slot, just in case in future we want to
> >> + * use bucket index for the linked list and 0 means NULL
> >> + * for next bucket
> >> + */
> >> +for (i = 1; i <= num_buckets; i++)
> >Since, the bucket index 0 is reserved, should be 'i < num_buckets'
> [Wang, Yipeng]  So the bucket array is from 0 to num_buckets - 1, and the
> index Is from 1 to num_buckets. So I guess reserving 0 means reserving the
> index 0 but not reduce the usable bucket count.
> So I guess we still need to enqueue index of 1 to num_buckets into the free
> Bucket ring for use?
I understand it now. I mis-read the 'similar to the key-data slot' comment. I see that the changes are correct.
Minor comment, I am not particular: I think it makes sense to change it to the same logic followed for key-data slot. i.e. allocate an extra bucket.

> >
> >>  rte_free(h->key_store);
> >>  rte_free(h->buckets);
> >Add rte_free(h->buckets_ext);
> [Wang, Yipeng] Done in V3, thanks!
> >
> >> +for (i = 1; i < h->num_buckets + 1; i++)
Minor comment:
If we are not changing the logic, I suggest we change the for loop as follows (like it is done earlier)
for (i = 1; i <= h->num_buckets; i++)


> >Index 0 is reserved as per the comments. Condition should be 'i < h-
> >num_buckets'.
> [Wang, Yipeng] Similar to the previous one, I guess we still need the
> num_buckets indices to be inserted in the ring.
> >
> >> +bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
> >If index 0 is reserved, -1 is not required.
> >
> [Wang, Yipeng] Similar to the previous one, bkt_id is the subscript into the array, so
> it ranges from 0 to num_buckets - 1,
> while the index ranges from 1 to num_buckets. So every time we get
> a bucket index we need to subtract 1 to get the bucket array subscript.
> >> +if (tobe_removed_bkt) {
> >> +uint32_t index = tobe_removed_bkt - h->buckets_ext + 1;
> >No need to increase the index by 1 if entry 0 is reserved.
> >
> [Wang, Yipeng] Similar to previous one.
> >> @@ -1308,10 +1519,13 @@ rte_hash_iterate(const struct rte_hash *h,
> >> const void **key, void **data, uint32
> >>
> >>  RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
> >>
> >> -const uint32_t total_entries = h->num_buckets *
> >> RTE_HASH_BUCKET_ENTRIES;
> >> +const uint32_t total_entries_main = h->num_buckets *
> >> +
> >> RTE_HASH_BUCKET_ENTRIES;
> >> +const uint32_t total_entries = total_entries_main << 1;
> >> +
> >>  /* Out of bounds */
> >Minor: update the comment to reflect the new code.
> [Wang, Yipeng] Done in V4, thanks!
> >
> >> @@ -1341,4 +1555,32 @@ rte_hash_iterate(const struct rte_hash *h,
> >> const void **key, void **data, uint32  (*next)++;
> >>
> >>  return position - 1;
> >> +
> >> +extend_table:
> >> +/* Out of bounds */
> >> +if (*next >= total_entries || !h->ext_table_support) return -ENOENT;
> >> +
> >> +bucket_idx = (*next - total_entries_main) /
> >> RTE_HASH_BUCKET_ENTRIES;
> >> +idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
> >> +
> >> +while (h->buckets_ext[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> >> +(*next)++; if (*next == total_entries) return -ENOENT; bucket_idx =
> >> +(*next - total_entries_main) / RTE_HASH_BUCKET_ENTRIES; idx = (*next
> >> +- total_entries_main) %
> >> RTE_HASH_BUCKET_ENTRIES;
> >> +}
> >> +/* Get position of entry in key table */ position =
> >> +h->buckets_ext[bucket_idx].key_idx[idx];
> >There is a possibility that 'position' is not the same value read in
> >the while loop. It presents a problem if 'position' becomes EMPTY_SLOT.
> >'position' should be read as part of the while loop. Since it is 32b value, it
> should be atomic on most platforms. This issue applies to existing code as
> well.
> >
> [Wang, Yipeng] I agree. I added a new bug fix commit to fix this in V4. Basically I
> just extend the current critical region to cover the while loop. Please check if
> that works. Thanks.
> 
> >__hash_rw_reader_lock(h) required
> >> +next_key = (struct rte_hash_key *) ((char *)h->key_store + position
> >> +* h->key_entry_size);
> >> +/* Return key and data */
> >> +*key = next_key->key;
> >> +*data = next_key->pdata;
> >> +
> >__hash_rw_reader_unlock(h) required
> [Wang, Yipeng] Agree, done in V4.  Thanks!
> >

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 1/4] hash: fix race condition in iterate
  2018-10-01 20:23       ` Honnappa Nagarahalli
@ 2018-10-02  0:17         ` Wang, Yipeng1
  2018-10-02  4:26           ` Honnappa Nagarahalli
  0 siblings, 1 reply; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-10-02  0:17 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce
  Cc: Ananyev, Konstantin, dev, Gobriel, Sameh, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> @@ -1317,16 +1317,19 @@ rte_hash_iterate(const struct rte_hash *h, const
>> void **key, void **data, uint32
>>  	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>>  	idx = *next % RTE_HASH_BUCKET_ENTRIES;
>>
>> +	__hash_rw_reader_lock(h);
>This does not work well with the lock-less changes I am making.  We should leave the lock in its original position. Instead change the
>while loop as follows:
>
>while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT)
>
>>  	/* If current position is empty, go to the next one */
>>  	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>>  		(*next)++;
>>  		/* End of table */
>> -		if (*next == total_entries)
>> +		if (*next == total_entries) {
>> +			__hash_rw_reader_unlock(h);
>>  			return -ENOENT;
>> +		}
>>  		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>>  		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>>  	}
>> -	__hash_rw_reader_lock(h);
>> +
>>  	/* Get position of entry in key table */
>>  	position = h->buckets[bucket_idx].key_idx[idx];
>If we change the while loop as I suggested as above, we can remove this line.
>
>>  	next_key = (struct rte_hash_key *) ((char *)h->key_store +

[Wang, Yipeng] Sorry that I did not realize you already have it in your patch set and I agree.
Do you want to export it as a bug fix in your patch set? I will remove my change.

For the lock-free case, do we need to protect it with a version counter? Imagine the following corner case:
while the iterator is reading the key and data, a writer deletes, removes, and recycles the key-data pair,
and writes a new key and data into it. While the writer is writing, will the reader read out a wrong key/data, or
mismatched key/data?
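Roughly, the interleaving I am worried about (the sequence below is hypothetical;
field names are from rte_cuckoo_hash):

	/* reader (rte_hash_iterate)           writer (delete + add)
	 *
	 * position = bkt->key_idx[idx];       // sees the old entry
	 *                                     rte_hash_del_key(): key slot freed
	 *                                     rte_hash_add_key(): same slot
	 *                                     recycled, new key/data being written
	 * k = key_store + position * key_entry_size;
	 * *key  = k->key;                     // may be the new, half-written key
	 * *data = k->pdata;                   // may belong to a different key
	 *
	 * Reading a version counter before and after copying key/data (and
	 * retrying on change) would be one way to detect this.
	 */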

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v2 5/7] hash: add extendable bucket feature
  2018-10-01 20:56         ` Honnappa Nagarahalli
@ 2018-10-02  1:56           ` Wang, Yipeng1
  0 siblings, 0 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-10-02  1:56 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce; +Cc: dev, michel, Gobriel, Sameh, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> >> +for (i = 1; i <= num_buckets; i++)
>> >Since, the bucket index 0 is reserved, should be 'i < num_buckets'
>> [Wang, Yipeng]  So the bucket array is from 0 to num_buckets - 1, and the
>> index Is from 1 to num_buckets. So I guess reserving 0 means reserving the
>> index 0 but not reduce the usable bucket count.
>> So I guess we still need to enqueue index of 1 to num_buckets into the free
>> Bucket ring for use?
>I understand it now. I mis-read the 'similar to the key-data slot' comment. I see that the changes are correct.
>Minor comment, I am not particular: I think it makes sense to change it to the same logic followed for key-data slot. i.e. allocate an
>extra bucket.
[Wang, Yipeng] hmm, I think the key-data slot logic does a similar thing and allocates the same number of slots.
As I re-read the code, maybe the current code allocates one more than needed?
>
>> >
>> >>  rte_free(h->key_store);
>> >>  rte_free(h->buckets);
>> >Add rte_free(h->buckets_ext);
>> [Wang, Yipeng] Done in V3, thanks!
>> >
>> >> +for (i = 1; i < h->num_buckets + 1; i++)
>Minor comment:
>If we are not changing the logic, I suggest we change the for loop as follows (like it is done earlier)
>for (i = 1; i <= h->num_buckets; i++)
>
[Wang, Yipeng] Thanks, I did in V5.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 2/4] hash: add extendable bucket feature
  2018-09-28 17:23     ` [PATCH v4 2/4] hash: add extendable bucket feature Yipeng Wang
@ 2018-10-02  3:58       ` Honnappa Nagarahalli
  2018-10-02 23:39         ` Wang, Yipeng1
  2018-10-03 15:08       ` Stephen Hemminger
  1 sibling, 1 reply; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-02  3:58 UTC (permalink / raw)
  To: Yipeng Wang, bruce.richardson
  Cc: konstantin.ananyev, dev, sameh.gobriel, Honnappa Nagarahalli, nd

> 
> In use cases where hash table capacity needs to be guaranteed, the extendable
> bucket feature can be used to contain extra keys in linked lists when conflicts
> happen. This is a similar concept to the extendable bucket hash table in the packet
> framework.
> 
> This commit adds the extendable bucket feature. Users can turn it on or off
> through the extra flag field at table creation time.
> 
> The extendable bucket table is composed of buckets that can be linked as a list to the
> current main table. When the extendable bucket feature is enabled, the hash table load can
> always achieve 100%.
> In other words, the table can always accommodate the same number of keys as
> the specified table size. This provides a 100% table capacity guarantee.
> Although keys ending up in the ext buckets may have a longer lookup time, they
> should be rare due to the cuckoo algorithm.
> 
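For reference, a minimal usage sketch (parameter values below are illustrative
only, not from the patch):

	struct rte_hash_parameters params = {
		.name = "ext_tbl",
		.entries = 1 << 16,     /* all 64K keys are guaranteed to fit */
		.key_len = 16,
		.socket_id = rte_socket_id(),
		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
	};
	struct rte_hash *tbl = rte_hash_create(&params);
	/* Keys that cannot be placed even after cuckoo displacement go into
	 * linked extendable buckets instead of failing the insertion.
	 */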
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
>  lib/librte_hash/rte_cuckoo_hash.c | 376
> ++++++++++++++++++++++++++++++++------
>  lib/librte_hash/rte_cuckoo_hash.h |   5 +
>  lib/librte_hash/rte_hash.h        |   3 +
>  3 files changed, 331 insertions(+), 53 deletions(-)
> 
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> b/lib/librte_hash/rte_cuckoo_hash.c
> index eba13e9..02650b9 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -31,6 +31,10 @@
>  #include "rte_hash.h"
>  #include "rte_cuckoo_hash.h"
> 
> +#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)
> \
> +	for (CURRENT_BKT = START_BUCKET;                                      \
> +		CURRENT_BKT != NULL;                                          \
> +		CURRENT_BKT = CURRENT_BKT->next)
> 
>  TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
> 
> @@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
>  	return h;
>  }
> 
> +static inline struct rte_hash_bucket *
> +rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt) {
> +	while (lst_bkt->next != NULL)
> +		lst_bkt = lst_bkt->next;
> +	return lst_bkt;
> +}
> +
>  void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)  {
>  	h->cmp_jump_table_idx = KEY_CUSTOM;
> @@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  	struct rte_tailq_entry *te = NULL;
>  	struct rte_hash_list *hash_list;
>  	struct rte_ring *r = NULL;
> +	struct rte_ring *r_ext = NULL;
>  	char hash_name[RTE_HASH_NAMESIZE];
>  	void *k = NULL;
>  	void *buckets = NULL;
> +	void *buckets_ext = NULL;
>  	char ring_name[RTE_RING_NAMESIZE];
> +	char ext_ring_name[RTE_RING_NAMESIZE];
>  	unsigned num_key_slots;
>  	unsigned i;
>  	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
> +	unsigned int ext_table_support = 0;
>  	unsigned int readwrite_concur_support = 0;
> 
>  	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
> @@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  		multi_writer_support = 1;
>  	}
> 
> +	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
> +		ext_table_support = 1;
> +
>  	/* Store all keys and leave the first entry as a dummy entry for
> lookup_bulk */
>  	if (multi_writer_support)
>  		/*
> @@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  		goto err;
>  	}
> 
> +	const uint32_t num_buckets = rte_align32pow2(params->entries) /
> +						RTE_HASH_BUCKET_ENTRIES;
> +
> +	/* Create ring for extendable buckets. */
> +	if (ext_table_support) {
> +		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
> +								params-
> >name);
> +		r_ext = rte_ring_create(ext_ring_name,
> +				rte_align32pow2(num_buckets + 1),
> +				params->socket_id, 0);
> +
> +		if (r_ext == NULL) {
> +			RTE_LOG(ERR, HASH, "ext buckets memory allocation
> "
> +								"failed\n");
> +			goto err;
> +		}
> +	}
> +
>  	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
> 
>  	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> @@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  		goto err_unlock;
>  	}
> 
> -	const uint32_t num_buckets = rte_align32pow2(params->entries)
> -					/ RTE_HASH_BUCKET_ENTRIES;
> -
>  	buckets = rte_zmalloc_socket(NULL,
>  				num_buckets * sizeof(struct rte_hash_bucket),
>  				RTE_CACHE_LINE_SIZE, params->socket_id);
> 
>  	if (buckets == NULL) {
> -		RTE_LOG(ERR, HASH, "memory allocation failed\n");
> +		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
>  		goto err_unlock;
>  	}
> 
> +	/* Allocate same number of extendable buckets */
> +	if (ext_table_support) {
> +		buckets_ext = rte_zmalloc_socket(NULL,
> +				num_buckets * sizeof(struct rte_hash_bucket),
> +				RTE_CACHE_LINE_SIZE, params->socket_id);
> +		if (buckets_ext == NULL) {
> +			RTE_LOG(ERR, HASH, "ext buckets memory allocation
> "
> +							"failed\n");
> +			goto err_unlock;
> +		}
> +		/* Populate ext bkt ring. We reserve 0 similar to the
> +		 * key-data slot, just in case in future we want to
> +		 * use bucket index for the linked list and 0 means NULL
> +		 * for next bucket
> +		 */
> +		for (i = 1; i <= num_buckets; i++)
> +			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
> +	}
> +
>  	const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params-
> >key_len;
>  	const uint64_t key_tbl_size = (uint64_t) key_entry_size *
> num_key_slots;
> 
> @@ -262,6 +315,8 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  	h->num_buckets = num_buckets;
>  	h->bucket_bitmask = h->num_buckets - 1;
>  	h->buckets = buckets;
> +	h->buckets_ext = buckets_ext;
> +	h->free_ext_bkts = r_ext;
>  	h->hash_func = (params->hash_func == NULL) ?
>  		default_hash_func : params->hash_func;
>  	h->key_store = k;
> @@ -269,6 +324,7 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  	h->hw_trans_mem_support = hw_trans_mem_support;
>  	h->multi_writer_support = multi_writer_support;
>  	h->readwrite_concur_support = readwrite_concur_support;
> +	h->ext_table_support = ext_table_support;
> 
>  #if defined(RTE_ARCH_X86)
>  	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> @@ -304,9 +360,11 @@ rte_hash_create(const struct rte_hash_parameters
> *params)
>  	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
>  err:
>  	rte_ring_free(r);
> +	rte_ring_free(r_ext);
>  	rte_free(te);
>  	rte_free(h);
>  	rte_free(buckets);
> +	rte_free(buckets_ext);
>  	rte_free(k);
>  	return NULL;
>  }
> @@ -344,8 +402,10 @@ rte_hash_free(struct rte_hash *h)
>  		rte_free(h->readwrite_lock);
>  	}
>  	rte_ring_free(h->free_slots);
> +	rte_ring_free(h->free_ext_bkts);
>  	rte_free(h->key_store);
>  	rte_free(h->buckets);
> +	rte_free(h->buckets_ext);
>  	rte_free(h);
>  	rte_free(te);
>  }
> @@ -403,7 +463,6 @@ __hash_rw_writer_lock(const struct rte_hash *h)
>  		rte_rwlock_write_lock(h->readwrite_lock);
>  }
> 
> -
>  static inline void
>  __hash_rw_reader_lock(const struct rte_hash *h)  { @@ -448,6 +507,14 @@
> rte_hash_reset(struct rte_hash *h)
>  	while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
>  		rte_pause();
> 
> +	/* clear free extendable bucket ring and memory */
> +	if (h->ext_table_support) {
> +		memset(h->buckets_ext, 0, h->num_buckets *
> +						sizeof(struct
> rte_hash_bucket));
> +		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
> +			rte_pause();
> +	}
> +
>  	/* Repopulate the free slots ring. Entry zero is reserved for key misses
> */
>  	if (h->multi_writer_support)
>  		tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) * @@ -
> 458,6 +525,13 @@ rte_hash_reset(struct rte_hash *h)
>  	for (i = 1; i < tot_ring_cnt + 1; i++)
>  		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
> 
> +	/* Repopulate the free ext bkt ring. */
> +	if (h->ext_table_support) {
> +		for (i = 1; i < h->num_buckets + 1; i++)
> +			rte_ring_sp_enqueue(h->free_ext_bkts,
> +						(void *)((uintptr_t) i));
> +	}
> +
>  	if (h->multi_writer_support) {
>  		/* Reset local caches per lcore */
>  		for (i = 0; i < RTE_MAX_LCORE; i++)
> @@ -524,24 +598,27 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash
> *h,
>  		int32_t *ret_val)
>  {
>  	unsigned int i;
> -	struct rte_hash_bucket *cur_bkt = prim_bkt;
> +	struct rte_hash_bucket *cur_bkt;
>  	int32_t ret;
> 
>  	__hash_rw_writer_lock(h);
>  	/* Check if key was inserted after last check but before this
>  	 * protected region in case of inserting duplicated keys.
>  	 */
> -	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
> +	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
>  	if (ret != -1) {
>  		__hash_rw_writer_unlock(h);
>  		*ret_val = ret;
>  		return 1;
>  	}
> -	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
> -	if (ret != -1) {
> -		__hash_rw_writer_unlock(h);
> -		*ret_val = ret;
> -		return 1;
> +
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		if (ret != -1) {
> +			__hash_rw_writer_unlock(h);
> +			*ret_val = ret;
> +			return 1;
> +		}
>  	}
> 
>  	/* Insert new entry if there is room in the primary @@ -580,7 +657,7
> @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>  			int32_t *ret_val)
>  {
>  	uint32_t prev_alt_bkt_idx;
> -	struct rte_hash_bucket *cur_bkt = bkt;
> +	struct rte_hash_bucket *cur_bkt;
>  	struct queue_node *prev_node, *curr_node = leaf;
>  	struct rte_hash_bucket *prev_bkt, *curr_bkt = leaf->bkt;
>  	uint32_t prev_slot, curr_slot = leaf_slot; @@ -597,18 +674,20 @@
> rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>  	/* Check if key was inserted after last check but before this
>  	 * protected region.
>  	 */
> -	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
> +	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
>  	if (ret != -1) {
>  		__hash_rw_writer_unlock(h);
>  		*ret_val = ret;
>  		return 1;
>  	}
> 
> -	ret = search_and_update(h, data, key, alt_bkt, alt_hash, sig);
> -	if (ret != -1) {
> -		__hash_rw_writer_unlock(h);
> -		*ret_val = ret;
> -		return 1;
> +	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
> +		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		if (ret != -1) {
> +			__hash_rw_writer_unlock(h);
> +			*ret_val = ret;
> +			return 1;
> +		}
>  	}
> 
>  	while (likely(curr_node->prev != NULL)) { @@ -711,15 +790,18 @@
> __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,  {
>  	hash_sig_t alt_hash;
>  	uint32_t prim_bucket_idx, sec_bucket_idx;
> -	struct rte_hash_bucket *prim_bkt, *sec_bkt;
> +	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
>  	struct rte_hash_key *new_k, *keys = h->key_store;
>  	void *slot_id = NULL;
> -	uint32_t new_idx;
> +	void *ext_bkt_id = NULL;
> +	uint32_t new_idx, bkt_id;
>  	int ret;
>  	unsigned n_slots;
>  	unsigned lcore_id;
> +	unsigned int i;
>  	struct lcore_cache *cached_free_slots = NULL;
>  	int32_t ret_val;
> +	struct rte_hash_bucket *last;
> 
>  	prim_bucket_idx = sig & h->bucket_bitmask;
>  	prim_bkt = &h->buckets[prim_bucket_idx]; @@ -739,10 +821,12 @@
> __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>  	}
> 
>  	/* Check if key is already inserted in secondary location */
> -	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
> -	if (ret != -1) {
> -		__hash_rw_writer_unlock(h);
> -		return ret;
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		if (ret != -1) {
> +			__hash_rw_writer_unlock(h);
> +			return ret;
> +		}
>  	}
>  	__hash_rw_writer_unlock(h);
> 
> @@ -808,10 +892,70 @@ __rte_hash_add_key_with_hash(const struct
> rte_hash *h, const void *key,
>  	else if (ret == 1) {
>  		enqueue_slot_back(h, cached_free_slots, slot_id);
>  		return ret_val;
> -	} else {
> +	}
> +
> +	/* if ext table not enabled, we failed the insertion */
> +	if (!h->ext_table_support) {
>  		enqueue_slot_back(h, cached_free_slots, slot_id);
>  		return ret;
>  	}
> +
> +	/* Now we need to go through the extendable bucket. Protection is
> needed
> +	 * to protect all extendable bucket processes.
> +	 */
> +	__hash_rw_writer_lock(h);
> +	/* We check for duplicates again since could be inserted before the
> lock */
> +	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +	if (ret != -1) {
> +		enqueue_slot_back(h, cached_free_slots, slot_id);
> +		goto failure;
> +	}
> +
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +		if (ret != -1) {
> +			enqueue_slot_back(h, cached_free_slots, slot_id);
> +			goto failure;
> +		}
> +	}
> +
> +	/* Search sec and ext buckets to find an empty entry to insert. */
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +			/* Check if slot is available */
> +			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
> +				cur_bkt->sig_current[i] = alt_hash;
> +				cur_bkt->sig_alt[i] = sig;
> +				cur_bkt->key_idx[i] = new_idx;
> +				__hash_rw_writer_unlock(h);
> +				return new_idx - 1;
> +			}
> +		}
> +	}
> +
> +	/* Failed to get an empty entry from extendable buckets. Link a new
> +	 * extendable bucket. We first get a free bucket from ring.
> +	 */
> +	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
> +		ret = -ENOSPC;
> +		goto failure;
> +	}
> +
> +	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
> +	/* Use the first location of the new bucket */
> +	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
> +	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
> +	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
> +	/* Link the new bucket to sec bucket linked list */
> +	last = rte_hash_get_last_bkt(sec_bkt);
> +	last->next = &h->buckets_ext[bkt_id];
> +	__hash_rw_writer_unlock(h);
> +	return new_idx - 1;
> +
> +failure:
> +	__hash_rw_writer_unlock(h);
> +	return ret;
> +
>  }
> 
>  int32_t
> @@ -890,7 +1034,7 @@ __rte_hash_lookup_with_hash(const struct rte_hash
> *h, const void *key,  {
>  	uint32_t bucket_idx;
>  	hash_sig_t alt_hash;
> -	struct rte_hash_bucket *bkt;
> +	struct rte_hash_bucket *bkt, *cur_bkt;
>  	int ret;
> 
>  	bucket_idx = sig & h->bucket_bitmask;
> @@ -910,10 +1054,12 @@ __rte_hash_lookup_with_hash(const struct
> rte_hash *h, const void *key,
>  	bkt = &h->buckets[bucket_idx];
> 
>  	/* Check if key is in secondary location */
> -	ret = search_one_bucket(h, key, alt_hash, data, bkt);
> -	if (ret != -1) {
> -		__hash_rw_reader_unlock(h);
> -		return ret;
> +	FOR_EACH_BUCKET(cur_bkt, bkt) {
> +		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
> +		if (ret != -1) {
> +			__hash_rw_reader_unlock(h);
> +			return ret;
> +		}
>  	}
>  	__hash_rw_reader_unlock(h);
>  	return -ENOENT;
> @@ -978,16 +1124,42 @@ remove_entry(const struct rte_hash *h, struct
> rte_hash_bucket *bkt, unsigned i)
>  	}
>  }
> 
> +/* Compact the linked list by moving key from last entry in linked list
> +to the
> + * empty slot.
> + */
> +static inline void
> +__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
> +	int i;
> +	struct rte_hash_bucket *last_bkt;
> +
> +	if (!cur_bkt->next)
> +		return;
> +
> +	last_bkt = rte_hash_get_last_bkt(cur_bkt);
> +
> +	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
> +		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
> +			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
> +			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
> +			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
> +			last_bkt->sig_current[i] = NULL_SIGNATURE;
> +			last_bkt->sig_alt[i] = NULL_SIGNATURE;
> +			last_bkt->key_idx[i] = EMPTY_SLOT;
> +			return;
In the lock-free algorithm, this will require a global counter increment.
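
A minimal sketch of what that could look like, assuming a hypothetical
per-table change counter (called 'tbl_chng_cnt' below; it is not part of
this patch) that lookups would re-read to detect a concurrent move,
similar to the protection for cuckoo displacement:

/* Sketch only: 'tbl_chng_cnt' is an assumed field, not in this patch. */
static inline void
lf_compact_move(struct rte_hash *h, struct rte_hash_bucket *cur_bkt, int pos,
		struct rte_hash_bucket *last_bkt, int i)
{
	/* Publish the entry in its new slot first. */
	cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
	__atomic_store_n(&cur_bkt->key_idx[pos], last_bkt->key_idx[i],
			 __ATOMIC_RELEASE);
	/* Global counter increment: readers racing with the move can
	 * detect the change and retry their lookup.
	 */
	__atomic_fetch_add(&h->tbl_chng_cnt, 1, __ATOMIC_RELEASE);
	/* Only now clear the old slot. */
	__atomic_store_n(&last_bkt->key_idx[i], EMPTY_SLOT, __ATOMIC_RELEASE);
	last_bkt->sig_current[i] = NULL_SIGNATURE;
}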

> +		}
> +	}
> +}
> +
>  /* Search one bucket and remove the matched key */  static inline int32_t
> search_and_remove(const struct rte_hash *h, const void *key,
> -			struct rte_hash_bucket *bkt, hash_sig_t sig)
> +			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
>  {
>  	struct rte_hash_key *k, *keys = h->key_store;
>  	unsigned int i;
>  	int32_t ret;
> 
> -	/* Check if key is in primary location */
> +	/* Check if key is in bucket */
>  	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>  		if (bkt->sig_current[i] == sig &&
>  				bkt->key_idx[i] != EMPTY_SLOT) {
> @@ -996,12 +1168,12 @@ search_and_remove(const struct rte_hash *h,
> const void *key,
>  			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
>  				remove_entry(h, bkt, i);
> 
> -				/*
> -				 * Return index where key is stored,
> +				/* Return index where key is stored,
>  				 * subtracting the first dummy index
>  				 */
>  				ret = bkt->key_idx[i] - 1;
>  				bkt->key_idx[i] = EMPTY_SLOT;
> +				*pos = i;
>  				return ret;
>  			}
>  		}
> @@ -1015,34 +1187,66 @@ __rte_hash_del_key_with_hash(const struct
> rte_hash *h, const void *key,  {
>  	uint32_t bucket_idx;
>  	hash_sig_t alt_hash;
> -	struct rte_hash_bucket *bkt;
> -	int32_t ret;
> +	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
> +	struct rte_hash_bucket *cur_bkt;
> +	int pos;
> +	int32_t ret, i;
> 
>  	bucket_idx = sig & h->bucket_bitmask;
> -	bkt = &h->buckets[bucket_idx];
> +	prim_bkt = &h->buckets[bucket_idx];
> 
>  	__hash_rw_writer_lock(h);
>  	/* look for key in primary bucket */
> -	ret = search_and_remove(h, key, bkt, sig);
> +	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
>  	if (ret != -1) {
> -		__hash_rw_writer_unlock(h);
> -		return ret;
> +		__rte_hash_compact_ll(prim_bkt, pos);
> +		last_bkt = prim_bkt->next;
> +		prev_bkt = prim_bkt;
> +		goto return_bkt;
>  	}
> 
>  	/* Calculate secondary hash */
>  	alt_hash = rte_hash_secondary_hash(sig);
>  	bucket_idx = alt_hash & h->bucket_bitmask;
> -	bkt = &h->buckets[bucket_idx];
> +	sec_bkt = &h->buckets[bucket_idx];
> +
> +	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> +		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
> +		if (ret != -1) {
> +			__rte_hash_compact_ll(cur_bkt, pos);
> +			last_bkt = sec_bkt->next;
> +			prev_bkt = sec_bkt;
> +			goto return_bkt;
> +		}
> +	}
> 
> -	/* look for key in secondary bucket */
> -	ret = search_and_remove(h, key, bkt, alt_hash);
> -	if (ret != -1) {
> +	__hash_rw_writer_unlock(h);
> +	return -ENOENT;
> +
> +/* Search last bucket to see if empty to be recycled */
> +return_bkt:
> +	if (!last_bkt) {
>  		__hash_rw_writer_unlock(h);
>  		return ret;
>  	}
> +	while (last_bkt->next) {
> +		prev_bkt = last_bkt;
> +		last_bkt = last_bkt->next;
> +	}
Minor: We are trying to find the last bucket here, along with its previous one. Maybe we can modify 'rte_hash_get_last_bkt' instead?
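
A rough sketch of such a helper (the name and shape are illustrative, not
part of the patch): walk the chain once and hand back both the last bucket
and its predecessor.

static inline struct rte_hash_bucket *
rte_hash_get_last_bkt_with_prev(struct rte_hash_bucket *lst_bkt,
				struct rte_hash_bucket **prev)
{
	*prev = NULL;	/* stays NULL if the chain has a single bucket */
	while (lst_bkt->next != NULL) {
		*prev = lst_bkt;
		lst_bkt = lst_bkt->next;
	}
	return lst_bkt;
}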

> +
> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		if (last_bkt->key_idx[i] != EMPTY_SLOT)
> +			break;
> +	}
> +	/* found empty bucket and recycle */
> +	if (i == RTE_HASH_BUCKET_ENTRIES) {
> +		prev_bkt->next = last_bkt->next = NULL;
> +		uint32_t index = last_bkt - h->buckets_ext + 1;
> +		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
In the lock-less algorithm, the bucket cannot be freed immediately. I looked at a couple of solutions. The bucket needs to be stored internally and should be associated with the key-store index (or position). I am thinking that I will add a field to 'struct rte_hash_key' to store the bucket pointer or index.

From the code, my understanding is that we will free only the last bucket; we will never free a middle bucket (please correct me if I am wrong). This will keep it simple for the lock-free algorithm.

I could work through these issues, so I do not see any issues for the lock-free algorithm (as of now :) ).
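
A minimal sketch of that idea (the new field name is purely illustrative;
the existing 'struct rte_hash_key' only has the data union and the
variable-size key):

struct rte_hash_key {
	union {
		uintptr_t idata;
		void *pdata;
	};
	/* Assumed addition: remember which extendable bucket the entry
	 * belongs to, so the bucket is recycled only when the key-store
	 * index itself is reclaimed (i.e. after readers are done).
	 */
	uint32_t ext_bkt_to_free;
	/* Variable key size */
	char key[0];
};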

> +	}
> 
>  	__hash_rw_writer_unlock(h);
> -	return -ENOENT;
> +	return ret;
>  }
> 
>  int32_t
> @@ -1143,12 +1347,14 @@ __rte_hash_lookup_bulk(const struct rte_hash
> *h, const void **keys,  {
>  	uint64_t hits = 0;
>  	int32_t i;
> +	int32_t ret;
>  	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
>  	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
>  	const struct rte_hash_bucket
> *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>  	const struct rte_hash_bucket
> *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>  	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
>  	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
> +	struct rte_hash_bucket *cur_bkt, *next_bkt;
> 
>  	/* Prefetch first keys */
>  	for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++) @@ -1266,6
> +1472,34 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void
> **keys,
>  		continue;
>  	}
> 
> +	/* all found, do not need to go through ext bkt */
> +	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
> +		if (hit_mask != NULL)
> +			*hit_mask = hits;
> +		__hash_rw_reader_unlock(h);
> +		return;
> +	}
> +
> +	/* need to check ext buckets for match */
> +	for (i = 0; i < num_keys; i++) {
> +		if ((hits & (1ULL << i)) != 0)
> +			continue;
> +		next_bkt = secondary_bkt[i]->next;
> +		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
> +			if (data != NULL)
> +				ret = search_one_bucket(h, keys[i],
> +						sec_hash[i], &data[i],
> cur_bkt);
> +			else
> +				ret = search_one_bucket(h, keys[i],
> +						sec_hash[i], NULL, cur_bkt);
> +			if (ret != -1) {
> +				positions[i] = ret;
> +				hits |= 1ULL << i;
> +				break;
> +			}
> +		}
> +	}
> +
>  	__hash_rw_reader_unlock(h);
> 
>  	if (hit_mask != NULL)
> @@ -1308,10 +1542,13 @@ rte_hash_iterate(const struct rte_hash *h, const
> void **key, void **data, uint32
> 
>  	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
> 
> -	const uint32_t total_entries = h->num_buckets *
> RTE_HASH_BUCKET_ENTRIES;
> -	/* Out of bounds */
> -	if (*next >= total_entries)
> -		return -ENOENT;
> +	const uint32_t total_entries_main = h->num_buckets *
> +						RTE_HASH_BUCKET_ENTRIES;
> +	const uint32_t total_entries = total_entries_main << 1;
> +
> +	/* Out of bounds of all buckets (both main table and ext table */
Typo: missing ')'

> +	if (*next >= total_entries_main)
> +		goto extend_table;
> 
>  	/* Calculate bucket and index of current iterator */
>  	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES; @@ -1322,14
> +1559,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key,
> void **data, uint32
>  	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>  		(*next)++;
>  		/* End of table */
> -		if (*next == total_entries) {
> +		if (*next == total_entries_main) {
>  			__hash_rw_reader_unlock(h);
> -			return -ENOENT;
> +			goto extend_table;
>  		}
>  		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>  		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>  	}
> -
>  	/* Get position of entry in key table */
>  	position = h->buckets[bucket_idx].key_idx[idx];
>  	next_key = (struct rte_hash_key *) ((char *)h->key_store + @@ -
> 1344,4 +1580,38 @@ rte_hash_iterate(const struct rte_hash *h, const void
> **key, void **data, uint32
>  	(*next)++;
> 
>  	return position - 1;
> +
> +/* Begin to iterate extendable buckets */
> +extend_table:
> +	/* Out of total bound or if ext bucket feature is not enabled */
> +	if (*next >= total_entries || !h->ext_table_support)
> +		return -ENOENT;
> +
> +	bucket_idx = (*next - total_entries_main) /
> +						RTE_HASH_BUCKET_ENTRIES;
> +	idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
> +
> +	__hash_rw_reader_lock(h);
> +	while (h->buckets_ext[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> +		(*next)++;
> +		if (*next == total_entries) {
> +			__hash_rw_reader_unlock(h);
> +			return -ENOENT;
> +		}
> +		bucket_idx = (*next - total_entries_main) /
> +						RTE_HASH_BUCKET_ENTRIES;
> +		idx = (*next - total_entries_main) %
> +						RTE_HASH_BUCKET_ENTRIES;
> +	}
> +	/* Get position of entry in key table */
> +	position = h->buckets_ext[bucket_idx].key_idx[idx];
> +	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> +				position * h->key_entry_size);
> +	/* Return key and data */
> +	*key = next_key->key;
> +	*data = next_key->pdata;
> +
> +	__hash_rw_reader_unlock(h);
> +
> +	/* Increment iterator */
> +	(*next)++;
> +	return position - 1;
>  }
> diff --git a/lib/librte_hash/rte_cuckoo_hash.h
> b/lib/librte_hash/rte_cuckoo_hash.h
> index fc0e5c2..e601520 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.h
> +++ b/lib/librte_hash/rte_cuckoo_hash.h
> @@ -142,6 +142,8 @@ struct rte_hash_bucket {
>  	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
> 
>  	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
> +
> +	void *next;
>  } __rte_cache_aligned;
> 
>  /** A hash table structure. */
> @@ -166,6 +168,7 @@ struct rte_hash {
>  	/**< If multi-writer support is enabled. */
>  	uint8_t readwrite_concur_support;
>  	/**< If read-write concurrency support is enabled */
> +	uint8_t ext_table_support;     /**< Enable extendable bucket table */
>  	rte_hash_function hash_func;    /**< Function used to calculate hash.
> */
>  	uint32_t hash_func_init_val;    /**< Init value used by hash_func. */
>  	rte_hash_cmp_eq_t rte_hash_custom_cmp_eq; @@ -184,6 +187,8
> @@ struct rte_hash {
>  	 * to the key table.
>  	 */
>  	rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
> +	struct rte_hash_bucket *buckets_ext; /**< Extra buckets array */
> +	struct rte_ring *free_ext_bkts; /**< Ring of indexes of free buckets
> +*/
>  } __rte_cache_aligned;
> 
>  struct queue_node {
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h index
> 9e7d931..11d8e28 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -37,6 +37,9 @@ extern "C" {
>  /** Flag to support reader writer concurrency */  #define
> RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
> 
> +/** Flag to indicate the extendabe bucket table feature should be used
> +*/ #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
> +
>  /** Signature of key that is stored internally. */  typedef uint32_t hash_sig_t;
> 
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 1/4] hash: fix race condition in iterate
  2018-10-02  0:17         ` Wang, Yipeng1
@ 2018-10-02  4:26           ` Honnappa Nagarahalli
  2018-10-02 23:53             ` Wang, Yipeng1
  0 siblings, 1 reply; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-02  4:26 UTC (permalink / raw)
  To: Wang, Yipeng1, Richardson, Bruce
  Cc: Ananyev, Konstantin, dev, Gobriel, Sameh, nd, Honnappa Nagarahalli

> 
> >-----Original Message-----
> >From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> >> @@ -1317,16 +1317,19 @@ rte_hash_iterate(const struct rte_hash *h,
> >> const void **key, void **data, uint32
> >>  	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> >>  	idx = *next % RTE_HASH_BUCKET_ENTRIES;
> >>
> >> +	__hash_rw_reader_lock(h);
> >This does not work well with the lock-less changes I am making.  We
> >should leave the lock in its original position. Instead change the while loop as
> follows:
> >
> >while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT)
> >
> >>  	/* If current position is empty, go to the next one */
> >>  	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> >>  		(*next)++;
> >>  		/* End of table */
> >> -		if (*next == total_entries)
> >> +		if (*next == total_entries) {
> >> +			__hash_rw_reader_unlock(h);
> >>  			return -ENOENT;
> >> +		}
> >>  		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> >>  		idx = *next % RTE_HASH_BUCKET_ENTRIES;
> >>  	}
> >> -	__hash_rw_reader_lock(h);
> >> +
> >>  	/* Get position of entry in key table */
> >>  	position = h->buckets[bucket_idx].key_idx[idx];
>If we change the while loop as I suggested above, we can remove this line.
> >
> >>  	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> 
> [Wang, Yipeng] Sorry that I did not realize you already have it in your patch
> set and I agree.
> Do you want to export it as a bug fix in your patch set? I will remove my
> change.
Sure, I will make a separate commit for this.

> 
> For the lock-free case, do we need to protect it with a version counter? Imagine the
> following corner case:
> While the iterator is reading the key and data, a writer deletes, removes,
> and recycles the key-data pair, and then writes a new key and data into it. While the
> writer is writing, could the reader read out a wrong key/data, or a mismatched
> key/data pair?
> 
In the lock-free algorithm, the key-data is not 'freed' until the readers have completed all their references to the 'deleted' key-data. Hence, the writers will not be able to allocate the same key store index till the readers have stopped referring to the 'deleted' key-data.
I re-checked my ladder diagrams [1] and I could not find any issues.

[1] https://dpdkuserspace2018.sched.com/event/G44w/lock-free-read-write-concurrency-in-rtehash (PPT)
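
Roughly the ordering being described, with the synchronization and recycle
hooks left as application-provided placeholders (neither is an rte_hash API
in this patch set):

#include <stdint.h>
#include <rte_hash.h>

extern void app_synchronize_readers(void);                         /* app-provided */
extern void app_recycle_key_slot(struct rte_hash *h, int32_t pos); /* placeholder */

static void
delete_key_lock_free(struct rte_hash *h, const void *key)
{
	int32_t pos = rte_hash_del_key(h, key);	/* logical delete only */
	if (pos < 0)
		return;
	app_synchronize_readers();	/* wait out the grace period */
	app_recycle_key_slot(h, pos);	/* now the slot may be reused */
}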

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v5 1/4] hash: fix race condition in iterate
  2018-10-01 18:34     ` [PATCH v5 1/4] hash: fix race condition in iterate Yipeng Wang
@ 2018-10-02 17:26       ` Honnappa Nagarahalli
  0 siblings, 0 replies; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-02 17:26 UTC (permalink / raw)
  To: Yipeng Wang, bruce.richardson
  Cc: konstantin.ananyev, dev, sameh.gobriel, Honnappa Nagarahalli, nd



> -----Original Message-----
> From: Yipeng Wang <yipeng1.wang@intel.com>
> Sent: Monday, October 1, 2018 1:35 PM
> To: bruce.richardson@intel.com
> Cc: konstantin.ananyev@intel.com; dev@dpdk.org;
> yipeng1.wang@intel.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; sameh.gobriel@intel.com
> Subject: [PATCH v5 1/4] hash: fix race condition in iterate
> 
> In rte_hash_iterate, the reader lock did not protect the while loop which
> checks for an empty entry. This created a race condition: the entry may
> become empty after the check but before the lock is taken, so a wrong
> key/data value would be read out.
> 
> This commit reads out the position in the while condition, which makes sure
> that the position will not be changed to empty before entering the lock.
> 
> Fixes: f2e3001b53ec ("hash: support read/write concurrency")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  lib/librte_hash/rte_cuckoo_hash.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> b/lib/librte_hash/rte_cuckoo_hash.c
> index f7b86c8..da8ddf4 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -1318,7 +1318,7 @@ rte_hash_iterate(const struct rte_hash *h, const
> void **key, void **data, uint32
>  	idx = *next % RTE_HASH_BUCKET_ENTRIES;
> 
>  	/* If current position is empty, go to the next one */
> -	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> +	while ((position = h->buckets[bucket_idx].key_idx[idx]) ==
> EMPTY_SLOT)
> +{
>  		(*next)++;
>  		/* End of table */
>  		if (*next == total_entries)
> @@ -1326,9 +1326,8 @@ rte_hash_iterate(const struct rte_hash *h, const
> void **key, void **data, uint32
>  		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>  		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>  	}
> +
>  	__hash_rw_reader_lock(h);
> -	/* Get position of entry in key table */
> -	position = h->buckets[bucket_idx].key_idx[idx];
>  	next_key = (struct rte_hash_key *) ((char *)h->key_store +
>  				position * h->key_entry_size);
>  	/* Return key and data */
> --
> 2.7.4
This looks good. I can rework my patch too. I will leave the decision to you.

Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v5 4/4] hash: use partial-key hashing
  2018-10-01 18:35     ` [PATCH v5 4/4] hash: use partial-key hashing Yipeng Wang
@ 2018-10-02 20:52       ` Dharmik Thakkar
  2018-10-03  0:43         ` Wang, Yipeng1
  0 siblings, 1 reply; 107+ messages in thread
From: Dharmik Thakkar @ 2018-10-02 20:52 UTC (permalink / raw)
  To: Yipeng Wang
  Cc: bruce.richardson, konstantin.ananyev, dev, Honnappa Nagarahalli,
	sameh.gobriel

I am attempting to test the patch on an Arm machine, but it failed to apply.

I’m getting the following error:

error: patch failed: test/test/test_hash_perf.c:18
error: test/test/test_hash_perf.c: patch does not apply
Patch failed at 0003 test/hash: implement extendable bucket hash test

> On Oct 1, 2018, at 1:35 PM, Yipeng Wang <yipeng1.wang@intel.com> wrote:
>
> This commit changes the hashing mechanism to "partial-key
> hashing" to calculate bucket index and signature of key.
>
> This is proposed in Bin Fan, et al.'s paper
> "MemC3: Compact and Concurrent MemCache with Dumber Caching
> and Smarter Hashing". Basically the idea is to use "xor" to
> derive the alternative bucket from the current bucket index and
> signature.
>
> With "partial-key hashing", it reduces the bucket memory
> requirement from two cache lines to one cache line, which
> improves the memory efficiency and thus the lookup speed.
>
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
> lib/librte_hash/rte_cuckoo_hash.c | 246 +++++++++++++++++++-------------------
> lib/librte_hash/rte_cuckoo_hash.h |   6 +-
> lib/librte_hash/rte_hash.h        |   5 +-
> 3 files changed, 131 insertions(+), 126 deletions(-)
>
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
> index 133e181..3c7c9c5 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -90,6 +90,36 @@ rte_hash_cmp_eq(const void *key1, const void *key2, const struct rte_hash *h)
> return cmp_jump_table[h->cmp_jump_table_idx](key1, key2, h->key_len);
> }
>
> +/*
> + * We use higher 16 bits of hash as the signature value stored in table.
> + * We use the lower bits for the primary bucket
> + * location. Then we XOR primary bucket location and the signature
> + * to get the secondary bucket location. This is same as
> + * proposed in Bin Fan, et al's paper
> + * "MemC3: Compact and Concurrent MemCache with Dumber Caching and
> + * Smarter Hashing". The benefit to use
> + * XOR is that one could derive the alternative bucket location
> + * by only using the current bucket location and the signature.
> + */
> +static inline uint16_t
> +get_short_sig(const hash_sig_t hash)
> +{
> +return hash >> 16;
> +}
> +
> +static inline uint32_t
> +get_prim_bucket_index(const struct rte_hash *h, const hash_sig_t hash)
> +{
> +return hash & h->bucket_bitmask;
> +}
> +
> +static inline uint32_t
> +get_alt_bucket_index(const struct rte_hash *h,
> +uint32_t cur_bkt_idx, uint16_t sig)
> +{
> +return (cur_bkt_idx ^ sig) & h->bucket_bitmask;
> +}
> +
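
A quick standalone check of the property the comment above relies on
(values are arbitrary examples): XOR-ing with the signature twice returns
to the original bucket, so either bucket can locate the other using only
the stored 16-bit signature.

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint32_t bucket_bitmask = 0x3ff;  /* e.g. 1024 buckets */
	uint32_t prim = 0x2a5;            /* some primary bucket index */
	uint16_t sig  = 0xbeef;           /* stored 16-bit signature */

	uint32_t alt  = (prim ^ sig) & bucket_bitmask;  /* secondary bucket */
	uint32_t back = (alt ^ sig) & bucket_bitmask;   /* back to primary */

	assert(back == prim);
	return 0;
}
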
> struct rte_hash *
> rte_hash_create(const struct rte_hash_parameters *params)
> {
> @@ -327,9 +357,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
> h->ext_table_support = ext_table_support;
>
> #if defined(RTE_ARCH_X86)
> -if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> -h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
> -else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
> +if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
> h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
> else
> #endif
> @@ -417,18 +445,6 @@ rte_hash_hash(const struct rte_hash *h, const void *key)
> return h->hash_func(key, h->key_len, h->hash_func_init_val);
> }
>
> -/* Calc the secondary hash value from the primary hash value of a given key */
> -static inline hash_sig_t
> -rte_hash_secondary_hash(const hash_sig_t primary_hash)
> -{
> -static const unsigned all_bits_shift = 12;
> -static const unsigned alt_bits_xor = 0x5bd1e995;
> -
> -uint32_t tag = primary_hash >> all_bits_shift;
> -
> -return primary_hash ^ ((tag + 1) * alt_bits_xor);
> -}
> -
> int32_t
> rte_hash_count(const struct rte_hash *h)
> {
> @@ -560,14 +576,13 @@ enqueue_slot_back(const struct rte_hash *h,
> /* Search a key from bucket and update its data */
> static inline int32_t
> search_and_update(const struct rte_hash *h, void *data, const void *key,
> -struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
> +struct rte_hash_bucket *bkt, uint16_t sig)
> {
> int i;
> struct rte_hash_key *k, *keys = h->key_store;
>
> for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> -if (bkt->sig_current[i] == sig &&
> -bkt->sig_alt[i] == alt_hash) {
> +if (bkt->sig_current[i] == sig) {
> k = (struct rte_hash_key *) ((char *)keys +
> bkt->key_idx[i] * h->key_entry_size);
> if (rte_hash_cmp_eq(key, k->key, h) == 0) {
> @@ -594,7 +609,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
> struct rte_hash_bucket *prim_bkt,
> struct rte_hash_bucket *sec_bkt,
> const struct rte_hash_key *key, void *data,
> -hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
> +uint16_t sig, uint32_t new_idx,
> int32_t *ret_val)
> {
> unsigned int i;
> @@ -605,7 +620,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
> /* Check if key was inserted after last check but before this
>  * protected region in case of inserting duplicated keys.
>  */
> -ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +ret = search_and_update(h, data, key, prim_bkt, sig);
> if (ret != -1) {
> __hash_rw_writer_unlock(h);
> *ret_val = ret;
> @@ -613,7 +628,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
> }
>
> FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> -ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +ret = search_and_update(h, data, key, cur_bkt, sig);
> if (ret != -1) {
> __hash_rw_writer_unlock(h);
> *ret_val = ret;
> @@ -628,7 +643,6 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
> /* Check if slot is available */
> if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
> prim_bkt->sig_current[i] = sig;
> -prim_bkt->sig_alt[i] = alt_hash;
> prim_bkt->key_idx[i] = new_idx;
> break;
> }
> @@ -653,7 +667,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> struct rte_hash_bucket *alt_bkt,
> const struct rte_hash_key *key, void *data,
> struct queue_node *leaf, uint32_t leaf_slot,
> -hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
> +uint16_t sig, uint32_t new_idx,
> int32_t *ret_val)
> {
> uint32_t prev_alt_bkt_idx;
> @@ -674,7 +688,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> /* Check if key was inserted after last check but before this
>  * protected region.
>  */
> -ret = search_and_update(h, data, key, bkt, sig, alt_hash);
> +ret = search_and_update(h, data, key, bkt, sig);
> if (ret != -1) {
> __hash_rw_writer_unlock(h);
> *ret_val = ret;
> @@ -682,7 +696,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> }
>
> FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
> -ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +ret = search_and_update(h, data, key, cur_bkt, sig);
> if (ret != -1) {
> __hash_rw_writer_unlock(h);
> *ret_val = ret;
> @@ -695,8 +709,9 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> prev_bkt = prev_node->bkt;
> prev_slot = curr_node->prev_slot;
>
> -prev_alt_bkt_idx =
> -prev_bkt->sig_alt[prev_slot] & h->bucket_bitmask;
> +prev_alt_bkt_idx = get_alt_bucket_index(h,
> +prev_node->cur_bkt_idx,
> +prev_bkt->sig_current[prev_slot]);
>
> if (unlikely(&h->buckets[prev_alt_bkt_idx]
> != curr_bkt)) {
> @@ -710,10 +725,8 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>  * Cuckoo insert to move elements back to its
>  * primary bucket if available
>  */
> -curr_bkt->sig_alt[curr_slot] =
> - prev_bkt->sig_current[prev_slot];
> curr_bkt->sig_current[curr_slot] =
> -prev_bkt->sig_alt[prev_slot];
> +prev_bkt->sig_current[prev_slot];
> curr_bkt->key_idx[curr_slot] =
> prev_bkt->key_idx[prev_slot];
>
> @@ -723,7 +736,6 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> }
>
> curr_bkt->sig_current[curr_slot] = sig;
> -curr_bkt->sig_alt[curr_slot] = alt_hash;
> curr_bkt->key_idx[curr_slot] = new_idx;
>
> __hash_rw_writer_unlock(h);
> @@ -741,39 +753,44 @@ rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
> struct rte_hash_bucket *bkt,
> struct rte_hash_bucket *sec_bkt,
> const struct rte_hash_key *key, void *data,
> -hash_sig_t sig, hash_sig_t alt_hash,
> +uint16_t sig, uint32_t bucket_idx,
> uint32_t new_idx, int32_t *ret_val)
> {
> unsigned int i;
> struct queue_node queue[RTE_HASH_BFS_QUEUE_MAX_LEN];
> struct queue_node *tail, *head;
> struct rte_hash_bucket *curr_bkt, *alt_bkt;
> +uint32_t cur_idx, alt_idx;
>
> tail = queue;
> head = queue + 1;
> tail->bkt = bkt;
> tail->prev = NULL;
> tail->prev_slot = -1;
> +tail->cur_bkt_idx = bucket_idx;
>
> /* Cuckoo bfs Search */
> while (likely(tail != head && head <
> queue + RTE_HASH_BFS_QUEUE_MAX_LEN -
> RTE_HASH_BUCKET_ENTRIES)) {
> curr_bkt = tail->bkt;
> +cur_idx = tail->cur_bkt_idx;
> for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> if (curr_bkt->key_idx[i] == EMPTY_SLOT) {
> int32_t ret = rte_hash_cuckoo_move_insert_mw(h,
> bkt, sec_bkt, key, data,
> -tail, i, sig, alt_hash,
> +tail, i, sig,
> new_idx, ret_val);
> if (likely(ret != -1))
> return ret;
> }
>
> /* Enqueue new node and keep prev node info */
> -alt_bkt = &(h->buckets[curr_bkt->sig_alt[i]
> -    & h->bucket_bitmask]);
> +alt_idx = get_alt_bucket_index(h, cur_idx,
> +curr_bkt->sig_current[i]);
> +alt_bkt = &(h->buckets[alt_idx]);
> head->bkt = alt_bkt;
> +head->cur_bkt_idx = alt_idx;
> head->prev = tail;
> head->prev_slot = i;
> head++;
> @@ -788,7 +805,7 @@ static inline int32_t
> __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> hash_sig_t sig, void *data)
> {
> -hash_sig_t alt_hash;
> +uint16_t short_sig;
> uint32_t prim_bucket_idx, sec_bucket_idx;
> struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
> struct rte_hash_key *new_k, *keys = h->key_store;
> @@ -803,18 +820,17 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> int32_t ret_val;
> struct rte_hash_bucket *last;
>
> -prim_bucket_idx = sig & h->bucket_bitmask;
> +short_sig = get_short_sig(sig);
> +prim_bucket_idx = get_prim_bucket_index(h, sig);
> +sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
> prim_bkt = &h->buckets[prim_bucket_idx];
> -rte_prefetch0(prim_bkt);
> -
> -alt_hash = rte_hash_secondary_hash(sig);
> -sec_bucket_idx = alt_hash & h->bucket_bitmask;
> sec_bkt = &h->buckets[sec_bucket_idx];
> +rte_prefetch0(prim_bkt);
> rte_prefetch0(sec_bkt);
>
> /* Check if key is already inserted in primary location */
> __hash_rw_writer_lock(h);
> -ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +ret = search_and_update(h, data, key, prim_bkt, short_sig);
> if (ret != -1) {
> __hash_rw_writer_unlock(h);
> return ret;
> @@ -822,12 +838,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>
> /* Check if key is already inserted in secondary location */
> FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> -ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +ret = search_and_update(h, data, key, cur_bkt, short_sig);
> if (ret != -1) {
> __hash_rw_writer_unlock(h);
> return ret;
> }
> }
> +
> __hash_rw_writer_unlock(h);
>
> /* Did not find a match, so get a new slot for storing the new key */
> @@ -865,7 +882,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>
> /* Find an empty slot and insert */
> ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
> -sig, alt_hash, new_idx, &ret_val);
> +short_sig, new_idx, &ret_val);
> if (ret == 0)
> return new_idx - 1;
> else if (ret == 1) {
> @@ -875,7 +892,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>
> /* Primary bucket full, need to make space for new entry */
> ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
> -sig, alt_hash, new_idx, &ret_val);
> +short_sig, prim_bucket_idx, new_idx, &ret_val);
> if (ret == 0)
> return new_idx - 1;
> else if (ret == 1) {
> @@ -885,7 +902,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>
> /* Also search secondary bucket to get better occupancy */
> ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
> -alt_hash, sig, new_idx, &ret_val);
> +short_sig, sec_bucket_idx, new_idx, &ret_val);
>
> if (ret == 0)
> return new_idx - 1;
> @@ -905,14 +922,14 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>  */
> __hash_rw_writer_lock(h);
> /* We check for duplicates again since could be inserted before the lock */
> -ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
> +ret = search_and_update(h, data, key, prim_bkt, short_sig);
> if (ret != -1) {
> enqueue_slot_back(h, cached_free_slots, slot_id);
> goto failure;
> }
>
> FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> -ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
> +ret = search_and_update(h, data, key, cur_bkt, short_sig);
> if (ret != -1) {
> enqueue_slot_back(h, cached_free_slots, slot_id);
> goto failure;
> @@ -924,8 +941,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> /* Check if slot is available */
> if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
> -cur_bkt->sig_current[i] = alt_hash;
> -cur_bkt->sig_alt[i] = sig;
> +cur_bkt->sig_current[i] = short_sig;
> cur_bkt->key_idx[i] = new_idx;
> __hash_rw_writer_unlock(h);
> return new_idx - 1;
> @@ -943,8 +959,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>
> bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
> /* Use the first location of the new bucket */
> -(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
> -(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
> +(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
> (h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
> /* Link the new bucket to sec bucket linked list */
> last = rte_hash_get_last_bkt(sec_bkt);
> @@ -1003,7 +1018,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
>
> /* Search one bucket to find the match key */
> static inline int32_t
> -search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
> +search_one_bucket(const struct rte_hash *h, const void *key, uint16_t sig,
> void **data, const struct rte_hash_bucket *bkt)
> {
> int i;
> @@ -1032,30 +1047,30 @@ static inline int32_t
> __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
> hash_sig_t sig, void **data)
> {
> -uint32_t bucket_idx;
> -hash_sig_t alt_hash;
> +uint32_t prim_bucket_idx, sec_bucket_idx;
> struct rte_hash_bucket *bkt, *cur_bkt;
> int ret;
> +uint16_t short_sig;
>
> -bucket_idx = sig & h->bucket_bitmask;
> -bkt = &h->buckets[bucket_idx];
> +short_sig = get_short_sig(sig);
> +prim_bucket_idx = get_prim_bucket_index(h, sig);
> +sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
> +bkt = &h->buckets[prim_bucket_idx];
>
> __hash_rw_reader_lock(h);
>
> /* Check if key is in primary location */
> -ret = search_one_bucket(h, key, sig, data, bkt);
> +ret = search_one_bucket(h, key, short_sig, data, bkt);
> if (ret != -1) {
> __hash_rw_reader_unlock(h);
> return ret;
> }
> /* Calculate secondary hash */
> -alt_hash = rte_hash_secondary_hash(sig);
> -bucket_idx = alt_hash & h->bucket_bitmask;
> -bkt = &h->buckets[bucket_idx];
> +bkt = &h->buckets[sec_bucket_idx];
>
> /* Check if key is in secondary location */
> FOR_EACH_BUCKET(cur_bkt, bkt) {
> -ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
> +ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
> if (ret != -1) {
> __hash_rw_reader_unlock(h);
> return ret;
> @@ -1102,7 +1117,6 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
> struct lcore_cache *cached_free_slots;
>
> bkt->sig_current[i] = NULL_SIGNATURE;
> -bkt->sig_alt[i] = NULL_SIGNATURE;
> if (h->multi_writer_support) {
> lcore_id = rte_lcore_id();
> cached_free_slots = &h->local_free_slots[lcore_id];
> @@ -1141,9 +1155,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
> if (last_bkt->key_idx[i] != EMPTY_SLOT) {
> cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
> cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
> -cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
> last_bkt->sig_current[i] = NULL_SIGNATURE;
> -last_bkt->sig_alt[i] = NULL_SIGNATURE;
> last_bkt->key_idx[i] = EMPTY_SLOT;
> return;
> }
> @@ -1153,7 +1165,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
> /* Search one bucket and remove the matched key */
> static inline int32_t
> search_and_remove(const struct rte_hash *h, const void *key,
> -struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
> +struct rte_hash_bucket *bkt, uint16_t sig, int *pos)
> {
> struct rte_hash_key *k, *keys = h->key_store;
> unsigned int i;
> @@ -1185,19 +1197,21 @@ static inline int32_t
> __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
> hash_sig_t sig)
> {
> -uint32_t bucket_idx;
> -hash_sig_t alt_hash;
> +uint32_t prim_bucket_idx, sec_bucket_idx;
> struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
> struct rte_hash_bucket *cur_bkt;
> int pos;
> int32_t ret, i;
> +uint16_t short_sig;
>
> -bucket_idx = sig & h->bucket_bitmask;
> -prim_bkt = &h->buckets[bucket_idx];
> +short_sig = get_short_sig(sig);
> +prim_bucket_idx = get_prim_bucket_index(h, sig);
> +sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
> +prim_bkt = &h->buckets[prim_bucket_idx];
>
> __hash_rw_writer_lock(h);
> /* look for key in primary bucket */
> -ret = search_and_remove(h, key, prim_bkt, sig, &pos);
> +ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
> if (ret != -1) {
> __rte_hash_compact_ll(prim_bkt, pos);
> last_bkt = prim_bkt->next;
> @@ -1206,12 +1220,10 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
> }
>
> /* Calculate secondary hash */
> -alt_hash = rte_hash_secondary_hash(sig);
> -bucket_idx = alt_hash & h->bucket_bitmask;
> -sec_bkt = &h->buckets[bucket_idx];
> +sec_bkt = &h->buckets[sec_bucket_idx];
>
> FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> -ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
> +ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
> if (ret != -1) {
> __rte_hash_compact_ll(cur_bkt, pos);
> last_bkt = sec_bkt->next;
> @@ -1288,55 +1300,35 @@ static inline void
> compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
> const struct rte_hash_bucket *prim_bkt,
> const struct rte_hash_bucket *sec_bkt,
> -hash_sig_t prim_hash, hash_sig_t sec_hash,
> +uint16_t sig,
> enum rte_hash_sig_compare_function sig_cmp_fn)
> {
> unsigned int i;
>
> +/* For match mask the first bit of every two bits indicates the match */
> switch (sig_cmp_fn) {
> -#ifdef RTE_MACHINE_CPUFLAG_AVX2
> -case RTE_HASH_COMPARE_AVX2:
> -*prim_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
> -_mm256_load_si256(
> -(__m256i const *)prim_bkt->sig_current),
> -_mm256_set1_epi32(prim_hash)));
> -*sec_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
> -_mm256_load_si256(
> -(__m256i const *)sec_bkt->sig_current),
> -_mm256_set1_epi32(sec_hash)));
> -break;
> -#endif
> #ifdef RTE_MACHINE_CPUFLAG_SSE2
> case RTE_HASH_COMPARE_SSE:
> -/* Compare the first 4 signatures in the bucket */
> -*prim_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> +/* Compare all signatures in the bucket */
> +*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> _mm_load_si128(
> (__m128i const *)prim_bkt->sig_current),
> -_mm_set1_epi32(prim_hash)));
> -*prim_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> -_mm_load_si128(
> -(__m128i const *)&prim_bkt->sig_current[4]),
> -_mm_set1_epi32(prim_hash)))) << 4;
> -/* Compare the first 4 signatures in the bucket */
> -*sec_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> +_mm_set1_epi16(sig)));
> +/* Compare all signatures in the bucket */
> +*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> _mm_load_si128(
> (__m128i const *)sec_bkt->sig_current),
> -_mm_set1_epi32(sec_hash)));
> -*sec_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> -_mm_load_si128(
> -(__m128i const *)&sec_bkt->sig_current[4]),
> -_mm_set1_epi32(sec_hash)))) << 4;
> +_mm_set1_epi16(sig)));
> break;
> #endif
> default:
> for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> *prim_hash_matches |=
> -((prim_hash == prim_bkt->sig_current[i]) << i);
> +((sig == prim_bkt->sig_current[i]) << (i << 1));
> *sec_hash_matches |=
> -((sec_hash == sec_bkt->sig_current[i]) << i);
> +((sig == sec_bkt->sig_current[i]) << (i << 1));
> }
> }
> -
> }
>
> #define PREFETCH_OFFSET 4
> @@ -1349,7 +1341,9 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> int32_t i;
> int32_t ret;
> uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
> -uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
> +uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
> +uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
> +uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
> const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
> const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
> uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
> @@ -1368,10 +1362,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> rte_prefetch0(keys[i + PREFETCH_OFFSET]);
>
> prim_hash[i] = rte_hash_hash(h, keys[i]);
> -sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
>
> -primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
> -secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
> +sig[i] = get_short_sig(prim_hash[i]);
> +prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
> +sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
> +
> +primary_bkt[i] = &h->buckets[prim_index[i]];
> +secondary_bkt[i] = &h->buckets[sec_index[i]];
>
> rte_prefetch0(primary_bkt[i]);
> rte_prefetch0(secondary_bkt[i]);
> @@ -1380,10 +1377,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> /* Calculate and prefetch rest of the buckets */
> for (; i < num_keys; i++) {
> prim_hash[i] = rte_hash_hash(h, keys[i]);
> -sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
>
> -primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
> -secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
> +sig[i] = get_short_sig(prim_hash[i]);
> +prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
> +sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
> +
> +primary_bkt[i] = &h->buckets[prim_index[i]];
> +secondary_bkt[i] = &h->buckets[sec_index[i]];
>
> rte_prefetch0(primary_bkt[i]);
> rte_prefetch0(secondary_bkt[i]);
> @@ -1394,10 +1394,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> for (i = 0; i < num_keys; i++) {
> compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
> primary_bkt[i], secondary_bkt[i],
> -prim_hash[i], sec_hash[i], h->sig_cmp_fn);
> +sig[i], h->sig_cmp_fn);
>
> if (prim_hitmask[i]) {
> -uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
> +uint32_t first_hit =
> +__builtin_ctzl(prim_hitmask[i]) >> 1;
> uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
> const struct rte_hash_key *key_slot =
> (const struct rte_hash_key *)(
> @@ -1408,7 +1409,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> }
>
> if (sec_hitmask[i]) {
> -uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
> +uint32_t first_hit =
> +__builtin_ctzl(sec_hitmask[i]) >> 1;
> uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit];
> const struct rte_hash_key *key_slot =
> (const struct rte_hash_key *)(
> @@ -1422,7 +1424,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> for (i = 0; i < num_keys; i++) {
> positions[i] = -ENOENT;
> while (prim_hitmask[i]) {
> -uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
> +uint32_t hit_index =
> +__builtin_ctzl(prim_hitmask[i]) >> 1;
>
> uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
> const struct rte_hash_key *key_slot =
> @@ -1441,11 +1444,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> positions[i] = key_idx - 1;
> goto next_key;
> }
> -prim_hitmask[i] &= ~(1 << (hit_index));
> +prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
> }
>
> while (sec_hitmask[i]) {
> -uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
> +uint32_t hit_index =
> +__builtin_ctzl(sec_hitmask[i]) >> 1;
>
> uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
> const struct rte_hash_key *key_slot =
> @@ -1465,7 +1469,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> positions[i] = key_idx - 1;
> goto next_key;
> }
> -sec_hitmask[i] &= ~(1 << (hit_index));
> +sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
> }
>
> next_key:
> @@ -1488,10 +1492,10 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> FOR_EACH_BUCKET(cur_bkt, next_bkt) {
> if (data != NULL)
> ret = search_one_bucket(h, keys[i],
> -sec_hash[i], &data[i], cur_bkt);
> +sig[i], &data[i], cur_bkt);
> else
> ret = search_one_bucket(h, keys[i],
> -sec_hash[i], NULL, cur_bkt);
> +sig[i], NULL, cur_bkt);
> if (ret != -1) {
> positions[i] = ret;
> hits |= 1ULL << i;
> diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
> index e601520..7753cd8 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.h
> +++ b/lib/librte_hash/rte_cuckoo_hash.h
> @@ -129,18 +129,15 @@ struct rte_hash_key {
> enum rte_hash_sig_compare_function {
> RTE_HASH_COMPARE_SCALAR = 0,
> RTE_HASH_COMPARE_SSE,
> -RTE_HASH_COMPARE_AVX2,
> RTE_HASH_COMPARE_NUM
> };
>
> /** Bucket structure */
> struct rte_hash_bucket {
> -hash_sig_t sig_current[RTE_HASH_BUCKET_ENTRIES];
> +uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
>
> uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
>
> -hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
> -
> uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
>
> void *next;
> @@ -193,6 +190,7 @@ struct rte_hash {
>
> struct queue_node {
> struct rte_hash_bucket *bkt; /* Current bucket on the bfs search */
> +uint32_t cur_bkt_idx;
>
> struct queue_node *prev;     /* Parent(bucket) in search path */
> int prev_slot;               /* Parent(slot) in search path */
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
> index 11d8e28..6ace64e 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -40,7 +40,10 @@ extern "C" {
> /** Flag to indicate the extendabe bucket table feature should be used */
> #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
>
> -/** Signature of key that is stored internally. */
> +/**
> + * The type of hash value of a key.
> + * It should be a value of at least 32bit with fully random pattern.
> + */
> typedef uint32_t hash_sig_t;
>
> /** Type of function that can be used for calculating the hash value. */
> --
> 2.7.4
>


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 2/4] hash: add extendable bucket feature
  2018-10-02  3:58       ` Honnappa Nagarahalli
@ 2018-10-02 23:39         ` Wang, Yipeng1
  2018-10-03  4:37           ` Honnappa Nagarahalli
  2018-10-03 15:08           ` Stephen Hemminger
  0 siblings, 2 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-10-02 23:39 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce
  Cc: Ananyev, Konstantin, dev, Gobriel, Sameh, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> +
>> +	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
>> +		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
>> +			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
>> +			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
>> +			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
>> +			last_bkt->sig_current[i] = NULL_SIGNATURE;
>> +			last_bkt->sig_alt[i] = NULL_SIGNATURE;
>> +			last_bkt->key_idx[i] = EMPTY_SLOT;
>> +			return;
>In the lock-free algorithm, this will require a global counter increment.
>
[Wang, Yipeng] I agree. Similar to your protection for the cuckoo displacement, this would protect the copy part.
>> +	while (last_bkt->next) {
>> +		prev_bkt = last_bkt;
>> +		last_bkt = last_bkt->next;
>> +	}
>Minor: We are trying to find the last bucket here, along with its previous one. Maybe we can modify 'rte_hash_get_last_bkt' instead?
>
[Wang, Yipeng] Then there will be one more store in each iteration for the regular find_last. I had a separate function for
this, but since it is only used here I removed it. If you think it is necessary, or you can reuse it somewhere else for the lock-free work, I can add it back.
>> +
>> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>> +		if (last_bkt->key_idx[i] != EMPTY_SLOT)
>> +			break;
>> +	}
>> +	/* found empty bucket and recycle */
>> +	if (i == RTE_HASH_BUCKET_ENTRIES) {
>> +		prev_bkt->next = last_bkt->next = NULL;
>> +		uint32_t index = last_bkt - h->buckets_ext + 1;
>> +		rte_ring_sp_enqueue(h->free_ext_bkts, (void
>> *)(uintptr_t)index);
>In the lock-less algorithm, the bucket cannot be freed immediately. I looked at a couple of solutions. The bucket needs to be stored
>internally and should be associated with the key-store index (or position). I am thinking that I will add a field to 'struct rte_hash_key'
>to store the bucket pointer or index.
[Wang, Yipeng] Even if the bucket is recycled immediately, what is the worst that could happen? Even if there are readers currently iterating the deleted
bucket, it is still fine, right? It is a miss anyway.
>
>From the code, my understanding is that we will free only the last bucket. We will never free the middle bucket, please correct me if I
>am wrong. This will keep it simple for the lock-free algorithm.
[Wang, Yipeng] It is correct.
>
>I could work through these issues. So, I do not see any issues for lock-free algorithm (as of now :) ).
>
>> +
>> +	/* Out of bounds of all buckets (both main table and ext table */
>Typo: missing ')'
>
[Wang, Yipeng] Yeah thanks!

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 1/4] hash: fix race condition in iterate
  2018-10-02  4:26           ` Honnappa Nagarahalli
@ 2018-10-02 23:53             ` Wang, Yipeng1
  0 siblings, 0 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-10-02 23:53 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce
  Cc: Ananyev, Konstantin, dev, Gobriel, Sameh, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> >>  	/* Get position of entry in key table */
>> >>  	position = h->buckets[bucket_idx].key_idx[idx];
>> >If we change the while loop as I suggested above, we can remove this line.
>> >
>> >>  	next_key = (struct rte_hash_key *) ((char *)h->key_store +
>>
>> [Wang, Yipeng] Sorry that I did not realize you already have it in your patch
>> set and I agree.
>> Do you want to export it as a bug fix in your patch set? I will remove my
>> change.
>Sure, I will make a separate commit for this.
[Wang, Yipeng] I fixed the issue you mentioned and put it in v5, and I saw you acked it. You don't need to change your patch now.
>>
>> For the lock-free case, do we need to protect it with a version counter? Imagine the
>> following corner case:
>> While the iterator is reading the key and data, a writer deletes, removes,
>> and recycles the key-data pair, and then writes a new key and data into it. While the
>> writer is writing, could the reader read out a wrong key/data, or a mismatched
>> key/data pair?
>>
>In the lock-free algorithm, the key-data is not 'freed' until the readers have completed all their references to the 'deleted' key-data.
>Hence, the writers will not be able to allocate the same key store index till the readers have stopped referring to the 'deleted' key-
>data.
>I re-checked my ladder diagrams [1] and I could not find any issues.
>
>[1] https://dpdkuserspace2018.sched.com/event/G44w/lock-free-read-write-concurrency-in-rtehash (PPT)
[Wang, Yipeng]
After checking your slides, I agree. It works when the upper-level application has RCU-like mechanisms to ensure
the grace period has finished before recycling. In this case, the logic is good!

If you haven't done so, could you be more specific in the doc and the API comment to indicate that the key-data pair recycle function should only be called
after the readers have finished (or maybe specifically indicate that an RCU-type mechanism is needed)? I feel that users not familiar with this could easily
get it wrong.

Besides this, I don't have any other concerns.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v5 4/4] hash: use partial-key hashing
  2018-10-02 20:52       ` Dharmik Thakkar
@ 2018-10-03  0:43         ` Wang, Yipeng1
  0 siblings, 0 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-10-03  0:43 UTC (permalink / raw)
  To: Dharmik Thakkar
  Cc: Richardson, Bruce, Ananyev, Konstantin, dev,
	Honnappa Nagarahalli, Gobriel, Sameh

I am sorry that I did not clearly say in the cover letter that this patch set
depends on another bug-fix patch set (http://patchwork.dpdk.org/cover/45611/)
we submitted. I will update the cover letter in the next version.

They were originally in the same patch set, and I separated them because one is dedicated to bug fixing.

Please check whether this is the reason the patch does not apply.

Thanks
Yipeng

>-----Original Message-----
>From: Dharmik Thakkar [mailto:Dharmik.Thakkar@arm.com]
>Sent: Tuesday, October 2, 2018 1:53 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>
>Cc: Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
>Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gobriel, Sameh <sameh.gobriel@intel.com>
>Subject: Re: [dpdk-dev] [PATCH v5 4/4] hash: use partial-key hashing
>
>I am attempting to test the patch on an Arm machine, but it failed to apply.
>
>I’m getting the following error:
>
>error: patch failed: test/test/test_hash_perf.c:18
>error: test/test/test_hash_perf.c: patch does not apply
>Patch failed at 0003 test/hash: implement extendable bucket hash test
>
>> On Oct 1, 2018, at 1:35 PM, Yipeng Wang <yipeng1.wang@intel.com> wrote:
>>
>> This commit changes the hashing mechanism to "partial-key
>> hashing" to calculate bucket index and signature of key.
>>
>> This is proposed in Bin Fan, et al.'s paper
>> "MemC3: Compact and Concurrent MemCache with Dumber Caching
>> and Smarter Hashing". Basically the idea is to use "xor" to
>> derive the alternative bucket from the current bucket index and
>> signature.
>>
>> With "partial-key hashing", it reduces the bucket memory
>> requirement from two cache lines to one cache line, which
>> improves the memory efficiency and thus the lookup speed.
>>
>> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>> ---
>> lib/librte_hash/rte_cuckoo_hash.c | 246 +++++++++++++++++++-------------------
>> lib/librte_hash/rte_cuckoo_hash.h |   6 +-
>> lib/librte_hash/rte_hash.h        |   5 +-
>> 3 files changed, 131 insertions(+), 126 deletions(-)
>>
>> diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
>> index 133e181..3c7c9c5 100644
>> --- a/lib/librte_hash/rte_cuckoo_hash.c
>> +++ b/lib/librte_hash/rte_cuckoo_hash.c
>> @@ -90,6 +90,36 @@ rte_hash_cmp_eq(const void *key1, const void *key2, const struct rte_hash *h)
>> return cmp_jump_table[h->cmp_jump_table_idx](key1, key2, h->key_len);
>> }
>>
>> +/*
>> + * We use higher 16 bits of hash as the signature value stored in table.
>> + * We use the lower bits for the primary bucket
>> + * location. Then we XOR primary bucket location and the signature
>> + * to get the secondary bucket location. This is same as
>> + * proposed in Bin Fan, et al's paper
>> + * "MemC3: Compact and Concurrent MemCache with Dumber Caching and
>> + * Smarter Hashing". The benefit to use
>> + * XOR is that one could derive the alternative bucket location
>> + * by only using the current bucket location and the signature.
>> + */
>> +static inline uint16_t
>> +get_short_sig(const hash_sig_t hash)
>> +{
>> +return hash >> 16;
>> +}
>> +
>> +static inline uint32_t
>> +get_prim_bucket_index(const struct rte_hash *h, const hash_sig_t hash)
>> +{
>> +return hash & h->bucket_bitmask;
>> +}
>> +
>> +static inline uint32_t
>> +get_alt_bucket_index(const struct rte_hash *h,
>> +uint32_t cur_bkt_idx, uint16_t sig)
>> +{
>> +return (cur_bkt_idx ^ sig) & h->bucket_bitmask;
>> +}
>> +
>> struct rte_hash *
>> rte_hash_create(const struct rte_hash_parameters *params)
>> {
>> @@ -327,9 +357,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
>> h->ext_table_support = ext_table_support;
>>
>> #if defined(RTE_ARCH_X86)
>> -if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
>> -h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
>> -else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
>> +if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
>> h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
>> else
>> #endif
>> @@ -417,18 +445,6 @@ rte_hash_hash(const struct rte_hash *h, const void *key)
>> return h->hash_func(key, h->key_len, h->hash_func_init_val);
>> }
>>
>> -/* Calc the secondary hash value from the primary hash value of a given key */
>> -static inline hash_sig_t
>> -rte_hash_secondary_hash(const hash_sig_t primary_hash)
>> -{
>> -static const unsigned all_bits_shift = 12;
>> -static const unsigned alt_bits_xor = 0x5bd1e995;
>> -
>> -uint32_t tag = primary_hash >> all_bits_shift;
>> -
>> -return primary_hash ^ ((tag + 1) * alt_bits_xor);
>> -}
>> -
>> int32_t
>> rte_hash_count(const struct rte_hash *h)
>> {
>> @@ -560,14 +576,13 @@ enqueue_slot_back(const struct rte_hash *h,
>> /* Search a key from bucket and update its data */
>> static inline int32_t
>> search_and_update(const struct rte_hash *h, void *data, const void *key,
>> -struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
>> +struct rte_hash_bucket *bkt, uint16_t sig)
>> {
>> int i;
>> struct rte_hash_key *k, *keys = h->key_store;
>>
>> for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>> -if (bkt->sig_current[i] == sig &&
>> -bkt->sig_alt[i] == alt_hash) {
>> +if (bkt->sig_current[i] == sig) {
>> k = (struct rte_hash_key *) ((char *)keys +
>> bkt->key_idx[i] * h->key_entry_size);
>> if (rte_hash_cmp_eq(key, k->key, h) == 0) {
>> @@ -594,7 +609,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>> struct rte_hash_bucket *prim_bkt,
>> struct rte_hash_bucket *sec_bkt,
>> const struct rte_hash_key *key, void *data,
>> -hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
>> +uint16_t sig, uint32_t new_idx,
>> int32_t *ret_val)
>> {
>> unsigned int i;
>> @@ -605,7 +620,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>> /* Check if key was inserted after last check but before this
>>  * protected region in case of inserting duplicated keys.
>>  */
>> -ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
>> +ret = search_and_update(h, data, key, prim_bkt, sig);
>> if (ret != -1) {
>> __hash_rw_writer_unlock(h);
>> *ret_val = ret;
>> @@ -613,7 +628,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>> }
>>
>> FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
>> -ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
>> +ret = search_and_update(h, data, key, cur_bkt, sig);
>> if (ret != -1) {
>> __hash_rw_writer_unlock(h);
>> *ret_val = ret;
>> @@ -628,7 +643,6 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
>> /* Check if slot is available */
>> if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
>> prim_bkt->sig_current[i] = sig;
>> -prim_bkt->sig_alt[i] = alt_hash;
>> prim_bkt->key_idx[i] = new_idx;
>> break;
>> }
>> @@ -653,7 +667,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>> struct rte_hash_bucket *alt_bkt,
>> const struct rte_hash_key *key, void *data,
>> struct queue_node *leaf, uint32_t leaf_slot,
>> -hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
>> +uint16_t sig, uint32_t new_idx,
>> int32_t *ret_val)
>> {
>> uint32_t prev_alt_bkt_idx;
>> @@ -674,7 +688,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>> /* Check if key was inserted after last check but before this
>>  * protected region.
>>  */
>> -ret = search_and_update(h, data, key, bkt, sig, alt_hash);
>> +ret = search_and_update(h, data, key, bkt, sig);
>> if (ret != -1) {
>> __hash_rw_writer_unlock(h);
>> *ret_val = ret;
>> @@ -682,7 +696,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>> }
>>
>> FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
>> -ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
>> +ret = search_and_update(h, data, key, cur_bkt, sig);
>> if (ret != -1) {
>> __hash_rw_writer_unlock(h);
>> *ret_val = ret;
>> @@ -695,8 +709,9 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>> prev_bkt = prev_node->bkt;
>> prev_slot = curr_node->prev_slot;
>>
>> -prev_alt_bkt_idx =
>> -prev_bkt->sig_alt[prev_slot] & h->bucket_bitmask;
>> +prev_alt_bkt_idx = get_alt_bucket_index(h,
>> +prev_node->cur_bkt_idx,
>> +prev_bkt->sig_current[prev_slot]);
>>
>> if (unlikely(&h->buckets[prev_alt_bkt_idx]
>> != curr_bkt)) {
>> @@ -710,10 +725,8 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>>  * Cuckoo insert to move elements back to its
>>  * primary bucket if available
>>  */
>> -curr_bkt->sig_alt[curr_slot] =
>> - prev_bkt->sig_current[prev_slot];
>> curr_bkt->sig_current[curr_slot] =
>> -prev_bkt->sig_alt[prev_slot];
>> +prev_bkt->sig_current[prev_slot];
>> curr_bkt->key_idx[curr_slot] =
>> prev_bkt->key_idx[prev_slot];
>>
>> @@ -723,7 +736,6 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
>> }
>>
>> curr_bkt->sig_current[curr_slot] = sig;
>> -curr_bkt->sig_alt[curr_slot] = alt_hash;
>> curr_bkt->key_idx[curr_slot] = new_idx;
>>
>> __hash_rw_writer_unlock(h);
>> @@ -741,39 +753,44 @@ rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
>> struct rte_hash_bucket *bkt,
>> struct rte_hash_bucket *sec_bkt,
>> const struct rte_hash_key *key, void *data,
>> -hash_sig_t sig, hash_sig_t alt_hash,
>> +uint16_t sig, uint32_t bucket_idx,
>> uint32_t new_idx, int32_t *ret_val)
>> {
>> unsigned int i;
>> struct queue_node queue[RTE_HASH_BFS_QUEUE_MAX_LEN];
>> struct queue_node *tail, *head;
>> struct rte_hash_bucket *curr_bkt, *alt_bkt;
>> +uint32_t cur_idx, alt_idx;
>>
>> tail = queue;
>> head = queue + 1;
>> tail->bkt = bkt;
>> tail->prev = NULL;
>> tail->prev_slot = -1;
>> +tail->cur_bkt_idx = bucket_idx;
>>
>> /* Cuckoo bfs Search */
>> while (likely(tail != head && head <
>> queue + RTE_HASH_BFS_QUEUE_MAX_LEN -
>> RTE_HASH_BUCKET_ENTRIES)) {
>> curr_bkt = tail->bkt;
>> +cur_idx = tail->cur_bkt_idx;
>> for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>> if (curr_bkt->key_idx[i] == EMPTY_SLOT) {
>> int32_t ret = rte_hash_cuckoo_move_insert_mw(h,
>> bkt, sec_bkt, key, data,
>> -tail, i, sig, alt_hash,
>> +tail, i, sig,
>> new_idx, ret_val);
>> if (likely(ret != -1))
>> return ret;
>> }
>>
>> /* Enqueue new node and keep prev node info */
>> -alt_bkt = &(h->buckets[curr_bkt->sig_alt[i]
>> -    & h->bucket_bitmask]);
>> +alt_idx = get_alt_bucket_index(h, cur_idx,
>> +curr_bkt->sig_current[i]);
>> +alt_bkt = &(h->buckets[alt_idx]);
>> head->bkt = alt_bkt;
>> +head->cur_bkt_idx = alt_idx;
>> head->prev = tail;
>> head->prev_slot = i;
>> head++;
>> @@ -788,7 +805,7 @@ static inline int32_t
>> __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>> hash_sig_t sig, void *data)
>> {
>> -hash_sig_t alt_hash;
>> +uint16_t short_sig;
>> uint32_t prim_bucket_idx, sec_bucket_idx;
>> struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
>> struct rte_hash_key *new_k, *keys = h->key_store;
>> @@ -803,18 +820,17 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>> int32_t ret_val;
>> struct rte_hash_bucket *last;
>>
>> -prim_bucket_idx = sig & h->bucket_bitmask;
>> +short_sig = get_short_sig(sig);
>> +prim_bucket_idx = get_prim_bucket_index(h, sig);
>> +sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
>> prim_bkt = &h->buckets[prim_bucket_idx];
>> -rte_prefetch0(prim_bkt);
>> -
>> -alt_hash = rte_hash_secondary_hash(sig);
>> -sec_bucket_idx = alt_hash & h->bucket_bitmask;
>> sec_bkt = &h->buckets[sec_bucket_idx];
>> +rte_prefetch0(prim_bkt);
>> rte_prefetch0(sec_bkt);
>>
>> /* Check if key is already inserted in primary location */
>> __hash_rw_writer_lock(h);
>> -ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
>> +ret = search_and_update(h, data, key, prim_bkt, short_sig);
>> if (ret != -1) {
>> __hash_rw_writer_unlock(h);
>> return ret;
>> @@ -822,12 +838,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>>
>> /* Check if key is already inserted in secondary location */
>> FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
>> -ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
>> +ret = search_and_update(h, data, key, cur_bkt, short_sig);
>> if (ret != -1) {
>> __hash_rw_writer_unlock(h);
>> return ret;
>> }
>> }
>> +
>> __hash_rw_writer_unlock(h);
>>
>> /* Did not find a match, so get a new slot for storing the new key */
>> @@ -865,7 +882,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>>
>> /* Find an empty slot and insert */
>> ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
>> -sig, alt_hash, new_idx, &ret_val);
>> +short_sig, new_idx, &ret_val);
>> if (ret == 0)
>> return new_idx - 1;
>> else if (ret == 1) {
>> @@ -875,7 +892,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>>
>> /* Primary bucket full, need to make space for new entry */
>> ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
>> -sig, alt_hash, new_idx, &ret_val);
>> +short_sig, prim_bucket_idx, new_idx, &ret_val);
>> if (ret == 0)
>> return new_idx - 1;
>> else if (ret == 1) {
>> @@ -885,7 +902,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>>
>> /* Also search secondary bucket to get better occupancy */
>> ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
>> -alt_hash, sig, new_idx, &ret_val);
>> +short_sig, sec_bucket_idx, new_idx, &ret_val);
>>
>> if (ret == 0)
>> return new_idx - 1;
>> @@ -905,14 +922,14 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>>  */
>> __hash_rw_writer_lock(h);
>> /* We check for duplicates again since could be inserted before the lock */
>> -ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
>> +ret = search_and_update(h, data, key, prim_bkt, short_sig);
>> if (ret != -1) {
>> enqueue_slot_back(h, cached_free_slots, slot_id);
>> goto failure;
>> }
>>
>> FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
>> -ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
>> +ret = search_and_update(h, data, key, cur_bkt, short_sig);
>> if (ret != -1) {
>> enqueue_slot_back(h, cached_free_slots, slot_id);
>> goto failure;
>> @@ -924,8 +941,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>> for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>> /* Check if slot is available */
>> if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
>> -cur_bkt->sig_current[i] = alt_hash;
>> -cur_bkt->sig_alt[i] = sig;
>> +cur_bkt->sig_current[i] = short_sig;
>> cur_bkt->key_idx[i] = new_idx;
>> __hash_rw_writer_unlock(h);
>> return new_idx - 1;
>> @@ -943,8 +959,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>>
>> bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
>> /* Use the first location of the new bucket */
>> -(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
>> -(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
>> +(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
>> (h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
>> /* Link the new bucket to sec bucket linked list */
>> last = rte_hash_get_last_bkt(sec_bkt);
>> @@ -1003,7 +1018,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
>>
>> /* Search one bucket to find the match key */
>> static inline int32_t
>> -search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
>> +search_one_bucket(const struct rte_hash *h, const void *key, uint16_t sig,
>> void **data, const struct rte_hash_bucket *bkt)
>> {
>> int i;
>> @@ -1032,30 +1047,30 @@ static inline int32_t
>> __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
>> hash_sig_t sig, void **data)
>> {
>> -uint32_t bucket_idx;
>> -hash_sig_t alt_hash;
>> +uint32_t prim_bucket_idx, sec_bucket_idx;
>> struct rte_hash_bucket *bkt, *cur_bkt;
>> int ret;
>> +uint16_t short_sig;
>>
>> -bucket_idx = sig & h->bucket_bitmask;
>> -bkt = &h->buckets[bucket_idx];
>> +short_sig = get_short_sig(sig);
>> +prim_bucket_idx = get_prim_bucket_index(h, sig);
>> +sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
>> +bkt = &h->buckets[prim_bucket_idx];
>>
>> __hash_rw_reader_lock(h);
>>
>> /* Check if key is in primary location */
>> -ret = search_one_bucket(h, key, sig, data, bkt);
>> +ret = search_one_bucket(h, key, short_sig, data, bkt);
>> if (ret != -1) {
>> __hash_rw_reader_unlock(h);
>> return ret;
>> }
>> /* Calculate secondary hash */
>> -alt_hash = rte_hash_secondary_hash(sig);
>> -bucket_idx = alt_hash & h->bucket_bitmask;
>> -bkt = &h->buckets[bucket_idx];
>> +bkt = &h->buckets[sec_bucket_idx];
>>
>> /* Check if key is in secondary location */
>> FOR_EACH_BUCKET(cur_bkt, bkt) {
>> -ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
>> +ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
>> if (ret != -1) {
>> __hash_rw_reader_unlock(h);
>> return ret;
>> @@ -1102,7 +1117,6 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
>> struct lcore_cache *cached_free_slots;
>>
>> bkt->sig_current[i] = NULL_SIGNATURE;
>> -bkt->sig_alt[i] = NULL_SIGNATURE;
>> if (h->multi_writer_support) {
>> lcore_id = rte_lcore_id();
>> cached_free_slots = &h->local_free_slots[lcore_id];
>> @@ -1141,9 +1155,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
>> if (last_bkt->key_idx[i] != EMPTY_SLOT) {
>> cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
>> cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
>> -cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
>> last_bkt->sig_current[i] = NULL_SIGNATURE;
>> -last_bkt->sig_alt[i] = NULL_SIGNATURE;
>> last_bkt->key_idx[i] = EMPTY_SLOT;
>> return;
>> }
>> @@ -1153,7 +1165,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
>> /* Search one bucket and remove the matched key */
>> static inline int32_t
>> search_and_remove(const struct rte_hash *h, const void *key,
>> -struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
>> +struct rte_hash_bucket *bkt, uint16_t sig, int *pos)
>> {
>> struct rte_hash_key *k, *keys = h->key_store;
>> unsigned int i;
>> @@ -1185,19 +1197,21 @@ static inline int32_t
>> __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
>> hash_sig_t sig)
>> {
>> -uint32_t bucket_idx;
>> -hash_sig_t alt_hash;
>> +uint32_t prim_bucket_idx, sec_bucket_idx;
>> struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
>> struct rte_hash_bucket *cur_bkt;
>> int pos;
>> int32_t ret, i;
>> +uint16_t short_sig;
>>
>> -bucket_idx = sig & h->bucket_bitmask;
>> -prim_bkt = &h->buckets[bucket_idx];
>> +short_sig = get_short_sig(sig);
>> +prim_bucket_idx = get_prim_bucket_index(h, sig);
>> +sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
>> +prim_bkt = &h->buckets[prim_bucket_idx];
>>
>> __hash_rw_writer_lock(h);
>> /* look for key in primary bucket */
>> -ret = search_and_remove(h, key, prim_bkt, sig, &pos);
>> +ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
>> if (ret != -1) {
>> __rte_hash_compact_ll(prim_bkt, pos);
>> last_bkt = prim_bkt->next;
>> @@ -1206,12 +1220,10 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
>> }
>>
>> /* Calculate secondary hash */
>> -alt_hash = rte_hash_secondary_hash(sig);
>> -bucket_idx = alt_hash & h->bucket_bitmask;
>> -sec_bkt = &h->buckets[bucket_idx];
>> +sec_bkt = &h->buckets[sec_bucket_idx];
>>
>> FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
>> -ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
>> +ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
>> if (ret != -1) {
>> __rte_hash_compact_ll(cur_bkt, pos);
>> last_bkt = sec_bkt->next;
>> @@ -1288,55 +1300,35 @@ static inline void
>> compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
>> const struct rte_hash_bucket *prim_bkt,
>> const struct rte_hash_bucket *sec_bkt,
>> -hash_sig_t prim_hash, hash_sig_t sec_hash,
>> +uint16_t sig,
>> enum rte_hash_sig_compare_function sig_cmp_fn)
>> {
>> unsigned int i;
>>
>> +/* For match mask the first bit of every two bits indicates the match */
>> switch (sig_cmp_fn) {
>> -#ifdef RTE_MACHINE_CPUFLAG_AVX2
>> -case RTE_HASH_COMPARE_AVX2:
>> -*prim_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
>> -_mm256_load_si256(
>> -(__m256i const *)prim_bkt->sig_current),
>> -_mm256_set1_epi32(prim_hash)));
>> -*sec_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
>> -_mm256_load_si256(
>> -(__m256i const *)sec_bkt->sig_current),
>> -_mm256_set1_epi32(sec_hash)));
>> -break;
>> -#endif
>> #ifdef RTE_MACHINE_CPUFLAG_SSE2
>> case RTE_HASH_COMPARE_SSE:
>> -/* Compare the first 4 signatures in the bucket */
>> -*prim_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
>> +/* Compare all signatures in the bucket */
>> +*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
>> _mm_load_si128(
>> (__m128i const *)prim_bkt->sig_current),
>> -_mm_set1_epi32(prim_hash)));
>> -*prim_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
>> -_mm_load_si128(
>> -(__m128i const *)&prim_bkt->sig_current[4]),
>> -_mm_set1_epi32(prim_hash)))) << 4;
>> -/* Compare the first 4 signatures in the bucket */
>> -*sec_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
>> +_mm_set1_epi16(sig)));
>> +/* Compare all signatures in the bucket */
>> +*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
>> _mm_load_si128(
>> (__m128i const *)sec_bkt->sig_current),
>> -_mm_set1_epi32(sec_hash)));
>> -*sec_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
>> -_mm_load_si128(
>> -(__m128i const *)&sec_bkt->sig_current[4]),
>> -_mm_set1_epi32(sec_hash)))) << 4;
>> +_mm_set1_epi16(sig)));
>> break;
>> #endif
>> default:
>> for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>> *prim_hash_matches |=
>> -((prim_hash == prim_bkt->sig_current[i]) << i);
>> +((sig == prim_bkt->sig_current[i]) << (i << 1));
>> *sec_hash_matches |=
>> -((sec_hash == sec_bkt->sig_current[i]) << i);
>> +((sig == sec_bkt->sig_current[i]) << (i << 1));
>> }
>> }
>> -
>> }
>>
>> #define PREFETCH_OFFSET 4
>> @@ -1349,7 +1341,9 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>> int32_t i;
>> int32_t ret;
>> uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
>> -uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
>> +uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
>> +uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
>> +uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
>> const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>> const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
>> uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
>> @@ -1368,10 +1362,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>> rte_prefetch0(keys[i + PREFETCH_OFFSET]);
>>
>> prim_hash[i] = rte_hash_hash(h, keys[i]);
>> -sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
>>
>> -primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
>> -secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
>> +sig[i] = get_short_sig(prim_hash[i]);
>> +prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
>> +sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
>> +
>> +primary_bkt[i] = &h->buckets[prim_index[i]];
>> +secondary_bkt[i] = &h->buckets[sec_index[i]];
>>
>> rte_prefetch0(primary_bkt[i]);
>> rte_prefetch0(secondary_bkt[i]);
>> @@ -1380,10 +1377,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>> /* Calculate and prefetch rest of the buckets */
>> for (; i < num_keys; i++) {
>> prim_hash[i] = rte_hash_hash(h, keys[i]);
>> -sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
>>
>> -primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
>> -secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
>> +sig[i] = get_short_sig(prim_hash[i]);
>> +prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
>> +sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
>> +
>> +primary_bkt[i] = &h->buckets[prim_index[i]];
>> +secondary_bkt[i] = &h->buckets[sec_index[i]];
>>
>> rte_prefetch0(primary_bkt[i]);
>> rte_prefetch0(secondary_bkt[i]);
>> @@ -1394,10 +1394,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>> for (i = 0; i < num_keys; i++) {
>> compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
>> primary_bkt[i], secondary_bkt[i],
>> -prim_hash[i], sec_hash[i], h->sig_cmp_fn);
>> +sig[i], h->sig_cmp_fn);
>>
>> if (prim_hitmask[i]) {
>> -uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
>> +uint32_t first_hit =
>> +__builtin_ctzl(prim_hitmask[i]) >> 1;
>> uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
>> const struct rte_hash_key *key_slot =
>> (const struct rte_hash_key *)(
>> @@ -1408,7 +1409,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>> }
>>
>> if (sec_hitmask[i]) {
>> -uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
>> +uint32_t first_hit =
>> +__builtin_ctzl(sec_hitmask[i]) >> 1;
>> uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit];
>> const struct rte_hash_key *key_slot =
>> (const struct rte_hash_key *)(
>> @@ -1422,7 +1424,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>> for (i = 0; i < num_keys; i++) {
>> positions[i] = -ENOENT;
>> while (prim_hitmask[i]) {
>> -uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
>> +uint32_t hit_index =
>> +__builtin_ctzl(prim_hitmask[i]) >> 1;
>>
>> uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
>> const struct rte_hash_key *key_slot =
>> @@ -1441,11 +1444,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>> positions[i] = key_idx - 1;
>> goto next_key;
>> }
>> -prim_hitmask[i] &= ~(1 << (hit_index));
>> +prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
>> }
>>
>> while (sec_hitmask[i]) {
>> -uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
>> +uint32_t hit_index =
>> +__builtin_ctzl(sec_hitmask[i]) >> 1;
>>
>> uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
>> const struct rte_hash_key *key_slot =
>> @@ -1465,7 +1469,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>> positions[i] = key_idx - 1;
>> goto next_key;
>> }
>> -sec_hitmask[i] &= ~(1 << (hit_index));
>> +sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
>> }
>>
>> next_key:
>> @@ -1488,10 +1492,10 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
>> FOR_EACH_BUCKET(cur_bkt, next_bkt) {
>> if (data != NULL)
>> ret = search_one_bucket(h, keys[i],
>> -sec_hash[i], &data[i], cur_bkt);
>> +sig[i], &data[i], cur_bkt);
>> else
>> ret = search_one_bucket(h, keys[i],
>> -sec_hash[i], NULL, cur_bkt);
>> +sig[i], NULL, cur_bkt);
>> if (ret != -1) {
>> positions[i] = ret;
>> hits |= 1ULL << i;
>> diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
>> index e601520..7753cd8 100644
>> --- a/lib/librte_hash/rte_cuckoo_hash.h
>> +++ b/lib/librte_hash/rte_cuckoo_hash.h
>> @@ -129,18 +129,15 @@ struct rte_hash_key {
>> enum rte_hash_sig_compare_function {
>> RTE_HASH_COMPARE_SCALAR = 0,
>> RTE_HASH_COMPARE_SSE,
>> -RTE_HASH_COMPARE_AVX2,
>> RTE_HASH_COMPARE_NUM
>> };
>>
>> /** Bucket structure */
>> struct rte_hash_bucket {
>> -hash_sig_t sig_current[RTE_HASH_BUCKET_ENTRIES];
>> +uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
>>
>> uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
>>
>> -hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
>> -
>> uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
>>
>> void *next;
>> @@ -193,6 +190,7 @@ struct rte_hash {
>>
>> struct queue_node {
>> struct rte_hash_bucket *bkt; /* Current bucket on the bfs search */
>> +uint32_t cur_bkt_idx;
>>
>> struct queue_node *prev;     /* Parent(bucket) in search path */
>> int prev_slot;               /* Parent(slot) in search path */
>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>> index 11d8e28..6ace64e 100644
>> --- a/lib/librte_hash/rte_hash.h
>> +++ b/lib/librte_hash/rte_hash.h
>> @@ -40,7 +40,10 @@ extern "C" {
>> /** Flag to indicate the extendabe bucket table feature should be used */
>> #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
>>
>> -/** Signature of key that is stored internally. */
>> +/**
>> + * The type of hash value of a key.
>> + * It should be a value of at least 32bit with fully random pattern.
>> + */
>> typedef uint32_t hash_sig_t;
>>
>> /** Type of function that can be used for calculating the hash value. */
>> --
>> 2.7.4
>>
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 2/4] hash: add extendable bucket feature
  2018-10-02 23:39         ` Wang, Yipeng1
@ 2018-10-03  4:37           ` Honnappa Nagarahalli
  2018-10-03 15:08           ` Stephen Hemminger
  1 sibling, 0 replies; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-03  4:37 UTC (permalink / raw)
  To: Wang, Yipeng1, Richardson, Bruce
  Cc: Ananyev, Konstantin, dev, Gobriel, Sameh, nd

> >> +	while (last_bkt->next) {
> >> +		prev_bkt = last_bkt;
> >> +		last_bkt = last_bkt->next;
> >> +	}
> >Minor: We are trying to find the last bucket here, along with its previous.
> Maybe we can modify 'rte_hash_get_last_bkt' instead?
> >
> [Wang, Yipeng] Then there will be one more store in each iteration for the
> regular find_last. I had a separate function for this, but since it is
> only used here I removed it. If you think it is necessary, or you may reuse it
> somewhere else for the lock-free (LF) work, I can add it back.
I am fine with the existing code.

> >> +
> >> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> >> +		if (last_bkt->key_idx[i] != EMPTY_SLOT)
> >> +			break;
> >> +	}
> >> +	/* found empty bucket and recycle */
> >> +	if (i == RTE_HASH_BUCKET_ENTRIES) {
> >> +		prev_bkt->next = last_bkt->next = NULL;
> >> +		uint32_t index = last_bkt - h->buckets_ext + 1;
> >> +		rte_ring_sp_enqueue(h->free_ext_bkts, (void
> >> *)(uintptr_t)index);
> >In the lock-less algorithm, the bucket cannot be freed immediately. I
> >looked at a couple of solutions. The bucket needs to be stored internally and
> should be associated with the key-store index (or position). I am thinking that I
> will add a field to 'struct rte_hash_key'
> >to store the bucket pointer or index.
> [Wang, Yipeng] Even if the bucket is recycled immediately, what's the worst
> that could happen? Even if there are readers currently iterating over the deleted bucket,
> it is still fine, right? It is a miss anyway.
Good question. Logically, freeing the bucket is similar to freeing memory. I think the worst that can happen is that readers will continue looking up a bucket (and any additional linked buckets) that they do not have to. But I do not see any illegal memory accesses. I will have a better answer once I get to coding this.

> >
> >From the code, my understanding is that we will free only the last
> >bucket. We will never free a middle bucket; please correct me if I am
> wrong. This will keep it simple for the lock-free algorithm.
> [Wang, Yipeng] It is correct.
> >
> >I could work through these issues. So, I do not see any issues for the lock-free
> algorithm (as of now :) ).
> >
> >> +
> >> +	/* Out of bounds of all buckets (both main table and ext table */
> >Typo: missing ')'
> >
> [Wang, Yipeng] Yeah thanks!
> 
Apologies :)

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 2/4] hash: add extendable bucket feature
  2018-09-28 17:23     ` [PATCH v4 2/4] hash: add extendable bucket feature Yipeng Wang
  2018-10-02  3:58       ` Honnappa Nagarahalli
@ 2018-10-03 15:08       ` Stephen Hemminger
  2018-10-03 16:53         ` Wang, Yipeng1
  1 sibling, 1 reply; 107+ messages in thread
From: Stephen Hemminger @ 2018-10-03 15:08 UTC (permalink / raw)
  To: Yipeng Wang
  Cc: bruce.richardson, konstantin.ananyev, dev, honnappa.nagarahalli,
	sameh.gobriel

On Fri, 28 Sep 2018 10:23:44 -0700
Yipeng Wang <yipeng1.wang@intel.com> wrote:

> +	/* clear free extendable bucket ring and memory */
> +	if (h->ext_table_support) {
> +		memset(h->buckets_ext, 0, h->num_buckets *
> +						sizeof(struct rte_hash_bucket));
> +		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
> +			rte_pause();

Pause is much too short. Maybe nanosleep() or sched_yield()?

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 2/4] hash: add extendable bucket feature
  2018-10-02 23:39         ` Wang, Yipeng1
  2018-10-03  4:37           ` Honnappa Nagarahalli
@ 2018-10-03 15:08           ` Stephen Hemminger
  1 sibling, 0 replies; 107+ messages in thread
From: Stephen Hemminger @ 2018-10-03 15:08 UTC (permalink / raw)
  To: Wang, Yipeng1
  Cc: Honnappa Nagarahalli, Richardson, Bruce, Ananyev, Konstantin,
	dev, Gobriel, Sameh, nd

On Tue, 2 Oct 2018 23:39:51 +0000
"Wang, Yipeng1" <yipeng1.wang@intel.com> wrote:

> >> +		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
> >> +			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
> >> +			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
> >> +			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
> >> +			last_bkt->sig_current[i] = NULL_SIGNATURE;
> >> +			last_bkt->sig_alt[i] = NULL_SIGNATURE;
> >> +			last_bkt->key_idx[i] = EMPTY_SLOT;
> >> +			return;  
> >In lock-free algorithm, this will require the global counter increment.
> >  
> [Wang, Yipeng] I agree. Similar to your protect for cuckoo displacement, protecting the copy part

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 2/4] hash: add extendable bucket feature
  2018-10-03 15:08       ` Stephen Hemminger
@ 2018-10-03 16:53         ` Wang, Yipeng1
  2018-10-03 17:59           ` Honnappa Nagarahalli
  0 siblings, 1 reply; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-10-03 16:53 UTC (permalink / raw)
  To: Stephen Hemminger, honnappa.nagarahalli
  Cc: Richardson, Bruce, Ananyev, Konstantin, dev, Gobriel, Sameh

>-----Original Message-----
>From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>On Fri, 28 Sep 2018 10:23:44 -0700
>Yipeng Wang <yipeng1.wang@intel.com> wrote:
>
>> +	/* clear free extendable bucket ring and memory */
>> +	if (h->ext_table_support) {
>> +		memset(h->buckets_ext, 0, h->num_buckets *
>> +						sizeof(struct rte_hash_bucket));
>> +		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
>> +			rte_pause();
>
>Pause is much too short. Maybe nanosleep() or sched_yield()?

Hmm... On second thought, maybe we don't need any pause/sleep here?

It is not a waiting loop, and in the multi-threading case it runs under the writer lock, so this thread
should be the only thread operating on this data structure.

What do you think?

BTW Honnappa, in the lock-free implementation, is hash_reset protected? We should
indicate in the API doc which APIs are supposed to be protected by the user.

Thanks
Yipeng

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 2/4] hash: add extendable bucket feature
  2018-10-03 16:53         ` Wang, Yipeng1
@ 2018-10-03 17:59           ` Honnappa Nagarahalli
  2018-10-04  1:22             ` Wang, Yipeng1
  0 siblings, 1 reply; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-03 17:59 UTC (permalink / raw)
  To: Wang, Yipeng1, Stephen Hemminger
  Cc: Richardson, Bruce, Ananyev, Konstantin, dev, Gobriel, Sameh, nd

> 
> >-----Original Message-----
> >From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> >On Fri, 28 Sep 2018 10:23:44 -0700
> >Yipeng Wang <yipeng1.wang@intel.com> wrote:
> >
> >> +	/* clear free extendable bucket ring and memory */
> >> +	if (h->ext_table_support) {
> >> +		memset(h->buckets_ext, 0, h->num_buckets *
> >> +						sizeof(struct
> rte_hash_bucket));
> >> +		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
> >> +			rte_pause();
> >
> >Pause is much too short. Maybe nanosleep() or sched_yield()?
> 
> Hmm... On second thought, maybe we don't need any pause/sleep here?
> 
> It is not a waiting loop, and in the multi-threading case it runs under the writer lock, so
> this thread should be the only thread operating on this data structure.
> 
> What do you think?
Yes, this is a single-thread use case. This is resetting the ring.

> 
> BTW Honnappa, in the lock-free implementation, is hash_reset protected?
> We should indicate in the API doc which APIs are supposed to be protected by
> the user.
I do not understand the use case for the hash_reset API. Why not call hash_free and hash_create?
But the lock-free implementation does not handle hash_reset. I will document it in the next version.

> 
> Thanks
> Yipeng

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 0/4] hash: add extendable bucket and partial key hashing
  2018-09-28 17:23   ` [PATCH v4 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
                       ` (3 preceding siblings ...)
  2018-09-28 17:23     ` [PATCH v4 4/4] hash: use partial-key hashing Yipeng Wang
@ 2018-10-03 19:05     ` Dharmik Thakkar
  4 siblings, 0 replies; 107+ messages in thread
From: Dharmik Thakkar @ 2018-10-03 19:05 UTC (permalink / raw)
  To: Yipeng Wang
  Cc: bruce.richardson, konstantin.ananyev, dev, Honnappa Nagarahalli,
	sameh.gobriel, nd

Tested OK on Qualcomm Centriq 2400.

> On Sep 28, 2018, at 12:23 PM, Yipeng Wang <yipeng1.wang@intel.com> wrote:
> 
> This patch set made two major optimizations over the current rte_hash
> library.
> 
> First, it adds Extendable Bucket Table feature: a new structure that can
> accommodate keys that failed to get inserted into the main hash table due to
> the unlikely event of excessive hash collisions. The hash table buckets will
> get extended using a linked list to host these keys. This new design will
> guarantee insertion of 100% of the keys for a given hash table size with
> minimal overhead. A new flag value is added for user to indicate if the
> extendable bucket feature should be enabled or not. The linked list buckets is
> similar concept to the extendable bucket hash table in packet framework.
> In details, for insertion, the linked buckets will be used to store the keys
> that fail to get in the primary and the secondary bucket and the cuckoo path
> could not find an empty location for the maximum path length (small
> probability). For lookup, the key is checked first in the primary, then the
> secondary, then if the secondary is extended the linked list is traversed
> for a possible match.
> 
> Second, the patch set changes the current hashing algorithm to be "partial-key
> hashing". Partial-key hashing is the concept from Bin Fan, et al.'s paper
> "MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
> Hashing". Instead of storing both 32-bit signature and alternative signature
> in the bucket, we only store a small 16-bit signature and calculate the
> alternative bucket index by XORing the signature with the current bucket index.
> This doubles the hash table memory efficiency since now one bucket
> only occupies one cache line instead of two in the original design.
> 
> v3->v4:
> 1. hash: Revise commit message to be more clear for "utilization" (Honnappa)
> 2. hash: in delete key function, return bucket change to use rte_ring_sp_enqueue
> instead of rte_ring_mp_enqueue, since it is already protected inside locks.
> 3. hash: update rte_hash_iterate comments (Honnappa)
> 4. hash: Add a new commit to fix race condition in the rte_hash_iterate (Honnappa)
> 5. hash/test: during utilization test, double check rte_hash_cnt returns correct
> value (Honnappa)
> 6. hash: for partial-key-hashing commit, break the get_buckets_index function
> into three. It may make future extension easier (Honnappa)
> 7. hash: change the comment for typedef uint32_t hash_sig_t to be more clear
> to users (Honnappa)
> 
> v2->v3:
> The first four commits were separated from this patch set as another
> independent patch set:
> https://mails.dpdk.org/archives/dev/2018-September/113118.html
> 1. hash: move snprintf for ext_ring name under the ext_table condition.
> 2. hash: fix memory leak by freeing ext_buckets in rte_hash_free.
> 3. hash: after failing cuckoo path, search not only ext buckets, but also the
> secondary bucket first to see if there may be an empty location now.
> 4. hash: totally rewrote the key deleting function logic. If the deleted key was
> not in the last bucket of the linked list when ext table enabled, the last
> entry in the linked list will be placed in the vacant slot from the deleted
> key. The purpose is to compact the entries in the linked list to be more close
> to the main table. This is to make sure that not many extendable buckets are
> wasted with only one or two entries after some time of running, also benefit
> lookup speed.
> 5. Other minor coding style/comments improvements.
> 
> V1->V2:
> 1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
> 2. hash: Reorder the rte_hash struct to align cache line better.
> 3. test: Minor changes in auto test to add key insertion failure check during
> iteration test.
> 4. test: Add new commit to fix read-write test non-consecutive core issue.
> 4. hash: Add a new commit to remove unnecessary code introduced by previous
> patches.
> 5. hash: Comments improvement and coding style improvements over multiple
> places.
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> 
> Yipeng Wang (4):
>  hash: fix race condition in iterate
>  hash: add extendable bucket feature
>  test/hash: implement extendable bucket hash test
>  hash: use partial-key hashing
> 
> lib/librte_hash/rte_cuckoo_hash.c | 585 ++++++++++++++++++++++++++++----------
> lib/librte_hash/rte_cuckoo_hash.h |  11 +-
> lib/librte_hash/rte_hash.h        |   8 +-
> test/test/test_hash.c             | 159 ++++++++++-
> test/test/test_hash_perf.c        | 114 ++++++--
> 5 files changed, 683 insertions(+), 194 deletions(-)
> 
> -- 
> 2.7.4
> 
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v5 0/4] hash: add extendable bucket and partial key hashing
  2018-10-01 18:34   ` [PATCH v5 " Yipeng Wang
                       ` (3 preceding siblings ...)
  2018-10-01 18:35     ` [PATCH v5 4/4] hash: use partial-key hashing Yipeng Wang
@ 2018-10-03 19:10     ` Dharmik Thakkar
  2018-10-04  0:36       ` Wang, Yipeng1
  4 siblings, 1 reply; 107+ messages in thread
From: Dharmik Thakkar @ 2018-10-03 19:10 UTC (permalink / raw)
  To: Yipeng Wang
  Cc: bruce.richardson, konstantin.ananyev, dev, Honnappa Nagarahalli,
	sameh.gobriel, nd

Tested OK on Qualcomm Centriq 2400.
> On Oct 1, 2018, at 1:34 PM, Yipeng Wang <yipeng1.wang@intel.com> wrote:
> 
> This patch set made two major optimizations over the current rte_hash
> library.
> 
> First, it adds Extendable Bucket Table feature: a new structure that can
> accommodate keys that failed to get inserted into the main hash table due to
> the unlikely event of excessive hash collisions. The hash table buckets will
> get extended using a linked list to host these keys. This new design will
> guarantee insertion of 100% of the keys for a given hash table size with
> minimal overhead. A new flag value is added for user to indicate if the
> extendable bucket feature should be enabled or not. The linked list buckets is
> similar concept to the extendable bucket hash table in packet framework.
> In details, for insertion, the linked buckets will be used to store the keys
> that fail to get in the primary and the secondary bucket and the cuckoo path
> could not find an empty location for the maximum path length (small
> probability). For lookup, the key is checked first in the primary, then the
> secondary, then if the secondary is extended the linked list is traversed
> for a possible match.
> 
> Second, the patch set changes the current hashing algorithm to be "partial-key
> hashing". Partial-key hashing is the concept from Bin Fan, et al.'s paper
> "MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
> Hashing". Instead of storing both 32-bit signature and alternative signature
> in the bucket, we only store a small 16-bit signature and calculate the
> alternative bucket index by XORing the signature with the current bucket index.
> This doubles the hash table memory efficiency since now one bucket
> only occupies one cache line instead of two in the original design.
> 
> v4->v5:
> 1. hash: for the first commit, move back the lock and read "position" in the
> while condition as Honnappa suggested.
> 2. hash: minor coding style change (Honnappa) and commit message typo fix.
> 3. Add Review-by from Honnappa.
> 
> v3->v4:
> 1. hash: Revise commit message to be more clear for "utilization" (Honnappa)
> 2. hash: in delete key function, return bucket change to use rte_ring_sp_enqueue
> instead of rte_ring_mp_enqueue, since it is already protected inside locks.
> 3. hash: update rte_hash_iterate comments (Honnappa)
> 4. hash: Add a new commit to fix race condition in the rte_hash_iterate (Honnappa)
> 5. hash/test: during utilization test, double check rte_hash_cnt returns correct
> value (Honnappa)
> 6. hash: for partial-key-hashing commit, break the get_buckets_index function
> into three. It may make future extension easier (Honnappa)
> 7. hash: change the comment for typedef uint32_t hash_sig_t to be more clear
> to users (Honnappa)
> 
> v2->v3:
> The first four commits were separated from this patch set as another
> independent patch set:
> https://mails.dpdk.org/archives/dev/2018-September/113118.html
> 1. hash: move snprintf for ext_ring name under the ext_table condition.
> 2. hash: fix memory leak by freeing ext_buckets in rte_hash_free.
> 3. hash: after failing cuckoo path, search not only ext buckets, but also the
> secondary bucket first to see if there may be an empty location now.
> 4. hash: totally rewrote the key deleting function logic. If the deleted key was
> not in the last bucket of the linked list when ext table enabled, the last
> entry in the linked list will be placed in the vacant slot from the deleted
> key. The purpose is to compact the entries in the linked list to be more close
> to the main table. This is to make sure that not many extendable buckets are
> wasted with only one or two entries after some time of running, also benefit
> lookup speed.
> 5. Other minor coding style/comments improvements.
> 
> V1->V2:
> 1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
> 2. hash: Reorder the rte_hash struct to align cache line better.
> 3. test: Minor changes in auto test to add key insertion failure check during
> iteration test.
> 4. test: Add new commit to fix read-write test non-consecutive core issue.
> 4. hash: Add a new commit to remove unnecessary code introduced by previous
> patches.
> 5. hash: Comments improvement and coding style improvements over multiple
> places.
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> 
> Yipeng Wang (4):
>  hash: fix race condition in iterate
>  hash: add extendable bucket feature
>  test/hash: implement extendable bucket hash test
>  hash: use partial-key hashing
> 
> lib/librte_hash/rte_cuckoo_hash.c | 580 ++++++++++++++++++++++++++++----------
> lib/librte_hash/rte_cuckoo_hash.h |  11 +-
> lib/librte_hash/rte_hash.h        |   8 +-
> test/test/test_hash.c             | 159 ++++++++++-
> test/test/test_hash_perf.c        | 114 ++++++--
> 5 files changed, 677 insertions(+), 195 deletions(-)
> 
> -- 
> 2.7.4
> 
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v5 0/4] hash: add extendable bucket and partial key hashing
  2018-10-03 19:10     ` [PATCH v5 0/4] hash: add extendable bucket and partial key hashing Dharmik Thakkar
@ 2018-10-04  0:36       ` Wang, Yipeng1
  0 siblings, 0 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-10-04  0:36 UTC (permalink / raw)
  To: Dharmik Thakkar
  Cc: Richardson, Bruce, Ananyev, Konstantin, dev,
	Honnappa Nagarahalli, Gobriel, Sameh, nd

>> --
>> 2.7.4
>>
>Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>

[Wang, Yipeng] Thanks for testing, Dharmik!

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 2/4] hash: add extendable bucket feature
  2018-10-03 17:59           ` Honnappa Nagarahalli
@ 2018-10-04  1:22             ` Wang, Yipeng1
  0 siblings, 0 replies; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-10-04  1:22 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Stephen Hemminger
  Cc: Richardson, Bruce, Ananyev, Konstantin, dev, Gobriel, Sameh, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>> >-----Original Message-----
>> >From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> >On Fri, 28 Sep 2018 10:23:44 -0700
>> >Yipeng Wang <yipeng1.wang@intel.com> wrote:
>> >
>> >> +	/* clear free extendable bucket ring and memory */
>> >> +	if (h->ext_table_support) {
>> >> +		memset(h->buckets_ext, 0, h->num_buckets *
>> >> +						sizeof(struct
>> rte_hash_bucket));
>> >> +		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
>> >> +			rte_pause();
>> >
>> >Pause is much too short. Maybe nanosleep() or sched_yield()?
>>
>> Hmm... On second thought, maybe we don't need any pause/sleep here?
>>
>> It is not a waiting loop, and in the multi-threading case it runs under the writer lock, so
>> this thread should be the only thread operating on this data structure.
>>
>> What do you think?
>Yes, this is a single-thread use case. This is resetting the ring.
>
[Wang, Yipeng] If people agree on this, I can send a separate patch later to remove the pause.
>>
>> BTW Honnappa, in the lock-free implementation, is hash_reset protected?
>> We should indicate in the API doc which APIs are supposed to be protected by
>> the user.
>I do not understand the use case for the hash_reset API. Why not call hash_free and hash_create?
>But the lock-free implementation does not handle hash_reset. I will document it in the next version.
>
[Wang, Yipeng]
I assume reset may still be faster than free and create. Also, after a free, you cannot guarantee that
the re-creation will succeed. Using reset might also require less user-level code.
I agree that in most use cases people can just free the table and create a new one.
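
For illustration, a minimal sketch of the two usage patterns being compared (error handling trimmed;
"params" is assumed to be the original rte_hash_parameters kept around by the application):

	/* Option 1: keep the table and clear all of its entries. */
	rte_hash_reset(h);

	/* Option 2: free and re-create. Note that re-creation can fail,
	 * e.g. if memory is fragmented by the time we try again.
	 */
	rte_hash_free(h);
	h = rte_hash_create(&params);
	if (h == NULL) {
		/* re-creation failed; rte_hash_reset() cannot fail this way */
	}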

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v6 0/4] hash: add extendable bucket and partial key hashing
  2018-09-26 20:26 ` [PATCH v3 0/3] hash: add extendable bucket and partial key hashing Yipeng Wang
                     ` (4 preceding siblings ...)
  2018-10-01 18:34   ` [PATCH v5 " Yipeng Wang
@ 2018-10-04 16:35   ` Yipeng Wang
  2018-10-04 16:35     ` [PATCH v6 1/4] hash: fix race condition in iterate Yipeng Wang
                       ` (4 more replies)
  5 siblings, 5 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-04 16:35 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel, dharmik.thakkar

This patch set depends on another bug-fix patch set:
http://patchwork.dpdk.org/cover/45611/

This patch set makes two major optimizations over the current rte_hash
library.

First, it adds Extendable Bucket Table feature: a new structure that can
accommodate keys that failed to get inserted into the main hash table due to
the unlikely event of excessive hash collisions. The hash table buckets will
get extended using a linked list to host these keys. This new design will
guarantee insertion of 100% of the keys for a given hash table size with
minimal overhead. A new flag value is added for user to indicate if the
extendable bucket feature should be enabled or not. The linked list buckets is
similar concept to the extendable bucket hash table in packet framework.
In details, for insertion, the linked buckets will be used to store the keys
that fail to get in the primary and the secondary bucket and the cuckoo path
could not find an empty location for the maximum path length (small
probability). For lookup, the key is checked first in the primary, then the
secondary, then if the secondary is extended the linked list is traversed
for a possible match.
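
For example, enabling the feature only requires setting the new flag at creation time. A minimal sketch
(the table name, size and key type below are placeholders, not taken from the patches):

	#include <stdint.h>
	#include <rte_hash.h>
	#include <rte_jhash.h>
	#include <rte_lcore.h>

	struct flow_key {
		uint32_t src;
		uint32_t dst;
	};	/* placeholder key type */

	static struct rte_hash *
	create_flow_table(void)
	{
		struct rte_hash_parameters params = {
			.name = "flow_table",
			.entries = 1 << 20,
			.key_len = sizeof(struct flow_key),
			.hash_func = rte_jhash,
			.hash_func_init_val = 0,
			.socket_id = rte_socket_id(),
			.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
		};

		/* With the flag set, insertions that overflow a bucket spill
		 * into linked extendable buckets instead of failing.
		 */
		return rte_hash_create(&params);
	}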

Second, the patch set changes the current hashing algorithm to be "partial-key
hashing". Partial-key hashing is the concept from Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
Hashing". Instead of storing both 32-bit signature and alternative signature
in the bucket, we only store a small 16-bit signature and calculate the
alternative bucket index by XORing the signature with the current bucket index.
This doubles the hash table memory efficiency since now one bucket
only occupies one cache line instead of two in the original design.
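
To make the index arithmetic concrete, a small self-contained sketch of the derivation (it mirrors the
get_short_sig/get_prim_bucket_index/get_alt_bucket_index helpers added in the last patch; num_buckets is
assumed to be a power of two, with bucket_bitmask = num_buckets - 1):

	#include <stdint.h>

	static inline uint32_t
	alt_bucket_of(uint32_t hash, uint32_t bucket_bitmask)
	{
		uint16_t sig  = hash >> 16;		/* 16-bit signature stored in the bucket */
		uint32_t prim = hash & bucket_bitmask;	/* primary bucket index */

		return (prim ^ sig) & bucket_bitmask;	/* alternative bucket index */
	}

Since XOR is its own inverse, XORing the alternative index with the same signature yields the primary
index again, which is what allows the per-entry alternative signature (sig_alt) to be dropped from the
bucket.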

v5->v6:
1. hash: fix a typo in comment, found by Honnappa.
2. Fix typos in commit messages.
3. Add review-by and acked-by.

v4->v5:
1. hash: for the first commit, move back the lock and read "position" in the
while condition as Honnappa suggested.
2. hash: minor coding style change (Honnappa) and commit message typo fix.
3. Add Review-by from Honnappa.

v3->v4:
1. hash: Revise commit message to be more clear for "utilization" (Honnappa)
2. hash: in delete key function, return bucket change to use rte_ring_sp_enqueue
instead of rte_ring_mp_enqueue, since it is already protected inside locks.
3. hash: update rte_hash_iterate comments (Honnappa)
4. hash: Add a new commit to fix race condition in the rte_hash_iterate (Honnappa)
5. hash/test: during utilization test, double check rte_hash_cnt returns correct
value (Honnappa)
6. hash: for partial-key-hashing commit, break the get_buckets_index function
into three. It may make future extension easier (Honnappa)
7. hash: change the comment for typedef uint32_t hash_sig_t to be more clear
to users (Honnappa)

v2->v3:
The first four commits were separated from this patch set as another
independent patch set:
https://mails.dpdk.org/archives/dev/2018-September/113118.html
1. hash: move snprintf for ext_ring name under the ext_table condition.
2. hash: fix memory leak by freeing ext_buckets in rte_hash_free.
3. hash: after failing cuckoo path, search not only ext buckets, but also the
secondary bucket first to see if there may be an empty location now.
4. hash: totally rewrote the key deletion function logic. If the deleted key was
not in the last bucket of the linked list when the ext table is enabled, the last
entry in the linked list will be placed in the vacant slot left by the deleted
key. The purpose is to compact the entries in the linked list so they stay closer
to the main table. This makes sure that not many extendable buckets are
wasted with only one or two entries after some time of running, which also benefits
lookup speed.
5. Other minor coding style/comments improvements.

V1->V2:
1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
2. hash: Reorder the rte_hash struct to align cache line better.
3. test: Minor changes in auto test to add key insertion failure check during
iteration test.
4. test: Add new commit to fix read-write test non-consecutive core issue.
4. hash: Add a new commit to remove unnecessary code introduced by previous
patches.
5. hash: Comments improvement and coding style improvements over multiple
places.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>

Yipeng Wang (4):
  hash: fix race condition in iterate
  hash: add extendable bucket feature
  test/hash: implement extendable bucket hash test
  hash: use partial-key hashing

 lib/librte_hash/rte_cuckoo_hash.c | 580 ++++++++++++++++++++++++++++----------
 lib/librte_hash/rte_cuckoo_hash.h |  11 +-
 lib/librte_hash/rte_hash.h        |   8 +-
 test/test/test_hash.c             | 159 ++++++++++-
 test/test/test_hash_perf.c        | 114 ++++++--
 5 files changed, 677 insertions(+), 195 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v6 1/4] hash: fix race condition in iterate
  2018-10-04 16:35   ` [PATCH v6 " Yipeng Wang
@ 2018-10-04 16:35     ` Yipeng Wang
  2018-10-04 16:35     ` [PATCH v6 2/4] hash: add extendable bucket feature Yipeng Wang
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-04 16:35 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel, dharmik.thakkar

In rte_hash_iterate, the reader lock did not protect the
while loop which checks for an empty entry. This created a race
condition: the entry may become empty between the check and
taking the lock, and then a wrong key/data value would be read out.

This commit reads the position in the while condition,
which makes sure that the position will not have changed
to empty before entering the lock.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8..da8ddf4 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1318,7 +1318,7 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	idx = *next % RTE_HASH_BUCKET_ENTRIES;
 
 	/* If current position is empty, go to the next one */
-	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+	while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
 		if (*next == total_entries)
@@ -1326,9 +1326,8 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
+
 	__hash_rw_reader_lock(h);
-	/* Get position of entry in key table */
-	position = h->buckets[bucket_idx].key_idx[idx];
 	next_key = (struct rte_hash_key *) ((char *)h->key_store +
 				position * h->key_entry_size);
 	/* Return key and data */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v6 2/4] hash: add extendable bucket feature
  2018-10-04 16:35   ` [PATCH v6 " Yipeng Wang
  2018-10-04 16:35     ` [PATCH v6 1/4] hash: fix race condition in iterate Yipeng Wang
@ 2018-10-04 16:35     ` Yipeng Wang
  2018-10-04 16:35     ` [PATCH v6 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-04 16:35 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel, dharmik.thakkar

In use cases where the hash table capacity needs to be
guaranteed, the extendable bucket feature can be used to hold
the extra keys in linked lists when collisions happen. This is
a similar concept to the extendable bucket hash table in the
packet framework.

This commit adds the extendable bucket feature. Users can turn
it on or off through the extra flag field at table creation
time.

The extendable bucket table is composed of buckets that can be
linked into a list attached to the main table. When extendable
buckets are enabled, the hash table load can always reach 100%;
in other words, the table can always accommodate as many keys
as the specified table size, providing a 100% capacity
guarantee.

Although keys ending up in the ext buckets may have a longer
lookup time, they should be rare thanks to the cuckoo
algorithm.
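
A rough sketch of the resulting lookup path once the feature is
enabled (simplified from the patch below; prim_bkt, sec_bkt and
the two signatures are assumed to be computed as in the existing
code, and search_one_bucket() is the existing helper in
rte_cuckoo_hash.c):

        struct rte_hash_bucket *cur;
        int32_t ret;

        /* primary bucket first */
        ret = search_one_bucket(h, key, sig, data, prim_bkt);
        if (ret != -1)
                return ret;
        /* then the secondary bucket and any linked ext buckets */
        for (cur = sec_bkt; cur != NULL; cur = cur->next) {
                ret = search_one_bucket(h, key, alt_sig, data, cur);
                if (ret != -1)
                        return ret;
        }
        return -ENOENT;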

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 369 ++++++++++++++++++++++++++++++++------
 lib/librte_hash/rte_cuckoo_hash.h |   5 +
 lib/librte_hash/rte_hash.h        |   3 +
 3 files changed, 326 insertions(+), 51 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index da8ddf4..1e3112e 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -31,6 +31,10 @@
 #include "rte_hash.h"
 #include "rte_cuckoo_hash.h"
 
+#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)                            \
+	for (CURRENT_BKT = START_BUCKET;                                      \
+		CURRENT_BKT != NULL;                                          \
+		CURRENT_BKT = CURRENT_BKT->next)
 
 TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
 
@@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
 	return h;
 }
 
+static inline struct rte_hash_bucket *
+rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
+{
+	while (lst_bkt->next != NULL)
+		lst_bkt = lst_bkt->next;
+	return lst_bkt;
+}
+
 void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
 {
 	h->cmp_jump_table_idx = KEY_CUSTOM;
@@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	struct rte_tailq_entry *te = NULL;
 	struct rte_hash_list *hash_list;
 	struct rte_ring *r = NULL;
+	struct rte_ring *r_ext = NULL;
 	char hash_name[RTE_HASH_NAMESIZE];
 	void *k = NULL;
 	void *buckets = NULL;
+	void *buckets_ext = NULL;
 	char ring_name[RTE_RING_NAMESIZE];
+	char ext_ring_name[RTE_RING_NAMESIZE];
 	unsigned num_key_slots;
 	unsigned i;
 	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
+	unsigned int ext_table_support = 0;
 	unsigned int readwrite_concur_support = 0;
 
 	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
@@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		multi_writer_support = 1;
 	}
 
+	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
+		ext_table_support = 1;
+
 	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
 	if (multi_writer_support)
 		/*
@@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err;
 	}
 
+	const uint32_t num_buckets = rte_align32pow2(params->entries) /
+						RTE_HASH_BUCKET_ENTRIES;
+
+	/* Create ring for extendable buckets. */
+	if (ext_table_support) {
+		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
+								params->name);
+		r_ext = rte_ring_create(ext_ring_name,
+				rte_align32pow2(num_buckets + 1),
+				params->socket_id, 0);
+
+		if (r_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+								"failed\n");
+			goto err;
+		}
+	}
+
 	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
 
 	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
@@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err_unlock;
 	}
 
-	const uint32_t num_buckets = rte_align32pow2(params->entries)
-					/ RTE_HASH_BUCKET_ENTRIES;
-
 	buckets = rte_zmalloc_socket(NULL,
 				num_buckets * sizeof(struct rte_hash_bucket),
 				RTE_CACHE_LINE_SIZE, params->socket_id);
 
 	if (buckets == NULL) {
-		RTE_LOG(ERR, HASH, "memory allocation failed\n");
+		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
 		goto err_unlock;
 	}
 
+	/* Allocate same number of extendable buckets */
+	if (ext_table_support) {
+		buckets_ext = rte_zmalloc_socket(NULL,
+				num_buckets * sizeof(struct rte_hash_bucket),
+				RTE_CACHE_LINE_SIZE, params->socket_id);
+		if (buckets_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+							"failed\n");
+			goto err_unlock;
+		}
+		/* Populate ext bkt ring. We reserve 0 similar to the
+		 * key-data slot, just in case in future we want to
+		 * use bucket index for the linked list and 0 means NULL
+		 * for next bucket
+		 */
+		for (i = 1; i <= num_buckets; i++)
+			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+	}
+
 	const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params->key_len;
 	const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;
 
@@ -262,6 +315,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->num_buckets = num_buckets;
 	h->bucket_bitmask = h->num_buckets - 1;
 	h->buckets = buckets;
+	h->buckets_ext = buckets_ext;
+	h->free_ext_bkts = r_ext;
 	h->hash_func = (params->hash_func == NULL) ?
 		default_hash_func : params->hash_func;
 	h->key_store = k;
@@ -269,6 +324,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->hw_trans_mem_support = hw_trans_mem_support;
 	h->multi_writer_support = multi_writer_support;
 	h->readwrite_concur_support = readwrite_concur_support;
+	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
@@ -304,9 +360,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
 err:
 	rte_ring_free(r);
+	rte_ring_free(r_ext);
 	rte_free(te);
 	rte_free(h);
 	rte_free(buckets);
+	rte_free(buckets_ext);
 	rte_free(k);
 	return NULL;
 }
@@ -344,8 +402,10 @@ rte_hash_free(struct rte_hash *h)
 		rte_free(h->readwrite_lock);
 	}
 	rte_ring_free(h->free_slots);
+	rte_ring_free(h->free_ext_bkts);
 	rte_free(h->key_store);
 	rte_free(h->buckets);
+	rte_free(h->buckets_ext);
 	rte_free(h);
 	rte_free(te);
 }
@@ -403,7 +463,6 @@ __hash_rw_writer_lock(const struct rte_hash *h)
 		rte_rwlock_write_lock(h->readwrite_lock);
 }
 
-
 static inline void
 __hash_rw_reader_lock(const struct rte_hash *h)
 {
@@ -448,6 +507,14 @@ rte_hash_reset(struct rte_hash *h)
 	while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
 		rte_pause();
 
+	/* clear free extendable bucket ring and memory */
+	if (h->ext_table_support) {
+		memset(h->buckets_ext, 0, h->num_buckets *
+						sizeof(struct rte_hash_bucket));
+		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
+			rte_pause();
+	}
+
 	/* Repopulate the free slots ring. Entry zero is reserved for key misses */
 	if (h->multi_writer_support)
 		tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) *
@@ -458,6 +525,13 @@ rte_hash_reset(struct rte_hash *h)
 	for (i = 1; i < tot_ring_cnt + 1; i++)
 		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
 
+	/* Repopulate the free ext bkt ring. */
+	if (h->ext_table_support) {
+		for (i = 1; i <= h->num_buckets; i++)
+			rte_ring_sp_enqueue(h->free_ext_bkts,
+						(void *)((uintptr_t) i));
+	}
+
 	if (h->multi_writer_support) {
 		/* Reset local caches per lcore */
 		for (i = 0; i < RTE_MAX_LCORE; i++)
@@ -524,24 +598,27 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		int32_t *ret_val)
 {
 	unsigned int i;
-	struct rte_hash_bucket *cur_bkt = prim_bkt;
+	struct rte_hash_bucket *cur_bkt;
 	int32_t ret;
 
 	__hash_rw_writer_lock(h);
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	/* Insert new entry if there is room in the primary
@@ -580,7 +657,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
-	struct rte_hash_bucket *cur_bkt = bkt;
+	struct rte_hash_bucket *cur_bkt;
 	struct queue_node *prev_node, *curr_node = leaf;
 	struct rte_hash_bucket *prev_bkt, *curr_bkt = leaf->bkt;
 	uint32_t prev_slot, curr_slot = leaf_slot;
@@ -597,18 +674,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
 
-	ret = search_and_update(h, data, key, alt_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	while (likely(curr_node->prev != NULL)) {
@@ -711,15 +790,18 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	hash_sig_t alt_hash;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
-	struct rte_hash_bucket *prim_bkt, *sec_bkt;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
 	void *slot_id = NULL;
-	uint32_t new_idx;
+	void *ext_bkt_id = NULL;
+	uint32_t new_idx, bkt_id;
 	int ret;
 	unsigned n_slots;
 	unsigned lcore_id;
+	unsigned int i;
 	struct lcore_cache *cached_free_slots = NULL;
 	int32_t ret_val;
+	struct rte_hash_bucket *last;
 
 	prim_bucket_idx = sig & h->bucket_bitmask;
 	prim_bkt = &h->buckets[prim_bucket_idx];
@@ -739,10 +821,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Check if key is already inserted in secondary location */
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_writer_unlock(h);
 
@@ -808,10 +892,70 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
-	} else {
+	}
+
+	/* if ext table not enabled, we failed the insertion */
+	if (!h->ext_table_support) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret;
 	}
+
+	/* Now we need to go through the extendable bucket. Protection is needed
+	 * to protect all extendable bucket processes.
+	 */
+	__hash_rw_writer_lock(h);
+	/* We check for duplicates again since could be inserted before the lock */
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	if (ret != -1) {
+		enqueue_slot_back(h, cached_free_slots, slot_id);
+		goto failure;
+	}
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			enqueue_slot_back(h, cached_free_slots, slot_id);
+			goto failure;
+		}
+	}
+
+	/* Search sec and ext buckets to find an empty entry to insert. */
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+			/* Check if slot is available */
+			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
+				cur_bkt->sig_current[i] = alt_hash;
+				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->key_idx[i] = new_idx;
+				__hash_rw_writer_unlock(h);
+				return new_idx - 1;
+			}
+		}
+	}
+
+	/* Failed to get an empty entry from extendable buckets. Link a new
+	 * extendable bucket. We first get a free bucket from ring.
+	 */
+	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+		ret = -ENOSPC;
+		goto failure;
+	}
+
+	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
+	/* Use the first location of the new bucket */
+	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
+	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
+	/* Link the new bucket to sec bucket linked list */
+	last = rte_hash_get_last_bkt(sec_bkt);
+	last->next = &h->buckets_ext[bkt_id];
+	__hash_rw_writer_unlock(h);
+	return new_idx - 1;
+
+failure:
+	__hash_rw_writer_unlock(h);
+	return ret;
+
 }
 
 int32_t
@@ -890,7 +1034,7 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
+	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
 
 	bucket_idx = sig & h->bucket_bitmask;
@@ -910,10 +1054,12 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 	bkt = &h->buckets[bucket_idx];
 
 	/* Check if key is in secondary location */
-	ret = search_one_bucket(h, key, alt_hash, data, bkt);
-	if (ret != -1) {
-		__hash_rw_reader_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, bkt) {
+		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		if (ret != -1) {
+			__hash_rw_reader_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_reader_unlock(h);
 	return -ENOENT;
@@ -978,16 +1124,42 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	}
 }
 
+/* Compact the linked list by moving key from last entry in linked list to the
+ * empty slot.
+ */
+static inline void
+__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
+	int i;
+	struct rte_hash_bucket *last_bkt;
+
+	if (!cur_bkt->next)
+		return;
+
+	last_bkt = rte_hash_get_last_bkt(cur_bkt);
+
+	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
+		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
+			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
+			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
+			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
+			last_bkt->sig_current[i] = NULL_SIGNATURE;
+			last_bkt->sig_alt[i] = NULL_SIGNATURE;
+			last_bkt->key_idx[i] = EMPTY_SLOT;
+			return;
+		}
+	}
+}
+
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig)
+			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
 	int32_t ret;
 
-	/* Check if key is in primary location */
+	/* Check if key is in bucket */
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 		if (bkt->sig_current[i] == sig &&
 				bkt->key_idx[i] != EMPTY_SLOT) {
@@ -996,12 +1168,12 @@ search_and_remove(const struct rte_hash *h, const void *key,
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
 				remove_entry(h, bkt, i);
 
-				/*
-				 * Return index where key is stored,
+				/* Return index where key is stored,
 				 * subtracting the first dummy index
 				 */
 				ret = bkt->key_idx[i] - 1;
 				bkt->key_idx[i] = EMPTY_SLOT;
+				*pos = i;
 				return ret;
 			}
 		}
@@ -1015,34 +1187,66 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
-	int32_t ret;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
+	struct rte_hash_bucket *cur_bkt;
+	int pos;
+	int32_t ret, i;
 
 	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	prim_bkt = &h->buckets[bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, bkt, sig);
+	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
 	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+		__rte_hash_compact_ll(prim_bkt, pos);
+		last_bkt = prim_bkt->next;
+		prev_bkt = prim_bkt;
+		goto return_bkt;
 	}
 
 	/* Calculate secondary hash */
 	alt_hash = rte_hash_secondary_hash(sig);
 	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[bucket_idx];
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
+		if (ret != -1) {
+			__rte_hash_compact_ll(cur_bkt, pos);
+			last_bkt = sec_bkt->next;
+			prev_bkt = sec_bkt;
+			goto return_bkt;
+		}
+	}
 
-	/* look for key in secondary bucket */
-	ret = search_and_remove(h, key, bkt, alt_hash);
-	if (ret != -1) {
+	__hash_rw_writer_unlock(h);
+	return -ENOENT;
+
+/* Search last bucket to see if empty to be recycled */
+return_bkt:
+	if (!last_bkt) {
 		__hash_rw_writer_unlock(h);
 		return ret;
 	}
+	while (last_bkt->next) {
+		prev_bkt = last_bkt;
+		last_bkt = last_bkt->next;
+	}
+
+	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+		if (last_bkt->key_idx[i] != EMPTY_SLOT)
+			break;
+	}
+	/* found empty bucket and recycle */
+	if (i == RTE_HASH_BUCKET_ENTRIES) {
+		prev_bkt->next = last_bkt->next = NULL;
+		uint32_t index = last_bkt - h->buckets_ext + 1;
+		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+	}
 
 	__hash_rw_writer_unlock(h);
-	return -ENOENT;
+	return ret;
 }
 
 int32_t
@@ -1143,12 +1347,14 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 {
 	uint64_t hits = 0;
 	int32_t i;
+	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
+	struct rte_hash_bucket *cur_bkt, *next_bkt;
 
 	/* Prefetch first keys */
 	for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++)
@@ -1266,6 +1472,34 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		continue;
 	}
 
+	/* all found, do not need to go through ext bkt */
+	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
+		if (hit_mask != NULL)
+			*hit_mask = hits;
+		__hash_rw_reader_unlock(h);
+		return;
+	}
+
+	/* need to check ext buckets for match */
+	for (i = 0; i < num_keys; i++) {
+		if ((hits & (1ULL << i)) != 0)
+			continue;
+		next_bkt = secondary_bkt[i]->next;
+		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+			if (data != NULL)
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], &data[i], cur_bkt);
+			else
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], NULL, cur_bkt);
+			if (ret != -1) {
+				positions[i] = ret;
+				hits |= 1ULL << i;
+				break;
+			}
+		}
+	}
+
 	__hash_rw_reader_unlock(h);
 
 	if (hit_mask != NULL)
@@ -1308,10 +1542,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 
 	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
 
-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
-	/* Out of bounds */
-	if (*next >= total_entries)
-		return -ENOENT;
+	const uint32_t total_entries_main = h->num_buckets *
+							RTE_HASH_BUCKET_ENTRIES;
+	const uint32_t total_entries = total_entries_main << 1;
+
+	/* Out of bounds of all buckets (both main table and ext table) */
+	if (*next >= total_entries_main)
+		goto extend_table;
 
 	/* Calculate bucket and index of current iterator */
 	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
@@ -1322,7 +1559,7 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 		(*next)++;
 		/* End of table */
 		if (*next == total_entries)
-			return -ENOENT;
+			goto extend_table;
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
@@ -1340,4 +1577,34 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	(*next)++;
 
 	return position - 1;
+
+/* Begin to iterate extendable buckets */
+extend_table:
+	/* Out of total bound or if ext bucket feature is not enabled */
+	if (*next >= total_entries || !h->ext_table_support)
+		return -ENOENT;
+
+	bucket_idx = (*next - total_entries_main) / RTE_HASH_BUCKET_ENTRIES;
+	idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+
+	while ((position = h->buckets_ext[bucket_idx].key_idx[idx]) == EMPTY_SLOT) {
+		(*next)++;
+		if (*next == total_entries)
+			return -ENOENT;
+		bucket_idx = (*next - total_entries_main) /
+						RTE_HASH_BUCKET_ENTRIES;
+		idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+	}
+	__hash_rw_reader_lock(h);
+	next_key = (struct rte_hash_key *) ((char *)h->key_store +
+				position * h->key_entry_size);
+	/* Return key and data */
+	*key = next_key->key;
+	*data = next_key->pdata;
+
+	__hash_rw_reader_unlock(h);
+
+	/* Increment iterator */
+	(*next)++;
+	return position - 1;
 }
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fc0e5c2..e601520 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -142,6 +142,8 @@ struct rte_hash_bucket {
 	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
 
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
+
+	void *next;
 } __rte_cache_aligned;
 
 /** A hash table structure. */
@@ -166,6 +168,7 @@ struct rte_hash {
 	/**< If multi-writer support is enabled. */
 	uint8_t readwrite_concur_support;
 	/**< If read-write concurrency support is enabled */
+	uint8_t ext_table_support;     /**< Enable extendable bucket table */
 	rte_hash_function hash_func;    /**< Function used to calculate hash. */
 	uint32_t hash_func_init_val;    /**< Init value used by hash_func. */
 	rte_hash_cmp_eq_t rte_hash_custom_cmp_eq;
@@ -184,6 +187,8 @@ struct rte_hash {
 	 * to the key table.
 	 */
 	rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
+	struct rte_hash_bucket *buckets_ext; /**< Extra buckets array */
+	struct rte_ring *free_ext_bkts; /**< Ring of indexes of free buckets */
 } __rte_cache_aligned;
 
 struct queue_node {
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d931..11d8e28 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -37,6 +37,9 @@ extern "C" {
 /** Flag to support reader writer concurrency */
 #define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
 
+/** Flag to indicate the extendable bucket table feature should be used */
+#define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
+
 /** Signature of key that is stored internally. */
 typedef uint32_t hash_sig_t;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v6 3/4] test/hash: implement extendable bucket hash test
  2018-10-04 16:35   ` [PATCH v6 " Yipeng Wang
  2018-10-04 16:35     ` [PATCH v6 1/4] hash: fix race condition in iterate Yipeng Wang
  2018-10-04 16:35     ` [PATCH v6 2/4] hash: add extendable bucket feature Yipeng Wang
@ 2018-10-04 16:35     ` Yipeng Wang
  2018-10-04 16:35     ` [PATCH v6 4/4] hash: use partial-key hashing Yipeng Wang
  2018-10-10 21:27     ` [PATCH v7 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
  4 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-04 16:35 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel, dharmik.thakkar

This commit changes the current rte_hash unit tests to
cover the extendable bucket table feature, both functionally
and in the performance tests.
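
For reference, a minimal sketch of how a test turns the feature
on at creation time (the values are illustrative; the flag and
parameter fields are the ones the tests below use):

        struct rte_hash_parameters params = {
                .name = "ext_table_example",
                .entries = 1 << 16,
                .key_len = 16,
                .hash_func = rte_jhash,
                .socket_id = 0,
                .extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
        };
        struct rte_hash *h = rte_hash_create(&params);
        /* with the flag set, all (1 << 16) keys are expected to
         * insert successfully before the table reports full
         */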

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 test/test/test_hash.c      | 159 +++++++++++++++++++++++++++++++++++++++++++--
 test/test/test_hash_perf.c | 114 +++++++++++++++++++++++---------
 2 files changed, 238 insertions(+), 35 deletions(-)

diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd..815c734 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -660,6 +660,116 @@ static int test_full_bucket(void)
 	return 0;
 }
 
+/*
+ * Similar to the test above (full bucket test), but for extendable buckets.
+ */
+static int test_extendable_bucket(void)
+{
+	struct rte_hash_parameters params_pseudo_hash = {
+		.name = "test5",
+		.entries = 64,
+		.key_len = sizeof(struct flow_key), /* 13 */
+		.hash_func = pseudo_hash,
+		.hash_func_init_val = 0,
+		.socket_id = 0,
+		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE
+	};
+	struct rte_hash *handle;
+	int pos[64];
+	int expected_pos[64];
+	unsigned int i;
+	struct flow_key rand_keys[64];
+
+	for (i = 0; i < 64; i++) {
+		rand_keys[i].port_dst = i;
+		rand_keys[i].port_src = i+1;
+	}
+
+	handle = rte_hash_create(&params_pseudo_hash);
+	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
+
+	/* Fill bucket */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add - update */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Delete 1 key, check other keys are still found */
+	pos[35] = rte_hash_del_key(handle, &rand_keys[35]);
+	print_key_info("Del", &rand_keys[35], pos[35]);
+	RETURN_IF_ERROR(pos[35] != expected_pos[35],
+			"failed to delete key (pos[1]=%d)", pos[35]);
+	pos[20] = rte_hash_lookup(handle, &rand_keys[20]);
+	print_key_info("Lkp", &rand_keys[20], pos[20]);
+	RETURN_IF_ERROR(pos[20] != expected_pos[20],
+			"failed lookup after deleting key from same bucket "
+			"(pos[20]=%d)", pos[20]);
+
+	/* Go back to previous state */
+	pos[35] = rte_hash_add_key(handle, &rand_keys[35]);
+	print_key_info("Add", &rand_keys[35], pos[35]);
+	expected_pos[35] = pos[35];
+	RETURN_IF_ERROR(pos[35] < 0, "failed to add key (pos[1]=%d)", pos[35]);
+
+	/* Delete */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_del_key(handle, &rand_keys[i]);
+		print_key_info("Del", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to delete key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != -ENOENT,
+			"fail: found non-existent key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add again */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	rte_hash_free(handle);
+
+	/* Cover the NULL case. */
+	rte_hash_free(0);
+	return 0;
+}
+
 /******************************************************************************/
 static int
 fbk_hash_unit_test(void)
@@ -1096,7 +1206,7 @@ test_hash_creation_with_good_parameters(void)
  * Test to see the average table utilization (entries added/max entries)
  * before hitting a random entry that cannot be added
  */
-static int test_average_table_utilization(void)
+static int test_average_table_utilization(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	uint8_t simple_key[MAX_KEYSIZE];
@@ -1107,12 +1217,23 @@ static int test_average_table_utilization(void)
 
 	printf("\n# Running test to determine average utilization"
 	       "\n  before adding elements begins to fail\n");
+	if (ext_table)
+		printf("ext table is enabled\n");
+	else
+		printf("ext table is disabled\n");
+
 	printf("Measuring performance, please wait");
 	fflush(stdout);
 	ut_params.entries = 1 << 16;
 	ut_params.name = "test_average_utilization";
 	ut_params.hash_func = rte_jhash;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
+
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
 	for (j = 0; j < ITERATIONS; j++) {
@@ -1139,6 +1260,14 @@ static int test_average_table_utilization(void)
 			rte_hash_free(handle);
 			return -1;
 		}
+		if (ext_table) {
+			if (cnt != ut_params.entries) {
+				printf("rte_hash_count returned wrong value "
+					"%u, %u, %u\n", j, added_keys, cnt);
+				rte_hash_free(handle);
+				return -1;
+			}
+		}
 
 		average_keys_added += added_keys;
 
@@ -1161,7 +1290,7 @@ static int test_average_table_utilization(void)
 }
 
 #define NUM_ENTRIES 256
-static int test_hash_iteration(void)
+static int test_hash_iteration(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	unsigned i;
@@ -1177,6 +1306,11 @@ static int test_hash_iteration(void)
 	ut_params.name = "test_hash_iteration";
 	ut_params.hash_func = rte_jhash;
 	ut_params.key_len = 16;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
@@ -1186,8 +1320,13 @@ static int test_hash_iteration(void)
 		for (i = 0; i < ut_params.key_len; i++)
 			keys[added_keys][i] = rte_rand() % 255;
 		ret = rte_hash_add_key_data(handle, keys[added_keys], data[added_keys]);
-		if (ret < 0)
+		if (ret < 0) {
+			if (ext_table) {
+				printf("Insertion failed for ext table\n");
+				goto err;
+			}
 			break;
+		}
 	}
 
 	/* Iterate through the hash table */
@@ -1474,6 +1613,8 @@ test_hash(void)
 		return -1;
 	if (test_full_bucket() < 0)
 		return -1;
+	if (test_extendable_bucket() < 0)
+		return -1;
 
 	if (test_fbk_hash_find_existing() < 0)
 		return -1;
@@ -1483,9 +1624,17 @@ test_hash(void)
 		return -1;
 	if (test_hash_creation_with_good_parameters() < 0)
 		return -1;
-	if (test_average_table_utilization() < 0)
+
+	/* ext table disabled */
+	if (test_average_table_utilization(0) < 0)
+		return -1;
+	if (test_hash_iteration(0) < 0)
+		return -1;
+
+	/* ext table enabled */
+	if (test_average_table_utilization(1) < 0)
 		return -1;
-	if (test_hash_iteration() < 0)
+	if (test_hash_iteration(1) < 0)
 		return -1;
 
 	run_hash_func_tests();
diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 0d39e10..5252111 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -18,7 +18,8 @@
 #include "test.h"
 
 #define MAX_ENTRIES (1 << 19)
-#define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
+#define KEYS_TO_ADD (MAX_ENTRIES)
+#define ADD_PERCENT 0.75 /* 75% table utilization */
 #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
 /* BUCKET_SIZE should be same as RTE_HASH_BUCKET_ENTRIES in rte_hash library */
 #define BUCKET_SIZE 8
@@ -78,7 +79,7 @@ static struct rte_hash_parameters ut_params = {
 
 static int
 create_table(unsigned int with_data, unsigned int table_index,
-		unsigned int with_locks)
+		unsigned int with_locks, unsigned int ext)
 {
 	char name[RTE_HASH_NAMESIZE];
 
@@ -96,6 +97,9 @@ create_table(unsigned int with_data, unsigned int table_index,
 	else
 		ut_params.extra_flag = 0;
 
+	if (ext)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	ut_params.name = name;
 	ut_params.key_len = hashtest_key_lens[table_index];
 	ut_params.socket_id = rte_socket_id();
@@ -117,15 +121,21 @@ create_table(unsigned int with_data, unsigned int table_index,
 
 /* Shuffle the keys that have been added, so lookups will be totally random */
 static void
-shuffle_input_keys(unsigned table_index)
+shuffle_input_keys(unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	uint32_t swap_idx;
 	uint8_t temp_key[MAX_KEYSIZE];
 	hash_sig_t temp_signature;
 	int32_t temp_position;
+	unsigned int keys_to_add;
+
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = KEYS_TO_ADD - 1; i > 0; i--) {
+	for (i = keys_to_add - 1; i > 0; i--) {
 		swap_idx = rte_rand() % i;
 
 		memcpy(temp_key, keys[i], hashtest_key_lens[table_index]);
@@ -147,14 +157,20 @@ shuffle_input_keys(unsigned table_index)
  * ALL can fit in hash table (no errors)
  */
 static int
-get_input_keys(unsigned with_pushes, unsigned table_index)
+get_input_keys(unsigned int with_pushes, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j;
 	unsigned bucket_idx, incr, success = 1;
 	uint8_t k = 0;
 	int32_t ret;
 	const uint32_t bucket_bitmask = NUM_BUCKETS - 1;
+	unsigned int keys_to_add;
 
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 	/* Reset all arrays */
 	for (i = 0; i < MAX_ENTRIES; i++)
 		slot_taken[i] = 0;
@@ -171,7 +187,7 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 	 * Regardless a key has been added correctly or not (success),
 	 * the next one to try will be increased by 1.
 	 */
-	for (i = 0; i < KEYS_TO_ADD;) {
+	for (i = 0; i < keys_to_add;) {
 		incr = 0;
 		if (i != 0) {
 			keys[i][0] = ++k;
@@ -235,14 +251,20 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 }
 
 static int
-timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_adds(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *data;
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		data = (void *) ((uintptr_t) signatures[i]);
 		if (with_hash && with_data) {
 			ret = rte_hash_add_key_with_hash_data(h[table_index],
@@ -284,22 +306,31 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][ADD][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][ADD][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
 
 static int
-timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_lookups(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i, j;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *ret_data;
 	void *expected_data;
 	int32_t ret;
-
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD; j++) {
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
+	for (i = 0; i < num_lookups / keys_to_add; i++) {
+		for (j = 0; j < keys_to_add; j++) {
 			if (with_hash && with_data) {
 				ret = rte_hash_lookup_with_hash_data(h[table_index],
 							(const void *) keys[j],
@@ -352,13 +383,14 @@ timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_lookups_multi(unsigned with_data, unsigned table_index)
+timed_lookups_multi(unsigned int with_data, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j, k;
 	int32_t positions_burst[BURST_SIZE];
@@ -367,11 +399,20 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	void *ret_data[BURST_SIZE];
 	uint64_t hit_mask;
 	int ret;
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
 
 	const uint64_t start_tsc = rte_rdtsc();
 
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD/BURST_SIZE; j++) {
+	for (i = 0; i < num_lookups/keys_to_add; i++) {
+		for (j = 0; j < keys_to_add/BURST_SIZE; j++) {
 			for (k = 0; k < BURST_SIZE; k++)
 				keys_burst[k] = keys[j * BURST_SIZE + k];
 			if (with_data) {
@@ -419,19 +460,25 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_deletes(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		/* There are no delete functions with data, so just call two functions */
 		if (with_hash)
 			ret = rte_hash_del_key_with_hash(h[table_index],
@@ -451,7 +498,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][DELETE][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][DELETE][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
@@ -469,7 +516,8 @@ reset_table(unsigned table_index)
 }
 
 static int
-run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
+run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks,
+						unsigned int ext)
 {
 	unsigned i, j, with_data, with_hash;
 
@@ -478,25 +526,25 @@ run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
 
 	for (with_data = 0; with_data <= 1; with_data++) {
 		for (i = 0; i < NUM_KEYSIZES; i++) {
-			if (create_table(with_data, i, with_locks) < 0)
+			if (create_table(with_data, i, with_locks, ext) < 0)
 				return -1;
 
-			if (get_input_keys(with_pushes, i) < 0)
+			if (get_input_keys(with_pushes, i, ext) < 0)
 				return -1;
 			for (with_hash = 0; with_hash <= 1; with_hash++) {
-				if (timed_adds(with_hash, with_data, i) < 0)
+				if (timed_adds(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				for (j = 0; j < NUM_SHUFFLES; j++)
-					shuffle_input_keys(i);
+					shuffle_input_keys(i, ext);
 
-				if (timed_lookups(with_hash, with_data, i) < 0)
+				if (timed_lookups(with_hash, with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_lookups_multi(with_data, i) < 0)
+				if (timed_lookups_multi(with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_deletes(with_hash, with_data, i) < 0)
+				if (timed_deletes(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				/* Print a dot to show progress on operations */
@@ -632,10 +680,16 @@ test_hash_perf(void)
 				printf("\nALL ELEMENTS IN PRIMARY LOCATION\n");
 			else
 				printf("\nELEMENTS IN PRIMARY OR SECONDARY LOCATION\n");
-			if (run_all_tbl_perf_tests(with_pushes, with_locks) < 0)
+			if (run_all_tbl_perf_tests(with_pushes, with_locks, 0) < 0)
 				return -1;
 		}
 	}
+
+	printf("\n EXTENDABLE BUCKETS PERFORMANCE\n");
+
+	if (run_all_tbl_perf_tests(1, 0, 1) < 0)
+		return -1;
+
 	if (fbk_hash_perf_test() < 0)
 		return -1;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v6 4/4] hash: use partial-key hashing
  2018-10-04 16:35   ` [PATCH v6 " Yipeng Wang
                       ` (2 preceding siblings ...)
  2018-10-04 16:35     ` [PATCH v6 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
@ 2018-10-04 16:35     ` Yipeng Wang
  2018-10-10 21:27     ` [PATCH v7 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
  4 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-04 16:35 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel, dharmik.thakkar

This commit changes the hashing mechanism to "partial-key
hashing" for calculating the bucket index and signature of a
key.

This was proposed in Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". The idea is to use XOR to derive the
alternative bucket from the current bucket index and the
signature.

With "partial-key hashing", the bucket memory requirement is
reduced from two cache lines to one cache line, which improves
memory efficiency and thus lookup speed.
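
As a short sketch, the index/signature split introduced here
mirrors the get_short_sig()/get_prim_bucket_index()/
get_alt_bucket_index() helpers added below:

        uint16_t sig  = hash >> 16;              /* stored 16-bit signature */
        uint32_t prim = hash & h->bucket_bitmask;        /* primary bucket */
        uint32_t alt  = (prim ^ sig) & h->bucket_bitmask; /* alternative bucket */
        /* XOR is its own inverse, so (alt ^ sig) & bucket_bitmask == prim:
         * either bucket can derive its partner from the stored signature
         * alone, without keeping a second full hash per entry.
         */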

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 246 +++++++++++++++++++-------------------
 lib/librte_hash/rte_cuckoo_hash.h |   6 +-
 lib/librte_hash/rte_hash.h        |   5 +-
 3 files changed, 131 insertions(+), 126 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 1e3112e..750caf8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -90,6 +90,36 @@ rte_hash_cmp_eq(const void *key1, const void *key2, const struct rte_hash *h)
 		return cmp_jump_table[h->cmp_jump_table_idx](key1, key2, h->key_len);
 }
 
+/*
+ * We use higher 16 bits of hash as the signature value stored in table.
+ * We use the lower bits for the primary bucket
+ * location. Then we XOR primary bucket location and the signature
+ * to get the secondary bucket location. This is same as
+ * proposed in Bin Fan, et al's paper
+ * "MemC3: Compact and Concurrent MemCache with Dumber Caching and
+ * Smarter Hashing". The benefit to use
+ * XOR is that one could derive the alternative bucket location
+ * by only using the current bucket location and the signature.
+ */
+static inline uint16_t
+get_short_sig(const hash_sig_t hash)
+{
+	return hash >> 16;
+}
+
+static inline uint32_t
+get_prim_bucket_index(const struct rte_hash *h, const hash_sig_t hash)
+{
+	return hash & h->bucket_bitmask;
+}
+
+static inline uint32_t
+get_alt_bucket_index(const struct rte_hash *h,
+			uint32_t cur_bkt_idx, uint16_t sig)
+{
+	return (cur_bkt_idx ^ sig) & h->bucket_bitmask;
+}
+
 struct rte_hash *
 rte_hash_create(const struct rte_hash_parameters *params)
 {
@@ -327,9 +357,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
-		h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
-	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
 		h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
 	else
 #endif
@@ -417,18 +445,6 @@ rte_hash_hash(const struct rte_hash *h, const void *key)
 	return h->hash_func(key, h->key_len, h->hash_func_init_val);
 }
 
-/* Calc the secondary hash value from the primary hash value of a given key */
-static inline hash_sig_t
-rte_hash_secondary_hash(const hash_sig_t primary_hash)
-{
-	static const unsigned all_bits_shift = 12;
-	static const unsigned alt_bits_xor = 0x5bd1e995;
-
-	uint32_t tag = primary_hash >> all_bits_shift;
-
-	return primary_hash ^ ((tag + 1) * alt_bits_xor);
-}
-
 int32_t
 rte_hash_count(const struct rte_hash *h)
 {
@@ -560,14 +576,13 @@ enqueue_slot_back(const struct rte_hash *h,
 /* Search a key from bucket and update its data */
 static inline int32_t
 search_and_update(const struct rte_hash *h, void *data, const void *key,
-	struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
+	struct rte_hash_bucket *bkt, uint16_t sig)
 {
 	int i;
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		if (bkt->sig_current[i] == sig &&
-				bkt->sig_alt[i] == alt_hash) {
+		if (bkt->sig_current[i] == sig) {
 			k = (struct rte_hash_key *) ((char *)keys +
 					bkt->key_idx[i] * h->key_entry_size);
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
@@ -594,7 +609,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		struct rte_hash_bucket *prim_bkt,
 		struct rte_hash_bucket *sec_bkt,
 		const struct rte_hash_key *key, void *data,
-		hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+		uint16_t sig, uint32_t new_idx,
 		int32_t *ret_val)
 {
 	unsigned int i;
@@ -605,7 +620,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -613,7 +628,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -628,7 +643,6 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		/* Check if slot is available */
 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
 			prim_bkt->sig_current[i] = sig;
-			prim_bkt->sig_alt[i] = alt_hash;
 			prim_bkt->key_idx[i] = new_idx;
 			break;
 		}
@@ -653,7 +667,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *alt_bkt,
 			const struct rte_hash_key *key, void *data,
 			struct queue_node *leaf, uint32_t leaf_slot,
-			hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+			uint16_t sig, uint32_t new_idx,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
@@ -674,7 +688,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -682,7 +696,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -695,8 +709,9 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		prev_bkt = prev_node->bkt;
 		prev_slot = curr_node->prev_slot;
 
-		prev_alt_bkt_idx =
-			prev_bkt->sig_alt[prev_slot] & h->bucket_bitmask;
+		prev_alt_bkt_idx = get_alt_bucket_index(h,
+					prev_node->cur_bkt_idx,
+					prev_bkt->sig_current[prev_slot]);
 
 		if (unlikely(&h->buckets[prev_alt_bkt_idx]
 				!= curr_bkt)) {
@@ -710,10 +725,8 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		 * Cuckoo insert to move elements back to its
 		 * primary bucket if available
 		 */
-		curr_bkt->sig_alt[curr_slot] =
-			 prev_bkt->sig_current[prev_slot];
 		curr_bkt->sig_current[curr_slot] =
-			prev_bkt->sig_alt[prev_slot];
+			prev_bkt->sig_current[prev_slot];
 		curr_bkt->key_idx[curr_slot] =
 			prev_bkt->key_idx[prev_slot];
 
@@ -723,7 +736,6 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	curr_bkt->sig_current[curr_slot] = sig;
-	curr_bkt->sig_alt[curr_slot] = alt_hash;
 	curr_bkt->key_idx[curr_slot] = new_idx;
 
 	__hash_rw_writer_unlock(h);
@@ -741,39 +753,44 @@ rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *bkt,
 			struct rte_hash_bucket *sec_bkt,
 			const struct rte_hash_key *key, void *data,
-			hash_sig_t sig, hash_sig_t alt_hash,
+			uint16_t sig, uint32_t bucket_idx,
 			uint32_t new_idx, int32_t *ret_val)
 {
 	unsigned int i;
 	struct queue_node queue[RTE_HASH_BFS_QUEUE_MAX_LEN];
 	struct queue_node *tail, *head;
 	struct rte_hash_bucket *curr_bkt, *alt_bkt;
+	uint32_t cur_idx, alt_idx;
 
 	tail = queue;
 	head = queue + 1;
 	tail->bkt = bkt;
 	tail->prev = NULL;
 	tail->prev_slot = -1;
+	tail->cur_bkt_idx = bucket_idx;
 
 	/* Cuckoo bfs Search */
 	while (likely(tail != head && head <
 					queue + RTE_HASH_BFS_QUEUE_MAX_LEN -
 					RTE_HASH_BUCKET_ENTRIES)) {
 		curr_bkt = tail->bkt;
+		cur_idx = tail->cur_bkt_idx;
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			if (curr_bkt->key_idx[i] == EMPTY_SLOT) {
 				int32_t ret = rte_hash_cuckoo_move_insert_mw(h,
 						bkt, sec_bkt, key, data,
-						tail, i, sig, alt_hash,
+						tail, i, sig,
 						new_idx, ret_val);
 				if (likely(ret != -1))
 					return ret;
 			}
 
 			/* Enqueue new node and keep prev node info */
-			alt_bkt = &(h->buckets[curr_bkt->sig_alt[i]
-						    & h->bucket_bitmask]);
+			alt_idx = get_alt_bucket_index(h, cur_idx,
+						curr_bkt->sig_current[i]);
+			alt_bkt = &(h->buckets[alt_idx]);
 			head->bkt = alt_bkt;
+			head->cur_bkt_idx = alt_idx;
 			head->prev = tail;
 			head->prev_slot = i;
 			head++;
@@ -788,7 +805,7 @@ static inline int32_t
 __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig, void *data)
 {
-	hash_sig_t alt_hash;
+	uint16_t short_sig;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
@@ -803,18 +820,17 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	int32_t ret_val;
 	struct rte_hash_bucket *last;
 
-	prim_bucket_idx = sig & h->bucket_bitmask;
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
 	prim_bkt = &h->buckets[prim_bucket_idx];
-	rte_prefetch0(prim_bkt);
-
-	alt_hash = rte_hash_secondary_hash(sig);
-	sec_bucket_idx = alt_hash & h->bucket_bitmask;
 	sec_bkt = &h->buckets[sec_bucket_idx];
+	rte_prefetch0(prim_bkt);
 	rte_prefetch0(sec_bkt);
 
 	/* Check if key is already inserted in primary location */
 	__hash_rw_writer_lock(h);
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
@@ -822,12 +838,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Check if key is already inserted in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			return ret;
 		}
 	}
+
 	__hash_rw_writer_unlock(h);
 
 	/* Did not find a match, so get a new slot for storing the new key */
@@ -865,7 +882,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+					short_sig, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -875,7 +892,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Primary bucket full, need to make space for new entry */
 	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+				short_sig, prim_bucket_idx, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -885,7 +902,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Also search secondary bucket to get better occupancy */
 	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
-					alt_hash, sig, new_idx, &ret_val);
+				short_sig, sec_bucket_idx, new_idx, &ret_val);
 
 	if (ret == 0)
 		return new_idx - 1;
@@ -905,14 +922,14 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	 */
 	__hash_rw_writer_lock(h);
 	/* We check for duplicates again since could be inserted before the lock */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		goto failure;
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			enqueue_slot_back(h, cached_free_slots, slot_id);
 			goto failure;
@@ -924,8 +941,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
-				cur_bkt->sig_current[i] = alt_hash;
-				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->sig_current[i] = short_sig;
 				cur_bkt->key_idx[i] = new_idx;
 				__hash_rw_writer_unlock(h);
 				return new_idx - 1;
@@ -943,8 +959,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
-	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
-	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
 	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
@@ -1003,7 +1018,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
 
 /* Search one bucket to find the match key */
 static inline int32_t
-search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
+search_one_bucket(const struct rte_hash *h, const void *key, uint16_t sig,
 			void **data, const struct rte_hash_bucket *bkt)
 {
 	int i;
@@ -1032,30 +1047,30 @@ static inline int32_t
 __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 					hash_sig_t sig, void **data)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
+	bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_reader_lock(h);
 
 	/* Check if key is in primary location */
-	ret = search_one_bucket(h, key, sig, data, bkt);
+	ret = search_one_bucket(h, key, short_sig, data, bkt);
 	if (ret != -1) {
 		__hash_rw_reader_unlock(h);
 		return ret;
 	}
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	bkt = &h->buckets[sec_bucket_idx];
 
 	/* Check if key is in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, bkt) {
-		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
 		if (ret != -1) {
 			__hash_rw_reader_unlock(h);
 			return ret;
@@ -1102,7 +1117,6 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	struct lcore_cache *cached_free_slots;
 
 	bkt->sig_current[i] = NULL_SIGNATURE;
-	bkt->sig_alt[i] = NULL_SIGNATURE;
 	if (h->multi_writer_support) {
 		lcore_id = rte_lcore_id();
 		cached_free_slots = &h->local_free_slots[lcore_id];
@@ -1141,9 +1155,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
 			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
-			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
 			last_bkt->sig_current[i] = NULL_SIGNATURE;
-			last_bkt->sig_alt[i] = NULL_SIGNATURE;
 			last_bkt->key_idx[i] = EMPTY_SLOT;
 			return;
 		}
@@ -1153,7 +1165,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
+			struct rte_hash_bucket *bkt, uint16_t sig, int *pos)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
@@ -1185,19 +1197,21 @@ static inline int32_t
 __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
 	struct rte_hash_bucket *cur_bkt;
 	int pos;
 	int32_t ret, i;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	prim_bkt = &h->buckets[bucket_idx];
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
+	prim_bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
+	ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
 	if (ret != -1) {
 		__rte_hash_compact_ll(prim_bkt, pos);
 		last_bkt = prim_bkt->next;
@@ -1206,12 +1220,10 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	sec_bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[sec_bucket_idx];
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
+		ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
 		if (ret != -1) {
 			__rte_hash_compact_ll(cur_bkt, pos);
 			last_bkt = sec_bkt->next;
@@ -1288,55 +1300,35 @@ static inline void
 compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 			const struct rte_hash_bucket *prim_bkt,
 			const struct rte_hash_bucket *sec_bkt,
-			hash_sig_t prim_hash, hash_sig_t sec_hash,
+			uint16_t sig,
 			enum rte_hash_sig_compare_function sig_cmp_fn)
 {
 	unsigned int i;
 
+	/* For match mask the first bit of every two bits indicates the match */
 	switch (sig_cmp_fn) {
-#ifdef RTE_MACHINE_CPUFLAG_AVX2
-	case RTE_HASH_COMPARE_AVX2:
-		*prim_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)prim_bkt->sig_current),
-				_mm256_set1_epi32(prim_hash)));
-		*sec_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)sec_bkt->sig_current),
-				_mm256_set1_epi32(sec_hash)));
-		break;
-#endif
 #ifdef RTE_MACHINE_CPUFLAG_SSE2
 	case RTE_HASH_COMPARE_SSE:
-		/* Compare the first 4 signatures in the bucket */
-		*prim_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+		/* Compare all signatures in the bucket */
+		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)prim_bkt->sig_current),
-				_mm_set1_epi32(prim_hash)));
-		*prim_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&prim_bkt->sig_current[4]),
-				_mm_set1_epi32(prim_hash)))) << 4;
-		/* Compare the first 4 signatures in the bucket */
-		*sec_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+				_mm_set1_epi16(sig)));
+		/* Compare all signatures in the bucket */
+		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)sec_bkt->sig_current),
-				_mm_set1_epi32(sec_hash)));
-		*sec_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&sec_bkt->sig_current[4]),
-				_mm_set1_epi32(sec_hash)))) << 4;
+				_mm_set1_epi16(sig)));
 		break;
 #endif
 	default:
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			*prim_hash_matches |=
-				((prim_hash == prim_bkt->sig_current[i]) << i);
+				((sig == prim_bkt->sig_current[i]) << (i << 1));
 			*sec_hash_matches |=
-				((sec_hash == sec_bkt->sig_current[i]) << i);
+				((sig == sec_bkt->sig_current[i]) << (i << 1));
 		}
 	}
-
 }
 
 #define PREFETCH_OFFSET 4
@@ -1349,7 +1341,9 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	int32_t i;
 	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
-	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
@@ -1368,10 +1362,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		rte_prefetch0(keys[i + PREFETCH_OFFSET]);
 
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		sig[i] = get_short_sig(prim_hash[i]);
+		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
+		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1380,10 +1377,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	/* Calculate and prefetch rest of the buckets */
 	for (; i < num_keys; i++) {
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		sig[i] = get_short_sig(prim_hash[i]);
+		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
+		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1394,10 +1394,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
 				primary_bkt[i], secondary_bkt[i],
-				prim_hash[i], sec_hash[i], h->sig_cmp_fn);
+				sig[i], h->sig_cmp_fn);
 
 		if (prim_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 			uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1408,7 +1409,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		}
 
 		if (sec_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 			uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1422,7 +1424,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		positions[i] = -ENOENT;
 		while (prim_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 
 			uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1441,11 +1444,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			prim_hitmask[i] &= ~(1 << (hit_index));
+			prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 		while (sec_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 
 			uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1465,7 +1469,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			sec_hitmask[i] &= ~(1 << (hit_index));
+			sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 next_key:
@@ -1488,10 +1492,10 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
 			if (data != NULL)
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], &data[i], cur_bkt);
+						sig[i], &data[i], cur_bkt);
 			else
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], NULL, cur_bkt);
+						sig[i], NULL, cur_bkt);
 			if (ret != -1) {
 				positions[i] = ret;
 				hits |= 1ULL << i;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index e601520..7753cd8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -129,18 +129,15 @@ struct rte_hash_key {
 enum rte_hash_sig_compare_function {
 	RTE_HASH_COMPARE_SCALAR = 0,
 	RTE_HASH_COMPARE_SSE,
-	RTE_HASH_COMPARE_AVX2,
 	RTE_HASH_COMPARE_NUM
 };
 
 /** Bucket structure */
 struct rte_hash_bucket {
-	hash_sig_t sig_current[RTE_HASH_BUCKET_ENTRIES];
+	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
 
 	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
 
-	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
-
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
 
 	void *next;
@@ -193,6 +190,7 @@ struct rte_hash {
 
 struct queue_node {
 	struct rte_hash_bucket *bkt; /* Current bucket on the bfs search */
+	uint32_t cur_bkt_idx;
 
 	struct queue_node *prev;     /* Parent(bucket) in search path */
 	int prev_slot;               /* Parent(slot) in search path */
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 11d8e28..6ace64e 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -40,7 +40,10 @@ extern "C" {
 /** Flag to indicate the extendabe bucket table feature should be used */
 #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
 
-/** Signature of key that is stored internally. */
+/**
+ * The type of hash value of a key.
+ * It should be a value of at least 32bit with fully random pattern.
+ */
 typedef uint32_t hash_sig_t;
 
 /** Type of function that can be used for calculating the hash value. */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 0/4] hash: add extendable bucket and partial key hashing
  2018-10-04 16:35   ` [PATCH v6 " Yipeng Wang
                       ` (3 preceding siblings ...)
  2018-10-04 16:35     ` [PATCH v6 4/4] hash: use partial-key hashing Yipeng Wang
@ 2018-10-10 21:27     ` Yipeng Wang
  2018-10-10 21:27       ` [PATCH v7 1/4] hash: fix race condition in iterate Yipeng Wang
                         ` (4 more replies)
  4 siblings, 5 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-10 21:27 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel, dharmik.thakkar, qiaobinf, michel

This patch set has a dependency on another bug fix patch set:
http://patchwork.dpdk.org/cover/45611/

This patch set makes two major optimizations over the current rte_hash
library.

First, it adds the Extendable Bucket Table feature: a new structure that can
accommodate keys that fail to get inserted into the main hash table due to
the unlikely event of excessive hash collisions. The hash table buckets are
extended with a linked list to host these keys. This new design guarantees
insertion of 100% of the keys for a given hash table size, with minimal
overhead. A new flag value is added for the user to indicate whether the
extendable bucket feature should be enabled or not. The linked-list buckets
are similar in concept to the extendable bucket hash table in the packet
framework.
In detail, for insertion, the linked buckets are used to store keys that
fail to fit in the primary and the secondary bucket when the cuckoo path
could not find an empty location within the maximum path length (a small
probability). For lookup, the key is checked first in the primary bucket,
then the secondary; then, if the secondary is extended, the linked list is
traversed
for a possible match.
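
As a quick illustration, this is the lookup order once the series is
applied, condensed from __rte_hash_lookup_with_hash in the patch (locking
and error handling are omitted; search_one_bucket() and FOR_EACH_BUCKET are
the helpers used by the series):

	/* 1. primary bucket */
	ret = search_one_bucket(h, key, short_sig, data, prim_bkt);
	if (ret != -1)
		return ret;
	/* 2. secondary bucket and, transparently, any ext buckets
	 * chained to it through the bucket's next pointer
	 */
	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
		ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
		if (ret != -1)
			return ret;
	}
	return -ENOENT;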

Second, the patch set changes the current hashing algorithm to "partial-key
hashing". Partial-key hashing is a concept from Bin Fan et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
Hashing". Instead of storing both the 32-bit signature and the alternative
signature in the bucket, we only store a small 16-bit signature and
calculate the alternative bucket index by XORing the signature with the
current bucket index.
This doubles the hash table memory efficiency since now one bucket
only occupies one cache line instead of two in the original design.
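
The alternative-bucket relationship is symmetric: XOR-ing the stored 16-bit
signature with either bucket index yields the other index, so the full hash
never has to be recomputed. A small standalone sketch of the idea (the bit
selection for the short signature and the masking below are illustrative,
not the exact helper definitions from the patch):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t
alt_bucket_index(uint32_t cur_bkt_idx, uint16_t sig, uint32_t bucket_bitmask)
{
	return (cur_bkt_idx ^ sig) & bucket_bitmask;
}

int
main(void)
{
	uint32_t bucket_bitmask = (1u << 10) - 1; /* 1024 buckets */
	uint32_t hash = 0xdeadbeef;    /* full 32-bit hash of some key */
	uint16_t sig = hash >> 16;     /* 16-bit signature kept in the bucket */
	uint32_t prim = hash & bucket_bitmask;
	uint32_t sec = alt_bucket_index(prim, sig, bucket_bitmask);

	/* XOR is its own inverse: either bucket index recovers the other */
	assert(alt_bucket_index(sec, sig, bucket_bitmask) == prim);
	printf("prim=%u sec=%u\n", prim, sec);
	return 0;
}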

v6->v7:
Fix a bug accidentally introduced in v5: in the iterate function, the
"main table completed" condition should be
*next == total_entries_main instead of total_entries.

v5->v6:
1. hash: fix a typo in a comment, found by Honnappa.
2. Fix typos in commit messages.
3. Add Reviewed-by and Acked-by tags.

v4->v5:
1. hash: for the first commit, move back the lock and read "position" in the
while condition as Honnappa suggested.
2. hash: minor coding style change (Honnappa) and commit message typo fix.
3. Add Reviewed-by from Honnappa.

v3->v4:
1. hash: Revise the commit message to be clearer about "utilization" (Honnappa)
2. hash: in the delete key function, change the bucket return to use
rte_ring_sp_enqueue instead of rte_ring_mp_enqueue, since it is already
protected by locks.
3. hash: update rte_hash_iterate comments (Honnappa)
4. hash: Add a new commit to fix race condition in the rte_hash_iterate (Honnappa)
5. hash/test: during utilization test, double check rte_hash_cnt returns correct
value (Honnappa)
6. hash: for partial-key-hashing commit, break the get_buckets_index function
into three. It may make future extension easier (Honnappa)
7. hash: change the comment for typedef uint32_t hash_sig_t to be more clear
to users (Honnappa)

v2->v3:
The first four commits were separated from this patch set as another
independent patch set:
https://mails.dpdk.org/archives/dev/2018-September/113118.html
1. hash: move snprintf for ext_ring name under the ext_table condition.
2. hash: fix memory leak by freeing ext_buckets in rte_hash_free.
3. hash: after the cuckoo path fails, search not only the ext buckets but
also the secondary bucket first, to see if an empty location has become
available by now.
4. hash: totally rewrote the key deletion function logic. If the deleted key
was not in the last bucket of the linked list when the ext table is enabled,
the last entry in the linked list is moved into the slot vacated by the
deleted key. The purpose is to compact the entries in the linked list so
they stay close to the main table. This makes sure that not many extendable
buckets are wasted with only one or two entries after the table has been
running for a while, and it also benefits lookup speed.
5. Other minor coding style/comments improvements.

V1->V2:
1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
2. hash: Reorder the rte_hash struct for better cache line alignment.
3. test: Minor changes in auto test to add key insertion failure check during
iteration test.
4. test: Add a new commit to fix the read-write test non-consecutive core
issue.
5. hash: Add a new commit to remove unnecessary code introduced by previous
patches.
6. hash: Comment and coding style improvements over multiple places.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>

Yipeng Wang (4):
  hash: fix race condition in iterate
  hash: add extendable bucket feature
  test/hash: implement extendable bucket hash test
  hash: use partial-key hashing

 lib/librte_hash/rte_cuckoo_hash.c | 582 ++++++++++++++++++++++++++++----------
 lib/librte_hash/rte_cuckoo_hash.h |  11 +-
 lib/librte_hash/rte_hash.h        |   8 +-
 test/test/test_hash.c             | 159 ++++++++++-
 test/test/test_hash_perf.c        | 114 ++++++--
 5 files changed, 678 insertions(+), 196 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v7 1/4] hash: fix race condition in iterate
  2018-10-10 21:27     ` [PATCH v7 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
@ 2018-10-10 21:27       ` Yipeng Wang
  2018-10-10 21:27       ` [PATCH v7 2/4] hash: add extendable bucket feature Yipeng Wang
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-10 21:27 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel, dharmik.thakkar, qiaobinf, michel

In rte_hash_iterate, the reader lock did not protect the
while loop which checks for an empty entry. This created a
race condition: the entry may become empty after the check
but before the lock is taken, and a wrong key/data value
would then be read out.

This commit reads out the position in the while condition,
which makes sure that the position cannot have changed to
empty before the lock is entered.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8..da8ddf4 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1318,7 +1318,7 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	idx = *next % RTE_HASH_BUCKET_ENTRIES;
 
 	/* If current position is empty, go to the next one */
-	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+	while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
 		if (*next == total_entries)
@@ -1326,9 +1326,8 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
+
 	__hash_rw_reader_lock(h);
-	/* Get position of entry in key table */
-	position = h->buckets[bucket_idx].key_idx[idx];
 	next_key = (struct rte_hash_key *) ((char *)h->key_store +
 				position * h->key_entry_size);
 	/* Return key and data */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 2/4] hash: add extendable bucket feature
  2018-10-10 21:27     ` [PATCH v7 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
  2018-10-10 21:27       ` [PATCH v7 1/4] hash: fix race condition in iterate Yipeng Wang
@ 2018-10-10 21:27       ` Yipeng Wang
  2018-10-10 21:27       ` [PATCH v7 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-10 21:27 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel, dharmik.thakkar, qiaobinf, michel

In use cases where hash table capacity needs to be
guaranteed, the extendable bucket feature can be used to
store extra keys in linked lists when conflicts happen.
This is similar in concept to the extendable bucket hash
table in the packet framework.

This commit adds the extendable bucket feature. The user
can turn it on or off through the extra flag field at
table creation time.

The extendable bucket table is composed of buckets that
can be linked, as a list, to the main table. When the
extendable bucket feature is enabled, the hash table load
can always reach 100%. In other words, the table can
always accommodate as many keys as the specified table
size. This provides a 100% table capacity guarantee.

Although keys that end up in the ext buckets may have a
longer lookup time, they should be rare due to the cuckoo
algorithm.
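
For reference, a minimal sketch of how an application opts in to the
feature (EAL initialization is assumed to have been done already; the
table name, size and hash function below are only illustrative):

#include <rte_hash.h>
#include <rte_jhash.h>

static struct rte_hash *
create_flow_table(void)
{
	struct rte_hash_parameters params = {
		.name = "flow_table",
		.entries = 1024,   /* all 1024 keys are guaranteed to fit */
		.key_len = sizeof(uint32_t),
		.hash_func = rte_jhash,
		.hash_func_init_val = 0,
		.socket_id = 0,    /* typically rte_socket_id() */
		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
	};

	return rte_hash_create(&params);
}

Keys that overflow a full primary/secondary bucket pair then land in the
linked ext buckets instead of failing the insertion.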

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 371 ++++++++++++++++++++++++++++++++------
 lib/librte_hash/rte_cuckoo_hash.h |   5 +
 lib/librte_hash/rte_hash.h        |   3 +
 3 files changed, 327 insertions(+), 52 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index da8ddf4..b872caa 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -31,6 +31,10 @@
 #include "rte_hash.h"
 #include "rte_cuckoo_hash.h"
 
+#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)                            \
+	for (CURRENT_BKT = START_BUCKET;                                      \
+		CURRENT_BKT != NULL;                                          \
+		CURRENT_BKT = CURRENT_BKT->next)
 
 TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
 
@@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
 	return h;
 }
 
+static inline struct rte_hash_bucket *
+rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
+{
+	while (lst_bkt->next != NULL)
+		lst_bkt = lst_bkt->next;
+	return lst_bkt;
+}
+
 void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
 {
 	h->cmp_jump_table_idx = KEY_CUSTOM;
@@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	struct rte_tailq_entry *te = NULL;
 	struct rte_hash_list *hash_list;
 	struct rte_ring *r = NULL;
+	struct rte_ring *r_ext = NULL;
 	char hash_name[RTE_HASH_NAMESIZE];
 	void *k = NULL;
 	void *buckets = NULL;
+	void *buckets_ext = NULL;
 	char ring_name[RTE_RING_NAMESIZE];
+	char ext_ring_name[RTE_RING_NAMESIZE];
 	unsigned num_key_slots;
 	unsigned i;
 	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
+	unsigned int ext_table_support = 0;
 	unsigned int readwrite_concur_support = 0;
 
 	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
@@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		multi_writer_support = 1;
 	}
 
+	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
+		ext_table_support = 1;
+
 	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
 	if (multi_writer_support)
 		/*
@@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err;
 	}
 
+	const uint32_t num_buckets = rte_align32pow2(params->entries) /
+						RTE_HASH_BUCKET_ENTRIES;
+
+	/* Create ring for extendable buckets. */
+	if (ext_table_support) {
+		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
+								params->name);
+		r_ext = rte_ring_create(ext_ring_name,
+				rte_align32pow2(num_buckets + 1),
+				params->socket_id, 0);
+
+		if (r_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+								"failed\n");
+			goto err;
+		}
+	}
+
 	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
 
 	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
@@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err_unlock;
 	}
 
-	const uint32_t num_buckets = rte_align32pow2(params->entries)
-					/ RTE_HASH_BUCKET_ENTRIES;
-
 	buckets = rte_zmalloc_socket(NULL,
 				num_buckets * sizeof(struct rte_hash_bucket),
 				RTE_CACHE_LINE_SIZE, params->socket_id);
 
 	if (buckets == NULL) {
-		RTE_LOG(ERR, HASH, "memory allocation failed\n");
+		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
 		goto err_unlock;
 	}
 
+	/* Allocate same number of extendable buckets */
+	if (ext_table_support) {
+		buckets_ext = rte_zmalloc_socket(NULL,
+				num_buckets * sizeof(struct rte_hash_bucket),
+				RTE_CACHE_LINE_SIZE, params->socket_id);
+		if (buckets_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+							"failed\n");
+			goto err_unlock;
+		}
+		/* Populate ext bkt ring. We reserve 0 similar to the
+		 * key-data slot, just in case in future we want to
+		 * use bucket index for the linked list and 0 means NULL
+		 * for next bucket
+		 */
+		for (i = 1; i <= num_buckets; i++)
+			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+	}
+
 	const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params->key_len;
 	const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;
 
@@ -262,6 +315,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->num_buckets = num_buckets;
 	h->bucket_bitmask = h->num_buckets - 1;
 	h->buckets = buckets;
+	h->buckets_ext = buckets_ext;
+	h->free_ext_bkts = r_ext;
 	h->hash_func = (params->hash_func == NULL) ?
 		default_hash_func : params->hash_func;
 	h->key_store = k;
@@ -269,6 +324,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->hw_trans_mem_support = hw_trans_mem_support;
 	h->multi_writer_support = multi_writer_support;
 	h->readwrite_concur_support = readwrite_concur_support;
+	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
@@ -304,9 +360,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
 err:
 	rte_ring_free(r);
+	rte_ring_free(r_ext);
 	rte_free(te);
 	rte_free(h);
 	rte_free(buckets);
+	rte_free(buckets_ext);
 	rte_free(k);
 	return NULL;
 }
@@ -344,8 +402,10 @@ rte_hash_free(struct rte_hash *h)
 		rte_free(h->readwrite_lock);
 	}
 	rte_ring_free(h->free_slots);
+	rte_ring_free(h->free_ext_bkts);
 	rte_free(h->key_store);
 	rte_free(h->buckets);
+	rte_free(h->buckets_ext);
 	rte_free(h);
 	rte_free(te);
 }
@@ -403,7 +463,6 @@ __hash_rw_writer_lock(const struct rte_hash *h)
 		rte_rwlock_write_lock(h->readwrite_lock);
 }
 
-
 static inline void
 __hash_rw_reader_lock(const struct rte_hash *h)
 {
@@ -448,6 +507,14 @@ rte_hash_reset(struct rte_hash *h)
 	while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
 		rte_pause();
 
+	/* clear free extendable bucket ring and memory */
+	if (h->ext_table_support) {
+		memset(h->buckets_ext, 0, h->num_buckets *
+						sizeof(struct rte_hash_bucket));
+		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
+			rte_pause();
+	}
+
 	/* Repopulate the free slots ring. Entry zero is reserved for key misses */
 	if (h->multi_writer_support)
 		tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) *
@@ -458,6 +525,13 @@ rte_hash_reset(struct rte_hash *h)
 	for (i = 1; i < tot_ring_cnt + 1; i++)
 		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
 
+	/* Repopulate the free ext bkt ring. */
+	if (h->ext_table_support) {
+		for (i = 1; i <= h->num_buckets; i++)
+			rte_ring_sp_enqueue(h->free_ext_bkts,
+						(void *)((uintptr_t) i));
+	}
+
 	if (h->multi_writer_support) {
 		/* Reset local caches per lcore */
 		for (i = 0; i < RTE_MAX_LCORE; i++)
@@ -524,24 +598,27 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		int32_t *ret_val)
 {
 	unsigned int i;
-	struct rte_hash_bucket *cur_bkt = prim_bkt;
+	struct rte_hash_bucket *cur_bkt;
 	int32_t ret;
 
 	__hash_rw_writer_lock(h);
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	/* Insert new entry if there is room in the primary
@@ -580,7 +657,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
-	struct rte_hash_bucket *cur_bkt = bkt;
+	struct rte_hash_bucket *cur_bkt;
 	struct queue_node *prev_node, *curr_node = leaf;
 	struct rte_hash_bucket *prev_bkt, *curr_bkt = leaf->bkt;
 	uint32_t prev_slot, curr_slot = leaf_slot;
@@ -597,18 +674,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
 
-	ret = search_and_update(h, data, key, alt_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	while (likely(curr_node->prev != NULL)) {
@@ -711,15 +790,18 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	hash_sig_t alt_hash;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
-	struct rte_hash_bucket *prim_bkt, *sec_bkt;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
 	void *slot_id = NULL;
-	uint32_t new_idx;
+	void *ext_bkt_id = NULL;
+	uint32_t new_idx, bkt_id;
 	int ret;
 	unsigned n_slots;
 	unsigned lcore_id;
+	unsigned int i;
 	struct lcore_cache *cached_free_slots = NULL;
 	int32_t ret_val;
+	struct rte_hash_bucket *last;
 
 	prim_bucket_idx = sig & h->bucket_bitmask;
 	prim_bkt = &h->buckets[prim_bucket_idx];
@@ -739,10 +821,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Check if key is already inserted in secondary location */
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_writer_unlock(h);
 
@@ -808,10 +892,70 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
-	} else {
+	}
+
+	/* if ext table not enabled, we failed the insertion */
+	if (!h->ext_table_support) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret;
 	}
+
+	/* Now we need to go through the extendable bucket. Protection is needed
+	 * to protect all extendable bucket processes.
+	 */
+	__hash_rw_writer_lock(h);
+	/* We check for duplicates again since could be inserted before the lock */
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	if (ret != -1) {
+		enqueue_slot_back(h, cached_free_slots, slot_id);
+		goto failure;
+	}
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			enqueue_slot_back(h, cached_free_slots, slot_id);
+			goto failure;
+		}
+	}
+
+	/* Search sec and ext buckets to find an empty entry to insert. */
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+			/* Check if slot is available */
+			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
+				cur_bkt->sig_current[i] = alt_hash;
+				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->key_idx[i] = new_idx;
+				__hash_rw_writer_unlock(h);
+				return new_idx - 1;
+			}
+		}
+	}
+
+	/* Failed to get an empty entry from extendable buckets. Link a new
+	 * extendable bucket. We first get a free bucket from ring.
+	 */
+	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+		ret = -ENOSPC;
+		goto failure;
+	}
+
+	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
+	/* Use the first location of the new bucket */
+	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
+	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
+	/* Link the new bucket to sec bucket linked list */
+	last = rte_hash_get_last_bkt(sec_bkt);
+	last->next = &h->buckets_ext[bkt_id];
+	__hash_rw_writer_unlock(h);
+	return new_idx - 1;
+
+failure:
+	__hash_rw_writer_unlock(h);
+	return ret;
+
 }
 
 int32_t
@@ -890,7 +1034,7 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
+	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
 
 	bucket_idx = sig & h->bucket_bitmask;
@@ -910,10 +1054,12 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 	bkt = &h->buckets[bucket_idx];
 
 	/* Check if key is in secondary location */
-	ret = search_one_bucket(h, key, alt_hash, data, bkt);
-	if (ret != -1) {
-		__hash_rw_reader_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, bkt) {
+		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		if (ret != -1) {
+			__hash_rw_reader_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_reader_unlock(h);
 	return -ENOENT;
@@ -978,16 +1124,42 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	}
 }
 
+/* Compact the linked list by moving key from last entry in linked list to the
+ * empty slot.
+ */
+static inline void
+__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
+	int i;
+	struct rte_hash_bucket *last_bkt;
+
+	if (!cur_bkt->next)
+		return;
+
+	last_bkt = rte_hash_get_last_bkt(cur_bkt);
+
+	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
+		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
+			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
+			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
+			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
+			last_bkt->sig_current[i] = NULL_SIGNATURE;
+			last_bkt->sig_alt[i] = NULL_SIGNATURE;
+			last_bkt->key_idx[i] = EMPTY_SLOT;
+			return;
+		}
+	}
+}
+
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig)
+			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
 	int32_t ret;
 
-	/* Check if key is in primary location */
+	/* Check if key is in bucket */
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 		if (bkt->sig_current[i] == sig &&
 				bkt->key_idx[i] != EMPTY_SLOT) {
@@ -996,12 +1168,12 @@ search_and_remove(const struct rte_hash *h, const void *key,
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
 				remove_entry(h, bkt, i);
 
-				/*
-				 * Return index where key is stored,
+				/* Return index where key is stored,
 				 * subtracting the first dummy index
 				 */
 				ret = bkt->key_idx[i] - 1;
 				bkt->key_idx[i] = EMPTY_SLOT;
+				*pos = i;
 				return ret;
 			}
 		}
@@ -1015,34 +1187,66 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
-	int32_t ret;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
+	struct rte_hash_bucket *cur_bkt;
+	int pos;
+	int32_t ret, i;
 
 	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	prim_bkt = &h->buckets[bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, bkt, sig);
+	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
 	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+		__rte_hash_compact_ll(prim_bkt, pos);
+		last_bkt = prim_bkt->next;
+		prev_bkt = prim_bkt;
+		goto return_bkt;
 	}
 
 	/* Calculate secondary hash */
 	alt_hash = rte_hash_secondary_hash(sig);
 	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[bucket_idx];
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
+		if (ret != -1) {
+			__rte_hash_compact_ll(cur_bkt, pos);
+			last_bkt = sec_bkt->next;
+			prev_bkt = sec_bkt;
+			goto return_bkt;
+		}
+	}
 
-	/* look for key in secondary bucket */
-	ret = search_and_remove(h, key, bkt, alt_hash);
-	if (ret != -1) {
+	__hash_rw_writer_unlock(h);
+	return -ENOENT;
+
+/* Search last bucket to see if empty to be recycled */
+return_bkt:
+	if (!last_bkt) {
 		__hash_rw_writer_unlock(h);
 		return ret;
 	}
+	while (last_bkt->next) {
+		prev_bkt = last_bkt;
+		last_bkt = last_bkt->next;
+	}
+
+	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+		if (last_bkt->key_idx[i] != EMPTY_SLOT)
+			break;
+	}
+	/* found empty bucket and recycle */
+	if (i == RTE_HASH_BUCKET_ENTRIES) {
+		prev_bkt->next = last_bkt->next = NULL;
+		uint32_t index = last_bkt - h->buckets_ext + 1;
+		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+	}
 
 	__hash_rw_writer_unlock(h);
-	return -ENOENT;
+	return ret;
 }
 
 int32_t
@@ -1143,12 +1347,14 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 {
 	uint64_t hits = 0;
 	int32_t i;
+	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
+	struct rte_hash_bucket *cur_bkt, *next_bkt;
 
 	/* Prefetch first keys */
 	for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++)
@@ -1266,6 +1472,34 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		continue;
 	}
 
+	/* all found, do not need to go through ext bkt */
+	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
+		if (hit_mask != NULL)
+			*hit_mask = hits;
+		__hash_rw_reader_unlock(h);
+		return;
+	}
+
+	/* need to check ext buckets for match */
+	for (i = 0; i < num_keys; i++) {
+		if ((hits & (1ULL << i)) != 0)
+			continue;
+		next_bkt = secondary_bkt[i]->next;
+		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+			if (data != NULL)
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], &data[i], cur_bkt);
+			else
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], NULL, cur_bkt);
+			if (ret != -1) {
+				positions[i] = ret;
+				hits |= 1ULL << i;
+				break;
+			}
+		}
+	}
+
 	__hash_rw_reader_unlock(h);
 
 	if (hit_mask != NULL)
@@ -1308,10 +1542,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 
 	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
 
-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
-	/* Out of bounds */
-	if (*next >= total_entries)
-		return -ENOENT;
+	const uint32_t total_entries_main = h->num_buckets *
+							RTE_HASH_BUCKET_ENTRIES;
+	const uint32_t total_entries = total_entries_main << 1;
+
+	/* Out of bounds of all buckets (both main table and ext table) */
+	if (*next >= total_entries_main)
+		goto extend_table;
 
 	/* Calculate bucket and index of current iterator */
 	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
@@ -1321,8 +1558,8 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
-		if (*next == total_entries)
-			return -ENOENT;
+		if (*next == total_entries_main)
+			goto extend_table;
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
@@ -1340,4 +1577,34 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	(*next)++;
 
 	return position - 1;
+
+/* Begin to iterate extendable buckets */
+extend_table:
+	/* Out of total bound or if ext bucket feature is not enabled */
+	if (*next >= total_entries || !h->ext_table_support)
+		return -ENOENT;
+
+	bucket_idx = (*next - total_entries_main) / RTE_HASH_BUCKET_ENTRIES;
+	idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+
+	while ((position = h->buckets_ext[bucket_idx].key_idx[idx]) == EMPTY_SLOT) {
+		(*next)++;
+		if (*next == total_entries)
+			return -ENOENT;
+		bucket_idx = (*next - total_entries_main) /
+						RTE_HASH_BUCKET_ENTRIES;
+		idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+	}
+	__hash_rw_reader_lock(h);
+	next_key = (struct rte_hash_key *) ((char *)h->key_store +
+				position * h->key_entry_size);
+	/* Return key and data */
+	*key = next_key->key;
+	*data = next_key->pdata;
+
+	__hash_rw_reader_unlock(h);
+
+	/* Increment iterator */
+	(*next)++;
+	return position - 1;
 }
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fc0e5c2..e601520 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -142,6 +142,8 @@ struct rte_hash_bucket {
 	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
 
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
+
+	void *next;
 } __rte_cache_aligned;
 
 /** A hash table structure. */
@@ -166,6 +168,7 @@ struct rte_hash {
 	/**< If multi-writer support is enabled. */
 	uint8_t readwrite_concur_support;
 	/**< If read-write concurrency support is enabled */
+	uint8_t ext_table_support;     /**< Enable extendable bucket table */
 	rte_hash_function hash_func;    /**< Function used to calculate hash. */
 	uint32_t hash_func_init_val;    /**< Init value used by hash_func. */
 	rte_hash_cmp_eq_t rte_hash_custom_cmp_eq;
@@ -184,6 +187,8 @@ struct rte_hash {
 	 * to the key table.
 	 */
 	rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
+	struct rte_hash_bucket *buckets_ext; /**< Extra buckets array */
+	struct rte_ring *free_ext_bkts; /**< Ring of indexes of free buckets */
 } __rte_cache_aligned;
 
 struct queue_node {
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d931..11d8e28 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -37,6 +37,9 @@ extern "C" {
 /** Flag to support reader writer concurrency */
 #define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
 
+/** Flag to indicate the extendabe bucket table feature should be used */
+#define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
+
 /** Signature of key that is stored internally. */
 typedef uint32_t hash_sig_t;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 3/4] test/hash: implement extendable bucket hash test
  2018-10-10 21:27     ` [PATCH v7 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
  2018-10-10 21:27       ` [PATCH v7 1/4] hash: fix race condition in iterate Yipeng Wang
  2018-10-10 21:27       ` [PATCH v7 2/4] hash: add extendable bucket feature Yipeng Wang
@ 2018-10-10 21:27       ` Yipeng Wang
  2018-10-10 21:27       ` [PATCH v7 4/4] hash: use partial-key hashing Yipeng Wang
  2018-10-16 18:47       ` [PATCH] doc: update release note for hash library Yipeng Wang
  4 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-10 21:27 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel, dharmik.thakkar, qiaobinf, michel

This commit changes the current rte_hash unit tests to
cover the extendable table feature and its performance.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 test/test/test_hash.c      | 159 +++++++++++++++++++++++++++++++++++++++++++--
 test/test/test_hash_perf.c | 114 +++++++++++++++++++++++---------
 2 files changed, 238 insertions(+), 35 deletions(-)

diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd..815c734 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -660,6 +660,116 @@ static int test_full_bucket(void)
 	return 0;
 }
 
+/*
+ * Similar to the test above (full bucket test), but for extendable buckets.
+ */
+static int test_extendable_bucket(void)
+{
+	struct rte_hash_parameters params_pseudo_hash = {
+		.name = "test5",
+		.entries = 64,
+		.key_len = sizeof(struct flow_key), /* 13 */
+		.hash_func = pseudo_hash,
+		.hash_func_init_val = 0,
+		.socket_id = 0,
+		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE
+	};
+	struct rte_hash *handle;
+	int pos[64];
+	int expected_pos[64];
+	unsigned int i;
+	struct flow_key rand_keys[64];
+
+	for (i = 0; i < 64; i++) {
+		rand_keys[i].port_dst = i;
+		rand_keys[i].port_src = i+1;
+	}
+
+	handle = rte_hash_create(&params_pseudo_hash);
+	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
+
+	/* Fill bucket */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add - update */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Delete 1 key, check other keys are still found */
+	pos[35] = rte_hash_del_key(handle, &rand_keys[35]);
+	print_key_info("Del", &rand_keys[35], pos[35]);
+	RETURN_IF_ERROR(pos[35] != expected_pos[35],
+			"failed to delete key (pos[1]=%d)", pos[35]);
+	pos[20] = rte_hash_lookup(handle, &rand_keys[20]);
+	print_key_info("Lkp", &rand_keys[20], pos[20]);
+	RETURN_IF_ERROR(pos[20] != expected_pos[20],
+			"failed lookup after deleting key from same bucket "
+			"(pos[20]=%d)", pos[20]);
+
+	/* Go back to previous state */
+	pos[35] = rte_hash_add_key(handle, &rand_keys[35]);
+	print_key_info("Add", &rand_keys[35], pos[35]);
+	expected_pos[35] = pos[35];
+	RETURN_IF_ERROR(pos[35] < 0, "failed to add key (pos[1]=%d)", pos[35]);
+
+	/* Delete */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_del_key(handle, &rand_keys[i]);
+		print_key_info("Del", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to delete key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != -ENOENT,
+			"fail: found non-existent key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add again */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	rte_hash_free(handle);
+
+	/* Cover the NULL case. */
+	rte_hash_free(0);
+	return 0;
+}
+
 /******************************************************************************/
 static int
 fbk_hash_unit_test(void)
@@ -1096,7 +1206,7 @@ test_hash_creation_with_good_parameters(void)
  * Test to see the average table utilization (entries added/max entries)
  * before hitting a random entry that cannot be added
  */
-static int test_average_table_utilization(void)
+static int test_average_table_utilization(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	uint8_t simple_key[MAX_KEYSIZE];
@@ -1107,12 +1217,23 @@ static int test_average_table_utilization(void)
 
 	printf("\n# Running test to determine average utilization"
 	       "\n  before adding elements begins to fail\n");
+	if (ext_table)
+		printf("ext table is enabled\n");
+	else
+		printf("ext table is disabled\n");
+
 	printf("Measuring performance, please wait");
 	fflush(stdout);
 	ut_params.entries = 1 << 16;
 	ut_params.name = "test_average_utilization";
 	ut_params.hash_func = rte_jhash;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
+
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
 	for (j = 0; j < ITERATIONS; j++) {
@@ -1139,6 +1260,14 @@ static int test_average_table_utilization(void)
 			rte_hash_free(handle);
 			return -1;
 		}
+		if (ext_table) {
+			if (cnt != ut_params.entries) {
+				printf("rte_hash_count returned wrong value "
+					"%u, %u, %u\n", j, added_keys, cnt);
+				rte_hash_free(handle);
+				return -1;
+			}
+		}
 
 		average_keys_added += added_keys;
 
@@ -1161,7 +1290,7 @@ static int test_average_table_utilization(void)
 }
 
 #define NUM_ENTRIES 256
-static int test_hash_iteration(void)
+static int test_hash_iteration(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	unsigned i;
@@ -1177,6 +1306,11 @@ static int test_hash_iteration(void)
 	ut_params.name = "test_hash_iteration";
 	ut_params.hash_func = rte_jhash;
 	ut_params.key_len = 16;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
@@ -1186,8 +1320,13 @@ static int test_hash_iteration(void)
 		for (i = 0; i < ut_params.key_len; i++)
 			keys[added_keys][i] = rte_rand() % 255;
 		ret = rte_hash_add_key_data(handle, keys[added_keys], data[added_keys]);
-		if (ret < 0)
+		if (ret < 0) {
+			if (ext_table) {
+				printf("Insertion failed for ext table\n");
+				goto err;
+			}
 			break;
+		}
 	}
 
 	/* Iterate through the hash table */
@@ -1474,6 +1613,8 @@ test_hash(void)
 		return -1;
 	if (test_full_bucket() < 0)
 		return -1;
+	if (test_extendable_bucket() < 0)
+		return -1;
 
 	if (test_fbk_hash_find_existing() < 0)
 		return -1;
@@ -1483,9 +1624,17 @@ test_hash(void)
 		return -1;
 	if (test_hash_creation_with_good_parameters() < 0)
 		return -1;
-	if (test_average_table_utilization() < 0)
+
+	/* ext table disabled */
+	if (test_average_table_utilization(0) < 0)
+		return -1;
+	if (test_hash_iteration(0) < 0)
+		return -1;
+
+	/* ext table enabled */
+	if (test_average_table_utilization(1) < 0)
 		return -1;
-	if (test_hash_iteration() < 0)
+	if (test_hash_iteration(1) < 0)
 		return -1;
 
 	run_hash_func_tests();
diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 0d39e10..5252111 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -18,7 +18,8 @@
 #include "test.h"
 
 #define MAX_ENTRIES (1 << 19)
-#define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
+#define KEYS_TO_ADD (MAX_ENTRIES)
+#define ADD_PERCENT 0.75 /* 75% table utilization */
 #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
 /* BUCKET_SIZE should be same as RTE_HASH_BUCKET_ENTRIES in rte_hash library */
 #define BUCKET_SIZE 8
@@ -78,7 +79,7 @@ static struct rte_hash_parameters ut_params = {
 
 static int
 create_table(unsigned int with_data, unsigned int table_index,
-		unsigned int with_locks)
+		unsigned int with_locks, unsigned int ext)
 {
 	char name[RTE_HASH_NAMESIZE];
 
@@ -96,6 +97,9 @@ create_table(unsigned int with_data, unsigned int table_index,
 	else
 		ut_params.extra_flag = 0;
 
+	if (ext)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	ut_params.name = name;
 	ut_params.key_len = hashtest_key_lens[table_index];
 	ut_params.socket_id = rte_socket_id();
@@ -117,15 +121,21 @@ create_table(unsigned int with_data, unsigned int table_index,
 
 /* Shuffle the keys that have been added, so lookups will be totally random */
 static void
-shuffle_input_keys(unsigned table_index)
+shuffle_input_keys(unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	uint32_t swap_idx;
 	uint8_t temp_key[MAX_KEYSIZE];
 	hash_sig_t temp_signature;
 	int32_t temp_position;
+	unsigned int keys_to_add;
+
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = KEYS_TO_ADD - 1; i > 0; i--) {
+	for (i = keys_to_add - 1; i > 0; i--) {
 		swap_idx = rte_rand() % i;
 
 		memcpy(temp_key, keys[i], hashtest_key_lens[table_index]);
@@ -147,14 +157,20 @@ shuffle_input_keys(unsigned table_index)
  * ALL can fit in hash table (no errors)
  */
 static int
-get_input_keys(unsigned with_pushes, unsigned table_index)
+get_input_keys(unsigned int with_pushes, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j;
 	unsigned bucket_idx, incr, success = 1;
 	uint8_t k = 0;
 	int32_t ret;
 	const uint32_t bucket_bitmask = NUM_BUCKETS - 1;
+	unsigned int keys_to_add;
 
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 	/* Reset all arrays */
 	for (i = 0; i < MAX_ENTRIES; i++)
 		slot_taken[i] = 0;
@@ -171,7 +187,7 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 	 * Regardless a key has been added correctly or not (success),
 	 * the next one to try will be increased by 1.
 	 */
-	for (i = 0; i < KEYS_TO_ADD;) {
+	for (i = 0; i < keys_to_add;) {
 		incr = 0;
 		if (i != 0) {
 			keys[i][0] = ++k;
@@ -235,14 +251,20 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 }
 
 static int
-timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_adds(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *data;
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		data = (void *) ((uintptr_t) signatures[i]);
 		if (with_hash && with_data) {
 			ret = rte_hash_add_key_with_hash_data(h[table_index],
@@ -284,22 +306,31 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][ADD][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][ADD][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
 
 static int
-timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_lookups(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i, j;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *ret_data;
 	void *expected_data;
 	int32_t ret;
-
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD; j++) {
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
+	for (i = 0; i < num_lookups / keys_to_add; i++) {
+		for (j = 0; j < keys_to_add; j++) {
 			if (with_hash && with_data) {
 				ret = rte_hash_lookup_with_hash_data(h[table_index],
 							(const void *) keys[j],
@@ -352,13 +383,14 @@ timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_lookups_multi(unsigned with_data, unsigned table_index)
+timed_lookups_multi(unsigned int with_data, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j, k;
 	int32_t positions_burst[BURST_SIZE];
@@ -367,11 +399,20 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	void *ret_data[BURST_SIZE];
 	uint64_t hit_mask;
 	int ret;
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
 
 	const uint64_t start_tsc = rte_rdtsc();
 
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD/BURST_SIZE; j++) {
+	for (i = 0; i < num_lookups/keys_to_add; i++) {
+		for (j = 0; j < keys_to_add/BURST_SIZE; j++) {
 			for (k = 0; k < BURST_SIZE; k++)
 				keys_burst[k] = keys[j * BURST_SIZE + k];
 			if (with_data) {
@@ -419,19 +460,25 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_deletes(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		/* There are no delete functions with data, so just call two functions */
 		if (with_hash)
 			ret = rte_hash_del_key_with_hash(h[table_index],
@@ -451,7 +498,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][DELETE][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][DELETE][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
@@ -469,7 +516,8 @@ reset_table(unsigned table_index)
 }
 
 static int
-run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
+run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks,
+						unsigned int ext)
 {
 	unsigned i, j, with_data, with_hash;
 
@@ -478,25 +526,25 @@ run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
 
 	for (with_data = 0; with_data <= 1; with_data++) {
 		for (i = 0; i < NUM_KEYSIZES; i++) {
-			if (create_table(with_data, i, with_locks) < 0)
+			if (create_table(with_data, i, with_locks, ext) < 0)
 				return -1;
 
-			if (get_input_keys(with_pushes, i) < 0)
+			if (get_input_keys(with_pushes, i, ext) < 0)
 				return -1;
 			for (with_hash = 0; with_hash <= 1; with_hash++) {
-				if (timed_adds(with_hash, with_data, i) < 0)
+				if (timed_adds(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				for (j = 0; j < NUM_SHUFFLES; j++)
-					shuffle_input_keys(i);
+					shuffle_input_keys(i, ext);
 
-				if (timed_lookups(with_hash, with_data, i) < 0)
+				if (timed_lookups(with_hash, with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_lookups_multi(with_data, i) < 0)
+				if (timed_lookups_multi(with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_deletes(with_hash, with_data, i) < 0)
+				if (timed_deletes(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				/* Print a dot to show progress on operations */
@@ -632,10 +680,16 @@ test_hash_perf(void)
 				printf("\nALL ELEMENTS IN PRIMARY LOCATION\n");
 			else
 				printf("\nELEMENTS IN PRIMARY OR SECONDARY LOCATION\n");
-			if (run_all_tbl_perf_tests(with_pushes, with_locks) < 0)
+			if (run_all_tbl_perf_tests(with_pushes, with_locks, 0) < 0)
 				return -1;
 		}
 	}
+
+	printf("\n EXTENDABLE BUCKETS PERFORMANCE\n");
+
+	if (run_all_tbl_perf_tests(1, 0, 1) < 0)
+		return -1;
+
 	if (fbk_hash_perf_test() < 0)
 		return -1;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 4/4] hash: use partial-key hashing
  2018-10-10 21:27     ` [PATCH v7 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
                         ` (2 preceding siblings ...)
  2018-10-10 21:27       ` [PATCH v7 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
@ 2018-10-10 21:27       ` Yipeng Wang
  2018-10-16 18:47       ` [PATCH] doc: update release note for hash library Yipeng Wang
  4 siblings, 0 replies; 107+ messages in thread
From: Yipeng Wang @ 2018-10-10 21:27 UTC (permalink / raw)
  To: bruce.richardson
  Cc: konstantin.ananyev, dev, yipeng1.wang, honnappa.nagarahalli,
	sameh.gobriel, dharmik.thakkar, qiaobinf, michel

This commit changes the hashing mechanism to "partial-key
hashing" to calculate the bucket index and the signature of a key.

This is proposed in Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". The idea is to use "xor" to derive the
alternative bucket index from the current bucket index and the
signature.

With "partial-key hashing", the bucket memory requirement drops
from two cache lines to one cache line, which improves memory
efficiency and thus lookup speed.
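
A minimal stand-alone sketch of the XOR property this relies on (not
part of the patch; the helper below is illustrative and mirrors the
get_alt_bucket_index() logic added further down in the diff):

#include <stdint.h>
#include <assert.h>

static uint32_t
alt_bucket_index(uint32_t cur_bkt_idx, uint16_t sig, uint32_t bucket_bitmask)
{
	return (cur_bkt_idx ^ sig) & bucket_bitmask;
}

int main(void)
{
	uint32_t bucket_bitmask = (1u << 10) - 1; /* 1024-bucket table */
	uint32_t hash = 0xdeadbeef;            /* 32-bit hash of some key */
	uint16_t sig = hash >> 16;             /* 16-bit signature kept in the bucket */
	uint32_t prim = hash & bucket_bitmask; /* primary bucket index */
	uint32_t sec = alt_bucket_index(prim, sig, bucket_bitmask);

	/* XOR is its own inverse: either bucket index plus the stored
	 * signature recovers the other index, so no alternative
	 * signature needs to be stored per entry. */
	assert(alt_bucket_index(sec, sig, bucket_bitmask) == prim);
	return 0;
}

With the 16-bit signatures, the patched struct rte_hash_bucket (eight
2-byte signatures, eight 4-byte key indexes, eight 1-byte flags and one
next pointer) adds up to 64 bytes on a 64-bit target with no extra
padding, i.e. the single cache line mentioned above.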

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 246 +++++++++++++++++++-------------------
 lib/librte_hash/rte_cuckoo_hash.h |   6 +-
 lib/librte_hash/rte_hash.h        |   5 +-
 3 files changed, 131 insertions(+), 126 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index b872caa..9a48934 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -90,6 +90,36 @@ rte_hash_cmp_eq(const void *key1, const void *key2, const struct rte_hash *h)
 		return cmp_jump_table[h->cmp_jump_table_idx](key1, key2, h->key_len);
 }
 
+/*
+ * We use the higher 16 bits of the hash as the signature value stored
+ * in the table. We use the lower bits for the primary bucket
+ * location. Then we XOR the primary bucket location and the signature
+ * to get the secondary bucket location. This is the same scheme as
+ * proposed in Bin Fan, et al.'s paper
+ * "MemC3: Compact and Concurrent MemCache with Dumber Caching and
+ * Smarter Hashing". The benefit of using
+ * XOR is that one can derive the alternative bucket location
+ * from only the current bucket location and the signature.
+ */
+static inline uint16_t
+get_short_sig(const hash_sig_t hash)
+{
+	return hash >> 16;
+}
+
+static inline uint32_t
+get_prim_bucket_index(const struct rte_hash *h, const hash_sig_t hash)
+{
+	return hash & h->bucket_bitmask;
+}
+
+static inline uint32_t
+get_alt_bucket_index(const struct rte_hash *h,
+			uint32_t cur_bkt_idx, uint16_t sig)
+{
+	return (cur_bkt_idx ^ sig) & h->bucket_bitmask;
+}
+
 struct rte_hash *
 rte_hash_create(const struct rte_hash_parameters *params)
 {
@@ -327,9 +357,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
-		h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
-	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
 		h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
 	else
 #endif
@@ -417,18 +445,6 @@ rte_hash_hash(const struct rte_hash *h, const void *key)
 	return h->hash_func(key, h->key_len, h->hash_func_init_val);
 }
 
-/* Calc the secondary hash value from the primary hash value of a given key */
-static inline hash_sig_t
-rte_hash_secondary_hash(const hash_sig_t primary_hash)
-{
-	static const unsigned all_bits_shift = 12;
-	static const unsigned alt_bits_xor = 0x5bd1e995;
-
-	uint32_t tag = primary_hash >> all_bits_shift;
-
-	return primary_hash ^ ((tag + 1) * alt_bits_xor);
-}
-
 int32_t
 rte_hash_count(const struct rte_hash *h)
 {
@@ -560,14 +576,13 @@ enqueue_slot_back(const struct rte_hash *h,
 /* Search a key from bucket and update its data */
 static inline int32_t
 search_and_update(const struct rte_hash *h, void *data, const void *key,
-	struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
+	struct rte_hash_bucket *bkt, uint16_t sig)
 {
 	int i;
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		if (bkt->sig_current[i] == sig &&
-				bkt->sig_alt[i] == alt_hash) {
+		if (bkt->sig_current[i] == sig) {
 			k = (struct rte_hash_key *) ((char *)keys +
 					bkt->key_idx[i] * h->key_entry_size);
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
@@ -594,7 +609,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		struct rte_hash_bucket *prim_bkt,
 		struct rte_hash_bucket *sec_bkt,
 		const struct rte_hash_key *key, void *data,
-		hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+		uint16_t sig, uint32_t new_idx,
 		int32_t *ret_val)
 {
 	unsigned int i;
@@ -605,7 +620,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -613,7 +628,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -628,7 +643,6 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		/* Check if slot is available */
 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
 			prim_bkt->sig_current[i] = sig;
-			prim_bkt->sig_alt[i] = alt_hash;
 			prim_bkt->key_idx[i] = new_idx;
 			break;
 		}
@@ -653,7 +667,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *alt_bkt,
 			const struct rte_hash_key *key, void *data,
 			struct queue_node *leaf, uint32_t leaf_slot,
-			hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+			uint16_t sig, uint32_t new_idx,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
@@ -674,7 +688,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -682,7 +696,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -695,8 +709,9 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		prev_bkt = prev_node->bkt;
 		prev_slot = curr_node->prev_slot;
 
-		prev_alt_bkt_idx =
-			prev_bkt->sig_alt[prev_slot] & h->bucket_bitmask;
+		prev_alt_bkt_idx = get_alt_bucket_index(h,
+					prev_node->cur_bkt_idx,
+					prev_bkt->sig_current[prev_slot]);
 
 		if (unlikely(&h->buckets[prev_alt_bkt_idx]
 				!= curr_bkt)) {
@@ -710,10 +725,8 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		 * Cuckoo insert to move elements back to its
 		 * primary bucket if available
 		 */
-		curr_bkt->sig_alt[curr_slot] =
-			 prev_bkt->sig_current[prev_slot];
 		curr_bkt->sig_current[curr_slot] =
-			prev_bkt->sig_alt[prev_slot];
+			prev_bkt->sig_current[prev_slot];
 		curr_bkt->key_idx[curr_slot] =
 			prev_bkt->key_idx[prev_slot];
 
@@ -723,7 +736,6 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	curr_bkt->sig_current[curr_slot] = sig;
-	curr_bkt->sig_alt[curr_slot] = alt_hash;
 	curr_bkt->key_idx[curr_slot] = new_idx;
 
 	__hash_rw_writer_unlock(h);
@@ -741,39 +753,44 @@ rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *bkt,
 			struct rte_hash_bucket *sec_bkt,
 			const struct rte_hash_key *key, void *data,
-			hash_sig_t sig, hash_sig_t alt_hash,
+			uint16_t sig, uint32_t bucket_idx,
 			uint32_t new_idx, int32_t *ret_val)
 {
 	unsigned int i;
 	struct queue_node queue[RTE_HASH_BFS_QUEUE_MAX_LEN];
 	struct queue_node *tail, *head;
 	struct rte_hash_bucket *curr_bkt, *alt_bkt;
+	uint32_t cur_idx, alt_idx;
 
 	tail = queue;
 	head = queue + 1;
 	tail->bkt = bkt;
 	tail->prev = NULL;
 	tail->prev_slot = -1;
+	tail->cur_bkt_idx = bucket_idx;
 
 	/* Cuckoo bfs Search */
 	while (likely(tail != head && head <
 					queue + RTE_HASH_BFS_QUEUE_MAX_LEN -
 					RTE_HASH_BUCKET_ENTRIES)) {
 		curr_bkt = tail->bkt;
+		cur_idx = tail->cur_bkt_idx;
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			if (curr_bkt->key_idx[i] == EMPTY_SLOT) {
 				int32_t ret = rte_hash_cuckoo_move_insert_mw(h,
 						bkt, sec_bkt, key, data,
-						tail, i, sig, alt_hash,
+						tail, i, sig,
 						new_idx, ret_val);
 				if (likely(ret != -1))
 					return ret;
 			}
 
 			/* Enqueue new node and keep prev node info */
-			alt_bkt = &(h->buckets[curr_bkt->sig_alt[i]
-						    & h->bucket_bitmask]);
+			alt_idx = get_alt_bucket_index(h, cur_idx,
+						curr_bkt->sig_current[i]);
+			alt_bkt = &(h->buckets[alt_idx]);
 			head->bkt = alt_bkt;
+			head->cur_bkt_idx = alt_idx;
 			head->prev = tail;
 			head->prev_slot = i;
 			head++;
@@ -788,7 +805,7 @@ static inline int32_t
 __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig, void *data)
 {
-	hash_sig_t alt_hash;
+	uint16_t short_sig;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
@@ -803,18 +820,17 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	int32_t ret_val;
 	struct rte_hash_bucket *last;
 
-	prim_bucket_idx = sig & h->bucket_bitmask;
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
 	prim_bkt = &h->buckets[prim_bucket_idx];
-	rte_prefetch0(prim_bkt);
-
-	alt_hash = rte_hash_secondary_hash(sig);
-	sec_bucket_idx = alt_hash & h->bucket_bitmask;
 	sec_bkt = &h->buckets[sec_bucket_idx];
+	rte_prefetch0(prim_bkt);
 	rte_prefetch0(sec_bkt);
 
 	/* Check if key is already inserted in primary location */
 	__hash_rw_writer_lock(h);
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
@@ -822,12 +838,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Check if key is already inserted in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			return ret;
 		}
 	}
+
 	__hash_rw_writer_unlock(h);
 
 	/* Did not find a match, so get a new slot for storing the new key */
@@ -865,7 +882,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+					short_sig, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -875,7 +892,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Primary bucket full, need to make space for new entry */
 	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+				short_sig, prim_bucket_idx, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -885,7 +902,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Also search secondary bucket to get better occupancy */
 	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
-					alt_hash, sig, new_idx, &ret_val);
+				short_sig, sec_bucket_idx, new_idx, &ret_val);
 
 	if (ret == 0)
 		return new_idx - 1;
@@ -905,14 +922,14 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	 */
 	__hash_rw_writer_lock(h);
 	/* We check for duplicates again since could be inserted before the lock */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		goto failure;
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			enqueue_slot_back(h, cached_free_slots, slot_id);
 			goto failure;
@@ -924,8 +941,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
-				cur_bkt->sig_current[i] = alt_hash;
-				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->sig_current[i] = short_sig;
 				cur_bkt->key_idx[i] = new_idx;
 				__hash_rw_writer_unlock(h);
 				return new_idx - 1;
@@ -943,8 +959,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
-	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
-	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
 	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
@@ -1003,7 +1018,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
 
 /* Search one bucket to find the match key */
 static inline int32_t
-search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
+search_one_bucket(const struct rte_hash *h, const void *key, uint16_t sig,
 			void **data, const struct rte_hash_bucket *bkt)
 {
 	int i;
@@ -1032,30 +1047,30 @@ static inline int32_t
 __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 					hash_sig_t sig, void **data)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
+	bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_reader_lock(h);
 
 	/* Check if key is in primary location */
-	ret = search_one_bucket(h, key, sig, data, bkt);
+	ret = search_one_bucket(h, key, short_sig, data, bkt);
 	if (ret != -1) {
 		__hash_rw_reader_unlock(h);
 		return ret;
 	}
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	bkt = &h->buckets[sec_bucket_idx];
 
 	/* Check if key is in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, bkt) {
-		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
 		if (ret != -1) {
 			__hash_rw_reader_unlock(h);
 			return ret;
@@ -1102,7 +1117,6 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	struct lcore_cache *cached_free_slots;
 
 	bkt->sig_current[i] = NULL_SIGNATURE;
-	bkt->sig_alt[i] = NULL_SIGNATURE;
 	if (h->multi_writer_support) {
 		lcore_id = rte_lcore_id();
 		cached_free_slots = &h->local_free_slots[lcore_id];
@@ -1141,9 +1155,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
 			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
-			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
 			last_bkt->sig_current[i] = NULL_SIGNATURE;
-			last_bkt->sig_alt[i] = NULL_SIGNATURE;
 			last_bkt->key_idx[i] = EMPTY_SLOT;
 			return;
 		}
@@ -1153,7 +1165,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
+			struct rte_hash_bucket *bkt, uint16_t sig, int *pos)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
@@ -1185,19 +1197,21 @@ static inline int32_t
 __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
 	struct rte_hash_bucket *cur_bkt;
 	int pos;
 	int32_t ret, i;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	prim_bkt = &h->buckets[bucket_idx];
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
+	prim_bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
+	ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
 	if (ret != -1) {
 		__rte_hash_compact_ll(prim_bkt, pos);
 		last_bkt = prim_bkt->next;
@@ -1206,12 +1220,10 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	sec_bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[sec_bucket_idx];
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
+		ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
 		if (ret != -1) {
 			__rte_hash_compact_ll(cur_bkt, pos);
 			last_bkt = sec_bkt->next;
@@ -1288,55 +1300,35 @@ static inline void
 compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 			const struct rte_hash_bucket *prim_bkt,
 			const struct rte_hash_bucket *sec_bkt,
-			hash_sig_t prim_hash, hash_sig_t sec_hash,
+			uint16_t sig,
 			enum rte_hash_sig_compare_function sig_cmp_fn)
 {
 	unsigned int i;
 
+	/* In the match mask, the first of every two bits indicates a match */
 	switch (sig_cmp_fn) {
-#ifdef RTE_MACHINE_CPUFLAG_AVX2
-	case RTE_HASH_COMPARE_AVX2:
-		*prim_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)prim_bkt->sig_current),
-				_mm256_set1_epi32(prim_hash)));
-		*sec_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)sec_bkt->sig_current),
-				_mm256_set1_epi32(sec_hash)));
-		break;
-#endif
 #ifdef RTE_MACHINE_CPUFLAG_SSE2
 	case RTE_HASH_COMPARE_SSE:
-		/* Compare the first 4 signatures in the bucket */
-		*prim_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+		/* Compare all signatures in the bucket */
+		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)prim_bkt->sig_current),
-				_mm_set1_epi32(prim_hash)));
-		*prim_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&prim_bkt->sig_current[4]),
-				_mm_set1_epi32(prim_hash)))) << 4;
-		/* Compare the first 4 signatures in the bucket */
-		*sec_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+				_mm_set1_epi16(sig)));
+		/* Compare all signatures in the bucket */
+		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)sec_bkt->sig_current),
-				_mm_set1_epi32(sec_hash)));
-		*sec_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&sec_bkt->sig_current[4]),
-				_mm_set1_epi32(sec_hash)))) << 4;
+				_mm_set1_epi16(sig)));
 		break;
 #endif
 	default:
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			*prim_hash_matches |=
-				((prim_hash == prim_bkt->sig_current[i]) << i);
+				((sig == prim_bkt->sig_current[i]) << (i << 1));
 			*sec_hash_matches |=
-				((sec_hash == sec_bkt->sig_current[i]) << i);
+				((sig == sec_bkt->sig_current[i]) << (i << 1));
 		}
 	}
-
 }
 
 #define PREFETCH_OFFSET 4
@@ -1349,7 +1341,9 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	int32_t i;
 	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
-	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
@@ -1368,10 +1362,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		rte_prefetch0(keys[i + PREFETCH_OFFSET]);
 
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		sig[i] = get_short_sig(prim_hash[i]);
+		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
+		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1380,10 +1377,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	/* Calculate and prefetch rest of the buckets */
 	for (; i < num_keys; i++) {
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		sig[i] = get_short_sig(prim_hash[i]);
+		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
+		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1394,10 +1394,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
 				primary_bkt[i], secondary_bkt[i],
-				prim_hash[i], sec_hash[i], h->sig_cmp_fn);
+				sig[i], h->sig_cmp_fn);
 
 		if (prim_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 			uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1408,7 +1409,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		}
 
 		if (sec_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 			uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1422,7 +1424,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		positions[i] = -ENOENT;
 		while (prim_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 
 			uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1441,11 +1444,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			prim_hitmask[i] &= ~(1 << (hit_index));
+			prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 		while (sec_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 
 			uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1465,7 +1469,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			sec_hitmask[i] &= ~(1 << (hit_index));
+			sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 next_key:
@@ -1488,10 +1492,10 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
 			if (data != NULL)
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], &data[i], cur_bkt);
+						sig[i], &data[i], cur_bkt);
 			else
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], NULL, cur_bkt);
+						sig[i], NULL, cur_bkt);
 			if (ret != -1) {
 				positions[i] = ret;
 				hits |= 1ULL << i;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index e601520..7753cd8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -129,18 +129,15 @@ struct rte_hash_key {
 enum rte_hash_sig_compare_function {
 	RTE_HASH_COMPARE_SCALAR = 0,
 	RTE_HASH_COMPARE_SSE,
-	RTE_HASH_COMPARE_AVX2,
 	RTE_HASH_COMPARE_NUM
 };
 
 /** Bucket structure */
 struct rte_hash_bucket {
-	hash_sig_t sig_current[RTE_HASH_BUCKET_ENTRIES];
+	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
 
 	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
 
-	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
-
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
 
 	void *next;
@@ -193,6 +190,7 @@ struct rte_hash {
 
 struct queue_node {
 	struct rte_hash_bucket *bkt; /* Current bucket on the bfs search */
+	uint32_t cur_bkt_idx;
 
 	struct queue_node *prev;     /* Parent(bucket) in search path */
 	int prev_slot;               /* Parent(slot) in search path */
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 11d8e28..6ace64e 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -40,7 +40,10 @@ extern "C" {
 /** Flag to indicate the extendable bucket table feature should be used */
 #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
 
-/** Signature of key that is stored internally. */
+/**
+ * The type of hash value of a key.
+ * It should be a value of at least 32 bits with a fully random pattern.
+ */
 typedef uint32_t hash_sig_t;
 
 /** Type of function that can be used for calculating the hash value. */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread
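
The bulk-lookup changes above switch the hit masks to two bits per bucket
entry (a by-product of _mm_movemask_epi8 over 16-bit compares), which is
why a hit's entry index is recovered with __builtin_ctzl(hitmask) >> 1 and
cleared with ~(3ULL << (hit_index << 1)). A small self-contained
illustration of that walk (not from the patch set; the signature values
are made up):

#include <stdint.h>
#include <stdio.h>

#define ENTRIES 8 /* RTE_HASH_BUCKET_ENTRIES in the patched library */

int main(void)
{
	uint16_t sig_current[ENTRIES] = {7, 42, 7, 0, 9, 7, 1, 2};
	uint16_t wanted = 7;
	uint32_t hitmask = 0;
	unsigned int i;

	/* Scalar equivalent of the match-mask layout: entry i owns bits
	 * 2*i and 2*i + 1; setting the first of the two is enough. */
	for (i = 0; i < ENTRIES; i++)
		hitmask |= (uint32_t)(sig_current[i] == wanted) << (i << 1);

	while (hitmask) {
		/* ctz gives the bit position of the lowest hit; >> 1 turns
		 * it back into a bucket-entry index. */
		uint32_t hit_index = __builtin_ctzl(hitmask) >> 1;
		printf("signature match at entry %u\n", hit_index);
		/* Clear both bits belonging to this entry. */
		hitmask &= ~(3UL << (hit_index << 1));
	}
	return 0;
}

This prints matches at entries 0, 2 and 5, the same order in which the
patched bulk lookup would go on to compare the full keys for those slots.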

* [PATCH] doc: update release note for hash library
  2018-10-10 21:27     ` [PATCH v7 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
                         ` (3 preceding siblings ...)
  2018-10-10 21:27       ` [PATCH v7 4/4] hash: use partial-key hashing Yipeng Wang
@ 2018-10-16 18:47       ` Yipeng Wang
  2018-10-17 20:09         ` Honnappa Nagarahalli
  4 siblings, 1 reply; 107+ messages in thread
From: Yipeng Wang @ 2018-10-16 18:47 UTC (permalink / raw)
  To: john.mcnamara; +Cc: dev, yipeng1.wang, honnappa.nagarahalli, sameh.gobriel

This patch updates the release notes for the new extendable bucket
feature and for partial-key hashing.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index c13ea82..95c218d 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -164,6 +164,13 @@ New Features
   this application doesn't need to launch dedicated worker threads for vhost
   enqueue/dequeue operations.
 
+* **Added extendable bucket feature to hash library (rte_hash).**
+
+  This new “extendable bucket” feature provides a 100% insertion guarantee
+  up to the capacity specified by the user, by extending the hash table with
+  extra buckets when needed to accommodate the unlikely event of intensive
+  hash collisions. In addition, the internal hashing algorithm was changed to
+  use partial-key hashing to improve memory efficiency and lookup performance.
 
 API Changes
 -----------
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 107+ messages in thread
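
For readers of the release note above: the extendable bucket behaviour is
opt-in via the RTE_HASH_EXTRA_FLAGS_EXT_TABLE flag at table creation time.
A minimal sketch (not from the patch set; the table name, size and hash
function are arbitrary, and the parameter-struct field names are quoted
from memory of the 18.11-era rte_hash API, so double-check them):

#include <rte_hash.h>
#include <rte_jhash.h>
#include <rte_lcore.h>

/* Create a hash table that falls back to linked extendable buckets when
 * cuckoo displacement cannot place a key, so that inserts succeed up to
 * the requested number of entries. */
static struct rte_hash *
create_ext_table(void)
{
	struct rte_hash_parameters params = {
		.name = "ext_bucket_example",
		.entries = 1024,
		.key_len = sizeof(uint32_t),
		.hash_func = rte_jhash,
		.hash_func_init_val = 0,
		.socket_id = rte_socket_id(),
		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
	};

	return rte_hash_create(&params);
}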

* Re: [PATCH] doc: update release note for hash library
  2018-10-16 18:47       ` [PATCH] doc: update release note for hash library Yipeng Wang
@ 2018-10-17 20:09         ` Honnappa Nagarahalli
  2018-10-25 18:45           ` Wang, Yipeng1
  0 siblings, 1 reply; 107+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-17 20:09 UTC (permalink / raw)
  To: Yipeng Wang, john.mcnamara; +Cc: dev, sameh.gobriel, nd

> 
> This patch updates release note for the new extendable bucket feature and
> the partial-key hashing.
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> ---
>  doc/guides/rel_notes/release_18_11.rst | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_18_11.rst
> b/doc/guides/rel_notes/release_18_11.rst
> index c13ea82..95c218d 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -164,6 +164,13 @@ New Features
>    this application doesn't need to launch dedicated worker threads for vhost
>    enqueue/dequeue operations.
> 
> +* **Added extendable bucket feature to hash library (rte_hash).**
> +
> +  This new “extendable bucket” feature provides 100% insertion
> + guarantee to  the capacity specified by the user by extending hash
> + table with extra  buckets when needed to accommodate the unlikely
> + event of intensive hash  collisions.  In addition, the internal
> + hashing algorithm was changed to use  partial-key hashing to improve
Do we need to provide a reference to the partial-key hashing paper?

> memory efficiency and lookup performance.
> 
>  API Changes
>  -----------
> --
> 2.7.4
Other than the above comment, looks fine.
Compiled and verified
Acked-by: Honnappa Nagarahalli <Honnappa.nagarahalli@arm.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH] doc: update release note for hash library
  2018-10-17 20:09         ` Honnappa Nagarahalli
@ 2018-10-25 18:45           ` Wang, Yipeng1
  2018-10-25 23:07             ` Thomas Monjalon
  0 siblings, 1 reply; 107+ messages in thread
From: Wang, Yipeng1 @ 2018-10-25 18:45 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Mcnamara, John; +Cc: dev, Gobriel, Sameh, nd

> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> >
> > +* **Added extendable bucket feature to hash library (rte_hash).**
> > +
> > +  This new “extendable bucket” feature provides 100% insertion
> > + guarantee to  the capacity specified by the user by extending hash
> > + table with extra  buckets when needed to accommodate the unlikely
> > + event of intensive hash  collisions.  In addition, the internal
> > + hashing algorithm was changed to use  partial-key hashing to improve
> Do we need to provide the reference to partial-key hashing paper?
> 
[Wang, Yipeng] Sorry for the delay, I thought I replied...

I assumed it should be a 1-2 sentence summary so I did not include the citation.

@John, do you think it is good to include the reference here? The original idea is from a paper which
I referenced in the source code comments and commit message.

> > memory efficiency and lookup performance.
> >
> >  API Changes
> >  -----------
> > --
> > 2.7.4
> Other than the above comment, looks fine.
> Compiled and verified
> Acked-by: Honnappa Nagarahalli <Honnappa.nagarahalli@arm.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v4 0/5] hash: fix multiple issues
  2018-09-28 14:11   ` [PATCH v4 0/5] hash: fix multiple issues Yipeng Wang
                       ` (4 preceding siblings ...)
  2018-09-28 14:11     ` [PATCH v4 5/5] hash: fix unused define Yipeng Wang
@ 2018-10-25 22:04     ` Thomas Monjalon
  5 siblings, 0 replies; 107+ messages in thread
From: Thomas Monjalon @ 2018-10-25 22:04 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, bruce.richardson, honnappa.nagarahalli, sameh.gobriel

28/09/2018 16:11, Yipeng Wang:
> This patch set was part of extendable hash table patch
> set before V2. According to Bruce's comment, this patch set
> is now separated from the original patch set for easier
> review and merge.
> https://mails.dpdk.org/archives/dev/2018-September/112555.html
> 
> This patch set fixes multiple issues/bugs from rte_hash and hash
> unit test.
> 
> V3->V4:
> In first commit, per Honnappa's suggestion, added comment to explain
> what value should BUCKET_SIZE be.
> In third commit, fix a typo: "consecutive"
> 
> V2->V3:
> As Bruce suggested:
> Added a new commit to add missing file into meson.build for readwrite test.
> Revised the commit message for the last commit.
> 
> Yipeng Wang (5):
>   test/hash: fix bucket size in hash perf test
>   test/hash: more accurate hash perf test output
>   test/hash: fix rw test with non-consecutive cores
>   test/hash: fix missing file in meson build file
>   hash: fix unused define

Applied, thanks

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH] doc: update release note for hash library
  2018-10-25 18:45           ` Wang, Yipeng1
@ 2018-10-25 23:07             ` Thomas Monjalon
  0 siblings, 0 replies; 107+ messages in thread
From: Thomas Monjalon @ 2018-10-25 23:07 UTC (permalink / raw)
  To: Wang, Yipeng1
  Cc: dev, Honnappa Nagarahalli, Mcnamara, John, Gobriel, Sameh, nd

25/10/2018 20:45, Wang, Yipeng1:
> > -----Original Message-----
> > From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> > >
> > > +* **Added extendable bucket feature to hash library (rte_hash).**
> > > +
> > > +  This new “extendable bucket” feature provides 100% insertion
> > > + guarantee to  the capacity specified by the user by extending hash
> > > + table with extra  buckets when needed to accommodate the unlikely
> > > + event of intensive hash  collisions.  In addition, the internal
> > > + hashing algorithm was changed to use  partial-key hashing to improve
> > Do we need to provide the reference to partial-key hashing paper?
> > 
> [Wang, Yipeng] Sorry for the delay, I thought I replied...
> 
> I assumed it should be a 1-2 sentence summary so I did not include the citation.
> 
> @John, do you think it is good to include reference here? The original idea is from a paper which
> I referenced in the source code comments and commit message.

It's really better when release notes are updated in the same patch
as the code. I have inserted it in your patches when applying them.

I think the citation is not so important in the release notes.

^ permalink raw reply	[flat|nested] 107+ messages in thread

end of thread, other threads:[~2018-10-25 23:07 UTC | newest]

Thread overview: 107+ messages
2018-09-06 17:09 [PATCH v1 0/5] hash: add extendable bucket and partial-key hashing Yipeng Wang
2018-09-06 17:09 ` [PATCH v1 1/5] test: fix bucket size in hash table perf test Yipeng Wang
2018-09-06 17:09 ` [PATCH v1 2/5] test: more accurate hash table perf test output Yipeng Wang
2018-09-06 17:09 ` [PATCH v1 3/5] hash: add extendable bucket feature Yipeng Wang
2018-09-06 17:09 ` [PATCH v1 4/5] test: implement extendable bucket hash test Yipeng Wang
2018-09-06 17:09 ` [PATCH v1 5/5] hash: use partial-key hashing Yipeng Wang
2018-09-21 17:17 ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Yipeng Wang
2018-09-21 17:17   ` [PATCH v2 1/7] test/hash: fix bucket size in hash perf test Yipeng Wang
2018-09-26 10:04     ` Bruce Richardson
2018-09-27  3:39       ` Wang, Yipeng1
2018-09-27  4:23     ` Honnappa Nagarahalli
2018-09-29  0:31       ` Wang, Yipeng1
2018-09-21 17:17   ` [PATCH v2 2/7] test/hash: more accurate hash perf test output Yipeng Wang
2018-09-26 10:07     ` Bruce Richardson
2018-09-21 17:17   ` [PATCH v2 3/7] test/hash: fix rw test with non-consecutive cores Yipeng Wang
2018-09-26 11:02     ` Bruce Richardson
2018-09-27  3:40       ` Wang, Yipeng1
2018-09-26 11:13     ` Bruce Richardson
2018-09-21 17:17   ` [PATCH v2 4/7] hash: fix unnecessary code Yipeng Wang
2018-09-26 12:55     ` Bruce Richardson
2018-09-21 17:17   ` [PATCH v2 5/7] hash: add extendable bucket feature Yipeng Wang
2018-09-27  4:23     ` Honnappa Nagarahalli
2018-09-27 11:15       ` Bruce Richardson
2018-09-27 11:27         ` Ananyev, Konstantin
2018-09-27 12:27           ` Bruce Richardson
2018-09-27 12:33             ` Ananyev, Konstantin
2018-09-27 19:21         ` Honnappa Nagarahalli
2018-09-28 17:35           ` Wang, Yipeng1
2018-09-29 21:09             ` Honnappa Nagarahalli
2018-09-29  1:10       ` Wang, Yipeng1
2018-10-01 20:56         ` Honnappa Nagarahalli
2018-10-02  1:56           ` Wang, Yipeng1
2018-09-21 17:17   ` [PATCH v2 6/7] test/hash: implement extendable bucket hash test Yipeng Wang
2018-09-27  4:24     ` Honnappa Nagarahalli
2018-09-29  0:50       ` Wang, Yipeng1
2018-09-21 17:17   ` [PATCH v2 7/7] hash: use partial-key hashing Yipeng Wang
2018-09-27  4:24     ` Honnappa Nagarahalli
2018-09-29  0:55       ` Wang, Yipeng1
2018-09-26 12:57   ` [PATCH v2 0/7] hash: add extendable bucket and partial key hashing Bruce Richardson
2018-09-27  3:41     ` Wang, Yipeng1
2018-09-27  4:23   ` Honnappa Nagarahalli
2018-09-29  0:46     ` Wang, Yipeng1
2018-09-26 12:54 ` [PATCH v3 0/5] hash: fix multiple issues Yipeng Wang
2018-09-26 12:54   ` [PATCH v3 1/5] test/hash: fix bucket size in hash perf test Yipeng Wang
2018-09-27 11:17     ` Bruce Richardson
2018-09-26 12:54   ` [PATCH v3 2/5] test/hash: more accurate hash perf test output Yipeng Wang
2018-09-26 12:54   ` [PATCH v3 3/5] test/hash: fix rw test with non-consecutive cores Yipeng Wang
2018-09-27 11:18     ` Bruce Richardson
2018-09-26 12:54   ` [PATCH v3 4/5] test/hash: fix missing file in meson build file Yipeng Wang
2018-09-27 11:22     ` Bruce Richardson
2018-09-26 12:54   ` [PATCH v3 5/5] hash: fix unused define Yipeng Wang
2018-09-28 14:11   ` [PATCH v4 0/5] hash: fix multiple issues Yipeng Wang
2018-09-28 14:11     ` [PATCH v4 1/5] test/hash: fix bucket size in hash perf test Yipeng Wang
2018-10-01 20:28       ` Honnappa Nagarahalli
2018-09-28 14:11     ` [PATCH v4 2/5] test/hash: more accurate hash perf test output Yipeng Wang
2018-09-28 14:11     ` [PATCH v4 3/5] test/hash: fix rw test with non-consecutive cores Yipeng Wang
2018-09-28 14:11     ` [PATCH v4 4/5] test/hash: fix missing file in meson build file Yipeng Wang
2018-09-28 14:11     ` [PATCH v4 5/5] hash: fix unused define Yipeng Wang
2018-10-25 22:04     ` [PATCH v4 0/5] hash: fix multiple issues Thomas Monjalon
2018-09-26 20:26 ` [PATCH v3 0/3] hash: add extendable bucket and partial key hashing Yipeng Wang
2018-09-26 20:26   ` [PATCH v3 1/3] hash: add extendable bucket feature Yipeng Wang
2018-09-26 20:26   ` [PATCH v3 2/3] test/hash: implement extendable bucket hash test Yipeng Wang
2018-09-26 20:26   ` [PATCH v3 3/3] hash: use partial-key hashing Yipeng Wang
2018-09-28 17:23   ` [PATCH v4 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
2018-09-28 17:23     ` [PATCH v4 1/4] hash: fix race condition in iterate Yipeng Wang
2018-10-01 20:23       ` Honnappa Nagarahalli
2018-10-02  0:17         ` Wang, Yipeng1
2018-10-02  4:26           ` Honnappa Nagarahalli
2018-10-02 23:53             ` Wang, Yipeng1
2018-09-28 17:23     ` [PATCH v4 2/4] hash: add extendable bucket feature Yipeng Wang
2018-10-02  3:58       ` Honnappa Nagarahalli
2018-10-02 23:39         ` Wang, Yipeng1
2018-10-03  4:37           ` Honnappa Nagarahalli
2018-10-03 15:08           ` Stephen Hemminger
2018-10-03 15:08       ` Stephen Hemminger
2018-10-03 16:53         ` Wang, Yipeng1
2018-10-03 17:59           ` Honnappa Nagarahalli
2018-10-04  1:22             ` Wang, Yipeng1
2018-09-28 17:23     ` [PATCH v4 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
2018-10-01 19:53       ` Honnappa Nagarahalli
2018-09-28 17:23     ` [PATCH v4 4/4] hash: use partial-key hashing Yipeng Wang
2018-10-01 20:09       ` Honnappa Nagarahalli
2018-10-03 19:05     ` [PATCH v4 0/4] hash: add extendable bucket and partial key hashing Dharmik Thakkar
2018-10-01 18:34   ` [PATCH v5 " Yipeng Wang
2018-10-01 18:34     ` [PATCH v5 1/4] hash: fix race condition in iterate Yipeng Wang
2018-10-02 17:26       ` Honnappa Nagarahalli
2018-10-01 18:35     ` [PATCH v5 2/4] hash: add extendable bucket feature Yipeng Wang
2018-10-01 18:35     ` [PATCH v5 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
2018-10-01 18:35     ` [PATCH v5 4/4] hash: use partial-key hashing Yipeng Wang
2018-10-02 20:52       ` Dharmik Thakkar
2018-10-03  0:43         ` Wang, Yipeng1
2018-10-03 19:10     ` [PATCH v5 0/4] hash: add extendable bucket and partial key hashing Dharmik Thakkar
2018-10-04  0:36       ` Wang, Yipeng1
2018-10-04 16:35   ` [PATCH v6 " Yipeng Wang
2018-10-04 16:35     ` [PATCH v6 1/4] hash: fix race condition in iterate Yipeng Wang
2018-10-04 16:35     ` [PATCH v6 2/4] hash: add extendable bucket feature Yipeng Wang
2018-10-04 16:35     ` [PATCH v6 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
2018-10-04 16:35     ` [PATCH v6 4/4] hash: use partial-key hashing Yipeng Wang
2018-10-10 21:27     ` [PATCH v7 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
2018-10-10 21:27       ` [PATCH v7 1/4] hash: fix race condition in iterate Yipeng Wang
2018-10-10 21:27       ` [PATCH v7 2/4] hash: add extendable bucket feature Yipeng Wang
2018-10-10 21:27       ` [PATCH v7 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
2018-10-10 21:27       ` [PATCH v7 4/4] hash: use partial-key hashing Yipeng Wang
2018-10-16 18:47       ` [PATCH] doc: update release note for hash library Yipeng Wang
2018-10-17 20:09         ` Honnappa Nagarahalli
2018-10-25 18:45           ` Wang, Yipeng1
2018-10-25 23:07             ` Thomas Monjalon
