From mboxrd@z Thu Jan 1 00:00:00 1970 From: "De Lara Guarch, Pablo" Subject: Re: [PATCH v4 1/7] member: implement main API Date: Mon, 2 Oct 2017 10:04:18 +0000 Message-ID: References: <1504655989-1518-1-git-send-email-yipeng1.wang@intel.com> <1506534034-39433-1-git-send-email-yipeng1.wang@intel.com> <1506534034-39433-2-git-send-email-yipeng1.wang@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Cc: "thomas@monjalon.net" , "Tai, Charlie" , "Gobriel, Sameh" , "Mcnamara, John" To: "Wang, Yipeng1" , "dev@dpdk.org" Return-path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id F2CEF1B1ED for ; Mon, 2 Oct 2017 12:04:30 +0200 (CEST) In-Reply-To: <1506534034-39433-2-git-send-email-yipeng1.wang@intel.com> Content-Language: en-US List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Yipeng, > -----Original Message----- > From: Wang, Yipeng1 > Sent: Wednesday, September 27, 2017 6:40 PM > To: dev@dpdk.org > Cc: thomas@monjalon.net; Tai, Charlie ; Gobriel, > Sameh ; De Lara Guarch, Pablo > ; Mcnamara, John > ; Wang, Yipeng1 > Subject: [PATCH v4 1/7] member: implement main API >=20 > Membership library is an extension and generalization of a traditional fi= lter > (for example Bloom Filter) structure. In general, the Membership library = is a > data structure that provides a "set-summary" and responds to set- > membership queries of whether a certain element belongs to a set(s). A > membership test for an element will return the set this element belongs t= o > or not-found if the element is never inserted into the set-summary. >=20 > The results of the membership test are not 100% accurate. Certain false > positive or false negative probability could exist. However, comparing to= a > "full-blown" complete list of elements, a "set-summary" > is memory efficient and fast on lookup. >=20 > This patch adds the main API definition. >=20 > Signed-off-by: Yipeng Wang A few comments on changes that you didn't make in the v4. Thanks, Pablo ... > + > +struct rte_member_setsum * > +rte_member_create(const struct rte_member_parameters *params) { > + struct rte_tailq_entry *te; > + struct rte_member_list *member_list; > + struct rte_member_setsum *setsum; > + int ret; > + > + if (params =3D=3D NULL) { > + rte_errno =3D EINVAL; > + return NULL; > + } > + > + if (params->key_len =3D=3D 0 || > + params->prim_hash_seed =3D=3D params- > >sec_hash_seed) { > + rte_errno =3D EINVAL; > + RTE_MEMBER_LOG(ERR, "Memship create with invalid > parameters\n"); Do not use "Memship". Change to " rte_member_create has invalid parameters"= ? Or something else that you want, but not Memship. > + return NULL; > + } > + ... > +struct rte_member_parameters { > + const char *name; /**< Name of the hash. */ > + > + /** > + * User to specify the type of the setsummary from one of > + * rte_member_setsum_type. > + * > + * HT based setsummary is implemented like a hash table. User > should use > + * this type when there are many sets. > + * > + * vBF setsummary is a vector of bloom filters. It is used when > number > + * of sets is not big (less than 32 for current implementation). > + */ > + enum rte_member_setsum_type type; > + > + /** > + * If it is HT based setsummary, user to specify the subtype or mode > + * of the setsummary. It could be cache, or non-cache mode. > + * Set iscache to be 1 if to use as cache mode. Change to "is_cache". > + * > + * For cache mode, keys can be evicted out of the HT setsummary. > Keys > + * with the same signature and map to the same bucket > + * will overwrite each other in the setsummary table. > + * This mode is useful for the case that the set-summary only > + * needs to keep record of the recently inserted keys. Both > + * false-negative and false-positive could happen. > + * > + * For non-cache mode, keys cannot be evicted out of the cache. So > for > + * this mode the setsummary will become full eventually. Keys with > the > + * same signature but map to the same bucket will still occupy > multiple > + * entries. This mode does not give false-negative result. > + */ > + uint8_t is_cache; > + > + /** > + * For HT setsummary, num_keys equals to the number of entries > of the > + * table. When the number of keys that inserted in the HT > setsummary "number of keys inserted in the HT summary". Or "were inserted". > + * approaches this number, eviction could happen. For cache mode, > + * keys could be evicted out of the table. For non-cache mode, keys > will ... > + /** > + * false_positive_rate is only relevant to vBF based setsummary. > + * false_positive_rate is the user-defined false positive rate > + * given expected number of inserted keys (num_keys). It is used to > + * calculate the total number of bits for each BF, and the number of > + * hash values used during lookup and insertion. For details please > + * refer to vBF implementation and membership library > documentation. > + * Note that this parameter is not directly set by users for HT mode. > + * > + * HT setsummary's false positive rate is in the order of: > + * false_pos =3D (1/bucket_count)*(1/2^16), since we use 16-bit > signature. > + * This is because two keys needs to map to same bucket and same > + * signature to have a collision (false positive). bucket_count is > equal > + * to number of entries (num_keys) divided by entry count per > bucket > + * (RTE_MEMBER_BUCKET_ENTRIES). Thus, the false_positive_rate > is not > + * directly set by users. Unless I understood it incorrectly, I think you should clarify that this field is not set for HT mode, but it is for vBF, right? > + */ > + float false_positive_rate; > + > + /** > + * We use two seeds to calculate two independent hashes for each > key. > + * > + * For HT type, one hash is used as signature, and the other is used > + * for bucket location. > + * For vBF type, these two hashes and their combinations are used > as > + * hash locations to index the bit array. > + */ > + uint32_t prim_hash_seed; > + > + /** > + * The secondary seed should be a different value from the primary > seed. > + */ > + uint32_t sec_hash_seed; > + > + int socket_id; /**< NUMA Socket ID for memory. > */ > +}; > + > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice > + * > + * Find an existing set-summary and return a pointer to it. > + * > + * @param name > + * Name of the set-summary. > + * @return > + * Pointer to the set-summary or NULL if object not found > + * with rte_errno set appropriately. Possible rte_errno values include= : > + * - ENOENT - value not available for return > + */ > +struct rte_member_setsum * > +rte_member_find_existing(const char *name); > + > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice > + * > + * Create set-summary (SS). > + * > + * @param params > + * Parameters to initialize the setsummary. > + * @return > + * Return the pointer to the setsummary. > + * Return value is NULL if the creation failed. > + */ > +struct rte_member_setsum * > +rte_member_create(const struct rte_member_parameters *params); > + > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice > + * > + * Lookup key in set-summary (SS). > + * Single key lookup and return as soon as the first match found > + * > + * @param setsum > + * Pointer of a setsummary. > + * @param key > + * Pointer of the key to be looked up. > + * @param set_id > + * Output the set id matches the key. > + * @return > + * Return 1 for found a match and 0 for not found a match. > + */ > +int > +rte_member_lookup(const struct rte_member_setsum *setsum, const > void *key, > + member_set_t *set_id); > + > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice > + * > + * lookup bulk of keys in set-summary (SS). "Lookup bulk" > + * Each key lookup returns as soon as the first match found > + * > + * @param setsum > + * Pointer of a setsummary. > + * @param keys > + * Pointer of the bulk of keys to be looked up. > + * @param num_keys > + * Number of keys that will be lookup. > + * @param set_ids > + * Output set ids for all the keys to this array. > + * User should preallocate array that can contain all results, which s= ize is > + * the num_keys. > + * @return