[PATCH 0/2] Improve efficiency of detecting duplicate in libselinux

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] Improve efficiency of detecting duplicate in libselinux
@ 2023-02-09 11:42 wanghuizhao
  2023-02-09 11:42 ` [PATCH 1/2] libselinux: migrating hashtab from policycoreutils wanghuizhao
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: wanghuizhao @ 2023-02-09 11:42 UTC (permalink / raw)
  To: selinux, nixiaoming, tengfei22, luohaizheng, leisure.wang, dongxinhua
  Cc: wuquanming, chenjingwen6, douzhaolei, hushijie3, jiangyangyang,
	laiyuanyuan.lai, luozhoujian1, wangxu72, wanqian10, weiyuchen3,
	wangfangpeng1, young.liuyang, zengweilin, wangle6

In my process of using selinux, I found that when semodule -i some.pp loads
a large number of modules with rules, the loading time increases rapidly.

(none) /tmp # time semodule -i 1.pp
real    0m 1.085s
user    0m 0.871s
sys     0m 0.063s
(none) /tmp # time semodule -i 2.pp
real    0m 3.178s
user    0m 2.890s
sys     0m 0.170s
(none) /tmp # time semodule -i 3.pp
real    0m 13.365s
user    0m 12.927s
sys     0m 0.340s

Then, I analyzed and found a function nodups_specs in label_file.c. The
algorithm complexity of implementing this function is O(N^2). In my use
scenario, the N number would be over 20,000, so it would be performed
hundreds of millions of times. It takes 13s to install the policy.

Therefore, I propose an optimization solution to this problem. The purpose
of this function is to check for duplicates. I'd like to propose a new
implementation that uses hash tables to check for duplicates.

This patch set consists of two parts to implement this solution.

- Migrate the existing hashtab templates in policycoreutils/newrole to
  libselinux.

- Implement the hash function and key comparison function required by the
  hashtab template. Improve algorithm for checking duplicate items of
  nodups_specs function.

Test again with the new implementation.

(none) /tmp # time semodule -i 1.pp
real    0m 0.725s
user    0m 0.483s
sys     0m 0.100s
(none) /tmp # time semodule -i 2.pp
real    0m 0.939s
user    0m 0.597s
sys     0m 0.228s
(none) /tmp # time semodule -i 3.pp
real    0m 1.885s
user    0m 1.473s
sys     0m 0.300s

wanghuizhao (2):
  libselinux: migrating hashtab from policycoreutils
  libselinux: performance optimization for duplicate detection

 libselinux/src/hashtab.c    | 208 ++++++++++++++++++++++++++++++++++++++++++++
 libselinux/src/hashtab.h    | 115 ++++++++++++++++++++++++
 libselinux/src/label_file.c | 112 +++++++++++++++++++-----
 libselinux/src/label_file.h |   5 ++
 4 files changed, 416 insertions(+), 24 deletions(-)
 create mode 100644 libselinux/src/hashtab.c
 create mode 100644 libselinux/src/hashtab.h

-- 
2.12.3


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] libselinux: migrating hashtab from policycoreutils
  2023-02-09 11:42 [PATCH 0/2] Improve efficiency of detecting duplicate in libselinux wanghuizhao
@ 2023-02-09 11:42 ` wanghuizhao
  2023-02-10 13:58   ` Christian Göttsche
  2023-02-09 11:42 ` [PATCH 2/2] libselinux: performance optimization for duplicate detection wanghuizhao
  2023-02-17  8:44 ` [PATCH v2 0/3] Improve efficiency of detecting duplicate in libselinux wanghuizhao
  2 siblings, 1 reply; 15+ messages in thread
From: wanghuizhao @ 2023-02-09 11:42 UTC (permalink / raw)
  To: selinux, nixiaoming, tengfei22, luohaizheng, leisure.wang, dongxinhua
  Cc: wuquanming, chenjingwen6, douzhaolei, hushijie3, jiangyangyang,
	laiyuanyuan.lai, luozhoujian1, wangxu72, wanqian10, weiyuchen3,
	wangfangpeng1, young.liuyang, zengweilin, wangle6

To use hashtab in libselinux, migrate the existing hashtab template
from policycoreutils/newrole to libselinux.

Signed-off-by: wanghuizhao <wanghuizhao1@huawei.com>
---
 libselinux/src/hashtab.c | 208 +++++++++++++++++++++++++++++++++++++++++++++++
 libselinux/src/hashtab.h | 115 ++++++++++++++++++++++++++
 2 files changed, 323 insertions(+)
 create mode 100644 libselinux/src/hashtab.c
 create mode 100644 libselinux/src/hashtab.h

diff --git a/libselinux/src/hashtab.c b/libselinux/src/hashtab.c
new file mode 100644
index 00000000..26d4f4c7
--- /dev/null
+++ b/libselinux/src/hashtab.c
@@ -0,0 +1,208 @@
+
+/* Author : Stephen Smalley, <sds@tycho.nsa.gov> */
+
+/* FLASK */
+
+/*
+ * Implementation of the hash table type.
+ */
+
+#include <stdlib.h>
+#include <string.h>
+#include "hashtab.h"
+
+hashtab_t hashtab_create(unsigned int (*hash_value) (hashtab_t h,
+						     const_hashtab_key_t key),
+			 int (*keycmp) (hashtab_t h,
+					const_hashtab_key_t key1,
+					const_hashtab_key_t key2),
+			 unsigned int size)
+{
+
+	hashtab_t p;
+	unsigned int i;
+
+	p = (hashtab_t) malloc(sizeof(hashtab_val_t));
+	if (p == NULL)
+		return p;
+
+	memset(p, 0, sizeof(hashtab_val_t));
+	p->size = size;
+	p->nel = 0;
+	p->hash_value = hash_value;
+	p->keycmp = keycmp;
+	p->htable = (hashtab_ptr_t *) malloc(sizeof(hashtab_ptr_t) * size);
+	if (p->htable == NULL) {
+		free(p);
+		return NULL;
+	}
+	for (i = 0; i < size; i++)
+		p->htable[i] = (hashtab_ptr_t) NULL;
+
+	return p;
+}
+
+int hashtab_insert(hashtab_t h, hashtab_key_t key, hashtab_datum_t datum)
+{
+	unsigned int hvalue;
+	hashtab_ptr_t prev, cur, newnode;
+
+	if (!h)
+		return HASHTAB_OVERFLOW;
+
+	hvalue = h->hash_value(h, key);
+	prev = NULL;
+	cur = h->htable[hvalue];
+	while (cur && h->keycmp(h, key, cur->key) > 0) {
+		prev = cur;
+		cur = cur->next;
+	}
+
+	if (cur && (h->keycmp(h, key, cur->key) == 0))
+		return HASHTAB_PRESENT;
+
+	newnode = (hashtab_ptr_t) malloc(sizeof(hashtab_node_t));
+	if (newnode == NULL)
+		return HASHTAB_OVERFLOW;
+	memset(newnode, 0, sizeof(struct hashtab_node));
+	newnode->key = key;
+	newnode->datum = datum;
+	if (prev) {
+		newnode->next = prev->next;
+		prev->next = newnode;
+	} else {
+		newnode->next = h->htable[hvalue];
+		h->htable[hvalue] = newnode;
+	}
+
+	h->nel++;
+	return HASHTAB_SUCCESS;
+}
+
+int hashtab_remove(hashtab_t h, hashtab_key_t key,
+		   void (*destroy) (hashtab_key_t k,
+				    hashtab_datum_t d, void *args), void *args)
+{
+	unsigned int hvalue;
+	hashtab_ptr_t cur, last;
+
+	if (!h)
+		return HASHTAB_MISSING;
+
+	hvalue = h->hash_value(h, key);
+	last = NULL;
+	cur = h->htable[hvalue];
+	while (cur != NULL && h->keycmp(h, key, cur->key) > 0) {
+		last = cur;
+		cur = cur->next;
+	}
+
+	if (cur == NULL || (h->keycmp(h, key, cur->key) != 0))
+		return HASHTAB_MISSING;
+
+	if (last == NULL)
+		h->htable[hvalue] = cur->next;
+	else
+		last->next = cur->next;
+
+	if (destroy)
+		destroy(cur->key, cur->datum, args);
+	free(cur);
+	h->nel--;
+	return HASHTAB_SUCCESS;
+}
+
+hashtab_datum_t hashtab_search(hashtab_t h, const_hashtab_key_t key)
+{
+
+	unsigned int hvalue;
+	hashtab_ptr_t cur;
+
+	if (!h)
+		return NULL;
+
+	hvalue = h->hash_value(h, key);
+	cur = h->htable[hvalue];
+	while (cur != NULL && h->keycmp(h, key, cur->key) > 0)
+		cur = cur->next;
+
+	if (cur == NULL || (h->keycmp(h, key, cur->key) != 0))
+		return NULL;
+
+	return cur->datum;
+}
+
+void hashtab_destroy(hashtab_t h)
+{
+	unsigned int i;
+	hashtab_ptr_t cur, temp;
+
+	if (!h)
+		return;
+
+	for (i = 0; i < h->size; i++) {
+		cur = h->htable[i];
+		while (cur != NULL) {
+			temp = cur;
+			cur = cur->next;
+			free(temp);
+		}
+		h->htable[i] = NULL;
+	}
+
+	free(h->htable);
+	h->htable = NULL;
+
+	free(h);
+}
+
+int hashtab_map(hashtab_t h,
+		int (*apply) (hashtab_key_t k,
+			      hashtab_datum_t d, void *args), void *args)
+{
+	unsigned int i;
+	hashtab_ptr_t cur;
+	int ret;
+
+	if (!h)
+		return HASHTAB_SUCCESS;
+
+	for (i = 0; i < h->size; i++) {
+		cur = h->htable[i];
+		while (cur != NULL) {
+			ret = apply(cur->key, cur->datum, args);
+			if (ret)
+				return ret;
+			cur = cur->next;
+		}
+	}
+	return HASHTAB_SUCCESS;
+}
+
+void hashtab_hash_eval(hashtab_t h, char *tag)
+{
+	unsigned int i;
+	int chain_len, slots_used, max_chain_len;
+	hashtab_ptr_t cur;
+
+	slots_used = 0;
+	max_chain_len = 0;
+	for (i = 0; i < h->size; i++) {
+		cur = h->htable[i];
+		if (cur) {
+			slots_used++;
+			chain_len = 0;
+			while (cur) {
+				chain_len++;
+				cur = cur->next;
+			}
+
+			if (chain_len > max_chain_len)
+				max_chain_len = chain_len;
+		}
+	}
+
+	printf
+	    ("%s:  %d entries and %d/%d buckets used, longest chain length %d\n",
+	     tag, h->nel, slots_used, h->size, max_chain_len);
+}
diff --git a/libselinux/src/hashtab.h b/libselinux/src/hashtab.h
new file mode 100644
index 00000000..092b96a9
--- /dev/null
+++ b/libselinux/src/hashtab.h
@@ -0,0 +1,115 @@
+
+/* Author : Stephen Smalley, <sds@tycho.nsa.gov> */
+
+/* FLASK */
+
+/*
+ * A hash table (hashtab) maintains associations between
+ * key values and datum values.  The type of the key values 
+ * and the type of the datum values is arbitrary.  The
+ * functions for hash computation and key comparison are
+ * provided by the creator of the table.
+ */
+
+#ifndef _NEWROLE_HASHTAB_H_
+#define _NEWROLE_HASHTAB_H_
+
+#include <stdint.h>
+#include <errno.h>
+#include <stdio.h>
+
+typedef char *hashtab_key_t;	/* generic key type */
+typedef const char *const_hashtab_key_t;	/* constant generic key type */
+typedef void *hashtab_datum_t;	/* generic datum type */
+
+typedef struct hashtab_node *hashtab_ptr_t;
+
+typedef struct hashtab_node {
+	hashtab_key_t key;
+	hashtab_datum_t datum;
+	hashtab_ptr_t next;
+} hashtab_node_t;
+
+typedef struct hashtab_val {
+	hashtab_ptr_t *htable;	/* hash table */
+	unsigned int size;	/* number of slots in hash table */
+	uint32_t nel;		/* number of elements in hash table */
+	unsigned int (*hash_value) (struct hashtab_val * h, const_hashtab_key_t key);	/* hash function */
+	int (*keycmp) (struct hashtab_val * h, const_hashtab_key_t key1, const_hashtab_key_t key2);	/* key comparison function */
+} hashtab_val_t;
+
+typedef hashtab_val_t *hashtab_t;
+
+/* Define status codes for hash table functions */
+#define HASHTAB_SUCCESS     0
+#define HASHTAB_OVERFLOW    -ENOMEM
+#define HASHTAB_PRESENT     -EEXIST
+#define HASHTAB_MISSING     -ENOENT
+
+/*
+   Creates a new hash table with the specified characteristics.
+
+   Returns NULL if insufficient space is available or
+   the new hash table otherwise.
+ */
+extern hashtab_t hashtab_create(unsigned int (*hash_value) (hashtab_t h,
+							    const_hashtab_key_t
+							    key),
+				int (*keycmp) (hashtab_t h,
+					       const_hashtab_key_t key1,
+					       const_hashtab_key_t key2),
+				unsigned int size);
+/*
+   Inserts the specified (key, datum) pair into the specified hash table.
+
+   Returns HASHTAB_OVERFLOW if insufficient space is available or
+   HASHTAB_PRESENT  if there is already an entry with the same key or
+   HASHTAB_SUCCESS otherwise.
+ */
+extern int hashtab_insert(hashtab_t h, hashtab_key_t k, hashtab_datum_t d);
+
+/*
+   Removes the entry with the specified key from the hash table.
+   Applies the specified destroy function to (key,datum,args) for
+   the entry.
+
+   Returns HASHTAB_MISSING if no entry has the specified key or
+   HASHTAB_SUCCESS otherwise.
+ */
+extern int hashtab_remove(hashtab_t h, hashtab_key_t k,
+			  void (*destroy) (hashtab_key_t k,
+					   hashtab_datum_t d,
+					   void *args), void *args);
+
+/*
+   Searches for the entry with the specified key in the hash table.
+
+   Returns NULL if no entry has the specified key or
+   the datum of the entry otherwise.
+ */
+extern hashtab_datum_t hashtab_search(hashtab_t h, const_hashtab_key_t k);
+
+/*
+   Destroys the specified hash table.
+ */
+extern void hashtab_destroy(hashtab_t h);
+
+/*
+   Applies the specified apply function to (key,datum,args)
+   for each entry in the specified hash table.
+
+   The order in which the function is applied to the entries
+   is dependent upon the internal structure of the hash table.
+
+   If apply returns a non-zero status, then hashtab_map will cease
+   iterating through the hash table and will propagate the error
+   return to its caller.
+ */
+extern int hashtab_map(hashtab_t h,
+		       int (*apply) (hashtab_key_t k,
+				     hashtab_datum_t d,
+				     void *args), void *args);
+
+extern void hashtab_hash_eval(hashtab_t h, char *tag);
+
+#endif
-- 
2.12.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] libselinux: performance optimization for duplicate detection
  2023-02-09 11:42 [PATCH 0/2] Improve efficiency of detecting duplicate in libselinux wanghuizhao
  2023-02-09 11:42 ` [PATCH 1/2] libselinux: migrating hashtab from policycoreutils wanghuizhao
@ 2023-02-09 11:42 ` wanghuizhao
  2023-02-10 14:09   ` Christian Göttsche
  2023-02-17  8:44 ` [PATCH v2 0/3] Improve efficiency of detecting duplicate in libselinux wanghuizhao
  2 siblings, 1 reply; 15+ messages in thread
From: wanghuizhao @ 2023-02-09 11:42 UTC (permalink / raw)
  To: selinux, nixiaoming, tengfei22, luohaizheng, leisure.wang, dongxinhua
  Cc: wuquanming, chenjingwen6, douzhaolei, hushijie3, jiangyangyang,
	laiyuanyuan.lai, luozhoujian1, wangxu72, wanqian10, weiyuchen3,
	wangfangpeng1, young.liuyang, zengweilin, wangle6

When semodule -i some.pp to install a module package, duplicate items are
detected for the module. The detection function is nodups_specs in
libselinux/src/label_file.c. The algorithm complexity of implementing
this function is O(N^2). In scenarios where N is very large, the efficiency
is very low.

To solve this problem, I propose to use the hash table to detect duplicates.
The algorithm complexity of new implementing is O(N). The execution efficiency
will be greatly improved.

Comparison between the execution time of the nodups_specs function.

Old double-layer loop implementation O(N^2):

semodule -i myapp1.pp
nodups_specs data->nspec: 5002
nodups_specs start: 11785.242s
nodups_specs end:   11785.588s
nodups_specs consumes:  0.346s

semodule -i myapp2.pp
nodups_specs data->nspec: 10002
nodups_specs start: 11804.280s
nodups_specs end:   11806.546s
nodups_specs consumes:  2.266s

semodule -i myapp3.pp
nodups_specs data->nspec: 20002
nodups_specs start: 11819.106s
nodups_specs end:   11830.892s
nodups_specs consumes: 11.786s

New hash table implementation O(N):

semodule -i myapp1.pp
nodups_specs data->nspec: 5002
nodups_specs start: 11785.588s
nodups_specs end:   11785.590s
nodups_specs consumes:  0.002s

semodule -i myapp2.pp
nodups_specs data->nspec: 10002
nodups_specs start: 11806.546s
nodups_specs end:   11806.552s
nodups_specs consumes:  0.006s

semodule -i myapp3.pp
nodups_specs data->nspec: 20002
nodups_specs start: 11830.892s
nodups_specs end:   11830.905s
nodups_specs consumes:  0.013s

Signed-off-by: wanghuizhao <wanghuizhao1@huawei.com>
---
 libselinux/src/label_file.c | 112 ++++++++++++++++++++++++++++++++++----------
 libselinux/src/label_file.h |   5 ++
 2 files changed, 93 insertions(+), 24 deletions(-)

diff --git a/libselinux/src/label_file.c b/libselinux/src/label_file.c
index 74ae9b9f..e4a85043 100644
--- a/libselinux/src/label_file.c
+++ b/libselinux/src/label_file.c
@@ -19,6 +19,7 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 
+#include "hashtab.h"
 #include "callbacks.h"
 #include "label_internal.h"
 #include "label_file.h"
@@ -57,40 +58,103 @@ static int find_stem_from_file(struct saved_data *data, const char *key)
 }
 
 /*
+ * hash calculation and key comparison of hash table
+ */
+
+static unsigned int symhash(hashtab_t h, const_hashtab_key_t key)
+{
+	const struct chkdups_key *k = (const struct chkdups_key *)key;
+	const char *p = NULL;
+	size_t size;
+	unsigned int val = 0;
+
+	size = strlen(k->regex);
+	for (p = k->regex; ((size_t) (p - k->regex)) < size; p++)
+		val =
+			(val << 4 | (val >> (8 * sizeof(unsigned int) - 4)) +
+			k->mode) ^ (*p);
+	return val % h->size;
+}
+
+static int symcmp(hashtab_t h
+		  __attribute__ ((unused)), const_hashtab_key_t key1,
+		  const_hashtab_key_t key2)
+{
+	const struct chkdups_key *a = (const struct chkdups_key *)key1;
+	const struct chkdups_key *b = (const struct chkdups_key *)key2;
+
+	return strcmp(a->regex, b->regex) || (a->mode && b->mode && a->mode != b->mode);
+}
+
+/*
  * Warn about duplicate specifications.
  */
 static int nodups_specs(struct saved_data *data, const char *path)
 {
-	int rc = 0;
-	unsigned int ii, jj;
+	int rc = 0, ret = 0;
+	unsigned int ii;
 	struct spec *curr_spec, *spec_arr = data->spec_arr;
+	struct chkdups_key *new = NULL;
+	unsigned int hashtab_len = (data->nspec / 10) ? data->nspec / 10 : 1;
+	hashtab_ptr_t cur, temp;
 
+	hashtab_t hash_table = hashtab_create(symhash, symcmp, data->nspec);
+	if (hash_table == NULL) {
+		rc = -1;
+		COMPAT_LOG(SELINUX_ERROR, "%s: hashtab create failed.\n", path);
+		return rc;
+	}
 	for (ii = 0; ii < data->nspec; ii++) {
-		curr_spec = &spec_arr[ii];
-		for (jj = ii + 1; jj < data->nspec; jj++) {
-			if ((!strcmp(spec_arr[jj].regex_str,
-				curr_spec->regex_str))
-			    && (!spec_arr[jj].mode || !curr_spec->mode
-				|| spec_arr[jj].mode == curr_spec->mode)) {
-				rc = -1;
-				errno = EINVAL;
-				if (strcmp(spec_arr[jj].lr.ctx_raw,
-					    curr_spec->lr.ctx_raw)) {
-					COMPAT_LOG
-						(SELINUX_ERROR,
-						 "%s: Multiple different specifications for %s  (%s and %s).\n",
-						 path, curr_spec->regex_str,
-						 spec_arr[jj].lr.ctx_raw,
-						 curr_spec->lr.ctx_raw);
-				} else {
-					COMPAT_LOG
-						(SELINUX_ERROR,
-						 "%s: Multiple same specifications for %s.\n",
-						 path, curr_spec->regex_str);
-				}
+		new = (struct chkdups_key *)malloc(sizeof(struct chkdups_key));
+		new->regex = spec_arr[ii].regex_str;
+		new->mode = spec_arr[ii].mode;
+		ret = hashtab_insert(hash_table, (hashtab_key_t)new, &spec_arr[ii]);
+		if (ret == HASHTAB_SUCCESS)
+			continue;
+		if (ret == HASHTAB_PRESENT) {
+			curr_spec =
+				(struct spec *)hashtab_search(hash_table, (hashtab_key_t)new);
+			rc = -1;
+			errno = EINVAL;
+			if (strcmp(spec_arr[ii].lr.ctx_raw, curr_spec->lr.ctx_raw)) {
+				COMPAT_LOG
+					(SELINUX_ERROR,
+					 "%s: Multiple different specifications for %s  (%s and %s).\n",
+					 path, curr_spec->regex_str,
+					 spec_arr[ii].lr.ctx_raw,
+					 curr_spec->lr.ctx_raw);
+			} else {
+				COMPAT_LOG
+					(SELINUX_ERROR,
+					 "%s: Multiple same specifications for %s.\n",
+					 path, curr_spec->regex_str);
 			}
 		}
+		if (ret == HASHTAB_OVERFLOW) {
+			rc = -1;
+			COMPAT_LOG
+				(SELINUX_ERROR,
+				"%s: hashtab happen memory error.\n",
+				path);
+			break;
+		}
+	}
+
+	for (ii = 0; ii < hashtab_len; ii++) {
+		cur = hash_table->htable[ii];
+		while (cur != NULL) {
+			temp = cur;
+			cur = cur->next;
+			free(temp->key);
+			free(temp);
+		}
+		hash_table->htable[ii] = NULL;
 	}
+
+	free(hash_table->htable);
+	hash_table->htable = NULL;
+	free(hash_table);
+
 	return rc;
 }
 
diff --git a/libselinux/src/label_file.h b/libselinux/src/label_file.h
index 190bc175..ad79319e 100644
--- a/libselinux/src/label_file.h
+++ b/libselinux/src/label_file.h
@@ -35,6 +35,11 @@
 /* Required selinux_restorecon and selabel_get_digests_all_partial_matches() */
 #define RESTORECON_PARTIAL_MATCH_DIGEST  "security.sehash"
 
+struct chkdups_key {
+	char *regex;
+	unsigned int mode;
+};
+
 struct selabel_sub {
 	char *src;
 	int slen;
-- 
2.12.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] libselinux: migrating hashtab from policycoreutils
  2023-02-09 11:42 ` [PATCH 1/2] libselinux: migrating hashtab from policycoreutils wanghuizhao
@ 2023-02-10 13:58   ` Christian Göttsche
  2023-02-11 11:53     ` wanghuizhao
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Göttsche @ 2023-02-10 13:58 UTC (permalink / raw)
  To: wanghuizhao; +Cc: selinux

On Thu, 9 Feb 2023 at 12:54, wanghuizhao <wanghuizhao1@huawei.com> wrote:
>
> To use hashtab in libselinux, migrate the existing hashtab template
> from policycoreutils/newrole to libselinux.
>
> Signed-off-by: wanghuizhao <wanghuizhao1@huawei.com>
> ---
>  libselinux/src/hashtab.c | 208 +++++++++++++++++++++++++++++++++++++++++++++++
>  libselinux/src/hashtab.h | 115 ++++++++++++++++++++++++++
>  2 files changed, 323 insertions(+)
>  create mode 100644 libselinux/src/hashtab.c
>  create mode 100644 libselinux/src/hashtab.h
>
> diff --git a/libselinux/src/hashtab.c b/libselinux/src/hashtab.c
> new file mode 100644
> index 00000000..26d4f4c7
> --- /dev/null
> +++ b/libselinux/src/hashtab.c
> @@ -0,0 +1,208 @@
> +
> +/* Author : Stephen Smalley, <sds@tycho.nsa.gov> */
> +
> +/* FLASK */
> +
> +/*
> + * Implementation of the hash table type.
> + */
> +
> +#include <stdlib.h>
> +#include <string.h>
> +#include "hashtab.h"
> +
> +hashtab_t hashtab_create(unsigned int (*hash_value) (hashtab_t h,
> +                                                    const_hashtab_key_t key),
> +                        int (*keycmp) (hashtab_t h,
> +                                       const_hashtab_key_t key1,
> +                                       const_hashtab_key_t key2),
> +                        unsigned int size)
> +{
> +
> +       hashtab_t p;
> +       unsigned int i;
> +
> +       p = (hashtab_t) malloc(sizeof(hashtab_val_t));
> +       if (p == NULL)
> +               return p;
> +
> +       memset(p, 0, sizeof(hashtab_val_t));
> +       p->size = size;
> +       p->nel = 0;
> +       p->hash_value = hash_value;
> +       p->keycmp = keycmp;
> +       p->htable = (hashtab_ptr_t *) malloc(sizeof(hashtab_ptr_t) * size);
> +       if (p->htable == NULL) {
> +               free(p);
> +               return NULL;
> +       }
> +       for (i = 0; i < size; i++)
> +               p->htable[i] = (hashtab_ptr_t) NULL;
> +
> +       return p;
> +}
> +
> +int hashtab_insert(hashtab_t h, hashtab_key_t key, hashtab_datum_t datum)
> +{
> +       unsigned int hvalue;
> +       hashtab_ptr_t prev, cur, newnode;
> +
> +       if (!h)
> +               return HASHTAB_OVERFLOW;
> +
> +       hvalue = h->hash_value(h, key);
> +       prev = NULL;
> +       cur = h->htable[hvalue];
> +       while (cur && h->keycmp(h, key, cur->key) > 0) {
> +               prev = cur;
> +               cur = cur->next;
> +       }
> +
> +       if (cur && (h->keycmp(h, key, cur->key) == 0))
> +               return HASHTAB_PRESENT;
> +
> +       newnode = (hashtab_ptr_t) malloc(sizeof(hashtab_node_t));
> +       if (newnode == NULL)
> +               return HASHTAB_OVERFLOW;
> +       memset(newnode, 0, sizeof(struct hashtab_node));
> +       newnode->key = key;
> +       newnode->datum = datum;
> +       if (prev) {
> +               newnode->next = prev->next;
> +               prev->next = newnode;
> +       } else {
> +               newnode->next = h->htable[hvalue];
> +               h->htable[hvalue] = newnode;
> +       }
> +
> +       h->nel++;
> +       return HASHTAB_SUCCESS;
> +}
> +
> +int hashtab_remove(hashtab_t h, hashtab_key_t key,
> +                  void (*destroy) (hashtab_key_t k,
> +                                   hashtab_datum_t d, void *args), void *args)
> +{
> +       unsigned int hvalue;
> +       hashtab_ptr_t cur, last;
> +
> +       if (!h)
> +               return HASHTAB_MISSING;
> +
> +       hvalue = h->hash_value(h, key);
> +       last = NULL;
> +       cur = h->htable[hvalue];
> +       while (cur != NULL && h->keycmp(h, key, cur->key) > 0) {
> +               last = cur;
> +               cur = cur->next;
> +       }
> +
> +       if (cur == NULL || (h->keycmp(h, key, cur->key) != 0))
> +               return HASHTAB_MISSING;
> +
> +       if (last == NULL)
> +               h->htable[hvalue] = cur->next;
> +       else
> +               last->next = cur->next;
> +
> +       if (destroy)
> +               destroy(cur->key, cur->datum, args);
> +       free(cur);
> +       h->nel--;
> +       return HASHTAB_SUCCESS;
> +}
> +
> +hashtab_datum_t hashtab_search(hashtab_t h, const_hashtab_key_t key)
> +{
> +
> +       unsigned int hvalue;
> +       hashtab_ptr_t cur;
> +
> +       if (!h)
> +               return NULL;
> +
> +       hvalue = h->hash_value(h, key);
> +       cur = h->htable[hvalue];
> +       while (cur != NULL && h->keycmp(h, key, cur->key) > 0)
> +               cur = cur->next;
> +
> +       if (cur == NULL || (h->keycmp(h, key, cur->key) != 0))
> +               return NULL;
> +
> +       return cur->datum;
> +}
> +
> +void hashtab_destroy(hashtab_t h)
> +{
> +       unsigned int i;
> +       hashtab_ptr_t cur, temp;
> +
> +       if (!h)
> +               return;
> +
> +       for (i = 0; i < h->size; i++) {
> +               cur = h->htable[i];
> +               while (cur != NULL) {
> +                       temp = cur;
> +                       cur = cur->next;
> +                       free(temp);
> +               }
> +               h->htable[i] = NULL;
> +       }
> +
> +       free(h->htable);
> +       h->htable = NULL;
> +
> +       free(h);
> +}
> +
> +int hashtab_map(hashtab_t h,
> +               int (*apply) (hashtab_key_t k,
> +                             hashtab_datum_t d, void *args), void *args)
> +{
> +       unsigned int i;
> +       hashtab_ptr_t cur;
> +       int ret;
> +
> +       if (!h)
> +               return HASHTAB_SUCCESS;
> +
> +       for (i = 0; i < h->size; i++) {
> +               cur = h->htable[i];
> +               while (cur != NULL) {
> +                       ret = apply(cur->key, cur->datum, args);
> +                       if (ret)
> +                               return ret;
> +                       cur = cur->next;
> +               }
> +       }
> +       return HASHTAB_SUCCESS;
> +}
> +
> +void hashtab_hash_eval(hashtab_t h, char *tag)
> +{
> +       unsigned int i;
> +       int chain_len, slots_used, max_chain_len;
> +       hashtab_ptr_t cur;
> +
> +       slots_used = 0;
> +       max_chain_len = 0;
> +       for (i = 0; i < h->size; i++) {
> +               cur = h->htable[i];
> +               if (cur) {
> +                       slots_used++;
> +                       chain_len = 0;
> +                       while (cur) {
> +                               chain_len++;
> +                               cur = cur->next;
> +                       }
> +
> +                       if (chain_len > max_chain_len)
> +                               max_chain_len = chain_len;
> +               }
> +       }
> +
> +       printf
> +           ("%s:  %d entries and %d/%d buckets used, longest chain length %d\n",
> +            tag, h->nel, slots_used, h->size, max_chain_len);
> +}
> diff --git a/libselinux/src/hashtab.h b/libselinux/src/hashtab.h
> new file mode 100644
> index 00000000..092b96a9
> --- /dev/null
> +++ b/libselinux/src/hashtab.h
> @@ -0,0 +1,115 @@
> +
> +/* Author : Stephen Smalley, <sds@tycho.nsa.gov> */
> +
> +/* FLASK */
> +
> +/*
> + * A hash table (hashtab) maintains associations between
> + * key values and datum values.  The type of the key values


libselinux/src/hashtab.h:8: trailing whitespace.
+ * key values and datum values.  The type of the key values

> + * and the type of the datum values is arbitrary.  The
> + * functions for hash computation and key comparison are
> + * provided by the creator of the table.
> + */
> +
> +#ifndef _NEWROLE_HASHTAB_H_
> +#define _NEWROLE_HASHTAB_H_

_SELINUX_HASHTAB_H ?
(or `#pragma once`, seems to be widely supported according to
https://en.wikipedia.org/wiki/Pragma_once)

> +
> +#include <stdint.h>
> +#include <errno.h>
> +#include <stdio.h>
> +
> +typedef char *hashtab_key_t;   /* generic key type */
> +typedef const char *const_hashtab_key_t;       /* constant generic key type */
> +typedef void *hashtab_datum_t; /* generic datum type */
> +
> +typedef struct hashtab_node *hashtab_ptr_t;
> +
> +typedef struct hashtab_node {
> +       hashtab_key_t key;
> +       hashtab_datum_t datum;
> +       hashtab_ptr_t next;
> +} hashtab_node_t;
> +
> +typedef struct hashtab_val {
> +       hashtab_ptr_t *htable;  /* hash table */
> +       unsigned int size;      /* number of slots in hash table */
> +       uint32_t nel;           /* number of elements in hash table */
> +       unsigned int (*hash_value) (struct hashtab_val * h, const_hashtab_key_t key);   /* hash function */
> +       int (*keycmp) (struct hashtab_val * h, const_hashtab_key_t key1, const_hashtab_key_t key2);     /* key comparison function */
> +} hashtab_val_t;
> +
> +typedef hashtab_val_t *hashtab_t;
> +
> +/* Define status codes for hash table functions */
> +#define HASHTAB_SUCCESS     0
> +#define HASHTAB_OVERFLOW    -ENOMEM
> +#define HASHTAB_PRESENT     -EEXIST
> +#define HASHTAB_MISSING     -ENOENT
> +
> +/*
> +   Creates a new hash table with the specified characteristics.
> +
> +   Returns NULL if insufficient space is available or
> +   the new hash table otherwise.
> + */
> +extern hashtab_t hashtab_create(unsigned int (*hash_value) (hashtab_t h,
> +                                                           const_hashtab_key_t
> +                                                           key),
> +                               int (*keycmp) (hashtab_t h,
> +                                              const_hashtab_key_t key1,
> +                                              const_hashtab_key_t key2),
> +                               unsigned int size);
> +/*
> +   Inserts the specified (key, datum) pair into the specified hash table.
> +
> +   Returns HASHTAB_OVERFLOW if insufficient space is available or
> +   HASHTAB_PRESENT  if there is already an entry with the same key or
> +   HASHTAB_SUCCESS otherwise.
> + */
> +extern int hashtab_insert(hashtab_t h, hashtab_key_t k, hashtab_datum_t d);
> +
> +/*
> +   Removes the entry with the specified key from the hash table.
> +   Applies the specified destroy function to (key,datum,args) for
> +   the entry.
> +
> +   Returns HASHTAB_MISSING if no entry has the specified key or
> +   HASHTAB_SUCCESS otherwise.
> + */
> +extern int hashtab_remove(hashtab_t h, hashtab_key_t k,
> +                         void (*destroy) (hashtab_key_t k,
> +                                          hashtab_datum_t d,
> +                                          void *args), void *args);
> +
> +/*
> +   Searches for the entry with the specified key in the hash table.
> +
> +   Returns NULL if no entry has the specified key or
> +   the datum of the entry otherwise.
> + */
> +extern hashtab_datum_t hashtab_search(hashtab_t h, const_hashtab_key_t k);
> +
> +/*
> +   Destroys the specified hash table.
> + */
> +extern void hashtab_destroy(hashtab_t h);
> +
> +/*
> +   Applies the specified apply function to (key,datum,args)
> +   for each entry in the specified hash table.
> +
> +   The order in which the function is applied to the entries
> +   is dependent upon the internal structure of the hash table.
> +
> +   If apply returns a non-zero status, then hashtab_map will cease
> +   iterating through the hash table and will propagate the error
> +   return to its caller.
> + */
> +extern int hashtab_map(hashtab_t h,
> +                      int (*apply) (hashtab_key_t k,
> +                                    hashtab_datum_t d,
> +                                    void *args), void *args);
> +
> +extern void hashtab_hash_eval(hashtab_t h, char *tag);
> +
> +#endif
> --
> 2.12.3
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] libselinux: performance optimization for duplicate detection
  2023-02-09 11:42 ` [PATCH 2/2] libselinux: performance optimization for duplicate detection wanghuizhao
@ 2023-02-10 14:09   ` Christian Göttsche
  2023-02-11 12:50     ` wanghuizhao
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Göttsche @ 2023-02-10 14:09 UTC (permalink / raw)
  To: wanghuizhao; +Cc: selinux

On Thu, 9 Feb 2023 at 12:54, wanghuizhao <wanghuizhao1@huawei.com> wrote:
>
> When semodule -i some.pp to install a module package, duplicate items are
> detected for the module. The detection function is nodups_specs in
> libselinux/src/label_file.c. The algorithm complexity of implementing
> this function is O(N^2). In scenarios where N is very large, the efficiency
> is very low.
>
> To solve this problem, I propose to use the hash table to detect duplicates.
> The algorithm complexity of new implementing is O(N). The execution efficiency
> will be greatly improved.
>
> Comparison between the execution time of the nodups_specs function.
>
> Old double-layer loop implementation O(N^2):
>
> semodule -i myapp1.pp
> nodups_specs data->nspec: 5002
> nodups_specs start: 11785.242s
> nodups_specs end:   11785.588s
> nodups_specs consumes:  0.346s
>
> semodule -i myapp2.pp
> nodups_specs data->nspec: 10002
> nodups_specs start: 11804.280s
> nodups_specs end:   11806.546s
> nodups_specs consumes:  2.266s
>
> semodule -i myapp3.pp
> nodups_specs data->nspec: 20002
> nodups_specs start: 11819.106s
> nodups_specs end:   11830.892s
> nodups_specs consumes: 11.786s
>
> New hash table implementation O(N):
>
> semodule -i myapp1.pp
> nodups_specs data->nspec: 5002
> nodups_specs start: 11785.588s
> nodups_specs end:   11785.590s
> nodups_specs consumes:  0.002s
>
> semodule -i myapp2.pp
> nodups_specs data->nspec: 10002
> nodups_specs start: 11806.546s
> nodups_specs end:   11806.552s
> nodups_specs consumes:  0.006s
>
> semodule -i myapp3.pp
> nodups_specs data->nspec: 20002
> nodups_specs start: 11830.892s
> nodups_specs end:   11830.905s
> nodups_specs consumes:  0.013s
>
> Signed-off-by: wanghuizhao <wanghuizhao1@huawei.com>
> ---
>  libselinux/src/label_file.c | 112 ++++++++++++++++++++++++++++++++++----------
>  libselinux/src/label_file.h |   5 ++
>  2 files changed, 93 insertions(+), 24 deletions(-)
>
> diff --git a/libselinux/src/label_file.c b/libselinux/src/label_file.c
> index 74ae9b9f..e4a85043 100644
> --- a/libselinux/src/label_file.c
> +++ b/libselinux/src/label_file.c
> @@ -19,6 +19,7 @@
>  #include <sys/types.h>
>  #include <sys/stat.h>
>
> +#include "hashtab.h"
>  #include "callbacks.h"
>  #include "label_internal.h"
>  #include "label_file.h"
> @@ -57,40 +58,103 @@ static int find_stem_from_file(struct saved_data *data, const char *key)
>  }
>
>  /*
> + * hash calculation and key comparison of hash table
> + */
> +
> +static unsigned int symhash(hashtab_t h, const_hashtab_key_t key)
> +{
> +       const struct chkdups_key *k = (const struct chkdups_key *)key;
> +       const char *p = NULL;
> +       size_t size;
> +       unsigned int val = 0;
> +
> +       size = strlen(k->regex);
> +       for (p = k->regex; ((size_t) (p - k->regex)) < size; p++)
> +               val =
> +                       (val << 4 | (val >> (8 * sizeof(unsigned int) - 4)) +
> +                       k->mode) ^ (*p);

label_file.c: In function ‘symhash’:
label_file.c:74:77: error: suggest parentheses around arithmetic in
operand of ‘|’ [-Werror=parentheses]
   74 |                         (val << 4 | (val >> (8 *
sizeof(unsigned int) - 4)) +
      |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
   75 |                         k->mode) ^ (*p);
      |                         ~~~~~~~

> +       return val % h->size;
> +}
> +
> +static int symcmp(hashtab_t h
> +                 __attribute__ ((unused)), const_hashtab_key_t key1,
> +                 const_hashtab_key_t key2)
> +{
> +       const struct chkdups_key *a = (const struct chkdups_key *)key1;
> +       const struct chkdups_key *b = (const struct chkdups_key *)key2;
> +
> +       return strcmp(a->regex, b->regex) || (a->mode && b->mode && a->mode != b->mode);
> +}
> +
> +/*
>   * Warn about duplicate specifications.
>   */
>  static int nodups_specs(struct saved_data *data, const char *path)
>  {
> -       int rc = 0;
> -       unsigned int ii, jj;
> +       int rc = 0, ret = 0;
> +       unsigned int ii;
>         struct spec *curr_spec, *spec_arr = data->spec_arr;
> +       struct chkdups_key *new = NULL;
> +       unsigned int hashtab_len = (data->nspec / 10) ? data->nspec / 10 : 1;
> +       hashtab_ptr_t cur, temp;
>
> +       hashtab_t hash_table = hashtab_create(symhash, symcmp, data->nspec);
> +       if (hash_table == NULL) {
> +               rc = -1;
> +               COMPAT_LOG(SELINUX_ERROR, "%s: hashtab create failed.\n", path);
> +               return rc;
> +       }
>         for (ii = 0; ii < data->nspec; ii++) {
> -               curr_spec = &spec_arr[ii];
> -               for (jj = ii + 1; jj < data->nspec; jj++) {
> -                       if ((!strcmp(spec_arr[jj].regex_str,
> -                               curr_spec->regex_str))
> -                           && (!spec_arr[jj].mode || !curr_spec->mode
> -                               || spec_arr[jj].mode == curr_spec->mode)) {
> -                               rc = -1;
> -                               errno = EINVAL;
> -                               if (strcmp(spec_arr[jj].lr.ctx_raw,
> -                                           curr_spec->lr.ctx_raw)) {
> -                                       COMPAT_LOG
> -                                               (SELINUX_ERROR,
> -                                                "%s: Multiple different specifications for %s  (%s and %s).\n",
> -                                                path, curr_spec->regex_str,
> -                                                spec_arr[jj].lr.ctx_raw,
> -                                                curr_spec->lr.ctx_raw);
> -                               } else {
> -                                       COMPAT_LOG
> -                                               (SELINUX_ERROR,
> -                                                "%s: Multiple same specifications for %s.\n",
> -                                                path, curr_spec->regex_str);
> -                               }
> +               new = (struct chkdups_key *)malloc(sizeof(struct chkdups_key));

oom check missing

> +               new->regex = spec_arr[ii].regex_str;
> +               new->mode = spec_arr[ii].mode;
> +               ret = hashtab_insert(hash_table, (hashtab_key_t)new, &spec_arr[ii]);
> +               if (ret == HASHTAB_SUCCESS)
> +                       continue;
> +               if (ret == HASHTAB_PRESENT) {
> +                       curr_spec =
> +                               (struct spec *)hashtab_search(hash_table, (hashtab_key_t)new);
> +                       rc = -1;
> +                       errno = EINVAL;
> +                       if (strcmp(spec_arr[ii].lr.ctx_raw, curr_spec->lr.ctx_raw)) {
> +                               COMPAT_LOG
> +                                       (SELINUX_ERROR,
> +                                        "%s: Multiple different specifications for %s  (%s and %s).\n",
> +                                        path, curr_spec->regex_str,
> +                                        spec_arr[ii].lr.ctx_raw,
> +                                        curr_spec->lr.ctx_raw);
> +                       } else {
> +                               COMPAT_LOG
> +                                       (SELINUX_ERROR,
> +                                        "%s: Multiple same specifications for %s.\n",
> +                                        path, curr_spec->regex_str);
>                         }

`new` leaking

>                 }
> +               if (ret == HASHTAB_OVERFLOW) {
> +                       rc = -1;
> +                       COMPAT_LOG
> +                               (SELINUX_ERROR,
> +                               "%s: hashtab happen memory error.\n",
> +                               path);
> +                       break;

`new` leaking

> +               }
> +       }
> +
> +       for (ii = 0; ii < hashtab_len; ii++) {
> +               cur = hash_table->htable[ii];
> +               while (cur != NULL) {
> +                       temp = cur;
> +                       cur = cur->next;
> +                       free(temp->key);
> +                       free(temp);
> +               }
> +               hash_table->htable[ii] = NULL;
>         }

The common way of destroying hash-tables is hashtab_destroy().
Since the keys need to be free'd as well `hashtab_map(hash_table,
key_destroy, NULL)` with a custom key_destroy function can be used.
(To avoid iterating the hash-table twice hashtab_destroy() could be
modified to take an optional key destroy callback.)

> +
> +       free(hash_table->htable);
> +       hash_table->htable = NULL;
> +       free(hash_table);
> +
>         return rc;
>  }
>
> diff --git a/libselinux/src/label_file.h b/libselinux/src/label_file.h
> index 190bc175..ad79319e 100644
> --- a/libselinux/src/label_file.h
> +++ b/libselinux/src/label_file.h
> @@ -35,6 +35,11 @@
>  /* Required selinux_restorecon and selabel_get_digests_all_partial_matches() */
>  #define RESTORECON_PARTIAL_MATCH_DIGEST  "security.sehash"
>
> +struct chkdups_key {
> +       char *regex;
> +       unsigned int mode;
> +};

Why declare in the header and not in the source file?

> +
>  struct selabel_sub {
>         char *src;
>         int slen;
> --
> 2.12.3
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] libselinux: migrating hashtab from policycoreutils
  2023-02-10 13:58   ` Christian Göttsche
@ 2023-02-11 11:53     ` wanghuizhao
  0 siblings, 0 replies; 15+ messages in thread
From: wanghuizhao @ 2023-02-11 11:53 UTC (permalink / raw)
  To: Christian Göttsche
  Cc: selinux, Xiaoming Ni, weiyuchen3, wangfangpeng1, chenjingwen6,
	dongxinhua


On 2023/2/10 21:58, Christian Göttsche wrote:
> On Thu, 9 Feb 2023 at 12:54, wanghuizhao <wanghuizhao1@huawei.com> wrote:
>> To use hashtab in libselinux, migrate the existing hashtab template
>> from policycoreutils/newrole to libselinux.
>>
>> Signed-off-by: wanghuizhao <wanghuizhao1@huawei.com>
>> ---
>>   libselinux/src/hashtab.c | 208 +++++++++++++++++++++++++++++++++++++++++++++++
>>   libselinux/src/hashtab.h | 115 ++++++++++++++++++++++++++
>>   2 files changed, 323 insertions(+)
>>   create mode 100644 libselinux/src/hashtab.c
>>   create mode 100644 libselinux/src/hashtab.h
>>
>> diff --git a/libselinux/src/hashtab.c b/libselinux/src/hashtab.c
>> new file mode 100644
>> index 00000000..26d4f4c7
>> --- /dev/null
>> +++ b/libselinux/src/hashtab.c
>> @@ -0,0 +1,208 @@
>> +
>> +/* Author : Stephen Smalley, <sds@tycho.nsa.gov> */
>> +
>> +/* FLASK */
>> +
>> +/*
>> + * Implementation of the hash table type.
>> + */
>> +
>> +#include <stdlib.h>
>> +#include <string.h>
>> +#include "hashtab.h"
>> +
>> +hashtab_t hashtab_create(unsigned int (*hash_value) (hashtab_t h,
>> +                                                    const_hashtab_key_t key),
>> +                        int (*keycmp) (hashtab_t h,
>> +                                       const_hashtab_key_t key1,
>> +                                       const_hashtab_key_t key2),
>> +                        unsigned int size)
>> +{
>> +
>> +       hashtab_t p;
>> +       unsigned int i;
>> +
>> +       p = (hashtab_t) malloc(sizeof(hashtab_val_t));
>> +       if (p == NULL)
>> +               return p;
>> +
>> +       memset(p, 0, sizeof(hashtab_val_t));
>> +       p->size = size;
>> +       p->nel = 0;
>> +       p->hash_value = hash_value;
>> +       p->keycmp = keycmp;
>> +       p->htable = (hashtab_ptr_t *) malloc(sizeof(hashtab_ptr_t) * size);
>> +       if (p->htable == NULL) {
>> +               free(p);
>> +               return NULL;
>> +       }
>> +       for (i = 0; i < size; i++)
>> +               p->htable[i] = (hashtab_ptr_t) NULL;
>> +
>> +       return p;
>> +}
>> +
>> +int hashtab_insert(hashtab_t h, hashtab_key_t key, hashtab_datum_t datum)
>> +{
>> +       unsigned int hvalue;
>> +       hashtab_ptr_t prev, cur, newnode;
>> +
>> +       if (!h)
>> +               return HASHTAB_OVERFLOW;
>> +
>> +       hvalue = h->hash_value(h, key);
>> +       prev = NULL;
>> +       cur = h->htable[hvalue];
>> +       while (cur && h->keycmp(h, key, cur->key) > 0) {
>> +               prev = cur;
>> +               cur = cur->next;
>> +       }
>> +
>> +       if (cur && (h->keycmp(h, key, cur->key) == 0))
>> +               return HASHTAB_PRESENT;
>> +
>> +       newnode = (hashtab_ptr_t) malloc(sizeof(hashtab_node_t));
>> +       if (newnode == NULL)
>> +               return HASHTAB_OVERFLOW;
>> +       memset(newnode, 0, sizeof(struct hashtab_node));
>> +       newnode->key = key;
>> +       newnode->datum = datum;
>> +       if (prev) {
>> +               newnode->next = prev->next;
>> +               prev->next = newnode;
>> +       } else {
>> +               newnode->next = h->htable[hvalue];
>> +               h->htable[hvalue] = newnode;
>> +       }
>> +
>> +       h->nel++;
>> +       return HASHTAB_SUCCESS;
>> +}
>> +
>> +int hashtab_remove(hashtab_t h, hashtab_key_t key,
>> +                  void (*destroy) (hashtab_key_t k,
>> +                                   hashtab_datum_t d, void *args), void *args)
>> +{
>> +       unsigned int hvalue;
>> +       hashtab_ptr_t cur, last;
>> +
>> +       if (!h)
>> +               return HASHTAB_MISSING;
>> +
>> +       hvalue = h->hash_value(h, key);
>> +       last = NULL;
>> +       cur = h->htable[hvalue];
>> +       while (cur != NULL && h->keycmp(h, key, cur->key) > 0) {
>> +               last = cur;
>> +               cur = cur->next;
>> +       }
>> +
>> +       if (cur == NULL || (h->keycmp(h, key, cur->key) != 0))
>> +               return HASHTAB_MISSING;
>> +
>> +       if (last == NULL)
>> +               h->htable[hvalue] = cur->next;
>> +       else
>> +               last->next = cur->next;
>> +
>> +       if (destroy)
>> +               destroy(cur->key, cur->datum, args);
>> +       free(cur);
>> +       h->nel--;
>> +       return HASHTAB_SUCCESS;
>> +}
>> +
>> +hashtab_datum_t hashtab_search(hashtab_t h, const_hashtab_key_t key)
>> +{
>> +
>> +       unsigned int hvalue;
>> +       hashtab_ptr_t cur;
>> +
>> +       if (!h)
>> +               return NULL;
>> +
>> +       hvalue = h->hash_value(h, key);
>> +       cur = h->htable[hvalue];
>> +       while (cur != NULL && h->keycmp(h, key, cur->key) > 0)
>> +               cur = cur->next;
>> +
>> +       if (cur == NULL || (h->keycmp(h, key, cur->key) != 0))
>> +               return NULL;
>> +
>> +       return cur->datum;
>> +}
>> +
>> +void hashtab_destroy(hashtab_t h)
>> +{
>> +       unsigned int i;
>> +       hashtab_ptr_t cur, temp;
>> +
>> +       if (!h)
>> +               return;
>> +
>> +       for (i = 0; i < h->size; i++) {
>> +               cur = h->htable[i];
>> +               while (cur != NULL) {
>> +                       temp = cur;
>> +                       cur = cur->next;
>> +                       free(temp);
>> +               }
>> +               h->htable[i] = NULL;
>> +       }
>> +
>> +       free(h->htable);
>> +       h->htable = NULL;
>> +
>> +       free(h);
>> +}
>> +
>> +int hashtab_map(hashtab_t h,
>> +               int (*apply) (hashtab_key_t k,
>> +                             hashtab_datum_t d, void *args), void *args)
>> +{
>> +       unsigned int i;
>> +       hashtab_ptr_t cur;
>> +       int ret;
>> +
>> +       if (!h)
>> +               return HASHTAB_SUCCESS;
>> +
>> +       for (i = 0; i < h->size; i++) {
>> +               cur = h->htable[i];
>> +               while (cur != NULL) {
>> +                       ret = apply(cur->key, cur->datum, args);
>> +                       if (ret)
>> +                               return ret;
>> +                       cur = cur->next;
>> +               }
>> +       }
>> +       return HASHTAB_SUCCESS;
>> +}
>> +
>> +void hashtab_hash_eval(hashtab_t h, char *tag)
>> +{
>> +       unsigned int i;
>> +       int chain_len, slots_used, max_chain_len;
>> +       hashtab_ptr_t cur;
>> +
>> +       slots_used = 0;
>> +       max_chain_len = 0;
>> +       for (i = 0; i < h->size; i++) {
>> +               cur = h->htable[i];
>> +               if (cur) {
>> +                       slots_used++;
>> +                       chain_len = 0;
>> +                       while (cur) {
>> +                               chain_len++;
>> +                               cur = cur->next;
>> +                       }
>> +
>> +                       if (chain_len > max_chain_len)
>> +                               max_chain_len = chain_len;
>> +               }
>> +       }
>> +
>> +       printf
>> +           ("%s:  %d entries and %d/%d buckets used, longest chain length %d\n",
>> +            tag, h->nel, slots_used, h->size, max_chain_len);
>> +}
>> diff --git a/libselinux/src/hashtab.h b/libselinux/src/hashtab.h
>> new file mode 100644
>> index 00000000..092b96a9
>> --- /dev/null
>> +++ b/libselinux/src/hashtab.h
>> @@ -0,0 +1,115 @@
>> +
>> +/* Author : Stephen Smalley, <sds@tycho.nsa.gov> */
>> +
>> +/* FLASK */
>> +
>> +/*
>> + * A hash table (hashtab) maintains associations between
>> + * key values and datum values.  The type of the key values
>
> libselinux/src/hashtab.h:8: trailing whitespace.
> + * key values and datum values.  The type of the key values
>
>> + * and the type of the datum values is arbitrary.  The
>> + * functions for hash computation and key comparison are
>> + * provided by the creator of the table.
>> + */
>> +
>> +#ifndef _NEWROLE_HASHTAB_H_
>> +#define _NEWROLE_HASHTAB_H_
> _SELINUX_HASHTAB_H ?
> (or `#pragma once`, seems to be widely supported according to
> https://en.wikipedia.org/wiki/Pragma_once)
>
>> +
>> +#include <stdint.h>
>> +#include <errno.h>
>> +#include <stdio.h>
>> +
>> +typedef char *hashtab_key_t;   /* generic key type */
>> +typedef const char *const_hashtab_key_t;       /* constant generic key type */
>> +typedef void *hashtab_datum_t; /* generic datum type */
>> +
>> +typedef struct hashtab_node *hashtab_ptr_t;
>> +
>> +typedef struct hashtab_node {
>> +       hashtab_key_t key;
>> +       hashtab_datum_t datum;
>> +       hashtab_ptr_t next;
>> +} hashtab_node_t;
>> +
>> +typedef struct hashtab_val {
>> +       hashtab_ptr_t *htable;  /* hash table */
>> +       unsigned int size;      /* number of slots in hash table */
>> +       uint32_t nel;           /* number of elements in hash table */
>> +       unsigned int (*hash_value) (struct hashtab_val * h, const_hashtab_key_t key);   /* hash function */
>> +       int (*keycmp) (struct hashtab_val * h, const_hashtab_key_t key1, const_hashtab_key_t key2);     /* key comparison function */
>> +} hashtab_val_t;
>> +
>> +typedef hashtab_val_t *hashtab_t;
>> +
>> +/* Define status codes for hash table functions */
>> +#define HASHTAB_SUCCESS     0
>> +#define HASHTAB_OVERFLOW    -ENOMEM
>> +#define HASHTAB_PRESENT     -EEXIST
>> +#define HASHTAB_MISSING     -ENOENT
>> +
>> +/*
>> +   Creates a new hash table with the specified characteristics.
>> +
>> +   Returns NULL if insufficient space is available or
>> +   the new hash table otherwise.
>> + */
>> +extern hashtab_t hashtab_create(unsigned int (*hash_value) (hashtab_t h,
>> +                                                           const_hashtab_key_t
>> +                                                           key),
>> +                               int (*keycmp) (hashtab_t h,
>> +                                              const_hashtab_key_t key1,
>> +                                              const_hashtab_key_t key2),
>> +                               unsigned int size);
>> +/*
>> +   Inserts the specified (key, datum) pair into the specified hash table.
>> +
>> +   Returns HASHTAB_OVERFLOW if insufficient space is available or
>> +   HASHTAB_PRESENT  if there is already an entry with the same key or
>> +   HASHTAB_SUCCESS otherwise.
>> + */
>> +extern int hashtab_insert(hashtab_t h, hashtab_key_t k, hashtab_datum_t d);
>> +
>> +/*
>> +   Removes the entry with the specified key from the hash table.
>> +   Applies the specified destroy function to (key,datum,args) for
>> +   the entry.
>> +
>> +   Returns HASHTAB_MISSING if no entry has the specified key or
>> +   HASHTAB_SUCCESS otherwise.
>> + */
>> +extern int hashtab_remove(hashtab_t h, hashtab_key_t k,
>> +                         void (*destroy) (hashtab_key_t k,
>> +                                          hashtab_datum_t d,
>> +                                          void *args), void *args);
>> +
>> +/*
>> +   Searches for the entry with the specified key in the hash table.
>> +
>> +   Returns NULL if no entry has the specified key or
>> +   the datum of the entry otherwise.
>> + */
>> +extern hashtab_datum_t hashtab_search(hashtab_t h, const_hashtab_key_t k);
>> +
>> +/*
>> +   Destroys the specified hash table.
>> + */
>> +extern void hashtab_destroy(hashtab_t h);
>> +
>> +/*
>> +   Applies the specified apply function to (key,datum,args)
>> +   for each entry in the specified hash table.
>> +
>> +   The order in which the function is applied to the entries
>> +   is dependent upon the internal structure of the hash table.
>> +
>> +   If apply returns a non-zero status, then hashtab_map will cease
>> +   iterating through the hash table and will propagate the error
>> +   return to its caller.
>> + */
>> +extern int hashtab_map(hashtab_t h,
>> +                      int (*apply) (hashtab_key_t k,
>> +                                    hashtab_datum_t d,
>> +                                    void *args), void *args);
>> +
>> +extern void hashtab_hash_eval(hashtab_t h, char *tag);
>> +
>> +#endif
>> --
>> 2.12.3

Thanks for your review, _NEWROLE_HASHTAB_H_ and trailing whitespace are 
my oversights. I'll fix it in the next patch version.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] libselinux: performance optimization for duplicate detection
  2023-02-10 14:09   ` Christian Göttsche
@ 2023-02-11 12:50     ` wanghuizhao
  0 siblings, 0 replies; 15+ messages in thread
From: wanghuizhao @ 2023-02-11 12:50 UTC (permalink / raw)
  To: Christian Göttsche
  Cc: selinux, Xiaoming Ni, weiyuchen3, wangfangpeng1, chenjingwen6,
	dongxinhua


On 2023/2/10 22:09, Christian Göttsche wrote:
> On Thu, 9 Feb 2023 at 12:54, wanghuizhao <wanghuizhao1@huawei.com> wrote:
>> When semodule -i some.pp to install a module package, duplicate items are
>> detected for the module. The detection function is nodups_specs in
>> libselinux/src/label_file.c. The algorithm complexity of implementing
>> this function is O(N^2). In scenarios where N is very large, the efficiency
>> is very low.
>>
>> To solve this problem, I propose to use the hash table to detect duplicates.
>> The algorithm complexity of new implementing is O(N). The execution efficiency
>> will be greatly improved.
>>
>> Comparison between the execution time of the nodups_specs function.
>>
>> Old double-layer loop implementation O(N^2):
>>
>> semodule -i myapp1.pp
>> nodups_specs data->nspec: 5002
>> nodups_specs start: 11785.242s
>> nodups_specs end:   11785.588s
>> nodups_specs consumes:  0.346s
>>
>> semodule -i myapp2.pp
>> nodups_specs data->nspec: 10002
>> nodups_specs start: 11804.280s
>> nodups_specs end:   11806.546s
>> nodups_specs consumes:  2.266s
>>
>> semodule -i myapp3.pp
>> nodups_specs data->nspec: 20002
>> nodups_specs start: 11819.106s
>> nodups_specs end:   11830.892s
>> nodups_specs consumes: 11.786s
>>
>> New hash table implementation O(N):
>>
>> semodule -i myapp1.pp
>> nodups_specs data->nspec: 5002
>> nodups_specs start: 11785.588s
>> nodups_specs end:   11785.590s
>> nodups_specs consumes:  0.002s
>>
>> semodule -i myapp2.pp
>> nodups_specs data->nspec: 10002
>> nodups_specs start: 11806.546s
>> nodups_specs end:   11806.552s
>> nodups_specs consumes:  0.006s
>>
>> semodule -i myapp3.pp
>> nodups_specs data->nspec: 20002
>> nodups_specs start: 11830.892s
>> nodups_specs end:   11830.905s
>> nodups_specs consumes:  0.013s
>>
>> Signed-off-by: wanghuizhao <wanghuizhao1@huawei.com>
>> ---
>>   libselinux/src/label_file.c | 112 ++++++++++++++++++++++++++++++++++----------
>>   libselinux/src/label_file.h |   5 ++
>>   2 files changed, 93 insertions(+), 24 deletions(-)
>>
>> diff --git a/libselinux/src/label_file.c b/libselinux/src/label_file.c
>> index 74ae9b9f..e4a85043 100644
>> --- a/libselinux/src/label_file.c
>> +++ b/libselinux/src/label_file.c
>> @@ -19,6 +19,7 @@
>>   #include <sys/types.h>
>>   #include <sys/stat.h>
>>
>> +#include "hashtab.h"
>>   #include "callbacks.h"
>>   #include "label_internal.h"
>>   #include "label_file.h"
>> @@ -57,40 +58,103 @@ static int find_stem_from_file(struct saved_data *data, const char *key)
>>   }
>>
>>   /*
>> + * hash calculation and key comparison of hash table
>> + */
>> +
>> +static unsigned int symhash(hashtab_t h, const_hashtab_key_t key)
>> +{
>> +       const struct chkdups_key *k = (const struct chkdups_key *)key;
>> +       const char *p = NULL;
>> +       size_t size;
>> +       unsigned int val = 0;
>> +
>> +       size = strlen(k->regex);
>> +       for (p = k->regex; ((size_t) (p - k->regex)) < size; p++)
>> +               val =
>> +                       (val << 4 | (val >> (8 * sizeof(unsigned int) - 4)) +
>> +                       k->mode) ^ (*p);
> label_file.c: In function ‘symhash’:
> label_file.c:74:77: error: suggest parentheses around arithmetic in
> operand of ‘|’ [-Werror=parentheses]
>     74 |                         (val << 4 | (val >> (8 *
> sizeof(unsigned int) - 4)) +
>        |
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
>     75 |                         k->mode) ^ (*p);
>        |                         ~~~~~~~

Thanks for your review, I don't have this error on my GCC 7.3.0. Maybe I 
lost some

flags during compilation. After modifying the code, I'll add these flags 
to compile

and verify again.

>> +       return val % h->size;
>> +}
>> +
>> +static int symcmp(hashtab_t h
>> +                 __attribute__ ((unused)), const_hashtab_key_t key1,
>> +                 const_hashtab_key_t key2)
>> +{
>> +       const struct chkdups_key *a = (const struct chkdups_key *)key1;
>> +       const struct chkdups_key *b = (const struct chkdups_key *)key2;
>> +
>> +       return strcmp(a->regex, b->regex) || (a->mode && b->mode && a->mode != b->mode);
>> +}
>> +
>> +/*
>>    * Warn about duplicate specifications.
>>    */
>>   static int nodups_specs(struct saved_data *data, const char *path)
>>   {
>> -       int rc = 0;
>> -       unsigned int ii, jj;
>> +       int rc = 0, ret = 0;
>> +       unsigned int ii;
>>          struct spec *curr_spec, *spec_arr = data->spec_arr;
>> +       struct chkdups_key *new = NULL;
>> +       unsigned int hashtab_len = (data->nspec / 10) ? data->nspec / 10 : 1;
>> +       hashtab_ptr_t cur, temp;
>>
>> +       hashtab_t hash_table = hashtab_create(symhash, symcmp, data->nspec);
>> +       if (hash_table == NULL) {
>> +               rc = -1;
>> +               COMPAT_LOG(SELINUX_ERROR, "%s: hashtab create failed.\n", path);
>> +               return rc;
>> +       }
>>          for (ii = 0; ii < data->nspec; ii++) {
>> -               curr_spec = &spec_arr[ii];
>> -               for (jj = ii + 1; jj < data->nspec; jj++) {
>> -                       if ((!strcmp(spec_arr[jj].regex_str,
>> -                               curr_spec->regex_str))
>> -                           && (!spec_arr[jj].mode || !curr_spec->mode
>> -                               || spec_arr[jj].mode == curr_spec->mode)) {
>> -                               rc = -1;
>> -                               errno = EINVAL;
>> -                               if (strcmp(spec_arr[jj].lr.ctx_raw,
>> -                                           curr_spec->lr.ctx_raw)) {
>> -                                       COMPAT_LOG
>> -                                               (SELINUX_ERROR,
>> -                                                "%s: Multiple different specifications for %s  (%s and %s).\n",
>> -                                                path, curr_spec->regex_str,
>> -                                                spec_arr[jj].lr.ctx_raw,
>> -                                                curr_spec->lr.ctx_raw);
>> -                               } else {
>> -                                       COMPAT_LOG
>> -                                               (SELINUX_ERROR,
>> -                                                "%s: Multiple same specifications for %s.\n",
>> -                                                path, curr_spec->regex_str);
>> -                               }
>> +               new = (struct chkdups_key *)malloc(sizeof(struct chkdups_key));
> oom check missing


Thanks for your review, I'll fix it in the next patch version

>> +               new->regex = spec_arr[ii].regex_str;
>> +               new->mode = spec_arr[ii].mode;
>> +               ret = hashtab_insert(hash_table, (hashtab_key_t)new, &spec_arr[ii]);
>> +               if (ret == HASHTAB_SUCCESS)
>> +                       continue;
>> +               if (ret == HASHTAB_PRESENT) {
>> +                       curr_spec =
>> +                               (struct spec *)hashtab_search(hash_table, (hashtab_key_t)new);
>> +                       rc = -1;
>> +                       errno = EINVAL;
>> +                       if (strcmp(spec_arr[ii].lr.ctx_raw, curr_spec->lr.ctx_raw)) {
>> +                               COMPAT_LOG
>> +                                       (SELINUX_ERROR,
>> +                                        "%s: Multiple different specifications for %s  (%s and %s).\n",
>> +                                        path, curr_spec->regex_str,
>> +                                        spec_arr[ii].lr.ctx_raw,
>> +                                        curr_spec->lr.ctx_raw);
>> +                       } else {
>> +                               COMPAT_LOG
>> +                                       (SELINUX_ERROR,
>> +                                        "%s: Multiple same specifications for %s.\n",
>> +                                        path, curr_spec->regex_str);
>>                          }
> `new` leaking


Thanks for your review, I'll fix it in the next patch version

>>                  }
>> +               if (ret == HASHTAB_OVERFLOW) {
>> +                       rc = -1;
>> +                       COMPAT_LOG
>> +                               (SELINUX_ERROR,
>> +                               "%s: hashtab happen memory error.\n",
>> +                               path);
>> +                       break;
> `new` leaking


Thanks for your review, I'll fix it in the next patch version

>> +               }
>> +       }
>> +
>> +       for (ii = 0; ii < hashtab_len; ii++) {
>> +               cur = hash_table->htable[ii];
>> +               while (cur != NULL) {
>> +                       temp = cur;
>> +                       cur = cur->next;
>> +                       free(temp->key);
>> +                       free(temp);
>> +               }
>> +               hash_table->htable[ii] = NULL;
>>          }
> The common way of destroying hash-tables is hashtab_destroy().
> Since the keys need to be free'd as well `hashtab_map(hash_table,
> key_destroy, NULL)` with a custom key_destroy function can be used.
> (To avoid iterating the hash-table twice hashtab_destroy() could be
> modified to take an optional key destroy callback.)


Thanks for your comment, I was hesitant to modify some of the logic of 
hashtab_destroy().

Maybe I should create a new hashtab_destroy_key() function. Let this 
function have the

optional key destroy callback in its arguments.


A more direct approach is to modify the hashtab_destroy(). However, this 
makes the interface

of hashtab_destroy() inconsistent with that of other hashtab.c files.

>> +
>> +       free(hash_table->htable);
>> +       hash_table->htable = NULL;
>> +       free(hash_table);
>> +
>>          return rc;
>>   }
>>
>> diff --git a/libselinux/src/label_file.h b/libselinux/src/label_file.h
>> index 190bc175..ad79319e 100644
>> --- a/libselinux/src/label_file.h
>> +++ b/libselinux/src/label_file.h
>> @@ -35,6 +35,11 @@
>>   /* Required selinux_restorecon and selabel_get_digests_all_partial_matches() */
>>   #define RESTORECON_PARTIAL_MATCH_DIGEST  "security.sehash"
>>
>> +struct chkdups_key {
>> +       char *regex;
>> +       unsigned int mode;
>> +};
> Why declare in the header and not in the source file?


Thanks for your comment, I habitually define structs in header files. 
After thinking

about it, I think that this structure is only used in this one place. 
There are no other

source file references, and it should be more appropriate to put in the 
source file.

I'll fix it in the next patch version

>> +
>>   struct selabel_sub {
>>          char *src;
>>          int slen;
>> --
>> 2.12.3

Thanks

wanghuizhao


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 0/3] Improve efficiency of detecting duplicate in libselinux
  2023-02-09 11:42 [PATCH 0/2] Improve efficiency of detecting duplicate in libselinux wanghuizhao
  2023-02-09 11:42 ` [PATCH 1/2] libselinux: migrating hashtab from policycoreutils wanghuizhao
  2023-02-09 11:42 ` [PATCH 2/2] libselinux: performance optimization for duplicate detection wanghuizhao
@ 2023-02-17  8:44 ` wanghuizhao
  2023-02-17  8:44   ` [PATCH v2 1/3] libselinux: migrating hashtab from policycoreutils wanghuizhao
                     ` (2 more replies)
  2 siblings, 3 replies; 15+ messages in thread
From: wanghuizhao @ 2023-02-17  8:44 UTC (permalink / raw)
  To: selinux, nixiaoming, chenjingwen6, dongxinhua
  Cc: lautrbach, jwcart2, jason, cgzones, wangfangpeng1, weiyuchen3,
	wanghuizhao1

changes in v2:
  1. add hashtab_destroy_key function in the hashtab.c and declaration in
     hashtab.h. Replace the original resource release statement with
     hashtab_destroy_key function.
  2. change macro definition to _SELINUX_HASHTAB_H_ in the hashtab.h
  3. fix build error:
     label_file.c:74:77: error: suggest parentheses around arithmetic in
     operand of ‘|’ [-Werror=parentheses]
     74 | (val << 4 | (val >> (8 * sizeof(unsigned int) - 4)) +
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
     75 |                         k->mode) ^ (*p);
        |                         ~~~~~~~
  4. fix memory leak and add oom check in label_file.c
  5. fix length parameter of hashtab_create function to hashtab_len
  6. move struct chkdups_key to label_file.c

v1: https://lore.kernel.org/selinux/20230209114253.120485-1-wanghuizhao1@huawei.com/

wanghuizhao (3):
  libselinux: migrating hashtab from policycoreutils
  libselinux: adapting hashtab to libselinux
  libselinux: performance optimization for duplicate detection

 libselinux/src/hashtab.c    | 234 ++++++++++++++++++++++++++++++++++++++++++++
 libselinux/src/hashtab.h    | 117 ++++++++++++++++++++++
 libselinux/src/label_file.c | 118 +++++++++++++++++-----
 3 files changed, 445 insertions(+), 24 deletions(-)
 create mode 100644 libselinux/src/hashtab.c
 create mode 100644 libselinux/src/hashtab.h

-- 
2.12.3


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/3] libselinux: migrating hashtab from policycoreutils
  2023-02-17  8:44 ` [PATCH v2 0/3] Improve efficiency of detecting duplicate in libselinux wanghuizhao
@ 2023-02-17  8:44   ` wanghuizhao
  2023-02-17  8:44   ` [PATCH v2 2/3] libselinux: adapting hashtab to libselinux wanghuizhao
  2023-02-17  8:44   ` [PATCH v2 3/3] libselinux: performance optimization for duplicate detection wanghuizhao
  2 siblings, 0 replies; 15+ messages in thread
From: wanghuizhao @ 2023-02-17  8:44 UTC (permalink / raw)
  To: selinux, nixiaoming, chenjingwen6, dongxinhua
  Cc: lautrbach, jwcart2, jason, cgzones, wangfangpeng1, weiyuchen3,
	wanghuizhao1

To use hashtab in libselinux, migrate the existing hashtab template
from policycoreutils/newrole to libselinux.

Signed-off-by: wanghuizhao <wanghuizhao1@huawei.com>
---
 libselinux/src/hashtab.c | 208 +++++++++++++++++++++++++++++++++++++++++++++++
 libselinux/src/hashtab.h | 115 ++++++++++++++++++++++++++
 2 files changed, 323 insertions(+)
 create mode 100644 libselinux/src/hashtab.c
 create mode 100644 libselinux/src/hashtab.h

diff --git a/libselinux/src/hashtab.c b/libselinux/src/hashtab.c
new file mode 100644
index 00000000..26d4f4c7
--- /dev/null
+++ b/libselinux/src/hashtab.c
@@ -0,0 +1,208 @@
+
+/* Author : Stephen Smalley, <sds@tycho.nsa.gov> */
+
+/* FLASK */
+
+/*
+ * Implementation of the hash table type.
+ */
+
+#include <stdlib.h>
+#include <string.h>
+#include "hashtab.h"
+
+hashtab_t hashtab_create(unsigned int (*hash_value) (hashtab_t h,
+						     const_hashtab_key_t key),
+			 int (*keycmp) (hashtab_t h,
+					const_hashtab_key_t key1,
+					const_hashtab_key_t key2),
+			 unsigned int size)
+{
+
+	hashtab_t p;
+	unsigned int i;
+
+	p = (hashtab_t) malloc(sizeof(hashtab_val_t));
+	if (p == NULL)
+		return p;
+
+	memset(p, 0, sizeof(hashtab_val_t));
+	p->size = size;
+	p->nel = 0;
+	p->hash_value = hash_value;
+	p->keycmp = keycmp;
+	p->htable = (hashtab_ptr_t *) malloc(sizeof(hashtab_ptr_t) * size);
+	if (p->htable == NULL) {
+		free(p);
+		return NULL;
+	}
+	for (i = 0; i < size; i++)
+		p->htable[i] = (hashtab_ptr_t) NULL;
+
+	return p;
+}
+
+int hashtab_insert(hashtab_t h, hashtab_key_t key, hashtab_datum_t datum)
+{
+	unsigned int hvalue;
+	hashtab_ptr_t prev, cur, newnode;
+
+	if (!h)
+		return HASHTAB_OVERFLOW;
+
+	hvalue = h->hash_value(h, key);
+	prev = NULL;
+	cur = h->htable[hvalue];
+	while (cur && h->keycmp(h, key, cur->key) > 0) {
+		prev = cur;
+		cur = cur->next;
+	}
+
+	if (cur && (h->keycmp(h, key, cur->key) == 0))
+		return HASHTAB_PRESENT;
+
+	newnode = (hashtab_ptr_t) malloc(sizeof(hashtab_node_t));
+	if (newnode == NULL)
+		return HASHTAB_OVERFLOW;
+	memset(newnode, 0, sizeof(struct hashtab_node));
+	newnode->key = key;
+	newnode->datum = datum;
+	if (prev) {
+		newnode->next = prev->next;
+		prev->next = newnode;
+	} else {
+		newnode->next = h->htable[hvalue];
+		h->htable[hvalue] = newnode;
+	}
+
+	h->nel++;
+	return HASHTAB_SUCCESS;
+}
+
+int hashtab_remove(hashtab_t h, hashtab_key_t key,
+		   void (*destroy) (hashtab_key_t k,
+				    hashtab_datum_t d, void *args), void *args)
+{
+	unsigned int hvalue;
+	hashtab_ptr_t cur, last;
+
+	if (!h)
+		return HASHTAB_MISSING;
+
+	hvalue = h->hash_value(h, key);
+	last = NULL;
+	cur = h->htable[hvalue];
+	while (cur != NULL && h->keycmp(h, key, cur->key) > 0) {
+		last = cur;
+		cur = cur->next;
+	}
+
+	if (cur == NULL || (h->keycmp(h, key, cur->key) != 0))
+		return HASHTAB_MISSING;
+
+	if (last == NULL)
+		h->htable[hvalue] = cur->next;
+	else
+		last->next = cur->next;
+
+	if (destroy)
+		destroy(cur->key, cur->datum, args);
+	free(cur);
+	h->nel--;
+	return HASHTAB_SUCCESS;
+}
+
+hashtab_datum_t hashtab_search(hashtab_t h, const_hashtab_key_t key)
+{
+
+	unsigned int hvalue;
+	hashtab_ptr_t cur;
+
+	if (!h)
+		return NULL;
+
+	hvalue = h->hash_value(h, key);
+	cur = h->htable[hvalue];
+	while (cur != NULL && h->keycmp(h, key, cur->key) > 0)
+		cur = cur->next;
+
+	if (cur == NULL || (h->keycmp(h, key, cur->key) != 0))
+		return NULL;
+
+	return cur->datum;
+}
+
+void hashtab_destroy(hashtab_t h)
+{
+	unsigned int i;
+	hashtab_ptr_t cur, temp;
+
+	if (!h)
+		return;
+
+	for (i = 0; i < h->size; i++) {
+		cur = h->htable[i];
+		while (cur != NULL) {
+			temp = cur;
+			cur = cur->next;
+			free(temp);
+		}
+		h->htable[i] = NULL;
+	}
+
+	free(h->htable);
+	h->htable = NULL;
+
+	free(h);
+}
+
+int hashtab_map(hashtab_t h,
+		int (*apply) (hashtab_key_t k,
+			      hashtab_datum_t d, void *args), void *args)
+{
+	unsigned int i;
+	hashtab_ptr_t cur;
+	int ret;
+
+	if (!h)
+		return HASHTAB_SUCCESS;
+
+	for (i = 0; i < h->size; i++) {
+		cur = h->htable[i];
+		while (cur != NULL) {
+			ret = apply(cur->key, cur->datum, args);
+			if (ret)
+				return ret;
+			cur = cur->next;
+		}
+	}
+	return HASHTAB_SUCCESS;
+}
+
+void hashtab_hash_eval(hashtab_t h, char *tag)
+{
+	unsigned int i;
+	int chain_len, slots_used, max_chain_len;
+	hashtab_ptr_t cur;
+
+	slots_used = 0;
+	max_chain_len = 0;
+	for (i = 0; i < h->size; i++) {
+		cur = h->htable[i];
+		if (cur) {
+			slots_used++;
+			chain_len = 0;
+			while (cur) {
+				chain_len++;
+				cur = cur->next;
+			}
+
+			if (chain_len > max_chain_len)
+				max_chain_len = chain_len;
+		}
+	}
+
+	printf
+	    ("%s:  %d entries and %d/%d buckets used, longest chain length %d\n",
+	     tag, h->nel, slots_used, h->size, max_chain_len);
+}
diff --git a/libselinux/src/hashtab.h b/libselinux/src/hashtab.h
new file mode 100644
index 00000000..78471269
--- /dev/null
+++ b/libselinux/src/hashtab.h
@@ -0,0 +1,115 @@
+
+/* Author : Stephen Smalley, <sds@tycho.nsa.gov> */
+
+/* FLASK */
+
+/*
+ * A hash table (hashtab) maintains associations between
+ * key values and datum values.  The type of the key values
+ * and the type of the datum values is arbitrary.  The
+ * functions for hash computation and key comparison are
+ * provided by the creator of the table.
+ */
+
+#ifndef _NEWROLE_HASHTAB_H_
+#define _NEWROLE_HASHTAB_H_
+
+#include <stdint.h>
+#include <errno.h>
+#include <stdio.h>
+
+typedef char *hashtab_key_t;	/* generic key type */
+typedef const char *const_hashtab_key_t;	/* constant generic key type */
+typedef void *hashtab_datum_t;	/* generic datum type */
+
+typedef struct hashtab_node *hashtab_ptr_t;
+
+typedef struct hashtab_node {
+	hashtab_key_t key;
+	hashtab_datum_t datum;
+	hashtab_ptr_t next;
+} hashtab_node_t;
+
+typedef struct hashtab_val {
+	hashtab_ptr_t *htable;	/* hash table */
+	unsigned int size;	/* number of slots in hash table */
+	uint32_t nel;		/* number of elements in hash table */
+	unsigned int (*hash_value) (struct hashtab_val * h, const_hashtab_key_t key);	/* hash function */
+	int (*keycmp) (struct hashtab_val * h, const_hashtab_key_t key1, const_hashtab_key_t key2);	/* key comparison function */
+} hashtab_val_t;
+
+typedef hashtab_val_t *hashtab_t;
+
+/* Define status codes for hash table functions */
+#define HASHTAB_SUCCESS     0
+#define HASHTAB_OVERFLOW    -ENOMEM
+#define HASHTAB_PRESENT     -EEXIST
+#define HASHTAB_MISSING     -ENOENT
+
+/*
+   Creates a new hash table with the specified characteristics.
+
+   Returns NULL if insufficient space is available or
+   the new hash table otherwise.
+ */
+extern hashtab_t hashtab_create(unsigned int (*hash_value) (hashtab_t h,
+							    const_hashtab_key_t
+							    key),
+				int (*keycmp) (hashtab_t h,
+					       const_hashtab_key_t key1,
+					       const_hashtab_key_t key2),
+				unsigned int size);
+/*
+   Inserts the specified (key, datum) pair into the specified hash table.
+
+   Returns HASHTAB_OVERFLOW if insufficient space is available or
+   HASHTAB_PRESENT  if there is already an entry with the same key or
+   HASHTAB_SUCCESS otherwise.
+ */
+extern int hashtab_insert(hashtab_t h, hashtab_key_t k, hashtab_datum_t d);
+
+/*
+   Removes the entry with the specified key from the hash table.
+   Applies the specified destroy function to (key,datum,args) for
+   the entry.
+
+   Returns HASHTAB_MISSING if no entry has the specified key or
+   HASHTAB_SUCCESS otherwise.
+ */
+extern int hashtab_remove(hashtab_t h, hashtab_key_t k,
+			  void (*destroy) (hashtab_key_t k,
+					   hashtab_datum_t d,
+					   void *args), void *args);
+
+/*
+   Searches for the entry with the specified key in the hash table.
+
+   Returns NULL if no entry has the specified key or
+   the datum of the entry otherwise.
+ */
+extern hashtab_datum_t hashtab_search(hashtab_t h, const_hashtab_key_t k);
+
+/*
+   Destroys the specified hash table.
+ */
+extern void hashtab_destroy(hashtab_t h);
+
+/*
+   Applies the specified apply function to (key,datum,args)
+   for each entry in the specified hash table.
+
+   The order in which the function is applied to the entries
+   is dependent upon the internal structure of the hash table.
+
+   If apply returns a non-zero status, then hashtab_map will cease
+   iterating through the hash table and will propagate the error
+   return to its caller.
+ */
+extern int hashtab_map(hashtab_t h,
+		       int (*apply) (hashtab_key_t k,
+				     hashtab_datum_t d,
+				     void *args), void *args);
+
+extern void hashtab_hash_eval(hashtab_t h, char *tag);
+
+#endif
-- 
2.12.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/3] libselinux: adapting hashtab to libselinux
  2023-02-17  8:44 ` [PATCH v2 0/3] Improve efficiency of detecting duplicate in libselinux wanghuizhao
  2023-02-17  8:44   ` [PATCH v2 1/3] libselinux: migrating hashtab from policycoreutils wanghuizhao
@ 2023-02-17  8:44   ` wanghuizhao
  2023-02-17  8:44   ` [PATCH v2 3/3] libselinux: performance optimization for duplicate detection wanghuizhao
  2 siblings, 0 replies; 15+ messages in thread
From: wanghuizhao @ 2023-02-17  8:44 UTC (permalink / raw)
  To: selinux, nixiaoming, chenjingwen6, dongxinhua
  Cc: lautrbach, jwcart2, jason, cgzones, wangfangpeng1, weiyuchen3,
	wanghuizhao1

To adapt to the scenarios of libselinux, this patch does three things:

1. Add a new function hashtab_destroy_key. This function is used to
   reclaim memory using the customized key destruction method.

2. Changed the macro definition to _SELINUX_HASHTAB_H_.

3. Add a function declaration to the header file.

Signed-off-by: wanghuizhao <wanghuizhao1@huawei.com>
---
 libselinux/src/hashtab.c | 26 ++++++++++++++++++++++++++
 libselinux/src/hashtab.h |  6 ++++--
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/libselinux/src/hashtab.c b/libselinux/src/hashtab.c
index 26d4f4c7..c415ad0d 100644
--- a/libselinux/src/hashtab.c
+++ b/libselinux/src/hashtab.c
@@ -156,6 +156,32 @@ void hashtab_destroy(hashtab_t h)
 	free(h);
 }
 
+void hashtab_destroy_key(hashtab_t h,
+		int (*destroy_key) (hashtab_key_t k))
+{
+	unsigned int i;
+	hashtab_ptr_t cur, temp;
+
+	if (!h)
+		return;
+
+	for (i = 0; i < h->size; i++) {
+		cur = h->htable[i];
+		while (cur != NULL) {
+			temp = cur;
+			cur = cur->next;
+			destroy_key(temp->key);
+			free(temp);
+		}
+		h->htable[i] = NULL;
+	}
+
+	free(h->htable);
+	h->htable = NULL;
+
+	free(h);
+}
+
 int hashtab_map(hashtab_t h,
 		int (*apply) (hashtab_key_t k,
 			      hashtab_datum_t d, void *args), void *args)
diff --git a/libselinux/src/hashtab.h b/libselinux/src/hashtab.h
index 78471269..9d2b593b 100644
--- a/libselinux/src/hashtab.h
+++ b/libselinux/src/hashtab.h
@@ -11,8 +11,8 @@
  * provided by the creator of the table.
  */
 
-#ifndef _NEWROLE_HASHTAB_H_
-#define _NEWROLE_HASHTAB_H_
+#ifndef _SELINUX_HASHTAB_H_
+#define _SELINUX_HASHTAB_H_
 
 #include <stdint.h>
 #include <errno.h>
@@ -93,6 +93,8 @@ extern hashtab_datum_t hashtab_search(hashtab_t h, const_hashtab_key_t k);
    Destroys the specified hash table.
  */
 extern void hashtab_destroy(hashtab_t h);
+extern void hashtab_destroy_key(hashtab_t h,
+			int (*destroy_key) (hashtab_key_t k));
 
 /*
    Applies the specified apply function to (key,datum,args)
-- 
2.12.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/3] libselinux: performance optimization for duplicate detection
  2023-02-17  8:44 ` [PATCH v2 0/3] Improve efficiency of detecting duplicate in libselinux wanghuizhao
  2023-02-17  8:44   ` [PATCH v2 1/3] libselinux: migrating hashtab from policycoreutils wanghuizhao
  2023-02-17  8:44   ` [PATCH v2 2/3] libselinux: adapting hashtab to libselinux wanghuizhao
@ 2023-02-17  8:44   ` wanghuizhao
  2023-02-23  7:42     ` wanghuizhao
  2023-02-24 12:00     ` Christian Göttsche
  2 siblings, 2 replies; 15+ messages in thread
From: wanghuizhao @ 2023-02-17  8:44 UTC (permalink / raw)
  To: selinux, nixiaoming, chenjingwen6, dongxinhua
  Cc: lautrbach, jwcart2, jason, cgzones, wangfangpeng1, weiyuchen3,
	wanghuizhao1

When semodule -i some.pp to install a module package, duplicate items are
detected for the module. The detection function is nodups_specs in
libselinux/src/label_file.c. The algorithm complexity of implementing
this function is O(N^2). In scenarios where N is very large, the efficiency
is very low.

To solve this problem, I propose to use the hash table to detect duplicates.
The algorithm complexity of new implementing is O(N). The execution efficiency
will be greatly improved.

Comparison between the execution time of the nodups_specs function.

Old double-layer loop implementation O(N^2):

semodule -i myapp1.pp
nodups_specs data->nspec: 5002
nodups_specs start: 11785.242s
nodups_specs end:   11785.588s
nodups_specs consumes:  0.346s

semodule -i myapp2.pp
nodups_specs data->nspec: 10002
nodups_specs start: 11804.280s
nodups_specs end:   11806.546s
nodups_specs consumes:  2.266s

semodule -i myapp3.pp
nodups_specs data->nspec: 20002
nodups_specs start: 11819.106s
nodups_specs end:   11830.892s
nodups_specs consumes: 11.786s

New hash table implementation O(N):

semodule -i myapp1.pp
nodups_specs data->nspec: 5002
nodups_specs start: 11785.588s
nodups_specs end:   11785.590s
nodups_specs consumes:  0.002s

semodule -i myapp2.pp
nodups_specs data->nspec: 10002
nodups_specs start: 11806.546s
nodups_specs end:   11806.552s
nodups_specs consumes:  0.006s

semodule -i myapp3.pp
nodups_specs data->nspec: 20002
nodups_specs start: 11830.892s
nodups_specs end:   11830.905s
nodups_specs consumes:  0.013s

Signed-off-by: wanghuizhao <wanghuizhao1@huawei.com>
---
 libselinux/src/label_file.c | 118 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 94 insertions(+), 24 deletions(-)

diff --git a/libselinux/src/label_file.c b/libselinux/src/label_file.c
index 74ae9b9f..eebf9665 100644
--- a/libselinux/src/label_file.c
+++ b/libselinux/src/label_file.c
@@ -19,10 +19,17 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 
+#include "hashtab.h"
 #include "callbacks.h"
 #include "label_internal.h"
 #include "label_file.h"
 
+
+struct chkdups_key {
+	char *regex;
+	unsigned int mode;
+};
+
 /*
  * Internals, mostly moved over from matchpathcon.c
  */
@@ -57,40 +64,103 @@ static int find_stem_from_file(struct saved_data *data, const char *key)
 }
 
 /*
+ * hash calculation and key comparison of hash table
+ */
+
+static unsigned int symhash(hashtab_t h, const_hashtab_key_t key)
+{
+	const struct chkdups_key *k = (const struct chkdups_key *)key;
+	const char *p = NULL;
+	size_t size;
+	unsigned int val = 0;
+
+	size = strlen(k->regex);
+	for (p = k->regex; ((size_t) (p - k->regex)) < size; p++)
+		val =
+			((val << 4) | ((val >> (8 * sizeof(unsigned int) - 4)) +
+			k->mode)) ^ (*p);
+	return val % h->size;
+}
+
+static int symcmp(hashtab_t h
+		  __attribute__ ((unused)), const_hashtab_key_t key1,
+		  const_hashtab_key_t key2)
+{
+	const struct chkdups_key *a = (const struct chkdups_key *)key1;
+	const struct chkdups_key *b = (const struct chkdups_key *)key2;
+
+	return strcmp(a->regex, b->regex) || (a->mode && b->mode && a->mode != b->mode);
+}
+
+static int destroy_chkdups_key(hashtab_key_t key)
+{
+	free(key);
+
+	return 0;
+}
+
+/*
  * Warn about duplicate specifications.
  */
 static int nodups_specs(struct saved_data *data, const char *path)
 {
-	int rc = 0;
-	unsigned int ii, jj;
+	int rc = 0, ret = 0;
+	unsigned int ii;
 	struct spec *curr_spec, *spec_arr = data->spec_arr;
+	struct chkdups_key *new = NULL;
+	unsigned int hashtab_len = (data->nspec / 10) ? data->nspec / 10 : 1;
 
+	hashtab_t hash_table = hashtab_create(symhash, symcmp, hashtab_len);
+	if (!hash_table) {
+		rc = -1;
+		COMPAT_LOG(SELINUX_ERROR, "%s: hashtab create failed.\n", path);
+		return rc;
+	}
 	for (ii = 0; ii < data->nspec; ii++) {
-		curr_spec = &spec_arr[ii];
-		for (jj = ii + 1; jj < data->nspec; jj++) {
-			if ((!strcmp(spec_arr[jj].regex_str,
-				curr_spec->regex_str))
-			    && (!spec_arr[jj].mode || !curr_spec->mode
-				|| spec_arr[jj].mode == curr_spec->mode)) {
-				rc = -1;
-				errno = EINVAL;
-				if (strcmp(spec_arr[jj].lr.ctx_raw,
-					    curr_spec->lr.ctx_raw)) {
-					COMPAT_LOG
-						(SELINUX_ERROR,
-						 "%s: Multiple different specifications for %s  (%s and %s).\n",
-						 path, curr_spec->regex_str,
-						 spec_arr[jj].lr.ctx_raw,
-						 curr_spec->lr.ctx_raw);
-				} else {
-					COMPAT_LOG
-						(SELINUX_ERROR,
-						 "%s: Multiple same specifications for %s.\n",
-						 path, curr_spec->regex_str);
-				}
+		new = (struct chkdups_key *)malloc(sizeof(struct chkdups_key));
+		if (!new) {
+			rc = -1;
+			COMPAT_LOG(SELINUX_ERROR, "%s: hashtab key create failed.\n", path);
+			return rc;
+		}
+		new->regex = spec_arr[ii].regex_str;
+		new->mode = spec_arr[ii].mode;
+		ret = hashtab_insert(hash_table, (hashtab_key_t)new, &spec_arr[ii]);
+		if (ret == HASHTAB_SUCCESS)
+			continue;
+		if (ret == HASHTAB_PRESENT) {
+			curr_spec =
+				(struct spec *)hashtab_search(hash_table, (hashtab_key_t)new);
+			rc = -1;
+			errno = EINVAL;
+			free(new);
+			if (strcmp(spec_arr[ii].lr.ctx_raw, curr_spec->lr.ctx_raw)) {
+				COMPAT_LOG
+					(SELINUX_ERROR,
+					 "%s: Multiple different specifications for %s  (%s and %s).\n",
+					 path, curr_spec->regex_str,
+					 spec_arr[ii].lr.ctx_raw,
+					 curr_spec->lr.ctx_raw);
+			} else {
+				COMPAT_LOG
+					(SELINUX_ERROR,
+					 "%s: Multiple same specifications for %s.\n",
+					 path, curr_spec->regex_str);
 			}
 		}
+		if (ret == HASHTAB_OVERFLOW) {
+			rc = -1;
+			free(new);
+			COMPAT_LOG
+				(SELINUX_ERROR,
+				"%s: hashtab happen memory error.\n",
+				path);
+			break;
+		}
 	}
+
+	hashtab_destroy_key(hash_table, destroy_chkdups_key);
+
 	return rc;
 }
 
-- 
2.12.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] libselinux: performance optimization for duplicate detection
  2023-02-17  8:44   ` [PATCH v2 3/3] libselinux: performance optimization for duplicate detection wanghuizhao
@ 2023-02-23  7:42     ` wanghuizhao
  2023-02-24 12:00     ` Christian Göttsche
  1 sibling, 0 replies; 15+ messages in thread
From: wanghuizhao @ 2023-02-23  7:42 UTC (permalink / raw)
  To: selinux; +Cc: lautrbach, jwcart2, jason, cgzones, weiyuchen3


On 2023/2/17 16:44, wanghuizhao wrote:
> diff --git a/libselinux/src/label_file.c b/libselinux/src/label_file.c
> index 74ae9b9f..eebf9665 100644
> --- a/libselinux/src/label_file.c
> +++ b/libselinux/src/label_file.c
> +
> +/*
>    * Warn about duplicate specifications.
>    */
>   static int nodups_specs(struct saved_data *data, const char *path)
>   {
> -	int rc = 0;
> -	unsigned int ii, jj;
> +	int rc = 0, ret = 0;
> +	unsigned int ii;
>   	struct spec *curr_spec, *spec_arr = data->spec_arr;
> +	struct chkdups_key *new = NULL;
> +	unsigned int hashtab_len = (data->nspec / 10) ? data->nspec / 10 : 1;
>   
> +	hashtab_t hash_table = hashtab_create(symhash, symcmp, hashtab_len);
> +	if (!hash_table) {
> +		rc = -1;
> +		COMPAT_LOG(SELINUX_ERROR, "%s: hashtab create failed.\n", path);
> +		return rc;
> +	}
>   	for (ii = 0; ii < data->nspec; ii++) {
> -		curr_spec = &spec_arr[ii];
> -		for (jj = ii + 1; jj < data->nspec; jj++) {
> -			if ((!strcmp(spec_arr[jj].regex_str,
> -				curr_spec->regex_str))
> -			    && (!spec_arr[jj].mode || !curr_spec->mode
> -				|| spec_arr[jj].mode == curr_spec->mode)) {
> -				rc = -1;
> -				errno = EINVAL;
> -				if (strcmp(spec_arr[jj].lr.ctx_raw,
> -					    curr_spec->lr.ctx_raw)) {
> -					COMPAT_LOG
> -						(SELINUX_ERROR,
> -						 "%s: Multiple different specifications for %s  (%s and %s).\n",
> -						 path, curr_spec->regex_str,
> -						 spec_arr[jj].lr.ctx_raw,
> -						 curr_spec->lr.ctx_raw);
> -				} else {
> -					COMPAT_LOG
> -						(SELINUX_ERROR,
> -						 "%s: Multiple same specifications for %s.\n",
> -						 path, curr_spec->regex_str);
> -				}
> +		new = (struct chkdups_key *)malloc(sizeof(struct chkdups_key));
> +		if (!new) {
> +			rc = -1;
> +			COMPAT_LOG(SELINUX_ERROR, "%s: hashtab key create failed.\n", path);
> +			return rc;
> +		}


I found a hashtab leak here. I'll fix it in the next patch version.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] libselinux: performance optimization for duplicate detection
  2023-02-17  8:44   ` [PATCH v2 3/3] libselinux: performance optimization for duplicate detection wanghuizhao
  2023-02-23  7:42     ` wanghuizhao
@ 2023-02-24 12:00     ` Christian Göttsche
  2023-02-25 14:32       ` wanghuizhao
  1 sibling, 1 reply; 15+ messages in thread
From: Christian Göttsche @ 2023-02-24 12:00 UTC (permalink / raw)
  To: wanghuizhao; +Cc: selinux

On Fri, 17 Feb 2023 at 09:45, wanghuizhao <wanghuizhao1@huawei.com> wrote:
>
> When semodule -i some.pp to install a module package, duplicate items are
> detected for the module. The detection function is nodups_specs in
> libselinux/src/label_file.c. The algorithm complexity of implementing
> this function is O(N^2). In scenarios where N is very large, the efficiency
> is very low.
>
> To solve this problem, I propose to use the hash table to detect duplicates.
> The algorithm complexity of new implementing is O(N). The execution efficiency
> will be greatly improved.

Isn't the new complexity O(N * log(N)) due to the hashtable insertion
of O(log(N))?

>
> Comparison between the execution time of the nodups_specs function.
>
> Old double-layer loop implementation O(N^2):
>
> semodule -i myapp1.pp
> nodups_specs data->nspec: 5002
> nodups_specs start: 11785.242s
> nodups_specs end:   11785.588s
> nodups_specs consumes:  0.346s
>
> semodule -i myapp2.pp
> nodups_specs data->nspec: 10002
> nodups_specs start: 11804.280s
> nodups_specs end:   11806.546s
> nodups_specs consumes:  2.266s
>
> semodule -i myapp3.pp
> nodups_specs data->nspec: 20002
> nodups_specs start: 11819.106s
> nodups_specs end:   11830.892s
> nodups_specs consumes: 11.786s
>
> New hash table implementation O(N):
>
> semodule -i myapp1.pp
> nodups_specs data->nspec: 5002
> nodups_specs start: 11785.588s
> nodups_specs end:   11785.590s
> nodups_specs consumes:  0.002s
>
> semodule -i myapp2.pp
> nodups_specs data->nspec: 10002
> nodups_specs start: 11806.546s
> nodups_specs end:   11806.552s
> nodups_specs consumes:  0.006s
>
> semodule -i myapp3.pp
> nodups_specs data->nspec: 20002
> nodups_specs start: 11830.892s
> nodups_specs end:   11830.905s
> nodups_specs consumes:  0.013s
>
> Signed-off-by: wanghuizhao <wanghuizhao1@huawei.com>
> ---
>  libselinux/src/label_file.c | 118 +++++++++++++++++++++++++++++++++++---------
>  1 file changed, 94 insertions(+), 24 deletions(-)
>
> diff --git a/libselinux/src/label_file.c b/libselinux/src/label_file.c
> index 74ae9b9f..eebf9665 100644
> --- a/libselinux/src/label_file.c
> +++ b/libselinux/src/label_file.c
> @@ -19,10 +19,17 @@
>  #include <sys/types.h>
>  #include <sys/stat.h>
>
> +#include "hashtab.h"
>  #include "callbacks.h"
>  #include "label_internal.h"
>  #include "label_file.h"
>
> +
> +struct chkdups_key {
> +       char *regex;
> +       unsigned int mode;
> +};
> +
>  /*
>   * Internals, mostly moved over from matchpathcon.c
>   */
> @@ -57,40 +64,103 @@ static int find_stem_from_file(struct saved_data *data, const char *key)
>  }
>
>  /*
> + * hash calculation and key comparison of hash table
> + */
> +
> +static unsigned int symhash(hashtab_t h, const_hashtab_key_t key)
> +{
> +       const struct chkdups_key *k = (const struct chkdups_key *)key;
> +       const char *p = NULL;
> +       size_t size;
> +       unsigned int val = 0;
> +
> +       size = strlen(k->regex);
> +       for (p = k->regex; ((size_t) (p - k->regex)) < size; p++)
> +               val =
> +                       ((val << 4) | ((val >> (8 * sizeof(unsigned int) - 4)) +
> +                       k->mode)) ^ (*p);

v1 added k->mode after the bit-wise or (probably changed by the added
parenthesis due to the compiler warning).
Using

     (((val << 4) | (val >> (8 * sizeof(unsigned int) - 4))) + k->mode) ^ (*p);

gives a slightly better spread (tested against refpolicy (master)):

    nodups_spec:  6062 entries and 606/606 buckets used, longest chain length 21
vs
    nodups_spec:  6062 entries and 606/606 buckets used, longest chain length 20

And for a adjusted hashtable size (see below):

    nodups_spec:  6062 entries and 3807/6062 buckets used, longest
chain length 6
vs
    nodups_spec:  6062 entries and 3815/6062 buckets used, longest
chain length 6

> +       return val % h->size;
> +}
> +
> +static int symcmp(hashtab_t h
> +                 __attribute__ ((unused)), const_hashtab_key_t key1,
> +                 const_hashtab_key_t key2)
> +{
> +       const struct chkdups_key *a = (const struct chkdups_key *)key1;
> +       const struct chkdups_key *b = (const struct chkdups_key *)key2;
> +
> +       return strcmp(a->regex, b->regex) || (a->mode && b->mode && a->mode != b->mode);
> +}
> +
> +static int destroy_chkdups_key(hashtab_key_t key)
> +{
> +       free(key);
> +
> +       return 0;
> +}
> +
> +/*
>   * Warn about duplicate specifications.
>   */
>  static int nodups_specs(struct saved_data *data, const char *path)
>  {
> -       int rc = 0;
> -       unsigned int ii, jj;
> +       int rc = 0, ret = 0;
> +       unsigned int ii;
>         struct spec *curr_spec, *spec_arr = data->spec_arr;
> +       struct chkdups_key *new = NULL;
> +       unsigned int hashtab_len = (data->nspec / 10) ? data->nspec / 10 : 1;
>
> +       hashtab_t hash_table = hashtab_create(symhash, symcmp, hashtab_len);

v1 used a hashtable size of `data->nspec`, instead of its tenth.
Since the hashtable implementation from newrole does not contain a
resize feature, like the one from libsepol, the size will be fixed for
its lifetime.
Using `data.>nspec` gives slightly better (but probably negligible) performance:

    Benchmark 1: DESTDIR=~/Downloads/destdir/
./scripts/env_use_destdir ~/Downloads/destdir/sbin/setfiles -c
../refpolicy/tmp/policy.bin ../refpolicy/tmp/all_mods.fc
     Time (mean ± σ):     340.4 ms ±  14.4 ms    [User: 280.6 ms,
System: 59.7 ms]
     Range (min … max):   328.1 ms … 386.0 ms    30 runs
vs
    Benchmark 1: DESTDIR=~/Downloads/destdir/
./scripts/env_use_destdir ~/Downloads/destdir/sbin/setfiles -c
../refpolicy/tmp/policy.bin ../refpolicy/tmp/all_mods.fc
     Time (mean ± σ):     334.7 ms ±   5.9 ms    [User: 279.6 ms,
System: 55.0 ms]
     Range (min … max):   327.5 ms … 362.1 ms    30 runs

Since your policy contains much more file context definitions, could
you run some benchmarks yourself?

> +       if (!hash_table) {
> +               rc = -1;
> +               COMPAT_LOG(SELINUX_ERROR, "%s: hashtab create failed.\n", path);
> +               return rc;
> +       }
>         for (ii = 0; ii < data->nspec; ii++) {
> -               curr_spec = &spec_arr[ii];
> -               for (jj = ii + 1; jj < data->nspec; jj++) {
> -                       if ((!strcmp(spec_arr[jj].regex_str,
> -                               curr_spec->regex_str))
> -                           && (!spec_arr[jj].mode || !curr_spec->mode
> -                               || spec_arr[jj].mode == curr_spec->mode)) {
> -                               rc = -1;
> -                               errno = EINVAL;
> -                               if (strcmp(spec_arr[jj].lr.ctx_raw,
> -                                           curr_spec->lr.ctx_raw)) {
> -                                       COMPAT_LOG
> -                                               (SELINUX_ERROR,
> -                                                "%s: Multiple different specifications for %s  (%s and %s).\n",
> -                                                path, curr_spec->regex_str,
> -                                                spec_arr[jj].lr.ctx_raw,
> -                                                curr_spec->lr.ctx_raw);
> -                               } else {
> -                                       COMPAT_LOG
> -                                               (SELINUX_ERROR,
> -                                                "%s: Multiple same specifications for %s.\n",
> -                                                path, curr_spec->regex_str);
> -                               }
> +               new = (struct chkdups_key *)malloc(sizeof(struct chkdups_key));
> +               if (!new) {
> +                       rc = -1;
> +                       COMPAT_LOG(SELINUX_ERROR, "%s: hashtab key create failed.\n", path);
> +                       return rc;
> +               }
> +               new->regex = spec_arr[ii].regex_str;
> +               new->mode = spec_arr[ii].mode;
> +               ret = hashtab_insert(hash_table, (hashtab_key_t)new, &spec_arr[ii]);
> +               if (ret == HASHTAB_SUCCESS)
> +                       continue;
> +               if (ret == HASHTAB_PRESENT) {
> +                       curr_spec =
> +                               (struct spec *)hashtab_search(hash_table, (hashtab_key_t)new);
> +                       rc = -1;
> +                       errno = EINVAL;
> +                       free(new);
> +                       if (strcmp(spec_arr[ii].lr.ctx_raw, curr_spec->lr.ctx_raw)) {
> +                               COMPAT_LOG
> +                                       (SELINUX_ERROR,
> +                                        "%s: Multiple different specifications for %s  (%s and %s).\n",
> +                                        path, curr_spec->regex_str,
> +                                        spec_arr[ii].lr.ctx_raw,
> +                                        curr_spec->lr.ctx_raw);
> +                       } else {
> +                               COMPAT_LOG
> +                                       (SELINUX_ERROR,
> +                                        "%s: Multiple same specifications for %s.\n",
> +                                        path, curr_spec->regex_str);
>                         }
>                 }
> +               if (ret == HASHTAB_OVERFLOW) {
> +                       rc = -1;
> +                       free(new);
> +                       COMPAT_LOG
> +                               (SELINUX_ERROR,
> +                               "%s: hashtab happen memory error.\n",
> +                               path);
> +                       break;
> +               }
>         }
> +
> +       hashtab_destroy_key(hash_table, destroy_chkdups_key);
> +
>         return rc;
>  }
>
> --
> 2.12.3
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] libselinux: performance optimization for duplicate detection
  2023-02-24 12:00     ` Christian Göttsche
@ 2023-02-25 14:32       ` wanghuizhao
  2023-03-03 14:15         ` wanghuizhao
  0 siblings, 1 reply; 15+ messages in thread
From: wanghuizhao @ 2023-02-25 14:32 UTC (permalink / raw)
  To: Christian Göttsche; +Cc: selinux, weiyuchen3


On 2023/2/24 20:00, Christian Göttsche wrote:
> On Fri, 17 Feb 2023 at 09:45, wanghuizhao <wanghuizhao1@huawei.com> wrote:
>> When semodule -i some.pp to install a module package, duplicate items are
>> detected for the module. The detection function is nodups_specs in
>> libselinux/src/label_file.c. The algorithm complexity of implementing
>> this function is O(N^2). In scenarios where N is very large, the efficiency
>> is very low.
>>
>> To solve this problem, I propose to use the hash table to detect duplicates.
>> The algorithm complexity of new implementing is O(N). The execution efficiency
>> will be greatly improved.
> Isn't the new complexity O(N * log(N)) due to the hashtable insertion
> of O(log(N))?


It's a little inaccurate to use O(N) here. When the hash table is 
inserted,  the hash value is calculated based on the inserted data. In 
this scenario, regex string is traversed cyclically. But these have 
nothing to do with O(log(N)).

If there is a hash collision, or if duplicates are detected, it will 
take a little time for pointer manipulation and strcmp.

However, in the original solution, a large number of strcmp operations 
were performed, which was also extremely time-consuming, although the 
second layer loop was only half the original.

for (ii = 0; ii < data->nspec; ii++) {
         curr_spec = &spec_arr[ii];
         for (jj = ii + 1; jj < data->nspec; jj++) {
             if ((!strcmp(spec_arr[jj].regex_str,
                 curr_spec->regex_str))
                 && (!spec_arr[jj].mode || !curr_spec->mode
                 || spec_arr[jj].mode == curr_spec->mode)) {

Maybe I should state that there is another symbol M related to the 
length of the string, and the overall time complexity is O(M * (N^2)) -> 
O(M * N) ?

>>   /*
>> + * hash calculation and key comparison of hash table
>> + */
>> +
>> +static unsigned int symhash(hashtab_t h, const_hashtab_key_t key)
>> +{
>> +       const struct chkdups_key *k = (const struct chkdups_key *)key;
>> +       const char *p = NULL;
>> +       size_t size;
>> +       unsigned int val = 0;
>> +
>> +       size = strlen(k->regex);
>> +       for (p = k->regex; ((size_t) (p - k->regex)) < size; p++)
>> +               val =
>> +                       ((val << 4) | ((val >> (8 * sizeof(unsigned int) - 4)) +
>> +                       k->mode)) ^ (*p);
> v1 added k->mode after the bit-wise or (probably changed by the added
> parenthesis due to the compiler warning).
> Using
>
>       (((val << 4) | (val >> (8 * sizeof(unsigned int) - 4))) + k->mode) ^ (*p);
>
> gives a slightly better spread (tested against refpolicy (master)):
>
>      nodups_spec:  6062 entries and 606/606 buckets used, longest chain length 21
> vs
>      nodups_spec:  6062 entries and 606/606 buckets used, longest chain length 20
>
> And for a adjusted hashtable size (see below):
>
>      nodups_spec:  6062 entries and 3807/6062 buckets used, longest
> chain length 6
> vs
>      nodups_spec:  6062 entries and 3815/6062 buckets used, longest
> chain length 6


That's good advice. I compared the two hash calculations with my own 
test set.

Using

      (((val << 4) | (val >> (8 * sizeof(unsigned int) - 4))) + k->mode) 
^ (*p);

The hash result is better.

root ~ # semodule -i myapp1.pp
nodups_specs: 5002 entries and 500/500 buckets used, longest chain length 16
root ~ # semodule -i myapp2.pp
nodups_specs: 10002 entries and 1000/1000 buckets used, longest chain 
length 20
root ~ # semodule -i myapp3.pp
nodups_specs: 20002 entries and 1539/2000 buckets used, longest chain 
length 20

adjusted hashtable size:

root ~ # semodule -i myapp1.pp
nodups_specs: 5002 entries and 3550/5002 buckets used, longest chain 
length 3
root ~ # semodule -i myapp2.pp
nodups_specs: 10002 entries and 7060/10002 buckets used, longest chain 
length 4
root ~ # semodule -i myapp3.pp
nodups_specs: 20002 entries and 13454/20002 buckets used, longest chain 
length 4

If I use the hash calculation of the patch v2, the result is as follows:

root ~ # semodule -i myapp1.pp
nodups_specs: 5002 entries and 500/500 buckets used, longest chain length 16
root ~ # semodule -i myapp2.pp
nodups_specs: 10002 entries and 1000/1000 buckets used, longest chain 
length 22
root ~ # semodule -i myapp3.pp
nodups_specs: 20002 entries and 1310/2000 buckets used, longest chain 
length 22

adjusted hashtable size:

root ~ # semodule -i myapp1.pp
nodups_specs: 5002 entries and 3551/5002 buckets used, longest chain 
length 4
root ~ # semodule -i myapp2.pp
nodups_specs: 10002 entries and 5602/10002 buckets used, longest chain 
length 5
root ~ # semodule -i myapp3.pp
nodups_specs: 20002 entries and 8975/20002 buckets used, longest chain 
length 6


The result is obvious that the hash calculation of the patch v1 is 
better. Thank you very much for your review and I'll fix it in the next 
patch version.

>> +       return val % h->size;
>> +}
>> +
>> +static int symcmp(hashtab_t h
>> +                 __attribute__ ((unused)), const_hashtab_key_t key1,
>> +                 const_hashtab_key_t key2)
>> +{
>> +       const struct chkdups_key *a = (const struct chkdups_key *)key1;
>> +       const struct chkdups_key *b = (const struct chkdups_key *)key2;
>> +
>> +       return strcmp(a->regex, b->regex) || (a->mode && b->mode && a->mode != b->mode);
>> +}
>> +
>> +static int destroy_chkdups_key(hashtab_key_t key)
>> +{
>> +       free(key);
>> +
>> +       return 0;
>> +}
>> +
>> +/*
>>    * Warn about duplicate specifications.
>>    */
>>   static int nodups_specs(struct saved_data *data, const char *path)
>>   {
>> -       int rc = 0;
>> -       unsigned int ii, jj;
>> +       int rc = 0, ret = 0;
>> +       unsigned int ii;
>>          struct spec *curr_spec, *spec_arr = data->spec_arr;
>> +       struct chkdups_key *new = NULL;
>> +       unsigned int hashtab_len = (data->nspec / 10) ? data->nspec / 10 : 1;
>>
>> +       hashtab_t hash_table = hashtab_create(symhash, symcmp, hashtab_len);
> v1 used a hashtable size of `data->nspec`, instead of its tenth.
> Since the hashtable implementation from newrole does not contain a
> resize feature, like the one from libsepol, the size will be fixed for
> its lifetime.
> Using `data.>nspec` gives slightly better (but probably negligible) performance:
>
>      Benchmark 1: DESTDIR=~/Downloads/destdir/
> ./scripts/env_use_destdir ~/Downloads/destdir/sbin/setfiles -c
> ../refpolicy/tmp/policy.bin ../refpolicy/tmp/all_mods.fc
>       Time (mean ± σ):     340.4 ms ±  14.4 ms    [User: 280.6 ms,
> System: 59.7 ms]
>       Range (min … max):   328.1 ms … 386.0 ms    30 runs
> vs
>      Benchmark 1: DESTDIR=~/Downloads/destdir/
> ./scripts/env_use_destdir ~/Downloads/destdir/sbin/setfiles -c
> ../refpolicy/tmp/policy.bin ../refpolicy/tmp/all_mods.fc
>       Time (mean ± σ):     334.7 ms ±   5.9 ms    [User: 279.6 ms,
> System: 55.0 ms]
>       Range (min … max):   327.5 ms … 362.1 ms    30 runs
>
> Since your policy contains much more file context definitions, could
> you run some benchmarks yourself?


I can't easily execute this test script in my corporate environment. So 
I manually performed multiple tests.


When the length of the hash table is `data->nspec` :

root ~ # time semodule -i myapp1.pp          (10 runs) 5002 entries

Average Time real : 0.723s

Range (min … max): 0.695s ... 0.725s

root ~ # time semodule -i myapp2.pp          (10 runs) 10002 entries

Average Time real : 0.952s

Range (min … max): 0.933s ... 0.973s

root ~ # time semodule -i myapp3.pp          (10 runs) 20002 entries

Average Time real : 1.888s

Range (min … max): 1.866s ... 1.916s


When the length of the hash table is `hashtab_len ` :

root ~ # time semodule -i myapp1.pp          (10 runs) 5002 entries

Average Time real : 0.730s

Range (min … max): 0.713s ... 0.743s

root ~ # time semodule -i myapp2.pp          (10 runs) 10002 entries

Average Time real : 0.965s

Range (min … max): 0.933s ... 0.983s

root ~ # time semodule -i myapp3.pp          (10 runs) 20002 entries

Average Time real : 1.908s

Range (min … max): 1.886s ... 1.946s


The larger the hash table, the lower the probability of hash collision. 
When the length of the hash table is `data->nspec`, The overall 
efficiency will be improved a little.

My idea is to set up a macro here to control the reduction in the length 
of the hashtab table. Users can determine the reduction ratio of the 
hashtab. Memory is a scarce resource in my usage scenario, so I may need 
to limit the size of the hash table. However, in a computer or server, 
memory resources are sufficient, and a large hash table can be used.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] libselinux: performance optimization for duplicate detection
  2023-02-25 14:32       ` wanghuizhao
@ 2023-03-03 14:15         ` wanghuizhao
  0 siblings, 0 replies; 15+ messages in thread
From: wanghuizhao @ 2023-03-03 14:15 UTC (permalink / raw)
  To: Christian Göttsche; +Cc: selinux, weiyuchen3


On 2023/2/25 22:32, wanghuizhao wrote:
>>>   /*
>>> + * hash calculation and key comparison of hash table
>>> + */
>>> +
>>> +static unsigned int symhash(hashtab_t h, const_hashtab_key_t key)
>>> +{
>>> +       const struct chkdups_key *k = (const struct chkdups_key *)key;
>>> +       const char *p = NULL;
>>> +       size_t size;
>>> +       unsigned int val = 0;
>>> +
>>> +       size = strlen(k->regex);
>>> +       for (p = k->regex; ((size_t) (p - k->regex)) < size; p++)
>>> +               val =
>>> +                       ((val << 4) | ((val >> (8 * sizeof(unsigned 
>>> int) - 4)) +
>>> +                       k->mode)) ^ (*p);
>> v1 added k->mode after the bit-wise or (probably changed by the added
>> parenthesis due to the compiler warning).
>> Using
>>
>>       (((val << 4) | (val >> (8 * sizeof(unsigned int) - 4))) + 
>> k->mode) ^ (*p);
>>
>> gives a slightly better spread (tested against refpolicy (master)):
>>
>>      nodups_spec:  6062 entries and 606/606 buckets used, longest 
>> chain length 21
>> vs
>>      nodups_spec:  6062 entries and 606/606 buckets used, longest 
>> chain length 20
>>
>> And for a adjusted hashtable size (see below):
>>
>>      nodups_spec:  6062 entries and 3807/6062 buckets used, longest
>> chain length 6
>> vs
>>      nodups_spec:  6062 entries and 3815/6062 buckets used, longest
>> chain length 6
>
>
> That's good advice. I compared the two hash calculations with my own 
> test set.
>
> Using
>
>      (((val << 4) | (val >> (8 * sizeof(unsigned int) - 4))) + 
> k->mode) ^ (*p);
>
> The hash result is better.
>
> root ~ # semodule -i myapp1.pp
> nodups_specs: 5002 entries and 500/500 buckets used, longest chain 
> length 16
> root ~ # semodule -i myapp2.pp
> nodups_specs: 10002 entries and 1000/1000 buckets used, longest chain 
> length 20
> root ~ # semodule -i myapp3.pp
> nodups_specs: 20002 entries and 1539/2000 buckets used, longest chain 
> length 20
>
> adjusted hashtable size:
>
> root ~ # semodule -i myapp1.pp
> nodups_specs: 5002 entries and 3550/5002 buckets used, longest chain 
> length 3
> root ~ # semodule -i myapp2.pp
> nodups_specs: 10002 entries and 7060/10002 buckets used, longest chain 
> length 4
> root ~ # semodule -i myapp3.pp
> nodups_specs: 20002 entries and 13454/20002 buckets used, longest 
> chain length 4
>
> If I use the hash calculation of the patch v2, the result is as follows:
>
> root ~ # semodule -i myapp1.pp
> nodups_specs: 5002 entries and 500/500 buckets used, longest chain 
> length 16
> root ~ # semodule -i myapp2.pp
> nodups_specs: 10002 entries and 1000/1000 buckets used, longest chain 
> length 22
> root ~ # semodule -i myapp3.pp
> nodups_specs: 20002 entries and 1310/2000 buckets used, longest chain 
> length 22
>
> adjusted hashtable size:
>
> root ~ # semodule -i myapp1.pp
> nodups_specs: 5002 entries and 3551/5002 buckets used, longest chain 
> length 4
> root ~ # semodule -i myapp2.pp
> nodups_specs: 10002 entries and 5602/10002 buckets used, longest chain 
> length 5
> root ~ # semodule -i myapp3.pp
> nodups_specs: 20002 entries and 8975/20002 buckets used, longest chain 
> length 6
>
>
> The result is obvious that the hash calculation of the patch v1 is 
> better. Thank you very much for your review and I'll fix it in the 
> next patch version.


(((val << 4) | (val >> (8 * sizeof(unsigned int) - 4))) + k->mode) ^ (*p);

I found that this hash calculation would cause a problem. In a scenario 
like this：

(1) regex: /tmp/something  mode: 0

(2) regex: /tmp/something  mode: 123

In the original checking logic, the two are judged as conflicting. 
However, in the new function implementation, `mode` is added to the hash 
calculation. As a result, they may have different hash values, so 
duplicates cannot be detected. I will fix this bug in the next patch.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-03-03 14:16 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-09 11:42 [PATCH 0/2] Improve efficiency of detecting duplicate in libselinux wanghuizhao
2023-02-09 11:42 ` [PATCH 1/2] libselinux: migrating hashtab from policycoreutils wanghuizhao
2023-02-10 13:58   ` Christian Göttsche
2023-02-11 11:53     ` wanghuizhao
2023-02-09 11:42 ` [PATCH 2/2] libselinux: performance optimization for duplicate detection wanghuizhao
2023-02-10 14:09   ` Christian Göttsche
2023-02-11 12:50     ` wanghuizhao
2023-02-17  8:44 ` [PATCH v2 0/3] Improve efficiency of detecting duplicate in libselinux wanghuizhao
2023-02-17  8:44   ` [PATCH v2 1/3] libselinux: migrating hashtab from policycoreutils wanghuizhao
2023-02-17  8:44   ` [PATCH v2 2/3] libselinux: adapting hashtab to libselinux wanghuizhao
2023-02-17  8:44   ` [PATCH v2 3/3] libselinux: performance optimization for duplicate detection wanghuizhao
2023-02-23  7:42     ` wanghuizhao
2023-02-24 12:00     ` Christian Göttsche
2023-02-25 14:32       ` wanghuizhao
2023-03-03 14:15         ` wanghuizhao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.