selinux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9] selinux: sidtab: reverse lookup hash table
@ 2019-11-22  9:33 Jeff Vander Stoep
  2019-11-22 14:21 ` Stephen Smalley
  2019-12-03  0:32 ` Paul Moore
  0 siblings, 2 replies; 15+ messages in thread
From: Jeff Vander Stoep @ 2019-11-22  9:33 UTC (permalink / raw)
  To: selinux
  Cc: omosnace, paul, sds, will, paulmck, rcu, Jeff Vander Stoep,
	Jovana Knezevic

This replaces the reverse table lookup and reverse cache with a
hashtable which improves cache-miss reverse-lookup times from
O(n) to O(1)* and maintains the same performance as a reverse
cache hit.

This reduces the time needed to add a new sidtab entry from ~500us
to 5us on a Pixel 3 when there are ~10,000 sidtab entries.

The implementation uses the kernel's generic hashtable API,
It uses the context's string represtation as the hash source,
and the kernels generic string hashing algorithm full_name_hash()
to reduce the string to a 32 bit value.

This change also maintains the improvement introduced in
commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve
performance") which removed the need to keep the current sidtab
locked during policy reload. It does however introduce periodic
locking of the target sidtab while converting the hashtable. Sidtab
entries are never modified or removed, so the context struct stored
in the sid_to_context tree can also be used for the context_to_sid
hashtable to reduce memory usage.

This bug was reported by:
- On the selinux bug tracker.
  BUG: kernel softlockup due to too many SIDs/contexts #37
  https://github.com/SELinuxProject/selinux-kernel/issues/37
- Jovana Knezevic on Android's bugtracker.
  Bug: 140252993
  "During multi-user performance testing, we create and remove users
  many times. selinux_android_restorecon_pkgdir goes from 1ms to over
  20ms after about 200 user creations and removals. Accumulated over
  ~280 packages, that adds a significant time to user creation,
  making perf benchmarks unreliable."

* Hashtable lookup is only O(1) when n < the number of buckets.

Changes in V2:
-The hashtable uses sidtab_entry_leaf objects directly so these
objects are shared between the sid_to_context lookup tree and the
context_to_sid hashtable. This simplifies memory allocation and
was suggested by Ondrej Mosnacek.
-The new sidtab hash stats file in selinuxfs has been moved out of
the avc dir and into a new "ss" dir.

V3:
-Add lock nesting notation.

V4/V5:
-Moved to *_rcu variants of the various hashtable functions
as suggested by Will Deacon.
-Naming/spelling fixups.

V6
-Remove nested locking. Use lock of active sidtab to gate
access to the new sidtab.
-Remove use of rcu_head/kfree_rcu(), they're unnecessary because
hashtable objects are never removed when read/add operations are
occurring. Why is this safe? Quoting Ondrej Mosnacek from the
selinux mailing list:
"It is not visible in this patch, but the sidtab (along with other
policy-lifetime structures) is protected by a big fat read-write lock.
The only places where sidtab_destroy() is called are (a) error paths
when initializing a new sidtab (here the sidtab isn't shared yet, so
no race) and (b) when freeing the old sidtab during policy reload - in
this case it is happening after a policy write-locked critical
section, which had removed the old sidtab pointer from the shared
structures, so at that point all sidtab readers will already be
accessing the new sidtab and the old one is visible only by the thread
doing the destruction."

V7
-Change format of /sys/fs/selinux/ss/sidtab_hash_stats to match
/sys/fs/selinux/avc/hash_stats.
-Add __rcu annotation to rcu pointers.
-Test with CONFIG_SPARSE_RCU_POINTER and CONFIG_PROVE_RCU.
-Add rcu@vger.kernel.org and Paul McKenney to Cc for review of the
RCU logic.

V8
-Removed the __rcu annotation used in V7. The annotation is
intended to be applied to pointers to an object, however the
objects referenced in the rcu hashtable are allocated in an
array.
-Fixed bug where multiple SIDs were receiving the same hash
due to security_get_user_sids() reusing the same context
struct without calling context_init() on it. This bug was
discovered and root-caused by Stephen Smalley.

V9
-Do not compute the hash in string_to_context_struct
because this string representation is non-canonical.

Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Reported-by: Jovana Knezevic <jovanak@google.com>
---
 security/selinux/Kconfig            |  12 ++
 security/selinux/include/security.h |   1 +
 security/selinux/selinuxfs.c        |  65 +++++++
 security/selinux/ss/context.h       |  11 +-
 security/selinux/ss/policydb.c      |   5 +
 security/selinux/ss/services.c      |  96 +++++++---
 security/selinux/ss/services.h      |   4 +-
 security/selinux/ss/sidtab.c        | 263 ++++++++++++++--------------
 security/selinux/ss/sidtab.h        |  16 +-
 9 files changed, 306 insertions(+), 167 deletions(-)

diff --git a/security/selinux/Kconfig b/security/selinux/Kconfig
index 5711689deb6a..c9e576c430c2 100644
--- a/security/selinux/Kconfig
+++ b/security/selinux/Kconfig
@@ -85,3 +85,15 @@ config SECURITY_SELINUX_CHECKREQPROT_VALUE
 	  via /selinux/checkreqprot if authorized by policy.
 
 	  If you are unsure how to answer this question, answer 0.
+
+config SECURITY_SELINUX_SIDTAB_HASH_BITS
+	int "NSA SELinux sidtab hashtable size"
+	depends on SECURITY_SELINUX
+	range 8 13
+	default 9
+	help
+	  This option sets the number of buckets used in the sidtab hashtable
+	  to 2^SECURITY_SELINUX_SIDTAB_HASH_BITS buckets. The number of hash
+	  collisions may be viewed at /sys/fs/selinux/ss/sidtab_hash_stats. If
+	  chain lengths are high (e.g. > 20) then selecting a higher value here
+	  will ensure that lookups times are short and stable.
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index ae840634e3c7..8c0dbbd076c6 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -395,5 +395,6 @@ extern int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm);
 extern void avtab_cache_init(void);
 extern void ebitmap_cache_init(void);
 extern void hashtab_cache_init(void);
+extern int security_sidtab_hash_stats(struct selinux_state *state, char *page);
 
 #endif /* _SELINUX_SECURITY_H_ */
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index ee94fa469c29..dd7bb1f1dc99 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -1482,6 +1482,32 @@ static ssize_t sel_read_avc_hash_stats(struct file *filp, char __user *buf,
 	return length;
 }
 
+static ssize_t sel_read_sidtab_hash_stats(struct file *filp, char __user *buf,
+					size_t count, loff_t *ppos)
+{
+	struct selinux_fs_info *fsi = file_inode(filp)->i_sb->s_fs_info;
+	struct selinux_state *state = fsi->state;
+	char *page;
+	ssize_t length;
+
+	page = (char *)__get_free_page(GFP_KERNEL);
+	if (!page)
+		return -ENOMEM;
+
+	length = security_sidtab_hash_stats(state, page);
+	if (length >= 0)
+		length = simple_read_from_buffer(buf, count, ppos, page,
+						length);
+	free_page((unsigned long)page);
+
+	return length;
+}
+
+static const struct file_operations sel_sidtab_hash_stats_ops = {
+	.read		= sel_read_sidtab_hash_stats,
+	.llseek		= generic_file_llseek,
+};
+
 static const struct file_operations sel_avc_cache_threshold_ops = {
 	.read		= sel_read_avc_cache_threshold,
 	.write		= sel_write_avc_cache_threshold,
@@ -1599,6 +1625,37 @@ static int sel_make_avc_files(struct dentry *dir)
 	return 0;
 }
 
+static int sel_make_ss_files(struct dentry *dir)
+{
+	struct super_block *sb = dir->d_sb;
+	struct selinux_fs_info *fsi = sb->s_fs_info;
+	int i;
+	static struct tree_descr files[] = {
+		{ "sidtab_hash_stats", &sel_sidtab_hash_stats_ops, S_IRUGO },
+	};
+
+	for (i = 0; i < ARRAY_SIZE(files); i++) {
+		struct inode *inode;
+		struct dentry *dentry;
+
+		dentry = d_alloc_name(dir, files[i].name);
+		if (!dentry)
+			return -ENOMEM;
+
+		inode = sel_make_inode(dir->d_sb, S_IFREG|files[i].mode);
+		if (!inode) {
+			dput(dentry);
+			return -ENOMEM;
+		}
+
+		inode->i_fop = files[i].ops;
+		inode->i_ino = ++fsi->last_ino;
+		d_add(dentry, inode);
+	}
+
+	return 0;
+}
+
 static ssize_t sel_read_initcon(struct file *file, char __user *buf,
 				size_t count, loff_t *ppos)
 {
@@ -1963,6 +2020,14 @@ static int sel_fill_super(struct super_block *sb, struct fs_context *fc)
 	}
 
 	ret = sel_make_avc_files(dentry);
+
+	dentry = sel_make_dir(sb->s_root, "ss", &fsi->last_ino);
+	if (IS_ERR(dentry)) {
+		ret = PTR_ERR(dentry);
+		goto err;
+	}
+
+	ret = sel_make_ss_files(dentry);
 	if (ret)
 		goto err;
 
diff --git a/security/selinux/ss/context.h b/security/selinux/ss/context.h
index 513e67f48878..3ba044fe02ed 100644
--- a/security/selinux/ss/context.h
+++ b/security/selinux/ss/context.h
@@ -31,6 +31,7 @@ struct context {
 	u32 len;        /* length of string in bytes */
 	struct mls_range range;
 	char *str;	/* string representation if context cannot be mapped. */
+	u32 hash;	/* a hash of the string representation */
 };
 
 static inline void mls_context_init(struct context *c)
@@ -168,12 +169,13 @@ static inline int context_cpy(struct context *dst, struct context *src)
 		kfree(dst->str);
 		return rc;
 	}
+	dst->hash = src->hash;
 	return 0;
 }
 
 static inline void context_destroy(struct context *c)
 {
-	c->user = c->role = c->type = 0;
+	c->user = c->role = c->type = c->hash = 0;
 	kfree(c->str);
 	c->str = NULL;
 	c->len = 0;
@@ -182,6 +184,8 @@ static inline void context_destroy(struct context *c)
 
 static inline int context_cmp(struct context *c1, struct context *c2)
 {
+	if (c1->hash && c2->hash && (c1->hash != c2->hash))
+		return 0;
 	if (c1->len && c2->len)
 		return (c1->len == c2->len && !strcmp(c1->str, c2->str));
 	if (c1->len || c2->len)
@@ -192,5 +196,10 @@ static inline int context_cmp(struct context *c1, struct context *c2)
 		mls_context_cmp(c1, c2));
 }
 
+static inline unsigned int context_compute_hash(const char *s)
+{
+	return full_name_hash(NULL, s, strlen(s));
+}
+
 #endif	/* _SS_CONTEXT_H_ */
 
diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
index e20624a68f5d..e369b0092cdf 100644
--- a/security/selinux/ss/policydb.c
+++ b/security/selinux/ss/policydb.c
@@ -878,6 +878,11 @@ int policydb_load_isids(struct policydb *p, struct sidtab *s)
 			sidtab_destroy(s);
 			goto out;
 		}
+		rc = context_add_hash(p, &c->context[0]);
+		if (rc) {
+			sidtab_destroy(s);
+			goto out;
+		}
 
 		rc = sidtab_set_initial(s, c->sid[0], &c->context[0]);
 		if (rc) {
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 3a29e7c24ba9..f5287008d843 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1257,6 +1257,17 @@ static int context_struct_to_string(struct policydb *p,
 
 #include "initial_sid_to_string.h"
 
+int security_sidtab_hash_stats(struct selinux_state *state, char *page)
+{
+	int rc;
+
+	read_lock(&state->ss->policy_rwlock);
+	rc = sidtab_hash_stats(state->ss->sidtab, page);
+	read_unlock(&state->ss->policy_rwlock);
+
+	return rc;
+}
+
 const char *security_get_initial_sid_context(u32 sid)
 {
 	if (unlikely(sid > SECINITSID_NUM))
@@ -1449,6 +1460,42 @@ static int string_to_context_struct(struct policydb *pol,
 	return rc;
 }
 
+int context_add_hash(struct policydb *policydb,
+		     struct context *context)
+{
+	int rc;
+	char *str;
+	int len;
+
+	if (context->str) {
+		context->hash = context_compute_hash(context->str);
+	} else {
+		rc = context_struct_to_string(policydb, context,
+					      &str, &len);
+		if (rc)
+			return rc;
+		context->hash = context_compute_hash(str);
+		kfree(str);
+	}
+	return 0;
+}
+
+static int context_struct_to_sid(struct selinux_state *state,
+				 struct context *context, u32 *sid)
+{
+	int rc;
+	struct sidtab *sidtab = state->ss->sidtab;
+	struct policydb *policydb = &state->ss->policydb;
+
+	if (!context->hash) {
+		rc = context_add_hash(policydb, context);
+		if (rc)
+			return rc;
+	}
+
+	return sidtab_context_to_sid(sidtab, context, sid);
+}
+
 static int security_context_to_sid_core(struct selinux_state *state,
 					const char *scontext, u32 scontext_len,
 					u32 *sid, u32 def_sid, gfp_t gfp_flags,
@@ -1501,7 +1548,7 @@ static int security_context_to_sid_core(struct selinux_state *state,
 		str = NULL;
 	} else if (rc)
 		goto out_unlock;
-	rc = sidtab_context_to_sid(sidtab, &context, sid);
+	rc = context_struct_to_sid(state, &context, sid);
 	context_destroy(&context);
 out_unlock:
 	read_unlock(&state->ss->policy_rwlock);
@@ -1805,7 +1852,7 @@ static int security_compute_sid(struct selinux_state *state,
 			goto out_unlock;
 	}
 	/* Obtain the sid for the context. */
-	rc = sidtab_context_to_sid(sidtab, &newcontext, out_sid);
+	rc = context_struct_to_sid(state, &newcontext, out_sid);
 out_unlock:
 	read_unlock(&state->ss->policy_rwlock);
 	context_destroy(&newcontext);
@@ -1950,6 +1997,7 @@ static int convert_context(struct context *oldc, struct context *newc, void *p)
 			context_init(newc);
 			newc->str = s;
 			newc->len = oldc->len;
+			newc->hash = oldc->hash;
 			return 0;
 		}
 		kfree(s);
@@ -2026,6 +2074,10 @@ static int convert_context(struct context *oldc, struct context *newc, void *p)
 			goto bad;
 	}
 
+	rc = context_add_hash(args->newp, newc);
+	if (rc)
+		goto bad;
+
 	return 0;
 bad:
 	/* Map old representation to string and save it. */
@@ -2035,6 +2087,7 @@ static int convert_context(struct context *oldc, struct context *newc, void *p)
 	context_destroy(newc);
 	newc->str = s;
 	newc->len = len;
+	newc->hash = context_compute_hash(s);
 	pr_info("SELinux:  Context %s became invalid (unmapped).\n",
 		newc->str);
 	return 0;
@@ -2273,8 +2326,7 @@ int security_port_sid(struct selinux_state *state,
 
 	if (c) {
 		if (!c->sid[0]) {
-			rc = sidtab_context_to_sid(sidtab,
-						   &c->context[0],
+			rc = context_struct_to_sid(state, &c->context[0],
 						   &c->sid[0]);
 			if (rc)
 				goto out;
@@ -2299,14 +2351,12 @@ int security_ib_pkey_sid(struct selinux_state *state,
 			 u64 subnet_prefix, u16 pkey_num, u32 *out_sid)
 {
 	struct policydb *policydb;
-	struct sidtab *sidtab;
 	struct ocontext *c;
 	int rc = 0;
 
 	read_lock(&state->ss->policy_rwlock);
 
 	policydb = &state->ss->policydb;
-	sidtab = state->ss->sidtab;
 
 	c = policydb->ocontexts[OCON_IBPKEY];
 	while (c) {
@@ -2320,7 +2370,7 @@ int security_ib_pkey_sid(struct selinux_state *state,
 
 	if (c) {
 		if (!c->sid[0]) {
-			rc = sidtab_context_to_sid(sidtab,
+			rc = context_struct_to_sid(state,
 						   &c->context[0],
 						   &c->sid[0]);
 			if (rc)
@@ -2367,8 +2417,7 @@ int security_ib_endport_sid(struct selinux_state *state,
 
 	if (c) {
 		if (!c->sid[0]) {
-			rc = sidtab_context_to_sid(sidtab,
-						   &c->context[0],
+			rc = context_struct_to_sid(state, &c->context[0],
 						   &c->sid[0]);
 			if (rc)
 				goto out;
@@ -2409,13 +2458,11 @@ int security_netif_sid(struct selinux_state *state,
 
 	if (c) {
 		if (!c->sid[0] || !c->sid[1]) {
-			rc = sidtab_context_to_sid(sidtab,
-						  &c->context[0],
-						  &c->sid[0]);
+			rc = context_struct_to_sid(state, &c->context[0],
+						   &c->sid[0]);
 			if (rc)
 				goto out;
-			rc = sidtab_context_to_sid(sidtab,
-						   &c->context[1],
+			rc = context_struct_to_sid(state, &c->context[1],
 						   &c->sid[1]);
 			if (rc)
 				goto out;
@@ -2456,14 +2503,12 @@ int security_node_sid(struct selinux_state *state,
 		      u32 *out_sid)
 {
 	struct policydb *policydb;
-	struct sidtab *sidtab;
 	int rc;
 	struct ocontext *c;
 
 	read_lock(&state->ss->policy_rwlock);
 
 	policydb = &state->ss->policydb;
-	sidtab = state->ss->sidtab;
 
 	switch (domain) {
 	case AF_INET: {
@@ -2505,7 +2550,7 @@ int security_node_sid(struct selinux_state *state,
 
 	if (c) {
 		if (!c->sid[0]) {
-			rc = sidtab_context_to_sid(sidtab,
+			rc = context_struct_to_sid(state,
 						   &c->context[0],
 						   &c->sid[0]);
 			if (rc)
@@ -2589,12 +2634,17 @@ int security_get_user_sids(struct selinux_state *state,
 		usercon.role = i + 1;
 		ebitmap_for_each_positive_bit(&role->types, tnode, j) {
 			usercon.type = j + 1;
+			/*
+			 * The same context struct is reused here so the hash
+			 * must be reset.
+			 */
+			usercon.hash = 0;
 
 			if (mls_setup_user_range(policydb, fromcon, user,
 						 &usercon))
 				continue;
 
-			rc = sidtab_context_to_sid(sidtab, &usercon, &sid);
+			rc = context_struct_to_sid(state, &usercon, &sid);
 			if (rc)
 				goto out_unlock;
 			if (mynel < maxnel) {
@@ -2665,7 +2715,6 @@ static inline int __security_genfs_sid(struct selinux_state *state,
 				       u32 *sid)
 {
 	struct policydb *policydb = &state->ss->policydb;
-	struct sidtab *sidtab = state->ss->sidtab;
 	int len;
 	u16 sclass;
 	struct genfs *genfs;
@@ -2700,7 +2749,7 @@ static inline int __security_genfs_sid(struct selinux_state *state,
 		goto out;
 
 	if (!c->sid[0]) {
-		rc = sidtab_context_to_sid(sidtab, &c->context[0], &c->sid[0]);
+		rc = context_struct_to_sid(state, &c->context[0], &c->sid[0]);
 		if (rc)
 			goto out;
 	}
@@ -2763,7 +2812,7 @@ int security_fs_use(struct selinux_state *state, struct super_block *sb)
 	if (c) {
 		sbsec->behavior = c->v.behavior;
 		if (!c->sid[0]) {
-			rc = sidtab_context_to_sid(sidtab, &c->context[0],
+			rc = context_struct_to_sid(state, &c->context[0],
 						   &c->sid[0]);
 			if (rc)
 				goto out;
@@ -3019,8 +3068,7 @@ int security_sid_mls_copy(struct selinux_state *state,
 			goto out_unlock;
 		}
 	}
-
-	rc = sidtab_context_to_sid(sidtab, &newcon, new_sid);
+	rc = context_struct_to_sid(state, &newcon, new_sid);
 out_unlock:
 	read_unlock(&state->ss->policy_rwlock);
 	context_destroy(&newcon);
@@ -3613,7 +3661,7 @@ int security_netlbl_secattr_to_sid(struct selinux_state *state,
 		if (!mls_context_isvalid(policydb, &ctx_new))
 			goto out_free;
 
-		rc = sidtab_context_to_sid(sidtab, &ctx_new, sid);
+		rc = context_struct_to_sid(state, &ctx_new, sid);
 		if (rc)
 			goto out_free;
 
diff --git a/security/selinux/ss/services.h b/security/selinux/ss/services.h
index 9a36de860368..fc40640a9725 100644
--- a/security/selinux/ss/services.h
+++ b/security/selinux/ss/services.h
@@ -8,7 +8,7 @@
 #define _SS_SERVICES_H_
 
 #include "policydb.h"
-#include "sidtab.h"
+#include "context.h"
 
 /* Mapping for a single class */
 struct selinux_mapping {
@@ -39,4 +39,6 @@ void services_compute_xperms_drivers(struct extended_perms *xperms,
 void services_compute_xperms_decision(struct extended_perms_decision *xpermd,
 					struct avtab_node *node);
 
+int context_add_hash(struct policydb *policydb, struct context *context);
+
 #endif	/* _SS_SERVICES_H_ */
diff --git a/security/selinux/ss/sidtab.c b/security/selinux/ss/sidtab.c
index 7d49994e8d5f..97d6fedeff68 100644
--- a/security/selinux/ss/sidtab.c
+++ b/security/selinux/ss/sidtab.c
@@ -17,26 +17,43 @@
 #include "security.h"
 #include "sidtab.h"
 
+#define index_to_sid(index) (index + SECINITSID_NUM + 1)
+#define sid_to_index(sid) (sid - (SECINITSID_NUM + 1))
+
 int sidtab_init(struct sidtab *s)
 {
 	u32 i;
 
 	memset(s->roots, 0, sizeof(s->roots));
 
-	/* max count is SIDTAB_MAX so valid index is always < SIDTAB_MAX */
-	for (i = 0; i < SIDTAB_RCACHE_SIZE; i++)
-		s->rcache[i] = SIDTAB_MAX;
-
 	for (i = 0; i < SECINITSID_NUM; i++)
 		s->isids[i].set = 0;
 
 	s->count = 0;
 	s->convert = NULL;
+	hash_init(s->context_to_sid);
 
 	spin_lock_init(&s->lock);
 	return 0;
 }
 
+static u32 context_to_sid(struct sidtab *s, struct context *context)
+{
+	struct sidtab_entry_leaf *entry;
+	u32 sid = 0;
+
+	rcu_read_lock();
+	hash_for_each_possible_rcu(s->context_to_sid, entry, list,
+				   context->hash) {
+		if (context_cmp(&entry->context, context)) {
+			sid = entry->sid;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	return sid;
+}
+
 int sidtab_set_initial(struct sidtab *s, u32 sid, struct context *context)
 {
 	struct sidtab_isid_entry *entry;
@@ -47,14 +64,60 @@ int sidtab_set_initial(struct sidtab *s, u32 sid, struct context *context)
 
 	entry = &s->isids[sid - 1];
 
-	rc = context_cpy(&entry->context, context);
+	rc = context_cpy(&entry->leaf.context, context);
 	if (rc)
 		return rc;
 
 	entry->set = 1;
+
+	/*
+	 * Multiple initial sids may map to the same context. Check that this
+	 * context is not already represented in the context_to_sid hashtable
+	 * to avoid duplicate entries and long linked lists upon hash
+	 * collision.
+	 */
+	if (!context_to_sid(s, context)) {
+		entry->leaf.sid = sid;
+		hash_add(s->context_to_sid, &entry->leaf.list, context->hash);
+	}
+
 	return 0;
 }
 
+int sidtab_hash_stats(struct sidtab *sidtab, char *page)
+{
+	int i;
+	int chain_len = 0;
+	int slots_used = 0;
+	int entries = 0;
+	int max_chain_len = 0;
+	int cur_bucket = 0;
+	struct sidtab_entry_leaf *entry;
+
+	rcu_read_lock();
+	hash_for_each_rcu(sidtab->context_to_sid, i, entry, list) {
+		entries++;
+		if (i == cur_bucket) {
+			chain_len++;
+			if (chain_len == 1)
+				slots_used++;
+		} else {
+			cur_bucket = i;
+			if (chain_len > max_chain_len)
+				max_chain_len = chain_len;
+			chain_len = 0;
+		}
+	}
+	rcu_read_unlock();
+
+	if (chain_len > max_chain_len)
+		max_chain_len = chain_len;
+
+	return scnprintf(page, PAGE_SIZE, "entries: %d\nbuckets used: %d/%d\n"
+			 "longest chain: %d\n", entries,
+			 slots_used, SIDTAB_HASH_BUCKETS, max_chain_len);
+}
+
 static u32 sidtab_level_from_count(u32 count)
 {
 	u32 capacity = SIDTAB_LEAF_ENTRIES;
@@ -88,7 +151,8 @@ static int sidtab_alloc_roots(struct sidtab *s, u32 level)
 	return 0;
 }
 
-static struct context *sidtab_do_lookup(struct sidtab *s, u32 index, int alloc)
+static struct sidtab_entry_leaf *sidtab_do_lookup(struct sidtab *s, u32 index,
+						  int alloc)
 {
 	union sidtab_entry_inner *entry;
 	u32 level, capacity_shift, leaf_index = index / SIDTAB_LEAF_ENTRIES;
@@ -125,7 +189,7 @@ static struct context *sidtab_do_lookup(struct sidtab *s, u32 index, int alloc)
 		if (!entry->ptr_leaf)
 			return NULL;
 	}
-	return &entry->ptr_leaf->entries[index % SIDTAB_LEAF_ENTRIES].context;
+	return &entry->ptr_leaf->entries[index % SIDTAB_LEAF_ENTRIES];
 }
 
 static struct context *sidtab_lookup(struct sidtab *s, u32 index)
@@ -136,12 +200,12 @@ static struct context *sidtab_lookup(struct sidtab *s, u32 index)
 	if (index >= count)
 		return NULL;
 
-	return sidtab_do_lookup(s, index, 0);
+	return &sidtab_do_lookup(s, index, 0)->context;
 }
 
 static struct context *sidtab_lookup_initial(struct sidtab *s, u32 sid)
 {
-	return s->isids[sid - 1].set ? &s->isids[sid - 1].context : NULL;
+	return s->isids[sid - 1].set ? &s->isids[sid - 1].leaf.context : NULL;
 }
 
 static struct context *sidtab_search_core(struct sidtab *s, u32 sid, int force)
@@ -150,7 +214,7 @@ static struct context *sidtab_search_core(struct sidtab *s, u32 sid, int force)
 
 	if (sid != 0) {
 		if (sid > SECINITSID_NUM)
-			context = sidtab_lookup(s, sid - (SECINITSID_NUM + 1));
+			context = sidtab_lookup(s, sid_to_index(sid));
 		else
 			context = sidtab_lookup_initial(s, sid);
 		if (context && (!context->len || force))
@@ -170,117 +234,30 @@ struct context *sidtab_search_force(struct sidtab *s, u32 sid)
 	return sidtab_search_core(s, sid, 1);
 }
 
-static int sidtab_find_context(union sidtab_entry_inner entry,
-			       u32 *pos, u32 count, u32 level,
-			       struct context *context, u32 *index)
-{
-	int rc;
-	u32 i;
-
-	if (level != 0) {
-		struct sidtab_node_inner *node = entry.ptr_inner;
-
-		i = 0;
-		while (i < SIDTAB_INNER_ENTRIES && *pos < count) {
-			rc = sidtab_find_context(node->entries[i],
-						 pos, count, level - 1,
-						 context, index);
-			if (rc == 0)
-				return 0;
-			i++;
-		}
-	} else {
-		struct sidtab_node_leaf *node = entry.ptr_leaf;
-
-		i = 0;
-		while (i < SIDTAB_LEAF_ENTRIES && *pos < count) {
-			if (context_cmp(&node->entries[i].context, context)) {
-				*index = *pos;
-				return 0;
-			}
-			(*pos)++;
-			i++;
-		}
-	}
-	return -ENOENT;
-}
-
-static void sidtab_rcache_update(struct sidtab *s, u32 index, u32 pos)
-{
-	while (pos > 0) {
-		WRITE_ONCE(s->rcache[pos], READ_ONCE(s->rcache[pos - 1]));
-		--pos;
-	}
-	WRITE_ONCE(s->rcache[0], index);
-}
-
-static void sidtab_rcache_push(struct sidtab *s, u32 index)
-{
-	sidtab_rcache_update(s, index, SIDTAB_RCACHE_SIZE - 1);
-}
-
-static int sidtab_rcache_search(struct sidtab *s, struct context *context,
-				u32 *index)
-{
-	u32 i;
-
-	for (i = 0; i < SIDTAB_RCACHE_SIZE; i++) {
-		u32 v = READ_ONCE(s->rcache[i]);
-
-		if (v >= SIDTAB_MAX)
-			continue;
-
-		if (context_cmp(sidtab_do_lookup(s, v, 0), context)) {
-			sidtab_rcache_update(s, v, i);
-			*index = v;
-			return 0;
-		}
-	}
-	return -ENOENT;
-}
-
-static int sidtab_reverse_lookup(struct sidtab *s, struct context *context,
-				 u32 *index)
+int sidtab_context_to_sid(struct sidtab *s, struct context *context,
+			  u32 *sid)
 {
 	unsigned long flags;
-	u32 count, count_locked, level, pos;
+	u32 count;
 	struct sidtab_convert_params *convert;
-	struct context *dst, *dst_convert;
+	struct sidtab_entry_leaf *dst, *dst_convert;
 	int rc;
 
-	rc = sidtab_rcache_search(s, context, index);
-	if (rc == 0)
-		return 0;
-
-	/* read entries only after reading count */
-	count = smp_load_acquire(&s->count);
-	level = sidtab_level_from_count(count);
-
-	pos = 0;
-	rc = sidtab_find_context(s->roots[level], &pos, count, level,
-				 context, index);
-	if (rc == 0) {
-		sidtab_rcache_push(s, *index);
+	*sid = context_to_sid(s, context);
+	if (*sid)
 		return 0;
-	}
 
 	/* lock-free search failed: lock, re-search, and insert if not found */
 	spin_lock_irqsave(&s->lock, flags);
 
+	rc = 0;
+	*sid = context_to_sid(s, context);
+	if (*sid)
+		goto out_unlock;
+
+	/* read entries only after reading count */
+	count = smp_load_acquire(&s->count);
 	convert = s->convert;
-	count_locked = s->count;
-	level = sidtab_level_from_count(count_locked);
-
-	/* if count has changed before we acquired the lock, then catch up */
-	while (count < count_locked) {
-		if (context_cmp(sidtab_do_lookup(s, count, 0), context)) {
-			sidtab_rcache_push(s, count);
-			*index = count;
-			rc = 0;
-			goto out_unlock;
-		}
-		++count;
-	}
 
 	/* bail out if we already reached max entries */
 	rc = -EOVERFLOW;
@@ -293,7 +270,9 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct context *context,
 	if (!dst)
 		goto out_unlock;
 
-	rc = context_cpy(dst, context);
+	dst->sid = index_to_sid(count);
+
+	rc = context_cpy(&dst->context, context);
 	if (rc)
 		goto out_unlock;
 
@@ -305,29 +284,32 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct context *context,
 		rc = -ENOMEM;
 		dst_convert = sidtab_do_lookup(convert->target, count, 1);
 		if (!dst_convert) {
-			context_destroy(dst);
+			context_destroy(&dst->context);
 			goto out_unlock;
 		}
 
-		rc = convert->func(context, dst_convert, convert->args);
+		rc = convert->func(context, &dst_convert->context,
+				convert->args);
 		if (rc) {
-			context_destroy(dst);
+			context_destroy(&dst->context);
 			goto out_unlock;
 		}
-
-		/* at this point we know the insert won't fail */
+		dst_convert->sid = index_to_sid(count);
 		convert->target->count = count + 1;
+
+		hash_add_rcu(convert->target->context_to_sid,
+				&dst_convert->list, dst_convert->context.hash);
 	}
 
 	if (context->len)
 		pr_info("SELinux:  Context %s is not valid (left unmapped).\n",
 			context->str);
 
-	sidtab_rcache_push(s, count);
-	*index = count;
+	*sid = index_to_sid(count);
 
-	/* write entries before writing new count */
+	/* Write entries before updating count. */
 	smp_store_release(&s->count, count + 1);
+	hash_add_rcu(s->context_to_sid, &dst->list, dst->context.hash);
 
 	rc = 0;
 out_unlock:
@@ -335,25 +317,19 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct context *context,
 	return rc;
 }
 
-int sidtab_context_to_sid(struct sidtab *s, struct context *context, u32 *sid)
+static void sidtab_convert_hashtable(struct sidtab *s, u32 count)
 {
-	int rc;
+	struct sidtab_entry_leaf *entry;
 	u32 i;
 
-	for (i = 0; i < SECINITSID_NUM; i++) {
-		struct sidtab_isid_entry *entry = &s->isids[i];
+	for (i = 0; i < count; i++) {
+		entry = sidtab_do_lookup(s, i, 0);
+		entry->sid = index_to_sid(i);
 
-		if (entry->set && context_cmp(context, &entry->context)) {
-			*sid = i + 1;
-			return 0;
-		}
-	}
+		hash_add_rcu(s->context_to_sid, &entry->list,
+				entry->context.hash);
 
-	rc = sidtab_reverse_lookup(s, context, sid);
-	if (rc)
-		return rc;
-	*sid += SECINITSID_NUM + 1;
-	return 0;
+	}
 }
 
 static int sidtab_convert_tree(union sidtab_entry_inner *edst,
@@ -400,6 +376,7 @@ static int sidtab_convert_tree(union sidtab_entry_inner *edst,
 		}
 		cond_resched();
 	}
+
 	return 0;
 }
 
@@ -435,7 +412,7 @@ int sidtab_convert(struct sidtab *s, struct sidtab_convert_params *params)
 	/* enable live convert of new entries */
 	s->convert = params;
 
-	/* we can safely do the rest of the conversion outside the lock */
+	/* we can safely convert the tree outside the lock */
 	spin_unlock_irqrestore(&s->lock, flags);
 
 	pr_info("SELinux:  Converting %u SID table entries...\n", count);
@@ -449,8 +426,17 @@ int sidtab_convert(struct sidtab *s, struct sidtab_convert_params *params)
 		spin_lock_irqsave(&s->lock, flags);
 		s->convert = NULL;
 		spin_unlock_irqrestore(&s->lock, flags);
+		return rc;
 	}
-	return rc;
+	/*
+	 * The hashtable can also be modified in sidtab_context_to_sid()
+	 * so we must re-acquire the lock here.
+	 */
+	spin_lock_irqsave(&s->lock, flags);
+	sidtab_convert_hashtable(params->target, count);
+	spin_unlock_irqrestore(&s->lock, flags);
+
+	return 0;
 }
 
 static void sidtab_destroy_tree(union sidtab_entry_inner entry, u32 level)
@@ -484,11 +470,16 @@ void sidtab_destroy(struct sidtab *s)
 
 	for (i = 0; i < SECINITSID_NUM; i++)
 		if (s->isids[i].set)
-			context_destroy(&s->isids[i].context);
+			context_destroy(&s->isids[i].leaf.context);
 
 	level = SIDTAB_MAX_LEVEL;
 	while (level && !s->roots[level].ptr_inner)
 		--level;
 
 	sidtab_destroy_tree(s->roots[level], level);
+	/*
+	 * The context_to_sid hashtable's objects are all shared
+	 * with the isids array and context tree, and so don't need
+	 * to be cleaned up here.
+	 */
 }
diff --git a/security/selinux/ss/sidtab.h b/security/selinux/ss/sidtab.h
index 1f4763141aa1..e2809401c417 100644
--- a/security/selinux/ss/sidtab.h
+++ b/security/selinux/ss/sidtab.h
@@ -13,11 +13,14 @@
 
 #include <linux/spinlock_types.h>
 #include <linux/log2.h>
+#include <linux/hashtable.h>
 
 #include "context.h"
 
 struct sidtab_entry_leaf {
+	u32 sid;
 	struct context context;
+	struct hlist_node list;
 };
 
 struct sidtab_node_inner;
@@ -57,7 +60,7 @@ struct sidtab_node_inner {
 
 struct sidtab_isid_entry {
 	int set;
-	struct context context;
+	struct sidtab_entry_leaf leaf;
 };
 
 struct sidtab_convert_params {
@@ -66,7 +69,8 @@ struct sidtab_convert_params {
 	struct sidtab *target;
 };
 
-#define SIDTAB_RCACHE_SIZE 3
+#define SIDTAB_HASH_BITS CONFIG_SECURITY_SELINUX_SIDTAB_HASH_BITS
+#define SIDTAB_HASH_BUCKETS (1 << SIDTAB_HASH_BITS)
 
 struct sidtab {
 	/*
@@ -83,11 +87,11 @@ struct sidtab {
 	struct sidtab_convert_params *convert;
 	spinlock_t lock;
 
-	/* reverse lookup cache - access atomically via {READ|WRITE}_ONCE() */
-	u32 rcache[SIDTAB_RCACHE_SIZE];
-
 	/* index == SID - 1 (no entry for SECSID_NULL) */
 	struct sidtab_isid_entry isids[SECINITSID_NUM];
+
+	/* Hash table for fast reverse context-to-sid lookups. */
+	DECLARE_HASHTABLE(context_to_sid, SIDTAB_HASH_BITS);
 };
 
 int sidtab_init(struct sidtab *s);
@@ -101,6 +105,8 @@ int sidtab_context_to_sid(struct sidtab *s, struct context *context, u32 *sid);
 
 void sidtab_destroy(struct sidtab *s);
 
+int sidtab_hash_stats(struct sidtab *sidtab, char *page);
+
 #endif	/* _SS_SIDTAB_H_ */
 
 
-- 
2.24.0.432.g9d3f5f5b63-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-11-22  9:33 [PATCH v9] selinux: sidtab: reverse lookup hash table Jeff Vander Stoep
@ 2019-11-22 14:21 ` Stephen Smalley
  2019-12-03  0:32 ` Paul Moore
  1 sibling, 0 replies; 15+ messages in thread
From: Stephen Smalley @ 2019-11-22 14:21 UTC (permalink / raw)
  To: Jeff Vander Stoep, selinux
  Cc: omosnace, paul, will, paulmck, rcu, Jovana Knezevic

On 11/22/19 4:33 AM, Jeff Vander Stoep wrote:
> This replaces the reverse table lookup and reverse cache with a
> hashtable which improves cache-miss reverse-lookup times from
> O(n) to O(1)* and maintains the same performance as a reverse
> cache hit.
> 
> This reduces the time needed to add a new sidtab entry from ~500us
> to 5us on a Pixel 3 when there are ~10,000 sidtab entries.
> 
> The implementation uses the kernel's generic hashtable API,
> It uses the context's string represtation as the hash source,
> and the kernels generic string hashing algorithm full_name_hash()
> to reduce the string to a 32 bit value.
> 
> This change also maintains the improvement introduced in
> commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve
> performance") which removed the need to keep the current sidtab
> locked during policy reload. It does however introduce periodic
> locking of the target sidtab while converting the hashtable. Sidtab
> entries are never modified or removed, so the context struct stored
> in the sid_to_context tree can also be used for the context_to_sid
> hashtable to reduce memory usage.
> 
> This bug was reported by:
> - On the selinux bug tracker.
>    BUG: kernel softlockup due to too many SIDs/contexts #37
>    https://github.com/SELinuxProject/selinux-kernel/issues/37
> - Jovana Knezevic on Android's bugtracker.
>    Bug: 140252993
>    "During multi-user performance testing, we create and remove users
>    many times. selinux_android_restorecon_pkgdir goes from 1ms to over
>    20ms after about 200 user creations and removals. Accumulated over
>    ~280 packages, that adds a significant time to user creation,
>    making perf benchmarks unreliable."
> 
> * Hashtable lookup is only O(1) when n < the number of buckets.
> 
> Changes in V2:
> -The hashtable uses sidtab_entry_leaf objects directly so these
> objects are shared between the sid_to_context lookup tree and the
> context_to_sid hashtable. This simplifies memory allocation and
> was suggested by Ondrej Mosnacek.
> -The new sidtab hash stats file in selinuxfs has been moved out of
> the avc dir and into a new "ss" dir.
> 
> V3:
> -Add lock nesting notation.
> 
> V4/V5:
> -Moved to *_rcu variants of the various hashtable functions
> as suggested by Will Deacon.
> -Naming/spelling fixups.
> 
> V6
> -Remove nested locking. Use lock of active sidtab to gate
> access to the new sidtab.
> -Remove use of rcu_head/kfree_rcu(), they're unnecessary because
> hashtable objects are never removed when read/add operations are
> occurring. Why is this safe? Quoting Ondrej Mosnacek from the
> selinux mailing list:
> "It is not visible in this patch, but the sidtab (along with other
> policy-lifetime structures) is protected by a big fat read-write lock.
> The only places where sidtab_destroy() is called are (a) error paths
> when initializing a new sidtab (here the sidtab isn't shared yet, so
> no race) and (b) when freeing the old sidtab during policy reload - in
> this case it is happening after a policy write-locked critical
> section, which had removed the old sidtab pointer from the shared
> structures, so at that point all sidtab readers will already be
> accessing the new sidtab and the old one is visible only by the thread
> doing the destruction."
> 
> V7
> -Change format of /sys/fs/selinux/ss/sidtab_hash_stats to match
> /sys/fs/selinux/avc/hash_stats.
> -Add __rcu annotation to rcu pointers.
> -Test with CONFIG_SPARSE_RCU_POINTER and CONFIG_PROVE_RCU.
> -Add rcu@vger.kernel.org and Paul McKenney to Cc for review of the
> RCU logic.
> 
> V8
> -Removed the __rcu annotation used in V7. The annotation is
> intended to be applied to pointers to an object, however the
> objects referenced in the rcu hashtable are allocated in an
> array.
> -Fixed bug where multiple SIDs were receiving the same hash
> due to security_get_user_sids() reusing the same context
> struct without calling context_init() on it. This bug was
> discovered and root-caused by Stephen Smalley.
> 
> V9
> -Do not compute the hash in string_to_context_struct
> because this string representation is non-canonical.
> 
> Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
> Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
> Reported-by: Jovana Knezevic <jovanak@google.com>

Looks good to me and tested out fine, and cat 
/sys/fs/selinux/ss/sidtab_hash_stats looked good both at boot and after 
running the selinux testsuite.

Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Tested-by: Stephen Smalley <sds@tycho.nsa.gov>

> ---
>   security/selinux/Kconfig            |  12 ++
>   security/selinux/include/security.h |   1 +
>   security/selinux/selinuxfs.c        |  65 +++++++
>   security/selinux/ss/context.h       |  11 +-
>   security/selinux/ss/policydb.c      |   5 +
>   security/selinux/ss/services.c      |  96 +++++++---
>   security/selinux/ss/services.h      |   4 +-
>   security/selinux/ss/sidtab.c        | 263 ++++++++++++++--------------
>   security/selinux/ss/sidtab.h        |  16 +-
>   9 files changed, 306 insertions(+), 167 deletions(-)
> 
> diff --git a/security/selinux/Kconfig b/security/selinux/Kconfig
> index 5711689deb6a..c9e576c430c2 100644
> --- a/security/selinux/Kconfig
> +++ b/security/selinux/Kconfig
> @@ -85,3 +85,15 @@ config SECURITY_SELINUX_CHECKREQPROT_VALUE
>   	  via /selinux/checkreqprot if authorized by policy.
>   
>   	  If you are unsure how to answer this question, answer 0.
> +
> +config SECURITY_SELINUX_SIDTAB_HASH_BITS
> +	int "NSA SELinux sidtab hashtable size"
> +	depends on SECURITY_SELINUX
> +	range 8 13
> +	default 9
> +	help
> +	  This option sets the number of buckets used in the sidtab hashtable
> +	  to 2^SECURITY_SELINUX_SIDTAB_HASH_BITS buckets. The number of hash
> +	  collisions may be viewed at /sys/fs/selinux/ss/sidtab_hash_stats. If
> +	  chain lengths are high (e.g. > 20) then selecting a higher value here
> +	  will ensure that lookups times are short and stable.
> diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
> index ae840634e3c7..8c0dbbd076c6 100644
> --- a/security/selinux/include/security.h
> +++ b/security/selinux/include/security.h
> @@ -395,5 +395,6 @@ extern int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm);
>   extern void avtab_cache_init(void);
>   extern void ebitmap_cache_init(void);
>   extern void hashtab_cache_init(void);
> +extern int security_sidtab_hash_stats(struct selinux_state *state, char *page);
>   
>   #endif /* _SELINUX_SECURITY_H_ */
> diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
> index ee94fa469c29..dd7bb1f1dc99 100644
> --- a/security/selinux/selinuxfs.c
> +++ b/security/selinux/selinuxfs.c
> @@ -1482,6 +1482,32 @@ static ssize_t sel_read_avc_hash_stats(struct file *filp, char __user *buf,
>   	return length;
>   }
>   
> +static ssize_t sel_read_sidtab_hash_stats(struct file *filp, char __user *buf,
> +					size_t count, loff_t *ppos)
> +{
> +	struct selinux_fs_info *fsi = file_inode(filp)->i_sb->s_fs_info;
> +	struct selinux_state *state = fsi->state;
> +	char *page;
> +	ssize_t length;
> +
> +	page = (char *)__get_free_page(GFP_KERNEL);
> +	if (!page)
> +		return -ENOMEM;
> +
> +	length = security_sidtab_hash_stats(state, page);
> +	if (length >= 0)
> +		length = simple_read_from_buffer(buf, count, ppos, page,
> +						length);
> +	free_page((unsigned long)page);
> +
> +	return length;
> +}
> +
> +static const struct file_operations sel_sidtab_hash_stats_ops = {
> +	.read		= sel_read_sidtab_hash_stats,
> +	.llseek		= generic_file_llseek,
> +};
> +
>   static const struct file_operations sel_avc_cache_threshold_ops = {
>   	.read		= sel_read_avc_cache_threshold,
>   	.write		= sel_write_avc_cache_threshold,
> @@ -1599,6 +1625,37 @@ static int sel_make_avc_files(struct dentry *dir)
>   	return 0;
>   }
>   
> +static int sel_make_ss_files(struct dentry *dir)
> +{
> +	struct super_block *sb = dir->d_sb;
> +	struct selinux_fs_info *fsi = sb->s_fs_info;
> +	int i;
> +	static struct tree_descr files[] = {
> +		{ "sidtab_hash_stats", &sel_sidtab_hash_stats_ops, S_IRUGO },
> +	};
> +
> +	for (i = 0; i < ARRAY_SIZE(files); i++) {
> +		struct inode *inode;
> +		struct dentry *dentry;
> +
> +		dentry = d_alloc_name(dir, files[i].name);
> +		if (!dentry)
> +			return -ENOMEM;
> +
> +		inode = sel_make_inode(dir->d_sb, S_IFREG|files[i].mode);
> +		if (!inode) {
> +			dput(dentry);
> +			return -ENOMEM;
> +		}
> +
> +		inode->i_fop = files[i].ops;
> +		inode->i_ino = ++fsi->last_ino;
> +		d_add(dentry, inode);
> +	}
> +
> +	return 0;
> +}
> +
>   static ssize_t sel_read_initcon(struct file *file, char __user *buf,
>   				size_t count, loff_t *ppos)
>   {
> @@ -1963,6 +2020,14 @@ static int sel_fill_super(struct super_block *sb, struct fs_context *fc)
>   	}
>   
>   	ret = sel_make_avc_files(dentry);
> +
> +	dentry = sel_make_dir(sb->s_root, "ss", &fsi->last_ino);
> +	if (IS_ERR(dentry)) {
> +		ret = PTR_ERR(dentry);
> +		goto err;
> +	}
> +
> +	ret = sel_make_ss_files(dentry);
>   	if (ret)
>   		goto err;
>   
> diff --git a/security/selinux/ss/context.h b/security/selinux/ss/context.h
> index 513e67f48878..3ba044fe02ed 100644
> --- a/security/selinux/ss/context.h
> +++ b/security/selinux/ss/context.h
> @@ -31,6 +31,7 @@ struct context {
>   	u32 len;        /* length of string in bytes */
>   	struct mls_range range;
>   	char *str;	/* string representation if context cannot be mapped. */
> +	u32 hash;	/* a hash of the string representation */
>   };
>   
>   static inline void mls_context_init(struct context *c)
> @@ -168,12 +169,13 @@ static inline int context_cpy(struct context *dst, struct context *src)
>   		kfree(dst->str);
>   		return rc;
>   	}
> +	dst->hash = src->hash;
>   	return 0;
>   }
>   
>   static inline void context_destroy(struct context *c)
>   {
> -	c->user = c->role = c->type = 0;
> +	c->user = c->role = c->type = c->hash = 0;
>   	kfree(c->str);
>   	c->str = NULL;
>   	c->len = 0;
> @@ -182,6 +184,8 @@ static inline void context_destroy(struct context *c)
>   
>   static inline int context_cmp(struct context *c1, struct context *c2)
>   {
> +	if (c1->hash && c2->hash && (c1->hash != c2->hash))
> +		return 0;
>   	if (c1->len && c2->len)
>   		return (c1->len == c2->len && !strcmp(c1->str, c2->str));
>   	if (c1->len || c2->len)
> @@ -192,5 +196,10 @@ static inline int context_cmp(struct context *c1, struct context *c2)
>   		mls_context_cmp(c1, c2));
>   }
>   
> +static inline unsigned int context_compute_hash(const char *s)
> +{
> +	return full_name_hash(NULL, s, strlen(s));
> +}
> +
>   #endif	/* _SS_CONTEXT_H_ */
>   
> diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
> index e20624a68f5d..e369b0092cdf 100644
> --- a/security/selinux/ss/policydb.c
> +++ b/security/selinux/ss/policydb.c
> @@ -878,6 +878,11 @@ int policydb_load_isids(struct policydb *p, struct sidtab *s)
>   			sidtab_destroy(s);
>   			goto out;
>   		}
> +		rc = context_add_hash(p, &c->context[0]);
> +		if (rc) {
> +			sidtab_destroy(s);
> +			goto out;
> +		}
>   
>   		rc = sidtab_set_initial(s, c->sid[0], &c->context[0]);
>   		if (rc) {
> diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
> index 3a29e7c24ba9..f5287008d843 100644
> --- a/security/selinux/ss/services.c
> +++ b/security/selinux/ss/services.c
> @@ -1257,6 +1257,17 @@ static int context_struct_to_string(struct policydb *p,
>   
>   #include "initial_sid_to_string.h"
>   
> +int security_sidtab_hash_stats(struct selinux_state *state, char *page)
> +{
> +	int rc;
> +
> +	read_lock(&state->ss->policy_rwlock);
> +	rc = sidtab_hash_stats(state->ss->sidtab, page);
> +	read_unlock(&state->ss->policy_rwlock);
> +
> +	return rc;
> +}
> +
>   const char *security_get_initial_sid_context(u32 sid)
>   {
>   	if (unlikely(sid > SECINITSID_NUM))
> @@ -1449,6 +1460,42 @@ static int string_to_context_struct(struct policydb *pol,
>   	return rc;
>   }
>   
> +int context_add_hash(struct policydb *policydb,
> +		     struct context *context)
> +{
> +	int rc;
> +	char *str;
> +	int len;
> +
> +	if (context->str) {
> +		context->hash = context_compute_hash(context->str);
> +	} else {
> +		rc = context_struct_to_string(policydb, context,
> +					      &str, &len);
> +		if (rc)
> +			return rc;
> +		context->hash = context_compute_hash(str);
> +		kfree(str);
> +	}
> +	return 0;
> +}
> +
> +static int context_struct_to_sid(struct selinux_state *state,
> +				 struct context *context, u32 *sid)
> +{
> +	int rc;
> +	struct sidtab *sidtab = state->ss->sidtab;
> +	struct policydb *policydb = &state->ss->policydb;
> +
> +	if (!context->hash) {
> +		rc = context_add_hash(policydb, context);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return sidtab_context_to_sid(sidtab, context, sid);
> +}
> +
>   static int security_context_to_sid_core(struct selinux_state *state,
>   					const char *scontext, u32 scontext_len,
>   					u32 *sid, u32 def_sid, gfp_t gfp_flags,
> @@ -1501,7 +1548,7 @@ static int security_context_to_sid_core(struct selinux_state *state,
>   		str = NULL;
>   	} else if (rc)
>   		goto out_unlock;
> -	rc = sidtab_context_to_sid(sidtab, &context, sid);
> +	rc = context_struct_to_sid(state, &context, sid);
>   	context_destroy(&context);
>   out_unlock:
>   	read_unlock(&state->ss->policy_rwlock);
> @@ -1805,7 +1852,7 @@ static int security_compute_sid(struct selinux_state *state,
>   			goto out_unlock;
>   	}
>   	/* Obtain the sid for the context. */
> -	rc = sidtab_context_to_sid(sidtab, &newcontext, out_sid);
> +	rc = context_struct_to_sid(state, &newcontext, out_sid);
>   out_unlock:
>   	read_unlock(&state->ss->policy_rwlock);
>   	context_destroy(&newcontext);
> @@ -1950,6 +1997,7 @@ static int convert_context(struct context *oldc, struct context *newc, void *p)
>   			context_init(newc);
>   			newc->str = s;
>   			newc->len = oldc->len;
> +			newc->hash = oldc->hash;
>   			return 0;
>   		}
>   		kfree(s);
> @@ -2026,6 +2074,10 @@ static int convert_context(struct context *oldc, struct context *newc, void *p)
>   			goto bad;
>   	}
>   
> +	rc = context_add_hash(args->newp, newc);
> +	if (rc)
> +		goto bad;
> +
>   	return 0;
>   bad:
>   	/* Map old representation to string and save it. */
> @@ -2035,6 +2087,7 @@ static int convert_context(struct context *oldc, struct context *newc, void *p)
>   	context_destroy(newc);
>   	newc->str = s;
>   	newc->len = len;
> +	newc->hash = context_compute_hash(s);
>   	pr_info("SELinux:  Context %s became invalid (unmapped).\n",
>   		newc->str);
>   	return 0;
> @@ -2273,8 +2326,7 @@ int security_port_sid(struct selinux_state *state,
>   
>   	if (c) {
>   		if (!c->sid[0]) {
> -			rc = sidtab_context_to_sid(sidtab,
> -						   &c->context[0],
> +			rc = context_struct_to_sid(state, &c->context[0],
>   						   &c->sid[0]);
>   			if (rc)
>   				goto out;
> @@ -2299,14 +2351,12 @@ int security_ib_pkey_sid(struct selinux_state *state,
>   			 u64 subnet_prefix, u16 pkey_num, u32 *out_sid)
>   {
>   	struct policydb *policydb;
> -	struct sidtab *sidtab;
>   	struct ocontext *c;
>   	int rc = 0;
>   
>   	read_lock(&state->ss->policy_rwlock);
>   
>   	policydb = &state->ss->policydb;
> -	sidtab = state->ss->sidtab;
>   
>   	c = policydb->ocontexts[OCON_IBPKEY];
>   	while (c) {
> @@ -2320,7 +2370,7 @@ int security_ib_pkey_sid(struct selinux_state *state,
>   
>   	if (c) {
>   		if (!c->sid[0]) {
> -			rc = sidtab_context_to_sid(sidtab,
> +			rc = context_struct_to_sid(state,
>   						   &c->context[0],
>   						   &c->sid[0]);
>   			if (rc)
> @@ -2367,8 +2417,7 @@ int security_ib_endport_sid(struct selinux_state *state,
>   
>   	if (c) {
>   		if (!c->sid[0]) {
> -			rc = sidtab_context_to_sid(sidtab,
> -						   &c->context[0],
> +			rc = context_struct_to_sid(state, &c->context[0],
>   						   &c->sid[0]);
>   			if (rc)
>   				goto out;
> @@ -2409,13 +2458,11 @@ int security_netif_sid(struct selinux_state *state,
>   
>   	if (c) {
>   		if (!c->sid[0] || !c->sid[1]) {
> -			rc = sidtab_context_to_sid(sidtab,
> -						  &c->context[0],
> -						  &c->sid[0]);
> +			rc = context_struct_to_sid(state, &c->context[0],
> +						   &c->sid[0]);
>   			if (rc)
>   				goto out;
> -			rc = sidtab_context_to_sid(sidtab,
> -						   &c->context[1],
> +			rc = context_struct_to_sid(state, &c->context[1],
>   						   &c->sid[1]);
>   			if (rc)
>   				goto out;
> @@ -2456,14 +2503,12 @@ int security_node_sid(struct selinux_state *state,
>   		      u32 *out_sid)
>   {
>   	struct policydb *policydb;
> -	struct sidtab *sidtab;
>   	int rc;
>   	struct ocontext *c;
>   
>   	read_lock(&state->ss->policy_rwlock);
>   
>   	policydb = &state->ss->policydb;
> -	sidtab = state->ss->sidtab;
>   
>   	switch (domain) {
>   	case AF_INET: {
> @@ -2505,7 +2550,7 @@ int security_node_sid(struct selinux_state *state,
>   
>   	if (c) {
>   		if (!c->sid[0]) {
> -			rc = sidtab_context_to_sid(sidtab,
> +			rc = context_struct_to_sid(state,
>   						   &c->context[0],
>   						   &c->sid[0]);
>   			if (rc)
> @@ -2589,12 +2634,17 @@ int security_get_user_sids(struct selinux_state *state,
>   		usercon.role = i + 1;
>   		ebitmap_for_each_positive_bit(&role->types, tnode, j) {
>   			usercon.type = j + 1;
> +			/*
> +			 * The same context struct is reused here so the hash
> +			 * must be reset.
> +			 */
> +			usercon.hash = 0;
>   
>   			if (mls_setup_user_range(policydb, fromcon, user,
>   						 &usercon))
>   				continue;
>   
> -			rc = sidtab_context_to_sid(sidtab, &usercon, &sid);
> +			rc = context_struct_to_sid(state, &usercon, &sid);
>   			if (rc)
>   				goto out_unlock;
>   			if (mynel < maxnel) {
> @@ -2665,7 +2715,6 @@ static inline int __security_genfs_sid(struct selinux_state *state,
>   				       u32 *sid)
>   {
>   	struct policydb *policydb = &state->ss->policydb;
> -	struct sidtab *sidtab = state->ss->sidtab;
>   	int len;
>   	u16 sclass;
>   	struct genfs *genfs;
> @@ -2700,7 +2749,7 @@ static inline int __security_genfs_sid(struct selinux_state *state,
>   		goto out;
>   
>   	if (!c->sid[0]) {
> -		rc = sidtab_context_to_sid(sidtab, &c->context[0], &c->sid[0]);
> +		rc = context_struct_to_sid(state, &c->context[0], &c->sid[0]);
>   		if (rc)
>   			goto out;
>   	}
> @@ -2763,7 +2812,7 @@ int security_fs_use(struct selinux_state *state, struct super_block *sb)
>   	if (c) {
>   		sbsec->behavior = c->v.behavior;
>   		if (!c->sid[0]) {
> -			rc = sidtab_context_to_sid(sidtab, &c->context[0],
> +			rc = context_struct_to_sid(state, &c->context[0],
>   						   &c->sid[0]);
>   			if (rc)
>   				goto out;
> @@ -3019,8 +3068,7 @@ int security_sid_mls_copy(struct selinux_state *state,
>   			goto out_unlock;
>   		}
>   	}
> -
> -	rc = sidtab_context_to_sid(sidtab, &newcon, new_sid);
> +	rc = context_struct_to_sid(state, &newcon, new_sid);
>   out_unlock:
>   	read_unlock(&state->ss->policy_rwlock);
>   	context_destroy(&newcon);
> @@ -3613,7 +3661,7 @@ int security_netlbl_secattr_to_sid(struct selinux_state *state,
>   		if (!mls_context_isvalid(policydb, &ctx_new))
>   			goto out_free;
>   
> -		rc = sidtab_context_to_sid(sidtab, &ctx_new, sid);
> +		rc = context_struct_to_sid(state, &ctx_new, sid);
>   		if (rc)
>   			goto out_free;
>   
> diff --git a/security/selinux/ss/services.h b/security/selinux/ss/services.h
> index 9a36de860368..fc40640a9725 100644
> --- a/security/selinux/ss/services.h
> +++ b/security/selinux/ss/services.h
> @@ -8,7 +8,7 @@
>   #define _SS_SERVICES_H_
>   
>   #include "policydb.h"
> -#include "sidtab.h"
> +#include "context.h"
>   
>   /* Mapping for a single class */
>   struct selinux_mapping {
> @@ -39,4 +39,6 @@ void services_compute_xperms_drivers(struct extended_perms *xperms,
>   void services_compute_xperms_decision(struct extended_perms_decision *xpermd,
>   					struct avtab_node *node);
>   
> +int context_add_hash(struct policydb *policydb, struct context *context);
> +
>   #endif	/* _SS_SERVICES_H_ */
> diff --git a/security/selinux/ss/sidtab.c b/security/selinux/ss/sidtab.c
> index 7d49994e8d5f..97d6fedeff68 100644
> --- a/security/selinux/ss/sidtab.c
> +++ b/security/selinux/ss/sidtab.c
> @@ -17,26 +17,43 @@
>   #include "security.h"
>   #include "sidtab.h"
>   
> +#define index_to_sid(index) (index + SECINITSID_NUM + 1)
> +#define sid_to_index(sid) (sid - (SECINITSID_NUM + 1))
> +
>   int sidtab_init(struct sidtab *s)
>   {
>   	u32 i;
>   
>   	memset(s->roots, 0, sizeof(s->roots));
>   
> -	/* max count is SIDTAB_MAX so valid index is always < SIDTAB_MAX */
> -	for (i = 0; i < SIDTAB_RCACHE_SIZE; i++)
> -		s->rcache[i] = SIDTAB_MAX;
> -
>   	for (i = 0; i < SECINITSID_NUM; i++)
>   		s->isids[i].set = 0;
>   
>   	s->count = 0;
>   	s->convert = NULL;
> +	hash_init(s->context_to_sid);
>   
>   	spin_lock_init(&s->lock);
>   	return 0;
>   }
>   
> +static u32 context_to_sid(struct sidtab *s, struct context *context)
> +{
> +	struct sidtab_entry_leaf *entry;
> +	u32 sid = 0;
> +
> +	rcu_read_lock();
> +	hash_for_each_possible_rcu(s->context_to_sid, entry, list,
> +				   context->hash) {
> +		if (context_cmp(&entry->context, context)) {
> +			sid = entry->sid;
> +			break;
> +		}
> +	}
> +	rcu_read_unlock();
> +	return sid;
> +}
> +
>   int sidtab_set_initial(struct sidtab *s, u32 sid, struct context *context)
>   {
>   	struct sidtab_isid_entry *entry;
> @@ -47,14 +64,60 @@ int sidtab_set_initial(struct sidtab *s, u32 sid, struct context *context)
>   
>   	entry = &s->isids[sid - 1];
>   
> -	rc = context_cpy(&entry->context, context);
> +	rc = context_cpy(&entry->leaf.context, context);
>   	if (rc)
>   		return rc;
>   
>   	entry->set = 1;
> +
> +	/*
> +	 * Multiple initial sids may map to the same context. Check that this
> +	 * context is not already represented in the context_to_sid hashtable
> +	 * to avoid duplicate entries and long linked lists upon hash
> +	 * collision.
> +	 */
> +	if (!context_to_sid(s, context)) {
> +		entry->leaf.sid = sid;
> +		hash_add(s->context_to_sid, &entry->leaf.list, context->hash);
> +	}
> +
>   	return 0;
>   }
>   
> +int sidtab_hash_stats(struct sidtab *sidtab, char *page)
> +{
> +	int i;
> +	int chain_len = 0;
> +	int slots_used = 0;
> +	int entries = 0;
> +	int max_chain_len = 0;
> +	int cur_bucket = 0;
> +	struct sidtab_entry_leaf *entry;
> +
> +	rcu_read_lock();
> +	hash_for_each_rcu(sidtab->context_to_sid, i, entry, list) {
> +		entries++;
> +		if (i == cur_bucket) {
> +			chain_len++;
> +			if (chain_len == 1)
> +				slots_used++;
> +		} else {
> +			cur_bucket = i;
> +			if (chain_len > max_chain_len)
> +				max_chain_len = chain_len;
> +			chain_len = 0;
> +		}
> +	}
> +	rcu_read_unlock();
> +
> +	if (chain_len > max_chain_len)
> +		max_chain_len = chain_len;
> +
> +	return scnprintf(page, PAGE_SIZE, "entries: %d\nbuckets used: %d/%d\n"
> +			 "longest chain: %d\n", entries,
> +			 slots_used, SIDTAB_HASH_BUCKETS, max_chain_len);
> +}
> +
>   static u32 sidtab_level_from_count(u32 count)
>   {
>   	u32 capacity = SIDTAB_LEAF_ENTRIES;
> @@ -88,7 +151,8 @@ static int sidtab_alloc_roots(struct sidtab *s, u32 level)
>   	return 0;
>   }
>   
> -static struct context *sidtab_do_lookup(struct sidtab *s, u32 index, int alloc)
> +static struct sidtab_entry_leaf *sidtab_do_lookup(struct sidtab *s, u32 index,
> +						  int alloc)
>   {
>   	union sidtab_entry_inner *entry;
>   	u32 level, capacity_shift, leaf_index = index / SIDTAB_LEAF_ENTRIES;
> @@ -125,7 +189,7 @@ static struct context *sidtab_do_lookup(struct sidtab *s, u32 index, int alloc)
>   		if (!entry->ptr_leaf)
>   			return NULL;
>   	}
> -	return &entry->ptr_leaf->entries[index % SIDTAB_LEAF_ENTRIES].context;
> +	return &entry->ptr_leaf->entries[index % SIDTAB_LEAF_ENTRIES];
>   }
>   
>   static struct context *sidtab_lookup(struct sidtab *s, u32 index)
> @@ -136,12 +200,12 @@ static struct context *sidtab_lookup(struct sidtab *s, u32 index)
>   	if (index >= count)
>   		return NULL;
>   
> -	return sidtab_do_lookup(s, index, 0);
> +	return &sidtab_do_lookup(s, index, 0)->context;
>   }
>   
>   static struct context *sidtab_lookup_initial(struct sidtab *s, u32 sid)
>   {
> -	return s->isids[sid - 1].set ? &s->isids[sid - 1].context : NULL;
> +	return s->isids[sid - 1].set ? &s->isids[sid - 1].leaf.context : NULL;
>   }
>   
>   static struct context *sidtab_search_core(struct sidtab *s, u32 sid, int force)
> @@ -150,7 +214,7 @@ static struct context *sidtab_search_core(struct sidtab *s, u32 sid, int force)
>   
>   	if (sid != 0) {
>   		if (sid > SECINITSID_NUM)
> -			context = sidtab_lookup(s, sid - (SECINITSID_NUM + 1));
> +			context = sidtab_lookup(s, sid_to_index(sid));
>   		else
>   			context = sidtab_lookup_initial(s, sid);
>   		if (context && (!context->len || force))
> @@ -170,117 +234,30 @@ struct context *sidtab_search_force(struct sidtab *s, u32 sid)
>   	return sidtab_search_core(s, sid, 1);
>   }
>   
> -static int sidtab_find_context(union sidtab_entry_inner entry,
> -			       u32 *pos, u32 count, u32 level,
> -			       struct context *context, u32 *index)
> -{
> -	int rc;
> -	u32 i;
> -
> -	if (level != 0) {
> -		struct sidtab_node_inner *node = entry.ptr_inner;
> -
> -		i = 0;
> -		while (i < SIDTAB_INNER_ENTRIES && *pos < count) {
> -			rc = sidtab_find_context(node->entries[i],
> -						 pos, count, level - 1,
> -						 context, index);
> -			if (rc == 0)
> -				return 0;
> -			i++;
> -		}
> -	} else {
> -		struct sidtab_node_leaf *node = entry.ptr_leaf;
> -
> -		i = 0;
> -		while (i < SIDTAB_LEAF_ENTRIES && *pos < count) {
> -			if (context_cmp(&node->entries[i].context, context)) {
> -				*index = *pos;
> -				return 0;
> -			}
> -			(*pos)++;
> -			i++;
> -		}
> -	}
> -	return -ENOENT;
> -}
> -
> -static void sidtab_rcache_update(struct sidtab *s, u32 index, u32 pos)
> -{
> -	while (pos > 0) {
> -		WRITE_ONCE(s->rcache[pos], READ_ONCE(s->rcache[pos - 1]));
> -		--pos;
> -	}
> -	WRITE_ONCE(s->rcache[0], index);
> -}
> -
> -static void sidtab_rcache_push(struct sidtab *s, u32 index)
> -{
> -	sidtab_rcache_update(s, index, SIDTAB_RCACHE_SIZE - 1);
> -}
> -
> -static int sidtab_rcache_search(struct sidtab *s, struct context *context,
> -				u32 *index)
> -{
> -	u32 i;
> -
> -	for (i = 0; i < SIDTAB_RCACHE_SIZE; i++) {
> -		u32 v = READ_ONCE(s->rcache[i]);
> -
> -		if (v >= SIDTAB_MAX)
> -			continue;
> -
> -		if (context_cmp(sidtab_do_lookup(s, v, 0), context)) {
> -			sidtab_rcache_update(s, v, i);
> -			*index = v;
> -			return 0;
> -		}
> -	}
> -	return -ENOENT;
> -}
> -
> -static int sidtab_reverse_lookup(struct sidtab *s, struct context *context,
> -				 u32 *index)
> +int sidtab_context_to_sid(struct sidtab *s, struct context *context,
> +			  u32 *sid)
>   {
>   	unsigned long flags;
> -	u32 count, count_locked, level, pos;
> +	u32 count;
>   	struct sidtab_convert_params *convert;
> -	struct context *dst, *dst_convert;
> +	struct sidtab_entry_leaf *dst, *dst_convert;
>   	int rc;
>   
> -	rc = sidtab_rcache_search(s, context, index);
> -	if (rc == 0)
> -		return 0;
> -
> -	/* read entries only after reading count */
> -	count = smp_load_acquire(&s->count);
> -	level = sidtab_level_from_count(count);
> -
> -	pos = 0;
> -	rc = sidtab_find_context(s->roots[level], &pos, count, level,
> -				 context, index);
> -	if (rc == 0) {
> -		sidtab_rcache_push(s, *index);
> +	*sid = context_to_sid(s, context);
> +	if (*sid)
>   		return 0;
> -	}
>   
>   	/* lock-free search failed: lock, re-search, and insert if not found */
>   	spin_lock_irqsave(&s->lock, flags);
>   
> +	rc = 0;
> +	*sid = context_to_sid(s, context);
> +	if (*sid)
> +		goto out_unlock;
> +
> +	/* read entries only after reading count */
> +	count = smp_load_acquire(&s->count);
>   	convert = s->convert;
> -	count_locked = s->count;
> -	level = sidtab_level_from_count(count_locked);
> -
> -	/* if count has changed before we acquired the lock, then catch up */
> -	while (count < count_locked) {
> -		if (context_cmp(sidtab_do_lookup(s, count, 0), context)) {
> -			sidtab_rcache_push(s, count);
> -			*index = count;
> -			rc = 0;
> -			goto out_unlock;
> -		}
> -		++count;
> -	}
>   
>   	/* bail out if we already reached max entries */
>   	rc = -EOVERFLOW;
> @@ -293,7 +270,9 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct context *context,
>   	if (!dst)
>   		goto out_unlock;
>   
> -	rc = context_cpy(dst, context);
> +	dst->sid = index_to_sid(count);
> +
> +	rc = context_cpy(&dst->context, context);
>   	if (rc)
>   		goto out_unlock;
>   
> @@ -305,29 +284,32 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct context *context,
>   		rc = -ENOMEM;
>   		dst_convert = sidtab_do_lookup(convert->target, count, 1);
>   		if (!dst_convert) {
> -			context_destroy(dst);
> +			context_destroy(&dst->context);
>   			goto out_unlock;
>   		}
>   
> -		rc = convert->func(context, dst_convert, convert->args);
> +		rc = convert->func(context, &dst_convert->context,
> +				convert->args);
>   		if (rc) {
> -			context_destroy(dst);
> +			context_destroy(&dst->context);
>   			goto out_unlock;
>   		}
> -
> -		/* at this point we know the insert won't fail */
> +		dst_convert->sid = index_to_sid(count);
>   		convert->target->count = count + 1;
> +
> +		hash_add_rcu(convert->target->context_to_sid,
> +				&dst_convert->list, dst_convert->context.hash);
>   	}
>   
>   	if (context->len)
>   		pr_info("SELinux:  Context %s is not valid (left unmapped).\n",
>   			context->str);
>   
> -	sidtab_rcache_push(s, count);
> -	*index = count;
> +	*sid = index_to_sid(count);
>   
> -	/* write entries before writing new count */
> +	/* Write entries before updating count. */
>   	smp_store_release(&s->count, count + 1);
> +	hash_add_rcu(s->context_to_sid, &dst->list, dst->context.hash);
>   
>   	rc = 0;
>   out_unlock:
> @@ -335,25 +317,19 @@ static int sidtab_reverse_lookup(struct sidtab *s, struct context *context,
>   	return rc;
>   }
>   
> -int sidtab_context_to_sid(struct sidtab *s, struct context *context, u32 *sid)
> +static void sidtab_convert_hashtable(struct sidtab *s, u32 count)
>   {
> -	int rc;
> +	struct sidtab_entry_leaf *entry;
>   	u32 i;
>   
> -	for (i = 0; i < SECINITSID_NUM; i++) {
> -		struct sidtab_isid_entry *entry = &s->isids[i];
> +	for (i = 0; i < count; i++) {
> +		entry = sidtab_do_lookup(s, i, 0);
> +		entry->sid = index_to_sid(i);
>   
> -		if (entry->set && context_cmp(context, &entry->context)) {
> -			*sid = i + 1;
> -			return 0;
> -		}
> -	}
> +		hash_add_rcu(s->context_to_sid, &entry->list,
> +				entry->context.hash);
>   
> -	rc = sidtab_reverse_lookup(s, context, sid);
> -	if (rc)
> -		return rc;
> -	*sid += SECINITSID_NUM + 1;
> -	return 0;
> +	}
>   }
>   
>   static int sidtab_convert_tree(union sidtab_entry_inner *edst,
> @@ -400,6 +376,7 @@ static int sidtab_convert_tree(union sidtab_entry_inner *edst,
>   		}
>   		cond_resched();
>   	}
> +
>   	return 0;
>   }
>   
> @@ -435,7 +412,7 @@ int sidtab_convert(struct sidtab *s, struct sidtab_convert_params *params)
>   	/* enable live convert of new entries */
>   	s->convert = params;
>   
> -	/* we can safely do the rest of the conversion outside the lock */
> +	/* we can safely convert the tree outside the lock */
>   	spin_unlock_irqrestore(&s->lock, flags);
>   
>   	pr_info("SELinux:  Converting %u SID table entries...\n", count);
> @@ -449,8 +426,17 @@ int sidtab_convert(struct sidtab *s, struct sidtab_convert_params *params)
>   		spin_lock_irqsave(&s->lock, flags);
>   		s->convert = NULL;
>   		spin_unlock_irqrestore(&s->lock, flags);
> +		return rc;
>   	}
> -	return rc;
> +	/*
> +	 * The hashtable can also be modified in sidtab_context_to_sid()
> +	 * so we must re-acquire the lock here.
> +	 */
> +	spin_lock_irqsave(&s->lock, flags);
> +	sidtab_convert_hashtable(params->target, count);
> +	spin_unlock_irqrestore(&s->lock, flags);
> +
> +	return 0;
>   }
>   
>   static void sidtab_destroy_tree(union sidtab_entry_inner entry, u32 level)
> @@ -484,11 +470,16 @@ void sidtab_destroy(struct sidtab *s)
>   
>   	for (i = 0; i < SECINITSID_NUM; i++)
>   		if (s->isids[i].set)
> -			context_destroy(&s->isids[i].context);
> +			context_destroy(&s->isids[i].leaf.context);
>   
>   	level = SIDTAB_MAX_LEVEL;
>   	while (level && !s->roots[level].ptr_inner)
>   		--level;
>   
>   	sidtab_destroy_tree(s->roots[level], level);
> +	/*
> +	 * The context_to_sid hashtable's objects are all shared
> +	 * with the isids array and context tree, and so don't need
> +	 * to be cleaned up here.
> +	 */
>   }
> diff --git a/security/selinux/ss/sidtab.h b/security/selinux/ss/sidtab.h
> index 1f4763141aa1..e2809401c417 100644
> --- a/security/selinux/ss/sidtab.h
> +++ b/security/selinux/ss/sidtab.h
> @@ -13,11 +13,14 @@
>   
>   #include <linux/spinlock_types.h>
>   #include <linux/log2.h>
> +#include <linux/hashtable.h>
>   
>   #include "context.h"
>   
>   struct sidtab_entry_leaf {
> +	u32 sid;
>   	struct context context;
> +	struct hlist_node list;
>   };
>   
>   struct sidtab_node_inner;
> @@ -57,7 +60,7 @@ struct sidtab_node_inner {
>   
>   struct sidtab_isid_entry {
>   	int set;
> -	struct context context;
> +	struct sidtab_entry_leaf leaf;
>   };
>   
>   struct sidtab_convert_params {
> @@ -66,7 +69,8 @@ struct sidtab_convert_params {
>   	struct sidtab *target;
>   };
>   
> -#define SIDTAB_RCACHE_SIZE 3
> +#define SIDTAB_HASH_BITS CONFIG_SECURITY_SELINUX_SIDTAB_HASH_BITS
> +#define SIDTAB_HASH_BUCKETS (1 << SIDTAB_HASH_BITS)
>   
>   struct sidtab {
>   	/*
> @@ -83,11 +87,11 @@ struct sidtab {
>   	struct sidtab_convert_params *convert;
>   	spinlock_t lock;
>   
> -	/* reverse lookup cache - access atomically via {READ|WRITE}_ONCE() */
> -	u32 rcache[SIDTAB_RCACHE_SIZE];
> -
>   	/* index == SID - 1 (no entry for SECSID_NULL) */
>   	struct sidtab_isid_entry isids[SECINITSID_NUM];
> +
> +	/* Hash table for fast reverse context-to-sid lookups. */
> +	DECLARE_HASHTABLE(context_to_sid, SIDTAB_HASH_BITS);
>   };
>   
>   int sidtab_init(struct sidtab *s);
> @@ -101,6 +105,8 @@ int sidtab_context_to_sid(struct sidtab *s, struct context *context, u32 *sid);
>   
>   void sidtab_destroy(struct sidtab *s);
>   
> +int sidtab_hash_stats(struct sidtab *sidtab, char *page);
> +
>   #endif	/* _SS_SIDTAB_H_ */
>   
>   
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-11-22  9:33 [PATCH v9] selinux: sidtab: reverse lookup hash table Jeff Vander Stoep
  2019-11-22 14:21 ` Stephen Smalley
@ 2019-12-03  0:32 ` Paul Moore
  2019-12-04  9:11   ` Ondrej Mosnacek
  2019-12-09 21:17   ` Paul Moore
  1 sibling, 2 replies; 15+ messages in thread
From: Paul Moore @ 2019-12-03  0:32 UTC (permalink / raw)
  To: Jeff Vander Stoep
  Cc: selinux, omosnace, Stephen Smalley, will, paulmck, rcu, Jovana Knezevic

On Fri, Nov 22, 2019 at 4:33 AM Jeff Vander Stoep <jeffv@google.com> wrote:
> This replaces the reverse table lookup and reverse cache with a
> hashtable which improves cache-miss reverse-lookup times from
> O(n) to O(1)* and maintains the same performance as a reverse
> cache hit.
>
> This reduces the time needed to add a new sidtab entry from ~500us
> to 5us on a Pixel 3 when there are ~10,000 sidtab entries.
>
> The implementation uses the kernel's generic hashtable API,
> It uses the context's string represtation as the hash source,
> and the kernels generic string hashing algorithm full_name_hash()
> to reduce the string to a 32 bit value.
>
> This change also maintains the improvement introduced in
> commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve
> performance") which removed the need to keep the current sidtab
> locked during policy reload. It does however introduce periodic
> locking of the target sidtab while converting the hashtable. Sidtab
> entries are never modified or removed, so the context struct stored
> in the sid_to_context tree can also be used for the context_to_sid
> hashtable to reduce memory usage.
>
> This bug was reported by:
> - On the selinux bug tracker.
>   BUG: kernel softlockup due to too many SIDs/contexts #37
>   https://github.com/SELinuxProject/selinux-kernel/issues/37
> - Jovana Knezevic on Android's bugtracker.
>   Bug: 140252993
>   "During multi-user performance testing, we create and remove users
>   many times. selinux_android_restorecon_pkgdir goes from 1ms to over
>   20ms after about 200 user creations and removals. Accumulated over
>   ~280 packages, that adds a significant time to user creation,
>   making perf benchmarks unreliable."
>
> * Hashtable lookup is only O(1) when n < the number of buckets.
>
> Changes in V2:
> -The hashtable uses sidtab_entry_leaf objects directly so these
> objects are shared between the sid_to_context lookup tree and the
> context_to_sid hashtable. This simplifies memory allocation and
> was suggested by Ondrej Mosnacek.
> -The new sidtab hash stats file in selinuxfs has been moved out of
> the avc dir and into a new "ss" dir.
>
> V3:
> -Add lock nesting notation.
>
> V4/V5:
> -Moved to *_rcu variants of the various hashtable functions
> as suggested by Will Deacon.
> -Naming/spelling fixups.
>
> V6
> -Remove nested locking. Use lock of active sidtab to gate
> access to the new sidtab.
> -Remove use of rcu_head/kfree_rcu(), they're unnecessary because
> hashtable objects are never removed when read/add operations are
> occurring. Why is this safe? Quoting Ondrej Mosnacek from the
> selinux mailing list:
> "It is not visible in this patch, but the sidtab (along with other
> policy-lifetime structures) is protected by a big fat read-write lock.
> The only places where sidtab_destroy() is called are (a) error paths
> when initializing a new sidtab (here the sidtab isn't shared yet, so
> no race) and (b) when freeing the old sidtab during policy reload - in
> this case it is happening after a policy write-locked critical
> section, which had removed the old sidtab pointer from the shared
> structures, so at that point all sidtab readers will already be
> accessing the new sidtab and the old one is visible only by the thread
> doing the destruction."
>
> V7
> -Change format of /sys/fs/selinux/ss/sidtab_hash_stats to match
> /sys/fs/selinux/avc/hash_stats.
> -Add __rcu annotation to rcu pointers.
> -Test with CONFIG_SPARSE_RCU_POINTER and CONFIG_PROVE_RCU.
> -Add rcu@vger.kernel.org and Paul McKenney to Cc for review of the
> RCU logic.
>
> V8
> -Removed the __rcu annotation used in V7. The annotation is
> intended to be applied to pointers to an object, however the
> objects referenced in the rcu hashtable are allocated in an
> array.
> -Fixed bug where multiple SIDs were receiving the same hash
> due to security_get_user_sids() reusing the same context
> struct without calling context_init() on it. This bug was
> discovered and root-caused by Stephen Smalley.
>
> V9
> -Do not compute the hash in string_to_context_struct
> because this string representation is non-canonical.
>
> Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
> Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
> Reported-by: Jovana Knezevic <jovanak@google.com>
> ---
>  security/selinux/Kconfig            |  12 ++
>  security/selinux/include/security.h |   1 +
>  security/selinux/selinuxfs.c        |  65 +++++++
>  security/selinux/ss/context.h       |  11 +-
>  security/selinux/ss/policydb.c      |   5 +
>  security/selinux/ss/services.c      |  96 +++++++---
>  security/selinux/ss/services.h      |   4 +-
>  security/selinux/ss/sidtab.c        | 263 ++++++++++++++--------------
>  security/selinux/ss/sidtab.h        |  16 +-
>  9 files changed, 306 insertions(+), 167 deletions(-)

Thanks Jeff, as well as everyone else who contributed reviews and feedback.

I've pulled this into a working branch and I'll be merging it with the
other sidtab patches before posting it to a "next-queue" branch for
review later this week.  When done, I'll send a note to the list, as
well as the relevant patch authors; your help in reviewing the merge
would be greatly appreciated.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-12-03  0:32 ` Paul Moore
@ 2019-12-04  9:11   ` Ondrej Mosnacek
  2019-12-04 15:48     ` Stephen Smalley
  2019-12-04 23:52     ` Paul Moore
  2019-12-09 21:17   ` Paul Moore
  1 sibling, 2 replies; 15+ messages in thread
From: Ondrej Mosnacek @ 2019-12-04  9:11 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jeff Vander Stoep, SElinux list, Stephen Smalley, Will Deacon,
	Paul E. McKenney, rcu, Jovana Knezevic

On Tue, Dec 3, 2019 at 1:33 AM Paul Moore <paul@paul-moore.com> wrote:
> On Fri, Nov 22, 2019 at 4:33 AM Jeff Vander Stoep <jeffv@google.com> wrote:
> > This replaces the reverse table lookup and reverse cache with a
> > hashtable which improves cache-miss reverse-lookup times from
> > O(n) to O(1)* and maintains the same performance as a reverse
> > cache hit.
> >
> > This reduces the time needed to add a new sidtab entry from ~500us
> > to 5us on a Pixel 3 when there are ~10,000 sidtab entries.
> >
> > The implementation uses the kernel's generic hashtable API,
> > It uses the context's string represtation as the hash source,
> > and the kernels generic string hashing algorithm full_name_hash()
> > to reduce the string to a 32 bit value.
> >
> > This change also maintains the improvement introduced in
> > commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve
> > performance") which removed the need to keep the current sidtab
> > locked during policy reload. It does however introduce periodic
> > locking of the target sidtab while converting the hashtable. Sidtab
> > entries are never modified or removed, so the context struct stored
> > in the sid_to_context tree can also be used for the context_to_sid
> > hashtable to reduce memory usage.
> >
> > This bug was reported by:
> > - On the selinux bug tracker.
> >   BUG: kernel softlockup due to too many SIDs/contexts #37
> >   https://github.com/SELinuxProject/selinux-kernel/issues/37
> > - Jovana Knezevic on Android's bugtracker.
> >   Bug: 140252993
> >   "During multi-user performance testing, we create and remove users
> >   many times. selinux_android_restorecon_pkgdir goes from 1ms to over
> >   20ms after about 200 user creations and removals. Accumulated over
> >   ~280 packages, that adds a significant time to user creation,
> >   making perf benchmarks unreliable."
> >
> > * Hashtable lookup is only O(1) when n < the number of buckets.
> >
> > Changes in V2:
> > -The hashtable uses sidtab_entry_leaf objects directly so these
> > objects are shared between the sid_to_context lookup tree and the
> > context_to_sid hashtable. This simplifies memory allocation and
> > was suggested by Ondrej Mosnacek.
> > -The new sidtab hash stats file in selinuxfs has been moved out of
> > the avc dir and into a new "ss" dir.
> >
> > V3:
> > -Add lock nesting notation.
> >
> > V4/V5:
> > -Moved to *_rcu variants of the various hashtable functions
> > as suggested by Will Deacon.
> > -Naming/spelling fixups.
> >
> > V6
> > -Remove nested locking. Use lock of active sidtab to gate
> > access to the new sidtab.
> > -Remove use of rcu_head/kfree_rcu(), they're unnecessary because
> > hashtable objects are never removed when read/add operations are
> > occurring. Why is this safe? Quoting Ondrej Mosnacek from the
> > selinux mailing list:
> > "It is not visible in this patch, but the sidtab (along with other
> > policy-lifetime structures) is protected by a big fat read-write lock.
> > The only places where sidtab_destroy() is called are (a) error paths
> > when initializing a new sidtab (here the sidtab isn't shared yet, so
> > no race) and (b) when freeing the old sidtab during policy reload - in
> > this case it is happening after a policy write-locked critical
> > section, which had removed the old sidtab pointer from the shared
> > structures, so at that point all sidtab readers will already be
> > accessing the new sidtab and the old one is visible only by the thread
> > doing the destruction."
> >
> > V7
> > -Change format of /sys/fs/selinux/ss/sidtab_hash_stats to match
> > /sys/fs/selinux/avc/hash_stats.
> > -Add __rcu annotation to rcu pointers.
> > -Test with CONFIG_SPARSE_RCU_POINTER and CONFIG_PROVE_RCU.
> > -Add rcu@vger.kernel.org and Paul McKenney to Cc for review of the
> > RCU logic.
> >
> > V8
> > -Removed the __rcu annotation used in V7. The annotation is
> > intended to be applied to pointers to an object, however the
> > objects referenced in the rcu hashtable are allocated in an
> > array.
> > -Fixed bug where multiple SIDs were receiving the same hash
> > due to security_get_user_sids() reusing the same context
> > struct without calling context_init() on it. This bug was
> > discovered and root-caused by Stephen Smalley.
> >
> > V9
> > -Do not compute the hash in string_to_context_struct
> > because this string representation is non-canonical.
> >
> > Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
> > Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
> > Reported-by: Jovana Knezevic <jovanak@google.com>
> > ---
> >  security/selinux/Kconfig            |  12 ++
> >  security/selinux/include/security.h |   1 +
> >  security/selinux/selinuxfs.c        |  65 +++++++
> >  security/selinux/ss/context.h       |  11 +-
> >  security/selinux/ss/policydb.c      |   5 +
> >  security/selinux/ss/services.c      |  96 +++++++---
> >  security/selinux/ss/services.h      |   4 +-
> >  security/selinux/ss/sidtab.c        | 263 ++++++++++++++--------------
> >  security/selinux/ss/sidtab.h        |  16 +-
> >  9 files changed, 306 insertions(+), 167 deletions(-)
>
> Thanks Jeff, as well as everyone else who contributed reviews and feedback.
>
> I've pulled this into a working branch and I'll be merging it with the
> other sidtab patches before posting it to a "next-queue" branch for
> review later this week.  When done, I'll send a note to the list, as
> well as the relevant patch authors; your help in reviewing the merge
> would be greatly appreciated.

I tried doing the merge on my own here [1], you can use it as a sanity
check if we came to the same/similar result. I based it off your
existing next-queue, which contains only Jeff's patch at the time of
writing. I only build-tested it so far.

Note that there are two whitespace cleanups included in the string
cache commit that I intuitively did while resolving the merge
conflicts. You might want to move those to the first commit or just
ignore them.

[1] https://gitlab.com/omos/linux-public/compare/selinux-next...rebase-selinux-sidtab-string-cache

-- 
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-12-04  9:11   ` Ondrej Mosnacek
@ 2019-12-04 15:48     ` Stephen Smalley
  2019-12-04 23:52     ` Paul Moore
  1 sibling, 0 replies; 15+ messages in thread
From: Stephen Smalley @ 2019-12-04 15:48 UTC (permalink / raw)
  To: Ondrej Mosnacek, Paul Moore
  Cc: Jeff Vander Stoep, SElinux list, Will Deacon, Paul E. McKenney,
	rcu, Jovana Knezevic

On 12/4/19 4:11 AM, Ondrej Mosnacek wrote:
> On Tue, Dec 3, 2019 at 1:33 AM Paul Moore <paul@paul-moore.com> wrote:
>> On Fri, Nov 22, 2019 at 4:33 AM Jeff Vander Stoep <jeffv@google.com> wrote:
>>> This replaces the reverse table lookup and reverse cache with a
>>> hashtable which improves cache-miss reverse-lookup times from
>>> O(n) to O(1)* and maintains the same performance as a reverse
>>> cache hit.
>>>
>>> This reduces the time needed to add a new sidtab entry from ~500us
>>> to 5us on a Pixel 3 when there are ~10,000 sidtab entries.
>>>
>>> The implementation uses the kernel's generic hashtable API,
>>> It uses the context's string represtation as the hash source,
>>> and the kernels generic string hashing algorithm full_name_hash()
>>> to reduce the string to a 32 bit value.
>>>
>>> This change also maintains the improvement introduced in
>>> commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve
>>> performance") which removed the need to keep the current sidtab
>>> locked during policy reload. It does however introduce periodic
>>> locking of the target sidtab while converting the hashtable. Sidtab
>>> entries are never modified or removed, so the context struct stored
>>> in the sid_to_context tree can also be used for the context_to_sid
>>> hashtable to reduce memory usage.
>>>
>>> This bug was reported by:
>>> - On the selinux bug tracker.
>>>    BUG: kernel softlockup due to too many SIDs/contexts #37
>>>    https://github.com/SELinuxProject/selinux-kernel/issues/37
>>> - Jovana Knezevic on Android's bugtracker.
>>>    Bug: 140252993
>>>    "During multi-user performance testing, we create and remove users
>>>    many times. selinux_android_restorecon_pkgdir goes from 1ms to over
>>>    20ms after about 200 user creations and removals. Accumulated over
>>>    ~280 packages, that adds a significant time to user creation,
>>>    making perf benchmarks unreliable."
>>>
>>> * Hashtable lookup is only O(1) when n < the number of buckets.
>>>
>>> Changes in V2:
>>> -The hashtable uses sidtab_entry_leaf objects directly so these
>>> objects are shared between the sid_to_context lookup tree and the
>>> context_to_sid hashtable. This simplifies memory allocation and
>>> was suggested by Ondrej Mosnacek.
>>> -The new sidtab hash stats file in selinuxfs has been moved out of
>>> the avc dir and into a new "ss" dir.
>>>
>>> V3:
>>> -Add lock nesting notation.
>>>
>>> V4/V5:
>>> -Moved to *_rcu variants of the various hashtable functions
>>> as suggested by Will Deacon.
>>> -Naming/spelling fixups.
>>>
>>> V6
>>> -Remove nested locking. Use lock of active sidtab to gate
>>> access to the new sidtab.
>>> -Remove use of rcu_head/kfree_rcu(), they're unnecessary because
>>> hashtable objects are never removed when read/add operations are
>>> occurring. Why is this safe? Quoting Ondrej Mosnacek from the
>>> selinux mailing list:
>>> "It is not visible in this patch, but the sidtab (along with other
>>> policy-lifetime structures) is protected by a big fat read-write lock.
>>> The only places where sidtab_destroy() is called are (a) error paths
>>> when initializing a new sidtab (here the sidtab isn't shared yet, so
>>> no race) and (b) when freeing the old sidtab during policy reload - in
>>> this case it is happening after a policy write-locked critical
>>> section, which had removed the old sidtab pointer from the shared
>>> structures, so at that point all sidtab readers will already be
>>> accessing the new sidtab and the old one is visible only by the thread
>>> doing the destruction."
>>>
>>> V7
>>> -Change format of /sys/fs/selinux/ss/sidtab_hash_stats to match
>>> /sys/fs/selinux/avc/hash_stats.
>>> -Add __rcu annotation to rcu pointers.
>>> -Test with CONFIG_SPARSE_RCU_POINTER and CONFIG_PROVE_RCU.
>>> -Add rcu@vger.kernel.org and Paul McKenney to Cc for review of the
>>> RCU logic.
>>>
>>> V8
>>> -Removed the __rcu annotation used in V7. The annotation is
>>> intended to be applied to pointers to an object, however the
>>> objects referenced in the rcu hashtable are allocated in an
>>> array.
>>> -Fixed bug where multiple SIDs were receiving the same hash
>>> due to security_get_user_sids() reusing the same context
>>> struct without calling context_init() on it. This bug was
>>> discovered and root-caused by Stephen Smalley.
>>>
>>> V9
>>> -Do not compute the hash in string_to_context_struct
>>> because this string representation is non-canonical.
>>>
>>> Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
>>> Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
>>> Reported-by: Jovana Knezevic <jovanak@google.com>
>>> ---
>>>   security/selinux/Kconfig            |  12 ++
>>>   security/selinux/include/security.h |   1 +
>>>   security/selinux/selinuxfs.c        |  65 +++++++
>>>   security/selinux/ss/context.h       |  11 +-
>>>   security/selinux/ss/policydb.c      |   5 +
>>>   security/selinux/ss/services.c      |  96 +++++++---
>>>   security/selinux/ss/services.h      |   4 +-
>>>   security/selinux/ss/sidtab.c        | 263 ++++++++++++++--------------
>>>   security/selinux/ss/sidtab.h        |  16 +-
>>>   9 files changed, 306 insertions(+), 167 deletions(-)
>>
>> Thanks Jeff, as well as everyone else who contributed reviews and feedback.
>>
>> I've pulled this into a working branch and I'll be merging it with the
>> other sidtab patches before posting it to a "next-queue" branch for
>> review later this week.  When done, I'll send a note to the list, as
>> well as the relevant patch authors; your help in reviewing the merge
>> would be greatly appreciated.
> 
> I tried doing the merge on my own here [1], you can use it as a sanity
> check if we came to the same/similar result. I based it off your
> existing next-queue, which contains only Jeff's patch at the time of
> writing. I only build-tested it so far.
> 
> Note that there are two whitespace cleanups included in the string
> cache commit that I intuitively did while resolving the merge
> conflicts. You might want to move those to the first commit or just
> ignore them.
> 
> [1] https://gitlab.com/omos/linux-public/compare/selinux-next...rebase-selinux-sidtab-string-cache

FWIW, this merge tested successfully for me.  Looks like we can't 
leverage the cached strings for the reverse hash computation rather than 
dynamically generating the string since we don't yet have a sidtab entry 
at that point IIUC.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-12-04  9:11   ` Ondrej Mosnacek
  2019-12-04 15:48     ` Stephen Smalley
@ 2019-12-04 23:52     ` Paul Moore
  2019-12-05 11:48       ` Ondrej Mosnacek
  1 sibling, 1 reply; 15+ messages in thread
From: Paul Moore @ 2019-12-04 23:52 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Jeff Vander Stoep, SElinux list, Stephen Smalley, Will Deacon,
	Paul E. McKenney, rcu, Jovana Knezevic

On Wed, Dec 4, 2019 at 4:11 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> On Tue, Dec 3, 2019 at 1:33 AM Paul Moore <paul@paul-moore.com> wrote:
> > Thanks Jeff, as well as everyone else who contributed reviews and feedback.
> >
> > I've pulled this into a working branch and I'll be merging it with the
> > other sidtab patches before posting it to a "next-queue" branch for
> > review later this week.  When done, I'll send a note to the list, as
> > well as the relevant patch authors; your help in reviewing the merge
> > would be greatly appreciated.
>
> I tried doing the merge on my own here [1], you can use it as a sanity
> check if we came to the same/similar result. I based it off your
> existing next-queue, which contains only Jeff's patch at the time of
> writing. I only build-tested it so far.

Thanks, that was a good sanity check.  There are some minor diffs from
what I ended up with, but nothing substantive that I can see.

Although I'll be honest, the merge wasn't as bad as I thought it would
be; most of the fuzz was simply due shuffling and renaming of data
structures, which generally isn't too bad.  Although I'm still
building the kernel to test it, so let's see if that statement still
holds (although it looks like it passed Stephen's testing). ;)

If you haven't noticed already, the merge currently lives in the
selinux/next-queue branch; if you notice anything off, feel free to
send a fixup patch.

> Note that there are two whitespace cleanups included in the string
> cache commit that I intuitively did while resolving the merge
> conflicts. You might want to move those to the first commit or just
> ignore them.

When looking at the combined diff between the two sidtab patches and
comparing it to your merge I did make a few additional small cosmetic
tweaks.  Assuming the testing goes well, I'll probably go over
everything one more time to make sure the style looks okay, but today
I was focusing more on the correctness.

> [1] https://gitlab.com/omos/linux-public/compare/selinux-next...rebase-selinux-sidtab-string-cache

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-12-04 23:52     ` Paul Moore
@ 2019-12-05 11:48       ` Ondrej Mosnacek
  2019-12-05 14:08         ` Paul Moore
  0 siblings, 1 reply; 15+ messages in thread
From: Ondrej Mosnacek @ 2019-12-05 11:48 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jeff Vander Stoep, SElinux list, Stephen Smalley, Will Deacon,
	Paul E. McKenney, rcu, Jovana Knezevic

On Thu, Dec 5, 2019 at 12:52 AM Paul Moore <paul@paul-moore.com> wrote:
> On Wed, Dec 4, 2019 at 4:11 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > On Tue, Dec 3, 2019 at 1:33 AM Paul Moore <paul@paul-moore.com> wrote:
> > > Thanks Jeff, as well as everyone else who contributed reviews and feedback.
> > >
> > > I've pulled this into a working branch and I'll be merging it with the
> > > other sidtab patches before posting it to a "next-queue" branch for
> > > review later this week.  When done, I'll send a note to the list, as
> > > well as the relevant patch authors; your help in reviewing the merge
> > > would be greatly appreciated.
> >
> > I tried doing the merge on my own here [1], you can use it as a sanity
> > check if we came to the same/similar result. I based it off your
> > existing next-queue, which contains only Jeff's patch at the time of
> > writing. I only build-tested it so far.
>
> Thanks, that was a good sanity check.  There are some minor diffs from
> what I ended up with, but nothing substantive that I can see.
>
> Although I'll be honest, the merge wasn't as bad as I thought it would
> be; most of the fuzz was simply due shuffling and renaming of data
> structures, which generally isn't too bad.  Although I'm still
> building the kernel to test it, so let's see if that statement still
> holds (although it looks like it passed Stephen's testing). ;)
>
> If you haven't noticed already, the merge currently lives in the
> selinux/next-queue branch; if you notice anything off, feel free to
> send a fixup patch.

It looks OK semantically when compared to my merge. I only see
reordering/comment/whitespace differences.

>
> > Note that there are two whitespace cleanups included in the string
> > cache commit that I intuitively did while resolving the merge
> > conflicts. You might want to move those to the first commit or just
> > ignore them.
>
> When looking at the combined diff between the two sidtab patches and
> comparing it to your merge I did make a few additional small cosmetic
> tweaks.  Assuming the testing goes well, I'll probably go over
> everything one more time to make sure the style looks okay, but today
> I was focusing more on the correctness.

The whitespace misalignment introduced by Jeff's patch is still there
in your branch. Personally, I'd prefer that we fix them now rather
than deferring it to a future patch, because it seems that no one ever
has time to bother sending whitespace fixup patches :) But I'll
understand it if you prefer not to touch it more than necessary, so I
won't fight about this further.

>
> > [1] https://gitlab.com/omos/linux-public/compare/selinux-next...rebase-selinux-sidtab-string-cache

--
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-12-05 11:48       ` Ondrej Mosnacek
@ 2019-12-05 14:08         ` Paul Moore
  2019-12-05 17:41           ` Paul Moore
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Moore @ 2019-12-05 14:08 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Jeff Vander Stoep, SElinux list, Stephen Smalley, Will Deacon,
	Paul E. McKenney, rcu, Jovana Knezevic

On Thu, Dec 5, 2019 at 6:47 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> On Thu, Dec 5, 2019 at 12:52 AM Paul Moore <paul@paul-moore.com> wrote:
> > On Wed, Dec 4, 2019 at 4:11 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > > On Tue, Dec 3, 2019 at 1:33 AM Paul Moore <paul@paul-moore.com> wrote:
> > > > Thanks Jeff, as well as everyone else who contributed reviews and feedback.
> > > >
> > > > I've pulled this into a working branch and I'll be merging it with the
> > > > other sidtab patches before posting it to a "next-queue" branch for
> > > > review later this week.  When done, I'll send a note to the list, as
> > > > well as the relevant patch authors; your help in reviewing the merge
> > > > would be greatly appreciated.
> > >
> > > I tried doing the merge on my own here [1], you can use it as a sanity
> > > check if we came to the same/similar result. I based it off your
> > > existing next-queue, which contains only Jeff's patch at the time of
> > > writing. I only build-tested it so far.
> >
> > Thanks, that was a good sanity check.  There are some minor diffs from
> > what I ended up with, but nothing substantive that I can see.
> >
> > Although I'll be honest, the merge wasn't as bad as I thought it would
> > be; most of the fuzz was simply due shuffling and renaming of data
> > structures, which generally isn't too bad.  Although I'm still
> > building the kernel to test it, so let's see if that statement still
> > holds (although it looks like it passed Stephen's testing). ;)
> >
> > If you haven't noticed already, the merge currently lives in the
> > selinux/next-queue branch; if you notice anything off, feel free to
> > send a fixup patch.
>
> It looks OK semantically when compared to my merge. I only see
> reordering/comment/whitespace differences.

Thanks for the double check.  Unfortunately my kernel build locks my
test VM in early boot; it appears to be non-SELinux related and since
the test build is based on selinux/next+patches (which is based off
v5.4-rc1) I imagine there might be some unrelated problems in the
build.  I'm going to rebase my test build to Linus' current and try
this again.

> > > Note that there are two whitespace cleanups included in the string
> > > cache commit that I intuitively did while resolving the merge
> > > conflicts. You might want to move those to the first commit or just
> > > ignore them.
> >
> > When looking at the combined diff between the two sidtab patches and
> > comparing it to your merge I did make a few additional small cosmetic
> > tweaks.  Assuming the testing goes well, I'll probably go over
> > everything one more time to make sure the style looks okay, but today
> > I was focusing more on the correctness.
>
> The whitespace misalignment introduced by Jeff's patch is still there
> in your branch. Personally, I'd prefer that we fix them now rather
> than deferring it to a future patch, because it seems that no one ever
> has time to bother sending whitespace fixup patches :) But I'll
> understand it if you prefer not to touch it more than necessary, so I
> won't fight about this further.

Like I said, I only quickly scanned the combined diff for style
problems so it doesn't surprise me that there are still some issues.
I'll give it a closer look once I can get a kernel build passing all
the tests.

As an aside, I keep debating doing a big style-fix patch (likely
automated via astyle or similar, likely with some additional fixes by
hand, and passed through checkpatch.pl) after one of the -rc1 rebases
to clean up all these little things that have crept into the code over
the years.  I dislike the idea of the churn that would likely bring,
but it should make life a little bit better, and help cut down on the
trivial "checkpatch patches" we get from time to time (although that
really hasn't been too bad of an issue).

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-12-05 14:08         ` Paul Moore
@ 2019-12-05 17:41           ` Paul Moore
  2019-12-05 18:10             ` Stephen Smalley
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Moore @ 2019-12-05 17:41 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: Jeff Vander Stoep, SElinux list, Stephen Smalley, Will Deacon,
	Paul E. McKenney, rcu, Jovana Knezevic

On Thu, Dec 5, 2019 at 9:08 AM Paul Moore <paul@paul-moore.com> wrote:
> Thanks for the double check.  Unfortunately my kernel build locks my
> test VM in early boot; it appears to be non-SELinux related and since
> the test build is based on selinux/next+patches (which is based off
> v5.4-rc1) I imagine there might be some unrelated problems in the
> build.  I'm going to rebase my test build to Linus' current and try
> this again.

Hmm.  I haven't done any debugging yet, but the BPF tests are failing
(they pass with kernel-5.5.0-0.rc0.git5.1.2.secnext.fc32.x86_64):

1..15
ok 1
Failed to load BPF prog: Invalid argument
not ok 2
#   Failed test at ./test line 68.
Failed to create BPF map: Permission denied
ok 3
Failed to create BPF map: Permission denied
ok 4
Failed to create BPF map: Permission denied
ok 5
Failed to load BPF prog: Permission denied
ok 6
Failed to load BPF prog: Invalid argument
ok 7
client: Using a BPF map fd
client: Connected to server via ./test_sock
server:  Accepted a connection, receiving message
client: Sent descriptor, waiting for reply
server:  Received a descriptor, fd=5, sending back 0
client: Received reply, code=0
client: ...This implies the descriptor was received
ok 8
Failed to load BPF prog: Invalid argument
client: Using a BPF prog fd
client: Connected to server via ./test_sock
sendmsg: Bad file descriptor
server:  Accepted a connection, receiving message
server:  Received no descriptor, sending back 1
not ok 9
#   Failed test at ./test line 118.
Failed to load BPF prog: Invalid argument
client: Using a BPF prog fd
connect: Connection refused
ok 10
client: Using a BPF map fd
connect: Connection refused
ok 11
ok 12
Client request_service_provider_fd() failing command BR_FAILED_REPLY, exiting.
ok 13
ok 14
Failed to load BPF prog: Invalid argument
Client request_service_provider_fd() failing command BR_FAILED_REPLY, exiting.
ok 15
# Looks like you failed 2 tests of 15.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-12-05 17:41           ` Paul Moore
@ 2019-12-05 18:10             ` Stephen Smalley
  2019-12-05 18:14               ` Paul Moore
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Smalley @ 2019-12-05 18:10 UTC (permalink / raw)
  To: Paul Moore, Ondrej Mosnacek
  Cc: Jeff Vander Stoep, SElinux list, Will Deacon, Paul E. McKenney,
	rcu, Jovana Knezevic

On 12/5/19 12:41 PM, Paul Moore wrote:
> On Thu, Dec 5, 2019 at 9:08 AM Paul Moore <paul@paul-moore.com> wrote:
>> Thanks for the double check.  Unfortunately my kernel build locks my
>> test VM in early boot; it appears to be non-SELinux related and since
>> the test build is based on selinux/next+patches (which is based off
>> v5.4-rc1) I imagine there might be some unrelated problems in the
>> build.  I'm going to rebase my test build to Linus' current and try
>> this again.
> 
> Hmm.  I haven't done any debugging yet, but the BPF tests are failing
> (they pass with kernel-5.5.0-0.rc0.git5.1.2.secnext.fc32.x86_64):
> 
> 1..15
> ok 1
> Failed to load BPF prog: Invalid argument
> not ok 2
> #   Failed test at ./test line 68.
> Failed to create BPF map: Permission denied
> ok 3
> Failed to create BPF map: Permission denied
> ok 4
> Failed to create BPF map: Permission denied
> ok 5
> Failed to load BPF prog: Permission denied
> ok 6
> Failed to load BPF prog: Invalid argument
> ok 7
> client: Using a BPF map fd
> client: Connected to server via ./test_sock
> server:  Accepted a connection, receiving message
> client: Sent descriptor, waiting for reply
> server:  Received a descriptor, fd=5, sending back 0
> client: Received reply, code=0
> client: ...This implies the descriptor was received
> ok 8
> Failed to load BPF prog: Invalid argument
> client: Using a BPF prog fd
> client: Connected to server via ./test_sock
> sendmsg: Bad file descriptor
> server:  Accepted a connection, receiving message
> server:  Received no descriptor, sending back 1
> not ok 9
> #   Failed test at ./test line 118.
> Failed to load BPF prog: Invalid argument
> client: Using a BPF prog fd
> connect: Connection refused
> ok 10
> client: Using a BPF map fd
> connect: Connection refused
> ok 11
> ok 12
> Client request_service_provider_fd() failing command BR_FAILED_REPLY, exiting.
> ok 13
> ok 14
> Failed to load BPF prog: Invalid argument
> Client request_service_provider_fd() failing command BR_FAILED_REPLY, exiting.
> ok 15
> # Looks like you failed 2 tests of 15.

They all pass for me (with your next-queue branch, using the 
selinux-testsuite defconfig fragment merged with the Fedora config).

The error above doesn't look SELinux-related; it looks like your kernel 
is rejecting the trivial bpf program used in the test code as being 
invalid for some reason.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-12-05 18:10             ` Stephen Smalley
@ 2019-12-05 18:14               ` Paul Moore
  2019-12-06  0:50                 ` Paul Moore
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Moore @ 2019-12-05 18:14 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Ondrej Mosnacek, Jeff Vander Stoep, SElinux list, Will Deacon,
	Paul E. McKenney, rcu, Jovana Knezevic

On Thu, Dec 5, 2019 at 1:10 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> On 12/5/19 12:41 PM, Paul Moore wrote:
> > On Thu, Dec 5, 2019 at 9:08 AM Paul Moore <paul@paul-moore.com> wrote:
> >> Thanks for the double check.  Unfortunately my kernel build locks my
> >> test VM in early boot; it appears to be non-SELinux related and since
> >> the test build is based on selinux/next+patches (which is based off
> >> v5.4-rc1) I imagine there might be some unrelated problems in the
> >> build.  I'm going to rebase my test build to Linus' current and try
> >> this again.
> >
> > Hmm.  I haven't done any debugging yet, but the BPF tests are failing
> > (they pass with kernel-5.5.0-0.rc0.git5.1.2.secnext.fc32.x86_64):
> >
> > 1..15
> > ok 1
> > Failed to load BPF prog: Invalid argument
> > not ok 2
> > #   Failed test at ./test line 68.
> > Failed to create BPF map: Permission denied
> > ok 3
> > Failed to create BPF map: Permission denied
> > ok 4
> > Failed to create BPF map: Permission denied
> > ok 5
> > Failed to load BPF prog: Permission denied
> > ok 6
> > Failed to load BPF prog: Invalid argument
> > ok 7
> > client: Using a BPF map fd
> > client: Connected to server via ./test_sock
> > server:  Accepted a connection, receiving message
> > client: Sent descriptor, waiting for reply
> > server:  Received a descriptor, fd=5, sending back 0
> > client: Received reply, code=0
> > client: ...This implies the descriptor was received
> > ok 8
> > Failed to load BPF prog: Invalid argument
> > client: Using a BPF prog fd
> > client: Connected to server via ./test_sock
> > sendmsg: Bad file descriptor
> > server:  Accepted a connection, receiving message
> > server:  Received no descriptor, sending back 1
> > not ok 9
> > #   Failed test at ./test line 118.
> > Failed to load BPF prog: Invalid argument
> > client: Using a BPF prog fd
> > connect: Connection refused
> > ok 10
> > client: Using a BPF map fd
> > connect: Connection refused
> > ok 11
> > ok 12
> > Client request_service_provider_fd() failing command BR_FAILED_REPLY, exiting.
> > ok 13
> > ok 14
> > Failed to load BPF prog: Invalid argument
> > Client request_service_provider_fd() failing command BR_FAILED_REPLY, exiting.
> > ok 15
> > # Looks like you failed 2 tests of 15.
>
> They all pass for me (with your next-queue branch, using the
> selinux-testsuite defconfig fragment merged with the Fedora config).

Oh goodie, I'm special :/

FWIW, my current test kernel is the next-queue branch rebased on top
of Linus' current tree, using the latest config from the secnext
kernel builds (Fedora Rawhide + stuff for the test suite).

> The error above doesn't look SELinux-related; it looks like your kernel
> is rejecting the trivial bpf program used in the test code as being
> invalid for some reason.

That's where I'm at as well, I'm building an instrumented kernel right
now to try and track down the source.  I'm sure it is something silly
like a messed up kernel config or something, but I'd like to
understand *why*.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-12-05 18:14               ` Paul Moore
@ 2019-12-06  0:50                 ` Paul Moore
  2019-12-06 13:45                   ` Stephen Smalley
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Moore @ 2019-12-06  0:50 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Ondrej Mosnacek, Jeff Vander Stoep, SElinux list, Will Deacon,
	Paul E. McKenney, rcu, Jovana Knezevic

On Thu, Dec 5, 2019 at 1:14 PM Paul Moore <paul@paul-moore.com> wrote:
> On Thu, Dec 5, 2019 at 1:10 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> > On 12/5/19 12:41 PM, Paul Moore wrote:
> > > Hmm.  I haven't done any debugging yet, but the BPF tests are failing
> > > (they pass with kernel-5.5.0-0.rc0.git5.1.2.secnext.fc32.x86_64):

...

> > They all pass for me (with your next-queue branch, using the
> > selinux-testsuite defconfig fragment merged with the Fedora config).
>
> Oh goodie, I'm special :/
>
> FWIW, my current test kernel is the next-queue branch rebased on top
> of Linus' current tree, using the latest config from the secnext
> kernel builds (Fedora Rawhide + stuff for the test suite).
>
> > The error above doesn't look SELinux-related; it looks like your kernel
> > is rejecting the trivial bpf program used in the test code as being
> > invalid for some reason.
>
> That's where I'm at as well, I'm building an instrumented kernel right
> now to try and track down the source.  I'm sure it is something silly
> like a messed up kernel config or something, but I'd like to
> understand *why*.

I traced the "./bpf_test -p" failure down to a BTF check in the BPF
verifier, there is a comment in that code block which helpfully reads:
"Either gcc or pahole or kernel are broken.".

 :/

The relevant commit is 8580ac9404f6 ("bpf: Process in-kernel BTF"),
and it appears to be new for v5.5; it isn't present in selinux/next or
selinux/next-queue.  Recompiling with CONFIG_DEBUG_INFO_BTF disabled
does allow "./bpf_test -p" to succeed, but I hit other BPF test
failures further along.  For reasons I don't understand, the secnext
kernel builds (which should have this code, and have
CONFIG_DEBUG_INFO_BTF enabled) are not hitting this problem, but that
may be due to differences in the build tools on the two systems
(although they *should* be the same).

Given that we haven't hit -rc1 yet, and everyone else's builds are
working just fine, I'm going to leave this alone for now.  Whatever
the problems may be, they definitely don't appear to be SELinux
related.

--
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-12-06  0:50                 ` Paul Moore
@ 2019-12-06 13:45                   ` Stephen Smalley
  2019-12-06 15:08                     ` Paul Moore
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Smalley @ 2019-12-06 13:45 UTC (permalink / raw)
  To: Paul Moore
  Cc: Ondrej Mosnacek, Jeff Vander Stoep, SElinux list, Will Deacon,
	Paul E. McKenney, rcu, Jovana Knezevic

On 12/5/19 7:50 PM, Paul Moore wrote:
> On Thu, Dec 5, 2019 at 1:14 PM Paul Moore <paul@paul-moore.com> wrote:
>> On Thu, Dec 5, 2019 at 1:10 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>>> On 12/5/19 12:41 PM, Paul Moore wrote:
>>>> Hmm.  I haven't done any debugging yet, but the BPF tests are failing
>>>> (they pass with kernel-5.5.0-0.rc0.git5.1.2.secnext.fc32.x86_64):
> 
> ...
> 
>>> They all pass for me (with your next-queue branch, using the
>>> selinux-testsuite defconfig fragment merged with the Fedora config).
>>
>> Oh goodie, I'm special :/
>>
>> FWIW, my current test kernel is the next-queue branch rebased on top
>> of Linus' current tree, using the latest config from the secnext
>> kernel builds (Fedora Rawhide + stuff for the test suite).
>>
>>> The error above doesn't look SELinux-related; it looks like your kernel
>>> is rejecting the trivial bpf program used in the test code as being
>>> invalid for some reason.
>>
>> That's where I'm at as well, I'm building an instrumented kernel right
>> now to try and track down the source.  I'm sure it is something silly
>> like a messed up kernel config or something, but I'd like to
>> understand *why*.
> 
> I traced the "./bpf_test -p" failure down to a BTF check in the BPF
> verifier, there is a comment in that code block which helpfully reads:
> "Either gcc or pahole or kernel are broken.".
> 
>   :/
> 
> The relevant commit is 8580ac9404f6 ("bpf: Process in-kernel BTF"),
> and it appears to be new for v5.5; it isn't present in selinux/next or
> selinux/next-queue.  Recompiling with CONFIG_DEBUG_INFO_BTF disabled
> does allow "./bpf_test -p" to succeed, but I hit other BPF test
> failures further along.  For reasons I don't understand, the secnext
> kernel builds (which should have this code, and have
> CONFIG_DEBUG_INFO_BTF enabled) are not hitting this problem, but that
> may be due to differences in the build tools on the two systems
> (although they *should* be the same).
> 
> Given that we haven't hit -rc1 yet, and everyone else's builds are
> working just fine, I'm going to leave this alone for now.  Whatever
> the problems may be, they definitely don't appear to be SELinux
> related.

I re-based next-queue on top of -linus, enabled CONFIG_DEBUG_INFO_BTF, 
rebuilt and booted new kernel, did a git clean -fdx in the 
selinux-testsuite directory, and built/ran the testsuite; bpf tests 
still passed for me.  This was on F31.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-12-06 13:45                   ` Stephen Smalley
@ 2019-12-06 15:08                     ` Paul Moore
  0 siblings, 0 replies; 15+ messages in thread
From: Paul Moore @ 2019-12-06 15:08 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Ondrej Mosnacek, Jeff Vander Stoep, SElinux list, Will Deacon,
	Paul E. McKenney, rcu, Jovana Knezevic

On Fri, Dec 6, 2019 at 8:45 AM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> On 12/5/19 7:50 PM, Paul Moore wrote:
> > On Thu, Dec 5, 2019 at 1:14 PM Paul Moore <paul@paul-moore.com> wrote:
> >> On Thu, Dec 5, 2019 at 1:10 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
> >>> On 12/5/19 12:41 PM, Paul Moore wrote:
> >>>> Hmm.  I haven't done any debugging yet, but the BPF tests are failing
> >>>> (they pass with kernel-5.5.0-0.rc0.git5.1.2.secnext.fc32.x86_64):
> >
> > ...
> >
> >>> They all pass for me (with your next-queue branch, using the
> >>> selinux-testsuite defconfig fragment merged with the Fedora config).
> >>
> >> Oh goodie, I'm special :/
> >>
> >> FWIW, my current test kernel is the next-queue branch rebased on top
> >> of Linus' current tree, using the latest config from the secnext
> >> kernel builds (Fedora Rawhide + stuff for the test suite).
> >>
> >>> The error above doesn't look SELinux-related; it looks like your kernel
> >>> is rejecting the trivial bpf program used in the test code as being
> >>> invalid for some reason.
> >>
> >> That's where I'm at as well, I'm building an instrumented kernel right
> >> now to try and track down the source.  I'm sure it is something silly
> >> like a messed up kernel config or something, but I'd like to
> >> understand *why*.
> >
> > I traced the "./bpf_test -p" failure down to a BTF check in the BPF
> > verifier, there is a comment in that code block which helpfully reads:
> > "Either gcc or pahole or kernel are broken.".
> >
> >   :/
> >
> > The relevant commit is 8580ac9404f6 ("bpf: Process in-kernel BTF"),
> > and it appears to be new for v5.5; it isn't present in selinux/next or
> > selinux/next-queue.  Recompiling with CONFIG_DEBUG_INFO_BTF disabled
> > does allow "./bpf_test -p" to succeed, but I hit other BPF test
> > failures further along.  For reasons I don't understand, the secnext
> > kernel builds (which should have this code, and have
> > CONFIG_DEBUG_INFO_BTF enabled) are not hitting this problem, but that
> > may be due to differences in the build tools on the two systems
> > (although they *should* be the same).
> >
> > Given that we haven't hit -rc1 yet, and everyone else's builds are
> > working just fine, I'm going to leave this alone for now.  Whatever
> > the problems may be, they definitely don't appear to be SELinux
> > related.
>
> I re-based next-queue on top of -linus, enabled CONFIG_DEBUG_INFO_BTF,
> rebuilt and booted new kernel, did a git clean -fdx in the
> selinux-testsuite directory, and built/ran the testsuite; bpf tests
> still passed for me.  This was on F31.

Thanks.  If I had another day to spend on this I'm reasonably
confident it would end up being the result of some oddity/difference
in the Rawhide toolchains, or ghosts.  Either way, this seems to be
both unrelated to the SELinux changes and not something anyone else is
seeing despite a number of different scenarios.  I still need to go
through the two patches and cleanup some of the whitespace issues we
talked about earlier, but barring anything weird happening (see the
previous ghost comment) I'll merge both of these patches into
selinux/next when the merge window closes this weekend.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9] selinux: sidtab: reverse lookup hash table
  2019-12-03  0:32 ` Paul Moore
  2019-12-04  9:11   ` Ondrej Mosnacek
@ 2019-12-09 21:17   ` Paul Moore
  1 sibling, 0 replies; 15+ messages in thread
From: Paul Moore @ 2019-12-09 21:17 UTC (permalink / raw)
  To: Jeff Vander Stoep
  Cc: selinux, omosnace, Stephen Smalley, will, paulmck, rcu, Jovana Knezevic

On Mon, Dec 2, 2019 at 7:32 PM Paul Moore <paul@paul-moore.com> wrote:
> On Fri, Nov 22, 2019 at 4:33 AM Jeff Vander Stoep <jeffv@google.com> wrote:
> > This replaces the reverse table lookup and reverse cache with a
> > hashtable which improves cache-miss reverse-lookup times from
> > O(n) to O(1)* and maintains the same performance as a reverse
> > cache hit.
> >
> > This reduces the time needed to add a new sidtab entry from ~500us
> > to 5us on a Pixel 3 when there are ~10,000 sidtab entries.
> >
> > The implementation uses the kernel's generic hashtable API,
> > It uses the context's string represtation as the hash source,
> > and the kernels generic string hashing algorithm full_name_hash()
> > to reduce the string to a 32 bit value.
> >
> > This change also maintains the improvement introduced in
> > commit ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve
> > performance") which removed the need to keep the current sidtab
> > locked during policy reload. It does however introduce periodic
> > locking of the target sidtab while converting the hashtable. Sidtab
> > entries are never modified or removed, so the context struct stored
> > in the sid_to_context tree can also be used for the context_to_sid
> > hashtable to reduce memory usage.
> >
> > This bug was reported by:
> > - On the selinux bug tracker.
> >   BUG: kernel softlockup due to too many SIDs/contexts #37
> >   https://github.com/SELinuxProject/selinux-kernel/issues/37
> > - Jovana Knezevic on Android's bugtracker.
> >   Bug: 140252993
> >   "During multi-user performance testing, we create and remove users
> >   many times. selinux_android_restorecon_pkgdir goes from 1ms to over
> >   20ms after about 200 user creations and removals. Accumulated over
> >   ~280 packages, that adds a significant time to user creation,
> >   making perf benchmarks unreliable."
> >
> > * Hashtable lookup is only O(1) when n < the number of buckets.
> >
> > Changes in V2:
> > -The hashtable uses sidtab_entry_leaf objects directly so these
> > objects are shared between the sid_to_context lookup tree and the
> > context_to_sid hashtable. This simplifies memory allocation and
> > was suggested by Ondrej Mosnacek.
> > -The new sidtab hash stats file in selinuxfs has been moved out of
> > the avc dir and into a new "ss" dir.
> >
> > V3:
> > -Add lock nesting notation.
> >
> > V4/V5:
> > -Moved to *_rcu variants of the various hashtable functions
> > as suggested by Will Deacon.
> > -Naming/spelling fixups.
> >
> > V6
> > -Remove nested locking. Use lock of active sidtab to gate
> > access to the new sidtab.
> > -Remove use of rcu_head/kfree_rcu(), they're unnecessary because
> > hashtable objects are never removed when read/add operations are
> > occurring. Why is this safe? Quoting Ondrej Mosnacek from the
> > selinux mailing list:
> > "It is not visible in this patch, but the sidtab (along with other
> > policy-lifetime structures) is protected by a big fat read-write lock.
> > The only places where sidtab_destroy() is called are (a) error paths
> > when initializing a new sidtab (here the sidtab isn't shared yet, so
> > no race) and (b) when freeing the old sidtab during policy reload - in
> > this case it is happening after a policy write-locked critical
> > section, which had removed the old sidtab pointer from the shared
> > structures, so at that point all sidtab readers will already be
> > accessing the new sidtab and the old one is visible only by the thread
> > doing the destruction."
> >
> > V7
> > -Change format of /sys/fs/selinux/ss/sidtab_hash_stats to match
> > /sys/fs/selinux/avc/hash_stats.
> > -Add __rcu annotation to rcu pointers.
> > -Test with CONFIG_SPARSE_RCU_POINTER and CONFIG_PROVE_RCU.
> > -Add rcu@vger.kernel.org and Paul McKenney to Cc for review of the
> > RCU logic.
> >
> > V8
> > -Removed the __rcu annotation used in V7. The annotation is
> > intended to be applied to pointers to an object, however the
> > objects referenced in the rcu hashtable are allocated in an
> > array.
> > -Fixed bug where multiple SIDs were receiving the same hash
> > due to security_get_user_sids() reusing the same context
> > struct without calling context_init() on it. This bug was
> > discovered and root-caused by Stephen Smalley.
> >
> > V9
> > -Do not compute the hash in string_to_context_struct
> > because this string representation is non-canonical.
> >
> > Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
> > Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
> > Reported-by: Jovana Knezevic <jovanak@google.com>
> > ---
> >  security/selinux/Kconfig            |  12 ++
> >  security/selinux/include/security.h |   1 +
> >  security/selinux/selinuxfs.c        |  65 +++++++
> >  security/selinux/ss/context.h       |  11 +-
> >  security/selinux/ss/policydb.c      |   5 +
> >  security/selinux/ss/services.c      |  96 +++++++---
> >  security/selinux/ss/services.h      |   4 +-
> >  security/selinux/ss/sidtab.c        | 263 ++++++++++++++--------------
> >  security/selinux/ss/sidtab.h        |  16 +-
> >  9 files changed, 306 insertions(+), 167 deletions(-)
>
> Thanks Jeff, as well as everyone else who contributed reviews and feedback.
>
> I've pulled this into a working branch and I'll be merging it with the
> other sidtab patches before posting it to a "next-queue" branch for
> review later this week.  When done, I'll send a note to the list, as
> well as the relevant patch authors; your help in reviewing the merge
> would be greatly appreciated.

FYI, this is now in selinux/next proper.  Thanks again everyone!

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-12-09 21:17 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-22  9:33 [PATCH v9] selinux: sidtab: reverse lookup hash table Jeff Vander Stoep
2019-11-22 14:21 ` Stephen Smalley
2019-12-03  0:32 ` Paul Moore
2019-12-04  9:11   ` Ondrej Mosnacek
2019-12-04 15:48     ` Stephen Smalley
2019-12-04 23:52     ` Paul Moore
2019-12-05 11:48       ` Ondrej Mosnacek
2019-12-05 14:08         ` Paul Moore
2019-12-05 17:41           ` Paul Moore
2019-12-05 18:10             ` Stephen Smalley
2019-12-05 18:14               ` Paul Moore
2019-12-06  0:50                 ` Paul Moore
2019-12-06 13:45                   ` Stephen Smalley
2019-12-06 15:08                     ` Paul Moore
2019-12-09 21:17   ` Paul Moore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).