* [PATCH 0/2] SELinux: Improve hashing
@ 2020-04-29 20:29 siarhei.liakh
  2020-04-29 20:29 ` [PATCH 1/2] SELinux: Add median to debug output of hash table stats siarhei.liakh
  2020-04-29 20:29 ` [PATCH 2/2] SELinux: Introduce hash3() as alternative to shift-xor siarhei.liakh
  0 siblings, 2 replies; 8+ messages in thread
From: siarhei.liakh @ 2020-04-29 20:29 UTC (permalink / raw)
  To: selinux
  Cc: colin.king, eparis, gregkh, jeffv, omosnace, paul,
	stephen.smalley.work, tglx

From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>

This is a follow-up to the original discussion of hashing within SELinux [1].
Based on the initial feedback received from Paul Moore and Ondrej Mosnacek,
I've re-focused and reduced the patch set down to the following two patches:

1. SELinux: Add median to debug output of hash table stats
2. SELinux: Introduce hash3() as alternative to shift-xor

The first one simply adds some extra stats to the debug output generated by
the hash tables. I found it to be useful during the investigation; however,
the patch is not required for the actual hash improvement to work or be visible.

The second patch contains the actual proposed hash function alternative,
which improves bucket distribution in the AVC and latency in the AVTab.
After substantial research and testing, this new function (hash3()) ended
up being just as good at hashing 3x u32s as (or even *very* slightly
better than) jhash_3words() from jhash.h, avtab_hash() (a version of
MurmurHash3), and even a version of xxhash unrolled for just 3x 32-bit
values. Please note that all of the above holds strictly in the context of
SELinux and the way it addresses its Access Vectors by a 3-tuple of values
(Source ID, Target ID, Class ID). I suspect that a significant part of the
surprising performance of hash3() (or, depending on one's point of view,
the rather poor performance of other "good" hashes) can be attributed to
the sequential nature of these IDs within a limited range.

Please keep in mind that the goal of this patch set is not to replace any
particular hash function with something clearly superior in one way or
another, but rather to find an acceptable standard which could be used
throughout SELinux to eliminate the multiple, substantially similar, yet
slightly different local implementations of hash functions within each of
the SELinux components.

See the second patch for the actual discussion of the new hash function and
its performance.

Please CC me directly on all replies.

Thank you!

References:

[1] https://lore.kernel.org/selinux/20200408182416.30995-1-siarhei.liakh@concurrent-rt.com/

Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>

Siarhei Liakh (2):
  SELinux: Add median to debug output of hash table stats
  SELinux: Introduce hash3() as alternative to shift-xor

 security/selinux/Kconfig          | 10 ++++
 security/selinux/avc.c            | 50 +++++++++++++---
 security/selinux/include/hash3.h  | 44 +++++++++++++++
 security/selinux/include/median.h | 67 ++++++++++++++++++++++
 security/selinux/ss/avtab.c       | 94 ++++++++++++++++---------------
 security/selinux/ss/avtab.h       |  1 +
 security/selinux/ss/hashtab.c     | 28 ++++++++-
 security/selinux/ss/hashtab.h     |  5 ++
 security/selinux/ss/policydb.c    | 14 +++--
 9 files changed, 252 insertions(+), 61 deletions(-)
 create mode 100644 security/selinux/include/hash3.h
 create mode 100644 security/selinux/include/median.h

-- 
2.17.1



* [PATCH 1/2] SELinux: Add median to debug output of hash table stats
  2020-04-29 20:29 [PATCH 0/2] SELinux: Improve hashing siarhei.liakh
@ 2020-04-29 20:29 ` siarhei.liakh
  2020-05-13 21:55   ` Paul Moore
  2020-04-29 20:29 ` [PATCH 2/2] SELinux: Introduce hash3() as alternative to shift-xor siarhei.liakh
  1 sibling, 1 reply; 8+ messages in thread
From: siarhei.liakh @ 2020-04-29 20:29 UTC (permalink / raw)
  To: selinux
  Cc: colin.king, eparis, gregkh, jeffv, omosnace, paul,
	stephen.smalley.work, tglx

From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>

This change introduces a median() function which is then used to report
the 25th, 50th, and 75th percentiles of the distribution of hash table
bucket chain lengths. This makes it easier to assess and compare the
relative effectiveness of different hash functions. Specifically, it helps
ensure that new functions not only reduce the maximum chain length, but
also improve (or, at least, have no negative impact on) the median.

Sample output before change:

avc:
entries: 508
buckets used: 213/512
longest chain: 10

policydb:
SELinux: roles:  14 entries and 6/16 buckets used, longest chain length 5

Sample output after the change:

avc:
entries: 508
buckets used: 217/512
longest chain: 9
non-zero chain Q1/Med/Q3: 1/2/3

policydb:
SELinux: roles:  14 entries and 6/16 buckets used, longest chain length 5
non-zero Q1/Med/Q3 1/2/4

Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
---
Please CC me directly on all replies.

 security/selinux/Kconfig          | 10 +++++
 security/selinux/avc.c            | 42 ++++++++++++++++---
 security/selinux/include/median.h | 67 +++++++++++++++++++++++++++++++
 security/selinux/ss/avtab.c       | 37 ++++++++++++++---
 security/selinux/ss/hashtab.c     | 28 ++++++++++++-
 security/selinux/ss/hashtab.h     |  5 +++
 security/selinux/ss/policydb.c    | 14 ++++---
 7 files changed, 185 insertions(+), 18 deletions(-)
 create mode 100644 security/selinux/include/median.h

diff --git a/security/selinux/Kconfig b/security/selinux/Kconfig
index 9e921fc72538..57c427e019c9 100644
--- a/security/selinux/Kconfig
+++ b/security/selinux/Kconfig
@@ -115,3 +115,13 @@ config SECURITY_SELINUX_SID2STR_CACHE_SIZE
 	  conversion.  Setting this option to 0 disables the cache completely.
 
 	  If unsure, keep the default value.
+
+config SECURITY_SELINUX_DEBUG_HASHES
+	bool "Print additional information about hash tables"
+	depends on SECURITY_SELINUX
+	default n
+	help
+	  This option enables gathering and displaying additional information
+	  about some of the key hash tables within SELinux.
+
+	  If unsure, keep the default value.
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index d18cb32a242a..c3bbdb083371 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -31,6 +31,10 @@
 #include "avc_ss.h"
 #include "classmap.h"
 
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+#include "median.h"
+#endif
+
 #define AVC_CACHE_SLOTS			512
 #define AVC_DEF_CACHE_THRESHOLD		512
 #define AVC_CACHE_RECLAIM		16
@@ -149,9 +153,13 @@ void __init avc_init(void)
 
 int avc_get_hash_stats(struct selinux_avc *avc, char *page)
 {
-	int i, chain_len, max_chain_len, slots_used;
+	int i, chain_len, max_chain_len, slots_used, ret;
 	struct avc_node *node;
 	struct hlist_head *head;
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	u32 med_ct = 0;
+	u32 *counts = vmalloc(AVC_CACHE_SLOTS * sizeof(u32));
+#endif
 
 	rcu_read_lock();
 
@@ -164,6 +172,10 @@ int avc_get_hash_stats(struct selinux_avc *avc, char *page)
 			chain_len = 0;
 			hlist_for_each_entry_rcu(node, head, list)
 				chain_len++;
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+			if (counts && chain_len)
+				counts[med_ct++] = chain_len;
+#endif
 			if (chain_len > max_chain_len)
 				max_chain_len = chain_len;
 		}
@@ -171,10 +183,30 @@ int avc_get_hash_stats(struct selinux_avc *avc, char *page)
 
 	rcu_read_unlock();
 
-	return scnprintf(page, PAGE_SIZE, "entries: %d\nbuckets used: %d/%d\n"
-			 "longest chain: %d\n",
-			 atomic_read(&avc->avc_cache.active_nodes),
-			 slots_used, AVC_CACHE_SLOTS, max_chain_len);
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	if (counts) {
+		u32 q1 = 0;
+		u32 q3 = 0;
+		u32 med = median(counts, med_ct, &q1, &q3);
+
+		vfree(counts);
+		ret = scnprintf(page, PAGE_SIZE,
+				"entries: %d\nbuckets used: %d/%d\n"
+				 "longest chain: %d\n"
+				 "non-zero chain Q1/Med/Q3: %u/%u/%u\n",
+				atomic_read(&avc->avc_cache.active_nodes),
+				slots_used, AVC_CACHE_SLOTS, max_chain_len,
+				q1, med, q3);
+	} else /* Falling through! */
+#endif
+	{
+		ret = scnprintf(page, PAGE_SIZE,
+				"entries: %d\nbuckets used: %d/%d\n"
+				 "longest chain: %d\n",
+				atomic_read(&avc->avc_cache.active_nodes),
+				slots_used, AVC_CACHE_SLOTS, max_chain_len);
+	}
+	return ret;
 }
 
 /*
diff --git a/security/selinux/include/median.h b/security/selinux/include/median.h
new file mode 100644
index 000000000000..e90565b9d7f6
--- /dev/null
+++ b/security/selinux/include/median.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* (C) Siarhei Liakh <Siarhei.Liakh@concurrent-rt.com>, 2020, GPL 2.0+ */
+
+#ifndef _LINUX_MEDIAN_H
+#define _LINUX_MEDIAN_H
+#include <linux/types.h>
+#include <linux/sort.h>
+
+/*
+ * Helper function to compare two u32s
+ */
+static int __cmp_u32(const void *a, const void *b)
+{
+	u32 x = *(const u32 *)(a);
+	u32 y = *(const u32 *)(b);
+
+	if (x < y)
+		return -1;
+
+	if (x > y)
+		return 1;
+
+	return 0;
+}
+
+/*
+ * median(): Find a median of an array containing "count" of u32s,
+ * optionally return 25th (q1) and 75th (q3) percentile.
+ */
+static inline u32 median(u32 val[], size_t count, u32 *q1, u32 *q3)
+{
+	u32 _q1 = 0;
+	u32 _q2 = 0;
+	u32 _q3 = 0;
+
+	if (count > 0) {
+		/*
+		 * Using existing sort() functionality as the easiest way
+		 * to get median with least amount of new code.
+		 *
+		 * This is not the most efficient way to find a median,
+		 * so feel free to implement a better one if performance is
+		 * of a primary concern. "Selection algorithm" Wikipedia
+		 * article is a good start:
+		 * https://en.m.wikipedia.org/wiki/Selection_algorithm
+		 */
+		sort(val, count, sizeof(u32), &__cmp_u32, NULL);
+
+		/*
+		 * Should really do some math if count is even,
+		 * but this is close enough for our purposes.
+		 */
+
+		_q1 = val[count / 4];
+		_q2 = val[count / 2];
+		_q3 = val[(count * 3) / 4];
+	}
+
+	if (q1)
+		*q1 = _q1;
+
+	if (q3)
+		*q3 = _q3;
+
+	return _q2;
+}
+#endif /* #ifndef _LINUX_MEDIAN_H */
diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c
index 01b300a4a882..8baaddb01a95 100644
--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -23,6 +23,10 @@
 #include "avtab.h"
 #include "policydb.h"
 
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+#include "median.h"
+#endif
+
 static struct kmem_cache *avtab_node_cachep;
 static struct kmem_cache *avtab_xperms_cachep;
 
@@ -345,6 +349,10 @@ void avtab_hash_eval(struct avtab *h, char *tag)
 	int i, chain_len, slots_used, max_chain_len;
 	unsigned long long chain2_len_sum;
 	struct avtab_node *cur;
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	u32 med_ct = 0;
+	u32 *counts = vmalloc(h->nslot * sizeof(u32));
+#endif
 
 	slots_used = 0;
 	max_chain_len = 0;
@@ -358,17 +366,36 @@ void avtab_hash_eval(struct avtab *h, char *tag)
 				chain_len++;
 				cur = cur->next;
 			}
-
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+			if (counts && chain_len)
+				counts[med_ct++] = chain_len;
+#endif
 			if (chain_len > max_chain_len)
 				max_chain_len = chain_len;
 			chain2_len_sum += chain_len * chain_len;
 		}
 	}
 
-	pr_debug("SELinux: %s:  %d entries and %d/%d buckets used, "
-	       "longest chain length %d sum of chain length^2 %llu\n",
-	       tag, h->nel, slots_used, h->nslot, max_chain_len,
-	       chain2_len_sum);
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	if (counts) {
+		u32 q1 = 0;
+		u32 q3 = 0;
+		u32 med = median(counts, med_ct, &q1, &q3);
+
+		vfree(counts);
+		pr_debug("SELinux: %s:  %d entries and %d/%d buckets used, "
+			 "longest chain length %d non-zero Q1/Med/Q3 %u/%u/%u "
+			 "sum of chain length^2 %llu\n",
+			 tag, h->nel, slots_used, h->nslot, max_chain_len,
+			 q1, med, q3, chain2_len_sum);
+	} else /* Falling through! */
+#endif
+	{
+		pr_debug("SELinux: %s:  %d entries and %d/%d buckets used, "
+			 "longest chain length %d sum of chain length^2 %llu\n",
+			 tag, h->nel, slots_used, h->nslot, max_chain_len,
+			 chain2_len_sum);
+	}
 }
 
 static uint16_t spec_order[] = {
diff --git a/security/selinux/ss/hashtab.c b/security/selinux/ss/hashtab.c
index 883f19d32c28..e42d814067ba 100644
--- a/security/selinux/ss/hashtab.c
+++ b/security/selinux/ss/hashtab.c
@@ -8,8 +8,13 @@
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/sched.h>
+#include <linux/vmalloc.h>
 #include "hashtab.h"
 
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+#include "median.h"
+#endif
+
 static struct kmem_cache *hashtab_node_cachep;
 
 /*
@@ -168,7 +173,10 @@ void hashtab_stat(struct hashtab *h, struct hashtab_info *info)
 {
 	u32 i, chain_len, slots_used, max_chain_len;
 	struct hashtab_node *cur;
-
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	u32 med_ct = 0;
+	u32 *counts = vmalloc(h->size * sizeof(u32));
+#endif
 	slots_used = 0;
 	max_chain_len = 0;
 	for (i = 0; i < h->size; i++) {
@@ -180,7 +188,10 @@ void hashtab_stat(struct hashtab *h, struct hashtab_info *info)
 				chain_len++;
 				cur = cur->next;
 			}
-
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+			if (counts && chain_len)
+				counts[med_ct++] = chain_len;
+#endif
 			if (chain_len > max_chain_len)
 				max_chain_len = chain_len;
 		}
@@ -188,6 +199,19 @@ void hashtab_stat(struct hashtab *h, struct hashtab_info *info)
 
 	info->slots_used = slots_used;
 	info->max_chain_len = max_chain_len;
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	if (counts) {
+		info->med_chain_len = median(counts, med_ct,
+					     &(info->q1_chain_len),
+					     &(info->q3_chain_len));
+
+		vfree(counts);
+	} else {
+		info->q1_chain_len = 0;
+		info->med_chain_len = 0;
+		info->q3_chain_len = 0;
+	}
+#endif
 }
 
 void __init hashtab_cache_init(void)
diff --git a/security/selinux/ss/hashtab.h b/security/selinux/ss/hashtab.h
index dde54d9ff01c..b660c0078dae 100644
--- a/security/selinux/ss/hashtab.h
+++ b/security/selinux/ss/hashtab.h
@@ -32,6 +32,11 @@ struct hashtab {
 struct hashtab_info {
 	u32 slots_used;
 	u32 max_chain_len;
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	u32 q1_chain_len;
+	u32 med_chain_len;
+	u32 q3_chain_len;
+#endif
 };
 
 /*
diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
index c21b922e5ebe..420e347ac77c 100644
--- a/security/selinux/ss/policydb.c
+++ b/security/selinux/ss/policydb.c
@@ -41,9 +41,9 @@
 #include "mls.h"
 #include "services.h"
 
-#define _DEBUG_HASHES
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+#include "median.h"
 
-#ifdef DEBUG_HASHES
 static const char *symtab_name[SYM_NUM] = {
 	"common prefixes",
 	"classes",
@@ -623,15 +623,17 @@ static int (*index_f[SYM_NUM]) (void *key, void *datum, void *datap) =
 	cat_index,
 };
 
-#ifdef DEBUG_HASHES
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
 static void hash_eval(struct hashtab *h, const char *hash_name)
 {
 	struct hashtab_info info;
 
 	hashtab_stat(h, &info);
-	pr_debug("SELinux: %s:  %d entries and %d/%d buckets used, longest chain length %d\n",
+	pr_debug("SELinux: %s:  %d entries and %d/%d buckets used, "
+		 "longest chain length %d non-zero Q1/Med/Q3 %u/%u/%u\n",
 		 hash_name, h->nel, info.slots_used, h->size,
-		 info.max_chain_len);
+		 info.max_chain_len, info.q1_chain_len,
+		 info.med_chain_len, info.q3_chain_len);
 }
 
 static void symtab_hash_eval(struct symtab *s)
@@ -670,7 +672,7 @@ static int policydb_index(struct policydb *p)
 	pr_debug("SELinux:  %d classes, %d rules\n",
 		 p->p_classes.nprim, p->te_avtab.nel);
 
-#ifdef DEBUG_HASHES
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
 	avtab_hash_eval(&p->te_avtab, "rules");
 	symtab_hash_eval(p->symtab);
 #endif
-- 
2.17.1



* [PATCH 2/2] SELinux: Introduce hash3() as alternative to shift-xor
  2020-04-29 20:29 [PATCH 0/2] SELinux: Improve hashing siarhei.liakh
  2020-04-29 20:29 ` [PATCH 1/2] SELinux: Add median to debug output of hash table stats siarhei.liakh
@ 2020-04-29 20:29 ` siarhei.liakh
  2020-05-05 21:18   ` Paul Moore
  1 sibling, 1 reply; 8+ messages in thread
From: siarhei.liakh @ 2020-04-29 20:29 UTC (permalink / raw)
  To: selinux
  Cc: colin.king, eparis, gregkh, jeffv, omosnace, paul,
	stephen.smalley.work, tglx

From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>

This change improves the performance of the hashing functions used in the
AVC and AVTab to hash 3x 32-bit values. As compared to the original
shift-xor function used in the AVC, a substantial improvement in hashing
quality is gained by:
1. replacing shifts with rolls, thus preserving all of the incoming
   entropy
2. adjusting the shift/roll constants to reduce overlap between the input
   values
3. using arithmetic addition instead of XOR, as a way to spread out
   collisions
4. using hash_32() to level out the distribution of hash output values
   throughout the range

Within the scope of this particular application, these changes bring the
hash quality to that of the much more complex MurmurHash3 (which is
currently used in the AVTab), while maintaining the trivial simplicity of
the original code.

The only price paid for the improvements is the introduction of a single
32-bit multiplication instruction, upon which hash_32() is built; this is
not a problem on almost any modern platform where fast multiplication is
available. For example, 32-bit MULs have had a latency of 4 cycles on
pretty much all Intel x86 CPUs for at least a decade (while ADD/XOR/SHL/ROL
have a latency of 1). Platforms without fast multiplication provide their
own alternative implementation of hash_32().

The new hash function hash3() has been applied to two key parts of SELinux:
the Access Vector Cache (AVC) and the Access Vector Table (AVTab), with the
results that follow.

AVC:

Below is the content of hash_stats, as observed on Fedora 32 in its
default configuration with the targeted policy, averaged over 100 samples.
The stats show that the new hash function not only cuts the longest chain
almost in half, but also provides a much better distribution throughout
the table: about 39% more buckets are used, with at least half of all
non-zero buckets consistently staying at length 1, and 75% at 2 or below.

/sys/fs/selinux/avc/hash_stats:
            Old     New
Samples:   100      100
Entries:   504.43   503.67
Buckets:   219.73   304.61
1st Qu.:     1.00     1.00
Median :     1.39     1.00
3rd Qu.:     3.00     2.00
Maximum:     9.32     5.37

Next, the performance of avc_lookup() is analyzed by timing it with ftrace
on a system with the same configuration as above:

avc_lookup(), latency in us:
            Old     New
 Samples: 261534    244168
 Min.   : 0.1320    0.1310
 1st Qu.: 0.2410    0.2360
 Median : 0.4220    0.4240
 3rd Qu.: 0.5580    0.5600
 Max.   : 9.9300    9.9890

Considering the small size of the AVC in the default configuration, the
change does not show any latency improvements. In fact, the median and
75th percentile timings appear to be in line with the addition of 4 extra
clock cycles for MUL (roughly 2 ns on a 2.2 GHz CPU), or 0.47%. This
appears to be a small price to pay for a substantially better distribution
of hash values, which reduces the probability and/or severity of potential
pathological behavior in some larger configurations.

Note that the absolute max latency is likely not indicative, as it is
susceptible to one-off events such as interrupts, cache misses, etc.

AVTab:

Unlike the AVC, the AVTab hash table is much larger and much more densely
populated. In the past, the need for better hashing in the AVTab was
demonstrated, resulting in a transition to a local copy of a custom
implementation of the much more complicated (and better) MurmurHash3,
adapted for hashing 3x u32s. This change replaces MurmurHash3 with the
much simpler, faster, yet just as effective (at least in this particular
use case) hash3() function described above.

As evidenced by the debug output produced during the boot process, despite
being much simpler, hash3() yields quite similar (if not a tiny bit
better) hash distribution quality within the AVTab:

Old:
rules:  81014 entries and 29600/32768 buckets used, longest chain length
11 non-zero Q1/Med/Q3 2/2/4 sum of chain length^2 290030

New:
rules:  81014 entries and 29645/32768 buckets used, longest chain length
11 non-zero Q1/Med/Q3 2/2/4 sum of chain length^2 288810

Performance, though, is a different matter: clear savings of 8 ns to 10 ns
(75th and 50th percentiles, respectively) were measured with ftrace for
the most frequent AVTab lookup method:

avtab_search_node(), latency in us:
          Old       New
 Samples: 5953243   5458099
 Min.   : 0.136     0.129
 1st Qu.: 0.199     0.189
 Median : 0.219     0.209
 3rd Qu.: 0.294     0.286
 Max.   :10.000     9.990

The results are not as clear for the much less frequently used
avtab_search() (about 1500x lower call frequency), though they appear to
lean towards the positive side:

avtab_search(), latency in us:
            Old     New
 Samples: 3863      3638
 Min.   : 0.165     0.157
 1st Qu.: 0.297     0.293
 Median : 0.510     0.517
 3rd Qu.: 0.803     0.774
 Max.   : 9.343     7.701

Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
---
Please CC me directly on all replies.

 security/selinux/avc.c           |  8 +++--
 security/selinux/include/hash3.h | 44 ++++++++++++++++++++++++
 security/selinux/ss/avtab.c      | 57 ++++++++++----------------------
 security/selinux/ss/avtab.h      |  1 +
 4 files changed, 67 insertions(+), 43 deletions(-)
 create mode 100644 security/selinux/include/hash3.h

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index c3bbdb083371..ed092324bef1 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -30,12 +30,14 @@
 #include "avc.h"
 #include "avc_ss.h"
 #include "classmap.h"
+#include "hash3.h"
 
 #ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
 #include "median.h"
 #endif
 
-#define AVC_CACHE_SLOTS			512
+#define AVC_CACHE_BITS			(9)
+#define AVC_CACHE_SLOTS			(1 << AVC_CACHE_BITS)
 #define AVC_DEF_CACHE_THRESHOLD		512
 #define AVC_CACHE_RECLAIM		16
 
@@ -125,9 +127,9 @@ static struct kmem_cache *avc_xperms_data_cachep;
 static struct kmem_cache *avc_xperms_decision_cachep;
 static struct kmem_cache *avc_xperms_cachep;
 
-static inline int avc_hash(u32 ssid, u32 tsid, u16 tclass)
+static inline u32 avc_hash(u32 ssid, u32 tsid, u16 tclass)
 {
-	return (ssid ^ (tsid<<2) ^ (tclass<<4)) & (AVC_CACHE_SLOTS - 1);
+	return hash3(ssid, tsid, tclass, AVC_CACHE_BITS);
 }
 
 /**
diff --git a/security/selinux/include/hash3.h b/security/selinux/include/hash3.h
new file mode 100644
index 000000000000..21e2408f5227
--- /dev/null
+++ b/security/selinux/include/hash3.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* (C) Siarhei Liakh <Siarhei.Liakh@concurrent-rt.com>, 2020, GPL 2.0+ */
+
+#ifndef _SELINUX_HASH3_H
+#define _SELINUX_HASH3_H
+
+#include <linux/hash.h>
+#include <linux/bitops.h>
+
+/*
+ * hash3(): Mix and hash 3 x u32's with minimal overhead,
+ * truncate result to requested number of bits.
+ *
+ * This hash function produces very good results for inputs where most of
+ * input entropy is contained within the lower 11 bits of each of the words.
+ *
+ * For example, AVC hash table (in avc.c) is indexed by a 3-tuple (u32 ssid,
+ * u32 tsid, u16 tclass), where (on Fedora 32 Beta, as of March 2020) ssid and
+ * tsid appear to be sequential indexes between 0x0 and 0xA00, while tclass
+ * appears to be confined to the lower 8 bits, resulting in almost perfect
+ * packing of the indexes into a single 32-bit value.
+ *
+ * The function still produces reasonable hash values even when input value
+ * ranges span beyond 11 bits, as long as the placement of entropy within the
+ * input values is roughly the same for each of the componets (a, b, c), and
+ * the address space (a, b, c) is sparsely populated. Such behaviour is the
+ * result of two conscious choices: (1) use of rol32() to preserve all of the
+ * incoming entropy (as opposed to simple shifts which discard some input bits)
+ * and (2) use of arithmetic addition which carries over colliding bits (as
+ * opposed to binary XOR, which does not carry).
+ *
+ * The function will produce horrible collisions if input entropy is distributed
+ * within (a, b, c) such that it ends up within the same bit ranges after
+ * rotations, and the address space is densely populated. If that is the case,
+ * then two options are available:
+ * 1. Try switching around some of the inputs. EX: (a, b, c) => (b, c, a)
+ * 2. Use a real hash, such as jhash_3words() from linux/jhash.h
+ */
+static inline u32 hash3(u32 a, u32 b, u32 c, int bits)
+{
+	return hash_32(a + rol32(b, 11) + rol32(c, 22), bits);
+}
+
+#endif /* #ifndef _SELINUX_HASH3_H */
diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c
index 8baaddb01a95..d24f18ab9e4d 100644
--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -22,6 +22,7 @@
 #include <linux/errno.h>
 #include "avtab.h"
 #include "policydb.h"
+#include "hash3.h"
 
 #ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
 #include "median.h"
@@ -30,43 +31,10 @@
 static struct kmem_cache *avtab_node_cachep;
 static struct kmem_cache *avtab_xperms_cachep;
 
-/* Based on MurmurHash3, written by Austin Appleby and placed in the
- * public domain.
- */
-static inline int avtab_hash(struct avtab_key *keyp, u32 mask)
+static inline u32 avtab_hash(struct avtab_key *keyp, u32 bits)
 {
-	static const u32 c1 = 0xcc9e2d51;
-	static const u32 c2 = 0x1b873593;
-	static const u32 r1 = 15;
-	static const u32 r2 = 13;
-	static const u32 m  = 5;
-	static const u32 n  = 0xe6546b64;
-
-	u32 hash = 0;
-
-#define mix(input) { \
-	u32 v = input; \
-	v *= c1; \
-	v = (v << r1) | (v >> (32 - r1)); \
-	v *= c2; \
-	hash ^= v; \
-	hash = (hash << r2) | (hash >> (32 - r2)); \
-	hash = hash * m + n; \
-}
-
-	mix(keyp->target_class);
-	mix(keyp->target_type);
-	mix(keyp->source_type);
-
-#undef mix
-
-	hash ^= hash >> 16;
-	hash *= 0x85ebca6b;
-	hash ^= hash >> 13;
-	hash *= 0xc2b2ae35;
-	hash ^= hash >> 16;
-
-	return hash & mask;
+	return hash3(keyp->target_class, keyp->target_type,
+		     keyp->source_type, bits);
 }
 
 static struct avtab_node*
@@ -116,7 +84,7 @@ static int avtab_insert(struct avtab *h, struct avtab_key *key, struct avtab_dat
 	if (!h)
 		return -EINVAL;
 
-	hvalue = avtab_hash(key, h->mask);
+	hvalue = avtab_hash(key, h->bits);
 	for (prev = NULL, cur = h->htable[hvalue];
 	     cur;
 	     prev = cur, cur = cur->next) {
@@ -160,7 +128,7 @@ avtab_insert_nonunique(struct avtab *h, struct avtab_key *key, struct avtab_datu
 
 	if (!h)
 		return NULL;
-	hvalue = avtab_hash(key, h->mask);
+	hvalue = avtab_hash(key, h->bits);
 	for (prev = NULL, cur = h->htable[hvalue];
 	     cur;
 	     prev = cur, cur = cur->next) {
@@ -191,7 +159,7 @@ struct avtab_datum *avtab_search(struct avtab *h, struct avtab_key *key)
 	if (!h)
 		return NULL;
 
-	hvalue = avtab_hash(key, h->mask);
+	hvalue = avtab_hash(key, h->bits);
 	for (cur = h->htable[hvalue]; cur;
 	     cur = cur->next) {
 		if (key->source_type == cur->key.source_type &&
@@ -227,7 +195,7 @@ avtab_search_node(struct avtab *h, struct avtab_key *key)
 	if (!h)
 		return NULL;
 
-	hvalue = avtab_hash(key, h->mask);
+	hvalue = avtab_hash(key, h->bits);
 	for (cur = h->htable[hvalue]; cur;
 	     cur = cur->next) {
 		if (key->source_type == cur->key.source_type &&
@@ -301,6 +269,7 @@ void avtab_destroy(struct avtab *h)
 	h->htable = NULL;
 	h->nslot = 0;
 	h->mask = 0;
+	h->bits = 0;
 }
 
 void avtab_init(struct avtab *h)
@@ -313,6 +282,7 @@ void avtab_init(struct avtab *h)
 int avtab_alloc(struct avtab *h, u32 nrules)
 {
 	u32 mask = 0;
+	u32 bits = 0;
 	u32 shift = 0;
 	u32 work = nrules;
 	u32 nslot = 0;
@@ -339,6 +309,13 @@ int avtab_alloc(struct avtab *h, u32 nrules)
 	h->nel = 0;
 	h->nslot = nslot;
 	h->mask = mask;
+
+	while (mask) {
+		mask  = mask >> 1;
+		bits++;
+	}
+
+	h->bits = bits;
 	pr_debug("SELinux: %d avtab hash slots, %d rules.\n",
 	       h->nslot, nrules);
 	return 0;
diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
index 5fdcb6696bcc..bf24d8094019 100644
--- a/security/selinux/ss/avtab.h
+++ b/security/selinux/ss/avtab.h
@@ -85,6 +85,7 @@ struct avtab {
 	u32 nel;	/* number of elements */
 	u32 nslot;      /* number of hash slots */
 	u32 mask;       /* mask to compute hash func */
+	u32 bits;       /* number of bits in mask */
 };
 
 void avtab_init(struct avtab *h);
-- 
2.17.1



* Re: [PATCH 2/2] SELinux: Introduce hash3() as alternative to shift-xor
  2020-04-29 20:29 ` [PATCH 2/2] SELinux: Introduce hash3() as alternative to shift-xor siarhei.liakh
@ 2020-05-05 21:18   ` Paul Moore
  2020-05-06 21:42     ` Siarhei Liakh
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Moore @ 2020-05-05 21:18 UTC (permalink / raw)
  To: siarhei.liakh
  Cc: selinux, colin.king, Eric Paris, gregkh, jeffv, omosnace,
	Stephen Smalley, tglx

On Wed, Apr 29, 2020 at 4:29 PM <siarhei.liakh@concurrent-rt.com> wrote:
> From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
>
> This change improves performance of hashing functions used in AVC and
> AVTab to hash 3x 32-bit values. As compared to original shift-xor function
> used in AVC, substantial improvement in quality of hashing is gained by:
> 1. replacing shifts with rolls, thus preserving all of the incoming
>    entropy
> 2. adjusting shift/roll constants to reduce overlap between the input
>    values
> 3. use of arithmetic addition instead of XOR, as a way to spread out
>    collisions
> 4. use of hash_32() to level out the distribution of hash output values
>    throughout the range
>
> Within the scope of particular application, these changes bring hash
> quality to that of much more complex MurmurHash3 (which is currently used
> in AVTab), while maintaining trivial simplicity of the original code.
>
> The only price paid for the improvements is introduction of a single
> 32-bit multiplication instruction upon which hash_32() is built, which is
> not a problem on almost any modern platform where fast multiplication is
> available. For example, 32-bit MULs have had latency of 4 on pretty much
> all Intel x86 CPUs for at least a decade or so (while ADD/XOR/SHL/ROL have
> latency of 1). Platforms without fast multiplication provide their own
> alternative mechanism for hash_32().
>
> New hash function hash3() had been applied to two key parts of SELinux:
> Access Vector Cache (AVC) and Access Vector Table (AVTab) with results
> that follow.
>
> AVC:
>
> Below is the content of hash_stats, as observed on Fedora 32 in its
> default configuration with targeted policy, averaged over 100 samples. The
> stats show that new hash function not only cuts the longest chain almost
> in half, but also provides a much better distribution throughout the table:
> about 39% more buckets are being used, with at least half of all non-zero
> buckets consistently staying at length 1, with 75% at 2 or below.
>
> /sys/fs/selinux/avc/hash_stats:
>             Old     New
> Samples:   100      100
> Entries:   504.43   503.67
> Buckets:   219.73   304.61
> 1st Qu.:     1.00     1.00
> Median :     1.39     1.00
> 3rd Qu.:     3.00     2.00
> Maximum:     9.32     5.37
>
> Next, performance of avc_lookup() is analyzed by timing it with ftrace on
> a system with same configuration as above:
>
> acv_lookup(), latency in us:
>             Old     New
>  Samples: 261534    244168
>  Min.   : 0.1320    0.1310
>  1st Qu.: 0.2410    0.2360
>  Median : 0.4220    0.4240
>  3rd Qu.: 0.5580    0.5600
>  Max.   : 9.9300    9.9890
>
> Considering the small size of the AVC in the default configuration, the
> change does not show any latency improvement. In fact, the median and 75th
> percentile timings appear to be in line with the addition of 4 extra clock
> cycles for MUL (roughly 2 ns on a 2.2 GHz CPU), or 0.47%. This appears to
> be a small price to pay for a substantially better distribution of hash
> values, reducing the probability and/or severity of potential pathological
> behavior on some larger configurations.
>
> Note that absolute max latency is likely not indicative, as it is
> susceptible to one-off events such as interrupts, cache misses, etc.

Thanks for providing more performance information, this is helpful.
Right now as I look at the results, I'm trying to decide if I care at
all about the chain length information as ultimately the latency is
really what we care about, yes?  I'm also trying to reconcile the
chain length information with the latency information; based on the
difference in chain lengths I would have expected more of a difference
in the latency numbers.  As you mention the latency numbers are
basically the same (give or take two clock cycles) between the two
hash implementations, assuming we exclude the worst case.  If we
include the worst case the old implementation is better, at least in
the data you've provided.

Is the added complexity of the hash3 implementation, as compared to
the current avc hash, stealing from the performance improvement
brought by the improved hash table utilization?  If true, I would
expect the penalty to be constant regardless of the chain length, but
it seems like the latency difference between old and new gets worse
(from a "new" perspective) as the chain length grows.  Can you explain
that?  Or are we simply playing in the noise of the measurement since
we are only talking about a clock cycle or two?

> AVTab:
>
> Unlike the AVC, the AVTab hash table is much larger and much more densely
> populated. In the past, the need for better hashing in AVTab was
> demonstrated, resulting in a transition to a local copy of a custom
> implementation of the much more complicated (and better) MurmurHash3,
> adapted for hashing 3x u32s. This change replaces MurmurHash3 with a much
> simpler, faster, yet just as effective (at least in this particular use
> case) hash3() function described above.
>
> As evidenced by debug output produced during the boot process, despite
> being much simpler, hash3() yields quite similar (if not just a tiny bit
> better) hash distribution quality within AVTab:
>
> Old:
> rules:  81014 entries and 29600/32768 buckets used, longest chain length
> 11 non-zero Q1/Med/Q3 2/2/4 sum of chain length^2 290030
>
> New:
> rules:  81014 entries and 29645/32768 buckets used, longest chain length
> 11 non-zero Q1/Med/Q3 2/2/4 sum of chain length^2 288810

Oh, one nit against the first patch - you might as well use all
lower-case letters in the stats (e.g. "q1" not "Q1" and "med" instead
of "Med") to be consistent with the existing output.

> Performance, though, is a different matter: clear savings of 8 ns to 10 ns
> (75th and 50th percentiles respectively) were measured with ftrace for the
> most frequent AVTab lookup method:
>
> avtab_search_node(), latency in us:
>           Old       New
>  Samples: 5953243   5458099
>  Min.   : 0.136     0.129
>  1st Qu.: 0.199     0.189
>  Median : 0.219     0.209
>  3rd Qu.: 0.294     0.286
>  Max.   :10.000     9.990
>
> The results are not as clear for the much less frequently used
> avtab_search() (about a 1500x difference in call frequency), though they
> appear to lean toward the positive side:
>
> avtab_search(), latency in us:
>             Old     New
>  Samples: 3863      3638
>  Min.   : 0.165     0.157
>  1st Qu.: 0.297     0.293
>  Median : 0.510     0.517
>  3rd Qu.: 0.803     0.774
>  Max.   : 9.343     7.701

So with the avtab we see similar chain length numbers between the two
hash implementations but we manage to save a couple of clock cycles
across the board from best to worst.  I'm guessing this is almost
surely due to the simpler hash3 implementation over the current avtab
implementation, yes?

(more comments inline)

> Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
> ---
> Please CC me directly on all replies.
>
>  security/selinux/avc.c           |  8 +++--
>  security/selinux/include/hash3.h | 44 ++++++++++++++++++++++++
>  security/selinux/ss/avtab.c      | 57 ++++++++++----------------------
>  security/selinux/ss/avtab.h      |  1 +
>  4 files changed, 67 insertions(+), 43 deletions(-)
>  create mode 100644 security/selinux/include/hash3.h
>
> diff --git a/security/selinux/avc.c b/security/selinux/avc.c
> index c3bbdb083371..ed092324bef1 100644
> --- a/security/selinux/avc.c
> +++ b/security/selinux/avc.c
> @@ -30,12 +30,14 @@
>  #include "avc.h"
>  #include "avc_ss.h"
>  #include "classmap.h"
> +#include "hash3.h"
>
>  #ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
>  #include "median.h"
>  #endif
>
> -#define AVC_CACHE_SLOTS                        512
> +#define AVC_CACHE_BITS                 (9)
> +#define AVC_CACHE_SLOTS                        (1 << AVC_CACHE_BITS)
>  #define AVC_DEF_CACHE_THRESHOLD                512
>  #define AVC_CACHE_RECLAIM              16
>
> @@ -125,9 +127,9 @@ static struct kmem_cache *avc_xperms_data_cachep;
>  static struct kmem_cache *avc_xperms_decision_cachep;
>  static struct kmem_cache *avc_xperms_cachep;
>
> -static inline int avc_hash(u32 ssid, u32 tsid, u16 tclass)
> +static inline u32 avc_hash(u32 ssid, u32 tsid, u16 tclass)
>  {
> -       return (ssid ^ (tsid<<2) ^ (tclass<<4)) & (AVC_CACHE_SLOTS - 1);
> +       return hash3(ssid, tsid, tclass, AVC_CACHE_BITS);
>  }
>
>  /**
> diff --git a/security/selinux/include/hash3.h b/security/selinux/include/hash3.h
> new file mode 100644
> index 000000000000..21e2408f5227
> --- /dev/null
> +++ b/security/selinux/include/hash3.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* (C) Siarhei Liakh <Siarhei.Liakh@concurrent-rt.com>, 2020, GPL 2.0+ */
> +
> +#ifndef _SELINUX_HASH3_H
> +#define _SELINUX_HASH3_H
> +
> +#include <linux/hash.h>
> +#include <linux/bitops.h>
> +
> +/*
> + * hash3(): Mix and hash 3 x u32's with minimal overhead,
> + * truncate result to requested number of bits.
> + *
> + * This hash function produces very good results for inputs where most of
> + * input entropy is contained within the lower 11 bits of each of the words.
> + *
> + * For example, AVC hash table (in avc.c) is indexed by a 3-tuple (u32 ssid,
> + * u32 tsid, u16 tclass), where (on Fedora 32 Beta, as of March 2020) ssid and
> + * tsid appear to be sequential indexes between 0x0 and 0xA00, while tclass
> + * appears to be confined to the lower 8 bits, resulting in almost perfect
> + * packing of the indexes into a single 32-bit value.
> + *
> + * The function still produces reasonable hash values even when input value
> + * ranges span beyond 11 bits, as long as the placement of entropy within the
> + * input values is roughly the same for each of the componets (a, b, c), and
> + * the address space (a, b, c) is sparsely populated. Such behaviour is the
> + * result of two conscious choices: (1) use of rol32() to preserve all of the
> + * incoming entropy (as opposed to simple shifts which discard some input bits)
> + * and (2) use of arithmetic addition which carries over colliding bits (as
> + * opposed to binary XOR, which does not carry).
> + *
> + * The function will produce horrible collisions if input entropy is distrubuted
> + * within (a, b, c) such that it ends up within the same bit ranges after
> + * rotations, and the address space is densly populated. If that is the case,
> + * then two options are available:
> + * 1. Try switching around some of the inputs. EX: (a, b, c) => (b, c, a)
> + * 2. Use a real hash, such as jhash_3words() from linux/jhash.h
> + */

I'm not one to throw stones as my spelling is terrible, but I wouldn't
be doing my job if I didn't ask you to run spellcheck on the comment
above.

> +static inline u32 hash3(u32 a, u32 b, u32 c, int bits)
> +{
> +       return hash_32(a + rol32(b, 11) + rol32(c, 22), bits);
> +}
> +
> +#endif /* #ifndef _SELINUX_HASH3_H */

...

> diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
> index 5fdcb6696bcc..bf24d8094019 100644
> --- a/security/selinux/ss/avtab.h
> +++ b/security/selinux/ss/avtab.h
> @@ -85,6 +85,7 @@ struct avtab {
>         u32 nel;        /* number of elements */
>         u32 nslot;      /* number of hash slots */
>         u32 mask;       /* mask to compute hash func */
> +       u32 bits;       /* number of bits in mask */
>  };

We can get rid of the "mask" field, right?

-- 
paul moore
www.paul-moore.com

* Re: [PATCH 2/2] SELinux: Introduce hash3() as alternative to shift-xor
  2020-05-05 21:18   ` Paul Moore
@ 2020-05-06 21:42     ` Siarhei Liakh
  0 siblings, 0 replies; 8+ messages in thread
From: Siarhei Liakh @ 2020-05-06 21:42 UTC (permalink / raw)
  To: Paul Moore
  Cc: selinux, colin.king, Eric Paris, gregkh, jeffv, omosnace,
	Stephen Smalley, tglx

The 05/05/2020 17:18, Paul Moore wrote:
> On Wed, Apr 29, 2020 at 4:29 PM <siarhei.liakh@concurrent-rt.com> wrote:
> > From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
> >
> > This change improves performance of hashing functions used in AVC and
> > AVTab to hash 3x 32-bit values. As compared to original shift-xor function
> > used in AVC, substantial improvement in quality of hashing is gained by:
> > 1. replacing shifts with rolls, thus preserving all of the incoming
> >    entropy
> > 2. adjusting shift/roll constants to reduce overlap between the input
> >    values
> > 3. use of arithmetic addition instead of XOR, as a way to spread out
> >    collisions
> > 4. use of hash_32() to level out the distribution of hash output values
> >    throughout the range
> >
> > Within the scope of particular application, these changes bring hash
> > quality to that of much more complex MurmurHash3 (which is currently used
> > in AVTab), while maintaining trivial simplicity of the original code.
>
> [ . . . ] 
>
> > /sys/fs/selinux/avc/hash_stats:
> >             Old     New
> > Samples:   100      100
> > Entries:   504.43   503.67
> > Buckets:   219.73   304.61
> > 1st Qu.:     1.00     1.00
> > Median :     1.39     1.00
> > 3rd Qu.:     3.00     2.00
> > Maximum:     9.32     5.37
> >
> > Next, performance of avc_lookup() is analyzed by timing it with ftrace on
> > a system with the same configuration as above:
> >
> > avc_lookup(), latency in us:
> >             Old     New
> >  Samples: 261534    244168
> >  Min.   : 0.1320    0.1310
> >  1st Qu.: 0.2410    0.2360
> >  Median : 0.4220    0.4240
> >  3rd Qu.: 0.5580    0.5600
> >  Max.   : 9.9300    9.9890
> >
> > Considering the small size of the AVC in the default configuration, the
> > change does not show any latency improvement. In fact, the median and 75th
> > percentile timings appear to be in line with the addition of 4 extra clock
> > cycles for MUL (roughly 2 ns on a 2.2 GHz CPU), or 0.47%. This appears to
> > be a small price to pay for a substantially better distribution of hash
> > values, reducing the probability and/or severity of potential pathological
> > behavior on some larger configurations.
> >
> > Note that absolute max latency is likely not indicative, as it is
> > susceptible to one-off events such as interrupts, cache misses, etc.
> 
> Thanks for providing more performance information, this is helpful.
> Right now as I look at the results, I'm trying to decide if I care at
> all about the chain length information as ultimately the latency is
> really what we care about, yes?

Correct, but there are different ways to look at it. One way is to only look
at the average/median we get right now in the current default configuration.
Another way is to think in terms of potential variability and possible
pathological cases down the road, which would be difficult to diagnose and/or
fix for an end-user. In my mind, this is the same type of decision as the
choice of a default sort algorithm: quicksort is faster on average, yet the
Linux kernel relies on heapsort because it does not have O(n^2) worst-case
behavior (as explained in lib/sort.c). I understand that a sort has a
guaranteed bound on worst-case performance, while a hash is probabilistic in
nature and the worst-case scenario can still happen even with a good hash.
However, hash3() does demonstrate a much better distribution compared to the
current hash function, thus reducing the probability and/or severity of any
potential pathological case. Think of this as proactive preventative
maintenance rather than a reactive bug fix.

> I'm also trying to reconcile the
> chain length information with the latency information; based on the
> difference in chain lengths I would have expected more of a difference
> in the latency numbers.  As you mention the latency numbers are
> basically the same (give or take two clock cycles) between the two
> hash implementations, assuming we exclude the worst case.  If we
> include the worst case the old implementation is better, at least in
> the data you've provided.
> 
> Is the added complexity of the hash3 implementation, as compared to
> the current avc hash, stealing from the performance improvement
> brought by the improved hash table utilization?  If true, I would
> expect the penalty to be constant regardless of the chain length, but
> it seems like the latency difference between old and new gets worse
> (from a "new" perspective) as the chain length grows.  Can you explain
> that?  Or are we simply playing in the noise of the measurement since
> we are only talking about a clock cycle or two?

TL;DR:
L1/L2/L3 cache latency is 4-6/14/50-70 cycles on Intel Skylake [1].
So, I think it is just noise.

[1] https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf
Section 2.2.1.3, Table 2-6 on page 2-11.

Long version:

I suspect that what we see is the result of the whole AVC being able to fit
within the CPU cache without much cache line eviction between consecutive
inserts/lookups, combined with really good branch prediction, speculation,
etc.

Here is a fun experiment: I just tested a simple scan of a flat 512-element
array of packed 64-bit values {a:24; b:24; c:16}, and it comes out to roughly
500 cycles (~127 ns) per *full* scan on an Intel Core i7-7820HK CPU @ 3.90GHz.
That would translate to about 225 ns on the 2.2 GHz system benchmarked above,
which is just about what we get for the 25th percentile lowest latency of the
hash table! Thus, a straight scan would be best if the absolute max latency of
an isolated 512-entry AVC were all we cared about and cache pressure were not
a factor. Sounds pretty crazy, right?

However, such an approach not only does not scale well, but also churns
through up to 4 KB (64 cache lines) of memory on each full pass. In contrast,
a hash table with a bucket depth of 5 would only issue 6 cache line fetches in
the worst case. Thus, even if a flat scan is *faster* in this particular
configuration, it is not necessarily *better* for the system as a whole.

What we have is a confluence of the following factors:
1. The AVC is a central point of SELinux, used all the time by pretty much
everything running on every core.
2. An L3 hit on a cache line in shared mode is about 50% more expensive than
a hit in non-shared mode [2].
3. Remote DRAM access is about 1.5x - 2x more expensive than local [2].
4. Single-socket NUMA systems are becoming ubiquitous (ex: AMD Zen / EPYC).

Just think about this: on a single-socket AMD EPYC system, most AVC memory
accesses will end up either shared or remote. To me it only makes sense to
proactively reduce any unnecessary cache/DRAM references, even if there is no
obvious immediate payoff right this moment (provided, of course, that it does
not make things worse for a "regular" use case today).

[2] https://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf
Table 2, page 22. (yes, this info is about 10 years old)

> > AVTab:
> [ . . . ]
> > Old:
> > rules:  81014 entries and 29600/32768 buckets used, longest chain length
> > 11 non-zero Q1/Med/Q3 2/2/4 sum of chain length^2 290030
> >
> > New:
> > rules:  81014 entries and 29645/32768 buckets used, longest chain length
> > 11 non-zero Q1/Med/Q3 2/2/4 sum of chain length^2 288810
> 
> Oh, one nit against the first patch - you might as well use all
> lower-case letters in the stats (e.g. "q1" not "Q1" and "med" instead
> of "Med") to be consistent with the existing output.

Will do.
 
> > Performance, though, is a different matter: clear savings of 8 ns to
> > 10 ns (75th and 50th percentiles respectively) were measured with ftrace
> > for the most frequent AVTab lookup method:
> >
> > avtab_search_node(), latency in us:
> >           Old       New
> >  Samples: 5953243   5458099
> >  Min.   : 0.136     0.129
> >  1st Qu.: 0.199     0.189
> >  Median : 0.219     0.209
> >  3rd Qu.: 0.294     0.286
> >  Max.   :10.000     9.990
> >
> > The results are not as clear for the much less frequently used
> > avtab_search() (about a 1500x difference in call frequency), though they
> > appear to lean toward the positive side:
> >
> > avtab_search(), latency in us:
> >             Old     New
> >  Samples: 3863      3638
> >  Min.   : 0.165     0.157
> >  1st Qu.: 0.297     0.293
> >  Median : 0.510     0.517
> >  3rd Qu.: 0.803     0.774
> >  Max.   : 9.343     7.701
> 
> So with the avtab we see similar chain length numbers between the two
> hash implementations but we manage to save a couple of clock cycles
> across the board from best to worst.  I'm guessing this is almost
> surely due to the simpler hash3 implementation over the current avtab
> implementation, yes?

Yes. The MurmurHash3 version translates into roughly 6x more instructions
than hash3(), so the fixed offset is really measurable. Note that Max for
avtab_search_node() is pretty much the same in both cases, which falls in
line with my earlier assertion that it is likely not indicative of real
performance differences, as it is a function of other more random and more
expensive processes (such as interrupts, cache misses, etc.).

> 
> (more comments inline)
> 
> [ . . . ]
> > +/*
> > + * hash3(): Mix and hash 3 x u32's with minimal overhead,
> > + * truncate result to requested number of bits.
> > + *
> > + * This hash function produces very good results for inputs where most of
> > + * input entropy is contained within the lower 11 bits of each of the words.
> > + *
> > + * For example, AVC hash table (in avc.c) is indexed by a 3-tuple (u32 ssid,
> > + * u32 tsid, u16 tclass), where (on Fedora 32 Beta, as of March 2020) ssid and
> > + * tsid appear to be sequential indexes between 0x0 and 0xA00, while tclass
> > + * appears to be confined to the lower 8 bits, resulting in almost perfect
> > + * packing of the indexes into a single 32-bit value.
> > + *
> > + * The function still produces reasonable hash values even when input value
> > + * ranges span beyond 11 bits, as long as the placement of entropy within the
> > + * input values is roughly the same for each of the componets (a, b, c), and
> > + * the address space (a, b, c) is sparsely populated. Such behaviour is the
> > + * result of two conscious choices: (1) use of rol32() to preserve all of the
> > + * incoming entropy (as opposed to simple shifts which discard some input bits)
> > + * and (2) use of arithmetic addition which carries over colliding bits (as
> > + * opposed to binary XOR, which does not carry).
> > + *
> > + * The function will produce horrible collisions if input entropy is distrubuted
> > + * within (a, b, c) such that it ends up within the same bit ranges after
> > + * rotations, and the address space is densly populated. If that is the case,
> > + * then two options are available:
> > + * 1. Try switching around some of the inputs. EX: (a, b, c) => (b, c, a)
> > + * 2. Use a real hash, such as jhash_3words() from linux/jhash.h
> > + */
> 
> I'm not one to throw stones as my spelling is terrible, but I wouldn't
> be doing my job if I didn't ask you to run spellcheck on the comment
> above.

Will do.
 
> > +static inline u32 hash3(u32 a, u32 b, u32 c, int bits)
> > +{
> > +       return hash_32(a + rol32(b, 11) + rol32(c, 22), bits);
> > +}
> > +
> > +#endif /* #ifndef _SELINUX_HASH3_H */
> 
> ...
> 
> > diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
> > index 5fdcb6696bcc..bf24d8094019 100644
> > --- a/security/selinux/ss/avtab.h
> > +++ b/security/selinux/ss/avtab.h
> > @@ -85,6 +85,7 @@ struct avtab {
> >         u32 nel;        /* number of elements */
> >         u32 nslot;      /* number of hash slots */
> >         u32 mask;       /* mask to compute hash func */
> > +       u32 bits;       /* number of bits in mask */
> >  };
> 
> We can get rid of the "mask" field, right?

Yes, I think we can.

-- 
Siarhei Liakh
Concurrent Real-Time

* Re: [PATCH 1/2] SELinux: Add median to debug output of hash table stats
  2020-04-29 20:29 ` [PATCH 1/2] SELinux: Add median to debug output of hash table stats siarhei.liakh
@ 2020-05-13 21:55   ` Paul Moore
  2020-06-02 20:42     ` Siarhei Liakh
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Moore @ 2020-05-13 21:55 UTC (permalink / raw)
  To: siarhei.liakh
  Cc: selinux, colin.king, Eric Paris, gregkh, jeffv, omosnace,
	Stephen Smalley, tglx

On Wed, Apr 29, 2020 at 4:29 PM <siarhei.liakh@concurrent-rt.com> wrote:
>
> From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
>
> This change introduces a median() function which is then used to report
> 25th, 50th, and 75th percentile metrics for the distributions of hash table
> bucket chain lengths. This makes it possible to better assess and compare
> the relative effectiveness of different hash functions. Specifically, it
> helps ensure that new functions not only reduce the maximum, but also
> improve (or, at least, do not degrade) the median.
>
> Sample output before change:
>
> avc:
> entries: 508
> buckets used: 213/512
> longest chain: 10
>
> policydb:
> SELinux: roles:  14 entries and 6/16 buckets used, longest chain length 5
>
> Sample output after the change:
>
> avc:
> entries: 508
> buckets used: 217/512
> longest chain: 9
> non-zero chain Q1/Med/Q3: 1/2/3
>
> policydb:
> SELinux: roles:  14 entries and 6/16 buckets used, longest chain length 5
> non-zero Q1/Med/Q3 1/2/4
>
> Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
> ---
> Please CC me directly on all replies.
>
>  security/selinux/Kconfig          | 10 +++++
>  security/selinux/avc.c            | 42 ++++++++++++++++---
>  security/selinux/include/median.h | 67 +++++++++++++++++++++++++++++++
>  security/selinux/ss/avtab.c       | 37 ++++++++++++++---
>  security/selinux/ss/hashtab.c     | 28 ++++++++++++-
>  security/selinux/ss/hashtab.h     |  5 +++
>  security/selinux/ss/policydb.c    | 14 ++++---
>  7 files changed, 185 insertions(+), 18 deletions(-)
>  create mode 100644 security/selinux/include/median.h
>
> diff --git a/security/selinux/Kconfig b/security/selinux/Kconfig
> index 9e921fc72538..57c427e019c9 100644
> --- a/security/selinux/Kconfig
> +++ b/security/selinux/Kconfig
> @@ -115,3 +115,13 @@ config SECURITY_SELINUX_SID2STR_CACHE_SIZE
>           conversion.  Setting this option to 0 disables the cache completely.
>
>           If unsure, keep the default value.
> +
> +config SECURITY_SELINUX_DEBUG_HASHES
> +       bool "Print additional information about hash tables"
> +       depends on SECURITY_SELINUX
> +       default n
> +       help
> +         This option allows to gather and display additional information about
> +         some of the key hash tables within SELinux.
> +
> +         If unsure, keep the default value.

I forgot to mention this earlier, but I think this is another case
where we don't need to add another Kconfig option.

-- 
paul moore
www.paul-moore.com

* Re: [PATCH 1/2] SELinux: Add median to debug output of hash table stats
  2020-05-13 21:55   ` Paul Moore
@ 2020-06-02 20:42     ` Siarhei Liakh
  2020-06-06 13:05       ` Paul Moore
  0 siblings, 1 reply; 8+ messages in thread
From: Siarhei Liakh @ 2020-06-02 20:42 UTC (permalink / raw)
  To: Paul Moore
  Cc: selinux, colin.king, Eric Paris, gregkh, jeffv, omosnace,
	Stephen Smalley, tglx

The 05/13/2020 17:55, Paul Moore wrote:
> On Wed, Apr 29, 2020 at 4:29 PM <siarhei.liakh@concurrent-rt.com> wrote:
> >
> > From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
> >
> > This change introduces a median() function which is then used to report
> > 25th, 50th, and 75th percentile metrics for the distributions of hash table
> > bucket chain lengths. This makes it possible to better assess and compare
> > the relative effectiveness of different hash functions. Specifically, it
> > helps ensure that new functions not only reduce the maximum, but also
> > improve (or, at least, do not degrade) the median.
[ . . . ]
> > diff --git a/security/selinux/Kconfig b/security/selinux/Kconfig
> > index 9e921fc72538..57c427e019c9 100644
> > --- a/security/selinux/Kconfig
> > +++ b/security/selinux/Kconfig
> > @@ -115,3 +115,13 @@ config SECURITY_SELINUX_SID2STR_CACHE_SIZE
> >           conversion.  Setting this option to 0 disables the cache completely.
> >
> >           If unsure, keep the default value.
> > +
> > +config SECURITY_SELINUX_DEBUG_HASHES
> > +       bool "Print additional information about hash tables"
> > +       depends on SECURITY_SELINUX
> > +       default n
> > +       help
> > +         This option allows to gather and display additional information about
> > +         some of the key hash tables within SELinux.
> > +
> > +         If unsure, keep the default value.
> 
> I forgot to mention this earlier, but I think this is another case
> where we don't need to add another Kconfig option.

Right. What is your preferred way to control conditional inclusion of
code spread out across several files?

My issue is that there already are two different symbols which require
coordination to activate this functionality: DEBUG_HASHES defined and used
locally within policydb.c and simple DEBUG which is needed for pr_debug()
statements throughout the code.

Personally, I prefer something global and controlled from a single well-known
place, hence the Kconfig. However, I also see your point about reducing
Kconfig... But if not Kconfig, then what? Should I just create an additional
.h file with all SELinux-specific debug symbols and have it included
everywhere in SELinux?

How would you approach this?

Thank you.
-- 
Siarhei Liakh
Concurrent Real-Time

* Re: [PATCH 1/2] SELinux: Add median to debug output of hash table stats
  2020-06-02 20:42     ` Siarhei Liakh
@ 2020-06-06 13:05       ` Paul Moore
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Moore @ 2020-06-06 13:05 UTC (permalink / raw)
  To: Siarhei Liakh
  Cc: selinux, colin.king, Eric Paris, gregkh, jeffv, Ondrej Mosnacek,
	Stephen Smalley, tglx

On Tue, Jun 2, 2020 at 4:42 PM Siarhei Liakh
<siarhei.liakh@concurrent-rt.com> wrote:
> The 05/13/2020 17:55, Paul Moore wrote:
> > On Wed, Apr 29, 2020 at 4:29 PM <siarhei.liakh@concurrent-rt.com> wrote:
> > >
> > > From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
> > >
> > > This change introduces a median() function which is then used to report
> > > 25th, 50th, and 75th percentile metrics within distributions of hash table
> > > bucket chain lengths. This allows to better assess and compare relative
> > > effectiveness of different hash functions. Specifically, it allows to
> > > ensure new functions not only reduce the maximum, but also improve (or, at
> > > least, have no negative impact) on the median.
> [ . . . ]
> > > diff --git a/security/selinux/Kconfig b/security/selinux/Kconfig
> > > index 9e921fc72538..57c427e019c9 100644
> > > --- a/security/selinux/Kconfig
> > > +++ b/security/selinux/Kconfig
> > > @@ -115,3 +115,13 @@ config SECURITY_SELINUX_SID2STR_CACHE_SIZE
> > >           conversion.  Setting this option to 0 disables the cache completely.
> > >
> > >           If unsure, keep the default value.
> > > +
> > > +config SECURITY_SELINUX_DEBUG_HASHES
> > > +       bool "Print additional information about hash tables"
> > > +       depends on SECURITY_SELINUX
> > > +       default n
> > > +       help
> > > +         This option allows to gather and display additional information about
> > > +         some of the key hash tables within SELinux.
> > > +
> > > +         If unsure, keep the default value.
> >
> > I forgot to mention this earlier, but I think this is another case
> > where we don't need to add another Kconfig option.
>
> Right. What is your preferred way to control conditional inclusion of
> code spread out across several files?

Sorry for the delay.

Instead of talking about the mechanics of how to make the code
conditional, I would first like to have a discussion about if the code
should even be conditional, and if it is unconditional, do we need it?
 More on this below.

> My issue is that there already are two different symbols which require
> coordination to activate this functionality: DEBUG_HASHES defined and used
> locally within policydb.c and simple DEBUG which is needed for pr_debug()
> statements throughout the code.

My general thinking is that if the information is useful to a user to
manage and tune their system then we should include the code.  If the
information is only useful to kernel developers to play with different
designs or implementation then we can, and should, leave that code
out.  Developers are likely going to need to add their own
instrumentation anyway for testing, no sense in us cluttering up the
kernel for something that may never be useful to anyone.

> Personally, I prefer something global and controlled from a single well-known
> place, hence the Kconfig. However, I also see your point about reducing
> Kconfig... But if not Kconfig, then what? Should I just create an additional
> .h file with all SELinux-specific debug symbols and have it included
> everywhere in SELinux?
>
> How would you approach this?

I *think* the answers above should help, but if not let me know :)

--
paul moore
www.paul-moore.com
