Re: [PATCH 0/9] SELinux: Improve hash functions and sizing of hash tables

From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
To: Paul Moore <paul@paul-moore.com>
Cc: selinux@vger.kernel.org, colin.king@canonical.com,
	Eric Paris <eparis@parisplace.org>,
	gregkh@linuxfoundation.org, jeffv@google.com,
	omosnace@redhat.com,
	Stephen Smalley <stephen.smalley.work@gmail.com>,
	tglx@linutronix.de
Subject: Re: [PATCH 0/9] SELinux: Improve hash functions and sizing of hash tables
Date: Mon, 13 Apr 2020 16:43:35 -0400	[thread overview]
Message-ID: <20200413204334.GA10584@concurrent-rt.com> (raw)
In-Reply-To: <CAHC9VhRJ=b-dTVwgTE1TKezqY8KWoGFoHSU1XdfMNjP6uoHQFg@mail.gmail.com>

The 04/09/2020 09:41, Paul Moore wrote:
> On Wed, Apr 8, 2020 at 2:24 PM <siarhei.liakh@concurrent-rt.com> wrote:
> > From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
> >
> > This patch set is the result of an actual customer support case where a client
> > of ours observed unacceptable variability of timings while sending UDP packets
> > via sendto() with SELinux enabled. The initial investigation centered around
> > kernel 3.10.108 in CentOS 6.5 environment. Combination of these patches with
> > some substantial tuning of newly added Kconfig options was able to reduce
> > *maximum* sendto() latency to about 160us, down from well over 2ms. Worst
> > latency was typically observed when a new SSH login was initiated concurrently
> > with the test program running a sendto() loop, thus causing an AVC miss with a
> > subsequent call to security_compute_av(), which would spend most of its time
> > iterating through policydb within context_struct_compute_av().
> >
> > The original patch set was developed for linux kernel 3.10.108 and later ported
> > to newer versions. The patch set presented here is based off Linus' tree as of
> > April 7, 2020 and contains only a subset of the changes which are still relevant
> > to 5.6+ as many of the issues had already been addressed in different ways.
> >
> > The patch set consists of 9 patches total and is meant to achieve two goals:
> > 1. Replace most local copies of custom hash functions with generic hash
> >    functions already available in inlude/linux/*.h.
> >
> > 2. Replace hard-coded hash table sizing parameters with Kconfig tunables.
> >
> > "Advanced Hashing" Kconfig option is the only dependency between the patches,
> > but other than that and any combination of them can be used.
> 
> I haven't yet looked at these patches in detail, but a few quick thoughts:

Looks like I sent you a snapshot with a couple of minor porting errors before I
fixed them. You can still look at the patches since the errors do not change
the logic, just hold off on compiling. Apologies for the mix-up, I'll send out
another version soon.

Just wanted to provide some additional background so we can better understand
each other. I focus primarily on real-time aspects of the system, so pretty
much everything I write here has to be viewed in that context. For example,
while I would like to get average performance numbers improved, I am much more
concerned with extreme outliers. Such frame of thinking, to some degree, could
be considered similar to the primary reason why Linux prefers heapsort over
quicksort: while quicksort is faster on average, it also suffers from a
pathological worst-case scenario of O(n^2).

I understand that real-time Linux is a small niche. However, I do believe that
at least some of the changes I suggest would be useful to general public also.
Although I would be really happy if my patches were to get accepted, I think
of them more as conversation starters rather than actual finished product.

> * I would be very curious to see what the performance improvement is
> on a current kernel build from either selinux/next or Linus' tree.

I've got a Fedora 32 beta box installed, so I will get some benchmark numbers
for Linus' tree with default F32 policies soon.

> Performance numbers from an extremely old commercial distribution
> aren't of much interest to mainline development.

I get that. My point is that some of the inefficiencies (at least from my
standpoint) had been in the kernel for a long time and are still there today.
As much as I'd love to get our clients off ancient release, they have their
own reasons to use it, thus fixing my point of reference to something old.

> * In general I'm a fan of reducing the number of Kconfig options
> whenever possible in favor of the system auto-tuning based on usage
> (e.g. the loaded policy).  Obviously this isn't possible in some
> cases, but I worry that there is always a risk that if we expose a
> build knob there is a risk it will be mis-configured.  My initial
> reaction is that this patch set exposes way too many things as Kconfig
> knobs.  As an aside, I also worry about runtime tunables, but to a
> much lesser extent (the risk is less, the benefits greater, etc.).

Sure. My point is that hardcoded sizing is inconvenient so it would be nice
to fix it in some way. Moving stuff into Kconfig was the easiest way to do
it in my case, but I am open to better suggestions.

> * The AVC hash table improvement doesn't exactly look like a
> sea-change, 

Correct. However, old hash reliably produces about 40% collision rate while
a new one has distribution which is demonstrably better and it gets even more
so with liarger number of buckets (which, admittedly, is not necessarily what
a typical end-user would use).

> have you tried it on multiple policies and work loads?  I
> wonder if the small improvement you saw changes on different workloads
> and/or policies.

I am not an expert on SELinux (be that kernel or usersace side of business),
so I make do whith what is readily available to me: whatever comes with
the distribuition by default (meaning Targeted policy from CentOS), or
whatever a client is willing to provide us (if anything at all). Further,
as our client had discovered issues under particular workload, that is the
one I am using to benchmark my fixes. That said, I am willing to run a
better test suite if you have one ready.

> * In general I agree with your statement about using common code, e.g.
> hash functions, to improve code quality, maintenance, etc. but the
> hashing functions you are replacing are rather trivial and easily
> maintained.  Not to mention their performance in the SELinux code is a
> well known quantity at this point.

Here is my take on it: proper hash functions are just as hard to come up
with as locking and encryption schemes. Please consider following three points:

1. In 2006 Thomas Gleixner discovered that standard known-good hash_64()
is actually not that good. A very interesting discussion followed:
https://lwn.net/Articles/687494/
https://lore.kernel.org/lkml/20160428163525.495514517@linutronix.de/
https://lore.kernel.org/lkml/20160430205235.24232.qmail@ns.horizon.com/

2. A simple custom hash had already been deemed inadequate and replaced in
avtab.c with (unfortunately) a custom version of known good generic hash:
commit 33ebc1932a07efd8728975750409741940334489
Author: John Brooks <john.brooks@jolla.com>
Date:   Tue Mar 24 16:54:17 2015 -0400

3. We already have at least three ways to do essentially the same thing:

/* _OLD_ AVTab hash - see point #2 above */
static inline int avtab_hash(struct avtab_key *keyp, u32 mask)
{
       return ((keyp->target_class + (keyp->target_type << 2) +
                (keyp->source_type << 9)) & mask);
}

static u32 rangetr_hash(struct hashtab *h, const void *k)
{
        const struct range_trans *key = k;

        return (key->source_type + (key->target_type << 3) +
                (key->target_class << 5)) & (h->size - 1);
}

static inline int avc_hash(u32 ssid, u32 tsid, u16 tclass)
{
        return (ssid ^ (tsid<<2) ^ (tclass<<4)) & (AVC_CACHE_SLOTS - 1);
}

Yet another local copy of a simple custom hash is already being suggested for
future use in SELinux:

+static u32 role_trans_hash(struct hashtab *h, const void *k)
+{
+	const struct role_trans_key *key = k;
+
+	return (key->role + (key->type << 3) + (key->tclass << 5)) &
+		(h->size - 1);
+}

https://lore.kernel.org/selinux/20200407182858.1149087-1-omosnace@redhat.com/

So, as a kernel developer looking at SELinux for the first time in attempt to
resolve a real issue for a real custoer, I have the following questions:
- Why do we need so many ways to hash 3x 32-bit values?
- Why avtab hash was deemed to be bad, but not the other ones? Was it due to
unfortunate chice of shift constants?
- Why avc hash is using XOR while rangetr is using addition? Why shift (2, 4)
vs (3, 5)?
- Why not just use known-good jhash_3words() everywhere and be done with it?

> I'll take a closer look at these patches, likely next week after the
> merge window closes, but in the meantime if you could provide some
> more current performance numbers with a better explanation of the
> workloads I think it would be helpful.

Thank you for making time for this. I'll get you some numbers.
I really hope a better Linux for everyone will come out of this.

Thank you!
-- 
Siarhei Liakh
Concurrent Real-Time