From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
To: "jannh@google.com" <jannh@google.com>,
	"keescook@chromium.org" <keescook@chromium.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Van De Ven, Arjan" <arjan.van.de.ven@intel.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"Accardi, Kristen C" <kristen.c.accardi@intel.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"kernel-hardening@lists.openwall.com" 
	<kernel-hardening@lists.openwall.com>,
	"Hansen, Dave" <dave.hansen@intel.com>
Subject: Re: [PATCH 0/3] KASLR feature to randomize each loadable module
Date: Thu, 21 Jun 2018 18:59:59 +0000	[thread overview]
Message-ID: <1529607615.29548.202.camel@intel.com> (raw)
In-Reply-To: <CAG48ez2uuQkSS9DLz6j5HbpuxaHMyAVYGMM+xoZEo51N=sHmdg@mail.gmail.com>

On Thu, 2018-06-21 at 15:37 +0200, Jann Horn wrote:
> On Thu, Jun 21, 2018 at 12:34 AM Kees Cook <keescook@chromium.org>
> wrote:
> > And most systems have <200 modules, really. I have 113 on a desktop
> > right now, 63 on a server. So this looks like a trivial win.
> But note that the eBPF JIT also uses module_alloc(). Every time a BPF
> program (this includes seccomp filters!) is JIT-compiled by the
> kernel, another module_alloc() allocation is made. For example, on my
> desktop machine, I have a bunch of seccomp-sandboxed processes thanks
> to Chrome. If I enable the net.core.bpf_jit_enable sysctl and open a
> few Chrome tabs, BPF JIT allocations start showing up between
> modules:
> 
> # grep -C1 bpf_jit_binary_alloc /proc/vmallocinfo | cut -d' ' -f 2-
>   20480 load_module+0x1326/0x2ab0 pages=4 vmalloc N0=4
>   12288 bpf_jit_binary_alloc+0x32/0x90 pages=2 vmalloc N0=2
>   20480 load_module+0x1326/0x2ab0 pages=4 vmalloc N0=4
> --
>   20480 load_module+0x1326/0x2ab0 pages=4 vmalloc N0=4
>   12288 bpf_jit_binary_alloc+0x32/0x90 pages=2 vmalloc N0=2
>   36864 load_module+0x1326/0x2ab0 pages=8 vmalloc N0=8
> --
>   20480 load_module+0x1326/0x2ab0 pages=4 vmalloc N0=4
>   12288 bpf_jit_binary_alloc+0x32/0x90 pages=2 vmalloc N0=2
>   40960 load_module+0x1326/0x2ab0 pages=9 vmalloc N0=9
> --
>   20480 load_module+0x1326/0x2ab0 pages=4 vmalloc N0=4
>   12288 bpf_jit_binary_alloc+0x32/0x90 pages=2 vmalloc N0=2
>  253952 load_module+0x1326/0x2ab0 pages=61 vmalloc N0=61
> 
> If you use Chrome with Site Isolation, you have a few dozen open
> tabs,
> and the BPF JIT is enabled, reaching a few hundred allocations might
> not be that hard.
> 
> Also: What's the impact on memory usage? Is this going to increase
> the
> number of pagetables that need to be allocated by the kernel per
> module_alloc() by 4K or 8K or so?
Thanks, it seems it might require some extra memory.  I'll look into
exactly how much.
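As a rough sketch of the worst case (my assumptions, not measurements:
each randomized allocation lands in its own otherwise-empty 2 MiB PMD
region, so it needs its own 4 KiB PTE page, while the PMD/PUD pages are
shared across the whole module area):

```shell
# Back-of-envelope for extra page-table memory, assuming one new
# 4 KiB PTE page per allocation because each lands in its own
# 2 MiB PMD region (PMD/PUD levels are shared).
allocs=200        # e.g. loaded modules + BPF JIT programs
pte_page=4096     # bytes per last-level page table
echo $(( allocs * pte_page / 1024 )) KiB   # prints: 800 KiB
```

So on the order of a few hundred KiB for a typical system, if that
assumption holds; I'll confirm with real numbers.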

I didn't include eBPF JIT allocations in the randomization estimates,
but it looks like they are usually smaller than a page.  So, with the
slight leap that the estimate based on the larger normal modules is the
worst case, you should still get ~800 modules at 18 bits.  After that
it will start to go down toward 10 bits, so in either case it at least
won't regress the randomness of the existing algorithm.
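For reference, the 18 bits is roughly the number of page-aligned
candidate positions in the module area (assuming the 1 GiB x86-64
module space and 4 KiB slots; the exact constants are in the patch):

```shell
# Where ~18 bits of entropy comes from: candidate slot count in the
# randomization area, assuming 4 KiB-aligned positions in 1 GiB.
area=$(( 1 << 30 ))      # 1 GiB module space
slot=$(( 1 << 12 ))      # 4 KiB alignment
slots=$(( area / slot ))
echo $slots              # prints: 262144, i.e. 2^18 positions
```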

> > 
> > > 
> > > As for fragmentation, this algorithm reduces the average number
> > > of modules that
> > > can be loaded without an allocation failure by about 6% (~17000
> > > to ~16000)
> > > (p<0.05). It can also reduce the largest module executable
> > > section that can be
> > > loaded by half to ~500MB in the worst case.
> > Given that we only have 8312 tristate Kconfig items, I think 16000
> > will remain just fine. And even large modules (i915) are under
> > 2MB...
> > 
> > > 
> > > The new __vmalloc_node_try_addr function uses the existing
> > > function
> > > __vmalloc_node_range, in order to introduce this algorithm with
> > > the least
> > > invasive change. The side effect is that each time there is a
> > > collision when
> > > trying to allocate in the random area a TLB flush will be
> > > triggered. There is
> > > a more complex, more efficient implementation that can be used
> > > instead if
> > > there is interest in improving performance.
> > The only time when module loading speed is noticeable, I would
> > think,
> > would be boot time. Have you done any boot time delta analysis? I
> > wouldn't expect it to change hardly at all, but it's probably a
> > good
> > idea to actually test it. :)
> If you have a forking server that applies seccomp filters on each
> fork, or something like that, you might care about those TLB flushes.
> 

I can test this as well.
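As a first-order model of the flush overhead (my assumption, not from
the patch): each failed placement attempt triggers one TLB flush, and
if a fraction p of candidate slots is occupied, an allocation retries
about 1/(1-p) times on average before succeeding.

```shell
# Hypothetical collision model: expected extra TLB flushes per
# allocation at a given occupancy of the randomization area.
occupied=50                          # percent of slots in use
tries=$(( 100 / (100 - occupied) ))  # expected attempts (integer approx)
echo $(( tries - 1 ))                # prints: 1 (extra flush per alloc)
```

So the flush cost should stay small until the area gets quite full;
the benchmark numbers will show whether that matters for the
seccomp-on-fork case.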

Thread overview: 29+ messages
2018-06-20 22:09 [PATCH 0/3] KASLR feature to randomize each loadable module Rick Edgecombe
2018-06-20 22:09 ` [PATCH 1/3] vmalloc: Add __vmalloc_node_try_addr function Rick Edgecombe
2018-06-20 22:16   ` Randy Dunlap
2018-06-20 22:35     ` Kees Cook
2018-06-20 22:44       ` Randy Dunlap
2018-06-20 23:05         ` Kees Cook
2018-06-20 23:16           ` Randy Dunlap
2018-06-20 22:26   ` Matthew Wilcox
2018-06-21 22:02     ` Edgecombe, Rick P
2018-06-20 22:09 ` [PATCH 2/3] x86/modules: Increase randomization for modules Rick Edgecombe
2018-06-20 22:09 ` [PATCH 3/3] vmalloc: Add debugfs modfraginfo Rick Edgecombe
2018-06-21  0:53   ` kbuild test robot
2018-06-21  1:17   ` kbuild test robot
2018-06-21 12:32   ` Jann Horn
2018-06-21 18:56     ` Edgecombe, Rick P
2018-06-20 22:33 ` [PATCH 0/3] KASLR feature to randomize each loadable module Kees Cook
2018-06-21 13:37   ` Jann Horn
2018-06-21 13:39     ` Jann Horn
2018-06-21 18:59     ` Edgecombe, Rick P [this message]
2018-06-21 21:23       ` Daniel Borkmann
2018-06-21 18:56   ` Edgecombe, Rick P
