From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
Sender: keescook@google.com
In-Reply-To:
References: <20180227111532.1144-1-ppandit@redhat.com>
From: Kees Cook
Date: Mon, 5 Mar 2018 12:42:38 -0800
Message-ID:
Subject: Re: [PATCH 0/1] Zero initialise kernel stack variables
Content-Type: text/plain; charset="UTF-8"
To: Casey Schaufler
Cc: Nick Kralevich, P J P, Kernel Hardening, Florian Weimer, P J P
List-ID:

On Fri, Mar 2, 2018 at 1:01 PM, Casey Schaufler wrote:
> On 3/2/2018 11:52 AM, Nick Kralevich wrote:
>> On Tue, Feb 27, 2018 at 11:28 AM, Kees Cook wrote:
>>>> This same kernel is running on my F27 test machine as I write this.
>>>> There is no slowness or noticeable performance impact as such.
>>> Unfortunately "noticeable" isn't going to be a viable metric. You'll
>>> need to do some real-world benchmarks (i.e. kernel builds, hackbench,
>>> etc), and compare the results. Even just initializing
>>> passed-by-reference variables (GCC_PLUGIN_STRUCTLEAK_BYREF_ALL) had
>>> measurable performance impact.
>>>
>>> It would be nice to have four options/features available from the
>>> compiler, from least to most performance impact:
>>
>> One of the recurring themes I've noticed from the kernel-hardening
>> mailing list is that someone will propose a valuable compile-time
>> feature (such as this one), people will ask for performance
>> measurements, some specific use case will be found which has adverse
>> performance impact, and the entire change will be rejected. It's
>> silly.

The way I think about adding security hardening to upstream is that we
have to consider a number of conflicting interests. I would say the
primary principle is "secure by default", which comes with lots of
caveats, namely in the form of performance impact, maintainability
impact, and false positive rate. If something has zero performance and
maintenance impact and a zero false positive rate, then it's a
slam-dunk and there's no problem. This is rarely the case, of course.
Frequently the maintainability and false positive rate can be seen or
inferred from the content of the patches, so the explicit question,
when not answered in a commit log or documentation, becomes "what is
the performance impact?"

When something can't be always-on, we have to ask "can this be a
_runtime_ toggle?" rather than a compile-time option, since many of
the end users of the kernel get it through distros, and distros are
allergic to building many different kernel packages with different
compile-time options. They just want _one_ kernel binary.

Now, speaking to "the entire change will be rejected", this usually
happens in two cases. Either the added complexity is seen as not worth
the hardening (in which case better evidence is needed to support the
hardening, or refactoring to reduce the maintenance burden), or we
bump up against what I'll call "security absolutism" from some
maintainers. Dealing with the latter is much more difficult, since
they don't see pragmatic security improvements as having value
compared to "perfect protection". Obviously perfect is preferred, but
it can't always be achieved. (e.g. see "an unpowered computer is
safest", etc.)

I want hardening features to _actually_ get used by the widest
possible audience of end users. That means trying to balance all the
things above and getting the best possible approach for upstream and
downstream. In this specific case, we can't have a runtime toggle,
which means we need to get the performance impact as low as possible
if we want to see distros using it. If we can't get it down to a level
where a distro wants to enable it, that's fine: people who do their
own kernel builds will still have it available as a compile-time
option. At least we've tried, and we'll have evidence to show what
kinds of improvements would be needed in the future to gain wider
roll-out.

> As security developers we *must* consider performance to be
> critical at all times.
> There is no easier way to get something
> rejected out of hand than for it to have serious performance
> impact. On the other hand, if I had a nickel for every time I
> saw something really stoopid done to improve performance I'd
> have a big pile of nickels. One of the performance issues your
> change caused was in the common case of network packet delivery.
> The people who work on that code are fanatical about performance.
> Just as the security people might flame an obvious buffer overflow,
> the networking people will stomp on an unnecessary performance hit.

The networking folks are especially unhappy about doing (what they see
as) redundant memory initialization, even when it kills classes of
bugs. I personally find this rather frustrating, but I understand that
performance is critical to them. In these cases, having a generalized
approach (such as a compiler flag or plugin) provides a way for the
general public to benefit from killing bug classes, and for the
perf-critical users to disable the protection. I don't like having to
face these trade-offs, but as we're seeing, the corner-cutting by CPUs
and kernels continues to haunt us, and we need to find efficient ways
to initialize memory, avoid side-channels, etc.

>> The beautiful thing about compile-time features is that they can be
>> enabled / disabled per compilation unit. If there's a performance
>> problem in a certain chunk of code, figure out a way to opt that code
>> out of the protections. 95% of the performance of the kernel is likely
>> only dependent on 5% of the code, and even if we opt out that code, we
>> still protect the remaining 95% of the code. If you assume that bugs
>> are evenly distributed throughout the code base, a 95% reduction is
>> HUGE.
>
> For good or ill there is a strong tradition that a 95% security
> solution is broken. You don't need all the code to be penetrated,
> just a teeny bit.
> And what about the code that couldn't be
> penetrated anyway, but now has to be slower to protect that little
> bit?

Yeah, and it's specifically the networking code that has had many of
the flaws that leak memory contents, due to drivers not fully
initializing some structure here or there. So we'd leave ourselves
open to a known source of regular bugs.

I think the best use of per-compilation-unit protections is for things
that have high false-positive rates: they can be used as a way to
start building up coverage over time across the entire kernel, with
the goal of 100% coverage (e.g. runtime integer overflow detection
like Clang's -fsanitize=integer).

>> Strategically, we shouldn't let the perfect be the enemy of the good.
>> Let's move away from fixating on benchmarks, and just enable easy
>> opt-out for the folks who demonstrate performance bottlenecks.

I agree that opt-out is desired, but upstream conservatism tends to
dictate opt-in initially. Having performance numbers (like I talk
about above) is just part of the larger effort of figuring out a
strategy for upstreaming. Once a feature is in, distros enable it by
default, and then the kernel does too. This can span years, though.

> The problem is not that people will choose performance over security,
> or the other way around. It's that they don't consider both to be
> important in the first place.
>
>> I suspect that many kernel-hardening features would have an easier
>> time being accepted if they weren't presented as all-or-nothing
>> features.

The largest constraint, IMO, is the maintainability one. Having
complex code with lots of #ifdefs for the compile-time toggles creates
code that bit-rots, gets less testing, etc. Perf is part of the
picture, but one that gets discussed more explicitly since it can
impact the other areas of concern.

> You're up against the notion that you can ignore one aspect of
> the system to enhance another.
>
>> My $0.02.

Thanks for bringing it up!
I wish things were easier, but this is what we have. :)

I'd also like to think that if we can hammer out all the concerns for
a patch series on this list, then we can avoid flame-wars and
auto-ignore when presenting things for merging.

-Kees

--
Kees Cook
Pixel Security