From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752602AbdGFAUM (ORCPT ); Wed, 5 Jul 2017 20:20:12 -0400 Received: from mail.kernel.org ([198.145.29.99]:47846 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752440AbdGFAUL (ORCPT ); Wed, 5 Jul 2017 20:20:11 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 90FFB22BDC Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org MIME-Version: 1.0 In-Reply-To: References: <1499126133.2707.20.camel@decadent.org.uk> <20170704084122.GC14722@dhcp22.suse.cz> <20170704093538.GF14722@dhcp22.suse.cz> <20170704094728.GB22013@1wt.eu> <20170704104211.GG14722@dhcp22.suse.cz> <20170704113611.GA4732@decadent.org.uk> <1499209315.2707.29.camel@decadent.org.uk> <1499257180.2707.34.camel@decadent.org.uk> <20170705142354.GB21220@dhcp22.suse.cz> From: Andy Lutomirski Date: Wed, 5 Jul 2017 17:19:47 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH] mm: larger stack guard gap, between vmas To: Kees Cook Cc: Andy Lutomirski , Linus Torvalds , Michal Hocko , Ben Hutchings , Willy Tarreau , Hugh Dickins , Oleg Nesterov , "Jason A. Donenfeld" , Rik van Riel , Larry Woodman , "Kirill A. Shutemov" , Tony Luck , "James E.J. Bottomley" , Helge Diller , James Hogan , Laura Abbott , Greg KH , "security@kernel.org" , Qualys Security Advisory , LKML , Ximin Luo Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jul 5, 2017 at 4:50 PM, Kees Cook wrote: > On Wed, Jul 5, 2017 at 10:23 AM, Andy Lutomirski wrote: >> Right. But I think the approach that we're all taking here is a bit >> nutty. We all realize that this issue is a longstanding *GCC* bug >> [1], but we're acting like it's a Big Deal (tm) kernel bug that Must >> Be Fixed (tm) and therefore is allowed to break ABI. My security hat >> is normally pretty hard-line, but I think it may be time to call BS. >> >> Imagine if Kees had sent some symlink hardening patch that was >> default-on and broke a stock distro. Or if I had sent a vsyscall >> hardening patch that broke real code. It would get reverted right >> away, probably along with a diatribe about how we should have known >> better. I think this stack gap stuff is the same thing. It's not a >> security fix -- it's a hardening patch. >> >> Looking at it that way, I think a new inherited-on-exec flag is nucking futs. >> >> I'm starting to think that the right approach is to mostly revert all >> this stuff (the execve fixes are fine). Then start over and think >> about it as hardening. I would suggest the following approach: >> >> - The stack gap is one page, just like it's been for years. >> - As a hardening feature, if the stack would expand within 64k or >> whatever of a non-MAP_FIXED mapping, refuse to expand it. (This might >> have to be a non-hinted mapping, not just a non-MAP_FIXED mapping.) >> The idea being that, if you deliberately place a mapping under the >> stack, you know what you're doing. If you're like LibreOffice and do >> something daft and are thus exploitable, you're on your own. >> - As a hardening measure, don't let mmap without MAP_FIXED position >> something within 64k or whatever of the bottom of the stack unless a >> MAP_FIXED mapping is between them. >> >> And that's all. It's not like a 64k gap actually fixes these bugs for >> real -- it just makes them harder to exploit. >> >> [1] The code that GCC generates for char buf[bug number] and alloca() >> is flat-out wrong. Everyone who's ever thought about it all all knows >> it and has known about it for years, but no one cared to fix it. > > As part of that should we put restrictions on the environment of > set*id exec too? Part of the risks demonstrated by Qualys was that > allowing a privilege-elevating binary to inherit rlimits can have lead > to the nasty memory layout side-effects. That would fall into the > "hardening" bucket as well. And if it turns out there is some set*id > binary out there that can't run with "only", e.g., 128MB of stack, we > can make it configurable... Yes. I think it's ridiculous that you can change rlimits and then exec a setuid thing. It's not so easy to fix, though. Maybe track, per-task, inherited by clone and exec, what the rlimits were the last time the process had privilege and reset to those limits when running something setuid. But a better approach might be to have some sysctls that say what the rlimits become when doing setuid. We need per-user-ns sysctls for stuff like this, and we don't really have them... --Andy