From: Kees Cook
Date: Sat, 11 Nov 2017 08:13:33 -0800
Subject: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11
To: Linus Torvalds
Cc: Patrick McLean, Emese Revfy, Al Viro, Bruce Fields, "Darrick J. Wong",
    Linux Kernel Mailing List, Linux NFS Mailing List, stable,
    Thorsten Leemhuis, "kernel-hardening@lists.openwall.com"

On Fri, Nov 10, 2017 at 6:36 PM, Linus Torvalds wrote:
> [ Bringing in the gcc plugin people and the kernel hardening list,
> since it now is no longer even remotely looking like a nfsd, vfs or
> filesystem issue any more ]
>
> Kees, Emese,
> the whole thread is on lkml, but there's clearly something horribly
> wrong with RANDSTRUCT, and it's not new even though it looked that way
> for a while.

It wouldn't be the first issue we've seen; it's (obviously) a pretty
aggressive change to the resulting build.

> Patrick seems to trigger it with nfsd, so it might be specific to that.
>
> Alternatively, it might just be that very few people run
> RANDSTRUCT-built kernels, or just have been lucky with the seeding.

Given its potential cache-line abuse, I'm not surprised that its usage
is more limited than other features.

> Sorry for top-posting, but there's not really anything in the email
> itself to reply to, other than saying thanks to Patrick for narrowing
> it down like this.

Agreed; thanks Patrick! :) Given that the issue is non-deterministic, I
wonder if the bug is related to some kind of missing RCU or barrier that
goes unnoticed in normal struct layouts.

> It would have been very interesting if it had actually bisected to
> something, but it seems that the real issue is just the choice of
> seeding for RANDSTRUCT.

That's where we've seen bugs in the past: some pathological ordering of
a struct uncovers a corner case. In the past it's been much more
deterministic: doesn't build, or immediately crashes on boot, etc.

I'll take a closer look at this and see if I can provide something to
narrow it down.

-Kees

>
> Linus
>
> On Fri, Nov 10, 2017 at 4:27 PM, Patrick McLean wrote:
>> On 2017-11-10 03:26 PM, Patrick McLean wrote:
>>> On 2017-11-10 10:42 AM, Linus Torvalds wrote:
>>>>
>>>> I really don't see anything that looks even half-way suspicious in
>>>> that 4.13.8..11 range. But as mentioned, compiler interactions can be
>>>> _really_ subtle.
>>>>
>>>> And hey, it can be a real kernel bug too, that just happens to be
>>>> exposed by RANDSTRUCT, so a bisect really would be very nice.
>>>
>>> I am working on bisecting the issue now, but I think I have some more
>>> evidence pointing to a compiler issue related to RANDSTRUCT. There are
>>> actually 3 issues that we have seen. Sometimes we get the null pointer
>>> deref in the initial message, sometimes we get the GPF, and sometimes we
>>> see an issue where the NFS clients see all files as root-owned
>>> directories.
>>> Any given kernel will always see the same issue, but after
>>> a "make mrproper" and recompile (with the same .config), the issue will
>>> often change. I suspect that all 3 of these problems are actually the
>>> same issue manifesting itself in different ways depending on what seed
>>> the RANDSTRUCT gcc plugin is using.
>>
>> Further update on this, using the same seed for RANDSTRUCT, I have
>> reproduced this issue on v4.13.0, so it does not seem to be recently
>> introduced. The older kernel apparently only worked for us because we
>> were lucky. Generally we always compile new kernels from a fresh tree,
>> so they are never using the same seed.
>>
>> In case someone wants to play with this, here are some interesting seeds
>> (in include/generated/randomize_layout_hash.h):
>>
>> Produce a NULL pointer dereference (though I am not sure what the client
>> does to produce this).
>> 5970d6494d0f4236ec57147a46e700f4f501536236d96f6f68ea223e06a258bc
>>
>> All files for nfsd4 clients appear as directories owned as root, no
>> matter the real owner (this happens for all clients we have tested):
>> 3f158cd1014800ce5eb6c1f532ac64f2357fdb9a684096557d2fbb1d281f325e
>>
>> This is the seed that was breaking motherboards (make sure you have a
>> way to flash the BIOS with this one):
>> 3e32f2d1b4a3dde9f2fd95151386cd1d5bd6167597a0b868f6273aabfc5712dd
>>
>> Finally, here is a seed that produces a kernel that does not exhibit any
>> problems we are aware of:
>> e8698c12137fcd1dcbff6d1ed97e5d766128447a27ce9f9d61e0cb8c05ad4d3b
>>
>>>>
>>>> Because in the end, compiler bugs are very rare. They are particularly
>>>> annoying when they do happen, though, so they loom big in the mind of
>>>> people who have had to chase them down.
>>>>

-- 
Kees Cook
Pixel Security
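
For anyone following the thread without a RANDSTRUCT background, here is a
rough userspace sketch of the two ideas above: a field order that is fixed
for a given seed but changes between seeds, and code that only works as
long as one particular field happens to come first. Everything in it is
made up for illustration (xorshift32, shuffle_fields, layout_a, layout_b,
fragile_read_mode); it is not the plugin's algorithm and not a guess at the
actual nfsd bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NFIELDS 4

/* Hypothetical PRNG (xorshift32); not the generator the gcc plugin uses. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Fill order[0..n-1] with a permutation of 0..n-1 that depends only on seed. */
static void shuffle_fields(uint32_t seed, size_t *order, size_t n)
{
	uint32_t state = seed ? seed : 1u;	/* xorshift state must be nonzero */
	size_t i;

	assert(order != NULL || n == 0);
	for (i = 0; i < n; i++)
		order[i] = i;
	/* Fisher-Yates: swap slot i-1 with a pseudo-random slot in 0..i-1. */
	for (i = n; i > 1; i--) {
		size_t j = xorshift32(&state) % i;
		size_t tmp = order[i - 1];

		order[i - 1] = order[j];
		order[j] = tmp;
	}
}

/* The same three fields in two different orders. */
struct layout_a { int mode; unsigned int uid; void *ops; };
struct layout_b { void *ops; unsigned int uid; int mode; };

/*
 * Fragile: assumes "mode" is the first member and reads the first
 * sizeof(int) bytes of the object, whatever actually lives there.
 */
static int fragile_read_mode(const void *obj)
{
	int mode;

	memcpy(&mode, obj, sizeof(mode));
	return mode;
}
```

Calling shuffle_fields() twice with the same seed fills both arrays with
the same permutation, which is the "any given kernel always sees the same
issue" half of the story. With a zeroed struct layout_a whose mode is set
to 42, fragile_read_mode() returns 42; with a zeroed struct layout_b set up
the same way it returns 0, because the bytes at offset 0 now belong to ops.
Going through the member name (obj->mode) is correct in both layouts.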