From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753843AbdHUPXQ (ORCPT <rfc822;w@1wt.eu>);
        Mon, 21 Aug 2017 11:23:16 -0400
Received: from mail-qk0-f181.google.com ([209.85.220.181]:35624 "EHLO
        mail-qk0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753433AbdHUPXO (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 21 Aug 2017 11:23:14 -0400
Content-Type: text/plain;
        charset=us-ascii
Mime-Version: 1.0 (1.0)
Subject: Re: [PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with cr3
From: Andy Lutomirski <luto@amacapital.net>
X-Mailer: iPhone Mail (14G60)
In-Reply-To: <20170821140813.idloyrk4lowann3j@hirez.programming.kicks-ass.net>
Date: Mon, 21 Aug 2017 08:23:10 -0700
Cc: Andy Lutomirski <luto@kernel.org>, Will Deacon <will.deacon@arm.com>,
        Mark Rutland <mark.rutland@arm.com>,
        Matt Fleming <matt@codeblueprint.co.uk>,
        Ard Biesheuvel <ard.biesheuvel@linaro.org>,
        Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>,
        "linux-efi@vger.kernel.org" <linux-efi@vger.kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        joeyli <jlee@suse.com>, Borislav Petkov <bp@alien8.de>,
        "Michael S. Tsirkin" <mst@redhat.com>,
        "Neri, Ricardo" <ricardo.neri@intel.com>,
        "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Message-Id: <6E0248C9-19AB-474E-A901-2A0422337DD0@amacapital.net>
References: <20170816095338.GB17270@leverpostej> <20170816100709.GG12845@arm.com> <20170816110321.GC17270@leverpostej> <20170816125715.GB3384@codeblueprint.co.uk> <CALCETrVSHuz=hcAgGxYHBmpiNcP4ECrJh885cR4X17QsJVj8+w@mail.gmail.com> <20170815223541.GA25778@remoulade> <20170817103514.GC27872@arm.com> <CALCETrVhLmntPArQiuOcQeNf9Y2kDxa+mUY=v1P8rVOkeCZn4Q@mail.gmail.com> <20170821103359.jt2xf2cx5wxjldau@hirez.programming.kicks-ass.net> <E5C753C8-1B92-4B4F-B95D-32DCC5C0AF5F@amacapital.net> <20170821140813.idloyrk4lowann3j@hirez.programming.kicks-ass.net>
To: Peter Zijlstra <peterz@infradead.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by nfs id v7LFNLBH005929


> On Aug 21, 2017, at 7:08 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Mon, Aug 21, 2017 at 06:56:01AM -0700, Andy Lutomirski wrote:
>> 
>> 
>>> On Aug 21, 2017, at 3:33 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
>>>> 
>>>> Using a kernel thread solves the problem for real.  Anything that
>>>> blindly accesses user memory in kernel thread context is terminally
>>>> broken no matter what.
>>> 
>>> So perf-callchain doesn't do it 'blindly', it wants either:
>>> 
>>> - user_mode(regs) true, or
>>> - task_pt_regs() set.
>>> 
>>> However I'm thinking that if the kernel thread has ->mm == &efi_mm, the
>>> EFI code running could very well have user_mode(regs) being true.
>>> 
>>> intel_pmu_pebs_fixup() OTOH 'blindly' assumes that the LBR addresses are
>>> accessible. It bails on error though. So while it's careful, it does
>>> attempt to access the 'user' mapping directly. Which should also trigger
>>> with the EFI code.
>>> 
>>> And I'm not seeing anything particularly broken with either. The PEBS
>>> fixup relies on the CPU having just executed the code, and if it could
>>> fetch and execute the code, why shouldn't it be able to fetch and read?
>> 
>> There are two ways this could be a problem.  One is that unprivileged
>> user apps shouldn't be able to read from EFI memory.
> 
> Ah, but only root can create per-cpu events or attach events to kernel
> threads (with sensible paranoia levels).

But the event may not need to be per-cpu.  If a non-root user can trigger, say, an EFI variable read in their own thread context, boom.

> 
>> The other is that, if EFI were to have IO memory mapped at a "user"
>> address, perf could end up reading it.
> 
> Ah, but in neither mode does perf assume much, the LBR follows branches
> the CPU took and thus we _know_ there was code there, not MMIO. And the
> stack unwind simply follows the stack up, although I suppose it could be
> 'tricked' into probing MMIO. We can certainly add an "->mm !=
> ->active_mm" escape clause to the unwind code.
> 
> Although I don't see how we're currently avoiding the same problem with
> existing userspace unwinds, userspace can equally have MMIO mapped.

But userspace at least only has IO mapped that the user program in question has rights to.

> 
> But neither will use pre-existing user addresses in the efi_mm I think.