linux-kernel.vger.kernel.org archive mirror
From: Andy Lutomirski <luto@amacapital.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Lutomirski <luto@kernel.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [RFC 2/2] x86/pti/64: Remove the SYSCALL64 entry trampoline
Date: Sun, 22 Jul 2018 13:59:21 -0700
Message-ID: <422DF5AC-6B45-406F-B3FC-DD1AA9BC18F6@amacapital.net>
In-Reply-To: <CA+55aFz1ne3KTzni2Yvsp8ZRFzk+s78ZhKyGeLZvmRivBhFMfA@mail.gmail.com>


> On Jul 22, 2018, at 11:27 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>> On Sun, Jul 22, 2018 at 10:45 AM Andy Lutomirski <luto@kernel.org> wrote:
>> 
>> This patch changes the code to map the percpu TSS into the user page
>> tables to allow the non-trampoline SYSCALL64 path to work under PTI.
> 
> Me likey.
> 
> However:
> 
>> This does not add a new direct information leak, since the TSS is
>> readable by Meltdown from the cpu_entry_area alias regardless.
> 
> Afaik, it does now potentially expose through meltdown the per-thread
> entry stack info, which is new.

It’s always been exposed through the RO alias. The only new exposure is the *address* of the RW alias, I think.

> 
> But I don't think that's a show-stopper.
> 
>> static void __init pti_clone_user_shared(void)
>> {
>> +       for_each_possible_cpu(cpu) {
> 
> But this code is pretty disgusting and seems wrong.
> 
> Do you really want to do all the _possible_ cpu's, not just the
> online ones? I'd rather expose less (think MAXCPU) and then have the
> CPU hotplug code expose the page as the CPU comes up?

We already have exactly the same issue for cpu_entry_area. If we change it, I think we should do cpu_entry_area at the same time.  But that’s awkward because cpu_entry_area is mapped one PMD at a time right now.

It’s also awkward to expose a percpu page dynamically, because (I think) percpu data isn’t guaranteed to all be in the same PGD-sized area. A vmalloc fault in the early SYSCALL64 path is fatal.
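
For illustration, the possible-versus-online tradeoff discussed above can be modeled in userspace with plain bitmasks. This is only a sketch under stated assumptions: sketch_count_exposed_pages() and the mask values are hypothetical stand-ins, not the kernel's cpumask API, and it assumes exactly one exposed TSS page per CPU.

```c
#include <stdint.h>

/*
 * Userspace model only, not kernel code. Each set bit in cpu_mask
 * stands for one CPU whose percpu TSS page would be mapped into the
 * user page tables. The return value is the number of pages exposed.
 */
unsigned int sketch_count_exposed_pages(uint64_t cpu_mask)
{
	unsigned int pages = 0;

	while (cpu_mask) {
		cpu_mask &= cpu_mask - 1;	/* clear the lowest set bit */
		pages++;
	}
	return pages;
}
```

With a hypothetical possible mask of 0xff and an online mask of 0x0f, mapping per possible CPU exposes eight pages, while mapping per online CPU exposes four and leaves the remainder to a hotplug callback, which is where the PGD and vmalloc-fault concern comes in.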

> 
>> +               unsigned long va = (unsigned long)&per_cpu(cpu_tss_rw, cpu);
>> +               phys_addr_t pa = per_cpu_ptr_to_phys((void *)va);
>> +               pte_t *target_pte;
>> +
>> +               target_pte = pti_user_pagetable_walk_pte(va);
> 
> This function only exists if CONFIG_X86_VSYSCALL_EMULATION, so it
> won't even compile under (very unusual) configurations.

Oops.

> 
> The "disgusting" part is that I think it could/should share more code
> with the vsyscall case, and the whole target-pte checking and setting
> should be shared too.

I tried that. It was uglier. The percpu code wants to make up a new PTE because the real kernel mapping uses large pages. The vsyscall code wants to copy a PTE because it’s really a PTE and it has unusual permissions.
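
As a rough userspace sketch of that difference (not the kernel's pfn_pte() or the actual PTI code), the two approaches look like this. All sketch_* names are hypothetical, and the flag set is a simplified assumption that models only the x86-64 present, writable, and NX bits.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PTE_PRESENT	(1ULL << 0)
#define SKETCH_PTE_RW		(1ULL << 1)
#define SKETCH_PTE_NX		(1ULL << 63)

typedef uint64_t sketch_pte_t;

/*
 * Percpu style: no 4 KiB PTE exists to copy, because the kernel maps
 * the region with large pages. Build a fresh 4 KiB entry from the
 * physical address and a chosen set of permission flags.
 */
sketch_pte_t sketch_make_pte(uint64_t pa, uint64_t flags)
{
	uint64_t pfn = pa >> SKETCH_PAGE_SHIFT;

	return (pfn << SKETCH_PAGE_SHIFT) | flags;
}

/*
 * Vsyscall style: a real 4 KiB PTE already exists and carries unusual
 * permissions, so it is copied verbatim instead of being rebuilt.
 */
sketch_pte_t sketch_copy_pte(const sketch_pte_t *src)
{
	return *src;
}
```

The first helper throws away the in-page offset and chooses the permissions itself; the second preserves whatever bits the source entry had. That mismatch is why a shared helper ends up awkward.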

> 
> Because it's not shared, I react to this:
> 
>> +               set_pte(target_pte, pfn_pte(pa >> PAGE_SHIFT, PAGE_KERNEL));
> 
> Hmm. The vsyscall code just does
> 
>        *target_pte = ..
> 
> without any set_pte() stuff. Do we want/need the PVOP cases, and if
> so, why doesn't the vsyscall case need it?

It doesn’t need it. I could use plain assignment.
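
To illustrate the question, here is a minimal userspace model of the two store paths. It assumes a single hook table standing in for the paravirt ops; the sketch_* names and the call counter are hypothetical and exist only for this example.

```c
#include <stdint.h>

typedef uint64_t sketch_pte_t;

/* Stand-in for a paravirt ops table: one indirect hook per operation. */
struct sketch_mmu_ops {
	void (*set_pte)(sketch_pte_t *ptep, sketch_pte_t val);
};

/* Counts how many stores went through the hook. */
unsigned long sketch_hook_calls;

/* Default ("native") hook: count the call, then do the store. */
void sketch_native_set_pte(sketch_pte_t *ptep, sketch_pte_t val)
{
	sketch_hook_calls++;
	*ptep = val;
}

struct sketch_mmu_ops sketch_ops = {
	.set_pte = sketch_native_set_pte,
};

/* set_pte() style: the store is routed through the hook table. */
void sketch_set_pte(sketch_pte_t *ptep, sketch_pte_t val)
{
	sketch_ops.set_pte(ptep, val);
}

/* Plain-assignment style: a direct store that bypasses any hook. */
void sketch_assign_pte(sketch_pte_t *ptep, sketch_pte_t val)
{
	*ptep = val;
}
```

Both paths leave the same value in the entry. The only difference in this model is whether the hook gets a chance to observe the store.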

> 
> Anyway, I love the approach, and how this gets rid of the nasty
> trampoline, so no real complaints, just "this needs some fixups".
> 
> 

I’ll do the fixups. I think that, if we want to unmap the pages for CPUs that aren’t present, that should be a separate patch. I’m also not convinced it adds much value.

In general, PTI is fairly crappy, and it leaks all kinds of information. I suspect the worst leak is the NMI stack for local and remote CPUs. Fixing *that* is going to be fugly, but may actually be important, because I can easily imagine malicious user code that causes arbitrary kernel memory to get read and spilled on the NMI stack.

What we *should* do IMO is defer allocation of percpu space for not-present CPUs to save a bunch of memory.  But that’s a major change and will probably break things.

Thread overview: 9+ messages
2018-07-22 17:45 [RFC 0/2] Get rid of the entry trampoline Andy Lutomirski
2018-07-22 17:45 ` [RFC 1/2] x86/entry/64: Use the TSS sp2 slot for rsp_scratch Andy Lutomirski
2018-07-22 20:12   ` Ingo Molnar
2018-07-23 12:38   ` Dave Hansen
2018-07-24  2:36     ` Andy Lutomirski
2018-07-22 17:45 ` [RFC 2/2] x86/pti/64: Remove the SYSCALL64 entry trampoline Andy Lutomirski
2018-07-22 18:27   ` Linus Torvalds
2018-07-22 20:59     ` Andy Lutomirski [this message]
2018-07-23 12:59   ` Dave Hansen
