linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexandre Chartre <alexandre.chartre@oracle.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>, X86 ML <x86@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Joerg Roedel <jroedel@suse.de>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	jan.setjeeilers@oracle.com, Junaid Shahid <junaids@google.com>,
	oweisse@google.com, Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Alexander Graf <graf@amazon.de>,
	mgross@linux.intel.com, kuzuno@gmail.com
Subject: Re: [RFC][PATCH v2 12/21] x86/pti: Use PTI stack instead of trampoline stack
Date: Thu, 19 Nov 2020 13:06:19 +0100	[thread overview]
Message-ID: <88bab705-4b33-bda9-3ece-563234822095@oracle.com> (raw)
In-Reply-To: <2f6a446a-e656-627c-27f2-8411f318448c@oracle.com>



On 11/19/20 9:05 AM, Alexandre Chartre wrote:
>>>>>>>>>
>>>>>>>>> When entering the kernel from userland, use the per-task PTI stack
>>>>>>>>> instead of the per-cpu trampoline stack. Like the trampoline stack,
>>>>>>>>> the PTI stack is mapped both in the kernel and in the user page-table.
>>>>>>>>> Using a per-task stack which is mapped into the kernel and the user
>>>>>>>>> page-table instead of a per-cpu stack will allow executing more code
>>>>>>>>> before switching to the kernel stack and to the kernel page-table.
>>>>>>>>
>>>>>>>> Why?
>>>>>>>
>>>>>>> When executing more code in the kernel, we are likely to reach a point
>>>>>>> where we need to sleep while we are using the user page-table, so we need
>>>>>>> to be using a per-thread stack.
>>>>>>>
>>>>>>>> I can't immediately evaluate how nasty the page table setup is because
>>>>>>>> it's not in this patch.
>>>>>>>
>>>>>>> The page-table is the regular page-table as introduced by PTI. It is just
>>>>>>> augmented with a few additional mapping which are in patch 11 (x86/pti:
>>>>>>> Extend PTI user mappings).
>>>>>>>
>>>>>>>>     But AFAICS the only thing that this enables is sleeping with user pagetables.
>>>>>>>
>>>>>>> That's precisely the point, it allows to sleep with the user page-table.
>>>>>>>
>>>>>>>> Do we really need to do that?
>>>>>>>
>>>>>>> Actually, probably not with this particular patchset, because I do the page-table
>>>>>>> switch at the very beginning and end of the C handler. I had some code where I
>>>>>>> moved the page-table switch deeper in the kernel handler where you definitively
>>>>>>> can sleep (for example, if you switch back to the user page-table before
>>>>>>> exit_to_user_mode_prepare()).
>>>>>>>
>>>>>>> So a first step should probably be to not introduce the per-task PTI trampoline stack,
>>>>>>> and stick with the existing trampoline stack. The per-task PTI trampoline stack can
>>>>>>> be introduced later when the page-table switch is moved deeper in the C handler and
>>>>>>> we can effectively sleep while using the user page-table.
>>>>>>
>>>>>> Seems reasonable.
>>>>>>
>>>>>
>>>>> I finally remember why I have introduced a per-task PTI trampoline stack right now:
>>>>> that's to be able to move the CR3 switch anywhere in the C handler. To do so, we need
>>>>> a per-task stack to enter (and return) from the C handler as the handler can potentially
>>>>> go to sleep.
>>>>>
>>>>> Without a per-task trampoline stack, we would be limited to call the switch CR3 functions
>>>>> from the assembly entry code before and after calling the C function handler (also called
>>>>> from assembly).
>>>>
>>>> The noinstr part of the C entry code won't sleep.
>>>>
>>>
>>> But the noinstr part of the handler can sleep, and if it does we will need to
>>> preserve the trampoline stack (even if we switch to the per-task kernel stack to
>>> execute the noinstr part).
>>>
>>> Example:
>>>
>>> #define DEFINE_IDTENTRY(func)                                           \
>>> static __always_inline void __##func(struct pt_regs *regs);             \
>>>                                                                           \
>>> __visible noinstr void func(struct pt_regs *regs)                       \
>>> {                                                                       \
>>>           irqentry_state_t state;         -+                              \
>>>                                            |                              \
>>>           user_pagetable_escape(regs);     | use trampoline stack (1)
>>>           state = irqentry_enter(regs);    |                              \
>>>           instrumentation_begin();        -+                              \
>>>           run_idt(__##func, regs);       |===| run __func() on kernel stack (this can sleep)
>>>           instrumentation_end();          -+                              \
>>>           irqentry_exit(regs, state);      | use trampoline stack (2)
>>>           user_pagetable_return(regs);    -+                              \
>>> }
>>>
>>> Between (1) and (2) we need to preserve and use the same trampoline stack
>>> in case __func() went sleeping.
>>>
>>
>> Why?  Right now, we have the percpu entry stack, and we do just fine
>> if we enter on one percpu stack and exit from a different one.
>> We would need to call from asm to C on the entry stack, return back to
>> asm, and then switch stacks.
>>
> 
> That's the problem: I didn't want to return back to asm, so that the pagetable
> switch can be done anywhere in the C handler.
> 
> So yes, returning to asm to switch the stack is the solution if we want to avoid
> having per-task trampoline stack. The drawback is that this forces to do the
> page-table switch at the beginning and end of the handler; the pagetable switch
> cannot be moved deeper down into the C handler.
> 
> But that's probably a good first step (effectively just moving CR3 switch to C
> without adding per-task trampoline stack). I will update the patches to do that,
> and we can defer the per-task trampoline stack to later if there's an effective
> need for it.
> 

That might not be a good first step after all... Calling CR3 switch C functions
from assembly introduces extra pt_regs copies between the trampoline stack and the
kernel stack.

Currently when entering syscall, we immediately switches CR3 and builds pt_regs
directly on the kernel stack. On return, registers are restored from pt_regs from
the kernel stack, the return frame is built on the trampoline stack and then we
switch CR3.

To call CR3 switch C functions on syscall entry, we need to switch to the trampoline
stack, build pt_regs on the trampoline stack, call CR3 switch, switch to the kernel
stack, copy pt_regs to the kernel stack. On return, we have to copy pt_regs back to
the trampoline stack, call CR3 switch, restore registers.

This is less of an impact for interrupt because we enter on the trampoline stack and
the current code already builds pt_regs on the trampoline stack and copies it to the
kernel stack (although this can certainly be avoided in the current code).

I am not comfortable adding these extra steps in syscall and interrupt as the current
code is fairly optimized. With a per-task trampoline stack, we don't have extra copy
because we can build pt_regs directly on the trampoline stack and it will preserved
even when switching to the kernel stack. On syscall/interrupt return, it also saves
a copy of the iret frame from the kernel stack to the trampoline stack.

alex.

  reply	other threads:[~2020-11-19 12:06 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-16 14:47 [RFC][PATCH v2 00/21] x86/pti: Defer CR3 switch to C code Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 01/21] x86/syscall: Add wrapper for invoking syscall function Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 02/21] x86/entry: Update asm_call_on_stack to support more function arguments Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 03/21] x86/entry: Consolidate IST entry from userspace Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 04/21] x86/sev-es: Define a setup stack function for the VC idtentry Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 05/21] x86/entry: Implement ret_from_fork body with C code Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 06/21] x86/pti: Provide C variants of PTI switch CR3 macros Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 07/21] x86/entry: Fill ESPFIX stack using C code Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 08/21] x86/pti: Introduce per-task PTI trampoline stack Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 09/21] x86/pti: Function to clone page-table entries from a specified mm Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 10/21] x86/pti: Function to map per-cpu page-table entry Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 11/21] x86/pti: Extend PTI user mappings Alexandre Chartre
2020-11-16 19:48   ` Andy Lutomirski
2020-11-16 20:21     ` Alexandre Chartre
2020-11-16 23:06       ` Andy Lutomirski
2020-11-17  8:42         ` Alexandre Chartre
2020-11-17 15:49           ` Andy Lutomirski
2020-11-19 19:15           ` Thomas Gleixner
2020-11-16 14:47 ` [RFC][PATCH v2 12/21] x86/pti: Use PTI stack instead of trampoline stack Alexandre Chartre
2020-11-16 16:57   ` Andy Lutomirski
2020-11-16 18:10     ` Alexandre Chartre
2020-11-16 18:34       ` Andy Lutomirski
2020-11-16 19:37         ` Alexandre Chartre
2020-11-17 15:09         ` Alexandre Chartre
2020-11-17 15:52           ` Andy Lutomirski
2020-11-17 17:01             ` Alexandre Chartre
2020-11-19  1:49               ` Andy Lutomirski
2020-11-19  8:05                 ` Alexandre Chartre
2020-11-19 12:06                   ` Alexandre Chartre [this message]
2020-11-19 16:06                     ` Andy Lutomirski
2020-11-19 17:02                       ` Alexandre Chartre
2020-11-16 21:24       ` David Laight
2020-11-17  8:27         ` Alexandre Chartre
2020-11-19 19:10       ` Thomas Gleixner
2020-11-19 19:55         ` Alexandre Chartre
2020-11-19 21:20           ` Thomas Gleixner
2020-11-24  7:20   ` [x86/pti] 5da9e742d1: PANIC:double_fault kernel test robot
2020-11-16 14:47 ` [RFC][PATCH v2 13/21] x86/pti: Execute syscall functions on the kernel stack Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 14/21] x86/pti: Execute IDT handlers " Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 15/21] x86/pti: Execute IDT handlers with error code " Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 16/21] x86/pti: Execute system vector handlers " Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 17/21] x86/pti: Execute page fault handler " Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 18/21] x86/pti: Execute NMI " Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 19/21] x86/pti: Defer CR3 switch to C code for IST entries Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 20/21] x86/pti: Defer CR3 switch to C code for non-IST and syscall entries Alexandre Chartre
2020-11-16 14:47 ` [RFC][PATCH v2 21/21] x86/pti: Use a different stack canary with the user and kernel page-table Alexandre Chartre
2020-11-16 16:56   ` Andy Lutomirski
2020-11-16 18:34     ` Alexandre Chartre
2020-11-16 20:17 ` [RFC][PATCH v2 00/21] x86/pti: Defer CR3 switch to C code Borislav Petkov
2020-11-17  7:56   ` Alexandre Chartre
2020-11-17 16:55     ` Borislav Petkov
2020-11-17 18:12       ` Alexandre Chartre
2020-11-17 18:28         ` Borislav Petkov
2020-11-17 19:02           ` Alexandre Chartre
2020-11-17 21:23             ` Borislav Petkov
2020-11-18  7:08               ` Alexandre Chartre
2020-11-17 21:26         ` Borislav Petkov
2020-11-18  7:41           ` Alexandre Chartre
2020-11-18  9:30             ` David Laight
2020-11-18 10:29               ` Alexandre Chartre
2020-11-18 13:22                 ` David Laight
2020-11-18 17:15                   ` Alexandre Chartre
2020-11-18 11:29             ` Borislav Petkov
2020-11-18 19:37               ` Alexandre Chartre
2020-11-16 20:24 ` Borislav Petkov
2020-11-17  8:19   ` Alexandre Chartre
2020-11-17 17:07     ` Borislav Petkov
2020-11-17 18:24       ` Alexandre Chartre
2020-11-19 19:32     ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=88bab705-4b33-bda9-3ece-563234822095@oracle.com \
    --to=alexandre.chartre@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=graf@amazon.de \
    --cc=hpa@zytor.com \
    --cc=jan.setjeeilers@oracle.com \
    --cc=jroedel@suse.de \
    --cc=junaids@google.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kuzuno@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mgross@linux.intel.com \
    --cc=mingo@redhat.com \
    --cc=oweisse@google.com \
    --cc=peterz@infradead.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).