From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754121AbdKBLsX (ORCPT ); Thu, 2 Nov 2017 07:48:23 -0400
Received: from Galois.linutronix.de ([146.0.238.70]:53816 "EHLO
	Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751370AbdKBLsW (ORCPT ); Thu, 2 Nov 2017 07:48:22 -0400
Date: Thu, 2 Nov 2017 12:48:20 +0100 (CET)
From: Thomas Gleixner
To: Andy Lutomirski
cc: Dave Hansen, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org",
	moritz.lipp@iaik.tugraz.at, Daniel Gruss,
	michael.schwarz@iaik.tugraz.at, Linus Torvalds, Kees Cook,
	Hugh Dickins, X86 ML, Borislav Petkov, Josh Poimboeuf
Subject: Re: KAISER memory layout (Re: [PATCH 06/23] x86, kaiser: introduce
	user-mapped percpu areas)
In-Reply-To:
Message-ID:
References:
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2 Nov 2017, Andy Lutomirski wrote:

> I think we're far enough along here that it may be time to nail down
> the memory layout for real.  I propose the following:
>
> The user tables will contain the following:
>
>  - The GDT array.
>  - The IDT.
>  - The vsyscall page.  We can make this be _PAGE_USER.

I'd rather remove it for the kaiser case.

>  - The TSS.
>  - The per-cpu entry stack.  Let's make it one page with guard pages
>    on either side.  This can replace rsp_scratch.
>  - cpu_current_top_of_stack.  This could be in the same page as the TSS.
>  - The entry text.
>  - The percpu IST (aka "EXCEPTION") stacks.

Do you really want to put the full exception stacks into that user mapping?
I think we should not do that. There are two options:

  1) Always use the per-cpu entry stack and switch to the proper IST after
     the CR3 fixup.

  2) Have separate per-cpu entry stacks for the ISTs and switch to the real
     ones after the CR3 fixup.

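
To illustrate the "switch after the CR3 fixup" step that both options share, here is a minimal user-space model in C. It is only a sketch, not kernel code: struct cpu_model, the stack sizes, and the helper names are hypothetical, and the CR3 switch is modelled by a plain flag. The idea shown is that the hardware frame first lands on a small entry stack that is visible in the user tables, and is copied to the real stack only once the kernel tables are active.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_STACK_WORDS 64    /* hypothetical: small stack mapped in the user tables */
#define REAL_STACK_WORDS  512   /* hypothetical: real IST stack, kernel tables only */
#define FRAME_WORDS       5     /* RIP, CS, RFLAGS, RSP, SS */

/* The five words the CPU pushes on a 64-bit exception, lowest address first. */
struct hw_frame {
    uint64_t ip, cs, flags, sp, ss;
};

/* Hypothetical per-CPU model; kernel_cr3 stands in for the CR3 register. */
struct cpu_model {
    uint64_t entry_stack[ENTRY_STACK_WORDS];
    uint64_t real_stack[REAL_STACK_WORDS];
    int kernel_cr3;             /* 0 = user page tables, 1 = kernel page tables */
};

/* Model exception delivery: the frame lands at the top of the entry stack. */
static uint64_t *deliver_on_entry_stack(struct cpu_model *cpu,
                                        const struct hw_frame *f)
{
    uint64_t *top = cpu->entry_stack + ENTRY_STACK_WORDS - FRAME_WORDS;

    memcpy(top, f, sizeof(*f));
    return top;
}

/* The real stack may only be touched once the kernel tables are active. */
static uint64_t *real_stack_top(struct cpu_model *cpu)
{
    assert(cpu->kernel_cr3);
    return cpu->real_stack + REAL_STACK_WORDS - FRAME_WORDS;
}

/* Do the CR3 fixup first, then move the frame over to the real stack. */
static uint64_t *switch_to_real_stack(struct cpu_model *cpu,
                                      const uint64_t *entry_frame)
{
    uint64_t *dst;

    cpu->kernel_cr3 = 1;        /* models the CR3 fixup */
    dst = real_stack_top(cpu);
    memcpy(dst, entry_frame, FRAME_WORDS * sizeof(uint64_t));
    return dst;
}
```

In this model, option 1 would use a single entry_stack for every entry path, while option 2 would keep one such small stack per IST vector. The copy after the CR3 fixup is the same either way.
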
> We can either try to move all of the above into the fixmap or we can
> have the user tables be sparse a la Dave's current approach.  If we do
> it the latter way, I think we'll want to add a mechanism to have holes
> in the percpu space to give the entry stack a guard page.
>
> I would *much* prefer moving everything into the fixmap, but that's a
> wee bit awkward because we can't address per-cpu data in the fixmap
> using %gs, which makes the SYSCALL code awkward.  But we could alias
> the SYSCALL entry text itself per-cpu into the fixmap, which lets us
> use %rip-relative addressing, which is quite nice.
>
> So I guess my preference is to actually try the fixmap approach.  We
> give the TSS the same aliasing treatment we gave the GDT, and I can
> try to make the entry trampoline work through the fixmap and thus not
> need %gs-based addressing until CR3 gets updated.  (This actually
> saves several cycles of latency.)

Makes a lot of sense.

Thanks,

	tglx
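
The fixmap idea quoted above can be sketched as plain address arithmetic. This is only an illustration under stated assumptions: ALIAS_BASE, the slot enum, and both helpers are hypothetical, and addresses ascend here for simplicity, which need not match how the real fixmap numbers its slots. The point shown is that when each CPU gets its own alias of the entry text at a fixed slot next to its TSS and entry stack, the distance from the code to the data is the same constant on every CPU, so %rip-relative addressing can reach it without %gs.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE  4096ULL
/* Hypothetical base of the per-CPU alias region; not a real kernel constant. */
#define ALIAS_BASE 0xffffff0000000000ULL

/* Hypothetical per-CPU page slots, one page each. */
enum cpu_slot {
    SLOT_GUARD_LOW,     /* unmapped guard page below the entry stack */
    SLOT_ENTRY_STACK,   /* one-page per-CPU entry stack */
    SLOT_GUARD_HIGH,    /* unmapped guard page above the entry stack */
    SLOT_TSS,           /* TSS, possibly with cpu_current_top_of_stack */
    SLOT_ENTRY_TEXT,    /* per-CPU alias of the SYSCALL entry text */
    SLOTS_PER_CPU
};

/* Virtual address of a given slot for a given CPU. */
static uint64_t slot_addr(unsigned int cpu, enum cpu_slot slot)
{
    assert(slot < SLOTS_PER_CPU);
    return ALIAS_BASE + ((uint64_t)cpu * SLOTS_PER_CPU + slot) * PAGE_SIZE;
}

/*
 * Signed distance from a CPU's entry text alias to its TSS page.
 * It does not depend on the CPU number, which is what would let the
 * trampoline use a fixed %rip-relative displacement.
 */
static int64_t text_to_tss_delta(unsigned int cpu)
{
    return (int64_t)(slot_addr(cpu, SLOT_TSS) -
                     slot_addr(cpu, SLOT_ENTRY_TEXT));
}
```

For example, text_to_tss_delta(0) and text_to_tss_delta(7) return the same value in this layout, and the entry stack page of each CPU sits directly between its two guard slots.
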