From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932186AbbFQLFC (ORCPT ); Wed, 17 Jun 2015 07:05:02 -0400
Received: from mail-wg0-f44.google.com ([74.125.82.44]:36706 "EHLO
	mail-wg0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753710AbbFQLEz (ORCPT );
	Wed, 17 Jun 2015 07:04:55 -0400
Date: Wed, 17 Jun 2015 13:04:50 +0200
From: Ingo Molnar
To: Richard Weinberger
Cc: Andy Lutomirski, "x86@kernel.org", LKML, Frédéric Weisbecker,
	Rik van Riel, Oleg Nesterov, Denys Vlasenko, Borislav Petkov,
	Kees Cook, Brian Gerst, Linus Torvalds, Denys Vlasenko
Subject: Re: [RFC/INCOMPLETE 00/13] x86: Rewrite exit-to-userspace code
Message-ID: <20150617110450.GA8919@gmail.com>
References: <20150617094857.GB3940@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org


* Richard Weinberger wrote:

> On Wed, Jun 17, 2015 at 11:48 AM, Ingo Molnar wrote:
> >
> > * Andy Lutomirski wrote:
> >
> >> This is incomplete, but it's finally good enough that I think it's
> >> time to get other opinions on it. It is a complete rewrite of the
> >> slow path code that handles exits to user mode.
> >
> > Modulo the small comments I made about the debug checks interface plus naming
> > details the structure and intention of this series gives me warm fuzzy feelings.
> >
> >> The exit-to-usermode code is copied in several places and is written in a nasty
> >> combination of asm and C. It's not at all clear what it's supposed to do, and
> >> the way it's structured makes it very hard to work with. For example, it's not
> >> even clear why syscall exit hooks are called only once per syscall right now.
> >> (It seems to be a side effect of the way that rdi and rdx are handled in the asm
> >> loop, and it seems reliable, but it's still pointlessly complicated.) The
> >> existing code also makes context tracking overly complicated and hard to
> >> understand. Finally, it's nearly impossible for anyone to change what happens
> >> on exit to usermode, since the existing code is so fragile.
> >
> > Amen.
> >
> >> I tried to clean it up incrementally, but I decided it was too hard. Instead,
> >> this series just replaces the code. It seems to work.
> >
> > Any known bugs beyond UML build breakage?
> >
> >> Context tracking in particular works very differently now. The low-level entry
> >> code checks that we're in CONTEXT_USER and switches to CONTEXT_KERNEL. The exit
> >> code does the reverse. There is no need to track what CONTEXT_XYZ state we came
> >> from, because we already know. Similarly, SCHEDULE_USER is gone, since we can
> >> reschedule if needed by simply calling schedule() from C code.
> >>
> >> The main things that are missing are that I haven't done the 32-bit parts
> >> (anyone want to help?) and therefore I haven't deleted the old C code. I also
> >> think this may break UML for trivial reasons.
> >>
> >> Because I haven't converted the 32-bit code yet, all of the now-unnecessary
> >> calls to exception_enter are still present in traps.c.
> >>
> >> IRQ context tracking is still duplicated. We should probably clean it up by
> >> changing the core code to supply something like
> >> irq_enter_we_are_already_in_context_kernel.
> >>
> >> Thoughts?
> >
> > So assuming you fix the UML build I'm inclined to go for it, even in this
> > incomplete form, to increase testing coverage.
>
> Andy, can you please share the build breakage you're facing?
> I'll happily help you fix it.

So they come in the form of:

  ./arch/um/include/shared/kern_util.h:25:12: error: conflicting types for ‘do_signal’

which comes from x86 now also having a do_signal().

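To make the shape of the clash and of the harmonized interface concrete, here
is a minimal user-space sketch. It is not kernel code: 'struct task', the
boolean flags and the recording pointers are hypothetical stand-ins invented
for illustration, and only the do_signal() and interrupt_end() signatures
follow what is discussed in this thread.

```c
/* Hypothetical user-space mock, not kernel code: every type, flag and
 * helper below is a stand-in made up for this illustration. */
#include <stdbool.h>
#include <stddef.h>

struct pt_regs { unsigned long ip; };		/* stand-in for the real pt_regs */
struct task { struct { struct pt_regs regs; } thread; };

static struct task init_task;
static struct task *current = &init_task;	/* stand-in for the kernel's 'current' */

static bool sigpending;				/* stand-in for TIF_SIGPENDING */
static bool notify_resume;			/* stand-in for TIF_NOTIFY_RESUME */

static struct pt_regs *last_signal_regs;	/* records what do_signal() received */
static struct pt_regs *last_notify_regs;	/* records what the notify hook received */

/*
 * The old UML prototype was 'int do_signal(void)'. A second declaration
 * with this signature in the same translation unit is what produces
 * "conflicting types for 'do_signal'".
 */
void do_signal(struct pt_regs *regs)
{
	last_signal_regs = regs;
}

static void tracehook_notify_resume(struct pt_regs *regs)
{
	last_notify_regs = regs;
}

void interrupt_end(void)
{
	/* Computed once and shared by both callees. */
	struct pt_regs *regs = &current->thread.regs;

	if (sigpending)
		do_signal(regs);
	if (notify_resume) {
		notify_resume = false;
		tracehook_notify_resume(regs);
	}
}
```

With a single signature on both sides there is only one declaration for the
compiler to see, and the caller decides which register set to pass.
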
The patch below fixes it by harmonizing the UML implementation with the x86 one.
This improves the UML side a bit, and fixes the build failure.

Thanks,

	Ingo

=========================>
Subject: uml: Fix do_signal() prototype
From: Ingo Molnar
Date: Wed Jun 17 12:58:37 CEST 2015

Now that x86 exports its do_signal(), the prototypes clash.

Fix the clash and also improve the code a bit: remove the
unnecessary kern_do_signal() indirection. This allows
interrupt_end() to share the 'regs' parameter calculation.

Also remove the unused return code to match x86.

Minimally build and boot tested.

Cc: Richard Weinberger
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: Ingo Molnar
---
 arch/um/include/shared/kern_util.h |    3 ++-
 arch/um/kernel/process.c           |    6 ++++--
 arch/um/kernel/signal.c            |    8 +-------
 arch/um/kernel/tlb.c               |    2 +-
 arch/um/kernel/trap.c              |    2 +-
 5 files changed, 9 insertions(+), 12 deletions(-)

Index: tip/arch/um/include/shared/kern_util.h
===================================================================
--- tip.orig/arch/um/include/shared/kern_util.h
+++ tip/arch/um/include/shared/kern_util.h
@@ -22,7 +22,8 @@ extern int kmalloc_ok;
 extern unsigned long alloc_stack(int order, int atomic);
 extern void free_stack(unsigned long stack, int order);
 
-extern int do_signal(void);
+struct pt_regs;
+extern void do_signal(struct pt_regs *regs);
 extern void interrupt_end(void);
 extern void relay_signal(int sig, struct siginfo *si, struct uml_pt_regs *regs);
 
Index: tip/arch/um/kernel/process.c
===================================================================
--- tip.orig/arch/um/kernel/process.c
+++ tip/arch/um/kernel/process.c
@@ -90,12 +90,14 @@ void *__switch_to(struct task_struct *fr
 
 void interrupt_end(void)
 {
+	struct pt_regs *regs = &current->thread.regs;
+
 	if (need_resched())
 		schedule();
 	if (test_thread_flag(TIF_SIGPENDING))
-		do_signal();
+		do_signal(regs);
 	if (test_and_clear_thread_flag(TIF_NOTIFY_RESUME))
-		tracehook_notify_resume(&current->thread.regs);
+		tracehook_notify_resume(regs);
 }
 
 void exit_thread(void)
Index: tip/arch/um/kernel/signal.c
===================================================================
--- tip.orig/arch/um/kernel/signal.c
+++ tip/arch/um/kernel/signal.c
@@ -64,7 +64,7 @@ static void handle_signal(struct ksignal
 	signal_setup_done(err, ksig, singlestep);
 }
 
-static int kern_do_signal(struct pt_regs *regs)
+void do_signal(struct pt_regs *regs)
 {
 	struct ksignal ksig;
 	int handled_sig = 0;
@@ -110,10 +110,4 @@ static int kern_do_signal(struct pt_regs
 	 */
 	if (!handled_sig)
 		restore_saved_sigmask();
-	return handled_sig;
-}
-
-int do_signal(void)
-{
-	return kern_do_signal(&current->thread.regs);
 }
Index: tip/arch/um/kernel/tlb.c
===================================================================
--- tip.orig/arch/um/kernel/tlb.c
+++ tip/arch/um/kernel/tlb.c
@@ -291,7 +291,7 @@ void fix_range_common(struct mm_struct *
 		/* We are under mmap_sem, release it such that current can terminate */
 		up_write(&current->mm->mmap_sem);
 		force_sig(SIGKILL, current);
-		do_signal();
+		do_signal(&current->thread.regs);
 	}
 }
 
Index: tip/arch/um/kernel/trap.c
===================================================================
--- tip.orig/arch/um/kernel/trap.c
+++ tip/arch/um/kernel/trap.c
@@ -173,7 +173,7 @@ static void bad_segv(struct faultinfo fi
 void fatal_sigsegv(void)
 {
 	force_sigsegv(SIGSEGV, current);
-	do_signal();
+	do_signal(&current->thread.regs);
 	/*
 	 * This is to tell gcc that we're not returning - do_signal
 	 * can, in general, return, but in this case, it's not, since