From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752906AbbEATk4 (ORCPT ); Fri, 1 May 2015 15:40:56 -0400
Received: from mail-lb0-f172.google.com ([209.85.217.172]:36038 "EHLO mail-lb0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752141AbbEATiB (ORCPT ); Fri, 1 May 2015 15:38:01 -0400
MIME-Version: 1.0
In-Reply-To: <5543CFE5.1030509@redhat.com>
References: <1430429035-25563-1-git-send-email-riel@redhat.com> <1430429035-25563-4-git-send-email-riel@redhat.com> <20150501064044.GA18957@gmail.com> <554399D1.6010405@redhat.com> <20150501155912.GA451@gmail.com> <20150501162109.GA1091@gmail.com> <5543A94B.3020108@redhat.com> <20150501163431.GB1327@gmail.com> <5543C05E.9040209@redhat.com> <20150501184025.GA2114@gmail.com> <5543CFE5.1030509@redhat.com>
From: Andy Lutomirski
Date: Fri, 1 May 2015 12:37:38 -0700
Subject: Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry
To: Rik van Riel
Cc: Ingo Molnar, "linux-kernel@vger.kernel.org", X86 ML, williams@redhat.com, Andrew Lutomirski, fweisbec@redhat.com, Peter Zijlstra, Heiko Carstens, Thomas Gleixner, Ingo Molnar, Paolo Bonzini, "Paul E. McKenney", Linus Torvalds
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 1, 2015 at 12:11 PM, Rik van Riel wrote:
> On 05/01/2015 02:40 PM, Ingo Molnar wrote:
>
>> Or we could do that in the syscall path with a single store of a
>> constant flag to a location in the task struct. We have a number of
>> natural flags that get written on syscall entry, such as:
>>
>>         pushq_cfi $__USER_DS            /* pt_regs->ss */
>>
>> That goes to a constant location on the kernel stack. On return from
>> system calls we could write 0 to that location.

Huh?
IRET with zero there will fault, and we can't guarantee that all
syscalls return using sysret.  Also, we'd have to audit all the
entries, and system_call_after_swapgs currently enables interrupts
early enough that an interrupt before all the pushes will do
unpredictable things to pt_regs.

We could further abuse orig_ax, but that would require a lot of
auditing.  Honestly, though, I think keeping a flag in an
otherwise-hot cache line is a better bet.  Also, making it per-cpu
instead of per-task will probably be easier on the RCU code, since
otherwise the RCU code will have to do some kind of synchronization
(even if it's a wait-free probe of the rq lock or similar) to figure
out which pt_regs to look at.

If we went that route, I'd advocate sticking the flag in tss->sp1.
That cacheline is unconditionally read on kernel entry already, and
sp1 is available in tip:x86/asm (and maybe even in Linus' tree -- I
lost track). [1]

Alternatively, I'd suggest that we actually add a whole new word to
pt_regs.

[1] It's not unconditionally accessed yet, but it will be once Denys'
latest patches are in.

--Andy
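
A minimal userspace sketch of the per-cpu flag idea discussed above, written in C11. Everything in it is an assumption made for illustration: the names (`NR_CPUS_MODEL`, `struct cpu_state_model`, the `model_*` functions) are hypothetical, the array slot only stands in for a word such as tss->sp1, and the memory orderings shown are not a claim about what RCU would actually need. The point is only that a remote observer can find the flag from the CPU number alone, without first working out which task (and so which pt_regs) is current on that CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4   /* hypothetical CPU count for this model */

/* Hypothetical per-CPU word; a stand-in for something like tss->sp1. */
struct cpu_state_model {
    _Atomic unsigned long in_kernel;   /* nonzero while in kernel mode */
};

/* Zero-initialized: every modeled CPU starts out "in user mode". */
struct cpu_state_model cpu_states[NR_CPUS_MODEL];

/* Syscall entry: one store of a constant to a fixed per-CPU location. */
void model_syscall_entry(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    atomic_store_explicit(&cpu_states[cpu].in_kernel, 1UL,
                          memory_order_release);
}

/* Return to user mode: clear the same per-CPU word. */
void model_syscall_exit(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    atomic_store_explicit(&cpu_states[cpu].in_kernel, 0UL,
                          memory_order_release);
}

/*
 * Remote observer (a stand-in for the RCU side): a single load indexed
 * by CPU number. No task lookup is needed to locate the flag.
 */
bool model_cpu_in_user(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    return atomic_load_explicit(&cpu_states[cpu].in_kernel,
                                memory_order_acquire) == 0UL;
}
```

With a per-task flag, the observer in this model would first need some way to learn which task is running on the target CPU, which is the extra synchronization the message above is concerned about.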