From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932140AbbEGKsx (ORCPT );
        Thu, 7 May 2015 06:48:53 -0400
Received: from mail-wi0-f176.google.com ([209.85.212.176]:35644 "EHLO
        mail-wi0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752573AbbEGKsu (ORCPT );
        Thu, 7 May 2015 06:48:50 -0400
Date: Thu, 7 May 2015 12:48:45 +0200
From: Ingo Molnar
To: Rik van Riel
Cc: Andy Lutomirski, Mike Galbraith,
        "linux-kernel@vger.kernel.org", X86 ML, williams@redhat.com,
        Andrew Lutomirski, fweisbec@redhat.com, Peter Zijlstra,
        Heiko Carstens, Thomas Gleixner, Ingo Molnar, Paolo Bonzini
Subject: Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable
        & enable from context tracking on syscall entry
Message-ID: <20150507104845.GB14924@gmail.com>
References: <1430429035-25563-1-git-send-email-riel@redhat.com>
        <1430429035-25563-4-git-send-email-riel@redhat.com>
        <20150501064044.GA18957@gmail.com>
        <554399D1.6010405@redhat.com>
        <1430659432.4233.3.camel@gmail.com>
        <55465B2D.6010300@redhat.com>
        <55466E72.8060602@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <55466E72.8060602@redhat.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Rik van Riel wrote:

> > If, on the other hand, you're just going to remotely sample the
> > in-memory context, that sounds good.
>
> It's the latter.
>
> If you look at /proc//{stack,syscall,wchan} and other files,
> you will see we already have ways to determine, from in memory
> content, where a program is running at a certain point in time.
>
> In fact, the timer interrupt based accounting does a similar thing.
> It has a task examine its own in-memory state to figure out what it
> was doing before the timer interrupt happened.
>
> The kernel side stack pointer is probably enough to tell us whether
> a task is active in kernel space, on an irq stack, or (maybe) in
> user space. Not convinced about the latter, we may need to look at
> the same state the RCU code keeps track of to see what mode a task
> is in...
>
> I am looking at the code to see what locks we need to grab.
>
> I suspect the runqueue lock may be enough, to ensure that the task
> struct, and stack do not go away while we are looking at them.

That will be enough, especially if you get to the task reference via
rq->curr.

> We cannot take the lock_trace(task) from irq context, and we
> probably do not need to anyway, since we do not care about a precise
> stack trace for the task.

So one worry with this and similar approaches of statistically
detecting user mode would be the fact that on the way out to
user-space we don't really destroy the previous call trace - we just
pop off the stack (non-destructively), restore RIPs and are gone.

We'll need that percpu flag I suspect.

And once we have the flag, we can get rid of the per syscall RCU
callback as well, relatively easily: with CMPXCHG (in
synchronize_rcu()!) we can reliably sample whether a CPU is in user
mode right now, while the syscall entry/exit path does not use any
atomics, we can just use a simple MOV.

Once we observe 'user mode', then we have observed quiescent state
and synchronize_rcu() can continue. If we've observed kernel mode we
can frob the remote task's TIF_ flags to make it go into a quiescent
state publishing routine on syscall-return.

The only hard requirement of this scheme from the RCU synchronization
POV is that all kernel contexts that may touch RCU state need to flip
this flag reliably to 'kernel mode': i.e. all irq handlers, traps,
NMIs and all syscall variants need to do this.

But once it's there, it's really neat.

Thanks,

	Ingo
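
To make the flag scheme above a bit more concrete, here is a rough
user-space model of it in C11 atomics. It is only a sketch under
stated assumptions, not kernel code: struct cpu_model, the model_*()
helpers, CTX_USER/CTX_KERNEL and TIF_REPORT_QS are all made-up names,
the per-CPU flag and the task's TIF_ word are folded into one struct
for brevity, and the memory-ordering races a real implementation would
have to close are not addressed. It also models only the syscall
path, not irqs, traps or NMIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout: a user-space model, not kernel code. */
enum { CTX_USER = 0, CTX_KERNEL = 1 };

#define TIF_REPORT_QS 0x1u  /* stand-in for a TIF_ bit: "publish a QS on exit" */

struct cpu_model {
	atomic_int  ctx_mode;    /* the per-CPU user/kernel flag */
	atomic_uint task_flags;  /* TIF_-style flags of the task in rq->curr */
	unsigned    qs_reported; /* quiescent states published on syscall return */
};

static void model_cpu_init(struct cpu_model *cpu)
{
	atomic_init(&cpu->ctx_mode, CTX_USER);
	atomic_init(&cpu->task_flags, 0u);
	cpu->qs_reported = 0;
}

/* Syscall entry: a plain store, no locked instruction (a MOV on x86). */
static void model_syscall_enter(struct cpu_model *cpu)
{
	atomic_store_explicit(&cpu->ctx_mode, CTX_KERNEL, memory_order_release);
}

/*
 * Syscall exit: the common case is one load plus one plain store.  Only
 * when a sampler asked for it do we take the slow path and publish a
 * quiescent state before going back to user mode.
 */
static void model_syscall_exit(struct cpu_model *cpu)
{
	assert(atomic_load_explicit(&cpu->ctx_mode,
				    memory_order_relaxed) == CTX_KERNEL);

	if (atomic_load_explicit(&cpu->task_flags,
				 memory_order_acquire) & TIF_REPORT_QS) {
		atomic_fetch_and(&cpu->task_flags, ~TIF_REPORT_QS);
		cpu->qs_reported++;
	}
	atomic_store_explicit(&cpu->ctx_mode, CTX_USER, memory_order_release);
}

/*
 * Sampler side (the synchronize_rcu() caller in the scheme above).
 * The compare-exchange USER -> USER never changes the flag; it is used
 * to read it with full ordering.  Returns true if user mode, i.e. a
 * quiescent state, was observed.  Otherwise it sets the TIF_-style bit
 * so that the task publishes a quiescent state on syscall return.
 */
static bool model_sample_remote_cpu(struct cpu_model *cpu)
{
	int expected = CTX_USER;

	if (atomic_compare_exchange_strong(&cpu->ctx_mode, &expected, CTX_USER))
		return true;

	atomic_fetch_or(&cpu->task_flags, TIF_REPORT_QS);
	return false;
}
```

Walking through it single-threaded: sampling an idle model CPU returns
true right away; after model_syscall_enter() the sampler returns false
and leaves TIF_REPORT_QS set; the following model_syscall_exit() then
bumps qs_reported once and clears the bit. The point of the split is
that only the sampler pays for a locked instruction, while the
entry/exit fast path stays at plain loads and stores.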