From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752732AbbEGRrk (ORCPT ); Thu, 7 May 2015 13:47:40 -0400
Received: from mail-lb0-f177.google.com ([209.85.217.177]:34726 "EHLO
	mail-lb0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751394AbbEGRrd (ORCPT ); Thu, 7 May 2015 13:47:33 -0400
MIME-Version: 1.0
In-Reply-To: <20150507150845.GA20608@gmail.com>
References: <20150501064044.GA18957@gmail.com> <554399D1.6010405@redhat.com>
	<1430659432.4233.3.camel@gmail.com> <55465B2D.6010300@redhat.com>
	<55466E72.8060602@redhat.com> <20150507104845.GB14924@gmail.com>
	<20150507124437.GB17443@gmail.com> <20150507150845.GA20608@gmail.com>
From: Andy Lutomirski
Date: Thu, 7 May 2015 10:47:10 -0700
Message-ID:
Subject: Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable
	& enable from context tracking on syscall entry
To: Ingo Molnar
Cc: fweisbec@redhat.com, Paolo Bonzini, X86 ML, Thomas Gleixner,
	Peter Zijlstra, Heiko Carstens, Ingo Molnar, Mike Galbraith,
	Rik van Riel, "linux-kernel@vger.kernel.org", williams@redhat.com
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On May 7, 2015 8:38 PM, "Ingo Molnar" wrote:
>
>
> * Andy Lutomirski wrote:
>
> > I think one or both of us is missing something or we're just talking
> > about different things.
>
> That's very much possible!
>
> I think part of the problem is that I called the 'remote CPU' the RT
> CPU, while you seem to be calling it the CPU that does the
> synchronize_rcu().
>
> So lets start again, with calling the synchronize_rcu() the 'remote
> CPU', and the one doing the RT workload the 'RT CPU':
>
> > If the nohz/RT cpu is about to enter user mode and stay there for a
> > long time, it does:
> >
> >         this_cpu_inc(rcu_qs_ctr);
> >
> > or whatever. Maybe we add:
> >
> >         this_cpu_set(rcu_state) = IN_USER;
> >
> > or however it's spelled.
> >
> > The remote CPU wants to check our state. If this happens just
> > before the IN_USER write or rcu_qs_ctr increment becomes visible,
> > then it'll think we're in kernel mode. Now it either has to poll
> > (which is fine) or try to get us to tell the RCU core when we become
> > quiescent by setting TIF_RCU_THINGY.
>
> So do you mean:
>
>         this_cpu_set(rcu_state) = IN_KERNEL;
>         ...
>         this_cpu_inc(rcu_qs_ctr);
>         this_cpu_set(rcu_state) = IN_USER;
>
> ?
>
> So in your proposal we'd have an INC and two MOVs. I think we can make
> it just two simple stores into a byte flag, one on entry and one on
> exit:
>
>         this_cpu_set(rcu_state) = IN_KERNEL;
>         ...
>         this_cpu_set(rcu_state) = IN_USER;
>

I was thinking that either a counter or a state flag could make sense.
Doing both would be pointless. The counter could use the low bits to
indicate the state. The benefit of the counter would be that the
RCU-waiting CPU could observe that the counter has incremented and
that therefore a grace period has elapsed. Getting it right would
require lots of care.

> plus the rare but magic TIF_RCU_THINGY that tells a waiting
> synchronize_rcu() about the next quiescent state.
>
> > The problem is that I don't see how TIF_RCU_THINGY can work
> > reliably. If the remote CPU sets it, it'll be too late and we'll
> > still enter user mode without seeing it. If it's just an
> > optimization, though, then it should be fine.
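(Going back to the counter idea above for a moment: the sketch below is
roughly the shape I have in mind. rcu_qs_ctr and the *_sketch helpers
are made-up names for illustration, and every barrier that a real
implementation would need is deliberately omitted.)

        /*
         * Low bit of the counter encodes the state:
         * odd = user mode (quiescent), even = kernel mode.
         */
        DEFINE_PER_CPU(unsigned long, rcu_qs_ctr);

        /* kernel -> user transition on the RT CPU */
        static inline void rcu_user_enter_sketch(void)
        {
                this_cpu_inc(rcu_qs_ctr);       /* even -> odd */
        }

        /* user -> kernel transition on the RT CPU */
        static inline void rcu_user_exit_sketch(void)
        {
                this_cpu_inc(rcu_qs_ctr);       /* odd -> even */
        }

        /*
         * On the CPU doing synchronize_rcu(): @snap is a value of
         * rcu_qs_ctr sampled earlier for @cpu.  That CPU is done
         * with the grace period if it is in user mode right now
         * (counter is odd) or if the counter moved at all since
         * the snapshot, i.e. it passed through user mode meanwhile.
         */
        static bool rcu_cpu_is_done_sketch(int cpu, unsigned long snap)
        {
                unsigned long cur = READ_ONCE(per_cpu(rcu_qs_ctr, cpu));

                return (cur & 1) || cur != snap;
        }

A waiting CPU that samples this and sees no progress can still fall
back to poking us with TIF_RCU_THINGY, as discussed above.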
>
> Well, after setting it, the remote CPU has to re-check whether the RT
> CPU has entered user-mode - before it goes to wait.

How? Suppose the exit path looked like:

        this_cpu_write(rcu_state, IN_USER);

        if (ti->flags & _TIF_RCU_NOTIFY) {
                if (test_and_clear_bit(TIF_RCU_NOTIFY, &ti->flags))
                        slow_notify_rcu_that_we_are_exiting();
        }

        iret or sysret;

The RCU-waiting CPU sees that rcu_state == IN_KERNEL and sets
_TIF_RCU_NOTIFY. This could happen arbitrarily late before IRET
because stores can be delayed. (It could even happen after sysret,
IIRC, but IRET is serializing.)

If we put an mfence after this_cpu_set or did an unconditional
test_and_clear_bit on ti->flags then this problem goes away, but
that would probably be slower than we'd like.

--Andy
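P.S. For concreteness, the "unconditional test_and_clear_bit" variant I
mean is the same sketch with the cheap pre-check dropped, something like
the below. slow_notify_rcu_that_we_are_exiting() is still a made-up
name, and I haven't measured any of this:

        this_cpu_write(rcu_state, IN_USER);

        /*
         * Unconditional locked RMW on ti->flags.  A locked
         * instruction is a full barrier on x86, so either we see
         * the flag the remote CPU set (and do the slow notify),
         * or our IN_USER store is already globally visible by the
         * time that CPU re-checks rcu_state and it won't wait on us.
         */
        if (test_and_clear_bit(TIF_RCU_NOTIFY, &ti->flags))
                slow_notify_rcu_that_we_are_exiting();

        iret or sysret;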