From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753066AbcHJSMs (ORCPT ); Wed, 10 Aug 2016 14:12:48 -0400 Received: from mail.efficios.com ([78.47.125.74]:57813 "EHLO mail.efficios.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932436AbcHJSMj (ORCPT ); Wed, 10 Aug 2016 14:12:39 -0400 Date: Wed, 10 Aug 2016 17:33:44 +0000 (UTC) From: Mathieu Desnoyers To: Boqun Feng Cc: Andy Lutomirski , Peter Zijlstra , Andrew Morton , Russell King , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , linux-kernel , linux-api , Paul Turner , Andrew Hunter , Andi Kleen , Dave Watson , Chris Lameter , Ben Maurer , rostedt , "Paul E. McKenney" , Josh Triplett , Linus Torvalds , Catalin Marinas , Will Deacon , Michael Kerrisk Message-ID: <1918884375.7403.1470850424697.JavaMail.zimbra@efficios.com> In-Reply-To: <20160809161328.GA1740@tardis.cn.ibm.com> References: <1469135662-31512-1-git-send-email-mathieu.desnoyers@efficios.com> <1806206514.82247.1469502139408.JavaMail.zimbra@efficios.com> <20160803122717.GL6862@twins.programming.kicks-ass.net> <20160804042710.GA13232@tardis.cn.ibm.com> <20160809161328.GA1740@tardis.cn.ibm.com> Subject: Re: [RFC PATCH v7 1/7] Restartable sequences system call MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [78.47.125.74] X-Mailer: Zimbra 8.7.0_GA_1659 (ZimbraWebClient - FF45 (Linux)/8.7.0_GA_1659) Thread-Topic: Restartable sequences system call Thread-Index: 3iJEdotk7uolMzm0qRtGeMRzM6bQGQ== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org ----- On Aug 9, 2016, at 12:13 PM, Boqun Feng boqun.feng@gmail.com wrote: > > However, I'm thinking maybe we can use some tricks to avoid unnecessary > aborts-on-preemption. > > First of all, I notice we haven't make any constraint on what kind of > memory objects could be "protected" by rseq critical sections yet. And I > think this is something we should decide before adding this feature into > kernel. > > We can do some optimization if we have some constraints. For example, if > the memory objects inside the rseq critical sections could only be > modified by userspace programs, we therefore don't need to abort > immediately when userspace task -> kernel task context switch. The rseq_owner per-cpu variable and rseq_cpu field in task_struct you propose below would indeed take care of this scenario. > > Further more, if the memory objects inside the rseq critical sections > could only be modified by userspace programs that have registered their > rseq structures, we don't need to abort immediately between the context > switches between two rseq-unregistered tasks or one rseq-registered > task and one rseq-unregistered task. > > Instead, we do tricks as follow: > > defining a percpu pointer in kernel: > > DEFINE_PER_CPU(struct task_struct *, rseq_owner); > > and a cpu field in struct task_struct: > > struct task_struct { > ... > #ifdef CONFIG_RSEQ > struct rseq __user *rseq; > uint32_t rseq_event_counter; > int rseq_cpu; > #endif > ... > }; > > (task_struct::rseq_cpu should be initialized as -1.) > > each time at sched out(in rseq_sched_out()), we do something like: > > if (prev->rseq) { > raw_cpu_write(rseq_owner, prev); > prev->rseq_cpu = smp_processor_id(); > } > > each time sched in(in rseq_handle_notify_resume()), we do something > like: > > if (current->rseq && > (this_cpu_read(rseq_owner) != current || > current->rseq_cpu != smp_processor_id())) > __rseq_handle_notify_resume(regs); > > (Also need to modify rseq_signal_deliver() to call > __rseq_handle_notify_resume() directly). > > > I think this could save some unnecessary aborts-on-preemption, however, > TBH, I'm too sleepy to verify every corner case. Will recheck this > tomorrow. This adds extra fields to the task struct, per-cpu rseq_owner pointers, and hooks into sched_in which are not needed otherwise, all this to eliminate unneeded abort-on-preemption. If we look at the single-stepping use-case, this means that gdb would only be able to single-step applications as long as neither itself, nor any of its libraries, use rseq. This seems to be quite fragile. I prefer requiring rseq users to implement a fallback to locking which progresses in every situation rather than adding complexity and overhead trying lessen the odds of triggering the restart. Simply lessening the odds of triggering the restart without a design that ensures progress even in restart cases seems to make the lack-of-progress problem just harder to debug when it will surface in real life. Thanks, Mathieu > > Regards, > Boqun -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com