From: Thomas Gleixner
To: Andy Lutomirski
Cc: LKML, X86 ML, "Paul E. McKenney", Andy Lutomirski, Alexandre Chartre, Frederic Weisbecker, Paolo Bonzini, Sean Christopherson, Masami Hiramatsu, Petr Mladek, Steven Rostedt, Joel Fernandes, Boris Ostrovsky, Juergen Gross, Brian Gerst, Mathieu Desnoyers, Josh Poimboeuf, Will Deacon, Tom Lendacky, Wei Liu, Michael Kelley, Jason Chen CJ, Zhao Yakui, "Peter Zijlstra (Intel)"
Subject: Re: [patch V6 12/37] x86/entry: Provide idtentry_entry/exit_cond_rcu()
In-Reply-To: <87ftbv7nsd.fsf@nanos.tec.linutronix.de>
References: <20200515234547.710474468@linutronix.de> <20200515235125.628629605@linutronix.de> <87ftbv7nsd.fsf@nanos.tec.linutronix.de>
Date: Tue, 19 May 2020 22:20:02 +0200
Message-ID: <87a7237k3x.fsf@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

Thomas Gleixner writes:

> Andy Lutomirski writes:
>> On Fri, May 15, 2020 at 5:10 PM Thomas Gleixner wrote:
>>> The pagefault handler cannot use the regular idtentry_enter() because that
>>> invokes rcu_irq_enter() if the pagefault was caused in the kernel. Not a
>>> problem per se, but kernel side page faults can schedule which is not
>>> possible without invoking rcu_irq_exit().
>>>
>>> Adding rcu_irq_exit() and a matching rcu_irq_enter() into the actual
>>> pagefault handling code would be possible, but not pretty either.
>>>
>>> Provide idtentry_entry/exit_cond_rcu() which calls rcu_irq_enter() only
>>> when RCU is not watching. The conditional RCU enabling is a correctness
>>> issue: A kernel page fault which hits an RCU idle region can neither
>>> schedule nor is it likely to survive. But avoiding RCU warnings or RCU side
>>> effects is at least increasing the chance for useful debug output.
>>>
>>> The function is also useful for implementing lightweight reschedule IPI and
>>> KVM posted interrupt IPI entry handling later.
>>
>> Why is this conditional? That is, couldn't we do this for all
>> idtentry_enter() calls instead of just for page faults? Evil things
>> like NMI shouldn't go through this path at all.
>
> I thought about that, but then ended up with the conclusion that RCU
> might be unhappy, but my conclusion might be fundamentally wrong.

It's about this:

	rcu_nmi_enter()
	{
		if (!rcu_is_watching()) {
			make it watch;
		} else if (!in_nmi()) {
			do_magic_nohz_dyntick_muck();
		}
	}

So if we make all irq/system vector entries conditional, then
do_magic_nohz_dyntick_muck() never gets executed.

After that I got lost...

Thanks,

        tglx