From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753184AbdEHQvO (ORCPT ); Mon, 8 May 2017 12:51:14 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44668 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750841AbdEHQvM (ORCPT ); Mon, 8 May 2017 12:51:12 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com EAA4AC049DFD Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=jpoimboe@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com EAA4AC049DFD Date: Mon, 8 May 2017 11:51:08 -0500 From: Josh Poimboeuf To: Petr Mladek Cc: Jessica Yu , Jiri Kosina , Miroslav Benes , Steven Rostedt , "Paul E. McKenney" , live-patching@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2/3] livepatch/rcu: Warn when system consistency is broken in RCU code Message-ID: <20170508165108.d3vd4h6ffa25bfui@treble> References: <1493895316-19165-1-git-send-email-pmladek@suse.com> <1493895316-19165-3-git-send-email-pmladek@suse.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <1493895316-19165-3-git-send-email-pmladek@suse.com> User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Mon, 08 May 2017 16:51:12 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 04, 2017 at 12:55:15PM +0200, Petr Mladek wrote: > RCU is not watching inside some RCU code. As a result, the livepatch > ftrace handler might see ops->func_stack and some other flags in > a wrong state. Then a livepatch might make the system unstable. > > Note that there might be serious consequences only when the livepatch > modifies semantic of functions used by some parts of RCU infrastructure. > We are safe when none of the patched functions is used by this RCU code. > Also everything is good when the functions might be changed one by one > (using the immediate flag). See Documentation/livepatch/livepatch.txt > for more details about the consistency model. > > Fortunately, the sensitive parts of the RCU code are annotated > and could be detected. This patch adds a warning when an insecure > usage is detected. > > Unfortunately, the ftrace handler might be called when the problematic > patch has already been removed from ops->func stack. In this case, > it is not able to read the immediate flag. It makes the check > unreliable. We rather avoid it and report the problem even when > the system stability is not affected. > > It would be possible to add some more complex logic to avoid > warnings when RCU infrastructure is modified using immediate > patches. But let's keep it simple until real life experience > forces us to do the opposite. > > This patch is inspired by similar problems solved in stack > tracer, see > https://lkml.kernel.org/r/20170412115304.3077dbc8@gandalf.local.home > > Signed-off-by: Petr Mladek > --- > Documentation/livepatch/livepatch.txt | 15 +++++++++++++++ > kernel/livepatch/patch.c | 8 ++++++++ > 2 files changed, 23 insertions(+) > > diff --git a/Documentation/livepatch/livepatch.txt b/Documentation/livepatch/livepatch.txt > index ecdb18104ab0..92619f7d86fd 100644 > --- a/Documentation/livepatch/livepatch.txt > +++ b/Documentation/livepatch/livepatch.txt > @@ -476,6 +476,21 @@ The current Livepatch implementation has several limitations: > by "notrace". > > > + + Limited patching of RCU infrastructure > + > + The livepatch ftrace handler uses RCU for handling the stack of patches. > + Also the ftrace framework uses RCU to detect when the handlers are not > + longer in use. > + > + The directly used RCU functions could not be patched. Otherwise, > + there would be an infinite recursion. > + > + Some other RCU functions can not be patched safely because the RCU > + framework might be in an inconsistent state. This context is annotated > + and warning is printed. In theory, it might be safe to modify such > + functions using immediate patches. But this is hard to detect properly > + in the ftrace handler, so the warning is always printed. > + > + Livepatch works reliably only when the dynamic ftrace is located at > the very beginning of the function. > diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c > index 4c4fbe409008..ffdf5fa8005b 100644 > --- a/kernel/livepatch/patch.c > +++ b/kernel/livepatch/patch.c > @@ -62,6 +62,14 @@ static void notrace klp_ftrace_handler(unsigned long ip, > /* RCU may not be watching, make it see us. */ > rcu_irq_enter_irqson(); > > + /* > + * RCU still might not see us if we patch a function inside > + * the RCU infrastructure. Then we might see wrong state of > + * func->stack and other flags. > + */ > + if (unlikely(!rcu_is_watching())) > + WARN_ONCE(1, "Livepatch modified a function that can not be handled a safe way.!"); > + > rcu_read_lock(); > > func = list_first_or_null_rcu(&ops->func_stack, struct klp_func, So the ftrace stack tracer seems to do this check a little differently. It calls rcu_irq_enter_disabled() first, and then calls rcu_irq_enter(). Any reason we're not doing it that way? The warning would be more helpful if it printed a little more information, like the ip and the parent_ip. And it should probably mention that RCU is broken. Also I wonder if we can constrain the warning somehow. I think the warning only applies if a patch is in progress, right? In that case, if RCU is broken, would it be feasible to mutex_trylock() the klp mutex to try to ensure that no patches are being applied while the ftrace handler is running? Then it wouldn't matter if RCU were broken because the func stack wouldn't be changing anyway. Then it could only warn if it failed to get the mutex. Stepping back a bit, the documentation and comments describe patches to functions inside the RCU infrastructure. As far as I can tell, only a single function would be affected: rcu_dynticks_eqs_enter(). Because it's the only function called after the "Breaks tracing momentarily" comment in rcu_eqs_enter_common(). Any reason why we couldn't just annotate rcu_dynticks_eqs_enter() with notrace? Stepping back even further, if I'm understanding this issue correctly, this warning can also affect patches to functions which are called from NMI context. If the NMI occurs in the part of rcu_eqs_enter_common() where rcu_irq_enter() doesn't work, then RCU won't work here, right? If so, that worries me because there are a lot of functions which can be called (and patched) from NMI context. I wonder if there's some way to solve this by changing RCU code, but I'm not familiar enough with RCU to have any ideas there. Another idea would be to figure out a way to stop using RCU in klp_ftrace_handler() altogether. -- Josh