Re: [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()

From: Miroslav Benes <mbenes@suse.cz>
To: Li Bin <huawei.libin@huawei.com>
Cc: jpoimboe@redhat.com, sjenning@redhat.com, jikos@kernel.org,
	vojtech@suse.com, live-patching@vger.kernel.org,
	linux-kernel@vger.kernel.org, guohanjun@huawei.com,
	dingtianhong@huawei.com, xiexiuqi@huawei.com,
	zhouchengming1@huawei.com
Subject: Re: [PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()
Date: Tue, 15 Dec 2015 09:14:50 +0100 (CET)	[thread overview]
Message-ID: <alpine.LNX.2.00.1512150857510.24899@pobox.suse.cz> (raw)
In-Reply-To: <1448855677-8392-1-git-send-email-huawei.libin@huawei.com>


Hi,

[ sry for late responses. Two weeks of holiday and trying to go 
through all the emails... ]

On Mon, 30 Nov 2015, Li Bin wrote:

> There is a potential race as following:
> 
> CPU0                         |  CPU1
> -----------------------------|-----------------------------------
> enabled_store()              |  klp_unregister_patch()
>                              |  |-mutex_lock(&klp_mutex);
> |-mutex_lock(&klp_mutex);    |  |-klp_free_patch();
>                              |  |-mutex_unlock(&klp_mutex);
> |-[process the patch's state]|
> |-mutex_unlock(&klp_mutex)   |
> 
> Fix this race condition by adding klp_is_patch_registered() check in
> enabled_store() after get the lock klp_mutex.
> 
> Signed-off-by: Li Bin <huawei.libin@huawei.com>

Well, I think all the relevant feedback was mentioned in other replies. 
I'll add my two cents here than to reply to the respective ones.

1. as Josh pointed out this is a potential deadlock situation between 
   sysfs infrastructure and our unregister path. It has been already 
   discussed [1] and some solutions were proposed [2].

2. if I am not missing something it is purely theoretical now. 
   klp_unregister_patch is supposed to be called from module_exit function 
   to clean up after the patch module. There is no way today how this 
   function can be called. We take module reference (try_module_get() in 
   klp_register_patch()) and we do not put it anywhere. Because we can't 
   as Petr mentioned. Without a consistency model we cannot know if there 
   is a task in a module code or not.

So I think we can postpone the solution till there is a consistency model.

Regards,
Miroslav

[1] https://lkml.kernel.org/r/alpine.LNX.2.00.1502171728520.29490@pobox.suse.cz

[2]
 1. rework klp_unregister_patch and move kobject_put out of mutex 
    protected parts. This is what Jiri Slaby also proposed.

 2. as Petr Mladek said we could use mutex_trylock instead of mutex_lock 
    in enabled_store.

    I've even had the solution prepared in one of my git branches since 
    April so here it is just for the sake of completeness. It is based on 
    Jiri Slaby's kgraft consistency model for klp and it is maybe a 
    superset, but you get the idea.

-->8---

>From b854b3feac2883f5b0a17ea7a5c83b4389fcd6ad Mon Sep 17 00:00:00 2001
From: Miroslav Benes <mbenes@suse.cz>
Date: Thu, 2 Apr 2015 14:13:00 +0200
Subject: [PATCH] livepatch: allow removal of a disabled patch

Currently we do not allow patch module to unload since there is no
method to determine if a task is still running in the patched code.

The consistency model gives us the way because when the patching
finishes we know that all tasks were marked as safe to call a new
patched function. Thus every new call to the function calls the new
patched code and at the same time no task can be somewhere in the old
code, because it had to leave that code to be marked as safe.

We can safely let the patch module go after that.

Completion is used for synchronization between module removal and sysfs
infrastructure. Note that we still do not allow the removal for simple
model, that is no consistency model.

With this change a call to try_module_get() is moved to
__klp_enable_patch from klp_register_patch to make module reference
counting symmetric (module_put() is in a patch disable path). Also
mutex_lock() in enabled_store is changed to mutex_trylock() to prevent
a deadlock situation when klp_unregister_patch is called and sysfs
directories are removed.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
---
 include/linux/livepatch.h            |  3 ++
 kernel/livepatch/core.c              | 70 ++++++++++++++++++++++++------------
 samples/livepatch/livepatch-sample.c |  1 -
 3 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 5b0c5ca341ee..be2494759a6b 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -24,6 +24,7 @@
 #include <linux/module.h>
 #include <linux/ftrace.h>
 #include <linux/ptrace.h>
+#include <linux/completion.h>
 
 #if IS_ENABLED(CONFIG_LIVEPATCH)
 
@@ -163,6 +164,7 @@ struct klp_object {
  * @kobj:	kobject for sysfs resources
  * @state:	tracks patch-level application state
  * @cmodel:	cmodel_id's implementation
+ * @finish:	for waiting till it is safe to remove the patch module
  */
 struct klp_patch {
 	/* external */
@@ -175,6 +177,7 @@ struct klp_patch {
 	struct kobject kobj;
 	enum klp_state state;
 	struct klp_cmodel *cmodel;
+	struct completion finish;
 };
 
 #define klp_for_each_object(patch, obj) \
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index aa2a5c7c1301..046bf055c187 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -27,6 +27,7 @@
 #include <linux/ftrace.h>
 #include <linux/list.h>
 #include <linux/kallsyms.h>
+#include <linux/completion.h>
 #include <linux/livepatch.h>
 
 /**
@@ -514,6 +515,16 @@ static void klp_disable_patch_real(struct klp_patch *patch)
 	}
 
 	patch->state = KLP_DISABLED;
+
+	if (patch->cmodel->async_finish) {
+		/*
+		 * We need to wait for all the tasks to leave our rcu-protected
+		 * section in ftrace handler. Disabled funcs have been deleted
+		 * from the list, but there still could be readers who see them.
+		 */
+		synchronize_rcu();
+		module_put(patch->mod);
+	}
 }
 
 static int __klp_disable_patch(struct klp_patch *patch)
@@ -615,6 +626,18 @@ static int __klp_enable_patch(struct klp_patch *patch)
 	    list_prev_entry(patch, list)->state == KLP_DISABLED)
 		return -EBUSY;
 
+	/*
+	 * A reference is taken on the patch module to prevent it from being
+	 * unloaded.
+	 *
+	 * Note: For immediate (no consistency model) patches we don't allow
+	 * patch modules to unload since there is no safe/sane method to
+	 * determine if a thread is still running in the patched code contained
+	 * in the patch module once the ftrace registration is successful.
+	 */
+	if (!try_module_get(patch->mod))
+		return -ENODEV;
+
 	pr_notice_once("tainting kernel with TAINT_LIVEPATCH\n");
 	add_taint(TAINT_LIVEPATCH, LOCKDEP_STILL_OK);
 
@@ -715,7 +738,21 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
 
 	patch = container_of(kobj, struct klp_patch, kobj);
 
-	mutex_lock(&klp_mutex);
+	/*
+	 * There has to be mutex_trylock here to prevent a potential deadlock
+	 * between enabled_store() and klp_unregister_patch().
+	 * klp_unregister_patch() takes klp_mutex and destroys our sysfs
+	 * infrastructure (thus taking sysfs active protection). We do the
+	 * opposite here.
+	 */
+	if (!mutex_trylock(&klp_mutex))
+		return -EBUSY;
+
+	if (!klp_is_patch_registered(patch)) {
+		/* module with the patch could disappear meanwhile */
+		ret = -EINVAL;
+		goto err;
+	}
 
 	if (val == patch->state) {
 		/* already in requested state */
@@ -759,10 +796,10 @@ static struct attribute *klp_patch_attrs[] = {
 
 static void klp_kobj_release_patch(struct kobject *kobj)
 {
-	/*
-	 * Once we have a consistency model we'll need to module_put() the
-	 * patch module here.  See klp_register_patch() for more details.
-	 */
+	struct klp_patch *patch;
+
+	patch = container_of(kobj, struct klp_patch, kobj);
+	complete(&patch->finish);
 }
 
 static struct kobj_type klp_ktype_patch = {
@@ -834,6 +871,7 @@ static void klp_free_patch(struct klp_patch *patch)
 	if (!list_empty(&patch->list))
 		list_del(&patch->list);
 	kobject_put(&patch->kobj);
+	wait_for_completion(&patch->finish);
 }
 
 static int klp_init_func(struct klp_object *obj, struct klp_func *func)
@@ -934,6 +972,7 @@ static int klp_init_patch(struct klp_patch *patch)
 
 	patch->cmodel = cmodel;
 	patch->state = KLP_DISABLED;
+	init_completion(&patch->finish);
 
 	ret = kobject_init_and_add(&patch->kobj, &klp_ktype_patch,
 				   klp_root_kobj, "%s", patch->mod->name);
@@ -999,33 +1038,20 @@ EXPORT_SYMBOL_GPL(klp_unregister_patch);
  * Initializes the data structure associated with the patch and
  * creates the sysfs interface.
  *
+ * There is no need to take the reference on the patch module here. It is done
+ * later when the patch is enabled.
+ *
  * Return: 0 on success, otherwise error
  */
 int klp_register_patch(struct klp_patch *patch)
 {
-	int ret;
-
 	if (!klp_initialized())
 		return -ENODEV;
 
 	if (!patch || !patch->mod)
 		return -EINVAL;
 
-	/*
-	 * A reference is taken on the patch module to prevent it from being
-	 * unloaded.  Right now, we don't allow patch modules to unload since
-	 * there is currently no method to determine if a thread is still
-	 * running in the patched code contained in the patch module once
-	 * the ftrace registration is successful.
-	 */
-	if (!try_module_get(patch->mod))
-		return -ENODEV;
-
-	ret = klp_init_patch(patch);
-	if (ret)
-		module_put(patch->mod);
-
-	return ret;
+	return klp_init_patch(patch);
 }
 EXPORT_SYMBOL_GPL(klp_register_patch);
 
diff --git a/samples/livepatch/livepatch-sample.c b/samples/livepatch/livepatch-sample.c
index 25289083deac..85e9a87ecd8e 100644
--- a/samples/livepatch/livepatch-sample.c
+++ b/samples/livepatch/livepatch-sample.c
@@ -83,7 +83,6 @@ static int livepatch_init(void)
 
 static void livepatch_exit(void)
 {
-	WARN_ON(klp_disable_patch(&patch));
 	WARN_ON(klp_unregister_patch(&patch));
 }
 
-- 
2.1.4